X-Ray gives you Deep Insights into your Code

Hotspots are code that we have to work with frequently. We know that any improvements we make to a hotspot are likely to pay off immediately. However, sometimes those improvements aren’t straightforward; some of the worst hotspots we’ve seen are files with several thousand lines of code. Given that amount of code, where do we start? Are all parts of that file equally important? Are there any functions or methods that contribute more to the code being a hotspot than others? CodeScene’s X-Ray feature answers these questions.

X-Ray is a language-dependent analysis. The supported programming languages are listed in the Supported Programming Languages section.

An Overview of X-Ray

X-Ray is an analysis that operates on the function/method level of your code. Thus, X-Ray is able to provide deep and detailed information on what’s happening inside a Hotspot.

There are three main use cases for the X-Ray functionality:

  1. X-Ray lets you make sense of large files and get specific recommendations on the parts to improve.

  2. X-Ray provides detailed information on why a cluster of files is temporally coupled.

  3. X-Ray recommends re-structuring opportunities on the methods in your Hotspots in order to make the code easier to understand and maintain.

In the following guide we’ll cover all of these cases. Let’s start with how you can make sense of large files.

X-Ray calculates Hotspots on a Method Level

A Hotspot analysis is orthogonal to the data it operates on. That is, CodeScene presents hotspots as individual files, but also on an architectural level as entire components and sub-systems. With X-Ray, we climb down the abstraction ladder and run a Hotspot analysis on a method level.

A large file is like a system in itself. Some parts remain stable, while other parts of the file keep changing as new features are added and bugs get resolved. With X-Ray, you’ll get a prioritized list of the methods you want to refactor and improve first. This is important since re-designing a large module is both high-risk and expensive. So instead you want to take an iterative approach to your improvements and base those improvements on data.
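To make the idea of a method-level hotspot concrete, here is a minimal sketch in Python. It is not CodeScene’s implementation; it assumes you already have, for each commit, the list of methods that the commit modified. The function name and the sample history are hypothetical.

```python
from collections import Counter

def rank_methods_by_change_frequency(commits):
    """Count in how many commits each method was touched, most-changed first."""
    counts = Counter()
    for changed_methods in commits:
        # Count a method at most once per commit.
        counts.update(set(changed_methods))
    return counts.most_common()

# Hypothetical history: each entry lists the methods modified by one commit.
history = [
    ["CreateInvoker", "Dispose"],
    ["CreateInvoker"],
    ["CreateInvoker", "Validate"],
    ["Validate"],
]
ranking = rank_methods_by_change_frequency(history)
for method, revisions in ranking:
    print(f"{method}: changed in {revisions} commits")
```

The methods at the top of such a ranking are the candidates to inspect first; combining the count with size or complexity narrows the list further.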

To run X-Ray, go to your Hotspot map, click on the Hotspot and select ‘X-Ray’ from the context menu as shown in Fig. 105.

Fig. 105 Run X-Ray from the context menu.

X-Ray is run on demand. That is, the first time you execute it on a Hotspot it may take a few seconds to get the results. Subsequent accesses are cheap since we cache the results.

Once you get the results you’ll see that you typically spend more time on some methods than others. So let’s walk through the X-Ray results and look at the individual pieces. Have a look at Fig. 106 as a starting point.

An overview of the X-Ray results.

Fig. 106 The starting point in an X-Ray analysis.

Fig. 106 shows the results of an X-Ray analysis. We see that our hotspot is a method named CreateInvoker, which consists of 193 lines of code. You also see that CreateInvoker has a Cyclomatic Complexity of 22, which is a fairly high number. Thus, the method represents complicated code that you also have to work with often.

Methods like this are exactly where you’d like to focus your refactoring efforts: the high change frequency of the method indicates that improvements are likely to pay off immediately. And the lines of code and complexity numbers give you a sense of the effort you need to invest to make the necessary improvements.

But X-Ray gives you more information. As you see in the table above, CodeScene also lets you run a Complexity Trend analysis of an individual method:

Fig. 107 CodeScene presents complexity trends on a function level.

Interpret Cyclomatic Complexity in Context of Relevance

The cyclomatic complexity measure included in X-Ray shouldn’t stand on its own. Just because some code is complex doesn’t mean it’s a problem. However, when we combine a complexity measure with change frequencies – like X-Ray does – we get information we can act upon since the code complexity is put into context and ranked based on relevance.

CodeScene includes its cyclomatic complexity metric as a supplement to the other information as a decent approximation of, well, complexity. As a rule of thumb, any cyclomatic complexity value above 10 is likely to be problematic. A cyclomatic complexity beyond 25 is likely to hint at a true maintenance nightmare. But again, use the complexity value as a guide, not as an absolute truth.

Cyclomatic complexity also helps you make refactoring decisions in the sense that you get a rough idea of how hard the code will be to test. Each branch in your functions adds to their complexity value and, as a direct consequence, to the testing effort.
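As an illustration of how branches add up, the following sketch approximates cyclomatic complexity for Python source by counting decision points with the standard `ast` module. It is a simplified count under stated assumptions (only the node types listed are treated as branches) and says nothing about how CodeScene computes its own metric.

```python
import ast

# Node types treated as decision points in this simplified count.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)

def approximate_cyclomatic_complexity(source):
    """Return 1 plus the number of decision points found in the source."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' introduces two additional paths.
            complexity += len(node.values) - 1
    return complexity

sample = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0 and d > 1:
            return "composite"
    return "other"
'''
print(approximate_cyclomatic_complexity(sample))
```

For the sample, the two `if` statements, the `for` loop, and the `and` each contribute one path on top of the base value of 1. Each of those paths is a case you would want a test for.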

Break Down Defects to the Method Level

In addition to the complexity metrics, CodeScene’s X-Ray lets you break down defect statistics to individual functions. The purpose of this analysis is to further inform refactoring and rework decisions; maybe it’s easier to communicate the need for a larger refactoring effort if you can show that 30% of all bug fixes are in a single hotspot method?

Fig. 108 CodeScene breaks down defect statistics from a hotspot file to a function level.

Calculating defects in the X-Ray analysis requires an integration with a project management tool as described in Integrate Costs and Issues into CodeScene (Jira, Trello, Azure DevOps and GitHub Issues).

A Note on Overloaded Methods

Some languages like C++, C#, and Java let you use the same function name for different implementations. CodeScene lets you configure how to analyse overloads. There are two options:

  1. Analyse overloaded methods separately: each overloaded method is treated as a separate unit of analysis in the X-Ray.

  2. Combine overloaded methods: This is the default behavior, and CodeScene presents the statistics for all of the overloads as one, single entry.

You configure your choice in each analysis project (Hotspot section).

In case #2, X-Ray will combine all overloads with the same name into a single unit of measure. That is, if you have functions with the signature f(int) and f(string) they will be combined in the analysis. This approach typically gives you better results since the overloaded functions are part of the same logical unit of design and you want to analyze them as such.
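The combining option can be sketched as follows. This is an illustrative Python example, not CodeScene’s code: it assumes each overload is identified by a signature string such as `f(int)` and that you know which commits touched it. All names and data are hypothetical.

```python
def combine_overloads(revisions_by_signature):
    """Merge per-signature commit sets into one entry per method name."""
    combined = {}
    for signature, commit_ids in revisions_by_signature.items():
        name = signature.split("(", 1)[0]  # 'f(int)' -> 'f'
        entry = combined.setdefault(name, {"commits": set(), "overloads": 0})
        # A commit that touches several overloads still counts once.
        entry["commits"] |= set(commit_ids)
        entry["overloads"] += 1
    return combined

# Hypothetical input: signature -> ids of the commits that modified it.
revisions = {
    "f(int)": {"c1", "c2"},
    "f(string)": {"c2", "c3"},
    "g()": {"c4"},
}
merged = combine_overloads(revisions)
print(len(merged["f"]["commits"]), merged["f"]["overloads"])
```

Using a set of commit ids rather than summing counts avoids double-counting a commit that changes both `f(int)` and `f(string)`.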

CodeScene includes a count on the total number of methods to highlight such overloads, as shown in Fig. 109.

Overloaded methods in X-Ray

Fig. 109 X-Ray highlights the total number of methods behind each overloaded hotspot.

X-Ray calculates Temporal Coupling between Methods

As you X-Ray a Hotspot, CodeScene also looks for temporal coupling between individual methods in that file. This is information that helps you identify unexpected change patterns. Let’s look at the example in Fig. 110.

Fig. 110 X-Ray calculates temporal coupling between the methods in your Hotspot.

Fig. 110 shows that two methods, CreateInvoker and Invoke_UsesDefaultValuesIfNotBound, change together in 60% of all changes. That is, more than every second time you change one of these methods there’s a predictable change to the other one.

You use the Temporal Coupling results as input to your refactoring efforts. In the example above, you probably want to have a close look at both methods to see why they are so strongly coupled in time. Often, there’s either a leaky abstraction or a fair chunk of logic duplicated between the two methods.
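A coupling percentage like this can be reproduced with a small sketch. The formula below (shared commits divided by the average number of commits of the two methods) is an assumption chosen for illustration; this guide does not state the exact formula CodeScene uses. The history is hypothetical.

```python
def coupling_degree(commits, method_a, method_b):
    """Shared commits divided by the average revision count of both methods."""
    revs_a = sum(1 for changed in commits if method_a in changed)
    revs_b = sum(1 for changed in commits if method_b in changed)
    shared = sum(1 for changed in commits
                 if method_a in changed and method_b in changed)
    average = (revs_a + revs_b) / 2
    return shared / average if average else 0.0

# Hypothetical history: each set holds the methods changed in one commit.
history = [
    {"A", "B"}, {"A", "B"}, {"A", "B"},
    {"A"}, {"A"},
    {"B"}, {"B"},
    {"C"},
]
degree = coupling_degree(history, "A", "B")
print(f"{degree:.0%}")
```

Here both methods change in five commits each and three of those commits are shared, so the degree is 3 / 5.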

X-Ray lets you look into Temporal Coupling Clusters

Temporal Coupling is one of the most powerful software analyses in our arsenal. A temporal coupling analysis often highlights unexpected change patterns in our codebase and provides us with important information that we cannot deduce from the code alone. However, temporal coupling has also been one of the hardest results to act upon.

Think about it for a minute. Let’s say that you investigate some temporal coupling results and identify a cluster of 10 files that tend to change together. Now, how do you uncover the reason for this coupling in time? Well, in more complex cases you need to compare the code and walk through the historic revisions to know which parts of the files are responsible for the coupling. This can be painful, particularly for large files that are low on cohesion. Enter X-Ray for temporal coupling.

With X-Ray, all of these steps are completely automated. You just click on a file in the temporal coupling visualization and select ‘X-Ray’ from the context menu as illustrated in Fig. 111.

Invoke X-Ray by using the context menu in a temporal coupling visualization.

Fig. 111 X-Ray lets you investigate temporal coupling clusters in detail.

Once X-Ray is done, you’re presented with a dependency wheel on method level. Have a look at the dependency wheel in Fig. 112 and I’ll walk you through the details.

The X-Ray of external temporal coupling

Fig. 112 The dependency wheel shows the temporal coupling between methods.

The dependency wheel in Fig. 112 is an interactive visualization. As you see in the example above, when we hover over the part that represents the method RendersLinkTagsForGlobbedHrefResults, we see that the method is coupled in time to six other methods located in a different class. This information is powerful: now we’ve limited the amount of code you need to inspect in order to improve the design and break this expensive change pattern.

Find change patterns across repository boundaries

Since CodeScene’s analyses are language neutral it can identify implicit/hidden change patterns between code implemented in different languages. But CodeScene can go an extra mile: it can even uncover such change patterns when the different files are located in separate Git repositories! Take a look at the X-Ray results in Fig. 113.

The X-Ray of external temporal coupling between repositories.

Fig. 113 X-Ray works across multiple repositories.

As you see in the preceding figure, X-Ray works across Git repository boundaries to identify the functions responsible for the temporal coupling. This is a powerful analysis that is particularly useful for:

  • Microservices: Implicit dependencies across service boundaries are problematic since they couple the life cycles of different services to each other. Use CodeScene to detect and X-Ray such dependencies.

  • Producer/Consumer: The preceding example is a modern variation of the client-server pattern. Use X-Ray to learn about the change pattern in a complex, multi-repository project.

  • Inter-Team Coordination: In large organizations different teams tend to be responsible for the code in different repositories. Using X-Ray’s inter-repo analysis lets you uncover expensive change patterns that impact other teams.

Unfortunately, X-Ray across repository boundaries doesn’t work by magic; there has to be some mechanism to relate different commits to the same logical change set. CodeScene uses Ticket IDs for that purpose, so all you need to do is configure your Ticket ID patterns and this X-Ray feature will be enabled.

As a bonus, this feature also works well in the case of differing commit styles. Some organizations prefer to build their features through many small and incomplete commits. As a consequence, a single commit contains very little information and there’s usually no temporal coupling between commits. Temporal coupling by Ticket ID provides a viable alternative here.
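The idea of relating commits through Ticket IDs can be sketched as below. This is an illustration only: the regular expression, repository names, and commit data are hypothetical, and the real pattern is whatever you configure in CodeScene.

```python
import re
from collections import defaultdict

# Hypothetical ticket pattern matching ids such as 'PAY-42'.
TICKET_PATTERN = re.compile(r"\b([A-Z]+-\d+)\b")

def group_changes_by_ticket(commits):
    """Merge the functions changed by all commits that reference one ticket."""
    change_sets = defaultdict(set)
    for repo, message, functions in commits:
        match = TICKET_PATTERN.search(message)
        if match is None:
            continue  # commits without a ticket id cannot be related
        for function in functions:
            change_sets[match.group(1)].add((repo, function))
    return dict(change_sets)

# Hypothetical commits from two repositories: (repo, message, functions).
commits = [
    ("billing-service", "PAY-42 add VAT field", ["serialize_invoice"]),
    ("web-client", "PAY-42: show VAT", ["renderInvoice"]),
    ("web-client", "fix typo", ["footer"]),
]
change_sets = group_changes_by_ticket(commits)
print(change_sets)
```

Each resulting change set acts like one logical commit, so functions in different repositories, or functions spread over many small commits, can show up as changing together.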

X-Ray detects Software Clones

Temporal coupling arises for several reasons. It’s also important to note that not all coupling is bad. For example, you’d expect a unit test to change together with the code under test. However, when you can’t think of any good reason why two pieces of code keep changing at the same time, you’ll most likely find a refactoring opportunity.

One of the most common reasons for unexpected temporal coupling is a dear old friend: copy-paste. In fact, copy-paste is so common that we’ve included an analysis of code similarity in X-Ray.

You get to the code similarity analysis by clicking on the result tab for External Temporal Coupling Details as illustrated in Fig. 114.

An example on the code similarity analysis in X-Ray

Fig. 114 The Code Similarity analysis lets you uncover copy-paste code.

In Fig. 114 you see that there are two methods with the same name, but located in different classes, that have a code similarity of 98%. You want to use this data as a starting point. If you could encapsulate that shared logic in a separate method that you re-use between the two classes your temporal coupling will go away. Your application will become a little bit easier to maintain.
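To get a feeling for what a similarity percentage expresses, here is a minimal sketch that compares two method bodies line by line with Python’s standard `difflib`. CodeScene’s clone detection algorithm is not described in this guide, so treat this as a stand-in technique; the method bodies are made up.

```python
from difflib import SequenceMatcher

def code_similarity(body_a, body_b):
    """Similarity in [0, 1] based on matching, whitespace-trimmed lines."""
    def normalize(body):
        return [line.strip() for line in body.splitlines() if line.strip()]
    return SequenceMatcher(None, normalize(body_a), normalize(body_b)).ratio()

# Hypothetical method bodies that differ in a single line.
first = """
    var tags = Glob(pattern);
    Assert.NotEmpty(tags);
    Render(tags, "link");
    Verify(output);
"""
second = """
    var tags = Glob(pattern);
    Assert.NotEmpty(tags);
    Render(tags, "script");
    Verify(output);
"""
similarity = code_similarity(first, second)
print(f"{similarity:.0%}")
```

A high score on its own only says that the code looks alike; it becomes actionable when the same two methods also change together.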

A word on Software Clone Detection

Copy-paste detection isn’t exactly a new technique. However, it’s still far from mainstream in the software industry. One reason that copy-paste detectors haven’t caught on is because they fail to prioritize their findings in a sensible way.

If you look at studies of large codebases, you’ll learn that around 5-20% of the code in a large codebase is duplicated logic to some degree. That’s quite a lot. There’s simply no way you can start to refactor that amount of code and hope to get a return on that investment. In fact, most of that duplicated code doesn’t matter. So how can we find the software clones that limit our ability to maintain the system?

CodeScene’s X-Ray solves this dilemma. By combining copy-paste detection with temporal coupling we know that the identified software clones matter. For example, if you look at the example above, you’ll see that the two methods with a code similarity of 98% are changed together in one third of all cases. That is, with X-Ray you’ll find the software clones that actually matter. This lets you prioritize the improvements that you do while still ensuring that you get a real return on those refactoring investments.

Follow the Restructuring Recommendations

Empear’s CodeScene is the first ever software analysis tool that implements a proximity analysis. The X-Ray findings present the proximity results as a set of recommendations on how to re-structure the methods in a Hotspot in order to make the code more readable. Let’s start by understanding the concept of proximity and why it matters to our ability to maintain code.

The proximity principle focuses on how well organized your code is with respect to readability and change. You use proximity both as a design principle and as a heuristic to evaluate the cohesion and structure of existing code.

The principle of proximity is a concept from Gestalt psychology. The Gestalt movement pioneered principles on how we make sense of all chaotic input from our sensory systems. We need to understand the Gestalt principles if we want to optimize our code for readability. Remember, we use the same brain to interpret code as we use to make sense of the physical world.

The principle of proximity

Fig. 115 An illustration of the Principle of Proximity where our brain forms groups of related objects.

Within Gestalt psychology, the principle of proximity specifies that objects or shapes that are close to one another appear to form groups as illustrated in Fig. 115. If we translate this to software, it means that readable code is structured in a way that lets our brain understand parts of the source code file as a whole. The main reason is that we want our code to support our change patterns: code that is expected to be changed together should be close. Such a code structure serves as a powerful reminder to both the programmer and, more importantly, the code reader that a set of functions belong together.

CodeScene measures proximity based on your change patterns (aka internal temporal coupling). You see an example of a proximity analysis in Fig. 116, taken from the implementation of the Clojure programming language.

An example on a proximity analysis in X-Ray

Fig. 116 The Proximity Analysis recommends re-structuring of the methods in a Hotspot.

The highlighted recommendation in Fig. 116 shows two functions, hash-map and array-map, that are frequently changed together. That is, they are temporally coupled. However, if you look at the implementation in the Clojure project you’ll see that there are thousands of lines of code between hash-map and array-map. This is bad news for a maintenance programmer because it’s so easy to miss an update to one of the functions. A simple, low-risk refactoring is to just move those two functions next to each other. That simple change lets the code signal that the functions belong together. In addition it dramatically increases the chances that a bug fix to one of the functions is applied to the other function too.

So what metric do we use for proximity? If you look at Fig. 116 you see that there’s a Total Proximity column in the analysis results. The proximity values specify the distance between the related functions. The unit of measure is the number of intermediate functions between the related parts. In our example with hash-map and array-map, Fig. 116 shows that there’s a total proximity of 299. That means that there are 299 (!) functions separating the implementation of hash-map from its temporally coupled counterpart, array-map.
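The unit of measure can be illustrated with a short sketch. It assumes you have the functions of a file in the order they appear in the source; how CodeScene aggregates distances into the Total Proximity value is not specified here, so the example covers only a single pair. The function list is hypothetical.

```python
def proximity(function_order, fn_a, fn_b):
    """Number of functions located between fn_a and fn_b in the file."""
    distance = abs(function_order.index(fn_a) - function_order.index(fn_b))
    return max(distance - 1, 0)

# Hypothetical file layout, listed top to bottom.
layout = ["hash-map", "first", "rest", "array-map", "count"]
print(proximity(layout, "hash-map", "array-map"))
```

Two functions that sit next to each other get a value of 0, which is the outcome the recommended refactoring aims for.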

Know the limitations of Method-level analyses

CodeScene tracks renamed content. That is, if you move or rename a file, we make sure to fetch its past history even if you’ve renamed the file multiple times. We implement a similar mechanism for X-Ray too. X-Ray will track and analyze the history of renamed methods/functions…except when it won’t. Let’s elaborate on that so that you know the possible corner cases.

First of all we have a philosophical question here. Let’s say you decide to refactor parts of your code. You simplify some parts of it and rename a few functions. Now, when is a function renamed and when is it actually a new function that replaces an old one? This distinction isn’t clear.

X-Ray resolves this dilemma by introducing a set of heuristics for its rename detection. In general, X-Ray tries to do the most sensible thing while avoiding false positives in the analysis results.

Increase the Depth of the Analysis

By default, X-Ray will look at a maximum of 200 revisions. In most codebases that’s more than enough. So why put a limit on it? Well, there are projects that have been around for a long time and their top Hotspots may well have thousands of commits. X-Raying that much data takes quite some time. In addition, the most interesting patterns are likely to be in the recent evolution of the Hotspot.

Most of the time this is the behavior that you want. However, in case you want to dive deeper and X-Ray the complete evolution of a Hotspot you need to instruct CodeScene to do that. This choice is a simple matter of configuration as illustrated in Fig. 117.

Change X-Ray configuration to include all revisions

Fig. 117 The project configuration lets you X-Ray all revisions of a Hotspot.