Know the possible Biases in the Data

Our social metrics, like all software metrics, are an approximation of the real world. There will always going to be corner cases and biases in the data. In particular, there are some situations where the metrics don’t perform as well. So please read the following section in order to minimize the bias in the analysis results.

Developers with Multiple Aliases

A developer may end up with multiple aliases. Perhaps they’re committing from both a personal- and a company account. Or they’ve changed their e-mail address. This introduces a bias in the data since CodeScene uses the name of each developer as their identification.

Fortunately, you can avoid this bias by resolving the author aliases in CodeScene’s configuration UI. As an alternative to the UI, you may also use a Git feature called .mailmap. A .mailmap is a file that you include in the root of your Git repository. The file specifies a mapping from multiple names and addresses to the canonical name and address of each developer with multiple aliases. It’s straightforward to use a .mailmap, so please check out the git log documentation for the format.

Autosquash Commits

Some teams may use a Git feature called autosquash. This feature is a way of re-writing the development history. It may be fine if squashing is used for the work of an individual developer. Unfortunately the feature is sometimes used to combine the work of multiple programmers into a single commit.

The consequence is that the analyses lose important data for temporal coupling and, in particular, the social metrics become more limited than they’d have to be. For example, it’s not possible to generate a knowledge map over individual programmers, which means that you miss the opportunity to use the analysis methods for on- and off-boarding.

It’s highly recommended that you reconsider the autosquash strategy in case you apply it today. In general, the work of multiple programmers should not be compressed in a single commit.

Pair Programming

The knowledge metrics in CodeScene are based on the author of the code as recorded by Git. This may obviously be misleading if your organization does pair-programming.

CodeScene does supports knowledge maps for pair and mob programming, where the credits are split between the contributors in the pair. Refer to in Configure Developers and Teams for the configuration options needed to activate this feature.