Empirical Software Engineering, Volume 26, Issue 1


Anthology ID:
G21-23
Year:
2021
Venue:
GWF
Publisher:
Springer Science and Business Media LLC
URL:
https://gwf-uwaterloo.github.io/gwf-publications/G21-23

ID-correspondence: a measure for detecting evolutionary coupling
Manishankar Mondal | Banani Roy | Chanchal K. Roy | Kevin A. Schneider

Evolutionary coupling is a well-investigated phenomenon in software maintenance research and practice. Association rules and two related measures, support and confidence, have been used to identify evolutionary coupling among program entities. However, these measures only capture how frequently entities co-change (i.e., change together) and cannot determine whether the entities co-evolved by experiencing related changes. Consequently, association-rule-based detection reports false positives and fails to detect evolutionary coupling among infrequently co-changed entities. We propose a new measure, identifier correspondence (id-correspondence), that quantifies the extent to which the changes made to co-changed entities are related, based on identifier similarity. Identifiers are the names given to program entities such as variables, methods, classes, packages, interfaces, structures, and unions. We use the Dice-Sørensen coefficient to measure lexical similarity between the identifiers involved in the changed lines of the co-changed entities. Our investigation of thousands of revisions from nine subject systems covering three programming languages shows that id-correspondence can considerably improve the detection accuracy of evolutionary coupling. It outperforms existing state-of-the-art evolutionary-coupling-based techniques, with significantly higher recall and F-score in predicting future co-change candidates.
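
To make the measures named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the usual association-rule definitions of support and confidence over a commit history, and it computes the Dice-Sørensen coefficient over character bigrams of two identifiers. The bigram tokenization, the function names, and the way commits are represented (a list of sets of entity names) are all assumptions made for illustration; the abstract does not specify how identifiers are compared or how per-identifier scores are aggregated into id-correspondence.

```python
from typing import List, Set


def support(commits: List[Set[str]], a: str, b: str) -> int:
    """Number of commits in which entities a and b changed together."""
    return sum(1 for changed in commits if a in changed and b in changed)


def confidence(commits: List[Set[str]], a: str, b: str) -> float:
    """Confidence of the rule a => b: support(a, b) / commits that changed a."""
    changed_a = sum(1 for changed in commits if a in changed)
    return support(commits, a, b) / changed_a if changed_a else 0.0


def bigrams(identifier: str) -> Set[str]:
    """Set of lowercase character bigrams (an assumed tokenization)."""
    s = identifier.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(x: str, y: str) -> float:
    """Dice-Sørensen coefficient: 2|X ∩ Y| / (|X| + |Y|) over bigram sets."""
    bx, by = bigrams(x), bigrams(y)
    if not bx and not by:
        return 0.0  # no bigrams to compare (e.g. single-character names)
    return 2 * len(bx & by) / (len(bx) + len(by))
```

With a hypothetical history `[{"A", "B"}, {"A"}, {"A", "B"}, {"B", "C"}]`, `support(commits, "A", "B")` counts the two commits that touch both entities, and `confidence(commits, "A", "B")` divides that by the three commits that touch `A`. Neither number says anything about what changed inside `A` and `B`, which is the gap the abstract describes; a similarity score such as `dice` over identifiers from the changed lines is one way to ask whether the changes themselves look related.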