Proceedings of the 40th International Conference on Software Engineering


Anthology ID:
G18-144
Year:
2018
Venue:
GWF
Publisher:
ACM
URL:
https://gwf-uwaterloo.github.io/gwf-publications/G18-144

CCAligner
Pengcheng Wang | Jeffrey Svajlenko | Yanzhao Wu | Yun Xu | Chanchal K. Roy

Copying code and then pasting it with a large number of edits is a common activity in software development, and the pasted code is a kind of complicated Type-3 clone. Because of the large number of edits, we consider such a clone a large-gap clone. Large-gap clones can reflect the extension of code, such as changes and improvements. Existing state-of-the-art clone detectors suffer from several limitations in detecting large-gap clones. In this paper, we propose a tool, CCAligner, that detects large-gap clones by matching code windows while allowing an edit distance of e. In our approach, we design a novel e-mismatch index and use an asymmetric similarity coefficient as the similarity measure. We thoroughly evaluate CCAligner both for large-gap clone detection and for general Type-1, Type-2, and Type-3 clone detection. The results show that CCAligner performs better than competing tools in large-gap clone detection, and that in general Type-1 to Type-3 clone detection it has the best execution time for a 10 MLOC input while maintaining good precision and recall. Compared with existing state-of-the-art tools, CCAligner is the best-performing large-gap clone detection tool and remains competitive with the best clone detectors in general Type-1, Type-2, and Type-3 clone detection.
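The abstract describes matching code windows under a small edit budget through an e-mismatch index, scored with an asymmetric similarity coefficient. The Python sketch below illustrates that general idea under stated assumptions; it is not CCAligner's implementation. The window size Q, the tolerance E, whitespace-only normalization, treating a mismatch as a changed line at the same window position (a simplification of edit distance), and defining the coefficient as the fraction of the first fragment's windows matched in the second are all illustrative choices, and every function name is hypothetical.

```python
# Illustrative sketch only; not CCAligner's actual algorithm.
# All names and parameters below are hypothetical.
from collections import defaultdict
from itertools import combinations

# Hypothetical parameters: window size Q and number of tolerated mismatching lines E.
Q, E = 3, 1


def normalize(line: str) -> str:
    """Collapse whitespace; a stand-in for real token normalization."""
    return " ".join(line.split())


def code_windows(lines, q=Q):
    """Return every run of q consecutive non-blank, normalized lines."""
    norm = [n for n in (normalize(ln) for ln in lines) if n]
    return [tuple(norm[i:i + q]) for i in range(len(norm) - q + 1)]


def mismatch_keys(window, e=E):
    """One key per way of ignoring e positions in the window.

    Two windows share a key when they agree on the remaining q - e lines,
    i.e. they differ in at most e lines at the same positions.
    """
    keys = []
    for skipped in combinations(range(len(window)), e):
        kept = tuple(line for i, line in enumerate(window) if i not in skipped)
        keys.append((skipped, kept))
    return keys


def build_index(fragments, q=Q, e=E):
    """Map each mismatch key to the ids of the fragments that contain it."""
    index = defaultdict(set)
    for frag_id, lines in fragments.items():
        for window in code_windows(lines, q):
            for key in mismatch_keys(window, e):
                index[key].add(frag_id)
    return index


def asymmetric_similarity(a_lines, b_id, index, q=Q, e=E):
    """Fraction of A's windows that have an e-mismatch partner in fragment b_id."""
    windows = code_windows(a_lines, q)
    if not windows:
        return 0.0
    matched = sum(
        1 for w in windows
        if any(b_id in index.get(k, ()) for k in mismatch_keys(w, e))
    )
    return matched / len(windows)


# Example: B is A with 20 unrelated lines pasted into the middle (a large gap).
base = [f"x{i} = {i};" for i in range(10)]
fragments = {
    "A": base,
    "B": base[:5] + [f"log({i});" for i in range(20)] + base[5:],
}
index = build_index(fragments)
print(asymmetric_similarity(fragments["A"], "B", index))  # 1.0
print(asymmetric_similarity(fragments["B"], "A", index))  # 8 / 28, about 0.286
```

In this example every window of A finds a partner in B, because the one line that differs at each gap boundary is tolerated, while most of B's 28 windows fall inside the inserted gap. Measuring from the smaller fragment's side therefore does not penalize the gap the way a symmetric score over both fragments would.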