Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering


Anthology ID: G19-129
Year: 2019
Venue: GWF
Publisher: ACM
URL: https://gwf-uwaterloo.github.io/gwf-publications/G19-129

CloneCognition: machine learning based code clone validation tool
Golam Mostaeen | Jeffrey Svajlenko | Banani Roy | Chanchal K. Roy | Kevin A. Schneider

A code clone is a pair of similar code fragments, within or between software systems. To detect every possible clone pair in a software system while handling complex code structures, clone detection tools heavily generalize the original source code. This generalization often causes the tools to report code fragments that are only coincidentally similar and that users do not consider clones, so users must manually validate the reported candidate clones, which is often both time-consuming and challenging. In this paper, we propose a machine learning based tool, 'CloneCognition' (open-source code: https://github.com/pseudoPixels/CloneCognition ; video demonstration: https://www.youtube.com/watch?v=KYQjmdr8rsw), to automate this laborious manual validation process. The tool runs on top of any code clone detection tool to facilitate clone validation. It shows promising clone classification performance, with an accuracy of up to 87.4%, and exhibits significant improvement over state-of-the-art techniques for code clone validation.
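
The idea of a learned validation step that sits on top of a clone detector can be illustrated with a minimal sketch. This is not the CloneCognition implementation: the two similarity features, the nearest-centroid classifier, the toy training pairs, and all names (`features`, `NearestCentroid`, `validate`) are assumptions chosen only to keep the example self-contained.

```python
import re
from difflib import SequenceMatcher


def tokens(code):
    # Split code into identifiers, numbers, and single punctuation characters.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def features(frag_a, frag_b):
    """Return (token similarity, line similarity) for a candidate clone pair.

    These two features are illustrative assumptions, not the tool's feature set.
    """
    tok_sim = SequenceMatcher(None, tokens(frag_a), tokens(frag_b)).ratio()
    lines_a = [ln.strip() for ln in frag_a.splitlines() if ln.strip()]
    lines_b = [ln.strip() for ln in frag_b.splitlines() if ln.strip()]
    line_sim = SequenceMatcher(None, lines_a, lines_b).ratio()
    return (tok_sim, line_sim)


class NearestCentroid:
    """Tiny stand-in for a trained classifier (hypothetical, for illustration)."""

    def fit(self, X, y):
        # Average the feature vectors of each label to get one centroid per label.
        self.centroids = {}
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            self.centroids[label] = tuple(sum(col) / len(rows) for col in zip(*rows))
        return self

    def predict(self, x):
        # Return the label whose centroid is closest in squared Euclidean distance.
        def dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(x, centroid))
        return min(self.centroids, key=lambda lab: dist(self.centroids[lab]))


def validate(reported_pairs, model):
    # Keep only the reported pairs that the model classifies as true clones.
    return [(a, b) for a, b in reported_pairs if model.predict(features(a, b))]


# Toy, manually labeled pairs standing in for a real validated training set.
clone_a = "int sum(int a, int b) {\n  return a + b;\n}"
clone_b = "int add(int x, int y) {\n  return x + y;\n}"
other_a = "for (int i = 0; i < n; i++) {\n  total += v[i];\n}"
other_b = "if (user == null) {\n  throw new Error();\n}"

model = NearestCentroid().fit(
    [features(clone_a, clone_b), features(other_a, other_b)],
    [True, False],
)

# Pairs as a clone detector might report them; the model filters the list.
accepted = validate([(clone_a, clone_b), (other_a, other_b)], model)
```

Because the validation step only consumes the list of reported pairs, a sketch like this is independent of the detector that produced them, which is the property the abstract describes as running "on top of" any clone detection tool.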