[Research Paper] CroLSim: Cross Language Software Similarity Detector Using API Documentation

Kawser Wazed Nafi, Banani Roy, Chanchal K. Roy, Kevin A. Schneider


Abstract
In today's open source era, developers look for similar software applications in source code repositories for a number of reasons, including exploring alternative implementations, reusing source code, or looking for a better application. However, while there are a great many studies for finding similar applications written in the same programming language, there is a marked lack of studies for finding similar software applications written in different languages. In this paper, we fill the gap by proposing a novel model, CroLSim, which is able to detect similar software applications across different programming languages. In our approach, we use the API documentation to find relationships among the API calls used by the different programming languages. We adopt a deep learning based word-vector learning method to identify semantic relationships among the API documentation, which we then use to detect cross-language similar software applications. For evaluating CroLSim, we formed a repository consisting of 8,956 Java, 7,658 C#, and 10,232 Python applications collected from GitHub. We observed that CroLSim can successfully detect similar software applications across different programming languages with a mean average precision rate of 0.65, an average confidence rate of 3.6 (out of 5) with 75% high rated successful queries, which outperforms all related existing approaches with a significant performance improvement.
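
For readers unfamiliar with the headline metric, the following is a minimal sketch of how mean average precision is commonly computed over ranked query results. It is not taken from the paper: the function names and the boolean relevance lists are hypothetical, and the sketch assumes the common variant that normalizes by the number of relevant results actually retrieved, which may differ from the exact definition used in the CroLSim evaluation.

```python
def average_precision(relevance):
    """Average precision for one query.

    `relevance` is a ranked list of booleans (hypothetical input format):
    True means the result at that rank was judged similar to the query.
    Assumption: normalize by the number of relevant results retrieved.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries):
    """Mean of per-query average precision over a list of ranked result lists."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)


if __name__ == "__main__":
    # Two illustrative queries with made-up relevance judgments.
    example = [[True, False, True], [False, True]]
    print(round(mean_average_precision(example), 4))
```

Under this definition, a score near 1 means relevant applications consistently appear at the top of each ranked list, while a score near 0 means they appear late or not at all.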
Cite:
Kawser Wazed Nafi, Banani Roy, Chanchal K. Roy, and Kevin A. Schneider. 2018. [Research Paper] CroLSim: Cross Language Software Similarity Detector Using API Documentation. 2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM).