Fast, scalable and user-guided clone detection

Jeffrey Svajlenko, Chanchal K. Roy


Abstract
Despite the great number of clone detection approaches proposed in the literature, few have the scalability and speed to analyze large inter-project source datasets, where clone detection has many potential applications. Furthermore, because clone detection has so many uses, an approach is needed that can adapt to the user's needs and detect any kind of clone. We propose an approach designed for user-guided clone detection that exploits the power of source transformation in a plugin-based source-processing pipeline. Clones are detected using a simple Jaccard-based clone similarity metric, and users customize how their source code is represented as sets of terms in order to target particular types or kinds of clones. Fast and scalable clone detection is achieved with indexing, sub-block filtering, and input partitioning.
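To make the Jaccard-based metric concrete, the following is a minimal sketch and not the authors' implementation. It assumes each code block has already been reduced to a set of terms, and it uses a plain inverted index in place of the sub-block filtering and input partitioning named in the abstract. The names `jaccard`, `detect_clones`, the `threshold` parameter, and the sample `blocks` are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two term sets: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def detect_clones(blocks, threshold=0.7):
    """Return (earlier_id, later_id) pairs whose similarity meets the threshold.

    Assumption: `blocks` maps a block id to its set of terms. The inverted
    index below is a simplification; it only avoids comparing blocks that
    share no terms at all.
    """
    index = defaultdict(set)  # term -> ids of already-indexed blocks
    clones = []
    for block_id, terms in blocks.items():
        # Candidates are earlier blocks sharing at least one term.
        candidates = set()
        for term in terms:
            candidates |= index[term]
        for other in sorted(candidates):
            if jaccard(terms, blocks[other]) >= threshold:
                clones.append((other, block_id))
        # Index the current block so later blocks can find it.
        for term in terms:
            index[term].add(block_id)
    return clones


# Hypothetical term sets for three code blocks.
blocks = {
    "a": {"int", "sum", "for", "i", "return"},
    "b": {"int", "sum", "for", "j", "return"},
    "c": {"print", "hello"},
}
```

In this sketch, blocks "a" and "b" share four of six distinct terms, so they are reported as a clone pair at a threshold of 0.6 but not at 0.7. Changing how source is turned into terms (for example, normalizing identifier names) changes which pairs cross the threshold, which is the kind of user customization the abstract describes.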
Cite:
Jeffrey Svajlenko and Chanchal K. Roy. 2018. Fast, scalable and user-guided clone detection. Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings.