15th Innovations in Software Engineering Conference
Exploring the source code of a software system is a common task for contributors to that system. Practitioners often use call graphs to aid in understanding the source code of an inadequately documented software system. Call graphs, when visualized, show caller and callee relationships between functions. A static call graph provides the overall structure of a software system, and dynamic call graphs generated from dynamic execution logs can be used to trace program behaviour for a particular scenario. Unfortunately, a call graph of an entire system can be very complicated and hard to understand. Hierarchically abstracting a call graph can summarize an entire system’s structure and make function calls easier to comprehend. In this work, we mine concepts from source code entities (functions) to generate a concept cluster tree with improved naming of cluster nodes, to complement existing studies and facilitate more effective program comprehension for developers. We apply three different information retrieval techniques (TFIDF, LDA, and LSI) on function names and function name variants to label the nodes of a concept cluster tree generated by clustering execution paths. In our experiment comparing automatic labelling with manual labelling by participants for 12 use cases, we found that, on average, TFIDF performs best among the techniques, with 64% matching. LDA and LSI had 37% and 23% matching, respectively. In addition, using the words in function name variants performed at least 5% better in participant ratings for all three techniques, on average, across the use cases.
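To make the TFIDF labelling step in the abstract above concrete, the following is a minimal sketch and not the authors' implementation. It assumes each cluster is treated as one document made of the words in its function names, and it uses a common smoothed IDF variant. The names `split_identifier` and `label_clusters`, and the toy clusters, are hypothetical.

```python
import math
import re
from collections import Counter


def split_identifier(name):
    """Split a function name (camelCase or snake_case) into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def label_clusters(clusters, top_k=2):
    """Return the top_k highest-scoring TF-IDF words for each cluster.

    clusters maps a cluster id to the list of function names it contains.
    Each cluster is treated as a single document (an assumption of this sketch).
    """
    docs = {
        cid: Counter(w for fn in fns for w in split_identifier(fn))
        for cid, fns in clusters.items()
    }
    n_docs = len(docs)

    # Document frequency: in how many clusters does each word appear?
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())

    labels = {}
    for cid, counts in docs.items():
        total = sum(counts.values())
        scores = {
            # Term frequency times a smoothed inverse document frequency.
            w: (c / total) * (math.log((1 + n_docs) / (1 + df[w])) + 1)
            for w, c in counts.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        labels[cid] = ranked[:top_k]
    return labels


# Hypothetical clusters of function names taken from execution paths.
clusters = {
    "c1": ["openFile", "readFile", "closeFile"],
    "c2": ["sendRequest", "parseRequest", "open_connection"],
}
print(label_clusters(clusters, top_k=1))
```

In this toy input, "file" dominates the first cluster and "request" the second, while "open" is down-weighted because it occurs in both clusters.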
Testing software is considered to be one of the most crucial phases in the software development life cycle. Software bug fixing requires a significant amount of time and effort. A rich body of recent research has explored ways to predict bugs in software artifacts using machine learning based techniques. For a reliable and trustworthy prediction, it is crucial to also consider the explainability aspects of such machine learning models. In this paper, we show how feature transformation techniques can significantly improve prediction accuracy and build confidence in bug prediction models. We propose a novel approach for improved bug prediction that first extracts the features, then uses a genetic algorithm to find a weighted transformation of these features that best separates bugs from non-bugs when plotted in a low-dimensional space, and finally trains the machine learning model on the transformed dataset. In our experiment with real-life bug datasets, the random forest and k-nearest neighbor classifier models that leveraged feature transformation showed a 4.25% improvement in recall, on average across 8 software systems, when compared to the models built on the original data.
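The idea of searching for feature weights with a genetic algorithm can be illustrated with a small sketch. This is not the authors' method: the abstract does not state the fitness function or the genetic operators, so this sketch assumes a one-dimensional weighted sum as the low-dimensional space, a Fisher-like separation score as the fitness, and simple elitism, one-point crossover, and Gaussian mutation. The names `separation` and `evolve_weights` and the toy data are hypothetical.

```python
import random


def separation(weights, X, y):
    """Fisher-like score of the weighted sum: squared gap between the class
    means divided by the summed within-class squared deviations.
    (Assumed fitness; the abstract does not specify one.)"""
    proj = [sum(w * v for w, v in zip(weights, row)) for row in X]
    bugs = [p for p, label in zip(proj, y) if label == 1]
    clean = [p for p, label in zip(proj, y) if label == 0]
    mean_b = sum(bugs) / len(bugs)
    mean_c = sum(clean) / len(clean)
    spread = sum((p - mean_b) ** 2 for p in bugs) + sum((p - mean_c) ** 2 for p in clean)
    return (mean_b - mean_c) ** 2 / (spread + 1e-9)


def evolve_weights(X, y, pop_size=30, generations=40, mutation=0.2, seed=0):
    """Search for feature weights that maximise separation(weights, X, y)."""
    rng = random.Random(seed)
    n = len(X[0])
    # Start from the unweighted baseline plus random candidates.
    pop = [[1.0] * n] + [
        [rng.uniform(-1, 1) for _ in range(n)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        pop.sort(key=lambda w: separation(w, X, y), reverse=True)
        parents = pop[: pop_size // 2]  # elitism: the best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = [w + rng.gauss(0, mutation) for w in a[:cut] + b[cut:]]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: separation(w, X, y))


# Toy data: feature 0 separates the classes, feature 1 is mostly noise.
X = [[1.0, 5.0], [1.2, 1.0], [0.9, 3.0], [3.0, 4.0], [3.1, 0.5], [2.9, 2.5]]
y = [0, 0, 0, 1, 1, 1]
best = evolve_weights(X, y)
print(separation([1.0, 1.0], X, y), separation(best, X, y))
```

Because the unweighted baseline is part of the initial population and the best half always survives, the returned weights never score worse than the baseline on the training data. A classifier such as random forest or k-nearest neighbor would then be trained on the transformed features.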
Software developers often submit questions to technical Q&A sites like Stack Overflow (SO) to resolve their code-level problems. Usually, they include example code segments with their questions to explain the programming issues. When users of SO attempt to answer the questions, they prefer to reproduce the issues reported in the questions using the given code segments. However, such code segments cannot always reproduce the issues because of several unmet challenges (e.g., a code segment that is too short), which might prevent questions from receiving appropriate and prompt solutions. A previous study produced a catalog of potential challenges that hinder the reproducibility of issues reported in SO questions. However, it is unknown how practitioners (i.e., developers) perceive the challenge catalog. Understanding the developers’ perspective is essential for introducing interactive tool support that promotes reproducibility. We thus attempt to understand developers’ perspectives by surveying 53 users of SO. In particular, we attempt to (1) see whether developers agree with those challenges, (2) find the potential impact of those challenges, (3) see how developers address them, and (4) determine and prioritize tool support needs. Survey results show that about 90% of participants agree with the previously identified challenges. However, they report some additional challenges (e.g., a missing error log) that might prevent reproducibility. According to the participants, code segments that are too short and the absence of a required Class/Interface/Method from code segments severely prevent reproducibility, followed by missing important parts of the code. To promote reproducibility, participants strongly recommend introducing tool support that interacts with question submitters and suggests improvements to the code segments if the given code segments fail to reproduce the issues.
Software bug prediction is one of the promising research areas in software engineering. Software developers must allocate a reasonable amount of time and resources to test and debug the developed software extensively to improve software quality. However, with limited time and resources, it is not always possible to test software thoroughly enough to deliver high-quality software. Sometimes software companies release products in a hurry to make a profit in a competitive environment. As a result, the released software might contain defects that can affect the reputation of those companies. Ideally, any software application that is already on the market should not contain bugs. If it does, a bug can be very costly, depending on its severity. Although a significant amount of work has been done to automate different parts of testing to detect bugs, fixing a bug after it is discovered is still a costly task that developers need to do. Sometimes these bug-fixing changes introduce new bugs into the system. Researchers estimated that 80% of the total cost of a software system is spent on fixing bugs. They also show that software faults and failures cost the US economy $59.5 billion a year.