2022
2021
Software architectural changes involve more than one module or component and are more complex to analyze than local code changes. Development teams reviewing the architectural aspects (design) of a change commit must consider many essential scenarios, such as access rules and restrictions on the usage of program entities across modules. Moreover, design review is essential when proper architectural formulations are paramount for developing and deploying a system. Untangling architectural changes, recovering semantic design, and producing design notes are the crucial tasks of the design review process. To support these tasks, we construct a lightweight tool [4] that can detect and decompose semantic slices of a commit containing architectural instances. A semantic slice describes the relational information of the modules involved in a change instance, their classes and methods, and the connected modules, in a form that is easy for a reviewer to understand. To develop our tool, we extract various directory and naming structure (DANS) properties from the source code. Using the DANS properties, our tool first detects architectural change instances based on our defined metric and then decomposes the slices (based on string processing). Our preliminary investigation with ten open-source projects (developed in Java and Kotlin) reveals that the DANS properties yield high precision and recall (93-100%) for detecting and generating architectural slices. Our proposed tool will serve as a preliminary approach for semantic design recovery and design summary generation for project releases.
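The detect-then-decompose idea can be illustrated with a minimal sketch. It is not the tool's actual metric, which the abstract does not spell out. It assumes, purely for illustration, that a module is identified by the leading directory component of a changed file's path and that a commit counts as architectural when it touches more than one such module. The names `module_of`, `slice_commit`, and `is_architectural` are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Only Java and Kotlin sources are considered, matching the studied projects.
SOURCE_SUFFIXES = {".java", ".kt"}


def module_of(path: str, depth: int = 1) -> str:
    """Hypothetical rule: the module is the first `depth` directory components."""
    directories = PurePosixPath(path).parts[:-1]
    return "/".join(directories[:depth]) if directories else "<root>"


def slice_commit(changed_paths, depth: int = 1):
    """Group the changed source files of one commit by module (string processing only)."""
    slices = defaultdict(set)
    for path in changed_paths:
        p = PurePosixPath(path)
        if p.suffix in SOURCE_SUFFIXES:
            # The file stem stands in for the class name (an assumption).
            slices[module_of(path, depth)].add(p.stem)
    return {module: sorted(classes) for module, classes in slices.items()}


def is_architectural(slices) -> bool:
    """Assumed criterion: the change spans more than one module."""
    return len(slices) > 1
```

For example, a commit touching `core/src/main/java/org/x/Parser.java` and `ui/src/main/kotlin/org/x/View.kt` would yield one slice per module and be flagged as architectural, whereas a commit confined to `core/` would not.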
Image registration applications are computation-intensive, memory-intensive, and communication-intensive. They require robust error recovery and re-usability of both the data and the operations, along with performance optimization. Considering these requirements, we explore various programming models that aim to minimize folding operations (such as join and reduce), which are the primary sources of data shuffling, concurrency bugs, and expensive communication in a distributed cluster. In particular, we analyze modular MapReduce execution of an image registration pipeline (IRP) with external and internal (data-tunneling) data flow mechanisms and compare them with the compact model. Experimental analyses on the ComputeCanada cluster with a crop-field dataset containing 1000 images show that these design options are valuable for large-scale IRPs executed on a MapReduce cluster. Additionally, we present an effectiveness metric for analyzing the impact of a design model on the Big IRP, which combines the error-recovery and re-usability metrics with the data size and execution time. Our explored design models and their performance analysis can serve as a benchmark for researchers and application developers who deploy large-scale image registration and other image processing tasks.
2019
Mobile applications (apps) have become omnipresent. Components of mobile apps (such as 3rd party libraries) need to be separated and analyzed individually for security issue detection, repackaged app detection, tumor code purification, and so on. Various techniques are available to analyze mobile apps automatically. However, analyzing an app's executable binary remains challenging because it requires a curated database and must cope with large codebases and obfuscation. Considering these challenges, we explore a versatile technique that separates components using design-based features that are independent of code obfuscation. In particular, we conducted an empirical study using design patterns and fuzzy signatures to separate app components such as 3rd party libraries. In doing so, we built a system for automatically extracting design patterns from both the executable package (APK) and the Jar of an Android application. The experimental results with various standard datasets containing 3rd party libraries, obfuscated apps, and malware reveal that such design features are present in a significant share of them (within 60% of APKs, including malware). Moreover, these features remain unaltered even after app obfuscation. Finally, as a case study, we found that design patterns alone can detect 3rd party libraries within obfuscated apps to a considerable extent (F1 score of 32%). Overall, our empirical study reveals that design features might play a versatile role in separating Android components for various purposes.
Big Data analytics systems built on parallel distributed processing frameworks (e.g., Hadoop and Spark) are becoming popular for extracting important insights from huge amounts of heterogeneous data (e.g., image, text, and sensor data). These systems offer a wide range of tools and connect them to form workflows for processing Big Data. Many researchers have already proposed independent schemes for managing the programs and data of workflows, and most of these systems include data or metadata management. However, to the best of our knowledge, no study specifically discusses the performance implications of utilizing the intermediate states of data and programs generated at various execution steps of a workflow on distributed platforms. To address this shortcoming, we propose a Big Data management scheme for micro-level modular computation-intensive programs on a Spark and Hadoop-based platform. In this paper, we investigate whether managing the intermediate states can speed up the execution of an image processing pipeline consisting of various image processing tools/APIs on the Hadoop Distributed File System (HDFS) while ensuring appropriate reusability and error monitoring. Our experiments show promising results; for example, with intermediate data management, we can save up to 87% of the computation time for an image processing job.
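The reuse of intermediate states can be illustrated with a small sketch. It is not the proposed scheme: it assumes a local directory and `pickle` files as stand-ins for HDFS and Spark, and the function name `run_pipeline` and the key-chaining rule are hypothetical. Each step's output is stored under a key derived from the original input and the step names executed so far, so a rerun, or a new pipeline sharing a prefix of steps, skips the work already done.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def run_pipeline(data, steps, store: Path):
    """Run (name, function) steps in order, reusing stored intermediate results.

    Returns the final result and the names of the steps actually executed.
    """
    executed = []
    # Start the key chain from the input itself (assumes picklable input).
    key = hashlib.sha256(pickle.dumps(data)).hexdigest()
    for name, func in steps:
        # Extend the key with the step name so it identifies this prefix of the pipeline.
        key = hashlib.sha256(f"{key}:{name}".encode()).hexdigest()
        path = store / f"{key}.pkl"
        if path.exists():
            data = pickle.loads(path.read_bytes())  # reuse the intermediate state
        else:
            data = func(data)
            path.write_bytes(pickle.dumps(data))  # persist it for later runs
            executed.append(name)
    return data, executed


if __name__ == "__main__":
    store = Path(tempfile.mkdtemp())
    steps = [("double", lambda xs: [2 * x for x in xs]), ("total", sum)]
    print(run_pipeline([1, 2, 3], steps, store))  # both steps execute
    print(run_pipeline([1, 2, 3], steps, store))  # nothing is recomputed
```

Keying on step names rather than on the code of each step keeps the sketch short, but it means a changed function under an unchanged name would reuse stale results; a real system would need versioning for that case.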