#### 2023

The development of state-of-the-art convolutional neural networks (CNNs) has allowed researchers to perform plant classification tasks that were previously thought impossible or that relied on human judgment. Researchers often develop complex CNN models to achieve better performance, introducing over-parameterization and causing the model to overfit the training dataset. The most popular way to evaluate overfitting in a deep learning model is to inspect accuracy and loss curves. These curves may help in understanding the performance of a model, but they do not provide guidance on how the model could be modified to attain better performance. In this article, we analyzed the relation between the features learned by a model and its capacity and showed that a model with higher representational capacity might learn many subtle features that may negatively affect its performance. Next, we showed that the shallow layers of a deep learning model learn more diverse features than the deeper layers. Finally, we propose the SSIM cut curve, a new way to select the depth of a CNN model using the pairwise similarity matrix between visualizations, obtained with Guided Backpropagation, of the features learned at different depths. We showed that our proposed method could potentially pave a new way to select a better CNN model.
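The pairwise similarity matrix mentioned in the abstract above can be illustrated with a minimal sketch. It rests on several assumptions: each feature visualization is represented as a flat list of intensities in [0, 1], and SSIM is computed over a single global window rather than the usual sliding local windows. The names `global_ssim` and `pairwise_ssim` are hypothetical, and how the cut curve itself is derived from the matrix is not reproduced here.

```python
def global_ssim(x, y, dynamic_range=1.0):
    """Single-window SSIM between two equally sized images (flat lists of floats).

    Simplification: standard SSIM averages this quantity over local windows.
    """
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and of equal size")
    n = len(x)
    c1 = (0.01 * dynamic_range) ** 2  # stabilising constants from the SSIM definition
    c2 = (0.03 * dynamic_range) ** 2
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) * (a - mean_x) for a in x) / n
    var_y = sum((b - mean_y) * (b - mean_y) for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x * mean_x + mean_y * mean_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


def pairwise_ssim(images):
    """Symmetric matrix whose entry (i, j) is the SSIM of images i and j."""
    n = len(images)
    return [[global_ssim(images[i], images[j]) for j in range(n)] for i in range(n)]


# Hypothetical stand-ins for visualizations taken at three depths.
visualizations = [
    [0.0, 0.5, 1.0, 0.5],
    [0.1, 0.5, 0.9, 0.5],
    [1.0, 0.5, 0.0, 0.5],
]
matrix = pairwise_ssim(visualizations)
```

In this toy input the first two visualizations are similar and the third is an inverted copy of the first, so the corresponding off-diagonal entry is negative.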

#### 2022

Testing software is considered to be one of the most crucial phases in the software development life cycle. Software bug fixing requires a significant amount of time and effort. A rich body of recent research has explored ways to predict bugs in software artifacts using machine learning-based techniques. For a reliable and trustworthy prediction, it is crucial to also consider the explainability aspects of such machine learning models. In this paper, we show how feature transformation techniques can significantly improve the prediction accuracy and build confidence in bug prediction models. We propose a novel approach for improved bug prediction that first extracts the features, then finds a weighted transformation of these features using a genetic algorithm that best separates bugs from non-bugs when plotted in a low-dimensional space, and finally trains the machine learning model using the transformed dataset. In our experiment with real-life bug datasets, the random forest and k-nearest neighbor classifier models that leveraged feature transformation showed a 4.25% improvement in recall on average over 8 software systems when compared to the models built on the original data.

Let M be a two-dimensional table with each cell weighted by a positive number. A StreamTable visualization of M represents the columns as non-overlapping vertical streams and the rows as horizontal stripes such that the intersection between a stream and a stripe is a rectangle with area equal to the weight of the corresponding cell. To avoid large wiggles in the streams, it is desirable to keep consecutive cells in a stream adjacent. Let B be the smallest axis-aligned bounding box containing the StreamTable. Then the difference between the area of B and the sum of the weights is referred to as the excess area. We attempt to optimize various StreamTable aesthetics (e.g., minimizing excess area, or maximizing cell adjacencies in streams). If the row permutation is fixed and the row heights are given, then we give an O(rc)-time algorithm to optimize these aesthetics, where r and c are the number of rows and columns, respectively. If the row permutation is fixed but the row heights can be chosen, then we discuss a technique to compute an aesthetic (but not necessarily optimal) StreamTable by solving a quadratically-constrained quadratic program, followed by iterative improvements. If the row heights are restricted to be integers, then we prove the problem to be NP-hard. If the row permutations can be chosen, then we show that it is NP-hard to find a row permutation that optimizes the area or adjacency aesthetics.

Keywords: Geometric Algorithms, Table Cartogram, Streamgraphs
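The definition of excess area in the abstract above can be made concrete with a small sketch. It assumes one particular layout that the abstract does not prescribe: the cells of each row are packed left to right without gaps, and every row starts at the same left edge. The function name `excess_area` is hypothetical, and the result is the excess area of that layout only, not of an optimized StreamTable.

```python
def excess_area(weights, row_heights):
    """Excess area of a left-aligned, gap-free layout with the given row heights.

    weights[i][j] is the positive weight of cell (i, j); a cell in row i is a
    rectangle of height row_heights[i] and width weights[i][j] / row_heights[i].
    """
    if len(weights) != len(row_heights):
        raise ValueError("one height is required per row")
    # Width of each stripe when its cells are packed side by side.
    row_widths = [sum(row) / height for row, height in zip(weights, row_heights)]
    # Smallest axis-aligned bounding box of the stacked stripes.
    box_area = max(row_widths) * sum(row_heights)
    total_weight = sum(sum(row) for row in weights)
    return box_area - total_weight


# Rows of equal total weight per unit height fill the box exactly.
print(excess_area([[1, 2], [3, 3]], [1, 2]))
# Unequal stripe widths leave empty space to the right of the shorter row.
print(excess_area([[2, 2], [1, 1]], [1, 1]))
```

Choosing different row heights changes the stripe widths, which is why the variant with free row heights becomes an optimization problem.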

Multi-attribute dataset visualizations are often designed based on attribute types, i.e., whether the attributes are categorical or numerical. Parallel Sets and Parallel Coordinates are two well-known techniques to visualize categorical and numerical data, respectively. A common strategy to visualize mixed data is to use multiple information-linked views, e.g., Parallel Coordinates are often augmented with maps to explore spatial data with numeric attributes. In this paper, we design visualizations for mixed data, where the dataset may include numerical, categorical, and spatial attributes. The proposed solution, Set-stat-map, is a harmonious combination of three interactive components: Parallel Sets (visualizes sets determined by the combination of categories or numeric ranges), statistics columns (visualizes numerical summaries of the sets), and a geospatial map view (visualizes the spatial information). We augment these components with colors and textures to enhance users' capability of analyzing distributions of pairs of attribute combinations. To improve scalability, we merge the sets to limit the number of possible combinations to be rendered on the display. We demonstrate the use of Set-stat-map using two different types of datasets: a meteorological dataset and an online vacation rental dataset (Airbnb). To examine the potential of the system, we collaborated with meteorologists, which revealed both challenges and opportunities for Set-stat-map to be used for real-life visual analytics.

#### 2021

Changes in spatiotemporal data may often go unnoticed due to their inherent noise and low variability (e.g., geological processes over years). Commonly used approaches such as side-by-side contour plots and spaghetti plots do not provide a clear idea about the temporal changes in such data. We propose ContourDiff, a vector-based visualization over contour plots to visualize the trends of change across spatial regions and the temporal domain. Our approach first aggregates, for each location, its value differences from the neighboring points over the temporal domain, and then creates a vector field representing the prominent changes. Finally, it overlays the vectors along the contour paths, revealing differential trends that the contour lines experienced over time. We evaluated our visualization using real-life datasets consisting of millions of data points, where the visualizations were generated in less than a minute in a single-threaded execution. Our experimental results reveal that ContourDiff can reliably visualize the differential trends and provide a new way to explore the change pattern in spatiotemporal data.
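The first two steps described in the abstract above (aggregating neighbor differences over time, then forming a vector field) can be sketched as follows. This is only one plausible reading and not the ContourDiff algorithm itself. It assumes that the data is a regular grid, that only the four axis-aligned neighbors are used, and that the aggregate is the net change of each neighbor difference between the first and last time step. The name `change_vectors` is hypothetical.

```python
def change_vectors(frames):
    """Return a grid of (vx, vy) vectors from a time series of 2D grids.

    frames is a list of grids (lists of rows), one per time step. Each vector
    points toward neighbors whose value rose relative to the location itself;
    x follows the column index and y follows the row index.
    """
    first, last = frames[0], frames[-1]
    rows, cols = len(first), len(first[0])
    directions = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # (row step, column step)
    field = [[(0.0, 0.0)] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            vx = vy = 0.0
            for dr, dc in directions:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # Net change of the neighbor difference; summing the changes
                    # between consecutive time steps telescopes to this value.
                    before = first[nr][nc] - first[r][c]
                    after = last[nr][nc] - last[r][c]
                    change = after - before
                    vx += change * dc
                    vy += change * dr
            field[r][c] = (vx, vy)
    return field


# A single row whose rightmost value rises over time.
frames = [[[0, 0, 0]], [[0, 0, 2]], [[0, 0, 5]]]
field = change_vectors(frames)
```

Overlaying such vectors along contour paths, the final step in the abstract, would additionally require contour extraction and is left out of this sketch.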

Given an m×n table T of positive weights, and a rectangle R with an area equal to the sum of the weights, a table cartogram computes a partition of R into m×n convex quadrilateral faces such that each face has the same adjacencies as its corresponding cell in T, and has an area equal to the cell’s weight. In this paper, we examine constraint optimization-based and physics-inspired cartographic transformation approaches to produce cartograms for large tables with thousands of cells. We show that large table cartograms may provide diagrammatic representations in various real-life scenarios, e.g., for analyzing correlations between geospatial variables and creating visual effects in images. Our experiments with real-life datasets provide insights into how one approach may outperform the other in various application contexts.
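The defining conditions of a table cartogram in the abstract above can be checked mechanically. The sketch below is a validator, not a construction method: it assumes a candidate solution is given as an (m+1)×(n+1) grid of vertex coordinates, so cell adjacencies are preserved by the grid structure, and it verifies that each face is convex and has area equal to its cell's weight. The function names and the tolerance `tol` are hypothetical, and corners that are collinear are tolerated by the convexity test.

```python
def polygon_area(points):
    """Absolute area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def is_convex(points):
    """True if all turns along the quadrilateral's boundary have the same direction."""
    signs = set()
    k = len(points)
    for i in range(k):
        (ax, ay), (bx, by), (cx, cy) = points[i], points[(i + 1) % k], points[(i + 2) % k]
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross != 0:
            signs.add(cross > 0)
    return len(signs) <= 1


def is_valid_cartogram(weights, grid, tol=1e-9):
    """Check every face of the vertex grid against the weight of its table cell."""
    m, n = len(weights), len(weights[0])
    for i in range(m):
        for j in range(n):
            face = [grid[i][j], grid[i][j + 1], grid[i + 1][j + 1], grid[i + 1][j]]
            if not is_convex(face):
                return False
            if abs(polygon_area(face) - weights[i][j]) > tol:
                return False
    return True


# A 1x2 table with weights 1 and 3 inside a 4x1 rectangle.
grid = [[(0, 0), (1, 0), (4, 0)],
        [(0, 1), (1, 1), (4, 1)]]
print(is_valid_cartogram([[1, 3]], grid))
```

For tables with thousands of cells, a check like this is cheap compared with computing the cartogram, since it visits each face once.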

#### 2020

Emanation graphs of grade k, introduced by Hamedmohseni, Rahmati, and Mondal, are plane spanners made by shooting \(2^{k+1}\) rays from each given point, where the shorter rays stop the longer ones upon collision. The collision points are the Steiner points of the spanner.

Eye tracking systems can provide people with severe motor impairments a way to communicate through gaze-based interactions. Such systems transform a user's gaze input into mouse pointer coordinates that can trigger keystrokes on an on-screen keyboard. However, typing using this approach requires large back-and-forth eye movements, and the required effort depends both on the length of the text and the keyboard layout. Motivated by the idea of sketch-based image search, we explore a gaze-based approach where users draw a shape on a sketchpad using gaze input, and the shape is used to search for similar letters, words, and other predefined controls. The sketch-based approach is area efficient (compared to an on-screen keyboard), allows users to create custom commands, and creates opportunities for gaze-based authentication. Since variation in the drawn shapes makes the search difficult, the system can show a guide (e.g., a 14-segment digital display) on the sketchpad so that users can trace their desired shape. In this paper, we take a first step that investigates the feasibility of the sketch-based approach, by examining how well users can trace a given shape using gaze input. We designed an interface where participants traced a set of given shapes. We then compared the similarity of the drawn and traced shapes. Our study results show the potential of the sketch-based approach: users were able to trace shapes reasonably well using gaze input, even for complex shapes involving three letters; shape tracing accuracy for gaze was better than "free-form" hand drawing. We also report on how different shape complexities influence the time and accuracy of the shape tracing tasks.

Two vertex-labelled polygons are compatible if they have the same clockwise cyclic ordering of vertices. The definition extends to polygonal regions (polygons with holes) and to triangulations: for every face, the clockwise cyclic order of vertices on the boundary must be the same. It is known that every pair of compatible \(n\)-vertex polygonal regions can be extended to compatible triangulations by adding \(O(n^2)\) Steiner points. Furthermore, \(\Omega(n^2)\) Steiner points are sometimes necessary, even for a pair of polygons. Compatible triangulations provide piecewise linear homeomorphisms and are also a crucial first step in morphing planar graph drawings, aka "2D shape animation." An intriguing open question, first posed by Aronov, Seidel, and Souvaine in 1993, is to decide if two compatible polygons have compatible triangulations with at most \(k\) Steiner points. In this paper we prove the problem to be NP-hard for polygons with holes. The question remains open for simple polygons.
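The compatibility condition for two simple polygons in the abstract above reduces to comparing cyclic sequences of labels. The sketch below assumes that each polygon is given as a list of distinct vertex labels already listed in clockwise order; the function name `same_cyclic_order` is hypothetical. It does not address polygons with holes or the Steiner-point question.

```python
def same_cyclic_order(labels_a, labels_b):
    """True if labels_b is a rotation of labels_a (labels assumed distinct)."""
    if len(labels_a) != len(labels_b):
        return False
    if not labels_a:
        return True
    try:
        # Locate where the second polygon's starting vertex sits in the first.
        shift = labels_a.index(labels_b[0])
    except ValueError:
        return False
    return labels_a[shift:] + labels_a[:shift] == labels_b


# Same clockwise order, different starting vertex: compatible.
print(same_cyclic_order(["a", "b", "c", "d"], ["c", "d", "a", "b"]))
# Reversed orientation: not compatible.
print(same_cyclic_order(["a", "b", "c", "d"], ["a", "d", "c", "b"]))
```

Because the labels are distinct, a single lookup fixes the only possible rotation, so the check runs in time linear in the number of vertices.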

#### 2019

With the era of big data approaching, the number of software systems and their dependencies is growing, and individual systems are becoming more complex and intricate. Understanding these evolving software systems is thus a primary challenge for cost-effective software management and maintenance. In this paper, we perform a case study with evolving code clones. Programmers often need to manually analyze the co-evolution of clone fragments to decide about refactoring, tracking, and bug removal. However, manual analysis is time consuming and nearly infeasible for a large number of clones, e.g., with millions of similarity pairs, where clones are evolving over hundreds of software revisions. We propose an interactive visual analytics system, Clone-World, which leverages a big data visualization approach to manage code clones in large software systems. Clone-World gives an intuitive yet powerful solution to clone analytics problems. Clone-World combines multiple information-linked zoomable views, where users can explore and analyze clones through interactive exploration in real time. User studies and experts' reviews suggest that Clone-World may assist developers in many real-life software development and maintenance scenarios. We believe that Clone-World will ease the management and maintenance of clones, and inspire future innovation to adapt visual analytics to manage big software systems.