| Literature DB >> 31144628 |
Sugeerth Murugesan, Sana Malik, Fan Du, Eunyee Koh, Tuan Manh Lai.
Abstract
Deep learning models have become the state-of-the-art for many tasks, from text sentiment analysis to facial image recognition. However, understanding why certain models perform better than others or how one model learns differently than another is often difficult yet critical for increasing their effectiveness, improving prediction accuracy, and enabling fairness. Traditional methods for comparing models' efficacy, such as accuracy, precision, and recall provide a quantitative view of performance; however, the qualitative intricacies of why one model performs better than another are hidden. In this paper, we interview machine learning practitioners to understand their evaluation and comparison workflow. From there, we iteratively design a visual analytic approach, DeepCompare, to systematically compare the results of deep learning models, in order to provide insight into the model behavior and interactively assess tradeoffs between two such models. The tool allows users to evaluate model results, identify and compare activation patterns for misclassifications, and link the test results back to specific neurons. We conduct a preliminary evaluation through two real-world case studies to show that experts can make more informed decisions about the effectiveness of different types of models, understand in more detail the strengths and weaknesses of the models, and holistically evaluate the behavior of the models.Year: 2019 PMID: 31144628 DOI: 10.1109/MCG.2019.2919033
Source DB: PubMed Journal: IEEE Comput Graph Appl ISSN: 0272-1716 Impact factor: 2.088