
Critiquing Protein Family Classification Models Using Sufficient Input Subsets.

Brandon Carter (1,2), Maxwell Bileschi (2), Jamie Smith (2), Theo Sanderson (2), Drew Bryant (2), David Belanger (2), Lucy J Colwell (2,3).

Abstract

In many application domains, neural networks are highly accurate and have been deployed at large scale. However, users often do not have good tools for understanding how these models arrive at their predictions. This has hindered adoption in fields such as the life and medical sciences, where researchers require that models base their decisions on underlying biological phenomena rather than peculiarities of the dataset. We propose a set of methods for critiquing deep learning models and demonstrate their application for protein family classification, a task for which high-accuracy models have considerable potential impact. Our methods extend the Sufficient Input Subsets (SIS) technique, which we use to identify subsets of features in each protein sequence that are alone sufficient for classification. Our suite of tools analyzes these subsets to shed light on the decision-making criteria employed by models trained on this task. These tools show that while deep models may perform classification for biologically relevant reasons, their behavior varies considerably across the choice of network architecture and parameter initialization. While the techniques that we develop are specific to the protein sequence classification task, the approach taken generalizes to a broad set of scientific contexts in which model interpretability is essential.
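The SIS idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes a generic scoring function `f` that returns the model's confidence in its predicted class for a (partially masked) sequence, a hypothetical mask token `X` for hidden residues, and a confidence threshold `tau`. It follows the backward-selection recipe of the original SIS method as we understand it: rank positions by greedily masking the one whose removal lowers the score least, then reveal positions from the most important end until the score reaches `tau`. The names `backward_selection`, `sufficient_input_subset` and `toy_score` are illustrative, and this is not the authors' implementation.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical mask token for a hidden residue; the masking scheme used in the
# paper may differ (this is an assumption for illustration).
MASK = "X"

Scorer = Callable[[List[str]], float]


def backward_selection(f: Scorer, x: Sequence[str], mask: str = MASK) -> List[int]:
    """Greedily mask, one at a time, the position whose removal keeps f highest.

    Returns positions in the order they were masked, so the last entry is the
    position the model relies on most.
    """
    current = list(x)
    remaining = list(range(len(current)))
    order: List[int] = []
    while remaining:
        best_i, best_score = remaining[0], float("-inf")
        for i in remaining:
            trial = current.copy()
            trial[i] = mask
            score = f(trial)
            if score > best_score:  # ties keep the earliest position
                best_i, best_score = i, score
        current[best_i] = mask
        remaining.remove(best_i)
        order.append(best_i)
    return order


def sufficient_input_subset(
    f: Scorer, x: Sequence[str], tau: float, mask: str = MASK
) -> Optional[List[int]]:
    """Return positions that alone (all others masked) give f >= tau, or None."""
    if f(list(x)) < tau:
        return None  # the full input is not confidently classified
    order = backward_selection(f, x, mask)
    masked = [mask] * len(x)
    subset: List[int] = []
    # Reveal positions from most to least important until the score reaches tau.
    for i in reversed(order):
        subset.append(i)
        masked[i] = x[i]
        if f(masked) >= tau:
            return sorted(subset)
    return None


# Toy "classifier" (hypothetical): full confidence only when both a C and an H
# residue remain unmasked.
def toy_score(seq: List[str]) -> float:
    return 0.5 * ("C" in seq) + 0.5 * ("H" in seq)


print(sufficient_input_subset(toy_score, "ACDHG", tau=0.9))  # [1, 3]
```

On the toy input `ACDHG`, the sketch returns positions 1 and 3 (the C and the H), which alone keep the score above `tau`. Because backward selection is greedy, the subset is not guaranteed to be globally minimal, and it costs on the order of n² model evaluations for a sequence of length n. With a real model, `f` would be the trained network's probability for its predicted protein family.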

Keywords:  interpretability; machine learning; model selection; neural networks; protein classification; protein domain

Year: 2019
PMID: 31874057
DOI: 10.1089/cmb.2019.0339
Source DB: PubMed
Journal: J Comput Biol
ISSN: 1066-5277
Impact factor: 1.479


