
Interpreting Neural Networks for Biological Sequences by Learning Stochastic Masks.

Johannes Linder, Alyssa La Fleur, Zibo Chen, Ajasja Ljubetič, David Baker, Sreeram Kannan, Georg Seelig.

Abstract

Sequence-based neural networks can learn to make accurate predictions from large biological datasets, but model interpretation remains challenging. Many existing feature attribution methods are optimized for continuous rather than discrete input patterns and assess individual feature importance in isolation, making them ill-suited for interpreting non-linear interactions in molecular sequences. Building on work in computer vision and natural language processing, we developed a deep-learning approach, Scrambler networks, in which the most salient sequence positions are identified with learned input masks. Scramblers learn to predict Position-Specific Scoring Matrices (PSSMs) where unimportant nucleotides or residues are scrambled by raising their entropy. We apply Scramblers to interpret the effects of genetic variants, uncover non-linear interactions between cis-regulatory elements, explain binding specificity for protein-protein interactions, and identify structural determinants of de novo designed proteins. We show that Scramblers enable efficient attribution across large datasets and result in high-quality explanations, often outperforming state-of-the-art methods.
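
The masking idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`scrambled_pssm`, `entropy_bits`, `sample`), the uniform background, the hand-picked importance scores, and the specific softmax form used to mix the input with the background are all assumptions made for illustration. In the paper the scores are predicted by a trained network; here they are fixed by hand to show how a low score raises the entropy at a position while a high score preserves the original letter.

```python
import numpy as np

ALPHABET = "ACGT"  # illustrative DNA alphabet (assumption)

def one_hot(seq):
    """Encode a sequence as an (L, 4) one-hot matrix."""
    idx = [ALPHABET.index(c) for c in seq]
    return np.eye(len(ALPHABET))[idx]

def scrambled_pssm(x, scores, background):
    """Mix a one-hot input with a background distribution (hypothetical form).

    scores[i] == 0 yields the background at position i (fully scrambled);
    a large scores[i] pushes position i back toward the original letter.
    """
    logits = np.log(background)[None, :] + scores[:, None] * x
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def entropy_bits(pssm):
    """Per-position Shannon entropy in bits."""
    return -(pssm * np.log2(pssm)).sum(axis=1)

def sample(pssm, rng):
    """Draw one sequence from the PSSM, position by position."""
    idx = [rng.choice(len(ALPHABET), p=row) for row in pssm]
    return "".join(ALPHABET[i] for i in idx)

seq = "ACGTAC"
x = one_hot(seq)
background = np.full(4, 0.25)                       # uniform background (assumption)
scores = np.array([0.0, 0.0, 8.0, 8.0, 0.0, 0.0])   # hand-picked, not learned

pssm = scrambled_pssm(x, scores, background)
ent = entropy_bits(pssm)

rng = np.random.default_rng(0)
samples = [sample(pssm, rng) for _ in range(5)]
print(np.round(ent, 3))
print(samples)
```

Positions with a score of zero reach the maximum entropy of 2 bits for a four-letter alphabet, while the two high-scoring positions stay close to 0 bits, so sampled sequences vary everywhere except at the positions marked as important.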


Year:  2022        PMID: 35966405      PMCID: PMC9373874          DOI: 10.1038/s42256-021-00428-6

Source DB:  PubMed          Journal:  Nat Mach Intell        ISSN: 2522-5839


Journal:  Nat Mach Intell       Date:  2022-01-25