Interpretable detection of novel human viruses from genome sequencing data.

Jakub M Bartoszewicz, Anja Seidel, Bernhard Y Renard.

Abstract

Viruses evolve extremely quickly, so reliable methods for viral host prediction are necessary to safeguard biosecurity and biosafety alike. Novel human-infecting viruses are difficult to detect with standard bioinformatics workflows. Here, we predict whether a virus can infect humans directly from next-generation sequencing (NGS) reads. We show that deep neural architectures significantly outperform both shallow machine learning and standard, homology-based algorithms, cutting the error rates in half and generalizing to taxonomic units distant from those presented during training. Further, we develop a suite of interpretability tools and show that it can also be applied to models beyond the host prediction task. We propose a new approach for convolutional filter visualization to disentangle the information content of each nucleotide from its contribution to the final classification decision. Nucleotide-resolution maps of the learned associations between pathogen genomes and the infectious phenotype can be used to detect regions of interest in novel agents, for example, the SARS-CoV-2 coronavirus, which was unknown before it caused the COVID-19 pandemic in 2020. All methods presented here are implemented as easy-to-install packages that not only enable analysis of NGS datasets without requiring any deep learning skills but also allow advanced users to easily train and explain new models for genomics.
© The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
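
As a rough illustration of two ingredients the abstract names (sequencing reads as model input, and the per-nucleotide information content used in filter visualization), the following minimal Python sketch one-hot encodes a read and computes the standard sequence-logo information content (2 bits minus the Shannon entropy) at each position of a set of equal-length sequences. This is not the authors' implementation: the function names `one_hot` and `information_content` are hypothetical, and the paper's packages may compute these quantities differently.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def one_hot(read):
    """Encode a DNA read as rows of 4 floats (A, C, G, T).

    Ambiguous symbols such as N become all-zero rows (an assumption
    made for this sketch, not a documented choice of the paper).
    """
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET]
            for base in read.upper()]


def information_content(seqs):
    """Per-position information content in bits for equal-length DNA sequences.

    Uses the standard sequence-logo definition: 2 - H, where H is the
    Shannon entropy of the observed A/C/G/T frequencies at that position.
    Positions with no unambiguous nucleotide get 0.0.
    """
    result = []
    for i in range(len(seqs[0])):
        counts = Counter(seq[i].upper() for seq in seqs)
        total = sum(counts[b] for b in ALPHABET)
        if total == 0:
            result.append(0.0)
            continue
        entropy = 0.0
        for b in ALPHABET:
            p = counts[b] / total
            if p > 0:
                entropy -= p * math.log2(p)
        result.append(2.0 - entropy)
    return result
```

A fully conserved position yields 2 bits and a position where all four nucleotides are equally frequent yields 0 bits. Information content alone says nothing about whether a nucleotide pushes the classifier toward or away from the "human-infecting" class, which is why the abstract treats contribution as a separate quantity.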

Year:  2021        PMID: 33554119      PMCID: PMC7849996          DOI: 10.1093/nargab/lqab004

Source DB:  PubMed          Journal:  NAR Genom Bioinform        ISSN: 2631-9268


  63 in total

1.  Editorial commentary: Unbiased next-generation sequencing and new pathogen discovery: undeniable advantages and still-existing drawbacks.

Authors:  Arianna Calistri; Giorgio Palù
Journal:  Clin Infect Dis       Date:  2015-01-07       Impact factor: 9.079

2.  DEEP MOTIF DASHBOARD: VISUALIZING AND UNDERSTANDING GENOMIC SEQUENCES USING DEEP NEURAL NETWORKS.

Authors:  Jack Lanchantin; Ritambhara Singh; Beilun Wang; Yanjun Qi
Journal:  Pac Symp Biocomput       Date:  2017

3.  Unified rational protein engineering with sequence-based deep representation learning.

Authors:  Ethan C Alley; Grigory Khimulya; Surojit Biswas; Mohammed AlQuraishi; George M Church
Journal:  Nat Methods       Date:  2019-10-21       Impact factor: 28.547

4.  On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation.

Authors:  Sebastian Bach; Alexander Binder; Grégoire Montavon; Frederick Klauschen; Klaus-Robert Müller; Wojciech Samek
Journal:  PLoS One       Date:  2015-07-10       Impact factor: 3.240

5.  Convolutional neural network architectures for predicting DNA-protein binding.

Authors:  Haoyang Zeng; Matthew D Edwards; Ge Liu; David K Gifford
Journal:  Bioinformatics       Date:  2016-06-15       Impact factor: 6.937

6.  Enhanced Integrated Gradients: improving interpretability of deep learning models using splicing codes as a case study.

Authors:  Anupama Jha; Joseph K Aicher; Matthew R Gazzara; Deependra Singh; Yoseph Barash
Journal:  Genome Biol       Date:  2020-06-19       Impact factor: 13.583

7.  Host Taxon Predictor - A Tool for Predicting Taxon of the Host of a Newly Discovered Virus.

Authors:  Wojciech Gałan; Maciej Bąk; Małgorzata Jakubowska
Journal:  Sci Rep       Date:  2019-03-05       Impact factor: 4.379

8.  Next Steps for Access to Safe, Secure DNA Synthesis.

Authors:  James Diggans; Emily Leproust
Journal:  Front Bioeng Biotechnol       Date:  2019-04-24

9.  Integrating regulatory DNA sequence and gene expression to predict genome-wide chromatin accessibility across cellular contexts.

Authors:  Surag Nair; Daniel S Kim; Jacob Perricone; Anshul Kundaje
Journal:  Bioinformatics       Date:  2019-07-15       Impact factor: 6.937

10.  A new coronavirus associated with human respiratory disease in China.

Authors:  Fan Wu; Su Zhao; Bin Yu; Yan-Mei Chen; Wen Wang; Zhi-Gang Song; Yi Hu; Zhao-Wu Tao; Jun-Hua Tian; Yuan-Yuan Pei; Ming-Li Yuan; Yu-Ling Zhang; Fa-Hui Dai; Yi Liu; Qi-Min Wang; Jiao-Jiao Zheng; Lin Xu; Edward C Holmes; Yong-Zhen Zhang
Journal:  Nature       Date:  2020-02-03       Impact factor: 49.962

  8 in total

Review 1.  The science of the host-virus network.

Authors:  Gregory F Albery; Daniel J Becker; Liam Brierley; Cara E Brook; Rebecca C Christofferson; Lily E Cohen; Tad A Dallas; Evan A Eskew; Anna Fagre; Maxwell J Farrell; Emma Glennon; Sarah Guth; Maxwell B Joseph; Nardus Mollentze; Benjamin A Neely; Timothée Poisot; Angela L Rasmussen; Sadie J Ryan; Stephanie Seifert; Anna R Sjodin; Erin M Sorrell; Colin J Carlson
Journal:  Nat Microbiol       Date:  2021-11-24       Impact factor: 30.964

2.  AMAISE: a machine learning approach to index-free sequence enrichment.

Authors:  Meera Krishnamoorthy; Piyush Ranjan; John R Erb-Downward; Robert P Dickson; Jenna Wiens
Journal:  Commun Biol       Date:  2022-06-09

3.  Explainable deep neural networks for novel viral genome prediction.

Authors:  Chandra Mohan Dasari; Raju Bhukya
Journal:  Appl Intell (Dordr)       Date:  2021-06-25       Impact factor: 5.019

4.  Characterizing and Evaluating the Zoonotic Potential of Novel Viruses Discovered in Vampire Bats.

Authors:  Laura M Bergner; Nardus Mollentze; Richard J Orton; Carlos Tello; Alice Broos; Roman Biek; Daniel G Streicker
Journal:  Viruses       Date:  2021-02-06       Impact factor: 5.048

5.  Predicting the animal hosts of coronaviruses from compositional biases of spike protein and whole genome sequences through machine learning.

Authors:  Liam Brierley; Anna Fowler
Journal:  PLoS Pathog       Date:  2021-04-20       Impact factor: 6.823

6.  Correcting the Estimation of Viral Taxa Distributions in Next-Generation Sequencing Data after Applying Artificial Neural Networks.

Authors:  Moritz Kohls; Magdalena Kircher; Jessica Krepel; Pamela Liebig; Klaus Jung
Journal:  Genes (Basel)       Date:  2021-10-31       Impact factor: 4.096

Review 7.  Chaos game representation and its applications in bioinformatics.

Authors:  Hannah Franziska Löchel; Dominik Heider
Journal:  Comput Struct Biotechnol J       Date:  2021-11-10       Impact factor: 7.271

8.  Identifying and prioritizing potential human-infecting viruses from their genome sequences.

Authors:  Nardus Mollentze; Simon A Babayan; Daniel G Streicker
Journal:  PLoS Biol       Date:  2021-09-28       Impact factor: 8.029
