Taxonomic classification of DNA sequences beyond sequence similarity using deep neural networks.

Florian Mock¹, Fleming Kretschmer², Anton Kriese³, Sebastian Böcker², Manja Marz¹,⁴,⁵,⁶

Abstract

Taxonomic classification, that is, the assignment to biological clades with shared ancestry, is a common task in genetics, mainly based on a genome similarity search of large genome databases. The classification quality depends heavily on the database, since representative relatives must be present. Many genomic sequences cannot be classified at all, or only with a high misclassification rate. Here we present BERTax, a deep neural network program based on natural language processing to precisely classify DNA sequences taxonomically at the superkingdom and phylum levels, without the need for a known representative relative from a database. We show BERTax to be at least on par with the state-of-the-art approaches when taxonomically similar species are part of the training data. For novel organisms, however, BERTax clearly outperforms any existing approach. Finally, we show that BERTax can also be combined with database approaches to further increase the prediction quality in almost all cases. Since BERTax is not based on similar entries in databases, it allows precise taxonomic classification of a broader range of genomic sequences, thus increasing the overall information gain.
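The abstract's point that similarity-based classification fails when no representative relative is in the database can be illustrated with a toy nearest-neighbour classifier over shared k-mers. This is a minimal sketch under stated assumptions, not any specific published tool and not BERTax: the function names `kmers`, `jaccard` and `classify_by_similarity`, the k-mer size of 8, the similarity cutoff of 0.2, and the reference sequences and placeholder labels are all hypothetical choices made for illustration.

```python
from typing import Dict, Optional, Set, Tuple

Taxon = Tuple[str, str]  # (superkingdom, phylum)


def kmers(seq: str, k: int = 8) -> Set[str]:
    """Return the set of overlapping k-mers of a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_by_similarity(
    query: str,
    reference: Dict[str, Taxon],
    k: int = 8,                   # hypothetical k-mer size
    min_similarity: float = 0.2,  # hypothetical cutoff for a "close relative"
) -> Optional[Taxon]:
    """Return the taxon of the most similar reference sequence.

    Returns None ("unclassified") when no reference reaches min_similarity,
    i.e. when the database holds no sufficiently close relative.
    """
    query_kmers = kmers(query, k)
    best_taxon: Optional[Taxon] = None
    best_score = 0.0
    for ref_seq, taxon in reference.items():
        score = jaccard(query_kmers, kmers(ref_seq, k))
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon if best_score >= min_similarity else None


# Hypothetical toy database: made-up sequences with placeholder labels.
TOY_REFERENCE: Dict[str, Taxon] = {
    "ATGGCGTACGTTAGCCGATAGGCTTACGATCG": ("superkingdom_A", "phylum_A1"),
    "GGGCCCAAATTTGGGCCCAAATTTCCCGGGAA": ("superkingdom_B", "phylum_B1"),
}

print(classify_by_similarity("ATGGCGTACGTTAGCCGATAGGCTTACGATCG", TOY_REFERENCE))
# -> ('superkingdom_A', 'phylum_A1')
print(classify_by_similarity("TTTTTTTTTTTTTTTT", TOY_REFERENCE))
# -> None
```

A query with no sufficiently similar reference comes back as `None`, which mirrors the "cannot be classified at all" case in the abstract; BERTax is described as not depending on such database hits.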

Entities: Chemical

Keywords: deep learning; metagenome; taxonomic classification

Substances: DNA

Year: 2022; PMID: 36018838; PMCID: PMC9436379; DOI: 10.1073/pnas.2122636119

Source DB: PubMed; Journal: Proc Natl Acad Sci U S A; ISSN: 0027-8424; Impact factor: 12.779

Date: 2022-08-26