Comparison of statistical methods to classify environmental genomic fragments.

Gail L Rosen, Steven D Essinger.

Abstract

"Binning" (or taxonomic classification) of DNA sequence reads is an initial step in analyzing an environmental biological sample. Currently, a homology-based tool, BLAST, is one of the most commonly used tools to label DNA reads, but it is argued that BLAST will quickly lose its classification ability as the genome databases grow. In this paper, we compare the accuracies of a naïve Bayes classifier (NBC) and a statistical language model to that of BLAST for binning reads and demonstrate that NBC obtains good performance at low computational cost. On the other hand, the back-off n-gram language model can improve accuracy when only partial training data is available (such as in-progress sequencing projects). NBC demonstrates comparable performance to BLAST and can also be optimized on partial training datasets by adjusting the word feature size. A fivefold cross-validation is conducted to compare each method's accuracy in classifying novel genomes at different taxonomic levels, with NBC outperforming BLAST for species-level classification but BLAST outperforming NBC for genus-level and phylum-level classification. In conclusion, the NBC is a competitive taxonomic classifier, and language models can improve performance when only partial training data is available.
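To make the idea of a naïve Bayes classifier over "word" features concrete, the following is a minimal sketch, not the authors' implementation. It assumes overlapping k-mers as words, add-one (Laplace) smoothing, a uniform prior over genomes, and sequences containing only A, C, G and T. The function names (`kmers`, `train`, `classify`) and the toy genomes are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping word (k-mer) of length k in seq."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def train(genomes, k):
    """Build per-genome log-probabilities of k-mers.

    genomes maps a label to a training sequence. Add-one smoothing over
    the 4**k possible words is an assumption of this sketch.
    """
    vocab_size = 4 ** k
    model = {}
    for label, seq in genomes.items():
        counts = Counter(kmers(seq, k))
        denom = sum(counts.values()) + vocab_size
        seen = {w: math.log((c + 1) / denom) for w, c in counts.items()}
        unseen = math.log(1 / denom)  # log-probability of a word never observed
        model[label] = (seen, unseen)
    return model


def classify(read, model, k):
    """Return the label whose word model gives the read the highest log-likelihood."""
    best_label, best_score = None, -math.inf
    for label, (seen, unseen) in model.items():
        # Naive Bayes: words are treated as independent, so log-probabilities add.
        score = sum(seen.get(w, unseen) for w in kmers(read, k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up training sequences (hypothetical data, not from the paper).
toy_genomes = {
    "genome_A": "ACGTACGGTACC" * 20,
    "genome_B": "TTGACCATGCAA" * 20,
}
K = 3  # word feature size; the abstract notes this can be tuned
model = train(toy_genomes, K)
predicted = classify("ACGGTACCACGT", model, K)
```

Changing `K` alters the word feature size, which is the parameter the abstract describes adjusting when only partial training data is available.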

Year: 2010    PMID: 20876033    DOI: 10.1109/TNB.2010.2081375

Source DB: PubMed    Journal: IEEE Trans Nanobioscience    ISSN: 1536-1241    Impact factor: 2.935

