Higher-order Markov models for metagenomic sequence classification.



Abstract

MOTIVATION: Alignment-free, stochastic models derived from k-mer distributions representing reference genome sequences have a rich history in the classification of DNA sequences. In particular, variants of Markov models have been used extensively. Higher-order Markov models have been used with caution, and perhaps sparingly, primarily because training data and computational power were insufficient. Advances in sequencing technology and computation have made it possible to exploit the predictive power of higher-order models. We therefore revisited higher-order Markov models and assessed their performance in classifying metagenomic sequences.
RESULTS: Comparative assessment of higher-order models (HOMs, 9th order or higher) with the interpolated Markov model, the interpolated context model and lower-order models (8th order or lower) was performed on metagenomic datasets constructed using sequenced prokaryotic genomes. Our results show that HOMs outperform other models in classifying metagenomic fragments as short as 100 nt at all taxonomic ranks, and at lower ranks when the fragment size was increased to 250 nt. HOMs were also found to be significantly more accurate than local alignment, which is widely relied upon for taxonomic classification of metagenomic sequences. A novel software implementation written in C++ performs classification faster than the existing Markovian metagenomic classifiers and can therefore be used as a standalone classifier or in conjunction with existing taxonomic classifiers for more robust classification of metagenomic sequences.
AVAILABILITY AND IMPLEMENTATION: The software has been made available at https://github.com/djburks/SMM.
CONTACT: Rajeev.Azad@unt.edu.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
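
The general idea of classifying a fragment with fixed-order Markov models trained on reference genomes can be illustrated with a minimal Python sketch. This is not the SMM implementation described in the abstract: the function names, the add-one pseudocount smoothing, the toy sequences and the low model order (3 rather than 9 or higher) are all assumptions made for illustration only.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(genome, order):
    """Count context -> next-base transitions for a fixed-order Markov model."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(genome) - order):
        context = genome[i:i + order]
        nxt = genome[i + order]
        # Skip windows containing ambiguous bases such as N.
        if nxt in ALPHABET and all(c in ALPHABET for c in context):
            counts[context][nxt] += 1
    return counts


def log_likelihood(fragment, counts, order, pseudocount=1.0):
    """Log-probability of a fragment under one model (initial context ignored).

    The pseudocount is an assumed smoothing choice so that unseen
    transitions do not produce a probability of zero.
    """
    total = 0.0
    for i in range(len(fragment) - order):
        context = fragment[i:i + order]
        nxt = fragment[i + order]
        row = counts.get(context, {})
        numerator = row.get(nxt, 0) + pseudocount
        denominator = sum(row.values()) + pseudocount * len(ALPHABET)
        total += math.log(numerator / denominator)
    return total


def classify(fragment, models, order):
    """Return the name of the model that assigns the fragment the highest score."""
    return max(models, key=lambda name: log_likelihood(fragment, models[name], order))


# Toy usage with hypothetical "genomes"; real use would train on reference genomes.
ORDER = 3
models = {
    "taxon_a": train_markov("ACGT" * 50, ORDER),
    "taxon_b": train_markov("AATT" * 50, ORDER),
}
predicted = classify("ACGTACGTACGT", models, ORDER)
```

The number of possible contexts grows as 4^order, so higher orders need far more training sequence to estimate each transition reliably, which is consistent with the abstract's remark that such models were long limited by training data and computational power.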

Year: 2020 PMID: 32516355 DOI: 10.1093/bioinformatics/btaa562

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited

1 in total

1. Optimized splitting of mixed-species RNA sequencing data.

Authors: Xuan Song; Hai Yun Gao; Karl Herrup; Ronald P Hart
Journal: J Bioinform Comput Biol Date: 2022-01-06 Impact factor: 1.204
