Filippo Utro, Niina Haiminen, Enrico Siragusa, Laura-Jayne Gardiner, Ed Seabolt, Ritesh Krishna, James H Kaufman, Laxmi Parida.
Abstract
Increasingly available microbial reference data allow interpreting the composition and function of previously uncharacterized microbial communities in detail, via high-throughput sequencing analysis. However, efficient methods for read classification are required when the best database matches for short sequence reads are often shared among multiple reference sequences. Here, we take advantage of the fact that microbial sequences can be annotated relative to established tree structures, and we develop a highly scalable read classifier, PRROMenade, by enhancing the generalized Burrows-Wheeler transform with a labeling step to directly assign reads to the corresponding lowest taxonomic unit in an annotation tree. PRROMenade solves the multi-matching problem while allowing fast variable-size sequence classification for phylogenetic or functional annotation. In our simulations with 5% added differences from the reference, PRROMenade functional classification had an error rate of only 1.5%. On metatranscriptomic data, PRROMenade highlighted biologically relevant functional pathways related to diet-induced changes in the human gut microbiome.
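The multi-matching idea in the abstract can be illustrated with a minimal sketch: when a read matches several reference sequences, it is assigned to the lowest node of the annotation tree that covers all of them. This is not the PRROMenade implementation. The tree, the reference sequences, and the function names below are hypothetical, and a naive substring search stands in for the labeled Burrows-Wheeler index described above.

```python
from typing import Dict, List, Optional

# Hypothetical toy annotation tree, stored as child -> parent (root has no parent).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "metabolism": "root",
    "carbohydrate": "metabolism",
    "lipid": "metabolism",
    "glycolysis": "carbohydrate",
    "tca_cycle": "carbohydrate",
}

# Hypothetical reference protein sequences, each labeled with one tree node.
REFERENCES: Dict[str, str] = {
    "MKTAYIAKQR": "glycolysis",
    "GGTAYIAKLL": "tca_cycle",
    "PPLVNWERTY": "lipid",
}


def path_to_root(node: str) -> List[str]:
    """Return the nodes from `node` up to the root, deepest first."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def lowest_common_ancestor(nodes: List[str]) -> str:
    """Return the deepest tree node that is an ancestor of every input node."""
    common = path_to_root(nodes[0])
    for node in nodes[1:]:
        ancestors = set(path_to_root(node))
        # Keep the deepest-first order while dropping non-shared nodes.
        common = [n for n in common if n in ancestors]
    return common[0]


def classify(read: str) -> Optional[str]:
    """Assign a read to the lowest node covering all references it matches.

    Assumption: exact substring search replaces the indexed lookup.
    """
    hits = [label for seq, label in REFERENCES.items() if read in seq]
    if not hits:
        return None
    return lowest_common_ancestor(hits)


if __name__ == "__main__":
    for read in ["MKTAY", "TAYIAK", "Y", "XXXX"]:
        print(read, "->", classify(read))
```

In this toy setup a read unique to one reference keeps that reference's label, while a read shared by the two carbohydrate references is assigned to their common parent. An index that stores such labels ahead of time would avoid recomputing the ancestor for every query, which is the role the abstract gives to the labeling step.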
Keywords: Bioinformatics; Microbial Genetics; Microbiology
Year: 2020 PMID: 32248063 PMCID: PMC7125348 DOI: 10.1016/j.isci.2020.100988
Source DB: PubMed Journal: iScience ISSN: 2589-0042
Figure 1. Distribution of Maximal Exact Match Lengths
Distribution of maximal exact match (MEM) lengths on experimental metatranscriptomic data (5–33 amino acids, AA) against the OMXWare database. The average MEM length is 8.3 AA.
Figure 2. Clustering of Functional Profiles
Clustering of metatranscriptome PRROMenade profiles for plant- (green) and animal-based (red) diet samples at various levels of functional hierarchy. The vegetarian subject (S6) samples are denoted with a lighter shade.
Figure 3. Clustering of Differentiating Functions
Clustering of the top 30 differentially abundant functions at level 4 (columns) and of samples (rows), colored by RoDEO projected values; the left cluster shows functions enriched in the animal-based diet and the right cluster those enriched in the plant-based diet.