MetaCache: context-aware classification of metagenomic reads using minhashing.

André Müller, Christian Hundt, Andreas Hildebrandt, Thomas Hankeln, Bertil Schmidt.

Abstract

MOTIVATION: Metagenomic shotgun sequencing studies are becoming increasingly popular, with prominent examples including the sequencing of human microbiomes and diverse environments. A fundamental computational problem in this context is read classification, i.e. the assignment of each read to a taxonomic label. Because modern high-throughput sequencing technologies produce large numbers of reads and the number of available reference genomes is increasing rapidly, the corresponding software tools suffer from long runtimes, large memory requirements or low accuracy.
RESULTS: We introduce MetaCache, a novel software tool for read classification using the big data technique minhashing. Our approach performs context-aware classification of reads by computing representative subsamples of k-mers within both the probed reads and locally constrained regions of the reference genomes. As a result, MetaCache consumes significantly less memory than the state-of-the-art read classifiers Kraken and CLARK while achieving highly competitive sensitivity and precision at comparable speed. For example, using NCBI RefSeq draft and completed genomes with a total length of around 140 billion bases as reference, MetaCache's database consumes only 62 GB of memory, while both Kraken and CLARK fail to construct their respective databases on a workstation with 512 GB RAM. Our experimental results further show that classification accuracy improves continuously as the amount of reference genome data used increases.
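The idea described above, subsampling k-mers from reads and from local windows of the reference genomes, can be illustrated with a minimal Python sketch. It is not MetaCache's implementation (which is written in C++): the hash function, the parameter values (k = 16, sketch size 16, window length 128), the function names and the toy genomes are all assumptions chosen for illustration, and steps such as canonical k-mers, merging hits from neighbouring windows and taxonomic label resolution are omitted.

```python
import hashlib
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (nothing if seq is shorter than k)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer):
    """Deterministic 64-bit hash; an illustrative choice, not MetaCache's."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k=16, s=16):
    """Bottom-s minhash sketch: the s smallest distinct k-mer hash values."""
    return sorted({hash_kmer(km) for km in kmers(seq, k)})[:s]


def build_index(references, k=16, s=16, window=128):
    """Map each sketch feature to the (genome, window) locations it came from."""
    index = defaultdict(set)
    # Consecutive windows overlap by k-1 bases, so every k-mer lies in a window.
    stride = window - k + 1
    for name, genome in references.items():
        for w, start in enumerate(range(0, len(genome), stride)):
            for feature in sketch(genome[start:start + window], k, s):
                index[feature].add((name, w))
    return index


def classify(read, index, k=16, s=16):
    """Return the genome whose window shares the most features with the read."""
    hits = Counter()
    for feature in sketch(read, k, s):
        for location in index.get(feature, ()):
            hits[location] += 1
    if not hits:
        return None  # no shared feature: the read stays unclassified
    (name, _window), _count = hits.most_common(1)[0]
    return name


def toy_genome(alphabet, length, seed):
    """Hypothetical helper: pseudo-random sequence from a small LCG."""
    state, bases = seed, []
    for _ in range(length):
        state = (state * 1103515245 + 12345) % 2**31
        bases.append(alphabet[(state >> 16) % len(alphabet)])
    return "".join(bases)


# Toy references over disjoint alphabets, so they cannot share any k-mer.
references = {
    "genome_A": toy_genome("AC", 400, seed=1),
    "genome_B": toy_genome("GT", 400, seed=2),
}
index = build_index(references)
print(classify(references["genome_A"][:128], index))
```

Because only a fixed number of hash values is stored per window, the index grows with the number of windows rather than with the number of distinct k-mers, which is the intuition behind the reduced memory footprint reported above.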
AVAILABILITY AND IMPLEMENTATION: MetaCache is open source software written in C++ and can be downloaded at http://github.com/muellan/metacache.
CONTACT: bertil.schmidt@uni-mainz.de.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

Year: 2017    PMID: 28961782    DOI: 10.1093/bioinformatics/btx520

Source DB: PubMed    Journal: Bioinformatics    ISSN: 1367-4803    Impact factor: 6.937


  7 in total

1.  MSC: a metagenomic sequence classification algorithm.

Authors:  Subrata Saha; Jethro Johnson; Soumitra Pal; George M Weinstock; Sanguthevar Rajasekaran
Journal:  Bioinformatics       Date:  2019-09-01       Impact factor: 6.937

2.  RainDrop: Rapid activation matrix computation for droplet-based single-cell RNA-seq reads.

Authors:  Stefan Niebler; André Müller; Thomas Hankeln; Bertil Schmidt
Journal:  BMC Bioinformatics       Date:  2020-07-01       Impact factor: 3.169

3.  Assembling Reads Improves Taxonomic Classification of Species.

Authors:  Quang Tran; Vinhthuy Phan
Journal:  Genes (Basel)       Date:  2020-08-17       Impact factor: 4.096

4.  LEMMI: a continuous benchmarking platform for metagenomics classifiers.

Authors:  Mathieu Seppey; Mosè Manni; Evgeny M Zdobnov
Journal:  Genome Res       Date:  2020-07-02       Impact factor: 9.043

5.  Downregulation of growth plate genes involved with the onset of femoral head separation in young broilers.

Authors:  Adriana Mércia Guaratini Ibelli; Jane de Oliveira Peixoto; Ricardo Zanella; João José de Simoni Gouveia; Maurício Egídio Cantão; Luiz Lehmann Coutinho; Jorge Augusto Petroli Marchesi; Mariane Spudeit Dal Pizzol; Débora Ester Petry Marcelino; Mônica Corrêa Ledur
Journal:  Front Physiol       Date:  2022-08-08       Impact factor: 4.755

6.  Locality-sensitive hashing enables efficient and scalable signal classification in high-throughput mass spectrometry raw data.

Authors:  Konstantin Bob; David Teschner; Thomas Kemmer; David Gomez-Zepeda; Stefan Tenzer; Bertil Schmidt; Andreas Hildebrandt
Journal:  BMC Bioinformatics       Date:  2022-07-20       Impact factor: 3.307

7.  expam: high-resolution analysis of metagenomes using distance trees.

Authors:  Sean M Solari; Remy B Young; Vanessa R Marcelino; Samuel C Forster
Journal:  Bioinformatics       Date:  2022-10-14       Impact factor: 6.931

