Spaced seeds improve k-mer-based metagenomic classification.

Karel Břinda, Maciej Sykulski, Gregory Kucherov.

Abstract

MOTIVATION: Metagenomics is a powerful approach to studying the genetic content of environmental samples, and it has been strongly promoted by next-generation sequencing technologies. To cope with the massive data involved in modern metagenomic projects, recent tools rely on the analysis of k-mers shared between the read to be classified and sampled reference genomes.
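The k-mer-sharing framework described above can be illustrated with a toy classifier. This is a minimal sketch, not the method of this paper or of any specific tool: it assumes a simple "most shared k-mers wins" vote, whereas real classifiers add taxonomy-aware scoring. The function names `kmers`, `build_index` and `classify` are hypothetical.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield all contiguous k-mers (length-k substrings) of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references, k):
    """Map each k-mer to the set of reference labels whose genome contains it."""
    index = defaultdict(set)
    for label, genome in references.items():
        for km in kmers(genome, k):
            index[km].add(label)
    return index


def classify(read, index, k):
    """Return the reference label sharing the most k-mers with the read.

    Hypothetical voting rule: each read k-mer adds one vote to every
    reference containing it. Returns None if no k-mer is shared.
    """
    hits = Counter()
    for km in kmers(read, k):
        for label in index.get(km, ()):
            hits[label] += 1
    if not hits:
        return None
    return hits.most_common(1)[0][0]
```

With two toy references, `build_index({"A": "ACGTACGTGGCA", "B": "TTGACCATGCAA"}, 4)`, the read `"ACGTGGC"` shares all four of its 4-mers with `A` and none with `B`, so `classify` returns `"A"`; a read with no shared k-mer is left unclassified (`None`).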
RESULTS: Within this general framework, we show that spaced seeds provide a significant improvement in classification accuracy over traditional contiguous k-mers. We support this thesis through a series of computational experiments, including simulations of large-scale metagenomic projects.
AVAILABILITY AND IMPLEMENTATION, SUPPLEMENTARY INFORMATION: Scripts and programs used in this study, as well as supplementary material, are available from http://github.com/gregorykucherov/spaced-seeds-for-metagenomics.
CONTACT: gregory.kucherov@univ-mlv.fr.
© The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
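Spaced seeds can be sketched as a binary mask over a window: positions marked `1` are kept in the extracted "spaced k-mer", while positions marked `0` are ignored, so a mismatch at a `0` position does not destroy the match. The sketch below is illustrative only. The helper names `spaced_kmers` and `shared_count` are hypothetical, and the mask `110110` and the sequences are arbitrary toy choices, not the seeds or data used in the paper. A contiguous k-mer is the special case of an all-ones mask.

```python
def spaced_kmers(seq, mask):
    """Yield spaced k-mers: for each window of len(mask), keep the
    characters at positions where mask has '1' ('0' = don't care)."""
    span = len(mask)
    care = [i for i, c in enumerate(mask) if c == "1"]
    for start in range(len(seq) - span + 1):
        yield "".join(seq[start + i] for i in care)


def shared_count(read, ref, mask):
    """Count read spaced k-mers that also occur in the reference."""
    ref_set = set(spaced_kmers(ref, mask))
    return sum(1 for km in spaced_kmers(read, mask) if km in ref_set)


# Toy pair: the read differs from the reference at positions 2, 5, 8 and 11.
ref = "ACGTTGCAGTCA"
read = "ACATTACAATCG"

# Both masks have weight 4 (four positions compared per k-mer).
print(shared_count(read, ref, "1111"))    # contiguous 4-mers
print(shared_count(read, ref, "110110"))  # spaced seed of span 6
```

In this contrived example every window of four consecutive positions contains a mismatch, so the contiguous 4-mers share nothing (`0`), whereas the spaced seed of the same weight still finds `3` shared spaced k-mers because the mismatches fall on its don't-care positions. The example only shows the mechanism; the accuracy gain reported above comes from the authors' experiments.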

Year:  2015        PMID: 26209798     DOI: 10.1093/bioinformatics/btv419

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  17 in total

Review 1.  A review of methods and databases for metagenomic classification and assembly.

Authors:  Florian P Breitwieser; Jennifer Lu; Steven L Salzberg
Journal:  Brief Bioinform       Date:  2019-07-19       Impact factor: 11.622

2.  Seq: A High-Performance Language for Bioinformatics.

Authors:  Ariya Shajii; Ibrahim Numanagić; Riyadh Baghdadi; Bonnie Berger; Saman Amarasinghe
Journal:  Proc ACM Program Lang       Date:  2019-10-10

3.  CMash: fast, multi-resolution estimation of k-mer-based Jaccard and containment indices.

Authors:  Shaopeng Liu; David Koslicki
Journal:  Bioinformatics       Date:  2022-06-24       Impact factor: 6.931

4.  rasbhari: Optimizing Spaced Seeds for Database Searching, Read Mapping and Alignment-Free Sequence Comparison.

Authors:  Lars Hahn; Chris-André Leimeister; Rachid Ounit; Stefano Lonardi; Burkhard Morgenstern
Journal:  PLoS Comput Biol       Date:  2016-10-19       Impact factor: 4.475

5.  Assessment of Common and Emerging Bioinformatics Pipelines for Targeted Metagenomics.

Authors:  Léa Siegwald; Hélène Touzet; Yves Lemoine; David Hot; Christophe Audebert; Ségolène Caboche
Journal:  PLoS One       Date:  2017-01-04       Impact factor: 3.240

6.  Best hits of 11110110111: model-free selection and parameter-free sensitivity calculation of spaced seeds.

Authors:  Laurent Noé
Journal:  Algorithms Mol Biol       Date:  2017-02-14       Impact factor: 1.405

7.  FSH: fast spaced seed hashing exploiting adjacent hashes.

Authors:  Samuele Girotto; Matteo Comin; Cinzia Pizzi
Journal:  Algorithms Mol Biol       Date:  2018-03-22       Impact factor: 1.405

8.  ir-HSP: Improved Recognition of Heat Shock Proteins, Their Families and Sub-types Based On g-Spaced Di-peptide Features and Support Vector Machine.

Authors:  Prabina K Meher; Tanmaya K Sahu; Shachi Gahoi; Atmakuri R Rao
Journal:  Front Genet       Date:  2018-01-11       Impact factor: 4.599

9.  PhylOligo: a package to identify contaminant or untargeted organism sequences in genome assemblies.

Authors:  Ludovic Mallet; Tristan Bitard-Feildel; Franck Cerutti; Hélène Chiapello
Journal:  Bioinformatics       Date:  2017-10-15       Impact factor: 6.937

10.  Accurate multiple alignment of distantly related genome sequences using filtered spaced word matches as anchor points.

Authors:  Chris-André Leimeister; Thomas Dencker; Burkhard Morgenstern
Journal:  Bioinformatics       Date:  2019-01-15       Impact factor: 6.937
