A coverage criterion for spaced seeds and its applications to support vector machine string kernels and k-mer distances.

Laurent Noé, Donald E K Martin.

Abstract

Spaced seeds have recently been shown not only to detect more alignments, but also to give a more accurate measure of phylogenetic distances and to provide a lower misclassification rate when used with Support Vector Machines (SVMs). We confirm the latter two results by independent experiments, and propose to use a coverage criterion to measure seed efficiency in both cases in order to design better seed patterns. We first show how this coverage criterion can be measured directly by a full automaton-based approach. We then illustrate how this criterion performs compared with two other frequently used criteria, namely the single-hit and multiple-hit criteria, through correlation coefficients with the correct classification and with the true distance. Finally, for alignment-free distances, we propose an extension that adopts the coverage criterion, show how it performs, and indicate how it can be computed efficiently.
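
As a rough illustration of how the three criteria named above differ, the following Python sketch evaluates one seed on one gapless alignment. The abstract does not define the criteria, so the definitions here are assumptions: an alignment is a binary string (`1` for a match, `0` for a mismatch), a seed hits at an offset when every `1` of the seed faces a match, and coverage is taken to be the number of alignment positions lying under a `1` of at least one hit. The function names, the seed, and the alignment are hypothetical, and the direct scan is a stand-in for, not a reproduction of, the automaton-based computation the authors describe.

```python
# Illustrative sketch only: the definitions below are assumptions,
# not taken from the abstract.

def seed_hits(seed: str, alignment: str) -> list[int]:
    """Return the offsets at which every '1' of the seed faces a match ('1')."""
    span = len(seed)
    return [
        i
        for i in range(len(alignment) - span + 1)
        if all(alignment[i + j] == "1" for j, s in enumerate(seed) if s == "1")
    ]


def coverage(seed: str, alignment: str) -> int:
    """Count alignment positions lying under a '1' of at least one seed hit."""
    covered: set[int] = set()
    for i in seed_hits(seed, alignment):
        covered.update(i + j for j, s in enumerate(seed) if s == "1")
    return len(covered)


if __name__ == "__main__":
    seed = "1101"             # hypothetical spaced seed of weight 3, span 4
    alignment = "1111011101"  # hypothetical alignment: 1 = match, 0 = mismatch
    hits = seed_hits(seed, alignment)
    print("single-hit criterion satisfied:", len(hits) >= 1)
    print("number of hits (multiple-hit view):", len(hits))
    print("coverage:", coverage(seed, alignment))
```

Under these assumed definitions, the single-hit criterion only records whether a hit exists, the multiple-hit criterion counts hits, and coverage additionally discounts positions shared by overlapping hits.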

Keywords:  alignment-free distance; coverage; spaced seeds; support vector machine

Year:  2014        PMID: 25393923      PMCID: PMC4253314          DOI: 10.1089/cmb.2014.0173

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


