
Spaced words and kmacs: fast alignment-free sequence comparison based on inexact word matches.

Sebastian Horwege, Sebastian Lindner, Marcus Boden, Klas Hatje, Martin Kollmar, Chris-André Leimeister, Burkhard Morgenstern.

Abstract

In this article, we present a user-friendly web interface for two alignment-free sequence-comparison methods that we recently developed. Most alignment-free methods rely on exact word matches to estimate pairwise similarities or distances between the input sequences. By contrast, our new algorithms are based on inexact word matches. The first approach uses the relative frequencies of so-called spaced words in the input sequences, i.e. words containing 'don't care' or 'wildcard' symbols at certain pre-defined positions. Various distance measures can then be defined on sequences based on their spaced-word composition. The second approach defines the distance between two sequences by estimating, for each position in the first sequence, the length of the longest substring starting at this position that also occurs in the second sequence with up to k mismatches. Both approaches take a set of deoxyribonucleic acid (DNA) or protein sequences as input and return a matrix of pairwise distance values that can be used as a starting point for clustering algorithms or distance-based phylogeny reconstruction. The two alignment-free programmes are accessible through a web interface at the 'Göttingen Bioinformatics Compute Server (GOBICS)', at http://spaced.gobics.de and http://kmacs.gobics.de, and the source code can be downloaded.
© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
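
The two ideas described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The pattern notation ('1' for a match position, '0' for a don't-care position), the Euclidean distance on word frequencies, and all function names are assumptions chosen for illustration. The second part uses an exact brute-force search where the abstract speaks of an estimate, and it stops at the average substring length without turning it into a distance.

```python
from collections import Counter
from math import sqrt


def spaced_word_profile(seq, pattern):
    """Relative frequencies of spaced words in seq.

    pattern is a string of '1' (match position) and '0' (don't-care
    position); this notation is an assumption made for the sketch.
    """
    care = [i for i, c in enumerate(pattern) if c == "1"]
    counts = Counter(
        "".join(seq[start + i] for i in care)
        for start in range(len(seq) - len(pattern) + 1)
    )
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def euclidean_distance(p, q):
    """One possible distance on two spaced-word frequency profiles."""
    words = set(p) | set(q)
    return sqrt(sum((p.get(w, 0.0) - q.get(w, 0.0)) ** 2 for w in words))


def distance_matrix(seqs, pattern):
    """Matrix of pairwise distances between spaced-word profiles."""
    profiles = [spaced_word_profile(s, pattern) for s in seqs]
    return [[euclidean_distance(p, q) for q in profiles] for p in profiles]


def k_mismatch_extension(a, i, b, j, k):
    """Length of the longest common extension of a[i:] and b[j:]
    that contains at most k mismatches."""
    length = mismatches = 0
    while i + length < len(a) and j + length < len(b):
        if a[i + length] != b[j + length]:
            mismatches += 1
            if mismatches > k:
                break
        length += 1
    return length


def average_k_mismatch_length(a, b, k):
    """Average, over all positions i in a, of the longest substring
    starting at i that occurs in b with up to k mismatches.

    Exact brute force over all start positions in b (an assumption of
    this sketch, practical only for short sequences).
    """
    return sum(
        max(k_mismatch_extension(a, i, b, j, k) for j in range(len(b)))
        for i in range(len(a))
    ) / len(a)
```

For example, `distance_matrix(["ACGTACGT", "ACGAACGT"], "1101")` compares two short DNA sequences by their spaced-word composition, and `average_k_mismatch_length("ACGTACGT", "ACGAACGT", 1)` gives the average length of the longest substring shared with at most one mismatch.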

Year: 2014; PMID: 24829447; PMCID: PMC4086093; DOI: 10.1093/nar/gku398

Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971


