Alignment-free sequence comparison (I): statistics and power.

Gesine Reinert, David Chew, Fengzhu Sun, Michael S Waterman.

Abstract

Large-scale comparison of the similarities between two biological sequences is a major issue in computational biology. One fast method, the D2 statistic, relies on comparing the k-tuple content of the two sequences. Although it has been known for some years that the D2 statistic is not suitable for this task, as it tends to be dominated by single-sequence noise, to date no suitable adjustments have been proposed. In this article, we suggest two new variants of the D2 word count statistic, which we call D2S and D2*. For the self-standardized statistic D2S, we show that, as the sequence lengths tend to infinity, the statistic is asymptotically normally distributed and is not dominated by the noise in the individual sequences. The second statistic, D2*, outperforms D2S in terms of power for detecting relatedness between the two sequences in our examples; however, although it is straightforward to simulate from the asymptotic distribution of D2*, we cannot provide a closed form for power calculations.
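The abstract does not spell out the three statistics. The Python below is a minimal sketch, assuming the definitions commonly used for them: D2 = Σ_w X_w Y_w, where X_w and Y_w are the counts of word w in sequences of lengths n and m; centred counts X̃_w = X_w − (n−k+1)p_w and Ỹ_w = Y_w − (m−k+1)p_w; D2S = Σ_w X̃_w Ỹ_w / sqrt(X̃_w² + Ỹ_w²); and D2* = Σ_w X̃_w Ỹ_w / (sqrt((n−k+1)(m−k+1)) p_w). It further assumes an i.i.d. letter model in which p_w is the product of letter frequencies pooled over both sequences, and that terms with a zero denominator contribute 0. The function names `kmer_counts`, `letter_probs` and `d2_statistics` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt, prod


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-tuples (words of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def letter_probs(*seqs: str) -> dict:
    """Assumed null model: i.i.d. letters, frequencies pooled over all sequences."""
    total = Counter()
    for s in seqs:
        total.update(s)
    n = sum(total.values())
    return {a: c / n for a, c in total.items()}


def d2_statistics(x: str, y: str, k: int, alphabet: str = "ACGT"):
    """Return (D2, D2S, D2*) for sequences x, y over `alphabet` and word length k."""
    if k < 1 or k > min(len(x), len(y)):
        raise ValueError("k must be between 1 and the shorter sequence length")
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # D2: number of k-word matches, i.e. the sum over shared words of X_w * Y_w.
    d2 = sum(c * cy[w] for w, c in cx.items())
    p = letter_probs(x, y)
    nbar, mbar = len(x) - k + 1, len(y) - k + 1
    d2s = d2star = 0.0
    # Centred statistics sum over all |alphabet|**k words, including unobserved ones.
    for letters in product(alphabet, repeat=k):
        w = "".join(letters)
        pw = prod(p.get(a, 0.0) for a in w)
        if pw == 0.0:
            continue  # word contains a letter absent from both sequences
        xt = cx[w] - nbar * pw  # centred count in x
        yt = cy[w] - mbar * pw  # centred count in y
        denom = sqrt(xt * xt + yt * yt)
        if denom > 0:
            d2s += xt * yt / denom  # self-standardized term (0/0 taken as 0)
        d2star += xt * yt / (sqrt(nbar * mbar) * pw)
    return d2, d2s, d2star
```

For example, `d2_statistics("ACGTACGTTA", "TTACGTAACG", k=2)` returns the three statistics as a tuple. Enumerating every word of length k is only practical for small k; a faster variant could handle unobserved words in closed form, but the plain loop keeps the correspondence with the sums above easy to check.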

Year:  2009        PMID: 20001252      PMCID: PMC2818754          DOI: 10.1089/cmb.2009.0198

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


References (3 in total)

1.  Distributional regimes for the number of k-word matches between two random sequences.

Authors:  Ross A Lippert; Haiyan Huang; Michael S Waterman
Journal:  Proc Natl Acad Sci U S A       Date:  2002-10-08       Impact factor: 11.205

2.  A statistical method for alignment-free comparison of regulatory sequences.

Authors:  Miriam R Kantorovitz; Gene E Robinson; Saurabh Sinha
Journal:  Bioinformatics       Date:  2007-07-01       Impact factor: 6.937

3.  Asymptotic behaviour and optimal word size for exact and approximate word matches between random sequences.

Authors:  Sylvain Forêt; Miriam R Kantorovitz; Conrad J Burden
Journal:  BMC Bioinformatics       Date:  2006-12-18       Impact factor: 3.169

Cited by (78 in total)

1.  Separating significant matches from spurious matches in DNA sequences.

Authors:  Hugo Devillers; Sophie Schbath
Journal:  J Comput Biol       Date:  2011-12-09       Impact factor: 1.479

2.  The power of detecting enriched patterns: an HMM approach.

Authors:  Zhiyuan Zhai; Shih-Yen Ku; Yihui Luan; Gesine Reinert; Michael S Waterman; Fengzhu Sun
Journal:  J Comput Biol       Date:  2010-04       Impact factor: 1.479

3.  Alignment-free sequence comparison (II): theoretical power of comparison statistics.

Authors:  Lin Wan; Gesine Reinert; Fengzhu Sun; Michael S Waterman
Journal:  J Comput Biol       Date:  2010-10-25       Impact factor: 1.479

4.  A geometric interpretation for local alignment-free sequence comparison.

Authors:  Ehsan Behnam; Michael S Waterman; Andrew D Smith
Journal:  J Comput Biol       Date:  2013-07       Impact factor: 1.479

5.  Biological intuition in alignment-free methods: response to Posada.

Authors:  Mark A Ragan; Cheong Xin Chan
Journal:  J Mol Evol       Date:  2013-07-23       Impact factor: 2.395

6.  Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis. [Review]

Authors:  Oliver Bonham-Carter; Joe Steele; Dhundy Bastola
Journal:  Brief Bioinform       Date:  2013-07-31       Impact factor: 11.622

7.  Multiple alignment-free sequence comparison.

Authors:  Jie Ren; Kai Song; Fengzhu Sun; Minghua Deng; Gesine Reinert
Journal:  Bioinformatics       Date:  2013-08-29       Impact factor: 6.937

8.  New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing. [Review]

Authors:  Kai Song; Jie Ren; Gesine Reinert; Minghua Deng; Michael S Waterman; Fengzhu Sun
Journal:  Brief Bioinform       Date:  2013-09-23       Impact factor: 11.622

9.  The distribution of word matches between Markovian sequences with periodic boundary conditions.

Authors:  Conrad J Burden; Paul Leopardi; Sylvain Forêt
Journal:  J Comput Biol       Date:  2013-10-26       Impact factor: 1.479

10.  Prot-SpaM: fast alignment-free phylogeny reconstruction based on whole-proteome sequences.

Authors:  Chris-Andre Leimeister; Jendrik Schellhorn; Svenja Dörrer; Michael Gerth; Christoph Bleidorn; Burkhard Morgenstern
Journal:  Gigascience       Date:  2019-03-01       Impact factor: 6.524
