Alignment-free sequence comparison for biologically realistic sequences of moderate length.

Conrad J Burden, Junmei Jing, Susan R Wilson.

Abstract

The D(2) statistic, defined as the number of matches of words of some pre-specified length k, is a computationally fast alignment-free measure of biological sequence similarity. However, there is some debate about its suitability for this purpose, as the variability in D(2) may be dominated by the terms that reflect the noise in each of the single sequences only. We examine the extent of the problem and the effectiveness of overcoming it by using two mean-centred variants of this statistic, D(2)* and D(2)c. We conclude that all three statistics are potentially useful measures of sequence similarity, for which reasonably accurate p-values can be estimated under a null hypothesis of sequences composed of identically and independently distributed letters. We show that D(2) and D(2)c, and to a somewhat lesser extent D(2)*, perform well in tests to classify moderate-length query sequences as putative cis-regulatory modules.
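As an illustration of the definition given in the abstract, the following is a minimal Python sketch of the plain D(2) count, assuming that overlapping words are counted and that a "match" is any pair of positions, one in each sequence, at which the same k-letter word begins. The function names `kmer_counts` and `d2` are hypothetical and are not taken from the paper; the mean-centred variants D(2)* and D(2)c are not reproduced here because the abstract does not state their formulas.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Number of k-word matches between two sequences.

    Assumption: a match is any pair (i, j) with seq_a[i:i+k] == seq_b[j:j+k],
    so the total equals the sum over words of count_a(word) * count_b(word).
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Counter returns 0 for words absent from seq_b, so they contribute nothing.
    return sum(n * counts_b[word] for word, n in counts_a.items())
```

Counting words first and then multiplying the counts avoids comparing every pair of positions directly, which is one reason the statistic is cheap to compute.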

Year:  2012        PMID: 22624182

Source DB:  PubMed          Journal:  Stat Appl Genet Mol Biol        ISSN: 1544-6115


  4 in total

1.  The distribution of word matches between Markovian sequences with periodic boundary conditions.

Authors:  Conrad J Burden; Paul Leopardi; Sylvain Forêt
Journal:  J Comput Biol       Date:  2013-10-26       Impact factor: 1.479

2.  Interpreting alignment-free sequence comparison: what makes a score a good score?

Authors:  Martin T Swain; Martin Vickers
Journal:  NAR Genom Bioinform       Date:  2022-09-05

3.  Google matrix analysis of DNA sequences.

Authors:  Vivek Kandiah; Dima L Shepelyansky
Journal:  PLoS One       Date:  2013-05-09       Impact factor: 3.240

4.  Alignment-free inference of hierarchical and reticulate phylogenomic relationships. (Review)

Authors:  Guillaume Bernard; Cheong Xin Chan; Yao-Ban Chan; Xin-Yi Chua; Yingnan Cong; James M Hogan; Stefan R Maetschke; Mark A Ragan
Journal:  Brief Bioinform       Date:  2019-03-22       Impact factor: 11.622
