R F Mott, T B Kirkwood, R N Curnow.
Abstract
An accurate approximation to the distribution of the length of the longest matching word shared by two random DNA sequences of finite length is derived using only elementary probability arguments. The distribution is shown to be consistent with previous asymptotic results for the mean and variance of the length of the longest common word. The application of the distribution to assessing the statistical significance of sequence similarities is considered. It is also shown how the distribution can be modified to account for non-independence of neighbouring bases in real sequences.
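The abstract does not reproduce the approximation itself, so the following is only a minimal illustrative sketch, assuming bases are independent and identically distributed with known frequencies. It computes the longest common word of two sequences by dynamic programming, estimates the distribution of its length L by simulation, and compares the result with a generic Poisson-clumping heuristic, P(L < k) ≈ exp(-(m-k+1)(n-k+1)(1-p)p^k), where p is the probability that two random bases match. That formula is a standard heuristic substituted here for illustration and is not necessarily the one derived by Mott, Kirkwood and Curnow. All function names (`longest_common_word`, `random_dna`, `poisson_cdf`, `simulate`) are hypothetical.

```python
import math
import random

UNIFORM = (0.25, 0.25, 0.25, 0.25)  # assumed A, C, G, T frequencies


def longest_common_word(a: str, b: str) -> int:
    """Length of the longest word (contiguous substring) occurring in both a and b.

    cur[j] holds the length of the common run ending at a[i-1] and b[j-1],
    so the scan takes O(len(a) * len(b)) time and O(len(b)) memory.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        ai = a[i - 1]
        for j in range(1, len(b) + 1):
            if ai == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def random_dna(n: int, rng: random.Random, probs=UNIFORM) -> str:
    """Random sequence of n independent bases (the i.i.d. assumption of this sketch)."""
    return "".join(rng.choices("ACGT", weights=probs, k=n))


def poisson_cdf(k: int, m: int, n: int, probs=UNIFORM) -> float:
    """Heuristic P(L < k) for sequences of lengths m and n (illustrative, not the paper's formula).

    p is the probability that two independent bases match. A match of length >= k
    that cannot be extended to the left starts at a given pair of positions with
    probability about (1 - p) * p**k; the count of such clumps over roughly
    (m - k + 1) * (n - k + 1) starting pairs is treated as Poisson.
    """
    p = sum(q * q for q in probs)
    lam = (m - k + 1) * (n - k + 1) * (1 - p) * p ** k
    return math.exp(-lam)


def simulate(m: int, n: int, trials: int, seed: int = 0) -> list:
    """Longest-common-word lengths for `trials` independent pairs of random sequences."""
    rng = random.Random(seed)
    return [longest_common_word(random_dna(m, rng), random_dna(n, rng))
            for _ in range(trials)]


if __name__ == "__main__":
    m = n = 120
    lengths = simulate(m, n, trials=100)
    print("k  empirical P(L<k)  heuristic P(L<k)")
    for k in range(5, 12):
        empirical = sum(L < k for L in lengths) / len(lengths)
        print(k, round(empirical, 3), round(poisson_cdf(k, m, n), 3))
```

The quadratic dynamic program is chosen for brevity; a suffix tree or suffix automaton would scale better to long sequences. The i.i.d. assumption is exactly what the abstract's final sentence relaxes, and this sketch does not attempt that correction for dependence between neighbouring bases.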
Year: 1990 PMID: 2279194 DOI: 10.1007/bf02460808
Source DB: PubMed Journal: Bull Math Biol ISSN: 0092-8240 Impact factor: 1.758