
An accurate approximation to the distribution of the length of the longest matching word between two random DNA sequences.

R F Mott, T B Kirkwood, R N Curnow.

Abstract

An accurate approximation is derived to the distribution of the length of the longest matching word present between two random DNA sequences of finite length, using only elementary probability arguments. The distribution is shown to be consistent with previous asymptotic results for the mean and variance of longest common words. The application of the distribution to assessing the statistical significance of sequence similarities is considered. It is shown how the distribution can be modified to take account of non-independence of neighbouring bases in real sequences.
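The quantity described in the abstract can be explored numerically. The sketch below is a Monte Carlo simulation, not the analytic approximation derived in the paper: it estimates the distribution of the longest matching word between two random sequences and uses that estimate to attach an empirical tail probability to an observed match length. The function names, the equal base frequencies, and the independence of neighbouring bases are all assumptions made for illustration.

```python
import random
from collections import Counter


def longest_common_word_length(a: str, b: str) -> int:
    """Length of the longest word (contiguous substring) present in both a and b."""
    best = 0
    # prev[j] holds the length of the common word ending at the previous
    # character of a and at b[j-1].
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0] * (len(b) + 1)
        for j, other in enumerate(b, start=1):
            if ch == other:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best:
                    best = curr[j]
        prev = curr
    return best


def simulate_longest_word(n: int, m: int, trials: int, seed: int = 0) -> Counter:
    """Empirical counts of the longest common word length for random sequences.

    Assumption: bases are independent and equally likely (A, C, G, T).
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        a = "".join(rng.choices("ACGT", k=n))
        b = "".join(rng.choices("ACGT", k=m))
        counts[longest_common_word_length(a, b)] += 1
    return counts


def tail_probability(counts: Counter, observed: int) -> float:
    """Fraction of simulated pairs whose longest word is at least `observed` long."""
    total = sum(counts.values())
    return sum(c for length, c in counts.items() if length >= observed) / total


if __name__ == "__main__":
    counts = simulate_longest_word(n=100, m=100, trials=200, seed=42)
    for length in sorted(counts):
        print(length, counts[length])
    print("P(L >= 8) ~", tail_probability(counts, 8))
```

A simulation like this only gives a rough check for short sequences. Its cost grows with the product of the sequence lengths and the number of trials, which is one reason a closed-form approximation is attractive for databank-scale comparisons.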

Year:  1990        PMID: 2279194     DOI: 10.1007/bf02460808

Source DB:  PubMed          Journal:  Bull Math Biol        ISSN: 0092-8240            Impact factor:   1.758


References (8 in total)

1.  Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes.

Authors:  S Karlin; S F Altschul
Journal:  Proc Natl Acad Sci U S A       Date:  1990-03       Impact factor: 11.205

2.  A test for the statistical significance of DNA sequence similarities for application in databank searches.

Authors:  R F Mott; T B Kirkwood; R N Curnow
Journal:  Comput Appl Biosci       Date:  1989-04

3.  Significance levels for biological sequence comparison using non-linear similarity functions.

Authors:  S F Altschul; B W Erickson
Journal:  Bull Math Biol       Date:  1988       Impact factor: 1.758

4.  The statistical distribution of nucleic acid similarities.

Authors:  T F Smith; M S Waterman; C Burks
Journal:  Nucleic Acids Res       Date:  1985-01-25       Impact factor: 16.971

5.  A comprehensive set of sequence analysis programs for the VAX.

Authors:  J Devereux; P Haeberli; O Smithies
Journal:  Nucleic Acids Res       Date:  1984-01-11       Impact factor: 16.971

6.  Statistical characterization of nucleic acid sequence functional domains.

Authors:  T F Smith; M S Waterman; J R Sadler
Journal:  Nucleic Acids Res       Date:  1983-04-11       Impact factor: 16.971

7.  Identification of common molecular subsequences.

Authors:  T F Smith; M S Waterman
Journal:  J Mol Biol       Date:  1981-03-25       Impact factor: 5.469

8.  New approaches for computer analysis of nucleic acid sequences.

Authors:  S Karlin; G Ghandour; F Ost; S Tavare; L J Korn
Journal:  Proc Natl Acad Sci U S A       Date:  1983-09       Impact factor: 11.205

Cited by (3 in total)

1.  Poisson, compound Poisson and process approximations for testing statistical significance in sequence comparisons.

Authors:  L Goldstein; M S Waterman
Journal:  Bull Math Biol       Date:  1992-09       Impact factor: 1.758

2.  Maximum-likelihood estimation of the statistical distribution of Smith-Waterman local sequence similarity scores.

Authors:  R Mott
Journal:  Bull Math Biol       Date:  1992-01       Impact factor: 1.758

3.  Pattern matching between two non-aligned random sequences.

Authors:  K N Sheng; J I Naus
Journal:  Bull Math Biol       Date:  1994-11       Impact factor: 1.758

