Average values of a dissimilarity measure not requiring sequence alignment are twice the averages of conventional mismatch counts requiring sequence alignment for a computer-generated model system.

B E Blaisdell.

Abstract

Three measures of sequence dissimilarity have been compared on a computer-generated model system in which substitutions in random sequences were made at randomly selected sites and the replacement character was chosen at random from the set of characters different from the original occupant of the site. The three measures were the conventional mismatch count between aligned sequences (AMC = m) and two measures not requiring prior sequence alignment. The latter two measures were the squared Euclidean distance between vectors of counts of t-tuples (t = 1-6) of characters in the two sequences (multiplet distribution distances or MDD = d) and counts of characters not covered by word structures of statistically significant length common to the two sequences (common long words or CLW = SIB, SIS, or SAB). Average MDD distances were found to be two times average mismatch counts in the simulated sequences for all values of t from 1 to 6 and all degrees of substitution from one per sequence to so many as to produce, effectively, random sequences. This simple relation held independently of sequence length and of sequence composition. The relation was confirmed by exact results on small model systems and by formal asymptotic results in the limit of so few substitutions that no double hits occur and in the limit of two random sequences. The coefficient of variation for MDD distances was greater than that for mismatch counts for singlets but both measures approached the same low value for sextets. Needleman-Wunsch alignment produced incorrect mismatch counts at higher degrees of substitution. The model satisfied the conditions for the derivation of the Jukes-Cantor asymptotic adjustment, but its application produced increasingly bad results with increasing degrees of substitution in accord with earlier results on model and natural sequences. 
This deterioration followed from the adjustment's sensitivity to error in the observations, which increases with the degree of substitution. Average CLW distances for a variety of common word structures ran roughly parallel to MDD distances for appropriately long t-tuples. These results on model systems support the validity, found in earlier work on natural sequences (Blaisdell 1989), of the two dissimilarity measures that do not require sequence alignment.
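
The MDD definition in the abstract (squared Euclidean distance between vectors of t-tuple counts) can be illustrated with a minimal Python sketch. The function names, the four-letter alphabet, and the choice to let substitution sites repeat (so that double hits are possible) are assumptions made for illustration and are not taken from the paper. The abstract does not state how counts are normalized for t > 1, so the worked check below is limited to t = 1.

```python
import random
from collections import Counter


def tuple_counts(seq, t):
    """Count every overlapping t-tuple (substring of length t) in seq."""
    return Counter(seq[i:i + t] for i in range(len(seq) - t + 1))


def mdd(a, b, t):
    """Squared Euclidean distance between the t-tuple count vectors of a and b."""
    ca, cb = tuple_counts(a, t), tuple_counts(b, t)
    # A Counter returns 0 for tuples that are absent from one sequence.
    return sum((ca[k] - cb[k]) ** 2 for k in ca.keys() | cb.keys())


def mismatch_count(a, b):
    """Number of differing positions between two already aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def substitute(seq, n_subs, alphabet="ACGT", rng=random):
    """Apply n_subs substitutions at random sites (assumed model, sites may repeat).

    Each replacement is drawn from the characters that differ from the
    current occupant of the chosen site.
    """
    out = list(seq)
    for _ in range(n_subs):
        i = rng.randrange(len(out))
        out[i] = rng.choice([c for c in alphabet if c != out[i]])
    return "".join(out)


# One substitution (last T -> A) in a short sequence.
original = "ACGTACGT"
mutated = "ACGTACGA"
print(mismatch_count(original, mutated))  # 1
print(mdd(original, mutated, 1))          # 2
```

For singlets, one substitution lowers one character count by 1 and raises another by 1, so d = 1 + 1 = 2 while m = 1, which matches the factor of two reported in the abstract for this simplest case.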

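The abstract's remark about the Jukes-Cantor adjustment becoming more sensitive to observational error can be made concrete with the standard form of the correction. The sketch below assumes a four-letter alphabet and writes p for the observed fraction of mismatched sites and K for the adjusted number of substitutions per site; these symbols are chosen here for illustration and are not taken from the paper.

```latex
K = -\frac{3}{4}\,\ln\!\left(1 - \frac{4}{3}\,p\right),
\qquad
\frac{dK}{dp} = \frac{1}{1 - \frac{4}{3}\,p},
\qquad
\delta K \approx \frac{\delta p}{1 - \frac{4}{3}\,p}.
```

Under these assumptions a small error in p is passed through almost unchanged when p is near 0 (the derivative is close to 1), is multiplied by 5 at p = 0.6, and grows without bound as p approaches 3/4, the value expected for two random sequences.
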

Year:  1989        PMID: 2515300     DOI: 10.1007/bf02602925

Source DB:  PubMed          Journal:  J Mol Evol        ISSN: 0022-2844            Impact factor:   2.395


References:  8 in total

1.  Cloning, sequencing, and expression of the gene coding for the human platelet alpha 2-adrenergic receptor.

Authors:  B K Kobilka; H Matsui; T S Kobilka; T L Yang-Feng; U Francke; M G Caron; R J Lefkowitz; J W Regan
Journal:  Science       Date:  1987-10-30       Impact factor: 47.728

2.  Effectiveness of measures requiring and not requiring prior sequence alignment for estimating the dissimilarity of natural sequences.

Authors:  B E Blaisdell
Journal:  J Mol Evol       Date:  1989-12       Impact factor: 2.395

3.  The neighbor-joining method: a new method for reconstructing phylogenetic trees.

Authors:  N Saitou; M Nei
Journal:  Mol Biol Evol       Date:  1987-07       Impact factor: 16.240

4.  A measure of the similarity of sets of sequences not requiring sequence alignment.

Authors:  B E Blaisdell
Journal:  Proc Natl Acad Sci U S A       Date:  1986-07       Impact factor: 11.205

5.  A general method applicable to the search for similarities in the amino acid sequence of two proteins.

Authors:  S B Needleman; C D Wunsch
Journal:  J Mol Biol       Date:  1970-03       Impact factor: 5.469

6.  Theoretical foundations for a quantitative approach to paleogenetics. Part I: DNA.

Authors:  R Holmquist
Journal:  J Mol Evol       Date:  1971       Impact factor: 2.395

7.  Algorithms for identifying local molecular sequence features.

Authors:  S Karlin; M Morris; G Ghandour; M Y Leung
Journal:  Comput Appl Biosci       Date:  1988-03

8.  Efficient algorithms for molecular sequence analysis.

Authors:  S Karlin; M Morris; G Ghandour; M Y Leung
Journal:  Proc Natl Acad Sci U S A       Date:  1988-02       Impact factor: 11.205

Cited by:  12 in total

1.  Phylogenetic continuum indicates "galaxies" in the protein universe: preliminary results on the natural group structures of proteins.

Authors:  I Ladunga
Journal:  J Mol Evol       Date:  1992-04       Impact factor: 2.395

2.  Average values of a dissimilarity measure not requiring sequence alignment are twice the averages of conventional mismatch counts requiring sequence alignment for a variety of computer-generated model systems.

Authors:  B E Blaisdell
Journal:  J Mol Evol       Date:  1991-06       Impact factor: 2.395

3.  Sequence Comparison Without Alignment: The SpaM Approaches.

Authors:  Burkhard Morgenstern
Journal:  Methods Mol Biol       Date:  2021

4.  Protein sequence randomness and sequence/structure correlations.

Authors:  R S Rahman; S Rackovsky
Journal:  Biophys J       Date:  1995-04       Impact factor: 4.033

5.  There appear to be conserved constraints on the distribution of nucleotide sequences in cellular genomes.

Authors:  A C Rogerson
Journal:  J Mol Evol       Date:  1991-01       Impact factor: 2.395

6.  Learning vector quantization as an interpretable classifier for the detection of SARS-CoV-2 types based on their RNA sequences.

Authors:  Marika Kaden; Katrin Sophie Bohnsack; Mirko Weber; Mateusz Kudła; Kaja Gutowska; Jacek Blazewicz; Thomas Villmann
Journal:  Neural Comput Appl       Date:  2021-04-27       Impact factor: 5.606

7.  Phylogenetic tree construction using trinucleotide usage profile (TUP).

Authors:  Si Chen; Lih-Yuan Deng; Dale Bowman; Jyh-Jen Horng Shiau; Tit-Yee Wong; Behrouz Madahian; Henry Horng-Shing Lu
Journal:  BMC Bioinformatics       Date:  2016-10-06       Impact factor: 3.169

8.  A novel fast vector method for genetic sequence comparison.

Authors:  Yongkun Li; Lily He; Rong Lucy He; Stephen S-T Yau
Journal:  Sci Rep       Date:  2017-09-22       Impact factor: 4.379

9.  A new method to cluster DNA sequences using Fourier power spectrum.

Authors:  Tung Hoang; Changchuan Yin; Hui Zheng; Chenglong Yu; Rong Lucy He; Stephen S-T Yau
Journal:  J Theor Biol       Date:  2015-03-05       Impact factor: 2.691

10.  Pattern-based phylogenetic distance estimation and tree reconstruction.

Authors:  Michael Höhl; Isidore Rigoutsos; Mark A Ragan
Journal:  Evol Bioinform Online       Date:  2007-02-25       Impact factor: 1.625
