Average values of a dissimilarity measure not requiring sequence alignment are twice the averages of conventional mismatch counts requiring sequence alignment for a computer-generated model system.

Abstract

Three measures of sequence dissimilarity have been compared on a computer-generated model system in which substitutions in random sequences were made at randomly selected sites and the replacement character was chosen at random from the set of characters different from the original occupant of the site. The three measures were the conventional mismatch count between aligned sequences (AMC = m) and two measures not requiring prior sequence alignment. The latter two measures were the squared Euclidean distance between vectors of counts of t-tuples (t = 1 to 6) of characters in the two sequences (multiplet distribution distances or MDD = d) and counts of characters not covered by word structures of statistically significant length common to the two sequences (common long words or CLW = SIB, SIS, or SAB). Average MDD distances were found to be two times average mismatch counts in the simulated sequences for all values of t from 1 to 6 and all degrees of substitution from one per sequence to so many as to produce, effectively, random sequences. This simple relation held independently of sequence length and of sequence composition. The relation was confirmed by exact results on small model systems and by formal asymptotic results in the limit of so few substitutions that no double hits occur and in the limit of two random sequences. The coefficient of variation for MDD distances was greater than that for mismatch counts for singlets but both measures approached the same low value for sextets. Needleman-Wunsch alignment produced incorrect mismatch counts at higher degrees of substitution. The model satisfied the conditions for the derivation of the Jukes-Cantor asymptotic adjustment, but its application produced increasingly bad results with increasing degrees of substitution in accord with earlier results on model and natural sequences. This fact was a consequence of the increase with increasing degrees of substitution of the sensitivity of the adjustment to error in the observations. Average CLW distances for a variety of common word structures were more or less parallel to MDD distances for appropriately long t-tuples. These results on model systems supported the validity of the two dissimilarity measures not requiring sequence alignment that was found in earlier work on natural sequences (Blaisdell 1989).
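
The model system and the two simplest measures described in the abstract can be illustrated with a short simulation. The following is a minimal sketch under stated assumptions, not the author's program: the four-letter alphabet, the function names, the default parameters, and the choice to replace a site with a character different from its current occupant are all illustrative. For t > 1 the sketch also assumes that the mismatch count is taken over aligned t-tuple windows, which the abstract does not spell out.

```python
import random
from collections import Counter

ALPHABET = "ACGT"  # assumption: a four-letter, equiprobable alphabet


def random_sequence(n, rng):
    """Draw a random sequence of length n with equiprobable characters."""
    return "".join(rng.choice(ALPHABET) for _ in range(n))


def substitute(seq, k, rng):
    """Apply k substitutions at randomly chosen sites.

    A site may be hit more than once (double hits). Each replacement is
    drawn from the characters that differ from the site's current occupant
    (an assumption about the model).
    """
    chars = list(seq)
    for _ in range(k):
        i = rng.randrange(len(chars))
        chars[i] = rng.choice([c for c in ALPHABET if c != chars[i]])
    return "".join(chars)


def tuples(seq, t):
    """Return the overlapping t-tuples (words of length t) of seq, in order."""
    return [seq[i:i + t] for i in range(len(seq) - t + 1)]


def mismatch_count(x, y, t=1):
    """Count aligned t-tuple windows that differ (x and y have equal length)."""
    return sum(a != b for a, b in zip(tuples(x, t), tuples(y, t)))


def mdd(x, y, t=1):
    """Squared Euclidean distance between the t-tuple count vectors of x and y."""
    cx, cy = Counter(tuples(x, t)), Counter(tuples(y, t))
    return sum((cx[w] - cy[w]) ** 2 for w in cx.keys() | cy.keys())


def average_ratio(n=200, k=20, t=1, trials=2000, seed=0):
    """Return (average MDD) / (average mismatch count) over simulated pairs."""
    rng = random.Random(seed)
    total_m = total_d = 0
    for _ in range(trials):
        x = random_sequence(n, rng)
        y = substitute(x, k, rng)
        total_m += mismatch_count(x, y, t)
        total_d += mdd(x, y, t)
    return total_d / total_m


if __name__ == "__main__":
    for t in (1, 3):
        print(t, average_ratio(t=t))
```

Under these assumptions the printed ratios should come out close to 2, in line with the relation reported in the abstract. Because no insertions or deletions are simulated, the true alignment is known and the mismatch count needs no alignment step, whereas `mdd` never uses positional information at all.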

Substances: DNA

Year: 1989 PMID: 2515300 DOI: 10.1007/bf02602925

Source DB: PubMed Journal: J Mol Evol ISSN: 0022-2844 Impact factor: 2.395

References: 8 in total

1. Cloning, sequencing, and expression of the gene coding for the human platelet alpha 2-adrenergic receptor.

Authors: B K Kobilka; H Matsui; T S Kobilka; T L Yang-Feng; U Francke; M G Caron; R J Lefkowitz; J W Regan
Journal: Science Date: 1987-10-30 Impact factor: 47.728

2. Effectiveness of measures requiring and not requiring prior sequence alignment for estimating the dissimilarity of natural sequences.

Authors: B E Blaisdell
Journal: J Mol Evol Date: 1989-12 Impact factor: 2.395

3. The neighbor-joining method: a new method for reconstructing phylogenetic trees.

Authors: N Saitou; M Nei
Journal: Mol Biol Evol Date: 1987-07 Impact factor: 16.240

4. A measure of the similarity of sets of sequences not requiring sequence alignment.

Authors: B E Blaisdell
Journal: Proc Natl Acad Sci U S A Date: 1986-07 Impact factor: 11.205

5. A general method applicable to the search for similarities in the amino acid sequence of two proteins.

Authors: S B Needleman; C D Wunsch
Journal: J Mol Biol Date: 1970-03 Impact factor: 5.469

6. Theoretical foundations for a quantitative approach to paleogenetics. Part I: DNA.

Authors: R Holmquist
Journal: J Mol Evol Date: 1971 Impact factor: 2.395

7. Algorithms for identifying local molecular sequence features.

Authors: S Karlin; M Morris; G Ghandour; M Y Leung
Journal: Comput Appl Biosci Date: 1988-03

8. Efficient algorithms for molecular sequence analysis.

Authors: S Karlin; M Morris; G Ghandour; M Y Leung
Journal: Proc Natl Acad Sci U S A Date: 1988-02 Impact factor: 11.205

Cited by: 12 in total

1. Phylogenetic continuum indicates "galaxies" in the protein universe: preliminary results on the natural group structures of proteins.

Authors: I Ladunga
Journal: J Mol Evol Date: 1992-04 Impact factor: 2.395

2. Average values of a dissimilarity measure not requiring sequence alignment are twice the averages of conventional mismatch counts requiring sequence alignment for a variety of computer-generated model systems.

Authors: B E Blaisdell
Journal: J Mol Evol Date: 1991-06 Impact factor: 2.395

3. Sequence Comparison Without Alignment: The SpaM Approaches.

Authors: Burkhard Morgenstern
Journal: Methods Mol Biol Date: 2021

4. Protein sequence randomness and sequence/structure correlations.

Authors: R S Rahman; S Rackovsky
Journal: Biophys J Date: 1995-04 Impact factor: 4.033

5. There appear to be conserved constraints on the distribution of nucleotide sequences in cellular genomes.

Authors: A C Rogerson
Journal: J Mol Evol Date: 1991-01 Impact factor: 2.395

6. Learning vector quantization as an interpretable classifier for the detection of SARS-CoV-2 types based on their RNA sequences.

Authors: Marika Kaden; Katrin Sophie Bohnsack; Mirko Weber; Mateusz Kudła; Kaja Gutowska; Jacek Blazewicz; Thomas Villmann
Journal: Neural Comput Appl Date: 2021-04-27 Impact factor: 5.606