Linguistic measure of taxonomic and functional relatedness of nucleotide sequences.

S Pietrokovski, J Hirshon, E N Trifonov.

Abstract

The frequencies of "words", oligonucleotides within nucleotide sequences, reflect the genetic information contained in the sequence "texts". Nucleotide sequences are characteristically represented by their contrast word vocabularies. Comparison of the sequences by correlating their contrast vocabularies is shown to reflect well the relatedness (unrelatedness) between the sequences. A single value, the linguistic similarity between the sequences, is suggested as a measure of sequence relatedness. Sequences as short as 1000 bases can be characterized and quantitatively related to other sequences by this technique. The linguistic sequence similarity value is used for analysis of taxonomically and functionally diverse nucleotide sequences. The similarity value is shown to be very sensitive to the relatedness of the source species, thus providing a convenient tool for taxonomic classification of species by their sequence vocabularies. Functionally diverse sequences appear distinct by their linguistic similarity values. This can be a basis for a quick screening technique for functional characterization of the sequences and for mapping functionally distinct regions in long sequences.
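The comparison described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how a contrast vocabulary is built or which correlation measure is used, so the choices here are assumptions: the contrast of a word is taken as its observed count minus the count expected from the base composition of the sequence, and the similarity is the Pearson correlation between two such contrast vectors. The function names (`word_counts`, `contrast_vocabulary`, `linguistic_similarity`) and the default word length `k=2` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def word_counts(seq, k):
    """Count overlapping words (k-mers) made only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(base in ALPHABET for base in word):
            counts[word] += 1
    return counts


def contrast_vocabulary(seq, k=2):
    """Assumed definition: observed word count minus the count expected
    if bases were drawn independently at the sequence's own base frequencies."""
    seq = seq.upper()
    bases = [b for b in seq if b in ALPHABET]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    base_freq = {b: bases.count(b) / len(bases) for b in ALPHABET}
    counts = word_counts(seq, k)
    total = sum(counts.values())
    contrast = {}
    for letters in product(ALPHABET, repeat=k):
        word = "".join(letters)
        expected = total
        for b in word:
            expected *= base_freq[b]
        contrast[word] = counts[word] - expected  # Counter gives 0 if absent
    return contrast


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def linguistic_similarity(seq_a, seq_b, k=2):
    """Correlate the contrast vocabularies of two sequences (assumed measure)."""
    ca = contrast_vocabulary(seq_a, k)
    cb = contrast_vocabulary(seq_b, k)
    words = sorted(ca)
    return pearson([ca[w] for w in words], [cb[w] for w in words])
```

Under these assumptions, calling `linguistic_similarity(seq_a, seq_b)` returns a single value between -1 and 1, and a sequence compared with itself scores 1. Longer words (larger `k`) need longer sequences for the counts to be meaningful.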

Year: 1990    PMID: 2363847    DOI: 10.1080/07391102.1990.10508563

Source DB: PubMed    Journal: J Biomol Struct Dyn    ISSN: 0739-1102

