Abstract
The field of molecular evolution provides many examples of the principle that molecular differences between species contain information about evolutionary history. One surprising case can be found in the frequency of short words in DNA: more closely related species have more similar word compositions. Interest in this has often focused on its utility in deducing phylogenetic relationships. However, it is also of interest because of the opportunity it provides for studying the evolution of genome function. Word-frequency differences between species change too slowly to be purely the result of random mutational drift. Rather, their slow pattern of change reflects the direct or indirect action of purifying selection and the presence of functional constraints. Many such constraints are likely to exist, and an important challenge is to distinguish them. Here we develop a method to do so by isolating the effects acting at different word sizes. We apply our method to 2-, 4-, and 8-base-pair (bp) words across several classes of noncoding sequence. Our major result is that similarities in 8-bp word frequencies scale with evolutionary time for regions immediately upstream of genes. This association is present although weaker in intronic sequence, but cannot be detected in intergenic sequence using our method. In contrast, 2-bp and 4-bp word frequencies scale with time in all classes of noncoding sequence. These results suggest that different genomic processes are involved at different word sizes. The pattern in 2-bp and 4-bp words may be due to evolutionary changes in processes such as DNA replication and repair, as has been suggested before. The pattern in 8-bp words may reflect evolutionary changes in gene-regulatory machinery, such as changes in the frequencies of transcription-factor binding sites, or in the affinity of transcription factors for particular sequences.
Year: 2006 PMID: 17083273 PMCID: PMC1630712 DOI: 10.1371/journal.pcbi.0020150
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Examples of ISW Groups for 4-bp and 8-bp Words
(A) An example of an iso GC/dinucleotide group in 4-bp words. This group consists of the tetranucleotides TGAC, TGTC, and TCAC. In addition to having the same GC content, these words (considering also their reverse complements which are written below them) share the same six dinucleotides. The dinucleotide composition of each word is written below it with lines showing that the same dinucleotides are present in all three words.
(B) An example of an iso GC/di/tetranucleotide group in 8-bp words. The two words CAAGTTGC and CAACTTGC have the same GC content as well as sharing the same 14 dinucleotides and the same 10 tetranucleotides.
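The grouping shown in Figure 1 can be checked with a short script. The following is a minimal sketch, not the authors' implementation: it assumes that two words belong to the same group when they have equal GC content and identical counts of shorter subwords tallied over the word and its reverse complement. The function names (`reverse_complement`, `subword_counts`, `gc_content`, `same_group`) are hypothetical and chosen for illustration only.

```python
from collections import Counter

# Translation table for complementing a DNA word (A<->T, C<->G).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Return the reverse complement of an uppercase DNA word."""
    return word.translate(_COMPLEMENT)[::-1]


def subword_counts(word, k):
    """Count the k-bp subwords on both strands of `word`.

    Assumption: subwords are tallied with multiplicity over the word
    and its reverse complement, as the Figure 1 caption suggests.
    """
    counts = Counter()
    for strand in (word, reverse_complement(word)):
        for i in range(len(strand) - k + 1):
            counts[strand[i:i + k]] += 1
    return counts


def gc_content(word):
    """Number of G or C bases in the word."""
    return sum(base in "GC" for base in word)


def same_group(words, sizes):
    """True if all words share GC content and subword counts for each size."""
    first = words[0]
    return all(
        gc_content(w) == gc_content(first)
        and all(subword_counts(w, k) == subword_counts(first, k) for k in sizes)
        for w in words[1:]
    )


# The example groups given in the Figure 1 caption.
tetra_group = ["TGAC", "TGTC", "TCAC"]
octa_group = ["CAAGTTGC", "CAACTTGC"]

print(same_group(tetra_group, sizes=[2]))     # panel (A): GC and dinucleotides
print(same_group(octa_group, sizes=[2, 4]))   # panel (B): GC, di- and tetranucleotides
```

Under these assumptions, both example groups from the caption pass the check, and the subword totals match the caption's counts: 6 dinucleotides per 4-bp word, and 14 dinucleotides and 10 tetranucleotides per 8-bp word.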
Figure 2. The Number of Amino Acid Replacements per Site versus IFW Distance
Rows give results for promoter, intronic, intergenic, and coding sequence, respectively. Columns represent different word sizes. In each plot, a data point represents a species pair, and each plot contains all possible pairs among our 13 species. Note that the y-axis ranges vary from plot to plot.