Mihai Albu, Xiang Jia Min, G Brian Golding, Donal Hickey.
Abstract
The availability of complete genome sequences for 12 Drosophila species provides an unprecedented resource for large-scale studies of genome evolution. In this study, we looked for correlated shifts in the patterns of genome and proteome evolution within the genus Drosophila. Specifically, we asked if the nucleotide composition of the Drosophila willistoni genome, which is significantly less GC rich than the other 11 sequenced Drosophila genomes, is reflected in an altered pattern of amino acid substitutions in the encoded proteins. Our results show that this is indeed the case: there are large and highly significant asymmetries in the patterns of amino acid substitution between D. willistoni and Drosophila melanogaster, and they are in the direction predicted by the nucleotide biases. The implication of this result, combined with previous studies on long-term proteome evolution, is that substitutional biases at the DNA level can be a major factor in determining both the long-term and the short-term directions of proteome evolution.
Keywords: GC content; amino acid composition; nucleotide content
Year: 2009 PMID: 20333198 PMCID: PMC2817423 DOI: 10.1093/gbe/evp028
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Differences in nucleotide content between the coding sequences of D. melanogaster and D. willistoni. Panel (A) shows a comparison of the frequency of each nucleotide in both species based on all aligned nucleotide sites (conserved and nonconserved). The results for D. melanogaster are shown in red and those for D. willistoni are shown in blue. Panel (B) shows the same comparison as in Panel (A) but limited to the nonconserved (that is, variable) sites only. These frequency differences between the two species were highly significant (P ≪ 0.00001).
GC content at each of the three codon positions. The results for D. melanogaster are shown in red and those for D. willistoni are shown in blue. These data are based on the nucleotide frequencies at variable sites (see fig. 1). As can be seen from this figure, D. melanogaster has a higher GC content at each of the three codon positions than does D. willistoni. The absolute numbers of GC nucleotide pairs at each position are shown in supplementary figure S1 (Supplementary Material online).
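The per-position comparison described in this caption can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the function name `gc_by_codon_position`, the keys `"a"` and `"b"`, and the assumption that the two sequences are a codon-aligned, in-frame pair (so that index modulo 3 gives the codon position) are all hypothetical choices made for the example.

```python
def gc_by_codon_position(seq_a, seq_b, variable_only=True):
    """GC fraction at codon positions 1, 2, 3 for two aligned coding sequences.

    Assumes (hypothetically) that both sequences are in frame and codon-aligned,
    so that alignment column i falls at codon position i % 3.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    gc_a, gc_b, n = [0, 0, 0], [0, 0, 0], [0, 0, 0]
    for i, (x, y) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        # Skip gaps and ambiguous bases in either sequence.
        if x not in "ACGT" or y not in "ACGT":
            continue
        # Optionally keep only sites that differ between the two species.
        if variable_only and x == y:
            continue
        pos = i % 3
        n[pos] += 1
        gc_a[pos] += x in "GC"
        gc_b[pos] += y in "GC"

    def fractions(counts):
        # None marks a codon position with no usable sites.
        return [c / t if t else None for c, t in zip(counts, n)]

    return {"a": fractions(gc_a), "b": fractions(gc_b)}


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    print(gc_by_codon_position("GCCGGC", "GCTGAT"))
```

Setting `variable_only=False` gives the corresponding fractions over all aligned sites, which mirrors the all-sites versus variable-sites distinction drawn in the figure captions.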
Interspecific differences in amino acid content of homologous protein sequences at nonconserved sites. Panel (A): number of amino acids encoded by GC-rich codons in each of the two species. In this panel, all four of the amino acids that are encoded by GC-rich codons (G, A, R, P) are grouped together. The number in D. melanogaster is shown by a red bar and the number in D. willistoni by a blue bar. Panel (B): this panel shows the data for each of the four amino acids (Gly, Ala, Arg, and Pro) separately. The color coding is the same as in Panel (A).
Amino Acid Substitution Matrix between D. melanogaster and D. willistoni Homologous Sequences
| | GARP | CDEHLQSTVW | FYMINK | Totals |
| GARP | 41,338 | 96,162 | 26,376 | 163,876 |
| CDEHLQSTVW | 117,200 | 207,916 | 98,625 | 423,741 |
| FYMINK | 37,297 | 117,613 | 36,847 | 191,757 |
| Totals | 195,835 | 421,691 | 161,848 | 779,374 |
NOTE.—This summary table contains the amino acids at variable sites only. The amino acids are grouped into those encoded by GC-rich codons (G, A, R, and P), those encoded by GC-neutral codons (C, D, E, H, L, Q, S, T, V, and W), and those encoded by GC-poor codons (F, Y, M, I, N, and K).
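The asymmetry in this grouped table can be made explicit by comparing row totals with column totals and by comparing each off-diagonal cell with its mirror image. The sketch below uses the counts from the table; the function names are hypothetical, and because the table as extracted here does not say which species corresponds to rows and which to columns, the signs are only meaningful relative to that orientation.

```python
GROUPS = ["GARP", "CDEHLQSTVW", "FYMINK"]

# Counts at variable sites, copied from the table (rows x columns).
COUNTS = [
    [41338, 96162, 26376],
    [117200, 207916, 98625],
    [37297, 117613, 36847],
]


def marginal_shift(counts):
    """Column total minus row total for each amino acid group."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    return [c - r for r, c in zip(row_totals, col_totals)]


def pairwise_asymmetry(counts):
    """Lower-triangle cell minus its upper-triangle mirror, for each group pair."""
    n = len(counts)
    return {
        (i, j): counts[j][i] - counts[i][j]
        for i in range(n)
        for j in range(i + 1, n)
    }


if __name__ == "__main__":
    for name, shift in zip(GROUPS, marginal_shift(COUNTS)):
        print(f"{name}: {shift:+d}")
    for (i, j), diff in pairwise_asymmetry(COUNTS).items():
        print(f"{GROUPS[i]} vs {GROUPS[j]}: {diff:+d}")
```

With these counts the GARP group has a column total 31,959 higher than its row total, while the FYMINK group has a column total 29,909 lower, which is the kind of directional imbalance the abstract describes.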
Biased patterns of amino acid substitution between D. melanogaster and D. willistoni protein sequences. We constructed an amino acid substitution matrix between the two species (see supplementary table S2, Supplementary Material online). Differences between the upper and lower diagonals were then color coded as follows to illustrate the asymmetry in the matrix. Differences of 250 or greater are shown in red; differences between 50 and 250 are shown in orange; and differences less than 50 are uncolored. Similarly, large negative values are shown in dark blue and intermediate negative values in light blue.
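The color-coding rule in this caption amounts to thresholding the difference between mirrored cells of the substitution matrix. A minimal sketch follows, under several assumptions that the caption does not confirm: the difference is taken as upper-triangle cell minus lower-triangle cell, the value 50 counts as orange, and the negative thresholds mirror the positive ones exactly. The function names and the toy matrix are hypothetical and are not taken from supplementary table S2.

```python
def color_for_difference(diff, large=250, medium=50):
    """Map a signed count difference to a color label (None = uncolored)."""
    if diff >= large:
        return "red"
    if diff >= medium:
        return "orange"
    if diff <= -large:
        return "dark blue"
    if diff <= -medium:
        return "light blue"
    return None


def asymmetry_colors(matrix, labels):
    """Difference and color for each unordered pair of amino acids.

    Assumes diff = matrix[i][j] - matrix[j][i] for i < j (upper minus lower).
    """
    result = {}
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            diff = matrix[i][j] - matrix[j][i]
            result[(labels[i], labels[j])] = (diff, color_for_difference(diff))
    return result


if __name__ == "__main__":
    # Toy 3x3 matrix with invented counts, for illustration only.
    toy = [
        [0, 300, 10],
        [20, 0, 100],
        [70, 400, 0],
    ]
    for pair, (diff, color) in asymmetry_colors(toy, ["A", "G", "K"]).items():
        print(pair, diff, color)
```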