Abstract
Like other species of Drosophila, Drosophila pseudoobscura has a distinct bias toward the usage of C- and G-ending codons. Previous studies have indicated that this bias is due, at least in part, to natural selection. Codon bias clearly differs among amino acids (and other codon classes) in Drosophila, which may reflect differences in the intensity of selection on codon usage. Ongoing natural selection on synonymous codon usage should be reflected in the shapes of the site frequency spectra of derived states at polymorphic positions. Specifically, regardless of other demographic effects on the spectrum, the spectrum should be shifted toward higher values for changes from less-preferred to more-preferred codons, and toward lower values for the converse. If the intensity of natural selection is increased, shifts in the site frequency spectra should be more pronounced. A total of 33,729 synonymous polymorphic sites on Chromosome 2 in D. pseudoobscura were analyzed. Shifts in the site frequency spectra are consistent with differential intensity of natural selection on codon usage, with stronger shifts associated with higher codon bias. The shifts, in general, are greater for polymorphic synonymous sites than for polymorphic intron sites, also consistent with natural selection. However, unlike observations in D. melanogaster, codon bias is not reduced in areas of low recombination in D. pseudoobscura; the site frequency spectrum signal for selection on codon usage remains strong in these regions. Diversity, however, is reduced there, as expected. It is possible that estimates of low recombination reflect a recent change in recombination rate.
Keywords: Drosophila pseudoobscura; codon bias; natural selection; recombination; site frequency spectrum
Year: 2014 PMID: 24531731 PMCID: PMC4059240 DOI: 10.1534/g3.114.010488
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Recombination rate along Chromosome 2. Estimates are from independent testcrosses reported in McGaugh .
Codon preference scores
| Codon | Amino Acid | Pref Score | Codon | Amino Acid | Pref Score |
|---|---|---|---|---|---|
| GCC | ala | 0.571 | CTG | leu | 0.697 |
| GCG | ala | 0.076 | CTC | leu | 0.316 |
| GCT | ala | −0.401 | TTG | leu | −0.434 |
| GCA | ala | −0.491 | CTA | leu | −0.443 |
| CGC | arg | 0.501 | CTT | leu | −0.458 |
| CGG | arg | 0.163 | TTA | leu | −0.572 |
| CGT | arg | −0.090 | AAG | lys | 0.700 |
| AGG | arg | −0.188 | AAA | lys | −0.700 |
| CGA | arg | −0.279 | TTC | phe | 0.534 |
| AGA | arg | −0.471 | TTT | phe | −0.534 |
| AAC | asn | 0.461 | CCC | pro | 0.425 |
| AAT | asn | −0.461 | CCG | pro | 0.211 |
| GAC | asp | 0.392 | CCT | pro | −0.404 |
| GAT | asp | −0.392 | CCA | pro | −0.442 |
| TGC | cys | 0.304 | TCC | ser | 0.287 |
| TGT | cys | −0.304 | AGC | ser | 0.270 |
| CAG | gln | 0.656 | TCG | ser | 0.228 |
| CAA | gln | −0.656 | AGT | ser | −0.297 |
| GAG | glu | 0.724 | TCT | ser | −0.351 |
| GAA | glu | −0.724 | TCA | ser | −0.456 |
| GGC | gly | 0.430 | ACC | thr | 0.435 |
| GGG | gly | −0.083 | ACG | thr | 0.204 |
| GGT | gly | −0.222 | ACT | thr | −0.357 |
| GGA | gly | −0.291 | ACA | thr | −0.416 |
| CAC | his | 0.331 | TAC | tyr | 0.421 |
| CAT | his | −0.331 | TAT | tyr | −0.421 |
| ATC | ile | 0.584 | GTG | val | 0.453 |
| ATT | ile | −0.345 | GTC | val | 0.245 |
| ATA | ile | −0.405 | GTA | val | −0.489 |
| | | | GTT | val | −0.492 |
Pref, preference.
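The preference scores can be turned into a per-change quantity, Δpref, by subtracting the score of the ancestral codon from that of the derived codon. The following is a minimal sketch under that assumption; the function names `delta_pref` and `direction` are hypothetical, and only the alanine and lysine scores from the table are included.

```python
# Preference scores copied from the table above (alanine and lysine only).
PREF = {
    "GCC": 0.571, "GCG": 0.076, "GCT": -0.401, "GCA": -0.491,
    "AAG": 0.700, "AAA": -0.700,
}

def delta_pref(ancestral, derived):
    """Change in preference score when `ancestral` is replaced by `derived`."""
    return PREF[derived] - PREF[ancestral]

def direction(ancestral, derived):
    """Label a synonymous change by the sign of its change in preference score.

    'U to P' means a change to a more preferred codon,
    'P to U' a change to a less preferred codon.
    """
    return "U to P" if delta_pref(ancestral, derived) > 0 else "P to U"

# Example: GCT -> GCC raises the score, GCC -> GCT lowers it.
print(round(delta_pref("GCT", "GCC"), 3), direction("GCT", "GCC"))
print(round(delta_pref("GCC", "GCT"), 3), direction("GCC", "GCT"))
```

For the GCT to GCC change this gives 0.972, the value listed for the GC C/T codon pair in the summary table of codon third positions.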
Estimates of synonymous divergence and diversity
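| Amino Acid | Synonymous sites | Synonymous polymorphic sites | Polymorphic sites segregating two codons (recombination rate available) | Divergent synonymous sites | θ (Watterson) | Synonymous divergence per bp |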
|---|---|---|---|---|---|---|
| All | 543,985.0 | 35,376 | 33,729 | 41,360 | 0.022203 | 0.076032 |
| ala | 50,380.0 | 2661 | 2513 | 3126 | 0.018033 | 0.062048 |
| arg | 48,644.0 | 2558 | 2365 | 3047 | 0.017954 | 0.062639 |
| asn | 13,894.3 | 1282 | 1281 | 1621 | 0.031502 | 0.116667 |
| asp | 15,130.0 | 1303 | 1302 | 1835 | 0.029403 | 0.121282 |
| cys | 5,124.6 | 500 | 500 | 576 | 0.033311 | 0.112398 |
| glu | 18,283.6 | 1733 | 1731 | 1936 | 0.032361 | 0.105887 |
| gln | 12,420.7 | 1034 | 1034 | 1151 | 0.028422 | 0.092668 |
| gly | 40,903.0 | 2600 | 2427 | 2932 | 0.021702 | 0.071682 |
| his | 7,051.3 | 593 | 593 | 813 | 0.028713 | 0.115298 |
| ile | 31,135.4 | 1833 | 1782 | 2028 | 0.020100 | 0.065135 |
| leu | 91,350.6 | 5619 | 5139 | 6313 | 0.021001 | 0.069107 |
| lys | 17,437.3 | 1500 | 1500 | 1817 | 0.029370 | 0.104202 |
| phe | 11,413.0 | 1358 | 1356 | 1634 | 0.040624 | 0.143170 |
| pro | 35,924.0 | 2137 | 1975 | 2481 | 0.020310 | 0.069062 |
| ser | 44,903.3 | 2865 | 2735 | 3331 | 0.021784 | 0.074182 |
| thr | 43,553.0 | 2406 | 2257 | 2775 | 0.018861 | 0.063715 |
| tyr | 9,123.6 | 851 | 849 | 1062 | 0.031845 | 0.116401 |
| val | 47,313.0 | 2543 | 2390 | 2882 | 0.018351 | 0.060913 |
Number of synonymous sites in D. pseudoobscura.
Number of synonymous polymorphic sites in D. pseudoobscura.
Number of synonymous polymorphic sites segregating two codons in D. pseudoobscura, and for which a recombination rate estimate is available.
Number of divergent synonymous sites between the reference strain of D. pseudoobscura v2.9 (Richards ) and D. lowei for codons fully resolved in all D. pseudoobscura strains and in D. lowei.
Watterson (1975) estimator of synonymous theta in D. pseudoobscura.
Synonymous divergence per base pair.
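As a worked check on the last two columns, the Watterson estimator per site is θ = S / (a_n · L), where S is the number of polymorphic sites, L the number of sites, and a_n the sum of 1/i for i = 1 to n − 1 in a sample of n chromosomes. The sketch below recomputes the "All" row. The sample size n = 11 is an assumption: it is not stated in this record, and is used here only because it reproduces the reported θ. The function names are hypothetical.

```python
def harmonic(n):
    """a_n = sum of 1/i for i = 1 .. n-1, for a sample of n chromosomes."""
    return sum(1.0 / i for i in range(1, n))

def watterson_theta(segregating, sites, n):
    """Watterson (1975) estimator of theta per site."""
    return segregating / (harmonic(n) * sites)

# ASSUMPTION: n = 11 sampled chromosomes (not given in this record).
N_SAMPLE = 11

# "All" row of the table: 543,985 synonymous sites, 35,376 polymorphic,
# 41,360 divergent.
theta_all = watterson_theta(35_376, 543_985.0, N_SAMPLE)
divergence_all = 41_360 / 543_985.0

print(f"theta = {theta_all:.6f}, divergence = {divergence_all:.6f}")
```

With these inputs the two values agree with the tabulated 0.022203 and 0.076032 to the reported precision; a different sample size would change θ but not the divergence.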
Figure 2. Diversity and codon bias relative to recombination rate. Points are plotted at the upper end of the recombination rate range (e.g., at 0.25 for 0−0.25 cM/Mb); the red point represents sites in regions with recombination rate above 6 cM/Mb. (A) Synonymous diversity measure is the Watterson (1975) estimator of θ. (B) Intron diversity. (C) Codon bias measure is Fop (Sharp and Devine 1989).
Figure 3. Diversity and codon bias along Chromosome 2. (A) Diversity in all recombination map segments. Segments upstream (FLint_upout) and downstream (FLint_dnout) of the recombination map are also shown; for these, there is no corresponding recombination rate estimate. (B) Diversity in segments with at least 5,000 synonymous sites. (C) Codon bias (Fop) in all segments. (D) Codon bias in segments with at least 5,000 synonymous sites. Recombination rates (cM/Mb from the Flagstaff testcross) are shown for reference.
Figure 4. Site frequency spectrum for synonymous polymorphic sites. Shown are sites that segregate two codons and fall within a region for which recombination rate was estimated. “P to U,” a change to a less-preferred codon; “U to P,” a change to a more-preferred codon.
Shifts in site frequency spectra for each amino acid
| Amino Acid | N U→P (all sites) | N P→U (all sites) | Mean U→P (all sites) | Mean P→U (all sites) | P (all sites) | P (all sites) | N U→P (singletons excluded) | N P→U (singletons excluded) | Mean U→P (singletons excluded) | Mean P→U (singletons excluded) | P (singletons excluded) | P (singletons excluded) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ala | 743 | 1770 | 3.355 | 2.267 | <10⁻¹⁵ | <10⁻¹⁵ | 346 | 682 | 4.497 | 3.748 | 3.43 × 10⁻⁷ | 2.54 × 10⁻⁸ |
| arg | 738 | 1627 | 2.976 | 2.410 | 5.78 × 10⁻⁷ | 2.96 × 10⁻⁸ | 358 | 651 | 4.168 | 3.902 | 0.0354 | 0.0177 |
| asn | 553 | 728 | 3.562 | 2.404 | 4.68 × 10⁻¹³ | 4.34 × 10⁻¹⁴ | 279 | 277 | 4.466 | 3.910 | 0.00246 | 0.00178 |
| asp | 618 | 684 | 3.565 | 2.367 | 1.39 × 10⁻¹⁴ | 3.02 × 10⁻¹⁴ | 305 | 259 | 4.662 | 3.811 | 8.26 × 10⁻⁶ | 6.95 × 10⁻⁷ |
| cys | 162 | 338 | 3.525 | 2.346 | 1.43 × 10⁻⁵ | 4.278 × 10⁻⁶ | 81 | 125 | 4.716 | 3.776 | 0.00319 | 8.19 × 10⁻⁴ |
| gln | 232 | 802 | 3.608 | 2.170 | 1.72 × 10⁻¹⁰ | 1.84 × 10⁻¹⁰ | 112 | 304 | 4.875 | 3.612 | 1.66 × 10⁻⁶ | 2.457 × 10⁻⁷ |
| glu | 408 | 1323 | 3.733 | 2.225 | <10⁻¹⁵ | <10⁻¹⁵ | 198 | 467 | 4.768 | 3.777 | 1.20 × 10⁻⁶ | 3.003 × 10⁻⁶ |
| gly | 850 | 1577 | 3.029 | 2.373 | 2.47 × 10⁻⁹ | 4.26 × 10⁻¹² | 409 | 597 | 4.117 | 3.858 | 0.0315 | 0.00371 |
| his | 255 | 338 | 3.333 | 1.976 | 6.14 × 10⁻¹⁰ | 1.08 × 10⁻¹⁰ | 124 | 108 | 4.274 | 3.472 | 0.00281 | 0.00347 |
| ile | 502 | 1280 | 3.171 | 2.221 | 2.46 × 10⁻¹¹ | 5.43 × 10⁻¹² | 243 | 476 | 4.412 | 3.754 | 1.40 × 10⁻⁴ | 2.48 × 10⁻⁴ |
| leu | 1151 | 3988 | 3.581 | 2.217 | <10⁻¹⁵ | <10⁻¹⁵ | 578 | 1452 | 4.490 | 3.766 | 3.58 × 10⁻¹⁰ | 1.56 × 10⁻¹⁰ |
| lys | 335 | 1165 | 3.878 | 2.039 | <10⁻¹⁵ | <10⁻¹⁵ | 170 | 411 | 4.659 | 3.640 | 2.13 × 10⁻⁶ | 1.14 × 10⁻⁵ |
| phe | 372 | 984 | 4.024 | 2.181 | <10⁻¹⁵ | <10⁻¹⁵ | 183 | 360 | 4.787 | 3.728 | 9.97 × 10⁻⁷ | 1.42 × 10⁻⁶ |
| pro | 622 | 1353 | 3.238 | 2.362 | 3.25 × 10⁻¹¹ | 4.14 × 10⁻¹¹ | 304 | 547 | 4.365 | 3.826 | 4.84 × 10⁻⁴ | 4.47 × 10⁻⁴ |
| ser | 799 | 1936 | 3.299 | 2.246 | <10⁻¹⁵ | <10⁻¹⁵ | 384 | 694 | 4.331 | 3.710 | 1.164 × 10⁻⁵ | 2.66 × 10⁻⁵ |
| thr | 691 | 1566 | 3.148 | 2.317 | 8.74 × 10⁻¹² | 3.81 × 10⁻¹² | 340 | 615 | 4.412 | 3.784 | 2.553 × 10⁻⁵ | 1.78 × 10⁻⁴ |
| tyr | 338 | 511 | 3.781 | 2.160 | <10⁻¹⁵ | <10⁻¹⁵ | 195 | 181 | 4.482 | 3.630 | 8.016 × 10⁻⁵ | 4.88 × 10⁻⁴ |
| val | 549 | 1841 | 3.353 | 2.223 | <10⁻¹⁵ | <10⁻¹⁵ | 264 | 693 | 4.598 | 3.755 | 6.314 × 10⁻⁷ | 9.97 × 10⁻⁸ |
N, number of polymorphic sites.
Mean frequency of derived states/site.
P-values are for 1-tailed tests.
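The "Mean" columns above are averages of the derived-state frequency per polymorphic site, computed separately for each direction of change and, in the right-hand block, after dropping singletons. A minimal sketch of that bookkeeping follows. The site list is hypothetical toy data, not values from the study, and `mean_derived` is an illustrative name; "frequency" is taken here to mean the count of derived copies in the sample, which is an assumption.

```python
from statistics import mean

# HYPOTHETICAL toy data: (direction of change, derived-state count in sample).
sites = [
    ("U to P", 1), ("U to P", 4), ("U to P", 7),
    ("P to U", 1), ("P to U", 1), ("P to U", 3),
]

def mean_derived(sites, direction, exclude_singletons=False):
    """Mean derived-state count over sites with the given direction of change.

    With exclude_singletons=True, sites whose derived state occurs once
    in the sample are dropped before averaging.
    """
    counts = [
        count for d, count in sites
        if d == direction and not (exclude_singletons and count == 1)
    ]
    return mean(counts)

# A positive shift means U->P changes segregate at higher frequency.
shift_all = mean_derived(sites, "U to P") - mean_derived(sites, "P to U")
shift_no_singletons = (
    mean_derived(sites, "U to P", exclude_singletons=True)
    - mean_derived(sites, "P to U", exclude_singletons=True)
)
print(shift_all, shift_no_singletons)
```

Excluding singletons removes the class most exposed to sequencing error, which is presumably why the table reports both versions.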
Shifts in site frequency spectra for each amino acid, Monte Carlo analyses
| Amino Acid | P (all sites) | P (all sites) | P (singletons excluded) | P (singletons excluded) |
|---|---|---|---|---|
| ala | 0 | 0 | 0 | 0 |
| arg | 0 | 0 | 0.02667 | 0.01952 |
| asn | 0 | 0 | 0.00238 | 0.00542 |
| asp | 0 | 0 | 0.00001 | 0.00002 |
| cys | 0.00002 | 0.00008 | 0.00273 | 0.00527 |
| gln | 0 | 0 | 0 | 0 |
| glu | 0 | 0 | 0 | 0 |
| gly | 0 | 0 | 0.02055 | 0.01272 |
| his | 0 | 0 | 0.00337 | 0.00969 |
| ile | 0 | 0 | 0.00001 | 0.00002 |
| leu | 0 | 0 | 0 | 0 |
| lys | 0 | 0 | 0 | 0 |
| phe | 0 | 0 | 0 | 0 |
| pro | 0 | 0 | 0.00009 | 0.00008 |
| ser | 0 | 0 | 0 | 0 |
| thr | 0 | 0 | 0 | 0.00001 |
| tyr | 0 | 0 | 0.00007 | 0.00067 |
| val | 0 | 0 | 0 | 0 |
Permutation test of Llopart .
All P-values are for 1-tailed tests.
A reported estimate of 0 indicates that none of 100,000 data permutations led to a higher value of the test statistic.
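A generic one-tailed permutation test of this kind can be sketched as follows. This is not the specific test of Llopart, whose statistic is not described in this record; the statistic assumed here is the difference in mean derived-state frequency between U→P and P→U sites, and the function name `permutation_p` is hypothetical. Following the note above, only permutations giving a strictly higher value than the observed one are counted.

```python
import random

def permutation_p(up, pu, n_perm=10_000, seed=0):
    """One-tailed permutation P-value for mean(up) - mean(pu).

    Direction labels are shuffled across the pooled sites; the P-value is
    the fraction of shuffles giving a strictly higher statistic than observed.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = sum(up) / len(up) - sum(pu) / len(pu)
    pooled = list(up) + list(pu)
    higher = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        a, b = pooled[:len(up)], pooled[len(up):]
        if sum(a) / len(a) - sum(b) / len(b) > observed:
            higher += 1
    return higher / n_perm

# HYPOTHETICAL toy counts: U->P sites at high frequency, P->U sites at low.
p_value = permutation_p([5, 6, 7, 8, 9, 10], [1, 1, 2, 2, 1, 1], n_perm=2_000)
print(p_value)
```

In the toy data no relabeling can exceed the observed difference, so the estimate is 0, which is the situation the note above describes for most amino acids.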
Analysis of variance (site type × direction)
| Base Change | Effect | d.f. | SS | MS | F | P |
|---|---|---|---|---|---|---|
| C↔T | Site type | 1 | 667 | 667 | 107.87 | <10⁻¹⁵ |
| | Direction | 1 | 7556 | 7556 | 1,222.32 | <10⁻¹⁵ |
| | Interaction | 1 | 467 | 467 | 75.62 | <10⁻¹⁵ |
| | Residual | 30,844 | 190,669 | 6.2 | | |
| G↔A | Site type | 1 | 179 | 179 | 29.89 | <10⁻⁷ |
| | Direction | 1 | 6829 | 6829 | 1,137.35 | <10⁻¹⁵ |
| | Interaction | 1 | 282 | 282 | 46.95 | <10⁻¹¹ |
| | Residual | 25,420 | 152,623 | 6.0 | | |
| C↔A | Site type | 1 | 200 | 200 | 37.89 | <10⁻⁹ |
| | Direction | 1 | 1235 | 1235 | 234.06 | <10⁻¹⁵ |
| | Interaction | 1 | 97 | 97 | 18.31 | <10⁻⁴ |
| | Residual | 11,121 | 58,699 | 5.3 | | |
| G↔T | Site type | 1 | 181 | 181 | 34.05 | <10⁻⁸ |
| | Direction | 1 | 1134 | 1134 | 212.96 | <10⁻¹⁵ |
| | Interaction | 1 | 48 | 48 | 9.00 | 0.00271 |
| | Residual | 9,848 | 52,445 | 5.3 | | |
| C↔G | Site type | 1 | 292 | 292 | 46.46 | <10⁻¹¹ |
| | Direction | 1 | 0.00 | 0.00 | 0.00 | 0.993 |
| | Interaction | 1 | 5.91 | 5.91 | 0.94 | 0.332 |
| | Residual | 9,091 | 57,100 | 6.3 | | |
| A↔T | Site type | 1 | 330.8 | 330.8 | 62.78 | <10⁻¹⁴ |
| | Direction | 1 | 2.2 | 2.2 | 0.42 | 0.519 |
| | Interaction | 1 | 11.9 | 11.9 | 2.26 | 0.133 |
| | Residual | 10,981 | 57,867 | 5.3 | | |
d.f., degrees of freedom; SS, sum of squares; MS, mean squares.
Site type can be intron or codon third position.
Direction can be, for example, C→T or T→C.
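Each test statistic in the table is the ratio of an effect mean square to the residual mean square, with MS = SS / d.f. As a worked check, the sketch below recomputes the C↔T values from the tabulated sums of squares; `f_ratio` is a hypothetical helper name, and small differences from the table are expected because the tabulated SS are rounded.

```python
def f_ratio(ss_effect, df_effect, ss_residual, df_residual):
    """F statistic: effect mean square divided by residual mean square."""
    ms_effect = ss_effect / df_effect
    ms_residual = ss_residual / df_residual
    return ms_effect / ms_residual

# C<->T rows of the table: residual SS = 190,669 on 30,844 d.f.
f_site_type = f_ratio(667, 1, 190_669, 30_844)
f_direction = f_ratio(7556, 1, 190_669, 30_844)
f_interaction = f_ratio(467, 1, 190_669, 30_844)

print(round(f_site_type, 2), round(f_direction, 2), round(f_interaction, 2))
```

The recomputed values fall within rounding error of the tabulated 107.87, 1,222.32, and 75.62.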
Figure 5. Shifts in site frequency spectra among codon pairs. (A) Difference in average frequency of derived states/polymorphic site for C/T codon pairs relative to codon usage (i.e., proportion of C-ending codons). (B) Difference in average frequency of derived states/site for C/T codon pairs relative to Δpref for T→C changes. (C, D) Corresponding figures for G/A codon pairs. Letters in legend correspond to single-letter amino acid codes; blue, fourfold degenerate amino acids; light blue, codon pair from fourfold degenerate subclass of sixfold degenerate amino acids; red, codon pair from isoleucine or twofold degenerate subclass of sixfold degenerate amino acids; gold, twofold degenerate amino acids. Dashed lines correspond to linear regression through all points.
Figure 6. Site frequency spectra corrected for sequencing error and ancestral state misassignment (ASM). Expected proportions under a constant-Ne Wright-Fisher neutral model are shown in black; our data, assuming parsimony, are shown in blue. (A) Correction for ASM based on observed levels of diversity and divergence (following Llopart ). (B) Correction for ASM with a 0.1% sequencing error rate. (C) Correction for ASM with a 0.54% error rate.
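Under the standard neutral model with constant population size, the expected number of polymorphic sites at which the derived state occurs i times in a sample of n chromosomes is proportional to 1/i, so the expected proportions follow from normalizing those weights. The sketch below computes them. The sample size n = 11 is an assumption (it is not stated in this record), and `neutral_sfs` is a hypothetical function name.

```python
def neutral_sfs(n):
    """Expected proportions of polymorphic sites with derived count 1 .. n-1
    under a constant-size neutral model (weights proportional to 1/i)."""
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]

# ASSUMPTION: n = 11 sampled chromosomes.
expected = neutral_sfs(11)

# Mean derived count implied by these proportions: (n - 1) / a_n.
neutral_mean = sum(i * p for i, p in enumerate(expected, start=1))

for i, p in enumerate(expected, start=1):
    print(i, round(p, 4))
print("mean derived count:", round(neutral_mean, 3))
```

Observed spectra shifted toward higher counts than these proportions are in the direction expected for favored changes, and spectra shifted toward lower counts in the direction expected for disfavored ones, although demography and ancestral state misassignment can produce similar distortions.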
Summary data for C/T and G/A segregating and fixed different codon third positions
| Codon Pair | Amino Acid | C3 or G3 usage | Δpref (TA→CG) | S/N (C or G ancestral) | S/N (T or A ancestral) | Mean derived frequency (CG→TA:TA→CG) |
|---|---|---|---|---|---|---|
| GC C/T | Ala | 0.757 | 0.972 | 684/24,976 | 193/8,275 | 2.251:3.539 |
| GG C/T | Gly | 0.747 | 0.652 | 574/19,504 | 226/7,027 | 2.336:3.319 |
| CC C/T | Pro | 0.774 | 0.829 | 428/14,444 | 97/4,144 | 2.185:3.557 |
| AC C/T | Thr | 0.724 | 0.792 | 458/15,326 | 104/6,117 | 2.216:3.808 |
| GT C/T | Val | 0.663 | 0.737 | 355/12,959 | 98/7,011 | 2.352:3.622 |
| CG C/T | Arg4 | 0.692 | 0.591 | 509/14,398 | 194/7,044 | 2.369:3.170 |
| CT C/T | Leu4 | 0.728 | 0.774 | 525/14,629 | 116/5,869 | 2.051:3.776 |
| TC C/T | Ser4 | 0.743 | 0.638 | 436/14,149 | 150/5,164 | 1.959:4.093 |
| AT C/T | Ile | 0.601 | 0.929 | 753/21,090 | 241/15,310 | 2.112:3.560 |
| AG C/T | Ser2 | 0.708 | 0.567 | 468/16,090 | 223/7,353 | 2.348:3.090 |
| AA C/T | Asn | 0.552 | 0.922 | 728/21,727 | 553/19,956 | 2.404:3.562 |
| GA C/T | Asp | 0.505 | 0.784 | 684/21,184 | 618/24,206 | 2.367:3.565 |
| TG C/T | Cys | 0.723 | 0.608 | 338/10,916 | 162/4,458 | 2.346:3.525 |
| CA C/T | His | 0.578 | 0.662 | 338/11,373 | 255/9,781 | 1.976:3.333 |
| TT C/T | Phe | 0.629 | 1.064 | 984/20,878 | 372/13,361 | 2.181:4.024 |
| TA C/T | Tyr | 0.621 | 0.842 | 511/15,979 | 338/11,392 | 2.160:3.781 |
| GC G/A | Ala | 0.522 | 0.567 | 278/8,438 | 161/8,691 | 2.392:3.503 |
| GG G/A | Gly | 0.361 | 0.208 | 229/4,821 | 201/9,551 | 2.655:2.960 |
| CC G/A | Pro | 0.557 | 0.653 | 308/8,910 | 181/8,456 | 2.360:3.508 |
| AC G/A | Thr | 0.571 | 0.620 | 408/12,025 | 181/10,085 | 2.368:3.320 |
| GT G/A | Val | 0.849 | 0.942 | 631/23,036 | 113/4,307 | 2.019:3.894 |
| CG G/A | Arg4 | 0.566 | 0.442 | 217/5,828 | 115/5,398 | 2.618:3.104 |
| CT G/A | Leu4 | 0.874 | 1.140 | 959/34,643 | 169/5,649 | 2.088:4.172 |
| TC G/A | Ser4 | 0.730 | 0.684 | 399/12,539 | 102/5,237 | 2.120:3.941 |
| AG G/A | Arg2 | 0.574 | 0.283 | 145/4,236 | 70/3,394 | 2.159:3.171 |
| TT G/A | Leu2 | 0.827 | 0.138 | 334/12,658 | 74/2,788 | 2.135:3.824 |
| GA G/A | Glu | 0.722 | 1.448 | 1323/38,468 | 408/16,383 | 2.225:3.733 |
| CA G/A | Gln | 0.753 | 1.312 | 802/28,876 | 232/10,386 | 2.170:3.608 |
| AA G/A | Lys | 0.723 | 1.400 | 1165/37,359 | 335/14,953 | 2.039:3.878 |
Usage of the codon ending in C or G for a C/T or G/A codon pair, respectively.
Δpref for a substitution of a T- or A-ending codon with the corresponding C- or G-ending codon.
S, frequency of polymorphic sites with C or G as the ancestral state; N, frequency of sites with C or G as the ancestral state; frequencies are reported only for sites that are fully resolved at all three codon positions in all D. pseudoobscura and D. lowei sequences.
N and S, defined as in the preceding note, for sites with T or A as the ancestral state.
Mean frequency of derived states per site; CG→TA, ancestral state ends with either C or G; TA→CG, ancestral state ends with either T or A.