Peter Andolfatto, Karen M Wong, Doris Bachtrog.
Abstract
The prevalence of natural selection relative to genetic drift is of central interest in evolutionary biology. Depending on the distribution of fitness effects of new mutations, the importance of these evolutionary forces may differ in species with different effective population sizes. Here, we survey population genetic variation at 105 orthologous X-linked protein coding regions in Drosophila melanogaster and its sister species D. simulans, two closely related species with distinct demographic histories. We observe significantly higher levels of polymorphism and evidence for stronger selection on codon usage bias in D. simulans, consistent with a larger historical effective population size on average for this species. Despite these differences, we estimate that <10% of newly arising nonsynonymous mutations have deleterious fitness effects in the nearly neutral range (i.e., -10 < N_e s < 0) in both species. The inferred distributions of fitness effects and demographic models translate into surprisingly high estimates of the fraction of "adaptive" protein divergence in both species (~85-90%). Despite evidence for different demographic histories, differences in population size have apparently played little role in the dynamics of protein evolution in these two species, and estimates of the adaptive fraction (α) of protein divergence in both species remain high even if we account for recent 10-fold growth. Furthermore, although several recent studies have noted strong signatures of recurrent adaptive protein evolution at genes involved in immunity, reproduction, sexual conflict, and intragenomic conflict, our finding of high levels of adaptive protein divergence at randomly chosen proteins (with respect to function) suggests that many other factors likely contribute to the adaptive protein divergence signature in Drosophila.
Year: 2010 PMID: 21173424 PMCID: PMC3038356 DOI: 10.1093/gbe/evq086
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Figure: Levels of diversity at orthologous loci in D. melanogaster and D. simulans. Locus-by-locus estimates of (A) average pairwise diversity, π, and (B) the population mutation rate, θ. Panels A and B show P-values for the hypothesis that Dmel = Dsim using Wilcoxon matched-pairs signed-rank tests. Filled circles indicate 4-fold synonymous sites (Syn4f, 105 loci) and grey squares indicate short introns (21 loci). In both cases, diversity estimates are significantly positively correlated in the two species (two-tailed Spearman rank correlation test P-values are given in panel A). (C) Lineage-specific 4-fold synonymous divergence (ds_4f, open circles) is not strongly correlated between species. Lineage-specific 0-fold nonsynonymous divergence (dn_0f, filled circles) is strongly correlated in the two species. P-values are from two-tailed Spearman rank correlation tests. (D) Synonymous site diversity (π) is negatively correlated with nonsynonymous divergence per site (dn) in both species (Dmel, filled circles; Dsim, open boxes). Synonymous site diversity estimates (π) have been corrected for ds using partial regression (Andolfatto 2007). The lines (black = Dmel; grey = Dsim) indicate a lowess fit to the data and P-values are from two-tailed Spearman rank correlation tests.
Summary of Diversity Levels at Homologous Loci in Drosophila melanogaster and D. simulans
| Measure (Sites) | # Sites | Dmel | Dsim | Dsim/Dmel | P |
| π (long introns) | 3,849/3,754 | 1.61 | 1.22 | 0.76 | >0.1 |
| θ (long introns) | 3,849/3,754 | 1.69 | 2.17 | 1.29 | >0.1 |
| π (Syn4f) | 11,048 | 2.21 | 2.19 | 1.33 | 0.046 |
| θ (Syn4f) | 11,048 | 2.41 | 3.45 | 1.43 | 2.6 × 10⁻⁷ |
| π (short intron) | 1,167 | 2.21 | 2.94 | 1.33 | 0.046 |
| θ (short intron) | 1,167 | 2.39 | 3.79 | 1.59 | 3.4 × 10⁻³ |
| π (Nonsyn0f) | 42,629 | 0.12 | 0.17 | 1.38 | 0.024 |
| θ (Nonsyn0f) | 42,629 | 0.19 | 0.28 | 1.51 | 3.3 × 10⁻⁴ |
Weighted average × 100.
P value determined by a Wilcoxon matched-pairs signed-rank test.
Previously published data (Glinka et al. 2003; Haddrill et al. 2008).
Number of sites in D. melanogaster and D. simulans, respectively.
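The two diversity measures in the table above, π and θ, can be sketched in a few lines. The following is a minimal illustration only, assuming aligned, gap-free sequences of equal length and taking θ to be Watterson's estimator; the function names and the toy alignment are hypothetical and are not taken from the study.

```python
from itertools import combinations


def pairwise_diversity(seqs):
    """Average number of pairwise differences per site (pi)."""
    n_sites = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * n_sites)


def watterson_theta(seqs):
    """Watterson's estimator of theta per site: S / (a_n * L)."""
    n = len(seqs)
    n_sites = len(seqs[0])
    # A site is segregating if more than one state is observed in the column.
    segregating = sum(len(set(column)) > 1 for column in zip(*seqs))
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating / (a_n * n_sites)


# Hypothetical toy alignment: 4 sequences, 10 sites, 2 segregating sites.
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGTACGTAC"]
pi = pairwise_diversity(toy)
theta = watterson_theta(toy)
print(pi, theta)
```

Both values are per site; the table reports weighted averages multiplied by 100.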
Mean Frequencies (Number) of Polymorphisms by Class
| Site Class | Dmel | Dsim |
| Syn (4f) | 0.248 (911) | 0.138 |
| No change (4f) | 0.131 | |
| PU(4f) | 0.234 | 0.123 |
| UP(4f) | 0.201 | |
| Nonsyn (0f) | 0.179 | 0.136 |
| Intron (short) | 0.175 (157) |
NOTE.—The expected frequency under neutral equilibrium is 0.268. All classes are significantly lower than expected under neutrality (P < 0.01, by simulations), except those that have been underlined in D. melanogaster.
Significantly lower than intron (Syn: P = 0.027; no change: P = 7.8 × 10⁻⁵; Nonsyn: P = 3 × 10⁻⁴).
Significantly lower than no change (P = 0.01, two-tailed Wilcoxon test).
Significantly lower than the no change class (P = 0.04).
Significantly higher than the no change class (P = 2.4 × 10⁻⁵).
Significantly lower than the no change class (P = 0.04).
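The neutral expectation quoted in the note can be illustrated with the standard neutral site frequency spectrum, in which the expected number of sites with derived-allele count i in a sample of n chromosomes is proportional to 1/i. The sketch below assumes unfolded (derived) frequencies and a hypothetical sample size of n = 20; neither assumption is stated in the table, and the note says the expectation in the study was obtained by simulation.

```python
def neutral_mean_derived_frequency(n):
    """Mean derived-allele frequency at segregating sites in a sample of n
    chromosomes, under the standard neutral model (counts weighted by 1/i)."""
    counts = range(1, n)
    weights = [1.0 / i for i in counts]
    weighted_freqs = sum((i / n) * w for i, w in zip(counts, weights))
    return weighted_freqs / sum(weights)


# n = 20 is an assumed sample size chosen for illustration only.
expected = neutral_mean_derived_frequency(20)
print(round(expected, 3))
```

The result simplifies to (n − 1) / (n · a_n), where a_n is the harmonic sum of 1/i for i from 1 to n − 1, so it depends only on the sample size.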
Constraint on Proteins on the X Chromosome of Drosophila melanogaster and D. simulans
| Measure (Sites) | Dmel | Dsim | Dsim/Dmel | P |
| d | 0.153 | 0.279 | 1.83 | 0.09 |
| d | 0.92 | 1.00 | 1.08 | >0.1 |
| d | 7.02 | 4.86 | 0.69 | 4.7 × 10⁻⁵ |
| π(Nonsyn0f)/π(Syn4f) | 0.096 | 0.108 | 1.13 | >0.1 |
| θw(Nonsyn0f)/θw(Syn4f) | 0.097 | 0.102 | 1.06 | >0.1 |
Note.—A Jukes–Cantor correction has been applied to dn and ds.
Weighted averages across 104 loci for which π, θ, or d > 0 in both Dmel and Dsim.
P value determined by a Wilcoxon matched-pairs signed-rank test.
Excluding one locus with no synonymous divergence in Dmel.
Excluding four loci with no synonymous polymorphism (two in Dmel and two in Dsim).
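The Jukes–Cantor correction mentioned in the note converts an observed proportion of differing sites, p, into an estimated number of substitutions per site, d = −(3/4) ln(1 − 4p/3). A minimal sketch follows; the function name and the example input are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected divergence for an observed proportion p of
    differing sites. Defined only for 0 <= p < 0.75."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical example: 5% observed differences.
corrected = jukes_cantor(0.05)
print(corrected)
```

At low divergence the correction is small, and it grows as p increases because multiple hits at the same site become more likely.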
Figure: The inferred distribution of fitness effects of newly arising nonsynonymous mutations in the D. melanogaster (Dmel) and D. simulans (Dsim) lineages. The reference sites used for demographic inference are given in parentheses. Values in each category of Ne are calculated by integrating a gamma distribution with parameters in Supplement S2.1, where Ne is the weighted average of population size along the lineage. The estimates of Keightley and Eyre-Walker (2007) using the Zimbabwe subsample of D. melanogaster data of Shapiro et al. (2007) are shown for comparison (white bars). 95% confidence limits are based on 200 replicate bootstraps of the data by locus with replacement (see Methods).
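Bootstrapping by locus with replacement, as described in the caption, can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the statistic (a weighted average of per-locus counts), the percentile method for the interval, the toy data, and all names are hypothetical.

```python
import random


def weighted_average(loci):
    """Ratio of summed counts to summed sites across loci."""
    return sum(d for d, _ in loci) / sum(s for _, s in loci)


def bootstrap_ci_by_locus(loci, statistic, n_reps=200, seed=0):
    """Percentile 95% interval from resampling whole loci with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        # Draw as many loci as in the original data, with replacement.
        resample = [rng.choice(loci) for _ in loci]
        estimates.append(statistic(resample))
    estimates.sort()
    lower = estimates[int(0.025 * n_reps)]
    upper = estimates[int(0.975 * n_reps) - 1]
    return lower, upper


# Hypothetical (differences, sites) pairs for five loci.
toy_loci = [(3, 120), (1, 95), (6, 210), (0, 80), (4, 150)]
low, high = bootstrap_ci_by_locus(toy_loci, weighted_average)
print(low, high)
```

Resampling whole loci rather than individual sites keeps linked sites together, which is the usual motivation for a by-locus bootstrap.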
Estimates of N × E(s) in Drosophila melanogaster and D. simulans
| Species | Selected Sites | Reference Sites | N × E(s) | 95% CI |
| Dmel | Nonsyn0f | Syn4f | 1,202 | 468–3,686 |
| Dmel | Nonsyn0f | Short intron | 912 | 229–12,914 |
| Dmel | Syn4f | Short intron | 0.13 | 0.0–85 |
| Dmel | PU_Syn4f | Short intron | <0.1 | 0.0–0.4 |
| Dsim | Nonsyn0f | Syn4f | 8,601 | 1,521–453,800 |
| Dsim | Nonsyn0f | Short intron | 1,787 | 317–32,338 |
| Dsim | Syn4f | Short intron | 2.7 | 0.7–6.8 |
| Dsim | PU_Syn4f | Short intron | 2.9 | 1.0–4.0 |
Note.—N × E(s) is the estimated mean selection coefficient scaled by the weighted average of N under the estimated demographic model (see supplementary S2, Supplementary Material online).
Figure: Estimates of the fraction of nonsynonymous divergence in excess of neutral expectations (α) in the D. melanogaster (black) and D. simulans (gray) lineages. B&EW: method of Bierne & Eyre-Walker (2004); FWW01: method of Fay et al. (2001); EW&K09: method of Eyre-Walker and Keightley (2009); 4f: four-fold synonymous sites; 0f: nondegenerate nonsynonymous sites; intron: short introns. The Keightley and Eyre-Walker (2007) estimates for the Zimbabwe subsample of the Shapiro et al. (2007) data set are shown in panel B. Note that the latter estimates use D. melanogaster–D. simulans divergence, rather than lineage-specific divergence.
Figure: Analysis of 4-fold synonymous sites by codon change class. (A) Relative diversity in the two species. Paired Wilcoxon test P-value levels for the hypothesis of equal diversity in Dmel and Dsim (dashed line) are: *P<2e-4; **P<2e-5; ***P<2e-7. (B) The ratio of divergence to polymorphism (D/P). Mantel-Haenszel test (with continuity correction) P-values versus the UU+PP (no change) class are P=8.5e-5 for Dmel and P=2.2e-6 for Dsim. All of the same patterns are evident when all synonymous sites are used (not shown).
Private and Shared Derived Mutations at 4-Fold and 0-Fold Sites
| Mutation Class | Dmel | Dsim | Expected (95% CI) |
| 4-fold Syn | |||
| Private polymorphisms | 830 | 1,271 | — |
| Shared polymorphisms | 80 | 80 | 47 (35–60) |
| Private fixations | 380 | 255 | — |
| Private derived | 1,210 | 1,472 | — |
| Shared derived | 158 | 158 | 84 (66–104) |
| 0-fold Nonsyn | |||
| Private polymorphisms | 239 | 353 | — |
| Shared polymorphisms | 6 | 6 | 1.5 (0–4) |
| Private fixations | 252 | 248 | — |
| Private derived | 491 | 601 | — |
| Shared derived | 15 | 15 | 6.3 (2–12) |
Note.—Based on the analysis of 10,603 4-fold sites. Expected numbers of shared mutations due to multiple hits are based on 104 simulated replicates (see Materials and Methods).
The sum of polymorphic and fixed mutations specific to one lineage.
All mutations found in both lineages, including those that are polymorphic in one lineage but fixed in the other.
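As a rough illustration of how counts like those in the table relate to the adaptive fraction α, the private polymorphism and fixation counts can be plugged into the simple McDonald-Kreitman-style estimator α = 1 − (Ds · Pn) / (Dn · Ps). This is not one of the methods used in the source (which cites Bierne & Eyre-Walker 2004, Fay et al. 2001, and Eyre-Walker and Keightley 2009), and it does not correct for slightly deleterious polymorphisms or for demography, so it need not match the reported estimates. The sketch assumes the first count column is Dmel and the second is Dsim; the function name is hypothetical.

```python
def mk_alpha(dn, ds, pn, ps):
    """Simple McDonald-Kreitman-style estimate of the adaptive fraction:
    alpha = 1 - (Ds * Pn) / (Dn * Ps)."""
    return 1.0 - (ds * pn) / (dn * ps)


# Private fixations (D) and private polymorphisms (P) from the table;
# n = 0-fold nonsynonymous, s = 4-fold synonymous.
# Column-to-species assignment is an assumption.
alpha_dmel = mk_alpha(dn=252, ds=380, pn=239, ps=830)
alpha_dsim = mk_alpha(dn=248, ds=255, pn=353, ps=1271)
print(round(alpha_dmel, 3), round(alpha_dsim, 3))
```

Because low-frequency deleterious nonsynonymous polymorphisms inflate Pn, this uncorrected estimator tends to understate α relative to methods that model the distribution of fitness effects.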
Figure: The effect of weak selection on the expected relative levels of diversity in two species with different population sizes. The x-axis corresponds to the intensity of selection in the species with the smaller population size (N1). The y-axis plots expected relative levels of diversity in the two species. In red and purple are relative θW and π, respectively, in species with a 1.5-fold difference in population size. In blue and green are analogous expectations for a 2-fold difference in population size. Graphs are based on simulations of the Poisson Random Field model (Sawyer and Hartl 1992) using code kindly provided by C. Bustamante.