Daniel L Halligan, Fiona Oliver, Adam Eyre-Walker, Bettina Harr, Peter D Keightley.
Abstract
The relative contributions of neutral and adaptive substitutions to molecular evolution have been one of the most controversial issues in evolutionary biology for more than 40 years. The analysis of within-species nucleotide polymorphism and between-species divergence data supports a widespread role for adaptive protein evolution in certain taxa. For example, estimates of the proportion of adaptive amino acid substitutions (α) are 50% or more in enteric bacteria and Drosophila. In contrast, recent estimates of α for hominids have been at most 13%. Here, we estimate α for protein sequences of murid rodents based on nucleotide polymorphism data from multiple genes in a population of the house mouse subspecies Mus musculus castaneus, which inhabits the ancestral range of the Mus species complex, and on nucleotide divergence between M. m. castaneus and M. famulus or the rat. We estimate that 57% of amino acid substitutions in murids have been driven by positive selection. Hominids, therefore, are exceptional in having low apparent levels of adaptive protein evolution. The high frequency of adaptive amino acid substitutions in wild mice is consistent with their large effective population size, leading to effective natural selection at the molecular level. Effective natural selection also manifests itself as a paucity of effectively neutral nonsynonymous mutations in M. m. castaneus compared to humans.
Year: 2010 PMID: 20107605 PMCID: PMC2809770 DOI: 10.1371/journal.pgen.1000825
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Estimates of percentage diversity (θ_π and θ_W) summed over all sites for M. m. castaneus, and estimates of percentage divergence (d) to M. famulus or the rat.
| Site class | θ_π | θ_W | Tajima's D | d (M. famulus) | d (rat) |
| 0-fold | 0.15 [0.019] | 0.21 [0.019] | −0.87 [0.19] | 0.79 [0.12] | 3.5 [0.42] |
| 2-fold | 0.54 [0.053] | 0.67 [0.053] | −0.72 [0.19] | 2.3 [0.26] | 12 [0.61] |
| 4-fold | 0.79 [0.086] | 0.91 [0.086] | −0.49 [0.16] | 3.3 [0.27] | 19 [0.80] |
| Intron | 0.66 [0.049] | 0.83 [0.049] | −0.85 [0.095] | 2.8 [0.15] | 15 [0.49] |
Standard errors are shown in square brackets.
Figure 1. Plots of the site frequency spectra for 0-fold, 2-fold, and 4-fold degenerate and intronic sites.
The upper plot includes all sites, whereas the lower plot is for non-CpG-prone sites only.
Estimates of percentage nucleotide diversity (θ) at 4-fold degenerate sites, the per nucleotide site mutation rate per generation (μ), and recent N_e in M. m. castaneus, African humans, and African D. melanogaster.
| Species | θ (%) | Dataset for θ | μ (×10⁻⁹) | Reference (for μ) | N_e |
| M. m. castaneus | 0.79 | This study | 3.4 | | 580,000 |
| Human | 0.11 | | 25 | | 9,300 |
| D. melanogaster | 1.70 | | 5.8 | | 730,000 |
Estimates of N_e are obtained assuming θ = 4N_eμ.
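The relation between diversity and effective population size used for this table can be checked with a few lines of arithmetic. The sketch below solves θ = 4N_eμ for N_e. It assumes that θ is given as a percentage, as in the caption, and that the mutation rates listed in the table are in units of 10⁻⁹ per site per generation; the function name is illustrative and not taken from the paper.

```python
def effective_population_size(theta_percent: float, mu: float) -> float:
    """Solve theta = 4 * N_e * mu for N_e.

    theta_percent: nucleotide diversity expressed as a percentage.
    mu: mutation rate per nucleotide site per generation
        (assumed here to be the table value times 1e-9).
    """
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    theta = theta_percent / 100.0  # convert percent to a proportion
    return theta / (4.0 * mu)


# Table values for M. m. castaneus and D. melanogaster.
ne_mouse = effective_population_size(0.79, 3.4e-9)
ne_fly = effective_population_size(1.70, 5.8e-9)
print(f"M. m. castaneus: {ne_mouse:,.0f}")
print(f"D. melanogaster: {ne_fly:,.0f}")
```

Because the table reports rounded inputs, values recomputed this way need not match the published N_e figures exactly.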
Estimated percentages of amino acid mutations in different N_es ranges and estimates of α, the fraction of substitutions driven to fixation by positive selection.
| Neutral reference | Dataset | % of mutations, N_es 0–1 | % of mutations, N_es 1–10 | % of mutations, N_es >10 | α | P-value |
| 4-fold sites | M. m. castaneus | 10 [3/18] | 11 [5/17] | 79 [71/90] | 0.57 [0.30/0.76] | – |
| | Human, EGP | 21 [16/28] | 12 [6/18] | 67 [60/73] | 0.13 [−0.18/0.37] | 0.014 |
| | Human, EGP (subset) | 22 [14/33] | 12 [3/25] | 66 [49/77] | −0.045 [−0.73/0.36] | 0.014 |
| | Human, PGA | 25 [19/32] | 15 [7/23] | 60 [53/66] | 0.31 [0.055/0.50] | 0.11 |
| | Human | 21 [20/24] | 12 [10/14] | 66 [66/67] | 0.21 [0.12/0.27] | 0.020 |
| | D. melanogaster | 6 [4/7] | 7 [5/9] | 87 [85/89] | 0.52 [0.39/0.62] | 0.54 |
| Intronic sites | M. m. castaneus | 14 [6/23] | 10 [3/15] | 76 [70/82] | 0.45 [0.063/0.71] | – |
| | Human, EGP | 27 [19/36] | 13 [4/20] | 60 [53/67] | 0.034 [−0.36/0.34] | 0.11 |
| | Human, EGP (subset) | 32 [21/40] | 4 [3/15] | 64 [53/72] | −0.44 [−0.85/0.13] | 0.006 |
| | Human, PGA | 32 [25/40] | 14 [7/22] | 53 [47/59] | 0.13 [−0.11/0.34] | 0.16 |
Estimates are obtained assuming either 4-fold degenerate sites or intronic sites as the neutral reference. P-values correspond to the comparison of α for each species (other than M. m. castaneus) with M. m. castaneus. Data analysed are: M. m. castaneus, this study, contrasted with M. famulus. African EGP polymorphism data set [28] contrasted with macaque. EGP subset refers to the set of gene orthologs sequenced in M. m. castaneus. African PGA data set [33] contrasted with macaque. African American population polymorphism data from Boyko et al. [20] contrasted with chimpanzee. African D. melanogaster polymorphism data [11] contrasted with D. simulans. The fitnesses of the wild-type, heterozygote, and mutant homozygote genotypes are assumed to be 1, 1−s/2, and 1−s, respectively. 95% confidence intervals are shown in square brackets.
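To make the fitness scheme and the three selection-strength classes concrete, the sketch below computes the genotype fitnesses 1, 1−s/2, and 1−s and assigns a mutation to the 0–1, 1–10, or >10 class from the product of effective population size and s. It is only an illustration of the bookkeeping, not the inference method used for the table; the function names, the handling of values that fall exactly on a class boundary, and the example value of s are assumptions.

```python
def genotype_fitnesses(s: float) -> tuple[float, float, float]:
    """Fitness of wild-type, heterozygote, and mutant homozygote."""
    return (1.0, 1.0 - s / 2.0, 1.0 - s)


def selection_class(n_e: float, s: float) -> str:
    """Assign a deleterious mutation to a class by the product N_e * s.

    Boundary values (exactly 1 or 10) are placed in the lower class;
    that choice is an assumption made for this sketch.
    """
    strength = n_e * s
    if strength < 0:
        raise ValueError("N_e * s must be non-negative for this sketch")
    if strength <= 1:
        return "0-1"
    if strength <= 10:
        return "1-10"
    return ">10"


# Hypothetical selection coefficient, evaluated at two population sizes.
s_example = 1e-5
print(genotype_fitnesses(s_example))
print(selection_class(9_300, s_example))    # smaller population
print(selection_class(580_000, s_example))  # larger population
```

The same selection coefficient can fall into the effectively neutral class in a small population and into a more strongly selected class in a large one, which is the pattern the comparison between humans and M. m. castaneus is concerned with.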
P-values for contrast between estimated frequencies of nearly neutral (N_es < 1) mutations and strongly deleterious (N_es > 10) mutations between M. m. castaneus and human datasets.
| Neutral reference | Dataset | P-value, N_es < 1 | P-value, N_es > 10 |
| 4-folds | EGP | 0.02 | 0.026 |
| | EGP (subset) | 0.038 | 0.068 |
| | PGA | 0.01 | 0.002 |
| | Boyko et al. | 0.014 | 0.008 |
| Introns | EGP | 0.036 | 0.002 |
| | EGP (subset) | 0.02 | 0.028 |
| | PGA | 0.006 | <0.002 |
P-values are calculated separately for analyses involving 4-fold degenerate synonymous sites or introns as the neutral reference.
Estimates of α, the fraction of substitutions driven to fixation by positive selection, based on the inferred distribution of effects.
| Site type | Outgroup | Neutral reference | α |
| All | M. famulus | 4-fold | 0.57 [0.30/0.76] |
| | | intron | 0.45 [0.063/0.71] |
| | Rat | 4-fold | 0.44 [0.13/0.69] |
| | | intron | 0.33 [−0.11/0.68] |
| Non-CpG-prone | M. famulus | 4-fold | 0.54 [0.0036/0.76] |
| | | intron | 0.56 [−0.12/0.88] |
| | Rat | 4-fold | 0.37 [−0.36/0.65] |
| | | intron | 0.51 [−0.044/0.86] |
Estimates are calculated for different classes of sites and using either rat or M. famulus as the outgroup. 95% confidence intervals are shown in square brackets.
Figure 2. Proportion of ancestry assigned to population 1.
Assuming K = 2, using 82 unlinked SNP loci at 4-fold degenerate sites (A) or 84 unlinked SNP loci at intronic sites (B). Each column represents one of the 15 M. m. castaneus individuals.
Estimates of the fraction of substitutions driven to fixation by positive selection obtained using a simple extension of the McDonald-Kreitman test [7].
| Site type | Outgroup | Neutral reference | α_FWW | α_FWW>10% |
| All | M. famulus | 4-fold | 0.14 [0.13] | 0.35 [0.15] |
| | | intron | 0.0020 [0.18] | 0.21 [0.22] |
| | Rat | 4-fold | −0.067 [0.14] | 0.22 [0.14] |
| | | intron | −0.22 [0.16] | 0.076 [0.18] |
| Non-CpG-prone | M. famulus | 4-fold | 0.081 [0.23] | 0.40 [0.23] |
| | | intron | −0.077 [0.26] | 0.44 [0.27] |
| | Rat | 4-fold | −0.094 [0.19] | 0.18 [0.24] |
| | | intron | −0.13 [0.21] | 0.26 [0.26] |
Estimates are calculated using all sites (α_FWW) and using only sites with variants >10% (α_FWW>10%) for different classes of sites and using either rat or M. famulus as the outgroup. Standard errors are shown in square brackets.
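For readers unfamiliar with McDonald-Kreitman-type estimators, the following sketch shows the general form of the calculation: α is one minus the ratio of selected to neutral polymorphism, scaled by the ratio of neutral to selected divergence, with an optional frequency cutoff that drops rare variants. This is a generic illustration under stated assumptions, not the exact procedure of reference [7] or of the paper; all counts and allele frequencies below are hypothetical, and the function names are invented for this example.

```python
from typing import Sequence


def mk_alpha(d_sel: float, d_neu: float, p_sel: float, p_neu: float) -> float:
    """alpha = 1 - (d_neu * p_sel) / (d_sel * p_neu).

    d_*: divergence counts; p_*: polymorphism counts, for the selected
    class (e.g. nonsynonymous) and the neutral reference class.
    """
    if d_sel <= 0 or p_neu <= 0:
        raise ValueError("selected divergence and neutral polymorphism must be positive")
    return 1.0 - (d_neu * p_sel) / (d_sel * p_neu)


def alpha_with_cutoff(
    d_sel: float,
    d_neu: float,
    sel_freqs: Sequence[float],
    neu_freqs: Sequence[float],
    min_freq: float = 0.0,
) -> float:
    """Count only polymorphisms whose variant frequency exceeds min_freq."""
    p_sel = sum(1 for f in sel_freqs if f > min_freq)
    p_neu = sum(1 for f in neu_freqs if f > min_freq)
    return mk_alpha(d_sel, d_neu, p_sel, p_neu)


# Hypothetical data, pooled across genes.
d_selected, d_neutral = 60, 100
selected_freqs = [0.03, 0.05, 0.07, 0.20, 0.40]
neutral_freqs = [0.05, 0.12, 0.15, 0.20, 0.25, 0.30, 0.35, 0.45, 0.50, 0.60]

alpha_all = alpha_with_cutoff(d_selected, d_neutral, selected_freqs, neutral_freqs)
alpha_common = alpha_with_cutoff(
    d_selected, d_neutral, selected_freqs, neutral_freqs, min_freq=0.10
)
print(f"alpha, all variants: {alpha_all:.3f}")
print(f"alpha, variants >10%: {alpha_common:.3f}")
```

In this made-up example the selected class is enriched for rare variants, so removing variants at or below 10% raises the estimate. Slightly deleterious mutations that segregate at low frequency would have this kind of effect, and the table shows the same direction of change between the two columns.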