Adam R Boyko, Scott H Williamson, Amit R Indap, Jeremiah D Degenhardt, Ryan D Hernandez, Kirk E Lohmueller, Mark D Adams, Steffen Schmidt, John J Sninsky, Shamil R Sunyaev, Thomas J White, Rasmus Nielsen, Andrew G Clark, Carlos D Bustamante.
Abstract
Quantifying the distribution of fitness effects among newly arising mutations in the human genome is key to resolving important debates in medical and evolutionary genetics. Here, we present a method for inferring this distribution using Single Nucleotide Polymorphism (SNP) data from a population with non-stationary demographic history (such as that of modern humans). Application of our method to 47,576 coding SNPs found by direct resequencing of 11,404 protein-coding genes in 35 individuals (20 European Americans and 15 African Americans) allows us to assess the relative contribution of demographic and selective effects to patterning amino acid variation in the human genome. We find evidence of an ancient population expansion in the sample with African ancestry and a relatively recent bottleneck in the sample with European ancestry. After accounting for these demographic effects, we find strong evidence for great variability in the selective effects of new amino acid-replacing mutations. In both populations, the patterns of variation are consistent with a leptokurtic distribution of selection coefficients (e.g., gamma or log-normal) peaked near neutrality. Specifically, we predict 27-29% of amino acid-changing (nonsynonymous) mutations are neutral or nearly neutral (|s|<0.01%), 30-42% are moderately deleterious (0.01%<|s|<1%), and nearly all the remainder are highly deleterious or lethal (|s|>1%). Our results are consistent with 10-20% of amino acid differences between humans and chimpanzees having been fixed by positive selection, with the remainder of differences being neutral or nearly neutral. Our analysis also predicts that many of the alleles identified via whole-genome association mapping may be selectively neutral or (formerly) positively selected, implying that deleterious genetic variation affecting disease phenotype may be missed by this widely used approach for mapping genes underlying complex traits.
Year: 2008 PMID: 18516229 PMCID: PMC2377339 DOI: 10.1371/journal.pgen.1000083
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Maximum likelihood estimates of proposed distributions of deleterious fitness effects of new nonsynonymous mutations.
| AFRICAN (observed fixed differences = 22,180) | |||||
| model | ΔLL | # fixed | df | distribution | MLE (95% C.I.) |
| neutral | 8536.2 | 86,897 | 0 | Pr(γ = 0) = 1 | |
| fixed (pt mass) | 2801.4 | 4,581 | 1 | Pr(γ = k) = 1 | k = −7.324 (−7.86, −6.81) |
| exponential | 757.1 | 7,894 | 1 | Pr(γ = −x) = EXP(λ) | λ = 0.0365 (0.0336, 0.0400) |
| neutral+lethal | 136.0 | 28,016 | 1 | Pr(γ = 0) = p0; Pr(γ = −∞) = 1−p0 | p0 = 0.3224 (0.314, 0.331) |
| normal | 225.9 | 65,078 | 2 | Pr(γ = x) = NORM(μ,σ) | μ = −38.5 (−43.5, −34.0), σ = 28.6 (25.0, 32.5) |
| pt mass+lethal | 44.0 | 17,878 | 2 | Pr(γ = k) = p; Pr(γ = −∞) = 1−p | p = 0.372 (0.358, 0.387), k = −1.79 (−2.12, −1.45) |
| exponential+lethal | 28.6 | 17,754 | 2 | Pr(γ = −x) = p*EXP(λ); Pr(γ = −∞) = 1−p | λ = 0.373 (0.298, 0.464), p = 0.392 (0.375, 0.415) |
| exponential+neutral | 7.0 | 22,133 | 2 | Pr(γ = 0) = p0; Pr(γ = −x) = (1−p0)*EXP(λ) | λ = 0.0048 (0.0037, 0.0061), p0 = 0.245 (0.231, 0.256) |
| lognormal | 5.1 | 19,812 | 2 | Pr(γ = −x) = LOGNORM(μ, σ) | μ = 5.02 (4.55, 5.50), σ = 5.94 (5.18, 6.74) |
| gamma | 3.7 | 20,113 | 2 | Pr(γ = −x) = GAMMA(α, β) | α = 0.184 (0.158, 0.206), β = 8200 (3500, 20300) |
| neutral+pt mass+lethal | 3.8 | 21,335 | 3 | Pr(γ = 0) = p0; Pr(γ = k) = p; Pr(γ = −∞) = 1−p0−p | p0 = 0.245 (0.222, 0.266), p = 0.208 (0.176, 0.294), k = −13.3 (−8.8, −25.3) |
| neutral+gamma | 2.9 | 20,758 | 3 | Pr(γ = 0) = p0; Pr(γ = −x) = (1−p0)*GAMMA(α,β) | p0 = 0.148 (0.0, 0.235), α = 0.344 (0.178, 0.790), β = 1900 (280, 12300) |
| neutral+exponential+lethal | 2.7 | 20,956 | 3 | Pr(γ = 0) = p0; Pr(γ = −x) = (1−p0−p)*EXP(λ); Pr(γ = −∞) = p | λ = 0.02818 (0.0085, 0.072), p0 = 0.2176 (0.178, 0.245), p = 0.4525 (0.200, 0.548) |
| normal+lethal | – | 24,300 | 3 | Pr(γ = x) = p*NORM(μ,σ); Pr(γ = −∞) = 1−p | p = 0.428 (0.406, 0.458), μ = −4.44 (−5.40, −3.55), σ = 5.44 (4.4, 6.5) |
ML estimates and predicted number of human-chimp fixed differences under each model, computed after applying the demographic correction. Distributions are in terms of the scaled selection coefficient, γ = 2Ns, where N is 25,636 in African Americans and 52,907 in European Americans. ΔLL is the log-likelihood difference between the model and the overall best-fit model for the population; # fixed is the number of nonsynonymous fixed differences predicted by the model. Approximate 95% confidence intervals based on a semi-parametric bootstrap are reported for African American parameter estimates.
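Because the table reports every parameter on the scaled axis, it can help to convert back to an unscaled selection coefficient. The following is a minimal Python sketch, assuming the usual scaling γ = 2Ns and the two population sizes quoted in the footnote; the function and constant names are illustrative and do not come from the paper.

```python
# Population sizes quoted in the table footnote.
N_AFRICAN_AMERICAN = 25_636
N_EUROPEAN_AMERICAN = 52_907


def gamma_to_s(gamma, n_eff):
    """Unscaled selection coefficient s, assuming gamma = 2 * N * s."""
    return gamma / (2.0 * n_eff)


def s_to_gamma(s, n_eff):
    """Scaled selection coefficient gamma, assuming gamma = 2 * N * s."""
    return 2.0 * n_eff * s


# Point-mass model from the table: every new mutation has gamma = k = -7.324.
s_point_mass = gamma_to_s(-7.324, N_AFRICAN_AMERICAN)

# The |s| = 0.01% boundary used in the abstract, on the scaled axis.
gamma_nearly_neutral = s_to_gamma(1e-4, N_AFRICAN_AMERICAN)

print(f"s at k = -7.324: {s_point_mass:.2e}")
print(f"gamma at |s| = 1e-4: {gamma_nearly_neutral:.3f}")
```

Under this scaling the same γ corresponds to a smaller |s| in the European American sample, because its N is roughly twice as large.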
Figure 1. Simulation of demographic and selective parameter estimates with and without linkage.
Simulation results for ML estimate of demographic and selective parameters assuming African American demography (τ = 0.1328, ω = 0.3034) and gamma distribution of fitness effects (α = 0.184, β = 8200). Sample sizes and mutation rates are the same as those in the African American data projected down to N = 24 chromosomes. Each panel represents 100 replicates; actual values shown with black dashed lines. (A) Simulations without linkage; each entry of the site-frequency spectrum is a Poisson variate drawn with the mean being that expected under the demographic model (synonymous sfs) or demography+selection model (nonsynonymous sfs). (B) Simulations with linkage; each entry calculated from a simulation of 11,404 genes, each with 7 linked exons (see Methods). (C) Distribution of inferred values for unlinked (blue) and linked (red) simulations.
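The unlinked simulations in panel (A) can be sketched in a few lines of Python: each site-frequency class is an independent Poisson draw around its expected count. This is only an illustration. The expected spectrum used here is the standard neutral, constant-size expectation θ/i, a placeholder rather than the paper's demographic or selection expectation, and the value of θ is hypothetical.

```python
import random


def poisson_draw(mean, rng):
    """Exact Poisson draw: count unit-rate exponential arrivals before `mean`."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def simulate_unlinked_sfs(expected_sfs, rng):
    """One replicate: every frequency class is an independent Poisson variate."""
    return [poisson_draw(mean, rng) for mean in expected_sfs]


def placeholder_expected_sfs(theta, n_chromosomes):
    """ASSUMPTION: standard neutral expectation theta / i, for i = 1..n-1.

    The paper instead uses expectations under its fitted demographic model
    (synonymous) or demography+selection model (nonsynonymous).
    """
    return [theta / i for i in range(1, n_chromosomes)]


rng = random.Random(42)
expected = placeholder_expected_sfs(theta=1000.0, n_chromosomes=24)  # theta is hypothetical
replicates = [simulate_unlinked_sfs(expected, rng) for _ in range(100)]

mean_singletons = sum(rep[0] for rep in replicates) / len(replicates)
print(f"mean simulated singleton count over 100 replicates: {mean_singletons:.1f}")
```

Swapping `placeholder_expected_sfs` for a function that returns model-based expectations would leave the sampling step unchanged.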
Figure 2. Observed and expected nonsynonymous site-frequency spectra after demographic correction.
Expected site-frequency spectra under best-fit selection models after demographic correction. Note the logarithmic scale of the y-axis. (A) African American replacement SNPs versus expectation under neutrality, fixed selective effects, and gamma distribution of fitness effects. (B) European American replacement SNPs versus expectation under neutrality, fixed selective effects, and gamma distribution of fitness effects.
Distribution of fitness effects under various best-fit selective models.
| Population ancestry | African | African | European | European | European | African | European | European |
| Demographic model | expansion | expansion | complex bottleneck | complex bottleneck | complex bottleneck | stationary | stationary | simple bottleneck |
| Selection model | gamma | lognormal | gamma | lognormal | exp+neut | gamma | gamma | gamma |
| \|s\|<0.0001 | 27.9% | 28.4% | 24.3% | 24.6% | 25.0% | 22.2% | 35.7% | 23.5% |
| 0.0001<\|s\|<0.001 | 14.7% | 14.3% | 14.7% | 15.7% | 0.9% | 48.2% | 38.7% | 16.1% |
| 0.001<\|s\|<0.01 | 21.9% | 15.4% | 23.1% | 17.5% | 53.7% | 29.6% | 25.4% | 26.1% |
| 0.01<\|s\| | 35.5% | 41.9% | 37.9% | 42.3% | 20.5% | 0.0% | 0.2% | 34.3% |
Proportion of mutations falling into each selection interval for each population under each best-fit model. Demographic model parameters are listed in Table 3; selection model parameters are listed in Table 1, except for the African American stationary gamma (α = 0.59, β = 37), the European American stationary gamma (α = 0.36, β = 120), and the European American simple bottleneck gamma (α = 0.228, β = 5200).
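The interval proportions follow directly from the fitted distribution. As a hedged illustration, the Python sketch below integrates a gamma distribution over |s| intervals using only the standard library. It assumes that −γ = 2N|s| is gamma-distributed with shape α and scale β (so the mean is αβ), and it uses the African American values α = 0.184, β = 8200 and N = 25,636. The function names are illustrative, and the series expansion is one simple way to evaluate the incomplete gamma function, not necessarily what the authors used.

```python
import math


def regularized_lower_gamma(shape, x):
    """Regularized lower incomplete gamma P(shape, x) by series expansion.

    The series converges quickly for the small x values needed here.
    """
    if x <= 0.0:
        return 0.0
    term = 1.0 / shape
    total = term
    for n in range(1, 1000):
        term *= x / (shape + n)
        total += term
        if term < total * 1e-16:
            break
    return total * math.exp(shape * math.log(x) - x - math.lgamma(shape))


def gamma_dfe_proportions(alpha, beta, n_eff, s_cutoffs):
    """Fraction of new mutations with |s| in each interval between cutoffs.

    ASSUMPTION: 2 * N * |s| follows a gamma distribution with shape `alpha`
    and scale `beta`.
    """
    cdf = [regularized_lower_gamma(alpha, 2.0 * n_eff * s / beta) for s in s_cutoffs]
    edges = [0.0] + cdf + [1.0]
    return [upper - lower for lower, upper in zip(edges, edges[1:])]


proportions = gamma_dfe_proportions(
    alpha=0.184, beta=8200.0, n_eff=25_636, s_cutoffs=[1e-4, 1e-3, 1e-2]
)
labels = ["|s|<0.0001", "0.0001<|s|<0.001", "0.001<|s|<0.01", "0.01<|s|"]
for label, p in zip(labels, proportions):
    print(f"{label:>18}: {p:.1%}")
```

Under these assumptions the four values come out close to 27.9%, 14.7%, 21.9% and 35.5%, matching the African expansion gamma column above. Passing the cutoffs `[1e-5, 1e-4, 1e-2]` instead gives roughly 18%, 10%, 37% and 36%, the intervals used in the comparison with other studies.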
Robustness of selective and demographic inference in African American dataset.
| | | full sfs | full sfs | folded sfs | no singletons | silent site γ = −1 |
| | | MLE | 95% C.I. | MLE | MLE | MLE |
| demographic parameters | Nanc | 7778 | 7419–8143 | 7390 | 7847 | 10406 |
| | Ncurr | 25636 | 23863–27372 | 25221 | 22293 | 30778 |
| | expansion | 6809 | 6069–7862 | 7602 | 8083 | 5218 |
| selection parameters | α | 0.184 | 0.158–0.206 | 0.188 | 0.184 | 0.235 |
| | β | 8200 | 3500–20300 | 7600 | 12000 | 3150 |
| | \|s\|<0.0001 | 27.9% | 27.3–29.0% | 27.4% | 25.3% | 25.4% |
| | 0.0001<\|s\|<0.001 | 14.7% | 12.8–16.9% | 14.8% | 13.3% | 18.1% |
| | 0.001<\|s\|<0.01 | 21.9% | 18.4–25.8% | 22.3% | 20.1% | 28.8% |
| | 0.01<\|s\| | 35.5% | 29.2–40.7% | 35.5% | 41.2% | 27.7% |
Left column: MLE and approximate 95% confidence limits of demographic and selection parameter estimates for the full model. Center columns: MLE using folded site-frequency spectra (ΔLLsil = 1.89 and ΔLLrepl = 1.81 between folded MLE and full MLE) and using full site-frequency spectra excluding singletons (i.e., derived frequency = 1 or n−1: ΔLLsil = 2.41 and ΔLLrepl = 0.0 between no-singleton MLE and full MLE). Right column: MLE assuming silent sites are under weak purifying selection (γ = −1).
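The two reduced data sets in the center columns are simple transformations of the full spectrum. The Python sketch below shows one way to fold a spectrum and to remove the singleton classes. It assumes the spectrum is stored as a list indexed by derived-allele count 1..n−1; the helper names and the example counts are made up for illustration. The paper may have masked singleton classes inside the likelihood rather than dropping them from the list.

```python
def fold_sfs(unfolded):
    """Fold an unfolded SFS into minor-allele-count classes.

    unfolded[i - 1] is the number of sites with derived-allele count i,
    for i = 1..n-1 in a sample of n chromosomes.
    """
    n = len(unfolded) + 1
    folded = []
    for minor in range(1, n // 2 + 1):
        major = n - minor
        count = unfolded[minor - 1]
        if major != minor:  # avoid double-counting the middle class when n is even
            count += unfolded[major - 1]
        folded.append(count)
    return folded


def drop_singletons(unfolded):
    """Remove the classes with derived-allele count 1 or n-1."""
    return unfolded[1:-1]


# Made-up counts for a sample of n = 6 chromosomes (derived counts 1..5).
example = [50, 20, 12, 8, 5]
print(fold_sfs(example))         # [55, 28, 12]
print(drop_singletons(example))  # [20, 12, 8]
```

Folding removes any dependence on which allele is ancestral, and dropping the singleton classes removes the frequency classes with derived count 1 or n−1 named in the caption.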
Figure 3. Cumulative proportion of nonsynonymous mutations with a selection coefficient less than s.
Gamma and lognormal curves represent the best-fit gamma and lognormal models to the African American polymorphism data (Table 1). Gamma+pos and wnorm are the best-fit gamma distribution with positive selection at 2Ns = 5 and the best-fit weighted normal model to the African American polymorphism+divergence data. All four distributions predict nearly identical site-frequency spectra that closely match the observed data. The left side shows deleterious selection coefficients; the right side shows advantageous selection coefficients.
Figure 4. Inferred fitness effects of new, segregating, and fixed mutations in African-Americans.
Estimated proportion of new nonsynonymous mutations (left column), SNPs (middle columns), and human-chimp fixed differences (right column) which are strongly deleterious (s < −10^−2; red), moderately deleterious (−10^−2
Distribution of fitness effects in African Americans by PolyPhen class.
| | benign | possibly damaging | probably damaging | combined |
| proportion of mutations | 50.8% | 25.7% | 23.5% | |
| proportion of SNPs | 72.7% | 17.4% | 9.9% | |
| proportion of fixed differences | 82.4% | 11.6% | 6.0% | |
| best-fit selection model | neutral+exponential | lognormal | lognormal | gamma |
| ML parameters | λ = 0.0038, p0 = 0.409 | μ = 4.69, σ = 2.83 | μ = 5.78, σ = 2.87 | α = 0.184, β = 8200 |
| expected num of fixed differences | 17,954 | 1,846 | 855 | 20,655 |
| observed num of fixed differences | 18,272 | 2,568 | 1,341 | 22,181 |
| s<0.0001 | 42.0% | 14.0% | 7.4% | 26.7% |
| 0.0001<s<0.001 | 9.3% | 25.5% | 18.6% | 15.7% |
| 0.001<s<0.01 | 40.2% | 31.3% | 30.3% | 35.6% |
| 0.01<s | 8.4% | 29.2% | 43.6% | 22.0% |
| mean | −0.0030 | −0.0087 | −0.0265 | −0.0294 |
| mean | −0.00013 | −0.00027 | −0.00040 | −0.00014 |
Best-fit selection distribution for each PolyPhen class of mutations in African Americans and the proportion of new mutations in each PolyPhen class falling into each selection interval. Combined values are the sum of the best-fit values across classes. All selection inferences were run under the maximum likelihood African American expansion model.
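The combined column can be checked with a short calculation. The Python sketch below assumes that the combined interval proportions are the per-class proportions weighted by each class's share of new mutations, and that the combined fixed-difference counts are plain sums. This reading is an interpretation of the caption, not a statement from the paper, but it reproduces the tabulated values to within rounding. Variable names are illustrative.

```python
# Share of new mutations in each PolyPhen class (from the table).
class_weights = {
    "benign": 0.508,
    "possibly damaging": 0.257,
    "probably damaging": 0.235,
}

# Proportion of each class in the four selection intervals (from the table),
# ordered: s<0.0001, 0.0001<s<0.001, 0.001<s<0.01, 0.01<s.
interval_props = {
    "benign": [0.420, 0.093, 0.402, 0.084],
    "possibly damaging": [0.140, 0.255, 0.313, 0.292],
    "probably damaging": [0.074, 0.186, 0.303, 0.436],
}


def combine_classes(weights, props):
    """ASSUMPTION: combined proportion = sum over classes of weight * proportion."""
    n_intervals = len(next(iter(props.values())))
    return [
        sum(weights[name] * props[name][i] for name in weights)
        for i in range(n_intervals)
    ]


combined = combine_classes(class_weights, interval_props)
expected_fixed_total = 17_954 + 1_846 + 855
observed_fixed_total = 18_272 + 2_568 + 1_341

print([f"{p:.1%}" for p in combined])
print(expected_fixed_total, observed_fixed_total)
```

The weighted proportions land within about 0.1 percentage point of the combined column (26.7%, 15.7%, 35.6%, 22.0%), and the summed counts equal the tabulated 20,655 expected and 22,181 observed fixed differences.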
A comparison of several recent estimates of the DFE at nonsynonymous sites in humans.
| | this study | this study | Eyre-Walker | Eyre-Walker | Yampolsky |
| | African American (gamma distribution) | European American (gamma distribution) | basic gamma distribution | gamma with demographic correction | |
| s<0.00001 | 18% | 15% | 11% | 10% | 12% |
| 0.00001<s<0.0001 | 10% | 9% | 8% | 10% | 14% |
| 0.0001<s<0.01 | 37% | 38% | 37% | 37% | 49% |
| 0.01<s | 36% | 38% | 44% | 29% | 25% |
Note that the selection intervals are those used by [10] and differ from those used in Tables 2–4.