Bernard Y Kim, Christian D Huber, Kirk E Lohmueller.
Abstract
The distribution of fitness effects (DFE) has considerable importance in population genetics. To date, estimates of the DFE come from studies using a small number of individuals. Thus, estimates of the proportion of moderately to strongly deleterious new mutations may be unreliable because such variants are unlikely to be segregating in the data. Additionally, the true functional form of the DFE is unknown, and estimates of the DFE differ significantly between studies. Here we present a flexible and computationally tractable method, called Fit∂a∂i, to estimate the DFE of new mutations using the site frequency spectrum from a large number of individuals. We apply our approach to the frequency spectrum of 1300 Europeans from the Exome Sequencing Project ESP6400 data set, 1298 Danes from the LuCamp data set, and 432 Europeans from the 1000 Genomes Project to estimate the DFE of deleterious nonsynonymous mutations. We infer significantly fewer (0.38- to 0.84-fold) strongly deleterious mutations with selection coefficient |s| > 0.01 and more (1.24- to 1.43-fold) weakly deleterious mutations with selection coefficient |s| < 0.001 compared to previous estimates. Furthermore, a DFE that is a mixture distribution of a point mass at neutrality plus a gamma distribution fits better than a gamma distribution in two of the three data sets. Our results suggest that nearly neutral forces play a larger role in human evolution than previously thought.
Keywords: deleterious mutations; diffusion theory; population genetics; site frequency spectrum
Year: 2017 PMID: 28249985 PMCID: PMC5419480 DOI: 10.1534/genetics.116.197145
Source DB: PubMed Journal: Genetics ISSN: 0016-6731 Impact factor: 4.562
Figure 1. Previously inferred DFEs differ across studies. We rescaled the DFE in terms of the population size assumed or inferred in each study. A population size of 10,000 diploids is used to rescale the distribution of 2Ns to s for Eyre-Walker. For Boyko and Li, we rescale the DFE from 2Ns to s using population sizes of 25,636 and 52,097 diploids, respectively (see Materials and Methods).
Performance of Fit∂a∂i on simulated data sets
| Demography | n (chr) | α (shape) | β (scale) | 0 ≤ \|s\| < 10⁻⁵ | 10⁻⁵ ≤ \|s\| < 10⁻⁴ | 10⁻⁴ ≤ \|s\| < 10⁻³ | 10⁻³ ≤ \|s\| < 10⁻² | 10⁻² ≤ \|s\| |
|---|---|---|---|---|---|---|---|---|
| True | n/a | 0.184 | 3266 | 0.182 | 0.096 | 0.146 | 0.219 | 0.357 |
| Constant size | 2596 | 0.180 ± 0.010 | 3712.2 ± 980.2 | 0.186 ± 0.009 | 0.095 ± 0.002 | 0.144 ± 0.006 | 0.213 ± 0.013 | 0.363 ± 0.016 |
| Constant size | 24 | 0.185 ± 0.028 | 3613.1 ± 4196.7 | 0.182 ± 0.017 | 0.097 ± 0.009 | 0.148 ± 0.023 | 0.221 ± 0.043 | 0.353 ± 0.060 |
| Two-fold expansion | 2596 | 0.191 ± 0.007 | 2606.0 ± 410.7 | 0.178 ± 0.008 | 0.098 ± 0.002 | 0.152 ± 0.004 | 0.230 ± 0.009 | 0.341 ± 0.010 |
| Two-fold expansion | 24 | 0.187 ± 0.023 | 3259.8 ± 2612.6 | 0.181 ± 0.016 | 0.097 ± 0.008 | 0.149 ± 0.019 | 0.223 ± 0.036 | 0.350 ± 0.050 |
| LuCamp | 2596 | 0.182 ± 0.008 | 3411.9 ± 558.5 | 0.184 ± 0.008 | 0.096 ± 0.001 | 0.145 ± 0.004 | 0.216 ± 0.008 | 0.358 ± 0.009 |
| LuCamp | 24 | 0.186 ± 0.027 | 3435.4 ± 3249.9 | 0.182 ± 0.017 | 0.097 ± 0.009 | 0.148 ± 0.022 | 0.222 ± 0.042 | 0.351 ± 0.060 |
95% intervals were estimated as ± 1.96 SD of 100 replicates in each simulation set. chr, chromosome.
Values show the true DFE used to simulate the data.
Here the simulation was scaled to an ancestral population size of N = 10,085 diploids.
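The binned proportions in the "True" row can be approximately reproduced from the gamma parameters alone. The sketch below is not the authors' code: it assumes that the gamma DFE (α = 0.184, β = 3266) is defined on the population-scaled coefficient 2N|s| and that N = 10,085 diploids applies to this row. The helper names `gamma_cdf` and `bin_masses` are hypothetical.

```python
import math

def gamma_cdf(x, shape, scale):
    """Gamma CDF via the power series of the lower incomplete gamma function.

    The series converges quickly for the small x/scale values used here.
    """
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= z / (shape + n)
        total += term
    return (z ** shape) * math.exp(-z) * total / math.gamma(shape)

def bin_masses(shape, scale, two_n, edges=(1e-5, 1e-4, 1e-3, 1e-2)):
    """Probability mass of |s| in [0, e1), [e1, e2), ..., [e_last, inf)."""
    # Assumption: the gamma distribution is over 2N|s|, so each |s| edge
    # is multiplied by 2N before evaluating the CDF.
    cdf = [gamma_cdf(two_n * edge, shape, scale) for edge in edges]
    bounds = [0.0] + cdf + [1.0]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]

# Assumed scaling: N = 10,085 diploids, taken from the table footnote.
masses = bin_masses(shape=0.184, scale=3266.0, two_n=2 * 10085)
print([round(m, 3) for m in masses])
```

Under these assumptions the five masses agree with the tabulated "True" proportions to about three decimal places. A different choice of N would move mass between bins.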
Figure 2. The discrete DFE can recover the approximate form of the DFE from simulated data. The distributions of the proportions of mutations with different selective effects, as inferred by the discrete DFE for 100 simulated data sets, are shown. Each simulation set assumed the demographic model fit to the LuCamp synonymous SFS. A red point depicts the true proportions of the simulated DFE. The true DFE for each set is: (A) the continuous neutral+gamma distribution of Li (p_neu = 0.2, α = 4, β = 1.065 × 10⁻⁴), (B) the discretized version of that distribution, and (C–F) a gamma DFE (α = 0.215, β = 567.1), but where (C and E) the mass of the 10⁻³ ≤ |s| < 10⁻² bin was added to the 10⁻² ≤ |s| bin, and (D and F) the mass of the 10⁻² ≤ |s| bin was added to the 10⁻³ ≤ |s| < 10⁻² bin. The data sets simulated for (C) and (D) had sample sizes of n = 2596 chromosomes, while the data sets for (E) and (F) had sample sizes of n = 24 chromosomes.
Figure 3. Inference of the DFE is robust to misspecification of the demographic model and background selection. Points show the MLEs of the (A) demographic parameters and (B) DFE parameters inferred from 100 simulated data sets with linkage and population structure. Red lines denote the true values and the yellow dots denote the median estimates across the 100 data sets. Estimates of time of expansion (T1) and the ratio of current to ancestral population size (N1/NANC) tend to be biased because demography is incorrectly modeled due to background selection, but estimates of the DFE are unbiased.
MLEs of various DFEs
| Data set | DFE | Parameter MLEs | Log-likelihood | AIC | \|ΔAIC\| |
|---|---|---|---|---|---|
| LuCamp | Gamma | α = 0.215, β = 562.1 | −3334.7 | 6673.4 | 13 |
| LuCamp | Neu+gamma | | −3327.2 | 6660.4 | 0 |
| LuCamp | Neu+exp+let | | −3337.8 | 6681.6 | 21.2 |
| LuCamp | Discrete | | −3334.1 | 6676.2 | 15.8 |
| 1kG EUR | Gamma | α = 0.186, β = 875.0 | −1450.5 | 2905.0 | 0 |
| 1kG EUR | Neu+gamma | | −1450.8 | 2907.6 | 2.6 |
| 1kG EUR | Neu+exp+let | | −1472.0 | 2950.0 | 45 |
| 1kG EUR | Discrete | | −1453.4 | 2914.8 | 9.8 |
| ESP EUR | Gamma | α = 0.169, β = 1327.4 | −3012.6 | 6029.2 | 2.6 |
| ESP EUR | Neu+gamma | | −3010.3 | 6026.6 | 0 |
| ESP EUR | Neu+exp+let | | −3071.6 | 6149.2 | 122.6 |
| ESP EUR | Discrete | | −3029.5 | 6067.0 | 40.4 |
These results are reported assuming L_NS/L_S = 2.31 and μ = 1.5 × 10⁻⁸. See Table S4 in File S1 for additional information. The shape and scale parameters of the gamma distribution are denoted with α and β, respectively, and the rate parameter of the exponential distribution is denoted with λ. Neu, neutral; exp, exponential; let, lethal; 1kG, 1000 Genomes.
Change in AIC relative to the model with the lowest AIC.
In terms of |s|, these parameters correspond to the masses in the ranges [0, 10⁻⁵), [10⁻⁵, 10⁻⁴), [10⁻⁴, 10⁻³), and [10⁻³, 10⁻²), respectively. The mass in the [10⁻², 1] range is the complement of the total mass of these four categories.
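The AIC and |ΔAIC| columns follow from the log-likelihoods through AIC = 2k − 2 ln L, where k is the number of free parameters. The sketch below recomputes the LuCamp rows. The parameter counts (gamma: 2, neu+gamma: 3, neu+exp+let: 3, discrete: 4) are assumptions; they are not stated in the table but are consistent with its reported AIC values.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

# (model, log-likelihood from the table, ASSUMED number of free parameters)
lucamp = [
    ("Gamma", -3334.7, 2),
    ("Neu+gamma", -3327.2, 3),
    ("Neu+exp+let", -3337.8, 3),
    ("Discrete", -3334.1, 4),
]

scores = {name: aic(ll, k) for name, ll, k in lucamp}
best = min(scores.values())
# Change in AIC relative to the model with the lowest AIC.
delta = {name: round(score - best, 1) for name, score in scores.items()}

for name in scores:
    print(f"{name:12s} AIC = {scores[name]:.1f}  dAIC = {delta[name]:.1f}")
```

The same calculation with the 1kG EUR and ESP EUR log-likelihoods reproduces the remaining rows under the same assumed parameter counts.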
We infer more nearly neutral (|s| < 10⁻⁵) and fewer strongly deleterious (|s| ≥ 10⁻²) new mutations than previous studies
| Data set | μ, L_NS/L_S | Best-fit DFE | 0 ≤ \|s\| < 10⁻⁵ | 10⁻⁵ ≤ \|s\| < 10⁻⁴ | 10⁻⁴ ≤ \|s\| < 10⁻³ | 10⁻³ ≤ \|s\| < 10⁻² | 10⁻² ≤ \|s\| |
|---|---|---|---|---|---|---|---|
| Boyko | 1.8 × 10⁻⁸, 2.5 | Gamma | 0.183 | 0.096 | 0.147 | 0.219 | 0.355 |
| 1000 Genomes | 1.8 × 10⁻⁸, 2.5 | Gamma | 0.217 (0.212–0.223) | 0.112 (0.111–0.113) | 0.169 (0.165–0.172) | 0.243 (0.235–0.249) | 0.259 (0.252–0.266) |
| ESP | 1.8 × 10⁻⁸, 2.5 | Gamma | 0.229 (0.223–0.234) | 0.105 (0.104–0.106) | 0.152 (0.150–0.155) | 0.216 (0.212–0.221) | 0.298 (0.294–0.302) |
| LuCamp | 1.8 × 10⁻⁸, 2.5 | Discrete | 0.278 (0.221–0.303) | 0.027 (0.001–0.110) | 0.211 (0.167–0.234) | 0.352 (0.330–0.373) | 0.132 (0.124–0.142) |
| 1000 Genomes | 1.5 × 10⁻⁸, 2.31 | Gamma | 0.237 (0.231–0.243) | 0.127 (0.125–0.128) | 0.192 (0.188–0.197) | 0.266 (0.259–0.272) | 0.178 (0.171–0.186) |
| ESP | 1.5 × 10⁻⁸, 2.31 | Neu+gamma | 0.263 (0.250–0.277) | 0.104 (0.091–0.114) | 0.167 (0.160–0.173) | 0.249 (0.241–0.259) | 0.217 (0.211–0.221) |
| LuCamp | 1.5 × 10⁻⁸, 2.31 | Neu+gamma | 0.242 (0.223–0.260) | 0.091 (0.072–0.107) | 0.194 (0.183–0.203) | 0.332 (0.313–0.352) | 0.141 (0.129–0.152) |
Results in comparison with data from Boyko. C.I.’s were constructed by Poisson resampling the nonsynonymous SFS and fitting a DFE 200 times. See Table S5 in File S1 for additional information. Neu, neutral.
African-American.
The results presented with the assumptions L_NS/L_S = 2.5 and μ = 1.8 × 10⁻⁸ match the mutation rate assumptions of Boyko.
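The confidence intervals above come from Poisson resampling of the nonsynonymous SFS followed by refitting. A minimal sketch of that resampling loop is given below, with loud caveats: the SFS counts are invented toy values, `singleton_fraction` is a hypothetical stand-in for the actual DFE fit, and taking the 2.5th and 97.5th percentiles of the replicates is an assumption about how the intervals were formed.

```python
import random
import statistics

def poisson_draw(rng, mean):
    """Draw a Poisson(mean) count by counting rate-`mean` arrivals in unit time."""
    if mean <= 0:
        return 0
    count = 0
    elapsed = rng.expovariate(mean)
    while elapsed <= 1.0:
        count += 1
        elapsed += rng.expovariate(mean)
    return count

def resample_sfs(rng, sfs):
    """Replace each SFS entry with a Poisson draw whose mean is the observed count."""
    return [poisson_draw(rng, count) for count in sfs]

def singleton_fraction(sfs):
    # Hypothetical placeholder statistic; the study refits a DFE here instead.
    return sfs[0] / sum(sfs)

rng = random.Random(0)
observed = [120, 60, 35, 20, 12, 8]  # toy counts, not data from the study

# 200 resampled replicates, matching the number of refits described above.
estimates = [singleton_fraction(resample_sfs(rng, observed)) for _ in range(200)]

# Assumed interval construction: 2.5th and 97.5th percentiles of the replicates.
cuts = statistics.quantiles(estimates, n=40)
ci = (cuts[0], cuts[-1])
print(f"estimate = {singleton_fraction(observed):.3f}, 95% interval = {ci[0]:.3f} to {ci[1]:.3f}")
```

In a real analysis, the placeholder statistic would be replaced by the full likelihood fit of the DFE to each resampled spectrum, which is the expensive step.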
Figure 4. The distribution of selection coefficients of new mutations under our best-fit DFEs compared to Boyko. Results are presented for the best-fit DFE to each full data set and the best-fit DFE when the data were projected down to n = 24 chromosomes. C.I.’s were estimated by Poisson resampling the nonsynonymous SFS and fitting a DFE 200 times. C.I.’s for the DFE fit to the Boyko European data set were unavailable. Note that our models predict more nearly neutral mutations (0 ≤ |s| < 10⁻⁵) and fewer strongly deleterious mutations (10⁻² ≤ |s|) than Boyko, across all mutation rates. The top panel denotes our favored mutation rate, while the bottom panel denotes the mutation rate used by Boyko. See Figure S5 in File S1 for a comparison of the population-scaled selection coefficients (2Ns).
Figure 5. Small sample size and misspecification of the DFE can explain some of the differences between previous estimates and our estimates. Gamma and neutral+gamma DFEs were fit to 100 simulated data sets of sample sizes n = 24 and n = 2596 chromosomes, where the true DFE was neutral+gamma distributed (p_neu = 0.164, α = 0.338, β = 358.8). (A) The distributions of the difference in log-likelihood between the gamma and neutral+gamma distributions. When the sample size is large (n = 2596), the neutral+gamma distribution has a higher log-likelihood than the gamma distribution. However, the small samples (n = 24) are unable to distinguish between the gamma and neutral+gamma distributions. (B) The estimated proportions of new mutations having different selective effects when fitting the gamma and neutral+gamma distributions. Note that when n = 24, the gamma distribution overpredicts the proportion of strongly deleterious mutations (|s| ≥ 0.01). Red dots denote the true proportion of mutations in each bin. The boxes cover the first and third quartiles, and the band represents the median. The whiskers cover the highest and lowest datum within 1.5 times the interquartile range from the first and third quartiles. Any data outside that region are plotted as outlier points.