Kirk E Lohmueller, Anders Albrechtsen, Yingrui Li, Su Yeon Kim, Thorfinn Korneliussen, Nicolas Vinckenbosch, Geng Tian, Emilia Huerta-Sanchez, Alison F Feder, Niels Grarup, Torben Jørgensen, Tao Jiang, Daniel R Witte, Annelli Sandbæk, Ines Hellmann, Torsten Lauritzen, Torben Hansen, Oluf Pedersen, Jun Wang, Rasmus Nielsen.
Abstract
A major question in evolutionary biology is how natural selection has shaped patterns of genetic variation across the human genome. Previous work has documented a reduction in genetic diversity in regions of the genome with low recombination rates. However, it is unclear whether other summaries of genetic variation, like allele frequencies, are also correlated with recombination rate and whether these correlations can be explained solely by negative selection against deleterious mutations or whether positive selection acting on favorable alleles is also required. Here we attempt to address these questions by analyzing three different genome-wide resequencing datasets from European individuals. We document several significant correlations between different genomic features. In particular, we find that average minor allele frequency and diversity are reduced in regions of low recombination and that human diversity, human-chimp divergence, and average minor allele frequency are reduced near genes. Population genetic simulations show that either positive natural selection acting on favorable mutations or negative natural selection acting against deleterious mutations can explain these correlations. However, models with strong positive selection on nonsynonymous mutations and little negative selection predict a stronger negative correlation between neutral diversity and nonsynonymous divergence than observed in the actual data, supporting the importance of negative, rather than positive, selection throughout the genome. Further, we show that the widespread presence of weakly deleterious alleles, rather than a small number of strongly positively selected mutations, is responsible for the correlation between neutral genetic diversity and recombination rate. This work suggests that natural selection has affected multiple aspects of linked neutral variation throughout the human genome and that positive selection is not required to explain these observations.
Year: 2011 PMID: 22022285 PMCID: PMC3192825 DOI: 10.1371/journal.pgen.1002326
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Summary of the correlation coefficients (Spearman's ρ) for the three datasets. S denotes the number of SNPs per covered base divided by human-chimp divergence; MAF denotes average minor allele frequency.
| Dataset | Correlation type | Diversity (S) vs. recombination rate | MAF vs. recombination rate | Diversity (S) vs. genic content | MAF vs. genic content |
| --- | --- | --- | --- | --- | --- |
| Low-coverage | Pairwise correlation | 0.111 | 0.062 | −0.039 | −0.012 |
| | Partial correlation | 0.117 | 0.042 | −0.056 | −0.018 |
| Higher-coverage | Pairwise correlation | 0.209 | 0.086 | −0.033 | −0.043 |
| | Partial correlation | 0.173 | 0.046 | −0.076 | −0.035 |
| CGS | Pairwise correlation | 0.200 | 0.101 | −0.040 | −0.040 |
| | Partial correlation | 0.188 | 0.066 | −0.077 | −0.020 |
For correlations with recombination rate, the partial correlation controls for human-chimp divergence, GC content, genic content, and coverage (the number of neutral bases covered by sequencing data).
For correlations with genic content, the partial correlation controls for human-chimp divergence, GC content, recombination rate, and coverage (the number of neutral bases covered by sequencing data).
*P < 0.05.
**P < 0.001.
***P < 10⁻⁵.
****P < 10⁻¹⁶.
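The distinction between pairwise and partial Spearman correlations in the table can be illustrated with a small sketch. It rank-transforms each variable and then applies the standard recursive formula for partial correlation, removing one covariate at a time. This is an illustration only: the record does not say how the partial correlations were actually computed, and the function names and toy values below are hypothetical.

```python
from math import sqrt

def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Pairwise Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def partial_spearman(x, y, covariates):
    """Spearman correlation of x and y, controlling for each covariate."""
    def partial(a, b, zs):
        if not zs:
            return pearson(a, b)
        z0, rest = zs[0], zs[1:]
        r_ab = partial(a, b, rest)
        r_az = partial(a, z0, rest)
        r_bz = partial(b, z0, rest)
        return (r_ab - r_az * r_bz) / sqrt((1 - r_az ** 2) * (1 - r_bz ** 2))
    return partial(ranks(x), ranks(y), [ranks(z) for z in covariates])

# Hypothetical toy values, one entry per window.
recombination = [1, 2, 3, 4, 5]
maf = [2, 1, 4, 3, 5]
gc_content = [3, 1, 2, 5, 4]

pairwise = spearman(recombination, maf)
controlled = partial_spearman(recombination, maf, [gc_content])
```

In this toy example the covariate is correlated with both variables, so the partial coefficient is smaller than the pairwise one, which is the same qualitative pattern seen in several rows of the table.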
Figure 1. Correlations between summaries of genetic variation and recombination rate in the low-coverage dataset, dividing the data into genic and non-genic windows (see text).
(A) Number of SNPs per covered base divided by human-chimp divergence (S) versus recombination rate. (B) Average minor allele frequency versus recombination rate. Red and green lines denote the lowess curves fit to the two variables for genic and non-genic windows, respectively. Black points denote genic windows, while gray points denote non-genic windows. Each point represents the average of the statistics computed over 50 windows of 100 kb each. The windows were sorted by recombination rate prior to binning. Note that several outlier data points fell outside the plotting area.
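The binning described in the caption (sort windows by recombination rate, then average each run of 50 windows) can be sketched as follows. The function name and the toy data are hypothetical, and the treatment of a final incomplete bin (averaged over however many windows remain) is an assumption, since the caption does not specify it.

```python
def binned_means(windows, bin_size=50):
    """Average (recombination_rate, statistic) pairs in bins of sorted windows.

    `windows` holds one (recombination_rate, statistic) pair per 100 kb window.
    A trailing bin with fewer than `bin_size` windows is still averaged
    (an assumption; the caption does not say how leftovers were handled).
    """
    ordered = sorted(windows, key=lambda w: w[0])  # sort by recombination rate
    points = []
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]
        mean_rate = sum(w[0] for w in chunk) / len(chunk)
        mean_stat = sum(w[1] for w in chunk) / len(chunk)
        points.append((mean_rate, mean_stat))
    return points

# Toy input: 100 windows supplied in reverse order of recombination rate.
toy_windows = [(float(i), 2.0 * i) for i in range(99, -1, -1)]
points = binned_means(toy_windows)
```

Each returned pair corresponds to one plotted point; a smoothing curve such as lowess would then be fit separately for genic and non-genic windows.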
Summary of correlation coefficients (Spearman's ρ) for the three datasets divided into genic and non-genic windows.
| Dataset | Correlation type | Window type | Diversity (S) vs. recombination rate | MAF vs. recombination rate |
| --- | --- | --- | --- | --- |
| Low-coverage | Pairwise correlation | Genic | 0.175 | 0.073 |
| | | Non-genic | 0.028 | 0.042 |
| | Partial correlation | Genic | 0.185 | 0.029 |
| | | Non-genic | 0.050 | 0.033 |
| Higher-coverage | Pairwise correlation | Genic | 0.250 | 0.123 |
| | | Non-genic | 0.168 | 0.043 |
| | Partial correlation | Genic | 0.209 | 0.063 |
| | | Non-genic | 0.132 | 0.031 |
| CGS | Pairwise correlation | Genic | 0.241 | 0.123 |
| | | Non-genic | 0.154 | 0.074 |
| | Partial correlation | Genic | 0.227 | 0.065 |
| | | Non-genic | 0.138 | 0.055 |
Partial correlation controls for human-chimp divergence, GC content, and coverage (the number of neutral bases covered by sequencing data).
*P < 0.05.
**P < 0.001.
***P < 10⁻⁵.
****P < 10⁻¹⁶.
Figure 2. Comparison of Spearman's ρ for genic regions with the expected values based on forward simulations for the low-coverage dataset.
(A) Number of SNPs per covered base divided by human-chimp divergence (S) versus recombination rate. (B) Average minor allele frequency versus recombination rate. (C) Number of SNPs per covered base divided by human-chimp divergence (S) versus human-chimp nonsynonymous divergence (d). The red solid lines denote the point estimates from the genic regions in the low-coverage data. The dotted lines represent 95% confidence intervals obtained by bootstrapping. Black points denote a model with no selection and pink points a model where negative selection acted only on nonsynonymous mutations. Blue points denote models where both nonsynonymous and some intronic sites were subjected to negative selection. Orange points denote models where most nonsynonymous mutations were negatively selected, but some were positively selected. Green points denote models where nonsynonymous and some intronic mutations were subjected to negative selection, but a fraction of nonsynonymous mutations were positively selected. See Table S6 for a more detailed description of the different models of selection. Nonsynonymous divergence was measured from the simulations as the fraction of differences between the human and chimp sequences at first and second codon positions.
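The last sentence of the caption defines the simulated nonsynonymous divergence as the fraction of human-chimp differences at first and second codon positions. A minimal sketch of that calculation is given below, assuming two aligned, in-frame, gap-free coding sequences of equal length; the function name and the example sequences are hypothetical.

```python
def first_second_position_divergence(human, chimp):
    """Fraction of first and second codon positions that differ.

    Assumes both sequences are aligned, start in frame, and contain no gaps.
    """
    if len(human) != len(chimp):
        raise ValueError("sequences must be aligned to the same length")
    sites = 0
    differences = 0
    for index, (h, c) in enumerate(zip(human, chimp)):
        if index % 3 == 2:  # third codon position: skipped
            continue
        sites += 1
        if h != c:
            differences += 1
    return differences / sites

# Toy sequences: one second-position difference and one third-position difference.
human_seq = "ATGGCCAAA"
chimp_seq = "ATGGTCAAG"
divergence = first_second_position_divergence(human_seq, chimp_seq)
```

Only the second-position change counts here, giving one difference over six eligible sites; the third-position change is ignored because most third-position changes are synonymous.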
Figure 3. Negative selection is required to match multiple aspects of the low-coverage data.
(A) Number of SNPs per covered base divided by human-chimp divergence (S) versus recombination rate. (B) Number of SNPs per covered base divided by human-chimp divergence (S) versus human-chimp nonsynonymous divergence (d). The red solid lines denote the point estimates from the genic regions in the low-coverage data. The dotted lines represent 95% confidence intervals obtained by bootstrapping. p denotes the proportion of simulated windows that contained positively selected mutations and p denotes the proportion of windows that experienced negative selection. All sites in the remaining windows evolved neutrally. In windows with positive selection, 0.5% of nonsynonymous mutations were positively selected (black points: s = 0.625%; pink points: s = 0.3%), while the remainder evolved neutrally. In windows with negative selection, a gamma distribution of selective effects was used for nonsynonymous mutations and 50% of intronic mutations were selected against with s = 0.0075%.
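The mixture of selective effects described in the caption can be sketched as a sampler for per-mutation selection coefficients. The fractions and fixed coefficients come from the caption (using the black-point value s = 0.625% for positive selection). The gamma shape and mean are hypothetical placeholders, because the caption does not report them, and the sign convention (negative for deleterious) and all names are likewise assumptions made for illustration.

```python
import random

# Values stated in the Figure 3 caption.
FRACTION_POSITIVE = 0.005         # 0.5% of nonsynonymous mutations
S_POSITIVE = 0.00625              # s = 0.625% (black points)
FRACTION_INTRON_SELECTED = 0.5    # 50% of intronic mutations
S_INTRON = 0.000075               # s = 0.0075%

# Hypothetical placeholders: the caption gives no gamma parameters.
GAMMA_SHAPE = 0.2
GAMMA_MEAN = 0.03

def draw_selection_coefficient(window_type, site_class, rng):
    """Return a signed selection coefficient (negative means deleterious)."""
    if window_type == "positive" and site_class == "nonsynonymous":
        # A small fraction is advantageous; the remainder is neutral.
        return S_POSITIVE if rng.random() < FRACTION_POSITIVE else 0.0
    if window_type == "negative":
        if site_class == "nonsynonymous":
            # gammavariate(alpha, beta) has mean alpha * beta.
            scale = GAMMA_MEAN / GAMMA_SHAPE
            return -rng.gammavariate(GAMMA_SHAPE, scale)
        if site_class == "intronic":
            selected = rng.random() < FRACTION_INTRON_SELECTED
            return -S_INTRON if selected else 0.0
    return 0.0  # all other sites and windows evolve neutrally

rng = random.Random(0)
intron_draws = [draw_selection_coefficient("negative", "intronic", rng)
                for _ in range(1000)]
nonsyn_draws = [draw_selection_coefficient("negative", "nonsynonymous", rng)
                for _ in range(1000)]
```

Swapping `S_POSITIVE` for 0.003 would correspond to the pink points (s = 0.3%) in the figure.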
Figure 4. Correlation between neutral human-chimp divergence (d) and recombination rate.
The red solid line denotes the point estimate from the genic regions in the low-coverage data. The dotted lines represent 95% confidence intervals obtained by bootstrapping. Black points denote a model with no selection and pink points a model where negative selection acted only on nonsynonymous mutations. Blue points denote models where both nonsynonymous and some intronic sites were subjected to negative selection. Orange points denote models where most nonsynonymous mutations were negatively selected, but some were positively selected. Green points denote models where nonsynonymous and some intronic mutations were subjected to negative selection, but a fraction of nonsynonymous mutations were positively selected. See Table S6 for a more detailed description of the different models of selection.
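The bootstrap confidence intervals mentioned in the figure captions can be illustrated with a percentile bootstrap over windows. This is a sketch under stated assumptions: the captions do not describe the resampling unit or the interval method, so resampling individual windows with replacement and taking empirical percentiles are choices made here for illustration, and all names and toy data are hypothetical.

```python
import random
from math import sqrt

def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)

def bootstrap_ci(x, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for Spearman's rho (windows resampled)."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # correlation undefined for a constant resample
        estimates.append(spearman(xs, ys))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy data: a noisy increasing relationship across 30 windows.
rate = list(range(30))
stat = [v + (v * 7) % 5 for v in rate]
lower, upper = bootstrap_ci(rate, stat, n_boot=200, seed=1)
```

A simulated model would be judged consistent with the data in this framing when its predicted correlation falls between `lower` and `upper`.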