Dennis Kostka, Melissa J Hubisz, Adam Siepel, Katherine S Pollard.
Abstract
GC-biased gene conversion (gBGC) is a recombination-associated evolutionary process that accelerates the fixation of guanine or cytosine alleles, regardless of their effects on fitness. gBGC can increase the overall rate of substitutions, a hallmark of positive selection. Many fast-evolving genes and noncoding sequences in the human genome have GC-biased substitution patterns, suggesting that gBGC, in contrast to adaptive processes, may have driven the human changes in these sequences. To investigate this hypothesis, we developed a substitution model for DNA sequence evolution that quantifies the nonlinear interacting effects of selection and gBGC on substitution rates and patterns. Based on this model, we used a series of lineage-specific likelihood ratio tests to evaluate sequence alignments for evidence of changes in mode of selection, action of gBGC, or both. With a false positive rate of less than 5% for individual tests, we found that the majority (76%) of previously identified human accelerated regions are best explained without gBGC, whereas a substantial minority (19%) are best explained by the action of gBGC alone. Further, more than half (55%) have substitution rates that significantly exceed local estimates of the neutral rate, suggesting that these regions may have been shaped by positive selection rather than by relaxation of constraint. By distinguishing the effects of gBGC, relaxation of constraint, and positive selection, we provide an integrated analysis of the evolutionary forces that shaped the fastest evolving regions of the human genome, which facilitates the design of targeted functional studies of adaptation in humans.
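The likelihood ratio tests mentioned in the abstract compare the fit of a null model with that of a richer alternative on the same alignment. The following is a minimal sketch of one such test, not the authors' implementation: it assumes the two models differ by a single free parameter and that the test statistic is asymptotically chi-square distributed with one degree of freedom. The log-likelihood values are hypothetical, and the article may calibrate its tests differently.

```python
import math


def lrt_statistic(loglik_null: float, loglik_alt: float) -> float:
    """Twice the log-likelihood gain of the alternative over the null model."""
    return 2.0 * (loglik_alt - loglik_null)


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For 1 df the survival function has the closed form erfc(sqrt(x / 2)).
    """
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def lrt_pvalue(loglik_null: float, loglik_alt: float) -> float:
    """P-value under the ASSUMED chi-square(1) null distribution."""
    return chi2_sf_1df(lrt_statistic(loglik_null, loglik_alt))


# Hypothetical maximized log-likelihoods for one alignment (not from the article).
loglik_without_gbgc = -1234.5
loglik_with_gbgc = -1231.0

p = lrt_pvalue(loglik_without_gbgc, loglik_with_gbgc)
print(f"LRT statistic = {lrt_statistic(loglik_without_gbgc, loglik_with_gbgc):.2f}, p = {p:.4f}")
```

A test at the 5% level would reject the null model when the printed p-value falls below 0.05.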
Year: 2011 PMID: 22075116 PMCID: PMC3278478 DOI: 10.1093/molbev/msr279
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Figure. Lineage-specific evolutionary histories. Panel A: For a branch (or a collection of branches) of interest, we assume a different substitution model compared with the rest of the tree. In addition to global rescaling (via the parameter S), a separate selection coefficient (S) and gene conversion disparity (B) lead to a seminested collection of models for sequence evolution on this lineage (see Materials and Methods). Panel B: Parameterization of the rate matrix Q for the two different types of branches (eq. 2).
Evolutionary Classification of HARs
| Panel A |
| Class | Number of HARs | Number of Substitutions | ΔGC | Acceleration (vs. M0) | Acceleration (vs. M) | Recombination rate | Recombination P-value |
| | 0 | — | — | — | — | — | — |
| | 10 | 2.30 | 1.46 | 19.46 | 3.02 | 1.23 | 0.39 |
| | 28 | 3.68 | 2.47 | 41.31 | 4.83 | 1.60 | 0.02 |
| | 42 | 2.48 | 0.05 | 11.94 | 1.00 | 0.97 | 0.88 |
| | 112 | 3.32 | 0.08 | 29.76 | 4.63 | 1.08 | 0.86 |
| | 10 | 4.50 | 2.49 | 46.29 | 5.97 | 1.53 | 0.15 |
| Panel B |
| Class | Number of HARs | Number of Substitutions | ΔGC | Acceleration (vs. M0) | Acceleration (vs. M) | Recombination rate | Recombination P-value |
| | 34 | 0.59 | -0.08 | 1.00 | 0.18 | 1.10 | 0.61 |
| | 35 | 1.31 | 0.90 | 14.87 | 1.87 | 0.92 | 0.92 |
| | 15 | 3.00 | 2.21 | 45.32 | 4.45 | 1.64 | 0.05 |
| | 74 | 1.89 | -0.17 | 9.33 | 1.00 | 1.17 | 0.43 |
| | 42 | 3.00 | -0.08 | 29.04 | 4.11 | 1.17 | 0.45 |
| | 1 | 3.00 | 0.74 | 20.92 | 2.37 | 1.78 | 0.20 |
Number of Substitutions: average number of substitutions per HAR.
ΔGC: average increase in GC-content per HAR, in percentage points.
Acceleration: average fold change in substitution rate, with respect to M0 (left column) and M (right column), taking the maximum likelihood estimate for the branch length from the model corresponding to the annotated class.
Recombination: average male recombination rate in cM per Mb (first column), with P-values for a test of higher than expected recombination rate (second column).
Panel B: same as panel A, with potential CpG dinucleotides masked in the analysis.
Figure. Effects of gBGC and selection on GC-content and substitution rates. We investigated the substitution process along a short branch (0.005 expected substitutions under the neutral model) for a range of values for the selection coefficient S and the gBGC disparity B. Each parameter combination has a unique effect on change in GC-content (ΔGC, color scale) and the expected number of substitutions (contour lines, labels are the fold change) compared with a neutral model (S = 0 and B = 0, bold line). For any fixed selection coefficient S, increasing the gBGC disparity B increases the GC-content and the expected number of substitutions. These effects are nonlinear and depend on the level of selection.
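The nonlinear interaction of S and B described in this caption can be illustrated with a small numerical sketch. It assumes, as a labeled assumption rather than a statement of the article's eq. 2, that each substitution rate is scaled by the standard population-genetic fixation factor x / (1 - exp(-x)), with x = S + B for weak-to-strong changes, x = S - B for strong-to-weak changes, and x = S otherwise. The function names are hypothetical.

```python
import math


def fixation_factor(x: float) -> float:
    """Relative fixation rate x / (1 - exp(-x)); equals 1 in the neutral limit x -> 0."""
    if abs(x) < 1e-12:
        return 1.0
    # -expm1(-x) computes 1 - exp(-x) accurately for small |x|.
    return x / -math.expm1(-x)


def rate_multipliers(S: float, B: float) -> dict:
    """Rate scaling per substitution type under the ASSUMED parameterization."""
    return {
        "weak_to_strong": fixation_factor(S + B),  # favored by gBGC
        "strong_to_weak": fixation_factor(S - B),  # disfavored by gBGC
        "other": fixation_factor(S),               # unaffected by gBGC
    }


# For a fixed S, raise B and watch the two GC-changing rates diverge.
for B in (0.0, 1.0, 2.0, 4.0):
    m = rate_multipliers(S=0.0, B=B)
    total = m["weak_to_strong"] + m["strong_to_weak"]
    print(f"B={B:.1f}  W->S x{m['weak_to_strong']:.3f}  S->W x{m['strong_to_weak']:.3f}  sum={total:.3f}")
```

Because the fixation factor is convex, the gain in weak-to-strong substitutions outweighs the loss in strong-to-weak substitutions, which is one way to see why B raises both GC-content and the total substitution rate in this simplified setting.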
Figure. Inferring acceleration in the presence of gBGC. We used simulations to determine the frequency with which alignments generated from models with a wide range of levels of selection (S) and gBGC (B) are assigned to each class by our methodology. Increasing brightness corresponds to a decreasing fraction of the null class (C0). For the other three classes, the color representation corresponds to a point on the probability 2-simplex (red = gBGC, green = acceleration, blue = both). Because our classification procedure is conservative with respect to annotating selection, the red area is larger than the green area in each plot. Panel A: 1,000-bp alignments. Power is high and relatively few nonnull alignments are assigned to C0 (white/light grid points). Panel B: 500-bp alignments. Power is slightly reduced. Panel C: 100-bp alignments. Power is significantly lower (more white/light grid points), but we are still able to correctly annotate most of the extreme instances of substitution rate acceleration in the presence of gBGC.
Substitutions of Each Type across All HARs
| Type | Number of Substitutions (%) |
| Weak-to-strong | 369 (57) |
| Strong-to-weak | 187 (29) |
| Weak-to-weak | 46 (7) |
| Strong-to-strong | 45 (7) |
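The percentages in this table follow directly from the counts. A short check, using only the numbers given in the table (the variable names are illustrative):

```python
# Substitution counts across all HARs, copied from the table.
counts = {
    "Weak-to-strong": 369,
    "Strong-to-weak": 187,
    "Weak-to-weak": 46,
    "Strong-to-strong": 45,
}

total = sum(counts.values())

# Share of each substitution type, rounded to whole percentage points.
percent = {kind: round(100 * n / total) for kind, n in counts.items()}

# Ratio of GC-increasing to GC-decreasing substitutions.
ws_to_sw_ratio = counts["Weak-to-strong"] / counts["Strong-to-weak"]

for kind, n in counts.items():
    print(f"{kind}: {n} ({percent[kind]}%)")
print(f"Total substitutions: {total}")
print(f"Weak-to-strong / strong-to-weak ratio: {ws_to_sw_ratio:.2f}")
```

Weak-to-strong substitutions are roughly twice as frequent as strong-to-weak ones, which is the GC-biased pattern the abstract refers to.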
Number of HARs Annotated to Each Class via BIC, LRTs, and AIC
| Class | BIC | LRTs | AIC |
| | 37 | 38 | 35 |
| | 164 | 154 | 149 |
| | 1 | 10 | 18 |
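AIC and BIC both trade goodness of fit against the number of free parameters, but BIC penalizes parameters more heavily as the amount of data grows. The sketch below uses the textbook definitions. The model names, log-likelihoods, parameter counts, and the choice of alignment length as the sample size n are all hypothetical and are not taken from the article.

```python
import math


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * loglik


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * loglik


def best_model(fits: dict, n: int, criterion: str = "bic") -> str:
    """Return the name of the model with the lowest criterion value.

    `fits` maps a model name to (maximized log-likelihood, number of free parameters).
    """
    if criterion == "aic":
        scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    elif criterion == "bic":
        scores = {name: bic(ll, k, n) for name, (ll, k) in fits.items()}
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    return min(scores, key=scores.get)


# Hypothetical fits for one alignment; n is ASSUMED to be the number of alignment columns.
fits = {"simple": (-100.0, 1), "complex": (-97.5, 3)}
n_columns = 1000

print("AIC prefers:", best_model(fits, n_columns, "aic"))
print("BIC prefers:", best_model(fits, n_columns, "bic"))
```

In this toy case the two criteria disagree, with BIC favoring the model with fewer parameters. That tendency is consistent with the table, where BIC assigns the fewest HARs to the last class and AIC the most.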