Jedidiah Carlson, Adam E Locke, Matthew Flickinger, Matthew Zawistowski, Shawn Levy, Richard M Myers, Michael Boehnke, Hyun Min Kang, Laura J Scott, Jun Z Li, Sebastian Zöllner.
Abstract
A detailed understanding of the genome-wide variability of single-nucleotide germline mutation rates is essential to studying human genome evolution. Here, we use ~36 million singleton variants from 3560 whole-genome sequences to infer fine-scale patterns of mutation rate heterogeneity. Mutability is jointly affected by adjacent nucleotide context and diverse genomic features of the surrounding region, including histone modifications, replication timing, and recombination rate, sometimes suggesting specific mutagenic mechanisms. Remarkably, GC content, DNase hypersensitivity, CpG islands, and H3K36 trimethylation are associated with both increased and decreased mutation rates depending on nucleotide context. We validate these estimated effects in an independent dataset of ~46,000 de novo mutations, and confirm our estimates are more accurate than previously published results based on ancestrally older variants without considering genomic features. Our results thus provide the most refined portrait to date of the factors contributing to genome-wide variability of the human germline mutation rate.
Year: 2018 PMID: 30218074 PMCID: PMC6138700 DOI: 10.1038/s41467-018-05936-5
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Mutation rates vary according to sequence context. a Heatmap of estimated relative mutation rates for all possible A > G and C > T transition subtypes, up to a 7-mer resolution (high-resolution heatmaps for all possible subtypes are included in Supplementary Fig. 1). The leftmost panels show the relative mutation rates for the 1-mer types, and the subsequent panels to the right show these rates stratified by increasingly broader sequence context. Each 4 × 4 grid delineates a set of 16 subtypes, defined by the upstream sequence (y-axis) and downstream sequence (x-axis) from the central (mutated) nucleotide. Boxed regions indicate motifs previously identified by Aggarwala and Voight as hypermutable (pink) or hypomutable (green), relative to their similar subtypes. b Zoomed-in view showing hypermutable NTT[A > T]AAA subtypes relative to other 7-mer A > T subtypes.
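The subtype grid described in this caption can be illustrated with a small enumeration. The sketch below is not the authors' code: the function name `kmer_subtypes` and the string format `UP[REF>ALT]DOWN` are assumptions chosen to mirror the caption's NTT[A > T]AAA notation. It lists every upstream and downstream flank combination around a central mutation for a given K-mer length.

```python
from itertools import product

BASES = "ACGT"

def kmer_subtypes(ref, alt, k):
    """List all K-mer subtypes for one mutation type (hypothetical helper).

    k must be odd; the central base is the mutated one and each flank
    holds (k - 1) // 2 bases.
    """
    if k % 2 == 0 or k < 1:
        raise ValueError("k must be a positive odd number")
    flank = (k - 1) // 2
    subtypes = []
    for up in product(BASES, repeat=flank):
        for down in product(BASES, repeat=flank):
            subtypes.append(f"{''.join(up)}[{ref}>{alt}]{''.join(down)}")
    return subtypes

# A 3-mer has one base on each side: 4 x 4 = 16 subtypes, one 4 x 4 grid.
three_mers = kmer_subtypes("A", "G", 3)

# A 7-mer has three bases on each side: 4**3 * 4**3 = 4096 subtypes per type.
seven_mers = kmer_subtypes("A", "T", 7)

# Subtypes matching the NTT[A>T]AAA motif, where N is any base.
ntt_aaa = [s for s in seven_mers if s[1:3] == "TT" and s.endswith("]AAA")]
```

Under this representation, each step from a K-mer to a (K+2)-mer splits every subtype into 16 finer ones, which is the nesting the heatmap panels depict.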
Fig. 2 Discordance between ERV-estimated and common SNV-estimated mutation rates. a Relationship between 7-mer relative mutation rates estimated among BRIDGES ERVs (x-axis) and the 1000G intergenic SNVs (y-axis) on a log-log scale. We note that the strength of this correlation is driven by hypermutable CpG > TpG transitions. b Type-specific 2D-density plots, as situated in the scatterplot of a. The dashed line indicates the expected relationship if no bias is present. c Heatmap showing the ratio between the relative mutation rates for each 7-mer mutation subtype. Subtypes with higher rates among the 1000G SNVs (relative to ERV-derived rates) are shaded gold, and subtypes with lower rates in the 1000G SNVs are shaded green. Relative differences are truncated at 2 and 0.5, as only 2.5% of subtypes showed differences beyond this range.
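The truncation used for the heatmap in panel c amounts to clamping a ratio to the interval [0.5, 2]. A minimal sketch follows, assuming the ratio is taken as the 1000G-derived rate divided by the ERV-derived rate; the function name and the example rates are hypothetical and are not values from the study.

```python
def truncated_ratio(rate_1000g, rate_erv, lower=0.5, upper=2.0):
    """Ratio of two relative mutation rates, clamped to [lower, upper]."""
    ratio = rate_1000g / rate_erv
    return max(lower, min(upper, ratio))

# Hypothetical (1000G rate, ERV rate) pairs for three subtypes.
example_rates = {
    "subtype_1": (0.012, 0.004),  # ratio about 3, clamped to the upper bound
    "subtype_2": (0.003, 0.004),  # ratio about 0.75, inside the range
    "subtype_3": (0.001, 0.004),  # ratio about 0.25, clamped to the lower bound
}

truncated = {
    name: truncated_ratio(r_1000g, r_erv)
    for name, (r_1000g, r_erv) in example_rates.items()
}
```

Clamping keeps the color scale readable for the bulk of subtypes at the cost of hiding how extreme the few outliers are.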
Goodness-of-fit statistics for mutation rate estimates applied to de novo testing data
The first three columns define the mutation rate estimation strategy.

| Subtype length | Study | Variant type | AIC | ΔAIC (a) | AIC rank (b) | Nagelkerke's R² |
|---|---|---|---|---|---|---|
| 1-mers | BRIDGES | ERVs | 353,896 | 21,575 | 7 | 0.088 |
| 3-mers | BRIDGES | ERVs | 335,319 | 2998 | 4 | 0.118 |
| 5-mers | BRIDGES | ERVs | 332,861 | 540 | 3 | 0.124 |
| 7-mers | BRIDGES | ERVs | 332,321 | 0 | 1 | 0.126 |
| 7-mers | BRIDGES | ERVs (passing 1000G strict mask) | 332,582 | 261 | 2 | 0.125 |
| 7-mers | BRIDGES | MAC10+ | 342,886 | 10,565 | 5 | 0.103 |
| 7-mers | 1000G | Intergenic SNVs | 344,003 | 11,682 | 6 | 0.100 |

(a) Difference in AIC from the baseline BRIDGES 7-mer model
(b) Lower AIC rank indicates better model performance
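The ΔAIC and rank columns can be recomputed from the table as a consistency check. The sketch below assumes that the first numeric column of each row is that model's AIC and that the baseline is the BRIDGES 7-mer ERV model named in footnote a; the dictionary keys are only labels for this example.

```python
# First numeric column of the table above, read as AIC (an assumption).
aic = {
    ("1-mers", "BRIDGES", "ERVs"): 353_896,
    ("3-mers", "BRIDGES", "ERVs"): 335_319,
    ("5-mers", "BRIDGES", "ERVs"): 332_861,
    ("7-mers", "BRIDGES", "ERVs"): 332_321,
    ("7-mers", "BRIDGES", "ERVs (passing 1000G strict mask)"): 332_582,
    ("7-mers", "BRIDGES", "MAC10+"): 342_886,
    ("7-mers", "1000G", "Intergenic SNVs"): 344_003,
}

# Baseline: the BRIDGES 7-mer model fitted to ERVs (footnote a).
baseline = aic[("7-mers", "BRIDGES", "ERVs")]

# Difference in AIC from the baseline for every strategy.
delta_aic = {model: value - baseline for model, value in aic.items()}

# Rank 1 is the lowest AIC, i.e. the best-performing model (footnote b).
ordered = sorted(aic, key=aic.get)
aic_rank = {model: position for position, model in enumerate(ordered, start=1)}
```

With these inputs the recomputed differences and ranks agree with the values printed in the table, which supports reading the first numeric column as AIC.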
Fig. 3 Distributions of statistically significant mutagenic effects of genomic features. a Effects of seven genomic features where associations with multiple mutation types were detected. For features with bidirectional effects, we separately plotted distributions of positive associations (OR > 1; above dashed line) and negative associations (OR < 1; below dashed line). The number of 7-mer subtypes within each type for which that feature is statistically significant in a positive or negative direction is shown above or below each distribution. Distributions are only shown for types with 10 or more 7-mer subtypes associated in the same direction. *Odds ratios for the three continuously valued features (recombination rate, replication timing, and GC content) indicate the change in odds of mutability per 10% increase in the value of that feature. †Effects in CpG islands tend to be stronger than other features, so are shown on a wider scale. b Distributions of significant mutagenic effects for the 5 features only associated with CpG > TpG transitions.
Fig. 4 Comparison of goodness-of-fit for different mutation rate estimation strategies. For each mutation type and each model i, we calculated ΔAIC_i = AIC_i − AIC_min, where AIC_min is the lowest AIC among the models for that type, as a measure of relative model performance, with lower values of ΔAIC indicating better fit to the GoNL/ITMI de novo mutation data. ΔAIC is shown on the horizontal axis on an arcsinh scale. For each mutation type, the best-fitting model thus has a ΔAIC = 0. Models with ΔAIC < 10 (grey-shaded area) are considered comparable to the optimal model, whereas models with ΔAIC > 10 are considered to explain substantially less variation than the optimal model[67].
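The quantities plotted in this figure can be sketched for a single mutation type as follows. The sketch assumes that ΔAIC for a model is its AIC minus the lowest AIC among the compared models, which is consistent with the best-fitting model having ΔAIC = 0. The model names and AIC values are hypothetical, and `summarize_models` is an illustrative helper rather than anything taken from the study.

```python
import math

def summarize_models(aic_by_model, threshold=10.0):
    """Return ΔAIC, its arcsinh transform, and a comparability flag per model."""
    best = min(aic_by_model.values())
    summary = {}
    for model, value in aic_by_model.items():
        delta = value - best
        summary[model] = {
            "delta_aic": delta,
            # arcsinh is defined at 0 and grows like a logarithm for large
            # inputs, so the best model (ΔAIC = 0) can sit on the same axis
            # as models with very large ΔAIC.
            "arcsinh_delta_aic": math.asinh(delta),
            # Models below the threshold count as comparable to the best one.
            "comparable": delta < threshold,
        }
    return summary

# Hypothetical AIC values for one mutation type (not from the paper).
example = summarize_models({"model_a": 1000.0, "model_b": 1004.5, "model_c": 1250.0})
```

A plain logarithmic axis could not show the best model at all, since log(0) is undefined, which is one practical reason to prefer an arcsinh axis for this kind of plot.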