Ryan D Hernandez, Lawrence H Uricchio, Kevin Hartman, Chun Ye, Andrew Dahl, Noah Zaitlen.
Abstract
The vast majority of human mutations have minor allele frequencies under 1%, with the plurality observed only once (that is, 'singletons'). While Mendelian diseases are predominantly caused by rare alleles, their cumulative contribution to complex phenotypes is largely unknown. We develop and rigorously validate an approach to jointly estimate the contribution of all alleles, including singletons, to phenotypic variation. We apply our approach to transcriptional regulation, an intermediate between genetic variation and complex disease. Using whole-genome DNA and lymphoblastoid cell line RNA sequencing data from 360 European individuals, we conservatively estimate that singletons contribute approximately 25% of cis heritability across genes (dwarfing the contributions of other frequencies). The majority (approximately 76%) of singleton heritability derives from ultrarare variants absent from thousands of additional samples. We develop an inference procedure to demonstrate that our results are consistent with pervasive purifying selection shaping the regulatory architecture of most human genes.
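To make the terms "minor allele frequency" and "singleton" concrete, the following minimal sketch counts minor alleles in a toy genotype matrix. The function names, the 0/1/2 genotype coding, and the toy data are illustrative assumptions, not part of the study's pipeline.

```python
from typing import List


def minor_allele_counts(genotypes: List[List[int]]) -> List[int]:
    """Return the minor allele count (MAC) at each site.

    Assumption: genotypes[i][j] is the number of alternate alleles
    (0, 1 or 2) carried by diploid individual i at site j.
    """
    n_chromosomes = 2 * len(genotypes)
    n_sites = len(genotypes[0])
    counts = []
    for j in range(n_sites):
        alt = sum(row[j] for row in genotypes)
        # The minor allele is whichever allele is rarer in the sample.
        counts.append(min(alt, n_chromosomes - alt))
    return counts


def singleton_fraction(genotypes: List[List[int]]) -> float:
    """Fraction of segregating sites whose minor allele is seen exactly once."""
    segregating = [m for m in minor_allele_counts(genotypes) if m > 0]
    if not segregating:
        return 0.0
    return sum(1 for m in segregating if m == 1) / len(segregating)


# Hypothetical toy data: 3 individuals (6 chromosomes), 4 sites.
toy = [
    [0, 1, 2, 0],
    [0, 1, 2, 1],
    [1, 0, 1, 0],
]
macs = minor_allele_counts(toy)
frac = singleton_fraction(toy)
```

Dividing each count by the number of sampled chromosomes (here 6) gives the sample minor allele frequency; a singleton is a site with a count of exactly 1.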
Year: 2019 PMID: 31477931 PMCID: PMC6730564 DOI: 10.1038/s41588-019-0487-7
Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330
Figure 1. Simulation results.
Across a broad range of parameters, the accuracy of heritability inference improves as the number of SNP bins (partitioned by MAF) increases. (a) Mean bias of total heritability (inferred minus true) for different numbers of SNP bins (K), where each point represents the mean of 500 simulations for one parameter set, with a box plot summarizing the bias distribution across all parameters (indicating median, upper/lower quartile, and twice the interquartile range). (b–f) The distribution of average bias across simulated parameters for each SNP bin, showing that both the mean and the variance of the bias decrease as K increases (n = 500 simulations in each plot).
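As a rough illustration of what partitioning SNPs into K MAF bins and measuring bias can look like, here is a minimal sketch. The equal-count binning rule, the function names, and the numbers are assumptions for illustration; the caption does not specify how the bin boundaries were chosen.

```python
from typing import List


def assign_maf_bins(mafs: List[float], k: int) -> List[int]:
    """Assign each SNP to one of k bins (0 = rarest) with near-equal SNP counts.

    Assumption: bins are defined by MAF rank, not by fixed MAF thresholds.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(mafs)
    order = sorted(range(n), key=lambda i: mafs[i])
    bins = [0] * n
    for rank, idx in enumerate(order):
        bins[idx] = rank * k // n
    return bins


def mean_bias(inferred: List[float], true_value: float) -> float:
    """Mean of (inferred minus true) over replicate simulations."""
    return sum(inferred) / len(inferred) - true_value


# Hypothetical MAFs for six SNPs, split into K = 3 bins.
example_mafs = [0.30, 0.001, 0.05, 0.002, 0.20, 0.01]
example_bins = assign_maf_bins(example_mafs, 3)

# Hypothetical heritability estimates from three replicates, true value 0.10.
example_bias = mean_bias([0.10, 0.12, 0.08], 0.10)
```

With a larger K, each bin spans a narrower MAF range, which is the setting in which the caption reports smaller bias.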
Figure 2. Partitioning heritability.
Rare variants (RVs) dominate the genetic architecture of human gene expression. (a) Average heritability estimates across genes, partitioned across minor allele frequency (MAF) bins (h², purple) after correcting for population structure using PCA (blue) and eliminating residual rare variant structure identified using a trans-permutation (pink). (b) The proportion of heritability attributed to each MAF bin. Singletons represent ~25% of the total inferred heritability, the vast majority of which is due to variants that are extremely rare in the population (inset, partitioning singletons in our data by the MAF observed in gnomAD, n > 15k; singletons not reported in gnomAD are indicated by *). (c) Cumulative h² inferred as a function of MAF for different frequency filter thresholds (purple, green, blue, brown), and when singletons are partitioned by population MAF (based on gnomAD, red). Including all SNPs and partitioning singletons by population MAF (instead of observed MAF) results in a substantially increased level of h². (d) Globally rare singletons represent 56% of all singletons, but contribute 93% of singleton h². Rare INDELs and structural variants (SVs) also have enriched contributions to heritability (2.8% of singletons but 7.8% of singleton h²). However, singletons inferred to derive from Neanderthal introgression or to have gnomAD MAF ≥ 10⁻⁴ make negligible contributions to singleton h². In all cases, confidence intervals/envelopes are based on the 95% quantile range of 1000 bootstrap simulations. PCA, principal component analysis; MAC, minor allele count.
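The 95% quantile range of 1000 bootstrap simulations mentioned above can be sketched as follows. This is a generic percentile bootstrap over genes, written under the assumption that genes are the resampling unit; the function names and the per-gene values are hypothetical and are not taken from the study.

```python
import random
from typing import Callable, List, Sequence, Tuple

GeneEstimate = Tuple[float, float]  # (singleton heritability, total heritability)


def singleton_proportion(genes: Sequence[GeneEstimate]) -> float:
    """Share of summed heritability attributed to singletons."""
    total = sum(t for _, t in genes)
    return sum(s for s, _ in genes) / total


def bootstrap_quantile_range(
    values: Sequence[GeneEstimate],
    statistic: Callable[[Sequence[GeneEstimate]], float],
    n_boot: int = 1000,
    level: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Resample with replacement n_boot times and return the central quantile range."""
    rng = random.Random(seed)
    n = len(values)
    stats: List[float] = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo_idx = round((1 - level) / 2 * (n_boot - 1))
    hi_idx = round((1 + level) / 2 * (n_boot - 1))
    return stats[lo_idx], stats[hi_idx]


# Hypothetical per-gene estimates (all totals are positive).
per_gene = [(0.02, 0.08), (0.01, 0.05), (0.03, 0.10), (0.00, 0.04), (0.02, 0.06)]
lower, upper = bootstrap_quantile_range(per_gene, singleton_proportion)
```

Taking quantiles of the resampled statistic avoids assuming that the estimate is normally distributed, which is one common reason to report a quantile range rather than a standard-error interval.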
Figure 3. Pervasive purifying selection drives the genetic architecture of gene expression.
Our model infers the strength of purifying selection acting on causal variants (ϕ), the correlation between the fitness and the effect size of causal variants (ρ), and a scaling factor that transforms fitness into effect sizes (τ). (a) The posterior distribution of the mean of each parameter across genes (curves), as well as a histogram of the posterior parameter estimates for each gene. (b) The joint posterior distribution of the average ρ and τ across genes shows an evolutionary tradeoff between the correlation and scaling of fitness and effect sizes. (c) The cumulative proportion of heritability inferred from the gene expression data (dots) compared to the expected patterns from 1000 draws from the posterior distribution (grey) and neutral expectation (pink).