Fred A Wright1, Patrick F Sullivan2, Andrew I Brooks3, Fei Zou4, Wei Sun4, Kai Xia4, Vered Madar4, Rick Jansen5, Wonil Chung4, Yi-Hui Zhou6, Abdel Abdellaoui7, Sandra Batista8, Casey Butler8, Guanhua Chen4, Ting-Huei Chen4, David D'Ambrosio9, Paul Gallins10, Min Jin Ha4, Jouke Jan Hottenga7, Shunping Huang8, Mathijs Kattenberg7, Jaspreet Kochar9, Christel M Middeldorp7, Ani Qu9, Andrey Shabalin11, Jay Tischfield3, Laura Todd10, Jung-Ying Tzeng6, Gerard van Grootheest5, Jacqueline M Vink7, Qi Wang9, Wei Wang12, Weibo Wang8, Gonneke Willemsen7, Johannes H Smit5, Eco J de Geus7, Zhaoyu Yin4, Brenda W J H Penninx5, Dorret I Boomsma7.
Abstract
We assessed gene expression profiles in 2,752 twins, using a classic twin design to quantify expression heritability and quantitative trait loci (eQTLs) in peripheral blood. The most highly heritable genes (∼777) were grouped into distinct expression clusters, enriched in gene-poor regions, associated with specific gene function or ontology classes, and strongly associated with disease designation. The design enabled a comparison of twin-based heritability to estimates based on dizygotic identity-by-descent sharing and distant genetic relatedness. Consideration of sampling variation suggests that previous heritability estimates have been upwardly biased. Genotyping of 2,494 twins enabled powerful identification of eQTLs, which we further examined in a replication set of 1,895 unrelated subjects. A large number of non-redundant local eQTLs (6,756) met replication criteria, whereas a relatively small number of distant eQTLs (165) met quality control and replication standards. Our results provide a new resource toward understanding the genetic control of transcription.
Year: 2014 PMID: 24728292 PMCID: PMC4012342 DOI: 10.1038/ng.2951
Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330
Table 1. Demography of 2,752 subjects from 1,444 twin pairs for twin-based heritability analyses.
| Variable | Median (IQR) or proportion |
|---|---|
| Age (years) | 32 (28–39) |
| Body mass index (kg/m²) | 23.3 (21.3–25.8) |
| White blood cell count (10⁹/L) | 6.3 (5.3–7.4) |
| Hematocrit (fraction) | 0.42 (0.40–0.45) |
| Female sex | 0.658 |
| Blood draw between 0700–1100 | 0.940 |
| Fasting at time of blood draw | 0.947 |
| Current smoker | 0.216 |
| Alcohol user (12 drinks/year) | 0.771 |
IQR=inter-quartile range.
Figure 1. Transcriptome-wide estimates of heritability, based on n=2752 twins. (a) Manhattan plot of h² P-values for the highest h² transcript for each of 18,392 genes. The inset (showing PADI2) illustrates that the evidence for heritability is based on a higher correlation between MZ pairs (blue) than between DZ pairs (red). (b) Clustering of 777 genes with h² q < 0.05. The most heritable genes belong to the cluster with lowest inter-gene correlation, but many significant genes belong to clusters with high inter-gene correlation. (c) Among 43,628 transcripts, the significant proportion (in terms of false discovery q-value) is dependent on mean transcript expression, increasing rapidly for transcripts above an approximate detection threshold (expression ≥ 3.584, determined as the 90th percentile of chrY expression in females).
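The MZ-versus-DZ contrast in the Figure 1a inset can be illustrated with Falconer's classical estimate, h² = 2(r_MZ − r_DZ). The sketch below is a simplification for intuition only: the source does not say that this formula was used for the reported estimates, and the function names and input layout (lists of co-twin expression pairs) are assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def falconer_h2(mz_pairs, dz_pairs):
    """Falconer's estimate h2 = 2 * (r_MZ - r_DZ).

    mz_pairs, dz_pairs: hypothetical lists of (twin1, twin2) expression
    values for one transcript in monozygotic and dizygotic pairs.
    """
    r_mz = pearson_r([a for a, _ in mz_pairs], [b for _, b in mz_pairs])
    r_dz = pearson_r([a for a, _ in dz_pairs], [b for _, b in dz_pairs])
    return 2.0 * (r_mz - r_dz)
```

A transcript whose MZ correlation clearly exceeds its DZ correlation yields a large estimate; equal correlations yield zero.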
Table 2. Predictors of high heritability expression levels.
| Predictor | Mean | Enrichment z | P | Expression-corrected enrichment z | P |
|---|---|---|---|---|---|
| Mean expression | | | | -- | -- |
| Variance of expression | |||||
| GC content, +5kb of TSS | −1.42 | 0.155 | |||
| GC content, −5kb of TSS | −0.72 | 0.471 | |||
| DNase I hypersens. site (DHS) near TSS | |||||
| DHS near TSS, blood | 1.30 | 0.195 | |||
| Gene density | |||||
| Gene size | |||||
| Local recombination rate | | 0.73 | 0.464 | 3.01 | 0.0026 |
| Size of LD block | | −0.05 | 0.959 | −0.49 | 0.622 |
| Gene conservation score | 1.14 | 0.255 | |||
| Genes under selection (185) | 0.013 | 1.60 | 0.109 | 1.82 | 0.068 |
| Genes under positive selection (549) | 0.007 | 1.32 | 0.186 | 1.78 | 0.074 |
| Genes under balancing selection (47) | 0.042 | 2.65 | 0.0081 | 2.83 | 0.0046 |
| Genes under adaptive selection (174) | 0.019 | 2.26 | 0.024 | 1.13 | 0.260 |
| Human accelerated genes (161) | 0.024 | 3.05 | 0.0023 | ||
| Primate accelerated genes (137) | 0.024 | 2.86 | 0.0042 | ||
| NHGRI GWAS catalog (2343) | |||||
| NHGRI, chr6 genes removed (2142) | |||||
| NHGRI, immune diseases (720) | |||||
| NHGRI, non-immune diseases (1623) | |||||
| OMIM disease entries (3089) | |||||
| NHGRI + OMIM (4809) |
Abbreviations: TSS=transcription start site, NHGRI=National Human Genome Research Institute, GWAS=genome-wide association study, OMIM=Online Mendelian Inheritance in Man. Values in boldface correspond to P < 0.0022, for Bonferroni significance at α=0.05 for the 23 tests in each of uncorrected and corrected analyses. Values in blue depict significant negative associations.
From the Encode Duke UCSC tracks.
Defined as the reversed rank of the variance of bp position of gene and two flanking genes.
End transcription bp minus start transcription bp.
Decode sex-averaged standardized recombination maps at http://www.decode.com/addendum/ in 10kb bins.
LD block boundaries as described in Supplementary Methods.
NCBI HomoloGene (build 66) score, defined as the ratio of number of appearances in other organisms to the total of 21.
Reference 36; the number of genes with the property is shown in parentheses.
References 36–38.
Reference 39.
Reference 35.
Reference 41.
Reference 1, for SNPs with P < 5×10⁻⁸.
Following classification in Reference 42.
Reference 43.
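The significance rule in the Table 2 legend is simple arithmetic: 0.05 / 23 ≈ 0.00217, which the legend rounds to 0.0022. The sketch below reproduces that threshold and converts an enrichment z-statistic to a P-value. It assumes the tabulated P-values are two-sided normal tail probabilities, which matches the z and P pairs that survive in the table (for example z = 3.01 with P = 0.0026) but is not stated in the source; the function names are illustrative.

```python
import math


def two_sided_p(z):
    """Two-sided P-value for a standard normal z-statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# 23 tests per analysis at alpha = 0.05, as in the table legend.
threshold = bonferroni_threshold(0.05, 23)

# Local recombination rate, expression-corrected: z = 3.01.
p_recomb = two_sided_p(3.01)
is_significant = p_recomb < threshold
```

Under this reading, z = 3.01 gives P ≈ 0.0026, just above the threshold, so the value is nominally but not Bonferroni significant.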
Figure 2. Gene density and other predictors of heritability, using n=2616 paired co-twins and 18,392 genes. (a) Mean h² (corrected for gene expression level) vs. density of protein coding genes per autosome, showing that heritability is considerably higher for gene-poor chromosomes. Plot symbol area is proportional to number of array genes per chromosome. (b) Histograms of the permuted enrichment z-statistics for two predictors listed in Table 2. Observed values (blue dots) are extreme compared to the permutations.
Figure 3. Apparent heritability and local IBD effects vs. true underlying distributions. (a) For the twin-based h² estimates (n=2752, 8818 expressed genes shown), subtracting the effects of sampling variation produces an estimated true distribution (blue). Re-simulating from the fitted, assumed true distribution closely approximates the observed h² distribution (black curve). (b) The analogous expressed-gene results for local IBD effect estimation. (c) Proportions of all 18,392 genes exceeding h² thresholds for observed data and for the estimated “true” h² distribution. The MuTHER study (n=856) reported many more extreme h² values, but the observation is consistent with greater sampling variation due to smaller sample size. (d) The analogous figure using only expressed genes from both studies.
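The argument in Figure 3c, that a smaller study shows more extreme h² values purely through sampling variation, can be sketched analytically. The example assumes that true h² across genes and the estimation error are both normal and independent, so the observed variance is the sum of the two. This is an illustration and not the fitting procedure behind the figure; every number is hypothetical, and the normal approximation ignores that h² is bounded between 0 and 1.

```python
import math


def normal_sf(x):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def prop_exceeding(threshold, true_mean, true_sd, se):
    """Expected proportion of genes whose observed h2 exceeds a threshold.

    Observed h2 = true h2 + sampling error, so under the stated assumptions
    the observed spread is sqrt(true_sd**2 + se**2).
    """
    observed_sd = math.sqrt(true_sd ** 2 + se ** 2)
    return normal_sf((threshold - true_mean) / observed_sd)


# Hypothetical inputs: the same true distribution seen by two studies.
TRUE_MEAN, TRUE_SD = 0.10, 0.10
true_prop = prop_exceeding(0.5, TRUE_MEAN, TRUE_SD, se=0.0)
large_study = prop_exceeding(0.5, TRUE_MEAN, TRUE_SD, se=0.05)  # small SE
small_study = prop_exceeding(0.5, TRUE_MEAN, TRUE_SD, se=0.15)  # large SE
```

With an identical underlying distribution, the study with the larger standard error reports a larger share of genes above the threshold.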
Figure 4. Comparison and replication of eQTL results. (a) Number of unique genes with evidence of local association (q < 0.01, SNP ± 1 Mb window of gene), depicted for published leukocyte eQTL studies (LCLs, monocytes, and PBLs), as well as subsampling of NTR data (PBLs) using only genotyped markers and moderate QC (n=2494, 43,628 transcripts examined). Sample sizes are corrected for the number of covariates used. The “NTR with final QC” value applies q < 0.001. (b) Overlap of local eQTL findings with two other large blood studies, at q < 0.01. (c) Number of unique genes with evidence (q < 0.01) for distant (greater than 1 Mb) association. The implausible non-monotone pattern for NTR on original expression values illustrates the importance of robust association methods. Using the final QC on NTR data and q < 0.001 drops the number of distant eQTLs from over 800 to ~300. The results suggest that many distant associations remain to be discovered, but careful QC is essential. (d) Overlap of distant eQTL findings (q < 0.001) with previous studies (within 1 Mb).
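The local/distant split used in Figure 4 can be written as a small classifier. The 1 Mb window comes from the caption. Treating a SNP on a different chromosome as distant, measuring the window from the gene's start and end coordinates, and the function and argument names are all assumptions made for this sketch.

```python
WINDOW_BP = 1_000_000  # 1 Mb window, as in the Figure 4 caption


def classify_eqtl(snp_chrom, snp_pos, gene_chrom, gene_start, gene_end,
                  window=WINDOW_BP):
    """Label a SNP-gene pair as 'local' or 'distant'.

    Assumption: a pair is local when the SNP lies on the same chromosome
    and within `window` bp of the gene body; all other pairs are distant.
    """
    if snp_chrom != gene_chrom:
        return "distant"
    if gene_start - window <= snp_pos <= gene_end + window:
        return "local"
    return "distant"
```

For example, `classify_eqtl("19", 8_000_000, "19", 8_500_000, 8_600_000)` is labelled local because the SNP sits 0.5 Mb upstream of the gene start.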
Figure 5. Properties of distant eQTLs. (a) 348 eQTLs (gene-SNP pairs) were significant (q < 0.001) and passed the QC procedures and, of these, 165 replicated (q < 0.1) in 1895 NESDA individuals. (b) The 304 SNPs in significant eQTLs were examined for overlap with regulatory features, including DNase/FAIRE and transfactor binding sites, using Variant Effect Predictor (version 2.8) of Ensembl [54]. Most features were not enriched, although the 3 SNPs annotated as 5′ UTR variants all overlap with regulatory features, representing a significant enrichment compared to the total 18.4% overlap of distant eQTL SNPs with regulatory features. (c) The π1 value represents the estimated proportion of the transcriptome influenced by the 304 QC-passing SNPs in significant eQTLs. Across all significant bins the cumulative proportion is only ~3%. (d) A distant eQTL hotspot on chr19 was associated with the expression of 12 distant genes, and one local gene (MYO1F). The partial correlation graph suggests that MYO1F expression is independent of the expression of the other distant genes given the expression of the transcription factor SOX13.
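A π1 value such as the one in Figure 5c is commonly obtained as 1 − π0, where π0 is the estimated proportion of true null hypotheses among a set of P-values. The sketch below uses Storey's estimator with a fixed tuning value λ. The caption does not say which estimator or tuning value was used, so both, along with the function name, are assumptions.

```python
def estimate_pi1(p_values, lam=0.5):
    """Estimate pi1 = 1 - pi0 with Storey's fixed-lambda estimator.

    pi0 is estimated as #{p > lam} / (m * (1 - lam)): P-values above `lam`
    are assumed to come almost entirely from true nulls, which are uniform.
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("p_values must not be empty")
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    pi0 = sum(1 for p in p_values if p > lam) / (m * (1.0 - lam))
    pi0 = min(1.0, pi0)  # the raw estimate can exceed 1 in small samples
    return 1.0 - pi0
```

Applied to the P-values of one SNP against all transcripts, the result is an estimate of the share of transcripts that the SNP influences.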