Mike Thompson, Mary Grace Gordon, Andrew Lu, Anchit Tandon, Eran Halperin, Alexander Gusev, Chun Jimmie Ye, Brunilda Balliu, Noah Zaitlen.
Abstract
A majority of the variants identified in genome-wide association studies fall in non-coding regions of the genome, suggesting that their effects are mediated through gene expression. Building on this hypothesis, transcriptome-wide association studies (TWAS) have helped both to interpret known associations and to discover additional genes associated with complex traits. However, existing TWAS methods do not take full advantage of the intra-individual correlation inherent in multi-context expression studies, and they do not properly adjust for multiple testing across contexts. We introduce CONTENT, a computationally efficient method with proper cross-context false discovery correction that leverages the correlation structure across contexts to improve power and to generate context-specific and context-shared components of expression. We apply CONTENT to bulk multi-tissue and single-cell RNA-seq data sets and show that it leads to a 42% (bulk) and 110% (single-cell) increase in the number of genetically predicted genes relative to previous approaches. We find that the context-specific component of expression accounts for 30% of heritability in tissue-level bulk data and 75% in single-cell data, consistent with cell-type heterogeneity in bulk tissue. In TWAS, CONTENT increases the number of locus-phenotype associations discovered by over 51% relative to previous methods across 22 complex traits.
Year: 2022 PMID: 36171194 PMCID: PMC9519579 DOI: 10.1038/s41467-022-33212-0
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 17.694
Fig. 1 An overview of the CONTENT approach.
CONTENT first decomposes each individual's observed expression into context-specific and context-shared components, following [16]. It then fits predictors for the context-shared component and for each context-specific component of expression (e.g., liver). Finally, for a given context, CONTENT combines the genetically predicted components into the full model using a simple regression. Icons were created with BioRender.com.
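The decomposition and combination steps in the caption can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes the shared component is each individual's mean expression across the contexts in which they were measured and the specific component is the per-context deviation from that mean. The function names `decompose` and `combine_components` are hypothetical, and the step that trains a genetic predictor for each component is omitted.

```python
import numpy as np

def decompose(expr):
    """Split an (individuals x contexts) expression matrix into shared and specific parts.

    Assumption: shared = each individual's mean over measured contexts (NaN marks
    an unmeasured context); specific = the deviation from that mean in each context.
    """
    expr = np.asarray(expr, dtype=float)
    shared = np.nanmean(expr, axis=1)        # one value per individual
    specific = expr - shared[:, None]        # NaN stays NaN where unmeasured
    return shared, specific

def combine_components(pred_shared, pred_specific, observed):
    """Regress observed expression in one context on the two genetic predictions.

    Returns [intercept, weight_shared, weight_specific]; the fitted values play
    the role of a 'full' predictor for that context.
    """
    X = np.column_stack([np.ones(len(observed)), pred_shared, pred_specific])
    coef, *_ = np.linalg.lstsq(X, np.asarray(observed, dtype=float), rcond=None)
    return coef

# Three individuals, two contexts; the second individual lacks context 2.
shared, specific = decompose([[1.0, 3.0], [2.0, np.nan], [0.0, 4.0]])
print(shared)    # each individual's cross-context mean
print(specific)  # per-context deviations from that mean
```

Taking the mean over measured contexts only (via `nanmean`) lets individuals with missing contexts still contribute to the shared component, which is one plausible reason to decompose before fitting predictors.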
Fig. 2 CONTENT is powerful and well-calibrated in simulated data.
Accuracy of each method in predicting the genetically regulated expression of each gene-context pair, for different correlations of intra-individual noise across contexts. Mean adjusted R² across contexts between the predicted component for each method and the true (A) full (context-specific + context-shared), (B) shared, and (C) specific genetic components of expression, for different levels of intra-individual correlation. The context-by-context approach and UTMOST output only a single predictor, and we show the variability captured by this predictor for each component of expression. CONTENT, however, generates predictors for all three components of expression; notably, CONTENT(Specific) and CONTENT(Shared) capture their intended component of expression without capturing the other (i.e., the CONTENT(Specific) predictor is uncorrelated with the true shared component of expression, and vice versa). Shown here is the accuracy for each component and method on gene-contexts with both context-shared and context-specific effects; Supplementary Fig. 4 shows the accuracy for all gene-context pairs.
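Figs. 2 and 3 report accuracy as adjusted R². For a single predictor, adjusted R² can be computed from the squared Pearson correlation between the true and predicted values. The sketch below assumes exactly one predictor per comparison; the function name is hypothetical and the paper's exact computation may differ.

```python
import numpy as np

def adjusted_r2(true, pred):
    """Adjusted R^2 for a linear fit of `true` on a single predictor `pred`."""
    true = np.asarray(true, dtype=float)
    pred = np.asarray(pred, dtype=float)
    n = true.size
    r2 = np.corrcoef(true, pred)[0, 1] ** 2    # R^2 of a one-predictor regression
    return 1 - (1 - r2) * (n - 1) / (n - 2)    # penalize for p = 1 predictor

print(adjusted_r2([1, 2, 3, 4, 5], [1, 3, 2, 5, 4]))  # correlation 0.8, R^2 0.64
```

For the accuracy ratios in Fig. 3, the same function would be applied to each method's predictions against the same target and the two results divided.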
Fig. 3 CONTENT outperforms existing approaches in the GTEx and scRNA-seq CLUES datasets.
A, D Number of genes with a significantly predictable component (hFDR ≤ 5%) in GTEx (A) and CLUES (D); the sample size for each context is given in parentheses. B, E Ratio of the expression prediction accuracy (adjusted R²) of the best-performing cross-validated CONTENT model to that of the context-by-context (green) and UTMOST (blue) approaches (median across all genes significantly predicted by at least one of the two methods being compared). Values above one indicate higher adjusted R², and thus higher prediction accuracy, for CONTENT. C, F Prediction accuracy of CONTENT(Full) and CONTENT(Shared) when a gene-tissue pair has significant shared, specific, and full models.
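The hFDR threshold controls false discoveries across genes and contexts jointly. The exact procedure is not described in this summary; the following sketch shows one common two-level scheme, assumed here for illustration: each gene's per-context p-values are combined with Simes' method, the Benjamini-Hochberg (BH) procedure selects genes, and BH is then applied within each selected gene at a level shrunk by the fraction of genes selected (in the style of Benjamini and Bogomolov's selective inference). All function names are hypothetical.

```python
import numpy as np

def bh_reject(pvals, q):
    """Benjamini-Hochberg step-up procedure: boolean mask of rejected hypotheses."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()       # largest i with p_(i) <= q * i / m
        reject[order[:k + 1]] = True
    return reject

def simes(pvals):
    """Simes combination of one gene's per-context p-values."""
    p = np.sort(np.asarray(pvals, dtype=float))
    return np.min(p.size * p / np.arange(1, p.size + 1))

def hierarchical_fdr(pmat, q=0.05):
    """pmat: (genes x contexts) p-values -> (genes x contexts) rejection mask."""
    pmat = np.asarray(pmat, dtype=float)
    gene_p = np.array([simes(row) for row in pmat])
    genes_sel = bh_reject(gene_p, q)                   # level 1: genes
    q_within = q * genes_sel.sum() / pmat.shape[0]     # adjust for gene selection
    out = np.zeros(pmat.shape, dtype=bool)
    for g in np.nonzero(genes_sel)[0]:
        out[g] = bh_reject(pmat[g], q_within)          # level 2: contexts
    return out

pm = [[0.001, 0.04, 0.5], [0.2, 0.6, 0.9], [0.0001, 0.002, 0.03]]
print(hierarchical_fdr(pm, q=0.05))
```

Testing genes first and contexts second is what keeps the error rate meaningful when many contexts per gene are examined, which is the cross-context multiple-testing issue raised in the abstract.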
Fig. 4 Contribution of context-specific genetic regulation in GTEx and CLUES.
A, C Number of genes with a significant (FDR ≤ 5%) CONTENT(Specific) model of expression in GTEx (A) and CLUES (C). Color indicates the sample size of each context. B, D Proportion of the expression variance of CONTENT(Full) explained by CONTENT(Specific) and CONTENT(Shared), for genes with a significant CONTENT(Full) model.
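If CONTENT(Full) is written as a weighted sum of the predicted shared and specific components, its variance can be split between the two terms. A minimal sketch, assuming the weights from the combination regression are available: each term's share is its covariance with the full prediction divided by the full prediction's variance, so the two shares always sum to one and reduce to squared-weight variance ratios when the components are uncorrelated. The function name is hypothetical and the paper may compute the proportion differently.

```python
import numpy as np

def variance_shares(w_shared, w_specific, pred_shared, pred_specific):
    """Fractions of var(full prediction) attributable to each weighted component."""
    shared_part = w_shared * np.asarray(pred_shared, dtype=float)
    specific_part = w_specific * np.asarray(pred_specific, dtype=float)
    full = shared_part + specific_part        # intercept omitted: it has no variance
    var_full = np.var(full)                   # population variance (ddof=0)
    share_shared = np.cov(shared_part, full, bias=True)[0, 1] / var_full
    share_specific = np.cov(specific_part, full, bias=True)[0, 1] / var_full
    return share_shared, share_specific

# Uncorrelated toy components with weights 2 (shared) and 1 (specific).
print(variance_shares(2.0, 1.0, [1, -1, 1, -1], [1, 1, -1, -1]))
```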
Fig. 5 CONTENT(Full) is powerful, sensitive, and specific in simulated TWAS data.
Average AUC from 1000 TWAS simulations while varying the overall heritability of gene expression. Each phenotype (1000 per heritability level) was generated from the genetically regulated expression of 300 randomly selected gene-context pairs (100 genes with 3 contexts each), which together accounted for 20% of the phenotypic variance. For genes with low heritability, CONTENT(Shared) performed similarly to CONTENT(Full); however, CONTENT(Full) was the most powerful method for discovering the correct genes across the full range of heritability. CONTENT(Full) was significantly more powerful than UTMOST and the context-by-context approach at all levels of heritability.
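The evaluation logic of Fig. 5 can be mimicked at small scale. This sketch assumes standard-normal stand-ins for genetically regulated expression and random causal effect sizes, scales the causal signal to explain a chosen fraction of phenotypic variance (20%, as in the caption), scores each feature with a correlation t-statistic, and measures AUC through the Mann-Whitney rank identity. It uses 10 causal features rather than the caption's 300 so that one small run gives a clear signal, and it does not implement CONTENT, UTMOST, or the context-by-context predictors. Function names are hypothetical.

```python
import numpy as np

def simulate_phenotype(grex, causal_idx, var_explained, rng):
    """Phenotype whose causal signal explains about `var_explained` of its variance."""
    beta = rng.standard_normal(len(causal_idx))       # random effect sizes (assumption)
    signal = grex[:, causal_idx] @ beta
    signal = (signal - signal.mean()) / signal.std()  # standardize to unit variance
    noise = rng.standard_normal(grex.shape[0])
    noise = (noise - noise.mean()) / noise.std()
    # Mix so that var(signal part) / var(y) is approximately var_explained.
    return np.sqrt(var_explained) * signal + np.sqrt(1 - var_explained) * noise

def assoc_stat(grex, y):
    """Correlation t-statistic of each column of `grex` with `y`."""
    n = len(y)
    gz = (grex - grex.mean(axis=0)) / grex.std(axis=0)
    yz = (y - y.mean()) / y.std()
    r = gz.T @ yz / n                                 # Pearson correlations
    return r * np.sqrt((n - 2) / (1 - r ** 2))

def auc(scores, labels):
    """ROC AUC via the Mann-Whitney rank-sum identity (assumes no tied scores)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    ranks = np.empty(scores.size)
    ranks[np.argsort(scores)] = np.arange(1, scores.size + 1)
    n_pos, n_neg = labels.sum(), (~labels).sum()
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(0)
n_people, n_features = 20_000, 200
grex = rng.standard_normal((n_people, n_features))
causal = rng.choice(n_features, size=10, replace=False)
y = simulate_phenotype(grex, causal, var_explained=0.20, rng=rng)
is_causal = np.zeros(n_features, dtype=bool)
is_causal[causal] = True
print(f"AUC = {auc(np.abs(assoc_stat(grex, y)), is_causal):.3f}")
```

In the figure's setting, the scores would instead come from associating each method's predicted expression with the phenotype, and the AUC would be averaged over many simulated phenotypes.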
CONTENT outperforms existing methods in TWAS across 22 complex traits and diseases
| Trait | GTEx: Context-by-context | GTEx: UTMOST | GTEx: CONTENT (All) | GTEx: CONTENT (Full) | GTEx: CONTENT (Specific) | GTEx: CONTENT (Shared) | CLUES: Context-by-context | CLUES: UTMOST | CLUES: CONTENT (All) | CLUES: CONTENT (Full) | CLUES: CONTENT (Specific) | CLUES: CONTENT (Shared) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AD | 17 | 9 | 20 | 20 | 11 | 7 | 5 | 9 | 13 | 3 | ||
| Asthma | 155 | 90 | 181 | 195 | 67 | 74 | 63 | 101 | 104 | 34 | ||
| Bipolar | 42 | 45 | 63 | 65 | 39 | 9 | 14 | 20 | 25 | 5 | ||
| CAD | 10 | 11 | 18 | 15 | 8 | 6 | 6 | 7 | 6 | 0 | ||
| CKD | 26 | 19 | 31 | 32 | 18 | 2 | 4 | 5 | 5 | 1 | ||
| Crohn’s | 77 | 63 | 73 | 83 | 47 | 27 | 22 | 30 | 37 | 9 | ||
| Eczema | 32 | 13 | 44 | 41 | 10 | 8 | 5 | 9 | 9 | 3 | ||
| FastGlu | 16 | 8 | 12 | 14 | 7 | 3 | 3 | 6 | 6 | 6 | 0 | |
| HDL | 58 | 29 | 60 | 73 | 36 | 21 | 14 | 23 | 25 | 6 | ||
| IBS | 9 | 5 | 20 | 16 | 3 | 3 | 1 | 5 | 6 | 1 | ||
| LDL | 89 | 57 | 107 | 116 | 58 | 47 | 29 | 40 | 44 | 14 | ||
| Lupus | 93 | 54 | 94 | 104 | 51 | 36 | 27 | 42 | 48 | 11 | ||
| MDD | 99 | 79 | 132 | 134 | 62 | 20 | 29 | 32 | 39 | 3 | ||
| MS | 20 | 10 | 32 | 25 | 9 | 9 | 7 | 8 | 10 | 5 | ||
| PBC | 62 | 42 | 55 | 58 | 33 | 21 | 14 | 24 | 26 | 6 | ||
| Psoriasis | 47 | 22 | 46 | 41 | 16 | 13 | 10 | 17 | 16 | 6 | ||
| RA | 73 | 56 | 79 | 86 | 46 | 40 | 20 | 33 | 45 | 9 | ||
| Sarcoidosis | 19 | 13 | 27 | 26 | 8 | 6 | 4 | 6 | 6 | 4 | 2 | |
| Sjogren | 17 | 9 | 25 | 21 | 6 | 4 | 2 | 6 | 6 | 1 | ||
| T1D | 77 | 64 | 88 | 84 | 49 | 26 | 23 | 36 | 29 | 13 | ||
| T2D | 193 | 115 | 208 | 205 | 112 | 76 | 76 | 77 | 98 | 17 | ||
| Ulc colitis | 16 | 10 | 30 | 26 | 7 | 5 | 4 | 9 | 7 | 2 | ||
TWAS results (unique loci, merging genes within 1 Mb) across 22 complex traits and diseases using weights output by CONTENT, UTMOST, and the context-by-context method. CONTENT(All) refers to the collection of all loci output by at least one CONTENT model. At an hFDR of 5%, CONTENT(Full) added on average 15% (GTEx) and 19% (CLUES) more gene-trait discoveries than CONTENT(Shared) and CONTENT(Specific) combined.
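The table counts unique loci rather than genes, merging significant genes within 1 Mb. One simple way to do this, assumed here, is single-linkage clustering: sort each chromosome's significant genes by position and start a new locus whenever the gap to the previous gene exceeds 1 Mb. The function name and the choice of gene coordinate (e.g., transcription start site) are assumptions.

```python
def count_loci(hits, window=1_000_000):
    """Count loci among significant genes given as (chromosome, position) pairs.

    Genes on the same chromosome whose consecutive sorted positions are at most
    `window` bp apart are merged into one locus (single-linkage clustering).
    """
    by_chrom = {}
    for chrom, pos in hits:
        by_chrom.setdefault(chrom, []).append(pos)
    loci = 0
    for positions in by_chrom.values():
        positions.sort()
        loci += 1                                    # first gene opens a locus
        for prev, cur in zip(positions, positions[1:]):
            if cur - prev > window:                  # gap too large: new locus
                loci += 1
    return loci

hits = [("1", 100), ("1", 900_000), ("1", 2_500_000), ("2", 100), ("1", 1_800_000)]
print(count_loci(hits))  # chromosome 1 genes chain together; chromosome 2 adds one
```

Because merging is by chaining, a run of genes each under 1 Mb from its neighbor forms a single locus even if its ends are more than 1 Mb apart; a fixed-window scheme would count such runs differently.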
Numbers in bold font indicate the method with the greatest number of discoveries. AD Alzheimer's disease, CAD coronary artery disease, CKD chronic kidney disease, Crohn's Crohn's disease, FastGlu fasting glucose, GFR glomerular filtration rate, HDL high-density lipoprotein, IBS irritable bowel syndrome, LDL low-density lipoprotein, Lupus systemic lupus erythematosus, MDD major depressive disorder, MS multiple sclerosis, PBC primary biliary cholangitis, RA rheumatoid arthritis, Sjogren Sjögren's syndrome, T1D type 1 diabetes, T2D type 2 diabetes, TG triglycerides, Ulc colitis ulcerative colitis.
See Supplementary Table 2 for GWAS trait information.