Binglan Li, Yogasudha Veturi, Anurag Verma, Yuki Bradford, Eric S Daar, Roy M Gulick, Sharon A Riddler, Gregory K Robbins, Jeffrey L Lennox, David W Haas, Marylyn D Ritchie.
Abstract
As a relatively new methodology, the transcriptome-wide association study (TWAS) has gained interest because of its capacity for gene-level association testing. However, the development of TWAS has outpaced statistical evaluation of TWAS gene prioritization performance. Current TWAS methods vary in their underlying biological assumptions about the tissue specificity of transcriptional regulatory mechanisms. In a previous study from our group, these differences may have affected whether TWAS methods better identified associations in single tissues or in multiple tissues. We therefore designed simulation analyses to examine how the interplay between particular TWAS methods and the tissue specificity of gene expression affects power and type I error rates for gene prioritization. We found that cross-tissue identification of expression quantitative trait loci (eQTLs) improved TWAS power. Single-tissue TWAS (i.e., PrediXcan) had robust power to identify genes expressed in single tissues but often also found significant associations in the wrong tissues, and therefore had high false positive rates. Cross-tissue TWAS (i.e., UTMOST) had overall equal or greater power and controlled type I error rates for genes expressed in multiple tissues. Based on these simulation results, we applied a tissue specificity-aware TWAS (TSA-TWAS) analytic framework to look for gene-based associations with pre-treatment laboratory values from AIDS Clinical Trials Group (ACTG) studies. We replicated several proof-of-concept transcriptionally regulated gene-trait associations, including UGT1A1 (encoding the bilirubin uridine diphosphate glucuronosyltransferase enzyme) with total bilirubin levels (p = 3.59×10⁻¹²), and CETP (cholesteryl ester transfer protein) with high-density lipoprotein cholesterol (p = 4.49×10⁻¹²). We also identified several novel genes associated with metabolic and virologic traits, as well as pleiotropic genes that linked plasma viral load, absolute basophil count, and/or triglyceride levels.
By highlighting the advantages of different TWAS methods, our simulation study promotes a tissue specificity-aware TWAS analytic framework that revealed novel aspects of HIV-related traits.
Year: 2021 PMID: 33901188 PMCID: PMC8102009 DOI: 10.1371/journal.pgen.1009464
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 6.020
Fig 1. Cross-tissue TWAS simulation scheme.
The simulation parameters allowed us to generate SNP-gene-trait relations with varied tissue specificity backgrounds. In each replication, the simulated data were divided into an eQTL detection dataset and a TWAS dataset. The eQTL detection dataset, with a sample size equivalent to that of GTEx, was used to identify eQTLs with different eQTL detection methods. The eQTLs detected by each method were then passed separately to the TWAS dataset for gene-level association tests. The TWAS dataset sample size was equivalent to that of the ACTG clinical trial dataset. Two types of gene-level association approaches estimated p-values for the simulated gene-trait relations. In each replication, we simulated 100 different SNP-gene-trait pairs to obtain a single point estimate of TWAS gene prioritization performance. All association p-values were adjusted for the number of genes and tissues in each replication. Twenty independent replications were conducted to obtain the distribution of TWAS performance statistics.
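The per-replication bookkeeping in this scheme can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a Bonferroni adjustment (each raw p-value multiplied by the number of genes × tissues tested, capped at 1) and α = 0.05, and the helper names `bonferroni_adjust`, `replication_power`, and `summarize` are hypothetical. The raw p-values would come from the association step, which is not shown.

```python
import statistics


def bonferroni_adjust(p, n_genes, n_tissues):
    """Bonferroni-adjust one raw p-value for n_genes * n_tissues tests (capped at 1)."""
    return min(1.0, p * n_genes * n_tissues)


def replication_power(causal_pvalues, n_genes, n_tissues, alpha=0.05):
    """Fraction of simulated SNP-gene-trait pairs detected in the causal tissue.

    causal_pvalues holds one raw p-value per simulated pair (the study used
    100 pairs per replication), each taken from the pair's causal tissue.
    """
    hits = sum(
        bonferroni_adjust(p, n_genes, n_tissues) < alpha for p in causal_pvalues
    )
    return hits / len(causal_pvalues)


def summarize(power_per_replication):
    """Mean and standard deviation of power across independent replications."""
    return (
        statistics.mean(power_per_replication),
        statistics.stdev(power_per_replication),
    )


# Toy replication: 100 pairs tested against 100 genes x 10 tissues = 1,000 tests.
toy_pvalues = [1e-6] * 60 + [0.01] * 40
print(replication_power(toy_pvalues, n_genes=100, n_tissues=10))  # 0.6
```

Calling `replication_power` once per replication and passing the 20 resulting values to `summarize` gives the kind of distribution the caption describes.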
TWAS methods tested in this simulation study.
| eQTL detection type | eQTL detection method | Association type | Association method | Equivalent TWAS method | PMID |
|---|---|---|---|---|---|
| Single tissue-based | Elastic net | Single-tissue association | Linear or logistic regression | PrediXcan | 26258848 |
| Integrative tissue-based | Group LASSO | Single-tissue association | Linear or logistic regression | Single-tissue UTMOST | 30804563 |
| Single tissue-based | Elastic net | Cross-tissue association | Principal component regression | MulTiXcan | 30668570 |
| Integrative tissue-based | Group LASSO | Cross-tissue association | Generalized Berk-Jones test | Cross-tissue UTMOST | 30804563 |
Fig 2. Power of different TWAS methods in prioritizing genes with varied tissue specificity properties.
Power was the proportion of all simulated gene-trait associations that were successfully identified in the causal tissue. The x-axis is the number of gene-expressing tissues. Each column corresponds to the proportion of a gene's eQTLs that are shared among tissues. Each row corresponds to the similarity of gene expression profiles across tissues, estimated by correlation. Moving from the top left to the bottom right traces a gradient from tissue-specific genes to broadly expressed genes. Colors represent different TWAS methods, and the y-axis is power. For tissue-specific genes (top left), single-tissue TWAS (Elastic Net-SLR) and cross-tissue TWAS (Group LASSO-GBJ) had similar power. For broadly expressed genes (bottom right), cross-tissue TWAS (Group LASSO-GBJ) had greater power. Brackets show pairwise comparisons of power between Group LASSO-GBJ and each of the other TWAS methods using the Wilcoxon signed-rank test. Black brackets mark cases where Group LASSO-GBJ had higher power than the compared method; red brackets mark cases where it had lower power. *p-value < 0.05, **p-value < 0.01, ***p-value < 0.0001.
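Each bracket compares Group LASSO-GBJ with one other method across the 20 replications. Assuming the comparison pairs the two methods' power estimates by replication, a minimal standard-library sketch of the two-sided Wilcoxon signed-rank test is shown below. It uses the large-sample normal approximation without tie or continuity correction, so its p-values can differ from those of statistical packages that use exact null distributions for small samples (for example, SciPy's `scipy.stats.wilcoxon`). The function name `wilcoxon_signed_rank` is hypothetical.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired samples x and y.

    Zero differences are dropped, tied |differences| receive average ranks,
    and the p-value uses the normal approximation (no tie/continuity
    correction). Returns (W, p), where W is the smaller signed-rank sum.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Rank absolute differences, averaging ranks within tied groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    # Null mean and standard deviation of the signed-rank statistic.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return w, min(1.0, p)


# Hypothetical per-replication power for two methods over 20 replications.
gbj = [0.80 + 0.01 * i for i in range(20)]
slr = [0.60 + 0.01 * i for i in range(20)]
print(wilcoxon_signed_rank(gbj, slr))
```

When one method wins in every replication, as in this toy input, W is 0 and the approximate p-value falls well below 0.001.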
Fig 3. Type I error rates of different TWAS methods in prioritizing genes with diverse tissue specificity properties.
The type I error rate was the probability that a TWAS method wrongly called a gene-trait association significant when no signal was simulated in the dataset. Association p-values were adjusted for the number of genes and tested tissues. The x-axis is the number of gene-expressing tissues. Each column corresponds to the proportion of a gene's eQTLs that are shared among tissues. Each row corresponds to the similarity of gene expression profiles across tissues, estimated by correlation. Moving from the top left to the bottom right traces a gradient from tissue-specific genes to broadly expressed genes. Colors represent different TWAS methods, and the y-axis is the type I error rate. All TWAS methods controlled type I error rates (≤ 5%).
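The caption does not spell out the unit over which this rate is computed. Assuming it is a family-wise rate (the fraction of null datasets in which at least one gene-tissue test passes a Bonferroni threshold), an empirical estimate can be sketched as follows. `empirical_type1_error` is a hypothetical helper, and the uniform random p-values stand in for a real null TWAS run.

```python
import random


def empirical_type1_error(null_datasets, alpha=0.05):
    """Fraction of null datasets with at least one Bonferroni-significant test.

    Each element of null_datasets is a list of raw p-values, one per
    gene-tissue test, from data simulated with no gene-trait signal.
    """
    false_calls = 0
    for pvalues in null_datasets:
        m = len(pvalues)  # number of gene x tissue tests in this dataset
        if any(p * m < alpha for p in pvalues):
            false_calls += 1
    return false_calls / len(null_datasets)


# Valid p-values are Uniform(0, 1) under the null, so a well-calibrated
# method should give an empirical rate close to alpha.
rng = random.Random(1)
null_data = [[rng.random() for _ in range(100)] for _ in range(2000)]
print(f"empirical type I error: {empirical_type1_error(null_data):.3f}")
```

With 100 tests per dataset the expected family-wise rate here is 1 − (1 − 0.05/100)^100, or about 0.049, so the printed estimate should land near that value.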
Fig 4. False positive rates of tissues among statistically significant results.
The false positive rate was the proportion of significant associations found in trait-irrelevant tissues among all significant results. Association p-values were adjusted for the number of genes and tested tissues. The x-axis is the number of gene-expressing tissues. Each column corresponds to the proportion of a gene's eQTLs that are shared among tissues. Each row corresponds to the similarity of gene expression profiles across tissues, estimated by correlation. Moving from the top left to the bottom right traces a gradient from tissue-specific genes to broadly expressed genes. Colors represent different TWAS methods, and the y-axis is the false positive rate of tissues among statistically significant results. For single-tissue TWAS, 5% and 77% of significant associations fell in trait-irrelevant tissues for tissue-specific genes and broadly expressed genes, respectively.
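The metric defined in this caption can be computed directly from a list of significant gene-tissue hits and the tissues in which each gene was simulated to be causal. A minimal sketch, assuming those inputs are available as a list of pairs and a dictionary; `tissue_false_positive_rate` and the gene and tissue labels are hypothetical.

```python
import math


def tissue_false_positive_rate(significant_hits, causal_tissues):
    """Share of significant gene-tissue associations in trait-irrelevant tissues.

    significant_hits: iterable of (gene, tissue) pairs that passed the
        adjusted significance threshold.
    causal_tissues: dict mapping each gene to the set of tissues in which its
        expression was simulated to affect the trait.
    Returns NaN when there are no significant hits (the rate is undefined).
    """
    hits = list(significant_hits)
    if not hits:
        return math.nan
    wrong = sum(
        1 for gene, tissue in hits if tissue not in causal_tissues.get(gene, set())
    )
    return wrong / len(hits)


hits = [("g1", "Liver"), ("g1", "Adipose"), ("g1", "Artery"), ("g2", "Liver")]
causal = {"g1": {"Liver"}, "g2": {"Liver"}}
print(tissue_false_positive_rate(hits, causal))  # 0.5
```

Two of the four toy hits (g1 in Adipose and Artery) lie outside the causal tissue, giving 0.5.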
Fig 5. A proposed TSA-TWAS analytic framework that leverages TWAS performance on genes with different tissue specificity properties.
The framework proposed based on our simulations is as follows: If trait-related tissue(s) are known for a trait or disease of interest, run single-tissue TWAS, for example, PrediXcan. If trait-related tissue(s) are unknown, run cross-tissue TWAS (UTMOST) on the genes that are expressed in more than one tissue and run single-tissue TWAS (PrediXcan) on the genes that are expressed in one single tissue.
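The decision rule described above can be written as a small routing function. This is a hypothetical sketch: `choose_twas_method` and the `Method` enum are illustrative names, and "expressed in a tissue" is approximated here by the number of tissues with an available expression prediction model for the gene, which is an assumption.

```python
from enum import Enum


class Method(Enum):
    SINGLE_TISSUE = "single-tissue TWAS (e.g., PrediXcan)"
    CROSS_TISSUE = "cross-tissue TWAS (e.g., UTMOST)"


def choose_twas_method(trait_tissues_known: bool, n_expressing_tissues: int) -> Method:
    """Route one gene to a TWAS method following the TSA-TWAS decision rule."""
    if n_expressing_tissues < 1:
        raise ValueError("gene must have a model in at least one tissue")
    if trait_tissues_known:
        # Trait-related tissue(s) known: test them with single-tissue TWAS.
        return Method.SINGLE_TISSUE
    if n_expressing_tissues > 1:
        # Unknown tissue(s) and a multi-tissue gene: combine evidence across tissues.
        return Method.CROSS_TISSUE
    # Unknown tissue(s) and a gene expressed in one tissue only.
    return Method.SINGLE_TISSUE


# Hypothetical genes mapped to the number of tissues with a prediction model.
for gene, n_tissues in {"GENE_A": 1, "GENE_B": 12}.items():
    print(gene, "->", choose_twas_method(False, n_tissues).value)
```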
Fig 6. Power of the TSA-TWAS framework when the data contained different proportions of tissue-specific genes.
The power of TSA-TWAS was compared with that of running only single-tissue TWAS (Elastic Net-SLR) or only cross-tissue TWAS (Group LASSO-GBJ test). TSA-TWAS had consistent power to identify complex trait-related genes and was robust to the mix of tissue-specific and multi-tissue genes in a dataset.
Fig 7. The TSA-TWAS analytic framework applied to baseline laboratory traits from ACTG combined genotyping phases I-IV.
Approximately 2.2 million SNPs, 4,360 individuals, and 37 baseline laboratory traits passed quality control (QC). UTMOST eQTL models were used to impute the genetically regulated expression (GReX) of 12,038 genes in 49 tissues. Of these, 2,812 genes (23%) had GReX in a single tissue, and 9,226 genes (77%) had GReX in more than one tissue.
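Under the TSA-TWAS rule for unknown trait-related tissues, the 2,812 single-tissue genes would presumably be tested with single-tissue TWAS and the 9,226 multi-tissue genes with cross-tissue TWAS. A quick arithmetic check confirms that the reported counts and percentages are consistent with the 12,038-gene total:

```python
single_tissue_genes = 2_812  # GReX in exactly one tissue
multi_tissue_genes = 9_226   # GReX in more than one tissue
total = single_tissue_genes + multi_tissue_genes

print(total)                                 # 12038
print(f"{single_tissue_genes / total:.0%}")  # 23%
print(f"{multi_tissue_genes / total:.0%}")   # 77%
```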
Summary statistics of baseline laboratory values in ACTG genotyping phases I-IV.
| Trait | Sample size | Mean | Std. dev. | Min | Max | Transformation | Unit | Description |
|---|---|---|---|---|---|---|---|---|
| Albumin | 1216 | 4.05 | 0.44 | 1.80 | 5.30 | None | g/dL | week 0 albumin (Alb, g/dL) |
| Bicarbonate | 3971 | 26.01 | 2.94 | 12.00 | 35.00 | None | mmol/L | week 0 bicarbonate (Bicarb, mmol/L) |
| Calcium | 1336 | 9.17 | 0.44 | 7.40 | 10.80 | None | mg/dL | week 0 calcium (Ca, mg/dL) |
| Chloride | 4048 | 103.27 | 2.94 | 88.00 | 117.00 | None | mmol/L | week 0 chloride (Cl, mmol/L) |
| Cholesterol | 4286 | 159.27 | 36.80 | 5.90 | 414.00 | None | mg/dL | week 0 cholesterol (Chol, mg/dL) |
| Creatinine | 4100 | 0.91 | 0.20 | 0.05 | 2.80 | None | mg/dL | week 0 creatinine (Creat, mg/dL) |
| HDL-c | 2376 | 37.31 | 12.78 | 3.90 | 148.00 | None | mg/dL | week 0 HDL-c (HDL-c, mg/dL) |
| Hemoglobin | 4293 | 13.49 | 1.77 | 6.00 | 20.20 | None | g/dL | week 0 hemoglobin (Hgb, g/dL) |
| Absolute basophil count | 2526 | 1.44 | 0.32 | 0.00 | 3.39 | Log10 | cells/mm³ | log10 transformed week 0 absolute basophil count (Baso, cells/mm³) |
| Absolute eosinophil count | 3932 | 2.06 | 0.40 | 0.18 | 3.55 | Log10 | cells/mm³ | log10 transformed week 0 absolute eosinophil count (Eos, cells/mm³) |
| Alkaline phosphatase | 4226 | 1.88 | 0.15 | 0.70 | 2.72 | Log10 | U/L | log10 transformed week 0 alkaline phosphatase (AlkP, U/L) |
| ALT | 4233 | 1.48 | 0.27 | 0.04 | 2.81 | Log10 | U/L | log10 transformed week 0 ALT (ALT, U/L) |
| Absolute lymphocyte count | 4149 | 3.11 | 0.24 | 0.92 | 4.03 | Log10 | cells/mm³ | log10 transformed week 0 absolute lymphocyte count (Lymph, cells/mm³) |
| Absolute monocyte count | 4116 | 2.58 | 0.21 | 0.66 | 3.69 | Log10 | cells/mm³ | log10 transformed week 0 absolute monocyte count (Mono, cells/mm³) |
| Amylase | 1026 | 1.85 | 0.20 | 1.11 | 2.89 | Log10 | U/L | log10 transformed week 0 amylase (Amyl, U/L) |
| Absolute neutrophil count | 4277 | 3.32 | 0.21 | 2.28 | 4.67 | Log10 | cells/mm³ | log10 transformed week 0 absolute neutrophil count (ANC, cells/mm³) |
| AST | 4235 | 1.49 | 0.21 | 0.48 | 2.81 | Log10 | U/L | log10 transformed week 0 AST (AST, U/L) |
| BUN | 4221 | 1.08 | 0.15 | -0.22 | 2.17 | Log10 | mg/dL | log10 transformed week 0 BUN (BUN, mg/dL) |
| CK | 1360 | 1.97 | 0.38 | -0.05 | 3.79 | Log10 | U/L | log10 transformed week 0 CK (CK, U/L) |
| Fasting glucose | 3233 | 1.93 | 0.08 | 1.52 | 2.64 | Log10 | mg/dL | log10 transformed week 0 fasting glucose (Gluc fasting, mg/dL) |
| Glucose | 3031 | 1.93 | 0.08 | 1.70 | 2.77 | Log10 | mg/dL | log10 transformed week 0 glucose (Gluc, mg/dL) |
| LDL-c | 3539 | 1.95 | 0.16 | 0.00 | 2.57 | Log10 | mg/dL | log10 transformed week 0 LDL-c (LDL-c, mg/dL) |
| Lipoprotein | 1118 | 1.58 | 0.32 | 0.30 | 2.85 | Log10 | | log10 transformed week 0 lipoprotein |
| Platelet count | 4263 | 2.30 | 0.15 | 1.15 | 3.34 | Log10 | ×10⁹/L | log10 transformed week 0 platelet count (Plat, ×10⁹/L) |
| Total bilirubin | 4202 | -0.31 | 0.21 | -1.00 | 0.49 | Log10 | mg/dL | log10 transformed week 0 total bilirubin (TBili, mg/dL) |
| Triglyceride | 4318 | 2.07 | 0.25 | 1.08 | 3.45 | Log10 | mg/dL | log10 transformed week 0 triglyceride (Trig, mg/dL) |
| White blood cell count | 4279 | 0.62 | 0.16 | -0.05 | 1.49 | Log10 | ×10³ cells/mm³ | log10 transformed week 0 white blood cell count (WBC, ×10³ cells/mm³) |
| Hematocrit | 4274 | 39.83 | 5.10 | 1.00 | 62.10 | None | percent | week 0 hematocrit (Hct, percent) |
| Phosphate | 3261 | 3.44 | 0.61 | 0.80 | 7.70 | None | mg/dL | week 0 phosphate (Phos, mg/dL) |
| Potassium | 4062 | 4.15 | 0.39 | 2.00 | 8.00 | None | mmol/L | week 0 potassium (K, mmol/L) |
| Sodium | 4067 | 138.88 | 2.80 | 123.00 | 151.00 | None | mmol/L | week 0 sodium (Na, mmol/L) |
| CD4 count | 4358 | 14.78 | 6.46 | 0.00 | 36.55 | Square root | cells/mm³ | square root of absolute CD4 count at week 0 |
| Viral load | 4358 | 4.75 | 0.72 | 2.02 | 7.11 | Log10 | copies/dL | log10 transformed week 0 viral load RNA |
| Fasting cholesterol | 4136 | 158.42 | 36.24 | 6.10 | 414.00 | None | mg/dL | week 0 fasting cholesterol |
| Fasting HDL-c | 4126 | 1.56 | 0.15 | 0.60 | 2.20 | Log10 | mg/dL | log10 transformed week 0 fasting HDL-c |
| Fasting LDL-c | 4042 | 1.95 | 0.15 | 0.85 | 2.57 | Log10 | mg/dL | log10 transformed week 0 fasting LDL-c |
| Fasting triglyceride | 3888 | 2.05 | 0.24 | 1.08 | 2.45 | Log10 | mg/dL | log10 transformed week 0 fasting triglycerides |
Fig 8. PhenoGram of statistically significant gene-trait associations identified by the TSA-TWAS analytic framework.
We plotted associations with p-value < 1.12×10⁻⁷. Each association is placed according to the SNP location on each chromosome, and points are color-coded by baseline laboratory value. Diamonds represent previously reported or replicated associations, and circles represent novel associations identified in this study.
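The caption does not state how the 1.12×10⁻⁷ threshold was derived. One reading consistent with the reported numbers (an inference, not confirmed by the text) is a Bonferroni correction of α = 0.05 over the 12,038 genes with imputed GReX times the 37 baseline traits:

```python
n_genes = 12_038  # genes with imputed GReX (Fig 7)
n_traits = 37     # baseline laboratory traits passing QC (Fig 7)
alpha = 0.05

# Assumed Bonferroni correction over genes x traits.
threshold = alpha / (n_genes * n_traits)
print(f"{threshold:.2e}")  # 1.12e-07
```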
Replicated associations related to HIV baseline laboratory values identified by TSA-TWAS.
| Trait | Gene | Chromosome | TSS | P | Colocalized tissues | Locus RCP |
|---|---|---|---|---|---|---|
| Alkaline phosphatase | | 6 | 24428177 | 1.08E-11 | Esophagus | 0.1006 |
| | | 6 | 24494852 | 1.79E-11 | Liver | 0.1805 |
| Fasting HDL | CETP | 16 | 56961850 | 4.49E-12 | Adipose | 0.0916 |
| HDL | CETP | 16 | 56961850 | 4.49E-12 | Artery | 0.2837 |
| Total bilirubin | | 2 | 233692866 | 2.78E-15 | | |
| | | 2 | 233775679 | 1.39E-12 | | |
| | UGT1A1 | 2 | 233760248 | 3.59E-12 | Liver | 0.1318 |
| | | 2 | 233681938 | 4.51E-12 | | |
| Triglyceride | | 21 | 38256698 | 3.18E-13 | | |
| Viral load | | 6 | 32014762 | 4.11E-15 | | |
| | | 6 | 29602228 | 1.14E-12 | | |
| | | 7 | 87401697 | 1.07E-11 | | |
| | | 6 | 31269491 | 1.15E-11 | | |
| | | 6 | 31834608 | 2.32E-11 | | |
| | | 22 | 42692121 | 8.39E-11 | | |
Novel associations related to HIV baseline laboratory values identified by TSA-TWAS.
| Trait | Gene | Chromosome | TSS | P | Colocalized tissues | Locus RCP |
|---|---|---|---|---|---|---|
| Absolute basophil count | | 7 | 66628767 | 3.08E-14 | | |
| | | 20 | 35955360 | 3.83E-13 | | |
| | | 6 | 47477789 | 7.27E-13 | | |
| | | 6 | 47477243 | 1.32E-12 | | |
| | | 6 | 31834608 | 1.69E-12 | | |
| | | 4 | 74933095 | 1.84E-11 | | |
| | | 3 | 49108046 | 1.51E-10 | | |
| | | 1 | 156594487 | 2.24E-10 | | |
| | | 7 | 107470018 | 1.81E-09 | | |
| | | 6 | 26156354 | 2.19E-09 | | |
| | | 19 | 8321500 | 2.87E-09 | | |
| | | 21 | 38256698 | 4.72E-09 | | |
| | | 8 | 33473423 | 6.35E-09 | | |
| | | 17 | 47967810 | 1.05E-08 | | |
| | | 5 | 76818933 | 2.99E-08 | | |
| | | 6 | 32014762 | 8.92E-08 | | |
| Absolute neutrophil count | | 1 | 154924734 | 3.63E-08 | Adipose | 0.0663 |
| Alkaline phosphatase | | 5 | 141100756 | 7.44E-09 | Artery | 0.1294 |
| | | 21 | 38256698 | 2.77E-08 | | |
| Fasting HDL | | 16 | 56989485 | 1.70E-09 | Adrenal Gland | 0.0292 |
| Sodium | | 20 | 35955360 | 7.71E-08 | | |
| Triglyceride | | 5 | 141100756 | 5.78E-14 | | |
| | | 1 | 156594487 | 2.12E-12 | | |
| | | 20 | 35955360 | 7.21E-12 | | |
| | | 6 | 31834608 | 9.13E-12 | | |
| | | 4 | 74933095 | 1.88E-11 | | |
| | | 8 | 33473423 | 2.69E-11 | | |
| | | 3 | 49108046 | 9.69E-11 | | |
| | | 6 | 26156354 | 1.20E-10 | | |
| | | 6 | 47477789 | 1.04E-09 | | |
| | | 6 | 47477243 | 1.17E-09 | | |
| | | 6 | 32014762 | 1.23E-09 | | |
| | | 7 | 66628767 | 1.34E-08 | | |
| | | 19 | 8321500 | 1.40E-08 | | |
| | | 11 | 36594493 | 1.94E-08 | | |
| | | 6 | 30626842 | 5.32E-08 | | |
| Viral load | | 6 | 30676389 | 6.27E-14 | | |
| | | 11 | 64318088 | 7.01E-14 | | |
| | | 5 | 76818933 | 1.81E-12 | | |
| | | 17 | 47967810 | 1.95E-12 | | |
| | | 19 | 8321500 | 3.50E-12 | | |
| | | 3 | 49108046 | 3.60E-12 | | |
| | | 7 | 66628767 | 3.84E-12 | | |
| | | 8 | 33473423 | 4.27E-12 | | |
| | | 1 | 161037631 | 4.57E-12 | | |
| | | 16 | 23557732 | 5.27E-12 | | |
| | | 6 | 47477243 | 1.05E-11 | | |
| | | 21 | 38256698 | 1.08E-11 | | |
| | | 6 | 47477789 | 1.54E-11 | | |
| | | 20 | 35955360 | 1.70E-11 | | |
| | | 4 | 74933095 | 1.86E-11 | | |
| | | 6 | 30626842 | 2.20E-11 | | |
| | | 6 | 26156354 | 8.44E-11 | | |
| | | 6 | 152987362 | 1.14E-10 | | |
| | | 3 | 158571163 | 1.23E-10 | | |
| | | 5 | 141100756 | 2.42E-09 | | |
| | | 6 | 32115335 | 2.83E-09 | | |
| | | 7 | 107470018 | 3.75E-09 | | |
| | | 10 | 6088987 | 5.39E-09 | | |
| | | 6 | 46704320 | 6.34E-09 | | |
| | | 1 | 156594487 | 1.81E-08 | | |
| | | 5 | 53560633 | 2.07E-08 | | |
| | | 11 | 36594493 | 2.14E-08 | | |
| | | 6 | 31665391 | 2.76E-08 | | |
| | | 13 | 99254714 | 4.10E-08 | | |
| | | 2 | 36531805 | 4.54E-08 | | |