Jake Lin, Rubina Tabassum, Samuli Ripatti, Matti Pirinen.
Abstract
BACKGROUND: Multivariate testing tools that integrate multiple genome-wide association studies (GWAS) have become important as the number of phenotypes gathered from study cohorts and biobanks has increased. While these tools have been shown to boost statistical power considerably over univariate tests, an important remaining challenge is to interpret which traits are driving the multivariate association and which traits are just passengers with minor contributions to the genotype-phenotype association statistic.
Keywords: Bayesian information criteria; canonical correlation; feature selection; genotype phenotype correlation studies; multivariate GWAS; multivariate analysis; pheno- and genotypes; visualization
Year: 2020 PMID: 32499813 PMCID: PMC7242752 DOI: 10.3389/fgene.2020.00431
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1. MetaPhat workflow. (1) GWAS results for K traits are accepted as input. (2) After quality control and filtering, a multivariate GWAS is performed on the full model with all K traits using metaCCA, with multi-processing and chunking to reduce computation time. (3) Lead SNPs are detected and sorted based on the leading canonical correlation/P-value and then clumped based on a user-specified window size. Custom variants can be added. (4) Decomposition of the chosen variants is performed through the highest and lowest traces to find an optimal subset with a minimum BIC, and driver traits based on the established P-value threshold. (5) MetaPhat results include trace plots for P-values and BIC, univariate association statistics plots for all lead SNPs, cluster maps (shown in Figure 2), and a summary table listing central traits (union of drivers and optimal subset).
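Step 3 of the workflow (sorting lead SNPs and clumping them by window size) can be illustrated with a minimal sketch. The exact clumping rule used by MetaPhat is not given here, so the greedy rule below is an assumption: variants are visited in order of increasing P-value, and a variant becomes a lead only if no already-selected lead lies within the window on the same chromosome. The names `Variant`, `clump_leads`, `window_bp`, and `p_threshold` are hypothetical and are not MetaPhat's API.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    rsid: str
    chrom: str
    pos: int      # base-pair position
    pval: float   # multivariate P-value


def clump_leads(variants: List[Variant], window_bp: int,
                p_threshold: float = 5e-8) -> List[Variant]:
    """Greedy window-based clumping (assumed rule, not MetaPhat's code).

    Visits variants from the smallest P-value upward and keeps a variant
    as a lead only if it is farther than window_bp from every lead
    already kept on the same chromosome.
    """
    leads: List[Variant] = []
    for v in sorted(variants, key=lambda x: x.pval):
        if v.pval > p_threshold:
            break  # remaining variants are even less significant
        far_from_all = all(
            v.chrom != lead.chrom or abs(v.pos - lead.pos) > window_bp
            for lead in leads
        )
        if far_from_all:
            leads.append(v)
    return leads
```

The default of 5e-8 is the conventional genome-wide significance level, used here only as a placeholder; the window size is left to the caller, matching the "user-specified window size" in the caption.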
FIGURE 2. MetaPhat results using multivariate lipidomics data. (A) The trace plot of rs7412 identifies CE14 and PCO23 as the driver traits. (B) CE14, PC36, and PCO23 form the optimal subset as defined by minimum BIC (highest negative BIC). (C) Trait importance map: for each SNP, a trait's importance is its rank on the lowest trace, rescaled to the range 0 to 1, with darker blue shades marking the most important traits of the multivariate association. (D) SNP similarity based on the rank correlation on the lowest trace.
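Panels (C) and (D) rest on two small computations: rescaling lowest-trace ranks to the 0 to 1 range, and a rank correlation between SNPs. The sketch below shows one way to do both in plain Python. It assumes that a trait dropped earlier on the lowest trace counts as more important and that each SNP's ranking is a complete ordering without ties; the function names are hypothetical.

```python
from typing import Dict, Sequence


def importance_from_drop_order(drop_order: Sequence[str]) -> Dict[str, float]:
    """Map each trait to an importance in [0, 1].

    ASSUMPTION: drop_order lists traits from most to least important
    (earliest dropped on the lowest trace first). The first trait gets
    1.0 and the last gets 0.0, with even spacing in between.
    """
    k = len(drop_order)
    if k == 1:
        return {drop_order[0]: 1.0}
    return {trait: 1.0 - i / (k - 1) for i, trait in enumerate(drop_order)}


def spearman_from_orders(order_a: Sequence[str], order_b: Sequence[str]) -> float:
    """Spearman rank correlation between two orderings of the same traits.

    Uses the no-ties formula rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
    where d is the difference between a trait's ranks in the two orderings.
    """
    if set(order_a) != set(order_b) or len(order_a) < 2:
        raise ValueError("orderings must cover the same traits (at least two)")
    rank_b = {trait: i for i, trait in enumerate(order_b)}
    n = len(order_a)
    d2 = sum((i - rank_b[trait]) ** 2 for i, trait in enumerate(order_a))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))
```

Computing `spearman_from_orders` for every pair of lead SNPs would give a symmetric similarity matrix of the kind a cluster map can be drawn from.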
MetaPhat terminology.
| Term | Definition |
| Highest trace | Starting from the full model of K traits, we tested all unique combinations of (K-1) traits to find the subset with the highest CCA statistic (lowest |
| Lowest trace | Starting from the full model of K traits, we tested all unique combinations of (K-1) traits to find the subset with the lowest CCA statistic (highest |
| Inverted trace | Aggregates the traits that have been dropped on the lowest trace. The goal was to include the driver sets into the search space for the |
| Drivers/driver traits | The traits that have been dropped on the lowest trace at the step where the multivariate |
| Optimal set | The subset of traits that has the lowest BIC among subsets across all three traces. Interpretation: the set that is a statistically optimal description of the multivariate association. |
| Central traits | Union of drivers and optimal set. Interpretation: includes the important traits of the multivariate association. |
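The terminology above describes a greedy backward elimination over trait subsets. The sketch below is a toy illustration of that idea, not MetaPhat's implementation. The association statistic and the BIC are passed in as plain functions (`stat`, `toy_bic`) because the real ones come from metaCCA and the tool's own model. The driver rule in `drivers` (collect dropped traits until the statistic first falls below a cutoff) is an assumption standing in for the P-value threshold rule, and the numbers in `contrib` are invented toy values, not data from the paper.

```python
from itertools import combinations


def run_trace(traits, stat, pick):
    """Greedy elimination: pick=max gives the highest trace, pick=min the lowest.

    At each step all subsets with one trait fewer are scored with stat and
    the one chosen by pick is kept. Returns [(subset, statistic), ...].
    """
    current = tuple(traits)
    steps = [(current, stat(current))]
    while len(current) > 1:
        current = pick(combinations(current, len(current) - 1), key=stat)
        steps.append((current, stat(current)))
    return steps


def dropped_order(steps):
    """Traits in the order they were removed along a trace."""
    order = []
    for (prev, _), (cur, _) in zip(steps, steps[1:]):
        order.extend(t for t in prev if t not in cur)
    return order


def inverted_trace(lowest_steps, stat):
    """Growing sets built from the traits dropped on the lowest trace."""
    order = dropped_order(lowest_steps)
    return [(tuple(order[:i]), stat(tuple(order[:i])))
            for i in range(1, len(order) + 1)]


def drivers(lowest_steps, cutoff):
    """ASSUMED rule: traits dropped until the statistic first falls below cutoff."""
    order = dropped_order(lowest_steps)
    for i, (_, value) in enumerate(lowest_steps[1:], start=1):
        if value < cutoff:
            return order[:i]
    return order


# Toy additive statistic with invented per-trait contributions.
contrib = {"A": 30, "B": 25, "C": 5, "D": 1}


def stat(subset):
    return sum(contrib[t] for t in subset)


def toy_bic(subset):
    # Hypothetical penalised score: lower is better.
    return 8 * len(subset) - stat(subset)


lowest = run_trace("ABCD", stat, min)
highest = run_trace("ABCD", stat, max)
inverted = inverted_trace(lowest, stat)

driver_traits = drivers(lowest, cutoff=10)
optimal = min(lowest + highest + inverted, key=lambda s: toy_bic(s[0]))[0]
central = sorted(set(driver_traits) | set(optimal))
```

With these toy values the lowest trace removes the largest contributors first, and the central traits are the union of `driver_traits` and `optimal`, mirroring the last row of the terminology table.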
Lipid traits used in MetaPhat analysis.
| (A) PLASMA LIPIDOMICS | ||||||
| Identifier | Lipid class | Lipid species | QC’d variants | HDL corr. | LDL corr. | TG corr. |
| CE14 | Cholesteryl ester | | 8,711,715 | 0.032 | 0.464 | 0.251 |
| CE15 | Cholesteryl ester | | 8,711,715 | 0.067 | 0.396 | 0.188 |
| CE17 | Cholesteryl ester | | 8,711,665 | 0.107 | 0.394 | 0.107 |
| LPC8 | Lysophosphatidylcholine | | 8,710,151 | 0.011 | –0.124 | –0.083 |
| LPC9 | Lysophosphatidylcholine | | 8,694,250 | 0.114 | –0.015 | –0.118 |
| LPE5 | Lysophosphatidylethanolamine | | 8,710,162 | 0.077 | –0.077 | 0.073 |
| LPE6 | Lysophosphatidylethanolamine | | 8,711,037 | 0.235 | 0.005 | 0.041 |
| PC17 | Phosphatidylcholine | | 8,711,715 | 0.120 | 0.115 | 0.361 |
| PC18 | Phosphatidylcholine | | 8,711,533 | 0.126 | 0.196 | 0.248 |
| PC29 | Phosphatidylcholine | | 8,704,982 | 0.113 | 0.138 | 0.250 |
| PC36 | Phosphatidylcholine | | 8,711,715 | 0.033 | 0.190 | 0.336 |
| PC37 | Phosphatidylcholine | | 8,751,062 | 0.061 | 0.242 | 0.243 |
| PC46 | Phosphatidylcholine | | 8,711,715 | 0.240 | 0.105 | 0.214 |
| PC21 | Phosphatidylcholine | | 8,711,715 | 0.154 | 0.204 | 0.219 |
| PCO7 | Phosphatidylcholine-ether | | 8,711,715 | 0.081 | 0.194 | 0.076 |
| PCO23 | Phosphatidylcholine-ether | | 8,711,560 | 0.187 | 0.115 | –0.154 |
| PCO29 | Phosphatidylcholine-ether | | 8,710,292 | 0.198 | 0.115 | –0.086 |
| PE7 | Phosphatidylethanolamine | | 8,707,361 | –0.027 | 0.028 | 0.585 |
| PEO3 | Phosphatidylethanolamine-ether | | 8,706,846 | 0.083 | 0.198 | 0.154 |
| PEO11 | Phosphatidylethanolamine-ether | | 8,693,147 | 0.148 | 0.238 | 0.099 |
| PI9 | Phosphatidylinositol | | 8,711,715 | –0.026 | 0.231 | 0.460 |
| HDL | High-density lipoprotein cholesterol | 2,343,025 | 95,129 | |||
| LDL | Low-density lipoprotein cholesterol | 2,271,091 | 90,421 | |||
| TC | Total cholesterol | 2,341,292 | 95,537 | |||
| TG | Triglycerides | 2,286,633 | 91,598 | |||
MetaPhat results of the 7 lead variants from the multivariate analyses of the lipidomics data.
| Variant/Gene | Samples missing | P-value | Driver trait(s) | P-value | BIC optimal subset | P-value | Central traits |
| *rs174567/FADS2 | 1.3% | 2.40e–145 | PC36, CE14, PC17, LPC8, PEO11, PEO3, LPE5, PC21, PC46, PC29, CE15, PC37, PC18, PCO7, PCO29, PCO23, PI9, PE7 | 1.95e–05 | CE15, LPC8, PC17, PC21, PC36, PC46, PE7, PEO11, PI9 | 2.10e–146 | PC36, CE14, PC17, LPC8, PEO11, PEO3, LPE5, PC21, PC46, PC29, CE15, PC37, PC18, PCO7, PCO29, PCO23, PI9, PE7 |
| *rs66505542/BUD13 | 0.1% | 1.55e–08 | PI9 | 3.39e–04 | PI9, LPC9, PC36 | 3.27e–12 | PI9, LPC9, PC36 |
| rs146327691/SLCO1A2_UTR | 1.2% | 4.27e–08 | LPE5 | 1.91e–06 | LPE5, LPC9, LPE6, PE7 | 5.60e–11 | LPE5, LPC9, LPE6, PE7 |
| rs188167837/ENSG00000200733_UTR_13KB | 1.0% | 2.95e–08 | PC17 | 7.59e–05 | PC17, CE14, CE17, PC21 | 4.64e–09 | PC17, CE14, CE17, PC21 |
| *rs261290/ALDH1A2 | 0.6% | 2.51e–40 | PE7 | 2.04e–07 | PE7, CE15, PC17, PCO29, PI9 | 1.37e–46 | PE7, CE15, PC17, PCO29, PI9 |
| *rs7412/APOE | 0% | 4.17e–13 | CE14, PCO23 | 1.82e–06 | CE14, PCO23, PC36 | 5.79e–18 | CE14, PCO23, PC36 |
| rs8736/MBOAT7 | 23.6% | 9.12e–50 | PI9 | 5.89e–02 | PI9, LPE6, PC36, PE7 | 1.25e–81 | PI9, LPE6, PC17 |
MetaPhat detection of driver and optimal lipid sets for 13 variants reported to be associated with at least three lipids by GLGC (12).
| Gene | Variant Chr:Pos | GLGC associated lipids | GLGC lead | MetaPhat all traits | MetaPhat driver(s) | Without driver(s) | BIC optimal set | Central traits |
| | rs12748152 chr1:27138393 | HDL LDL TG | 1e–15 | 2.8e–23 | HDL LDL TG | 3.0e–06 | HDL LDL | HDL LDL TG |
| | rs9987289 chr8:9183358 | HDL LDL TC | 2e–41 | 1.6e–76 | HDL TC LDL | 1.0e–04 | HDL LDL | HDL LDL TC |
| | rs1532085 chr15:58683366 | HDL TC TG | 1e–188 | 0 | HDL TC TG | 6.4e–01 | HDL TC TG | HDL TC TG |
| | rs3764261 chr16:56993324 | ALL | 1e–769 | 0 | ALL | NA | HDL LDL TG | ALL |
| | rs4722551 chr7:25991826 | LDL TG TC | 4e–14 | 2.5e–24 | TG LDL TC | 2.0e–02 | LDL TG | LDL TG TC |
| | rs4420638 chr19:45422946 | ALL | 2e–178 | 6.3e–210 | ALL | NA | LDL HDL TC | ALL |
| | rs6882076 chr5:156390297 | TC LDL TG | 5e–41 | 1.3e–49 | TG TC LDL | 6.9e–01 | TC TG | TC LDL TG |
| | rs10401969 chr19:19407718 | TC TG LDL | 4e–77 | 1.3e–138 | TG TC LDL | 1.0e–01 | TC TG | TC TG LDL |
| | rs6831256 chr4:3473139 | TG TC LDL | 2e–12 | 6.3e–16 | TG TC | 1.0e–07 | TG TC | TG TC |
| | rs2131925 chr1:63025942 | TG LDL TC | 3e–74 | 7.8e–157 | TG LDL TC | 9.5e–05 | TG TC HDL | ALL |
| | rs2954029 chr8:126490972 | ALL | 1e–107 | 1.6e–148 | ALL | NA | TG TC LDL | ALL |
| | rs174546 chr11:61569830 | ALL | 7e–38 | 1.3e–104 | ALL | NA | ALL | ALL |
| | rs964184 chr11:116648917 | ALL | 7e–224 | 7.9e–264 | ALL | NA | TG TC | ALL |