Michael C Turchin¹, Matthew Stephens¹,².
Abstract
Genome-wide association studies (GWAS) have now been conducted for hundreds of phenotypes of relevance to human health. Many such GWAS involve multiple closely-related phenotypes collected on the same samples. However, the vast majority of these GWAS have been analyzed using simple univariate analyses, which consider one phenotype at a time. This is despite the fact that, at least in simulation experiments, multivariate analyses have been shown to be more powerful at detecting associations. Here, we conduct multivariate association analyses on 13 different publicly-available GWAS datasets that involve multiple closely-related phenotypes. These data include large studies of anthropometric traits (GIANT), plasma lipid traits (GlobalLipids), and red blood cell traits (HaemgenRBC). Our analyses identify many new associations (433 in total across the 13 studies), many of which replicate when follow-up samples are available. Overall, our results demonstrate that multivariate analyses can help make more effective use of data from both existing and future GWAS.
Year: 2019 PMID: 31596850 PMCID: PMC6802844 DOI: 10.1371/journal.pgen.1008431
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Table 1. Dataset summary.
| Dataset | Release | N | Phenotypes |
|---|---|---|---|
| GlobalLipids | 2010 | 95454 | LDL, HDL, TC, TG |
| | 2013 | 188577 | LDL, HDL, TC, TG |
| GIANT | 2010 | 77167 | Height, BMI, WHRadjBMI |
| | 2014/5 | 224459 | Height, BMI, WHRadjBMI |
| HaemgenRBC | 2012 | 135367 | RBC, PCV, MCV, MCH, MCHC, Hb |
| | 2016 | 173480 | RBC, PCV, MCV, MCH, MCHC, Hb |
| ICBP | 2011 | 69395 | SBP, DBP, PP, MAP |
| MAGIC | 2010 | 46186 | FstIns, FstGlu, HOMA_B, HOMA_IR |
| GEFOS | 2015 | 32965 | FA, FN, LS |
| GIS | 2014 | 48972 | Iron, Sat, TrnsFrn, Log10Frtn |
| SSGAC | 2016 | 343072 | NEB_Pooled, AFB_Pooled |
| CKDGen | 2010/1 | 67093 | Crea, Cys, CKD, UACR, MA |
| ENIGMA2 | 2015 | 30717 | ICV, Accumbens, Amygdala, Caudate, Hippocampus, Pallidum, Putamen, Thalamus |
N is the maximum number of samples contributing to each study.
a—Low-Density Lipoproteins (LDL), High-Density Lipoproteins (HDL), Total Cholesterol (TC), Total Triglycerides (TG)
b—Body Mass Index (BMI), Waist-Hip Ratio adjusted for BMI (WHRadjBMI)
c—Red Blood Cell Count (RBC), Packed Cell Volume (PCV), Mean Cell Volume (MCV), Mean Cell Haemoglobin (MCH), Mean Cell Haemoglobin Concentration (MCHC), Haemoglobin (Hb)
d—Systolic Blood Pressure (SBP), Diastolic Blood Pressure (DBP), Pulse Pressure (PP), Mean Arterial Pressure (MAP)
e—Fasting Insulin (FstIns), Fasting Glucose (FstGlu), Homeostatic Model Assessment of Beta Cell Function (HOMA_B), Homeostatic Model Assessment of Insulin Resistance (HOMA_IR)
f—Forearm Bone Mineral Density (FA), Femoral Neck Bone Mineral Density (FN), Lumbar Spine Bone Mineral Density (LS)
g—Serum Iron (Iron), Serum Transferrin Saturation (Sat), Serum Transferrin (TrnsFrn), Log-Transformed Ferritin (Log10Frtn)
h—Number of Children Ever Born, Male & Female (NEB_Pooled), Age at First Birth, Male & Female (AFB_Pooled)
i—Serum Creatinine (Crea), Serum Cystatin C (Cys), Chronic Kidney Disease (CKD), Urinary Albumin-to-Creatinine Ratio (UACR), Microalbuminuria (MA)
j—Intracranial Volume (ICV); each named subcortical brain structure refers to its MRI-derived volume
Fig 1. Number of independent significant SNPs, by study.
The barplot shows the number of independent SNPs that were significant in previous univariate analyses (blue) and the number of additional significant associations found by our new multivariate analyses (red). For the univariate analyses, significance levels were those set by the original studies. For the multivariate analyses, we declared a SNP significant if its weighted average Bayes Factor (BFav) exceeded the smallest BFav among the previously reported univariate significant SNPs. We considered SNPs more than 0.5 Mb apart to be independent. See Table 1 and Methods for phenotype details, Methods for further analysis details, and S2–S4 Tables for lists of significant SNPs from each dataset.
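The significance rule in the Fig 1 caption can be sketched as follows. This is not the authors' code: it assumes BFav is a prior-weighted average of per-model Bayes factors (as the name suggests), that values are handled on the log10 scale, that SNPs arrive as hypothetical `(chrom, pos, log10_bfav)` tuples, and that a "new" hit must also lie more than 0.5 Mb from every previous univariate hit. The function names `log10_bfav` and `new_multivariate_hits` are illustrative.

```python
import math


def log10_bfav(log10_bfs, weights):
    """Weighted average of per-model Bayes factors, returned on the log10 scale.

    Assumes log10_bfs[k] is the log10 BF under model k and weights sum to 1.
    Factoring out the largest term keeps 10**x from overflowing.
    """
    top = max(log10_bfs)
    total = sum(w * 10 ** (b - top) for b, w in zip(log10_bfs, weights))
    return top + math.log10(total)


def new_multivariate_hits(snps, univariate_hits, window=500_000):
    """Return new, mutually independent SNPs whose BFav exceeds the threshold.

    snps and univariate_hits are lists of (chrom, pos, log10_bfav) tuples
    (a hypothetical input format). SNPs within `window` bp are not independent.
    """
    # Threshold: the smallest BFav among the previous univariate hits.
    threshold = min(h[2] for h in univariate_hits)
    # Loci already accounted for: previous hits, plus each new hit once kept.
    claimed = [(c, p) for c, p, _ in univariate_hits]
    new_hits = []
    # Visit SNPs strongest first so each region keeps its top SNP.
    for chrom, pos, bf in sorted(snps, key=lambda s: -s[2]):
        if bf <= threshold:
            break  # sorted descending, so no later SNP can pass either
        if all(c != chrom or abs(p - pos) > window for c, p in claimed):
            new_hits.append((chrom, pos, bf))
            claimed.append((chrom, pos))
    return new_hits


snps = [("1", 1_000_000, 6.0), ("1", 1_200_000, 5.5), ("1", 3_000_000, 5.0),
        ("2", 500_000, 4.0), ("2", 5_000_000, 7.0)]
univariate = [("2", 5_000_000, 7.0), ("2", 9_000_000, 4.5)]
print(new_multivariate_hits(snps, univariate))
# -> [('1', 1000000, 6.0), ('1', 3000000, 5.0)]
```

Working on the log10 scale does not change which SNPs pass, because the logarithm is monotone; it only avoids overflow when Bayes factors are very large. The greedy strongest-first scan is one simple way to enforce the 0.5 Mb independence rule; the paper's Methods may differ in detail.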
Summary of new multivariate associations identified.
| Dataset | Release | Previous Univariate | New Multivariate | BFav Thresh | Overlap With Next Release |
|---|---|---|---|---|---|
| GlobalLipids | 2010 | 102 | 19 | 4.35 | 13/19 |
| | 2013 | 145 | 65 | 4.29 | - |
| GIANT | 2010 | 144 | 60 | 4.11 | 49/60 |
| | 2014/5 | 724 | 162 | 4.49 | - |
| HaemgenRBC | 2012 | 63 | 16 | 5.21 | 9/16 |
| | 2016 | 610 | 60 | 4.68 | - |
| ICBP | 2011 | 22 | 22 | 5.24 | - |
| MAGIC | 2010 | 12 | 1 | 6.90 | - |
| GEFOS | 2015 | 34 | 13 | 5.06 | - |
| GIS | 2014 | 8 | 5 | 7.04 | - |
| SSGAC | 2016 | 9 | 1 | 5.43 | - |
| CKDGen | 2010/1 | 28 | 6 | 4.10 | - |
| ENIGMA2 | 2015 | 5 | 3 | 7.48 | - |
Previous Univariate: the number of previous genome-wide significant univariate SNP associations, based on the publicly available summary data. New Multivariate: the number of new genome-wide significant multivariate SNP associations. BFav Thresh: the Bayes Factor threshold used in declaring new multivariate associations significant. Overlap With Next Release: for GlobalLipids2010, GIANT2010, and HaemgenRBC2012, the number of new multivariate associations that overlap with the univariate GWAS associations in the next release from the same consortium; overlap is defined as lying within 50 kb of the univariate GWAS variant.
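The "Overlap With Next Release" column can be reproduced, in sketch form, by checking each new multivariate hit against the next release's univariate hits with the 50 kb rule. The function name and the `(chrom, pos)` input format are hypothetical, and treating exactly 50 kb as "within" is an assumption.

```python
def overlap_with_next_release(new_hits, next_release_hits, window=50_000):
    """Count new hits lying within `window` bp of any next-release univariate hit.

    Both inputs are lists of (chrom, pos) tuples (hypothetical format).
    Returns a string in the table's "k/n" style.
    """
    n_overlap = sum(
        any(c == chrom and abs(p - pos) <= window for c, p in next_release_hits)
        for chrom, pos in new_hits
    )
    return f"{n_overlap}/{len(new_hits)}"


new_hits = [("1", 100_000), ("1", 400_000), ("3", 10)]
next_release = [("1", 140_000), ("3", 50_010)]
print(overlap_with_next_release(new_hits, next_release))  # -> 1/3
```

Here only the first hit is within 50 kb of a next-release variant; the third is exactly 50,000 bp away on chromosome 3, which this sketch counts as overlapping only if the distance is at most the window. It is 50,000 bp from 50,010, so it overlaps as well, giving the printed result below.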
Fig 2. Replication of new multivariate associations.
The figure shows results based on earlier and later releases from studies with multiple releases (GlobalLipids, GIANT, and HaemgenRBC). Each point represents a new multivariate association identified in our multivariate analysis of the earlier release. The x- and y-axes show the minimum (across phenotypes) of the -log10 univariate p-values from the earlier release (x-axis) and the later release (y-axis). Dashed red lines mark the univariate GWAS significance thresholds used for each study's releases. Across all three studies, 84 out of 94 new multivariate associations from the earlier releases have smaller minimum univariate p-values in the later release, and 68 out of 84 new multivariate associations that did not reach GWAS significance in the earlier release do so in the later release (see S5 Table for a per-dataset breakdown).
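The two replication counts in the Fig 2 caption can be sketched as below. This assumes each new hit is given as a hypothetical pair of per-phenotype univariate p-value lists (earlier release, later release), and that "GWAS significance" means the minimum p-value falls strictly below a release-specific cutoff such as 5e-8. The original studies set their own thresholds, so the cutoff is an assumption, and `replication_summary` is an illustrative name.

```python
def replication_summary(pairs, earlier_threshold, later_threshold):
    """Summarize how new multivariate hits behave in a later release.

    pairs: list of (earlier_pvalues, later_pvalues), one per new hit, each a
    list of univariate p-values across phenotypes (hypothetical format).
    Returns (n_improved, n_not_sig_earlier, n_sig_later_among_those).
    """
    improved = not_sig_earlier = now_sig = 0
    for earlier, later in pairs:
        # Minimum p-value across phenotypes (the figure plots -log10 of these).
        x, y = min(earlier), min(later)
        if y < x:
            improved += 1
        if x >= earlier_threshold:  # not GWAS-significant in the earlier release
            not_sig_earlier += 1
            if y < later_threshold:
                now_sig += 1
    return improved, not_sig_earlier, now_sig


pairs = [
    ([1e-6, 0.01], [1e-9, 0.2]),    # improves and becomes significant
    ([1e-7, 0.5], [1e-6, 0.3]),     # gets weaker in the later release
    ([1e-9, 0.1], [1e-12, 0.1]),    # already significant in the earlier release
]
print(replication_summary(pairs, 5e-8, 5e-8))  # -> (2, 2, 1)
```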
Fig 3. Comparison of new multivariate hits vs. relaxing the univariate p-value threshold.
For each dataset, the graph shows how many associations become significant as the univariate p-value threshold is relaxed (moving from right to left on the x-axis), and how many of these are declared new multivariate hits in our analysis. In both cases results are pruned to avoid counting associations of SNPs in strong LD; see Methods for details. Appreciable blue areas indicate that the multivariate analysis reorders the significance of SNPs compared with performing multiple univariate analyses.
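A sketch of the Fig 3 curve follows: for each univariate p-value threshold, keep SNPs whose minimum univariate p-value passes, prune them, and count how many of the survivors are new multivariate hits. The paper prunes on LD (see Methods); this sketch swaps in simple distance-based pruning (0.5 Mb) because no LD data are assumed. The input format `(snp_id, chrom, pos, min_p)` and the function name are hypothetical.

```python
def relaxed_threshold_curve(snps, new_hit_ids, thresholds, window=500_000):
    """For each p-value threshold, count pruned passing SNPs and new hits among them.

    snps: list of (snp_id, chrom, pos, min_p), min_p being the smallest
    univariate p-value across phenotypes (hypothetical format).
    Distance pruning stands in for the paper's LD-based pruning.
    """
    curve = []
    for t in thresholds:
        # Strongest SNPs first so each region is represented by its top SNP.
        passing = sorted((s for s in snps if s[3] < t), key=lambda s: s[3])
        kept = []
        for sid, chrom, pos, p in passing:
            if all(c != chrom or abs(q - pos) > window for _, c, q, _ in kept):
                kept.append((sid, chrom, pos, p))
        n_new = sum(1 for s in kept if s[0] in new_hit_ids)
        curve.append((t, len(kept), n_new))
    return curve


snps = [("a", "1", 1_000_000, 1e-9), ("b", "1", 1_100_000, 1e-7),
        ("c", "2", 2_000_000, 1e-6), ("d", "3", 10, 1e-4)]
print(relaxed_threshold_curve(snps, {"c"}, [5e-8, 1e-5]))
# -> [(5e-08, 1, 0), (1e-05, 2, 1)]
```

If relaxing the threshold picked up new multivariate hits first, the two counts would track each other closely; a gap between them is what signals the reordering described in the caption.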
Fig 4. Distribution, across significant SNPs, of the number of phenotypes that are confidently associated (A) or confidently unassociated (B).
Results are shown for three well-powered datasets: GlobalLipids2013, GIANT2014/5, and HaemgenRBC2016. Here “confident” means with probability >0.95, so a SNP is considered “confidently associated” with a phenotype if the sum of its probabilities in the “Directly Associated” and “Indirectly Associated” categories exceeds 0.95 (A), and is considered confidently unassociated with the phenotype if this probability is less than 0.05 (B). The set of significant SNPs includes both previous univariate associations and new multivariate associations.
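The Fig 4 classification can be sketched as follows, assuming each SNP comes with a hypothetical dict mapping phenotype to its category probabilities, keyed by the caption's "Directly Associated" and "Indirectly Associated" labels. Tallying the per-SNP counts gives the distributions in panels A and B. Function names are illustrative.

```python
from collections import Counter


def classify_phenotypes(probs, assoc_cutoff=0.95, unassoc_cutoff=0.05):
    """Split phenotypes into confidently associated / confidently unassociated.

    probs: dict phenotype -> dict of category probabilities (hypothetical format).
    """
    associated, unassociated = [], []
    for pheno, p in probs.items():
        p_assoc = p.get("Directly Associated", 0.0) + p.get("Indirectly Associated", 0.0)
        if p_assoc > assoc_cutoff:
            associated.append(pheno)
        elif p_assoc < unassoc_cutoff:
            unassociated.append(pheno)
    return associated, unassociated


def count_distribution(snp_probs):
    """Histogram, across SNPs, of how many phenotypes fall in each class."""
    n_assoc, n_unassoc = Counter(), Counter()
    for probs in snp_probs:
        a, u = classify_phenotypes(probs)
        n_assoc[len(a)] += 1      # panel A
        n_unassoc[len(u)] += 1    # panel B
    return n_assoc, n_unassoc


example = {
    "LDL": {"Directly Associated": 0.90, "Indirectly Associated": 0.08},
    "HDL": {"Directly Associated": 0.01, "Indirectly Associated": 0.02},
    "TG": {"Directly Associated": 0.50, "Indirectly Associated": 0.10},
}
print(classify_phenotypes(example))  # -> (['LDL'], ['HDL'])
```

Phenotypes falling between the two cutoffs (TG here) are counted in neither panel, which is why the two distributions need not sum to the number of phenotypes.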