Jonathan D Mosley1,2, QiPing Feng3, Quinn S Wells3, Sara L Van Driest3,4, Christian M Shaffer3, Todd L Edwards5, Lisa Bastarache6, Wei-Qi Wei6, Lea K Davis3, Catherine A McCarty7, Will Thompson8, Christopher G Chute9, Gail P Jarvik10, Adam S Gordon10, Melody R Palmer10, David R Crosslin11, Eric B Larson10,12, David S Carrell12, Iftikhar J Kullo13, Jennifer A Pacheco14, Peggy L Peissig15, Murray H Brilliant16, James G Linneman15, Bahram Namjou17, Marc S Williams18, Marylyn D Ritchie19, Kenneth M Borthwick19, Shefali S Verma19, Jason H Karnes20, Scott T Weiss21, Thomas J Wang6, C Michael Stein3, Josh C Denny3,6, Dan M Roden3,6,22.
Abstract
Defining the full spectrum of human disease associated with a biomarker is necessary to advance the biomarker into clinical practice. We hypothesize that associating biomarker measurements with electronic health record (EHR) populations based on shared genetic architectures would establish the clinical epidemiology of the biomarker. We use Bayesian sparse linear mixed modeling to calculate SNP weightings for 53 biomarkers from the Atherosclerosis Risk in Communities study. We use the SNP weightings to compute predicted biomarker values in an EHR population and test associations with 1139 diagnoses. Here we report 116 associations meeting a Bonferroni level of significance. A false discovery rate (FDR)-based significance threshold reveals more known and undescribed associations across a broad range of biomarkers, including biometric measures, plasma proteins and metabolites, functional assays, and behaviors. We confirm an inverse association between LDL-cholesterol level and septicemia risk in an independent epidemiological cohort. This approach efficiently discovers biomarker-disease associations.
Year: 2018 PMID: 30166544 PMCID: PMC6117367 DOI: 10.1038/s41467-018-05624-4
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Overview. a Overview of the study design. Bayesian sparse linear mixed modelling (BSLMM) was used to compute SNP weights for 53 biomarkers from the ARIC study. These weights were used to compute genetically predicted biomarkers in the EHR data set, and phenome-wide scanning (PheWAS) was used to identify clinical phenotypes associated with the genetically predicted biomarker. b Circos plot showing the 116 significant associations (Bonferroni p < 0.05) between the genetic predictors of the ARIC biomarkers and PheWAS phenotypes. Associations are denoted by lines. Coloring is used to highlight similar groups of biomarkers and PheWAS phenotypes
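The weighting step described in Fig. 1a can be illustrated with a minimal Python sketch. It assumes the genetically predicted biomarker is a weighted sum of allele dosages, which is the usual way SNP weights are applied; the weights, genotypes, and function names below are hypothetical and are not taken from the study.

```python
from statistics import mean, pstdev

def predict_biomarker(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2 copies) for one person."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant predictor")
    return [(v - mu) / sd for v in values]

# Hypothetical per-SNP weights (in the study these come from BSLMM).
snp_weights = [0.5, -0.25, 1.0]

# Hypothetical genotypes: one row per person, one dosage per SNP.
genotypes = [
    [0, 1, 2],
    [2, 0, 1],
    [1, 1, 0],
    [2, 2, 2],
]

predicted = [predict_biomarker(g, snp_weights) for g in genotypes]

# Standardizing lets downstream odds ratios be read per standard
# deviation increase in the genetic predictor.
z_scores = standardize(predicted)
```

Standardizing the predictor is an assumption made here so that effect sizes are comparable across biomarkers measured in different units.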
Fig. 2 Associations with positive controls. Positive control biomarker-phenotype pairs were identified a priori for 42 ARIC biomarkers. The histogram quantifies the percentage of pairs with Bonferroni p < 0.05, rank order value ≤ 5, false discovery rate (FDR) q < 0.1 or not seen by any of the criteria. Some pairs may fall into multiple categories
Fig. 3 Comparison of an FDR versus Bonferroni p value selection threshold. a Scatter plot summarizing PheWAS analyses for a genetic predictor of systolic blood pressure (SBP). Each point indicates a logistic regression association analysis, adjusted for birth decade, sex, and 3 PCs, between the genetically predicted biomarker and a PheWAS phenotype. Odds ratios are per standard deviation increase in the genetic predictor. Blue and green colored circles denote associations that are significant at Bonferroni p < 0.05 and FDR q < 0.1, respectively. Only selected points are labelled for clarity. b Count of the number of associations, binned by disease, meeting the Bonferroni and FDR selection thresholds. c PheWAS associations for a waist circumference genetic predictor and d count of disease associations significant by Bonferroni or FDR criteria. e Frequency histogram of the skewness (see Methods for calculations) of the PheWAS beta coefficients for each of the 53 biomarkers. The red arrow points to the value for waist circumference. HTN: hypertension; PVD: peripheral vascular disease; T2D: type 2 diabetes
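To make the difference between the two selection thresholds concrete, the following Python sketch applies a Bonferroni correction and an FDR adjustment to one set of p values. The record does not say which FDR procedure was used; the Benjamini-Hochberg method is assumed here, and the p values are invented for illustration.

```python
def bonferroni_adjust(p_values):
    """Multiply each p value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg q values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so each q value is the minimum
    # of p * m / rank over all ranks at or above its own.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values

# Hypothetical p values from ten phenotype association tests.
p_values = [0.001, 0.008, 0.012, 0.03, 0.2, 0.5, 0.9, 0.04, 0.6, 0.7]

bonf = bonferroni_adjust(p_values)
fdr = benjamini_hochberg(p_values)

# Indices of tests passing each threshold used in the figures.
bonf_hits = [i for i, p in enumerate(bonf) if p < 0.05]
fdr_hits = [i for i, q in enumerate(fdr) if q < 0.1]
```

In this toy example the Bonferroni threshold keeps one association while the FDR threshold keeps five, mirroring the qualitative pattern the figure describes.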
Fig. 4 Associations for selected biomarkers. Scatter plots summarizing PheWAS analyses for genetic predictors of a triglyceride levels, b pack-years of smoking, c serum magnesium levels, and d serum von Willebrand factor levels. Odds ratios are from logistic regression analyses, adjusting for birth decade, sex, and 3 PCs. Blue and green colored circles denote associations that are significant at Bonferroni p < 0.05 and FDR q < 0.1, respectively. PVD: peripheral vascular disease; AAA: abdominal aortic aneurysm; IHD: ischemic heart disease; DVT: deep vein thrombosis; PE: pulmonary embolism; GI: gastrointestinal
Fig. 5 Associations with LDL cholesterol (LDL-C). a Scatter plot summarizing PheWAS analyses for a genetic predictor of LDL-C. Blue and green colored circles denote associations that are significant at Bonferroni p < 0.05 and FDR q < 0.1, respectively. b Association analysis between the LDL-C genetic predictor and the PheWAS septicemia phenotype, stratified by type 2 diabetes (T2D) status. Error bars represent 95% confidence intervals of odds ratio estimates. c Epidemiological association between low (LDL-C < 60 mg/dl) versus normal (LDL-C between 90 and 130 mg/dl) LDL-C and septicemia, stratified by T2D status, using an independent EHR cohort. Odds ratios were determined by multivariable logistic regression adjusting for age, gender and race. Error bars represent 95% confidence intervals. T2D: type 2 diabetes
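The odds ratios and 95% confidence intervals shown as error bars can be derived from a logistic regression coefficient and its standard error. The Python sketch below assumes the standard Wald interval, which the record does not state explicitly; the coefficient and standard error are hypothetical values, not results from the study.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its standard error into an
    odds ratio with a Wald confidence interval (z = 1.96 for 95%)."""
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )

def excludes_null(lower, upper):
    """True when the interval does not contain an odds ratio of 1."""
    return upper < 1.0 or lower > 1.0

# Hypothetical coefficient for a one-unit increase in the predictor.
or_est, ci_low, ci_high = odds_ratio_with_ci(beta=-0.2, se=0.05)
inverse_association = excludes_null(ci_low, ci_high) and or_est < 1.0
```

An odds ratio below 1 whose interval excludes 1 corresponds to an inverse association of the kind reported between LDL-C and septicemia.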