Jeffery A Goldstein¹, Joshua S Weinstock², Lisa A Bastarache³, Daniel B Larach⁴, Lars G Fritsche², Ellen M Schmidt², Chad M Brummett⁴, Sachin Kheterpal⁴, Goncalo R Abecasis², Joshua C Denny³, Matthew Zawistowski².
Abstract
Phenotypes extracted from Electronic Health Records (EHRs) are increasingly prevalent in genetic studies. EHRs contain hundreds of distinct clinical laboratory test results, providing a trove of health data beyond diagnoses. Such lab data are complex and lack a ubiquitous coding scheme, which makes them more challenging to use than diagnosis data. Here we describe the first large-scale cross-health-system genome-wide association study (GWAS) of EHR-based quantitative laboratory-derived phenotypes. We meta-analyzed 70 lab traits matched between the BioVU cohort from the Vanderbilt University Health System and the Michigan Genomics Initiative (MGI) cohort from Michigan Medicine. We show high replication of known associations for these traits, validating EHR-based measurements as high-quality phenotypes for genetic analysis. Notably, our analysis provides the first replication for 699 previous GWAS associations across 46 different traits. We discovered 31 novel associations at genome-wide significance for 22 distinct traits, including the first reported associations for two lab-based traits. We replicated 22 of these novel associations in an independent tranche of BioVU samples. The summary statistics for all association tests are freely available to benefit other researchers. Finally, we performed mirrored analyses in BioVU and MGI to assess competing analytic practices for EHR lab traits. We find that using the mean of all available lab measurements provides a robust summary value, but alternative summarizations can improve power in certain circumstances. This study provides a proof of principle for cross-health-system GWAS and a framework for future studies of quantitative EHR lab traits.
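The comparison of analytic practices concerns how repeated lab results are collapsed into one value per person before association testing: the mean of all measurements (the primary choice) versus the median, first available, or maximum measurement. The following is a minimal sketch. It assumes each individual's results arrive as (day, value) pairs. The function `summarize_lab` and this data layout are hypothetical, and any further cleaning the study applied (unit harmonization, outlier handling, transformation) is not shown.

```python
from statistics import mean, median

# Hypothetical input layout: one person's results as (day_of_measurement, value) pairs.

def summarize_lab(measurements):
    """Collapse repeated lab measurements for one individual into candidate summary values."""
    if not measurements:
        raise ValueError("no measurements for this individual")
    ordered = sorted(measurements, key=lambda m: m[0])  # chronological order
    values = [value for _, value in ordered]
    return {
        "mean": mean(values),      # primary summary in the abstract
        "median": median(values),
        "first": values[0],        # earliest recorded measurement
        "max": max(values),
    }

# Four made-up total-cholesterol results (mg/dL) for one hypothetical patient, out of order.
patient = [(120, 210.0), (0, 190.0), (400, 250.0), (250, 200.0)]
print(summarize_lab(patient))
# {'mean': 212.5, 'median': 205.0, 'first': 190.0, 'max': 250.0}
```

Each summary yields one phenotype value per person. Running the same GWAS on each version and comparing p-values produces the comparisons summarized in the figure and tables below.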
Year: 2020 PMID: 33175840 PMCID: PMC7682892 DOI: 10.1371/journal.pgen.1009077
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Fig 1. Scatterplot of Δ in MGI and BioVU when using the first available measurement rather than the mean measurement in a GWAS of cholesterol level.
Δ is the -log fold change in p-value at a SNP when using an alternate analysis, in this case the first available lab measurement; positive values mean the alternate analysis makes the SNP more significant. Each dot is a SNP, with red dots indicating GWAS catalog SNPs for the specific lab trait. The white diamond contains 99.9% of SNPs and is used to identify the SNPs with the largest changes in p-value due to the alternate analysis. SNPs outside the bounding diamond in the top right (green) quadrant show a concordant increase in significance, that is, the alternate strategy increases significance in both MGI and BioVU. Conversely, SNPs in the bottom left (blue) quadrant show a concordant decrease in significance in both cohorts. SNPs in either the top left or bottom right (yellow) quadrant have a discordant effect: a large increase in p-value in one cohort but a large decrease in p-value in the other. In this example, one catalog SNP showed a concordant increase in significance when using the first available lab measurement, 11 catalog SNPs had a concordant decrease in significance, and one SNP had discordant effects. The complete set of scatterplots for each analyzed lab and alternate analysis strategy (summary statistic and comorbidity model) is included in S1 Fig. Tables 3 and 4 summarize the movement of catalog SNPs for each lab and analysis strategy.
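The caption defines Δ and the diamond only in words. The Python sketch below implements one reading of them and rests on several assumptions. Logarithms are base 10. Δ is computed as -log10(p_alt / p_mean), so positive values mean the alternate analysis is more significant. The diamond is the region |Δ_MGI| + |Δ_BioVU| ≤ r, with r set to the 99.9th percentile of that sum. The function names and the example radius are hypothetical.

```python
import math

def delta(p_mean, p_alt):
    """Assumed definition: -log10 fold change in p-value (positive = alternate analysis more significant)."""
    return -math.log10(p_alt / p_mean)

def diamond_radius(deltas_mgi, deltas_biovu, coverage=0.999):
    """Assumed construction: L1 radius r such that |x| + |y| <= r holds for `coverage` of SNPs."""
    dist = sorted(abs(x) + abs(y) for x, y in zip(deltas_mgi, deltas_biovu))
    k = max(0, math.ceil(coverage * len(dist)) - 1)  # index of the coverage quantile
    return dist[min(k, len(dist) - 1)]

def classify(d_mgi, d_biovu, radius):
    """Place one SNP into the caption's categories given its Δ in each cohort."""
    if abs(d_mgi) + abs(d_biovu) <= radius:
        return "inside diamond"
    if d_mgi > 0 and d_biovu > 0:
        return "concordant increase"   # top right (green)
    if d_mgi < 0 and d_biovu < 0:
        return "concordant decrease"   # bottom left (blue)
    if d_mgi * d_biovu < 0:
        return "discordant"            # top left / bottom right (yellow)
    return "on axis"                   # Δ exactly zero in one cohort

# Hypothetical SNP whose p-value drops 1000-fold in both cohorts under the alternate analysis.
d_mgi = delta(p_mean=1e-6, p_alt=1e-9)
d_biovu = delta(p_mean=1e-7, p_alt=1e-10)
print(classify(d_mgi, d_biovu, radius=3.0))  # concordant increase (radius 3.0 is hypothetical)
```

Applied to every catalog SNP of a lab, with `radius` taken from `diamond_radius` over all SNPs, these per-SNP labels are what the counts in Tables 3 and 4 tally.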
Classification of catalog SNPs for alternate summary statistics.
| Lab | Testable Catalog SNPs | Median: Concordant Increased Significance | Median: Concordant Decreased Significance | Median: Discordant Effect | First Available: Concordant Increased Significance | First Available: Concordant Decreased Significance | First Available: Discordant Effect | Maximum: Concordant Increased Significance | Maximum: Concordant Decreased Significance | Maximum: Discordant Effect |
|---|---|---|---|---|---|---|---|---|---|---|
| Chol | 91 | 0 | 12 | 0 | 1 | 11 | 1 | 2 | 4 | 6 |
| Creat | 36 | 2 | 0 | 0 | 2 | 2 | 1 | 0 | 8 | 1 |
| EoAB | 31 | 0 | 6 | 0 | 0 | 9 | 0 | 0 | 2 | 1 |
| EoRE | 28 | 0 | 1 | 0 | 0 | 4 | 0 | 0 | 1 | 1 |
| HCT | 36 | 0 | 0 | 0 | 4 | 0 | 1 | 15 | 0 | 1 |
| HDL | 101 | 0 | 6 | 3 | 0 | 15 | 1 | 0 | 27 | 5 |
| Hgb | 34 | 0 | 0 | 0 | 5 | 0 | 0 | 12 | 0 | 0 |
| LDL | 84 | 0 | 9 | 1 | 0 | 9 | 4 | 2 | 2 | 6 |
| LymphAB | 35 | 0 | 0 | 0 | 0 | 3 | 1 | 5 | 1 | 2 |
| LymphRE | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| MCHC | 20 | 0 | 1 | 0 | 2 | 5 | 3 | 2 | 5 | 1 |
| MCH | 64 | 0 | 16 | 27 | 0 | 33 | 8 | 0 | 33 | 7 |
| MCV | 77 | 1 | 5 | 7 | 0 | 19 | 13 | 0 | 30 | 6 |
| MonoAB | 43 | 2 | 3 | 0 | 0 | 9 | 0 | 0 | 13 | 1 |
| MPV | 84 | 0 | 11 | 9 | 0 | 39 | 9 | 5 | 20 | 17 |
| PLT | 102 | 0 | 0 | 1 | 7 | 7 | 1 | 0 | 19 | 5 |
| PMNAB | 35 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 3 | 1 |
| PMNRE | 21 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| RBC | 50 | 0 | 4 | 4 | 13 | 0 | 1 | 21 | 0 | 0 |
| RDW | 29 | 0 | 1 | 2 | 0 | 1 | 4 | 0 | 7 | 0 |
| Trigs | 73 | 0 | 7 | 0 | 1 | 15 | 1 | 0 | 22 | 0 |
| WBC | 33 | 0 | 4 | 5 | 0 | 7 | 1 | 0 | 9 | 0 |
| Total | 1127 | 5 (0.4%) | 86 (7.6%) | 59 (5.2%) | 35 (3.1%) | 190 (16.9%) | 51 (4.5%) | 64 (5.7%) | 206 (18.3%) | 62 (5.5%) |
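Each percentage in the Total row is a column count divided by the 1,127 testable catalog SNPs pooled across all 22 labs, not an average of per-lab percentages. The short Python check below reproduces that arithmetic using only numbers from the table. The variable names are illustrative.

```python
# Testable GWAS catalog SNPs per lab, copied from the table above.
TESTABLE_PER_LAB = {
    "Chol": 91, "Creat": 36, "EoAB": 31, "EoRE": 28, "HCT": 36, "HDL": 101,
    "Hgb": 34, "LDL": 84, "LymphAB": 35, "LymphRE": 20, "MCHC": 20, "MCH": 64,
    "MCV": 77, "MonoAB": 43, "MPV": 84, "PLT": 102, "PMNAB": 35, "PMNRE": 21,
    "RBC": 50, "RDW": 29, "Trigs": 73, "WBC": 33,
}
TESTABLE_TOTAL = sum(TESTABLE_PER_LAB.values())  # pooled denominator

# Column totals from the Total row: (summary statistic, category) -> count.
TOTAL_COUNTS = {
    ("Median", "increased"): 5,
    ("Median", "decreased"): 86,
    ("Median", "discordant"): 59,
    ("First available", "increased"): 35,
    ("First available", "decreased"): 190,
    ("First available", "discordant"): 51,
    ("Maximum", "increased"): 64,
    ("Maximum", "decreased"): 206,
    ("Maximum", "discordant"): 62,
}

def as_percent(count, denominator=TESTABLE_TOTAL):
    """Format a count as in the Total row: 'n (x.x%)', rounded to one decimal place."""
    return f"{count} ({100 * count / denominator:.1f}%)"

for (summary, category), count in TOTAL_COUNTS.items():
    print(f"{summary:<16}{category:<11}{as_percent(count)}")
```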
Classification of catalog SNPs for the comorbidity model, which includes covariates for various lab-altering diseases.
| Lab | Testable Catalog SNPs | Comorbidity Model: Concordant Increased Significance | Comorbidity Model: Concordant Decreased Significance | Comorbidity Model: Discordant Effect |
|---|---|---|---|---|
| Chol | 91 | 2 | 5 | 2 |
| Creat | 36 | 1 | 3 | 2 |
| EoAB | 31 | 0 | 0 | 0 |
| EoRE | 28 | 0 | 0 | 1 |
| HCT | 36 | 2 | 0 | 2 |
| HDL | 101 | 15 | 2 | 2 |
| Hgb | 34 | 1 | 0 | 0 |
| LDL | 84 | 0 | 7 | 2 |
| LymphAB | 35 | 2 | 0 | 4 |
| LymphRE | 20 | 0 | 0 | 0 |
| MCHC | 20 | 2 | 0 | 2 |
| MCH | 64 | 1 | 7 | 26 |
| MCV | 77 | 9 | 1 | 4 |
| MonoAB | 43 | 5 | 0 | 1 |
| MPV | 84 | 18 | 0 | 5 |
| PLT | 102 | 5 | 1 | 4 |
| PMNAB | 35 | 0 | 2 | 1 |
| PMNRE | 21 | 0 | 0 | 2 |
| RBC | 50 | 2 | 0 | 5 |
| RDW | 29 | 0 | 1 | 3 |
| Trigs | 73 | 3 | 3 | 7 |
| WBC | 33 | 2 | 2 | 2 |
| Total | 1127 | 70 (6.2%) | 34 (3.0%) | 77 (6.8%) |
Summary of clinical lab traits tested, including meta-analysis sample size, number of testable GWAS catalog SNPs, number of replicated catalog SNPs, and replication rate.
| Lab Name | Category | Description | Meta-Analysis Sample Size | Number of Testable GWAS Catalog SNPs | Number of Catalog SNPs Replicated in Meta-Analysis | Replication Rate (%) |
|---|---|---|---|---|---|---|
| Alb | Liver function | Albumin, most abundant blood protein | 39,513 | 5 | 4 | 80 |
| AlkP | Liver function | Alkaline phosphatase, bile duct and bone enzyme released by damage | 39,809 | 3 | 1 | 33 |
| ALT | Liver function | ALanine aminoTransferase, liver enzyme released by damage | 40,116 | 0 | 0 | N/A |
| Amyl | Pancreas | Amylase, digestive pancreas enzyme released by damage | 10,368 | 0 | 0 | N/A |
| AST | Liver function | ASpartate aminoTransferase, liver enzyme released by damage | 40,176 | 0 | 0 | N/A |
| BasoAB | Differential | Basophils, white blood cell type (absolute number) | 29,653 | 19 | 12 | 63 |
| BasoRE | Differential | Basophils, white blood cell type (relative proportion) | 32,578 | 11 | 7 | 64 |
| BEAR | Blood gas | Base Excess ARterial, acid-base measure of metabolic acidosis or alkalosis | 8,895 | 0 | 0 | N/A |
| Bili | Liver function | Total Bilirubin, heme byproduct excreted by liver | 38,416 | 4 | 4 | 100 |
| BNP | Heart failure | Brain Natriuretic Peptide, signaling peptide released by the heart under stress | 9,369 | 1 | 1 | 100 |
| BUN | Renal function | Blood Urea Nitrogen, protein byproduct excreted by kidneys | 45,922 | 0 | 0 | N/A |
| Ca | Electrolytes | Calcium, blood electrolyte | 46,100 | 9 | 7 | 78 |
| Chol | Lipid panel | Total cholesterol | 23,642 | 91 | 60 | 66 |
| CKMBRe | Cardiac markers | Creatine Kinase Muscle Brain isoform, relative, Enzyme in heart released by damage | 10,964 | 0 | 0 | N/A |
| Cl | Electrolytes | Chloride, blood electrolyte | 45,920 | 0 | 0 | N/A |
| CPK | Cardiac markers | Creatine PhosphoKinase, enzyme in skeletal and cardiac muscle released by damage | 15,150 | 0 | 0 | N/A |
| Creat | Renal function | Creatinine, creatine byproduct excreted by kidneys | 46,027 | 36 | 29 | 81 |
| CRP | Inflammatory | C-reactive protein, marker of inflammation | 12,447 | 16 | 7 | 44 |
| EoAB | Differential | Eosinophils, white blood cell type (absolute count) | 29,912 | 31 | 25 | 81 |
| EoRE | Differential | Eosinophils, white blood cell type (relative proportion) | 26,980 | 28 | 18 | 64 |
| Ferrit | Iron | Ferritin, iron storage protein | 11,744 | 6 | 1 | 17 |
| FT4 | Thyroid function | Free tetraiodothyronine, active thyroid hormone | 15,868 | 0 | 0 | N/A |
| Gluc | Metabolic | Blood glucose | 46,027 | 18 | 16 | 89 |
| HCO3 (CO2) | Blood gas | Bicarbonate, main blood pH buffer | 45,932 | 0 | 0 | N/A |
| HCT | Complete blood count | Hematocrit, measure of blood oxygen carrying capacity | 46,382 | 36 | 20 | 56 |
| HDL | Lipid panel | High density lipoprotein cholesterol | 23,318 | 101 | 84 | 83 |
| Hgb | Complete blood count | Hemoglobin, oxygen carrying protein | 46,159 | 34 | 18 | 53 |
| HgbA1C | Metabolic | Hemoglobin A1C, measure of blood glucose over previous 90 days | 17,407 | 11 | 10 | 91 |
| IGranAB | Differential | Immature granulocytes, immature white blood cell type (absolute count) | 30,744 | 0 | 0 | N/A |
| IGranRE | Differential | Immature granulocytes, immature white blood cell type (relative proportion) | 30,683 | 0 | 0 | N/A |
| INR | Coagulation | International Normalized Ratio, derivative of PT used to dose anticoagulants | 33,695 | 0 | 0 | N/A |
| Iron | Iron | Iron | 11,317 | 4 | 3 | 75 |
| K | Electrolytes | Potassium, blood electrolyte | 45,941 | 0 | 0 | N/A |
| LAC | Blood gas | Lactic acid, marker of tissue hypoxia | 8,792 | 0 | 0 | N/A |
| LDH | Tumor markers | Lactate dehydrogenase, enzyme found in many cell types released by damage | 9,734 | 0 | 0 | N/A |
| LDL | Lipid panel | Low density lipoprotein cholesterol | 22,896 | 84 | 58 | 69 |
| Lipase | Pancreas | Lipase, digestive pancreas enzyme released by damage | 12,649 | 2 | 2 | 100 |
| LymphAB | Differential | Lymphocytes, white blood cell type (absolute count) | 32,548 | 35 | 22 | 63 |
| LymphRE | Differential | Lymphocytes, white blood cell type (relative proportion) | 32,553 | 20 | 10 | 50 |
| MCH | Red cell indices | Mean corpuscular hemoglobin, used to differentiate causes of anemia | 46,159 | 64 | 57 | 89 |
| MCHC | Red cell indices | Mean corpuscular hemoglobin concentration, used to differentiate causes of anemia | 46,157 | 20 | 19 | 95 |
| MCV | Red cell indices | Mean corpuscular volume, used to differentiate causes of anemia | 46,153 | 77 | 68 | 88 |
| Mg | Electrolytes | Magnesium, blood electrolyte | 22,773 | 4 | 4 | 100 |
| MonoAB | Differential | Monocytes, white blood cell type (absolute count) | 32,587 | 43 | 32 | 74 |
| MonoRE | Differential | Monocytes, white blood cell type (relative proportion) | 32,594 | 15 | 12 | 80 |
| MPV | Coagulation | Mean platelet volume | 40,058 | 84 | 73 | 87 |
| Na | Electrolytes | Sodium, blood electrolyte | 45,933 | 0 | 0 | N/A |
| pCO2 | Blood gas | Arterial partial pressure of CO2, measure of ventilation | 9,516 | 0 | 0 | N/A |
| pH | Blood gas | Arterial pH | 10,279 | 0 | 0 | N/A |
| Phos | Electrolytes | Phosphorus, blood electrolyte | 21,618 | 5 | 4 | 80 |
| PLT | Complete blood count | Platelet count, clot forming measure | 46,145 | 102 | 84 | 82 |
| PMNAB | Differential | Neutrophils, white blood cell type (absolute count) | 32,595 | 35 | 15 | 43 |
| PMNRE | Differential | Neutrophils, white blood cell type (relative proportion) | 29,435 | 21 | 7 | 33 |
| pO2 | Blood gas | Arterial partial pressure of oxygen, measure of oxygenation | 9,557 | 0 | 0 | N/A |
| PT | Coagulation panel | Prothrombin time, clot forming measure | 33,671 | 1 | 1 | 100 |
| PTT | Coagulation panel | Partial Thromboplastin Time, clot forming measure | 30,972 | 9 | 6 | 67 |
| RBC | Complete blood count | Red Blood Cell count, measure of blood oxygen carrying capacity | 46,158 | 50 | 31 | 62 |
| RDW | Red cell indices | Red cell Distribution Width, measure of variability in MCV, used to differentiate causes of anemia | 44,281 | 29 | 21 | 72 |
| %SAT | Iron | Transferrin saturation, measure of available iron transport capacity | 10,180 | 4 | 3 | 75 |
| SedRat | Inflammatory markers | Erythrocyte Sedimentation Rate (ESR), non-specific marker of inflammation | 13,945 | 5 | 5 | 100 |
| TIBC | Iron | Total Iron Binding Capacity, measure of iron transport capacity, used to calculate transferrin saturation | 10,397 | 1 | 1 | 100 |
| TProt | Liver function | Total Protein in blood | 38,352 | 2 | 2 | 100 |
| Trigs | Lipid panel | Triglycerides, tested as part of cholesterol panels | 23,963 | 73 | 63 | 86 |
| Troponin | Cardiac markers | Troponin I, heart protein released by damage | 10,106 | 0 | 0 | N/A |
| TSH | Thyroid function | Thyroid Stimulating Hormone, test of thyroid function and feedback | 27,441 | 1 | 1 | 100 |
| UCrea | Renal function | Urine creatinine, measure of kidney function | 10,522 | 0 | 0 | N/A |
| UricA | Gout | Uric acid, nucleotide breakdown product elevated in gout | 7,429 | 17 | 14 | 82 |
| Vit-B12 | Nutrition | Vitamin B12, used in DNA synthesis | 12,506 | 7 | 7 | 100 |
| Vit-D | Nutrition | Vitamin D storage form, regulates calcium and phosphorus | 12,250 | 6 | 6 | 100 |
| WBC | Complete blood count | White Blood Cell count | 46,100 | 33 | 27 | 82 |
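The Replication Rate column is the number of replicated catalog SNPs divided by the number of testable ones, expressed as a whole percent, with N/A where no catalog SNP was testable. The sketch below reproduces that arithmetic for a few rows. Two caveats apply. The half-up rounding is an assumption, although no row falls exactly on a .5 boundary, so any rounding rule gives the same values. This excerpt also does not state the criterion used to call a catalog SNP replicated in the meta-analysis.

```python
from typing import Optional

def replication_rate(testable: int, replicated: int) -> Optional[int]:
    """Whole-percent share of testable catalog SNPs that replicated; None stands for 'N/A'."""
    if testable == 0:
        return None  # no catalog SNP could be tested for this lab
    # Integer arithmetic for round-half-up: floor(100 * replicated / testable + 0.5)
    return (200 * replicated + testable) // (2 * testable)

# (testable, replicated, reported rate) for a few rows of the table above
rows = {
    "Chol": (91, 60, 66),
    "Creat": (36, 29, 81),
    "PMNRE": (21, 7, 33),
    "HgbA1C": (11, 10, 91),
    "ALT": (0, 0, None),
}
for lab, (testable, replicated, reported) in rows.items():
    rate = replication_rate(testable, replicated)
    print(lab, "N/A" if rate is None else f"{rate}%")
    assert rate == reported
```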
Fig 2. Sample sizes for 70 clinical lab traits from the meta-analysis of BioVU and MGI EHRs (red triangles) and the previous largest reported GWAS in a European cohort (black circles). Our meta-analysis provides the largest GWAS for 34 lab traits, including the first for 14. Asterisks along the bottom row indicate labs for which we identified a novel genetic association.
Fig 3. Replication rates for GWAS catalog SNPs of clinical labs increased with (A) the number of times an association was reported in the GWAS catalog, (B) the most significant p-value previously reported for the association, and (C) the ratio of sample size in our meta-analysis to that of the previous largest study.
Summary of novel findings.
| Lab | SNP | Chr:Pos | Allele 1 | Allele 2 | MGI-BioVU Meta N | MGI-BioVU Meta Beta | MGI-BioVU Meta P-Value | BioVU Replication N | BioVU Replication Beta | BioVU Replication P-Value | Replicated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AlkP | rs3843738 | 17:43739194 | A | G | 39,809 | 0.04 | 2.51E-08 | 22,920 | 0.01 | 3.58E-01 | No |
| AlkP | rs73004933 | 19:19675696 | T | C | 39,809 | 0.08 | 4.47E-09 | 22,730 | 0.05 | 7.14E-03 | Yes |
| ALT | rs112574791 | 8:145730221 | A | G | 40,116 | 0.18 | 3.02E-08 | 23,007 | 0.15 | 5.80E-04 | Yes |
| Amyl | rs1930212 | 1:104324819 | A | G | 10,368 | -0.25 | 1.48E-45 | 3,573 | -0.18 | 4.69E-09 | Yes |
| Amyl | rs8051363 | 16:75255217 | A | G | 10,368 | 0.10 | 1.07E-10 | 3,564 | 0.09 | 4.51E-04 | Yes |
| BasoRE | rs386785158 | 15:70744437 | T | C | 29,653 | 0.06 | 7.94E-13 | 16,191 | 0.04 | 2.10E-04 | Yes |
| Bili | rs855791 | 22:37462936 | A | G | 39,890 | 0.04 | 2.34E-08 | 22,918 | 0.04 | 1.00E-05 | Yes |
| BUN | rs10516957 | 4:95949206 | T | C | 45,922 | -0.06 | 1.35E-08 | 25,245 | 0.01 | 6.11E-01 | No |
| Ca | rs6727384 | 2:97400324 | A | G | 46,100 | -0.04 | 5.13E-10 | 25,200 | -0.05 | 2.06E-07 | Yes |
| Ca | rs2839899 | 9:80350999 | A | G | 46,100 | 0.04 | 6.76E-09 | 25,194 | 0.03 | 9.47E-03 | Yes |
| Cl | rs1030025 | 2:103105611 | A | T | 45,920 | 0.05 | 4.68E-10 | 25,204 | 0.02 | 9.16E-02 | No |
| FT4 | rs10122824 | 9:139109861 | T | G | 15,868 | 0.07 | 1.00E-09 | 9,721 | 0.07 | 7.28E-07 | Yes |
| Glucose | rs7607980 | 2:165551201 | T | C | 46,027 | -0.05 | 4.27E-09 | 25,312 | -0.04 | 2.09E-03 | Yes |
| Glucose | rs896854 | 8:95960511 | T | C | 46,027 | -0.04 | 1.55E-09 | 25,311 | 0.01 | 3.64E-01 | No |
| Glucose | rs9273364 | 6:32626302 | T | G | 46,027 | 0.05 | 2.63E-11 | 24,801 | 0.05 | 3.10E-06 | Yes |
| HgbA1C | rs3130628 | 6:31609272 | T | C | 17,407 | -0.08 | 1.23E-08 | 7,340 | 0.03 | 3.79E-02 | No |
| HCO3 (CO2) | rs1799913 | 11:18047255 | T | G | 45,932 | -0.04 | 5.89E-09 | 25,219 | -0.04 | 7.82E-07 | Yes |
| HCO3 (CO2) | rs77375846 | 2:103155075 | T | C | 45,932 | -0.10 | 9.33E-25 | 25,217 | -0.06 | 2.78E-05 | Yes |
| IGranRE | rs13284665 | 9:131513370 | A | G | 30,683 | 0.22 | 6.61E-74 | QC Fail | N/A | N/A | No |
| IGranAB | rs13284665 | 9:131513370 | A | G | 30,744 | 0.13 | 6.76E-35 | QC Fail | N/A | N/A | No |
| K | rs10039139 | 5:137164863 | T | G | 45,941 | 0.07 | 8.32E-16 | 25,211 | 0.06 | 1.83E-06 | Yes |
| Lipase | rs9377343 | 6:96512220 | A | G | 12,649 | -0.10 | 4.79E-14 | 5,564 | -0.08 | 3.60E-05 | Yes |
| Lipase | rs8051363 | 16:75255217 | A | G | 12,649 | 0.13 | 2.00E-20 | 5,549 | 0.07 | 8.39E-04 | Yes |
| MCHC | rs12352830 | 9:80041132 | C | G | 46,157 | -0.04 | 4.37E-08 | 26,243 | -0.04 | 5.77E-05 | Yes |
| MonoRE | rs117358683 | 12:44145965 | A | G | 32,594 | -0.23 | 2.69E-08 | 16,185 | 0.04 | 4.07E-01 | No |
| MPV | rs11212635 | 11:108310702 | A | T | 40,058 | 0.04 | 9.55E-09 | 17,333 | -0.01 | 3.68E-01 | No |
| TProt | rs8022180 | 14:103263020 | A | G | 38,352 | 0.04 | 7.24E-10 | 19,665 | 0.03 | 2.63E-03 | Yes |
| Trigs | rs6847598 | 4:76750356 | T | C | 23,963 | -0.05 | 1.58E-08 | 12,526 | -0.03 | 1.48E-02 | Yes |
| TSH | rs12590163 | 14:105223525 | T | C | 27,441 | -0.05 | 4.68E-08 | 17,042 | -0.04 | 6.76E-04 | Yes |
| TSH | rs310766 | 3:12233482 | A | G | 27,441 | -0.06 | 1.66E-08 | 17,079 | -0.05 | 1.42E-05 | Yes |
| TSH | rs9275141 | 6:32651117 | T | G | 27,441 | 0.05 | 3.47E-09 | 17,054 | 0.04 | 8.64E-04 | Yes |
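The table does not state the replication rule. Every one of its 31 rows, however, is consistent with requiring the same direction of effect as the meta-analysis and a nominal p < 0.05 in the BioVU replication cohort. This rule yields the 22 replications reported in the abstract. The HgbA1C row, for example, has p = 0.038 but an opposite-sign beta and is marked No. The sketch below encodes that inferred rule and checks it against a few rows. Treat both the rule and the conventional 5 × 10^-8 genome-wide threshold as assumptions, not the authors' stated criteria.

```python
from typing import Optional

GENOME_WIDE = 5e-8  # conventional genome-wide significance threshold (assumed)

def replicates(meta_beta: float, rep_beta: Optional[float], rep_p: Optional[float],
               alpha: float = 0.05) -> bool:
    """Inferred rule: same effect direction as the meta-analysis and nominal p < alpha in replication."""
    if rep_beta is None or rep_p is None:  # e.g. the 'QC Fail' rows
        return False
    return meta_beta * rep_beta > 0 and rep_p < alpha

# (lab, SNP, meta beta, meta p, replication beta, replication p, reported) from the table above
ROWS = [
    ("AlkP", "rs3843738", 0.04, 2.51e-08, 0.01, 3.58e-01, "No"),
    ("AlkP", "rs73004933", 0.08, 4.47e-09, 0.05, 7.14e-03, "Yes"),
    ("Cl", "rs1030025", 0.05, 4.68e-10, 0.02, 9.16e-02, "No"),
    ("HgbA1C", "rs3130628", -0.08, 1.23e-08, 0.03, 3.79e-02, "No"),
    ("Trigs", "rs6847598", -0.05, 1.58e-08, -0.03, 1.48e-02, "Yes"),
    ("IGranRE", "rs13284665", 0.22, 6.61e-74, None, None, "No"),
]

for lab, snp, m_beta, m_p, r_beta, r_p, reported in ROWS:
    assert m_p < GENOME_WIDE  # each novel hit passed genome-wide significance in the meta-analysis
    call = "Yes" if replicates(m_beta, r_beta, r_p) else "No"
    print(f"{lab:<8}{snp:<12}{call}")
    assert call == reported
```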
Fig 4. Pairwise genetic correlation of clinical lab traits.
We restricted the analysis to labs with heritability of at least 7%. Squares are colored only for correlations with p < 0.05 under the null hypothesis that the correlation equals zero.
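The two display rules in this caption work as a filter and a mask. Labs with heritability below 7% are dropped, and among the remaining pairs only those with p < 0.05 are colored. The following is a minimal Python sketch. The lab names and values are made up for illustration and are not results from this study. How heritability and genetic correlation were estimated is not described in this excerpt.

```python
def colored_pairs(h2, rg, rg_p, min_h2=0.07, alpha=0.05):
    """Return {(lab_a, lab_b): rg} for the squares that would be colored.

    h2:   lab -> SNP heritability (fraction)
    rg:   (lab_a, lab_b) -> genetic correlation, each unordered pair stored once
    rg_p: same keys -> p-value for the null hypothesis rg == 0
    """
    kept = {lab for lab, value in h2.items() if value >= min_h2}  # heritability floor
    return {
        pair: corr
        for pair, corr in rg.items()
        if pair[0] in kept and pair[1] in kept and rg_p[pair] < alpha  # significance mask
    }

# Hypothetical inputs for illustration only.
h2 = {"lab_A": 0.20, "lab_B": 0.10, "lab_C": 0.05}
rg = {("lab_A", "lab_B"): 0.6, ("lab_A", "lab_C"): 0.3, ("lab_B", "lab_C"): -0.2}
rg_p = {("lab_A", "lab_B"): 0.001, ("lab_A", "lab_C"): 0.01, ("lab_B", "lab_C"): 0.2}
print(colored_pairs(h2, rg, rg_p))  # {('lab_A', 'lab_B'): 0.6}
```

Here `lab_C` falls below the 7% floor, so both of its pairs are excluded even though one has p < 0.05.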