Eun Kyung Choe, Manu Shivakumar, Anurag Verma, Shefali Setia Verma, Seung Ho Choi, Joo Sung Kim, Dokyoon Kim.
Abstract
The expanding use of the phenome-wide association study (PheWAS) faces several challenges: reliance on International Classification of Diseases billing codes for phenotype definition, ethnically imbalanced study populations, and limited application of the results in research. We performed a PheWAS using 136 deep phenotypes corroborated by comprehensive health check-ups in a Korean population, along with trans-ethnic comparisons using the UK Biobank and the Biobank Japan Project. We also performed a meta-analysis of the Korean and Japanese populations. The PheWAS associated 65 phenotypes with 14,101 significant variants (P < 4.92 × 10⁻¹⁰). We conducted network analysis, visualization of cross-phenotype mapping, and causal inference mapping with Mendelian randomization. Among phenotype pairs from the genotype-driven cross-phenotype associations, we evaluated penetrance in correlation analysis using a clinical database. We focused on applying PheWAS in ways that make it robust and aid the derivation of biological meaning post-PheWAS. This comprehensive analysis of PheWAS results based on a health check-up database will provide researchers and clinicians with a panoramic overview of the networks among multiple phenotypes and genetic variants, laying groundwork for the practical application of precision medicine.
Year: 2022 PMID: 35121771 PMCID: PMC8817039 DOI: 10.1038/s41598-021-04580-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Overview of the study design. (A) We used a cohort of individuals who underwent comprehensive health check-ups. Sub-cohorts of this cohort are the Gene-EnvironmeNtal IntEraction and phenotype (GENIE) cohort, which includes biobank data, and the Health and Prevention EnhAnCEment (H-PEACE) cohort, which includes an EHR database of the health check-up results. (B) A phenome-wide association study (PheWAS) was performed for 136 phenotypes, adjusting for age, sex, and PC1-PC3. (C) We leveraged cross-phenotype associations to perform a systematic analysis of the PheWAS results, covering polygenicity, pleiotropy, a bipartite gene network, and a bipartite phenotype network. The details are described in Fig. 4. (D) To ensure robustness of the PheWAS results, we further dissected them to suggest applicable interpretations: the heritability of each phenotype, and the correlation between phenotype heritability and the effect of the loci on genes and protein sequences associated with phenotypes. (E) Using cross-phenotype association information, we constructed phenotype-phenotype and phenotype-genotype networks. (F) We visualized the comparison of obesity indices (body mass index, waist circumference, visceral adipose tissue, and total adipose tissue amount). (G) We constructed cross-phenotype mappings, which have a core phenotype (Pheno-1 in the figure) and branches of connected phenotypes that share loci. These were partitioned by color according to the biological system involved. (H) We estimated causal inferences in the phenotype pairs from cross-phenotype associations using Mendelian randomization and constructed a causal inference map. (I) We performed trans-ethnic and trans-nationality analysis among Korean, European, and Japanese populations. (J) We compared phenotype-phenotype pairs generated from SNP-based cross-phenotype association in the biobank analysis with those generated from correlation analysis in the EHR-based H-PEACE cohort. We evaluated the overlap or exclusiveness of pairs for each phenotype by phenotype degree.
Figure 4. Comparison of phenotype-phenotype pairs between PheWAS-driven and EHR-driven analysis. Of the studied phenotypes, 76 were also recorded in the EHR-driven cohort (H-PEACE cohort). PheWAS-driven pairs (1164) were based on shared SNPs with association P < 1 × 10⁻⁴, and EHR-driven pairs (1938) on correlation analysis with multiple-testing-corrected P < 0.05 (Table S26). Skeletal muscle mass (95%) and alkaline phosphatase (93.48%) had high ratios of overlap, while thyroid cancer (0%) and alpha fetoprotein (8%) had low ratios. In terms of biological categories, the average replication percentage was highest for anthropometric measurement (86.43%). Of the 1164 pairs from the PheWAS-driven approach, 834 (71.65%) were also significant in the EHR-driven analysis.
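The overlap ratio in this caption can be illustrated with a minimal sketch, assuming each approach yields a set of unordered phenotype pairs. The function name `overlap_ratio` and the toy pairs are hypothetical and are not taken from the study; only the totals 834 and 1164 come from the caption.

```python
def overlap_ratio(phewas_pairs, ehr_pairs):
    """Percent of PheWAS-driven pairs that also appear among EHR-driven pairs."""
    # Store pairs as frozensets so (a, b) and (b, a) count as the same pair.
    phewas = {frozenset(p) for p in phewas_pairs}
    ehr = {frozenset(p) for p in ehr_pairs}
    if not phewas:
        return 0.0
    return 100.0 * len(phewas & ehr) / len(phewas)


# Toy, made-up pairs for illustration only (not study data).
toy_phewas = [("BMI", "Weight"), ("BMI", "Waist circumference"), ("Height", "Weight")]
toy_ehr = [("Weight", "BMI"), ("Waist circumference", "BMI"), ("Uric acid", "Creatinine")]
toy_ratio = overlap_ratio(toy_phewas, toy_ehr)  # 2 of 3 toy pairs overlap

# Totals reported in the caption: 834 of 1164 PheWAS-driven pairs.
reported_ratio = round(100.0 * 834 / 1164, 2)
```

The denominator here is the number of PheWAS-driven pairs, which matches how the caption reports 834 of 1164 as 71.65%.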
Overview of the studied phenotypes.
| Category | Phenotype | Significant loci count ( | Significant loci count ( | Significant gene count ( | Heritability (h2) |
|---|---|---|---|---|---|
| AM | Anthropometric measure | ||||
| AM | Height | 2415 | 5 | 257 | 0.3221 |
| AM | Weight | 995 | 0 | 132 | 0.2292 |
| AM | Body mass index | 886 | 0 | 142 | 0.2375 |
| AM | Skeletal muscle mass | 1479 | 0 | 192 | 0.2769 |
| AM | Body fat mass | 1135 | 0 | 144 | 0.1995 |
| AM | Body fat percent | 1324 | 0 | 149 | 0.2142 |
| AM | Waist circumference | 841 | 0 | 129 | 0.1781 |
| AM | Total adipose tissue area | 1251 | 0 | 128 | 0.1505 |
| AM | Visceral adipose tissue area | 981 | 0 | 131 | 0.1082 |
| CV | Cerebro-cardio-vascular | ||||
| CV | Heart rate | 1318 | 40 | 130 | 0.1681 |
| CV | Axis on EKG | 959 | 0 | 142 | 0.1496 |
| CV | EKG: Sinus bradycardia | 862 | 40 | 99 | 0.1062 |
| CV | EKG: Right bundle branch block | 786 | 0 | 141 | 0 |
| CV | EKG: 1st degree atrioventricular block | 894 | 0 | 156 | 0.0119 |
| CV | EKG: Myocardial infarction | 853 | 0 | 160 | 0.0915 |
| CV | EKG: Myocardial ischemia | 1459 | 0 | 276 | 0.2081 |
| CV | Coronary CT: Coronary calcium score | 2688 | 19 | 629 | 0.1278 |
| CV | Coronary CT: Coronary vascular plaque | 1241 | 0 | 114 | 0 |
| CV | Coronary CT: Coronary vascular stenosis | 654 | 0 | 102 | 0 |
| CV | Coronary CT: Aortic dilatation | 619 | 0 | 127 | 0.1247 |
| CV | Brain unidentified bright object (UBO) | 519 | 0 | 92 | 0.1272 |
| CV | Brain small vessel disease | 789 | 0 | 117 | 0.0202 |
| CV | Brain vascular atherosclerosis | 521 | 0 | 105 | 0.1204 |
| CV | Brain vascular stenosis | 901 | 0 | 182 | 0.1987 |
| CV | Brain aneurysm | 720 | 0 | 111 | 0.147 |
| CV | Brain atrophy | 1246 | 0 | 166 | 0.2294 |
| CV | Diagnosis of hypertension | 1039 | 0 | 138 | 0.1024 |
| DS | Digestive system | ||||
| DS | Gall bladder adenomyomatosis | 817 | 0 | 140 | 0.0733 |
| DS | Pancreas IPMN | 873 | 0 | 164 | 0.0875 |
| DS | Liver hemangioma | 714 | 4 | 121 | 0.0003 |
| DS | Gall bladder cholecystitis | 836 | 1 | 156 | 0.0232 |
| DS | Gall bladder stone | 765 | 0 | 135 | 0.0276 |
| DS | Gall bladder polyp | 904 | 1 | 122 | 0.1163 |
| DS | Fatty liver | 849 | 144 | 111 | 0.1332 |
| DS | Atrophic gastritis | 610 | 0 | 103 | 0.015 |
| DS | Intestinal metaplasia of stomach | 1074 | 0 | 151 | 0.1527 |
| DS | Duodenal ulcer | 833 | 54 | 106 | 0 |
| DS | Gastric ulcer | 1000 | 0 | 200 | 0.0315 |
| DS | Gastroesophageal reflux disease | 565 | 0 | 101 | 0.0143 |
| DS | Serum total protein | 945 | 52 | 203 | 0.1993 |
| DS | Serum albumin | 1310 | 21 | 231 | 0.2325 |
| DS | Serum total bilirubin | 2570 | 1151 | 137 | 0.274 |
| DS | Alkaline phosphatase | 2631 | 299 | 203 | 0.1203 |
| DS | Glutamic oxaloacetic transaminase | 2209 | 8 | 462 | 0.0334 |
| DS | Glutamic pyruvic transaminase | 1266 | 6 | 255 | 0.0609 |
| DS | Gamma-Glutamyl Transferase | 2716 | 78 | 512 | 0.0818 |
| DS | Gastric cancer | 982 | 0 | 207 | 0.1719 |
| DS | Hepatitis B virus surface antigen | 3762 | 324 | 252 | 0.1679 |
| DS | Hepatitis C virus antibody | 1119 | 0 | 231 | 0.0809 |
| EM | Endocrine and metabolism | ||||
| EM | Fasting blood glucose level | 1842 | 0 | 212 | 0.1116 |
| EM | Uric acid | 3977 | 1261 | 284 | 0.2186 |
| EM | Triglycerides | 2676 | 333 | 258 | 0.1385 |
| EM | HDL cholesterol | 2036 | 442 | 171 | 0.2471 |
| EM | Hemoglobin A1c | 1861 | 0 | 245 | 0.1084 |
| EM | Free T4 | 1652 | 2 | 194 | 0.1547 |
| EM | Thyroid-Stimulating Hormone | 15,064 | 741 | 2549 | 0.1016 |
| EM | Total cholesterol | 1061 | 17 | 151 | 0.0678 |
| EM | LDL cholesterol | 1279 | 63 | 142 | 0.0367 |
| EM | Metabolic syndrome | 811 | 2 | 132 | 0.1583 |
| EM | Thyroid cancer | 836 | 0 | 225 | 0.0023 |
| EM | Breast cancer | 952 | 0 | 210 | 0.0522 |
| EM | Diagnosis of diabetes | 1507 | 0 | 187 | 0.0824 |
| EM | Diagnosis of dyslipidemia | 1103 | 2 | 150 | 0.1251 |
| HS | Hematologic system | ||||
| HS | White blood cell count | 1629 | 143 | 153 | 0.1454 |
| HS | Platelet count | 3040 | 185 | 278 | 0.2375 |
| HS | Neutrophil percent among WBC | 2080 | 250 | 200 | 0.1423 |
| HS | Lymphocyte percent among WBC | 1978 | 247 | 190 | 0.1524 |
| HS | Monocyte percent among WBC | 1948 | 18 | 194 | 0.2067 |
| HS | Eosinophils percent among WBC | 3109 | 11 | 343 | 0.2822 |
| HS | Basophils percent among WBC | 4043 | 293 | 373 | 0.2941 |
| HS | Red blood cell count | 1997 | 209 | 181 | 0.2582 |
| HS | Hemoglobin | 1707 | 12 | 199 | 0.1854 |
| HS | Mean corpuscular volume | 3270 | 250 | 251 | 0.2444 |
| HS | Mean corpuscular hemoglobin | 3077 | 134 | 358 | 0.2204 |
| HS | Mean corpuscular hemoglobin concentration | 4982 | 1266 | 979 | 0.1389 |
| HS | Plateletcrit | 2747 | 68 | 299 | 0.2023 |
| HS | Mean Platelet Volume | 3843 | 188 | 591 | 0.1353 |
| HS | Prothrombin time | 4515 | 227 | 846 | 0.061 |
| HS | Activated Partial Thromboplastin Time | 2092 | 691 | 181 | 0.1725 |
| HS | Hematocrit | 962 | 0 | 170 | 0.1544 |
| HS | Red blood cell distribution width | 2489 | 146 | 244 | 0.1575 |
| LS | Life style | ||||
| LS | Smoking history | 939 | 0 | 99 | 0.062 |
| LS | Alcohol consumption | 2158 | 612 | 156 | 0.0908 |
| LS | Exercise amount | 1657 | 1 | 357 | 0 |
| LS | Education level | 558 | 0 | 134 | 0.0264 |
| LS | Marital status | 0 | 0 | 0 | 0.0048 |
| LS | Coffee consumption | 680 | 17 | 109 | 0.0317 |
| LS | Nocturia per night | 652 | 0 | 98 | 0.0339 |
| ME | Mental and emotion | ||||
| ME | Sleep onset latency | 503 | 0 | 112 | 0.0931 |
| ME | Wake Time After Sleep Onset | 869 | 0 | 121 | 0.066 |
| ME | Depressed mood | 1043 | 0 | 163 | 0.1041 |
| ME | Appetite change increase | 808 | 0 | 114 | 0 |
| ME | Diminished cognitive functioning | 918 | 0 | 152 | 0.0769 |
| ME | Worthlessness or guilty feeling | 1033 | 0 | 104 | 0.0399 |
| ME | Suicidal ideation | 1334 | 0 | 285 | 0.2812 |
| ME | Loss of interest or pleasure | 993 | 0 | 160 | 0.169 |
| ME | Fatigue | 888 | 0 | 158 | 0.046 |
| ME | Psychomotor retardation | 869 | 0 | 149 | 0.0698 |
| ME | Psychomotor agitation | 1093 | 0 | 188 | 0.0663 |
| ME | Depression score | 839 | 0 | 141 | 0.0614 |
| MN | Minerals | ||||
| MN | Calcium level | 3792 | 597 | 763 | 0.2503 |
| MN | Phosphorus level | 1461 | 1 | 177 | 0.106 |
| MN | Sodium level | 3992 | 776 | 821 | 0.1384 |
| MN | Potassium level | 632 | 0 | 94 | 0 |
| MN | Chloride level | 763 | 3 | 213 | 0.0967 |
| MN | CO2 level | 759 | 0 | 116 | 0.0499 |
| MN | Vitamin D3 | 1271 | 8 | 117 | 0.0743 |
| MC | Musculoskeletal | ||||
| MC | Bone density by DEXA | 799 | 0 | 88 | 0.2982 |
| MC | Spondylosis | 419 | 0 | 73 | 0 |
| MC | Spondylolisthesis | 939 | 0 | 147 | 0.4245 |
| MC | Compression fracture | 1189 | 0 | 229 | 0.4589 |
| MC | Intervertebral disc space narrowing | 529 | 0 | 97 | 0.0603 |
| OS | Ophthalmic system | ||||
| OS | Cataract | 865 | 0 | 98 | 0.0214 |
| OS | Drusen | 842 | 0 | 124 | 0 |
| OS | Macular change | 881 | 0 | 137 | 0.0347 |
| OS | Optic disc cupping | 755 | 0 | 133 | 0.0856 |
| OS | Optic nerve fiber loss | 886 | 0 | 144 | 0.1659 |
| OS | Intraocular pressure, right | 1468 | 30 | 263 | 0.156 |
| OS | Intraocular pressure, Left | 1451 | 4 | 175 | 0.1074 |
| PS | Pulmonary system | ||||
| PS | Forced vital capacity (L) | 1519 | 0 | 188 | 0.2408 |
| PS | Forced vital capacity (%) | 1524 | 0 | 192 | 0.2426 |
| PS | First second of forced expiration (L) | 1474 | 0 | 182 | 0.2895 |
| PS | First second of forced expiration (%) | 2066 | 0 | 168 | 0.2876 |
| PS | FEV1/FVC | 1611 | 86 | 228 | 0.2055 |
| PS | Pulmonary function test category | 563 | 0 | 95 | 0.081 |
| RS | Renal system | ||||
| RS | Blood Urea Nitrogen | 2551 | 123 | 450 | 0.1825 |
| RS | Renal stone | 824 | 0 | 143 | 0.1145 |
| RS | Creatinine | 2059 | 29 | 399 | 0.2535 |
| RS | Estimated glomerular filtration rate | 1353 | 34 | 207 | 0.2791 |
| RS | Urine pH | 3432 | 651 | 772 | 0.1166 |
| RS | Urine albumin | 1388 | 0 | 212 | 0.1625 |
| TM | Tumor marker | ||||
| TM | Cancer Antigen 125 | 7923 | 145 | 1508 | 0 |
| TM | Carbohydrate antigen 19-9 | 27,140 | 936 | 4624 | 0.0312 |
| TM | Alpha Fetoprotein | 4244 | 119 | 654 | 0.1803 |
| TM | Carcinoembryonic antigen | 1835 | 202 | 375 | 0.0356 |
| TM | Prostate-Specific Antigen | 12,999 | 279 | 2559 | 0.1082 |
Figure 2. Trans-ethnic, trans-nationality comparison of PheWAS. We compared PheWAS results among Korean, Japanese, and European populations. Only phenotypes present in all datasets were used. We evaluated loci significantly associated only in Koreans (black bar), in both populations (gray bar), and only in the other population (light gray bar). The colored bar at the top indicates phenotype categories. The Y axis denotes the ratio (%) of loci in each classification, with 100% being the total number of significant loci in the compared populations. (A) PheWAS result comparison between Korean and Japanese populations. (B) PheWAS result comparison between Korean and European populations.
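The three-way classification of loci in this figure can be sketched as follows. This is a minimal illustration that assumes loci are matched across populations by a shared identifier; the matching rule actually used in the study is not stated in this caption. The function name `classify_loci` and the rsID-like labels are hypothetical.

```python
def classify_loci(korean_loci, other_loci):
    """Split significant loci into Korean-only, shared, and other-only percentages.

    100% corresponds to all loci significant in at least one of the two populations.
    """
    korean, other = set(korean_loci), set(other_loci)
    total = len(korean | other)
    if total == 0:
        return {"korean_only": 0.0, "both": 0.0, "other_only": 0.0}
    counts = {
        "korean_only": len(korean - other),
        "both": len(korean & other),
        "other_only": len(other - korean),
    }
    return {name: 100.0 * n / total for name, n in counts.items()}


# Toy, made-up locus labels for illustration only (not study data).
toy_split = classify_loci({"rs1", "rs2", "rs3"}, {"rs3", "rs4"})
```

For one phenotype, the three percentages correspond to the black, gray, and light gray segments of a single bar.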
Figure 3. Post-PheWAS analysis. (A) Network analysis. A network representation of gene-phenotype associations related to metabolic syndrome was constructed from 102 genes associated with metabolic syndrome and 128 phenotypes sharing those genes. Each edge is a phenotype-gene association, with genes for significant loci (P < 10⁻⁴) annotated by VEP. Node size is proportional to degree, which is the number of connections. Pink nodes correspond to phenotypes and green nodes to genes. (B) Relationships among obesity indices. We compared obesity indices, namely body mass index (BMI), waist circumference (WC), visceral adipose tissue (VAT), and total adipose tissue (TAT) amount, by drawing a Venn diagram of the phenotypes or genes in their cross-phenotype associations. (C) Cross-phenotype mapping. Cross-phenotype mappings were generated based on the bipartite phenotype network, in turn constructed from the connections among phenotypes sharing at least one locus. Coffee consumption, one of the lifestyle phenotypes, had a phenotype degree of 31 in the bipartite phenotype network. (D) Causal inference mapping. We estimated causal inferences in phenotype pairs based on cross-phenotype associations using Mendelian randomization (MR), and constructed a causal inference map. The direction of each arrow is the causality result from MR (blue arrows, skeletal muscle mass as outcome; red arrows, skeletal muscle mass as exposure; green arrows, bidirectional). Pairs observed in the bipartite phenotype network but not significant in MR are shown as straight black lines without arrows.
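The phenotype degree used in panel (C) can be illustrated with a minimal sketch, assuming the input is a list of significant (phenotype, locus) associations and that two phenotypes are connected when they share at least one locus, as the caption describes. The function name `phenotype_degrees` and the toy associations are hypothetical and are not study data.

```python
from collections import defaultdict
from itertools import combinations


def phenotype_degrees(associations):
    """Return {phenotype: number of other phenotypes sharing at least one locus}.

    associations: list of (phenotype, locus) tuples.
    """
    # Group phenotypes by the locus they are associated with.
    by_locus = defaultdict(set)
    for phenotype, locus in associations:
        by_locus[locus].add(phenotype)

    # Connect every pair of phenotypes that share a locus.
    neighbors = defaultdict(set)
    for phenotypes in by_locus.values():
        for a, b in combinations(sorted(phenotypes), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)

    # Phenotypes with no shared locus get degree 0.
    all_phenotypes = {phenotype for phenotype, _ in associations}
    return {p: len(neighbors[p]) for p in all_phenotypes}


# Toy, made-up associations for illustration only.
toy_associations = [
    ("Coffee consumption", "rs1"),
    ("Uric acid", "rs1"),
    ("Coffee consumption", "rs2"),
    ("HDL cholesterol", "rs2"),
    ("Height", "rs9"),
]
toy_degrees = phenotype_degrees(toy_associations)
```

Under this definition, the reported degree of 31 for coffee consumption would mean that it shares at least one locus with 31 other phenotypes.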