Yonghyun Nam, Sang-Hyuk Jung, Anurag Verma, Vivek Sriram, Hong-Hee Won, Jae-Seung Yun, Dokyoon Kim.
Abstract
The polygenic risk score (PRS) can help identify an individual's genetic susceptibility to various diseases by combining the individual's genetic profile with single-nucleotide polymorphisms (SNPs) identified in genome-wide association studies. Although patients are usually afflicted by multiple diseases, either at once or in succession, conventional PRSs do not consider genetic relationships across multiple diseases. Even multi-trait PRSs, which account for the genetic effects of more than one disease at a time, either consider too few phenotypes to accurately reflect a patient's state of disease comorbidity or are biased by the choice of traits. We therefore developed novel network-based comorbidity risk scores that quantify associations among multiple phenotypes from phenome-wide association studies (PheWAS). We first constructed a disease-SNP heterogeneous multi-layered network (DS-Net), which consists of a disease network (disease-layer) and a SNP network (SNP-layer). The disease-layer describes the population-level interactome derived from PheWAS data, and the SNP-layer was constructed from linkage disequilibrium. The two layers were connected so that information from the population-level interactome could be translated into individual-level inferences. Graph-based semi-supervised learning was then applied to predict possible comorbidity scores on the disease-layer for each subject: the SNP-layer receives an individual's genotype data as input, and the disease-layer holds the propagated output, that is, the individual's comorbidity scores for multiple diseases. These possible comorbidity scores were combined by logistic regression, and the combined score is denoted netCRS. The DS-Net was constructed from UK Biobank PheWAS data, and individual genetic profiles were collected from the Penn Medicine Biobank. As a proof-of-concept study, myocardial infarction (MI) was selected to compare netCRS with the PRS with pruning and thresholding (PRS-PT).
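To make the scoring step concrete, here is a minimal sketch of one standard form of graph-based semi-supervised learning: graph-regularized propagation that solves (I + mu*L) f = y, where L = D - W is the Laplacian of the combined network. It is not the authors' implementation; the toy nodes and edge weights, the propagation rule, and the hyperparameter `mu` are illustrative assumptions. An individual's genotype dosages enter on the SNP nodes as y, and the values that settle on the disease nodes play the role of that individual's possible comorbidity scores.

```python
"""Toy graph-regularized propagation over a two-layer disease-SNP network.

Hypothetical sketch: node names, weights, and the (I + mu*L) f = y rule are
illustrative assumptions, not the netCRS authors' code.
"""
from collections import defaultdict


def build_graph(edges):
    """Return an undirected weighted adjacency dict from (u, v, w) triples."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    return adj


def propagate(adj, y, mu=1.0, n_iter=500, tol=1e-10):
    """Solve (I + mu*L) f = y by Jacobi iteration, with L = D - W.

    Row i reads (1 + mu*d_i) f_i - mu * sum_j w_ij f_j = y_i, so each update is
    f_i <- (y_i + mu * sum_j w_ij f_j) / (1 + mu*d_i). The system is strictly
    diagonally dominant, so the iteration converges for any mu > 0.
    """
    nodes = list(adj)
    f = {n: y.get(n, 0.0) for n in nodes}
    for _ in range(n_iter):
        new_f = {}
        for i in nodes:
            d_i = sum(adj[i].values())
            neigh = sum(w * f[j] for j, w in adj[i].items())
            new_f[i] = (y.get(i, 0.0) + mu * neigh) / (1.0 + mu * d_i)
        delta = max(abs(new_f[n] - f[n]) for n in nodes)
        f = new_f
        if delta < tol:
            break
    return f


# SNP layer (LD edges), disease layer (similarity edges), and SNP-disease
# links connecting the two layers; every weight here is made up.
edges = [
    ("rs1", "rs2", 0.8),   # SNP-SNP (LD)
    ("MI", "CAD", 0.9),    # disease-disease (similarity)
    ("CAD", "T2D", 0.3),
    ("rs1", "CAD", 0.5),   # SNP-disease association
    ("rs2", "MI", 0.4),
    ("rs3", "T2D", 0.6),
]
adj = build_graph(edges)

# One individual's risk-allele dosages (0, 1, or 2) enter on the SNP nodes.
genotype = {"rs1": 2.0, "rs2": 1.0, "rs3": 0.0}
scores = propagate(adj, genotype, mu=1.0)
print({d: round(scores[d], 4) for d in ("MI", "CAD", "T2D")})
```

Because every row of L sums to zero, the propagated scores always sum to the total input dosage. The Jacobi loop only keeps the sketch dependency-free; a sparse linear solver would be the practical choice for a network with tens of thousands of SNP nodes.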
The combined model (netCRS + PRS-PT + covariates) achieved an AUC improvement of 6.26% over the PRS-PT + covariates model. In terms of risk stratification, the combined model captured an MI risk up to approximately eight-fold higher than that of the low-risk group. netCRS and PRS-PT complement each other in identifying high-risk groups of patients with MI. We expect that these risk prediction models will support the development of prevention strategies and the reduction of MI morbidity and mortality.
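For comparison, PRS-PT follows the familiar pruning-and-thresholding recipe: keep SNPs whose GWAS p-value passes a threshold, drop SNPs in linkage disequilibrium with a more significant SNP already kept, and sum effect sizes weighted by the individual's risk-allele dosage. The sketch below is a simplified illustration under assumed inputs (a precomputed pairwise r² lookup, made-up summary statistics, and arbitrary `p_threshold` and `r2_threshold` values); it ignores the physical-distance windows used by real clumping tools and does not reproduce the study's settings.

```python
def prune_and_threshold(snps, r2, p_threshold=5e-8, r2_threshold=0.1):
    """Greedy p-value-ordered LD pruning plus p-value thresholding.

    snps: dict rsid -> (beta, p). r2: dict frozenset({a, b}) -> LD r^2.
    SNPs are visited from most to least significant; a SNP is kept only if it
    passes the p-value threshold and has r^2 < r2_threshold with every SNP
    already kept. Pairs missing from r2 are treated as unlinked.
    """
    kept = []
    for rsid, (_beta, p) in sorted(snps.items(), key=lambda kv: kv[1][1]):
        if p > p_threshold:
            break  # all remaining SNPs are even less significant
        if all(r2.get(frozenset((rsid, k)), 0.0) < r2_threshold for k in kept):
            kept.append(rsid)
    return kept


def polygenic_score(genotype, snps, kept):
    """Sum of beta_j * dosage_j over retained SNPs (missing dosage counts as 0)."""
    return sum(snps[j][0] * genotype.get(j, 0.0) for j in kept)


# Hypothetical GWAS summary statistics: rsid -> (effect size beta, p-value).
snps = {
    "rs1": (0.20, 1e-12),
    "rs2": (0.15, 1e-9),   # in strong LD with rs1, so it is pruned
    "rs3": (-0.10, 1e-8),
    "rs4": (0.30, 0.2),    # fails the p-value threshold
}
r2 = {frozenset(("rs1", "rs2")): 0.85, frozenset(("rs1", "rs3")): 0.02}

kept = prune_and_threshold(snps, r2)
score = polygenic_score({"rs1": 2, "rs2": 2, "rs3": 1, "rs4": 2}, snps, kept)
print(kept, round(score, 4))
```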
Year: 2022 PMID: 34890160 PMCID: PMC8682919
Source DB: PubMed Journal: Pac Symp Biocomput ISSN: 2335-6928
Figure 1. Overall framework of network-based comorbidity risk scoring algorithms (netCRS):
Left) Individual genotype data collected from the Penn Medicine Biobank. Middle) Schematic of the disease-SNP heterogeneous multi-layered network (DS-Net); the SNP-layer is constructed from linkage disequilibrium, and the disease-layer from UK Biobank PheWAS summary data. Right) The upper right shows an individual's possible comorbidity score for each disease. These scores are combined by logistic regression to generate the combined score, netCRS, for each patient.
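The combination step in the right panel can be sketched as an ordinary logistic regression fitted to each subject's possible comorbidity scores. This is an illustrative assumption rather than the authors' code: the two-feature toy data, the plain gradient-descent fit, and the choice to take the fitted log-odds as the combined score are not specified by the caption.

```python
import math


def sigmoid(z):
    """Logistic function mapping log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_score(b, w, x):
    """Log-odds b + w.x; used here as a hypothetical netCRS-style combined score."""
    return b + sum(wj * xj for wj, xj in zip(w, x))


def log_loss(b, w, X, y):
    """Mean negative log-likelihood of the logistic model (probabilities clipped)."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(sigmoid(linear_score(b, w, xi)), 1e-12), 1 - 1e-12)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)


def fit_logistic(X, y, lr=0.5, n_iter=5000):
    """Fit intercept b and weights w by batch gradient descent on the log-loss."""
    n, k = len(X), len(X[0])
    b, w = 0.0, [0.0] * k
    for _ in range(n_iter):
        grad_b, grad_w = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            err = sigmoid(linear_score(b, w, xi)) - yi
            grad_b += err
            for j in range(k):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


# Made-up possible comorbidity scores on two disease nodes per subject.
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.6], [0.5, 0.5], [0.5, 0.5],
     [0.6, 0.7], [0.4, 0.5], [0.3, 0.2], [0.2, 0.1], [0.1, 0.3]]
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # 1 = MI case, 0 = control

b, w = fit_logistic(X, y)
combined = [linear_score(b, w, x) for x in X]
```

In practice a library fitter (for example, scikit-learn or statsmodels) would replace the hand-written loop; the loop is kept here only so the sketch has no dependencies.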
Figure 2. Visualization of the MI-specific disease-layer:
Node size reflects each node's weighted degree (the sum of its edge weights), and node labels give the PheCode. Line thickness represents edge weight (similarity). Percentages in parentheses after each disease category give the share of diseases belonging to that category.
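Weighted degree (node strength) is the sum of the weights of the edges incident to a node. A minimal sketch over a hypothetical disease-layer edge list; the PheCodes and similarity weights are made up for illustration:

```python
from collections import defaultdict


def weighted_degree(edges):
    """Sum of incident edge weights per node for an undirected edge list."""
    strength = defaultdict(float)
    for u, v, w in edges:
        strength[u] += w
        strength[v] += w
    return dict(strength)


# Hypothetical disease-layer edges: (PheCode, PheCode, similarity weight).
edges = [("411.2", "411.4", 0.8), ("411.2", "250.2", 0.3), ("411.4", "401.1", 0.5)]
deg = weighted_degree(edges)
print(sorted(deg.items(), key=lambda kv: -kv[1]))
```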
Demographics table of the development and validation cohort.

| Development cohort | |
|---|---|
| Phenotypes | 135 (out of 1,403) |
| SNPs | 39,365 (after genetic pre-processing) |

| Characteristic | Total (N = 4,972) | MI cases (N = 763) | Controls (N = 4,209) | p-value |
|---|---|---|---|---|
| Sex, n (%) | | | | <0.001 |
| Female | 1,854 (37.3%) | 171 (22.4%) | 1,683 (40.0%) | |
| Male | 3,118 (62.7%) | 592 (77.6%) | 2,526 (60.0%) | |
| Age (years), mean ± SD | 62.0 ± 14.8 | 68.4 ± 11.2 | 60.9 ± 15.1 | <0.001 |
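The sex p-value can be checked from the counts with a Pearson chi-square test on the 2 × 2 table formed by the two sex rows (171 vs 1,683 and 592 vs 2,526). The source does not say which test produced its p-values, so this is one standard choice rather than the authors' analysis. For one degree of freedom, the upper-tail probability of a chi-square statistic x equals erfc(sqrt(x / 2)), which avoids a SciPy dependency.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and its 1-df p-value.

    Table layout:      cases  controls
        row 1            a       b
        row 2            c       d
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df
    return stat, p


# Sex counts from the development-cohort table (first row vs second row).
stat, p = chi2_2x2(171, 1683, 592, 2526)
print(f"chi2 = {stat:.1f}, p = {p:.2e}")
```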
Performance comparison of netCRS and PRS-PT in terms of AUC

| Models | Hyper-parameter: 0.01 | 0.1 | 1 | 10 | 100 |
|---|---|---|---|---|---|
| [1] PRS-PT | 0.5827 (Baseline) | | | | |
| [2] netCRS* | 0.6028 | | 0.6395 | 0.6197 | 0.6039 |
| [3] netCRS + PRS-PT | 0.6274 | | 0.6570 | 0.6389 | 0.6255 |
| [4] PRS-PT + Sex + Age | 0.6979 (Baseline) | | | | |
| [5] netCRS + Sex + Age* | 0.7083 | | 0.7261 | 0.7144 | 0.7051 |
| [6] netCRS + PRS-PT + Sex + Age* | 0.7230 | | 0.7396 | 0.7287 | 0.7199 |
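AUC summarizes how well a model's score ranks MI cases above controls: it equals the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. A minimal pure-Python sketch of this rank-based (Mann-Whitney) definition on made-up scores; real analyses would normally use a library routine such as scikit-learn's `roc_auc_score`.

```python
def auc(scores, labels):
    """AUC = P(case score > control score) + 0.5 * P(tie); O(n_cases * n_controls)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up risk scores: three MI cases (label 1) and three controls (label 0).
example = auc([0.9, 0.8, 0.4, 0.7, 0.3, 0.2], [1, 1, 1, 0, 0, 0])
print(round(example, 4))
```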
Diagnostic odds ratios and 95% confidence intervals for MI according to netCRS risk group. We compared three models: (a) model [2], netCRS alone; (b) model [5], netCRS + sex + age; and (c) model [6], netCRS + PRS-PT + sex + age.
| Total (N = 4,972) | No. of MI/total | Model [2] OR (95% CI) | p | Model [5] OR (95% CI) | p | Model [6] OR (95% CI) | p |
|---|---|---|---|---|---|---|---|
| Quartile 1 | 94/1,243 | Reference | | Reference | | Reference | |
| Quartile 2 | 150/1,243 | 1.68 (1.28–2.21) | <0.001 | 1.71 (1.30–2.25) | <0.001 | 1.65 (1.25–2.19) | <0.001 |
| Quartile 3 | 218/1,243 | 2.60 (2.02–3.37) | <0.001 | 2.72 (2.10–3.55) | <0.001 | 2.70 (2.08–3.53) | <0.001 |
| Quartile 4 | 301/1,243 | | | | | | |
Abbreviations: OR, odds ratio; CI, confidence interval; PRS, polygenic risk score.
p-value for netCRS categories.
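For the netCRS-only model [2], the reported point estimates coincide with crude odds ratios computed directly from the counts. For example, the second group against the reference gives (150 × 1,149) / (94 × 1,093) ≈ 1.68, where 1,149 and 1,093 are the controls (1,243 minus the MI cases) in each group. The sketch below computes a crude odds ratio with a Woolf (log-scale Wald) 95% interval; the source does not state its interval method, so the bounds can differ slightly from the table.

```python
import math


def odds_ratio_ci(cases_exp, total_exp, cases_ref, total_ref, z=1.96):
    """Crude odds ratio with a Woolf (log-scale Wald) confidence interval.

    Each group is given as (MI cases, group size); controls = size - cases.
    """
    a, b = cases_exp, total_exp - cases_exp
    c, d = cases_ref, total_ref - cases_ref
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


# Second netCRS group (150/1,243) against the reference group (94/1,243).
or_, lo, hi = odds_ratio_ci(150, 1243, 94, 1243)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```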
Genetic subgroups based on combinations of PRS and netCRS

| Odds ratio: netCRS group (rows) vs. PRS-PT (MI) group (columns) | Low risk | Intermediate risk | High risk | Very high risk |
|---|---|---|---|---|
| Low risk (0th–25th) | Reference (19/334) | 1.18 (20/299) | 1.35 (21/273) | 2.46 (34/243) |
| Intermediate risk (26th–50th) | 1.46 (23/276) | 2.36 (36/268) | 2.77 (45/286) | 3.07 (46/263) |
| High risk (51st–75th) | 2.07 (33/280) | 4.59 (71/272) | 3.94 (52/241) | 4.55 (60/232) |
| Very high risk (76th–100th) | 4.04 (52/226) | 4.66 (58/219) | 5.60 (78/245) | |
To calculate odds ratios, we performed multivariable logistic regression for the MI classification task (MI cases versus normal controls) using the model (MI cases vs. normal controls) ~ 16 combinations of PRS and netCRS groups + sex + age. Odds ratios for each combination are reported in this table relative to the lowest-risk group (low PRS group and low netCRS group) as the reference.
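The model above enters the 16 PRS-PT × netCRS subgroups as categorical indicators alongside sex and age. The sketch below shows one common encoding, treatment (dummy) coding with the low/low cell as the reference; the category labels, the 0/1 sex coding, and the helper names are hypothetical. With this coding, the exponentiated coefficient on each indicator is that subgroup's odds ratio relative to the reference, adjusted for sex and age.

```python
import itertools
import math

LEVELS = ["low", "intermediate", "high", "very_high"]
# All 16 (PRS-PT group, netCRS group) cells; the low/low cell is the reference.
CELLS = list(itertools.product(LEVELS, LEVELS))
REFERENCE = ("low", "low")
DUMMY_CELLS = [cell for cell in CELLS if cell != REFERENCE]  # 15 indicators


def design_row(prs_group, netcrs_group, sex, age):
    """Intercept + 15 subgroup indicators + sex (0/1) + age, as a flat list."""
    cell = (prs_group, netcrs_group)
    if cell not in CELLS:
        raise ValueError(f"unknown subgroup {cell}")
    indicators = [1.0 if cell == c else 0.0 for c in DUMMY_CELLS]
    return [1.0] + indicators + [float(sex), float(age)]


def subgroup_odds_ratios(coefs):
    """Map fitted coefficients (same order as design_row) to per-cell ORs."""
    ors = {REFERENCE: 1.0}
    for k, cell in enumerate(DUMMY_CELLS, start=1):
        ors[cell] = math.exp(coefs[k])
    return ors


row = design_row("high", "very_high", sex=1, age=63)
print(len(row), sum(row[1:16]))
```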