Courtney Astore, Hongyi Zhou, Bartosz Ilkowski, Jessica Forness, Jeffrey Skolnick.
Abstract
To understand the origin of disease comorbidity and to identify the essential proteins and pathways underlying comorbid diseases, we developed LeMeDISCO (Large-Scale Molecular Interpretation of Disease Comorbidity), an algorithm that predicts disease comorbidities from shared mode of action (MOA) proteins predicted by the artificial intelligence-based MEDICASCY algorithm. LeMeDISCO was applied to predict the occurrence of comorbid diseases for 3608 distinct diseases. Benchmarking shows that LeMeDISCO has much better comorbidity recall than the two molecular methods, the XD-score (44.5% vs. 6.4%) and the SAB score (68.6% vs. 8.0%). Its performance is somewhat comparable to that of the phenotype-based Symptom Similarity Score (63.7% vs. 100% recall), but LeMeDISCO works for far more cases, and its large comorbidity recall is attributed to shared proteins that can help provide an understanding of the molecular mechanism(s) underlying disease comorbidity. The LeMeDISCO web server is available for academic users at http://sites.gatech.edu/cssb/LeMeDISCO.
Year: 2022 PMID: 36008469 PMCID: PMC9411158 DOI: 10.1038/s42003-022-03816-9
Source DB: PubMed Journal: Commun Biol ISSN: 2399-3642
Comparison of LeMeDISCO’s J-score with the XD-score, NG, SAB score, and Symptom Similarity Score for correlations with comorbidity quantified by the log(RR) score, φ-score, and recall<sup>a</sup>.
| Method | Log(RR) score | φ-score | Recall | Precision | AUROC | AUPRC |
|---|---|---|---|---|---|---|
| LeMeDISCO<sup>b</sup> | | | | | | |
| Permute drug–protein | 0.050 ± 0.004 (0.0) [1.3 × 10⁻⁵⁴] | 0.060 ± 0.004 (0.0) [2.1 × 10⁻¹²] | 8.8 ± 0.7% [2.0 × 10⁻³¹⁵] | 74.7 ± 1.3% [0.027] | 0.495 ± 0.006 [3.5 × 10⁻⁹] | 0.755 ± 0.004 [4.9 × 10⁻¹¹] |
| Permute drug–disease | 0.0026 ± 0.0056 (0.24) [4.0 × 10⁻⁹²] | 0.0029 ± 0.0075 (0.19) [1.2 × 10⁻³¹] | 0.0137 ± 0.0828% [0.0] | 54.3 ± 46.5% [0.31] | 0.500 ± 0.001 [1.7 × 10⁻¹¹²] | 0.757 ± 0.0006 [2.3 × 10⁻²⁹⁴] |
| LeMeDISCO<sup>c</sup> | | | **44.5%** | | | |
| XD-score | 0.042 (2.8 × 10⁻¹³) | 0.071 (9.7 × 10⁻³⁵) | 6.4% | 77.8% | 0.510 | 0.801 |
| NG<sup>d</sup> | 0.0047 (0.42) | 0.053 (6.6 × 10⁻²⁰) | — | — | — | — |
| LeMeDISCO<sup>e</sup> | | | **68.6%** | 77.7% | | |
| SAB score | −0.0620 (0.057) | −0.0413 (0.205) | 8.0% | | 0.434 | 0.761 |
| LeMeDISCO<sup>f</sup> | 0.140 (5.2 × 10⁻¹³) | 0.135 (3.8 × 10⁻¹²) | 63.7% | 79.3% | 0.512 | 0.814 |
| Symptom similarity | | | **100%** | | | |
<sup>a</sup>Numbers in parentheses “()” are the p values of the corresponding correlation. Bold indicates the best results for the given dataset. For the permutations of drug–protein and drug–disease relationships, the average ± standard deviation of 100 runs with different random seeds is given; the number in brackets “[]” is the p value converted from the z-score = (LeMeDISCO value − average)/standard deviation, which characterizes the statistical significance of the difference between LeMeDISCO and the permutation tests.
<sup>b</sup>Mapping the DOIDs from the human DO database to the ICD-9 IDs of ref. [1] gives a set of 191,966 disease pairs.
<sup>c</sup>Mapping the ICD-9 disease codes to our DOIDs of DO gives a consensus subset of 29,658 disease pairs from Table 1’s dataset of 97,665 disease pairs in ref. [8].
<sup>d</sup>NG is the number of shared genes between disease pairs in ref. [8].
<sup>e</sup>Consensus set of 943 disease pairs from the dataset of ref. [7] and our dataset of 191,966.
<sup>f</sup>A consensus dataset of 2621 disease pairs was obtained by comparing Supplementary Dataset 4 of ref. [6] with our set of 191,966 pairs.
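The z-score conversion described in footnote a can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `permutation_z_to_p` and the example numbers are hypothetical, and a two-sided normal tail is assumed because the footnote does not say whether the test is one-sided or two-sided.

```python
import math


def permutation_z_to_p(observed, perm_mean, perm_std):
    """Return (z, p) for an observed value against a permutation distribution.

    z = (observed - perm_mean) / perm_std, as in footnote a.
    ASSUMPTION: p is the two-sided tail of a standard normal distribution;
    the source does not state which tail convention was used.
    """
    if perm_std <= 0:
        raise ValueError("perm_std must be positive")
    z = (observed - perm_mean) / perm_std
    # Two-sided normal tail: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical numbers for illustration only (not taken from the table).
z, p = permutation_z_to_p(observed=0.5, perm_mean=0.4, perm_std=0.05)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With a very small permutation standard deviation the z-score becomes large and the p value underflows toward zero, which may be why some bracketed entries are reported as 0.0.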
Fig. 1 Summary results for 3608 distinct diseases.
a ICD-10 main classification coverage across the 3608 diseases. Some diseases are found in multiple groups; they were counted in each group with which they are associated. b Histogram of the number of MOAs. c Frequency (bin size 0.02) and density of the J-score for the ~2 million significant (q value <0.05), non-redundant disease pairs. d Frequency (bin size = 100) and density of the degree (number of edges) of each disease (node). e Fraction of diseases in the giant component of disease–disease network versus the J-score cutoff.
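The curve in panel e can be outlined with a union-find pass over the disease pairs. The sketch below is a minimal illustration and not the authors' implementation: the function name `giant_component_fraction`, the `(u, v, j_score)` edge tuples, the inclusive `>=` cutoff, and the toy network are all assumptions.

```python
from collections import Counter


def giant_component_fraction(n_nodes, edges, cutoff):
    """Fraction of nodes in the largest connected component.

    edges: iterable of (u, v, j_score) with node indices in range(n_nodes).
    ASSUMPTION: an edge is kept when j_score >= cutoff.
    """
    if n_nodes <= 0:
        return 0.0
    parent = list(range(n_nodes))

    def find(x):
        # Path-halving find
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v, j_score in edges:
        if j_score >= cutoff:
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                parent[root_u] = root_v

    sizes = Counter(find(i) for i in range(n_nodes))
    return max(sizes.values()) / n_nodes


# Toy network of five diseases (hypothetical J-scores).
toy_edges = [(0, 1, 0.5), (1, 2, 0.3), (3, 4, 0.1)]
for cutoff in (0.0, 0.4, 0.6):
    print(cutoff, giant_component_fraction(5, toy_edges, cutoff))
```

Raising the cutoff removes weaker edges, so the fraction can only stay flat or fall, which is the shape one would expect for a curve of this kind.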
Top 20 comorbidities (excluding the same-disease pair, i.e., CAD-CAD), top 20 comorbidity-enriched MOA proteins (with respect to the original disease), and top 20 pathways associated with the CAD prediction results.
| Comorbidities | | | MOA proteins | | Pathways | |
|---|---|---|---|---|---|---|
| Disease | J-score | | Gene name | Score | Pathway | |
| Heart disease | 0.47 | <0.0001 | COX7A2L | 0.38 | Class A/1 (Rhodopsin-like receptors) | 3.7 × 10⁻⁹ |
| Cardiovascular system disease | 0.45 | <0.0001 | COX7A2 | 0.38 | Olfactory signaling pathway | 2.7 × 10⁻⁸ |
| Obstructive lung disease | 0.43 | <0.0001 | COX7A1 | 0.38 | GPCR ligand binding | 3.8 × 10⁻⁸ |
| Asthma | 0.44 | <0.0001 | NR4A3 | 0.36 | The canonical retinoid cycle in rods (twilight vision) | 4.5 × 10⁻⁷ |
| Myocardial infarction | 0.39 | <0.0001 | PGR | 0.36 | ADORA2B mediated anti-inflammatory cytokines production | 1.4 × 10⁻⁶ |
| Familial hyperlipidemia | 0.33 | <0.0001 | LXN | 0.35 | Nuclear receptor transcription pathway | 2.3 × 10⁻⁶ |
| Diabetes mellitus | 0.33 | <0.0001 | OSBPL8 | 0.35 | Anti-inflammatory response favoring Leishmania parasite infection | 7.6 × 10⁻⁶ |
| Rhinitis | 0.32 | <0.0001 | SLC8A3 | 0.35 | Leishmania parasite growth and survival | 7.6 × 10⁻⁶ |
| Liver disease | 0.31 | <0.0001 | KCNA10 | 0.35 | Peptide ligand-binding receptors | 3.3 × 10⁻⁵ |
| Hyperthyroidism | 0.31 | <0.0001 | NR3C2 | 0.35 | SUMOylation of intracellular receptors | 3.5 × 10⁻⁵ |
| Chronic obstructive pulmonary disease | 0.30 | <0.0001 | RARRES1 | 0.35 | G alpha (i) signaling events | 9.7 × 10⁻⁵ |
| Lymphedema | 0.29 | <0.0001 | GPRC5A | 0.34 | Visual phototransduction | 1.3 × 10⁻⁴ |
| Allergic asthma | 0.29 | <0.0001 | ANXA1 | 0.34 | Amine ligand-binding receptors | 1.5 × 10⁻⁴ |
| Intrinsic asthma | 0.29 | <0.0001 | NR3C1 | 0.33 | Leishmania infection | 1.6 × 10⁻⁴ |
| Pulmonary emphysema | 0.29 | <0.0001 | ELOVL7 | 0.33 | Integrin cell surface interactions | 3.6 × 10⁻⁴ |
| Syndrome | 0.29 | <0.0001 | TSPAN13 | 0.33 | Sodium/Calcium exchangers | 9.6 × 10⁻⁴ |
| Congestive heart failure | 0.28 | <0.0001 | GRP | 0.33 | Retinoid cycle disease events | 9.7 × 10⁻⁴ |
| Kidney disease | 0.28 | <0.0001 | ELOVL3 | 0.33 | Diseases associated with visual transduction | 9.7 × 10⁻⁴ |
| Pseudohypoparathyroidism | 0.28 | <0.0001 | ELOVL1 | 0.32 | Reduction of cytosolic Ca++ levels | 9.7 × 10⁻⁴ |
| Fatty liver disease | 0.28 | <0.0001 | OSBPL5 | 0.32 | Diseases of the neuronal system | 9.7 × 10⁻⁴ |
Up to the top 20 comorbidities, top 20 comorbidity-enriched MOA proteins (with respect to the input), and top 20 pathways (ranked by p value since the q values are the same) from the GWAS-driven LeMeDISCO prediction results for CAD, using the gene set from ref. [30].
| Comorbidities | | | MOA proteins | | Pathways | | |
|---|---|---|---|---|---|---|---|
| Disease | J-score | | Gene name | Score | Pathway | q value | p value |
| Renal artery disease | 0.028 | 5.8 × 10⁻⁴ | PEX10 | 0.24 | RAB geranylgeranylation | 0.16 | 4.6 × 10⁻³ |
| Anuria | 0.022 | 5.3 × 10⁻³ | BEND6 | 0.23 | Platelet activation, signaling and aggregation | 0.16 | 7.7 × 10⁻³ |
| Anterior uveitis | 0.015 | 0.022 | NEURL1 | 0.22 | MET activates RAP1 and RAC1 | 0.16 | 0.017 |
| | | | CCM2 | 0.20 | RHO GTPases activate KTN1 | 0.16 | 0.017 |
| | | | FGD6 | 0.20 | Response to elevated platelet cytosolic Ca²⁺ | 0.16 | 0.019 |
| | | | CENPW | 0.20 | Killing mechanisms | 0.16 | 0.019 |
| | | | PCID2 | 0.20 | WNT5:FZD7-mediated leishmania damping | 0.16 | 0.019 |
| | | | RPL17 | 0.19 | Diseases of signal transduction by growth factor receptors and second messengers | 0.16 | 0.022 |
| | | | MANEAL | 0.18 | PTK6 Regulates RHO GTPases, RAS GTPase, and MAP kinases | 0.16 | 0.022 |
| | | | HHAT | 0.17 | TFAP2 (AP-2) family regulates the transcription of growth factors and their receptors | 0.16 | 0.024 |
| | | | PHYHIP | 0.16 | Purine catabolism | 0.16 | 0.030 |
| | | | IYD | 0.16 | RHO GTPases activate CIT | 0.16 | 0.031 |
| | | | VEGFA | 0.16 | Signal transduction by L1 | 0.16 | 0.033 |
| | | | HNRNPD | 0.14 | VEGFR2 mediated cell proliferation | 0.16 | 0.033 |
| | | | AGT | 0.13 | RHO GTPases Activate NADPH Oxidases | 0.16 | 0.037 |
| | | | PLEKHA1 | 0.12 | RHO GTPases activate PAKs | 0.16 | 0.037 |
| | | | SERPINA1 | 0.11 | TRAF6 mediated NF-kB activation | 0.16 | 0.037 |
| | | | NUDT5 | 0.04 | Neutrophil degranulation | 0.16 | 0.038 |
| | | | RAB23 | 0.04 | NOTCH3 Activation and Transmission of Signal to the Nucleus | 0.16 | 0.039 |
| | | | NKIRAS2 | 0.04 | Signaling by NTRK2 (TRKB) | 0.16 | 0.039 |
Top 20 comorbidities (excluding the same-disease pair, i.e., OC-OC), top 20 comorbidity-enriched MOA proteins (with respect to the original disease), and top 20 pathways associated with the OC prediction results.
| Comorbidities | | | MOA proteins | | Pathways | |
|---|---|---|---|---|---|---|
| Disease | J-score | | Gene name | Score | Pathway | |
| testicular cancer | 0.42 | <0.0001 | TEK | 0.5 | MAPK1/MAPK3 signaling | 7.10 × 10⁻¹⁵ |
| fallopian tube cancer | 0.41 | <0.0001 | TYRO3 | 0.49 | EPH-Ephrin signaling | 8.59 × 10⁻¹⁵ |
| squamous cell carcinoma | 0.40 | <0.0001 | RYK | 0.49 | RAF/MAP kinase cascade | 2.37 × 10⁻¹⁴ |
| tongue squamous cell carcinoma | 0.39 | <0.0001 | MERTK | 0.49 | MAPK family signaling cascades | 2.69 × 10⁻¹⁴ |
| nodular prostate | 0.36 | <0.0001 | AXL | 0.49 | FLT3 Signaling | 3.83 × 10⁻¹⁴ |
| cervical cancer | 0.36 | <0.0001 | LTK | 0.48 | EPH-ephrin-mediated repulsion of cells | 3.96 × 10⁻¹⁴ |
| myeloproliferative neoplasm | 0.32 | <0.0001 | EGFR | 0.47 | PI5P, PP2A, and IER3 Regulate PI3K/AKT Signaling | 4.07 × 10⁻¹³ |
| inflammatory breast carcinoma | 0.31 | <0.0001 | KIT | 0.47 | Negative regulation of the PI3K/AKT network | 9.15 × 10⁻¹³ |
| urinary bladder cancer | 0.30 | <0.0001 | KDR | 0.47 | Constitutive Signaling by Aberrant PI3K in Cancer | 3.79 × 10⁻¹² |
| lung cancer | 0.30 | <0.0001 | FLT3 | 0.47 | PI3K/AKT Signaling in Cancer | 1.3 × 10⁻¹⁰ |
| bile duct cancer | 0.29 | <0.0001 | FLT1 | 0.47 | EPHA-mediated growth cone collapse | 5.51 × 10⁻¹⁰ |
| parotid gland cancer | 0.29 | <0.0001 | ROR2 | 0.47 | Diseases of signal transduction by growth factor receptors and second messengers | 3.45 × 10⁻⁹ |
| neurofibroma | 0.29 | <0.0001 | RET | 0.47 | PIP3 activates AKT signaling | 8.38 × 10⁻⁸ |
| peritoneum cancer | 0.28 | <0.0001 | PTK2B | 0.47 | EPHB-mediated forward signaling | 2.83 × 10⁻⁷ |
| gallbladder cancer | 0.28 | <0.0001 | PTK2 | 0.47 | Intracellular signaling by second messengers | 4.76 × 10⁻⁷ |
| Barrett’s esophagus | 0.28 | <0.0001 | NTRK3 | 0.47 | Toll-like receptor 4 (TLR4) cascade | 4.87 × 10⁻⁶ |
| tongue cancer | 0.27 | <0.0001 | NTRK2 | 0.47 | Toll-like receptor cascades | 2.16 × 10⁻⁵ |
| larynx cancer | 0.27 | <0.0001 | NTRK1 | 0.47 | ERBB2 activates PTK6 signaling | 2.23 × 10⁻⁵ |
| kidney cancer | 0.27 | <0.0001 | MUSK | 0.47 | ERBB2 regulates cell motility | 3.98 × 10⁻⁵ |
| lung benign neoplasm | 0.27 | <0.0001 | LMTK3 | 0.47 | PI3K events in ERBB2 signaling | 5.02 × 10⁻⁵ |
Top 20 comorbidities, the seven comorbidity-enriched MOA proteins (with respect to the input), and top 20 pathways associated with the GWAS-driven prediction results for OC, using the gene set from ref. [52].
| Comorbidities | | | MOA proteins | | Pathways | |
|---|---|---|---|---|---|---|
| Disease | J-score | | Gene name | Score | Pathway | |
| angiosarcoma | 0.0047 | 0.012 | RAD51C | 0.41 | DNA repair | 3.80 × 10⁻⁸ |
| skin cancer | 0.0036 | 0.012 | RAD51D | 0.39 | Diseases of DNA repair | 6.48 × 10⁻⁸ |
| skin benign neoplasm | 0.0036 | 0.012 | MSH6 | 0.39 | Mismatch repair | 1.23 × 10⁻⁷ |
| ovarian carcinoma | 0.0036 | 0.024 | MSH2 | 0.25 | Mismatch repair (MMR) directed by MSH2:MSH6 (MutSalpha) | 1.23 × 10⁻⁷ |
| biliary tract disease | 0.0035 | 0.028 | MLH1 | 0.12 | Resolution of D-loop structures through synthesis-dependent strand annealing (SDSA) | 5.59 × 10⁻⁷ |
| genetic disease | 0.0033 | 0.012 | BRIP1 | 0.065 | Transcriptional regulation by TP53 | 8.02 × 10⁻⁷ |
| myxoid leiomyosarcoma | 0.0032 | 0.017 | STK11 | 0.050 | Resolution of D-loop structures | 8.02 × 10⁻⁷ |
| epithelioid leiomyosarcoma | 0.0032 | 0.017 | | | Resolution of D-loop structures through Holliday junction intermediates | 8.02 × 10⁻⁷ |
| leiomyosarcoma | 0.0032 | 0.017 | | | Presynaptic phase of homologous DNA pairing and strand exchange | 1.09 × 10⁻⁶ |
| mesenchymoma | 0.0031 | 0.019 | | | Homologous DNA pairing and strand exchange | 1.23 × 10⁻⁶ |
| hematopoietic system disease | 0.0031 | 0.019 | | | TP53 regulates the transcription of DNA repair genes | 4.22 × 10⁻⁶ |
| lymphatic system disease | 0.0031 | 0.039 | | | HDR through homologous recombination (HRR) | 4.24 × 10⁻⁶ |
| childhood medulloblastoma | 0.0031 | 0.012 | | | Mismatch repair (MMR) directed by MSH2:MSH3 (MutSbeta) | 1.61 × 10⁻⁵ |
| adult medulloblastoma | 0.0031 | 0.012 | | | HDR through homologous recombination (HRR) or single-strand annealing (SSA) | 2.79 × 10⁻⁵ |
| medullomyoblastoma | 0.0031 | 0.012 | | | Homology directed repair | 2.98 × 10⁻⁵ |
| chondrosarcoma | 0.0031 | 0.019 | | | DNA double-strand break repair | 4.83 × 10⁻⁵ |
| pancreas disease | 0.0031 | 0.041 | | | Meiotic recombination | 4.85 × 10⁻⁴ |
| metachromatic leukodystrophy | 0.0030 | 0.021 | | | Regulation of TP53 activity through phosphorylation | 5.23 × 10⁻⁴ |
| uveal cancer | 0.0030 | 0.021 | | | Meiosis | 8.11 × 10⁻⁴ |
| urinary system benign neoplasm | 0.0029 | 0.024 | | | Reproduction | 1.12 × 10⁻³ |
Fig. 2 Schematic representation of LeMeDISCO.
a The method for determining the MOA proteins associated with a disease indication via MEDICASCY. b The method for determining the comorbidities associated with a given disease and its molecular mechanisms via LeMeDISCO.
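The comorbidity step in panel b, which ranks diseases by the MOA proteins they share with a query disease, can be illustrated with a set-overlap score. The sketch below is hedged throughout: it assumes the J-score behaves like a Jaccard index over MOA protein sets, which this record does not define, and the function names, disease labels, and protein labels are hypothetical placeholders rather than MEDICASCY output.

```python
def jaccard(a, b):
    """Jaccard index of two collections: |A ∩ B| / |A ∪ B| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_comorbidities(query, moa_sets, top=20):
    """Rank all other diseases by overlap of their MOA protein sets with the query.

    ASSUMPTION: the score is a plain Jaccard index; the published J-score
    may be defined or filtered differently (e.g., by significance).
    """
    query_set = moa_sets[query]
    scores = [
        (disease, jaccard(query_set, proteins))
        for disease, proteins in moa_sets.items()
        if disease != query  # exclude the same-disease pair
    ]
    # Highest score first; ties broken alphabetically for determinism
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:top]


# Hypothetical placeholder data, not real MOA protein predictions.
toy_moa = {
    "disease_A": {"P1", "P2", "P3"},
    "disease_B": {"P2", "P3", "P4"},
    "disease_C": {"P5"},
}
ranked = rank_comorbidities("disease_A", toy_moa)
print(ranked)
```

The shared proteins behind a high score (here `P2` and `P3`) are the natural starting point for the kind of pathway analysis reported in the tables.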