Kai Sun, Joana P Gonçalves, Chris Larminie, Nataša Przulj.
Abstract
BACKGROUND: Understanding the relationship between diseases based on the underlying biological mechanisms is one of the greatest challenges in modern biology and medicine. Exploring disease-disease associations by using system-level biological data is expected to improve our current knowledge of disease relationships, which may lead to further improvements in disease diagnosis, prognosis and treatment.
Year: 2014 PMID: 25228247 PMCID: PMC4174675 DOI: 10.1186/1471-2105-15-304
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Comparison of studies on inferring disease-disease associations
| Study | Data | Size | Evaluation |
|---|---|---|---|
| van Driel | OMIM | 5132 phenotypes in OMIM | Comparing results with genotypic similarities |
| Lage | OMIM | 7000 OMIM record pairs | Evaluating results against the overlap of the OMIM record pairs |
| Goh | OMIM | 1284 OMIM diseases | Analysing network topological properties |
| Huang | GWAS | 7 diseases | Comparing results with phenotypic similarities |
| Li and Agarwal (2009) | PubMed abstracts, biological pathways | 1028 diseases in MeSH | Comparing results with MeSH classification |
| Kim | GWAS | 53 clinical traits related to severe asthma | Mining the literature manually |
| Hu and Agarwal (2009) | Expression data | 645 diseases in MeSH | Comparing results with MeSH classification |
| Suthram | Expression data, PPI | 54 diseases | Evaluating results against genetic similarities |
| Lewis | GWAS | 61 diseases | Comparing results with Huang |
| Mathur and Dinakarpandian | DO annotation, GO annotation | 36 diseases (for evaluation) | Evaluating results using 68 curated disease associations |
| Our study | Disease-gene associations, GO annotation, PPI | 543 ICD-9 diseases | Evaluating results against ICD-9 classification, comorbidity, and genetic similarities derived from GWAS data |
The comparison is based on the data used to derive associations (denoted by ‘Data’), number of diseases evaluated (denoted by ‘Size’) and benchmarks used for evaluation (denoted by ‘Evaluation’). The number of diseases evaluated in our study is computed as the union of diseases annotated in the four disease-gene association datasets we analysed, given in Figure 1.
Figure 1. The overlap of datasets. The overlap of diseases (denoted by ‘D’), genes (denoted by ‘G’) and their associations (denoted by ‘A’) between the four disease-gene association datasets we analysed. Boxes on the left list the sizes of the datasets. The size of the intersection of the datasets is marked in bold.
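The union and intersection sizes summarised in the figure can be computed with plain set operations. The following is a minimal sketch, assuming each dataset is available as a set of (disease code, gene) pairs; the function name `overlap_summary` and the toy entries are hypothetical placeholders, not values from the study.

```python
def overlap_summary(datasets):
    """Return {'D' | 'G' | 'A': (union_size, intersection_size)} across datasets.

    `datasets` maps a dataset name to a set of (disease, gene) pairs.
    """
    views = {
        "D": [{d for d, _ in pairs} for pairs in datasets.values()],  # diseases
        "G": [{g for _, g in pairs} for pairs in datasets.values()],  # genes
        "A": [set(pairs) for pairs in datasets.values()],             # associations
    }
    return {
        key: (len(set.union(*sets)), len(set.intersection(*sets)))
        for key, sets in views.items()
    }


# Toy data only: codes and gene names are made up for illustration.
toy = {
    "OMIM": {("250", "G1"), ("278", "G2")},
    "CTD": {("250", "G1"), ("401", "G3")},
    "FunDO": {("250", "G1"), ("278", "G2"), ("401", "G3")},
    "HuGENet": {("250", "G1"), ("250", "G4")},
}
summary = overlap_summary(toy)
```

Here `summary["D"][0]` plays the role of the "Union" disease count and `summary["D"][1]` the "Intersection" count.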
Evaluation of our measures against ICD-9 classification
| Data | Group | Annotation-based | Function-based | Topology-based |
|---|---|---|---|---|
| OMIM | Same | 0.0114 ± 0.0665 | 0.0355 ± 0.0892 | 0.4349 ± 0.1101 |
| | Different | 0.0010 ± 0.0139 | 0.0118 ± 0.0314 | 0.3996 ± 0.0760 |
| | P-value | 1.2785 × 10^-13 | 1.0423 × 10^-52 | 2.1257 × 10^-54 |
| CTD | Same | 0.0361 ± 0.1590 | 0.0728 ± 0.1754 | 0.4863 ± 0.1770 |
| | Different | 0.0050 ± 0.0274 | 0.0333 ± 0.0662 | 0.4408 ± 0.1368 |
| | P-value | 1.4887 × 10^-23 | 1.4040 × 10^-9 | 2.0240 × 10^-25 |
| FunDO | Same | 0.0418 ± 0.1344 | 0.0991 ± 0.1611 | 0.5560 ± 0.2214 |
| | Different | 0.0100 ± 0.0262 | 0.0549 ± 0.0830 | 0.4952 ± 0.1636 |
| | P-value | 1.7609 × 10^-144 | 9.6708 × 10^-100 | 2.7037 × 10^-90 |
| HuGENet | Same | 0.0931 ± 0.1798 | 0.2470 ± 0.2123 | 0.8031 ± 0.2248 |
| | Different | 0.0438 ± 0.0566 | 0.1881 ± 0.1522 | 0.7837 ± 0.2292 |
| | P-value | 1.4585 × 10^-74 | 9.9053 × 10^-72 | 4.5910 × 10^-14 |
| Intersection | Same | 0.0338 ± 0.1511 | 0.0593 ± 0.1907 | 0.3826 ± 0.1131 |
| | Different | 0.0024 ± 0.0329 | 0.0089 ± 0.0428 | 0.3496 ± 0.1020 |
| | P-value | 2.2667 × 10^-2 | 2.7448 × 10^-4 | 5.4716 × 10^-4 |
| Union | Same | 0.0350 ± 0.1179 | 0.0963 ± 0.1463 | 0.5680 ± 0.2226 |
| | Different | 0.0085 ± 0.0219 | 0.0583 ± 0.0818 | 0.5042 ± 0.1716 |
| | P-value | 1.3493 × 10^-211 | 7.1478 × 10^-113 | 4.1709 × 10^-141 |
Numbers in the table are similarity scores between diseases from the same ICD-9 categories, compared with those from different ICD-9 categories. P-values are calculated by using the Mann-Whitney U test.
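To illustrate the kind of comparison behind the P-values, here is a minimal standard-library sketch of a two-sided Mann-Whitney U test that uses the normal approximation with a tie correction and no continuity correction. It is not necessarily the exact variant or software used in the study; the function name and the sample scores are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value by normal approximation)."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    # Pool the samples, remembering which group (0 = x, 1 = y) each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n, n1, n2 = len(pooled), len(x), len(y)
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Positions i..j-1 hold tied values; they share the average of ranks i+1..j.
        avg_rank = (i + 1 + j) / 2.0
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value is tied, so the samples cannot be distinguished
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Toy similarity scores (made up): pairs in the same category vs. different categories.
same = [0.9, 0.8, 0.7]
different = [0.1, 0.2, 0.3]
u_stat, p_value = mann_whitney_u(same, different)
```

With such tiny samples the normal approximation is rough; it is better suited to the large numbers of disease pairs that P-values of the magnitude reported in the table suggest.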
Figure 2. Evaluation against comorbidity. ROC curves obtained by evaluating the three disease similarity measures against comorbidity. Due to space limitations, only ROC curves of FunDO are shown here (see Additional file 1: Figure S5 for ROC curves of other datasets). The ϕ-correlation threshold was set to 0.06 (the same threshold was used in [47]). We evaluated diseases annotated with at least 1, 3, 5, 7, 10, 15 genes, shown by curves with different colours in each plot.
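As a sketch of how a ϕ-correlation threshold can turn patient counts into binary comorbidity labels, the snippet below uses the standard ϕ coefficient for two binary variables. The assumption that comorbidity is derived from per-disease and joint patient counts, the strict inequality, and all function and parameter names are illustrative choices, not details confirmed by the text.

```python
import math


def phi_correlation(c_ij, p_i, p_j, n):
    """Phi coefficient for two diseases.

    c_ij: patients diagnosed with both diseases
    p_i, p_j: patients diagnosed with disease i and with disease j
    n: total number of patients
    """
    denominator = math.sqrt(p_i * p_j * (n - p_i) * (n - p_j))
    if denominator == 0:
        raise ValueError("phi is undefined when a disease affects no one or everyone")
    return (c_ij * n - p_i * p_j) / denominator


def is_comorbid(c_ij, p_i, p_j, n, threshold=0.06):
    """Label a disease pair as comorbid when phi exceeds the threshold (assumed strict)."""
    return phi_correlation(c_ij, p_i, p_j, n) > threshold


# Toy counts (made up): 100 patients, 10 with each disease.
phi_perfect = phi_correlation(10, 10, 10, 100)      # the diseases always co-occur
phi_independent = phi_correlation(1, 10, 10, 100)   # co-occurrence as expected by chance
```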
Evaluation of our measures against comorbidity
| Data | Annotation-based | Function-based | Topology-based |
|---|---|---|---|
| OMIM | 0.8009 ± 0.0277 (0.5740) | 0.8694 ± 0.0073 (0.5120) | 0.8495 ± 0.0011 (0.5044) |
| CTD | 0.7849 ± 0.0164 (0.5404) | 0.7316 ± 0.0046 (0.5047) | 0.7949 ± 0.0042 (0.5203) |
| FunDO | 0.7426 ± 0.0088 (0.4672) | 0.7142 ± 0.0017 (0.4940) | 0.7497 ± 0.0016 (0.5031) |
| HuGENet | 0.7563 ± 0.0001 (0.5084) | 0.8185 ± 0.0001 (0.4987) | 0.7153 ± 0.0015 (0.4922) |
| Intersection | 0.9925 ± 0.0001 (0.6013) | 0.9802 ± 0.0001 (0.5081) | 0.9958 ± 0.0041 (0.4664) |
| Union | 0.8225 ± 0.0045 (0.4704) | 0.7491 ± 0.0001 (0.4999) | 0.7939 ± 0.0022 (0.5008) |
| Average | 0.8194 ± 0.0837 (0.5270) | 0.8106 ± 0.0930 (0.5029) | 0.8163 ± 0.0907 (0.4979) |
Numbers in the table are AUC values obtained by evaluating the three disease similarity measures against comorbidity associations. The ϕ-correlation threshold was set to 0.06 (the same threshold was used in [47]), and all diseases annotated with at least 3 genes were evaluated. Average AUC values obtained by using randomised scores are shown in brackets (their standard deviations are omitted due to space limitations). Each evaluation test was run 30 times to compute the statistics reported in the table.
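The AUC values and their randomised baselines can be illustrated with a short standard-library sketch. It computes AUC as the fraction of (positive, negative) pairs ranked correctly, and it builds the baseline by shuffling the scores across disease pairs, which is only one possible reading of "randomised scores". The function names, the seed and the toy data are hypothetical.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a positive pair outscores a negative pair.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def randomised_baseline(scores, labels, runs=30, seed=0):
    """Average AUC over `runs` shuffles of the scores (assumed randomisation scheme)."""
    rng = random.Random(seed)
    shuffled = list(scores)
    total = 0.0
    for _ in range(runs):
        rng.shuffle(shuffled)
        total += auc(shuffled, labels)
    return total / runs


# Toy example (made up): similarity scores and binary association labels.
toy_scores = [0.9, 0.8, 0.2, 0.1]
toy_labels = [1, 1, 0, 0]
observed = auc(toy_scores, toy_labels)
baseline = randomised_baseline(toy_scores, toy_labels)
```

A baseline close to 0.5, as in the bracketed values in the table, indicates that shuffled scores carry no information about the labels.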
Evaluation of our measures against GWAS
| Data | Annotation-based | Function-based | Topology-based |
|---|---|---|---|
| F/G | 0.7224 ± 0.0010 (0.4945) | 0.6781 ± 0.0001 (0.4968) | 0.6863 ± 0.0009 (0.5005) |
| Common | 0.7527 ± 0.0010 (0.4926) | 0.7147 ± 0.0001 (0.5005) | 0.7555 ± 0.0020 (0.4951) |
Numbers in the table are AUC values obtained by evaluating the three disease similarity measures against disease associations derived from high-confidence GWAS data. Only diseases annotated with at least 3 genes were evaluated. ‘F/G’ denotes diseases with associated genes in both FunDO and GWAS data (99 diseases in total). ‘Common’ denotes diseases with associated genes in all four disease-gene association datasets (given in Figure 1) and GWAS data (50 diseases in total). Average AUC values obtained by using randomised scores are shown in brackets (their standard deviations are omitted due to space limitations). Each evaluation test was run 30 times to compute the statistics reported in the table.
List of the top 10 diseases associated with DM
| Rank | Code | Disease name | Reference |
|---|---|---|---|
| 1 | 239 | Neoplasms of unspecified nature | PMID: 23639840 |
| 2 | 155 | Malignant neoplasm of liver and intrahepatic bile ducts | GWAS |
| 3 | 710 | Diffuse diseases of connective tissue | GWAS |
| 4 | 714 | Rheumatoid arthritis and other inflammatory polyarthropathies | GWAS |
| 5 | 256 | Ovarian dysfunction | ICD-9, GWAS |
| 5 | 278 | Overweight, obesity and other hyperalimentation | ICD-9, comorbidity, GWAS |
| 7 | 401 | Essential hypertension | Comorbidity |
| 8 | 295 | Schizophrenic disorders | PMID: 17474808 |
| 9 | 282 | Hereditary hemolytic anemias | GWAS |
| 10 | 289 | Other diseases of blood and blood-forming organs | PMID: 11727971 |
The top 10 diseases associated with DM were inferred using the topology-based similarity measure, with FunDO as the source of disease-gene associations. Only diseases annotated in all four disease-gene association datasets are listed in the table. Where an association with DM is supported by ICD-9, comorbidity or GWAS, the supporting evidence is given in the Reference column. The remaining disease associations were validated by searching the literature on PubMed (http://www.ncbi.nlm.nih.gov/pubmed); due to space limitations, only one reference (given as a PubMed ID) is listed per disease.
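A ranking like the one in the table, where two diseases share rank 5 and the next takes rank 7, corresponds to standard competition ranking. The sketch below shows one way to produce it, assuming similarity scores are stored in a nested dictionary keyed by disease; the function name, the data layout and the toy scores are hypothetical and are not taken from the study.

```python
def rank_neighbours(similarity, query, top_k=10):
    """Rank diseases by similarity to `query`; tied scores share a rank.

    `similarity` maps a disease to a dict {other_disease: score}.
    Returns a list of (rank, disease, score) tuples, best first.
    """
    scored = sorted(
        ((score, disease) for disease, score in similarity[query].items()
         if disease != query),
        key=lambda item: -item[0],  # highest similarity first; sort is stable
    )
    ranked = []
    for position, (score, disease) in enumerate(scored, start=1):
        # Competition ranking: a tie reuses the previous rank, then ranks skip ahead.
        if ranked and score == ranked[-1][2]:
            rank = ranked[-1][0]
        else:
            rank = position
        ranked.append((rank, disease, score))
    return ranked[:top_k]


# Toy scores (made up) for a query disease labelled "DM".
toy_similarity = {"DM": {"239": 0.9, "155": 0.8, "710": 0.8, "714": 0.7}}
top = rank_neighbours(toy_similarity, "DM")
```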