Mehdi Bagheri Hamaneh, Yi-Kuo Yu.
Abstract
Identifying similar diseases could potentially provide deeper understanding of their underlying causes, and may even hint at possible treatments. For this purpose, it is necessary to have a similarity measure that reflects the underpinning molecular interactions and biological pathways. We have thus devised a network-based measure that can partially fulfill this goal. Our method assigns weights to all proteins (and consequently their encoding genes) by using information flow from a disease to the protein interaction network and back. Similarity between two diseases is then defined as the cosine of the angle between their corresponding weight vectors. The proposed method also provides a way to suggest disease-pathway associations by using the weights assigned to the genes to perform enrichment analysis for each disease. By calculating pairwise similarities between 2534 diseases, we show that our disease similarity measure is strongly correlated with the probability of finding the diseases in the same disease family and, more importantly, sharing biological pathways. We have also compared our results to those of MimMiner, a text-mining method that assigns pairwise similarity scores to diseases. We find the results of the two methods to be complementary. It is also shown that clustering diseases based on their similarities and performing enrichment analysis for the cluster centers significantly increases the term association rate, suggesting that the cluster centers are better representatives for biological pathways than the diseases themselves. This lends support to the view that our similarity measure is a good indicator of relatedness of biological processes involved in causing the diseases. Although not needed for understanding this paper, the raw results are available for download for further study at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbpmn/DiseaseRelations/.
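The similarity definition in the abstract (the cosine of the angle between two disease weight vectors) can be illustrated with a minimal sketch. The function name, the example weights, and the choice to return 0 for an all-zero vector are assumptions made for illustration only; the information-flow step that produces the actual weights is not reproduced here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length weight vectors."""
    if len(u) != len(v):
        raise ValueError("weight vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: a disease with no weight on any protein is
        # treated as unrelated to everything.
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical weights over the same four proteins (not from the paper).
disease_a = [0.5, 0.3, 0.0, 0.2]
disease_b = [0.4, 0.4, 0.1, 0.1]
disease_c = [0.0, 0.0, 1.0, 0.0]

sim_ab = cosine_similarity(disease_a, disease_b)  # largely overlapping weights
sim_ac = cosine_similarity(disease_a, disease_c)  # no overlapping weights
```

Because the measure depends only on the direction of each vector, rescaling all weights of one disease by a positive constant leaves its similarities unchanged.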
Year: 2014 PMID: 25360770 PMCID: PMC4216010 DOI: 10.1371/journal.pone.0110936
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. The relation between the results of enrichment analysis and the average correlation.
The percentage of diseases for which GO/KEGG terms were identified by SaddleSum, as a function of average correlation. To facilitate the calculation, we sorted all average correlations in ascending order and placed them into bins, each containing the same number of diseases. The percentage is then measured as the fraction of diseases with GO/KEGG term hit(s) in each bin. For very low average correlations, this percentage is significantly lower.
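The binning procedure described in this caption can be sketched as follows. This is only an illustration: the bin size used in the paper is not recoverable from this text, so `bin_size` is a free parameter here, and the function name and example records are hypothetical.

```python
def hit_percentage_per_bin(records, bin_size):
    """Sort (average_correlation, has_term_hit) pairs by correlation,
    split them into consecutive bins of `bin_size` diseases, and return
    (mean correlation, percentage with a term hit) for each bin."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    ordered = sorted(records, key=lambda r: r[0])
    bins = []
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]  # last bin may be smaller
        mean_corr = sum(c for c, _ in chunk) / len(chunk)
        pct_hit = 100.0 * sum(1 for _, hit in chunk if hit) / len(chunk)
        bins.append((mean_corr, pct_hit))
    return bins

# Hypothetical data: (average correlation, whether any GO/KEGG term was found).
records = [(0.9, True), (0.1, False), (0.5, True),
           (0.2, False), (0.7, True), (0.3, True)]
bins = hit_percentage_per_bin(records, bin_size=3)
```

With these made-up records the low-correlation bin has a smaller hit percentage than the high-correlation bin, which is the qualitative pattern the caption describes.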
Figure 2. The probabilities of having common term associations or being siblings.
(A) The probabilities of finding a pair of diseases with (1) common GO/KEGG terms (red), (2) the same parents and common associations (blue), and (3) the same parents without shared biological terms (green) are shown. Here only pairs with a defined term similarity are considered. (B) For pairs with undefined term similarity (pairs with at least one member not associated with any biological terms), the distribution of siblings is plotted as a function of correlation. (C) and (D) show quantities similar to those in (A) and (B), respectively, when the biological term associations are retrieved directly from the KEGG DISEASE database.
Figure 3. Comparison with MimMiner.
(A) The inset shows the weighted number of disease pairs with shared KEGG pathways that were ranked above a given rank by MimMiner (red) or by our method (blue). Also shown in the inset (green) is the weighted number of pairs with common term associations that were missed (ranked lower) by MimMiner but identified (ranked higher) by our model. In the main panel, the same quantities for the proposed method are plotted after excluding obvious candidates for being related. The closeness of the blue and green curves indicates that the non-apparent candidates found by our method are largely missed by MimMiner. (B) The inverse of the average normalized rank versus the term similarity cutoff. At large similarity cutoffs, the higher the average normalized rank (that is, the smaller its numerical value and thus the larger its inverse), the better the agreement between the quality scores (cosine similarity or the MimMiner score) and the KEGG annotation.
Examples of relationships between diseases that have undefined (or zero) term similarity and undefined (or zero) MimMiner score, and are from different disease families.
| First disease ID | Second disease ID | First disease name | Second disease name | Correlation | Number of common gene associations | Relationship |
| MESH:C567070 | MESH:C536289 | Atypical mycobacteriosis, familial, X-Linked 1 | Immunodeficiency without anhidrotic ectodermal dysplasia | 1.0 | 1 | Both diseases have been associated with nuclear factor kappa B signaling |
| MESH:C536198 | MESH:C536113 | Ehlers-Danlos syndrome type 6 | Nevo syndrome | 1.0 | 1 | These diseases have been suggested to be identical |
| MESH:C537494 | MESH:C566453 | Stickler syndrome, type 3 | Deafness, autosomal recessive 53 | 1.0 | 1 | Hearing loss is one of the symptoms of Stickler syndrome, type 3 |
| MESH:C535407 | MESH:D053609 | Gamma aminobutyric acid transaminase deficiency | Lethargy | 0.9961 | 1 | Lethargy has been reported in patients with Gamma aminobutyric acid transaminase deficiency |
| MESH:D016301 | MESH:C562440 | Alveolar bone loss | Hypophosphatasia, childhood | 0.9584 | 1 | These are both tooth/bone diseases. |
| MESH:C564629 | MESH:C538150 | Deafness, autosomal recessive 31 | Syndactyly Cenani-Lenz type | 0.1040 | 0 | Hearing loss has been associated with Cenani-Lenz type of syndactyly |
| MESH:C536156 | MESH:C536601 | Keratomalacia | Amaurosis congenita of Leber, type 2 | 0.0835 | 0 | These are both eye diseases. |
| MESH:C563906 | MESH:C563425 | Cardiomyopathy, dilated, 1o | Diabetes mellitus, permanent neonatal | 0.0197 | 0 | ATP-sensitive potassium channels have been reported to be involved in both diseases |
| MESH:C564334 | MESH:D008527 | Acrocapitofemoral dysplasia | Medulloblastoma | 0.0168 | 0 | These diseases have been associated with the Hedgehog signaling pathway |
| MESH:C565334 | OMIM:188890 | Epilepsy, nocturnal frontal lobe, type 3 | Tobacco addiction, susceptibility to | 0.0155 | 0 | Both diseases have been associated with mutations in nicotinic acetylcholine receptors |
The fifth and sixth columns give the correlation and the number of common gene associations, respectively.
Figure 4. The effect of clustering on the minimum term size.
The minimum term size distribution of (A) GO and (B) KEGG terms reported by SaddleSum enrichment analyses when using disease weight vectors directly (red curves) and when using cluster center vectors (blue curves). Not only are the most informative (smallest) terms preserved during clustering, but the clustering procedure also appears to shift the minimum term size distribution toward smaller sizes, suggesting that even more specific terms are likely to be obtained when weight vectors are grouped under the proposed clustering procedure.
Terms associated with the cluster with the highest probability of including Parkinson's disease.
| Term ID | Name | E-value |
| GO:0007268 | synaptic transmission | 4.22e-12 |
| GO:0019226 | transmission of nerve impulse | 4.58e-12 |
| GO:0035637 | multicellular organismal signaling | 2.00e-11 |
| GO:0007267 | cell-cell signaling | 1.13e-10 |
| GO:0050877 | neurological system process | 4.54e-10 |
| GO:0001963 | synaptic transmission, dopaminergic | 5.34e-08 |
| GO:0007270 | neuron-neuron synaptic transmission | 4.47e-07 |
| GO:0044708 | single-organism behavior | 8.73e-07 |
| GO:0003008 | system process | 1.16e-06 |
| GO:0030534 | adult behavior | 1.69e-06 |
| GO:0001505 | regulation of neurotransmitter levels | 3.81e-06 |
| GO:0006805 | xenobiotic metabolic process | 4.11e-06 |
| GO:0071466 | cellular response to xenobiotic stimulus | 4.59e-06 |
| GO:0009410 | response to xenobiotic stimulus | 4.59e-06 |
| GO:0044281 | small molecule metabolic process | 1.62e-05 |
| GO:0007610 | behavior | 5.39e-05 |
| GO:1901615 | organic hydroxy compound metabolic proce | 6.38e-05 |
| GO:0023052 | signaling | 6.72e-05 |
| GO:0044700 | single organism signaling | 6.72e-05 |
| GO:0065008 | regulation of biological quality | 7.75e-05 |
| KEGG:hsa04080 | Neuroactive ligand-receptor interaction | 2.61e-19 |
| KEGG:hsa05010 | Alzheimer's disease | 2.73e-06 |
| KEGG:hsa05012 | Parkinson's disease | 8.69e-06 |
Terms associated with the cluster with the highest probability of including retinitis pigmentosa type 7.
| Term ID | Name | E-value |
| GO:0007603 | phototransduction, visible light | 5.64e-09 |
| GO:0009584 | detection of visible light | 9.95e-09 |
| GO:0007602 | phototransduction | 1.69e-08 |
| GO:0009583 | detection of light stimulus | 2.51e-08 |
| GO:0009582 | detection of abiotic stimulus | 1.12e-07 |
| GO:0009581 | detection of external stimulus | 2.98e-07 |
| GO:0051606 | detection of stimulus | 3.66e-06 |
| GO:0022400 | regulation of rhodopsin mediated signali | 1.07e-05 |
| GO:0016056 | rhodopsin mediated signaling pathway | 1.31e-05 |
| GO:0009416 | response to light stimulus | 1.47e-05 |
| GO:0009314 | response to radiation | 1.45e-04 |
| GO:0071482 | cellular response to light stimulus | 9.23e-04 |
| GO:0008277 | regulation of G-protein coupled receptor | 5.23e-03 |
| GO:0071478 | cellular response to radiation | 5.23e-03 |
| KEGG:hsa04744 | Phototransduction | 8.51e-04 |
Figure 5. Two example clusters.
The clusters that include Parkinson's disease (OMIM:168600) and Retinitis pigmentosa 7 (MESH:C564284) are shown in panels (A) and (B) respectively. In each case, only diseases with membership probabilities larger than 5% are shown. The size of each node (circle) is proportional to the probability of membership of that node in the cluster. For a disease pair, the thickness of the line linking the diseases depends on the correlation between the two diseases and on the minimum correlation between all diseases shown in each cluster. The names and IDs of the members of each cluster are also given. Diseases whose names are written in the same color (other than black) have exactly the same gene associations and so are equivalent in our study. Equivalent diseases are represented by one node in the figure. For example, the node identified by C566637 in panel (B) represents the four diseases whose names are in green, i.e. C535804, C566637, C565827, and C562479.
An example of a set of diseases that are associated with the same gene, but some have different phenotypes.
| Disease ID | Disease annotation | Disease family |
| MESH:C566619 | Exudative Vitreoretinopathy 4 | Eye diseases |
| MESH:C536527 | Van Buchem disease type 2 | Musculoskeletal diseases |
| MESH:C536063 | Osteoporosis-pseudoglioma syndrome | Musculoskeletal diseases |
| MESH:C536748 | Worth syndrome | Musculoskeletal diseases |
| MESH:C536056 | Osteopetrosis autosomal dominant type 1 | Musculoskeletal diseases |