Apichat Suratanee, Kitiporn Plaimas.
Abstract
Categorizing human diseases improves the efficiency and accuracy of disease diagnosis, prognosis, and treatment. Disease-disease associations (DDAs) are valuable information that reveals the large-scale structure of the complex relationships among diseases. However, the number of known and reliable associations is very small, so identifying DDAs is a challenging task in systems biology and medicine. Here, we developed a novel network-based scoring algorithm called DDA to identify the relationships between diseases in a large-scale study. Our method is based on random walk prioritization in a protein-protein interaction network. This approach considers not only whether two diseases directly share associated genes but also the statistical relationships between two different diseases using known disease-related genes. Predicted associations were validated against known DDAs from a database and against supporting literature. The method performed well, with an area under the curve of 71%, and outperformed other standard association indices. Furthermore, novel DDAs and relationships among diseases from the cluster analysis were reported. This method identifies disease-disease relationships efficiently on an interaction network and can also be generalized to other association studies to further enhance knowledge in medical studies.
Keywords: disease-disease association; network-based method; prioritization technique; scoring method
Year: 2015 PMID: 26673408 PMCID: PMC4674013 DOI: 10.4137/BBI.S35237
Source DB: PubMed Journal: Bioinform Biol Insights ISSN: 1177-9322
Figure 1. Network example of DDA score calculation.
Notes: The left panel shows a simulated network in which nodes represent genes and edges represent interactions. The network consists of the disease genes of three diseases, D1, D2, and D3. Red, green, and blue nodes represent the diseases D1, D2, and D3, respectively. The DDA scores of the relationships between D1–D2, D1–D3, and D2–D3 are presented in the right panel.
Figure 2. Score distributions for known and unknown disease associations. Two distributions of scores are compared: one for the set of known disease associations and one for the set of unknown disease associations. The scores of our method based on RWR, F_Flow, NetRank, and NetScore are shown in panels (A), (B), (C), and (D), respectively.
Performance measurement for identifying disease associations using our methods with different prioritization techniques.
| | RWR | NetRank | NetScore | F_Flow |
|---|---|---|---|---|
| Performance (AUC) | 0.71 | 0.57 | 0.46 | 0.53 |
| | 2.95E-16 | 0.016 | 0.163 | 0.289 |
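The AUC values above summarize how well each score separates known from unknown disease associations. As a rough illustration, the AUC can be computed as the probability that a known pair outscores an unknown pair (the Mann-Whitney formulation). The sketch below assumes known associations are the positives and unknown associations are the negatives; the function name `auc_from_scores` and the sample scores are hypothetical and not taken from the study.

```python
def auc_from_scores(known_scores, unknown_scores):
    """AUC as the probability that a known pair outscores an unknown pair.

    Ties count as half a win (Mann-Whitney U formulation).
    """
    if not known_scores or not unknown_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for k in known_scores:
        for u in unknown_scores:
            if k > u:
                wins += 1.0
            elif k == u:
                wins += 0.5
    return wins / (len(known_scores) * len(unknown_scores))


# Hypothetical scores, for illustration only.
known = [0.9, 0.8, 0.4]
unknown = [0.7, 0.3, 0.2, 0.1]
auc = auc_from_scores(known, unknown)
```

An AUC near 0.5 means the score ranks known and unknown pairs about equally, which is how the NetScore and F_Flow columns can be read.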
Performance measurement for identifying disease associations using scores of different association indices.
| | JACCARD | SIMPSON | GEOMETRIC | COSINE | PCC |
|---|---|---|---|---|---|
| Performance (AUC) | 0.5706 | 0.5709 | 0.5695 | 0.5705 | 0.6213 |
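For orientation, the five indices in this table are commonly defined on the overlap between two sets. The sketch below uses those common set-based definitions and assumes they are applied to the gene sets of two diseases; the exact definitions and inputs used in the study may differ. The function name `association_indices`, the gene labels, and the background size are hypothetical.

```python
import math


def association_indices(genes_a, genes_b, total_genes):
    """Common set-overlap association indices for two gene sets.

    total_genes is the size of the background gene universe (used by PCC only).
    """
    a, b = set(genes_a), set(genes_b)
    if not a or not b:
        raise ValueError("both gene sets must be non-empty")
    shared = len(a & b)
    na, nb = len(a), len(b)
    indices = {
        "jaccard": shared / len(a | b),
        "simpson": shared / min(na, nb),
        "geometric": shared ** 2 / (na * nb),
        "cosine": shared / math.sqrt(na * nb),
    }
    # Pearson correlation of the two binary membership vectors over the universe.
    denom = math.sqrt(na * nb * (total_genes - na) * (total_genes - nb))
    indices["pcc"] = (shared * total_genes - na * nb) / denom if denom else 0.0
    return indices


# Hypothetical gene sets, for illustration only.
result = association_indices({"G1", "G2", "G3"}, {"G2", "G3", "G4", "G5"}, total_genes=10)
```

All of these indices are zero (or negative, for PCC) when two diseases share no genes, which is the case the network-based score is meant to handle.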
Figure 3. Performance of our method based on four different prioritization algorithms on edge-swapped and node-removed networks. (A) Edges in the original protein-protein interaction network were swapped at different proportions (20%, 40%, 60%, and 80%). (B) Nodes were removed from the original protein-protein interaction network at different proportions (20%, 40%, 60%, and 80%). The performance of our method based on F_Flow, NetRank, NetScore, and RWR on the perturbed networks is shown.
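The two perturbations described in this caption can be sketched as follows. This is a minimal illustration, assuming a simple undirected network and a degree-preserving swap in which edges (a, b) and (c, d) become (a, d) and (c, b); the caption does not state which swapping scheme was used. The function names, the seed handling, and the toy ring network are hypothetical.

```python
import random


def swap_edges(edges, fraction, seed=0):
    """Rewire roughly `fraction` of the edges while keeping every node's degree."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    target_swaps = int(fraction * len(edge_list) / 2)  # each swap rewires two edges
    done, attempts = 0, 0
    max_attempts = 100 * max(target_swaps, 1)
    while done < target_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if len({a, b, c, d}) < 4:
            continue  # edges share a node; swapping could create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in edge_set or new2 in edge_set:
            continue  # swap would duplicate an existing edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list


def remove_nodes(edges, fraction, seed=0):
    """Drop a random `fraction` of nodes together with all their edges."""
    rng = random.Random(seed)
    nodes = sorted({n for e in edges for n in e})
    removed = set(rng.sample(nodes, int(fraction * len(nodes))))
    return [e for e in edges if e[0] not in removed and e[1] not in removed]


def degrees(edges):
    """Count how many edges touch each node."""
    counts = {}
    for a, b in edges:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    return counts


# Toy ring network of 10 genes, for illustration only.
edges = [("g%d" % i, "g%d" % ((i + 1) % 10)) for i in range(10)]
swapped = swap_edges(edges, 0.4, seed=1)
pruned = remove_nodes(edges, 0.2, seed=1)
```

Re-running the scoring on `swapped` and `pruned` at each proportion would give one performance value per perturbed network.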
Predicted disease associations with the number of studies found in PubMed.
| PHENOTYPIC SERIES 1 (PS1) | PHENOTYPIC SERIES 2 (PS2) | OMIM ID CORRESPONDING TO PS1 | OMIM ID CORRESPONDING TO PS2 | DISEASE-DISEASE ASSOCIATION (DDA) SCORE | PheNUMA (1: FOUND, 0: NOT FOUND) | NUMBER OF FOUND STUDIES IN PUBMED |
|---|---|---|---|---|---|---|
| Muscular dystrophy-dystroglycanopathy, type B | Muscular dystrophy-dystroglycanopathy, type C | PS613155 | PS609308 | 0.9995 | 0 | 0 |
| Epilepsy, generalized, with febrile seizures plus | Seizures, familial febrile | PS604233 | PS121210 | 0.9994 | 0 | 136 |
| Muscular dystrophy-dystroglycanopathy, type A | Muscular dystrophy-dystroglycanopathy, type B | PS236670 | PS613155 | 0.9994 | 0 | 0 |
| Muscular dystrophy-dystroglycanopathy, type A | Muscular dystrophy-dystroglycanopathy, type C | PS236670 | PS609308 | 0.9994 | 0 | 1 |
| Mitochondrial DNA depletion syndrome | Progressive external ophthalmoplegia with mtDNA deletions | PS603041 | PS157640 | 0.9992 | 1 | 3 |
| Muscular dystrophy-dystroglycanopathy, type B | Muscular dystrophy, limb-girdle, autosomal recessive | PS613155 | PS253600 | 0.9988 | 0 | 0 |
| Muscular dystrophy-dystroglycanopathy, type C | Muscular dystrophy, limb-girdle, autosomal recessive | PS609308 | PS253600 | 0.9987 | 1 | 0 |
| Joubert syndrome | Meckel syndrome | PS213300 | PS249000 | 0.9985 | 0 | 34 |
| Muscular dystrophy-dystroglycanopathy, type A | Muscular dystrophy, limb-girdle, autosomal recessive | PS236670 | PS253600 | 0.9981 | 0 | 0 |
| Atrial fibrillation, familial | Brugada syndrome | PS608583 | PS601144 | 0.9978 | 1 | 40 |
| Meckel syndrome | Nephronophthisis | PS249000 | PS256100 | 0.9974 | 0 | 21 |
| Maple syrup urine disease | Pyruvate dehydrogenase complex deficiency | PS248600 | PS312170 | 0.9972 | 0 | 1 |
| Cardiomyopathy, familial hypertrophic | Left ventricular noncompaction | PS192600 | PS604169 | 0.9969 | 0 | 13 |
| Epiphyseal dysplasia, multiple | Stickler syndrome | PS132400 | PS108300 | 0.9967 | 0 | 0 |
| Bardet-Biedl syndrome | Meckel syndrome | PS209900 | PS249000 | 0.9966 | 1 | 13 |
| Brugada syndrome | Long QT syndrome | PS601144 | PS192500 | 0.9966 | 1 | 435 |
| Atrial fibrillation, familial | Long QT syndrome | PS608583 | PS192500 | 0.9963 | 1 | 45 |
| Hemolytic uremic syndrome | Macular degeneration, age-related | PS235400 | PS603075 | 0.9961 | 0 | 0 |
| Joubert syndrome | Nephronophthisis | PS213300 | PS256100 | 0.9960 | 0 | 66 |
| Microphthalmia, isolated | Microphthalmia, isolated, with coloboma | PS251600 | PS300345 | 0.9960 | 0 | 19 |
Figure 4. Network of selected predicted disease associations with a high score. The 129 predicted disease associations with a score higher than 0.95 were selected to construct the network.
Figure 5. Clusters from our predicted interaction network. Three highly connected regions were identified by MCODE in our predicted disease association network. Clusters are ordered from left to right by decreasing cluster score.
Clusters of the 129 selected disease associations.
| CLUSTER | SCORE | NUMBER OF NODES | NUMBER OF EDGES | NODE IDS |
|---|---|---|---|---|
| 1 | 6 | 13 | 36 | PS115200, PS613155, PS601419, PS609308, PS604233, PS121210, PS161800, PS253600, PS608583, PS601144, PS192600, PS604169, PS192500 |
| 2 | 5.5 | 6 | 14 | PS213300, PS208500, PS249000, PS209900, PS204000, PS256100 |
| 3 | 3 | 3 | 3 | PS312170, PS168000, PS248600 |
Algorithm: DDA score calculation
| Input: | PPInetwork := a protein-protein interaction network |
| | GeneOf(Disease) := a set of disease-associated genes |
| | SetOfDiseases := a set of diseases |
| Output: | DDAscore := disease-disease association scores for all disease pairs |
| Procedure: | Prioritize(Network, seeds) |
| | // Prioritize genes in the network using genes associated with Di as seeds |
| | Rank_i := Prioritize(PPInetwork, GeneOf(Di)) |
| | // Prioritize genes in the network using genes associated with Dj as seeds |
| | Rank_j := Prioritize(PPInetwork, GeneOf(Dj)) |
| | // Calculate DDA scores |
| | Totalgenes := getNumberOfGenes(PPInetwork) |
| | DDAscore(Di, Dj) := |
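The Prioritize step of the algorithm can be illustrated with a random walk with restart (RWR), the best-performing prioritization technique reported above. The sketch below is a minimal pure-Python version, not the authors' implementation: the restart probability of 0.7, the convergence tolerance, the adjacency-dictionary input format, and the toy network are all assumptions. It covers only the prioritization; combining Rank_i and Rank_j into DDAscore is left out.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Score every node by its steady-state visiting probability.

    adjacency maps each node to the set of its neighbors (undirected network,
    every neighbor must also be a key). The walker starts on the seed genes
    and jumps back to them with probability `restart` at each step.
    """
    nodes = sorted(adjacency)
    seed_set = set(seeds) & set(nodes)
    if not seed_set:
        raise ValueError("no seed gene is present in the network")
    p0 = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neighbors = adjacency[n]
            walk_mass = (1.0 - restart) * p[n]
            if not neighbors:
                nxt[n] += walk_mass  # isolated node: the walker stays in place
                continue
            share = walk_mass / len(neighbors)
            for m in neighbors:
                nxt[m] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy path network A - B - C - D with A as the only seed (hypothetical data).
network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(network, seeds=["A"])
ranking = sorted(scores, key=scores.get, reverse=True)
```

Running the function once with the genes of Di as seeds and once with the genes of Dj gives the two rankings that the algorithm calls Rank_i and Rank_j.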