Apichat Suratanee, Kitiporn Plaimas.
Abstract
The associations between proteins and diseases are crucial information for investigating pathological mechanisms. However, the number of known and reliable protein-disease associations is quite small. In this study, an analysis framework to infer associations between proteins and diseases was developed based on a large data set of a human protein-protein interaction network, integrating an effective network search, namely, the reverse k-nearest neighbor (RkNN) search. The RkNN search was used to identify the impact of a protein on other proteins. Then, associations between proteins and diseases were inferred statistically. The method using the RkNN search yielded much higher precision than random selection, a standard nearest neighbor search, or the same method applied to a random protein-protein interaction network. All protein-disease pair candidates were verified by a literature search. Supporting evidence for 596 pairs was identified. In addition, cluster analysis of these candidates revealed 10 promising groups of diseases to be further investigated experimentally. This method can be used to identify novel associations to better understand complex relationships between proteins and diseases.
Keywords: network-based method; protein-disease associations; reverse nearest neighbor search
Year: 2017 PMID: 28757797 PMCID: PMC5513527 DOI: 10.1177/1177932217720405
Source DB: PubMed Journal: Bioinform Biol Insights ISSN: 1177-9322
Figure 1. Overview of the method. The framework starts by constructing a protein-protein interaction network using the information from the STRING database.[21] Integrating the protein-disease annotations from Menche et al,[23] we applied the reverse k-nearest neighbor algorithm to the network for identifying influenced proteins for each protein in the network. Then, an enrichment analysis of the diseases that were significantly related to the influenced proteins was undertaken, and the association between each protein and each disease was inferred. Later, the protein-disease pair candidates were used for finding disease-disease associations. Finally, all candidate pairs, either protein-disease pairs or disease-disease pairs, were validated by text mining the PubMed database.
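The two central steps of this overview, the reverse k-nearest neighbor search and the disease enrichment test, can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes an unweighted network with hop-count distance, keeps all ties at the k-th distance, and uses a one-sided hypergeometric test for enrichment. The function names `knn`, `rknn`, and `hypergeom_pvalue` and the toy graph are hypothetical.

```python
from collections import deque
from math import comb


def knn(graph, source, k):
    """k nearest nodes to `source` by hop count (assumption: ties at the k-th distance are kept)."""
    dist = {source: 0}
    queue = deque([source])
    nearest = set()
    cutoff = None  # distance of the k-th nearest node, once known
    while queue:
        node = queue.popleft()
        if node != source:
            if cutoff is not None and dist[node] > cutoff:
                break  # BFS visits nodes in order of distance, so nothing closer remains
            nearest.add(node)
            if cutoff is None and len(nearest) >= k:
                cutoff = dist[node]
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return nearest


def rknn(graph, query, k):
    """Reverse kNN: every node that counts `query` among its own k nearest neighbors."""
    return {p for p in graph if p != query and query in knn(graph, p, k)}


def hypergeom_pvalue(population, annotated, drawn, hits):
    """P(X >= hits) when `drawn` of `population` proteins are picked and `annotated` carry the disease."""
    upper = min(annotated, drawn)
    tail = sum(comb(annotated, i) * comb(population - annotated, drawn - i)
               for i in range(hits, upper + 1))
    return tail / comb(population, drawn)


# Toy path network a - b - c - d (hypothetical data, adjacency lists).
toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
influenced = rknn(toy_graph, "b", 1)   # proteins "influenced" by b
p_value = hypergeom_pvalue(10, 3, 2, 2)
```

In this sketch, the proteins returned by `rknn` for a query protein play the role of its influenced proteins, and a small `p_value` for a disease among them would suggest a candidate protein-disease pair.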
Figure 2. Degree distribution of our protein-protein interaction network is scale free.
Network properties of the constructed protein-protein interaction network.
| No. of nodes | No. of interactions | Average of clustering coefficient | Average of degree | Average of closeness centrality | Average of betweenness |
|---|---|---|---|---|---|
| 17 880 | 203 319 | 0.2673 | 22.7426 | 5.62E-09 | 7989.8160 |
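The average degree in this table can be cross-checked from the node and interaction counts. The sketch below assumes the network is treated as undirected, so each interaction adds one to the degree of two proteins and the average degree is 2E/N; the function name `average_degree` is hypothetical.

```python
def average_degree(n_nodes, n_edges):
    """Average degree of an undirected graph: every edge contributes to two node degrees."""
    return 2 * n_edges / n_nodes


# Counts taken from the table above.
avg = average_degree(17880, 203319)
print(f"{avg:.4f}")
```

Under that assumption the result agrees with the reported value of 22.7426.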
Figure 3. Venn diagram of the number of protein-disease association pairs.
List of potential candidate protein-disease pairs with at least 30 publications found in PubMed.
| Protein name (HUGO) | Disease (MeSH) | No. of articles found in PubMed |
|---|---|---|
| PTH | Bone diseases | 830 |
| APOB | Insulin resistance | 632 |
| GATA1 | Leukemia | 427 |
| AKT1 | Leukemia | 253 |
| MUC16 | Ovarian neoplasms | 252 |
| APOB | Metabolic syndrome x | 244 |
| MPL | Myeloproliferative disorders | 231 |
| CXCR4 | Myocardial infarction | 213 |
| DAG1 | Muscular dystrophies | 212 |
| ABCB1 | Prostatic neoplasms | 138 |
| HBB | Pathological conditions, signs, and symptoms | 120 |
| COL4A5 | Pathological conditions, signs, and symptoms | 113 |
| GHRH | Dwarfism | 108 |
| DOT1L | Leukemia | 106 |
| MAPK14 | Neoplasms | 94 |
| MYOD1 | Sarcoma | 86 |
| APOB | Myocardial ischemia | 65 |
| CDK5 | Amyotrophic lateral sclerosis | 60 |
| VWF | Blood platelet disorders | 54 |
| CTNNB1 | Type 2 diabetes mellitus | 54 |
| DLG1 | Pathological conditions, signs, and symptoms | 53 |
| MAP2K1 | Lung neoplasms | 49 |
| PALB2 | Ovarian neoplasms | 49 |
| APOB | Hyperinsulinism | 49 |
| CBL | Diabetes mellitus | 47 |
| TLR7 | Virus diseases | 45 |
| TFAP2C | Death | 41 |
| HLA-DQA1 | Graves disease | 35 |
| FGF8 | Limb deformities, congenital | 30 |
| HLA-C | Ankylosing spondylitis | 30 |
Figure 4. Performance of identifying protein-disease associations using RkNN and kNN methods. The barplots illustrate the precision of protein-disease association predictions by the RkNN and kNN methods. The precisions of both methods are compared by varying parameter k from 1 to 30. The fraction of true protein-disease associations detected out of all identified protein-disease associations is presented at the top of each bar. kNN indicates k-nearest neighbor; RkNN, reverse k-nearest neighbor.
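The precision shown on each bar is the number of true associations detected divided by the number of associations identified. A minimal sketch of that ratio follows; the function name `precision` and the protein and disease labels are hypothetical placeholders, not data from the study.

```python
def precision(predicted_pairs, known_pairs):
    """Fraction of predicted (protein, disease) pairs that appear in the known set."""
    predicted = set(predicted_pairs)
    if not predicted:
        return 0.0  # convention chosen here for an empty prediction set
    return len(predicted & set(known_pairs)) / len(predicted)


# Hypothetical toy pairs, for illustration only.
predicted = {("P1", "D1"), ("P2", "D1"), ("P3", "D2"), ("P4", "D2")}
known = {("P1", "D1"), ("P3", "D2"), ("P9", "D9")}
toy_precision = precision(predicted, known)  # 2 of the 4 predictions are known
```

Evaluating this ratio once per value of k, for the pairs produced by each search method, would give one bar per method and k.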
Figure 5. Performance of identifying protein-disease associations using RkNN and kNN methods on the interfered network. (A)-(C) show the performance of the methods on the interfered network obtained by removing proteins with a node degree greater than 300, 200, and 100, respectively. kNN indicates k-nearest neighbor; RkNN, reverse k-nearest neighbor.
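The interfered networks can be pictured as the original network with its highest-degree proteins deleted. The sketch below is one plausible reading of that step, assuming degrees are measured once on the original network and all proteins above the threshold are removed together; the function name `remove_hubs` and the toy graph are hypothetical.

```python
def remove_hubs(graph, max_degree):
    """Drop nodes whose degree in the original graph exceeds `max_degree`, along with their edges."""
    kept = {node for node, neighbors in graph.items() if len(neighbors) <= max_degree}
    return {node: [n for n in graph[node] if n in kept] for node in kept}


# Toy network: hub h linked to a, b, c, d, plus one edge a - b (hypothetical data).
toy_graph = {
    "h": ["a", "b", "c", "d"],
    "a": ["h", "b"],
    "b": ["h", "a"],
    "c": ["h"],
    "d": ["h"],
}
interfered = remove_hubs(toy_graph, 3)  # h has degree 4 and is removed
```

Applying such a filter with thresholds of 300, 200, and 100 would correspond to panels (A)-(C).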
List of 10 clusters consisting of clustering score, the number of nodes and edges, and the lists of proteins and diseases in each cluster.
| Cluster | Score | No. of nodes | No. of edges | Proteins as nodes | Diseases as nodes |
|---|---|---|---|---|---|
| 1 | 3.6 | 6 | 9 | GUCY2D, PRPF8, PDE6G | [retinal diseases], [retinal degeneration], [eye diseases, hereditary] |
| 2 | 3.333 | 7 | 10 | GATA1, AKT1, MYB | [lymphoproliferative disorders], [lymphatic diseases], [leukemia], [immunoproliferative disorders] |
| 3 | 3 | 5 | 6 | SUN2, DAG1 | [muscular disorders, atrophic], [muscular diseases], [muscular dystrophies] |
| 4 | 3 | 5 | 6 | DOT1L, TAL1, TCF3 | [precursor cell lymphoblastic leukemia-lymphoma], [leukemia, lymphoid] |
| 5 | 2.667 | 4 | 4 | CBL, GRPEL2 | [glucose metabolism disorders], [diabetes mellitus] |
| 6 | 2.667 | 4 | 4 | GATA4, PKP2 | [cardiovascular abnormalities], [heart defects, congenital] |
| 7 | 2.667 | 4 | 4 | TP53, IL10RA | [inflammatory bowel diseases], [gastroenteritis] |
| 8 | 2.667 | 4 | 4 | FGG, SERPINC1 | [blood coagulation disorders, inherited], [hemorrhagic disorders] |
| 9 | 2.667 | 4 | 4 | PPIB, COL2A1 | [bone diseases, developmental], [osteochondrodysplasias] |
| 10 | 2.667 | 4 | 4 | APOB, TPP1 | [lipid metabolism, inborn errors], [lipid metabolism disorders] |
Figure 6. Clustering results. (A)-(J) present the 10 promising clusters found in the protein-disease association network.