Shunyao Wu1, Fengjing Shao1, Jun Ji2, Rencheng Sun2, Rizhuang Dong3, Yuanke Zhou2, Shaojie Xu2, Yi Sui2, Jianlong Hu2.
Abstract
Based on the hypothesis that the neighbors of disease genes tend to cause similar diseases, network-based methods for disease prediction have received increasing attention. Because they take full advantage of network structure, global distance measurements generally outperform local distance measurements. However, global distance measurements have problems of their own. For example, they may mistake non-disease hub proteins that interact densely with known disease proteins for potential disease proteins. To avoid this problem, we analyzed the differences between disease proteins and other proteins, using essential proteins (proteins encoded by essential genes) as references. We found that disease proteins are not well connected with essential proteins in protein interaction networks. Based on this finding, we proposed a novel strategy for gene prioritization based on protein interaction networks: we allocated positive flow to disease genes and negative flow to essential genes, and adopted network propagation for gene prioritization. Experimental results on 110 diseases verified the effectiveness and potential of the proposed method.
Year: 2015 PMID: 25689268 PMCID: PMC4331530 DOI: 10.1371/journal.pone.0116505
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
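The abstract does not state the propagation rule itself, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a PRINCE-style update, F ← αW′F + (1 − α)Y with degree-normalized weights W′, and a prior Y of +1 on known disease proteins and −`neg_weight` on essential proteins. The function name `propagate`, the parameters `alpha` and `neg_weight`, and the toy graph (loosely modeled on the hub scenario the abstract describes) are all illustrative.

```python
from collections import defaultdict
from math import sqrt


def propagate(edges, positive, negative=(), alpha=0.8, neg_weight=1.0,
              max_iter=500, tol=1e-10):
    """Iterate F <- alpha * W' F + (1 - alpha) * Y on an undirected graph.

    W'[u][v] = 1 / sqrt(deg(u) * deg(v)) for each edge (assumed normalization).
    Y is +1 on `positive` seeds and -neg_weight on `negative` seeds; a node
    listed in both sets receives the sum of the two contributions.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:                      # ignore self-interactions
            adj[u].add(v)
            adj[v].add(u)
    deg = {u: len(nbrs) for u, nbrs in adj.items()}

    prior = {u: 0.0 for u in adj}
    for u in positive:
        if u in prior:
            prior[u] += 1.0
    for u in negative:
        if u in prior:
            prior[u] -= neg_weight

    score = dict(prior)
    for _ in range(max_iter):
        new = {}
        for u in adj:
            spread = sum(score[v] / sqrt(deg[u] * deg[v]) for v in adj[u])
            new[u] = alpha * spread + (1.0 - alpha) * prior[u]
        delta = max(abs(new[u] - score[u]) for u in adj)
        score = new
        if delta < tol:                 # stop once scores have stabilized
            break
    return score


# Toy graph (hypothetical): a and b are known disease proteins, c is the
# held-out disease protein, e is a non-disease hub that also touches three
# essential proteins x1..x3.
edges = [("a", "c"), ("b", "c"), ("a", "e"), ("b", "e"),
         ("e", "x1"), ("e", "x2"), ("e", "x3")]
disease = {"a", "b"}
essential = {"x1", "x2", "x3"}

plain = propagate(edges, disease)               # positive flow only
signed = propagate(edges, disease, essential)   # positive and negative flow
```

On this toy graph the hub `e` scores above the true candidate `c` when only positive flow is used, and falls below it once its essential neighbors carry negative flow.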
Networks used in this work.
| Network | Number of interactors | Number of interactions | Number of interactors in the largest component | Number of interactions in the largest component |
|---|---|---|---|---|
| i2d | 14060 | 117002 | 13980 | 116956 |
| STRING | 11632 | 128104 | 11502 | 128017 |
| All data sources | 15215 | 200044 | 15106 | 200012 |
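The four statistics reported for each network can be computed from an edge list with a breadth-first search. The sketch below is illustrative only: it assumes interactions are undirected, that duplicate pairs count once, and that self-interactions are ignored; the source does not say how these cases were handled. The function name `network_stats` is hypothetical.

```python
from collections import defaultdict, deque


def network_stats(edges):
    """Return (interactors, interactions, interactors in the largest
    component, interactions in the largest component)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u == v:                      # assumption: drop self-interactions
            continue
        adj[u].add(v)                   # sets collapse duplicate pairs
        adj[v].add(u)

    seen, largest = set(), set()
    for start in adj:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:                    # breadth-first search from `start`
            u = queue.popleft()
            for v in adj[u]:
                if v not in component:
                    component.add(v)
                    queue.append(v)
        seen |= component
        if len(component) > len(largest):
            largest = component

    # Each undirected edge contributes to the degree of both endpoints.
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    n_edges_largest = sum(len(adj[u]) for u in largest) // 2
    return len(adj), n_edges, len(largest), n_edges_largest


toy = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E"), ("B", "A")]
stats = network_stats(toy)
```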
Statistics of the proteins in the protein interaction network constructed based on the i2d database.
| | Number of proteins | Number of proteins in the largest component |
|---|---|---|
| | 2490 | 2481 |
| | 1942 | 1938 |
| | 297 | 297 |
| | 2193 | 2184 |
| | 1645 | 1641 |
| | 9925 | 9858 |
Statistics of the genes in the protein interaction network constructed based on the STRING database.
| | Number of genes | Number of genes in the largest component |
|---|---|---|
| | 2339 | 2310 |
| | 1706 | 1696 |
| | 277 | 275 |
| | 2062 | 2035 |
| | 1429 | 1421 |
| | 7864 | 7771 |
Fig 1. Median values of the proportions of non-disease essential proteins among n (n ∈ {1, 2, …, 12}) neighbors in the protein interaction network constructed based on the i2d database.
Fig 2. Median values of the proportions of non-disease essential proteins among n (n ∈ {1, 2, …, 11}) neighbors in the protein interaction network constructed based on the STRING database.
Median values of the proportions of non-disease essential proteins among n (n ∈ {1, 2, 3, 4, 5, 6}) neighbors of nonessential disease proteins (D−) and other proteins (O) in the protein interaction network constructed based on the i2d database.
| n | D− | O | p-value |
|---|---|---|---|
| 1 | 11.11% | 9.09% | 0.5104 |
| 2 | 20.04% | 20.92% | 7.9795e-06 |
| 3 | 14.68% | 15.82% | 5.1915e-31 |
| 4 | 7.06% | 7.62% | 4.9202e-20 |
| 5 | 4.01% | 4.32% | 1.9152e-09 |
| 6 | 0.00% | 1.64% | 3.2285e-19 |
Median values of the proportions of non-disease essential proteins among n (n ∈ {1, 2, 3, 4, 5, 6, 7}) neighbors of nonessential disease proteins (D−) and other proteins (O) in the protein interaction network constructed based on the STRING database.
| n | D− | O | p-value |
|---|---|---|---|
| 1 | 10.91% | 11.54% | 5.6717e-03 |
| 2 | 18.72% | 19.41% | 5.1421e-06 |
| 3 | 16.57% | 17.50% | 1.4300e-19 |
| 4 | 8.57% | 8.73% | 2.5534e-05 |
| 5 | 6.33% | 6.09% | 4.8792e-12 |
| 6 | 6.72% | 6.80% | 0.6847 |
| 7 | 6.25% | 6.25% | 0.2228 |
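The per-distance proportions summarized in the two tables above can be sketched as follows. This is an illustration under one interpretation, not the authors' procedure: it assumes that "n neighbors" means the proteins at shortest-path distance exactly n from a given protein, and that the median is taken over the proteins of a group that have at least one protein at that distance. The function names and the toy network are hypothetical, and the significance test behind the p-values is not reproduced here.

```python
from collections import defaultdict, deque
from statistics import median


def build_adjacency(edges):
    """Undirected adjacency sets from an iterable of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def shell_proportions(adj, source, marked, max_n):
    """Map each distance n (1..max_n) to the fraction of proteins at that
    distance from `source` that belong to `marked`."""
    dist = {source: 0}
    counts = defaultdict(lambda: [0, 0])        # n -> [total, marked hits]
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_n:                    # do not expand past max_n
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                counts[dist[v]][0] += 1
                counts[dist[v]][1] += v in marked
                queue.append(v)
    return {n: hits / total for n, (total, hits) in counts.items()}


def median_proportions(adj, group, marked, max_n):
    """Median over `group` of the per-protein proportions at each distance."""
    per_protein = [shell_proportions(adj, p, marked, max_n) for p in group]
    result = {}
    for n in range(1, max_n + 1):
        values = [props[n] for props in per_protein if n in props]
        if values:
            result[n] = median(values)
    return result


# Toy network (hypothetical): e1 and e2 play the role of non-disease
# essential proteins, d1 a nonessential disease protein, o1 and o2 others.
adj = build_adjacency([("d1", "e1"), ("d1", "o1"), ("o1", "e2"), ("o1", "o2")])
essential = {"e1", "e2"}
disease_medians = median_proportions(adj, ["d1"], essential, 2)
mixed_medians = median_proportions(adj, ["d1", "o2"], essential, 2)
```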
Fig 3. An example of network-based gene prioritization.
(a) The disease proteins a and b are selected as the training set, and c serves as the test disease protein. (b) Global distance measurements may mistake the non-disease hub protein e for a disease protein.
Statistics of the performance (the average values of enrichment score) with disease as a unit.
| | | | | | | |
|---|---|---|---|---|---|---|
| Monogenic | 0.895 | 0.941 | 23.386 | 26.266 | 27.181 | 30.725 |
| Complex | 0.829 | 0.895 | 10.476 | 11.029 | 13.855 | 14.982 |
| Cancer | 0.932 | 0.951 | 17.822 | 18.449 | 17.855 | 18.423 |
| All | 0.892 | 0.938 | 21.370 | 23.751 | 24.710 | 27.666 |
One-tailed t-tests for Table 6: NP versus competing approaches.
| | | | | |
|---|---|---|---|---|
| Monogenic | 7.488e-29 | 3.942e-30 | 4.484e-10 | 6.281e-10 |
| Complex | 0.007 | 0.004 | 0.028 | 0.018 |
| Cancer | 3.034e-04 | 1.873e-04 | 0.992 | 0.994 |
| All | 2.585e-32 | 1.754e-33 | 4.690e-08 | 1.270e-08 |
Fig 4. ROC curves.
Statistics of the performance (the average values of enrichment score) with disease as a unit to detect disease genes of 83 diseases verified after 2008.
Significances (p-values) between the results of NP and NP were calculated by the one-tailed Student's t-test.
| | | | | | p-value | p-value |
|---|---|---|---|---|---|---|
| Monogenic | 9.915 | 11.629 | 13.579 | 15.603 | 7.137e-06 | 1.006e-05 |
| Complex | 6.896 | 8.478 | 8.611 | 12.114 | 0.064 | 0.031 |
| Cancer | 6.132 | 7.669 | 8.484 | 10.336 | 0.045 | 0.041 |
| All | 9.047 | 10.773 | 12.432 | 14.548 | 2.179e-07 | 1.760e-07 |
Fig 5. Leukoencephalopathy with Vanishing White Matter Protein-Protein Interaction Network.