Jingchao Ni, Mehmet Koyuturk, Hanghang Tong, Jonathan Haines, Rong Xu, Xiang Zhang.
Abstract
BACKGROUND: Accurately prioritizing candidate disease genes is an important and challenging problem. Various network-based methods have been developed to predict potential disease genes using the disease similarity network together with molecular networks such as protein interaction or gene co-expression networks. Although these methods have been successful, they share a common limitation: they assume that all diseases share the same molecular network, so a single generic molecular network is used to predict candidate genes for every disease. However, different diseases tend to manifest in different tissues, and the molecular networks of different tissues are usually different. An ideal method should be able to incorporate tissue-specific molecular networks for different diseases.
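Network-based prioritization methods of the kind described above typically propagate information from known disease genes (the seeds) over a molecular network. As a minimal illustration only, and not the method proposed in this paper, the Python/NumPy sketch below runs a plain random walk with restart on a single molecular network. The function name and the restart probability of 0.3 are assumptions chosen for the example.

```python
import numpy as np

def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score genes by proximity to seed genes with a random walk with restart.

    adj:     symmetric (n x n) adjacency matrix of one molecular network
    seeds:   indices of known disease genes (the seed nodes)
    restart: probability of jumping back to the seeds at each step (assumed value)
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    # Column-normalize the adjacency matrix; isolated nodes keep all-zero columns.
    col_sums = adj.sum(axis=0)
    W = np.divide(adj, col_sums, out=np.zeros_like(adj), where=col_sums > 0)
    # Restart distribution: uniform over the seed genes.
    p0 = np.zeros(n)
    p0[list(seeds)] = 1.0 / len(seeds)
    p = p0.copy()
    # Iterate p <- (1 - r) W p + r p0 until the L1 change falls below tol.
    for _ in range(max_iter):
        p_next = (1 - restart) * W @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p
```

For example, on a five-gene path 0-1-2-3-4 with gene 0 as the only seed, the scores decrease monotonically with distance from the seed and sum to 1. A single shared network like this is exactly what the tissue-specific models in this paper aim to replace.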
Keywords: Disease gene prioritization; Network of networks; Tissue-specific molecular networks
Year: 2016 PMID: 27829360 PMCID: PMC5103411 DOI: 10.1186/s12859-016-1317-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Different network models for disease gene prioritization. (a) The traditional heterogeneous network model; (b) the network of networks (NoN) model, where T1 to T4 represent the tissues (and their tissue-specific molecular networks) specific to the corresponding diseases; and (c) the network of star networks (NoSN) model, where each disease corresponds to multiple molecular networks of its specific tissue. In the NoN and NoSN models, the known disease-gene associations are used as the seed nodes. In (b), the seed nodes are highlighted in blue.
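To make the NoN layout concrete, the sketch below stores a disease similarity network whose nodes each carry their own molecular network, and builds a per-disease seed vector from the known disease-gene associations. The class name, its fields, and the uniform weighting of seed genes are hypothetical choices for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class NetworkOfNetworks:
    """Hypothetical NoN container: a disease similarity network in which
    every disease node owns a (tissue-specific) molecular network."""
    disease_sim: np.ndarray                    # d x d disease similarity adjacency
    gene_ids: list                             # gene_ids[i]: genes in disease i's network
    gene_nets: list                            # gene_nets[i]: adjacency of disease i's network
    known: dict = field(default_factory=dict)  # disease index -> set of known genes

    def seed_vector(self, i):
        """Seed vector for disease i: uniform weight (an assumption) on the
        known genes of disease i that occur in its own molecular network."""
        genes = self.gene_ids[i]
        known_i = self.known.get(i, set())
        hits = [k for k, g in enumerate(genes) if g in known_i]
        e = np.zeros(len(genes))
        if hits:
            e[hits] = 1.0 / len(hits)
        return e
```

Because each disease has its own gene set, a known gene that is absent from a disease's tissue network (such as "X" in the test data) simply contributes no seed for that disease.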
Summary of symbols
| Symbol | Definition |
|---|---|
| | The adjacency matrix of the disease similarity network |
| | The adjacency matrix of the tissue-specific molecular network of disease … |
| | The adjacency matrix of the center molecular network of disease … (for NoSN) |
| | The adjacency matrix of the … disease … |
| | An … |
| | The ranking score vector of genes in … |
| | The seed vector of genes in … |
| | The ranking score vector of genes in … |
| | The seed vector of genes in … |
| | The ranking score vector of genes in … |
| | The seed vector of genes in … |
| | Number of diseases in … |
| | Number of genes in … |
| | Number of genes in … |
| | Number of genes in … |
| | Number of auxiliary molecular networks of disease … |
| | Degree of disease … |
| | The set of common genes in … |
| | The set of genes in … |
| | The set of common genes in … |
| | The set of genes in … |
| | The set of common genes in … |
| | The set of genes in … |
Fig. 2 An illustration of NoN and NoSN construction. TPPIN: tissue-specific protein-protein interaction network (PPIN). TGCN: tissue-specific gene co-expression network (GCN). First, each disease in the disease similarity network is assigned a TPPIN using the disease-tissue association matrix, provided that the two criteria shown are satisfied; this yields an NoN. Then each disease in the NoN is assigned a TGCN as its auxiliary molecular network, using the same strategy as for assigning TPPINs, to form an NoSN. Please see text for details.
AUC value comparison
| Network model | Method | AUC50 | AUC100 | AUC300 | AUC500 | AUC700 | AUC1000 |
|---|---|---|---|---|---|---|---|
| Heterogeneous network | CIPHER-DN | 0.2332*** | 0.2439*** | 0.2510*** | 0.2524*** | 0.2530*** | 0.2535*** |
| | CIPHER-SP | 0.2068*** | 0.2478*** | 0.3112*** | 0.3369*** | 0.3568*** | 0.3790*** |
| | RWRH | 0.2382*** | 0.2849*** | 0.3849*** | 0.4503** | 0.4922** | 0.5388** |
| | PRINCE | 0.2632* | 0.3065* | 0.3787** | 0.4247*** | 0.4594*** | 0.5092*** |
| | BIRW | 0.2615* | 0.3082* | 0.4095* | 0.4653 | 0.5068* | 0.5513* |
| | Katz | 0.2101*** | 0.2726*** | 0.3831** | 0.4451* | 0.4838** | 0.5289** |
| | CATAPULT | 0.1370*** | 0.1957*** | 0.3148*** | 0.3803*** | 0.4315*** | 0.4875*** |
| NoN | | 0.2711* | 0.3235 | 0.4244 | 0.4815 | 0.5233 | 0.5665 |
| NoSN (a) | | | | | | | |
| NoSN (b) | | 0.2900 | 0.3400 | 0.4355 | 0.4882 | 0.5331 | 0.5798 |

(a) NoSN with one set of tissue-specific GCNs. (b) NoSN with two sets of tissue-specific GCNs. p-value ranges: * denotes 0.005 to 0.05, ** denotes 0.0005 to 0.005, *** denotes < 0.0005.
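As a worked check on the AUC table, the short Python script below finds, for each cutoff, the strongest heterogeneous-network baseline and the absolute and relative gain of NoSN with two sets of tissue-specific GCNs (NoSN b) over it. The values are copied from the table; the NoSN a row is left out because its values are not available in this record, and the script says nothing about statistical significance. The helper name is hypothetical.

```python
# AUC values copied from the table (heterogeneous-network baselines and NoSN b).
columns = ["AUC50", "AUC100", "AUC300", "AUC500", "AUC700", "AUC1000"]
baselines = {
    "CIPHER-DN": [0.2332, 0.2439, 0.2510, 0.2524, 0.2530, 0.2535],
    "CIPHER-SP": [0.2068, 0.2478, 0.3112, 0.3369, 0.3568, 0.3790],
    "RWRH":      [0.2382, 0.2849, 0.3849, 0.4503, 0.4922, 0.5388],
    "PRINCE":    [0.2632, 0.3065, 0.3787, 0.4247, 0.4594, 0.5092],
    "BIRW":      [0.2615, 0.3082, 0.4095, 0.4653, 0.5068, 0.5513],
    "Katz":      [0.2101, 0.2726, 0.3831, 0.4451, 0.4838, 0.5289],
    "CATAPULT":  [0.1370, 0.1957, 0.3148, 0.3803, 0.4315, 0.4875],
}
nosn_b = [0.2900, 0.3400, 0.4355, 0.4882, 0.5331, 0.5798]

def best_baseline_gaps(baselines, target, columns):
    """For each column: (column, best baseline, its AUC, absolute gain, relative gain in %)."""
    rows = []
    for j, col in enumerate(columns):
        # Pick the baseline with the highest AUC in column j.
        name, best = max(((m, v[j]) for m, v in baselines.items()), key=lambda t: t[1])
        gain = target[j] - best
        rows.append((col, name, best, round(gain, 4), round(100 * gain / best, 1)))
    return rows

for row in best_baseline_gaps(baselines, nosn_b, columns):
    print(row)
```

On these numbers the strongest baseline is PRINCE at AUC50 and BIRW at every other cutoff, and NoSN b is ahead of it in every column.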
Fig. 3 Robustness evaluation. The threshold is used to select the tissue-specific genes from which the tissue-specific GCNs are constructed.
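The caption states only that a threshold selects the tissue-specific genes used to build each tissue-specific GCN. The sketch below shows one plausible way such a pipeline could look, under clearly assumed rules: a gene counts as tissue-specific if its mean expression in the tissue exceeds its mean in the other samples by more than `spec_threshold` standard deviations, and two selected genes are linked if their Pearson correlation across the tissue's samples has magnitude at least `corr_threshold`. The function name, both rules, and both default values are hypothetical, not the paper's procedure.

```python
import numpy as np

def tissue_specific_gcn(expr, tissue_cols, spec_threshold=2.0, corr_threshold=0.8):
    """Hypothetical tissue-specific gene co-expression network.

    expr:        genes x samples expression matrix covering all tissues
    tissue_cols: column indices of the samples from the target tissue
    Returns the indices of the selected genes and their 0/1 adjacency matrix.
    """
    expr = np.asarray(expr, dtype=float)
    in_t = np.zeros(expr.shape[1], dtype=bool)
    in_t[tissue_cols] = True
    rest = expr[:, ~in_t]
    # Assumed specificity score: standardized gap between in-tissue and out-of-tissue means.
    score = (expr[:, in_t].mean(axis=1) - rest.mean(axis=1)) / (rest.std(axis=1) + 1e-12)
    genes = np.flatnonzero(score > spec_threshold)
    if genes.size < 2:
        return genes, np.zeros((genes.size, genes.size))
    # Pearson correlation computed only over the target tissue's samples.
    corr = np.nan_to_num(np.corrcoef(expr[np.ix_(genes, np.flatnonzero(in_t))]))
    adj = (np.abs(corr) >= corr_threshold).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return genes, adj
```

Raising `spec_threshold` keeps fewer, more strongly tissue-specific genes, which is the kind of knob a robustness evaluation over the threshold would vary.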
Fig. 4 Effects of parameters on the performance of CR and CRSTAR
Fig. 5 Learned weights and corresponding ranking inconsistencies
Fig. 6 ROC curves and AUC value comparisons for predicting new associations. The black solid lines in (a) and (b) show what random guessing would achieve.
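To show how an ROC-based comparison of candidate-gene rankings can be computed, the sketch below evaluates the full area under the ROC curve through its Mann-Whitney interpretation: the probability that a randomly chosen true association is scored above a randomly chosen non-association, with ties counted as one half. A random ranking has an expected AUC of 0.5, corresponding to the diagonal. This is a generic sketch; the truncated AUC50 to AUC1000 variants in the table are not defined in this record and are not reproduced here, and the function name is an assumption.

```python
import numpy as np

def roc_auc(scores, labels):
    """Full ROC AUC of a ranking.

    scores: ranking scores (higher means more likely to be associated)
    labels: 1 for held-out true associations, 0 otherwise
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative")
    # Compare every positive with every negative; ties count as half a win.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)
```

For example, `roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])` returns 0.75, because three of the four positive-negative pairs are ordered correctly. The pairwise comparison is quadratic in memory, which is fine for a sketch; a rank-based formula scales better for genome-wide candidate lists.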