Zhen Tian, Maozu Guo, Chunyu Wang, Xiaoyan Liu, Shiming Wang.
Abstract
BACKGROUND: In recent years, biological interaction networks have become the basis of some essential studies and have been applied successfully in many areas. Some typical networks, such as protein-protein interaction networks, have already been investigated systematically. However, little work has so far been done on the construction of gene functional similarity networks. In this research, we build a highly reliable gene functional similarity network to promote its further application.
Keywords: Gene functional similarity network; Gene ontology; Referenced gene association network; Topological similarity
Year: 2017 PMID: 29297381 PMCID: PMC5751769 DOI: 10.1186/s12859-017-1969-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 The flowchart for the construction of RGFSN
Fig. 2 Distribution of functional similarity based on seven different methods. The results of the single gene functional similarity methods are biased, while the similarity values of the integrated method are distributed evenly from 0 to 1
Fig. 3 Relationship between gene functional similarity scores and protein proximity scores. Genes connected by longer paths have smaller functional similarity values
Summary properties of four biological networks
| Property | HPRD | BioGRID | DIP | RGFSN |
|---|---|---|---|---|
| Number of nodes | 9616 | 20,024 | 5176 | 8765 |
| Number of edges | 39,239 | 325,377 | 22,977 | 41,646 |
| Clustering coefficient | 0.102 | 0.106 | 0.098 | 0.118 |
| Diameter | 14 | 8 | 10 | 8 |
| Radius | 1 | 1 | 1 | 5 |
| Centralization | 0.027 | 0.102 | 0.054 | 0.028 |
| Shortest paths | 84,981,088 | 398,421,606 | 26,066,196 | 768,063,238 |
| Characteristic path length | 4.209 | 3.306 | 3.986 | 4.158 |
| Average number of neighbors | 7.704 | 23.862 | 8.742 | 9.764 |
| Density | 0.001 | 0.001 | 0.002 | 0.001 |
| Heterogeneity | 1.889 | 2.347 | 1.778 | 1.020 |
Four fitting models of degree distribution for each network
| Distribution model | P | RGFSN | BioGRID | DIP | HPRD |
|---|---|---|---|---|---|
| Gaussian distribution | | 4.26 ± 1.04 | 2.85 ± 1.09 | 4.56 ± 0.88 | 7.03 ± 1.72 |
| | | 7.80 ± 0.03 | 1.54 ± 0.06 | −8.83 ± 10.13 | −0.95 ± 1.12 |
| | | 4.18 ± 0.08 | 1.51 ± 2.91 | 3.36 ± 0.18 | 3.65 ± 2.06 |
| | | 7.68 ± 0.07 | −5.43 ± 3.12 | 6.57 ± 1.12 | 1.02 ± 1.77 |
| | | 0.7652 | 0.2695 | 0.9837 | 0.9822 |
| Power law distribution | | 6.64 ± 1.03 | 3.86 ± 0.035 | 1.29 ± 0.032 | 2.38 ± 0.06 |
| | | 0.850 ± 0.19 | −1.04 ± 0.01 | −1.01 ± 0.03 | −1.10 ± 0.03 |
| | | | | 0.9628 | 0.9623 |
| Log-normal distribution | | 4.89 ± 1.96 | 3.03 ± 0.94 | 0.45 ± 4.21 | 0.84 ± 7.91 |
| | | 7.36 ± 0.98 | 1.18 ± 0.26 | 1.09 ± 0.72 | 1.09 ± 0.69 |
| | | 0.69 ± 0.10 | 0.82 ± 0.26 | 1.12 ± 0.69 | 1.09 ± 0.67 |
| | | 8.17 ± 3.15 | 5.50 ± 0.44 | 1.86 ± 0.17 | 3.45 ± 0.32 |
| | | 0.6469 | 0.7691 | 0.6214 | 0.6205 |
| Exponential distribution | | 6.68 ± 1.32 | 1.30 ± 0.15 | 1.55 ± 0.25 | 2.42 ± 5.19 |
| | | 9.35 ± 0.96 | 6.47 ± 0.39 | 1.58 ± 0.03 | 2.77 ± 0.06 |
| | | 6.68 ± 0.78 | 1.70 ± 0.11 | 2.96 ± 0.08 | 6.35 ± 0.32 |
| | | 0.9816 | 0.9368 | | |
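As an illustration of what fitting a degree distribution involves, the sketch below fits a power law of the form count(k) ≈ a · k^b by ordinary least squares on log-transformed data. This is a common simple approach and is only an assumption here: the excerpt does not state the fitting procedure or the exact model forms behind the table, and the function name `fit_power_law` and the input format (a mapping from degree to node count) are hypothetical.

```python
import math


def fit_power_law(degree_counts):
    """Fit count(k) ~ a * k**b by least squares in log-log space.

    degree_counts maps a node degree k to the number of nodes with that
    degree. Returns (a, b). Illustrative only; not the source's method.
    """
    # Keep only points where both logarithms are defined.
    points = [
        (math.log(k), math.log(c))
        for k, c in degree_counts.items()
        if k > 0 and c > 0
    ]
    if len({x for x, _ in points}) < 2:
        raise ValueError("need at least two distinct positive degrees")

    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n

    # Slope and intercept of the regression line log(c) = log(a) + b*log(k).
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic data that follows count = 8 / k exactly.
a, b = fit_power_law({1: 8, 2: 4, 4: 2, 8: 1})
print(a, b)
```

A negative exponent b indicates that high-degree nodes are rare. Log-log regression is known to be sensitive to noise in the tail of the distribution, so it should be read as a rough diagnostic rather than a rigorous test of scale-free behaviour.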
Fig. 4 The graphic view of the degree distributions for each network
Results of protein complex prediction based on different networks
| Network | Precision | Recall | F-measure |
|---|---|---|---|
| STRING | 0.213 | 0.268 | 0.236 |
| HumanNet | 0.151 | 0.142 | 0.146 |
| 5NN-IGFSN | 0.275 | 0.223 | 0.246 |
| RGFSN | 0.324 | 0.347 | 0.314 |
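For readers who want to relate the three columns, the sketch below computes an F-measure as the harmonic mean of precision and recall. This is the usual definition and is assumed here; the excerpt does not state which definition or aggregation was used, and the function name `f_measure` is illustrative.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (assumed F1 definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall taken from the HumanNet and 5NN-IGFSN rows.
print(round(f_measure(0.151, 0.142), 3))
print(round(f_measure(0.275, 0.223), 3))
```

Under this assumption the HumanNet and 5NN-IGFSN rows are reproduced to three decimals. Because the aggregation used for the reported values is not given here, other rows need not match this formula exactly.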
Fig. 5 The graph view of three selected predicted protein complexes