Shide Liang, Dandan Zheng, Daron M Standley, Huarong Guo, Chi Zhang.
Abstract
BACKGROUND: Construction of a reliable network remains the bottleneck for network-based protein function prediction. We built an artificial network model, the protein overlap network (PON), for the entire genome of each of four organisms: yeast, fly, worm, and human. Each node of the network represents a protein, and two proteins are connected if they share a domain according to the InterPro database.
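The construction rule described above (one node per protein, an edge whenever two proteins share a domain) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `build_pon`, the input layout (a dict from protein ID to a set of domain IDs), and the toy proteins `P1` to `P5` are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_pon(protein_domains):
    """Return an adjacency dict: protein -> set of proteins sharing >= 1 domain."""
    # Invert the mapping so that each domain lists the proteins containing it.
    domain_to_proteins = defaultdict(set)
    for protein, domains in protein_domains.items():
        for domain in domains:
            domain_to_proteins[domain].add(protein)

    # Every protein is a node, even if it ends up with no edges.
    adjacency = {protein: set() for protein in protein_domains}
    for proteins in domain_to_proteins.values():
        # All proteins that carry the same domain are pairwise connected.
        for a, b in combinations(sorted(proteins), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Toy input: hypothetical proteins; the domain IDs serve only as labels here.
toy_domains = {
    "P1": {"PF00069", "PF05773"},
    "P2": {"PF05773"},
    "P3": {"PF00270", "PF00271"},
    "P4": {"PF00271"},
    "P5": set(),
}
pon = build_pon(toy_domains)
```

Grouping by domain first avoids comparing every pair of proteins directly, although a domain shared by many proteins still yields a number of edges that grows quadratically with the size of that group.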
Year: 2013 PMID: 23866986 PMCID: PMC3720179 DOI: 10.1186/1752-0509-7-61
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Figure 1. Relationship between quantity and size of sub-graphs. The PON of an entire genome was calculated.
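The quantity-versus-size relationship in this figure amounts to counting how many connected sub-graphs of each size a PON contains. A rough sketch, assuming the network is stored as an adjacency dict (the name `component_sizes` and the toy graph are hypothetical):

```python
from collections import Counter, deque


def component_sizes(adjacency):
    """Return the size of every connected sub-graph, found by breadth-first search."""
    seen = set()
    sizes = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        sizes.append(size)
    return sizes


# Toy graph: one sub-graph of size 3, one of size 2, and two isolated nodes.
toy_graph = {
    "A": {"B"}, "B": {"A", "C"}, "C": {"B"},
    "D": {"E"}, "E": {"D"},
    "F": set(), "G": set(),
}
sizes = component_sizes(toy_graph)
size_histogram = Counter(sizes)  # maps sub-graph size -> number of sub-graphs
```

The largest entry in `sizes` would correspond to what the tables below call the main sub-graph.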
Network properties of the main sub-graphs of four investigated genomes
| Yeast | 762 | 237 | 146 | 38.4 | 5.65 | 0.94 |
| Fly | 6,099 | 915 | 385 | 101.9 | 4.52 | 0.93 |
| Worm | 7,742 | 745 | 343 | 121.3 | 5.02 | 0.95 |
| Human | 25,455 | 1,478 | 551 | 312.5 | 4.14 | 0.92 |
The mean value was calculated for the degree and clustering coefficient of all nodes and the shortest path length between any two nodes in the network.
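The three averaged quantities named in this note (degree, clustering coefficient, shortest path length) can be computed for a connected sub-graph roughly as shown below. The function names are hypothetical, and two conventions are assumptions: nodes with fewer than two neighbors contribute a clustering coefficient of 0, and the graph is assumed to be connected, as a main sub-graph is.

```python
from collections import deque
from itertools import combinations


def mean_degree(adj):
    return sum(len(neighbors) for neighbors in adj.values()) / len(adj)


def mean_clustering(adj):
    total = 0.0
    for neighbors in adj.values():
        k = len(neighbors)
        if k < 2:
            continue  # assumed convention: coefficient is 0 when degree < 2
        # Count the edges that exist among the neighbors of this node.
        links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def mean_shortest_path(adj):
    total, pairs = 0, 0
    for source in adj:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Toy graph: a triangle A-B-C with a pendant node D attached to C.
toy = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
stats = (mean_degree(toy), mean_clustering(toy), mean_shortest_path(toy))
```

For genome-scale networks an all-pairs breadth-first search is expensive; a graph library would be the more practical choice there.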
Figure 2. Main sub-graph of yeast PON. (A) Main sub-graph consisting of two modules. (B) Network connections between two modules. The dashed line represents the edge between two nodes and the arrow indicates the domain connection within a protein. The two modules in the graph are connected by the RWD domain [Pfam: PF05773], which is associated with protein binding function. Pkinase [Pfam: PF00069, upper module], DEAD [Pfam: PF00270, lower module], and Helicase_C [Pfam: PF00271, lower module] are the prevailing domains and are associated with ATP binding function. Only domains annotated in the InterPro database are shown for proteins GCN2 [UniProt: P15442], GIR2 [UniProt: Q03768], IMPACT homolog [UniProt: P25637], and a putative ATP-dependent RNA helicase [UniProt: Q06698].
Figure 3. Distribution of degree values in human PON. The points in the box represent huge clusters of proteins with the same domain composition.
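Proteins with an identical domain composition share exactly the same neighbors (apart from each other), so a large group of them appears as many nodes with one common degree value. The sketch below tabulates a degree distribution and groups proteins by domain composition; the function names and toy data are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter, defaultdict


def degree_distribution(adjacency):
    """Map each degree value to the number of nodes that have it."""
    return Counter(len(neighbors) for neighbors in adjacency.values())


def composition_clusters(protein_domains):
    """Group proteins whose sets of domains are identical."""
    clusters = defaultdict(list)
    for protein, domains in protein_domains.items():
        clusters[frozenset(domains)].append(protein)
    return clusters


# Toy data: P1, P2, P3 all consist of the single (hypothetical) domain D1.
toy_domains = {
    "P1": {"D1"}, "P2": {"D1"}, "P3": {"D1"},
    "P4": {"D1", "D2"}, "P5": {"D2"},
}
# Adjacency written out by hand from the share-a-domain rule.
toy_adjacency = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1", "P3", "P4"},
    "P3": {"P1", "P2", "P4"},
    "P4": {"P1", "P2", "P3", "P5"},
    "P5": {"P4"},
}
distribution = degree_distribution(toy_adjacency)
clusters = composition_clusters(toy_domains)
```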
Results of protein function prediction with a single-genome PON
| Yeast | 30.2 | 37.1 | 33.9 | 34.0 | 63.0 |
| Fly | 34.2 | 54.1 | 47.4 | 37.5 | 59.6 |
| Worm | 32.6 | 53.0 | 38.3 | 36.0 | 61.0 |
| Human | 40.0 | 68.3 | 55.8 | 43.3 | 70.4 |
(a) Percentage of correctly predicted GO terms averaged throughout the whole genome.
(b) A node was predictable if its neighboring nodes were annotated with at least one GO term.
(c) Percentage of predictable and annotated nodes.
(d) The GO term with the highest frequency of occurrence was evaluated.
(e) The best of the top 3 predictions was evaluated in cases where the neighboring nodes were annotated with at least three GO terms.
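The notes above describe a frequency-based scheme: a node is predictable when its neighbors carry at least one GO term, the most frequent neighbor term is the top prediction, and a best-of-top-3 score applies when at least three terms are available. A minimal sketch under those stated rules follows. The function names, the tie-breaking by term ID, and the toy annotations are assumptions, and the scoring is not claimed to reproduce the paper's exact accuracy measure.

```python
from collections import Counter


def rank_go_terms(node, adjacency, annotations):
    """Rank GO terms by how many neighbors of `node` are annotated with them."""
    counts = Counter()
    for neighbor in adjacency[node]:
        counts.update(annotations.get(neighbor, ()))
    # Descending frequency; ties broken by term ID (an assumed convention).
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ordered]


def evaluate(adjacency, annotations):
    """Return the fraction of correct top-1 and best-of-top-3 predictions."""
    top1_hits = top1_total = top3_hits = top3_total = 0
    for node, true_terms in annotations.items():
        ranked = rank_go_terms(node, adjacency, annotations)
        if not true_terms or not ranked:
            continue  # node is unannotated or not predictable
        top1_total += 1
        top1_hits += ranked[0] in true_terms
        if len(ranked) >= 3:
            top3_total += 1
            top3_hits += any(term in true_terms for term in ranked[:3])
    return {
        "top1": top1_hits / top1_total if top1_total else None,
        "top3": top3_hits / top3_total if top3_total else None,
    }


# Toy star network: X is connected to Y, Z, and W (hypothetical proteins).
toy_adjacency = {"X": {"Y", "Z", "W"}, "Y": {"X"}, "Z": {"X"}, "W": {"X"}}
toy_annotations = {
    "X": {"GO:0005524"},
    "Y": {"GO:0005524", "GO:0003676"},
    "Z": {"GO:0005524"},
    "W": {"GO:0005515"},
}
ranking_for_x = rank_go_terms("X", toy_adjacency, toy_annotations)
scores = evaluate(toy_adjacency, toy_annotations)
```

A node's own annotation never enters its prediction because a node is not its own neighbor, so the evaluation behaves like a leave-one-out test.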
Figure 4. Effect of domain diversity on function prediction accuracy. A GO term was used for prediction only if it was associated with at least a certain number of domain types at the neighboring nodes.
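The domain-diversity filter in this caption can be sketched as below. The interpretation is an assumption: a term's "domain types" are taken to be the distinct domains found in the neighboring proteins annotated with that term. The function name, parameters, and toy data are hypothetical.

```python
from collections import defaultdict


def diverse_go_terms(node, adjacency, annotations, protein_domains, min_domain_types):
    """Keep GO terms linked to at least `min_domain_types` distinct domains
    among the neighbors of `node` (assumed reading of 'domain types')."""
    term_domains = defaultdict(set)
    for neighbor in adjacency[node]:
        for term in annotations.get(neighbor, ()):
            term_domains[term].update(protein_domains.get(neighbor, ()))
    return {
        term
        for term, domains in term_domains.items()
        if len(domains) >= min_domain_types
    }


# Toy data: Q has two neighbors, R and S (hypothetical proteins and labels).
toy_adjacency = {"Q": {"R", "S"}, "R": {"Q"}, "S": {"Q"}}
toy_domains = {"Q": {"D1"}, "R": {"D1", "D2"}, "S": {"D1"}}
toy_annotations = {"R": {"T1"}, "S": {"T1", "T2"}}

strict = diverse_go_terms("Q", toy_adjacency, toy_annotations, toy_domains, 2)
loose = diverse_go_terms("Q", toy_adjacency, toy_annotations, toy_domains, 1)
```

Raising the threshold trades coverage for confidence: fewer terms survive, but each surviving term is supported by more varied domain evidence.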
Results of protein function prediction with a composite PON of four genomes
| Yeast | 32.8 | 45.9 | 38.9 | 15.4 |
| Fly | 34.4 | 54.1 | 36.8 | 17.2 |
| Worm | 36.1 | 48.2 | 37.2 | 32.0 |
| Human | 38.4 | 57.0 | 38.8 | 19.4 |
(a) Annotated and predictable with a single-genome PON.
(b) New prediction with the composite PON.
Results of protein function prediction with the second layer nodes
| Yeast | 20.6 | 41.5 | 26.1 | 30.2 | 28.2 | 17.6 |
| Fly | 24.7 | 51.7 | 28.4 | 42.8 | 33.2 | 30.0 |
| Worm | 23.5 | 46.6 | 28.1 | 37.1 | 34.7 | 24.9 |
| Human | 28.5 | 55.3 | 31.6 | 47.5 | 36.7 | 33.0 |
The composite network of four genomes was used for prediction.
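Second layer nodes are read here as the neighbors of a node's direct neighbors, excluding the node itself and its first layer. That reading is an assumption, as are the function name and the toy graph in this sketch.

```python
def second_layer(node, adjacency):
    """Return nodes exactly two steps away from `node`."""
    first = adjacency[node]
    reached = set()
    for neighbor in first:
        reached |= adjacency[neighbor]
    # Drop the first layer and the query node itself.
    return reached - first - {node}


# Toy graph: chain A-B-C-D, plus E connected to both A and B.
toy_adjacency = {
    "A": {"B", "E"},
    "B": {"A", "C", "E"},
    "C": {"B", "D"},
    "D": {"C"},
    "E": {"A", "B"},
}
layer_of_a = second_layer("A", toy_adjacency)
layer_of_d = second_layer("D", toy_adjacency)
```

The GO terms of the returned nodes could then be ranked by frequency in the same way as for direct neighbors.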
Results of prediction and evaluation with GOA database
| Yeast | 48.3 | 75.2 | 75.7 | 90.0 |
| Fly | 63.6 | 82.2 | 81.8 | 89.0 |
| Worm | 70.8 | 66.8 | 84.6 | 91.3 |
| Human | 60.7 | 84.6 | 80.6 | 90.8 |
| Yeast | 35.5 | 85.0 | 68.0 | 86.1 |
| Fly | 51.1 | 84.8 | 77.0 | 85.8 |
| Worm | 56.0 | 67.8 | 77.5 | 86.2 |
| Human | 49.3 | 79.1 | 72.7 | 84.6 |