Lingtao Su, Guixia Liu, Han Wang, Yuan Tian, Zhihui Zhou, Liang Han, Lun Yan.
Abstract
Identifying protein complexes is of great importance for understanding cellular organization and function. Traditional computational methods for protein complex prediction rely mainly on the topology of protein-protein interaction (PPI) networks and seldom take biological information about proteins, such as Gene Ontology (GO), into consideration. Meanwhile, environment-dependent analysis of protein complex evolution has been poorly studied, partly because of the lack of high-precision protein complex datasets. In this paper, a combined PPI network that integrates both GO and the expression values of the relevant protein-coding genes is introduced to predict protein complexes. A novel protein complex prediction method, GECluster (Gene Expression Cluster), is proposed; it applies a seed node expansion strategy to the combined PPI network. Applied to a training combined PPI network, GECluster predicted more credible complexes than peer methods. The results indicate that using a combined PPI network can effectively improve protein complex prediction accuracy. To study how protein complexes evolve within cells as the surrounding environment changes, GECluster was applied to seven combined PPI networks constructed from a test dataset covering the yeast response to stress throughout a wine fermentation process. The results showed that, as alcohol concentration rises, protein complexes within yeast cells gradually evolve from one state to another. In addition, the numbers of core and attachment proteins within a protein complex both changed significantly.
Keywords: GO; PPI; core and attachment protein; evolution; gene expression value; protein complex
Year: 2014 PMID: 26019559 PMCID: PMC4433864 DOI: 10.1080/13102818.2014.946700
Source DB: PubMed Journal: Biotechnol Biotechnol Equip ISSN: 1310-2818 Impact factor: 1.632
Figure 1. Flowchart diagram on how to construct dynamic PPI networks.
Note: val(A) represents the expression value of gene ‘A’; Mean(A) represents the mean expression value of gene ‘A’, which is calculated by computing the average expression value of gene ‘A’ at different time points.
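The note above defines Mean(A) as the average of a gene's expression values over its time points. A minimal Python sketch of that calculation follows. The function names are hypothetical, and the `val(A) > Mean(A)` comparison in the second helper is only an assumption about how the two quantities might be compared in the flowchart; the source does not state the rule.

```python
from typing import List, Sequence


def mean_expression(values: Sequence[float]) -> float:
    """Mean(A): average expression value of one gene across its time points."""
    if not values:
        raise ValueError("at least one time point is required")
    return sum(values) / len(values)


def above_mean_time_points(values: Sequence[float]) -> List[int]:
    """Indices of time points where val(A) exceeds Mean(A).

    ASSUMPTION: the strict '>' comparison is illustrative only and is not
    taken from the source.
    """
    mean = mean_expression(values)
    return [t for t, val in enumerate(values) if val > mean]


# Hypothetical expression profile of one gene at three time points.
profile = [2.0, 4.0, 6.0]
gene_mean = mean_expression(profile)
high_points = above_mean_time_points(profile)
```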
Figure 2. Flowchart diagram on how to construct combined PPI networks.
Figure 3. Flowchart diagram of the GECluster algorithm.
Notes: M(i,j) is the weighted matrix of combined PPI network whose elements represent the edge weight value in combined PPI network. D(i) is an array, with element values ordered from large to small and the value represents the degree of network node. Cluster_Coefficient1 represents the Cluster_Coefficient value of the new cluster before the selected node is included. Cluster_Coefficient2 represents the Cluster_Coefficient value of the new cluster after the selected node is added into the cluster.
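The quantities named in these notes can be tied together in a rough seed-expansion sketch. It is not the published GECluster algorithm. Three things are assumptions made for illustration: the cluster score (mean edge weight over all node pairs), the acceptance rule (keep a node when the score does not drop), and the choice to produce non-overlapping clusters. `M` is a symmetric weight matrix given as a list of lists, and `D` holds node indices ordered by decreasing degree.

```python
from itertools import combinations
from typing import List, Set


def cluster_coefficient(M: List[List[float]], cluster: Set[int]) -> float:
    """ASSUMED score: mean edge weight over all node pairs in the cluster."""
    if len(cluster) < 2:
        return 0.0
    pairs = list(combinations(sorted(cluster), 2))
    return sum(M[i][j] for i, j in pairs) / len(pairs)


def expand_from_seeds(M: List[List[float]]) -> List[Set[int]]:
    """Grow clusters from high-degree seeds (illustrative sketch only)."""
    n = len(M)
    degree = [sum(1 for j in range(n) if j != i and M[i][j] > 0) for i in range(n)]
    # D(i): node indices ordered from largest to smallest degree.
    D = sorted(range(n), key=lambda i: degree[i], reverse=True)
    assigned: Set[int] = set()
    clusters: List[Set[int]] = []
    for seed in D:
        if seed in assigned:
            continue
        cluster = {seed}
        improved = True
        while improved:
            improved = False
            # Unassigned nodes adjacent to at least one current member.
            candidates = sorted(
                j for j in range(n)
                if j not in cluster and j not in assigned
                and any(M[i][j] > 0 for i in cluster)
            )
            for node in candidates:
                cc1 = cluster_coefficient(M, cluster)           # before adding
                cc2 = cluster_coefficient(M, cluster | {node})  # after adding
                if cc2 >= cc1:  # ASSUMED acceptance rule
                    cluster.add(node)
                    improved = True
        if len(cluster) > 1:
            clusters.append(cluster)
        assigned |= cluster
    return clusters


def toy_matrix() -> List[List[float]]:
    """Two triangles (weight 1.0) joined by one weak edge (weight 0.1)."""
    M = [[0.0] * 6 for _ in range(6)]
    strong = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
    for i, j in strong:
        M[i][j] = M[j][i] = 1.0
    M[2][3] = M[3][2] = 0.1
    return M


clusters = expand_from_seeds(toy_matrix())
```

On this toy matrix the weak bridging edge lowers the score, so the two triangles stay separate clusters.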
Figure 4. Degree and path length distributions of the inferred dynamic PPI network and the static PPI network.
Network information.
| Network name | Node number | Edge number |
|---|---|---|
| Static PPI network | 4971 | 21937 (no self-loop) |
| Dynamic PPI network | 2078 | 6823 |
| Combined PPI network | 2078 | 1475155 (added edges) |
Figure 5. Prediction accuracy comparison between GECluster, MCODE and CFinder.
Complex function annotation results.
| Algorithm | Complex | Function term | Size | Annotation score | P-value |
|---|---|---|---|---|---|
| GECluster | CDC16, CDC26, APC11, CDC27, DOC1, APC2 | Cyclin catabolic process | 6 | 100% | 3.5E−14 |
| | RRP43, SKI6, CSL4, RRP45, RRP46, DIS3, RRP4 | Exosome | 7 | 100% | 9.3E−18 |
| | RPB8, RPC25, RPC34, RPC17 | DNA-directed RNA polymerase III complex | 4 | 100% | 8.2E−8 |
| | ORC3, ORC1, ORC2, ORC5, ORC6, ORC4 | Origin recognition complex | 6 | 100% | 3.5E−16 |
| | KTI12, ELP3, ELP2, IKI1 | tRNA wobble uridine modification | 4 | 100% | 1.4E−7 |
| | SWC4, YAF9, HTZ1, SWC7, SWR1 | Chromatin regulator | 5 | 100% | 4.8E−8 |
| | SAS5, SAS4, SAS2 | SAS acetyltransferase complex | 3 | 100% | 2.8E−7 |
| | MPE1, YSH1, YTH1, FIP1, PTA1, CFT1 | mRNA cleavage and polyadenylation specificity factor complex | 6 | 100% | 2.6E−13 |
| | CLF1, PRP19, PRP45, CEF1 | Spliceosome | 4 | 100% | 5.2E−7 |
| | SYF1, SYF2, ISY1, PRP19, CLF1 | First spliceosomal transesterification activity | 5 | 100% | 1.6E−11 |
| MCODE | MED2, GAL11, MED8, ROX3, MED7, SRB4, SPT15, SRB5 | Srb-mediator complex | 8 | 85.7% | 1.2E−13 |
| | CDC26, CDC16, CDC27, APC2, DOC1, APC11 | Cyclin catabolic process | 6 | 100% | 3.5E−14 |
| | GAS3, GPI8, NSG1, PHO86, GPI2, SUR2, BSD2 | Endoplasmic reticulum | 7 | 85.7% | 4.5E−5 |
| | RIX1, IPI3, BUD20, NOG2, SDA1, ARX1, NOP15 | Ribosomal large subunit biogenesis | 7 | 85.7% | 3.4E−9 |
| | TUM1, NCS6, UBA4, NCS2 | Wobble position uridine thiolation | 4 | 100% | 5.2E−10 |
| | GIM5, YKE2, TUB4, PAC10 | Tubulin complex assembly | 4 | 75% | 5.3E−6 |
| | HRR25, LTV1, RIO2, TSR1, NUG1, RPS28B, EDC3 | Ribonucleoprotein complex biogenesis | 7 | 100% | 2.8E−7 |
| | MED2, GAL11, MED8, ROX3, MED7, SRB4, SPT15, SRB5 | Srb-mediator complex | 8 | 85.7% | 1.2E−6 |
| | ELP4, RPO21, RPB5, TFG2, RPB7, RPB2, RPB9, RPB3, RPB4, IKI3, ELP3, ELP2, IKI1, ELP6 | DNA-directed RNA polymerase II, core complex | 14 | 50% | 2.2E−13 |
Note: Each entry in the Complex column lists the proteins of a complex predicted by the corresponding method, labelled by gene name (named by Committee of Human Gene Nomenclature) and separated by commas. Function term describes the function of each complex. Annotation score is the number of proteins in the complex that carry the annotated function term, divided by the total number of proteins in the complex.
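The annotation score described in this note is a simple ratio, sketched below in Python. The function name and the protein labels `P1` to `P4` are hypothetical placeholders, not entries from the table.

```python
from typing import Iterable


def annotation_score(complex_proteins: Iterable[str],
                     annotated_proteins: Iterable[str]) -> float:
    """Percentage of proteins in a complex that carry the function term."""
    members = set(complex_proteins)
    if not members:
        raise ValueError("a complex must contain at least one protein")
    # Count only annotated proteins that actually belong to the complex.
    hits = members & set(annotated_proteins)
    return 100.0 * len(hits) / len(members)


# Hypothetical complex of four proteins, three of which carry the term.
score = annotation_score(["P1", "P2", "P3", "P4"], {"P1", "P2", "P3"})
```

With three of four proteins annotated, the score is 75%.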
Figure 6. Influence of FS_Weight min on Precision, Recall and F1.
Figure 7. Influence of FS_Weight min on Precision, Recall and F1.
Information about the dynamic and combined PPI networks inferred from the test datasets, and the number of complexes predicted for each.
| Dynamic network | Node number | Edge number | Clustering coefficient | Network diameter | Characteristic path length | Combined network | Added edges | Complex numbers |
|---|---|---|---|---|---|---|---|---|
| DPPI1 | 2227 | 7230 | 0.126 | 12 | 4.226 | CPPI1 | 1,612,056 | 199 |
| DPPI2 | 2123 | 6954 | 0.101 | 10 | 4.067 | CPPI2 | 1,404,088 | 134 |
| DPPI3 | 2124 | 6624 | 0.121 | 12 | 4.177 | CPPI3 | 1,419,210 | 174 |
| DPPI4 | 1945 | 4987 | 0.103 | 12 | 4.443 | CPPI4 | 1,238,366 | 101 |
| DPPI5 | 2081 | 5067 | 0.075 | 11 | 4.445 | CPPI5 | 1,386,405 | 87 |
| DPPI6 | 1921 | 4467 | 0.081 | 11 | 4.564 | CPPI6 | 1,138,527 | 95 |
| DPPI7 | 1506 | 3117 | 0.06 | 12 | 4.709 | CPPI7 | 698,317 | 70 |
Note: Node number is the number of proteins in the dynamic network; clustering coefficient measures the degree to which nodes in a graph tend to cluster together; network diameter is the largest shortest-path distance between any pair of nodes; characteristic path length is the average number of edges in the shortest paths between all vertex pairs; added edges is the number of edges added when constructing the combined PPI network; complex numbers is the number of complexes predicted.
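The three topological measures in this table can be computed from an edge list with breadth-first search, as in the Python sketch below. Two points are assumptions rather than statements from the source: the clustering coefficient is taken as the average of the local (per-node) coefficients, and path statistics are computed over reachable node pairs only. The function names and the toy edge list are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Hashable, Iterable, Set, Tuple


def bfs_distances(adj: Dict[Hashable, Set[Hashable]], source: Hashable) -> Dict[Hashable, int]:
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def network_statistics(edges: Iterable[Tuple[Hashable, Hashable]]) -> Tuple[float, int, float]:
    """Return (clustering coefficient, diameter, characteristic path length)."""
    adj: Dict[Hashable, Set[Hashable]] = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    if not adj:
        raise ValueError("the network has no edges")

    # ASSUMPTION: average of local clustering coefficients.
    local = []
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            local.append(0.0)
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        local.append(2.0 * links / (k * (k - 1)))
    clustering = sum(local) / len(local)

    # Shortest-path lengths over all reachable ordered pairs.
    lengths = []
    for u in adj:
        lengths.extend(d for v, d in bfs_distances(adj, u).items() if v != u)
    diameter = max(lengths)
    path_length = sum(lengths) / len(lengths)
    return clustering, diameter, path_length


# Toy network: a triangle a-b-c with a pendant node d attached to c.
stats = network_statistics([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
```

Running breadth-first search from every node is quadratic in network size, which is acceptable for networks of a few thousand proteins like those in the table.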
Figure 8. Relationships among protein complexes between different PPI networks.