Phi-Vu Nguyen, Sriganesh Srihari, Hon Wai Leong.
Abstract
BACKGROUND: Protein complexes conserved across species indicate processes that are core to cellular machinery (e.g. cell-cycle or DNA damage-repair complexes conserved across human and yeast). While numerous computational methods have been devised to identify complexes from the protein-protein interaction (PPI) networks of individual species, these methods are severely limited by noise and errors (false positives) in currently available datasets. Our analysis using human and yeast PPI networks revealed that these methods missed several important complexes, including those conserved between the two species (e.g. the MLH1-MSH2-PMS2-PCNA mismatch-repair complex). Here, we note that much of the functionality of yeast complexes has been conserved in human complexes not only through sequence conservation of proteins but also through conservation of critical functional domains. Therefore, integrating domain-conservation information might shed further light on conservation patterns between yeast and human complexes.
Year: 2013 PMID: 24564762 PMCID: PMC4098725 DOI: 10.1186/1471-2105-14-S16-S8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Conservation of complexes between yeast and human. Many proteins in yeast have either 'split' into multiple proteins or fused into common proteins in human during evolution. This mechanism results from selection for optimal protein assemblies [14], leading to a multi-fold expansion of complexity in human. To capture these conservation mechanisms, it is necessary to integrate domain information along with PPI information.
Figure 2. Construction of the interolog network: a simplified example. Our interolog network construction integrates PPI and domain-conservation information to generate a network on which clustering algorithms can identify considerably more conserved complexes than direct clustering of the original PPI networks of each species.
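The construction described in this caption can be illustrated with a minimal Python sketch. It assumes one common reading of an interolog: two ortholog groups are linked when at least one member pair interacts in each species. The function name `build_interolog_network`, the group identifiers, and the toy protein names are hypothetical and not taken from the paper, whose actual procedure may differ in detail.

```python
from itertools import product


def build_interolog_network(ppi_a, ppi_b, groups):
    """Return edges between ortholog groups supported by a PPI in both species.

    ppi_a, ppi_b: iterables of (protein, protein) pairs, one list per species.
    groups: dict mapping a group id to (species-A members, species-B members).
    This is an illustrative assumption, not the authors' implementation.
    """
    edges_a = {frozenset(e) for e in ppi_a}
    edges_b = {frozenset(e) for e in ppi_b}

    def linked(members_u, members_v, edges):
        # True if any member of one group interacts with any member of the other.
        return any(
            frozenset((p, q)) in edges
            for p, q in product(members_u, members_v)
            if p != q
        )

    interologs = set()
    ids = sorted(groups)
    for i, u in enumerate(ids):
        for v in ids[i + 1:]:
            a_u, b_u = groups[u]
            a_v, b_v = groups[v]
            # Keep the edge only if the interaction is conserved in both species.
            if linked(a_u, a_v, edges_a) and linked(b_u, b_v, edges_b):
                interologs.add((u, v))
    return interologs


# Toy data: G1 is a one-yeast-to-many-human group.
toy_groups = {
    "G1": ({"yA"}, {"hA1", "hA2"}),
    "G2": ({"yB"}, {"hB"}),
    "G3": ({"yC"}, {"hC"}),
}
toy_yeast_ppi = [("yA", "yB"), ("yB", "yC")]
toy_human_ppi = [("hA2", "hB")]
toy_interologs = build_interolog_network(toy_yeast_ppi, toy_human_ppi, toy_groups)
print(toy_interologs)
```

In the toy data, the yeast interaction between `yB` and `yC` has no human counterpart, so only the G1-G2 edge survives. Working at the level of ortholog groups rather than single proteins is what lets one-to-many mappings contribute edges.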
Figure 3. Conservation scores for building benchmark complex datasets. We generate a "gold standard" dataset of conserved complexes to test our method. We use two scores: the Jaccard score over orthologous groups and the multi-set Jaccard score.
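As a rough illustration of the two scores named in this caption, the sketch below computes a set-based Jaccard score and a multi-set Jaccard score over ortholog-group labels. It assumes the standard definitions (intersection size over union size, with multiplicities counted in the multi-set case). The paper's exact formulation is not given in this record, and the group labels are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Set Jaccard: |A ∩ B| / |A ∪ B| over distinct ortholog-group labels."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def multiset_jaccard(a, b):
    """Multi-set Jaccard: per-label min counts over per-label max counts."""
    ca, cb = Counter(a), Counter(b)
    inter = sum((ca & cb).values())   # Counter & keeps the minimum count per label
    union = sum((ca | cb).values())   # Counter | keeps the maximum count per label
    return inter / union if union else 0.0


# Each complex is written as the ortholog groups its proteins map to.
yeast_complex = ["G1", "G2", "G3"]
human_complex = ["G1", "G1", "G2", "G4"]  # two human proteins fall in group G1

set_score = jaccard(yeast_complex, human_complex)
multi_score = multiset_jaccard(yeast_complex, human_complex)
print(set_score, multi_score)
```

The multi-set variant penalises the extra human protein in group G1, which the set-based score ignores. This difference matters when one yeast protein corresponds to several human proteins.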
Properties of yeast physical PPI datasets
| Database | # proteins | # interactions (excluding self-interactions and duplicates) |
|---|---|---|
| IntAct (version Nov 13, 2012) | 5276 | 18834 |
| Biogrid (version 3.2.95, Nov 30, 2012) | 5886 | 73923 |
| IntAct ∪ Biogrid | 6332 | 83777 |
| IntAct ∩ Biogrid | 4620 | 8930 |
| ICDScore(IntAct ∪ Biogrid) | 5239 | 71636 |
Properties of human physical PPI datasets
| Database | # proteins | # interactions |
|---|---|---|
| HPRD (Release 9, 2010) | 9617 | 39184 |
| Biogrid (April 25, 2012) | 12515 | 59027 |
| HPRD ∪ Biogrid | 13624 | 76719 |
| HPRD ∩ Biogrid | 8615 | 21491 |
| ICDScore(HPRD ∪ Biogrid) | 8521 (EntrezID) | 61868 |
| ICDEnrich(HPRD ∪ Biogrid) | 9764 (EntrezID) | 192053 (EntrezID) |
Properties of manually curated protein complex datasets
| Databases | # complexes |
|---|---|
| Total: 408 | |
| Total: 1843 | |
Properties of the interolog network constructed from yeast and human PPIs
| Property | Value |
|---|---|
| # Mapped nodes using orthology | 2470 |
| # Interologs | 6133 |
| Size of biggest connected component | 2434 nodes, 6112 edges |
| # Other connected components | 16 (sizes 2-3) |
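Component statistics of the kind reported in this table can be computed from an edge list with a breadth-first search. The sketch below is a generic illustration on toy data, not the authors' code, and the function names `connected_components` and `component_summary` are hypothetical.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Return the node sets of all connected components, largest first."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, nodes = deque([start]), set()
        while queue:
            node = queue.popleft()
            nodes.add(node)
            for neighbour in adj[node] - seen:
                seen.add(neighbour)
                queue.append(neighbour)
        components.append(nodes)
    return sorted(components, key=len, reverse=True)


def component_summary(edges):
    """Return (node count, edge count) for each component, largest first."""
    edge_set = {frozenset(e) for e in edges}  # drop duplicate edges
    return [
        (len(nodes), sum(1 for e in edge_set if e <= nodes))
        for nodes in connected_components(edges)
    ]


toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e")]
toy_summary = component_summary(toy_edges)
print(toy_summary)
```

Nodes with no edges do not appear in an edge list, so this sketch does not count them.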
Comparisons of different methods on yeast data.
| Method | # Predicted complexes | # Matched predictions | Precision | # Gold standard conserved complexes | # Detected conserved complexes | Recall (of conserved complexes) |
|---|---|---|---|---|---|---|
| COCIN | 71 | 36 | 50.7% | 42 | 32 | 76.2% |
| CMC | 1202 | 145 | 12.1% | 42 | 23 | 54.8% |
| HACO | 1040 | 69 | 6.6% | 42 | 17 | 40.5% |
| MCL | 387 | 37 | 9.6% | 42 | 5 | 11.9% |
Predicted complexes: resulting network clusters
Matched predictions: resulting network clusters that match with benchmarks
Precision = #matched prediction/#predicted complexes
Recall = # detected conserved complexes/# gold standard conserved complexes
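The two definitions above can be checked against the rows of the yeast table that report percentages. The short sketch below recomputes them for CMC, HACO and MCL, with counts copied from the table; the helper names `precision_pct` and `recall_pct` are illustrative only.

```python
def precision_pct(matched, predicted):
    """Precision = # matched predictions / # predicted complexes, in percent."""
    return round(100.0 * matched / predicted, 1)


def recall_pct(detected, gold):
    """Recall = # detected conserved complexes / # gold standard conserved complexes, in percent."""
    return round(100.0 * detected / gold, 1)


# (predicted, matched, gold standard, detected), copied from the yeast table.
yeast_rows = {
    "CMC": (1202, 145, 42, 23),
    "HACO": (1040, 69, 42, 17),
    "MCL": (387, 37, 42, 5),
}

yeast_metrics = {
    method: (precision_pct(matched, predicted), recall_pct(detected, gold))
    for method, (predicted, matched, gold, detected) in yeast_rows.items()
}
for method, (p, r) in yeast_metrics.items():
    print(f"{method}: precision {p}%, recall {r}%")
```

Precision is measured over all predicted clusters, while recall is measured only over the gold standard conserved complexes, so the two columns use different denominators.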
Comparisons of different methods on human data
| Method | # Predicted complexes | # Matched predictions | Precision | # Gold standard conserved complexes | # Detected conserved complexes | Recall (of conserved complexes) |
|---|---|---|---|---|---|---|
| COCIN | 71 | 36 | 50.7% | 118 | 78 | 66.1% |
| CMC | 1389 | 156 | 11.2% | 118 | 66 | 55.9% |
| HACO | 1290 | 80 | 6.2% | 118 | 36 | 30.5% |
| MCL | 631 | 45 | 7.1% | 118 | 24 | 20.3% |
Predicted complexes: resulting network clusters
Matched predictions: resulting network clusters that match with benchmarks
Precision = #matched prediction/#predicted complexes
Recall = # detected conserved complexes/# gold standard conserved complexes
One predicted complex of COCIN can match several benchmark complexes, which explains why # detected conserved complexes can exceed # matched predictions (as illustrated in Figures 5-8).
Figure 4. An illustration of a predicted complex from the interolog network (IN). (a) A predicted complex in the IN. (b) The corresponding complex in the human PPI network. (c) The corresponding complex in the yeast PPI network.
Figure 5. COCIN compared to CMC. COCIN over the interolog network identifies significantly more conserved complexes than direct clustering of the original PPI networks using CMC [19].
Additional conserved complexes found in yeast
| ID | Complex name | Size | Jaccard score | Functional category | Functional description |
|---|---|---|---|---|---|
| 96 | eIF3 complex | 7 | 0.63 | Translation | Eukaryotic translation initiation factor |
| 247 | Transcription factor TFIID complex | 15 | 0.73 | Transcription | mRNA synthesis |
| 27 | DNA-directed RNA polymerase II complex | 12 | 0.69 | Transcription | mRNA synthesis |
| 45 | DNA replication factor C complex (Rad24p) | 5 | 0.67 | DNA processing | DNA synthesis and replication |
| 152 | DNA replication factor C complex (Rcf1p) | 5 | 0.67 | DNA processing | DNA synthesis and replication |
| 294 | Mcm2-7 complex | 6 | 0.6 | DNA processing | Chromosome maintenance, DNA synthesis and replication |
| 268 | SF3b complex | 6 | 0.57 | RNA processing | mRNA splicing |
| 65 | U6 snRNP complex | 8 | 0.5 | RNA processing | This complex combines with other snRNPs, unmodified pre-mRNA, and various other proteins to assemble a spliceosome, a large RNA-protein molecular complex upon which splicing of pre-mRNA occurs. |
| 375 | AP-3 adaptor complex | 4 | 0.67 | Cellular transport, vesicular transport | This complex is responsible for protein trafficking to lysosomes and other related organelles. |
| 25 | 20S proteasome | 14 | 0.5 | Cell cycle, protein fate | Proteasomal degradation (ubiquitin/proteasomal pathway), protein processing (proteolytic) |
| 137 | Chaperonin-containing T-complex | 8 | 0.67 | Protein fate | A multisubunit ring-shaped complex that mediates protein folding in the cytosol without a cofactor. |
Additional conserved complexes found in human
| ID | Complex name | Size | Jaccard score | Functional category | Functional description |
|---|---|---|---|---|---|
| 4392 | EIF3 complex (EIF3A, EIF3B, EIF3G, EIF3I, EIF3C) | 5 | 0.57 | Translation | Translation initiation |
| 4403 | EIF3 complex (EIF3A, EIF3B, EIF3G, EIF3I, EIF3J) | 5 | 0.57 | Translation | Translation initiation |
| 104 | RNA polymerase II core complex | 12 | 0.69 | Transcription | mRNA synthesis |
| 2685 | RNA polymerase II | 17 | 0.59 | Transcription | mRNA synthesis |
| 2686 | BRCA1-core RNA polymerase II complex | 13 | 0.64 | Transcription | mRNA synthesis |
| 471 | PCAF complex | 10 | 0.6 | Transcription, DNA processing | DNA conformation modification (e.g. chromatin), modification by acetylation, deacetylation, organization of chromosome structure. |
| 2200 | RFC2-5 subcomplex | 4 | 0.5 | DNA processing | DNA synthesis and replication |
| 387 | MCM complex | 6 | 0.6 | DNA processing | Chromosome maintenance, DNA synthesis and replication |
| 369 | MMR complex 2 | 4 | 0.67 | DNA processing | DNA damage repair |
| 290 | MSH2-MLH1-PMS2-PCNA DNA-repair initiation complex | 4 | 0.67 | DNA processing | DNA damage repair initiation |
| 1169 | SNARE complex | 4 | 0.6 | Cellular transport, vesicular transport | Vesicle fusion, synaptic vesicle exocytosis |
| 562 | LSm2-8 complex | 7 | 0.67 | RNA processing | mRNA splicing |
| 561 | LSm1-7 complex | 7 | 0.67 | RNA processing | Control of mRNA stability during splicing |
| 3036 | Ubiquitin E3 ligase (SKP1A, SKP2, CUL1, CKS1B, RBX1) | 5 | 0.5 | Cell cycle, protein fate | Mitotic cell cycle and cell cycle control, modification by ubiquitination, deubiquitination |
| 2188 | Ubiquitin E3 ligase (CDC34, NEDD8, BTRC, CUL1, SKP1A, RBX1) | 5 | 0.5 | Cell cycle, protein fate | Mitotic cell cycle and cell cycle control, modification by ubiquitination, deubiquitination |
| 2189 | Ubiquitin E3 ligase (SMAD3, BTRC, CUL1, SKP1A, RBX1) | 5 | 0.5 | Cell cycle, protein fate | Mitotic cell cycle and cell cycle control, modification by ubiquitination, deubiquitination |
Figure 6. Some examples of additional conserved complexes found in the IN. The clusters detected from the original PPI networks include several noisy proteins and noisy interactions (false positives), which reduces their Jaccard scores.
Figure 7. COCIN compared to HACO. COCIN over the interolog network identifies significantly more conserved complexes than direct clustering of the original PPI networks using HACO [20].
Figure 8. COCIN compared to MCL. COCIN over the interolog network identifies significantly more conserved complexes than direct clustering of the original PPI networks using MCL [21].
Details of gold standard testing dataset for conserved protein complexes between human and yeast
| Score usage | |
|---|---|
| Threshold | 50% |
| # conserved | |
| Total: 79/408 (19.3%) | |
| # conserved | |
| Total: 219/1843 (11.9%) | |
Figure 9. Assessment of Ensembl and OrthoMCL based homology for IN construction and conserved-complex detection.
Figure 10. Some examples of the one-to-many and many-to-many relationships of complex conservation between human and yeast.
Homology data: Ensembl and OrthoMCL
| Group type | Subtype | Ensembl database | OrthoMCL database |
|---|---|---|---|
| # Ortholog groups | # 1-to-1 groups | 1096 | 1153 |
| | # 1-Yeast-to-many groups | 756 | 434 |
| | # 1-Human-to-many groups | 116 | 116 |
| | # many-to-many groups | 197 | 167 |
| | Total | 2165 | 1870 |
| # Human paralog groups | | 2573 | 2435 |
| # Yeast paralog groups | | 426 | 393 |
| Total # homolog groups | | 5164 | 4698 |
Ensembl [17] contains protein orthologs based on sequence similarity as well as domain information, while OrthoMCL [18] is predominantly based on sequence similarity. As we can see from the table, using domain information (through Ensembl) generates significantly more many-to-many ortholog mappings thereby enhancing our interolog construction.
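The group categories in the homology table can be reproduced by counting the yeast and human members of each homolog group. The sketch below assumes that "1-Yeast-to-many" means one yeast protein mapped to several human proteins (and the reverse for "1-Human-to-many"), and that a group with members from only one species is a paralog group. This reading of the labels is an assumption, and the toy groups are hypothetical.

```python
from collections import Counter


def classify_group(n_yeast, n_human):
    """Label a homolog group by its number of yeast and human members."""
    if n_yeast == 0 and n_human == 0:
        raise ValueError("a homolog group needs at least one member")
    if n_human == 0:
        return "Yeast paralog group"
    if n_yeast == 0:
        return "Human paralog group"
    if n_yeast == 1 and n_human == 1:
        return "1-to-1"
    if n_yeast == 1:
        return "1-Yeast-to-many"
    if n_human == 1:
        return "1-Human-to-many"
    return "many-to-many"


# Toy homolog groups given as (yeast members, human members).
toy_homolog_groups = [
    (["yA"], ["hA"]),
    (["yB"], ["hB1", "hB2"]),
    (["yC1", "yC2"], ["hC"]),
    (["yD1", "yD2"], ["hD1", "hD2", "hD3"]),
    ([], ["hE1", "hE2"]),
]

toy_tally = Counter(
    classify_group(len(yeast), len(human)) for yeast, human in toy_homolog_groups
)
print(toy_tally)
```

Applied to a full ortholog database, a tally like this would yield per-category counts of the kind listed in the table.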
Figure 11. Comparison between using Ensembl and OrthoMCL in constructing the interolog network.