Chittibabu Guda, Brian R King, Lipika R Pal, Purnima Guda.
Abstract
Knowledge of specific domain-domain interactions (DDIs) is essential to understand the functional significance of protein interaction networks. Despite the availability of an enormous amount of data on protein-protein interactions (PPIs), very little is known about the specific DDIs occurring in them. Here, we present a top-down approach to accurately infer functionally relevant DDIs from PPI data. We created a comprehensive, non-redundant dataset of 209,165 experimentally derived PPIs by combining datasets from five major interaction databases. We introduced an integrated scoring system that uses a novel combination of five orthogonal scoring features covering the probabilistic, evolutionary, evidence-based, spatial and functional properties of interacting domains, which can map the interacting propensity of two domains in many dimensions. This method outperforms similar existing methods both in the accuracy of prediction and in the coverage of domain interaction space. We predicted a set of 52,492 high-confidence DDIs to carry out a cross-species comparison of DDI conservation in eight model species: human, mouse, Drosophila, C. elegans, yeast, Plasmodium, E. coli and Arabidopsis. Our results show that only 23% of these DDIs are conserved in at least two species and only 3.8% in at least four species, indicating rather low conservation across species. Pair-wise analysis of DDI conservation revealed a 'sliding conservation' pattern between evolutionarily neighboring species. Our methodology and the high-confidence DDI predictions generated in this study can help to better understand the functional significance of PPIs at the modular level, and can thus significantly impact further experimental investigations in systems biology research.
Year: 2009 PMID: 19333396 PMCID: PMC2659750 DOI: 10.1371/journal.pone.0005096
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Schematic diagram showing the derivation of datasets.
This example shows a set of PPIs, denoted PPint, from which the set of all interacting proteins, denoted Pint, is derived. Using Pint, a set of putative non-interacting protein pairs, denoted PPnon, is generated with some constraints, as described in the Methods. The last section of the figure shows how all possible DDIs (Dint) are derived from PPint. Dnon is created from PPnon (not shown in the figure) in essentially the same way as Dint is derived from PPint, with an additional step of filtering out domain pairs that do not exist in Dint.
Figure 2. Flow diagram illustrating the scoring system and methodology.
Figure 3. Cumulative distribution of positive and negative test datasets against the entire range of prediction scores.
The positive test sets include iPfam DDIs and DDIs in single-domain PPIs, while the negative test set includes DDIs created from random combinations of domains in iPfam, excluding the iPfam DDIs.
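The construction of the negative test set described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_negative_pairs`, the optional sampling step, and the toy domain names are all assumptions made for the example.

```python
import itertools
import random


def build_negative_pairs(positive_pairs, n_pairs=None, seed=0):
    """Combine domains seen in the positive set into pairs, excluding known DDIs.

    Hypothetical helper: pairs are treated as unordered, so (a, b) == (b, a).
    """
    positives = {tuple(sorted(pair)) for pair in positive_pairs}
    domains = sorted({d for pair in positives for d in pair})
    # All unordered combinations, including homotypic pairs such as (a, a).
    candidates = [
        pair
        for pair in itertools.combinations_with_replacement(domains, 2)
        if pair not in positives
    ]
    if n_pairs is None or n_pairs >= len(candidates):
        return candidates
    # Optional random subsample (an assumption; the caption gives no size).
    return random.Random(seed).sample(candidates, n_pairs)


# Toy positive set; the domain names are placeholders, not real iPfam entries.
known = [("domA", "domB"), ("domC", "domD")]
negatives = build_negative_pairs(known)
```

With four distinct domains there are ten unordered pairs (homotypic pairs included), so removing the two known DDIs leaves eight candidate negatives.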
Figure 4. ROC curve plotting true positive rate against false positive rate across the entire range of score thresholds.
Cumulative distribution of DDIs in iPfam, DDIs from single-domain PPIs and the negative DDIs across the entire score range of all possible DDIs in Dint.
| Score threshold | <−6 | −6 | −3 | 0 | 3 | 6 | 9 | 12 | 15 | 18 | 21 | 24 | ≥27 |
| Cumulative % of iPfam DDIs | 100.0 | 100.0 | 99.8 | 98.8 | 95.7 | 89.0 | | 68.6 | 54.0 | 38.2 | 24.9 | 13.7 | 6.0 |
| Cumulative % of single-domain PPIs | 100.0 | 100.0 | 99.8 | 97.7 | 89.8 | 85.1 | | 38.5 | 27.6 | 18.6 | 10.7 | 5.9 | 2.4 |
| Cumulative % of negative DDIs | 100.0 | 99.2 | 92.4 | 73.5 | 50.1 | 26.8 | | 6.4 | 2.4 | 1.0 | 0.4 | 0.2 | 0.0 |
Each column shows the cumulative % of data scoring the value in that column and higher. The entire range of all observed scores fell between −12.43 and 38.37.
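The cumulative percentages in this table, and the ROC points that follow from them, can be computed as in the sketch below. The function names and the toy scores are assumptions for illustration; only the definition "percentage of data scoring the threshold value and higher" comes from the table note.

```python
def cumulative_percent(scores, thresholds):
    """Percentage of scores that are greater than or equal to each threshold."""
    n = len(scores)
    return [100.0 * sum(s >= t for s in scores) / n for t in thresholds]


def roc_points(pos_scores, neg_scores, thresholds):
    """(false positive rate, true positive rate) at each score threshold."""
    tpr = cumulative_percent(pos_scores, thresholds)
    fpr = cumulative_percent(neg_scores, thresholds)
    return [(f / 100.0, t / 100.0) for f, t in zip(fpr, tpr)]


# Toy scores, not values from the study.
pos = [12.0, 7.5, 3.0, 20.1]
neg = [-2.0, 1.0, 4.0, 9.5]
cuts = [0, 6, 12]

pos_cum = cumulative_percent(pos, cuts)  # [100.0, 75.0, 50.0]
neg_cum = cumulative_percent(neg, cuts)  # [75.0, 25.0, 0.0]
roc = roc_points(pos, neg, cuts)         # [(0.75, 1.0), (0.25, 0.75), (0.0, 0.5)]
```

Read this way, each column of the table gives one ROC point: the positive-set row supplies the true positive rate and the negative-set row supplies the false positive rate at that threshold.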
Figure 5. Comparison of the prediction accuracy of the current integrated method (IM) with Lee et al.'s method and the GPE method.
Species-wise analysis of predicted domain-domain interactions.
| Species | 1. Proteome size | 2. Usable interacting proteins (total) | 2. Usable interacting proteins (%) | 3. Usable PPIs | 4. Domain coverage in | 5. Unique predicted DDIs | 6. Species-specific DDIs (total) | 6. Species-specific DDIs (%) |
| Human | 37,993 | 12,311 | 32.4 | 47,837 | 5,099 (76%) | 25,287 | 14,918 | 60.0 |
| Mouse | 32,745 | 3,173 | 9.7 | 4,258 | 2,400 (36%) | 4,197 | 304 | 7.2 |
| Drosophila | 16,273 | 6,775 | 41.6 | 20,578 | 3,643 (76%) | 7,226 | 2,568 | 35.5 |
| C. elegans | 22,515 | 2,488 | 11.1 | 3,934 | 2,025 (45%) | 2,340 | 463 | 19.8 |
| Yeast | 5,800 | 4,381 | 75.5 | 43,403 | 3,303 (90%) | 19,083 | 11,820 | 61.9 |
| Plasmodium | 5,250 | 730 | 13.9 | 1,246 | 828 (36%) | 858 | 218 | 25.4 |
| E. coli | 4,330 | 3,205 | 74.0 | 10,589 | 3,221 (82%) | 12,044 | 9,670 | 80.3 |
| Arabidopsis | 35,011 | 1,070 | 3.1 | 2,642 | 749 (16%) | 1,288 | 383 | 29.7 |
A usable interacting protein is one that has at least one InterPro domain mapped to it.
Usable PPIs are those in which each partner protein has at least one InterPro domain mapped to it.
The percentage in parentheses is calculated against the total number of InterPro domains mapped in the entire proteome of each species.
A score threshold of 10 is used to predict DDI(s) in a given PPI and those DDIs that are unique within a species are selected.
Species-specific DDIs are those that are found only in a particular species.
Percentage of species-specific DDIs over the unique predicted DDIs in column 5. This number indicates the extent of preservation of DDIs within a species that were not found in any other species.
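The species-specific counts and percentages defined in these notes can be derived from per-species DDI sets roughly as follows. The function name, the tuple representation of a DDI, and the toy sets are assumptions for illustration.

```python
def species_specific(ddis_by_species):
    """For each species, count DDIs found in no other species and their share.

    Returns {species: (count, percent of that species' unique predicted DDIs)}.
    """
    result = {}
    for species, ddis in ddis_by_species.items():
        # Union of the DDI sets of every other species.
        others = set().union(
            *(d for s, d in ddis_by_species.items() if s != species)
        )
        specific = ddis - others
        percent = 100.0 * len(specific) / len(ddis) if ddis else 0.0
        result[species] = (len(specific), percent)
    return result


# Toy DDI sets; each DDI is a sorted tuple of placeholder domain names.
toy = {
    "human": {("a", "b"), ("c", "c"), ("d", "e")},
    "mouse": {("a", "b"), ("c", "c")},
    "yeast": {("a", "b"), ("f", "f")},
}
summary = species_specific(toy)

# The same ratio applied to the yeast row of the table: 11,820 of 19,083.
yeast_percent = round(100.0 * 11820 / 19083, 1)  # 61.9
```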
Cross-species comparison of domain-domain interactions from eight model organisms.
| Species | HUMAN | MOUSE | DROME | CAEEL | YEAST | PLAF7 | ARATH | ECOLI |
HUMAN-Homo sapiens; MOUSE-Mus musculus; DROME-Drosophila melanogaster; CAEEL-Caenorhabditis elegans; YEAST-Saccharomyces cerevisiae; PLAF7-Plasmodium falciparum; ARATH-Arabidopsis thaliana; ECOLI-Escherichia coli. The upper diagonal shows the number of overlapping DDIs between specific pairs of species. Values in the lower diagonal represent DDI conservation similarity between specific pairs of species.
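A matrix of this shape can be built from per-species DDI sets as sketched below. The overlap count follows directly from the legend; the similarity measure is not defined in this record, so the Jaccard index used here is purely an assumption, as are the function name and the toy data.

```python
from itertools import combinations


def pairwise_overlap(ddis_by_species):
    """Shared-DDI count and a similarity value for every pair of species.

    The similarity is a Jaccard index, chosen here only as a stand-in;
    the measure actually used in the study is not given in this record.
    """
    out = {}
    for a, b in combinations(sorted(ddis_by_species), 2):
        shared = ddis_by_species[a] & ddis_by_species[b]
        union = ddis_by_species[a] | ddis_by_species[b]
        similarity = len(shared) / len(union) if union else 0.0
        out[(a, b)] = (len(shared), similarity)
    return out


# Toy DDI sets keyed by the species codes used in the table header.
toy = {
    "HUMAN": {("a", "b"), ("c", "c"), ("d", "e")},
    "MOUSE": {("a", "b"), ("c", "c")},
    "YEAST": {("a", "b")},
}
matrix = pairwise_overlap(toy)
```

In a full table, the first element of each value would fill the upper diagonal and the second element the lower diagonal.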
Figure 6. Pie distribution of conserved and non-conserved DDIs in eight model organisms.
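The split into conserved and non-conserved DDIs can be tallied by counting, for each DDI, the number of species in which it is predicted. The sketch below assumes per-species DDI sets as input; the function name and toy data are hypothetical.

```python
from collections import Counter


def conservation_summary(ddis_by_species, min_species=(2, 4)):
    """Percentage of all distinct DDIs predicted in at least k species."""
    # Each species contributes a set, so a DDI is counted once per species.
    counts = Counter(
        ddi for ddis in ddis_by_species.values() for ddi in ddis
    )
    total = len(counts)
    if total == 0:
        return {k: 0.0 for k in min_species}
    return {
        k: 100.0 * sum(1 for c in counts.values() if c >= k) / total
        for k in min_species
    }


# Toy DDI sets with placeholder domain names.
toy = {
    "human": {("a", "b"), ("c", "c"), ("d", "e")},
    "mouse": {("a", "b"), ("c", "c")},
    "yeast": {("a", "b"), ("f", "f")},
    "ecoli": {("a", "b")},
}
shares = conservation_summary(toy)  # {2: 50.0, 4: 25.0}
```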
Top five predicted DDIs that are conserved in multiple species, sorted by the number of species conserved.
| sps# | Domain-1 | Name | Domain-2 | Name | Species | Score |
| 8 | IPR009057 | Homeodomain_like | IPR009057 | Homeodomain_like | HMDCYPAE | 25.7 |
| 8 | IPR000719 | Prot_kinase_core | IPR001650 | DNA/RNA_helicase_C | HMDCYPAE | 13.5 |
| 8 | IPR000719 | Prot_kinase_core | IPR003593 | AAA+_ATPase_core | HMDCYPAE | 13.3 |
| 8 | IPR003593 | AAA+_ATPase_core | IPR003593 | AAA+_ATPase_core | HMDCYPAE | 11.7 |
| 8 | IPR012335 | Thioredoxin_fold | IPR012335 | Thioredoxin_fold | HMDCYPAE | 10.5 |
| 7 | IPR001353 | Proteasome_A_B | IPR001353 | Proteasome_A_B | HMDCYAE | 35.3 |
| 7 | IPR012287 | Homeodomain-rel | IPR012287 | Homeodomain-rel | HMDCYAE | 31.1 |
| 7 | IPR004045 | GST_N | IPR010987 | GST_C_like | HMDCYAE | 23.4 |
| 7 | IPR000719 | Prot_kinase_core | IPR003527 | MAP_kin | HMDCYPA | 22.9 |
| 7 | IPR004045 | GST_N | IPR004045 | GST_N | HMDCYAE | 20.6 |
| 6 | IPR004827 | TF_bZIP | IPR011700 | bZIP_2 | HMDCYA | 37.4 |
| 6 | IPR000426 | Proteasome_alpha | IPR001353 | Proteasome_A_B | HMDCYA | 36.9 |
| 6 | IPR001356 | Homeobox | IPR012287 | Homeodomain-rel | HMDCYA | 36.6 |
| 6 | IPR001092 | HLH_basic | IPR011598 | HLH_DNA_bd | HMDCYA | 30.5 |
| 6 | IPR013088 | Znf_NHR/GATA | IPR013088 | Znf_NHR/GATA | HMDCYE | 29.9 |
| 5 | IPR000243 | Pept_T1A_subB | IPR001353 | Proteasome_A_B | HMCYA | 36.3 |
| 5 | IPR008331 | Ferritin_Dps | IPR008331 | Ferritin_Dps | HMDCE | 32.5 |
| 5 | IPR001114 | AdlSucc_Synth | IPR001114 | AdlSucc_Synth | HMYAE | 31.8 |
| 5 | IPR001163 | LSM_snRNP_core | IPR010920 | LSM_related_core | HMDCY | 31.6 |
| 5 | IPR009078 | Ferritin/RR_like | IPR012347 | Ferritin_rel | HMDCE | 29.6 |
| 4 | IPR008946 | Nucl_hrmn_rcpt_lig_bd | IPR013088 | Znf_NHR/GATA | HMDC | 38.4 |
| 4 | IPR000793 | ATPase_a_b_C | IPR004100 | ATPase_a_b_N | HYAE | 37.3 |
| 4 | IPR007860 | MutS_II | IPR007861 | MutS_IV | HCYE | 37.1 |
| 4 | IPR001628 | Znf_hrmn_rcpt | IPR008946 | Nucl_hrmn_rcpt_lig_bd | HMDC | 36.7 |
| 4 | IPR011261 | RNAP_dimerisation | IPR011262 | RNAP_insert | HDYE | 36.4 |
| 3 | IPR002398 | Pept_C14_p45 | IPR011600 | Pept_C14_cat | HMD | 37.2 |
| 3 | IPR009025 | RNA_pol_RBP11-like | IPR011261 | RNAP_dimerisation | HDY | 36.5 |
| 3 | IPR013506 | Topo_IIA_B_2 | IPR013760 | Topo_IIA_cen | HYE | 36.0 |
| 3 | IPR009025 | RNA_pol_RBP11-like | IPR011262 | RNAP_insert | HDY | 35.5 |
| 3 | IPR002205 | Topo_IIA_A/C | IPR013757 | Topo_IIA_A_a | HYE | 35.4 |
| 2 | IPR002314 | AA-tRNA-synt_IIb | IPR015805 | His-tRNA_synth | HE | 37.4 |
| 2 | IPR000971 | Globin_subset | IPR002337 | Haemoglobin_b | HM | 34.6 |
| 2 | IPR001217 | STAT | IPR013800 | STAT_alpha | HC | 34.5 |
| 2 | IPR009056 | Cyt_c_monohaem | IPR012282 | Cytochrome_c_R | HY | 34.5 |
| 2 | IPR007120 | RNA_pol_Rpb2_6 | IPR007644 | RNApol_bsu_protrusn | HY | 34.3 |
| 1 | IPR002100 | TF_MADSbox | IPR002487 | TF_Kbox | A | 38.4 |
| 1 | IPR004516 | His-tRNA_synth_IIA | IPR015805 | His-tRNA_synth | E | 38.3 |
| 1 | IPR002104 | Integrase_cat-core phage | IPR013762 | Integrase-like_cat-core_phage | E | 37.6 |
| 1 | IPR007627 | RNA_pol_sigma70_r2 | IPR014284 | RNA_pol_sigma-70 | E | 36.4 |
| 1 | IPR001576 | Phosphoglycerate_kinase | IPR015824 | Phosphoglycerate_kinase-N | Y | 35.9 |
H-human, M-mouse, D-Drosophila, C-C.elegans, Y-yeast, P-Plasmodium, A-Arabidopsis, E-E. coli.
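The one-letter species strings in the table can be expanded with a small lookup, which also allows a consistency check of the "sps#" column against the length of each string. The names `SPECIES_CODES` and `decode_species` are assumptions; the rows are copied from the table.

```python
# One-letter codes as given in the legend of the table.
SPECIES_CODES = {
    "H": "human",
    "M": "mouse",
    "D": "Drosophila",
    "C": "C. elegans",
    "Y": "yeast",
    "P": "Plasmodium",
    "A": "Arabidopsis",
    "E": "E. coli",
}


def decode_species(code):
    """Expand a string such as 'HYAE' into a list of species names."""
    return [SPECIES_CODES[letter] for letter in code]


# (sps#, species string) pairs taken from a few rows of the table.
rows = [(8, "HMDCYPAE"), (7, "HMDCYAE"), (7, "HMDCYPA"), (4, "HYAE"), (1, "A")]

# The species count in each row should equal the number of letters.
consistent = all(count == len(decode_species(code)) for count, code in rows)
```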