Sailu Yellaboina, Asba Tasneem, Dmitri V Zaykin, Balaji Raghavachari, Raja Jothi.
Abstract
DOMINE is a comprehensive collection of known and predicted domain-domain interactions (DDIs) compiled from 15 different sources. The updated DOMINE includes 2285 new DDIs inferred from experimentally characterized high-resolution three-dimensional structures, and about 3500 novel predictions by five computational approaches published over the last 3 years. These additions bring the total number of unique DDIs in the updated version to 26,219 among 5140 unique Pfam domains, a 23% increase compared to 20,513 unique DDIs among 4346 unique domains in the previous version. The updated version now contains 6634 known DDIs, and features a new classification scheme to assign confidence levels to predicted DDIs. DOMINE will serve as a valuable resource to those studying protein and domain interactions. Most importantly, DOMINE will serve as an excellent reference not only to bench scientists testing for new interactions but also to bioinformaticians seeking to predict novel protein-protein interactions based on the DDIs. The contents of DOMINE are available at http://domine.utdallas.edu.
Year: 2010 PMID: 21113022 PMCID: PMC3013741 DOI: 10.1093/nar/gkq1229
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Sources of DOMINE database contents
| Method/source | Number of DDIs | Description |
|---|---|---|
| iPfam | 4030 | iPfam contains a collection of DDIs that are observed in PDB entries. Data, dated 17 February 2007, were used. |
| 3did | 6066 | 3did is a collection of DDIs in proteins for which high-resolution 3D structures are known. Data, downloaded in September 2010, were used. |
| ME | 2391 | ME refers to a Bayesian approach that integrates DDIs predicted using a maximum likelihood estimation approach on yeast, worm, fruit fly and human PPI networks with gene ontology and domain fusion data. |
| RCDP | 960 | The RCDP approach uses sequence coevolution to predict the domain pair that is most likely to mediate a given PPI. Given a PPI, RCDP predicts the domain pair with the highest degree of coevolution to be the mediating domain pair. The set of DDIs predicted from 1180 yeast PPIs (Raghavachari data set) was used. |
| P-value | 596 | P-value refers to the statistical approach that assigns |
| Fusion | 2768 | DDIs inferred using the domain fusion hypothesis, as reported in the Interdom database (v1.1), were used. |
| DPEA | 1812 | DPEA is a statistical approach to infer DDIs from PPI networks from many organisms. It uses an expectation–maximization algorithm to obtain the probability of interaction for each potentially interacting domain pair, and computes the change in likelihood, expressed as a log odds score, when this domain pair is excluded from consideration as a potentially interacting domain pair. DPEA was applied to PPI networks from 69 organisms (Riley data set), and only the set of DDIs between Pfam-A domains with log odds score ≥3.0 was used. |
| PE | 2588 | PE is an optimization approach based on the assumption that the set of true DDIs are well approximated by the minimum set of DDIs that can justify every PPI in a PPI network. Given a PPI network, the PE approach uses linear programming to compute the LP score for every domain pair that could possibly justify interaction between two proteins, and a |
| GPE | 1563 | GPE builds upon the PE approach by unifying domains that always occur together in a protein as a singular ‘supra-domain’, and uses the linear programming framework as used by PE. GPE was applied on the redefined Riley data set (Guimaraes data set), and the set of DDIs only between Pfam-A domains with LP score ≥0.60 and pw-score ≤0.01 was used. Supra-domains were expanded back to individual Pfam-A domains. |
| DIPD | 2157 | DIPD constructs feature vectors for each protein pair within the sets of PPIs (Riley data set) and non-PPIs, and uses a discriminative classifier to identify the minimum set of domain pairs/triplets that can discriminate PPIs and non-PPIs. Each selected feature (domain pair) is a putative DDI. The sets of predictions on Raghavachari, Riley and Guimaraes data sets were used. |
| RDFF | 2475 | Chen and Liu's Random Decision Forest Framework (RDFF) approach explores all possible DDIs and predicts PPIs based on protein domains. The decision tree-based model is used to infer DDIs for each correctly predicted PPI. The set of DDIs only between Pfam-A domains was used. |
| K-GIDDI | 386 | K-GIDDI uses gene ontology information to construct an initial DDI network using the top |
| Insite | 2408 | Insite uses a naïve Bayes model to build upon features in DPEA. Its novel formulation of evidence models for PPIs and DDIs helps address noise (false positives) generated by high-throughput assays. |
| DomainGA | 459 | DomainGA is a genetic algorithm-type machine learning approach based on multi-parameter optimization. It uses the available PPI data to compute scores for domain pairs, which are then used to predict PPIs. A yeast PPI data set was used to identify 867 putative DDIs between domains defined based on information derived from the Interpro database. Only the set of 459 DDIs between Pfam domains was used. |
| DIMA | 8012 | DIMA predicts DDIs based on phylogenetic profiling of presence/absence of domains in many organisms. |
a Updated dataset.
b New dataset.
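The GPE row above describes two post-processing steps: keeping predictions with LP score ≥0.60 and pw-score ≤0.01, and expanding supra-domains back to individual Pfam-A domains. A minimal Python sketch of that filtering is given below. The input layout, the function name `expand_supra_pairs`, and the domain identifiers are hypothetical placeholders, and the all-against-all expansion of supra-domain members is an assumption, since the table does not say how the expansion was carried out.

```python
from itertools import product


def expand_supra_pairs(predictions, lp_min=0.60, pw_max=0.01):
    """Filter predictions by both thresholds and expand supra-domains.

    Each prediction is (left_domains, right_domains, lp_score, pw_score),
    where the two domain tuples are the members of each supra-domain.
    ASSUMPTION: expansion pairs every member of one supra-domain with
    every member of the other; the source does not specify the rule.
    """
    ddis = set()
    for left, right, lp_score, pw_score in predictions:
        if lp_score >= lp_min and pw_score <= pw_max:
            for a, b in product(left, right):
                # Store each DDI as a sorted tuple so A-B equals B-A.
                ddis.add(tuple(sorted((a, b))))
    return ddis


# Toy input with placeholder identifiers (not real DOMINE records).
predictions = [
    (("DOM_A", "DOM_B"), ("DOM_C",), 0.75, 0.005),  # passes both thresholds
    (("DOM_D",), ("DOM_E",), 0.40, 0.001),          # LP score too low
    (("DOM_F",), ("DOM_F",), 0.90, 0.050),          # pw-score too high
]
result = expand_supra_pairs(predictions)
print(sorted(result))
```

Storing each pair as a sorted tuple keeps the set free of duplicate orderings, which matters when counting unique DDIs.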
Figure 1. Unsupervised hierarchical clustering of Jaccard index values for every pair of methods, based on the overlap of their predictions, is shown as a heatmap. Data used for generating this heatmap are available as Supplementary Table S2.
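The Jaccard index behind Figure 1 is the size of the intersection of two prediction sets divided by the size of their union. The sketch below computes it for every pair of methods in Python; the method names and domain pairs are toy placeholders, not DOMINE data, and the clustering step itself is not shown.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; defined here as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(predictions):
    """Return {(method1, method2): Jaccard index} for every pair of methods."""
    return {
        (m1, m2): jaccard(predictions[m1], predictions[m2])
        for m1, m2 in combinations(sorted(predictions), 2)
    }


def ddi(a, b):
    """Represent an unordered domain pair so that A-B equals B-A."""
    return frozenset((a, b))


# Toy prediction sets with hypothetical method and domain names.
predictions = {
    "method_A": {ddi("D1", "D2"), ddi("D2", "D3"), ddi("D3", "D4")},
    "method_B": {ddi("D2", "D1"), ddi("D2", "D3")},
    "method_C": {ddi("D5", "D6")},
}
scores = pairwise_jaccard(predictions)
for pair, value in sorted(scores.items()):
    print(pair, round(value, 3))
```

The resulting pairwise values could then be arranged as a symmetric matrix and passed to a hierarchical clustering routine to produce a heatmap of this kind.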
Figure 2. DOMINE construction and data characteristics. (A) Schematic overview of the DOMINE database construction. (B) Histograms showing the number of predicted DDIs with a confidence score S or above (black histogram; primary y-axis), and the fraction of them that are known to be true (green histogram; secondary y-axis). (C) Stacked histogram showing the fraction of predicted DDIs by each method classified as high-, medium- or low-confidence predictions (HCP, MCP or LCP).
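The two quantities plotted in panel B can be tabulated with a short loop: for each threshold S, count the predicted DDIs scoring S or above, then compute the fraction of those that appear in the set of known DDIs. The Python sketch below assumes each predicted DDI already carries a numeric confidence score; how DOMINE derives that score is not described in this record, so the scores, pair labels and function name are illustrative only.

```python
def cumulative_support(scores, known, thresholds):
    """For each threshold S, return (S, n_predicted_at_or_above, fraction_known).

    scores: dict mapping a predicted DDI to its confidence score (assumed given).
    known:  set of DDIs treated as known to be true.
    """
    rows = []
    for s in thresholds:
        selected = [pair for pair, score in scores.items() if score >= s]
        n_known = sum(1 for pair in selected if pair in known)
        fraction = n_known / len(selected) if selected else 0.0
        rows.append((s, len(selected), fraction))
    return rows


# Toy data: hypothetical pair labels and scores, not DOMINE values.
scores = {"A-B": 1, "A-C": 2, "B-C": 3, "C-D": 3}
known = {"A-B", "B-C"}
rows = cumulative_support(scores, known, thresholds=[1, 2, 3])
for s, n, frac in rows:
    print(f"S >= {s}: {n} predicted, {frac:.2f} known")
```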
Figure 3. DOMINE database contents (top panel), and percentage of HCP, MCP and LCP that are known to be true (bottom panel).
Figure 4. Screenshot of query result for HSP90 domain.