DIMA 2.0--predicted and known domain interactions.

Philipp Pagel, Matthias Oesterheld, Oksana Tovstukhina, Norman Strack, Volker Stümpflen, Dmitrij Frishman.

Abstract

DIMA, the domain interaction map, has evolved from a simple web server for domain phylogenetic profiling into an integrative prediction resource that combines experimental data on domain-domain interactions with predictions from two different algorithms. With this update, DIMA achieves greatly improved coverage of genomes and domains, as well as of available prediction approaches. The domain phylogenetic profiling method now uses SIMAP as its backend for exhaustive domain hit coverage: 7038 Pfam domains were profiled over 460 completely sequenced genomes. Domain pair exclusion predictions were produced from 83 969 distinct protein-protein interactions obtained from IntAct, resulting in 21 513 domain pairs with significant domain pair exclusion algorithm scores. Applying the same algorithm to predicted protein interactions from STRING yielded an additional 2378 high-confidence pairs. Experimental data come from iPfam (3074 pairs) and 3did (3034 pairs), two databases identifying domain contacts in solved protein structures. Taken together, these two resources yield 3653 distinct interacting domain pairs. DIMA is available at http://mips.gsf.de/genre/proj/dima.

Substances: Proteins

Year: 2007 PMID: 17999995 PMCID: PMC2238836 DOI: 10.1093/nar/gkm996

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Conserved domains represent the building blocks of protein architecture. Many of the domains known today are re-used in a variety of different proteins in a modular fashion, thus conferring a large range of structural and functional features on their host proteins. The problem of biological annotation, formal description and classification of these domains has been addressed by several groups, leading to important resources such as SMART (1), BLOCKS (2), Pfam (3) and the integration effort InterPro (4). While most conserved domains are characterized mainly by their biochemical activity or structural importance, a significant fraction serve as adapters for physical binding. Examples include the well-described SH3, WW and PDZ domains (5), which are present in large numbers of functionally unrelated proteins and serve as universal interaction modules. Numerous approaches to systematically describing known domain interactions and identifying yet unknown interaction domains have been proposed (6–10). The Pfam database has added iPfam, a domain interaction resource describing protein domains found to engage in physical contact (11). In iPfam, known Pfam domains were matched to intra- and inter-protein contacts found in solved protein structures from the Protein Data Bank (PDB) (12), and the corresponding interactions were annotated. A very similar approach has been taken by Stein et al. (13), resulting in the 3did database of domain contacts. In addition to physical binding, individual domains can be linked by common biochemical or cellular functions. In the prediction of domain–domain interactions, we often cannot distinguish between physical and functional relations, and we therefore treat physical binding as a special case of a functional link. The same is true for most techniques predicting protein–protein interactions.
Based on the well-known method of protein phylogenetic profiling, we introduced the idea of domain phylogenetic profiling and demonstrated its utility for linking functionally related and physically interacting proteins (14). Other approaches to building domain interaction networks include the ‘domain team’ approach (15), which identifies functionally coupled domains based on their chromosomal location, as well as direct experimental evidence. DIMA, the domain interaction map, was launched in 2005 as an online platform for our domain phylogenetic profiling approach and was soon extended to include physical domain contacts from iPfam (16). In the greatly improved and extended 2.0 release, we have added new data sources and prediction methods: iPfam domain contacts are now complemented by a similar dataset from 3did, and domain profiling now covers the latest Pfam release, with the number of genomes used for profiling more than doubled to 460. With the domain pair exclusion algorithm (DPEA), we have integrated a further prediction algorithm, which we apply to both experimental and predicted protein interaction data. Finally, we now use the SIMAP resource for highly efficient domain profiling. Below, we report in detail on the current state of DIMA, which allows users to explore protein domain networks based on links derived from both experimental evidence and domain-relation predictions.

DOMAIN INTERACTIONS FROM KNOWN PROTEIN STRUCTURES

As in all areas of biology, experimental evidence is the most reliable source of data on domain interactions. While a wealth of protein–protein interaction data has been collected in several well-maintained databases (17–23), the situation is quite different for domains. For the majority of protein–protein interactions reported in the literature, no detailed information on the domains mediating the contact is provided, and, to our knowledge, no comprehensive large-scale experiments on domain interactions are available. We have integrated into DIMA two datasets of domain contacts derived from solved protein structures in the PDB (12): iPfam, which is integrated with the well-known Pfam resource (11), and 3did (13), an independent database with a similar scope at the EMBL. Both datasets contain physical domain–domain contacts between separate protein chains as well as within the same chain, and they are currently the only gold standard datasets available. iPfam and 3did currently contain 3074 and 3034 unique domain pairs, respectively; their union contains 3653 distinct domain pairs.
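Because two contact databases may list the same contact with the domains in either order, merging them needs an order-independent key. The following is a minimal sketch, assuming each source is available as a list of domain identifier pairs; the input format and the identifiers (dA, dB, ...) are hypothetical and not taken from iPfam or 3did. The reported counts also fix the overlap by inclusion–exclusion: 3074 + 3034 − 3653 = 2455 domain pairs would be present in both sets, provided both counts use the same unordered-pair definition.

```python
def canonical(pair):
    """Order-independent key for a domain pair: (X, Y) and (Y, X) collapse to one key."""
    a, b = pair
    return (a, b) if a <= b else (b, a)


def merge_contacts(sources):
    """Map each distinct domain pair to the set of sources that report it.

    sources: dict mapping a source name to an iterable of (domain, domain) pairs
    (hypothetical input format). Self-pairs such as (X, X) are kept, since
    contacts within one chain can involve two copies of the same domain.
    """
    merged = {}
    for name, pairs in sources.items():
        for pair in pairs:
            merged.setdefault(canonical(pair), set()).add(name)
    return merged


# Toy example with made-up domain identifiers.
merged = merge_contacts({
    "iPfam": [("dA", "dB"), ("dC", "dC")],
    "3did": [("dB", "dA"), ("dD", "dE")],
})
shared = [p for p, srcs in merged.items() if len(srcs) == 2]
```

Keeping the set of supporting sources per pair, rather than a plain set union, lets a results table show which database backs each contact.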

DOMAIN INTERACTIONS FROM COMPREHENSIVE PROTEIN INTERACTION DATA

As stated in the introduction, beyond direct physical binding the term ‘protein interaction’ is often used to indicate functional coupling between proteins involved in the same signaling pathway or catalyzing consecutive steps of a biochemical pathway. The same obviously holds for conserved domains. To our knowledge, apart from the structure-based domain contacts described above, no databases of experimentally supported functional interactions among protein domains exist. Given the obvious incompleteness of the physical contact datasets and the absence of functional interaction data, prediction of domain relations is of great importance for understanding the role and contribution of modular proteins in the systems-level mechanisms of the cell. An important approach to identifying interacting domains builds on the idea that interacting domain pairs are overrepresented in pairs of interacting proteins and that this signal can be detected by statistical analysis. Many variations and improvements of this idea have been put forward (6–10). One of the best methods currently available is the DPEA by Riley et al. (6), which uses the expectation maximization (EM) algorithm to produce a maximum-likelihood estimate of the probability of interaction for each domain pair and evaluates the contribution of each putative interacting domain pair with a modified likelihood ratio test (E-scores). To obtain the best possible results, it is important to run the algorithm on a large, high-quality dataset of protein–protein interactions. With the recent formation of the International Molecular Exchange consortium (IMEx) and the resulting exchange of interaction data among the major players in the PPI field (DIP, IntAct, MINT, MPact, BioGRID and BIND), obtaining a comprehensive dataset has become much simpler, as the archival resources DIP and IntAct are expected to hold all relevant data. In DIMA, we use the PSIMItab data from IntAct (release date 2007-08-31) (22).
The dataset provides 124 935 pairwise protein interactions, from which we extracted 83 969 unique pairs from 159 species for which UniProt IDs were available. Pfam domain annotation for all proteins involved was obtained from the UniProtKB/Swiss-Prot and UniProtKB/TrEMBL data (24), resulting in a total of 126 260 possible interacting domain pairs.
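The exclusion idea behind DPEA can be illustrated with a compact sketch. It uses a likelihood model of the kind DPEA builds on: each domain pair (m, n) interacts independently with probability λ_mn, two proteins interact if at least one of their domain pairs interacts, and observations carry false-positive and false-negative rates. EM estimates all λ_mn; each candidate pair is then excluded (λ_mn pinned to 0), the model is refit, and the drop in log-likelihood serves as its score. The error rates FP and FN, the iteration count, the function names and the toy data are assumptions, and the score is a plain log-likelihood difference over all protein pairs rather than the exact E-score definition of Riley et al. Representing each interaction as a frozenset collapses A–B/B–A duplicates, mirroring the extraction of unique protein pairs.

```python
import math
from itertools import product

# Assumed (hypothetical) false-positive and false-negative rates of the observed data.
FP, FN = 1e-4, 0.8


def domain_pairs(doms_a, doms_b):
    """All unordered domain pairs (m, n) with m from protein A and n from protein B."""
    return {tuple(sorted((m, n))) for m, n in product(doms_a, doms_b)}


def build_pairs(proteins):
    """Every unordered protein pair (self-pairs included) with its candidate domain pairs."""
    names = sorted(proteins)
    return [(frozenset((p, q)), domain_pairs(proteins[p], proteins[q]))
            for i, p in enumerate(names) for q in names[i:]]


def p_observed(lam, dps):
    """P(O_ij = 1): the pair interacts if at least one domain pair does, then noisy observation."""
    p_int = 1.0 - math.prod(1.0 - lam[dp] for dp in dps)
    return p_int * (1.0 - FN) + (1.0 - p_int) * FP


def fit_em(pairs, observed, iters=100, excluded=None):
    """EM estimate of lambda per domain pair; `excluded` is pinned to 0 (exclusion run)."""
    lam = {dp: 0.1 for _, dps in pairs for dp in dps}
    if excluded in lam:
        lam[excluded] = 0.0  # stays 0: its E-step expectation is always 0
    for _ in range(iters):
        num = dict.fromkeys(lam, 0.0)
        cnt = dict.fromkeys(lam, 0)
        for key, dps in pairs:
            p1 = p_observed(lam, dps)
            hit = key in observed
            for dp in dps:
                cnt[dp] += 1
                # E-step: expected indicator that dp interacts, given the observation
                if hit:
                    num[dp] += lam[dp] * (1.0 - FN) / p1
                else:
                    num[dp] += lam[dp] * FN / (1.0 - p1)
        # M-step: average expectation over all protein pairs containing dp
        lam = {dp: num[dp] / cnt[dp] for dp in lam}
    return lam


def log_likelihood(lam, pairs, observed):
    total = 0.0
    for key, dps in pairs:
        p1 = p_observed(lam, dps)
        total += math.log(p1 if key in observed else 1.0 - p1)
    return total


def exclusion_scores(proteins, observed):
    """Log-likelihood drop when each domain pair is excluded (larger = more essential)."""
    pairs = build_pairs(proteins)
    base = log_likelihood(fit_em(pairs, observed), pairs, observed)
    candidates = sorted({dp for _, dps in pairs for dp in dps})
    return {dp: base - log_likelihood(fit_em(pairs, observed, excluded=dp), pairs, observed)
            for dp in candidates}
```

For example, with proteins P1 {A}, P2 {B}, P3 {A, C} and P4 {B, D}, and observed interactions P1–P2 and P3–P4, the pair (A, B) should score highest, because no other domain pair can explain P1–P2. The sketch enumerates all protein pairs and refits EM once per candidate, so it only suits toy inputs, not a dataset of 83 969 interactions.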

DOMAIN PHYLOGENETIC PROFILING

Phylogenetic profiling was introduced as a means of predicting functional links and physical interactions among proteins by analyzing the presence or absence of orthologs across a large number of genomes (25). Proteins linked by a common function were found to have correlated phylogenetic profiles, i.e. they tend to be either both present in or both absent from a given genome. The method has proven very useful for assigning functional annotation to novel proteins and newly sequenced organisms. In DIMA, we apply the phylogenetic profiling approach to conserved domains, represented by hits of the Pfam HMMs. As of version 2.0, DIMA domain profiling is carried out on 460 completely sequenced prokaryotic and eukaryotic genomes. Comprehensive Pfam domain coverage of all sequences was provided by SIMAP (26).
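As a minimal sketch of domain profiling: each domain is represented by a binary presence/absence vector over genomes, low-entropy profiles (domains present in nearly all or nearly no genomes) are discarded, and the remaining pairs within a distance threshold are reported as links. Hamming distance stands in here for one of the user-selectable distance metrics; the function names, default thresholds and toy profiles are hypothetical, and DIMA's own neighbor search is a separate C++ implementation.

```python
import math
from itertools import combinations


def profile_entropy(profile):
    """Shannon entropy (bits) of a binary presence/absence profile."""
    p = sum(profile) / len(profile)
    if p in (0.0, 1.0):
        return 0.0  # present everywhere or nowhere: no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def hamming(a, b):
    """Number of genomes in which exactly one of the two domains is present."""
    return sum(x != y for x, y in zip(a, b))


def profile_neighbors(profiles, max_dist=1, min_entropy=0.5):
    """Return (domain, domain, distance) links between informative, similar profiles.

    profiles: dict domain -> tuple of 0/1 over the same ordered list of genomes.
    max_dist and min_entropy are hypothetical default thresholds.
    """
    informative = {d: v for d, v in profiles.items() if profile_entropy(v) >= min_entropy}
    links = []
    for d1, d2 in combinations(sorted(informative), 2):
        dist = hamming(informative[d1], informative[d2])
        if dist <= max_dist:
            links.append((d1, d2, dist))
    return links


# Toy profiles over six genomes (made-up data).
profiles = {
    "dA": (1, 1, 0, 0, 1, 0),
    "dB": (1, 1, 0, 0, 1, 1),
    "dC": (0, 0, 1, 1, 0, 1),
    "dD": (1, 1, 1, 1, 1, 1),  # present in every genome: filtered out
}
links = profile_neighbors(profiles)
```

Without the entropy filter, ubiquitous domains such as dD would match each other trivially; filtering them first keeps the link list focused on profiles that actually discriminate between genomes.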

DOMAIN INTERACTIONS FROM PREDICTED PROTEIN INTERACTIONS

Despite great advances in coverage of the interactomes of many important model organisms in recent years, the available data are still far from complete. Predicted protein–protein interactions and functional relations are therefore an important complement. The STRING database (27) integrates a large number of prediction approaches and experimental data into a comprehensive, scored list of predicted interactions. Although not as reliable as interactions supported by experimental evidence, the resource is reputed to produce high-quality results. DIMA includes domain interaction predictions obtained by applying the DPEA to the predicted interactions from STRING release 6.3, complementing the data derived from IntAct. Combined STRING scores were computed from the purely predictive evidence categories only, and a conservative threshold of 0.9 was applied to obtain a set of high-confidence PPI predictions for the subsequent DPEA analysis.
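The filtering step can be sketched as follows, assuming per-channel scores scaled to [0, 1] and the naive independence-based combination S = 1 − Π(1 − s_i) over the selected channels. Which STRING channels count as "purely predicted" is an assumption here, the channel names are illustrative, and newer STRING releases additionally correct for a prior probability, so this approximates rather than reproduces STRING's own procedure.

```python
# Illustrative channel names; the choice of "purely predicted" channels is an assumption.
PREDICTED_CHANNELS = ("neighborhood", "fusion", "cooccurrence", "coexpression")


def combined_score(channel_scores, channels=PREDICTED_CHANNELS):
    """Naive combination 1 - prod(1 - s_i) over the selected channels (scores in [0, 1])."""
    miss = 1.0  # probability that none of the selected channels is correct
    for ch in channels:
        miss *= 1.0 - channel_scores.get(ch, 0.0)
    return 1.0 - miss


def high_confidence(links, threshold=0.9):
    """Keep protein pairs whose predicted-only combined score reaches the threshold."""
    return [(a, b) for a, b, scores in links if combined_score(scores) >= threshold]


links = [
    ("p1", "p2", {"neighborhood": 0.6, "coexpression": 0.8}),
    ("p1", "p3", {"experimental": 0.99, "coexpression": 0.5}),  # experimental channel ignored
]
kept = high_confidence(links)
```

Restricting the combination to predicted channels keeps the STRING-derived DPEA input independent of the experimental IntAct data, which the pair p1–p3 illustrates: its strong experimental score does not count toward the threshold.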

AVAILABILITY

DIMA is available at http://mips.gsf.de/genre/proj/dima. The web interface allows searching by domain identifier, domain description or sequence. Preferences such as phylogenetic profiling distance metrics and thresholds, entropy filtering, DPEA cutoffs and the selection of organisms to be profiled can easily be changed by the user. Results are primarily presented in a concise table (Figure 1) showing the predictions and data sources supporting each domain relation; the user can also view a graphical representation of the local domain neighborhood (Figure 2) or details of the domain phylogenetic profiling results (Figure 3). The network can be navigated by centering on any domain in the neighborhood and re-computing its interactions.
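The table view and re-centering can be pictured as a neighborhood lookup over an edge list that records which source supports each link. This is a minimal sketch; the edge representation, the source labels and the function name are hypothetical and do not describe DIMA's internal data model.

```python
from collections import defaultdict


def neighborhood(edges, center, sources=None):
    """Return {neighbor: set of supporting sources} for one centered domain.

    edges: iterable of (domain, domain, source) triples, in either orientation
    sources: optional set of source names to keep (e.g. a user's selection)
    """
    result = defaultdict(set)
    for a, b, src in edges:
        if sources is not None and src not in sources:
            continue
        if a == center:
            result[b].add(src)
        elif b == center:
            result[a].add(src)
    return dict(result)


edges = [
    ("dA", "dB", "iPfam"),
    ("dB", "dA", "DPEA-IntAct"),
    ("dA", "dC", "profiling"),
    ("dC", "dD", "3did"),
]
around_a = neighborhood(edges, "dA")
around_c = neighborhood(edges, "dC")  # re-centering on a neighbor
```

Re-centering is simply a new lookup with a different center, so navigating the network requires no precomputed global graph.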

Figure 1.

Results are presented in a concise table showing all relevant information.

Figure 2.

If desired, a graphical representation of the local domain neighborhood is shown.

Figure 3.

For the profiling method, detailed results can be examined or the raw data downloaded.

For large-scale analyses or for incorporating DIMA data into their own projects, users can have the entire domain interaction network computed; the results are returned by email upon completion. The DIMA backend program is available from the authors on request.

SYSTEM ARCHITECTURE

The pre-processing and backend software was written in Python with the exception of the domain profile neighbor search, which was implemented in C++ for performance reasons. The backend tool returns results as tab-separated tables for easy parsing and analysis outside the web environment. The web frontend was implemented in Java as part of the GenRe framework at MIPS and handles all user interactions.

CONCLUSION AND OUTLOOK

In addition to keeping DIMA up to date in terms of available data and user options, future work will include new algorithms and new or updated data sources as they become available. An important goal is the derivation of a useful combined scoring scheme for all methods. Currently, this is hardly possible because experimentally validated domain interaction data, a prerequisite for meaningful calibration, are extremely scarce and methodologically biased (they derive exclusively from protein structures). Since its original publication, DIMA has evolved from a domain phylogenetic profiling platform into an integrated resource for domain interaction prediction. DIMA is, of course, not the only resource for domain interactions. InterDom (28,29) is a service with similar goals, but it focuses more on explaining protein interactions through predicted domain interactions and, to our knowledge, incorporates fewer methods than DIMA. By combining up-to-date data, a large set of completely sequenced genomes and state-of-the-art algorithms, DIMA provides a comprehensive resource of domain interactions of considerable value to its users.

REFERENCES

1. Tony Pawson, Piers Nash. Assembly of cell regulatory systems through protein interaction domains. Science, 2003-04-18.

2. Wan Kyu Kim, Jong Park, Jung Keun Suh. Large scale statistical prediction of protein-protein interaction by potentially interacting domain (PID) pair. Genome Inform, 2002.

3. See-Kiong Ng, Zhuo Zhang, Soon-Heng Tan. Integrative approach for computationally inferring protein domain interactions. Bioinformatics, 2003-05-22.

4. See-Kiong Ng, Zhuo Zhang, Soon-Heng Tan, Kui Lin. InterDom: a database of putative interacting protein domains for validating predicted protein interactions and complexes. Nucleic Acids Res, 2003-01-01.

5. BIND: the Biomolecular Interaction Network Database.

6. Assigning protein functions by comparative genome analysis: protein phylogenetic profiles.

7. Predicting protein-protein interactions from protein domains using a set cover approach.

8. New features of the Blocks Database servers.

9. Correlated sequence-signatures as markers of protein-protein interaction.

10. Inferring domain-domain interactions from protein-protein interactions.
