Jaroslav Kubrycht, Karel Sigler, Pavel Souček.
Abstract
Virtual interactomics is a rapidly developing scientific area on the boundary between bioinformatics and interactomics. Protein-related virtual interactomics comprises instrumental tools for the prediction, simulation, and networking of most interactions important for structural and individual reproduction, differentiation, recognition, signaling, regulation, and metabolic pathways of cells and organisms. Here, we describe the main areas of virtual protein interactomics, that is, structurally based comparative analysis and prediction of functionally important interacting sites, mimotope-assisted and combined epitope prediction, molecular (protein) docking studies, and investigation of protein interaction networks. Detailed information about some interesting methodological approaches and online accessible programs or databases is given in our tables. A considerable part of the text deals with searches for common conserved or functionally convergent protein regions and subgraphs of conserved interaction networks, new outstanding trends, and clinically interesting results. In agreement with the presented data and relationships, virtual interactomic tools improve our scientific knowledge, help us to formulate working hypotheses, and frequently also mediate in silico simulations of varying importance.
Year: 2012 PMID: 22928109 PMCID: PMC3423939 DOI: 10.1155/2012/976385
Source DB: PubMed Journal: Mol Biol Int ISSN: 2090-2182
Some more recent online accessible bioinformatic tools.
| Programs | Purpose | Input | Evaluation tools and procedures | Compared structures | Internet accessibility | References |
|---|---|---|---|---|---|---|
| Program tools | | | | | | |
| 3D-BLAST (two methods) | ∗Identification of 23 states of the structural alphabet; ∗Comparison of protein folds using spherical polar Fourier basis functions | PDB ACF | ∗SASM; ∗Carbo-like similarity score | ∗structural alphabet sequence databases; ∗SPF shape density rep. | | |
| DASMIweb | Online integration, analysis and assessment of distributed protein interaction data and predictions (interactomic mining) | some protein identifiers | literature curation, prediction, 3D analysis | records of interactions | | |
| FFAS03 | Server accepts a user supplied protein sequence and automatically generates a profile, which is then compared with several sets of sequence profiles of proteins databases | QS | PPA | DS | | |
| KinasePhos 2.0 | Prediction of kinase-specific phosphorylation sites based on the site sequences from databases | QS | k-fold + Jackknife cross-validations | phosphorylation sites from Phospho.ELM, Swiss-Prot | | |
| Phos3D | Method of phosphorylation-site prediction based on 3D structural information associated with 530 phys-chem-pr | QS (+3D context) | SVM, spatial amino acid propensities | PDB coordinates | | |
| SeSAW | Identification of functionally or evolutionarily conserved motifs based on balancing between sequence and structural similarities | ∗PDB QF, ID; ∗PDB QF, ID, IDt | | conserved domains templates | | |
| Databases | | | | | | |
| ADAN | Prediction of protein-protein interactions of different modular protein domains mediated by linear motifs | PDB ACF | PSSM_PI | integrated DaSt | | |
| CDD (RPS BLAST) | Conserved domain relationships, domain location of frequent binding sites | QS | model MSA derived PSSM | Conserved domains (PSSM) | | |
| GWIDD | Docking database; integrated resource for studies of protein-protein interactions on the genome scale | search interface | docking techniques | DaSt | | |
| MegaMotifBase (structural database) | 3D motif orientation, intermotif distances, solvent accessibility, secondary structure content, hydrogen bonding, residual packaging, familiar/superfamiliar/none motif relationship | motif or sequence pattern | complex evaluation including projection | DS + DaSt | | |
| PTGL | Database for secondary structure-based protein topologies; visualization of topology diagrams and 3D structures | PDB QF | ULN4D_graph theory | DS + DaSt | | |
| RsiteDB | Database of protein binding pockets which interact with single strand RNA | PDB code, UPS | 3D arrangement of phys-chem-pr | 3D-CBP | | |
∗Independent alternatives; 3D-CBP: 3D consensus binding patterns important for protein-nucleotide recognition; DS: database sequences; DSA: double sequence alignment; DaSt: database structures; CS: list of compared sequences; ID, IDt: chain or template ID, respectively; MSA: multiple sequence alignment; QS: query sequence(s) (instead of QS, clonal names or gi numbers can be used in majority of given approaches); phys-chem-pr: physicochemical properties; PDB: Protein data bank; PDB QF: PDB-formatted query file; PDB ACF: PDB-derived atom coordinate file; PPA: profile-profile alignment; PSSM: position-specific scoring matrices; PSSM_PI: PSSM for protein-protein interactions calculated by FoldX; rep.: representations; SASM: structural alphabet substitution matrix; SPF: spherical polar Fourier; SVM: support vector machines; ULN4D_graph theory: unique linear notations of four descriptions for protein structures on different abstraction levels based on graph theory; UPS: unbound protein structure.
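Several entries in the table above (for example CDD and ADAN) rely on position-specific scoring matrices (PSSM). As a rough illustration only, and not the procedure used by any of the listed servers, the sketch below builds a log-odds PSSM from a small gap-free alignment and slides it along a query sequence. The motif sequences, the uniform background frequencies, the pseudocount of 1, and all function names are assumptions made for this example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Build a log2-odds PSSM from equal-length, gap-free sequences.

    Assumes a uniform background frequency (1/20), which is a
    simplification chosen for this sketch.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_window(pssm, window):
    """Sum the column scores of one window (standard residues only)."""
    return sum(column[aa] for column, aa in zip(pssm, window))


def best_window(pssm, sequence):
    """Return (score, start) of the highest-scoring window in sequence."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the PSSM")
    return max(
        (score_window(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Hypothetical motif alignment and query sequence, for illustration only.
motif_alignment = ["RRASL", "RRGSL", "KRASV", "RRASL"]
pssm = build_pssm(motif_alignment)
score, start = best_window(pssm, "MGTARRASLPQ")
print(f"best window starts at index {start} with score {score:.2f}")
```

Real PSSM-based tools typically use background frequencies estimated from large sequence databases and more elaborate pseudocount schemes; the uniform background here only keeps the example short.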
Epitope prediction or reevaluation on accessible servers.
| Programs | Purpose | Tools or procedures of evaluation | Input | Output | Internet accessibility | References |
|---|---|---|---|---|---|---|
| Epitope prediction | | | | | | |
| MimoDB 2.0 | Mimotope database | MySQL relational database | according to menu | Structures visualization, alignments, and so forth | | |
| MimoPro | Maps a group of mimotopes back to a source antigen so as to locate the interacting epitope on the antigen | Branch and bound optimization (analysis of overlapping patches on the surface of a protein) | PDB identifier, mimotope sequences | Score + 3D location in antigen | | |
| ViPR | Virus pathogen database | Integration of various resources | according to menu | Structures, annotations, and so forth | | |
| MetaMHC | Prediction of MHC binding epitopes | meta-approach | QS | Four metapredictor scores including MetaSVMp score | | |
| Epitopia | Prediction of B-cell epitopes | ∗MLA | ∗QS | Immunogenicity score + probability score + color scale record on QS | | |
| ElliPro | New structure-based tool for the prediction of antibody epitopes | 3D + SA + flx + antigenicity | ∗QS | Score + visualized epitope 3D structure and 3D location | | |
| MHCPred | Prediction of class II mouse MHC peptide binding affinity | ISC-PLS, SYBYL software package | QS | Binding affinity (pIC50) | | |
| Stable epitopes and vaccines | | | | | | |
| BayesB | SVM-based prediction of linear B-cell epitopes using Bayes Feature Extraction | Residue conservation + position-specific residue propensities | QS | Residue epitope propensity score | | |
| OptiTope | Selection of optimal peptides for epitope-based vaccines | Multistep parallel evaluation | MSA of Ag | Fraction of overall immunogenicity covered MHC | | |
| PVS | PVS returns a variability-masked sequence, which can be submitted to the RANKPEP server to predict conserved T-cell epitopes. | 3D visualization of MSA derived sequence variability “per site” | PDB ACF | 3D map of variability fragments with no variable residues and their 3D location in antigen | | |
| PEPVAC | Prediction of MHC I; server can also identify conserved and promiscuous MHC I ligands | PSSM, distance matrix, phylogenic clustering algorithm | Genome, HLA-supertypes | Selected peptide sequences score | | |
| RANKPEP | Prediction of MHC I and MHC II ligands | Profile comparison | Genome, HLA-supertypes, QS | Selected peptide sequences score | | |
∗Independent alternatives; Ag: antigen; BFE: Bayes Feature Extraction; flx: flexibility; ES: epitope sequences; HLA: human leukocyte antigens (human MHC); ISC-PLS: iterative self-consistent partial-least-squares based additive method; MHC: alleles of major histocompatibility antigens; MLA: machine learning algorithm; MSA: multiple sequence alignment; PDB: protein data bank; PDB ACF: PDB-derived atom coordinate file; QS: query sequence(s); SA: solvent accessibility; SVM: support vector machines.
Examples of protein docking.
| Approach/combination | Investigated molecules | Predicted/investigated partners | Evaluation tools and results | Employed internet addresses | References |
|---|---|---|---|---|---|
| Protein-ligand | | | | | |
| SwissDock | protein | small molecule (ligand) | FullFitness + RMSD | | |
| Glide (version 5.6) | P-glycoprotein | 24 P-gp binders + 102 endogenous molecules | Glide XP score + MM-GB/SA rescoring function | | |
| VSDocker (AutoGrid4, AutoDock4, etc.) | protein | small molecule (example with 86 775 ligands) | ΔG bind | | |
| Opal (Autodock tools) | protein | small molecule (ligand) | ΔG bind | | |
| FlexX module of SYBYL 7.0 and FMO | neuraminidase (from N1 subtype of influenza virus) | carboxyhexenyl derivatives or analogues | FlexX energy scores | | |
| TarFisDock | protein | drugs | interaction energy | | |
| KinDOCK | ATP binding sites of PK | focused ligand library | theoretical affinity | | |
| AutoDock + Gold | Cdc25 dual specificity phosphatases | library of sulfonylated aminothiazoles as inhibitors | ΔG bind | none | |
| Molsoft module + ACD database | thyroid hormone receptor | 250 000 compounds | EQS score | none | |
| Protein-protein | | | | | |
| ZDOCK | protein | flexible/shape complementary protein | combined evaluation using Fast Fourier Transform | | |
| KBDOCK | protein | protein with hetero-binding site | combined knowledge-based approach | | |
| RosettaDock | protein | protein | CAPRI CS | | |
| ClusPro | protein | protein | combined evaluation | | |
| Protein-nucleic acid | | | | | |
| DARS-RNP | protein | RNA | knowledge-based potentials for scoring protein-RNA models | | |
| protein-DNA docking approaches | transcription factor/protein | DNA | RMSD + knowledge based energy | | |
| Antigen receptors | | | | | |
| DS-QM based peptide binding prediction | HLA-DP2 | HLA-DP2 interacting potential TCR ligands | CS following from two different DS-QM | none | |
| SnugDock | antibody | antigen | paratope based structural optimization + CLEP | | |
| pDOCK | MHC I and II | potential TCR ligands | RMSD | | |
| MHCsim | MHC I and II | potential TCR ligands | molecular dynamic simulation | | |
ACD: available chemicals directory (database); CAPRI: critical assessment of predicted interactions; CLEP: combined lowest energy prediction; CS: combined scoring; DS-QM: docking score-based quantitative matrices; EQS score: score combining grid energy, electrostatic and entropy terms; FMO: fragment molecular orbitals; HLA-DP: a subset of human MHC II; ΔG bind: free energy of binding; PK: protein kinases; MM-GB/SA: molecular mechanics scoring function with generalized Born implicit solvent effects; P-gp: P-glycoprotein; RMSD: root mean square displacement statistics; TCR: T-cell receptor.
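RMSD appears above as an evaluation criterion for several docking approaches (SwissDock, pDOCK, protein-DNA docking). A minimal sketch of the underlying calculation follows, assuming the two poses contain the same atoms in the same order and have already been placed in a common reference frame. No superposition step is performed, and the coordinates are invented for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square distance between paired 3D coordinates.

    Assumes coords_a[i] and coords_b[i] describe the same atom and that
    both structures share one reference frame (no fitting is done here).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical ligand poses (in angstroms), for illustration only.
reference_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked_pose = [(0.1, 0.0, 0.0), (1.5, 0.2, 0.0), (1.4, 1.5, 0.3)]
print(f"RMSD between poses: {rmsd(reference_pose, docked_pose):.3f}")
```

Docking benchmarks usually compute this value over ligand heavy atoms or interface backbone atoms after aligning the receptor; which atoms are compared is a choice of each study and is not fixed by the sketch.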
Online accessible tools for PPI network assembly, reevaluation, and comparison.
| Programs | Purpose | Input | Evaluation results | Network content | Internet accessibility | References |
|---|---|---|---|---|---|---|
| Program tools | | | | | | |
| PINTA | Resource for the prioritization of disease related candidate genes based on the differential expression of their neighborhood in a genome-wide protein-protein interaction network. | Files with disease-specific expression data | Internal scores, | menu: human, mouse, rat, worm | | |
| PCFamily | Finds homologous structure complexes of the query using BLASTP to search the structural template database | QS or QS set in FASTA format | | 941 protein complexes | | |
| BisoGenet | Builds and visualizes biological networks in a fast and user-friendly manner including P-PIN | List of identifiers | Node degrees, cluster coefficients | 5365 human genes | | |
| DynaMod | Identifies significant functional modules reflecting the change of modularity and differential expressions that are correlated with gene expression profiles under different conditions | Genome-wide expression profile | | In frame of GSEA | | |
| TORQUE | Cross-species querying allows users to run topology-free queries on predefined or user-provided target networks, resulting in the subnetwork of the target network most similar to the query subset | Seq of compared sets, query list, target P-PIN | Thickness of the edge | 5430 proteins, 39 936 interactions | | |
| NetworkBLAST | Identifies protein complexes within and across species, forming an html page with links to schemes of interactions within complexes and additional graph data files | Files with ∗PPI, BLPE ∗PPI | likelihood based density score | species related P-PIN | | |
| Databases | | | | | | |
| STRING | A database of functional interaction networks of proteins, globally integrated and scored | Search based on gene annotations | probabilistic confidence score | >1100 completely sequenced organisms | | |
| ChemProt | Disease chemical biology database including 7 × 10⁵ chemicals and two million chemical-protein interactions; indicates a possible formation of disease-associated protein complexes | Compound and protein identifiers | compilation of multiple chemical-protein annotations | 30 578 proteins | | |
| NetAge | Database and network analysis tools for biogerontological research (integrity and functionality of P-PIN is under a tight epigenetic control) | Tools for searching and browsing | experimental evidence about interactions | miRNA-regulated P-PIN in age-related diseases | | |
| BioGRID | All interactions in BioGRID are available through the Osprey visualization system, which can be used to query network organization in a user-defined fashion | Intuitive graphical interface | experimental evidence about interactions | Over 198 000 genetic + protein interactions from six species | | |
| EHCO | Encyclopedia of Hepatocellular Carcinoma genes Online collects, organizes, and compares unsorted HCC-related studies | CMS | NL processing and softbots | 97 proteins, 47 highly interactive, 18 hubs | | |
| IntNetDB v1.0 | A database providing automatic prediction and visualization of PPI network among genes of interest; analysis includes domain-domain interactions, known gene contexts, cross-validation, and so forth | list of query gene identifiers | likelihood ratio following from Bayesian analysis | concerns 27 species; contains GSP and GSN from HPRD | | |
∗Independent alternatives; BLPE: BLASTP E values between pairs of proteins from each of the compared species; CMS: content management system; GSEA: gene set enrichment analyses; GSP, GSN: gold standard positive and a gold standard negative dataset of HPRD, respectively; HCC: hepatocellular carcinoma; HPRD: human protein reference database; NL: natural language; PE: probabilistic evaluation; PPI: protein-protein interactions; QS: query sequences; seq: sequences.
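Node degrees and cluster coefficients are listed above among the evaluation results for PPI networks (BisoGenet). The sketch below shows one common way to compute both on an undirected interaction graph, assuming the local clustering coefficient is defined as the fraction of a node's neighbour pairs that are themselves connected. The protein names and edges are hypothetical, and the code does not reproduce any particular tool.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        if a != b:  # self-interactions are ignored in this sketch
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree(graph, node):
    """Number of distinct interaction partners of a node."""
    return len(graph.get(node, ()))


def clustering_coefficient(graph, node):
    """Fraction of neighbour pairs of `node` that interact with each other."""
    neighbours = list(graph.get(node, ()))
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbours[j] in graph[neighbours[i]]
    )
    return 2.0 * links / (k * (k - 1))


# Hypothetical interaction list, for illustration only.
interactions = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P1", "P4")]
ppi = build_graph(interactions)
for protein in sorted(ppi):
    print(protein, degree(ppi, protein), round(clustering_coefficient(ppi, protein), 3))
```

High-degree nodes in such a graph correspond to the hubs mentioned for EHCO, although the thresholds used to call a protein a hub vary between studies.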