Reyhaneh Esmaielbeiki, Konrad Krawczyk, Bernhard Knapp, Jean-Christophe Nebel, Charlotte M Deane.
Abstract
The majority of biological processes are mediated via protein-protein interactions. Determination of residues participating in such interactions improves our understanding of molecular mechanisms and facilitates the development of therapeutics. Experimental approaches to identifying interacting residues, such as mutagenesis, are costly and time-consuming; computational methods for this purpose could therefore streamline conventional pipelines. Here we review the field of computational protein interface prediction. We make a distinction between methods which address proteins in general and those targeted at antibodies, owing to the radically different binding mechanism of antibodies. We organize the multitude of currently available methods hierarchically based on required input and prediction principles to provide an overview of the field.
Keywords: antibody antigen interaction; protein interface prediction; protein–protein interaction
Year: 2015 PMID: 25971595 PMCID: PMC4719070 DOI: 10.1093/bib/bbv027
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Commonly used metrics to assess the quality of interface residue predictions
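| Metric | Formula |
|---|---|
| Specificity | $\frac{TN}{TN + FP}$ |
| Sensitivity (also known as recall) | $\frac{TP}{TP + FN}$ |
| Precision | $\frac{TP}{TP + FP}$ |
| F1 (harmonic mean of precision and recall) | $\frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$ |
| Accuracy | $\frac{TP + TN}{TP + TN + FP + FN}$ |
| Matthews correlation coefficient (MCC) | $\frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$ |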
A single interface prediction divides the residues of a protein into those predicted to form the binding site and those predicted not to. Predicted binding residues that truly belong to the interface are true positives (TP); the remainder are false positives (FP). Predicted non-binding residues that indeed lie outside the interface are true negatives (TN); those that actually belong to the interface are false negatives (FN) (see Figure S2). These four counts are used to calculate the performance metrics presented in this table.
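The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not code from the review: the function names `confusion_counts` and `interface_metrics`, the representation of residues as integer identifiers, and the choice to return NaN when a denominator is zero are all assumptions made for the example.

```python
from math import sqrt


def confusion_counts(predicted, actual, all_residues):
    """Count TP, FP, TN, FN for one interface prediction.

    predicted, actual and all_residues are iterables of residue
    identifiers (hypothetical representation chosen for this sketch).
    """
    predicted, actual, all_residues = set(predicted), set(actual), set(all_residues)
    tp = len(predicted & actual)                 # predicted binding, truly binding
    fp = len(predicted - actual)                 # predicted binding, not binding
    fn = len(actual - predicted)                 # predicted non-binding, truly binding
    tn = len(all_residues - predicted - actual)  # predicted non-binding, not binding
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    # Assumption: undefined ratios are reported as NaN rather than raising.
    return numerator / denominator if denominator else float("nan")


def interface_metrics(tp, fp, tn, fn):
    """Compute the metrics listed in the table from the four counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "specificity": _ratio(tn, tn + fp),
        "sensitivity": recall,
        "precision": precision,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Toy example: a 20-residue protein with a 5-residue true interface.
counts = confusion_counts(predicted={3, 4, 5, 6},
                          actual={1, 2, 3, 4, 5},
                          all_residues=range(1, 21))
metrics = interface_metrics(*counts)
```

In the toy example most residues are non-binding, so accuracy (0.85) looks considerably better than recall (0.60). This kind of class imbalance is one reason to read accuracy alongside the other metrics in the table.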
Figure 1. Classification of existing protein interface prediction methods. In the leftmost column we present the input required by a method. In the middle column, a simplified pipeline for the protocol is presented. In the rightmost (prediction) column, the resulting binding site is shown in red. Most methods output a ranked list of possible binding sites. For simplicity, we show a single result for each method. (A) Sequence-feature-based predictors: These methods receive a protein sequence. Sequential features of the input are compared with features thought to contribute to a residue being part of an interface, such as conservation scores and physico-chemical properties. (B) 3D mapping-based predictors: These methods receive a protein structure and its sequence as input. Evolutionary conservation is coupled with 3D surface and sequence information. Conserved residues can be grouped according to their surface proximity to form contiguous interface patches. (C) 3D-classifier-based predictors: The input for these methods is a protein structure and its sequence. Distinct sets of attributes (physico-chemical, evolution, 3D structural features, etc.) are used as an input to a learning method such as an SVM or Random Forest. (D) Template-based predictors: These methods receive a protein structure (and thus its sequence) as input. Complex templates are then identified, which can be homologues or structural neighbours (these are shown in white, whereas their binding partners are in green, cyan and yellow). Templates of the input protein are aligned to the query protein. The most commonly aligned contact sites are returned as a prediction. (E) Partner-specific interface predictors: These methods receive the structures/sequences of two proteins that are assumed to interact. Three groups of methods are shown for this category. Partner-specific descriptors can be calculated to predict interfaces.
In some cases docking is used to sample possible orientations to identify a consensus binding site. Partner-specific descriptors and docking poses are used as input for parametric functions and classifiers to obtain the final result. In the co-evolution-based strategy, an MSA of interacting homologues is created and sites that appear to mutate in concert (co-evolve) are assumed to constitute the binding site. A colour version of this figure is available at BIB online: http://bib.oxfordjournals.org.
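The co-evolution-based strategy in panel (E) can be illustrated with a small sketch. The caption does not name a specific co-variation statistic; plain mutual information between alignment columns is used here as one common choice, and this is an assumption of the example. The function names, the toy alignments, and the pairing of rows (row k of each alignment is taken to come from the same interacting pair of homologues) are also hypothetical. Gaps, sequence weighting and phylogenetic corrections are deliberately left out.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (count / n) * log2(count * n / (freq_a[a] * freq_b[b]))
    return mi


def rank_coevolving_pairs(msa_a, msa_b, top=3):
    """Score every (column in A, column in B) pair and return the best ones.

    msa_a and msa_b are lists of equal-length aligned sequences; row k in
    both alignments is assumed to come from the same interacting pair.
    Returns tuples of (score, column index in A, column index in B).
    """
    cols_a = list(zip(*msa_a))
    cols_b = list(zip(*msa_b))
    scores = [
        (mutual_information(ca, cb), i, j)
        for i, ca in enumerate(cols_a)
        for j, cb in enumerate(cols_b)
    ]
    # Highest score first; ties broken by column indices for determinism.
    scores.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scores[:top]


# Toy paired alignment: column 0 of A changes in concert with column 1 of B.
msa_a = ["AK", "AE", "GK", "GE"]
msa_b = ["LD", "LD", "LR", "LR"]
best = rank_coevolving_pairs(msa_a, msa_b, top=1)[0]
```

In the toy data, the only pair with non-zero mutual information is column 0 of the first protein with column 1 of the second, so that pair would be the candidate inter-protein contact. Real alignments need far more sequences for the statistic to be meaningful.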
Protein interface predictors and their performance
| Method | Predictor | Input: Sequence | Input: Structure | Input: Both | Main knowledge source (properties): Sequence | Main knowledge source (properties): Structure | Main knowledge source (properties): Both | Main knowledge source (properties): Additional | Intrinsic-based: Evolution Info. | Intrinsic-based: Intrinsic features | Intrinsic-based: Both | Template-based: Homologous Structure | Template-based: Structural Neighbour | Output: Residue-based | Output: Patch-based | Performance: Data set* | Performance: Recall % | Performance: Precision % | Performance: Specificity % | Performance: Accuracy % | Performance: MCC | Performance: F1 % | Performance: AUC | Performance: Numbers taken from* |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | [ | x | x | x | x | [ | 45.55 | 86.98 | 97.41 | 83.12 | 0.55 | 59.79 | – | |||||||||||
| [ | x | x | x | x | 57.9 | – | 65 | 62.5 | 0.22 | 52 | – | |||||||||||||
| [ | x | x | x | x | [ | 83 | – | 78 | – | 0.76 | – | – | ||||||||||||
| [ | x | x | + | x | x | 47 | 22.2 | 69 | 66.4 | 0.13 | 25.6 | |||||||||||||
| [ | x | x | x | x | 42.84 | 81.96 | – | – | – | 56.25 | – | |||||||||||||
| [ | x | x | x | x | 70 | 37.7 | – | – | – | 49 | – | [ | ||||||||||||
| [ | x | x | + | x | x | [ | 36.6 | 18.9 | 76.1 | 71.9 | 0.09 | 23.2 | – | [ | ||||||||||
| [ | x | x | x | x | [ | 69 | – | 65 | – | 0.28 | 67 | – | [ | |||||||||||
| [ | x | x | x | x | 58.8 | 26.3 | – | – | – | 36.3 | [ | |||||||||||||
| [ | x | x | x | x | [ | 39 | – | 58 | 72 | – | – | – | ||||||||||||
| [ | x | x | x | x | 50 | 62 | – | – | – | 10 | – | [ | ||||||||||||
| B | [ | x | x | x | x | [ | 39.8 | – | 86.9 | 72.6 | – | – | – | |||||||||||
| [ | x | x | x | x | [ | 34.2 | – | 85.1 | 68.5 | – | – | – | [ | |||||||||||
| C | [ | x | x | x | x | [ | 63.6 | – | 84.3 | – | 0.37 | – | – | |||||||||||
| [ | x | x | x | x | [ | 72.7 | – | 61 | 75.2 | 0.47 | 66.3 | 0.82 | ||||||||||||
| [ | x | x | x | x | [ | – | – | – | – | 0.17 | – | 0.69 | ||||||||||||
| [ | x | x | x | x | 99.08 | 99.91 | – | 80.32 | 1.29 | 99.48 | – | |||||||||||||
| [ | x | x | x | x | [ | 45.8 | 69.6 | – | 79.8 | – | – | – | ||||||||||||
| [ | x | x | x | x | 78.99 | 65.3 | 54.66 | 67.29 | 0.34 | – | – | |||||||||||||
| [ | x | x | x | x | x | [ | 68 | – | 73 | 71 | 0.43 | 71 | – | |||||||||||
| [ | x | x | x | x | [ | 74.7 | 63.4 | – | – | 0.58 | – | 0.9 | ||||||||||||
| [ | x | x | x | x | [ | – | – | – | 70 | – | – | – | ||||||||||||
| [ | x | x | x | x | [ | 77 | – | 63 | – | 0.35 | 69 | – | [ | |||||||||||
| [ | x | x | x | x | [ | 78.27 | 63.44 | 51.28 | 65.3 | 0.30 | – | – | [ | |||||||||||
| [ | x | x | x | x | 59 | – | 54 | 69 | 0.33 | 56 | – | [ | ||||||||||||
| [ | x | x | x | x | 60.7 | – | 41.9 | – | 0.20 | – | – | |||||||||||||
| [ | x | x | x | x | [ | – | – | – | – | – | – | – | ||||||||||||
| [ | x | x | x | x | CAPRI | 41.7 | 40.3 | – | – | – | – | – | ||||||||||||
| [ | x | x | x | x | [ | 46.2 | 42.2 | – | 83.2 | 0.30 | 44.1 | – | ||||||||||||
| [ | x | x | x | x | 37.7 | 57.8 | – | 75.1 | 0.31 | 45.7 | – | |||||||||||||
| [ | x | x | x | x | CAPRI | 30.1 | 30.4 | – | 76.9 | 0.16 | 30.2 | 0.60 | [ | |||||||||||
| [ | x | x | x | x | [ | 36 | – | 93 | – | 0.33 | 52 | – | [ | |||||||||||
| [ | x | x | x | x | 60.3 | 63.7 | – | 74.2 | 0.42 | – | – | |||||||||||||
| [ | x | x | x | x | – | – | – | – | – | – | – | – | ||||||||||||
| [ | x | x | x | x | – | – | – | – | – | – | – | – | ||||||||||||
| [ | x | x | x | x | [ | 67 | 22 | – | 67 | – | – | – | ||||||||||||
| [ | x | x | x | x | CAPRI | 34.5 | 37.4 | – | 79.5 | 0.23 | 35.9 | 0.71 | [ | |||||||||||
| [ | x | x | x | x | 42.8 | 57.8 | – | 73.3 | – | – | – | |||||||||||||
| [ | x | x | x | x | CAPRI | 27.3 | 28.7 | – | 76.6 | 0.14 | 28 | 0.62 | [ | |||||||||||
| [ | x | x | x | x | [ | – | – | – | 76 | 0.5 | – | – | ||||||||||||
| [ | x | x | x | x | – | – | – | 72 | 0.43 | – | – | [ | ||||||||||||
| [ | x | x | x | x | [ | 27.7 | – | 44.2 | – | 0.15 | – | – | [ | |||||||||||
| D | [ | [ | – | 25 | – | 45 | – | – | – | |||||||||||||||
| [ | [ | – | 50.5 | – | 49.5 | – | – | – | ||||||||||||||||
| CAPRI | 24 | 38.9 | – | 81.1 | 0.20 | 29.7 | 0.71 | [ | ||||||||||||||||
| E | [ | x | x | x | x | [ | 56.1 | 52.6 | – | 85.4 | 0.45 | 52.5 | – | |||||||||||
| [ | x | x | x | x | [ | 43 | 72.7 | – | – | – | – | – | ||||||||||||
| [ | x | x | x | x | 67.3 | 50 | – | – | – | – | – | |||||||||||||
| F | [ | x | x | x | x | CAPRI-bound | 46.1 | 45.4 | – | 80.9 | 0.34 | 45.7 | 0.77 | |||||||||||
| CAPRI-unbound | 43.7 | 44 | – | 81.2 | 0.32 | 43.8 | 0.75 | |||||||||||||||||
| [ | x | x | x | x | [ | 57.5 | 50.3 | – | 72.6 | 0.34 | 0.53 | 0.73 | ||||||||||||
| CAPRI-bound | 53 | 43 | – | 72.1 | 0.29 | 0.47 | 0.71 | |||||||||||||||||
| CAPRI-unbound | 53.6 | 43.3 | – | 73.2 | 0.30 | 0.48 | 0.72 | |||||||||||||||||
| [ | x | x | x | x | [ | 45.7 | 43.60 | – | – | – | – | – | ||||||||||||
| CAPRI-bound | 42.2 | 41.50 | – | – | – | – | – | |||||||||||||||||
| CAPRI-unbound | 44.6 | 39.8 | – | – | – | – | – | |||||||||||||||||
| [ | x | x | x | x | x | x | [ | 34 | 32 | – | – | – | 34 | – | ||||||||||
| [ | x | x | x | x | x | x | 35.3 | 31.5 | – | – | – | 33.3 | – | |||||||||||
| G | [ | x | x | x | x | [ | – | – | – | – | – | – | 0.47 | |||||||||||
| [ | x | x | x | x | [ | – | – | – | – | – | – | 0.87 | ||||||||||||
| [ | x | x | x | x | [ | 62.2 | 40.4 | – | – | – | – | – | ||||||||||||
| [ | x | x | x | x | [ | – | – | – | – | – | – | 0.72 | ||||||||||||
| [ | x | x | x | x | 72.7 | 39.3 | – | – | – | 51 | – | |||||||||||||
| [ | x | x | + | x | – | – | – | – | – | – | – | |||||||||||||
| [ | x | x | x | [ | 20 | 59 | – | – | – | – | – | [ | ||||||||||||
| [ | x | x | x | [ | 20 | 23 | – | – | – | – | – | [ | ||||||||||||
| [ | x | x | x | [ | 20 | 23 | – | – | – | – | – | [ | ||||||||||||
| [ | x | x | x | [ | 20 | 23 | – | – | – | – | – | [ | ||||||||||||
| [ | x | x | x | [ | 20 | 25 | – | – | – | – | – | [ | ||||||||||||
| [ | x | x | x | [ | 20 | 20 | – | – | – | – | – | [ | ||||||||||||
The predictors are grouped by their corresponding category from this manuscript, based on the input and methodology used. The letters in the ‘Method’ column correspond to the headings in the text (except for meta predictors). Performance measures, where available, were collected from the original publications. Where possible, the performance measures were taken from studies benchmarking several predictors at once. An empty cell in a column marked with * indicates the same study as the one whose reference number appears in the predictor column of that row. Cells with + refer to ‘predicted structural feature’. In the data set column, CAPRI refers to the targets used in the CAPRI challenge, which can be in the bound or unbound form. The 3D classifier group contains some methods that are based on scoring functions. Cells marked with x indicate the features the predictor uses. Where data are not available, a ‘–’ sign is used. In the Method column, for ‘A’ see section ‘Sequence Feature-based Predictors’, for ‘B’ see section ‘3D mapping-based Predictors’, for ‘C’ see section ‘3D-Classifier Predictors’, for ‘D’ see meta methods in section ‘Descriptors used by predictors’, for ‘E’ see section ‘Homologous Template-based Predictors’, for ‘F’ see section ‘Structural Neighbour-based Predictors’ and for ‘G’ see section ‘Partner-specific interface predictors’.
Figure 2. Antibody structure and binding. The most common form of an antibody is the IgG (upper left). IgG is composed of two pairs of heavy and light chains. The tip of an antibody that carries the binding site (symmetrical in an IgG) is the variable region (upper right). The variable region harbours the six CDR loops, which form the majority of the antigen recognition site, the paratope (lower). The CDR regions are distinct between different antibodies whereas the rest of the antibody remains largely unchanged. The paratope recognizes a specific epitope, the corresponding binding site on the antigen (lower). A colour version of this figure is available at BIB online: http://bib.oxfordjournals.org.