Tomasz Kosciolek, David T Jones.
Abstract
Here we present the results of residue-residue contact predictions achieved in CASP11 by the CONSIP2 server, which is built around our MetaPSICOV contact prediction method. On a set of 40 target domains with a median family size of around 40 effective sequences, our server achieved an average top-L/5 long-range contact precision of 27%. The MetaPSICOV method is based on a combination of classical contact prediction features, enhanced with three distinct covariation methods, embedded in a two-stage neural network predictor. Some unique features of our approach are (1) the tuning between the classical and covariation features depending on the depth of the input alignment and (2) a hybrid approach to generating the deepest possible multiple-sequence alignments by combining jackHMMer and HHblits. We discuss the CONSIP2 pipeline and our results, and show that where the method underperformed, the major factors were the reliance on a fixed set of parameters for the initial sequence alignments and the lack of domain splitting as a preprocessing step.
Proteins 2016; 84(Suppl 1):145-151.
Keywords: CASP; ab initio prediction; amino acid covariation; protein structure prediction; residue-residue contact prediction
Year: 2015 PMID: 26205532 PMCID: PMC5042084 DOI: 10.1002/prot.24863
Source DB: PubMed Journal: Proteins ISSN: 0887-3585
Figure 1. Diagram of sequence alignment pipeline implemented in the CONSIP2 server.
Figure 2. Average MetaPSICOV performance for targets in the small MSA region compared to CASP11 contact prediction results.
Summary of CONSIP2 Results
| Domain | Length | Top‐L/5 LR precision (%) | N eff |
|---|---|---|---|
| T0761‐D1 | 88 | 5.6 | 1 |
| T0761‐D2 | 136 | 8.7 | 1 |
| T0763‐D1 | 130 | 46.2 | 2 |
| T0767‐D2 | 180 | 58.3 | 43 |
| T0771‐D1 | 151 | 10.0 | 8 |
| T0775‐D2 | 66 | 46.2 | 19 |
| T0775‐D4 | 61 | 25.0 | 20 |
| T0775‐D5 | 145 | 0.0 | 14 |
| T0777‐D1 | 345 | 23.2 | 39 |
| T0781‐D1 | 200 | 5.0 | 2 |
| T0785‐D1 | 112 | 18.2 | 1 |
| T0789‐D1 | 146 | 51.7 | 253 |
| T0789‐D2 | 126 | 28.0 | 304 |
| T0790‐D1 | 135 | 44.4 | 276 |
| T0790‐D2 | 130 | 26.9 | 300 |
| T0791‐D1 | 156 | 53.3 | 223 |
| T0791‐D2 | 139 | 42.9 | 271 |
| T0793‐D1 | 109 | 15.0 | 302 |
| T0793‐D2 | 45 | 11.1 | 402 |
| T0793‐D5 | 118 | 38.1 | 357 |
| T0794‐D2 | 172 | 26.5 | 133 |
| T0799‐D1 | 141 | 7.1 | 2 |
| T0802‐D1 | 116 | 13.0 | 4 |
| T0804‐D2 | 152 | 16.7 | 1 |
| T0806‐D1 | 256 | 84.3 | 561 |
| T0808‐D2 | 269 | 35.2 | 46 |
| T0810‐D1 | 113 | 17.4 | 83 |
| T0814‐D1 | 137 | 37.0 | 115 |
| T0814‐D2 | 116 | 82.6 | 131 |
| T0820‐D1 | 90 | 5.6 | 1 |
| T0824‐D1 | 108 | 45.5 | 155 |
| T0826‐D1 | 201 | 7.5 | 422 |
| T0827‐D2 | 158 | 10.0 | 257 |
| T0831‐D2 | 244 | 7.7 | 71 |
| T0832‐D1 | 209 | 2.4 | 10 |
| T0834‐D1 | 99 | 5.0 | 34 |
| T0834‐D2 | 92 | 17.7 | 28 |
| T0836‐D1 | 204 | 43.9 | 50 |
| T0837‐D1 | 121 | 29.2 | 9 |
| T0855‐D1 | 115 | 17.4 | 19 |
Contact prediction precision is calculated for top‐L/5 LR contacts, where L is the length of the protein and LR indicates long‐range contacts (>23 sequence separation).
N eff: number of effective sequences, calculated as described in the Materials and Methods section (see “Effective sequence calculations”).
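The precision metric defined in the footnote above can be sketched in a few lines. This is a minimal illustration, not the CASP assessors' or the CONSIP2 authors' evaluation code: the function name, the `(i, j, score)` input layout, the use of integer division for L/5, and dividing by the number of pairs actually selected are all assumptions, since the source does not state a rounding convention. The toy data is invented for the example.

```python
from typing import Iterable, Set, Tuple

# Long-range pairs have a sequence separation > 23, as in the table footnote.
MIN_SEPARATION = 24


def top_l5_long_range_precision(
    predictions: Iterable[Tuple[int, int, float]],
    native_contacts: Set[Tuple[int, int]],
    length: int,
) -> float:
    """Percentage of the top-L/5 long-range predictions that are native contacts.

    predictions: (i, j, score) residue pairs (hypothetical input layout).
    native_contacts: set of (i, j) pairs with i < j observed in the structure.
    length: L, the length of the protein or domain.
    """
    # Keep only long-range pairs, normalised so that i < j.
    long_range = [
        (min(i, j), max(i, j), score)
        for i, j, score in predictions
        if abs(i - j) >= MIN_SEPARATION
    ]
    # Rank by predicted score, most confident first.
    long_range.sort(key=lambda pair: pair[2], reverse=True)
    # Assumption: L/5 uses integer division and selects at least one pair.
    n_top = max(1, length // 5)
    top = long_range[:n_top]
    if not top:
        return 0.0
    hits = sum((i, j) in native_contacts for i, j, _ in top)
    return 100.0 * hits / len(top)


# Toy example for a 30-residue chain, so L/5 = 6 pairs are evaluated.
toy_predictions = [
    (10, 12, 0.99),  # short-range, ignored
    (1, 25, 0.90), (1, 26, 0.85), (2, 27, 0.80), (3, 28, 0.70),
    (4, 29, 0.60), (5, 30, 0.50), (6, 30, 0.40), (1, 30, 0.30),
]
toy_native = {(1, 25), (2, 27), (5, 30), (6, 30)}
print(top_l5_long_range_precision(toy_predictions, toy_native, length=30))  # 50.0
```

In the toy case three of the six highest-scoring long-range pairs are native contacts, which gives 50%. The pair (6, 30) is a true contact but ranks seventh, so it does not count.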
Recalculated Effective Sequence Counts and Precision Values Using Only the Domain Sequence
| Domain | Initial top‐L/5 LR precision (%) | Initial N eff | Recalculated top‐L/5 LR precision (%) | Recalculated N eff |
|---|---|---|---|---|
| T0789‐D2 | 28.0 | 304 | 36.0 | 278 |
| T0790‐D2 | 26.9 | 300 | 38.0 | 258 |
| T0793‐D1 | 15.0 | 302 | 9.5 | 12 |
| T0793‐D2 | 11.1 | 402 | 25.0 | 12 |
| T0793‐D5 | 38.1 | 357 | 21.7 | 61 |
| T0826‐D1 | 7.5 | 422 | 62.5 | 424 |
| T0827‐D2 | 10.0 | 257 | 20.0 | 116 |
Initial results (Columns 2 and 3): the results submitted by the CONSIP2 server during the CASP11 prediction season, obtained using the whole target sequence.
Recalculated results (Columns 4 and 5): produced by CONSIP2 using only the domain sequence.
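To make the effect of realignment easier to read off the table above, the per-domain change in precision can be tabulated with a short script. This is only a convenience sketch over the values listed in the table. The variable and function names are hypothetical, and ASCII hyphens are used in the domain identifiers.

```python
# (domain, initial precision %, initial N eff, recalculated precision %, recalculated N eff)
# Values copied from the table of recalculated results.
rows = [
    ("T0789-D2", 28.0, 304, 36.0, 278),
    ("T0790-D2", 26.9, 300, 38.0, 258),
    ("T0793-D1", 15.0, 302, 9.5, 12),
    ("T0793-D2", 11.1, 402, 25.0, 12),
    ("T0793-D5", 38.1, 357, 21.7, 61),
    ("T0826-D1", 7.5, 422, 62.5, 424),
    ("T0827-D2", 10.0, 257, 20.0, 116),
]


def precision_change(table):
    """Map each domain to (recalculated - initial) precision, in percentage points."""
    return {
        domain: round(recalculated - initial, 1)
        for domain, initial, _, recalculated, _ in table
    }


changes = precision_change(rows)
improved = sorted(d for d, delta in changes.items() if delta > 0)
worsened = sorted(d for d, delta in changes.items() if delta < 0)

for domain, delta in changes.items():
    print(f"{domain}: {delta:+.1f} percentage points")
print("improved:", improved)
print("worsened:", worsened)
```

With these values, five of the seven domains gain precision when only the domain sequence is aligned, with the largest change for T0826-D1, while T0793-D1 and T0793-D5 lose precision as their N eff drops sharply.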
Figure 3. Changes to the outliers upon realigning the domain sequence. The solid line shows reference MetaPSICOV benchmark results. Points represent analyzed outliers: solid circle (o), initial CONSIP2 server prediction; cross (x), result for the realigned domain sequence.
Figure 4. Jones‐UCL FM prediction for target T0836‐D1. A sample free modeling case, where the predicted contacts (N eff = 50; top‐L/5 LR contact precision = 44%) helped to produce an outstanding model.