Daniel W A Buchan, David T Jones.
Abstract
In this paper, we present the results for the MetaPSICOV2 contact prediction server in the CASP12 community experiment (http://predictioncenter.org). Over the 35 assessed Free Modelling target domains, the MetaPSICOV2 server achieved a mean precision of 43.27%, a substantial increase relative to the server's performance in the CASP11 experiment. We discuss improvements to the MetaPSICOV2 server, covering both changes to the neural network and attempts to integrate contact predictions on a domain basis into the prediction pipeline. We also discuss some limitations in the CASP12 assessment which may have overestimated the performance of our method.
Keywords: bioinformatics; contact prediction; machine learning; meta-prediction
Year: 2017 PMID: 28901583 PMCID: PMC5836854 DOI: 10.1002/prot.25379
Source DB: PubMed Journal: Proteins ISSN: 0887-3585
Figure 1. MetaPSICOV2 contact prediction pipeline. Sequences enter the pipeline at the top left. An HHblits search is run against PDB70 and, if putative structural domains are identified, one or more additional masked sequences are produced. The masked sequence (red path) and query sequence (blue path) then follow the CONSIP2 pipeline. If the prediction over the masked sequence produces high-quality contacts, these are integrated before the final contact prediction is produced.
Summary of MetaPSICOV2 performance
| Target ID | Domain | Precision (%) | N | Type |
|---|---|---|---|---|
| T0859 | D1 | 4.35 | 1 | FM |
| T0862 | D1 | 26.32 | 9 | FM |
| T0863 | D1 | 12.82 | 80 | FM |
| T0863 | D2 | 6.94 | 80 | FM |
| T0864 | D1 | 64.00 | 175 | FM |
| T0866 | D1 | 100.00 | 952 | FM |
| T0869 | D1 | 52.38 | 16 | FM |
| T0870 | D1 | 8.00 | 28 | FM |
| T0878 | D1 | 43.48 | 204 | FM |
| T0880 | D2 | 25.00 | 1 | FM |
| T0886 | D1 | 100.00 | 1473 | FM |
| T0886 | D2 | 100.00 | 1473 | FM |
| T0888 | D1 | 4.00 | 2 | FM |
| T0890 | D2 | 13.64 | 16 | FM |
| T0892 | D2 | 63.64 | 289 | FM |
| T0894 | D1 | 0.00 | 16 | FM |
| T0897 | D1 | 3.57 | 10 | FM |
| T0897 | D2 | 16.00 | 10 | FM |
| T0898 | D1 | 27.27 | 33 | FM |
| T0899 | D1 | 86.54 | 109 | FM |
| T0899 | D2 | 61.11 | 109 | FM |
| T0900 | D1 | 95.24 | 7 | FM |
| T0901 | D2 | 50.00 | 631 | FM |
| T0904 | D1 | 25.49 | 42 | FM |
| T0905 | D1 | 93.88 | 914 | FM |
| T0912 | D3 | 42.86 | 1023 | FM |
| T0914 | D1 | 3.13 | 6 | FM |
| T0914 | D2 | 15.15 | 6 | FM |
| T0915 | D1 | 38.71 | 25 | FM |
| T0918 | D1 | 72.73 | 428 | FM |
| T0918 | D2 | 84.00 | 428 | FM |
| T0918 | D3 | 87.50 | 428 | FM |
| T0923 | D1 | 15.52 | 10 | FM |
| T0941 | D1 | 8.70 | 3 | FM |
| T0946 | D1 | 62.50 | 337 | FM |
| T0868 | D1 | 75.00 | 11 | FM/TBM |
| T0884 | D1 | 26.67 | 26 | FM/TBM |
| T0890 | D1 | 58.82 | 16 | FM/TBM |
| T0892 | D1 | 64.29 | 289 | FM/TBM |
| T0894 | D2 | 63.64 | 16 | FM/TBM |
| T0896 | D1 | 72.22 | 2673 | FM/TBM |
| T0896 | D2 | 10.00 | 3 | FM/TBM |
| T0898 | D2 | 27.27 | 33 | FM/TBM |
| T0901 | D1 | 80.00 | 631 | FM/TBM |
| T0909 | D1 | 24.62 | 80 | FM/TBM |
| T0912 | D2 | 88.24 | 1026 | FM/TBM |
| T0943 | D1 | 69.23 | 473 | FM/TBM |
| T0945 | D1 | 94.67 | 872 | FM/TBM |
Contact prediction precision is calculated over the top L/5 long-range contacts, where L is the length of the protein and long range is taken to be a sequence separation of >23 residues.
N gives the number of effective sequences, calculated as described in the Materials and Methods.
Type gives the prediction category; FM: Free modelling, FM/TBM: Free modelling/Template Based Modelling.
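The precision measure described in the notes above can be sketched as follows. This is a minimal illustration rather than the CASP assessors' code: the function name, the `(i, j, probability)` input format, rounding L/5 down, and dividing by the number of contacts actually selected when fewer than L/5 long-range predictions exist are all assumptions made for the example.

```python
from typing import Iterable, Set, Tuple

Contact = Tuple[int, int]


def top_l5_long_range_precision(
    predictions: Iterable[Tuple[int, int, float]],
    true_contacts: Set[Contact],
    length: int,
    min_separation: int = 24,
) -> float:
    """Precision (%) over the top L/5 long-range predicted contacts.

    predictions   -- (i, j, probability) triples, residue indices in any order
    true_contacts -- observed contacts stored as (i, j) with i < j
    length        -- L, the number of residues in the target
    """
    # Long range: sequence separation > 23 residues, i.e. |i - j| >= 24.
    long_range = [
        (min(i, j), max(i, j), p)
        for i, j, p in predictions
        if abs(i - j) >= min_separation
    ]
    # Rank by predicted probability, highest first.
    long_range.sort(key=lambda t: t[2], reverse=True)

    # Assumption: L/5 is rounded down, with at least one contact evaluated.
    n_top = max(1, length // 5)
    selected = long_range[:n_top]
    if not selected:
        return 0.0

    hits = sum((i, j) in true_contacts for i, j, _ in selected)
    return 100.0 * hits / len(selected)


# Hypothetical toy data for a 30-residue target (not taken from CASP12).
toy_predictions = [
    (1, 25, 0.90),
    (2, 27, 0.80),
    (5, 10, 0.95),  # short range, ignored
    (3, 28, 0.70),
    (30, 1, 0.60),
]
toy_truth = {(1, 25), (3, 28)}
print(top_l5_long_range_precision(toy_predictions, toy_truth, length=30))
```

Because only L/5 contacts are scored, a short domain contributes very few pairs, so a single hit or miss can move its precision by many percentage points.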
Figure 2. Increase in precision as N increases. FM targets are shown as red circles and FM/TBM targets as blue triangles. Trend lines were fitted using LOESS.
Change in performance for integrating domain predictions
| Target ID | Default Precision (%) | Updated Precision (%) | N | Type |
|---|---|---|---|---|
| T0862 | 26.32 | 26.32 | 9 | FM |
| T0904 | 25.49 | 25.49 | 42 | FM |
| T0905 | 93.88 | 93.88 | 914 | FM |
| T0941 | 7.25 | 8.70 | 3 | FM |
| T0946‐D1 | 6.25 | 62.50 | 337 | FM |
| T0896‐D1 | 11.11 | 72.22 | 2673 | FM/TBM |
| T0896‐D2 | 10.00 | 10.00 | 3 | FM/TBM |
| T0909 | 23.08 | 24.62 | 80 | FM/TBM |
| T0912‐D2 | 88.24 | 88.24 | 1026 | FM/TBM |
| T0945 | 92.00 | 94.67 | 872 | FM/TBM |
Change in precision for targets where the new domain identification strategy was utilised.
Default Precision shows the predicted precision given the default MetaPSICOV2 pipeline.
Updated Precision shows the precision after integrating contacts based on domain recognition.
Type gives the prediction category; FM: Free modelling, FM/TBM: Free modelling/Template Based Modelling.
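The effect of domain integration is easier to read as a per-target difference. The short sketch below copies the precision values from the table above and subtracts the default from the updated precision; the variable and function names are illustrative only.

```python
# (target, default precision %, updated precision %) copied from the table above.
ROWS = [
    ("T0862", 26.32, 26.32),
    ("T0904", 25.49, 25.49),
    ("T0905", 93.88, 93.88),
    ("T0941", 7.25, 8.70),
    ("T0946-D1", 6.25, 62.50),
    ("T0896-D1", 11.11, 72.22),
    ("T0896-D2", 10.00, 10.00),
    ("T0909", 23.08, 24.62),
    ("T0912-D2", 88.24, 88.24),
    ("T0945", 92.00, 94.67),
]


def precision_changes(rows):
    """Map each target to (updated - default) precision in percentage points."""
    return {target: round(updated - default, 2) for target, default, updated in rows}


changes = precision_changes(ROWS)
improved = [t for t, delta in changes.items() if delta > 0]
unchanged = [t for t, delta in changes.items() if delta == 0]
declined = [t for t, delta in changes.items() if delta < 0]

# List the targets from largest to smallest change.
for target, delta in sorted(changes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{target:10s} {delta:+6.2f}")
```

With the tabulated values, five of the ten domains improve, five are unchanged and none decline. The largest gains are for T0896-D1 (+61.11 points) and T0946-D1 (+56.25 points), both of which have high N.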
Figure 3. Comparison of precision values for top L/5 predictions using MetaPSICOV and MetaPSICOV2. Targets compared are only those domains that did not go through the MetaPSICOV2 domain recognition process. Points are individual CASP12 targets. Points are labelled by N: red squares and triangles for low N values, green diamonds and circles for high N values.
Figure 4. Relationship between MetaPSICOV2 probability estimate (in bins of 5%) and the true precision for predicted contacts which fell in those bins.
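A comparison of this kind can be computed by grouping predicted contacts into 5% probability bins and measuring the fraction that are true contacts in each bin. The sketch below is one possible way to do this, not the authors' analysis script: the function name, the `(probability, is_true_contact)` input format and the choice to place a probability of exactly 1.0 in the top bin are assumptions.

```python
import math
from typing import Dict, Iterable, Tuple


def precision_by_probability_bin(
    scored: Iterable[Tuple[float, bool]],
    bin_width_pct: int = 5,
) -> Dict[int, float]:
    """Observed precision (%) per predicted-probability bin.

    scored -- (probability in [0, 1], is_true_contact) pairs
    Returns a dict keyed by the lower edge of each non-empty bin, in percent.
    """
    n_bins = 100 // bin_width_pct
    totals = [0] * n_bins
    hits = [0] * n_bins

    for prob, is_contact in scored:
        # Work in whole percent; the small epsilon guards against
        # floating-point values that land just below a bin edge.
        pct = math.floor(prob * 100 + 1e-9)
        # A probability of exactly 1.0 is placed in the top bin.
        idx = min(pct // bin_width_pct, n_bins - 1)
        totals[idx] += 1
        hits[idx] += int(is_contact)

    return {
        b * bin_width_pct: 100.0 * hits[b] / totals[b]
        for b in range(n_bins)
        if totals[b]
    }


# Hypothetical scored contacts (not taken from the CASP12 results).
toy_scored = [(0.12, True), (0.13, False), (0.97, True), (0.96, True), (1.0, True)]
print(precision_by_probability_bin(toy_scored))
```

For a well-calibrated predictor, the observed precision in each bin lies close to the probabilities that define the bin, so the points fall near the diagonal.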