Ian M Overton, C A Johannes van Niekerk, Geoffrey J Barton.
Abstract
Production of diffracting crystals is a critical step in determining the three-dimensional structure of a protein by X-ray crystallography. Computational techniques to rank proteins by their propensity to yield diffraction-quality crystals can improve efficiency in obtaining structural data by guiding both protein selection and construct design. XANNpred comprises a pair of artificial neural networks that each predict the propensity of a selected protein sequence to produce diffraction-quality crystals by current structural biology techniques. Blind tests show XANNpred has accuracy and Matthews correlation values ranging from 75% to 81% and 0.50 to 0.63 respectively; values of area under the receiver operator characteristic (ROC) curve range from 0.81 to 0.88. On blind test data XANNpred outperforms the other available algorithms XtalPred, PXS, OB-Score, and ParCrys. XANNpred also guides construct design by presenting graphs of predicted propensity for diffraction-quality crystals against residue sequence position. The XANNpred-SG algorithm is likely to be most useful to target selection in structural genomics consortia, while the XANNpred-PDB algorithm is more suited to the general structural biology community. XANNpred predictions that include sliding window graphs are freely available from http://www.compbio.dundee.ac.uk/xannpred.
Year: 2011 PMID: 21246630 PMCID: PMC3084997 DOI: 10.1002/prot.22914
Source DB: PubMed Journal: Proteins ISSN: 0887-3585
Summary of Performance on Blind Test Datasets
| Algorithm | TEST-PDB AROC | TEST-PDB MCC | TEST-SG AROC | TEST-SG MCC | HTEST-PDB AROC | HTEST-PDB MCC | HTEST-SG AROC | HTEST-SG MCC |
|---|---|---|---|---|---|---|---|---|
| XANNpred-PDB | 0.854 | 0.63 | — | — | 0.810 | 0.50 | — | — |
| XANNpred-SG | — | — | 0.836 | 0.52 | — | — | 0.877 | 0.58 |
| XtalPred | 0.707 | 0.37 (0.29) | 0.791 | 0.47 (0.47) | 0.770 | 0.48 (0.48) | 0.701 | 0.34 (0.27) |
| OB-Score | 0.612 | 0.23 (0.17) | 0.658 | 0.37 (0.31) | 0.644 | 0.32 (0.30) | 0.613 | 0.24 (0.19) |
| ParCrys | 0.541 | 0.17 (0.12) | 0.655 | 0.36 (0.25) | 0.634 | 0.32 (0.21) | 0.562 | 0.23 (0.13) |
| PXS | 0.574 | 0.21 (0.17) | 0.522 | 0.13 (0.02) | 0.599 | 0.30 (0.05) | 0.416 | 0 (−0.02) |
Values marked with a dash are omitted from the table because they may be inflated by overlap with training data. For completeness, the AROC/MCC values for XANNpred-SG are 0.917/0.66 on TEST-PDB and 0.880/0.62 on HTEST-PDB; for XANNpred-PDB they are 0.822/0.47 on TEST-SG and 0.857/0.65 on HTEST-SG.
Matthews correlation values given for XtalPred, OB-Score, ParCrys, and PXS are maximum possible values. Matthews correlation values in brackets were determined with predictive thresholds quoted in the literature for OB-Score and ParCrys; bracketed values for XtalPred reflect a threshold of 3; bracketed values for PXS reflect a threshold of 0.2.
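The difference between a "maximum possible" Matthews correlation and one obtained at a fixed, published threshold can be illustrated with a minimal Python sketch. It assumes that a higher score means a higher predicted crystallization propensity and that the maximum is taken over every observed score used as a cut-off; the function names and the toy data are hypothetical and are not taken from the paper.

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def mcc_at_threshold(scores, labels, threshold):
    """Predict 'crystallizable' when score >= threshold, then compute MCC."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
        else:
            tn += 1
    return mcc(tp, fp, tn, fn)

def max_mcc(scores, labels):
    """Best MCC over all observed scores used as candidate thresholds."""
    return max(mcc_at_threshold(scores, labels, t) for t in set(scores))

# Toy data (invented for illustration only).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(mcc_at_threshold(scores, labels, 0.5))  # fixed threshold
print(max_mcc(scores, labels))                # optimistic upper bound
```

Because the maximum is chosen with knowledge of the test labels, it is an optimistic bound on what a method could achieve, which is why it can exceed the value at a threshold fixed in advance.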
Figure 1. ROC curves for XANNpred-PDB, XtalPred [20], OB-Score [19], PXS [16], and ParCrys [21] on the blind test dataset TEST-PDB. XANNpred-PDB significantly outperforms the next best algorithm XtalPred (two-tailed P ≤ 0.0062). Areas under the ROC curves are given in the bottom right-hand corner. This figure was generated using the R package [42].
Figure 2. ROC curves for XANNpred-SG, XtalPred [20], OB-Score [19], ParCrys [21], and PXS [16] on the blind test dataset TEST-SG. Areas under the ROC curves are given in the bottom right-hand corner. This figure was generated using the R package [42].
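The area under a ROC curve can be read as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below computes it that way in plain Python. It is a generic illustration on invented data, not the R code used to produce the figures, and the name `area_under_roc` is hypothetical.

```python
def area_under_roc(scores, labels):
    """AROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy data (invented for illustration only).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(area_under_roc(scores, labels))
```

Under this reading a value of 0.5 corresponds to random ranking, so an AROC below 0.5 means that positives were, on average, ranked below negatives.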
Figure 3. XANNpred-PDB sliding window plot for “HVA22-like protein a” (Q9S7V4). Residues 92 to 177 fall into windows with very high XANNpred score (>0.9), while the centre position of very high-scoring windows spans residues 123 to 148. The 85 residues within very high-scoring windows therefore offer a potentially promising starting point for work to crystallize the C-terminal region of “HVA22-like protein a.”
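The relationship between window centres and the residues those windows cover can be sketched as follows. This is a minimal illustration only: the window length, the cut-off, and the scoring function `toy_score` are placeholders chosen for the example, and the real XANNpred neural networks and their window size are not reproduced here.

```python
def toy_score(window_seq):
    """Placeholder scorer (fraction of alanine); NOT the XANNpred network."""
    return window_seq.count("A") / len(window_seq)

def window_scores(sequence, window, score_fn):
    """Score every full-length window; return (1-based centre, score) pairs."""
    half = window // 2
    return [
        (start + half + 1, score_fn(sequence[start:start + window]))
        for start in range(len(sequence) - window + 1)
    ]

def high_scoring_span(sequence, window, score_fn, cutoff):
    """Outer span of residues covered by windows scoring above the cutoff.

    Returns ((first_residue, last_residue), (first_centre, last_centre)),
    all 1-based, or None when no window exceeds the cutoff. The span is the
    outer envelope; high-scoring windows need not be contiguous.
    """
    half = window // 2
    centres = [c for c, s in window_scores(sequence, window, score_fn) if s > cutoff]
    if not centres:
        return None
    first_centre, last_centre = min(centres), max(centres)
    # A window with 1-based centre c covers residues c - half .. c - half + window - 1.
    covered = (first_centre - half, last_centre - half + window - 1)
    return covered, (first_centre, last_centre)

# Toy sequence (invented): a run of six alanines flanked by glycines.
print(high_scoring_span("GGGGAAAAAAGGGG", 5, toy_score, 0.9))
```

The covered residue range is always wider than the range of window centres, since each high-scoring window extends to either side of its centre; this is the same distinction the caption draws between residues 92 to 177 and centre positions 123 to 148.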