Marcin J Mizianty, Lukasz Kurgan.
Abstract
MOTIVATION: X-ray crystallography-based protein structure determination, which accounts for the majority of solved structures, is characterized by relatively low success rates. One solution is to build tools that support the selection of targets that are more likely to crystallize. Several in silico methods that predict the propensity of protein chains for diffraction-quality crystallization have been developed. We show that the quality of their predictions drops when they are applied to more recent crystallization trials, which calls for new solutions. We propose a novel approach that alleviates the drawbacks of the existing methods by using a recent dataset and an improved protocol to annotate progress along the crystallization process, by predicting the success of the entire process as well as the steps that result in failed attempts, and by utilizing a compact and comprehensive set of sequence-derived inputs to generate accurate predictions.
Year: 2011 PMID: 21685077 PMCID: PMC3117383 DOI: 10.1093/bioinformatics/btr229
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 2. The MCC (A) and AUC (B) values obtained by the considered crystallization propensity predictors with respect to the date of the test trials (x-axis) from the DB_CRYS test dataset. The BLAST-based predictor and SVMCrys provide only binary predictions, so their AUC cannot be computed.
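The caption's remark about AUC can be made concrete. Below is a minimal Python sketch (the name `auc_from_scores` is ours, not from the paper) that computes AUC as the probability that a randomly chosen positive trial is scored above a randomly chosen negative one, counting ties as one half. This rank formulation needs real-valued scores; fed only 0/1 outputs, it collapses to (SENS + SPEC)/2, a single operating point rather than a ROC curve.

```python
from itertools import product


def auc_from_scores(scores, labels):
    """AUC as P(score of a random positive > score of a random negative).

    Labels are 1 (positive) or 0 (negative); tied pairs count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative trial")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, scores [0.9, 0.8, 0.4, 0.3] with labels [1, 0, 1, 0] give 0.75, while the binary outputs [1, 1, 0, 0] on the same labels give 0.5, which is just the mean of SENS and SPEC. The pairwise loop is O(P·N); a sort-based version would scale better on large test sets.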
List of stop statuses and current statuses in PepcDB
The statuses are sorted top-down from earlier to later steps in the crystallization procedure. The current status indicates the current, rather than the completed, activity; e.g. for the ‘cloning failed’ stop status, the current status ‘cloned’ does not mean that cloning was successful, whereas if the current status is ‘expressed’, cloning can be assumed to have succeeded. We disregarded the ‘other’, ‘poor NMR’, ‘mass spec failed’ and ‘duplicate target found’ stop statuses and the ‘other’, ‘test target’, ‘work stopped’, ‘selected’, ‘mass spec verified’, ‘NMR assigned’, ‘HSQC’ and ‘NMR structure’ current statuses.
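The annotation rule above (a current status names the activity under way, so only the steps before it count as completed) can be sketched as follows. The ordered status list is an assumption for illustration, modelled on common structural genomics status names; the authoritative list and order are those in the PepcDB status table, and `completed_steps` is a hypothetical helper. Only the set of disregarded current statuses is taken from the text.

```python
# ASSUMPTION: illustrative ordering of current statuses, earliest step first.
PIPELINE = [
    "cloned", "expressed", "soluble", "purified", "crystallized",
    "diffraction-quality crystals", "diffraction", "crystal structure", "in PDB",
]

# Current statuses that the text says were disregarded.
DISREGARDED = {
    "other", "test target", "work stopped", "selected",
    "mass spec verified", "NMR assigned", "HSQC", "NMR structure",
}


def completed_steps(current_status):
    """Steps that can be assumed completed, or None for a disregarded status.

    The current status is the activity in progress, so only the steps that
    precede it in PIPELINE are treated as finished. Unknown statuses raise
    ValueError (from list.index).
    """
    if current_status in DISREGARDED:
        return None
    idx = PIPELINE.index(current_status)
    return tuple(PIPELINE[:idx])
```

With this rule, a trial whose current status is ‘expressed’ is credited with completed cloning, while a trial still at ‘cloned’ is credited with nothing.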
Fig. 1. The overall architecture of the proposed PPCpred method.
Summary of results for predicting the propensity for diffraction-quality crystallization success (DB_CRYS test dataset), material production failure (DB_MF test dataset), purification failure (DB_PF test dataset) and crystallization failure (DB_CF test dataset)
| Test dataset (prediction target) | Method | MCC | MCC sig | ACC | ACC sig | SPEC | SENS | AUC |
|---|---|---|---|---|---|---|---|---|
| DB_CRYS (propensity of diffraction-quality crystallization success) | ParCrys | 0.108 | + | 47.5 | + | 31.8 | 78.6 | 0.561 |
| | OBScore | 0.124 | + | 47.8 | + | 31.4 | | 0.572 |
| | BLAST-based | 0.188 | + | 65.6 | + | 79.5 | 38.0 | N/A |
| | CRYSTALP2 | 0.195 | + | 55.3 | + | 45.7 | 74.4 | 0.648 |
| | MetaPPCP | 0.195 | + | 59.9 | + | 59.0 | 61.7 | 0.620 |
| | SVMCrys | 0.213 | + | 56.3 | + | 46.7 | 75.2 | N/A |
| | XtalPred | 0.278 | + | 63.9 | + | 62.3 | 67.0 | 0.683 |
| | SVM_POLY | 0.398 | + | 74.6 | + | | 47.9 | 0.779 |
| | max-based | 0.467 | + | 76.1 | + | 81.6 | 65.3 | |
| | PPCpred | | | | | 84.8 | 61.2 | 0.789 |
| DB_MF (propensity of material production failure) | BLAST-based | 0.014 | + | 55.4 | + | 35.3 | 66.0 | N/A |
| | max-based | 0.339 | + | 71.6 | + | 45.4 | | 0.621 |
| | SVM_RBF | 0.423 | + | 74.6 | + | 56.1 | 84.5 | |
| | PPCpred | | | | | | 78.0 | 0.755 |
| DB_PF (propensity of purification failure) | BLAST-based | 0.102 | + | 60.0 | + | 43.2 | 67.4 | N/A |
| | max-based | 0.246 | + | 70.8 | + | 34.4 | 86.9 | 0.609 |
| | SVM_POLY | 0.290 | | | − | 30.8 | | |
| | PPCpred | | | 72.0 | | | 81.6 | 0.697 |
| DB_CF (propensity of crystallization failure) | BLAST-based | 0.060 | + | 60.9 | | 37.0 | 69.4 | N/A |
| | SVM_POLY | 0.346 | = | | | 40.1 | | |
| | PPCpred | 0.457 | | 76.6 | | | 78.7 | 0.811 |
| | max-based | | − | 76.9 | − | 70.5 | 79.2 | 0.813 |
The proposed PPCpred is compared against OBScore, XtalPred, ParCrys, CRYSTALP2, MetaPPCP and SVMCrys on the DB_CRYS dataset, and against the maximum-based aggregation method (max-based), the best-performing SVM classifier (SVM_POLY or SVM_RBF) and the BLAST-based predictor on all four datasets. The methods are sorted in ascending order of their MCC scores, and the highest value of each quality index on each dataset is shown in bold. The BLAST-based predictor and SVMCrys provide only binary predictions, so their AUC could not be computed. The ‘sig’ columns give the results of tests of the significance of the differences in MCC and ACC between PPCpred and each other method; the tests compare values over 100 bootstrap repetitions. ‘+’ and ‘−’ mean that PPCpred is statistically significantly better or worse, respectively, with P<0.01, and ‘=’ means that the results are not significantly different.
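For reference, the four threshold-based quality indices in the table can be computed from a binary confusion matrix as below. This is a minimal sketch assuming labels coded as 1 (positive) and 0 (negative), with ACC, SPEC and SENS reported in percent as the table's values suggest; `binary_metrics` is a hypothetical name.

```python
import math


def binary_metrics(y_true, y_pred):
    """Return MCC and ACC/SPEC/SENS (in percent) for 0/1 labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both positive and negative trials")
    sens = tp / (tp + fn)  # fraction of positive trials predicted positive
    spec = tn / (tn + fp)  # fraction of negative trials predicted negative
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0.0 for degenerate predictors
    return {"MCC": mcc, "ACC": 100 * acc, "SPEC": 100 * spec, "SENS": 100 * sens}
```

ACC is a prevalence-weighted average of SENS and SPEC, so it can be inflated by favouring the majority class; MCC stays at 0 for a predictor that always outputs the same class.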
Results for the 4-class prediction (failure in material production, failure in purification, failure in crystallization, and success in generating diffraction-quality crystals) on the DB_4CL test dataset
| Method | Mean MCC | Mean MCC sig | ACC | ACC sig |
|---|---|---|---|---|
| BLAST-based | 0.041 | + | 31.1 | + |
| max-based | 0.294 | + | 49.0 | + |
| PPCpred | | | | |
The proposed PPCpred is compared against the maximum-based aggregation method (max-based) and the BLAST-based predictor. The methods are sorted in ascending order of their mean MCC scores, and the highest value of each quality index is shown in bold. The ‘sig’ columns give the results of tests of the significance of the differences in mean MCC and ACC between PPCpred and each other method; the tests compare values over 100 bootstrap repetitions. ‘+’ and ‘−’ mean that PPCpred is statistically significantly better or worse, respectively, with P<0.01, and ‘=’ means that the results are not significantly different.
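The significance testing described above can be approximated with a paired bootstrap: resample the test trials with replacement, score both methods on the same resample, and count how often PPCpred fails to come out ahead. The sketch below assumes that ‘mean MCC’ is the average of the one-vs-rest MCCs of the four classes, and it uses a simple one-sided bootstrap proportion in place of the authors' test, whose exact form is not stated here; all function names are hypothetical. With 100 repetitions, the smallest non-zero value this proportion can take is 0.01.

```python
import math
import random


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when any marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mean_mcc(y_true, y_pred, classes):
    """ASSUMED definition: average of one-vs-rest MCCs over `classes`."""
    pairs = list(zip(y_true, y_pred))
    values = []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        tn = sum(1 for t, p in pairs if t != c and p != c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        values.append(mcc(tp, tn, fp, fn))
    return sum(values) / len(values)


def paired_bootstrap(y_true, pred_a, pred_b, metric, reps=100, seed=0):
    """Fraction of resamples in which method A does not beat method B.

    Both methods are scored on the same resampled trials in each repetition.
    Small values favour A; this is a stand-in, not the paper's exact test.
    """
    rng = random.Random(seed)
    n = len(y_true)
    not_better = 0
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        truth = [y_true[i] for i in idx]
        score_a = metric(truth, [pred_a[i] for i in idx])
        score_b = metric(truth, [pred_b[i] for i in idx])
        if score_a <= score_b:
            not_better += 1
    return not_better / reps
```

For the 4-class setting, the four outcomes could be coded as 0 to 3 and `metric` set to `lambda t, p: mean_mcc(t, p, range(4))`; a mirrored call with the methods swapped gives the ‘worse’ direction.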
Summary of the feature types selected for the prediction of material production, purification, crystallization and diffraction-quality crystallization (number of selected features of each type)
| Feature type | Material production | Purification | Crystallization | Diffraction-quality crystallization |
|---|---|---|---|---|
| Hydrophobicity index | 2 | 2 | 5 | 5 |
| Energy-based index | 4 | 0 | 2 | 3 |
| Composition of AAs | 1 | 3 | 1 | 1 |
| Isoelectric point | 0 | 1 | 0 | 0 |
| Solvent accessibility | 3 | 4 | 1 | 3 |
| Disorder | 1 | 0 | 1 | 1 |
| Secondary structure | 0 | 0 | 0 | 1 |
| Considered AA types | Arg, Cys, Glu | Asn, Cys, Ser, Met | His | Cys, His, Ser |
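As an illustration of the ‘Composition of AAs’ feature type, the sketch below computes the fraction of a sequence made up of each listed residue type. It is a plain composition count under our own assumptions, not the exact feature definitions used by PPCpred; `composition` and the `THREE_TO_ONE` map are hypothetical helpers covering only the residue types named in the table.

```python
# One-letter codes for the residue types named in the table (standard IUPAC codes).
THREE_TO_ONE = {
    "Arg": "R", "Asn": "N", "Cys": "C", "Glu": "E",
    "His": "H", "Met": "M", "Ser": "S",
}


def composition(sequence, residue_types):
    """Fraction of residues in `sequence` belonging to each requested type."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return {aa: seq.count(THREE_TO_ONE[aa]) / len(seq) for aa in residue_types}
```

For instance, `composition("MCHSCH", ["Cys", "His", "Ser"])` returns fractions of 2/6, 2/6 and 1/6 for the residue types listed for diffraction-quality crystallization.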
Fig. 3. Scatter plots of three pairs of features used by PPCpred: features used for the prediction of crystallization (A), of diffraction-quality crystallization (B) and of purification (C). Marker size denotes the number of trials, and color denotes their outcome: green for successful and black for failed trials.