Yongchao Dou, Jun Wang, Jialiang Yang, Chi Zhang.
Abstract
Identifying catalytic residues is usually the first step toward understanding enzyme function, and knowledge of these residues is also useful for protein engineering and drug design. Identifying them experimentally, however, remains costly and time-consuming, so computational methods have been explored to predict them. Here we present L1pred, a new algorithm for catalytic residue prediction that uses the L1-logreg classifier to integrate eight sequence-based scoring functions. We tested L1pred and compared it against several existing sequence-based methods on two carefully designed datasets, Data604 and Data63. Under ten-fold cross-validation on the training dataset, Data604, L1pred achieved an area under the precision-recall curve (AUPR) of 0.2198 and an area under the ROC curve (AUC) of 0.9494. On the independent test dataset, Data63, it achieved an AUPR of 0.2636 and an AUC of 0.9375. Compared with the other sequence-based methods, L1pred showed the best performance on both datasets. We also analyzed the importance of each attribute in the algorithm and found that all the scores contributed more or less equally to the performance of L1pred.
Year: 2012 PMID: 22558194 PMCID: PMC3338704 DOI: 10.1371/journal.pone.0035666
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
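The abstract names an L1-regularized logistic regression classifier as the way L1pred combines its eight scores. As a rough illustration of that idea (not the authors' L1-logreg solver or their feature set), the following Python sketch fits such a model by proximal gradient descent on synthetic data. The function names `fit_l1_logreg` and `soft_threshold`, the hyperparameters `lam`, `lr`, and `epochs`, and the eight random feature columns are all assumptions made for the example.

```python
import math
import random

def soft_threshold(value, amount):
    """Shrink value toward zero by amount (the proximal step for an L1 penalty)."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_l1_logreg(X, y, lam=0.01, lr=0.5, epochs=300):
    """Minimise mean log-loss + lam * sum(|w_j|); the bias is not penalised."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        # Gradient step on the smooth loss, then soft-thresholding for the L1 term.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b

def predict_proba(X, w, b):
    """Predicted probability of the positive class for each row of X."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

# Synthetic stand-in for per-residue scores (assumption: 8 columns, 2 informative).
rng = random.Random(0)
X, y = [], []
for _ in range(400):
    label = 1 if rng.random() < 0.1 else 0  # positives are rare, as with catalytic residues
    row = [rng.gauss(1.5 * label, 1.0), rng.gauss(1.0 * label, 1.0)]
    row += [rng.gauss(0.0, 1.0) for _ in range(6)]
    X.append(row)
    y.append(label)

weights, bias = fit_l1_logreg(X, y, lam=0.02)
scores = predict_proba(X, weights, bias)
```

The L1 penalty tends to drive the weights of uninformative columns to exactly zero, which is one reason this family of classifiers is a natural fit for integrating several candidate scores.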
Performance on the dataset Data604.
| Method | AUPR | AUC | Recall | Precision |
| JSD | 0.0692 | 0.8443 | 0.3299 | 0.1016 |
| Consurf | 0.0778 | 0.8969 | 0.3515 | 0.0944 |
| VJSD | 0.1300 | 0.8700 | 0.3724 | 0.1593 |
| CRpred | 0.1819 | 0.9338 | 0.3805 | 0.2310 |
| L1pred | 0.2198 | 0.9494 | 0.3741 | 0.2752 |
Figure 1. PR curves of five methods on the Data604 dataset.
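The AUPR and AUC columns are both computed from ranked per-residue scores. The sketch below shows one common way to compute them in Python: AUC as the fraction of positive/negative pairs ranked correctly, and AUPR as average precision. Which AUPR estimator and which decision threshold the authors used for the Recall and Precision columns is not stated here, so those choices are assumptions; the function names and the toy labels are hypothetical.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Step-wise area under the PR curve: mean precision at each true positive.

    Tied scores are ordered arbitrarily; requires at least one positive label.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, lab) in enumerate(ranked, start=1):
        if lab == 1:
            hits += 1
            total += hits / rank
    return total / hits

def precision_recall_at(labels, scores, threshold):
    """Precision and recall when residues scoring >= threshold are called positive."""
    tp = sum(1 for lab, s in zip(labels, scores) if s >= threshold and lab == 1)
    fp = sum(1 for lab, s in zip(labels, scores) if s >= threshold and lab == 0)
    fn = sum(1 for lab, s in zip(labels, scores) if s < threshold and lab == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy example (hypothetical): 2 positives among 8 residues.
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
auc = roc_auc(toy_labels, toy_scores)             # 11 of 12 pairs ranked correctly
aupr = average_precision(toy_labels, toy_scores)  # (1/1 + 2/3) / 2
```

Because positives are rare in this kind of data, a handful of high-scoring negatives lowers precision sharply while barely affecting AUC, which is consistent with the tables pairing AUC values above 0.84 with much lower AUPR values.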
Performance on the dataset Data63.
| Method | AUPR | AUC | Recall | Precision |
| JSD | 0.0759 | 0.8410 | 0.4160 | 0.1061 |
| Consurf | 0.1019 | 0.8876 | 0.2017 | 0.1644 |
| VJSD | 0.1520 | 0.8599 | 0.3109 | 0.2349 |
| CRpred | 0.1809 | 0.9201 | 0.4244 | 0.2446 |
| L1pred | 0.2636 | 0.9375 | 0.3571 | 0.3257 |
Figure 2. PR curves of five methods on the Data63 dataset.
Computing time of L1pred and CRpred methods.
Performance on the dataset EF-family.
| Method | AUPR | AUC | Recall | Precision |
| JSD | 0.0841 | 0.8543 | 0.0886 | 0.5522 |
| Consurf | 0.0969 | 0.8767 | 0.1229 | 0.3048 |
| VJSD | 0.1695 | 0.8873 | 0.2333 | 0.2756 |
| CRpred | 0.2256 | 0.9118 | 0.2853 | 0.3838 |
| Youn | N/A | 0.9298 | 0.5702 | 0.1851 |
| L1pred | 0.2589 | 0.9372 | 0.4478 | 0.2862 |
Performance of L1pred by removing attributes one by one.
| Method | AUPR | AUC | Recall | Precision |
| no-Consurf | 0.1688 | 0.9282 | 0.3854 | 0.2125 |
| no-SS | 0.2119 | 0.9467 | 0.4559 | 0.2440 |
| no-RT | 0.2128 | 0.9492 | 0.4370 | 0.2455 |
| no-ACH | 0.2129 | 0.9486 | 0.4736 | 0.2313 |
| no-VJSD | 0.2140 | 0.9488 | 0.4392 | 0.2466 |
| no-JSD | 0.2167 | 0.9492 | 0.4623 | 0.2422 |
| no-ASA | 0.2175 | 0.9494 | 0.3947 | 0.2640 |
| no-OP | 0.2184 | 0.9487 | 0.4128 | 0.2607 |
| L1pred | 0.2198 | 0.9494 | 0.3741 | 0.2752 |
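To read the ablation table more directly, the AUPR lost when each attribute is removed can be computed from the values above. The short Python sketch below does only that arithmetic; the helper name `aupr_drops` is hypothetical, and the numbers are copied from the table.

```python
def aupr_drops(full_aupr, ablated_aupr):
    """Return (attribute, AUPR drop) pairs, largest drop first."""
    drops = {name: round(full_aupr - value, 4) for name, value in ablated_aupr.items()}
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)

# AUPR of the full model and of each "no-X" variant, from the table above.
FULL_AUPR = 0.2198
ABLATED_AUPR = {
    "Consurf": 0.1688, "SS": 0.2119, "RT": 0.2128, "ACH": 0.2129,
    "VJSD": 0.2140, "JSD": 0.2167, "ASA": 0.2175, "OP": 0.2184,
}

ranked = aupr_drops(FULL_AUPR, ABLATED_AUPR)
for name, drop in ranked:
    print(f"{name:8s} AUPR drop = {drop:.4f}")
```

On these numbers the drop ranges from 0.0014 (removing OP) to 0.0510 (removing Consurf).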
Figure 3. Weights of the top fifteen features on the Data604 dataset.
Figure 4. Prediction results of L1pred on a dehydrogenase (a) and an asparaginase (b). Red: true positive; blue: false negative; green: false positive.