| Literature DB >> 21235786 |
Daniel Restrepo-Montoya, Camilo Pino, Luis F. Nino, Manuel E. Patarroyo, Manuel A. Patarroyo.
Abstract
BACKGROUND: Most predictive methods currently available for the identification of protein secretion mechanisms have focused on classically secreted proteins. In fact, only two methods have been reported for predicting non-classically secreted proteins of Gram-positive bacteria. This study describes the implementation of a sequence-based classifier, denoted as NClassG+, for identifying non-classically secreted Gram-positive bacterial proteins.
Year: 2011 PMID: 21235786 PMCID: PMC3025837 DOI: 10.1186/1471-2105-12-21
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Comparison of the evaluation measures of NClassG+, SecretomeP 2.0 and SecretP 2.0 for the classification of Gram-positive bacterial proteins
| Measure | NClassG+ (split set) | NClassG+ (test set) | SecretomeP 2.0 (split set) | SecretomeP 2.0 (test set) | SecretP 2.0 (split set) | SecretP 2.0 (test set) |
|---|---|---|---|---|---|---|
| Accuracy | 0.88 | 0.90 | 0.88 | 0.84 | 0.69 | 0.83 |
| MCC | 0.77 | 0.71 | 0.76 | 0.52 | 0.46 | 0.50 |
| Specificity | 0.92 | 0.97 | 0.88 | 0.71 | 1.00 | 0.99 |
| Sensitivity | 0.84 | 0.87 | 0.86 | 0.54 | 0.34 | 0.32 |
The split set is a partition of the learning data set used during training; the test set is the independent set used only in the final comparison of the performance of NClassG+, SecretomeP 2.0 and SecretP 2.0. The split sets (a3) correspond to those reported in Figure 2 (step B).
Figure 1. NClassG+ ROC plot. ROC plot analysis of the performance of NClassG+.
Figure 2. Methodology of NClassG+. The NClassG+ classifier was selected from a large number of candidate classifiers resulting from all possible combinations of the protein vector representations and kernel functions considered in this study. In step A, the candidate classifiers were built and compared in a nested k-fold cross-validation (CV) setting. Briefly, using the training and test data sets from the inner loop of the nested k-fold CV procedure, a classifier is optimized according to CV accuracy for every possible kernel function/feature combination, and the pair with the best CV accuracy is selected in each iteration of the outer loop. The inner-loop training and test data sets are drawn from the training data set of the outer loop; the outer-loop test data set is used to estimate the accuracy of the whole procedure. Using the hyperparameters of the best classifier found in the inner-loop CV, a classifier is trained and tested with the outer-loop data sets. NClassG+ is the classifier with the best CV accuracy, as calculated in the inner loop. In step B, prior to performing the nested k-fold CV procedure, the learning data set was partitioned to assess and compare the performance of the selected classifier against SecretomeP 2.0 and SecretP 2.0. The a1, a2 and a3 data sets are entirely distinct partitions derived from the learning set used in the construction of NClassG+. * hyperparameter optimization.
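The nested k-fold CV scheme of step A can be sketched as follows. This is a minimal illustration in pure Python: the "classifier" is a placeholder threshold model on synthetic one-dimensional data, standing in for the SVM kernel/feature combinations actually evaluated in the study, and the candidate thresholds stand in for hyperparameter choices.

```python
import random

def k_folds(indices, k):
    """Split a list of indices into k interleaved folds."""
    return [indices[i::k] for i in range(k)]

def cv_accuracy(data, labels, indices, k, threshold):
    """Plain k-fold CV accuracy of a threshold 'classifier' on the given indices."""
    folds = k_folds(indices, k)
    correct = total = 0
    for i in range(k):
        # A fixed threshold needs no fitting; a real kernel model would be
        # trained on the remaining k-1 folds here.
        for j in folds[i]:
            correct += (1 if data[j] > threshold else 0) == labels[j]
            total += 1
    return correct / total

random.seed(0)
data = [random.random() for _ in range(100)]
labels = [1 if x > 0.5 else 0 for x in data]   # perfectly separable toy data
candidates = [0.2, 0.5, 0.8]                   # stand-ins for kernel/feature pairs

outer = k_folds(list(range(100)), 5)
outer_scores = []
for i in range(5):
    test_idx = outer[i]
    train_idx = [j for f in outer[:i] + outer[i + 1:] for j in f]
    # Inner loop: pick the hyperparameter with the best inner-CV accuracy,
    # using only the outer-loop training data.
    best = max(candidates, key=lambda t: cv_accuracy(data, labels, train_idx, 3, t))
    # Retrain with the chosen hyperparameter and score on the held-out outer fold.
    acc = sum((1 if data[j] > best else 0) == labels[j] for j in test_idx) / len(test_idx)
    outer_scores.append(acc)

# Average of the outer-fold scores estimates the accuracy of the whole procedure.
print(sum(outer_scores) / len(outer_scores))   # 1.0 on this separable toy data
```

The key property, as in the figure, is that hyperparameter selection happens strictly inside the outer-loop training data, so the outer-fold scores remain an unbiased estimate of the complete selection procedure.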