Joerg D Wichard, Henning Cammann, Carsten Stephan, Thomas Tolxdorff.
Abstract
We investigate the performance of different classification models and their ability to recognize prostate cancer at an early stage. We build ensembles of classification models to increase classification performance. We measure the performance of our models in an extensive cross-validation procedure and compare the different classification models. The datasets come from clinical examinations, and some of the classification models are already in use to support urologists in their clinical work.
Year: 2008 PMID: 18464915 PMCID: PMC2366047 DOI: 10.1155/2008/218097
Source DB: PubMed Journal: J Biomed Biotechnol ISSN: 1110-7243
Figure 1: A scatterplot matrix of the data. Each box shows a pair of variables, and the cases are color-coded: a red cross marks prostate cancer (PCa) and a blue circle marks non-PCa. The digital rectal examination (DRE) is a binary variable (suspicious or nonsuspicious).
Figure 3: A sketch of a classification tree, in which the leaves represent classes and the branches represent conjunctions of features that lead to those classes.
Figure 2: For every partition of the cross-validation, the data are divided into a training set and a test set. The performance of each ensemble model was assessed on a validation set that was removed initially and never included in model training.
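The partitioning described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `split_holdout_and_folds`, the 20% validation fraction, and the five folds are assumptions chosen for the example, since the record does not state the actual sizes.

```python
import random


def split_holdout_and_folds(n_samples, n_folds=5, validation_fraction=0.2, seed=0):
    """Set aside a validation set, then build cross-validation folds on the rest.

    Returns (validation_idx, folds), where folds is a list of
    (train_idx, test_idx) pairs. All sizes are illustrative assumptions.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)

    # The validation set is removed first and never used for training.
    n_val = int(round(validation_fraction * n_samples))
    validation_idx = indices[:n_val]
    remaining = indices[n_val:]

    folds = []
    for k in range(n_folds):
        # Every n_folds-th remaining sample forms the test set of fold k.
        test_idx = remaining[k::n_folds]
        test_set = set(test_idx)
        train_idx = [i for i in remaining if i not in test_set]
        folds.append((train_idx, test_idx))
    return validation_idx, folds


validation_idx, folds = split_holdout_and_folds(100)
```

Each model of an ensemble would be trained on one `train_idx` and tuned on the matching `test_idx`, while `validation_idx` is touched only for the final performance estimate.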
Average performance of several classifier ensembles on the validation set, which was removed initially and never included in model training. We show the mean and standard deviation over 20 independent validation runs; no preprocessing was used.
| | Accuracy | F-score | AUC | SPS95 |
|---|---|---|---|---|
| PDA | 0.776 ± 0.026 | 0.823 ± 0.026 | 0.863 | 0.454 |
| Log.Reg. | 0.778 ± 0.038 | 0.823 ± 0.036 | 0.868 | 0.484 |
| MLP | 0.791 ± 0.045 | 0.823 ± 0.04 | 0.863 | 0.453 |
| SVM | 0.795 ± 0.023 | 0.833 ± 0.02 | 0.825 | 0.142 |
| CART | 0.757 ± 0.03 | 0.809 ± 0.026 | 0.843 | 0.394 |
| KNN | 0.756 ± 0.036 | 0.813 ± 0.032 | 0.809 | 0.309 |
| Mixed | 0.783 ± 0.03 | 0.828 ± 0.026 | 0.860 | 0.457 |
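The AUC and SPS95 columns are threshold-based measures that can be computed from a classifier's continuous output scores. The sketch below assumes that SPS95 denotes the specificity at 95% sensitivity, which this record does not define, and that labels are coded as +1 (PCa) and −1 (non-PCa). The function names are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def specificity_at_sensitivity(scores, labels, target_sensitivity=0.95):
    """Best specificity among thresholds whose sensitivity reaches the target.

    A case is predicted positive when its score is >= the threshold.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    best = 0.0
    for threshold in sorted(set(scores)):
        sensitivity = sum(s >= threshold for s in pos) / len(pos)
        if sensitivity >= target_sensitivity:
            specificity = sum(s < threshold for s in neg) / len(neg)
            best = max(best, specificity)
    return best


# Toy scores, not data from the study.
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 1, -1, -1, -1]
toy_auc = auc(scores, labels)
toy_sps95 = specificity_at_sensitivity(scores, labels)
```

Under this reading, a low SPS95 together with a reasonable accuracy (as in the SVM row) would indicate that the model ranks poorly at the high-sensitivity end of the ROC curve.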
The confusion matrix for a binary classification problem.
| | Predicted class +1 | Predicted class −1 |
|---|---|---|
| Real class +1 | True positive (tp) | False negative (fn) |
| Real class −1 | False positive (fp) | True negative (tn) |
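The four cells of the confusion matrix are enough to compute the accuracy and F-score. The sketch below uses the standard definitions (F-score as the harmonic mean of precision and recall); this record does not spell out the formulas, so treat the exact form as an assumption. The function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count tp, fn, fp, tn for labels coded as +1 and -1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    return tp, fn, fp, tn


def accuracy(tp, fn, fp, tn):
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)


def f_score(tp, fn, fp, tn):
    """Harmonic mean of precision and recall; 0.0 if there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy counts, not data from the study.
acc = accuracy(tp=40, fn=10, fp=20, tn=30)
f1 = f_score(tp=40, fn=10, fp=20, tn=30)
```

Because the F-score ignores true negatives, it can exceed the accuracy when the positive class is the larger one.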
Average performance of several classifier ensembles on the validation set, which was removed initially and never included in model training. We show the mean and standard deviation over 20 independent validation runs in which the training data were balanced.
| | Accuracy | F-score | AUC | SPS95 |
|---|---|---|---|---|
| PDA | 0.772 ± 0.034 | 0.809 ± 0.035 | 0.861 | 0.414 |
| Log.Reg. | 0.792 ± 0.03 | 0.834 ± 0.027 | 0.868 | 0.458 |
| MLP | 0.766 ± 0.027 | 0.787 ± 0.029 | 0.858 | 0.451 |
| SVM | 0.786 ± 0.038 | 0.816 ± 0.042 | 0.821 | 0.051 |
| CART | 0.755 ± 0.031 | 0.792 ± 0.029 | 0.841 | 0.376 |
| KNN | 0.726 ± 0.032 | 0.766 ± 0.034 | 0.801 | 0.297 |
| Mixed | 0.789 ± 0.033 | 0.830 ± 0.026 | 0.867 | 0.445 |
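This record does not say how the training data were balanced. One common option, shown here purely as an assumed illustration, is random oversampling of the minority class until both classes have the same number of training cases. The function name `balance_by_oversampling` is hypothetical.

```python
import random


def balance_by_oversampling(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class cases until all classes match.

    Assumed illustration only; the study's balancing method is not given here.
    """
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    target = max(len(xs) for xs in by_class.values())
    balanced = []
    for y, xs in by_class.items():
        # Draw extra cases with replacement to reach the majority-class size.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        balanced.extend((x, y) for x in xs + extra)

    rng.shuffle(balanced)
    return [x for x, _ in balanced], [y for _, y in balanced]


# Toy data: seven positive and three negative cases.
new_samples, new_labels = balance_by_oversampling(list(range(10)), [1] * 7 + [-1] * 3)
```

Balancing would be applied to the training partitions only, so that the validation set keeps the original class proportions.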