Hojin Moon, Hongshik Ahn, Ralph L Kodell, Chien-Ju Lin, Songjoon Baek, James J Chen.
Abstract
Personalized medicine is defined by the use of genomic signatures of patients to assign effective therapies. We present Classification by Ensembles from Random Partitions (CERP) for class prediction and apply CERP to genomic data on leukemia patients and to genomic data with several clinical variables on breast cancer patients. CERP performs consistently well compared to the other classification algorithms. The predictive accuracy can be improved by adding some relevant clinical/histopathological measurements to the genomic data.
Year: 2006 PMID: 17181863 PMCID: PMC1794434 DOI: 10.1186/gb-2006-7-12-r121
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Performance of classification algorithms for the leukemia data based on 20 repetitions of 10-fold CV
| Algorithm | Accuracy | Sensitivity* | Specificity† | PPV‡ | NPV§ |
| CERP | 98.6 (<.001) | 96.0 (<.001) | 100.0 (.000) | 100.0 (.000) | 97.9 (<.001) |
| RF | 97.9 (.008) | 95.0 (.022) | 99.5 (.009) | 99.0 (.018) | 97.4 (.011) |
| AdaBoost | 96.0 (.005) | 95.6 (.012) | 96.3 (.009) | 93.2 (.016) | 97.6 (.006) |
| SVM | 97.2 (.012) | 92.0 (.034) | 100.0 (.000) | 100.0 (.000) | 95.9 (.017) |
| DLDA | 97.5 (.007) | 96.0 (<.001) | 98.3 (.011) | 96.8 (.021) | 97.9 (<.001) |
| SC | 96.0 (.004) | 96.0 (<.001) | 96.0 (.007) | 92.7 (.011) | 97.8 (<.001) |
| CART | 81.7 (.035) | 76.2 (.046) | 84.6 (.053) | 72.4 (.067) | 87.0 (.021) |
| CRUISE | 86.8 (.021) | 79.8 (.040) | 90.5 (.029) | 82.0 (.044) | 89.4 (.018) |
| QUEST | 86.9 (.020) | 79.4 (.042) | 91.0 (.032) | 82.7 (.048) | 89.3 (.018) |
SD is given in parentheses. *AML considered positive. †ALL considered negative. ‡Positive predictive value. §Negative predictive value.
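The five measures in the table follow the standard confusion-matrix definitions. A minimal sketch is given here for orientation; the function name `classification_metrics` and the counts in the example are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, specificity, PPV and NPV as percentages.

    tp/fn: positive cases predicted correctly / incorrectly
    tn/fp: negative cases predicted correctly / incorrectly
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),   # true positive rate
        "specificity": 100.0 * tn / (tn + fp),   # true negative rate
        "ppv": 100.0 * tp / (tp + fp),           # positive predictive value
        "npv": 100.0 * tn / (tn + fn),           # negative predictive value
    }


# Hypothetical counts for illustration only (not data from the study).
example = classification_metrics(tp=8, fn=2, tn=18, fp=2)
```

With the positive class defined as in the footnote (AML for the leukemia data), sensitivity is the share of positive cases that are recovered and PPV is the share of positive calls that are correct.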
Performance of classification algorithms for the van 't Veer et al. breast cancer genomic data based on 20 repetitions of 10-fold CV
| Algorithm | Accuracy | Sensitivity* | Specificity† | PPV‡ | NPV§ |
| CERP | 62.3 (.023) | 50.9 (.037) | 71.1 (.026) | 57.7 (.029) | 65.2 (.020) |
| RF | 62.5 (.019) | 46.8 (.032) | 74.7 (.032) | 58.9 (.029) | 64.5 (.014) |
| AdaBoost | 58.8 (.041) | 32.1 (.089) | 79.4 (.069) | 55.0 (.094) | 60.3 (.028) |
| SVM | 56.5 (.029) | 39.6 (.053) | 69.7 (.027) | 50.1 (.042) | 59.9 (.025) |
| DLDA | 62.5 (.019) | 52.4 (.023) | 70.3 (.026) | 57.8 (.026) | 65.6 (.015) |
| SC | 60.9 (.019) | 50.6 (.026) | 68.9 (.023) | 55.7 (.024) | 64.3 (.016) |
| CART | 54.6 (.028) | 17.5 (.058) | 83.2 (.047) | 44.6 (.084) | 56.6 (.018) |
| CRUISE | 55.1 (.048) | 21.5 (.100) | 81.0 (.059) | 45.6 (.112) | 57.3 (.034) |
| QUEST | 56.5 (.044) | 22.8 (.080) | 82.6 (.077) | 51.0 (.117) | 58.1 (.027) |
SD is given in parentheses. *Poor prognosis considered positive. †Good prognosis considered negative. ‡Positive predictive value. §Negative predictive value.
Performance of classification algorithms for the van 't Veer et al. breast cancer genomic and clinical/histopathological data based on 20 trials of 10-fold CV
| Algorithm | Accuracy | Sensitivity* | Specificity† | PPV‡ | NPV§ |
| CERP | 63.3 (.024) | 52.5 (.042) | 71.6 (.027) | 58.8 (.031) | 66.1 (.022) |
| RF | 63.0 (.023) | 48.2 (.034) | 74.4 (.034) | 59.4 (.034) | 65.1 (.016) |
| AdaBoost | 61.9 (.045) | 38.7 (.090) | 79.8 (.065) | 59.9 (.085) | 62.8 (.034) |
| SVM | 57.4 (.027) | 40.3 (.044) | 70.7 (.037) | 51.5 (.040) | 60.5 (.021) |
| DLDA | 62.9 (.017) | 52.6 (.025) | 70.9 (.027) | 58.4 (.023) | 66.0 (.013) |
| SC | 62.2 (.018) | 53.8 (.025) | 68.8 (.018) | 57.1 (.022) | 65.8 (.016) |
| CART | 54.7 (.031) | 21.6 (.096) | 80.3 (.063) | 44.3 (.103) | 57.2 (.022) |
| CRUISE | 57.5 (.047) | 24.0 (.100) | 83.4 (.063) | 51.9 (.120) | 58.8 (.032) |
| QUEST | 56.3 (.036) | 21.8 (.062) | 83.1 (.071) | 50.7 (.082) | 57.8 (.021) |
SD is given in parentheses. *Poor prognosis considered positive. †Good prognosis considered negative. ‡Positive predictive value. §Negative predictive value.
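The tables above summarize 20 repetitions of 10-fold cross-validation. The sketch below shows one way such splits and summaries could be produced; it assumes that each repetition reshuffles the samples and that the SD is taken across the per-repetition scores, neither of which is confirmed by this record. The names `repeated_kfold` and `summarize` are hypothetical.

```python
import random
import statistics


def repeated_kfold(n_samples, k=10, repeats=20, seed=0):
    """Yield (repetition, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for rep in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)                        # new random partition per repetition
        folds = [order[i::k] for i in range(k)]   # k nearly equal folds
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield rep, train_idx, test_idx


def summarize(per_repetition_scores):
    """Mean and sample SD across repetitions (assumed reporting convention)."""
    return (statistics.mean(per_repetition_scores),
            statistics.stdev(per_repetition_scores))
```

Within a repetition every sample is held out exactly once, so each repetition yields one score per measure, and the 20 scores are then averaged.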
Figure 1. Accuracy of classification algorithms for the van de Vijver et al. [17] data.
Enhancement of the prediction accuracy by ensemble majority voting*
| r | ρ | Prediction accuracy of each base classifier | | | | |
| | | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |
| 3 | 0 | 0.5 | 0.648 | 0.784 | 0.896 | 0.972 |
| 3 | 0.1 | 0.5 | 0.635 | 0.762 | 0.871 | 0.953 |
| 3 | 0.3 | 0.5 | 0.618 | 0.732 | 0.836 | 0.927 |
| 15 | 0 | 0.5 | 0.787 | 0.950 | 0.996 | 1.000 |
| 15 | 0.1 | 0.5 | 0.695 | 0.851 | 0.947 | 0.990 |
| 15 | 0.3 | 0.5 | 0.636 | 0.762 | 0.868 | 0.948 |
| 25 | 0 | 0.5 | 0.846 | 0.986 | 1.000 | 1.000 |
| 25 | 0.1 | 0.5 | 0.708 | 0.868 | 0.958 | 0.993 |
| 25 | 0.3 | 0.5 | 0.639 | 0.766 | 0.872 | 0.951 |
| 101 | 0 | 0.5 | 0.980 | 1.000 | 1.000 | 1.000 |
| 101 | 0.1 | 0.5 | 0.728 | 0.891 | 0.971 | 0.996 |
| 101 | 0.3 | 0.5 | 0.642 | 0.771 | 0.877 | 0.954 |
*Binomial probability used for ρ = 0, with normal approximation for r > 25; Beta-binomial probability used for ρ > 0.
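The entries in this table can be approximated with a short calculation. The sketch below reads r as the number of base classifiers, each correct with probability p, and takes the ensemble to be correct when more than half of them are. For ρ > 0 it assumes a beta-binomial with α = p(1 − ρ)/ρ and β = (1 − p)(1 − ρ)/ρ, a common intraclass-correlation parametrization that this record does not spell out. The function names are hypothetical, and exact binomial sums are used throughout rather than the normal approximation mentioned in the footnote.

```python
from math import comb, exp, lgamma, log


def binomial_majority(r, p):
    """P(more than half of r independent classifiers are correct)."""
    need = r // 2 + 1
    return sum(comb(r, k) * p**k * (1 - p)**(r - k) for k in range(need, r + 1))


def _log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_majority(r, p, rho):
    """Majority-vote accuracy with correlation rho > 0 among classifiers.

    Assumed parametrization: alpha = p(1-rho)/rho, beta = (1-p)(1-rho)/rho.
    """
    a = p * (1 - rho) / rho
    b = (1 - p) * (1 - rho) / rho
    need = r // 2 + 1
    total = 0.0
    for k in range(need, r + 1):
        log_pmf = log(comb(r, k)) + _log_beta(k + a, r - k + b) - _log_beta(a, b)
        total += exp(log_pmf)
    return total


print(round(binomial_majority(3, 0.6), 3))            # 0.648
print(round(binomial_majority(15, 0.6), 3))           # 0.787
print(round(beta_binomial_majority(3, 0.6, 0.1), 3))  # 0.635
```

The three printed values match the corresponding table entries for r = 3 and r = 15, which suggests the assumed parametrization is at least consistent with the table. The pattern in the table follows directly: independent voters (ρ = 0) gain quickly as r grows, while correlated voters gain much less.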
Figure 2. An ensemble in CERP.