Marcel Dettling¹, Peter Bühlmann. ¹Seminar für Statistik, ETH Zürich, CH-8092, Switzerland. dettling@stat.math.ethz.ch
Abstract
MOTIVATION: Microarray experiments generate large datasets with expression values for thousands of genes but no more than a few dozen samples. Accurate supervised classification of tissue samples in such high-dimensional problems is difficult but often crucial for successful diagnosis and treatment. A promising way to meet this challenge is to use boosting in conjunction with decision trees. RESULTS: We demonstrate that the generic boosting algorithm needs some modification to become an accurate classifier in the context of gene expression data. In particular, we present a feature preselection method, a more robust boosting procedure and a new approach for multi-categorical problems. These modifications yield a slight to drastic increase in performance and competitive results on several publicly available datasets. AVAILABILITY: Software for the modified boosting algorithms as well as for decision trees is available for free in R at http://stat.ethz.ch/~dettling/boosting.html.