MOTIVATION: Microarray gene expression data are increasingly employed to identify sets of marker genes that accurately predict disease development and outcome in cancer. Many computational approaches have been proposed to construct such predictors. However, there is, as yet, no objective way to evaluate whether a new approach truly improves on the current state of the art. In addition, no 'standard' computational approach has emerged that enables robust outcome prediction. RESULTS: An important contribution of this work is the description of a principled training and validation protocol, which allows objective evaluation of the complete methodology for constructing a predictor. We review the possible choices of computational approaches, with specific emphasis on predictor choice and reporter selection strategies. Employing this training-validation protocol, we evaluated different reporter selection strategies and predictors on six gene expression datasets of varying degrees of difficulty. We demonstrate that simple reporter selection strategies (forward filtering and shrunken centroids) work surprisingly well and outperform partial least squares in four of the six datasets. Similarly, simple predictors, such as the nearest mean classifier, outperform more complex classifiers. Our training-validation protocol provides a robust methodology to evaluate the performance of new computational approaches and to objectively compare outcome predictions on different datasets.
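The nearest mean classifier mentioned above assigns each sample to the class whose mean expression profile (centroid) lies closest. A minimal sketch, using synthetic data and a simple train/validation split for illustration (the function names and toy dimensions are assumptions, not the authors' implementation):

```python
import numpy as np

def nearest_mean_fit(X, y):
    # Compute one centroid (mean expression profile) per class.
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_mean_predict(X, classes, centroids):
    # Assign each sample to the class with the nearest centroid
    # (Euclidean distance).
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]

# Toy data: two well-separated classes of 5-"gene" profiles,
# with a held-out validation set kept apart from training.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(3, 1, (20, 5))])
y_train = np.array([0] * 20 + [1] * 20)
X_val = np.vstack([rng.normal(0, 1, (10, 5)), rng.normal(3, 1, (10, 5))])
y_val = np.array([0] * 10 + [1] * 10)

classes, centroids = nearest_mean_fit(X_train, y_train)
accuracy = (nearest_mean_predict(X_val, classes, centroids) == y_val).mean()
```

The key point of the protocol is that the centroids are estimated on training samples only and the accuracy is reported on samples never seen during fitting; in a full evaluation, reporter (gene) selection would also have to be repeated inside each training fold.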
Authors: Brian V Balgobind; Marry M Van den Heuvel-Eibrink; Renee X De Menezes; Dirk Reinhardt; Iris H I M Hollink; Susan T J C M Arentsen-Peters; Elisabeth R van Wering; Gertjan J L Kaspers; Jacqueline Cloos; Evelien S J M de Bont; Jean-Michel Cayuela; Andre Baruchel; Claus Meyer; Rolf Marschalek; Jan Trka; Jan Stary; H Berna Beverloo; Rob Pieters; C Michel Zwaan; Monique L den Boer Journal: Haematologica Date: 2010-10-22 Impact factor: 9.941
Authors: Maïa Chanrion; Vincent Negre; Hélène Fontaine; Nicolas Salvetat; Frédéric Bibeau; Gaëtan Mac Grogan; Louis Mauriac; Dionyssios Katsaros; Franck Molina; Charles Theillet; Jean-Marie Darbon Journal: Clin Cancer Res Date: 2008-03-15 Impact factor: 12.531
Authors: Herman M J Sontrop; Perry D Moerland; René van den Ham; Marcel J T Reinders; Wim F J Verhaegh Journal: BMC Bioinformatics Date: 2009-11-26 Impact factor: 3.169