Classification based upon gene expression data: bias and precision of error rates.

Ian A Wood, Peter M Visscher, Kerrie L Mengersen.

Abstract

MOTIVATION: Gene expression data offer a large number of potentially useful predictors for the classification of tissue samples into classes, such as diseased and non-diseased. The predictive error rate of classifiers can be estimated using methods such as cross-validation. We have investigated issues of interpretation and potential bias in the reporting of error rate estimates. The issues considered here are optimization and selection biases, sampling effects, measures of misclassification rate, baseline error rates, two-level external cross-validation and a novel proposal for detection of bias using the permutation mean.
RESULTS: Reporting an optimal estimated error rate incurs an optimization bias. Downward bias of 3-5% was found in an existing study of classification based on gene expression data and may be endemic in similar studies. Using a simulated non-informative dataset and two example datasets from existing studies, we show how bias can be detected through the use of label permutations and avoided using two-level external cross-validation. Some studies avoid optimization bias by using single-level cross-validation and a test set, but error rates can be more accurately estimated via two-level cross-validation. In addition to estimating the simple overall error rate, we recommend reporting class error rates plus where possible the conditional risk incorporating prior class probabilities and a misclassification cost matrix. We also describe baseline error rates derived from three trivial classifiers which ignore the predictors.
AVAILABILITY: R code which implements two-level external cross-validation with the PAMR package, experiment code, dataset details and additional figures are freely available for non-commercial use from http://www.maths.qut.edu.au/profiles/wood/permr.jsp
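
The reporting quantities recommended in the RESULTS (overall error rate, class error rates, conditional risk, and predictor-free baselines) can be illustrated with a minimal Python sketch. This is not the authors' R code. The function names, the confusion-matrix layout, and the example numbers are hypothetical, and the conditional risk is written in its usual textbook form as an expected cost, which may differ in detail from the paper's definition. The abstract does not say which three trivial classifiers are used, so the three shown here (majority class, uniform random guess, random guess in proportion to class frequencies) are assumptions chosen for illustration.

```python
from collections import Counter


def overall_error_rate(confusion):
    """Fraction of all samples misclassified.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total


def class_error_rates(confusion):
    """Per-class error rate; assumes every true class has at least one sample."""
    return [(sum(row) - row[i]) / sum(row) for i, row in enumerate(confusion)]


def conditional_risk(confusion, priors, cost):
    """Expected cost: sum_i priors[i] * sum_j cost[i][j] * P(predict j | true i).

    priors[i] is the assumed prior probability of class i, and cost[i][j] is
    the cost of predicting class j when the true class is i.
    """
    risk = 0.0
    for i, row in enumerate(confusion):
        n_i = sum(row)
        for j, count in enumerate(row):
            risk += priors[i] * cost[i][j] * count / n_i
    return risk


def trivial_baselines(labels):
    """Expected error rates of three classifiers that ignore the predictors.

    These three choices are illustrative assumptions, not taken from the paper.
    """
    n = len(labels)
    freqs = [count / n for count in Counter(labels).values()]
    return {
        "majority": 1 - max(freqs),                    # always predict the largest class
        "uniform_random": 1 - 1 / len(freqs),          # guess each class equally often
        "proportional_random": 1 - sum(p * p for p in freqs),  # guess by class frequency
    }


if __name__ == "__main__":
    # Hypothetical two-class result: 50 non-diseased and 20 diseased samples.
    confusion = [[40, 10], [5, 15]]
    priors = [0.9, 0.1]          # assumed population prevalence
    cost = [[0, 1], [5, 0]]      # assumed: missing a diseased case costs 5x
    print(overall_error_rate(confusion))
    print(class_error_rates(confusion))
    print(conditional_risk(confusion, priors, cost))
    print(trivial_baselines([0] * 50 + [1] * 20))
```

Comparing an estimated error rate against such baselines shows whether a classifier does better than one that never looks at the expression data, which matters most when the classes are unbalanced.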

Year:  2007        PMID: 17392326     DOI: 10.1093/bioinformatics/btm117

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937

