Cross-validation under separate sampling: strong bias and how to correct it.

Ulisses M Braga-Neto, Amin Zollanvari, Edward R Dougherty.

Abstract

MOTIVATION: It is commonly assumed in pattern recognition that cross-validation error estimation is 'almost unbiased' as long as the number of folds is not too small. While this is true for random sampling, it is not true with separate sampling, where the populations are independently sampled, which is a common situation in bioinformatics.
RESULTS: We demonstrate, via analytical and numerical methods, that classical cross-validation can have strong bias under separate sampling, depending on the difference between the sampling ratios and the true population probabilities. We propose a new separate-sampling cross-validation error estimator, and prove that it satisfies an 'almost unbiased' theorem similar to that of random-sampling cross-validation. We present two case studies with previously published data, which show that the results can change drastically if the correct form of cross-validation is used.
AVAILABILITY AND IMPLEMENTATION: The source code in C++, along with the Supplementary Materials, is available at: http://gsp.tamu.edu/Publications/supplementary/zollanvari13/.
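The contrast described in RESULTS can be illustrated with a minimal Python sketch. It is not the authors' C++ code, and it is not claimed to be their exact estimator: it assumes the true class-0 prior (`prior0`) is known, uses a toy one-dimensional nearest-mean classifier, and forms folds within each class separately. All function names here are hypothetical. The pooled estimate implicitly weights the class-specific error rates by the sampling ratio n0/n, while the prior-weighted estimate uses `prior0` instead, so the two differ whenever the sampling ratio differs from the population probability.

```python
import random
from statistics import mean


def nearest_mean_classifier(x0, x1):
    """Train a toy 1-D nearest-mean rule; returns a function x -> label."""
    m0, m1 = mean(x0), mean(x1)
    return lambda x: 0 if abs(x - m0) <= abs(x - m1) else 1


def folds(n, k):
    """Split indices 0..n-1 into k nearly equal contiguous folds."""
    return [list(range(i * n // k, (i + 1) * n // k)) for i in range(k)]


def prior_weighted_cv_error(x0, x1, prior0, k=5):
    """Hypothetical sketch: class-specific CV error rates combined two ways.

    Assumes each class sample is already in random order and that
    prior0 = P(class 0) in the population is known.
    Returns (pooled, weighted, err0, err1).
    """
    if not 2 <= k <= min(len(x0), len(x1)):
        raise ValueError("need 2 <= k <= size of the smaller class")
    wrong0 = wrong1 = 0
    # Hold out one fold from each class at a time, train on the rest.
    for held0, held1 in zip(folds(len(x0), k), folds(len(x1), k)):
        h0, h1 = set(held0), set(held1)
        train0 = [x for i, x in enumerate(x0) if i not in h0]
        train1 = [x for i, x in enumerate(x1) if i not in h1]
        rule = nearest_mean_classifier(train0, train1)
        wrong0 += sum(rule(x0[i]) != 0 for i in held0)
        wrong1 += sum(rule(x1[i]) != 1 for i in held1)
    err0 = wrong0 / len(x0)  # estimated error rate on class 0
    err1 = wrong1 / len(x1)  # estimated error rate on class 1
    # Pooled count: equals (n0/n)*err0 + (n1/n)*err1, i.e. sampling-ratio weights.
    pooled = (wrong0 + wrong1) / (len(x0) + len(x1))
    # Prior-weighted combination: uses the assumed population prior instead.
    weighted = prior0 * err0 + (1 - prior0) * err1
    return pooled, weighted, err0, err1


if __name__ == "__main__":
    random.seed(0)
    # Separate sampling: 50 points per class, although class 0 is assumed
    # to make up 90% of the population (illustrative numbers only).
    class0 = [random.gauss(0.0, 1.0) for _ in range(50)]
    class1 = [random.gauss(1.5, 1.0) for _ in range(50)]
    print(prior_weighted_cv_error(class0, class1, prior0=0.9, k=5))
```

When `prior0` happens to equal n0/n, the two estimates coincide; the gap between them grows as the sampling ratio moves away from the assumed prior.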

Year: 2014 PMID: 25123902 PMCID: PMC4296143 DOI: 10.1093/bioinformatics/btu527

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

9 in total

1. Is cross-validation valid for small-sample microarray classification?

Authors: Ulisses M Braga-Neto; Edward R Dougherty
Journal: Bioinformatics Date: 2004-02-12 Impact factor: 6.937

2. Colorectal cancer epidemiology: incidence, mortality, survival, and risk factors.

Authors: Fatima A Haggar; Robin P Boushey
Journal: Clin Colon Rectal Surg Date: 2009-11

3. The molecular classification of multiple myeloma.

Authors: Fenghuang Zhan; Yongsheng Huang; Simona Colla; James P Stewart; Ichiro Hanamura; Sushil Gupta; Joshua Epstein; Shmuel Yaccoby; Jeffrey Sawyer; Bart Burington; Elias Anaissie; Klaus Hollmig; Mauricio Pineda-Roman; Guido Tricot; Frits van Rhee; Ronald Walker; Maurizio Zangari; John Crowley; Bart Barlogie; John D Shaughnessy
Journal: Blood Date: 2006-05-25 Impact factor: 22.113

4. Strong time dependence of the 76-gene prognostic signature for node-negative breast cancer patients in the TRANSBIG multicenter independent validation series.

Authors: Christine Desmedt; Fanny Piette; Sherene Loi; Yixin Wang; Françoise Lallemand; Benjamin Haibe-Kains; Giuseppe Viale; Mauro Delorenzi; Yi Zhang; Mahasti Saghatchian d'Assignies; Jonas Bergh; Rosette Lidereau; Paul Ellis; Adrian L Harris; Jan G M Klijn; John A Foekens; Fatima Cardoso; Martine J Piccart; Marc Buyse; Christos Sotiriou
Journal: Clin Cancer Res Date: 2007-06-01 Impact factor: 12.531

5. Effect of separate sampling on classification accuracy.

Authors: Mohammad Shahrokh Esfahani; Edward R Dougherty
Journal: Bioinformatics Date: 2013-11-20 Impact factor: 6.937

6. Selection bias in gene extraction on the basis of microarray gene-expression data.

Authors: Christophe Ambroise; Geoffrey J McLachlan
Journal: Proc Natl Acad Sci U S A Date: 2002-04-30 Impact factor: 11.205

7. Prognostically useful gene-expression profiles in acute myeloid leukemia.

Authors: Peter J M Valk; Roel G W Verhaak; M Antoinette Beijen; Claudia A J Erpelinck; Sahar Barjesteh van Waalwijk van Doorn-Khosrovani; Judith M Boer; H Berna Beverloo; Michael J Moorhouse; Peter J van der Spek; Bob Löwenberg; Ruud Delwel
Journal: N Engl J Med Date: 2004-04-15 Impact factor: 91.245

8. Incidence of Parkinson's disease: variation by age, gender, and race/ethnicity.

Authors: Stephen K Van Den Eeden; Caroline M Tanner; Allan L Bernstein; Robin D Fross; Amethyst Leimpeter; Daniel A Bloch; Lorene M Nelson
Journal: Am J Epidemiol Date: 2003-06-01 Impact factor: 4.897

9. Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling.

Authors: Eng-Juh Yeoh; Mary E Ross; Sheila A Shurtleff; W Kent Williams; Divyen Patel; Rami Mahfouz; Fred G Behm; Susana C Raimondi; Mary V Relling; Anami Patel; Cheng Cheng; Dario Campana; Dawn Wilkins; Xiaodong Zhou; Jinyan Li; Huiqing Liu; Ching-Hon Pui; William E Evans; Clayton Naeve; Limsoon Wong; James R Downing
Journal: Cancer Cell Date: 2002-03 Impact factor: 31.743

4 in total

4. Incorporating prior knowledge induced from stochastic differential equations in the classification of stochastic observations.

Authors: Amin Zollanvari; Edward R Dougherty
Journal: EURASIP J Bioinform Syst Biol Date: 2016-01-20