
Cautionary Note on Using Cross-Validation for Molecular Classification.

Li-Xuan Qin, Huei-Chung Huang, Colin B Begg.

Abstract

Purpose: Reproducibility of scientific experimentation has become a major concern because of the perception that many published biomedical studies cannot be replicated. In this article, we draw attention to the connection between inflated overoptimistic findings and the use of cross-validation for error estimation in molecular classification studies. We show that, in the absence of careful design to prevent artifacts caused by systematic differences in the processing of specimens, established tools such as cross-validation can lead to a spurious estimate of the error rate in the overoptimistic direction, regardless of the use of data normalization as an effort to remove these artifacts.

Methods: We demonstrated this important yet overlooked complication of cross-validation using a unique pair of data sets on the same set of tumor samples. One data set was collected with uniform handling to prevent handling effects; the other was collected without uniform handling and exhibited handling effects. The paired data sets were used to estimate the biologic effects of the samples and the handling effects of the arrays in the latter data set, which were then used to simulate data using virtual rehybridization following various array-to-sample assignment schemes.

Results: Our study showed that (1) cross-validation tended to underestimate the error rate when the data possessed confounding handling effects; (2) depending on the relative amount of handling effects, normalization may further worsen the underestimation of the error rate; and (3) balanced assignment of arrays to comparison groups allowed cross-validation to provide an unbiased error estimate.

Conclusion: Our study demonstrates the benefits of balanced array assignment for reproducible molecular classification and calls for caution on the routine use of data normalization and cross-validation in such analysis.
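The mechanism described in the abstract can be illustrated with a small toy simulation. The sketch below is not the authors' virtual rehybridization procedure and uses no real data: the nearest-centroid classifier, the effect sizes, the complete confounding of handling batch with group, and the uniformly handled external test set are all assumptions chosen for illustration. It compares the cross-validated error with the error on external data under a confounded and a balanced array assignment.

```python
import random

def simulate(n_per_group, batch_of, rng, p=50, n_signal=5, delta=1.0, h=1.0):
    """Simulate expression rows for two groups (hypothetical effect sizes).

    batch_of(group, i) returns 0 or 1; arrays in batch 1 receive a
    handling shift of +h or -h on every feature (sign alternates).
    """
    X, y = [], []
    for g in (0, 1):
        for i in range(n_per_group):
            b = batch_of(g, i)
            row = []
            for j in range(p):
                bio = delta if (g == 1 and j < n_signal) else 0.0
                sign = 1.0 if j % 2 == 0 else -1.0
                handling = h * sign if b == 1 else 0.0
                row.append(bio + handling + rng.gauss(0.0, 1.0))
            X.append(row)
            y.append(g)
    return X, y

def fit(X, y):
    """Nearest-centroid classifier: one mean vector per group."""
    model = {}
    for g in (0, 1):
        rows = [x for x, lab in zip(X, y) if lab == g]
        model[g] = [sum(col) / len(rows) for col in zip(*rows)]
    return model

def predict(model, x):
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda g: dist2(model[g]))

def error_rate(model, X, y):
    return sum(predict(model, x) != lab for x, lab in zip(X, y)) / len(y)

def cv_error(X, y, rng, k=5):
    """Plain k-fold cross-validation error on the study data."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    wrong = 0
    for f in range(k):
        test = set(idx[f::k])
        train = [i for i in idx if i not in test]
        model = fit([X[i] for i in train], [y[i] for i in train])
        wrong += sum(predict(model, X[i]) != y[i] for i in test)
    return wrong / len(y)

def run(seed=0, n_per_group=30):
    rng = random.Random(seed)
    designs = {
        "confounded": lambda g, i: g,    # handling batch coincides with group
        "balanced": lambda g, i: i % 2,  # each batch holds half of each group
    }
    # External test set: uniformly handled, so it carries no handling effect.
    X_ext, y_ext = simulate(500, lambda g, i: 0, rng)
    out = {}
    for name, batch_of in designs.items():
        X, y = simulate(n_per_group, batch_of, rng)
        cv = cv_error(X, y, rng)
        ext = error_rate(fit(X, y), X_ext, y_ext)
        out[name] = {"cv": cv, "external": ext, "gap": ext - cv}
    return out

results = run()
for name, r in results.items():
    print(name, r)
```

In this toy setting the confounded design lets the classifier separate the groups by handling batch, so cross-validation reports a very low error that does not carry over to the uniformly handled external data. Under the balanced design the handling shift is shared by both groups, and the gap between the two error estimates is much smaller.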

Year:  2016        PMID: 27601553      PMCID: PMC5477984          DOI: 10.1200/JCO.2016.68.1031

Source DB:  PubMed          Journal:  J Clin Oncol        ISSN: 0732-183X            Impact factor:   44.544


  8 in total

1.  Making External Validation Valid for Molecular Classifier Development.

Authors:  Yilin Wu; Huei-Chung Huang; Li-Xuan Qin
Journal:  JCO Precis Oncol       Date:  2021-08-05

2.  Performance evaluation of transcriptomics data normalization for survival risk prediction.

Authors:  Ai Ni; Li-Xuan Qin
Journal:  Brief Bioinform       Date:  2021-11-05       Impact factor: 13.994

3.  Empirical evaluation of data normalization methods for molecular classification.

Authors:  Huei-Chung Huang; Li-Xuan Qin
Journal:  PeerJ       Date:  2018-04-11       Impact factor: 2.984

Review 4.  Toward a Standardized Strategy of Clinical Metabolomics for the Advancement of Precision Medicine.

Authors:  Nguyen Phuoc Long; Tran Diem Nghi; Yun Pyo Kang; Nguyen Hoang Anh; Hyung Min Kim; Sang Ki Park; Sung Won Kwon
Journal:  Metabolites       Date:  2020-01-29

5.  Development and Comparison of Multimodal Models for Preoperative Prediction of Outcomes After Endovascular Aneurysm Repair.

Authors:  Yonggang Wang; Min Zhou; Yong Ding; Xu Li; Zhenyu Zhou; Zhenyu Shi; Weiguo Fu
Journal:  Front Cardiovasc Med       Date:  2022-04-26

6.  PRECISION.array: An R Package for Benchmarking microRNA Array Data Normalization in the Context of Sample Classification.

Authors:  Huei-Chung Huang; Yilin Wu; Qihang Yang; Li-Xuan Qin
Journal:  Front Genet       Date:  2022-07-22       Impact factor: 4.772

7.  Prediction of Liver Triglyceride Content in Early Lactation Multiparous Holstein Cows Using Blood Metabolite, Mineral, and Protein Biomarker Concentrations.

Authors:  Ryan S Pralle; Henry T Holdorf; Rafael Caputo Oliveira; Claira R Seely; Sophia J Kendall; Heather M White
Journal:  Animals (Basel)       Date:  2022-09-24       Impact factor: 3.231

8.  Searching for best lower dimensional visualization angles for high dimensional RNA-Seq data.

Authors:  Wanli Zhang; Yanming Di
Journal:  PeerJ       Date:  2018-07-12       Impact factor: 2.984
