Empirical Performance of Cross-Validation With Oracle Methods in a Genomics Context.

Josue G Martinez, Raymond J Carroll, Samuel Müller, Joshua N Sampson, Nilanjan Chatterjee.

Abstract

When employing model selection methods with oracle properties such as the smoothly clipped absolute deviation (SCAD) and the Adaptive Lasso, it is typical to estimate the smoothing parameter by m-fold cross-validation, for example, m = 10. In problems where the true regression function is sparse and the signals large, such cross-validation typically works well. However, in regression modeling of genomic studies involving Single Nucleotide Polymorphisms (SNP), the true regression functions, while thought to be sparse, do not have large signals. We demonstrate empirically that in such problems, the number of selected variables using SCAD and the Adaptive Lasso, with 10-fold cross-validation, is a random variable that has considerable and surprising variation. Similar remarks apply to non-oracle methods such as the Lasso. Our study strongly questions the suitability of performing only a single run of m-fold cross-validation with any oracle method, and not just the SCAD and Adaptive Lasso.
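The variability described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code or data: it uses the plain Lasso (which the abstract says behaves similarly) rather than SCAD or the Adaptive Lasso, and the sample size, signal strength, tuning-parameter grid, and all function names are assumptions chosen for illustration. It repeats 10-fold cross-validation on one fixed data set, changing only the random fold assignment, and records how many variables are selected each time.

```python
import random

def soft(z, t):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def gram(X, y):
    """Return G = X'X/n and c = X'y/n (no intercept; data simulated with mean zero)."""
    n, p = len(X), len(X[0])
    G = [[sum(r[j] * r[k] for r in X) / n for k in range(p)] for j in range(p)]
    c = [sum(r[j] * yi for r, yi in zip(X, y)) / n for j in range(p)]
    return G, c

def lasso_cd(G, c, lam, sweeps=200, tol=1e-8):
    """Minimise (1/2n)||y - Xb||^2 + lam*||b||_1 by coordinate descent."""
    p = len(c)
    b = [0.0] * p
    for _ in range(sweeps):
        delta = 0.0
        for j in range(p):
            r = c[j] - sum(G[j][k] * b[k] for k in range(p) if k != j)
            new = soft(r, lam) / G[j][j]
            delta = max(delta, abs(new - b[j]))
            b[j] = new
        if delta < tol:
            break
    return b

def cv_select(X, y, lams, m, rng):
    """One run of m-fold CV: pick lam by held-out squared error, refit on all
    data, and return the number of nonzero coefficients."""
    idx = list(range(len(X)))
    rng.shuffle(idx)                      # the only source of randomness per run
    folds = [idx[f::m] for f in range(m)]
    err = [0.0] * len(lams)
    for held in folds:
        held_set = set(held)
        tr = [i for i in idx if i not in held_set]
        G, c = gram([X[i] for i in tr], [y[i] for i in tr])
        for a, lam in enumerate(lams):
            b = lasso_cd(G, c, lam)
            err[a] += sum(
                (y[i] - sum(x * bj for x, bj in zip(X[i], b))) ** 2 for i in held
            )
    best = lams[min(range(len(lams)), key=err.__getitem__)]
    G, c = gram(X, y)
    return sum(1 for bj in lasso_cd(G, c, best) if bj != 0.0)

def simulate(n=50, p=12, seed=0):
    """Sparse truth with weak signals (assumed values, for illustration only)."""
    rng = random.Random(seed)
    beta = [0.3, -0.3, 0.3] + [0.0] * (p - 3)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [sum(x * b for x, b in zip(row, beta)) + rng.gauss(0, 1) for row in X]
    return X, y

X, y = simulate()
lams = [0.4, 0.3, 0.2, 0.15, 0.1, 0.05, 0.02]   # hypothetical grid
# Same data every time; only the fold assignment changes between runs.
counts = [cv_select(X, y, lams, m=10, rng=random.Random(s)) for s in range(10)]
print("selected variables per CV run:", counts)
```

If the printed counts differ from run to run, that spread comes entirely from the fold assignment, since the data are held fixed. Reporting the distribution of counts over many runs, rather than the result of a single run, is one way to act on the abstract's caution.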

Year:  2011        PMID: 22347720      PMCID: PMC3281424          DOI: 10.1198/tas.2011.11052

Source DB:  PubMed          Journal:  Am Stat        ISSN: 0003-1305            Impact factor:   8.710


  2 in total

1.  Tuning parameter selectors for the smoothly clipped absolute deviation method.

Authors:  Hansheng Wang; Runze Li; Chih-Ling Tsai
Journal:  Biometrika       Date:  2007-08-01       Impact factor: 2.445

2.  One-step Sparse Estimates in Nonconcave Penalized Likelihood Models.

Authors:  Hui Zou; Runze Li
Journal:  Ann Stat       Date:  2008-08-01       Impact factor: 4.028

  10 in total

1.  Identification of important regressor groups, subgroups and individuals via regularization methods: application to gut microbiome data.

Authors:  Tanya P Garcia; Samuel Müller; Raymond J Carroll; Rosemary L Walzem
Journal:  Bioinformatics       Date:  2013-10-24       Impact factor: 6.937

2.  Structured variable selection with q-values.

Authors:  Tanya P Garcia; Samuel Müller; Raymond J Carroll; Tamara N Dunn; Anthony P Thomas; Sean H Adams; Suresh D Pillai; Rosemary L Walzem
Journal:  Biostatistics       Date:  2013-04-10       Impact factor: 5.899

3.  DNA mismatch repair MSH2 gene-based SNP associated with different populations.

Authors:  Zainularifeen Abduljaleel; Faisal A Al-Allaf; Wajahatullah Khan; Mohammad Athar; Naiyer Shahzad; Mohiuddin M Taher; Mohammed Alanazi; Mohamed Elrobh; Narasimha P Reddy
Journal:  Mol Genet Genomics       Date:  2014-02-22       Impact factor: 3.291

4.  The Pyroptosis-Related Signature Predicts Diagnosis and Indicates Immune Characteristic in Major Depressive Disorder.

Authors:  Zhifang Deng; Jue Liu; Shen He; Wenqi Gao
Journal:  Front Pharmacol       Date:  2022-05-19       Impact factor: 5.988

5.  Network-based biomarkers enhance classical approaches to prognostic gene expression signatures.

Authors:  Rebecca L Barter; Sarah-Jane Schramm; Graham J Mann; Yee Hwa Yang
Journal:  BMC Syst Biol       Date:  2014-12-08

6.  Regularized group regression methods for genomic prediction: Bridge, MCP, SCAD, group bridge, group lasso, sparse group lasso, group MCP and group SCAD.

Authors:  Joseph O Ogutu; Hans-Peter Piepho
Journal:  BMC Proc       Date:  2014-10-07

7.  Added predictive value of omics data: specific issues related to validation illustrated by two case studies.

Authors:  Riccardo De Bin; Tobias Herold; Anne-Laure Boulesteix
Journal:  BMC Med Res Methodol       Date:  2014-10-28       Impact factor: 4.615

8.  Using machine learning tools for protein database biocuration assistance.

Authors:  Caroline König; Ilmira Shaim; Alfredo Vellido; Enrique Romero; René Alquézar; Jesús Giraldo
Journal:  Sci Rep       Date:  2018-07-05       Impact factor: 4.379

9.  Label noise in subtype discrimination of class C G protein-coupled receptors: A systematic approach to the analysis of classification errors.

Authors:  Caroline König; Martha I Cárdenas; Jesús Giraldo; René Alquézar; Alfredo Vellido
Journal:  BMC Bioinformatics       Date:  2015-09-29       Impact factor: 3.169

10.  Understanding the undelaying mechanism of HA-subtyping in the level of physic-chemical characteristics of protein.

Authors:  Mansour Ebrahimi; Parisa Aghagolzadeh; Narges Shamabadi; Ahmad Tahmasebi; Mohammed Alsharifi; David L Adelson; Farhid Hemmatzadeh; Esmaeil Ebrahimie
Journal:  PLoS One       Date:  2014-05-08       Impact factor: 3.240
