Impossibility of successful classification when useful features are rare and weak.

Jiashun Jin.

Abstract

We study a two-class classification problem with a large number of features, of which many are useless and only a few are useful, but we do not know which ones they are. The number of features is large compared with the number of training observations. Calibrating the model with four key parameters (the number of features, the size of the training sample, and the fraction and strength of useful features), we identify a region in parameter space where no trained classifier can reliably separate the two classes on fresh data. The complement of this region, where successful classification is possible, is also briefly discussed.
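As a rough illustration of the setting described in the abstract, the sketch below simulates a two-class Gaussian model and measures the test error of one simple linear rule. Everything in it is an assumption made for illustration: the symbols `p` (number of features), `n` (training sample size), `eps` (fraction of useful features) and `tau` (strength of useful features), the unit-variance independent Gaussian noise, and the hypothetical function `simulate_test_error`. The rule shown is not the paper's method, and a simulation of a single classifier cannot demonstrate the impossibility result, which concerns every trained classifier.

```python
import random


def simulate_test_error(p, n, eps, tau, n_test=200, seed=0):
    """Test error of a naive linear rule on a sparse two-class Gaussian model.

    Assumed model (illustrative only): each of the p features is useful with
    probability eps; a useful feature has mean +tau in class +1 and -tau in
    class -1; useless features have mean 0; noise is independent N(0, 1).
    """
    rng = random.Random(seed)
    mu = [tau if rng.random() < eps else 0.0 for _ in range(p)]

    def draw(label):
        # One observation from the class with the given label (+1 or -1).
        return [label * m + rng.gauss(0.0, 1.0) for m in mu]

    # Training: weight of feature j is the label-signed sum of its values.
    # No feature selection is attempted, so useless features add pure noise.
    weights = [0.0] * p
    for i in range(n):
        y = 1 if i % 2 == 0 else -1
        for j, v in enumerate(draw(y)):
            weights[j] += y * v

    # Testing on fresh data: classify by the sign of the weighted sum.
    errors = 0
    for i in range(n_test):
        y = 1 if i % 2 == 0 else -1
        score = sum(w * v for w, v in zip(weights, draw(y)))
        predicted = 1 if score > 0 else -1
        errors += predicted != y
    return errors / n_test


if __name__ == "__main__":
    # Many strong useful features versus few weak ones.
    print(simulate_test_error(p=200, n=20, eps=0.5, tau=1.0))
    print(simulate_test_error(p=2000, n=20, eps=0.005, tau=0.1))
```

Under these assumptions, the first setting should give an error rate near zero and the second an error rate near one half, i.e. close to guessing. The exact values depend on the seed and on the chosen parameters.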

Year:  2009        PMID: 19447927      PMCID: PMC2682944          DOI: 10.1073/pnas.0903931106

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


  7 in total

1.  Replication validity of genetic association studies.

Authors:  J P Ioannidis; E E Ntzani; T A Trikalinos; D G Contopoulos-Ioannidis
Journal:  Nat Genet       Date:  2001-11       Impact factor: 38.330

2.  Replicating genotype-phenotype associations.

Authors:  Stephen J Chanock; Teri Manolio; Michael Boehnke; Eric Boerwinkle; David J Hunter; Gilles Thomas; Joel N Hirschhorn; Goncalo Abecasis; David Altshuler; Joan E Bailey-Wilson; Lisa D Brooks; Lon R Cardon; Mark Daly; Peter Donnelly; Joseph F Fraumeni; Nelson B Freimer; Daniela S Gerhard; Chris Gunter; Alan E Guttmacher; Mark S Guyer; Emily L Harris; Josephine Hoh; Robert Hoover; C Augustine Kong; Kathleen R Merikangas; Cynthia C Morton; Lyle J Palmer; Elizabeth G Phimister; John P Rice; Jerry Roberts; Charles Rotimi; Margaret A Tucker; Kyle J Vogan; Sholom Wacholder; Ellen M Wijsman; Deborah M Winn; Francis S Collins
Journal:  Nature       Date:  2007-06-07       Impact factor: 49.962

3.  High Dimensional Classification Using Features Annealed Independence Rules.

Authors:  Jianqing Fan; Yingying Fan
Journal:  Ann Stat       Date:  2008       Impact factor: 4.028

4.  Higher criticism thresholding: Optimal feature selection when useful features are rare and weak.

Authors:  David Donoho; Jiashun Jin
Journal:  Proc Natl Acad Sci U S A       Date:  2008-09-24       Impact factor: 11.205

5.  Feature selection by higher criticism thresholding achieves the optimal phase diagram.

Authors:  David Donoho; Jiashun Jin
Journal:  Philos Trans A Math Phys Eng Sci       Date:  2009-11-13       Impact factor: 4.226

6.  Classification of sparse high-dimensional vectors.

Authors:  Yuri I Ingster; Christophe Pouet; Alexandre B Tsybakov
Journal:  Philos Trans A Math Phys Eng Sci       Date:  2009-11-13       Impact factor: 4.226

7.  Why most published research findings are false.

Authors:  John P A Ioannidis
Journal:  PLoS Med       Date:  2005-08-30       Impact factor: 11.613

  9 in total

1.  Network-based Prediction of Cancer under Genetic Storm.

Authors:  Ahmet Ay; Dihong Gong; Tamer Kahveci
Journal:  Cancer Inform       Date:  2014-10-15

2.  Statistical challenges of high-dimensional data.

Authors:  Iain M Johnstone; D Michael Titterington
Journal:  Philos Trans A Math Phys Eng Sci       Date:  2009-11-13       Impact factor: 4.226

3.  A Selective Overview of Variable Selection in High Dimensional Feature Space.

Authors:  Jianqing Fan; Jinchi Lv
Journal:  Stat Sin       Date:  2010-01       Impact factor: 1.261

4.  Classification based hypothesis testing in neuroscience: Below-chance level classification rates and overlooked statistical properties of linear parametric classifiers.

Authors:  Hamidreza Jamalabadi; Sarah Alizadeh; Monika Schönauer; Christian Leibold; Steffen Gais
Journal:  Hum Brain Mapp       Date:  2016-03-26       Impact factor: 5.038

5.  Study design in high-dimensional classification analysis.

Authors:  Brisa N Sánchez; Meihua Wu; Peter X K Song; Wen Wang
Journal:  Biostatistics       Date:  2016-05-05       Impact factor: 5.899

6.  Multiclass linear discriminant analysis with ultrahigh-dimensional features.

Authors:  Yanming Li; Hyokyoung G Hong; Yi Li
Journal:  Biometrics       Date:  2019-06-18       Impact factor: 2.571

7.  Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering.

Authors:  Eva Freyhult; Mattias Landfors; Jenny Önskog; Torgeir R Hvidsten; Patrik Rydén
Journal:  BMC Bioinformatics       Date:  2010-10-11       Impact factor: 3.169

8.  Classification of microarrays; synergistic effects between normalization, gene selection and machine learning.

Authors:  Jenny Önskog; Eva Freyhult; Mattias Landfors; Patrik Rydén; Torgeir R Hvidsten
Journal:  BMC Bioinformatics       Date:  2011-10-07       Impact factor: 3.169

9.  Top scoring pairs for feature selection in machine learning and applications to cancer outcome prediction.

Authors:  Ping Shi; Surajit Ray; Qifu Zhu; Mark A Kon
Journal:  BMC Bioinformatics       Date:  2011-09-23       Impact factor: 3.169

