High Dimensional Classification Using Features Annealed Independence Rules.

Jianqing Fan, Yingying Fan

Abstract

Classification using high-dimensional features arises frequently in contemporary statistical studies, such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classification is poorly understood. In a seminal paper, Bickel and Levina (2004) show that the Fisher discriminant performs poorly due to diverging spectra, and they propose using the independence rule to overcome the problem. We first demonstrate that even for the independence classification rule, classification using all the features can be as poor as random guessing, due to noise accumulation in estimating population centroids in a high-dimensional feature space. In fact, we demonstrate further that almost all linear discriminants can perform as poorly as random guessing. Thus, it is of paramount importance to select a subset of important features for high-dimensional classification, resulting in Features Annealed Independence Rules (FAIR). The conditions under which all the important features can be selected by the two-sample t-statistic are established. The choice of the optimal number of features, or equivalently the threshold value of the test statistic, is proposed based on an upper bound of the classification error. Simulation studies and real data analysis support our theoretical results and demonstrate convincingly the advantage of our new classification procedure.
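
The noise-accumulation effect and the benefit of t-statistic screening described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: the function names (`two_sample_t`, `fit_rule`, `predict`, `simulate`), the Gaussian data model, and all parameter values are assumptions made for illustration. In particular, the number of retained features `m` is simply set to the true number of informative features, whereas the paper chooses it from an upper bound on the classification error, which is not reproduced here.

```python
import numpy as np

def two_sample_t(X0, X1):
    """Two-sample t-statistic (unequal-variance form) for each feature column."""
    n0, n1 = X0.shape[0], X1.shape[0]
    diff = X1.mean(axis=0) - X0.mean(axis=0)
    se = np.sqrt(X0.var(axis=0, ddof=1) / n0 + X1.var(axis=0, ddof=1) / n1)
    return diff / se

def fit_rule(X0, X1, m):
    """Keep the m features with largest |t|, then fit an independence rule
    (class centroids plus pooled per-feature variances) on those features."""
    t = two_sample_t(X0, X1)
    keep = np.argsort(-np.abs(t))[:m]
    n0, n1 = X0.shape[0], X1.shape[0]
    mu0 = X0[:, keep].mean(axis=0)
    mu1 = X1[:, keep].mean(axis=0)
    var = ((n0 - 1) * X0[:, keep].var(axis=0, ddof=1)
           + (n1 - 1) * X1[:, keep].var(axis=0, ddof=1)) / (n0 + n1 - 2)
    return keep, mu0, mu1, var

def predict(model, X):
    """Assign class 1 when the diagonal linear discriminant score is positive."""
    keep, mu0, mu1, var = model
    score = ((X[:, keep] - (mu0 + mu1) / 2) * (mu1 - mu0) / var).sum(axis=1)
    return (score > 0).astype(int)

def simulate(rng, n, p, s, delta):
    """Hypothetical data model: unit-variance Gaussian noise in p features,
    with a mean shift of delta in the first s features of class 1."""
    X0 = rng.standard_normal((n, p))
    X1 = rng.standard_normal((n, p))
    X1[:, :s] += delta
    return X0, X1

rng = np.random.default_rng(0)
p, s, delta = 4000, 10, 1.5          # illustrative values, not from the paper
X0_tr, X1_tr = simulate(rng, 30, p, s, delta)
X0_te, X1_te = simulate(rng, 200, p, s, delta)
X_te = np.vstack([X0_te, X1_te])
y_te = np.r_[np.zeros(200, dtype=int), np.ones(200, dtype=int)]

model_all = fit_rule(X0_tr, X1_tr, m=p)    # independence rule on all features
model_fair = fit_rule(X0_tr, X1_tr, m=s)   # screened rule; m = s is an oracle choice
err_all = np.mean(predict(model_all, X_te) != y_te)
err_fair = np.mean(predict(model_fair, X_te) != y_te)
print(f"test error, all {p} features: {err_all:.3f}")
print(f"test error, top {s} features by |t|: {err_fair:.3f}")
```

In this setup only 10 of the 4000 features carry signal, so the centroid estimates for the remaining features contribute pure noise to the discriminant score. Restricting the rule to the features with the largest absolute t-statistics removes most of that noise, which is the mechanism the abstract attributes to FAIR.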

Year:  2008        PMID: 19169416      PMCID: PMC2630123          DOI: 10.1214/07-AOS504

Source DB:  PubMed          Journal:  Ann Stat        ISSN: 0090-5364            Impact factor:   4.028


  11 in total

1.  Singular value decomposition regression models for classification of tumors from microarray experiments.

Authors:  Debashis Ghosh
Journal:  Pac Symp Biocomput       Date:  2002

2.  Dimension reduction strategies for analyzing global gene expression data with a response.

Authors:  Francesca Chiaromonte; Jessica Martinelli
Journal:  Math Biosci       Date:  2002-03       Impact factor: 2.144

3.  Boosting for tumor classification with gene expression data.

Authors:  Marcel Dettling; Peter Bühlmann
Journal:  Bioinformatics       Date:  2003-06-12       Impact factor: 6.937

4.  Linear regression and two-class classification with gene expression data.

Authors:  Xiaohong Huang; Wei Pan
Journal:  Bioinformatics       Date:  2003-11-01       Impact factor: 6.937

5.  PLS dimension reduction for classification with microarray data.

Authors:  Anne-Laure Boulesteix
Journal:  Stat Appl Genet Mol Biol       Date:  2004-11-23

6.  Predicting the clinical status of human breast cancer by using gene expression profiles.

Authors:  M West; C Blanchette; H Dressman; E Huang; S Ishida; R Spang; H Zuzan; J A Olson; J R Marks; J R Nevins
Journal:  Proc Natl Acad Sci U S A       Date:  2001-09-18       Impact factor: 11.205

7.  Diagnosis of multiple cancer types by shrunken centroids of gene expression.

Authors:  Robert Tibshirani; Trevor Hastie; Balasubramanian Narasimhan; Gilbert Chu
Journal:  Proc Natl Acad Sci U S A       Date:  2002-05-14       Impact factor: 11.205

8.  Tumor classification by partial least squares using microarray gene expression data.

Authors:  Danh V Nguyen; David M Rocke
Journal:  Bioinformatics       Date:  2002-01       Impact factor: 6.937

9.  Graphical methods for class prediction using dimension reduction techniques on DNA microarray data.

Authors:  Efstathia Bura; Ruth M Pfeiffer
Journal:  Bioinformatics       Date:  2003-07-01       Impact factor: 6.937

10.  Effective dimension reduction methods for tumor classification using gene expression data.

Authors:  A Antoniadis; S Lambert-Lacroix; F Leblanc
Journal:  Bioinformatics       Date:  2003-03-22       Impact factor: 6.937

  60 in total

1.  Exploiting Linkage Disequilibrium for Ultrahigh-Dimensional Genome-Wide Data with an Integrated Statistical Approach.

Authors:  Michelle Carlsen; Guifang Fu; Shaun Bushman; Christopher Corcoran
Journal:  Genetics       Date:  2015-12-12       Impact factor: 4.562

2.  Non-Concave Penalized Likelihood with NP-Dimensionality.

Authors:  Jianqing Fan; Jinchi Lv
Journal:  IEEE Trans Inf Theory       Date:  2011-08       Impact factor: 2.501

3.  Impossibility of successful classification when useful features are rare and weak.

Authors:  Jiashun Jin
Journal:  Proc Natl Acad Sci U S A       Date:  2009-05-15       Impact factor: 11.205

4.  Large sample size, wide variant spectrum, and advanced machine-learning technique boost risk prediction for inflammatory bowel disease.

Authors:  Zhi Wei; Wei Wang; Jonathan Bradfield; Jin Li; Christopher Cardinale; Edward Frackelton; Cecilia Kim; Frank Mentch; Kristel Van Steen; Peter M Visscher; Robert N Baldassano; Hakon Hakonarson
Journal:  Am J Hum Genet       Date:  2013-05-23       Impact factor: 11.025

5.  VARIABLE SELECTION IN LINEAR MIXED EFFECTS MODELS.

Authors:  Yingying Fan; Runze Li
Journal:  Ann Stat       Date:  2012-08-01       Impact factor: 4.028

6.  Graph-based sparse linear discriminant analysis for high-dimensional classification.

Authors:  Jianyu Liu; Guan Yu; Yufeng Liu
Journal:  J Multivar Anal       Date:  2018-12-17       Impact factor: 1.473

7.  Higher criticism thresholding: Optimal feature selection when useful features are rare and weak.

Authors:  David Donoho; Jiashun Jin
Journal:  Proc Natl Acad Sci U S A       Date:  2008-09-24       Impact factor: 11.205

8.  Automated multidimensional phenotypic profiling using large public microarray repositories.

Authors:  Min Xu; Wenyuan Li; Gareth M James; Michael R Mehan; Xianghong Jasmine Zhou
Journal:  Proc Natl Acad Sci U S A       Date:  2009-07-09       Impact factor: 11.205

9.  Comment: Feature Screening and Variable Selection via Iterative Ridge Regression.

Authors:  Jianqing Fan; Runze Li
Journal:  Technometrics       Date:  2020-08-24

10.  Generalized Alternating Direction Method of Multipliers: New Theoretical Insights and Applications.

Authors:  Ethan X Fang; Bingsheng He; Han Liu; Xiaoming Yuan
Journal:  Math Program Comput       Date:  2015-02-06
