
Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification.

Jianqing Fan, Yang Feng, Jiancheng Jiang, Xin Tong.

Abstract

We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing.
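The pipeline described in the abstract (estimate marginal densities per class, replace each feature by its estimated log density ratio, then fit a penalized logistic regression on the transformed features) can be illustrated with a minimal sketch. This is not the authors' implementation. Everything specific here is an assumption made for illustration: the Gaussian kernel with a rule-of-thumb bandwidth, the `eps` guard inside the logarithm, the half/half split between density estimation and regression, the L1 penalty fitted by proximal gradient descent, the tuning values `lam`, `step`, and `iters`, and the toy data from the hypothetical `simulate` helper.

```python
import math
import random


def gaussian_kde(sample):
    """Return a 1-D Gaussian kernel density estimate fitted to `sample`."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    h = 1.06 * sd * n ** -0.2  # rule-of-thumb bandwidth (assumed choice)
    norm = n * h * math.sqrt(2 * math.pi)
    return lambda x: sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample) / norm


def fit_log_ratios(X, y, eps=1e-6):
    """For each feature j, estimate x -> log(f_j(x) / g_j(x)) from labeled rows."""
    ratios = []
    for j in range(len(X[0])):
        f = gaussian_kde([row[j] for row, lab in zip(X, y) if lab == 1])
        g = gaussian_kde([row[j] for row, lab in zip(X, y) if lab == 0])
        # eps (an assumption) keeps the log finite where a density is near zero.
        ratios.append(lambda x, f=f, g=g: math.log((f(x) + eps) / (g(x) + eps)))
    return ratios


def augment(X, ratios):
    """Replace every raw feature by its estimated marginal log density ratio."""
    return [[r(v) for r, v in zip(ratios, row)] for row in X]


def sigmoid(t):
    # Clamp the argument so math.exp cannot overflow.
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, t))))


def fit_l1_logistic(Z, y, lam=0.05, step=0.05, iters=500):
    """L1-penalized logistic regression fitted by proximal gradient descent."""
    n, p = len(Z), len(Z[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        resid = [sigmoid(b0 + sum(b * z for b, z in zip(beta, row))) - lab
                 for row, lab in zip(Z, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, Z)) / n for j in range(p)]
        b0 -= step * sum(resid) / n  # the intercept is not penalized
        # Gradient step followed by soft-thresholding (the L1 proximal map).
        beta = [math.copysign(max(abs(b - step * g) - step * lam, 0.0), b - step * g)
                for b, g in zip(beta, grad)]
    return b0, beta


def simulate(n, rng, p=5):
    """Toy data: feature 0 is bimodal in class 1; the other features are noise."""
    X, y = [], []
    for i in range(n):
        lab = i % 2
        x0 = rng.gauss(rng.choice([-3.0, 3.0]), 1.0) if lab == 1 else rng.gauss(0.0, 1.0)
        X.append([x0] + [rng.gauss(0.0, 1.0) for _ in range(p - 1)])
        y.append(lab)
    return X, y


def demo(seed=0):
    rng = random.Random(seed)
    X, y = simulate(400, rng)
    X_test, y_test = simulate(200, rng)
    # Assumed design: densities from one half, regression on the other half.
    ratios = fit_log_ratios(X[:200], y[:200])
    b0, beta = fit_l1_logistic(augment(X[200:], ratios), y[200:])
    scores = [b0 + sum(b * z for b, z in zip(beta, row))
              for row in augment(X_test, ratios)]
    accuracy = sum((s > 0) == (lab == 1) for s, lab in zip(scores, y_test)) / len(y_test)
    return accuracy, beta


accuracy, weights = demo()
print("test accuracy:", accuracy)
print("coefficients:", weights)
```

In the toy data the two classes have the same mean on feature 0, so a classifier that is linear in the raw features has little to work with, whereas the log density ratio of that feature is a nonlinear function of it and separates the classes. The L1 penalty is what pushes the coefficients of the pure-noise features toward zero, which is the "selection" half of the idea.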

Keywords:  classification; density estimation; feature augmentation; feature selection; high dimensional space; nonlinear decision boundary; parallel computing

Year: 2016; PMID: 27185970; PMCID: PMC4866821; DOI: 10.1080/01621459.2015.1005212

Source DB: PubMed; Journal: J Am Stat Assoc; ISSN: 0162-1459; Impact factor: 5.033


