
Generic Feature Selection with Short Fat Data.

B Clarke, J-H Chu.

Abstract

Consider a regression problem in which there are many more explanatory variables than data points, i.e., p ≫ n. Essentially, without reducing the number of variables, inference is impossible. We therefore group the p explanatory variables into blocks by clustering, evaluate statistics on the blocks, and then regress the response on these statistics under a penalized error criterion to obtain estimates of the regression coefficients. We examine the performance of this approach for a variety of choices of n, p, classes of statistics, clustering algorithms, penalty terms, and data types. When n is not large, the discrimination over the number of statistics is weak, but computations suggest regressing on approximately [n/K] statistics, where K is the number of blocks formed by a clustering algorithm. Small deviations from this are observed when the blocks of variables are of very different sizes. Larger deviations are observed when the penalty term is an Lq norm with high enough q.
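
The cluster, summarize, and regress pipeline described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the plain k-means on variables, the block mean as the single summary statistic per block, the ridge penalty, and every function name and parameter value below (cluster_columns, block_means, ridge_fit, lam, the simulated data) are hypothetical choices made for the example.

```python
import numpy as np

def cluster_columns(X, K, n_iter=20, seed=0):
    """Assign each of the p columns of X to one of K blocks (plain k-means).
    Assumption: k-means stands in for whichever clustering algorithm is used."""
    rng = np.random.default_rng(seed)
    cols = X.T  # shape (p, n): each variable is treated as a point in R^n
    centers = cols[rng.choice(cols.shape[0], size=K, replace=False)]
    for _ in range(n_iter):
        # squared distance from every variable to every block center
        d = ((cols[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(K):
            members = cols[labels == k]
            if len(members) > 0:  # keep the old center if a block empties
                centers[k] = members.mean(axis=0)
    return labels

def block_means(X, labels):
    """One summary statistic per non-empty block: the mean of its variables.
    Assumption: the mean is just one possible class of statistic."""
    blocks = np.unique(labels)
    return np.column_stack([X[:, labels == k].mean(axis=1) for k in blocks])

def ridge_fit(S, y, lam=1.0):
    """Closed-form ridge estimate on centered data; the intercept is unpenalized."""
    S_mean, y_mean = S.mean(axis=0), y.mean()
    Sc, yc = S - S_mean, y - y_mean
    beta = np.linalg.solve(Sc.T @ Sc + lam * np.eye(S.shape[1]), Sc.T @ yc)
    intercept = y_mean - S_mean @ beta
    return intercept, beta

# Simulated "short fat" data: n = 20 observations, p = 200 variables (hypothetical).
rng = np.random.default_rng(1)
n, p, K = 20, 200, 5
X = rng.normal(size=(n, p))
y = X[:, :10].mean(axis=1) + 0.1 * rng.normal(size=n)

labels = cluster_columns(X, K)          # step 1: group variables into blocks
S = block_means(X, labels)              # step 2: evaluate statistics on the blocks
intercept, beta = ridge_fit(S, y, 1.0)  # step 3: penalized regression on the statistics
```

Swapping the squared penalty for an absolute-value penalty would give a LASSO-type fit, which has no closed form and would need an iterative solver instead of np.linalg.solve.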


Keywords:  Bridge; Clustering; LASSO; Large p small n; Ridge; Summary statistics; Variance-bias tradeoff

Year:  2014        PMID: 25346546      PMCID: PMC4208697     

Source DB:  PubMed          Journal:  J Indian Soc Agric Stat        ISSN: 0019-6363


  2 in total

1.  QSAR with few compounds and many features.

Authors:  D M Hawkins; S C Basak; X Shi
Journal:  J Chem Inf Comput Sci       Date:  2001 May-Jun

2.  Penalized Regression and Risk Prediction in Genome-Wide Association Studies.

Authors:  Erin Austin; Wei Pan; Xiaotong Shen
Journal:  Stat Anal Data Min       Date:  2013-08-01       Impact factor: 1.051

  1 in total

1.  Collective feature selection to identify crucial epistatic variants.

Authors:  Shefali S Verma; Anastasia Lucas; Xinyuan Zhang; Yogasudha Veturi; Scott Dudek; Binglan Li; Ruowang Li; Ryan Urbanowicz; Jason H Moore; Dokyoon Kim; Marylyn D Ritchie
Journal:  BioData Min       Date:  2018-04-19       Impact factor: 2.522

