
Weighted Mean Squared Deviation Feature Screening for Binary Features.

Gaizhen Wang, Guoyu Guan.

Abstract

In this study, we propose a novel model-free feature screening method for ultrahigh dimensional binary features in binary classification, called weighted mean squared deviation (WMSD). Compared with the chi-square statistic and mutual information, WMSD gives binary features whose probabilities are near 0.5 a better chance of being selected. In addition, the asymptotic properties of the proposed method are investigated theoretically under the assumption log p = o(n). In practice, the number of features to retain is chosen by a Pearson correlation coefficient method based on the power-law distribution property. Finally, an empirical study of Chinese text classification shows that the proposed method performs well when the number of selected features is relatively small.
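The abstract does not state the WMSD formula, so the sketch below assumes one for illustration: a class-weighted squared deviation sum_k pi_k * (theta_k - theta)^2, where pi_k = P(Y = k), theta_k = P(X = 1 | Y = k) and theta = P(X = 1). This form is an assumption, not necessarily the authors' definition, and the function names are hypothetical. It is a convenient reference point because, for a binary feature, the Pearson chi-square statistic divided by n equals this same quantity divided by theta * (1 - theta), which makes the claim about features near 0.5 easy to check.

```python
import numpy as np


def weighted_squared_deviation(x, y):
    """Hypothetical WMSD-style score for a binary feature x and binary label y.

    ASSUMED form, not taken from the paper:
        sum_k pi_k * (theta_k - theta) ** 2
    with pi_k = P(Y = k), theta_k = P(X = 1 | Y = k), theta = P(X = 1),
    all estimated by sample proportions.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    theta = x.mean()
    score = 0.0
    for k in (0, 1):
        in_class = y == k
        pi_k = in_class.mean()
        if pi_k == 0:
            continue  # class absent from the sample: contributes nothing
        theta_k = x[in_class].mean()
        score += pi_k * (theta_k - theta) ** 2
    return score


def chi_square_per_sample(x, y):
    """Pearson chi-square statistic of the 2x2 table (x, y), divided by n."""
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x)
    chi2 = 0.0
    for k in (0, 1):
        for v in (0, 1):
            observed = np.sum((y == k) & (x == v))
            expected = np.sum(y == k) * np.sum(x == v) / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return chi2 / n


# Toy data: 10 samples per class.
y = np.array([0] * 10 + [1] * 10)
# Feature A: P(X=1) = 0.5, class-conditional rates 0.4 vs 0.6.
x_a = np.array([1] * 4 + [0] * 6 + [1] * 6 + [0] * 4)
# Feature B: P(X=1) = 0.1, class-conditional rates 0.0 vs 0.2.
x_b = np.array([0] * 10 + [1] * 2 + [0] * 8)

for name, x in (("A", x_a), ("B", x_b)):
    print(name, round(weighted_squared_deviation(x, y), 4),
          round(chi_square_per_sample(x, y), 4))
```

On this toy data both features have the same weighted deviation (0.01), while the chi-square scores are 0.04 for A and about 0.111 for B. Dividing by theta * (1 - theta), which peaks at theta = 0.5, pushes the chi-square ranking toward rare features, whereas the assumed unnormalized score does not. This matches the direction of the abstract's claim, although the paper's actual statistic may weight the terms differently.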


Keywords:  Chi-square statistic; Pearson correlation coefficient; feature screening; mutual information; power-law distribution; weighted mean squared deviation

Year:  2020        PMID: 33286109      PMCID: PMC7516793          DOI: 10.3390/e22030335

Source DB:  PubMed          Journal:  Entropy (Basel)        ISSN: 1099-4300            Impact factor:   2.524

