A variable selection method based on mutual information and variance inflation factor.

Jiehong Cheng, Jun Sun, Kunshan Yao, Min Xu, Yan Cao.

Abstract

Feature selection plays a vital role in the quantitative analysis of high-dimensional data by reducing dimensionality. Variable selection methods based on mutual information (MI) have recently attracted growing attention: they maximize the relevance between each candidate variable and the response while minimizing the redundancy among the selected variables. However, multicollinearity is often a serious problem in linear models, where it can cause unstable parameter estimates, unreliable models, and weak predictive ability. To address this problem, the variance inflation factor (VIF) was introduced into feature selection, and a variable selection method combining MI with VIF, called Mutual Information-Variance Inflation Factor (MI-VIF), is proposed in this paper. The MI between each independent variable and the response variable is calculated, and variables with greater MI are selected to maximize relevance to the response. The VIF among the independent variables is then calculated as a multicollinearity test, and variables that cause multicollinearity in the model are eliminated to minimize collinearity among the independent variables. The proposed method was tested on two high-dimensional spectral datasets. Regression models (PLSR, MLR) were built on the variables selected by MI-VIF and by MI-based methods (MIFS, MMIFS), and their prediction accuracy was compared. On both datasets, MI-VIF showed good prediction performance. On the tea dataset, the MI-VIF-MLR model achieved an Rp² of 0.8612 and an RMSEP of 0.4096, and the MI-VIF-PLSR model achieved an Rp² of 0.8614 and an RMSEP of 0.4092. On the diesel fuels dataset, the MI-VIF-MLR model achieved an Rp² of 0.9707 and an RMSEP of 0.6568, and the MI-VIF-PLSR model achieved an Rp² of 0.9431 and an RMSEP of 0.9675. In addition, MI-VIF was compared with the successive projections algorithm (SPA), a method that reduces collinearity between variables in wavelength selection for near-infrared spectra, and it also showed good predictive performance in that comparison. These results indicate that MI-VIF is an effective variable selection method.
Copyright © 2021. Published by Elsevier B.V.
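
The two-stage idea described in the abstract (rank variables by MI with the response, then reject variables that introduce multicollinearity) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the abstract does not specify the MI estimator, the VIF cutoff, or the order of the steps, so the equal-width histogram MI estimate, the greedy ranking loop, the cutoff of 10 (a common rule of thumb), and the names `mutual_information`, `vifs`, and `mi_vif_select` are all assumptions made for illustration. The sketch relies on the standard definition VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing variable j on the other variables; these values equal the diagonal of the inverse correlation matrix.

```python
import math
import random
from collections import Counter


def mutual_information(x, y, bins=8):
    """Histogram (equal-width bin) estimate of MI between x and y, in nats."""
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # constant column falls into one bin
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(x), discretize(y)
    n = len(x)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(c / n * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def correlation(u, v):
    """Pearson correlation; assumes neither column is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def vifs(columns):
    """VIF of each column, read off the diagonal of the inverse correlation matrix."""
    k = len(columns)
    if k == 1:
        return [1.0]
    # Correlation matrix augmented with the identity, inverted by Gauss-Jordan.
    m = [[correlation(columns[i], columns[j]) for j in range(k)]
         + [float(i == j) for j in range(k)] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return [math.inf] * k  # numerically singular: treat as collinear
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(k):
            if r != col:
                f = m[r][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][k + i] for i in range(k)]


def mi_vif_select(X, y, n_features, vif_threshold=10.0, bins=8):
    """Hypothetical greedy MI-then-VIF selection; returns chosen column indices."""
    columns = [list(col) for col in zip(*X)]
    ranked = sorted(range(len(columns)),
                    key=lambda j: mutual_information(columns[j], y, bins),
                    reverse=True)
    selected = []
    for j in ranked:
        if len(selected) == n_features:
            break
        trial = [columns[i] for i in selected + [j]]
        if max(vifs(trial)) <= vif_threshold:  # skip j if it inflates any VIF
            selected.append(j)
    return selected


# Synthetic demo (not the tea or diesel data): x1 is a near copy of x0,
# x2 carries independent signal, x3 is unrelated to y.
random.seed(0)
n = 500
x0 = [random.gauss(0, 1) for _ in range(n)]
x1 = [a + 0.01 * random.gauss(0, 1) for a in x0]
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [random.gauss(0, 1) for _ in range(n)]
y = [2 * a + 1.5 * c + 0.1 * random.gauss(0, 1) for a, c in zip(x0, x2)]
X = [list(row) for row in zip(x0, x1, x2, x3)]
selected = mi_vif_select(X, y, n_features=2)
print(selected)
```

In this toy setup the near-duplicate pair x0/x1 should contribute only one variable, because adding the second pushes the VIF far above the cutoff. For real spectra, a nearest-neighbor MI estimator such as scikit-learn's `mutual_info_regression` and statsmodels' `variance_inflation_factor` would likely be more suitable than the simple estimators used here.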

Keywords:  Mutual information; Spectrum; Variable selection; Variance inflation factor

Year:  2021        PMID: 34896682     DOI: 10.1016/j.saa.2021.120652

Source DB:  PubMed          Journal:  Spectrochim Acta A Mol Biomol Spectrosc        ISSN: 1386-1425            Impact factor:   4.098


  7 in total

1.  [Long short-term memory and Logistic regression for mortality risk prediction of intensive care unit patients with stroke].

Authors:  Y H Deng; Y Jiang; Z Y Wang; S Liu; Y X Wang; B H Liu
Journal:  Beijing Da Xue Xue Bao Yi Xue Ban       Date:  2022-06-18

2.  Interpretation of Discrepancies between Cities in the Transmission of COVID-19: Evidence from China in the First Weeks of the Pandemic.

Authors:  Zhao-Ge Liu; Xiang-Yang Li
Journal:  Int J Infect Dis       Date:  2022-03-04       Impact factor: 12.074

3.  Development of Simplified Models for Non-Destructive Hyperspectral Imaging Monitoring of S-ovalbumin Content in Eggs during Storage.

Authors:  Kunshan Yao; Jun Sun; Jiehong Cheng; Min Xu; Chen Chen; Xin Zhou; Chunxia Dai
Journal:  Foods       Date:  2022-07-08

4.  An Assessment Framework for the Training of General Practitioners and Specialists Based on EPAs.

Authors:  Shenshen Gao; Na Li; Xinqiong Wang; Yi Yu; Ren Zhao; Virgínia Trigo; Nelson Campos Ramalho
Journal:  Front Public Health       Date:  2022-07-07

5.  Analysis on Risk Characteristics of Traffic Accidents in Small-Spacing Expressway Interchange.

Authors:  Yanpeng Wang; Jin Xu; Xingliang Liu; Zhanji Zheng; Heshan Zhang; Chengyu Wang
Journal:  Int J Environ Res Public Health       Date:  2022-08-12       Impact factor: 4.614

6.  Comparative Analysis of Influencing Factors on Crash Severity between Super Multi-Lane and Traditional Multi-Lane Freeways Considering Spatial Heterogeneity.

Authors:  Junxiang Zhang; Bo Yu; Yuren Chen; You Kong; Jianqiang Gao
Journal:  Int J Environ Res Public Health       Date:  2022-10-06       Impact factor: 4.614

7.  Prediction for late-onset sepsis in preterm infants based on data from East China.

Authors:  Xianghua Shuai; Xiaoxia Li; Yiling Wu
Journal:  Front Pediatr       Date:  2022-09-14       Impact factor: 3.569
