| Literature DB >> 33162675 |
Qiyue Wang1, Yao Lu1, Xiaoke Zhang2, James Hahn1.
Abstract
Feature selection is a critical component of supervised learning for improving model performance. Searching for the optimal set of feature candidates can be NP-hard. With limited data, cross-validation is widely used to alleviate overfitting, but it suffers from high computational cost. We propose a novel feature-selection strategy that reduces the overfitting risk without cross-validation. Our method selects the optimal sub-interval, i.e., region of interest (ROI), of a functional feature for functional linear regression, where the response is a scalar and the predictor is a function. For each candidate sub-interval, we evaluate the overfitting risk by calculating the sample size necessary to achieve a pre-specified statistical power. Combining this with a model accuracy measure, we rank the sub-intervals and select the ROI. The proposed method is compared with other state-of-the-art feature selection methods on several reference datasets. The results show that our method achieves excellent prediction accuracy and substantially reduces computational cost.
Keywords: Feature Selection; Functional Data; Machine Learning
Year: 2020 PMID: 33162675 PMCID: PMC7641503 DOI: 10.1016/j.neucom.2020.10.009
Source DB: PubMed Journal: Neurocomputing ISSN: 0925-2312 Impact factor: 5.719
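The ranking idea described in the abstract can be sketched as follows. This is a hypothetical illustration, not the authors' exact procedure: for each candidate sub-interval, the curve is summarized by its mean over the interval, a simple linear fit yields an accuracy measure (R²), Cohen's f² gives an effect size, and a normal-approximation power formula gives a required sample size; intervals whose required sample size exceeds the available data are penalized. The function names, the summary statistic, and the power approximation are all assumptions made for this sketch.

```python
# Illustrative sketch of power-aware sub-interval (ROI) ranking for
# scalar-on-function regression. Not the published algorithm; all choices
# (mean summary, Cohen's f^2, normal-approximation sample size) are
# assumptions for demonstration.
import numpy as np
from scipy.stats import norm


def required_n(f2, alpha=0.05, power=0.8, n_predictors=1):
    """Approximate sample size for a regression test with effect size f^2
    (Cohen), via a normal approximation to the power requirement."""
    z = norm.ppf(1 - alpha) + norm.ppf(power)
    return int(np.ceil(z ** 2 / max(f2, 1e-12))) + n_predictors + 1


def rank_intervals(X, y, candidates):
    """X: (n_samples, n_timepoints) functional observations.
    candidates: list of (lo, hi) index pairs defining sub-intervals.
    Returns candidates sorted best-first by a score that favors high R^2
    and penalizes intervals whose required sample size exceeds n."""
    n = len(y)
    scored = []
    for lo, hi in candidates:
        feat = X[:, lo:hi].mean(axis=1)            # summarize the sub-interval
        A = np.column_stack([np.ones(n), feat])    # design matrix with intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
        f2 = r2 / max(1.0 - r2, 1e-12)             # Cohen's f^2 effect size
        n_req = required_n(f2)
        # Overfitting risk: discount the score when available n < required n.
        score = r2 if n >= n_req else r2 * n / n_req
        scored.append((score, (lo, hi)))
    return [interval for _, interval in sorted(scored, reverse=True)]
```

On synthetic data where the response depends only on one interval of the curve, the ranking recovers that interval first, since it alone yields a high R² with a small required sample size.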