Christina Brester1,2, Jussi Kauhanen3, Tomi-Pekka Tuomainen3, Sari Voutilainen3, Mauno Rönkkö1, Kimmo Ronkainen3, Eugene Semenkin2, Mikko Kolehmainen1.
Abstract
BACKGROUND: Redundant information is becoming a critical issue for epidemiologists, and high-dimensional datasets call for new, effective variable selection methods. This study implements an advanced evolutionary variable selection method and applies it to cardiovascular predictive modeling. Data from the epidemiological follow-up study KIHD (Kuopio Ischemic Heart Disease Risk Factor Study) were used to compare the designed method, which is based on an evolutionary search, with conventional stepwise selection. The sample contains 433 predictor variables in total and a response variable indicating incidents of cardiovascular disease for 1465 study subjects.
Keywords: Cardiovascular disease; Kuopio ischemic heart disease risk factor study; Predictive modeling; Variable selection
Year: 2018; PMID: 30127856; PMCID: PMC6092817; DOI: 10.1186/s13040-018-0180-x
Source DB: PubMed; Journal: BioData Min; ISSN: 1756-0381; Impact factor: 2.522
Fig. 1. The binary representation of a reduced variable set. A one corresponds to a variable that is present in the model input, and a zero corresponds to an ignored variable.
Fig. 2. The performance of cardiovascular predictive modeling in combination with variable selection. The figure shows boxplots that compare F-score values obtained with Logit and SVM models (polynomial kernel of degree 1.0, 1.5 and 2.0) without any variable selection, with stepwise selection and with evolutionary variable selection. Mean F-score values are marked with asterisks.
Fig. 3. The performance of cardiovascular predictive modeling in combination with variable selection. The figure shows boxplots of RMSE (root mean square error) values obtained with Logit and SVM models (polynomial kernel of degree 1.0, 1.5 and 2.0) without any variable selection, with stepwise selection and with evolutionary variable selection. Mean RMSE values are marked with asterisks.
Fig. 4. The list of variables whose MOGA ranks are higher than 0.95. The figure shows the ranks of the listed variables given by the MOGA, stepwise selection and Pearson correlation coefficients.
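
The binary encoding described in the Fig. 1 caption and the two evaluation measures named in the Fig. 2 and Fig. 3 captions can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`select_columns`, `f_score`, `rmse`) and the toy data are hypothetical, and the F-score is assumed here to be the F1 measure for the positive class, which the record above does not specify.

```python
import math

def select_columns(rows, mask):
    """Keep only the columns whose mask bit is 1 (Fig. 1 style encoding)."""
    keep = [i for i, bit in enumerate(mask) if bit == 1]
    return [[row[i] for i in keep] for row in rows]

def f_score(y_true, y_pred):
    """F1 for the positive class (assumed variant); labels are 0 or 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def rmse(y_true, y_score):
    """Root mean square error between labels and predicted scores."""
    squared = [(t - s) ** 2 for t, s in zip(y_true, y_score)]
    return math.sqrt(sum(squared) / len(squared))

# Toy data (hypothetical): two subjects, four predictor variables.
rows = [[5.1, 0.2, 7.0, 1.0],
        [4.9, 0.4, 6.5, 0.0]]
mask = [1, 0, 0, 1]  # first and fourth variables enter the model
reduced = select_columns(rows, mask)
```

In an evolutionary search of the kind the abstract describes, each candidate solution would be one such mask, and a measure like the F-score of a model trained on the reduced columns could serve as a fitness criterion; the exact criteria used in the study are not given in this record.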