Farideh Bagherzadeh-Khiabani (1), Azra Ramezankhani (1), Fereidoun Azizi (2), Farzad Hadaegh (1), Ewout W Steyerberg (3), Davood Khalili (4).
(1) Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Velenjak, 1985717413 Tehran, Iran.
(2) Endocrine Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Velenjak, 1985717413 Tehran, Iran.
(3) Department of Public Health, Erasmus MC, Rotterdam, The Netherlands.
(4) Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Velenjak, 1985717413 Tehran, Iran; Department of Biostatistics and Epidemiology, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Velenjak, 1985717413 Tehran, Iran. Electronic address: dkhalili@endocrine.ac.ir.
Abstract
OBJECTIVES: Identifying an appropriate set of predictors for the outcome of interest is a major challenge in clinical prediction research. The aim of this study was to demonstrate the application of variable selection methods commonly used in data mining to an epidemiological study. We introduce here a systematic approach.
STUDY DESIGN AND SETTING: The P-value-based method, usually used in epidemiological studies, and several filter and wrapper methods were implemented to select the predictors of diabetes among 55 variables in 803 prediabetic females, aged ≥ 20 years, followed for 10-12 years. To develop a logistic model, variables were selected on a training data set and evaluated on a test data set. The Akaike information criterion (AIC) and the area under the curve (AUC) were used as performance criteria. We also implemented a full model with all 55 variables.
RESULTS: The full model performed worst, and the models based on wrapper methods performed best. Among the filter methods, symmetrical uncertainty gave the best AUC and the best AIC.
CONCLUSION: Our experiment showed that the variable selection methods used in data mining could improve the performance of clinical prediction models. An R program was developed to make these methods more feasible and to visualize the results.
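To illustrate what a filter method such as symmetrical uncertainty computes, here is a minimal Python sketch. It is not the authors' R program. It assumes discrete (or already discretized) predictors and uses the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)). The helper names (`entropy`, `symmetrical_uncertainty`, `rank_predictors`) and the toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); ranges from 0 to 1."""
    h_x = entropy(x)
    h_y = entropy(y)
    if h_x + h_y == 0:
        # Both variables are constant, so there is nothing to share.
        return 0.0
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy    # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


def rank_predictors(columns, outcome):
    """Score every candidate predictor against the outcome, best first."""
    scores = {
        name: symmetrical_uncertainty(values, outcome)
        for name, values in columns.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (invented for illustration, not from the study).
outcome = [0, 0, 1, 1, 0, 0, 1, 1]
columns = {
    "copy_of_outcome": [0, 0, 1, 1, 0, 0, 1, 1],  # fully informative
    "unrelated": [0, 1, 0, 1, 0, 1, 0, 1],        # independent of outcome
}
ranking = rank_predictors(columns, outcome)
```

In a workflow like the one described, a ranking of this kind would be computed on the training data only. The top-ranked variables would then enter the logistic model, which is evaluated on the test data. Continuous predictors would need to be discretized before this score is computed.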
Authors: L Nelson Sanchez-Pinto; Laura Ruth Venable; John Fahrenbach; Matthew M Churpek Journal: Int J Med Inform Date: 2018-05-21 Impact factor: 4.046
Authors: Alena Kuhlemeier; Thomas Jaki; Elizabeth Y Jimenez; Alberta S Kong; Hope Gill; Chi Chang; Ken Resnicow; Dawn K Wilson; M Lee Van Horn Journal: J Behav Med Date: 2022-01-15
Authors: Stephanie Gillespie; Jacqueline Laures-Gore; Elliot Moore; Matthew Farina; Scott Russell; Benjamin Haaland Journal: J Speech Lang Hear Res Date: 2018-12-10 Impact factor: 2.297
Authors: Alena Kuhlemeier; Yasin Desai; Alexandra Tonigan; Katie Witkiewitz; Thomas Jaki; Yu-Yu Hsiao; Chi Chang; M Lee Van Horn Journal: J Consult Clin Psychol Date: 2021-04