Noora Kanerva (1, 2), Jukka Kontto (2), Maijaliisa Erkkola (3), Jaakko Nevalainen (4), Satu Männistö (2)
1. Department of Public Health, University of Helsinki, Finland.
2. Department of Public Health Solutions, National Institute for Health and Welfare, Finland.
3. Nutrition Unit, University of Helsinki, Finland.
4. School of Health Sciences, University of Tampere, Finland.
Abstract
AIMS: Factors that contribute to the development of overweight are numerous and form a complex structure with many unknown interactions and associations. We aimed to explore this structure (i.e. the mutual importance or hierarchy of sociodemographic and lifestyle-related risk factors of being overweight) using a machine-learning technique called random forest (RF). The results were compared with traditional logistic regression (LR) analysis.
METHODS: The cross-sectional FINRISK 2007 Study included 4757 Finns (aged 25-74 years). Information on participants' lifestyle and sociodemographic characteristics was collected with questionnaires. Diet was assessed using a validated food-frequency questionnaire. Height and weight were measured. Participants with a body mass index (BMI) ≥25 kg/m² were classified as overweight. R statistical software was used to run the RF analysis ('randomForest') to derive estimates for variable importance and out-of-bag error, which were compared with those from an LR model.
RESULTS: In total, 704 (32%) men and 1119 (44%) women had normal BMI, whereas 1502 (68%) men and 1432 (56%) women had BMI ≥25 kg/m². Estimated error rates for the models were similar (RF vs. LR: 42% vs. 40% for men, 38% vs. 35% for women). Both models ranked age, education and physical activity as the most important risk factors for being overweight, but RF ranked macronutrients (carbohydrates and protein) as more important than LR did.
CONCLUSIONS: RF did not demonstrate higher power in variable selection than LR in our study. The features of RF are more likely to appear beneficial in settings with a larger number of predictors.
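The out-of-bag (OOB) error named in METHODS can be illustrated with a small sketch. This is not the authors' analysis, which used the R package 'randomForest' on FINRISK data. It is a simplified stand-in under stated assumptions: the data are synthetic, the two predictors (age and an activity score) and the risk rule are made up, and each "tree" is a one-split decision stump on one randomly chosen feature rather than a full tree. All function names are hypothetical. The idea it shows is the general one: each observation is scored only by the trees whose bootstrap sample did not contain it.

```python
import random


def fit_stump(rows, labels, feature):
    """Pick the threshold and orientation on one feature with fewest in-bag errors."""
    best = None  # (errors, threshold, label_if_above)
    for threshold in sorted({r[feature] for r in rows}):
        for above in (0, 1):
            errors = sum(
                (above if r[feature] > threshold else 1 - above) != y
                for r, y in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, above)
    _, threshold, above = best
    return feature, threshold, above


def predict_stump(stump, row):
    """Return the class (0 or 1) that the stump assigns to a row."""
    feature, threshold, above = stump
    return above if row[feature] > threshold else 1 - above


def oob_error(rows, labels, n_trees=30, seed=0):
    """Bag stumps and score each observation only with trees that never saw it."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    votes = [[0, 0] for _ in range(n)]  # per observation: votes for class 0 and class 1
    for _ in range(n_trees):
        in_bag = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        feature = rng.randrange(n_features)  # one random candidate feature per stump
        stump = fit_stump(
            [rows[i] for i in in_bag], [labels[i] for i in in_bag], feature
        )
        for i in set(range(n)) - set(in_bag):  # out-of-bag observations for this stump
            votes[i][predict_stump(stump, rows[i])] += 1
    scored = [i for i in range(n) if sum(votes[i]) > 0]
    # Majority vote; ties are resolved in favour of class 0.
    wrong = sum((votes[i][1] > votes[i][0]) != bool(labels[i]) for i in scored)
    return wrong / len(scored)


def make_data(n=150, seed=1):
    """Synthetic data: made-up rule where risk rises with age and falls with activity."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        age = rng.uniform(25, 74)
        activity = rng.uniform(0, 10)
        p = 0.2 + 0.6 * (age - 25) / 49 - 0.02 * activity
        p = min(max(p, 0.0), 1.0)
        rows.append((age, activity))
        labels.append(1 if rng.random() < p else 0)
    return rows, labels


rows, labels = make_data()
print(f"OOB error estimate: {oob_error(rows, labels):.2f}")
```

Because the OOB estimate uses only predictions on observations left out of each bootstrap sample, it plays a role similar to a held-out test error without needing a separate validation set.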
Keywords:
Machine learning; mutual importance; obesity; random forest; risk factor