Sri Astuti Thamrin, Dian Sidik Arsyad, Hedi Kuswanto, Armin Lawi, Sudirman Nasir.
Abstract
Obesity is strongly associated with multiple risk factors and contributes significantly to an increased risk of chronic disease morbidity and mortality worldwide. Understanding the association between risk factors and the occurrence of obesity remains challenging. The traditional regression approach limits analysis to a small number of predictors and imposes assumptions of independence and linearity. Machine Learning (ML) methods offer an alternative approach to analyzing obesity data. This study assesses the ability of three ML methods, namely Logistic Regression, Classification and Regression Trees (CART), and Naïve Bayes, to identify the presence of obesity using publicly available health data, as an attempt to go beyond traditional prediction models, and compares the performance of the three methods. The main objective is to establish a set of risk factors for obesity in adults among the available study variables. We address data imbalance using the Synthetic Minority Oversampling Technique (SMOTE) when predicting obesity status based on the risk factors available in the dataset. The Logistic Regression method shows the highest performance. Nevertheless, kappa coefficients show only moderate concordance between predicted and measured obesity. Location, marital status, age group, education, sweet drinks, fatty/oily foods, grilled foods, preserved foods, seasoning powders, soft/carbonated drinks, alcoholic drinks, mental-emotional disorders, diagnosed hypertension, physical activity, smoking, and fruit and vegetable consumption are significant in predicting obesity status in adults.
Identifying these risk factors could inform health authorities in designing or modifying existing policies to better control chronic diseases, especially in relation to risk factors associated with obesity. Moreover, applying ML methods to publicly available health data, such as the Indonesian Basic Health Research (RISKESDAS), is a promising strategy to fill the gap for a more robust understanding of the associations of multiple risk factors in predicting health outcomes.
Keywords: Logistic Regression; Naive Bayes; classification; machine learning; obesity status
Year: 2021 PMID: 34235168 PMCID: PMC8255629 DOI: 10.3389/fnut.2021.669155
Source DB: PubMed Journal: Front Nutr ISSN: 2296-861X
General description of obesity data from Indonesian RISKESDAS 2018.
| Variable | Category | Frequency | Percentage (%) |
| --- | --- | --- | --- |
| Obesity status (Y) | Non-obese | 484,189 | 78.23 |
| | Obese | 134,709 | 21.77 |
| Location (X1) | Urban | 267,913 | 43.29 |
| | Rural | 350,985 | 56.71 |
| Marital status (X2) | Not married | 84,792 | 13.70 |
| | Married | 472,269 | 76.31 |
| | Divorced | 14,333 | 2.32 |
| | Widowed | 47,504 | 7.68 |
| Age groups (X3) | 18–24 years | 69,532 | 11.23 |
| | 25–29 years | 60,380 | 9.76 |
| | 30–34 years | 68,683 | 11.10 |
| | 35–39 years | 77,538 | 12.53 |
| | 40–44 years | 73,775 | 11.92 |
| | 45–49 years | 70,503 | 11.39 |
| | 50–54 years | 58,618 | 9.47 |
| | 55–59 years | 49,632 | 8.02 |
| | 60–64 years | 35,471 | 5.73 |
| | >64 years | 54,766 | 8.85 |
| Education (X4) | Not/Never schooled | 40,861 | 6.60 |
| | Not finished basic school | 84,637 | 13.68 |
| | Finished basic school | 157,391 | 25.43 |
| | Finished Junior High School | 104,435 | 16.87 |
| | Finished Senior High School | 170,246 | 27.51 |
| | Finished Academy/College | 20,005 | 3.23 |
| | Finished higher education | 41,323 | 6.68 |
| Work types (X5) | Not working | 171,984 | 27.79 |
| | School | 12,238 | 1.98 |
| | Government employee | 27,703 | 4.48 |
| | Private employee | 50,049 | 8.09 |
| | Entrepreneur | 91,011 | 14.71 |
| | Farmer | 163,009 | 26.34 |
| | Fisherman | 8,344 | 1.35 |
| | Daily waged labors | 52,379 | 8.46 |
| | Others | 42,181 | 6.82 |
| Sugary foods (X6) | >1 time per day | 82,775 | 13.37 |
| | 1 time per day | 125,754 | 20.32 |
| | 3–6 times per week | 138,685 | 22.41 |
| | 1–2 times per week | 177,173 | 28.63 |
| | <3 times per month | 62,972 | 10.17 |
| | Never | 31,539 | 5.10 |
| Sweet drinks (X7) | >1 time per day | 176,096 | 28.45 |
| | 1 time per day | 195,361 | 31.57 |
| | 3–6 times per week | 87,827 | 14.19 |
| | 1–2 times per week | 95,409 | 15.42 |
| | <3 times per month | 33,666 | 5.44 |
| | Never | 30,539 | 4.93 |
| Salty foods (X8) | >1 time per day | 64,660 | 10.45 |
| | 1 time per day | 78,744 | 12.72 |
| | 3–6 times per week | 105,363 | 17.02 |
| | 1–2 times per week | 170,442 | 27.54 |
| | <3 times per month | 107,318 | 17.34 |
| | Never | 92,371 | 14.93 |
| Fatty/Oily foods (X9) | >1 time per day | 103,634 | 16.74 |
| | 1 time per day | 113,057 | 18.27 |
| | 3–6 times per week | 133,552 | 21.58 |
| | 1–2 times per week | 164,703 | 26.61 |
| | <3 times per month | 72,739 | 11.75 |
| | Never | 31,213 | 5.04 |
| Grilled foods (X10) | >1 time per day | 12,948 | 2.09 |
| | 1 time per day | 22,189 | 3.59 |
| | 3–6 times per week | 63,967 | 10.34 |
| | 1–2 times per week | 161,356 | 26.07 |
| | <3 times per month | 202,251 | 32.68 |
| | Never | 156,187 | 25.24 |
| Preserved foods (X11) | >1 time per day | 6,310 | 1.02 |
| | 1 time per day | 12,024 | 1.94 |
| | 3–6 times per week | 31,993 | 5.17 |
| | 1–2 times per week | 72,618 | 11.73 |
| | <3 times per month | 145,068 | 23.44 |
| | Never | 350,885 | 56.70 |
| Seasonings powders (X12) | >1 time per day | 227,357 | 36.74 |
| | 1 time per day | 226,628 | 36.62 |
| | 3–6 times per week | 42,598 | 6.88 |
| | 1–2 times per week | 34,030 | 5.50 |
| | <3 times per month | 20,887 | 3.37 |
| | Never | 67,398 | 10.89 |
| Soft/Carbonated drinks (X13) | >1 time per day | 3,689 | 0.60 |
| | 1 time per day | 7,857 | 1.27 |
| | 3–6 times per week | 16,470 | 2.66 |
| | 1–2 times per week | 43,686 | 7.06 |
| | <3 times per month | 100,398 | 16.22 |
| | Never | 446,798 | 72.19 |
| Energy drinks (X14) | >1 time per day | 3,654 | 0.59 |
| | 1 time per day | 7,761 | 1.25 |
| | 3–6 times per week | 12,888 | 2.08 |
| | 1–2 times per week | 31,045 | 5.02 |
| | <3 times per month | 58,659 | 9.48 |
| | Never | 504,891 | 81.58 |
| Instant foods (X15) | >1 time per day | 12,144 | 1.96 |
| | 1 time per day | 28,943 | 4.68 |
| | 3–6 times per week | 108,287 | 17.50 |
| | 1–2 times per week | 220,125 | 35.57 |
| | <3 times per month | 149,066 | 24.09 |
| | Never | 100,333 | 16.21 |
| Alcoholic drinks (X16) | Yes | 30,240 | 4.89 |
| | No | 588,658 | 95.11 |
| Mental-emotional disorders (X17) | Yes | 61,092 | 9.87 |
| | No | 557,806 | 90.13 |
| Diagnosed hypertension (X18) | Yes | 55,640 | 8.99 |
| | No | 315,467 | 50.97 |
| | Unknown | 247,791 | 40.04 |
| Physical activity (X19) | Adequate | 73,736 | 11.91 |
| | Not adequate | 545,162 | 88.09 |
| Smoking (X20) | Yes | 233,306 | 37.70 |
| | No | 385,592 | 62.30 |
| Fruit and vegetables consumptions (X21) | Adequate | 29,321 | 4.74 |
| | Not adequate | 589,577 | 95.26 |
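The table shows a marked class imbalance: 21.77% of respondents are obese against 78.23% non-obese. SMOTE, which the abstract names as the remedy, creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a minimal pure-Python sketch on made-up numeric points. The function name `smote`, the toy data, and the parameter values are illustrative assumptions and not the authors' implementation; the categorical predictors listed above would additionally need an encoding or a categorical-aware variant.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: 6 minority points, to be balanced against 20 majority points.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.7), (1.3, 1.0)]
majority_size = 20

new_points = smote(minority, majority_size - len(minority), k=3)
balanced_minority = minority + new_points
print(len(balanced_minority), "minority samples after oversampling")
```

Because every synthetic point lies on a segment between two real minority points, the oversampled class stays inside the region already occupied by the minority data.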
Comparison of classification accuracy with 10-fold CV based on the obesity test data using three models with confusion matrix.
| Method | Class | Fold 1: Non-obese | Fold 1: Obese | Fold 2: Non-obese | Fold 2: Obese | Fold 3: Non-obese | Fold 3: Obese | Fold 4: Non-obese | Fold 4: Obese | Fold 5: Non-obese | Fold 5: Obese |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CART | Non-obese | 360,554 | 193,472 | 360,260 | 193,579 | 360,791 | 193,595 | 360,325 | 193,504 | 360,459 | 193,685 |
| | Obese | 75,298 | 291,411 | 75,283 | 291,744 | 75,227 | 291,362 | 75,294 | 291,335 | 75,401 | 291,611 |
| Naïve-Bayes | Non-obese | 314,384 | 141,264 | 313,957 | 141,209 | 314,357 | 141,167 | 314,080 | 141,106 | 314,273 | 141,413 |
| | Obese | 121,468 | 343,619 | 121,586 | 344,114 | 121,661 | 343,790 | 121,539 | 343,733 | 121,587 | 343,883 |
| Logistic Regression | Non-obese | 320,456 | 140,260 | 319,952 | 140,279 | 320,628 | 140,336 | 320,202 | 140,144 | 320,285 | 140,474 |
| | Obese | 115,396 | 344,623 | 115,591 | 345,044 | 115,390 | 344,621 | 115,417 | 344,695 | 115,575 | 344,822 |

| Method | Class | Fold 6: Non-obese | Fold 6: Obese | Fold 7: Non-obese | Fold 7: Obese | Fold 8: Non-obese | Fold 8: Obese | Fold 9: Non-obese | Fold 9: Obese | Fold 10: Non-obese | Fold 10: Obese |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CART | Non-obese | 360,531 | 193,271 | 360,426 | 193,360 | 360,177 | 193,275 | 360,566 | 193,586 | 360,411 | 193,430 |
| | Obese | 75,312 | 291,645 | 75,410 | 291,447 | 75,351 | 291,331 | 75,308 | 291,504 | 75,317 | 291,377 |
| Naïve-Bayes | Non-obese | 314,356 | 141,221 | 314,273 | 141,183 | 314,030 | 141,113 | 314,239 | 141,296 | 314,234 | 141,345 |
| | Obese | 121,487 | 343,695 | 121,563 | 343,624 | 121,498 | 343,493 | 121,635 | 343,794 | 121,494 | 343,462 |
| Logistic Regression | Non-obese | 320,479 | 140,281 | 320,423 | 140,220 | 320,206 | 140,253 | 320,464 | 140,277 | 320,355 | 140,328 |
| | Obese | 115,364 | 344,635 | 115,413 | 344,587 | 115,322 | 344,353 | 115,410 | 344,813 | 115,373 | 344,479 |
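Accuracy and Cohen's kappa follow directly from any one of these 2×2 matrices. As a worked example, the sketch below takes the first CART matrix in the table (360,554 and 193,472 in the Non-obese row; 75,298 and 291,411 in the Obese row). The helper name `accuracy_and_kappa` is an illustrative choice. Both quantities are unchanged if the matrix is transposed, so the calculation does not depend on whether rows hold predicted or observed classes.

```python
def accuracy_and_kappa(matrix):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of cases on the diagonal
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    # Chance agreement: product of matching row and column marginals
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# First CART matrix from the table above
cart_first = [
    [360554, 193472],
    [75298, 291411],
]

accuracy, kappa = accuracy_and_kappa(cart_first)
print(f"accuracy = {100 * accuracy:.2f}%, kappa = {100 * kappa:.2f}%")
```

The result, about 70.81% accuracy and a kappa of about 42.24%, coincides with the first and fourth values in the first CART row of the evaluation table. A kappa in this range is what the abstract describes as moderate concordance.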
Evaluation of classification prediction performance with 10-fold CV based on the obesity test data using 3 ML methods.
| Method | Fold | Accuracy | Sensitivity | Specificity | Precision | F1 | Kappa | AUC | F0.5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CART | 1-Fold | 70.81 | | 60.10 | 65.08 | | 42.24 | 74.57 | 67.98 |
| | 2-Fold | 70.80 | | 60.11 | 65.05 | | 42.24 | 74.56 | 67.95 |
| | 3-Fold | 70.81 | | 60.08 | 65.08 | | 42.25 | 74.56 | 67.98 |
| | 4-Fold | 70.80 | | 60.09 | 65.06 | | 42.22 | 74.55 | 67.96 |
| | 5-Fold | 70.79 | | 60.09 | 65.05 | | 42.21 | 74.54 | 67.95 |
| | 6-Fold | 70.83 | | 60.14 | 65.10 | | 42.28 | 74.55 | 68.00 |
| | 7-Fold | 70.81 | | 60.12 | 65.08 | | 42.24 | 74.55 | 67.98 |
| | 8-Fold | 70.81 | | 60.12 | 65.08 | | 42.24 | 74.56 | 67.97 |
| | 9-Fold | 70.80 | | 60.09 | 65.07 | | 42.23 | 74.56 | 67.97 |
| | 10-Fold | 70.81 | | 60.10 | 65.07 | | 42.24 | 74.54 | 67.97 |
| Naïve-Bayes | 1-Fold | 71.46 | 72.13 | 70.87 | 69.00 | 70.53 | 42.90 | 78.47 | 69.60 |
| | 2-Fold | 71.46 | 72.08 | 70.90 | 68.98 | 70.50 | 42.89 | 78.47 | 69.58 |
| | 3-Fold | 71.46 | 72.10 | 70.89 | 69.01 | 70.52 | 42.89 | 78.47 | 69.61 |
| | 4-Fold | 71.47 | 72.10 | 70.90 | 69.00 | 70.52 | 42.90 | 78.47 | 69.60 |
| | 5-Fold | 71.45 | 72.10 | 70.86 | 68.97 | 70.50 | 42.87 | 78.45 | 69.57 |
| | 6-Fold | 71.47 | 72.13 | 70.88 | 69.00 | 70.53 | 42.90 | 78.48 | 69.60 |
| | 7-Fold | 71.46 | 72.11 | 70.88 | 69.00 | 70.52 | 42.89 | 78.46 | 69.60 |
| | 8-Fold | 71.46 | 72.10 | 70.88 | 69.00 | 70.52 | 42.89 | 78.45 | 69.60 |
| | 9-Fold | 71.45 | 72.09 | 70.87 | 68.98 | 70.50 | 42.87 | 78.48 | 69.58 |
| | 10-Fold | 71.45 | 72.12 | 70.85 | 68.97 | 70.51 | 42.86 | 78.47 | 69.58 |
| Logistic Regression | 1-Fold | | 73.52 | | | 71.49 | | | |
| | 2-Fold | | 73.46 | | | 71.44 | | | |
| | 3-Fold | | 73.54 | | | 71.49 | | | |
| | 4-Fold | | 73.51 | | | 71.48 | | | |
| | 5-Fold | | 73.48 | | | 71.44 | | | |
| | 6-Fold | | 73.53 | | | 71.49 | | | |
| | 7-Fold | | 73.52 | | | 71.48 | | | |
| | 8-Fold | | 73.52 | | | 71.48 | | | |
| | 9-Fold | | 73.52 | | | 71.48 | | | |
| | 10-Fold | | 73.52 | | | 71.48 | | | |
Bold values indicate the aspects in which each ML method performed best.
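The class-wise measures in the evaluation table can be traced back to the confusion matrices. The sketch below uses the first Naïve-Bayes matrix (314,384 and 141,264 in the Non-obese row; 121,468 and 343,619 in the Obese row). It rests on two assumptions that are not stated in the text here: that rows hold predicted classes and columns hold observed classes, and that Non-obese is treated as the positive class. The function name `class_metrics` is illustrative.

```python
def class_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, precision and F1 from 2x2 confusion counts."""
    sensitivity = tp / (tp + fn)  # share of actual positives found
    specificity = tn / (tn + fp)  # share of actual negatives found
    precision = tp / (tp + fp)    # share of predicted positives that are correct
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# First Naive-Bayes matrix, read under the assumptions stated above:
# positive class = Non-obese, rows = predicted, columns = observed.
nb_first = class_metrics(tp=314384, fn=121468, fp=141264, tn=343619)
for name, value in nb_first.items():
    print(f"{name}: {100 * value:.2f}")
```

Under these assumptions the four values come out as 72.13, 70.87, 69.00 and 70.53, which are the second to fifth entries of the first Naïve-Bayes row. That agreement supports the assumed reading but does not prove it.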
Figure 1. AUC performance of the classification with 10-fold CV using the CART method.
Figure 3. AUC performance of the classification with 10-fold CV using the Logistic Regression method.
Estimates of the Logistic Regression parameters based on fold 6 of the 10-fold CV for the obesity dataset from the Indonesian RISKESDAS 2018 survey.
| Variable | Category | Estimate | Std. error | z value | p-value | Odds ratio |
| --- | --- | --- | --- | --- | --- | --- |
| Constant | | 6.510 | 0.046 | 142.754 | 0.000 | 671.976 |
| Location (X1) | Rural | −0.305 | 0.005 | −59.121 | 0.000 | 0.737 |
| Marital status (X2) | Married | −0.363 | 0.007 | −50.033 | 0.000 | 0.695 |
| | Divorced | 0.271 | 0.015 | 18.000 | 0.000 | 1.311 |
| | Widowed | 0.289 | 0.012 | 24.963 | 0.000 | 1.335 |
| Age groups (X3) | 25–29 years | 0.488 | 0.010 | 46.674 | 0.000 | 1.630 |
| | 30–34 years | 0.560 | 0.011 | 52.679 | 0.000 | 1.750 |
| | 35–39 years | 0.680 | 0.011 | 64.375 | 0.000 | 1.975 |
| | 40–44 years | 0.746 | 0.011 | 69.255 | 0.000 | 2.110 |
| | 45–49 years | 0.741 | 0.011 | 67.743 | 0.000 | 2.097 |
| | 50–54 years | 0.549 | 0.012 | 46.783 | 0.000 | 1.731 |
| | 55–59 years | 0.333 | 0.013 | 26.349 | 0.000 | 1.396 |
| | 60–64 years | 0.304 | 0.014 | 21.859 | 0.000 | 1.355 |
| | >64 years | −0.457 | 0.014 | −32.580 | 0.000 | 0.633 |
| Education (X4) | Not finished basic school | 0.313 | 0.013 | 24.156 | 0.000 | 1.367 |
| | Finished basic school | 0.361 | 0.012 | 29.692 | 0.000 | 1.435 |
| | Finished Junior High School | 0.456 | 0.013 | 35.808 | 0.000 | 1.577 |
| | Finished Senior High School | 0.469 | 0.012 | 38.083 | 0.000 | 1.598 |
| | Finished Academy/College | 0.502 | 0.018 | 28.496 | 0.000 | 1.652 |
| | Finished higher education | 0.506 | 0.015 | 33.432 | 0.000 | 1.659 |
| Work types (X5) | School | −0.356 | 0.018 | −19.850 | 0.000 | 0.700 |
| | Government employee | 0.197 | 0.013 | 15.224 | 0.000 | 1.218 |
| | Private employee | −0.117 | 0.010 | −12.055 | 0.000 | 0.889 |
| | Entrepreneur | 0.069 | 0.008 | 8.797 | 0.000 | 1.072 |
| | Farmer | −0.548 | 0.007 | −74.090 | 0.000 | 0.578 |
| | Fisherman | −0.838 | 0.024 | −35.437 | 0.000 | 0.432 |
| | Daily waged labors | −0.389 | 0.010 | −39.463 | 0.000 | 0.678 |
| | Others | 0.010 | 0.010 | 0.987 | 0.324 | 1.010 |
| Sugary foods (X6) | 1 time per day | −0.135 | 0.009 | −15.096 | 0.000 | 0.874 |
| | 3–6 times per week | −0.141 | 0.009 | −15.938 | 0.000 | 0.869 |
| | 1–2 times per week | −0.158 | 0.009 | −18.457 | 0.000 | 0.854 |
| | <3 times per month | 0.013 | 0.011 | 1.189 | 0.234 | 1.013 |
| | Never | −0.101 | 0.014 | −7.308 | 0.000 | 0.904 |
| Sweet drinks (X7) | 1 time per day | 0.094 | 0.007 | 13.815 | 0.000 | 1.099 |
| | 3–6 times per week | 0.148 | 0.008 | 17.454 | 0.000 | 1.159 |
| | 1–2 times per week | 0.189 | 0.008 | 22.735 | 0.000 | 1.208 |
| | <3 times per month | 0.313 | 0.012 | 26.572 | 0.000 | 1.368 |
| | Never | 0.297 | 0.013 | 23.106 | 0.000 | 1.346 |
| Salty foods (X8) | 1 time per day | 0.070 | 0.010 | 6.824 | 0.000 | 1.073 |
| | 3–6 times per week | −0.077 | 0.010 | −7.773 | 0.000 | 0.926 |
| | 1–2 times per week | −0.113 | 0.009 | −12.268 | 0.000 | 0.893 |
| | <3 times per month | −0.056 | 0.010 | −5.640 | 0.000 | 0.946 |
| | Never | −0.016 | 0.010 | −1.568 | 0.117 | 0.984 |
| Fatty/Oily foods (X9) | 1 time per day | −0.092 | 0.009 | −10.707 | 0.000 | 0.913 |
| | 3–6 times per week | −0.158 | 0.008 | −19.229 | 0.000 | 0.854 |
| | 1–2 times per week | −0.165 | 0.008 | −20.722 | 0.000 | 0.848 |
| | <3 times per month | −0.184 | 0.010 | −18.937 | 0.000 | 0.832 |
| | Never | −0.495 | 0.014 | −35.457 | 0.000 | 0.609 |
| Grilled foods (X10) | 1 time per day | −0.184 | 0.019 | −9.749 | 0.000 | 0.832 |
| | 3–6 times per week | −0.311 | 0.016 | −18.881 | 0.000 | 0.733 |
| | 1–2 times per week | −0.419 | 0.016 | −26.825 | 0.000 | 0.658 |
| | <3 times per month | −0.430 | 0.016 | −27.690 | 0.000 | 0.651 |
| | Never | −0.452 | 0.016 | −28.697 | 0.000 | 0.636 |
| Preserved foods (X11) | 1 time per day | −0.465 | 0.025 | −18.674 | 0.000 | 0.628 |
| | 3–6 times per week | −0.550 | 0.022 | −25.115 | 0.000 | 0.577 |
| | 1–2 times per week | −0.597 | 0.021 | −28.800 | 0.000 | 0.551 |
| | <3 times per month | −0.694 | 0.020 | −34.273 | 0.000 | 0.499 |
| | Never | −0.856 | 0.020 | −42.964 | 0.000 | 0.425 |
| Seasonings powders (X12) | 1 time per day | 0.117 | 0.006 | 19.308 | 0.000 | 1.124 |
| | 3–6 times per week | 0.276 | 0.010 | 27.709 | 0.000 | 1.318 |
| | 1–2 times per week | 0.229 | 0.011 | 20.837 | 0.000 | 1.257 |
| | <3 times per month | 0.582 | 0.013 | 46.073 | 0.000 | 1.789 |
| | Never | 0.399 | 0.008 | 47.027 | 0.000 | 1.491 |
| Soft/Carbonated drinks (X13) | 1 time per day | 0.313 | 0.032 | 9.805 | 0.000 | 1.368 |
| | 3–6 times per week | 0.156 | 0.029 | 5.284 | 0.000 | 1.169 |
| | 1–2 times per week | 0.073 | 0.028 | 2.621 | 0.009 | 1.076 |
| | <3 times per month | −0.158 | 0.027 | −5.753 | 0.000 | 0.854 |
| | Never | −0.457 | 0.027 | −16.900 | 0.000 | 0.633 |
| Energy drinks (X14) | 1 time per day | 0.046 | 0.031 | 1.476 | 0.140 | 1.047 |
| | 3–6 times per week | 0.020 | 0.029 | 0.681 | 0.496 | 1.020 |
| | 1–2 times per week | −0.032 | 0.027 | −1.185 | 0.236 | 0.968 |
| | <3 times per month | −0.095 | 0.027 | −3.549 | 0.000 | 0.909 |
| | Never | −0.713 | 0.026 | −27.394 | 0.000 | 0.490 |
| Instant foods (X15) | 1 time per day | 0.010 | 0.019 | 0.512 | 0.609 | 1.010 |
| | 3–6 times per week | 0.048 | 0.017 | 2.767 | 0.006 | 1.049 |
| | 1–2 times per week | −0.063 | 0.017 | −3.710 | 0.000 | 0.939 |
| | <3 times per month | 0.084 | 0.017 | 4.901 | 0.000 | 1.088 |
| | Never | −0.009 | 0.018 | −0.533 | 0.594 | 0.991 |
| Alcoholic drinks (X16) | No | −1.576 | 0.008 | −190.048 | 0.000 | 0.207 |
| Mental-emotional disorders (X17) | No | −1.029 | 0.007 | −150.755 | 0.000 | 0.357 |
| Diagnosed hypertension (X18) | No | −0.867 | 0.009 | −100.728 | 0.000 | 0.420 |
| | Unknown | −0.982 | 0.009 | −110.600 | 0.000 | 0.375 |
| Physical activity (X19) | Not adequate | −0.852 | 0.007 | −128.275 | 0.000 | 0.427 |
| Smoking (X20) | No | 0.219 | 0.005 | 41.165 | 0.000 | 1.244 |
| Fruit and vegetables consumptions (X21) | Not adequate | −1.248 | 0.009 | −135.504 | 0.000 | 0.287 |
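The last numeric column of the table is consistent with the exponentiated coefficient, that is, the odds ratio of each category relative to the category omitted from the table for that variable (for example Urban for location). The short check below recomputes it for a few rows. The row selection and variable names are illustrative, and small differences in the third decimal are expected because the published coefficients are themselves rounded.

```python
import math

# (label, coefficient, value in the last column), copied from the table above
rows = [
    ("Location: Rural", -0.305, 0.737),
    ("Age group: 40-44 years", 0.746, 2.110),
    ("Alcoholic drinks: No", -1.576, 0.207),
    ("Smoking: No", 0.219, 1.244),
]

# Odds ratio = exp(coefficient) in a logistic regression model
odds_ratios = {label: math.exp(coef) for label, coef, _ in rows}

for label, coef, reported in rows:
    print(f"{label}: exp({coef}) = {odds_ratios[label]:.3f} (table: {reported})")
```

A value below 1 means lower odds of the modelled outcome than in the reference category, and a value above 1 means higher odds. Which obesity class the model treats as the outcome is not stated in this excerpt, so the direction of each effect should be read with that caveat.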
Figure 4. Obesity data classification tree for fold 6 of the 10-fold CV for the CART model, based on the variables alcoholic drinks (X16), energy drinks (X14), soft/carbonated drinks (X13), mental-emotional disorders (X17), fruit and vegetables consumptions (X21), diagnosed hypertension (X18), physical activity (X19), and marital status (X2).