Worrawat Engchuan, Alexandros C Dimopoulos, Stefanos Tyrovolas, Francisco Félix Caballero, Albert Sanchez-Niubo, Holger Arndt, Jose Luis Ayuso-Mateos, Josep Maria Haro, Somnath Chatterji, Demosthenes B Panagiotakos.
Abstract
BACKGROUND: Studies on the effects of sociodemographic factors on health in aging now include the use of statistical models and machine learning. The aim of this study was to evaluate the determinants of health in aging using machine learning methods and to compare their accuracy with that of traditional methods.

MATERIAL AND METHODS: The health status of 6,209 adults, aged <65 years (n=1,585), 65–79 years (n=3,267), and >80 years (n=1,357), was measured using an established health metric (0-100) that incorporated physical function and activities of daily living (ADL). Data from the English Longitudinal Study of Ageing (ELSA) included socio-economic and sociodemographic characteristics and history of falls. Health-trend and personal-fitted variables were generated as predictors of the health metric. Three methods were compared: random forest (RF), deep learning (DL), and the linear model (LM). The percentage increase in mean squared error (%incMSE) when a variable was removed from the model was used as a measure of the importance of that predictive variable.

RESULTS: Health-trend, physical activity, and personal-fitted variables were the main predictors of health, with %incMSE values of 85.76%, 63.40%, and 46.71%, respectively. Age, employment status, alcohol consumption, and household income had %incMSE values of 20.40%, 20.10%, 16.94%, and 13.61%, respectively. Performance of the RF method was similar to that of the traditional LM (p=0.7), but RF significantly outperformed DL (p=0.006).

CONCLUSIONS: Machine learning methods can be used to evaluate multidimensional longitudinal health data and may provide accurate results with fewer requirements when compared with traditional statistical modeling.
Year: 2019 PMID: 30879019 PMCID: PMC6436225 DOI: 10.12659/MSM.913283
Source DB: PubMed Journal: Med Sci Monit ISSN: 1234-1010
Figure 1. Global health metric distribution stratified by each value of each predictor. The boxplots show the distribution of the health metric stratified by the unique values of each predictor. Differences in the distribution across the values of a predictor suggest a relationship between that predictor and the health metric.
Figure 2. Parameter optimization by grid search. The left panel shows how the mean squared error (MSE) changes with the number of trees used to build the random forest model (ntree). The right panel shows how the MSE changes with the number of predictors randomly picked at each branch of the tree (mtry). The optimal parameters (ntree=500, mtry=15) were used in the final model.
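The grid search described in this caption can be sketched in a few lines. The following is a minimal illustration on synthetic data, not the authors' pipeline: the "forest" is a bagged ensemble of one-split regression trees, the data, the grid values, and every function name (`fit_stump`, `fit_forest`, `grid_search`) are assumptions made for the example, and only the roles of `ntree` and `mtry` follow the caption.

```python
import itertools
import random

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def fit_stump(X, y, features):
    """Fit a one-split regression tree using only the given feature indices."""
    mean_y = sum(y) / len(y)
    # Fallback: a stump that always predicts the mean (threshold = +inf).
    best = (float("inf"), features[0], float("inf"), mean_y, mean_y)
    for f in features:
        for thr in sorted({row[f] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[f] <= thr]
            right = [yi for row, yi in zip(X, y) if row[f] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if sse < best[0]:
                best = (sse, f, thr, lm, rm)
    return best[1:]

def fit_forest(X, y, ntree, mtry, rng):
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(ntree):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample of rows
        feats = rng.sample(range(p), mtry)           # mtry predictors tried at the split
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, row):
    # Average the predictions of all stumps.
    return sum(lm if row[f] <= thr else rm for f, thr, lm, rm in forest) / len(forest)

def grid_search(X_tr, y_tr, X_va, y_va, ntrees, mtrys, seed=0):
    """Return the (ntree, mtry) pair with the lowest validation MSE, plus all scores."""
    results = {}
    for ntree, mtry in itertools.product(ntrees, mtrys):
        forest = fit_forest(X_tr, y_tr, ntree, mtry, random.Random(seed))
        results[(ntree, mtry)] = mse(y_va, [predict(forest, r) for r in X_va])
    return min(results, key=results.get), results

# Synthetic data (hypothetical): 4 predictors, only the first two matter.
rng = random.Random(42)
X = [[rng.random() for _ in range(4)] for _ in range(90)]
y = [10 * r[0] + 5 * r[1] + rng.gauss(0, 0.5) for r in X]

best, results = grid_search(X[:60], y[:60], X[60:], y[60:],
                            ntrees=[5, 10, 25], mtrys=[1, 2, 4])
print("best (ntree, mtry):", best, "validation MSE:", round(results[best], 3))
```

Plotting `results` against each parameter in turn would give panels of the kind the caption describes. A full random forest grows deep trees and redraws the `mtry` subset at every split, which this one-split version does only once per tree.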
Figure 3. Contribution of historical, personal-fitted, and health trend features. The scatter plots (A–C) illustrate the relationship between the health metric and the personal-fitted predictor (A), the health trend predictor (B), and the prediction from 11 predictors (C). The boxplot (D) shows the squared errors (SE) of the predictions from 11 predictors and from all predictors.
Figure 4. Performance comparison between three prediction models and random prediction. (A) Box plots show the distribution of squared errors of the random prediction model, deep learning (DL), the linear model (LM), and the random forest (RF) model. (B) A magnified version of (A) shows the differences among DL, LM, and RF. Student’s t-test was used to calculate the P-values.
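As a rough illustration of comparing two models by their squared errors with Student's t-test, the sketch below computes the pooled-variance two-sample t statistic on synthetic errors. The record does not say whether the test was paired or unpaired, so the independent two-sample form is an assumption, as are the data and the helper names `student_t` and `approx_p`. The P-value here uses a normal approximation that is reasonable only for large samples; an exact value needs the t distribution (for example `scipy.stats.ttest_ind`).

```python
import math
import random
from statistics import NormalDist, mean, variance

def student_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

def approx_p(t):
    """Two-sided P-value from a normal approximation (large samples only)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

# Synthetic squared errors (hypothetical): model A has smaller errors than model B.
rng = random.Random(1)
se_a = [rng.gauss(0, 1) ** 2 for _ in range(1000)]
se_b = [rng.gauss(0, 2) ** 2 for _ in range(1000)]

t_stat, df = student_t(se_a, se_b)
p_value = approx_p(t_stat)
print("t =", round(t_stat, 2), "df =", df, "approx. P =", p_value)
```

Squared errors are right-skewed, so with small samples a paired or rank-based test may be a safer choice than this approximation.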
Summarized %incMSE and coefficients by predictors.
| Predictors | %incMSE | Standardized coefficients |
|---|---|---|
| Health trend (health metric estimated from the 4 previous health metrics) | 85.76 | 8.13 |
| Physical activity (active | 63.40 | 3.30 |
| Personal-fitted variable (health metric estimated by 11 socio-demographics and fall history of previous 4 waves) | 46.71 | 1.54 |
| Age groups (<65 | 20.40 | −0.61 |
| Employment (in work | 20.10 | 0.51 |
| Alcohol consumption | 16.94 | 0.74 |
| Quantiles Household Wealth (Q1–Q5) | 13.61 | 0.46 |
| Social network size (<5 | 7.16 | 1.22 |
| Falls (have fall history | 5.30 | −0.60 |
| Marital status (married | 5.29 | −0.03 |
| Smoke (never | 4.54 | −0.19 |
| Sex (males | 4.35 | 0.32 |
| Education (no qualification | 2.86 | 0.05 |
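To make the %incMSE column concrete, the sketch below computes a percentage increase in MSE for each predictor of a toy model. It uses permutation (shuffling one predictor's values) rather than literally removing the variable, which is a swapped-in technique and may differ from the authors' procedure. The exact normalisation in their software is not stated in this record, so the formula takes the name literally. The data, the fixed model `toy_model`, and the helper `pct_inc_mse` are all hypothetical.

```python
import random

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def pct_inc_mse(predict, X, y, col, rng):
    """Percentage increase in MSE after shuffling the values of one predictor."""
    base = mse(y, [predict(row) for row in X])
    shuffled = [row[col] for row in X]
    rng.shuffle(shuffled)
    # Rebuild the rows with the shuffled column, leaving other columns intact.
    X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
    permuted = mse(y, [predict(row) for row in X_perm])
    return 100.0 * (permuted - base) / base

# Synthetic data (hypothetical): strong, weak, and irrelevant predictors.
rng = random.Random(0)
X = [[rng.random(), rng.random(), rng.random()] for _ in range(500)]
y = [3.0 * r[0] + 0.5 * r[1] + rng.gauss(0, 0.1) for r in X]

def toy_model(row):
    # Stand-in for a fitted model; it ignores the third predictor entirely.
    return 3.0 * row[0] + 0.5 * row[1]

importance = [pct_inc_mse(toy_model, X, y, c, rng) for c in range(3)]
for name, value in zip(["strong", "weak", "irrelevant"], importance):
    print(f"{name:>10}: {value:.1f}")
```

The ranking, not the absolute size, is the useful output: a predictor the model relies on heavily produces a large increase, while one the model ignores produces none.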