
Can the Random Forests Model Improve the Power to Predict the Intention of the Elderly in a Community to Participate in a Cognitive Health Promotion Program?

Haewon Byeon

Abstract

BACKGROUND: We aimed to develop a model predicting the participation of the elderly in a cognitive health program using the random forest algorithm and to present baseline information for enhancing cognitive health.
METHODS: This study analyzed the raw data of the Seoul Welfare Panel Study (SWPS) (20), a survey of Seoul residents conducted by the Seoul Welfare Foundation from Jun 1st to Aug 31st, 2015. Subjects were 2,111 community-dwelling persons (879 men and 1,232 women) aged 60 yr and older who had not been diagnosed with dementia. The outcome variable was the intention to participate in a cognitive health promotion program. A prediction model was developed using random forests, and its results were compared with those of a decision tree analysis based on classification and regression tree (CART).
RESULTS: The random forests model identified education level, subjective health, subjective friendship, subjective family bond, mean monthly family income, age, smoking, living with a spouse or not, depression history, drinking, and regular exercise as the major variables. On the test data, the accuracy of the random forests was 72.3% and that of the CART model was 70.9%.
CONCLUSION: Building on the developed model for predicting participation in a cognitive health promotion program, a customized health promotion program that considers the characteristics of its subjects should be developed so that such programs can be implemented effectively.
Copyright © 2021 Byeon et al. Published by Tehran University of Medical Sciences.

Keywords:  Cognitive health promotion program; Decision tree; Prediction model; Random forest

Year:  2021        PMID: 33747995      PMCID: PMC7956102          DOI: 10.18502/ijph.v50i2.5346

Source DB:  PubMed          Journal:  Iran J Public Health        ISSN: 2251-6085            Impact factor:   1.429


Introduction

Medical technology has advanced greatly, and this advancement has extended the average life span. Consequently, old age makes up a growing proportion of the life span worldwide. South Korea in particular has experienced exceptionally rapid aging (1). For example, it became an aged society in 2017, when the elderly exceeded 14% of the population, and it is expected to become a super-aged society in 2026 (1). Despite this rapid aging, the term “cognitive health” was, compared with physical health, not well known in South Korea until the early 21st century. Moreover, the deterioration of cognitive abilities, such as the loss of memory, language ability, and visuospatial function, received much less attention than physical diseases such as musculoskeletal disease (2). However, studies over the past 20 yr have reported that cognitive impairment in old age, such as dementia, may be preventable by modifying risk factors such as health behaviors (3, 4). As a result, more attention has been given to cognitive health promotion programs that maintain cognitive health and prevent dementia before it occurs (5). In particular, dementia in the elderly is one of the four major causes of death (i.e., heart disease, cancer, stroke, and dementia), and its prevalence has been reported to exceed 10% among elderly people (≥65 yr old) (6). The surge of cognitive impairment in the elderly has become a serious problem at both the individual and the social level. People in old age can ultimately experience lypophrenia because of a crisis in self-identity caused by anxiety, conflicts, and a sense of loss, which result from reduced social activities and from tangible and intangible losses (7). Consequently, in extreme cases the elderly can experience depression along with gloom, isolation, and abiotrophy (7).
Depression in the elderly is often left unattended because they are not aware that they are experiencing depression or, even when they are aware, they rarely express their feelings directly (8, 9). Elderly depression is highly likely to be prolonged or to recur (10), and chronic depression is a known risk factor for dementia (11). Therefore, it is necessary to develop and apply community-level intervention programs that can delay the deterioration of cognitive function or improve it in order to prevent dementia, which is a major hindrance to individual life and a serious social problem. Since 2009, the Ministry of Health and Welfare of South Korea has operated “the Dementia Early Detection and Cognitive Health Promotion Program for the Elderly Living in a Local Community” in local community centers in order to prevent the deterioration of mental function in the elderly and to provide self-management methods for mental health disorders such as depression, stress, and dementia. Gender, social activities, past counseling experience, educational level, and religion have been reported as major factors related to the intention to participate in cognitive health promotion programs for the elderly. However, these previous studies (12, 13) were conducted only among the elderly who visited community centers (14) and examined only individual factors such as health behaviors using a general linear model (15). Only a few studies have predicted the intention to participate in programs for preventing cognitive disorders or identified multiple risk factors. Furthermore, previous studies on dementia care programs have mainly relied on surveys (16), qualitative studies of dementia care services (17), and reviews of health promotion (18). Few studies have developed a prediction model using data mining techniques such as artificial neural networks or decision tree models (19).
To the best of our knowledge, no existing model predicts the intention to participate in a cognitive health promotion program by applying a data mining algorithm to epidemiological data representing the elderly population of a local community. In summary, because cognitive health problems in old age directly affect the lives of the elderly, the factors influencing the intention to participate in a cognitive health promotion program must be identified in order to maintain cognitive health and prevent dementia. In particular, a prediction model is needed that jointly considers various factors such as sociological factors, social environment, health level, and health behavior. We aimed to develop a model predicting the participation of the elderly in a cognitive health program using the random forest algorithm and to present baseline information for enhancing cognitive health.

Materials and Methods

Subjects

This study analyzed the raw data of the Seoul Welfare Panel Study (SWPS) (20), a survey of Seoul residents conducted by the Seoul Welfare Foundation from Jun 1st to Aug 31st, 2015. This study was approved by the institutional review board of Honam University (No. 1041223-201812-HR-26). SWPS was carried out to understand the welfare level of households living in Seoul, to identify the actual status of socially vulnerable classes, and to estimate the demand for welfare services. The study population was the households in Seoul among the target households of the 2009 Population and Housing Census. Stratified cluster sampling was used to choose samples from the 25 districts of Seoul. Major survey items were income, economic level, health, living conditions, and demand for welfare services. The survey was conducted using computer-assisted personal interviewing, in which an interviewer visits each surveyed household and enters the responses to a structured questionnaire on a portable computer. Of the 7,761 people surveyed, we excluded 5,650 who were younger than 60 yr old and analyzed the remaining 2,111 community-dwelling people (879 men and 1,232 women), none of whom had been diagnosed with dementia.

Measurement of Variables

The outcome variable was the intention to participate in a cognitive health promotion program (yes or no). The explanatory variables included age (60–69 or 70+), gender, the highest level of education (below elementary school, middle school, high school, or college graduate or above), current economic activity (yes or no), mean monthly family income (<2 million KRW, 2–4 million KRW, or ≥4 million KRW), living with a spouse (married and living together, married but not living together, or no spouse), smoking (non-smoker, smoker in the past, or current smoker), drinking (non-drinker or drinker), subjective health status (good, average, or bad), regular exercise (no or yes), subjective family bond (good, average, or bad), friendship (good, average, or bad), and presence of a depression symptom in the past one month (no or yes). In this study, following the WHO standards, a current smoker is defined as a person who smokes one cigarette per day or smokes from time to time and who has smoked more than five packs (100 cigarettes) in their lifetime. A past smoker is defined as a person who smoked in the past but no longer smokes, while a non-smoker is defined as a person who has never smoked or has smoked fewer than five packs in their lifetime. The short form of the Geriatric Depression Scale (GDS-SF) (21) was used to measure depression. It is composed of 15 items (yes=1 and no=2), and a score over 5 points was defined as depression.

Analysis Method

Development and evaluation of a prediction model

The factors potentially associated with the intention to participate in the cognitive health promotion program were analyzed using the chi-square test. A random forest-based prediction model was developed by dividing the data into training data (70%) and test data (30%). The results of the developed model were compared with those of a decision tree analysis based on classification and regression tree (CART). All statistical analyses were conducted using R (version 3.5.0), and statistical significance was determined at alpha=0.05 using a two-tailed test. The prediction performance of the developed model was evaluated using a confusion matrix, and the importance of variables and the derived main risk factors were compared (22).
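The evaluation procedure described above (a 70/30 split followed by accuracy computed from a confusion matrix) can be sketched as follows. This is a minimal illustration in Python, not the R code used in the study; the function names, the random seed, and the five example predictions are assumptions made for the example.

```python
import random

def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of rows reproducibly, then cut it into training and test parts."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def confusion_matrix(actual, predicted, labels=("yes", "no")):
    """Count every (actual, predicted) pair for the given class labels."""
    counts = {(a, p): 0 for a in labels for p in labels}
    for a, p in zip(actual, predicted):
        counts[(a, p)] += 1
    return counts

def accuracy(matrix):
    """Share of cases on the diagonal of the confusion matrix."""
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / sum(matrix.values())

# 2,111 subjects split 70/30 (subject IDs stand in for the survey records).
train, test = split_train_test(range(2111))
print(len(train), len(test))  # 1478 633

# Hypothetical predictions for five test subjects (not study data).
actual = ["yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "yes", "no", "no"]
matrix = confusion_matrix(actual, predicted)
print(accuracy(matrix))  # 0.6
```

Accuracy here is the number of correctly classified cases divided by all cases, which is how the percentages reported for the training and test data are usually read.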

Random Forest

The random forest is an ensemble classifier that trains a number of decision trees with randomization (Fig. 1). The model consists of a training step and a test step. The training step builds many decision trees, while the test step classifies and predicts an input vector (23). The random forest resembles bagging in that it enhances stability by combining, under a majority rule, decision trees produced from many bootstrap samples; it differs from bagging in that it uses a small number of randomly selected explanatory variables for each bootstrap sample. The random forest randomly draws several explanatory variables for the bootstrap samples and builds decision trees from them in order to control the correlation among the combined trees. In addition, it grows the trees with as little pruning as possible. The advantages of the random forest are stronger predictive power than a single decision tree and resistance to over-fitting (24).
Fig. 1: Concept of random forest algorithm
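The training and test steps of a random forest can be illustrated with a small Python sketch: each tree is grown without pruning on a bootstrap sample, only a random subset of the explanatory variables is considered at each split, and the forest predicts by majority vote. This is a sketch under stated assumptions, not the study's implementation (which used R); the toy data, the function names, and the parameters `n_trees` and `mtry` are hypothetical, and class labels are assumed to be strings.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most frequent label."""
    return Counter(labels).most_common(1)[0][0]

def grow_tree(rows, n_features, mtry, rng):
    """Grow an unpruned binary tree; each row is (feature_tuple, label)."""
    labels = [y for _, y in rows]
    if len(set(labels)) == 1:
        return labels[0]  # pure node becomes a leaf
    best = None
    # Only a random subset of mtry features is considered at this node.
    for f in rng.sample(range(n_features), mtry):
        for v in sorted({x[f] for x, _ in rows}):
            left = [r for r in rows if r[0][f] == v]
            right = [r for r in rows if r[0][f] != v]
            if not left or not right:
                continue
            score = (len(left) * gini([y for _, y in left])
                     + len(right) * gini([y for _, y in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, v, left, right)
    if best is None:
        return majority(labels)  # no usable split among the sampled features
    _, f, v, left, right = best
    return (f, v, grow_tree(left, n_features, mtry, rng),
            grow_tree(right, n_features, mtry, rng))

def predict_tree(tree, x):
    """Follow binary splits until a leaf (a label string) is reached."""
    while isinstance(tree, tuple):
        f, v, left, right = tree
        tree = left if x[f] == v else right
    return tree

def fit_forest(rows, n_trees=25, mtry=2, seed=0):
    """Training step: one tree per bootstrap sample."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # draw with replacement
        forest.append(grow_tree(sample, n_features, mtry, rng))
    return forest

def predict_forest(forest, x):
    """Test step: majority vote over all trees."""
    return majority([predict_tree(t, x) for t in forest])

# Toy data: features 0 and 1 carry the signal, feature 2 is noise.
rows = []
for signal, label in (("a", "yes"), ("b", "no")):
    for noise in ("x", "y"):
        rows += [((signal, signal, noise), label)] * 10

forest = fit_forest(rows)
print(predict_forest(forest, ("a", "a", "x")))
print(predict_forest(forest, ("b", "b", "y")))
```

Setting `mtry` to the total number of features would turn the sketch into plain bagging; restricting it is what lowers the correlation between trees.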

CART

CART is a decision tree algorithm for statistical classification. It measures impurity using the Gini index and is based on binary splits, which form exactly two child nodes from one parent node (25). The advantages of CART are that the generated rules are easy to interpret and that it can use both continuous and categorical variables (26). The Gini coefficient is the probability that two elements drawn at random from n elements belong to different groups (27). The reduction in the Gini coefficient is calculated for each candidate split, and the classification variable and split point that reduce the Gini coefficient the most are selected to form the child nodes (28). In this study, the separation and merge threshold of the decision rule was set to 0.05. Moreover, the number of cases in a parent node and in a child node was limited to 200 people and 100 people, respectively, and the number of branches was limited to 5.
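The Gini criterion described above can be made concrete with a short Python sketch. It computes the Gini coefficient of a node from its class counts and the reduction obtained by a binary split. The counts are taken from Table 1 (all subjects as the parent node, split by depression in the past one month); the function names are illustrative, and the split is chosen only as an example, not as a split reported by the study's CART model.

```python
def gini(counts):
    """Gini coefficient from class counts: 1 - sum of squared class proportions."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_reduction(parent, children):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    total = sum(parent)
    weighted = sum(sum(child) / total * gini(child) for child in children)
    return gini(parent) - weighted

# Class counts as (no intention, intention), from Table 1.
parent = (1900, 211)                   # all 2,111 subjects
children = [(1427, 143), (473, 68)]    # depression: no / yes

print(round(gini(parent), 4))
print(round(gini_reduction(parent, children), 6))
```

CART evaluates this reduction for every candidate variable and split point and keeps the split with the largest value; a pure node has a Gini coefficient of 0, and a two-class node split evenly has 0.5.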

Results

Potentially associated factors of the participation intention in the cognitive health promotion program in the old age

The general characteristics and potential factors of participants were organized by their intention to participate in a cognitive health promotion program (Table 1). Out of 2,111 subjects, 211 subjects (10.0%) answered that they were interested in participating in a cognitive health promotion program. The chi-square test showed that the elderly with and without the intention to participate differed significantly (P<0.05) in the experience of depression in the past one month and in current economic activity. In other words, the elderly with a depression symptom (12.6%) and those who were not economically active (10.6%) showed a significantly (P<0.05) higher intention to participate in a cognitive health promotion program.
Table 1: Potentially associated factors of the participation intention in the cognitive health promotion program, n (%)

| Variables | Participation intention: No (N=1,900) | Participation intention: Yes (N=211) | P |
|---|---|---|---|
| Age (yr) | | | 0.562 |
| 60–69 | 1,013 (89.6) | 117 (10.4) | |
| 70+ | 887 (90.4) | 94 (9.6) | |
| Gender | | | 0.674 |
| Male | 794 (90.3) | 85 (9.7) | |
| Female | 1,106 (89.8) | 126 (10.2) | |
| Living with a spouse | | | 0.324 |
| Married and living together | 1,285 (90.6) | 134 (9.4) | |
| Married but not living together | 39 (92.9) | 3 (7.1) | |
| No spouse | 576 (88.6) | 74 (11.4) | |
| Highest level of education | | | 0.068 |
| Below elementary school | 821 (89.8) | 93 (10.2) | |
| Middle school | 346 (92.8) | 27 (7.2) | |
| High school | 434 (87.5) | 62 (12.5) | |
| College graduate or above | 299 (91.2) | 29 (8.8) | |
| Current economic activity | | | 0.027 |
| Yes | 330 (93.2) | 24 (6.8) | |
| No | 1,570 (89.4) | 187 (10.6) | |
| Mean monthly family income | | | 0.288 |
| <2 million KRW | 1,223 (89.5) | 143 (10.5) | |
| 2–4 million KRW | 447 (92.0) | 39 (8.0) | |
| ≥4 million KRW | 83 (89.2) | 10 (10.8) | |
| Drinking | | | 0.179 |
| Non-drinker | 1,443 (89.5) | 169 (10.5) | |
| Drinker | 457 (91.6) | 42 (8.4) | |
| Smoking | | | 0.267 |
| Non-smoker | 192 (89.7) | 22 (10.3) | |
| Smoker in the past | 406 (88.1) | 55 (11.9) | |
| Current smoker | 1,302 (90.7) | 134 (9.3) | |
| Subjective health status | | | 0.060 |
| Good | 540 (92.5) | 44 (7.5) | |
| Average | 619 (88.7) | 79 (11.3) | |
| Bad | 741 (89.4) | 88 (10.6) | |
| Regular exercise | | | 0.081 |
| No | 1,065 (91.0) | 105 (9.0) | |
| Yes | 835 (88.7) | 106 (11.3) | |
| Subjective family bond | | | 0.156 |
| Good | 1,102 (90.8) | 112 (9.2) | |
| Average | 585 (88.4) | 77 (11.6) | |
| Bad | 165 (92.2) | 14 (7.8) | |
| Friendship | | | 0.389 |
| Good | 641 (91.2) | 62 (8.8) | |
| Average | 974 (89.2) | 118 (10.8) | |
| Bad | 285 (90.2) | 31 (9.8) | |
| Depression in the past one month | | | 0.021 |
| No | 1,427 (90.9) | 143 (9.1) | |
| Yes | 473 (87.4) | 68 (12.6) | |
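The chi-square comparisons in Table 1 can be reproduced approximately from the published counts. The Python sketch below computes the Pearson chi-square statistic for a 2×2 table and its P value for one degree of freedom. It assumes no continuity correction, which is an assumption about the analysis settings rather than something the article states; the function name is illustrative.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and P value (df = 1) for a 2x2 count table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # For df = 1 the chi-square upper tail equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Rows: category of the explanatory variable; columns: (no intention, intention).
depression = ((1427, 143), (473, 68))          # no / yes
economic_activity = ((330, 24), (1570, 187))   # yes / no

stat_dep, p_dep = chi_square_2x2(depression)
stat_econ, p_econ = chi_square_2x2(economic_activity)
print(round(stat_dep, 2), round(p_dep, 3))
print(round(stat_econ, 2), round(p_econ, 3))
```

Under this assumption the P values come out at roughly 0.021 for depression and 0.027 for current economic activity, in line with the values listed in Table 1.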

Major prediction factors in the random forests model

The variable importance of the random forests model, estimated by the mean decrease in node impurity, is shown in Fig. 2. The analysis revealed that the major factors for predicting the participation of South Korean elderly in a cognitive health promotion program were education level (12.52), subjective health (11.87), subjective friendship (11.56), subjective family bond (10.63), mean monthly family income (8.62), age (8.36), smoking (7.52), living with a spouse (6.99), depression history (6.72), drinking (6.14), and regular exercise (6.12). Among them, education level was the most important predictor of participation in a cognitive health promotion program.
Fig. 2: The variable importance of the random forests model

Comparing the accuracy of the developed prediction model

In order to validate the performance of the developed model, the accuracy of the random forest-based model for predicting participation in a cognitive health promotion program was calculated (Fig. 3) and compared with the accuracy of the decision tree-based prediction model (Table 2). On the training data, the accuracy of the random forests was 73.6% and that of the decision tree model was 71.5%. On the test data, the accuracy of the random forests was 72.3% and that of the decision tree model was 70.9%.
Fig. 3: Accuracy of the random forests

Table 2: Comparing the accuracy of the developed prediction model

| Data | Model | Accuracy (%) |
|---|---|---|
| Training data | Decision tree model | 71.5 |
| Training data | Random forests | 73.6 |
| Test data | Decision tree model | 70.9 |
| Test data | Random forests | 72.3 |

Comparing the risk factors of prediction models

Major related variables derived from the cognitive health promotion program participation prediction model using 13 explanatory variables were compared and the results are summarized in Table 3. Among the prediction models used in this study, the random forests model estimated major variables using the reduction of the Gini coefficient. The decision tree model identified nine variables (subjective health status, smoking, the highest level of education, living with a spouse or not, drinking, current employment, subjective family bond, depression symptoms, and subjective friendship) as the factors related to the cognitive health promotion program participation, and the accuracy was 70.9%. The random forests model predicted education level, subjective health, subjective friendship, subjective family bond, mean monthly family income, age, smoking, living with a spouse or not, depression history, drinking, and regular exercise as the major variables, and the accuracy was 72.3%.
Table 3: Comparing the risk factors of prediction models

| Model | Number of factors | Factors |
|---|---|---|
| Decision tree model | 9 | Subjective health status, smoking, the highest level of education, living with a spouse or not, drinking, current employment, subjective family bond, depression symptoms, and subjective friendship |
| Random forests | 11 | Education level, subjective health, subjective friendship, subjective family bond, mean monthly family income, age, smoking, living with a spouse or not, depression history, drinking, and regular exercise |

Discussion

Efficiently managing the cognitive health of the elderly living in a local community is an important task of the public health service. This study analyzed the major factors influencing the intention of South Korean elderly to participate in a cognitive health promotion program using the random forests model, an ensemble algorithm-based classification model. The random forests-based prediction model showed that education level, subjective health status, subjective friendship, subjective family bond, mean monthly family income, age, smoking, living with a spouse or not, depression history, drinking, and regular exercise were closely related to participation in a cognitive health promotion program. Among these factors, education level was the most important predictor. Previous studies have shown that educational level is related to cognitive impairment in old age (29). Meta-analyses have revealed that, regardless of race and culture, the elderly with higher education had a lower risk of dementia (30), while those with lower education had a higher risk of dementia (31). This could be because higher education strengthens the cognitive reserve, the capacity to maintain normal cognitive function despite aging-related degenerative brain atrophy (32). According to the cognitive reserve theory, the brain can shrink or grow. Therefore, it is important to build up the cognitive reserve to cope with geriatric dementia (33). Accordingly, it is necessary to develop cognitive health promotion programs that prioritize the elderly with lower educational levels in order to prevent cognitive impairment and maintain cognitive health. Furthermore, a health promotion program for enhancing cognitive health in old age would be more effective when it is customized for the education level of its target subjects.
It was noteworthy that gender was not a major factor in the random forests model. This result disagreed with previous studies (14, 15) using logit models, which showed that gender was an important factor in participation in a health promotion program. Song & Boo (2016) (14) reported that gender was the most important factor affecting the intention to participate in a health promotion program for the elderly living alone: elderly women showed 4.85 times stronger intention to participate in a program than elderly men. The discrepancy is believed to be due to the characteristics of the subjects. The subjects of Song & Boo (2016) (14) were mostly female (72%), and that study targeted people already registered in a health promotion program offered by a community center. In this study, by contrast, the proportions of female (59%) and male (41%) subjects were closer to each other, and the elderly staying at home were included. These differences could be the reason that this study showed little effect of gender on the intention to participate in a cognitive health promotion program. Therefore, differentiating program content by educational level is likely to increase the effects of intervention more than differentiating it by gender.
The accuracy of the random forests was higher than that of the decision tree. The random forests are believed to have been more accurate because they are based on the bagging algorithm, which produces diverse decision trees from 500 bootstrap samples. A single decision tree is strongly affected by the parameters that determine its nodes because outliers can form separate nodes. In other words, the decision tree has a high risk of over-fitting (25). In contrast, the random forests have been reported to achieve higher accuracy and performance than decision trees because they maintain the bias of the individual trees while reducing variance: they produce numerous trees and predict the target variable through the mean or the class probability over the trees (24). Therefore, compared with the decision tree model, the random forests would be more suitable for data with many explanatory variables, such as disease examination data, and for building prediction models on distributed processing systems for big data, because they generate trees from many resampled training sets and combine them to predict the target variable.
The advantage of this study was that it developed a model to predict participation in a cognitive health promotion program using epidemiological data that can represent the population. The first limitation of this study was that the explanatory variables did not include the level of cognition. Future studies should measure the level of cognition with a standardized cognitive screening test such as the Mini-Mental State Examination and develop a prediction model that uses it as an explanatory variable. The second limitation was that causal relationships could not be inferred, even from meaningful associations, because this study analyzed cross-sectional data.

Conclusion

It is highly likely that those who voluntarily participate in a health promotion project are more motivated than those who do not. It is necessary to develop a customized health promotion program considering the characteristics of subjects in order to implement a program effectively based on the developed model to predict participation in a cognitive health promotion program. Moreover, future studies are needed to evaluate means to increase the prediction power of the random forests model using the weighted random forests algorithm.
  20 in total

1.  Association of alcohol drinking with verbal and visuospatial memory impairment in older adults: Clinical Research Center for Dementia of South Korea (CREDOS) study.

Authors:  Haewon Byeon; Yunhwan Lee; Soon Young Lee; Kang Soo Lee; So Young Moon; HyangHee Kim; Chang Hyung Hong; Sang Joon Son; Seong Hye Choi
Journal:  Int Psychogeriatr       Date:  2014-08-14       Impact factor: 3.878

2.  What do community-dwelling people with dementia need? A survey of those who are known to care and welfare services.

Authors:  Henriëtte G van der Roest; Franka J M Meiland; Hannie C Comijs; Els Derksen; Aaltje P D Jansen; Hein P J van Hout; Cees Jonker; Rose-Marie Dröes
Journal:  Int Psychogeriatr       Date:  2009-07-15       Impact factor: 3.878

Review 3.  Contribution of depression to cognitive impairment and dementia in older adults.

Authors:  Guy G Potter; David C Steffens
Journal:  Neurologist       Date:  2007-05       Impact factor: 1.398

4.  'How can they tell?' A qualitative study of the views of younger people about their dementia and dementia care services.

Authors:  Angela Beattie; Gavin Daker-White; Jane Gilliard; Robin Means
Journal:  Health Soc Care Community       Date:  2004-07

Review 5.  No health without mental health.

Authors:  Martin Prince; Vikram Patel; Shekhar Saxena; Mario Maj; Joanna Maselko; Michael R Phillips; Atif Rahman
Journal:  Lancet       Date:  2007-09-08       Impact factor: 79.321

6.  Improving access to geriatric mental health services: a randomized trial comparing treatment engagement with integrated versus enhanced referral care for depression, anxiety, and at-risk alcohol use.

Authors:  Stephen J Bartels; Eugenie H Coakley; Cynthia Zubritsky; James H Ware; Keith M Miles; Patricia A Areán; Hongtu Chen; David W Oslin; Maria D Llorente; Giuseppe Costantino; Louise Quijano; Jack S McIntyre; Karen W Linkins; Thomas E Oxman; James Maxwell; Sue E Levkoff
Journal:  Am J Psychiatry       Date:  2004-08       Impact factor: 18.112

Review 7.  Education and dementia in the context of the cognitive reserve hypothesis: a systematic review with meta-analyses and qualitative analyses.

Authors:  Xiangfei Meng; Carl D'Arcy
Journal:  PLoS One       Date:  2012-06-04       Impact factor: 3.240

8.  Does physical activity prevent cognitive decline and dementia?: A systematic review and meta-analysis of longitudinal studies.

Authors:  Sarah J Blondell; Rachel Hammersley-Mather; J Lennert Veerman
Journal:  BMC Public Health       Date:  2014-05-27       Impact factor: 3.295

9.  Identifying Important Risk Factors for Survival in Kidney Graft Failure Patients Using Random Survival Forests.

Authors:  Omid Hamidi; Jalal Poorolajal; Maryam Farhadian; Leili Tapak
Journal:  Iran J Public Health       Date:  2016-01       Impact factor: 1.429

10.  Differences in Health-related Quality of Life and Mental Health by Living Arrangement among Korean Elderly in the KNHANES 2010-2012.

Authors:  Yeunhee Kwak; Haekyung Chung; Yoonjung Kim
Journal:  Iran J Public Health       Date:  2017-11       Impact factor: 1.429
