Developing the High-Risk Drinking Scorecard Model in Korea.

Jun-Tae Han1, Il-Su Park2, Suk-Bok Kang3, Byeong-Gyu Seo3.   

Abstract

OBJECTIVES: This study aimed to develop a high-risk drinking scorecard using cross-sectional data from the 2014 Korea Community Health Survey.
METHODS: Data were drawn from the records of 149,592 subjects who participated in the Korea Community Health Survey conducted in 2014. The scorecard model was developed using data mining, applying the points to double the odds (PDO) approach to a weighted multiple logistic regression model.
RESULTS: The major factors influencing high-risk drinking were gender, age, educational level, occupation, receipt of health check-ups, depressive symptoms, over-moderate physical activity, mental stress, smoking status, obesity status, and regular breakfast. Men in their thirties to fifties had a high risk of being high-risk drinkers, and the risk was also high among office workers and sales workers. Current smokers had a higher risk of high-risk drinking. In the scorecard, the widest score ranges were observed for gender, age, educational level, and smoking status, suggesting that these were the most important risk factors.
CONCLUSION: A scorecard of the kind used in credit risk assessment can express the risk of high-risk drinking as a simple score, which helps medical service providers to interpret the model and helps the general public to understand the danger of high-risk drinking more easily.

Keywords:  Korea Community Health Survey; data mining; high-risk drinking; scorecard; weighted multiple logistic regression

Year:  2018        PMID: 30402378      PMCID: PMC6202019          DOI: 10.24171/j.phrp.2018.9.5.04

Source DB:  PubMed          Journal:  Osong Public Health Res Perspect        ISSN: 2210-9099


Introduction

Moderate alcohol consumption is generally known to reduce the risk of ischemic heart disease [1,2]. However, alcohol consumption has been recognized as one of the major risk factors for preventable mortality and morbidity. Binge drinking and heavy drinking have been associated with violence, poor management of diabetes, neurological damage, hypertension, hepatitis, gastrointestinal and heart disease, liver cirrhosis, cancers such as oral, rectal, and liver cancer, stroke, and alcohol dependence [3-5]. The 2016 Korea National Health and Nutrition Examination Survey [conducted by the Korea Centers for Disease Control and Prevention (KCDC)] revealed rates of monthly alcohol consumption and high-risk drinking of 61.9% and 13.8%, respectively. This rate of high-risk drinking is high compared with the rates reported by the World Health Organization for most regions: Africa (5.7%), the Americas (13.7%), the Eastern Mediterranean (0.1%), Europe (16.5%), South-East Asia (1.6%), and the Western Pacific Region (7.7%) [6]. In addition, many longitudinal studies in Korea have focused on the health effects of alcohol drinking [7-9]. Many recent studies have found that age, income level, employment status, smoking status, obesity, subjective assessment of health, and presence of a spouse are all related to high-risk drinking [10,11]. Previous studies ranged from small-scale sample surveys that were designed and conducted individually to large-scale sample surveys that were collected nationally. Some studies focused on individuals already exposed to high-risk drinking, the group most in need of an improved drinking culture. Logistic regression analysis was the preferred method in most studies to detect risk factors for high-risk drinking. However, the results of such high-risk drinking predictions are difficult for the general public to interpret and difficult to use in medical services.
The purpose of this study was to develop a predictive model of high-risk drinking in Korea using data mining, and to derive a high-risk drinking scorecard from the developed prediction model.

Materials and Methods

1. Study design

This study was a secondary analysis of data collected in a nationally representative, cross-sectional, population-based survey conducted by the KCDC. The overall framework of this study is shown in Figure 1.

Figure 1. Framework of the study. KCHS = Korea Community Health Survey; PDO = point to double the odds.

2. Subjects

This study was based on data acquired in the Korea Community Health Survey (KCHS) in 2014. The KCHS is a national health survey conducted since 2008 to provide population-based estimates of health indicators to be used for the development and assessment of public health policies and programs. The 2014 KCHS used a multistage sampling design to obtain a representative sample of adults aged 19 years or older. Of these adults, subjects who did not respond to the questions on sociodemographic variables and health-related variables were excluded. After the exclusion criteria were applied, 149,592 subjects were included in the final analysis.

3. Study variables

The target variable in this study was the alcohol consumption pattern. Any person who had drunk any kind of alcoholic beverage during the past year was classified as a current drinker and was asked further questions on the quantity consumed on a typical day and the drinking frequency. “High-risk” was defined as male respondents who consumed more than 7 drinks twice a week or more, and female respondents who consumed more than 5 drinks twice a week or more. All others were defined as “normal.” For a comprehensive analysis of the various factors associated with high-risk alcohol consumption, sociodemographic variables, health behaviors, and self-rated health status, including mental health, were selected as independent variables. The sociodemographic variables were gender, age, marital status, monthly household income, education level, and occupation. The lifestyle and health-related variables were over-moderate physical activity (moderate physical activity on 5 or more days per week for 30 minutes or more per session, or vigorous activity on 3 or more days per week for 20 minutes or more per session), eating breakfast regularly, current smoking status, health check-ups during the past 2 years (a proxy variable for interest in self-health care [12-14]), experience of depression (yes, no), subjective health status (good, bad), subjective stress recognition (yes, no), and obesity status (Table 1).
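The target definition above can be sketched as a small labelling function. This is only an illustration: the function name, argument names, and the reading of "more than 7 drinks" as a strict inequality are assumptions, and the survey's actual questionnaire coding is not reproduced here.

```python
def classify_drinker(gender: str, drinks_per_occasion: float,
                     occasions_per_week: float) -> int:
    """Return 1 (high-risk) or 0 (normal) under the definition quoted in the text."""
    thresholds = {"male": 7, "female": 5}
    if gender not in thresholds:
        raise ValueError("gender must be 'male' or 'female'")
    # Assumption: "more than 7 (or 5) drinks" is read as a strict inequality.
    heavy_occasion = drinks_per_occasion > thresholds[gender]
    # "Twice a week or more" is read as at least 2 occasions per week.
    frequent = occasions_per_week >= 2
    return 1 if heavy_occasion and frequent else 0


# Hypothetical respondents: (gender, drinks per occasion, occasions per week).
examples = [("male", 8, 2), ("male", 7, 3), ("female", 6, 2), ("female", 6, 1)]
labels = [classify_drinker(*e) for e in examples]
print(labels)
```

Non-drinkers fall into the normal group automatically, because zero drinks never exceeds either threshold.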
Table 1. Data description in the analysis.

| Role | Variable | Definition |
|---|---|---|
| Target | Alcohol consumption pattern | Male respondents who consumed more than 7 drinks twice a week or more, as well as female respondents who consumed more than 5 drinks twice a week or more, were defined as high-risk drinkers (1: High-risk, 0: Normal) |
| Input | Gender | Male, Female |
| Input | Age (y) | 19~29, 30~39, 40~49, 50~59, 60~69, ≥ 70 |
| Input | Marital status | Married; Others: never married, separated, divorced, widowed |
| Input | Monthly household income (million KRW) | < 0.5, 0.5~1.0, 1.0~2.0, 2.0~3.0, 3.0~4.0, 4.0~5.0, 5.0~6.0, ≥ 6.0 |
| Input | Occupation | Administrative officer: administrative, management, or professional occupation; Clerical officer: business and financial operations occupations; Service and sales worker: sales and related occupations; Farmer and fisher: farming, fishing, and forestry occupations; Elementary work: installation, maintenance, and repair occupations/labors; Other |
| Input | Educational level | Uneducated, elementary school, middle school, high school, university or higher |
| Input | Health check-up | Yes, No |
| Input | Experience of depression | Yes, No |
| Input | Subjective health status | Good, Bad (fair or poor) |
| Input | Over-moderate physical activity | Yes, No |
| Input | Subjective stress recognition | Yes, No |
| Input | Current smoking | Yes, No |
| Input | Obesity | Yes: BMI ≥ 25 kg/m², No: BMI < 25 kg/m² |
| Input | Eating a breakfast regularly | Yes, No |

4. Statistical analysis

Data analysis, predictive model development, and scorecard development were performed with SAS version 9.4. To estimate the total population represented by the sample, the stratification variables and sampling weights designated by the KCDC were employed. All data were described as unweighted frequencies and weighted percentages. The χ2-test was performed for categorical variables. The data set was divided into a training data set (60%) and a validation data set (40%); the high-risk drinking predictive model was built on the training set and its validity was tested on the validation set. The training set contained 90,015 cases (73,250 normal and 16,765 high-risk). The validation set comprised 59,577 cases (48,512 normal and 11,065 high-risk). A weighted multiple logistic regression model was employed to develop the high-risk predictive model, and a predictive scorecard for high-risk drinkers was derived from the developed model.
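A 60/40 split of this kind can be sketched as follows. The text does not say whether the split was stratified by the target; the reported class shares (about 18.6% high-risk in both sets) are consistent with stratification, so the sketch below assumes it. The function name, the seed, and the toy labels are hypothetical, and the study itself used SAS rather than Python.

```python
import random


def stratified_split(labels, train_fraction=0.6, seed=0):
    """Split row indices into train/validation sets, class by class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train_idx, valid_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])   # first 60% of each class
        valid_idx.extend(indices[cut:])   # remaining 40% of each class
    return sorted(train_idx), sorted(valid_idx)


# Toy data with roughly the study's class balance (18.6% high-risk).
toy_labels = [1] * 186 + [0] * 814
train, valid = stratified_split(toy_labels)
train_rate = sum(toy_labels[i] for i in train) / len(train)
valid_rate = sum(toy_labels[i] for i in valid) / len(valid)
print(len(train), len(valid), round(train_rate, 3), round(valid_rate, 3))
```

Splitting within each class keeps the high-risk share nearly identical in both sets, which makes the training and validation metrics directly comparable.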

Results

1. Differences in variables by target groups

Table 2 shows the key input variables used in the analysis by target group. Of the 149,592 Korean adults, 27,830 (18.6%) were in the high-risk group and 121,762 (81.4%) were in the normal group. Significant differences between the 2 groups were observed in sociodemographic factors. Men made up 83.7% of the high-risk group and 49.7% of the normal group. Compared with the normal group, the high-risk group contained significantly higher proportions of participants aged 40–49 years and of high school graduates. The largest occupational category in the high-risk group was elementary work (30.9%), whereas in the normal group it was “others” [unemployed, full-time student, soldier (33.3%)]. Table 2 also breaks down both groups by receipt of health check-ups during the prior 2 years, participation in over-moderate physical activity, and subjective stress. In addition, the percentage of current smokers was higher in the high-risk group (51.5%) than in the normal group (21.3%) (Table 2).
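Table 2. Descriptive characteristics for the variables in the analysis.

| Variable | Category | Normal N | Normal weighted N | Normal % | High-risk N | High-risk weighted N | High-risk % | χ2 |
|---|---|---|---|---|---|---|---|---|
| Gender | Male | 57,155 | 12,117,462 | 49.7 | 23,099 | 4,765,414 | 83.7 | 8001.2402** |
| | Female | 64,607 | 12,254,590 | 50.3 | 4,731 | 929,837 | 16.3 | |
| Age (y) | 19~29 | 18,337 | 5,270,497 | 21.6 | 3,266 | 965,149 | 16.9 | 1323.9462** |
| | 30~39 | 22,538 | 5,220,387 | 21.4 | 5,563 | 1,288,907 | 22.6 | |
| | 40~49 | 27,158 | 5,650,642 | 23.2 | 7,988 | 1,639,864 | 28.8 | |
| | 50~59 | 24,813 | 4,586,044 | 18.8 | 6,642 | 1,257,353 | 22.1 | |
| | 60~69 | 16,337 | 2,224,066 | 9.1 | 3,009 | 402,031 | 7.1 | |
| | ≥ 70 | 12,579 | 1,420,417 | 5.8 | 1,362 | 141,947 | 2.5 | |
| Educational level | Uneducated | 6,428 | 617,121 | 2.5 | 671 | 74,932 | 1.3 | 415.7543** |
| | Elementary school | 13,854 | 1,563,923 | 6.4 | 2,373 | 289,721 | 5.1 | |
| | Middle school | 12,379 | 1,855,573 | 7.6 | 2,985 | 471,388 | 8.3 | |
| | High school | 46,141 | 9,828,631 | 40.3 | 12,546 | 2,609,866 | 45.8 | |
| | University or higher | 42,960 | 10,506,805 | 43.1 | 9,255 | 2,249,344 | 39.5 | |
| Occupation | Administrative officer | 16,547 | 4,144,491 | 17.0 | 3,585 | 906,076 | 15.9 | 2598.3654** |
| | Clerical officer | 13,423 | 3,209,875 | 13.2 | 3,623 | 874,896 | 15.4 | |
| | Service and sales worker | 17,246 | 3,519,665 | 14.4 | 4,452 | 971,675 | 17.1 | |
| | Farmer and fisher | 11,609 | 690,024 | 2.8 | 2,775 | 193,923 | 3.4 | |
| | Elementary work | 22,987 | 4,683,069 | 19.2 | 8,534 | 1,761,094 | 30.9 | |
| | Others | 39,950 | 8,124,927 | 33.3 | 4,861 | 987,588 | 17.3 | |
| Monthly household income (million won) | < 0.5 | 6,354 | 725,663 | 3.0 | 962 | 127,833 | 2.2 | 147.624** |
| | 0.5~1.0 | 11,466 | 1,432,910 | 5.9 | 1,971 | 264,739 | 4.6 | |
| | 1.0~2.0 | 19,363 | 3,197,301 | 13.1 | 4,407 | 737,821 | 13.0 | |
| | 2.0~3.0 | 24,489 | 4,881,712 | 20.0 | 6,103 | 1,226,208 | 21.5 | |
| | 3.0~4.0 | 22,208 | 4,837,854 | 19.9 | 5,526 | 1,213,428 | 21.3 | |
| | 4.0~5.0 | 15,220 | 3,541,153 | 14.5 | 3,718 | 860,796 | 15.1 | |
| | 5.0~6.0 | 9,429 | 2,336,429 | 9.6 | 2,164 | 534,510 | 9.4 | |
| | ≥ 6.0 | 13,233 | 3,419,029 | 14.0 | 2,979 | 729,916 | 12.8 | |
| Marital status | Married | 36,924 | 8,458,594 | 34.7 | 7,721 | 1,803,399 | 31.7 | 60.7085** |
| | Others | 84,838 | 15,913,458 | 65.3 | 20,109 | 3,891,852 | 68.3 | |
| Health check-up | No | 38,756 | 8,466,523 | 34.7 | 9,423 | 2,023,597 | 35.5 | 4.097* |
| | Yes | 83,006 | 15,905,529 | 65.3 | 18,407 | 3,671,655 | 64.5 | |
| Experience of depression | No | 114,136 | 22,812,512 | 93.6 | 26,048 | 5,312,065 | 93.3 | 2.5624 |
| | Yes | 7,626 | 1,559,540 | 6.4 | 1,782 | 383,186 | 6.7 | |
| Subjective health status | Bad | 71,228 | 13,453,291 | 55.2 | 15,836 | 3,162,638 | 55.5 | 0.6535 |
| | Good | 50,534 | 10,918,761 | 44.8 | 11,994 | 2,532,613 | 44.5 | |
| Over-moderate physical activity | No | 93,719 | 18,985,090 | 77.9 | 20,358 | 4,229,206 | 74.3 | 104.8473** |
| | Yes | 28,043 | 5,386,962 | 22.1 | 7,472 | 1,466,045 | 25.7 | |
| Subjective stress recognition | No | 90,439 | 17,625,857 | 72.3 | 18,880 | 3,713,607 | 65.2 | 335.6205** |
| | Yes | 31,323 | 6,746,195 | 27.7 | 8,950 | 1,981,645 | 34.8 | |
| Currently smoking | No | 97,373 | 19,184,785 | 78.7 | 13,821 | 2,762,415 | 48.5 | 4779.969** |
| | Yes | 24,389 | 5,187,267 | 21.3 | 14,009 | 2,932,836 | 51.5 | |
| Obesity | No | 92,515 | 18,672,470 | 76.6 | 18,393 | 3,753,286 | 65.9 | 785.9349** |
| | Yes | 29,247 | 5,699,582 | 23.4 | 9,437 | 1,941,965 | 34.1 | |
| Eating a breakfast regularly | No | 34,231 | 8,329,315 | 34.2 | 9,501 | 2,281,622 | 40.1 | 212.7888** |
| | Yes | 87,531 | 16,042,737 | 65.8 | 18,329 | 3,413,629 | 59.9 | |
| Total | | 121,762 | 24,372,052 | 100.0 | 27,830 | 5,695,251 | 100.0 | |

* p < 0.05, ** p < 0.01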
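The percentages in Table 2 are computed from the weighted counts, not from the unweighted frequencies. The short check below recomputes two of them from the weighted N columns; the dictionary layout and function name are illustrative choices, and only the gender and smoking rows are copied in.

```python
# Weighted N values copied from Table 2: level -> (normal, high-risk).
WEIGHTED_N = {
    "gender": {"Male": (12_117_462, 4_765_414), "Female": (12_254_590, 929_837)},
    "smoking": {"No": (19_184_785, 2_762_415), "Yes": (5_187_267, 2_932_836)},
}


def weighted_percentages(levels):
    """Column percentages within the normal and high-risk groups."""
    total_normal = sum(normal for normal, _ in levels.values())
    total_high = sum(high for _, high in levels.values())
    return {
        level: (round(100 * normal / total_normal, 1),
                round(100 * high / total_high, 1))
        for level, (normal, high) in levels.items()
    }


gender_pct = weighted_percentages(WEIGHTED_N["gender"])
smoking_pct = weighted_percentages(WEIGHTED_N["smoking"])
print(gender_pct)
print(smoking_pct)
```

Both rows reproduce the published percentages (49.7% and 83.7% male; 21.3% and 51.5% current smokers).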

2. Model building and performance

The model was built on the training set (60% of the data) and its validity was tested on the validation set (40%). The performance of the developed model was evaluated with respect to discrimination using the area under the receiver operating characteristic (ROC) curve (AUC), the misclassification rate, and the Kolmogorov-Smirnov statistic (Table 3).

Table 3. AUC, Kolmogorov-Smirnov statistics, and misclassification rate for the predictive model.

| Data set | AUC | KS | Misclassification rate |
|---|---|---|---|
| Training data (60%) | 0.7530 | 0.3967 | 0.1865 |
| Validation data (40%) | 0.7527 | 0.3999 | 0.1864 |

AUC = area under the curve; KS = Kolmogorov-Smirnov.

ROC charts are graphical displays that give a global measure of the predictive accuracy of the model (Figure 2). They plot the sensitivity against 1 − specificity of a classifier for a range of cut-offs. Sensitivity is a measure of accuracy for predicting events and is equal to the number of true positives divided by the total number of actual positives. Specificity is a measure of accuracy for predicting non-events and is equal to the number of true negatives divided by the total number of actual negatives; 1 − specificity is therefore the false-positive rate. The performance of a model is demonstrated by the degree to which its ROC curve pushes up and to the left. The area under the curve provides a quantitative performance measure, ranging from 0.5 for a worthless model to 1 for a perfect classifier. The shape of the ROC curve indicates that the power of the model to discriminate between high-risk and normal drinkers is reasonably good (Figure 2).

Figure 2. ROC curve for the predictive model. AUC = area under the curve; ROC = receiver operating characteristic.
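The discrimination measures reported in Table 3 can be illustrated on toy data. The sketch below computes ROC points, the AUC by the trapezoidal rule, and the KS statistic as the largest gap between sensitivity and 1 − specificity. The labels and scores are made up for illustration, and the study's own figures came from SAS, not from this code.

```python
def roc_points(labels, scores):
    """Return (false-positive rate, sensitivity) pairs, one per distinct cut-off."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        cutoff = ranked[i][0]
        # Cases tied on the same score cross the cut-off together.
        while i < len(ranked) and ranked[i][0] == cutoff:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def ks_statistic(points):
    """Largest vertical gap between the ROC curve and the diagonal."""
    return max(tpr - fpr for fpr, tpr in points)


# Hypothetical outcomes (1 = high-risk) and predicted probabilities.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
pts = roc_points(toy_labels, toy_scores)
print(round(auc(pts), 4), round(ks_statistic(pts), 4))
```

On this toy example 8 of the 9 positive-negative pairs are ranked correctly, so the AUC equals 8/9.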

Table 4 provides the parameter estimates of the risk prediction model for falling into the high-risk group. The weighted logistic regression estimates revealed that men were significantly more likely to belong to the high-risk group than women (p < 0.01). The parameter estimates for the age groups showed that participants aged 30 to 59 years were significantly more likely to belong to the high-risk group than those aged 19–29 years (p < 0.01). Participants who had graduated from high school or had a lower level of education were significantly more likely to belong to the high-risk group than those who had graduated from university or had a higher qualification (p < 0.01). Participants who worked in business, sales, and related occupations were significantly more likely to belong to the high-risk group than administrative employees (p < 0.01). The odds ratio (OR) for participants who engaged in over-moderate physical activity was higher than for those who did not. The ORs for smoking, experience of depression, and stress, each relative to its reference group, were also significantly higher [p < 0.01 (Table 4)].
Table 4. Result of weighted logistic regression analysis.

| Variable | Category | β̂ | OR |
|---|---|---|---|
| Intercept | | −3.0029** | |
| Gender (ref: female) | Male | 1.2774** | 3.59 |
| Age (y, ref: 19–29) | 30–39 | 0.1767** | 1.19 |
| | 40–49 | 0.3399** | 1.41 |
| | 50–59 | 0.3** | 1.35 |
| | 60–69 | −0.0289 | 0.97 |
| | ≥ 70 | −0.6204** | 0.54 |
| Educational level (ref: University or higher) | Uneducated | 0.5667** | 1.76 |
| | Elementary school | 0.3488** | 1.42 |
| | Middle school | 0.4102** | 1.52 |
| | High school | 0.2919** | 1.34 |
| Occupation (ref: Administrative officer) | Clerical officer | 0.23** | 1.26 |
| | Service and sales worker | 0.1907** | 1.21 |
| | Farmer and fisher | 0.0759 | 1.08 |
| | Elementary work | 0.1111** | 1.12 |
| | Others | −0.2007** | 0.82 |
| Monthly household income (million won, ref: < 0.5) | 0.5–1.0 | −0.0398 | 0.96 |
| | 1.0–2.0 | 0.0564 | 1.07 |
| | 2.0–3.0 | −0.00791 | 0.99 |
| | 3.0–4.0 | 0.0608 | 1.06 |
| | 4.0–5.0 | 0.0305 | 1.03 |
| | 5.0–6.0 | 0.0429 | 1.04 |
| | ≥ 6.0 | 0.0284 | 1.03 |
| Marital status (ref: Other) | Married | 0.0408 | 1.04 |
| Health check-up (ref: No) | Yes | −0.0879** | 0.92 |
| Experience of depression (ref: No) | Yes | 0.1715** | 1.19 |
| Subjective health status (ref: Bad) | Good | −0.0398 | 0.96 |
| Over-moderate physical activity (ref: No) | Yes | 0.0781** | 1.08 |
| Subjective stress recognition (ref: No) | Yes | 0.2036** | 1.23 |
| Current smoking (ref: No) | Yes | 0.7305** | 2.08 |
| Obesity (ref: Normal) | Obesity | 0.2637** | 1.30 |
| Eating a breakfast regularly (ref: No) | Yes | −0.1995** | 0.82 |

* p < 0.05, ** p < 0.01
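In a logistic regression, each OR is the exponential of its coefficient, and a predicted probability follows from the intercept plus the coefficients of the non-reference categories. The sketch below illustrates both with a few values copied from Table 4. The profile is hypothetical, the dictionary keys are illustrative names, and the resulting probability is a reading of the fitted model, not a figure reported by the authors.

```python
import math

# Intercept and selected coefficients copied from Table 4 (log-odds scale).
INTERCEPT = -3.0029
COEF = {
    "male": 1.2774,
    "age_40_49": 0.3399,
    "age_70_plus": -0.6204,
    "current_smoking": 0.7305,
}


def odds_ratio(beta: float) -> float:
    """Odds ratio of a category relative to its reference category."""
    return math.exp(beta)


def predicted_probability(active_terms) -> float:
    """Probability of high-risk drinking; omitted terms sit at their reference level."""
    logit = INTERCEPT + sum(COEF[term] for term in active_terms)
    return 1.0 / (1.0 + math.exp(-logit))


or_male = round(odds_ratio(COEF["male"]), 2)
or_smoking = round(odds_ratio(COEF["current_smoking"]), 2)
# Hypothetical profile: male current smoker, reference level everywhere else.
p_profile = predicted_probability(["male", "current_smoking"])
p_reference = predicted_probability([])
print(or_male, or_smoking, round(p_profile, 3), round(p_reference, 3))
```

The recomputed ORs for gender and smoking match the OR column of Table 4 to two decimals.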

3. Scorecard development

Scorecards for high-risk drinking were derived from the developed prediction model. This study used the concept of points to double the odds (PDO), which is the most widely used scaling in the credit risk industry. For example, if PDO is set at 20, the odds for a person who receives 520 points are twice those for a person who receives 500 points. To make the scorecard, an adjusted coefficient was calculated for each variable by subtracting the smallest regression coefficient estimate among its categories from the coefficient estimate of each category, so that every adjusted coefficient was greater than or equal to zero. Then, the appropriate PDO was determined and the adjusted regression coefficient was transformed linearly into a single score as shown in Equation 1 [15,16]. In this study, PDO was set at 58.43994, and the resulting scorecard is shown in Table 5.
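The scaling step can be sketched in a few lines. One caveat matters: the multiplier used here, PDO / log10(2) ≈ 194.13, is inferred from the published numbers because it reproduces the scores in Table 5; it is an assumption rather than the authors' stated formula, and credit-scoring texts more commonly use PDO / ln(2). Coefficients are copied from Table 4 with reference categories set to 0, and only four variables are included.

```python
import math

PDO = 58.43994  # value stated in the text

# Coefficients from Table 4; each reference category is entered as 0.0.
COEF = {
    "gender": {"Female": 0.0, "Male": 1.2774},
    "age": {"19-29": 0.0, "30-39": 0.1767, "40-49": 0.3399,
            "50-59": 0.3, "60-69": -0.0289, ">=70": -0.6204},
    "occupation": {"Administrative officer": 0.0, "Clerical officer": 0.23,
                   "Service and sales worker": 0.1907, "Farmer and fisher": 0.0759,
                   "Elementary work": 0.1111, "Others": -0.2007},
    "smoking": {"No": 0.0, "Yes": 0.7305},
}


def scorecard(coefs, pdo=PDO):
    """Shift each variable so its lowest category scores 0, then scale linearly."""
    factor = pdo / math.log10(2)  # assumption inferred from Table 5, see lead-in
    card = {}
    for variable, levels in coefs.items():
        lowest = min(levels.values())
        card[variable] = {level: round((beta - lowest) * factor, 1)
                          for level, beta in levels.items()}
    return card


card = scorecard(COEF)
for variable, levels in card.items():
    print(variable, levels)
```

Under this assumed multiplier the sketch gives 248.0 for males, 186.4 for ages 40–49, 83.6 for clerical officers, and 141.8 for current smokers, matching the published scorecard.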
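Table 5. Result of scorecard.

| Variable | Category | Score | Max score |
|---|---|---|---|
| Gender | Male | 248.0 | 248.0 |
| | Female | 0.0 | |
| Age (y) | 19–29 | 120.4 | 186.4 |
| | 30–39 | 154.7 | |
| | 40–49 | 186.4 | |
| | 50–59 | 178.7 | |
| | 60–69 | 114.8 | |
| | ≥ 70 | 0.0 | |
| Educational level | Uneducated | 110.0 | 110.0 |
| | Elementary school | 67.7 | |
| | Middle school | 79.6 | |
| | High school | 56.7 | |
| | University or higher | 0.0 | |
| Occupation | Administrative officer | 39.0 | 83.6 |
| | Clerical officer | 83.6 | |
| | Service and sales worker | 76.0 | |
| | Farmer and fisher | 53.7 | |
| | Elementary work | 60.5 | |
| | Others | 0.0 | |
| Monthly household income (million won) | < 0.5 | 7.7 | 19.5 |
| | 0.5–1.0 | 0.0 | |
| | 1.0–2.0 | 18.7 | |
| | 2.0–3.0 | 6.2 | |
| | 3.0–4.0 | 19.5 | |
| | 4.0–5.0 | 13.6 | |
| | 5.0–6.0 | 16.1 | |
| | ≥ 6.0 | 13.2 | |
| Marital status | Married | 7.9 | 7.9 |
| | Other | 0.0 | |
| Health check-up | Yes | 0.0 | 17.1 |
| | No | 17.1 | |
| Experience of depression | Yes | 33.3 | 33.3 |
| | No | 0.0 | |
| Subjective health status | Good | 0.0 | 7.7 |
| | Bad | 7.7 | |
| Over-moderate physical activity | Yes | 15.2 | 15.2 |
| | No | 0.0 | |
| Subjective stress recognition | Yes | 39.5 | 39.5 |
| | No | 0.0 | |
| Current smoking | Yes | 141.8 | 141.8 |
| | No | 0.0 | |
| Obesity | Obesity | 51.2 | 51.2 |
| | Normal | 0.0 | |
| Eating a breakfast regularly | Yes | 0.0 | 38.7 |
| | No | 38.7 | |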
Table 5 shows that males (score: 248.0), uneducated participants (score: 110.0), participants aged 69 years or younger (score: 114.8–186.4), and current smokers (score: 141.8) had scores higher than 100. For example, consider an individual who belongs to the following categories:

✓ a male in his forties (40–49 years)

✓ uneducated

✓ worked in a clerical office (business and financial operations occupations)

✓ monthly household income of 3.0~4.0 million won

✓ married

✓ without receiving health check-up

✓ experience of depression

✓ poor health status

✓ experience of over-moderate physical activity

✓ stressed

✓ current smoking

✓ obesity

✓ no regular breakfast

The total score will be about 1,000 (248.0 + 186.4 + 110.0 + 83.6 + 19.5 + 7.9 + 17.1 + 33.3 + 7.7 + 15.2 + 39.5 + 141.8 + 51.2 + 38.7), and he will belong to the highest-risk drinking group.
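Scoring an individual is a table lookup followed by a sum. The sketch below stores the Table 5 scores in a dictionary and totals them for the worked example; the dictionary layout, key names, and function name are illustrative. The income term of 19.5 in the worked sum corresponds to the 3.0–4.0 million won bracket in Table 5, so that bracket is used here.

```python
# Scores copied from Table 5: variable -> category -> points.
SCORES = {
    "gender": {"Male": 248.0, "Female": 0.0},
    "age": {"19-29": 120.4, "30-39": 154.7, "40-49": 186.4,
            "50-59": 178.7, "60-69": 114.8, ">=70": 0.0},
    "education": {"Uneducated": 110.0, "Elementary school": 67.7,
                  "Middle school": 79.6, "High school": 56.7,
                  "University or higher": 0.0},
    "occupation": {"Administrative officer": 39.0, "Clerical officer": 83.6,
                   "Service and sales worker": 76.0, "Farmer and fisher": 53.7,
                   "Elementary work": 60.5, "Others": 0.0},
    "income": {"<0.5": 7.7, "0.5-1.0": 0.0, "1.0-2.0": 18.7, "2.0-3.0": 6.2,
               "3.0-4.0": 19.5, "4.0-5.0": 13.6, "5.0-6.0": 16.1, ">=6.0": 13.2},
    "marital": {"Married": 7.9, "Other": 0.0},
    "health_checkup": {"Yes": 0.0, "No": 17.1},
    "depression": {"Yes": 33.3, "No": 0.0},
    "subjective_health": {"Good": 0.0, "Bad": 7.7},
    "physical_activity": {"Yes": 15.2, "No": 0.0},
    "stress": {"Yes": 39.5, "No": 0.0},
    "smoking": {"Yes": 141.8, "No": 0.0},
    "obesity": {"Obesity": 51.2, "Normal": 0.0},
    "breakfast": {"Yes": 0.0, "No": 38.7},
}


def total_score(profile):
    """Sum the scorecard points for one person's categories."""
    return round(sum(SCORES[var][cat] for var, cat in profile.items()), 1)


# The worked example from the text: the highest-scoring category of every variable.
example = {
    "gender": "Male", "age": "40-49", "education": "Uneducated",
    "occupation": "Clerical officer", "income": "3.0-4.0", "marital": "Married",
    "health_checkup": "No", "depression": "Yes", "subjective_health": "Bad",
    "physical_activity": "Yes", "stress": "Yes", "smoking": "Yes",
    "obesity": "Obesity", "breakfast": "No",
}
example_total = total_score(example)
max_total = round(sum(max(cats.values()) for cats in SCORES.values()), 1)
print(example_total, max_total)
```

The rounded category scores add up to 999.9, which agrees with the stated total of 1,000 up to rounding of the individual entries.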

Discussion

The high-risk drinking predictive model was developed in Korea using cross-sectional data from the KCHS (2014). A total of 149,592 individuals were included in this study, and a weighted multiple logistic regression model was employed to develop the high-risk drinking predictive model. In addition, a scorecard for high-risk drinking was derived from the developed prediction model. This study found that the major influencing factors for being a high-risk drinker were gender, age, educational level, occupation, receipt of health check-ups, depressive symptoms, over-moderate physical activity, mental stress, smoking status, obesity status, and regular breakfast. These findings were largely consistent with previous studies [17-19]. High-risk drinkers were more likely to be men in their thirties to fifties (30–59 years), or office or sales workers. In particular, current smokers had an increased likelihood of high-risk drinking. However, monthly household income, marital status, and subjective health status were not significantly related to the risk of falling into the high-risk drinking group in this study. The results from the scorecard showed that the largest score ranges were found for gender, age, educational level, and smoking status. In addition, uneducated participants had the highest score among the education levels, and clerical officers had the highest score among the occupation categories. For example, a male who is in his forties (40~49 years), uneducated, works in a clerical office, and currently smokes will score at least 769.9 points for high-risk drinking. In Korea, individuals with the above-mentioned factors are more likely to become involved in social relationships, which could increase the likelihood of high-risk drinking [11].
Scorecards are mainly used by credit rating agencies to measure consumers’ credit so that companies can prevent losses arising from consumer activities such as taking out loans, obtaining credit cards, and buying insurance. The same scorecard system can be applied to express the risk of high-risk drinking as a score, which helps medical service providers to interpret the model and helps the general public to understand the dangers of high-risk drinking more easily. In this respect, this study is meaningful. In addition, it can provide a basis for more effective healthcare services, such as education to prevent high-risk drinking. Further refinement of the model to reflect local, social, environmental, and geographical factors related to drinking, in addition to the data used in this study, is expected to support the design of various measures to address and prevent high-risk drinking. Finally, the scorecard modeling methodology should help providers and users of health education programs to measure and understand the level of health risk behaviors other than drinking that are estimated by various statistical models.


