
Influences of Daily Life Habits on Risk Factors of Stroke Based on Decision Tree and Correlation Matrix.

Zeguo Shao, Yuhong Xiang, Yingchao Zhu, Aiqin Fan, Peng Zhang.

Abstract

PURPOSE: To explore the influences of smoking, alcohol consumption, drinking tea, diet, sleep, and exercise on the risk of stroke and relationships among the factors, present corresponding knowledge-based rules, and provide a scientific basis for assessment and intervention of risk factors of stroke.
METHODS: The decision tree C4.5 algorithm was optimized and used to establish a model for stroke risk assessment. The main risk factors of stroke (hypertension, dyslipidemia, diabetes, atrial fibrillation, body mass index (BMI), history of stroke, family history of stroke, and transient ischemic attack (TIA)) and daily habits (smoking, alcohol consumption, drinking tea, diet, sleep, and exercise) were then analyzed, and the corresponding knowledge-based rules were presented. Finally, a correlation matrix of stroke risk factors was established to analyze the relationships among them.
RESULTS: The accuracy of the established model for stroke risk assessment was 87.53%, and the kappa coefficient was 0.8344; both were higher than those of the random forest and Logistic algorithms. Additionally, 37 knowledge-based rules that can be used for prevention of risk factors of stroke were derived and verified. In the decision tree, the average depth values of smoking, exercise, sleep, drinking tea, alcohol consumption, and diet were 6.00, 7.00, 8.67, 9.33, 10.60, and 10.75, respectively, indicating that their influence on risk factors of stroke decreased in that order. Smoking and exercise were strongly associated with other risk factors of stroke, whereas sleep, drinking tea, alcohol consumption, and diet were only weakly associated with other risk factors of stroke and were relatively tightly associated with smoking and exercise.
CONCLUSIONS: Establishment of a model for stroke risk assessment, analysis of factors influencing risk factors of stroke, analysis of relationships among those factors, and derivation of knowledge-based rules are helpful for prevention and treatment of stroke.
Copyright © 2020 Zeguo Shao et al.

Year:  2020        PMID: 32565878      PMCID: PMC7285386          DOI: 10.1155/2020/3217356

Source DB:  PubMed          Journal:  Comput Math Methods Med        ISSN: 1748-670X            Impact factor:   2.238


1. Introduction

Stroke is an acute cerebrovascular disease characterized by high morbidity, high disability, and high mortality. It is a refractory disease that poses a major threat to human health and life [1]. At present, there are no effective treatments for stroke. Prevention is still the most feasible strategy to reduce the harm of stroke and its social burden, especially given the high global incidence and the potential risk factors of stroke [2]. The risk factors of stroke are divided into intervention factors (e.g., smoking, alcohol consumption, and body mass index (BMI)) and nonintervention factors (e.g., age, gender, ethnicity, and genetic attributes) according to whether the risk can be changed through intervention [3]. Hence, studying the intervention factors is of great significance for the prevention of stroke. In addition, we previously found that the interventional risk factors for stroke appear mostly in people's daily lives and behavioral habits [4, 5]. Unhealthy lifestyles can trigger or increase the risk of stroke, and moderate lifestyle changes may reduce it [6]. Therefore, numerous scholars have suggested that further studies should be carried out to provide effective interventions that guide and improve people's lifestyle, so as to reduce the risk and incidence of stroke [7-9]. However, in 2019, Altobelli et al. analyzed the relevant literature and found that research in this area had been conducted in only a limited number of developed countries and that there were very few reports on the impact of lifestyle and dietary habits on risk factors of stroke [10]. In China, Huang et al. conducted relevant research and demonstrated that a healthy lifestyle (high fruit intake, quitting smoking, doing housework, and good sleep quality) may reduce the chance of recurrence of first-onset ischemic stroke [11].
Although the risk factors of stroke in daily life habits are not the main risk factors of stroke, they are closely associated with the main risk factors [12]. The present study was aimed at the Chinese population, and large-scale and multidimensional stroke data were collected through modern information technology. The optimized decision tree algorithm was used to analyze risk factors of stroke in daily life habits, derive knowledge-based rules, and establish a model for stroke risk assessment to analyze relationships among risk factors of stroke.

2. Materials and Methods

2.1. Data Collection and Pretreatment

We established a whole-course stroke management network system by collecting large-scale data from the Shanghai suburban population. Nearly 10,000 people were involved, and 5599 valid records were finally acquired. The data included subjects' demographic characteristics, physical examination, family medical history, treatment history, personal diet and lifestyle habits, sleep and breathing, psychological status, quality of life, and stroke knowledge. In order to facilitate classification of stroke, we also designed a rapid stroke screening form and performed statistical analysis. We preliminarily extracted and integrated the data and determined 16 risk factors of stroke for further analysis. As shown in Table 1, among the 5599 records collected, there were 2491 males and 3108 females, and the subjects' minimum and maximum ages were 18 and 89 years, respectively. The age- and gender-based data are shown in Figure 1.
Table 1

Subjects' clinical data.

| Type of data | Risk factor of stroke | Field | Data distribution |
|---|---|---|---|
| Clinical diagnosis | Hypertension | Hyte | y: 1242, n: 3782, uncertain: 575 |
| Clinical diagnosis | Dyslipidemia | Dysl | y: 511, n: 4508, uncertain: 580 |
| Clinical diagnosis | Diabetes | Diab | y: 403, n: 4618, uncertain: 578 |
| Clinical diagnosis | Atrial fibrillation | AF | y: 75, n: 4940, uncertain: 584 |
| Medical history and family history | Family history of stroke | FSH | y: 449, n: 4460, uncertain: 690 |
| Medical history and family history | History of stroke | SH | y: 165, n: 4730, uncertain: 704 |
| Medical history and family history | TIA | TIA | y: 95, n: 4350, uncertain: 1154 |
| Demographic information | Gender | Gen | M: 2491, F: 3108 |
| Demographic information | Age | Age | Refer to Figure 1 |
| Physical examination | BMI | BMIc | B1: 205, B2: 2926, B3: 1760, B4: 520, B5: 150, uncertain: 38 |
| Daily habits | Smoking | Smok | y: 1192, n: 4379, null: 28 |
| Daily habits | Alcohol consumption | Alco | y: 1065, n: 4500, null: 34 |
| Daily habits | Drinking tea | Tea | y: 1563, n: 3997, null: 39 |
| Daily habits | Diet | DT | C1: 2812, C2: 263, C3: 2181, null: 370 |
| Daily habits | Sleep | Sleep | TS: 366, TB: 4958, TL: 205, null: 70 |
| Daily habits | Exercise (sport) | Sport | C1: 1518, C2: 1624, C3: 2275, null: 182 |

“y” means “yes,” “n” indicates “no,” and definitions of the types of BMI, diet, sleep, and exercise are presented in Figure 1. In Figure 1, we sometimes use fields to represent their corresponding stroke risk factors.

Figure 1

Distribution of age- and gender-based data.

As illustrated in Figure 1, [18,30) indicates that age is 18 years old or older and less than 30 years old; F and M denote female and male, respectively; and PN is the number of individuals. The present research analyzed the risk factors of smoking, alcohol consumption, drinking tea, diet, sleep, sport, and BMI. The above-mentioned factors were defined as follows:
Smoking: those who had smoked for 6 months or more in their lifetime were marked as “y”; otherwise, they were marked as “n.”
Alcohol consumption: those who drank no less than twice/week and no less than 80 ml each time were marked as “y”; otherwise, they were marked as “n.”
Drinking tea: those who drank tea at least 3 days/week were marked as “y”; otherwise, they were marked as “n.”
Diet: daily food ingredients that were mainly sugars, fats, or proteins were marked as “C1,” “C2,” and “C3,” respectively.
Sport: those who exercised more than 3 times/week and more than 30 min each time (regular level of sport) were marked as “C1”; those who exercised 2-3 times/week and 10-30 min each time (medium level of sport) were marked as “C2”; those who exercised less than or equal to 1 time/week and less than 10 min each time (lower level of sport) were marked as “C3.”
BMI: since the WHO standards are not highly appropriate for Chinese people, the Chinese Reference Standards were formulated with reference to the WHO standards and are divided into five types: B1, B2, B3, B4, and B5 (Figure 2).
Table 2

Sleep classification.

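| Age | Duration of sleep (hours) | Mark |
|---|---|---|
| <3 (months) | <14 | TS |
| <3 (months) | 14~17 | TB |
| <3 (months) | >17 | TL |
| 1~2 (years old) | <11 | TS |
| 1~2 (years old) | 11~14 | TB |
| 1~2 (years old) | >14 | TL |
| 6~13 (years old) | <9 | TS |
| 6~13 (years old) | 9~11 | TB |
| 6~13 (years old) | >11 | TL |
| 14~17 (years old) | <8 | TS |
| 14~17 (years old) | 8~10 | TB |
| 14~17 (years old) | >10 | TL |
| 18~64 (years old) | <6 | TS |
| 18~64 (years old) | 6~10 | TB |
| 18~64 (years old) | >10 | TL |
| >64 (years old) | <7 | TS |
| >64 (years old) | 7~8 | TB |
| >64 (years old) | >8 | TL |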
Sleep: the duration of sleep at different ages was divided into three types (very short, medium, and very long), labelled TS, TB, and TL, respectively, as shown in Table 2.
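The sleep labelling rule can be illustrated with a minimal Python sketch. It covers only the two adult age bands of Table 2 (the study subjects were aged 18 to 89) and assumes that TL always denotes a duration above the upper bound of the band. The function name `sleep_label` is hypothetical and is not part of the original study.

```python
def sleep_label(age_years, hours):
    """Label sleep duration as TS (short), TB (medium), or TL (long).

    Only the adult bands of Table 2 are covered; the assumption here is
    that TL means "longer than the upper bound" of the band.
    """
    if 18 <= age_years <= 64:
        lower, upper = 6, 10
    elif age_years > 64:
        lower, upper = 7, 8
    else:
        raise ValueError("this sketch only covers subjects aged 18 or older")
    if hours < lower:
        return "TS"
    if hours > upper:
        return "TL"
    return "TB"


example = sleep_label(40, 7.5)  # a 40-year-old sleeping 7.5 hours
```

Under these assumptions, the boundary values themselves (for example, exactly 6 or 10 hours for ages 18~64) fall into the medium class TB.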
Figure 2

BMI classification.

Based on the rapid screening of risk factors of stroke (hypertension, dyslipidemia, diabetes, atrial fibrillation, smoking history, BMI, sport, stroke history, family history of stroke, and transient ischemic attack (TIA)) and with reference to the Guidelines for Screening, Prevention and Control of Ischemic Stroke presented by the Ministry of Health of China (hereinafter referred to as the guidelines), this study classified stroke risk into H, M, L, N, T, and Y levels, as summarized in Table 3.
Table 3

Definition of different levels of risk factors of stroke.

| Type | Definition |
|---|---|
| Y | Has a history of stroke. |
| T | Has a previous transient ischemic attack. |
| H | The major risk factors defined in the guidelines are 2 items or more, or the major risk factors include 1 item and the secondary risk factors involve 2 items or more. |
| M | The major risk factors defined in the guidelines include 1 item, and the secondary risk factors involve less than 2 items. |
| L | The major risk factors defined in the guidelines include 0 items, and the secondary risk factors involve 2 items or more. |
| N | The major risk factors defined in the guidelines include 0 items, and the secondary risk factors involve less than 2 items. |
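The definitions in Table 3 can be written as a small decision function. The following Python sketch is an illustration only: the function name `risk_level` and its arguments are hypothetical, the assignment of individual factors to the "major" and "secondary" groups is left to the guidelines (only their counts are passed in), and checking Y before T is an assumption, since the table does not state which takes precedence.

```python
def risk_level(stroke_history, tia_history, n_major, n_secondary):
    """Map screening results to the risk levels Y, T, H, M, L, N of Table 3.

    n_major and n_secondary are the numbers of major and secondary risk
    factors present, as defined in the guidelines.
    """
    if stroke_history:          # assumed to take precedence over T
        return "Y"
    if tia_history:
        return "T"
    if n_major >= 2 or (n_major == 1 and n_secondary >= 2):
        return "H"
    if n_major == 1:            # exactly one major, fewer than 2 secondary
        return "M"
    if n_secondary >= 2:        # no major, 2 or more secondary
        return "L"
    return "N"                  # no major, fewer than 2 secondary


example = risk_level(False, False, n_major=1, n_secondary=2)
```

In this example, one major factor combined with two secondary factors yields the high-risk level H.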

2.2. Decision Trees

The decision tree is a popular, logic-based, easily interpretable, straightforward, and widely applicable method [13]. The classic decision tree algorithms include ID3, C4.5, and CART. In contrast to ID3, which can only handle discrete variables, C4.5 and CART can handle continuous variables, and they are not sensitive to incomplete data. In addition, CART generates binary trees, whereas the C4.5 algorithm can generate multiple branches at each node. Decision trees can generate interpretable knowledge rules, which can express the relationships between factors. This is in line with our goal of exploring the relationships among the risk factors of stroke. Therefore, the C4.5 algorithm was selected in the current research. Details of the C4.5 algorithm are described below.

2.2.1. C4.5 Algorithm

In 1992, Ross Quinlan developed the C4.5 decision tree algorithm [14]. C4.5 constructs a decision tree as a learning model from the data samples. It adopts a divide-and-conquer approach and uses measures based on information gain to select, at each node, the attribute on which the dataset is split.

(1) Information Gain. Suppose that there are C categories of data in the sample dataset D. The information entropy formula is as follows:

$$\mathrm{Info}(D) = -\sum_{i=1}^{C} p_i \log_2 p_i, \tag{1}$$

where D represents the training dataset, C denotes the number of data classes, and p_i represents the ratio of the number of samples in class i to the number of all samples. When the attribute A is chosen as the node of the decision tree, the information entropy after the action of feature A is as follows:

$$\mathrm{Info}_A(D) = \sum_{j=1}^{k} \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j), \tag{2}$$

where k is the number of parts D_1, ..., D_k into which attribute A divides the data samples D.

(2) Gain Ratio. The information gain represents the decrease in the information entropy of the dataset D after the action of the feature A. The formula is as follows:

$$\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D). \tag{3}$$

The information gain ratio is given by

$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}_A(D)}, \qquad \mathrm{SplitInfo}_A(D) = -\sum_{j=1}^{k} \frac{|D_j|}{|D|}\log_2\frac{|D_j|}{|D|}. \tag{4}$$
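The quantities used by C4.5 to choose a splitting attribute can be computed in a few lines. The sketch below uses the standard definitions of entropy, information gain, and gain ratio for a discrete attribute; the toy `smok` and `risk` lists are invented for illustration and are not taken from the study data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_and_ratio(values, labels):
    """Return (information gain, gain ratio) of one discrete attribute."""
    total = len(labels)
    # Partition the class labels by attribute value.
    parts = {}
    for v, y in zip(values, labels):
        parts.setdefault(v, []).append(y)
    # Weighted entropy after splitting on the attribute.
    info_a = sum(len(p) / total * entropy(p) for p in parts.values())
    gain = entropy(labels) - info_a
    # Split information is the entropy of the attribute values themselves.
    split_info = entropy(values)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


# Toy example (hypothetical data): smoking status vs. risk level.
smok = ["y", "y", "n", "n", "n", "n"]
risk = ["H", "H", "N", "N", "N", "H"]
gain, ratio = gain_and_ratio(smok, risk)
```

At each node, C4.5 evaluates every candidate attribute in this way and splits on the one with the highest gain ratio, which penalizes attributes that fragment the data into many small parts.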

2.2.2. Improvement and Implementation of C4.5 Algorithm

We used a decision tree algorithm to analyze the above-mentioned 16 risk factors of stroke (see Table 1). The decision tree was generated using J48 (the C4.5 implementation) in the Weka classifier library. The confidence factor for pruning was set to 0.25, and the minimum number of instances per leaf (minNumObj) was set to 1. In addition, 10-fold cross-validation was used to select and evaluate the model.

In order to address the imbalanced data problem and improve the robustness of the system, we used the SMOTE algorithm to improve the model. SMOTE is an oversampling technique for unbalanced datasets proposed by Chawla et al. in 2002. It can effectively mitigate the overfitting caused by traditional oversampling techniques and reduce the bias of the classification results.

As illustrated in Figure 3, the preprocessed, classified dataset is first checked for balance: the number of records in each class is counted, the maximum (max) and minimum (min) class sizes are found, and the ratio max/min is computed. If max/min < 3, the dataset is judged to be balanced and is passed directly to the C4.5 classifier. Otherwise, the dataset is judged to be unbalanced and is passed to the SMOTE processor. There, the entire dataset is first resampled without replacement, with the sample size equal to the size of the dataset, so that the records are randomly reordered; SMOTE is then used to generate new minority-class data. This random reordering removes the effects of earlier preprocessing operations, such as filtering and sorting, on SMOTE, so that the resulting data are a random mixture of majority and minority records, which avoids the overfitting that can arise when the SMOTE-generated data come only from the minority records. Finally, the data are passed to the classification module.
Figure 3

SMOTE+C4.5 classification model.
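The balance check and oversampling step described above can be sketched in plain Python. This is a simplified illustration, not the Weka implementation used in the study: it assumes purely numeric feature vectors (classic SMOTE interpolates between a minority sample and one of its nearest minority neighbours), and the names `is_balanced` and `smote`, the parameter `k`, and the toy data are all hypothetical.

```python
import random
from collections import Counter


def is_balanced(labels, threshold=3.0):
    """Balance rule from the text: balanced if max/min class size < threshold."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values()) < threshold


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points from numeric minority samples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a)
                               for a, b in zip(base, neighbour)))
    return synthetic


# Toy data (hypothetical): 9 records of class "N" and 3 of class "T".
labels = ["N"] * 9 + ["T"] * 3
records = [(float(i), float(i % 3)) for i in range(12)]

new_points = []
if not is_balanced(labels):                 # 9 / 3 = 3.0 is not < 3
    order = list(range(len(records)))
    random.Random(0).shuffle(order)         # random reordering before SMOTE
    records = [records[i] for i in order]
    labels = [labels[i] for i in order]
    minority = [r for r, y in zip(records, labels) if y == "T"]
    new_points = smote(minority, n_new=3, k=2, rng=random.Random(0))
```

How many synthetic records to generate is not stated in the text, so `n_new` is left as a free parameter here. Categorical attributes such as those in Table 1 would need a nominal-aware variant of SMOTE.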

3. Results

The number of leaves of the tree was 98, while the size of the tree was 171 (Figures 4–8). The performance indexes of the tree are as follows: classification accuracy: 87.5281%; kappa statistic: 0.8344; mean absolute error: 0.0567; and root-mean-square error: 0.175.
Figure 4

A decision tree to classify risk factors of stroke.

Figure 5

Decision tree #1 to classify risk factors of stroke.

Figure 6

Decision tree #2 to classify risk factors of stroke.

Figure 7

Decision tree #3 to classify risk factors of stroke.

Figure 8

Decision tree #4 to classify risk factors of stroke.

To assess the performance of the proposed system for stroke risk classification, precision, recall, accuracy, and kappa were calculated, and 10-fold cross-validation was used. Equations (5)–(8) were used to calculate precision, recall, accuracy, and kappa, respectively:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \tag{5}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}, \tag{6}$$

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{7}$$

$$\mathrm{Kappa} = \frac{p_o - p_e}{1 - p_e}. \tag{8}$$

Precision is the ratio of correct positive predictions to all positive predictions. Recall is the ratio of correct positive predictions to all positive samples. Accuracy is the ratio of correct predictions to all predictions. True positives (TPs) are positive cases that are correctly predicted as positive. False negatives (FNs) are positive cases that are incorrectly predicted as negative. True negatives (TNs) are negative cases that are correctly predicted as negative. False positives (FPs) are negative cases that are incorrectly predicted as positive. Meanwhile, kappa offers a more robust estimate of the performance of the proposed system than simple agreement and gives an overall evaluation of all the cases. Here, p_o is the relative observed agreement between the proposed system and the physician analysis, and p_e is the hypothetical probability of chance agreement. Table 4 presents the confusion matrix of the classification result using the optimized C4.5 algorithm.

In order to evaluate the performance of the optimized C4.5 algorithm, the random forest and Logistic algorithms were implemented for comparison. Random forests or random decision forests are an ensemble learning method for classification, regression, and other tasks that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees [15]. Logistic regression is a generalized linear regression analysis model, commonly used in data mining, automatic disease diagnosis, economic prediction, and other fields.
Logistic regression is good at analyzing linear relationships but handles nonlinear relationships less well than decision trees. In addition, it is sensitive to extreme values, and the decision tree performs better in this respect [16].
Table 4

Confusion matrix achieved by the optimized C4.5 algorithm.

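Rows: risk level analyzed by physicians. Columns: risk level analyzed by the optimized C4.5 algorithm.

| Physicians \ Optimized C4.5 | H | M | Y | T | N | L | Recall |
|---|---|---|---|---|---|---|---|
| H | 1288 | 127 | 0 | 0 | 0 | 0 | 0.910 |
| M | 44 | 1502 | 0 | 0 | 0 | 0 | 0.972 |
| Y | 0 | 0 | 165 | 0 | 0 | 0 | 1.000 |
| T | 2 | 0 | 0 | 51 | 0 | 0 | 0.962 |
| N | 0 | 0 | 0 | 0 | 679 | 255 | 0.727 |
| L | 0 | 0 | 0 | 0 | 182 | 596 | 0.766 |
| Precision | 0.966 | 0.922 | 1.000 | 1.000 | 0.789 | 0.700 | |

Accuracy: 87.53%
Kappa: 0.8344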
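As a check on how these indexes relate to the confusion matrix, the following Python sketch recomputes per-class precision and recall, overall accuracy, and Cohen's kappa from the counts of Table 4. The multi-class forms used here (diagonal over column sum for precision, diagonal over row sum for recall) are the usual one-vs-rest reading of the two-class definitions; the function name `summarize` is hypothetical.

```python
LEVELS = ["H", "M", "Y", "T", "N", "L"]

# Rows: level assigned by physicians; columns: level predicted by the
# optimized C4.5 model (counts taken from Table 4).
CONFUSION = [
    [1288, 127, 0, 0, 0, 0],
    [44, 1502, 0, 0, 0, 0],
    [0, 0, 165, 0, 0, 0],
    [2, 0, 0, 51, 0, 0],
    [0, 0, 0, 0, 679, 255],
    [0, 0, 0, 0, 182, 596],
]


def summarize(matrix):
    """Return (precision list, recall list, accuracy, kappa)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    diag = [matrix[i][i] for i in range(n)]
    precision = [d / c if c else 0.0 for d, c in zip(diag, col_sums)]
    recall = [d / r if r else 0.0 for d, r in zip(diag, row_sums)]
    p_o = sum(diag) / total                       # observed agreement
    p_e = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (p_o - p_e) / (1 - p_e)               # chance-corrected agreement
    return precision, recall, p_o, kappa


precision, recall, accuracy, kappa = summarize(CONFUSION)
```

With these counts, the accuracy and kappa agree with the reported 87.53% and 0.8344 after rounding, and almost all errors lie between the N and L levels.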
In the current study, the number of trees in the random forest was set to 100, and for each tree, the minimum number of instances per leaf was set to 1. The ridge value in the Logistic algorithm was set to 1.0E − 8, and the maximum number of iterations was set to -1. Both algorithms were evaluated with 10-fold cross-validation, as was the decision tree. Tables 5 and 6 summarize the confusion matrices of the classification results using the random forest and Logistic algorithms, respectively.
Table 5

Confusion matrix achieved by the random forest algorithm.

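Rows: risk level analyzed by physicians. Columns: risk level analyzed by the random forest algorithm.

| Physicians \ Random forest | H | M | Y | T | N | L | Recall |
|---|---|---|---|---|---|---|---|
| H | 1300 | 115 | 0 | 0 | 0 | 0 | 0.919 |
| M | 72 | 1473 | 0 | 0 | 1 | 0 | 0.953 |
| Y | 6 | 0 | 158 | 0 | 0 | 1 | 0.958 |
| T | 24 | 6 | 0 | 11 | 3 | 9 | 0.208 |
| N | 0 | 0 | 0 | 0 | 699 | 235 | 0.748 |
| L | 0 | 0 | 0 | 0 | 239 | 539 | 0.693 |
| Precision | 0.927 | 0.924 | 1.000 | 1.000 | 0.742 | 0.688 | |

Accuracy: 85.46%
Kappa: 0.8063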
Table 6

Confusion matrix achieved by the Logistic algorithm.

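Rows: risk level analyzed by physicians. Columns: risk level analyzed by the Logistic algorithm.

| Physicians \ Logistic | H | M | Y | T | N | L | Recall |
|---|---|---|---|---|---|---|---|
| H | 1289 | 124 | 0 | 1 | 0 | 1 | 0.911 |
| M | 97 | 1446 | 1 | 1 | 1 | 0 | 0.935 |
| Y | 0 | 0 | 164 | 0 | 0 | 1 | 0.994 |
| T | 5 | 0 | 1 | 46 | 1 | 0 | 0.868 |
| N | 0 | 0 | 0 | 0 | 690 | 244 | 0.739 |
| L | 0 | 1 | 0 | 0 | 214 | 563 | 0.724 |
| Precision | 0.927 | 0.920 | 0.988 | 0.958 | 0.762 | 0.696 | |

Accuracy: 85.83%
Kappa: 0.8119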
In terms of both accuracy and kappa, the optimized C4.5 algorithm performed best among the three algorithms. The recall of the risk type “T” reached only 0.208 with the random forest algorithm, which was noticeably lower than the 0.962 achieved with the C4.5 algorithm. Figures 9–11 show that the misclassification rate of risk type “T” was the lowest for the optimized C4.5 algorithm among the three algorithms.
Figure 9

Illustration of errors of the optimized C4.5 algorithm.

Figure 10

Illustration of errors of the random forest algorithm.

Figure 11

Illustration of errors of the Logistic algorithm.

Corresponding knowledge-based rules can be deduced from the decision tree. In the present case, 98 knowledge-based rules were deduced. Of these, 37 rules are related to the 6 daily living habits (smoking, alcohol consumption, drinking tea, diet, sleep, and sport); they are listed in the Supplementary Information.

4. Discussion

Based on the decision tree described above, the average depth and frequency of each risk factor in the decision tree were calculated, as shown in Table 7. The average depth increases in the order stroke history, hypertension, dyslipidemia, diabetes, family history of stroke, TIA, smoking, atrial fibrillation, exercise, sleep, gender, BMI, drinking tea, age, alcohol consumption, and diet, indicating that the influence of these factors on the risk of stroke decreases in that order. The impact of daily living habits on risk factors of stroke was therefore relatively small, suggesting that the influence of lifestyle habits and diet on risk factors of stroke is indirect.
Table 7

Values of risk factors for stroke.

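| Risk factors | Depth/frequency (depth columns 0–14 merged) | Average depth |
|---|---|---|
| SH | 1 | 0.00 |
| Hyte | 2 | 2.00 |
| Dysl | 21 | 3.33 |
| Diab | 421 | 3.71 |
| FSH | 41 | 5.40 |
| TIA | 11122 | 5.43 |
| Smok | 121 | 6.00 |
| AF | 72 | 6.67 |
| Sport | 113 | 7.00 |
| Sleep | 111 | 8.67 |
| Gen | 1132 | 9.00 |
| BMI | 31 | 9.25 |
| Tea | 111 | 9.33 |
| Age | 121331 | 10.00 |
| Alco | 1211 | 10.60 |
| DT | 13 | 10.75 |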
We further analyzed the above-mentioned 98 knowledge-based rules for risk factors of stroke and extracted the set of risk factors that appears in each rule. Within each set, the sum of the reciprocals of factors was used to represent the weight of each factor. All factor sets and their weights are described in the Supplementary Information. Within each set, every two factors formed a factor pair; identical factor pairs were weighted and summed to form a factor-based relationship matrix, as shown in Table 8.
Table 8

A factor-based relationship matrix.

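| | SH | Hyte | Dysl | Diab | FSH | TIA | Smok | AF | Sport | Sleep | Gen | BMIc | Tea | Age | Alco |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Hyte | 6.84 | | | | | | | | | | | | | | |
| Dysl | 6.34 | 6.34 | | | | | | | | | | | | | |
| Diab | 5.71 | 5.71 | 5.46 | | | | | | | | | | | | |
| FSH | 5.71 | 4.95 | 4.95 | 4.64 | | | | | | | | | | | |
| TIA | 3.91 | 3.91 | 3.66 | 3.46 | 3.29 | | | | | | | | | | |
| Smok | 4.16 | 4.16 | 4.16 | 4.16 | 3.85 | 3.03 | | | | | | | | | |
| AF | 0.45 | 0.45 | 0.45 | 0.45 | 0.45 | 0.33 | 0.20 | | | | | | | | |
| Sport | 4.49 | 4.49 | 4.49 | 3.82 | 3.98 | 2.90 | 3.44 | 0.20 | | | | | | | |
| Sleep | 0.42 | 0.42 | 0.42 | 0.42 | 0.27 | 0.08 | 0.42 | 0.00 | 0.27 | | | | | | |
| Gen | 1.74 | 1.74 | 1.74 | 1.43 | 1.43 | 1.05 | 1.05 | 0.00 | 1.74 | 0.08 | | | | | |
| BMIc | 2.17 | 2.17 | 2.17 | 2.17 | 2.17 | 2.17 | 2.17 | 0.00 | 2.17 | 0.08 | 1.05 | | | | |
| Tea | 0.60 | 0.60 | 0.60 | 0.60 | 0.60 | 0.48 | 0.48 | 0.13 | 0.48 | 0.00 | 0.22 | 0.38 | | | |
| Age | 6.84 | 6.84 | 6.34 | 5.71 | 4.95 | 3.91 | 4.16 | 0.45 | 4.49 | 0.42 | 1.74 | 2.17 | 0.60 | | |
| Alco | 0.70 | 0.70 | 0.70 | 0.70 | 0.70 | 0.40 | 0.70 | 0.00 | 0.70 | 0.19 | 0.22 | 0.40 | 0.14 | 0.70 | |
| DT | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.00 | 0.96 | 0.00 | 0.62 | 0.86 | 0.38 | 0.96 | 0.22 |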
As illustrated in Table 8, stroke history (SH), hypertension (Hyte), dyslipidemia (Dysl), diabetes (Diab), and age (Age) have the highest correlation values. Of the 6 daily habit factors we examined (smoking, alcohol consumption, tea, diet, sleep, and exercise), only the correlation values of smoking (Smok) and sport (Sport) were higher than the average (1.95). This indicates that alcohol consumption, drinking tea, diet, and sleep are not strongly correlated with other factors. Nevertheless, for every one of the six habit factors, including the weakly correlated ones, the highest correlation values were those with the strongly correlated factors (SH, Hyte, Dysl, Diab, and Age), as shown in Table 9.
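The comparison with the average can be reproduced from the matrix itself. The Python sketch below stores the lower triangle of Table 8, computes the mean over all factor pairs, and compares it with the mean correlation of each habit factor. Reading "the correlation of a factor" as the mean of its values against all other factors, and "within the group" (Table 9) as values above that per-factor mean, are assumptions made for this illustration; the variable and function names are hypothetical.

```python
FACTORS = ["SH", "Hyte", "Dysl", "Diab", "FSH", "TIA", "Smok", "AF",
           "Sport", "Sleep", "Gen", "BMIc", "Tea", "Age", "Alco", "DT"]
HABITS = ["Smok", "Sport", "Sleep", "Tea", "Alco", "DT"]

# Lower triangle of Table 8: row i holds FACTORS[i] vs. FACTORS[0..i-1].
LOWER = [
    [],
    [6.84],
    [6.34, 6.34],
    [5.71, 5.71, 5.46],
    [5.71, 4.95, 4.95, 4.64],
    [3.91, 3.91, 3.66, 3.46, 3.29],
    [4.16, 4.16, 4.16, 4.16, 3.85, 3.03],
    [0.45, 0.45, 0.45, 0.45, 0.45, 0.33, 0.20],
    [4.49, 4.49, 4.49, 3.82, 3.98, 2.90, 3.44, 0.20],
    [0.42, 0.42, 0.42, 0.42, 0.27, 0.08, 0.42, 0.00, 0.27],
    [1.74, 1.74, 1.74, 1.43, 1.43, 1.05, 1.05, 0.00, 1.74, 0.08],
    [2.17, 2.17, 2.17, 2.17, 2.17, 2.17, 2.17, 0.00, 2.17, 0.08, 1.05],
    [0.60, 0.60, 0.60, 0.60, 0.60, 0.48, 0.48, 0.13, 0.48, 0.00, 0.22,
     0.38],
    [6.84, 6.84, 6.34, 5.71, 4.95, 3.91, 4.16, 0.45, 4.49, 0.42, 1.74,
     2.17, 0.60],
    [0.70, 0.70, 0.70, 0.70, 0.70, 0.40, 0.70, 0.00, 0.70, 0.19, 0.22,
     0.40, 0.14, 0.70],
    [0.96, 0.96, 0.96, 0.96, 0.96, 0.96, 0.96, 0.00, 0.96, 0.00, 0.62,
     0.86, 0.38, 0.96, 0.22],
]

# Symmetric lookup table: corr[a][b] == corr[b][a].
corr = {f: {} for f in FACTORS}
for i, row in enumerate(LOWER):
    for j, value in enumerate(row):
        corr[FACTORS[i]][FACTORS[j]] = value
        corr[FACTORS[j]][FACTORS[i]] = value

pair_values = [v for row in LOWER for v in row]
overall_mean = sum(pair_values) / len(pair_values)
factor_mean = {f: sum(corr[f].values()) / len(corr[f]) for f in FACTORS}


def above_own_mean(factor):
    """Partners whose value exceeds the factor's own mean (cf. Table 9)."""
    return {g for g, v in corr[factor].items() if v > factor_mean[factor]}


strong_habits = [h for h in HABITS if factor_mean[h] > overall_mean]
```

Under this reading, the mean over all 120 pairs rounds to 1.95, only Smok and Sport exceed it among the habit factors, and `above_own_mean` returns the same partner sets as those listed for each habit in Table 9.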
Table 9

Factors with higher correlation values than the mean values within the group.

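| Smok: factor | Correlation | Sport: factor | Correlation | Sleep: factor | Correlation | Tea: factor | Correlation | Alco: factor | Correlation | DT: factor | Correlation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SH | 4.16 | SH | 4.49 | SH | 0.42 | SH | 0.60 | SH | 0.70 | SH | 0.96 |
| Hyte | 4.16 | Hyte | 4.49 | Hyte | 0.42 | Hyte | 0.60 | Hyte | 0.70 | Hyte | 0.96 |
| Dysl | 4.16 | Dysl | 4.49 | Dysl | 0.42 | Dysl | 0.60 | Dysl | 0.70 | Dysl | 0.96 |
| Diab | 4.16 | Age | 4.49 | Age | 0.42 | Age | 0.60 | Age | 0.70 | Age | 0.96 |
| Age | 4.16 | FSH | 3.98 | Diab | 0.42 | Diab | 0.60 | Diab | 0.70 | Diab | 0.96 |
| FSH | 3.85 | Diab | 3.82 | Smok | 0.42 | FSH | 0.60 | FSH | 0.70 | FSH | 0.96 |
| Sport | 3.44 | Smok | 3.44 | FSH | 0.27 | Smok | 0.48 | Smok | 0.70 | Smok | 0.96 |
| TIA | 3.03 | TIA | 2.90 | Sport | 0.27 | Sport | 0.48 | Sport | 0.70 | Sport | 0.96 |
| | | | | | | TIA | 0.48 | | | TIA | 0.96 |
| | | | | | | | | | | BMI | 0.86 |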

The effects of the 6 daily habits (smoking, alcohol consumption, drinking tea, diet, sleep, and exercise) on stroke risk are discussed in the next sections.

4.1. Smoking and Sport

Of the 37 knowledge-based rules mentioned above, 30 rules included a “smoking” factor, suggesting that smoking significantly increases the risk factors of stroke. Yamagishi et al. demonstrated that smoking increases the risk of stroke in patients with hypertension [17], which is in line with our findings. In addition, the radar chart of the risk ratio of smoking to nonsmoking is also illustrated by Figure 12(a).
Figure 12

Radar charts illustrating the effects of daily life habits on risk factors of stroke.

Of the 37 knowledge-based rules mentioned above, 35 contained “sport.” As displayed in Figure 12(b), there is no significant difference between the impact of high-intensity and medium-intensity exercise on risk factors of stroke. Exercise is the most common factor affecting the risk of stroke, and moderate exercise helps prevent stroke, which is consistent with the results of McDonnell et al.'s study [18]. Additionally, 28 knowledge-based rules contained both the “smoking” and “sport” factors, indicating that smoking and sport are closely associated and, further, that exercise may help smokers reduce the risk of stroke.

4.2. Alcohol Consumption and Drinking Tea

Individuals who drink alcohol had a significantly higher risk of stroke than those who do not (Figure 12(c)). This is in line with Hu et al.'s finding that heavy drinking can increase the risk of stroke, while moderate drinking has an insignificant influence on the risk of stroke [19]. However, alcohol consumption is not an independent factor and is typically associated with hypertension, diabetes, and hypercholesterolemia. The knowledge-based rules showed that drinking tea has no direct effect on the risk of stroke (Figure 12(d)), and, similar to alcohol consumption, it can be related to BMI. Sosa et al. demonstrated that tea is highly beneficial for reducing the risk of stroke in obese people [20]. Zhang et al. conducted experiments on mice and concluded that drinking tea has a neuroprotective effect on hemorrhagic stroke [21]. In addition, we found that “tea = y” and “alco = y” do not appear together in any rule in the present study, and the correlation value between them (0.14, Table 8) is also very low, indicating that drinking tea and alcohol consumption have no joint effect on the risk of stroke.

4.3. Diet

As shown in Figure 12(e), the effects of the three types of diet (mainly sugar, fat, and protein) on risk of stroke are not significantly different. According to the rules, these types are more concentrated in the “H” and “M” types, demonstrating that dietary structure has a certain influence on individuals with high risk of stroke. In addition, from the perspective of correlation value (Table 8), it has a relatively higher correlation with other factors compared with alcohol consumption, drinking tea, and sleep.

4.4. Sleep

As displayed in Figure 12(f), the risk of stroke is lower when the duration of sleep is appropriate. A very long or very short duration of sleep does not help avoid the risk of stroke, which is consistent with Huang et al.'s finding that good sleep quality helps reduce the risk of stroke [11, 22]. From the perspective of rules, sleep is associated with smoking, alcohol consumption, and sport, and from the perspective of correlation, sleep, smoking, and exercise are relatively correlated with one another. People who exercise less and are obese have an increased risk of stroke if the duration of their sleep is extremely long. People who exercise less, smoke, and drink alcohol have a higher risk of stroke if the duration of their sleep is below the normal level.

Legend for Figure 12: in Figure 12(a), “YESp” stands for “smoking” and “Nop” stands for “nonsmoking.” In Figure 12(b), “C1p,” “C2p,” and “C3p” represent the three levels of exercise “C1,” “C2,” and “C3.” In Figure 12(c), “YESp” stands for “drinking,” and “Nop” denotes “no drinking.” In Figure 12(d), “YESp” stands for “drinking tea,” and “Nop” represents “no tea drinking.” In Figure 12(e), “C1p,” “C2p,” and “C3p” represent “C1,” “C2,” and “C3,” respectively. In Figure 12(f), “TSp,” “TBp,” and “TLp” denote “TS,” “TB,” and “TL,” respectively.

5. Conclusions

In the present study, we optimized the decision tree C4.5 algorithm to assess and analyze risk factors of stroke (stroke history, hypertension, dyslipidemia, diabetes, family history of stroke, TIA, smoking, atrial fibrillation, sport, sleep, gender, BMI, drinking tea, age, alcohol consumption, and diet) using the 5599 valid records collected. The classification achieved an accuracy of 87.5281% and a kappa coefficient of 0.8344, and its performance was higher than that of the random forest and Logistic algorithms. We then focused on 6 daily life habits (smoking, alcohol consumption, drinking tea, diet, sleep, and sport) and presented a series of knowledge-based rules that can guide individuals in adjusting their living habits. Through further analysis of the decision tree and the knowledge-based rules, the independent influence of each factor and the relationships between the factors were analyzed. Unlike other studies, we analyzed the relationships between smoking and exercise; among alcohol consumption, drinking tea, and BMI; among diet, sport, and BMI; and among sleep, sport, smoking, and alcohol consumption. We found that although these daily living habits cannot directly determine the risk of stroke (their independent influence is low), they could be used to intervene in the risk factors of stroke. Smoking and exercise were strongly associated with other risk factors of stroke, whereas sleep, drinking tea, alcohol consumption, and diet were only weakly associated with other risk factors of stroke and were relatively tightly associated with smoking and exercise. However, further research is needed to determine whether smoking and exercise play a significant role, among daily habits, in the risk of stroke.
  17 in total

1.  Smoking raises the risk of total and ischemic strokes in hypertensive men.

Authors:  Kazumasa Yamagishi; Hiroyasu Iso; Akihiko Kitamura; Tomoko Sankai; Takeshi Tanigawa; Yoshihiko Naito; Shinichi Sato; Hironori Imano; Tetsuya Ohira; Takashi Shimamoto
Journal:  Hypertens Res       Date:  2003-03       Impact factor: 3.872

2.  Lifestyles correlate with stroke recurrence in Chinese inpatients with first-ever acute ischemic stroke.

Authors:  Zhi-Xin Huang; Xiao-Ling Lin; Hai-Ke Lu; Xiao-Yu Liang; Li-Juan Fan; Xin-Tong Liu
Journal:  J Neurol       Date:  2019-02-19       Impact factor: 4.849

3.  Lifestyle Factors and Gender-Specific Risk of Stroke in Adults with Diabetes Mellitus: A Case-Control Study.

Authors:  Jian Guo; Tianjia Guan; Ying Shen; Baohua Chao; Mei Li; Longde Wang; Yuanli Liu
Journal:  J Stroke Cerebrovasc Dis       Date:  2018-03-09       Impact factor: 2.136

4.  [ELITE study - Nutrition, lifestyle and individual information for the prevention of stroke, dementia and heart attack - Study design and cardiovascular status].

Authors:  Stephan Lüders; Bastian Schrader; Jörg Bäsecke; Hermann Haller; Albrecht Elsässer; Michael Koziolek; Joachim Schrader
Journal:  Dtsch Med Wochenschr       Date:  2018-12-13       Impact factor: 0.628

Review 5.  Stroke epidemiology: advancing our understanding of disease mechanism and therapy.

Authors:  Bruce Ovbiagele; Mai N Nguyen-Huynh
Journal:  Neurotherapeutics       Date:  2011-07       Impact factor: 7.620

6.  Delayed Treatment with Green Tea Polyphenol EGCG Promotes Neurogenesis After Ischemic Stroke in Adult Mice.

Authors:  Jian-Cheng Zhang; Hang Xu; Yin Yuan; Jia-Yi Chen; Yu-Jing Zhang; Yun Lin; Shi-Ying Yuan
Journal:  Mol Neurobiol       Date:  2016-05-20       Impact factor: 5.590

7.  Television Viewing Time and Stroke Risk: Australian Diabetes Obesity and Lifestyle Study (1999-2012).

Authors:  Toby B Cumming; Elizabeth Holliday; David Dunstan; Coralie English
Journal:  J Stroke Cerebrovasc Dis       Date:  2019-01-22       Impact factor: 2.136

8.  Dairy foods and risk of stroke: a meta-analysis of prospective cohort studies.

Authors:  D Hu; J Huang; Y Wang; D Zhang; Y Qu
Journal:  Nutr Metab Cardiovasc Dis       Date:  2013-12-25       Impact factor: 4.222

9.  Primary stroke prevention in China - a new approach.

Authors:  Valery L Feigin; Wenzhi Wang; Hua Fu; Liping Liu; Rita Krishnamurthi; Rohit Bhattacharjee; Priya Parmar; Tasleem Hussein; Suzanne Barker-Collo
Journal:  Neurol Res       Date:  2015-03-28       Impact factor: 2.448

Review 10.  Overview of Meta-Analyses: The Impact of Dietary Lifestyle on Stroke Risk.

Authors:  Emma Altobelli; Paolo Matteo Angeletti; Leonardo Rapacchietta; Reimondo Petrocelli
Journal:  Int J Environ Res Public Health       Date:  2019-09-25       Impact factor: 3.390
