Maryam Tayefi¹, Habibollah Esmaeili², Maryam Saberi Karimian³, Alireza Amirabadi Zadeh⁴, Mahmoud Ebrahimi⁵, Mohammad Safarian⁶, Mohsen Nematy⁶, Seyed Mohammad Reza Parizadeh¹, Gordon A Ferns⁷, Majid Ghayour-Mobarhan⁸.
1. Biochemistry and Nutrition Research Center, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.
2. Biochemistry and Nutrition Research Center, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran; Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran.
3. Student Research Committee, Department of Modern Sciences and Technologies, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.
4. Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran.
5. Cardiovascular Research Center, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.
6. Department of Nutrition Research Center, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.
7. Brighton & Sussex Medical School, Division of Medical Education, Falmer, Brighton, Sussex BN1 9PH, UK.
8. Biochemistry and Nutrition Research Center, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran; Cardiovascular Research Center, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran. Electronic address: ghayourm@mums.ac.ir.
Abstract
INTRODUCTION: Hypertension is an important risk factor for cardiovascular disease (CVD). The goal of this study was to identify the factors associated with hypertension using a decision-tree algorithm, a supervised classification method in data mining. METHODS: Data from a cross-sectional study were used. A total of 9078 subjects who met the inclusion criteria were recruited. Of these, 70% (6358 cases) were randomly allocated to the training dataset used to construct the decision tree, and the remaining 30% (2720 cases) formed the testing dataset used to evaluate its performance. Two models were evaluated. In model I, the input variables were age, gender, body mass index, marital status, level of education, occupation status, depression and anxiety status, physical activity level, smoking status, LDL, TG, TC, FBG, uric acid and hs-CRP. In model II, the input variables were age, gender, WBC, RBC, HGB, HCT, MCV, MCH, PLT, RDW and PDW. Model performance was assessed by constructing a receiver operating characteristic (ROC) curve. RESULTS: The prevalence of hypertension in our population was 32%. For decision-tree model I, the accuracy, sensitivity, specificity and area under the ROC curve (AUC) for identifying hypertension-related risk factors were 73%, 63%, 77% and 0.72, respectively. The corresponding values for model II were 70%, 61%, 74% and 0.68, respectively. CONCLUSION: We have developed a decision-tree model to identify the risk factors associated with hypertension that may be used to develop programs for hypertension management.
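The abstract reports test-set accuracy, sensitivity, specificity and AUC. The Python sketch below illustrates how these quantities are conventionally computed from binary labels (1 = hypertensive, 0 = normotensive) and classifier outputs; it is not the authors' code, and the function names and toy inputs are hypothetical. It also offers a rough consistency check on the reported figures: assuming the test-set prevalence is close to the overall 32%, accuracy should be approximately sensitivity × prevalence + specificity × (1 − prevalence), which gives about 72.5% for model I and 69.8% for model II, in line with the reported 73% and 70%.

```python
"""Hypothetical sketch of the evaluation metrics named in the abstract.

Not the authors' implementation; labels use 1 = hypertensive, 0 = not.
"""
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels and binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive scores above a random negative.

    Ties count as one half (equivalent to the Mann-Whitney U statistic).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def implied_accuracy(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Accuracy implied by sensitivity and specificity at a given prevalence."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


if __name__ == "__main__":
    # Hypothetical toy test set: 2 hypertensive and 3 normotensive subjects.
    labels = [1, 1, 0, 0, 0]
    predictions = [1, 0, 0, 0, 1]
    tree_scores = [0.9, 0.4, 0.3, 0.2, 0.6]
    print(classification_metrics(labels, predictions))
    print(roc_auc(labels, tree_scores))
    # Consistency check using the abstract's figures, assuming 32% prevalence.
    print(round(implied_accuracy(0.63, 0.77, 0.32), 3))  # model I
    print(round(implied_accuracy(0.61, 0.74, 0.32), 3))  # model II
```

On the toy inputs, the metrics come out as accuracy 0.6, sensitivity 0.5, specificity 2/3 and AUC 5/6. The pairwise AUC loop is quadratic in the number of subjects, which is fine for illustration; for a test set of 2720 cases, a rank-based implementation or a library routine would be the practical choice.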