Wenbing Chang, Yinglai Liu, Yiyong Xiao, Xinglong Yuan, Xingxing Xu, Siyue Zhang, Shenghan Zhou.
Abstract
Hypertension outcomes are death or serious complications (such as myocardial infarction or stroke) that may occur in patients with hypertension. These outcomes are a major concern for patients and doctors, who aim to avoid them. However, there is no satisfactory method for predicting them. This paper therefore proposes a method for predicting outcomes from the physical examination indicators of hypertension patients. We divide outcome prediction into two steps. The first step extracts the key features from the patients' many physical examination indicators. The second step uses those key features to predict the patients' outcomes. To this end, we propose a model that combines recursive feature elimination with cross-validation and a classification algorithm. In the first step, we use the recursive feature elimination algorithm to rank the importance of all features and then select the optimal feature subset using cross-validation. In the second step, we use four classification algorithms (support vector machine (SVM), C4.5 decision tree, random forest (RF), and extreme gradient boosting (XGBoost)) to predict patient outcomes from their optimal feature subsets. Prediction performance is evaluated by accuracy, F1 measure, and area under the receiver operating characteristic curve (AUC). Ten-fold cross-validation shows that C4.5, RF, and XGBoost achieve very good prediction results with a small number of features, and that classifiers perform better after feature selection by recursive feature elimination with cross-validation. Among the four classifiers, XGBoost has the best prediction performance: using its optimal feature subset, its accuracy, F1 measure, and AUC are 94.36%, 0.875, and 0.927, respectively. This prediction of hypertension outcomes contributes to the in-depth study of hypertension complications and has strong practical significance.
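The two-step procedure described in the abstract can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (generated by `make_classification`), a random forest stands in for all four classifiers, and the sizes (300 samples, 12 features) are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the patient table (assumption): 300 rows,
# 12 indicators, of which 4 carry signal. No real patient data is used.
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=4, n_redundant=0,
    random_state=0,
)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
estimator = RandomForestClassifier(n_estimators=30, random_state=0)

# Step 1: recursively eliminate the least important feature and keep the
# subset size with the best 10-fold cross-validated accuracy.
selector = RFECV(estimator, step=1, cv=cv, scoring="accuracy")
selector.fit(X, y)
X_selected = selector.transform(X)

# Step 2: evaluate a classifier on the selected features only.
scores = cross_val_score(estimator, X_selected, y, cv=cv, scoring="accuracy")
print("features kept:", selector.n_features_)
print("mean 10-fold accuracy:", round(scores.mean(), 3))
```

Because the features here are selected on the full data set before the second cross-validation, the reported accuracy is optimistic; wrapping both steps in a single `Pipeline` would avoid that.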
Keywords: XGBoost; classification algorithm; feature selection; hypertension outcomes; prediction; recursive feature elimination
Year: 2019 PMID: 31703364 PMCID: PMC6963807 DOI: 10.3390/diagnostics9040178
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
Original features.
| No. | Name | No. | Name | No. | Name |
|---|---|---|---|---|---|
| Baseline data | | Blood biochemical | | 57 | RARMDBP |
| 1 | SEX | 30 | ALT | 58 | LARMSBP |
| 2 | AGE | 31 | AST | 59 | LARMDBP |
| 3 | HEIGHT | 32 | K | 60 | LLEGSBP |
| 4 | WEIGHT | 33 | Na | 61 | BAPWVR |
| 5 | BMI | 34 | Cl | 62 | RLEGSBP |
| 6 | HR | 35 | GLU | 63 | LLEGDBP |
| 7 | PULSE | 36 | CREA | 64 | RLEGDBP |
| 8 | RYSBPL | 37 | BUN | 65 | BAPWVL |
| 9 | RYDBPL | 38 | URIC | 66 | ABIR |
| 10 | HTBEGIN | 39 | HSCRP | 67 | ABIL |
| 11 | ZGSBP | 40 | TG | Dynamic blood pressure | |
| 12 | ZGDBP | 41 | TC | 68 | MEANSBP |
| 13 | PSSBP1 | 42 | HDLC | 69 | MEANDBP |
| 14 | PSDBP1 | 43 | LDLC | 70 | HIGHSBP |
| UCG cardiac vascular ultrasound | | Thyroid function | | 71 | DAYMDBP |
| 15 | AO | 44 | FT3 | 72 | LOWSBP |
| 16 | LA | 45 | FT4 | 73 | LOWDBP |
| 17 | IVSD | 46 | T3 | 74 | DAYMSBP |
| 18 | LV | 47 | T4 | 75 | HIGHDBP |
| 19 | EF | 48 | TSH | 76 | NIHTMSBP |
| 20 | LVPWd | Urine protein | | 77 | NIHTMDBP |
| 21 | RVd | 49 | MAUCR | Breathing sleep | |
| Blood routine | | 50 | HUPRO | 78 | AHI |
| 22 | WBC | Blood sugar | | 79 | APNEA |
| 23 | NEUT | 51 | HBLAC | 80 | HYPOPNEA |
| 24 | RBC | Inflammatory factor | | 81 | SAO2 |
| 25 | HB | 52 | ESR | 82 | MEANSAO2 |
| 26 | PLT | 53 | CRP | Other | |
| Urine routine | | 54 | NTPRO | 83 | HCY |
| 27 | UKET | 55 | ET | 84 | W_DISC_NOHPT |
| 28 | USG | Limb blood pressure | | | |
| 29 | USG1 | 56 | RARMSBP | | |
Explanation of the abbreviations in Table 1.
| No. | Abbreviation | Explanation |
|---|---|---|
| 1 | SEX | sex |
| 2 | AGE | age |
| 3 | HEIGHT | height |
| 4 | WEIGHT | weight |
| 5 | BMI | body mass index |
| 6 | HR | heart rate |
| 7 | PULSE | pulse |
| 8 | RYSBPL | left arm systolic pressure |
| 9 | RYDBPL | left arm diastolic pressure |
| 10 | HTBEGIN | initial hypertension age |
| 11 | ZGSBP | highest systolic blood pressure |
| 12 | ZGDBP | highest diastolic blood pressure |
| 13 | PSSBP1 | normal systolic blood pressure |
| 14 | PSDBP1 | normal diastolic blood pressure |
| 15 | AO | ascending aorta diameter |
| 16 | LA | left atrium |
| 17 | IVSD | ventricular septal thickness |
| 18 | LV | left ventricular end diastolic diameter |
| 19 | EF | ejection fraction |
| 20 | LVPWd | left ventricular posterior wall thickness (diastole) |
| 21 | RVd | right ventricle |
| 22 | WBC | white blood cell |
| 23 | NEUT | percentage of neutrophils |
| 24 | RBC | red blood cells |
| 25 | HB | hemoglobin |
| 26 | PLT | platelet |
| 27 | UKET | ketone body |
| 28 | USG | specific gravity of urine |
| 29 | USG1 | USG tube type |
| 30 | ALT | alanine aminotransferase |
| 31 | AST | aspartate aminotransferase |
| 32 | K | serum potassium |
| 33 | Na | serum sodium |
| 34 | Cl | serum chloride |
| 35 | GLU | blood sugar |
| 36 | CREA | creatinine |
| 37 | BUN | urea nitrogen |
| 38 | URIC | uric acid |
| 39 | HSCRP | high-sensitivity C-reactive protein |
| 40 | TG | triglyceride |
| 41 | TC | total cholesterol |
| 42 | HDLC | high density lipoprotein cholesterol |
| 43 | LDLC | low density lipoprotein cholesterol |
| 44 | FT3 | serum free triiodothyronine |
| 45 | FT4 | free thyroxine |
| 46 | T3 | triiodothyronine |
| 47 | T4 | tetraiodothyronine |
| 48 | TSH | thyroid stimulating hormone |
| 49 | MAUCR | urinary microalbumin/creatinine |
| 50 | HUPRO | 4-hour urine protein quantitation |
| 51 | HBLAC | glycated hemoglobin |
| 52 | HCY | homocysteine |
| 53 | ESR | erythrocyte sedimentation rate |
| 54 | CRP | C-reactive protein |
| 55 | NTPRO | N-terminal pro-brain natriuretic peptide |
| 56 | ET | endothelin |
| 57 | RARMSBP | right upper limb systolic blood pressure |
| 58 | RARMDBP | right upper limb diastolic blood pressure |
| 59 | LARMSBP | left upper limb systolic blood pressure |
| 60 | LARMDBP | left upper limb diastolic blood pressure |
| 61 | RLEGSBP | right lower limb systolic blood pressure |
| 62 | RLEGDBP | right lower limb diastolic blood pressure |
| 63 | LLEGSBP | left lower extremity systolic blood pressure |
| 64 | LLEGDBP | left lower extremity diastolic blood pressure |
| 65 | BAPWVR | right brachial-ankle pulse wave velocity |
| 66 | BAPWVL | left brachial-ankle pulse wave velocity |
| 67 | ABIR | right ankle-brachial index |
| 68 | ABIL | left ankle-brachial index |
| 69 | AHI | apnea-hypopnea index (events per hour) |
| 70 | APNEA | the longest apnea number |
| 71 | HYPOPNEA | the longest hypoventilation time |
| 72 | SAO2 | the lowest SaO2% |
| 73 | MEANSAO2 | the average SaO2% |
| 74 | MEANSBP | 24h mean systolic blood pressure |
| 75 | MEANDBP | 24h mean diastolic blood pressure |
| 76 | HIGHSBP | the highest systolic blood pressure |
| 77 | HIGHDBP | the highest diastolic blood pressure |
| 78 | LOWSBP | the lowest systolic blood pressure |
| 79 | LOWDBP | the lowest diastolic blood pressure |
| 80 | DAYMSBP | daytime average systolic blood pressure |
| 81 | DAYMDBP | daytime mean diastolic blood pressure |
| 82 | NIHTMSBP | nighttime average systolic blood pressure |
| 83 | NIHTMDBP | nighttime average diastolic blood pressure |
| 84 | W_DISC_NOHPT | number of antihypertensive drugs at discharge |
Description of selected hypertension examination indicators.
| Attribute No. | Name | Description | Type | Value Range | Mean Value | Std. |
|---|---|---|---|---|---|---|
| 1 | Sex | Baseline data | Categorical | Male or female (1 or 0) | / | / |
| 2 | Age | Baseline data | Numeric | 15–76 | 38.31 | 11.42 |
| 3 | BMI | Body mass index | Numeric | 10–50.93 | 27.28 | 4.27 |
| 4 | PULSE | Pulse rate | Numeric | 49–121 | 76.28 | 12.65 |
| 5 | RYSBPL | Left arm systolic pressure | Numeric | 95–230 | 151.90 | 22.67 |
| 6 | FT3 | One index of thyroid function | Numeric | 0.74–7.3 | 3.19 | 0.47 |
| 7 | SaO2 | One index of respiratory sleep test | Numeric | 55–96 | 84.13 | 6.54 |
| 8 | meanSBP24h | 24 h mean systolic blood pressure | Numeric | 96–184 | 135.09 | 15.12 |
| 82 | NIHTMDBP | Mean diastolic pressure at night | Numeric | 48–131 | 82.23 | 12.57 |
| 83 | NIHTMSBP | Mean systolic blood pressure at night | Numeric | 87–192 | 128.20 | 17.03 |
| 84 | W_DISC_NOH | Number of antihypertensive drugs at discharge | Categorical | 0,1,2,3,4 | / | / |
Processed data set.
| SEX | AGE | HEIGHT | WEIGHT | BMI | W_DISC_NOH ¹ |
|---|---|---|---|---|---|
| 1 | 36 | 168 | 65 | 23.03 | 3 |
| 1 | 55 | 178 | 105 | 33.13 | 3 |
| 1 | 26 | 172 | 90 | 30.42 | 3 |
| 1 | 36 | 170 | 73 | 25.25 | 2 |
| 0 | 36 | 168 | 75 | 26.57 | 4 |
| 1 | 30 | 178 | 102 | 32.19 | 3 |
| 1 | 34 | 180 | 90 | 27.77 | 3 |
| 0 | 29 | 178 | 60 | 29.38 | 2 |
| 1 | 34 | 180 | 67 | 29.68 | 1 |
| 0 | 38 | 178 | 70 | 23.63 | 0 |
| 1 | 31 | 180 | 65 | 33.95 | 2 |
| 0 | 43 | 173 | 72.5 | 24.22 | 3 |

¹ W_DISC_NOH is the number of antihypertensive drugs at discharge.
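Summary columns such as value range, mean, and standard deviation can be computed from rows like the ones above. The sketch below uses only the AGE and BMI values shown in this excerpt, so its results describe these twelve rows and not the full data set. Using the sample standard deviation (`statistics.stdev`) is an assumption; the source does not say which estimator was used.

```python
import statistics

# AGE and BMI values copied from the twelve processed rows shown above.
columns = {
    "AGE": [36, 55, 26, 36, 36, 30, 34, 29, 34, 38, 31, 43],
    "BMI": [23.03, 33.13, 30.42, 25.25, 26.57, 32.19,
            27.77, 29.38, 29.68, 23.63, 33.95, 24.22],
}

def describe(values):
    """Return range, mean, and sample standard deviation of one column."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample std (assumption)
    }

summary = {name: describe(values) for name, values in columns.items()}
for name, stats in summary.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```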
Confusion matrix of classification results.
| Real Situation | Predicted Positive Class | Predicted Negative Class |
|---|---|---|
| Positive class | TP ¹ | FN ² |
| Negative class | FP ³ | TN ⁴ |

¹ TP is true positive; ² FN is false negative; ³ FP is false positive; ⁴ TN is true negative.
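Accuracy and the F1 measure follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions (F1 as the harmonic mean of precision and recall), which are assumed to match the paper's equations since those are not reproduced here. The function name `classification_metrics` and the example counts are hypothetical.

```python
def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical counts for illustration; not taken from the paper.
metrics = classification_metrics(tp=40, fn=10, fp=5, tn=45)
print(metrics)
```

With these counts, accuracy is 85/100 and F1 equals 2·TP / (2·TP + FP + FN) = 80/95.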
Figure 1. Predictive model construction process. Abbreviations: RF = random forest; AUC = area under the receiver operating characteristic curve; RFE = recursive feature elimination; XGBoost = extreme gradient boosting; SVM = support vector machine; F1 Measure is calculated according to Equation (17).
Number of features in the optimal feature subset of each classifier under the three criteria.
| Criterion | SVM | C4.5 | RF | XGBoost |
|---|---|---|---|---|
| ACC | 3 | 2 | 3 | 4 |
| F1 Measure | 16 | 2 | 6 | 3 |
| AUC | 9 | 2 | 3 | 9 |
Prediction performance (accuracy, F1 measure, and AUC) of each classifier using its optimal feature subset.
| Criterion | SVM | C4.5 | RF | XGBoost |
|---|---|---|---|---|
| ACC (%) | 75.80 | 86.30 | 88.98 | 94.36 |
| F1 Measure | 0.626 | 0.819 | 0.859 | 0.875 |
| AUC | 0.660 | 0.839 | 0.871 | 0.927 |
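The AUC criterion in the table can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal sketch of that pairwise definition follows; the function name `roc_auc`, the labels, and the scores are hypothetical, and this is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical labels (1 = outcome occurred) and classifier scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

Here eight of the nine positive/negative pairs are ordered correctly, so the AUC is 8/9. The pairwise loop is quadratic in the number of samples; a rank-based formulation is preferable for large data sets.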
Figure 2. Prediction performance (accuracy, F1 measure, and AUC) of each classifier using its optimal feature subset. (The blue dotted line represents the position of the maximum or minimum value.)
Figure 3. Relationship between accuracy and number of features used by the classifier: (a) SVM, (b) decision tree, (c) RF, (d) XGBoost.
Figure 4. Relationship between F1 measure and number of features used by the classifier: (a) SVM, (b) decision tree, (c) RF, (d) XGBoost.
Figure 5. Relationship between AUC and number of features used by the classifier: (a) SVM, (b) decision tree, (c) RF, (d) XGBoost.
Figure 6. Feature weighting of XGBoost.
Figure 7. Classification performance of XGBoost for different (A) numbers of estimators and (B) tree depths.
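A comparison over the number of boosted trees and the tree depth, as in the last figure, can be sketched as a small grid search. Everything here is an assumption made for illustration: scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost, the data are synthetic, and the grid values are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data (assumption); no patient data is used.
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=4, random_state=0
)

# Hypothetical grid over the two hyperparameters named in the caption.
param_grid = {"n_estimators": [25, 50, 100], "max_depth": [2, 3, 4]}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),  # stand-in for XGBoost
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

# Mean cross-validated accuracy for every combination in the grid.
results = zip(search.cv_results_["params"],
              search.cv_results_["mean_test_score"])
for params, mean_score in results:
    print(params, round(mean_score, 3))
print("best:", search.best_params_)
```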