| Literature DB >> 32606434 |
Fernando López-Martínez1,2, Edward Rolando Núñez-Valdez1, Rubén González Crespo3, Vicente García-Díaz1.
Abstract
This paper focuses on a neural network classification model to estimate the association among gender, race, BMI, age, smoking, kidney disease and diabetes in hypertensive patients. It also shows that artificial neural network techniques applied to large clinical data sets may provide a meaningful data-driven approach to categorizing patients for population health management, and may support the detection and control of hypertension, one of the critical risk factors for heart disease. Data were obtained from the National Health and Nutrition Examination Survey (NHANES) from 2007 to 2016. The study used an imbalanced data set of 24,434 patients, of whom 69.71% were non-hypertensive and 30.29% hypertensive. The results indicate a sensitivity of 40%, a specificity of 87%, a precision of 57.8% and a measured AUC of 0.77 (95% CI [75.01–79.01]). These results are somewhat better than those of a previous study by the authors, which used a statistical model with similar input features and obtained an AUC of 0.73. The classification model can be used as an inference agent to assist professionals in the heart disease field, and can be implemented in applications that help population health management programs identify patients at high risk of developing hypertension.
Year: 2020 PMID: 32606434 PMCID: PMC7327031 DOI: 10.1038/s41598-020-67640-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Related work.
| Author | Input features | n Total | Type of model | AUC (%) |
|---|---|---|---|---|
| LaFreniere et al. | Age, gender, BMI, sys/diast BP, high and low density lipoproteins, triglycerides, cholesterol, microalbumin, and urine albumin creatinine ratio | 379,027 | Backpropagation neural network | 82 |
| Polak and Mendyk | Age, sex, diet, smoking and drinking habits, physical activity level and BMI | 159,989 | Backpropagation (BP) and fuzzy network | 75 |
| Tang et al. | Sys/diast BP, fasting plasma glucose, age, BMI, heart rate, gender, WC, diabetes, renal profile | 2,092 | Feed-forward, back-propagation neural network | 76 |
| Ture et al. | Age, sex, hypertension, smoking, lipoprotein, triglyceride, uric acid, total cholesterol, BMI | 694 | Feed-forward neural network | 81 |
| Lynn et al. | Sixteen genes, age, BMI, fasting blood sugar, hypertension medication, no history of cancer, kidney, liver or lung | 22,184 genes, 159 cases | One-hidden-layer neural network | 96.72 |
| Sakr et al. | Age, gender, race, reason for test, stress, medical history | 23,095 | Backpropagation neural network | 64 |
| López-Martínez et al. | Age, gender, ethnicity, BMI, smoking history, kidney disease, diabetes | 24,434 | Three-hidden-layer neural network | 77 |
n samples by hypertensive class, gender and race.
| Class | Gender | Race | n |
|---|---|---|---|
| Hypertensive | Female | Mexican American | 464 |
| | | Non-Hispanic black | 925 |
| | | Non-Hispanic white | 1,433 |
| | | Other Hispanic | 368 |
| | | Other race—including multi-racial | 277 |
| | Male | Mexican American | 575 |
| | | Non-Hispanic black | 1,039 |
| | | Non-Hispanic white | 1,582 |
| | | Other Hispanic | 371 |
| | | Other race—including multi-racial | 365 |
| Non-hypertensive | Female | Mexican American | 1,461 |
| | | Non-Hispanic black | 1,676 |
| | | Non-Hispanic white | 3,663 |
| | | Other Hispanic | 1,084 |
| | | Other race—including multi-racial | 1,038 |
| | Male | Mexican American | 1,275 |
| | | Non-Hispanic black | 1,465 |
| | | Non-Hispanic white | 3,585 |
| | | Other Hispanic | 820 |
| | | Other race—including multi-racial | 968 |
| Total | | | 24,434 |
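The per-group counts in the table can be cross-checked with a few lines of Python; the numbers below are taken directly from the table rows.

```python
# Per-group sample counts from the table above (hypertensive and
# non-hypertensive, by gender and race).
hypertensive = [464, 925, 1433, 368, 277, 575, 1039, 1582, 371, 365]
non_hypertensive = [1461, 1676, 3663, 1084, 1038, 1275, 1465, 3585, 820, 968]

n_pos = sum(hypertensive)      # hypertensive samples
n_neg = sum(non_hypertensive)  # non-hypertensive samples
total = n_pos + n_neg          # 24,434, matching the table's total row

# Class imbalance: roughly 30% positive vs 70% negative, in line with
# the proportions reported in the abstract.
print(total, round(100 * n_pos / total, 2), round(100 * n_neg / total, 2))
```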
Variables included in the model.
| Variable name | Description | Code | Meaning |
|---|---|---|---|
| Gender | Gender | 1 | Male |
| | | 2 | Female |
| Agerange | Age at screening (date of birth was used to calculate age) | 1 | 20–30 |
| | | 2 | 31–40 |
| | | 3 | 41–50 |
| | | 4 | 51–60 |
| | | 5 | 61–70 |
| | | 6 | 71–80 |
| Race | Race/Hispanic origin | 1 | Mexican American |
| | | 2 | Other Hispanic |
| | | 3 | Non-Hispanic white |
| | | 4 | Non-Hispanic black |
| | | 5 | Other race—including multi-racial |
| BMXBMI | Body mass index (kg/m²) | 1 | Underweight = <18.5 |
| | | 2 | Normal weight = 18.5–24.9 |
| | | 3 | Overweight = 25–29.9 |
| | | 4 | Obesity = BMI of 30 or greater |
| Kidney | Ever told you had weak/failing kidneys | 1 | Yes |
| | | 2 | No |
| Smoke | Smoked at least 100 cigarettes in life | 1 | Yes |
| | | 2 | No |
| Diabetes | Doctor told you have diabetes | 1 | Yes |
| | | 2 | No |
| | | 3 | Borderline |
| Hypclass | Hypertension class derived from mean systolic blood pressure (mm Hg) | 0 | Non-hypertensive |
| | | 1 | Hypertensive |
Chi-squared test between each variable and the hypertension class.
| Feature | p value | Chi-squared score |
|---|---|---|
| Gender | 0.3988107 | 0.711909 |
| Agerange | 0.000000 | 1,965.607023 |
| Race | 0.008822 | 6.858521 |
| BMIrange | 0.0172385 | 5.67193 |
| Kidney | 0.3546428 | 0.856775 |
| Smoke | 0.0975246 | 2.745566 |
| Diabetes | 0.0012164 | 10.465222 |
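Scores of the kind shown above come from a Pearson chi-squared test of independence between a feature and the class label. A minimal pure-Python sketch of the statistic; the contingency counts below are hypothetical, for illustration only:

```python
# Pearson chi-squared statistic for a feature-vs-class contingency table.
def chi2_stat(table):
    """table: list of rows, each row a list of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column factors.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical 2x2 table: smoker yes/no vs hypertensive yes/no.
print(round(chi2_stat([[30, 70], [20, 80]]), 3))  # → 2.667
```

The p values in the table would then come from the chi-squared distribution with the appropriate degrees of freedom, e.g. via `scipy.stats.chi2.sf`.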
Fig. 1 Decision boundary.
Fig. 2 Draw test points.
Fig. 3 Relation between BMI and age by gender and hypertension class.
Fig. 4 Multilayer perceptron architecture.
Model architecture parameters.
| Parameter | Value |
|---|---|
| Input dimension | 7 |
| Num output classes | 2 |
| Num hidden layers | 3 |
| Hidden layer 1 dimension | 64 |
| Activation func layer 1 | ReLU |
| Hidden layer 2 dimension | 32 |
| Activation func layer 2 | ReLU |
| Hidden layer 3 dimension | 16 |
| Activation func layer 3 | ReLU |
| Minibatch size | 10 |
| Num samples to train | 17,104 |
| Num minibatches to train | 1,710 |
| Loss function | Cross entropy with softmax |
| Eval error | Classification error |
| Learner for parameters | Momentum SGD |
| Learning rate | 0.01 |
| Momentum | 0.9 |
| Eval metrics | Confusion matrix, AUC |
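The architecture in the table (7 inputs, three ReLU hidden layers of 64/32/16 units, and a 2-class softmax output) can be sketched as a plain NumPy forward pass. This is an illustrative sketch, not the authors' original implementation: weights are randomly initialized here and the training loop (momentum SGD on the cross-entropy loss) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from the architecture table: 7 -> 64 -> 32 -> 16 -> 2.
sizes = [7, 64, 32, 16, 2]
weights = [rng.standard_normal((m, n)) * 0.1 for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

def forward(x):
    """Forward pass: three ReLU hidden layers, then a softmax output."""
    for w, b in zip(weights[:-1], biases[:-1]):
        x = relu(x @ w + b)
    return softmax(x @ weights[-1] + biases[-1])

# One minibatch of 10 samples with 7 features, matching the table.
batch = rng.standard_normal((10, 7))
probs = forward(batch)  # shape (10, 2); each row sums to 1
```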
Fig. 5 Training error.
Fig. 6 Loss error.
Fig. 7 Test prediction error.
Confusion matrix.
| | Predicted non-hypertensive | Predicted hypertensive |
|---|---|---|
| Actual non-hypertensive | 4,477 | 648 |
| Actual hypertensive | 1,318 | 887 |
Classification report.
| True positive | False negative | Precision | Accuracy |
|---|---|---|---|
| 887 | 1,318 | 0.578 | 0.732 |
| False positive | True negative | Recall | f1-score |
| 648 | 4,477 | 0.402 | 0.474 |

Positive label: 1; negative label: 0.
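The reported metrics follow directly from the confusion-matrix counts, as a quick Python check confirms:

```python
# Counts from the confusion matrix: true/false positives and negatives.
tp, fn, fp, tn = 887, 1318, 648, 4477

precision = tp / (tp + fp)                          # ≈ 0.578
recall = tp / (tp + fn)                             # sensitivity, ≈ 0.402
specificity = tn / (tn + fp)                        # ≈ 0.87, the 87% in the abstract
accuracy = (tp + tn) / (tp + tn + fp + fn)          # ≈ 0.732
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.474
```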
Decision jungle parameters.
| Parameter | Value |
|---|---|
| Resampling method | Bagging |
| Trainer mode | Single parameter |
| Number of decision DAGs | 8 |
| Maximum depth of the decision DAGs | 32 |
| Maximum width of the decision DAGs | 128 |
| Number of optimization steps per layer | 2,048 |
Classification report.
| Method | True positive | False negative | False positive | True negative | Precision | Accuracy | Recall | f1-score |
|---|---|---|---|---|---|---|---|---|
| Our model | 887 | 1,318 | 648 | 4,477 | 0.578 | 0.732 | 0.402 | 0.474 |
| Decision jungle | 540 | 912 | 390 | 3,045 | 0.581 | 0.734 | 0.372 | 0.453 |
| Logistic regression | 557 | 895 | 389 | 3,046 | 0.589 | 0.737 | 0.384 | 0.465 |
| Support vector machine | 556 | 896 | 387 | 3,048 | 0.59 | 0.737 | 0.383 | 0.464 |
| Boosted decision tree | 568 | 884 | 439 | 2,996 | 0.564 | 0.729 | 0.391 | 0.462 |
| Bayes point machine | 543 | 909 | 388 | 3,047 | 0.583 | 0.735 | 0.374 | 0.456 |
| Synthetic minority oversampling | 3,645 | 789 | 1,326 | 2,086 | 0.73 | 0.73 | 0.82 | 0.77 |
Positive label: 1; negative label: 0.
Logistic regression parameters.
| Parameter | Value |
|---|---|
| Optimization tolerance | 1.00E−07 |
| L1 regularization weight | 1 |
| L2 regularization weight | 1 |
| Memory size for L-BFGS | 20 |
Support vector machine parameters.
| Parameter | Value |
|---|---|
| Lambda (weight for L1 regularization) | 1.00E−03 |
| Normalize features before training | Yes |
Boosted decision tree parameters.
| Parameter | Value |
|---|---|
| Maximum number of leaves per tree | 20 |
| Minimum number of training instances | 10 |
| Learning rate | 0.2 |
| Number of trees constructed | 100 |
Bayes point machine parameters.
| Parameter | Value |
|---|---|
| Number of training iterations | 30 |
| Bias added to each instance in training | Yes |
Synthetic minority oversampling parameters.
| Parameter | Value |
|---|---|
| SMOTE percentage | 200 |
| Number of nearest neighbors | 5 |
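With a SMOTE percentage of 200 and 5 nearest neighbors, as parameterized above, two synthetic samples are generated per minority point by interpolating toward a randomly chosen nearest neighbor. A minimal pure-Python sketch of this idea (not the module used in the paper; the minority points below are hypothetical):

```python
import random

def smote(minority, percentage=200, k=5, seed=0):
    """Generate percentage/100 synthetic samples per minority point by
    interpolating between it and one of its k nearest neighbors."""
    rng = random.Random(seed)
    n_new = percentage // 100
    synthetic = []
    for x in minority:
        # k nearest neighbors of x (excluding x), by squared Euclidean distance.
        neighbors = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        for _ in range(n_new):
            nb = rng.choice(neighbors)
            gap = rng.random()  # interpolation factor in [0, 1)
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Tiny hypothetical minority class in 2-D; 200% -> 12 synthetic samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.8), (1.3, 1.1)]
new_points = smote(minority)
```

Because each synthetic point is a convex combination of two minority points, it always lies between them in feature space.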
Classification methods comparison.
| Method | Precision | Accuracy | f1-score | AUC |
|---|---|---|---|---|
| Our model | 0.578 | 0.732 | 0.474 | 0.77 |
| Decision jungle | 0.581 | 0.734 | 0.453 | 0.769 |
| Logistic regression | 0.589 | 0.737 | 0.465 | 0.764 |
| Support vector machine | 0.59 | 0.737 | 0.464 | 0.759 |
| Boosted decision tree | 0.564 | 0.729 | 0.462 | 0.765 |
| Bayes point machine | 0.583 | 0.735 | 0.456 | 0.763 |
| Synthetic minority oversampling | 0.73 | 0.73 | 0.77 | 0.8 |
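The AUC column can be computed without tracing an explicit ROC curve: AUC equals the probability that a randomly chosen positive receives a higher predicted score than a randomly chosen negative (the Mann-Whitney statistic). A minimal sketch with hypothetical scores:

```python
def auc(scores_pos, scores_neg):
    """AUC as P(score_pos > score_neg), counting ties as 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical predicted probabilities for 4 positives and 4 negatives.
pos = [0.9, 0.8, 0.6, 0.4]
neg = [0.7, 0.5, 0.3, 0.2]
print(auc(pos, neg))  # → 0.8125
```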
Predictive ability tests.
| | DJ | LR | SVM | BDT | BPM |
|---|---|---|---|---|---|
| ANN | 0.001 (3.65) | 0.035 (1.80) | 0.001 (4.03) | 0.036 (1.80) | 0.011 (1.67) |
Fig. 8 ROC curve.
Fig. 9 Precision/recall.