Jaram Park1, Jeong-Whun Kim2,3, Borim Ryu1, Eunyoung Heo1, Se Young Jung1,4, Sooyoung Yoo1.
Abstract
BACKGROUND: Prevention and management of chronic diseases are the main goals of national health maintenance programs. Widely used screening tools, such as Health Risk Appraisal, fall short of this goal because of limitations such as their static characteristics, accessibility, and generalizability. Hypertension is one of the most important chronic diseases requiring management via the nationwide health maintenance program, and health care providers should inform patients about their risk of complications caused by hypertension.
Keywords: chronic disease; clustering and classification; decision support systems; health risk appraisal; hypertension; risk
Year: 2019 PMID: 30767907 PMCID: PMC6396076 DOI: 10.2196/11757
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Statistics of National Health Insurance Service dataset (2002-2013). The precise percentage of the numbers in this table cannot be provided because the official total numbers are unavailable. However, we believe that each dataset contains almost all the medical records of the sampled patients, since South Korea has a mandatory social insurance system that meets the universal coverage of the population and medical institutions.
| Description | Health check-up cohort (n) | National sample cohort (n) |
| Hospitals | 51,920 | 52,483 |
| Patients | 514,866 | 1,113,656 |
| Prescriptions | 83,935,395 | 83,935,395 |
| Visits | 96,534,359 | 119,362,188 |
| Diagnostic codes (full code name) | 17,385 | 19,626 |
| Diagnostic codes (first 3-digits) | 2160 | 2319 |
| Annual patient visits, mean | 15.6 | 8.9 |
| Diagnostic codes/visit, mean | 2.4 | 2.5 |
| Drugs/prescription, mean | 4.4 | 4.4 |
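The mean annual visits per patient in the table can be cross-checked against the visit and patient totals. The sketch below assumes the mean is simply total visits divided by patients and by the 12 calendar years of the dataset (2002-2013); the source does not state how the authors computed it, and the function name is illustrative.

```python
def mean_annual_visits(visits: int, patients: int, years: int) -> float:
    """Average number of visits per patient per year (assumed definition)."""
    return visits / patients / years

# 2002 through 2013 inclusive is 12 calendar years.
YEARS = 2013 - 2002 + 1

# Totals taken from the table above.
checkup = mean_annual_visits(96_534_359, 514_866, YEARS)
national = mean_annual_visits(119_362_188, 1_113_656, YEARS)

print(round(checkup, 1), round(national, 1))
```

Under this assumption both results round to the reported means for the two cohorts.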
Figure 1. Feature matrix construction. CCV: cardio-cerebrovascular.
Figure 2. Flowchart of the evaluation strategy.
Figure 3. Flowchart describing the study population.
General characteristics of the study population.
| Variable | Hypertension only (n=44,203) | Cardio-cerebrovascular (n=30,332) | Cardiovascular (n=21,617) | Cerebrovascular (n=18,042) |
| Age (years), mean (SD) | 57.1 (9.1) | 60.5 (9.6) | 60.3 (9.5) | 61.8 (9.4) |
| Female, n (%) | 19,036 (43.1) | 14,253 (47.0) | 9911 (45.7) | 8933 (49.5) |
| Male, n (%) | 25,167 (56.9) | 16,079 (53.0) | 11,706 (54.3) | 9109 (50.5) |
| Body mass index, mean (SD) | 24.5 (2.9) | 24.5 (3.0) | 24.6 (3.0) | 24.4 (3.0) |
| None, n (%) | 29,144 (65.9) | 20,645 (68.1) | 14,583 (67.5) | 12,558 (69.6) |
| Past, n (%) | 4035 (9.1) | 2453 (8.1) | 1799 (8.3) | 1326 (7.3) |
| Current, n (%) | 8647 (19.6) | 5716 (18.8) | 4174 (19.3) | 3249 (18.0) |
| Nondrinker, n (%) | 23,645 (53.5) | 17,825 (58.8) | 12,611 (58.3) | 10,979 (60.9) |
| Drinker, n (%) | 19,647 (44.4) | 11,869 (39.1) | 8568 (39.6) | 6652 (36.9) |
| Systolic blood pressure, mean (SD) | 137.5 (18.6) | 137.2 (18.6) | 137.2 (18.7) | 137.5 (18.6) |
| Diastolic blood pressure, mean (SD) | 85.3 (12.0) | 84.6 (11.9) | 84.6 (12.0) | 84.4 (11.8) |
| Total cholesterol, mean (SD) | 202.2 (38.4) | 203.8 (39.5) | 204.2 (39.7) | 203.7 (39.0) |
| Fasting blood sugar level, mean (SD) | 102.3 (35.1) | 104.5 (39.4) | 105.0 (40.8) | 105.0 (39.6) |
| Diabetes, n (%)a | 2616 (5.9) | 2250 (7.4) | 1863 (8.6) | 1543 (8.6) |
| Hyperlipidemia, n (%)a | 3784 (8.6) | 3026 (10.0) | 2495 (11.5) | 1982 (11.0) |
a The percentages for this variable may not add up to 100% because of missing values.
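The effect of missing values on the percentages can be illustrated with the None, Past, and Current rows of the hypertension-only group. The sketch below is a minimal check, assuming each percentage is the count divided by the full group size (n=44,203) and that the whole shortfall corresponds to missing values; the variable names are illustrative.

```python
# Hypertension-only group size and the three category counts from the table.
group_size = 44_203
counts = {"None": 29_144, "Past": 4_035, "Current": 8_647}

# Percentage of the full group in each category, rounded as in the table.
percents = {name: round(100 * n / group_size, 1) for name, n in counts.items()}

# Patients not assigned to any category (assumed to be missing values).
missing = group_size - sum(counts.values())
missing_pct = round(100 * missing / group_size, 1)

print(percents, missing, missing_pct)
```

Under these assumptions the three categories reproduce the tabulated percentages and together cover fewer patients than the group total, which is why the column does not sum to 100%.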
Performance of prediction of each outcome across the models with all significant features (N=555).
| Prediction outcome and algorithm | Within test: accuracy | Within test: recall | Within test: precision | Within test: F1-score | External test: accuracy | External test: recall | External test: precision | External test: F1-score |
| Logistic regression | .797 | .807 | .721 | .762 | .609 | .007 | .869 | .013 |
| Support vector machine | .796 | .803 | .722 | .760 | .610 | .009 | .019 |
| Decision tree | .780 | .691 | .749 | .740 | .737 | .650 | .691 |
| Random forest | .793 | .799 | .718 | .757 | .644 |
| Multilayer perceptron | .806 | .803 | .742 | .771 | .616 | .034 | .754 | .065 |
| Long short-term memory | .772 | .681 | .716 | .553 | .613 |
| Logistic regression | .748 | .784 | .540 | .640 | .732 | .048 | .091 |
| Support vector machine | .743 | .797 | .533 | .639 | .068 | .747 | .125 |
| Decision tree | .707 | .492 | .613 | .673 | .449 | .572 |
| Random forest | .723 | .798 | .509 | .622 | .685 | .787 | .461 |
| Multilayer perceptron | .782 | .727 | .098 | .547 | .166 |
| Logistic regression | .741 | .757 | .471 | .581 | .002 | .005 |
| Support vector machine | .733 | .776 | .463 | .580 | .769 | .002 | .795 | .004 |
| Decision tree | .672 | .405 | .544 | .662 | .735 | .381 | .501 |
| Random forest | .698 | .812 | .427 | .560 | .674 | .397 |
| Multilayer perceptron | .787 | .769 | .001 | .833 | .001 |
a The highest scores are presented in italics.
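The F1-score is the harmonic mean of precision and recall, so a model with very low recall scores close to zero even when its precision is high. The sketch below applies the standard formula to the first logistic regression row of the table; the function name is illustrative, and because the inputs are the rounded table values the results only approximate the reported F1-scores.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# First logistic regression row: within test, then external test.
within = f1_score(precision=0.721, recall=0.807)
external = f1_score(precision=0.869, recall=0.007)

print(round(within, 3), round(external, 3))
```

The within-test value matches the tabulated .762 to three decimals, while the external-test value stays below .02 because a recall of .007 dominates the harmonic mean.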
Figure 4. Model evaluation (F1-score) results based on the number of features across 6 models. LR: logistic regression; SVM: support vector machine; DT: decision tree; RF: random forest; MLP: multilayer perceptron; LSTM: long short-term memory.
Prediction for cardio-cerebrovascular according to the number of features across 6 models.
| Number of features and algorithm | Within test: accuracy | Within test: recall | Within test: precision | Within test: F1-score | External test: accuracy | External test: recall | External test: precision | External test: F1-score |
| Logistic regression | .742 | .762 | .658 | .706 | .707 | .801 | .595 | .683 |
| Support vector machine | .742 | .764 | .657 | .707 | .707 | .803 | .594 | .683 |
| Decision tree | .752 | .656 | .730 | .712 | .858 | .593 | .701 |
| Random forest | .771 | .813 | .684 | .743 | .858 |
| Multilayer perceptron | .766 | .778 | .691 | .732 | .722 | .815 | .610 | .698 |
| Long short-term memory | .752 | .618 | .491 | .620 |
| Logistic regression | .788 | .793 | .717 | .753 | .609 | .006 | .011 |
| Support vector machine | .790 | .789 | .721 | .753 | .609 | .009 | .814 | .017 |
| Decision tree | .774 | .684 | .749 | .735 | .758 | .637 | .692 |
| Random forest | .786 | .813 | .706 | .756 | .643 |
| Multilayer perceptron | .802 | .785 | .742 | .763 | .609 | .012 | .690 | .024 |
| Long short-term memory | .796 | .653 | .770 | .521 | .611 |
| Logistic regression | .797 | .807 | .721 | .762 | .609 | .007 | .013 |
| Support vector machine | .796 | .803 | .722 | .760 | .610 | .009 | .877 | .019 |
| Decision tree | .780 | .691 | .749 | .740 | .737 | .650 | .691 |
| Random forest | .793 | .799 | .718 | .757 | .644 |
| Multilayer perceptron | .806 | .803 | .742 | .771 | .616 | .034 | .754 | .065 |
| Long short-term memory | .772 | .681 | .716 | .553 | .613 |
a The highest scores are presented in italics.