Abstract
The rise in dementia among the aging Korean population will quickly create a financial burden on society, but timely recognition of early warning signs of dementia and proper responses to its occurrence can enhance medical treatment. Health behavior and medical service usage data are more accessible than clinical data, and a prescreening tool built on such easily accessible data could be a good solution for dementia-related problems. In this paper, we apply a deep neural network (DNN) to the prediction of dementia using health behavior and medical service usage data from 7031 subjects aged over 65, collected from the Korea National Health and Nutrition Examination Survey (KNHANES) in 2001 and 2005. In the proposed model, principal component analysis (PCA) feature extraction and min/max scaling are used to preprocess the data and extract relevant background features. We compared our proposed methodology, a DNN/scaled PCA, with five well-known machine learning algorithms. The proposed methodology achieves an area under the curve (AUC) of 85.5%, a better result than those of the other algorithms. The proposed early prescreening method for possible dementia can be used by both patients and doctors.
Keywords: deep learning; deep neural network; dementia; feature extraction; prediction; principal component analysis
Year: 2021 PMID: 34070100 PMCID: PMC8158341 DOI: 10.3390/ijerph18105386
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Figure 1. The data selection of the study population from KNHANES.
Figure 2. The flow chart of the proposed DNN/scaled PCA approach.
Figure 3. Diverse feature scaling plots with PCA: class 0 is non-dementia patients (blue triangles) and class 1 is dementia patients (red rectangles): (a) 10th and 11th principal components (PCs) with quantile transformer scaler; (b) 17th and 18th PCs with min/max scaler; (c) 19th and 20th PCs with min/max scaler; (d) 9th and 20th PCs with standard scaler; (e) 8th and 9th PCs without scaler; (f) 12th and 13th PCs without scaler.
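The min/max scaling named in panels (b) and (c) can be illustrated with a minimal sketch in plain Python. It assumes the usual column-wise definition, (x - min) / (max - min), applied before PCA; the function name `min_max_scale` and the toy values are hypothetical and are not taken from the paper or from KNHANES.

```python
def min_max_scale(rows):
    """Scale each column of a list-of-rows table to the [0, 1] range."""
    columns = list(zip(*rows))
    scaled_columns = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column has zero span; map it to 0.0 to avoid dividing by zero.
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in col])
    # Transpose back to row-major order (one list per subject).
    return [list(r) for r in zip(*scaled_columns)]


# Toy data (hypothetical): age in years and number of family members.
toy = [[65, 1], [75, 3], [85, 5]]
scaled = min_max_scale(toy)
print(scaled)
```

Scaling first puts features with very different units on a common range, so that no single feature dominates the variance that PCA decomposes.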
AUC results based on the various PCA types.
| AUC | | 0.788 | 0.804 | 0.779 | 0.695 |
The top AUC value is highlighted in bold.
Figure 4. The percentage of variance in PCA-min/max-transformer scaler.
Figure 5. The architecture of the proposed DNN.
AUC results for the various hyperparameter settings.
| Neurons | # of Neurons | 10 | 20 | 30 | 40 | 50 |
| | AUC | 0.783 | 0.783 | | 0.822 | 0.816 |
| Hidden Layers | # of Layers | 2 | 3 | 4 | 5 | 8 |
| | AUC | 0.822 | 0.832 | | 0.815 | 0.741 |
| Epochs | Epoch | 25 | 30 | 40 | 50 | 100 |
| | AUC | 0.788 | 0.795 | 0.814 | | 0.811 |
| Drop-Outs | % of Drop-Out | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 |
| | AUC | 0.817 | 0.821 | 0.795 | | 0.781 |
The top AUC values are highlighted in bold.
Confusion matrix of the proposed model.
| Confusion Matrix Parameters | Predicted (Dementia) | Predicted (Non-Dementia) |
|---|---|---|
| Actual (Dementia) | 23 (TP) | 12 (FN) |
| Actual (Non-Dementia) | 455 (FP) | 1901 (TN) |
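The standard classification metrics follow directly from the four confusion-matrix counts above. The sketch below derives them using the textbook definitions; the function name `confusion_metrics` is hypothetical, and the printed values are computed here from the counts rather than quoted from the paper's performance table.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Return recall, specificity, precision and accuracy as percentages."""
    return {
        "recall": 100 * tp / (tp + fn),        # share of dementia cases detected
        "specificity": 100 * tn / (tn + fp),   # share of non-dementia cases cleared
        "precision": 100 * tp / (tp + fp),     # share of positive calls that are correct
        "accuracy": 100 * (tp + tn) / (tp + fn + fp + tn),
    }


# Counts taken from the confusion matrix of the proposed model.
metrics = confusion_metrics(tp=23, fn=12, fp=455, tn=1901)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```

With only 35 dementia cases against 2356 non-dementia cases in this matrix, precision stays low even when recall and specificity are reasonable, which is typical for a heavily imbalanced prescreening task.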
Performance result of the proposed model with min/max scaled PCA.
| Classification Model | Threshold | Recall (Rc, %) | Specificity (Sp, %) | Precision (Pc, %) | Accuracy (Acc, %) | AUC (%) |
|---|---|---|---|---|---|---|
| DNN | 0.025 | | | | | 85.5 |
| RF | 0.02 | 65.7 | 75.3 | 3.8 | 75.2 | |
| ABC | 0.465 | 62.8 | 73.7 | 3.4 | 73.5 | 74.1 |
| GNB | 0.035 | 65.7 | 79.3 | 4.5 | 79.1 | 77.2 |
| MLP | 0.005 | 54.2 | 79.1 | 3.7 | 78.8 | 75.3 |
| SVC | 0.035 | 65.7 | 64.5 | 2.6 | 64.5 | 67.6 |
The top two AUC values are highlighted in bold, and the DNN results are marked in italics.
Figure 6. Comparison of ROC curves of six classification methods.
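The AUC that summarizes each ROC curve can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it that way in plain Python and also applies a single decision threshold. The function name `roc_auc`, the labels, and the scores are hypothetical; only the threshold of 0.025 is taken from the DNN row of the performance table.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities for five subjects (1 = dementia).
labels = [1, 0, 1, 0, 0]
scores = [0.09, 0.01, 0.03, 0.04, 0.02]

auc = roc_auc(labels, scores)
# One point on the ROC curve corresponds to one threshold on the scores.
predicted = [int(s >= 0.025) for s in scores]
print(auc, predicted)
```

The pairwise loop is quadratic in the number of subjects, which is fine for illustration; a rank-based formulation gives the same value more efficiently on large samples.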
Comparison of performance and methodology.
| Methods | # of Subjects (Normal) | # of Subjects (Dementia) | # of Features | Performance | Note |
|---|---|---|---|---|---|
| RF, SVM [ | 40,736 | 614 | 4894 | AUC (0.775) | - |
| Logistic Regression with LASSO [ | 16,655 | 498 | EHR | AUC (0.809) | Patients with unrecognized dementia |
| MLP, SVM [ | 9799 | 4201 | 14 for phase 1 | F-measure (0.739) | High positive cases |
| PCA/DNN | 6928 | 103 | 22 | AUC (0.855) | - |
Correlation coefficients of the 22 variables.
| Variable | Correlation | Variable | Correlation |
|---|---|---|---|
| year | 0.003220 | arthritis | −0.033356 |
| region | −0.000858 | diabetes | −0.004743 |
| age | | hypertension | −0.034224 |
| gender | 0.035885 | stroke | |
| marital status | −0.059214 | myocardial infarction | −0.004670 |
| education | −0.033166 | tuberculosis | −0.013385 |
| insurance type | 0.004591 | asthma | −0.011954 |
| the number of family members | | chronic renal failure | 0.017108 |
| household income | 0.018608 | smoking status | −0.025947 |
| subjective health status | | drinking | −0.027069 |
| stress awareness | −0.036353 | regular exercise | −0.026474 |
The best correlation (greater than 0.14) is marked in bold, and the next three most informative values (absolute value above 0.07) are marked in red italics.
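A correlation coefficient between one variable and the binary dementia label can be computed as sketched below. The record does not state which coefficient was used, so the Pearson coefficient is an assumption here (with a binary label it coincides with the point-biserial correlation). The function name `pearson` and the toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy data (hypothetical): age in years and dementia label for six subjects.
ages = [66, 70, 74, 78, 82, 86]
dementia = [0, 0, 0, 1, 0, 1]
r = pearson(ages, dementia)
print(f"{r:.3f}")
```

Values near zero, like most entries in the table, indicate that a single variable carries little linear signal on its own, which is consistent with combining all 22 variables through PCA and a DNN.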