Mohammad Nahid Hossain¹, Mohammad Helal Uddin¹, K Thapa¹, Md Abdullah Al Zubaer¹, Md Shafiqul Islam¹, Jiyun Lee¹, JongSu Park¹, S-H Yang².
Abstract
Cognitive impairment has a significant negative impact on global healthcare and on the community. Among older adults, cognition and mental retention are difficult to preserve with aging. Early detection of cognitive impairment can reduce the risk that the disease progresses to permanent mental damage. This paper aims to develop a machine learning model that detects and differentiates cognitive impairment categories (severe, moderate, mild, and normal) by analyzing neurophysical and physical data. Keystroke data and a smartwatch have been used to extract individuals' neurophysical and physical data, respectively. An advanced ensemble learning algorithm, the Gradient Boosting Machine (GBM), is proposed to classify the cognitive severity level (absence, mild, moderate, and severe) based on Standardised Mini-Mental State Examination (SMMSE) questionnaire scores. Pearson's correlation and a wrapper feature selection technique have been used to analyze and select the best features. We then applied the proposed GBM algorithm to those features, and the results show an accuracy of more than 94%. This paper adds a new dimension to the state of the art in predicting cognitive impairment by using neurophysical and physical data together.
Year: 2021 PMID: 34966518 PMCID: PMC8712156 DOI: 10.1155/2021/1302989
Source DB: PubMed Journal: J Healthc Eng ISSN: 2040-2295 Impact factor: 2.682
Figure 1. The graphical abstract of the research plan: (a) data acquisition; (b) data preprocessing and feature analysis; (c) machine learning approach; (d) result analysis.
SMMSE scores of the participants.
| | SMMSE score ≤ 9 | 10 ≤ SMMSE score ≤ 20 | 21 ≤ SMMSE score ≤ 24 | SMMSE score ≥ 25 |
|---|---|---|---|---|
| Participants | 2 | 3 | 6 | 22 |
| Cognitive impairment | Severe | Moderate | Mild | Normal |
| Gender | One male; one female | Two males; one female | Four males; two females | 19 males; three females |
| Age ± SD | 64 ± 1 | 60 ± 2 | 56 ± 1 | 52 ± 2 |
| Months of activity recorded | 6 | 6 | 6 | 6 |
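The severity labels are a threshold lookup on the SMMSE total (0 to 30). The sketch below assumes contiguous, non-overlapping bands (≤ 9 severe, 10 to 20 moderate, 21 to 24 mild, ≥ 25 normal) and reuses the class codes of the dataset sample (0 = N, 1 = MCI, 2 = M, 3 = S); `CognitiveLevel` and `smmse_to_level` are hypothetical names, not the authors' code.

```python
from enum import IntEnum


class CognitiveLevel(IntEnum):
    """Class codes as used in the dataset sample (hypothetical enum)."""
    NORMAL = 0    # N
    MILD = 1      # MCI
    MODERATE = 2  # M
    SEVERE = 3    # S


def smmse_to_level(score: int) -> CognitiveLevel:
    """Map an SMMSE total score to a severity class.

    Assumed bands: <= 9 severe, 10-20 moderate, 21-24 mild, >= 25 normal.
    """
    if not 0 <= score <= 30:
        raise ValueError(f"SMMSE score must be in [0, 30], got {score}")
    if score <= 9:
        return CognitiveLevel.SEVERE
    if score <= 20:
        return CognitiveLevel.MODERATE
    if score <= 24:
        return CognitiveLevel.MILD
    return CognitiveLevel.NORMAL


if __name__ == "__main__":
    for s in (5, 15, 22, 28):
        print(s, smmse_to_level(s).name)  # SEVERE, MODERATE, MILD, NORMAL
```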
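Dataset sample. TNW, TT, ENW, and AVG are application data; AE, QST, WS, ST, HPD, DC, and CT are wearable-device data; Class is the group label (0 (N) = normal, 1 (MCI) = mild, 2 (M) = moderate, 3 (S) = severe).
| TNW | TT (sec) | ENW | AVG (sec) | AE (kcal) | QST (hour) | WS | ST (hour) | HPD | DC (metre) | CT (hour) | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|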
| 10 | 8 | 1 | 1.33 | 305.5 | 7 | 650 | 10 | 84 | 1400 | 1.5 | 0 (N) |
| 12 | 10 | 2 | 1.66 | 240.67 | 8 | 755 | 12 | 78 | 1800 | 2.5 | 1 (MCI) |
| 8 | 7 | 1 | 1 | 332.5 | 6 | 750 | 9 | 65 | 1200 | 2 | 3 (S) |
| 15 | 12 | 2 | 2 | 357.5 | 5 | 595 | 10 | 73 | 1500 | 1.25 | 2 (M) |
| 13 | 10 | 3 | 1.67 | 190.87 | 7 | 580 | 11 | 79 | 1300 | 3 | 3 (S) |
Figure 2. Distribution of the dataset between the original and the augmented data.
Participants' neurophysical and physical activity features.
| Feature | Abbr. | Extraction process |
|---|---|---|
| Total number of words | TNW | $\bar{E}_{t}$ |
| Total time | TT | $\bar{E}_{te}$ represents the total time. In the formula, the time taken by the user is represented by |
| Error count of words | ENW | $\bar{E}$ represents the calculated error count of words. In the formula, |
| Average time | AVG | |
| Absolute energy | AE | Energy is assessed in two main parts: active energy ($e_{ae}$) and resting energy ($e_{re}$). $\bar{E}_{em}$ denotes absolute energy in the given formula. The energy parameter is acquired from the smartwatch on a daily basis. |
| Quality sleeping time | QST | Based on research by National Sleep Foundation experts, an age-specific sleep-duration recommendation is called the Sleep Health Index (SHI) [ |
| Walking steps | WS | The walking-steps value ($e_{ws}$) was recorded over a period of time. $\bar{E}_{ws}$ denotes the walking-steps parameter, which is the sum of the incremental steps over the day. The walking-steps parameter is acquired from the smartwatch on a daily basis. |
| Sitting time | ST | The sitting-time value ($e_{st}$) was recorded while the participant was idle and not sleeping. $\bar{E}_{st}$ denotes the sitting-time parameter, which is the sum of the idle periods over the day. The sitting-time parameter is acquired from the smartwatch on a daily basis. |
| Heart pulse data | HPD | The average heartbeat value is denoted as the base heartbeat value ( |
| Distance covered | DC | The distance-covered value ($e_{dc}$) was recorded while the participant was running. $\bar{E}_{dc}$ denotes the distance-covered parameter, representing the distance the participant ran over the day (recorded in metres in the dataset sample). The distance-covered parameter is acquired from the smartwatch on a daily basis. |
| Cycling time | CT | The cycling-time value ($e_{ct}$) was recorded while the participant was cycling. $\bar{E}_{ct}$ denotes the cycling-time parameter, representing how long the participant cycled over the day, in hours. The cycling-time parameter is acquired from the smartwatch on a daily basis. |
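Several of the smartwatch features above are daily sums (walking steps, sitting time, cycling time). The sketch below shows that aggregation under an assumed record format: the `Interval` class, its activity labels, and `daily_features` are hypothetical, and computing AE as active plus resting energy is an assumption, since the absolute-energy formula itself is not given.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Interval:
    """Hypothetical smartwatch record: an activity label and its duration in hours."""
    activity: str  # assumed labels: "idle", "sleep", "cycling", ...
    hours: float


def daily_features(step_increments: Iterable[int],
                   intervals: Iterable[Interval],
                   active_kcal: float,
                   resting_kcal: float) -> Dict[str, float]:
    """Aggregate one day of hypothetical smartwatch records into WS, ST, CT and AE."""
    intervals = list(intervals)  # iterated twice below
    return {
        # WS: sum of the incremental step counts reported over the day
        "WS": sum(step_increments),
        # ST: sum of the idle periods; sleep has its own label, so it is excluded
        "ST": sum(i.hours for i in intervals if i.activity == "idle"),
        # CT: total hours spent cycling over the day
        "CT": sum(i.hours for i in intervals if i.activity == "cycling"),
        # AE: ASSUMED to be active + resting energy in kcal (formula not given)
        "AE": active_kcal + resting_kcal,
    }


if __name__ == "__main__":
    day = [Interval("idle", 2.5), Interval("sleep", 7.0),
           Interval("cycling", 1.5), Interval("idle", 4.0)]
    print(daily_features([120, 300, 230], day, active_kcal=200.0, resting_kcal=80.0))
    # {'WS': 650, 'ST': 6.5, 'CT': 1.5, 'AE': 280.0}
```

Keeping sleep as a separate label is what lets sitting time count only idle periods while the participant is awake, as the ST definition requires.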
Figure 3. Correlation-matrix heatmap of the features' “r” values.
Figure 4. Correlation-matrix heatmap of the features' “p” values.
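The heatmaps rest on Pearson's correlation coefficient, $r = \sum (x - \bar{x})(y - \bar{y}) / \sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}$. The dependency-free sketch below computes r and a full pairwise matrix like the one in Figure 3; `pearson_r` and `correlation_matrix` are illustrative names. If SciPy is available, `scipy.stats.pearsonr` returns both r and the two-sided p-value of the kind shown in Figure 4.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient r between two equal-length feature columns."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Covariance numerator and the two (unnormalised) spread terms
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined for a constant column")
    return cov / (sd_x * sd_y)


def correlation_matrix(columns: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """All pairwise r values keyed by (feature_a, feature_b), as in a heatmap."""
    names = list(columns)
    return {(a, b): pearson_r(columns[a], columns[b]) for a in names for b in names}


if __name__ == "__main__":
    # TNW and TT columns of the five-row dataset sample, used only to exercise the code
    sample = {"TNW": [10, 12, 8, 15, 13], "TT": [8, 10, 7, 12, 10]}
    for pair, r in correlation_matrix(sample).items():
        print(pair, round(r, 3))
```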
Figure 5. Working procedure of the “GBM” model.
Figure 6. Mathematical modeling block of the “GBM”.
Figure 7. The “GBDT” block with the gamma value in each leaf.
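For reference, the standard gradient-boosting formulation with regression-tree base learners (Friedman's gradient boosting machine) is given below for stages $m = 1, \dots, M$. Each stage fits a tree to the pseudo-residuals $r_{im}$, which partitions the input space into leaf regions $R_{jm}$, and then assigns each leaf a value $\gamma_{jm}$; reading the gamma values in the leaves of Figure 7 as these $\gamma_{jm}$ is an assumption, and the paper's own notation may differ. Here $L$ is the loss and $\nu$ is the learning rate (shrinkage).

$$
\begin{aligned}
F_0(x) &= \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma) \\
r_{im} &= -\left[\frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)}\right]_{F = F_{m-1}}, \quad i = 1, \dots, n \\
\gamma_{jm} &= \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L\big(y_i, F_{m-1}(x_i) + \gamma\big), \quad j = 1, \dots, J_m \\
F_m(x) &= F_{m-1}(x) + \nu \sum_{j=1}^{J_m} \gamma_{jm}\, \mathbb{1}(x \in R_{jm})
\end{aligned}
$$

For a multiclass problem such as the four cognitive levels, the usual variant fits one such tree per class at each stage under a multinomial (softmax) loss and predicts the class with the highest resulting probability.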
The error rate between the actual score and the estimated score.
| Feature | Base subgroup | Weekday subgroup | Weekend subgroup | Wrapper selected features |
|---|---|---|---|---|
| TNW | 5.102 | 4.795 | 4.988 | 3.125 |
| TT | 4.112 | 4.256 | 4.394 | |
| ENW | 4.226 | 4.129 | 4.186 | |
| AVG | 4.274 | 4.202 | 4.195 | |
| AE | 4.245 | 4.195 | 4.113 | |
| QST | 4.114 | 4.052 | 4.123 | |
| WS | 4.218 | 4.253 | 4.209 | |
| ST | 5.044 | 4.725 | 4.826 | |
| HPD | 4.032 | 4.107 | 3.975 | |
| DC | 5.096 | 4.923 | 4.753 | |
| CT | 5.077 | 4.889 | 4.662 | |
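A wrapper method scores candidate feature subsets by training and evaluating the model itself, which is why it yields a single error for the selected subset. The exact wrapper search used is not specified here; the sketch below assumes greedy forward selection, one common variant. `forward_select` is hypothetical, and `error_of` stands in for "train the classifier on this subset and return its (e.g. cross-validated) error".

```python
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple


def forward_select(features: Sequence[str],
                   error_of: Callable[[FrozenSet[str]], float],
                   max_features: Optional[int] = None) -> Tuple[List[str], float]:
    """Greedy forward wrapper selection; error_of(subset) is lower-is-better."""
    selected: List[str] = []
    remaining = list(features)
    best_err = float("inf")
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Evaluate the model once per candidate: current subset + one new feature.
        err, feat = min((error_of(frozenset(selected + [f])), f) for f in remaining)
        if err >= best_err:
            break  # no single addition improves the error, so stop
        selected.append(feat)
        remaining.remove(feat)
        best_err = err
    return selected, best_err


if __name__ == "__main__":
    # Toy error function standing in for "train the model, measure error" (illustrative only)
    gains = {"TNW": 1.0, "HPD": 0.8, "QST": 0.2, "CT": 0.1}

    def toy_error(subset: FrozenSet[str]) -> float:
        return 5.0 - sum(gains[f] for f in subset) + 0.25 * len(subset)

    print(forward_select(list(gains), toy_error))  # selects ['TNW', 'HPD'], error ≈ 3.7
```

Each step costs one model fit per remaining feature, which is the main price of wrapper methods compared with filter statistics such as Pearson's r.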
Figure 8. Distribution of the dataset's feature values. Feature values have been normalized to the range 0 to 1. Each box spans the interquartile range (IQR), from the 25th to the 75th percentile; the line inside the box marks the median (50th percentile), and the “+” signs mark outliers.
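The normalization and the box-plot outliers in Figure 8 can be sketched as min-max scaling plus an IQR rule. The caption does not state the outlier threshold; the sketch assumes the common 1.5 × IQR whisker rule (matplotlib's `boxplot` default, for instance). Quartiles come from `statistics.quantiles`, whose default "exclusive" method may differ slightly from other tools. Function names are illustrative.

```python
import statistics
from typing import List, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale a feature column linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: map everything to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed box-plot outlier rule)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    ws = [650, 755, 750, 595, 580]  # WS column of the dataset sample
    print([round(v, 3) for v in min_max(ws)])
    print(iqr_outliers(list(range(1, 10)) + [100]))  # [100]
```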
Summary of overall classification accuracy.
| Classifier name | Accuracy (%) |
|---|---|
| Gradient boosting machine (GBM) | 94.8 |
| Support vector machine (SVM) | 61.5 |
Summary of classification results.
| Cognitive level | Classifier | Precision (%) | Recall (%) | F1-score (%) | Accuracy (%) |
|---|---|---|---|---|---|
| Normal | GBM | 92.2 | 99.2 | 95.5 | 96.6 |
| | SVM | 80.1 | 94.4 | 87.7 | 87.8 |
| Mild | GBM | 93.3 | 99.2 | 96.5 | 97.7 |
| | SVM | 49.9 | 40.1 | 44.4 | 45.5 |
| Moderate | GBM | 89.1 | 90.2 | 87.8 | 91.1 |
| | SVM | 64.5 | 48.5 | 54.5 | 56.6 |
| Severe | GBM | 91.1 | 92.2 | 88.2 | 94.4 |
| | SVM | 64.5 | 49.9 | 55.5 | 58.7 |
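Per-class metrics in a multiclass setting are normally computed one-vs-rest: each cognitive level in turn is the positive class and the other three are negatives. The sketch below derives precision, recall, F1-score (the harmonic mean of precision and recall), and one-vs-rest accuracy from true and predicted labels. Treating the per-class accuracy this way is an assumption, and `per_class_metrics` is a hypothetical helper.

```python
from typing import Dict, Sequence


def per_class_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                      labels: Sequence[int]) -> Dict[int, Dict[str, float]]:
    """One-vs-rest precision, recall, F1-score and accuracy for each class label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    results = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        tn = n - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[c] = {"precision": precision, "recall": recall,
                      "f1": f1, "accuracy": (tp + tn) / n}
    return results


if __name__ == "__main__":
    # Toy labels using the dataset's codes: 0 = N, 1 = MCI, 2 = M, 3 = S
    y_true = [0, 0, 1, 1, 2, 3]
    y_pred = [0, 1, 1, 1, 2, 2]
    for label, m in per_class_metrics(y_true, y_pred, labels=[0, 1, 2, 3]).items():
        print(label, {k: round(v, 3) for k, v in m.items()})
```

Because one-vs-rest accuracy also counts the true negatives, it can stay high for a class even when that class's precision or recall is poor.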
Figure 9. ROC curves for the classification of each cognitive impairment level. The yellow curve (mild cognitive impairment) shows the highest performance, the red curve (severe cognitive impairment) is slightly lower, and the blue curve (moderate cognitive impairment) shows the lowest performance.
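A per-class ROC curve is typically obtained one-vs-rest: the classifier's predicted probability for one cognitive level is used as a score, and the decision threshold is swept from high to low. How the curves in Figure 9 were computed is not stated, so this is an assumed procedure; `roc_points` and `auc` are illustrative names, and the area is taken with the trapezoidal rule.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], positives: Sequence[bool]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points of a one-vs-rest ROC curve, lowering the threshold score by score."""
    n_pos = sum(1 for p in positives if p)
    n_neg = len(positives) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, positives), key=lambda sp: sp[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples tied at the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    # Hypothetical predicted probabilities for one class vs. the rest
    scores = [0.95, 0.80, 0.70, 0.40, 0.30, 0.10]
    is_class = [True, True, False, True, False, False]
    print(round(auc(roc_points(scores, is_class)), 3))  # 0.889
```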
Figure 10. Comparison between the actual scores and the scores projected by the model: (a) mild; (b) moderate; (c) severe.