Dongyoung Kim¹, Jeonggun Lee¹, Yunhee Woo¹, Jaemin Jeong¹, Chulho Kim²,³, Dong-Kyu Kim³,⁴
Abstract
Recently, deep learning for automated sleep stage classification has been introduced with promising results. However, automatic sleep scoring algorithms are still not widely used because many challenges impede their routine application. Polysomnography (PSG) typically records multiple channels to achieve higher accuracy, but this requires the patient to spend one or more nights in the laboratory wearing uncomfortable sensors and wires. To avoid the inconvenience of multiple channels, we aimed to develop a deep learning model for use in clinical decision support systems (CDSSs). The model combines convolutional neural networks (CNNs) and a transformer for supervised learning of three sleep stage classes (wake, non-REM, and REM) using only single-channel EEG data (C4-M1). The training, validation, and test data were derived from 1590, 341, and 343 PSG recordings, respectively. The developed model achieved an overall accuracy of 91.4%, comparable with that of human experts. By obstructive sleep apnea severity, the model's accuracy was 94.3%, 91.9%, 91.9%, and 90.6% in normal, mild, moderate, and severe cases, respectively. Our deep learning model enables accurate and rapid three-class sleep staging and could be useful as a CDSS in real-world clinical practice.
Keywords: EEG; deep learning; neural network; sleep scoring; sleep staging
Year: 2022 PMID: 35207623 PMCID: PMC8880374 DOI: 10.3390/jpm12020136
Source DB: PubMed Journal: J Pers Med ISSN: 2075-4426
Figure 1. Representative raw data samples from each sleep stage. Bandpass filtering was applied to the raw polysomnographic data to reduce noise and artifacts.
Figure 2. Simplified deep learning model architecture for automated polysomnography analysis. (a) To build the model input from the preprocessed data, each single epoch is split into a sequence of windows. When Window > Stride, consecutive windows overlap by the difference, Window − Stride. (b) Overall end-to-end architecture of our transformer-based deep learning model with a CNN. (c) Architecture of the inner and outer transformer models, which have independent sets of weight parameters.
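Panel (a) describes cutting one epoch into a sequence of windows whose overlap is Window − Stride. The following is a minimal sketch of that segmentation step, assuming a 1-D NumPy signal per epoch; the function name `segment_epoch`, the 100 Hz sampling rate, the 30-s epoch length, and the window/stride values are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def segment_epoch(signal: np.ndarray, window: int, stride: int) -> np.ndarray:
    """Split one epoch of a 1-D signal into a sequence of fixed-length windows.

    When window > stride, consecutive windows overlap by (window - stride) samples.
    Trailing samples that cannot fill a whole window are dropped.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    n_windows = (len(signal) - window) // stride + 1
    if n_windows < 1:
        raise ValueError("signal is shorter than one window")
    # Row k holds the sample indices [k*stride, k*stride + window).
    idx = (np.arange(n_windows) * stride)[:, None] + np.arange(window)[None, :]
    return signal[idx]

if __name__ == "__main__":
    fs = 100                          # hypothetical sampling rate (Hz)
    epoch = np.random.randn(30 * fs)  # one 30-s epoch (assumed duration)
    windows = segment_epoch(epoch, window=200, stride=100)
    print(windows.shape)              # (29, 200): 100-sample overlap between neighbours
```

Fancy indexing returns a copy; for very long recordings `numpy.lib.stride_tricks.sliding_window_view` offers a read-only view instead.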
Summary of datasets (number of epochs per sleep stage; percentages are relative to each row's total).
| Dataset Type (No. of Patients) | Wake | Non-REM | REM |
|---|---|---|---|
| Training dataset (1590) | 262,511 (23%) | 702,824 (62%) | 163,277 (15%) |
| Validation dataset (341) | 53,644 (23%) | 147,607 (63%) | 34,761 (14%) |
| Testing dataset (343) | 57,662 (23%) | 151,401 (62%) | 34,866 (14%) |
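The table counts epochs, but the split is reported per patient (recording), which suggests that all epochs of one recording stay in a single subset so that no patient contributes to both training and testing. The sketch below shows such a recording-level split under two assumptions: one recording per patient, and a seeded random shuffle. The function name `split_by_recording` and the ID format are hypothetical; the source does not describe how recordings were assigned.

```python
import random

def split_by_recording(recording_ids, n_train, n_val, n_test, seed=0):
    """Partition recording IDs into train/val/test with no recording shared."""
    ids = sorted(set(recording_ids))
    if n_train + n_val + n_test != len(ids):
        raise ValueError("split sizes must add up to the number of recordings")
    rng = random.Random(seed)   # fixed seed -> reproducible split
    rng.shuffle(ids)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test

if __name__ == "__main__":
    ids = [f"psg_{i:04d}" for i in range(1590 + 341 + 343)]  # hypothetical IDs
    train, val, test = split_by_recording(ids, 1590, 341, 343)
    print(len(train), len(val), len(test))  # 1590 341 343
```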
Figure 3. Performance of the deep neural network model as a function of input sequence length. The learning curve begins to plateau when more than 7 sequential epochs are used.
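A multi-epoch model sees several consecutive epochs per prediction. The following is a minimal sketch of assembling such sequences of length L (e.g., 7) from one recording. Centring the target epoch in its sequence and padding the recording edges by repetition are assumptions made for illustration; the authors' model may use a different context layout. The name `make_sequences` is hypothetical.

```python
import numpy as np

def make_sequences(epochs: np.ndarray, seq_len: int) -> np.ndarray:
    """Build one context sequence of `seq_len` consecutive epochs per target epoch.

    epochs: array of shape (n_epochs, ...) holding one recording in time order.
    Returns an array of shape (n_epochs, seq_len, ...). The target epoch is the
    centre of its sequence, and the recording edges are padded by repeating the
    first/last epoch (both are illustrative choices).
    """
    if seq_len < 1 or seq_len % 2 == 0:
        raise ValueError("seq_len must be a positive odd number")
    half = seq_len // 2
    pad_width = [(half, half)] + [(0, 0)] * (epochs.ndim - 1)
    padded = np.pad(epochs, pad_width, mode="edge")
    # Row i selects padded[i : i + seq_len], i.e. original epochs i-half .. i+half.
    idx = np.arange(len(epochs))[:, None] + np.arange(seq_len)[None, :]
    return padded[idx]

if __name__ == "__main__":
    feats = np.random.randn(840, 128)   # hypothetical: 840 epochs x 128 features
    seqs = make_sequences(feats, seq_len=7)
    print(seqs.shape)                   # (840, 7, 128)
```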
Profile of datasets based on the severity of obstructive sleep apnea.
| Dataset | Severity | Wake | Non-REM | REM |
|---|---|---|---|---|
| Training (1590) | Normal (109) | 19,062 (23%) | 50,660 (61%) | 13,323 (16%) |
| | Mild (229) | 34,908 (21%) | 104,518 (62%) | 28,373 (17%) |
| | Moderate (336) | 57,794 (23%) | 149,596 (61%) | 38,747 (16%) |
| | Severe (916) | 150,747 (24%) | 398,050 (63%) | 82,834 (13%) |
| Validation (341) | Normal (23) | 3066 (18%) | 11,163 (64%) | 3145 (18%) |
| | Mild (49) | 7341 (21%) | 22,227 (63%) | 5556 (16%) |
| | Moderate (72) | 11,995 (24%) | 31,829 (62%) | 7200 (14%) |
| | Severe (197) | 31,242 (24%) | 82,388 (62%) | 18,860 (14%) |
| Testing (343) | Normal (24) | 3260 (18%) | 11,952 (65%) | 3200 (17%) |
| | Mild (50) | 8367 (23%) | 22,677 (62%) | 5326 (15%) |
| | Moderate (72) | 12,443 (23%) | 32,361 (61%) | 8576 (16%) |
| | Severe (197) | 33,592 (25%) | 84,411 (62%) | 17,764 (13%) |
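The severity groups above stratify patients by obstructive sleep apnea severity, but this record does not state the cut-offs. A minimal sketch, assuming the widely used apnea-hypopnea index (AHI, events per hour) thresholds of <5 (normal), 5 to <15 (mild), 15 to <30 (moderate), and ≥30 (severe); the function name `osa_severity` is hypothetical.

```python
def osa_severity(ahi: float) -> str:
    """Map an apnea-hypopnea index (events/hour) to an OSA severity group.

    Assumes the conventional cut-offs 5, 15, and 30 events/hour.
    """
    if ahi < 0:
        raise ValueError("AHI cannot be negative")
    if ahi < 5:
        return "Normal"
    if ahi < 15:
        return "Mild"
    if ahi < 30:
        return "Moderate"
    return "Severe"

if __name__ == "__main__":
    for ahi in (2.0, 9.5, 22.0, 41.3):   # made-up AHI values
        print(ahi, osa_severity(ahi))
```

Grouping recordings by this label before computing metrics yields per-severity results of the kind reported later in this record.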
Figure 4. Number of training iterations required to reach the best accuracy. One iteration is one pass through all samples in the training set (a training epoch in the machine-learning sense, distinct from a sleep-scoring epoch). (a) Training, validation, and test accuracy trends for the single-epoch model. (b) Training, validation, and test accuracy trends for the multi-epoch model.
Deep learning model performance for three-class sleep scoring.
| Metric | Wake | Non-REM | REM |
|---|---|---|---|
| Recall | 0.85 | 0.95 | 0.85 |
| Precision | 0.89 | 0.93 | 0.88 |
| F1 score | 0.87 | 0.94 | 0.86 |
| Cohen’s Kappa (overall) | 0.84 | | |
| Macro F1-score (overall) | 0.89 | | |
| Weighted accuracy (overall) | 88.49% | | |
| Accuracy (overall) | 91.45% | | |
Figure 5. Classification performance of the deep learning model for sleep stage scoring. The confusion matrix shows the numbers of correctly and incorrectly classified samples as percentages, together with the precision and recall for each class.
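All metrics in the three-class performance table can be derived from a confusion matrix like the one in Figure 5. The sketch below computes per-class recall, precision, and F1, as well as macro F1, accuracy, and Cohen's kappa. "Weighted accuracy" is not defined in this record. The sketch assumes it is the mean of per-class recall (balanced accuracy), which agrees with the per-class recalls reported here but is not confirmed by the source. The function name and the toy matrix are hypothetical and are not the paper's data.

```python
import numpy as np

def sleep_scoring_metrics(cm):
    """Per-class and overall metrics from a confusion matrix.

    cm[i, j] = number of epochs whose true stage is i and predicted stage is j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    true_totals = cm.sum(axis=1)   # row sums: epochs per true stage
    pred_totals = cm.sum(axis=0)   # column sums: epochs per predicted stage
    n = cm.sum()

    recall = tp / true_totals
    precision = tp / pred_totals
    f1 = 2 * precision * recall / (precision + recall)

    accuracy = tp.sum() / n
    p_chance = (true_totals * pred_totals).sum() / n ** 2  # expected chance agreement
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "macro_f1": f1.mean(),
        "balanced_accuracy": recall.mean(),  # assumed meaning of "weighted accuracy"
        "accuracy": accuracy,
        "kappa": kappa,
    }

if __name__ == "__main__":
    # Toy confusion matrix (rows/cols: Wake, NREM, REM); NOT the paper's data.
    cm = [[8, 2, 0],
          [1, 17, 2],
          [0, 1, 9]]
    m = sleep_scoring_metrics(cm)
    print(round(m["accuracy"], 2), round(m["kappa"], 2))  # 0.85 0.76
```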
Performance of the deep learning model by obstructive sleep apnea severity.
| Severity | Cohen’s Kappa | Macro F1-score | Weighted accuracy | Accuracy |
|---|---|---|---|---|
| Normal | 0.89 | 0.92 | 92.35% | 94.18% |
| Mild | 0.88 | 0.92 | 92.10% | 93.82% |
| Moderate | 0.85 | 0.89 | 89.55% | 91.60% |
| Severe | 0.82 | 0.87 | 87.86% | 90.39% |
Figure 6. Real-time processing performance for sleep stage classification: inference speed as a function of CPU core clock frequency.
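Inference speed of the kind plotted in Figure 6 is usually measured as wall-clock latency per classified epoch. The following is a minimal, framework-agnostic timing sketch. `dummy_predict` is a hypothetical stand-in for a trained model, and the warm-up and repeat counts are arbitrary choices, not the authors' benchmark protocol.

```python
import statistics
import time
from typing import Any, Callable, Sequence

def median_latency_ms(predict: Callable[[Any], Any], inputs: Sequence[Any],
                      warmup: int = 3, repeats: int = 10) -> float:
    """Median wall-clock latency (ms) of a single predict(x) call."""
    if not inputs:
        raise ValueError("inputs must not be empty")
    for x in inputs[:warmup]:          # warm-up calls are not timed
        predict(x)
    timings = []
    for _ in range(repeats):
        for x in inputs:
            start = time.perf_counter()
            predict(x)
            timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)

if __name__ == "__main__":
    def dummy_predict(epoch):           # hypothetical stand-in for the trained model
        return max(range(3), key=lambda k: sum(epoch[k::3]))

    fake_epochs = [[float(i % 7) for i in range(3000)] for _ in range(5)]
    print(f"median latency: {median_latency_ms(dummy_predict, fake_epochs):.3f} ms")
```

The median is reported rather than the mean so that occasional scheduler or garbage-collection pauses do not dominate the estimate. Real-time operation only requires that the latency stays well below the duration of one scoring epoch.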
Comparison of performance between previous studies and the present study.
| Setting | Model | Method | Recall (Wake) | Recall (NREM) | Recall (REM) | Precision (Wake) | Precision (NREM) | Precision (REM) | Accuracy (%) |
|---|---|---|---|---|---|---|---|---|---|
| Single-Epoch | DeepSleepNet [ | CNN | 0.76 | 0.93 | 0.74 | 0.90 | 0.89 | 0.68 | 86.06 |
| | AttnSleep [ | CNN | 0.80 | 0.94 | 0.78 | 0.90 | 0.91 | 0.78 | 88.71 |
| | The present work | Transformer | 0.83 | 0.95 | 0.79 | 0.89 | 0.91 | 0.81 | 89.50 |
| Multi-Epoch | DeepSleepNet [ | CNN + RNN | 0.84 | 0.95 | 0.78 | 0.87 | 0.92 | 0.86 | 89.88 |
| | AttnSleep [ | CNN + RNN | 0.82 | 0.96 | 0.85 | 0.91 | 0.92 | 0.85 | 90.93 |
| | The present work -1 | Transformer + RNN | 0.85 | 0.95 | 0.86 | 0.89 | 0.93 | 0.86 | 91.38 |
| | The present work -2 | Inner + Outer Transformer | 0.85 | 0.95 | 0.85 | 0.89 | 0.93 | 0.88 | 91.45 |
Performance of the single-channel deep learning model by channel type (EEG/EOG) and electrode position.
| Setting | Type | Channel | Recall (Wake) | Recall (NREM) | Recall (REM) | Precision (Wake) | Precision (NREM) | Precision (REM) | Accuracy (%) |
|---|---|---|---|---|---|---|---|---|---|
| Single-Epoch | EEG | C3-M2 | 0.81 | 0.95 | 0.75 | 0.90 | 0.90 | 0.81 | 88.97 |
| | | C4-M1 | 0.83 | 0.95 | 0.79 | 0.89 | 0.91 | 0.81 | 89.50 |
| | | F3-M2 | 0.82 | 0.95 | 0.80 | 0.90 | 0.91 | 0.81 | 89.44 |
| | | F4-M1 | 0.80 | 0.95 | 0.83 | 0.91 | 0.91 | 0.79 | 89.40 |
| | | O1-M2 | 0.78 | 0.92 | 0.79 | 0.89 | 0.90 | 0.69 | 86.74 |
| | | O2-M1 | 0.78 | 0.93 | 0.78 | 0.90 | 0.90 | 0.73 | 87.46 |
| | EOG | E1-M2 | 0.81 | 0.95 | 0.81 | 0.89 | 0.91 | 0.82 | 89.32 |
| | | E2-M1 | 0.81 | 0.93 | 0.84 | 0.89 | 0.92 | 0.78 | 89.07 |
| Multi-Epoch | EEG | C3-M2 | 0.85 | 0.95 | 0.83 | 0.89 | 0.93 | 0.87 | 91.10 |
| | | C4-M1 | 0.85 | 0.95 | 0.85 | 0.89 | 0.93 | 0.88 | 91.45 |
| | | F3-M2 | 0.86 | 0.95 | 0.84 | 0.88 | 0.93 | 0.88 | 91.21 |
| | | F4-M1 | 0.87 | 0.94 | 0.84 | 0.87 | 0.94 | 0.88 | 91.18 |
| | | O1-M2 | 0.83 | 0.94 | 0.82 | 0.89 | 0.92 | 0.82 | 89.79 |
| | | O2-M1 | 0.83 | 0.95 | 0.82 | 0.88 | 0.92 | 0.85 | 90.17 |
| | EOG | E1-M2 | 0.83 | 0.96 | 0.86 | 0.90 | 0.93 | 0.88 | 91.27 |
| | | E2-M1 | 0.83 | 0.96 | 0.86 | 0.90 | 0.93 | 0.87 | 91.23 |
Deep learning model performance according to channel combinations.
| | Channel | Recall (Wake) | Recall (NREM) | Recall (REM) | Precision (Wake) | Precision (NREM) | Precision (REM) | Accuracy (%) |
|---|---|---|---|---|---|---|---|---|
| | C4-M1 | 0.85 | 0.95 | 0.85 | 0.89 | 0.93 | 0.88 | 91.45 |
| | C4 + EMG | 0.85 | 0.95 | 0.89 | 0.90 | 0.94 | 0.86 | 91.70 |
| | C4 + E2(EOG) | 0.86 | 0.95 | 0.89 | 0.89 | 0.94 | 0.89 | 92.16 |
| | C4 + EMG + E2(EOG) | 0.87 | 0.95 | 0.90 | 0.89 | 0.95 | 0.88 | 92.27 |
| | C4 + F4 + O2 + EMG + E2(EOG) | 0.88 | 0.95 | 0.90 | 0.88 | 0.95 | 0.89 | 92.41 |
| Multi-EEG | 2 EEG | 0.86 | 0.95 | 0.88 | 0.90 | 0.94 | 0.87 | 92.02 |
| | 3 EEG | 0.86 | 0.95 | 0.89 | 0.90 | 0.94 | 0.88 | 92.24 |
| | 4 EEG | 0.84 | 0.96 | 0.87 | 0.90 | 0.93 | 0.87 | 91.76 |
| | 5 EEG | 0.87 | 0.95 | 0.89 | 0.89 | 0.94 | 0.89 | 92.47 |
| | 6 EEG | 0.85 | 0.96 | 0.88 | 0.91 | 0.94 | 0.89 | 92.33 |
Summary of the Sleep Heart Health Study (SHHS) dataset.
| Dataset Type (No. of Patients) | Wake | Non-REM | REM |
|---|---|---|---|
| Training dataset (3884) | 1,134,126 (29%) | 2,247,931 (57%) | 548,083 (14%) |
| Validation dataset (832) | 243,558 (29%) | 482,490 (57%) | 116,685 (14%) |
| Testing dataset (834) | 242,631 (29%) | 481,646 (57%) | 119,010 (14%) |
Performance of the single-channel EEG-based deep learning model on the public SHHS dataset.
| Setting | Type | Channel | Recall (Wake) | Recall (NREM) | Recall (REM) | Precision (Wake) | Precision (NREM) | Precision (REM) | Accuracy |
|---|---|---|---|---|---|---|---|---|---|
| Single-Epoch | EEG | C4-A1 | 0.821 | 0.961 | 0.653 | 0.958 | 0.856 | 0.817 | 87.69% |
| | | C3-A2 | 0.822 | 0.939 | 0.766 | 0.943 | 0.879 | 0.777 | 88.08% |
| Multi-Epoch | EEG | C4-A1 | 0.891 | 0.960 | 0.835 | 0.951 | 0.920 | 0.879 | 92.26% |
| | | C3-A2 | 0.895 | 0.943 | 0.864 | 0.930 | 0.928 | 0.856 | 91.82% |