Xiaoqing Zhang1,2,3, Mingkai Xu4, Yanru Li1,2,3, Minmin Su1,2,3, Ziyao Xu4, Chunyan Wang1,2,3, Dan Kang1,2,3, Hongguang Li1,2,3, Xin Mu1,2,3, Xiu Ding1,2,3, Wen Xu1,2,3, Xingjun Wang5, Demin Han6,7,8.
Abstract
PURPOSE: To develop an automated framework for sleep stage scoring from PSG via a deep neural network.
Keywords: Deep learning; Obstructive sleep apnea (OSA); Polysomnography (PSG); Sleep staging
Year: 2020 PMID: 31938990 PMCID: PMC7289784 DOI: 10.1007/s11325-019-02008-w
Source DB: PubMed Journal: Sleep Breath ISSN: 1520-9512 Impact factor: 2.816
Demographics and characteristics of datasets
| Characteristic | Training | Validation | Testing | p value |
|---|---|---|---|---|
| Number of participants/epochs | 92/93,788 | 21/20,845 | 152/150,103 | |
| Normal | 13/13081 | 3/2958 | 23/22741 | |
| Mild OSA | 19/19612 | 4/3976 | 23/22580 | |
| Moderate OSA | 17/17163 | 4/4262 | 29/27774 | |
| Severe OSA | 43/43932 | 10/9649 | 77/77008 | |
| Sex (male: female) | 64:28 | 16:4 | 129:33 | > 0.05 |
| Age (median, range) | 42.5 (19–68) | 47.5 (22–57) | 38.0 (79–61) | < 0.05* |
| BMI (kg/m2) (median, range) | 25.95 (16.1–38.4) | 27.65 (18.8–34.0) | 26.55 (13.8–46.3) | > 0.05 |
| TST (min) (median, range) | 423.10 (200.5–577.6) | 436.10 (285.5–510.4) | 426.85 (92.0–578.5) | > 0.05 |
| AHI (events/h) (median, range) | | | | |
| Normal | 1.8 (0.5–4.2) | 1.2 (0.6–2.2) | 1.4 (0.2–4.9) | > 0.05 |
| Mild OSA | 11.1 (5.4–13.5) | 9.7 (7.7–14.3) | 9.1 (5.4–14.1) | > 0.05 |
| Moderate OSA | 19.9 (15.3–29.2) | 18.45 (15.1–29.7) | 23.7 (15.1–28.8) | > 0.05 |
| Severe OSA | 51.8 (30.6–105.3) | 66.6 (37.3–97.7) | 56.9 (30.9–112.4) | > 0.05 |
| Sleep stage (number of epochs) | | | | |
| W | 16,201 | 2339 | 26,112 | |
| N1 | 14,839 | 3574 | 24,489 | |
| N2 | 47,889 | 10,744 | 73,395 | |
| N3 | 1881 | 648 | 3289 | |
| R | 12,978 | 3540 | 22,818 | |
| Minimum SpO2 (%) (median, range) | 85 (51–96) | 83 (37–94) | 83 (35–95) | > 0.05 |
| Number of arousals (median, range) | 79.5 (1–592) | 97 (7–528) | 79.5 (0–692) | > 0.05 |
BMI body mass index, TST total sleep time, AHI apnea–hypopnea index, SpO2 pulse oxygen saturation
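The Normal, Mild, Moderate, and Severe OSA groups are AHI severity bands. The table does not state the cut-offs; a minimal sketch, assuming the conventional thresholds of 5, 15, and 30 events/h (all AHI ranges in the table fall inside these bands):

```python
def osa_severity(ahi: float) -> str:
    """Map an apnea-hypopnea index (events/h) to a severity group.

    Assumes the conventional cut-offs of 5, 15 and 30 events/h;
    the source table does not state them explicitly.
    """
    if ahi < 0:
        raise ValueError("AHI cannot be negative")
    if ahi < 5:
        return "Normal"
    if ahi < 15:
        return "Mild OSA"
    if ahi < 30:
        return "Moderate OSA"
    return "Severe OSA"


# Training-set AHI medians, one per group in the table above
for ahi in (1.8, 11.1, 19.9, 51.8):
    print(ahi, osa_severity(ahi))
```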
Fig. 1 Overall architecture of our method. The left side shows the input signals, which consist of 5 channels: EEG C3/A2, EEG C4/A1, 2-channel EOG, and EMG. The 5-channel signal is divided into 5 groups, shown in the middle of the figure: groups 1 to 4 are EEG C3/A2, EEG C4/A1, and the two EOG channels, respectively, and group 5 consists of all 5 input signals. Each group of signals is then fed into a CNN model for training and prediction. In parallel, a noise detection algorithm detects noise in each group. The right part of the illustration shows the integration step: the colored nodes represent the integration weights of the different CNN models, and the weighted average is taken as the probability of each stage. An "X" on a weight means that the weight is reset to zero because noise was detected. After integration, the output prediction is modified by expert-defined rules
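The integration step in Fig. 1 combines five CNN outputs by a weighted average in which any group flagged as noisy gets zero weight. A minimal sketch, assuming the per-group softmax outputs and weights are already available and that the surviving weights are renormalised (the caption states neither how the weights are obtained nor whether renormalisation is applied; it does not change the winning stage). The weights below are hypothetical, and the expert-defined rules are not modelled:

```python
from __future__ import annotations

from typing import Sequence

STAGES = ("W", "N1", "N2", "N3", "R")


def fuse_predictions(
    probs: Sequence[Sequence[float]],  # one stage-probability vector per CNN group
    weights: Sequence[float],          # integration weight per group (hypothetical)
    noisy: Sequence[bool],             # True if noise was detected in that group
) -> list[float]:
    """Weighted average of per-group stage probabilities, ignoring noisy groups."""
    if not (len(probs) == len(weights) == len(noisy)):
        raise ValueError("probs, weights and noisy must have the same length")
    # Reset the weight of every noisy group to zero (the "X" in Fig. 1).
    eff = [0.0 if bad else w for w, bad in zip(weights, noisy)]
    total = sum(eff)
    if total == 0:
        raise ValueError("all groups were flagged as noisy")
    n_classes = len(probs[0])
    fused = [0.0] * n_classes
    for w, p in zip(eff, probs):
        for k in range(n_classes):
            fused[k] += w * p[k]
    # Assumed renormalisation so the fused vector sums to 1.
    return [v / total for v in fused]


probs = [
    [0.7, 0.1, 0.1, 0.0, 0.1],    # group 1: EEG C3/A2
    [0.6, 0.2, 0.1, 0.0, 0.1],    # group 2: EEG C4/A1
    [0.1, 0.1, 0.7, 0.0, 0.1],    # group 3: EOG (flagged as noisy)
    [0.5, 0.2, 0.2, 0.0, 0.1],    # group 4: EOG
    [0.8, 0.1, 0.05, 0.0, 0.05],  # group 5: all 5 signals
]
weights = [0.15, 0.15, 0.15, 0.15, 0.4]
noisy = [False, False, True, False, False]
fused = fuse_predictions(probs, weights, noisy)
print(STAGES[max(range(len(fused)), key=fused.__getitem__)])  # W
```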
Fig. 3 Confusion matrix for the predicted sleep stages, showing agreement with expert scores. Rows represent the sleep stages scored by the human expert, and columns represent the model predictions for the same epochs of the testing dataset. The diagonal numbers are the epochs for which the model's prediction matches the human expert at each sleep stage. Agreement is highest for W, N2, and R
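Agreement in Fig. 3 and in the tables below is summarised by a confusion matrix and Cohen's kappa. A short sketch of how both can be computed from paired epoch labels; the ten toy labels are invented for illustration and are not study data:

```python
from __future__ import annotations

STAGES = ("W", "N1", "N2", "N3", "R")


def confusion_matrix(expert: list[str], model: list[str]) -> list[list[int]]:
    """Count epochs: rows = expert-scored stage, columns = model-predicted stage."""
    if len(expert) != len(model):
        raise ValueError("expert and model must score the same epochs")
    index = {stage: i for i, stage in enumerate(STAGES)}
    cm = [[0] * len(STAGES) for _ in STAGES]
    for e, m in zip(expert, model):
        cm[index[e]][index[m]] += 1
    return cm


def cohens_kappa(cm: list[list[int]]) -> float:
    """Cohen's kappa = (p_o - p_e) / (1 - p_e)."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(k)) / n                      # observed agreement
    row_tot = [sum(cm[i]) for i in range(k)]                       # expert marginals
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # model marginals
    p_e = sum(r * c for r, c in zip(row_tot, col_tot)) / (n * n)   # chance agreement
    return (p_o - p_e) / (1 - p_e)


# Toy labels for ten epochs (illustrative only)
expert = ["W", "W", "N1", "N2", "N2", "N2", "N3", "R", "R", "N1"]
model = ["W", "N1", "N1", "N2", "N2", "N1", "N2", "R", "R", "N2"]
cm = confusion_matrix(expert, model)
print(round(cohens_kappa(cm), 3))  # 0.474
```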
Model performance with different training algorithms
| Training algorithm | Macro-accuracy | Weighted F1 score | Cohen’s Kappa |
|---|---|---|---|
| Without the 3-epoch splice | 0.8034 | 0.7885 | 0.7044 |
| Without noise detection | 0.8050 | 0.7996 | 0.7105 |
| Without expert rules | 0.8173 | 0.8115 | 0.7266 |
| The proposed model | 0.8181 | 0.8150 | 0.7276 |
AHI apnea–hypopnea index
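Removing the 3-epoch splice causes the largest drop in the ablation above (Cohen's kappa from 0.7276 to 0.7044). The excerpt does not define the splice; a minimal sketch under the assumption that each epoch is joined with its immediate neighbours along the time axis, so the network sees temporal context while predicting the label of the centre epoch. The edge handling (repeating the boundary epoch) is also an assumption; zero-padding would be an equally plausible choice:

```python
from __future__ import annotations


def splice_three_epochs(epochs: list[list[float]]) -> list[list[float]]:
    """For each epoch i, return epochs i-1, i and i+1 concatenated in time.

    Hypothetical reading of the "3-epoch splice". At the start and end of
    the record, the missing neighbour is replaced by the edge epoch itself.
    """
    if not epochs:
        return []
    n = len(epochs)
    spliced = []
    for i in range(n):
        prev_ep = epochs[max(i - 1, 0)]
        next_ep = epochs[min(i + 1, n - 1)]
        spliced.append(prev_ep + epochs[i] + next_ep)
    return spliced


# Toy record: 3 epochs of 2 samples each (a real 30-s epoch holds far more samples)
record = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
for window in splice_three_epochs(record):
    print(window)
# [1.0, 1.0, 1.0, 1.0, 2.0, 2.0]
# [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
# [2.0, 2.0, 3.0, 3.0, 3.0, 3.0]
```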
Model performance on testing dataset according to AHI
| Testing dataset | Macro-accuracy | Weighted-F1 score | Cohen’s Kappa |
|---|---|---|---|
| Normal | 0.8361 | 0.8277 | 0.7560 |
| Mild OSA | 0.8265 | 0.8221 | 0.7433 |
| Moderate OSA | 0.8222 | 0.8153 | 0.7288 |
| Severe OSA | 0.8088 | 0.7981 | 0.7124 |
| Weighted average | 0.8181 | 0.8150 | 0.7276 |
AHI apnea–hypopnea index
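The "Weighted average" macro-accuracy of 0.8181 is consistent with weighting each AHI group by its number of testing epochs (22,741, 22,580, 27,774, and 77,008 in the demographics table). A quick check, assuming that weighting scheme; it applies to the macro-accuracy column only, because pooled F1 and kappa are in general not epoch-weighted averages of per-group values:

```python
from __future__ import annotations

# Per-group macro-accuracy and number of testing epochs (from the tables above)
groups = {
    "Normal": (0.8361, 22741),
    "Mild OSA": (0.8265, 22580),
    "Moderate OSA": (0.8222, 27774),
    "Severe OSA": (0.8088, 77008),
}


def epoch_weighted_mean(values: dict[str, tuple[float, int]]) -> float:
    """Average a per-group metric, weighting each group by its epoch count."""
    total = sum(n for _, n in values.values())
    return sum(v * n for v, n in values.values()) / total


print(f"{epoch_weighted_mean(groups):.4f}")  # 0.8181
```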
Fig. 2 (a) Example of an overnight PSG record scored by the model vs. a human expert. (b) t-SNE visualization of the last hidden layer of the CNN. Each point is colored by the sleep stage scored by the model; the separation between colors suggests that the model discriminates well between sleep stages
Model performance on sleep staging of testing dataset
| Sleep staging | Precision | Recall | F1 score | Number of epochs |
|---|---|---|---|---|
| W | 0.8920 | 0.8680 | 0.8799 | 25,408 |
| N1 | 0.6352 | 0.4667 | 0.5381 | 17,992 |
| N2 | 0.8433 | 0.9059 | 0.8734 | 78,843 |
| N3 | 0.6138 | 0.3919 | 0.4784 | 2100 |
| R | 0.8123 | 0.9171 | 0.8615 | 25,760 |
| Weighted average | 0.8181 | 0.8150 | 0.7276 | 150,103 |
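Per-stage precision, recall, F1, and number of epochs (support) can all be read off a confusion matrix laid out as in Fig. 3. A minimal sketch on a toy 5 × 5 matrix invented for illustration; note that a support-weighted recall always equals overall accuracy:

```python
from __future__ import annotations


def per_class_scores(cm: list[list[int]]) -> list[tuple[float, float, float, int]]:
    """(precision, recall, F1, support) for each class.

    cm[i][j] = number of epochs scored as class i by the expert and
    predicted as class j by the model.
    """
    k = len(cm)
    scores = []
    for c in range(k):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(k))  # column total
        support = sum(cm[c])                          # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append((precision, recall, f1, support))
    return scores


def support_weighted(scores: list[tuple[float, float, float, int]], col: int) -> float:
    """Average column `col` (0=precision, 1=recall, 2=F1) weighted by support."""
    total = sum(s[3] for s in scores)
    return sum(s[col] * s[3] for s in scores) / total


# Toy matrix over (W, N1, N2, N3, R); illustrative only
cm = [
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 2, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 2],
]
scores = per_class_scores(cm)
print(round(support_weighted(scores, 2), 3))  # weighted F1: 0.585
print(round(support_weighted(scores, 1), 3))  # weighted recall = accuracy: 0.6
```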
Top-1 and top-2 model performance on testing dataset according to AHI
| Testing dataset | Top-1 macro-accuracy | Top-2 macro-accuracy | Average increase rate |
|---|---|---|---|
| Normal | 0.8341 | 0.9611 | 0.1270 |
| Mild | 0.8292 | 0.9698 | 0.1407 |
| Moderate | 0.8228 | 0.9512 | 0.1285 |
| Severe | 0.8088 | 0.9619 | 0.1531 |
| The average performance | 0.8184 | 0.9602 | 0.1419 |
AHI apnea–hypopnea index
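Top-2 accuracy counts an epoch as correct when the expert stage is either the most or the second most probable model output, and the "Average increase rate" column matches top-2 minus top-1 accuracy up to rounding (e.g. 0.9611 − 0.8341 = 0.1270 for Normal). A minimal sketch on toy softmax outputs, not study data:

```python
from __future__ import annotations


def top_k_accuracy(probs: list[list[float]], labels: list[int], k: int) -> float:
    """Fraction of epochs whose true stage is among the k most probable stages."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    hits = 0
    for p, y in zip(probs, labels):
        ranked = sorted(range(len(p)), key=lambda c: p[c], reverse=True)
        hits += int(y in ranked[:k])
    return hits / len(labels)


# Toy outputs over (W, N1, N2, N3, R); labels are expert stage indices
probs = [
    [0.70, 0.20, 0.05, 0.00, 0.05],  # true W  -> top-1 hit
    [0.10, 0.40, 0.45, 0.00, 0.05],  # true N1 -> top-2 hit only
    [0.05, 0.15, 0.30, 0.00, 0.50],  # true N1 -> miss (ranked third)
    [0.05, 0.10, 0.80, 0.00, 0.05],  # true N2 -> top-1 hit
]
labels = [0, 1, 1, 2]
top1 = top_k_accuracy(probs, labels, 1)
top2 = top_k_accuracy(probs, labels, 2)
print(top1, top2, top2 - top1)  # 0.5 0.75 0.25
```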
Distribution of sleep staging with two most significant predicted probabilities for each epoch (without expert rules)
| Number of epochs (%*) | Second largest: W | Second largest: N1 | Second largest: N2 | Second largest: N3 | Second largest: R |
|---|---|---|---|---|---|
| Maximum (model output): W | – | 11,984 (47.17%) | 629 (2.47%) | 15 (0.06%) | 2119 (8.33%) |
| Maximum (model output): N1 | 3994 (22.20%) | – | 6294 (34.98%) | 0 (0%) | 2795 (15.53%) |
| Maximum (model output): N2 | 1030 (13.06%) | 52,656 (66.79%) | – | 11,347 (14.39%) | 1392 (9.1 × 10⁻⁴%) |
| Maximum (model output): N3 | 2 (0.09%) | 0 | 1289 (61.38%) | – | 0 |
| Maximum (model output): R | 1265 (6.91%) | 14,886 (57.79%) | 3577 (13.89%) | 0 (0%) | – |
*Proportion relative to the number of testing-dataset epochs of the same stage
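The table cross-tabulates each epoch's most probable stage against its second most probable one. A sketch of that tabulation from softmax outputs, using toy values rather than study data; percentages as in the table would then follow by dividing each cell by the relevant per-stage epoch count:

```python
from __future__ import annotations

from collections import Counter

STAGES = ("W", "N1", "N2", "N3", "R")


def top2_pairs(probs: list[list[float]]) -> Counter:
    """Count (largest, second largest) predicted-stage pairs over all epochs."""
    pairs: Counter = Counter()
    for p in probs:
        first, second = sorted(range(len(p)), key=lambda c: p[c], reverse=True)[:2]
        pairs[(STAGES[first], STAGES[second])] += 1
    return pairs


probs = [
    [0.60, 0.30, 0.05, 0.00, 0.05],  # W, then N1
    [0.55, 0.35, 0.05, 0.00, 0.05],  # W, then N1
    [0.05, 0.30, 0.60, 0.00, 0.05],  # N2, then N1
    [0.05, 0.35, 0.10, 0.00, 0.50],  # R, then N1
]
for (first, second), n in sorted(top2_pairs(probs).items()):
    print(first, second, n)
# N2 N1 1
# R N1 1
# W N1 2
```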
Fig. 4 When participants were grouped according to the growth rate of accuracy, AHI did not differ significantly between the two groups, whereas the number of arousals did (p < 0.01). AHI = apnea–hypopnea index
Comparison of other methods with the proposed method
| Method | Channel | Acc | Macro-average F1 | Per-class F1: W | Per-class F1: N1 | Per-class F1: N2 | Per-class F1: N3 | Per-class F1: R | Cohen’s kappa |
|---|---|---|---|---|---|---|---|---|---|
| Supratak A et al. [ | Fpz-Cz | 0.820 | 0.769 | 0.847 | 0.466 | 0.859 | 0.848 | 0.824 | 0.76 |
| Supratak A et al. [ | Pz-Oz | 0.798 | 0.731 | 0.881 | 0.370 | 0.827 | 0.773 | 0.803 | 0.72 |
| Tsinalis O et al. [ | Fpz-Cz | 0.789 | 0.737 | 0.716 | 0.370 | 0.846 | 0.840 | 0.814 | 0.65 |
| Sun Y et al. [ | Pz-Oz | 0.810 | 0.736 | 0.856 | 0.249 | 0.889 | 0.792 | 0.863 | 0.73 |
| The proposed model | All | 0.836 | 0.781 | 0.864 | 0.498 | 0.887 | 0.845 | 0.816 | 0.77 |
Acc accuracy
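Macro-average F1 is the unweighted mean of the five per-class F1 scores, so a poorly scored minority stage such as N1 lowers it as much as a frequent one. A quick check against the Supratak A et al. (Fpz-Cz) row of the comparison table:

```python
from __future__ import annotations

from typing import Iterable

# Per-class F1, Supratak A et al. (Fpz-Cz) row of the table above
per_class_f1 = {"W": 0.847, "N1": 0.466, "N2": 0.859, "N3": 0.848, "R": 0.824}


def macro_f1(f1_scores: Iterable[float]) -> float:
    """Unweighted mean of per-class F1: every stage counts equally."""
    values = list(f1_scores)
    return sum(values) / len(values)


print(round(macro_f1(per_class_f1.values()), 3))  # 0.769
```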