Orestis Tsinalis, Paul M Matthews, Yike Guo.
Abstract
We developed a machine learning methodology for automatic sleep stage scoring. Our time-frequency analysis-based feature extraction is fine-tuned to capture sleep stage-specific signal features as described in the American Academy of Sleep Medicine manual that the human experts follow. We used ensemble learning with an ensemble of stacked sparse autoencoders for classifying the sleep stages. We used class-balanced random sampling across sleep stages for each model in the ensemble to avoid skewed performance in favor of the most represented sleep stages, and addressed the problem of misclassification errors due to class imbalance while significantly improving worst-stage classification. We used an openly available dataset from 20 healthy young adults for evaluation. We used a single channel of EEG from this dataset, which makes our method a suitable candidate for longitudinal monitoring using wearable EEG in real-world settings. Our method achieves high overall accuracy (78%, range 75–80%), high mean $F_1$-score (84%, range 82–86%) and high mean accuracy across individual sleep stages (86%, range 84–88%) over all subjects. The performance of our method appears to be uncorrelated with the sleep efficiency and percentage of transitional epochs in each recording.
Keywords: Deep learning; EEG; Electroencephalography; Ensemble learning
Year: 2015 PMID: 26464268 PMCID: PMC4837220 DOI: 10.1007/s10439-015-1444-y
Source DB: PubMed Journal: Ann Biomed Eng ISSN: 0090-6964 Impact factor: 3.934
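The class-balanced random sampling mentioned in the abstract can be illustrated with a short sketch. This is not the authors' implementation: drawing every stage down to the size of the least represented stage, sampling without replacement, and all function names and toy labels are assumptions made for illustration.

```python
import numpy as np


def balanced_sample_indices(labels, rng):
    """Return indices holding the same number of epochs for every sleep stage.

    Assumption: each stage is undersampled (without replacement) to the size
    of the least represented stage.
    """
    labels = np.asarray(labels)
    stages, counts = np.unique(labels, return_counts=True)
    n_per_stage = counts.min()
    picked = [
        rng.choice(np.flatnonzero(labels == stage), size=n_per_stage, replace=False)
        for stage in stages
    ]
    return np.sort(np.concatenate(picked))


def ensemble_training_sets(labels, n_models, seed=0):
    """Draw one independent class-balanced subset per model in the ensemble."""
    rng = np.random.default_rng(seed)
    return [balanced_sample_indices(labels, rng) for _ in range(n_models)]


# Toy, made-up label distribution with N1 as the rarest stage.
toy_labels = np.array(["W"] * 50 + ["N1"] * 5 + ["N2"] * 30 + ["N3"] * 10 + ["R"] * 15)
subsets = ensemble_training_sets(toy_labels, n_models=3)
for idx in subsets:
    stages, counts = np.unique(toy_labels[idx], return_counts=True)
    print(dict(zip(stages.tolist(), counts.tolist())))
```

Each model in this sketch sees a different random subset, so the majority stages are still covered across the ensemble even though every individual training set is balanced.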
The Rechtschaffen and Kales sleep staging criteria [20], adapted from [22].
| Sleep stage | Scoring criteria |
|---|---|
| Non-REM 1 (N1) | 50% of the epoch consists of relatively low voltage mixed (2–7 Hz) activity, and <50% of the epoch contains alpha (8–13 Hz) activity. Slow rolling eye movements lasting several seconds often seen in early N1. |
| Non-REM 2 (N2) | Appearance of sleep spindles and/or K-complexes and <20% of the epoch may contain high voltage (>75
| Non-REM 3 (N3) | 20–50% (formerly N3) or >50% (formerly N4) of the epoch consists of high voltage (>75 |
| REM (R) | Relatively low voltage mixed (2–7 Hz) frequency EEG with episodic rapid eye movements and absent or reduced chin EMG activity. |
| Wake (W) | >50% of the epoch consists of alpha (8–13 Hz) activity or low voltage, mixed (2–7 Hz) frequency activity. |
The transition rules summarized from the AASM sleep scoring manual [11, Chapter IV: Visual Rules for Adults, pp. 23–31].
| Stage pair | Transition pattern | Rule | Differentiating features |
|---|---|---|---|
| N1–N2 | N1-{N1,N2} | 5.A.Note.1 | Arousal, K-complexes, sleep spindles |
| | (N2-)N2-{N1,N2}(-N2) | 5.B.1 | K-complexes, sleep spindles |
| | | 5.C.1.b | Arousal, K-complexes, sleep spindles |
| | N2-{N1-N1,N2-N2}-N2 | 5.C.1.c | Alpha, body movement, slow eye movement |
| N1–R | R-R-{N1,R}-N2 | 7.B | Chin EMG tone |
| | | 7.C.1.b | Chin EMG tone |
| | | 7.C.1.c | Chin EMG tone, arousal, slow eye movement |
| | R-{N1-N1-N1,R-R-R} | 7.C.1.d | Alpha, body movement, slow eye movement |
| N2–R | R-R-{N2,R}-N2 | 7.C.1.e | Sleep spindles |
| | (N2-)N2-{N2,R}-R(-R) | 7.D.1 | Chin EMG tone |
| | | 7.D.2 | Chin EMG tone, K-complexes, sleep spindles |
| | | 7.D.3 | K-complexes, sleep spindles |
Curly braces indicate choice between the stages or stage progressions in the set based on the distinctive features, and parentheses indicate optional epochs
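The pattern notation can be made concrete by expanding a pattern into every epoch sequence it allows. The sketch below is only an illustration of the notation (curly braces as a choice, parentheses as optional epochs); the function name `expand_pattern` and the regular expression are hypothetical helpers, not part of the AASM manual or of the authors' method.

```python
import itertools
import re

# One token is either "(optional)", "{choice,choice}" or a literal run of text.
TOKEN = re.compile(r"\(([^()]*)\)|\{([^{}]*)\}|([^(){}]+)")


def expand_pattern(pattern):
    """List every stage sequence allowed by a transition pattern."""
    options = []
    for optional, choice, literal in TOKEN.findall(pattern):
        if optional:
            options.append(["", optional])      # optional epoch: absent or present
        elif choice:
            options.append(choice.split(","))   # exactly one alternative is taken
        else:
            options.append([literal])
    sequences = ["".join(parts) for parts in itertools.product(*options)]
    return list(dict.fromkeys(sequences))       # drop duplicates, keep order


print(expand_pattern("N1-{N1,N2}"))
print(expand_pattern("N2-{N1-N1,N2-N2}-N2"))
print(len(expand_pattern("(N2-)N2-{N1,N2}(-N2)")))
```

For example, `N1-{N1,N2}` expands to the two sequences `N1-N1` and `N1-N2`, which are the alternatives the listed differentiating features are meant to separate.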
Peak frequencies and number of wavelet cycles per frequency for time-frequency analysis using complex Morlet wavelets.
| Target frequency band | Target sleep stages | Frequency or time precision | Peak frequency (Hz) | Number of wavelet cycles |
|---|---|---|---|---|
| Slow (0.5–2 Hz) | N3 | Time | 0.7 | 3 |
| Slow (0.5–2 Hz) | N3 | Time | 1 | 3 |
| Slow (0.5–2 Hz) | N3 | Time | 1.5 | 3 |
| Slow (0.5–2 Hz) | N3 | Time | 2 | 3 |
| K-complex (1.6–4 Hz) | N2 | Time | 2 | 3 |
| K-complex (1.6–4 Hz) | N2 | Time | 3.2 | 3 |
| delta/theta (2–7 Hz) | N1,R,W | Intermediate | 3 | 5 |
| delta/theta (2–7 Hz) | N1,R,W | Intermediate | 4 | 5 |
| delta/theta (2–7 Hz) | N1,R,W | Intermediate | 5 | 5 |
| delta/theta (2–7 Hz) | N1,R,W | Intermediate | 6 | 5 |
| alpha (8–13 Hz) | N1,W | Frequency | 8 | 10 |
| alpha (8–13 Hz) | N1,W | Frequency | 10 | 10 |
| alpha (8–13 Hz) | N1,W | Frequency | 12 | 10 |
| Spindle (12–15 Hz) | N2,N3 | Time | 12 | 3 |
| Spindle (12–15 Hz) | N2,N3 | Time | 13 | 3 |
| Spindle (12–15 Hz) | N2,N3 | Time | 14 | 3 |
| Spindle (12–15 Hz) | N2,N3 | Time | 15 | 3 |
| beta (15–30 Hz) | N1 (arousal) | Time | 16 | 3 |
| beta (15–30 Hz) | N1 (arousal) | Time | 18 | 3 |
| beta (15–30 Hz) | W | Intermediate | 20 | 5 |
| gamma (30–100 Hz)a | N1,N2,N3,R,W | Intermediate | 40 | 5 |
a There is evidence in the literature that features from modalities other than EEG, such as eye movements [27], stage R sleep [13] and EMG activity [8, 25], can manifest themselves in the gamma activity of EEG.
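The trade-off in the table (few cycles for time precision, many cycles for frequency precision) can be sketched with a plain NumPy complex Morlet wavelet. This is a minimal illustration, not the authors' code: the 100 Hz sampling rate, the 30 s epoch, the truncation at four standard deviations, the normalisation and the synthetic test signal are all assumptions.

```python
import numpy as np


def morlet_power(signal, fs, peak_freq, n_cycles):
    """Time-resolved power at peak_freq from a complex Morlet wavelet.

    The Gaussian width in seconds is n_cycles / (2 * pi * peak_freq), so fewer
    cycles give a shorter wavelet (better time precision) and more cycles a
    narrower passband (better frequency precision).
    """
    sigma_t = n_cycles / (2.0 * np.pi * peak_freq)
    half = int(round(4 * sigma_t * fs))          # truncate at +/- 4 sigma (assumption)
    t = np.arange(-half, half + 1) / fs
    wavelet = np.exp(2j * np.pi * peak_freq * t) * np.exp(-t**2 / (2 * sigma_t**2))
    wavelet = wavelet / np.abs(wavelet).sum()    # unit-gain normalisation (assumption)
    analytic = np.convolve(signal, wavelet, mode="same")
    return np.abs(analytic) ** 2


fs = 100.0                                       # assumed sampling rate in Hz
time = np.arange(0, 30, 1 / fs)                  # one assumed 30 s epoch
eeg = np.sin(2 * np.pi * 10 * time)              # synthetic 10 Hz "alpha" signal

alpha_power = morlet_power(eeg, fs, peak_freq=10, n_cycles=10)  # alpha row of the table
theta_power = morlet_power(eeg, fs, peak_freq=3, n_cycles=5)    # delta/theta row of the table
print(alpha_power.mean() > theta_power.mean())
```

With a pure 10 Hz input the 10 Hz wavelet responds strongly while the 3 Hz wavelet stays close to zero, which is the selectivity each peak frequency in the table is meant to provide.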
Features extracted from the single channel EEG signal.
| Feature | Number | Purpose | Transform |
|---|---|---|---|
| Power of frequency-band power over the entire epoch | 22 | Capture the overall presence of the particular frequency band in the signal | |
| Power of frequency-band power using a sliding window | 231 | Capture the presence of the particular frequency band in the signal across time | |
| Time-domain signal power over the entire epoch | 1 | Capture the overall amplitude characteristics of the signal | |
| Time-domain signal power using a sliding window | 11 | Capture the amplitude characteristics of the signal over time | |
| Frequency-band power-power correlation | 242 | Capture the relationships between the different frequency bands over time | None |
| Time-domain signal autocorrelation | 50 | Capture long-term dependencies in the signal | |
| ALL | 557 | | |
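As a rough illustration of the sliding-window features, the sketch below computes time-domain signal power over an epoch and checks that the per-group counts in the table add up to 557. The window length (5 s), step (2.5 s), 100 Hz sampling rate and the use of mean squared amplitude as "power" are assumptions chosen only so that a 30 s epoch yields 11 values; the source does not state these parameters.

```python
import numpy as np


def sliding_window_power(x, fs, window_s=5.0, step_s=2.5):
    """Mean squared amplitude in overlapping windows (hypothetical parameters)."""
    win = int(round(window_s * fs))
    step = int(round(step_s * fs))
    starts = range(0, len(x) - win + 1, step)
    return np.array([np.mean(x[s:s + win] ** 2) for s in starts])


fs = 100.0                                     # assumed sampling rate in Hz
rng = np.random.default_rng(0)
epoch = rng.standard_normal(int(30 * fs))      # synthetic 30 s epoch

window_power = sliding_window_power(epoch, fs)
epoch_power = np.mean(epoch ** 2)              # single whole-epoch power feature
print(len(window_power), round(float(epoch_power), 3))

# Feature counts copied from the table above.
feature_counts = {
    "band power, entire epoch": 22,
    "band power, sliding window": 231,
    "signal power, entire epoch": 1,
    "signal power, sliding window": 11,
    "band power-power correlation": 242,
    "signal autocorrelation": 50,
}
print(sum(feature_counts.values()))
```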
Comparison between our method and the literature across the five scoring performance metrics (precision, sensitivity, $F_1$-score, per-stage accuracy, and overall accuracy).
| | Precision (mean) | Precision (worst) | Sensitivity (mean) | Sensitivity (worst) | $F_1$-score (mean) | $F_1$-score (worst) | Accuracy (mean) | Accuracy (worst) | Overall accuracy |
|---|---|---|---|---|---|---|---|---|---|
| Independent training and testing | | | | | | | | | |
| Ref. | 93 | | 77 | 29 | 82 | 43 | 86 | 63 | 77 |
| Ref. | 90 | 82 | 73 | 19 | 77 | 31 | 83 | 57 | 73 |
| Ref. | 92 | 88 | 74 | 36 | 81 | 51 | 84 | 66 | 74 |
| | (92) | (86) | (75) | (55) | (82) | (68) | (84) | (74) | (75) |
| Current | | 88 | | | | | | | |
| | (94) | (90) | (80) | (65) | (86) | (75) | (88) | (78) | (80) |
| Non-independent training and testing | | | | | | | | | |
| Ref. | 93 | 88 | 77 | 53 | 84 | 68 | 86 | 75 | 77 |
| Current | | | | | | | | | |
For the binary metrics, we report the mean performance (over all five sleep stages) as well as the worst performance (in the most misclassified sleep stage, always stage N1). We present the results for our method using the Fpz-Cz electrode with cross-validation using both independent and non-independent training and testing. The numbers in parentheses are the bootstrap 95% confidence interval bounds for the mean performance across subjects. The signed numbers in italics indicate the improvement (positive) or deterioration (negative) in performance over the second best (improvement) or best (deterioration) method in the literature
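A bootstrap 95% confidence interval for the mean across subjects can be sketched as follows. The percentile variant of the bootstrap, the number of resamples and the per-subject accuracies are assumptions for illustration; the values are made up and are not results from the study.

```python
import numpy as np


def bootstrap_mean_ci(values, n_resamples=10_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample subjects with replacement and record the mean of each resample.
    idx = rng.integers(0, len(values), size=(n_resamples, len(values)))
    means = values[idx].mean(axis=1)
    alpha = (1.0 - level) / 2.0
    low, high = np.quantile(means, [alpha, 1.0 - alpha])
    return float(low), float(high)


# Hypothetical per-subject overall accuracies for 20 subjects (not real data).
subject_accuracy = np.array([
    0.71, 0.83, 0.78, 0.69, 0.80, 0.85, 0.74, 0.77, 0.82, 0.76,
    0.79, 0.73, 0.81, 0.84, 0.70, 0.78, 0.75, 0.80, 0.77, 0.72,
])
ci_low, ci_high = bootstrap_mean_ci(subject_accuracy)
print(f"mean={subject_accuracy.mean():.3f}, 95% CI=({ci_low:.3f}, {ci_high:.3f})")
```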
Confusion matrix from cross-validation using the Fpz-Cz electrode.
| | N1 (algorithm) | N2 (algorithm) | N3 (algorithm) | R (algorithm) | W (algorithm) |
|---|---|---|---|---|---|
| N1 (expert) | | | | | |
| N2 (expert) | | | | | |
| N3 (expert) | | | | | |
| R (expert) | | | | | |
| W (expert) | | | | | |
This confusion matrix is the sum of the confusion matrices from each fold. The numbers in bold are numbers of epochs. The numbers in parentheses are the percentage of epochs that belong to the class classified by the expert (rows) that were classified by our algorithm as belonging to the class indicated by the columns
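The relationship between the summed confusion matrix, the row percentages and the per-stage metrics can be sketched as below. The fold matrices are toy values, not the study's counts, and treating each stage one-vs-rest for the binary metrics is an assumption about how the per-stage figures are defined.

```python
import numpy as np

STAGES = ["N1", "N2", "N3", "R", "W"]


def row_percentages(cm):
    """Percentage of each expert-scored stage (row) assigned to each column."""
    cm = np.asarray(cm, dtype=float)
    return 100.0 * cm / cm.sum(axis=1, keepdims=True)


def per_stage_metrics(cm):
    """One-vs-rest precision, sensitivity, F1 and accuracy for every stage."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp          # scored as the stage by the algorithm only
    fn = cm.sum(axis=1) - tp          # scored as the stage by the expert only
    tn = cm.sum() - tp - fp - fn
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / cm.sum()
    return {"precision": precision, "sensitivity": sensitivity,
            "f1": f1, "accuracy": accuracy}


# Two toy fold matrices (rows: expert, columns: algorithm), summed across folds.
fold_matrices = [np.eye(5, dtype=int) * 8 + 1, np.eye(5, dtype=int) * 8 + 1]
total_cm = np.sum(fold_matrices, axis=0)

percentages = row_percentages(total_cm)
metrics = per_stage_metrics(total_cm)
overall_accuracy = np.trace(total_cm) / total_cm.sum()

for name, values in metrics.items():
    print(f"{name}: mean={values.mean():.3f}, worst={values.min():.3f}")
print(f"overall accuracy: {overall_accuracy:.3f}")
```

Reporting both the mean and the minimum over stages mirrors the "mean" and "worst" columns used for the binary metrics, while overall accuracy is a single number taken from the diagonal of the summed matrix.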
Normalized confusion matrices from 20-fold cross-validation using the Fpz-Cz electrode without and with neighboring epochs. All values are percentages. Pairs of stages with mutual improvement are in bold (N1–N2, N1-R and N2-R).
| | Without: N1 | Without: N2 | Without: N3 | Without: R | Without: W | With: N1 | With: N2 | With: N3 | With: R | With: W |
|---|---|---|---|---|---|---|---|---|---|---|
| N1 (expert) | 53 | | 0 | | 18 | 60 | | 0 | | 17 |
| N2 (expert) | | 77 | 7 | | 4 | | 78 | 7 | | 4 |
| N3 (expert) | 0 | 8 | 89 | 0 | 3 | 0 | 8 | 89 | 0 | 3 |
| R (expert) | | | 0 | 73 | 5 | | | 0 | 80 | 4 |
| W (expert) | 13 | 1 | 1 | 4 | 82 | 13 | 1 | 1 | 4 | 81 |
Correlation between sleep efficiency and percentage of transitional epochs, and scoring performance ($F_1$-score and overall accuracy).
| Metric | Sleep efficiency | | Percentage of transitional epochs | |
|---|---|---|---|---|
| $F_1$-score | 0.02 | 0.42 | 0.04 | 0.20 |
| Overall accuracy | 0.02 | 0.46 | 0.05 | 0.17 |
Figure 1. $F_1$-score as a function of sleep efficiency.
Figure 2. $F_1$-score as a function of transitional epochs.
Figure 3. The original manually scored hypnogram (top) and the estimated hypnogram using our algorithm (bottom) for the second night of subject number 2.