Jessie P Bakker 1 , Marco Ross 2 , Andreas Cerny 2 , Ray Vasko 1 , Edmund Shaw 1 , Samuel Kuna 3,4 , Ulysses J Magalang 5 , Naresh M Punjabi 6 , Peter Anderer 2 .
Abstract
STUDY OBJECTIVES: To quantify the amount of sleep stage ambiguity across expert scorers and to validate a new auto-scoring platform against sleep staging performed by multiple scorers.
METHODS: We applied a new auto-scoring system to three datasets containing 95 polysomnograms (PSGs) scored by six to twelve scorers, to compare sleep stage probabilities (hypnodensity; that is, the probability of each sleep stage being assigned to a given epoch) as the primary output, as well as a single sleep stage per epoch assigned by hierarchical majority rule.
RESULTS: The percentage of epochs with 100% agreement across scorers was 46±9%, 38±10%, and 32±9% for the datasets with six, nine, and twelve scorers, respectively. The mean intra-class correlation coefficient between sleep stage probabilities from auto- and manual-scoring was 0.91, representing excellent reliability. Within each dataset, agreement between auto-scoring and consensus manual-scoring was significantly higher than agreement between manual-scoring and consensus manual-scoring (0.78 vs. 0.69; 0.74 vs. 0.67; and 0.75 vs. 0.67; all p<0.01).
CONCLUSION: Analysis of scoring performed by multiple scorers reveals that sleep stage ambiguity is the rule rather than the exception. Probabilities of the sleep stages determined by artificial intelligence auto-scoring provide an excellent estimate of this ambiguity. Compared to consensus manual-scoring, sleep staging derived from auto-scoring is, for each individual PSG, non-inferior to manual-scoring, meaning that auto-scoring output is ready for interpretation without the need for manual adjustment.
© Sleep Research Society 2022. Published by Oxford University Press on behalf of the Sleep Research Society.
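To make the quantities in the abstract concrete, the following is a minimal Python sketch of how a hypnodensity, a per-epoch consensus stage, and the percentage of unanimously scored epochs could be derived from multiple manual scorings. It is an illustration only, not the authors' implementation: the abstract does not specify the hierarchy used in the "hierarchical majority rule", so the tie-break order `TIE_BREAK`, the function names, and the toy data are all assumptions.

```python
from collections import Counter

STAGES = ["W", "N1", "N2", "N3", "R"]

# Assumed tie-break order for illustration only; the abstract does not
# state the hierarchy actually used by the authors.
TIE_BREAK = ["W", "N1", "N2", "N3", "R"]


def hypnodensity(epoch_labels):
    """Fraction of scorers assigning each stage to a single epoch."""
    counts = Counter(epoch_labels)
    n = len(epoch_labels)
    return {stage: counts[stage] / n for stage in STAGES}


def consensus_stage(epoch_labels, tie_break=TIE_BREAK):
    """Most frequent stage; ties resolved by the (assumed) tie_break order."""
    counts = Counter(epoch_labels)
    top = max(counts.values())
    return next(stage for stage in tie_break if counts[stage] == top)


def percent_unanimous(scorings):
    """Percentage of epochs on which all scorers chose the same stage."""
    unanimous = sum(1 for labels in scorings if len(set(labels)) == 1)
    return 100.0 * unanimous / len(scorings)


# Toy data: four epochs, each scored by six hypothetical scorers.
epochs = [
    ["N2"] * 6,
    ["N1", "N1", "N2", "N2", "N2", "W"],
    ["R", "R", "R", "N1", "N1", "N1"],
    ["W"] * 6,
]

for labels in epochs:
    print(consensus_stage(labels), hypnodensity(labels))
print(percent_unanimous(epochs))  # 50.0
```

In this toy example only two of the four epochs are scored unanimously, which mirrors the abstract's point that full agreement across scorers is far from guaranteed. The auto-scoring system in the study outputs stage probabilities directly; here the probabilities come only from the spread of manual labels.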
Keywords:
Sleep stages; artificial intelligence; hypnodensity; machine learning; polysomnography; validation
Year: 2022
PMID: 35780449 DOI: 10.1093/sleep/zsac154
Source DB: PubMed Journal: Sleep ISSN: 0161-8105 Impact factor: 5.849