
Sleep scoring moving from visual scoring towards automated scoring.

Thomas Penzel


Year: 2022 | PMID: 35951083 | PMCID: PMC9548668 | DOI: 10.1093/sleep/zsac190
Source DB: PubMed | Journal: Sleep | ISSN: 0161-8105 | Impact factor: 6.313



Introduction

For many years we have used visual sleep scoring to quantify sleep stages and all events occurring during sleep. Reaching this point required a long path, beginning with the standardization of rules for visual sleep staging in healthy young volunteers and continuing with all the sleep abnormalities we know today. Currently, the rules for sleep staging are laid out in version 2.6 of the AASM manual, which provides definitions for sleep stages and for the most commonly observed events related to the sleep disorders with the highest prevalence [1]. We recognize the high variability in sleep scoring results achieved by expert sleep scorers [2, 3]. Visual sleep scoring nevertheless remains a valuable task, because we may observe unexpected events during sleep, and this teaches us much about the abnormalities observed during sleep. Visual sleep scoring is also very important for newcomers to the field of sleep medicine, so that they learn and understand how sleep changes across the night, how much sleep varies from person to person, and how to identify unusual and abnormal events during sleep.

The Problem

Sleep recording and sleep scoring have become part of regular diagnostic procedures in sleep medicine and, given the high prevalence of sleep disorders, have become a kind of “mass production” for diagnosing them. In this “mass production,” routine clinical sleep scoring can be tedious and boring, and hence sensitive to errors related to the normal variation of vigilance in the sleep scorer doing his or her job. Today we have validated computer software that helps make the tedious work of scoring raw medical data more robust against human error. The best example of using modern computer software to make human scoring of medical data more robust is cancer diagnosis, especially the scoring of mammography x-ray images [4]. Errors in scoring mammography x-ray images should be lower than 5%, and according to quality control studies they are in the 5% range with automated approaches [5]. For sleep stage scoring, we still accept error rates of 15% or more when counted on an epoch-by-epoch comparison: an agreement of 85% between sleep stage scorers is considered acceptable, and we start to worry only if the agreement drops below 70%. As a side note, these numbers are simplified estimates, because quantitative reliability studies use many different methods, such as Cohen’s kappa, Fleiss’ kappa, the Pearson product-moment correlation coefficient, and the intraclass correlation coefficient (ICC), which reflect different statistical properties of the disagreements [3, 6]. Making use of the algorithms and software approaches applied in cancer diagnosis (e.g. machine learning, artificial intelligence, artificial neural networks, big data analysis, and deep learning) can help us improve the accuracy of sleep staging and make it more robust. How could this be achieved? First, I propose that we need a common reference database on which sleep staging software can be tested and against which it must be validated.
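The difference between raw epoch-by-epoch agreement and a chance-corrected statistic such as Cohen’s kappa can be illustrated with a short sketch. The hypnograms below are hypothetical, and the functions are illustrative only (pure Python, no external dependencies):

```python
from collections import Counter

def epoch_agreement(scorer_a, scorer_b):
    """Fraction of 30-s epochs on which two scorers assign the same stage."""
    assert len(scorer_a) == len(scorer_b)
    return sum(a == b for a, b in zip(scorer_a, scorer_b)) / len(scorer_a)

def cohens_kappa(scorer_a, scorer_b):
    """Chance-corrected agreement: kappa = (p_o - p_e) / (1 - p_e)."""
    n = len(scorer_a)
    p_o = epoch_agreement(scorer_a, scorer_b)
    count_a, count_b = Counter(scorer_a), Counter(scorer_b)
    # Expected agreement if both scorers assigned stages independently
    # according to their own marginal stage frequencies.
    p_e = sum(count_a[s] * count_b[s] for s in count_a) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical hypnograms, one AASM stage label per 30-s epoch.
a = ["W", "N1", "N2", "N2", "N3", "R"]
b = ["W", "N1", "N2", "N3", "N3", "R"]
print(round(epoch_agreement(a, b), 3))  # 0.833 -> 83.3% raw agreement
print(round(cohens_kappa(a, b), 3))     # 0.793 -> lower once chance is removed
```

This is why reliability studies report different numbers for the same pair of scorers: the raw percentage flatters the agreement, while kappa discounts the portion expected by chance.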
We know that extensive training of scorers in different sleep centers reduces the variability in scoring between centers [7]. The AASM Interscorer Reliability program is one effort to decrease this variability by encouraging the training of sleep scorers [8]. We also know that improving the definitions of events helps decrease the variability in scoring. This has worked well for the scoring of apnea and hypopnea events [6, 9]: with this approach, agreement between scorers became moderate and clinically acceptable [10]. This is helpful, as it means that sleep apnea diagnosis achieves reliable results. Redefining sleep stages with more precise and simpler definitions was one of the initial aims when creating the AASM manual for the scoring of sleep. Some fuzziness in the definitions of Rechtschaffen and Kales was removed, and the definitions became more precise. Without a doubt, this reduced the variability in the results of expert sleep stage scoring to some extent [11]. However, too much variability is still observed in the manual scoring of sleep stages [2, 3].

Proposed Solution

In this issue of SLEEP, there is a report on new computer-supported sleep scoring software that has been compared against the visual scoring results, and the scoring ambiguity, of multiple expert sleep scorers [12]. A new and very promising aspect of this work is that the authors decided to apply their sleep scoring software to three databases. Choosing different datasets really does capture a representative variety of sleep recordings and of expertise among sleep scorers. We know that there are “decision flavors” among groups of human expert sleep scorers; by “flavors” I mean group differences in scoring, as observed between nations, between schools teaching sleep scoring, and the like [13]. Given this, it is impossible to find perfect agreement among human scorers, and as a consequence, agreement with sleep scoring software will be challenging. It is time to overcome exactly this limitation. Taking several datasets together, with a variety of healthy sleepers and pathological sleep, and with a variety of expert sleep scorers, all truly experienced in their scoring of sleep stages and events, is the way forward to reach better agreement in tasks like sleep scoring. An exciting feature of the published work is that there is no single reference expert truth. The hypnodensity-based chart presents probabilities for sleep stages and thus treats all expert sleep scorers equally [14]. There is no longer a gold-standard reference scorer; instead, the reference is built from the probabilities of sleep stage scoring. This is a fair and adequate way to build a reference dataset for testing sleep scoring software, because the influence of the different “flavors” of different sleep scorers is averaged out to some extent. Is it possible to create a database of sleep stage scoring accepted by all sleep centers and all sleep stage scoring experts?
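The core idea of a hypnodensity-style reference can be sketched in a few lines: for each epoch, every scorer’s vote carries equal weight, and the result is a probability distribution over stages rather than a single forced label. The data and function below are hypothetical illustrations, not the published software:

```python
from collections import Counter

def hypnodensity(scorings):
    """Per-epoch probability of each sleep stage across multiple scorers.

    `scorings` is a list of hypnograms (one per scorer), each a list of
    stage labels with one entry per 30-s epoch. Every scorer's vote has
    equal weight, so no single scorer acts as the gold standard.
    """
    n_scorers = len(scorings)
    n_epochs = len(scorings[0])
    densities = []
    for epoch in range(n_epochs):
        votes = Counter(hypnogram[epoch] for hypnogram in scorings)
        densities.append({stage: count / n_scorers
                          for stage, count in votes.items()})
    return densities

# Three hypothetical expert scorers, four epochs each.
scorers = [
    ["W", "N1", "N2", "N2"],
    ["W", "N2", "N2", "N3"],
    ["W", "N1", "N2", "N3"],
]
for epoch_probs in hypnodensity(scorers):
    print(epoch_probs)
# Epochs 1 and 3 are unanimous; epochs 2 and 4 show the scoring
# ambiguity as split probabilities instead of a forced single choice.
```

Automated scorers can then be evaluated against these probability profiles, so that disagreeing with one expert on an ambiguous epoch is penalized less than disagreeing with a unanimous consensus.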
Such a new reference dataset should ultimately be held by a professional scientific society such as the AASM, or by an academic institution, possibly linked to a generally accessible sleep resource database; but this is a different discussion. The strength of the study presented here is that it offers a new approach to auto-scoring which, despite using up-to-date computer algorithms, reaches only limited accuracy. As the report explains, accepting this limited accuracy is a very fair stance, because the hypnodensity-based approach makes the ambiguity among multiple expert scorers explicit; it does not favor any one scorer, being based on probabilities. We can learn from this study that the time to create computer-based automated sleep scoring is now: the computational tools are out there. We can learn that we need a reference database for the validation of computer-based automated sleep scoring, and that we need to treat human expert sleep scorers equally by using hypnodensity-based probabilities. And finally, with the help of modern computer algorithms, we should reach a sleep staging accuracy better than 85%. Of course, we must never forget that visual sleep scoring will remain part of education in sleep medicine.
References (13 in total)

1.  Computer-Assisted Automated Scoring of Polysomnograms Using the Somnolyzer System.

Authors:  Naresh M Punjabi; Naima Shifa; Georg Dorffner; Susheel Patil; Grace Pien; Rashmi N Aurora
Journal:  Sleep       Date:  2015-10-01       Impact factor: 5.849

2.  Inter-scorer reliability between sleep centers can teach us what to improve in the scoring rules.

Authors:  Thomas Penzel; Xiaozhe Zhang; Ingo Fietze
Journal:  J Clin Sleep Med       Date:  2013-01-15       Impact factor: 4.062

3.  Performance of an automated polysomnography scoring system versus computer-assisted manual scoring.

Authors:  Atul Malhotra; Magdy Younes; Samuel T Kuna; Ruth Benca; Clete A Kushida; James Walsh; Alexandra Hanlon; Bethany Staley; Allan I Pack; Grace W Pien
Journal:  Sleep       Date:  2013-04-01       Impact factor: 5.849

4.  Automatic sleep stage classification with deep residual networks in a mixed-cohort setting.

Authors:  Alexander Neergaard Olesen; Poul Jørgen Jennum; Emmanuel Mignot; Helge Bjarup Dissing Sorensen
Journal:  Sleep       Date:  2021-01-21       Impact factor: 5.849

5.  Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: Hypnodensity based on multiple expert scorers and auto-scoring.

Authors:  Jessie P Bakker; Marco Ross; Andreas Cerny; Ray Vasko; Edmund Shaw; Samuel Kuna; Ulysses J Magalang; Naresh M Punjabi; Peter Anderer
Journal:  Sleep       Date:  2022-07-03       Impact factor: 5.849

6.  Agreement in computer-assisted manual scoring of polysomnograms across sleep centers.

Authors:  Samuel T Kuna; Ruth Benca; Clete A Kushida; James Walsh; Magdy Younes; Bethany Staley; Alexandra Hanlon; Allan I Pack; Grace W Pien; Atul Malhotra
Journal:  Sleep       Date:  2013-04-01       Impact factor: 5.849

7.  Interrater reliability for sleep scoring according to the Rechtschaffen & Kales and the new AASM standard.

Authors:  Heidi Danker-Hopfe; Peter Anderer; Josef Zeitlhofer; Marion Boeck; Hans Dorn; Georg Gruber; Esther Heller; Erna Loretz; Doris Moser; Silvia Parapatics; Bernd Saletu; Andrea Schmidt; Georg Dorffner
Journal:  J Sleep Res       Date:  2009-03       Impact factor: 3.981

8.  Interrater reliability of sleep stage scoring: a meta-analysis.

Authors:  Yun Ji Lee; Jae Yong Lee; Jae Hoon Cho; Ji Ho Choi
Journal:  J Clin Sleep Med       Date:  2022-01-01       Impact factor: 4.062

9.  MAMMO_QC: Free software for quality control (QC) analysis in digital mammography and digital breast tomosynthesis compliant with the European guidelines and EUREF/EFOMP protocols.

Authors:  Massimiliano Porzio; Anastasios C Konstantinidis
Journal:  Biomed Phys Eng Express       Date:  2021-10-20

10.  Expert-level sleep scoring with deep neural networks.

Authors:  Siddharth Biswal; Haoqi Sun; Balaji Goparaju; M Brandon Westover; Jimeng Sun; Matt T Bianchi
Journal:  J Am Med Inform Assoc       Date:  2018-12-01       Impact factor: 4.497

