Matteo Cesari (1), Ambra Stefani (1), Thomas Penzel (2,3), Abubaker Ibrahim (1), Heinz Hackner (1), Anna Heidbreder (1), András Szentkirályi (4), Beate Stubbe (5), Henry Völzke (6), Klaus Berger (4), Birgit Högl (1)
1. Department of Neurology, Medical University of Innsbruck, Innsbruck, Austria.
2. Interdisciplinary Sleep Medicine Center, Charité-Universitätsmedizin Berlin, Berlin, Germany.
3. Saratov State University, Saratov, Russian Federation.
4. Institute of Epidemiology and Social Medicine, University of Münster, Münster, Germany.
5. Department of Internal Medicine B, University Medicine Greifswald, Greifswald, Germany.
6. Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany.
Abstract
STUDY OBJECTIVES: The objective of this study was to evaluate interrater reliability between manual sleep stage scoring performed in 2 European sleep centers and automatic sleep stage scoring performed by the previously validated artificial intelligence-based Stanford-STAGES algorithm.
METHODS: Full-night polysomnographies of 1,066 participants were included. Sleep stages were manually scored in the Berlin and Innsbruck sleep centers and automatically scored with the Stanford-STAGES algorithm. For each participant, we compared (1) Innsbruck to Berlin scorings (INN vs BER); (2) Innsbruck to automatic scorings (INN vs AUTO); (3) Berlin to automatic scorings (BER vs AUTO); (4) epochs where scorers from Innsbruck and Berlin had consensus to automatic scoring (CONS vs AUTO); and (5) both Innsbruck and Berlin manual scorings (MAN) to the automatic ones (MAN vs AUTO). Interrater reliability was evaluated with several measures, including overall and sleep stage-specific Cohen's κ.
RESULTS: Overall agreement across participants was substantial for INN vs BER (κ = 0.66 ± 0.13), INN vs AUTO (κ = 0.68 ± 0.14), CONS vs AUTO (κ = 0.73 ± 0.14), and MAN vs AUTO (κ = 0.61 ± 0.14), and moderate for BER vs AUTO (κ = 0.55 ± 0.15). Human scorers disagreed most on N1 sleep (κN1 = 0.40 ± 0.16 for INN vs BER). Automatic scoring had the lowest agreement with manual scorings for N1 and N3 sleep (κN1 = 0.25 ± 0.14 and κN3 = 0.42 ± 0.32 for MAN vs AUTO).
CONCLUSIONS: Interrater reliability for sleep stage scoring between human scorers was in line with previous findings, and the algorithm achieved overall substantial agreement with manual scoring. In this cohort, the Stanford-STAGES algorithm performed similarly to how it did in the original study, suggesting that it generalizes to new cohorts. Before it is integrated into clinical practice, independent studies should evaluate it further in other cohorts.
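To make the agreement measures concrete, the following is a minimal Python sketch of how overall and stage-specific Cohen's κ could be computed for one participant's epoch-by-epoch hypnograms. It is an illustration under stated assumptions, not the study's analysis code: the function names (`cohen_kappa`, `stage_kappa`, `consensus_vs_auto`, `landis_koch`) and the stage labels are hypothetical, the stage-specific κ is assumed to be a one-vs-rest binarization, and the verbal labels follow the commonly used Landis and Koch bands, which match the "moderate" and "substantial" wording above.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of epoch labels."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("scorings must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of epochs with identical labels.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined: both raters used one single label
    return (p_o - p_e) / (1.0 - p_e)


def stage_kappa(a, b, stage):
    """Stage-specific kappa, ASSUMED here to be one-vs-rest binarization."""
    return cohen_kappa([x == stage for x in a], [y == stage for y in b])


def consensus_vs_auto(inn, ber, auto):
    """Kappa between automatic scoring and epochs where both centers agree."""
    kept = [(i, c) for i, b, c in zip(inn, ber, auto) if i == b]
    return cohen_kappa([i for i, _ in kept], [c for _, c in kept])


def landis_koch(kappa):
    """Verbal label for kappa using the Landis and Koch bands."""
    if kappa < 0.0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Toy hypnograms (8 epochs), invented purely for illustration.
    inn = ["W", "W", "N1", "N2", "N2", "N3", "REM", "REM"]
    ber = ["W", "N1", "N1", "N2", "N2", "N2", "REM", "REM"]
    k = cohen_kappa(inn, ber)
    print(round(k, 2), landis_koch(k))
    print(round(stage_kappa(inn, ber, "N1"), 2))
```

In a study of this design, such per-participant values would then be summarized across participants, for example as the mean ± standard deviation reported above.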