Shujian Deng 1,2,3,4, Xin Zhang 1,2,3,4, Ying Zhang 1,2,3,4, He Gao 5, Eric I-Chao Chang 6, Yubo Fan 1,2,3,4, Yan Xu 7,8,9,10,11,12.
1. School of Biological Science and Medical Engineering and Research Institute, Beihang University, Shenzhen, China.
2. Key Laboratory of Biomechanics and Mechanobiology of Ministry of Education, Beihang University, Beijing, 100191, China.
3. State Key Laboratory of Software Development Environment, Beihang University, Beijing, 100191, China.
4. Beijing Advanced Innovation Centre for Biomedical Engineering, Beihang University, Beijing, 100191, China.
5. Clinical Sleep Medicine Center, The General Hospital of the Air Force, Beijing, 100142, China.
6. Microsoft Research Asia, Beijing, 100080, China.
7. School of Biological Science and Medical Engineering and Research Institute, Beihang University, Shenzhen, China.
8. Key Laboratory of Biomechanics and Mechanobiology of Ministry of Education, Beihang University, Beijing, 100191, China.
9. State Key Laboratory of Software Development Environment, Beihang University, Beijing, 100191, China.
10. Beijing Advanced Innovation Centre for Biomedical Engineering, Beihang University, Beijing, 100191, China.
11. Microsoft Research Asia, Beijing, 100080, China.
12. Beihang University, Xueyuan Road No. 37, Beijing, China.
Correspondence: Yan Xu, xuyan04@gmail.com.
Abstract
OBJECTIVES: To determine inter-lab reliability in sleep stage scoring using the 2014 American Academy of Sleep Medicine (AASM) manual, to understand in depth the reasons for disagreement, and to provide suggestions for improvement. METHODS: This study consisted of 40 all-night polysomnograms (PSGs) from different samples. The PSGs were segmented into 37,642 30-s epochs. Five doctors from China and two doctors from America scored the epochs following the 2014 AASM standard. Scoring agreement between the two centers was evaluated using Cohen's kappa (κ). Epochs with deviating scores were inspected visually, and potential reasons for disagreement were analyzed. RESULTS: Inter-lab agreement was substantial (κ = 0.75 ± 0.01). Scoring for stage W (κ = 0.89) and stage R (κ = 0.87) achieved the highest agreement, while stage N1 (κ = 0.45) had the lowest. In terms of relative disagreement ratio, N2-N3 (22.09%), W-N1 (19.68%), and N1-N2 (18.75%) were the most frequent discrepancy combinations. American and Chinese doctors each showed characteristic scoring tendencies for the discrepancy combinations W-N1, N1-N2, and N2-N3. Seven reasons for disagreement were identified, namely "on-threshold characteristic" (29.21%), "context influence" (18.06%), "characteristic identification difficulty" (8.81%), "arousal-wake confusion" (7.57%), "derivation inconsistence" (2.15%), "on-borderline characteristic" (0.92%), and "misrecognition" (33.27%). CONCLUSIONS: This study quantified sleep stage scoring agreement under the 2014 AASM manual and explored potential sources of labeling ambiguity. Improvement measures were suggested accordingly to help remove ambiguity for scorers and improve scoring reliability at the international level.
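As an illustration of the agreement statistics named in the abstract, the following is a minimal sketch of Cohen's kappa for two epoch-by-epoch scorings, together with a relative disagreement ratio per stage pair. It is not the authors' code. The function names, the one-vs-rest treatment of per-stage kappa, and the ten-epoch toy hypnograms are all assumptions made for illustration; the abstract does not state how per-stage κ or the ratio were computed.

```python
from collections import Counter

STAGES = ("W", "N1", "N2", "N3", "R")  # AASM sleep stages


def cohen_kappa(a, b):
    """Cohen's kappa for two equally long label sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and equally long")
    n = len(a)
    # Observed agreement: fraction of epochs given the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each scorer's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:  # both scorers used one identical label throughout
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def stage_kappa(a, b, stage):
    """Per-stage kappa, assuming a one-vs-rest binarisation (assumption)."""
    return cohen_kappa([x == stage for x in a], [y == stage for y in b])


def disagreement_ratios(a, b):
    """Share of all disagreeing epochs falling on each unordered stage pair."""
    pairs = Counter(frozenset((x, y)) for x, y in zip(a, b) if x != y)
    total = sum(pairs.values())
    return {
        "-".join(sorted(p, key=STAGES.index)): c / total
        for p, c in pairs.items()
    }


# Hypothetical toy hypnograms (not data from the study).
lab_a = ["W", "W", "N1", "N2", "N2", "N3", "N3", "R", "R", "N2"]
lab_b = ["W", "N1", "N1", "N2", "N3", "N3", "N3", "R", "R", "N2"]

overall = cohen_kappa(lab_a, lab_b)
per_stage = {s: stage_kappa(lab_a, lab_b, s) for s in STAGES}
ratios = disagreement_ratios(lab_a, lab_b)
print(overall, per_stage, ratios)
```

With more than two scorers per center, as in the study, some rule for combining scorers (for example a consensus label or averaging pairwise kappas) would be needed; the sketch leaves that choice open.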