Sabrina L Wendt (1), Peter Welinder (2), Helge B D Sorensen (3), Paul E Peppard (4), Poul Jennum (5), Pietro Perona (2), Emmanuel Mignot (6), Simon C Warby (7).
1. Center for Sleep Science and Medicine, Stanford University, Palo Alto, CA, United States; Danish Center for Sleep Medicine, Glostrup University Hospital, DK-2600 Glostrup, Denmark.
2. Computational Vision Laboratory, California Institute of Technology, Pasadena, CA, United States.
3. Dept. of Electrical Engineering, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark.
4. Department of Population Health Sciences, University of Wisconsin - Madison, Madison, WI, United States.
5. Danish Center for Sleep Medicine, Glostrup University Hospital, DK-2600 Glostrup, Denmark.
6. Center for Sleep Science and Medicine, Stanford University, Palo Alto, CA, United States.
7. Center for Sleep Science and Medicine, Stanford University, Palo Alto, CA, United States; Center for Advanced Research in Sleep Medicine, Hôpital du Sacré-Coeur de Montréal, Department of Psychiatry, Université de Montréal, Montréal, Canada. Electronic address: simon.c.warby@umontreal.ca.
Abstract
OBJECTIVES: To measure the inter-expert and intra-expert agreement in sleep spindle scoring, and to quantify how many experts are needed to build a reliable dataset of sleep spindle scorings.
METHODS: The EEG dataset comprised 400 randomly selected 115 s segments of stage 2 sleep from 110 sleeping subjects in the general population (age 57±8 years, range: 42-72 years). To assess expert agreement, a total of 24 Registered Polysomnographic Technologists (RPSGTs) scored spindles in a subset of the EEG dataset at a single electrode location (C3-M2). Intra-expert and inter-expert agreements were calculated as F1-scores, Cohen's kappa (κ), and intra-class correlation coefficient (ICC).
RESULTS: We found an average intra-expert F1-score agreement of 72±7% (κ: 0.66±0.07). The average inter-expert F1-score agreement was 61±6% (κ: 0.52±0.07). Amplitude and frequency of discrete spindles were estimated with higher reliability than spindle duration. Reliability of sleep spindle scoring can be improved by using qualitative confidence scores rather than a dichotomous yes/no scoring system.
CONCLUSIONS: We estimate that 2-3 experts are needed to build a spindle scoring dataset with 'substantial' reliability (κ: 0.61-0.8), and 4 or more experts are needed to build a dataset with 'almost perfect' reliability (κ: 0.81-1).
SIGNIFICANCE: Spindle scoring is a critical part of sleep staging, and spindles are believed to play an important role in development, aging, and diseases of the nervous system.
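As an illustration of the two pairwise agreement measures named in the abstract, the following minimal sketch computes an F1-score and Cohen's kappa between two scorers. It assumes each scorer's annotations have been reduced to a binary label per EEG sample (1 = spindle, 0 = no spindle); the abstract does not state whether the study matched spindles by sample or by event, so this is not necessarily the authors' procedure. The function names and the toy label sequences are hypothetical. The 'substantial' and 'almost perfect' bands are the ones quoted in the abstract; the remaining bands follow the conventional Landis and Koch scale and are included here as an assumption.

```python
from typing import Sequence, Tuple


def confusion_counts(a: Sequence[int], b: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (tp, fp, fn, tn), treating scorer `a` as the reference."""
    if len(a) != len(b):
        raise ValueError("both scorers must label the same number of samples")
    tp = sum(1 for x, y in zip(a, b) if x and y)
    fp = sum(1 for x, y in zip(a, b) if not x and y)
    fn = sum(1 for x, y in zip(a, b) if x and not y)
    tn = sum(1 for x, y in zip(a, b) if not x and not y)
    return tp, fp, fn, tn


def f1_score(a: Sequence[int], b: Sequence[int]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); symmetric in the two scorers."""
    tp, fp, fn, _ = confusion_counts(a, b)
    denom = 2 * tp + fp + fn
    # Convention chosen here: if neither scorer marked anything, return 1.0.
    return 1.0 if denom == 0 else 2 * tp / denom


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two binary raters."""
    tp, fp, fn, tn = confusion_counts(a, b)
    n = tp + fp + fn + tn
    p_o = (tp + tn) / n  # observed agreement
    # Chance agreement from each rater's marginal label frequencies.
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    # Convention chosen here: identical constant raters give kappa 1.0.
    return 1.0 if p_e == 1.0 else (p_o - p_e) / (1 - p_e)


def reliability_label(kappa: float) -> str:
    """Map kappa to a qualitative band (Landis and Koch style thresholds)."""
    if kappa > 0.80:
        return "almost perfect"
    if kappa > 0.60:
        return "substantial"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    if kappa >= 0.0:
        return "slight"
    return "poor"


# Toy, made-up labels for two hypothetical scorers over eight samples.
scorer_a = [1, 1, 0, 0, 1, 0, 0, 0]
scorer_b = [1, 0, 0, 0, 1, 1, 0, 0]

f1 = f1_score(scorer_a, scorer_b)
kappa = cohens_kappa(scorer_a, scorer_b)
print(f"F1 = {f1:.3f}, kappa = {kappa:.3f} ({reliability_label(kappa)})")
```

Because most samples in an EEG segment contain no spindle, raw percent agreement is dominated by true negatives; F1 ignores true negatives entirely, and kappa discounts the agreement expected by chance, which is one plausible reason to report both.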
Authors: Heidi Danker-Hopfe; D Kunz; G Gruber; G Klösch; J L Lorenzo; S L Himanen; B Kemp; T Penzel; J Röschke; H Dorn; A Schlögl; E Trenker; G Dorffner Journal: J Sleep Res Date: 2004-03 Impact factor: 3.981
Authors: Magdy Younes; Samuel T Kuna; Allan I Pack; James K Walsh; Clete A Kushida; Bethany Staley; Grace W Pien Journal: J Clin Sleep Med Date: 2018-02-15 Impact factor: 4.062
Authors: Daniel J Levendowski; Luigi Ferini-Strambi; Charlene Gamaldo; Mindy Cetel; Robert Rosenberg; Philip R Westbrook Journal: J Clin Sleep Med Date: 2017-06-15 Impact factor: 4.062
Authors: Julie A E Christensen; Miki Nikolic; Simon C Warby; Henriette Koch; Marielle Zoetmulder; Rune Frandsen; Keivan K Moghadam; Helge B D Sorensen; Emmanuel Mignot; Poul J Jennum Journal: Front Hum Neurosci Date: 2015-05-01 Impact factor: 3.169