Harlin Lee1, Boyue Li1, Shelly DeForte2, Mark L Splaingard2, Yungui Huang2, Yuejie Chi3, Simon L Linwood4.
Abstract
Despite being crucial to health and quality of life, sleep, and pediatric sleep in particular, is not yet well understood. This is exacerbated by the lack of access to sufficient pediatric sleep data with clinical annotation. To accelerate research on pediatric sleep and its connection to health, we create the Nationwide Children's Hospital (NCH) Sleep DataBank and publish it at PhysioNet and the National Sleep Research Resource (NSRR), a large sleep data commons with physiological data, clinical data, and tools for analyses. The NCH Sleep DataBank consists of 3,984 polysomnography studies and over 5.6 million clinical observations on 3,673 unique patients between 2017 and 2019 at NCH. The novelties of this dataset include: (1) a large-scale sleep dataset suitable for discovering new insights via data mining, (2) an explicit focus on pediatric patients, (3) data gathered in a real-world clinical setting, and (4) an accompanying rich set of clinical data. The NCH Sleep DataBank is a valuable resource for advancing automatic sleep scoring and real-time sleep disorder prediction, among many other potential scientific discoveries.
Year: 2022 PMID: 35853958 PMCID: PMC9296671 DOI: 10.1038/s41597-022-01545-6
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 8.501
The distribution of 3,673 unique patients’ races.
| Race description | Count | Percentage |
|---|---|---|
| White | 2,433 | 66.24% |
| Black or African American | 738 | 20.09% |
| Multiple races | 277 | 7.54% |
| Asian | 93 | 2.53% |
| Others and unknown | 132 | 3.59% |
| Total | 3,673 | 100% |
Fig. 1 Age at the time of sleep study, where 20 patients who are more than 30 years old are not shown.
Fig. 2 Length of care at NCH before and after first sleep study, where each patient has two entries: one negative for length of care prior to first sleep study (in years), and one positive for follow-up after first sleep study (in days). One entry above 1200 days and 5 entries below −25 years are not shown.
Fig. 3 Visual verification that a randomly chosen 30-second segment of sleep data on Natus Sleepworks (top) matches the sleep data in the corresponding EDF file (bottom), especially at the region of interest marked by the red box. Natus Sleepworks may denoise or auto-scale some signals for the viewer.
List of 33 most common channels and their frequencies in 3,984 EDF files.
| Channel name | Count | Percentage | |
|---|---|---|---|
| EEG C3-M2 | 3,971 | 99.67% | |
| EEG O1-M2 | 3,971 | 99.67% | |
| EEG O2-M1 | 3,971 | 99.67% | |
| EEG CZ-O1 | 3,971 | 99.67% | |
| RATE | 3,970 | 99.65% | |
| ETCO2 | 3,970 | 99.65% | |
| CAPNO | 3,970 | 99.65% | |
| RESP RATE | 3,970 | 99.65% | |
| SPO2 (2,819) or OSAT (1,152) | 3,970 | 99.65% | |
| EEG F3-M2 | 3,969 | 99.62% | |
| RESP THORACIC (2,821) or RESP CHEST (1,148) | 3,969 | 99.62% | |
| RESP ABDOMINAL (2,821) or RESP ABDOMEN (1,148) | 3,969 | 99.62% | |
| SNORE | 3,968 | 99.60% | |
| EEG C4-M1 | 3,962 | 99.45% | |
| EEG F4-M1 | 3,960 | 99.40% | |
| C-FLOW | 3,943 | 98.97% | |
| EOG LOC-M2 | 3,933 | 98.72% | |
| EOG ROC-M1 | 3,931 | 98.67% | |
| EMG CHIN1-CHIN2 | 3,782 | 94.93% | |
| PRESSURE | 2,824 | 70.88% | |
| EMG LLEG-RLEG | 2,820 | 70.78% | |
| ECG EKG2-EKG | 2,820 | 70.78% | |
| RESP AIRFLOW | 2,820 | 70.78% | |
| TIDAL VOL | 2,818 | 70.73% | |
| RESP PTAF | 2,817 | 70.71% | |
| PATIENT EVENT | 2,722 | 68.32% | |
| TCCO2 | 1,417 | 35.57% | |
| SNORE_DR | 1,148 | 28.82% | |
| XFLOW | 1,148 | 28.82% | |
| EMG LLEG + -LLEG- | 1,146 | 28.77% | |
| EMG RLEG + -RLEG- | 1,146 | 28.77% | |
| ECG LA-RA | 1,146 | 28.77% | |
| FLOW_DR | 1,146 | 28.77% | |
| RESP FLOW | 1,146 | 28.77% | |
| C-PRESSURE | 1,146 | 28.77% | |
| EEG CHIN1-CHIN2 | 136 | 3.41% | |
The other 101 channels each appear in fewer than 1% of the files. Brief descriptions are included for channels that are not measuring EEG, EOG, or EMG. CO2 is carbon dioxide, PAP is positive airway pressure, CPAP is continuous PAP, and PTAF is pressure transducer.
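The table shows that some signals are stored under one of two channel names depending on the file (for example SPO2 or OSAT). A minimal sketch of how such names could be harmonized before analysis is given below. The alias pairs come from the table; the choice of which name is treated as canonical, and the helper names `canonical_channel`, `harmonize`, and `missing_channels`, are assumptions made for illustration.

```python
# Alias pairs taken from the channel table above. Which name of each pair is
# treated as canonical is an arbitrary choice made for this illustration.
CHANNEL_ALIASES = {
    "OSAT": "SPO2",
    "RESP CHEST": "RESP THORACIC",
    "RESP ABDOMEN": "RESP ABDOMINAL",
}


def canonical_channel(name):
    """Upper-case the name, collapse whitespace, and map known aliases."""
    cleaned = " ".join(name.upper().split())
    return CHANNEL_ALIASES.get(cleaned, cleaned)


def harmonize(channel_names):
    """Return the canonical name for every channel in a file."""
    return [canonical_channel(name) for name in channel_names]


def missing_channels(channel_names, required):
    """List the required channels that a file does not provide."""
    present = set(harmonize(channel_names))
    return sorted(c for c in required if canonical_channel(c) not in present)


# Hypothetical channel list, as it might be read from one EDF header.
file_channels = ["EEG C3-M2", "OSAT", "Resp Chest", "RESP ABDOMEN"]
print(harmonize(file_channels))
print(missing_channels(file_channels, ["SPO2", "SNORE"]))
```

Mapping names once, up front, lets later code select signals by a single name regardless of which naming variant a file uses.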
Example annotations from a .tsv file.
| Onset | Duration | Description |
|---|---|---|
| 15985.234375 | 0.0 | Chewing motion |
| 15990.93359375 | 30.0 | Sleep stage W |
| 16002.09375 | 0.0 | Movement |
| 16002.34375 | 1.21875 | Limb Movement |
“Chewing motion” and “Movement” are free text entries by the NCH technician, while “Limb Movement” is a standard sleep event labeled by Natus Sleepworks.
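As an illustration of how annotations in this layout could be read, the sketch below parses the four example rows and keeps only the sleep stage entries. It assumes a tab-separated file with a header row spelled `onset`, `duration`, `description`, and that stage labels start with the text "Sleep stage "; the actual header spelling in the released files may differ.

```python
import csv
import io

# In-memory copy of the example rows above. A real file would be read with
# open(path, newline="") instead. The header spelling is an assumption.
TSV_TEXT = (
    "onset\tduration\tdescription\n"
    "15985.234375\t0.0\tChewing motion\n"
    "15990.93359375\t30.0\tSleep stage W\n"
    "16002.09375\t0.0\tMovement\n"
    "16002.34375\t1.21875\tLimb Movement\n"
)

STAGE_PREFIX = "Sleep stage "


def read_annotations(text):
    """Parse tab-separated annotations into dicts with numeric onset/duration."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {
            "onset": float(row["onset"]),
            "duration": float(row["duration"]),
            "description": row["description"].strip(),
        }
        for row in reader
    ]


def sleep_stage_epochs(annotations):
    """Keep sleep-stage rows only, as (onset, duration, stage) tuples."""
    return [
        (a["onset"], a["duration"], a["description"][len(STAGE_PREFIX):])
        for a in annotations
        if a["description"].startswith(STAGE_PREFIX)
    ]


annotations = read_annotations(TSV_TEXT)
stages = sleep_stage_epochs(annotations)
print(stages)
```

Free-text technician entries and standard events share the same column, so filtering on the description prefix is one simple way to separate stage labels from everything else.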
The variable names and number of observations for each patient data file in Health_Data. More details about the variables can be found in Sleep_Study_Data_File_Format.pdf in the same folder.
| File name | Variable names | Rows |
|---|---|---|
| DEMOGRAPHIC.csv | study pat ID, birth date, pcori gender cd, pcori race cd, pcori hispanic cd, gender descr, race descr, ethnicity descr, language descr, peds gest age num weeks, peds gest age num days | 3,673 |
| SLEEP_STUDY.csv | study pat ID, sleep study ID, sleep study start datetime, sleep study duration datetime, age at sleep study days | 3,984 |
| SLEEP_ENC_ID.csv | study pat ID, sleep study ID, study enc ID | 3,964 |
| ENCOUNTER.csv | study enc ID, study pat ID, encounter date, visit start datetime, visit end datetime, adt arrival datetime, ed departure datetime, encounter type, visit type cd, visit type descr, ICU visit Y/N, prov ID, prov type, dept ID, dept specialty, admit source, hosp admit source, discharge disposition, discharge destination, drg code, drg name, visit reason | 495,138 |
| MEDICATION.csv | study med ID, study enc ID, study pat ID, med start datetime, med end datetime, med order datetime, med taken datetime, med source type, quantity, days supply, frequency, effective drug dose, eff drug dose source value, drug dose unit, refills, RxNorm code, RxNorm term type, medication descr, generic drug descr, drug order status, drug action, route, route source value, prescribing prov ID, pharm class, pharm subclass, thera class, thera subclass | 3,035,986 |
| MEASUREMENT.csv | study meas ID, study pat ID, study enc ID, meas recorded datetime, meas type, meas value number, meas value text, meas source, study prov ID | 332,569 |
| DIAGNOSIS.csv | study dx ID, study enc ID, study pat ID, dx start datetime, dx end datetime, dx source type, dx enc type, dx code type, dx code, dx name, dx alt code, class of problem, chronic Y/N, prov ID | 1,513,853 |
| PROCEDURE.csv | study proc ID, study pat ID, study enc ID, procedure datetime, study prov ID, proc ID NCH, proc code, proc code type, proc descr | 283,599 |
| PROCEDURE_SURG_HX.csv | study surghx ID, study pat ID, proc noted date, proc start time, proc end time, proc code, cpt code, proc descr | 10,190 |
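The files share patient, encounter, and sleep study identifiers, so they can be linked to one another. The sketch below joins sleep studies to demographics on the patient ID and converts age from days to years. The rows are made up for illustration, and the header spellings (`STUDY_PAT_ID`, `SLEEP_STUDY_ID`, `AGE_AT_SLEEP_STUDY_DAYS`, `GENDER_DESCR`) are assumptions; the exact column names are documented in Sleep_Study_Data_File_Format.pdf.

```python
import csv
import io

# Made-up rows for illustration only. Header spellings are assumptions.
DEMOGRAPHIC_CSV = """STUDY_PAT_ID,GENDER_DESCR
1,Female
2,Male
"""

SLEEP_STUDY_CSV = """STUDY_PAT_ID,SLEEP_STUDY_ID,AGE_AT_SLEEP_STUDY_DAYS
1,10,730
1,11,1095
2,20,3650
"""


def load_rows(text):
    """Read CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def join_studies_with_demographics(studies, demographics):
    """Attach each patient's demographic fields to each of their sleep studies."""
    by_patient = {row["STUDY_PAT_ID"]: row for row in demographics}
    joined = []
    for study in studies:
        patient = by_patient.get(study["STUDY_PAT_ID"])
        if patient is None:
            continue  # skip studies with no matching demographic row
        merged = {**patient, **study}
        age_days = int(study["AGE_AT_SLEEP_STUDY_DAYS"])
        merged["AGE_YEARS"] = round(age_days / 365.25, 1)
        joined.append(merged)
    return joined


joined = join_studies_with_demographics(
    load_rows(SLEEP_STUDY_CSV), load_rows(DEMOGRAPHIC_CSV)
)
for row in joined:
    print(row["SLEEP_STUDY_ID"], row["GENDER_DESCR"], row["AGE_YEARS"])
```

One patient can have several sleep studies (3,984 studies on 3,673 patients), which is why the join is keyed on the patient and iterates over studies.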
20 diagnoses that are given to the highest number of unique patients in the NCH Sleep DataBank according to DIAGNOSIS.csv.
| Diagnosis | ICD-10 code | Patients, n |
|---|---|---|
| Sleep disorders | G47 | 3,379 |
| Sleep apnea | G47.3 | 2,558 |
| Sleep disorder, unspecified | G47.9 | 1,163 |
| Other sleep disorders | G47.8 | 914 |
| Circadian rhythm sleep disorders | G47.2 | 566 |
| Insomnia | G47.0 | 388 |
| Hypersomnia | G47.1 | 257 |
| Sleep related movement disorders | G47.6 | 180 |
| Parasomnia | G47.5 | 165 |
| Narcolepsy and cataplexy | G47.4 | 47 |
| Abnormalities of breathing | R06 | 2,776 |
| Encounter for immunization | Z23 | 1,720 |
| Chronic diseases of tonsils and adenoids | J35 | 1,686 |
| Encounter for general examination without complaint, suspected or reported diagnosis | Z00 | 1,587 |
| Acute upper respiratory infections of multiple and unspecified sites | J06 | 1,537 |
| Body mass index (BMI) | Z68 | 1,417 |
| Suppurative and unspecified otitis media | H66 | 1,378 |
| Symptoms and signs concerning food and fluid intake | R63 | 1,369 |
| Acute pharyngitis | J02 | 1,260 |
| Other symptoms and signs involving the circulatory and respiratory system | R09 | 1,256 |
| Other functional intestinal disorders | K59 | 1,185 |
| Cough | R05 | 1,176 |
| Lack of expected normal physiological development in childhood and adults | R62 | 1,097 |
| Encounter for follow-up examination after completed treatment for conditions other than malignant neoplasm | Z09 | 1,068 |
| Nausea and vomiting | R11 | 1,051 |
| Fever of other and unknown origin | R50 | 1,043 |
| Specific developmental disorders of speech and language | F80 | 1,002 |
| Asthma | J45 | 991 |
| Gastro-esophageal reflux disease | K21 | 982 |
Note that the diagnoses were abstracted to a higher level before being counted. For example, patients with diagnosis “G47.33 Obstructive sleep apnea (adult) (pediatric)” were counted under G47 and G47.3.
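The roll-up described in the note can be sketched as follows. This is not the authors' code: it assumes that the higher levels are the three-character category (G47) and the first character after the dot (G47.3), which matches the example in the note, and it counts each patient once per level. The function names and the sample records are hypothetical.

```python
from collections import defaultdict


def icd10_levels(code):
    """Return the abstracted levels of a code, e.g. 'G47.33' -> ['G47', 'G47.3']."""
    code = code.strip().upper()
    category, _, extension = code.partition(".")
    levels = [category]
    if extension:
        levels.append(f"{category}.{extension[0]}")
    return levels


def patients_per_level(records):
    """Count unique patients per abstracted code from (patient_id, code) pairs."""
    patients = defaultdict(set)
    for patient_id, code in records:
        for level in icd10_levels(code):
            patients[level].add(patient_id)
    return {level: len(ids) for level, ids in patients.items()}


# Made-up (patient ID, diagnosis code) pairs for illustration.
records = [
    (1, "G47.33"),
    (1, "G47.30"),
    (2, "G47.9"),
    (2, "G47.33"),
    (3, "J45.909"),
]
counts = patients_per_level(records)
print(counts)
```

Using a set per level is what keeps a patient with several G47.3x codes from being counted more than once under G47.3 or G47.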
Sleep stage classification results of our baseline algorithm applied to different age groups.
| Panel | Manual score sleep stage (n) | Automated W | Automated N1 | Automated N2 | Automated N3 | Automated R |
|---|---|---|---|---|---|---|
| (a) | W (661,645) | | 0. | 34.0 | 1.5 | 1.4 |
| (a) | N1 (127,602) | 23.9 | | 68.1 | 2.1 | 5.0 |
| (a) | N2 (1,375,678) | 4.4 | 0. | | 5.8 | 1.1 |
| (a) | N3 (871,200) | 1.7 | 0. | 27.2 | | 0. |
| (a) | R (608,180) | 6.7 | 0. | 76.6 | 1.5 | |
| (b) | W (52,979) | | 0.1 | 8.2 | 0.5 | 1.7 |
| (b) | N1 (8,263) | 37.5 | | 47.4 | 0.6 | 12.1 |
| (b) | N2 (80,275) | 5.6 | 0.1 | | 2.9 | 2.3 |
| (b) | N3 (30,612) | 2.6 | 0. | 18.3 | | 0. |
| (b) | R (24,006) | 9.2 | 0. | 24.7 | 0.6 | |
| (c) | W (63,041) | | 0. | 2.4 | 2.8 | 11.4 |
| (c) | N1 (4,579) | 28.7 | | 24.7 | 6.2 | 39.2 |
| (c) | N2 (38,525) | 9.4 | 0. | | 10.2 | 17.4 |
| (c) | N3 (64,512) | 4.5 | 0. | 3.7 | | 8.5 |
| (c) | R (60,167) | 11.1 | 0. | 5.0 | 7.1 | |
(a) All age groups. 3,928 sleep studies and 3,644,305 samples. Overall accuracy is 64.4%.
(b) 18 years and older. 222 sleep studies and 196,135 samples. Overall accuracy is 81.1%.
(c) 0–1 year olds. 242 sleep studies and 230,824 samples. Overall accuracy is 76.6%.
One sample is a 30-second epoch of sleep. Cell (row i, column j) of the normalized confusion matrix indicates the percentage (%) of samples in stage i (manually scored by NCH technician) that were predicted to be in stage j (by our automated algorithm). Each row adds to 100%. Bolded diagonal entries are the percentages of samples in each stage that were correctly classified. Overall accuracy is the total number of correctly classified samples divided by the total number of samples, expressed as a percentage. All numbers reported are averaged over 3-fold stratified cross-validation trials and rounded to one decimal place. Standard deviation was <1% for all entries except one and is not shown here.
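The definitions above (row-normalized percentages and overall accuracy) can be made concrete with a short sketch. It uses a handful of made-up epoch labels rather than the study data, and the function names are hypothetical; it only illustrates the arithmetic, not the baseline algorithm itself.

```python
STAGES = ["W", "N1", "N2", "N3", "R"]


def confusion_counts(manual, predicted, labels=STAGES):
    """counts[i][j] = number of epochs manually scored i and predicted j."""
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for m, p in zip(manual, predicted):
        counts[index[m]][index[p]] += 1
    return counts


def row_normalize(counts):
    """Convert each row to percentages so that every non-empty row sums to 100."""
    normalized = []
    for row in counts:
        total = sum(row)
        normalized.append([100.0 * c / total if total else 0.0 for c in row])
    return normalized


def overall_accuracy(counts):
    """Correctly classified epochs divided by all epochs, in percent."""
    correct = sum(counts[i][i] for i in range(len(counts)))
    total = sum(sum(row) for row in counts)
    return 100.0 * correct / total if total else 0.0


# Made-up labels for eight 30-second epochs.
manual = ["W", "W", "N1", "N2", "N2", "N2", "N3", "R"]
predicted = ["W", "N2", "N2", "N2", "N2", "W", "N3", "N2"]

counts = confusion_counts(manual, predicted)
normalized = row_normalize(counts)
accuracy = overall_accuracy(counts)
print(normalized[0], accuracy)
```

Because rows are normalized separately, a rare stage such as N1 can have a very low diagonal entry while overall accuracy, which is dominated by the frequent stages, remains comparatively high.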
Summary statistics of sleep time and distribution of sleep stages for two PSG cohorts.
| | Cohort 1 | Cohort 2 |
|---|---|---|
| PSG, n | 16 | 370 |
| Unique patients, n | 12 | 311 |
| Age, mean ± s.d. (years) | 10.5 ± 5.6 | 13.2 ± 4.7 |
| Sleep time, mean ± s.d. (hours) | 8.0 ± 0.7 | 7.5 ± 0.9 |
| W, mean ± s.d. (%) | 14.4 ± 7.1 | 20.5 ± 16.1 |
| N1, mean ± s.d. (%) | 4.1 ± 2.7 | 3.5 ± 3.4 |
| N2, mean ± s.d. (%) | 45.2 ± 7.3 | 39.9 ± 11.5 |
| N3, mean ± s.d. (%) | 20.5 ± 6.7 | 21.1 ± 8.5 |
| R, mean ± s.d. (%) | 15.8 ± 6.0 | 15.0 ± 7.3 |
| N1 + N2, mean ± s.d. (%) | 49.3 ± 6.7 | 43.4 ± 11.8 |
| N1 + N2 + N3, mean ± s.d. (%) | 69.8 ± 6.3 | 64.5 ± 13.4 |
Cohort 1: PSGs with obstructive sleep apnea (OSA) diagnoses on patients with Prader-Willi syndrome (PWS); Cohort 2: PSGs with OSA diagnoses on obese but not PWS patients; sleep time: total amount of time spent in sleep stages W, N1, N2, N3, and R; s.d.: standard deviation. The percentage of each sleep stage is calculated by dividing the time spent in that stage by the sleep time. All numbers are rounded to one decimal place.
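The per-study quantities defined above can be computed from a list of scored epochs. The sketch below assumes 30-second epochs and, following the definition in the note, counts W as part of sleep time. The epoch list is made up and the function name `stage_summary` is hypothetical.

```python
from collections import Counter

EPOCH_SECONDS = 30
STAGES = ["W", "N1", "N2", "N3", "R"]


def stage_summary(epochs):
    """Return (sleep time in hours, {stage: percent of sleep time})."""
    counts = Counter(e for e in epochs if e in STAGES)
    total_epochs = sum(counts.values())
    sleep_time_hours = total_epochs * EPOCH_SECONDS / 3600
    percentages = {
        stage: round(100.0 * counts[stage] / total_epochs, 1) if total_epochs else 0.0
        for stage in STAGES
    }
    return sleep_time_hours, percentages


# Made-up night of 960 epochs (8 hours), for illustration only.
epochs = ["W"] * 120 + ["N1"] * 40 + ["N2"] * 440 + ["N3"] * 200 + ["R"] * 160
hours, percentages = stage_summary(epochs)
print(hours, percentages)
```

Cohort-level means and standard deviations such as those in the table would then be taken over these per-study values.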
| Measurement(s) | Polysomnogram • Electronic Health Record • Electroencephalogram Measurement • Electrocardiogram • Electrooculogram • Electromyogram |
| Technology Type(s) | Polysomnography • Electronic Health Record System • Electroencephalography (EEG) • Electrocardiography • Electrooculography • Electromyography |
| Sample Characteristic - Organism | Homo Sapiens |
| Sample Characteristic - Environment | Hospital |
| Sample Characteristic - Location | United States Of America |