
Psychometric properties of the Patient-Reported Outcomes Measurement Information System (PROMIS) Sleep Disturbance and Sleep-Related Impairment item banks in adolescents.

Jojanneke A M C van Kooten^1,2, Caroline B Terwee^3, Michiel A J Luijten^3,4, Lindsay M H Steur^1, Sigrid Pillen^5, Nicole G J Wolters^6, Gertjan J L Kaspers^1,2, Raphaële R L van Litsenburg^1,2.

Abstract

Sleep problems have a high prevalence and negative daytime consequences in adolescents. Current sleep measures for this age group have limitations. The Patient-Reported Outcomes Measurement Information System (PROMIS®) developed sleep item banks for adults. In a previous validation study, these item banks were adapted to a shortened version for adolescents. The current study aimed to further explore the psychometric properties of the 11-item Sleep-Related Impairment and 23-item Sleep Disturbance item banks in Dutch adolescents. We investigated structural validity by testing item response theory assumptions and model fit; measurement invariance by performing differential item functioning analyses; performance as a computerized adaptive test; reliability by marginal reliability estimates and test-retest reliability (intraclass correlation coefficients and limits of agreement); and construct validity by hypothesis testing. Additionally, we provide mean values for the item banks. The study sample consisted of 1,046 adolescents (mean age 14.3 ± 1.6 years), including 1,013 high-school students and 33 sleep-clinic patients. The Sleep Disturbance-23 showed a lack of unidimensionality, but had sufficient test-retest reliability and could distinguish between adolescents with and without sleep or health issues. The Sleep-Related Impairment-11 showed sufficient unidimensionality and model fit and was thus tested as a computerized adaptive test, which yielded a similar number of reliable scores to the full item bank. Furthermore, the Sleep-Related Impairment-11 could distinguish between adolescents with and without sleep or health issues, and its test-retest reliability was moderate. The use of both item banks in the full form and the use of the Sleep-Related Impairment-11 as a computerized adaptive test is recommended.


Keywords: paediatric; questionnaire; reproducibility of results; teenager; validation

Year: 2020 PMID: 32180280 PMCID: PMC8047882 DOI: 10.1111/jsr.13029

Source DB: PubMed Journal: J Sleep Res ISSN: 0962-1105 Impact factor: 3.981

INTRODUCTION

Sleep problems and sleep deprivation are common among adolescents: 20%–37% of otherwise healthy adolescents struggle with sleep problems (Paiva, Gaspar, & Matos, 2015; Short, Gradisar, Gill, & Camfferman, 2013; Verkooijen et al., 2018) and 30%–60% do not get the required 8–10 hr of sleep during school nights (National Sleep Foundation, 2014; Paruthi et al., 2016; Short et al., 2013). Compared with adults and younger children, healthy sleep is challenged by unique features during adolescence: extrinsic factors such as social activities and academic demands (with early school start times) interact with physiological changes that cause a tendency towards later bed times and later natural wake‐up times (Moore & Meltzer, 2008; National Sleep Foundation, 2014). This increases adolescents’ sleep debt during the school week and often makes for more irregular sleep patterns, with catch‐up sleep during the weekends. In this phase, adolescents are also discovering autonomy regarding their sleep schedules (Crowley, Wolfson, Tarokh, & Carskadon, 2018; Jakobsson, Josefsson, & Hogberg, 2019). Sleep disorders can further diminish sleep duration, with delayed sleep phase disorder in 5%–16% of healthy adolescents (Carter, Hathaway, & Lettieri, 2014; Moore & Meltzer, 2008) and insomnia in 8%–10% (Amaral, Figueiredo Pereira, Silva Martins, Serpa Cdo, & Sakellarides, 2013; Moore & Meltzer, 2008) as the most common diagnoses. Insufficient sleep during the night is associated with many daytime problems: sleepiness; behavioural problems, including increased risk taking (Carter et al., 2014; Moore & Meltzer, 2008; Verkooijen et al., 2018); and difficulties in emotional regulation, resulting in increased irritability, anxiety, depressive symptoms and self‐harm (Carter et al., 2014; Chaput et al., 2016; Moore & Meltzer, 2008; Paiva et al., 2015; Paruthi et al., 2016). 
Cognitive function and academic achievement are worse in children with poor sleep (Carter et al., 2014; Moore & Meltzer, 2008; Paruthi et al., 2016), as is physical health, illustrated by an increased presence of hypertension and obesity, and symptoms such as dizziness and headaches (Chaput et al., 2016; Paiva et al., 2015; Paruthi et al., 2016). Sleep problems during adolescence are predictive of sleep problems later in life (Dregan & Armstrong, 2010): early recognition and treatment of sleep problems is therefore important. Given the high prevalence of sleep problems in adolescents, their negative consequences, and the unique sleep features of this age group, it is important to have psychometrically sound measurement instruments for screening and follow‐up that are validated in this age group. In contrast to objective measures such as actigraphy and polysomnography, sleep questionnaires are able to capture feelings and cognitions about sleep and the effects of impaired sleep (Moore & Meltzer, 2008). A multitude of sleep questionnaires targeting different sleep constructs is available. Previous reviews identified around 60 sleep questionnaires used in children and adolescents, but none of the generic sleep instruments was adequately validated (Ji & Liu, 2016; Spruyt & Gozal, 2011). The Patient‐Reported Outcomes Measurement Information System (PROMIS®) might offer a solution for the lack of validated sleep questionnaires for adolescents. PROMIS was initiated by six US research institutes and the National Institutes of Health (NIH) and is an international initiative that aimed to standardize questionnaires measuring key health outcomes in research and clinical practice and to increase the relevance of results by facilitating comparison of data. The methodological basis is the use of item response theory (IRT), enabling the creation of item banks that support fixed‐length forms and computer adaptive testing (CAT). 
A CAT can achieve greater measurement precision with fewer items: participants need to complete only a subset of items instead of the full set, because after the first item, the selection of subsequent items is determined by the participant’s responses to the previous items (Alonso et al., 2013; Cella et al., 2010). The Patient‐Reported Outcomes Measurement Information System developed Sleep Disturbance and Sleep‐Related Impairment item banks for adults through factor and IRT analyses (Buysse et al., 2010). The construct validity of both the full item banks and the short forms was found to be sufficient in adults (Buysse et al., 2010; Yu et al., 2011). We previously started validation of the PROMIS adult sleep item banks for adolescents (van Kooten, Litsenburg, Yoder, Kaspers, & Terwee, 2018; van Kooten, Terwee, Kaspers, & Litsenburg, 2016), because sleep item banks for children (Bevans et al., 2018; Forrest et al., 2018) were not yet developed. Since then, multiple studies have used the PROMIS adult sleep item banks in young adults and adolescents (Bian et al., 2017; Hanish, Lin‐Dyken, & Han, 2017; Levenson et al., 2017). The Dutch‐Flemish versions of the PROMIS v1.0 adult sleep item banks showed adequate content validity in adolescents (van Kooten et al., 2016), meaning the items were considered relevant and comprehensible for adolescents, parents and sleep experts and no key issues were considered missing. Additional psychometric evaluation in a community sample of over 1,000 Dutch adolescents, however, showed that the one‐factor models found in adults could not be replicated. Thus, the items used for adolescents did not reflect the same single construct measured in adults and the item banks were not unidimensional. 
Adaptation of the item banks to improve the unidimensionality needed for IRT analyses resulted in a shortened version of the Sleep‐Related Impairment item bank (11 instead of 16 items) with adequate fit (comparative fit index [CFI] 0.98) and a shortened version of the Sleep Disturbance item bank (23 instead of 27 items) with fit indices just below the recommended value (CFI 0.90, recommended value > 0.95) (van Kooten et al., 2018). The current study aims to further explore the psychometric properties of the adult version of the 11‐item PROMIS v1.0 Sleep‐Related Impairment and 23‐item PROMIS v1.0 Sleep Disturbance item bank in adolescents. We evaluated structural validity, measurement invariance, performance as a CAT, reliability and construct validity, and additionally provide mean values for these item banks in adolescents.

METHODS

Participants and procedures

Test sample

A community sample of adolescents was recruited from seven randomly selected high schools in the Netherlands. Schools from all educational levels and from different regions were included. Most Dutch high‐school students are aged 12–18 years, but exceptions of 11 or 19 years do exist and were included in the current study. Adolescents were asked to fill out the questionnaires during regular class hours. During this administration, the author JvK was present in the classroom to supervise the procedure and provide assistance if necessary. In one school where online entry was not possible due to lack of digital resources, paper versions were distributed. Adolescents (11–19 years) with sleep problems were recruited from four outpatient (sleep) clinics. Adolescents with any type of sleep problem were eligible. They were invited to participate during their first visit to the clinic. They received a study package containing paper versions of the study questionnaires, including questions on the type of sleep problems they experienced. Exclusion criteria for both samples were any impairments that precluded filling out the questionnaires independently. The study was approved by the Institutional Review Board of the VU University Medical Center Amsterdam.

Retest sample

All participants from the high‐school sample were invited to participate in the retest study. Participants who were interested could apply by providing their email address. Two weeks after the first entry, a link to the repeat questionnaire was sent via email.

Measures

Sociodemographic variables

Descriptive data were collected on gender, age, educational level and country of birth. In addition, all participants were asked to report on current use of medication and on health issues, specifically diagnoses of autism spectrum disorder (ASD) and attention deficit (hyperactivity) disorder (ADHD/ADD), as these diagnoses are common and associated with sleep problems (Becker, Langberg, Eadeh, Isaacson, & Bourchtein, 2019; Richdale & Schreck, 2009).

PROMIS Item Banks

This study used the shortened versions of the adult Dutch‐Flemish PROMIS Sleep Disturbance and Sleep‐Related Impairment v1.0 item banks that were adapted for adolescents (van Kooten et al., 2018). The item banks aim to gain a general overview of the subjects’ perception of sleep problems and how these problems hinder daily functioning. The original PROMIS Sleep Disturbance item bank contains 27 items that are reflective of insomnia‐like symptoms. It assesses one's perception of sleep quality and restoration associated with sleep, perceived sleep difficulties and concerns about falling and staying asleep, and perceptions of adequate and satisfactory sleep (Buysse et al., 2010). Our previous study on structural validity resulted in an adapted Sleep Disturbance item bank with 23 items with better fit in adolescents. We removed items Sleep20 (I had a problem with my sleep), Sleep106 (My sleep was light), Sleep108 (My sleep was restless) and Sleep125 (I felt lousy when I woke up) from the original item bank (van Kooten et al., 2018). The original PROMIS Sleep‐Related Impairment item bank consists of 16 items that are related to sleepiness, fatigue and cognitive difficulties during waking hours. In addition, Sleep‐Related Impairment items assess perceptions of functional impairment during waking hours that are associated with sleep problems or impaired alertness (Buysse et al., 2010; National Institutes of Health, 2015). Our previous study on structural validity resulted in an adapted Sleep‐Related Impairment item bank with 11 items, with better fit in adolescents. We removed items Sleep4 (I had enough energy), Sleep119 (I felt alert when I woke up), Sleep120 (When I woke up I felt ready to start the day), Sleep123 (I had difficulty waking up) and Sleep124 (I still felt sleepy when I woke up) from the original item bank (van Kooten et al., 2018). 
All sleep disturbance and sleep‐related impairment items were measured on a 5‐point Likert scale (1 = not at all or never; 2 = a little bit or rarely; 3 = somewhat or sometimes; 4 = quite a bit or often; 5 = very much or always). The answers were indicative of how frequently respondents had experienced problems related to sleep in the last 7 days. The official HealthMeasures scoring service tool (https://www.assessmentcenter.net/ac_scoringservice/) was used to calculate T‐scores using the US calibration parameters for all participants who filled out at least one item. T‐scores are anchored on the US general population, with a mean of 50 and a standard deviation of 10. Higher scores indicate more sleep disturbances or more sleep‐related impairment.
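The T‐score metric described above can be illustrated with a minimal sketch. It assumes the usual linear link between an IRT score (theta, scaled to mean 0 and SD 1 in the US reference population) and the T‐score. The function names are hypothetical and for illustration only; this is not the HealthMeasures scoring service, which estimates theta from the full response pattern with the US calibration parameters.

```python
def theta_to_t_score(theta: float) -> float:
    """Map a theta estimate (reference mean 0, SD 1) to the T-score metric (mean 50, SD 10)."""
    return 50.0 + 10.0 * theta


def se_theta_to_se_t(se_theta: float) -> float:
    """Rescale a standard error on the theta metric to T-score points."""
    return 10.0 * se_theta


# A theta of 0 corresponds to the US general population mean.
print(theta_to_t_score(0.0))   # 50.0
# One SD above the reference mean indicates more sleep disturbance or impairment.
print(theta_to_t_score(1.0))   # 60.0
```

Under this assumption, a difference of 2 T‐score points corresponds to 0.2 SD of the reference population.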

Statistical analyses

Item‐level descriptives

For each item the median and mean response category was calculated separately for the high‐school sample and the sleep‐clinic sample.

Structural validity

Psychometric analyses of the Sleep Disturbance‐23 and the Sleep‐Related Impairment‐11 baseline data were conducted according to the PROMIS analyses plan (Reeve et al., 2007). The Graded Response Model (GRM) was estimated with marginal maximum likelihood (MML). The Graded Response Model is an IRT model for ordinal items. An IRT model requires that three assumptions are met: unidimensionality, local independence and monotonicity. Table 1 provides further explanation of and criteria for the investigated IRT‐model assumptions and fit.
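As an illustration of the Graded Response Model for a 5‐category item, the sketch below computes the category probabilities from a discrimination parameter and four ordered thresholds. The function name and the parameter values are hypothetical and are not the parameters estimated in this study; categories are coded 0 to 4 here, whereas the questionnaire uses 1 to 5.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Return P(X = k | theta) for k = 0..len(thresholds) under the graded response model.

    a          -- item discrimination (hypothetical value in the example below)
    thresholds -- ordered threshold parameters b_1 < b_2 < ... (one fewer than categories)
    """
    # Cumulative probabilities P(X >= k): 1 for the lowest category, 0 beyond the highest.
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    # Each category probability is the difference of adjacent cumulative probabilities.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


# Hypothetical item: discrimination 2.0 and four thresholds for five response categories.
probs = grm_category_probs(theta=0.0, a=2.0, thresholds=[-0.5, 0.5, 1.2, 2.0])
print([round(p, 3) for p in probs])
```

With ordered thresholds, the probability of the highest category increases with theta, which is the behaviour the monotonicity assumption requires.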

Table 1

Item Response Theory (IRT) analyses

Investigated property		Explanation	Criteria for acceptable values	Sleep Disturbance−23	Sleep‐Related Impairment−11
Assumptions IRT model	Unidimensionality	A person’s response to an item should be accounted for by the amount of the construct measured (sleep disturbance and sleep‐related impairment) and not by other factors, so all items in the item bank need to reflect the same construct. With CFA we first test how well a unidimensional model fits our data. Bi‐factor analysis and EFA were performed if CFA showed poor fit. Bi‐factor analysis tests if items load on randomly added factors in addition to the general factor (the construct we wish to measure). EFA tests how much of the variance in the data is explained by the first factor (the construct we wish to measure).	CFA Factor loadings >0.50 CFI >0.95 TLI >0.95 RMSEA < 0.06 SRMR < 0.08	20/23 items >0.50 0.80 0.78 0.14 0.11	All items >0.50 0.96 0.94 0.15 0.06
			Bi‐factor analysis Factor loading general factor (G) > random factor (F) Omega‐H (% explained by G) > 0.80 ECV (ratio between explained variance by G and F) > 0.80	14/23 items G > F 0.68 0.54	10/11 items G > F 0.86 0.79
			EFA
			Explained variance by first factor >20%	40%	64%
			Ratio between first and second factor >4	5.9	15.3
	Local independence	Items should be independent of each other, when controlled for the construct measured.	Residual correlations ≤0.20	11 pairs (4.3%) correlation >0.20	1 pair (1.8%) correlation >0.20
	Monotonicity	The probability of selecting an item response indicative of more problems should increase with an increasing level of the construct. This is tested with Mokken scale analysis, which assesses scalability, the possibility of locating the item on a scale.	Scalability coefficient H (scale) > 0.50 Scalability coefficient H_i (item) > 0.30	0.37 (0.01) 20/23 items >0.30	0.59 (0.02) All items >0.30
IRT‐model fit		IRT models describe, in probabilistic terms, the relationship between a person’s response to an item and the level of construct measured by the total item bank. With S‐X², the differences between observed and expected response frequencies are quantified.	S‐X² p‐value ≥ 0.001	All items >0.001	All items >0.001
Differential Item Functioning (DIF)		This tests whether items perform differently across groups (measurement invariance), here Dutch adolescents and US adults. Uniform DIF means the magnitude is similar for all levels of construct; non‐uniform DIF means this differs between different (higher/lower) levels of construct.	Change in McFadden R² ≤ .02	3/23 items uniform DIF, no non‐uniform DIF	No uniform or non‐uniform DIF

Abbreviations: CFA, confirmatory factor analysis; CFI, Comparative Fit Index; EFA, exploratory factor analysis; ECV, explained common variance; F, random factor; G, general factor; GRM, Graded Response Model; RMSEA, root mean square error of approximation; SRMR, standardized root mean residuals; TLI, Tucker‐Lewis Index.


Measurement invariance

Differential item functioning (DIF) between our sample and the PROMIS 2 sleep–wake sample was assessed. The latter sample was used to develop the item banks and consists of 1,993 adults from the general population and 259 adults recruited from medical, psychiatric or sleep clinics. Their mean age was 51.2 (SD 15.9) years (range 18–88); 52% were male (Buysse et al., 2010). These data were available from HealthMeasures Dataverse (https://dataverse.harvard.edu/dataverse/HealthMeasures).

Post‐hoc CAT simulation

A post‐hoc CAT simulation was performed using the item parameters estimated in our study sample. This simulation is based on the real responses of participants: at each step, the remaining item that provides the most information (based on the item parameters) is presented to the individual. This analysis was only performed if the IRT‐model assumptions were met, and participants with missing data were excluded. The algorithm was set to administer a minimum of one item and to stop administration when the reliability of the participant’s T‐score was above 0.90 (standard error of measurement [SEM] <0.32) or all items had been used (Wainer et al., 2000). The number of participants that reached a reliable score was compared between the CAT and the full‐length Sleep Disturbance‐23 or Sleep‐Related Impairment‐11.
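The procedure described above can be sketched as follows. This is a minimal illustration, not the software used in the study. It assumes graded response model items, maximum‐information item selection at the current score estimate, and expected a posteriori (EAP) scoring with a standard normal prior; the text does not state which estimator was used. All function names and item parameters are hypothetical. The stopping rule follows from reliability ≈ 1 − SE² on the theta metric (1 − 0.32² ≈ 0.90).

```python
import math

GRID = [-4.0 + 0.1 * i for i in range(81)]  # theta quadrature points (assumed range)


def cum_probs(theta, a, bs):
    """Cumulative probabilities P(X >= k), padded with 1 and 0 at the ends."""
    return [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]


def category_probs(theta, a, bs):
    c = cum_probs(theta, a, bs)
    return [c[k] - c[k + 1] for k in range(len(c) - 1)]


def item_information(theta, a, bs):
    """Fisher information of one graded response item at theta."""
    c = cum_probs(theta, a, bs)
    info = 0.0
    for k in range(len(c) - 1):
        p = c[k] - c[k + 1]
        dp = a * (c[k] * (1 - c[k]) - c[k + 1] * (1 - c[k + 1]))  # derivative of P(X = k)
        if p > 1e-12:
            info += dp * dp / p
    return info


def eap(items, answered):
    """EAP estimate and posterior SD given answered items {index: category (0-based)}."""
    post = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalised standard normal prior
        for i, x in answered.items():
            a, bs = items[i]
            w *= category_probs(t, a, bs)[x]
        post.append(w)
    total = sum(post)
    mean = sum(t * w for t, w in zip(GRID, post)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, post)) / total
    return mean, math.sqrt(var)


def post_hoc_cat(items, responses, se_stop=0.32, min_items=1):
    """Replay one participant's recorded responses as a CAT; return (theta, SE, items used)."""
    answered = {}
    theta, se = 0.0, 1.0  # start at the prior mean
    while len(answered) < len(items):
        if len(answered) >= min_items and se < se_stop:
            break  # reliability target reached
        remaining = [i for i in range(len(items)) if i not in answered]
        nxt = max(remaining, key=lambda i: item_information(theta, *items[i]))
        answered[nxt] = responses[nxt]  # look up the real response to the selected item
        theta, se = eap(items, answered)
    return theta, se, list(answered)


# Hypothetical 11-item bank and one participant's responses (categories coded 0-4).
items = [(1.5 + 0.2 * i, [-1.0 + 0.1 * i, 0.1 * i, 1.0 + 0.1 * i, 2.0 + 0.1 * i])
         for i in range(11)]
theta, se, used = post_hoc_cat(items, responses=[2] * 11)
print(round(50 + 10 * theta, 1), round(se, 2), len(used))
```

A participant counts as reliably measured when the returned standard error is below 0.32; comparing that count between the CAT and the full item bank gives the comparison described in the text.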

Reliability

If IRT assumptions were met and adequate model fit was found (section 2), marginal reliability estimates were plotted, showing the standard error of theta across the scale. Test–retest reliability was examined by calculating intraclass correlation coefficients (ICC) and limits of agreement (LoA). For the ICC a two‐way random effects model for absolute agreement was used. The LoA were calculated as the mean difference between the test and retest T‐scores ± 1.96 × SD of the differences; 95% of differences are expected to lie between the lower and upper LoA. ICC and LoA were interpreted following the COnsensus‐based Standards for the selection of health Measurement INstruments (COSMIN) guidelines: an ICC of ≥0.70 is considered sufficient reliability and the LoA should be smaller than the minimal important change (MIC) (Prinsen et al., 2018).
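The two test–retest statistics can be sketched as below. The sketch assumes a single‐measures ICC (often written ICC(2,1)) for the two‐way random effects, absolute‐agreement model, and defines the difference as retest minus test; the text specifies neither choice. Function names and example scores are hypothetical.

```python
from statistics import mean, stdev


def icc_2_1(test, retest):
    """Single-measures ICC, two-way random effects, absolute agreement, for two occasions."""
    n, k = len(test), 2
    rows = list(zip(test, retest))
    grand = mean(list(test) + list(retest))
    row_means = [mean(r) for r in rows]
    col_means = [mean(test), mean(retest)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def limits_of_agreement(test, retest):
    """Return (mean difference, lower LoA, upper LoA), with difference = retest - test."""
    diffs = [b - a for a, b in zip(test, retest)]
    d, sd = mean(diffs), stdev(diffs)
    return d, d - 1.96 * sd, d + 1.96 * sd


# Hypothetical T-scores for five participants at test and retest.
test = [45.2, 50.1, 38.7, 61.3, 47.9]
retest = [44.0, 52.3, 40.1, 58.8, 46.5]
print(round(icc_2_1(test, retest), 2))
print(tuple(round(v, 1) for v in limits_of_agreement(test, retest)))
```

Because the absolute‐agreement ICC includes the between‐occasion variance in its denominator, a systematic shift between test and retest lowers it, which a consistency‐type ICC would not capture.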

Construct validity

To determine construct validity, we assessed the difference in T‐scores between groups, testing four hypotheses about the ability of the PROMIS item banks to distinguish between these groups. In line with COSMIN guidelines, construct validity is considered sufficient when ≥75% of the results are in accordance with the hypotheses (Prinsen et al., 2018). We expected PROMIS T‐scores to be higher (worse) in (a) the sleep‐clinic sample compared with the high‐school sample, (b) the adolescents with a high risk of sleep problems compared with healthy high‐school students, and (c) high‐school students with health issues compared with healthy high‐school students. Adolescents with a high risk of sleep problems included the sleep‐clinic sample and high‐school students with relevant health issues that were associated with a higher probability of sleep problems. Health issues included self‐reported sleep difficulties, use of sleep medication, and chronic health problems and/or use of medication associated with sleep problems: ADHD (Becker et al., 2019), ASD (Richdale & Schreck, 2009), other psychiatric conditions such as depression and anxiety (Baddam, Canapari, Noordt, & Crowley, 2018), medication prescribed for these conditions, or strong pain medication such as opioids. A mean difference of ≥2 points, with a higher (worse) score in the clinical or health issues sample, was considered clinically relevant (Lee et al., 2017). The fourth hypothesis was that T‐scores would worsen (increase) with more problems (higher response category) on item Sleep20 (I had a problem with my sleep). This item is included in the original PROMIS Sleep Disturbance item bank, but not in the Sleep Disturbance‐23. For the fourth hypothesis we merged the last two response categories (‘Quite a bit’ and ‘Very much’) because the separate groups were too small. 
Differences in T‐scores between groups were evaluated using linear regression analysis, with correction for relevant demographic variables based on the results from the analyses of the mean values (section 3.8).

Mean T‐scores for adolescents from the general population

Mean T‐scores were calculated for all high‐school students. In addition, we compared mean T‐scores between boys and girls, adolescents with low (lower general secondary education/intermediate vocational education) and high (higher general secondary education) educational level as a reflection of socioeconomic status, and younger (11–14) and older (15–19) aged adolescents, because these are factors that are associated with sleep quality and/or quantity. We expected scores to be higher (worse) in girls (Galland et al., 2017; Paiva et al., 2015), adolescents with low educational level (Moore et al., 2011) and older adolescents (Crowley et al., 2018; Moore et al., 2011). A mean difference of ≥ 2 points was considered clinically relevant (Lee et al., 2017). Analyses 3.2 to 3.5 were carried out using R; analyses 3.1 and 3.5 to 3.8 were carried out using SPSS 24.

RESULTS

Participants

In total, 1,046 adolescents provided valid data, including 1,013 high‐school students and 33 adolescents from the sleep clinics. Sample characteristics are summarized in Table 2. Almost half of the total sample consisted of boys; this is comparable to the general population. In the general Dutch population, 61% of adolescents receive high‐level education (CBS [Dutch central bureau for statistics], 2019); this percentage was higher in our high‐school sample (81%) and lower in the sleep‐clinic sample (38%). As expected, the percentages of adolescents with ASD and ADHD were higher in the sleep‐clinic sample than in the high‐school sample (ASD 24% versus 5% and ADHD 18% versus 5%, respectively). In the Dutch general population, the percentage of adolescents with ASD is 2.0% and with ADHD 6.9% (van Hal, Rooijen, & Hoff, 2019). In the sleep‐clinic sample 82% had a problem with initiating and maintaining sleep, 18% had parasomnias, 18% had delayed sleep phase disorder, 12% had obstructive sleep apnea and 9% had excessive daytime sleepiness; 39% of the sleep‐clinic sample experienced multiple sleep problems at the time of inclusion.

Table 2

Participant characteristics

Characteristic	High‐school sample	Healthy adolescents ^b	High‐school sample with health issues	Sleep‐clinic sample	Sleep‐clinic + high‐school sample with health issues	Retest sample
n	1,013	920	93	33	126	114
Age [mean (SD); range]	14.3 (1.6); 11–19	14.3 (1.6); 11–19	14.4 (1.6); 12–19	14.8 (1.9); 11–18	14.5 (1.7); 11–19	14.7 (1.5); 11–19
Gender (% boys)	48.4	46.2	69.9	41.9	62.9	33.3
Country of birth (% Netherlands)	94.2	94.5	91.4	100	93.5	93.0
Educational level (% high) ^a	81.4	81.4	81.7	37.5	70.4	94.7
ASD (% yes)	4.4	0.0	48.8	24.4	42.1	4.4
ADHD (% yes)	4.5	0.0	49.5	18.2	41.3	1.8
T‐score Sleep Disturbance−23 [mean (SD); range]	47.3 (7.0); 25.9–70.8	47.1 (6.8); 25.9–70.8	49.8 (8.4); 25.9–68.8	57.9 (8.8); 36.1–79.9	51.9 (9.2); 25.9–79.9	46.3 (7.0); 30.9–70.7
T‐score Sleep‐Related Impairment−11 [mean (SD); range]	48.6 (9.6); 31.1–82.4	48.2 (9.5); 31.1–82.4	51.7 (10.2); 31.1–71.7	58.7 (12.1); 31.1–82.4	53.5 (11.1); 31.1–82.4	47.1 (9.6); 31.1–70.6

Abbreviations: ADHD, attention deficit hyperactivity disorder; ASD, autism spectrum disorder.

^a Low = lower general secondary education/intermediate vocational education; high = higher/A‐level general secondary education.

^b Excluding children with medical/psychiatric conditions (ASD, ADHD and other psychiatric conditions [e.g., depression and anxiety]) or medications (medication prescribed for previously mentioned conditions, sleep medication and strong pain medication such as tramadol).

Of the 1,013 included high‐school students, 372 provided their email address and 114 (11%) completed the PROMIS item banks again after 2 weeks. Compared with non‐responders (n = 899), responders to the retest (n = 114) were more often girls (66% versus 50% of non‐responders, p < .01) and more often received high‐level education (93% versus 80%, p < .01). Responders and non‐responders did not differ significantly in age or baseline T‐scores.

Item‐level descriptives

Table 3 provides item‐level descriptives. For the high‐school sample, the median response category ranged from 1 (‘Not at all’ or ‘Never’) to 2 (‘A little bit’ or ‘Rarely’), whereas the medians in the sleep‐clinic sample ranged from 1 to 4 (‘Quite a bit’ or ‘Often’). The percentage of missing responses ranged from 0.0% to 4.7% per item in the high‐school sample and from 0.0% to 9.1% per item in the sleep‐clinic sample. This is likely to be due to the fact that skipping items was not possible in the online version used in the majority of high‐school participants, whereas sleep clinic patients filled out paper questionnaires. The two items with 9.1% missing answers in the sleep‐clinic sample had 0.8 and 2.2% missing in the high‐school sample; thus there does not seem to be a systematic problem with these items. All 23 Sleep Disturbance items were filled out by 93% of adolescents and all 11 Sleep‐Related Impairment items by 95%.

Table 3

Item‐level descriptive statistics

Items	High‐school sample (n = 1,013)			Sleep‐clinic sample (n = 33)
Items	Median	Mean (SD)	Missing (%)	Median	Mean (SD)	Missing (%)
Sleep Disturbance‐23
Sleep105: My sleep was restful.	2	2.4 (1.0)	0 (0.0)	4	3.8 (1.0)	0 (0.0)
Sleep107: My sleep was deep.	2	2.5 (1.1)	2 (0.2)	4	3.4 (1.4)	0 (0.0)
Sleep109: My sleep quality was …	2	2.2 (0.8)	2 (0.2)	4	3.6 (0.8)	0 (0.0)
Sleep110: I got enough sleep.	2	2.5 (0.9)	2 (0.2)	3	3.5 (0.9)	0 (0.0)
Sleep115: I was satisfied with my sleep.	2	2.6 (1.1)	2 (0.2)	4	3.9 (1.1)	0 (0.0)
Sleep116: My sleep gave me new energy.	2	2.5 (1.1)	47 (4.7)	4	3.8 (1.0)	1 (3.0)
Sleep42: It was easy for me to fall asleep.	2	2.6 (1.1)	9 (0.9)	4	3.8 (1.4)	0 (0.0)
Sleep44: I had difficulty falling asleep.	2	2.2 (1.2)	12 (1.2)	4	3.7 (1.5)	0 (0.0)
Sleep45: I laid in bed for hours waiting to fall asleep.	2	2.1 (1.1)	10 (1.0)	4	3.4 (1.5)	1 (3.0)
Sleep50: I woke up too early and could not fall back asleep.	2	2.1 (1.1)	9 (0.9)	3	2.6 (1.3)	0 (0.0)
Sleep65: I felt physically tense at bedtime.	1	1.7 (1.0)	9 (0.9)	2	2.1 (1.4)	1 (3.0)
Sleep67: I worried about not being able to fall asleep.	1	1.5 (0.9)	8 (0.8)	2	2.2 (1.4)	0 (0.0)
Sleep68: I felt worried at bedtime.	1	1.4 (0.8)	9 (0.9)	1	1.9 (1.3)	0 (0.0)
Sleep69: I had trouble stopping my thoughts at bedtime.	1	2.0 (1.2)	8 (0.8)	2	2.5 (1.5)	3 (9.1)
Sleep70: I felt sad at bedtime.	1	1.3 (0.8)	9 (0.9)	1	1.7 (1.2)	0 (0.0)
Sleep71: I had trouble getting into a comfortable position to sleep.	1	1.8 (1.0)	9 (0.9)	2	2.3 (1.3)	0 (0.0)
Sleep72: I tried to get to sleep.	2	2.1 (1.2)	23 (2.3)	3	3.1 (1.4)	0 (0.0)
Sleep78: Stress disturbed my sleep.	1	1.7 (1.1)	22 (2.2)	2	2.2 (1.3)	0 (0.0)
Sleep86: I tossed and turned at night.	1	1.9 (1.2)	22 (2.2)	3	2.8 (1.4)	3 (9.1)
Sleep87: I had trouble staying asleep at night.	1	1.6 (0.9)	47 (4.7)	3	3.0 (1.3)	0 (0.0)
Sleep90: I had trouble sleeping.	2	1.8 (1.0)	22 (2.2)	4	3.3 (1.3)	1 (3.0)
Sleep92: I woke up and had trouble falling back to sleep.	2	2.1 (1.2)	22 (2.2)	3	3.1 (1.5)	1 (3.0)
Sleep93: I was afraid I would not get back to sleep after waking up.	1	1.5 (0.9)	22 (2.2)	2	2.2 (1.4)	2 (6.1)
Sleep‐related Impairment‐11
Sleep6: I was sleepy during the daytime.	2	2.4 (1.1)	22 (2.2)	4	3.3 (1.2)	0 (0.0)
Sleep7: I had trouble staying awake during the day.	1	1.7 (1.0)	21 (2.1)	2	2.5 (1.3)	0 (0.0)
Sleep10: I had a hard time getting things done because I was sleepy.	1	1.7 (0.9)	36 (3.6)	2	2.5 (1.4)	0 (0.0)
Sleep11: I had a hard time concentrating because I was sleepy.	2	1.9 (1.0)	36 (3.6)	3	2.9 (1.5)	0 (0.0)
Sleep18: I felt tired.	2	2.3 (1.1)	37 (3.7)	4	3.7 (1.4)	2 (6.1)
Sleep19: I tried to sleep whenever I could.	1	1.8 (1.0)	39 (3.9)	2	2.4 (1.2)	2 (6.1)
Sleep25: I had problems during the day because of poor sleep.	1	1.6 (0.8)	37 (3.7)	2	2.7 (1.5)	2 (6.1)
Sleep27: I had a hard time concentrating because of poor sleep.	1	1.8 (1.0)	38 (3.8)	3	2.9 (1.5)	2 (6.1)
Sleep29: My daytime activities were disturbed by poor sleep.	1	1.7 (0.9)	37 (3.7)	3	2.7 (1.2)	0 (0.0)
Sleep30: I felt irritable because of poor sleep.	2	1.8 (1.0)	38 (3.8)	3	3.0 (1.3)	0 (0.0)
Sleep33: I had a hard time controlling my emotions because of poor sleep.	1	1.6 (0.9)	36 (3.6)	3	2.7 (1.4)	1 (3.0)


Structural validity

Results of analyses for IRT assumptions and fit are shown in Table 1. The Sleep Disturbance‐23 did not meet the three assumptions needed to fit the IRT model. For unidimensionality, exploratory factor analysis (EFA) criteria were met, but both confirmatory factor analysis (CFA) and bi‐factor analysis were not satisfactory (CFI 0.80, criterion > 0.95; Omega‐H 0.68, criterion > 0.80). IRT item fit was nevertheless satisfactory. The Sleep‐Related Impairment‐11 did meet the overall criteria for IRT modelling (CFI 0.96, Omega‐H 0.86); IRT item fit was good.

Measurement invariance

In the Sleep Disturbance‐23 three items were flagged for uniform language DIF; together they impact the total T‐score by about 2 points, which could be relevant in the future when comparing US scores to Dutch scores. In the Sleep‐Related Impairment‐11 no items were flagged for DIF.

Post‐hoc CAT simulation

A post‐hoc CAT simulation was performed for the Sleep‐Related Impairment‐11. Of the 1,000 participants included in the simulation, 765 (76.5%) reached a reliable score with the full‐length item bank, whereas 235 (23.5%) could not reach a reliable score with 11 items. Of these 765, 757 also reached a reliable score using fewer than 11 items in CAT (mean number of items 4.5 ± 1.7).

Reliability

The Sleep‐Related Impairment‐11 has a reliability higher than 0.90 between a T‐score of approximately 43 and 80; adolescents with lower (better) scores reach a lower reliability (Figure 1). We did not perform reliability estimates for the Sleep Disturbance‐23, because assumptions needed to fit the IRT model were not met.
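As a rough aid to reading Figure 1, reliability and standard error of measurement can be converted into each other. The sketch below assumes the common IRT convention that the latent trait has unit variance in the reference population, so that reliability is approximately 1 − SE², rescaled to the T‐score metric (SD 10). That convention and the function names are assumptions made for illustration; they are not taken from the text above.

```python
import math

T_SCORE_SD = 10.0  # T-scores are scaled to SD 10 in the reference population


def se_from_reliability(reliability: float, sd: float = T_SCORE_SD) -> float:
    """Standard error on the T-score metric implied by a reliability value.

    Assumes reliability = 1 - (SE / sd) ** 2 (unit-variance latent trait).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)


def reliability_from_se(se: float, sd: float = T_SCORE_SD) -> float:
    """Inverse of se_from_reliability under the same assumption."""
    return 1.0 - (se / sd) ** 2


if __name__ == "__main__":
    for rel in (0.90, 0.95):
        se = se_from_reliability(rel)
        print(f"reliability {rel:.2f} -> SE about {se:.1f} T-score points")
```

Under this assumption, a reliability of 0.90 corresponds to a standard error of roughly 3.2 T‐score points, which gives a sense of the precision behind the threshold used in the text.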

Figure 1

Standard error of measurement over the range of T‐scores

Test–retest reliability of the Sleep Disturbance‐23 was sufficient; ICC (95% confidence interval) was 0.76 (0.67–0.83). The Bland‐Altman plot (see Figure 2) shows a mean difference of −1.1 points, with LoA −10.3 to 8.0. Test–retest reliability of the Sleep‐Related Impairment‐11 was just below the recommended value, with an ICC (95% confidence interval) of 0.68 (0.57–0.77). The Bland‐Altman plot (see Figure 3) shows a mean difference of −1.6 points, with LoA −16.1 to 12.9. The reliability was lower for lower (better) T‐scores.
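The limits of agreement reported here can be illustrated with a minimal sketch of the usual Bland‐Altman calculation (mean difference ± 1.96 × SD of the paired differences). The 1.96 multiplier, the retest‐minus‐test sign convention, the function names and the example scores are all assumptions for illustration; the reported means and LoA are consistent with a 1.96 multiplier, but the text does not state it.

```python
from statistics import mean, stdev


def bland_altman(test, retest, z=1.96):
    """Return (mean difference, lower LoA, upper LoA) for paired scores.

    Differences are retest minus test; this direction is an assumption.
    """
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need two equally long sequences with at least 2 pairs")
    diffs = [r - t for t, r in zip(test, retest)]
    d_mean = mean(diffs)
    d_sd = stdev(diffs)  # sample SD of the paired differences
    return d_mean, d_mean - z * d_sd, d_mean + z * d_sd


def sd_from_loa(lower, upper, z=1.96):
    """SD of the paired differences implied by reported limits of agreement."""
    return (upper - lower) / (2 * z)


if __name__ == "__main__":
    # Made-up T-scores for five adolescents, purely for illustration.
    first = [45.0, 52.3, 60.1, 38.7, 49.5]
    second = [44.1, 50.0, 61.2, 40.3, 46.9]
    print(bland_altman(first, second))

    # SD of differences implied by the LoA reported in the text.
    print(round(sd_from_loa(-10.3, 8.0), 1))   # Sleep Disturbance-23
    print(round(sd_from_loa(-16.1, 12.9), 1))  # Sleep-Related Impairment-11
```

Read this way, the reported LoA imply a spread of test–retest differences of about 4.7 T‐score points for the Sleep Disturbance‐23 and about 7.4 for the Sleep‐Related Impairment‐11.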

Figure 2

Test–retest reliability (Bland‐Altman plot), Sleep Disturbance‐23

Figure 3

Test–retest reliability (Bland‐Altman plot), Sleep‐Related Impairment‐11


Construct validity

For both the Sleep Disturbance‐23 and the Sleep‐Related Impairment‐11 all results were in accordance with the hypotheses (Table 4). The Sleep Disturbance‐23 showed differences between the different samples of 2.7 to 10.6 points, and the largest difference was found between the high‐school sample and the sleep‐clinic sample. Additionally, adolescents who reported having more sleep problems on the single item also had worse T‐scores. The Sleep‐Related Impairment‐11 showed differences between the different healthy and non‐healthy samples of 4.0 to 8.6 points, corrected for age and gender. Here also, the largest difference was found between the high‐school sample and the sleep‐clinic sample. Adolescents who reported having more sleep problems on the single item also had worse T‐scores.

Table 4

Hypothesis testing Sleep Disturbance‐23 and Sleep‐related Impairment‐11

We expected that:	Sleep Disturbance‐23: mean difference in T‐score (95% confidence interval) ^a	Sleep‐Related Impairment‐11: mean difference in T‐score (95% confidence interval) ^a,b
1. The sleep‐clinic sample had higher scores than the high‐school students	10.6 (8.1–13.1)	8.6 (5.2–11.9)
2. The adolescents with sleep problems and/or relevant health issues had higher scores than healthy high‐school students	4.8 (3.5–6.2)	5.3 (3.5–7.1)
3. The high‐school students with relevant health issues had higher scores than healthy high‐school students	2.7 (1.2–4.2)	4.0 (2.0–6.0)
4. Adolescents who answered item Sleep20 ‘I had a sleep problem’ with a higher response category, would have higher scores
‘Not at all’ versus ‘A little bit’	6.1 (5.1–7.1)	4.5 (3.0–6.0)
‘A little bit’ versus ‘Somewhat’	2.2 (0.9–3.4)	3.9 (1.7–6.1)
‘Somewhat’ versus ‘Quite a bit/very much’	5.2 (4.0–6.5)	3.2 (0.6–5.7)

^a A mean difference of ≥2 points was considered clinically relevant.

^b Corrected for age and gender.


Mean T‐scores for adolescents from the general population

For the Sleep Disturbance‐23, the mean (SD) T‐score in the high‐school sample was 47.3 (7.0), with a range from 25.9 to 70.8. T‐scores did not differ between low and high educational level, and there was no relevant difference between boys and girls or between younger and older adolescents (Table 5). For the Sleep‐Related Impairment‐11, the mean (SD) T‐score in the high‐school sample was 48.6 (9.6), with a range from 31.1 to 82.4. T‐scores did not differ by more than 2 points between low and high educational level. There was a relevant difference between younger and older adolescents, and between boys and girls: older adolescents and girls scored higher (5.0 and 2.7 points, respectively), indicating more sleep‐related impairment (Table 5).

Table 5

Mean T‐scores, high‐school sample

Variables	Sleep Disturbance‐23, mean (SD)	Sleep‐Related Impairment‐11, mean (SD)
Overall	47.3 (7.0)	48.6 (9.6)
Gender
Boys	46.4 (6.9)	47.2 (9.6) ^a
Girls	48.2 (7.0)	49.9 (9.6) ^a
Age
11–14 years	46.6 (6.9)	46.5 (9.1) ^a
15–19 years	48.4 (7.1)	51.5 (9.6) ^a
Educational level
Low	47.7 (7.8)	49.1 (10.4)
High	47.3 (6.8)	48.4 (9.4)

^a Clinically relevant (≥2 points) difference between groups.
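The between‐group differences behind the footnote can be recomputed from the means in Table 5. The following is a small sketch, with the dictionary layout and function name chosen here for illustration only; it uses unadjusted means, so it does not reproduce any correction for covariates.

```python
MIN_RELEVANT_DIFF = 2.0  # threshold stated in the table footnote

# Mean T-scores copied from Table 5 (high-school sample).
TABLE_5 = {
    "Sleep Disturbance-23": {
        "gender": {"Boys": 46.4, "Girls": 48.2},
        "age": {"11-14 years": 46.6, "15-19 years": 48.4},
        "education": {"Low": 47.7, "High": 47.3},
    },
    "Sleep-Related Impairment-11": {
        "gender": {"Boys": 47.2, "Girls": 49.9},
        "age": {"11-14 years": 46.5, "15-19 years": 51.5},
        "education": {"Low": 49.1, "High": 48.4},
    },
}


def group_differences(table, threshold=MIN_RELEVANT_DIFF):
    """Return (bank, variable, absolute difference, is_relevant) per comparison."""
    rows = []
    for bank, variables in table.items():
        for variable, means in variables.items():
            first, second = means.values()  # each variable has two groups
            diff = round(abs(second - first), 1)
            rows.append((bank, variable, diff, diff >= threshold))
    return rows


if __name__ == "__main__":
    for bank, variable, diff, relevant in group_differences(TABLE_5):
        flag = "relevant" if relevant else "not relevant"
        print(f"{bank:28s} {variable:10s} {diff:4.1f}  {flag}")
```

With these inputs, only the gender (2.7) and age (5.0) comparisons for the Sleep‐Related Impairment‐11 reach the 2‐point threshold, matching the rows marked in the table.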


DISCUSSION

The field of adolescent sleep medicine is in need of sleep questionnaires with good psychometric properties. These need to be tested specifically in this age group, because adolescents differ from their younger and older peers in terms of sleep physiology and social influences on sleep. PROMIS has high potential in this field and has developed item banks through IRT, enabling use as CAT, which ultimately leads to less participant burden (Cella et al., 2007, 2010). PROMIS has developed sleep item banks for adults that can possibly also be used in adolescents. In this study, we determined structural validity, measurement invariance, performance as CAT, reliability and construct validity of the PROMIS Sleep Disturbance‐23 and Sleep‐Related Impairment‐11 item banks, adapted for adolescents in previous research.

The Sleep Disturbance‐23 did not meet the assumptions for IRT analyses due to lack of unidimensionality and is therefore not suited for use as CAT in its current form. A proper alternative for the Sleep Disturbance‐23 with sufficient unidimensionality does not currently exist. Ji et al. provided an overview of sleep questionnaires used in adolescents from 2000 to 2016. Only six generic sleep measures were validated to some extent in adolescents. In three of these questionnaires, structural validity was not assessed at all. For the Sleep Disturbance Scale for Children and the Pittsburgh Sleep Quality Index, factor analyses were performed, but the results did not meet the criteria for sufficient structural validity (Bruni et al., 1996; Zhou et al., 2012). The Sleep Disorders Inventory for Students (adolescent version) did meet criteria for CFA, but did not have adequate IRT‐model fit (Ji & Liu, 2016). In contrast to structural validity, the Sleep Disturbance‐23 showed sufficient test–retest reliability in terms of ICC and sufficient construct validity.
Ideally, test–retest reliability would also be assessed by comparing the LoA to the MIC; however, the MIC has not yet been determined for either sleep item bank.

The Sleep‐Related Impairment‐11 met the requirements for IRT analyses and showed good item fit. Post‐hoc CAT simulations using one to 10 items showed a difference of only 0.3% in the number of reliable measurements when compared to the full‐length item bank. There was a floor effect in reliability, meaning that in adolescents with good sleep, reliability is lower. Importantly, in the adolescents with more sleep problems, T‐scores are estimated with high reliability. Test–retest reliability (ICC 0.68) was just below the criterion of 0.70. This might partly be explained by the large group of healthy participants in the study sample, because the Bland‐Altman plot shows that the reliability is lower in the participants with lower (better) scores for this item bank. The Sleep‐Related Impairment‐11 has more measurement error than the Sleep Disturbance‐23, as shown by the wider LoA. This could be explained by the lower number of items (11 versus 23). The construct validity was sufficient.

During the course of this study, PROMIS also developed a Sleep Disturbance and a Sleep‐Related Impairment item bank for children aged 5–17 years. The constructs measured by the adult and paediatric item banks are similar, but the adult item banks are longer and the wording of the items is partly different. The paediatric Sleep Disturbance item bank has 15 items, of which nine are identical to the adult item bank. The paediatric Sleep‐Related Impairment item bank contains 13 items, of which six are identical to the adult item bank. The items that we deleted to fit the bank in Dutch adolescents included both adult‐only and shared items.
Currently, research is being conducted to evaluate the performance of the adult sleep item banks in Dutch adults and the paediatric sleep item banks in Dutch children and adolescents, because the question remains whether the lack of unidimensionality we found for the Sleep Disturbance‐23 is due to the adolescent age group or the construct measured. Preliminary results showed that the adult Sleep Disturbance item bank is unidimensional enough in adults (C.B. Terwee, personal communication, July 31, 2019), but that the paediatric Sleep Disturbance item bank lacks unidimensionality in adolescents (S. Peersmann, personal communication, July 31, 2019). This suggests that the construct of sleep disturbance might not be one unidimensional construct in adolescents, or that the questions developed for adults or a larger paediatric age group do not cover enough of the specific adolescent sleep issues. Regarding the latter and recalling the unique adolescent sleep features, specific questions should mostly focus on problems with sleep onset and cover the sleep‐related impairments due to the tendency for later bedtimes, activities supporting this tendency, and autonomy regarding bedtime (Crowley et al., 2018; Jakobsson et al., 2019). Drawing upon existing measures, example statements could be ‘I have trouble sleeping because I do things in bed that keep me awake (for example reading, watching TV, etc.)’ or ‘When it is time to go to sleep, I have trouble settling down’ (de Bruin, Kampen, Kooten, & Meijer, 2014; Essner, Noel, Myrvik, & Palermo, 2015). Importantly, additional content needs to be developed in accordance with the high standards of PROMIS.

This study had a few limitations. First of all, our study sample was more highly educated than the general Dutch adolescent population, which may limit the generalizability of these results.
Secondly, regarding the analyses, we estimated DIF based on the US adult sample; ideally an adolescent sample would have been used, but this was not available. This means that we do not know if the observed DIF for the Sleep Disturbance‐23 is due to language or age. Additionally, most available DIF methods can detect DIF but cannot identify the DIF items due to parameter identification issues (Bechger & Maris, 2015).

In conclusion, the Sleep Disturbance‐23 is a reliable measure (high ICC) of sleep disturbance and can properly distinguish between clinical and non‐clinical groups of adolescents. Although it is not suitable as CAT in its current form, better alternatives are currently unavailable and its use in adolescents is therefore recommended. Future research is necessary to optimize structural validity in order to enable CAT. In contrast, the Sleep‐Related Impairment‐11 item bank has sufficient structural validity and performed well as CAT. It can properly distinguish between clinical and non‐clinical groups of adolescents, but test–retest reliability was just below the recommended criterion.

CONFLICT OF INTEREST

Caroline B. Terwee is coordinator of the Dutch‐Flemish PROMIS group (http://www.dutchflemishpromis.nl/) and president of the PROMIS Health Organization. Raphaële R.L. van Litsenburg is a member of the Dutch paediatric PROMIS group. They both previously received grants for work on the translation and validation of the PROMIS item banks. The other authors declare that they have no conflict of interest.

AUTHOR CONTRIBUTIONS

All authors of this paper have contributed to writing the manuscript in a significant way. In addition, authors LS, SP and NW collected data in the medical centres, and JvK collected data in the high schools and analyzed the data together with ML. All authors have reviewed and approved the final version submitted.

39 in total

1. A Statistical Test for Differential Item Pair Functioning.

Authors: Timo M Bechger; Gunter Maris
Journal: Psychometrika Date: 2014-09-16 Impact factor: 2.500

2. Sleep deprivation in adolescents: correlations with health complaints and health-related quality of life.

Authors: Teresa Paiva; Tania Gaspar; Margarida G Matos
Journal: Sleep Med Date: 2015-01-20 Impact factor: 3.492

3. Correlates of adolescent sleep time and variability in sleep time: the role of individual and health related characteristics.

Authors: Melisa Moore; H Lester Kirchner; Dennis Drotar; Nathan Johnson; Carol Rosen; Susan Redline
Journal: Sleep Med Date: 2011-03 Impact factor: 3.492

Review 4. Pediatric sleep questionnaires as diagnostic or epidemiological tools: a review of currently available instruments.

Authors: Karen Spruyt; David Gozal
Journal: Sleep Med Rev Date: 2010-10-08 Impact factor: 11.609

5. Development and validation of patient-reported outcome measures for sleep disturbance and sleep-related impairments.

Authors: Daniel J Buysse; Lan Yu; Douglas E Moul; Anne Germain; Angela Stover; Nathan E Dodds; Kelly L Johnston; Melissa A Shablesky-Cade; Paul A Pilkonis
Journal: Sleep Date: 2010-06 Impact factor: 5.849

6. Development of short forms from the PROMIS™ sleep disturbance and Sleep-Related Impairment item banks.

Authors: Lan Yu; Daniel J Buysse; Anne Germain; Douglas E Moul; Angela Stover; Nathan E Dodds; Kelly L Johnston; Paul A Pilkonis
Journal: Behav Sleep Med Date: 2011-12-28 Impact factor: 2.964

7. Exploring the Association Between Self-Reported Asthma Impact and Fitbit-Derived Sleep Quality and Physical Activity Measures in Adolescents.

Authors: Jiang Bian; Yi Guo; Mengjun Xie; Alice E Parish; Isaac Wardlaw; Rita Brown; François Modave; Dong Zheng; Tamara T Perry
Journal: JMIR Mhealth Uhealth Date: 2017-07-25 Impact factor: 4.773

8. Content validity of the Patient-Reported Outcomes Measurement Information System Sleep Disturbance and Sleep Related Impairment item banks in adolescents.

Authors: Jojanneke A M C van Kooten; Caroline B Terwee; Gertjan J L Kaspers; Raphaёle R L van Litsenburg
Journal: Health Qual Life Outcomes Date: 2016-06-18 Impact factor: 3.186

Review 9. Sleep Disturbances in Child and Adolescent Mental Health Disorders: A Review of the Variability of Objective Sleep Markers.

Authors: Suman K R Baddam; Craig A Canapari; Stefon J R van Noordt; Michael J Crowley
Journal: Med Sci (Basel) Date: 2018-06-04

10. Psychometric properties of the Patient-Reported Outcomes Measurement Information System (PROMIS) Sleep Disturbance and Sleep-Related Impairment item banks in adolescents.

Authors: Jojanneke A M C van Kooten; Caroline B Terwee; Michiel A J Luijten; Lindsay M H Steur; Sigrid Pillen; Nicole G J Wolters; Gertjan J L Kaspers; Raphaële R L van Litsenburg
Journal: J Sleep Res Date: 2020-03-16 Impact factor: 3.981
