Psychometric Properties of the Pediatric Patient-Reported Outcomes Measurement Information System Item Banks in a Dutch Clinical Sample of Children With Juvenile Idiopathic Arthritis.

Michiel A J Luijten¹, Caroline B Terwee², Hedy A van Oers³, Mala M H Joosten⁴, J Merlijn van den Berg³, Dieneke Schonenberg-Meinema³, Koert M Dolman⁵, Rebecca Ten Cate⁶, Leo D Roorda⁷, Martha A Grootenhuis⁴, Marion A J van Rossum⁸, Lotte Haverman³.

Abstract

OBJECTIVE: To assess the psychometric properties of 8 pediatric Patient-Reported Outcomes Measurement Information System (PROMIS) item banks in a clinical sample of children with juvenile idiopathic arthritis (JIA).
METHODS: A total of 154 Dutch children (mean ± SD age 14.4 ± 3.0 years; range 8-18 years) with JIA completed 8 pediatric version 1.0 PROMIS item banks (anger, anxiety, depressive symptoms, fatigue, pain interference, peer relationships, physical function mobility, physical function upper extremity) twice and the Pediatric Quality of Life Inventory (PedsQL) and the Childhood Health Assessment Questionnaire (C-HAQ) once. Structural validity of the item banks was assessed by fitting a graded response model (GRM) and inspecting GRM fit (comparative fit index [CFI], Tucker-Lewis index [TLI], and root mean square error of approximation [RMSEA]) and item fit (S-X2 statistic). Convergent validity (with PedsQL/C-HAQ subdomains) and discriminative validity (active/inactive disease) were assessed. Reliability of the item banks, short forms, and computerized adaptive testing (CAT) was expressed as the SE of theta (SE[θ]). Test-retest reliability was assessed using intraclass correlation coefficients (ICCs) and smallest detectable change.
RESULTS: All item banks had sufficient overall GRM fit (CFI >0.95, TLI >0.95, RMSEA <0.08) and no item misfit (all S-X2 P > 0.001). High correlations (>0.70) were found between most PROMIS T scores and hypothesized PedsQL/C-HAQ (sub)domains. Mobility, pain interference, and upper extremity item banks were able to discriminate between patients with active and inactive disease. Regarding reliability, PROMIS item banks outperformed legacy instruments. Post hoc CAT simulations outperformed short forms. Test-retest reliability was strong (ICC >0.70) for all full-length item banks and short forms, except for the peer relationships item bank.
CONCLUSION: The pediatric PROMIS item banks displayed sufficient psychometric properties for Dutch children with JIA. PROMIS item banks are ready for use in clinical research and practice for children with JIA.

Year: 2020 PMID: 31628731 PMCID: PMC7756261 DOI: 10.1002/acr.24094

Source DB: PubMed Journal: Arthritis Care Res (Hoboken) ISSN: 2151-464X Impact factor: 4.794

INTRODUCTION

In recent years, the focus of health care has been drifting toward the inclusion of health‐related quality of life (HRQoL) outcomes for patients in research and daily clinical practice by administering patient‐reported outcome measures (PROMs) (1, 2, 3, 4, 5, 6). Previous studies have shown that rheumatology could benefit greatly from the use of patient‐reported outcomes, as patients experience a wide array of problems (7) for which there is a disconnect between patient‐reported outcomes and outcomes reported by parents or clinicians (8). In clinical practice, there are often multiple PROMs available to measure the same construct/domain that differ in content, length, and scoring methods. These PROMs vary in their psychometric quality and often suffer from ceiling or floor effects when assessing patients who are outside the measurement range of the questionnaire. Most traditional PROMs (also known as legacy instruments) are scored using classical test theory (CTT), where all questions carry the same weight when calculating domain scores. The domain scores of these PROMs are incomparable due to the ordinal scoring methods used in CTT. In item response theory (IRT) modeling, the difficulty and discriminatory power of items can be taken into account when calculating a domain score. Additionally, IRT uses interval‐based scores, which allows comparison of scores on the same metric. Therefore, a group of researchers from several US‐based academic institutions and the National Institutes of Health initiated the creation of the Patient‐Reported Outcomes Measurement Information System (PROMIS) (9, 10), a new, universal set of IRT‐based PROMs for adults and children that can accurately and quickly assess aspects of physical, mental, and social health of patients (9, 11). 
This article provides an extensive overview of the psychometric properties of the Dutch pediatric Patient‐Reported Outcomes Measurement Information System (PROMIS) item banks in a sample of children with juvenile idiopathic arthritis (JIA). This is the first study to provide a full calibration of Dutch PROMIS pediatric item banks in a clinical sample. This article demonstrates the advantages of computerized adaptive testing in clinical populations such as children with JIA.
The US PROMIS group developed several item banks to assess relevant domains of physical, mental, and social health, such as fatigue, pain interference, or peer relationships (10). An item bank is a collection of a large number of items intended to measure 1 construct over a wide range of functioning, symptoms, or evaluations of well‐being. This allows comparisons between different samples using the same PROM. The PROMIS item banks were developed using IRT modeling, which allows us to order items based on their difficulty. Using this information, items can be selected from the full‐length item bank to create a short form, which measures a similar range of functioning as the full‐length item bank. An online alternative to short forms is computerized adaptive testing (CAT). CAT uses the information of the IRT model (i.e., item difficulty and discrimination) and previous responses (11) to choose which items to administer to a specific patient. If, for example, a patient answers that he or she is never tired, the CAT will not offer an item about being exhausted to this patient, as the item about being exhausted has a higher difficulty. CAT thus provides more tailored items to patients than short forms, which makes the estimates of the construct more reliable (12). As long as items are selected from the same item bank, scores from short forms and CATs can be compared on the same scale.
In 2009, the Dutch‐Flemish PROMIS group (www.dutchflemishpromis.nl) was founded, followed by the Dutch‐Flemish pediatric PROMIS group in 2011, to translate and implement the PROMIS item banks in The Netherlands and Flanders, Belgium. The pediatric PROMIS group translated 9 full PROMIS item banks into Dutch‐Flemish (13). The goal of this study was to assess the psychometric properties of 8 Dutch‐Flemish PROMIS pediatric item banks in a clinical sample of Dutch children with juvenile idiopathic arthritis (JIA). The application of PROMIS is highly anticipated within rheumatology (14, 15), and psychometric properties of the pediatric item banks were previously assessed in children with JIA in the US (8), making comparisons possible. In the current study, the structural validity of the item banks was investigated and construct validity was assessed by comparing the PROMIS instruments to legacy instruments (the Pediatric Quality of Life Inventory [PedsQL] and the Childhood Health Assessment Questionnaire [C‐HAQ]) and by comparing scores from patients with active and inactive disease. Furthermore, we assessed the reliability of the individual measurements for full‐length item banks, short forms, and CATs. Finally, we assessed the test–retest reliability of the PROMIS item banks.

METHODS

Participants

All children diagnosed with JIA, 8–18 years of age, and under treatment in the Emma Children’s Hospital Amsterdam University Medical Centers, Onze Lieve Vrouwe Gasthuis West, the Reade center for Rehabilitation and Rheumatology in Amsterdam, or the Leiden University Medical Centre in Leiden, were eligible and asked to participate in the study between June 2015 and January 2017. The study was approved by the medical ethics committees of all the participating centers. An invitation was sent to children and their parents to log in to the study website (www.hetklikt.nu/promis). All participants provided informed consent. Participating children were asked to complete 8 full pediatric PROMIS item banks at the start of the study (T1) and again 10 days later (T2) to assess test–retest reliability. Additionally, participants were asked to complete the PedsQL and C‐HAQ at T1. All questionnaires were completed online. A reminder for T1 and T2 was sent out 3 days after the initial invitation. Children unable to understand Dutch or children with limitations/disorders that made them unable to complete (online) questionnaires were excluded from the study. Nonrespondent data were not available.

Patient characteristics

Personal data on age and sex were provided by the children. Medical data on the type of JIA, presence of uveitis, medication use, age at disease onset, disease duration, physician score of disease activity, and the number of joints with arthritis (1 = monoarthritis, 2–4 = oligoarthritis, 5–10 = polyarthritis, >10 = severe polyarthritis) were extracted retrospectively by pediatric rheumatology experts from the electronic medical records. The type of JIA was categorized according to the International League of Associations for Rheumatology criteria (16). Disease activity was rated by a rheumatologist using the 100‐mm physician visual analogue scale (VAS; range 0–100, with 0 indicating no disease activity and higher scores indicating more activity).

Measures

PROMIS item banks

Eight full‐length, Dutch PROMIS, version 1.0, pediatric self‐report item banks (anger [17], anxiety [18], depressive symptoms [18], fatigue [19], pain interference [20], peer relationships [21], physical function mobility, and physical function upper extremity [22]) were completed by the children. All item banks utilize a 7‐day recall period. A 5‐point Likert scale ranging from 1 (“never”) to 5 (“almost always”) is used for all item banks, except the mobility and upper extremity item banks. For these item banks, the response categories range from 1 (“not able to do”) to 5 (“with no trouble”). Total scores are calculated by applying the original US IRT model to the data and estimating the level of functioning of the patient (theta). This level of functioning is transformed into a T score, with a score of 50 representing the mean of the general US population (SD 10). For all item banks, higher scores represent more of the construct (e.g., better mobility or more pain interference). Scores can also be calculated for the standard PROMIS short forms, consisting of 8 items for all domains, except for anger (5 items) and fatigue (10 items), by extracting short‐form item responses from the full‐length item bank.
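The transformation from theta to the T score metric described above can be illustrated with a minimal sketch. It assumes that theta is already expressed on the US reference metric (mean 0, SD 1), so that a linear rescaling gives a mean of 50 and an SD of 10. The function names are hypothetical and are not part of any PROMIS software.

```python
def theta_to_t_score(theta: float) -> float:
    """Rescale a theta estimate (mean 0, SD 1) to the T score metric (mean 50, SD 10)."""
    return 50.0 + 10.0 * theta


def t_score_to_theta(t_score: float) -> float:
    """Inverse transformation: T score back to theta."""
    return (t_score - 50.0) / 10.0


# Illustrative values only (not study data): a theta one SD below the
# reference mean corresponds to a T score of 40.
example_t = theta_to_t_score(-1.0)
```

Because the transformation is linear, differences between patients are preserved; only the unit of the scale changes.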

PedsQL generic scale 4.0

The PedsQL (23) is a 23‐item questionnaire that assesses the self‐reported HRQoL of children (ages 8–18 years) across the following 4 domains: physical functioning (8 items); emotional functioning (5 items); social functioning (5 items); and school functioning (5 items). The PedsQL utilizes a 7‐day recall period. Items are scored using a 5‐point Likert scale ranging from 1 (“never a problem”) to 5 (“almost always a problem”). The response options are transformed into values of 0, 25, 50, 75, and 100 respectively. Domain scores (range 0–100, with a higher score representing better functioning) are calculated by summing and averaging the items within each domain. The total PedsQL score (range 0–100) is calculated by averaging all individual item scores. The PedsQL is an often used, validated tool for Dutch children with JIA (7, 24).

C‐HAQ

The C‐HAQ is a 30‐item questionnaire that measures self‐reported functional ability in children (ages 8–18 years) (25). The C‐HAQ is composed of the following 8 categories: dressing and grooming (4 items); arising (2 items); eating (3 items); walking (2 items); hygiene (5 items); reach (4 items); grip (5 items); and activities (5 items). The C‐HAQ utilizes a 1‐week recall period. Each item on the C‐HAQ is scored from 0 (“without any difficulty”) to 3 (“unable to do”). The highest scoring item within a category determines the score for that category. The disability index (range 0 [low]–3 [high]) averages the category scores. Additionally, the C‐HAQ contains two 100‐mm VAS to measure pain (0 = no pain, 100 = very severe pain) and well‐being (0 = very well, 100 = very poor) over the past week. The C‐HAQ is a validated tool for assessing Dutch children with JIA (25, 26) and a recommended instrument for assessing daily functioning in rheumatology patients (27).
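The scoring rule described above (the highest item score determines the category score, and the disability index averages the category scores) can be sketched as follows. This is a simplified illustration that covers only the steps stated in the text; the function name and the example responses are hypothetical.

```python
from statistics import mean
from typing import Mapping, Sequence


def chaq_disability_index(responses: Mapping[str, Sequence[int]]) -> float:
    """Compute a disability index from item scores grouped by category.

    Each item is scored 0 ("without any difficulty") to 3 ("unable to do").
    The category score is the highest item score within that category, and
    the disability index is the mean of the category scores (range 0-3).
    """
    category_scores = []
    for category, items in responses.items():
        if not items:
            raise ValueError(f"no item scores for category {category!r}")
        if any(score not in (0, 1, 2, 3) for score in items):
            raise ValueError(f"item scores must be 0-3 in category {category!r}")
        category_scores.append(max(items))
    return mean(category_scores)


# Hypothetical responses for the 8 categories (not study data).
example_responses = {
    "dressing and grooming": [0, 1, 0, 0],
    "arising": [0, 0],
    "eating": [0, 0, 0],
    "walking": [2, 1],
    "hygiene": [0, 0, 0, 0, 0],
    "reach": [1, 1, 0, 0],
    "grip": [0, 0, 0, 0, 0],
    "activities": [0, 0, 3, 0, 0],
}
example_index = chaq_disability_index(example_responses)
```

Note that a single severely limited activity raises the whole category score, which is one reason why the disability index behaves differently from an averaged item score.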

Statistical analysis

Descriptive analyses were performed to describe sociodemographic and clinical characteristics of the children using SPSS, version 24.0 (28). All further analyses were performed in R (29).

Structural validity

To assess the structural validity of the PROMIS item banks, a graded response model (GRM) was fitted to each of the item banks. A GRM is an IRT model for items with ordinal response categories and requires several assumptions to be met, such as unidimensionality, local independence, and monotonicity. To assess unidimensionality of each item bank, a confirmatory factor analysis (CFA) was performed using the R‐package lavaan, version 0.6‐3 (30). An acceptable fit of a unidimensional model is indicated by a comparative fit index (CFI) value and Tucker‐Lewis Index (TLI) score >0.95, a standardized root mean square residual (SRMR) value <0.10, and a root mean square error of approximation (RMSEA) value <0.08 (31). Scaled indices were reported. Local independence was assessed by looking at the residual correlations in the CFA model. An item pair is considered locally independent when it has a residual correlation <0.20 (32). Finally, monotonicity was assessed using Mokken scaling (33, 34). The assumption of monotonicity is met when the item H values of all items are ≥0.30 and the H value of the entire scale is ≥0.50. Once the assumptions were met, a GRM was fitted to each item bank to estimate item discrimination and threshold parameters using the expectation‐maximization algorithm in the R‐package mirt, version 1.29 (35). The discrimination parameter (α) represents the ability of an item to distinguish between patients with a different level of functioning (θ). The threshold parameters (β) represent the required level of functioning of a person to choose a higher response category over a lower response category. Previous simulation studies have shown that fitting a GRM requires a large sample size of ~500 respondents in most cases, but that increased unidimensionality and high discriminatory parameters of an item bank reduce the number of respondents required (36, 37).
As the items in PROMIS item banks were specifically chosen based on their discriminatory power and their contribution to measuring a single construct, we expected that a smaller sample size could be used. Caution is advised when assessing the estimated parameters, however, as other sample characteristics (i.e., skewness of responses) can impact parameter calibration. Model fit of the GRM was assessed using the same CFI, TLI, SRMR, and RMSEA criteria as for the CFA. Item fit was assessed using the S‐X2 statistic (38), which calculates the differences between observed and expected responses under the GRM. A P value of the S‐X2 statistic <0.001 for an item is considered to indicate item misfit (32).
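To make the roles of the discrimination parameter (α) and the threshold parameters (β) concrete, the following minimal sketch computes the response category probabilities of a single item under a GRM in its standard logistic form. It is an illustration only, not the estimation procedure used in the study; the function name and the parameter values are hypothetical.

```python
import math
from typing import List, Sequence


def grm_category_probabilities(theta: float, alpha: float,
                               betas: Sequence[float]) -> List[float]:
    """Category probabilities for one item under a graded response model.

    theta: level of functioning of the person.
    alpha: item discrimination.
    betas: ordered thresholds; K - 1 thresholds for an item with K categories.
    """
    if list(betas) != sorted(betas):
        raise ValueError("thresholds must be in increasing order")
    # Probability of responding in or above each category boundary,
    # padded with 1 (lowest boundary) and 0 (beyond the highest category).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-alpha * (theta - b))) for b in betas]
    cumulative.append(0.0)
    # The probability of a category is the difference of adjacent boundaries.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(betas) + 1)]


# Hypothetical 5-category item (not an estimated PROMIS item).
example_probs = grm_category_probabilities(theta=0.0, alpha=2.0,
                                           betas=[-1.5, -0.5, 0.5, 1.5])
```

With these illustrative values, a person at θ = 0 is most likely to choose the middle category. A larger α makes the category probabilities change more sharply around the thresholds, which is why highly discriminating items contribute more information.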

Construct validity

Construct validity was investigated by assessing convergent and discriminative validity. Convergent validity was assessed by correlating the PROMIS item bank T scores to the PedsQL or C‐HAQ using Pearson’s correlation coefficient (r). A strong correlation (>0.70 or less than –0.70) was expected between PROMIS T scores and the sum scores of the PedsQL and C‐HAQ scales measuring similar constructs. Correlations with unrelated constructs were expected to be lower (Δr > 0.10). Discriminative (known‐groups) validity was assessed by comparing the T scores of PROMIS item banks between patients with an active and inactive disease using an independent sample t‐test. Disease activity can be represented by results from the physician VAS and the number of joints with arthritis. However, a combination of these variables would result in an active disease group too small for valid comparison. The correlation between these 2 variables was high (r = 0.75), indicating that a combination of these variables would not impact the results much. Therefore, the physician VAS was used to discriminate active (>0) and inactive (0) disease, as this resulted in large enough groups for valid and reliable comparisons. It was expected that the physical health domains would be most affected by JIA (7, 24). Mobility and upper extremity T scores were hypothesized to be significantly lower for patients with an active disease. The pain interference T scores were expected to be significantly higher for patients with active disease. For the remaining item banks, no differences in T scores were hypothesized between patients with active and inactive disease. Each PROMIS item bank was considered to have sufficient construct validity if at least 75% of the hypotheses were confirmed.

Reliability

In IRT, the reliability of an item bank can vary across levels of the measured construct. The estimated level of functioning is represented by θ, which is standardized to have a mean of 0 and an SD of 1 in the calibration sample. Each response pattern has a θ estimate and an associated SE of the theta estimate (SE[θ]). An SE(θ) of 0.32 corresponds to a reliability of 0.90. To compare the reliability of the PROMIS item banks to similar domains on the PedsQL, a GRM was fit to each PedsQL domain to calculate the θ estimates and SE(θ). Thetas and SE(θ)s were estimated for the full‐length PROMIS item banks and short forms using the expected a posteriori (EAP) estimator. Post hoc CAT simulations were performed on each item bank using R‐package catR, version 3.16 (39), with the maximum posterior weighted information selection criterion and the EAP estimator (40), to assess whether a CAT would outperform the short forms. The starting item for each CAT was the item that offered the most information at the mean of the population (θ = 0). The maximum number of items for the CAT simulation was set to the number of items in the short form of the same item bank, which ensured that the CAT did not administer more items than the short form. The CAT stopped once the SE(θ) was <0.32 (41).
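The correspondence between an SE(θ) of 0.32 and a reliability of 0.90, and the two stopping conditions of the CAT simulation, can be sketched as follows. The conversion assumes that θ is scaled to a variance of 1, in which case reliability can be approximated as 1 − SE(θ)². The function names are hypothetical and do not correspond to functions in catR.

```python
def reliability_from_se(se_theta: float) -> float:
    """Approximate reliability for a theta estimate, assuming Var(theta) = 1."""
    return 1.0 - se_theta ** 2


def cat_should_stop(se_theta: float, items_administered: int,
                    max_items: int, se_threshold: float = 0.32) -> bool:
    """Stop when the estimate is precise enough or the item limit is reached.

    max_items plays the role of the short-form length of the same item bank.
    """
    return se_theta < se_threshold or items_administered >= max_items


# SE(theta) = 0.32 gives 1 - 0.1024 = 0.8976, i.e., roughly 0.90.
example_reliability = reliability_from_se(0.32)
```

In this sketch a CAT for an item bank with an 8‐item short form would continue after 5 items if the SE(θ) were still 0.40, and would stop at 8 items regardless of the precision reached.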

Test–retest reliability

Test–retest reliability was assessed for the full‐length item banks and the short forms by calculating the intraclass correlation coefficient (ICC; two‐way random‐effects model for absolute agreement) (42) of the T scores for the patients who completed the PROMIS item banks twice (within 4 weeks). An ICC >0.70 was considered acceptable (42). The smallest detectable change (SDC) was calculated for all full‐length item banks as SDC = 1.96 × √2 × SD × √(1 – ICC). The SDC represents the smallest change in score that falls outside of the measurement error (42).
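The SDC calculation can be written out as a short sketch. The term SD × √(1 – ICC) is the SE of measurement, and the factor 1.96 × √2 turns it into a 95% limit for the difference between 2 measurements. The function names and the input values are hypothetical and are not taken from the study data.

```python
import math


def standard_error_of_measurement(sd: float, icc: float) -> float:
    """SE of measurement from the score SD and the test-retest ICC."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def smallest_detectable_change(sd: float, icc: float) -> float:
    """SDC = 1.96 * sqrt(2) * SD * sqrt(1 - ICC)."""
    return 1.96 * math.sqrt(2.0) * standard_error_of_measurement(sd, icc)


# Illustrative input: an SD of 10 T score points and an ICC of 0.75
# give an SE of measurement of 5 and an SDC of about 13.9 points.
example_sdc = smallest_detectable_change(sd=10.0, icc=0.75)
```

The formula shows why a lower ICC or a larger score SD both increase the SDC: either one enlarges the measurement error that a real change has to exceed.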

RESULTS

Patient characteristics

A total of 154 children with JIA completed all PROMIS pediatric item banks, the PedsQL, and the C‐HAQ. A total of 111 children completed the item banks twice, with a time interval ranging from 1 to 14 weeks (mean 2.6). Patient characteristics are shown in Table 1. The mean ± SD age of patients was 14.4 ± 3.0 years (range 8–18 years), and the majority of the patients were female (70.7%). The most common diagnosis was polyarticular JIA (44.0%). More than one‐half of the patients had inactive disease (66.9%), as measured by the physician VAS (n = 140). The distribution of joints affected by arthritis (n = 149) was 75.8% no arthritis, 8.9% monoarthritis, 7.0% oligoarthritis, and 3.2% polyarthritis.

Table 1

Patient characteristics*

Characteristics	No.	Value
Age, mean ± SD years	157	14.4 ± 3.0
Age at onset of JIA, mean ± SD years	157	8.9 ± 4.5
Sex, female	111	70.7
JIA subtype
Oligoarticular JIA, persistent	26	16.6
Oligoarticular JIA, extended	16	10.2
Polyarticular JIA, RF negative	62	39.5
Polyarticular JIA, RF positive	7	4.5
Enthesitis‐related arthritis	21	13.4
Psoriatic arthritis	11	7.0
Undifferentiated arthritis	0	0
Systemic JIA	4	2.5
Chronic arthritis with other autoimmune inflammatory disease	8	5.1
Disease specifications
Disease duration, median (range)	157	4.9 (0.18–16.8)
Physician assessment of disease activity, VAS score (range 0–100) ^†	140	0 (0–50)
Number of joints with arthritis ^‡
No arthritis	119	75.8
Monoarthritis (1 joint)	14	8.9
Oligoarthritis (2–4 joints)	11	7.0
Polyarthritis (>4 joints)	5	3.2
Presence of uveitis	26	16.6
Medication at time point of evaluation
No medication	60	39.0
NSAIDs	20	12.7
MTX	69	43.9
Other DMARDs	4	2.5
Anti‐TNF	45	28.7
Other biologics	2	1.3
Multiple medications	38	24.2

* Values are the percentage unless indicated otherwise. JIA = juvenile idiopathic arthritis; RF = rheumatoid factor; VAS = visual analog scale; NSAIDs = nonsteroidal antiinflammatory drugs; MTX = methotrexate; DMARDs = disease‐modifying antirheumatic drugs; anti‐TNF = anti–tumor necrosis factor.

† Physician VAS outcomes were missing for 17 patients at the time of measurement.

‡ Information on the number of joints with arthritis was missing for 8 patients at the time of measurement.

Structural validity

Unidimensionality was sufficient for all item banks except for the anxiety item bank (RMSEA = 0.103) (Table 2). Local independence did not hold for 4 item banks (anxiety, mobility, peer relationships, and upper extremity). As the percentages of locally dependent item pairs were low (1–4%), the GRM analyses were performed without removing items. The assumption of monotonicity was met for all items and item banks. The item parameters and item fit statistics of the fitted GRMs are available in Supplementary Table 1, available on the Arthritis Care & Research website at http://onlinelibrary.wiley.com/doi/10.1002/acr.24094/abstract. The discrimination parameters ranged from 1.07 to 22.25. Two items of the upper extremity item bank had outlying discrimination parameters (α > 10). For the item banks peer relationships, mobility, and upper extremity, not all item thresholds could be calculated, as not all response categories were used by the respondents. There were no items with item misfit (S‐X2 P < 0.001) in any of the item banks.

Table 2

Model assumptions of the PROMIS pediatric item banks for children with juvenile idiopathic arthritis (n = 155)*

Item bank	Unidimensionality, CFI score	Unidimensionality, TLI score	Unidimensionality, SRMR	Unidimensionality, RMSEA	Local independence, no. (%) ^†	Monotonicity, H scale
Anger scale	0.995	0.989	0.032	0.053	0 (0)	0.726
Anxiety	0.983	0.980	0.077	0.103	1 (1.3)	0.662
Depressive symptoms	0.996	0.995	0.035	0.000	0 (0)	0.733
Fatigue	0.991	0.990	0.042	0.055	0 (0)	0.743
Mobility (n = 156)	0.992	0.991	0.072	0.000	6 (2.4)	0.588
Pain interference	0.987	0.985	0.044	0.059	0 (0)	0.682
Peer relationships	0.954	0.947	0.080	0.080	4 (3.8)	0.508
Upper extremity (n = 157)	0.991	0.990	0.073	0.021	5 (1.2)	0.573

* PROMIS = Patient‐Reported Outcomes Measurement Information System; CFI = comparative fit index; TLI = Tucker‐Lewis index; SRMR = standardized root mean square residual; RMSEA = root mean square error of approximation.

† Locally dependent item pairs.

Construct validity

The correlations between the PROMIS item banks, the PedsQL, and the C‐HAQ are shown in Table 3. For all item banks, at least 1 expected strong correlation (>0.70) with a relevant PedsQL or C‐HAQ subdomain was found, except for the peer relationships item bank. For the mobility and upper extremity item banks, additional correlations were found that were nearly the same strength (Δr < 0.10) as the hypothesized strong correlation with the PedsQL physical subscale.

Table 3

Convergent and discriminative validity of the pediatric PROMIS item banks for children with juvenile idiopathic arthritis (n = 154)*

PROMIS item bank	PedsQL, physical	PedsQL, emotional	PedsQL, social	C‐HAQ, total	C‐HAQ, grip	C‐HAQ, pain	Active disease ^†	Inactive disease ^‡	Mean difference ± SD	Total hypotheses correct, %
Anger	–0.48	–0.72 ^§	–0.58	0.48	0.46	0.37	50.60	49.63	–0.97 ± 1.86	100
Anxiety	–0.50	–0.78 ^§	–0.62	0.48	0.46	0.38	49.64	50.28	0.64 ± 1.81	100
Depressive symptoms	–0.54	–0.79 ^§	–0.60	0.48	0.42	0.48	51.25	49.51	–1.74 ± 1.84	100
Fatigue	0.76 ^§	–0.62	–0.61	0.61	0.49	0.67	51.86	49.22	–2.64 ± 1.91	86
Mobility	0.83 ^§	0.52	0.67	–0.74 ^§	–0.52	–0.71 ^§	46.58	51.2	4.62 ± 1.85 ^¶	71
Pain interference	–0.76 ^§	–0.62	–0.65	0.64 ^§	0.49	0.75 ^§	53.36	48.43	–4.93 ± 1.83 ^¶	86
Peer relationships	0.29	0.44	0.54 ^§	–0.32	–0.28	–0.22	51.17	49.72	–1.45 ± 1.90	71
Upper extremity	0.79 ^§	0.56	0.65	–0.77 ^§	–0.70 ^§	–0.66 ^§	47.37	51.18	3.80 ± 1.75 ^¶	71

* PROMIS = Patient‐Reported Outcomes Measurement Information System; PedsQL = Pediatric Quality of Life Inventory; C‐HAQ = Childhood Health Assessment Questionnaire.

† N = 35.

‡ N = 103.

§ Significant; numbers were hypothesized to be highly (>0.70) correlated or able to discriminate between patients with active and inactive disease.

¶ Significant at P < 0.05; numbers were hypothesized to be highly (>0.70) correlated or able to discriminate between patients with active and inactive disease.

Discriminative validity was assessed by comparing T scores from patients with active disease (n = 35) to those from patients with inactive disease (n = 105). The results are shown in Table 3. Patients with active disease scored significantly lower on the mobility item bank (mean difference –4.62, t(138) = 2.50, P = 0.014) and the upper extremity item bank (mean difference –3.81, t(137) = 2.17, P = 0.032) than patients with inactive disease. For the pain interference item bank, patients with active disease scored significantly higher (mean difference 4.93, t(136) = –2.70, P = 0.008) than patients with no disease activity. For the anger, anxiety, depressive symptoms, fatigue, and pain interference item banks, at least 75% of the hypotheses regarding construct validity were confirmed. The mobility, upper extremity, and peer relationships item banks did not meet the criterion (71%).

Reliability

All PROMIS item banks provided reliable measurements (SE[θ] < 0.32) for the sample mean of 0 and a range of at least 2 SD of theta in the direction of clinical interest (e.g., higher thetas for depressive symptoms, lower thetas for mobility). The only exception was the upper extremity item bank, which did not reach satisfactory reliability for the mean. The reliability of measurements of the full item bank, short forms, post hoc CATs, and their related subdomain from the PedsQL across the range of theta for all item banks is visualized in Figures 1 and 2. The number of reliable measurements, the number of items used, and the average SE(θ) value of the full item banks, short forms, and CATs are shown in Table 4.

Figure 1

Reliability of measurements of the full item bank, short forms, post hoc computerized adaptive testing (CAT), and their related subdomain from the Pediatric Quality of Life Inventory (PedsQL) across the range of theta for all item banks. PROMIS = Patient‐Reported Outcomes Measurement Information System.

Figure 2

Reliability of measurements of the full item bank, short forms, post hoc computerized adaptive testing (CAT), and their related subdomain from the Pediatric Quality of Life Inventory (PedsQL) across the range of theta for all item banks. PROMIS = Patient‐Reported Outcomes Measurement Information System.

Table 4

Reliability and test–retest reliability of measurements of the full‐length (FL) item banks, short forms (SF), and computerized adaptive testing (CAT) of the pediatric PROMIS item banks in a sample of children with juvenile idiopathic arthritis (n = 155)*

PROMIS item	Mean FL SE(θ)	FL SE(θ) <0.32, no. (%) ^†	Mean SF SE(θ)	SF SE(θ) <0.32, no. (%) ^†	Mean CAT SE(θ)	CAT SE(θ) <0.32, no. (%) ^†	Mean CAT items administered	FL, no. of items	SF, no. of items	FL, ICC (95% CI) (n = 101)	SF, ICC (n = 101)	SDC (n = 101)
Anger scale	0.37	89 (57.4)	0.51	89 (57.4)	0.40	89 (57.4)	3.6	5	5	0.70 (0.59–0.70)	0.70	15.3
Anxiety	0.36	86 (55.5)	0.52	73 (47.1)	0.41	84 (54.2)	5.6	13	8	0.77 (0.68–0.84)	0.76	13.5
Depressive symptoms	0.34	89 (57.4)	0.75	78 (50.3)	0.40	88 (56.8)	5.2	13	8	0.79 (0.70–0.85)	0.77	14.2
Fatigue	0.20	123 (79.4)	0.40	108 (69.7)	0.31	114 (73.5)	4.7	23	10	0.87 (0.82–0.91)	0.85	17.2
Mobility	0.30	106 (67.5)	0.54	79 (50.3)	0.37	99 (63.1)	5.3	23	8	0.84 (0.76–0.89)	0.81	13.3
Pain interference	0.27	108 (69.7)	0.40	103 (66.5)	0.36	106 (68.4)	4.5	13	8	0.83 (0.77–0.89)	0.82	13.6
Peer relationships	0.29	112 (72.3)	0.41	82 (52.9)	0.36	97 (62.6)	5.8	15	8	0.69 (0.58–0.78)	0.72	18.7
Upper extremity	0.38	76 (48.7)	0.84	65 (41.7)	0.45	70 (44.9)	6.0	29	8	0.86 (0.80–0.90)	0.84	12.1

* PROMIS = Patient‐Reported Outcomes Measurement Information System; SE(θ) = SE of theta; ICC = intraclass correlation coefficient; 95% CI = 95% confidence interval; SDC = smallest detectable change.

† Number of patients with an SE(θ) <0.32. An SE(θ) of 0.32 equals a reliability of 0.90.

Test–retest reliability

Ten patients were removed from the test–retest reliability analyses because they did not complete the second measurement within 4 weeks of the initial measurement. Most item banks displayed sufficient test–retest reliability (ICC >0.70). Only the peer relationships item bank displayed moderate test–retest reliability (ICC 0.69). The SDC ranged from 12.1 to 18.7. The SDC and ICC values are shown in Table 4.
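
As a rough illustration of how an SDC relates to measurement error, the sketch below uses a common formulation from the measurement literature: SEM = SD × √(1 − ICC) and SDC = 1.96 × √2 × SEM. This section does not state which formula or variance components the authors used, so the functions, their names, and the SD of 10 are assumptions for illustration only and are not expected to reproduce the tabulated SDC values.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from a score SD and an ICC (assumed formula)."""
    return sd * math.sqrt(1.0 - icc)


def smallest_detectable_change(sem: float) -> float:
    """SDC at the individual level: 1.96 * sqrt(2) * SEM (assumed formula)."""
    return 1.96 * math.sqrt(2.0) * sem


def sem_from_sdc(sdc: float) -> float:
    """Back out the SEM implied by a reported SDC under the same assumption."""
    return sdc / (1.96 * math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical input: an SD of 10 T-score points with the mobility ICC of 0.84.
    sem = sem_from_icc(sd=10.0, icc=0.84)
    print(f"SEM: {sem:.2f}, SDC: {smallest_detectable_change(sem):.2f}")

    # SEM implied by the smallest and largest SDC values reported in the text.
    for sdc in (12.1, 18.7):
        print(f"SDC {sdc}: implied SEM {sem_from_sdc(sdc):.2f}")
```

Read this way, an SDC is about 2.77 times the SEM, so the reported range of 12.1 to 18.7 would correspond to an SEM of roughly 4 to 7 T-score points if this formulation applies.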

DISCUSSION

This is the first study to assess the psychometric properties of the pediatric PROMIS item banks in a Dutch clinical sample. All PROMIS item banks displayed sufficient validity and reliability for use in clinical practice for children with JIA. All item banks fit the underlying IRT model. The item banks correlated highly with similar (sub)domains from the legacy instruments PedsQL and C‐HAQ. The pain interference, mobility, and upper extremity item banks were able to discriminate between active and inactive JIA. Other studies have shown that issues regarding physical health commonly occur in these 3 domains in children with JIA (7, 24). All item banks measure their domain‐specific levels of functioning accurately across a wide range of functioning and in the clinically most relevant direction from the mean. The PROMIS short forms and CATs provided reliable estimates for the majority of patients. CATs outperformed short forms in terms of test length and number of reliably estimated patients.

The aim of the pediatric Dutch‐Flemish PROMIS group is to improve the measurement of patient‐reported outcomes in The Netherlands and Belgium by providing researchers and health care professionals access to the generic pediatric PROMIS item banks, short forms, and CATs. The current study supports the application of CATs in clinical samples. The PROMIS item banks outperformed legacy instruments (the PedsQL) by providing more reliable measurements across a broader range of functioning.

A limitation of this study is that our sample was small and contained a large proportion of patients with inactive disease. Due to a combination of relatively good health and a small sample size, the physical function item banks did not have enough variation in responses to provide reliable parameter estimates; in particular, 2 items from the upper extremity item bank had outlying discrimination parameters due to a lack of variety in item responses.
Due to the skewed data, a moderate ceiling effect was present for the mobility and upper extremity item banks. This might indicate that these item banks do not contain enough items with a high difficulty to discriminate between patients with healthier levels of functioning. However, having less precise measurements at a healthy level of functioning is less important than having precise measurements in the clinical range. The skewness of the data also has an effect on the informative value of items and, consequently, on the SE(θ). The peer relationships, mobility, and upper extremity item banks displayed lower item thresholds and some locally dependent item pairs, also due to skewness. Similar skewed data were found in a US sample of patients with JIA (8). Despite the small sample size, this study shows that there are strong psychometric properties for this population.

The psychometric properties of the PROMIS item banks in this study were similar to the properties reported in the developmental phase of the instruments (17, 18, 19, 20, 21, 22) in terms of IRT model and item fit. For the study of US children with JIA, fit indices were not available. Brandon et al (8) investigated the discriminative validity across different levels of disease activity in children with JIA. Their study found discriminative abilities for the fatigue, mobility, pain interference, and upper extremity item banks. Our findings support these results, except for the fatigue item bank. This is possibly due to different methods of determining disease activity. We compared only the presence versus absence of disease activity, as there were only limited retrospective data available to assess disease activity, and few patients with disease activity to facilitate group comparisons. The reliability of the measurements in the Dutch JIA sample was generally higher than that found in the US sample (8). This is possibly due to differences in model calibration and parameterization.
To our knowledge, no studies have been published that assess the test–retest reliability of the full pediatric item banks. In the current study, test–retest reliability was sufficient for all item banks, except the peer relationships item bank (ICC 0.69). Varni et al (43) assessed the test–retest reliability of the pediatric short forms and found similar results. Additionally, the current study displayed similar test–retest reliability for short forms and full‐length item banks.

To enable international comparisons of PROMIS T scores, differential item functioning (DIF) needs to be assessed between The Netherlands and the US. As the US data on JIA children were unobtainable, assessing DIF was not possible in this study. A next step is to calibrate the pediatric item banks in a normative Dutch sample and perform DIF analyses with the US normative sample.

In conclusion, the current study demonstrates sufficient psychometric properties for the pediatric PROMIS item banks in children with JIA and provides evidence for the advantages of using the PROMIS CATs in Dutch clinical populations.

AUTHOR CONTRIBUTIONS

All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be submitted for publication. Dr. Haverman had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study conception and design

Terwee, van Oers, Joosten, Grootenhuis, van Rossum, Haverman.

Acquisition of data

Van Oers, Joosten, van den Berg, Schonenberg‐Meinema, Dolman, ten Cate, Roorda, van Rossum, Haverman.

Analysis and interpretation of data

Luijten, Terwee, Haverman.

ROLE OF THE STUDY SPONSOR

Pfizer Pharmaceuticals had no role in the study design or in the collection, analysis, or interpretation of the data, the writing of the manuscript, or the decision to submit the manuscript for publication. Publication of this article was not contingent upon approval by Pfizer Pharmaceuticals.

31 in total

1. International League of Associations for Rheumatology classification of juvenile idiopathic arthritis: second revision, Edmonton, 2001.

Authors: Ross E Petty; Taunton R Southwood; Prudence Manners; John Baum; David N Glass; Jose Goldenberg; Xiaohu He; Jose Maldonado-Cocco; Javier Orozco-Alcala; Anne-Marie Prieur; Maria E Suarez-Almazor; Patricia Woo
Journal: J Rheumatol Date: 2004-02 Impact factor: 4.666

2. Patient-Reported Outcomes Measurement Information System Tools for Collecting Patient-Reported Outcomes in Children With Juvenile Arthritis.

Authors: Timothy G Brandon; Brandon D Becker; Katherine B Bevans; Pamela F Weiss
Journal: Arthritis Care Res (Hoboken) Date: 2017-03 Impact factor: 4.794

3. The future of outcomes measurement: item banking, tailored short-forms, and computerized adaptive assessment.

Authors: David Cella; Richard Gershon; Jin-Shei Lai; Seung Choi
Journal: Qual Life Res Date: 2007-03-31 Impact factor: 4.147

4. Predictors of health-related quality of life in children and adolescents with juvenile idiopathic arthritis: results from a Web-based survey.

Authors: L Haverman; M A Grootenhuis; J M van den Berg; M van Veenendaal; K M Dolman; J F Swart; T W Kuijpers; M A J van Rossum
Journal: Arthritis Care Res (Hoboken) Date: 2012-05 Impact factor: 4.794

5. The PedsQL in pediatric rheumatology: reliability, validity, and responsiveness of the Pediatric Quality of Life Inventory Generic Core Scales and Rheumatology Module.

Authors: James W Varni; Michael Seid; Tara Smith Knight; Tasha Burwinkle; Joy Brown; Ilona S Szer
Journal: Arthritis Rheum Date: 2002-03

6. Implementation of Patient-reported Outcome Measures in Total Knee Arthroplasty.

Authors: David C Ayers
Journal: J Am Acad Orthop Surg Date: 2017-02 Impact factor: 3.020

7. Efficiency of static and computer adaptive short forms compared to full-length measures of depressive symptoms.

Authors: Seung W Choi; Steven P Reise; Paul A Pilkonis; Ron D Hays; David Cella
Journal: Qual Life Res Date: 2009-11-26 Impact factor: 4.147

8. Psychometric properties of the PROMIS ® pediatric scales: precision, stability, and comparison of different scoring and administration options.

Authors: James W Varni; Brooke Magnus; Brian D Stucky; Yang Liu; Hally Quinn; David Thissen; Heather E Gross; I-Chan Huang; Darren A DeWalt
Journal: Qual Life Res Date: 2013-10-02 Impact factor: 4.147

9. Sample Size Requirements for Estimation of Item Parameters in the Multidimensional Graded Response Model.

Authors: Shengyu Jiang; Chun Wang; David J Weiss
Journal: Front Psychol Date: 2016-02-09

10. Dutch-Flemish translation of nine pediatric item banks from the Patient-Reported Outcomes Measurement Information System (PROMIS)®.

Authors: Lotte Haverman; Martha A Grootenhuis; Hein Raat; Marion A J van Rossum; Eline van Dulmen-den Broeder; Karel Hoppenbrouwers; Helena Correia; David Cella; Leo D Roorda; Caroline B Terwee
Journal: Qual Life Res Date: 2015-03-28 Impact factor: 4.147
