Stephanie Maguire, Jenny Davison, Marian McLaughlin, Victoria Simms.
Abstract
BACKGROUND: Whilst there are studies that have systematically reviewed the psychometric properties of quality of life measures for children and young people with intellectual disabilities, these focus narrowly on specific diseases or health conditions. The objective of this planned systematic review is therefore to collate, summarise, and critically appraise the psychometric properties of self-report health-related quality of life (HRQoL) and subjective wellbeing measures used by adolescents (aged 11-16) with an intellectual disability.
Keywords: Adolescence; COSMIN; Health-related quality of life; Intellectual disability; Measures; Psychometric properties; Self-report; Subjective wellbeing
Year: 2022 PMID: 35501922 PMCID: PMC9063098 DOI: 10.1186/s13643-022-01957-w
Source DB: PubMed Journal: Syst Rev ISSN: 2046-4053
Data extraction form
| Author and date | Country | Instrument | Measurement domains | Sample size | Study population | Sample age | Psychometric properties assessed |
|---|---|---|---|---|---|---|---|
| | | | | | | | |
COSMIN definitions of domains, measurement properties, and aspects of measurement properties [38, 39]
| Domain | Measurement property | Aspect of a measurement property | Definition |
|---|---|---|---|
| Reliability | | | The degree to which the measurement is free from measurement error |
| Reliability (extended definition) | | | The extent to which scores for patients who have not changed are the same for repeated measurement under several conditions, e.g. using different sets of items from the same patient-reported outcome measure (PROM) (internal consistency); over time (test-retest); by different persons on the same occasion (inter-rater); or by the same persons (i.e. raters or responders) on different occasions (intra-rater) |
| | Internal consistency | | The degree of the interrelatedness among the items |
| | Reliability | | The proportion of the total variance in the measurements which is due to ‘true’ᵃ differences between patients |
| | Measurement error | | The systematic and random error of a patient’s score that is not attributed to true changes in the construct to be measured |
| Validity | | | The degree to which a PROM measures the construct(s) it purports to measure |
| | Content validity | | The degree to which the content of a PROM is an adequate reflection of the construct to be measured |
| | | Face validity | The degree to which (the items of) a PROM indeed looks as though they are an adequate reflection of the construct to be measured |
| | Construct validity | | The degree to which the scores of a PROM are consistent with hypotheses (for instance, with regard to internal relationships, relationships to scores of other instruments, or differences between relevant groups) based on the assumption that the PROM validly measures the construct to be measured |
| | | Structural validity | The degree to which the scores of a PROM are an adequate reflection of the dimensionality of the construct to be measured |
| | | Hypotheses testing | Idem construct validity |
| | | Cross-cultural validity | The degree to which the performance of the items on a translated or culturally adapted PROM is an adequate reflection of the performance of the items of the original version of the PROM |
| | Criterion validity | | The degree to which the scores of a PROM are an adequate reflection of a ‘gold standard’ |
| Responsiveness | | | The ability of a PROM to detect change over time in the construct to be measured |
| | Responsiveness | | Idem responsiveness |
| Interpretabilityᵇ | | | The degree to which one can assign qualitative meaning (that is, clinical or commonly understood connotations) to a PROM’s quantitative scores or change in scores |
ᵃ The word ‘true’ must be seen in the context of the CTT, which states that any observation is composed of two components: a true score and error associated with the observation. ‘True’ is the average score that would be obtained if the scale was given an infinite number of times. It refers only to the consistency of the score, and not to its accuracy [55].
ᵇ Interpretability is not considered a measurement property but an important characteristic of a measurement instrument.
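As a concrete illustration of the internal-consistency definition above (the interrelatedness among a scale's items), the sketch below computes Cronbach's alpha from raw item scores. The function and the toy data are our own, not part of the protocol, and assume a complete response matrix with no missing items.

```python
# Illustrative sketch only: Cronbach's alpha, the internal-consistency
# statistic referenced in the COSMIN definitions table above.
from statistics import variance  # sample variance (n - 1 denominator)

def cronbach_alpha(rows):
    """rows: list of per-respondent item scores, one inner list per respondent."""
    k = len(rows[0])                                         # number of items
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])             # variance of total score
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Five hypothetical respondents answering a four-item scale:
scores = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
alpha = cronbach_alpha(scores)  # roughly 0.95 for this toy data
```

For these strongly correlated items the result exceeds the 0.70 criterion used later in the quality ratings.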
Criteria for good psychometric properties adapted from Prinsen et al. [43]
| Measurement property | Ratingᵃ | Criteria |
|---|---|---|
| Structural validity | + | CTT: CFA: CFI or TLI or comparable measure > 0.95 OR RMSEA < 0.06 OR SRMR < 0.08ᵇ. IRT/Rasch: no violation of unidimensionalityᶜ (CFI or TLI or comparable measure > 0.95 OR RMSEA < 0.06 OR SRMR < 0.08ᵇ); no violation of local independence (residual correlations among the items after controlling for the dominant factor < 0.20 OR Q3’s < 0.37); no violation of monotonicity (adequate looking graphs OR item scalability > 0.30); adequate model fit (IRT: χ² > 0.001; Rasch: infit and outfit mean squares ≥ 0.5 and ≤ 1.5 OR Z-standardized values > −2 and < 2) |
| | ? | CTT: not all information for ‘+’ reported. IRT/Rasch: model fit not reported |
| | – | Criteria for ‘+’ not met |
| Internal consistency | + | At least low evidenceᵈ for sufficient structural validityᵉ AND Cronbach’s alpha(s) ≥ 0.70 for each unidimensional scale or subscaleᶠ |
| | ? | Criterion ‘at least low evidenceᵈ for sufficient structural validityᵉ’ not met |
| | – | At least low evidenceᵈ for sufficient structural validityᵉ AND Cronbach’s alpha(s) < 0.70 for each unidimensional scale or subscaleᶠ |
| Reliability | + | ICC or weighted kappa ≥ 0.70 |
| | ? | ICC or weighted kappa not reported |
| | – | ICC or weighted kappa < 0.70 |
| Measurement error | + | SDC or LoA < MICᵉ |
| | ? | MIC not defined |
| | – | SDC or LoA > MICᵉ |
| Hypotheses testing for construct validity | + | The result is in accordance with the hypothesisᵍ |
| | ? | No hypothesis defined (by the review team) |
| | – | The result is not in accordance with the hypothesisᵍ |
| Cross-cultural validity/measurement invariance | + | No important differences found between group factors (such as age, gender, language) in multiple group factor analysis OR no important DIF for group factors (McFadden’s R² < 0.02) |
| | ? | No multiple group factor analysis OR DIF analysis performed |
| | – | Important differences between group factors OR DIF was found |
| Criterion validity | + | Correlation with gold standard ≥ 0.70 OR AUC ≥ 0.70 |
| | ? | Not all information for ‘+’ reported |
| | – | Correlation with gold standard < 0.70 OR AUC < 0.70 |
| Responsiveness | + | The result is in accordance with the hypothesisᵍ OR AUC ≥ 0.70 |
| | ? | No hypothesis defined (by the review team) |
| | – | The result is not in accordance with the hypothesisᵍ OR AUC < 0.70 |
The criteria are based on, e.g. Terwee et al. [56] and Prinsen et al. [44]. AUC, area under the curve; CFA, confirmatory factor analysis; CFI, comparative fit index; CTT, classical test theory; DIF, differential item functioning; ICC, intraclass correlation coefficient; IRT, item response theory; LoA, limits of agreement; MIC, minimal important change; RMSEA, root-mean-square error of approximation; SEM, standard error of measurement; SDC, smallest detectable change; SRMR, standardized root mean residuals; TLI, Tucker-Lewis index.
ᵃ ‘+’, sufficient; ‘–’, insufficient; ‘?’, indeterminate.
ᵇ To rate the quality of the summary score, the factor structures should be equal across studies.
ᶜ Unidimensionality refers to a factor analysis per subscale, while structural validity refers to a factor analysis of a multidimensional patient-reported outcome measure.
ᵈ As defined by grading the evidence according to the GRADE approach.
ᵉ This evidence may come from different studies.
ᶠ The criterion ‘Cronbach’s alpha < 0.95’ was deleted, as it is relevant in the development phase of a PROM and not when evaluating an existing PROM.
ᵍ The results of all studies should be taken together, and it should then be decided whether 75% of the results are in accordance with the hypotheses.
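The ‘+’/‘?’/‘–’ ratings in the table above are mechanical threshold rules, so they can be expressed directly as code. The sketch below implements two of them (reliability and criterion validity) as hypothetical helper functions; the function names and structure are ours, not part of the protocol.

```python
# Hypothetical helpers applying two of the adapted Prinsen et al. rating
# rules: '+' sufficient, '?' indeterminate, '-' insufficient.
from typing import Optional

def rate_reliability(icc_or_kappa: Optional[float]) -> str:
    """'+' if ICC or weighted kappa >= 0.70; '?' if not reported; else '-'."""
    if icc_or_kappa is None:
        return "?"
    return "+" if icc_or_kappa >= 0.70 else "-"

def rate_criterion_validity(correlation: Optional[float] = None,
                            auc: Optional[float] = None) -> str:
    """'+' if correlation with gold standard >= 0.70 OR AUC >= 0.70."""
    if correlation is None and auc is None:
        return "?"                      # not all information for '+' reported
    reported = [v for v in (correlation, auc) if v is not None]
    return "+" if any(v >= 0.70 for v in reported) else "-"

print(rate_reliability(0.82))              # sufficient: '+'
print(rate_reliability(None))              # indeterminate: '?'
print(rate_criterion_validity(auc=0.65))   # insufficient: '-'
```

Note that criterion validity uses an OR rule, so a single statistic at or above 0.70 is enough for a ‘+’ even if the other was not reported.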