Claudia Gorecki, Julia M Brown, Stefan Cano, Donna L Lamping, Michelle Briggs, Susanne Coleman, Carol Dealey, Elizabeth McGinnis, Andrea E Nelson, Nikki Stubbs, Lyn Wilson, Jane Nixon.
Abstract
BACKGROUND: Patient-reported outcome (PRO) data are integral to patient care, policy decision making and healthcare delivery. PRO assessment in pressure ulcers is in its infancy, with few studies including PROs as study outcomes. Further, there are no pressure ulcer PRO instruments available.
Year: 2013 PMID: 23764247 PMCID: PMC3698102 DOI: 10.1186/1477-7525-11-95
Source DB: PubMed Journal: Health Qual Life Outcomes ISSN: 1477-7525 Impact factor: 3.186
Figure 1. Steps towards developing and evaluating the PU-QOL instrument.
Psychometric tests and criteria used in the evaluation of the PU-QOL instrument
| ● Score distributions (floor/ceiling effects and skew of scale scores) | ● Even distribution of endorsement frequencies across response categories (>80%) | |
| ● % of item-level missing data (<10%) | ● Low number of persons at extreme (i.e. floor/ceiling) ends of the measurement continuum | |
| ● % of computable scale scores (>50% completed items) | ||
| ● Items in scales rated ‘not relevant’ <35% | ||
| ● Similar item mean scores | ● Positive residual r between items (<0.30) | |
| ● Items have adequate corrected ITC (ITC ≥0.3) | ● High negative residual r (>0.60) suggests redundancy | |
| ● Items have similar ITCs | ● Items sharing common variance suggests uni-dimensionality | |
| | ● Items do not measure at the same point on the scale | ● Evenly spaced items spanning whole measurement range |
| ● NA | ● Ordered set of response thresholds for each scale item | |
| ● Scale scores spanning entire scale range | ● Person-item threshold distribution: person locations should be covered by items and item locations covered by persons when both calibrated on the same metric scale | |
| ● Floor and ceiling (proportion sample at minimum and maximum scale score) effects should be low (<15%) | ||
| | ● Skewness statistics should range from −1 to +1 | ● Good targeting demonstrated by the mean location of items and persons around zero |
| | ● No published criteria for item level targeting | |
| | | |
| Internal consistency - extent to which items comprising a scale measure the same construct (e.g. homogeneity of the scale). | ● Cronbach's alphas for summary scores (adequate scale internal consistency is ≥0.70) | ● High person separation index >0.7 |
| ● Item-total r between +0.4 and +0.6 indicate items are moderately correlated with scale scores; higher values indicate well correlated items with scale scores | ||
| ● Power-of-tests indicate the power in detecting the extent to which the data do not fit the model | ||
| ● Items with ordered thresholds | ||
| *Test-retest reliability - stability of a measuring instrument. | ● Intra-class r coefficient >0.70 between test and retest scores | ● Statistical stability across time points (no uniform or non-uniform item DIF (p ≥ 0.05 or Bonferroni adjusted value)) |
| ● Pearson r: >0.7 indicates reliable scale stability | ||
| ● Involves accumulating evidence from different forms | | |
| Content validity - extent to which the content (items) of a scale is representative of the conceptual construct it is intended to measure. | ● Consideration of item sufficiency and the target population | ● Clearly defined construct |
| ● Qualitative evidence from individuals for whom the measure is targeted, expert opinion and literature review (e.g. theoretical and/or conceptual definitions) | ● Validity comes from careful item construction and consideration of what each item is meant to measure, then testing against model expectations | |
| Construct validity | | |
| i) Within-scale analyses - extent to which a distinct construct is being measured and that items can be combined to form a scale score. | ● Cronbach alpha for scale scores >0.70 | ● Fit residuals (item-person interaction) within given range +/−2.5 |
| ● ITC >0.30 | ||
| ● Homogeneity coefficient (IIC mean and range >0.3) | ● Non-significant chi square (item-trait interaction) values | |
| ● Scaling success | ● No under- or over-discriminating ICC | |
| | | ● Mean fit residual close to 0.0; SD approaching 1.0 |
| | | ● Person fit residuals within given range +/−2.5 |
| Measurement continuum - extent to which scale items mark out the construct as a continuum on which people can be measured. | ● NA | ● Individual scale items located across a continuum in the same way locations of people are spread across the continuum |
| | | ● Items spread evenly over a reasonable measurement range |
| Response dependency - response to one item determines response to another. | ● NA | ● Response dependency is indicated by residual r >0.3 for pairs of items |
| ii) Between scale analysis | | |
| Criterion validity - hypotheses based on criterion or ‘gold standard’ measure. | ● There are no true gold standard HRQL measures | ● NA |
| *Convergent validity - scale correlated with other measures of the same/similar constructs. | ● Moderate to high r predicted for similar scales; criteria used as guides to the magnitude of r, as opposed to pass/fail benchmarks (high r >0.7; moderate r=0.3-0.7; low r <0.3) | ● NA |
| *Discriminant validity – scale not correlated with measures of different constructs | ● Low r (<0.3) predicted between scale scores and measures of different constructs (e.g. age, gender) | ● NA |
| *Known groups differences - ability of a scale to differentiate known groups | ● ^Generate hypotheses (based on subgroups known to differ on construct measured) and compare mean scores (e.g. predict a stepwise change in PU-QOL scale scores across 3 PU severity groups and that mean scores would be significantly different) | ● Hypothesis testing (e.g. clinical questions are formulated and the empirical testing comes from whether or not data fit the Rasch model) |
| | ● Statistically significant differences in mean scores (ANOVA) | |
| *Differential item functioning (DIF) | ● NA | ● Persons with similar ability should respond in similar ways to individual items regardless of group membership (e.g. age) |
| | | ● Uniform DIF - uniformity amongst differences between groups |
| | | ● Non-uniform DIF - non-uniformity amongst differences between groups; can be considered at 1% (Bonferroni adjusted) and 5% CIs |
Table adapted from [35,45]; *Additional tests performed for field test two; ^The PU HRQL literature is not well established, therefore was limited for identifying clinical parameters to formulate known groups; NA No test for particular psychometric property; SD Standard deviation; ITC Item total correlation; IIC Inter-item correlation; ICC Item characteristic curve; r correlation; ANOVA Analysis of variance; DIF Differential item functioning; CI Confidence interval.
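The data quality and targeting criteria listed above for the traditional psychometric methods (item-level missing data below 10%, floor and ceiling effects below 15%, skewness between −1 and +1) can be checked with a few lines of code. The following is a minimal sketch, not the authors' analysis code; the function names, the 0 to 100 scoring range and the example scores are assumptions made purely for illustration.

```python
from statistics import mean, pstdev


def missing_rate(responses):
    """Proportion of missing (None) responses for a single item."""
    return sum(1 for r in responses if r is None) / len(responses)


def floor_ceiling(scores, min_score, max_score):
    """Proportion of respondents at the minimum and maximum scale score."""
    n = len(scores)
    floor = sum(1 for s in scores if s == min_score) / n
    ceiling = sum(1 for s in scores if s == max_score) / n
    return floor, ceiling


def skewness(scores):
    """Moment-based (population) skewness: the mean cubed z-score."""
    m = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        return 0.0  # all scores identical, no asymmetry to report
    return sum(((s - m) / sd) ** 3 for s in scores) / len(scores)


def targeting_report(scores, min_score, max_score):
    """Apply the thresholds quoted in the criteria table to one scale."""
    floor, ceiling = floor_ceiling(scores, min_score, max_score)
    skew = skewness(scores)
    return {
        "floor": floor,
        "ceiling": ceiling,
        "skew": skew,
        "floor_ok": floor < 0.15,        # floor effect should be low (<15%)
        "ceiling_ok": ceiling < 0.15,    # ceiling effect should be low (<15%)
        "skew_ok": -1.0 <= skew <= 1.0,  # skewness should lie in [-1, +1]
    }


# Hypothetical scale scores on an assumed 0-100 range (not study data).
example_scores = [0, 25, 25, 50, 50, 50, 50, 75, 75, 100]
report = targeting_report(example_scores, 0, 100)
```

Other skewness estimators (for example, bias-corrected sample skewness) give slightly different values in small samples, so the choice of estimator here is itself an assumption.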
Participant characteristics
| Age in years: range (mean, SD) | 24 - 98 (72, 13.5) | 20 - 103 (71.3, 16.5) |
| Total | n=227 | n=229 |
| Male | 90 (39.6) | 119 (52.0) |
| Female | 137 (60.4) | 110 (48.0) |
| | | |
| White | 223 (98.2) | 227 (99.1) |
| Asian | 1 (0.4) | 2 (0.9) |
| Black/African | 2 (0.9) | 0 |
| Chinese | 0 | 0 |
| Not stated | 1 (0.4) | 0 |
| | | |
| Hospital (surgery) | 99 (43.6) | 62 (27.1) |
| Hospital (medicine) | 21 (9.3) | 74 (32.3) |
| Community | 107 (47.1) | 88 (38.4) |
| | | |
| Category 1 | 38 (10.6%) | 76 (18.1%) |
| Category 2 | 144 (40.2%) | 170 (40.5%) |
| Category 3/4 | 153 (42.7%) | 170 (40.5%) |
| Missing | 1 (0.3%) | 4 (0.9%) |
| | | |
| Short-term risk | 39 (17.2) | 36 (15.7) |
| Medium to long-term risk | 71 (31.3) | 87 (38.0) |
| On-going long-term risk | 116 (51.1) | 103 (45.0) |
| Missing | 1 (0.4) | 3 (1.3) |
| | | |
| Single | 59 (26.0) | 71 (31.0) |
| Married | 85 (37.5) | 77 (33.6) |
| Cohabiting | 81 (35.7) | 75 (32.8) |
| Missing | 2 (0.8) | 6 (2.6) |
| | | |
| Live alone | 84 (37.0) | 86 (37.6) |
| Cohabit with carer | 63 (27.8) | 51 (22.3) |
| Cohabit with other | 61 (26.9) | 48 (20.9) |
| Missing | 19 (8.4) | 44 (19.2) |
| | | |
| No formal education | 129 (56.8) | 125 (54.6) |
| GCSE or equivalent | 39 (17.2) | 40 (17.5) |
| A-Level or equivalent | 25 (11.0) | 16 (6.9) |
| Degree or higher | 15 (6.6) | 21 (9.2) |
| Missing | 19 (8.4) | 27 (11.8) |
Summary of preliminary PU-QOL instrument psychometric analysis, field test 1
| Pain (8) | 5 | -0.94 − 0.80 | 0 | 4 | 0.78 | 0.89 | 0.24 – 0.66 | 0.53 – 0.70^ |
| Exudate (8) | 4 | -0.51 − 0.48 | 0 | 0 | 0.59 | 0.92 | 0.40 – 0.86 | 0.56 – 0.84^ |
| Odour (6) | 2 | -1.47 − 0.60 | 0 | 0 | 0.62 | 0.96 | 0.74 – 0.91 | 0.83 – 0.92^ |
| Sleep (6) | 3 | -0.54 − 0.31 | 0 | 0 | 0.62 | 0.92 | 0.48 – 0.84 | 0.67 – 0.86^ |
| Vitality (3) | 0 | -0.48 − 0.44 | 0 | 0 | 0.03 | n/a | n/a | n/a |
| Movement/mobility (11) | 4 | -0.33 − 0.48 | 0 | 0 | 0.58 | 0.93 | 0.23 – 0.91 | 0.67 – 0.80^ |
| ADL (9) | 8 | -0.54 − 0.57 | 0 | 0 | 0.29 | 0.95 | 0.41 – 0.90 | 0.58 – 0.90^ |
| Emotional well-being (17) | 4 | -1.15 − 1.46 | 1 | 0 | 0.82 | 0.93 | 0.24 – 0.79 | 0.54 – 0.76^ |
| Appearance & self-consciousness (7) | 4 | -0.83 − 0.65 | 0 | 0 | 0.56 | 0.90 | 0.41 – 0.75 | 0.60 – 0.79^ |
| Participation (9) | 4 | -0.56 − 0.54 | 0 | 0 | 0.65 | 0.96 | 0.53 – 0.89 | 0.73 – 0.90^ |
IIC inter-item correlation; ^Range item-total correlation (ITC).
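The internal consistency and scaling criteria used in the traditional analysis (Cronbach's alpha of at least 0.70, corrected ITC of at least 0.3) follow standard formulas. The sketch below shows one way to compute them; it assumes complete data with no missing responses, and the function names and example item scores are hypothetical rather than taken from the study.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    `items` is a list of item-score lists, one list per item, with
    respondents aligned by position. Assumes no missing responses.
    """
    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]  # scale score per respondent
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_itc(items, index):
    """Corrected item-total correlation: the item against the sum of the
    remaining items, so the item does not inflate its own total."""
    rest = [sum(resp) - resp[index] for resp in zip(*items)]
    return pearson_r(items[index], rest)


# Hypothetical responses from five people to a three-item scale.
example_items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
alpha = cronbach_alpha(example_items)
itcs = [corrected_itc(example_items, i) for i in range(len(example_items))]
alpha_ok = alpha >= 0.70             # internal consistency criterion
itc_ok = all(r >= 0.3 for r in itcs)  # corrected ITC criterion
```

Removing the item from its own total is what makes the ITC "corrected"; correlating an item with a total that includes it would overstate the relationship, particularly for short scales.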
Summary of PU-QOL Rasch analysis, field test 2
| Pain (8) | 0 | -1.11 − 1.03 | 0 | 0 | 0.72 | 0 | 0 | 0 | 0 | 0 | 0 |
| Exudate (8) | 1 | -0.75 – 0.84 | 1 | 1 | 0.69 | 0 | 0 | 0 | 0 | 0 | 0 |
| Odour (6) | 0 | -1.31 – 0.91 | 0 | 0 | 0.66 | 0 | 0 | 0 | 0 | 0 | 0 |
| Sleep (6) | 0 | -0.91 – 0.45 | 1 | 1 | 0.62 | 0 | 0 | 0 | 0 | 0 | 0 |
| Mobility and movement (9) | 2 | -0.46 – 0.57 | 0 | 0 | 0.42 | 0 | 0 | 0 | 0 | 2 | 0 |
| Activity (8) | 4 | -0.30 – 0.56 | 0 | 0 | 0.27 | 0 | 0 | 0 | 0 | 0 | 0 |
| Vitality (6) | 0 | -0.50 – 0.81 | 0 | 0 | 0.38 | 1 | 0 | 0 | 0 | 0 | 0 |
| Emotional well-being (15) | 2 | -1.48 – 2.44 | 0 | 0 | 0.86 | 0 | 0 | 0 | 0 | 0 | 0 |
| Self-consciousness (7) | 0 | -1.27 – 1.02 | 0 | 0 | 0.58 | 0 | 0 | 0 | 0 | 0 | 0 |
| Participation (9) | 7 | -0.91 – 1.00 | 0 | 0 | 0.57 | 0 | 0 | 0 | 0 | 0 | 0 |
DIF differential item functioning; HC healthcare; Uni uniform DIF; Non non-uniform DIF.
Summary of PU-QOL traditional psychometric analysis, field test 2
| Pain (8) | 0.89 | 0.24 – 0.66^ | 0.53 – 0.70^ | 0.80 | 0.81 | 0.80 | 0.48b | 0.38b (206) | 0.13b (214) | 0.11b (214) |
| Exudate (8) | 0.91 | 0.32 – 0.72^ | 0.51 – 0.75^ | 0.62 | 0.63 | 0.62 | n/a | 0.25a (216) | 0.08b (225) | -0.14b (224) |
| Odour (6) | 0.97 | 0.72 – 0.93^ | 0.79 – 0.94^ | 0.68 | 0.68 | 0.70 | n/a | 0.20a (217) | 0.05b (228) | -0.14b (227) |
| Sleep (6) | 0.92 | 0.49 – 0.81^ | 0.68 – 0.85^ | 0.82 | 0.82 | 0.82 | n/a | 0.32b (171) | 0.21b (178) | 0.10b (178) |
| Vitality (6) | 0.90 | 0.49 – 0.90^ | 0.63 – 0.90^ | 0.74 | 0.74 | 0.74 | 0.36b | 0.52b (135) | 0.03b (137) | -0.16b (137) |
| Movement/Mobility (9) | 0.93 | 0.23 – 0.91^ | 0.67 – 0.80^ | 0.87 | 0.86 | 0.88 | -0.50b | 0.39b (37) | 0.04b (39) | 0.22b (39) |
| ADL (8) | 0.95 | 0.41 – 0.90^ | 0.58 – 0.90^ | 0.87 | 0.87 | 0.87 | -0.38b | 0.35b (48) | -0.05b (49) | -0.19b (49) |
| Emotional well-being (15) | 0.94 | 0.24 – 0.79^ | 0.54 – 0.76^ | 0.83 | 0.82 | 0.83 | -0.44b | 0.58b (133) | 0.16b (135) | -0.15b (135) |
| Appearance & self-consciousness (7) | 0.89 | 0.37 – 0.79^ | 0.62 – 0.76^ | 0.81 | 0.81 | 0.81 | -0.40b | 0.50b (176) | 0.23b (179) | -0.03b (178) |
| Participation (9) | 0.93 | 0.36 – 0.88^ | 0.60 – 0.86^ | 0.63 | 0.64 | 0.63 | -0.52b | 0.51b (75) | 0.01b (76) | -0.29b (76) |
IIC inter-item correlation; ^Range item-total correlation (ITC); 1 = Spearman correlation; 2 = Pearson correlation; a = correlations falling outside of the predicted range; b = correlations consistent with predictions.
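The a/b flags in the table above rest on the guide values given in the criteria table for the magnitude of r (high >0.7; moderate 0.3-0.7; low <0.3). A minimal sketch of that classification follows. Applying the bands to the absolute value of r is an assumption, as are the function names and the predicted bands used in the example; the source describes these values as guides rather than pass/fail benchmarks.

```python
def magnitude(r):
    """Classify a correlation using the guide values from the criteria
    table. Uses |r|, which is an assumption about how negative
    correlations are handled."""
    size = abs(r)
    if size > 0.7:
        return "high"
    if size >= 0.3:
        return "moderate"
    return "low"


def consistent_with_prediction(r, predicted_bands):
    """True if the observed correlation falls in one of the predicted bands."""
    return magnitude(r) in predicted_bands


# Hypothetical prediction: a similar construct should correlate
# moderately to highly with the scale.
convergent_bands = {"moderate", "high"}
flag_a = consistent_with_prediction(0.25, convergent_bands)  # outside prediction
flag_b = consistent_with_prediction(0.58, convergent_bands)  # matches prediction

# Hypothetical prediction: an unrelated variable should correlate only weakly.
discriminant_bands = {"low"}
flag_c = consistent_with_prediction(-0.14, discriminant_bands)
```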