Abstract
Background: Applied health science research commonly measures concepts via multiple-item tools (scales), such as self-reported questionnaires or observation checklists. They are usually validated in more detail in separate psychometric studies or very cursorily in substantive studies. However, methodologists advise that, as validity is a property of the inferences based on measurement in a context, psychometric analyses should be performed in substantive studies as well. Until recently, performing comprehensive psychometrics required expert knowledge of different, often proprietary, software. The increasing availability of statistical techniques in the R environment now makes it possible to integrate such analyses in applied research.
Keywords: Psychometrics; item response theory; measurement; open science; reproducibility
Year: 2018 PMID: 34040826 PMCID: PMC8133536 DOI: 10.1080/21642850.2018.1472602
Source DB: PubMed Journal: Health Psychol Behav Med ISSN: 2164-2850
Overview of the 6 steps: questions, statistics and criteria for decision.
| Step | Question(s) | Statistics | Decision criteria |
|---|---|---|---|
| 1. Descriptives | Items with no/little variation? | Frequencies table (ordered) | If insufficient variation (for example <5 endorsements for a response option in a binary item, 95% of responses in a single category) → exclude item or merge categories |
| | Differences between items regarding their distributions? | As above | If yes → possibility that results of IRT and FA might diverge; perform both and compare |
| | Negative correlations between items? | Tetrachoric (BI) or Spearman (ORD) inter-item correlation matrix & heatplot | If yes → reverse code items with negative correlations |
| | Are there respondents with unusual response patterns? | Multivariate outliers (Mahalanobis D² and | If yes (e.g. D² |
| 2. Non-parametric IRT | Do items form a single scale? | Coefficients of homogeneity (item, item pair, scale) | H < .30 → the scale is not homogeneous (or item is not scalable) → consider excluding items after dimensionality checks (below) |
| | How many, and which respondents have idiosyncratic response patterns? | Guttman errors | > third quartile + 1.5 interquartile range → examine responses and possible reasons (test administration, data entry errors); check influence on results via sensitivity tests |
| | Is the scale uni- or multi-dimensional? | Automatic item selection algorithm ( | If unscalable items (value of 0 in |
| | Are items associated only via the latent dimension? | Conditional association (local independence test) | If significant violations identified → exclude one by one starting with those with crit values >80 (Schuur, |
| | Is the probability of endorsing a ‘correct’ response option increasing with increasing levels of the latent dimension? | Monotonicity per subscale | If significant violations identified → exclude one by one starting with those with crit values >80 (Schuur, |
| | Is the ‘difficulty/intensity’ order of the items the same (invariant) at all levels of the latent dimension? | Invariant item ordering per subscale (ORD: method=‘MIIO’) | If significant violations identified → exclude one by one starting with those with crit values >80 (Schuur, |
| 3. Parametric IRT | Do items form a scale that satisfies requirements of additive measurement? | Rasch (BI) or Rating Scale model (ORD) item fit (infit and outfit); pathway map | If item fit outside the mean squares range of 0.6–1.4 and standardized fit statistics outside +/−2.0 → exclude items one by one |
| | What is the order of item difficulty? Are there levels of the latent continuum with too many/few items? | Item difficulty estimates; joint ICCs plot; | If items are not ordered according to expectations (if a priori hypotheses exist) → consider excluding items |
| | Are item associations explained only by the latent dimension? | 2- and 3-way residuals (local independence test) | If significant ( |
| | How many, and which respondents have response patterns that do not fit the model? | Person fit | Persons with mean squares outside range of 0.6–1.4 and standardized fit statistics outside +/−2.0 → examine responses and possible reasons (test administration, data entry errors); check influence on results via sensitivity tests |
| | How well is the scale able to differentiate between respondents regarding their ability levels? | Separation reliability; person separation | Reliability <.80, person separation <2 (depending on sample size and use of the scale) → consider adding items or sampling respondents with extreme levels |
| | Does their difficulty match the ability of the sample? | Person-item map | If too many/few items in some areas of the latent continuum (item saturation/deficiency) → consider excluding items / generating items for further study |
| 4. FA | What is the optimal number of dimensions? | (BI: tetrachoric matrix; ORD: default or polychoric matrix) | If the PA, VSS and ICLUST solutions differ → consider examining item level diagnostics and Step 2 and 3 results to explain inconsistency |
| | Are the data consistent with the hypothesized scale structure? | Confirmatory FA: | Model misfit (TLI ≤ 0.95; CFI ≤ 0.95; RMSEA ≥ 0.06; |
| 5. CTT | Is the (sub)scale reliable? | | Reliability <.80 or .70 (arbitrary thresholds, interpret with care) → consider adding items (depending on the purposes of the scale and results of Step 2 and 3, e.g. person-item map) |
| | Are items associated with the total score? Would their exclusion improve reliability? | Item-total associations; Cronbach's | Item-total associations <.30, |
| 6. Total scores | Do total scores show the expected distribution? Any ceiling/floor effects? | Frequencies, descriptives, histograms, % extreme values | If summary statistics not as expected (e.g. % of respondents with extreme values >15% of the sample) → reconceptualization and/or further scale improvement is necessary |
Note: BI: statistics applicable only for binary data; ORD: statistics applicable only for ordinal data; IRT: item response theory; FA: factor analysis; CTT: classical test theory;
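To make the Step 2 homogeneity criterion more concrete, the sketch below computes a scale-level homogeneity coefficient (Loevinger's H) for binary items as one minus the ratio of observed to expected Guttman errors. It is a minimal illustration in plain Python and not the R implementation the protocol relies on; the function name `scale_h` and the example responses are hypothetical.

```python
from itertools import combinations


def scale_h(data):
    """Loevinger's H for binary items: 1 - observed / expected Guttman errors."""
    n = len(data)
    k = len(data[0])
    # Share of respondents endorsing each item; a higher share means an "easier" item.
    p = [sum(row[j] for row in data) / n for j in range(k)]
    observed = 0.0
    expected = 0.0
    for a, b in combinations(range(k), 2):
        easy, hard = (a, b) if p[a] >= p[b] else (b, a)
        # Guttman error: endorsing the harder item without endorsing the easier one.
        observed += sum(1 for row in data if row[hard] == 1 and row[easy] == 0)
        # Expected number of such patterns if the two items were independent.
        expected += n * p[hard] * (1 - p[easy])
    if expected == 0:
        raise ValueError("H is undefined when items show no variation")
    return 1 - observed / expected


# Hypothetical responses: rows are respondents, columns are items.
perfect = [[0, 0, 0], [1, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
# One extra respondent endorses the second item but not the easier first item.
with_error = perfect + [[0, 1, 0]]

h_perfect = scale_h(perfect)
h_error = scale_h(with_error)
```

A response set with no Guttman errors gives H = 1, and each error pulls the coefficient down. Under the criterion in the table, a value below .30 would suggest the item set is not homogeneous.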
Figure 1. Endorsement frequencies for RM-SIP items from least endorsed to most endorsed.
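An ordered endorsement table of this kind can be sketched as follows, together with the Step 1 screen for insufficient variation. This is a rough illustration on hypothetical data: the function name, the item names, and the reading of the "95%" criterion as "at least 95% of responses in one category" are assumptions rather than details from the source.

```python
def endorsement_report(data, item_names, min_count=5, max_share=0.95):
    """Return (item, endorsements, share, low_variation) sorted from least to most endorsed."""
    n = len(data)
    report = []
    for j, name in enumerate(item_names):
        endorsed = sum(row[j] for row in data)
        share = endorsed / n
        # Flag binary items where one response option is rare or one category dominates.
        low_variation = (
            min(endorsed, n - endorsed) < min_count
            or max(share, 1 - share) >= max_share
        )
        report.append((name, endorsed, share, low_variation))
    return sorted(report, key=lambda entry: entry[1])


# Hypothetical data: 20 respondents, three binary items endorsed 10, 2 and 19 times.
data = [[int(i < 10), int(i < 2), int(i < 19)] for i in range(20)]
report = endorsement_report(data, ["A", "B", "C"])
```

Items flagged by the screen are candidates for exclusion or for merging categories, as the descriptives step suggests.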
Mokken Scaling for SIP items: aisp algorithm at increasing homogeneity thresholds.
| Item (columns: homogeneity threshold levels) | .05 | .10 | .15 | .20 | .25 | .30 | .35 | .40 | .45 | .50 | .55 | .60 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.stay home | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 3 | 2 | 2 |
| 2.change position | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 3.walk slowly | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 3 | 3 | 3 |
| 4.no work | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 3 | 4 | 0 | 0 |
| 5.handrail | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 5 | 7 | 7 |
| 6.rest often | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 |
| 7.hold on stand up | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 8.others do | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 3 | 4 | 5 | 5 |
| 9.dress slowly | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 6 | 6 |
| 10.stand up less | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 3 | 3 | 0 |
| 11.not bend down | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 12.struggle chair | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 13.difficult bed | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 4 | 4 |
| 14.appetite not good | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 15.trouble socks | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 6 | 6 |
| 16.walk short | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 3 | 3 | 3 |
| 17.sleep bad | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 4 | 4 |
| 18.help dress | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 19.sit down | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 3 | 3 | 3 |
| 20.no heavy jobs | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 3 | 4 | 5 | 5 |
| 21.bad temper | 1 | 1 | 1 | 1 | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 0 |
| 22.upstairs slowly | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 5 | 7 | 7 |
| 23.stay in bed | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 |
| 24.constant pain | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 5 | 2 | 2 |
Note: Numbers represent which subscale the item belongs to; 0 indicates the item is unscalable at that homogeneity level.
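One way to read this table is to ask, for each item, at which threshold it first leaves subscale 1, and which items are unscalable at a given threshold. The sketch below does this for a subset of rows transcribed from the table. The helper names are hypothetical, and the code only summarizes the reported assignments; it does not rerun the item selection algorithm.

```python
THRESHOLDS = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60]

# Subset of rows copied from the table; 0 means unscalable at that threshold.
AISP = {
    "2.change position": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "4.no work": [1, 1, 1, 1, 1, 1, 2, 2, 3, 4, 0, 0],
    "6.rest often": [1, 1, 1, 1, 0, 0, 0, 0, 0, 2, 0, 0],
    "13.difficult bed": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 4, 4],
    "14.appetite not good": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    "21.bad temper": [1, 1, 1, 1, 1, 0, 2, 0, 0, 0, 0, 0],
}


def unscalable_at(threshold):
    """Items (from the subset above) marked 0 at the given threshold."""
    column = THRESHOLDS.index(threshold)
    return [item for item, row in AISP.items() if row[column] == 0]


def first_departure(item):
    """Lowest threshold at which the item is no longer assigned to subscale 1."""
    for threshold, scale in zip(THRESHOLDS, AISP[item]):
        if scale != 1:
            return threshold
    return None


dropped_at_30 = unscalable_at(0.30)
departures = {item: first_departure(item) for item in AISP}
```

In the full table, items 6, 14 and 21 are the only ones marked unscalable at the .30 threshold, while items such as "2.change position" stay in subscale 1 across all thresholds.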
Figure 2. Joint Item Characteristic Curve (ICC) plot and person-item map for the 15-item RM-SIP.
Figure 3. Parallel analysis and Very Simple Structure (VSS) plots for the 24-item RM-SIP.
Bivariate correlations (Pearson's r) between illness perceptions, pain intensity, and 15-item and 24-item RM-SIP.
| Variables | IP1 | IP2 | IP3 | IP4 | IP5 | IP6 | IP7 | IP8 | VAS | SIP24 |
|---|---|---|---|---|---|---|---|---|---|---|
| IP1 – consequences | 0.43*** | |||||||||
| IP2 – timeline | −0.15* | −0.18** | ||||||||
| IP3 – personal control | −0.15* | −0.15* | 0.42*** | |||||||
| IP4 – treatment control | 0.53*** | 0.29*** | −0.08 | −0.03 | ||||||
| IP5 – identity | 0.57*** | 0.29*** | −0.27*** | −0.22*** | 0.41*** | |||||
| IP6 – concern | 0.09 | 0.06 | 0.26*** | 0.30*** | 0.03 | −0.08 | ||||
| IP7 – understanding | 0.50*** | 0.25*** | −0.19** | −0.14* | 0.17* | 0.60*** | −0.11 | |||
| Pain intensity – VAS | 0.55*** | 0.30*** | −0.22** | −0.15* | 0.45*** | 0.40*** | 0.12# | 0.33*** | ||
| 24-item RM-SIP | 0.64*** | 0.33*** | −0.14* | −0.14* | 0.46*** | 0.35*** | 0.04 | 0.35*** | 0.52*** | |
| 15-item RM-SIP | 0.61*** | 0.31*** | −0.11 | −0.12# | 0.44*** | 0.29*** | 0.05 | 0.26*** | 0.50*** | 0.96*** |
Note: #p < .1; *p < .05; **p < .01; ***p < .001; VAS, visual analogue scale; IP, illness perceptions; RM-SIP, Sickness Impact Profile Roland Scale.
Multiple regressions of 15-item and 24-item RM-SIP (n = 222; non-standardized estimates and standard errors).
| Dependent variable: | 15-item RM-SIP | 24-item RM-SIP |
|---|---|---|
| Intercept | −2.241 (1.944) | −3.233 (2.664) |
| Gender (male) | −0.013 (0.482) | 0.320 (0.660) |
| Age | 0.024 (0.020) | 0.007 (0.028) |
| Education (low) | 0.805@ (0.454) | 0.931 (0.622) |
| Pain intensity – VAS | 0.027** (0.008) | 0.034** (0.011) |
| IP1 – consequences | 0.985*** (0.158) | 1.285*** (0.216) |
| IP2 – timeline | 0.038 (0.157) | 0.112 (0.215) |
| IP3 – personal control | −0.013 (0.095) | −0.020 (0.131) |
| IP4 – treatment control | −0.032 (0.092) | −0.066 (0.126) |
| IP5 – identity | 0.257@ (0.133) | 0.423* (0.183) |
| IP6 – concern | −0.246* (0.122) | −0.326@ (0.167) |
| IP7 – understanding | −0.028 (0.087) | −0.037 (0.119) |
| IP8 – emotional response | −0.016 (0.117) | 0.201 (0.161) |
| Adjusted R² | 0.42 | 0.45 |
Note: @p < .1; *p < .05; **p < .01; ***p < .001; VAS, visual analogue scale; IP, illness perceptions; RM-SIP, Sickness Impact Profile Roland Scale.
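The significance markers in the regression table can be related to the reported estimates and standard errors through the ratio estimate / standard error. The sketch below uses a normal approximation for the two-sided p-value. That choice is an assumption, since the source does not state the reference distribution used, and the function names are hypothetical.

```python
from statistics import NormalDist


def wald_p_value(estimate, std_error):
    """Two-sided p-value for estimate / std_error under a normal approximation."""
    z = estimate / std_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def marker(p):
    """Map a p-value to the markers defined in the table note."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    if p < 0.1:
        return "@"
    return ""


# Estimates and standard errors copied from the table rows for IP6 and IP1.
ip6_15 = marker(wald_p_value(-0.246, 0.122))
ip6_24 = marker(wald_p_value(-0.326, 0.167))
ip1_15 = marker(wald_p_value(0.985, 0.158))
```

Because the table reports rounded values, borderline coefficients may not reproduce the printed marker exactly. The IP6 rows show how a similar coefficient can fall on either side of the .05 threshold depending on its standard error.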