| Literature DB >> 33267819 |
L B Mokkink1, M Boers2,3, C P M van der Vleuten4, L M Bouter2,5, J Alonso6, D L Patrick7, H C W de Vet2, C B Terwee2.
Abstract
BACKGROUND: Scores on an outcome measurement instrument depend on the type and settings of the instrument used, how instructions are given to patients, how professionals administer and score the instrument, etc. The impact of all these sources of variation on scores can be assessed in studies on reliability and measurement error, if properly designed and analyzed. The aim of this study was to develop standards to assess the quality of studies on reliability and measurement error of clinician-reported outcome measurement instruments, performance-based outcome measurement instrument, and laboratory values.Entities:
Keywords: COSMIN; Delphi study; Measurement error; Outcome measurement instruments; Quality assessment; Reliability; Risk of Bias
Year: 2020 PMID: 33267819 PMCID: PMC7712525 DOI: 10.1186/s12874-020-01179-5
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Fig. 1 Content of the Delphi study
Descriptive information of the panelists (n = 52)
| Characteristic | Category | n |
|---|---|---|
| Country in which they mainly work | The Netherlands | 14 |
| | USA | 10 |
| | Canada | 6 |
| | UK | 6 |
| | Australia | 3 |
| | Denmark | 2 |
| | France | 2 |
| | Germany | 2 |
| | Switzerland | 2 |
| | Belgium | 1 |
| | Finland | 1 |
| | Hong Kong | 1 |
| | Italy | 1 |
| | Spain | 1 |
| Reason for invitationa | PubMed or EMBASE search | 4 |
| | Author of review on non-PROMs | 3 |
| | Author of methodological paper on reliability | 16 |
| | Representative of relevant organization | 18 |
| | Own network only | 9 |
| | Nominated by invited panelist | 4 |
| Professional backgrounda, b | Methodologist | 25 |
| | Psychometrician | 18 |
| | Epidemiologist | 17 |
| | (Bio)statistician | 15 |
| | Allied health care professional | 10 |
| | Medical doctor | 7 |
| | (Clinical) psychologist | 4 |
| | Clinimetrician | 2 |
| | Other | 3 |
| Ever used one of the COSMIN tools? | Yes | 29 |
| | No | 23 |
a multiple answers allowed; b 56% of the panelists indicated having more than one profession (up to 5)
Consensus on components of outcome measurement instruments that do not involve biological sampling
| Component | Consensusa | Elaboration | Consensus on the elaboration (%) |
|---|---|---|---|
| Equipment | 43/48 (90%) (R2b) | All equipment used in preparation, administration, and assigning scores | 43/48 (90%) (R2) |
| Preparatory actions | 38/46 (83%) (R2) | 1. ‘First time only’ general preparatory actions, such as required expertise or training for professionals to prepare, administer, store or assign the scores 2. Specific preparatory actions for each measurement, such as • preparations of equipment by professionalsc • preparations of the patientd by the professional • Preparations undertaken by the patients | 37/46 (80%) (R2) |
| Unprocessed data collection | 30/44 (68%) (R2) | What the patient and/or professional(s) actually do to obtain the unprocessed data | 33/44 (75%) (R2) |
| Data processing and storage | 44/44 (100%) (R2) 39/44 (89%) (R2) | All actions undertaken on the unprocessed data to allow the assignment of the score | 37/44 (84%) (R2) |
| Assignment of the score | 36/44 (82%) (R2) | Methods used to transform processed data into a final scoree on the outcome measurement instrument. | 34/44 (77%) (R2) |
a Consensus was set at 67% of the panelists (strongly) agreeing with a proposal; the denominator can be smaller because panelists considered themselves to have ‘no expertise’ on a specific proposal or dropped out; b R2: consensus reached in Round 2; c Professionals are those who are involved in the preparation or the performance of the measurement, in the data processing, or in the assignment of the score; this may be done by one and the same person, or by different persons; d In the COSMIN methodology we use the word ‘patient.’ However, sometimes the target population is not patients, but e.g. healthy individuals, caregivers, clinicians, or body structures (e.g. joints, or lesions). In these cases, the word patient should be read as e.g. healthy volunteer, or clinician; e The score can be further used or interpreted by converting a score to another scale, metric or classification. For example, a continuous score is classified into an ordinal score (e.g. mild/moderate/severe), a score is dichotomized into below or above a normal value, or patients are classified as responders to the intervention (e.g. when their change is larger than the Minimal Important Change (MIC) value), as illustrated in the sketch below
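To make the conversions in footnote e concrete, here is a minimal sketch; the function names, cut-off values and the MIC value are hypothetical illustrations, not taken from the study:

```python
def classify_severity(score, mild_max=10.0, moderate_max=20.0):
    """Convert a continuous score to an ordinal class (hypothetical cut-offs)."""
    if score <= mild_max:
        return "mild"
    if score <= moderate_max:
        return "moderate"
    return "severe"


def is_responder(baseline, follow_up, mic=5.0):
    """Classify a patient as a responder when the change exceeds the MIC value."""
    return (follow_up - baseline) > mic
```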
Consensus on components of outcome measurement instruments that involve biological sampling
| Component | Consensusa | Elaboration | Consensus on the elaboration (%) |
|---|---|---|---|
| Equipment | See above | All equipment used in the preparation, the administration, and the determination of the values of the outcome measurement instrument | See above |
| Preparatory actions preceding sample collection by professionals, patients, and others (if applicable) | See above | 1. General preparatory actions, such as required expertise or training for professionals to prepare, administer, store and determine the value 2. Specific preparatory actions for each measurement, such as • preparations of equipment, environment, and storage by professionalsb • preparation of the patientc by the professional • Preparatory actions undertaken by the patients | See above |
| Collection of biological sample | 32/38 (84%) (R2d) | All actions undertaken to collect the biological sample, before any sample processing | 33/38 (87%)e (R2) |
| Biological sample processing and storage | Combiningf 33/35 (94%) (R3g) Term: 29/35 (83%) (R3) | All actions undertaken to be able to preserve, transport, and store the biological sample for determination; and, if applicable, further actions undertaken on the stored sample to be able to conduct the determination of the biological sample | 35/36 (97%) (R3) |
| Determination of the value of the sample | 20/35 (57%) (R3) 31/35 (89%)i (R3) | Methods used for counting or quantifying the amount of the substance or entity of interesth | 27/36 (75%) (R3) |
a Consensus was set at 67% of the panelists (strongly) agreeing with a proposal; the denominator can be smaller because panelists considered themselves to have ‘no expertise’ on a specific proposal or dropped out; b Professionals are those who are involved in the preparation or the performance of the measurement, in the data processing, or in the assignment of the score; this may be done by one and the same person, or by different persons; c In the COSMIN methodology we use the word ‘patient.’ However, sometimes the target population is not patients, but e.g. healthy individuals, caregivers, clinicians, or body structures (e.g. joints, or lesions). In these cases, the word patient should be read as e.g. healthy volunteer, or clinician; d R2: consensus reached in Round 2; e After round 2 we changed the formulation, but we did not rate agreement among panelists; f In round 2 we proposed two components ‘initial processing and storage’ and ‘second processing’, which we proposed to combine in Round 3; g R3: consensus reached in Round 3; h Decision by the steering committee; i Consensus reached in R3 on the term ‘value’
Elements of a comprehensive research question of a study on reliability or measurement error
| | Element of the research question | Consensusa (%) |
|---|---|---|
| 1 | the | 42/45 (93%) (R1b) |
| 2 | the | 42/45 (93%) (R1) (version) 33/45 (73%) (R1) (operationalization) |
| 3 | the | 40/45 (89%) (R1) |
| 4 | a specification whether one is interested in a | 36/42 (86%) (R2d) |
| 5 | a specification of the | 38/45 (84%) (R1) |
| 6 | a specification of the | 41/45 (91%) (R1) |
| 7 | a specification of the | 42/45 (93%) (R1) |
a Consensus was set at 67% of the panelists (strongly) agreeing with a proposal; the denominator can be smaller because panelists considered themselves to have ‘no expertise’ on a specific proposal or dropped out; b R1: consensus reached in Round 1; c In Generalizability theory these are the facets of stratification (FoS), when patients are nested in a facet [16]; d R2: consensus reached in Round 2; e In Generalizability theory these are the random or fixed facets of generalizability (FoG), e.g. time or occasion, the (level of expertise of) professionals, the machines, or other components of the measurement [16]; f In the COSMIN methodology we use the word patient. However, sometimes the target population does not consist of patients, but e.g. healthy individuals, caregivers, clinicians, or body structures (e.g. joints, or lesions). In these cases, the word patient should be read as e.g. healthy volunteer, or clinician; g In Generalizability theory these are the Object of Measurement (OoM) or the facet of differentiation [16]
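For readers unfamiliar with the Generalizability-theory terms in footnotes c, e and g, a schematic variance decomposition may help; this rendering assumes the simplest crossed single-facet design (patients p crossed with one facet of generalizability, e.g. raters r) and is illustrative only, not a formula from the paper:

$$
X_{pr} = \mu + \nu_p + \nu_r + \nu_{pr,e}, \qquad
\Phi = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_r + \sigma^2_{pr,e}}
$$

Here $\sigma^2_p$ is the variance of the object of measurement (footnote g), $\sigma^2_r$ the variance of the facet of generalizability (footnote e), and $\sigma^2_{pr,e}$ the residual variance; $\Phi$ is the dependability coefficient for absolute decisions, which is itself an ICC.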
Standards for design requirements of studies on reliability or measurement error
| | | very good | adequate | doubtful | inadequate | NA |
|---|---|---|---|---|---|---|
| 1 | Were patients stable in the time between the repeated measurements on the construct to be measured? | Yes (evidence provided) | Reasons to assume standard was met | Unclear | No (evidence provided) | NA |
| 2 | Was the time interval between the repeated measurements appropriate? | Yes | | Doubtful, OR time interval not stated | No | NA |
| 3 | Were the measurement conditions similar for the repeated measurements – except for the condition being evaluated as a source of variation? | Yes (evidence provided) | Reasons to assume standard was met, OR change was unavoidable | Unclear | No (evidence provided) | NA |
| 4 | Did the professional(s) administer the measurement without knowledge of scores or values of other repeated measurement(s) in the same patients? | Yes (evidence provided) | Reasons to assume standard was met | Unclear | No (evidence provided) | NA |
| 5 | Did the professional(s) assign the scores or determine the values without knowledge of the scores or values of other repeated measurement(s) in the same patients? | Yes (evidence provided) | Reasons to assume standard was met | Unclear | No (evidence provided) | |
| 6 | Were there any other important flaws in the design or statistical methods of the study?c | No | | Minor methodological flaws | Yes | |
a R2: consensus reached in round 2; b R3: consensus reached in round 3; c Standard 6 and the responses of the four-point rating system were not discussed in the Delphi study
Consensus reached on standards for preferred statistical methods for reliability
| | | very good | adequate | doubtful | inadequate |
|---|---|---|---|---|---|
| 7 | For continuous scores: was an Intraclass Correlation Coefficient (ICC)a calculated? | ICC calculated; the model or formula was described, and matches the study designc and the data | ICC calculated but model or formula was not described or does not optimally match the study designc OR Pearson or Spearman correlation coefficient calculated WITH evidence provided that no systematic difference between measurements has occurred | Pearson or Spearman correlation coefficient calculated WITHOUT evidence provided that no systematic difference between measurements has occurred OR WITH evidence provided that systematic difference between measurements has occurred | |
| 8 | For ordinal scores: was a (weighted) Kappa calculated? | Kappa calculated; the weighting scheme was described, and matches the study design and the data | Kappa calculated, but weighting scheme not described or does not optimally match the study design | ||
| 9 | For dichotomous/nominal scores: was Kappa calculated for each category against the other categories combined? | Kappa calculated for each category against the other categories combined | |||
a Generalizability and Decision coefficients are ICCs; b R2: consensus reached in round 2; c Based on panelists’ suggestions the steering committee decided after round 3 to use the word ‘study design’ instead of ‘reviewer constructed research question’; d R3: consensus reached in round 3
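As a companion to standard 7, a minimal sketch of how an ICC can be computed, assuming a two-way random-effects model with absolute agreement for a single measurement (ICC(2,1) in Shrout and Fleiss terms); the helper name is hypothetical and this is one common ICC variant, not necessarily the paper's preferred model:

```python
import numpy as np

def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores: (n_subjects, k_raters) array of repeated measurements.
    """
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)  # per-subject means
    col_means = scores.mean(axis=0)  # per-rater/occasion means

    # Mean squares from the two-way ANOVA decomposition
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # subjects (rows)
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # raters (columns)
    sse = np.sum((scores - row_means[:, None] - col_means[None, :] + grand) ** 2)
    mse = sse / ((n - 1) * (k - 1))                         # residual

    # Shrout & Fleiss (1979) formula for ICC(2,1)
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Example: test-retest scores of 5 patients measured on 2 occasions
scores = np.array([[8, 7], [5, 6], [9, 9], [3, 4], [6, 5]])
print(icc_2_1(scores))
```

For standards 8 and 9, an existing routine such as sklearn.metrics.cohen_kappa_score (with weights='linear' or weights='quadratic' for ordinal scores) can compute the (weighted) kappa.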
Consensus reached on standards for preferred statistical methods for measurement error (agreement)
| very good | adequate | doubtful | inadequate | ||
|---|---|---|---|---|---|
| 7 | For continuous scores: was the Standard Error of Measurement (SEM), Smallest Detectable Change (SDC), Limits of Agreement (LoA) or Coefficient of Variation (CV) calculated? | SEM, SDC, LoA or CV calculated; the model or formula for the SEM/SDC is described; it matches the study designc and the data | SEM, SDC, LoA or CV calculated, but the model or formula is not described or does not optimally match the study design, AND evidence provided that no systematic difference has occurred | SEM_consistency, SDC_consistency, LoA or CV calculated, without knowledge about systematic differences or with evidence provided that a systematic difference has occurred | SEM calculated based on Cronbach’s alpha OR using SD from another population |
| 8 | For dichotomous/nominal/ordinal scores: was the percentage specific (e.g. positive and negative) agreement calculated? | % specific agreement calculated | % agreement calculated | | |
a R2: consensus reached in round 2; b R3: consensus reached in round 3; c Based on panelists’ suggestions the steering committee decided after round 3 to use the word ‘study design’ instead of ‘reviewer constructed research question’
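To accompany standards 7 and 8 above, a minimal sketch of the agreement parameters for a simple test-retest design with two occasions; the function names are hypothetical and the formulas are the standard ones (SEM from the SD of differences, SDC = 1.96·√2·SEM, Bland-Altman limits of agreement, and positive specific agreement for a 2×2 rater table), not necessarily the paper's preferred variants:

```python
import numpy as np

def measurement_error(t1, t2):
    """SEM, SDC and 95% limits of agreement for two repeated measurements."""
    t1, t2 = np.asarray(t1, float), np.asarray(t2, float)
    diff = t2 - t1
    sd_diff = diff.std(ddof=1)
    sem = sd_diff / np.sqrt(2)             # SEM estimated from the SD of differences
    sdc = 1.96 * np.sqrt(2) * sem          # smallest detectable change (individual level)
    loa = (diff.mean() - 1.96 * sd_diff,   # Bland-Altman 95% limits of agreement
           diff.mean() + 1.96 * sd_diff)
    return sem, sdc, loa

def positive_specific_agreement(both_pos, only_first, only_second):
    """Positive specific agreement from the cell counts of a 2x2 rater table."""
    return 2 * both_pos / (2 * both_pos + only_first + only_second)

# Example: 6 patients measured twice
sem, sdc, loa = measurement_error([12, 15, 11, 9, 14, 10], [13, 14, 12, 10, 15, 9])
```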