Cecilia A C Prinsen1, Sunita Vohra2,3,4, Michael R Rose5, Maarten Boers6,7, Peter Tugwell8, Mike Clarke9, Paula R Williamson10, Caroline B Terwee6.
Abstract
BACKGROUND: In cooperation with the Core Outcome Measures in Effectiveness Trials (COMET) initiative, the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) initiative aimed to develop a guideline on how to select outcome measurement instruments for outcomes (i.e., constructs or domains) included in a "Core Outcome Set" (COS). A COS is an agreed minimum set of outcomes that should be measured and reported in all clinical trials of a specific disease or trial population.
Keywords: COMET; COSMIN; Core Outcome Set; Delphi study; Guideline; Instrument selection; Outcome measurement instrument; Outcomes research
Year: 2016 PMID: 27618914 PMCID: PMC5020549 DOI: 10.1186/s13063-016-1555-2
Source DB: PubMed Journal: Trials ISSN: 1745-6215 Impact factor: 2.279
Characteristics of the panelists
| Study characteristics | Panelists ( |
|---|---|
| Country, number (%) | |
| Australia | 15 (16) |
| Canada | 14 (15) |
| Denmark | 7 (7) |
| Germany | 6 (6) |
| The Netherlands | 19 (20) |
| Spain | 5 (5) |
| UK | 12 (13) |
| USA | 8 (8) |
| Other<sup>b</sup> | 9 (10) |
| Background, number (%)<sup>c</sup> | |
| Allied health care professional | 30 (32) |
| Clinimetrician/psychometrician | 29 (31) |
| Epidemiologist | 40 (42) |
| Physician | 28 (30) |
| Statistician | 10 (11) |
| Other<sup>d</sup> | 15 (16) |
| Current profession, number (%)<sup>c</sup> | |
| Clinician | 26 (27) |
| Journal editor | 9 (10) |
| Researcher | 88 (93) |
| Other<sup>e</sup> | 10 (11) |
| Level of experience in COS development, number (%) | |
| A lot | 11 (12) |
| Some | 28 (30) |
| A little | 26 (27) |
| None | 30 (32) |
| Level of experience in instrument development, number (%) | |
| A lot | 32 (34) |
| Some | 39 (41) |
| A little | 12 (13) |
| None | 12 (13) |
| Level of experience with evaluation of measurement properties, number (%) | |
| A lot | 44 (46) |
| Some | 33 (35) |
| A little | 14 (15) |
| None | 4 (4) |
| Level of experience in conducting systematic reviews, number (%) | |
| A lot | 20 (21) |
| Some | 35 (37) |
| A little | 19 (20) |
| None | 21 (22) |
<sup>a</sup> In some cases, percentages do not total exactly 100% because they were rounded to whole numbers
<sup>b</sup> Brazil (N = 1), France (N = 2), Italy (N = 3), Norway (N = 1), Portugal (N = 1), Switzerland (N = 1)
<sup>c</sup> Panelists could tick more than one response option, so the percentages total more than 100%
<sup>d</sup> Trialist (N = 2), systematic reviewer (N = 1), social research methodologist (N = 2), clinical academic (N = 1), scientific researcher (N = 1), health services researcher (N = 1), clinical psychologist (N = 2), project manager (N = 1), public health (N = 1), academic course writer/teacher (N = 1), clinical researcher (N = 1), human movement scientist (N = 1)
<sup>e</sup> Academic (N = 2), consultant for clinical research (N = 1), research funder (N = 1), Health Technology Assessment consultant (N = 2), educator (N = 1), project manager (N = 1), advisor on research methods (N = 1), director of collaborative centre (N = 1)
Consensus on four main steps in the selection of outcome measurement instruments for Core Outcome Sets (COSs), including their tasks
| No. | Task | Percentage of agreement in the Delphi study (%) |
|---|---|---|
| Step 1 | | |
| | Aspects to consider before starting to search for outcome measurement instruments: | |
| 1. | The construct (i.e., outcome or domain) to be measured | 98 |
| 2. | The target population (e.g., age, gender, disease characteristics) | 99 |
| Step 2 | | |
| | COS developers should aim for finding | 72 |
| | When finding outcome measurement instruments, COS developers can have three sources of information: (1) systematic reviews, (2) literature searches, and (3) other sources (optional) | |
| 1. | COS developers use existing, good quality, and up-to-date systematic reviews of outcome measurement instruments | 94 |
| 2a. | MEDLINE (e.g., through the PubMed or OVID interface) is considered the minimum database to consult in finding all existing outcome measurement instruments. An additional search in EMBASE is highly recommended | 99 and 82, respectively |
| 2b. | Reference lists of the included studies should be checked to find all existing outcome measurement instruments | 91 |
| 3. | Additional sources may be considered as optional sources in finding relevant outcome measurement instruments | 89 |
| Step 3 | | |
| | To evaluate the quality of the outcome measurement instruments, COS developers evaluate (1) the measurement properties and (2) the feasibility aspects of the identified outcome measurement instruments | |
| 1. | Evidence on the measurement properties should be available in the target population<sup>a</sup> | 70–93 |
| 2. | Feasibility aspects should be taken into consideration in the selection of outcome measurement instruments for outcomes included in a COS<sup>b</sup> | 77–97 |
| Step 4 | | |
| 1. | Select only one outcome measurement instrument for each outcome (e.g., construct or domain) in a COS | 90 |
| 2. | The minimum requirements for including an outcome measurement instrument in a COS are: at least high quality evidence<sup>c</sup> for good<sup>d</sup> content validity and for good<sup>d</sup> internal consistency (if applicable), and if the outcome measurement instrument is feasible | 81 |
| 3. | A consensus procedure to agree on the outcome measurement instruments for each outcome included in a COS should be performed among all relevant stakeholders, including patients | 90 |
<sup>a</sup> See Table 3 for the percentage of agreement per measurement property separately
<sup>b</sup> See Table 6 for the percentage of agreement per feasibility aspect separately
<sup>c</sup> “High quality evidence” is defined as consistent findings in multiple studies of at least good quality OR in one study of excellent quality AND a total sample size of 100 patients or more (Table 5)
<sup>d</sup> “Good” is defined as a “+” rating according to the criteria for good measurement properties (Table 4)
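The minimum requirements for including an instrument in a COS can be read as a simple conjunction of conditions. The sketch below is one possible encoding, not part of the guideline itself: the class names, field names, and the string codes for ratings and evidence levels are hypothetical, and "if applicable" is modeled by allowing the internal consistency entry to be absent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PropertyEvidence:
    """Evidence for one measurement property (hypothetical structure)."""
    rating: str   # "+", "?" or "-" (ASCII hyphen stands in for the negative rating)
    quality: str  # "high", "moderate", "low", "very low" or "unknown"


@dataclass
class Instrument:
    """An outcome measurement instrument under consideration (hypothetical structure)."""
    name: str
    content_validity: PropertyEvidence
    internal_consistency: Optional[PropertyEvidence]  # None when not applicable
    feasible: bool


def is_good_with_high_quality(evidence: PropertyEvidence) -> bool:
    # "Good" means a "+" rating; "high" is the top evidence level in the table.
    return evidence.rating == "+" and evidence.quality == "high"


def meets_minimum_requirements(instrument: Instrument) -> bool:
    """Check the minimum requirements for inclusion in a COS."""
    if not is_good_with_high_quality(instrument.content_validity):
        return False
    # Internal consistency is only checked when it applies to the instrument.
    if instrument.internal_consistency is not None:
        if not is_good_with_high_quality(instrument.internal_consistency):
            return False
    return instrument.feasible
```

An instrument that passes this check is only eligible; the final choice would still go through the consensus procedure among stakeholders described in the last task.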
Overview of all measurement properties, including their definitions
| Measurement property | Definition according to the COSMIN<sup>a</sup> taxonomy | Percentage of agreement in the Delphi study (%) |
|---|---|---|
| Content validity (including face validity) | The degree to which the content of a measurement instrument is an adequate reflection of the construct to be measured | 93 |
| Reliability | The degree to which the measurement is free from measurement error | 91 |
| Responsiveness | The ability of a measurement instrument to detect change over time in the construct to be measured | 91 |
| Internal consistency | The degree of interrelatedness among the items | 90 |
| Structural validity | The degree to which the scores of a measurement instrument are an adequate reflection of the dimensionality of the construct to be measured | 83 |
| Measurement error | The systematic and random error of a patient’s score that is not attributed to true changes in the construct to be measured | 83 |
| Hypotheses testing | The degree to which the scores of a measurement instrument are consistent with hypotheses based on the assumption that the measurement instrument validly measures the construct to be measured | 82 |
| Criterion validity | The degree to which the scores of a measurement instrument are an adequate reflection of a “gold standard” | 76 |
| Cross-cultural validity | The degree to which the performance of the items on a translated or culturally adapted measurement instrument is an adequate reflection of the performance of the items of the original version of the measurement instrument | 70 |
<sup>a</sup> COnsensus-based Standards for the selection of health Measurement INstruments
Overview of all feasibility aspects
| Feasibility aspects | Percentage of agreement in the Delphi study (%) |
|---|---|
| Patient’s comprehensibility | 97 |
| Interpretability | 95 |
| Ease of administration | 93 |
| Length of the outcome measurement instrument | 91 |
| Completion time | 91 |
| Patient’s mental ability level | 91 |
| Ease of standardization | 90 |
| Clinician’s comprehensibility | 90 |
| Type of outcome measurement instrument | 90 |
| Cost of an outcome measurement instrument | 89 |
| Required equipment | 88 |
| Type of administration | 87 |
| Availability in different settings | 86 |
| Copyright | 85 |
| Patient’s physical ability level | 85 |
| Regulatory agency’s requirement for approval | 84 |
| Ease of score calculation | 77 |
Quality of evidence
| Quality rating | Criteria |
|---|---|
| High | Consistent findings in multiple studies of at least good quality OR one study of excellent quality AND a total sample size of ≥100 patients |
| Moderate | Conflicting findings in multiple studies of at least good quality OR consistent findings in multiple studies of at least fair quality OR one study of good quality AND a total sample size of ≥50 patients |
| Low | Conflicting findings in multiple studies of at least fair quality OR one study of fair quality AND a total sample size of ≥30 patients |
| Very low | Only studies of poor quality OR a total sample size of <30 patients |
| Unknown | No studies |
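The grading rules above can be sketched as a small function. This is an illustrative reading, not the authors' algorithm: the table does not state how OR and AND bind, so the sketch assumes the sample-size threshold applies to every alternative within a row. The `Study` class, the `consistent` flag (treated as trivially true for a single study), and the fallback to "low" for combinations the table does not list are all assumptions.

```python
from dataclasses import dataclass

# Study quality levels ordered from worst to best (hypothetical encoding).
QUALITY_ORDER = {"poor": 0, "fair": 1, "good": 2, "excellent": 3}


@dataclass
class Study:
    quality: str      # "poor", "fair", "good" or "excellent"
    sample_size: int  # number of patients in the study


def grade_evidence(studies, consistent=True):
    """Return the quality-of-evidence level for a list of studies.

    `consistent` says whether findings agree across studies; for a single
    study it should be left at True.
    """
    if not studies:
        return "unknown"

    total_n = sum(s.sample_size for s in studies)

    def count_at_least(level):
        # Number of studies whose quality is at least `level`.
        return sum(QUALITY_ORDER[s.quality] >= QUALITY_ORDER[level] for s in studies)

    n_excellent = count_at_least("excellent")
    n_good = count_at_least("good")
    n_fair = count_at_least("fair")

    # Very low: only poor-quality studies OR fewer than 30 patients in total.
    if n_fair == 0 or total_n < 30:
        return "very low"
    # High: consistent good-quality studies or one excellent study, n >= 100.
    if total_n >= 100 and consistent and (n_good >= 2 or n_excellent >= 1):
        return "high"
    # Moderate: conflicting good studies, consistent fair studies,
    # or one good study, n >= 50.
    if total_n >= 50:
        if not consistent and n_good >= 2:
            return "moderate"
        if consistent and (n_fair >= 2 or n_good >= 1):
            return "moderate"
    # Assumed fallback: at least one fair study and n >= 30.
    return "low"
```

For example, two consistent good-quality studies with 60 patients each grade as "high", while the same two studies with conflicting findings grade as "moderate".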
Criteria for good measurement properties
| Measurement property | Rating* | Criteria | Percentage of agreement in the Delphi study (%) |
|---|---|---|---|
| Content validity | + | All items refer to relevant aspects of the construct to be measured AND are relevant for the target population AND are relevant for the context of use AND together comprehensively reflect the construct to be measured | 97 |
| | ? | Not all information for ‘+’ reported | |
| | – | Criteria for ‘+’ not met | |
| Structural validity | + | | CTT: 84 |
| | ? | CTT: Not all information for ‘+’ reported | |
| | – | Criteria for ‘+’ not met | |
| Internal consistency | + | At least limited evidence for unidimensionality or positive structural validity AND Cronbach's alpha(s) ≥ 0.70 and ≤ 0.95 | 89 |
| | ? | Not all information for ‘+’ reported OR conflicting evidence for unidimensionality or structural validity OR evidence for lack of unidimensionality or negative structural validity | |
| | – | Criteria for ‘+’ not met | |
| Reliability | + | ICC or weighted Kappa ≥ 0.70 | 88 |
| | ? | ICC or weighted Kappa not reported | |
| | – | Criteria for ‘+’ not met | |
| Measurement error | + | SDC or LoA < MIC | 72 |
| | ? | MIC not defined | |
| | – | Criteria for ‘+’ not met | |
| Hypotheses testing | + | At least 75% of the results are in accordance with the hypotheses | 87 |
| | ? | No correlations with instrument(s) measuring related construct(s) AND no differences between relevant groups reported | |
| | – | Criteria for ‘+’ not met | |
| Cross-cultural validity | + | No important differences found between language versions in multiple group factor analysis or DIF analysis | 84 |
| | ? | Multiple group factor analysis AND DIF analysis not performed | |
| | – | One or more criteria for ‘+’ not met | |
| Criterion validity | + | Convincing arguments that gold standard is “gold” AND correlation with gold standard ≥ 0.70 | 88 |
| | ? | Not all information for ‘+’ reported | |
| | – | Criteria for ‘+’ not met | |
| Responsiveness | + | At least 75% of the results are in accordance with the hypotheses | 88 |
| | ? | No correlations with changes in instrument(s) measuring related construct(s) AND no differences between changes in relevant groups reported | |
| | – | Criteria for ‘+’ not met | |
Modified from Terwee et al. [19]
AUC = area under the curve, CFI = comparative fit index, CTT = classical test theory, DIF = differential item functioning, EFA = exploratory factor analysis, ICC = intraclass correlation coefficient, IRT = item response theory, LoA = limits of agreement, MIC = minimal important change, RMSEA = root mean square error of approximation, SEM = Standard Error of Measurement, SDC = smallest detectable change, SRMR = standardized root mean residuals, TLI = Tucker-Lewis index
“+” = positive rating, “?” = indeterminate rating, “–” = negative rating
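Several of the criteria in this table are numeric thresholds, so they can be illustrated with a few small rating functions. This is a minimal sketch under stated assumptions: the function names and arguments are hypothetical, `None` stands for a value that was not reported, an ASCII `"-"` stands in for the negative rating, and treating a missing SDC or LoA value as indeterminate is an assumption that goes beyond the table.

```python
def rate_internal_consistency(alpha, unidimensional):
    """Rate internal consistency from Cronbach's alpha.

    `unidimensional` is True only when there is at least limited evidence
    for unidimensionality or positive structural validity.
    """
    if alpha is None or unidimensional is not True:
        return "?"
    return "+" if 0.70 <= alpha <= 0.95 else "-"


def rate_reliability(icc_or_kappa):
    """Rate reliability from an ICC or weighted Kappa value."""
    if icc_or_kappa is None:
        return "?"
    return "+" if icc_or_kappa >= 0.70 else "-"


def rate_measurement_error(sdc_or_loa, mic):
    """Rate measurement error by comparing SDC (or LoA) with the MIC."""
    if mic is None or sdc_or_loa is None:
        # The table lists "MIC not defined" as indeterminate; a missing
        # SDC/LoA is handled the same way here by assumption.
        return "?"
    return "+" if sdc_or_loa < mic else "-"


def rate_hypotheses(n_confirmed, n_tested):
    """Rate hypotheses testing or responsiveness (at least 75% confirmed)."""
    if n_tested == 0:
        return "?"
    return "+" if n_confirmed / n_tested >= 0.75 else "-"
```

Under these assumptions, a Cronbach's alpha of 0.97 on a unidimensional scale rates as negative because it exceeds the upper bound of 0.95, and an SDC of 5 points against an MIC of 8 points rates as positive.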