Are validated outcome measures used in distal radial fractures truly valid? A critical assessment using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist.

Y V Kleinlugtenbelt¹, R W Nienhuis², M Bhandari³, J C Goslings⁴, R W Poolman⁵, V A B Scholtes⁵.

Abstract

OBJECTIVES: Patient-reported outcome measures (PROMs) are often used to evaluate the outcome of treatment in patients with distal radial fractures. Which PROM to select is often based on assessment of measurement properties, such as validity and reliability. Measurement properties are assessed in clinimetric studies, and results are often reviewed without considering the methodological quality of these studies. Our aim was to systematically review the methodological quality of clinimetric studies that evaluated measurement properties of PROMs used in patients with distal radial fractures, and to make recommendations for the selection of PROMs based on the level of evidence of each individual measurement property.
METHODS: A systematic literature search was performed in PubMed, EMbase, CINAHL and PsycINFO databases to identify relevant clinimetric studies. Two reviewers independently assessed the methodological quality of the studies on measurement properties, using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist. Level of evidence (strong / moderate / limited / lacking) for each measurement property per PROM was determined by combining the methodological quality and the results of the different clinimetric studies.
RESULTS: In all, 19 out of 1508 identified unique studies were included, in which 12 PROMs were rated. The Patient-rated wrist evaluation (PRWE) and the Disabilities of Arm, Shoulder and Hand questionnaire (DASH) were evaluated on most measurement properties. The evidence for the PRWE is moderate that its reliability, validity (content and hypothesis testing), and responsiveness are good. The evidence is limited that its internal consistency and cross-cultural validity are good, and its measurement error is acceptable. There is no evidence for its structural and criterion validity. The evidence for the DASH is moderate that its responsiveness is good. The evidence is limited that its reliability and the validity on hypothesis testing are good. There is no evidence for the other measurement properties.
CONCLUSION: According to this systematic review, there is, at best, moderate evidence that the responsiveness of the PRWE and DASH is good, as are the reliability and validity of the PRWE. We recommend these PROMs in clinical studies in patients with distal radial fractures; however, more clinimetric studies of higher methodological quality are needed to adequately determine the other measurement properties.

Cite this article: Dr Y. V. Kleinlugtenbelt. Are validated outcome measures used in distal radial fractures truly valid?: A critical assessment using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist. Bone Joint Res 2016;5:153-161. DOI: 10.1302/2046-3758.54.2000462.

Keywords: COSMIN; Distal radius fracture; PROM; Validation study

Year: 2016 PMID: 27132246 PMCID: PMC4921040 DOI: 10.1302/2046-3758.54.2000462

Source DB: PubMed Journal: Bone Joint Res ISSN: 2046-3758 Impact factor: 5.853

The aim of this systematic review was to evaluate the methodological quality of the clinimetric studies that evaluated measurement properties of the available patient-reported outcome measures (PROMs) used in patients with distal radial fractures, and to determine which PROM, based on the level of evidence of each individual measurement property, is most appropriate for the evaluation of these patients.

The two PROMs that were most extensively evaluated were the patient-rated wrist evaluation (PRWE) (with seven of nine measurement properties investigated) and the Disabilities of Arm, Shoulder and Hand (DASH) (with four of nine investigated). The methodological quality of these studies ranged from poor to good at best. Strong evidence supporting ‘good quality’ of any of the currently available PROMs in patients with distal radial fractures is lacking.

The PRWE and DASH are the two most extensively evaluated PROMs. Their measurement properties were mainly good, but the methodological quality of the clinimetric studies was low, which means that these results may be biased. For now, we recommend using the PRWE or DASH, but more clinimetric studies of higher methodological quality are needed to select PROMs in patients with distal radial fractures with greater confidence.

Strength: This is the first study that has used the COnsensus-based Standards for the Selection of Health Measurement INstruments (COSMIN) checklist to systematically review the methodological quality of studies on the measurement properties of PROMs in the evaluation of treatment of distal radial fractures.
Strength: Our search was not limited to English language studies, as both reviewers have a good knowledge of German and Dutch.
Limitation: It was not possible to distinguish between poor study reporting and poor methodological quality.

Introduction

Distal radial fractures account for approximately 17% of all fractures[1] and the distal radius is the most common fracture site in the upper extremity.[2-4] Despite their high incidence, there is no treatment consensus for these fractures.[5] To conduct best-evidence clinical trials in distal radial fracture treatment, and to properly compare trial results, there must be consensus on the use of outcome measures. Historically, outcome assessment after distal radial fractures focused on imaging and physical examination (e.g. grip strength and range of motion). These assessments, however, do not represent the patients’ perspective as they do not take the patients’ feelings, opinions or wellbeing into account, which are likely to be more important for the patient.[6] In the last two decades, outcomes assessment has shifted towards a patient-centred approach. This approach assesses the outcome based directly on the opinion of the patient. Outcomes such as pain and functional ability, which are highly relevant for patients, can be assessed by patient-reported outcome measures (PROMs).[7] Currently, a wide variety of PROMs are available and are used to assess patient-reported functional outcomes for upper limb and wrist disorders.[8-20] Several (non-)systematic studies have reviewed the existing literature in order to present available PROMs for assessing wrist and hand function in general.[21-25] Over a period of 25 years, the two most extensively used PROMs for evaluating the treatment outcome of patients with distal radial fractures[26] were the Disabilities of Arm, Shoulder and Hand (DASH), and the (original or modified) Gartland and Werley scoring system. However, the patient-rated wrist evaluation (PRWE) was found to have the best measurement properties, i.e. it was found to be the most reliable, valid and responsive instrument for these patients.
This conclusion was based on the results of the available clinimetric studies.[26] Clinimetrics is a scientific discipline that aims to develop methods of assessing the properties of health measurement instruments, with the aim of improving the quality of outcome measures. Although the measurement properties were found to be good, the authors did not incorporate the methodological quality of these clinimetric studies. It is important for the understanding of this systematic review to distinguish between the ‘methodological quality’ of clinimetric studies on PROMs and the ‘quality’ (e.g. the measurement properties) of the PROMs themselves. Evidently a PROM is only as good as the methodological quality of its study. In order to assess the methodological quality of clinimetric studies (i.e. studies on measurement properties) on PROMs, the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) group formulated a set of guidelines. First, the COSMIN group reached consensus on terminology, definitions and a taxonomy of measurement properties of PROMs in an international Delphi study. Next, the group developed a checklist containing standards for evaluating the methodological quality of studies on the measurement properties (e.g., reliability) of measurement instruments (e.g. DASH) (www.cosmin.nl).[27] The best PROM should have a high level of evidence (e.g. as evaluated in high quality studies) supporting good quality on all measurement properties. The definitions and a description of the measurement properties are given in Table I.

Table I.

Definitions of the measurement properties.

Measurement property	Definition
Internal consistency	The degree of the interrelatedness among the items
	“Do the different questions in a PROM that are meant to measure the same general construct produce similar scores?”
Reliability	The proportion of the total variance in the measurements which is because of “true” differences among patients
	“How close are repeated measurements?”
Measurement error	The systematic error and random error of a patient’s score that are not attributed to true changes in the construct to be measured
	“What amount of change in a score cannot be considered a real or true change?”
Content validity	The degree to which the content of a health-related patient-reported outcomes (HR-PRO) instrument is an adequate reflection of the construct to be measured
	“Are all items relevant for the specific population and have important activities been missed?”
Structural validity	The degree to which the scores of an HR-PRO instrument are an adequate reflection of the dimensionality of the construct to be measured
	“Do all items in a PROM reflect single or multiple constructs?”
Hypotheses testing	The degree to which the scores of an HR-PRO instrument are consistent with hypotheses (for instance with regard to internal relationships, relationships to scores of other instruments, or differences between relevant groups) based on the assumption that the HR-PRO instrument validly measures the construct to be measured
	“What is the expected relationship with other PROMs assessing comparable constructs?”
Cross-cultural validity	The degree to which the performance of the items on a translated or culturally adapted HR-PRO instrument is an adequate reflection of the performance of the items of the original version of the HR-PRO instrument
	“Has the PROM been correctly translated and retested in another language and cultural setting?”
Criterion validity	The degree to which the scores of an HR-PRO instrument are an adequate reflection of a “gold standard”
	“Is the PROM tested against the benchmark PROM?”
Responsiveness	The ability of an HR-PRO instrument to detect change over time in the construct to be measured
	“If patients improve or worsen over time does this change in the PROM accordingly?”
Interpretability[*]	The degree to which one can assign qualitative meaning—that is, clinical or commonly understood connotations—to an instrument’s quantitative scores or change in scores
	“What do the scores or change in scores of a PROM mean?”

Clarifications are the quoted questions below each definition (set in bold in the original)

* Interpretability is not a real measurement property, but nevertheless it is a meaningful requirement for the applicability of PROMs in research

PROM, patient-reported outcome measure
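
Internal consistency, defined in Table I as the interrelatedness among the items, is commonly summarised with Cronbach's alpha, the statistic reported in Table VI. The following is a minimal Python sketch of that calculation, not the procedure used by any of the included studies. The function name `cronbach_alpha` and the toy item scores are hypothetical and serve only as illustration.

```python
from statistics import variance
from typing import List


def cronbach_alpha(item_scores: List[List[float]]) -> float:
    """Cronbach's alpha for a questionnaire.

    item_scores[i][j] is the score of respondent j on item i.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("at least two items are needed")
    # Total score per respondent: sum across items.
    totals = [sum(per_respondent) for per_respondent in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical toy data: two items answered by three respondents.
toy_items = [
    [1, 2, 3],  # item 1
    [1, 3, 2],  # item 2
]
alpha = cronbach_alpha(toy_items)
print(round(alpha, 3))
```

Items that rank respondents identically push alpha towards 1, while items that disagree pull it down, which matches the clarifying question in Table I about whether items meant to measure the same construct produce similar scores.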

The aim of this systematic review was to evaluate the methodological quality (using the COSMIN checklist) of the clinimetric studies that evaluated measurement properties of the available PROMs used in patients with distal radial fractures, and to make recommendations for the selection of PROMs based on the level of evidence of each individual measurement property. The results of this study might help us to determine which PROM is most appropriate for the evaluation of patients with distal radial fractures.

Materials and Methods

Literature search

We performed a literature search on November 13, 2015 to identify all published studies on the measurement properties of PROMs in the evaluation of treatment of distal radial fractures. The following databases were searched with specific index terms and derivatives of these terms: PubMed (1990 to 2015), EMbase (1990 to 2015), CINAHL (1990 to 2015), and PsycINFO (1990 to 2015). In PubMed we used a validated search filter for finding studies on measurement properties.[28] We also added the names of all PROMs that are described for wrist disorders.[29] The full search strategy is provided in the supplementary material. We restricted our search to studies published in English, German and Dutch because both reviewers are fluent in these languages. Reference lists were hand-searched to identify additional relevant studies.

Selection criteria

Two reviewers (YK and RN) independently assessed all titles and abstracts. We included studies with a description of the measurement properties of PROMs used in patients with a distal radial fracture. When there was doubt about the applicability of a study, the full text article was retrieved and screened for eligibility. Afterwards, the reviewers discussed their assessments to reach consensus. Where consensus could not be reached, a third reviewer (VS) was consulted.

Assessment of the quality of the studies

The same two reviewers independently rated the methodological quality of the studies using the COSMIN checklist (www.cosmin.nl).[30] The COSMIN checklist consists of 11 separate checklists, called “boxes”. In nine boxes the quality of nine measurement properties is addressed: a) internal consistency, b) reliability, c) measurement error, d) content validity, e) structural validity, f) hypotheses testing, g) cross-cultural validity, h) criterion validity and i) responsiveness. The last box, “j: interpretability”, is not a measurement property, but nevertheless it is a meaningful requirement for the applicability of PROMs in research. The generalisability of the results is determined with a final box. The definitions of the measurement properties and interpretability are given in Table I. In each box, the methodological quality can be evaluated based on a variety of items addressing adequate study design and statistical analysis. Each question in any box must be rated as ‘excellent’, ‘good’, ‘fair’, ‘poor’ or ‘not applicable’. Scoring is then performed using the criteria set by the COSMIN group. To obtain a total score for the methodological quality of one of the boxes, “the worst score counts” algorithm was applied as set out by the COSMIN guidelines,[31] meaning that the methodological quality of that measurement property was only rated ‘excellent’ if all relevant questions pertaining to that box (e.g. measurement property) were scored as excellent. In all boxes, a small sample size was considered poor methodological quality. As a rule of thumb, a sample size of ⩾ 100 received a rating of ‘excellent’, 50 to 100 received ‘good’, 30 to 50 was rated ‘fair’, and less than 30 was rated as ‘poor’.[31]
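
The scoring rules described above can be sketched in a few lines of Python. This is an illustrative reading of the text, not the COSMIN group's own tooling: the function names are hypothetical, and because the stated ranges overlap at exactly 50, the sketch assumes that a boundary value receives the higher rating.

```python
from typing import List, Optional

# The four ratings named in the text, ordered from worst to best.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def sample_size_rating(n: int) -> str:
    """Rule of thumb from the text: >= 100 excellent, 50 to 100 good,
    30 to 50 fair, less than 30 poor.

    Assumption: a sample size of exactly 50 is rated 'good'.
    """
    if n >= 100:
        return "excellent"
    if n >= 50:
        return "good"
    if n >= 30:
        return "fair"
    return "poor"


def box_score(item_ratings: List[str]) -> Optional[str]:
    """'Worst score counts': a box is rated as its lowest applicable item.

    Items rated 'not applicable' are ignored; returns None if no item applies.
    """
    applicable = [r for r in item_ratings if r != "not applicable"]
    if not applicable:
        return None
    return min(applicable, key=RATING_ORDER.index)


# Hypothetical item ratings for one box of one study with 44 participants.
items = ["excellent", "good", "not applicable", sample_size_rating(44)]
print(box_score(items))  # fair
```

Because the lowest item decides the box score, a single weak item such as a small sample is enough to cap the rating of an otherwise well-conducted study.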

Level of evidence of the measurement properties per PROM

For each PROM, we determined the level of evidence by combining the results of the different studies for each measurement property, as described by Terwee et al.[31] The following factors were taken into account: the number of studies (one or multiple), the methodological quality of the studies (excellent/good/fair/poor/not available), and consistency of the results (positive/negative). Based on these factors each measurement property per PROM could be ranked as strong, moderate, limited or conflicting evidence. Only when the methodological quality of the clinimetric study/studies was poor was the level of evidence rated as ‘unknown’.
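
As a rough illustration, the synthesis can be written as a small decision function that follows the thresholds given in the legend of Table VI. The function name and the input format are hypothetical. The sketch also assumes that studies of poor quality are set aside whenever better studies exist, and that conflict is judged among the remaining studies only. The thresholds themselves are taken from the legend.

```python
from typing import List, Tuple


def level_of_evidence(studies: List[Tuple[str, str]]) -> str:
    """Combine studies of one measurement property of one PROM.

    Each study is (quality, direction), with quality in
    {'excellent', 'good', 'fair', 'poor'} and direction in
    {'positive', 'negative'}.
    """
    if not studies:
        return "no evidence"
    # Assumption: poor-quality studies do not count towards the level.
    usable = [(q, d) for q, d in studies if q != "poor"]
    if not usable:
        return "unknown"  # only studies of poor quality
    if len({d for _, d in usable}) > 1:
        return "conflicting"
    qualities = [q for q, _ in usable]
    n_excellent = qualities.count("excellent")
    n_good = qualities.count("good")
    n_fair = qualities.count("fair")
    if n_excellent >= 1 or n_good >= 2:
        return "strong"
    if n_good == 1 or n_fair >= 2:
        return "moderate"
    return "limited"  # a single study of fair quality


# Three poor and four fair studies, all with positive results.
example = [("poor", "positive")] * 3 + [("fair", "positive")] * 4
print(level_of_evidence(example))  # moderate
```

The example mirrors the mix of study qualities reported for the reliability of the PRWE, for which the review concludes that the evidence is moderate.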

Results

Included studies

A total of 2064 studies were retrieved by the electronic search performed in PubMed (n = 720), EMbase (n = 1075) and CINAHL/ PsycINFO (n = 269) (Fig. 1). After removing duplicates, 1508 unique studies were identified. The titles and abstracts were independently screened by two researchers, after which 27 studies were deemed potentially eligible. After retrieving and reading the full text, 19 studies were included. Reference evaluation of these 19 articles did not yield any additional relevant studies.

Fig. 1

Search strategy and selection of articles. *Nov 13, 2015. **Cinahl search includes PsycInfo database. HR-PRO, health-related patient-reported outcomes; DRF, distal radial fracture.
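
The counts in the selection flow can be cross-checked with simple arithmetic. The short Python sketch below uses only the numbers reported in this section; the derived quantities (duplicates removed and records not carried forward at each stage) are not stated in the text and follow from subtraction alone. Variable names are illustrative.

```python
# Hits per database as reported in the text.
hits = {"PubMed": 720, "EMbase": 1075, "CINAHL/PsycINFO": 269}

total_retrieved = sum(hits.values())
unique_studies = 1508
potentially_eligible = 27
included = 19

# Derived values, not reported in the text.
duplicates_removed = total_retrieved - unique_studies
not_selected_on_title_abstract = unique_studies - potentially_eligible
excluded_after_full_text = potentially_eligible - included
inclusion_percentage = round(100 * included / unique_studies, 1)

print(total_retrieved)                 # 2064
print(duplicates_removed)              # 556
print(not_selected_on_title_abstract)  # 1481
print(excluded_after_full_text)        # 8
print(inclusion_percentage)            # 1.3
```

The database hits sum to the reported total of 2064, and roughly 1.3% of the unique records were eventually included.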

Overall results

In the 19 included studies, a total of 12 PROMs were evaluated (Table II). In three papers, multiple PROMs were evaluated: three,[32] three[33] and five,[34] respectively. Most studies (80%) evaluated more than one measurement property. None of the studies evaluated structural validity. Criterion validity was also not evaluated in any of the studies. However, this was expected given that there are no measurement instruments that can be used as a benchmark, which is a prerequisite of this measurement property. A complete overview of the study characteristics is shown in Table III.

Table II.

Patient-reported outcome instruments included in the review[8-14,16-20]

Abbreviation	Full name	Original author
PRWE	Patient-Rated Wrist Evaluation	MacDermid[9]
DASH	Disabilities of Arm, Shoulder and Hand	Hudak[8]
MHQ	Michigan Hand Questionnaire	Chung[11]
SF-36	Short Form-36	Ware[12]
PEM	Patient Evaluation Measure	Macey[10]
AIMS2	Arthritis Impact Measurement Scale	Meenan[14]
BWH-CTQ	Brigham and Women’s Hospital Carpal Tunnel Questionnaire	Levine[13]
IOF-WFQ	International Osteoporosis Foundation Wrist Fracture Questionnaire	Lips[16]
PFW	Patient Focused Wrist Outcome Instrument	Bialocerkowski[17]
TSK	Tampa Scale of Kinesiophobia	Kori[18]
CAT	Catastrophizing Subscale of the Coping Strategies Questionnaire	Rosenstiel[19]
SES	Self-Efficacy Scale	Altmaier[20]

Table III.

Study characteristics[32-49]

Measurement instrument	Study	n	Mean age (range or SD)	Male (%)	Country	Language
Patient-Rated Wrist Evaluation	Gabl[37]	133	62 (19 to 92)	27	Austria	German[*]
	Hemelaers[38]	44	56 (15)	36	Switzerland	German
	MacDermid[39]	36/101	45 (10) / 50 (16)	33 / 31	Canada	English[*]
	MacDermid[32]	59	53 (18)	37	Canada	English[*]
	Wilcke[35]	99	58 (18)	20	Sweden	Swedish
	Lovgren[34]	16	52 (12)	19	Sweden	Swedish
	Mehta[40]	50	46 (14)	56	India	Hindi
	Kim[41]	63	56 (19 to 83)	27	Rep. Korea	Korean
	Schonnemann[42]	60/29	55 (19 to 86)	27	Denmark	Danish
	Walenkamp[43]	102	59 (48 to 66)	30	Netherlands	Dutch
Disabilities of Arm, Shoulder and Hand	MacDermid[32]	59	53 (18)	37	Canada	English[*]
	Westphal[36]	107	59 (17 to 84)	27	Germany	German
	Westphal[44]	72	60 (16)	29	Germany	German
	Lovgren[34]	16	52 (12)	19	Sweden	Swedish
Michigan Hand Questionnaire	Kotsis[45]	47 / 37	48 (17) / 51 (16)	32 / 38	USA	English
	Shauver[46]	51	50 (19 to 83)	37	USA	English
	Waljee[47]	128	61 (9)	27	USA/UK	English[*]
Short Form-36	Amadio[33]	21	57 (14 to 84)	14	USA	English[*]
	MacDermid[32]	59	53 (18)	37	Canada	English[*]
Patient Evaluation Measure	Forward[48]	200	54 (24 to 80)	36	UK	English[*]
Arthritis Impact Measurement Scale 2	Amadio[33]	21	57 (14 to 84)	14	USA	English[*]
Brigham and Women’s Hospital Carpal Tunnel Questionnaire	Amadio[33]	21	57 (14 to 84)	14	USA	English[*]
International Osteoporosis Foundation Wrist Fracture Questionnaire	Lips[16]	105	63 (8)	12	UK/NL/Ita/BE	English/Dutch/Italian[*]
Patient Focused Wrist Outcome Instrument	Bialocerkowski[49]	26	62 (22 to 84)	15	Australia	English
Tampa Scale of Kinesiophobia	Lovgren[34]	16	52 (12)	19	Sweden	Swedish
Catastrophizing Subscale of the Coping Strategies Questionnaire	Lovgren[34]	16	52 (12)	19	Sweden	Swedish
Self-Efficacy Scale	Lovgren[34]	16	52 (12)	19	Sweden	Swedish

* Deduced as per the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) guidelines: the country in which the study was performed and the language version of the measurement instrument that was used are often not mentioned explicitly, but can be deduced from the affiliation of the authors

Of all PROMs, the PRWE has been studied most extensively, followed by the DASH. The eight studies evaluating the PRWE assessed almost all measurement properties: seven of the nine (Table IV). However, overall, the methodological quality of these studies was low, varying from poor to fair for internal consistency, reliability, measurement error, cross-cultural validity and responsiveness, and varying from poor to good for content validity and hypothesis testing. Interpretability was also assessed, but these studies were of poor methodological quality.

Table IV.

Summary of methodological quality of the studies on measurement properties of the PRWE and DASH[32-44]

	PRWE[37]	PRWE[38]	PRWE[39]	PRWE[32]	PRWE[35]	PRWE[34]	PRWE[40]	PRWE[42]	PRWE[41]	PRWE[43]	DASH[32]	DASH[36]	DASH[44]	DASH[34]
Generalisability	Fair	Fair	Fair	Poor	Fair	Excel	Poor	Fair	Fair	Fair	Poor	Fair	Good	Excel
Internal Consistency	Poor	Poor			Fair	Poor	Poor	Poor	Poor	Poor		Poor	Poor	Poor
Reliability		Fair	Poor		Fair	Poor	Fair	Poor	Fair				Fair	Poor
Measurement Error									Fair	Poor
Content validity			Fair					Good
Structural validity
Hypotheses testing		Fair			Good		Fair	Fair	Poor			Fair
Cross-cultural					Fair		Poor	Poor	Poor
Criterion validity
Responsiveness			Fair	Fair	Fair		Fair	Fair	Poor		Fair		Fair
Interpretability					Poor				Poor		Poor	Poor

A full overview of all the scores is shown in the supplementary material

The four studies evaluating the DASH[32,34-36] assessed fewer than half of the measurement properties: four of nine. The methodological quality of these studies was generally low: consistently poor for internal consistency, poor to fair for reliability, and consistently fair for responsiveness. Measurement error, content validity, hypothesis testing, cross-cultural validity and interpretability were not assessed in any of the studies (Table IV). Of the other ten PROMs, one to three measurement properties were assessed. These concerned mostly internal consistency, reliability and responsiveness. Overall, the methodological quality of these clinimetric studies was at best poor to fair (Table V). This is mainly due to the small sample size in the majority of these studies, but can also be attributed to the large number of items that were scored as “not applicable”. Finally, the lack of description of the statistical methods that were used also contributed to the poor rating.

Table V.

	MHQ[45]	MHQ[46]	MHQ [46]	SF-36[32]	SF-36[33]	PEM[48]	IOF[16]	PFW[49]	AIMS2[33]	BWH[33]	TSK[34]	CAT[34]	SES[34]
Generalisability	Fair	Fair	Poor	Poor	Fair	Poor	Fair	Fair	Fair	Fair	Excellent	Excellent	Excellent
Internal Consistency						Poor	Poor				Poor	Poor	Poor
Reliability							Poor				Poor	Poor	Poor
Measurement Error
Content validity
Structural validity
Hypotheses testing						Poor		Poor
Cross-cultural
Criterion validity
Responsiveness	Fair	Fair	Fair	Fair	Poor		Fair	Poor	Poor	Poor
Interpretability		Fair

MHQ, Michigan Hand Questionnaire; SF-36, Short Form-36; PEM, Patient Evaluation Measure; AIMS2, Arthritis Impact Measurement Scale; BWH-CTQ, Brigham and Women’s Hospital Carpal Tunnel Questionnaire; IOF-WFQ, International Osteoporosis Foundation Wrist Fracture Questionnaire; PFW, Patient Focused Wrist Outcome Instrument; TSK, Tampa Scale of Kinesiophobia; CAT, Catastrophizing Subscale of the Coping Strategies Questionnaire; SES, Self-Efficacy Scale

Table V: Summary of methodological quality of the studies on measurement properties of the other measurement instruments.[16,32-34,45,46,48,49] A full overview of all the scores is shown in the supplementary material.

The synthesis of results per PROM and their accompanying level of evidence are presented in Table VI.

Table VI.

Ratings of measurement properties and interpretability of measurement instruments with level of evidence[32-49]

	PRWE[32,34,35,37-43]	DASH[32,34,36,44]	MHQ[45-47]	SF-36[32,33]	PEM[48]	AIMS2[33]	BWH[33]	IOF[33]	PFW[49]	TSK[34]	CAT[34]	SES[34]
Reliability
Internal consistency	+	?			?			?		?	?	?
Cronbach’s alpha	0.89 to 0.97	0.93 to 0.98			0.94			0.96		0.68 to 0.82	0.88 to 0.97	0.79 to 0.95
Reliability	++	+						?		?	?	?
Intraclass correlation coefficient	0.81 to 0.97	0.78 to 0.95						NA		0.81 to 0.84	0.85 to 0.89	0.57 to 0.86
Measurement error	+
Smallest detectable change	4.4 to 11.0
Validity
Content validity	++
Structural validity
Hypotheses testing	++	+			?				+
Comparator instrument	DASH	Gartland			NA				NA
Cross-cultural	+
Criterion validity
Responsiveness
Responsiveness	++	++	++	+		?	?	+	?
Standardised response mean	NA	NA	NA	NA		NA	NA	NA	NA
INTERPRETABILITY
Interpretability	?		-
Minimal important change	11.5

+ + + or − − − multiple studies of good quality OR 1 study of excellent quality: strong evidence positive/negative result

+ + or − − multiple studies of fair quality OR 1 study of good quality: moderate evidence positive/negative result

+ or − 1 study of fair quality: limited evidence positive/negative result

+ / − conflicting findings

? only studies of poor quality: unknown, due to poor methodological quality

NA, not available (not performed or described)

PRWE, Patient-Rated Wrist Evaluation; DASH, Disabilities of Arm, Shoulder and Hand; MHQ, Michigan Hand Questionnaire; SF-36, Short Form-36; PEM, Patient Evaluation Measure; AIMS2, Arthritis Impact Measurement Scale; BWH-CTQ, Brigham and Women’s Hospital Carpal Tunnel Questionnaire; IOF-WFQ, International Osteoporosis Foundation Wrist Fracture Questionnaire; PFW, Patient Focused Wrist Outcome Instrument; TSK, Tampa Scale of Kinesiophobia; CAT, Catastrophizing Subscale of the Coping Strategies Questionnaire; SES, Self-Efficacy Scale

The highest levels of evidence were found for the measurement properties of the PRWE. Nevertheless, the evidence is, at best, limited to moderate. For instance, the intraclass correlation coefficient for reliability (assessed in 78% of the studies) ranged from 0.81 to 0.97 (Table VI). Three studies were of poor methodological quality, and four were of fair quality (Table IV). Therefore, the synthesis of these results is that there is moderate evidence supporting good reliability. There is also moderate evidence that the validity (content and hypothesis testing) and responsiveness are good. The evidence is limited that its internal consistency and cross-cultural validity are good, and that its measurement error is acceptable. There is no evidence for its structural and criterion validity.
The evidence for the DASH is moderate that its responsiveness is good. The evidence is limited that its reliability and its validity on hypotheses testing are good. There is no evidence for the other measurement properties. The evidence for the other ten PROMs is mainly unknown, since the studies that evaluated some of their measurement properties (mainly internal consistency, reliability and/or responsiveness) were mostly of poor methodological quality.

Discussion

The aim of this systematic review was to evaluate the methodological quality of the clinimetric studies that evaluated measurement properties of the available PROMs used in patients with distal radial fractures, and to make recommendations for the selection of PROMs based on the level of evidence of each individual measurement property.

Key findings

The two PROMs that were most extensively evaluated were the PRWE (with seven of nine measurement properties investigated) and the DASH (with four of nine investigated). The methodological quality of these studies ranged from poor to good at best. Therefore, after synthesis of the scores and incorporating the levels of evidence, the quality of these two PROMs is not supported with strong levels of evidence on any of the measurement properties. For the PRWE, there is moderate evidence supporting good reliability, content validity, hypotheses testing and responsiveness. There is only limited evidence that the measurement error is acceptable and that the cross-cultural validity and internal consistency are good. Structural validity and criterion validity were never evaluated, so evidence for these is lacking. The evidence for interpretability, which is not a measurement property, is unknown, since this was only evaluated in three studies with poor methodological quality. The DASH showed at best moderate evidence for good responsiveness and limited evidence for good hypotheses testing and reliability. All other measurement properties were found to be lacking in evidence. These findings do not mean that these and other PROMs have poor measurement properties and thus are of poor quality. However, since we found that, overall, the measurement properties were good but the methodological quality of the clinimetric studies was low, the reported results may be biased. Therefore, the results of our review imply that studies of higher methodological quality are needed to properly assess their measurement properties. For instance, many PROMs are translated into multiple languages. The PRWE has been correctly translated into 14 languages, following the translation process described by Beaton et al.[50] Nevertheless, we only found cross-cultural validity studies for the Swedish, Hindi, Korean and Danish versions, because the other translated versions were not adequately evaluated.
However, our search was limited to English, German and Dutch, so it is possible that the cross-cultural validity of other versions was evaluated but the results were not published in any of these languages.

Comparison of results with previous literature

Previous reviews described a variety of PROMs measuring wrist and/or hand disorders in general, but not PROMs specific to distal radial fractures. Goldhahn et al[25] advise using a combination of a disease-specific PROM (PRWE), an extremity-specific PROM (DASH) and a generic PROM (SF-36). Changulani et al[22] compared the measurement properties of four PROMs for wrist and hand disorders. They concluded that the PRWE is the most responsive instrument for evaluating outcomes in patients with a distal radial fracture. These conclusions were drawn before the COSMIN checklist was available. The methodological quality of the clinimetric studies was not taken into account and therefore these results may be biased, especially since in the current review we found that the methodological quality of these studies was, at best, fair. Therefore, we can only conclude that the good responsiveness of the DASH and PRWE is supported by moderate evidence. Hoang-Kim et al[21] assessed the quality of reviews published on currently used PROMs for assessing function of the hand and wrist joints. Although they used COSMIN’s taxonomy, terminology and definitions to define the different measurement properties, they did not systematically review the methodological quality of the underlying studies. Nevertheless, they concluded that the PRWE has good construct validity and responsiveness, and found it to be only slightly better than the DASH for assessing patients with wrist injuries. Based on the results of our review, we agree that the PRWE is slightly better investigated than the DASH, but disagree with their rating of “good” on some measurement properties. This difference may arise because we incorporated the methodological quality of these studies by using the COSMIN checklist instead of only using the COSMIN taxonomy.

Study strengths

To our knowledge, this is the first study that has used the COSMIN checklist to systematically review the methodological quality of studies on the measurement properties of PROMs in the evaluation of treatment of distal radial fractures. Furthermore, the quality of each study was assessed by two independent reviewers, as recommended by the COSMIN group, with a third reviewer consulted in cases of disagreement. Using these methods, we were able to minimise subjective judgement on the outcome. We searched for relevant articles from 1990 onwards, and since most PROMs were developed after 1990, we consider it unlikely that any relevant PROMs were missed. That only 19 of the 1508 identified studies were eligible shows that our search strategy was very broad and inclusive. Yet it also demonstrates that the literature on this topic is somewhat lacking. Our search was not limited to the English language, as both reviewers have a good knowledge of German and Dutch.

Study weaknesses

There were some limitations to this review. As in all reviews, publication bias may threaten the internal validity, as unpublished studies are more likely to report negative or unfavourable results.[51] Another limitation was that it was not always clear to the reviewers whether specific methodological aspects were not reported or not performed, making it impossible to distinguish between poor study reporting and poor methodological quality. We did not contact the authors of the studies to clarify these issues. It can be assumed that some studies were executed properly but were not described in sufficient detail to meet the COSMIN criteria. This may have affected the quality ratings. The shortcomings of outcome measurement research in distal radial fractures exposed by this review should not be generalised to all clinimetric research in orthopaedic surgery. However, it is known that strong evidence supporting good quality of multiple PROMs for various pathologies is lacking,[52-54] so we advise the reader to be cautious when choosing a PROM based on the results of clinimetric studies without considering their methodological quality.

For future research, we believe that it is especially important to further evaluate the measurement properties and interpretability of the PRWE and DASH outcome measures in higher quality studies. Based on the results of the available clinimetric studies, there is no evidence that these PROMs are not useful in evaluating the treatment of distal radial fractures, and therefore we do not believe that it is necessary to develop new instruments. Currently, based on the best available evidence, we recommend using the PRWE or DASH to evaluate the outcome of treatment in patients with distal radial fractures, but we cannot stress strongly enough that more clinimetric studies of higher methodological quality are needed in order to select appropriate PROMs with more confidence.

According to this systematic review, strong evidence supporting ‘good quality’ of any of the currently available PROMs in patients with distal radial fractures is lacking. The evidence that the responsiveness of the PRWE and DASH is good is moderate, as is the evidence for good validity and reliability of the PRWE. We therefore recommend these PROMs in clinical studies in patients with distal radial fractures; however, more clinimetric studies of higher methodological quality are needed to adequately determine their other measurement properties. If the methodological quality of clinimetric studies continues to increase, PROMs can be selected with greater confidence.

51 in total

1. [Reliability and responsiveness of the German version of the Disabilities of the Arm, Shoulder and Hand questionnaire (DASH)].

Authors: T Westphal
Journal: Unfallchirurg Date: 2007-06 Impact factor: 1.000

Review 2. Measurement properties of performance-based outcome measures to assess physical function in young and middle-aged people known to be at high risk of hip and/or knee osteoarthritis: a systematic review.

Authors: S L Kroman; E M Roos; K L Bennell; R S Hinman; F Dobson
Journal: Osteoarthritis Cartilage Date: 2013-11-09 Impact factor: 6.576

Review 3. The measurement properties of the IKDC-subjective knee form.

Authors: Hanna Tigerstrand Grevnerts; Caroline B Terwee; Joanna Kvist
Journal: Knee Surg Sports Traumatol Arthrosc Date: 2014-09-06 Impact factor: 4.342

Review 4. External fixation versus internal fixation for unstable distal radius fractures: a systematic review and meta-analysis of comparative clinical trials.

Authors: David H Wei; Rudolf W Poolman; Mohit Bhandari; Valerie M Wolfe; Melvin P Rosenwasser
Journal: J Orthop Trauma Date: 2012-07 Impact factor: 2.512

5. Reliability and validity of measurement and associations between disability and behavioural factors in patients with Colles' fracture.

Authors: Anneli Lövgren; Karin Hellström
Journal: Physiother Theory Pract Date: 2011-08-08 Impact factor: 2.279

6. [Acceptance of patient-related evaluation of wrist function following distal radius fracture (DRF)].

Authors: M Gabl; D Krappinger; R Arora; R Zimmermann; P Angermann; S Pechlaner
Journal: Handchir Mikrochir Plast Chir Date: 2007-02 Impact factor: 1.018

7. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study.

Authors: Lidwine B Mokkink; Caroline B Terwee; Donald L Patrick; Jordi Alonso; Paul W Stratford; Dirk L Knol; Lex M Bouter; Henrica C W de Vet
Journal: Qual Life Res Date: 2010-02-19 Impact factor: 4.147

8. Patient rating of wrist pain and disability: a reliable and valid measurement tool.

Authors: J C MacDermid; T Turgeon; R S Richards; M Beadle; J H Roth
Journal: J Orthop Trauma Date: 1998 Nov-Dec Impact factor: 2.512

Review 9. What counts: outcome assessment after distal radius fractures in aged patients.

Authors: Jörg Goldhahn; Felix Angst; Beat R Simmen
Journal: J Orthop Trauma Date: 2008-09 Impact factor: 2.512

10. Evaluation of a Swedish version of the patient-rated wrist evaluation outcome questionnaire: good responsiveness, validity, and reliability, in 99 patients recovering from a fracture of the distal radius.

Authors: Maria T Wilcke; Hassan Abbaszadegan; Per Y Adolphson
Journal: Scand J Plast Reconstr Surg Hand Surg Date: 2009

12 in total

1. Is Use of a Psychological Workbook Associated With Improved Disabilities of the Arm, Shoulder and Hand Scores in Patients With Distal Radius Fracture?

Authors: Stuart Goudie; Diane Dixon; Gail McMillan; David Ring; Margaret McQueen
Journal: Clin Orthop Relat Res Date: 2018-04 Impact factor: 4.176

2. Patient-Rated Wrist Evaluation: Spanish Version and Evaluation of Its Psychometric Properties in Patients with Acute Distal Radius Fracture.

Authors: Veronica Alfie; Gerardo Gallucci; Jorge Boretto; Agustin Donndorff; Juieta Puig Dubois; Sonia Benitez; Diego Giunta; Pablo de Carli
Journal: J Wrist Surg Date: 2017-03-08

Review 3. Measurement properties of the most commonly used Foot- and Ankle-Specific Questionnaires: the FFI, FAOS and FAAM. A systematic review.

Authors: I N Sierevelt; R Zwiers; W Schats; D Haverkamp; C B Terwee; P A Nolte; G M M J Kerkhoffs
Journal: Knee Surg Sports Traumatol Arthrosc Date: 2017-10-12 Impact factor: 4.342

Review 4. A Systematic Review of Self-Reported Outcome Measures Assessing Disability Following Hand and Upper Extremity Conditions in Persian Population.

Authors: Erfan Shafiee; Maryam Farzad; Mahdieh Karbalaei
Journal: Arch Bone Jt Surg Date: 2021-03

5. A Systematic Review of Outcome Measures Assessing Disability Following Upper Extremity Trauma.

Authors: Prakash Jayakumar; Mark Williams; David Ring; Sarah Lamb; Stephen Gwilym
Journal: J Am Acad Orthop Surg Glob Res Rev Date: 2017-07-27

6. A Combined Randomised and Observational Study of Surgery for Fractures In the distal Radius in the Elderly (CROSSFIRE): a statistical analyses plan.

Authors: Andrew Lawson; Justine Naylor; Rachelle Buchbinder; Rebecca Ivers; Zsolt Balogh; Paul Smith; Rajat Mittal; Wei Xuan; Kirsten Howard; Arezoo Vafa; Piers Yates; Bertram Rieger; Geoff Smith; Ilia Elkinson; Woosung Kim; Jai Sungaran; Kim Latendresse; James Wong; Sameer Viswanathan; Keith Landale; Herwig Drobetz; Phong Tran; Richard Page; Raphael Hau; Jonathan Mulford; Ian Incoll; Michael Kale; Bernard Schick; Andrew Higgs; Andrew Oppy; Diana Perriman; Ian Harris
Journal: Trials Date: 2020-07-15 Impact factor: 2.279

7. Are the patient-rated wrist evaluation (PRWE) and the disabilities of the arm, shoulder and hand (DASH) questionnaire used in distal radial fractures truly valid and reliable?

Authors: Y V Kleinlugtenbelt; R G Krol; M Bhandari; J C Goslings; R W Poolman; V A B Scholtes
Journal: Bone Joint Res Date: 2018-01 Impact factor: 5.853