Comparing outcomes: The Clinical Outcome in Routine Evaluation from an international point of view.

Marina Zeldovich, Rainer W Alexandrowicz

Abstract

OBJECTIVES: The Clinical Outcome in Routine Evaluation-Outcome Measure (CORE-OM) is a freely accessible self-assessment questionnaire with a total of 34 items measuring the progress of psychological or psychotherapeutic treatments according to four scales (well-being, problems, functioning, and risk). The CORE-OM originated in the United Kingdom and has been translated into 52 languages and dialects. The aim of this study is to systematically compare the translated versions.
METHOD: A total of 21 translations were compared using methods of systematic review and meta-analysis.
RESULTS: The results show a certain heterogeneity between the studies; however, the 21 translations can be declared as equivalent.
CONCLUSION: The factorial structure could not be replicated in any of the translations. Therefore, further analysis of the CORE-OM domains is recommended. In addition, some supplementary restrictions on the translation process, data collection, and reporting of results are necessary to ensure comparability and quality of CORE-OM translations.
© 2019 The Authors International Journal of Methods in Psychiatric Research Published by John Wiley & Sons Ltd.

Keywords:  CORE-OM; meta-analysis; outcome measurement; translation of questionnaires

Year:  2019        PMID: 30779267      PMCID: PMC6849827          DOI: 10.1002/mpr.1774

Source DB:  PubMed          Journal:  Int J Methods Psychiatr Res        ISSN: 1049-8931            Impact factor:   4.035


INTRODUCTION

Outcome measurement (OM) is the assessment of the “effect on a patient's health status that is attributable to an intervention by a health professional or health service” (Andrews, Peters, & Teesson, 1994, p. 3). An outcome is the end result of a provided health service that affects the health status and functioning of the patient treated, thus reflecting what happens to the patient “in terms of palliation, control of illness, cure, or rehabilitation” (Brook, Williams, & Avery, 1976, p. 809). OM focuses on change and allows evaluation of both the services provided and the patients' progress during treatment. In the late 1970s, a discussion regarding the necessity of OM in mental health services arose in the United States and spread internationally in the 1980s (Brook et al., 1976; Erickson, 1975; Lohr, 1988; Slade, Thornicroft, & Glover, 1999). Sutherland and Till (1993) identified three levels at which OM might prove useful: micro (e.g., individual or clinical), meso (e.g., within an institution), and macro (e.g., governmental).

OM instruments

OM instruments assess outcomes using either an external perspective (e.g., clinicians or relatives) or the patient's subjective perception, or both (Thornicroft & Slade, 2014). They cover various domains, such as well‐being, global functioning, quality of life, physical or mental condition, satisfaction with the treatment, provided services, or their costs (Slade, 2002; Thornicroft & Tansella, 2010). Outcome measures can be symptom independent or target a specific group of mental disorders; they can focus on recovery or individual goals in the course of treatment (Thornicroft & Slade, 2014). Several instruments have been developed to measure outcomes. For example, the Health of the Nation Outcome Scales (Wing et al., 1998) focus on the outcome of mental health treatments using an external perspective (clinicians). Because the external perspective is not always reliable and is possibly biased, some authors recommend also integrating the patient's perspective (Slade, 1996; Slade, Leese, Taylor, & Thornicroft, 1999). The Inventory of Interpersonal Problems (Horowitz, Rosenberg, Baer, Ureno, & Villasenor, 1988) focuses on interpersonal problems causing psychological distress. Symptom‐dependent measures such as the Beck Depression Inventory (BDI‐II; Beck, Steer, & Brown, 1996) can be used to assess symptom severity over the course of therapy. Moreover, there are questionnaires measuring important constructs for outcome assessment, such as well‐being (e.g., Quality of Well‐Being Scale; Kaplan & Bush, 1982), global functioning (e.g., Global Assessment Scale; Endicott, Spitzer, Fleiss, & Cohen, 1976, or Work and Social Adjustment Scale; Mundt, Marks, Shear, & Greist, 2002), and quality of life (e.g., Quality of Life Scale; Flanagan, 1978).

The CORE‐OM

The present study focuses on the Clinical Outcome in Routine Evaluation–Outcome Measure (CORE‐OM; Barkham et al., 1998). It concentrates on the areas mentioned above plus the patient's resources. If the CORE‐OM works the way it was designed, we would have at our disposal an instrument gathering a wide spectrum of information useful for assessing treatment progress from the patient's perspective. Its development relied on a survey of mental health services in the United Kingdom, which revealed a lack of systematic information on patients' health status in pretreatment and posttreatment phases (Mellor‐Clark, Barkham, Connell, & Evans, 1999). The CORE‐OM has been integrated into the British National Health Service (Slade, 2010).

Structure

The CORE‐OM contains 34 items split into the four scales well‐being (four items), problems/symptoms (12 items), functioning (12 items), and risk (six items) using a five‐category response format (0 = not at all, 1 = only occasionally, 2 = sometimes, 3 = often, and 4 = most or all the time); eight items are inversely worded. According to the manual, the CORE‐OM is not restricted to specific diagnostic groups. The questionnaire is copyleft (i.e., it can be used free of charge), which fosters its broad application.
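
To illustrate how the 0 to 4 response format interacts with inversely worded items, the following Python sketch recodes such items as 4 − x before a scale score is formed. The six‐item toy questionnaire, the item positions, and the use of a mean item score are assumptions made purely for illustration; the actual item keys and scoring rules are those of the CORE‐OM manual and are not reproduced here.

```python
MAX_CATEGORY = 4  # responses run from 0 (not at all) to 4 (most or all the time)


def reverse_key(responses, inverse_positions):
    """Return a copy of the responses with inversely worded items recoded as 4 - x.

    `inverse_positions` holds 1-based item positions (hypothetical in this sketch).
    """
    return [
        MAX_CATEGORY - value if position in inverse_positions else value
        for position, value in enumerate(responses, start=1)
    ]


def scale_score(responses, positions):
    """Mean response over the given 1-based item positions (assumed scoring rule)."""
    return sum(responses[p - 1] for p in positions) / len(positions)


# Hypothetical toy questionnaire: six items, items 2 and 5 inversely worded.
raw = [0, 4, 1, 3, 4, 2]
recoded = reverse_key(raw, inverse_positions={2, 5})
toy_scale = scale_score(recoded, positions=[1, 2, 3])
print(recoded, toy_scale)
```

Passing the item positions as arguments keeps the sketch independent of any particular item key.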

Evaluation

Evans et al. (2002) evaluated the psychometric properties of the CORE‐OM using a clinical sample collected from 23 sites within the National Health Service, three university student counselling services, and a staff support service in the United Kingdom (n = 890; 60% female) and a nonclinical sample (n = 1,106; 54% female) of university students and staff as well as “a sample of convenience” (Evans et al., 2002, p. 53; Lohr, 1999) representing the “general population.” Table 1 gives an overview of the results of the psychometric analyses.
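Table 1. An overview of psychometric properties (reliability and validity) of the English CORE‐OM in accordance to Evans et al. (2002)

| Property | Method/instrument | Sample type | N | Well‐being (4 items) | Problems (12 items) | Functioning (12 items) | Risk (6 items) | Nonrisk items (28 items) | Total score (34 items) |
|---|---|---|---|---|---|---|---|---|---|
| Reliability | Cronbach's α | Clinical | 1,106 | 0.75 | 0.88 | 0.87 | 0.79 | 0.94 | 0.94 |
| | | Nonclinical | 890 | 0.77 | 0.90 | 0.86 | 0.79 | 0.94 | 0.94 |
| | Test stability (ρ) | Student sample | 43 | 0.88 | 0.87 | 0.87 | 0.64 | 0.91 | 0.90 |
| Validity | BDI‐I | Clinical | 251 | 0.77 | 0.78 | 0.78 | 0.59 | 0.84 | 0.85 |
| | BDI‐II | | 29 | 0.79 | 0.74 | 0.78 | 0.32 | 0.83 | 0.81 |
| | BSI | | 97 | 0.63 | 0.76 | 0.71 | 0.62 | 0.79 | 0.81 |
| | SCL‐90 | | 34 | 0.68 | 0.87 | 0.79 | 0.83 | 0.85 | 0.88 |
| | GHQ | | 69 | 0.67 | 0.66 | 0.65 | 0.56 | 0.72 | 0.75 |
| | GHQ‐A | | 69 | 0.43 | 0.60 | 0.44 | 0.30 | 0.56 | 0.55 |
| | GHQ‐B | | 69 | 0.55 | 0.61 | 0.57 | 0.30 | 0.64 | 0.64 |
| | GHQ‐C | | 69 | 0.60 | 0.52 | 0.60 | 0.44 | 0.62 | 0.63 |
| | IIP‐32 | | 246 | 0.48 | 0.58 | 0.65 | 0.45 | 0.64 | 0.65 |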

Note. CORE‐OM: Clinical Outcome in Routine Evaluation–Outcome Measure; BDI‐I: Beck Depression Inventory (Beck, Ward, Mendelson, Mock, & Erbaugh, 1961); BDI‐II: Beck Depression Inventory (Beck et al., 1996); BSI: Brief Symptom Inventory (Derogatis & Melisaratos, 1983); GHQ: General Health Questionnaire, 28‐item version (Goldberg & Hiller, 1979); GHQ‐A: somatic symptoms; GHQ‐B: anxiety and insomnia; GHQ‐C: social dysfunction; IIP‐32: Inventory of Interpersonal Problems, 32‐item version (Barkham, Gillian, & Startup, 1996); SCL‐90: Symptom Checklist‐90‐Revised (Derogatis, 1983).

Translation

To foster international comparisons, Evans, Mellor‐Clark, Margison, and Barkham (2000) and Evans et al. (2002) called for translations into other languages by psychologists, psychiatrists, or psychotherapists. This process has to meet specific requirements defined by the CORE System Trust (CST, 2011, 2015), requiring (a) forward translation, (b) a focus group discussing the translation, and (c) field testing and backward translation. One of the authors of the English CORE‐OM has to accompany the translation procedure. The CST (2011, 2015) further specifies rules regarding the examination of the psychometric properties of the translations. The sample should be representative of the target population and comprise at least N = 100 for a clinical population, N = 40 for test–retest examination with an interval of 1 week to 1 month, and N = 200 for the clinical and N = 200 for the nonclinical population. Provided that internal consistency and retest reliability can be considered sufficient, the CST guidelines further recommend assessing both reliable change (Jacobson, Follette, & Revenstorf, 1984) and clinically significant change (CSC; Jacobson & Truax, 1991), the latter also termed “clinical reliability” (Evans, Margison, & Barkham, 1998). The CST (2011) further recommends exploratory factor analysis (EFA) of the data. Interestingly, the CST (2015) relaxed the requirements by recommending a blanket N = 100 for all samples and dropping the factor analysis from the list. Barkham, Mellor‐Clark, Connell, and Cahill (2006) emphasized that the CORE‐OM focuses on “ethnic groups and European languages” (p. 9). Since its introduction in 1999, the CORE‐OM has been translated into 52 languages and dialects. According to the CST (2018), 23 translated versions of the CORE‐OM have been published and 29 translations were under development at the time of writing.
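
The two change criteria mentioned above can be sketched in a few lines of Python. The sketch uses the commonly cited formulation from Jacobson and Truax (1991): the reliable change index divides the pre–post difference by the standard error of the difference, and the CSC cutoff is the point between the clinical and nonclinical distributions weighted by their standard deviations. All input values below are hypothetical and serve only to show the arithmetic; which reliability coefficient and which reference samples to use is left to the respective guideline.

```python
import math


def reliable_change_index(pre, post, sd, reliability):
    """Reliable change index: (post - pre) / S_diff.

    sd          -- standard deviation of the measure in a reference sample
    reliability -- reliability coefficient of the measure (e.g., Cronbach's alpha)
    """
    se_measurement = sd * math.sqrt(1.0 - reliability)  # standard error of measurement
    s_diff = math.sqrt(2.0) * se_measurement             # standard error of a difference score
    return (post - pre) / s_diff


def csc_cutoff(mean_clinical, sd_clinical, mean_nonclinical, sd_nonclinical):
    """Cutoff between the clinical and nonclinical distributions (criterion c)."""
    return (sd_clinical * mean_nonclinical + sd_nonclinical * mean_clinical) / (
        sd_clinical + sd_nonclinical
    )


# Hypothetical values, not taken from any CORE-OM study.
rci = reliable_change_index(pre=2.0, post=1.0, sd=0.75, reliability=0.94)
cutoff = csc_cutoff(mean_clinical=2.0, sd_clinical=1.0,
                    mean_nonclinical=1.0, sd_nonclinical=1.0)
print(f"RCI = {rci:.2f}; reliable at the 5% level: {abs(rci) > 1.96}")
print(f"CSC cutoff = {cutoff:.2f}")
```

A change is usually called reliable when the absolute index exceeds 1.96, and clinically significant when the posttreatment score additionally crosses the cutoff.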

Rationale

According to the World Health Organization (WHO, 2018) guidelines, a translation should be “conceptually equivalent in each of the target countries/cultures. (…) should be equally natural and acceptable and should practically perform in the same way. (…) A well‐established method to achieve this goal is to use forward‐translations and back‐translations.” Thus, the WHO guidelines consider forward and backward translation sufficient to achieve “conceptual equality” of questionnaire translations. The translation process of the CORE‐OM followed these steps, so that from a WHO perspective, the translated versions of the CORE‐OM could be considered equivalent. However, if that claim holds, we should be able to provide empirical evidence for it—or reveal a lack thereof.

Objectives

The authors of the present study are not aware of a systematic comparison of the published translations of the CORE‐OM so far. Goldhahn, Shisha, Macdermid, and Goldhahn (2013) emphasized the importance of “appropriately translated instruments” for “international multicentre studies; or inclusion of people with different cultural backgrounds in national trials” (p. 591). Because the CORE‐OM will be used for international comparisons, we require empirical evidence that the translations can be considered equivalent from a psychometric point of view. Therefore, the present study compares all available studies of the translations of the CORE‐OM with respect to the reported psychometric properties considering the three major criteria (a) reliability (as reflected by internal consistency and retest stability), (b) validity (in terms of factorial structure, convergent, and discriminant validity), and (c) objectivity (in terms of application, evaluation, and interpretation of results). Only if these criteria are met to a sufficient extent can we recommend the various CORE‐OM versions for comparisons across countries with differing languages.

METHOD

Study design

The present study uses techniques applied in systematic reviews and meta‐analyses. Each article presenting a translation of the CORE‐OM serves as a primary study, the results of which will be summarised. We follow the guidelines for Preferred Reporting Items for Systematic Reviews and Meta‐Analysis (Moher et al., 2015). First, we provide a systematic review of studies of CORE‐OM translations with respect to the translational processes and psychometric analyses performed therein. Second, we conduct a meta‐analysis on the psychometric details with the primary focus on the three major criteria reliability, validity, and objectivity.

Data collection

The official website of the CST (2018) lists available translations, contact information, and translations currently in progress. For each language listed there, we conducted a search on PsycINFO and PubMed using the search terms CORE‐OM AND psychom* propert* OR CORE‐OM AND translat*; publication year from 1998 to 2018. Moreover, we performed a Google Scholar search using generic search terms, that is, “CORE‐OM” along with the respective language, for example, “German CORE‐OM”. Authors whose translations could not be found in these sources were contacted via e‐mail. The target was to collect all articles presenting a translation of the CORE‐OM. To ensure the validity of findings, two persons (M. Z. and L. C. W.) performed the search independently of each other.

Information extraction

From each article, we extracted (a) data collection and sampling characteristics (sample type, recruiting of participants, and duration); (b) descriptive statistics of clinical and nonclinical samples (sample size, gender, and mean age); (c) reliability measures (Cronbach's α) and test stability coefficients (Spearman and Pearson); and (d) details regarding the examination of validity (factorial structure using EFA and confirmatory factor analysis [CFA] and correlations between the CORE‐OM and the SCL‐90 and the BDI‐II).

Analysis

Using a random‐effects meta‐analytical approach (DerSimonian & Laird, 1986), we pooled Cronbach's α, the stability coefficients, and the correlations of the total scores of SCL‐90 and BDI‐II with the CORE‐OM total score, the modified total score (nonrisk items), and the scale scores. We calculated Cochran's Q (Cochran, 1950) as well as I² and H² (Higgins & Thompson, 2002) to assess the variation in study outcomes. Further, we generated the diagnostic plot of Baujat, Mah, Pignon, and Hill (2002) and forest plots (Lewis & Clarke, 2001) for visualisation of the results (see Supporting Information). Regarding I², we follow the guideline of Quintana (2015), who suggests considering values up to 25% as low, 50% as moderate, and 75% and above as high variance between the studies. Using Cook's distance (Cook & Weisberg, 1982), we identified the studies contributing most to heterogeneity. To detect possible explanations of observed differences in Cronbach's α and convergent validity coefficients, we performed a moderator analysis using (a) mean age of participants, (b) gender, and (c) sample type (inpatients, outpatients, and mixed samples) as covariates. For test stability coefficients, we had information only on the sample type (community, students, and mixed samples), which was used as a moderator. Age and gender (proportion of females) were introduced as quantitative covariates, and sample type was dummy coded (outpatients as reference group for the internal consistency analyses and community for test stability). All analyses were performed with R (R Core Team, 2013) applying the packages robumeta (Fisher, Tipton, & Zhipeng, 2017) and metafor (Viechtbauer, 2010). For better readability, references to specific studies are given in brackets throughout the text (full list in Table 2).
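
The pooling and heterogeneity statistics named above can be sketched as follows. This is a minimal Python illustration of the DerSimonian and Laird estimator with Q‐based I² and H², not the R code used for the analyses reported here. Pooling correlations on Fisher's z scale with variance 1/(n − 3) is an assumption of this sketch, and the example coefficients and sample sizes are hypothetical. Software that derives I² from the estimated between‐study variance may return different values than the Q‐based formula used here.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooling (DerSimonian & Laird) with Cochran's Q, I^2 (%), and H^2."""
    k = len(effects)
    weights = [1.0 / v for v in variances]                      # fixed-effect weights
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = k - 1
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0             # between-study variance
    re_weights = [1.0 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0       # Q-based I^2 in percent
    h2 = q / df if df > 0 else float("nan")                     # Q-based H^2
    return {"pooled": pooled, "se": se, "tau2": tau2, "Q": q, "df": df, "I2": i2, "H2": h2}


def pool_correlations(correlations, sample_sizes):
    """Pool correlations on Fisher's z scale (assumed here) and back-transform."""
    z_values = [math.atanh(r) for r in correlations]
    z_variances = [1.0 / (n - 3) for n in sample_sizes]
    result = dersimonian_laird(z_values, z_variances)
    result["r"] = math.tanh(result["pooled"])
    result["ci"] = (
        math.tanh(result["pooled"] - 1.96 * result["se"]),
        math.tanh(result["pooled"] + 1.96 * result["se"]),
    )
    return result


# Hypothetical retest coefficients and sample sizes.
example = pool_correlations([0.90, 0.80, 0.70], [50, 100, 80])
print({key: example[key] for key in ("r", "ci", "Q", "I2", "H2")})
```

With the thresholds cited above, an I² of about 25% would be read as low, 50% as moderate, and 75% or more as high between‐study variance.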
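Table 2. An overview of all identified CORE‐OM publications

| No. | Author(s), year | Translation | Pub | Lang | Clinical: N | Clinical: Patients | Clinical: Type | Clinical: Collection | Clinical: Female | Clinical: Male | Clinical: Age M | Clinical: Age SD | Nonclinical: N | Nonclinical: Type | Nonclinical: Collection | Nonclinical: Female | Nonclinical: Male | Nonclinical: Age M | Nonclinical: Age SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [1] | Evans et al. (2002) | English | Paper | En | 890 | Out | Mental health services | pp | 530 | 344 | ― | ― | 1,106 | University staff, students | pp | 601 | 498 | ― | ― |
| [2] | Bodinaku (2014) | Albanian | PhD | En | 209 | Out | Mental health centre | pp | 129 | 80 | 37 | 12.4 | 501 | Community | pp | 249 | 252 | 40.2 | 15.1 |
| a | Cartasso and Lemos (2012) | Argentinian | Paper | Esp | 106 | Out | Psychiatric centre | pp | 75 | 31 | ― | ― | ― | ― | ― | ― | ― | ― | ― |
| a | Santana et al. (2015) | Brazilian Portuguese | Paper | En | 44 | Out | Trauma clinic | pp | 20 | 24 | 43.2 | 14.1 | 55 | Community | pp | 24 | 31 | 37 | 17.4 |
| [3] | Jokic‐Begic, Korajlija, and Jurin (2014) | Croatian | Paper | Hr | 183 | In | Psychiatric hospital | pp | 108 | 75 | 47 | 10.68 | 425 | Community | pp | 231 | 194 | 38.7 | 12.29 |
| [4] | Juhová (2015) | Czech | PhD | Cz | 175 | In | Psychiatric hospital | pp | 107 | 66 | ― | ― | 300 | Students, community | pp/online | 237 | 62 | ― | ― |
| [5] | Meerding, van't Spijker, and van Riessen (2012) | Dutch | Paper | Nl | 10,988 | Out | Different mental praxes | pp | 7,119 | 3,868 | 38.4 | 13 | 613 | Community | pp | 314 | 298 | 47.7 | 17.7 |
| [6] b | Juntunen, Piiparinen, Honkalampi, Inkinen, and Laitila (2015) | Finnish | Paper | Fin | ― | ― | ― | ― | ― | ― | ― | ― | 209 | Community | pp | 114 | 95 | 43.2 | 16.2 |
| | Honkalampi et al. (2017) | | Paper | En | 201 | In/out | Mental health services | pp | 121 | 79 | 35.02 | 12.76 | ― | ― | ― | ― | ― | ― | ― |
| [7] | Sproll (2011) | German | Dipl | Ger | 179 | Out | Psychotherapy ambulance, primary care ambulance | pp | 103 | 76 | 36.6 | 12.6 | 197 | Community | pp | 100 | 97 | 30.99 | 12.99 |
| [8] | Kristjánsdóttir et al. (2015) | Icelandic | Paper | En | 387 | Out | University hospital | pp | 317 | 70 | 38.05 | ― | 207 | Students | pp | 136 | 71 | 22.7 | ― |
| [9] | Palmieri et al. (2009) | Italian | Paper | En | 647 | In/out | Psychotherapy ambulance | pp | 443 | 196 | 36 | 11.9 | 263 | Students | pp | 192 | 71 | 25 | 5.8 |
| [10] | Uji, Sakamoto, Adachi, and Kitamura (2012) | Japanese | Paper | En | 1,357 | In | Psychiatric hospitals | pp | 881 | 433 | 35 | 13.9 | ― | Students, community | pp | ― | ― | ― | ― |
| [11] | Viliuniene et al. (2012) | Lithuanian | Paper | En | 39 | Out | Psychotherapy ambulance | pp | 23 | 16 | 36.7 | 12.5 | 133 | Students | pp | 93 | 37 | 20.8 | 0.7 |
| [12] | Skre et al. (2013) | Norwegian | Paper | En | 527 | Out | Psychiatric centre | pp | 320 | 207 | 37.4 | 12.6 | 464 | Students, community | pp | 333 | 131 | 32.6 | 14.3 |
| a | Sales, de Matos Moleiro, Evans, and Alves (2012) | Portuguese | Paper | Port | ― | ― | ― | ― | ― | ― | ― | ― | ― | Students, community | pp | 77 | 34 | 1462 | ― |
| [13] | Zeldovich, Ivanov, Evans, and Andreas (2014) | Russian | Paper | Ru | 159 | In | Psychiatric hospital | pp | 79 | 80 | 38.2 | 12.8 | 224 | Students, community | pp/online | 169 | 55 | 26.5 | 8.8 |
| [14] | Gampe, Bieščad, Balúnová‐Labanicová, Timulák, and Evans (2012) | Slovak | Paper | Slo | 40 | In/out | Psychiatric hospital | pp | 20 | 20 | 36.13 | ― | 74 | Students, community | pp | 38 | 31 | 23.27 | ― |
| [15] | Feixas et al. (2012) and Trujillo et al. (2016) | Spanish | Paper | Esp, en | 192 | Out | Primary care ambulance | pp | 130 | 61 | 41.3 | 14.9 | 452 | Students, community | pp | 343 | 94 | 29.3 | 14.4 |
| [16] | Elfström et al. (2012) | Swedish | Paper | En | 619 | Out | Primary care ambulance | pp | 427 | 171 | 40.5 | 13 | 229 | Students | pp | 126 | 103 | 27.5 | 8 |
| [17] | Campbell and Young (2016) | Xhosa | Paper | En | 49 | Out | Mental health services for students | pp | ― | ― | ― | ― | 194 | Students | pp | ― | ― | ― | ― |
| [18] | Campbell (2011) | South African English | Paper | En | 312 | Out | Mental health services for students | pp | 232 | 80 | 20.6 | ― | 421 | Students | pp | 256 | 165 | 21.23 | ― |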

Note. In the text, we refer to the numbers in the first column. CORE‐OM: Clinical Outcome in Routine Evaluation–Outcome Measure; Pub: publication; Paper: published paper; PhD: PhD thesis; Dipl: diploma thesis; Lang: language of the publication. N: sample size; Type: sample type; pp: paper–pencil; Online: online survey; Collection: type of data collection; M: mean; SD: standard deviation; ―: information not available.

a No psychometrics.

b Psychometric properties of the Finnish CORE‐OM for the nonclinical sample originate from the publication Juntunen et al. (2015) and are, therefore, considered only once.

RESULTS

Study selection

Of the 52 translations listed by the CST (2018), we could identify 26 publications covering 21 translations in the literature search. Two versions had more than one publication each: The Spanish version had one Spanish and one English paper, and the Finnish version had one master's thesis and two publications, in Finnish and English, respectively. Where several were available, we chose the most recent article for analysis, with a preference for published works over unpublished manuscripts. That way, 19 papers could be included in the quantitative analysis. Figure 1 shows the attrition diagram of the extraction process.
Figure 1. A flow diagram depicting the selection process of the papers included in the analysis

Some of these 19 articles did not report all the information we sought: 18 publications reported Cronbach's α, 12 reported coefficients for test stability examination, five reported correlation analyses of the corresponding CORE‐OM with the BDI‐II [1, 2, 8, 14, 15], and six with the SCL‐90 [1, 2, 6, 7, 14, 15]. Four studies were not considered in our analyses because they used other instruments for validation (BDI‐I [6], Inventory of Interpersonal Problems‐32 [7], Beck Anxiety Inventory [8], and Brief Symptom Inventory [13]). Regarding dimensional analysis, we could identify five publications applying principal component analysis (PCA), and four studies provided a CFA, but results were not reported in sufficient detail. Due to the small number of studies applying EFA and CFA and the lack of comprehensive information on the fit statistics provided by those who did, we cannot perform a quantitative analysis and will, therefore, only summarise the results. For the same reason, no analysis of CSC could be performed. These findings were verified by two independent persons (M. Z. and A. M. K.).

Sample characteristics

Across the 21 translations, the clinical samples totalled N = 17,303, with 11,184 (65%) female and 5,977 (35%) male respondents (142 missing); the mean age was 37.3 (SD = 12.9) years. All clinical samples were convenience samples covering both outpatients and inpatients of hospitals, primary care, day and psychotherapeutic services, private psychologists, and psychotherapists. All studies used the paper–pencil version of the respective CORE‐OM translation. Nonclinical samples consisted of n = 3,633 (61%) female and n = 2,319 (39%) male respondents. Gender was missing for n = 257 (1%) respondents. The mean age in the nonclinical samples was 30.4 (SD = 15.8). Three of the 21 studies tried to adapt the proportions in nonclinical samples using sociodemographic factors from population statistics of the corresponding countries, regions, or cities. Bodinaku (2014) [2] used a random walk technique for data collection in Albania; only Meerding et al. (2012) [5] engaged a survey agency for recruiting respondents in the Netherlands. Student samples were used in 14 studies as nonclinical samples, six of which used them as the only source of information. All studies used the paper–pencil version, and two additionally used an online survey. Reported durations for data collection were between half a year and 2 years. Table 2 gives an overview of the field‐phase characteristics of the studies.

Internal consistency

In total, 18 studies calculated indices of internal consistency using Cronbach's α for the total and subscale scores; 15 of them also reported values for the nonrisk total score (Table 3). In addition to the total score for all 34 items, Evans et al. (2002) suggested determining the total score for nonrisk items (28 items without the six items from the risk scale) to investigate psychological distress, which we include in our analyses. Table 6 summarizes the results of the psychometric analyses (see section Internal consistency [Cronbach's α]). The mean coefficients per scale (column 3) ranged from 0.72 (well‐being) to 0.93 (total). The well‐being and the risk scales showed lower values compared with the other scales and the total of items. All but the problems scale have significant heterogeneity values and a high amount (I² > 75%) of between‐study variance. The results of the outlier tests revealed two studies contributing most to variability in results [3, 5]. Moreover, the Croatian translation [3] contributed significantly to the mean Cronbach's α due to lower values in the total score and the total score of nonrisk items. Nevertheless, the values of internal consistency of the Croatian CORE‐OM were still acceptable with α = 0.86, confidence interval (CI) [0.82, 0.89] and α = 0.84, CI [0.79, 0.88]. The Dutch translation [5] significantly influenced the mean Cronbach's α of the problems scale with α = 0.88, CI [0.88, 0.88].
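
For reference, the coefficient pooled in this section can be computed from raw item responses as sketched below. The Python snippet uses the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the sum score); the three‐respondent, two‐item data set is hypothetical and chosen only so that the arithmetic is easy to follow.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item responses."""
    k = len(item_scores[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of the respondents' sum scores.
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical responses on the 0-4 format: three respondents, two items.
toy_data = [[0, 1], [1, 0], [2, 2]]
alpha = cronbach_alpha(toy_data)
print(round(alpha, 3))
```

The function assumes at least two items, at least two respondents, and a nonzero variance of the sum scores.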
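Table 3. Reliability of the CORE‐OM translations

| No. | α: W | α: P | α: F | α: R | α: T | α: ‐R | Test stability: N | Type | Timespan | Coef | Stability: W | Stability: P | Stability: F | Stability: R | Stability: T | Stability: ‐R |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [1] | 0.75 | 0.88 | 0.87 | 0.79 | 0.94 | 0.94 | 43 | Students | ― | S | 0.88 | 0.87 | 0.87 | 0.64 | 0.90 | 0.91 |
| [2] | 0.60 | 0.89 | 0.78 | 0.86 | 0.92 | 0.91 | 104 | Community | 7d | ― | 0.69 | 0.70 | 0.78 | 0.48 | 0.80 | 0.81 |
| [3] | 0.58 | 0.92 | 0.86 | 0.84 | 0.86 | 0.84 | 78 | Community | Twice a week | P | 0.77 | 0.83 | 0.91 | 0.58 | 0.88 | 0.88 |
| [4] | 0.74 | 0.90 | 0.80 | 0.78 | 0.93 | 0.93 | 71 | Students | 7d | P | 0.60 | 0.71 | 0.65 | 0.50 | 0.70 | 0.70 |
| [5] | 0.75 | 0.88 | 0.84 | 0.72 | 0.94 | 0.93 | ― | ― | ― | ― | ― | ― | ― | ― | ― | ― |
| [6] | 0.77 | 0.89 | 0.85 | 0.78 | 0.94 | 0.94 | ― | ― | ― | ― | ― | ― | ― | ― | ― | ― |
| [7] | 0.71 | 0.89 | 0.79 | 0.80 | 0.93 | 0.92 | 55 | Students | 7d | S | 0.74 | 0.82 | 0.63 | 0.43 | 0.83 | 0.82 |
| [8] | 0.79 | 0.87 | 0.87 | 0.66 | 0.94 | 0.94 | 204 | Students | 2w | S | 0.75 | 0.77 | 0.75 | 0.48 | 0.80 | 0.71 |
| [9] | 0.71 | 0.87 | 0.77 | 0.77 | 0.92 | 0.91 | ― | ― | ― | ― | ― | ― | ― | ― | ― | ― |
| [10] | 0.68 | 0.89 | 0.81 | 0.83 | 0.94 | ― | ― | Patients | 4w | ― | 0.67 | 0.82 | 0.72 | 0.66 | 0.85 | ― |
| [11] | 0.81 | 0.86 | 0.82 | 0.67 | 0.93 | ― | 57 | Students | 7d | S | 0.68 | 0.72 | 0.77 | 0.60 | 0.78 | 0.79 |
| [12] | 0.70 | 0.87 | 0.84 | 0.81 | 0.94 | 0.93 | 81 | Students | 2w | S | 0.63 | 0.69 | 0.70 | 0.35 | 0.76 | 0.76 |
| [13] | 0.63 | 0.88 | 0.76 | 0.72 | 0.92 | 0.91 | 14 | Patients | 7d | S | 0.90 | 0.57 | 0.81 | 0.18 | 0.76 | ― |
| [14] | 0.67 | 0.88 | 0.77 | 0.85 | 0.93 | 0.92 | 67 | Students, community | 2w | S | 0.70 | 0.66 | 0.70 | 0.56 | 0.75 | 0.75 |
| [15] | 0.81 | 0.90 | 0.85 | 0.77 | 0.94 | 0.94 | 78 | Students, community | 2w+ | S | 0.76 | 0.85 | 0.79 | 0.45 | 0.87 | 0.87 |
| [16] | 0.76 | 0.88 | 0.85 | 0.76 | 0.94 | 0.93 | 70 | Students | 2w+ | S | 0.80 | 0.80 | 0.81 | 0.64 | 0.85 | 0.86 |
| [17] | 0.64 | 0.86 | 0.80 | 0.71 | 0.93 | ― | ― | ― | ― | ― | ― | ― | ― | ― | ― | ― |
| [18] | 0.76 | 0.89 | 0.84 | 0.73 | 0.94 | 0.94 | ― | ― | ― | ― | ― | ― | ― | ― | ― | ― |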

Note. In the text, we refer to the numbers in the first column. CORE‐OM: Clinical Outcome in Routine Evaluation–Outcome Measure; No.: number for indication in text; W: well‐being; P: problems; F: functioning; R: risk; T: total score; ‐R: total score for nonrisk items; N: sample size; Type: sample type; Timespan: retest period in days (d) and weeks (w); Coef: coefficient (S: Spearman's ρ, P: Pearson's r); ―: information not available.

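Table 6. Results of heterogeneity tests

| Analysis | Scale | No. of items | α / ρ / r | CI [lower, upper] | Q | df | p | I² (%) | H² | Papers contributing to differences |
|---|---|---|---|---|---|---|---|---|---|---|
| Internal consistency (Cronbach's α) | Total score | 34 | 0.93 | [0.93, 0.94] | 56.73 | 17 | <0.001 | 81.26 | 5.34 | [3] |
| | Nonrisk items | 28 | 0.92 | [0.92, 0.94] | 64.84 | 14 | <0.001 | 88.24 | 8.50 | [3] |
| | Well‐being | 4 | 0.72 | [0.69, 0.75] | 86.20 | 17 | <0.001 | 87.33 | 7.90 | ― |
| | Problems | 12 | 0.88 | [0.88, 0.89] | 19.08 | 17 | 0.32 | 6.25 | 1.07 | [5] |
| | Functioning | 12 | 0.83 | [0.81, 0.84] | 76.83 | 17 | <0.001 | 83.44 | 6.04 | ― |
| | Risk | 6 | 0.77 | [0.74, 0.80] | 186.64 | 17 | <0.001 | 88.94 | 9.04 | ― |
| Test–retest reliability (Spearman's ρ) | Total score | 34 | 0.82 | [0.78, 0.85] | 22.07 | 11 | 0.02 | 52.44 | 2.10 | ― |
| | Nonrisk items | 28 | 0.81 | [0.77, 0.85] | 34.53 | 10 | <0.001 | 69.93 | 3.33 | ― |
| | Well‐being | 4 | 0.74 | [0.69, 0.78] | 22.42 | 11 | 0.02 | 48.56 | 1.94 | ― |
| | Problems | 12 | 0.77 | [0.72, 0.81] | 22.93 | 11 | 0.02 | 52.94 | 2.12 | ― |
| | Functioning | 12 | 0.78 | [0.72, 0.82] | 36.25 | 11 | <0.001 | 72.01 | 0.57 | [3] |
| | Risk | 6 | 0.51 | [0.46, 0.55] | 12.18 | 11 | 0.35 | 0.01 | 1.00 | ― |
| Convergent validity (SCL‐90) | Total score | 34 | 0.82 | [0.80, 0.84] | 5.20 | 5 | 0.39 | 0.81 | 1.01 | [7] |
| | Nonrisk items | 28 | 0.81 | [0.80, 0.84] | 5.50 | 5 | 0.36 | 19.88 | 1.25 | [7] |
| | Well‐being | 4 | 0.68 | [0.62, 0.73] | 9.24 | 5 | 0.1 | 46.48 | 1.87 | [2] |
| | Problems | 12 | 0.81 | [0.77, 0.83] | 8.89 | 5 | 0.11 | 44.10 | 1.79 | ― |
| | Functioning | 12 | 0.72 | [0.67, 0.76] | 7.17 | 5 | 0.21 | 36.58 | 1.58 | [2] |
| | Risk | 6 | 0.61 | [0.51, 0.69] | 15.96 | 5 | 0.01 | 72.61 | 3.65 | [1] |
| Convergent validity (BDI‐II) | Total score | 34 | 0.84 | [0.82, 0.86] | 1.21 | 4 | 0.88 | 0.00 | 1.00 | ― |
| | Nonrisk items | 28 | 0.83 | [0.81, 0.85] | 0.61 | 4 | 0.96 | 0.00 | 1.00 | ― |
| | Well‐being | 4 | 0.76 | [0.71, 0.81] | 7.90 | 4 | 0.1 | 52.90 | 2.12 | [2] |
| | Problems | 12 | 0.78 | [0.75, 0.80] | 1.17 | 4 | 0.88 | 0.00 | 1.00 | ― |
| | Functioning | 12 | 0.76 | [0.71, 0.79] | 4.94 | 4 | 0.29 | 34.67 | 1.53 | [2, 8] |
| | Risk | 6 | 0.53 | [0.47, 0.57] | 4.22 | 4 | 0.38 | 0.02 | 1.00 | ― |

Note. BDI‐II: Beck Depression Inventory‐II; SCL‐90: Symptom Checklist‐90; CI: confidence interval [lower, upper]; Q: Cochran's Q; df: degrees of freedom; p: p value; I² and H²: heterogeneity coefficients; ―: none.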

Test stability

The shortest retest period was twice within 1 week [3]. The other retest periods were 1 week [2, 4, 7, 11, 13], 2 weeks or more [15, 16], and 1 month [10]. Six studies lacked information on retest periods, and five papers provided no test stability analyses; seven studies used a student sample for retesting, two studies used community samples, and two used clinical samples for assessing test stability. We found two papers with a mixed sample of students and community members. The pooled test stability coefficients (see Table 6, section Test–retest reliability [Spearman's ρ]) ranged from ρ = 0.51 (risk) to ρ = 0.82 (total). The risk scale generally showed low test stability and low heterogeneity between the studies; for well‐being, problems, functioning, and the total score as well as the total score of nonrisk items, heterogeneity ranged from I² = 48% to I² = 72%. The Croatian CORE‐OM [3] significantly influenced the pooled results of the functioning scale.

Factorial structure

Table 4 shows results regarding the factorial structure of the CORE‐OM translations. None of the studies applying factor analysis could replicate the intended four‐factor structure of the instrument. Eight studies [1, 2, 4, 6, 7, 9, 11, 12] applied a PCA like Evans et al. (2002), four of which favoured three major components. The results were largely declared comparable with those of Evans et al., finding in almost all analyses a positively formulated domain measuring strengths, a negatively formulated domain measuring weaknesses, and the set of risk items. One study [2] suggested either a one‐factorial or a two‐factorial solution to describe their data adequately.
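Table 4. Factorial structure of the CORE‐OM translations

| No. | PCA: Sample | PCA: Rotation | PCA: Component | PCA: Factor | CFA: Sample | χ² | df | GFI | RMSEA [CI] | CFI |
|---|---|---|---|---|---|---|---|---|---|---|
| [1] | C, NC | Oblique | 3 | POS, NEG, RISK | ― | ― | ― | ― | ― | ― |
| [2] | C, NC | Orthogonal | Uncertain | One global scale solution | ― | ― | ― | ― | ― | ― |
| [3] | ― | ― | ― | ― | NC | 1,641.1 | 509 | 0.80 | 0.07 [―, ―] | 0.80 |
| [4] | C, NC | Oblimin | 3 | POS, NEG, RISK | ― | ― | ― | ― | ― | ― |
| [5] | ― | ― | ― | ― | ― | ― | ― | ― | ― | ― |
| [6] | NC | Oblimin | 3 | POS, NEG, RISK | ― | ― | ― | ― | ― | ― |
| [7] | C, NC | Oblique | 3 | POS, NEG, RISK | ― | ― | ― | ― | ― | ― |
| [8] | ― | ― | ― | ― | ― | ― | ― | ― | ― | ― |
| [9] | C | ― | 3 | ― | ― | ― | ― | ― | ― | ― |
| [10] | ― | ― | ― | ― | C | ― | ― | 0.88 | 0.06 | ― |
| [11] | C, NC | Oblimin | 3 | ― | ― | ― | ― | ― | ― | ― |
| [12] | C, NC | Promax | 2 | Psychological distress, risk | NC | 1,854.7 | 521 | ― | 0.08 [―, ―] | 0.94 |
| [13] | ― | ― | ― | ― | C, NC | 3,964.2 | 561 | ― | 0.057 [0.05, 0.06] | 0.81 |
| [14] | ― | ― | ― | ― | ― | ― | ― | ― | ― | ― |
| [15] | ― | ― | ― | ― | ― | ― | ― | ― | ― | ― |
| [16] | ― | ― | ― | ― | ― | ― | ― | ― | ― | ― |
| [17] | ― | ― | ― | ― | ― | ― | ― | ― | ― | ― |
| [18] | ― | ― | ― | ― | ― | ― | ― | ― | ― | ― |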

Note. In the text, we refer to the numbers in the first column. CORE‐OM: Clinical Outcome in Routine Evaluation–Outcome Measure; No.: number for indication in text; Sample: C: clinical; NC: nonclinical; rotation: method to perform the factor extraction; components: number of extracted components; factors: contextual meaning of extracted factors; POS: positively worded items; NEG: negatively worded items; RISK: risk items; χ 2: Chi‐square value; df: degrees of freedom; GFI: goodness of fit index; RMSEA: root mean square error of approximation; CI: confidence interval; CFI: comparative fit index; ―: information not available.

The four studies [3, 10, 12, 13] applying a CFA to assess the adequacy of the four‐factorial structure of Evans et al. (2002) reported generally rather moderate results (Table 4). None of the studies applying a CFA found the four‐factorial structure to describe the data best.

Convergent validity

Evans et al. (2002) correlated the English CORE‐OM with well‐established instruments. However, these instruments are not available in all target languages; thus, only nine studies addressed convergent validity. In total, six studies compared the CORE‐OM with the SCL‐90 and five with the BDI‐II. Table 5 summarizes the coefficients reported by the individual studies, and Table 6 (lower part) shows the results of the validity analyses. The pooled correlation coefficients between the SCL‐90 total score and the CORE‐OM scales ranged from r = 0.61 (risk) to r = 0.82 (total).
Table 5

Convergent validity of the CORE‐OM translations

Convergent validity
BDI‐II SCL‐90
No. A/U N W P F R T ‐R A/U N W P F R T ‐R
[1] ++ 29 0.79 0.74 0.78 0.32 0.81 0.83 ++ 34 0.68 0.87 0.79 0.83 0.88 0.85
[2] ++ 209 0.68 0.76 0.71 0.57 0.84 0.84 ++ 209 0.57 0.77 0.64 0.66 0.82 0.79
[3] ++ 0.79 +−
[4] +− 0.71 0.81 0.71 0.56 0.84 0.83
[5] +− +−
[6] +− ++ 201 0.67 0.83 0.73 0.57 0.81 0.82
[7] +− ++ 135 0.71 0.85 0.77 0.58 0.86 0.86
[8] ++ 577 0.78 0.78 0.79 0.52 0.85 0.82
[9] +−
[10] +− +−
[11] +−
[12] +− +−
[13] +− +−
[14] ++ 40 0.79 0.77 0.76 0.64 0.87 0.84 ++ 40 0.79 0.76 0.73 0.58 0.83 0.84
[15] ++ 162 0.79 0.80 0.74 0.48 0.83 0.83 ++ 155 0.70 0.77 0.72 0.46 0.79 0.79
[16] +− +−
[17] +−
[18] +−

Note. In the text, we refer to the numbers in the first column. CORE‐OM: Clinical Outcome in Routine Evaluation–Outcome Measure; A/U: indicates whether a translation of the Beck Depression Inventory‐II (BDI‐II) or the Symptom Checklist‐90 (SCL‐90) is available (A) or not (U) in the respective language and whether it has been used; ++: exists and used; +−: exists and not used; −−: does not exist or has been established after publication of respective study. No.: number for indication in text; N: sample size; W: well‐being; P: problems; F: functioning; R: risk; T: total score; ‐R: total score for nonrisk items; ―: information not available.

Results of heterogeneity tests. Note. BDI‐II: Beck Depression Inventory‐II; SCL‐90: Symptom Checklist‐90; CI: confidence interval [lower, upper]; Q: Cochran's Q; df: degrees of freedom; p: p value; I² and H²: heterogeneity coefficients; ―: none.

All but the problems scale showed nonsignificant heterogeneity tests for both the SCL‐90 and the BDI‐II. Nevertheless, some studies had a major influence on the variability between the studies (see Table 6, section Convergent validity [SCL‐90]). The German CORE‐OM [7] had a significant effect on the score for nonrisk items; the Albanian CORE‐OM [2] contributed significantly to the between‐study variability in the well‐being and functioning scales; and the English CORE‐OM [1] significantly influenced the variability of the validity coefficients in the risk scale, with r = 0.83, CI [0.68, 0.91].

The correlation coefficients of the CORE‐OM scales with the BDI‐II total score (see Table 6, section Convergent validity [BDI‐II]) ranged from r = 0.53 (risk) to r = 0.84 (total). The heterogeneity coefficients varied from I² = 0% (total, nonrisk items, and problems) to I² = 52.9% (well‐being). None of the results were significant. The Icelandic CORE‐OM [8] contributed to the findings in the functioning scale, and the Albanian CORE‐OM [2] influenced the pooled correlation coefficients of the well‐being and functioning scales.
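As an illustration of how pooled correlations and the heterogeneity statistics Q, I², and H² relate to one another, the sketch below pools the SCL‐90 total‐score correlations of the six studies in Table 5 that report both N and T for the SCL‐90 ([1], [2], [6], [7], [14], [15]). It assumes a simple fixed‐effect model on Fisher's z scale with weights n − 3; this is a swapped‐in textbook technique, not necessarily the estimator the authors used, and the function name is hypothetical.

```python
import math

# (N, r) pairs: SCL-90 columns "N" and "T" of Table 5,
# studies [1], [2], [6], [7], [14], [15].
SCL90_TOTAL = [(34, 0.88), (209, 0.82), (201, 0.81),
               (135, 0.86), (40, 0.83), (155, 0.79)]


def pool_correlations(studies):
    """Fixed-effect pooling of correlations via Fisher's z (an assumption).

    Returns (pooled_r, ci_low, ci_high, Q, I2_percent, H2).
    """
    z = [math.atanh(r) for _, r in studies]   # Fisher z transform
    w = [n - 3 for n, _ in studies]           # inverse of Var(z) = 1 / (n - 3)
    w_sum = sum(w)
    z_bar = sum(wi * zi for wi, zi in zip(w, z)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(wi * (zi - z_bar) ** 2 for wi, zi in zip(w, z))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    h2 = q / df
    lo, hi = z_bar - 1.96 * se, z_bar + 1.96 * se
    # Back-transform estimate and 95% CI limits to the correlation scale.
    return math.tanh(z_bar), math.tanh(lo), math.tanh(hi), q, i2, h2


pooled_r, ci_low, ci_high, q_stat, i2, h2 = pool_correlations(SCL90_TOTAL)

if __name__ == "__main__":
    print(f"pooled r = {pooled_r:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
    print(f"Q = {q_stat:.2f}, I2 = {i2:.1f}%, H2 = {h2:.2f}")
```

A random‐effects model would additionally estimate the between‐study variance and widen the confidence interval when Q clearly exceeds its degrees of freedom.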

Moderator analysis

We found a moderating effect of the sample type upon Cronbach's α: The inpatient studies showed significantly lower internal consistency than the outpatient samples with respect to the total score, the nonrisk items, the well‐being scale, and the problems scale (see Table 7). The coefficients in mixed samples did not differ significantly from the outpatient samples. None of the other moderator analyses revealed significant effects; therefore, they will not be reported in detail.
Table 7

Results of the moderator analysis for Cronbach's α by sample type

Cronbach's α
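| Scale | k | Omnibus test: Q_M | Omnibus test: df | Omnibus test: p | Inpatientsᵃ: β | Inpatientsᵃ: z | Inpatientsᵃ: p | Inpatients and outpatients (mixed sample)ᵃ: β | Inpatients and outpatients (mixed sample)ᵃ: z | Inpatients and outpatients (mixed sample)ᵃ: p |
|---|---|---|---|---|---|---|---|---|---|---|
| Total score | 18 | 4.30 | 2 | 0.12 | −0.12 | −2.02 | 0.04 | −0.06 | −0.89 | 0.38 |
| Nonrisk items | 15 | 7.45 | 2 | 0.02 | −0.21 | −2.72 | 0.01 | −0.06 | −0.80 | 0.42 |
| Well‐being | 18 | 5.51 | 2 | 0.06 | −0.16 | −2.34 | 0.02 | −0.30 | −0.38 | 0.71 |
| Problems | 18 | 7.17 | 2 | 0.03 | 0.06 | 2.55 | 0.01 | −0.02 | −0.61 | 0.54 |
| Functioning | 18 | 3.94 | 2 | 0.14 | −0.09 | −1.50 | 0.13 | −0.12 | −1.60 | 0.11 |
| Risk | 18 | 1.57 | 2 | 0.46 | 0.09 | 1.18 | 0.24 | 0.07 | 0.69 | 0.49 |

Note. k: number of studies; Q_M: empirical Q value for moderator analysis; df: degrees of freedom; p: p value; β: regression coefficient; z: z value.

ᵃ Compared with the outpatient sample.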
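The p values in Table 7 follow directly from the test statistics. As a minimal sketch, the omnibus statistic Q_M is referred to a chi‐square distribution with df = 2, for which the upper‐tail probability has the closed form exp(−Q/2), and each regression coefficient is tested with a two‐sided z test. The function names below are hypothetical; the input values are copied from the omnibus and inpatient columns of Table 7.

```python
import math


def p_from_qm_df2(q_m: float) -> float:
    """Upper-tail chi-square probability for df = 2, i.e. exp(-Q / 2)."""
    return math.exp(-q_m / 2.0)


def p_from_z(z: float) -> float:
    """Two-sided p value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# (scale, Q_M, z of the inpatient contrast), copied from Table 7.
TABLE_7 = [
    ("Total score", 4.30, -2.02),
    ("Nonrisk items", 7.45, -2.72),
    ("Well-being", 5.51, -2.34),
    ("Problems", 7.17, 2.55),
    ("Functioning", 3.94, -1.50),
    ("Risk", 1.57, 1.18),
]

# scale -> (omnibus p, inpatient-contrast p)
P_VALUES = {name: (p_from_qm_df2(q), p_from_z(z)) for name, q, z in TABLE_7}

if __name__ == "__main__":
    for name, (p_omnibus, p_inpatient) in P_VALUES.items():
        print(f"{name:14s} omnibus p = {p_omnibus:.2f}, "
              f"inpatient p = {p_inpatient:.2f}")
```

Rounded to two decimals, these values agree with the omnibus and inpatient p columns of Table 7. The closed form only holds for df = 2; other degrees of freedom require a general chi‐square survival function.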


Objectivity

Objectivity is difficult to evaluate given the specificities of the target languages. All versions of the CORE‐OM have the same appearance (a two‐sided A4 sheet) with slight visual differences (font type and size must be chosen from a list of given fonts). The header of the questionnaire, which contains sociodemographic data, the treatment setting (beginning/follow‐up/end of the treatment), and the instructions, is part of the translation process and should also be discussed in the focus group. At the bottom of the back page, there is space for calculating the scale means, the sum of the total score, and the total score of the nonrisk items. The instructions at the top support comparability across test coordinators but do not provide specific details regarding the interpretation of the results. The manual of the English CORE‐OM contains cut‐off points indicating the CSC. Comparing different versions of the CORE‐OM requires standardized scores, norm tables, and reference values for the CSC (split by gender, age, or other relevant factors), none of which appeared in the reviewed studies.
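The scoring described for the back page (scale means, total sum, and total for the nonrisk items) can be sketched as follows. The item‐to‐scale mapping below is a hypothetical placeholder that only reflects assumed scale sizes (4 well‐being, 12 problems, 12 functioning, and 6 risk items); the actual form interleaves the items, so the official scoring key would have to be substituted. The sketch also assumes that responses are already keyed in the scoring direction.

```python
from statistics import mean

# HYPOTHETICAL item-to-scale mapping (0-based positions). The real CORE-OM
# form interleaves items; replace this with the official scoring key.
SCALE_ITEMS = {
    "well_being": range(0, 4),
    "problems": range(4, 16),
    "functioning": range(16, 28),
    "risk": range(28, 34),
}


def score_form(responses, scale_items=SCALE_ITEMS):
    """Return scale means, total sum, and nonrisk sum for one completed form.

    `responses` holds 34 item scores already keyed in the scoring direction
    (reverse scoring of positively worded items is not handled here).
    """
    if len(responses) != 34:
        raise ValueError("expected 34 item responses")
    scores = {name: mean(responses[i] for i in idx)
              for name, idx in scale_items.items()}
    scores["total_sum"] = sum(responses)
    scores["nonrisk_sum"] = sum(
        responses[i]
        for name, idx in scale_items.items() if name != "risk"
        for i in idx
    )
    return scores


# Example form: every nonrisk item answered 1, every risk item answered 0.
example = score_form([1] * 28 + [0] * 6)
```

Passing the mapping as a parameter keeps the scoring logic independent of any particular translation's item order.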

DISCUSSION

Summary

The present study systematically compared 21 translations of the CORE‐OM using methods of systematic review and meta‐analysis. The research focussed on the comparability of the translations with respect to their psychometric properties (especially reliability, validity, and objectivity). Our results show that the different versions of the CORE‐OM are largely comparable from a psychometric point of view and adequately reflect the English CORE‐OM. Despite a certain heterogeneity in data collection, sample sizes, and sample composition, the international versions of the CORE‐OM yield similar psychometric results on all criteria.

We identified six studies with a significant influence on the results for internal consistency, retest reliability, and convergent validity. Four of them deviated in a positive direction, whereas two (Croatia and Albania) showed significantly lower values in internal consistency and convergent validity, respectively. However, the internal consistency of the Croatian version was still above 0.80 and therefore satisfactory. In contrast, the Albanian well‐being and functioning scales had significantly and markedly lower correlations with both external criteria (SCL‐90 and BDI‐II). This may be due to a poor translation of the CORE‐OM, a poor translation of the external criteria, inappropriate samples, or a specific attitude towards the well‐being and functioning constructs in the Albanian context, and it should be pursued further in targeted studies.

The CORE‐OM was developed for outpatients (Barkham et al., 1998). Because some studies collected inpatient data as well, we could compare the sample types in a moderator analysis. The internal consistency of the inpatient samples was significantly lower than that of the outpatient samples; that is, the CORE‐OM performs better in the sample type it was developed for. Hence, it should be used cautiously in inpatient settings, particularly in multicentre studies.

Reported analyses

Almost all of the 21 studies (18) analysed internal consistency, and 13 assessed retest reliability. Only nine studies validated the translation against external measures such as the BDI‐II and the SCL‐90, although these two instruments are available in almost all target languages (see Table 5). Likewise, factorial validity was seldom examined (seven PCAs and four CFAs) and is therefore difficult to evaluate. This may be because the current translation guidelines (CST, 2015) do not provide any details regarding the assessment of validity; we would consider this a worthwhile extension.

Sampling

Most of the samples were convenience samples for both clinical and nonclinical populations. Additionally, half of the studies assessing stability used student samples. Therefore, a generalization of these results is only possible with great caution, if at all.

None of the studies, including Evans et al. (2002), could replicate the originally proposed four‐factorial structure of the CORE‐OM. Rather, the majority of the studies suggested that the 34 items represent a three‐factorial latent structure: positively worded items (assessing strengths), negatively worded items (assessing disabilities and distress), and the items of the risk scale. These findings are in line with earlier results regarding the factorial structure of the English CORE‐OM: Even the PCA of Evans et al. suggested a three‐factor solution. Lyne, Barrett, Evans, and Barkham (2006) suggested a two‐factorial structure (risk and psychological distress) and recommended using the risk items as a separate indicator of risky and self‐harming behaviour, but only by professionals. Handscomb, Hall, Hoare, and Shorter (2016) applied a CFA in a sample of tinnitus patients. They estimated 10 different model variants derived from previous studies on the CORE‐OM and likewise found the poorest fit for the original four‐factorial solution and the best fit for the model containing negative, positive, and risk factors (i.e., the one that had already been identified by Evans et al., 2002). Nevertheless, the questionnaire has remained unaltered with respect to both the number of items and the scoring. Because the translated versions of the instrument have inherited this deficiency, we see a clear need for further research on the factorial structure and scoring of the CORE‐OM.

The risk scale

The results indicate severe problems with the risk scale. The risk construct itself seems to function poorly in the selected clinical population across all countries. We have to assume that patients treated in an outpatient setting (for which the CORE‐OM has been designed) have already reached a certain degree of stability and are, therefore, not acutely at risk. None of the studies analysed here reported information on medical care for patients. Hence, this scale seems more applicable in an (inpatient) psychiatric setting, for example, shortly after admission and at the end of the stay, to detect possible changes. However, the psychometric properties of the risk scale in studies using inpatient samples (i.e., the Croatian, Czech, Russian, and Slovak CORE‐OM) did not differ significantly from those in studies using outpatients. Therefore, the risk scale requires further detailed research. Evans et al. (2002) have already suggested evaluating the total score without the risk items, and the present research again indicates that this calculation approach should be pursued further. For example, studies involving patients with potentially harmful behaviour (e.g., drug or substance abuse, psychoses, and depression) could clarify the mediocre results of this scale.

Independence with respect to diagnostic groups

Evans et al. (2000) considered the CORE‐OM suitable for all diagnostic groups (p. 253), which seems questionable in the light of our empirical results. Some publications have investigated the psychometric properties of the CORE‐OM in specific disorder groups, such as eating disorders (Jenkins & Turner, 2014) or emotional distress in people with tinnitus (Handscomb et al., 2016). However, we could not identify studies dedicated to the applicability of the CORE‐OM to patients with personality disorders or psychoses.

Reporting standards

Our review also showed the need for more specific guidelines for reporting the results of psychometric analyses. The current CST (2015) guidelines specify detailed steps for the translation process, so that translating authors are subject to highly standardised procedures. In contrast, no specific guidelines exist regarding mandatory analyses and reporting standards. It should be determined which methods are suitable for assessing the respective psychometric properties (e.g., whether Spearman or Pearson correlation coefficients should be preferred for assessing test stability). The sample sizes should be supported by a power analysis to clarify the consequences of noncompliance with regulations. The sampling requires clear presentation (see Lounsbury, Gibson, & Saudargas, 2006, discussing the consequences of using student samples). Furthermore, a standardized presentation of the results would increase the comparability of studies. Our results further show that the studies dealt only marginally with the calculation of the CSC and the reliable change. Because the CORE‐OM is primarily intended for measuring change, reporting both indices seems essential.
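For readers unfamiliar with the two change indices mentioned above, the following minimal sketch shows how they are commonly computed. It assumes the widely used Jacobson and Truax formulation (reliable change index based on the standard error of the difference, and a CSC cut‐off weighted between a clinical and a nonclinical distribution); the reviewed studies may have used other variants. All numeric inputs are hypothetical illustration values, not figures from any CORE‐OM manual or study.

```python
import math


def reliable_change_index(pre, post, sd_pre, reliability):
    """RCI = (post - pre) / S_diff, with S_diff = sqrt(2) * SD * sqrt(1 - r)."""
    se_measurement = sd_pre * math.sqrt(1.0 - reliability)
    s_diff = math.sqrt(2.0) * se_measurement
    return (post - pre) / s_diff


def csc_cutoff(mean_clin, sd_clin, mean_norm, sd_norm):
    """Cut-off between clinical and nonclinical distributions (criterion c)."""
    return (sd_clin * mean_norm + sd_norm * mean_clin) / (sd_clin + sd_norm)


# HYPOTHETICAL illustration values (mean item scores on a 0-4 scale).
PRE, POST = 2.0, 1.0
SD_PRE, RELIABILITY = 0.75, 0.94
MEAN_CLIN, SD_CLIN = 1.86, 0.75
MEAN_NORM, SD_NORM = 0.76, 0.59

rci = reliable_change_index(PRE, POST, SD_PRE, RELIABILITY)
cutoff = csc_cutoff(MEAN_CLIN, SD_CLIN, MEAN_NORM, SD_NORM)

# Change is "reliable" if |RCI| exceeds 1.96, and "clinically significant"
# if the score additionally crosses the cut-off into the nonclinical range.
is_reliable = abs(rci) > 1.96
is_clinically_significant = is_reliable and PRE >= cutoff > POST

if __name__ == "__main__":
    print(f"RCI = {rci:.2f}, cut-off = {cutoff:.2f}")
    print(f"reliable: {is_reliable}, "
          f"clinically significant: {is_clinically_significant}")
```

Because both indices depend on the standard deviations and reliability of the reference samples, language‐specific norm data are needed before such values can be compared across translations.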

Methodology

We consider the chosen procedure appropriate for comparing the various translations. Gilbody, Richards, Brealey, and Hewitt (2007) conducted a similar analysis using international versions of the Patient Health Questionnaire (Spitzer, Kroenke, & Williams, 1999) and Patient Health Questionnaire‐9 (Kroenke, Spitzer, & Williams, 2001). This technique has proven successful for comparative studies on questionnaires and can, therefore, be considered as a standard procedure for multiple language translation.

Conclusion

The question of whether the different translations of the CORE‐OM can be treated as one and the same instrument can therefore be answered with “yes.” However, reservations exist regarding the quality of the original (English) CORE‐OM, especially regarding its factorial structure. All translations applied the original factorial structure, thus adopting its weaknesses as well. Therefore, we recommend a revision of the instrument in this regard. Given that numerous follow‐up studies probing various alternative models are already available, it is interesting to note that none of their results has so far found its way into the CORE‐OM. A very promising candidate was the approach of Lyne et al. (2006). The authors used a “nested factors first‐order general factor model with four residualized latents (…) and with two method latents of positively and negatively worded items” (p. 195). However, this complex model would not allow for the straightforward scoring required in daily clinical routine. Another promising candidate would be the three‐factorial model of Evans et al. (2002), which deserves closer inspection, possibly involving item response theory models (e.g., de Ayala, 2009). Our results further show that the instrument performs better with outpatient samples, which has to be considered when using the CORE‐OM in multicentre studies. Finally, international guidelines for reporting on translation and adaptation studies should be established. This will increase both the quality of the studies and the comparability between different translations.

DECLARATION OF INTEREST STATEMENT

The authors declare no conflict of interest.
  43 in total

1.  A graphical method for exploring heterogeneity in meta-analyses: application to a meta-analysis of 65 trials.

Authors:  Bertrand Baujat; Cédric Mahé; Jean-Pierre Pignon; Catherine Hill
Journal:  Stat Med       Date:  2002-09-30       Impact factor: 2.373

2.  Assessing the needs of the severely mentally ill: cultural and professional differences.

Authors:  M Slade
Journal:  Int J Soc Psychiatry       Date:  1996

Review 3.  What outcomes to measure in routine mental health services, and how to assess them: a systematic review.

Authors:  Mike Slade
Journal:  Aust N Z J Psychiatry       Date:  2002-12       Impact factor: 5.744

4.  Validation of the Swedish version of the Clinical Outcomes in Routine Evaluation Outcome Measure (CORE-OM).

Authors:  M L Elfström; C Evans; J Lundgren; B Johansson; M Hakeberg; S G Carlsson
Journal:  Clin Psychol Psychother       Date:  2012-03-22

Review 5.  Outcome measurement: concepts and questions.

Authors:  K N Lohr
Journal:  Inquiry       Date:  1988       Impact factor: 1.730

6.  The Finnish Clinical Outcome in Routine Evaluation Outcome Measure: psychometric exploration in clinical and non-clinical samples.

Authors:  Kirsi Honkalampi; Aarno Laitila; Hanna Juntunen; Kristiina Lehmus; Aino Piiparinen; Iida Törmänen; Mikko Inkinen; Chris Evans
Journal:  Nord J Psychiatry       Date:  2017-08-24       Impact factor: 2.202

7.  Validation and utility of a self-report version of PRIME-MD: the PHQ primary care study. Primary Care Evaluation of Mental Disorders. Patient Health Questionnaire.

Authors:  R L Spitzer; K Kroenke; J B Williams
Journal:  JAMA       Date:  1999-11-10       Impact factor: 56.272

8.  Confirmatory factor analysis of Clinical Outcomes in Routine Evaluation (CORE-OM) used as a measure of emotional distress in people with tinnitus.

Authors:  L Handscomb; D A Hall; D J Hoare; G W Shorter
Journal:  Health Qual Life Outcomes       Date:  2016-09-06       Impact factor: 3.186

9.  The factor structure and psychometric properties of the Clinical Outcomes in Routine Evaluation--Outcome Measure (CORE-OM) in Norwegian clinical and non-clinical samples.

Authors:  Ingunn Skre; Oddgeir Friborg; Sigmund Elgarøy; Chris Evans; Lars Henrik Myklebust; Kjersti Lillevoll; Knut Sørgaard; Vidje Hansen
Journal:  BMC Psychiatry       Date:  2013-03-22       Impact factor: 3.630

Review 10.  From pre-registration to publication: a non-technical primer for conducting a meta-analysis to synthesize correlational data.

Authors:  Daniel S Quintana
Journal:  Front Psychol       Date:  2015-10-08