
Psychometric properties of the Spanish version of the Clinical Outcomes in Routine Evaluation - Outcome Measure.

Adriana Trujillo¹, Guillem Feixas¹, Arturo Bados², Eugeni García-Grau², Marta Salla², Joan Carles Medina², Adrián Montesano¹, José Soriano³, Leticia Medeiros-Ferreira⁴, Josep Cañete⁵, Sergi Corbella⁶, Antoni Grau⁷, Fernando Lana⁸, Chris Evans⁹.

Abstract

OBJECTIVE: The objective of this paper is to assess the reliability and validity of the Spanish translation of the Clinical Outcomes in Routine Evaluation - Outcome Measure, a 34-item self-report questionnaire that measures the client's status in the domains of Subjective well-being, Problems/Symptoms, Life functioning, and Risk.
METHOD: Six hundred and forty-four adult participants were included in two samples: the clinical sample (n=192) from different mental health and primary care centers; and the nonclinical sample (n=452), which included a student and a community sample.
RESULTS: The questionnaire showed good acceptability and internal consistency, appropriate test-retest reliability, and acceptable convergent validity. Strong differentiation between clinical and nonclinical samples was found. As expected, the Risk domain had different characteristics than other domains, but all findings were comparable with the UK referential data. Cutoff scores were calculated for the assessment of clinically significant change.
CONCLUSION: The Spanish version of the Clinical Outcomes in Routine Evaluation - Outcome Measure showed acceptable psychometric properties, providing support for using the questionnaire for monitoring the progress of Spanish-speaking psychotherapy clients.


Keywords: CORE-OM; outcome measure; psychometric validation; reliability; validity

Year: 2016 PMID: 27382288 PMCID: PMC4922811 DOI: 10.2147/NDT.S103079

Source DB: PubMed Journal: Neuropsychiatr Dis Treat ISSN: 1176-6328 Impact factor: 2.570

Introduction

This paper reports the psychometric properties of the Spanish version of Clinical Outcomes in Routine Evaluation – Outcome Measure (CORE-OM). The CORE-OM was designed mainly for practice-based evidence (a complement to evidence-based practice).1 We expected that the Spanish translation of the CORE-OM would be a useful, reliable, and valid instrument suitable to be widely used for research and practice in Spain and in some countries in which Spanish is spoken in similar form to that used in Spain. The translation should also prove a useful base, with the original English version, for countries where local Spanish usage is sufficiently different from that in Spain that a somewhat different translation will be needed. From its origin, the measure was designed to be pan-theoretical (not associated with a school of therapy) and pan-diagnostic (not focused on a single presenting problem), and was driven by what practitioners and clients considered to be the most important generic aspects of psychological well-being, and change in therapies, to be measured. It is recommended to be used before and at the end of therapy. The CORE-OM measure is copyleft; that is, it can be reproduced without payment of any license fee if it is not changed in any way.2 Translations were done following the CORE System Trust (CST) protocol, and with the supervision and guidance of Chris Evans (CE). Copyright violations are illegal, but CST and CE welcome collaboration on new translations to the protocol. All CORE instruments are available to download from,3 which provides more information about the system, instruments, and translation protocol. 
Information focused on the CORE-OM in Spanish is at.4 The CORE-OM has demonstrated its utility in areas as varied as benchmark studies,5,6 assessment of outcome of psychological therapies in primary and secondary settings,7–10 studies of treatment processes,11–13 assessment of the psychological well-being of individuals in nonclinical occupational settings,14 and examination of psychological health among university students who were receiving university counseling.15,16 Acceptability and psychometric properties have been demonstrated with diverse samples, for example, older people and patients with eating disorders.17,18 Though designed more for practice-based evidence, the CORE-OM has been used in randomized controlled trials.19–21 The CORE-OM has been translated, following a clear and thorough protocol,22 into over 20 languages, with that number growing. Evaluation of the translated measure has been completed for a growing number of languages, including Italian,23 Portuguese,24 Swedish,25 Lithuanian,26 Icelandic,27 and Croatian,28 showing psychometric properties comparable to those found for the English version in the UK; many others, including a Catalan version, are nearing completion. All forms are widely used as a routine change measure in a range of health care settings in the UK and increasingly in other languages and countries.13,29–32 This study was designed to assess the psychometric properties of the Spanish translation of the CORE-OM, and hence its suitability to be used in routine assessment of mental health interventions in Spain and perhaps other Spanish-speaking countries.

Method

The Spanish version

For the translation and adaptation of the CORE-OM to Spanish, we followed the steps established by international groups and the CST protocol including participation of a member of the group who designed the instrument (CE).33 This process is congruent with the guidelines of the International Test Commission,34 and it emphasizes the importance of translating the items according to their contextualized meaning in the culture and the environment in which they will be applied, as well as making them understandable for the widest possible range of potential users. It does not rely excessively on back-translation in order to avoid too literal translations. To improve the resulting version, we requested the collaboration of 12 people from different parts of Spain, selected because of their high level of English proficiency. Ten of them responded to the request by providing a translation. Six of them were professionals in psychology, and four were lay people. With this material, a working session was organized with the participation of two of the professionals of psychology and two of the lay people who collaborated with the translation, along with a member of the CST (CE) who acted as a consultant or supervisor. In this session, each item was discussed taking into account the available translations. For each item, the best option was chosen by consensus. A first draft came out of this process which was reviewed by three experts in psychology with over 20 years of experience in clinical settings who made some modifications that were discussed by email with CE. This revised version was submitted to extensive scrutiny by a group of 64 people (aged between 16 and 76 years, from different conditions and linguistic backgrounds, and fully proficient in Spanish; 12 were professionals of psychology and 52 were lay people) who were asked to read it carefully and to judge whether the items were understandable and clear. 
They were also encouraged to make all the comments they deemed appropriate with regard to the way items were written. Afterward, the comments and observations made were discussed by the three experts mentioned, and issues that seemed to need discussion of the original English were shared with CE, until a final version was achieved. This version was delivered to an experienced, bilingual English–Spanish translator, with a degree in psychology and without access to the original version, to back-translate. Looking at this back-translation, neither the experts nor the member of the CST considered it necessary to make any modification of the latest version, at which point it became the CST-approved translation into Spanish. From that version,33 the shorter versions (designed for routine use in therapy sessions, for screening and ongoing monitoring: CORE-SFA, CORE-SFB, CORE-10, and CORE-5, all in male and female versions35) were typeset and made available through webpages, initially4 and now also.22

Participants and procedures

The study protocol was approved by the Bioethics Committee of the University of Barcelona (ref. IRB0003099) and by the ethical committees of the centers taking part in the study. All the participants were informed of the implications of the study and signed an informed consent document before enrolling. The study included 644 adult participants in two samples (Table 1). The clinical sample (n=192) comprised patients from nine mental health centers and from some primary care centers in the Barcelona area. The CORE-OM was included in the routine pretreatment assessment of these centers, and these routine clinical data are reported in this paper. All patients who were referred for psychological treatment between March 2012 and May 2013 in the centers collaborating in the study were included. Professionals were asked to exclude from these referrals inpatients and outpatients with severe psychological disorders. Another exclusion criterion was insufficient linguistic competence to communicate in Spanish.

Table 1

Demographic data

Sample	Total (missing data for sex)	Females (%)	Males (%)	Mean age (SD)	Age range (years)
Nonclinical sample	452 (15)	343 (75.9)	94 (20.8)	29.3 (14.4)	18–76
Students	310 (15)	250 (76.9)	60 (18.5)	23.2 (6.1)	18–69
Community sample	127 (0)	93 (73.2)	34 (26.8)	44.4 (17.6)	20–76
Clinical sample (outpatients)	192 (1)	130 (67.7)	61 (31.8)	41.3 (14.9)	18–78
Primary care	44 (0)	29 (65.9)	15 (34.1)	41.8 (12.7)	22–76
Secondary care	147 (1)	101 (68.2)	46 (31.1)	41.1 (15.5)	18–78
Test–retest sample	78 (0)	54 (69.2)	24 (30.8)	34.9 (18.8)	18–69
Students	32 (0)	26 (81.3)	6 (18.8)	20.7 (3.8)	18–34
Community sample	46 (0)	28 (60.9)	18 (39.1)	44.8 (18.8)	20–69

Abbreviation: SD, standard deviation.

The nonclinical sample (n=452) included a student sample and a community nonstudent sample, between 18 and 70 years of age (inclusion criteria), who were assessed in the same period from March 2012 to May 2013 and had sufficient linguistic competence to communicate in Spanish. The latter (n=127) consisted of volunteers and/or their relatives who were not receiving psychological treatment (exclusion criterion). The student sample (n=325) was drawn from the Faculty of Psychology of the University of Barcelona; 219 were undergraduate students from four different subject areas, and 106 were master's-level students. Forty-six participants of the community sample and 32 of the student sample agreed to take part in the test–retest survey, completing the questionnaire twice; the second administration of the questionnaire took place between 15 and 30 days after the first one. For students, the test and retest were administered in their classrooms with a 2-week test–retest interval; for the community sample, all the participants who completed the first assessment were contacted by phone ~2 weeks later and were invited to participate in the retest survey. For those who accepted, the questionnaire was sent in an envelope, and they completed and returned it. Test–retest stability was not measured in the clinical sample as that would have involved significant interference with normal clinical management of these participants. This was in line with the UK original study where there was no test–retest stability examination in the clinical sample for the same reason.36

Instruments and measures

CORE-OM is a 34-item self-report questionnaire that assesses the client’s status in the domains of Subjective well-being (four items), Problems/Symptoms (12 items), Functioning (12 items), and Risk (six items).36,37 Eight of the items are positively cued (items 3, 4, 7, 12, 19, 21, 31, and 32). The focus is on the last 7 days, and items are scored on a five-point scale ranging from 0 (not at all) to 4 (most or all the time), where higher scores on all domains indicate more problems and higher levels of psychological distress, even for the Subjective well-being domain. The domains were named to designate their item content but never envisaged to be psychometric factors.35,38 The Subjective well-being domain comprises four items capturing this aspect. The Problems/Symptoms domain includes four items addressing anxiety, four for depression, and two each for physical problems and trauma. The Functioning domain includes four items covering general/work functioning, four addressing close relationships, and four for social functioning. The Risk domain has four items about risk to self and two about risk to others. The CORE-OM was designed to be user-friendly for both clients and practitioners.35,39 It takes 5–10 minutes to complete, and the total and domain scores are reported as means across items. Prorating, that is, using the item mean even with missing items, is recommended as long as <10% of the items in the score are missing.36 Psychometric properties were excellent in the original UK testing and in all subsequent explorations, showing high internal consistency (Cronbach α between 0.75 and 0.94 for all scores, the lowest for Risk) and high test–retest stability (Spearman’s ρ=0.91 for a 1-week test–retest interval in a student sample). 
Discriminant validity showed large differences between clinical and nonclinical samples (Cohen’s d from 0.71 Risk to 1.77 Problems/Symptoms) and high correlations with measures which are conceptually close, for example, Beck Depression Inventory-II (BDI-II) (ρ=0.85) and Symptom Checklist 90 Revised (SCL-90-R) (ρ=0.88). The CORE-OM is also sensitive to change in therapies.15,36 As expected, the domains did not show neat factorial separation, but an oblique structure in which Risk items are clearly separated from other items and two strongly correlated main problem dimensions of the positively and the negatively cued items gave a moderate and just acceptable fit on confirmatory factor analysis.38 BDI-II is a 21-item self-administered inventory designed to measure the intensity of depressive symptoms in psychiatric and nonpsychiatric populations of both adults and adolescents.40 Items are rated on a four-point scale (0–3), and total scores are obtained by tallying the ratings for all 21 items. Scores range from 0 to 63, with higher scores reflecting increased depressive severity. The BDI-II requires ~5–10 minutes to complete and may be administered to individuals 13–80 years of age. We used the Spanish-language version of the BDI-II.41 SCL-90-R is a 90-item self-report symptom inventory designed to screen for a broad range of psychological problems.42 Each of the 90 items is rated on a five-point Likert scale of distress, ranging from “not at all” (0) to “extremely” (4). Subsequently, the answers are combined in nine primary symptom dimensions: Somatization, Obsessive-Compulsive, Interpersonal Sensitivity, Hostility, Depression, Anxiety, Paranoid Ideation, Phobic Anxiety, and Psychoticism. In addition, three global indices provide measures of overall psychological distress: the Global Severity Index, the Positive Symptom Total, and the Positive Symptom Distress Index. We used the Spanish-language version of the SCL-90-R.43
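The scoring rule described above (item means with prorating) can be sketched in a few lines of Python. This is only an illustrative sketch, not the official CORE scoring procedure: the function name `core_om_total` is hypothetical, omitted items are represented as `None`, and reverse-scoring of the eight positively cued items is an assumption inferred from the statement that higher scores always indicate more distress.

```python
from typing import Optional, Sequence

# Positively cued items as listed in the text (1-based item numbers).
POSITIVELY_CUED = {3, 4, 7, 12, 19, 21, 31, 32}


def core_om_total(responses: Sequence[Optional[int]],
                  max_missing_fraction: float = 0.10) -> Optional[float]:
    """Mean item score (0-4) over answered items, or None if too many are missing.

    Assumption: positively cued items are reverse-scored (4 - x) so that a
    higher score always means more distress. None marks an omitted item.
    """
    if len(responses) != 34:
        raise ValueError("the CORE-OM has 34 items")
    scored = []
    for number, value in enumerate(responses, start=1):
        if value is None:
            continue
        if not 0 <= value <= 4:
            raise ValueError(f"item {number}: response must be between 0 and 4")
        scored.append(4 - value if number in POSITIVELY_CUED else value)
    missing = len(responses) - len(scored)
    # Prorating is allowed only while fewer than 10% of the items are missing.
    if missing >= max_missing_fraction * len(responses):
        return None
    return sum(scored) / len(scored)
```

With 34 items, the <10% rule allows at most three omitted items for the total score. Domain scores would follow the same pattern over each domain's items, but the item-to-domain assignment is not given in the text and is therefore left out of the sketch.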

Analyses

To facilitate a comparison with the UK data, we followed the original study by assessing acceptability, internal consistency, test–retest reliability (with a 15- to 30-day interval), influence of age and sex, correlations between domain scores, and discriminant validity against sample, reflected in the differences between clinical and nonclinical samples, along with the calculation of cutoff scores, and convergent validity in terms of the correlations between CORE-OM scores and those on the BDI-II and SCL-90-R.36 Following the UK study, most analyses were reported for each of the four content domains (Subjective well-being, Problems/Symptoms, Life/social functioning, and Risk) as well as for the total scale and for the score of all items except those in the Risk domain. Internal reliability was reported as Cronbach’s α for the subsample with no missing item data,44 but results for domain scores were reported where a score could be computed by prorating up to 10% of missing items. To test the equality of the coefficients across samples and subsamples, Feldt’s procedure was used.45 Again following the UK validation study, nonparametric correlation coefficients (Spearman’s ρ) and nonparametric tests of differences in central location of distributions (Wilcoxon test) were used as scores did not conform to Gaussian distributions. The BDI-II40,41 and the SCL-90-R42,43 were used to test convergent validity with other self-report measures. Clinically significant change was calculated according to the c criterion that uses a cutoff point based on the contrast between dysfunctional and general population samples.46 Analyses were conducted using SPSS, version 20.0. As in the original paper, the methodology was mainly exploratory and descriptive rather than one of null hypothesis testing; wherever possible, 95% confidence intervals (CIs) were reported rather than P-values. This approach approximates testing at P<0.05. 
Comparisons of parameters within this sample and against those reported in the UK data were generally reported in terms of whether or not the CIs overlapped.15 This paper did not follow the original UK analysis in including a principal component analysis, as subsequent UK papers have shown that the CORE-OM, as its authors expected, has a complicated factor structure that would need larger clinical and nonclinical samples for the Spanish data than we have to date.6 More psychometric exploration will be reported later when such significantly larger samples are available.

Results

Acceptability

All of the questionnaires had sufficiently few items missing to allow prorating for a usable overall score (ie, no participant omitted more than three items). One hundred and seventy-nine (93.2%) participants of the clinical and 432 (95.6%) of the nonclinical samples returned complete data. The overall omission rate was 0.17%. The items that were most often incomplete were items 3 (0.7%) and 25 (0.7%) in the nonclinical and items 21 (1%) and 32 (1%) in the clinical sample.

Internal consistency

To evaluate the internal reliability, we calculated Cronbach’s α,44 for all domains and the entire scale for the clinical and nonclinical groups. Furthermore, to test if the differences between these coefficients were statistically significant, we followed the procedure proposed by Feldt et al.45 All domains showed an appropriate internal reliability in both samples. The levels were within the acceptable range, although being lower for the Risk domain (Table 2 and Figure 1).
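For readers who want to reproduce this kind of coefficient, the following is a minimal sketch of the standard Cronbach's α formula for complete data. It is not the SPSS routine used in the study; the function name `cronbach_alpha` and the toy data in the usage note are illustrative assumptions, and neither the confidence intervals nor the Feldt comparison is covered.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for complete data.

    rows: one sequence of item scores per respondent (no missing values).
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != k for row in rows):
        raise ValueError("every respondent must have the same number of items")
    # Sample variance of each item (column) and of the summed score.
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
```

For example, `cronbach_alpha([[0, 1], [1, 0], [2, 2], [3, 3]])` gives 8/9 for this made-up two-item data set. The sketch also makes the floor-effect argument concrete: when most respondents score zero on every Risk item, the item and total variances shrink and α becomes unstable.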

Table 2

Coefficient α (95% CI) denoting internal consistency for nonclinical and clinical samples

Domains	Nonclinical: students (n=325)	Nonclinical: community (n=127)	Clinical: primary care (n=44)	Clinical: secondary care (n=148)	Pooled nonclinical samples (n=452)	Pooled clinical samples (n=192)
Subjective well-being	0.80 (0.76, 0.83)	0.80 (0.74, 0.85)	0.79 (0.67, 0.88)	0.81 (0.75, 0.85)	0.80 (0.77, 0.83)	0.81 (0.76, 0.85)
Problems/Symptoms	0.88 (0.86, 0.90)	0.85 (0.81, 0.89)	0.86 (0.80, 0.91)a	0.90 (0.88, 0.92)a	0.88 (0.86, 0.90)	0.90 (0.87, 0.91)
Functioning	0.86 (0.84, 0.88)	0.84 (0.80, 0.88)	0.82 (0.73, 0.89)	0.86 (0.83, 0.89)	0.86 (0.84, 0.88)	0.85 (0.82, 0.88)
Risk	0.73 (0.68, 0.77)b	0.60 (0.48, 0.70)b	0.80 (0.68, 0.87)	0.76 (0.70, 0.82)	0.71 (0.66, 0.75)c	0.77 (0.71, 0.82)c
Nonrisk items	0.94 (0.93, 0.95)	0.93 (0.91, 0.95)	0.92 (0.88, 0.95)	0.95 (0.93, 0.96)	0.94 (0.93, 0.95)	0.94 (0.93, 0.95)
All items	0.94 (0.93, 0.95)b	0.92 (0.90, 0.94)b	0.93 (0.90, 0.95)	0.95 (0.93, 0.96)	0.94 (0.93, 0.95)	0.94 (0.93, 0.95)

Notes:

a P<0.05 (significantly higher α in the secondary care sample in comparison with primary care sample).

b P<0.05 (significantly higher α in the students sample in comparison with the community sample).

c P<0.05 (significantly higher α in the clinical sample in comparison with the nonclinical sample).

Abbreviation: CI, confidence interval.

Figure 1

Forest plot showing comparison between Spanish scores and UK referential data.

In comparison with the UK referential data, the pooled clinical and nonclinical α values for all items and all nonrisk items showed tight 95% CIs covering the UK referential values, and when the clinical and nonclinical samples were pooled, the lower confidence limit (CL) was above that for the UK data. For Subjective well-being, the Spanish α was above the UK one; for Problems/Symptoms, the clinical sample α had a CI covering the UK one, and the nonclinical α was slightly lower than the UK nonclinical value with the upper CL below the UK value; for Functioning, the CIs included the UK referential values. The values for the Risk domain were lower than the UK ones (which were the same for clinical and nonclinical samples at 0.79), though the CI for the combined clinical sample included 0.79.

Test–retest stability

Test–retest correlations were strong within domains in the nonclinical data (Table 3). The stabilities for all domains were satisfactory (range: 0.76–0.87), except for the Risk domain (0.45), which reflects the high rate of zero responses to these items in the nonclinical group. Changes in mean values between the first and second surveys were not significant for any score.
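The stability coefficients are Spearman correlations between first and second administrations. A minimal sketch of that computation is given below, assuming tied scores receive average ranks (the usual convention, and relevant here because many nonclinical respondents share a Risk score of zero). The helper names are illustrative, and this is not the SPSS procedure used in the study.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def spearman_rho(test: Sequence[float], retest: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need two paired samples of equal length (n >= 2)")
    rx, ry = average_ranks(test), average_ranks(retest)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    ss_x = sum((a - mean_x) ** 2 for a in rx)
    ss_y = sum((b - mean_y) ** 2 for b in ry)
    return covariance / sqrt(ss_x * ss_y)
```

If every respondent has the same score on one occasion, the rank variance is zero and the coefficient is undefined; the sketch does not guard against that case.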

Table 3

Test–retest stability and changes of mean values between first and second survey in a nonclinical sample (n=78)

Domains	Test–retest stabilitya	Change: mean	Change: 95% CI	Change: P-valueb
Subjective well-being	0.76	−0.013	−0.13, 0.10	0.80
Problems/Symptoms	0.85	0.045	−0.03, 0.12	0.47
Functioning	0.79	0.045	−0.02, 0.12	0.16
Risk	0.45	0.008	−0.04, 0.04	0.90
Nonrisk items	0.87	0.037	−0.02, 0.10	0.36
All items	0.87	0.030	−0.02, 0.08	0.43

Notes:

a Spearman’s ρ correlation.

b Wilcoxon test.

Abbreviation: CI, confidence interval.

Convergent validity

Correlations between domain scores and the BDI-II and the SCL-90-R were calculated (Table 4). Across domain scores, correlations were highest against conceptually close measures, showing an acceptable convergent validity. The pattern and the correlations were generally very similar to the UK findings,36 although the Spanish correlations between the Risk scores and the BDI-II and SCL-90-R were lower than the UK ones.

Table 4

Correlations with referential measures in clinical samples

Samples	n	W	P	F	R	−R	All
Primary care (present study)
BDI-II	39	0.76	0.75	0.65	0.32	0.78	0.74
SCL-90-R	30	0.66	0.56	0.58	0.10	0.64	0.61
Secondary care (present study)
BDI-II	123	0.80	0.82	0.77	0.55	0.85	0.85
SCL-90-R	125	0.70	0.81	0.75	0.51	0.82	0.82
Pooled clinical samples (present study)
BDI-II	162	0.79	0.80	0.74	0.48	0.83	0.83
SCL-90-R	155	0.70	0.77	0.72	0.46	0.79	0.79
Clinical sample (Evans et al36)
BDI-II	29	0.79	0.74	0.78	0.32	0.83	0.81
SCL-90-R	34	0.68	0.87	0.79	0.83	0.85	0.88

Abbreviations: W, subjective well-being; P, problems/symptoms; F, functioning; R, risk; −R, all items except risk (nonrisk items); All, all items; BDI-II, Beck Depression Inventory-II; SCL-90-R, Symptom Checklist-90-Revised.

Differences between clinical and nonclinical samples

There were significant differences between clinical and nonclinical samples in all domains (Table 5) with higher scores for the clinical sample than the nonclinical one. With the exception of the Problems/Symptoms domain, the effect sizes of the differences were similar to the results of the UK study with CIs including the UK referential values.36 At 1.4 (CI 1.22–1.59), the effect size for the Problems/Symptoms score is lower than the UK referential value of 1.7 but remains respectable as discriminant validity against the clinical/nonclinical distinction. As in the UK data, the effect size of the difference for the Risk score, at 0.8, was smaller than for all the other scores; it was actually higher than that in the UK data (0.7) but with the CI including the UK value.
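As a rough check on these effect sizes, Cohen's d can be recomputed from the summary statistics in Table 5. The sketch below assumes the pooled-standard-deviation form of d; the text does not state which variant was used, so this is an assumption, and the function name is illustrative.

```python
from math import sqrt


def cohens_d(mean_a: float, sd_a: float, n_a: int,
             mean_b: float, sd_b: float, n_b: int) -> float:
    """Cohen's d for group b versus group a, using the pooled standard deviation."""
    pooled_variance = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_b - mean_a) / sqrt(pooled_variance)


# All items, present study (Table 5): nonclinical 0.77 (SD 0.48, n=452),
# clinical 1.62 (SD 0.71, n=192).
d_all_items = cohens_d(0.77, 0.48, 452, 1.62, 0.71, 192)
```

Under this assumption the all-items value comes out at about 1.5, in line with the tabulated figure. Because the inputs are rounded to two decimals, small discrepancies from the published values are to be expected for other rows.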

Table 5

Mean and standard deviations for clinical and nonclinical samples

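Domains	Present study: nonclinical (n=452), mean	SD	Present study: clinical (n=192), mean	SD	Present study: 95% CI of difference	Present study: da (95% CI)	Evans et al*: nonclinical (n=1,084), mean	SD	Evans et al*: clinical (n=863), mean	SD	Evans et al*: 95% CI of difference	Evans et al*: da,b (95% CI)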
Subjective well-being	1.18	0.76	2.41	0.95	1.08, 1.36	1.5 (1.31, 1.68)	0.91	0.83	2.37	0.96	1.38, 1.53	1.6 (1.54, 1.74)
Problems/Symptoms	0.99	0.62	1.98	0.87	0.86, 1.10	1.4 (1.22, 1.59)	0.90	0.72	2.31	0.88	1.33, 1.48	1.7 (1.67, 1.88)
Functioning	0.74	0.52	1.56	0.75	0.71, 0.92	1.3 (1.19, 1.55)	0.85	0.65	1.86	0.84	0.95, 1.09	1.3 (1.26, 1.46)
Risk	0.11	0.27	0.48	0.66	0.29, 0.44	0.8 (0.69, 1.04)	0.20	0.45	0.63	0.75	0.38, 0.49	0.7 (0.62, 0.81)
Nonrisk items	0.91	0.55	1.86	0.78	0.84, 1.05	1.5 (1.32, 1.70)	0.88	0.66	2.12	0.81	1.18, 1.31	1.7 (1.59, 1.80)
All items	0.77	0.48	1.62	0.71	0.75, 0.94	1.5 (1.33, 1.71)	0.76	0.59	1.86	0.75	1.04, 1.16	1.6 (1.55, 1.76)

Notes:

a Cohen’s d effect size.

b Cohen’s d was calculated from the data reported in the UK study.36

* Reproduced with permission from Evans C, Connell J, Barkham M, et al. Towards a standardised brief outcome measure: psychometric properties and utility of the CORE-OM. Br J Psychiatry. 2002;180:51–60.36 Available from: http://bjp.rcpsych.org/content/180/1/51.long.

Abbreviations: CI, confidence interval; SD, standard deviation.

The box plot in Figure 2 shows no patients in the clinical sample scoring zero and a very few patients (outliers) in the nonclinical sample scoring very highly. The box for the one sample and the median line bisecting the box for the other sample do not overlap.

Figure 2

Box plot of mean item score for all items for clinical and nonclinical samples.

Abbreviation: CORE-OM, Clinical Outcomes in Routine Evaluation - Outcome Measure.

Sex and age differences

In the nonclinical sample, age was significantly and negatively related with all domain scores except Risk: Subjective well-being (ρ=−0.25, P<0.001), Problems/Symptoms (ρ=−0.23, P<0.001), and Functioning (ρ=−0.18, P<0.001); nevertheless, those relationships were weak. In the clinical sample, only the Functioning domain showed a significant correlation with age (ρ=−0.19, P=0.006), and again, this relationship was weak. Regarding sex, only the Subjective well-being domain showed a statistically significant difference between men and women in both samples, with a small effect size (Table 6).

Table 6

Sex differences in scores for clinical and nonclinical samples

Domains	Nonclinical: male (n=94), mean	SD	Nonclinical: female (n=343), mean	SD	Nonclinical: 95% CI of difference	Nonclinical: da (95% CI)	Clinical: male (n=61), mean	SD	Clinical: female (n=130), mean	SD	Clinical: 95% CI of difference	Clinical: da,b (95% CI)
W	0.95	0.77	1.23	0.76	−0.46, −0.11	−0.37 (−0.60, −0.14)	2.15	1.03	2.51	0.88	−0.65, −0.08	−0.39 (−0.69, −0.08)
P	0.96	0.68	0.99	0.60	−0.18, 0.10	−0.05 (−0.28, 0.18)	1.83	0.92	2.05	0.85	−0.48, 0.04	−0.25 (−0.56, 0.05)
F	0.70	0.57	0.74	0.50	−0.15, 0.08	−0.08 (−0.31, 0.15)	1.57	0.80	1.55	0.73	−0.21, 0.25	0.03 (−0.28, 0.33)
R	0.13	0.28	0.10	0.27	−0.03, 0.08	0.11 (−0.12, 0.34)	0.53	0.78	0.44	0.59	−0.11, 0.29	0.14 (−0.17, 0.44)
All – R	0.85	0.60	0.92	0.54	−0.19, 0.05	−0.13 (−0.35, 0.10)	1.77	0.83	1.90	0.75	−0.37, 0.10	−0.17 (−0.47, 0.14)
All	0.72	0.52	0.77	0.47	−0.16, 0.05	−0.10 (−0.33, 0.12)	1.56	0.78	1.64	0.68	−0.30, 0.13	−0.11 (−0.42, 0.19)

Note:

a Cohen’s d effect size.

Abbreviations: CI, confidence interval; SD, standard deviation; W, Subjective well-being; P, Problems/Symptoms; F, Functioning; R, Risk.

Correlations between domain scores

Table 7 shows, as expected, significant and generally strong correlations between all domains. However, correlations between Risk domain scores and the other scores were lower, especially in the nonclinical sample.

Table 7

Correlations (Spearman’s ρ) between domain scores for clinical and nonclinical samples

Domains	W	P	F	R	All – R
Nonclinical (n=452)
W
P	0.79
F	0.77	0.75
R	0.33	0.39	0.40
All – R	0.89	0.94	0.91	0.41
All	0.88	0.94	0.91	0.45	0.99
Clinical (n=192)
W
P	0.85
F	0.71	0.76
R	0.51	0.56	0.57
All – R	0.89	0.95	0.90	0.60
All	0.88	0.94	0.90	0.67	0.99

Abbreviations: W, subjective well-being; P, problems/symptoms; F, functioning; R, Risk.

Clinically significant change

Values for clinically significant change were calculated for all domains following the c criterion, which takes into account data from both clinical and nonclinical samples.46 Cutoff scores (Table 8) separate typical clinical and nonclinical populations and will help to identify the extent to which change after treatment is clinically meaningful.
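The c criterion is usually computed as a standard-deviation-weighted midpoint between the clinical and nonclinical means (the formula commonly attributed to Jacobson and Truax). The sketch below assumes that this is the formula behind the criterion cited above, since the text names the criterion without spelling it out; the function name is illustrative.

```python
def cutoff_c(mean_nonclinical: float, sd_nonclinical: float,
             mean_clinical: float, sd_clinical: float) -> float:
    """Criterion c: the point between the two means, weighted by the opposite SDs."""
    return ((sd_nonclinical * mean_clinical + sd_clinical * mean_nonclinical)
            / (sd_nonclinical + sd_clinical))


# Subjective well-being, using the sex-specific means and SDs from Table 6.
male_cutoff = cutoff_c(0.95, 0.77, 2.15, 1.03)
female_cutoff = cutoff_c(1.23, 0.76, 2.51, 0.88)
print(round(male_cutoff, 2), round(female_cutoff, 2))  # 1.46 1.82
```

Applied to the Table 6 statistics, this formula returns the Subjective well-being cutoffs listed for the present study in Table 8 (1.46 for men, 1.82 for women), which supports the assumption.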

Table 8

Male and female cutoff scores between clinical and nonclinical populations

Domains	Present study: male	Present study: female	Evans et al*: male	Evans et al*: female
Subjective well-being	1.46	1.82	1.37	1.77
Problems/Symptoms	1.33	1.43	1.44	1.62
Functioning	1.06	1.07	1.29	1.30
Risk	0.24	0.21	0.43	0.30
Nonrisk items	1.24	1.33	1.36	1.50
All items	1.06	1.13	1.19	1.29

Note:

* Reproduced with permission from Evans C, Connell J, Barkham M, et al. Towards a standardised brief outcome measure: psychometric properties and utility of the CORE-OM. Br J Psychiatry. 2002;180:51–60.36 Available from: http://bjp.rcpsych.org/content/180/1/51.long.

Discussion

To the extent that the psychometric analyses of these data from the Spanish version of the CORE-OM are good or acceptable, the translation is supported for use in Spanish-speaking populations. Regarding acceptability, considered as the number of missing items and unusable measures, the results were excellent compared to those obtained in the original English-language test.37 In our study, the percentage of complete item responses was higher for both the clinical and the nonclinical sample than in the initial UK testing, which could be taken as evidence not only for the proper design of the questionnaire but also for the quality of the translation process carried out to adapt this instrument into Spanish.33 These results are consistent with other validation studies such as the Italian one, where the percentages of item response (96% for the clinical sample and 81% for the nonclinical sample)23 are comparable to or lower than those observed in the current study (93.2% for the clinical and 95.6% for the nonclinical sample). Similarly, the results from Sweden show an omission rate of 0.44% of items,25 compared with 0.17% in our study. There are no patterns regarding specific items in which omissions occurred, indicating that there appears to be no connection to any specific dimension. Considering reliability, the results are acceptable and consistent with the analyses made in other studies of adaptation and validation,25–27 as well as with the original UK data. In all of these translations, including the present study, some differences in the internal consistency between clinical and nonclinical samples were identified; however, in all domains, the α value was between 0.7 and 0.9, which indicates that the reliability of the Spanish CORE-OM is as satisfactory as that of other versions. α was lowest for the Risk domain, at 0.71 for the pooled nonclinical sample, lower than the observed value of 0.79 in the UK validation study (CI 0.77–0.81). 
It seems likely that this difference arises because the Risk items are tuned to catch mostly only quite significant levels of risk, giving floor effects that curtail variance in nonclinical samples. It seems possible that both the larger size of the UK nonclinical sample compared to that reported here and perhaps a higher rate of risk to self in the UK populations, where, particularly in young adults, self-harm may be more prevalent than in some other countries, probably including Spain,47,48 led to more inter-item covariance appearing in this score in the UK data, so that the lower Spanish value may be due to floor effects rather than necessarily to much lower population covariances. Test–retest stability in our study was good with the exception of the Risk domain score, which again is likely to be explained by the small number of items, floor effects, and the intrinsically impulsive (and thus unstable) nature of some of the phenomena addressed by these items. Stability correlations were strong but slightly lower than in the UK study,36 which is consistent with other results such as the Icelandic data.27 Regarding convergent validity, correlations between the domain scores of the CORE-OM and the BDI-II and SCL-90-R were strong except again for the Risk scores, which is consistent with the original UK data.36 In different studies, the CORE-OM has shown satisfactory convergent validity with other conceptually close measures, which supports its value as a wider general measure for psychotherapy outcome assessment.23,25,27,49 Comparative analysis showed significant differences between clinical and nonclinical populations in all domains, as in other validation studies, demonstrating discriminant validity across different countries and languages. The effect size (Cohen’s d) values were large for all domains. As in the original UK data, small but statistically significant correlations between scores and age were found in our study, more so in the nonclinical than the clinical samples. 
These seem likely to be genuine demographic associations, but the small effect size illustrates that age does not strongly and systematically contaminate scores. However, the majority of participants in the nonclinical sample were students (72%) with a very different age mean and range from the clinical population. Thus, larger replication studies with more diverse nonclinical samples are needed to ascertain the generalizability of these differences. Furthermore, a community sample of persons who exceed pensionable age, almost absent in these samples, would indicate whether specific norms are needed for older populations.17 In the analysis of sex differences in mean scores, only the Subjective well-being domain showed a statistically significant difference between men and women in both samples, with a small effect size in the same direction as the results reported for the UK version.15 According to the UK authors, sex should be considered in the interpretation of individual data regardless of clinical or nonclinical condition. In the Swedish and Italian studies, sex differences were very similar to those found in the UK study.23,25,36 However, it seems highly plausible that there will be sex effects, which may be culture specific. The strong and positive correlations between the domain scores are expected because the items of the CORE-OM are designed to evaluate related aspects of psychological distress, and the correlations found in this study are not dissimilar from those in all explorations to date, with Risk being the only scale showing low correlations with the others.5,23,25–27 This corroborates the special characteristics of the Risk domain,38 defined as an oblique factorial scale with fairly low positive correlation with the other items and domains, illustrating that Risk issues are, generally, rather distinct from other aspects of psychological distress. 
The items were designed more to provide flags of Risk than to form a robust scale, while ensuring that the crucial issue of Risk would contribute to the overall score, in contrast to many measures which omit it. The findings in this study, in the UK, and in all other translations studied so far fit that design. The cutoff scores obtained in our study are a little lower than those reported for the British and Lithuanian adaptations.26,36 Our values seem more similar to those found in the Italian version,23 with the exception of the Functioning domain, which again is lower in our data than in the others. It seems entirely plausible that cutoff scores, which reflect service provision (implicit in the separation of clinical and nonclinical populations), will show cultural/national variations. Currently, data about cutoff scores for reliable change are being collected, and we hope that the results will be published soon. Overall, the results provide very reassuring information about the psychometric properties and the potential of the Spanish version of the CORE-OM. The limitations of the study are the nonrandom sample frames, the relatively limited sample sizes, the lack of interview measures, and the relatively limited number of convergent validity tests. However, these results clearly support the use of the measure and justify the development of subsequent studies with the forms derived from this questionnaire. A Catalan translation has been completed, and psychometric exploration of it is currently in progress. Another translation into Spanish considered more suited for use in Argentina has been completed but not tested. Initial discussions with a few native speakers suggest that the Spanish version assessed in this article is considered acceptable for use in Chile, Mexico, and Colombia. Further exploration of its acceptability, and then of its psychometric properties, in Spanish-speaking countries other than Spain is encouraged. 
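As an illustration of how a clinical cutoff of the kind discussed above can be derived, the following minimal sketch assumes the Jacobson and Truax criterion c, in which the cutoff lies between the clinical and nonclinical means, weighted by the two standard deviations. This section does not state the formula actually used, so the choice of formula, the function name, and the input values are assumptions for illustration only and are not the study's data.

```python
def clinical_cutoff(mean_clinical, sd_clinical, mean_nonclinical, sd_nonclinical):
    """Cutoff between a clinical and a nonclinical score distribution.

    Assumed formula (Jacobson and Truax criterion c):
        c = (SD_clin * M_nonclin + SD_nonclin * M_clin) / (SD_clin + SD_nonclin)
    """
    if sd_clinical <= 0 or sd_nonclinical <= 0:
        raise ValueError("standard deviations must be positive")
    weighted_sum = sd_clinical * mean_nonclinical + sd_nonclinical * mean_clinical
    return weighted_sum / (sd_clinical + sd_nonclinical)


if __name__ == "__main__":
    # Hypothetical mean item scores on a 0-4 scale, NOT values from this study.
    cutoff = clinical_cutoff(
        mean_clinical=1.8, sd_clinical=0.7,
        mean_nonclinical=0.8, sd_nonclinical=0.5,
    )
    print(f"Illustrative cutoff: {cutoff:.2f}")
```

Because the formula depends on the means and standard deviations of both reference samples, cutoffs computed this way will shift with the composition of the clinical and nonclinical groups, which is in line with the cultural/national variation suggested above.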
In summary, this study presents the Spanish version of the CORE-OM, showing that it is a reliable and valid instrument for assessing psychological distress in patients and for providing feedback to their therapists about overall change and ongoing progress. An additional advantage of this instrument in all its versions, including Spanish, is that it can be used without payment of license fees, which should facilitate the generation of much more evidence about the efficacy and effectiveness of psychological therapies in Spain and in at least some other Spanish-speaking populations.
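The internal consistency (α) and effect size (Cohen's d) statistics referred to throughout this discussion can be sketched as follows. This is a minimal illustration using the standard textbook definitions (Cronbach's α from item and total-score variances, and Cohen's d with a pooled standard deviation). The exact variants used in the study are not stated in this section, and the function names and toy data are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete responses.

    rows: list of respondents, each a list of k item scores.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (an assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


if __name__ == "__main__":
    # Toy data only: four respondents answering three items scored 0-4.
    responses = [[0, 1, 0], [1, 1, 2], [3, 2, 3], [4, 4, 3]]
    print("alpha:", round(cronbach_alpha(responses), 3))

    # Toy clinical vs nonclinical domain scores.
    print("d:", round(cohens_d([2.0, 3.0, 4.0], [1.0, 2.0, 3.0]), 3))
```

When most respondents score at or near zero on every item, as is typical for the Risk items in nonclinical samples, the item and total-score variances shrink together, so α computed this way can drop even if the items are related in the population.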

23 in total

1. Measurement and psychotherapy. Evidence-based practice and practice-based evidence.

Authors: F R Margison; M Barkham; C Evans; G McGrath; J M Clark; K Audin; J Connell
Journal: Br J Psychiatry Date: 2000-08 Impact factor: 9.319

2. Service profiling and outcomes benchmarking using the CORE-OM: toward practice-based evidence in the psychological therapies. Clinical Outcomes in Routine Evaluation-Outcome Measures.

Authors: M Barkham; F Margison; C Leach; M Lucock; J Mellor-Clark; C Evans; L Benson; J Connell; K Audin; G McGrath
Journal: J Consult Clin Psychol Date: 2001-04

3. Early sudden gains in psychotherapy under routine clinic conditions: practice-based evidence.

Authors: William B Stiles; Chris Leach; Michael Barkham; Mike Lucock; Steve Iveson; David A Shapiro; Michaela Iveson; Gillian E Hardy
Journal: J Consult Clin Psychol Date: 2003-02

4. Sudden gains in cognitive therapy for depression: a replication and extension.

Authors: Gillian E Hardy; Jane Cahill; William B Stiles; Caroline Ispan; Norman Macaskill; Michael Barkham
Journal: J Consult Clin Psychol Date: 2005-02

5. Distribution of CORE-OM scores in a general population, clinical cut-off points and comparison with the CIS-R.

Authors: Janice Connell; Michael Barkham; William B Stiles; Elspeth Twigg; Nicola Singleton; Olga Evans; Jeremy N V Miles
Journal: Br J Psychiatry Date: 2007-01 Impact factor: 9.319

6. Evaluation of the psychometric properties of the Icelandic version of the Clinical Outcomes in Routine Evaluation-Outcome Measure, its transdiagnostic utility and cross-cultural validation.

Authors: Hafrún Kristjánsdóttir; Baldur Heiðar Sigurðsson; Paul Salkovskis; Daníel Ólason; Engilbert Sigurdsson; Chris Evans; Eva Dögg Gylfadóttir; Jón Friðrik Sigurðsson
Journal: Clin Psychol Psychother Date: 2013-10-29

7. Dimensions of variation on the CORE-OM.

Authors: K Jake Lyne; Paul Barrett; Chris Evans; Michael Barkham
Journal: Br J Clin Psychol Date: 2006-06

8. Factors associated with suicidal ideation in the general population: five-centre analysis from the ODIN study.

Authors: Patricia R Casey; Graham Dunn; Brendan D Kelly; Gail Birkbeck; Odd Stefan Dalgard; Ville Lehtinen; Sohlam Britta; Jose Luis Ayuso-Mateos; Christopher Dowrick
Journal: Br J Psychiatry Date: 2006-11 Impact factor: 9.319

9. Risk factors for suicidality in Europe: results from the ESEMED study.

Authors: M Bernal; J M Haro; S Bernert; T Brugha; R de Graaf; R Bruffaerts; J P Lépine; G de Girolamo; G Vilagut; I Gasquet; J V Torres; V Kovess; D Heider; J Neeleman; R Kessler; J Alonso
Journal: J Affect Disord Date: 2006-10-30 Impact factor: 4.839

10. Predictors of patient non-attendance at Improving Access to Psychological Therapy services demonstration sites.

Authors: Laura Di Bona; David Saxon; Michael Barkham; Kim Dent-Brown; Glenys Parry
Journal: J Affect Disord Date: 2014-08-12 Impact factor: 4.839
