
A brief, patient- and proxy-reported outcome measure in advanced illness: Validity, reliability and responsiveness of the Integrated Palliative care Outcome Scale (IPOS).

Fliss EM Murtagh¹,², Christina Ramsenthaler²,³, Alice Firth², Esther I Groeneveld², Natasha Lovell², Steffen T Simon⁴, Johannes Denzel³, Ping Guo², Florian Bernhardt³, Eva Schildmann³, Birgitt van Oorschot⁵, Farina Hodiamont³, Sabine Streitwieser³, Irene J Higginson², Claudia Bausewein³.

Abstract

BACKGROUND: Few measures capture the complex symptoms and concerns of those receiving palliative care. AIM: To validate the Integrated Palliative care Outcome Scale, a measure underpinned by extensive psychometric development, by evaluating its validity, reliability and responsiveness to change.
DESIGN: Concurrent, cross-cultural validation study of the Integrated Palliative care Outcome Scale - both (1) patient self-report and (2) staff proxy-report versions. We tested construct validity (factor analysis, known-group comparisons, and correlational analysis), reliability (internal consistency, agreement, and test-retest reliability), and responsiveness (through longitudinal evaluation of change). SETTING/PARTICIPANTS: In all, 376 adults receiving palliative care, and 161 clinicians, from a range of settings in the United Kingdom and Germany.
RESULTS: We confirm a three-factor structure (Physical Symptoms, Emotional Symptoms and Communication/Practical Issues). The Integrated Palliative care Outcome Scale shows strong ability to distinguish between clinically relevant groups; total and subscale scores were higher, reflecting more problems, in those patients with 'unstable' or 'deteriorating' versus 'stable' Phase of Illness (F = 15.1, p < 0.001). Good convergent and discriminant validity to hypothesised items and subscales of the Edmonton Symptom Assessment System and Functional Assessment of Cancer Therapy-General is demonstrated. The Integrated Palliative care Outcome Scale shows good internal consistency (α = 0.77) and acceptable to good test-retest reliability (60% of items κw > 0.60). Longitudinal validity in the form of responsiveness to change is good.
CONCLUSION: The Integrated Palliative care Outcome Scale is a valid and reliable outcome measure, both in patient self-report and staff proxy-report versions. It can assess and monitor symptoms and concerns in advanced illness, determine the impact of healthcare interventions, and demonstrate quality of care. This represents a major step forward internationally for palliative care outcome measurement.

Keywords: Outcome and process assessment; palliative care; patient-reported outcome measures; psychometrics; reliability; symptom assessment; validation studies

Year: 2019; PMID: 31185804; PMCID: PMC6691591; DOI: 10.1177/0269216319854264

Source DB: PubMed; Journal: Palliat Med; ISSN: 0269-2163; Impact factor: 4.762

What is already known about the topic?

Validated measures of symptoms and quality-of-life already exist for palliative care. Symptom measures do not capture broader concerns of those with advanced illness, such as information needs, family, practical issues, and so on. Quality-of-life measures are often weighted towards symptoms and function, and may be less relevant and applicable in far-advanced illness. Few existing measures have a validated proxy-report version.

What this paper adds?

The Integrated Palliative care Outcome Scale (IPOS) is shown to be clinically meaningful, with good validity, reliability, and responsiveness to change. Both patient self-report and staff proxy-report versions of IPOS are evaluated, allowing maximum flexibility for clinical use.

Implications for practice

IPOS is a comprehensive and psychometrically robust measure underpinned by what patients report as their main concerns, and brief enough to be used in advanced illness. This is a major advance for palliative care outcome measurement internationally.

Background

Healthcare systems around the world face major challenges because of increasing numbers of older people with multi-morbidities, and growing need for palliative care.[1] The increase in chronic diseases – now accounting for a third of all deaths globally – as well as population ageing, is responsible for these changes.[2,3] More people need palliative care,[3] with some estimates exceeding 40 million/year worldwide.[1] To meet current challenges, unexplained variations in healthcare quality need to be addressed through improving outcomes.[4] This can only be achieved with measurement of individual patient-centred outcomes.[5] Patient-reported outcome measures (PROMs) are validated questionnaires completed by patients to measure their perceptions of their own health status/wellbeing.[6] Available PROMs tend to be illness-specific,[7-10] rather than capturing common concerns in advanced illness, regardless of diagnosis. Most PROMs focus on disease control or complications,[11] rather than the concerns of patients. One of the few outcome measures which does capture the full range of concerns prioritised by those with advanced illness themselves, is the Palliative care Outcome Scale (POS).[12] POS was developed >15 years ago, has been psychometrically well tested,[13-16] and is widely used.[17-19] It is a brief, person-centred outcome measure,[20] yet incorporates the main concerns that people with advanced illness themselves prioritise.[21] Although evidence from POS users demonstrated its value in practice and research, there was need for a more refined version (e.g. incorporating more symptoms, refining the spiritual/existential item for diverse populations) to aid utility.[22,23] Having cognitively tested a refined version of POS, the Integrated Palliative care Outcome Scale (IPOS),[24] we therefore aimed to evaluate the validity, reliability, and responsiveness to change of the patient self-report and staff proxy-report IPOS. 
IPOS covers patients’ main concerns, common symptoms, patient/family distress, existential well-being, sharing feelings with family or friends, information received, and practical concerns, within a timeframe of 3 days (for inpatient settings) or 7 days (for ambulatory settings).

Methods

We are following established quality criteria[25] to refine and validate IPOS. The first steps, involving cognitive testing, are published;[24] this involved cultural adaption and cognitive interviewing in both English and German,[24] to establish a final IPOS in both languages (available for download at www.pos-pal.org). Here, we report testing of the construct validity, reliability, and responsiveness of this final version of IPOS, in both languages. The design is a multi-centre validation study of two versions of IPOS – (a) patient self-report and (b) staff proxy-report.

Population and settings

Patients receiving palliative care were consecutively recruited from eight UK and five German sites: (a) three UK and one German hospital consultation services, (b) five UK and three German in-patient palliative units, and (c) seven UK and two German community (home-based) palliative services. Staff caring for participating patients were also recruited. Data were gathered between June 2014 and January 2016. Inclusion/exclusion criteria: Inclusion criteria for patient participants were: >18 years of age, capacity for written informed consent (as judged by the clinical team), and able to speak/read English or German. Exclusion criteria for patient participants were as follows: impaired capacity, too unwell or distressed to participate (as judged by their clinical teams); or unable to understand English or German. Inclusion criterion for staff participants was delivering care for a patient participant. Staff participants scored the research measures independently of the corresponding patient participant.

Data collection

Demographic/clinical information for patient participants at baseline included age, gender, ethnicity, marital status, if living alone, presence/absence of family caregiver(s), performance status using Australia-modified Karnofsky Performance Status (AKPS),[26] and primary diagnosis. Data (see Table 1 for data collection measures) were collected at two time-points; 2–5 days apart within in-patient settings, and 7–21 days apart in community settings. For patient participants, IPOS (patient version, 3-day recall period), Edmonton Symptom Assessment System–revised (ESASr)[30-32] and the Functional Assessment of Cancer Therapy-General (FACT-G)[33] were collected at first time-point, and IPOS (patient-version) and global change question (see Table 1)[34,40] were collected at the second time-point. For staff participants, the staff-version of IPOS, the Support Team Assessment Schedule (STAS),[41] AKPS[26] and Phase of Illness[38] were collected at the first time point, and IPOS (staff-version), AKPS, and Phase of Illness were collected at the second time point.

Table 1.

Study measures.

Measure	Details of measure	Background, validation and references
The Integrated Palliative care (or Patient) Outcome Scale (IPOS). In this study, reported by both patient and staff participants independently, at T1 and T2	IPOS combines the items from the Palliative care Outcome Scale (POS) and those from its symptom list (POS and POS-S) into one integrated measure. There are two versions of IPOS: patient self-report and staff proxy-report. The IPOS versions consist of 20 (patient version) or 19 (staff version) items: one free-text question about main problems and concerns, and 17 items on physical, psychological, spiritual problems, communication needs including with family, and practical support. These are scored on a 5-point Likert-type scale from 0 (best) to 4 (worst). One additional free-text item asks about additional symptoms to be specified and scored. In the patient version, there is also an additional item reporting by whom the measure was completed (patient alone, with family help, with staff help). Only the 17 standardised items contribute to the IPOS total score. The full IPOS measure is available for download at www.pos-pal.org.	The original Palliative care Outcome Scale (POS) included 10 items covering domains most important to patients with advanced illness.[12] Following patient and staff feedback, a symptom module (POS-S, adapted for specific conditions) was added. Staff versions of POS and POS-S were also developed.[27] Both patient- and staff-reported versions of POS and POS-S have undergone extensive psychometric testing. Both measures show good validity and internal consistency[12,20] as well as test–retest reliability[12] in diverse settings.[13–20] Factor analysis suggests two subscales, psychological well-being and quality of care.[15] It has been translated, culturally adapted and validated for use in fourteen languages[16,17,28,29] and is widely used internationally.[18,19,23] Phase 1 of this IPOS validation study included cognitive interviewing to assess acceptability and content/face validity, and identify cognitive processing issues; this has been reported previously.[24]
The Edmonton Symptom Assessment System revised (ESASr). In this study, reported by patient participants, at T1	ESAS consists of nine visual analogue scales, scored from 0–10, including pain, shortness of breath, nausea, depression, activity, anxiety, well-being, drowsiness and appetite. Initially, ESAS was developed to measure the most common symptoms in cancer patients. Higher scores indicate worse symptoms.	The revised ESAS has been widely validated for use in assessing the symptoms of patients with advanced progressive illness.[30–32]
The Functional Assessment of Cancer Therapy – General (FACT-G). In this study, reported by patient participants, at T1	FACT-G comprises 27 items, divided into four primary quality of life domains: Physical well-being (7 items), Social/family well-being (7 items), Emotional well-being (6 items), and Functional well-being (7 items). Each item is measured on a 5-point Likert-type scale. Higher scores indicate better functioning.	Initially developed as a quality-of-life measure for evaluating patients receiving cancer treatment, it has been widely validated across long-term conditions.[33]
Global change question. In this study, reported by patient participants, at T2	Single item asking patient participants to report overall change in their symptoms and concerns: ‘Over the last three days would you say that things have got better/worse/there has been no change’.	A single global ‘change’ question recommended for assessing responsiveness of patient-reported outcome measures.[34]
The Support Team Assessment Schedule (STAS). In this study, reported by staff participants, at T1	STAS represents the first available staff-rated palliative care clinical assessment, comprising 9 core and up to 20 optional items covering physical, psychosocial, spiritual, communication, planning, family concerns and service aspects. We used the supplementary definitions and ratings for individual symptoms as described in Clinical Audit in Palliative Care.[35]	It is a validated tool designed to allow clinical staff to assess the clinical and intermediate outcomes of palliative care.[36,37]
Phase of Illness. In this study, reported by staff participants, at T1 and T2	Single-item staff-reported measure to provide the context of the current phase of illness, with four categories: stable, unstable, deteriorating, and terminal.	Phase of Illness categorises seriously ill patients according to the acuteness and urgency of palliative needs, and has been used as a predictor of resource use for Australian sub-acute and non-acute healthcare.[38] It shows good inter-rater reliability and clinical utility in populations with advanced progressive illness.[38,39]
Australia-modified Karnofsky Performance Status (AKPS). In this study, reported by staff participants, at T1 and T2	A single score between 0% and 100% (in 10% steps) based on a patient’s ability to perform common tasks relating to activity, work and self-care. A score of 100% signifies normal physical abilities with no evidence of disease. Decreasing numbers indicate reduced performance status.	The Australia-modified Karnofsky Performance Status (AKPS) is based on the Karnofsky Performance Status, but is adapted for advanced illness.[26]


Analysis

All data were independently double-entered and cross-checked. If not otherwise stated, missing data were excluded pairwise for all analyses. Descriptive statistics were used to describe the sample, and the range/distribution of scores for individual IPOS items, IPOS subscales (as derived from the factor analysis), and total scores.

(a) Structural validity: We undertook confirmatory factor analysis (CFA) to establish structural validity and subscales. Taking prior factor analyses into account,[15,16] we contrasted a 2-factor with a 3-factor solution. We used robust maximum likelihood estimation to accommodate the ordinal nature of the data.[42] Fit of each solution was evaluated using chi-square, ratio of chi-square and degrees of freedom, comparative fit index, Tucker-Lewis index and root mean square error of approximation (RMSEA).[43] Contrasting models were compared regarding fit indices, standardized parameter estimates, and local strains (low loadings, high standard error).[44]

(b) Known-group comparisons: We hypothesized from prior evidence[38] that those with ‘unstable’ or ‘deteriorating’ Phase of Illness would have (i) a higher total IPOS score (more symptoms/concerns) and (ii) higher physical symptom scores on IPOS, than those with ‘stable’ Phase of Illness. Those with lower function (AKPS) would have (iii) higher total IPOS scores and (iv) higher physical symptom scores on IPOS.[45]

Construct validity

Non-parametric tests were used after checking of assumptions: the Kruskal–Wallis H test for hypotheses (i) and (ii), and the Mann–Whitney U test for (iii) and (iv).

(c) Convergent and discriminant validity was tested by correlating individual IPOS items and subscales with respective items and subscales from patient-reported ESAS-r[32] and FACT-G[33] measures, using Spearman’s correlation coefficients with associated p-values. We hypothesized high correlations (r > 0.70) between identical or near-identical single items relating to the physical/psychological symptoms from ESAS and IPOS, and mid-range correlations (0.5 ⩽ r ⩽ 0.7) between (i) total ESAS scores (which include only symptoms) and (ii) FACT-G total/subscale scores (not covering spiritual, practical and family issues as in IPOS), with total IPOS scores (including domains beyond symptoms).

Reliability

(a) Internal consistency was estimated using Cronbach’s α for IPOS total scores and subscales. The normally accepted threshold for good internal consistency of 0.8[25,46] was lowered to 0.6 due to the multi-dimensional, non-redundant nature of the IPOS.[47]

(b) Test–retest reliability was determined among those patients reporting no change on the global change rating.[34]

(c) Inter-rater reliability was assessed between independent patient and staff ratings, and between two independent staff ratings. Cohen’s weighted kappa (κ) was calculated as the reliability statistic, together with the proportion agreement of cases where staff or patient ratings were equal to or within +1 or −1 of the score, and Spearman correlation to test the association between patient/staff or two independent staff ratings. For interpretation, the Landis and Koch[48] and Fleiss’[49] criteria of κ > 0.4 for fair to good and κ > 0.75 for substantial to excellent agreement were used.
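The internal-consistency statistic named above can be illustrated with a short sketch. This is not the authors' SPSS procedure; it is a minimal pure-Python version of the standard Cronbach's α formula, assuming complete cases only (the function name `cronbach_alpha` and the toy data are hypothetical).

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents.

    rows: list of lists, one inner list of k item scores per respondent.
    Assumption: complete cases only (no missing item scores).
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of the summed score
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total score has zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy data (hypothetical): three items that move together perfectly
consistent = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [4, 4, 4]]
# Toy data (hypothetical): two items that are unrelated
unrelated = [[0, 0], [0, 1], [1, 0], [1, 1]]
print(cronbach_alpha(consistent), cronbach_alpha(unrelated))
```

Items that rise and fall together give α near 1, while unrelated items give α near 0, which is why a deliberately non-redundant, multi-dimensional scale can justify a lower threshold.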

Responsiveness to change

We assessed responsiveness using a distribution-based approach.[50] We compared mean changes and respective standard deviations of change descriptively in the six categories of change given by the global change rating (ranging from much better to much worse with a ‘don’t know’-category). All analyses were conducted using SPSS version 24.0.[51] The R lavaan package (version),[52] was used for CFA. A p-value < 0.05 was considered statistically significant for all analyses.

Sample size

Sample size considerations were based on guidelines for sample sizes for factor analysis[53] and a Monte Carlo study, determining the sample size to detect factor loadings of at least 0.40 (based on former factor analytic evidence from the POS),[15,16] a power of 80% and an alpha level of 0.05. Simulations were run using the R simsem package,[54] using 10,000 replications and assuming missing data to be handled within a full information maximum likelihood approach. A sample size of 320 was deemed sufficient for modelling.

Results

Subject characteristics

In all, 392 patient participants were recruited. Screened, eligible, approached and consented participants, and first (n = 376) and second (n = 275) timepoint completion, with reasons for non-completion, are shown in Appendix Figure 1. Demographic and clinical characteristics are reported in Table 2.

Table 2.

Demographic and clinical characteristics for all patient participants (n = 376).

Variable	n	%
Setting
Hospital inpatient	180	47.9
Hospice inpatient	72	19.1
Hospital outpatient	5	1.3
Community (home-based)	95	25.3
Respite (in-patient)	13	3.5
Missing	11	2.9
Country
Germany	154	40.4
United Kingdom	222	59.6
Socio-demographic details
Age	Mean 65.8 (median: 67)	(SD 13.2; range 20–93)
<65 years	157	41.6
⩾65 years	219	58.4
Gender
Men	174	46.3
Women	187	49.7
Missing	15	4.0
Ethnic origin
White	342	91
Black African or Black Caribbean	8	2.1
Asian	4	1.1
Mixed ethnic background	1	0.3
Missing	21	5.6
Marital status
Single	36	9.6
Married	208	55.3
Divorced or separated	56	14.9
Widowed	57	15.2
Missing	19	5.1
Having a carer	202	53.7
Living alone	132	35.1
Disease factors
Phase of illness
Stable	164	43.6
Unstable	129	34.3
Deteriorating	52	13.8
Dying	1	0.3
Missing	30	8.0
Primary diagnosis
Cancer	292	77.7
Digestive organs	82	21.8
Respiratory tract	47	12.5
Genitourinary tract	63	16.8
Breast	29	7.7
Lymph/Haematopoietic	15	4.0
Other cancers[a]	56	14.9
Non-cancer	57	15.2
COPD	24	6.4
Stroke, MND	11	2.9
HIV/AIDS	2	0.5
Renal failure	3	0.8
Liver failure	7	1.9
Heart failure	2	0.5
Other[b]	8	2.2
Missing	27	6.7
Functional status
Australia-modified Karnofsky performance status
Mean (SD, range)	56.8	(SD 15.8; range 0–90)
0–50	150	39.8
60–100	219	58.2
Missing	7	1.8
IPOS completion
Completed IPOS alone	162	43.1
Completed IPOS with family help	37	9.8
Completed IPOS with staff[c] help	168	44.7
Missing	9	2.4
Time between assessment 1 and 2 (in days)	Mean 6.6 (median 4)	(SD 7.6; range 1–62)

COPD: chronic obstructive pulmonary disease; MND: motor neurone disease; IPOS: Integrated Palliative care Outcome Scale.

[a] Other cancers comprised cancers of lip/oral cavity/pharynx, skin, brain and central nervous system (CNS), and multiple sites.

[b] Non-specified or other non-cancer disease.

[c] Not staff participants in the study.


Descriptive statistics and distribution

Table 3 shows prevalence for IPOS items, distribution of IPOS scores, and % of missing data, at first timepoint. The full range of response options was used; only the items ‘Vomiting’, ‘Having enough information’ and ‘Practical matters’ showed positive skew above ±1.0. There was little missing data; the highest percentages of missing data were for ‘Poor appetite’ (3.5%), ‘Family anxiety’ (2.4%), ‘Vomiting’ (2.1%), ‘Practical matters’ (2.1%), ‘Having enough information’ (1.9%), and ‘Feeling at peace’ (1.9%).

Table 3.

Descriptive statistics and distribution for IPOS items at timepoint 1 (n = 376).

	Prevalence[a]	95% CI	Not at all (0)	Slight (1)	Moderate (2)	Severe (3)	Overwhelming/all the time (4)	Missing
	%		%	%	%	%	%	%
Physical symptoms
1 – Pain	62.3	57.4–67.2	17.8	18.6	30.3	26.1	5.9	1.3
2 – Shortness of breath	40.8	35.8–45.8	31.9	26.1	22.9	12.0	5.9	1.3
3 – Weakness or lack of energy	81.7	77.8–85.6	4.5	13.3	31.4	37.5	12.8	0.5
4 – Nausea	29.0	24.4–33.6	46.5	23.4	14.9	10.6	3.5	1.1
5 – Vomiting	14.6	11.0–18.2	73.1	10.1	7.2	6.1	1.3	2.1
6 – Poor appetite	48.9	43.9–53.9	27.4	20.2	22.6	18.9	7.4	3.5
7 – Constipation	42.2	37.2–47.2	39.9	16.5	19.1	16.5	6.6	1.3
8 – Sore or dry mouth	55.3	50.3–60.3	23.7	19.9	25.3	22.3	7.7	1.1
9 – Drowsiness	64.9	60.1–69.7	14.6	19.9	33.8	25.0	6.1	0.5
10 – Poor mobility	77.4	73.2–81.6	8.5	12.8	23.4	34.3	19.7	1.3
Emotional symptoms
11 – Patient anxiety	71.0	66.4–75.6	13.6	14.4	29.5	25.3	16.2	1.1
12 – Family anxiety	84.8	81.2–88.4	6.9	5.9	17.0	33.2	34.6	2.4
13 – Depression	51.9	46.9–56.9	27.7	19.7	27.4	16.8	7.7	0.8
14 – Feeling at peace	72.1	67.6–76.6	8.8	17.0	18.9	34.8	18.4	2.1
Communication/practical issues
15 – Sharing feelings	75.0	70.6–79.4	7.7	16.0	14.1	25.0	35.9	1.3
16 – Information	83.5	79.8–87.3	5.6	9.0	12.0	32.4	39.1	1.9
17 – Practical matters	28.7	24.1–33.3	42.8	26.3	16.0	6.6	6.1	2.1

IPOS: Integrated Palliative care Outcome Scale.

[a] Prevalence was defined as any IPOS symptoms/concerns specified as moderate, severe or overwhelming.
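The prevalence definition in Table 3 can be checked with a small sketch. The paper does not state how the 95% confidence intervals were computed; the normal-approximation (Wald) interval used here is an assumption, although it reproduces the ‘Pain’ row. The function name `prevalence_with_ci` is hypothetical.

```python
import math


def prevalence_with_ci(pct_moderate, pct_severe, pct_overwhelming, n, z=1.96):
    """Prevalence (%) of scores >= moderate, with an approximate 95% CI.

    Assumption: Wald (normal-approximation) interval; the source does not
    name the interval method. Percentages are taken over all n participants.
    """
    p = (pct_moderate + pct_severe + pct_overwhelming) / 100.0
    half_width = z * math.sqrt(p * (1 - p) / n)
    return (
        round(100 * p, 1),
        round(100 * (p - half_width), 1),
        round(100 * (p + half_width), 1),
    )


# 'Pain' row of Table 3: moderate 30.3%, severe 26.1%, overwhelming 5.9%, n = 376
pain = prevalence_with_ci(30.3, 26.1, 5.9, n=376)
print(pain)
```

For the ‘Pain’ row this gives a prevalence of 62.3% with limits of about 57.4% and 67.2%, matching the tabulated values.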


Structural validity, identification of subscales, and internal consistency

An initial CFA was conducted to test for uni-dimensionality, using a model in which all 17 scorable IPOS items load onto one latent variable. As expected for IPOS (a multi-dimensional measure), the goodness-of-fit indices of this initial CFA indicate inadequate fit, with the comparative fit index (CFI = 0.55) well below and the RMSEA (0.11) well above the recommended cut-offs (see Appendix Table 1). The three-factor solution showed a better fit than the two-factor solution (see standardised factor loadings in Appendix Table 2). The first factor, Physical Symptoms, comprises 10 items and explains 24.9% of variance. The second factor, Emotional Symptoms, consists of 4 items and explains 12.3% of variance. The third factor, Communication/Practical Issues, contains 3 items and explains 8.3% of variance. (We use these three factors throughout the analysis as subscales: the Physical, Emotional, and Communication/Practical subscales.) Total and subscale statistics are shown in Table 4. Cronbach’s α was 0.77, showing good internal consistency for the total scale. For the IPOS subscales, Cronbach’s α was 0.70, 0.68, and 0.58, respectively.

Table 4.

Descriptive statistics and distribution for IPOS total and subscale scores at timepoint 1 (n = 376).

Total and subscale scores	# items	Range	Mean	SD	Skew	α[a]	Eigenvalue	% variance
IPOS Total Score	17	3–50	27.4	9.3	−.05	.77
IPOS Physical symptoms	10	1–33	15.8	6.1	−.01	.70	3.5	24.9
IPOS Emotional symptoms	4	0–16	8.1	3.6	−.16	.68	1.7	12.3
IPOS Communication/Practical Issues	3	0–12	3.4	2.7	.64	.58	1.2	8.3

IPOS: Integrated Palliative care Outcome Scale.

[a] Cronbach’s alpha coefficient of internal reliability.
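The total and subscale scores in Table 4 follow from summing the 17 standardised items (each scored 0 to 4) within the three factors. A minimal scoring sketch is given below, using the item numbering of Table 3. It assumes a complete set of item scores; how missing items should be handled is not specified here, so the sketch simply rejects incomplete input. The function name `score_ipos` is hypothetical.

```python
# Item numbering follows Table 3 of this paper
PHYSICAL = range(1, 11)        # items 1-10
EMOTIONAL = range(11, 15)      # items 11-14
COMMUNICATION = range(15, 18)  # items 15-17


def score_ipos(items):
    """Sum IPOS item scores into subscale and total scores.

    items: dict mapping item number (1-17) to a score from 0 (best) to 4 (worst).
    Assumption: complete cases only; incomplete input raises ValueError.
    """
    if set(items) != set(range(1, 18)):
        raise ValueError("expected scores for items 1 to 17")
    if any(v not in (0, 1, 2, 3, 4) for v in items.values()):
        raise ValueError("each item must be scored 0, 1, 2, 3 or 4")
    physical = sum(items[i] for i in PHYSICAL)
    emotional = sum(items[i] for i in EMOTIONAL)
    communication = sum(items[i] for i in COMMUNICATION)
    return {
        "physical": physical,                      # possible range 0-40
        "emotional": emotional,                    # possible range 0-16
        "communication_practical": communication,  # possible range 0-12
        "total": physical + emotional + communication,  # possible range 0-68
    }


# Hypothetical respondent scoring 'moderate' (2) on every item
example = score_ipos({i: 2 for i in range(1, 18)})
print(example)
```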


Construct validity

One-way analysis of variance supported our prior hypothesis that total IPOS scores and IPOS subscale scores were higher in those patients with unstable or deteriorating Phase of Illness compared to stable Phase of Illness (F = 15.1, p < 0.001 for total IPOS and F = 17.8 and 5.7, p < 0.003 for IPOS Physical and IPOS Emotional symptoms, respectively) (see Appendix Figure 2). The total IPOS (t = 2.8, p = 0.006), IPOS Physical Symptoms subscale (t = 3.8, p < 0.001), and individual IPOS items ‘Shortness of breath’, ‘Weakness or lack of energy’, ‘Drowsiness’, ‘Poor mobility’, ‘Family anxiety’, ‘Depression’ and ‘Information’ were all able to distinguish between those patients with higher versus lower functional status on Australia-modified Karnofsky Performance Status (60%–100% vs 0%–50%) (see Appendix Table 5). Because of skewed data, these comparisons were also run using equivalent non-parametric tests, with highly similar results. Convergent and discriminant validity assessment also comprised testing a series of hypotheses for how IPOS subscales and single items correlate with single items, subscales and total scores of ESAS and FACT-G. Correlations were confirmed, being in the hypothesised range of magnitude and direction (see Appendix Tables 3 and 4).

Reliability

In all, 66 patients self-classified as stable between the two timepoints. This was confirmed using the staff-reported ‘Phase of Illness’. For these 66 stable patients, test–retest reliability weighted kappa values showed good to very good agreement (range 0.50–0.8), except for the items ‘Sharing feelings with family or friends’ (κ = 0.20), ‘Having enough information’ (κ = 0.39), ‘Feeling at peace’ (κ = 0.43), and ‘Drowsiness’ (κ = 0.43). The proportion agreement within one score between assessments was generally good to excellent with these four items being the only ones with proportions below 80% (see Table 5). Note that Cohen’s weighted kappa was calculated using all answer options, and for each item of the IPOS, plus all subscale and total scores of the IPOS.

Table 5.

IPOS item	Test–retest			Inter-rater: two staff			Inter-rater: patient–staff
	n	κw	%	n	κw	%	n	κw	%
Pain	66	0.49	81.8	92	0.72	91.3	348	0.59	87.1
Shortness of breath	66	0.78	92.4	91	0.82	92.3	345	0.62	86.1
Weakness or lack of energy	66	0.54	86.4	92	0.25	84.8	350	0.29	82.3
Nausea	66	0.75	94.0	91	0.63	94.5	346	0.46	81.2
Vomiting	66	0.62	89.4	90	0.61	92.2	342	0.54	88.3
Poor appetite	66	0.65	89.4	89	0.46	82.0	339	0.34	74.9
Constipation	66	0.69	89.4	86	0.41	80.2	342	0.47	77.5
Sore or dry mouth	66	0.79	90.9	85	0.49	84.7	343	0.25	65.1
Drowsiness	66	0.43	77.3	88	0.21	80.7	350	0.11	60.6
Poor mobility	66	0.71	86.4	91	0.46	83.5	348	0.42	74.4
Patient anxiety	66	0.64	80.3	95	0.44	87.4	347	0.35	75.2
Family anxiety	66	0.60	87.9	54	0.27	81.5	283	0.34	79.2
Depression	66	0.65	83.3	90	0.52	86.7	348	0.38	75.9
Feeling at peace	66	0.43	77.3	82	0.45	82.9	330	0.26	72.4
Sharing feelings	66	0.20	69.7	70	0.34	80.0	308	0.13	68.8
Information	66	0.39	75.8	77	0.14	70.1	332	0.02	70.2
Practical matters	66	0.55	83.3	59	0.20	77.9	317	0.10	68.5
Total IPOS	66	0.81	86.4	29	0.64	79.3	209	0.39	69.4
IPOS Physical Symptoms	66	0.76	89.4	94	0.57	81.9	355	0.39	72.1
IPOS Emotional Symptoms	66	0.67	80.3	95	0.45	64.2	351	0.38	64.4
IPOS Communication/Practical Issues	66	0.44	68.2	88	0.23	67.0	347	0.13	54.5

IPOS: Integrated Palliative care Outcome Scale.

Reliability assessment: weighted kappa (κw) and proportion agreement within one score for test–retest reliability, inter-rater agreement between two independent staff ratings and inter-rater agreement between patient and staff ratings at timepoint 1. IPOS: Integrated Palliative care Outcome Scale.

For the assessment of inter-rater agreement between two independent staff, a maximum of 95 matched pairs per IPOS item was available. Agreement as measured by weighted kappa scores was good (κ ⩾ 0.4) for 11 of 17 IPOS items, with the highest levels of agreement being achieved for the items ‘Pain’ (κ = 0.72), ‘Shortness of breath’ (κ = 0.82) and ‘Nausea’ (κ = 0.63). Lower levels of agreement were observed for items ‘Weakness or lack of energy’, ‘Drowsiness’, ‘Family anxiety’, ‘Sharing feelings with family or friends’, ‘Has the patient had enough information as s/he wanted?’ and ‘Have any practical matters resulting from his or her illness been addressed?’. Analysis of the standard error of measurement for these items with low levels of agreement showed that errors for these items were close to 1 point on the IPOS.

The comparison of staff and patient ratings yielded similar results, with acceptable to good agreement (κ ⩾ 0.3) achieved on 11 of 17 IPOS items and the highest levels of agreement for ‘Pain’, ‘Shortness of breath’, ‘Vomiting’, and ‘Constipation’. Again, items ‘Having had enough information’ (κ = 0.02), ‘Have practical matters been addressed?’ (κ = 0.10), and ‘Sharing feelings with family or friends’ (κ = 0.13) showed low levels of agreement, together with the items ‘Drowsiness’ (κ = 0.11), ‘Feeling at peace’ (κ = 0.26) and ‘Sore or dry mouth’ (κ = 0.25) (see Table 5). However, the proportion of scores that were within one score of a perfect match was still high (around 70% or above) for these items, except for ‘Drowsiness’ (60.6%) and ‘Sore or dry mouth’ (65.1%).
The proportion with agreement between patient and staff ratings was higher at the second assessment (see Appendix, Table 7).
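The two agreement statistics reported in Table 5 can be sketched as follows. The weighting scheme for Cohen's weighted kappa is not stated in this paper; linear weights are assumed here, and the function names and toy ratings are hypothetical. This is an illustration of the general technique, not the authors' SPSS analysis.

```python
def weighted_kappa(a, b, categories=5):
    """Cohen's weighted kappa for two raters scoring 0..categories-1.

    Assumption: linear disagreement weights |i - j| / (categories - 1);
    the source does not say whether linear or quadratic weights were used.
    """
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    observed = [[0] * categories for _ in range(categories)]
    for x, y in zip(a, b):
        observed[x][y] += 1
    row_totals = [sum(observed[i]) for i in range(categories)]
    col_totals = [sum(observed[i][j] for i in range(categories))
                  for j in range(categories)]
    obs_disagreement = 0.0
    exp_disagreement = 0.0
    for i in range(categories):
        for j in range(categories):
            w = abs(i - j) / (categories - 1)
            obs_disagreement += w * observed[i][j] / n
            exp_disagreement += w * row_totals[i] * col_totals[j] / (n * n)
    if exp_disagreement == 0:
        raise ValueError("kappa undefined: both raters used a single category")
    return 1 - obs_disagreement / exp_disagreement


def agreement_within_one(a, b):
    """Proportion of paired ratings that are equal or differ by one point."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(abs(x - y) <= 1 for x, y in zip(a, b)) / len(a)


# Hypothetical paired ratings on the 0-4 IPOS response scale
patient = [0, 1, 2, 3, 4, 2]
staff = [1, 1, 2, 4, 4, 0]
print(weighted_kappa(patient, staff), agreement_within_one(patient, staff))
```

Reporting both statistics is useful because kappa corrects for chance agreement and can be low when most ratings cluster in a few categories, whereas agreement within one point shows how often raters are close in practical terms.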

Responsiveness

Table 6 presents the mean changes in the IPOS total score in relation to patients’ global report of change. SD at baseline for the total IPOS score was 9.2. Mean change scores for the total score were as large as 4.3 in the ‘much improved’ group and even larger (−9.6) for the group that described themselves as ‘much worse’. However, associated standard deviations of change were comparably large, pointing towards potential misclassification according to the global change criterion. Overall, a change of about 5 points in the total IPOS score represents a moderate effect size.

Table 6.

Mean total IPOS score changes (between T1–T2) by global change scale (a negative change score indicates deterioration).

Total IPOS	n	Mean change ± SD of change
Things have got…
Much better	28	4.3 ± 6.1
A little better	90	3.0 ± 7.5
No change	55	1.7 ± 6.7
I don’t know	3	−2.3 ± 13.5
A little worse	24	−0.3 ± 8.1
Much worse	5	−9.6 ± 8.0

IPOS: Integrated Palliative care Outcome Scale.
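
The statement that a change of about 5 points is a moderate effect can be checked with a short worked example. It assumes the effect size is the mean change divided by the baseline SD of 9.2, and it uses the conventional Cohen thresholds (0.2, 0.5, 0.8) for the labels; the text confirms neither the exact formula nor the thresholds, and the function names are hypothetical.

```python
BASELINE_SD = 9.2  # SD of the total IPOS score at baseline, as reported in the text

# Mean change in total IPOS score per global-change group, taken from Table 6.
MEAN_CHANGE = {
    "Much better": 4.3,
    "A little better": 3.0,
    "No change": 1.7,
    "I don't know": -2.3,
    "A little worse": -0.3,
    "Much worse": -9.6,
}


def effect_size(change, baseline_sd=BASELINE_SD):
    """Standardised effect size: change divided by baseline SD (assumed definition)."""
    return change / baseline_sd


def label(es):
    """Conventional Cohen labels, applied to the magnitude (assumed thresholds)."""
    magnitude = abs(es)
    if magnitude < 0.2:
        return "negligible"
    if magnitude < 0.5:
        return "small"
    if magnitude < 0.8:
        return "moderate"
    return "large"


for group, change in MEAN_CHANGE.items():
    es = effect_size(change)
    print(f"{group:16s} change={change:5.1f}  effect size={es:5.2f}  ({label(es)})")

# A 5-point change is 5 / 9.2, roughly 0.54 baseline SDs.
print(round(effect_size(5.0), 2), label(effect_size(5.0)))
```

Under these assumptions a 5-point change falls in the moderate band, in line with the text, while the ‘much worse’ group exceeds one baseline SD.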


Discussion

IPOS is a valid and reliable outcome measure for use with people with advanced illness. It has good structural validity, with three underlying factors – physical symptoms, emotional symptoms, and communication/practical issues. Unusually, this validation study included a high proportion of people with poor functional status, strengthening conclusions for the advanced illness population. IPOS discriminates clearly between different palliative Phases of Illness.[38,39] The physical symptom subscale also discriminates between those with poor and those with high functional status. Almost all individual IPOS items show good agreement when re-tested in stable patients. There is acceptable or good agreement between most patient self-reported and staff proxy-reported items. Most importantly, the total IPOS score showed a change in keeping with patient-report of the overall change in their symptoms and other concerns, both in direction and size of change.

The changing age distribution of the population and increasing prevalence of multi-morbidities (with more complex health needs)[55] require outcome measures to work across conditions and in advanced illness.[56] Only measures that reflect patient priorities can support a truly patient-centred approach to care.[57] Until now, outcome measures extending beyond symptoms or quality-of-life for this population have been lacking. Health-related quality-of-life measures, often heavily based on physical function, show low sensitivity with large floor/ceiling effects among those with advanced, multi-morbid disease and do not capture the main priorities of those affected.[58-64]

The IPOS has features which set it apart from other outcome measures commonly used in the context of advanced disease. Including how symptoms or other concerns have affected the individual themselves is a distinct characteristic not commonly sought in quality-of-life or symptom tools.[65] The ESAS,[31] the M.D. Anderson Symptom Inventory (MDASI),[66] the Symptom Distress Scale[67] and the Palliative care Problem Severity Score (PCPSS)[38] all focus on severity of symptoms. With the exception of PCPSS, they score physical and psychological symptoms, excluding concerns such as family issues (family anxiety, sharing feelings with family or friends), spirituality, practical issues, information needs, and communication concerns.[58-64] While existing tools are well-validated,[41,67-71] proxy-reported versions are much less well established.[72,73] IPOS may also capture the impact of symptoms and concerns differently.

In terms of overall validity, the performance of the new, refined IPOS is comparable to both the original POS and similar measures in the field. In the original POS validation,[12] mid-range correlations with the EORTC QLQ-C30 physical, non-physical and quality-of-life subscales were reported. Mid-range correlations were also apparent in comparison of POS with the Rotterdam Symptom Checklist,[74] the Italian POS with the FACIT-SP and EORTC QLQ-C15-Pal,[55] the POS-S with the Rotterdam checklist, the MDASI and EORTC measures,[12,55,74] and in this validation study when comparing IPOS to FACT-G and ESAS. This result – of mid-range rather than high correlations – is likely because existing scales largely focus on the severity of symptoms, whereas POS and IPOS focus on how a person is affected. In line with this, comparison of the POS pain item with the Brief Pain Inventory’s pain impairment item yielded a higher correlation than comparison with the Pain Severity item.[74] The mid-range correlations (⩽0.50) between aspects of psychological well-being across questionnaires further demonstrate the different dimensions included in IPOS, covering wider issues of spiritual and family well-being.
The consistently low correlations of the communication/practical items with other outcome measures across studies point towards the uniqueness of this aspect.[12,55,74,75]

The second distinct feature of this validation of IPOS is the broader testing of reliability. In terms of test–retest reliability, we found mostly good to very good agreement (weighted kappa values 0.50–0.80). These values are higher than in similar studies of test–retest reliability of either POS[12,55,74,75] or ESAS,[31,70,76] perhaps explained by using an external criterion to judge stability, rather than assuming stability over 24–48 h, an assumption that might not be justified in a fast-changing palliative population.[58] However, some items of IPOS (‘Sharing feelings with family or friends’, ‘Having enough information’, ‘Feeling at peace’, and ‘Drowsiness’) showed less agreement. These items also showed low agreement in the comparison of patient and staff ratings. This is consistent with prior studies of the biases affecting proxy assessments,[77] which suggest systematic overestimation of physical symptoms and underestimation of psychological well-being and information needs.[78-81] The low agreement for the information item had also been observed in studies of POS; Higginson and Gao[82] reported a weighted kappa value as low as 0.04 for this item, and results by Dawber et al.[78] and Van Soest-Poortvliet et al.[83] are similar. A study of the Palliative care Problem Severity Score[38] identified features of the raters (e.g. new staff member with new patient), patient characteristics (e.g. communication problems, drowsiness), or family characteristics (e.g. lacking interaction with family), as impeding agreement. This result has also been observed in a study looking at proxy ratings of the McGill Quality of Life questionnaire.[79] These features may also have been present in the IPOS validation study.
Fluctuating symptoms, in particular drowsiness, may also contribute to this, as demonstrated for the comparable ESAS item.[84,85]

This validation was cross-cultural, conducted in two countries simultaneously, and we believe this strengthened both cognitive development of the measure[24] and this full-scale validation. Despite some skewness in distributions of individual items, both parametric and non-parametric statistics showed highly similar and robust results, both in terms of correlations and tests of differences.

However, there were limitations in the population studied; only 15% of patient participants had non-cancer conditions. IPOS needs to be further tested in non-cancer conditions, and refinements for different diseases may be required. Indeed, development of a version for use in cognitive impairment and dementia is already well under way.[86] The use of consecutive enrolment ensured IPOS was validated in a broadly clinically representative group. However, selection bias cannot be ruled out. This validation also mirrored conditions for IPOS use in clinical practice – the ‘least controlled’ use – for instance with absence of specific staff training prior to implementation.[87] Despite incorporation of a global criterion for change, it was not possible to derive values for a minimal clinically important change for improvement or deterioration for the total IPOS, its subscales, and individual items. Such a feature is desirable, as optimal cut-offs for individual symptoms in particular can trigger specific clinical actions, such as referral to a palliative care team, help triage patients within services[88] and therefore extend the clinical and research utility of an outcome measure.[72,89,90]

Clinical and research implications

This study has demonstrated that IPOS is valid and reliable. Because it is brief and underpinned by the symptoms and concerns of people with advanced illness, it will be invaluable for clinical practice and research. Implementing such a measure in routine clinical practice requires training. A recent survey of the use of ESAS showed a range of training needs and other barriers.[85] We are already working on the best ways to implement it, using the national Outcomes Assessment and Complexity Collaborative in the United Kingdom. This is based on the well-established Australian Palliative Care Outcomes Collaborative.[91]

Conclusion

The IPOS is a valid and reliable outcome measure for use with people with advanced illness, both in its patient self-report and staff proxy-report versions. It is suitable for assessing and monitoring symptoms and concerns in advanced illness, monitoring change over time, determining the impact of healthcare interventions, and demonstrating quality of care. This will be invaluable for clinical care, audit and research, and represents a major step forward internationally for outcome measurement in advanced illness.

Supplemental material, 854264_supp_mat1 for A brief, patient- and proxy-reported outcome measure in advanced illness: Validity, reliability and responsiveness of the Integrated Palliative care Outcome Scale (IPOS) by Fliss EM Murtagh, Christina Ramsenthaler, Alice Firth, Esther I Groeneveld, Natasha Lovell, Steffen T Simon, Johannes Denzel, Ping Guo, Florian Bernhardt, Eva Schildmann, Birgitt van Oorschot, Farina Hodiamont, Sabine Streitwieser, Irene J Higginson and Claudia Bausewein in Palliative Medicine

76 in total

1. The PRISMA Symposium 1: outcome tool use. Disharmony in European outcomes research for palliative and advanced disease care: too many tools in practice.

Authors: Richard Harding; Steffen T Simon; Hamid Benalia; Julia Downing; Barbara A Daveson; Irene J Higginson; Claudia Bausewein
Journal: J Pain Symptom Manage Date: 2011-10 Impact factor: 3.612

2. Health care providers' use and knowledge of the Edmonton Symptom Assessment System (ESAS): is there a need to improve information and training?

Authors: Daniela Carli Buttenschoen; Jarad Stephan; Sharon Watanabe; Cheryl Nekolaichuk
Journal: Support Care Cancer Date: 2013-09-13 Impact factor: 3.603

Review 3. The Edmonton Symptom Assessment System: a 15-year retrospective review of validation studies (1991--2006).

Authors: Cheryl Nekolaichuk; Sharon Watanabe; Crystal Beaumont
Journal: Palliat Med Date: 2008-03 Impact factor: 4.762

4. Comparative fit indexes in structural models.

Authors: P M Bentler
Journal: Psychol Bull Date: 1990-03 Impact factor: 17.737

5. What matters most in end-of-life care: perceptions of seriously ill patients and their family members.

Authors: Daren K Heyland; Peter Dodek; Graeme Rocker; Dianne Groll; Amiram Gafni; Deb Pichora; Sam Shortt; Joan Tranmer; Neil Lazar; Jim Kutsogiannis; Miu Lam
Journal: CMAJ Date: 2006-02-28 Impact factor: 8.262

6. How to analyze palliative care outcome data for patients in Sub-Saharan Africa: an international, multicenter, factor analytic examination of the APCA African POS.

Authors: Richard Harding; Lucy Selman; Victoria M Simms; Suzanne Penfold; Godfrey Agupio; Natalya Dinat; Julia Downing; Liz Gwyther; Barbara Ikin; Thandi Mashao; Keletso Mmoledi; Lydia Mpanga Sebuyira; Tony Moll; Faith Mwangi-Powell; Eve Namisango; Richard A Powell; Frank H Walkey; Irene J Higginson; Richard J Siegert
Journal: J Pain Symptom Manage Date: 2012-09-24 Impact factor: 3.612

7. Responsiveness and minimally important difference for the patient-reported outcomes measurement information system (PROMIS) 20-item physical functioning short form in a prospective observational study of rheumatoid arthritis.

Authors: Ron D Hays; Karen L Spritzer; James F Fries; Eswar Krishnan
Journal: Ann Rheum Dis Date: 2013-10-04 Impact factor: 19.103

8. A critical look at transition ratings.

Authors: Gordon H Guyatt; Geoffrey R Norman; Elizabeth F Juniper; Lauren E Griffith
Journal: J Clin Epidemiol Date: 2002-09 Impact factor: 6.437

Review 9. Ageing populations: the challenges ahead.

Authors: Kaare Christensen; Gabriele Doblhammer; Roland Rau; James W Vaupel
Journal: Lancet Date: 2009-10-03 Impact factor: 79.321

10. Psychometric properties of instruments to measure the quality of end-of-life care and dying for long-term care residents with dementia.

Authors: Mirjam C van Soest-Poortvliet; Jenny T van der Steen; Sheryl Zimmerman; Lauren W Cohen; Maartje S Klapwijk; Mirjam Bezemer; Wilco P Achterberg; Dirk L Knol; Miel W Ribbe; Henrica C W de Vet
Journal: Qual Life Res Date: 2011-08-05 Impact factor: 4.147
