
Measuring the Patient Experience of Mental Health Care: A Systematic and Critical Review of Patient-Reported Experience Measures.

Sara Fernandes¹, Guillaume Fond¹, Xavier Yves Zendjidjian¹, Karine Baumstarck¹, Christophe Lançon¹, Fabrice Berna², Franck Schurhoff², Bruno Aouizerate², Chantal Henry², Bruno Etain², Ludovic Samalin², Marion Leboyer², Pierre-Michel Llorca², Magali Coldefy³, Pascal Auquier¹, Laurent Boyer¹.

Abstract

BACKGROUND: There is growing concern about measuring patient experience with mental health care. There are currently numerous patient-reported experience measures (PREMs) available for mental health care, but there is little guidance for selecting the most suitable instruments. The objective of this systematic review was to provide an overview of the psychometric properties and the content of available PREMs.
METHODS: A comprehensive review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines was conducted using the MEDLINE database with no date restrictions. The content of PREMs was analyzed using an inductive qualitative approach, and the methodological quality was assessed according to the Pesudovs quality criteria.
RESULTS: A total of 86 articles examining 75 PREMs and totaling 1932 items were included. Only four PREMs used statistical methods from item response theory (IRT). The 1932 items covered seven key mental health care domains: interpersonal relationships (22.6%), followed by respect and dignity (19.3%), access and care coordination (14.9%), drug therapy (14.1%), information (9.6%), psychological care (6.8%) and care environment (6.1%). Additionally, a few items focused on patient satisfaction (6.7%) rather than patient experience. No instrument covered the latent trait continuum of patient experience, as defined by the inductive qualitative approach, and the psychometric properties of the instruments were heterogeneous.
CONCLUSION: This work is a critical step in the creation of an item library to measure mental health care patient-reported experience that will be used in France to develop, validate, and standardize item banks and computerized adaptive testing (CAT) based on IRT. It will also provide internationally replicable measures that will allow direct comparisons of mental health care systems. TRIAL REGISTRATION: NCT02491866.


Keywords: bipolar disorder; health services research; major depression; patient experience; patient satisfaction; patient-reported experience measures; schizophrenia; systematic review

Year: 2020 PMID: 33192054 PMCID: PMC7653683 DOI: 10.2147/PPA.S255264

Source DB: PubMed Journal: Patient Prefer Adherence ISSN: 1177-889X Impact factor: 2.711

Background

Providing high-quality care is a priority for all health systems worldwide; however, a recent report highlights that the quality of mental health care remains lower than that of other medical disciplines.1,2 The current care organization is not adequate to address mental disorders (eg, schizophrenia, bipolar disorder and major depression) that emerge as a major health disparity category.2–6 Patients with mental disorders have a marked decrease in life expectancy (eg, approximately 14 years on average for patients with schizophrenia).7 They are confronted with persistent gaps in access to and receipt of mental health care.4 In particular, they are faced with misdiagnosis, which can lead to inappropriate or delayed treatment and, consequently, poor health outcomes.8 The major challenges for mental health care include inadequate treatments and the underuse of guidelines,9–14 as well as health care variation among geographical regions,15 stigma and discrimination,16–18 and poor adherence to treatment by patients.19 Quality measurement is fundamental for improving the quality of mental health care and identifying where changes are needed, and it requires appropriate measurement methods. 
It is currently established that patients’ experience is an important measure of health care quality,20–22 and the use of patient-reported experience measures (PREMs) is recommended.23 PREMs report information on patients’ views of their experience while receiving care.24 They are most commonly in the form of questionnaires.25 Respondents are asked to provide detailed reports on what actually occurred during a specific care episode, rather than an evaluation of what occurred,26 to determine the extent to which care is patient-centered.27,28 There is evidence of an association between a more positive patient experience and improved health care outcomes.28–31 Many PREMs in mental health have been developed in recent decades, but there is little guidance for selecting the most suitable instruments. To date, systematic reviews have focused on satisfaction instruments,32,33 which is a limited approach to patient experience, or on PREMs but in a non-exhaustive way.34 Given the growing number of PREMs and the need for using them in clinical settings, the objectives of this systematic review were to 1) identify all available PREMs designed to measure the mental health care experience of adult patients, 2) provide an overview of their content and psychometric properties, and 3) critically analyze the methodological quality of these instruments using a set of pre-established robust criteria.

Methods

Search Strategy

A comprehensive review of the published peer-reviewed literature was conducted using the MEDLINE bibliographic database, with no date restrictions. Our research was limited to articles written in English and articles reporting on the development and/or validation process of mental health care quality assessment instruments. The reference lists of the selected articles were screened to find additional instruments that were not identified in the initial literature search. In addition, studies describing translations or revisions were retrieved to check references to the original instrument development. Articles that only addressed the use of an instrument were excluded. The authors also used online resources to inform this review. The research strategy was conceptualized as a combination of the context of use (ie, mental health or psychiatry), what is being measured (patient experience or satisfaction) and the study design (development and/or validation process of an instrument). This search key used a compilation of MeSH terms and free-text words, using Boolean operators, as follows: (“patient satisfaction” OR “consumer satisfaction” OR “client satisfaction” OR “patient experience” OR “patients experience” OR “patient experiences” OR “patients experiences” OR “patient reported experience” OR “patient reported experience measure” OR “PREM” OR “PREMs”) AND (“psychiatry” OR “psychiatry”[Mesh] OR “psych*” OR “mental” OR “Mental Health Services”[Mesh]) AND (“tool*” OR “instrument*” OR “score*” OR “scale*” OR “survey*” OR “questionnaire*” OR “measure*”) AND (“development” OR “validation” OR “psychometric” OR “psychometrics” OR “psychometrics”[Mesh]). This review was performed in accordance with the preferred reporting items for systematic reviews and meta-analysis (PRISMA) guidelines.35
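The structure of the search key (terms joined by OR within each concept, concepts joined by AND) can be sketched as follows. This is only an illustrative reconstruction: the terms are copied from the search key above, but the names `CONCEPT_BLOCKS` and `build_query` are hypothetical, and the snippet only assembles the query string; it does not submit it to MEDLINE.

```python
# Illustrative reconstruction of the search key described above.
# CONCEPT_BLOCKS and build_query are hypothetical names, not from the article.
CONCEPT_BLOCKS = [
    # What is being measured: patient experience or satisfaction
    ['"patient satisfaction"', '"consumer satisfaction"', '"client satisfaction"',
     '"patient experience"', '"patients experience"', '"patient experiences"',
     '"patients experiences"', '"patient reported experience"',
     '"patient reported experience measure"', '"PREM"', '"PREMs"'],
    # Context of use: mental health or psychiatry
    ['"psychiatry"', '"psychiatry"[Mesh]', '"psych*"', '"mental"',
     '"Mental Health Services"[Mesh]'],
    # Type of instrument
    ['"tool*"', '"instrument*"', '"score*"', '"scale*"', '"survey*"',
     '"questionnaire*"', '"measure*"'],
    # Study design: development and/or validation
    ['"development"', '"validation"', '"psychometric"', '"psychometrics"',
     '"psychometrics"[Mesh]'],
]


def build_query(blocks):
    """Join terms with OR inside each block and join the blocks with AND."""
    return " AND ".join("(" + " OR ".join(terms) + ")" for terms in blocks)


query = build_query(CONCEPT_BLOCKS)
print(query)
```

Keeping each concept in its own list makes it easy to rerun the search with one block removed, for example without the satisfaction terms.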

Study Selection

Eligibility Criteria

Articles had to meet the following eligibility criteria to be included in this review. The inclusion criteria were as follows: (i) articles dealing with the process of development and/or validation of any instrument intended to be used and/or applicable in the context of mental health care; (ii) adult participants, regardless of their care setting; (iii) instruments designed to capture the experience of patients/service users; and (iv) studies written in English. This means that any study describing, at least in part, the operationalization of the construct, item development, pretesting or psychometric analyses was included. The exclusion criteria were as follows: (i) instruments specifically designed for the elderly or children and adolescents; (ii) changes or cultural adaptation of an already existing instrument; (iii) instruments not self-reported by patients; (iv) articles addressing an ad hoc instrument; (v) instruments developed for specific care (eg, home care, nursing care, residential care); (vi) review articles, editorials, discussions and opinion papers, and conference proceedings; and (vii) articles written in a language other than English.

Selection of Studies

The articles identified by the search key were carefully reviewed by two independent authors (SF and LB). These articles were first screened according to their titles and abstracts, and those that did not meet the eligibility criteria were eliminated. The full text was retrieved and reviewed when the decision could not be made on the basis of the title or abstract or when the assessment was discordant between the two examiners. In the latter case, when a consensus could not be reached, a third author (GF) was consulted to reach an agreement. The reference lists of articles eligible for inclusion in this review were also screened.

Data Extraction

Data were extracted separately by two independent authors (SF and LB). Excel was used to collect all the relevant information from the included articles using a predefined data extraction form. The following data were extracted for each instrument: general data (author(s) and year of publication, name and abbreviation of the instrument, country and language of origin, study objective(s), characteristics and size of the sample, administration method), structure (number of items, number and labels of dimensions/factors, time frame, response scale), development characteristics (viewpoints and sources for item development) and some psychometric properties (reliability and construct validity).

Content Analysis of the Instruments

The content of the instruments included in this review was analyzed using an exploratory qualitative approach. In the absence of a recognized and validated theoretical framework,36,37 we used an inductive approach,38 which consists of developing a conceptual framework from the raw data. This method makes it possible to move from specific data to more general categories of meaning without being driven by predetermined theoretical assumptions. To do this, all collected items were carefully examined and coded. Codes sharing a relationship of meaning were iteratively grouped into a limited number of categories with distinct and meaningful content. Each category was then reviewed and named according to the characteristic words it covered. This approach enabled us to examine the relative weight of each dimension by taking into account that some items could be classified into different categories, eg, “I received information about treatment options for my mental health problems”39 could fit in the “information” and “medication” dimensions. This strategy allowed us to identify the dimensions most commonly covered by the range of instruments currently available in the mental health context.

Quality Assessment

The criteria used to assess the quality of the instruments are derived from the Quality Assessment Criteria framework developed by Pesudovs et al.40 Originally designed to perform a standardized assessment of the quality of the development process and the psychometric properties of patient-reported outcome measures (PROMs), Pesudovs’ criteria proved to be relevant for evaluating PREMs.41 These criteria are presented in Table 1. Each instrument was independently rated by two authors (SF and LB) as positive (✓✓), acceptable (✓) or negative (X) against each criterion. When consensus could not be reached, a third author (GF) was consulted.

Table 1

Quality Criteria

Property	Definition	Quality Criteria
Instrument development
Pre-study hypothesis and intended population	Specification of the hypothesis pre-study and if the intended population have been studied	✓✓: Clear statement of aims and target population, as well as intended population being studied in adequate depth; ✓: Only one of the above or generic sample studied; X: Neither reported
Actual content area (face validity)	Extent to which the content meets the pre-study aims and population	✓✓: Content appears relevant to the intended population; ✓: Some relevant content areas missing; X: Content area irrelevant to the intended population
Item identification	Items selected are relevant to the target population	✓✓: Evidence of consultation with patients, stakeholders and experts (through focus groups/one-to-one interview) and review of literature; ✓: Some evidence of consultation; X: Patients not involved in item identification
Item selection	Determining of final items to include in the instrument	✓✓: Rasch or factor analysis employed, missing items and floor/ceiling effects taken into consideration. Statistical justification for removal of items; ✓: Some evidence of above analysis; X: Nil reported
Unidimensionality	Demonstration that all items fit within an underlying construct	✓✓: Rasch analysis or factor loading for each construct. Factor loadings >0.4 for all items; ✓: Cronbach’s alpha used to determine correlation with other items in instrument. Value >0.7 and <0.9; X: Nil reported
Response scale	Scale used to complete the measure	✓✓: Response scale noted and adequate justification given; ✓: Response scale with no justification for selection; X: Nil reported
Instrument performance
Convergent validity	Assessment of the degree of correlation with a related measure	✓✓: Tested against appropriate measure, Pearson’s correlation coefficient between 0.3 and 0.9; ✓: Inappropriate measure, but coefficient between 0.3 and 0.9; X: Nil reported or tested and correlates <0.3 or >0.9
Discriminant validity	Degree to which an instrument diverges from another instrument that it should not be similar to	✓✓: Tested against appropriate measure, Pearson’s correlation coefficient <0.3; ✓: Inappropriate measure, but coefficient <0.3; X: Nil reported or tested and correlates >0.3
Predictive validity	Ability for a measure to predict a future event	✓✓: Tested against appropriate measure and coefficient >0.3; ✓: Inappropriate measure but coefficient >0.3; X: Nil reported or tested and correlates <0.3
Test-retest reliability	Statistical technique used to estimate components of measurement error by testing comparability between two applications of the same test at different time points	✓✓: Pearson’s r value or ICC >0.8; ✓: Measured but Pearson’s r value or ICC <0.8; X: Nil reported
Responsiveness	Extent to which an instrument can detect clinically important differences over time	✓✓: Discussion of responsiveness and change over time. Score changes > MID over time; ✓: Some discussion but no measure of MID; X: Nil reported

Notes: ✓✓-positive rating, ✓-acceptable rating, X-negative rating.

Abbreviations: ICC, intraclass correlation coefficient; MID, minimally important difference.
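As an illustration of how two of the Table 1 criteria translate into a rating rule, the sketch below encodes the test-retest reliability and convergent validity rows. The function names are hypothetical, and the handling of boundary values (a coefficient of exactly 0.8, or exactly 0.3 or 0.9) is an assumption, since Table 1 does not specify it.

```python
# Hypothetical helpers encoding two rows of Table 1; not the authors' code.

def rate_test_retest(coefficient):
    """Rate test-retest reliability from Pearson's r or ICC.

    None means the property was not reported. A value of exactly 0.8 is
    treated as acceptable rather than positive (assumption).
    """
    if coefficient is None:
        return "X"
    return "✓✓" if coefficient > 0.8 else "✓"


def rate_convergent(coefficient, appropriate_measure=True):
    """Rate convergent validity from a Pearson correlation coefficient.

    The 0.3 and 0.9 bounds are treated as inclusive (assumption).
    """
    if coefficient is None or not (0.3 <= coefficient <= 0.9):
        return "X"
    return "✓✓" if appropriate_measure else "✓"


print(rate_test_retest(0.85), rate_convergent(0.5, appropriate_measure=False))
```

In the review itself, ratings were assigned by two authors with a third consulted on disagreement; a rule like this could at most serve as a consistency check on the numeric criteria.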


Results

The literature search produced a total of 693 potentially relevant scientific articles (last access: August 6th, 2019), and 11 additional articles were identified by further sources, for a total of 704 articles. These articles were first sorted according to the relevance of their titles and abstracts, leading to the exclusion of 577 references that were not relevant. The full text of the remaining 127 articles was retrieved. Of these, 56 articles were excluded because they did not meet the inclusion criteria. Thus, following this first stage, 71 articles remained. The reference lists of these articles were then reviewed, and 15 additional articles were included. See Figure 1 for details on the literature selection process.
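The counts reported in this paragraph can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers are those given in the text.

```python
# Article counts taken from the paragraph above; variable names are illustrative.
identified_database = 693          # MEDLINE search
identified_other_sources = 11      # further sources
excluded_title_abstract = 577      # not relevant on title/abstract
excluded_full_text = 56            # did not meet inclusion criteria
added_from_reference_lists = 15    # found by screening reference lists

total_identified = identified_database + identified_other_sources
full_text_assessed = total_identified - excluded_title_abstract
retained_first_stage = full_text_assessed - excluded_full_text
included = retained_first_stage + added_from_reference_lists

print(total_identified, full_text_assessed, retained_first_stage, included)
```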

Figure 1

PRISMA flow chart.

Note: Adapted from Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group (2009). Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. J Clin Epidemiol. 62(10):1006–1012.35

The search yielded a total of 86 articles examining 75 instruments16‑39‑42‑124 (see to view the characteristics of these instruments).

General Data

The instruments included in this review were published between 197975 and 2018.39 Most of these instruments were from the United States (n=23) and the United Kingdom (n=15), followed by Australia (n=6), Sweden (n=5), Canada (n=4), France (n=4), Germany (n=3), Norway (n=3), Italy (n=3), the Netherlands (n=2), Thailand (n=2), Iran (n=1), Ireland (n=1), Belgium (n=1) and Ethiopia (n=1). Furthermore, one instrument was used simultaneously in several countries, namely, the US, Japan and Italy.65 Sixty-one instruments were self-administered (81.3%), and 14 were designed to be administered during an interview (18.7%).43,44,49,52,60,65,68,82,89,93,94,97,108,112,113,117 Most of the scales specifically targeted mental health service users (89.3%), while 8 were generic and applicable to mental health care.43–46,57,58,71,75,111,114,115 Of these 75 included instruments, 24 were designed for inpatient and residential settings (32.0%),50,51,56,59,64,68,71,74,78–80,84,85,90,95,100,101,104–106,108,118–121,123 including two that were specific to the forensic setting79,104; one instrument was developed in two versions, one for civil inpatients and the other for the forensic setting.108 Thirty-four instruments were designed for community-based services (45.3%).43,46,49,53,54,58,60,63,65,73,76,77,81,83,86,87,89,91,94,96,97,99,105,107,109,111,113,116,117 Seventeen instruments have proven useful for both inpatients and outpatients (22.7%).16,39,42,47,48,52,66,67,69,72,75,87,98,102,103,110,114,115,122,124,125 Among them, some instruments were only validated in specific populations: three in patients with schizophrenia,85,98,102,124 one in bipolar patients,88 one in depressed patients63 and one in bipolar or psychotic patients.67
The time frame for administering the instrument was reported for 29 instruments (38.7%). Twenty were designed for completion before leaving the hospital: one was delivered 6 to 7 days after admission,64 one at the end of a group therapy session (normally after the first week of admission),119 one 1 week before discharge,85 seven on the day of discharge,50,51,74,90,106,118,120 one 1 day before discharge,56 and nine unspecified,53,59,67,78,84,95,121 two of which were designed to be completed near the patient’s discharge, generally within 24 to 72 hours68 or 24 to 48 hours80 before leaving. Two instruments were designed to be administered after discharge,71,125 including one within 1 month of discharge.71 Another instrument was designed to be administered both before and after discharge.66 Among the instruments designed for use in outpatient or community services, one instrument was designed to be completed 3 months after psychotropic drug change,88 two were designed to be completed before leaving the clinic,46,107 one was designed to be completed at the end of the initial visit122 and one was designed to be completed at home.63 Additionally, one instrument was administered at different times depending on the agencies.103

Instruments’ Structure

The number of dimensions varied from 1 (Patient Evaluation of Care-5 (PEC-5), Client Satisfaction Questionnaire (CSQ-8), Mental Health Service Satisfaction Scale (MHSSS), Satisfaction Index – Mental Health (SI-MH), Patient Satisfaction with Psychotropic (PASAP), Consumer Evaluation of Mental Health Services (CEO-MHS), Reassurance Questionnaire (RQ))50,75,82,87–89,111 to 11 (Survey of Health care Experiences of Patients (SHEP)).125 The number of dimensions was determined using statistical methods for 51 instruments. Among them, one instrument used a non-parametric Mokken analysis,69 while the others used exploratory or confirmatory factor analyses. Alternatively, 19 instruments established their dimensionality based on a conceptual framework drawn from the literature without using statistical methods to confirm their structure.16,47,48,51,52,59,66,71,78,86,94,95,99,100,102,109,117,118,122,124,125 The number of items ranged from 5 (PEC-5,50 Helping Alliance Scale (HAS)97) to 84 (Thai Psychiatric Satisfaction Scale (TPSS)).96 The mean and mode were 26.1 (SD=17.4) and 20, respectively. Twenty-six instruments (48.0%) presented a combination of positively and negatively worded items.42,43,46–48,53,54,59,60,67,71,73,81,82,86,87,89,93,97,99,102,108,110,116,124 Most instruments used a Likert-type scale, though the response options varied between the instruments: the majority had an odd number of response options (52.0%), among which 35 had a 5-point Likert scale, two had a 7-point Likert scale, and two had a 3-point Likert scale. Seventeen instruments had a balanced rating scale (22.7%), among which 15 had a 4-point Likert scale and two had a 6-point Likert scale.
One instrument used a dichotomous format,65 and 17 had combined response modalities (22.7%),39,45,51,52,54,56,58,60,68,69,71,92,95,97,102,109,124,125 two of which used a visual analogue scale.69,97 One instrument did not provide information about the response scale used.74 In addition, some scales also offered open-ended questions to capture additional qualitative information.46–48,59,64,66,76,77,84,86,95,122

Generation Process

Evidence of patient involvement varied between instruments. Some instruments were developed from a single perspective, while others used a combined approach (literature review and/or patients’ and/or professionals’ perspectives). Patients may have been involved in all phases of instrument development to ensure both content and face validity. In other instances, patients may have only taken part in the refinement process to ensure face validity of the scale. In this case, patients may have been asked to evaluate the understanding, relevance, clarity, acceptability and usefulness of the instrument in a pretest phase prior to larger-scale administration. Patients may also have been included in the item development process (through interviews or focus groups), but the instrument was not pretested in a subsequent phase. Fifty-six instruments have involved patients in some way (74.7%).16,39,42–48,50–55,57–64,66,68,71–74,76–84,86,88–90,93,94,96,98,99,102,105–107,109,110, 112–115,117,118,120–123 The majority of the instruments were designed using a combined approach (54.7%),16,39,43–52,54,55,60–64,66–74,76–78,80,82–84,86,90,92–94,98,109,112,113,117,118,122 while 28 instruments were developed from a single perspective (37.3%): 16 were drawn from a literature review,56,75,85,87,88,96,101,101–104,108,114–116,119,123–125 10 were designed from the patients’ perspective42,57–59,79,81,89,99,105–107,121 and 2 from the professional/expert or other perspectives.53,110 Six instruments (8.0%) did not report any information on the development process.65,91,95,97,111,120

Psychometric Properties

Psychometric properties were assessed and reported with varying levels of evidence. These findings were not available for 9 out of the 75 instruments (12%).51,52,65,78,86,95,97,116,118 Only four papers used statistical methods from item response theory (IRT),50,62,73,121 while the others used classical test theory (CTT). Reliability measured by internal consistency was documented for 61 instruments and was the most commonly used approach. Cronbach’s alpha coefficient was within the acceptable value range (0.70–0.90) for only 19 instruments.16,45,47,48,50,53,66,67,69,71,74,87,89,92,104,108,109,112,113,119 One instrument did not provide the values but indicated that all scales had reached the recommended value of 0.70.117 Of the 41 instruments that had a Cronbach’s alpha outside this interval (54.7%), 15 instruments had a total scale (or at least one domain) where the value failed to reach the recommended threshold of at least 0.70,43,44,46,54,56,60,63,64,68,72,79,93,100,101,107,111,125 and 31 instruments had at least one alpha value exceeding 0.90, which can indicate item redundancy.42,46,49,54,57–63,73,75,79–85,126,90,94,96,98,102,103,106,107,114,115,120–122,127 Fourteen instruments did not assess this property (18.7%).39,51,52,65,76–78,86,91,95,97,99,110,116,118 The range for all included instruments was 0.35 (SHEP)125 to 0.96 (PCQ-H, TPSS, VSSS-EU, QPC–IP)73,96,102,106 for the total scale scores or by dimension. In addition, stability over time was also examined using test–retest estimates for 20 instruments (26.7%).16,47,48,59,61,62,64,67,76,77,81–83,87,96,99,102,110–115,117,124,127 The questionnaires were administered a second time within a time interval ranging from 1 day to 2 weeks and this information was not available for five instruments (25%). The stability of results over time was globally acceptable for the majority of instruments (75%), while it was very good for 5 instruments (20%). 
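For readers unfamiliar with the internal consistency statistic discussed above, the following sketch computes Cronbach's alpha with its standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), and classifies the result against the 0.70–0.90 interval used in this review. The function names and the response data are hypothetical, and the included studies may have computed alpha with other software or conventions.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the respondents' total scores
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def classify_alpha(alpha):
    """Place alpha relative to the 0.70-0.90 interval used in the review."""
    if alpha < 0.70:
        return "below 0.70"
    if alpha > 0.90:
        return "above 0.90 (possible item redundancy)"
    return "within 0.70-0.90"


# Hypothetical responses: 4 respondents, 3 items on a 5-point scale
responses = [[4, 5, 4], [2, 3, 3], [5, 5, 4], [3, 3, 2]]
alpha = cronbach_alpha(responses)
print(round(alpha, 2), classify_alpha(alpha))
```

With the extreme values reported above, `classify_alpha(0.35)` falls below the threshold and `classify_alpha(0.96)` falls in the range that may signal redundancy.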
Sixty-five instruments (86.7%) reported elements to support construct validity, but these data were often incomplete. Indeed, among these, 51 investigated the structure of the instruments by using either exploratory or confirmatory factor analysis39,42–46,49,50,53,54,56–58,60–64,68,72–77,79–85,87–91,93,96,98,103,104,106–108,110–115,119–121,123 or a Mokken analysis69 (68.0%), and 37 tested inter-item, item-dimension, dimension-dimension and item-total correlations (49.3%).16,42–46,49,50,53,54,60–63,66,67,71–75,80–82,85,88,93,96,100,101,103,104,106,107,114,115,117,119,122,123 Convergent validity (also miscalled concurrent validity in some cases) was assessed for 31 instruments,16,43,44,47–49,57–59,63,67,69,71,73,75–77,79,80,85,88,90,92–94,103,104,108,109,112–115,117,119,121,122 while only 6 reported some evidence of divergent validity.73,104,109,111,121,122 Among the latter, strong evidence was found for three instruments,73,104,109 while the others did not explore this property in relation to another established instrument.111,121,122 Moreover, one instrument provided conclusions that contradicted the theorized relationships.111 Some aspects of criterion-related validity were examined, and eight instruments reported elements of predictive validity (10.7%).45,72,74,82,90,109,111,119 Finally, a preliminary examination of the concept of responsiveness was only undertaken for three instruments (4.0%).87,88,111

Content of the Instruments

The inductive qualitative analysis of the 1932 items identified seven key domains that underlie the concept of quality of mental health care from the patient’s perspective. The most represented dimension was interpersonal relationships (22.6%), followed by respect and dignity (19.3%), access and care coordination (14.9%), drug therapy (14.1%), information (9.6%), psychological care (6.8%) and care environment (6.1%). Additionally, a few items focused on patient satisfaction (6.7%) rather than patient experience.

Discussion

This work provides for the first time a description and a critical analysis of all available PREMs for mental health care, regardless of care setting and conditions. The multitude of instruments identified in this review has shown that they differ in scope, content and psychometric robustness. This wide range of instruments is an obstacle when choosing the most appropriate assessment instrument, which has important implications for the accuracy of the quality of care measurement. Although it is recognized that the assessment of these psychometric properties is essential to support the performance of an assessment instrument,128,129 some of them are not systematically evaluated. Some instruments demonstrated a satisfactory development process and psychometric properties, while others did not meet the recommended criteria. Thus, our work provides strong evidence that professionals should choose PREMs that best suit their needs.
Beyond supporting the choice of PREMs, our work leads us to frame our discussion around the distinction between two broad categories of measures of patient-centered care: patient experience and patient satisfaction. The instruments selected in our study combine these two related but distinct concepts.22,127,130,131 Patient satisfaction is commonly used by health care facilities as a measure of the quality of care from the patients’ perspective.132–135 However, patient satisfaction has been the subject of much controversy due to a tendency to obtain satisfaction rates with significant ceiling effects,136 which calls into question the validity of the results.136–139 This tendency is partly related to the design of satisfaction surveys, which are based on respondents’ expectations and subjective perceptions.127,132 Hence, two patients who receive the same care but who have different expectations may not express the same degree of satisfaction.
On the other hand, high satisfaction expressed by a patient may not reflect an optimal care experience,22,136 and conversely, some patients may express dissatisfaction that may reflect inappropriate or clinically unfeasible expectations rather than suboptimal care.22,140 Patient experience is now recognized as the preferred approach for measuring the quality of care and services and has been increasingly adopted by many countries.141–144 This measure overcomes the bias of satisfaction surveys by reintroducing the objective component into its evaluation.145 To do this, the questions are based on a detailed report that covers all aspects of the patient’s experience to reflect their actual care experience. In this sense, they provide more accurate and relevant information for monitoring and improving health services and care.22 However, there is considerable misunderstanding about what these two concepts refer to, and researchers tend to use them interchangeably.130,133 Our findings illustrate the difficulty of distinguishing satisfaction and experience measures among available instruments.33,146 First, when the initial literature search was conducted without including satisfaction terms, a limited number of results were identified (n=103), of which only nine met the eligibility criteria.39,53,58,62,73,81,106,113,123 In the absence of an adequate MeSH thesaurus, most of the patient experience instruments are indexed with the keyword “patient satisfaction”.39,62,73,123 The inclusion of a “patient experience” thesaurus would support research and the use of PREMs in practice. Second, no distinction between PREMs and satisfaction measures was made because this classification may not be obvious.
Indeed, while experience measure refers to the objective experience of patients, by asking patients to provide a detailed report on specific aspects of care (eg, “I received information about treatment options for my mental health problems”),39 the satisfaction measure is a subjective assessment against patients’ expectations (eg, “Do you consider that your treatment has been adjusted to your situation?”).123 How questions are framed determines the degree of subjectivity of measures,147 and most instruments combine both types of questions. The wide range of instruments identified by the review suggests the value of developing item banks and computerized adaptive testing (CAT) covering all aspects relevant to psychiatric patients to allow comparison across multiple conditions and settings of care at a national and international level.148,149 These modern methods make it possible to optimize measurement precision and flexibility compared to standard questionnaires where all respondents answer the same items, regardless of their characteristics. These item banks, from which the CAT selects the items to be administered, will cover all of the dimensions underlying the concept of quality of mental health care. First, the inductive qualitative approach identified seven key dimensions to measure mental health care patient-reported experience (also called latent trait). Some dimensions are common concerns for general patients, while others are more specific to psychiatric patients. In particular, interpersonal relationships are a major focus covered by the majority of instruments. 
Interpersonal relationships aim to establish a climate favorable for successful health care delivery, thereby contributing to improved patient satisfaction, treatment compliance and, consequently, health care outcomes.150,151 This dimension has been extended to all social relationships that can influence the subjective perception of the patient’s quality of care by integrating relationships with other patients152 as well as involvement of family and relatives in care.153 Furthermore, the development of patient-reported measures should involve patients to ensure that the instruments reflect what truly matters to them.42,121,154 In our review, most of the instruments were developed with patient involvement; however, this was not a primary concern, and only a handful of instruments used qualitative approaches (such as qualitative interviews or focus groups) to obtain patients’ perspectives. In addition, no instrument covers the latent trait continuum (ie, underlying the multidimensional concept of quality of care), which poses the problem of measuring patient experience based on current instruments and suggests the relevance of creating an item bank. Second, the psychometric qualities of the included instruments were heterogeneous. Only four papers used statistical methods from IRT50,62,73,121 as a supplement to CTT. However, IRT was used only to assess unidimensionality or to help in the selection of optimal test items to shorten the instrument and enhance its clinical utility. Most of the instruments included in this review have documented at least one psychometric property, and only 12% reported none. The main properties assessed were construct validity and reliability, mainly quantified in terms of internal consistency. For most of the instruments that addressed construct validity, it was often incomplete and relied primarily on factor analysis. However, this method alone is not enough to support construct validity. 
The psychometric robustness of an instrument must be based on a thorough assessment of all psychometric properties. Most instruments were examined with exploratory or confirmatory factor analyses to assess their underlying structure. Others relied on item-item, item-dimension, dimension-dimension and/or item-total correlations. Convergent validity (also miscalled concurrent validity in some cases) received special attention in slightly less than of the instruments, unlike divergent validity. Sample sizes were variable, which may raise questions about the relevance of some validity estimates that may require large samples. In addition, caution is needed when generalizing results obtained in a sample with particular characteristics. Reliability was assessed in two ways. The majority of instruments reported good internal consistency, but excessively high values (>0.90) may suggest redundancies.126 Test–retest reliability was not a major objective, as only 20 instruments reported this property. Finally, only three instruments addressed responsiveness.87,88,111 However, this concept is particularly important for practice and research because it makes it possible to detect a change in a patient’s state of health.65 Taken together, these elements indicate that a large number of instruments have been psychometrically validated, with varying levels of evidence. Our work may thus be considered a first step in the creation of an item library to comprehensively and validly measure mental health care patient-reported experience that will be used in France to develop, validate, and standardize item banks and CATs based on IRT.23 It will also provide internationally replicable measures that will allow direct comparisons of mental health care systems. The main value of item banks and CAT is to offer tailored individual assessment without loss of scale precision or content validity.149
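To illustrate how a CAT can tailor an assessment to each respondent, the following minimal sketch selects the next item from a bank by maximum Fisher information at the current latent trait estimate. It is an illustration under stated assumptions, not the method used by any instrument in this review: the two-parameter logistic (2PL) model for dichotomous items is a simplification (PREM items are often polytomous), and the item identifiers and parameter values are hypothetical.

```python
import math


def prob_endorse(theta, a, b):
    """2PL probability of endorsing an item at latent trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = prob_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_item(theta, bank, administered):
    """Return the id of the unadministered item that is most informative at theta."""
    candidates = [item_id for item_id in bank if item_id not in administered]
    return max(
        candidates,
        key=lambda item_id: item_information(theta, *bank[item_id]),
    )


# Hypothetical item bank: item id -> (discrimination a, difficulty b).
# Names and values are invented for illustration only.
bank = {
    "info_1": (1.2, -1.0),
    "respect_1": (1.5, 0.0),
    "access_1": (0.8, 1.0),
}

# With a provisional trait estimate of 0.0 and no items administered yet,
# the item whose difficulty is closest to 0.0 and whose discrimination is
# highest carries the most information.
next_item = select_next_item(theta=0.0, bank=bank, administered=set())
```

In a full CAT, the trait estimate would be updated after each response and the selection repeated until a stopping rule (for example, a target standard error) is met; those steps are omitted here.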

Strengths and Limitations of This Review

First, we used a standardized methodology and robust quality criteria to evaluate the performance of currently available mental health assessment instruments. The Pesudovs framework was used because its simplified scoring system allows for a rigorous evaluation with more flexibility40 than other methods, such as the COSMIN checklist, which is based on the “worst score” principle. In addition, an adapted version of the Pesudovs framework for the evaluation of PREMs has been developed and used several times in other recent systematic reviews.41 To our knowledge, this is the first review to identify and evaluate instruments designed to measure the quality of mental health care from the patients’ perspective for a range of conditions and in multiple care settings. However, the completeness of the review may be questionable. We searched a single database due to limited access to other bibliographic databases. Nevertheless, MEDLINE may be considered to be the reference database in the health field. Second, we limited our searches to the English language. This language restriction was applied to obtain a homogeneous pool of items and to limit the costs associated with translation. However, we argue that our search is comprehensive because it was conducted without date limitations and identified instruments from 16 countries. In addition, the reference lists of articles included in the review were carefully reviewed, which allowed additional relevant references to be retrieved. Third, the search strategy used may be questionable. Patient experience is a relatively recent term for which there is no commonly accepted definition and no appropriate MeSH term. When the terms used in the search combination were limited to “patient experience” and its derivatives, the number of results was small. We therefore included terms related to patient satisfaction to broaden the scope of the results. 
Furthermore, the concept of quality of care is multidimensional, and the use of the indexed MeSH term (ie, “quality of health care”) did not identify as many instruments as a more general search strategy. Despite these limitations, the large number of instruments identified by the review supports the comprehensiveness of this work. Fourth, the assessment of the quality of the development process and psychometric properties depends on the quality and accuracy of publications. Some instruments may not have been properly evaluated due to insufficient reporting or inability to access some documents. Finally, the content analysis of the instruments was based on a 7-dimensional categorization derived from the data of the inductive qualitative analysis. Despite the rigorous methodology used, this categorization may be questionable. Nevertheless, these results are consistent with the dimensions commonly found in the literature.

Conclusion

This work provides a description and a critical analysis of the available PREMs for mental health care that can help professionals choose PREMs that best suit their needs. This is a critical step in the creation of an item library to measure mental health care patient-reported experience that could be used in France and Europe to develop, validate, and standardize item banks and CAT using innovative technologies based on IRT.
