Psychometric evaluation of instruments measuring the work environment of healthcare professionals in hospitals: a systematic literature review.

Susanne M Maassen1, Anne Marie J W Weggelaar Jansen2, Gerard Brekelmans1, Hester Vermeulen3,4, Catharina J van Oostveen2,5.   

Abstract

PURPOSE: Research shows that the professional healthcare working environment influences the quality of care, safety climate, productivity, and the motivation, happiness and health of staff. The purpose of this systematic literature review was to assess instruments that provide valid, reliable and succinct measures of health care professionals' work environment (WE) in hospitals. DATA SOURCES: Embase, Medline Ovid, Web of Science, Cochrane CENTRAL, CINAHL EBSCOhost and Google Scholar were systematically searched from inception through December 2018. STUDY SELECTION: Pre-defined eligibility criteria (written in English, original work-environment instrument for healthcare professionals and not a translation, describing psychometric properties such as construct validity and reliability) were used to detect studies describing instruments developed to measure the working environment. DATA EXTRACTION: After screening 6397 titles and abstracts, we included 37 papers. Two reviewers independently assessed the 37 instruments on content and psychometric quality following the COSMIN guideline. RESULTS OF DATA SYNTHESIS: Our analysis of the papers revealed a diversity of items measured. The items were mapped into 48 elements on aspects of the healthcare professional's WE. Quality assessment also revealed a wide range of methodological flaws in all studies.
CONCLUSIONS: We found a large variety of instruments that measure the professional healthcare environment. Analysis uncovered content diversity and diverse methodological flaws in available instruments. Two succinct, interprofessional instruments scored best on psychometric quality and are promising for the measurement of the working environment in hospitals. However, further psychometric validation and an evaluation of their content are recommended.
© The Author(s) 2020. Published by Oxford University Press in association with the International Society for Quality in Health Care. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Keywords:  hospital; instruments; organizational culture; psychometric properties; systematic review; work environment

Year:  2020        PMID: 32648902      PMCID: PMC7654380          DOI: 10.1093/intqhc/mzaa072

Source DB:  PubMed          Journal:  Int J Qual Health Care        ISSN: 1353-4505            Impact factor:   2.038


Purpose

A positive work environment (WE) for healthcare professionals is an important variable in achieving good patient care [1] and is strongly associated with good clinical patient outcomes, e.g. low occurrence of patient falls and pressure ulcers, good pain management, and low rates of hospital mortality and hospital-acquired infections [2, 3]. A positive WE is also associated with efficiency, e.g. fewer re-admissions and adverse events [2-4], and is a prerequisite for a safety climate and a high-performing organization that treats quality improvement as part of daily practice [2, 5, 6]. Research shows that when healthcare professionals perceive a positive WE, they have more job satisfaction and are therefore likely to stay longer; fewer staff will suffer burnout or work-related stress [7-10]. In general, WE is defined as the inner setting of the organization in which staff work [11]. In healthcare, a positive WE is defined as a setting that supports excellence and decent practices that strive to ensure health, safety and the personal well-being of staff, support quality patient care and improve the motivation, productivity and performance of individuals and organizations [12]. Pearson et al.
[13] explain the relevant elements of WE as ‘a workplace environment characterized by: the promotion of physical and mental health as evidenced by observable positive health and well-being, job and role satisfaction, desirable recruitment and retention rates, low absenteeism, illness and injury rates, low turnover, low involuntary overtime rates, positive inter-staff relationships, low unresolved grievance rates, opportunities for professional development, low burnout and job strain, participation in decision-making, autonomous practice and control over practice and work role, evidence of strong clinical leadership, demonstrated competency and positive perceptions of the work environment including perceptions of work-life balance.’ A positive WE stems from respect and trust between colleagues at all levels; effective collaboration and communication across educational levels within a profession, between disciplines and between departments [14]; recognition for good work; a safe atmosphere; a positive climate; and support from management [5, 6]. Measuring WE is not easy, since this multidimensional concept encompasses diverse elements [13, 15]. Some WE measuring instruments focus on specific professions (e.g. nursing [16-18], physicians [19, 20], residents [21], management [22, 23]) or specific wards (e.g. intensive care units [17], critical care [24], cardiac care [25]), or include only one or two aspects of WE (e.g. ethics, social climate [26], organizational culture [27-29], organizational climate [30]). Achieving a positive WE is not just up to the members of one profession in one department; it is a challenge for a team whose members span professions, roles, departments and even organizational boundaries [14, 31]. WE is not the sole responsibility of management, but of management and healthcare professionals together.
Therefore, a WE measurement instrument should cover all members of a team, not just those from one profession or one department, and should not be limited to one or two aspects of WE. If hospitals pursued systematic and objective insight into their WE with a valid, reliable and succinct measurement tool, they would gain an understanding of the influential factors and could improve the WE for the benefit of their patients, staff and organization. The aim of this systematic review was to assess instruments that provide valid, reliable and succinct measures of health care professionals’ WE in hospitals.

Data sources and study selection

To find an instrument that staff can use to assess their WE, we performed a three-step study. First, we systematically searched the literature to detect all available WE measuring instruments. Second, we assessed the content of these instruments. Third, we assessed instrument quality with the COSMIN guidelines [32, 33], particularly their psychometric properties. To ensure optimal clarity and transparency, we used the PRISMA reporting guideline to structure this paper [34].

Step 1: systematic literature searches

One researcher (SM) and a librarian systematically searched Embase, Medline Ovid, Web of Science, Cochrane CENTRAL, CINAHL EBSCOhost and Google Scholar from inception through December 2018, using the key words and their synonyms ‘work environment,’ ‘organizational culture’ and ‘measurement’ (see Supplementary File 1). No search limits were used for language, publication date or type of research. Titles and abstracts of retrieved papers were independently reviewed for inclusion by two researchers (SM and CO). The inclusion criteria were: (i) written in English, the paper describes the development of an original WE measuring instrument for healthcare professionals in hospitals; (ii) the instrument is not a translation of another instrument; (iii) the paper describes psychometric properties with at least some form of construct validity and reliability. Given our focus on the WE of all hospital staff, we excluded papers describing WE instruments for a single profession in one department. The reviewers discussed any differences in their assessments of potentially eligible papers until they reached consensus. Full versions of eligible papers were then scrutinized independently by three researchers (SM, CO and GB), and cited references were assessed to find additional instruments. Disagreements on these assessments were discussed with a fourth researcher (AMW) until consensus was reached.

Data extraction

Step 2: content assessment

Based on a pre-defined data extraction form, two researchers (SM, CO, GB or AMW) independently extracted the study context and instrument content. Study context included research design, country, clinical setting, and number and types of health care staff. Instrument content included primary goal, measurement type, focus of interest, number of items, subscales, sample and study setting. Next, to enable comparative analysis of the contents, two researchers (SM and CO) independently sorted and clustered all items/subscales of the instruments into elements. Their content analyses were discussed by the whole research team until consensus was reached.

Step 3: quality assessment

To appraise the methodological quality of the instruments, we assessed their psychometric properties: measurement development, internal consistency, reliability, structural validity, criterion validity, hypothesis testing for construct validity, measurement error and responsiveness. We used the consensus-based standards for the selection of health measurement instruments (COSMIN) risk of bias checklist [32, 35]. The COSMIN checklist was developed to assess the methodological quality of single studies included in systematic reviews of patient-reported outcome measures (PROM) [32]. Although the subject of our review is a staff outcome measurement and not a PROM, this assessment method is useful because the purpose remains the same: screening for the risk of bias. The COSMIN risk of bias checklist is a modular tool, which means that only the measurement properties that were described in a paper were assessed [32]. COSMIN contains two boxes on content validity. The second box focuses on detailed content validity development issues and is not suitable for the type of studies included in this review. Therefore, we used only the first box, ‘PROM development.’ Table 1 lists the definitions of properties as applied in this review.
Table 1

Definitions of measurement properties [33]

Content development: The degree to which the content of a measurement instrument is an adequate reflection of the construct to be measured
Internal consistency: The degree to which different items of a (sub)scale correlate and measure the same construct (interrelatedness)
Reliability: The extent to which scores for persons who have not changed are the same for repeated measurements under several conditions
Structural validity: The degree to which the scores of an instrument are an adequate reflection of the dimensionality of the construct to be measured
Criterion validity: The degree to which the scores of an instrument are an adequate reflection of a ‘gold standard’
Hypothesis testing for construct validity: The degree to which the scores of the instrument are consistent with hypotheses based on the assumption that the instrument measures the construct to be measured
Measurement error: The systematic and random error of a patient’s score that is not attributed to true changes in the construct to be measured
Responsiveness: The ability of an instrument to detect change over time in the construct to be measured
Two researchers (SM and CO) appraised quality on a four-point scale (very good, adequate, doubtful, inadequate). Their ratings were independently crosschecked by two other researchers (GB and AMW). The methodological quality score for each psychometric property was determined by the lowest rating of any item in that category (‘worst score counts’) [32]. When applicable, the measurement properties were rated against the ‘criteria for good measurement properties’ described by Mokkink et al. [32]. Properties were judged as ‘sufficient’ (+), ‘insufficient’ (−) or ‘indeterminate’ (?) according to the COSMIN standards [35].
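COSMIN's 'worst score counts' aggregation, in which a single weak checklist item determines a property's overall methodological quality, can be sketched in a few lines (an illustrative sketch only; the function name and label spellings are ours, not part of COSMIN):

```python
# COSMIN-style "worst score counts": a property's overall methodological
# quality equals the lowest rating among its checklist items.
LEVELS = ["inadequate", "doubtful", "adequate", "very good"]  # worst -> best

def property_rating(item_ratings):
    """Return the lowest (worst) rating appearing in a list of item ratings."""
    return min(item_ratings, key=LEVELS.index)

# One doubtful item pulls the whole property down to "doubtful".
overall = property_rating(["very good", "adequate", "doubtful"])  # -> "doubtful"
```

This is why a property with many well-executed items can still score poorly overall: the rule is deliberately conservative.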

Results of data synthesis

The search strategy (see Figure 1) yielded 6397 individual papers. After screening the titles and abstracts, 6305 papers were excluded because they did not describe the development of an original instrument to measure the WE of healthcare professionals or did not provide psychometric details. This resulted in 92 potentially relevant papers eligible for full-text screening. After full-text screening, another 57 papers were excluded based on the inclusion criteria. Assessment of the references cited in the included papers yielded one additional relevant study, for a total of 37 included papers.
Figure 1

Flow diagram of the search and selection procedure, conforming to PRISMA [34].

The 37 papers each describe an individual self-assessment instrument, all using Likert scales to reflect the degree of agreement with a specific proposition about the WE. The oldest publication dates from 1984, and instruments to measure the WE of healthcare professionals have been under continuous development since then (see Table 2). Studies took place in the USA (20/37), Canada (4/37), Australia (3/37), UK (3/37), Japan (1/37) and European Union (7/37). More than half (20/37) sampled healthcare professionals in the nursing domain, e.g. nurses/nurse assistants [36-53]. Other studies applied samples of diverse healthcare professionals [54-70]. Most studies focused on measuring WE as a total concept [36–39, 41, 43–45, 48–56, 59, 62, 68, 71, 72], despite sometimes terming it differently, e.g. practice environment [41, 43–45, 49, 52, 56, 68], ward environment [37] or healthy WE [39, 59, 71]. Seven studies focused primarily on culture, as in organizational culture [42, 61, 66], hospital culture [60], nursing culture [47], ward culture [63] and culture of care [65]. Additionally, we found WE instruments with a focus on organizational [57, 64, 70] or psychological climate [58], in contrast to instruments that focus on teamwork [46] or aspects of teamwork, such as team vitality [69], team collaboration [67] and workplace relationships [40].
Table 2

Content and context of work-environment measuring instruments

Author | Year | Sample and setting | Instrument | Focus | Measurement type | No. of items
Abraham and Foley [36] | 1984 | Nursing students in mental health nursing, USA | Work-environment scale, short form (WES-SF) | Work environment | 4-point Likert, agreement | 40
Adams, Bond [37] | 1995 | Registered nurses in inpatient hospital wards, UK | Ward organizational features scale (WOFS) | Ward environment | 4-point Likert, agreement | 105
Aiken and Patrician [38] | 2000 | Nurses in hospitals (specialized AIDS units and general medicine), USA | Revised nursing work index (NWI-R) | Nursing work environment | 4-point Likert, agreement | 57
Appel, Schuler [54] | 2017 | Physicians and nurses in hospitals (ICU, ER, intermediate care, regular wards, OR), Germany | Kurzfragebogen zur Arbeitsanalyse (KFZA) | Work environment | 5-point Likert scale | 26 (SV), 37 (LV)
Berndt, Parsons [39] | 2009 | Nurses in hospitals, USA | Healthy workplace index (HWPI) | Healthy workplace | 4-point Likert, agreement and presence | 32
Bonneterre, Ehlinger [55] | 2011 | Nurses and nurse assistants in hospitals, France | Nursing work index—extended organization (NWI-EO) | Psychosocial and organizational work factors | 4-point Likert, agreement | 22
Clark, Sattler [71] | 2016 | Nurses in hospitals, United States | Healthy work-environment inventory (HWEI) | Healthy work environment | 5-point Likert, presence | 20
Duddle and Boughton [40] | 2008 | Nurses in a hospital, Australia | Nursing workplace relational environment scale (NWRES) | Nursing workplace relational environment | 5-point Likert, agreement | 22
Erickson, Duffy [56] | 2004 | Nurses, occupational therapy, physical therapy, respiratory therapy, social services, speech pathology and chaplaincy within one hospital, United States | Professional practice environment scale (PPE) | Practice environment | 4-point Likert, agreement | 39
Erickson, Duffy [41] | 2009 | Nurses within one hospital, USA | Revised professional practice environment scale (RPPE) | Practice environment | 4-point Likert, agreement | 39
Estabrooks, Squires [42] | 2009 | Nurses in pediatric hospitals, Canada | Alberta context tool (ACT) | Organizational context | 5-point Likert, agreement or presence | 56
Flint, Farrugia [43] | 2010 | Nurses within two hospitals, Australia | Brisbane practice environment measure (B-PEM) | Practice environment | 5-point Likert, agreement | 26
Friedberg, Rodriguez [57] | 2016 | Clinicians (physicians, nurses, allied health professionals) and other staff (clerks, receptionists) in community clinics and health centers, United States | Survey of workplace climate | Workplace climate | 5-point Likert, agreement, and 1 item on a 5-point scale (1 calm to 5 hectic/chaotic) | 44
Gagnon, Paquet [58] | 2009 | Health care workers (nurses, health care professionals, technicians, office staff, support staff and management) within one health care center, Canada | CRISO psychological climate questionnaire (PCQ) | Psychological climate | 5-point Likert, agreement | 60
Ives-Erickson, Duffy [44] | 2015 | Patient care assistants within two hospitals, USA | Patient Care Associates’ work-environment scale (PCA-WES) | Practice environment | 4-point Likert, occurrence | 35
Ives Erickson, Duffy [45] | 2017 | Nurses within one hospital, USA | Professional practice work-environment inventory (PPWEI) | Practice environment | 6-point Likert, agreement | 61
Jansson von Vultée [59] | 2015 | Health care personnel, task advisors, employees in advertising, daycare and leadership programs, Sweden | Munik questionnaire | Healthy workplaces | 4-point Likert, agreement | 65
Kalisch, Lee [46] | 2010 | Nurses and nurse assistants in hospitals, USA | Nursing teamwork survey (NTS) | Nursing teamwork | 5-point Likert, appearance | 33
Kennerly, Yap [47] | 2012 | Nurses and nurse assistants in long-term care, hospital and ambulatory care, USA | Nursing culture assessment tool (NCAT) | Nursing culture | 4-point Likert, agreement | 22
Klingle, Burgoon [60] | 1995 | Patients, nurses and physicians, USA | Hospital culture scale (HSC) | Hospital culture | 5-point Likert, agreement | 15
Kobuse, Morishima [61] | 2014 | Physicians, nurses, allied health personnel, administrative staff and other staff in hospitals, Japan | Hospital organizational culture questionnaire (HOCQ) | Organizational culture | 5-point Likert, agreement | 24
Kramer and Schmalenberg [48] | 2004 | Nurses in hospitals, USA | Essentials of Magnetism tool (EOM) | Nursing work environment | | 62
Lake [49] | 2002 | Nurses in hospitals, USA | Practice environment scale of the nursing work index (PES-NWI) | Practice environment | 4-point Likert, agreement | 31
Li, Lake [50] | 2007 | Nurses in hospitals, USA | Short form of NWI-R | Nursing work environment | 4-point Likert, agreement | 12
Mays, Hrabe [51] | 2010 | Nurses and nurse managers in hospitals, USA | N2N work-environment scale | Nursing work environment | 5-point rating scale | 12
McCusker, Dendukuri [62] | 2005 | Employees from rehabilitation services, diagnostic services, other clinical services and support services in one hospital, Canada | Adapted 24 version NWI-R | Work environment | 4-point Likert, agreement | 23
McSherry and Pearce [63] | 2018 | Nurses, physicians, allied health care and supporting staff in hospitals, UK | Cultural health check (CHC) | Ward culture | 4-point Likert, occurrence | 16
Pena-Suarez, Muniz [64] | 2013 | Auxiliary nurses, administrator assistants, porters, laboratory technicians, X-ray technicians and others (nurses and physicians excluded) within the health services of Asturias, Spain | Organizational climate scale (CLIOR) | Organizational climate | 5-point Likert, agreement | 50
Rafferty, Philippou [65] | 2017 | Nurses, allied health professionals, physicians, administrative and care assistants in in- and outpatient mental health and community care, United Kingdom | CoCB | Culture of care | 5-point Likert, agreement, and 1 open question | 31
Reid, Courtney [52] | 2015 | Nurses in professional and industrial organizations, Australia | Brisbane practice environment measure (B-PEM) | Practice environment | 5-point Likert, agreement | 28
Saillour-Glenisson, Domecq [66] | 2016 | Physicians, nurses and orderlies in hospitals, France | Contexte organisationnel et managérial en établissement de santé (COMEt) | Organizational culture | 5-point Likert, agreement | 82
Schroder, Medves [67] | 2011 | Health care professionals from different backgrounds working in health care teams, Canada | Collaborative practice assessment tool (CPAT) | Team collaboration | 7-point Likert, agreement, and 3 open questions | 56
Siedlecki and Hixson [68] | 2011 | Nurses and physicians in one hospital, USA | Professional practice environment assessment scale (PPEAS) | Professional practice environment | 10-point rating scale | 13
Stahl, Schirmer [72] | 2017 | Midwives within hospitals, Germany | Picker Employee Questionnaire—Midwives | Work environment | Different rating types with 2–16 answer options | 52
Upenieks, Lee [69] | 2010 | Front-line nurses, physicians and ancillary health care providers in hospitals, United States | Revised health care team vitality instrument (HTVI) | Team vitality | 5-point Likert, agreement | 10
Whitley and Putzier [53] | 1994 | Nurses in one hospital, USA | Work quality index | Work environment | 7-point Likert, satisfaction | 38
Wienand, Cinotti [70] | 2007 | Physicians, scientists, management, nurses, therapists, laboratory and radiology technicians in hospitals and outpatient clinics, Italy | Survey on organizational climate in health care institutions (ICONAS) | Organizational climate | 10-point rating scale | 48

Content

The number of items in the instruments ranges from 12 to 105, with a mean of 44 items (see Table 2). Sorting and clustering the subscales/items to consensus resulted in 48 WE elements (see Table 3 and Supplementary File 2). Based on the content comparison, we conclude that 21 instruments measure the environment of clinical inpatient settings [36–39, 41, 43–45, 48–56, 62, 68, 71, 72], sharing common features in terms of items and constructs, e.g. multidisciplinary collaboration [36–39, 41, 44, 48–51, 54–56, 62, 68, 72], autonomy [36, 38, 41, 45, 48, 49, 53, 54, 56, 62], informal leadership [37, 39, 41, 44, 48–51, 56, 72] or supportive management [36–39, 43, 44, 48–50, 52, 54, 55, 62, 72]. Other frequently used constructs and items are staffing adequacy [38, 39, 45, 48–50, 52, 55], workload [36, 43, 52–54, 59, 71, 72] or working conditions [37, 43, 49, 53–55], and professional development [39, 43, 48, 49, 52–54, 62, 71, 72] or professionalism and competency [38, 39, 43, 48, 49, 51, 52, 62]. We found no commonalities among the items and constructs used in the instruments focused on culture [42, 47, 60, 61, 63, 65, 66]. The climate-oriented instruments emphasize informal leadership [57, 58, 64, 70], innovation and readiness for change [57, 58, 64] and relational atmosphere [57, 58, 64]. Items on respect [40, 46, 67], teamwork [40, 46], open communication [40, 67, 69], supportive management [46, 67, 69] and information distribution [46, 67, 69] are predominantly present in the instruments that emphasize teamwork.
Table 3

Content mapping of the instruments

Some instruments were developed years ago and have undergone several updates, e.g. the Nursing Work Index [38, 49, 50, 55, 62], the Professional Practice Environment scale [41, 45, 56] and the Brisbane practice environment measure [43, 52]. Adapted versions were frequently developed for a different sample than the original instrument [41, 52, 55, 56, 62]. Although several instruments have a development history, the process is not always described adequately. Only the instruments developed by Adams, Bond [37], Kramer and Schmalenberg [48], Rafferty, Philippou [65] and Stahl, Schirmer [72] provide enough information on the developmental process to gain an adequate COSMIN score. Some authors refer to other publications for descriptions of the item development process and face or content validity [42, 50, 52, 54, 63].

Methodological quality

Overall, judged by the COSMIN guideline, the methodological quality of the studies is basic but adequate (see Table 3). Most authors report structural validity and internal consistency. However, for structural validity, three instruments were rated as inadequate [53, 59, 67] and five as doubtful [37, 41, 60, 63, 66]. Ten studies applied confirmatory factor analysis, mostly alongside an exploratory factor analysis [43, 46, 47, 52, 57, 58, 64, 66, 67, 69]. Internal consistency was calculated and reported with Cronbach’s alpha by all but three authors [36, 59, 69]. Only Pena-Suarez, Muniz [64] assessed cross-cultural validity, although their method was inadequate. In 12/37 studies, the criterion for sufficient internal consistency (Cronbach’s alpha > 0.7 for each subscale [32]) was not met [37, 42, 47, 48, 52, 54, 55, 58, 62, 66, 67, 72]. Reporting of the remaining measurement properties, such as criterion validity and hypothesis testing, was too scattered in both method and methodological quality to allow assessment. Other fundamental measurement properties were addressed only sporadically and, where reported, their quality can be considered doubtful. The best overall quality assessment was found for the culture of care barometer (CoCB) [65] and the Picker Employee Questionnaire for Midwives [72], because of their overall adequate scores on the COSMIN criteria and sufficient statistical outcomes for internal consistency. That said, measurement properties such as reliability, hypothesis testing and criterion validity have not yet been established for these relatively new instruments (Table 4).
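The internal-consistency criterion applied above (Cronbach's alpha > 0.7 per subscale) can be illustrated with a short computation over a respondents-by-items score matrix. This is a minimal sketch using only the Python standard library; the function name is ours:

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix: one row per respondent,
    one column per item of the (sub)scale."""
    k = len(scores[0])                                   # number of items
    item_vars = [variance(col) for col in zip(*scores)]  # per-item variance
    total_var = variance([sum(row) for row in scores])   # variance of sum scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Four respondents answering a three-item Likert subscale; perfectly
# consistent answers give the maximum alpha of 1.0 (up to rounding).
likert = [[3, 3, 3], [4, 4, 4], [2, 2, 2], [5, 5, 5]]
alpha = cronbach_alpha(likert)
```

A subscale would pass the review's sufficiency criterion when `alpha > 0.7`; values near 1.0 indicate highly interrelated items.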
Table 4

Quality assessment of methodology in work-environment instruments

Author, year | Quality of instrument development | n | Structural validity: meth. quality (rating), detail | Internal consistency: meth. quality (rating), detail | Other measurement properties
Abraham and Foley [36] | Inadequate | 153 | | Doubtful (?) | No
Adams, Bond [37] | Adequate | 834 | Doubtful (-), EFA loadings NR | Very good (-), α 0.92–0.66 | Yes: reliability Doubtful (-), Pearson r 0.90–0.71; measurement error Inadequate (?), NR
Aiken and Patrician [38] | Inadequate | 2027 | | Doubtful (?), α 0.79–0.75 | Yes: reliability Inadequate (?), NR; hypothesis testing Inadequate (?), NR
Appel, Schuler [54] | OP | 1163 | Adequate (-), EFA loadings SV 0.86–0.36, LV NR | Very good (-), α LV 0.87–0.60, SV 0.80–0.63 | No
Berndt, Parsons [39] | Doubtful | 160 | Adequate (-), EFA loadings 0.87–0.45 | Very good (+), α 0.92–0.88 | Yes: hypothesis testing Very good (+), OOM
Bonneterre, Ehlinger [55] | Doubtful | 4085 | Adequate (-), EFA loadings NR | Very good (-), α 0.89–0.56 | Yes: reliability Doubtful (-), Spearman’s r 0.88–0.54; hypothesis testing (?), KG
Clark, Sattler [71] | Doubtful | 520 | Adequate (-), EFA loadings 0.79–0.47 | Very good (+), α 0.94 | No
Duddle and Boughton [40] | Doubtful | 119 | Adequate (-), EFA loadings 0.88–0.61 | Very good (+), α 0.93–0.78 | No
Erickson, Duffy [56] | Inadequate | 849 | Adequate (-), EFA loadings 0.87–0.31 | Very good (+), α 0.88–0.78 | No
Erickson, Duffy [41] | Inadequate | 1550 (2×775) | Doubtful (?), EFA loadings 0.87–0.34 | Very good (+), α 0.88–0.81 | No
Estabrooks, Squires [42] | OP | 752 | Adequate (-), EFA loadings 0.86–0.34 | Very good (-), α 0.91–0.54 | Yes: hypothesis testing Very good (+), OOM
Flint, Farrugia [43] | Inadequate | 195 (EFA), 938 (CFA) | Very good (-), EFA loadings 0.95–0.38; CFA for each factor, CFI 0.99–0.919, RMSEA 0.08–0.06 | Very good (+), α 0.87–0.81 | No
Friedberg, Rodriguez [57] | Inadequate | 601 | Very good (+), EFA and CFA loadings 0.95–0.38, CFI 0.97, RMSEA 0.04 | Very good (+), α 0.96–0.78 | No
Gagnon, Paquet [58] | Inadequate | 3142 | Very good (+), CFA CFI 0.98, RMSEA 0.05 | Very good (-), α 0.91–0.64 | Yes: hypothesis testing Inadequate (?), KG
Ives-Erickson, Duffy [44] | Inadequate | 390 | Adequate (-), EFA loadings 0.88–0.42 | Very good (+), α 0.93–0.84 | No
Ives Erickson, Duffy [45] | Inadequate | 874 | Adequate (-), EFA loadings 0.85–0.51 | Very good (+), α 0.92–0.82 | No
Jansson von Vultée [59] | Inadequate | 435 | Inadequate (-), NR | Inadequate, NR | No
Kalisch, Lee [46] | Doubtful | 1758 | Very good (-), EFA and CFA loadings 0.69–0.41, CFI 0.88, RMSEA 0.05 | Very good (+), α 0.85–0.74 | Yes: reliability Doubtful (+), ICC2 > 0.84; criterion validity Very good (+), Pearson r 0.76; hypothesis testing Doubtful (+), KG
Kennerly, Yap [47] | Inadequate | 340 | Very good (-), EFA and CFA loadings 0.90–0.51, CFI 0.94, RMSEA 0.06 | Very good (-), α 0.93–0.60 | No
Klingle, Burgoon [60] | Inadequate | 1829 | Doubtful (-), NR | Very good (?), α 0.87–0.81 | Yes: hypothesis testing Doubtful (?), KG
Kobuse, Morishima [61] | Doubtful | 2924 | Adequate (-), EFA loadings 0.87–0.28 | Very good (+), α 0.82–0.75 | Yes: hypothesis testing Inadequate (?), KG
Kramer and Schmalenberg [48] | Adequate | 3602 | Adequate (-), EFA loadings 0.83–0.34 | Very good (-), α 0.94–0.69 | Yes: reliability Doubtful (?), r 0.88–0.53; hypothesis testing Very good (+), KG
Lake [49] | Inadequate | 2299 | Adequate (?), EFA loadings 0.73–0.40 | Very good (+), α 0.84–0.71 | Yes: reliability Inadequate (+), ICC1 0.97–0.86; hypothesis testing Very good (+), KG
Li, Lake [50] | OP | 2000 | Adequate (-), EFA loadings > 0.70 | Very good (+), α 0.92–0.84 | No
Mays, Hrabe [51] | Inadequate | 210 | Adequate (-), EFA loadings 0.87–0.57 | Doubtful (+), α 0.89–0.75 | Yes: hypothesis testing Doubtful (?), KG
McCusker, Dendukuri [62] | Inadequate | 121 | Adequate (-), EFA loadings 0.79–0.40 | Very good (-), α 0.88–0.64 | Yes: hypothesis testing Adequate (?), OOM
McSherry and Pearce [63] | OP | 98 | Doubtful (-), EFA loadings 0.92–0.17 | Doubtful (+), α 0.78–0.71 | No
Pena-Suarez, Muniz [64] | Inadequate | 3163 | Very good (-), EFA and CFA loadings 0.77–0.41, CFI 0.85, RMSEA 0.06 | Doubtful (-), α total scale 0.97 | Yes: cross-cultural validity Inadequate (-), DIF NR
Rafferty, Philippou [65] | Adequate | 1705 | Adequate (-), EFA loadings 0.87–0.40 | Very good (+), α 0.93–0.70 | No
Reid, Courtney [52] | OP | 639 | Very good (-), EFA and CFA loadings 0.88–0.40, CFI 0.91, RMSEA 0.06 | Very good (-), α 0.89–0.66 | Yes: hypothesis testing Doubtful (?), KG
Saillour-Glenisson, Domecq [66] | Inadequate | 859 | Doubtful (-), EFA and CFA loadings, CFI and RMSEA NR | Very good (-), α 0.91–0.53 | Yes: reliability Inadequate (-), ICC range NR
Schroder, Medves [67] | Doubtful | 111 | Inadequate (-), CFA for each factor, CFI 0.99–0.94, RMSEA 0.13–0.04 | Very good (-), α 0.89–0.67 | No
Siedlecki and Hixson [68] | Inadequate | 1332 | Adequate (-), EFA loadings 0.91–0.71 | Very good (+), α 0.89–0.73 | Yes: hypothesis testing Inadequate (?), KG
Stahl, Schirmer [72] | Adequate | 1692 | Adequate (-), EFA loadings 0.80–0.30 | Very good (-), α 0.90–0.50 | Yes: hypothesis testing Inadequate (?), OOM
Upenieks, Lee [69] | Doubtful | 464 | Very good (+), CFA CFI 0.98, RMSEA 0.06 | | Yes: hypothesis testing Very good (?), OOM, Pearson r 0.52–0.72
Whitley and Putzier [53] | Inadequate | 245 | Inadequate (-), NR | Very good (+), α 0.87–0.72 | No
Wienand, Cinotti [70] | Doubtful | 8681 | Adequate (-), EFA loadings 0.78–0.38 | Very good (+), α 0.95–0.76 | Yes: hypothesis testing Very good (?), KG

NR: not reported; KG: known groups; OOM: other outcome measurement; OP: other publication; LV: long version; SV: short version; EFA: exploratory factor analysis; CFA: confirmatory factor analysis; CFI: comparative fit index; RMSEA: root-mean-square error of approximation; DIF: differential item functioning; ICC: intraclass correlation coefficient.

Discussion

The aim of this review was to assess WE instruments and learn which ones provide valid, reliable and succinct measures of health care professionals’ WE in hospitals. We identified 37 studies that report on the development and psychometric evaluation of an instrument measuring healthcare professionals’ experience of the WE in hospitals. The number of instruments found, even with tight inclusion criteria, reflects the importance of the WE concept over the past 35 years. New management structures, a greater focus on cost containment and the shift from profession-centeredness to patient-centeredness have not diminished the importance of WE measurement [6, 73]. In particular, rising attention to patient safety and high-performing organizations has driven the importance of WE measurement. Over the years, however, WE has been measured under different names and with different elements and foci. Although elements did overlap, we could not identify one clear set of elements for measuring WE. Therefore, it is not possible to conclude from the assessment of the instruments which elements contribute more to the WE construct. Additionally, most studies used a sample from the nursing domain, especially nurses [37, 38, 40, 46–51, 53, 55, 71], whereas a positive WE is team-based and teams in hospitals contain more than one profession, different educational levels and specialisms [14]. We found methodological flaws in most of the papers reporting the development of WE instruments. The most relevant shortcomings are the lack of information on scale development, the failure to fully determine structural validity by confirmatory factor analysis, and the failure to establish psychometric properties such as reliability, criterion validity, hypothesis testing, measurement error and responsiveness. This made it hardly possible to draw firm conclusions on the validity and reliability of the 37 instruments included in this review.
Just five instruments scored ‘adequate’ or ‘very good’ on all of the applied properties of the COSMIN risk of bias checklist [42, 50, 54, 65, 72]. Of these five, only the Short Questionnaire for Workplace Analysis (KFZA) [54] and the CoCB [65] are both generally applicable and succinct, with a total number of items below the mean of the instruments in this review. Both instruments are recent developments, which could suggest that researchers are paying more attention to the methodology of measurement instrument development and its reporting.

Limitations

Some limitations of this study warrant consideration. First, to compare instrument content, the item and subscale descriptions of the individual instruments were mapped onto 48 elements; some details of instruments may have been lost in this mapping process. Second, we sought only original development and validation studies for this review, which means that other publications discussing further psychometric properties of the included instruments may have been left out. Third, we searched for instruments intended to measure the WE in hospitals in general. Nevertheless, a large group of studies used samples from predominantly one discipline (e.g. nurses or nursing assistants [37, 38, 41–45, 51–53, 71]), and some instruments were developed specifically for one discipline (e.g. nursing [39, 40, 46, 48–50, 55]). Given that nurses are the largest professional group in hospitals, our search had to include measurement instruments for nursing. However, our assessment focused on instruments measuring the WE in general and thus excluded instruments measuring a specific type of nursing or department.

Implications for research

To address methodological issues in the instrument development process, it is important that instruments reflect a clear understanding of the construct to be measured. It is therefore crucial that healthcare professionals participate actively in the next development phase. Clear definitions of items and categories would help create distinct construct definitions and thus a better understanding of what a WE measurement instrument should include to provide relevant, comprehensible and meaningful information [74, 75]. Some instruments found in this review already perform well, so we do not recommend developing new instruments. Rather, we advise scrutinizing the methodology of existing instruments using the COSMIN guidelines, for instance by performing confirmatory factor analyses to check whether the data fit the proposed theoretical model of the WE, and by determining the responsiveness of WE instruments in longitudinal research [32, 35].

Implications for practice

A positive healthcare WE is vital for high-performing healthcare organizations to provide good quality of care and retain a happy, healthy professional workforce [2, 6, 7, 76], so obtaining periodic insight into the WE at the team level is important [72]. Preferably, the WE instrument should help teams and management improve the WE, e.g. by deploying, monitoring and evaluating focused interventions. Besides taking valid, reliable measurements, the instrument should provide clearly relevant information for healthcare professionals [6, 77, 78]. Research shows that if an instrument provides information that can be used as a dialog tool, teams become actively engaged in improving their WE [65]; the CoCB [65] in particular is designed to do this. Based on the assumption that instruments measuring more than one construct with the same method are at risk of overrated validity [74, 75], the outcomes of the CoCB should always be used in combination with other managerial information, e.g. patient quality data or data on personnel sick leave and job satisfaction.

Conclusion

The findings of this systematic review have potential value in guiding researchers, healthcare managers and human resource professionals in selecting an appropriate and psychometrically robust instrument to measure the WE. We have demonstrated content diversity and methodological problems in most of the currently available instruments, highlighting opportunities for future research. Based on our findings, we draw the cautious conclusion that more recently developed instruments, such as the CoCB [65], seem to fit the current demands of healthcare teams. However, we suggest investing in further improving their psychometric quality.