
Systematic review of applied usability metrics within usability evaluation methods for hospital electronic healthcare record systems: Metrics and Evaluation Methods for eHealth Systems.

Marta Weronika Wronikowska1, James Malycha1,2, Lauren J Morgan3, Verity Westgate1, Tatjana Petrinic4, J Duncan Young1, Peter J Watkinson1.   

Abstract

BACKGROUND AND OBJECTIVES: Electronic healthcare records have become central to patient care. Evaluation of new systems includes a variety of usability evaluation methods and usability metrics (often referred to interchangeably as usability components or usability attributes). This study reviews the breadth of usability evaluation methods, metrics, and associated measurement techniques that have been reported for systems designed to help hospital staff assess inpatient clinical condition.
METHODS: Following Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology, we searched Medline, EMBASE, CINAHL, Cochrane Database of Systematic Reviews, and Open Grey from 1986 to 2019. For included studies, we recorded usability evaluation methods or usability metrics as appropriate, and any measurement techniques applied to illustrate these. We classified and described all usability evaluation methods, usability metrics, and measurement techniques. Study quality was evaluated using a modified Downs and Black checklist.
RESULTS: The search identified 1336 studies. After abstract screening, 130 full texts were reviewed. In the 51 included studies, 11 distinct usability evaluation methods were identified. Within these usability evaluation methods, seven usability metrics were reported. The most common metrics were the ISO 9241-11 and Nielsen's components. An additional "usefulness" metric was reported in almost 40% of included studies. We identified 70 measurement techniques used to evaluate systems. Overall study quality was reflected in a mean modified Downs and Black checklist score of 6.8/10 (range 1-9), with 33% of studies classified as "high quality" (scoring eight or higher), 51% as "moderate quality" (scoring six or seven), and the remaining 16% as "low quality" (scoring five or below).
CONCLUSION: There is little consistency within the field of electronic health record system evaluation. This review highlights the variability within usability methods, metrics, and reporting. Standardized processes may improve the evaluation and comparison of electronic health record systems and improve their development and implementation.
© 2021 The Authors. Journal of Evaluation in Clinical Practice published by John Wiley & Sons Ltd.

Keywords:  electronic health records; electronic patient record (EPR); systematic review; usability methods; usability metrics

Year:  2021        PMID: 33982356      PMCID: PMC9438452          DOI: 10.1111/jep.13582

Source DB:  PubMed          Journal:  J Eval Clin Pract        ISSN: 1356-1294            Impact factor:   2.336


INTRODUCTION

Electronic health record (EHR) systems are real‐time records of patient‐centred clinical and administrative data that provide instant and secure information to authorized users. Well designed and implemented systems should facilitate timely clinical decision‐making. However, the prevalence of poorly performing systems suggests that usability principles are commonly violated. There are many methods to evaluate system usability. Usability evaluation methods cited in the literature include user trials, questionnaires, interviews, heuristic evaluation, and cognitive walkthrough. There are no standard criteria for comparing results from these different methods, and no single method identifies all (or even most) potential problems. Previous studies have focused on usability definitions and attributes. Systematic reviews in this field often present a list of usability evaluation methods and usability metrics with additional information on the barriers and/or facilitators to system implementation. However, many of these are restricted to a single geographical region, type of illness, health area, or age group. The lack of consensus on which methods to use when evaluating usability may explain the inconsistent approaches demonstrated in the literature. Recommendations exist, but none contain guidance on the use, interpretation, and interrelationship of usability evaluation methods, usability metrics, and the varied measurement techniques applied to assess EHR systems used by clinical staff. These are a specific group of end‐users whose system‐based decisions have a direct impact on patient safety and health outcomes. The objective of this systematic review was to identify and characterize usability metrics (and their measurement techniques) within usability evaluation methods applied to assess medical systems used exclusively by hospital‐based clinical staff for individual patient care. For this study, all such components in the included studies are termed "metrics" to facilitate comparison of methods when testing and reporting EHR system development; for example, Nielsen's satisfaction attribute is treated as equivalent to the ISO usability component of satisfaction.

METHODS

This systematic review was registered with PROSPERO (registration number CRD42016041604). During the literature search and initial analysis phase, we decided to focus on the methods used to assess graphical user interfaces (GUIs) designed to support medical decision‐making rather than visual design features. We have changed the title of the review to reflect this decision. We followed the Preferred Reporting Items for Systematic Reviews and Meta‐Analyses (PRISMA) guidelines (Appendix Table S1).

Eligibility criteria

Included studies evaluated electronic systems or medical devices used exclusively by hospital staff (defined as doctors, nurses, allied health professionals, or hospital operational staff) that presented individual patient data for review. Excluded studies evaluated systems operating in nonmedical environments, systems that presented aggregate data (rather than individual patient data), and systems not intended for use by clinical staff. Results from other systematic or narrative reviews were also excluded.

Search criteria

The literature search was carried out by TP using Medline, EMBASE, CINAHL, Cochrane Database of Systematic Reviews, and Open Grey bibliographic databases for studies published between January 1986 and November 2019. The strategy combined the following search terms and their synonyms: usability assessment, EHR, and user interface. Language restrictions were not applied. The reference lists of all included studies were checked for further relevant studies. Appendix Table S2 presents the full Medline search strategy.

Study selection and analysis

The systematic review was organized using Covidence systematic review management software (Veritas Health Innovation Ltd, Melbourne). Two authors (MW, VW) independently reviewed all search result titles and abstracts. The full text studies were then screened independently (MW, VW). Any discrepancies between the authors regarding the selection of the articles were reviewed by a third party (JM) and a consensus was reached in a joint session.

Data extraction

We extracted data in two stages. Stage 1 covered general data from each study that met our primary criteria, based on the original data extraction form. Stage 2 extended the extraction to capture more specific information, such as the measurement techniques for each identified metric, because we observed that these were reported in different ways. We planned to extract the following data:

- Demographics (authors, title, journal, publication date, country).
- Characteristics of the end‐users.
- Type of medical data included in EHR systems.
- Usability evaluation methods and their types, such as questionnaires or surveys, user trials, interviews, and heuristic evaluation.
- Usability metrics (components variously defined as attributes, criteria, or metrics). For the purpose of this review we adopted the term "metric" for any such component, but we include all metric‐like terms used by the authors of included studies: satisfaction, efficiency, effectiveness, learnability, memorability, and errors.
- Types and frequency of usability metrics analysed within usability evaluation methods.

Agreement on the extracted data reached our goal of >95%. All uncertainties regarding data extraction were resolved by discussion among the authors.

Quality assessment

We used two checklists to evaluate the quality of included studies. The first, the Downs and Black (D&B) Checklist for the Assessment of Methodological Quality, contains 27 questions covering the following domains: reporting quality (10 items), external validity (three items), bias (seven items), confounding (six items), and power (one item). It is widely used in clinical systematic reviews because it is validated for assessing randomized controlled trials and observational and cohort studies. However, many of the D&B checklist questions have little or no relevance to studies evaluating EHR systems, particularly because EHR systems are not classified as "interventions." We therefore modified the D&B checklist to create a usability‐oriented tool. Our modified D&B checklist, comprising 10 questions, assessed whether the aim of the study (specific to usability evaluation) was clearly stated, whether the included methods and metrics were supported by peer‐reviewed literature, whether the study participants were clearly described and representative of the eventual (intended) end‐users, whether the time period over which the study was undertaken was clearly described, and whether the results reflected the methods and were described appropriately. The modified D&B checklist is summarized in the appendix (Appendix Table S3). Using this checklist, we defined "high quality" studies as those which scored well in each of the domains (scores of eight or higher). Studies which scored in most but not all domains were defined as "moderate quality" (scores of six or seven). The remainder were defined as "low quality" (scores of five or below). We did not exclude any paper on the grounds of low quality.
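As an illustration only (not part of the review's methods), a minimal Python sketch of the score banding described above, applied to hypothetical per‐study scores:

    def classify_study(score: int) -> str:
        """Band a modified Downs & Black score (0-10) using the review's
        thresholds: >= 8 high, 6-7 moderate, <= 5 low quality."""
        if not 0 <= score <= 10:
            raise ValueError("modified D&B scores range from 0 to 10")
        if score >= 8:
            return "high quality"
        if score >= 6:
            return "moderate quality"
        return "low quality"

    # Hypothetical scores; the review reports a mean of 6.8 across 51 studies.
    print([classify_study(s) for s in (9, 7, 4, 8, 6)])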

RESULTS

The search generated 2231 candidate studies. After removal of duplicates, 1336 abstracts remained (Figure 1). From these, 130 full texts were reviewed and 51 studies were included. All included studies were published between 2001 and 2019. Of the included studies, 86% tested systems with clinical staff, 6% with usability experts, and 8% with both clinical staff and usability experts. The characteristics of the included studies are summarized in Table 1.
FIGURE 1

Study selection process: PRISMA flow diagram

TABLE 1

Details of included studies

Ref | Author | Year | Country | Participants | Number | System type
35 | Aakre et al. | 2017 | USA | Internal Medicine Residents, Residents, Fellows, Attending Physicians | 26 | EHR with SOFA (a) score calculator
36 | Abdel‐Rahman | 2016 | USA | Physicians, Residents, Nurses, Pharmacologists, Pharmacists, Administrators | 28 | EHR with the addition of a medication display
37 | Al Ghalayini, Antoun, Moacdich | 2018 | Lebanon | Family Medicine Residents | 13 | EHR evaluation
38 | Allen et al. | 2006 | USA | "Experts" experienced in usability testing | 4 | EHR evaluation
39 | Belden et al. | 2017 | USA | Primary Care Physicians | 16 | Electronic clinical notes
40 | Brown et al. | 2001 | USA | Nurses | 10 | Electronic clinical notes
41 | Brown et al. | 2016 | UK | Health Information System Evaluators | 8 | Electronic quality‐improvement tool
42 | Brown et al. | 2018 | UK | Primary Care Physicians | 7 | Electronic quality‐improvement tool
43 | Chang et al. | 2011 | USA | Nurses, Home Aides, Physicians, Research Assistants | 60 | EHR on mobile devices
44 | Chang et al. | 2017 | Taiwan | Medical Students, Physician Assistant Students | 132 | EHR with the addition of a medication display
45 | Devine et al. | 2014 | USA | Cardiologists, Oncologists | 10 | EHR with clinical decision support tool
46 | Fidler et al. | 2015 | USA | Critical Care Physicians, Nurses | 10 | Monitoring – physiology (for patients with arrhythmias)
47 | Forsman et al. | 2013 | Sweden | Specialist Physicians, Resident Physicians, Usability Experts | 12 | EHR evaluation
48 | Fossum et al. | 2011 | Norway | Registered Nurses | 25 | EHR with clinical decision support tool
49 | Gardner et al. | 2017 | USA | Staff Physicians, Fellows, Medical Residents, Nurse Practitioners, Physician Assistants | 14 | Monitoring – physiology (for patients with heart failure)
50 | Garvin et al. | 2019 | USA | Gastroenterology Fellows, Internal Medicine Residents, Interns | 20 | EHR with clinical decision support tool for patients with cirrhosis
51 | Glaser et al. | 2013 | USA | Undergraduates, Physicians, Registered Nurses | 18 | EHR with the addition of a medication display
52 | Graber et al. | 2015 | Iran | Physicians | 32 | EHR with the addition of a medication display
53 | Hirsch et al. | 2012 | Germany | Physicians | 29 | EHR with clinical decision support tool
54 | Hirsch et al. | 2015 | USA | Internal Medicine Residents, Nephrology Fellows | 12 | EHR evaluation
55 | Hortman, Thompson | 2005 | USA | Faculty Members, Student Nurses | 5 | Electronic outcome database display
56 | Hultman et al. | 2016 | USA | Resident Physicians | 8 | EHR on mobile devices
57 | Iadanza et al. | 2019 | Italy | An evaluator | 1 | EHR with ophthalmological pupillometry display
58 | Jaspers et al. | 2008 | Netherlands | Clinicians | 116 | EHR evaluation
59 | Kersting, Weltermann | 2019 | Germany | General Practitioners, Practice Assistants | 18 | EHR for supporting longitudinal care management of multimorbid seniors
60 | Khairat et al. | 2019 | USA | ICU Physicians (Attending Physicians, Fellows, Residents) | 25 | EHR evaluation
61 | Khajouei et al. | 2017 | Iran | Nurses | 269 | Electronic clinical notes
62 | King et al. | 2015 | USA | Intensive Care Physicians | 4 | EHR evaluation
63 | Koopman, Kochendorfer, Moore | 2011 | USA | Primary Care Physicians | 10 | EHR with clinical decision support tool for diabetes
64 | Laursen et al. | 2018 | Denmark | Human Computer Interaction Experts, Dialysis Nurses and Nephrologists | 8 | EHR with clinical decision support tool for patients in need of haemodialysis therapy
65 | Lee et al. | 2017 | South Korea | Professors, Fellows, Residents, Head Nurses, Nurses | 383 | EHR evaluation
66 | Lin et al. | 2017 | Canada | Physicians, Nurses, Respiratory Therapists | 22 | EHR evaluation
67 | Mazur et al. | 2019 | USA | Residents and Fellows (Internal Medicine, Family Medicine, Paediatrics Specialty, Surgery, Other) | 38 | EHR evaluation
68 | Nabovati et al. | 2014 | Iran | Evaluators | 3 | EHR evaluation
69 | Nair et al. | 2015 | Canada | Family Physicians, Nurse Practitioners, Family Medicine Residents | 13 | EHR with clinical decision support tool for chronic pain
70 | Neri et al. | 2012 | USA | Genetic Counsellors, Nurses, Physicians | 7 | Electronic genetic profile display
71 | Nouei et al. | 2015 | Iran | Surgeons, Assistants, Other Surgery Students (Residents or Fellowship) | unknown | EHR evaluation within theatres
72 | Pamplin et al. | 2019 | USA | Physicians, Nurses, Respiratory Therapists | 41 | EHR evaluation
73 | Rodriguez et al. | 2002 | USA, Puerto Rico | Internal Medicine Resident Physicians | 36 | EHR evaluation
74 | Schall et al. | 2015 | France | General Practitioners, Pharmacists, Non‐Clinician E‐Health Informatics Specialists, Engineers | 12 | EHR with clinical decision support tool
75 | Seroussi et al. | 2017 | USA | Nurses, Physicians | 7 | EHR evaluation
76 | Silveira et al. | 2019 | Brazil | Cardiologists and Primary Care Physicians | 15 | EHR with clinical decision support tool for patients with hypertension
77 | Su et al. | 2012 | Taiwan | Student Nurses | 12 | EHR evaluation
78 | Tappan et al. | 2009 | Canada | Anaesthesiologists, Anaesthesia Residents | 22 | EHR evaluation within theatres
79 | Van Engen‐Verheul et al. | 2016 | Netherlands | Nurses, Social Worker, Medical Secretary, Physiotherapist | 9 | EHR evaluation
80 | Wachter et al. | 2003 | USA | Anaesthesiologists, Nurse Anaesthetists, Residents, Medical Students | 46 | Electronic pulmonary investigation results display
81 | Wu et al. | 2009 | Canada | Family Physicians, Internal Medicine Physicians | 9 | EHR on mobile devices
82 | Zhang et al. | 2009 | USA | Physicians, Health Informatics Professionals | 8 | EHR evaluation
83 | Zhang et al. | 2013 | USA | Physicians, Health Informatics Professionals | unknown | EHR evaluation
84 | Zheng et al. | 2007 | USA | Active Resident Users, Internal Medicine Residents | 30 | EHR with clinical reminders
85 | Zheng et al. | 2009 | USA | Residents | 30 | EHR with clinical reminders

(a) SOFA: Sequential Organ Failure Assessment.

Of the included studies, 16 evaluated generic EHR systems. Eleven evaluated EHR decision support tools (four for all ward patients, and one each for patients with diabetes, chronic pain, cirrhosis, haemodialysis therapy, hypertension, cardiac rehabilitation, and combined management of hypertension, type‐2 diabetes and dyslipidaemia). Seven evaluated specific electronic displays (physiological data for patients with heart failure or arrhythmias, genetic profiles, an electronic outcomes database, longitudinal care management of multimorbid seniors, chromatic pupillometry data, and pulmonary investigation results). Four evaluated medication‐specific interfaces. Three evaluated electronic displays for patients' clinical notes. Three evaluated mobile EHR systems. Two evaluated EHR systems with clinical reminders. Two evaluated quality‐improvement tools. Two evaluated systems for use in the operating theatre environment, and one evaluated a Sequential Organ Failure Assessment score calculator to quantify the risk of sepsis. We extracted data on GUIs. All articles provided some description of their GUIs, but these were often incomplete or consisted of a single screenshot, so it was not possible to extract further useful information on them. Appendix Table S4 presents the type of data included in the EHR systems.

Usability evaluation methods

Eleven types of methods to evaluate usability were used in the 51 studies included in this review. These are summarized in Table 2. We categorized the 11 methods into broader groups: user trials analysis, heuristic evaluations, interviews, and questionnaires. Most authors applied more than one method to evaluate electronic systems. User trials were the most common method, used in 44 studies (86%). Questionnaires were used in 40 studies (78%), interviews in 10 studies (20%), and heuristic evaluation in seven studies (14%). We categorized thinking aloud, observation, a three‐step testing protocol, comparative usability testing, functional analysis, and sequential pattern analysis as user trials analysis. The types of usability evaluation methods are described in Table 3.
TABLE 2

Usability evaluation methods

User trial analysis
Ref | User trial | Thinking aloud | Observation | Comparative usability testing | Three‐step testing protocol | Functional analysis | Sequential pattern analysis | Cognitive walkthrough | Heuristic evaluation | Questionnaire/Surveys | Interview
35 * * *
36 * * * * *
37 * * * *
38 *
39***
40 * *
41 * *
42 * * * *
43 * *
44 * *
45 * * * *
46 * * *
47 * * * * *
48 * * * * * *
49 * * *
50 * * *
51 * *
52 * *
53 * *
54 * * *
55 * * * *
56 * * *
57 *
58 * *
59 * * * *
60 * * *
61 * *
62 * * * *
63 * * * *
64 * * *
65 * *
66 * * * *
67 * *
68 *
69 * * * *
70 * * * *
71 * * *
72 * * * *
73 * *
74 * * *
75 * * *
76 * *
77 * * *
78 * * *
79 * * *
80 * * *
81 * * *
82 *
83 * *
84 *
85 *
N | 44 | 23 | 13 | 1 | 1 | 1 | 2 | 3 | 7 | 40 | 10
% | 86 | 45 | 25 | 2 | 2 | 2 | 4 | 6 | 14 | 78 | 20
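The N and % rows of Table 2 can be derived mechanically from per‐study method records. As a minimal illustrative sketch (Python, with made‐up study records rather than the review's extraction data):

    from collections import Counter

    # Hypothetical records of which evaluation methods each study used;
    # in the review, 44/51 studies (86%) used user trials, 40 (78%)
    # used questionnaires, and so on.
    studies = [
        {"user trial", "thinking aloud", "questionnaire"},
        {"heuristic evaluation", "interview"},
        {"user trial", "questionnaire"},
    ]

    counts = Counter(method for s in studies for method in s)
    for method, n in counts.most_common():
        print(f"{method}: N={n} ({100 * n / len(studies):.0f}%)")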
TABLE 3

Description of the methods included as User Trials Analysis

User trial: A process through which end‐users (or potential end‐users) complete tasks using the system under evaluation. Every participant should be aware of the purpose of the system and the analysis. According to Neville et al., participants should be "walked through" the task under analysis. One of the main objectives of a user trial is to collect observational data, although information sometimes also comes from post‐test interviews or questionnaires. (35; studies using user trials are indicated in Table 2)

Thinking aloud: Verbal reporting method that generates information on the cognitive processes of the user during task performance. The user verbalizes their thoughts as they interact with the interface. (36-51)
Observation: Direct and remote observation of users interacting with the system. (52)
Comparative usability testing: Examines the time to acquire information and the accuracy of the information acquired. (53)
Three‐step testing protocol: Tests for intuitiveness within the system. Step one asks users to identify relevant features within the interface. Step two requires users to connect the clinical variables of interest. Step three asks users to diagnose clinical events based on the emergent features of the display. (54)
Functional analysis: Measures "functions" within the EHR and classifies them as either Operations or Objects. Operations are then subclassified into Domains or Overheads. (55, 56)
Sequential pattern analysis: Searches for recurring patterns within a large number of event sequences. Designed to show "combinations of events" appearing consistently, in chronological order, and in a recurring fashion. (57, 58)
Cognitive walkthrough: Walkthrough of a scenario, executing the actions that could take place during completion of the task, with comments about use of the interface. It measures ease of learning for new users. (8, 36, 39, 59, 60, 61, 62)
Heuristic evaluation: A method that helps identify usability problems using a checklist of heuristics. Types of heuristic evaluation methods are reported in Appendix Table S5. (7, 37, 49, 59, 63, 64, 65, 66, 67, 68)
Questionnaire/survey: Research instrument used for collecting data from a selected group of respondents. The questionnaires used in the included studies are summarized in Appendix Table S7.
Interview: Structured research method, which may be applied before, during, or after the user trial. We identified six types of interviews (follow‐up, unstructured, prestructured, semi‐structured, contextual, and post‐test), described in Appendix Table S6. The purpose of interviews applied before the user trial (unstructured, follow‐up, and semi‐structured) was to understand the end‐users' needs, their environment, and information/communication flow, and to identify possible changes that could improve the process/workflow. The goal of interviews applied during the user trial (contextual) was to observe end‐users while they work, to collect information about the potential utility of the system. The purpose of interviews applied after the user trial (prestructured, post‐test, and semi‐structured) was mainly to gather information about missing data, system weaknesses, opportunities for improvement, and users' expectations for further system development. (38, 39, 42, 43, 69, 70, 71, 72, 73, 74)
Three heuristic evaluation methods were used in seven of the included studies. Four studies used the method described by Zhang et al. One of these studies also used the seven clinical knowledge heuristics outlined by Devine et al. The three remaining studies used the heuristic checklist introduced by Nielsen. A severity rating scale was sometimes used to judge the importance or severity of usability problems. Findings from the heuristic analyses are summarized in Appendix Table S5.

Six types of interviews were used in 10 (20%) studies. The interviews were carried out before, during, or after the user trial. The purpose of interviews before the user trial (unstructured, follow‐up, and semi‐structured) was to understand the end‐users' needs, their environment, and information/communication flow, and to identify possible changes. The purpose of interviews during the user trial (contextual) was to observe end‐users while using the system, to collect information about its potential utility. The purpose of interviews following the user trial (prestructured, post‐test, and semi‐structured [one called an in‐depth debriefing semi‐structured interview]) was mainly to gather information about missing data, system weaknesses, opportunities for improvement, and users' expectations for further system development. Findings from interviews are summarized in Appendix Table S6.

Among the questionnaires, the System Usability Scale (SUS) was used in 16 studies, the NASA Task Load Index (NASA‐TLX) in six studies, the Post‐Study System Usability Questionnaire (PSSUQ) in five studies, the Questionnaire for User Interaction Satisfaction (QUIS) in four studies, and the Computer Usability Satisfaction Questionnaire (CSUQ) in three studies. The questionnaires used in the included studies are summarized in Appendix Table S7.
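For orientation, the SUS mentioned above is scored with a fixed published formula; the sketch below reflects that standard scoring (not a procedure reported by this review), with an invented response set. Each of the ten items is rated 1-5; odd (positively worded) items contribute the rating minus 1, even (negatively worded) items contribute 5 minus the rating, and the sum is multiplied by 2.5 to give a 0-100 score.

    def sus_score(responses):
        """Score a 10-item System Usability Scale response set (ratings 1-5)."""
        if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
            raise ValueError("SUS needs ten ratings on a 1-5 scale")
        contributions = [
            (r - 1) if i % 2 == 0 else (5 - r)  # index 0 is item 1, an odd item
            for i, r in enumerate(responses)
        ]
        return 2.5 * sum(contributions)

    print(sus_score([4, 2, 5, 1, 4, 2, 5, 1, 4, 2]))  # -> 85.0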

Usability metrics

The usability metrics are summarized in Table 4. Satisfaction was measured in 38 studies (75%), efficiency in 32 studies (63%), effectiveness in 31 studies (61%), usefulness in 20 studies (39%), errors in 16 studies (31%), learnability in 12 studies (24%), and memorability in one study (2%).
TABLE 4

Usability metrics

Ref | Satisfaction | Efficiency | Effectiveness | Learnability | Memorability | Errors | Usefulness | Total
77 *1
36 *****5
78 ****4
63 *1
53 ***3
79 ***3
59 ***3
69 *****5
80 ****4
81 ****4
37 *****5
82 ****4
38 ****4
39 ****4
83 ***3
70 *****5
84 ***3
85 *****5
86 **2
40 ***3
87 ****4
41 ***3
64 *1
60 ***3
71 0
72 ***3
88 *1
42 *1
43 ****4
65 0
89 ***3
44 *****5
90 ***3
66 **2
45 **2
46 *****5
73 **2
74 ***3
91 ****4
48 ******6
92 **2
93 **2
47 ****4
94 ****3
49 ***3
54 *1
50 ****4
55 ****4
95 0
57 0
58 0
Total | 38 | 32 | 31 | 12 | 1 | 16 | 20
% | 75 | 63 | 61 | 24 | 2 | 31 | 39

Usability metrics within usability evaluation methods

Table 5 summarizes the variety of usability evaluation methods used to quantify the different metrics. Some authors used more than one method within the same study (e.g., user trial and a questionnaire) to assess the same metric.
TABLE 5

Usability metrics and the usability methods used to measure them. Values are the number of studies

Metric | User Trials | Heuristic Evaluation | Interviews | Questionnaires
Satisfaction | 10 | 1 | 2 | 31
Efficiency | 29 | 0 | 0 | 2
Effectiveness | 29 | 0 | 0 | 2
Learnability | 4 | 0 | 0 | 10
Memorability | 0 | 1 | 0 | 0
Errors | 11 | 5 | 1 | 1
Usefulness | 5 | 0 | 4 | 11
Satisfaction and errors were assessed using all four categories of usability evaluation methods. Satisfaction (analysed in 38 studies) was measured using questionnaires (31 studies), user trials (10 studies), interviews (two studies), and heuristic evaluation (one study). The metrics most frequently reported from user trials were efficiency and effectiveness (both measured in 29 studies). For heuristic evaluation the most frequent metric was errors; for interviews it was usefulness (four studies); and for questionnaires it was satisfaction (31 studies) followed by usefulness (11 studies). Results were reported in different ways regardless of the type of usability evaluation method or usability metric applied, so we compiled a list of measurement techniques.
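A cross‐tabulation like Table 5 can be built directly from (study, metric, method) triples recorded during data extraction. As a minimal sketch (Python with pandas; the triples here are hypothetical, not the review's data):

    import pandas as pd

    rows = [  # hypothetical extraction records
        ("s1", "satisfaction", "questionnaire"),
        ("s1", "efficiency", "user trial"),
        ("s2", "satisfaction", "user trial"),
        ("s2", "errors", "heuristic evaluation"),
    ]
    df = pd.DataFrame(rows, columns=["study", "metric", "method"])
    # Count distinct studies per metric-method pair, as in Table 5.
    table5 = df.pivot_table(index="metric", columns="method",
                            values="study", aggfunc="nunique", fill_value=0)
    print(table5)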

Usability metrics' measurement techniques

We found that different measurement techniques were used to report the metrics, ranging from 1 to 25 techniques per metric. Appendix Table S8 presents all the measurement techniques applied for each identified metric and how each technique was used (e.g., within a user trial, survey/questionnaire, interview, or heuristic evaluation). The greatest variety in usability metric reporting was found for Nielsen's errors quality component (23 measurement techniques), ISO 9241‐11 effectiveness (15 techniques), and our newly identified usefulness metric (12 techniques).

User errors, reported using 23 different measurement techniques, were most often reported as the number of errors (n = 4) or the percentage of errors made (n = 6). Authors sometimes provided contextual information about the type of errors (n = 5) or the reason for errors (n = 1). These measurement techniques were investigated within user trials.

The effectiveness metric was reported with 15 measurement techniques. The most frequent were the number of successfully completed tasks (eight studies), the percentage of correct responses (four studies), and the percentage of participants able to complete tasks (three studies).

Efficiency was mostly reported as time to complete tasks (n = 27), sometimes as a comparison against an alternative system (n = 13). Task completion was also measured by the number of clicks (n = 11); five studies compared the number of clicks to a predetermined optimal path. In two cases the time taken to assess the patient's state was also measured.

Satisfaction was reported using eight measurement techniques, most frequently questionnaire results (n = 31), general user comments related to satisfaction with the system (n = 10), the number of positive comments (n = 4), the number of negative comments (n = 4), or users' preference between two tested system versions (n = 1).

The usefulness metric was reported using 12 different measurement techniques. These included users' comments regarding the utility of the system in clinical practice (n = 5), comments about the usefulness of the layout (n = 1), the average score of system usefulness (n = 5), and total mean scores for work system usefulness‐related dimensions (n = 1).

Results of the quality assessment are summarized in the appendix (Appendix Table S9). We did not exclude articles due to poor quality. For the original D&B quality assessment, the mean score (out of a possible 32 points) was 9.9, and the median and mode scores were both 10. The included studies scored best in the reporting domain, with seven of the 10 questions generating points. Studies scored inconsistently (and generally poorly) in the bias and confounding domains, and no study scored points in the power domain (Appendix Table S10). Using the modified D&B checklist, the mean score was 6.8 and the median 7.0 out of a possible 10 points. Seventeen studies (33%) were classified as "high quality" (scoring eight or higher), 26 studies (51%) as "moderate quality" (six or seven), and the remaining eight studies (16%) as "low quality" (five or below). The relationship between the two versions of the D&B scores is shown in the appendix (Appendix Figure S1).
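To make the most common measurement techniques in this section concrete, here is a small illustrative sketch (Python) of how effectiveness, efficiency, and error counts could be computed from raw user‐trial logs; the log records and optimal click count are invented for illustration:

    from dataclasses import dataclass
    from statistics import mean

    @dataclass
    class TrialRecord:
        completed: bool   # task success (effectiveness)
        seconds: float    # time on task (efficiency)
        clicks: int       # interaction cost (efficiency)
        errors: int       # user errors

    OPTIMAL_CLICKS = 6  # assumed predetermined optimal path length

    log = [TrialRecord(True, 42.0, 8, 1),
           TrialRecord(True, 35.5, 6, 0),
           TrialRecord(False, 90.0, 15, 3)]

    # Effectiveness: percentage of successfully completed tasks.
    effectiveness = 100 * sum(r.completed for r in log) / len(log)
    # Efficiency: mean time on task and click overhead vs the optimal path.
    mean_time = mean(r.seconds for r in log)
    extra_clicks = mean(r.clicks - OPTIMAL_CLICKS for r in log)
    # Errors: total count across attempts.
    total_errors = sum(r.errors for r in log)

    print(f"effectiveness={effectiveness:.0f}%, time={mean_time:.1f}s, "
          f"extra clicks={extra_clicks:.1f}, errors={total_errors}")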

DISCUSSION

Main findings

This review demonstrates wide variability in both the methodological approaches and the quality of the considerable amount of research undertaken to evaluate EHR systems. EHR systems, despite being expensive and complex to implement, are becoming increasingly important in patient care. Given the pragmatic, rather than experimental, nature of EHR systems, it is not surprising that their evaluation requires observational or case‐control study designs. Common methodological failings were unreferenced and incorrectly named usability evaluation methods, and discrepancies between study aims, methods, and results (e.g., authors did not state an intention to measure certain metrics but subsequently reported those metrics in the results, or described usability evaluation methods in the methods section but did not present the corresponding results). In future, well‐conducted EHR system evaluation will require established, human‐factors‐engineering‐driven evaluation methods, with clear descriptions of study aims, methods, users, and time frames. The Medicines and Healthcare products Regulatory Agency (MHRA) requires this process for medical devices, and it is logical that a comparable level of uniform evaluation may benefit EHR systems.

Strengths

We have summarized the usability evaluation methods, metrics, and measurement techniques used in studies evaluating EHR systems. To our knowledge this has not been done before. Our results tables may therefore be used as a goal‐oriented matrix to guide those requiring a usability evaluation method, usability metric, or combination of the two when studying a newly implemented electronic system in the healthcare environment. We identified usefulness as a novel metric, which we believe has the potential to enhance healthcare system testing. Our modified D&B quality assessment checklist has not been validated, but it has the potential to be developed into a tool better suited to assessing studies that evaluate medical systems. By highlighting the methodological inconsistencies in this field we hope to improve the quality of its research, which may in turn lead to better systems being implemented in clinical practice.

Limitations

The limitations of the included studies were reflected in the quality assessment: none scored >41% on the original D&B checklist, indicating poor overall methodological quality. Results from our modified D&B quality assessment scale were better, but still showed that over half the studies were of low or moderate quality. A significant proportion of current research into EHR system usability is conducted by commercial, nonacademic entities, which have little financial incentive to publish their work unless the results are favourable; this review is therefore subject to publication bias and is unlikely to reflect all current practice. It was sometimes difficult to extract data on the methods used in the included studies. This may reflect a lack of consensus on how to conduct studies of this nature, or a systematic lack of rigour in this field of research.

CONCLUSION

To our knowledge, this systematic review is the first to consolidate applied usability metrics (with their specifications) within usability evaluation methods used to assess electronic health systems used exclusively by clinical staff. It highlights the lack of consensus on methods to evaluate the usability of EHR systems. It is possible that healthcare work efficiencies are hindered by the resultant inconsistencies. The use of multiple metrics, and the variation in the ways they are measured, may lead to flawed evaluation of systems, which in turn may lead to the development and implementation of less safe and less effective digital platforms. We suggest that the main usability metrics defined by ISO 9241‐11 (efficiency, effectiveness, and satisfaction), used in combination with usefulness, may form part of an optimized method for evaluating electronic health systems used by clinical staff. Assessing satisfaction by reporting users' positive and negative comments; efficiency by time to task completion and time taken to assess the patient's state; effectiveness by the number or percentage of completed tasks and by quantifying user errors; and usefulness by user trials with think‐aloud methods may also form part of an optimized approach to usability evaluation. Our review supports the concept that high‐performing electronic health systems for clinical use should allow successful (effective) and quick (efficient) task completion with high satisfaction levels, and that they should be evaluated against these expectations using established and consistent methods. Usefulness may also form part of this methodology in the future.

CONFLICT OF INTEREST

The authors declare that they have no competing interests.

ETHICS STATEMENT

Ethical approval was not required for this study.

AUTHORS' CONTRIBUTIONS

MW, LM and PW designed the study, undertook the methodological planning and led the writing. TP advised on the search strategy and enabled exporting of results. JM, VW and DY assisted in study design, contributed to data interpretation, and commented on successive drafts of the manuscript. All authors read and approved the final manuscript.

SUPPORTING INFORMATION

Appendix Figure S1: Comparative performance of the Downs & Black and modified Downs & Black quality assessment checklists (x‐axis: reference number; y‐axis: score (%) on each checklist).

Appendix Table S1: Preferred Reporting Items for Systematic Reviews and Meta‐Analyses (PRISMA) checklist. From: Moher D, Liberati A, Tetzlaff J, Altman DG, The PRISMA Group (2009). Preferred Reporting Items for Systematic Reviews and Meta‐Analyses: The PRISMA Statement. PLoS Med 6(7): e1000097. doi:10.1371/journal.pmed.1000097. For more information, visit www.prisma-statement.org.

Appendix Table S2: Search strategy.

Appendix Table S3: Modified Downs & Black quality assessment checklist.

Appendix Table S4: Information on GUIs: type of data included in electronic health record systems.

Appendix Table S5: Heuristic evaluation.

Appendix Table S6: Interviews.

Appendix Table S7: Questionnaires. Abbreviations: SUS = System Usability Scale; PSSUQ = Post‐Study System Usability Questionnaire; QUIS = Questionnaire for User Interaction Satisfaction; CSUQ = Computer Usability Satisfaction Questionnaire; SEQ = Single Ease Question; OAIQ = Object‐Action Interface Questionnaire; QQ = Qualitative Questionnaire; USQ = User Satisfaction Questionnaire; SUSQ = Subjective User Satisfaction Questionnaire; TAM = Technology Acceptance Model questionnaire; PTSQ = Post‐Task Satisfaction Questionnaire; PTQ = Post‐Test Questionnaire; UQ = Usability Questionnaire; UEQ = Usability Evaluation Questionnaire; PQ = Physician's Questionnaire; TPBT = three paper‐based tests; 10‐item SQ = 10‐item Satisfaction Questionnaire; NASA = NASA Task Load Index; PUS = Perceived Usability Scale; CQ = Clinical Questionnaire; Lee et al Quest = unnamed questionnaire in Lee et al. 2017; 2 sets of quest = two sets of questionnaires in Zheng et al. 2013; InterRAI MDS‐HC 2.0; EHRUS = Electronic Health Record Usability Scale; SAQ = self‐administered questionnaire; PVAS = post‐validation assessment survey; 5pS = usability score on a 5‐point scale; CSS = Crew Status Survey.

Appendix Table S8: How usability metric results were reported, with the number of studies using each measurement technique. S/Q = survey/questionnaire; UT = user trial; CW‐HE = cognitive walkthrough; I = interview.

Appendix Table S9: Quality assessment results (%) using the Downs & Black checklists.

Appendix Table S10: Domains within the Downs & Black checklist: the % score of each included study for each domain.