Consensus-Based Expert Development of Critical Items for Direct Observation of Point-of-Care Ultrasound Skills.

Irene W Y Ma, Janeve Desy, Michael Y Woo, Andrew W Kirkpatrick, Vicki E Noble.

Abstract

BACKGROUND: Point-of-care ultrasound (POCUS) is increasingly used in a number of medical specialties. To support competency-based POCUS education, workplace-based assessments are essential.
OBJECTIVE: We developed a consensus-based assessment tool for POCUS skills and determined which items are critical for competence. We then performed a standards-setting procedure to establish cut scores for the tool.
METHODS: Using a modified Delphi technique, 25 experts voted on 32 items over 3 rounds between August and December 2016. Consensus was defined as agreement by at least 80% of the experts. Twelve experts then performed 3 rounds of a standards setting procedure in March 2017 to establish cut scores.
RESULTS: Experts reached consensus for 31 items to include in the tool. Experts reached consensus that 16 of those items were critically important. A final cut score for the tool was established at 65.2% (SD 17.0%). Cut scores for critical items were significantly higher than those for noncritical items (76.5% ± SD 12.4% versus 53.1% ± SD 12.2%, P < .0001).
CONCLUSIONS: We reached consensus on a 31-item workplace-based assessment tool for identifying competence in POCUS. Of those items, 16 were considered critically important. Their importance is further supported by their higher cut scores compared with those for noncritical items. © Accreditation Council for Graduate Medical Education 2020.

Year: 2020 PMID: 32322351 PMCID: PMC7161337 DOI: 10.4300/JGME-D-19-00531.1

Source DB: PubMed Journal: J Grad Med Educ ISSN: 1949-8357

What was known and gap

Point-of-care ultrasound (POCUS) is increasingly used in a number of medical specialties. Workplace-based assessments are essential, and there is a need to establish what checklist items are critical when assessing POCUS skills.

What is new

A consensus-based assessment tool for POCUS skills was developed.

Limitations

The tool provides guidance on which assessment items are critically important; it does not specify to educators how a learner must successfully complete those items.

Bottom line

Consensus was reached on a 31-item workplace-based assessment tool for identifying competence in POCUS, with 16 items considered critically important.

Introduction

Point-of-care ultrasound (POCUS) is increasingly being integrated into patient care in many specialties, such as emergency medicine,1,2 critical care,3–5 anesthesiology,6–8 and internal medicine.9,10 To support competency-based education,11 training programs need to establish a programmatic approach to assessment.12 Recurrent workplace-based observations are essential to help trainees achieve competence and to support decisions and judgments regarding their competence.13,14 To date, multiple assessment tools for POCUS skills have been published, with varying amounts of validity evidence to support the interpretation of their scores.15–23 These tools are primarily checklists, global rating scales, or a combination of both. Although data suggest that reliability measures and sensitivity to expertise may be higher for global rating scales,24,25 checklists may be easier and more intuitive for untrained raters to use.26,27 However, checklists risk “rewarding thoroughness,” allowing a learner to successfully complete multiple trivial items while masking the commission of a single serious error.27–31 As such, there is a need to establish which checklist items are critical in POCUS, so that incompetent performances are appropriately identified. This study sought to develop a consensus-based assessment tool for POCUS skills and to determine which items are critical for competence.

Methods

Assessment Tool Construction

Draft assessment items were collated by 2 authors (I.W.Y.M. and V.E.N.) based on a review of the relevant literature regarding directly observed POCUS assessments.16,19,32–40 Items were then grouped according to key domains (introduction/patient interactions, use of the ultrasound machine, choice of scans, image acquisition, image interpretation, and clinical integration). For each item, respondents were asked to rate its importance for inclusion in a rating tool and to indicate whether learners must successfully complete that item to be considered competent in POCUS (yes, critical; no, noncritical). Importance was rated using a 3-point Likert scale (1, marginal; 2, important; 3, essential to include). This draft survey was then reviewed by all coauthors for item relevance and completeness. It was subsequently piloted for content, clarity, and flow with 5 faculty members who taught POCUS in an educational setting (1 emergency physician, 1 general internist, 2 surgeons, and 1 anatomist) and 2 postgraduate year-5 internal medicine residents who had completed 1 month of POCUS training. This piloted survey became the instrument used in the first round of the consensus process.

Consensus Process

Between August and December 2016, using a modified Delphi technique,41 we conducted 3 rounds of an online survey to establish consensus from an expert panel of diverse POCUS specialists and sought their input on the draft assessment items identified in the prior construction stage. Specifically, we sought to achieve consensus on which of the items should be included in a POCUS assessment tool and which items should be considered critical. The POCUS experts were identified using nonprobability convenience sampling based on international reputation and recruited via an e-mail invitation. Inclusion criteria included completion of at least 1 year of POCUS fellowship training and/or a minimum of 3 years of teaching POCUS. Consensus to include was defined as 80% or more experts agreeing that an item was essential or important to include in the tool, and consensus to exclude was 80% or more agreeing that the item was marginal. Similarly, consensus for a critical item was defined as 80% or more agreeing that the item must be successfully completed to be considered competent. Items for which the experts had not reached consensus but had ≥ 70% agreement were readdressed in subsequent rounds in which items were rated in a binary fashion (yes, should include, versus no, should not include).
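The consensus rules above reduce to threshold checks on vote shares. The Python sketch below is an illustration only, not the authors' software: the function names and return labels are hypothetical, and applying the 70% readdress rule to whichever share (include or exclude) is larger is an assumption, because the text does not say which direction of agreement it refers to. Exact fractions are used so that a share of exactly 80% is not misclassified by floating-point rounding.

```python
from fractions import Fraction
from typing import Iterable

CONSENSUS = Fraction(80, 100)  # >= 80% agreement defines consensus
READDRESS = Fraction(70, 100)  # >= 70% (but < 80%) sends an item to the next round


def classify_share(agree: int, total: int) -> str:
    """Classify a yes/no vote, e.g. 'must this item be completed to be competent?'.

    In the final round, 'readdress' simply means consensus was not reached.
    """
    share = Fraction(agree, total)
    if share >= CONSENSUS:
        return "consensus"
    if share >= READDRESS:
        return "readdress"
    return "no consensus"


def inclusion_status(ratings: Iterable[int]) -> str:
    """Classify 3-point Likert ratings (1 marginal, 2 important, 3 essential).

    'include' if >= 80% rate 2 or 3; 'exclude' if >= 80% rate 1.
    Assumption: the 70% readdress rule uses the larger of the two shares.
    """
    ratings = list(ratings)
    if not ratings or any(r not in (1, 2, 3) for r in ratings):
        raise ValueError("ratings must be a non-empty list of 1, 2, or 3")
    keep = Fraction(sum(r >= 2 for r in ratings), len(ratings))
    drop = 1 - keep
    if keep >= CONSENSUS:
        return "include"
    if drop >= CONSENSUS:
        return "exclude"
    if max(keep, drop) >= READDRESS:
        return "readdress"
    return "no consensus"


# Critical-item votes reported in the Results section.
print(classify_share(20, 24))  # round 2, cleaning item (83%): consensus
print(classify_share(18, 22))  # round 3, next steps item (82%): consensus
print(classify_share(16, 22))  # round 3, explains procedure (73%): readdress
```

Because round 3 was the last round, the 73% result for the "Explains procedure" item ends as no consensus, matching the Results.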

Standards Setting

To set cut scores for the tool that distinguish competent from not-competent POCUS performances, we invited 12 experts to attend a 3-hour standards-setting meeting on March 6, 2017, either in person or via teleconference. We required that at least 50% of these subject matter experts be new (ie, not have participated in the initial expert panel). At the start of the meeting, we oriented experts to the standards-setting task (a modified, iterative Angoff method).42,43 Experts then discussed the behaviors that characterize a borderline POCUS performance to establish a shared mental model of minimally competent performances, defined as those performed unsupervised and considered minimally acceptable. For each item, experts anonymously estimated the percentage of minimally competent POCUS learners who would complete the item successfully. In other words, for each item, experts were asked to consider a group of 100 borderline learners and estimate how many would successfully complete it. Experts were blinded to whether the item had previously been determined by the consensus process to be critically important. The Angoff method was modified to be iterative: items with large variances (SD ≥ 25%) were discussed and readdressed in subsequent rounds.44 We decided in advance that no more than 3 rounds of standards setting would be conducted. The final cut score for the entire tool was derived from the mean of the individual-item expert estimates, and the final cut score for the critical items was derived from the mean of the critical-item expert estimates. This study was approved by the University of Calgary Conjoint Health Research Ethics Board.
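The iterative Angoff computation described above can be sketched as follows. This is a hedged illustration under stated assumptions: the rating values are hypothetical (not the study's data), `rerate` is a hypothetical stand-in for the discussion-and-rerating step, and the use of the sample SD (`statistics.stdev`) is an assumption, since the text does not specify sample versus population SD.

```python
from statistics import mean, stdev
from typing import Callable, Dict, Iterable, List, Optional

# item name -> expert estimates (% of 100 borderline learners expected to succeed)
Ratings = Dict[str, List[float]]

SD_THRESHOLD = 25.0  # items with SD >= 25 percentage points are discussed and rerated
MAX_ROUNDS = 3       # fixed in advance


def flag_items(ratings: Ratings) -> List[str]:
    """Return items whose expert estimates are too dispersed (sample SD >= threshold)."""
    return [item for item, est in ratings.items() if stdev(est) >= SD_THRESHOLD]


def iterative_angoff(
    initial: Ratings, rerate: Callable[[List[str], int], Ratings]
) -> Dict[str, float]:
    """Run up to MAX_ROUNDS rounds; return each item's cut score (mean estimate)."""
    ratings = dict(initial)
    for round_no in range(2, MAX_ROUNDS + 1):
        flagged = flag_items(ratings)
        if not flagged:
            break
        # Only flagged items are rerated after discussion; others keep earlier estimates.
        ratings.update(rerate(flagged, round_no))
    return {item: mean(est) for item, est in ratings.items()}


def cut_score(item_cuts: Dict[str, float], items: Optional[Iterable[str]] = None) -> float:
    """Mean of item-level cut scores, over all items or a subset (e.g. critical items)."""
    keys = list(item_cuts) if items is None else list(items)
    return mean(item_cuts[k] for k in keys)


# Hypothetical 4-expert panel rating 2 items.
round1 = {"item A": [70.0, 80.0, 75.0, 85.0], "item B": [10.0, 90.0, 40.0, 80.0]}


def rerate(flagged: List[str], round_no: int) -> Ratings:
    # Hypothetical post-discussion estimates for the flagged item.
    return {"item B": [40.0, 60.0, 50.0, 55.0]}


cuts = iterative_angoff(round1, rerate)
print(flag_items(round1))  # ['item B']
print(cuts)                # {'item A': 77.5, 'item B': 51.25}
print(cut_score(cuts))     # 64.375
```

Passing a subset of item names to `cut_score` mirrors how a separate cut score for the critical items is obtained from the same item-level means.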

Statistical Analysis

Standard descriptive statistics were used in this study. Comparisons of measures between groups were performed using Student's t tests. A 2-sided P value of .05 or less was considered to indicate statistical significance. All analyses were conducted using SAS version 9.4 (SAS Institute Inc, Cary, NC).

Results

Of the 27 experts invited to the panel, 25 (93%) agreed to participate. Their baseline characteristics are presented in Table 1.

Table 1

Baseline Characteristics of Expert Panels for Assessment Tool Construction and Standards Setting

Baseline Characteristics	Consensus-Based Tool Construction, n (%)	Standards Setting Process, n (%)
Total number of experts	25 (100)	12 (100)
Specialty^a
Cardiology	2 (8)	0 (0)
Critical care medicine	3 (12)	2 (17)
Emergency medicine	14 (56)	8 (67)
Internal medicine	8 (32)	2 (17)
Pediatric emergency medicine	1 (4)	0 (0)
Surgery	0 (0)	1 (8)
Gender
Male	20 (80)	7 (58)
Female	5 (20)	5 (42)
Location of practice
United States of America	18 (72)	8 (67)
California	3 (12)	0 (0)
Colorado	1 (4)	1 (8)
Maine	0 (0)	1 (8)
Massachusetts	3 (12)	2 (17)
Minnesota	2 (8)	1 (8)
North Carolina	1 (4)	0 (0)
New York	1 (4)	0 (0)
Ohio	1 (4)	2 (17)
Oregon	2 (8)	1 (8)
Pennsylvania	1 (4)	0 (0)
South Carolina	2 (8)	0 (0)
Texas	1 (4)	0 (0)
Canada	7 (28)	4 (33)
Alberta	0 (0)	1 (8)
British Columbia	2 (8)	0 (0)
New Brunswick	1 (4)	0 (0)
Ontario	4 (16)	3 (25)
Years of point-of-care ultrasound experience, y
3–4	1 (4)	0 (0)
5–6	3 (12)	2 (17)
7–8	2 (8)	2 (17)
9–10	3 (12)	2 (17)
More than 10	16 (64)	6 (50)
Completed ≥ 1 y of ultrasound fellowship training	16 (64)	9 (75)

^a Participants were allowed to choose more than 1 option.


Assessment Tool

All 25 experts (100%) completed round 1. Experts reached consensus for 31 of the 32 items (97%) for inclusion. The remaining item, “Ensures machine charged when not in use,” was readdressed in round 2. In round 1, the experts reached consensus that 14 of the 32 items (44%) were critically important. The group also reached consensus that 2 additional items were not critical (“Ensures machine charged when not in use” and “Scans with efficiency of hand motion”). Experts did not reach consensus on critical importance for the remaining 16 of 32 items (50%). Round 2 was completed by 24 of the experts (96%). For the item “Ensures machine charged when not in use,” only 10 of the 24 (42%) felt it should be included in the tool; that item was dropped and not considered further. In round 2, consensus was achieved on the critical importance of 1 of the 15 items (7%) on which the group had not reached consensus in round 1: 20 of the 24 experts (83%) would fail a learner who does not “appropriately clean the machine and transducers.” The 2 items that had ≥ 70% agreement for being critical (“Able to undertake appropriate next steps in the setting of unexpected or incidental findings” and “Explains procedure—explain ultrasound, its role, and images—where applicable”) were readdressed in round 3. Round 3 was completed by 22 of the 25 experts (88%), who reached consensus that the item “Able to undertake appropriate next steps in the setting of unexpected or incidental findings” was critically important (18 of 22, 82%). The group did not achieve consensus on the item “Explains procedure—explain ultrasound, its role, and images—where applicable” (16 of 22, 73%). The final 31 items included in the assessment tool and the 16 determined to be critical are listed in Table 2.

Table 2

Final 31-Item Assessment Tool: Critical Items and Established Cut Scores

Item	Critical Items^a	Expert Estimate %^b (SD)
Introduction
1. Introduces self where applicable (ie, if not already known to patient, patient not critically ill)		72.8 (20.4)
2. Explains procedure (explains ultrasound, its role, and images) where applicable (ie, patient not critically ill)		74.2 (16.1)
3. Washes hands		49.0 (17.8)
4. Ensures patient appropriately and discreetly exposed		55.3 (22.1)
5. Explains ultrasound findings appropriately (even if unsure of results), where applicable	Yes (1)	74.6 (18.1)
Appropriate use of the machine
6. Appropriately positions the machine		54.3 (19.6)
7. Appropriately applies basic knobology (eg, on/off, depth, gain)	Yes (1)	86.7 (14.8)
8. Appropriately uses examination presets		52.5 (24.8)
9. Chooses correct transducer	Yes (1)	90.0 (14.1)
10. Appropriately enters patient identifier		43.2 (15.7)
11. Able to store relevant images and clips	Yes (1)	61.3 (21.5)
12. Appropriately cleans machine and transducers	Yes (2)	42.1 (16.3)
13. Able to ensure safety of transducers (eg, not dropping transducers)		42.8 (24.1)
Choice of scans based on clinical relevance
14. Conducts the appropriate types of scans	Yes (1)	80.8 (14.0)
15. Conducts scans in the appropriate prioritization/sequence		64.1 (23.2)
16. Applies appropriate clinical reasoning in choice of scans	Yes (1)	70.1 (10.2)
Image acquisition
17. Attains minimal criteria	Yes (1)	84.2 (16.1)
18. Positions patient appropriately for specific scans		60.1 (18.6)
19. Scans with adequate transducer pressure		56.5 (19.0)
20. Scans adequately through the entire area of interest	Yes (1)	78.8 (19.8)
21. Able to optimize image appropriately when necessary		42.1 (17.6)
22. Adjusts focal zone appropriately (where relevant and available)		32.5 (18.0)
23. Scans with efficiency of hand motion		37.8 (20.6)
Image interpretation
24. Able to recognize key findings	Yes (1)	88.3 (11.1)
25. Able to recognize when images are inadequate/insufficient for a given indication	Yes (1)	87.1 (20.5)
26. Recognizes relevant artifacts	Yes (1)	68.3 (19.1)
Scan integration/clinical decision making
27. Able to determine when and what additional imaging studies/investigations are necessary	Yes (1)	82.2 (17.4)
28. Able to appropriately determine patient disposition based on ultrasound findings	Yes (1)	79.2 (16.9)
29. Able to appropriately incorporate test characteristics (eg, sensitivity/specificity/likelihood ratios) into clinical decision making		60.0 (17.5)
30. Able to appropriately manage unexpected or unknown findings on bedside ultrasound	Yes (3)	67.9 (17.5)
31. Overall, able to determine appropriate next clinical steps	Yes (1)	83.3 (12.1)
Final cut score for the 31-item tool		65.2 (17.0)
Final cut score for the 16 critical item tool		76.5 (12.4)

^a Critical items are those for which the experts indicated that a learner should fail the competency-based assessment if the item was not performed satisfactorily; the numbers in parentheses indicate the round in which consensus on the critical item was achieved.

^b Expert estimate % refers to the experts' estimated percentage of borderline learners who would successfully complete the item.

Standards Setting

Twelve experts participated in the standards-setting exercise (Table 1). Of those, 6 (50%) had served on the tool construction panel. In round 1, cut scores were established for 27 of the 31 items (87%). Four items with an SD ≥ 25% were discussed and readdressed in round 2 (“Washes hands,” “Appropriately enters patient identifier,” “Appropriately cleans machine and transducers,” and “Able to ensure safety of transducers”). After discussion and rerating of those 4 items in round 2, only 1 item continued to have an SD ≥ 25% (“Able to ensure safety of transducers”). After discussion in round 3, that item achieved an SD < 25% (mean 42.8% ± SD 24.1%). The final cut score for the tool was established at 65.2% ± SD 17.0% (Table 2). Cut scores for critical items were significantly higher than those for noncritical items (76.5% ± SD 12.4% versus 53.1% ± SD 12.2%, P < .0001). Cut scores for critical items were also significantly higher than the cut score for the full assessment tool (P = .022).
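As a consistency check, the summary statistics can be recomputed from the item-level expert estimates in Table 2. The Python sketch below uses only the standard library and assumes, as the reported SDs suggest, that the comparison treated the 16 critical and 15 noncritical item means as the two samples in a pooled-variance (Student) t test; from the rounded table values it reproduces the reported means and SDs to within about 0.1 percentage point. The P value additionally requires the t distribution, for example via `scipy.stats.ttest_ind`, whose default `equal_var=True` performs Student's test.

```python
from math import sqrt
from statistics import mean, stdev

# Expert estimate % for each of the 31 items in Table 2, keyed by item number.
ESTIMATES = {
    1: 72.8, 2: 74.2, 3: 49.0, 4: 55.3, 5: 74.6, 6: 54.3, 7: 86.7, 8: 52.5,
    9: 90.0, 10: 43.2, 11: 61.3, 12: 42.1, 13: 42.8, 14: 80.8, 15: 64.1,
    16: 70.1, 17: 84.2, 18: 60.1, 19: 56.5, 20: 78.8, 21: 42.1, 22: 32.5,
    23: 37.8, 24: 88.3, 25: 87.1, 26: 68.3, 27: 82.2, 28: 79.2, 29: 60.0,
    30: 67.9, 31: 83.3,
}
# Items marked "Yes" in the Critical Items column of Table 2.
CRITICAL = {5, 7, 9, 11, 12, 14, 16, 17, 20, 24, 25, 26, 27, 28, 30, 31}


def student_t(a, b):
    """Pooled-variance (Student) two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


critical = [v for k, v in ESTIMATES.items() if k in CRITICAL]
noncritical = [v for k, v in ESTIMATES.items() if k not in CRITICAL]
everything = list(ESTIMATES.values())

for label, values in [("all 31", everything), ("critical", critical), ("noncritical", noncritical)]:
    print(f"{label}: mean {mean(values):.1f}, SD {stdev(values):.1f}")

t, df = student_t(critical, noncritical)
print(f"critical vs noncritical: t = {t:.2f}, df = {df}")
```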

Discussion

In this study, using consensus group methods,45 our experts agreed on 31 items to be included in the workplace-based POCUS assessment tool. POCUS is a complex skill, involving image acquisition, image interpretation, and clinical integration of findings at the bedside.46 Our tool included items in each of those domains.16,46 In addition, it included items emphasizing the importance of appropriate patient interactions as part of POCUS competence,47 serving to articulate for educators the breadth of key tasks relevant to the assessment of bedside POCUS skills. Of the 31 items on the tool, only 16 (52%) were felt to be critically important. Although critical items for clinical and procedural skills have previously been published,30,48–51 to our knowledge, they have not been established for general POCUS skills. Delineating which items are critical is important for POCUS for 2 reasons. First, POCUS is a relatively new skill. For general medicine, its role has only recently been officially recognized.9 Having few faculty trained in this skill continues to be the most significant barrier to curriculum implementation for general medicine.52,53 In Canada, only approximately 7% of internal medicine faculty54 and 30% of family medicine physicians are trained in POCUS.55 Without trained faculty, appropriate assessment of trainee skills is highly challenging. Critical items can help guide faculty development efforts by helping faculty focus on key essential tasks, thereby managing rater workload more effectively56 and improving rater performance.57

Second, using key items in assessments may result in higher diagnostic accuracy30,51 and superior reliability measures,58 and may benefit training and patient safety.29 In the era of competency-based medical education,11 mastery-based learning is associated with improved clinical outcomes.59,60 Achievement of minimum passing scores set by an expert panel is associated with superior skills and patient outcomes.61–63 While expert panel cut scores are commonly used for standards setting, others have argued that traditional standards-setting methods allow learners to miss a fixed percentage of assessment items without attention to which items are missed, raising patient safety concerns.29 We have noted similar concerns in procedural skills assessments, in which learners may achieve very high checklist scores despite having committed serious procedural errors.27,31 In our present study, we first established which items were considered critical by consensus group methods. We then applied standards-setting procedures to evaluate cut scores. Although blinded to whether an item was considered critical, our expert panel set significantly higher cut scores for critical items than for noncritical items, suggesting that those items may indeed be qualitatively different. Specifically, critical items dealt with key skills in image acquisition (items 7, 9, 14, and 16; Table 2), interpretation (items 17, 20, 24, 25, and 26), and safe patient management, such as clinical integration (items 27, 28, 30, and 31), communication of findings (items 5 and 11), and infection control (item 12).

Our study has some limitations. While our tool provides guidance on which assessment items are critically important, it does not specify to educators how a learner must successfully complete each item. For example, the item “Attains minimal criteria” still requires that faculty be able to recognize which images are of sufficient quality for image interpretation to be possible. Therefore, faculty training will continue to be an important part of trainee assessment. Further, even when it is known which items are critical, there is at present no clear guidance on how to assess those items. Three options have been proposed. From a patient safety perspective, many feel that learners should be required to successfully complete all critical items to be considered competent.64 However, although this approach is appealing from a patient safety perspective, it may have greater consequences for the learner. Thus, the defensibility of that approach will require additional validity evidence to support its use. For example, evidence demonstrating that raters can rate those items with high interrater reliability would be helpful.65 A second approach involves setting separate cut scores for critical and noncritical items (in the same manner as our present study).64 Finally, a third approach involves applying item weights,65 which may be challenging because experts may not agree on what weights to apply. Indeed, in our study, despite iterative discussions, the final variance on some items remained wide, suggesting disagreement among experts. Future studies should determine which of those 3 methods is superior in distinguishing competent performances from incompetent ones.
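To make the three options concrete, the Python sketch below applies each decision rule, plus a traditional single overall cut score, to one hypothetical learner who completes every item except a single critical one (item 12, cleaning the machine and transducers). The cut scores come from Table 2; the item weights (critical items counted double) and the use of the 31-item cut score for the weighted rule are illustrative assumptions, because the study did not set weights. The function names are hypothetical, and scoring a learner as the percentage of items completed is the conventional way an Angoff-derived cut score is applied.

```python
from typing import Dict, Iterable

CRITICAL = {5, 7, 9, 11, 12, 14, 16, 17, 20, 24, 25, 26, 27, 28, 30, 31}  # Table 2
TOOL_CUT, CRITICAL_CUT, NONCRITICAL_CUT = 65.2, 76.5, 53.1               # Table 2


def percent(done: Dict[int, bool], items: Iterable[int]) -> float:
    """Percentage of the given items that the learner completed."""
    items = list(items)
    return 100 * sum(done[i] for i in items) / len(items)


def single_cut(done: Dict[int, bool]) -> bool:
    """Traditional rule: overall percentage of items completed vs one cut score."""
    return percent(done, done) >= TOOL_CUT


def all_critical(done: Dict[int, bool]) -> bool:
    """Option 1: every critical item must be completed."""
    return all(done[i] for i in CRITICAL)


def separate_cuts(done: Dict[int, bool]) -> bool:
    """Option 2: critical and noncritical items must each meet their own cut score."""
    noncritical = [i for i in done if i not in CRITICAL]
    return (percent(done, CRITICAL) >= CRITICAL_CUT
            and percent(done, noncritical) >= NONCRITICAL_CUT)


def weighted(done: Dict[int, bool], critical_weight: float = 2.0) -> bool:
    """Option 3 (hypothetical weights): weighted percentage vs the 31-item cut score."""
    weights = {i: critical_weight if i in CRITICAL else 1.0 for i in done}
    earned = sum(w for i, w in weights.items() if done[i])
    return 100 * earned / sum(weights.values()) >= TOOL_CUT


# Hypothetical learner: completes all 31 items except item 12.
learner = {i: i != 12 for i in range(1, 32)}
for rule in (single_cut, all_critical, separate_cuts, weighted):
    print(rule.__name__, rule(learner))
```

Only the all-critical rule fails this learner, which illustrates the masking of a single serious error raised in the Introduction; under the separate-cut rule, a learner would need to miss 4 of the 16 critical items (75%) to fall below the 76.5% critical-item cut score.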

Conclusions

Our experts agreed on 31 items for inclusion in a workplace-based assessment tool for POCUS. Of those, 16 (52%) were felt to be critical in nature, with significantly higher cut scores than those for noncritical items. For determining competency in directly observed POCUS skills, faculty should pay particular attention to those items and ensure that they are completed successfully.
