Rachel A Hadler, Franklin Dexter, Bradley J Hindman.
Abstract
Introduction: In this study, we tested whether raters' (residents' and fellows') decisions to evaluate (or not) critical care anesthesiologists were significantly associated with clinical interactions documented in electronic health record progress notes, and whether that influenced the reliability of supervision scores. We used the de Oliveira Filho clinical supervision scale for the evaluation of faculty anesthesiologists. For operating room evaluations, email requests were sent to raters who had worked one hour or longer with the anesthesiologist in an operating room the preceding day. In contrast, for critical care evaluations, potential raters were asked to evaluate all critical care anesthesiologists scheduled in intensive care units during the preceding week.
Methods: Over 7.6 years, raters (N = 172) received a total of 7764 requests to evaluate 21 critical care anesthesiologists. Each rater received a median (and mode) of three evaluation requests per week, one per anesthesiologist on service that week. In this retrospective cohort study, we related responses (2970 selections of "insufficient interaction" to evaluate the faculty and 3127 completed supervision scores) to progress notes (N = 25,469) electronically co-signed by the rater-anesthesiologist combination during that week.
Results: Raters with few jointly signed notes were more likely to select insufficient interaction (P < 0.0001): 62% with no joint notes versus 1% with at least 20 joint notes during the week. Still, rater-anesthesiologist combinations with no co-authored notes accounted not only for most (78%) of the evaluation requests but also for most (56%) of the completed evaluations (both P < 0.0001). Among combinations in which each anesthesiologist received evaluations from multiple (at least nine) raters and each rater evaluated multiple anesthesiologists, most (72%) of the rater-anesthesiologist combinations were among raters who had no co-authored notes with the anesthesiologist (P < 0.0001).
Conclusions: Routine use of the supervision scale should select raters not only from their scheduled clinical site but also by using electronic health record data verifying joint clinical workload with the anesthesiologist.
Keywords: anesthesiology; hospitals; personnel management; psychometrics; teaching
Year: 2022 PMID: 35494980 PMCID: PMC9036497 DOI: 10.7759/cureus.23500
Source DB: PubMed Journal: Cureus ISSN: 2168-8184
Table 1. Items in the supervision scale used for the evaluation of critical care anesthesiologist faculty (each item scored 1 = never, 2 = rarely, 3 = frequently, or 4 = always)
de Oliveira Filho and colleagues developed their scale using a Delphi process among faculty anesthesiologists and resident physicians. Slight modifications were made to the scale to focus on critical care medicine. For item 5, "airway management" was used instead of "anesthesia induction." For item 6, "peri-anesthesia management" was shortened to "management" and "anesthetic procedure" to "procedure." For item 7, the examples in parentheses differ (e.g., "anesthesia machine checkout" was removed). All items had to be answered before an evaluation could be submitted.
| Sequence | Item |
| 1 | The faculty provided me timely, informal, nonthreatening comments on my performance and showed me ways to improve |
| 2 | The faculty was promptly available to help me solve problems with patients and procedures |
| 3 | The faculty used real clinical scenarios to stimulate my clinical reasoning, critical thinking, and theoretical learning |
| 4 | The faculty demonstrated theoretical knowledge, proficiency at procedures, ethical behavior, and interest/compassion/respect for patients |
| 5 | The faculty was present during the critical moments of procedures (e.g., airway management, critical events, complications) |
| 6 | The faculty discussed with me the management of patients prior to starting a procedure or new therapy and accepted my suggestions, when appropriate |
| 7 | The faculty taught and demanded the implementation of safety measures (e.g., time outs, infection control practices, consideration of deep vein thrombosis and stress ulcer prophylaxis and patient mobilization) |
| 8 | The faculty treated me respectfully and strove to create and maintain a pleasant environment during my clinical activities |
| 9 | The faculty gave me opportunities to perform procedures and encouraged my professional autonomy |
Table 2. Internal consistency, concurrent validity, and generalizability analysis of the critical care supervision scale
aTo interpret Cronbach’s alpha: for each respondent, four or five of the nine items in the scale were selected (Table 1) and their mean score was calculated; the mean score of the remaining five or four items was also calculated. Among all respondents, the correlation coefficient between the paired split-half mean scores was computed. The process was repeated for all possible four-item versus five-item splits, and the mean of the resulting correlation coefficients was taken as the Cronbach’s alpha value, measuring the internal consistency of the items [8].
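The split-half procedure described above can be sketched as follows. This is a minimal illustration, assuming `scores` is a respondents-by-items array; the published alpha of 0.961 used the actual 3127 evaluations, not simulated data.

```python
import itertools
import numpy as np

def split_half_alpha(scores):
    """Mean split-half correlation over all 4-item vs 5-item splits
    of a 9-item scale, per the procedure described in footnote a."""
    n_items = scores.shape[1]
    items = range(n_items)
    corrs = []
    # each 4-item combination pairs with its unique 5-item complement,
    # so every unordered split is enumerated exactly once
    for half in itertools.combinations(items, 4):
        other = [i for i in items if i not in half]
        m1 = scores[:, list(half)].mean(axis=1)
        m2 = scores[:, other].mean(axis=1)
        corrs.append(np.corrcoef(m1, m2)[0, 1])
    return float(np.mean(corrs))
```

As a sanity check, nine identical item columns yield a value of 1.0, and adding item-level noise to a shared signal lowers the value toward the items' shared variance.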
bThe 95% confidence interval for the Spearman rank correlation coefficient was calculated asymptotically. The P-value was calculated by Monte Carlo simulation to six-digit accuracy (StatXact 12.0; Cytel, Inc., Cambridge, MA).
cTo estimate the generalizability coefficient with different numbers of independent raters, the initial conditions for integer programming were the 3127 scored evaluations among the subset (N = 157) of raters who each evaluated at least six anesthesiologists and the subset (N = 20) of anesthesiologists who were each evaluated by at least six raters, for a total of 1207 observed rater-anesthesiologist combinations, each with at least one evaluation. For each combination, the mean of the average scores was used in an integer program solved to maximize the number of combinations in a balanced design. Excel 365 Solver (Microsoft, Redmond, WA) was used with the Evolutionary solving method, automatic scaling, a mutation rate of 0.15, and a population size of 150. With multiple initial conditions and parameter values, the same solution was obtained: N = 21 raters and N = 9 anesthesiologists.
dThe Stata command used was gstudy (Stata 16.1; StataCorp LLC, College Station, TX).
| Test | Finding | Result, calculation, and interpretation, in sequence |
| Internal consistency | 1 | Cronbach’s alpha of the 9 items equaled 0.961 (95% confidence interval 0.959-0.963) |
| Internal consistency | 2 | The sample size, N = 3127 completed (scored) critical care evaluations. |
| Internal consistency | 3 | This “excellent” result was comparable to that for the operating room supervision scale when completed by residents and fellows, Cronbach’s alpha 0.95 and 0.98 |
| Concurrent validity | 1 | Spearman’s rank correlation between anesthesiologists’ paired average critical care evaluation supervision scores and average operating room evaluation supervision scores was 0.732 (95% confidence interval 0.406-0.999, P = 0.0027). |
| Concurrent validity | 2 | The sample size was N = 15 critical care anesthesiologists who also provided clinical anesthesia care in operating rooms |
| Concurrent validity | 3 | The correlation was greater than observed previously |
| Generalizability coefficient | 1 | With the largest possible fully crossed design (21 raters and 9 anesthesiologists), the estimated generalizability coefficient was 0.65 (i.e., ≈101 raters to obtain G-coefficient 0.90).c |
| Generalizability coefficient | 2 | The analysis of variance method was used to estimate this relative generalizability coefficient for the one facet completely crossed design.d |
| Generalizability coefficient | 3 | This low generalizability coefficient finding prompted the current study. |
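The step from a G-coefficient of 0.65 with 21 raters to roughly 101 raters for 0.90 can be sketched with the Spearman-Brown prophecy formula, as used in decision (D) studies; this assumes the reported 0.65 is the reliability of the mean of 21 raters.

```python
def g_for_m_raters(g1, m):
    # Spearman-Brown: reliability of the mean of m independent raters,
    # given single-rater reliability g1
    return m * g1 / (1 + (m - 1) * g1)

def raters_needed(g_m, m, target):
    # invert the observed G (for m raters) back to single-rater g1,
    # then solve for the number of raters reaching the target G
    g1 = g_m / (m - g_m * (m - 1))
    return target * (1 - g1) / ((1 - target) * g1)
```

With g_m = 0.65 and m = 21, `raters_needed(0.65, 21, 0.90)` gives about 102; the reported ≈101 is consistent with this once the rounding of the 0.65 estimate is considered.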
Table 3. Distribution of jointly signed progress notes among rater and ratee (critical care anesthesiologist) combinations
| Rater and ratee combinations, sorted within each week and rater in descending order of jointly signed progress note counts (N = 7764 evaluation requests) | Number of rater and ratee combinations | Mean (standard deviation) of progress notes for the week | 90th percentile of progress notes for the week |
| Rater and ratee combinations with the largest number of jointly electronically signed progress notes that week | 2697 | 9.17 (9.28) | 22 |
| Rater and ratee combinations with the second largest number of jointly electronically signed progress notes that week | 2583 | 0.27 (1.64) | 0 |
| Rater and ratee combinations with the third largest number of jointly electronically signed progress notes that week | 1909 | 0.02 (0.23) | 0 |
| Rater and ratee combinations with the fourth largest number of jointly electronically signed progress notes that week | 493 | 0.01 | 0 |
| Rater and ratee combinations with the fifth largest number of jointly electronically signed progress notes that week | 82 | 0 | 0 |
Table 4. Relationship between endpoints and ordered categories of the number of patient progress notes signed electronically during the week by both the rater (i.e., resident or fellow) and the ratee (i.e., critical care anesthesiologist)
aTo interpret the 7764 requests: among the 397 studied weeks, there was a mean of 6.8 (SD 2.8) anesthesiology resident physicians and critical care fellows assigned weekly to the surgical and neurological intensive care unit or the cardiac intensive care unit. Each rater received email requests to evaluate a mean of 3.0 (SD 0.8) faculty (potential ratees) per week. The product of 397 weeks, 6.8 potential raters per week, and 3.0 potential ratees per rater does not equal 7764 because the mean is taken for each rater.
bThe percentages of evaluation requests and of completed evaluations in the zero-notes category are each compared with 50% when testing the claim of “most”; these two percentages are labeled b. The 95% confidence interval for 6019/7764 is 77% to 78%; for 1756/3127, it is 54% to 58%.
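The binomial confidence intervals above can be reproduced to the reported precision with a Wilson score interval; the exact interval method used in the paper is not stated, so this choice is an assumption.

```python
import math

def wilson_ci(k, n, z=1.959964):
    """Wilson score 95% confidence interval for a binomial
    proportion of k successes in n trials."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half
```

`wilson_ci(6019, 7764)` rounds to 77% and 78%, and `wilson_ci(1756, 3127)` to 54% and 58%, matching the intervals reported above.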
| Variable segmented by categories of notes completed jointly with the faculty ratee | 0 | 1 to 8 | 9 to 14 | 15 to 19 | 20 to 42 | Cochran-Armitage trend test P-value for the row |
| Requests for evaluation (% of the 7764 requests)a | 6019 (78%)b | 392 (5%) | 449 (6%) | 473 (6%) | 431 (6%) | |
| Responses, either “insufficient” or the 9 items completed (% of column Requests for evaluation, from the preceding row) | 4677 (78%) | 306 (78%) | 379 (84%) | 388 (82%) | 347 (81%) | 0.0012 |
| Insufficient given as response (% of column Responses, from the preceding row) | 2921 (62%) | 21 (7%) | 12 (3%) | 11 (3%) | 5 (1%) | <0.0001 |
| Evaluations completed (% of the 3127 evaluations) | 1756 (56%)b | 285 (9%) | 367 (12%) | 377 (12%) | 342 (11%) | |
| Evaluations with all nine items given scores of 4 (% of column Evaluations completed, from the preceding row) | 1292 (74%) | 186 (65%) | 259 (71%) | 277 (73%) | 261 (76%) | 0.61 |
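The trend test reported in the table's last column can be sketched as follows. The category scores 0-4 for the five note-count bins are an assumption (the paper does not report which scores were used); applying the sketch to the counts of the "insufficient" row (2921 of 4677, 21 of 306, 12 of 379, 11 of 388, 5 of 347) gives a strongly negative statistic with P far below 0.0001, consistent with the table.

```python
import math

def cochran_armitage(successes, totals, scores=None):
    """Two-sided Cochran-Armitage test for a linear trend in
    proportions across ordered groups. Default scores 0, 1, 2, ..."""
    if scores is None:
        scores = list(range(len(totals)))
    N = sum(totals)
    pbar = sum(successes) / N
    # linear trend statistic and its variance under no trend
    t = sum(s * (k - n * pbar) for s, k, n in zip(scores, successes, totals))
    var = pbar * (1 - pbar) * (
        sum(n * s * s for s, n in zip(scores, totals))
        - sum(n * s for s, n in zip(scores, totals)) ** 2 / N
    )
    z = t / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P-value
    return z, p
```

When the proportions are identical across groups, the statistic is zero and the P-value is 1, as expected.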
Table 5. Relationship between endpoints and cumulative counts of patient progress notes signed electronically during the week by both the rater (i.e., resident or fellow) and the ratee (i.e., critical care anesthesiologist)
aThe categories were created cumulatively from those in Table 4, selected using integer programming based on minimizing the root mean square differences in sample sizes of evaluation requests among categories.
bThe evaluation response rate was 79% = 6097/7764 (the preceding row of the same column). Among the 172 raters, 14 had at least 100 requests; their mean (SD) response rate was 79% (20%). Therefore, there was considerable heterogeneity of response rates among raters.
cRaters averaged 9.17 notes per week with one faculty (ratee) but only 0.27 notes per week with the faculty having the second-largest number of joint notes (Table 3). Also, there was a mean of 2.9 invitations per week, with a median and mode of 3 invitations. Therefore, if raters completed evaluations only for the one of three faculty with whom they worked regularly, the expected percentage of “insufficient” responses would have been approximately 2/3, not the 49% in the cell labeled c. In addition, there was heterogeneity in this 49% among raters: among the 172 raters, 11 had at least 100 responses, with a mean (standard deviation) percentage insufficient of 39% (24%).
dWe found that “most (72%) of the usable combinations of raters and anesthesiologists were among raters who had no notes signed jointly with the ratee.” “Most” (i.e., >50%) is compared with the percentage of rater and ratee combinations limited exclusively to weeks with no jointly signed progress notes; the counts used are labeled d. The reported 72% = (610 − 172)/610. The 95% confidence interval is 68% to 75%.
| Variable versus cumulative counts of notes completed jointly with the faculty ratee | 0 to 42a | 1 to 42a | 9 to 42a | 15 to 42a |
| Requests for evaluation (% of the 7764 requests) | 7764 (100%) | 1745 (22%) | 1353 (17%) | 904 (12%) |
| Responses, either “insufficient” or the 9 items completed (% of column Requests for evaluation, from the preceding row) | 6097 (79%)b | 1420 (81%) | 1114 (82%) | 735 (81%) |
| Insufficient given as response (% of column Responses, from the preceding row) | 2970 (49%)c | 49 (3%) | 28 (3%) | 16 (2%) |
| Evaluations completed (% of the 3127 evaluations) | 3127 (100%) | 1371 (44%) | 1086 (35%) | 719 (23%) |
| Evaluations with a mean score of 4.00 across the 9 items (% of column Evaluations completed, from the preceding row) | 2275 (73%) | 983 (72%) | 797 (73%) | 538 (75%) |
| Evaluations completed excluding raters with all scores at 4.00 (% of the 3127 evaluations) | 2652 (85%) | 1066 (34%) | 819 (26%) | 451 (14%) |
| Rater and ratee combinations with ≥1 evaluation, excluding raters with all scores at 4.00 (% of the 1207 combinations) | 1004 (83%) | 660 (55%) | 556 (46%) | 335 (28%) |
| Rater and ratee combinations, excluding raters with all scores at 4.00, raters with <9 ratees, and ratees with <9 raters (% of the 1207 combinations) | 610 (51%)d | 172 (14%)d | 86 (7%) | 0 (0%) |