Effect of Artificial Intelligence Tutoring vs Expert Instruction on Learning Simulated Surgical Skills Among Medical Students: A Randomized Clinical Trial.

Ali M Fazlollahi^1,2, Mohamad Bakhaidar^1,2,3, Ahmad Alsayegh^1,2,3, Recai Yilmaz^1,2, Alexander Winkler-Schwartz^1,2, Nykan Mirchi^1, Ian Langleben^1,2, Nicole Ledwos^1, Abdulrahman J Sabbagh^3,4, Khalid Bajunaid^5, Jason M Harley^6,7,8,9, Rolando F Del Maestro^1,2.

Abstract

Importance: To better understand the emerging role of artificial intelligence (AI) in surgical training, efficacy of AI tutoring systems, such as the Virtual Operative Assistant (VOA), must be tested and compared with conventional approaches. Objective: To determine how VOA and remote expert instruction compare in learners' skill acquisition, affective, and cognitive outcomes during surgical simulation training. Design, Setting, and Participants: This instructor-blinded randomized clinical trial included medical students (undergraduate years 0-2) from 4 institutions in Canada during a single simulation training at McGill Neurosurgical Simulation and Artificial Intelligence Learning Centre, Montreal, Canada. Cross-sectional data were collected from January to April 2021. Analysis was conducted based on intention-to-treat. Data were analyzed from April to June 2021. Interventions: The interventions included 5 feedback sessions, 5 minutes each, during a single 75-minute training, including 5 practice sessions followed by 1 realistic virtual reality brain tumor resection. The 3 intervention arms included 2 treatment groups, AI audiovisual metric-based feedback (VOA group) and synchronous verbal scripted debriefing and instruction from a remote expert (instructor group), and a control group that received no feedback. Main Outcomes and Measures: The coprimary outcomes were change in procedural performance, quantified as Expertise Score by a validated assessment algorithm (Intelligent Continuous Expertise Monitoring System [ICEMS]; range, -1.00 to 1.00) for each practice resection, and learning and retention, measured from performance in realistic resections by ICEMS and blinded Objective Structured Assessment of Technical Skills (OSATS; range 1-7). Secondary outcomes included strength of emotions before, during, and after the intervention and cognitive load after intervention, measured in self-reports.
Results: A total of 70 medical students (41 [59%] women and 29 [41%] men; mean [SD] age, 21.8 [2.3] years) from 4 institutions were randomized, including 23 students in the VOA group, 24 students in the instructor group, and 23 students in the control group. All participants were included in the final analysis. ICEMS assessed 350 practice resections, and ICEMS and OSATS evaluated 70 realistic resections. VOA significantly improved practice Expertise Scores by 0.66 (95% CI, 0.55 to 0.77) points compared with the instructor group and by 0.65 (95% CI, 0.54 to 0.77) points compared with the control group (P < .001). Realistic Expertise Scores were significantly higher for the VOA group compared with instructor (mean difference, 0.53 [95% CI, 0.40 to 0.67] points; P < .001) and control (mean difference, 0.49 [95% CI, 0.34 to 0.61] points; P < .001) groups. Mean global OSATS ratings were not significantly different among the VOA (4.63 [95% CI, 4.06 to 5.20] points), instructor (4.40 [95% CI, 3.88 to 4.91] points), and control (3.86 [95% CI, 3.44 to 4.27] points) groups. However, on the OSATS subscores, VOA significantly enhanced the mean OSATS overall subscore compared with the control group (mean difference, 1.04 [95% CI, 0.13 to 1.96] points; P = .02), whereas expert instruction significantly improved OSATS subscores for instrument handling vs control (mean difference, 1.18 [95% CI, 0.22 to 2.14] points; P = .01). No significant differences in cognitive load, positive activating emotions, or negative emotions were found. Conclusions and Relevance: In this randomized clinical trial, VOA feedback demonstrated superior performance outcome and skill transfer, with equivalent OSATS ratings and cognitive and emotional responses compared with remote expert instruction, indicating advantages for its use in simulation training. Trial Registration: ClinicalTrials.gov Identifier: NCT04700384.

Year: 2022 PMID: 35191972 PMCID: PMC8864513 DOI: 10.1001/jamanetworkopen.2021.49008

Source DB: PubMed Journal: JAMA Netw Open ISSN: 2574-3805

Introduction

Mastery of bimanual psychomotor skills is a defining goal of surgical education,[1,2] and wide variation in surgical skill among practitioners is associated with adverse intraoperative and postoperative patient outcomes.[3,4] Novel technologies, such as surgical simulators using artificial intelligence (AI) assessment systems, are improving our understanding of the composites of surgical expertise and have the potential to reduce skill heterogeneity by complementing competency-based curriculum training.[5,6,7] Virtual reality simulation and machine learning algorithms can objectively quantify performance and improve the precision and granularity of bimanual technical skills classification.[8,9,10] These systems may enhance surgical educators’ ability to develop more quantitative formative and summative assessment tools to manage future challenging pedagogic requirements.

The COVID-19 pandemic has significantly altered surgical trainees’ ability to obtain intraoperative instruction necessary for skill acquisition,[11] and innovative solutions, such as AI-powered tutoring systems, may help in addressing such disruptions.[12] An AI tutoring system refers to an educational platform driven by computer algorithms that integrate assessment with personalized feedback.[13] Our group has developed an AI tutoring system called the Virtual Operative Assistant (VOA) that uses a machine learning algorithm, support vector machine, to classify learner performance and provide goal-oriented, metric-based audiovisual feedback in virtual reality simulations.[14] Following the competency-based medical education model of the Royal College of Physicians and Surgeons of Canada,[15] and to mitigate extrinsic cognitive load through segmentation,[16] the system guides learners in 2 steps: first, helping trainees reach competency in safety metrics and second, evaluating metrics associated with instrument movement and efficiency.[14] The VOA AI tutoring system is designed for surgical simulation training, but its effectiveness compared with conventional surgical instruction is unknown.

Expert-led telementoring and virtual clerkships use technologies, such as augmented reality headsets and videotelephony software, for supervision and feedback.[17,18] With the ongoing pandemic, these adaptations may provide alternatives to intraoperative surgical instruction.[19] For this study, we followed the criterion standards of assessment and debriefing in surgical education, Objective Structured Assessment of Technical Skills (OSATS)[20] and Promoting Excellence and Reflective Learning in Simulation (PEARLS) debriefing guide,[21] to design a standardized expert-led remote training as the traditional control.

We sought to investigate VOA’s educational value by comparing it with remote expert instruction in enhancing technical performance and learning outcomes of medical students during brain tumor resection simulations and eliciting emotional and cognitive responses that are associated with supporting learning. Our hypothesis was that VOA feedback would be similar to remote expert instruction in performance outcomes but lead to stronger negative emotions and higher cognitive load.

Methods

This multi-institutional instructor-blinded randomized clinical trial was approved by McGill University Health Centre Research Ethics Board, Neurosciences–Psychiatry. All participants signed an informed consent form prior to participation. This report follows the Consolidated Standards of Reporting Trials involving AI (CONSORT-AI)[22] and Best Practices for Machine Learning to Assess Surgical Expertise.[23] The trial protocol and statistical analysis plan are available in Supplement 1.

Participants

Medical students with no surgical experience were invited to voluntarily participate. Recruitment information was shared among student networks, social media, and interest groups. Selection was based on meeting inclusion criterion: enrollment in Medicine Preparatory or first or second year of a medical program in Canada. Our exclusion criteria were participation in surgical clerkship or previous experience with the virtual reality simulator used in this study (NeuroVR; CAE Healthcare).

Randomization

Students were stratified by sex and block randomized to 3 intervention arms, allocation ratio of 1:1:1, using an internet-based, computer-generated random sequence.[24] Group allocation was concealed by the study coordinator, and instructors were notified of appointment times 1 day in advance for scheduling purposes. The participant recruitment flowchart is outlined in Figure 1.
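The allocation scheme described above can be illustrated with a minimal sketch. The trial used an internet-based sequence generator whose settings are not reported here, so the block size of 3, the seed, the participant IDs, and the function name `stratified_block_randomize` are all assumptions made for illustration only.

```python
import random
from collections import defaultdict

ARMS = ("VOA", "instructor", "control")  # 1:1:1 allocation


def stratified_block_randomize(participants, seed=0):
    """Assign each (participant_id, sex) pair to one arm.

    Within each sex stratum, arms are drawn from shuffled blocks of
    size 3 (an assumed block size), so every completed block contains
    each arm exactly once.
    """
    rng = random.Random(seed)   # hypothetical seed, for reproducibility only
    blocks = defaultdict(list)  # arms left in the current block, per stratum
    allocation = {}
    for pid, sex in participants:
        if not blocks[sex]:
            block = list(ARMS)
            rng.shuffle(block)
            blocks[sex] = block
        allocation[pid] = blocks[sex].pop()
    return allocation


# 41 women and 29 men, as in the trial; the IDs are made up.
participants = [(f"W{i}", "F") for i in range(41)] + [(f"M{i}", "M") for i in range(29)]
allocation = stratified_block_randomize(participants)
counts = {arm: sum(1 for a in allocation.values() if a == arm) for arm in ARMS}
print(counts)
```

Because 41 and 29 are not multiples of 3, the last block in each stratum is incomplete, which is one way arm sizes such as 23, 24, and 23 can arise under blocking.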

Figure 1.

Participant Recruitment Flowchart

Study Procedure

After participants provided written consent, they completed a background information questionnaire that recorded baseline emotions using the Medical Emotion Scale (MES),[25] experiences that may influence bimanual dexterity (ie, video games,[26] musical instruments[27]), deliberate practice (ie, competitive sports[28]), or prior virtual reality navigation. Students were not informed of the trial purpose or assessment metrics. Participants performed 5 practice simulated tumor resections[29] (eFigure 1 in Supplement 2), followed by feedback (intervention) or no feedback (control), then completed 1 realistic tumor resection simulation[30] (eFigure 2 in Supplement 2) to evaluate learning and transfer of technical skills. The MES self-report was administered again on completion of the fifth and sixth resections to assess participants’ emotions during and after the learning session, respectively, and the Cognitive Load Index[31] self-report was used to measure cognitive load after training.

Simulator

The tumor resection simulator, NeuroVR, simulates neurosurgical procedures on a high-fidelity platform that recreates the visual, auditory, and haptic experience of resecting human brain tumors (eFigure 3 in Supplement 2).[32] Because this simulator records timeseries data of users’ interaction in the virtual space,[33] machine learning algorithms have been demonstrated to successfully differentiate surgical expertise based on validated performance metrics.[8,9,10,34]

Virtual Reality Tumor Resection Procedures

Subpial resection is a neurosurgical technique in oncologic and epilepsy surgery that requires coordinated bimanual psychomotor ability to resect pathologic tissue with preservation of surrounding brain and vessels.[35] The student’s objective was to remove a simulated cortical tumor with minimal bleeding and damage to surrounding tissues using a simulated aspirator in the dominant hand and a simulated bipolar forceps in the nondominant hand (Video).[29,30] Participants received standardized verbal and written instructions on instrument use and performed orientation modules to understand each instrument’s functions. Individuals had 5 minutes to complete each practice resection and 13 minutes for the realistic resection. The first practice subpial resection was considered baseline performance.

Video.

Virtual Reality Brain Tumor Resection on the NeuroVR

Partial recording from a single participant performing the practice and the realistic virtual reality subpial resection task. The bipolar is held with the nondominant hand and appears on the left. The aspirator is held in the dominant hand and appears on the right.

Interventions

Participants were allocated 5 minutes between each resection session to receive the intended intervention. Both experimental arms followed principles of deliberate practice guided by self-regulated learning,[36,37] in which formative assessment enables finding areas of growth, setting goals, and adopting strategies that enhance competence.[38] The feedback received and progress toward learning objectives were monitored by either the VOA or an instructor.

VOA AI Tutoring

VOA estimates a competence percentage score and a binary expertise classification based on 4 metrics: assessment criteria selected through expert consultation and statistical, forward, and backward support vector machine feature selection.[14] Competence is evaluated in 2 steps, safety and instrument movement, each associated with 2 metrics: mean bleeding rate and maximum bipolar force application for step 1, and mean instrument tip separation distance and mean bipolar acceleration for step 2. Learners must achieve expert classification for safety metrics in step 1 before moving to step 2 to learn instrument movement metrics and achieve competency. Individuals classified as novice in any metric receive automated audiovisual feedback (eFigure 4 in Supplement 2).[14]
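The 2-step gating described above can be sketched as follows. This is a simplified illustration, not the VOA implementation: the metric identifiers and the function `next_feedback` are hypothetical, and the expert or novice label for each metric, which the real system obtains from a support vector machine, is passed in directly.

```python
# Hypothetical identifiers for the 4 metrics named in the text.
STEP1_SAFETY = ("mean_bleeding_rate", "max_bipolar_force")
STEP2_MOVEMENT = ("mean_tip_separation", "mean_bipolar_acceleration")


def next_feedback(classification):
    """Return (step, metrics_needing_feedback) for one practice resection.

    `classification` maps each metric name to "expert" or "novice".
    A learner stays in step 1 until both safety metrics are classified
    as expert; only then are the movement metrics considered.
    """
    novice_safety = [m for m in STEP1_SAFETY if classification[m] == "novice"]
    if novice_safety:
        return 1, novice_safety
    novice_movement = [m for m in STEP2_MOVEMENT if classification[m] == "novice"]
    return 2, novice_movement  # an empty list means both steps are passed


attempt = {
    "mean_bleeding_rate": "expert",
    "max_bipolar_force": "expert",
    "mean_tip_separation": "expert",
    "mean_bipolar_acceleration": "novice",
}
print(next_feedback(attempt))  # (2, ['mean_bipolar_acceleration'])
```

A learner who is novice in a movement metric but also novice in a safety metric would receive only the step 1 feedback, which reflects the segmentation the text describes.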

Remote Expert Instruction

Delivering traditional apprenticeship learning during the COVID-19 pandemic for a controlled experiment requires steps that minimize contact and ensure consistency. Two senior neurosurgery residents (M.B. and A.A., postgraduate year 5) who had experience performing human subpial resection procedures completed standardized training (eAppendix 1 in Supplement 2) to perform simulations within consultants’ benchmarks, reliably rate on-screen performances using a modified OSATS visual rating scale,[39] and provide feedback from a modified PEARLS debriefing script.[21] Instructors were blinded to AI assessment metrics. Prior to recruitment, the OSATS scale demonstrated good internal consistency (α = 0.82 [95% CI, 0.77 to 0.87]) and instructors achieved good interrater reliability (intraclass correlation coefficient, 0.84 [95% CI, 0.79 to 0.88]). Each participant’s live on-screen practice performance was assessed remotely by 1 randomly selected instructor (eFigure 5 in Supplement 2), who completed an assessment sheet (eAppendix 2 in Supplement 2). During debriefing, instructors followed a modified PEARLS script and provided feedback from a list of instructions, suggested by consultants, depending on students’ competency. The eTable in Supplement 2 contains details on feedback interventions.
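For readers unfamiliar with the internal consistency statistic reported above, the following sketch computes Cronbach α from a small matrix of ratings. The ratings are made up and the function name `cronbach_alpha` is an assumption; the trial's own values were computed in SPSS on data not shown here.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach alpha for rows of k item scores, one row per rated performance."""
    k = len(ratings[0])
    if k < 2 or len(ratings) < 2:
        raise ValueError("need at least 2 items and 2 rated performances")
    # Sample variance of each item across performances.
    item_vars = [variance([row[i] for row in ratings]) for i in range(k)]
    # Sample variance of the summed score across performances.
    total_var = variance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Made-up ratings: 4 performances scored on 3 items of a 7-point scale.
toy = [[2, 3, 2], [4, 4, 5], [5, 6, 5], [6, 7, 7]]
alpha = cronbach_alpha(toy)
print(round(alpha, 3))
```

Values closer to 1 indicate that the items move together across performances, which is the sense in which the modified OSATS scale was judged internally consistent.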

Control Group

Control participants received no performance assessment or feedback and were instructed to use the time between simulations to reflect and set goals for the following trial. This follows principles of experiential learning through active experimentation and reflective observation,[40] establishing a baseline for performance improvement and learning with no feedback.

Outcome Measures

The first coprimary outcome was the interaction effect of feedback on surgical performance improvement over time during 5 practice resections, measured by the Intelligent Continuous Expertise Monitoring System (ICEMS) Expertise Score: the mean of expertise predictions (range, −1.00 to 1.00, reflecting novice to expert rating) computed for every 0.2 seconds of the procedure by a deep learning algorithm using a long short-term memory network with 16 input performance metrics from the simulator’s raw data.[34] The second coprimary outcome was learning and skill retention, evaluated based on realistic tumor resection performance by both the ICEMS and blinded OSATS assessment. The OSATS rubric contains 6 performance categories, each rated on a 7-point Likert scale (eAppendix 2 in Supplement 2). Secondary outcomes were differences in the strength of emotions before, during, and after training and cognitive demands required by each intervention. These were measured by self-report on the MES for emotional strength[25] and the Cognitive Load Index for cognitive demands[31] on a 5-point Likert scale.
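The aggregation behind the Expertise Score can be shown with a short sketch. Only the averaging step is illustrated: the per-window predictions come from the ICEMS network in the trial, whereas here they are a made-up list, and the function names `expertise_score` and `n_windows` are assumptions.

```python
def expertise_score(predictions):
    """Mean of per-window expertise predictions, each in [-1.0, 1.0]."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    if any(p < -1.0 or p > 1.0 for p in predictions):
        raise ValueError("predictions must lie in [-1.0, 1.0]")
    return sum(predictions) / len(predictions)


def n_windows(duration_s, window_ms=200):
    """Number of whole prediction windows in a procedure (integer math)."""
    return (duration_s * 1000) // window_ms


print(n_windows(5 * 60))  # 1500 windows in a 5-minute practice resection

# Made-up predictions: novice-like for 900 windows, expert-like for 600.
toy = [-1.0] * 900 + [1.0] * 600
print(expertise_score(toy))  # -0.2
```

A score below 0.00 therefore means that novice-like windows outweighed expert-like windows over the whole procedure.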

Statistical Analysis

Ad hoc analysis to achieve 80% statistical power (β = 0.20), estimating moderate primary outcome effect of 35%, with 2-sided test at α = .05, revealed a minimum of 23 participants were required for each intervention arm. Collected data were examined for outliers and normality. Levene test for equality of variance and Mauchly test of sphericity indicated that the assumptions of analysis of variance (ANOVA) were met. Two-way mixed ANOVA investigated the interaction of group assignment (between-participants) and time (within-participant) on learning curves and emotion self-reports. One-way ANOVA tested between-group differences in learning, cognitive load, and OSATS scores. Baseline performance was assigned as a covariate in the mixed model. Repeated measures ANOVA examined within-participant changes of performance in each group. Significance was set at P < .05. P values were adjusted by Bonferroni correction for multiple tests. All statistical analyses were performed in SPSS statistical software version 27 (IBM). Expertise Score predictions were conducted in MATLAB release 2020a (MathWorks). Data were analyzed from April to June 2021.
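The Bonferroni adjustment mentioned above can be sketched in a few lines. This shows one common form of the correction, multiplying each P value by the number of tests and capping the result at 1; the P values are made up and are not trial results, and the function name `bonferroni_adjust` is an assumption.

```python
def bonferroni_adjust(p_values):
    """Multiply each P value by the number of tests, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


raw = [0.004, 0.02, 0.30]  # made-up P values for 3 pairwise comparisons
adjusted = bonferroni_adjust(raw)
significant = [p < 0.05 for p in adjusted]
print(significant)  # [True, False, False]
```

In this toy example the second comparison would be significant before adjustment (P = .02) but not after, which is the conservative behavior the correction is meant to provide.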

Results

A total of 70 medical students (41 [59%] women and 29 [41%] men; mean [SD] age, 21.8 [2.3] years) from 4 institutions (McGill University, 32 students [46%]; Laval University, 19 students [27%]; University of Montreal, 17 students [24%]; University of Sherbrooke, 2 students [3%]) were randomized, including 23 students in the VOA group, 24 students in the instructor group, and 23 students in the control group. Distribution of baseline characteristics was balanced among groups (Table). All included participants completed the training, and no one was lost to follow-up. A total of 350 practice resections and 70 realistic resections were scored by the ICEMS. Blinded experts evaluated 70 video recordings of realistic performances using the OSATS scale. There were no statistically significant differences among groups in baseline performance (Figure 2A). At baseline, mean Expertise Scores were −0.57 (95% CI, −0.66 to −0.48) points in the VOA group, −0.60 (95% CI, −0.66 to −0.55) points in the instructor group, and −0.53 (95% CI, −0.62 to −0.43) points in the control group. All VOA group participants passed the safety module (step 1) and 14 students (61%) completed instrument movements competency (step 2) by the end of training (eFigure 6 in Supplement 2).

Table.

Demographic Characteristics of Included Participants

Values are No. (%) of medical students unless otherwise indicated.
Characteristic	Control group (n = 23)	VOA group (n = 23)	Instructor group (n = 24)
Age, mean (SD), y	21.7 (2.4)	21.9 (2.5)	21.8 (2.1)
Sex
Men	9 (39)	10 (43)	10 (42)
Women	14 (61)	13 (57)	14 (58)
Undergraduate medical training level
Medicine Preparatory^a	9 (39)	10 (43)	7 (29)
First year	8 (35)	8 (35)	9 (38)
Second year	6 (26)	5 (22)	8 (33)
Institution
McGill University	14 (61)	8 (35)	10 (42)
University of Montreal	3 (13)	7 (30)	7 (29)
University of Laval	6 (26)	7 (30)	6 (25)
University of Sherbrooke	0	1 (5)	1 (4)
Dominant hand
Right	23 (100)	21 (91)	22 (92)
Left	0	2 (9)	2 (8)
Interest in pursuing surgery, mean (SD)^b	3.7 (1.0)	3.9 (1.1)	3.8 (1.2)
Play video games, h/wk
Not at all	15 (65)	15 (65)	16 (67)
1-5	5 (22)	6 (26)	5 (21)
6-10	2 (9)	2 (9)	2 (8)
>11	1 (4)	0	1 (4)
Play musical instruments
Yes	12 (52)	8 (35)	13 (54)
No	11 (48)	15 (65)	11 (46)
Did competitive sports in the past 5 y
Yes	12 (52)	17 (74)	17 (71)
No	11 (48)	6 (26)	7 (29)
Prior VR experience in any domain
None	14 (61)	12 (52)	12 (50)
Passive (eg, videos)	8 (35)	10 (43)	9 (38)
Active (eg, games, simulation)	1 (4)	1 (5)	3 (12)
Prior experience with any VR surgical simulator
Yes	1 (4)	0	0
No	22 (96)	23 (100)	24 (100)

Abbreviations: VOA, Virtual Operative Assistant; VR, virtual reality.

^a Medicine Preparatory is a 1-year preparatory program for graduates of the Quebec Collegial system who have been offered a position from the medical program of McGill University or University of Montreal.

^b Rated on a Likert Scale (1-5), with 1 indicating less interest and 5 indicating more interest.

Figure 2.

Performance Assessment in the Practice Tumor Resections

A, Negative scores indicate a novice; and a positive score, a more expert performance. Scores in each trial are the mean of all estimations made for every 200 milliseconds of the simulated procedure (approximately 1500 predictions for a 5-minute practice scenario). B, Maximum bipolar force application is a recording of the highest amount of force applied with the bipolar during the entire operation. C, Mean instrument tip separation distance measured as the mean distance between the aspirator and the bipolar tips. D, Mean bipolar acceleration measured as the rate of change in the bipolar instrument’s velocity. Error bars indicate 95% CIs; and VOA, Virtual Operative Assistant.

Performance During Practice Tumor Subpial Resection

At completion, the mean Expertise Scores were 0.14 (95% CI, 0.01 to 0.28) points in the VOA group, −0.62 (95% CI, −0.68 to −0.57) points in the instructor group, and −0.56 (95% CI, −0.65 to −0.47) points in the control group. Mixed ANOVA demonstrated that within-participant performance changes depended on the type of feedback, with the VOA feedback group achieving a difference of 0.66 (95% CI, 0.55 to 0.77) points higher compared with the instructor group (P < .001) and 0.65 (95% CI, 0.54 to 0.77) points higher compared with the control group (P < .001) (Figure 2A). Mean Expertise Scores in instructor and control groups were not significantly different. The VOA group demonstrated Expertise Score improvements between trials (Figure 2A). Pairwise comparisons demonstrated that learners performed significantly better than baseline after AI tutoring feedback (mean difference vs baseline: trial 1, 0.37 [95% CI, 0.18 to 0.56] points; P < .001; trial 2, 0.51 [95% CI, 0.29 to 0.74] points; P < .001; trial 3, 0.65 [95% CI, 0.41 to 0.89] points; P < .001; trial 4, 0.61 [95% CI, 0.36 to 0.86] points; P < .001). There was significant improvement from trial 1 to trial 3 (mean difference, 0.28 [95% CI, 0.02 to 0.55] points; P = .02) and trial 1 to trial 4 (mean difference, 0.24 [95% CI, 0.00 to 0.49] points; P = .04). Learning curves demonstrate steady improvement from baseline to trial 3 that plateaued at trials 3 and 4. Three VOA feedback instances resulted in mean group performance higher than 0.00 points, the ICEMS novice threshold (Figure 2A). Of the 4 VOA metrics used for competency training, 3 demonstrated improvement in VOA group and significant differences compared with the instructor and control groups (maximum bipolar force application, mean instrument tip separation distance, and mean bipolar acceleration) (Figure 2B-D). There was no significant difference among groups in bleeding rate owing to wide participant variability in this metric.
VOA feedback was more effective in enhancing metric scores compared with expert instruction, and compared with control, remote expert feedback significantly reduced mean instrument tip separation distance (mean difference, –3.28 [95% CI, –6.36 to –0.21] mm; P = .03) (Figure 2C). Of the 12 ICEMS metrics not trained by the VOA, 8 significantly improved in the VOA group compared with instructor and control groups, suggesting that feedback on 4 AI-selected safety and instrument movement metrics resulted in improved bimanual psychomotor performance in other benchmark metrics.

Realistic Tumor Resection Performance

The VOA group achieved significantly higher Expertise Scores in the realistic subpial resection than instructor (mean difference, 0.53 [95% CI, 0.40 to 0.67] points) and control (mean difference, 0.49 [95% CI, 0.34 to 0.61] points; P < .001) groups (Figure 3A). Global OSATS ratings of realistic subpial resections showed no significant difference between the VOA group (mean score, 4.63 [95% CI, 4.06 to 5.20] points) and the instructor group (mean score, 4.40 [95% CI, 3.88 to 4.91] points; mean difference, 0.23 [95% CI, −0.59 to 1.06] points; P = .78) or the control group (3.86 [95% CI, 3.44 to 4.27] points; mean difference, 0.78 [95% CI, −0.06 to 1.61] points; P = .07), consistent with an equivalent qualitative performance outcome. In OSATS subscores and compared with the control group, feedback significantly improved participants’ respect for tissue (mean difference: VOA, 1.17 [95% CI, 0.40 to 1.95] points; P = .002; instructor, 0.85 [95% CI, 0.08 to 1.62] points; P = .03) and economy of movement (mean difference: VOA, 1.35 [95% CI, 0.39 to 2.31] points; P = .004; instructor, 1.07 [95% CI, 0.12 to 2.02] points; P = .02). Compared with the control group, expert instruction significantly enhanced instrument handling (mean difference, 1.18 [95% CI, 0.22 to 2.14] points; P = .01) and VOA resulted in significantly higher OSATS overall subscore (mean difference, 1.04 [95% CI, 0.13 to 1.96] points; P = .02) (Figure 3C). Completing VOA’s instrument movement competency correlated significantly with higher economy of movement (Pearson r = 0.25, P = .03), suggesting successful acquisition of the relevant competency.

Figure 3.

Performance Assessment in the Realistic Tumor Resection

Error bars indicate 95% CIs; OSATS, Objective Structured Assessment of Technical Skills; and VOA, Virtual Operative Assistant.

Emotions and Cognitive Load

In within-participant analysis, there was a significant increase in positive activating emotions (after vs before mean difference, 0.36 [95% CI, 0.16 to 0.55] points; P < .001) and a significant decline in negative activating emotions (after vs before mean difference, –0.59 [95% CI, –0.85 to –0.34] points; P < .001) throughout the simulation training. The significant interaction effect in positive deactivating emotions demonstrated that instructor group participants felt more relieved and relaxed during training compared with learners in VOA (mean difference, 0.75 [95% CI, 0.19 to 1.31] points; P = .006) and control groups (mean difference, 0.71 [95% CI, 0.14 to 1.27] points; P = .01) (Figure 4A-C). No between-participant differences in intrinsic, extrinsic, or germane cognitive load were found (Figure 4D).

Figure 4.

Emotions and Cognitive Load Throughout the Simulation Training

Positive activating emotions include happy, hopeful, grateful (A), and negative activating emotions include confusion and anxiety (B). Error bars indicate 95% CIs; and VOA, Virtual Operative Assistant.

AI Intervention Acceptance

To assess student acceptance of the AI intervention, we administered a poststudy questionnaire to all 23 students of the VOA group, and 22 students (96%) reported that they would prefer to learn from both expert instruction and AI tutoring. Additionally, only 1 student (4%) reported they preferred AI tutoring only, and no student reported they preferred expert instruction only.

Discussion

This randomized clinical trial is the first study, to our knowledge, that compares the effectiveness of an AI-powered tutoring system with expert instruction in surgical simulation while assessing affective and cognitive response to such instruction. Surgical performance is an independent factor associated with postoperative patient outcomes,[41] and technical skills acquired in simulation training improve operating room performance.[42,43,44] Repetitive practice in a controlled environment and educational feedback are key features of simulation-based surgical education[45]; however, use of autonomous pedagogical tools in simulation training is limited. In this randomized clinical trial, our findings demonstrated effective use of AI tutoring in surgical simulation training. VOA feedback improved performance during the practice and realistic simulation scenarios, measured quantitatively by Expertise Scores, and enhanced operative quality and students’ skill transfer, observed by OSATS during the realistic tumor resection. Objective metric-based formative feedback through AI tutoring demonstrated advantages compared with remote expert instruction. It helped students achieve higher expertise by bringing awareness to their metric goals during resections and setting measurable performance objectives, 2 effective strategies of learning theory.[46] Feedback on AI-selected metrics had an extended effect on supplementary performance criteria used in both OSATS and ICEMS. VOA’s learning platform is flexible and allows learners with different levels of expertise to practice and receive personalized formative feedback based on interest and time availability. This AI intervention saved approximately 53 hours of expert supervision and formative assessment over 13 weeks compared with the instructor group while resulting in comparable OSATS scores. 
VOA did not bring participants’ Expertise Scores to the level of senior experts (ie, ICEMS >0.33),[34] suggesting areas for future research and improvement. More research is needed to understand which surgical procedures lend themselves best to AI interventions, but this study provides evidence that this brain tumor resection technique may be an appropriate candidate. In contrast to our hypothesis and previous reports, in which learning with an AI tutor elicited negative emotions, impairing students’ use of self-regulated learning strategies,[47,48] learning bimanual tumor resection skills with VOA demonstrated a gradual decline in negative activating emotions with an overall increase in positive emotions, similar to human instruction. Encouragingly, VOA participants did not report that this learning experience required significantly higher cognitive demands compared with the other interventions, demonstrating clear and comprehensible AI tutoring feedback that required minimal extraneous load. Although the full impact of COVID-19 on surgical education remains unclear,[49] it is important to prepare for future challenges through focused research and further development of effective remote learning platforms.[50] We report 2 potential methods to address remote learning, both with demonstrated ability to enhance task performance better than control. Comparing efficacy between the 2 intervention arms of this trial is up to interpretation and limited to the primary outcome measures used. Curriculum coherence is a fundamental principle in education that is achieved in part by the alignment of intended learning outcomes and instructional activities with the assessment criteria.[51,52] Following this principle in randomized trials involving educational interventions may create a potential overlap between the primary outcome measures and the pedagogical tools used during training.
In this study, 4 of 16 ICEMS metrics were learning objectives of the VOA and all 6 OSATS categories were learning objectives of the instructor group; therefore, the use of either tool alone as a primary outcome may lead to bias toward better performance for one group. The VOA’s more flexible and time-efficient approach, in addition to its similar OSATS outcome and its extended effect on ICEMS’s remaining performance metrics, demonstrated that AI tutoring may have some advantages compared with remote expert instruction. Consistent with previous studies,[53,54] our findings suggest that scripted feedback by instructors established a supportive learning environment where participants felt stronger positive deactivating emotions during practice; however, this did not result in greater performance. Studies suggest that there is no statistically significant difference in complication rates, operative time, and surgical outcomes between telementoring and in-person instruction,[55,56] but there is limited evidence comparing their educational effectiveness on technical performance. In this study, remote instruction was inferior to AI tutoring based on quantifiable metrics, but further research is necessary to determine if that remains the case with in-person coaching. Our remote-based method was considered feasible by instructors because they could easily join to provide virtual debriefing and technical instruction. The AI algorithm used in this study failed to detect performance improvements in the instructor group according to OSATS ratings for practice and realistic scenarios (eFigure 7 in Supplement 2). OSATS categories, like instrument handling, describe a subjective qualitative composite of actions that AI systems have difficulty measuring from raw data. ICEMS functions at a deeper level by analyzing the interaction of several underlying metrics that contribute to expertise. 
These systems may be less able to assess operative strategies, such as a systematically organized tumor resection plan, that students may acquire more readily from expert instruction. These types of procedural instruction may take more educator time to become apparent as changes in learners’ metric scores.

Our findings suggest that monitoring specific AI-derived expert performance metrics, such as the bipolar instrument’s acceleration, and providing personalized quantitative learner feedback on these metrics is an efficient method to guide behavioral changes toward a higher operative quality. However, integrating metric objectives with the task goals may be challenging and may require expert input.

Most participants (96%) reported that they would prefer learning with feedback from both expert instruction and AI tutoring, suggesting that complementary features from both methods could enhance the learning experience. With increasing efforts to capture live operative data,[57] combining intraoperative use of AI tutoring and expert surgical instruction may accelerate the path to mastery.

Limitations

This study has some limitations. Although the AI-powered virtual reality simulation platform used in this study allows detailed quantitative assessment of bimanual technical skills, it fails to capture the full spectrum of competencies, such as interdisciplinary teamwork, required in surgery. Furthermore, the use of volunteers in this study may be a source of selection bias toward motivated and technologically savvy learners. Other limitations include the sample cohort with limited surgical experience, instructor experience level, and the remote instruction context that limited in-person expert feedback delivery owing to the COVID-19 pandemic. Whether AI feedback would remain comparable to in-person expert instruction was beyond the scope of this study and is being evaluated by an ongoing trial (ClinicalTrials.gov Identifier: NCT05168150). Future studies should also focus on combining personalized AI feedback with expert instruction to investigate hybrid methods that maximize the educational potential for learners.

Conclusions

The findings of this randomized clinical trial suggest that learning to perform simulated brain tumor resections was more effective with feedback from an AI tutor than with remote expert instruction. VOA significantly improved Expertise Scores and OSATS scores in a realistic procedure while fostering an equivalent affective and cognitive learning environment.
