| Literature DB >> 35904087 |
Thomas Y T Lam1,2, Max F K Cheung3, Yasmin L Munro3, Kong Meng Lim3, Dennis Shung4, Joseph J Y Sung3.
Abstract
BACKGROUND: The number of artificial intelligence (AI) studies in medicine has exponentially increased recently. However, there is no clear quantification of the clinical benefits of implementing AI-assisted tools in patient care.Entities:
Keywords: artificial intelligence; clinical; clinical informatics; gastroenterology; mobile phone; randomized controlled trial; systematic review
Mesh:
Year: 2022 PMID: 35904087 PMCID: PMC9459941 DOI: 10.2196/37188
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 7.076
Figure 1PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram.
Publications.
| Author (publication year) | Title | Country | Article type | Specialty |
| El Solh et al [ | Predicting optimal CPAP by neural network reduces titration failure: a randomized study | The United States | Original article | Respiratory medicine |
| Shimabukuro et al [ | Effect of a machine learning-based severe sepsis prediction algorithm on patient survival and hospital length of stay: a randomised clinical trial | The United States | Original article | Anesthesiology |
| Labovitz et al [ | Using artificial intelligence to reduce the risk of nonadherence in patients on anticoagulation therapy | The United States | Original article | Neurology |
| Gracey et al [ | Improving medication adherence by better targeting interventions using artificial intelligence-a randomized control study | The United States | Abstract | Family medicine |
| Liu et al [ | Evaluating the impact of an integrated computer-based decision support with person-centered analytics for the management of hypertension: a randomized controlled trial | China | Abstract | Cardiology |
| Vennalaganti et al [ | Increased detection of Barrett’s esophagus–associated neoplasia using wide-area trans-epithelial sampling: a multicenter, prospective, randomized trial | The United States | Original article | Gastroenterology and hepatology |
| Biester et al [ | DREAM5: An open-label, randomized, cross-over study to evaluate the safety and efficacy of day and night closed-loop control by comparing the MD-Logic automated insulin delivery system to sensor augmented pump therapy in patients with type 1 diabetes at home | Germany, Israel, and Slovenia | Original article | Endocrinology, diabetes, and metabolism |
| Pouska et al [ | The use of HPI (Hypotension probability indicator) during major intracranial surgery; preliminary results of a prospective randomized trial | Czech Republic | Abstract | Anesthesiology |
| Lin et al [ | Diagnostic efficacy and therapeutic decision-making capacity of an artificial intelligence platform for childhood cataracts in Eye Clinics: a multicenter randomized controlled trial | China | Original article | Ophthalmology |
| Kamdar et al [ | A randomized controlled trial of a novel artificial intelligence–based smartphone application to optimize the management of cancer-related pain | The United States | Abstract | Clinical oncology |
| Persell et al [ | Effect of home blood pressure monitoring via a smartphone hypertension coaching application or tracking application on adults with uncontrolled hypertension: a randomized clinical trial | The United States | Original article | Cardiology |
| Voss et al [ | Effect of wearable digital intervention for improving socialization in children with autism spectrum disorder: a randomized clinical trial | The United States | Original article | Psychiatry |
| Wang et al [ | Real-time automatic detection system increases colonoscopic polyp and adenoma detection rates: a prospective randomised controlled study | China | Original article | Gastroenterology and hepatology |
| Wu et al [ | Randomised controlled trial of WISENSE, a real-time quality improving system for monitoring blind spots during esophagogastroduodenoscopy | China | Original article | Gastroenterology and hepatology |
| Pavel et al [ | A machine-learning algorithm for neonatal seizure recognition: a multicentre, randomised, controlled trial | Ireland, the Netherlands, Sweden, and the United Kingdom | Original article | Neurology |
| Alfonsi et al [ | Carbohydrate counting app using image recognition for youth with type 1 diabetes: pilot randomized control trial | Canada | Original article | Endocrinology, diabetes, and metabolism |
| Auloge et al [ | Augmented reality and artificial intelligence–based navigation during percutaneous vertebroplasty: a pilot randomised clinical trial | France | Original article | Orthopedics and traumatology |
| Avari et al [ | Safety and feasibility of the PEPPER adaptive bolus advisor and safety system; a randomized control study | The United Kingdom and Spain | Original article | Endocrinology, diabetes, and metabolism |
| Chen et al [ | Comparing blind spots of unsedated ultrafine, sedated, and unsedated conventional gastroscopy with and without artificial intelligence: a prospective, single-blind, 3-parallel-group, randomized, single-center trial | China | Original article | Gastroenterology and hepatology |
| Gong et al [ | Detection of colorectal adenomas with a real-time computer-aided system (ENDOANGEL): a randomised controlled study | China | Original article | Gastroenterology and hepatology |
| Liu et al [ | The single-monitor trial: an embedded CADe system increased adenoma detection during colonoscopy: a prospective randomized study | China | Original article | Gastroenterology and hepatology |
| Nicolae et al [ | Conventional vs machine learning-based treatment planning in prostate brachytherapy: results of a phase I randomized controlled trial | Canada | Original article | Clinical oncology |
| Repici et al [ | Efficacy of real-time computer-aided detection of colorectal neoplasia in a randomized trial | Italy | Original article | Gastroenterology and hepatology |
| Su et al [ | Impact of a real-time automatic quality control system on colorectal polyp and adenoma detection: a prospective randomized controlled study (with videos) | China | Original article | Gastroenterology and hepatology |
| Wang et al [ | Lower adenoma miss rate of computer-aided detection-assisted colonoscopy vs routine white-light colonoscopy in a prospective tandem study | China | Original article | Gastroenterology and hepatology |
| Wang et al [ | Effect of a deep-learning computer-aided detection system on adenoma detection during colonoscopy (CADe-DB trial): a double-blind randomised study | China | Original article | Gastroenterology and hepatology |
| Wijnberge et al [ | Effect of a machine learning-derived early warning system for intraoperative hypotension vs standard care on depth and duration of intraoperative hypotension during elective noncardiac surgery: the HYPE randomized clinical trial | The Netherlands | Original article | Anesthesiology |
| Weisinger et al [ | Artificial intelligence-powered non-invasive and frequency-tuned electromagnetic field therapy improves upper extremity motor function in sub-acute stroke patients: a pilot randomized controlled trial | Israel | Abstract | Neurology |
| Blomberg et al [ | Effect of machine learning on dispatcher recognition of out-of-hospital cardiac arrest during calls to emergency medical services: a randomized clinical trial | Denmark | Original article | Emergency medicine |
| Browning et al [ | The clinical effectiveness of using a predictive algorithm to guide antidepressant treatment in primary care (PReDicT): an open-label, randomised controlled trial | The United Kingdom, Spain, Germany, France, and the Netherlands | Original article | Psychiatry |
| Jayakumar et al [ | Comparison of an artificial intelligence-enabled patient decision aid vs educational material on decision quality, shared decision-making, patient experience, and functional outcomes in adults with knee osteoarthritis: a randomized clinical trial | The United States | Original article | Orthopedics and traumatology |
| Kamba et al [ | A multicentre randomized controlled trial to verify the reducibility of adenoma miss rate of colonoscopy assisted with artificial intelligence–based software | Japan | Abstract | Gastroenterology and hepatology |
| Luo et al [ | Artificial intelligence-assisted colonoscopy for detection of colon polyps: a prospective, randomized cohort study | China | Original article | Gastroenterology and hepatology |
| Rafferty et al [ | A novel mobile app (Heali) for disease treatment in participants with irritable bowel syndrome: randomized controlled pilot trial | The United States | Original article | Gastroenterology and hepatology |
| Repici et al [ | Artificial intelligence and colonoscopy experience: lessons from two randomised trials | Italy and Switzerland | Original article | Gastroenterology and hepatology |
| Strömblad et al [ | Effect of a predictive model on planned surgical duration accuracy, patient wait time, and use of presurgical resources: a randomized clinical trial | The United States | Original article | Surgery |
| Wu et al [ | Evaluating the effects of an artificial intelligence system on endoscopy quality and preliminarily testing its performance on detecting early gastric cancer: a randomized controlled trial | China | Original article | Gastroenterology and hepatology |
| Yao et al [ | Artificial intelligence-enabled electrocardiograms for identification of patients with low ejection fraction: a pragmatic, randomized clinical trial | The United States | Original article | Cardiology |
| Brown et al [ | Deep learning computer-aided polyp detection reduces adenoma miss rate: a US multi-center randomized tandem colonoscopy study (CADeT-CS trial) | The United States | Original article | Gastroenterology and hepatology |
Figure 2Number of randomized controlled trials of artificial intelligence–assisted medicine per year.
Figure 3Distribution of original of publications and specialty.
Study objectives and end points (primary and secondary).
| Author (publication year) | Study objective | Primary end point | Secondary end point | Clinical relevance |
| El Solh et al [ | To test the effectiveness of an ANNa application for CPAPb titration on the time required to achieve an optimal CPAP pressure and CPAP titration failure | Time to optimal CPAP pressure | Titration failure rate | Change of treatment |
| Shimabukuro et al [ | To test the use of a machine learning–based severe sepsis prediction system for reductions in average length of stay and in-hospital mortality rate | Average hospital length of stay | In-hospital mortality rate; ICUc length of stay | Mortality and hospital and ICU length of stay |
| Labovitz et al [ | To evaluate the use of an artificial intelligence platform on mobile devices in measuring and increasing medication adherence in patients with stroke on anticoagulation therapy | Medication adherence | Nil | Nil |
| Gracey et al [ | To evaluate the effectiveness of using artificial intelligence to target which patients should receive interventions compared with traditional targeting approaches to improve medication adherence | Medication adherence | Nil | Nil |
| Liu et al [ | To assess the effects of clinical decision support system of graph-based machine learning algorithms on blood pressure management and economic burden of disease | Blood pressure reduction in patients with hypertension | Economic burden | Change of treatment |
| Vennalaganti et al [ | To evaluate the use of WATSd as an adjunct to biopsy sampling for the detection of HGDe or EACf in a referral population with BEg | Rate of detection of HGD or EAC | Neoplasia detection rates based on the procedure order (WATS vs biopsy sampling first) of each procedure separately and the additional time required for WATS | Further investigation |
| Biester et al [ | To evaluate the safety and efficacy of 60-hour glucose control using the MD-Logic system in individuals with type 1 diabetes at home for day and night use, particularly without remote monitoring | Percentage of glucose sensor readings within 70 to 180 mg/dL (3.9-10 mmol/L) | Percentage of glucose sensor readings <60 to 70 mg/dL (3.3-3.9 mmol/L), percentage of glucose sensor readings >180 to 240 mg/dL (10-13.3 mmol/L), average and SD of glucose sensor readings, and overnight percentage of readings (“overnight” defined as 11:00 PM-7:00 AM) <70 mg/dL (3.9 mmol/L) | Further investigation |
| Pouska et al [ | To assess the use of HPIh to avoid hypotension in major intracranial surgery | Number of hypotension events; duration of hypotension events | Number of hypotension events in maintenance phase of anesthesia | Further investigation |
| Lin et al [ | To compare the diagnostic efficacy and treatment decision-making capacity between CC-Cruiser and ophthalmologists in real-world clinical settings | Accuracy of the diagnosis normal lens versus cataract | Evaluation of the disease severity; time required for making the diagnosis; patient satisfaction | Further investigation |
| Kamdar et al [ | To examine the impact of ePAL on cancer pain severity, attitudes toward cancer pain, and health care use | Pain severity | Attitudes toward cancer treatment (Barriers Questionnaire II); anxiety (General anxiety Disorder-7); pain-related hospital admissions | Hospitalization |
| Persell et al [ | To evaluate the effectiveness of an artificial intelligence smartphone coaching app to promote hypertension self-management | SBPi measured at 6 months | Self-reported medication adherence; home monitoring and self-management practices; self-efficacy related to BPj and BMI; self-reported health behaviors | Change of treatment |
| Voss et al [ | To test the efficacy of a wearable machine learning tool for intervention on a core ASDk deficit in the natural home environment | SRS-IIl total score; Vineland Adaptive Behavioural Scales, Second edition; Developmental Neuropsychological Assessment, Second edition; Emotion Guessing Game | Moderator analysis; child behavior checklist; and the Vineland Adaptive Behavioural Scales, Second edition adaptive composite score | Nil |
| Wang et al [ | To investigate whether a high-performance real-time automatic polyp detection system can increase polyp and ADRsm in the real clinical setting | ADR | PDRn; mean number of polyps detected per colonoscopy; mean number of adenomas detected per colonoscopy; rate of false positives and false negatives | Further investigation |
| Wu et al [ | To evaluate the effectiveness of WISENSE to monitor blind spots, time the procedure, and automatically generate photo documentation during EGDo and thus raise the quality of everyday endoscopy | Blind spot rate | Inspection time; completeness of photo documentation generated by endoscopists; completeness of photo documentation generated by WISENSE in WISENSE group; completeness of photo documentation generated by WISENSE and endoscopists in WISENSE group; the percentage of patients being ignored in each site | Further investigation |
| Pavel et al [ | To evaluate the performance of the ANSeRp algorithm in real time by assessing the diagnostic accuracy for the detection of neonatal electrographic seizures with and without the use of ANSeR as a support tool for clinicians at the cot side | Diagnostic accuracy (sensitivity, specificity, and false detection rate) of health care professionals to identify neonates with electrographic seizures and seizure hours with and without the support of the ANSeR algorithm | Summary measures of seizure burden (total seizure burden, maximum hourly seizure burden, and median seizure duration); number of inappropriate antiseizure medications given | Change of treatment |
| Alfonsi et al [ | To test the app’s usability and potential impact on carbohydrate counting accuracy | Carbohydrate counting accuracy | Quality of life for youth; self-care; patient or parent responsibility | Nil |
| Auloge et al [ | To evaluate technical feasibility, accuracy, safety, and patient radiation exposure granted by a novel navigational tool integrating augmented reality and artificial intelligence during percutaneous vertebroplasty of patients with vertebral compression fractures | Technical feasibility of trocar placement using augmented reality or artificial intelligence guidance | Comparison between groups A and B in terms of accuracy, procedural safety, time for trocar placement, and patient radiation exposure (dose area product and fluoroscopy time) | Nil |
| Avari et al [ | To evaluate the safety and efficacy of the PEPPER system compared with a standard bolus calculator | Difference in change in percentage time in range (3.9-10.0 mmol/L; 70-180 mg/dL) between the intervention arm that receives the PEPPER safety system with adaptive bolus advice and the control arm | Percentage time spent in euglycemia, hypoglycemia, and hyperglycemia; number of episodes of serious hypoglycemia; episodes of hypoglycemia within 5 hours postprandially; severe hypoglycemia (defined as a hypoglycemia event requiring third party assistance); postprandial mean area under the curve at 5 hours (expressed as mmol/L min); glycemia risk and variability measures | Change of treatment |
| Chen et al [ | To compare blind spots of sedated C-EGDq, unsedated U-TOEr, and unsedated C-EGD with and without the assistance of ENDOANG | The blind spot of 3 types of EGD with the assistance of ENDOANGEL | Blind spot rate of unsedated U-TOE and unsedated and sedated C-EGD with or without the assistance of ENDOANGEL; consistency between ENDOANGEL and endoscopists’ review | Further investigation |
| Gong et al [ | To evaluate whether the ENDOANGEL system could improve polyp yield during colonoscopy | ADR | The ADR for adenomas of different sizes (diminutive [≤5 mm], small [>5 to<10 mm], and large [≥10 mm]); locations (cecum, ascending colon, transverse colon, descending colon, sigmoid colon, and rectum); PDR; PDR for polyps of different sizes; locations; mean number of adenomas per patient; mean number of polyps per patient; withdrawal time (time spent viewing as the endoscope is withdrawn during a colonoscopy, excluding biopsy or treatment time); adverse events and serious adverse events | Further investigation |
| Liu et al [ | To investigate whether the integration of a CADes system into the primary monitor used during colonoscopy may increase polyp and adenoma detection without increasing physician fatigue | ADR | PDR; polyps per colonoscopy and adenomas per colonoscopy | Further investigation |
| Nicolae et al [ | To evaluate the noninferiority of day 30 dosimetry between a machine learning–based treatment planning system for prostate low-dose-rate brachytherapy and the conventional manual planning technique | The 1-month postoperative follow-up results between expert-planned low-dose-rate treatments (conventional) and the PIPAt machine learning approach | The efficiency of the PIPA approach in a standardized preoperatively planned workflow; total treatment planning time; need and extent of modifications | Nil |
| Repici et al [ | To assess the safety and efficacy of a CADe system in detection of colorectal neoplasias during real-time colonoscopy | ADR | Proximal ADR; total number of polyps detected; sessile serrated lesion detection rate; mean number of adenomas per colonoscopy; cecal intubation rate; withdrawal time | Further investigation |
| Su et al [ | To develop an automatic quality control system and assess whether it could improve polyp and adenoma detection in clinical practice | ADR | PDR; mean number of adenomas detected per colonoscopy; mean number of polyps detected per colonoscopy; withdrawal time (biopsy time was excluded by stopping the clock); adequate bowel preparation rate, defined as the percentage of colonoscopies with each segmental BBPSu score 2 | Further investigation |
| Wang et al [ | To compare adenoma miss rates of CADe colonoscopy vs routine white-light colonoscopy. | Adenoma miss rate | Polyp miss rate; miss rate of advanced adenomas; sessile serrated adenoma or polyps; patient miss rate; ADR for the first pass; adenoma per colonoscopy; polyp per colonoscopy | Further investigation |
| Wang et al [ | To perform a double-blinded study using a sham control to more rigorously assess the effectiveness of a CADe system for improving detection of colon adenomas and polyps. We also aimed to analyze the characteristics of polyps missed by endoscopists | ADR | PDR; number of polyps per colonoscopy; number of adenomas per colonoscopy; sensitivity; specificity of the 3 skilled endoscopists | Further investigation |
| Wijnberge et al [ | To test whether the clinical application of the early warning system in combination with a hemodynamic diagnostic guidance and treatment protocol reduces intraoperative hypotension | Time-weighted average of hypotension during surgery | Incidence of hypotension (the number of hypotensive events per patient): total time with hypotension and percentage of time spent with hypotension during surgery; incidence of hypertension (the number of hypotensive events per patient): total time with hypertension and percentage of time spent with hypertension during surgery | Further investigation |
| Weisinger et al [ | To explore the benefit of BrainQ’s novel and noninvasive, artificial intelligence–powered, frequency-tuned ELF-EMFv treatment (BQ) in improving upper extremity motor function in a population with subacute ischemic stroke | Fugl-Meyer Assessment-Upper Extremity score | Modified Rankin Scale; Action Research Arm Test; Box and Block Test; NIHSSw | Nil |
| Blomberg et al [ | To examine how a machine learning model trained to identify OHCAx and alert dispatchers during emergency calls affected OHCA recognition and response | Rate of dispatchers’ recognition of subsequently confirmed OHCA | Dispatchers’ time to recognition of OHCA; rate of DA-CPRy | Change of treatment |
| Browning et al [ | To assess the clinical effectiveness of using a predictive algorithm based on behavioral tests of affective cognition and subjective symptoms and to guide antidepressant treatment | Treatment response of depression symptoms | Change in anxiety scores at week 8 (measured using the Generalized Anxiety Disorder Assessment, 7 item version [ | Nil |
| Jayakumar et al [ | To assess the effect of an artificial intelligence–enabled patient decision aid that includes education, preference assessment, and personalized outcome estimations (using patient-reported outcome measurements) on decision quality, patient experience, functional outcomes, and process-level outcomes among individuals with advanced knee osteoarthritis considering total knee replacement in comparison with education only | Decision process score of the knee decision quality instrument questions 3.1 to 3.5 | Level of shared decision-making (assessed using the CollaboRATE survey); patient satisfaction with the consultation (numerical rating scale); condition-specific symptoms and functional limitations (Knee Injury and Osteoarthritis Outcome Score, Joint Replacement); duration of consultation in minutes; total knee replacement rates (proportion of patients undergoing surgery); treatment concordance (knee decision quality instrument question 1.6) | Nil |
| Kamba et al [ | To clarify whether adenoma miss rate could be reduced with the CADe assistance during screening and surveillance colonoscopy | Adenoma miss rate | Polyp miss rate; sessile serrated lesion miss rate; ADR | Further investigation |
| Luo et al [ | To explore whether artificial intelligence–assisted colonoscopy could improve the PDR in the actual clinical environment | PDR | Number of polyps detected; the number of diminutive polyps (diameter <6 mm); the number of polyps of each Paris type detected; the number of false positive results | Further investigation |
| Rafferty et al [ | To determine whether Heali, a novel artificial intelligence dietary mobile app can improve adherence to the LFDac, IBSad symptom severity and quality of life outcomes in adults with IBS or IBS-like symptoms over a 4-week period | Adherence to the LFD | IBS symptom severity; quality of life outcomes | Nil |
| Repici et al [ | To assess the efficacy of a CADe system in detection of colorectal neoplasias in a nonexpert setting to challenge the CADe impact in a real-life scenario | ADR | Proximal ADR; total number of polyps detected; sessile serrated lesion detection rate; mean number of adenomas per colonoscopy; cecal intubation rate; withdrawal time | Further investigation |
| Strömblad et al [ | To assess accuracy and real-world outcome from implementation of a machine learning model that predicts surgical case duration | Accurate prediction of the duration of each scheduled surgery | Effects on patients and systems were measured by start time delay of following cases; time between cases; the time patients spent in presurgical area | Nil |
| Wu et al [ | To verify the effectiveness of ENDOANGEL in improving endoscopy quality and pretest its performance in detecting EGCae in a multicenter randomized controlled trial | Number of blind spots | Performance of ENDOANGEL in predicting early gastric cancer in a clinical setting | Further investigation |
| Yao et al [ | To assess whether an ECG-based, artificial intelligence–powered clinical decision support tool enables early diagnosis of low EFaf, a condition that is underdiagnosed but treatable | Rate of newly diagnosed low EF, defined as EF≤50% within 90 days | Completion of an ECG within 90 days; other findings (eg, valvular heart disease), except low EF present on ECGs | Further investigation |
| Brown et al [ | To assess the comparative adenoma miss rate for CADe-assisted colonoscopy when compared with high-definition white light colonoscopy alone | Adenoma miss rate | Polyp miss rate; hyperplastic polyp miss rate; sessile serrated lesion miss rate; ADR; PDR; adenoma per colonoscopy; polyp per colonoscopy; sessile serrated lesion per colonoscopy | Further investigation |
aANN: artificial neural network.
bCPAP: continuous positive airway pressure.
cICU: intensive care unit.
dWATS: wide-area transepithelial sampling.
eHGD: high-grade dysplasia.
fEAC: esophageal adenocarcinoma.
gBE: Barrett’s esophagus.
hHPI: hypotension probability indicator.
iSBP: systolic blood pressure.
jBP: blood pressure.
kASD: autism spectrum disorder.
lSRS-II: Social Responsiveness Scale II.
mADR: adenoma detection rate.
nPDR: polyp detection rate.
oEGD: esophagogastroduodenoscopy
pANSeR: Algorithm for Neonatal Seizure Recognition.
qC-EGD: conventional esophagogastroduodenoscopy.
rU-TOE: ultrathin transoral endoscopy.
sCADe: computer-assisted detection.
tPIPA: prostate implant planning algorithm.
uBBPS: Boston Bowel Preparation Scale.
vELF-EMF: extremely low frequency and low intensity electromagnetic fields.
wNIHSS: National Institutes of Health Stroke Scale.
xOHCA: out-of-hospital cardiac arrest.
yDA-CPR: dispatcher-assisted cardiopulmonary resuscitation.
zQIDS-SR-16: Quick Inventory of Depressive Symptomatology (16-Item) (Self-Report).
aaMADRS: Montgomery-Åsberg Depression Rating Scale.
abSAS: Social Adjustment Scale.
acLFD: low fermentable oligo-, di-, mono-saccharides and polyols diet.
adIBS: irritable bowel syndrome.
aeECG: electrocardiogram.
afEF: ejection fraction.
Study results and limitations.
| Author (publication year) | Primary end point result | Secondary end point result | Study outcome | Underpowered | Study conclusion | Limitations |
| El Solh et al [ | Time to optimal CPAPa pressure: AIb mean 198.7 (SD 143.8) minutes versus control mean 284.0 (SD 126.5) minutes | Titration failure: AI 16% versus control 36%; drop of residual obstructive apnea–hypopnea events and oxygen desaturations | Positive | No | Maximizing the time to achieve optimal CPAP and in reducing CPAP titration failure | Single center only; possible analysis bias as technologists were not blinded |
| Shimabukuro et al [ | AI 10.3 days versus control 13 days | In-hospital mortality: AI 8.96% versus control 21.3%; ICUc length of stay: AI 6.31 days versus control 8.40 days | Positive | No | Significant decrease in the hospital LOSd and in-hospital mortality | Small sample size; heterogenous population; trial was conducted in the 2 ICUs only; metrics were not monitored prospectively during the study because of the likely misrepresentation of such results; false positive rate, sensitivity, and prediction rate may be affected as clinicians may have initiated treatment before severe sepsis onset owing to advanced notice from the predictive algorithm; the use of overall metrics, LOS, and in-hospital mortality for all comers may underestimate the impact of the intervention on outcomes for patients with sepsis; potential for competing risks in the selected end points, mortality may shorten a patient’s LOS; this study was patient-outcome oriented |
| Labovitz et al [ | Mean (SD) cumulative adherence based on pill count was 97.2 (4.4%) for the AI platform group and 90.6% (5.8%) for the control group. Plasma drug concentration levels indicated that adherence was 100% (15/15) and 50% (6/12) in the intervention and control groups, respectively | Nil | Positive | Unknown | Real-time monitoring has the potential to increase adherence and change behavior, particularly in patients on direct oral anticoagulant therapy | Not mentioned |
| Gracey et al [ | Likelihood of being adherent: AI>control, 6.11%; likelihood of being adherent: AI>traditional, 7.8%; no significant difference in likelihood of being adherent | Nil | Positive | Unknown | Using AI to target interventions can increase the effectiveness of medication adherence intervention programs | Not mentioned |
| Liu et al [ | AI versus control: no significant difference | Economic burden of disease—AI versus control (all): no significant difference; economic burden of disease—AI: 46,006 (SD 40,831) yuan (US $6901 [SD 6125]) versus control (in surgical dept): 64,192 (SD 67,968) yuan (US $9629 [SD 10195])); benefit-cost ratio of AI: 1.15; net present value of benefit-cost of AI: 5792 yuan; direct medical costs—AI: 43,467 (SD 39.716) versus control: 61,205 (SD 66,576) yuan | Negative | Unknown | A clinical decision support system based on the graph-based machine learning algorithms changed the antihypertensive prescriptions and reduced the medical expense among patients with hypertension | Not mentioned |
| Vennalaganti et al [ | HGDe or EACf detection—WATSg alone: 29 versus control alone: 7; AI (alone)>control (alone) 4.2 times | Neoplasia detection rates: not mentioned; average time required for WATS; additional time required for WATS: 11 minutes 26 seconds versus control: 6 minutes 55 seconds | Positive | Unknown | WATS increases the detection of HGD and EAC in a high-risk BEh surveillance population when used as an adjunct to biopsy sampling compared with biopsy sampling alone | Single center research only; potential of population bias as study population (20%) was enriched with patients with BE with a known history of dysplasia or referred for endoscopic therapy; no long-term follow-up |
| Biester et al [ | AI: 66.6% versus control: 59.9% | Percentage <60 mg/dL—AI: 0.64% versus control: 0.38%; percentage <70 mg/dL—AI: 2.31% versus control: 1.45%; percentage >180 mg/dL—AI: 28.32% versus control: 36.43%; percentage >240 mg/dL—AI: 8.53% versus control: 8.71%; Mean —AI: median (IQR) 153.11 (142.33-174.81) versus control: 163.84 (150.17-186.54); SD—AI: median (IQR) 52.71 (44.75-66.39) versus control: 54.95 (46.19-69.19) | Positive | No | The MD-Logic system was safe and associated with better glycemic control than SAPi therapy for day and night use. The absence of remote monitoring did not lead to safety signals in adapting basal rates nor in administration of automated bolus corrections | High rate of communication errors between the tablet computer running the algorithm and the insulin pump |
| Pouska et al [ | No significant difference in number of hypotension events between 2 groups (4/20 vs 2/20) | AI: 10 versus control: 4 | Negative | Unknown | On the basis of our data, it seems that the inclusion of HPIj into a goal-directed treatment strategy could lower the incidence of hypotension within maintenance phase of anesthesia | Not mentioned |
| Lin et al [ | Accuracy—AI: 87.4% versus control: 99.1% | No significant difference in evaluation of the disease severity between AI and control; AI: 2.79 minutes versus control: 8.53 minutes; rating of overall satisfaction—AI: mean 3.47 (SD 0.501) versus control: mean 3.38 (0.554) | Negative | No | CC-Cruiser exhibited less accuracy compared with senior human consultants in diagnosing childhood cataracts and making treatment decisions, but it has the capacity to assist human physicians in clinical practice in its current state | Patients without symptoms were less willing to participate in; patients with slightly opaque lens may have missed; CC-Cruiser provided treatment suggestions without considering the patients’ general conditions; lack of internet accessibility limited the implementation of CC-Cruiser in low-income areas; possibly sufficient statistic power because cluster RCTk was adopted in trial, whereas RCT was used in sample size calculation |
| Kamdar et al [ | Difference of BPIl between AI and control: ß=−.09 | Difference of BQ-IIm between AI and control: ß=−.037; difference of General Anxiety Disorder-7 between AI and control: ß=.21; AI: 4 versus control: 20 | Positive | Unknown | AI significantly decreases pain scores and pain-related hospitalizations in patients with cancer-related pain | Not mentioned |
| Persell et al [ | AI: mean systolic blood pressure (SD) 132.3 (15.0) mm Hg versus control: 135 (13.9) mm Hg | Significant improvement in self-reported medication adherence in AI group than control; no significant difference between home monitoring and self-management practices; AI group has 26.7 minutes per week (−5.4 to 58.8) more than control group in self-reported physical activity | Negative | Unknown | Adults with hypertension randomized to a coaching app plus home monitor had similar SBPn compared with controls receiving a tracking app and home monitor | Blinding to participants and research staff is impossible; some outcomes were self-reported; not specifically select participants who were likely to use a health-coaching app; small sample size; the app used in the study was a beta version; the AI and machine learning technology used here in this app gains information with larger numbers of users contributing data; cannot exclude the possibility that some patients may have well-controlled hypertension; limited generalizability because only iOS device users were recruited |
| Voss et al [ | SRS-IIo showed large, not significant, positive mean changes in treatment participants; the VABS-IIp socialization subscale score significantly increased between the start and end of the intervention in treatment-to-control comparisons | Moderator analyses showed a moderation effect for girls showing greater improvement; no significant changes from intake to posttest 1 were observed on Child Behaviour Checklist; the VABS-II adaptive composite score showed slightly greater improvement in younger participants | Positive | Yes | This study underscores the potential of digital home therapy to augment the standard of care | According to the poststudy empirical variance, this study may be underpowered by a factor of 2; low treatment adherence; bias in recruitment of participants; bias owing to the inherent demographic and behavioral heterogeneity of patients; second posttest appointments were not available for control participants before crossing over into treatment |
| Wang et al [ | AI: 29.1% versus control: 20.3% | AI: 0.45 versus control: 0.29 (ORq 1.995, 95% CI 1.532-2.544); AI: 0.97 versus control: 0.51; AI: 0.53 versus control: 0.31; false positive rate of AI: 0.075 per colonoscopy; false negative rate of AI: not mentioned | Positive | No | In a low prevalent ADRr population, an automatic polyp detection system during colonoscopy resulted in a significant increase in the number of diminutive adenomas detected as well as an increase in the rate of hyperplastic polyps | Endoscopists were not blinded; lack of external validity; despite low false positive rates, potential distraction during the procedure could also be caused; fatigue level of participating endoscopists were not controlled; inadequate sample size of colonoscopies performed by junior endoscopists; only Olympus colonoscopy equipment was used |
| Wu et al [ | AI: 5.86% versus control: 22.46% | AI: 5.03 minutes versus control: 4.24 minutes; AI: 71.87% versus control: 79.14%; AI: 90.64% versus control: 79.14%; AI: 92.91% versus control: 79.14%; percentage of patients being ignored in majority gastric sites were significantly lower than control | Positive | No | WISENSE greatly reduced blind spot rate, increased inspection time, and improved the completeness of photo documentation | Only Olympus and Fujifilm endoscopes were used in this trial; the withdrawal time in this trial was generally less than recommended 7 minutes of EGDs in the guideline |
| Pavel et al [ | Diagnostic accuracy (sensitivity, specificity, and false detection rate) for recognition of a neonate with seizures were not significantly different between the 2 groups; sensitivity of seizure hours—AI: 66% versus control: 45.3%; false detection rate of seizure hours was not mentioned | No significant differences found in seizure characteristics; AI: 37.5% versus control: 31.6%; difference 5.9% | Negative | No | In conclusion, this clinical investigation was the first to assess the performance of a machine learning algorithm for neonatal seizure detection in real time and in the real-world setting of busy neonatal ICUs throughout Europe | Excluded seizures with a duration of <30 seconds from both groups; analysis was done using seizure hour instead of looking at each individual seizure |
| Alfonsi et al [ | Absolute error at 3-month follow-up—AI: 27.45% (10.90%) versus control: 38.00% (14.74%); error>10 g at 3-month follow-up—AI: 21.43% (16.82%) versus control: 32.27% (16.31%) | No significant difference between groups A and B in terms of accuracy, procedural safety, time for trocar placement, and patient radiation exposure (dose area product and fluoroscopy time) | Positive | No | The data suggest that use of iSpy is associated with improved carbohydrate counting and that usability and acceptability of the app is quite positive | Single tertiary pediatric center only; the number of foods recognized by iSpy is not all encompassing; detailed information about other factors that can influence care such as education level, socioeconomic status data, family dynamics, or details of treatment regimen were not acquired; text reminders to the control participants was not provided |
| Auloge et al [ | Group A technical feasibility was 100% with successful segmentation and generation of safe or accurate trajectory in all cases | No significant difference in accuracy; no complications or unintended effects were observed in either group—AI: mean 642 (SD 210) seconds, range 300-963 versus control: mean 336 (SD 60) seconds, range 240-438; DAPt—AI: mean 182.6 (SD 106.7) mGy cm2, range 27-355 versus control: mean 367.8 (SD 184.7) mGy cm2, range 115-644; fluoroscopy time—AI: 5.2 (SD 2.6) seconds, range 1.6-8.7 versus control: mean 10.4 (SD 4.1) seconds, range 4.2-17.9 | Positive | Yes | Augmented reality or AI–guided percutaneous vertebroplasty appears feasible, accurate and safe and facilitates lower patient radiation exposure compared with standard fluoroscopic guidance | Small sample size; surgeon bias due to inherent nonblinding; lack of power to assess differences in vertebroplasty complication rates; accuracy of final trocar position was estimated on augmented fluoroscopic images rather than CBCTu; no clinical follow-up is presented |
| Avari et al [ | AI: 62.5% (52.1%-67.8%) versus control: 58.4% (49.6%-64.3%) | No significant difference for percentage of time in euglycemia, hypoglycemia, and hyperglycemia; no episode of serious hypoglycemia; no episodes of hypoglycemia within 5 hours postprandially; case of severe hypoglycemia; AI: 0 versus control: 1; no significant difference in glycemic risk | Negative | Yes | The PEPPER system was safe but did not change glycemic outcomes compared with control | The potential need for additional time required for the adaptive insulin recommender system to be effective; the algorithm is likely to be most beneficial to individuals maintaining regular work patterns rather than shift workers; the algorithm only adapts for bolus insulin and assumes that the basal insulin has been optimized; the system is dependent on meal scenarios where the user has not ingested a significant snack or taken an insulin bolus correction within 5 hours of a meal for revision |
| Chen et al [ | Sedated C-EGD versus unsedated U-TOEv versus unsedated C-EGD: 3.42% (0.89/26) versus 21.77% (5.66/26) versus 31.23% (8.12/26), respectively | Blind spot rate of Sedated C-EGD—AI: 3.42 versus control: 22.46; blind spot rate of unsedated U-TOE—AI: 21.77 versus control: 29.92; blind spot rate of unsedated C-EGD—AI: 31.23 versus control: 42.46; the average accuracy, sensitivity, and specificity of EN-DOANGEL in sedated C-EGD were 88.3%, 92.6%, and 90.2%, respectively; in unsedated U-TOE, were 91.3%, 84.5%, and 90.1%, respectively; and in unsedated C-EGD, were 87.8%,8 2.8%, and 87.8%, respectively | Positive | No | In summary, our study showed that the number of blind spots in conventional sedated EGD was the lowest compared with unseated U-TOE and unsedated EGD, and the addition of ENDOANGEL had a maximal effect on unsedated C-EGD | Single-center study; endoscopist were not blinded |
| Gong et al [ | ITTw—AI: 16% (58/355) versus control: 8% (27/349); PPx—AI: 17% (54/224) versus control: 8% (26/318) | ITT diminutive—AI: 46 (13%) versus control: 25 (7%); ITT small—AI: 4 (1%) versus control: 1 (<1%); ITT large—AI: 10 (3%) versus control: 1 (<1%); no significant differences were found comparing adenoma locations: ITT—AI: 47% (166/355) versus control: 34% (118/349); ITT diminutive—AI: 158 (45%) versus control: 114 (33%); ITT small—AI: 9 (3%) versus control: 7 (2%); ITT large—AI: 11 (3%) versus control: 3 (1%); significant different was only found in sigmoid colon—AI: 79 (22%) versus control: 48 (14%); AI: 0.18 versus control: 0.08; AI: 1.17 versus control: 0.68; AI: 6.38 minutes versus control: 4.76 minutes; no adverse and serious adverse events | Positive | No | In conclusion, the ENDOANGEL system is a quality improving system for colonoscopy that uses computer vision, real-time monitoring of withdrawal speed, and timing of colonoscopy intubation and withdrawal and provides reminders to endoscopists of blind spots, in addition to live tracking previously seen frames during colonoscopy | AI was validated at 1 center only; the withdrawal speed was artificially divided into safe, alarm, and dangerous by assessing videos from Renmin Hospital of Wuhan University; the difference between assisted and unassisted colonoscopy for adenomas of 6-9 mm was not significant, which could be attributable to small numbers |
| Liu et al [ | AI: 29.01% versus control: 20.91% | AI: 47.07% versus control: 33.25%; AI: 1.07 versus control: 0.51; AI: 0.48 versus control: 0.29 | Positive | No | In conclusion, real-time visual alarms provided by a high-performance CADey system embedded into the primary colonoscopy monitor, with nearly unnoticeable latency, have been shown to cause a significant improvement in ADR because of an increased detection of diminutive adenomas without increasing physician fatigue level during colonoscopy | Open-labeled study; the fatigue score was subjective and susceptible to factors other than the visual alarms; whether a polyp was first detected by CADe before the endoscopist was based on the operating endoscopist’s own judgment; the fact that the CADe system detected a polyp before the endoscopists does not necessarily mean that the endoscopists would have missed that lesion |
| Nicolae et al [ | No significant difference in CTVz V100, CTV D90, and Rectum V100 at 1-month postoperative follow-up | AI: mean 2.38 (SD 0.96) minutes versus control: mean 43.13 (SD 58.70) minutes; no significant difference in need and extent of modifications | Positive | Yes | A machine learning–based planning workflow for prostate LDRaa brachytherapy has the potential to offer significant time savings and operational efficiencies, while producing noninferior postoperative dosimetry to that of expert, conventional treatment planners | Single-center study; examining only preoperatively planned cases |
| Repici et al [ | AI: 54.8% versus control: 40.4% | AI: 123 versus control: 97; AI: 353 out of 262 patients versus control: 243 out of 198 patients; AI: 7% versus control: 5.2%; AI: 1.07 versus control: 0.71; AI: 95.6% versus control: 98.5%; withdrawal time: not mentioned | Positive | No | In a multicenter, randomized trial, we found that including CADe in real-time colonoscopy significantly increases ADR and adenomas detected per colonoscopy without increasing withdrawal time | Psychological bias could not be excluded; the equivalence in withdrawal time excludes a somewhat reduced degree of mucosal exposure in the control arm; low detectors and inexperienced or nongastroenterologist endoscopists were not involved in this study |
| Su et al [ | AI: 28.90% versus control: 16.51% (OR 2.055, 95% CI 1.397-3.024; | AI: 38.31% versus control: 25.40%; AI: 0.367 versus control: 0.178; AI: 0.575 versus control: 0.305; AI: mean 7.03 (SD 1.01) minutes versus control: mean 5.68 (SD 1.26) minutes; AI: 87.34% versus control: 80.63% | Positive | No | In summary, AQCSab, an automatic quality control system, could be used in real time for timing, supervising withdrawal stability, evaluating BBPSac, and detecting polyp | Single endoscopic center; some false prompts occurred with the AQCS; fatigue level of participating physicians was not controlled; used 4 intraprocedural quality metrics to form the AQCS, without performing preliminary testing to evaluate whether just 2 or 3 or 4 of these metrics had the same quality improvement; did not test the sole effect of colonoscopy stability; the DCNNsad were trained only on images obtained from a Pentax imaging system |
| Wang et al [ | AI: 13.89% versus control: 40% | AI: 12.98% versus control: 45.90%; no statistical differences in the miss rate of advanced adenomas and sessile serrated adenoma or polyps; no significant difference in patient miss rate; no significant difference in ADR for the first pass; no significant difference in adenoma per colonoscopy; no significant difference in polyp per colonoscopy | Positive | No | The results from this study suggest a significantly lower AMRae when a CADe technology is used compared with routine white light colonoscopy. The detection of diminutive and small adenomas with nonadvanced histology and nonpedunculated shape could be effectively improved by CADe colonoscopy | AMR obtained in the tandem study cannot reflect the absolute miss rate; subjective bias in open-labeled trial; tandem colonoscopy in each patient was performed by the same endoscopist; study population was not restricted to screening-only participants according to guidelines; only skilled endoscopists were allowed to participate in this study; subjected bias may be introduced as the judgments made by the panel of 3 experts who reviewed the video record were not a gold standard as pathology |
| Wang et al [ | AI: 165 (34%) versus sham control: 132 (28%) | AI: 252 (52%) versus sham control: 176 (37%); AI: 1.04 versus sham control: 0.64; AI: 0.58 versus sham control: 0.38 | Positive | No | The CADe system is a safe and effective method to increase ADR during colonoscopy | Potential bias in the presence of a second senior endoscopist; bias in patient recruitment; the actual alert numbers of the sham system should have been measured in the trial to show equivalence |
| Wijnberge et al [ | AI: median 0.10, IQR 0.01-0.43 mm Hg versus control: median 0.44, IQR 0.23-0.72 mm Hg | AI: 3.00, IQR 1.00-8.00 versus control: 8.00, IQR 3.50-12.00; AI: 8.00, IQR 1.33-26.00 minutes versus control: 32.67, IQR 11.50-59.67 minutes; AI: 2.8%, IQR 0.8%-6.6% versus control: 5.6%, IQR 3%-9.4%; AI: 2.0 (0.0 to 3.0) versus control: 0.0 (−1.0 to 0.0); AI: 4.0 (0.0 to 10.7) minutes versus control: −0.7 (−4.3 to 0.7) minutes; AI: 1.5% (0.0 to 3.3) versus control: −0.2% (−1.4 to 0.3) | Positive | No | In this single-center preliminary study of patients undergoing elective noncardiac surgery, the use of a machine learning–derived early warning system compared with standard care resulted in less intraoperative hypotension. Further research with larger study populations in diverse settings is needed to understand the effect on additional patient outcomes and to fully assess safety and generalizability | Single center only; small sample size; patient may have their own personal minimal MAP to be maintained during surgery; depth of anesthesia was not measured; the early warning system is validated only for invasive continuous blood pressure monitoring; an observer being present in the operating room may have influenced protocol adherence |
| Weisinger et al [ | Fugl-Meyer Assessment-Upper Extremity: week 4—AI: mean 23.2 (SD 3.91) versus control: mean 9.9 (SD 3.2); week 8—AI: mean 31.5 (SD 2.97) versus control: mean 23.1 (SD 4.99) | AI: 2.5 (0.18) points versus control: 1.3 (0.16) points; significance improved: Action Research Arm Test-Pinch subscale; significance improved: Box and Block Test; significance improved: NIHSSae | Positive | Unknown | BQ treatment significantly improves upper extremity motor function in a population with subacute ischemic stroke across multiple clinical metrics. Further studies are planned and ongoing with larger study populations and in related indications | Nil |
| Blomberg et al [ | AI: 93.1% (296/318) versus control: 90.5% (304/336) | AI: 1.72 (1.52) minutes versus control: 1.70 (1.63) minutes; AI: 64.8% versus control: 61.9% | Negative | Yes | This randomized clinical trial did not find any significant improvement in dispatchers’ ability to recognize cardiac arrest when supported by machine learning even though AI did surpass human recognition | Not 100% compliance with the machine learning model; the servers analyzing the phone calls had downtime, because the server was underdimensioned |
| Browning et al [ | QIDS-SR-16af at week 8—AI: 55.9% versus control: 51.8 | Generalized Anxiety Disorder Assessment, 7 item version (week 8)—AI: −5.44 versus control: −6.12 | Negative | Yes | Use of a predictive algorithm to guide antidepressant treatment improves symptoms of anxiety and functional outcomes provides initial support for the use of personalized medicine approaches in the treatment of depression | The accuracy of the predictive algorithm was modest at 57.5%; effectiveness was focused rather than efficacy, requesting but not requiring clinicians to alter treatment in response to a prediction of nonresponse; randomization occurred at the level of the patient rather than the site, and thus, the treatment as usual arm may have been influenced by behavior learned in the active arm |
| Jayakumar et al [ | AI: mean 68.9 (SD 19.8) versus control: mean 48.8 (SD 14.5) | CollaboRATE median—AI: 8 of 69 versus control: 28 of 60; number of patient-rated satisfaction scores lower than the median value of 10—AI: 9 of 69 versus control: 19 of 60; no significant difference in duration of consultation in minutes; no significant difference in TKRag rates and treatment concordance | Positive | No | In this randomized clinical trial, an AI-enabled decision aid significantly improved decision quality, level of shared decision-making, satisfaction, and physical limitations without significantly impacting consultation times, TKR rates, or treatment concordance in patients with knee osteoarthritis considering TKR. Decision aids using a personalized, data-driven approach can enhance shared decision-making in the management of knee osteoarthritis | Single-center study; surgeons were not masked; we did not assess the effect of the decision aid on patient knowledge; the typical course of a formal osteoarthritis in-clinic diagnosis possesses a general limitation in limiting the time frame over which the tool may be applied |
| Kamba et al [ | AI first: 13.8% versus control first: 35.7% | AI first: 14.2% versus control first: 40.6%; AI first: 13% versus control first: 38.5%; AI first: 64.5% versus control first: 53.6% | Positive | No | The reduction of AMR by assisting with CADe based on deep learning in a multicenter randomized controlled trial | Nil |
| Luo et al [ | AI: 38.7% versus control: 34% | The number of polyps detected in the control group and the research group was 80 and 105, respectively; AI: 91 versus control: 69; polyp type 0-IIa—AI: 87 versus control: 61; polyp type 0-Is—AI: 5 versus control: 8; polyp type 0-Ip—AI: 13 versus control: 11; 52 false positive result in AI group; in average, 0.35 false positive per colonoscopy | Positive | No | This study shows that an AI system based on deep learning and its real-time performance led to significant increases in colorectal PDRah | Single center study; small sample size; AI has different effects on improving the PDR among different physicians; ADR was not compared between 2 groups in this trial |
| Rafferty et al [ | IBSai symptom score—AI: −170 versus control: −138; quality of life score—AI: 31.1 versus control: 11.8 | No significant difference; AI: 8.3 (4.4-13.1) versus control: 10.4 (7.4-14.0) | Negative | Yes | Results showed that the Heali app was able to significantly increase quality of life outcomes in IBS participants over a 30-day intervention period | Small sample size; self-reporting bias in survey may resulted owing to lack of blinding; stratification was not done; participants were not randomized to groups until study day 10, which was after the collection of baseline data; although anthropometric measures (bodyweight and height) were collected at baseline, they were not collected at the end of the trial, and it is possible that changes in body weight influenced the outcome variables; adherence may be affected by social impacts of the COVID-19 pandemic |
| Repici et al [ | AI: 176/330, 53.3% versus control: 146/330, 44.2% | AI: 41.5% versus control: 36.1%; AI: 1.98 (range 0-15) versus control: 1.61 (range: 0-17); AI: 3.3% versus control: 5.2%; AI: 1.26 (SD 1.82) versus control: 1.04 (SD 1.75); 100% in both groups after excluding patients with inadequate bowel preparation; AI: mean 815 (SD 1.6) versus control: mean 7.98 (SD 1.5) | Positive | No | CADe in real-time colonoscopy significantly increases ADR and adenomas detected per colonoscopy in a nonexpert setting | No comparison of AI assistance with alternative educational interventions among inexpert endoscopists; this study design was not fit to assess the sensitivity or specificity of the device; no power calculations were done for any of our secondary outcomes |
| Strömblad et al [ | Mean absolute error—AI: 49.5 minutes {66} versus control: 59.3 minutes {72} | Mean patient wait time: overall—AI: 16.3 minutes versus control: 49.4 minutes (67.1% improvement); turnover time: overall—AI: 69.1 minutes versus control: 70.6 minutes (2% improvement); patient time in facility—AI: 148.1 versus control: 173.3 (14.5% improvement) | Positive | No | Implementing machine learning–generated predictions for surgical case durations may improve case duration accuracy, presurgical resource use, and patient wait time, without increasing surgeon wait time between cases | Small sample size; prediction accuracy may be affected if the submitted procedure codes deviate significantly from the procedures that are performed; a less common occurrence were multipanel cases in which multiple surgeons from different services operated on the same patient during the same case; there was no stratification by days |
| Wu et al [ | AI: 5.38 (SD 4.32) versus control: 9.82 | AI: 5.40 (SD 3.82) minutes versus control: 4.38 (SD 3.91) minutes; the median percentage of patients with blind spots at each site—AI: 21% versus control: 38.9%; per-lesion accuracy: 84.7%; sensitivity: 100%; specificity: 84.3% | Positive | No | ENDOANGEL was an effective and robust system to improve the quality of EGD and has the potential to detect electrocardiogram in real time | We only conducted a feasibility analysis on real-time detection of gastric cancer based on deep learning in a clinical setting; the enrolled patients were not followed up for a long time; statisticians were not blinded |
| Yao et al [ | In overall cohort—AI: 2.1% versus control 1.6%; among 1356 patients who had a positive result—AI: 19.5% versus control: 14.5% | No significant between AI and control on disease discovery | Positive | No | An AI algorithm run on existing electrocardiograms enabled the early diagnosis of low ejection fraction in a large cohort of patients managed in routine primary care practices. Because electrocardiography is a low-cost test that is frequently performed for a variety of purposes, the algorithm could potentially improve early diagnosis and treatment of a condition that is often asymptomatic but has effective treatments and thus reduce the disease burden in broad populations | Echocardiogram may not be ordered by clinician as nearly all the patients had insurance coverage; study was not designed to determine the long-term clinical impact; for example, heart failure hospitalizations and mortality |
| Brown et al [ | AMR—CADe first: 20.12% versus HDWLaj first: 31.25% | PDR—CADe first: 20.7% versus HDWL first: 33.71%; HPMRak—no significant difference in the hyperplastic polyp miss rate; SSLMRal—CADe first: 7.140% versus HDWL first: 42.11%; no statistically significant difference in ADR during first pass, second pass, and whole process; no statistically significant difference in PDR during first pass, second pass, and whole process; adenoma per colonoscopy during first pass—CADe first: 1.19 versus HDWL first: 0.90; no statistically significant difference during second pass and whole process; polyp per colonoscopy during first pass—CADe first: 2.0 versus HDWL first: 1.59; polyp per colonoscopy during second pass—CADe first: 0.52 versus HDWL first: 0.81; no statistically significant difference during whole process; SSLPCam during second pass—CADe first: 0.01 versus HDWL first: 0.07; no statistically significant difference during first pass whole process | Positive | No | This study showed a decrease in AMR with the use of a deep learning CADe system when compared with HDWL colonoscopy alone and a decrease in polyp and sessile serrated lesion miss rates and an increase in first-pass adenomas per colonoscopy | Not powered to detect a difference in ADR; the tandem colonoscopy design limited in terms of generalizability to the real-world clinical setting; only included experienced endoscopists with a high baseline ADR at US academic medical centers; used a second monitor adjacent to the primary endoscopy monitor |
aCPAP: continuous positive airway pressure.
bAI: artificial intelligence.
cICU: intensive care unit.
dLOS: length of stay.
eHGD: high-grade dysplasia.
fEAC: esophageal adenocarcinoma.
gWATS: wide-area transepithelial sampling.
hBE: Barrett's esophagus.
iSAP: sensor-augmented pump.
jHPI: hypotension probability indicator.
kRCT: randomized controlled trial.
lBPI: Brief Pain Inventory.
mBQ-II: Barriers Questionnaire II.
nSBP: systolic blood pressure.
oSRS-II: Social Responsiveness Scale II.
pVABS-II: Vineland Adaptive Behavioural Scales, Second edition.
qOR: odds ratio.
rADR: adenoma detection rate.
sEGD: esophagogastroduodenoscopy.
tDAP: dose–area product.
uCBCT: cone-beam computed tomography.
vU-TOE: ultrathin transoral endoscopy.
wITT: intention to treat.
xPP: per protocol.
yCADe: computer-assisted detection.
zCTV: clinical target volume.
aaLDR: low-dose rate.
abAQCS: automatic quality control system.
acBBPS: Boston bowel preparation scale.
adDCNN: deep convolutional neural networks.
aeAMR: adenoma miss rate.
afQIDS-SR-16: Quick Inventory of Depressive Symptomatology (16-Item) (Self-Report).
agTKR: total knee replacement.
ahPDR: polyp detection rate.
aiIBS: irritable bowel syndrome.
ajHDWL: high-definition white light.
akHPMR: Hyperplastic polyp miss rate.
alSSLMR: sessile serrated lesion miss rate.
amSSLPC: sessile serrated lesion per colonoscopy.
Quality assessment outcome based on the second version of the Cochrane risk-of-bias tool for randomized trials.
| Author (publication year) | Randomization process | Deviations from intended interventions | Missing outcome data | Measurement of the outcome | Selection of the reported result | Overall bias |
| El Solh et al [ | Low | Low | Low | Low | Low | Low |
| Shimabukuro et al [ | Low | Low | Low | Low | Low | Low |
| Labovitz et al [ | Some concerns | Low | Low | Low | Low | Some concerns |
| Gracey et al [ | Some concerns | High | High | High | High | High |
| Liu et al [ | Some concerns | Some concerns | High | High | Some concerns | High |
| Vennalaganti et al [ | Some concerns | Low | Low | High | Low | High |
| Biester et al [ | Some concerns | Some concerns | High | Low | High | High |
| Pouska et al [ | Some concerns | High | Low | Low | Low | High |
| Lin et al [ | Low | Low | Low | Low | Low | Low |
| Kamdar et al [ | Some concerns | Some concerns | High | High | Some concerns | High |
| Persell et al [ | Low | High | Low | High | Low | High |
| Voss et al [ | High | Some concerns | Low | High | Low | High |
| Wang et al [ | Some concerns | Low | Low | Low | Low | Some concerns |
| Wu et al [ | Some concerns | Low | Low | Low | Low | Some concerns |
| Pavel et al [ | Some concerns | Low | Low | Low | Low | Some concerns |
| Alfonsi et al [ | Some concerns | Low | Low | High | Low | High |
| Auloge et al [ | Some concerns | Some concerns | Low | Low | Low | Some concerns |
| Avari et al [ | Low | Some concerns | Low | Some concerns | Some concerns | Some concerns |
| Chen et al [ | Some concerns | Low | Low | Low | Low | Some concerns |
| Gong et al [ | Low | Low | Low | Low | Low | Low |
| Liu et al [ | Some concerns | Low | Low | Some concerns | Low | Some concerns |
| Nicolae et al [ | Some concerns | Some concerns | High | Low | Low | High |
| Repici et al [ | Some concerns | Low | Low | Low | Low | Some concerns |
| Su et al [ | Low | Low | Low | Low | Low | Low |
| Wang et al [ | Low | Low | Low | High | Low | High |
| Wang et al [ | Low | Low | Low | Some concerns | Low | Some concerns |
| Wijnberge et al [ | Some concerns | Low | Low | Low | Low | Some concerns |
| Weisinger et al [ | Some concerns | Low | High | High | Low | High |
| Blomberg et al [ | Low | Low | High | Low | Low | High |
| Browning et al [ | Low | Low | Low | Low | Low | Low |
| Jayakumar et al [ | Some concerns | Low | Low | High | Low | High |
| Kamba et al [ | Some concerns | High | High | Some concerns | Low | High |
| Luo et al [ | Some concerns | Low | Low | Low | Low | Some concerns |
| Rafferty et al [ | High | Low | Low | High | Low | High |
| Repici et al [ | Some concerns | Some concerns | High | High | Low | High |
| Strömblad et al [ | Low | Low | Low | Low | Low | Low |
| Wu et al [ | Low | Low | Low | Low | Low | Low |
| Yao et al [ | Some concerns | High | High | High | Low | High |
| Brown et al [ | Low | Low | Low | High | Low | High |