Qian Zhou, Zhi-Hang Chen, Yi-Heng Cao, Sui Peng.
Abstract
Evidence on the clinical impact of interventions based on traditional statistical (TS) and artificial intelligence (AI) tools is limited. This study aimed to investigate the clinical impact and quality of randomized controlled trials (RCTs) that evaluated TS, machine learning (ML), and deep learning (DL) prediction tools as interventions. A systematic review of PubMed was conducted to identify RCTs involving TS/ML/DL tool interventions published in the past decade. A total of 65 RCTs were included from 26,082 records. Most trials had an associated model development study, and the tools generally performed well. In the RCTs, TS and ML tools mainly served to assist treatment decisions, assist diagnosis, and stratify risk, whereas DL trials were conducted only for assistive diagnosis. Nearly two-fifths of the trial interventions showed no clinical benefit compared with standard care. Although DL and ML interventions achieved higher rates of positive results than TS interventions, in trials with a low risk of bias (17/65) the advantage of DL over TS was reduced and the advantage of ML over TS disappeared. DL has not yet been widely applied in medicine, but it is expected to address more complex clinical problems than ML and TS tools in the future. Therefore, rigorous studies are required before these tools are applied clinically.
Year: 2021 PMID: 34711955 PMCID: PMC8553754 DOI: 10.1038/s41746-021-00524-2
Source DB: PubMed Journal: NPJ Digit Med ISSN: 2398-6352
Fig. 1 Flowchart of the study.
Published trials were identified through PubMed. The ClinicalTrials.gov registry and the references of the full-text articles were also checked for eligibility to capture potentially relevant trials. Observational studies on tool development and/or validation were located through the descriptions and references in each trial paper.
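As a minimal sketch of the PubMed step, assuming NCBI's public E-utilities `esearch` endpoint, a query restricted to the randomized controlled trial publication type and a publication-date window could be built as below. The topic terms and the helper name `build_esearch_url` are hypothetical; the review's actual search string is not reproduced here, and the 2010–2020 window is only inferred from the publication years reported in the general characteristics table.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(topic_terms: str, min_year: int, max_year: int, retmax: int = 100) -> str:
    """Build a PubMed esearch URL limited to RCTs in a publication-date window.

    topic_terms is a hypothetical free-text query, not the review's search string.
    """
    # [pt] restricts matches to the PubMed "Publication Type" field.
    term = f"({topic_terms}) AND randomized controlled trial[pt]"
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",  # filter on publication date
        "mindate": str(min_year),
        "maxdate": str(max_year),
        "retmax": retmax,
        "retmode": "json",
    }
    # urlencode percent-encodes brackets/quotes and turns spaces into '+'.
    return f"{ESEARCH}?{urlencode(params)}"


url = build_esearch_url('"machine learning" OR "deep learning" OR "prediction model"', 2010, 2020)
print(url)
```

The URL can then be fetched (for example with `urllib.request.urlopen`) to obtain matching PMIDs; screening those records against the eligibility criteria remains a manual step.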
Fig. 2 Distribution of the number of trials and percentage of trials with positive results.
a Trend over time in published randomized controlled trials involving traditional statistical and artificial intelligence prediction tool interventions; b trend in the number of trials with positive and negative results; c number of trials with positive results by type of prediction tool; d percentage of trials with positive results by type of prediction tool.
General characteristics of the 65 randomized controlled trials.
| Variables | Levels | Total (n = 65) |
|---|---|---|
| Results (%) | Negative | 25 (38.5) |
| | Positive | 40 (61.5) |
| Duration of study (median [IQR]) | | 12 [6, 24] |
| Sample size (median [IQR]) | | 435 [192, 999] |
| Sample size estimation (%) | Greater than or equal to expected | 37 (56.9) |
| | Less than expected | 7 (10.8) |
| | Not performed | 21 (32.3) |
| Publication year (%) | 2010–2015 | 21 (32.3) |
| | 2016–2020 | 44 (67.7) |
| Study design (%) | RCT superiority (individualized) | 48 (73.8) |
| | RCT superiority with crossover (individualized) | 1 (1.5) |
| | RCT non-inferiority (individualized) | 2 (3.1) |
| | Clustered RCT superiority (clustered) | 7 (10.8) |
| | Stepped-wedge design (clustered) | 7 (10.8) |
| Allocation ratio (%) | 1:1 parallel | 55 (84.6) |
| | Others | 10 (15.4) |
| Masking (%) | Open-label | 49 (75.4) |
| | Single-blinded | 12 (18.5) |
| | Double-blinded | 4 (6.2) |
| Centers (%) | Single | 33 (50.8) |
| | Multi | 32 (49.2) |
| Disease category (%) | Cancer | 11 (16.9) |
| | Chronic disease (excluding cancer) | 18 (27.7) |
| | Acute disease | 19 (29.2) |
| | Primary care | 9 (13.8) |
| | Others | 8 (12.3) |
| Types of algorithms (%) | Traditional statistical model | 37 (56.9) |
| | Machine learning | 17 (26.2) |
| | Deep learning | 11 (16.9) |
| Prediction tools function (%) | Assistive treatment decision | 35 (53.8) |
| | Assistive diagnosis | 16 (24.6) |
| | Risk stratification | 12 (18.5) |
| | Others | 2 (3.1) |
| Referenced CONSORT (%) | No | 47 (72.3) |
| | Yes | 18 (27.7) |
| Intent-to-treat analysis (%) | No | 39 (60.0) |
| | Yes | 26 (40.0) |
| Study protocol available (%) | No | 49 (75.4) |
| | Yes | 16 (24.6) |
| Model development (%) | No | 7 (10.8) |
| | Yes, independent publication | 49 (75.4) |
| | Yes, published in the same article as the RCT | 9 (13.8) |
| Internal validation (%) | No | 23 (35.4) |
| | Yes | 42 (64.6) |
| External validation (%) | No | 25 (38.5) |
| | Yes | 40 (61.5) |
| AUC in model development (median [IQR]) | | 0.81 [0.75, 0.90] |
| AUC in internal validation (median [IQR]) | | 0.78 [0.73, 0.78] |
| AUC in external validation (median [IQR]) | | 0.83 [0.79, 0.97] |
IQR interquartile range, AUC area under the receiver operating characteristic curve.
a Available numbers used for description.
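The "n (%)" cells above are plain proportions of the 65 included trials; for example, the 25 negative trials give 25/65 = 38.5%, the "nearly two-fifths" cited in the abstract. A minimal sketch of that formatting, with a consistency check that the levels of one variable partition all trials (the helper name `n_pct` is hypothetical):

```python
def n_pct(count: int, total: int) -> str:
    """Format a count as 'n (%)' with the percentage to one decimal place."""
    return f"{count} ({100 * count / total:.1f})"


TOTAL = 65  # trials included in the review

# Counts copied from the "Types of algorithms" rows of the table.
algorithms = {
    "Traditional statistical model": 37,
    "Machine learning": 17,
    "Deep learning": 11,
}

# The levels of a single variable should account for every trial exactly once.
assert sum(algorithms.values()) == TOTAL

for level, count in algorithms.items():
    print(f"{level}: {n_pct(count, TOTAL)}")
# Traditional statistical model: 37 (56.9)
# Machine learning: 17 (26.2)
# Deep learning: 11 (16.9)
```

The same sum check applied row group by row group is a quick way to catch transcription errors in tables like this one.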
Fig. 3 Risk of bias assessment.
a The distributions of risk of bias by each domain; b the distributions of the overall risk of bias for all trials and for traditional statistical, machine learning, and deep learning tools, respectively.
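The caption's categories (low risk, some concerns, high risk) match those of the Cochrane RoB 2 tool, although the caption does not name the tool, so that mapping is an assumption. Under RoB 2, the overall judgment is low only if every domain is low and high if any domain is high; RoB 2 also allows an overall "high" when several domains raise some concerns, which is a reviewer judgment. The sketch below approximates that last rule with a hypothetical `concerns_threshold` parameter rather than a fixed rule from the guidance.

```python
from enum import IntEnum
from typing import Optional, Sequence


class Risk(IntEnum):
    LOW = 0
    SOME_CONCERNS = 1
    HIGH = 2


def overall_risk(domains: Sequence[Risk], concerns_threshold: Optional[int] = None) -> Risk:
    """Combine per-domain judgments into an overall RoB 2-style judgment.

    concerns_threshold is a hypothetical stand-in for the reviewer judgment
    that multiple 'some concerns' domains can justify an overall 'high'.
    """
    # Any high-risk domain makes the whole trial high risk.
    if any(d == Risk.HIGH for d in domains):
        return Risk.HIGH
    concerns = sum(d == Risk.SOME_CONCERNS for d in domains)
    # Low overall only when every domain is low.
    if concerns == 0:
        return Risk.LOW
    if concerns_threshold is not None and concerns >= concerns_threshold:
        return Risk.HIGH
    return Risk.SOME_CONCERNS


# RoB 2 domains, in order: randomization process, deviations from intended
# interventions, missing outcome data, measurement of the outcome,
# selection of the reported result.
trial = [Risk.LOW, Risk.SOME_CONCERNS, Risk.LOW, Risk.SOME_CONCERNS, Risk.LOW]
print(overall_risk(trial).name)                        # SOME_CONCERNS
print(overall_risk(trial, concerns_threshold=2).name)  # HIGH
```

Keeping the threshold optional makes the automated rule explicit and leaves the default behavior conservative.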
Procedures of predictive tool interventions in the 11 randomized controlled trials involving interventions evaluating deep learning tools.
| Reference | Conditions | Sample size | Tools for intervention | Control | Algorithms | Tool function | Tool input | Tool output | How the output was used in clinical settings | Trial primary outcomes | Gold standard | Trial findings |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Upper gastrointestinal lesions | 437 | Routine EGD examination stratified by three types with the assistance of the ENDOANGEL AI system | Routine EGD examination stratified by three types without AI | DCNN (VGG-16) | Assistive diagnosis | EGD images | A virtual stomach model monitoring blind spots; timing; scoring and grading | Experts referred to the AI output when performing EGD examinations and monitoring blind spots. | Mean blind spot rate | Experts | Positive |
| | Childhood cataracts | 700 | CC-cruiser web diagnosis platform | Regular ophthalmic diagnosis | DCNN (ImageNet) | Assistive diagnosis | Ocular images from slit-lamp photography | Diagnosis outcome; comprehensive evaluation; treatment recommendation | AI made diagnoses independently; its results were compared with those of experts and did not affect clinical decision making. | Accuracy of diagnosis | Experts | Negative |
| | Colorectal cancer | 659 | Routine colonoscopies with the assistance of an AI automatic quality control system | Routine colonoscopies | DCNN (AlexNet, ZFNet, YOLO V2) | Assistive diagnosis | Colonoscopy images | Location of colorectal polyps; timing; reminders to retest and clean | Endoscopists referred to the AI output when performing endoscopic examinations and reporting polyps and adenomas. | Adenoma detection rate | Pathology | Positive |
| | Colorectal cancer | 1058 | Routine colonoscopies with the assistance of an automatic polyp detection system | Routine colonoscopies | Deep learning architecture | Assistive diagnosis | Colonoscopy images | Location of polyps; alarms | Endoscopists were required to check every polyp location detected by the system and to report polyps and adenomas. | Adenoma detection rate | Pathology | Positive |
| | Upper gastrointestinal lesions | 303 | Routine EGD examination with the assistance of the WISENSE AI system | Routine EGD examination | DCNN (VGG-16 and DenseNet) | Assistive diagnosis | EGD images | A virtual stomach model monitoring blind spots; timing; scoring and grading; extracting frames with the highest confidence | Experts referred to the AI output when performing EGD examinations and monitoring blind spots. | Mean blind spot rate | Experts | Positive |
| | Colorectal cancer | 704 | ENDOANGEL-assisted routine colonoscopy | Routine colonoscopy | DCNN and perceptual hash algorithms (VGG-16) | Assistive diagnosis | Colonoscopy images | Timing; safe, alarm, and dangerous ranges of withdrawal speed for real-time monitoring; slipping warning | Operating endoscopists referred to the AI output when performing endoscopic examinations and reporting polyps and adenomas. | Adenoma detection rate | Pathology | Positive |
| Liu (2020) | Colorectal cancer | 1026 | Routine colonoscopy with CADe assistance | Routine colonoscopy | DCNN-3D | Assistive diagnosis | Colonoscopy images | The probability of polyps in each frame; lesion alarms | Endoscopists focused mainly on the main monitor during the examination, and a voice alarm prompted them to view the system monitor to check the location of each polyp detected by the system. | Detection rate of polyps and adenomas | Pathology | Positive |
| | Colorectal cancer | 157 | AI-assisted colonoscopy | Traditional colonoscopy | CNN (YOLO) | Assistive diagnosis | Colonoscopy images | Location of polyps | Endoscopists referred to the AI output when performing endoscopic examinations and reporting polyps. | Polyp detection rate | Not reported | Positive |
| Repici (2020) | Colorectal cancer | 685 | High-definition colonoscopies with the AI-based CADe system | Routine colonoscopy | CNN | Assistive diagnosis | Colonoscopy images | Location of polyps | Endoscopists referred to the AI output when performing endoscopic examinations and reporting polyps and adenomas. | Adenoma detection rate | Pathology | Positive |
| Wang (2020) | Colorectal cancer | 962 | White light colonoscopy with assistance from the CADe system | White light colonoscopy with assistance from a sham system | Deep learning architecture | Assistive diagnosis | Colonoscopy images | Location of polyps; alarms | Endoscopists were required to check every polyp location detected by the system and to report polyps and adenomas. | Adenoma detection rate | Pathology | Positive |
| Blomberg (2021) | Out-of-hospital cardiac arrest (OHCA) | 5242 | Normal protocols with alert | Normal protocols without alert | Speech recognition using deep neural networks | Assistive diagnosis | Emergency calls | OHCA alert | Dispatchers in the intervention group were alerted when the machine learning model identified out-of-hospital cardiac arrest. | Rate of dispatcher recognition of subsequently confirmed OHCA | Danish Cardiac Arrest Registry | Negative |
AI artificial intelligence, DL tools using deep learning algorithms, ML tools using machine learning algorithms, CNN convolutional neural networks, DCNN deep convolutional neural networks, CADe computer-aided detection, EGD esophagogastroduodenoscopy, OHCA out-of-hospital cardiac arrest.
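The DL summary figures in the comparison table can be recomputed from the rows of this table. As a sketch, Python's `statistics.quantiles` with `method="inclusive"` (linear interpolation) reproduces the reported DL sample size of 700 [548, 994]; that the authors used this quartile definition is an assumption, since the default `"exclusive"` method would give a different lower quartile here (437).

```python
import statistics

# Sample sizes and findings of the 11 DL trials, in table order.
dl_sample_sizes = [437, 700, 659, 1058, 303, 704, 1026, 157, 685, 962, 5242]
dl_findings = [
    "Positive", "Negative", "Positive", "Positive", "Positive", "Positive",
    "Positive", "Positive", "Positive", "Positive", "Negative",
]

# n=4 returns the three cut points Q1, median, Q3.
q1, median, q3 = statistics.quantiles(dl_sample_sizes, n=4, method="inclusive")
print(f"Sample size: {median:.0f} [{q1:.0f}, {q3:.0f}]")  # Sample size: 700 [548, 994]

positive = dl_findings.count("Positive")
print(f"Positive: {positive} ({100 * positive / len(dl_findings):.1f})")  # Positive: 9 (81.8)
```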
Comparisons among trials involving traditional statistical, machine learning and deep learning predictive tool interventions.
| Variables | Levels | TS (n = 37) | ML (n = 17) | DL (n = 11) | P value |
|---|---|---|---|---|---|
| Duration of study (median [IQR]) | | 17 [8, 32] | 7 [4, 19] | 6 [4, 9] | 0.005 |
| Sample size (median [IQR]) | | 435 [194, 999] | 258 [90, 537] | 700 [548, 994] | 0.122 |
| Clinical settings (%) | Outpatients | 19 (51.4) | 6 (35.3) | 1 (9.1) | 0.015 |
| | Inpatients | 17 (45.9) | 8 (47.1) | 10 (90.9) | |
| | Home | 1 (2.7) | 3 (17.6) | 0 (0.0) | |
| Publication year (%) | 2010–2015 | 14 (37.8) | 7 (41.2) | 0 (0.0) | 0.041 |
| | 2016–2020 | 23 (62.2) | 10 (58.8) | 11 (100.0) | |
| Model input (%) | Clinical quantitative data | 36 (97.3) | 16 (94.1) | 0 (0.0) | <0.001 |
| | Images or videos | 1 (2.7) | 0 (0.0) | 10 (90.9) | |
| | Natural language | 0 (0.0) | 1 (5.9) | 1 (9.1) | |
| Disease category (%) | Cancer | 2 (5.4) | 0 (0.0) | 9 (81.8) | <0.001 |
| | Chronic disease | 4 (10.8) | 13 (76.5) | 1 (9.1) | |
| | Acute disease | 16 (43.2) | 2 (11.8) | 1 (9.1) | |
| | Primary care | 9 (24.3) | 0 (0.0) | 0 (0.0) | |
| | Others | 6 (16.2) | 2 (11.8) | 0 (0.0) | |
| Prediction tools function (%) | Assistive diagnosis | 3 (8.1) | 2 (11.8) | 11 (100.0) | <0.001 |
| | Risk stratification | 11 (29.7) | 1 (5.9) | 0 (0.0) | |
| | Assistive treatment decision | 22 (59.5) | 13 (76.5) | 0 (0.0) | |
| | Others | 1 (2.7) | 1 (5.9) | 0 (0.0) | |
| Results (%) | Negative | 18 (48.6) | 5 (29.4) | 2 (18.2) | 0.136 |
| | Positive | 19 (51.4) | 12 (70.6) | 9 (81.8) | 0.044 (…) |
TS, randomized controlled trials with a traditional statistical tool as the intervention; ML, randomized controlled trials with a tool using machine learning algorithms (excluding deep learning) as the intervention; DL, randomized controlled trials with a tool using deep learning algorithms as the intervention.
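The table does not state which test produced each P value. One test consistent with the 0.044 in the positive-results row is the Cochran–Armitage test for trend in proportions, assuming the tool types are ordered TS < ML < DL with scores 0, 1, 2 (both the ordering and the scores are assumptions). A minimal sketch using the normal approximation and the counts above:

```python
import math


def cochran_armitage(positives, totals, scores):
    """Two-sided Cochran-Armitage trend test (normal approximation)."""
    n_total = sum(totals)
    p_bar = sum(positives) / n_total  # pooled proportion of positive trials
    # Score-weighted difference between observed and expected positives.
    t_stat = sum(s * (x - n * p_bar) for s, x, n in zip(scores, positives, totals))
    sum_ns = sum(n * s for n, s in zip(totals, scores))
    sum_ns2 = sum(n * s * s for n, s in zip(totals, scores))
    variance = p_bar * (1 - p_bar) * (sum_ns2 - sum_ns ** 2 / n_total)
    z = t_stat / math.sqrt(variance)
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Positive trials / all trials for TS, ML, DL (assumed ordinal scores 0, 1, 2).
z, p = cochran_armitage(positives=[19, 12, 9], totals=[37, 17, 11], scores=[0, 1, 2])
print(f"z = {z:.3f}, p = {p:.3f}")  # z = 2.011, p = 0.044
```

A trend test asks a narrower question than the overall 2 × 3 comparison (P = 0.136 in the Negative row), which is why the two P values can differ.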
Fig. 4 The number of trials and percentage of positive results of three types of tools according to the risk of bias.
a The number of trials of each type of tool in trials with low risk of bias; b the percentage of positive results of each type of tool in trials with low risk of bias; c the number of trials of each type of tool in trials with some concerns; d the percentage of positive results of each type of tool in trials with some concerns; e the number of trials of each type of tool in trials with a high risk of bias; f the percentage of positive results of each type of tool in trials with a high risk of bias.
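Figure 4 compares positive-result percentages within each risk-of-bias stratum, which is how the abstract's "advantage over TS" is judged. The per-stratum counts appear only in the figure, so the sketch below is demonstrated on the pooled counts from the comparison table under a single placeholder stratum label `"all"`; with per-trial risk-of-bias judgments, the same hypothetical helpers (`positive_rates_by_stratum`, `advantage_over_ts`) would yield the per-stratum panels.

```python
from collections import defaultdict


def positive_rates_by_stratum(trials):
    """Percent positive per (stratum, tool); trials are (stratum, tool, is_positive)."""
    counts = defaultdict(lambda: [0, 0])  # (stratum, tool) -> [positives, total]
    for stratum, tool, is_positive in trials:
        cell = counts[(stratum, tool)]
        cell[0] += int(is_positive)
        cell[1] += 1
    return {key: 100 * pos / tot for key, (pos, tot) in counts.items()}


def advantage_over_ts(rates, stratum):
    """Percentage-point difference of ML and DL versus TS within one stratum."""
    ts = rates[(stratum, "TS")]
    return {tool: rates[(stratum, tool)] - ts
            for tool in ("ML", "DL") if (stratum, tool) in rates}


# Pooled (positive, total) counts from the comparison table; "all" is a placeholder stratum.
pooled = {"TS": (19, 37), "ML": (12, 17), "DL": (9, 11)}
trials = [("all", tool, i < pos) for tool, (pos, tot) in pooled.items() for i in range(tot)]

rates = positive_rates_by_stratum(trials)
adv = advantage_over_ts(rates, "all")
print({tool: round(v, 1) for tool, v in adv.items()})  # {'ML': 19.2, 'DL': 30.5}
```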