George C M Siontis, Romy Sweda, Peter A Noseworthy, Paul A Friedman, Konstantinos C Siontis, Chirag J Patel.
Abstract
OBJECTIVE: Given the complexities of testing the translational capability of new artificial intelligence (AI) tools, we aimed to map the pathways of training/validation/testing in the development process and of external validation for AI tools evaluated in dedicated randomised controlled trials (AI-RCTs).
Keywords: artificial intelligence; clinical; data science; decision support systems; machine learning; medical informatics
Year: 2021 PMID: 34969668 PMCID: PMC8718483 DOI: 10.1136/bmjhci-2021-100466
Source DB: PubMed Journal: BMJ Health Care Inform ISSN: 2632-1009
Characteristics of peer-reviewed protocols and completed RCTs evaluating artificial intelligence tools
| Characteristics | AI-RCTs | Protocols of AI-RCTs | Completed AI-RCTs |
| No of centres, n (%) | |||
| 11 (48) | 1 (17) | 10 (59) | |
| 12 (52) | 5 (83) | 7 (41) | |
| Geographic area, n (%) | |||
| 8 (35) | 2 (33) | 6 (35) | |
| 5 (22) | 1 (17) | 4 (24) | |
| 9 (39) | 3 (50) | 6 (35) | |
| 1 (4) | 0 (0) | 1 (6) | |
| Arms of randomisation, n (%) | |||
| 20 (87) | 5 (83) | 15 (88) | |
| 3 (13) | 1 (17) | 2 (12) | |
| Level of randomisation, n (%) | |||
| 22 (96) | 6 (100) | 16 (94) | |
| 1 (4) | 0 (0) | 1 (6) | |
| Sample size | |||
| 214 (108–571) | 298 (219–830) | 214 (100–437) | |
| 20 | 100 | 20 | |
| 22 641 | 18 000 | 22 641 | |
| Power calculations, n (%) | |||
| 18 (78) | 6 (100) | 12 (71) | |
| 5 (22) | 0 (0) | 5 (29) | |
| Type of control intervention, n (%) | |||
| 22 (96) | 6 (100) | 16 (94) | |
| 1 (4) | 0 (0) | 1 (6) | |
| Funding source, n (%) | |||
| 4 (17) | 1 (17) | 3 (18) | |
| 10 (43) | 4 (66) | 6 (35) | |
| 7 (30) | 1 (17) | 6 (35) | |
| 2 (9) | 0 (0) | 2 (12) | |
| Data sources, n (%) | |||
| 5 (22) | 2 (33) | 3 (18) | |
| 4 (17) | 2 (33) | 2 (12) | |
| 4 (17) | 2 (33) | 2 (12) | |
| 4 (17) | 0 (0) | 4 (23) | |
| 6 (27) | 0 (0) | 6 (35) | |
| Strategies for missing data, n (%) | |||
| 4 (17) | 4 (67) | 0 (0) | |
| 19 (83) | 2 (33) | 17 (100) | |
| Primary outcome(s), n (%) | |||
| 7 (30) | 0 (0) | 7 (41) | |
| 1 (4) | 0 (0) | 1 (6) | |
| 1 (4) | 1 (17) | 0 (0) | |
| 14 (61) | 5 (83) | 9 (53) | |
| Primary outcome favours AI tool, n (%) | |||
| 13 (57) | 0 (0) | 13 (76) | |
| 2 (9) | 0 (0) | 2 (12) | |
| 8 (34) | 6 (100) | 2 (12) | |
| Different geographic area of study population in development study and AI-RCT, n (%) | |||
| 3 (14) | 1 (17) | 2 (12) | |
| 12 (52) | 1 (17) | 11 (65) | |
| 8 (34) | 4 (66) | 4 (23) | |
| External validation of AI tool, n (%) | |||
| 11 (48) | 2 (33) | 9 (53) | |
| 12 (52) | 4 (67) | 8 (47) | |
| 0 (0) | 0 (0) | 0 (0) | |
| 11 (48) | 2 (33) | 9 (53) | |
*The respective development study was not identified.
†Compared with the development study.
AI-RCTs, artificial intelligence randomised controlled trials; EHR, electronic health records.
Descriptive summary of 18 artificial intelligence tools evaluated in AI-RCTs
| AI-RCT/AI tool | Medical field | Aim | Description | Software/packages used | Primary outcome | Outcome classification | Main finding |
| El-Soll | Pulmonary diseases | Prediction of optimal CPAP titration | A general regression neural network with a three-layer structure (input layer, hidden layer and output layer) was trained to predict optimal CPAP pressure based on five input variables. | Neuroshell 2, Ward Systems, Frederick, MD | Time of achieving optimal continuous positive airway pressure titration | Therapeutic | AI-guided CPAP titration resulted in a shorter time to optimal CPAP and a lower titration failure rate. |
| Martin | Chronic diseases | Early detection of adverse trajectories and reduction of readmissions | Summaries of semistructured phone calls about well-being and health concerns were analysed by machine learning-based and rule-based algorithms. An alarm was triggered when signs of health deterioration were detected. Alarms were reviewed by a clinical case manager who decided on subsequent interventions. | Not specified | Unplanned emergency ambulatory care sensitive admissions | Therapeutic | AI tool allowed early identification of health concerns and resulted in reduction of emergency ambulatory care sensitive admissions. |
| Zeevi | Nutrition/endocrinology | Prediction of postprandial glycaemic response | A machine learning algorithm employing stochastic gradient boosting regression was developed to predict personalised postprandial glycaemic responses to real-life meals. Inputs included blood parameters, dietary habits, anthropometrics, physical activity and gut microbiota. | Code adapted from the sklearn 0.15.2 Gradient Boosting Regressor class | Postprandial glycaemic responses | Therapeutic | AI tool accurately predicted postprandial glycaemic responses. Individualised dietary interventions resulted in lower postprandial glycaemic responses and alterations to gut microbiota. |
| Piette | Behavioural | Improvement of chronic low back pain by personalised cognitive behavioural therapy | A reinforcement learning algorithm is employed to customise cognitive behavioural therapy in patients with chronic low back pain. The algorithm learns from patient feedback and pedometer step counts to provide personalised therapy recommendations. | Not specified | 24-item Roland Morris Disability Questionnaire | Therapeutic | Not applicable (protocol of AI-RCT) |
| Sadasivam | Behavioural | Smoking cessation | A hybrid recommender system employing content-based and collaborative filtering methods was developed to provide personalised messages supporting smoking cessation. Data sources included message metadata together with implicit (ie, website view patterns) and explicit (item ratings) user feedback. Each participant received AI-selected messages from a message database that matched their readiness-to-quit status. | Not specified | Smoking cessation | Therapeutic | After 30 days, there was no difference in smoking cessation rates, although those receiving AI-tailored computer messages rated them as being more influential. |
| Shimabukuro | Infectious diseases | Sepsis prediction | A machine learning-based classifier with gradient tree boosting was developed to generate risk scores predictive of sepsis, severe sepsis or septic shock based on electronic health record data. Depending on the predicted risk, an alarm was triggered. Further evaluation and treatment followed standard guidelines. | Matlab | Average hospital length of stay | Therapeutic | AI-guided monitoring decreased length of hospital stay and in-hospital mortality. |
| Fulmer | Behavioural | Reduction of depression and anxiety | An AI-based chatbot was designed to deliver personalised conversations in the form of integrative mental health support, psychoeducation and reminders. Users could enter both free-text and/or select predefined responses. | Not specified | Self-report tools (PHQ-9, GAD-7, PANAS) for symptoms of depression and anxiety | Therapeutic | AI-based intervention resulted in reduction of symptoms of depression and anxiety. |
| Wang | Gastroenterology | Automatic polyp and adenoma detection | A deep CNN based on the SegNet architecture was trained to automatically identify polyps in real time during colonoscopy. | Not specified | Adenoma detection rate | Diagnostic | Automatic polyp detection system resulted in a significantly increased detection rate of adenomas and polyps. |
| Wu | Gastroenterology | Quality improvement of endoscopy by automatic identification of blind spots | A deep CNN combined with deep reinforcement learning was designed to automatically detect blind spots during EGD. | TensorFlow | Blind spot rate | Feasibility | AI reduced blind spot rate during esophagogastroduodenoscopy |
| Gong | Gastroenterology | Quality improvement of endoscopy by automatic identification of adenomas | A deep CNN combined with deep reinforcement learning was designed to automatically detect adenomas during colonoscopy. | TensorFlow | Adenoma detection rate | Diagnostic | AI increased adenoma detection rate during colonoscopy |
| Oka | Nutrition/endocrinology | Automated nutritional intervention to improve glycaemic control in patients with diabetes mellitus | Participants use a mobile app to select foods from a large database (>100 000) of menus, which are analysed with regards to their energy and nutrition content by an AI-powered photo analysis system. The trial will compare dietary interventions based on AI-supported vs standard nutritional therapy. | Not specified | Change in glycated haemoglobin levels | Therapeutic | Not applicable (protocol of AI-RCT) |
| Lin | Ophthalmology | Diagnosis and risk stratification of childhood cataracts | A collaborative cloud platform encompassing automatic analysis of uploaded slit-lamp photographs of the ocular anterior segment by an AI engine was established. Output includes diagnosis, risk stratification and treatment recommendations. | Not specified | Diagnostic performance for childhood cataract | Diagnostic | AI tool was less accurate than senior consultants in diagnosing childhood cataracts, but was less time-consuming. |
| Wijnberge | Surgery/anaesthesia | Prediction of intraoperative hypotension | A machine learning algorithm to predict hypotensive episodes from arterial pressure waveforms was designed. The model output was implemented as an early warning system based on the estimated ‘hypotension prediction index’ (0–100, with higher numbers reflecting higher likelihood of incipient hypotension) and included information about the underlying cause for the predicted hypotension (vasoplegia, hypovolaemia, low contractility). | Matlab | Time-weighted average of hypotension during surgery/frequency and absolute and relative duration of intraoperative hypotension | Therapeutic | The AI-based early warning system performed differently under different clinical settings (ie, elective non-cardiac surgery, primary total hip arthroplasty, moderate to high risk non-cardiac surgery patients). |
| Auloge | Orthopaedics | Facilitation of percutaneous vertebroplasty by augmented reality/artificial intelligence-based navigation | A navigation system integrating four video cameras within the flat-panel detector of a standard C-arm fluoroscopy machine was developed, including AI software that automatically recognised osseous landmarks, identified each vertebral level and displayed 2D/3D planning images on the user interface. After manual selection of the target vertebra, the software suggested an optimal trans-pedicular approach. Once the trajectory was validated, the C-arm automatically rotated and the virtual trajectory was superimposed over the real-world camera input with overlaid, motion-compensated needle trajectories. | Not specified | Technical feasibility of trocar placement using augmented reality/artificial intelligence guidance | Feasibility | AI-guided percutaneous vertebroplasty was feasible and resulted in lower radiation exposure compared with standard fluoroscopic guidance. |
| Wong | Infectious diseases | Early detection of COVID-19 in quarantine subjects | Data from a wearable biosensor worn on the upper arm are automatically transferred in real time through a smartphone app to a cloud storage platform and subsequently analysed by the AI software. The results (including risk prediction of critical events) are displayed on a web-based dashboard for clinical review. | Not specified | Time to diagnosis of coronavirus disease 19 | Diagnostic | Not applicable (protocol of AI-RCT) |
| Aguilera | Behavioural | Increase physical activity in patients with diabetes and depression by tailored messages via AI mobile health application | Participants receive daily messages from a messaging bank, with message category, timing and frequency being selected by a reinforcement learning algorithm. The algorithm employs Thompson Sampling to continuously learn from contextual features like previous physical activity, demographic and clinical characteristics. | Not specified | Improvement in physical activity defined by daily step counts | Therapeutic | Not applicable (protocol of AI-RCT) |
| Hill | Cardiology | Atrial fibrillation detection | An atrial fibrillation risk prediction algorithm was developed using machine learning techniques on retrospective data from nearly 3 000 000 adult patients without history of atrial fibrillation. The output is provided as a risk score for the likelihood of atrial fibrillation. | R | Prevalence of diagnosed atrial fibrillation | Diagnostic | Not applicable (protocol of AI-RCT) |
| Yao | Cardiology | ECG AI-guided screening for low left ventricular ejection fraction | A CNN model was trained to predict low LVEF from 10 s 12-lead ECG strips from nearly 98 000 patients with paired ECG-TTE data. The final model consisted of 6 convolutional layers, each followed by a nonlinear ‘ReLU’ activation function, a batch-normalisation layer and a max-pooling layer. The binary output was incorporated into the electronic health record and triggered a recommendation for TTE in case of a positive screening result (predicted LVEF ≤35%). | Keras, TensorFlow, Python | Newly discovered left ventricular ejection fraction <50% | Diagnostic | An AI algorithm applied to existing ECGs enabled the early diagnosis of low left ventricular ejection fraction in patients managed in primary care practices. |
AI-RCT, artificial intelligence randomised controlled trial; CNN, convolutional neural network; CPAP, continuous positive airway pressure; GAD-7, General Anxiety Disorder-7; LVEF, left ventricular ejection fraction; na, not available; PANAS, Positive and Negative Affect Schedule; PHQ-9, Patient Health Questionnaire-9; TTE, transthoracic echocardiography.
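Several tools in the table select messages or recommendations with reinforcement learning; the Aguilera protocol names Thompson Sampling explicitly. The following is a minimal sketch of the basic, non-contextual Beta-Bernoulli form of that technique, not the trial's implementation. The three arms, their response rates and the function names `thompson_select` and `simulate` are hypothetical, and the contextual features described in the protocol (previous physical activity, demographic and clinical characteristics) are deliberately left out.

```python
import random


def thompson_select(successes, failures, rng):
    """Draw one sample per arm from Beta(successes + 1, failures + 1)
    and return the index of the arm with the largest draw."""
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)


def simulate(true_rates, n_rounds, seed=0):
    """Run Thompson Sampling against simulated Bernoulli rewards.

    true_rates are hypothetical per-arm response probabilities, e.g. the
    chance that a message category is followed by a day meeting a step goal.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(n_rounds):
        arm = thompson_select(successes, failures, rng)
        pulls[arm] += 1
        # Simulated binary feedback for the chosen arm.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls, successes, failures


if __name__ == "__main__":
    # Hypothetical response rates for three message categories.
    pulls, successes, failures = simulate([0.1, 0.2, 0.8], n_rounds=2000)
    print("times each category was sent:", pulls)
```

Because each arm is chosen in proportion to the posterior probability that it is the best one, the sketch keeps exploring weaker message categories early on and concentrates on the most effective one as feedback accumulates.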
Figure 1 Patterns of pathways of development (training, validation and/or testing), external validation and clinical evaluation of artificial intelligence tools in ongoing and completed clinical trials (n=23). At the network level, each circle corresponds to an individual study (green, blue and red for development, external validation and AI-RCTs, respectively). The number below each network represents the number of unique AI tools identified with the respective pattern (network) of studies. For example, the first network of the top row corresponds to a unique AI tool for which a development study (green circle), four external validation studies (blue circles), and two AI-RCTs (red circles) were found. AI-RCTs, artificial intelligence randomised controlled trials.
Figure 2 Timelines of publications and sample sizes of development (training, validation and/or testing), external validation studies and completed AI-RCTs (n=17). Each circle corresponds to a unique study (development (training, validation, testing) studies in green, external validation studies in blue, and AI-RCTs in red). Because of the wide range of study sample sizes, the values are displayed on a logarithmic (log10) scale. AI-RCTs, artificial intelligence randomised controlled trials.