Susan Cheng Shelmerdine, Owen J Arthurs, Alastair Denniston, Neil J Sebire.
Abstract
High-quality research is essential in guiding evidence-based care, and should be reported in a way that is reproducible and transparent and, where appropriate, provides sufficient detail for inclusion in future meta-analyses. Reporting guidelines for various study designs have been widely used for clinical (and preclinical) studies, consisting of checklists with a minimum set of points for inclusion. With the recent rise in the volume of research using artificial intelligence (AI), additional factors need to be evaluated that do not neatly conform to traditional reporting guidelines (eg, details relating to technical algorithm development). In this review, reporting guidelines are highlighted to promote awareness of essential content required for studies evaluating AI interventions in healthcare. These include published and in-progress extensions to well-known reporting guidelines such as Standard Protocol Items: Recommendations for Interventional Trials-AI (study protocols), Consolidated Standards of Reporting Trials-AI (randomised controlled trials), Standards for Reporting of Diagnostic Accuracy Studies-AI (diagnostic accuracy studies) and Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis-AI (prediction model studies). Additionally, there are a number of guidelines that consider AI for health interventions more generally (eg, Checklist for Artificial Intelligence in Medical Imaging (CLAIM), minimum information (MI)-CLAIM, MI for Medical AI Reporting) or address a specific element such as the 'learning curve' (Developmental and Exploratory Clinical Investigation of Decision-AI). Economic evaluation of AI health interventions is not currently addressed, and may benefit from extension to an existing guideline.
In the face of a rapid influx of studies of AI health interventions, reporting guidelines help ensure that investigators and those appraising studies consider the well-recognised elements of good study design and reporting while also adequately addressing new challenges posed by AI-specific elements. © Author(s) (or their employer(s)) 2021. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.
Keywords: BMJ health informatics; healthcare sector; medical informatics
Year: 2021 PMID: 34426417 PMCID: PMC8383863 DOI: 10.1136/bmjhci-2021-100385
Source DB: PubMed Journal: BMJ Health Care Inform ISSN: 2632-1009
Summary of reporting guidelines for common study types used in radiological research, and their corresponding guideline extensions where these involve artificial intelligence
| Study design | Reporting guideline | Latest version | AI-related extension | Publication date of AI extension |
| Clinical Trial Protocol | SPIRIT | 2013 | SPIRIT-AI | September 2020 |
| Diagnostic Accuracy Studies | STARD | 2015 | STARD-AI | Expected 2021 |
| | | | CLAIM | March 2020 |
| | | | MINIMAR | June 2020 |
| Prediction models for diagnostic or prognostication purposes | TRIPOD | 2015 | TRIPOD-AI/ML | Expected 2021 |
| | PROBAST | 2019 | PROBAST-ML | Expected 2021 |
| Randomised Controlled Trials (Interventional Study Design) | CONSORT | 2010 | CONSORT-AI | September 2020 |
| Systematic reviews and meta-analyses | PRISMA | 2009 | None planned or announced | |
| Critical appraisal and data extraction of publications relating to prediction models | CHARMS | 2014 | Applicable to machine learning | |
| Evaluation of human factors in early algorithm deployment | Not applicable | | DECIDE-AI | Expected 2021/2022 |
AI, artificial intelligence; CHARMS, Checklist for critical appraisal and data extraction for systematic reviews of prediction modelling studies; CLAIM, Checklist for Artificial Intelligence in Medical Imaging; CONSORT, Consolidated Standards of Reporting Trials; DECIDE-AI, Developmental and Exploratory Clinical Investigation of Decision-support systems driven by Artificial Intelligence; DTA, Diagnostic Trials of Accuracy; MINIMAR, Minimum Information for Medical AI Reporting; ML, machine learning; PRISMA, Preferred Reporting Items for Systematic Review and Meta-analysis; PROBAST, Prediction model Risk Of Bias Assessment Tool; SPIRIT, Standard Protocol Items: Recommendations for Interventional Trials; STARD, Standards for Reporting of Diagnostic Accuracy Studies; TRIPOD, Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis.
Additional items proposed for studies relating to AI intervention clinical protocols within the SPIRIT-AI statement (in addition to the SPIRIT 2013 statement)
| Section | Item no | SPIRIT 2013 item | Amendment | SPIRIT-AI item |
| Title | 1 | Descriptive title identifying the study design, population, interventions and if applicable, trial acronym | Elaboration | Indicate that the intervention involves artificial intelligence/machine learning and specify the type of model. |
| | | | Elaboration | Specify the intended use of the AI intervention. |
| Background and rationale | 6a | Description of research question and justification for undertaking the trial, including summary of relevant studies (published and unpublished) examining benefits and harms for each intervention | Extension | Explain the intended use of the AI intervention in the context of the clinical pathway, including its purpose and its intended users (eg, healthcare professionals, patients, public). |
| | | | Extension | Describe any pre-existing evidence for the AI intervention. |
| Study setting | 9 | Description of study settings (eg, community clinic, academic hospital) and list of countries where data will be collected. Reference to where list of study sites can be obtained | Extension | Describe the onsite and offsite requirements needed to integrate the AI intervention into the trial setting. |
| Eligibility criteria | 10 | Inclusion and exclusion criteria for participants. If applicable, eligibility criteria for study centres and individuals who will perform the interventions (eg, surgeons, psychotherapists) | Elaboration | State the inclusion and exclusion criteria at the level of participants. |
| | | | Extension | State the inclusion and exclusion criteria at the level of the input data. |
| Interventions | 11a | Interventions for each group with sufficient detail to allow replication, including how and when they will be administered | Extension | State which version of the AI algorithm will be used. |
| | | | Extension | Specify the procedure for acquiring and selecting the input data for the AI intervention. |
| | | | Extension | Specify the procedure for assessing and handling poor quality or unavailable input data. |
| | | | Extension | Specify whether there is human-AI interaction in the handling of the input data, and what level of expertise is required for users. |
| | | | Extension | Specify the output of the AI intervention. |
| | | | Extension | Explain the procedure for how the AI intervention’s output will contribute to decision making or other elements of clinical practice. |
| Harms | 22 | Plans for collecting, assessing, reporting and managing solicited and spontaneously reported adverse events and other unintended effects of trial interventions or trial conduct | Extension | Specify any plans to identify and analyse performance errors. If there are no plans for this, justify why not. |
| Access to data | 29 | Statement of who will have access to the final trial dataset and disclosure of contractual agreements that limit such access for investigators | Extension | State whether and how the AI intervention and/or its code can be accessed, including any restrictions to access or reuse. |
Table adapted from Cruz Rivera et al.21–23 Items within the SPIRIT 2013 statement that have not changed for the SPIRIT-AI statement have been omitted.
AI, artificial intelligence; SPIRIT, Standard Protocol Items: Recommendations for Interventional Trials.
Additional criteria to be included for studies relating to AI interventions within the CONSORT-AI statement (in addition to the CONSORT 2010 statement)
| Section | Item no | CONSORT 2010 item | Amendment | CONSORT-AI item |
| Title and abstract | 1a | Identification as a randomised trial in the title | Elaboration | Indicate that the intervention involves artificial intelligence/machine learning in the title and/or abstract and specify the type of model. |
| | 1b | Structured summary of trial design, methods, results and conclusions | Elaboration | State the intended use of the AI intervention within the trial in the title and/or abstract. |
| Background and objectives | 2a | Scientific background and explanation of rationale | Extension | Explain the intended use of the AI intervention in the context of the clinical pathway, including its purpose and its intended users (eg, healthcare professionals, patients, public). |
| Participants | 4a | Eligibility criteria for participants | Elaboration | State the inclusion and exclusion criteria at the level of participants. |
| | | | Extension | State the inclusion and exclusion criteria at the level of the input data. |
| | 4b | Settings and locations where the data were collected | Extension | Describe how the AI intervention was integrated into the trial setting, including any onsite or offsite requirements. |
| Interventions | 5 | The interventions for each group with sufficient details to allow replication, including how and when they were actually administered | Extension | State which version of the AI algorithm was used. |
| | | | Extension | Describe how the input data were acquired and selected for the AI intervention. |
| | | | Extension | Describe how poor quality or unavailable input data were assessed and handled. |
| | | | Extension | Specify whether there was human–AI interaction in the handling of the input data, and what level of expertise was required of users. |
| | | | Extension | Specify the output of the AI intervention. |
| | | | Extension | Explain how the AI intervention’s outputs contributed to decision making or other elements of clinical practice. |
| Harms | 19 | | Extension | Describe results of any analysis of performance errors and how errors were identified, where applicable. If no such analysis was planned or done, justify why not. |
| Funding | 25 | | Extension | State whether and how the AI intervention and/or its code can be accessed, including any restrictions to access or re-use. |
Table adapted from Liu et al.17–19 Items within the CONSORT 2010 statement that have not been changed for the CONSORT-AI statement have been omitted.
AI, artificial intelligence; CONSORT, Consolidated Standards of Reporting Trials.
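The input-data items above (eligibility criteria at the level of the input data, and the handling of poor quality or unavailable input data) can be made concrete with a minimal sketch. Everything in it is an illustrative assumption rather than part of CONSORT-AI: the `InputImage` record, the 512-pixel minimum side length and the reason labels are hypothetical, and a real trial would substitute its own prespecified criteria.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class InputImage:
    """Hypothetical input record; width/height of None models an unavailable image."""
    image_id: str
    width: Optional[int]
    height: Optional[int]


def exclusion_reason(img: InputImage, min_side: int = 512) -> Optional[str]:
    """Return a reason string if the input fails a criterion, else None.

    The 512-pixel threshold is an assumed example criterion, not a guideline value.
    """
    if img.width is None or img.height is None:
        return "unavailable"
    if min(img.width, img.height) < min_side:
        return "below minimum resolution"
    return None


def screen_inputs(images, min_side: int = 512):
    """Split inputs into included IDs and a tally of exclusion reasons."""
    included, reasons = [], Counter()
    for img in images:
        reason = exclusion_reason(img, min_side)
        if reason is None:
            included.append(img.image_id)
        else:
            reasons[reason] += 1
    return included, reasons
```

Keeping a tally of exclusion reasons alongside the included inputs is one way to have the numbers ready when describing how such data were assessed and handled.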
Criteria for the CLAIM checklist for diagnostic accuracy studies using AI
| Section | Item no | STARD 2015 item | Amendment | CLAIM item |
| Title | 1 | Identification as a study of diagnostic accuracy using at least one measure of accuracy (such as sensitivity, specificity, predictive values or AUC). | Elaboration | Identification as a study of AI methodology, specifying the category of technology used (eg, deep learning). |
| Abstract | 2 | Structured summary of study design, methods, results and conclusions. | Same | |
| Background | 3 | Scientific and clinical background, including the intended use and clinical role of the index test. | Elaboration | Scientific and clinical background, including the intended use and clinical role of the AI approach. |
| Objectives | 4 | Study objectives and hypotheses. | Same | |
| Study design | 5 | Whether data collection was planned before the index test and reference standard were performed. | Same | |
| | | | Extension | Study goal, such as model creation, exploratory study, feasibility study, non-inferiority trial. |
| Participants | 6 | Eligibility criteria (inclusion/exclusion). | Extension | State data sources. |
| | 7 | On what basis potentially eligible participants were identified (such as symptoms, results from previous tests, inclusion in registry). | Same | |
| | 8 | Where and when potentially eligible participants were identified (setting, location and dates). | | |
| | 9 | Whether participants formed a consecutive, random or convenience series. | Extension | Data preprocessing steps. |
| | | | Extension | Selection of data subsets, if applicable. |
| | | | Extension | Definitions of data elements, with references to common data elements. |
| | | | Extension | Deidentification methods. |
| Test methods | 10b | Reference standard, in sufficient detail to allow replication. | Elaboration | Definition of ‘ground truth’ (ie, reference standard), in sufficient detail to allow replication. |
| | | | Elaboration | Source of ground truth annotations; qualifications and preparation of annotators. |
| | | | Elaboration | Annotation tools. |
| | 11 | Rationale for choosing the reference standard (if alternatives exist). | Same | |
| | 12b | Definition of and rationale for test positivity cut-offs or result categories of the reference standard, distinguishing prespecified from exploratory. | Elaboration | Measurement of inter-rater and intrarater variability; methods to mitigate variability and/or resolve discrepancies for ground truth. |
| Model | | | New | Detailed description of model, including inputs, outputs, all intermediate layers and connections. |
| | | | New | Software libraries, frameworks, and packages. |
| | | | New | Initialisation of model parameters (eg, randomisation, transfer learning). |
| Training | | | New | Details of training approach, including data augmentation, hyperparameters, number of models trained. |
| | | | New | Method of selecting the final model. |
| | | | New | Ensembling techniques, if applicable. |
| Analysis | 14 | Methods for estimating or comparing measures of diagnostic accuracy. | Elaboration | Metrics of model performance. |
| | 16 | How missing data on the index test and reference standard were handled. | Same | |
| | 17 | Any analyses of variability in diagnostic accuracy, distinguishing prespecified from exploratory. | Elaboration | Statistical measures of significance and uncertainty (eg, CIs). |
| | | | Elaboration | Robustness or sensitivity analysis. |
| | | | Elaboration | Methods for explainability or interpretability (eg, saliency maps) and how they were validated. |
| | | | Elaboration | Validation or testing on external data. |
| | 18 | Intended sample size and how it was determined. | Same | |
| | | | Extension | How data were assigned to partitions; specify proportions. |
| | | | Extension | Level at which partitions are disjoint (eg, image, study, patient, institution). |
| Participants | 19 | Flow of participants, using a diagram. | Same | |
| | 20 | Baseline demographic and clinical characteristics of participants. | Elaboration | Demographic and clinical characteristics of cases in each partition. |
| Test results | 23 | Cross tabulation of the index test results (or their distribution) by the results of the reference standard. | Elaboration | Performance metrics for optimal model(s) on all data partitions. |
| | 24 | Estimates of diagnostic accuracy and their precision (such as 95% CIs). | Same | |
| | 25 | Any adverse events from performing the index test or the reference standard. | Elaboration | Failure analysis of incorrectly classified cases. |
| Limitations | 26 | Study limitations, including sources of potential bias, statistical uncertainty and generalisability. | Same | |
| Implications | 27 | Implications for practice, including the intended use and clinical role of the index test. | Same | |
| Registration | 28 | Registration no and name of registry. | Same | |
| Protocol | 29 | Where the full study protocol can be accessed. | Same | |
| Funding | 30 | Sources of funding and other support; role of funders. | Same | |
This is based on the STARD 2015 guidelines,20 demonstrating which aspects are new, the same or elaborated on. Items not included in the CLAIM checklist (which were previously present in the STARD guideline) have been removed. Table adapted from Bossuyt et al20 and Mongan et al.39
AI, artificial intelligence; CLAIM, Checklist for Artificial Intelligence in Medical Imaging; STARD, Standards for Reporting of Diagnostic Accuracy Studies.
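Two of the checklist items above lend themselves to a short worked illustration: the level at which data partitions are disjoint, and the precision of an accuracy estimate (such as a 95% CI). The following is a minimal sketch under stated assumptions, not a method prescribed by CLAIM or STARD. The function names, the hash-based assignment and the 20% test fraction are hypothetical choices, and the Wilson score interval is only one of several accepted interval methods for a proportion.

```python
import hashlib
import math
from collections import defaultdict


def assign_partition(patient_id: str, test_fraction: float = 0.2) -> str:
    """Deterministically map a patient ID to 'train' or 'test'.

    Hashing the patient ID (an assumed design choice) sends every image
    from the same patient to the same partition.
    """
    digest = hashlib.sha256(patient_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "test" if bucket < test_fraction else "train"


def split_by_patient(records, test_fraction: float = 0.2):
    """Partition (patient_id, image_id) pairs so partitions are disjoint by patient."""
    partitions = defaultdict(list)
    for patient_id, image_id in records:
        name = assign_partition(patient_id, test_fraction)
        partitions[name].append((patient_id, image_id))
    return partitions


def wilson_interval(successes: int, total: int, z: float = 1.96):
    """Wilson score interval for a proportion such as sensitivity or specificity.

    z = 1.96 corresponds to an approximate 95% interval.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half
```

With this scheme the realised partition proportions only approximate `test_fraction`, so the proportions actually obtained, and the level at which the partitions are disjoint (here, patient), would still need to be stated explicitly.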