
Machine Learning for COVID-19 Diagnosis and Prognostication: Lessons for Amplifying the Signal While Reducing the Noise.

Derek Driggs, Ian Selby, Michael Roberts, Effrossyni Gkrania-Klotsas, James H F Rudd, Guang Yang, Judith Babar, Evis Sala, Carola-Bibiane Schönlieb.


Year:  2021        PMID: 34240059      PMCID: PMC7995449          DOI: 10.1148/ryai.2021210011

Source DB:  PubMed          Journal:  Radiol Artif Intell        ISSN: 2638-6100



Introduction

Since the emergence of COVID-19, researchers in machine learning and radiology have rushed to develop algorithms that could assist with diagnosis, triage, and management of the disease (1). As a result, thousands of diagnostic and prognostic models using chest radiographs and CT have been developed. However, with no standardized approach to development or evaluation, it is difficult, even for experts, to determine which models may be of most clinical benefit. Here, we share our main concerns and present some possible solutions.

Systematic Errors in the Literature

In April 2020, during the first wave of the novel coronavirus outbreak in Europe and the United States, Gog published an editorial outlining how researchers could use their skills to help (2). Her article was a call for researchers to proceed cautiously, stating that the priority should be to “amplify the signal” while avoiding “adding to the noise” in the literature. In the months since this appeal to caution, have we, as a research community, followed her guidance? Our AIX-COVNET collaboration is a multidisciplinary team of radiologists and other clinicians working alongside image-processing and machine learning specialists to develop artificial intelligence tools to support frontline practitioners in the COVID-19 pandemic (3). We set out to quantify common problems in the enormous number of articles that developed machine learning models for COVID-19 diagnosis and prognostication using thoracic imaging. We systematically reviewed every such study published between January 1 and October 3, 2020, and found two predominant sources of error (4): first, an apparent deterioration in research standards, and second, a lack of collaboration between the machine learning and medical communities, leading to inappropriate and redundant efforts.

To create models quickly, researchers have frequently relaxed the standards for developing safe, reliable, and validated algorithms. This laxity is most obvious in the datasets used to train these models: they contain too few examples from patients with COVID-19, their quality is unreliable, and their origins are poorly understood. Many models have been developed with access to only a few hundred COVID-19 images, whereas comparable models before the pandemic were trained using up to half a million examples (5). Few articles address this small-data issue, or the resulting imbalance of class sizes, making it unlikely that their results will generalize to wider patient populations.
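One common mitigation for the class-size imbalance described above is to weight each class's contribution to the training loss inversely to its frequency. A minimal sketch, with a hypothetical function name and a standard inverse-frequency scheme (not taken from the article):

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class loss weights inversely proportional to class frequency,
    so a few hundred COVID-19-positive examples are not drowned out by a
    much larger negative class."""
    counts = Counter(labels)
    n_classes = len(counts)
    total = len(labels)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}

# With 10 positive and 90 negative studies, each positive example is
# weighted 9 times as heavily as each negative one:
# inverse_frequency_weights(["pos"] * 10 + ["neg"] * 90)
# -> {"pos": 5.0, "neg": 0.555...}
```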
For example, because of the prevalence of data from China, many researchers train on small datasets from China when the model is intended for European populations, and recent research suggests such models are ineffective in practice (6). Differences between the training data and the target population, including patient phenotypes and data acquisition procedures, can all affect a model’s generalizability (6). Training generalizable models from small amounts of labeled data is a common problem in medical imaging, and techniques such as transfer learning, self- or semisupervised learning, and parameter pruning can ameliorate this issue (7,8).

Although data sharing is critical for the research community to thrive, distributing or using public datasets of poor quality and unknown origins can further damage research efforts. Many public datasets are combinations of images assembled from other public datasets and redistributed under a new name (9,10). This repackaging of data has led to researchers unknowingly validating their models on public datasets that contain their training data as a subset, likely producing an optimistic view of model performance. A surprising number of studies also unknowingly use a public dataset of pediatric patients for their non–COVID-19 cases (9). Additionally, many researchers have not acknowledged that some popular public datasets of patients with COVID-19 are composed of images taken from journal articles, with no access to the original DICOM files (11). Whether such “pictures of pictures” provide the same quality of data as original images was discussed before the beginning of this pandemic (12,13) without an established consensus; in this time of crisis, these concerns have been ignored. Given the abundance of research quality standards for developing medical models, it is perhaps surprising that such widespread issues exist in the COVID-19 literature.
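The train–test contamination described above can be screened for before any modeling: hashing the raw bytes of every image in the training and validation sources reveals exact duplicates that have been repackaged under new names. A minimal sketch with illustrative function names (near-duplicates, such as re-encoded or cropped copies, would additionally require perceptual hashing):

```python
import hashlib
from pathlib import Path

def file_digest(path):
    """SHA-256 of the raw file bytes; byte-identical copies of an image
    redistributed under a new name share the same digest."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def find_overlap(train_dir, test_dir):
    """Digests appearing in both the training and test folders, i.e.
    images that would leak from training into validation."""
    train = {file_digest(p) for p in Path(train_dir).rglob("*") if p.is_file()}
    test = {file_digest(p) for p in Path(test_dir).rglob("*") if p.is_file()}
    return train & test
```

Running such a check before publishing a dataset, or before reporting validation results, is cheap insurance against the optimistic performance estimates described above.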
We have determined that disconnects between research standards in the medical and machine learning communities partly explain these issues. For example, the Prediction model Risk Of Bias Assessment Tool (PROBAST) checklist (14) for assessing the risk of bias in medical models requires models to be validated on an external dataset, but in machine learning research, it is common practice to validate a model using an 80:20 training-to-testing split from the same data source. On the other hand, model quality checklists, such as the radiomics quality score (RQS) (15), suggest that to protect against overfitting, a model must train on at least 10 training examples per model parameter. However, deep learning models have been shown to generalize well despite heavy overparameterization (16), so this requirement is often inappropriate for deep learning models. Furthermore, with deep learning models, it is difficult to interpret the extracted features, making it difficult to run standard risk-of-bias assessments from the medical literature (17). These gaps between research standards in medicine and machine learning allow the dissemination of irreproducible research, and they extend far beyond the immediate COVID-19 crisis. Collaboration and communication between these communities to bridge these gaps will be necessary as more machine learning models phase into clinical deployment.
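The two conventions contrasted above are easy to state concretely. A minimal sketch with hypothetical function names: the machine learning community's internal 80:20 split alongside the RQS rule of thumb of at least 10 training examples per parameter. Under PROBAST, a model evaluated on the held-out 20% is still only internally validated, because both halves come from the same source.

```python
import random

def split_80_20(samples, seed=0):
    """Shuffle a single data source and hold out 20% for testing.
    This is internal validation; external validation requires a test
    set collected at a different site."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = int(0.8 * len(items))
    return items[:cut], items[cut:]

def min_training_size(n_parameters, examples_per_parameter=10):
    """RQS-style rule of thumb: at least 10 examples per model parameter,
    a bound that deep networks routinely (and often harmlessly) exceed
    by orders of magnitude in the other direction."""
    return examples_per_parameter * n_parameters
```

The second function illustrates why the rule is ill-suited to deep learning: even a small network with a million parameters would nominally require ten million training images.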

Recommendations for Clinical Model Development during the COVID-19 Pandemic and Beyond

Our collaboration comprises clinicians, machine learning researchers, mathematicians, and radiologists. Given our own experiences and the findings presented in our discussion of the literature, we propose some guiding principles for developing clinical models in the COVID-19 era and beyond.

Many existing studies were performed without any input from clinicians. As a result, models have been built to solve problems that do not necessarily provide significant clinical benefit (4). For example, in the United Kingdom, chest radiographs play a much more significant role in COVID-19 diagnosis than CT scans, yet early models focused mostly on diagnosis from CT (18,19). Adapting to local medical practices is difficult without collaborating with clinicians.

The origins of public datasets are often unknown, making it difficult to determine their quality or their suitability for inclusion in model development. Such datasets are also unlikely to represent a model’s target population, making it less likely that a model’s performance will generalize upon deployment. Training on high-quality data that are representative of the target community, with validation on data sourced externally, provides the best estimate of a model’s performance.

Collecting high-quality data is always a challenge in machine learning, particularly data on a novel virus, but preparation can make data collection easier. Researchers must be familiar with local guidance surrounding the use and sharing of patient data, and pre-emptive protocols for obtaining, anonymizing, and securely storing data, including for anticipated future pandemics, are essential. The current crisis has demonstrated that without these pre-emptive protocols, data collection can be severely delayed. Equally important is developing efficient and potentially semiautomated data preprocessing pipelines to ensure rapid access to high-quality, well-curated datasets.
Making these procedures publicly accessible also ensures that different groups do not need to spend time curating the same data. Obtaining large amounts of labeled data for medical applications is difficult, especially when they relate to a novel virus, and models should be adjusted to accommodate this small-data problem. Although this is an ongoing area of research, several strategies have been shown to boost performance when working with small or sparsely labeled datasets, including semi- and self-supervised learning (7,20), weight transfusion, and limiting the number of trainable parameters (8).

There are gaps between research standards in medicine and machine learning, and more research is required to resolve these inconsistencies. Machine learning researchers should be aware of the RQS (15) and the Checklist for Artificial Intelligence in Medical Imaging (CLAIM) (21), standard checklists for evaluating models that use radiomic features. It is also imperative to evaluate a model’s risk of bias using standards such as PROBAST (14) and to report results following guidelines such as the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) checklist (22). Conversely, medical standards must be updated to support deep learning practices. Indeed, calls for an updated TRIPOD checklist (TRIPOD-ML) (23) and the related reporting guidelines SPIRIT-AI (24) and CONSORT-AI (25) are steps in this direction.
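Of the small-data strategies listed above, self-training (a simple semisupervised approach) is perhaps the easiest to sketch: a model trained on the small labeled set assigns provisional labels to the unlabeled studies it scores confidently, and those are folded back into the training set. A minimal, hypothetical sketch; the threshold and names are illustrative, not from the cited works:

```python
def pseudo_label(predict_proba, unlabeled, threshold=0.9):
    """One round of self-training. `predict_proba` maps an example to the
    model's probability of the positive class; only examples scored with
    high confidence are returned, paired with a provisional label."""
    labeled = []
    for x in unlabeled:
        p = predict_proba(x)
        if p >= threshold:
            labeled.append((x, 1))
        elif p <= 1 - threshold:
            labeled.append((x, 0))
    return labeled  # merge into the training set, retrain, repeat
```

In practice the confidence threshold matters: set too low, the model amplifies its own mistakes; set too high, few unlabeled examples are ever recruited.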

A Multimodal Approach to the Diagnosis and Prognosis of Patients with COVID-19

With these considerations in mind, there remain plenty of opportunities for machine learning models to aid clinicians during the current pandemic and beyond, with much of the knowledge gained applicable to other diseases including future pandemics. Below, we outline several data sources that could be used to develop models that are helpful to clinicians (Figure).

Figure: Our collaboration has identified five promising applications of machine learning in the COVID-19 pandemic. The AIX-COVNET collaboration’s vision for a multistream model incorporates multiple imaging segmentation methods (A, B, and C) with flow cytometry (D) and clinical data. (A) A saliency map on a radiograph from the National COVID-19 Chest Imaging Database (NCCID) (26). (B) Segmented parenchymal disease on a CT scan from the NCCID (26). (C) Segmentation of calcified atherosclerotic disease on an image from the NCCID (26). (D) A projection of a flow cytometry scatterplot of side-scattered light (SSC) versus side-fluorescence light (SFL), giving insight into cell structures (analysis performed on a Sysmex UK [30] flow cytometer).

Chest radiographs are a first-line investigation in many countries, including the United Kingdom. Researchers could examine not only the initial imaging findings and extent of respiratory involvement, but also how radiographic progression in serial studies correlates with patients’ clinical phenotypes. Many studies have developed deep learning models using chest radiographs of patients with COVID-19, but further research is required to determine whether similar models could be clinically viable, especially for prognosis. Another promising area of research that has received some attention is developing segmentation and classification methods to locate lung parenchyma that could be affected by COVID-19 and to classify these regions as a manifestation of COVID-19 or the result of another disease. High-quality datasets for chest radiographs and CT include the British National COVID-19 Chest Imaging Database (NCCID) (26) and the Medical Imaging and Data Resource Center (MIDRC) RICORD dataset curated by the RSNA (27). Given that patients with cardiovascular comorbidities are at higher risk of severe disease and mortality (28), it is natural to consider the cardiovascular information that is also contained in thoracic CT.
Models that incorporate automated calcium scoring, for example, allow the burden of atherosclerotic disease to be incorporated into prognostic models, even in patients with no prior cardiovascular diagnosis. The effects of COVID-19 on the heart have so far received little attention.

Many diseases cause irregularities in the physical and chemical properties of blood cells, affecting distinct cell types differently. COVID-19 might cause a specific and unique set of changes that can be rapidly detected with flow cytometry. This often-untapped wealth of granular and longitudinal data has recently shown promising results when used in models for COVID-19 prognostication (29).

Multiple centers collect data in different formats, consider different features, and store data in potentially many different systems. One significant challenge is to design algorithms robust to these factors. Ideally, a model would use more than one data source, and an especially promising direction for investigation is how to optimally combine clinical and radiomic features (4). Many clinicians would welcome helpful and appropriately validated models into the clinic, and making these projects open source allows interested hospitals to integrate such models into their clinical workflow.
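Automated calcium scoring, mentioned above, typically reproduces the Agatston score: each calcified lesion contributes its area multiplied by a weight determined by its peak attenuation. A simplified sketch that assumes lesions have already been segmented (in practice the score is computed per axial slice from thresholded, connected regions, with a minimum lesion area):

```python
def density_weight(peak_hu):
    """Agatston density factor from a lesion's peak attenuation in
    Hounsfield units; voxels below the 130-HU threshold do not count."""
    if peak_hu < 130:
        return 0
    if peak_hu < 200:
        return 1
    if peak_hu < 300:
        return 2
    if peak_hu < 400:
        return 3
    return 4

def agatston_score(lesions):
    """Total score over pre-segmented calcified lesions, each given as an
    (area_mm2, peak_hu) pair."""
    return sum(area * density_weight(peak_hu) for area, peak_hu in lesions)

# Two lesions, 4 mm^2 peaking at 250 HU and 2 mm^2 peaking at 450 HU:
# agatston_score([(4.0, 250), (2.0, 450)]) -> 4*2 + 2*4 = 16.0
```

A prognostic model could take this single number as one clinical feature alongside imaging and laboratory streams.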

The Impact of the Pandemic on the Future of Artificial Intelligence Development in Radiology

The COVID-19 pandemic presents an opportunity to accelerate cooperation between image scientists, data scientists, radiologists, and other clinicians; our collaboration is but one example. Researchers are close to realizing the potential of machine learning in health care, but there are still many barriers to deployment. To overcome many of these, we do not necessarily need more powerful machine learning models, but a better understanding of how to develop these tools responsibly. Bridging disconnects between machine learning and medical communities is an important step forward, and the current pandemic will forge vital collaborations with potential benefits beyond COVID-19.
References (14 in total; first 10 shown)

1.  Augmenting the National Institutes of Health Chest Radiograph Dataset with Expert Annotations of Possible Pneumonia.

Authors:  George Shih; Carol C Wu; Safwan S Halabi; Marc D Kohli; Luciano M Prevedello; Tessa S Cook; Arjun Sharma; Judith K Amorosa; Veronica Arteaga; Maya Galperin-Aizenberg; Ritu R Gill; Myrna C B Godoy; Stephen Hobbs; Jean Jeudy; Archana Laroia; Palmi N Shah; Dharshan Vummidi; Kavitha Yaddanapudi; Anouk Stein
Journal:  Radiol Artif Intell       Date:  2019-01-30

2.  Checklist for Artificial Intelligence in Medical Imaging (CLAIM): A Guide for Authors and Reviewers.

Authors:  John Mongan; Linda Moy; Charles E Kahn
Journal:  Radiol Artif Intell       Date:  2020-03-25

3.  Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning.

Authors:  Daniel S Kermany; Michael Goldbaum; Wenjia Cai; Carolina C S Valentim; Huiying Liang; Sally L Baxter; Alex McKeown; Ge Yang; Xiaokang Wu; Fangbing Yan; Justin Dong; Made K Prasadha; Jacqueline Pei; Magdalene Y L Ting; Jie Zhu; Christina Li; Sierra Hewett; Jason Dong; Ian Ziyar; Alexander Shi; Runze Zhang; Lianghong Zheng; Rui Hou; William Shi; Xin Fu; Yaou Duan; Viet A N Huu; Cindy Wen; Edward D Zhang; Charlotte L Zhang; Oulan Li; Xiaobo Wang; Michael A Singer; Xiaodong Sun; Jie Xu; Ali Tafreshi; M Anthony Lewis; Huimin Xia; Kang Zhang
Journal:  Cell       Date:  2018-02-22

4.  PROBAST: A Tool to Assess the Risk of Bias and Applicability of Prediction Model Studies.

Authors:  Robert F Wolff; Karel G M Moons; Richard D Riley; Penny F Whiting; Marie Westwood; Gary S Collins; Johannes B Reitsma; Jos Kleijnen; Sue Mallett
Journal:  Ann Intern Med       Date:  2019-01-01

5.  The RSNA International COVID-19 Open Radiology Database (RICORD).

Authors:  Emily B Tsai; Scott Simpson; Matthew P Lungren; Michelle Hershman; Leonid Roshkovan; Errol Colak; Bradley J Erickson; George Shih; Anouk Stein; Jayashree Kalpathy-Cramer; Jody Shen; Mona Hafez; Susan John; Prabhakar Rajiah; Brian P Pogatchnik; John Mongan; Emre Altinmakas; Erik R Ranschaert; Felipe C Kitamura; Laurens Topff; Linda Moy; Jeffrey P Kanne; Carol C Wu
Journal:  Radiology       Date:  2021-01-05

6.  Using imaging to combat a pandemic: rationale for developing the UK National COVID-19 Chest Imaging Database.

Authors:  Joseph Jacob; Daniel Alexander; J Kenneth Baillie; Rosalind Berka; Ottavia Bertolli; James Blackwood; Iain Buchan; Claire Bloomfield; Dominic Cushnan; Annemarie Docherty; Anthony Edey; Alberto Favaro; Fergus Gleeson; Mark Halling-Brown; Samanjit Hare; Emily Jefferson; Annette Johnstone; Myles Kirby; Ruth McStay; Arjun Nair; Peter J M Openshaw; Geoff Parker; Gerry Reilly; Graham Robinson; Giles Roditi; Jonathan C L Rodrigues; Neil Sebire; Malcolm G Semple; Catherine Sudlow; Nick Woznitza; Indra Joshi
Journal:  Eur Respir J       Date:  2020-08-13

7.  How Might AI and Chest Imaging Help Unravel COVID-19's Mysteries?

Authors:  Shinjini Kundu; Hesham Elhalawani; Judy W Gichoya; Charles E Kahn
Journal:  Radiol Artif Intell       Date:  2020-05-06

8.  Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension.

Authors:  Samantha Cruz Rivera; Xiaoxuan Liu; An-Wen Chan; Alastair K Denniston; Melanie J Calvert
Journal:  Nat Med       Date:  2020-09-09

9.  Rapid triage for COVID-19 using routine clinical data for patients attending hospital: development and prospective validation of an artificial intelligence screening test.

Authors:  Andrew A S Soltan; Samaneh Kouchaki; Tingting Zhu; Dani Kiyasseh; Thomas Taylor; Zaamin B Hussain; Tim Peto; Andrew J Brent; David W Eyre; David A Clifton
Journal:  Lancet Digit Health       Date:  2020-12-11

10.  Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension.

Authors:  Xiaoxuan Liu; Samantha Cruz Rivera; David Moher; Melanie J Calvert; Alastair K Denniston
Journal:  Nat Med       Date:  2020-09-09

