Zhenxing Xu, Chang Su, Yunyu Xiao, Fei Wang.
Abstract
Coronavirus disease 2019 (COVID-19) has become a global pandemic, with over 180 million confirmed cases and nearly 4 million deaths as of June 2021, according to the World Health Organization. Since the initial report in December 2019, COVID-19 has demonstrated a high transmission rate (with an R0 > 2), a diverse set of clinical characteristics (e.g., high rates of hospital and intensive care unit admission, and multi-organ dysfunction in critically ill patients due to hyperinflammation, thrombosis, etc.), and a tremendous burden on health care systems around the world. To understand this serious and complex disease and to develop effective control, treatment, and prevention strategies, researchers from different disciplines have made significant efforts from different perspectives, including epidemiology and public health, biology and genomic medicine, and clinical care and patient management. In recent years, artificial intelligence (AI) has been introduced into healthcare to aid clinical decision-making for disease diagnosis and treatment, such as detecting cancer from medical images, and has achieved superior performance in multiple data-rich application scenarios. During the COVID-19 pandemic, AI techniques have also been used as a powerful tool for tackling this complex disease. In this context, the goal of this study is to review existing studies on applications of AI techniques in combating the COVID-19 pandemic. Specifically, these efforts are grouped into the fields of epidemiology, therapeutics, clinical research, and social and behavioral studies, and are summarized. Potential challenges, directions, and open questions are discussed accordingly, which may provide new insights into addressing the COVID-19 pandemic and help researchers explore related topics in the post-pandemic era.
Keywords: Artificial intelligence; COVID-19 pandemic; Electronic health record; Machine learning
Year: 2021 PMID: 34697578 PMCID: PMC8529224 DOI: 10.1016/j.imed.2021.09.001
Source DB: PubMed Journal: Intell Med ISSN: 2667-1026
Figure 1. The overall framework of this review. We review four aspects (i.e., epidemiology, therapeutics, clinical research, and social and behavioral studies) of the applications of AI to the COVID-19 pandemic. The challenges of each aspect are also provided. Finally, general challenges, directions, and open questions are discussed regarding model interpretation, model security, model bias, privacy issues, and model precision.
Figure 2. A general framework of ML (machine learning) and DL (deep learning) based drug repurposing. FNN: feedforward neural network; CNN: convolutional neural network; RNN: recurrent neural network.
The summary of studies in terms of the applications of AI in drug repurposing
| Reference | Method | Data source & size | Number of identified drug candidates | Identified drug candidates |
|---|---|---|---|---|
| Zhou et al. (March 2020) | Network-based method | DrugBank database (v4.3), Therapeutic Target Database (TTD), PharmGKB database, ChEMBL (Sv20), BindingDB, and IUPHAR/BPS Guide to PHARMACOLOGY. | 16 drug candidates | Candidates: |
| Zeng et al. (July 2020) | Knowledge-graph | 24 million Pubmed research articles. A built knowledge graph contains 15 million edges, 39 types of relationships among nodes including drugs, diseases, proteins/genes, pathways, and | 41 | Tetrandrine, Nadide, Estradiol, and so on (see |
| Gysi et al. (May 2021) | Network-based method including network proximity, network diffusion, and AI-Net | 21 public databases for compiling protein-protein interactions (PPI) data including 18,505 proteins and 327,924 interactions between them; | 4 | Auranofin, Azelastine, Digoxin, and Vinblastine. |
| Wang et al. (May 2021) | Knowledge-graph | 25,534 peer-reviewed scientific articles. | 41 | Connecting 41 drugs based on Benazepril, Losartan, and Amodiaquine. |
| Zhang et al. | Knowledge-graph | PubMed, LitCovid, COVID-19. | 5 | Paclitaxel, SB 203580, Alpha 2-antiplasmin, Metoclopramide, and Oxymatrine. |
| Gordon et al. (April 2020) | Network-based method | Public sources such as An interactive protein–protein interaction map | 69 | Silmitasertib, |
| Beck et al. (March 2020) | Knowledge-graph | Drug Target Common (DTC) database and BindingDB database. | 5 | Atazanavir, Remdesivir, Efavirenz, Ritonavir, and Dolutegravir. |
| Mall et al. (July 2020) | Knowledge-graph | MOSES, ChEMBL, UniProt, PubChem and NCBI. | 19 | Remdesivir, Lopinavir, Ritonavir, and Hydroxychloroquine (see |
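
Several of the network-based studies in the table above rank drugs by how close their targets lie to disease-associated proteins in a protein-protein interaction (PPI) network. The following is a minimal sketch of one such distance, the "closest" measure (for each drug target, the shortest-path length to the nearest disease protein, averaged over targets). The toy graph, the protein names `A` to `F`, and the function names are hypothetical and chosen only for illustration; the published methods differ in detail and typically also compare the score against randomized baselines.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closest_proximity(graph, drug_targets, disease_proteins):
    """Average, over drug targets, of the distance to the nearest disease protein.

    Assumes every drug target can reach at least one disease protein.
    """
    total = 0
    for target in drug_targets:
        dist = shortest_path_lengths(graph, target)
        total += min(dist[p] for p in disease_proteins if p in dist)
    return total / len(drug_targets)


# Hypothetical undirected PPI network (toy data, not from any cited database).
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("B", "F")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

disease_proteins = {"C", "D"}
score_x = closest_proximity(graph, {"B", "D"}, disease_proteins)  # targets near the disease module
score_y = closest_proximity(graph, {"A"}, disease_proteins)       # target farther away
print(score_x, score_y)
```

A smaller score means the drug's targets sit closer to the disease module, which is the intuition these methods use to prioritize candidates.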
The summary of studies in terms of the applications of AI in clinical research
| Reference | Task | Data source & size | Method | Results |
|---|---|---|---|---|
| Su et al. (March 2021) | Explore albumin levels in patients with COVID-19 and patients with sepsis. | 308 patients with COVID-19 and 363 patients with sepsis | Chow's test, linear mixed-effects models, Fisher's exact test, | Two phases of alterations in albumin levels were found in patients with COVID-19, which were not present in patients with sepsis. |
| Liang et al. (May 2021) | Estimate the risk of developing critical illness for patients with COVID-19 | 72 potential predictors were considered from 1,590 patients with COVID-19 in the 575 hospitals of 31 provincial administrative regions in China as of January 31, 2020. | Least Absolute Shrinkage and Selection Operator (LASSO) and Logistic Regression (LR) models | AUC=0.88 (95% CI, 0.84–0.93) on a validation cohort with 710 patients. |
| Burn et al. (October 2020) | Explore the characteristics of patients with COVID-19 and influenza | 34,128 adult patients with COVID-19 and 84,585 patients with influenza | Data-driven approach | Compared with patients with influenza, patients with COVID-19 were more often male and younger, and had fewer comorbidities and lower medication use. |
| Roth et al. (May 2021) | Investigate the characteristics of patients with COVID-19 in terms of in-hospital mortality in the United States | 20,736 adults with a diagnosis of COVID-19 in the US between March and November 2020. | A multiple mixed-effects logistic regression | The mortality rates for patients with COVID-19 were different between the months of March and April and later months in 2020, which were not fully explained by changes in age, sex, comorbidities, and disease severity. |
| Williams et al. (May 2021) | Predict hospitalization, intensive services, and death for patients with COVID-19 | The model development cohort included more than 2 million patients diagnosed with influenza or flu-like symptoms any time prior to 2020. | Data-driven approach | The ranges of AUC on validation for the three outcomes (hospitalization, intensive services, and death) were 0.73–0.81, 0.73–0.91, and 0.82–0.90, respectively. |
| Liang et al. (July 2020) | Predict the risk of COVID-19 patients developing critical illness | 74 baseline clinical features at admission from 1,590 patients with COVID-19 in the 575 hospitals of 31 provincial administrative regions in China as of January 31, 2020. | Feedforward neural network. | The proposed model was validated on three separate cohorts including 1,393 patients and showed the concordance index of 0.890, 0.852, and 0.967, respectively. |
| Yang et al. (December 2020) | Investigate population drifting in terms of COVID-19 patients | 21 routine blood tests from 5,785 patients in ED of New York Presbyterian Hospital/Weill Cornell Medical Center (NYPH/WCMC) between March 11 and June 30, 2020. | Density-based spatial clustering of applications with noise (DBSCAN) and the Uniform manifold approximation and projection (UMAP), | The number of SARS-CoV-2 patients with the COVID-19 HRP decreased steadily from March to June 2020. |
| Zhang et al. (June 2020) | Diagnose COVID-19 | 532,506 human lung CT scan images from 3,777 patients, China Consortium of Chest CT Image Investigation (CC-CCII) | CNN | Internal validation: Accuracy=92.49%; |
| Wang et al. | Diagnose COVID-19 | Lung CT images: 5,372 patients from seven cities or provinces in China. | A fully automatic DL model (DenseNet121-FPN) | AUC 0.87 and 0.88 on two validation sets in distinguishing COVID-19 from other pneumonia and AUC 0.86 in distinguishing COVID-19 from viral pneumonia. |
| Ozturk et al. (June 2020) | Diagnose COVID-19 | X-ray images: 127 COVID-19 cases, 500 no-finding, 500 pneumonia. | CNN | An accuracy of 98.08% for classifying COVID-19 and No-findings and 87.02% for classifying COVID-19, No-findings, and Pneumonia. |
| Chen et al. | Predict the severity of COVID-19 | 52 features from 362 patients with COVID-19 including 214 non-severe and 148 severe cases in China. | RF | 95% accuracy when considering all features and 99% accuracy when only using top 10 important features selected by Gini impurity. |
| Xu et al. | Diagnose COVID-19 | 618 CT images in total. | CNN | Accuracy = 86.7% |
| Avila et al. (June 2020) | Predict COVID-19 | 510 patients including 73 positives for COVID-19 and 437 negatives were from the emergency department of Hospital Israelita Albert Einstein (HIAE, São Paulo, Brazil). | Gaussian Naïve Bayes (NB) | 100% sensitivity and 22.6% specificity, 76.7% for both sensitivity and specificity, and 0% sensitivity and 100% specificity when prior values were set to 0.9999, 0.2933, 0.001, respectively. |
| An et al. (October 2020) | Predict mortality for patients with COVID-19 | Sociodemographic and medical information from 10,237 patients with COVID-19 in a nationwide | LASSO, SVM and RF | The LASSO model obtained the best AUC, 0.962 (0.945–0.979), and identified several significant predictors such as old age and preexisting DM or cancer. |
| Mei et al. (May 2020) | Diagnose COVID-19 | CT scan images and non-image information such as demographic and laboratory tests from 905 patients between 17 January 2020 and 3 March 2020 from 18 medical centers in 13 provinces in China. | CNN+MLP | AUC=0.92 on a test set with 279 patients. |
| Ardakani et al. (June 2020) | Diagnose COVID-19 | 1,020 CT images from 108 patients in Iran University of Medical Sciences (IUMS) hospital. | CNN (ResNet-101) | AUC = 0.994, |
| Yang et al. (November 2020) | Predict COVID-19 | Demographic information (i.e., age, sex, race) and 27 routine lab tests from 3,356 SARS-CoV-2 RT-PCR tested patients. | Gradient boosting decision tree (GBDT) | AUC = 0.854 (95% CI: 0.829–0.878). |
| Roy et al. (August 2020) | Diagnose COVID-19 | Italian COVID-19 Lung Ultrasound DataBase: 277 lung ultrasound videos from 35 patients, corresponding to 58,924 images. | Spatial Transformer Networks and CNN | Accurate prediction and localization of COVID-19 imaging biomarkers in three tasks including frame-based classification, video-level grading and pathological artifact segmentation. |
| Narin et al. (May 2020) | Diagnose COVID-19 | 341 images from COVID-19 patients, 2,800 normal chest images, 1,493 viral pneumonia and 2,772 bacterial chest X-ray images | CNN | 96.1%, 99.5%, and 99.7% accuracy on three datasets, respectively. |
| Jain et al. (September 2020) | Diagnose COVID-19 | 1,832 X-ray images obtained from the original 1,215 X-ray images by using data augmentation techniques | CNN (ResNet-50) | Training-validation-testing: accuracy, recall, and precision were 99.77%, 97.14%, and 97.14%, respectively. |
| Wang et al. (November 2020) | Diagnose COVID-19 | Two datasets including 1,102 and 625 chest X-ray images, respectively. | CNN and SVM | 99.33%, and 95.02% accuracy on two datasets, respectively. |
| Loey et al. (April 2020) | Detect COVID-19 | 8,100 chest X-ray images obtained from the original 306 chest X-ray images by using data augmentation techniques. | GAN with deep transfer learning | Testing sets: 100% accuracy; |
| Li et al. (September 2020) | Diagnose COVID-19; | Public dataset: 413 patients with COVID-19 and 1,071 patients with influenza | XGBoost model; | Sensitivity = 92.5%; |
| Zhou et al. (April 2020) | Identify subphenotypes | Mexican Government COVID-19 open data including 778,692 COVID-19 patients. | Meta-clustering technique | Identified 3 clusters with different recovery rates. |
| Su et al. (July 2020) | Identify subphenotypes | NYP-WCMC eligible 318 patients extracted from 1,661 patients with COVID-19 and NYP-LMH eligible 84 patients extracted from 458 patients with COVID-19. | Dynamic time warping and hierarchical agglomerative clustering method | Discovered distinct worsening and recovering subphenotypes within three strata including mild, intermediate, and severe strata. |
| V.Bhavani (December 2020) | Identify subphenotypes | 696 hospitalized patients in University of Chicago Medicine | Group-based trajectory modeling (GBTM) | Discovered 4 subphenotypes which were different in experiencing cytokine storm, coagulopathy, and cardiac and renal injury. |
| Lascarrou et al. (March 2021) | Identify subphenotypes | 416 COVID-19 patients with moderate to severe ARDS at 21 intensive care units in Belgium and France. | Hierarchical clustering method | Identified 3 subphenotypes which have different characteristics on comorbidities, mortality, sex, the duration of symptoms, plateau and driving pressure. |
| Legrand et al. (October 2020) | Identify subphenotypes | 608 patients at eight teaching hospitals of the Assistance Publique-Hôpitaux de Paris | Consensus cluster analysis method | Identified 3 subphenotypes which differ in terms of a history of chronic hypertension, the presence of fever, respiratory and non-respiratory symptoms, and age. |
| Schinkel et al. (February 2021) | Identify subphenotypes | 2,019 patients collected from COVID Predict project in the Netherlands. | Consensus cluster analysis method | Identified 3 subphenotypes which differed substantially in terms of demographics, comorbidities, and clinical outcomes. |
| Su et al. (July 2021) | Identify subphenotypes | Development cohort with 8,199 patients and internal and external validation cohorts both with 3,519 patients. Those patients were from five major medical centers in New York City (NYC), between March 1 and June 12, 2020. | Data-driven (agglomerative hierarchical clustering model) | Identified 4 subphenotypes which differed substantially in terms of demographics, clinical variables, comorbidities, clinical outcomes, and medication treatments. |
Figure 3. A general framework of using ML (machine learning) and DL (deep learning) techniques in COVID-19 diagnostic and prognostic prediction.
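
Many of the diagnostic and prognostic models summarized above are evaluated by the area under the ROC curve (AUC). As a minimal sketch of what that number measures, the snippet below computes AUC as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The labels and scores are made-up toy values, and `roc_auc` is an illustrative helper, not code from any of the reviewed studies.

```python
def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = COVID-19 positive, 0 = negative (hypothetical data).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(y_true, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the values between roughly 0.73 and 0.99 reported in the table.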
Figure 4. A general framework of using AI techniques for the subphenotyping of patients with COVID-19. SOM: Self-Organizing Map; HAC: Hierarchical Agglomerative Clustering.
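
Hierarchical agglomerative clustering (HAC), named in the framework above and used by several of the subphenotyping studies, can be sketched in a few lines. This is a simplified illustration assuming average linkage, Euclidean distance, and two hypothetical standardized lab features per patient; the reviewed studies use richer inputs (for example, trajectories compared with dynamic time warping) and their own linkage choices.

```python
import math
from itertools import combinations


def average_linkage(cluster_a, cluster_b, points):
    """Mean Euclidean distance between all cross-cluster pairs of points."""
    total = sum(math.dist(points[i], points[j]) for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerative(points, n_clusters):
    """Start with one cluster per point and repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], points),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b keeps a valid
        del clusters[b]
    return clusters


# Hypothetical patients described by two standardized lab values (toy data).
patients = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (2.0, 2.1), (2.2, 1.9), (1.9, 2.0)]
subphenotypes = agglomerative(patients, n_clusters=2)
print(subphenotypes)
```

Each returned list holds the indices of patients assigned to one candidate subphenotype; in practice the number of clusters is chosen with a validity criterion rather than fixed in advance.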
An example of a binary classification problem based on machine learning in terms of whether a loan would be returned using n + 1 attributes
| | Variable_1 | Variable_2 | … | Variable_n | Q | Label (Y) |
|---|---|---|---|---|---|---|
| Client_1 | F11 | F12 | … | F1n | Q1 | 1 |
| Client_2 | F21 | F22 | … | F2n | Q2 | 0 |
| … | … | … | … | … | … | … |
| Client_m | Fm1 | Fm2 | … | Fmn | Qm | 1 |
Q is a sensitive variable such as the user's race. Labels “0” and “1” represent “Returned” and “Defaulted”, respectively.
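
One simple way to check whether a classifier's predictions depend on the sensitive variable Q is to compare the rate of positive predictions across the groups defined by Q (often called the demographic parity gap). The sketch below assumes the table's encoding (1 = "Defaulted", 0 = "Returned"); the group labels `a` and `b`, the predictions, and the function names are hypothetical, and this is only one of several fairness criteria rather than the measure used by any particular study.

```python
def positive_rate(predictions, groups, group):
    """Share of clients in `group` that the model predicts as 1 ("Defaulted")."""
    selected = [p for p, g in zip(predictions, groups) if g == group]
    return sum(selected) / len(selected)


def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups of Q."""
    rates = {g: positive_rate(predictions, groups, g) for g in set(groups)}
    return max(rates.values()) - min(rates.values()), rates


# Hypothetical values of the sensitive variable Q and model predictions for 8 clients.
q = ["a", "a", "a", "a", "b", "b", "b", "b"]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

gap, rates = demographic_parity_gap(y_pred, q)
print(rates, gap)
```

A gap near zero means both groups are flagged at similar rates; a large gap suggests the model may be biased with respect to Q and warrants closer inspection.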
The summary of studies in terms of the applications of AI in epidemiology
| Reference | Task | Data source & size | Model | Result |
|---|---|---|---|---|
| Parbat et al. (May 2020) | Predict the total number of deaths, recovered cases, cumulative number of confirmed cases, and number of daily cases. | Johns Hopkins GitHub repository ( | Support vector regression model | The proposed model was efficient and achieved higher accuracy (more than 87%) than linear or polynomial regression methods. |
| Zeynep Ceylan (April 2020) | Estimate the prevalence of COVID-19 in Italy, Spain, and France. | The data of COVID-19 collected from the WHO website ( | Auto-Regressive Integrated Moving Average (ARIMA) model | ARIMA (0,2,1), ARIMA (1,2,0), and ARIMA (0,2,1) showed the best prediction performance (more than 82% accuracy) for Italy, Spain, and France, respectively. |
| Benvenuto et al. (February 2020) | Predict the epidemiological trend of the prevalence and incidence of COVID-2019 | The Johns Hopkins epidemiological data | Auto-Regressive Integrated Moving Average (ARIMA) model | ARIMA (1,0,4) and ARIMA (1,0,3) showed the best performance in terms of determining the prevalence and incidence of COVID-2019, respectively. |
| Rodriguez et al. (September 2020) | Real-time COVID-19 forecasting, including incident and cumulative weekly deaths and incident daily hospitalizations. | Johns Hopkins University (JHU) | DeepCOVID, a deep learning framework comprising a data module, a prediction module, and an explainability module | The proposed model has been used in the CDC COVID-19 Forecast Hub since April 2020. |
| Singh et al. (September 2020) | Predict the spread of COVID-19 | Data collected from Kaggle website ( | Random Forest and Kalman Filter | The proposed model performed well for short-term estimation but less well for long-term forecasting. |
| Zheng et al. (July 2020) | Predict the development and spread of COVID-19 | Data collected from the national and provincial health commissions, and dxy.com website (Real-time data API for COVID-19 epidemic) ( | Hybrid AI model based on an improved susceptible-infected (ISI) model and an RNN model | The proposed model achieved lower mean absolute percentage errors in Wuhan (0.52%), Beijing (0.38%), Shanghai (0.38%), and countrywide (0.86%) for the next 6 days. |
| Huang et al. (May 2021) | Forecast the trend of the COVID-19 pandemic under the influence of reopening policies. | Hospitalization and cumulative mortality of COVID-19. | Risk-stratified SIR-HCD | The proposed model obtained lower mean squared error (MSE) and higher prediction accuracy compared to other models, and supports counterfactual analysis. |
| Liu et al. (May 2021) | Investigate the influence of non-pharmaceutical public health interventions (NPIs) on the reproduction number of COVID-19 epidemics in the United States | COVID Tracking Project ( | A generalized linear model (GLM) | Different NPIs were associated with different levels of the reproduction number. |
| Tian et al. (July 2020) | Compare the effect of mild interventions in Shenzhen and counties in the United States | Daily cumulative confirmed cases of COVID-19 in Shenzhen, China and the counties in the United States ( | A synthetic control method with a modified selection of control variables and the proposed SIHR model | Implementing early mild interventions has the potential to subdue the epidemic of COVID-19. |
| Zou et al. (May 2020) | Forecast the spread of COVID-19 | The Johns Hopkins University Center for Systems Science and Engineering; The New York Times data; The data from most states between 03/22/2020 and 05/10/2020. | SuEIR model | The proposed model has been adopted by the CDC for COVID-19 death forecasts. |
| Friedman et al. (May 2021) | Predict mortality of patients with COVID-19 | Public data: | SEIR model, Dynamic Growth, SIKJalpha. | Seven predictive models showed better performance, with a median absolute percent error of 7% to 13% at six weeks. |
| Murray et al. (March 2020) | Predict hospital bed-days, ICU-days, ventilator-days and deaths | Data from local government, national government, and WHO websites were used. | A statistical model based on parametrized Gaussian error function | They forecasted total beds (64,175), ICU beds (17,380), ventilators (19,481), and deaths (81,114) at the peak of COVID-19 in the United States between March and June 2020. |
| Hsiang et al. (September 2020) | Investigate the effect (rate of transmission) of non-pharmaceutical public health interventions on COVID-19 epidemics in China, South Korea, Italy, Iran, France and the United States | COVID-19 data collected from government reports, policy briefings and news articles ( | Reduced-form econometric model | The proposed model showed that the interventions reduced the rate of transmission and prevented or delayed on the order of 61 million confirmed cases across 6 countries. |
| Li et al. (January 2021) | Predict the epidemic trends in terms of future confirmed cases within 7 days | Coronavirus Update (Live): ( | A transfer learning method called ALeRT-COVID using attention-based RNN architecture | ALeRT-COVID achieved higher prediction accuracy for future confirmed cases. |
| Wang et al. (May 2021) | Investigate the impact of the temperature and relative humidity on effective reproductive number in COVID-19 epidemics | Records of 69,498 patients from Chinese National Notifiable Disease Reporting System and 740,843 confirmed cases from COVID-19 database of JHU CSSE ( | Fama-MacBeth regression | High temperature and humidity can help reduce the transmission of COVID-19. |
| Rockett et al. (July 2020) | Reveal COVID-19 transmission in Australia | Data collected from infected patients during the first 10 weeks of COVID-19 containment in Australia, as reported by the New South Wales (NSW) Ministry of Health | Agent-based model (ABM) | The predictions from the ABM were concordant with the local transmission rates. |
| Alzu'bi et al. (December 2020) | Investigate the effect of non-pharmaceutical public health interventions on COVID-19 epidemics | Coronavirus data collected from two urban neighborhoods separated by crossings. | Agent-based model extending the SIR model | Policies including staying at home, hospital isolation, and preventing travel between cities contributed to reducing prevalence and deaths. |
| Brauer et al. (May 2021) | Estimate global access to handwashing with soap and water | Observational surveys in the context of the Global Burden of Diseases, Injuries, and Risk Factors Study in terms of access to a handwashing station with available soap and water for 1,062 locations from 1990 to 2019. | Spatiotemporal Gaussian process regression modeling | Handwashing access should be considered when building forecasting models of COVID-19 for low-income countries. |
| Jr et al. (October 2020) | Investigate the effect of social distancing mandates and levels of mask use | COVID-19 case and mortality data from 1 February 2020 to 21 September 2020 in the United States | SEIR model | Maintaining universal mask use was sufficient to mitigate the worst effects of epidemic resurgences in multiple states in the United States. Maintaining social distancing helped reduce the number of deaths among patients with COVID-19. |
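
Several forecasting studies in the table above build on compartmental models such as SIR and SEIR. The following is a minimal sketch of the basic SIR dynamics, assuming forward-Euler integration and illustrative parameter values (`beta = 0.5`, `gamma = 0.2`, so R0 = beta / gamma = 2.5). The function name, population size, and parameters are assumptions made for illustration; the reviewed models (e.g., SuEIR, SIR-HCD, SIHR) add further compartments and fit their parameters to data.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Forward-Euler simulation of the SIR model; returns one (S, I, R) tuple per day."""
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt  # S -> I
            new_recoveries = gamma * i * dt         # I -> R
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Illustrative values only: 10,000 people, 10 initially infected.
history = simulate_sir(beta=0.5, gamma=0.2, s0=9990, i0=10, r0=0, days=160)
peak_day = max(range(len(history)), key=lambda d: history[d][1])
print("peak day:", peak_day, "infected at peak:", round(history[peak_day][1]))
```

Lowering `beta` in this sketch is the simplest way to mimic an intervention such as mask use or social distancing: it lowers and delays the infection peak, which is the qualitative effect the intervention studies above quantify.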