
Machine learning research towards combating COVID-19: Virus detection, spread prevention, and medical assistance.

Osama Shahid¹, Mohammad Nasajpour², Seyedamin Pouriyeh³, Reza M Parizi⁴, Meng Han⁵, Maria Valero⁶, Fangyu Li⁷, Mohammed Aledhari⁸, Quan Z Sheng⁹.

Abstract

COVID-19 was first discovered in December 2019 and has continued to spread rapidly across countries worldwide, infecting millions of people. The virus is deadly, and people who suffer from prior illnesses or are older than 60 are at a higher risk of mortality. The medical and healthcare industries have rushed to find a cure, and different policies have been amended to mitigate the spread of the virus. While Machine Learning (ML) methods have been widely used in other domains, there is now a high demand for ML-aided diagnosis systems for screening, tracking, and predicting the spread of COVID-19 and for finding a cure against it. In this paper, we present the role ML has played so far in combating the virus, mainly from a screening, forecasting, and vaccine perspective. We present a comprehensive survey of the ML algorithms and models that can aid in battling the virus.

Entities: Chemical

Keywords: Artificial intelligence; COVID-19; Drug development; Healthcare; Machine learning; Predictive analysis


Year: 2021 PMID: 33771732 PMCID: PMC7987503 DOI: 10.1016/j.jbi.2021.103751

Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 8.000

Introduction

In December 2019, a novel, highly contagious disease called COVID-19, which is caused by Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2), was first discovered in Wuhan, China [90]. COVID-19 is an airborne disease that spreads easily from person to person [142]. According to the Centers for Disease Control and Prevention (CDC) [36], infected people show a range of symptoms such as dry cough, shortness of breath, fatigue, loss of the sense of taste and smell, diarrhea, and congestion. Infected patients can also present fever episodes. However, some patients who have contracted the virus might not show any of the aforementioned symptoms [153]. They can feel completely normal while carrying the virus and continue to spread the disease without knowing it [153]. Because COVID-19 spreads rapidly, the World Health Organization (WHO) declared it a global pandemic in March 2020 [220]. As of January 2021, the total number of confirmed COVID-19 cases worldwide was over 85 million [120]. To tackle this outbreak, scientists in different research communities are turning to a wide variety of computer-aided systems such as the Internet of Things [147], Machine Learning (ML) or Deep Learning (DL) techniques [208], [5], Big Data [27], and Blockchain [40], [234], [44] that can assist with overcoming the challenges brought by COVID-19. These technologies can be used for controlling the spread of the virus, detecting the virus, or even designing and manufacturing a vaccine or drug to combat it. Two earlier epidemics were caused by viruses from the coronavirus family: Severe Acute Respiratory Syndrome (SARS-CoV) [57] and Middle East Respiratory Syndrome (MERS) [125]. SARS-CoV is a respiratory virus that was transmissible from person to person, and it was first identified in 2003. According to the WHO website [219], it caused over 8,000 confirmed cases worldwide and affected over 26 countries.
MERS is also a respiratory virus, with symptoms similar to those of SARS-CoV.

ML, as a subset of Artificial Intelligence (AI), has shown great potential in many industries such as retail [101], banking [46], healthcare [43], [51], pharmaceuticals [65], cybersecurity [181], [99], [77], [145] and many more [66]. ML techniques can be programmed to imitate human intelligence. For example, in the healthcare domain, ML techniques can be trained and used for medical diagnosis [129]. ML models have been extensively trained on datasets of medical images such as Computed Tomography (CT) scans, Magnetic Resonance Imaging (MRI), or X-rays to detect anomalies [221], [216]. ML models have been applied in diverse areas including cancer [112], diabetes [246], fatty liver [226], etc. As an example, breast cancer can be diagnosed with a prediction accuracy of 97.13% [14] using ML models.

During previous epidemics, ML techniques were widely implemented to help healthcare professionals and authorities take better action against the diseases [2]. For example, Sandhu et al. [183] proposed an ML model that utilizes GPS technology along with cloud computing power and Google Maps to represent potentially infected patients and provide alternative routes for uninfected users, potentially mitigating the spread. The model reaches a classification accuracy of 80% in re-routing away from infected patients. In another study, Choi et al. [48] used ML models for sentiment analysis to review public overreaction appearing in media articles and on social media platforms. The proposed model could rapidly monitor the public reaction and helped policymakers take the right actions to reduce public fear and distress regarding MERS.

ML has also been widely used to improve clinical decision-making during the current COVID-19 pandemic [61]. In some cases, it enables researchers to forecast the spread of the virus in different areas [89].
ML image classification techniques also facilitate COVID-19 diagnosis for healthcare professionals [158]. Additionally, with the objective of finding a cure for the virus, ML algorithms are utilized for drug discovery/repurposing and even vaccine development [81]. The aforementioned examples show the potential of ML in the detection, diagnosis and prediction of viruses.

This paper provides a survey of the latest research on using ML technology in combating COVID-19. In particular, we investigate the role of ML in detection or screening, forecasting, and medical assistance for the virus.

The remainder of the paper is organized as follows. In Section 2, we review the role of ML in the detection and screening of COVID-19. We study the use of different ML techniques for diagnosis and screening across the major areas of Medical Imaging, Chatbots, and Artificial Intelligence of Things (AIoT). In Section 3, we explore different ML techniques for predicting and tracking the spread of COVID-19. This section is divided into three parts covering spread prevention, contact tracing, and forecasting. Section 4 reviews the need for medical assistance during the pandemic and how ML technology can be applied to drug discovery/repurposing and vaccine development. Finally, we discuss the findings, outline future work, and conclude in Sections 5 and 6.

ML techniques towards detecting and screening COVID-19

Detecting either a symptomatic or asymptomatic disease at an early stage can be highly effective for starting the treatment process. For COVID-19, early detection not only helps to avoid the spread of contamination but is also cost-effective. The standard method of diagnosing COVID-19 is the Reverse-Transcription Polymerase Chain Reaction (RT-PCR) test [53]. RT-PCR is a swab test used to detect nucleic acid from the virus that causes COVID-19 in the upper and lower respiratory system. At the beginning of the pandemic, the limited sensitivity of the RT-PCR test meant that it could return negative results for patients who were later in fact confirmed positive [228]. Initially, there was also concern about a shortage of available tests. These examples highlight the importance of exploring alternative methods of diagnosing COVID-19 that could speed up the process [26]. Among them, ML models and algorithms have shown promising results in different stages of COVID-19 diagnosis [115]. For instance, Yang et al. [229] developed an ML model based on individuals' laboratory test results, which can assist in the early detection of high-risk COVID-19 patients. Similarly, another study demonstrated the effectiveness of ML classification models using routine blood test data for COVID-19 detection [28]. In general, ML techniques have been widely utilized in the healthcare domain and can similarly be used for analyzing patients' data and diagnosing COVID-19. In this section, we review different ML techniques that have been implemented for screening and diagnosis of COVID-19 using medical imaging, which includes X-ray and CT-Scan images. Moreover, we discuss other ML-based tools including Chatbots and Artificial Intelligence of Things (AIoT). At the end of this section, we discuss how ML techniques could be constrained or limited in detecting and screening COVID-19.

Medical imaging

Diagnosing COVID-19 is one of the most important parts of dealing with the disease. Although the RT-PCR test is very commonly used for the detection of COVID-19, its chance of producing false-positive and false-negative results is concerning [140]. There is therefore a need for other approaches, such as medical image analysis, for accurate and reliable screening and diagnosis of COVID-19 [232]. In general, analysis of medical imaging modalities such as chest X-ray and CT-Scan makes key contributions to confirming the diagnosis of COVID-19 as well as monitoring the progression of the disease [109]. Different ML techniques that incorporate X-ray and CT-Scan image processing approaches could help healthcare professionals in diagnosing and understanding the progression of COVID-19.
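The false-positive and false-negative concerns above are usually quantified through sensitivity and specificity. The following is a minimal Python sketch of those definitions; the counts are hypothetical and are not taken from any study cited in this survey.

```python
def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of infected patients correctly flagged
    specificity = tn / (tn + fp)   # share of uninfected patients correctly cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }

# Hypothetical counts for 200 tested patients (illustrative only).
metrics = screening_metrics(tp=80, fp=10, tn=90, fn=20)
print(metrics)
```

A test with many false negatives has low sensitivity, which is the failure mode reported for early RT-PCR testing.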

X-ray

Chest X-ray (CXR) imaging is one of the most commonly used medical imaging modalities for diagnosing thoracic abnormalities [187]. During the COVID-19 pandemic, CXR imaging plays a vital role in the early detection of COVID-19, distinguishing COVID-19 from normal chests, due to its low cost, fast imaging speed, and low radiation [97]. The Italian Society of Medical Radiology (abbreviated as SIRM in Italian) recommended CXR as the first-line medical imaging method for COVID-19 [149], [195]. Fig. 1 shows CXR images from infected and normal people. According to Cozzi et al. [56], CXR has a sensitivity of 67.1%, and it can be applied first in special cases to help physicians and healthcare professionals identify COVID-19 cases and assign treatment to patients quickly. Hassanien et al. [79] used CXR images to classify lung lesions (caused by COVID-19) with a Multi-level Threshold (MT) process and a Support Vector Machine (SVM) model. In this model, the lung image contrast is first enhanced. Second, the image is reduced to specific sections (using MT) to avoid duplicated work on uninfected areas. Lastly, the SVM model classifies the sections of the lung with respect to predefined healthy lungs. Sethy et al. [188] developed a DL Convolutional Neural Network (DCNN) model for classifying CXR images to detect COVID-19. Similarly, the authors of [34] proposed a DCNN model using data gathered from two hospitals in Italy to demonstrate the importance of AI in the detection of COVID-19.
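The first two steps of the contrast-enhancement and multi-level thresholding pipeline described above can be sketched in plain Python. This is only an illustration under stated assumptions: the linear contrast stretch, the two threshold values, and the toy pixel grid are all hypothetical, and the final SVM classification step is omitted. It is not the implementation used in [79].

```python
from bisect import bisect_right

def stretch_contrast(image):
    """Linearly rescale pixel intensities to the full 0-255 range."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: nothing to stretch
        return [[0 for _ in row] for row in image]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in image]

def multilevel_threshold(image, thresholds):
    """Map each pixel to the index of the intensity band it falls into."""
    cuts = sorted(thresholds)
    return [[bisect_right(cuts, p) for p in row] for row in image]

# Toy 3x4 grayscale grid with hypothetical intensities (not real CXR data).
image = [
    [50, 60, 100, 130],
    [55, 90, 125, 150],
    [50, 70, 140, 150],
]
enhanced = stretch_contrast(image)
# Two assumed thresholds split the intensity range into three bands (0, 1, 2).
segments = multilevel_threshold(enhanced, thresholds=[85, 170])
print(segments)
```

In a full pipeline, features computed per band or per region would then be passed to a classifier such as an SVM.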

Fig. 1

Chest X-Ray (CXR) images of COVID-19 infected people versus uninfected people [42].

Zhang et al. [237] trained ML models on a large viral pneumonia dataset of CXR images to detect anomalies. They tested their model on a completely different dataset containing COVID-19 CXR images. This was done because pneumonia can be one of the symptoms of COVID-19 [36]. The model performs well when tested on the COVID-19 dataset, with an Area Under the Curve (AUC) of 83.61%, which is notable given that it was trained on a different dataset. Similarly, Wang et al. [212] utilized the COVIDx dataset, a publicly available dataset consisting of COVID-19, pneumonia and non-COVID-19 pneumonia-related X-ray images. The authors used this data to train a Deep Neural Network (DNN) for the detection of COVID-19, referred to as COVID-Net, which shows promising results in diagnosing infected patients. Apostolopoulos et al. [11] used transfer learning approaches such as feature extraction and fine-tuning of Convolutional Neural Network (CNN) based models, trained and tested on similar datasets, achieving a prediction accuracy of almost 98%. They demonstrated that transfer learning can significantly improve results. Most ML classifiers are trained and tested to achieve high prediction accuracy for COVID-19; however, it is also important to quantify the uncertainty that comes with using such classifiers as a primary means of diagnosis. Ghoshal et al. [72] studied an approach to validating ML-based diagnostic predictions on CXR images, using a Bayesian Deep Learning classifier to estimate the model uncertainty. In their analysis, the authors observe a strong correlation between the model uncertainty and the accuracy of the model prediction. An estimate of the uncertainty could lead to more reliable predictions and could even alert healthcare workers to false predictions. Many other ML models are utilized for detecting COVID-19, and a subset of them is presented in Table 1.
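To illustrate the idea of attaching an uncertainty estimate to a classifier's prediction, the sketch below summarizes several stochastic predictions for one image by their mean, spread, and predictive entropy. It assumes the probabilities come from repeated stochastic forward passes of a network; that sampling scheme and the probability values are assumptions for illustration, not the method or results of [72].

```python
import math
from statistics import mean, pstdev

def predictive_uncertainty(samples):
    """Summarize repeated stochastic predictions of P(COVID-19) for one image."""
    p = mean(samples)          # averaged positive-class probability
    spread = pstdev(samples)   # disagreement between the passes
    # Binary predictive entropy in bits: 0 = fully confident, 1 = maximally unsure.
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)
    return p, spread, entropy

# Hypothetical sampled probabilities for two images.
confident = predictive_uncertainty([0.97, 0.95, 0.98, 0.96])
uncertain = predictive_uncertainty([0.35, 0.70, 0.55, 0.40])
print(confident)
print(uncertain)
```

A screening workflow could, for example, route images with high entropy to a radiologist instead of reporting the automated label directly.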

Table 1

ML Research done towards diagnosing COVID-19 using X-ray (CXR) datasets.

Reference	Dataset	Methods	Remarks
[11]	Multiple Datasets that include 448 confirmed COVID-19 images source: Github	DL - CNN, Feature Extraction (various models)	Various model performance comparison
[72]	68 COVID-19 Chest X-ray images and 5873 Pneumonia images source: Github, Kaggle	Bayesian Deep Learning Classifiers	Estimate uncertainty by a Transfer learning approach with the classifier
[68]	COVIDx	Fine Tuning - ResNet Model	Model achieves high accuracy for multi-class classification
[82]		Multiple models VGG19, MobileNet	Test and train multiple image classifiers to obtain the highest accuracy identifying the virus; VGG19 and DenseNet have a high accuracy score
[85]		Multiple Image Classification Models	Vulnerability of DNNs to a universal adversarial perturbation causes failure in classification tasks; fine-tuning could be a solution
[3]		Capsule Network-based framework - COVID-CAPS	Decent performance of the framework with low trainable parameters
[185]		DenseNet-121	Test the model’s robustness by performing multi-class classification and k-fold validation
[206]		SqueezeNet-Bayesian based model	Classify X-ray images into normal, COVID-19, and pneumonia; use techniques of data augmentation and fine-tuning
[110]		CoroNet	Implement semi-supervised learning based on AutoEncoders
[124]		EfficientNet	Produce a model with high quality and accuracy
[212]		COVID-Net	Publicly accessible for scientists for further improvements
[146]	CXR images of 50 COVID-19 patients, and 50 Normal CXR images	CNN models (InceptionV3, ResNet50, Inception-ResNetV2)	Achieve highest classification by ResNet50
[128]	170 Chest X-ray images of 45 patients from 5 different sources	Modified Pre-trained AlexNet and a simple CNN	Achieve a higher accuracy with pre-trained network
[204]	295 COVID-19 CXR images and 163 Pneumonia and Normal CXR images	MobileNetV2/SqueezeNet	Achieve high classification rate with DL models
[159]	127 COVID-19 CXR images and 1000 Pneumonia and Normal CXR images	DarkCovidNet (CNN)	Implement binary classification and multi-class classification (better performance by Binary model)
[169]	Dataset containing both COVID-19 and Non COVID-19 cases from multiple sources	VGG16, InceptionV3, Xception, DenseNet-121, NasNet-Mobile, etc.	Best performance of VGG16 compared to other models
[209]	306 COVID-19 CXR images and 113 normal CXR images	Decision Tree Classifier in a CNN model	Robustly tested method with highly accurate results
[100]	CXR images of confirmed 150 COVID-19 patients from Wuhan source: Kaggle	Convolutional Neural Network	Achieve 93% accuracy by the proposed model
[188]	Multiple datasets with 183 COVID-19 images of SARS-CoV and MERS	Implement 9 different models for COVID-19	Achieve the highest accuracy for ResNet50 and SVM models
[205]	231 COVID-19, 2100 pneumonia, and normal CXR images source: Github	Novel ANN and Convolutional CapsNet	Highly accurate diagnosis with Binary Classification
[50]	423 COVID-19, 3064 normal, and viral pneumonia CXR images	8 Different CNN Models	Achieve best performance for CheXNet by implementing Transfer Learning and Data augmentation
[160]	192 COVID-19 and 145 normal CXR images	nCOVnet(VGG-16)	Achieve high accuracy in predicting COVID-19 infected patients from CXR Images
[92]	10 CXRs from COVID-19 confirmed patients in China and USA	DL U-Net Model	Demonstrate great promise with potential use towards early diagnosis for COVID-19 pneumonia
[67]	126 COVID-19, 5835 normal and pneumonia CXR images	GSA-DenseNet121-COVID-19 (Hybrid CNN using Optimization)	Achieve a high accuracy up to 98% in diagnosis
[134]	250 COVID-19, 4934 Non COVID-19 CXR images	ResNet18, ResNet50, SqueezeNet and DenseNet-121	Perform well across multiple parameters (Receiver Operating Characteristic, precision-recall curve, etc.)
[58]	Multiple datasets including 162 COVID-19 and Non COVID-19 CXR images	Truncated Inception Net	Achieve an accuracy of 99.92% (AUC 0.99) in classifying COVID-19 positive cases
[21]	305 COVID-19 and 822 Non COVID-19 CXR images	Transfer Learning method employed on pre-trained models	Use Gradient Class Activation Map for detecting where the model focuses more for classification
[166]	180 COVID-19 and Non COVID-19 CXR images	Xception and ResNet50V2	Good performance for COVID-19 detection from the concatenation of two models
[238]	318 COVID-19 and Non COVID-19 CXR images	COVID-DA	Propose a Deep Learning model that has a novel classifier separation scheme
[10]	455 COVID-19 and 3450 Non COVID-19 CXR images	MobileNetV2	Higher accuracy by training a CNN MobileNetV2 model compared to transfer learning techniques


CT-scan

Another applicable medical imaging tool for COVID-19 diagnosis is the chest Computed Tomography (CT) scan, which is more accurate in detecting COVID-19 cases [47]. Because COVID-19 causes respiratory problems, including lung abnormalities, a CT-Scan can serve as a detection procedure in the early stage, when none of the COVID-19 symptoms have yet appeared in patients [192]. Fig. 2 shows CT-Scan images from infected and normal people.

Fig. 2

CT-Scan images of COVID-19 infected people versus uninfected people [42].

Ardakani et al. [13] implemented a computer-aided diagnosis system to show the benefits of DL in diagnosing COVID-19 using a variety of CNNs, and concluded that ResNet-101 was the most precise model. Similarly, Li et al. [118] developed a DL framework known as the COVID-19 detection neural network (COVNet), which can differentiate COVID-19 from typical types of pneumonia using chest CT-Scan images. Shan et al. [189] built an ML model for quantitative infection assessment from CT-Scan images and report that it can estimate the shape, volume, and percentage of infection. A Human-In-The-Loop method [86] was proposed in which healthcare workers intervene in the VB-Net (a modified 3D CNN that combines the V-Net model and a bottleneck structure) by adding newly processed CT-Scan images to the dataset. This way, the training model is constantly updated on new data, allowing the model to produce more efficient results. Tang et al. [203] introduced another method that detects the severity of COVID-19 by assessing quantitative features from CT-Scan images. The method uses a Random Forest (RF) model with 500 decision trees along with threefold cross-validation, working with 63 quantitative features of COVID-19 such as infection volume or lung ratio. However, the model is limited to binary classification, i.e., results can only be severe or non-severe, whereas the grades should be mild, common, severe, and critical. Gozes et al. [75] trained clinical models integrated with ML to detect the virus; the approach achieves high accuracy and is also able to quantify the burden of the disease. As in the previous section, ML techniques that can be used for detecting COVID-19 from CT-Scan images are presented in Table 2. The list of references includes both published works and works yet to be peer-reviewed.
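The threefold cross-validation mentioned for the Random Forest severity model can be sketched as an index split in which every sample is held out exactly once. This is a generic illustration with a hypothetical sample count; the classifier itself is left out, and in practice the indices would usually be shuffled (and often stratified by class) before splitting.

```python
def k_fold_indices(n_samples, k=3):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# Hypothetical dataset of 10 cases split into three folds.
folds = list(k_fold_indices(n_samples=10, k=3))
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```

A model would be trained on each `train_idx` subset and scored on the matching `test_idx` subset, with the three scores averaged.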

Table 2

ML Research done towards diagnosing COVID-19 using CT-Scan datasets.

Reference	Dataset	Methods	Remarks
[239]	Open-sourced COVID-CT source: Github	Multi-task and Self-Supervised learning	Clinically useful
[203]	Clinical CT scan images of 176 COVID-19 cases	Random Forest and Threefold cross-validation	Better performance from Random Forest model in reflecting the severity of COVID-19
[75]	Multiple Datasets of CT scan images (Chinese CDC, China and USA hospitals, and Chainz.cn)	ResNet50	High accuracy in identifying COVID-19 cases
[118]	4356 CT scan images (including COVID-19 and Non COVID-19) from 6 Hospitals	Deep Learning model (COVNet)	High accuracy in distinguishing COVID-19 cases from other lung diseases
[30]	618 Clinical CT scan images (including COVID-19 and Non COVID-19)	Deep Learning (ResNet18)	High accuracy by using a location attention mechanism (detect COVID-19 cases from others)
[214]	Clinical CT Scan Images (COVID-19 and Non COVID-19 - 99 Patients from 3 Hospitals)	Modified Transfer Learning and Inception Model	Use a fine-tuning technique with pre-trained weights
[18]	Clinical CT scan images from 133 Patients in China	Multi Stage, DL Models, and LSTM	Capable of extracting spatial and temporal information efficiently (better prediction performance)
[162]	CT scan images of 413 COVID-19 and 439 of pneumonia or normal cases	ResNet50	Better performance using transfer learning technique
[103]	CT scan images from 5 Hospitals in China	DL models (Inception, ResNet50, 3D U-Net++)	Provide good prediction results whilst overcoming challenges
[189]	549 CT Scan images obtained from clinics in China	Deep Learning and VB-Net	Refine automation of cases by a Human-in the loop section (segmentation and quantifying infected regions)
[102]	Large-scale dataset including 10,250 CT scan images of COVID-19 and Non COVID-19 scans	UNet and 2D Segmentation DL CNN Model	Outperforming radiologists in diagnostic performance
[211]	Clinical CT scan images of 558 COVID-19 patients with pneumonia from 10 hospitals	COPLE-Net and Noise-Robust Dice Loss	Outperform standard Noise-Robust loss functions; good performance in segmenting labels for COVID-19 pneumonia lesion by COPLE-Net and the framework
[197]	Clinical CT scan images of 83 COVID-19 and 83 Non COVID-19 cases	BigBiGAN (bi-directional generative adversarial network)	Achieve high validation accuracy in identifying COVID-19 pneumonia from CT images
[227]	Large-scale dataset that include 400 COVID-19, and more Non COVID-19 scans	Classification, Segmentation and Encoder-Decoder Model - Res2Net	Highly efficient model for Classification and Segmentation
[243]	Multiple datasets that include 473 COVID-19 CT scans	UNet	Propose a method to incorporate spatial and channel attention
[230]	Dataset of CT-Scans from 1,684 COVID-19 patients	Inception V1	Validate the model in 3 ways including 10-fold cross-validation achieving high AUC for the validation dataset
[231]	Clinical CT scan images including 146 COVID-19 and 149 Non COVID-19 cases	DenseNet	Classify COVID-19 over CT Images with high AUC
[158]	219 CT scan images of COVID-19 and 399 CT scan images of normal or other diseases	VGG-16, GoogleNet, ResNet	Use of Support Vector Machine (SVM) for binary classification
[141]	746 CT scan images of COVID-19 and Non COVID-19 cases; Open-Source - Github	Capsule Networks (CapsNets), ResNet	Present a detail oriented capsule network, implement data augmentation techniques to overcome lack of data
[198]	Clinical CT scans of 88 patients exposed to COVID-19 from China	DeepPneumonia (ResNet-50)	Capable of predicting COVID-19 with high accuracy
[242]	1,129 Clinical CT scan images for COVID-19 detection	UNet, 3D deep Convolutional Network (DeCoVNet)	Predict COVID-19 infectious probability accurately without annotating lesions for training


Chatbots

Computer programs developed to communicate with humans in natural language are called chatbots [190]. A chatbot can communicate with different users and generate appropriate responses based on their inputs. The COVID-19 pandemic has led to the building of various chatbots as a communication method in place of hotlines, which reduces hospital visits and increases the efficiency of communication [131]. Generally, chatbots provide an online conversation with the user by text or voice, delivered through web applications, smartphone applications, messaging channels, and so on [31]. Chatbots are usually considered one of the tools best suited to screening patients remotely without in-person interaction [104]. Their advantages include quickly updating information, repeatedly encouraging new behaviors such as washing hands, and assisting with psychological support for the stress caused by isolation and misinformation [135]. ML-based chatbots improve during the training procedure, and using more data makes this approach more reliable. During the COVID-19 pandemic, chatbots have received more attention as a way to provide details about COVID-19 at different stages. A wide variety of chatbots in different languages have been implemented to help patients at the early stage of COVID-19. "Aapka Chkitsak", an AI-based chatbot developed in India [24], assists patients with remote consultation regarding their health information and treatments. This application was developed on Google Cloud Platform and relies mainly on Natural Language Processing (NLP), which makes it compatible with either speech or text. Similarly, Ouerhani et al. [157] developed a chatbot (called "COVID-Chatbot") based on a DL model that uses NLP to enhance people's awareness of the ongoing pandemic. COVID-Chatbot was implemented to decrease the impact of the disease during and after the quarantine phase.
Another example is Bebot [23], which provides updated data regarding the pandemic and also assists patients with symptom checking. Other chatbots implemented during this time include Orbita [156], Hyro [93], Apple's screening tools (website, application, voice command or Siri) [12], CDC's self-checker [36], and Symptoma [201]. A brief description of these chatbots can be found in Table 3.
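As a rough illustration of the symptom-checking function these chatbots offer, the sketch below matches keywords in a user message against a small symptom list. It is a deliberately simple keyword matcher, not the NLP or DL models used by the chatbots cited above; the keyword table, function name, and reply wording are all hypothetical.

```python
# Hypothetical keyword table; the symptom labels follow the CDC list cited earlier.
SYMPTOM_KEYWORDS = {
    "fever": "fever",
    "cough": "dry cough",
    "breath": "shortness of breath",
    "taste": "loss of taste or smell",
    "smell": "loss of taste or smell",
}

def screen_message(message):
    """Return the symptoms mentioned in a message and a canned reply."""
    text = message.lower()
    found = sorted({label for key, label in SYMPTOM_KEYWORDS.items() if key in text})
    if found:
        reply = ("You mentioned: " + ", ".join(found)
                 + ". Please consider contacting a healthcare provider.")
    else:
        reply = "No listed symptoms detected. Ask me about COVID-19 guidance."
    return found, reply

found, reply = screen_message("I have a fever and I can't smell anything")
print(reply)
```

Substring matching of this kind cannot handle negation or complex phrasing, which is one reason production chatbots rely on trained language models.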

Table 3

AI chatbots/virtual assistants combating COVID-19.

Reference	AI chatbot/virtual assistant Name	Origin	Company	Function
[24]	Aapka Chkitsak	India	Academic Research	Remote consultation
[157]	COVID-Chatbot	Tunisia and Germany	Academic Research	Enhance awareness regarding COVID-19
[23]	Bebot	Japan	Bespoke	Update information and Check symptoms
[156]	Orbita	USA	–	Reduce contacts
[93]	Hyro	Israel	–	Interact with patients
[201]	Symptoma	Austria	–	Diagnose by checking symptoms
[201]	COVID-BOT	France	Clevy.io	Assist with symptoms by knowledge of government and WHO


Artificial Intelligence of Things (AIoT)

In general, applications of the Internet of Things (IoT) [78], [191], [150] and AI can assist businesses with automation processes [199]. During the COVID-19 pandemic, AI and IoT have received more attention in the healthcare domain, where they allow screening and detection procedures to be carried out more safely. Thermal imaging and social distance monitoring are the two main functions considered in the screening phase of COVID-19. The aims of such devices are high-temperature detection, face mask screening, and distance control, which are discussed in the following subsections.

Thermal imaging

In IoT thermal screening applications, AI can assist by supplying appropriate algorithms. SmartX [196], a thermal screening device using infrared thermal imaging and AI face recognition, makes screening in crowded buildings or at entrances more efficient (see Fig. 3). The device captures a visitor's temperature and also checks whether the visitor is wearing a face mask. A similar device was developed for a hospital in Taiwan in collaboration with Microsoft to detect face mask wearing and temperatures. Any abnormalities can then easily be reported to the staff or authorities so that proper action can be taken [154].
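The decision logic behind such a screening station can be sketched as a simple rule applied to each reading. The sketch assumes that upstream components already provide a temperature and a mask-detection flag; the `Reading` type, the 38.0 °C cut-off, and the sample values are hypothetical and do not describe SmartX or any other product.

```python
from dataclasses import dataclass

FEVER_THRESHOLD_C = 38.0  # assumed cut-off; real devices are configured per site policy

@dataclass
class Reading:
    visitor_id: str
    temperature_c: float
    mask_detected: bool

def screen(reading):
    """Return the list of alerts raised for one visitor."""
    alerts = []
    if reading.temperature_c >= FEVER_THRESHOLD_C:
        alerts.append("high temperature")
    if not reading.mask_detected:
        alerts.append("no face mask")
    return alerts

# Hypothetical readings from an entrance camera.
readings = [
    Reading("v1", 36.6, True),
    Reading("v2", 38.4, True),
    Reading("v3", 37.0, False),
]
report = {r.visitor_id: screen(r) for r in readings}
print(report)
```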

Fig. 3

An industrial thermal imaging system enabled with AI [196].

Social distance monitoring

To support social distancing with IoT devices, AI can provide an automated screening approach based on computer vision methods [180]. One such device is RayVision [175], which checks that social distancing and face mask guidelines are followed in crowds. Using computer vision techniques, it monitors people through a live stream on a dedicated dashboard, which allows the authorities to be alerted in case of any rule-breaking [210]. Fig. 4 illustrates the monitoring process of an industrial screening system. Similarly, Landing AI [116] is another AI-based technology that can detect social distancing violations in real time. Moreover, a peer-reviewed study [170] used an Unmanned Aerial Vehicle (UAV), or drone, with an ML application in response to the need for maintaining social distance in crowds.
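Once a vision model has located people in a frame, detecting distancing violations reduces to a pairwise distance check. The sketch below assumes the detections have already been projected to ground-plane coordinates in metres; the coordinates, the 2.0 m threshold, and the function name are hypothetical and do not describe RayVision or Landing AI.

```python
import math
from itertools import combinations

def distancing_violations(positions, min_distance_m=2.0):
    """Return pairs of person ids standing closer than min_distance_m."""
    violations = []
    people = sorted(positions.items())
    for (id_a, (xa, ya)), (id_b, (xb, yb)) in combinations(people, 2):
        if math.hypot(xa - xb, ya - yb) < min_distance_m:
            violations.append((id_a, id_b))
    return violations

# Hypothetical ground-plane coordinates in metres for four detected people.
positions = {"A": (0.0, 0.0), "B": (1.0, 1.0), "C": (5.0, 5.0), "D": (5.5, 6.0)}
violations = distancing_violations(positions)
print(violations)
```

The pairwise check is quadratic in the number of people, which is acceptable for a single camera frame but may need spatial indexing for very large crowds.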

Fig. 4

An industrial screening system for monitoring the social distance of people and their personal protective equipment [175].

Discussion

Although the analysis of medical images such as CT-Scans and X-rays, along with the integration of ML with other technologies in diagnosing COVID-19, has shown promising results, there are still some constraints that need to be considered. One of the major challenges that researchers face is the inadequacy of the image datasets available for ML techniques. Additionally, the publicly available datasets used in the different ML models usually come from various medical image sources such as hospitals and medical institutions, which makes it challenging to apply consistent inclusion and exclusion criteria such as symptomatic vs. asymptomatic COVID-19 cases or the severity stage of COVID-19 at which the images were taken. These constraints introduce uncertainty about how well the ML models can handle and classify COVID-19 infection from medical images. Therefore, medical professionals should also use other means of testing, such as RT-PCR, to further validate the results. Other technologies, such as ML-based chatbots, which are mainly designed to act like medical professionals in supporting and advising patients, also come with challenges. In particular, these chatbots usually cannot follow complex language in a conversation. Privacy is another concern when patients share their history with a chatbot to receive a proper recommendation. Similarly, security and privacy concerns arise when using AI-based IoT devices.

ML techniques towards predicting and tracking the spread of COVID-19

As we are currently in the midst of a global pandemic, the ability to predict and forecast the spread of COVID-19 could i) help the general population take preventative measures, ii) allow healthcare workers to anticipate and prepare for the next wave of potentially infected patients, and iii) allow policymakers to make better decisions regarding the safety of the general population. More importantly, the ability to predict the spread of the virus can be used to mitigate and even prevent the spread of COVID-19. ML models can be used to forecast the virus by providing early signs of COVID-19 and projecting its spread. Also, contact tracing and social media data analysis can provide early information about COVID-19, in turn reducing the spread of the virus [213], [115]. This section reviews the research on ML tools and models that can make this possible (see Table 5). Finally, the section concludes by addressing some of the remaining challenges in this area.
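Forecasting models of this kind are typically judged by how far their short-horizon predictions fall from the observed counts, for example with the Mean Absolute Error (MAE). The sketch below evaluates a naive linear-trend baseline over a two-day horizon; the baseline is an assumption chosen for brevity rather than a model from the surveyed papers, and the case counts are made up.

```python
def linear_forecast(series, horizon):
    """Extrapolate the average daily change of the series `horizon` days ahead."""
    slope = (series[-1] - series[0]) / (len(series) - 1)
    return [series[-1] + slope * (h + 1) for h in range(horizon)]

def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical cumulative case counts (illustrative, not real data).
history = [100, 120, 140, 160, 180]
observed_next = [205, 215]

forecast = linear_forecast(history, horizon=2)
mae = mean_absolute_error(observed_next, forecast)
print(forecast, mae)
```

A learned model such as an LSTM would replace `linear_forecast`, while the same error measure allows a direct comparison against the baseline.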

Table 5

ML Predictive Analysis Tools and Methods Combating COVID-19.

Reference	Section	Model and Technology	Remarks
[7]	Early Tracking, Prevention	Predictive Analysis tool	Using flight details data and recent outbreaks to predict the spread in nearby countries
[89]	Early Tracking, Prevention	ANN - K-Means Algorithm	Using a MAE to successfully predict 2-day spread
[71]	Forecasting	Non-Linear Autoregressive (NAR) Neural Network	Prediction error due to scarcity of data at the time of analysis
[121]		Augmented ARGONet	Clustering of Chinese Provinces, and getting a 2-day forecast
[69]		Polynomial Neural Network (PNN) - GROOMS	Addressing data augmentation and importance of early forecast
[19]		RNN	Researching predicting using GRU + LSTM combined models
[194]		K-Means Clustering Algorithm	Possible to predict the spread of cases
[4]		FPASSA-ANFIS (ANN)	Predict a 10-day forecast of the number of cases in China
[177]		ISACL-MFNN	Predict a 10-day forecast of the number of cases in multiple countries
[16]		LSTM and LR models	Predict and forecast of the number of cases of COVID-19 in Iran
[179]		Regression Model, Prophet Prediction	Time-Series Forecasting
[207]		Federated Machine Learning	Efficient mortality prediction of hospitalized patients considering data privacy
[178]		Deep Learning, DEEPCOVID - Framework	Provide real-time COVID-19 forecasting, Use its predictions for CDC
[245]		SuEIR model (a SEIR model integrated with ML)	Predict the number of unreported/untested cases
[173]	Social Media Analysis	AI Algorithms	Phone based survey to determine whether a person is high-risk, low-risk or contracting the virus
[98]		Eclass1-MIMO	Classifying a Twitter dataset to determine morbidity in regions
[182]		Natural Language Processing	Getting public sentiment by classifying tweets
[163]		Latent Dirichlet Allocation (LDA)	Algorithms used to spot semantic relationships between words
[240]		Sentiment Analysis	Building a visual cluster to highlight public opinion over the pandemic
[126]		Unsupervised ML (biterm topic model)	Attain content analysis by assessing user tweets
[108]		Naive Bayes, Logistic Regression, and more.	Automate detection of positive COVID-19 report results through tweets
[186]		Shallow Neural Networks	Training multiple word2vec models to put context to words

Early signs, preventing the spread

Obtaining early warning signs of an epidemic outbreak can help slow and mitigate the spread of the virus. It can also encourage governments to impose necessary precautionary measures, including lockdowns, quarantines, and social distancing [148]. In this section, we review the early warning signs that were made possible by ML technology. The World Health Organization (WHO) made statements about COVID-19 being a potential global outbreak in January 2020 [224]. AI companies such as BlueDot and Metabiota were able to predict the outbreak even earlier [25], [133], [7]. BlueDot focuses on spotting and predicting outbreaks of infectious diseases using its proprietary methods and tools. It uses ML and NLP techniques to filter data and focus on the risk of a virus spreading. Using data from local news reports of the first few suspected cases of COVID-19, historical data on animal disease outbreaks, and airline ticket information, BlueDot was able to predict that an outbreak would occur in nearby cities and other regions of China [152]. BlueDot warned its clients about the outbreak on 31 December 2019, over a week before the WHO made any statement about it [70]. Similarly, Metabiota used its ML algorithms and Big Data to predict outbreaks, the spread of diseases, and event severity [80]. It used its technology and flight data to predict that there would be COVID-19 outbreaks in countries such as Japan, Thailand, Taiwan, and South Korea.

Contact tracing

One of the major approaches for preventing the spread of the virus is tracing the confirmed cases of COVID-19, because the virus can spread through droplets from coughing, sneezing, or talking [38]. It is recommended that not only people who have tested positive for COVID-19 but also those who have been in close contact with confirmed cases be quarantined for 14 days. Contact tracing applications are used all over the world for this purpose, with different methods. Tracing starts after the diagnosis process, because it is the detected case that needs to be traced. After the data is collected by those applications, ML and AI techniques analyze it to discover further spread of the disease [52]. Although contact tracing applications can be very helpful during the pandemic, the huge amount of collected data raises serious privacy concerns regarding the surveillance of individuals by some governments [168], [32]. Using the digital footprint data provided by the applications along with ML technology could make it possible to identify infected patients and enforce social distancing measures. Real-time contact tracing using AI has been applied in South Africa through the SQREEM platform [63], developed in Singapore, to track people who may have contracted COVID-19. This information does not include the personal data of the user. If a user enters an infected area, he or she will be contacted by the authorities regarding the probability of infection. One of the most common ways of contact tracing is through smartphones. A variety of smartphone applications enabled with ML or AI have been adopted to slow the spread of the virus by tracking unsafe contacts and warning users about them [113].
In developing such an application, the first consideration is the framework, either centralized or decentralized, built on appropriate technologies such as the Global Positioning System (GPS), Quick Response (QR) codes, and Bluetooth [117]. ML can enable automatic alerting and analysis of the massive amount of captured data, which reduces the required workforce [52]. Apple and Google announced that a Bluetooth-based platform for tracking close contacts would be implemented in the following months. This technology is expected to enable higher participation and better communication [161]. An application has been developed in South Korea that uses location-based information to capture the areas that confirmed cases visited before testing positive for COVID-19. This application automatically notifies people who may have been exposed in the contaminated areas by sending them a text message [114]. Table 4 presents some of the numerous contact tracing applications that have been implemented in various countries [87].
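
The core lookup behind such proximity-based applications can be illustrated with a minimal sketch. It assumes a simplified log of pairwise encounters; the user names, the log format, and the 14-day look-back window are hypothetical choices for illustration and do not describe any specific application listed in Table 4.

```python
from datetime import datetime, timedelta

# Hypothetical proximity log: (user_a, user_b, time of the encounter).
ENCOUNTERS = [
    ("alice", "bob", datetime(2020, 6, 1, 9, 0)),
    ("bob", "carol", datetime(2020, 6, 3, 14, 30)),
    ("alice", "dave", datetime(2020, 5, 10, 8, 0)),
]


def close_contacts(encounters, case, diagnosed_at, window_days=14):
    """Return the users who met `case` within `window_days` before diagnosis."""
    start = diagnosed_at - timedelta(days=window_days)
    contacts = set()
    for user_a, user_b, when in encounters:
        # Keep only encounters that involve the case and fall inside the window.
        if start <= when <= diagnosed_at and case in (user_a, user_b):
            contacts.add(user_b if user_a == case else user_a)
    return contacts


exposed = close_contacts(ENCOUNTERS, "alice", datetime(2020, 6, 5))
print(sorted(exposed))
```

A real deployment would add consent handling, anonymized identifiers, and distance or duration thresholds, none of which are modeled here.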

Table 4

Contact Tracing Applications Combating COVID-19.

Reference	Application	Function	Origin	Technology
[74]	AarogyaSetu	Track close contacts of users	India	Bluetooth
[74]	AarogyaSetu	Notify user if captured users are infected	India	GPS location
[6]	Alipay Health Code	Track close contacts of users	China	GPS
		Track traveling information, and body temperature
		Display the situation of user by three colors		Bank transactions’ history
		The situations include healthy, in need of short or long quarantine
[17]	BeAware Bahrain	Track close contacts of users	Bahrain	Bluetooth
[17]	BeAware Bahrain	Track Quarantined and self-isolated cases	Bahrain	Location
[15]	COVIDSafe	Track close contacts of users	Australia	Bluetooth
[15]	COVIDSafe	Notify user if captured users are infected	Australia	Bluetooth
[176]	CovTracer	Track close contacts of users	Cyprus	GPS Location
[176]	CovTracer	Notify user if captured users are infected	Cyprus	GPS Location
[55]	CovidRadar	Track close contacts of users	Mexico	Bluetooth
[55]	CovidRadar	Notify user if captured users are infected	Mexico	Bluetooth
[130]	Ehteraz	Track close contacts of users	Qatar	Bluetooth
[130]	Ehteraz	Notify user if captured users are infected	Qatar	GPS
[137]	eRouska (CZ Smart Quarantine)	Track close contacts of users	Czech Republic	Bluetooth
[137]	eRouska (CZ Smart Quarantine)	Notify user if captured users are infected	Czech Republic	Bluetooth
[136]	GH Covid-19 Tracker App	Track the places an infected user had gone	Ghana	GPS
[136]	GH Covid-19 Tracker App	Allow for reporting symptoms	Ghana	GPS
[95]	Hamagen	Track close contacts of users	Israel	Location based on API
[94]	Immuni	Track close contacts of users	Italy	Bluetooth Low Energy
[94]	Immuni	Notify user if captured users are infected	Italy	Bluetooth Low Energy
[96]	Ito	Measure the chance of infection	Germany	Bluetooth
[96]	Ito	Guide for better safety manner	Germany	Bluetooth
[132]	Mask.ir	Track close contacts of users	Iran	Bluetooth
		Provide a map of contaminated areas
		Allow for reporting symptoms
[144]	MyTrace	Track close contacts of users	Malaysia	Bluetooth Low Energy
[144]	MyTrace	Notify user if captured users are infected	Malaysia	Bluetooth Low Energy
[200]	StopCovid	Track close contacts of users	France	Bluetooth
[200]	StopCovid	Notify user if captured users are infected	France	Bluetooth
[62]	TraceCovid	Track close contacts of users	UAE	Bluetooth
		Access to the user’s information by government (privacy concern)
		Notify user if captured users are infected
[138]	TraceTogether	Track close contacts of users	Singapore	Bluetooth
		Access to the user’s information by government (privacy concern)
		Notify user if captured users are infected

Forecasting

Forecasting epidemics centers on tracking and predicting the spread of infectious diseases and viruses. During an epidemic, forecasting methods and models can be trained on epidemiological data to provide an estimated number of infected cases and patterns of spread that can guide healthcare workers on how to prepare appropriately for an outbreak [193]. For example, forecasting tools such as Susceptible, Infected, Recovered, and Dead (SIRD) models can be used to determine the spread of COVID-19 through the population of Hubei, China [8]. With the COVID-19 pandemic, ML approaches for forecasting the spread of the disease have been receiving a lot of attention in the research community.

Hu et al. [89] proposed an unsupervised ML method for forecasting and tracking the probable spread of COVID-19 across some provinces. The authors trained a Modified Auto-Encoder (MAE) model to predict the transmission of COVID-19 cases. Using data from the WHO website [223], the authors applied a clustering technique, k-means, to group the provinces for their analysis. The proposed model performs well in predicting when the total number of new cases would decline and eventually plateau in those provinces. Similarly, Liu et al. [121] utilized clustering techniques to forecast COVID-19 activity in Chinese provinces. The data for their ML model comes from digital traces such as local Internet searches related to COVID-19 within those provinces, news media articles, the COVID-19 data provided by the China CDC, and the daily forecasts generated by an epidemic model. The authors applied augmentation techniques to the limited data and obtained a dependable model that can produce a 2-day-ahead forecast of the spread of COVID-19. In their study, the authors compared these forecasts with the China CDC figures as the pandemic unfolded.

Al-Qaness et al. [4] used another forecasting technique to predict the confirmed cases in China for the following 10 days. In their analysis, the authors used the aforementioned COVID-19 dataset from WHO [223]. In this novel forecasting technique, the authors combined and modified a Flower Pollination Algorithm (FPA) [1] and a Salp Swarm Algorithm (SSA) [139] to determine the optimal parameters for an Adaptive Neuro-Fuzzy Inference System (ANFIS) [107], creating an FPASSA-ANFIS model. The authors checked the robustness of the proposed model using two different influenza datasets for the USA and China. Their model performs relatively well on these earlier outbreaks, which supports its dependability for current and future epidemiological threats.

Another study that focuses on forecasting the spread of COVID-19 in China was performed by Yang et al. [233]. The authors first utilized a modified Susceptible-Exposed-Infected-Removed (SEIR) epidemiological model to predict the spread of the virus across China. They also used an ML-based Long Short-Term Memory (LSTM) model to predict new infections. To train the LSTM model, the authors used 2003 SARS epidemic statistics and incorporated COVID-19 epidemiological parameters such as the incubation rate, the probability of transmission, and the probability of recovery or death. Both approaches provide strong insights into predicting the peak of the epidemic and show promise for forecasting future epidemics. In similar research presented by Ayyoubzadeh et al. [16], daily new cases in Iran were predicted using a Linear Regression (LR) model and a 3-layer LSTM model. The authors trained their models on data from Google Trends [73] and the Worldometer website [225]. To evaluate the performance of the proposed models, 10-fold cross-validation was applied with the Root Mean Square Error (RMSE) as the metric.

Fong et al. [69] introduced the GROOMs methodology, which brings together five different types of forecasting methods, such as time-series forecasting and self-evolving polynomial neural networks. From their analysis, the authors concluded that, among the available methods, the polynomial neural network with corrective feedback has the best forecasting ability and achieves the lowest prediction error. Alternatively, Rizk et al. [177] integrated the Interior Search Algorithm (ISA) and Chaotic Learning (CL) into a Multi-Layer Feed-Forward Neural Network (MFNN) to create a forecasting model called ISACL-MFNN. Combining ISA and CL can enhance the overall performance of the MFNN. The authors gathered a dataset from the WHO website [222] that included data from the USA, Italy, and Spain between January 2020 and April 2020. They trained the model on this dataset and evaluated its performance using metrics such as the aforementioned RMSE and the Mean Absolute Percentage Error (MAPE). The authors of [177] concluded that their model provides a reliable alternative methodology for COVID-19 forecasting.

Ghazaly et al. [71] utilized the limited WHO data on confirmed COVID-19 cases and deaths between January 2020 and April 2020 to train a Non-Linear Autoregressive (NAR) model and predict the future cases and deaths that could occur in 9 countries. However, lacking sufficient historical data, the authors concluded that their network is unable to continue predicting future cases in those countries. Over the same three-month period, however, Roy et al. [179] applied ML techniques to forecast the number of infected, recovered, and deceased cases, both country-wise and globally. The authors used a type of regression model called the Prophet Prediction Model.
Developed by Facebook, the Prophet Prediction Model produces time-series forecasts that are simple to create and can provide accurate prediction results. Bandyopadhyay et al. [19] used a Recurrent Neural Network (RNN) combining LSTM and Gated Recurrent Unit (GRU) models to predict the number of confirmed cases and deaths. The model was trained on Kaggle data [59] covering confirmed cases between January 2020 and March 2020. Their results indicate that the RNN is capable of predicting the cases and assessing the severity of COVID-19.

To tackle the spike in cases in the United States, Rodriguez et al. [178] have been working with the CDC [35] since the very beginning. The authors developed a DL/ML framework that has been aiding the CDC with forecasting analysis. The operational framework, named DEEPCOVID, provides the CDC with real-time forecasts [178]. DEEPCOVID consists of three modules: the data module, the prediction module, and the explainability module. The target of the framework is to predict the weekly number of new COVID-19 cases and deaths. Its overall performance indicates that it provides reliable predictions. Similarly, another ML-based method used by the CDC to project the number of confirmed and fatal cases is presented by Zou et al. [245]. Their model, called SuEIR, combines ML techniques with the aforementioned epidemiological SEIR model. In its epidemiological analysis, the model also considers the potentially untested or unreported cases of COVID-19, hence the name SuEIR (Susceptible, Untested, Exposed, Infectious, Removed). The ML techniques used to train the model improve its overall efficiency. The approach presented in [245] is also capable of providing short-term predictions of infected cases and deaths across the US.
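
The compartmental SEIR model that several of the above studies build on can be sketched in a few lines. The following is a plain textbook SEIR simulation integrated with the forward Euler method; it is not the modified SEIR of [233] or the SuEIR model of [245], and the parameter values (beta, sigma, gamma) are illustrative assumptions rather than fitted COVID-19 estimates.

```python
def simulate_seir(population, exposed0, infected0, beta, sigma, gamma, days, dt=0.1):
    """Forward-Euler integration of the SEIR equations.

    beta:  transmission rate, sigma: rate of leaving the exposed compartment,
    gamma: removal rate. Returns one (S, E, I, R) tuple per day.
    """
    s = float(population - exposed0 - infected0)
    e, i, r = float(exposed0), float(infected0), 0.0
    steps_per_day = round(1 / dt)
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt   # S -> E
            new_infectious = sigma * e * dt                # E -> I
            new_removed = gamma * i * dt                   # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
        history.append((s, e, i, r))
    return history


# Illustrative run with assumed parameters (not estimates from any cited study).
history = simulate_seir(1_000_000, 0, 10, beta=0.5, sigma=1 / 5, gamma=1 / 10, days=120)
peak_day = max(range(len(history)), key=lambda d: history[d][2])
print("Day with the most infectious individuals:", peak_day)
```

In the ML-integrated variants described above, parameters such as the transmission rate are learned from reported case data instead of being fixed by hand.
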
The Institute for Health Metrics and Evaluation (IHME) [54] used a statistical model to predict the number of deaths and the hospital utilization that could occur in US states over the 4 months following the time of the study. Being able to predict the utilization of hospitals, ICUs, and ventilators could help healthcare workers prepare appropriately. From the analysis, the authors found that the number of deaths per day could decline by the first week of June 2020. The US CDC uses different forecasting methods to predict the number of COVID-19 cases and deaths across the US over a period of time. In fact, along with some of the aforementioned forecasting models, the CDC also participated in a collaborative effort with industry, institutions, and academic research teams to develop accurate prediction models. The latest forecasting models can be found at the COVID-19 Forecast Hub website [91]. In theory, results from multiple forecasting models can be combined into a single weighted average, and such a combined forecast could be more robust than an individual model, as discussed by Ray et al. [174]. In their work, they took the models provided by the participants in the collaborative effort to generate an ensemble prediction for the number of deaths that could occur over the next four weeks. Their model proved to be well calibrated. However, it was constrained by limited historical data.
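
The error metrics used in the studies above (RMSE and MAPE) and the idea of combining forecasts into a weighted average can be made concrete with a short sketch. The case counts and the weights below are made-up numbers for illustration; this is not the ensemble procedure of [174].

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error between two equally long series."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be non-zero)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def weighted_ensemble(forecasts, weights):
    """Combine several forecasts of the same horizon into one weighted average."""
    total = sum(weights)
    horizon = len(forecasts[0])
    return [sum(w * f[t] for w, f in zip(weights, forecasts)) / total for t in range(horizon)]


# Hypothetical daily case counts and two hypothetical model forecasts.
actual = [100, 200, 300]
model_a = [110, 190, 310]
model_b = [90, 210, 290]

combined = weighted_ensemble([model_a, model_b], weights=[1, 1])
print("RMSE model A:", rmse(actual, model_a))
print("MAPE model A:", mape(actual, model_a))
print("RMSE ensemble:", rmse(actual, combined))
```

In this toy case the two models err in opposite directions, so their average has a lower error than either model alone; with real forecasts the gain depends on how correlated the individual errors are.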

Social media analysis

Social media has become a platform where people share pictures, reviews, and posts, and exchange stories. A popular social media platform where people obtain and access news is Twitter. Major news outlets, government bodies, community centers, etc., all have accounts that they use to share updates on Twitter. Its users can receive live alerts and obtain information directly through the smartphone application. Users can also use the platform to share their personal experiences via tweets. It can essentially be considered a form of microblogging for users who want to share their insight on a certain topic. Tweets are a form of data that can be analyzed to gather feedback and gauge public sentiment on certain topics. Over the course of the COVID-19 pandemic, people have engaged on Twitter and other social media platforms to share their experiences [108]. Obtaining feedback from Twitter and other social media outlets gives a live reflection of how the general public is reacting to the pandemic and can help policymakers make better decisions. COVID-19 was declared a global pandemic in March 2020 [220]. However, people had already been posting about and discussing it on social platforms. Between January 27th and March 26th, 2020, there were over 5.5 million tweets with the keywords “corona virus” and “coronavirus” according to [108], whose authors constructed a dataset for their ML models. The dataset consists of both positive and negative tweets shared by users. Using this dataset, the authors implemented several ML methods, such as Logistic Regression and Naive Bayes (NB) classification, to automate the detection of COVID-19 positive results shared over Twitter by users. Logistic Regression and NB, both supervised ML models, were also used by Samuel et al. [182] to obtain public sentiment and feedback about COVID-19 from users' tweets.
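
To illustrate how a Naive Bayes classifier can separate tweets into classes, the following is a minimal multinomial Naive Bayes sketch with add-one smoothing. The tweets, the labels ("report" vs. "other"), and the whitespace tokenization are hypothetical simplifications; they do not reproduce the datasets or pipelines of [108] or [182].

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # per-class word frequencies
        self.class_counts = Counter(labels)      # number of documents per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(count / n_docs)  # log prior
            for word in text.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    likelihood = (self.word_counts[label][word] + 1) / (total + len(self.vocab))
                    score += math.log(likelihood)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical toy tweets; real studies train on millions of labeled tweets.
tweets = [
    "i tested positive for covid today",
    "my test came back positive",
    "stay home and stay safe everyone",
    "watching the news about the pandemic",
]
labels = ["report", "report", "other", "other"]

model = NaiveBayesText().fit(tweets, labels)
print(model.predict("just tested positive"))
```
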
As there is an abundance of news and tweets shared over Twitter, a sentiment analysis was carried out in [163], addressing the need to filter this content because of the potential for misinformation to spread across social platforms. The authors implemented an unsupervised ML topic modelling technique known as Latent Dirichlet Allocation (LDA). LDA is used to spot the semantic relationships between words in a tweet and supports a sentiment analysis of whether tweets are positive, showing signs of comfort, or negative, showing discomfort and panic. In their findings, negative tweets are more frequent, with anger and sadness about quarantine and death being prevalent. Jahanbin et al. [98] gathered data from Twitter by searching for COVID-19 related hashtags. The dataset of tweets is first pre-processed and filtered to remove irrelevant data, which allows a better classification model to be trained. The authors used an evolutionary algorithm, called Eclass1-MIMO, for their analysis. Similarly, Mackey et al. [126] introduced an unsupervised ML approach that analyzes tweets by users who may be infected by the virus, are recovering from it, or describe their experiences with testing for it. The authors used a Biterm Topic Model (BTM) combined with clustering techniques to determine statistical and geographical characteristics based on content analysis.

Forecasting the spread of COVID-19 is essential for determining the impact the virus may have. ML integrated with epidemiological studies can not only help project the impact and severity of the spread of the virus, but also give policymakers an understanding of how to move forward in handling the situation. In this section, we have presented various ways in which ML can be used for forecasting and mitigating the spread. However, some limitations and challenges still need to be addressed before this goal can be fully achieved.
To date, COVID-19 has been declared a pandemic for over 10 months. One of the main issues in predicting the real-time spread of COVID-19 is asymptomatic patients, that is, those who are infected but show no symptoms and can continue to spread the virus [60]. As information on those cases cannot be recorded or traced easily, ML models based on such data cannot be fully reliable in predicting the spread of the virus. Beyond this, misinformation on social media and noisy, imbalanced data pose further challenges in using ML models for predicting and tracking the spread of COVID-19. Misinformation, which comes in many forms such as tweets and Facebook posts, can be fed into ML models and ultimately result in poor outcomes. In this case, it is necessary to apply advanced technologies like NLP to detect and remove content with no scientific basis from social media datasets before feeding them into ML models. Similarly, noisy and imbalanced data can deteriorate the performance of ML models and bias ML algorithms, so it is crucial to apply data pre-processing techniques before using such data in the ML process.
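
One simple pre-processing technique for the imbalanced data mentioned above is random oversampling of the minority class. The sketch below is one possible approach among many (undersampling and class weighting are common alternatives); the function name and the toy data are hypothetical.

```python
import random
from collections import Counter


def oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until all classes are equal in size."""
    rng = random.Random(seed)
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_label.values())
    out_samples, out_labels = [], []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical imbalanced dataset: five "negative" samples and one "positive" sample.
samples = ["t1", "t2", "t3", "t4", "t5", "t6"]
labels = ["negative"] * 5 + ["positive"]
balanced_samples, balanced_labels = oversample(samples, labels)
print(Counter(balanced_labels))
```

Oversampling should be applied to the training split only, so that duplicated samples do not leak into the evaluation data.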

ML techniques towards medical assistance

As the virus spreads across the world, infecting more of the population, and with the death toll rising rapidly, efforts are being made to develop an effective vaccine or discover a drug for COVID-19. To do that, it is vital to understand how the human body reacts to the virus by understanding the immune system's responses when combating it [49]. In this section, we review the efforts of research communities in using ML techniques to understand the virus [42], determine how to attack it, and perhaps even find a cure for COVID-19 (see Table 6). We also discuss some of the limitations of using ML for drug discovery and vaccine development.

Table 6

Vaccine and Drug Development of COVID-19 Using ML Algorithms.

Reference	Sections	Model and Technology	Remarks
[155]	Vaccine Development	Logistic Regression, Support Vector Machine, etc.	Use a tool, called Vaxign, to implement Reverse Vaccinology
[122]		OptiVax, EvalVax, netMHCpan, etc.	Predict binding between virus proteins and human protein molecules
[215]		NetMHCPan	Create an online tool for visualisation and extraction of COVID-19 meta-analysis
[167]		Ellipro, multiple ML methods	Predict the epitope structure
[184]		SVM	Review epitope-based design for a COVID-19 vaccine
[164]		ANN	Predict COVID-19 epitopes
[165]		DeepNovo, LSTM, RNN	Discover antibodies in patients using a predictive analysis of protein sequences
[84]		netMHCpan, netMHC	Predict peptide sequences by ML techniques
[81]	Drug Repurposing	DL Models - Neural Networks	Analyse the response of approved FDA Drugs to COVID-19
[22]		DL - Drug Target Interactions	Repurpose current drugs to discover any affinity between drug and proteins
[106]		Neural Networks and Naive Bayes	Predict drug interaction between proteins and compounds
[143]		CNN, LSTM and MLP models	Predict similarities between available drugs to combat COVID-19
[88]		Fine-Tuning AtomNet based Model	Predict binding between COVID-19 proteins and drug compounds
[241]		Various ML methods	Use RL strategies to generate new 3CLpro structure
[29]		Deep Neural Network	Create small molecule interaction and target 3CLpro
[236]		Deep Learning Models	Provide large scale virtual screening to identify protein interacting pairs

Understanding the virus

Analyzing the genomic and proteomic characteristics of a viral disease is an important step in combating it. Scientists have been studying the virology of COVID-19 to gain a better understanding of the origin of the virus, its cell receptor binding, and its genomic characteristics [123]. A genome is the complete genetic information that provides the architecture of a virus. Knowing the genome of COVID-19 can provide a clearer understanding of the transmissibility and infectiousness of the virus [9]. Proteomics is the study of the proteins of an organism. Identifying the proteins of COVID-19 allows a better understanding of the overall protein structure and of how the proteins interact with inhibitors [151]. In recent years, scientists have made remarkable advances in the interdisciplinary fields of bioinformatics and computational medicine. ML techniques have produced meaningful interpretations in determining the genomic and protein structures of various diseases [119]. In this section, we focus on COVID-19 and discuss the ML techniques that have been applied to interpreting its genomics and proteomics.

COVID-19 is an RNA (ribonucleic acid) virus from the coronavirus (CoV) family. It is a single-stranded RNA virus with a large viral genome. These large genomes can encode two or three viral proteases. COVID-19 has two proteases, one of which we will refer to as 3CLpro [33]. COVID-19 belongs to the same family as the aforementioned respiratory diseases SARS and MERS [76]. Viruses that belong to this family can infect a range of animal species, such as camels, cats, cattle, and bats, as well as humans [235]. They are easily transmissible and can pass from a host in one species to another species [235].
Multiple findings suggest that COVID-19 was transmitted to humans from bats, as it shares an 89% identity with a coronavirus that infects bats (SARS-like-CoVZXC21) [39]. The coronavirus family has multiple classes, and a virus can belong to either the alpha or the beta class. SARS-CoV and MERS were both determined to be beta coronaviruses according to the CDC [37]. To determine which specific type of virus COVID-19 is, Randhawa et al. [172] used a supervised ML technique combined with digital signal processing (MLDSP) for genome analyses. MLDSP techniques have previously achieved high accuracy in the classification of other viruses and diseases such as influenza [171]. Using this model, the authors determined that, like its predecessors, COVID-19 also belongs to the beta coronaviruses. To predict the protein structure of COVID-19, Heo et al. [83] utilized an ML-based method called TrRosetta. This method can be used to predict inter-residue distances and create structure models for the protein. To improve the quality of the predictions, the authors applied molecular dynamics simulation-based refinement. Magar et al. [127] highlighted the importance of knowing the biological structure and protein sequences for combating the virus. With this in mind, the authors developed an ML model that predicts which synthetic antibodies could inhibit the virus and prevent it from spreading. The ML model also provides some insight into which sequences in the binding region of the antibody could counter viral mutations. The model is trained on a dataset that includes antibody-antigen sequences from a range of viruses such as HIV and influenza.
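
The general idea of classifying a genome by comparing numerical representations of sequences can be shown with a deliberately simplified stand-in. The sketch below uses k-mer frequency vectors and cosine similarity with a nearest-reference rule; it is not the MLDSP method of [172], which relies on digital signal processing, and the short sequences and group names are hypothetical toy data.

```python
import math
from collections import Counter
from itertools import product


def kmer_vector(seq, k=3, alphabet="ACGU"):
    """Relative frequencies of all k-mers over `alphabet`, in a fixed order."""
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return [counts["".join(p)] / total for p in product(alphabet, repeat=k)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_label(query, references, k=3):
    """Return the label of the reference sequence most similar to `query`."""
    q = kmer_vector(query, k)
    return max(references, key=lambda label: cosine(q, kmer_vector(references[label], k)))


# Hypothetical toy RNA fragments standing in for labeled reference genomes.
references = {
    "group_a": "AUGGCUAUGGCUAUGGCU",
    "group_b": "CCGGAACCGGAACCGGAA",
}
print(nearest_label("AUGGCUAUGGCA", references))
```

Real genome classification works on sequences of tens of thousands of bases and on trained classifiers instead of a single nearest reference.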

Drug discovery and vaccine development

As the numbers of both infected cases and deaths from COVID-19 continue to rise, it has become urgent to discover a drug that could keep these numbers from increasing any further. ML techniques can be used to analyze how drugs react to the viral proteins of COVID-19. ML methods such as SVM, RNN, and Bayesian classifiers have already been used for drug discovery and repurposing [41], [244]. In this section, we review the ML studies and research on discovering new drugs or repurposing drugs currently approved by the Food and Drug Administration (FDA). We also review the ML research on vaccine development.

Drug discovery and repurposing

Beck et al. [22] presented an exploratory approach to determining whether commercially available antiviral drugs can treat COVID-19 patients or help reduce the severity of their infection. They used a pre-trained ML interaction prediction model called Molecule Transformer-Drug Target Interaction (MT-DTI), which predicts the binding affinity between COVID-19 viral proteins and compounds. The objective of their study was to identify FDA-approved drugs that may inhibit the proteins of COVID-19. MT-DTI makes its predictions from the chemical sequences of compounds and the amino acid sequences of a target protein, without requiring whole-structure information. This is helpful because knowledge of the overall structure of the viral proteins of COVID-19 was initially limited. Taking advantage of this, the authors used the MT-DTI model to predict the binding affinities of 3,410 FDA-approved drugs. Similarly, Heiser et al. [81] used proprietary DL techniques for drug discovery. They used their model to evaluate how drugs and compounds approved by the FDA and the European Medicines Agency (EMA) would affect human cells, analyzing over 1,660 drugs.

In some cases, various drugs, from antivirals to antimalarials, could be used to combat COVID-19 [143]. Moskal et al. [143] refer to these drugs, which are used for combating severe illnesses, as “parents”. The authors used ML techniques such as CNN, LSTM, and Multi-Layer Perceptron (MLP) to analyze the molecular similarity between these “parent” drugs and second-generation drugs, which they call “progeny”, that could potentially also be used against the virus. This study is important for predicting other drugs that may be helpful in this pandemic. It can lead to a larger catalog of drugs, which provides alternatives if a “parent” drug proves ineffective.

Kadioglu et al. [106] identified three viral proteins as targets for their ML approach: the spike protein, the nucleocapsid protein, and the 2'-O-ribose methyltransferase protein. The spike protein mediates attachment of the virus to the host cell receptor. The nucleocapsid protein plays a vital role in coronavirus transcription and in the overall assembly of the viral RNA genome. The 2'-O-ribose methyltransferase protein is essential for coronavirus synthesis and processing. The authors applied ML algorithms to predict how FDA-approved drugs and natural compounds interact with these three proteins. In [29], the authors used a DNN to generate new designs for small molecules capable of inhibiting the COVID-19 3CLpro protein. Targeting 3CLpro can be an essential part of drug discovery for COVID-19. Alternatively, Zhavoronkov et al. [241] used 28 different ML methods, such as autoencoders, Generative Adversarial Networks, and genetic algorithms, to predict and generate molecular structures. Using deep Q-learning networks, Tang et al. [202] generated candidate compounds that could potentially target the 3CLpro of COVID-19. Successfully predicting compounds for these protein targets can advance the development of a potential drug for the virus. Hu et al. [88] created an ML model that predicts the binding between potential drugs and COVID-19 proteins.
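As a rough illustration of the molecular-similarity idea behind the “parent” and “progeny” analysis, the sketch below ranks candidate compounds by their Tanimoto similarity to a parent drug. This is not the method of Moskal et al. [143], who used CNN, LSTM, and MLP models; Tanimoto similarity over feature sets is a simpler, commonly used stand-in. The function names, feature labels, and compounds are hypothetical placeholders, not real chemical data.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) similarity: shared features / all features."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_progeny(
    parent: Set[str], candidates: Dict[str, Set[str]]
) -> List[Tuple[str, float]]:
    """Return candidates sorted from most to least similar to the parent."""
    scored = [(name, tanimoto(parent, feats)) for name, feats in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical substructure features (assumption: illustration only).
parent_features = {"ring_a", "amine", "chloro", "ether"}
candidate_features = {
    "candidate_1": {"ring_a", "amine", "chloro"},
    "candidate_2": {"ring_b", "ester"},
    "candidate_3": {"ring_a", "amine", "chloro", "ether", "hydroxyl"},
}

ranking = rank_progeny(parent_features, candidate_features)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In practice the feature sets would come from molecular fingerprints computed by a cheminformatics toolkit, and a learned model would replace the fixed similarity measure.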

Vaccine development

Once a virus starts to spread and turns into a global pandemic, there is very little chance of stopping it without a vaccine [20]. That holds true for COVID-19 as well. Historically, vaccination has been the solution for controlling or slowing the spread of a viral infection [217]. It is critical to develop a vaccine that provides immunity against COVID-19 and stops this pandemic. So far, research on COVID-19 vaccine development has focused on three types of vaccines [45]. The Whole Virus Vaccine represents a classical strategy for developing vaccines against viral diseases. The Subunit Vaccine relies on eliciting an immune response against the spike (S) protein of COVID-19 [45], which prevents the virus from docking with the host's receptor protein [45]. Nucleic Acid Vaccines have advanced, and new modifications could improve their performance in combating the virus [45]. As of the end of December 2020, around 60 vaccines against COVID-19 were in clinical development according to the WHO [218]. Developing a vaccine requires first going through the design stage and then through testing and experimental stages on animals and eventually in humans. In this section, we review the research on vaccine development and how ML techniques have been employed in it.

Ge et al. [122] used ML techniques to evaluate how short viral sequences (called peptides) bind to human protein molecules. The authors created two ML programs called OptiVax and EvalVax. OptiVax can be used to augment vaccine designs, whereas EvalVax is an evaluation tool that allows proposed vaccines to be analyzed over key metrics such as population coverage. The authors also used NetMHCpan-4.0 [105] to validate their findings. Similarly, Herst et al. [84] employed a technique in their vaccine research that had previously been used for combating the Ebola virus. They used the ML-based tools NetMHC and NetMHCpan for their binding studies and, in turn, to predict potential vaccine candidates. Ward et al. [215] mapped out the protein sequences of COVID-19 and used this data for prediction, specificity, and epitope analysis. An epitope is the part of an antigen molecule to which an antibody attaches and which is recognized by the immune system. The authors used the ML-based program NetMHCpan to locate the epitope sequences. Another study on epitope prediction was presented by Qiao et al. [165], who employed DL techniques to predict the best epitope for peptide-based COVID-19 vaccines. As an alternative approach to predicting the epitope and protein sequence, Rahman et al. [167] used a tool called ElliPro, which can predict and visualize the protein sequence of the epitope within the structure. The authors then used ML techniques to predict the interaction between the epitope and the immune system. Similarly, Sarkar et al. [184] used the SVM method to predict the toxicity level of some epitopes. Work on epitope prediction continues in the research of Prachar et al. [164], who employed various techniques such as ANN and Position-Specific Scoring Matrix (PSSM) algorithms to predict and verify COVID-19 epitopes. Ong et al. [155] introduced a vaccine design approach referred to as Reverse Vaccinology (RV). The aim of RV is to identify potential vaccine candidates. They trained ML models such as Logistic Regression, SVM, and RF on a protein dataset with the objective of predicting proteins suitable as vaccine candidates, using an ML-based tool called Vaxign-ML.

In general, drug discovery/repurposing and vaccine development are considered lengthy, high-risk, and expensive processes [64]. For example, developing a vaccine against a new disease must be done safely, following a list of procedures that can sometimes take almost a decade [111]. For COVID-19, ML models have shown promising results in reducing development timelines and overall costs. Although ML models can be created to predict drug or vaccine structures that could potentially treat or immunize people against COVID-19, some limitations and constraints remain. The efficiency of ML models depends strongly on data, so the low volume of data generated on drug discovery/repurposing and vaccine development since the emergence of COVID-19 can lead to poor outcomes for ML-based models. Additionally, the data are often vaguely described, which makes it difficult for researchers to analyze them correctly and to build predictive ML-based models for drug discovery/repurposing and vaccine development. Although the existing datasets face some limitations, we should expect more reliable data to become available over time to feed ML models and enhance their efficacy.
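Several of the epitope-prediction studies reviewed in this section score short peptides taken from a viral protein. The following minimal sketch shows how a position-specific scoring matrix could be slid along a sequence to rank candidate peptides. It is an illustration under stated assumptions, not the pipeline of Prachar et al. [164] or the NetMHCpan algorithm: the matrix values, the default penalty, the window width of 3 (real MHC class I peptides are typically 8 to 11 residues long), and the sequence are all made up.

```python
from typing import Dict, List, Tuple

# One dict per peptide position: residue -> score (hypothetical values).
Pssm = List[Dict[str, float]]


def score_window(window: str, pssm: Pssm, default: float = -1.0) -> float:
    """Sum per-position scores; residues missing from the matrix get `default`."""
    return sum(pssm[i].get(res, default) for i, res in enumerate(window))


def top_windows(sequence: str, pssm: Pssm, k: int = 3) -> List[Tuple[str, int, float]]:
    """Score every window of len(pssm) residues; return the k best
    as (peptide, start_index, score) tuples."""
    width = len(pssm)
    scored = [
        (sequence[i:i + width], i, score_window(sequence[i:i + width], pssm))
        for i in range(len(sequence) - width + 1)
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)[:k]


# Toy matrix and toy sequence (assumption: not real binding data).
toy_pssm: Pssm = [
    {"L": 2.0, "M": 1.0},
    {"K": 1.5},
    {"V": 2.0, "L": 1.0},
]
toy_sequence = "ALKVMKLA"

best = top_windows(toy_sequence, toy_pssm, k=2)
for peptide, start, score in best:
    print(f"{peptide} at position {start}: score {score}")
```

High-scoring windows would then be passed on as candidate epitopes for further validation, which is where the binding predictors and experimental checks described above come in.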

Future expectations

Although we review many ML approaches related to COVID-19 in this paper, there is still an essential need to develop ML-based solutions that address the pandemic's complications and challenges. Since no established method for fighting COVID-19 existed when the pandemic began, earlier ML models for infectious diseases (epidemiological models) can be helpful in the early stage of such an outbreak. As we discussed, detecting and screening COVID-19 using AI and ML techniques can play a key role in combating this pandemic. The combination of technologies such as IoT devices with these techniques needs to be expanded to crowded areas, including airports, subways, and bus stations. This development would allow suspicious cases to be identified in less time and with less contamination. It is important to implement an efficient method that achieves high accuracy in detecting COVID-19 by integrating medical imaging with ML techniques, and it is essential to overcome the challenges that this approach presents. These challenges include the lack of data, privacy issues in data collection, misinformation in the media, and the limited number of experts working across both AI and medical science. AI technologies can also help meet the following expectations in the future: i) empowering medical imaging devices with non-contact automatic image capture to prevent further infection from the patient to the radiologist or to another patient, and ii) automatically monitoring patients using intelligent video analysis. As countries learn more about COVID-19, it is essential to have updated datasets. Applying the proper ML models to those datasets can lead to better forecasting. In addition to forecasting, the effects of different social media platforms and their pathways should be investigated for detecting early signs of a possible future pandemic.

Conclusion

Machine Learning (ML) models and techniques have been widely used in many industries over the past decade. Within the healthcare industry, ML has been broadly used for screening and diagnosis. In epidemiology, ML is mainly used for forecasting and understanding epidemics and diseases. In this paper, we presented a comprehensive survey of how ML applications have been used to fight COVID-19. We presented the efforts made by the ML research communities to combat this virus across three main phases: “Screening”, “Tracking and Forecasting”, and “Medical Assistance”. The ML applications in each phase have a distinct focus: “Screening” aims at diagnosing the virus through medical imaging data (COVID-19 related X-rays and CT scans); “Tracking and Forecasting” covers forecasting and predicting the number of cases as well as contact tracing; and “Medical Assistance” aims at understanding the protein sequences and structure of the virus and whether a cure can be found through a drug or vaccine. One of the main challenges researchers faced when diagnosing with ML techniques was the lack of relevant, publicly accessible data. This lack of data meant that researchers had to use techniques such as data augmentation, transfer learning, and model fine-tuning to improve prediction accuracy. Though these methods worked well in some cases, more data would make these models more robust. Similarly, forecasting models that predict the spread and number of cases could be more accurate if trained on more data. For developing a vaccine or repurposing a drug, it is important to have a good understanding of virology and bioinformatics. Additionally, it is especially important for researchers from different fields to collaborate and integrate their knowledge in order to discover a cure.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
