
Review on COVID-19 diagnosis models based on machine learning and deep learning approaches.

Zaid Abdi Alkareem Alyasseri^1,2, Mohammed Azmi Al-Betar^3,4, Iyad Abu Doush^5,6, Mohammed A Awadallah^3,7, Ammar Kamal Abasi^3,8, Sharif Naser Makhadmeh^9,3, Osama Ahmad Alomari¹⁰, Karrar Hameed Abdulkareem¹¹, Afzan Adam¹, Robertas Damasevicius¹², Mazin Abed Mohammed¹³, Raed Abu Zitar¹⁴.

Abstract

COVID-19 is the disease caused by a novel coronavirus, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). COVID-19 has become a pandemic, infecting more than 152 million people in over 216 countries and territories. The exponential increase in the number of infections has rendered traditional diagnosis techniques inefficient. Many researchers have therefore developed intelligent techniques based on deep learning (DL) and machine learning (ML) that can assist the healthcare sector in providing quick and precise COVID-19 diagnosis. This paper provides a comprehensive review of the most recent DL and ML techniques for COVID-19 diagnosis, covering studies published from December 2019 until April 2021. In general, this paper includes more than 200 studies that have been carefully selected from several publishers, such as IEEE, Springer and Elsevier. We classify the research tracks into two categories, DL and ML, and present public COVID-19 datasets established and extracted from different countries. The measures used to evaluate diagnosis methods are comparatively analysed and discussed. In conclusion, for COVID-19 diagnosis and outbreak prediction, SVM is the most widely used machine learning mechanism, and CNN is the most widely used deep learning mechanism. Accuracy, sensitivity, and specificity are the most widely used measurements in previous studies. Finally, this review paper will guide the research community on the upcoming development of ML and DL for COVID-19 and inspire their works for future development.


Keywords: 2019‐nCoV; COVID‐19; COVID‐19 dataset; deep learning; machine learning

Year: 2021 PMID: 34511689 PMCID: PMC8420483 DOI: 10.1111/exsy.12759

Source DB: PubMed Journal: Expert Syst ISSN: 0266-4720 Impact factor: 2.812

INTRODUCTION

Coronavirus disease 2019 (COVID‐19) is an infection caused by a novel coronavirus, the severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2), which was previously named 2019‐nCoV (Chung et al., 2020; Gralinski & Menachery, 2020; Rothe et al., 2020). It was first identified in Wuhan, Hubei Province, China. The World Health Organization (WHO) reported the first COVID‐19 case on 31 December 2019. The COVID‐19 outbreak was declared a global health emergency on 30 January 2020 (Sohrabi et al., 2020). The WHO had declared H1N1 a pandemic in 2009, and COVID‐19 was the second disease to be declared a pandemic (Cucinotta & Vanelli, 2020; Zarocostas, 2009). The disease spreads mainly through close contact with infected individuals, although researchers are still investigating other potential infection routes (Corman et al., 2020; G. Li & De Clercq, 2020). The major signs of COVID‐19 infection are fever, dry cough and breathing difficulty. Some patients may have muscle aches and experience fatigue and loss of taste or smell (anosmia), and up to 10% have GI‐related symptoms, such as diarrhoea (Tsatsakis et al., 2020). Because direct contact is one of the ways the virus spreads between people, social distancing can reduce the possibility of being infected. Disease transmission can occur at a distance of as far as 6 feet, so the respiratory droplets generated by an infected person while talking or sneezing are considered one of the primary causes of spread. Furthermore, the symptoms of COVID‐19 in some cases are not noticeable (Rothe et al., 2020). Figure 1 shows how COVID‐19 spreads from one person to another when there is no social distancing.

FIGURE 1

COVID‐19 Person to person spread (Al‐Betar et al., 2021)

In general, several approaches for diagnosing COVID‐19 are available, such as nucleic acid‐based methods using polymerase chain reaction (PCR) (Abdulkareem, Mohammed, et al., 2021; Esbin et al., 2020), next‐generation sequencing (Harris et al., 2013), computed tomography (CT) scan (Akram et al., 2021), chest X‐ray (CXR) (Mohammed et al., 2020; Pan, Guan, et al., 2020; Sahlol et al., 2020; Shi et al., 2020) and paper‐based detection (K. Mao et al., 2020). These methods are used in monitoring changes in organs, and patients may need to undergo these pathological tests. The most popular of these tests are CT scan and CXR (Dansana et al., 2020; Kumar, Nagpal, et al., 2020; Kumari et al., 2020; Mohammed et al., 2021; Waheed et al., 2020). CT and CXR each have distinguishing advantages for COVID‐19 diagnosis. For instance, CT provides more detail about the patient's status and is relatively quick compared with the other technologies, whereas CXR delivers results at a lower price and with lower radiation. However, these technologies have drawbacks that may affect their performance and usage; for example, CT scans of the brain can be affected by nearby bone, and CXR does not provide 3D information. Unfortunately, a precise and quick method for COVID‐19 diagnosis is still unavailable. Usually, medical images (CT scan and X‐ray) are used in COVID‐19 diagnosis (Pan, Guan, et al., 2020; Pan, Ye, et al., 2020; Shi et al., 2020). These images are read by an expert, who analyses their content according to his or her experience in making a diagnosis. Doctors can experience fatigue because of long working hours and may make a wrong diagnosis. Patients with COVID‐19 can present irregularities in their CT or CXR results, and many lung problems look identical to COVID‐19.
Moreover, a normal CT scan or CXR does not necessarily indicate a negative COVID‐19 case. For this reason, assistive tools are required in the health care sector to ensure correct diagnosis. For faster and more accurate results, new approaches for detecting COVID‐19 on the basis of the principles of artificial intelligence (AI) have been proposed, specifically deep learning (DL) and machine learning (ML) (Alimadadi et al., 2020; Goodfellow et al., 2016; LeCun et al., 2015; Michie et al., 1994). Traditional ML and DL techniques have been developed by various scholars to assist doctors in making a correct diagnosis. Such techniques can help in classifying X‐ray or CT scans of the chest into two classes: infected and normal. A decision is made in several steps: an X‐ray image is read and preprocessed, unique features are extracted from it, and the features are then input into the ML or DL model for a final prediction. Other purposes of using such techniques are predicting the spread of an outbreak by analysing COVID‐19 data and predicting red zones and the number of infected cases with AI. Several AI techniques (DL and ML) are used in examining patients with COVID‐19 according to X‐ray and CT scans, namely, supervised learning (Ong, Wong, et al., 2020; Siddiqui et al., 2020) and unsupervised learning (Abdulkareem, Mohammed, et al., 2021; Carrillo‐Larco & Castillo‐Cara, 2020; Cui et al., 2020). DL techniques can be used to diagnose the coronavirus and to predict infection. The most prominent DL techniques are the convolutional neural network (CNN) (Alazab et al., 2020; Apostolopoulos et al., 2020), deep neural network (DNN) (Das et al., 2020; Ni et al., 2020), recurrent neural network (RNN) (Liang et al., 2020; Zeroual et al., 2020) and generative adversarial networks (GANs) (Jamshidi et al., 2020; Loey, Smarandache, et al., 2020; Waheed et al., 2020). The literature was carefully selected based on impact and quality.
Figure 2 shows the main sources of COVID‐19 scientific publications using ML and DL. In the figure, the selected works are classified according to the digital database used to retrieve the papers, such as Elsevier, IEEE Xplore, MDPI, Springer, IOP Press and Wiley. Figure 2 also shows the distribution of COVID‐19 work by type of publication, such as open access and closed access.

FIGURE 2

Publication statistics on ML and DL works in COVID‐19. (a) Publication per database. (b) COVID‐19 publication. (c) Distribution of published COVID‐19 research articles. (d) COVID‐19 publication

The main contribution of this work is a comprehensive review of the most recent ML and DL approaches used in assisting doctors and scientists in COVID‐19 diagnosis. More than 200 papers from high‐impact‐factor journals published by IEEE, Elsevier, Springer and Wiley were reviewed. The paper also provides information on the most used public COVID‐19 datasets, which can be utilized by other researchers. Finally, this review will guide the research community on the upcoming development of ML and DL for COVID‐19 and inspire their works for future development. This paper answers the following questions:
Q1‐ What are the ML and DL mechanisms applied for COVID‐19 diagnosis?
Q2‐ Are there any public standard datasets that can be applied to examine ML and DL techniques?
Q3‐ What are the most important measures that can be used to evaluate ML and DL approaches?
Q4‐ Are there any end‐to‐end applications available for COVID‐19 diagnosis?
Q5‐ Which countries have the largest number of researchers who publish work about ML and DL techniques to diagnose COVID‐19?
Q6‐ What is the suitable ML or DL mechanism for determining different performance indicators for COVID‐19 diagnosis?
The remaining part of this review consists of the following sections: Section 2 presents the proposed protocol for selecting studies about COVID‐19. Section 3.1 presents the research related to ML algorithms for COVID‐19, and Section 3.2 presents works related to DL algorithms for COVID‐19. Hybrid approaches are presented in Section 3.3. Public COVID‐19 datasets are provided in Section 4.
Section 5 provides the most commonly used measures for ML and DL approaches for COVID‐19. The review is concluded and future work is recommended in Section 7.

PROTOCOL FOR SELECTING COVID‐19‐RELATED STUDIES

Studies were selected using the most relevant keywords, namely, ‘COVID‐19’, ‘machine learning’ and ‘deep learning’. Only studies written in English were collected, from digital databases such as Elsevier, IEEE Xplore, MDPI, Springer, IOP Press and Wiley. The procedure of our protocol included five steps. The first step was searching for studies on the COVID‐19 pandemic. The search was carried out in April 2021 and covered studies that analyse ML and DL techniques for the COVID‐19 pandemic; the three keywords were used in various combinations, merged with the ‘AND’ operator. A total of 255 papers were retrieved. In the second step, 55 duplicate papers were excluded. In the third step, the remaining papers were filtered based on titles and abstracts, and papers with topics outside the scope of our domain were excluded. More than 100 articles were excluded, and only 100 papers were selected and passed to the next step. In step four, we skimmed each paper to ensure that its topic was within the defined scope. As a result, 10 articles were excluded. Finally, 90 papers related to ML and DL techniques for the COVID‐19 pandemic were included. A flowchart of the paper selection procedure is shown in Figure 3. The same eligibility criteria were used in the three screening steps.
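The counts quoted in this protocol can be checked with a few lines of arithmetic. The sketch below is only an illustration of the screening funnel: the stage labels are ours, and the "more than 100" exclusion at the title/abstract stage is assumed to be exactly 100, since that is the value consistent with 100 papers passing to the next step.

```python
def screening_funnel(start, exclusions):
    """Return (stage label, papers remaining) pairs after each exclusion step."""
    remaining = start
    trail = [("retrieved", remaining)]
    for label, removed in exclusions:
        remaining -= removed
        trail.append((label, remaining))
    return trail


# Counts taken from the protocol text; labels are illustrative only.
retrieved = 255
stages = [
    ("duplicates removed", 55),
    ("excluded on title/abstract", 100),  # assumption: "more than 100" read as 100
    ("excluded after skimming", 10),
]

for label, remaining in screening_funnel(retrieved, stages):
    print(f"{label:28s} {remaining}")
```

Under that assumption the funnel ends at the 90 included papers reported above.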

FIGURE 3

Flowchart of selected studies involving the query and inclusion criteria

MACHINE LEARNING AND DEEP LEARNING FOR COVID‐19

Given the widespread infection of the coronavirus, ML and DL have been used in improving the performance of traditional techniques for COVID‐19 detection or prediction. ML approaches for COVID‐19 detection, such as supervised and unsupervised learning, are discussed in Section 3.1. Several DL approaches have also been used for COVID‐19 detection; these are discussed in Section 3.2. Section 3.3 describes the use of hybrid approaches that combine ML and DL for detection of COVID‐19 infection. Figure 4 presents the categorized AI techniques (ML and DL) selected in this survey paper.

FIGURE 4

Categorizing of related works

Machine learning approaches

ML instructs computers on what to do and trains them to perform actions independently. It is a data analysis method that involves the development and fitting of models and allows machines to ‘learn’ by practice and make predictions. ML is used for COVID‐19 detection by analysing the content of an input X‐ray or CT image and extracting unique features from it. According to these features, the input image is predicted to be a normal case or an infected case. In general, the ML algorithms used for COVID‐19 diagnosis can be categorized into supervised learning approaches, described in Section 3.1.1, and unsupervised learning approaches, described in Section 3.1.2. Figure 5 shows the ML research for COVID‐19.
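The image‐to‐prediction flow described above (read an image, extract features, predict normal or infected) can be sketched in a few lines. This is a toy illustration and not the method of any reviewed study: the two hand‐crafted features, the nearest‐centroid classifier and the 2x2 "scans" are all assumptions chosen to keep the example self‐contained, and pixel brightness is not a real diagnostic cue.

```python
from statistics import mean, pvariance


def extract_features(image):
    """Turn a 2-D grayscale image (rows of 0-255 ints) into two features:
    mean intensity and intensity variance, both on a 0-1 scale."""
    pixels = [p / 255.0 for row in image for p in row]
    return (mean(pixels), pvariance(pixels))


def fit_centroids(images, labels):
    """'Train' by averaging the feature vectors of each class."""
    grouped = {}
    for image, label in zip(images, labels):
        grouped.setdefault(label, []).append(extract_features(image))
    return {
        label: tuple(mean(dim) for dim in zip(*feats))
        for label, feats in grouped.items()
    }


def predict(image, centroids):
    """Assign the class whose centroid is closest in feature space."""
    f = extract_features(image)
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(f, centroids[label])),
    )


# Invented toy data, for illustration only.
train_images = [
    [[20, 30], [25, 35]],
    [[30, 40], [20, 30]],
    [[200, 180], [220, 60]],
    [[190, 210], [50, 230]],
]
train_labels = ["normal", "normal", "infected", "infected"]

centroids = fit_centroids(train_images, train_labels)
print(predict([[210, 190], [70, 200]], centroids))
```

In the studies reviewed here the feature extractor and classifier are far richer (for example moment‐based descriptors with KNN, or CNN features with an SVM), but the train/predict split has the same shape.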

FIGURE 5

Machine learning research for COVID‐19. (a) Number of machine learning publication. (b) Mapping of machine learning publication

Supervised learning approaches

Many regression, classification and feature extraction methods have been used in the detection of COVID‐19 (Peng & Nagata, 2020). These methods are used to accomplish the following: (i) determination of how the epidemic will end, (ii) prediction of the coronavirus transmission over regions, (iii) analysis of the expansion rate and forms of cure over different countries, (iv) correlation of the effect of weather condition and coronavirus and (v) analysis of the transmission rate of the virus (M. Yadav et al., 2020). Recent studies focused on the use of a logistic forecasting model (Wang, Zheng, et al., 2020), a neural network‐based prediction model (Wieczorek et al., 2020), a hybrid nonlinear autoregressive neural network ensemble model combining neural networks with type‐2 fuzzy logic and the firefly algorithm (Melin et al., 2020a), prediction of COVID‐19 (Kavadi et al., 2020), prediction of respiratory decompensation in COVID‐19 patients with ML (Burdick et al., 2020), development of association rules between weather data and the COVID‐19 pandemic for the prediction of death rate (Malki et al., 2020), classification of images with target outputs such as pneumonia, COVID‐19 and healthy lungs using Q‐deformed entropy and DL features (Hasan et al., 2020), supervised ML models for predicting COVID‐19 cases (Rustam et al., 2020), analysis of the spatial relationship in the spread of COVID‐19 (Melin et al., 2020b), evaluation of health opinions and online contents relevant to COVID‐19 with ML (Sear et al., 2020) and ML techniques for investigating the effects of the COVID‐19 pandemic on young students' activities, mental health and learning styles (Khattar et al., 2020). Siddiqui et al. (2020) investigated the correlation between patient temperature and COVID‐19 case status (i.e., suspected, confirmed and death), using the k‐means algorithm. The methodology encompasses three phases: data collection, database design and clustering.
In the first phase, the dataset used is the ‘coronavirus disease (COVID‐2019) situation reports’ obtained from the WHO. This dataset covers the infection rate in various regions in China. The second phase provides a description of the dataset, which consists of seven features (i.e., region, population [in 10,000s], suspected cases, confirmed cases and death). Two features were added to the dataset, namely, lowest temperature and highest temperature, because patient temperature is considered one of the main factors for identifying COVID‐19 case status. In the last phase, a clustering algorithm based on k‐means is used in finding new trends. The trends demonstrated the impact of temperature on each region in the three COVID‐19 states (i.e., suspected, confirmed and death). Ong, Wong et al. (2020) analysed the existing COVID‐19 vaccine candidates and then proposed a computational model based on ML and reverse vaccinology (RV) for the prediction of candidate proteins for vaccine design. RV applies bioinformatics to pathogen genomes so that potential vaccine candidates can be identified. The dataset used in this study was collected from the UniProt (Bairoch et al., 2005) and NCBI databases. It consists of SARS‐CoV‐2 sequences and all proteins extracted from known human coronavirus strains. The authors relied on Vaxign and Vaxign‐ML (Y. He et al., 2010; Ong, Wang, et al., 2020) to predict proteome biological characteristics and enhanced the Vaxign‐ML model based on RV and ML, using XGBoost, random forest (RF), support vector machine (SVM), k‐nearest neighbours (KNN) and logistic regression (LR) methods in predicting the protein levels of all SARS‐CoV‐2 proteins. Randhawa et al. (2020) used intrinsic COVID‐19 genomic signatures as patterns to establish an alignment‐free decision tree ML approach for predicting the gene sequence of the COVID‐19 virus.
The alignment‐free approach processes only raw DNA sequence data and rapidly delivers taxonomic classifications of novel pathogens. The proposed methodology was tested on a large dataset comprising more than 5000 unique viral genomic sequences. These data were collected from a well‐known repository called Virus‐Host DB. The results showed that the proposed method is an alternative way of analysing pathogen genome sequences and provides precise taxonomic classifications for unseen sequences in real time. Pinter et al. (2020) explored the potential of a hybrid model combining a network‐based fuzzy inference system with a multi‐layered perceptron‐imperialist competitive algorithm to predict the outbreak of COVID‐19 from time series data compiled from Hungarian statistical reports of infected cases and death rates. Three metrics were used in evaluating the performance of the proposed prediction model, namely, mean absolute percentage error (MAPE), root mean square error (RMSE) and the determination coefficient. The proposed prediction model showed promising performance in estimating total mortality and predicting the outbreak of COVID‐19. Elaziz et al. (2020) used CXR images to build a visual diagnostic tool for differentiating between COVID‐19 and normal cases. Initially, features were extracted from CXR images on the basis of fractional multichannel exponent moments. Because this process is time consuming and costly, a parallel multicore computational framework was used to speed up image and data processing. A new feature selection method was also proposed, in which a modified manta‐ray foraging optimization algorithm based on differential evolution (MRFODE) identifies a relevant subset of the extracted features. Iteratively, MRFODE generates a candidate subset of features, which is evaluated with a KNN approach. Two COVID‐19 X‐ray datasets were used in the study.
The first dataset was collected from two resources: images published by Joseph Paul Cohen, Paul Morrison and Lan Dao, and images extracted from 43 different publications. These images were classified into two types: normal and viral pneumonia. This dataset encompassed 216 COVID‐19 positive images and 1675 COVID‐19 negative images. The second dataset was collected by a research team composed of members from different countries, including Qatar, Bangladesh, Pakistan and Malaysia (Chowdhury, Rahman, et al., 2020). The research team added new images from the COVID‐19 database collected by the Italian Society of Medical and Interventional Radiology. This dataset comprised 219 COVID‐19 positive images and 1341 COVID‐19 negative images. The proposed method (i.e., MRFODE) was evaluated using classification accuracy, recall and precision. The results showed that MRFODE achieved promising accuracy in classifying samples from COVID‐19 patients. Fayyoumi et al. (2020) designed an online questionnaire for normal and COVID‐19 cases in Jordan. Data from the questionnaire were used in determining the presence of signs and symptoms in both groups. The researchers created a COVID‐19 dataset of the signs and symptoms of different patients and used it as input to a set of ML models (SVM and multi‐layer perceptron [MLP]) and statistical approaches (i.e., LR) to predict potential patients with COVID‐19. MLP had the best classification accuracy (91.62%), whereas SVM had the best precision (91.67%). Kavadi et al. (2020) proposed an alternative approach for preventing the outbreak of COVID‐19 in India. The researchers used the COVID‐19 Indian database. The proposed approach combines two well‐regarded approaches, namely, nonlinear machine learning (NML) and partial derivative regression.
PDL was used in normalizing the dataset, and then NML was used for prediction. Experimental results demonstrated that this technique is superior to previous works with regard to classification accuracy and prediction time. Wang, Zheng et al. (2020) predicted the COVID‐19 trend using a logistic modelling method and the FbProphet ML algorithm. The dataset used in this study is the most recent COVID‐19 epidemiological time series dataset at the country level. Many countries were involved in this study, including Brazil, Russia, India, Peru and Indonesia. In practice, the cap value of the epidemic trend was estimated through logistic modelling. Then, the result was used in building the FbProphet learning model for predicting the epidemic trend of COVID‐19. According to the results of the experiment, the proposed model will enable decision‐makers in a given country to act rapidly during a COVID‐19 outbreak. Burdick et al. (2020) proposed a prediction model for ventilation needs among COVID‐19 patients based on the XGBoost classifier, which combines multiple decision trees and aggregates their results into one score. In this study, the dataset was collected from five United States health care systems and covers patients who were enrolled at or admitted to these five hospitals between 24 March 2020 and 4 May 2020. For each patient, 12 variables were measured, such as diastolic blood pressure (DBP), temperature and blood urea nitrogen (BUN). The proposed model produced a higher diagnostic ratio for predicting ventilation than the Modified Early Warning Score (MEWS). It also offered a good compromise between sensitivity and specificity, with higher sensitivity (0.90) and higher specificity (p < 0.05) than MEWS. De Felice and Polimeni (2020) used an ML bibliometric methodology to analyse and evaluate research trends in COVID‐19.
Research papers were obtained from Scopus and contained information regarding countries, outputs, journals, institutions, funding, keywords and citation counts. The results showed a significant increase in the number of published documents for the studied period, determined the clinical features of patients with COVID‐19 and identified COVID‐19 as the most common topic. Samuel et al. (2020) determined public opinion about the pandemic using tweets that mention the coronavirus. In addition, the authors presented insights about how worry‐sentiment progressed over time as COVID‐19 reached maximum levels in the United States. These insights were demonstrated with textual analytics supported by data visualization of the text. In the context of textual analytics, two essential ML methods were applied, and their effectiveness was compared in categorizing coronavirus tweets of variable lengths. Both ML methods obtained high classification accuracy for short tweets. However, the accuracy obtained by the two methods was not promising for longer tweets. The ability of ML models to forecast the number of patients affected by COVID‐19 was demonstrated in Rustam et al. (2020). Four standard models were studied, namely, least absolute shrinkage and selection operator (LASSO), exponential smoothing (ES), linear regression (LR) and SVM, to anticipate the threatening factors of COVID‐19. Each model made three types of prediction for the next 10 days: the number of newly infected cases, the number of deaths and the number of recoveries. The current scenario of COVID‐19 was used to evaluate these methods. The results show that the ES model performs best in predicting new confirmed cases when compared with the other models. Sedik et al. (2020) presented two data augmentation tactics to enhance the learning process of both a convolutional neural network (CNN) and an LSTM and consequently increase the accuracy of COVID‐19 detection. The results show that the proposed strategies improve logarithmic loss, detection accuracy and testing time. Cobb and Seale (2020) studied COVID‐19 growth trends in US counties, considering the existence of shelter‐in‐place (SIP) orders in these counties. They calculated compound growth rates using cumulative confirmed COVID‐19 cases over a period of 2 months and 10 days, from 21 January 2020 to 31 March 2020. The calculation provided a single number indicating the speed of the virus during the studied period, which was used by a random forest ML model. The results showed that SIP orders effectively reduced the COVID‐19 case rate, particularly in counties with large populations. Figure 6 shows an example of supervised learning approaches for COVID‐19.
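To make the "single number" in the Cobb and Seale (2020) summary concrete, the sketch below computes a compound daily growth rate from two cumulative case counts. The formula is the standard compound growth definition, (last / first)^(1 / days) - 1, which we assume is what is meant; the study's exact procedure is not given here, and the county counts are invented for illustration.

```python
from datetime import date


def compound_daily_growth(first_count, last_count, first_day, last_day):
    """Average compound growth rate per day between two cumulative counts."""
    days = (last_day - first_day).days
    if days <= 0 or first_count <= 0:
        raise ValueError("need a positive interval and a positive starting count")
    return (last_count / first_count) ** (1 / days) - 1


# The study window quoted in the text: 21 January to 31 March 2020.
window_start = date(2020, 1, 21)
window_end = date(2020, 3, 31)

# Hypothetical county growing from 1 to 1024 cumulative cases (made-up numbers).
rate = compound_daily_growth(1, 1024, window_start, window_end)
print(f"{(window_end - window_start).days} days, {rate:.2%} growth per day")
```

A rate like this condenses a whole county time series into one feature, which is what makes it convenient as an input to a downstream model.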

FIGURE 6

Supervised learning approaches for COVID‐19 (Asnaoui et al., 2020)
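Several of the forecasting studies in this subsection are evaluated with mean absolute percentage error, root mean square error and the determination coefficient (for example Pinter et al., 2020). The following is a minimal sketch of these three metrics using their textbook definitions; the function names and the sample numbers are ours and do not come from any reviewed study.

```python
from math import sqrt


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the same units as the data."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Determination coefficient: 1 - residual sum of squares / total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Invented daily case counts and forecasts, for illustration only.
observed = [100, 200, 300, 400]
forecast = [110, 190, 310, 390]

print(mape(observed, forecast), rmse(observed, forecast), r_squared(observed, forecast))
```

MAPE is scale‐free and so is convenient for comparing countries of different sizes, whereas RMSE penalizes large misses more heavily.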

Unsupervised learning approaches

In contrast to supervised learning, unsupervised learning (i.e., clustering) works with unlabelled data. The aim is to gather the data into clusters of similar items without any additional knowledge. In general, the major clustering techniques can be split into two categories: hierarchical clustering and partition clustering. The hierarchical methods can be classified as divisive (top‐down approach) or agglomerative (bottom‐up approach). Divisive clustering starts with all objects in one cluster and then attempts to divide them into smaller clusters until a stopping criterion is satisfied. In contrast, agglomerative clustering starts by considering each object as a separate cluster and then consolidates them into larger clusters until a stopping criterion is satisfied (Abasi, Khader, Al‐Betar, Naim, Makhadmeh, & Alyasseri, 2020a). Partition algorithms are intended to classify text documents into separate clusters. Typically, these methods use each cluster centroid to attract similar data (Abasi, Khader, Al‐Betar, Naim, & Alyasseri, et al., 2020). The ultimate aim of these methods is to efficiently distribute a massive amount of data into a collection of clusters that differ from one another while each contains homogeneous data (Abasi et al., 2021). Figure 7a shows an example of agglomerative and divisive hierarchical clustering applied to a dataset of five objects, {a, b, c, d, e}, and Figure 7b shows the structure of partition clustering applied to a dataset of fourteen clusters.
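The bottom‐up procedure can be illustrated on five objects named a to e, as in Figure 7a. The sketch below assumes single‐linkage distance on one‐dimensional positions and uses a target number of clusters as the stopping criterion; the positions are invented, and none of this is taken from the cited works.

```python
def agglomerative(points, target_clusters=1):
    """Single-linkage agglomerative clustering on 1-D points.

    points: dict mapping object name -> numeric position.
    Returns (merges, clusters): the merges performed in order, as
    (left members, right members, distance), and the final clusters.
    """
    clusters = [frozenset([name]) for name in points]
    merges = []
    while len(clusters) > target_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(points[x] - points[y])
                        for x in clusters[i] for y in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges, [sorted(c) for c in clusters]


# Five objects with made-up positions.
positions = {"a": 10, "b": 15, "c": 50, "d": 52, "e": 90}

merges, final = agglomerative(positions, target_clusters=2)
for left, right, distance in merges:
    print(left, "+", right, "at distance", distance)
print(final)
```

Divisive clustering would traverse the same hierarchy in the opposite direction, starting from one cluster that holds all five objects.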

FIGURE 7

(a) Application of agglomerative and divisive to a dataset of five objects, {a, b, c, d, e} (b) Application of partition clustering to a dataset of fourteen clusters

Partition clustering is the technique most often applied by researchers in data clustering because it requires less execution time than hierarchical clustering techniques (Abasi et al., 2019b). In addition, evaluation results show that partition clustering produces the best performance on different multi‐scale text collections (Abasi, Khader, Al‐Betar, Naim, Makhadmeh, & Alyasseri, 2020b). A clustering model is proposed in Cui et al. (2020) to identify the latent clusters in COVID‐19 patients. The study was conducted on a dataset of more than six thousand adult patients who screened positive for SARS‐CoV‐2 infection at the Mount Sinai Health System in New York, USA. Patient diagnoses were mapped to chronicity and one of 18 body structures, and the optimum number of clusters was estimated by applying the K‐means algorithm and the elbow approach. Four clusters were identified. Patients who had positive COVID‐19 scans had respiratory problems. Patients with high comorbidity and chronic diseases were the most commonly affected. Age is also a major factor, but comorbidity and chronicity are strongly associated. Moreover, patients with a history of immune system disorders and metabolic or genitourinary diseases are more vulnerable to severe problems or medical conditions in the circulatory system when they are infected. The discovery of these four clusters is a significant step in identifying the path of the disease and patient treatment and subsequently improving disease prevention. To identify data‐driven clusters of countries for predicting the effect of COVID‐19, k‐means is used in Carrillo‐Larco and Castillo‐Cara (2020). The clustering was informed by health system coverage, socio‐economic status, metrics of air pollution and estimated disease prevalence.
The researchers compared the clusters in terms of case fatality rate, number of deaths, number of confirmed COVID‐19 cases and the order in which each country identified its first case. The model was developed with data from 155 countries and used in defining clusters. Three principal component analysis (PCA) parameters are used in the model, and five or six clusters achieve the optimal result in grouping countries into related sets. The findings indicate that a model of five or six clusters can stratify countries according to reported numbers of COVID‐19 cases. However, the model could not stratify countries in terms of number of fatalities. Figure 8 shows an example of deep learning steps for COVID‐19.
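Both clustering studies above choose the number of clusters by running k‐means for several values of k and looking for the "elbow" where the within‐cluster sum of squares stops dropping sharply. Below is a minimal sketch of that idea with a plain Lloyd's k‐means; the deterministic initialisation, the helper names and the nine toy points are our assumptions and are unrelated to the patient or country data used in those studies.

```python
def dist2(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, iterations=50):
    """Lloyd's k-means on 2-D points; returns (centroids, inertia)."""
    ordered = sorted(points)
    # Deterministic start: k points spaced evenly through the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist2(p, centroids[c]))
            groups[nearest].append(p)
        updated = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
            if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if updated == centroids:  # assignments stopped changing
            break
        centroids = updated
    inertia = sum(min(dist2(p, c) for c in centroids) for p in points)
    return centroids, inertia


# Three visually obvious groups of made-up points.
data = [(0, 0), (0, 1), (1, 0),
        (10, 10), (10, 11), (11, 10),
        (20, 0), (20, 1), (21, 0)]

inertias = {k: kmeans(data, k)[1] for k in range(1, 5)}
for k, value in inertias.items():
    print(k, round(value, 3))
```

On this toy data the inertia falls steeply up to k = 3 and barely changes afterwards, which is the pattern the elbow approach looks for.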

FIGURE 8

Deep learning steps for COVID‐19 (Ozturk et al., 2020)

Deep learning approaches for COVID‐19

The main DL algorithms used for COVID‐19 are CNN, which is described in Section 3.2.1; DNN, which is described in Section 3.2.2; RNN, which is described in Section 3.2.3; and GANs, which are described in Section 3.2.4. Hybrid approaches between ML and DL are described in Section 3.3. Figure 9 shows the DL publications for COVID‐19 in different countries.

FIGURE 9

Deep learning publication in different countries for COVID‐19. (a) Number of deep learning publication. (b) Mapping of deep learning publication

DL algorithms are widely utilized to handle COVID‐19 from different perspectives. Rahman et al. (2020) argue that the 5G RAN is incorporated with edge computing. A local DL mechanism was distributed to COVID‐19 edges, and the method used a global DL framework incorporated with three‐phase reconciliation for the management of the cloud environment. Their proposed DL model supported experts in the COVID‐19 domain in adding semantics to key decision‐making. Patients with COVID‐19 are also classified with a DL mechanism called DenseNet201 (Jaiswal et al., 2020). This classification method is based on chest CT scans, and a pretrained DL system is used to diagnose coronavirus infection. The ImageNet database was used in assessing the proposed method. The classification model showed competitive results. In another study, COVID‐19 cases were classified from CT scans with entropy and a DL mechanism (Hasan et al., 2020). Initially, CT images were sliced for the reduction of intensity variations. Thereafter, the backgrounds of the CT images were isolated with a histogram thresholding mechanism. Next, features were extracted from each CT lung scan with a Q‐deformed entropy algorithm combined with a DL mechanism. The extracted features were then used for classifying the CT images with a long short‐term memory (LSTM) neural network. COVID‐19 infection was also predicted using DL approaches (Alakus & Turkoglu, 2020). Laboratory data and DL were used in determining which patients have COVID‐19 infection. The proposed method was able to recognize patients with COVID‐19 with an accuracy rate of 86.66%. Three classes can be used for COVID‐19 detection from CXR images, namely, coronavirus, pneumonia and normal (Toğaçar, Ergen, & Cömert, 2020). The fuzzy colour method is used in restructuring data, and the structured data are stacked.
Then, DL models (i.e., MobileNetV2 and SqueezeNet) were used together with a feature selection method called Social Mimic optimisation. Finally, an SVM was used to classify the selected features. The proposed approach reached an accuracy of 99.27%. CXR images have also been used to diagnose COVID‐19 with patch‐based CNN architectures. This approach provides interpretable saliency maps for COVID‐19 diagnosis and patient triage. In another study (Yoo et al., 2020), a decision tree classifier combined with DL was used to diagnose COVID‐19 infection from CXR images. The proposed model contained three binary decision tree (BDT) classifiers, each trained on a CNN model. The first BDT classified a CXR image as either normal or abnormal. The second BDT identified signs of tuberculosis in abnormal images, and the third BDT identified COVID‐19 in abnormal images. The accuracy reached 95%. Similarly, the model in Wang, Liu, et al. (2020) used CXR images with three classes: normal, viral pneumonia and COVID‐19. The accuracy achieved by the model was 96.1%. CNNs have been used to determine the severity of lung disease during COVID‐19 infection (Abdulkareem, Sani, et al., 2021; Zhu et al., 2020). This is achieved with radiologist scores of illness severity taken from portable CXR images. Such a CNN can stage the severity of lung disease, support prognosis, and anticipate the response to medication. Spatial Transformer Networks, a type of neural network, have been adapted for analysing lung ultrasonography (LUS) images of COVID‐19 cases at different stages (Roy et al., 2020). The aim is to predict the disease severity score and thus diagnose COVID‐19 infection from LUS images. Sequence prediction with DL has been used to monitor COVID‐19 infection and the recovery process (Heni, 2020), with the aim of measuring the impact of Bacille Calmette‐Guérin and tuberculosis infection rates in such populations. 
DL methods used during the COVID‐19 outbreak are described in Hurt et al. (2020), where chest radiographs obtained from five patients with COVID‐19 treated in the US and China were used. Deep transfer learning reuses the knowledge of a learned model to solve a new, similar task with minimal retraining, and it can be combined with edge devices (EDs) such as intelligent medical equipment, webcams, drones, IoT devices and robots. A survey of such techniques is presented by Sufian et al. (2020) to show how this infrastructure can support automation during an outbreak. These techniques categorise COVID‐19 using X‐ray images, CT images and viral genome sequences. Some researchers predict COVID‐19 with transfer learning, in which a model trained on one task is reused for another task after some tuning to the new task. Minaee et al. (2020) applied transfer learning to detect COVID‐19 from CXR images. The authors trained four CNNs, namely, DenseNet‐121, ResNet50, ResNet18 and SqueezeNet. The evaluation results show that most models achieve a sensitivity of around 98% and a specificity of about 90%. Similarly, Vaid, Kalantar, and Bhandari (2020) used CNNs with transfer learning to improve accuracy when recognizing COVID‐19 from CXR scans. They reported an accuracy of 96.3% and a binary cross‐entropy loss of 0.151. In a recent survey, Farhat et al. (2020) summarized the state‐of‐the‐art methods based on DL and pulmonary medical images. They noted that CT scans were preferred over CXRs for better prediction. Lung ultrasound (LUS) imaging is considered a cheaper and safe real‐time imaging technique that can be used in COVID‐19 diagnosis. Roy et al. (2020) developed a fully annotated dataset of LUS images. The proposed dataset is used with Spatial Transformer Networks to predict the severity score of the illness. 
In addition, they introduce a new uninorms‐based approach for video‐level score aggregation. Different DL models are benchmarked on the proposed dataset for predicting pixel‐level segmentation of COVID‐19 imaging biomarkers.
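The cascaded structure described above for Yoo et al. (2020), in which one binary classifier separates normal from abnormal images and further binary classifiers examine only the abnormal ones, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the image is represented by a hypothetical dictionary of scores, `threshold_classifier` is a stand‐in for a trained CNN‐based classifier, and checking tuberculosis before COVID‐19 is an assumed ordering.

```python
from typing import Callable, Dict

# Hypothetical image representation: pre-computed scores in [0, 1].
Image = Dict[str, float]
BinaryClassifier = Callable[[Image], bool]


def threshold_classifier(key: str, cutoff: float = 0.5) -> BinaryClassifier:
    """Stand-in for a trained binary classifier (assumption, not a real model)."""
    return lambda image: image.get(key, 0.0) >= cutoff


def cascade_diagnose(
    image: Image,
    is_abnormal: BinaryClassifier,
    has_tuberculosis: BinaryClassifier,
    has_covid: BinaryClassifier,
) -> str:
    """Apply three binary decisions in sequence and return a label."""
    # Stage 1: normal vs. abnormal; normal images leave the cascade here.
    if not is_abnormal(image):
        return "normal"
    # Stage 2: only abnormal images are checked for tuberculosis.
    if has_tuberculosis(image):
        return "tuberculosis"
    # Stage 3: remaining abnormal images are checked for COVID-19.
    if has_covid(image):
        return "covid-19"
    return "abnormal (other)"


is_abnormal = threshold_classifier("abnormal")
has_tuberculosis = threshold_classifier("tb")
has_covid = threshold_classifier("covid")

print(cascade_diagnose({"abnormal": 0.9, "covid": 0.7},
                       is_abnormal, has_tuberculosis, has_covid))
```

One consequence of this design is that an error in the first stage cannot be corrected later: an image wrongly labelled normal never reaches the disease‐specific classifiers.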

Convolutional neural network

As stated in Albawi et al. (2017), the term ‘deep learning’ applies to multi‐layered artificial neural networks (ANNs). Deep learning has become one of the most effective tools of the last few decades and is prominent in the literature because it can handle immense quantities of data. Networks with deeper hidden layers have recently begun to outperform classical methods in various fields, particularly in pattern recognition. CNNs are among the most common DNNs. The name derives from convolution, a linear mathematical operation between matrices. A CNN has several types of layers, including convolutional, nonlinearity, pooling and fully connected layers. The convolutional and fully connected layers have parameters that need to be set. CNNs achieve outstanding results in ML problems, particularly in applications dealing with image data, such as the largest image recognition datasets, computer vision and natural language processing. Albawi et al. (2017) explained the elements of a CNN, how these elements work, and the main issues related to CNNs. They also defined the parameters that affect CNN efficiency. Their paper assumed that readers have adequate knowledge of both machine learning and artificial neural networks. Alazab et al. (2020) presented an AI technique that used a deep CNN to recognize patients with COVID‐19 using two real‐world datasets collected from Australia and Jordan. Their algorithm was tested using 1000 X‐ray images of real patients. Experimentally, their technique detected COVID‐19 cases with an accuracy of 95%–99%. Their technique was also used to predict the numbers of confirmed COVID‐19, recovered and death cases over the next 7 days with two forecasting methods, namely, the autoregressive integrated moving average model and LSTM. Datasets from Australia and Jordan were used for training and testing purposes. 
The numbers of confirmed COVID‐19, recovered and death cases in Australia and Jordan were predicted with average accuracies of 94.80% (Australia) and 88.43% (Jordan). Image biomarkers for COVID‐19 can be extracted automatically from X‐ray images using a deep CNN. Apostolopoulos et al. (2020) used a CNN called MobileNet to analyse how effectively the extracted features classify COVID‐19 biomarkers from X‐ray images. The outcome showed a classification accuracy of 87.66% across seven classes, with 99.18% accuracy in distinguishing COVID‐19 from non‐COVID‐19 and sensitivity and specificity of 97.36% and 99.42%, respectively. In addition, Civit‐Masot et al. (2020) utilized the VGG16 architecture (Simonyan & Zisserman, 2014) to create a DL model to identify COVID‐19 from X‐ray pulmonary images. The result showed a high sensitivity of around 100% and high specificity. Brunese et al. (2020) presented a DL approach for COVID‐19 recognition from X‐rays. The approach had three steps. First, the CXR was checked for signs of pneumonia. Second, COVID‐19 was differentiated from pneumonia. Finally, the location of COVID‐19 in the X‐ray was identified. The obtained results were promising: the time needed for detection was approximately 2.5 s and the average accuracy was 97%. To address the problem that only a small collection of CXR images is available for COVID‐19, Oh et al. (2020) introduced a patch‐based CNN with a small number of trainable parameters. In this method, the classification outcome was obtained by majority voting over the inference results from multiple patch locations. The overall accuracy of the method was 91.9%, which is close to the 92.4% of COVID‐Net (Wang & Wong, 2020). Sedik et al. (2020) addressed the same issue by proposing two data‐augmentation models to improve the learning process of both a CNN and a convolutional LSTM in the classification of medical images (X‐ray and CT). The aim was to enhance the accuracy of COVID‐19 detection. 
The results revealed improvements in logarithmic loss, testing time and detection accuracy compared with DL models that do not use data augmentation. Islam et al. (2020) developed an automated system for disease identification. The system was intended to help physicians diagnose COVID‐19 and to provide fast, reliable and clear results, which could reduce mortality risk. The work presented a deep learning strategy based on LSTM and CNN models to diagnose COVID‐19 automatically from CXR images. The CNN was used to extract deep features, and the LSTM was used to classify the extracted features. The system used a collection of 4575 X‐ray images, including 1525 COVID‐19 images. The test results indicated that the proposed method achieved 99.4% accuracy, 99.9% AUC, 99.2% specificity, 99.3% sensitivity and a 98.9% F1 score. An online learning model based on CNN and real‐time data with adaptive algorithms was presented in Farooq and Bazaz (2020). A further difficulty with continuously evolving training data is that modelling and simulation require model parameters to be tuned over time. The key feature of the algorithm is that, unlike traditional DL approaches, it removes the need to redesign the model when new data are obtained in a constantly evolving training scenario. As a proof of concept, the authors used the model to study the effects of various disease response strategies. Finally, the authors proposed and modelled a strategy for controlled natural immunization through risk‐based population compartmentalization (PC), which divides the population into low‐risk (LR) and high‐risk (HR) compartments depending on risk factors (e.g., comorbidities and age). Panwar et al. (2020) employed CT scans and X‐rays of the lungs for COVID‐19 detection. 
The authors proposed a DL neural network system, nCOVnet, which can be used to examine patients' X‐rays as an alternative fast testing tool for detecting COVID‐19. The proposed model achieved a true positive rate of 97% after the use of successive CNN layers and unbiased datasets. Ardakani et al. (2020) stated that rapid diagnostic approaches can help in treating patients in labour‐intensive situations and can facilitate the monitoring and prevention of the spread of pandemic illnesses, such as COVID‐19. In this work, 1020 CT images were obtained from 108 COVID‐19 patients and 86 non‐COVID‐19 patients with typical and viral pneumonia. Ten well‐known convolutional neural networks were used to differentiate COVID‐19 infection from non‐COVID‐19 classes: GoogleNet, AlexNet, VGG‐16, VGG‐19, SqueezeNet, ResNet‐18, MobileNet‐V2, ResNet‐101, Xception and ResNet‐50. ResNet‐101 and Xception provided the best results among the networks. ResNet‐101 differentiated COVID‐19 cases from non‐COVID‐19 cases with an AUC of 0.994, a specificity of 99.02% and a sensitivity of 100%. The Xception model attained a sensitivity of 98.04% and a specificity of 100%. However, the radiologist's performance was moderate by comparison in terms of AUC, sensitivity, specificity and accuracy. They concluded that ResNet‐101 can be used as a high‐sensitivity model for predicting COVID‐19 infections and as an alternative tool in radiology departments. Al‐Waisy et al. (2021) proposed a parallel architecture (COVID‐DeepNet) that integrates a deep belief network and a convolutional deep belief network, both trained from scratch on a large‐scale dataset. The system accurately diagnosed patients with COVID‐19, with a detection accuracy rate of 99.93%, sensitivity of 99.90%, specificity of 100%, precision of 100%, F1‐score of 99.93%, MSE of 0.021% and RMSE of 0.016%. Paluru et al. 
(2021) proposed an anamorphic depth embedding‐based lightweight CNN, called Anam‐Net, to segment anomalies in COVID‐19 chest CT images. The state‐of‐the‐art UNet (or its variants) has 7.8 times as many parameters as the proposed Anam‐Net. Thus, Anam‐Net is lightweight and can run inference on mobile or resource‐constrained (point‐of‐care) platforms. Its suitability for point‐of‐care platforms was demonstrated by deploying it on embedded systems, such as Raspberry Pi 4 and NVIDIA Jetson Xavier, and in a mobile Android application (CovSeg) embedded with Anam‐Net. Monshi et al. (2021) proposed the CovidXrayNet model, based on EfficientNet‐B0, in which the data augmentation and the CNN hyperparameters are optimized for validation accuracy when detecting COVID‐19 from CXRs. CovidXrayNet achieves 95.82% accuracy on the COVIDx dataset with only 30 epochs of training. This optimization also increases the accuracy of popular CNN architectures, such as the Visual Geometry Group network (VGG‐19) and the Residual Neural Network (ResNet‐50), by 11.93% and 4.97%, respectively. Ismael and Şengür (2021) presented three deep‐learning‐based approaches, namely deep feature extraction, fine‐tuning of pretrained CNNs and end‐to‐end training of a developed CNN model, to classify COVID‐19 and normal (healthy) CXR images. For deep feature extraction, pretrained deep CNN models (ResNet18, ResNet50, ResNet101, VGG16 and VGG19) were used. A dataset containing 180 COVID‐19 and 200 normal (healthy) CXR images was used. The deep features extracted from the ResNet50 model, combined with an SVM classifier with a linear kernel function, produced an accuracy of 94.7%, which was the highest among the obtained results. The fine‐tuned ResNet50 model achieved 92.6%, whereas end‐to‐end training of the developed CNN model produced 91.6%.
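The patch‐level majority voting mentioned above for Oh et al. (2020) can be illustrated with a short sketch. It assumes that a classifier has already produced one label per image patch; the label strings and the tie‐breaking rule (the earliest label among those with the top count wins) are assumptions made for this example and are not taken from the original study.

```python
from collections import Counter
from typing import Sequence


def majority_vote(patch_labels: Sequence[str]) -> str:
    """Return the most frequent label among per-patch predictions."""
    if not patch_labels:
        raise ValueError("at least one patch prediction is required")
    counts = Counter(patch_labels)
    top_count = max(counts.values())
    # Assumed tie-break: keep the first label (in input order) with the top count.
    for label in patch_labels:
        if counts[label] == top_count:
            return label
    raise AssertionError("unreachable: some label always has the top count")


# Hypothetical predictions for five patches cropped from one chest X-ray.
patches = ["covid-19", "normal", "covid-19", "pneumonia", "covid-19"]
print(majority_vote(patches))
```

Voting over many patches lets a small network trained on limited data still produce an image‐level decision, because occasional wrong patch predictions are outvoted.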

Deep neural network

DL strategies have attracted growing interest since a fast learning algorithm for deep belief networks was proposed in 2006 (Liu et al., 2017), owing to their inherent ability to overcome the drawbacks of traditional neural networks. DL strategies have also been identified as suitable for large‐scale data analysis, with applications in computer vision, pattern recognition, speech recognition, natural language processing and recommendation systems. Some DL architectures and their practical applications were discussed in Liu et al. (2017). Four DL architectures were presented, namely, autoencoders, neural networks, Boltzmann machines and deep belief networks. A clear rationale was provided for selecting suitable DNN models for particular problems, such as speech processing, pattern detection and computer vision. According to Das et al. (2020), checking for infection with RT‐PCR kits takes 6–9 h. Owing to its lower sensitivity, RT‐PCR yields a high rate of incorrect results. To address this problem, X‐ray and computed tomography (CT) radiology are used for COVID‐19 detection and diagnosis. The authors developed an automated recognition technique based on DL. CXRs are analysed directly using DL techniques to improve the algorithm operation. The methods train network weights on large datasets and fine‐tune the weights of pretrained networks on small datasets. Wang, Zheng et al. (2020) retrospectively investigated 5372 patients with CT images from seven cities or provinces. First, the DL method was pretrained using CT images from 4106 patients, which allowed it to learn lung features. Then, to train and externally validate the DL framework, 1266 patients (924 with COVID‐19, of whom 471 had follow‐up of more than 5 days, and 342 with other pneumonia) from six cities or provinces were enrolled. 
The DL method was effective in distinguishing COVID‐19 from other pneumonia (AUC 0.87 and 0.88) and viral pneumonia (AUC 0.86) in the four external validation sets. The DL method also stratified patients into groups with a substantial difference in the length of hospital stay. The DL system automatically focused on the suspicious areas without human intervention, and these areas exhibited characteristics similar to documented radiological findings. DL thus offers a simple way to screen rapidly for COVID‐19 and to recognize potential high‐risk patients, which may help in optimizing care services and preventing serious symptoms. According to Hu et al. (2020), the onset of serious illness can lead to death from severe alveolar damage and progressive respiratory failure. RT‐PCR is the gold standard for clinical diagnosis but can produce false negatives. Furthermore, a shortage of RT‐PCR testing resources may delay clinical decisions and care in a pandemic situation. Chest CT imaging is a powerful tool for the diagnosis and prognosis of patients with COVID‐19. This work proposed a weakly supervised deep learning technique for detecting and classifying COVID‐19 infections in CT images. The technique may reduce the need for manual labelling of CT images and distinguish COVID‐19 cases from non‐COVID‐19 cases. Based on the positive findings, the authors envisaged a wide deployment of their technique in large‐scale clinical studies. To cope with variations in the size and location of lesions, the authors proposed a multi‐scale learning scheme. The feature maps of intermediate CNN layers (i.e., Conv3, Conv4 and Conv5) were fed into classification layers implemented with 1 × 1 convolutions. The results obtained with the proposed technique were compared with those obtained by radiologists (0.994 vs. 0.873). 
The salp swarm optimisation algorithm has been used with DL for the gene selection of COVID‐19 (Altan & Karasu, 2020), and methods that reduce the death rate have attracted considerable interest (Farooq & Bazaz, 2020). Altan and Karasu (2020) proposed a hybrid method combining a chaotic salp swarm algorithm (CSSA) with a CNN to recognize COVID‐19 pneumonia infection in X‐ray datasets. The main purpose of the CSSA is to optimize the diagnosis of coronavirus pneumonia. The proposed hybrid technique achieved high accuracy in detecting COVID‐19 from X‐ray images.
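Several of the studies above report the area under the ROC curve (AUC). As a reminder of what this number measures, the sketch below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores are invented for illustration, and the pairwise loop is chosen for clarity rather than speed.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive (1) and negative (0) scores."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Illustrative data: 1 = COVID-19, 0 = other pneumonia (made-up scores).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why values such as 0.87 and 0.994 are not directly comparable across studies that use different test sets.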

Recurrent neural network

Predicting new infections and recoveries is important for planning resource distribution and updating curfew rules to slow disease progression. Zeroual et al. (2020) compared five DL methods for predicting the numbers of new and recovered COVID‐19 cases within 17 days. The compared methods were LSTM, simple RNN, gated recurrent units, Variational AutoEncoder (VAE) and bidirectional LSTM (BiLSTM). The study used data collected from Spain, Italy, China, France, Australia and the USA. The results showed that VAE was superior to the other algorithms in terms of loss, MAE, RMSE, MAPE, EV and RMSLE. Clinical survival analysis can be used to forecast the probability of a clinical outcome. Liang et al. (2020) used a DL‐based survival model to forecast the risk that patients with COVID‐19 progress to severe illness, based on clinical characteristics obtained at admission. The proposed model was evaluated using data from three separate cohorts from Guangdong, Hubei and Wuhan, China. The developed model was deployed on an online calculation platform, https://aihealthcare.tencent.com/COVID19-Triage_en.html, which was used to prioritize patients at admission so that patients at high risk of developing serious illness could receive the care they needed as soon as possible. Arora et al. (2020) applied DL models to predict the positive cases of COVID‐19 in India. Their method used RNN‐based LSTM variants, namely deep LSTM, convolutional LSTM and bi‐directional LSTM. Experimentally, the bi‐directional LSTM obtained the best results, whereas the convolutional LSTM had the worst results in terms of prediction errors. For daily and weekly predictions, the bi‐directional LSTM achieved very good results, with errors of 3% or less for short‐term prediction.
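The forecasting studies above compare models with error measures such as MAE, RMSE, MAPE and RMSLE. The following sketch shows how these four measures are commonly computed from actual and predicted case counts. The numbers are invented for illustration, and the function assumes strictly positive actual values (needed for MAPE) and non‐negative predictions (needed for RMSLE).

```python
import math
from typing import Dict, Sequence


def forecast_errors(actual: Sequence[float],
                    predicted: Sequence[float]) -> Dict[str, float]:
    """Compute MAE, RMSE, MAPE (in percent) and RMSLE for a forecast."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE divides by the actual value, so actual values must be non-zero.
    mape = 100.0 * sum(abs(e) / abs(a) for a, e in zip(actual, errors)) / n
    # RMSLE compares log(1 + x), which penalizes relative rather than absolute error.
    rmsle = math.sqrt(
        sum((math.log1p(p) - math.log1p(a)) ** 2
            for a, p in zip(actual, predicted)) / n
    )
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "RMSLE": rmsle}


# Made-up daily case counts and forecasts.
print(forecast_errors([100, 200], [110, 180]))
```

MAE and RMSE are expressed in cases, so they depend on the size of the outbreak, whereas MAPE and RMSLE are relative measures and are easier to compare across countries.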

Generative adversarial networks

Loey, Smarandache et al. (2020) introduced deep transfer learning combined with a GAN for recognizing COVID‐19 cases from CXR images. The approach was tested using a dataset of 307 images classified into four classes: viral pneumonia, COVID‐19, bacterial pneumonia and normal. Three deep transfer models were used, namely, AlexNet, GoogleNet and ResNet18. Three experimental scenarios were proposed for the recognition of COVID‐19 cases. The first scenario included four classes of images, the second included three classes and the third included two classes. The COVID‐19 images were included in the dataset used in every scenario. Experimentally, GoogleNet achieved the best accuracy in the first scenario, AlexNet showed the highest accuracy in the second scenario, and GoogleNet obtained the best accuracy in the third scenario. Jamshidi et al. (2020) used DL techniques, such as GANs, extreme learning machines and LSTM, to diagnose patients with COVID‐19. They proposed an applied bioinformatics strategy that integrates numerous aspects of information from a range of structured and unstructured data sources into user‐friendly platforms for doctors and scientists. The key benefit of these AI systems is that they accelerate the diagnosis and treatment of COVID‐19.

Hybrid approach combining machine and deep learning for COVID‐19

An iteratively pruned DL model was introduced in Rajaraman, Siegelman, et al. (2020) for X‐ray images with pulmonary manifestations of COVID‐19. In this method, a custom CNN and an ImageNet‐pretrained model were used to learn feature representations specific to COVID‐19. The learned knowledge was then used to categorize cases as COVID‐19 viral abnormality, normal or bacterial pneumonia. Experimentally, the proposed model achieved excellent results, with an accuracy of 99.01% and an AUC of 0.9972. Rajaraman and Antani (2020) enlarged the training data for COVID‐19 detection with weakly labelled data, because COVID‐19 has similar characteristics to other pulmonary viral pathogens. The training data were extended with weakly labelled X‐ray images showing bacterial or viral pneumonia. The selected images were used to train a CNN, and the results were compared with those of a model trained on non‐augmented data. Six datasets were used for evaluation. Experimentally, training with weakly labelled data augmentation was better than the non‐augmented baseline at identifying COVID‐19 manifestations as viral pneumonia. Similarly, Zhu et al. (2020) applied a deep‐learning CNN to determine the severity of lung illness in patients with COVID‐19. Their approach was tested using a real‐world dataset of 131 CXRs obtained from 84 patients with COVID‐19 in US hospitals. Approximately 80% of the data were used for training, and 20% were used for testing. Correlation analysis and mean square error analysis were used to assess the proposed approach. The results were satisfactory on this small dataset, but the authors noted that their approach should be tested with a larger dataset. Furthermore, the authors suggested that their approach may be used to identify the severity of lung disease in patients with COVID‐19 and to examine illness development and response to treatment. Z. Li et al. 
(2020) developed a DL‐based AI system to recognize COVID‐19‐infected lung regions and to evaluate illness severity and progression with thick‐section chest CT images. Their system was tested using a dataset of 531 CT scans collected from 204 patients with COVID‐19. The outputs of the system were compared with the patient diagnosis reports and key radiological features using the receiver operating characteristic curve and Cohen's kappa. The results showed that the system can segment lung infection regions and support the diagnosis and follow‐up treatment of COVID‐19 patients using CT scans. The authors mentioned several limitations of their system: 1) organ movement due to breathing and heart motion may cause wrong diagnosis, 2) the system was tested based on changes in imaging biomarkers at the whole‐lung level and 3) the system was tested using CT scans of COVID‐19 patients only. Another DNN algorithm for recognizing and categorizing COVID‐19 infection from CT images was developed in Hu et al. (2020). The proposed algorithm distinguished patients with COVID‐19 from patients without pneumonia and those with community‐acquired pneumonia. It also identified the precise locations of lesions or inflammation regions caused by COVID‐19, which helped in determining the severity of illness and was useful in triage and treatment. The algorithm was evaluated using 60 instances published in the TCIA dataset. The number of samples used in each stage varied; for instance, 40 were used for training, 10 for validation and 10 for testing. In the simulation, the algorithm showed promising results in terms of accuracy, precision and area under the receiver operating characteristic curve. Zhang et al. (2020) employed DL‐based software to detect, localize and quantify COVID‐19 pneumonia. The proposed AI program used a 3D CNN combined with V‐Net bottleneck structures. 
Their method was tested using 2460 images collected from Huoshenshan Hospital in Wuhan, China. Experimentally, their method achieved excellent results, which were useful in disease assessment and the formulation of treatment plans. Rahman, Hossain et al. (2021) studied nine different versions of DL algorithms used for detecting COVID‐19 with raw data collected by medical IoT devices. The DL algorithms were tested by designing adversarial examples for each type in order to identify their vulnerability. The authors pointed out that DL algorithms that do not consider defensive models against adversarial perturbations remain vulnerable to adversarial attacks.
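Cohen's kappa, mentioned above as one of the agreement measures used by Z. Li et al. (2020), corrects the raw agreement between two raters for the agreement expected by chance. The sketch below is a plain implementation of the standard formula, kappa = (p_o − p_e) / (1 − p_e); the two rating lists are invented for illustration (for example, a DL system versus a radiology report) and do not come from any study cited here.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable],
                 rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    # Observed agreement p_o: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement p_e: product of the raters' marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[l] * counts_b[l] for l in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1.0 - expected)


# Made-up severity ratings for four scans.
system = ["mild", "severe", "mild", "severe"]
report = ["mild", "severe", "severe", "severe"]
print(cohens_kappa(system, report))  # 0.5
```

In this example the raters agree on three of four scans (75%), but half of that agreement would be expected by chance, so kappa is 0.5 rather than 0.75.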

PUBLIC COVID‐19 DATASETS

One of the most challenging tasks in implementing ML and DL approaches is obtaining suitable datasets for the training, testing and evaluation of proposed techniques. This section therefore presents the most widely used recent COVID‐19 datasets, which differ in characteristics, size and data type (e.g., CT scans or X‐ray images). Table 1 summarizes these datasets and how they can be accessed.

TABLE 1

Existing public COVID‐19 datasets

Dataset	Data type	Classification Output	Dataset type	Characteristics	Primary dataset	Secondary dataset	Techniques	Achievement	Available online
World Health Organization (Siddiqui et al., 2020)	Outbreak	Statistical report	‘Coronavirus disease (COVID‐19) situation reports’	Rate of infection for temperature in various provinces of China.	✓		(Kumar, Arora, et al., 2020; Bandyopadhyay & Dutta, 2020; Alzubaidi et al., 2021; Chaudhary & Singh, 2021; Loey, Smarandache, et al., 2020)	Accuracy Scenario 1: 80.6%, Scenario 2: 85.2%, Scenario 3: 99.9%	WHO. Coronavirus disease (COVID‐2019) situation reports. ‘https://www.who.int/publications/m/item/weekly‐epidemiological‐update—28‐september‐2020, 2020’
Protegen database (Randhawa et al., 2020)	Laboratory findings	Diagnosis	annotated proteins	Positive samples of 397 bacterial and 178 viral protective antigens (PAgs) and 4979 negative samples.	✓		(Ong, Wong, et al., 2020; Ong, Wang, et al., 2020)		‘https://www.ncbi.nlm.nih.gov/’
(NCBI) database (DNA sequence data) (Randhawa et al., 2020)	Laboratory findings	Diagnosis	COVID‐19 virus sequence	5000 unique viral genomic sequences ( 61.8 million bp)	✓		(Kannan et al., 2020; Touati et al., 2021; Derecichei & Atikukke, 2020; Khairkhah et al., 2020; Y.‐F. Mao et al., 2021)	Method 1: 72.7%, Method 2: 68.7%, Method 3: 91.45%	‘https://sourceforge.net/projects/mldsp‐gui/files/COVID19Dataset/’
Hungary‐COVID‐Data (Pinter et al., 2020)	Outbreak	Statistical report	Time series data	Statistical analyses of the cases of COVID‐19 and the fatality rate in Hungary	✓		(Chandra et al., 2021; Guasti, 2020; Pinter et al., 2020)		‘https://www.worldometers.info/coronavirus/country/hungary/’
First dataset: (pneumonia) database, Second dataset: COVID‐19 images (Elaziz et al., 2020)	Medical image	Diagnosis	chest x‐ray image	Dataset part 1: 216 COVID‐19 and 1,675 normal cases. Dataset part 2: 219 COVID‐19 and 1,341 normal cases.	✓		(Kassani et al., 2020; Qi et al., 2020; Barstugan et al., 2020; Afshar et al., 2020; Elaziz et al., 2020; El‐Kenawy et al., 2020; Luján‐García, Yáñez‐Márquez, et al., 2020; Rasheed et al., 2021; Toğaçar, Ergen, Cömert, & Özyurt, 2020)	Accuracy Dataset1: 96.09% Dataset2: 98.09%	Dataset 1: ‘https://github.com/ieee8023/covid‐chestxray‐dataset’. Dataset 2: ‘https://www.sirm.org/category/senza‐categoria/covid‐19/’
real novel COVID‐19 data (Fayyoumi et al., 2020)	Laboratory findings	Diagnosis	Signs and symptoms of the patients	64 negative PCR test and 41 Positive PCR test		✓	(Bandyopadhyay & Dutta, 2020; Albahri et al., 2020; Fayyoumi et al., 2020; Chakraborty & Ghosh, 2020; Loey, Manogaran, et al., 2020)	Accuracy: 91.67%	unavailable
Indian COVID‐19 dataset (Kavadi et al., 2020)	Outbreak	Statistical report	Time series data	Statistical reports of COVID‐19 cases in India	✓		(Prakash et al., 2020; Sujatha et al., 2020; Shastri et al., 2020; Sujath et al., 2020; R. S. Yadav, 2020)	Case 1: 97.82%, Case 2: 98%, Case 3: 96.66% Case 4: 97.50%	‘https://www.kaggle.com/sudalairajkumar/covid19‐in‐india’
COVID‐19 epidemiological data (Wang, Zhengm et al., 2020)	Outbreak	Statistical report	Time series data	COVID‐19 infected cases in numerous countries ( Brazil, Russia, India, Peru and Indonesia)	✓		(Punn et al., 2020; Mohamadou et al., 2020; Muhammad et al., 2020; Muhammad et al., 2021; Tuli et al., 2020)	Accuracy: 94.99%, Sensitivity 93.34%, Specificity 94.30%	‘https://www.who.int/emergencies/diseases/novel‐coronavirus‐2019/situation‐reports’
U.S. hospitals COVID‐19 admission data (Burdick et al., 2020)	Laboratory findings	Diagnosis	Patient Vital Sign and Lab Measurement	197 confirmed cases based on 12 factors, for example systolic blood pressure, Blood pressure, and heart rate.	✓		(Cheng et al., 2020; Qian et al., 2021; Sáez et al., 2021)	Accuracy: 76.2%, Sensitivity 95%, Specificity 76.3%	‘https://doi.org/10.1016/j.compbiomed.2020.103949’
Facebook‐Covid‐19 data (Sear et al., 2020)	Outbreak	Statistical report	COVID‐19 related comments	Data on common Facebook posts from the date 1/17/2020 until 2/28/2020.	✓		(Ahmed, Shahbaz, et al., 2020; Raamkumar et al., 2020; Azizan et al., 2020; Cannata et al., 2021)	Sensitivity Case 1: 94.3% and 90.9%, Case 2: 79.6% and 81.5%	‘https://www.cdc.gov/media/releases/2020/s0229‐COVID‐19‐first‐death.html’
COVID‐XRay‐5K DATASET (Minaee et al., 2020)	Medical image	Diagnosis	5000 images	COVID‐19 X‐ray samples and for Non‐COVID samples	✓		(Ahammed et al., 2020; Chowdhury, Kabir, et al., 2020; Khan, Sohail, et al., 2020; Nugroho, 2021; Imad et al., 2020; Rezaee et al., 2020; Satu et al., 2021)	Sensitivity rate 98% and specificity rate 90%	‘https://github.com/shervinmin/DeepCovid.git’
ChestX‐ray8 database (Wang et al., 2017a)	Medical image	Diagnosis	x‐ray images	A total of 108,948 images, of which 24,636 contain one or more pathologies. The remaining 84,312 images are normal cases. Last updated 20 July 2020	✓		(Sohan, 2020; Zhao et al., 2020; Crosby et al., 2020; Sathitratanacheewin et al., 2020; Silva et al., 2020; Verma & Tayeb, 2021)	Sensitivity 72%, Specificity 82%	‘https://nihcc.app.box.com/v/ChestXray‐NIHCC’
Deep‐Learning‐COVID‐19‐on‐CXR‐using‐Limited‐Training‐Data‐Sets (Oh et al., 2020)	Medical image	Diagnosis	x‐ray images	Public CXR datasets are available JSRT, SCR, NLM(MC), Pneumonia, and COVID‐19	✓		(Cohen et al., 2020)	Accuracy is 91.9 %	‘https://github.com/jongcye/Deep‐Learning‐COVID‐19‐on‐CXR‐using‐Limited‐Training‐Data‐Sets’
COVID‐19 and Pneumonia Scans Dataset (Brunese et al., 2020)	Medical image	Diagnosis	x‐ray images	5887 images	✓		(Afshar et al., 2020; Amyar et al., 2020; Harmon et al., 2020; Maghdid et al., 2021; Sharma, 2020)	Area under the ROC curve higher than 97%	‘https://public.roboflow.com/classification/covid‐19‐and‐pneumonia‐scans’
Random Sample of NIH Chest X‐ray Dataset (Wang, Peng, et al., 2017b)	Medical image	Diagnosis	x‐ray images	total of 5,606 images and labels are extracted from the dataset of NIH Chest X‐ray	✓		(Antin et al., 2017; Basu et al., 2020; Bharati et al., 2020; Filice et al., 2020; Tang et al., 2021)	Accuracy 90.13%	‘https://www.kaggle.com/nih‐chest‐xrays/sample’
ICLUS ‐ Italian Covid‐19 Lung Ultra‐sound project (Wang, Peng, et al., 2017b)	Medical video	Diagnosis	Lung ultrasound video	A database of ultrasound images that can be used to identify patient status at different stages	✓		(Roy et al., 2020; Che et al., 2021; Carrer et al., 2020; Dastider et al., 2021)	Accuracy convex: 84%, linear: 94%	‘https://www.disi.unitn.it/iclus’
COVID‐CT (Zhao et al., 2020)	Medical image	Diagnosis	CT images	Tongji Hospital, Wuhan, China COVID‐19 data patients between January and April. Total of 349 CT images that included clinical features extracted from 216 patient of COVID‐19 disease.	✓		(Yang et al., 2020; Afshar et al., 2020; Al‐Karawi et al., 2020; Roberts et al., 2020; X. He et al., 2020)	Accuracy 95.37%, Sensitivity 95.99%, Specificity 94.76%	‘https://github.com/UCSD‐AI4H/COVID‐CT’
PEDIATRIC CXR DATASET (Kermany et al., 2018)	Medical image	Diagnosis	chest X‐ray images	Dataset are collected from Guangzhou Women and Children's Medical Center in Guangzhou which include different classes of medical images such non‐COVID‐19 viral pneumonia, bacterial pneumonia, and normal lungs.	✓		(Haghanifar et al., 2020; Longjiang et al., 2019; Rajaraman, Siegelman, et al., 2020; Rajaraman, Sornapudi¸ et al., 2020; Rajaraman and Antani (2020); Siddiqi, 2020)	Accuracy rate 92.8%, sensitivity rate 93.2% and specificity rate 90.1%	‘https://data.mendeley.com/datasets/rscbjbr9sj/2’
RSNA CXR DATASET (Shih et al., 2019)	Medical image	Diagnosis	chest X‐ray images	Multi‐expert curated and chest X‐ray images dataset contains samples from the National Institutes of Health (NIH) CXR‐14		✓	(Rahman, Khandakar, et al., 2021; Rajaraman, Siegelman, et al., 2020; Rajaraman, Sornapudi¸ et al., 2020; Rajaraman & Antani, 2020; Zamzmi et al., 2020)	Accuracy rate 93.08%, sensitivity rate 97.53, precision rate 93.15%, and F‐score of 94.57%	‘https://www.rsna.org/education/ai‐resources‐and‐training/ai‐image‐challenge’
TWITTER COVID‐19 CXR DATASET	Medical image	Diagnosis	chest X‐ray images	A collection of 134 CXRs of SARS‐CoV‐2 confirmed cases with 2K × 2K pixel resolution in JFIF format, provided via Twitter by a cardiothoracic radiologist from Spain		✓	(Haghanifar et al., 2020; Rajaraman, Siegelman, et al., 2020; Rajaraman, Sornapudi, et al., 2020; Rajaraman & Antani, 2020)	Accuracy rate 99.01%, and area under the curve 99.72%	‘https://twitter.com/ChestImaging’
MONTREAL COVID‐19 CXR DATASET (Cohen et al., 2020)	Medical image	Diagnosis	chest X‐ray images	A collection of 179 CXRs		✓	(Rajaraman, Siegelman, et al., 2020; Rajaraman, Sornapudi¸ et al., 2020; Rajaraman & Antani, 2020)	Accuracy rate 99.01%, and area under curve 99.72%	‘https://arxiv.org/abs/2003.11597’
COVID‐19 Chest X‐Rays for Lung Severity Scoring (Cohen et al., 2020)	Medical image	Diagnosis	chest X‐ray images	131 CXR images that extracted from 84 COVID‐19 cases		✓	(Mangal et al., 2020; Hall et al., 2020; Minaee et al., 2020; Mukherjee et al., 2021; Zhu et al., 2020; Rahimzadeh & Attar, 2020), (Ahmed et al., 2021; Apostolopoulos & Mpesiana, 2020; Khan, Shah, et al., 2020; Luján‐García, Moreno‐Ibarra, et al., 2020; Pham, 2021; Tartaglione et al., 2020; Tizhoosh & Fratesi, 2021)	Accuracy rate 96.6%, precision rate 93.17%, recall rate 98.25% and F‐measure rate 95.6%	‘https://github.com/ieee8023/covid‐chestxray‐dataset’
TCIA dataset (Hu et al., 2020)	Medical image	Diagnosis	3D CT lung scans	60 3D CT images retrieved based on manual delineations of the lung anatomy	✓		(Choi et al., 2020; Dai et al., 2020; Le et al., 2021; Moitra & Mandal, 2019; Suter et al., 2020)	CI:0.501–0.756, iAUC: 0.620; 95%	‘http://doi.org/10.7937/K9/TCIA.2017.3r3fvz08’
large‐scale COVID‐19 CX‐R image dataset (Al‐Waisy et al., 2020)	Medical image	Diagnosis	chest X‐ray image	800 images that include four main classes specifically normal, COVID‐19, pneumonia virus, and pneumonia bacterial.		✓	(Al‐Waisy et al., 2020)	RMSE: 0.012%, MSE: 0.011%, specificity: 100%, sensitivity: 99.98%, precision: 100%, F1‐score: 99.99%, and accuracy rate: 99.99%	‘https://github.com/AlaaSulaiman/COVID19‐vs‐Normal‐dataset’
The World Mortality Dataset (Karlinsky & Kobak, 2021)	Outbreak	Statistical report	Time series data	data have collected weekly, monthly, or quarterly all‐cause mortality data from 77 countries, openly available as the regularly‐updated World Mortality Dataset	✓		(Karlinsky & Kobak, 2021; Ahmad, Ali, et al., 2020; Chowdhury et al., 2021; Elfiky et al., 2018; Malki et al., 2020)	AUC score: 81.38% and accuracy score: 81.30%	‘https://github.com/akarlinsky/world_mortality’

The datasets in Table 1 were selected from the literature through a structured query. Combinations of keywords covering ML, DL, and COVID‐19 diagnosis were used to search for the most relevant papers in well‐regarded digital publisher databases (i.e., Elsevier, IEEE Xplore, MDPI, Springer, IOP Press, Wiley, and others). The retrieved papers were then filtered and screened to retain only COVID‐19 diagnosis datasets and to exclude others, such as mask‐detection datasets. According to Table 1, authors have made significant contributions by providing datasets of different volumes and types that can support COVID‐19 diagnosis. The majority of available datasets are dedicated to diagnosis, especially from medical images. This is understandable, given that acquiring medical images can be less complex and time consuming than collecting COVID‐19 RT‐PCR samples. Sample size varies among the datasets because it depends on the number of patients involved. Most datasets are primary, that is, generated directly through examination experiments; a few are secondary datasets derived by processing other available datasets. Secondary datasets have nonetheless contributed greatly to the diagnosis of COVID‐19. For instance, the CLAHE algorithm was applied to the large‐scale COVID‐19 CX‐R image dataset (Al‐Waisy et al., 2020) as an enhancement technique for the medical images in the three primary datasets. The classification rate increased by 8% on the processed dataset relative to the raw images.

COVID‐19 EVALUATION MEASURES USING MACHINE LEARNING AND DEEP LEARNING

The performance of DL and ML techniques used in detecting COVID‐19 is normally evaluated against several measures. The most popular measure is classification accuracy. This section also describes the other measures in use, together with their mathematical formulations. Table 2 presents the evaluation measures used in evaluating the results of DL and ML methods for COVID‐19 classification. As shown in Table 1, each dataset uses one or more evaluation measures.

TABLE 2

Evaluation measures for COVID‐19 diagnosis models

Measures	Equation	References
Accuracy	$\mathrm{Acc} = \frac{T_a + T_r}{T_a + F_a + T_r + F_r} \times 100$	(Apostolopoulos et al., 2020; Minaee et al., 2020; Randhawa et al., 2020; Vaid, Kalantar, & Bhandari, 2020; Prakash et al., 2020; Loey, Manogaran, et al., 2020; Fayyoumi et al., 2020; Kavadi et al., 2020; Pinter et al., 2020; Alazab et al., 2020; Paul et al., 2020; Rajaraman, Siegelman, et al., 2020; Altan & Karasu, 2020; Zhu et al., 2020; Rahman, Hossain, et al., 2021; Hu et al., 2020; Rajaraman & Antani, 2020; Loey, Smarandache, et al., 2020; Rahman et al., 2020; Lalmuanawma et al., 2020; Hasan et al., 2020; Jaiswal et al., 2020; Roy et al., 2020; Oh et al., 2020; Pathak et al., 2020; Das et al., 2020; Alakus & Turkoglu, 2020; Hurt et al., 2020; Ni et al., 2020; Toğaçar, Ergen, & Cömert, 2020; Wang, Liu, et al., 2020; Yoo et al., 2020)
Sensitivity	$\mathrm{Sen} = \mathrm{Recall} = \frac{T_a}{T_a + F_r}$	(Fayyoumi et al., 2020; Roy et al., 2020; Pathak et al., 2020; Oh et al., 2020; Das et al., 2020; Alakus & Turkoglu, 2020; Altan & Karasu, 2020; Apostolopoulos et al., 2020; Burdick et al., 2020; Civit‐Masot et al., 2020; Hu et al., 2020; Jaiswal et al., 2020; Loey, Smarandache, et al., 2020; Loey, Manogaran, et al., 2020; Mantas et al., 2020b; Minaee et al., 2020; Ni et al., 2020; Rahman et al., 2020; Rajaraman, Siegelman, et al., 2020; Rajaraman & Antani, 2020; Toğaçar, Ergen, & Cömert, 2020; Yoo et al., 2020)
Specificity	$\mathrm{Spe} = \frac{T_r}{T_r + F_a}$	(Fayyoumi et al., 2020; Apostolopoulos et al., 2020; Burdick et al., 2020; Civit‐Masot et al., 2020; Loey, Manogaran, et al., 2020; Minaee et al., 2020; Rajaraman, Siegelman, et al., 2020; Altan & Karasu, 2020; Hu et al., 2020; Rajaraman & Antani, 2020; Loey, Smarandache, et al., 2020; Rahman et al., 2020; Pathak et al., 2020; Oh et al., 2020; Das et al., 2020; Jaiswal et al., 2020; Ni et al., 2020; Toğaçar, Ergen, & Cömert, 2020; Yoo et al., 2020)
G_Mean	$\mathrm{GM} = \sqrt{\mathrm{Sen} \times \mathrm{Spe}}$	(Fayyoumi et al., 2020)
Precision	$\mathrm{Pre} = \frac{T_a}{T_a + F_a}$	(Fayyoumi et al., 2020; Loey, Manogaran, et al., 2020; Mantas et al., 2020b; Rajaraman, Siegelman, et al., 2020; Altan & Karasu, 2020; Oh et al., 2020; Roy et al., 2020; Alakus & Turkoglu, 2020; Hu et al., 2020; Jaiswal et al., 2020; Loey, Smarandache, et al., 2020; Rajaraman & Antani, 2020; Toğaçar, Ergen, & Cömert, 2020; Pathak et al., 2020; Yoo et al., 2020)
F1‐score	$F_s = 2 \times \frac{\mathrm{Pre} \cdot \mathrm{Recall}}{\mathrm{Pre} + \mathrm{Recall}}$	(Alazab et al., 2020; Loey, Manogaran, et al., 2020; Roy et al., 2020; Das et al., 2020; Oh et al., 2020; Alakus & Turkoglu, 2020; Altan & Karasu, 2020; Jaiswal et al., 2020; Loey, Smarandache, et al., 2020; Ni et al., 2020; Rajaraman, Siegelman, et al., 2020; Rajaraman & Antani, 2020; Toğaçar, Ergen, & Cömert, 2020)
Root mean square error	$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{d_i - f_i}{\sigma_i}\right)^2}$	(Alazab et al., 2020; Pinter et al., 2020; Zeroual et al., 2020)
Mean absolute percentage error	$\mathrm{MAPE} = \frac{1}{N}\sum\left|\frac{A_t - F_t}{A_t}\right|$	(Arora et al., 2020; Pinter et al., 2020; Zeroual et al., 2020)
Mean absolute error	$\mathrm{MAE} = \frac{1}{N}\sum_{t=1}^{N}\left|e_t\right|$	(Peng & Nagata, 2020; Zeroual et al., 2020; Heni, 2020; Zhu et al., 2020)
Explained variance	$\mathrm{EV} = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2$	(Zeroual et al., 2020)
Root mean squared log error	$\mathrm{RMSLE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[\log(x_i + 1) - \log(\hat{x}_i + 1)\right]^2}$	(Zeroual et al., 2020)
Receiver operating characteristic	$\mathrm{ROC} = \int_a^b f(x)\,dx$	(Z. Li et al., 2020; Das et al., 2020; Altan & Karasu, 2020; Hu et al., 2020; Ni et al., 2020; Rajaraman, Siegelman, et al., 2020)
Logarithmic loss	$\mathrm{LogLoss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log_e \hat{y}_i + (1 - y_i)\log_e(1 - \hat{y}_i)\right]$	(Vaid, Kalantar, & Bhandari, 2020)
Testing the execution time	TET	(Brunese et al., 2020; Kavadi et al., 2020)
Area under curve	$\mathrm{AUC} = \int_a^b f(x)\,dx$	(Z. Li et al., 2020; Alakus & Turkoglu, 2020; Burdick et al., 2020; Heni, 2020; Hu et al., 2020; Jaiswal et al., 2020; Rajaraman, Siegelman, et al., 2020; Rajaraman & Antani, 2020; Wang, Zgeng, et al., 2020; Yoo et al., 2020)
Average time for COVID‐19 detection	$M = \frac{1}{N}\sum_{j=1}^{N} t$	(Brunese et al., 2020)
Matthews correlation coefficient	$\mathrm{MCC} = \sqrt{\frac{\chi^2}{N}}$	(Cole et al., 2020; Rajaraman & Antani, 2020)
Other evaluations (cluster visualization, p‐value, correlation coefficient)	‐	(Vaid, Cakan, et al., 2020; Massie et al., 2020; Sear et al., 2020; Siddiqui et al., 2020; Kannan et al., 2020; Ahmad, Garhwal, et al., 2020; Carrillo‐Larco & Castillo‐Cara, 2020; Cui et al., 2020; Jamshidi et al., 2020; Ni et al., 2020; Rajaraman, Siegelman, et al., 2020; Sujath et al., 2020; M. Yadav et al., 2020; Zhang et al., 2020; Zhu et al., 2020)
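
As an illustration of the confusion‐matrix measures in Table 2, the following Python sketch computes them from four counts. It assumes that Ta, Tr, Fa, and Fr denote true accepts (true positives), true rejects (true negatives), false accepts (false positives), and false rejects (false negatives); this reading of the symbols is an assumption, as are the function name and the example counts, which are hypothetical and not taken from any reviewed study. Specificity and MCC are written in their standard forms, Tr/(Tr+Fa) and the product‐moment form whose absolute value equals the chi‐square form for a 2×2 table.

```python
import math


def confusion_metrics(ta, tr, fa, fr):
    """Threshold-based measures from confusion counts.

    Assumed meaning of the symbols (not stated in this section):
    ta = true accepts (TP), tr = true rejects (TN),
    fa = false accepts (FP), fr = false rejects (FN).
    Every denominator is assumed to be non-zero.
    """
    total = ta + tr + fa + fr
    sen = ta / (ta + fr)            # sensitivity, also called recall
    spe = tr / (tr + fa)            # specificity (standard definition)
    pre = ta / (ta + fa)            # precision
    f1 = 2 * pre * sen / (pre + sen)
    # Product-moment form of MCC for a 2x2 confusion matrix.
    mcc_den = math.sqrt((ta + fa) * (ta + fr) * (tr + fa) * (tr + fr))
    return {
        "accuracy": 100 * (ta + tr) / total,   # reported as a percentage
        "sensitivity": sen,
        "specificity": spe,
        "precision": pre,
        "f1": f1,
        "g_mean": math.sqrt(sen * spe),
        "mcc": (ta * tr - fa * fr) / mcc_den,
    }


# Hypothetical counts, for illustration only.
m = confusion_metrics(ta=90, tr=80, fa=20, fr=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

With these counts, accuracy is 85% while sensitivity (0.9) and specificity (0.8) differ, which is one reason the reviewed studies usually report the three together.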

Figure 10 shows the measures used in evaluating diagnosis models for COVID‐19. More than 18 evaluation measures are used in state‐of‐the‐art studies. Clearly, accuracy is the main evaluation criterion used by most researchers, followed by sensitivity.
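
The error‐based and probabilistic measures listed in Table 2 can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions: the function names and sample values are hypothetical, the RMSE is shown in its common unweighted form (Table 2 additionally divides each residual by σ_i), and the area under the curve is approximated with the trapezoidal rule over given ROC points rather than by an exact integral.

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error as a fraction; undefined if any actual value is 0."""
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Unweighted root mean square error (no per-sample sigma, an assumption)."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def rmsle(actual, forecast):
    """Root mean squared log error; inputs must be greater than -1."""
    sq = [(math.log(a + 1) - math.log(f + 1)) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(sq) / len(sq))


def log_loss(labels, probs):
    """Binary logarithmic loss; probabilities must lie strictly between 0 and 1."""
    terms = [y * math.log(p) + (1 - y) * math.log(1 - p) for y, p in zip(labels, probs)]
    return -sum(terms) / len(terms)


def auc_trapezoid(fpr, tpr):
    """Trapezoidal approximation of the area under ROC points sorted by ascending fpr."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2 for i in range(1, len(fpr)))


# Hypothetical case counts and forecasts, for illustration only.
actual = [100, 200, 400]
forecast = [110, 190, 420]
print(mae(actual, forecast), mape(actual, forecast), rmse(actual, forecast))
print(rmsle(actual, forecast))
print(log_loss([1, 0], [0.5, 0.5]))
print(auc_trapezoid([0.0, 0.5, 1.0], [0.0, 1.0, 1.0]))
```

MAPE and RMSLE are relative measures, so they stay comparable across countries with very different case counts, whereas MAE and RMSE are expressed in the units of the forecast itself.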

FIGURE 10

Evaluation measures

Figure 11 shows the classifiers used by ML and DL methods for COVID‐19. As shown in Figure 11, the SVM classifier is the technique used by most researchers in the COVID‐19 domain. Boost, K‐means, and logistic regression rank second according to the percentage of research papers that use them.

FIGURE 11

The percentage of using machine learning approaches for COVID‐19

CHALLENGES AND FUTURE DIRECTIONS

As mentioned above, ML and DL methods have been used intensively for COVID‐19. Although these methods show successful outcomes on their tested COVID‐19 cases, several challenges remain that can be addressed to improve research quality in this direction. Figure 12 shows the percentage of using deep learning approaches for COVID‐19. These challenges can be summarized as follows:

FIGURE 12

The percentage of using deep learning approaches for COVID‐19

Under‐usage of ultrasound datasets: Most of the datasets used in the literature are CT scans and X‐rays, whereas ultrasound datasets are rarely developed and used. Ultrasound may also directly assist in the triage of patients, that is, the first‐look estimation of the disease's severity and of the urgency with which a patient needs further care. Using ultrasound for COVID‐19 detection can be beneficial for low‐ and middle‐income countries, where diagnosis through RT‐PCR or CT may not always be available. Interestingly, real‐time ultrasound imaging, when combined with DL methods, produces valid results instantly (Roy et al., 2020).

Heterogeneity and size of the datasets: The literature contains datasets of different sizes, but images specific to COVID‐19 (i.e., not images of a general chest infection) are scarce (Wang, Peng, et al., 2017a; Wang, Peng, et al., 2017b). Such a small number of COVID‐19 images can be an obstacle to obtaining accurate results. Another issue is that the datasets are collected from a limited set of regions, such as Argentina, Brazil, Canada, China, India, Italy, and Japan (Elaziz et al., 2020; Fayyoumi et al., 2020; Minaee et al., 2020; Pinter et al., 2020; Randhawa et al., 2020; Wang, Zhengm et al., 2020). This limits the possibility of relating results to other COVID‐19 factors such as race, region, and food.

Severity level: Most of the datasets have two labels for the possible output classes (i.e., infected or not infected) (Jaiswal et al., 2020). However, datasets annotated with the severity level of the illness are needed to monitor patient healing progress.

End‐to‐end system: There is no complete real‐time system that utilizes the research outcomes of machine learning methods (Jaiswal et al., 2020). Researchers can build a real‐time system that uses deep learning and machine learning mechanisms to diagnose COVID‐19 infection. Such a system can help in screening hospitals, airports, markets, schools, and any other area considered a hot spot for spreading the pandemic.

Effect on different body organs: Most studies rely on lung or chest images to predict COVID‐19 infection. Other organs affected by the virus need further investigation, as this may ease detection of the virus and reduce the risk of being infected without knowing it. This matters especially because the virus reaching the patient's lungs is considered a late, severe stage of infection, whereas at an earlier stage the carrier shows no symptoms (Bernheim et al., 2020).

With respect to the previous deep learning and machine learning studies related to COVID‐19, this review can be considered a rich resource that inspires future studies to examine COVID‐19 from different perspectives, such as:

Historical record of the patient: Deep learning and machine learning mechanisms can be adapted to predict COVID‐19 in patients while taking into account constraints from their historical records (e.g., chronic diseases).

Monitoring the COVID‐19 vaccine: Deep learning and machine learning approaches can be used to monitor and observe the effect of a vaccine or other medication on patient healing or body chemistry.

The COVID‐19 disease lifespan: Deep learning and machine learning can be used to monitor the development of the COVID‐19 virus in infected cases using X‐ray, CT scan, or ultrasound images, in order to identify critical cases.

Other deep learning and machine learning mechanisms for COVID‐19: For example, reinforcement learning can be used to improve the efficiency of deep learning and machine learning approaches.

Utilizing the internet of things (IoT): There is not much work on employing IoT for COVID‐19 monitoring. A cloud server combined with optimized sensor networking can help achieve this goal. Such a system can be used for remote health monitoring or elderly care, and it requires more attention to the security and privacy of the collected data.

Gene expression: The gene expression of confirmed COVID‐19 cases can be studied to determine the most informative genes related to the infection. ML and DL techniques can be used to extract these genes and describe their effect on the possibility of being infected by the virus and on the severity of infected cases.

Effect of infection on body chemistry: There is a need for a dataset that relates COVID‐19 infection to changes in the chemistry of the body (e.g., blood group, RNA sequence, oxygen level, etc.). Such data can support other means of diagnosing patients and help in finding other ways to detect infected cases.

CONCLUSION

In this paper, a comprehensive review of deep learning and machine learning techniques used to diagnose COVID‐19 and to recognize its outbreak is conducted. The main purpose of this work is to summarize the previous studies and their applications for COVID‐19. The reviewed studies have been selected from several public and highly reputed research databases such as IEEE, Springer, Elsevier, MDPI, etc. The papers were passed through several filters to remove duplicates and retain content relevant to COVID‐19 using machine learning and deep learning strategies. In summary, India has the most COVID‐19 machine learning studies, while China has the most COVID‐19 deep learning studies. In general, the summarized studies are analyzed and discussed in three categories, namely, machine learning, deep learning, and hybrid approaches. The machine learning studies are classified into two classes: supervised learning and unsupervised learning. The deep learning studies are categorized into CNN, DNN, RNN, GAN, and optimization with deep learning approaches. The summarized studies either diagnose COVID‐19 by analyzing X‐ray and CT scan images or, in a few cases, use machine learning or deep learning to anticipate the COVID‐19 outbreak in different countries. In total, SVM, LDA, KNN, ANN, Boost, RF, K‐means, and LR as machine learning algorithms and CNN, DNN, RNN, and GANs as deep learning algorithms are used for COVID‐19 diagnosis and outbreak prediction. In conclusion, for COVID‐19 diagnosis and outbreak prediction, SVM is the most widely used machine learning mechanism, and CNN is the most widely used deep learning mechanism. The most recent public COVID‐19 datasets used by other researchers to evaluate and compare their methods are reported in Table 1, together with their access links. The evaluation measures used to assess COVID‐19 diagnosis based on machine learning and deep learning strategies are also summarized. Accuracy, sensitivity, and specificity are the most widely used measures in the previous studies. This is in line with previous studies that use deep learning and machine learning strategies for prediction.

146 in total

1. Unsupervised Machine Learning for the Discovery of Latent Clusters in COVID-19 Patients Using Electronic Health Records.

Authors: Wanting Cui; Daniel Robins; Joseph Finkelstein
Journal: Stud Health Technol Inform Date: 2020-06-26

2. Novel Feature Selection and Voting Classifier Algorithms for COVID-19 Classification in CT Images.

Authors: El-Sayed M El-Kenawy; Abdelhameed Ibrahim; Seyedali Mirjalili; Marwa Metwally Eid; Sherif E Hussein
Journal: IEEE Access Date: 2020-09-30 Impact factor: 3.367

Review 3. Applications of machine learning and artificial intelligence for Covid-19 (SARS-CoV-2) pandemic: A review.

Authors: Samuel Lalmuanawma; Jamal Hussain; Lalrinfela Chhakchhuak
Journal: Chaos Solitons Fractals Date: 2020-06-25 Impact factor: 5.944

4. Quantifying COVID-19 Content in the Online Health Opinion War Using Machine Learning.

Authors: Richard F Sear; Nicolas Velasquez; Rhys Leahy; Nicholas Johnson Restrepo; Sara El Oud; Nicholas Gabriel; Yonatan Lupu; Neil F Johnson
Journal: IEEE Access Date: 2020-05-11 Impact factor: 3.367

5. Coronavirus herd immunity optimizer (CHIO).

Authors: Mohammed Azmi Al-Betar; Zaid Abdi Alkareem Alyasseri; Mohammed A Awadallah; Iyad Abu Doush
Journal: Neural Comput Appl Date: 2020-08-27 Impact factor: 5.606

6. A new prediction approach of the COVID-19 virus pandemic behavior with a hybrid ensemble modular nonlinear autoregressive neural network.

Authors: Patricia Melin; Julio Cesar Monica; Daniela Sanchez; Oscar Castillo
Journal: Soft comput Date: 2020-11-19 Impact factor: 3.732

7. Multi-task deep learning based CT imaging analysis for COVID-19 pneumonia: Classification and segmentation.

Authors: Amine Amyar; Romain Modzelewski; Hua Li; Su Ruan
Journal: Comput Biol Med Date: 2020-10-08 Impact factor: 4.589

8. Using Machine Learning to Estimate Unobserved COVID-19 Infections in North America.

Authors: Shashank Vaid; Caglar Cakan; Mohit Bhandari
Journal: J Bone Joint Surg Am Date: 2020-07-01 Impact factor: 6.558

9. Prediction of respiratory decompensation in Covid-19 patients using machine learning: The READY trial.

Authors: Hoyt Burdick; Carson Lam; Samson Mataraso; Anna Siefkas; Gregory Braden; R Phillip Dellinger; Andrea McCoy; Jean-Louis Vincent; Abigail Green-Saxena; Gina Barnes; Jana Hoffman; Jacob Calvert; Emily Pellegrini; Ritankar Das
Journal: Comput Biol Med Date: 2020-08-06 Impact factor: 4.589

Review on COVID-19 diagnosis models based on machine learning and deep learning approaches.

Authors: Zaid Abdi Alkareem Alyasseri; Mohammed Azmi Al-Betar; Iyad Abu Doush; Mohammed A Awadallah; Ammar Kamal Abasi; Sharif Naser Makhadmeh; Osama Ahmad Alomari; Karrar Hameed Abdulkareem; Afzan Adam; Robertas Damasevicius; Mazin Abed Mohammed; Raed Abu Zitar
Journal: Expert Syst Date: 2021-07-28 Impact factor: 2.812