A S Albahri, Rula A Hamid, Jwan K Alwan, Z T Al-Qays, A A Zaidan, B B Zaidan, A O S Albahri, A H AlAmoodi, Jamal Mawlood Khlaf, E M Almahdi, Eman Thabet, Suha M Hadi, K I Mohammed, M A Alsalem, Jameel R Al-Obaidi, H T Madhloom.
Abstract
Coronaviruses (CoVs) are a large family of viruses that are common in many animal species, including camels, cattle, cats and bats. Animal CoVs, such as Middle East respiratory syndrome-CoV, severe acute respiratory syndrome (SARS)-CoV, and the new virus named SARS-CoV-2, rarely infect and spread among humans. On January 30, 2020, the International Health Regulations Emergency Committee of the World Health Organisation declared the outbreak of 'COVID-19', the disease caused by this new CoV, a 'public health emergency of international concern'. This global pandemic has affected almost the whole planet and caused the death of more than 315,131 patients as of the date of this article. In this context, publishers, journals and researchers are urged to research different domains and stop the spread of this deadly virus. The increasing interest in developing artificial intelligence (AI) applications has addressed several medical problems. However, such applications remain insufficient given the high potential threat posed by this virus to global public health. This systematic review addresses automated AI applications based on data mining and machine learning (ML) algorithms for detecting and diagnosing COVID-19. We aimed to obtain an overview of this critical virus, address the limitations of utilising data mining and ML algorithms, and provide the health sector with the benefits of this technique. We used five databases, namely, IEEE Xplore, Web of Science, PubMed, ScienceDirect and Scopus, and performed three sequences of search queries covering 2010 to 2020. Precise exclusion criteria and a selection strategy were applied to screen the 1305 articles obtained. Only eight articles were fully evaluated and included in this review, a number that underscores the insufficiency of research in this important area.
After analysing all included studies, the results were categorised by year of publication and by the commonly used data mining and ML algorithms. The results of all papers were discussed to identify gaps in the reviewed literature. Characteristics, such as motivations, challenges, limitations, recommendations, case studies, and features and classes used, were analysed in detail. This study reviewed the state-of-the-art techniques for CoV prediction algorithms based on data mining and ML assessment. The reliability and acceptability of extracted information and datasets from implemented technologies in the literature were considered. Findings showed that researchers must proceed with insights they gain, focus on identifying solutions for CoV problems, and introduce new improvements. The growing emphasis on data mining and ML techniques in medical fields can provide the right environment for change and improvement.
Keywords: Artificial Intelligence; Biological Data Mining; COVID-19; Coronaviruses; MERS-CoV; Machine Learning; SARS-CoV-2
Year: 2020 PMID: 32451808 PMCID: PMC7247866 DOI: 10.1007/s10916-020-01582-x
Source DB: PubMed Journal: J Med Syst ISSN: 0148-5598 Impact factor: 4.460
Three sequences of the Boolean search query
| Seq. | Query terms | Results per database | Final results |
|---|---|---|---|
| 1st query | (‘coronavirus’ OR ‘coronaviridae’ OR ‘CoV’) AND (‘detection system’ OR ‘diagnosis system’ OR ‘diagnostic system’ OR ‘diagnostic application’ OR ‘diagnosis application’) | SD = 766; IEEE = 1; PubMed = 16; WOS = 15; Scopus = 51 | 849 − 49 (duplicates) = 800 articles |
| 2nd query | (‘coronavirus’ OR ‘coronaviridae’) AND (‘detection’ OR ‘diagnosis’ OR classification) AND (‘machine learning’ OR ‘artificial intelligence’) | SD = 34; IEEE = 2; PubMed = 2; WOS = 2; Scopus = 185 | 225 − 10 (duplicates) = 215 articles |
| 3rd query | (‘coronavirus’ OR ‘coronaviridae’) AND (‘detection’ OR ‘diagnosis’) AND (‘machine learning’ OR ‘artificial intelligence’) | SD = 29; IEEE = 1; PubMed = 1; WOS = 1; Scopus = 265 | 297 − 7 (duplicates) = 290 articles |
| Final results for all queries | | | 1305 articles |
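
The arithmetic behind the three query sequences can be checked with a short sketch. The per-database hit counts and duplicate counts are copied from the table; the names `QUERIES` and `unique_articles` are illustrative only and do not come from the paper.

```python
# Per-database hit counts and duplicate counts, copied from the table above.
# SD = ScienceDirect, WOS = Web of Science.
QUERIES = {
    "1st query": ({"SD": 766, "IEEE": 1, "PubMed": 16, "WOS": 15, "Scopus": 51}, 49),
    "2nd query": ({"SD": 34, "IEEE": 2, "PubMed": 2, "WOS": 2, "Scopus": 185}, 10),
    "3rd query": ({"SD": 29, "IEEE": 1, "PubMed": 1, "WOS": 1, "Scopus": 265}, 7),
}


def unique_articles(hits, duplicates):
    """Return the number of articles left after removing duplicates."""
    return sum(hits.values()) - duplicates


# Articles remaining per query, then the grand total screened in the review.
per_query = {name: unique_articles(hits, dups) for name, (hits, dups) in QUERIES.items()}
total = sum(per_query.values())

for name, count in per_query.items():
    print(f"{name}: {count} articles")
print(f"All queries: {total} articles")
```

The three per-query results sum to the 1305 articles that the abstract reports as the screening pool.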
Figure 1. Schematic of the approach to identify, screen and include relevant studies.
Figure 2. Summary of algorithms and methods used in the literature review.
Figure 3. Statistics of included papers by publication year.
State-of-the-art CoV prediction algorithms
| Ref. | Application nature | ML and data mining classification algorithms | Evaluation | Accuracy |
|---|---|---|---|---|
| [ | Improve infection prediction for MERS-CoV | (Classification) Decision tree, Naïve Bayes | Accuracy using cross-validation model | 90% |
| [ | Build several prediction models for MERS-CoV | (Classification) Naïve Bayes classifier, J48 decision tree | Accuracy, precision and recall | Between 53.6% and 71.58% |
| [ | Identify the important factors that influence recovery from MERS-CoV | (Logistic regression) Naïve Bayes, SVM, J48 | Estimated p-value | - |
| [ | Analysing, diagnosing and predicting MERS-CoV | SVM, decision tree | Accuracy using cross-validation model | 86.44% |
| [ | Analysing a plausible explanation of the public overreaction to MERS-CoV | Latent Dirichlet allocation, Word2Vec method, natural language processing | - | - |
| [ | Diagnosing patients with MERS-CoV through early syndromes | Naïve Bayes, random forest, SVM | Receiver operating characteristic (ROC) | Random forest (ROC) = 0.942; Naïve Bayes (ROC) = 0.907; SVM (ROC) = 0.68 |
| [ | Extracting differences and similarities between SARS-CoV and MERS-CoV | Apriori algorithm, decision tree, SVM | 10-fold validation test | Higher than 75% |
| [ | Predicting and preventing MERS-CoV | Bayesian belief network, Global Positioning System (GPS)-based risk assessment | True-positive (TP) and false-positive (FP) rates, ROC area | More than 80%; ROC = 0.970 |
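
The evaluation measures named in the table (accuracy, precision, recall and ROC area) can be sketched in plain Python as follows. This is a minimal illustration, not the method of any reviewed study: the labels and scores are made-up toy values, the 0.5 decision threshold is an assumption, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, true negatives, false positives, false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def roc_auc(y_true, scores):
    """ROC area as the probability that a random positive case outscores
    a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only): 1 = infected, 0 = not infected.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)  # also the true-positive rate
auc = roc_auc(y_true, scores)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} auc={auc:.4f}")
```

Accuracy, precision and recall depend on the chosen threshold, whereas the ROC area summarises ranking quality across all thresholds, which is one reason the two kinds of figures in the table are not directly comparable.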
Descriptions of CoV datasets with available sources
| Ref. | Dataset description | Available sources |
|---|---|---|
| [ | Dataset of patients affected by MERS-CoV in Saudi Arabia, consisting of all cases in the second half of 2016. | Available on the Ministry of Health Control and Command Centre website [ |
| [ | A total of 1082 case records reported from 2013 to 2015: 633 new case records, 231 recovery records and 218 death records. | Collected from the website of the Control and Command Centre of Saudi Ministry of Health [ |
| [ | The analysed data were collected from the Control and Command Centre. A total of 836 patient records were used for analysis. Fifty-two patients out of the 836 cases were initially reported as dead. Hence, those cases were removed from the dataset, and 784 cases were used in the study. | Ministry of Health website of the Kingdom of Saudi Arabia. |
| [ | The MERS-CoV dataset consisted of all reported cases in Saudi Arabia from 2013 to 2017. | N/A |
| [ | Articles collected from the Internet, reported by 153 news media outlets in Korea, and comments associated with these articles from day 1 (the first confirmed case on May 20, 2015) to day 70 (the de facto end declared by the government on July 28, 2015), in addition to short-text comments on news articles on Twitter and Facebook. | [ |
| [ | A dataset collected from UCI containing 322 records (92 infected cases and 230 uninfected cases). Each record contained 24 attributes. | [ [ |
| [ | SARS and MERS spike glycoprotein data from the National Centre for Biotechnology Information database. | [ |
| [ | Synthetic data were generated for 0.2 million users. Raw information from users was collected using body-worn sensors and manually recorded data using the mobile application. | N/A |
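
The record counts reported in the table can be cross-checked, and the exclusion step described for the 836-record dataset can be sketched, as follows. The records are synthetic placeholders generated only to reproduce the counts; `PatientRecord`, `dead_at_report` and the helper functions are hypothetical names, not fields of the actual Ministry of Health data.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    record_id: int
    dead_at_report: bool  # hypothetical flag: patient initially reported as dead


def build_placeholder_records(total=836, dead=52):
    """Create placeholder records matching the counts in the table."""
    return [PatientRecord(record_id=i, dead_at_report=(i < dead)) for i in range(total)]


def drop_initially_dead(records):
    """Keep only patients who were not initially reported as dead."""
    return [r for r in records if not r.dead_at_report]


records = build_placeholder_records()
kept = drop_initially_dead(records)
print(f"{len(records)} records, {len(kept)} kept for analysis")

# Consistency checks on the other counts reported in the table.
case_records_2013_2015 = 633 + 231 + 218  # new + recovery + death records
uci_records = 92 + 230                    # infected + uninfected cases
print(case_records_2013_2015, uci_records)
```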
Case study types in the literature review
| Ref. | Real dataset case study | Analysis dataset case study |
|---|---|---|
| [ | √ | |
| [ | √ | |
| [ | √ | |
| [ | √ | |
| [ | √ | |
| [ | √ | |
| [ | √ | |
| [ | √ | |
Features and classes used in the literature review
| Ref. | Personal patient information attributes | CoV attributes |
|---|---|---|
| [ | - Gender - City - The Probable source of infection class | N/A |
| [ | - Gender - Age - Nationality - City - The patient is a healthcare worker or not | N/A |
| [ | - Gender - Age - Healthcare worker or not - Symptoms - Status at time of identification of disease - Presence of pre-existing disease or not - Patient in contact with animal or not - Hospital - Household or community-acquired - The patient died or recovered | N/A |
| [ | - Gender - Age - Exposure to camels - Comorbidities - Exposure to MERS-CoV cases - City - The patient is employed in healthcare or not - The patient is alive or dead. | N/A |
| [ | N/A | - MERS epidemic - Mass media - Public emotion |
| [ | - Age - Sex | - Fever - Fasting blood sugar - Heart disease - Chronic kidney - Chills - Dry - Productive - Shortness of breath (SOB) - Sore throat - Runny nose - Abnormal pain - Nausea - Vomiting, diarrhoea - Myalgia - Headache - Hypertension - Chronic lung - Obesity - Smoking - Chest pain |
| [ | N/A | - Spike glycoprotein of MERS and SARS - Amino acid isoleucine - Asparagine |
| [ | - Name - Address - Telephone numbers - Age - Sex - Occupation - GPS geographic location of the house - Names of relatives - Mobile numbers | - SOB - Cough - Fever (integer body temperature in °C) - Acute respiratory distress syndrome - Consumptive coagulopathy - Symptom - Food exposure - Animal exposure - Infected human exposure - Risk area exposure |