Shikah J Alsunaidi, Abdullah M Almuhaideb, Nehad M Ibrahim, Fatema S Shaikh, Kawther S Alqudaihi, Fahd A Alhaidari, Irfan Ullah Khan, Nida Aslam, Mohammed S Alshahrani.
Abstract
The COVID-19 epidemic has caused a large number of human losses and havoc in economic, social, and health systems around the world. Controlling such an epidemic requires understanding its characteristics and behavior, which can be identified by collecting and analyzing the related big data. Big data analytics tools play a vital role in building the knowledge required for making decisions and taking precautionary measures. However, because of the vast amount of COVID-19 data available from various sources, there is a need to review the role of big data analysis in controlling the spread of COVID-19, to present the main challenges and directions of COVID-19 data analysis, and to provide a framework of the related existing applications and studies to facilitate future research on COVID-19 analysis. Therefore, in this paper, we conduct a literature review to highlight the contributions of several studies in the domain of COVID-19-based big data analysis. The study presents a taxonomy of several applications used to manage and control the pandemic. Moreover, it discusses several challenges encountered when analyzing COVID-19 data. The findings suggest valuable future directions to be considered for further research and applications.
Keywords: 2019 novel coronavirus disease (COVID-19); artificial intelligence (AI); big data; big data analytics; healthcare
Year: 2021 PMID: 33805218 PMCID: PMC8037067 DOI: 10.3390/s21072282
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Summary of surveys on big data analytics in the healthcare field.
| Source | Publication Year | Domain | Key Contribution |
|---|---|---|---|
| [ | 2017 | Healthcare security and privacy | Discussed healthcare data security and privacy issues, and the mechanisms and strategies available for healthcare data privacy, security, and user access |
| [ | 2017 | Heart attack prediction and prevention | Identified the uses and technologies of big data analytics in this area, as well as challenges and concerns regarding patient privacy |
| [ | 2018 | General healthcare | Defined the scope of big data analytics and its applications in healthcare, and provided strategies to overcome its challenges |
| [ | 2019 | Health care organizational decision-making | Identified the main characteristics and drivers of market uptake of Artificial Neural Networks (ANN) for healthcare-related regulatory decision-making |
| [ | 2019 | Healthcare and medical problems | Reviewed traditional and fuzzy decision-making methods applied to nine areas of healthcare and medical problems |
| [ | 2019 | Healthcare sector applications | Discussed the impact of big data on various stakeholders and the challenges |
| [ | 2019 | IoT and healthcare industry | Identified research trends of the Internet of Things Big Data Analytics model (IoTBDA) in the healthcare industry, and demonstrated the influence of the IoTBDA model on the design, development, and application of IoT-based innovations in healthcare services |
| [ | 2019 | Medical decision-making | Described the current state of research related to collective intelligence |
| [ | 2019 | Patient-centric healthcare system | Presented several analytical approaches from various stakeholders’ perspectives and reviewed the different big data frameworks in terms of data sources, analytical capability, and application areas. Also, it discussed the impact of big data on improving the healthcare ecosystem |
| [ | 2019 | Public health and healthcare organizations | Provided a better understanding for governments and health policymakers about how developing a data-driven strategy could improve public health and the functioning of healthcare organizations and explain the challenges associated with this improvement |
| [ | 2020 | COVID-19 detection and contact tracing | Explained the potentials of nature-inspired computing (NIC) models for accurate COVID-19 detection and optimized contact tracing |
| [ | 2020 | COVID-19 medical | Discussed the role of medical imaging integrated with artificial intelligence (AI) in combating COVID-19 |
| [ | 2020 | COVID-19 medical images detection and classification in terms of evaluation and benchmarking | Highlighted the gaps and challenges, and proposed a detailed methodology for the benchmarking and evaluation of AI techniques used in all COVID-19 medical images classification tasks |
| [ | 2020 | COVID-19 pandemic | Explained the role of AI in fighting pandemics |
| [ | 2020 | Data harmonization (DH) and health management decision-making | Collected definitions and concepts of DH and addressed the causal relation between DH and decision-making in health management |
| [ | 2020 | Healthcare aspects | Provided an overview of the big data analytics publication dynamics in healthcare and discussed several examples in this field |
| [ | 2020 | Healthcare engineering systems | Synthesized and analyzed publications covering data analytics, big data, data mining, and machine learning in the field of Healthcare Engineering Systems |
| [ | 2020 | Mobile health (m-health) | Explored AI applications and big data analytics to provide insights for users to plan resource use for specific challenges in m-health, and proposed an m-health model based on AI and big data analytics |
Figure 1. Potential application areas of big data analytics for COVID-19.
Data analysis technique, type, source, and findings of the existing studies.
| Area | Ref | Aim | Technique | Used Data Type | Data Source | Findings |
|---|---|---|---|---|---|---|
| Diagnosis | [ | Develop a diagnosis model for COVID-19 detection and diagnosis of symptoms to define appropriate care measures | Best Worst Method (BWM) | Symptoms and CT scans | Body sensors | The model can differentiate COVID-19 from four other viral chest diseases with 98% accuracy |
| | [ | Design a medical device to detect and track respiratory symptoms of COVID-19 | N/A | Symptoms | Headsets and mobile phone | The approach provided good and stable results and can be expanded to include more sensors to detect other COVID-19 symptoms |
| | [ | Develop a remote patient monitoring program (RPM) for discharged COVID-19 cases | Mixed-effects logistic regression model | Demographics, medical data | Remote monitoring program, pulse oximeter, and thermometer | RPM provides scalable remote monitoring capabilities and decreases readmission risk |
| | [ | Investigate the usefulness of smartwatches in pre-symptomatic COVID-19 detection | Two anomaly detection models (RHR-Diff and HROS-AD) | Demographics, activity, medical data, COVID-19 status | Smartwatches and MyPHD mobile app | Respiratory infections can be detected through activity tracking and health monitoring via wearable devices |
| | [ | Identify symptoms associated with positive COVID-19 cases | Principal component analysis (PCA) and logistic regression model | Demographics, medical data | Screening via phone and COVID-19 PCR test | Fever, anosmia/ageusia, and myalgia were the strongest signs of positive COVID-19 cases, while having no symptoms, or symptoms limited to nasal congestion/sore throat, was associated with negative cases |
| | [ | Determine the clinical characteristics and outcomes of COVID-19 patients in the NY area | N/A | Demographics, medical data, COVID-19 status | Northwell Health system | The common comorbidities were obesity, hypertension, and diabetes. From outpatients or dead patients ( |
| | [ | Distinguish COVID-19 cough sound from other respiratory diseases through crowdsourced data | Logistic Regression (LR), Gradient Boosting Trees, and Support Vector Machines (SVMs) | Demographics, medical data, COVID-19 data | Web app and Android app | Wet and dry cough are the common symptoms of positive COVID-19 cases, whereas chest tightness and the lack of smell are the common combination symptoms |
| | [ | Discuss the importance of developing complementary technologies to diagnose and monitor COVID-19 infections | N/A | Activity data, medical data | Sensors | Recommends deploying advanced wearable technologies configured to directly address needs in COVID-19 monitoring and symptom detection |
| | [ | Identify the clinical characteristics of COVID-19 to help in mapping the disease and guiding pandemic management | N/A | Demographics, medical data, COVID-19 status, travel data | Health Electronic Surveillance Network (HESN) database for all Saudi Arabia regions | Fever and cough were common symptoms in the study sample |
| | [ | Employ a two-stage cascading platform to enhance the accuracy of machine learning models | Progressive machine learning technique merged with Spark-based linear models, Multilayer Perceptron (MLP), and LSTM | Medical data | Cardiac Arrhythmia Database and Uniform Resource Locator (URL) Reputation Dataset from the University of California Irvine Machine Learning (UCI ML) Repository | Using an improved algorithm with two-step data analysis platforms can increase accuracy with lower computation time |
| | [ | Analyze the dense layers of the convolutional network to increase the accuracy of image classification for diabetic retinopathy | Deep learning model | Medical data, demographics data | The Messidor-2 dataset from the hospital | Using improved programming technology can enhance accuracy |
| | [ | Analyze the effects of COVID-19 on patients with cardiovascular disease | Generalized linear mixed model | Demographics, medical data, COVID-19 status | EHRs from General Hospital of Central Theatre Command in Wuhan, China | Middle-aged and elderly heart patients are most likely to have COVID-19, whereas new-onset hypertension and heart injury are common complications of severe COVID-19 cases |
| Estimate or Predict Risk Score | [ | Specify the effect of COVID-19 on the cardiovascular system | Multi-factor logistic regression model | Demographics and medical data | EHRs | Cardiac function and vital signs should be monitored in COVID-19 patients, especially those with hypotension, pericardial effusion, or severe myocardial injury |
| | [ | Develop and validate a risk score to predict adverse events of suspected COVID-19 patients | Least absolute shrinkage and selection operator (LASSO) and logistic regression models | Demographics and medical data | 15 EDs in Southern California | COVAS score can help physicians to identify patients who may experience a serious event within 7 days |
| | [ | Discover unregistered suspected COVID-19 patients and infectious places | SIR and θ-SEIHRD mathematical models | Demographics and COVID-19 data | IoT-based system and GPS | The proposed system helps identify people who had close contact with COVID-19 patients |
| | [ | Verify if the COVID-19 virus can be transmitted through indirect contact | N/A | Demographics, medical, environmental, and other data | Guangzhou CDC database and sample collection | The virus can survive for a short period on surfaces, allowing indirect transmission of infection to uninfected people |
| | [ | Identify the psychological impact of the COVID-19 outbreak | Bivariate linear regression | Demographics, medical, social data | Online questionnaire | The COVID-19 outbreak has a significant mental impact on people |
| | [ | Analyze the risk of tuberculosis skin on getting infected by tuberculosis | Statistical | Medical data, demographics data | Public source | The tuberculin skin can increase the infection by up to 20% |
| | [ | Predict the course of the COVID-19 epidemic to design a control strategy | A designed mathematical model called SIDARTHE | Demographics, medical, environmental data | Public data from Italian MoH and Italian Civil Protection | Social distancing measures and lockdowns are necessary and effective, and precautionary measures for COVID-19 can only be relieved when tests are conducted on a large scale and a mechanism for contact tracing is in place |
| Healthcare Decision-Making | [ | Evaluate the effectiveness of COVID-19 control measures | C-SEIR model (mathematical model of disease transmission dynamics) | Confirmed COVID-19 data | Public data sources | Quarantine measures have an effective role in containing COVID-19, but they are economically expensive |
| | [ | Develop a patient monitoring platform to directly provide the necessary care | N/A | Demographics, medical, COVID-19 data | Online questionnaire via patient monitoring program | Analyzing patient monitoring data helps to know the risk score to determine the care required, allowing optimal consumption of medical resources |
| | [ | Provide a platform for data collection and analysis to estimate disease incidence to develop risk mitigation strategies and resource allocation | Weighted prediction model | Demographics, medical, COVID-19, and other data | Mobile app | Existing data collection methods can be repurposed to track and obtain real-time data for the population during any rapid global health crisis |
| | [ | Identify the regional distribution of the spread of infection and the percentage of healthcare consumption in each region | N/A | Demographics, medical, and other data | Mobile app | A mobile app can be used for self-assessment and data collection, with results displayed on an interactive map and linked to COVID-19 test results, to support decision-makers and healthcare providers in making decisions |
| | [ | Forecast the census and ventilators requirements for a specific hospital | Weibull and conditional distributions (analytical model) | Statistical data | COVID-19 hospitalized patient records | The model can predict the census and the required number of MV in one, three, and seven days after the simulation run date |
| | [ | Estimate the need for health services and the number of daily deaths over the next 4 months from the date of the study | Statistical model | COVID-19 and other data | WHO websites and local and national authorities in the US states | The model predicts an increased death rate and demand for medical beds, ICU, and MVs |
| | [ | Prove that the three clinical variables: age, fever, tachypnea, can be used to predict the need to admit COVID-19 patients into the ICU | EHRead from Savana [ | Demographics, medical data | EHRs of the hospitals within the Servicio de Salud de Castilla-La Mancha (SESCAM) Healthcare Network in Castilla-La Mancha, Spain | The most common symptoms of male COVID-19 patients with an average age of 58.2 years who were admitted to ICU are coughing, fever, and shortness of breath, while those between 40 and 79 years of age are likely to be admitted to the ICU if they suffer from rapid breathing |
| | [ | Pre-risk assessment of the epidemic in Italy and identification of high-risk areas | a-priori effect of hazard and vulnerability model (a-priori E_H_V) | Statistical and environmental data | Data from Italian Ministry of Economic Policy Planning and Coordination, Italian Ministry of Health website, WHO, Italian Ministry of Agriculture, and ISTAT database | The risk of a pandemic is higher in some northern regions of Italy and the policy model developed can help policymakers make decisions |
| | [ | Estimate the remaining period before consuming the operational capacity of the hospital and its resources | Monte Carlo simulation, SIR model, and COVID-19 Hospital Impact Model (CHIME) | Statistical data | Academic health system for three hospitals in the Philadelphia region | The model can help in making proactive decisions |
Note: CT: chest computed tomography, CDC: center for disease control and prevention, COVAS: COVID-19 acuity score, CHIME: COVID-19 hospital impact model, C-SEIR: conscious-based susceptible exposed infected recovery, ED: emergency department, EHRs: electronic health records, HROS-AD: heart rate over steps anomaly detection, ISTAT: Italian National Institute of Statistics, GPS: global positioning system, ICU: intensive care unit, LSTM: long short-term memory, IoT: internet of things, MoH: Ministry of health, MV: mechanical ventilation, N/A: not available, NY: New York, RHR-Diff: resting heart rate difference, SIDARTHE: susceptible (S), infected (I), diagnosed (D), ailing (A), recognized (R), threatened (T), healed (H) and extinct (E), SIR: susceptible-infected-recovered, θ-SEIHRD: susceptible exposed infectious hospitalized recovered dead, θ: the fraction of detected infected people, US: United States, WHO: world health organization.
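Several of the studies summarized above build on compartmental models such as SIR (susceptible-infected-recovered). As a rough illustration only, the following Python sketch integrates the textbook SIR equations with a forward-Euler step. The function name `simulate_sir` and all parameter values (`beta`, `gamma`, the population size, the time step) are assumptions chosen for demonstration; they are not taken from any of the reviewed studies.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Integrate the basic SIR equations with a forward-Euler step.

    beta  : transmission rate per day (illustrative assumption)
    gamma : recovery rate per day (illustrative assumption)
    Returns a list of (S, I, R) tuples, one per day, starting at day 0.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical scenario: one million people, ten initial infections.
trajectory = simulate_sir(population=1_000_000, initial_infected=10,
                          beta=0.3, gamma=0.1, days=160)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
print(f"Peak infections on day {peak_day}: {trajectory[peak_day][1]:.0f}")
```

Extended models named in the table, such as θ-SEIHRD or SIDARTHE, add further compartments (for example exposed, hospitalized, or diagnosed) to this same basic structure.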
Most popular big data analytics tools.
| Tool | Description | Main Features | Availability | Reference |
|---|---|---|---|---|
| Apache Hadoop [ | Data storage and distributed processing. | Distributed | Open source | |
| IBM [ | IBM provides a variety of big data tools including: | Text Analytics | Commercial | |
| Amazon [ | Data analysis systems | Data Storage | Commercial | |
| Microsoft Azure [ | It is a big data platform that is cloud-based and used for developing, analyzing, installing, and managing applications. | It provides the following services: | Free account available, with popular services free for 12 months. | |
| Qubole | It is an easy, open, and stable Data Lake Platform for machine learning, streaming, and ad-hoc analytics. | Platform that drives an ETL (extraction, transformation, and load): | Commercial | |
| HPCC | Tool that offers a framework for data processing with a single architecture. | Data integration and cluster management are easy. | Open source | |
| MapR | MapR supports all Hadoop APIs and Network File System (NFS). | Hadoop, Spark, and Apache Drill | Open source | |
| KNIME | Data mining | Build and visual workflows. | Open source | |
| Datameer | Integrate data with different engines. | Datameer Spotlight combines virtual data management and easy modeling tools. | Commercial | |
Data storage and management.
| Data Storage | Description | Website |
|---|---|---|
| Cloudera | It extends the Hadoop with extra services | |
| Apache Cassandra | Distributed database management system, multiple servers | |
| Chukwa | Hadoop distributed file system (HDFS) | |
| Apache HBase | Hadoop distributed file system (HDFS) | |
| MongoDB | Document-oriented database | |
| Neo4j | Java-based graph database | |
| CouchDB | Globally distributed server-clusters | |
| Terrastore | Distributed Database Management System (DBMS) that provides per-document consistency guarantees | |
| HibariDB | Hibari is a distributed, ordered key-value store | |
| Riak | NoSQL database, cloud storage | |
Figure 2. Type and source of medical data.
Figure 3. COVID-19 data distribution in the reviewed studies.
Demographics, social, activity, and travel data found in the reviewed studies.
| Data Category | Data Type | Studies |
|---|---|---|
| Demographics data | Gender | [ |
| | Age | [ |
| | Height | [ |
| | Weight | [ |
| | Body mass index (BMI) | [ |
| | Language | [ |
| | Race | [ |
| | Ethnicity | [ |
| | Nationality | [ |
| | Religion | [ |
| | Marital status | [ |
| | Median income | [ |
| | Zip code/postal code | [ |
| | Location/geolocation | [ |
| | Region | [ |
| | Insurance | [ |
| | Job/educational institute | [ |
| | Number of family members | [ |
| Social data | Social stressors | [ |
| Activity data | Steps | [ |
| | Sleep | [ |
| | Heart rate | [ |
| | Home-quarantine activities | [ |
| Travel data | Recent outside travel history | [ |
| | Outside destinations | [ |
Medical, COVID-19, samples, statistical, and environmental data found in the reviewed studies.
| Data Category | Data Type | Studies |
|---|---|---|
| Medical data | Vital signs | [ |
| | Symptoms | [ |
| | Comorbidities | [ |
| | Medical history | [ |
| | Routinely taken medications | [ |
| | Laboratory findings | [ |
| | CT scans | [ |
| | Required ICU | [ |
| | ICU length of stay | [ |
| | Readmission status | [ |
| COVID-19 data | Number of cases and status | [ |
| | Test date | [ |
| | Results (laboratory, outcome) | [ |
| | Symptom onset date | [ |
| | Incubation periods | [ |
| | Treatment measures | [ |
| | Infection feels | [ |
| Samples | Throat swabs | [ |
| | Blood samples | [ |
| | Aerosol and surface samples | [ |
| Statistical data | Healthcare visits | [ |
| | Hospital capability and utilization | [ |
| | Known regional injuries | [ |
| | Percentages related to ICU | [ |
| | Future daily admissions | [ |
| | Percentage of inpatients requiring MV | [ |
| | ICU lengths of stay | [ |
| | Duration of MV | [ |
| | App satisfaction assessment | [ |
| | Hospital market share | [ |
| | Population age and size | [ |
| Environmental data | Epidemiological data | [ |
| | Air pollution | [ |
| | Winter temperature | [ |
| | Healthcare density | [ |
| | Human mobility | [ |
| | Housing concentration | [ |
Note: ICU: intensive care unit, MV: mechanical ventilation.
Summary of vital signs and outwardly measurable symptoms considered by the existing studies.
| Data Category | Data Type | Studies |
|---|---|---|
| Vital signs | Temperature | [ |
| | Heart rate | [ |
| | Respiratory rate | [ |
| | Blood pressure systolic | [ |
| | Blood pressure diastolic | [ |
| | Oxygen saturation | [ |
| Symptoms | Fever | [ |
| | Shortness of breath | [ |
| | Respiratory crackles | [ |
| | Wheezing | [ |
| | Rhonchus | [ |
| | Chest pain | [ |
| | Cough | [ |
| | Sneezing | [ |
| | Chills | [ |
| | Nasal congestion/runny nose | [ |
| | Ageusia/Anosmia (lack of smell and taste) | [ |
| | Headache | [ |
| | Sore throat | [ |
| | Dysphagia | [ |
| | Sputum production | [ |
| | Fatigue/lack of energy | [ |
| | Muscle aches | [ |
| | Diarrhea | [ |
| | Vomiting | [ |
| | Loss of appetite | [ |
| | Trouble sleeping | [ |
| | Stomach pain | [ |
| | Rash | [ |
| | Neuralgia | [ |
Figure 4. Vital signs’ distribution in the reviewed studies.
Figure 5. Symptoms’ distribution in the reviewed studies.
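The vital signs and symptoms listed in the summary table above could be captured in a simple record type before analysis. The following Python sketch shows one possible layout. The class name `PatientObservation`, the field names, and the units are hypothetical choices made for illustration; the reviewed studies do not prescribe a common schema.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional, Set


@dataclass
class PatientObservation:
    """One observation of a patient; every vital sign is optional
    because individual studies collect different subsets."""
    temperature_c: Optional[float] = None
    heart_rate_bpm: Optional[float] = None
    respiratory_rate_per_min: Optional[float] = None
    systolic_bp_mmhg: Optional[float] = None
    diastolic_bp_mmhg: Optional[float] = None
    oxygen_saturation_pct: Optional[float] = None
    # Free-form symptom labels, e.g. "fever", "cough", "anosmia".
    symptoms: Set[str] = field(default_factory=set)

    def recorded_vital_signs(self) -> List[str]:
        """Return the names of the vital signs that were measured."""
        return [
            f.name for f in fields(self)
            if f.name != "symptoms" and getattr(self, f.name) is not None
        ]


# Hypothetical example values, not data from any study.
obs = PatientObservation(temperature_c=38.2, oxygen_saturation_pct=94.0,
                         symptoms={"fever", "cough"})
print(obs.recorded_vital_signs())
```

Keeping each vital sign optional reflects the uneven coverage across studies: a record can be created from a wearable that reports only heart rate just as easily as from a full clinical screening.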