Saloni Shah¹, Aos Mulahuwaish¹, Kayhan Zrar Ghafoor²,³, Halgurd S Maghdid⁴
Abstract
Since the initial reports of the coronavirus surfacing in Wuhan, China, the novel virus, currently without a cure, has spread like wildfire across the globe. It has spread exponentially across every inhabited continent, catching local governments by surprise in many cases and bringing the world economy to a standstill. As local authorities work on a response to the virus, the scientific community has stepped in to help analyze and predict the patterns and conditions that influence the spread of this unforgiving virus. Using tools ranging from established statistical modeling to the latest artificial intelligence technology, researchers have drawn on publicly and privately available data to make predictions. Much of this research has enabled local authorities to plan their response, whether by deploying scarce medical resources such as ventilators or by deciding how and when to enforce social distancing policies, including lockdowns. On the one hand, this paper shows what accurate research can contribute to fighting this disease; on the other hand, it also shows how a lack of response from local authorities can accelerate the spread of the virus. This is our attempt to compile different research methods and compare their accuracy in predicting the spread of COVID-19.
Keywords: COVID-19; Deep learning; Machine learning; Prediction methods
Year: 2021 PMID: 34305251 PMCID: PMC8285044 DOI: 10.1007/s10462-021-09988-w
Source DB: PubMed Journal: Artif Intell Rev ISSN: 0269-2821 Impact factor: 9.588
Fig. 1 Snapshot of the Johns Hopkins University Coronavirus Resource Centre (June 16, 2020)
Comparison of existing prediction models for COVID-19 spread
| Authors | Objective | Method/Model used | Dataset used | Output and accuracy | Weakness |
|---|---|---|---|---|---|
| Li Yan et al. ( | Focused on using biomarkers obtained from blood samples to predict severe COVID-19 cases with a higher risk of mortality | Supervised XGBoost classifier (decision-tree-based machine learning model) | Blood samples from 485 infected patients in the region of Wuhan, China (Jan 10–Feb 18, 2020) | The model can predict patient mortality more than 10 days in advance with more than 90% accuracy | Because the method is data-dependent, the model will vary across datasets. Single-center, retrospective study that lacks a large-sample, multi-center validation |
| Elmousalami and Hassanien ( | Use time series models to analyze and predict the spread of COVID-19, drawing on existing datasets from renowned sources such as Johns Hopkins University | Time series models (moving average (MA), weighted moving average (WMA), and single exponential smoothing (SES)) and mathematical formulations | Open databases of COVID-19 cases developed by WHO, the National Health Commission of China, and Johns Hopkins University | Day-level forecasting models for COVID-19 using time series models and mathematical forecasting | Depends on the available data; forecasts may miss underreported cases |
| Tomar and Gupta ( | Using data-driven estimation models, predict the rate of COVID-19 infection in India 30 days ahead, and the impact of preventive measures such as social distancing on the infection rate | LSTM-based technique implemented in the MATLAB environment | Indian Govt. COVID-19 Dashboard database (Jan 30–April 4, 2020) | Number of recovery days, effect of the transmission rate on the number of cases, and effect of the transmission rate under social distancing | Models are based on limited available data, which affects accuracy |
| Zhao et al. ( | Use a modified SEIR model to analyze the spread of COVID-19 in different regions, depending on how local authorities intervene and which policies they adopt to curb the pandemic | Metropolis-Hastings (MH) parameter estimation method and a modified Susceptible-Exposed-Infectious-Recovered (SEIR) model | Data released by Johns Hopkins University | Classifies the six studied African nations into three categories (suppression, mitigation, or mildness), with a fairly accurate categorization of the nations | Assumes the intervention intensity of the studied nations is a fraction of that of the comparison model (China). The model may not predict the growth rate if the suggested interventions are not carried out in time |
| Yang et al. ( | To predict the likely course of the epidemic, its peak and, more importantly, the impact of intervention measures in China; also to predict whether delaying intervention would lead to a second outbreak or peak | Susceptible-Exposed-Infectious-Removed (SEIR) and Long Short-Term Memory (LSTM) models | Integrated population migration data from before and after January 23 (inbound and outbound travel by rail, air, and road, sourced from a web-based program) and the most up-to-date COVID-19 epidemiological data (National Health Commission of China) | The models predict the trend of COVID-19 spread in mainland China with reasonable confidence and show promise for future prediction of the epidemic | The accuracy of the models depends on how control measures are implemented |
| Zou et al. ( | Propose a new model that accounts for untested or unreported cases when predicting COVID-19 case counts (active cases or deaths) | UCLA-SuEIR (Susceptible, unreported, Exposed, Infectious, and Recovered) | New York Times COVID-19 data and Johns Hopkins University Center for Systems Science and Engineering data | Provides projections of the numbers of infections and deaths, and predicts peak dates of active cases | The biggest challenge in substantiating this model's findings is data, since unreported cases are by definition not reported |
| Hamzah et al. ( | Predictive analysis using the SEIR model, and sentiment analysis classifying verified news as positive or negative | Susceptible-Exposed-Infectious-Removed (SEIR) and Bidirectional Encoder Representations from Transformers (BERT) | Johns Hopkins, WHO, and the local Chinese website DingXiangYuan | Presented the data on a website using standard techniques; good from a visual standpoint | No new approach, and sentiment analysis may not always produce accurate results |
| Fanelli and Piazza ( | A prediction model for the maximum number of infected individuals and the timing of the peak; using simple quantitative models, shows how containment efforts can help reduce the spread | Mean-field approximation in a modified Susceptible-Infectious-Recovered-Deceased (SIRD) model | GitHub repository associated with the interactive dashboard hosted by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University, Baltimore, USA | Provides estimates of the timing and magnitude of the epidemic peak | The key drawback is that the model assumes standard conditions: it fails to track rapid recovery (a decrease in the number of infected cases) and overestimates the number of deaths when extreme measures such as social distancing are applied |
| Sajadi et al. (Mohammad | To study climate data in order to establish correlations between regions with similar climates, and to build a model that predicts possible new locations based on their climatic similarity to current COVID-19 hotspots | ERA5 reanalysis of the data, compared with areas that are either unaffected or lack significant community spread, followed by statistical analysis and map production using GraphPad Prism | Climate data from cities worldwide with significant community spread of COVID-19 | The analysis shows a statistically significant association of temperature and specific humidity with how severely areas are affected by the pandemic | Does not account for human factors such as intervention, climatic factors such as cloud cover, or viral factors such as mutation, any of which can make the model's predictions unreliable |
| Mollalo et al. ( | Use spatial models combined with multiple socio-economic factors to explain the geographic variation in the spread of COVID-19 across the United States (USA) | Five models based on spatial analysis techniques (global: OLS, SLM, SEM; local: GWR, MGWR) | County-level counts of COVID-19 cases retrieved from USAFacts (Jan 22–April 9, 2020). Crude incidence rates were computed for the counties and joined to the administrative boundary shapefile of counties obtained from the TIGER/Line database | GIS models showing the spread of COVID-19 | The model could not include county-level data, which affects the accuracy of predictions. The model also cannot show the impact of lockdown procedures on the spread |
| Kuchler et al. ( | Using the Social Connectedness Index (SCI), establish a correlation between social connectedness and the spread of COVID-19, and predict the spread between socially connected areas | Analysis of the Social Connectedness Index (SCI) introduced by Bailey et al. ( | Aggregated (anonymized) data from Facebook | Heatmap of socially connected people in selected hotspots, and prediction of virus spread based on the social network | The model may rely on inaccurate or incomplete data, since people tend to enter incorrect information or may have opted out, which reduces the accuracy of the index |
| Maghdid et al. ( | A smartphone-based contact tracing approach that notifies people who may have been in contact with a COVID-19-positive person. It helps local authorities with lockdown management (area and duration of lockdown) | Unsupervised machine learning (UML): k-means clustering algorithm | Custom dataset, part of the developed solution, storing the following information: name, zip code, age, phone number, smartphone MAC address, gender, and COVID-19 status | Smartphone app dashboard with individual alerts and a GIS-enabled map of possible COVID-19 hotspots; a web portal dashboard tracing possibly affected users | Accuracy depends on user registration: the more users register, the higher the accuracy. The research uses an Android-based application, and it is unclear whether an iOS version was developed, despite the popularity of Apple smartphones. The type of information the app stores raises privacy concerns |
| Kufel et al. ( | An Auto-Regressive Integrated Moving Average (ARIMA) model for predicting the dynamics of COVID-19 incidence at different stages of the epidemic | Auto-Regressive Integrated Moving Average (ARIMA) | Johns Hopkins University Center for Systems Science and Engineering database | Forecasts of growth during six selected sub-periods of the epidemic in 32 European countries | The model is most likely useful for short-term forecasts. It does not address the role of non-pharmaceutical interventions or population testing policies |
| Tuli et al. ( | Compare forecasts based on the Generalized Inverse Weibull (GIW) and Gaussian distributions | Generalized Inverse Weibull (GIW) distribution | WHO database | An iteratively weighted curve-fitting model using the GIW distribution (called "Robust Weibull"), which improved prediction accuracy over the Gaussian model | Did not factor population density, age, or government intervention methods into the regression model |
| Tuli et al. ( | Develop an LSTM forecasting model based on the Generalized Inverse Weibull (GIW) distribution | LSTM-based Robust Weibull approach (W-LSTM) | Epidemic data from the European Centre for Disease Prevention and Control (ECDC); socio-economic data from IndexMundi and the World Bank; virus data from bioRxiv | A forecast model integrating LSTM and the GIW distribution, which demonstrated better results than ARIMA and other traditional ML-based models | No obvious weaknesses suggested by the authors |
| Zheng et al. ( | To forecast the inflection point of new cases in selected countries using existing data | State transition matrix (STM) modeling | National Health Commission of China | A scenario-based model forecasting different inflection points during the global spread of the virus | Forecast accuracy was limited by insufficient data |
| Shahid et al. ( | To assess and identify the best predictive model among autoregressive integrated moving average (ARIMA), support vector regression (SVR), long short-term memory (LSTM), gated recurrent units (GRU), and bidirectional long short-term memory (Bi-LSTM) | ARIMA, SVR, LSTM, Bi-LSTM, GRU | Harvard dataset taken from the link: | Bi-LSTM showed the best results on performance measures such as MAE, RMSE, and R² score | No obvious weaknesses suggested by the authors |
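The three time-series models in Elmousalami and Hassanien's row are classic smoothers. Below is a minimal sketch of each one applied to a hypothetical series of daily new cases. The window length, the linearly increasing WMA weights (one common scheme among several), and the smoothing constant alpha are assumptions, not the study's settings.

```python
def moving_average(series, window):
    """MA: unweighted mean of each run of `window` consecutive values."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]


def weighted_moving_average(series, window):
    """WMA: weights 1, 2, ..., window, so the newest value counts most."""
    weights = range(1, window + 1)
    total = sum(weights)
    return [sum(w * x for w, x in zip(weights, series[i - window + 1:i + 1])) / total
            for i in range(window - 1, len(series))]


def single_exponential_smoothing(series, alpha):
    """SES: s_t = alpha * x_t + (1 - alpha) * s_{t-1}, starting from s_0 = x_0.
    The last smoothed value serves as the one-step-ahead forecast."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical daily new-case counts.
daily_cases = [10, 12, 15, 20, 26]
print(moving_average(daily_cases, 3))                  # last value is about 20.33
print(weighted_moving_average(daily_cases, 3))         # about [13.17, 17.0, 22.17]
print(single_exponential_smoothing(daily_cases, 0.5))  # [10, 11.0, 13.0, 16.5, 21.25]
```

A larger alpha makes SES react faster to the latest counts but also to their noise. The WMA weights pull the estimate toward recent days in a similar way.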
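Zhao et al. estimate the parameters of their modified SEIR model with the Metropolis-Hastings method. The sketch below shows only the sampling mechanism, so it substitutes a much simpler one-parameter model. It assumes daily counts are Poisson with mean `a * exp(r * t)`, a Uniform(0, 1) prior on the growth rate `r`, and a Gaussian random-walk proposal. The model, prior, step size, and synthetic data are all illustrative assumptions, not the authors' setup.

```python
import math
import random


def log_likelihood(r, counts, a):
    """Poisson log-likelihood of daily counts with mean a * exp(r * t)."""
    total = 0.0
    for t, y in enumerate(counts):
        lam = a * math.exp(r * t)
        total += y * math.log(lam) - lam - math.lgamma(y + 1)
    return total


def metropolis_hastings(counts, a, r0=0.1, step=0.005, n_iter=5000, seed=0):
    """Random-walk Metropolis-Hastings for the growth rate r.

    Prior: Uniform(0, 1). The Gaussian proposal is symmetric, so inside the
    prior's support the acceptance probability is min(1, likelihood ratio);
    proposals outside the support are always rejected."""
    rng = random.Random(seed)
    r, ll = r0, log_likelihood(r0, counts, a)
    samples = []
    for _ in range(n_iter):
        proposal = r + rng.gauss(0.0, step)
        if 0.0 < proposal < 1.0:
            ll_new = log_likelihood(proposal, counts, a)
            if ll_new >= ll or rng.random() < math.exp(ll_new - ll):
                r, ll = proposal, ll_new
        samples.append(r)  # a rejected move repeats the current state
    return samples


# Synthetic (hypothetical) data generated with a = 10 and r = 0.2.
counts = [round(10 * math.exp(0.2 * t)) for t in range(30)]
chain = metropolis_hastings(counts, a=10.0)
burned = chain[1000:]  # discard burn-in
print(sum(burned) / len(burned))  # posterior mean, close to 0.2
```

The spread of `burned` approximates the posterior uncertainty in `r`. An SEIR version would evaluate the likelihood on simulated model output instead of a closed-form curve.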
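Kuchler et al. build on the Social Connectedness Index of Bailey et al. That index divides the number of Facebook friendship links between two regions by the product of the two regions' Facebook user counts. The sketch below assumes that definition and ignores any rescaling applied to the published index. It summarizes a region's exposure as an SCI-weighted average of case rates over connected regions, including the home region. That weighting choice, the region names, and all the numbers are hypothetical.

```python
def social_connectedness_index(links, users):
    """SCI_ij = links_ij / (users_i * users_j).

    links: {(i, j): number of friendship links between regions i and j}
    users: {region: number of users in that region}"""
    return {(i, j): n / (users[i] * users[j]) for (i, j), n in links.items()}


def social_proximity_to_cases(region, sci, case_rate):
    """SCI-weighted average of case rates over all regions j linked to `region`:
    sum_j case_rate_j * SCI_ij / sum_j SCI_ij."""
    weights = {j: s for (i, j), s in sci.items() if i == region}
    total = sum(weights.values())
    return sum(w * case_rate[j] for j, w in weights.items()) / total


# Hypothetical regions, user counts, links, and cases per 10,000 residents.
users = {"A": 1_000, "B": 500, "C": 2_000}
links = {("A", "A"): 50_000, ("A", "B"): 5_000, ("A", "C"): 2_000}
case_rate = {"A": 10.0, "B": 40.0, "C": 100.0}

sci = social_connectedness_index(links, users)
print(social_proximity_to_cases("A", sci, case_rate))  # about 16.39
```

Normalizing by the user counts keeps large regions from dominating simply because they have more users. What remains measures how strongly the two populations are connected.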
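Maghdid et al. use k-means clustering to flag possible hotspots. The sketch below is a plain implementation of Lloyd's k-means algorithm on hypothetical 2-D location points. Several choices here are assumptions, not details of the authors' app: bare x/y coordinates as the features, the value of k, and a deterministic farthest-first initialization (k-means++ is the common randomized alternative).

```python
import math


def farthest_first(points, k):
    """Deterministic initialization: start from the first point, then keep adding
    the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm: assign each point to its nearest centroid, move each
    centroid to the mean of its points, and repeat until nothing changes."""
    centroids = farthest_first(points, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical (x, y) positions of users reporting a positive status.
reports = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, groups = kmeans(reports, k=2)
print(sorted(centers))  # [(0.5, 0.5), (10.5, 10.5)]
```

Each returned center is a candidate hotspot. In practice, k and the feature scaling strongly affect which areas get flagged.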
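Kufel et al.'s ARIMA forecasts can be illustrated with the smallest non-trivial member of the family, ARIMA(1,1,0): difference the series once, then regress each difference on the previous one. The sketch below fits it by conditional least squares in plain Python. The model order is an assumption, since the table does not give the authors' orders. Real analyses would normally use a library such as statsmodels, which also handles moving-average terms and maximum-likelihood estimation.

```python
def fit_arima_110(series):
    """Fit ARIMA(1,1,0) by conditional least squares.

    d = 1: model the first differences dy_t = y_t - y_{t-1}.
    p = 1: dy_t = c + phi * dy_{t-1} + e_t, with c and phi from OLS.
    q = 0: no moving-average terms."""
    dy = [b - a for a, b in zip(series, series[1:])]
    x, z = dy[:-1], dy[1:]  # lagged differences and the differences they predict
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    phi = sxz / sxx
    c = mz - phi * mx
    return c, phi


def forecast_arima_110(series, c, phi, steps):
    """Forecast future differences recursively, then add them back onto the
    last observed level (e.g. cumulative cases)."""
    level, diff = series[-1], series[-1] - series[-2]
    forecasts = []
    for _ in range(steps):
        diff = c + phi * diff
        level += diff
        forecasts.append(level)
    return forecasts


# Hypothetical cumulative case counts.
cumulative = [100, 110, 121, 133, 146, 161, 177, 195, 214, 236]
c, phi = fit_arima_110(cumulative)
print(forecast_arima_110(cumulative, c, phi, steps=3))
```

The recursion makes clear why such models suit short horizons: each forecast feeds the next, so errors compound as the horizon grows.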
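Tuli et al. fit case curves with the Generalized Inverse Weibull distribution. For reference, the sketch below codes one standard GIW parameterization: the CDF F(x) = exp(-γ(α/x)^β) for x > 0, its density, and the location of the density's peak. Using a scaled GIW density as the shape of the daily-case curve is an assumption. This is not the authors' iteratively weighted "Robust Weibull" fitting procedure.

```python
import math


def giw_cdf(x, alpha, beta, gamma):
    """F(x) = exp(-gamma * (alpha / x) ** beta) for x > 0."""
    return math.exp(-gamma * (alpha / x) ** beta)


def giw_pdf(x, alpha, beta, gamma):
    """f(x) = dF/dx = gamma * beta * alpha**beta * x**-(beta + 1) * exp(-gamma * (alpha / x)**beta)."""
    return (gamma * beta * alpha ** beta * x ** -(beta + 1)
            * math.exp(-gamma * (alpha / x) ** beta))


def giw_mode(alpha, beta, gamma):
    """Peak of the density, from setting d(log f)/dx = 0:
    x* = alpha * (gamma * beta / (beta + 1)) ** (1 / beta)."""
    return alpha * (gamma * beta / (beta + 1)) ** (1 / beta)


# Hypothetical parameters. Under the assumption above, total_cases * giw_pdf(day, ...)
# would be the expected daily new-case curve, peaking near day giw_mode(...).
alpha, beta, gamma = 20.0, 3.0, 1.0
print(giw_mode(alpha, beta, gamma))  # about 18.17
```

The heavy right tail of the GIW density lets the fitted curve decline more slowly than it rose. A symmetric Gaussian curve cannot do this.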
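Shahid et al. rank their models by MAE, RMSE, and R². The standard definitions are sketched below. They match the formulas behind scikit-learn's `mean_absolute_error`, `r2_score`, and the square root of `mean_squared_error`. The case counts in the example are hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large misses more than MAE does."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical observed vs. predicted daily cases.
observed = [100, 120, 150, 200]
predicted = [110, 115, 160, 190]
print(mae(observed, predicted))       # 8.75
print(rmse(observed, predicted))      # about 9.01
print(r2_score(observed, predicted))  # about 0.943
```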
Fig. 2 Categorization of prediction models based on machine learning and mathematical models
Fig. 3 Big data and its applications for fighting the COVID-19 pandemic
Fig. 4 Basic structure of LSTM
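Several of the surveyed studies (Tomar and Gupta, Yang et al., Tuli et al., Shahid et al.) rely on LSTMs. The sketch below implements the standard LSTM cell equations, without peephole connections. It assumes a single hidden unit and a scalar input so that it runs in plain Python. The parameter values are hypothetical and untrained; real forecasting models use larger hidden states and weights learned by backpropagation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a single-unit LSTM cell with scalar input x.

    params maps each gate name to (input weight W, recurrent weight U, bias b)."""
    def preactivation(gate):
        w, u, b = params[gate]
        return w * x + u * h_prev + b

    f = sigmoid(preactivation("forget"))       # share of the old cell state kept
    i = sigmoid(preactivation("input"))        # share of the candidate written
    o = sigmoid(preactivation("output"))       # share of the cell state exposed
    g = math.tanh(preactivation("candidate"))  # candidate cell content
    c = f * c_prev + i * g                     # new cell state
    h = o * math.tanh(c)                       # new hidden state (the output)
    return h, c


def lstm_forward(sequence, params):
    """Run the cell over a sequence, starting from zero states."""
    h = c = 0.0
    outputs = []
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs


# Hypothetical, untrained parameters and a normalized input series.
params = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.8, 0.2, 0.0),
    "output": (0.6, 0.3, 0.0),
    "candidate": (1.0, 0.5, 0.0),
}
print(lstm_forward([0.1, 0.3, 0.2, 0.4], params))
```

The additive cell-state update `c = f * c_prev + i * g` is what lets an LSTM carry information across many time steps. A forecasting model would add a trained output layer mapping `h` to the predicted case count.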
Fig. 5 Illustration of the SuEIR model. Solid lines represent the transitions of individuals and dashed lines represent the routes of infection
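To make the compartment flows in Fig. 5 concrete, the sketch below integrates an SuEIR-style system with forward Euler steps. It assumes constant parameters and the following structure, inferred from the caption. Susceptible people are infected through contact with both exposed and infectious people. A fraction mu of those leaving E become reported infectious cases (I), and the rest move to the unreported compartment U. UCLA-SuEIR itself fits time-varying parameters to data, so beta, sigma, gamma, mu, and the population figures here are hypothetical.

```python
def simulate_sueir(n, e0, i0, beta, sigma, gamma, mu, days, dt=0.1):
    """Forward-Euler integration of an SuEIR-style model (constant parameters):

        dS/dt = -beta * S * (E + I) / N
        dE/dt =  beta * S * (E + I) / N - sigma * E
        dI/dt =  mu * sigma * E - gamma * I
        dU/dt = (1 - mu) * sigma * E
        dR/dt =  gamma * I

    Returns one (S, E, I, U, R) tuple per simulated day."""
    s, e, i, u, r = n - e0 - i0, float(e0), float(i0), 0.0, 0.0
    steps_per_day = round(1 / dt)
    history = []
    for _ in range(days):
        for _ in range(steps_per_day):
            infections = beta * s * (e + i) / n * dt  # S -> E, driven by both E and I
            onsets = sigma * e * dt                   # E -> I (reported) or U (unreported)
            removals = gamma * i * dt                 # I -> R
            s -= infections
            e += infections - onsets
            i += mu * onsets - removals
            u += (1 - mu) * onsets
            r += removals
        history.append((s, e, i, u, r))
    return history


# Hypothetical parameters: 40% of infections are reported.
trajectory = simulate_sueir(n=1_000_000, e0=100, i0=10, beta=0.3, sigma=0.2,
                            gamma=0.1, mu=0.4, days=200)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][2])
print(peak_day, round(trajectory[-1][3]))  # peak day of reported infections, final unreported count
```

The flows only move people between compartments, so the five compartments always sum to N. Lowering mu shifts infections from I into U without changing the epidemic's overall size, because in this structure only E and I drive new infections.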
Fig. 6 (a) Infection rate data comparison; (b) infection trend comparison
Fig. 7 The results of the lockdown prediction model for two different scenarios
Fig. 8 AI and technologies to confront the COVID-19 pandemic