Michael C Lucic1, Hakim Ghazzai1, Carlo Lipizzi1, Yehia Massoud2. 1. Stevens Institute of Technology, Hoboken, NJ 07030, USA. 2. Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia.
Abstract
Goal: The United States (US) is currently one of the countries hardest hit by the novel SARS-CoV-2 virus. A key difficulty in managing the outbreak at the national level is that, because of the country's diversity, geographic spread, and economic inequality, the COVID-19 pandemic in the US behaves as a series of diverse regional outbreaks rather than a single synchronized, homogeneous one. Method: To assess regional risk related to COVID-19, we develop a two-phase modeling approach that accounts for demographic and economic criteria. First, an unsupervised clustering technique, specifically k-means, groups US counties by demographic and economic similarity. Then, time series forecasts are built for each cluster of counties to assess the short-run viral transmissibility risk. Results: We test ARIMA and Seasonal Trend Random Walk forecasts to determine which is more appropriate for modeling the spread and lethality of COVID-19. Based on this analysis, we use the superior ARIMA models to forecast future COVID-19 trends in the clusters and identify the areas of the US with the highest COVID-19-related risk heading into the winter of 2020. Conclusion: Incorporating sub-national socioeconomic characteristics into data-driven COVID-19 infection and fatality forecasts may play a key role in assessing the risk associated with changes in infection patterns at the national level.
Keywords:
ARIMA; COVID-19; k-means clustering; data analytics; time series analysis
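The two-phase approach summarized in the abstract (cluster counties first, then forecast each cluster) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the feature tuples and case counts are made-up toy values, the function names are hypothetical, the deterministic farthest-first seeding stands in for whatever initialization the authors used, and the ARIMA order (1,1,0) with drift is chosen only because it can be fitted by ordinary least squares in a few lines. The abstract does not state the model orders or the number of clusters.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-first seeding (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Seed the next center at the point farthest from all current centers.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


def aggregate_by_cluster(county_series, labels, k):
    """Sum the per-county case series within each cluster."""
    totals = [[0] * len(county_series[0]) for _ in range(k)]
    for series, lab in zip(county_series, labels):
        for t, value in enumerate(series):
            totals[lab][t] += value
    return totals


def fit_arima_110(series):
    """Least-squares fit of d_t = c + phi * d_(t-1), where d is the first difference."""
    d = [b - a for a, b in zip(series, series[1:])]
    x, y = d[:-1], d[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    phi = sxy / sxx if sxx else 0.0  # constant differences: fall back to pure drift
    return my - phi * mx, phi


def forecast_arima_110(series, steps, c, phi):
    """Iterate the fitted difference equation and integrate back to levels."""
    last, d = series[-1], series[-1] - series[-2]
    out = []
    for _ in range(steps):
        d = c + phi * d
        last = last + d
        out.append(last)
    return out


# Toy inputs, invented for this sketch (NOT data from the paper).
# Each tuple is a pair of already-scaled county features.
features = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
cumulative_cases = [
    [1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 12], [1, 3, 5, 7, 9, 11],
    [5, 10, 15, 20, 25, 30], [3, 6, 9, 12, 15, 18], [2, 4, 6, 8, 10, 12],
]

labels, centers = kmeans(features, k=2)                               # phase 1
cluster_series = aggregate_by_cluster(cumulative_cases, labels, k=2)
forecasts = [forecast_arima_110(s, 3, *fit_arima_110(s)) for s in cluster_series]  # phase 2
```

In practice the features would be standardized before clustering, and the ARIMA order would be selected per cluster (for example by an information criterion) and compared against a baseline such as the Seasonal Trend Random Walk mentioned in the abstract.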