Joseph Galasso, Duy M Cao, Robert Hochberg.
Abstract
During the COVID-19 pandemic, predicting case spikes at the local level is important for a precise, targeted public health response and is generally done with compartmental models. The performance of compartmental models is highly dependent on the accuracy of their assumptions about disease dynamics within a population; thus, such models are susceptible to human error, unexpected events, or unknown characteristics of a novel infectious agent like COVID-19. We present a relatively non-parametric random forest model that forecasts the number of COVID-19 cases at the U.S. county level. Its highest-priority training features are derived from easily accessible, standard epidemiological data (i.e., regional test positivity rate) and from the effective reproduction number ($R_t$) estimated by compartmental models. A novel input feature consists of case projections generated by aligning the estimated effective reproduction number (pre-computed by COVIDActNow.org) with real-time testing data until the two are maximally correlated, which helps our model fit the epidemic's trajectory as ascertained by traditional models. The poor reliability of $R_t$ is partially mitigated by adding dynamic population mobility, together with the prevalence and mortality of non-COVID-19 diseases as gauges of population disease susceptibility. The model was used to generate forecasts 1, 2, 3, and 4 weeks ahead for each reference week from 11/01/2020 to 01/10/2021 for 3068 counties. Over this period, it maintained a mean absolute error (MAE) of less than 300 weekly cases/100,000 and consistently outperformed, or performed comparably to, gold-standard compartmental models. Furthermore, it shows great potential for ensemble modeling, since it can accommodate a more expansive training feature set while maintaining good performance and limited resource utilization.
Keywords: COVID-19; Compartmental model; Mobility; Random forest; US county
Year: 2022 PMID: 35013654 PMCID: PMC8731233 DOI: 10.1016/j.chaos.2021.111779
Source DB: PubMed Journal: Chaos Solitons Fractals ISSN: 0960-0779 Impact factor: 5.944
Raw training data sources and normalizations. Description of datasets, variables extracted, regional level, and applied normalization in the training dataset.
| Raw dataset source | Feature(s) | Geographic level | Transformation applied |
|---|---|---|---|
| Johns Hopkins University (JHU) CSSE | Weekly case increase | CCE-level | Rolling 7-day sum of case IR (…) |
| Facebook.com | Daily mobility relative to average baseline, proportion of users staying in same location | CCE-level | Rolling 7-day mean |
| COVID Tracking Project | Daily test increase, test positivity | State-level | Rolling 7-day mean of test IR (…) |
| COVIDActNow.org | Daily estimated $R_t$ | CCE-level and state-level | None |
| Surgo Foundation […] | Metric that assesses CCE vulnerability to COVID-19, taking into account socioeconomic, epidemiological, and healthcare system risk factors | CCE-level | None |
| Institute for Health Metrics and Evaluation (IHME) | Infectious disease mortality rates (tuberculosis, AIDS, diarrheal disease, lower respiratory disease, meningitis, hepatitis) | CCE-level | None |
| IHME | Respiratory disease mortality rates (interstitial lung disease, asthma, coal pneumoconiosis, asbestosis, silicosis, pneumoconiosis, COPD, chronic respiratory disease, other pneumoconiosis, other respiratory diseases) | CCE-level | None |
| IHME | Mortality risk (0–5, 5–25, 25–45, 45–65, and 65–85 age groups), life expectancy | CCE-level | None |
| IHME | Diabetes prevalence rates | CCE-level | None |
| IHME | Obesity prevalence rates (combined male and female) | CCE-level | None |
| U.S. Census (2018 estimates) | Prevalence of African Americans, Native Americans, Hispanic Americans, Multiracial Americans, and individuals over 65 years of age | CCE-level | None |
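
The 7-day transformations listed in the table can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that "IR" denotes counts per 100,000 residents and that the rolling windows are trailing (each value covers the current day and the six days before it). The function names are hypothetical.

```python
from typing import List, Optional


def incidence_rate(daily_counts: List[float], population: int,
                   per: float = 100_000) -> List[float]:
    """Convert raw daily counts into counts per `per` residents.

    Assumption: 'IR' in the table means incidence per 100,000 population.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return [c * per / population for c in daily_counts]


def rolling_7day(values: List[float], how: str = "sum",
                 window: int = 7) -> List[Optional[float]]:
    """Trailing rolling sum or mean; None until a full window is available."""
    if how not in ("sum", "mean"):
        raise ValueError("how must be 'sum' or 'mean'")
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
            continue
        total = sum(values[i + 1 - window:i + 1])
        out.append(total if how == "sum" else total / window)
    return out


# Example: weekly case IR for a hypothetical county of 50,000 people.
daily_cases = [10, 12, 8, 15, 20, 18, 17, 25]
weekly_case_ir = rolling_7day(incidence_rate(daily_cases, 50_000), how="sum")
print(weekly_case_ir[-2:])  # → [200.0, 230.0]
```

A trailing window keeps each weekly value computable from data available on that day, which matters when the series is later used as a forecasting input.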
Fig. 1. $R_t$ and case-prediction feature generation for a CCE. This procedure is repeated for 7-, 14-, 21-, and 28-day forecasts. In Fig. 1A and Fig. 1B, the $R_t$ time-series of the CCE and of its state are considered separately; whichever achieves the highest Pearson correlation with the normalized case time-series at any forward shift $x < 50$ days (i.e., the optimal shift) is used for the regression model in Fig. 1B. The extrapolation in Fig. 1E is calculated by linear regression models trained on the last 14 defined values of each curve; curves are extrapolated to the target end date (i.e., 7, 14, 21, or 28 days in the future). In Fig. 1F, curves in prediction time hold forecast values relative to real time; thus, for 28-day forecasts, the values shown are those forecast 28 days into the future.
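
Two numerical steps in the Fig. 1 procedure, choosing the forward shift that maximizes Pearson correlation and extrapolating a curve from its last 14 defined values, can be sketched in plain Python. This sketch assumes both series are daily, equal-length, and aligned on the same dates, and that shifting $R_t$ forward by $x$ days pairs $R_t$ on day $t$ with cases on day $t + x$. The function names and the `min_overlap` guard are illustrative assumptions, not details from the paper.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return float("nan")
    return cov / math.sqrt(va * vb)


def optimal_shift(rt: Sequence[float], cases: Sequence[float],
                  max_shift: int = 49, min_overlap: int = 14) -> Tuple[int, float]:
    """Return (x, r) for the forward shift x <= max_shift with the highest correlation.

    Assumed alignment: R_t on day t is paired with cases on day t + x.
    """
    n = len(rt)
    best: Tuple[int, float] = (0, -math.inf)
    for x in range(max_shift + 1):
        if n - x < min_overlap:  # too few overlapping days to trust r
            break
        r = pearson(rt[:n - x], cases[x:n])
        if not math.isnan(r) and r > best[1]:
            best = (x, r)
    return best


def pick_series(candidates: Dict[str, Sequence[float]],
                cases: Sequence[float]) -> Tuple[str, int, float]:
    """Pick whichever R_t series (e.g. county vs. state) correlates best at its optimal shift."""
    scored = [(name, *optimal_shift(rt, cases)) for name, rt in candidates.items()]
    return max(scored, key=lambda s: s[2])


def extrapolate_linear(values: Sequence[Optional[float]], horizon: int,
                       n_last: int = 14) -> List[float]:
    """Fit a least-squares line to the last n_last defined values and extend it `horizon` days."""
    pts = [(i, v) for i, v in enumerate(values) if v is not None][-n_last:]
    if len(pts) < 2:
        raise ValueError("need at least two defined values")
    mx = sum(i for i, _ in pts) / len(pts)
    my = sum(v for _, v in pts) / len(pts)
    sxx = sum((i - mx) ** 2 for i, _ in pts)
    slope = sum((i - mx) * (v - my) for i, v in pts) / sxx
    intercept = my - slope * mx
    last = len(values) - 1
    return [intercept + slope * (last + h) for h in range(1, horizon + 1)]


# Synthetic check: the case series repeats R_t exactly 5 days later.
rng = random.Random(1)
rt_county = [rng.random() for _ in range(80)]
cases = [0.0] * 5 + rt_county[:75]
print(optimal_shift(rt_county, cases)[0])  # → 5
print(extrapolate_linear([None, None] + [2.0 * i + 1 for i in range(2, 20)], horizon=3))  # → [41.0, 43.0, 45.0]
```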
Fig. 2. The lag between $R_t$ and case time-series is used to forecast cases in the lag period. In Harris County, TX, the CCE $R_t$ time-series from 05/22/2020 to 07/10/2020 has its maximum Pearson correlation with the CCE's testing- and population-normalized case time-series for the same period when shifted forward 10 days (B); this correlation is also higher than that obtained by any shift of the state $R_t$ time-series. Thus, the shifted CCE $R_t$ time-series is linearly regressed against cases (C). This model is applied to the unshifted CCE $R_t$ time-series to generate a case-prediction time-series; the last 10 days of both time-series are predictive of the next 10 days of cases beyond 07/10/2020.
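
The regression step in the Harris County example can be sketched as follows, assuming the optimal shift has already been found and that an ordinary least-squares line maps shifted $R_t$ to normalized cases. Because $R_t$ on day $t$ is paired with cases on day $t + x$, applying the fitted line to the final $x$ values of the unshifted $R_t$ series yields case predictions for the $x$ days after the last observation. The function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not be constant")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def forecast_lag_period(rt: Sequence[float], cases: Sequence[float],
                        shift: int) -> List[float]:
    """Predict cases for the `shift` days after the last observed day.

    rt and cases are assumed to be equal-length daily series on the same dates.
    """
    n = len(rt)
    if len(cases) != n or not 1 <= shift <= n - 2:
        raise ValueError("need equal lengths and 1 <= shift <= len(rt) - 2")
    # Training pairs: R_t on day t versus cases on day t + shift.
    slope, intercept = fit_line(rt[:n - shift], cases[shift:n])
    # The last `shift` R_t values map to days n .. n + shift - 1.
    return [slope * r + intercept for r in rt[n - shift:]]


# Synthetic check: cases follow 2 * R_t + 5 with a 10-day lag.
rng = random.Random(7)
rt = [rng.uniform(0.8, 1.4) for _ in range(50)]
cases = [0.0] * 10 + [2 * r + 5 for r in rt[:40]]
next_10_days = forecast_lag_period(rt, cases, shift=10)
# Each entry is approximately 2 * r + 5 for the corresponding r in rt[-10:].
```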
Fig. 3. Top 7 random forest (RF) features for each forecast target. For each forecast target, RF feature permutation importances were averaged over all 11 epi weeks, and the top 7 features are shown in the subfigures along with their standard deviations as error bars.
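
Permutation importance measures how much a fitted model's error grows when a single feature's values are randomly shuffled. The dependency-free sketch below is a generic illustration rather than the authors' code: it assumes MAE as the error metric (the scoring choice is not stated here), accepts any `predict` function, and then averages per-week importances and reports their population standard deviation (also an assumption) to rank the top k features. Feature names are placeholders.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict: Callable[[Matrix], List[float]], X: Matrix,
                           y: Sequence[float], names: Sequence[str],
                           n_repeats: int = 5, seed: int = 0) -> Dict[str, float]:
    """Mean increase in MAE when each feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = mae(y, predict(X))
    result: Dict[str, float] = {}
    for j, name in enumerate(names):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mae(y, predict(X_perm)) - baseline)
        result[name] = statistics.mean(increases)
    return result


def top_features(weekly: List[Dict[str, float]],
                 k: int = 7) -> List[Tuple[str, float, float]]:
    """Average importances over weeks; return (name, mean, std) for the top k."""
    summary = []
    for name in weekly[0]:
        vals = [week[name] for week in weekly]
        summary.append((name, statistics.mean(vals), statistics.pstdev(vals)))
    summary.sort(key=lambda s: s[1], reverse=True)
    return summary[:k]


# Toy model that only uses the first feature, so shuffling the second cannot hurt it.
def toy_predict(X: Matrix) -> List[float]:
    return [3.0 * row[0] for row in X]


X = [[float(i), float(i % 3)] for i in range(20)]
y = [3.0 * i for i in range(20)]
imp = permutation_importance(toy_predict, X, y, ["feature_a", "feature_b"])
```

Libraries such as scikit-learn provide a ready-made version of the first function; the hand-written form simply makes the shuffle-and-rescore loop explicit.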
Fig. 4. Performance evaluation using MAE. The errors between projections and the observed number of incident cases were calculated using Eq. (3). The y-axes have been limited so that all models can be compared visually.
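
Eq. (3) is not reproduced in this excerpt. Assuming it is the standard mean absolute error taken over counties, with weekly cases expressed per 100,000 residents (the unit used in the abstract), it can be computed as below. The helper names are hypothetical.

```python
from typing import Sequence


def per_100k(count: float, population: float) -> float:
    """Express a raw count per 100,000 residents."""
    return count * 100_000 / population


def mae_per_100k(forecast: Sequence[float], observed: Sequence[float],
                 population: Sequence[float]) -> float:
    """Mean absolute error of weekly cases per 100,000, averaged over counties."""
    if not (len(forecast) == len(observed) == len(population)) or len(forecast) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(per_100k(f, p) - per_100k(o, p))
              for f, o, p in zip(forecast, observed, population)]
    return sum(errors) / len(errors)


# Two hypothetical counties with errors of 10 and 20 cases/100,000.
print(mae_per_100k([120, 40], [100, 50], [200_000, 50_000]))  # → 15.0
```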
Fig. 5. Performance evaluation using $R^2$. As in Fig. 4, the y-axes have been limited. The proportion of variance in the observed values explained by the projections was evaluated using Eq. (2). There are large anomalies in the weekly $R^2$ of the Google_Harvard and JHUAPL_Bucky models after epi week 202051; however, for the sake of a complete comparison, all weeks are shown for all models.
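
Eq. (2) is likewise not shown here. If it is the usual coefficient of determination, $R^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$, it can be sketched as below; whether the paper uses this form or the squared Pearson correlation is an assumption. Under this definition $R^2$ is at most 1 and becomes negative whenever the forecasts fit worse than a constant prediction at the observed mean.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two or more paired values")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1 - ss_res / ss_tot


print(r_squared([1, 2, 3], [1, 2, 3]))  # → 1.0
print(r_squared([1, 2, 3], [3, 2, 1]))  # → -3.0
```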