| Literature DB >> 34903655 |
Daniel J McDonald1, Jacob Bien2, Alden Green3, Addison J Hu3,4, Nat DeFries4, Sangwon Hyun2, Natalia L Oliveira3,4, James Sharpnack5, Jingjing Tang6, Robert Tibshirani7,8, Valérie Ventura3, Larry Wasserman3,4, Ryan J Tibshirani3,4.
Abstract
Short-term forecasts of traditional streams from public health reporting (such as cases, hospitalizations, and deaths) are a key input to public health decision-making during a pandemic. Since early 2020, our research group has worked with data partners to collect, curate, and make publicly available numerous real-time COVID-19 indicators, providing multiple views of pandemic activity in the United States. This paper studies the utility of five such indicators (derived from deidentified medical insurance claims, self-reported symptoms from online surveys, and COVID-related Google search activity) from a forecasting perspective. For each indicator, we ask whether its inclusion in an autoregressive (AR) model leads to improved predictive accuracy relative to the same model excluding it. Such an AR model, without external features, is already competitive with many top COVID-19 forecasting models in use today. Our analysis reveals that 1) inclusion of each of these five indicators improves on the overall predictive accuracy of the AR model; 2) predictive gains are in general most pronounced during times in which COVID cases are trending in "flat" or "down" directions; and 3) one indicator, based on Google searches, seems to be particularly helpful during "up" trends.
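The core comparison in the abstract (an AR model on lagged case rates versus the same model augmented with an external indicator) can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's actual pipeline: the lag choices, the 7-day horizon, the least-squares fit, and the train/test split are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for smoothed case rates and one auxiliary indicator
# (these series and all modeling choices below are illustrative only).
T = 300
cases = np.abs(np.cumsum(rng.normal(size=T))) + 1.0
indicator = cases + rng.normal(scale=0.5, size=T)  # a noisy companion signal

def make_design(y, x=None, lags=(0, 7, 14), horizon=7):
    """Build (features, target) pairs for predicting y[t + horizon]."""
    rows, targets = [], []
    for t in range(max(lags), len(y) - horizon):
        feats = [y[t - l] for l in lags]
        if x is not None:
            feats += [x[t - l] for l in lags]  # add lagged indicator values
        rows.append(feats)
        targets.append(y[t + horizon])
    return np.array(rows), np.array(targets)

def mae_of_ar(y, x=None):
    """Fit a linear AR(X) model on the first half, score MAE on the second."""
    X, z = make_design(y, x)
    X = np.column_stack([np.ones(len(X)), X])  # intercept column
    split = len(z) // 2
    beta, *_ = np.linalg.lstsq(X[:split], z[:split], rcond=None)
    return np.mean(np.abs(X[split:] @ beta - z[split:]))

err_ar = mae_of_ar(cases)              # AR model: lagged cases only
err_arx = mae_of_ar(cases, indicator)  # AR model plus lagged indicator
print(err_ar, err_arx)
```

The paper's actual models are quantile regressions evaluated by WIS; plain least squares with MAE is used here only to keep the with/without-indicator comparison self-contained.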
Keywords: COVID-19; digital surveillance; forecasting; hotspot prediction; time series
Year: 2021 PMID: 34903655 PMCID: PMC8713796 DOI: 10.1073/pnas.2111453118
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 12.779
Fig. 1. Revision behavior for two indicators in the HRR containing Charlotte, NC. Each colored line corresponds to the data as reported on a particular date ("as of" dates ranging from 28 September through 19 October). (Left) The DV-CLI signal, which was revised regularly throughout the period, although the effect of revisions fades as we look farther back in time. (Right) In contrast, case rates reported by JHU CSSE (smoothed with a 7-d trailing average) remained as first reported on 28 September, with a spike toward the end of this period, until a major correction on 19 October brought the spike down and revised all prior data as well.
Summary of forecasting and hotspot prediction tasks considered in this paper
| | Forecasting | Hotspot prediction |
| --- | --- | --- |
| Response variable | | |
| Geographic resolution | HRR | HRR |
| Forecast period | 9 June to 31 December 2020 | 16 June to 31 December 2020 |
| Model type | Quantile regression | Logistic regression |
| Evaluation metric | WIS | AUC |
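The forecasting column's evaluation metric, the weighted interval score (WIS), can be computed directly from a set of central prediction intervals plus the median. The sketch below follows the standard WIS definition (weights w_k = alpha_k / 2 for each (1 - alpha_k) interval and w_0 = 1/2 for the median); the example interval values are made up for illustration.

```python
import numpy as np

def interval_score(lower, upper, y, alpha):
    """Interval score for a central (1 - alpha) prediction interval:
    width plus penalties, scaled by 2/alpha, for missing the interval."""
    return ((upper - lower)
            + (2 / alpha) * np.maximum(lower - y, 0)
            + (2 / alpha) * np.maximum(y - upper, 0))

def weighted_interval_score(alphas, lowers, uppers, median, y):
    """WIS: weighted sum of interval scores plus the absolute error of
    the median, with weights w_k = alpha_k / 2 and w_0 = 1/2."""
    total = 0.5 * abs(y - median)
    for a, lo, up in zip(alphas, lowers, uppers):
        total += (a / 2) * interval_score(lo, up, y, a)
    return total / (len(alphas) + 0.5)

# Example: three nested central intervals (50%, 80%, 95%) around a
# median forecast of 10, scored against an observed value of 11.
alphas = [0.5, 0.2, 0.05]
lowers = [8, 6, 4]
uppers = [12, 14, 16]
wis = weighted_interval_score(alphas, lowers, uppers, median=10, y=11)
print(wis)
```

Lower WIS is better; Fig. 3 reports it averaged over dates and HRRs and normalized by a baseline model's average WIS.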
Fig. 2.Forecast for the HRR containing New York City from an autoregressive model made on 15 October (vertical line). The fan displays 50, 80, and 95% intervals while the orange curve shows the median forecast. The black curve shows “finalized” data, as reported in May 2021.
Fig. 3.Main results for both tasks. (Left) Average WIS for each forecast model, over all forecast dates and all HRRs, divided by the average WIS achieved by a baseline model (a probabilistic version of the flat-line forecaster). (Right) Area under the curve for each hotspot prediction model, calculated over all prediction dates and all HRRs. Here and in all figures we abbreviate CTIS-CLI-in-community by CTIS-CLIIC.
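Fig. 3's baseline, a probabilistic flat-line forecaster, propagates the last observation forward as the median and wraps uncertainty around it. One plausible construction (the centering-on-historical-differences scheme here is an assumption of this sketch, not necessarily the paper's exact recipe) is:

```python
import numpy as np

def flatline_quantile_forecast(y, horizon, probs):
    """Baseline quantile forecast: the median equals the last observation,
    and the spread comes from the empirical distribution of past
    horizon-step changes, centered so the median change is zero.
    (An illustrative construction, not the paper's exact method.)"""
    y = np.asarray(y, dtype=float)
    diffs = y[horizon:] - y[:-horizon]      # historical h-step changes
    centered = diffs - np.median(diffs)     # center: median change becomes 0
    return y[-1] + np.quantile(centered, probs)

# Toy series standing in for a smoothed case-rate trajectory.
y = np.abs(np.cumsum(np.random.default_rng(1).normal(size=200)))
q = flatline_quantile_forecast(y, horizon=7, probs=[0.025, 0.5, 0.975])
print(q)
```

By construction the 0.5 quantile equals the last observed value, so the point forecast is literally a flat line, while the outer quantiles widen with the historical volatility of the series.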
Fig. 4. Histogram of the difference between the WIS for the AR model and that for each indicator model, stratified by up, down, or flat period, measured in terms of case trends. Note that larger differences here are better for each indicator model. The y axis is on the log scale to emphasize tail behavior.
Fig. 5.Correlation of the difference in WIS with the leadingness of the indicator at the target date, stratified by up, down, or flat period.