Estee Y Cramer1, Evan L Ray1, Velma K Lopez2, Johannes Bracher3,4, Andrea Brennen5, Alvaro J Castro Rivadeneira1, Aaron Gerding1, Tilmann Gneiting4,6, Katie H House1, Yuxin Huang1, Dasuni Jayawardena1, Abdul H Kanji1, Ayush Khandelwal1, Khoa Le1, Anja Mühlemann7, Jarad Niemi8, Apurv Shah1, Ariane Stark1, Yijin Wang1, Nutcha Wattanachit1, Martha W Zorn1, Youyang Gu9, Sansiddh Jain10, Nayana Bannur10, Ayush Deva10, Mihir Kulkarni10, Srujana Merugu10, Alpan Raval10, Siddhant Shingi10, Avtansh Tiwari10, Jerome White10, Neil F Abernethy11, Spencer Woody12, Maytal Dahan13, Spencer Fox12, Kelly Gaither13, Michael Lachmann14, Lauren Ancel Meyers12, James G Scott15, Mauricio Tec16, Ajitesh Srivastava17, Glover E George18, Jeffrey C Cegan19, Ian D Dettwiller18, William P England18, Matthew W Farthing18, Robert H Hunter18, Brandon Lafferty18, Igor Linkov19, Michael L Mayo18, Matthew D Parno20, Michael A Rowland18, Benjamin D Trump19, Yanli Zhang-James21, Samuel Chen22, Stephen V Faraone21, Jonathan Hess21, Christopher P Morley23, Asif Salekin24, Dongliang Wang23, Sabrina M Corsetti25, Thomas M Baer26, Marisa C Eisenberg27,28,29, Karl Falb25, Yitao Huang25, Emily T Martin29, Ella McCauley25, Robert L Myers25, Tom Schwarz25, Daniel Sheldon30, Graham Casey Gibson31, Rose Yu32,33, Liyao Gao34, Yian Ma35, Dongxia Wu32, Xifeng Yan36, Xiaoyong Jin36, Yu-Xiang Wang36, YangQuan Chen37, Lihong Guo38, Yanting Zhao39, Quanquan Gu40, Jinghui Chen40, Lingxiao Wang40, Pan Xu40, Weitong Zhang40, Difan Zou40, Hannah Biegel41, Joceline Lega41, Steve McConnell42, V P Nagraj43, Stephanie L Guertin43, Christopher Hulme-Lowe44, Stephen D Turner43, Yunfeng Shi45, Xuegang Ban46, Robert Walraven47, Qi-Jun Hong48,49, Stanley Kong50, Axel van de Walle49, James A Turtle51, Michal Ben-Nun51, Steven Riley52, Pete Riley51, Ugur Koyluoglu53, David DesRoches54, Pedro Forli55, Bruce Hamory56, Christina Kyriakides57, Helen Leis58, John Milliken53, Michael Moloney53, James Morgan53, Ninad Nirgudkar59, Gokce 
Ozcan53, Noah Piwonka58, Matt Ravi59, Chris Schrader58, Elizabeth Shakhnovich58, Daniel Siegel53, Ryan Spatz59, Chris Stiefeling60, Barrie Wilkinson61, Alexander Wong57, Sean Cavany62, Guido España62, Sean Moore62, Rachel Oidtman62,63, Alex Perkins62, David Kraus64, Andrea Kraus64, Zhifeng Gao65, Jiang Bian65, Wei Cao65, Juan Lavista Ferres65, Chaozhuo Li65, Tie-Yan Liu65, Xing Xie65, Shun Zhang65, Shun Zheng65, Alessandro Vespignani66,67, Matteo Chinazzi67, Jessica T Davis67, Kunpeng Mu67, Ana Pastore Y Piontti67, Xinyue Xiong67, Andrew Zheng68, Jackie Baek68, Vivek Farias69, Andreea Georgescu68, Retsef Levi69, Deeksha Sinha68, Joshua Wilde68, Georgia Perakis70, Mohammed Amine Bennouna70, David Nze-Ndong70, Divya Singhvi71, Ioannis Spantidakis70, Leann Thayaparan70, Asterios Tsiourvas70, Arnab Sarker72, Ali Jadbabaie72, Devavrat Shah72, Nicolas Della Penna73, Leo A Celi73, Saketh Sundar74, Russ Wolfinger75, Dave Osthus76, Lauren Castro77, Geoffrey Fairchild77, Isaac Michaud76, Dean Karlen78,79, Matt Kinsey80, Luke C Mullany80, Kaitlin Rainwater-Lovett80, Lauren Shin80, Katharine Tallaksen80, Shelby Wilson80, Elizabeth C Lee81, Juan Dent81, Kyra H Grantz81, Alison L Hill82, Joshua Kaminsky81, Kathryn Kaminsky83, Lindsay T Keegan84, Stephen A Lauer81, Joseph C Lemaitre85, Justin Lessler81, Hannah R Meredith81, Javier Perez-Saez81, Sam Shah86, Claire P Smith81, Shaun A Truelove81,87,88, Josh Wills86, Maximilian Marshall89, Lauren Gardner89, Kristen Nixon89, John C Burant90, Lily Wang8, Lei Gao91, Zhiling Gu8, Myungjin Kim8, Xinyi Li92, Guannan Wang93, Yueying Wang8, Shan Yu94, Robert C Reiner95, Ryan Barber95, Emmanuela Gakidou95, Simon I Hay95, Steve Lim95, Chris Murray95, David Pigott95, Heidi L Gurung96, Prasith Baccam96, Steven A Stage97, Bradley T Suchoski96, B Aditya Prakash98, Bijaya Adhikari99, Jiaming Cui98, Alexander Rodríguez98, Anika Tabassum100, Jiajia Xie98, Pinar Keskinocak101, John Asplund102, Arden Baxter101, Buse Eylul Oruc101, Nicoleta Serban101, 
Sercan O Arik103, Mike Dusenberry103, Arkady Epshteyn103, Elli Kanal103, Long T Le103, Chun-Liang Li103, Tomas Pfister103, Dario Sava103, Rajarishi Sinha103, Thomas Tsai104, Nate Yoder103, Jinsung Yoon103, Leyou Zhang103, Sam Abbott105, Nikos I Bosse105, Sebastian Funk105, Joel Hellewell105, Sophie R Meakin105, Katharine Sherratt105, Mingyuan Zhou106, Rahi Kalantari107, Teresa K Yamana108, Sen Pei108, Jeffrey Shaman108, Michael L Li68, Dimitris Bertsimas69, Omar Skali Lami68, Saksham Soni68, Hamza Tazi Bouardi68, Turgay Ayer101,109, Madeline Adee110, Jagpreet Chhatwal110, Ozden O Dalgic111, Mary A Ladd110, Benjamin P Linas112, Peter Mueller110, Jade Xiao101, Yuanjia Wang113,114, Qinxia Wang113, Shanghong Xie113, Donglin Zeng115, Alden Green116, Jacob Bien117, Logan Brooks116, Addison J Hu116, Maria Jahja116, Daniel McDonald118, Balasubramanian Narasimhan119,120, Collin Politsch121, Samyak Rajanala120, Aaron Rumack121, Noah Simon122, Ryan J Tibshirani116, Rob Tibshirani120, Valerie Ventura116, Larry Wasserman116, Eamon B O'Dea123, John M Drake123, Robert Pagano124, Quoc T Tran125, Lam Si Tung Ho126, Huong Huynh127, Jo W Walker2, Rachel B Slayton2, Michael A Johansson2, Matthew Biggerstaff2, Nicholas G Reich1.
Abstract
Short-term probabilistic forecasts of the trajectory of the COVID-19 pandemic in the United States have served as a visible and important communication channel between the scientific modeling community and both the general public and decision-makers. Forecasting models provide specific, quantitative, and evaluable predictions that inform short-term decisions such as healthcare staffing needs, school closures, and allocation of medical supplies. Starting in April 2020, the US COVID-19 Forecast Hub (https://covid19forecasthub.org/) collected, disseminated, and synthesized tens of millions of specific predictions from more than 90 different academic, industry, and independent research groups. A multimodel ensemble forecast that combined predictions from dozens of groups every week provided the most consistently accurate probabilistic forecasts of incident deaths due to COVID-19 at the state and national level from April 2020 through October 2021. The performance of 27 individual models that submitted complete forecasts of COVID-19 deaths consistently throughout this year showed high variability in forecast skill across time, geospatial units, and forecast horizons. Two-thirds of the models evaluated showed better accuracy than a naïve baseline model. Forecast accuracy degraded as models made predictions further into the future, with probabilistic error at a 20-wk horizon three to five times larger than when predicting at a 1-wk horizon. This project underscores the role that collaboration and active coordination between governmental public-health agencies, academic modeling teams, and industry partners can play in developing modern modeling capabilities to support local, state, and federal response to outbreaks.
Keywords: COVID-19; ensemble forecast; forecasting; model evaluation
Year: 2022 PMID: 35394862 PMCID: PMC9169655 DOI: 10.1073/pnas.2113561119
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 12.779
Fig. 1. Overview of the evaluation period included in the paper. Vertical dashed lines indicate “phases” of the pandemic analyzed separately in . (A) The reported number of incident weekly COVID-19 deaths by state or territory, per JHU CSSE reports. Locations are sorted by the cumulative number of deaths as of October 30th, 2021. (B) The time series of weekly incident deaths at the national level overlaid with example forecasts from the COVID-19 Forecast Hub ensemble model. (C) The number of models submitting forecasts for incident deaths each week. Weeks in which the ensemble was submitted are shown with a red asterisk.
Table 1. Summary accuracy metrics for all submitted forecasts from 28 models meeting inclusion criteria, aggregated across locations (50 states only), submission week, and 1- through 4-wk forecast horizons.
| Model | No. forecasts | 95% PI Coverage | 50% PI Coverage | Relative WIS | Relative MAE |
|---|---|---|---|---|---|
| BPagano-RtDriven | 10,864 | 0.72 | 0.36 | 0.77 | 0.80 |
| CEID-Walk | 12,161 | 0.78 | | 1.00 | 1.03 |
| CMU-TimeSeries | 10,456 | 0.77 | 0.42 | 0.78 | 0.80 |
| Covid19Sim-Simulator | 11,770 | 0.34 | 0.11 | 1.02 | 0.85 |
| CovidAnalytics-DELPHI | 11,064 | 0.82 | | 0.99 | 1.01 |
| COVIDhub-baseline | 15,460 | 0.88 | | 1.00 | 1.00 |
| COVIDhub-ensemble | 14,260 | | | | |
| CU-select | 13,710 | 0.72 | 0.43 | 0.92 | 0.89 |
| DDS-NBDS | 12,261 | 0.86 | 0.43 | 1.25 | 2.19 |
| epiforecasts-ensemble1 | 12,204 | 0.87 | | 3.17 | 2.74 |
| GT-DeepCOVID | 13,585 | 0.84 | 0.41 | 0.75 | 0.82 |
| IHME-SEIR | 11,116 | 0.59 | 0.25 | 0.79 | 0.82 |
| JHU_CSSE-DECOM | 10,190 | 0.80 | 0.35 | 0.75 | 0.80 |
| JHU_IDD-CovidSP | 14,170 | 0.82 | 0.33 | 0.99 | 1.04 |
| JHUAPL-Bucky | 11,664 | 0.63 | 0.29 | 1.05 | 1.06 |
| Karlen-pypm | 13,060 | 0.86 | | 0.64 | 0.70 |
| LANL-GrowthRate | 13,560 | 0.83 | 0.38 | 0.85 | 0.91 |
| MOBS-GLEAM_COVID | 15,452 | 0.71 | 0.37 | 0.77 | 0.78 |
| OliverWyman-Navigator | 10,548 | 0.82 | | 0.72 | 0.76 |
| PSI-DRAFT | 13,209 | 0.34 | 0.15 | 1.51 | 1.27 |
| RobertWalraven-ESG | 13,430 | 0.51 | 0.28 | 1.13 | 0.97 |
| SteveMcConnell-CovidComplete | 12,063 | 0.8 | | 0.74 | 0.77 |
| UA-EpiCovDA | 13,710 | 0.72 | 0.41 | 0.98 | 0.94 |
| UCLA-SuEIR | 10,549 | 0.31 | 0.09 | 1.37 | 1.21 |
| UCSD_NEU-DeepGLEAM | 11,664 | | 0.7 | 0.83 | 0.78 |
| UMass-MechBayes | 14,660 | | 0.56 | 0.63 | 0.67 |
| UMich-RidgeTfReg | 11,394 | 0.63 | 0.34 | 1.18 | 1.08 |
| USC-SI_kJalpha | 9,660 | 0.52 | 0.22 | 0.75 | 0.72 |
The “No. forecasts” column refers to the number of individual location/target/week combinations. Empirical prediction interval (PI) coverage rates are the fraction of times the 50% or 95% PIs covered the eventually observed value. Values within 5% coverage of the nominal rates are highlighted in boldface text. The “relative WIS” and “relative MAE” columns show the relative mean weighted interval score (WIS) and the relative mean absolute error (MAE), which compare each model to the baseline model while adjusting for the difficulty of the state-level forecasts the given model made (see Methods). The baseline model is defined to have a relative score of 1. Models with relative WIS or MAE values lower than 1 had “better” accuracy relative to the baseline model (best score in bold).
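As a rough illustration of the quantities in this table, the following Python sketch computes empirical PI coverage and the weighted interval score for a single forecast, using the standard WIS definition for a predictive median plus a set of central prediction intervals. The function names and the `naive_relative_score` helper are hypothetical. The plain ratio of mean scores shown here is a simplification and does not reproduce the difficulty adjustment described in Methods.

```python
def empirical_coverage(intervals, observed):
    """Fraction of observations that fall inside their prediction interval."""
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, observed))
    return hits / len(observed)


def interval_score(lower, upper, y, alpha):
    """Interval score of a central (1 - alpha) prediction interval."""
    width = upper - lower
    penalty_below = (2 / alpha) * max(lower - y, 0.0)  # observation under the interval
    penalty_above = (2 / alpha) * max(y - upper, 0.0)  # observation over the interval
    return width + penalty_below + penalty_above


def weighted_interval_score(median, intervals, y):
    """WIS from a predictive median and a dict {alpha: (lower, upper)}."""
    total = 0.5 * abs(y - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2) * interval_score(lower, upper, y, alpha)
    return total / (len(intervals) + 0.5)


def naive_relative_score(model_scores, baseline_scores):
    """Assumption: plain ratio of mean scores on the same forecasts.

    This is NOT the adjusted relative WIS reported in the table.
    """
    mean_model = sum(model_scores) / len(model_scores)
    mean_baseline = sum(baseline_scores) / len(baseline_scores)
    return mean_model / mean_baseline


# Made-up forecast: median 10 deaths, 50% PI (8, 12), 95% PI (4, 16); 13 observed.
example_wis = weighted_interval_score(10, {0.5: (8, 12), 0.05: (4, 16)}, 13)
example_coverage = empirical_coverage([(8, 12), (4, 16)], [13, 13])
```

In this made-up example the observation falls outside the 50% PI but inside the 95% PI, so only the narrower interval incurs a penalty, and the WIS comes to about 1.52.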
Fig. 2. A comparison of each model’s distribution of standardized rank of WIS for each location/target/week observation. A standardized rank of 1 indicates that the model had the best WIS for that particular location, target, and week, and a value of 0 indicates it had the worst WIS. The density plots show interpolated distributions of the standardized ranks achieved by each model for every observation that model forecasted. The quartiles of each model’s distribution of standardized ranks are shown in different colors: yellow indicates the top quarter of the distribution and purple indicates the bottom quarter of the distribution. The models are ordered by the first quartile of the distribution, with models that rarely had a low rank near the top.
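A minimal Python sketch of one way to compute a standardized rank with the endpoints described in this caption (1 for the best WIS, 0 for the worst) for a single location/target/week observation. The linear spacing between the endpoints, the tie handling (ties keep dictionary order), and the model names are assumptions made for illustration.

```python
def standardized_ranks(wis_by_model):
    """Map each model to a value in [0, 1]: 1 = lowest (best) WIS, 0 = highest (worst)."""
    n = len(wis_by_model)
    if n < 2:
        raise ValueError("need at least two models to rank")
    # Sort model names by WIS, best (lowest) first; ties keep insertion order.
    ordered = sorted(wis_by_model, key=wis_by_model.get)
    return {model: 1 - position / (n - 1) for position, model in enumerate(ordered)}


# Hypothetical WIS values for three models on one observation.
example_ranks = standardized_ranks({"model_a": 2.0, "model_b": 5.0, "model_c": 3.0})
```

Here `model_a` receives 1.0, `model_c` 0.5, and `model_b` 0.0. Repeating this over every observation a model forecasted yields the per-model distributions summarized in the density plots.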
Fig. 3. Average WIS by the target forecasted week for each model across all 50 states. A shows the observed weekly COVID-19 deaths based on the CSSE-reported data as of May 25, 2021. B shows the average 1-wk-ahead WIS values per model (in gray). For all 21 wk in which the ensemble model (red triangle) is present, this model has lower WIS values than the baseline model (green square) and the average score of all models (blue circle). C shows the average 4-wk-ahead WIS values per model (in gray). For all 21 wk in which the ensemble model (red triangle) is present, this model has lower WIS values than the baseline model (green square) and the average score of all models (blue circle). The y-axes are truncated in B and C for readability of the majority of the data.
Fig. 4. Forecasts for selected states and pandemic waves, with PI coverage. The first column shows every 1- and 4-wk-ahead forecast with 95% PIs made by the ensemble during the selected evaluation period. The second and third columns of plots show evaluations of PIs across 1- through 4-wk horizons (x-axis). The red line with triangle points corresponds to the coverage rates of the COVIDhub-ensemble forecasts, and green squares refer to the COVIDhub-baseline model. The boxplots represent the distribution of coverage rates from all component models. The second column evaluates only forecasts made for the dates shown in the first column. The third column evaluates forecasts across all weeks in the evaluation period. In the last two columns, the expected coverage rate (95%) is shown by the dashed line.
Fig. 5. Relative WIS by location for each model across all horizons and submission weeks. The value in each box represents the relative WIS calculated from 1- to 4-wk-ahead targets available for a model at each location. Boxes are colored based on the relative WIS compared to the baseline model. Blue boxes represent teams that outperformed the baseline, and red boxes represent teams that performed worse than the baseline. Locations are sorted by cumulative deaths as of the end of the evaluation period (October 30, 2021). Teams are listed on the horizontal axis in order from the lowest to highest relative WIS values (Table 1).