
Aggregating human judgment probabilistic predictions of COVID-19 transmission, burden, and preventative measures.

Allison Codi1, Damon Luk1, David Braun1, Juan Cambeiro2,3, Tamay Besiroglu2,4, Eva Chen5, Luis Enrique Urtubey de Cèsaris5, Paolo Bocchini6, Thomas McAndrew1.   

Abstract

Aggregated human judgment forecasts for COVID-19 targets of public health importance are accurate, often outperforming computational models. Our work shows that aggregated human judgment forecasts for infectious agents are timely, accurate, and adaptable, and can be used as a tool to aid public health decision making during outbreaks.

Entities:  

Year:  2022        PMID: 35441083      PMCID: PMC9016644     

Source DB:  PubMed          Journal:  ArXiv        ISSN: 2331-8422


INTRODUCTION

Accurate forecasts of the trajectory of COVID-19 and of preventative measures to reduce transmission of SARS-CoV-2 provide foresight that enables public health officials to mitigate the impact of the pandemic [1]. Mathematical models are the most commonly used tool to improve situational awareness [2]. However, most mathematical models rely on structured, reported surveillance data and often lack access to community-level transmission dynamics, data on human behavior, or behavioral responses to policy changes. Human judgment has produced accurate forecasts of the evolution of an infectious agent for both seasonal epidemics and pandemic events [3, 4]. Past work studying COVID-19 and human judgment has highlighted the ability of aggregate human judgment predictions to adapt to changing dynamics faster than mathematical models [5]. When human judgment forecasts have had lower accuracy than mathematical models, previous work has shown that combining the two improves performance over the mathematical model alone [6]. Human judgment predictions of an infectious agent are low-overhead and flexible, and supply rapid, adaptable forecasts to public health decision makers [4]. To best prepare for and prevent infectious disease outbreaks, health officials need quick, accurate, and adaptable forecasts [7]. We present evidence that aggregated human judgment probabilistic predictions meet these criteria for COVID-19 targets associated with transmission, burden, and preventative measures.

METHODS

Monthly surveys from Jan. 6, 2021 to Jun. 16, 2021 collected predictions from two human judgment forecasting platforms: Metaculus and Good Judgment Open (GJO) [8, 9]. Subscribers to both platforms were invited to participate via email solicitation. Monthly forecasts of the pandemic were included in summary reports intended to aid real-time public health decision-making; these reports contain a detailed list of the human judgment predictions and the exact wording of each question posed to both crowds [10]. Participants had approximately twelve days to provide probabilistic predictions one to three weeks ahead at the US national level for six targets of public health importance: (1) weekly incident cases, (2) hospitalizations, (3) deaths, (4) cumulative first-dose and (5) full-dose vaccinations, and (6) prevalence of immunity-evading variants. Participants could submit an initial prediction and revise it as many times as they wished within the twelve-day period. Participants received feedback about the accuracy of their forecast via email when the ground truth became available. Individual forecasts submitted to the Metaculus and GJO forecasting platforms were combined into an equally weighted linear pool called a consensus forecast. Consensus forecasts of incident cases and deaths were compared to the COVID-19 Forecast Hub, an ensemble that combined up to 48 computational models between Jan. 2021 and Jun. 2021 [11]. The dates on which forecasts were generated by human judgment and by the computational models in the COVID-19 Forecast Hub were on average within two days of one another. For each target, we report the absolute error (AE), defined as the absolute value of the forecast's median prediction minus the truth, and the percent error (PE), defined as the absolute error divided by the truth and multiplied by 100.
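The aggregation and error metrics described above can be sketched as follows. This is a minimal illustration, not the authors' code; the bins, probabilities, and function names are assumptions, and individual forecasts are represented as probabilities over a shared set of bins:

```python
import numpy as np

def consensus_forecast(individual_pmfs):
    """Equally weighted linear pool: average the individual
    predictive distributions over a shared set of bins."""
    return np.asarray(individual_pmfs).mean(axis=0)

def median_of_pmf(bins, pmf):
    """Median of a binned predictive distribution: the first bin
    at which the cumulative probability reaches 0.5."""
    cdf = np.cumsum(pmf)
    return bins[np.searchsorted(cdf, 0.5)]

def absolute_error(median_pred, truth):
    """AE: absolute value of the median prediction minus the truth."""
    return abs(median_pred - truth)

def percent_error(median_pred, truth):
    """PE: absolute error divided by the truth, times 100."""
    return 100.0 * absolute_error(median_pred, truth) / truth

# Three hypothetical forecasters over bins of weekly incident cases
bins = np.array([100_000, 200_000, 300_000])
pmfs = [[0.2, 0.5, 0.3], [0.4, 0.4, 0.2], [0.3, 0.3, 0.4]]
pool = consensus_forecast(pmfs)    # equal-weight average: [0.3, 0.4, 0.3]
m = median_of_pmf(bins, pool)      # 200_000
```

Equal weighting means every participant's distribution contributes the same mass to the pool, regardless of past accuracy, which matches the paper's description of the consensus forecast.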

RESULTS

A total of 404 unique participants (71 Metaculus, 333 GJO) submitted probabilistic predictions across the 33 questions for the above six targets, yielding 2,021 unique forecasts (open access data set available here [12]). A participant was not required to answer all questions. The median consensus prediction for targets 1–5 had a mean PE of 39% in the first survey, 9% for survey 2, 13% for survey 3, and 11%, 26%, and 9% for surveys 4 through 6. The largest PE was 73%, for a prediction of incident cases submitted on survey 5, and the smallest PE was 0.1%, for a prediction of incident deaths submitted on survey 1. PE for the majority of targets decreased over time. The PE of the median consensus prediction was 58% (620,192 AE) for incident cases and 60% (49,201 AE) for incident hospitalizations in the first survey. Both targets fell to a PE of 15% in the last survey (an AE of 13,803 for cases and 2,191 for hospitalizations). PE decreased from 18% to 2% (9,613,628 AE to 3,821,920 AE) for cumulative first-dose vaccinations and from 6.1% to 5.8% (3,745,157 AE to 9,236,130 AE) for cumulative full vaccinations between the initial and final surveys. The PE for median consensus predictions of incident deaths was on average 7% (451 mean AE across all six surveys), with a PE of less than 0.5% for surveys 1 and 4 (27 AE and 13 AE). The PE for variant prevalence was on average 57% (13 average AE), and the highest PE was 153% (14 AE) in survey 6. The median consensus prediction was closer to the truth than 62% of the 2,021 individual predictions. When subset to the six incident deaths targets, the consensus prediction was closer to the truth than 75% of individual predictions, and in survey 5 the consensus median prediction of incident deaths was closer to the truth than all fifty-nine individual predictions.
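The "closer to the truth than X% of individual predictions" comparison reduces to counting individual forecasts whose error exceeds the consensus error. A minimal sketch, with hypothetical values rather than the study data:

```python
import numpy as np

def fraction_beaten(consensus_median, individual_medians, truth):
    """Fraction of individual predictions whose absolute error is
    strictly larger than the consensus median's absolute error
    (ties count as not beaten)."""
    consensus_err = abs(consensus_median - truth)
    individual_err = np.abs(np.asarray(individual_medians) - truth)
    return float(np.mean(individual_err > consensus_err))

# Hypothetical weekly incident deaths: consensus vs. four individuals
frac = fraction_beaten(1_000, [800, 1_200, 1_500, 1_050], truth=1_100)
```

With these toy numbers the consensus (error 100) beats two of the four individuals (errors 300 and 400), so `frac` is 0.5; the paper's 62% and 75% figures are this quantity computed over all 2,021 forecasts and over the incident deaths subset, respectively.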
Compared to ensemble predictions made by the COVID-19 Forecast Hub, the median consensus prediction generated by humans was closer to the truth for 3/6 predictions of incident cases and 4/6 predictions of incident deaths. For predictions of incident cases, the mean PE was 32.8% for the COVID-19 Forecast Hub and 33.5% for aggregate human judgment. For incident deaths, the mean PE was 10% for the COVID-19 Forecast Hub versus 7% for human judgment.

DISCUSSION

We show that (i) aggregate human judgment forecasts are frequently closer to the truth than individual forecasts, (ii) the accuracy of aggregate forecasts depends on the target, (iii) the accuracy of aggregate forecasts can improve over time, and (iv) aggregate human judgment can produce forecasts of incident cases and deaths with similar accuracy to an ensemble of computational models. We are limited by the small number of questions we asked, the short time span over which we surveyed the crowd, and the lack of a controlled environment in which to pose questions. Contrary to recent work showing that a crowd can produce more accurate forecasts for cases than for deaths [5], we found aggregate median predictions of incident deaths were more accurate than predictions of incident cases. This may be because humans have an innate capacity to learn relationships between a set of evolving signals, such as incident cases, hospitalizations, and vaccinations, that are correlated with the target they aim to predict. The lack of signals and environmental cues related to questions about the prevalence of specific variants may explain why those aggregate forecasts were inaccurate. The availability of environmental cues related to cases, deaths, and hospitalizations may explain why participants were able to learn over time; however, more experimental work on how humans incorporate data to make predictions is needed.
REFERENCES (5 in total)

1.  Evaluation of individual and ensemble probabilistic forecasts of COVID-19 mortality in the United States.

Authors:  Estee Y Cramer; Evan L Ray; Velma K Lopez; Johannes Bracher; Andrea Brennen; Alvaro J Castro Rivadeneira; Aaron Gerding; Tilmann Gneiting; Katie H House; Yuxin Huang; Dasuni Jayawardena; Abdul H Kanji; Ayush Khandelwal; Khoa Le; Anja Mühlemann; Jarad Niemi; Apurv Shah; Ariane Stark; Yijin Wang; Nutcha Wattanachit; Martha W Zorn; Youyang Gu; Sansiddh Jain; Nayana Bannur; Ayush Deva; Mihir Kulkarni; Srujana Merugu; Alpan Raval; Siddhant Shingi; Avtansh Tiwari; Jerome White; Neil F Abernethy; Spencer Woody; Maytal Dahan; Spencer Fox; Kelly Gaither; Michael Lachmann; Lauren Ancel Meyers; James G Scott; Mauricio Tec; Ajitesh Srivastava; Glover E George; Jeffrey C Cegan; Ian D Dettwiller; William P England; Matthew W Farthing; Robert H Hunter; Brandon Lafferty; Igor Linkov; Michael L Mayo; Matthew D Parno; Michael A Rowland; Benjamin D Trump; Yanli Zhang-James; Samuel Chen; Stephen V Faraone; Jonathan Hess; Christopher P Morley; Asif Salekin; Dongliang Wang; Sabrina M Corsetti; Thomas M Baer; Marisa C Eisenberg; Karl Falb; Yitao Huang; Emily T Martin; Ella McCauley; Robert L Myers; Tom Schwarz; Daniel Sheldon; Graham Casey Gibson; Rose Yu; Liyao Gao; Yian Ma; Dongxia Wu; Xifeng Yan; Xiaoyong Jin; Yu-Xiang Wang; YangQuan Chen; Lihong Guo; Yanting Zhao; Quanquan Gu; Jinghui Chen; Lingxiao Wang; Pan Xu; Weitong Zhang; Difan Zou; Hannah Biegel; Joceline Lega; Steve McConnell; V P Nagraj; Stephanie L Guertin; Christopher Hulme-Lowe; Stephen D Turner; Yunfeng Shi; Xuegang Ban; Robert Walraven; Qi-Jun Hong; Stanley Kong; Axel van de Walle; James A Turtle; Michal Ben-Nun; Steven Riley; Pete Riley; Ugur Koyluoglu; David DesRoches; Pedro Forli; Bruce Hamory; Christina Kyriakides; Helen Leis; John Milliken; Michael Moloney; James Morgan; Ninad Nirgudkar; Gokce Ozcan; Noah Piwonka; Matt Ravi; Chris Schrader; Elizabeth Shakhnovich; Daniel Siegel; Ryan Spatz; Chris Stiefeling; Barrie Wilkinson; Alexander Wong; Sean Cavany; Guido España; Sean Moore; Rachel Oidtman; Alex 
Perkins; David Kraus; Andrea Kraus; Zhifeng Gao; Jiang Bian; Wei Cao; Juan Lavista Ferres; Chaozhuo Li; Tie-Yan Liu; Xing Xie; Shun Zhang; Shun Zheng; Alessandro Vespignani; Matteo Chinazzi; Jessica T Davis; Kunpeng Mu; Ana Pastore Y Piontti; Xinyue Xiong; Andrew Zheng; Jackie Baek; Vivek Farias; Andreea Georgescu; Retsef Levi; Deeksha Sinha; Joshua Wilde; Georgia Perakis; Mohammed Amine Bennouna; David Nze-Ndong; Divya Singhvi; Ioannis Spantidakis; Leann Thayaparan; Asterios Tsiourvas; Arnab Sarker; Ali Jadbabaie; Devavrat Shah; Nicolas Della Penna; Leo A Celi; Saketh Sundar; Russ Wolfinger; Dave Osthus; Lauren Castro; Geoffrey Fairchild; Isaac Michaud; Dean Karlen; Matt Kinsey; Luke C Mullany; Kaitlin Rainwater-Lovett; Lauren Shin; Katharine Tallaksen; Shelby Wilson; Elizabeth C Lee; Juan Dent; Kyra H Grantz; Alison L Hill; Joshua Kaminsky; Kathryn Kaminsky; Lindsay T Keegan; Stephen A Lauer; Joseph C Lemaitre; Justin Lessler; Hannah R Meredith; Javier Perez-Saez; Sam Shah; Claire P Smith; Shaun A Truelove; Josh Wills; Maximilian Marshall; Lauren Gardner; Kristen Nixon; John C Burant; Lily Wang; Lei Gao; Zhiling Gu; Myungjin Kim; Xinyi Li; Guannan Wang; Yueying Wang; Shan Yu; Robert C Reiner; Ryan Barber; Emmanuela Gakidou; Simon I Hay; Steve Lim; Chris Murray; David Pigott; Heidi L Gurung; Prasith Baccam; Steven A Stage; Bradley T Suchoski; B Aditya Prakash; Bijaya Adhikari; Jiaming Cui; Alexander Rodríguez; Anika Tabassum; Jiajia Xie; Pinar Keskinocak; John Asplund; Arden Baxter; Buse Eylul Oruc; Nicoleta Serban; Sercan O Arik; Mike Dusenberry; Arkady Epshteyn; Elli Kanal; Long T Le; Chun-Liang Li; Tomas Pfister; Dario Sava; Rajarishi Sinha; Thomas Tsai; Nate Yoder; Jinsung Yoon; Leyou Zhang; Sam Abbott; Nikos I Bosse; Sebastian Funk; Joel Hellewell; Sophie R Meakin; Katharine Sherratt; Mingyuan Zhou; Rahi Kalantari; Teresa K Yamana; Sen Pei; Jeffrey Shaman; Michael L Li; Dimitris Bertsimas; Omar Skali Lami; Saksham Soni; Hamza Tazi Bouardi; Turgay Ayer; 
Madeline Adee; Jagpreet Chhatwal; Ozden O Dalgic; Mary A Ladd; Benjamin P Linas; Peter Mueller; Jade Xiao; Yuanjia Wang; Qinxia Wang; Shanghong Xie; Donglin Zeng; Alden Green; Jacob Bien; Logan Brooks; Addison J Hu; Maria Jahja; Daniel McDonald; Balasubramanian Narasimhan; Collin Politsch; Samyak Rajanala; Aaron Rumack; Noah Simon; Ryan J Tibshirani; Rob Tibshirani; Valerie Ventura; Larry Wasserman; Eamon B O'Dea; John M Drake; Robert Pagano; Quoc T Tran; Lam Si Tung Ho; Huong Huynh; Jo W Walker; Rachel B Slayton; Michael A Johansson; Matthew Biggerstaff; Nicholas G Reich
Journal:  Proc Natl Acad Sci U S A       Date:  2022-04-08       Impact factor: 12.779

2.  A human judgment approach to epidemiological forecasting.

Authors:  David C Farrow; Logan C Brooks; Sangwon Hyun; Ryan J Tibshirani; Donald S Burke; Roni Rosenfeld
Journal:  PLoS Comput Biol       Date:  2017-03-10       Impact factor: 4.475

3.  Recommended reporting items for epidemic forecasting and prediction research: The EPIFORGE 2020 guidelines.

Authors:  Simon Pollett; Michael A Johansson; Nicholas G Reich; David Brett-Major; Sara Y Del Valle; Srinivasan Venkatramanan; Rachel Lowe; Travis Porco; Irina Maljkovic Berry; Alina Deshpande; Moritz U G Kraemer; David L Blazes; Wirichada Pan-Ngum; Alessandro Vespigiani; Suzanne E Mate; Sheetal P Silal; Sasikiran Kandula; Rachel Sippy; Talia M Quandelacy; Jeffrey J Morgan; Jacob Ball; Lindsay C Morton; Benjamin M Althouse; Julie Pavlin; Wilbert van Panhuis; Steven Riley; Matthew Biggerstaff; Cecile Viboud; Oliver Brady; Caitlin Rivers
Journal:  PLoS Med       Date:  2021-10-19       Impact factor: 11.069

4.  Applying infectious disease forecasting to public health: a path forward using influenza forecasting examples.

Authors:  Chelsea S Lutz; Mimi P Huynh; Monica Schroeder; Sophia Anyatonwu; F Scott Dahlgren; Gregory Danyluk; Danielle Fernandez; Sharon K Greene; Nodar Kipshidze; Leann Liu; Osaro Mgbere; Lisa A McHugh; Jennifer F Myers; Alan Siniscalchi; Amy D Sullivan; Nicole West; Michael A Johansson; Matthew Biggerstaff
Journal:  BMC Public Health       Date:  2019-12-10       Impact factor: 3.295

5.  Improving Pandemic Response: Employing Mathematical Modeling to Confront Coronavirus Disease 2019.

Authors:  Matthew Biggerstaff; Rachel B Slayton; Michael A Johansson; Jay C Butler
Journal:  Clin Infect Dis       Date:  2022-03-09       Impact factor: 9.079

