Sarah V Leavitt (1), Robyn S Lee (2, 3), Paola Sebastiani (1), C Robert Horsburgh (4), Helen E Jenkins (1), Laura F White (1)
1. School of Public Health, Department of Biostatistics, Boston University, Boston, MA, USA.
2. Harvard T.H. Chan School of Public Health, Boston, MA, USA.
3. University of Toronto Dalla Lana School of Public Health Epidemiology Division, Toronto, ON, Canada.
4. School of Public Health, Department of Epidemiology, Boston University, Boston, MA, USA.
Abstract
BACKGROUND: Estimating infectious disease parameters such as the serial interval (time between symptom onset in primary and secondary cases) and reproductive number (average number of secondary cases produced by a primary case) is important for understanding infectious disease dynamics. Many estimation methods require linking cases by direct transmission, a difficult task for most diseases.
METHODS: Using a subset of cases with detailed genetic and/or contact investigation data to develop a training set of probable transmission events, we build a model to estimate the relative transmission probability for all case-pairs from demographic, spatial and clinical data. Our method is based on naive Bayes, a machine learning classification algorithm that uses the observed frequencies in the training dataset to estimate the probability that a pair is linked given a set of covariates.
RESULTS: In simulations, we find that the probabilities estimated using genetic distance between cases to define training transmission events distinguish between truly linked and unlinked pairs with high accuracy (area under the receiver operating characteristic curve of 95%). Additionally, only a subset of the cases, 10-50% depending on sample size, needs to have detailed genetic data for our method to perform well. We show how these probabilities can be used to estimate the average effective reproductive number and apply our method to a tuberculosis outbreak in Hamburg, Germany.
CONCLUSIONS: Our method is a novel way to infer transmission dynamics in any dataset when only a subset of cases has rich contact investigation and/or genetic data.
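The naive Bayes step described in METHODS can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the covariate names (`same_county`, `onset_gap`), the toy training pairs, and the use of add-one (Laplace) smoothing are all assumptions made for illustration. The sketch counts how often each covariate value occurs among linked and unlinked training pairs, then combines those frequencies with the class prior to score a new case-pair.

```python
from collections import Counter


def train_naive_bayes(pairs, labels, smoothing=1.0):
    """Estimate class priors and per-covariate value frequencies.

    pairs  : list of dicts mapping covariate name -> categorical value
    labels : list of bools, True if the training pair is a probable link
    smoothing : add-one style constant (an assumption of this sketch)
    """
    classes = (True, False)
    class_counts = Counter(labels)
    # Collect every value seen for each covariate (needed for smoothing).
    levels = {}
    for pair in pairs:
        for name, value in pair.items():
            levels.setdefault(name, set()).add(value)
    # value_counts[c][name][value] = number of class-c pairs with that value
    value_counts = {c: {name: Counter() for name in levels} for c in classes}
    for pair, label in zip(pairs, labels):
        for name, value in pair.items():
            value_counts[label][name][value] += 1
    prior = {c: class_counts[c] / len(labels) for c in classes}
    likelihood = {
        c: {
            name: {
                v: (value_counts[c][name][v] + smoothing)
                / (class_counts[c] + smoothing * len(vals))
                for v in vals
            }
            for name, vals in levels.items()
        }
        for c in classes
    }
    return prior, likelihood


def predict_link_probability(model, pair):
    """Return P(linked | covariates), treating covariates as independent.

    Assumes every covariate value in `pair` was seen during training.
    """
    prior, likelihood = model
    score = {}
    for c in (True, False):
        s = prior[c]
        for name, value in pair.items():
            s *= likelihood[c][name][value]
        score[c] = s
    return score[True] / (score[True] + score[False])


# Hypothetical training pairs: 3 probable links, 5 probable non-links.
train_pairs = [
    {"same_county": True, "onset_gap": "short"},
    {"same_county": True, "onset_gap": "short"},
    {"same_county": True, "onset_gap": "long"},
    {"same_county": False, "onset_gap": "long"},
    {"same_county": False, "onset_gap": "long"},
    {"same_county": True, "onset_gap": "long"},
    {"same_county": False, "onset_gap": "short"},
    {"same_county": False, "onset_gap": "long"},
]
train_labels = [True, True, True, False, False, False, False, False]

model = train_naive_bayes(train_pairs, train_labels)
p_close = predict_link_probability(model, {"same_county": True, "onset_gap": "short"})
p_far = predict_link_probability(model, {"same_county": False, "onset_gap": "long"})
print(p_close > p_far)
```

In this toy setup, a pair that shares a county and has a short onset gap scores higher than a pair that differs on both covariates. The abstract refers to relative transmission probabilities, so the scores are best read as a ranking of candidate pairs rather than as calibrated probabilities.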