A Supervised Learning Process to Validate Online Disease Reports for Use in Predictive Models.

Helena M M Patching, Laurence M Hudson, Warrick Cooke, Andres J Garcia, Simon I Hay, Mark Roberts, Catherine L Moyes

Abstract

Pathogen distribution models that predict spatial variation in disease occurrence require data from a large number of geographic locations to generate disease risk maps. Traditionally, this process has used data from public health reporting systems; however, using online reports of new infections could speed up the process dramatically. Data from both public health systems and online sources must be validated before they can be used, but no mechanisms exist to validate data from online media reports. We have developed a supervised learning process to validate geolocated disease outbreak data in a timely manner. The process uses three input features: the data source and two metrics derived from the location of each disease occurrence. The two location metrics are the probability of disease occurrence at that location, based on environmental and socioeconomic factors, and the distance of the location inside or outside the currently known disease extent. The process also uses validation scores, generated by disease experts who review a subset of the data, to build a training data set. The aim of the supervised learning process is to generate validation scores that can be used as weights in the pathogen distribution model. After analyzing the three input features and testing the performance of alternative processes, we selected a cascade of ensembles comprising logistic regressors. Parameter values for the training data subset size, number of predictors, and number of layers in the cascade were tested before the process was deployed. The final configuration was tested using data for two contrasting diseases (dengue and cholera), and 66%-79% of data points were assigned a validation score. The remaining data points are scored by the experts, and the results inform the training data set for the next set of predictors, as well as going to the pathogen distribution model. The new supervised learning process has been implemented within our live site and is being used to validate the data that our system uses to produce updated predictive disease maps on a weekly basis.
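
The cascade described in the abstract can be illustrated with a minimal sketch in Python. This is not the authors' implementation. The abstract does not state how a layer decides whether to assign a score, so the rule used here (a layer scores a point only when its predictors agree to within `max_spread`) is an assumption, as are all function names, the synthetic training data, the sign convention for distance, and the parameter values.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    """Score one point; the last entry of w is the bias term."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])

def fit_logistic(rows, labels, epochs=200, lr=0.1):
    """Fit one logistic regressor by plain stochastic gradient descent."""
    w = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = y - predict(w, x)
            for i, xi in enumerate(x):
                w[i] += lr * err * xi
            w[-1] += lr * err
    return w

def fit_ensemble(rows, labels, n_predictors, subset_size, rng):
    """Train each predictor on a random subset of the expert-scored data."""
    ensemble = []
    for _ in range(n_predictors):
        idx = [rng.randrange(len(rows)) for _ in range(subset_size)]
        ensemble.append(fit_logistic([rows[i] for i in idx],
                                     [labels[i] for i in idx]))
    return ensemble

def cascade_score(layers, x, max_spread=0.2):
    """Return a validation score, or None to route the point to experts.

    Assumed rule: a layer assigns the mean prediction only if its
    predictors agree to within max_spread; otherwise the next layer tries.
    """
    for ensemble in layers:
        preds = [predict(w, x) for w in ensemble]
        if max(preds) - min(preds) <= max_spread:
            return sum(preds) / len(preds)
    return None

# Synthetic stand-in for expert-scored training data (purely illustrative).
# Features: [data source flag, occurrence probability, distance from extent],
# with negative distance meaning inside the known extent (assumed convention).
rng = random.Random(0)
rows, labels = [], []
for _ in range(200):
    source = rng.choice([0.0, 1.0])
    suitability = rng.random()
    distance = rng.uniform(-1.0, 1.0)
    noise = rng.gauss(0.0, 0.3)
    rows.append([source, suitability, distance])
    labels.append(1 if suitability - distance + 0.3 * source + noise > 0.5 else 0)

# Two layers of five predictors, each trained on 50 sampled points.
layers = [fit_ensemble(rows, labels, 5, 50, rng) for _ in range(2)]
new_points = [[1.0, 0.9, -0.8], [0.0, 0.1, 0.9], [1.0, 0.5, 0.0]]
scores = [cascade_score(layers, x) for x in new_points]
```

Each entry in `scores` is either a value between 0 and 1, which could serve as a weight in a downstream model, or `None`, which marks a point to be sent for expert review. The subset size, number of predictors, and number of layers correspond to the three parameters the abstract says were tuned before deployment.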

Keywords:  big data analytics; data acquisition and cleaning; machine learning; structured data

Year: 2015; PMID: 26858916; PMCID: PMC4722556; DOI: 10.1089/big.2015.0019

Source DB: PubMed; Journal: Big Data; ISSN: 2167-6461; Impact factor: 2.128

