Geoffrey Fairchild, Byron Tasseff, Hari Khalsa, Nicholas Generous, Ashlynn R Daughton, Nileena Velappan, Reid Priedhorsky, Alina Deshpande.
Abstract
Accessible epidemiological data are of great value for emergency preparedness and response, understanding disease progression through a population, and building statistical and mechanistic disease models that enable forecasting. The status quo, however, renders acquiring and using such data difficult in practice. In many cases, a primary way of obtaining epidemiological data is through the internet, but the methods by which the data are presented to the public often differ drastically among institutions. As a result, there is a strong need for better data sharing practices. This paper identifies, in detail and with examples, the three key challenges one encounters when attempting to acquire and use epidemiological data: (1) interfaces, (2) data formatting, and (3) reporting. These challenges are used to provide suggestions and guidance for improvement as these systems evolve in the future. If these suggested data and interface recommendations were adhered to, epidemiological and public health analysis, modeling, and informatics work would be significantly streamlined, which can in turn yield better public health decision-making capabilities.
Keywords: computational epidemiology; data; disease modeling; disease surveillance; informatics; public health
Year: 2018; PMID: 30533407; PMCID: PMC6265573; DOI: 10.3389/fpubh.2018.00336
Source DB: PubMed; Journal: Front Public Health; ISSN: 2296-2565
Figure 1. Screenshot showing part of the mosquito-borne illness epidemiological bulletin list available at (31). This is the most current and complete list, with data available through the 38th week of 2016.
Figure 2. Screenshot showing part of the mosquito-borne illness epidemiological bulletin list available at (30). This list only goes through week 21 of 2016 and is missing a number of weeks when compared to the list in Figure 1. This screenshot was taken at the same time as the one in Figure 1.
Figure 3. Sample epidemiological case count data in CSV format. CSV files are plain text files that allow tabular data to be laid out as rows separated by newlines and columns separated by commas. This time series does not contain real data and only exists for demonstration purposes.
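As a minimal sketch of how a CSV time series of this shape might be consumed, the snippet below parses comma-separated rows with Python's standard `csv` module. The column names `date` and `cases`, the function name `read_case_counts`, and the three rows are hypothetical placeholders; they are not the contents of Figure 3 and, like the figure, are not real data.

```python
import csv
import io

# Hypothetical CSV text: column names and values are illustrative placeholders,
# not the contents of Figure 3 and not real surveillance data.
SAMPLE_CSV = """date,cases
2016-01-03,2
2016-01-10,5
2016-01-17,4
"""


def read_case_counts(text):
    """Parse CSV text into a list of (date string, integer case count) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["date"], int(row["cases"])) for row in reader]


series = read_case_counts(SAMPLE_CSV)
print(series)
```

Rows map to newline-separated records and columns to comma-separated fields, so no structure beyond the header row is needed to recover the series.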
Figure 4. Sample epidemiological case count data in a simple JSON format. Compared to CSV (demonstrated in Figure 3), JSON contains more structure that can more rigorously specify data relationships (including hierarchical relationships). Note that this is not EpiJSON; EpiJSON can be quite verbose (due to, for example, metadata specifications and GeoJSON-specified locations), and the authors felt a complete EpiJSON example would take up an unreasonable amount of space in this paper. As in Figure 3, this time series does not contain real data and only exists for demonstration purposes.
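To illustrate the extra structure JSON allows, the following sketch loads a small nested document with Python's standard `json` module. The keys (`disease`, `location`, `series`, `date`, `cases`) and all values are assumptions made for illustration; this is neither EpiJSON nor the contents of Figure 4, and it is not real data.

```python
import json

# Hypothetical JSON document: keys and values are illustrative assumptions,
# not EpiJSON and not the contents of Figure 4.
SAMPLE_JSON = """
{
  "disease": "example disease",
  "location": {"country": "Example Country", "region": "Example Region"},
  "series": [
    {"date": "2016-01-03", "cases": 2},
    {"date": "2016-01-10", "cases": 5},
    {"date": "2016-01-17", "cases": 4}
  ]
}
"""

record = json.loads(SAMPLE_JSON)

# The nested "location" object and the "series" array are addressed by key,
# which is the kind of hierarchical relationship a flat CSV row cannot express.
total_cases = sum(point["cases"] for point in record["series"])
print(record["location"]["region"], total_cases)
```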
Table 1. Sample historical weekly epidemiological time series consisting of timestamps and case counts.
| 2 | |
| 5 | |
| 4 |
Table 2. Explicit transformation of Table 1 into a leading interval series.
| 2 | ||
| 5 | ||
| 4 |
The interval start and end are inclusive and exclusive, respectively.
Table 3. Explicit transformation of Table 1 into a trailing exclusive interval series.
| 2 | ||
| 5 | ||
| 4 |
The interval start and end are inclusive and exclusive, respectively.
Table 4. Explicit transformation of Table 1 into a trailing inclusive interval series.
| 2 | ||
| 5 | ||
| 4 |
The interval start and end are inclusive and exclusive, respectively.
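The three interpretations can be sketched as a single conversion from a bare weekly timestamp to an explicit interval whose start is inclusive and whose end is exclusive. This is a sketch under stated assumptions, not the authors' code: it assumes 7-day reporting periods, reads "leading" as the timestamp marking the first day, "trailing exclusive" as the timestamp marking the day after the last day, and "trailing inclusive" as the timestamp marking the last day. The function name `to_half_open` and the example date are hypothetical.

```python
from datetime import date, timedelta

WEEK = timedelta(days=7)  # assumption: every reporting period is 7 days long
DAY = timedelta(days=1)


def to_half_open(timestamp, convention):
    """Return (start, end) with start inclusive and end exclusive."""
    if convention == "leading":
        # The timestamp is the first day of the period.
        return timestamp, timestamp + WEEK
    if convention == "trailing_exclusive":
        # The timestamp is the day after the last day of the period.
        return timestamp - WEEK, timestamp
    if convention == "trailing_inclusive":
        # The timestamp is the last day of the period.
        return timestamp - WEEK + DAY, timestamp + DAY
    raise ValueError(f"unknown convention: {convention}")


stamp = date(2015, 10, 3)  # illustrative timestamp
for name in ("leading", "trailing_exclusive", "trailing_inclusive"):
    start, end = to_half_open(stamp, name)
    print(name, start.isoformat(), end.isoformat())
```

The same timestamp yields three different 7-day windows, which is why a series that does not state its convention cannot be aligned reliably with data from another source.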
Figure 5. Screenshot taken from Texas' Department of State Health Services 2014–2015 weekly influenza reports web page (44). Texas identifies its weekly influenza surveillance PDF reports by trailing inclusive timestamps rather than by MMWR week (e.g., “10/3/15” instead of “2015, week 39”). Interestingly, much of the data in each PDF uses MMWR week numbers rather than timestamps.
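The correspondence between a trailing inclusive timestamp and an MMWR week can be checked with a short calculation. The sketch below assumes the standard MMWR definition (weeks run Sunday through Saturday, and week 1 is the first week with at least four days in the calendar year); the function names `mmwr_week1_start` and `mmwr_week` are hypothetical helpers, not part of any library.

```python
from datetime import date, timedelta


def mmwr_week1_start(year):
    """Return the Sunday that begins MMWR week 1 of the given year."""
    jan1 = date(year, 1, 1)
    days_since_sunday = (jan1.weekday() + 1) % 7  # Sunday -> 0, Saturday -> 6
    start = jan1 - timedelta(days=days_since_sunday)
    # If January 1 falls on Thursday, Friday, or Saturday, its week has fewer
    # than four days in the new year, so week 1 begins on the following Sunday.
    if days_since_sunday > 3:
        start += timedelta(days=7)
    return start


def mmwr_week(day):
    """Return (MMWR year, MMWR week number) for a calendar date."""
    year = day.year
    if day >= mmwr_week1_start(year + 1):
        year += 1
    elif day < mmwr_week1_start(year):
        year -= 1
    return year, (day - mmwr_week1_start(year)).days // 7 + 1


print(mmwr_week(date(2015, 10, 3)))  # (2015, 39)
```

Under this definition, October 3, 2015 is the Saturday that closes MMWR week 39 of 2015, which matches the example in the caption.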
Table 5. Influenza reporting intervals in Poland in May 2016.
| 7 | ( | ||
| 8 | ( | ||
| 7 | ( | ||
| 9 | ( |
Instead of reporting data at regular intervals (e.g., every 7 days), Poland reports data four “weeks” a month, regardless of the length of the month. This yields irregular interval durations. Here, the interval start and end are inclusive.
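A small sketch shows how a fixed four-interval scheme produces irregular durations. The day-of-month boundaries used here (intervals starting on days 1, 8, 16, and 23, with the last running to the end of the month) are an assumption, chosen because they reproduce the durations of 7, 8, 7, and 9 days listed above for May 2016; the names `ASSUMED_START_DAYS`, `monthly_intervals`, and `durations` are hypothetical.

```python
import calendar
from datetime import date

# Assumed day-of-month on which each of the four reporting "weeks" begins.
ASSUMED_START_DAYS = (1, 8, 16, 23)


def monthly_intervals(year, month):
    """Return four (start, end) date pairs for a month; both ends inclusive."""
    last_day = calendar.monthrange(year, month)[1]
    end_days = [day - 1 for day in ASSUMED_START_DAYS[1:]] + [last_day]
    return [
        (date(year, month, start), date(year, month, end))
        for start, end in zip(ASSUMED_START_DAYS, end_days)
    ]


def durations(intervals):
    """Length in days of each interval, counting both endpoints."""
    return [(end - start).days + 1 for start, end in intervals]


print(durations(monthly_intervals(2016, 5)))  # [7, 8, 7, 9]
```

Because the final interval absorbs whatever days remain, its length changes with the month, so counts from different intervals are not directly comparable without normalizing by duration.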