Thomas J R Finnie, Andy South, Ana Bento, Ellie Sherrard-Smith, Thibaut Jombart.
Abstract
Epidemiology relies on data, but the divergent ways in which data are recorded and transferred, both within and between outbreaks, and the expanding range of data types are creating an increasingly complex problem for the discipline. There is a need for a consistent, interpretable and precise way to transfer data while maintaining its fidelity. We introduce 'EpiJSON', a new, flexible, and standards-compliant format for the interchange of epidemiological data using JavaScript Object Notation. This format is designed to enable the widest range of epidemiological data to be unambiguously held and transferred between people, software and institutions. In this paper, we provide a full description of the format and a discussion of the design decisions made. We introduce a schema enabling automatic checks of the validity of data stored as EpiJSON, which can serve as a basis for the development of additional tools. We also present the R package 'repijson', which provides conversion tools between this format, line-list data and pre-existing analysis tools. An example is given to illustrate how EpiJSON can be used to store line-list data. EpiJSON, designed around modern standards for interchange of information on the internet, is simple to implement, read and check. As such, it provides an ideal new standard for epidemiological, and other, data transfer to the fast-growing open-source platform for the analysis of disease outbreaks.
Keywords: Communications standards; Databases; Epidemics; Outbreaks; Software
Year: 2015 PMID: 27266846 PMCID: PMC7104924 DOI: 10.1016/j.epidem.2015.12.002
Source DB: PubMed Journal: Epidemics ISSN: 1878-0067 Impact factor: 4.396
Fig. 1. The major components of an epidemiology workflow (blocks) and the places where a standard transfer format would be of assistance (arrows).
Fig. 2. Block diagram of the structure of an EpiJSON file.
Keys and values for the base JSON object contained in an EpiJSON file. Square brackets following a data type indicate arrays.
| Key | Type | Value | Description |
|---|---|---|---|
| “metadata” | attribute[] | Zero or more attribute objects | A set of attribute objects relating to the dataset as a whole |
| “records” | record[] | Zero or more record objects | A set of record objects containing the dataset |
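
As a rough illustration of the base object described in the table above, the following Python sketch builds an empty EpiJSON document and serialises it with the standard `json` module. The helper name `make_epijson` is hypothetical and is not part of the format or of the repijson package.

```python
import json


def make_epijson(metadata=None, records=None):
    """Return the base EpiJSON object (hypothetical helper).

    Both keys hold arrays, and either array may be empty.
    """
    return {
        "metadata": list(metadata or []),
        "records": list(records or []),
    }


# An empty but structurally complete document.
doc = make_epijson()
text = json.dumps(doc, indent=2)
print(text)
```

Keeping both keys present even when their arrays are empty means a reader of the file never has to distinguish "no records" from "key missing"; this is a design choice of the sketch, not a requirement stated in the table.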
Keys and values for an attribute object.
| Key | Type | Value | Description |
|---|---|---|---|
| “name” | string | string | An identifier for this attribute |
| “type” | string | “string” | Type must be one of the enumerated types listed. |
| “value” | string; numeric; boolean | | The value to be recorded. It must be a valid example of “type”. May be a homogeneous array of one of the permitted types |
| “units” | string | string | A string with the UDUNITS2 unit name. For non-numeric or non-dimensional attributes this key may be omitted. Non-standard units |
Possible values for the type key of an attribute object and the consequence for the data held in the value key.
| Type | Value |
|---|---|
| “string” | A character string of unspecified length |
| “number” | A decimal number. In languages making a distinction between numeric types, it is recommended that the “value” key is a signed floating point number encoded on at least 64 bits. Numbers must not contain any whitespace. Commas and periods may only appear where they indicate the decimal place |
| “integer” | An integer number. In languages making a distinction between numeric types, it is recommended that the “value” key is a signed integer number encoded on at least 32 bits |
| “boolean” | A Boolean value, true or false. If a language supports Boolean types then the implementation may treat the value key as Boolean. Must be lower-case and unquoted |
| “date” | A character string representing a date conforming to RFC3339 |
| “location” | A GeoJSON object representing a spatial entity. Note: although attributes can hold either spatial (location) or temporal (date) information this is mostly for use in metadata. For records, the event object is the recommended form for storing this information |
| “base64” | A character string of binary data encoded to a text character set using base 64 encoding as per RFC4648 |
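
The type rules in the table above could be checked along the following lines. This is a minimal Python sketch under stated assumptions, not the schema referred to in the abstract: the function name `value_matches_type` is hypothetical, the date check uses `datetime.fromisoformat` as an approximation of RFC3339 rather than a strict validator, and the location check only confirms that the value is an object with a `"type"` member rather than validating full GeoJSON.

```python
import base64
import binascii
from datetime import datetime


def _is_date(v):
    """Approximate RFC3339 check (assumption: ISO 8601 parsing is close enough)."""
    if not isinstance(v, str):
        return False
    # Older Python versions do not accept a trailing "Z", so normalise it.
    candidate = v[:-1] + "+00:00" if v.endswith("Z") else v
    try:
        datetime.fromisoformat(candidate)
        return True
    except ValueError:
        return False


def _is_base64(v):
    """True if v is a string that decodes cleanly as base 64."""
    if not isinstance(v, str):
        return False
    try:
        base64.b64decode(v, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False


# bool is a subclass of int in Python, so it is excluded explicitly
# from the numeric types.
CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "date": _is_date,
    "location": lambda v: isinstance(v, dict) and "type" in v,  # loose check only
    "base64": _is_base64,
}


def value_matches_type(type_name, value):
    """Check a value, or every item of a homogeneous array, against a type name."""
    check = CHECKS.get(type_name)
    if check is None:
        return False  # not one of the enumerated types
    if isinstance(value, list):
        return all(check(item) for item in value)
    return check(value)
```

For example, `value_matches_type("integer", True)` is rejected because a Boolean is not an integer in this format, while `value_matches_type("string", ["a", "b"])` is accepted as a homogeneous array.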
Keys and values for the record object. Square brackets following a data type indicate arrays.
| Key | Type | Value | Description |
|---|---|---|---|
| “id” | string | String representing a UUID | A string conforming to RFC 4122 acting as a unique identifier for this record |
| “attributes” | attribute[] | Zero or more attribute objects | Attributes relating to this record. Examples might be gender or county name. |
| “events” | event[] | Zero or more event objects | Events relating to this record. Examples include infection, symptom onset or travel history. |
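
A record as described in the table above might be assembled as in the following Python sketch. The helper name `make_record` and the attribute value shown are made up for illustration; the identifier is generated with the standard `uuid` module, which produces RFC 4122 UUIDs.

```python
import uuid


def make_record(attributes=None, events=None):
    """Return a record object with a freshly generated UUID (hypothetical helper)."""
    return {
        "id": str(uuid.uuid4()),  # random (version 4) RFC 4122 UUID
        "attributes": list(attributes or []),
        "events": list(events or []),
    }


# Illustrative data only: one attribute, no events.
record = make_record(
    attributes=[{"name": "gender", "type": "string", "value": "female"}]
)
print(record["id"])
```

Generating the identifier when the record is created, rather than reusing a row number from the source line list, is an assumption of this sketch; it keeps identifiers unique when files from different sources are merged.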
Keys and values for the event object.
| Key | Type | Value | Description |
|---|---|---|---|
| “id” | string | String representing a UUID | A string conforming to RFC 4122 acting as a unique identifier for this event |
| “name” | string | String | The name of this event |
| “date” | string | A string representation of a date | This string must conform to RFC3339 (as above for the attribute object) |
| “location” | object | A GeoJSON object | This object represents the spatial occurrence of the event as a valid GeoJSON object |
| “attributes” | attribute[] | Zero or more attribute objects | Attributes relating to this event |
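
The event object in the table above could be built as in this Python sketch. The helper name `make_event`, the event name and the coordinates are illustrative assumptions. The date is written as a UTC timestamp with an explicit offset, which is one valid RFC3339 form, and the location is a GeoJSON Point, whose coordinates are ordered longitude first, then latitude.

```python
import json
import uuid
from datetime import datetime, timezone


def make_event(name, when, lon, lat, attributes=None):
    """Return an event object (hypothetical helper).

    `when` must be a timezone-aware datetime so that the serialised
    string carries an explicit UTC offset.
    """
    if when.tzinfo is None:
        raise ValueError("when must be timezone-aware")
    return {
        "id": str(uuid.uuid4()),
        "name": name,
        "date": when.astimezone(timezone.utc).isoformat(),
        # GeoJSON positions are [longitude, latitude].
        "location": {"type": "Point", "coordinates": [lon, lat]},
        "attributes": list(attributes or []),
    }


# Illustrative values only.
event = make_event(
    "onset", datetime(2014, 5, 6, 10, 0, tzinfo=timezone.utc), -1.5, 52.0
)
print(json.dumps(event, indent=2))
```

Rejecting naive datetimes avoids silently interpreting them in the local time zone of whichever machine writes the file.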
Fig. 3. Example of data in EpiJSON format.
Fig. 4. Schematic of the common types of data that the repijson package may convert between or include in an EpiJSON file. Solid lines indicate conversion of data; dashed lines indicate direct inclusion.