Spatiotemporal incidence of Zika and associated environmental drivers for the 2015-2016 epidemic in Colombia.

Amir S Siraj1, Isabel Rodriguez-Barraquer2, Christopher M Barker3, Natalia Tejedor-Garavito4,5, Dennis Harding6, Christopher Lorton6, Dejan Lukacevic6, Gene Oates6, Guido Espana1, Moritz U G Kraemer7,8,9, Carrie Manore10, Michael A Johansson11,12, Andrew J Tatem4,5, Robert C Reiner13, T Alex Perkins1.   

Abstract

Despite a long history of mosquito-borne virus epidemics in the Americas, the impact of the Zika virus (ZIKV) epidemic of 2015-2016 was unexpected. The need for scientifically informed decision-making is driving research to understand the emergence and spread of ZIKV. To support that research, we assembled a data set of key covariates for modeling ZIKV transmission dynamics in Colombia, where ZIKV transmission was widespread and the government made incidence data publicly available. On a weekly basis between January 1, 2014 and October 1, 2016 at three administrative levels, we collated spatiotemporal Zika incidence data, nine environmental variables, and demographic data into a single downloadable database. These new datasets and those we identified, processed, and assembled at comparable spatial and temporal resolutions will save future researchers considerable time and effort in performing these data processing steps, enabling them to focus instead on extracting epidemiological insights from this important data set. Similar approaches could prove useful for filling data gaps to enable epidemiological analyses of future disease emergence events.

Year:  2018        PMID: 29688216      PMCID: PMC5914286          DOI: 10.1038/sdata.2018.73

Source DB:  PubMed          Journal:  Sci Data        ISSN: 2052-4463            Impact factor:   6.444


Background & Summary

Zika virus (ZIKV) emerged as a pathogen of global concern in 2015 when it rapidly spread through the Americas and was associated with Guillain-Barré syndrome (GBS) in adults and congenital Zika syndrome (CZS) in fetuses and neonates[1]. Though ZIKV had been discovered several decades earlier, its severe outcomes and the explosive nature of its epidemics were recognized only recently[2-5]. Moreover, an estimated 80% rate of asymptomatic infection[2,7,8] and the many additional infections with relatively mild symptoms that go unreported[9] complicate efforts to estimate disease incidence and make modeling the spread of ZIKV a challenging task. Despite these issues and the chronic lack of data at the appropriate spatiotemporal scales, efforts to understand the spatiotemporal dynamics of ZIKV rely heavily on access to data about its spatiotemporal drivers[6].

ZIKV is transmitted primarily by Aedes aegypti mosquitoes, which also transmit chikungunya, yellow fever, and dengue viruses. Like these other viruses, ZIKV transmission is highly dependent on the environment. Climatic conditions, for example, regulate the population dynamics of vectors[10,11], and the built environment plays an important role in human-vector interaction and in providing breeding grounds for mosquitoes[12]. Even though the importance of these factors is widely recognized, their specific roles are more difficult to understand, although model-based analyses combining epidemiological and environmental data can help[13].

The availability of spatiotemporal incidence data is critical to both current and near-future responses and to planning for responses to emerging infectious disease outbreaks. For example, during the Ebola epidemic in 2014-2015, mathematical and statistical models using incidence data were critical to informing resource allocation and placement of new hospital beds[14], plans for vaccine trials[15], estimates of intervention effectiveness, and understanding how the outbreak started and where it spread in time and space[16]. Similarly, spatiotemporal ZIKV data have informed efforts to estimate the number of people at risk for infection and the number of pregnant women infected[6]. Such data are also potentially important for selecting sites for ZIKV vaccine trials[17].

Despite the widely recognized importance of spatiotemporal incidence data, there is often limited availability of such data sets for emerging infectious diseases[18]. In the case of Zika, there has been some effort to broaden access to these data (e.g., the cdcepi GitHub repository[19]), but the data available through these sources are often not internally consistent and are not made available with important covariates, such as population and weather conditions. Colombia is one country for which data have been made available online by its Instituto Nacional de Salud[20], and it is of particular interest due to the high resolution of data there (available weekly for each of 1,122 municipalities). This data set is also of particular interest for modeling the spatiotemporal spread of ZIKV due to Colombia’s diverse landscape and because of substantial heterogeneity in the timing and intensity of ZIKV transmission there[21]. Together, these factors offer a unique opportunity to examine the role of environmental and social influences on the spread of ZIKV[22].

In addition to spatiotemporal incidence data, several variables are commonly incorporated into analyses of the transmission dynamics of ZIKV and related pathogens[23,24]. First, temperature plays a dominant role in ZIKV transmission due to its influence on vector and virus life traits[25,26]. Because the effect of temperature on transmission depends not only on mean temperature but also on daily temperature range[27], we include estimates of mean, minimum, and maximum daily temperature. Second, a number of metrics related to moisture, including precipitation, humidity, and normalized difference vegetation index (NDVI), are commonly used for modeling mosquito population dynamics due to their relevance to the immature stages of the mosquito life cycle[11]. Third, we include spatiotemporal estimates of relative mosquito abundance[28], a spatial estimate of purchasing power as a proxy for socioeconomic effects on mosquito-human contact[6,29], and spatial estimates of travel time to allow for exploration of the effects of connectivity on spatiotemporal transmission dynamics[24]. Fourth, we include demographic projections[30] of total population and annual births to allow for quantification of the population at risk of ZIKV infection and severe outcomes such as GBS and CZS.

Here, we collated data on the aforementioned variables at three administrative scales on a weekly basis between January 1, 2014 and October 1, 2016, which spans the majority of ZIKV transmission activity in Colombia. Our hope is that this effort will increase access to this data set and reduce duplication of the considerable effort required to process data for epidemiological analyses of ZIKV transmission dynamics.

Methods

To achieve our central objective of assembling and collating multiple data sets pertaining to ZIKV transmission in Colombia, we first identified key data and then translated those data to comparable spatiotemporal resolution using a variety of methods. In some cases, this was as simple as downloading raster datasets and clipping them to shape files. In other cases, this involved statistical modelling to transform existing data products from certain scales into a single data product at some other desired scale. In all cases, our methods involved taking input data (Table 1) and generating output data (Table 2 (available online only), Data Citation 1) at a weekly timescale between January 1, 2014 and October 1, 2016 for each of three administrative scales (Fig. 1). Throughout, we generated output data at the national scale, for each of 33 departments, and for each of 1,122 municipalities, as defined by GIS shapefiles from the National Geographical Information System of Colombia[31].
Table 1

Input datasets, used to generate gridded and administrative aggregate outputs.

Name | Acquisition year | Source | Version, Publication year, License | Data Type | Spatial Resolution | Format/Pixel Type & Depth | Spatial Reference | Spatial Coverage
GTOPO30 Gridded Elevation | <1996 | USGS[37] | 1996, CC0 1.0 | Elevation, continuous raster | 30” (~930 m) | Geo-tiff/flt32 | GCS WGS 1984 | Regional
CPC Surface Air temperature | 2014–2016 | Fan & van den Dool[39] | 2008, CC0 1.0 | Monthly surface air temperature, continuous raster | 1800” (~56 km) | ESRI grids/flt32 | GCS WGS 1984 | Global
Worldclim Average Temperature | 1960–1990 | Hijmans R.J., et al.[38] | v1, 2005, CCBY 4.0 | Average monthly temperature, continuous raster | 150” (~4.65 km) | Geo-tiff/flt32 | GCS WGS 1984 | Global
Worldclim Minimum Temperature | 1960–1990 | Hijmans R.J., et al.[38] | v1, 2005, CCBY 4.0 | Average monthly minimum temperature, continuous raster | 150” (~4.65 km) | Geo-tiff/flt32 | GCS WGS 1984 | Global
Worldclim Maximum Temperature | 1960–1990 | Hijmans R.J., et al.[38] | v1, 2005, CCBY 4.0 | Average monthly maximum temperature, continuous raster | 150” (~4.65 km) | Geo-tiff/flt32 | GCS WGS 1984 | Global
Daily Station Mean Temperature | 2014–2016 | NOAA[36] | 2016, CC0 1.0 | Daily mean temperature reading from 30 stations, continuous vector | Comparable to 1” (~30 m) | HTML/flt32 | GCS WGS 1984 | Colombia
Daily Station Minimum temperature | 2014–2016 | NOAA[36] | 2016, CC0 1.0 | Daily minimum temperature reading from 30 stations, continuous vector | Comparable to 1” (~30 m) | HTML/flt32 | GCS WGS 1984 | Colombia
Daily Station Maximum Temperature | 2014–2016 | NOAA[36] | 2016, CC0 1.0 | Daily maximum temperature reading from 30 stations, continuous vector | Comparable to 1” (~30 m) | HTML/flt32 | GCS WGS 1984 | Colombia
Daily Station Relative Humidity | 2014–2016 | NOAA[36] | 2016, CC0 1.0 | Daily relative humidity reading from 30 stations, continuous vector | Comparable to 1” (~30 m) | HTML/flt32 | GCS WGS 1984 | Colombia
Daily Mean Dew Point Temperature | 2014–2016 | NOAA[36] | 2016, CC0 1.0 | Daily mean dew point temperature reading from 30 stations, continuous vector | Comparable to 1” (~30 m) | HTML/flt32 | GCS WGS 1984 | Colombia
Gridded Population of the World (GPW) | 2005 | CIESIN[50] | v3, 2004, CCBY 4.0 | Global Population Estimates, continuous raster | 150” (~4.65 km) | Geo-tiff/flt32 | GCS WGS 1984 | Global
Confirmed and Suspected Cumulative ZIKV Cases | 2015–2016 | INS[20] | 2016 | Weekly suspected and confirmed cumulative ZIKV cases by municipality from two INS sources | NA | CSV/flt32 | NA | Colombia
Occurrence Probability of Aedes aegypti | 1960–2014 | Kraemer et al. 2015[28] | 2015, Author | Global occurrence probabilities of Aedes aegypti, continuous raster | 150” (~4.65 km) | Geo-tiff/flt32 | GCS WGS 1984 | Global
GEcon – Gross Cell Product | 2005 | Nordhaus[49] | 2006, CCBY 4.0 | Global gridded gross cell product, continuous raster | 3600” (~111 km) | XLS/flt32 | GCS WGS 1984 | Global
WorldPop Population | 2015 | WorldPop[30] | 2016, CCBY 4.0 | Population count, continuous raster | 3” (~93 m) | Geo-tiff/flt32 | GCS WGS 1984 | Colombia
WorldPop Births | 2015 | WorldPop[30] | 2016, Author | Count of births, continuous raster | 30” (~93 m) | Geo-tiff/flt32 | GCS WGS 1984 | Colombia
MODIS –MOD13A2 NDVI | 2014–2016 | Didan K. (Data Citation 2) | v6, 2015, CC0 1.0 | 16-day NDVI from Terra MODIS, continuous raster | 30” (~930 m) | HDF-EOS tiles/uint8 | Sinusoidal | Global
MODIS –MYD13A2 NDVI | 2014–2016 | Didan K. (Data Citation 3) | v6, 2015, CC0 1.0 | 16-day NDVI from Aqua MODIS, continuous raster | 30” (~930 m) | HDF-EOS tiles/uint8 | Sinusoidal | Global
NOAA’s Satellite Applications and Research Rainfall Estimates | 2015–2016 | NOAA[44] | 2016, CC0 1.0 | Daily precipitation estimates from satellites, continuous raster | 360” (~11 km) | Net-CDF/uint8 | GCS WGS 1984 | Global
Travel Time to Major Cities | 2000 | Nelson A.[51] | 2008, CCBY 3.0 | Travel time, continuous raster | 30” (~930 m) | Flt/flt32/flt32 | GCS WGS 1984 | Global
MODIS 500m Global Urban Extent | 2002 | Schneider et al.[52,53] | 2009, CCBY 3.0 | Urban extent | 15” (~465 m) | Flt/flt32/flt32 | GCS WGS 1984 | Global
Administrative Boundaries of Colombia | 2015 | SIGOT, Colombia[31] | 2015 | Municipal administrative boundaries, vector | Comparable to 15” (~465 m) | ESRI polygon shapefile tiles | GCS WGS 1984 | Colombia
Table 2

Output datasets, compiled and generated at different spatial scales

Name | Acquisition year | Source | Publication year | Data Type | Spatial Resolution | Format/Pixel Type & Depth | Spatial Reference | Spatial Coverage
Mean Temperature Time Series | 2014–2016 | Model derived based on GTOPO30 elevation, WorldClim, and Daily Station data |  | Weekly mean temperature, continuous raster | 150” (~4.65 km) | Raster brick/flt32 | GCS WGS 1984 | Colombia
Minimum Temperature Time Series | 2014–2016 | Model derived based on GTOPO30 elevation, Mean Temperature Time Series and Daily Station data |  | Weekly minimum temperature, continuous raster | 150” (~4.65 km) | Raster brick/flt32 | GCS WGS 1984 | Colombia
Maximum Temperature Time Series | 2014–2016 | Model derived based on GTOPO30 elevation, WorldClim and Daily Station data |  | Weekly maximum temperature, continuous raster | 150” (~4.65 km) | Raster brick/flt32 | GCS WGS 1984 | Colombia
Precipitation Time Series | 2014–2016 | Aggregated from NOAA’s Satellite Applications and Research Rainfall Estimates |  | Weekly precipitation, continuous raster | 150” (~4.65 km) | Raster brick/flt32 | GCS WGS 1984 | Colombia
Mean Relative Humidity Time Series | 2014–2016 | Model derived based on GTOPO30 elevation, Mean Temperature Time Series and Daily Station data |  | Weekly mean relative humidity, continuous raster | 150” (~4.65 km) | Raster brick/flt32 | GCS WGS 1984 | Colombia
NDVI –Terra MODIS Time Series | 2014–2016 | Temporally interpolated from MODIS –MOD13A2 NDVI |  | Weekly mean NDVI from Terra MODIS, continuous raster | 150” (~4.65 km) | Raster brick/flt32 | Sinusoidal | Colombia
NDVI –Aqua MODIS Time Series | 2014–2016 | Temporally interpolated from MODIS –MYD13A2 NDVI |  | Weekly mean NDVI from Aqua MODIS, continuous raster | 150” (~4.65 km) | Raster brick/flt32 | Sinusoidal | Colombia
Gridded Aedes aegypti Abundance |  | Derived from Occurrence Probability of Aedes aegypti |  | Aedes aegypti abundance for each week of the year, continuous raster | 150” (~4.65 km) | Raster brick/flt32 | GCS WGS 1984 | Colombia
Gridded per-capita gross cell product | 2005 | Derived from GEcon and Gridded Population of the World |  | Per-capita gross cell product, continuous raster | 150” (~4.65 km) | BIL/flt32 | GCS WGS 1984 | Colombia
Travel Time to Major Cities | 2000 | Nelson A.[51] | 2008 | Travel time, continuous raster | 30” (~930 m) | BIL/flt32 | GCS WGS 1984 | Colombia
WorldPop Population | 2015 | WorldPop[30] | 2016 | Population count, continuous raster | 3” (~93 m) | BIL/flt32 | GCS WGS 1984 | Colombia
WorldPop Births | 2015 | WorldPop[30] | 2016 | Count of births, continuous raster | 3” (~93 m) | BIL/flt32 | GCS WGS 1984 | Colombia
Urban Population | 2015 | Derived from MODIS 500 m Global Urban Extent and WorldPop Population |  | Population residing in urban areas | 15” (~465 m) | BIL/flt32 | GCS WGS 1984 | Colombia
Administrative Boundaries of Colombia | 2015 | IGAC, Colombia[31] | 2015 | Municipal administrative boundaries, vector | Comparable to 15” (~465 m) | ESRI polygon shapefile tiles | GCS WGS 1984 | Colombia
Confirmed and Suspected ZIKV Cases |  | Derived from Confirmed and Suspected Cumulative ZIKV Cases |  | Weekly suspected and confirmed ZIKV cases by administrative unit, table | NA | CSV/flt32 | NA | Municipality, Department, National
Per-capita gross cell product Aggregate |  | Derived from Gridded per-capita gross cell product and Administrative Boundaries of Colombia |  | Mean per-capita gross cell product by administrative unit, table | NA | CSV/flt32 | NA | Municipality, Department, National
Population Aggregate |  | Derived from WorldPop Population and Administrative Boundaries of Colombia |  | Population by administrative unit, table | NA | CSV/flt32 | NA | Municipality, Department, National
Births Aggregate |  | Derived from WorldPop Births and Administrative Boundaries of Colombia |  | Births by administrative unit, table | NA | CSV/flt32 | NA | Municipality, Department, National
Urban Population Aggregate |  | Derived from WorldPop Population and MODIS 500m Global Urban Extent |  | Urban population by administrative unit, table | NA | CSV/flt32 | NA | Municipality, Department, National
Population Weighted Aedes aegypti Abundance Aggregate |  | Derived from Gridded Aedes aegypti Abundance, WorldPop Population and Administrative Boundaries of Colombia |  | Population weighted Aedes aegypti abundance for each week of the year by administrative unit, table | NA | CSV/flt32 | NA | Municipality, Department, National
Population Weighted Travel Time to Major Cities |  | Derived from Travel Time to Major Cities, WorldPop Population and Administrative Boundaries of Colombia |  | Travel time to major cities weighted by population, table | NA | CSV/flt32 | NA | Municipality, Department, National
Population Weighted Mean Temperature Time Series |  | Derived from Mean Temperature Time Series, WorldPop Population and Administrative Boundaries of Colombia |  | Weekly mean temperature weighted by population, table | NA | CSV/flt32 | NA | Municipality, Department, National
Population Weighted Minimum Temperature Time Series |  | Derived from Minimum Temperature Time Series, WorldPop Population, and Administrative Boundaries of Colombia |  | Weekly minimum temperature weighted by population, table | NA | CSV/flt32 | NA | Municipality, Department, National
Population Weighted Maximum Temperature Time Series |  | Derived from Maximum Temperature Time Series, WorldPop Population, and Administrative Boundaries of Colombia |  | Weekly maximum temperature weighted by population, table | NA | CSV/flt32 | NA | Municipality, Department, National
Population Weighted Precipitation Time Series |  | Derived from Precipitation Time Series, WorldPop Population, and Administrative Boundaries of Colombia |  | Weekly total precipitation weighted by population, table | NA | CSV/flt32 | NA | Municipality, Department, National
Population Weighted Relative Humidity Time Series |  | Derived from Mean Relative Humidity Time Series, WorldPop Population, and Administrative Boundaries of Colombia |  | Weekly mean relative humidity weighted by population, table | NA | CSV/flt32 | NA | Municipality, Department, National
Population weighted NDVI – Terra MODIS |  | Derived from NDVI –Terra MODIS Time Series, WorldPop Population, and Administrative Boundaries of Colombia |  | Weekly mean NDVI from MODIS Terra weighted by population, table | NA | CSV/flt32 | NA | Municipality, Department, National
Population weighted NDVI – Aqua MODIS |  | Derived from NDVI –Aqua MODIS Time Series, WorldPop Population, and Administrative Boundaries of Colombia |  | Weekly mean NDVI from MODIS Aqua weighted by population, table | NA | CSV/flt32 | NA | Municipality, Department, National
Figure 1

Schematic overview of the workflow used to produce the output raster files, and their spatial aggregates at the municipal, departmental, and national scales.

The input stages are shown in yellow, and the processing stages are shown in orange, while the output stages are in green.

Zika case reports

The weekly number of Zika cases, by municipality, was reconstructed using two data sources. The main data source was a website[20] of the Colombian National Institute of Health (Instituto Nacional de Salud, INS), where official weekly reports on the cumulative number of suspected and confirmed Zika cases for each municipality have been published since the beginning of 2016. While the peak of the Colombian epidemic occurred in 2016, a substantial number of cases were reported during 2015. To capture this initial portion of the epidemic, we used an additional data source, also available on the INS website[20]. Unfortunately, this secondary source consistently reported fewer cases than the total reported by the INS at the national scale. For example, while the official data source reports a cumulative total of 11,712 cases by the end of 2015, the secondary source reports only 3,875 cases for the same period. Therefore, to reconstruct the 2015 portion of the epidemic while accounting for the better-known total number of cases, we multiplied the weekly 2015 data by a correction factor. For each municipality, this correction factor was calculated as the ratio of the cumulative number of cases reported up to the first week of 2016 according to the official source to the corresponding number according to the alternative source. The raw and the corrected weekly counts for each municipality are included in the data set. To account for cases from unknown municipalities within a department, we also provide data at the departmental level.
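
The rescaling step can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `correct_2015_counts` and the toy municipal series are hypothetical, and the sketch assumes the secondary-source weekly counts cover the same period as the official cumulative total. The national totals quoted above (11,712 and 3,875) are used only to show the size of such a ratio.

```python
def correct_2015_counts(weekly_2015, official_cumulative):
    """Rescale secondary-source weekly counts so they sum to the official total.

    weekly_2015: weekly case counts from the secondary source (hypothetical input).
    official_cumulative: cumulative count from the official source for the same period.
    """
    secondary_total = sum(weekly_2015)
    if secondary_total == 0:
        # Nothing to rescale; return the series unchanged.
        return list(weekly_2015)
    factor = official_cumulative / secondary_total
    return [count * factor for count in weekly_2015]


# Ratio implied by the national totals quoted in the text (illustration only).
national_factor = 11712 / 3875

# Toy municipal series: 10 secondary-source cases against 30 official cases.
corrected = correct_2015_counts([2, 5, 3], official_cumulative=30)
```

Because the factor is a single ratio per municipality, the corrected series keeps the temporal shape of the secondary source and changes only its scale.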

Human demographics

We obtained gridded population data across Colombia for the year 2015 at a resolution of 3 arc seconds (~93 m) from the WorldPop website (http://worldpop.org.uk). Similarly, we obtained high-resolution (30 arc seconds) unpublished gridded data on the number of births for the year 2015 from the WorldPop project. These high-resolution products were developed to ensure consistency with subnational data on sex and age structures, as well as with subnational age-specific fertility rates. Adjustments to births were made at subnational scales using data from the government of Colombia[32,33], followed by national-level adjustments to contemporary numbers based on 2012 and 2015 United Nations Population Division data[30,34].

Spatial aggregation of covariates

Aggregating raster data to administrative units requires an assumption about how raster values should be weighted to obtain a single value for each unit. Because Zika virus transmission occurs predominantly in human-dominated areas, we used human population (WorldPop Project) as our weighting variable. We applied this weighting procedure to aggregate all covariates at municipal (e.g., as in Fig. 2), departmental (e.g., as in Fig. 3), and national levels.
Figure 2

Illustrative maps of municipality level weighted output variables for a single sample week.

Variables include minimum temperature (a), mean temperature (b), maximum temperature (c), relative humidity (d), NDVI from Terra MODIS (e), NDVI from Aqua MODIS (f), total rainfall (g), average per capita gross cell product in 2005 US$ standard value (h), and average travel time to major cities in 2000 (i).

Figure 3

Illustration of weekly time-series outputs aggregated at the departmental scale for the departments of Antioquia (red) and Valle del Cauca (blue) for the period January 1, 2014 to October 1, 2016.

Time-series shown are for mean temperature (bold lines), minimum temperature (lower lines) and maximum temperature (upper lines) (a), relative humidity (b), NDVI from Terra (solid lines) and Aqua (dashed lines) MODIS (c), and precipitation (d).
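
The population weighting described under Spatial aggregation of covariates amounts to a weighted mean over the raster cells of each administrative unit. The sketch below is a simplified illustration, assuming the covariate and population rasters have already been aligned and clipped to one unit and flattened to lists; the function name and the toy values are hypothetical.

```python
def population_weighted_mean(values, population):
    """Population-weighted mean of raster cell values within one administrative unit."""
    weighted_sum = 0.0
    total_population = 0.0
    for value, people in zip(values, population):
        if value is None or people is None:
            continue  # skip cells with missing covariate or population data
        weighted_sum += value * people
        total_population += people
    # An unpopulated unit has no defined weighted mean.
    return weighted_sum / total_population if total_population > 0 else None


# Three cells of one unit: temperature (deg C) and resident population.
cell_temperature = [24.0, 28.0, 18.0]
cell_population = [100, 300, 0]
unit_temperature = population_weighted_mean(cell_temperature, cell_population)
```

In this toy unit the cold, unpopulated cell does not affect the result, which is the intended effect of weighting by where people live.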

Aedes aegypti abundance

We obtained one hundred posterior samples of Aedes aegypti occurrence probabilities in raster format from the published work of Kraemer et al.[28], which we used to derive weekly mosquito abundance measures for all 52 weeks of the year. We based our method on the assumption that the number of mosquitoes at time t, m(t), follows a Poisson distribution with rate parameter λ = −ln(1 − occurrence probability), consistent with existing ZIKV transmission models[29,35]. We obtained such an estimate of the relative density of mosquitoes across a 4.65 km x 4.65 km grid for each of 52 weeks. In addition, we generated aggregated values at the municipality, department and national scales after weighting the raster data values by population (see the section on Spatial aggregation of covariates).
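
The link between occurrence probability and the Poisson rate follows from setting the probability of at least one mosquito, 1 − exp(−λ), equal to the occurrence probability. A minimal Python sketch of that conversion is given below; the function name is hypothetical, and the sketch covers only the per-cell transformation, not how the weekly values were derived.

```python
import math


def relative_abundance(occurrence_probability):
    """Poisson rate whose probability of at least one mosquito equals the input."""
    if not 0.0 <= occurrence_probability < 1.0:
        raise ValueError("occurrence probability must lie in [0, 1)")
    return -math.log(1.0 - occurrence_probability)


rate_half = relative_abundance(0.5)   # ln(2)
rate_zero = relative_abundance(0.0)   # no expected mosquitoes
```

The transformation is monotone, so cells keep their ranking, but differences near a probability of 1 are stretched considerably.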

Temperature

We downloaded meteorological readings from 30 stations across continental Colombia from the National Oceanic and Atmospheric Administration (NOAA)’s Climate Data Online, an online archive of daily meteorological readings[36]. The variables we extracted from this data set included minimum daily temperature, maximum daily temperature, mean daily temperature, and relative humidity, all on a daily basis between January 1, 2014 and October 1, 2016. To facilitate interpolation of these climate variables across a more complete spatial coverage of the country, we downloaded a digital elevation dataset at a resolution of 30 arc seconds from the Global 30 Arc-Second Elevation (GTOPO30) product[37]. Similarly, we downloaded the WorldClim gridded long-term average of monthly minimum temperature, maximum temperature, and precipitation at a 4.65 km x 4.65 km spatial resolution[38], as well as NOAA’s Climate Prediction Center (CPC) global monthly mean air temperature at 0.5 arc-degrees resolution[39].

To generate smooth, high-resolution surfaces of climate variables based on calibration to point readings from the 30 meteorological stations, we tested two approaches of spatial interpolation: (a) non-parametric surface fitting with thin plate splines (TPS) with or without fixed-factor covariates[40]; (b) spatial models (kriging) with or without covariates[41]. We selected the best interpolation models for each environmental variable based on leave-one-out cross validation, as described in the Technical Validation section. The thin plate spline (TPS) follows the general form

$$Y(x) = \mu(x) + P(x) + \varepsilon,$$

where Y is the dependent variable evaluated at location x, μ is the fixed effect component of the model with optional covariates at location x, P is the implicit spline polynomial function over the spatial coordinates, and ε is measurement error, assumed to be uncorrelated across sites and normally distributed with mean zero and standard deviation σ.
The kriging approach follows the concept that spatial autocorrelation is dependent on distance between locations. We used the krige function in the geoR library of R with parameters chosen based on maximum-likelihood estimation[42]. The model of a spatial process indexed by spatial locations x follows

$$Y(x) = \mu(x) + S(x) + \varepsilon,$$

where Y is the dependent variable evaluated at location x, μ is the fixed effect component of the model at location x, S is a stationary Gaussian process with variance σ² (partial sill) and a correlation function parametrized by φ (range), and ε is the error term with variance τ² (nugget variance). When μ is included, the trend is implemented using lm, the regression model function in R, and S(x) is fitted to the residuals of the regression model[41]. Due to Colombia’s proximity to the Equator, we ignored the small effect of distance distortion arising from non-projected spatial layers on both models[43]. Because our goal is generating daily surfaces of climate variables, rather than developing a predictive model that works for days outside those to which we fitted the model, we treated every day separately and fitted a model for each day between January 1, 2014 and October 1, 2016 for which data were available. In addition to generating daily raster outputs and aggregating them at weekly time steps, we generated aggregated values at the municipality (Figs 2a–c), department (Fig. 3a) and national scales after weighting the raster data values by population (see the section on Spatial aggregation of covariates).
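
The leave-one-out procedure used to compare interpolators can be sketched independently of the interpolation method. The Python example below deliberately swaps in inverse-distance weighting as a simple stand-in predictor; it is not thin plate splines or kriging and is not the authors' implementation. Station coordinates, values, and function names are hypothetical.

```python
import math


def idw_predict(stations, target_xy, power=2.0):
    """Inverse-distance-weighted prediction (stand-in for TPS or kriging)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in stations:
        distance = math.hypot(x - target_xy[0], y - target_xy[1])
        if distance == 0.0:
            return value  # target coincides with a station
        weight = distance ** -power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def leave_one_out_errors(stations, predict=idw_predict):
    """Predict each station from all the others; return prediction minus observation."""
    errors = []
    for i, (xy, observed) in enumerate(stations):
        training = stations[:i] + stations[i + 1:]
        errors.append(predict(training, xy) - observed)
    return errors


# Four hypothetical stations on a unit square with daily mean temperature (deg C).
stations = [((0, 0), 20.0), ((1, 0), 22.0), ((0, 1), 24.0), ((1, 1), 26.0)]
errors = leave_one_out_errors(stations)
loo_mae = sum(abs(e) for e in errors) / len(errors)
```

Any predictor with the same call signature could be passed as `predict`, which is how competing interpolators would be compared on identical held-out stations.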

Relative humidity

Rather than interpolating relative humidity directly based on station readings (which showed poor estimates in preliminary results), we approached the task of estimating relative humidity indirectly. First, we spatially interpolated weather station measurements of mean dew point temperature from the 30 stations across Colombia. This was followed by calculating relative humidity across the 4.65 km x 4.65 km grid based on interpolated mean temperature and dew point temperature, using the August-Roche-Magnus approximation for the saturation vapour pressure of water in air[44], which follows

$$RH = 100 \times \frac{\exp\left(\frac{a T_d}{b + T_d}\right)}{\exp\left(\frac{a T}{b + T}\right)},$$

where RH is relative humidity in percent, T and T_d are the mean temperature and dew point temperature in °C, and a=17.271 and b=237.7 °C[44]. Finally, in addition to generating daily raster outputs and aggregating them at weekly time steps, we generated aggregated values at the municipality (Fig. 2d), department (Fig. 3b) and national scales after weighting the raster data values by population (see the section on Spatial aggregation of covariates).
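
As an illustration of the August-Roche-Magnus calculation, the following Python sketch computes relative humidity for a single grid cell from mean temperature and dew point temperature, using the constants given above. The function name is hypothetical, and the sketch assumes both inputs are in °C.

```python
import math

A = 17.271   # dimensionless Magnus coefficient quoted in the text
B = 237.7    # Magnus coefficient in deg C quoted in the text


def relative_humidity(temp_c, dew_point_c):
    """Relative humidity (%) from air temperature and dew point, both in deg C."""
    gamma_dew = A * dew_point_c / (B + dew_point_c)
    gamma_air = A * temp_c / (B + temp_c)
    # Ratio of saturation vapour pressures at the dew point and at air temperature.
    return 100.0 * math.exp(gamma_dew - gamma_air)


rh_saturated = relative_humidity(25.0, 25.0)   # dew point equals air temperature
rh_typical = relative_humidity(30.0, 20.0)     # roughly 55 %
```

Interpolated dew points slightly above the interpolated mean temperature would give values above 100%, so a clipping step may be needed in practice; whether the original workflow applied one is not stated.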

Normalized Difference Vegetation Index (NDVI)

Satellite-based technologies have been used to capture spatial variation in environmental factors related to vector population dynamics[45-47], including a commonly used index called Normalized Difference Vegetation Index (NDVI) that captures the vegetation cover of regions. To account for spatial and temporal variation in vegetation cover that could influence habitat suitability for Ae. aegypti, the primary ZIKV vector, we downloaded NASA’s Moderate Resolution Imaging Spectroradiometer (MODIS) vegetation indices from the Terra (MOD13A2) and Aqua (MYD13A2) satellites at 16-day temporal and 1 km x 1 km spatial resolutions (Data Citation 2, Data Citation 3). These products have similar sensors but differ in their orbits as well as their daily hours and directions of crossing the equator. We linearly interpolated between data points (days on which data were reported) to generate a daily time series before aggregating the data back to a weekly resolution. In addition, we generated aggregated values at the municipality (Figs 2e and f), department (Fig. 3c) and national scales after weighting the raster data values by population (see the section on Spatial aggregation of covariates).
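
The temporal processing of NDVI (linear interpolation of 16-day composites to daily values, followed by weekly averaging) can be sketched for a single grid cell as follows. This is a simplified illustration with hypothetical function names and toy values; days are counted as integers from an arbitrary start date.

```python
def daily_from_composites(days, values):
    """Linearly interpolate composite values to every day between first and last."""
    daily = {}
    pairs = list(zip(days, values))
    for (d0, v0), (d1, v1) in zip(pairs, pairs[1:]):
        for day in range(d0, d1):
            daily[day] = v0 + (v1 - v0) * (day - d0) / (d1 - d0)
    daily[days[-1]] = values[-1]
    return daily


def weekly_means(daily, start_day, n_weeks):
    """Average daily values over consecutive seven-day windows."""
    means = []
    for week in range(n_weeks):
        window = range(start_day + 7 * week, start_day + 7 * (week + 1))
        present = [daily[day] for day in window if day in daily]
        means.append(sum(present) / len(present) if present else None)
    return means


# Three 16-day composites for one cell (toy NDVI values).
composite_days = [0, 16, 32]
composite_ndvi = [0.40, 0.56, 0.48]
daily_ndvi = daily_from_composites(composite_days, composite_ndvi)
weekly_ndvi = weekly_means(daily_ndvi, start_day=0, n_weeks=4)
```

The same two functions would be applied cell by cell (or vectorised over a raster stack) to obtain the weekly gridded series.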

Precipitation

Among the climate datasets we explored, precipitation proved to be the most spatially variable, making it difficult to rely on spatial models to make accurate estimates. Our attempt at spatial interpolation of precipitation using ordinary kriging resulted in large deviations from the observed values at the 30 stations obtained from NOAA. As an alternative, we used satellite-based data from NOAA’s Center for Satellite Applications and Research (STAR). We downloaded daily layers of the STAR rainfall estimates at ~4 km x 4 km resolution[48]. After downloading the daily products, we subsetted and resampled them to our standard resolution (4.65 km x 4.65 km) and to a spatial extent compatible with the other variables considered, before averaging across each consecutive seven days to generate weekly gridded data. In addition, we generated aggregated values at the municipality (Fig. 2g), department (Fig. 3d) and national scales after weighting the raster data values by population (see the section on Spatial aggregation of covariates).

Geographically based Economic data (G-Econ)

To account for socioeconomic differences, which are potentially associated with contact between humans and the vector, we used one-degree resolution gridded estimates of 2005 purchasing power parity (PPP) adjusted gross domestic product (GDP)[49]. To express the values per capita, we divided the gridded GDP by the corresponding population, obtained from the Gridded Population of the World product (v3)[50] after resampling it to one-degree resolution. We chose this version of gridded population data because it was the one originally used to generate the 2005 gridded GDP values. Cells with missing values were imputed with the mean of the surrounding eight grid cell values. Once we obtained a complete grid layer at a resolution of one degree (~111 km at the equator), we resampled the layer, without smoothing, to a resolution of 4.65 km x 4.65 km to match the resolution of all other gridded layers. We additionally computed aggregated results at the municipality, department and national levels after weighting them by the distribution of population (in the year 2005) within each administrative unit (see the section on Spatial aggregation of covariates).
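
The per-capita conversion and the neighbour-based imputation can be sketched on small nested lists. The Python example below is illustrative only: function names and values are hypothetical, missing cells are represented by `None`, and the sketch assumes imputation is applied to the per-capita grid and uses whichever of the eight neighbours are available.

```python
def per_capita(gdp_grid, population_grid):
    """Divide gridded GDP by gridded population; missing or empty cells give None."""
    return [
        [g / p if g is not None and p else None for g, p in zip(gdp_row, pop_row)]
        for gdp_row, pop_row in zip(gdp_grid, population_grid)
    ]


def impute_from_neighbours(grid):
    """Fill None cells with the mean of the available surrounding eight cells."""
    n_rows, n_cols = len(grid), len(grid[0])
    filled = [row[:] for row in grid]
    for r in range(n_rows):
        for c in range(n_cols):
            if grid[r][c] is not None:
                continue
            neighbours = [
                grid[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, n_rows))
                for cc in range(max(c - 1, 0), min(c + 2, n_cols))
                if (rr, cc) != (r, c) and grid[rr][cc] is not None
            ]
            if neighbours:
                filled[r][c] = sum(neighbours) / len(neighbours)
    return filled


gdp = [[100, 200, 300], [400, None, 600], [700, 800, 900]]
population = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
gcp_per_capita = impute_from_neighbours(per_capita(gdp, population))
```

Imputation reads only from the original grid, so the order in which cells are visited does not change the result.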

Travel time

To account for the general accessibility of each municipality and department, we used travel time data downloaded from the European Commission’s Joint Research Centre at a resolution of 30 arc seconds[51]. This definition of travel time is a measure of overall accessibility rather than of frequency of travel. It is defined as the average land-based travel time (in minutes) for individuals in a region to reach the nearest settlement with a population greater than 50,000 (as of the year 2000). A large travel time indicates a region whose population lives relatively far from urban centers. The data were developed using a cost-distance model, which accounts for travel time increments based on the available transport networks and other environmental and political factors[51]. We aggregated travel time weighted by population at the municipal level to generate estimates of travel time for each municipality and similarly for each department (see the section on Spatial aggregation of covariates).

Urban population

To identify the level of urbanization in each grid cell, we downloaded the MODIS global 2002 urban extent raster dataset[52,53], which has a binary (0 or 1) value for each 500 m x 500 m grid cell around the globe. By counting the number of high-resolution urban grid cells that fall within each standard grid cell of 4.65 km x 4.65 km, we were able to generate a gridded product of percentage of the physical grid cell that is urban. Furthermore, in combination with the population raster we obtained from WorldPop[30], we were able to generate a gridded estimate of urban population at each 500 m x 500 m grid cell in Colombia.
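The counting step can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: it assumes the fine cells nest exactly inside the coarse cells with an integer `factor`, which is not the case for 500 m cells within 4.65 km cells, where cells would have to be assigned by coordinates. The function names and toy grids are hypothetical.

```python
def percent_urban(urban_mask, factor):
    """Percentage of fine cells flagged urban (1) inside each factor x factor block."""
    n_rows, n_cols = len(urban_mask), len(urban_mask[0])
    out = []
    for r0 in range(0, n_rows, factor):
        row = []
        for c0 in range(0, n_cols, factor):
            block = [
                urban_mask[r][c]
                for r in range(r0, min(r0 + factor, n_rows))
                for c in range(c0, min(c0 + factor, n_cols))
            ]
            row.append(100.0 * sum(block) / len(block))
        out.append(row)
    return out

def urban_population(urban_mask, population):
    """Keep the population of urban cells and set non-urban cells to zero."""
    return [
        [u * p for u, p in zip(mask_row, pop_row)]
        for mask_row, pop_row in zip(urban_mask, population)
    ]

# Toy 4 x 4 binary urban mask and a uniform population raster (hypothetical values).
mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
pop = [[5.0] * 4 for _ in range(4)]

pct = percent_urban(mask, 2)
urban_pop = urban_population(mask, pop)
```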

Code availability

The code used to generate all gridded datasets and to aggregate them at the municipal, departmental, and national levels is freely available for download from GitHub at https://github.com/asiraj-nd/zika-colombia[54]. This code uses the R programming language[42] and Python version 2.7.10. Further explanation of the code is provided in a readme file in the GitHub repository[54].

Data Records

All output datasets described in this article (Data Citation 1) are publicly and freely available through Dryad Digital Repository. The datasets stored in the datadryad.org Repository represent the ones produced at the time of writing, and will be preserved in their published form. Datasets of interest can be obtained by downloading the corresponding zipped archive files (Table 2 (available online only)).

Technical Validation

Most datasets obtained from other sources have already been validated by independent studies[30,38,39,48-53]. We therefore limited our validation to the interpolated climate model outputs developed here, comparing spatial interpolation results with data from the 30 meteorological stations across Colombia. These comparisons were made for the two modeling approaches and for different combinations of covariates for each outcome: mean temperature, maximum temperature, minimum temperature, precipitation, and relative humidity. We used three metrics to compare model performance: mean absolute error (MAE), coefficient of variation (CV), and Pearson’s correlation coefficient (COR). MAE is the mean absolute difference between predictions and observations over n data points:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|$$

where $\hat{y}_i$ is the prediction and $y_i$ the observation at data point i. We also used the relative MAE of two models, which is the ratio of their MAEs. A relative MAE of m for model A relative to model B, with m less than 1, indicates that predictions from model A were 100(1 - m)% closer to the observed values than those from model B. The CV evaluates the extent to which large values are dispersed relative to their mean value. It is the ratio of the root mean square error (RMSE) to the mean of observed values:

$$\mathrm{CV} = \frac{\mathrm{RMSE}}{\bar{y}}, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2}$$

where $\bar{y}$ is the mean of the observed values. Results of our comparison are described in Table 3. Overall, the ordinary kriging approach had higher accuracy for temperature (mean, maximum, and minimum) and relative humidity based on all three metrics. Model results also revealed that using other covariates, such as altitude and secondary climate data, improved interpolation results for temperature and relative humidity.
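For readers who wish to reproduce these metrics, a minimal Python sketch is given below. It follows the verbal definitions of MAE, relative MAE, CV (RMSE divided by the mean of observations), and Pearson's correlation coefficient; the function names and example values are hypothetical and do not come from the authors' validation data.

```python
from math import sqrt

def mae(pred, obs):
    """Mean absolute difference between predictions and observations."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)

def relative_mae(pred_a, pred_b, obs):
    """Ratio of the MAE of model A to the MAE of model B."""
    return mae(pred_a, obs) / mae(pred_b, obs)

def cv(pred, obs):
    """Root mean square error divided by the mean of the observed values."""
    rmse = sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))
    return rmse / (sum(obs) / len(obs))

def pearson(pred, obs):
    """Pearson's correlation coefficient between predictions and observations."""
    n = len(obs)
    mean_p, mean_o = sum(pred) / n, sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / sqrt(var_p * var_o)

# Hypothetical station observations and predictions from two models (degrees C).
obs = [20.0, 22.0, 24.0, 26.0]
pred_a = [21.0, 21.0, 25.0, 25.0]
pred_b = [22.0, 20.0, 26.0, 24.0]

mae_a = mae(pred_a, obs)
rel = relative_mae(pred_a, pred_b, obs)
cv_a = cv(pred_a, obs)
cor_a = pearson(pred_a, obs)
```

In this toy example the relative MAE of model A to model B is 0.5, meaning that predictions from model A are 50% closer to the observations than those from model B.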
Table 3

Comparisons of model validation results for mean temperature, minimum temperature, maximum temperature and relative humidity based on leave-one-out approach.

| Response variable | Spatial interpolation method | Fixed factors used | MAE | CV | COR |
|---|---|---|---|---|---|
| Mean temperature | Thin Plate Spline | None | 3.85 | 0.21 | 0.38 |
| | | Altitude | 1.67 | 0.09 | 0.90 |
| | | Altitude, distance to ocean | 1.75 | 0.1 | 0.88 |
| | | Altitude, CPC temp | 3.58 | 0.19 | 0.66 |
| | | Altitude, Worldclim temp | 1.21 | 0.07 | 0.95 |
| | Ordinary kriging | Altitude, Worldclim temp, CPC temp | 1.23 | 0.07 | 0.94 |
| | | None | 3.45 | 0.21 | 0.43 |
| | | Altitude, Worldclim temp | 1.09 | 0.06 | 0.96 |
| | | Altitude, Worldclim temp, CPC temp | 1.13 | 0.06 | 0.95 |
| Minimum temperature | Ordinary kriging | Altitude, Worldclim temp | 1.26 | 0.08 | 0.95 |
| | | Altitude, interpolated mean temp. | 1.13 | 0.07 | 0.96 |
| | | Altitude, Worldclim temp, interpolated mean temp. | 1.46 | 0.1 | 0.93 |
| Maximum temperature | Ordinary kriging | Altitude, Worldclim temp | 1.54 | 0.07 | 0.92 |
| | | Altitude, interpolated mean temp. | 2.02 | 0.1 | 0.85 |
| | | Altitude, Worldclim temp, interpolated mean temp. | 2.03 | 0.1 | 0.85 |
| Relative humidity (a) | Ordinary kriging | Altitude, Worldclim temp | 5.49 | 0.3 | 0.86 |
| | | Altitude, interpolated mean temp. | 1.40 | 0.1 | 0.92 |
| | | Altitude, Worldclim temp, interpolated mean temp. | 1.46 | 0.1 | 0.91 |

Larger MAE and CV values indicate worse fits, while larger COR values indicate better fit.

(a) Derived using Equation 3.

Usage Notes

This compilation of datasets can facilitate a variety of studies relevant to vector-borne disease epidemiology in Colombia. The archive provides ready-to-use data both in raster format at a resolution of approximately 5 km x 5 km and at the municipal, departmental, and national administrative scales. These datasets have several limitations. First, the 30 meteorological stations used in generating climate surfaces are sparsely and unevenly distributed over Colombia, leading to uncertainty in the outputs. Moreover, some of the original gridded data we obtained had differing resolutions, including 0.1 arc-degrees (GPM), 0.5 arc-degrees (CPC), and 1 arc-degree (G-Econ). This meant that resampling these products (GPM, CPC, G-Econ) to the common grid yielded crude estimates, each based on an average value over a large swath of grid cells. Further, unlike all other products we used, which were non-projected geographic WGS1984 raster files, the Terra and Aqua MODIS NDVI products were in sinusoidal projection, causing some distortion when they were re-projected to match the population layers used in weighting. In addition to spatial discrepancies, we also had to overcome the relatively poor temporal resolution of the Terra and Aqua MODIS NDVI products (which come at 16-day intervals) by linearly interpolating between consecutive data points to fill in the 15 days in between, before aggregating the results at weekly time steps. Furthermore, daily satellite-based rainfall data from NOAA define a day as running from 12:00 to 12:00, which could cause slight inconsistencies even though the data were ultimately aggregated at weekly time steps. Other limitations include the modifiable areal unit problem, which arises from disparities in the arbitrary sizes and borders of the administrative units and may bias aggregations based on these borders.
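The temporal gap-filling of the 16-day NDVI composites can be illustrated with a short Python sketch: values are linearly interpolated to daily resolution between consecutive composite dates and then averaged over complete 7-day windows. The function names, the day indexing, the use of a weekly mean, and the NDVI values are assumptions for illustration and may differ from the authors' implementation.

```python
def interpolate_daily(days, values):
    """Linearly interpolate between consecutive composite dates to get one value per day."""
    daily = []
    pairs = list(zip(days, values))
    for (d0, v0), (d1, v1) in zip(pairs, pairs[1:]):
        for d in range(d0, d1):
            daily.append(v0 + (v1 - v0) * (d - d0) / (d1 - d0))
    daily.append(values[-1])  # include the final composite date itself
    return daily

def weekly_means(daily):
    """Average daily values over consecutive, complete 7-day windows."""
    return [sum(daily[i:i + 7]) / 7 for i in range(0, len(daily) - 6, 7)]

# Three hypothetical NDVI composites, 16 days apart (day offsets from the first composite).
composite_days = [0, 16, 32]
composite_ndvi = [0.20, 0.52, 0.20]

daily_ndvi = interpolate_daily(composite_days, composite_ndvi)
weekly_ndvi = weekly_means(daily_ndvi)
```

Interpolating to daily values first lets the weekly windows start on any day, independent of the composite dates.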

Additional information

How to cite this article: Siraj, A. S. et al. Spatiotemporal incidence of Zika and associated environmental drivers for the 2015-2016 epidemic in Colombia. Sci. Data 5:180073 doi: 10.1038/sdata.2018.73 (2018). Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.