Alexander C Keyel, Oliver Elison Timm, P Bryon Backenson, Catharine Prussing, Sarah Quinones, Kathleen A McDonough, Mathias Vuille, Jan E Conn, Philip M Armstrong, Theodore G Andreadis, Laura D Kramer.
Abstract
West Nile virus (WNV; Flaviviridae: Flavivirus) is a widely distributed arthropod-borne virus that has negatively affected human health and animal populations. WNV infection rates in mosquitoes and the number of human cases have been shown to correlate with climate. However, previous studies have been conducted at a variety of spatial and temporal scales, and the scale-dependence of these relationships has been understudied. We tested the hypothesis that climate variables are important for understanding these relationships at all spatial scales. We analyzed the influence of climate on WNV infection rates in mosquitoes and on the number of human cases in New York and Connecticut using Random Forests, a machine learning technique. During model development, 66 climate-related variables based on temperature, precipitation and soil moisture were tested for predictive skill. We also included 20–21 non-climatic variables to account for known environmental effects (e.g., land cover and human population) and surveillance-related information (e.g., relative mosquito abundance), and to assess the potential explanatory power of other relevant factors (e.g., presence of wastewater treatment plants). Random forest models were used to identify the most important climate variables for explaining spatiotemporal variation in mosquito infection rates (abbreviated as MLE). The cross-validation results support our hypothesis that climate variables improve the predictive skill for MLE at the county and trap scales and for human cases at the county scale. Of the climate-related variables, mean minimum temperature for July–September was selected in all analyses, and soil moisture was selected for the county-scale mosquito analysis. The models showed predictive skill but still over- and underestimated WNV MLE and the number of human cases. Models at fine spatial scales had lower absolute errors but greater errors relative to the mean infection rates.
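The abstract abbreviates the mosquito infection rate as MLE, the maximum-likelihood estimate of the infection rate from pooled mosquito samples. As a rough illustration only, the sketch below implements the textbook pooled-testing estimator, assuming a perfect assay and that a pool tests positive whenever at least one mosquito in it is infected. It is not necessarily the estimator or software the authors used, and `pooled_mle` is a hypothetical name.

```python
def pooled_mle(pool_sizes, positive):
    """Maximum-likelihood infection rate (proportion, 0-1) from pooled tests.

    pool_sizes -- number of mosquitoes in each pool
    positive   -- True if the corresponding pool tested WNV-positive
    Assumes a perfect test and independent infection of individual mosquitoes.
    """
    n_pos = sum(1 for y in positive if y)
    if n_pos == 0:
        return 0.0  # no positive pools: the MLE is zero
    if n_pos == len(positive):
        raise ValueError("MLE is undefined when every pool is positive")

    def score(p):
        # Derivative of the log-likelihood
        #   sum_i y_i * log(1 - (1-p)^m_i) + (1 - y_i) * m_i * log(1 - p)
        total = 0.0
        for m, y in zip(pool_sizes, positive):
            if y:
                total += m * (1.0 - p) ** (m - 1) / (1.0 - (1.0 - p) ** m)
            else:
                total -= m / (1.0 - p)
        return total

    # The log-likelihood is concave in p, so the score decreases
    # monotonically and has a single root in (0, 1): find it by bisection.
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical example: 50 pools of 50 mosquitoes, 5 of them positive.
rate = pooled_mle([50] * 50, [True] * 5 + [False] * 45)
print(f"{1000 * rate:.2f} infected mosquitoes per 1000")
```

When all pools have the same size m and x of n pools are positive, this reduces to the closed form p = 1 − (1 − x/n)^(1/m), which makes a convenient check on the iterative solution.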
Year: 2019 PMID: 31158250 PMCID: PMC6546252 DOI: 10.1371/journal.pone.0217854
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
A summary of literature that includes Connecticut or New York as part of the study area.
Studies varied in their choice of dependent variable (De). Independent variables were classified as surveillance (Su), climate (Cl), land cover (La), sociological (So), host-related (Ho), or other (Ot); values in these columns give the number of variables of each type used in the study.
| Study | Spatial Extent | Spatial Resolution | Temporal Extent | Temporal Resolution | De | Su | Cl | La | So | Ho | Ot |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Allan et al. 2009 [ | USA | County | 2002–2004 | Annual | | 0 | 0 | 0 | 1 | 2 | 0 |
| Andreadis et al. 2004 [ | CT | Point | 1999–2003 | Annual | | 1 | 0 | 0 | 0 | 0 | 0 |
| Andreadis et al. 2004 [ | CT | Point | 1999–2003 | Annual | | 0 | 0 | 0 | 1 | 0 | 0 |
| Bowden et al. 2011 [ | USA | County | 2002–2008 | 7-year period | | 0 | 0 | 14 | 0 | 0 | 0 |
| Brown et al. 2008a [ | New Haven, CT | Point | 2004 | Annual | | 0 | 0 | 2 | 0 | 0 | 0 |
| Brown et al. 2008b [ | CT, DE, MA, MD, NJ, NY, PA, RI | County | 1999–2006 | Annual | | 0 | 0 | 2 | 1 | 0 | 1 |
| Brownstein et al. 2002 [ | NY (7 counties) | Point | 1999 | Annual | | 0 | 0 | 1 | 0 | 0 | 0 |
| DeFelice et al. 2017 [ | Suffolk, NY | County | 2001–2014 | Weekly | | 2 | 0 | 0 | 0 | 0 | 0 |
| DeFelice et al. 2017 [ | Suffolk, NY | County | 2001–2014 | Weekly | | 2 | 0 | 0 | 0 | 0 | 0 |
| DeFelice et al. 2018 [ | USA (12 counties) | County | 2001–2016 | Weekly | | 2 | 6 | 0 | 0 | 0 | 0 |
| DeFelice et al. 2018 [ | USA (12 counties) | County | 2001–2016 | Weekly | | 2 | 6 | 0 | 0 | 0 | 0 |
| Diuk-Wasser et al. 2006 [ | Fairfield, CT | Point | 2001–2003 | 3-year period | | 0 | 0 | 97 | 1 | 0 | 0 |
| Gates and Boston 2009 [ | USA | County | 2004–2006 | 3-year period | | 0 | 0 | 1 | 1 | 0 | 0 |
| Gates and Boston 2009 [ | USA | County | 2004–2006 | 3-year period | | 0 | 0 | 1 | 0 | 0 | 1 |
| Hahn et al. 2015 [ | USA | County | 2004–2012 | Annual | | 0 | 10 | 0 | 0 | 0 | 0 |
| Keyel et al. (this study) | NY, CT | County, Point | 2000–2015 | Annual | | 5 | 66 | 4 | 2 | 7 | 2 |
| Keyel et al. (this study) | NY, CT | County, Point | 2000–2015 | Annual | | 6 | 66 | 4 | 2 | 7 | 2 |
| Landesman et al. 2007 [ | USA | County | 2002–2004 | Annual; Monthly | | 0 | 6 | 0 | 0 | 0 | 0 |
| Little et al. 2016 [ | Suffolk, NY | 13 × 13 km cells | 2001–2015 | Monthly | | 0 | 48 | 0 | 0 | 0 | 0 |
| Liu et al. 2009 [ | CT | Township | 2000–2005 | Daily | | 5 | 3 | 6 | 1 | 0 | 0 |
| Manore et al. 2014 [ | USA | County | 2005–2011 | Annual | | 2 | 96 | 1 | 6 | 27 | 0 |
| Myer et al. 2017 [ | Suffolk, NY | Point | 2008–2014 | Weekly | | 0 | 6 | 37 | 1 | 0 | 0 |
| Myer and Johnston 2019 [ | Nassau, NY | Point | 2001–2015 | Weekly | | 1 | 6 | 16 | 4 | 0 | 2 |
| Paull et al. 2017 [ | USA | State | 1999–2009 | Annual | | 1 | 4 | 0 | 0 | 0 | 0 |
| Rochlin et al. 2008 [ | Suffolk, NY | Point | 2000–2004 | Annual | | 0 | 0 | 10 | 1 | 0 | 1 |
| Rochlin et al. 2008 [ | Suffolk, NY | Point | 2000–2004 | Annual | | 0 | 0 | 10 | 1 | 0 | 1 |
| Rochlin et al. 2009 [ | Suffolk, NY | Point | 1999–2006 | Annual | | 0 | 0 | 3 | 0 | 0 | 2 |
| Rochlin et al. 2011 [ | Suffolk, NY | Point | 2001–2004 | 4-year period | | 8 | 0 | 30 | 13 | 0 | 5 |
| Shaman et al. 2011 [ | Suffolk, NY | 13 × 13 km cells | 2001–2009 | Annual | | 0 | 60 | 0 | 0 | 0 | 0 |
| Tonjes 2008 [ | Suffolk, NY | Zip Codes | 2000–2004 | Annual | | 2 | 0 | 0 | 1 | 0 | 0 |
| Trawinski and MacKay 2008 [ | Erie, NY | Point | 2001–2005 | Weekly | | 0 | 33 | 0 | 0 | 0 | 0 |
| Trawinski and MacKay 2010 [ | Amherst, Erie, NY | Point | Not reported | 2–5 weeks | | 0 | 12 | 66 | 27 | 0 | 51 |
| Walsh 2012 [ | NY | County | 2000–2010 | Annual | | 1 | 2 | 1 | 0 | 0 | 0 |
| Young et al. 2013 [ | USA | County | 2003–2008 | 6-year period | | 0 | 30 | 17 | 0 | 0 | 3 |
1 USA: United States of America; CT: Connecticut; DE: Delaware; MA: Massachusetts; MD: Maryland; NJ: New Jersey; NY: New York State; PA: Pennsylvania; RI: Rhode Island.
2 Assumed. Whether years were pooled or analyzed individually was not clear from the methods section.
3 De: Dependent variables: E equine cases; H Human cases; H z-score deviation from mean number of human cases; H Human per-capita incidence; Hni Human West Nile neuroinvasive disease cases only; H human cases present or absent; M percent of mosquito pools testing positive; M Mosquito abundance; M Mosquito infection rate; M The proportion of mosquitoes belonging to a particular species; M Presence/absence of WNV in mosquito pools.
4 Su: Surveillance variables such as number of dead birds, WNV positive birds, Human WN in previous years, Human infection rate, human immunity (estimated), mosquito infection rates from previous timepoints, mosquito abundance, absence of mosquito surveillance, site classification based on previous WNV infection rates (high, medium, low), number of complaints about mosquitoes, number of known larval sites, WNV positive mosquito pools, distance to nearest complaint, distance to nearest known larval site, distance to nearest WNV positive bird, distance to nearest WNV positive mosquito pool.
5 Cl: Climate and hydrological variables such as temperature, precipitation, growing degree days, and anomalies for each of these variables. Often calculated as minimum, mean, maximum, or cumulative values for different time periods (e.g., month, season, year).
6 Temperature and rainfall values were discussed, but not statistically related to the WNV results.
7 La: Land cover variables, such as the percent or proportion of different land cover types within buffer distances or administrative units, or the distance to land cover features. Soil drainage characteristics were also included here, as were the Normalized Difference Vegetation Index (NDVI), the Disease Water Stress Index (DWSI), and the middle infrared band.
8 So: Sociological variables such as age (median), education, employment (percent), household income (median), housing age, human population (density), human population (total), race, senior households (count, >65), septic systems (count), vacant housing (percent), urban or rural (categorical).
9 Ho: Host variables such as avian abundance (e.g., by order or species), avian diversity, and community competence.
10 Ot: Other variables, such as aspect, catch basin area, catch basin count, county area, elevation, equine density, flood zone, flood zone (distance to nearest), road length, road polygons (index of fragmentation), slope, wastewater treatment plants (distance from, count per administrative unit), year.
Fig 1. Observed mosquito infection rate (MLE) vs. predicted MLE from the WNV model using the entire data set.
Background colors correspond to a classification of model predictions based on an MLE threshold of 5 infected mosquitoes per 1000 [22]. Green corresponds to a correct prediction of high WNV MLE (27 records, 12.4%), and blue corresponds to a correct prediction of low WNV MLE (157 records, 72.0%). Yellow corresponds to an error where the model predicts a high MLE that was not observed (14 records, 6.4%), whereas orange corresponds to an error where the model predicts a low MLE but the observed MLE was high (20 records, 9.2%). Future models should aim to improve the model's sensitivity (0.57), although the specificity (0.92) is also of concern. Note that even quite accurate predictions can be misclassified if they lie near the classification threshold.
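The four colored regions in Fig 1 form a confusion matrix, and the reported sensitivity and specificity follow directly from its counts. The minimal sketch below, with hypothetical function names, reproduces the Fig 1 values (and, with counts 65, 704, 38, and 75, the Fig 2 values). It assumes a record counts as "high" when its MLE exceeds 5 per 1000; how values exactly at the threshold were handled is an assumption.

```python
def confusion_counts(observed, predicted, is_high):
    """Tally (TP, FP, TN, FN) for a binary 'high' rule applied to both series."""
    tp = fp = tn = fn = 0
    for obs, pred in zip(observed, predicted):
        obs_high, pred_high = is_high(obs), is_high(pred)
        if obs_high and pred_high:
            tp += 1          # green: correctly predicted high
        elif pred_high:
            fp += 1          # yellow: predicted high, observed low
        elif obs_high:
            fn += 1          # orange: predicted low, observed high
        else:
            tn += 1          # blue: correctly predicted low
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    return tp / (tp + fn)    # share of truly high records predicted high


def specificity(tn, fp):
    return tn / (tn + fp)    # share of truly low records predicted low


# Counts reported in the Fig 1 caption.
tp, tn, fp, fn = 27, 157, 14, 20
print(round(sensitivity(tp, fn), 2), round(specificity(tn, fp), 2))  # 0.57 0.92
```

With raw data, `confusion_counts(obs, pred, lambda v: v > 5)` would apply the mosquito rule, and `lambda v: v >= 1` the one-case threshold used for human cases in Fig 2.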
Model fit results for the calculated mosquito infection rates (per 1000).
Climate indicates whether climate variables were included, N indicates the sample size, and WNV+ N indicates the number of samples estimated to have WNV present. RMSE, Median RMSE, Max Error, Scaled RMSE, R², and the two correlation coefficients (r) are defined in Methods: model fit statistics.
| Scale | Climate | N | WNV+ N | RMSE | Median RMSE | Scaled RMSE | Max Error | R² | r | r |
|---|---|---|---|---|---|---|---|---|---|---|
| County | YES | 218 | 132 | 2.8 | 2.3 | 1.04 | 19.8 | 0.45 | 0.69 | 0.68 |
| County | NO | 218 | 132 | 3.3 | 2.7 | 1.21 | 23.3 | 0.26 | 0.67 | 0.51 |
| Trap | YES | 3156 | 955 | 8.2 | 7.7 | 2.34 | 87.4 | 0.16 | 0.45 | 0.40 |
| Trap subset | YES | 2596 | 395 | 1.2 | 1.0 | 2.13 | 15.4 | 0.53 | 0.59 | 0.73 |
| Trap subset | NO | 2596 | 395 | 1.4 | 1.2 | 2.49 | 10.9 | 0.36 | 0.57 | 0.61 |
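The model-fit tables report several goodness-of-fit statistics whose exact definitions are given in Methods, which is not part of this excerpt. The sketch below shows one plausible set of definitions, all of them assumptions: scaled RMSE as RMSE divided by the mean observed value (consistent with the abstract's note that fine-scale models had larger errors relative to mean infection rates), max error as the largest absolute error, R² computed against the mean of the validation data (so it can be negative), and r as the Pearson correlation. Median RMSE is omitted because its definition is not given here; `fit_stats` is a hypothetical name.

```python
import math


def fit_stats(observed, predicted):
    """Goodness-of-fit statistics for one validation set (assumed definitions)."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)
    rmse = math.sqrt(sse / n)

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    sst = sum((o - mean_obs) ** 2 for o in observed)

    # Pearson correlation between observed and predicted values
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    r = cov / math.sqrt(sst * var_pred)

    return {
        "rmse": rmse,
        "scaled_rmse": rmse / mean_obs,        # error relative to the mean rate
        "max_error": max(abs(e) for e in errors),
        "r2": 1.0 - sse / sst,                 # baseline: predicting the mean
        "r": r,
    }


stats = fit_stats([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(stats)
```

Under these definitions, R² and r² need not coincide, since R² also penalizes systematic bias in the predictions.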
Fig 2. Observed number of human cases of WNV across all of New York and Connecticut vs. predicted number of human cases of WNV from the model using the entire data set.
Background colors correspond to a classification of model predictions based on a threshold of 1 human case. Green corresponds to a correct prediction of one or more human cases (65 records, 7.4%), blue corresponds to a correct prediction of no human cases (704 records, 79.8%). Yellow corresponds to an error where the model predicts at least one human case, but none were observed (38 records, 4.3%), whereas orange corresponds to an error where the model predicts no human cases, but at least one was observed (75 records, 8.5%). Sensitivity (0.46) and specificity (0.95) were similar to the estimates for county-scale mosquito infection rates.
Model fits for the human data at the county-scale.
The All Counties analysis was based on 882 county × year records, while the subset contained 206 county × year records for which surveillance data were available. RMSE, Median RMSE, Scaled RMSE, Max Error, R², and the two correlation coefficients (r) are defined in Methods: model fit statistics.
| Scale | Climate | RMSE | Median RMSE | Scaled RMSE | Max Error | R² | r | r |
|---|---|---|---|---|---|---|---|---|
| All Counties | YES | 2.0 | 1.6 | 2.45 | 30.2 | 0.72 | 0.39 | 0.86 |
| All Counties | NO | 2.5 | 1.7 | 2.93 | 37.6 | 0.60 | 0.45 | 0.79 |
| Subset | YES | 3.7 | 1.7 | 1.80 | 42.3 | 0.52 | 0.70 | 0.72 |
| Subset | NO | 4.0 | 2.1 | 1.94 | 44.1 | 0.45 | 0.66 | 0.68 |
| Subset -S | YES | 3.9 | 1.7 | 1.88 | 43.1 | 0.48 | 0.70 | 0.69 |
1 Subset -S: subset analysis without surveillance variables.
Fig 3. Predicted and observed WNV mosquito infection rates (MLE, a, c) and human cases (b, d) for 2012, a particularly widespread WNV year.
MLE thresholds from Little et al. [22]: blue corresponds to MLE < 1 infected mosquito per 1000, yellow to MLE 1–5 per 1000, and red to MLE > 5 per 1000. White indicates excluded counties for which we did not have mosquito surveillance data. For human cases (b, d), blue indicates no human cases, yellow indicates 1–5 cases, and red indicates more than 5 cases.
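The map classes in Fig 3 amount to a simple binning rule. The sketch below (hypothetical function name) assumes the boundaries are inclusive exactly as written in the caption: below 1 is blue, 1 to 5 is yellow, above 5 is red. For observed human case counts, which are integers, the same rule gives 0 cases as blue, 1–5 as yellow, and more than 5 as red.

```python
def risk_class(value, low=1.0, high=5.0):
    """Map an MLE (per 1000) or a human case count to a Fig 3 color class."""
    if value < low:
        return "blue"      # MLE < 1 per 1000, or no human cases
    if value <= high:
        return "yellow"    # MLE 1-5 per 1000, or 1-5 human cases
    return "red"           # MLE > 5 per 1000, or more than 5 human cases


print([risk_class(v) for v in (0.4, 1.0, 5.0, 7.2)])
# ['blue', 'yellow', 'yellow', 'red']
```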
Fig 4. Predicted (unfilled ≤ 5, filled > 5) and observed (black ≤ 5, red > 5) WNV infection rates (infected mosquitoes per 1000) for each county and year.
Missing points correspond to missing years for those counties. Point sizes are scaled relative to the observed infection rate.
Fig 5. Predicted (open < 1, filled ≥ 1) and observed (black < 1, red ≥ 1) numbers of human WNV cases for each county and year.
Data were not available for New York for 2000–2002, hence the missing points.
Climate variables identified as important by the random forest model when the model with all covariates was run, and when a model with only climate covariates was run (only C).
Model results are presented for human cases in those counties where mosquito surveillance data were collected, and for mosquito infection rates (MLE) at both the county and trap scales. Values in the table indicate the amount of unique variation explained by the variable using variance partitioning, while a blank indicates that the variable was not included in the final predictive model.
| Variables appearing in a final model | Human subset | Human subset only C | MLE county | MLE county only C | MLE trap | MLE trap only C |
|---|---|---|---|---|---|---|
| Mean minimum temperature (Jan–Mar) | <0.001 | 0.01 | 0.001 | |||
| Mean minimum temperature (Apr–Jun) | 0.01 | |||||
| Mean minimum temperature (Jul–Sep) | 0.004 | 0.03 | 0.01 | 0.02 | 0.01 | 0.01 |
| Mean minimum temperature anomaly (Oct–Dec) | 0.01 | |||||
| Mean maximum temperature (Jan–Mar) | 0.003 | 0.001 | 0.01 | |||
| Mean maximum temperature (Jul–Sep) | 0.001 | |||||
| Mean maximum temperature anomaly (Jan–Mar) | 0.002 | |||||
| Minimum observed temperature (Jul–Sep) | 0.01 | |||||
| Minimum observed temperature (Oct–Dec) | 0.01 | |||||
| Maximum observed temperature (Apr–Jun) | 0.02 | 0.01 | 0.03 | |||
| Maximum observed temperature (Oct–Dec) | 0.02 | |||||
| Maximum observed temperature anomaly (Apr–Jun) | 0.02 | 0.01 | ||||
| Daily temperature range (Jan–Mar) | 0.003 | |||||
| Daily temperature range (Jul–Sep) | 0.01 | |||||
| Daily temperature range (Oct–Dec) | 0.004 | |||||
| Daily temperature range anomaly (Jan–Mar) | 0.02 | |||||
| Soil moisture anomaly (Apr–Jun) | 0.03 | 0.04 | ||||
| Soil moisture anomaly (Jul–Sep) | 0.04 | 0.05 | ||||
| Soil moisture anomaly (Oct–Dec) | 0.01 | |||||
| Growing degree days (Jul–Sep) | 0.002 | |||||
| Growing degree days anomaly (Apr–Jun) | 0.01 | |||||
| Growing degree days anomaly (Oct–Dec) | 0.01 | 0.02 | 0.01 |
a We hypothesize that the contribution of this variable is related to the end of the mosquito season in October.
Non-climatic variables identified as important by the random forest model when the model with all covariates was run, and when a model without climate covariates was run (-C).
Model results are presented for human cases in those counties where mosquito surveillance data were collected, and for mosquito infection rates (MLE) at both the county and trap scales. Values in the table indicate the amount of unique variation explained by the variable using variance partitioning.
| Variables appearing in a final model | Human subset | Human subset -C | MLE County | MLE County -C | MLE Trap | MLE Trap -C |
|---|---|---|---|---|---|---|
| Mosquito infection rate | 0.02 | 0.05 | NA | NA | NA | NA |
| Mosquito abundance index | 0.15 | 0.28 | ||||
| Mosquito density index | 0.06 | |||||
| Trap bait type | ||||||
| Total population | 0.01 | 0.02 | 0.003 | <0.001 | ||
| Population density | 0.02 | 0.002 | 0.01 | |||
| Percent urban | 0.02 | <0.001 | 0.01 | |||
| Percent forest | 0.002 | 0.03 | <0.001 | |||
| Percent open | 0.02 | 0.001 | ||||
| Percent wetland | 0.002 | |||||
| American Robin Index | 0.06 | 0.02 | 0.01 | 0.02 | ||
| American Crow Index | 0.002 | 0.01 | 0.03 |
a We note that the sum of the values in this column exceeds the total amount of variation explained by the model (0.36). This occurred because the model without one or more of these variables explained less variation than simply using the mean value of the validation data set, and therefore had a negative R² as the baseline instead of zero (see the Coefficient of determination section in Methods for how R² is calculated).
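The footnote above implies that the unique variation attributed to a variable is the drop in R² when that variable is removed from the model, with R² measured against the mean of the validation data. Under that assumption, and with hypothetical names, the minimal sketch below shows how a reduced model that does worse than the mean (negative R²) can yield a unique contribution larger than the full model's R².

```python
def r2_against_mean(observed, predicted):
    """Coefficient of determination with the validation-set mean as baseline."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst      # negative if worse than predicting the mean


def unique_variation(r2_full, r2_reduced):
    """Variation uniquely explained by the removed variable(s)."""
    return r2_full - r2_reduced


observed = [1.0, 2.0, 3.0, 4.0]
r2_full = r2_against_mean(observed, [1.0, 2.0, 3.0, 5.0])     # 0.8
r2_reduced = r2_against_mean(observed, [4.0, 3.0, 2.0, 1.0])  # -3.0
print(unique_variation(r2_full, r2_reduced))  # about 3.8, far above r2_full
```

Because each variable's contribution is measured against its own reduced model, these contributions need not sum to the full model's R².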
Fig 6. Predicted mosquito infection rates (MLE, contours) increase non-linearly with 2nd quarter soil moisture anomaly and 3rd quarter temperature.
Cool years with normal soil moisture were associated with the lowest MLE. Warm years showed high MLE regardless of soil moisture, and dry years often (but not always) had high MLE. Observations (red circles, size proportional to MLE) broadly support these predictions. Contour lines correspond to predictions made for a regular grid of 100 points covering the range of both variables. Predictions hold all other covariates at their mean values (see Tables 4 and 5 for included variables and S1 File for mean values), whereas observed values correspond to the exact variable combinations and therefore may not match the predictions exactly. Observations are plotted as a general guide to identify major patterns and highlight particular exceptions.
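The contour plots in Figs 6–9 are built by predicting over a regular grid of two covariates while holding every other covariate at its mean. The sketch below assumes the 100 grid points form a 10 × 10 lattice and that `predict` is any function mapping a dictionary of covariate values to a prediction (for example, a wrapper around a fitted random forest); the lattice shape, the covariate names, and the function names are all assumptions.

```python
def linspace(start, stop, n):
    """n evenly spaced values from start to stop inclusive."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def grid_predictions(predict, records, x_name, y_name, n_per_axis=10):
    """Predict on an n x n grid over two covariates, others held at their mean.

    records -- list of dicts, one per observation, keyed by covariate name
    predict -- callable taking one dict of covariates and returning a prediction
    """
    names = records[0].keys()
    means = {k: sum(r[k] for r in records) / len(records) for k in names}
    xs = [r[x_name] for r in records]
    ys = [r[y_name] for r in records]

    results = []
    for x in linspace(min(xs), max(xs), n_per_axis):
        for y in linspace(min(ys), max(ys), n_per_axis):
            row = dict(means)                # every covariate at its mean...
            row[x_name], row[y_name] = x, y  # ...except the two plotted ones
            results.append((x, y, predict(row)))
    return results


# Toy example with a hypothetical additive model.
data = [{"temp": 18.0, "soil": -1.0, "pop": 10.0},
        {"temp": 22.0, "soil": 1.0, "pop": 30.0}]
grid = grid_predictions(lambda r: r["temp"] + r["soil"] + r["pop"], data, "temp", "soil")
print(len(grid), grid[0])  # 100 (18.0, -1.0, 37.0)
```

Holding the remaining covariates at their means gives a single, easily drawn surface, but, as the captions note, observations taken at other covariate values will not always sit on it.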
Fig 7. Warm winter temperatures and dry summers were associated with the highest risk of mosquito infection with WNV.
Observations (red circles, size proportional to infection rate) broadly support these predictions. Contour lines correspond to predictions made for a regular grid of 100 points covering the range of both variables. Predictions hold all other covariates at their mean values (see Tables 4 and 5 for included variables and S1 File for mean values), whereas observed values correspond to the exact variable combinations and therefore may not match the predictions exactly. Observations are plotted as a general guide to identify major patterns and highlight particular exceptions.
Fig 8. For individual trap sites, the risk of WNV increased with increasing mosquito abundance, especially when the mean minimum temperature in the 3rd quarter was high.
Contour lines correspond to predictions from a regular grid of 100 points (with the other covariates fixed at their mean values). Observed infection rates (red circles, size proportional to infection rate) are plotted for comparison, but note that they reflect the exact covariate combinations rather than the mean conditions used for the predictions.
Fig 9. The risk of human WNV cases was highest for locations with large total populations, especially in years with a warm summer.
Data correspond to the human subset analysis. Contour lines correspond to predictions from a regular grid of 100 points (with the other covariates fixed at their mean values). Observed infection rates (red circles, size proportional to infection rate) are plotted for comparison, but note that they reflect the exact covariate combinations rather than the mean conditions used for the predictions.
Fig 10. Mean minimum temperature (a), soil moisture anomaly (b), mosquito infection rate (c), and human case counts (d) by year for five example counties.
Fig 11. Total human population of the study region.
Note that the five counties of New York City have been merged into a single entity. Data taken from the US Census [100,111].