Alexander C Keyel1,2, Morgan E Gorris3, Ilia Rochlin4, Johnny A Uelmen5, Luis F Chaves6, Gabriel L Hamer7, Imelda K Moise8, Marta Shocket9, A Marm Kilpatrick10, Nicholas B DeFelice11,12,13, Justin K Davis14, Eliza Little15, Patrick Irwin16,17, Andrew J Tyre18, Kelly Helm Smith19, Chris L Fredregill20, Oliver Elison Timm2, Karen M Holcomb21, Michael C Wimberly14, Matthew J Ward22,23, Christopher M Barker21, Charlotte G Rhodes7, Rebecca L Smith5.
Abstract
West Nile virus (WNV) is a globally distributed mosquito-borne virus of great public health concern. The number of WNV human cases and mosquito infection patterns vary in space and time. Many statistical models have been developed to understand and predict WNV geographic and temporal dynamics. However, these modeling efforts have been disjointed with little model comparison and inconsistent validation. In this paper, we describe a framework to unify and standardize WNV modeling efforts nationwide. WNV risk, detection, or warning models for this review were solicited from active research groups working in different regions of the United States. A total of 13 models were selected and described. The spatial and temporal scales of each model were compared to guide the timing and the locations for mosquito and virus surveillance, to support mosquito vector control decisions, and to assist in conducting public health outreach campaigns at multiple scales of decision-making. Our overarching goal is to bridge the existing gap between model development, which is usually conducted as an academic exercise, and practical model applications, which occur at state, tribal, local, or territorial public health and mosquito control agency levels. The proposed model assessment and comparison framework helps clarify the value of individual models for decision-making and identifies the appropriate temporal and spatial scope of each model. This qualitative evaluation clearly identifies gaps in linking models to applied decisions and sets the stage for a quantitative comparison of models. Specifically, whereas many coarse-grained models (county resolution or greater) have been developed, the greatest need is for fine-grained, short-term planning models (m–km, days–weeks), which remain scarce. We further recommend quantifying the value of information for each decision to identify decisions that would benefit most from model input.
Year: 2021 PMID: 34499656 PMCID: PMC8428767 DOI: 10.1371/journal.pntd.0009653
Source DB: PubMed Journal: PLoS Negl Trop Dis ISSN: 1935-2727
Fig 1. Map of specific locations where WNV models included in this comparison have been applied.
Some models (Spatial Risk Random Forest, not shown) have been applied across the entire US. Green corresponds to analyses with state extents, blue to county extents, and pink to subcounty extents. State outlines are from Natural Earth (https://www.naturalearthdata.com/downloads/50m-cultural-vectors/). City of Chicago boundary is publicly available from the City of Chicago (https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-City/ewy2-6yfk), and county boundaries and the outline for Coachella Valley were derived from US Census tract boundaries (https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html) dissolved to provide a single outline using the Dissolve algorithm in QGIS (https://qgis.org/en/site/). WNV, West Nile virus.
Model overview: A comparison of model class, spatial, temporal resolution, software implementation, and code availability.
| Model | Class of Model | Spatial Resolution | Temporal Resolution | Software | Code Available |
|---|---|---|---|---|---|
| A. Historical Null | Spatial patterns | Flexible | Annual | R | |
| B. Spatial Risk Random Forest | Spatial patterns | County | Mean from 2005–2018 | R | No |
| C. Temperature-trait-based Relative R0 Model | Spatial patterns | Flexible | Flexible | R | |
| D. Spatial Risk High Resolution BRT Model | Spatial patterns | 300 × 300 m | Mean from 2004–2017 | R | No (in progress) |
| E. RF1 | Early warning | Flexible | Annual | R | |
| F. NE_WNV County-years | Early warning | County | Annual | R, mgcv | |
| G. GLMER Ensemble | Early warning | 13 × 13 km grid | Monthly | R | No |
| H. Harris County | Early warning | Whole Harris County | Monthly | R | Based on code for SAR models presented by [ |
| I. ArboMAP | Early detection | Typically county | Weekly | R | |
| J. Chicago Ultra-Fine Scale | Early detection | 1-km hexagon | 1 week (epi weeks 18–38) | JMP, SAS | No (in progress) |
| K. Model-EAKF System | Early detection | Mosquito abatement district | Weekly | Matlab/R | Available upon request |
| L. Temperature-forced Model-EAKF System | Early detection | Mosquito abatement district | Weekly | Matlab/R | Available upon request |
| M. California Risk Assessment | Early detection | Flexible | Flexible | VectorSurv Gateway (website) | Available upon request |
1 Spatial patterns: models with predictions that do not vary by year. Early warning: models that do not include current-year surveillance data, may include current-year climate/weather data, and have a model lead time on the order of days to months. Early detection: models that include current-year surveillance data, may include other data streams, and have a lead time on the order of days to months.
2 The model itself is flexible with respect to temporal resolution. The GitHub implementation was designed for annual temporal resolution.
3 The website is implemented in Javascript, PHP, SQL, Google Maps API, and Mapbox API.
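The three model classes defined in footnote 1 can be read as a simple decision rule. The sketch below is only an illustration of that rule; the function name and its two boolean arguments are hypothetical and do not come from the paper.

```python
def classify_model(varies_by_year: bool, uses_current_year_surveillance: bool) -> str:
    """Assign a model class following the footnote definitions.

    Both arguments are illustrative simplifications (assumptions):
    - varies_by_year: do the model's predictions change from year to year?
    - uses_current_year_surveillance: does the model ingest current-year
      surveillance data (e.g., mosquito testing or human cases)?
    """
    if not varies_by_year:
        # Predictions that do not vary by year
        return "Spatial patterns"
    if uses_current_year_surveillance:
        # Current-year surveillance data are included
        return "Early detection"
    # Varies by year but without current-year surveillance data
    return "Early warning"


# Example: a model driven by current-year weather only
print(classify_model(varies_by_year=True, uses_current_year_surveillance=False))
```

Lead time (days to months) is part of the definitions as well but is omitted here because it does not distinguish early warning from early detection.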
Model inputs.
| Model | Human Data | Mosquito Surveillance | Climate/Weather | Land-cover | Sociological | Other |
|---|---|---|---|---|---|---|
| A. Historical Null | Y1 | Y1 | N | N | N | N |
| B. Spatial Risk Random Forest | Y | N | Y | N | N | N |
| C. Temperature-trait-based Relative R0 Model | N | N | Y | N | N | N |
| D. Spatial Risk High Resolution BRT | Y | N | Y | Y | N | Y |
| E. RF1 | Y | Y | Y | Y | Y | Y |
| F. NE_WNV County-years | Y | N | Y | N | N | N |
| G. GLMER Ensemble | N | Y | Y | N | N | N |
| H. Harris County | N | Y | Y | Y | N | Y |
| I. ArboMAP | Y | Y | Y | N | N | N |
| J. Chicago Ultra-Fine Scale | Y | Y | Y | Y | Y | Y |
| K. Model-EAKF System | Y | Y | N | N | N | Y |
| L. Temperature-forced Model-EAKF System | Y | Y | Y | N | N | Y |
| M. California Risk Assessment | Y | Y | Y | Y | N | Y |
1 For the Null model, only human data are required to predict human cases, and only mosquito surveillance data are required to predict mosquito infection rates. Mosquito surveillance is not used to predict human cases or vice versa in this model.
Fig 2. Examples of key model outputs.
(A) A summary of key outputs for 1 year. (B) Cumulative human cases (annual human cases), (C) Culex mosquito abundance per trap night, (D) vector index (Culex abundance times infection rate by week), and (E) MIR per 1,000 mosquitoes. Peak MLE/IR is the mosquito infection rate in the peak week, Peak week for MLE/IR is the week in which the peak is reached, while Seasonal MLE/MIR is the infection rate over the season when the mosquitoes are active (using either MLEs or MIRs). Culex, Culex abundance; IR, mosquito infection rate, either as MIR or MLE; HC, human cases; MIR, minimum infection rate; MLE, maximum likelihood estimate of infection rate; VI, vector index.
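As a rough illustration of two of the quantities named in this caption, the sketch below computes a minimum infection rate (MIR) per 1,000 mosquitoes and a weekly vector index. It assumes the common definition of MIR as positive pools divided by mosquitoes tested, and takes the vector index as abundance per trap night multiplied by the infection rate expressed as a proportion; the function names and example numbers are hypothetical.

```python
def minimum_infection_rate(positive_pools: int, mosquitoes_tested: int) -> float:
    """MIR per 1,000 mosquitoes (assumed definition: positive pools / tested * 1000)."""
    if mosquitoes_tested <= 0:
        raise ValueError("mosquitoes_tested must be positive")
    return 1000.0 * positive_pools / mosquitoes_tested


def vector_index(abundance_per_trap_night: float, infection_rate_per_1000: float) -> float:
    """Vector index: abundance times infection rate (rate converted to a proportion)."""
    return abundance_per_trap_night * infection_rate_per_1000 / 1000.0


# Hypothetical week: 3 positive pools among 1,500 mosquitoes tested,
# with 40 Culex collected per trap night.
mir = minimum_infection_rate(positive_pools=3, mosquitoes_tested=1500)
vi = vector_index(abundance_per_trap_night=40.0, infection_rate_per_1000=mir)
print(mir, vi)
```

The same `vector_index` function would accept an MLE-based infection rate per 1,000 in place of the MIR.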
Model output/predictions.
Prediction targets included human case counts and mosquito infection rates, expressed as either MIRs or MLEs. Probabilistic models are those that generate predictions as probability distributions rather than single mean values. The additional prediction targets column indicates whether the model generates additional outputs not otherwise included in the table.
| Model | Annual Human Cases | Seasonal MLE/MIR | Peak MLE/MIR | Peak Week for MLE/MIR | Vector Index (weekly) | Probabilistic? | Additional Prediction Targets |
|---|---|---|---|---|---|---|---|
| A. Historical Null | Y | Y | N | N | N | Y | N |
| B. Spatial Risk Random Forest | N | N | N | N | N | N | Y |
| C. Temperature-trait-based Relative R0 Model | N | N | N | N | N | Y | Y |
| D. Spatial Risk High Resolution BRT | N | N | N | N | N | Y | Y |
| E. RF1 | Y | Y | N | N | N | Y | N |
| F. NE_WNV County-years | Y | N | N | N | N | Y | Y |
| G. GLMER Ensemble | N | Y | N | N | N | N | N |
| H. Harris County | N | Y | Y | Y | N | N | Y |
| I. ArboMAP | Y | Y | N | N | N | Y | Y |
| J. Chicago Ultra-Fine Scale | Y | N | N | N | N | Y | Y |
| K. Model-EAKF System | Y | Y | Y | Y | N7 | Y | Y |
| L. Temperature-forced Model-EAKF System | Y | Y | Y | Y | N | Y | Y |
| M. California Risk Assessment | N | N | N | N | N | N | Y |
1 Peak week could also be calculated for human cases but typically is not done in practice; therefore, this output was omitted from the table.
2 The model has been upgraded since the initial publication to support probabilistic outputs.
3 Counties with cases.
4 In principle, the model could produce probabilistic output.
5 The model uses vector index as a predictor but does not predict values for vector index.
6 Cases and MIR could in theory be inverted, but the model has not been tested for that.
7 The model can be parameterized with either MLE infection rates or vector index, but empirically, the results from the vector index parameterization were not as strong, and, therefore, the final model is based on MLE.
8 +/−25% of peak week, human cases, total infections over the season; +/−25% or 1 human case.
9 Virus transmission risk to humans.
BRT, Boosted Regression Trees; EAKF, Ensemble-adjustment Kalman Filter; MIR, minimum infection rate; MLE, maximum likelihood estimate of infection rate.
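The threshold in footnote 8 ("+/−25% or 1 human case") can be turned into a simple hit/miss score. The sketch below is one possible reading, not the authors' implementation: it assumes a prediction counts as correct when it falls within 25% of the observed value or within 1 unit of it, whichever tolerance is larger. The function names and default tolerances are illustrative.

```python
from typing import Sequence


def within_threshold(predicted: float, observed: float,
                     rel_tol: float = 0.25, abs_tol: float = 1.0) -> bool:
    """True if the prediction is within rel_tol of the observation or within abs_tol.

    Assumption: the two tolerances are combined by taking the larger one.
    """
    return abs(predicted - observed) <= max(rel_tol * abs(observed), abs_tol)


def threshold_accuracy(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Fraction of predictions that satisfy the threshold criterion."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(within_threshold(p, o) for p, o in zip(predicted, observed))
    return hits / len(predicted)


# Hypothetical annual human case counts for four county-years
print(threshold_accuracy([11, 2, 14, 0], [10, 1, 10, 3]))
```

The absolute tolerance matters mostly for small counts, where a 25% band around 1 or 2 cases would otherwise be narrower than a single case.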
Model applications.
Only published model applications were included. Each line corresponds to a separate model test; therefore, some models appear more than once. References are listed for further details.
| Model | Study | Prediction Target | Sample Size | Spatial Domain | Time Domain | Testing Method | Metric Score |
|---|---|---|---|---|---|---|---|
| B. Spatial Risk Random Forest | [ | Mean annual incidence per 100,000 population | 43,512 county-years | Conterminous US (3,108 counties) | 2005–2018, averaged | Bootstrapping | |
| D. Spatial Risk High Resolution BRT | [ | Ranked relative risk (0–1) | 1,378 human cases | South Dakota | 2004–2017 | Out of sample data | |
| E. RF1 | [ | Annual human cases | 882 county-years | New York and Connecticut | 2000–2015 | LOYOCV | |
| E. RF1 | [ | Seasonal mosquito MLE | 218 county-years | New York and Connecticut | 2000–2015 | LOYOCV | |
| E. RF1 | [ | Seasonal mosquito MLE | 2,596 trap-years | New York and Connecticut by trap | 2000–2015 | LOYOCV | |
| F. NE_WNV County-years | [ | 2018 human cases | 1,472 county-years | Nebraska | 2002–2017 | Out of sample data | |
| F. NE_WNV County-years | [ | 2018 WNV positive counties | 1,472 county-years | Nebraska | 2002–2017 | Out of sample data | |
| G. GLMER Ensemble | [ | MLE mosquito infection rate | 225 grid-years | Suffolk County, New York | 2001–2015 | LOYOCV | |
| H. Harris County | [ | MLE mosquito infection rate (1-month lead) | 130,567 trap-nights | Harris County, Texas | 2002–2016 | Out of sample data | |
| H. Harris County | [ | Mosquito abundance (1-month lead) | 10,533,033 mosquitoes | Harris County, Texas | 2002–2016 | Out of sample data | |
| I. ArboMAP | [ | Positive county-weeks | Approximately 9,504 county-weeks (training) | South Dakota | 2004–2015 (training) 2016 (testing) | Out of sample data | |
| I. ArboMAP | [ | Positive county-weeks | Approximately 11,088 county-weeks | South Dakota | 2004–2017 | Fit to training data only | |
| J. Chicago Ultra-Fine Scale | [ | Human case probability (by hexagon) | 1,346,940 hexagon-weeks | Variable, up to 5,345 1-km hexagons | 2005–2016 | Fit to training data only | |
| K. Model-EAKF System | [ | Annual human cases; peak mosquito infection rates; peak timing of infectious mosquitoes; annual infectious mosquitoes | 21 county-years | 2 counties (Suffolk, New York and Cook, Illinois) | Weekly, Varied by location | Retrospective data assimilation | |
| K. Model-EAKF System | [ | Multiple | 110 outbreak-years | 12 counties | Weekly, Varied by location | Retrospective data assimilation | |
| K. Model-EAKF System | [ | Multiple | 4 county-years | 4 counties | Weekly, 2017 | Real-time data assimilation | |
| L. Temperature-forced Model-EAKF System | [ | Multiple | 110 outbreak-years | 12 counties | Weekly, Varied by location | Retrospective data assimilation | |
| M. California Risk Assessment | [ | Historical outbreaks of western equine encephalomyelitis and St. Louis encephalitis as proxy for WNV | 14 agency-years | California | Half-months | Temporal correspondence | Early detection of arbovirus risk prior to outbreaks |
| M. California Risk Assessment | [ | Onset and peak of human cases by geographic region | 12 half-months in 3 regions | California | Half-months | Retrospective data assimilation | Early detection of WNV risk prior to onset and peak of human cases |
| M. California Risk Assessment | [ | Emergency planning threshold (risk ≥ 2.6) | 11,476 trap-nights | Los Angeles County, California | 2004–2010 | Retrospective data assimilation | |
1 LOYOCV: leave-one-year-out cross-validation; Out of sample data: accuracy based on data not used to develop the model; Fit to training data only: accuracy based on the same data used to develop the model; Retrospective data assimilation: finalized data until the time of forecast; Real-time data assimilation: data processed and available at the time of forecast.
2 R²: predictive R², i.e., an R² calculated on data outside the sample, R: Spearman correlation coefficient, AUC: area under the curve, Threshold-based accuracy: +/−25% of peak week, human cases, total infections over the season; +/−25% or 1 human case, RMSE: Root Mean Squared Error, CRPS: Continuous Ranked Probability Score.
3 Results for 2018 reported here, validation was also performed separately for 2012–2017, see [34] for details.
4 Three analyses presented: short-term: AUC = 0.856, annual made on July 5: AUC = 0.836, annual made on July 39: AUC = 0.855.
5 Restricted to July–September for each year.
6 Restricted to 21 epi weeks per year.
7 Varied by analysis and lead time.
8 Prediction targets: human cases in next 3 weeks; annual human cases; week with highest percentage of infectious mosquitoes; peak mosquito infection rate; annual infectious mosquitoes.
AUC, area under the curve; CRPS, Continuous Ranked Probability Score; LOYOCV: leave-one-year-out cross-validation; RMSE, Root Mean Squared Error; WNV, West Nile virus.
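Leave-one-year-out cross-validation (LOYOCV), as defined in the footnotes, holds out each year in turn, fits the model on the remaining years, and scores predictions for the held-out year. The sketch below is a generic illustration scored with RMSE. The record layout `(year, predictors, outcome)` and the `fit`/`predict` callables are assumptions for the example and do not reflect how any of the reviewed models is implemented.

```python
from math import sqrt
from statistics import mean


def loyocv_rmse(records, fit, predict):
    """Leave-one-year-out cross-validation scored with RMSE.

    records: list of (year, x, y) tuples (hypothetical layout)
    fit:     callable taking training records and returning a fitted model
    predict: callable taking (model, x) and returning a prediction
    """
    years = sorted({year for year, _, _ in records})
    squared_errors = []
    for held_out in years:
        train = [r for r in records if r[0] != held_out]
        test = [r for r in records if r[0] == held_out]
        model = fit(train)  # the model never sees the held-out year
        for _, x, y in test:
            squared_errors.append((predict(model, x) - y) ** 2)
    return sqrt(mean(squared_errors))


# Toy baseline: predict the mean outcome of the training years.
def fit_mean(train):
    return mean(y for _, _, y in train)


def predict_mean(model, x):
    return model


toy = [(2001, None, 2), (2002, None, 4), (2003, None, 6)]
print(loyocv_rmse(toy, fit_mean, predict_mean))
```

Holding out whole years, rather than random rows, keeps observations from the same season out of both the training and test sets at once.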
Fig 3. Generalized overview of major factors, tools, and decisions utilized by mosquito control agencies.
This figure is based on 4 representative mosquito abatement districts: 2 in Chicago (IL), 1 in Slidell (LA), and 1 in Houston (TX). Management practices may differ from program to program, but agencies face similar challenges and make similar decisions across varying spatial (local to district-wide) and temporal (days to multiple months) scales.
Fig 5. A summary of the spatial and temporal resolution for the 41 models reviewed in [17] that are not included in Fig 4.
Numbers indicate the number of models at that spatial and temporal scale.
List of common decisions made regarding a public health and vector control response to WNV.
Letters correspond to models in Tables 1–4 and indicate models with an appropriate spatial or temporal resolution to inform the decision. Note that this pertains to the scale on which predictions are made and provides no information on the accuracy of the model predictions. As such, models with appropriate scale, but insufficient accuracy, would not be useful in an operational context.
| Public health decisions | Potentially applicable models: When (timing) | Potentially applicable models: Where (area) |
|---|---|---|
| Mosquito and WNV surveillance (trap sites) | | |
| Mosquito and WNV surveillance (county/district thresholds) | | |
| Public health and outreach | | |
| Larviciding | | |
| Truck-based adulticiding | | |
| Aerial adulticiding | | |
WNV, West Nile virus.
Fig 4. The 13 models reviewed in this paper arranged by spatial and temporal resolution.
Rectangles with decreasing shades of gray indicate less coverage, identifying potential knowledge gaps. These gaps may guide future model development or require additional data collection, as many models are at the county-annual scale due to data availability.
Classification of temporal and spatial resolutions relevant to vector control and public health decision-making.
| Classification Term | Spatial or Temporal | Resolution |
|---|---|---|
| Long-term planning | Temporal | Years to decades |
| Medium-term planning | Temporal | Months to a year |
| Short-term planning | Temporal | Days to weeks |
| Coarse grain | Spatial | Multiple/large management districts (e.g., county or above) |
| Medium grain | Spatial | Single management district or county subdivision |
| Fine grain | Spatial | Meters to km, within a management district |
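The classification above could be encoded as a small lookup so that model resolutions can be compared with decision scales programmatically. The sketch below is illustrative only: the table gives qualitative ranges, so the numeric cutoffs (30 and 365 days) and the spatial unit names are assumptions introduced for the example.

```python
# Spatial classes keyed by a hypothetical unit label, following the table's examples.
SPATIAL_CLASS = {
    "state": "Coarse grain",
    "county": "Coarse grain",
    "management district": "Medium grain",
    "county subdivision": "Medium grain",
    "grid cell (m to km)": "Fine grain",
}


def temporal_class(lead_time_days: float) -> str:
    """Map a planning horizon in days to the table's temporal classes.

    Assumed cutoffs: under 30 days = days to weeks; 30 to 365 days = months
    to a year; longer = years to decades.
    """
    if lead_time_days < 0:
        raise ValueError("lead_time_days must be non-negative")
    if lead_time_days < 30:
        return "Short-term planning"
    if lead_time_days <= 365:
        return "Medium-term planning"
    return "Long-term planning"


# Example: a weekly forecast for a 1-km grid
print(temporal_class(7), "/", SPATIAL_CLASS["grid cell (m to km)"])
```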
Fig 6. Examples of the 3 spatial scales described in Table 6 for Long Island, NY.
(a) Coarse-grain: county, (b) medium-grain: county subdivision, and (c) fine-grain: 30 × 30 m resolution for vegetation types [40], with the NY county outlines in gray for context. County outlines and county subdivisions from the 2017 US Census (https://www.census.gov/geo/maps-data/data/tiger-line.html).