Alyson Lorenz, Radhika Dhingra, Howard H Chang, Donal Bisanzio, Yang Liu, Justin V Remais.
Abstract
Extrapolating landscape regression models for use in assessing vector-borne disease risk and other applications requires thoughtful evaluation of fundamental model choice issues. To examine the implications of such choices, an analysis was conducted to explore the extent to which disparate landscape models agree in their epidemiological and entomological risk predictions when extrapolated to new regions. Agreement between six landscape models drawn from the literature was examined by comparing their predicted county-level distributions of either Lyme disease or its vector, Ixodes scapularis, using Spearman rank correlation. AUC analyses and multinomial logistic regression were used to assess the ability of these extrapolated models to predict observed national data. Three models, based on measures of vegetation, habitat patch characteristics, and herbaceous landcover, emerged as effective predictors of the observed disease and vector distributions. An ensemble of these three models improved precision and predictive ability over the individual models. A priori assessment of qualitative model characteristics effectively identified the models that subsequently emerged as better predictors in the quantitative analysis. Both a methodology for quantitative model comparison and a checklist for the qualitative assessment of candidate models for extrapolation are provided; both tools aim to improve collaboration between those who produce models and those interested in applying them to new areas and research questions.
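The abstract's multinomial logistic regression (MLR) models each risk category against a reference category. A minimal sketch of the baseline-category logit, assuming category 0 as the reference and a single predictor x (for example, one model's extrapolated prediction for a county); the function name and coefficients are hypothetical, not the published estimates:

```python
import math

def multinomial_probs(x, intercepts, slopes):
    """Category probabilities under a baseline-category logit model.

    Category 0 is the reference; intercepts[k-1] and slopes[k-1] belong to
    category k, so log(p_k / p_0) = intercepts[k-1] + slopes[k-1] * x.
    """
    etas = [0.0] + [a + b * x for a, b in zip(intercepts, slopes)]
    m = max(etas)                           # subtract the max for numerical stability
    exps = [math.exp(e - m) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical coefficients for a 4-level risk outcome (0 = minimal ... 3 = high).
intercepts = [-0.5, -1.0, -2.0]
slopes = [0.2, 0.4, math.log(2.0)]

p_at_0 = multinomial_probs(0.0, intercepts, slopes)
p_at_1 = multinomial_probs(1.0, intercepts, slopes)

# A one-unit increase in x multiplies the odds of category 3 vs. 0 by exp(slope).
or_3_vs_0 = (p_at_1[3] / p_at_1[0]) / (p_at_0[3] / p_at_0[0])
print(round(sum(p_at_0), 10), round(or_3_vs_0, 6))  # 1.0 2.0
```

Because log(p_k / p_0) is linear in x, exp(slope) is the odds ratio for category k versus the reference, which is the quantity reported in MLR odds ratio tables.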
Year: 2014 PMID: 25072884 PMCID: PMC4114569 DOI: 10.1371/journal.pone.0103163
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Habitat models included in inter-model comparison.
| Model Name and Reference | Location | Outcome | Predictors | Model Type | Model | Parameter Estimates |
|---|---|---|---|---|---|---|
| Tick Patch (Brownstein et al., 2005) | Southern Connecticut (12 towns) | Tick density | Patch size, Patch isolation | Poisson | | |
| Lyme Patch (Brownstein et al., 2005) | Connecticut (all counties) | Human incidence | Patch size, Patch isolation | Poisson | | |
| Development (Glass et al., 1995) | Baltimore County, MD | Odds of Lyme disease | Extent of development | Binomial | | |
| Coniferous (Glass et al., 1995) | Baltimore County, MD | Odds of Lyme disease | Soil that supports coniferous habitat | Binomial | | |
| Herbaceous (Glass et al., 1995) | Baltimore County, MD | Odds of Lyme disease | Soil that supports herbaceous habitat | Binomial | | |
| NDVI (Ogden et al., 2006) | Quebec, Canada | Number of ticks submitted | Average NDVI, Population | Negative binomial | | |
*Intercepts with no parameter estimate provided were not included.
Model employed a logistic link function.
TD = tick density; PS = patch size; PI = patch isolation; HI = human incidence; OL = odds of Lyme disease; HD = highly developed land; GCH = soil classified as fair-good coniferous-supporting habitat; PHH = soil classified as poor-fair herbaceous-supporting habitat; TC = tick count; POP = population; NDVI = normalized difference vegetation index.
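The models above are Poisson and negative binomial models (log link) and binomial models (the footnote notes a logistic link). Extrapolating such a model to a new county means plugging that county's predictor values into the fitted linear predictor and applying the inverse link. The sketch below uses hypothetical coefficients and county values, not the published estimates, and illustrates why leaving out an unreported intercept does not change how counties are ranked: under a log link the intercept multiplies every prediction by the same factor exp(b0).

```python
import math

def linear_predictor(intercept, coefs, values):
    """eta = b0 + sum(b_i * x_i) for one county's predictor values."""
    return intercept + sum(b * x for b, x in zip(coefs, values))

def predict_log_link(intercept, coefs, values):
    """Expected count under a Poisson or negative binomial model (log link)."""
    return math.exp(linear_predictor(intercept, coefs, values))

def predict_logit_link(intercept, coefs, values):
    """Probability under a binomial model with a logistic (logit) link."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(intercept, coefs, values)))

# Hypothetical coefficients (e.g. patch size, patch isolation) and county values;
# these are NOT the published parameter estimates.
coefs = [0.02, -0.01]
counties = {"A": (120.0, 30.0), "B": (80.0, 10.0), "C": (200.0, 60.0)}

without_b0 = {k: predict_log_link(0.0, coefs, v) for k, v in counties.items()}
with_b0 = {k: predict_log_link(1.5, coefs, v) for k, v in counties.items()}

# exp(b0) rescales every prediction by the same factor, so the ranking is unchanged.
print(sorted(without_b0, key=without_b0.get))  # ['B', 'A', 'C']
print(sorted(with_b0, key=with_b0.get))        # ['B', 'A', 'C']
```

The same holds for the logit link: adding a constant to eta shifts every county's log-odds equally, and the inverse logit is monotone, so rank-based comparisons are unaffected.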
Inter-model comparison considerations and questions applied to Lyme disease incidence and tick abundance/presence models and supporting references.
| Criteria | Questions applied to inter-model comparison of Lyme disease/Tick presence models | NDVI Model | Herbaceous Model | Coniferous Model | Development Model | Tick Patch Model | Lyme Patch Model |
|---|---|---|---|---|---|---|---|
| Modeling technique | Is the model type appropriate/acceptable to the present research question? Is the model parsimonious? Is the choice of model supported by previous findings or theories? | | | | | | |
| Suitable species under study | Does the model describe the same or a closely related species? | | | | | | |
| Model evaluation method | In the original analysis, was data reserved for validation? | N | | | | N | N |
| Model evaluation method | If so, was reserved data available in different locations, time periods, etc.? | N | | | | N | N |
| Direct predictors | Are the model's predictor variables directly involved, in the new analysis, in some biological process of the organism of interest? | N | N | N | N | N | N |
| Indirect predictor | If not a direct predictor, is the model's indirect variable associated with a variable that directly impacts the biological processes of the organism of interest, under the new research question? | | | | | | |
| Spatial correlation | Is spatial correlation between the predictor and outcome variables considered? | | N | N | N | | |
| Data type & appropriate categorization | Is the data type (continuous, categorical, nominal, ordinal, etc.) appropriate for the candidate model and the proposed new question? | | N | N | N | | |
| Relevance of outcome | Is the model outcome variable able to answer the new question? | | N | N | N | | N |
| Grain | Is the geographic scale of the model appropriate to the question asked in the original analysis? | | N | N | N | N | N |
| Grain | Is the grain appropriate to the question being asked in the new analysis? | | N | N | N | N | N |
| Time | Are the time scales and time periods of the original analysis appropriate to the new research question? | | | | | | |
| Methods description | Is the methods description of the original analysis complete? Is it reproducible? | | | | N | | |
| Modeling tools | Are the programs and versions used in the original analysis available for application to the new research area or question? | | | | | N | N |
| Data quality | What was the quality of data used in the original analysis? Was the original model fit to high quality data of sufficient quantity? | | | | | | |
| Data quality | What is the quality of data available for the present analysis? | | | | | | |
| Availability of data for time period under study | For the present question, is the correct data available in an appropriate data type format? | | | | | | |
| Availability of data for time period under study | For extrapolation, is data available at a grain similar to the model's original analysis? | | | | | | |
| Location | Is the geographic location of the original analysis at the given time period similar to or the same as the location of the present analysis? | N | N | N | N | N | N |
| Presence of variables in new areas | Do the model's variables have values in the new location? | | | | N | N | N |
| Direct predictors | In the new location, are the model's predictor variables directly involved in some biological process of the organism of interest? | N | N | N | N | N | N |
| Indirect predictors | If not, is the model's indirect variable strongly associated with a variable that in the new location impacts the biological processes of the organism of interest? | | | | | | |
| Numerical range of variables in new areas | Is the numerical range of variables in new areas within the range in which the model was fit? | N | N/A | N/A | N/A | N | N |
| Numerical range of variables in new areas | Is there sufficient variety in values of the variables in the new location to create useful variation in the outcome variable? | N | N | N | N | N | N |
| Relevant categorization | Is the categorization of the available data relevant and sufficiently descriptive in the new location? | N/A | Y | Y | N | N/A | N/A |
| Stationarity | Does the model demonstrate stationarity? | ND | ND | ND | ND | ND | ND |
| Spatial correlation | Are potential changes in spatial correlation accounted for? | | | | | | |
| Extent | Does the model cover a sufficient geographic extent in comparison to the extent of the extrapolation? | N | N | N | N | N | N |
| Availability of data across extrapolation zone | Is the correct data available for the new location in an appropriate data type format? | | | | | | |
| Availability of data across extrapolation zone | For extrapolation, is data available at a grain similar to the model's original analysis? | | N | N | N | N | N |
| Quality of data across extrapolation zone | What is the quality of data available for the present analysis? | | | | | | |
Y = Yes, N = No, ND = Not determined, N/A = Not applicable.
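Two of the checklist questions, whether new-area values fall within the range in which the model was fit and whether they vary enough to produce useful variation in the outcome, can be screened mechanically before extrapolating. A minimal sketch, assuming a single continuous predictor; the function name and the NDVI values are hypothetical:

```python
from statistics import pstdev

def range_check(fit_values, new_values):
    """Compare new-area predictor values with the range used to fit the model."""
    lo, hi = min(fit_values), max(fit_values)
    outside = [v for v in new_values if v < lo or v > hi]
    return {
        "fitted_range": (lo, hi),
        "outside": outside,                                # values requiring extrapolation
        "share_inside": 1 - len(outside) / len(new_values),
        "new_spread": pstdev(new_values),                  # crude check for useful variation
    }

# Hypothetical NDVI values: fitting region vs. extrapolation region.
fit_ndvi = [0.42, 0.55, 0.61, 0.70]
new_ndvi = [0.30, 0.50, 0.65, 0.78, 0.60]
report = range_check(fit_ndvi, new_ndvi)
print(report["outside"])  # [0.3, 0.78]
```

A large share of values outside the fitted range corresponds to an "N" for the first question; a near-zero spread suggests an "N" for the second.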
Figure 1. Spatial extent of the Eastern United States considered in the analysis, based on the 2000 U.S. Census (24.3°N to 45.9°N latitude, 93.0°W to 66.5°W longitude).
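The extent in the Figure 1 caption can be applied as a simple latitude/longitude filter. This is only a sketch, assuming counties are selected by centroid (the caption gives the bounding box, not the selection rule); the county identifiers and coordinates are hypothetical. West longitudes are written as negative numbers.

```python
# Extent from the Figure 1 caption; west longitudes are negative.
LAT_MIN, LAT_MAX = 24.3, 45.9
LON_MIN, LON_MAX = -93.0, -66.5

def in_study_extent(lat, lon):
    """True if a point falls inside the latitude/longitude box."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX

# Hypothetical county centroids as (latitude, longitude).
centroids = {
    "county_a": (41.3, -72.9),
    "county_b": (39.4, -76.6),
    "county_c": (39.7, -105.0),   # west of 93.0°W, so excluded
    "county_d": (47.5, -70.0),    # north of 45.9°N, so excluded
}
kept = [name for name, (lat, lon) in centroids.items() if in_study_extent(lat, lon)]
print(kept)  # ['county_a', 'county_b']
```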
County and state level Spearman correlation coefficients (ρ) for pair-wise model comparisons overall and for geographic sub-analyses.
| Model Pair | County Level | State Level | Northeast | Midwest | South | High Elevation | Low Elevation | Coastal | Inland | Urban | Rural |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | −1.00 | −1.00 | −1.00 | −1.00 | −1.00 | −1.00 | −1.00 | −1.00 | −1.00 | −1.00 | −1.00 |
| | −0.13 | | −0.20 | −0.33 | −0.04 | −0.34 | | | −0.15 | −0.16 | −0.16 |
| | −0.13 | | −0.26 | −0.03 | −0.20 | −0.08 | −0.17 | −0.11 | −0.06 | −0.03 | −0.18 |
| | | | | −0.07 | | | | | | | |
| | −0.14 | −0.37 | −0.14 | | −0.19 | 0.01 | −0.17 | −0.10 | −0.11 | −0.08 | −0.19 |
| | | −0.34 | | | | | −0.08 | −0.05 | | | |
| | | 0.00 | | | | | | | | | |
| | −0.23 | −0.39 | −0.32 | 0.07 | −0.34 | −0.10 | −0.33 | −0.29 | −0.16 | −0.09 | −0.29 |
| | | | | −0.05 | | −0.01 | | | | | |
| | | | | −0.05 | 0.10 | | | | | | |
| | −0.01 | −0.01 | −0.39 | | −0.01 | | −0.10 | | | −0.07 | |
| | −0.56 | −0.21 | −0.36 | −0.59 | −0.60 | −0.60 | −0.51 | −0.40 | −0.71 | −0.38 | −0.49 |
| | −0.59 | −0.48 | −0.90 | −0.45 | −0.62 | −0.54 | −0.64 | −0.71 | −0.45 | −0.67 | −0.55 |
| | | | | | | −0.05 | | | −0.03 | | |
| | −0.21 | −0.53 | −0.17 | −0.25 | −0.20 | −0.16 | −0.17 | −0.22 | −0.15 | −0.19 | −0.24 |
Bolded values indicate a positive association.
*Values are significantly different from 0 (p<0.05).
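Each coefficient in the table is a Spearman rank correlation between two models' county-level predictions, computed overall and within subsets of counties. A minimal pure-Python sketch follows (`scipy.stats.spearmanr` is the usual library route); it gives tied values their average rank, does not compute p-values, and uses hypothetical predictions and region labels. A ρ of −1.00 means two models rank counties in exactly opposite order.

```python
from collections import defaultdict

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

def spearman_by_group(x, y, groups):
    """rho within each subset of counties (e.g. region, elevation, urban/rural)."""
    buckets = defaultdict(lambda: ([], []))
    for a, b, g in zip(x, y, groups):
        buckets[g][0].append(a)
        buckets[g][1].append(b)
    return {g: spearman(xs, ys) for g, (xs, ys) in buckets.items()}

# Hypothetical predictions from three models for six counties.
model_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
model_b = [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
model_c = [1.0, 2.0, 3.0, 3.0, 2.0, 1.0]
region = ["NE", "NE", "NE", "S", "S", "S"]

print(spearman(model_a, model_b))                   # -1.0
print(spearman_by_group(model_a, model_c, region))  # {'NE': 1.0, 'S': -1.0}
```

The grouped call shows how two models can agree within one sub-area and disagree within another, which is what the geographic sub-analysis columns probe.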
AUC values from MLR analyses for predictive models using CDC data as gold standard.
| Observational Data Set/Dichotomization | Tick Patch (N = 1750) | Lyme Patch (N = 1750) | Development (N = 1814) | Coniferous (N = 1814) | Herbaceous (N = 1814) | NDVI (N = 1814) |
|---|---|---|---|---|---|---|
| Lyme disease risk | | | | | | |
| Minimal vs Low/Moderate/High | | 0.65 | 0.50 | 0.60 | 0.58 | 0.52 |
| Minimal/Low vs Moderate/High | 0.50 | | 0.65 | 0.65 | | |
| Minimal/Low/Moderate vs High | 0.55 | | 0.79 | 0.71 | 0.55 | |
| Minimal vs High | 0.44 | 0.50 | 0.78 | 0.75 | | |
| Minimal vs Moderate | | 0.62 | 0.52 | 0.64 | 0.52 | 0.61 |
| Minimal vs Low | | 0.67 | 0.46 | 0.57 | | 0.57 |
| Low vs High | 0.64 | | 0.80 | 0.68 | 0.50 | |
| Low vs Moderate | 0.56 | | 0.56 | 0.57 | 0.57 | |
| Moderate vs High | 0.59 | | 0.77 | 0.63 | | |
| Minimal vs Moderate/High | | 0.59 | 0.64 | 0.69 | | |
| Minimal/Low vs High | 0.54 | | 0.79 | 0.71 | | |
| Minimal vs Low/Moderate | | 0.66 | 0.47 | 0.58 | | 0.55 |
| Low vs Moderate/High | 0.60 | | 0.67 | 0.62 | 0.54 | |
| Minimal/Low vs Moderate | 0.47 | | 0.54 | 0.61 | 0.53 | |
| Low/Moderate vs High | 0.64 | | 0.80 | 0.67 | | |
| Tick presence | | | | | | |
| None vs Reported/Established | | 0.60 | 0.52 | 0.58 | | |
| None/Reported vs Established | | 0.54 | 0.59 | 0.64 | | |
| None vs Established | | 0.58 | 0.58 | 0.65 | | |
| None vs Reported | | 0.62 | | 0.53 | | 0.50 |
| Reported vs Established | 0.55 | | 0.60 | 0.62 | | |
Bolded AUC values indicate a positive association.
*AUC values are significant (p<0.05).
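Each AUC above scores how well a model separates counties once the observed CDC categories are split into two groups, either cumulatively (e.g. Minimal/Low vs Moderate/High) or pairwise (e.g. Minimal vs High, dropping the other categories). The table's AUCs come from MLR fits; the sketch below instead assumes a single score that should increase with risk and computes AUC in its rank (Mann-Whitney) form. The function names, categories, and scores are hypothetical.

```python
def dichotomize(categories, positive):
    """Code each county 1 if its observed category is in the 'positive' group, else 0."""
    return [1 if c in positive else 0 for c in categories]

def pairwise_subset(scores, categories, negative, positive):
    """Keep only counties in the two compared groups (e.g. Minimal vs High)."""
    kept = [(s, c) for s, c in zip(scores, categories) if c in negative or c in positive]
    return [s for s, _ in kept], [1 if c in positive else 0 for _, c in kept]

def auc(scores, labels):
    """ROC AUC as the probability that a randomly chosen positive county scores
    higher than a randomly chosen negative one (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical observed risk categories and model scores for six counties.
observed = ["Minimal", "Low", "Moderate", "High", "Minimal", "High"]
predicted = [0.10, 0.40, 0.35, 0.90, 0.20, 0.30]

# Cumulative split: Minimal/Low vs Moderate/High (all counties kept).
print(auc(predicted, dichotomize(observed, {"Moderate", "High"})))  # 7/9, about 0.78

# Pairwise split: Minimal vs High (Low and Moderate counties dropped).
s, y = pairwise_subset(predicted, observed, {"Minimal"}, {"High"})
print(auc(s, y))  # 1.0
```

An AUC of 0.50 means a model separates the two groups no better than chance.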
AUC values from MLR analyses for predictive models using CDC data as gold standard – ensemble models.
| Observational Data Set/Dichotomization | Ensemble Model 1: All Models (N = 1750) | Ensemble Model 2: “Top 3” Models (N = 1750) | Ensemble Model 3: Glass et al. (1995) Models (N = 1814) |
|---|---|---|---|
| Lyme disease risk | | | |
| N vs L/M/H | 0.54 | | 0.51 |
| N/L vs M/H | 0.59 | | 0.71 |
| N/L/M vs H | 0.67 | | 0.81 |
| N vs H | 0.69 | | 0.81 |
| Tick presence | | | |
| A vs R/E | 0.51 | | 0.53 |
| A/R vs E | 0.56 | | 0.59 |
| A vs E | 0.55 | | 0.58 |
Bolded AUC values indicate a positive association.
*AUC values are significant (p<0.05).
N = none/minimal; L = low; M = moderate; H = high; A = absent/none; R = reported; E = established.
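The section does not say how the ensembles combined their member models. One simple option, shown purely as an illustrative assumption rather than the authors' procedure, is rank averaging: each model's county predictions are converted to ranks so that a Poisson count, a binomial odds, and a negative binomial count can be averaged on a common scale. Model names and values are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their sorted positions."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]

def rank_average_ensemble(predictions):
    """Average each county's rank-scaled prediction across models.

    predictions: dict of model name -> list of predictions, all in the same
    county order. Ranks are divided by the number of counties so every model
    contributes a score in (0, 1] regardless of its original units.
    """
    n = len(next(iter(predictions.values())))
    scaled = [[r / n for r in average_ranks(p)] for p in predictions.values()]
    return [sum(col) / len(scaled) for col in zip(*scaled)]

# Hypothetical predictions on different scales for three counties.
preds = {
    "count_model": [0.2, 0.5, 0.9],      # e.g. expected tick count
    "odds_model": [10.0, 30.0, 20.0],    # e.g. odds of Lyme disease
    "index_model": [1.0, 2.0, 3.0],      # e.g. a vegetation-based score
}
print(rank_average_ensemble(preds))  # approximately [0.333, 0.778, 0.889]
```

Rank averaging assumes every member model points the same way (higher means more risk); a member that is negatively associated with the outcome would need its ranks reversed before averaging.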
Odds ratios in MLR for predictive models using CDC data as gold standard – original and ensemble models.
| Outcome° | Lyme disease risk (CDC): OR | Lyme disease risk (CDC): 95% CI | Lyme disease risk (CDC): AIC | Tick presence (CDC): OR | Tick presence (CDC): 95% CI | Tick presence (CDC): AIC |
|---|---|---|---|---|---|---|
| | | | 3761.9 | | | 3279.8 |
| | | (2.9, 5.3) | | | (1.6, 3) | |
| | | (1.2, 3.4) | | | (1.1, 2.1) | |
| | 0.9 | (0.5, 1.7) | | | | |
| | | | 3747.0 | | | 3274.5 |
| | 0.7 | (0.7, 0.8) | | 0.8 | (0.8, 0.9) | |
| | 0.8 | (0.7, 0.9) | | 0.9 | (0.8, 1.0) | |
| | 1.0 | (0.9, 1.2) | | | | |
| | | | 3927.9 | | | 3433.9 |
| | 0.2 | (<0.001, 269.8) | | 15.4 | (0.0, >1000) | |
| | <0.001 | (<0.001, 0.2) | | 0.0 | (0.0, 0.6) | |
| | <0.001 | (<0.001, <0.001) | | | | |
| | | | 3915.7 | | | 3402.9 |
| | 0.4 | (0.2, 0.6) | | 0.7 | (0.4, 1.3) | |
| | 0.2 | (0.1, 0.5) | | 0.2 | (0.1, 0.3) | |
| | 0.1 | (0.0, 0.1) | | | | |
| | | | 3933.5 | | | 3406.6 |
| | | (2.8, 8.2) | | 1.6 | (0.9, 2.9) | |
| | 1.4 | (0.5, 3.7) | | | (3.7, 13.2) | |
| | | (1.4, 11.6) | | | | |
| | | | 3901.8 | | | 3435.8 |
| | 0.9 | (0.9, 1.0) | | 1.0 | (0.9, 1.1) | |
| | 1.1 | (1.0, 1.2) | | | (1.0, 1.2) | |
| | | (1.4, 2.0) | | | | |
| | | | 3808.6 | | | 3293.8 |
| | 0.999 | (0.998, 1.000) | | 1.000 | (0.999, 1.002) | |
| | 0.999 | (0.997, 1.000) | | 0.998 | (0.997, 0.999) | |
| | 0.994 | (0.992, 0.996) | | | | |
| | | | 3776.1 | | | 3244.5 |
| | | (1.001, 1.002) | | | (1.001, 1.001) | |
| | | (1.001, 1.002) | | | (1.001, 1.002) | |
| | | (1.002, 1.003) | | | | |
| | | | 3794.9 | | | 3417.5 |
| | | (1.000, 1.001) | | 1.000 | (1.000, 1.001) | |
| | 0.998 | (0.998, 0.999) | | 0.999 | (0.998, 0.999) | |
| | 0.994 | (0.993, 0.995) | | | | |
AIC = Akaike information criterion, which balances goodness of fit against model complexity; lower values indicate a better-supported model.
°For Lyme Disease Risk, 0 = minimal/no risk, 1 = low risk/Lyme disease reported, 2 = medium risk, 3 = high risk. For Tick Presence, 0 = absent/none, 1 = reported, 2 = established.
N = 1750: Some counties had no deciduous forest; thus, patch size and patch isolation could not be calculated.
*Significant positive OR estimate: 95% CI excludes the null (1.0) and OR estimate is >1.0 (p<0.05).
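The odds ratios above are exponentiated MLR coefficients, each comparing one outcome category with the reference category 0 per unit increase in the predictor, and the footnote flags an estimate as significantly positive when its 95% CI lies entirely above 1.0. A minimal sketch of those calculations and of AIC, assuming Wald-type intervals (the section does not state how the intervals were computed) and a hypothetical coefficient, standard error, and log-likelihood:

```python
import math

def odds_ratio_wald(beta, se, z=1.96):
    """OR = exp(beta) with a Wald 95% CI of exp(beta - z*se) to exp(beta + z*se)."""
    return math.exp(beta), (math.exp(beta - z * se), math.exp(beta + z * se))

def significant_positive(odds_ratio, ci):
    """Footnote rule: OR > 1.0 and the 95% CI excludes the null value 1.0."""
    lower, _upper = ci
    return odds_ratio > 1.0 and lower > 1.0

def aic(log_likelihood, n_params):
    """AIC = 2k - 2 ln L; among models fit to the same data, lower is better."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical MLR output for one outcome category vs. the reference category 0.
or_, ci = odds_ratio_wald(beta=math.log(2.0), se=0.1)
print(round(or_, 3), tuple(round(b, 3) for b in ci))  # 2.0 (1.644, 2.433)
print(significant_positive(or_, ci))                  # True
print(aic(log_likelihood=-1000.0, n_params=4))        # 2008.0
```

Strictly speaking, AIC values are comparable only between models fit to the same observations, so models fit to N = 1750 and N = 1814 counties are not directly comparable on AIC.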