Caroline B Zeimes, Gert E Olsson, Clas Ahlm, Sophie O Vanwambeke.
Abstract
Because the distribution of zoonotic diseases in humans usually depends on the presence of more than one species, modelling it differs from modelling the distribution of a single species, even though the data are similar in nature. Three approaches can be used to model spatial distributions recorded as points: those based on presence/absence, presence/available or presence-only data. Here, we compared one or two of the existing methods for each of these approaches. Human cases of hantavirus infection in Sweden, reported by place of infection between 1991 and 1998, were used as a case study. Puumala virus (PUUV), the most common hantavirus in Europe, circulates among bank voles (Myodes glareolus). In northern Sweden, it causes nephropathia epidemica (NE) in humans, a mild form of hemorrhagic fever with renal syndrome.

Logistic binomial regression and boosted regression trees were used to model presence/absence data. Presence and available sites (sites where the disease may occur) were modelled using cross-validated logistic regression. Finally, the ecological niche model MaxEnt, which is based on presence-only data, was used.

In our study, logistic regression had the best predictive power, followed by boosted regression trees, MaxEnt and cross-validated logistic regression. Logistic regression was also the most statistically reliable method, but it requires absence data. The cross-validated method partly avoids the need for absence data but requires tedious calculations. MaxEnt accounts for non-linear responses, but its estimators can be complex. The advantages and disadvantages of each method are reviewed.
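The presence/absence approach can be illustrated with a minimal sketch of a logistic (binomial) regression. This is not the authors' software or data: it fits an intercept and slope by plain gradient ascent on the binomial log-likelihood, using simulated presences/absences for one hypothetical standardised covariate, and then computes the Akaike information criterion (AIC = 2k − 2 ln L), the criterion reported in the model table. All function names and simulation settings are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Inverse logit, split by sign to avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Maximise the binomial log-likelihood by batch gradient ascent.

    X: list of covariate lists; y: list of 0 (absence) / 1 (presence).
    Returns [intercept, b1, ..., bp].
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            eta = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
            resid = yi - sigmoid(eta)  # derivative of the log-likelihood w.r.t. eta
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def log_likelihood(beta, X, y):
    """Binomial log-likelihood of the fitted model."""
    ll = 0.0
    for xi, yi in zip(X, y):
        prob = sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
        ll += yi * math.log(prob) + (1 - yi) * math.log(1.0 - prob)
    return ll


# Simulated presence/absence data for one hypothetical, standardised covariate.
random.seed(1)
X = [[random.uniform(-2.0, 2.0)] for _ in range(200)]
y = [1 if random.random() < sigmoid(-0.5 + 2.0 * xi[0]) else 0 for xi in X]

beta = fit_logistic(X, y)
ll = log_likelihood(beta, X, y)
aic = 2 * len(beta) - 2 * ll  # AIC = 2k - 2 ln L, k = number of coefficients
print("coefficients:", beta, "AIC:", aic)
```

The fitted slope should land near the simulated value of 2.0. In practice a statistics package would fit the same model by iteratively reweighted least squares; gradient ascent is used here only to keep the sketch dependency-free.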
Year: 2012 PMID: 22984887 PMCID: PMC3517350 DOI: 10.1186/1476-072X-11-39
Source DB: PubMed Journal: Int J Health Geogr ISSN: 1476-072X Impact factor: 3.918
Figure 1. Human hantavirus infections in Sweden.
Independent variables and their hypothesized relationships with bank vole abundance, ex vivo virus survival and human presence
| Variable | Label | Bank voles | Virus survival (ex vivo) | Human presence | Data source |
|---|---|---|---|---|---|
| Area of forests in a 3-km radius around the dwelling (m²) | Forests | x | | | SLU Skogskarta |
| *Mean volume of spruce per hectare in a 3-km radius around the dwelling (m³/ha) | Volume of spruce | x | | | SLU Skogskarta |
| Mean volume of pine per hectare in a 3-km radius around the dwelling (m³/ha) | Volume of pine | x | | | SLU Skogskarta |
| *Maximum distance to forests in a 3-km radius around the dwelling (m) | | x | | | SLU Skogskarta |
| Number of forest patches in a 3-km radius | | x | | | SLU Skogskarta |
| Mean shape index of forests in a 3-km radius | | x | | | SLU Skogskarta |
| Mean contiguity index of forests in a 3-km radius | | x | | | SLU Skogskarta |
| Mean Euclidean nearest-neighbor distance between forest patches in a 3-km radius (m) | | x | | | SLU Skogskarta |
| *Area of peat bogs in a 3-km radius around the dwelling (m²) | Peat bogs | x | | | SVK |
| Mean snow depth between 1991 and 1998 (cm) | Snow depth | x | x | | SMHI |
| Average duration of snow cover when present for at least 10 days (days) | Snow period | x | x | | SMHI |
| Majority (most frequent) soil grain size in a 3-km radius (1 = coarse, 2 = medium, 3 = fine) | Soil grain size | | x | | SGU |
| *Elevation (m) | Elevation | x | x | x | Aster GDEM |
| *Distance to the sea coast (m) | Distance to the sea | x | x | x | SVK |
| *Population density (inhabitants/km²) | Population density | | | x | Gridded Population of the World |
| Total length of public roads in a 3-km radius (m) | Roads | | | x | SVK |
| *Distance to holiday homes (m) | Holiday homes | | | x | Statistiska centralbyrån |
| Total length of waterways in a 3-km radius (m) | Waterways | x | | | Swedish Places |
* Data log-transformed.
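Most landscape variables in the table above are summaries within a 3-km radius of each dwelling. The source does not describe the GIS workflow, so the following is only a hypothetical raster-based sketch of how such a value (for example, area of forests) could be approximated: sum the area of forest cells whose centres fall within the radius. The grid layout, cell size, function name and example data are all assumptions.

```python
import math


def area_within_radius(grid, cell_size_m, centre_xy_m, radius_m=3000.0, target=1):
    """Approximate area (m^2) of raster cells equal to `target` whose centres
    lie within `radius_m` of `centre_xy_m`.

    Hypothetical layout: grid[row][col]; cell (row, col) spans
    x in [col*size, (col+1)*size) and y in [row*size, (row+1)*size).
    """
    cx, cy = centre_xy_m
    r2 = radius_m ** 2
    cell_area = cell_size_m ** 2
    total = 0.0
    for i, row in enumerate(grid):
        y = (i + 0.5) * cell_size_m  # y coordinate of the cell centre
        for j, value in enumerate(row):
            if value != target:
                continue
            x = (j + 0.5) * cell_size_m  # x coordinate of the cell centre
            if (x - cx) ** 2 + (y - cy) ** 2 <= r2:
                total += cell_area
    return total


# A fully forested 8 km x 8 km grid of 100-m cells (1 = forest).
forest = [[1] * 80 for _ in range(80)]
area = area_within_radius(forest, 100.0, (4000.0, 4000.0))
print(area, math.pi * 3000.0 ** 2)  # close to the exact circle area
```

With 100-m cells the result approximates the exact circle area to within a small discretisation error; finer cells reduce that error at the cost of more computation.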
Models obtained by the logistic regression and cross-validated logistic regression methods
| Variable | Logistic regression | Cross-validated logistic regression |
|---|---|---|
| Intercept | −5.371** | −5.447 |
| Area of forests | 8.048 × 10⁻⁸*** | 8.133 × 10⁻⁸ |
| Log (distance to forests) | 1.665*** | 1.689 |
| Contiguity | 1.198 | 0.226 |
| Snow depth | −0.016 | −0.016 |
| Log (distance to sea) | −0.470** | −0.471 |
| Log (population density) | 0.544* | 0.109 |
| AIC | 629.74 | 792.77 |
| AUC | 0.972 | 0.721 |
* p-value < 0.05, ** p-value < 0.01 and *** p-value < 0.001.
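To show how the logistic regression column above turns into a predicted probability, here is a sketch that evaluates the linear predictor η = β₀ + Σ βₖxₖ and applies the inverse logit p = 1 / (1 + e^(−η)). Assumptions: the table does not state the log base, so natural logarithms are assumed; inputs use the units of the variable table; and the site values are hypothetical. Under the natural-log assumption, doubling a log-transformed input multiplies the odds by 2^β, so doubling the distance to the sea multiplies them by 2^(−0.470), roughly 0.72.

```python
import math

# Coefficients transcribed from the "Logistic regression" column of the
# model table. The log base is not stated there; natural log is ASSUMED.
COEF = {
    "intercept": -5.371,
    "forest_area": 8.048e-8,    # area of forests in a 3-km radius (m^2)
    "log_dist_forest": 1.665,   # log(maximum distance to forests, m)
    "contiguity": 1.198,        # mean contiguity index of forests
    "snow_depth": -0.016,       # mean snow depth 1991-1998 (cm)
    "log_dist_sea": -0.470,     # log(distance to the sea coast, m)
    "log_pop_density": 0.544,   # log(population density, inhabitants/km^2)
}


def linear_predictor(forest_area, dist_forest, contiguity,
                     snow_depth, dist_sea, pop_density):
    """eta = b0 + sum(b_k * x_k); the log-scaled inputs must be > 0."""
    return (COEF["intercept"]
            + COEF["forest_area"] * forest_area
            + COEF["log_dist_forest"] * math.log(dist_forest)
            + COEF["contiguity"] * contiguity
            + COEF["snow_depth"] * snow_depth
            + COEF["log_dist_sea"] * math.log(dist_sea)
            + COEF["log_pop_density"] * math.log(pop_density))


def predicted_probability(**site):
    """Inverse logit of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(**site)))


def odds(p):
    return p / (1.0 - p)


# Hypothetical site values, for illustration only.
site = dict(forest_area=2.0e7, dist_forest=500.0, contiguity=0.8,
            snow_depth=40.0, dist_sea=50_000.0, pop_density=5.0)
p_near = predicted_probability(**site)
p_far = predicted_probability(**{**site, "dist_sea": 2 * site["dist_sea"]})
print(p_near, p_far, odds(p_far) / odds(p_near))  # last value is 2 ** -0.470
```

Because only the intercept and slopes are published, a sketch like this reproduces point predictions but not their uncertainty.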
Figure 2. Comparison between the results of logistic regression, boosted regression trees, cross-validated logistic regression and the MaxEnt model.
AUC of partial models based on the variables related to each element
| Method | Bank voles | Virus survival (ex vivo) | Human presence |
|---|---|---|---|
| Logistic regression | 0.732 | 0.695 | 0.684 |
| Boosted regression trees | 0.886 | 0.8244 | 0.801 |
| MaxEnt | 0.891 | 0.893 | 0.922 |
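The AUC values above measure how well each model ranks presence sites above absence sites: 0.5 is no better than random and 1 is perfect discrimination. A minimal sketch of the rank-based (Mann-Whitney) definition follows; for presence-only methods such as MaxEnt, the comparison set is typically background points rather than true absences. The function name and scores are hypothetical.

```python
def auc(presence_scores, absence_scores):
    """AUC as the Mann-Whitney statistic: the probability that a randomly
    chosen presence site scores higher than a randomly chosen absence
    (or background) site, with ties counted as one half."""
    if not presence_scores or not absence_scores:
        raise ValueError("need at least one presence and one absence score")
    wins = 0.0
    for sp in presence_scores:
        for sa in absence_scores:
            if sp > sa:
                wins += 1.0
            elif sp == sa:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


# Hypothetical predicted probabilities, for illustration only.
print(auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]))  # 8 of 9 pairs ranked correctly -> 0.888...
```

The pairwise loop is quadratic in the number of sites. For large datasets, the same value can be obtained by sorting the pooled scores and summing ranks.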
Advantages and disadvantages of logistic regression, boosted regression trees, cross-validated logistic regression and the MaxEnt model
| Method | Advantages | Disadvantages |
|---|---|---|
| Logistic regression | -Best goodness-of-fit and predictive power | -Requires true absence points |
| | -Inclusion of variables reflecting the surrounding environment | |
| Boosted regression trees | -Accounts for non-linearity of biological processes | -Requires true absence points |
| | -Models interactions | -Impossible to see all three at one time |
| | -Inclusion of variables reflecting the surrounding environment | -Difficult to extrapolate |
| Cross-validated logistic regression | -Uses available sites instead of absence sites | -Tedious calculations |
| | -Inclusion of variables reflecting the surrounding environment | -Limited value compared to logistic regression |
| MaxEnt | -Ease of use | -Complex estimators, difficult to extrapolate |
| | -Spatially continuous results | -Requires spatially continuous data |
| | -Accounts for non-linearity of biological processes | -Limited by the coarsest resolution and the smallest extent of the variables |