David J Rogers¹, Sarah E Randolph. ¹Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, UK. david.rogers@zoo.ox.ac.uk
Abstract
West Nile virus, severe acute respiratory syndrome and monkeypox are infectious diseases that have recently been introduced into areas far from their region of origin. The greatest risk of new diseases comes from zoonoses: pathogens that circulate among wild animals and are occasionally transferred to humans by intermediate invertebrate hosts or vectors that are sensitive to climatic conditions. Analytical tools that are based on geographical information systems and that can incorporate remotely sensed information about the environment offer the potential to define the limiting conditions for any disease in its native region for which there are at least some distribution data. The direction, intensity or likelihood of its spread to new regions could then be predicted, potentially allowing disease early-warning systems to be developed.
West Nile virus first appeared in the New World in 1999, in New York, USA. By the end of 2002, this virus had spread to Canada and 44 states of the USA (plus the District of Columbia) and, in 2002 alone, caused clinical infections in more than 4,000 people and 284 deaths[1].
Severe acute respiratory syndrome (SARS) was first recognized in China in mid-November 2002. By mid-June 2003, it had affected more than 8,000 people, killing 790, in 33 countries of all the inhabited continents of the world[2].
Monkeypox, which is related to smallpox, is known historically only in Africa, but was (probably) imported into Texas, USA, in early April 2003, in a shipment of Gambian rats that were destined for the pet trade. Transfer to native prairie dogs, which were in turn shipped to other pet traders or were displayed and handled in pet fairs, carried monkeypox to Wisconsin, Indiana, Illinois, Missouri, Kansas and Ohio, and resulted in a total of 71 human infections in these states by early July 2003 (Ref. 3).
In all of these examples, little or nothing is known about the natural circulation of the pathogens concerned within their countries of origin, or about the precise method, or likelihood, of their transfer to new areas. For every 'new' or 'emerging' disease of this type, there could be many others that are carried by the same mechanisms, but which then fail to establish in new countries for reasons that are also unclear.
These examples, and many more like them, show that the modern equivalent of the spread of cholera from the Broad Street pump to Londoners in the 1840s (Ref. 4) could now encompass the world and its population. 'Small-world' effects increasingly 'connect' geographically distant places[5], affecting not only knowledge, trade and tourism, but also infectious diseases.
The immediate questions to be answered are which diseases will be involved in such transfers, from which sources, to which destinations and by what routes; and what will be the chances of the diseases establishing and spreading from any point of introduction?
A geographical information system (GIS), which can hold both the disease data and any other information within the same geographical framework, has the analytical power to help answer these questions. When the diseases in question are known, or suspected, to have environmental risk factors, the addition of remotely sensed (RS) environmental data to the GIS greatly enhances its explanatory power[6]. This article explains how using RS data in a GIS can help us understand the spatio-temporal dynamics of a wide range of disease systems, especially, but not exclusively, those with environmental correlates.
Steps to understanding epidemiology
To predict where diseases might spread, it is necessary to understand their epidemiology in the regions in which they have been historically recorded. First, we must identify the pieces of the puzzle: the pathogen, its vertebrate hosts and the routes of transmission between hosts. Second, the patterns of each disease must be recorded, both the distribution in space and changes with time. An essential part of this second step is an appreciation of the environment in which transmission is taking place. Third, we need to understand the dynamic processes of transmission that ultimately determine the patterns that are observed. Given this baseline knowledge, we might finally be able to estimate the likelihood of pathogen spread to, and establishment in, new areas.
The first task is essentially one of diagnosis, a task that is not always simple in complex natural biological systems. Identifying the pieces of the puzzle can lead to an expectation of the resulting patterns, but it can be difficult to anticipate the number of important pieces that are involved.
For example, it might have been expected that West Nile virus in North America would be transmitted by only a few native American mosquito species (the vectors). A pathogen in a new continent is unlikely to find many competent vectors, and many flaviviruses (the family to which West Nile virus belongs) typically have only one principal vector species and relatively few additional species of any transmission importance in any region[7]. By August 2003, however, West Nile virus, its RNA or antigens had been detected in 43 mosquito species in the United States[8] (although this does not prove full transmission competence). Had this large number of potential vector species been known or predicted in advance, the rapid spread of West Nile virus in North America would not have been so surprising.
The second set of problems, that of the distribution, incidence or prevalence of a disease, is usually tackled using statistical techniques. As empirical disease data are typically sparse, even for well-established, perennial diseases, researchers seek to make predictive maps to fill in the gaps using richer environmental data. All available information is gathered: for example, records of the known presence or intensity of the disease from areas as geographically extensive, and covering as wide a range of natural environments, as possible. This information is stored within a GIS that is, at its core, little more than a database that also records the geographical location of each observation[9]. In addition, the GIS must contain the environmental and other data in the same geographical framework, thereby allowing correlations to be established between these data and the disease data[10].
Whilst the database origins allow all the usual calculations of summary or averaged data, the geographical essence of GISs allows them to be used to reveal the spatio-temporal structure of disease cases.
For example, disease clusters, the geographical co-occurrence of cases that indicates local transmission of infectious diseases or the presence of environmental determinants of non-infectious diseases, are common. Childhood arthritis is not traditionally thought of as infectious, but the recognition of a cluster of cases in Old Lyme, Connecticut in the late 1970s[11] (albeit without the benefit of a GIS) ultimately led to the discovery of tick-borne Lyme borreliosis[12], the most widespread and prevalent vector-borne infection in the northern temperate world.
Satellite imagery (Box 1) is a powerful component of modern disease GISs. Satellite sensors provide data from which information about rainfall, temperature, humidity and vegetation conditions at the Earth's surface can be derived. These conditions are crucial for the indirect transmission of pathogens by vectors or intermediate hosts, such as insects, ticks, snails or rodents[6,13,14]. Environmental conditions might also be important for any directly transmitted (for example, host-to-host) pathogens that must survive for any period of time outside the host. Bovine tuberculosis outbreaks in cattle herds are thought by many to be caused by contamination from infected wild animals, and high-risk areas have been predicted accurately using seasonal features of atmospheric humidity and air temperature[15]. This result might offer some clues about the precise route of transmission, which is still unknown.
For many pathogens, and especially for those with intermediate hosts, transmission is seasonal. Furthermore, the continuing existence of the pathogen depends not just on conditions during the transmission season (usually spring through to autumn in temperate regions, and the rainy season in tropical regions), but also on conditions during winter or the dry season; these conditions can determine the survival of the pathogen or its vector from one year to the next.
Satellite images with high spatial resolution, such as Landsat imagery, are not recorded sufficiently often to capture the full details of seasonal cycles; their infrequent images often miss important periods.
Satellite imagery at high temporal resolution (Box 1), however, can produce clear monthly pictures (although at the expense of spatial resolution), but the result is large volumes of data, which often show strong serial correlations that affect the power of statistical analyses. To reduce the volume of data and remove these serial correlations without losing the biologically meaningful signals, a technique of time-series analysis that was invented by the French mathematician Joseph Fourier (1768–1830) is used. Fourier solved a problem in calculus that had defeated Newton and subsequent generations of mathematicians, by showing that a complex time series can always be expressed as the sum of a series of sine curves with different amplitudes, frequencies and phases (that is, timings) around a characteristic mean. Using Fourier's techniques, therefore, it is possible to extract information about the annual, bi-annual and tri-annual cycles of rainfall, temperature and other parameters that characterize the natural environments of diseases from the multi-temporal satellite data (Fig. 1). It is this shift from the time to the frequency domain that removes the serial correlation in the satellite data. The output of temporal Fourier analysis is a set of orthogonal (that is, uncorrelated) variables that capture the seasonality that is of interest in epidemiology[6,13,16], and these variables can therefore be used to classify habitats and describe vector and pathogen distributions. For disease systems, Fourier variables are the environmental equivalent of the genes of individual pathogens, and whole Fourier-processed images (Fig. 2) that capture all the interactive space–time features of a habitat can be likened to the organismal genome.
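The temporal Fourier step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' processing chain: it assumes NumPy, a series of whole years of monthly values for a single pixel, and that 'annual, bi-annual and tri-annual' mean one, two and three cycles per year. The function names `temporal_fourier` and `reconstruct` are hypothetical, and averaging the years into one 12-month cycle before the transform is a simplifying choice.

```python
import numpy as np


def temporal_fourier(monthly, n_harmonics=3):
    """Return the mean and (amplitude, peak_month) for the first harmonics.

    `monthly` must hold whole years of monthly values (length a multiple of 12).
    Harmonic k completes k cycles per year; peak_month is the timing of its
    first peak, counted in months from the start of the year (0 = first month).
    """
    monthly = np.asarray(monthly, dtype=float)
    if monthly.size == 0 or monthly.size % 12 != 0:
        raise ValueError("expected a whole number of years of monthly data")
    # Assumption: collapse all years into one mean 12-month cycle.
    cycle = monthly.reshape(-1, 12).mean(axis=0)
    coeffs = np.fft.rfft(cycle) / 12.0
    mean = coeffs[0].real
    harmonics = []
    for k in range(1, n_harmonics + 1):
        amplitude = 2.0 * np.abs(coeffs[k])
        phase = (-np.angle(coeffs[k])) % (2.0 * np.pi)
        peak_month = phase * 12.0 / (2.0 * np.pi * k)
        harmonics.append((amplitude, peak_month))
    return mean, harmonics


def reconstruct(mean, harmonics, months=None):
    """Rebuild the fitted cycle as the mean plus a sum of cosine curves."""
    t = np.arange(12.0) if months is None else np.asarray(months, dtype=float)
    fit = np.full(t.shape, float(mean))
    for k, (amplitude, peak_month) in enumerate(harmonics, start=1):
        fit = fit + amplitude * np.cos(2.0 * np.pi * k * (t - peak_month) / 12.0)
    return fit
```

Calling `temporal_fourier(series)` on, say, 48 monthly NDVI values reduces them to seven numbers (one mean, three amplitudes, three phases), and `reconstruct` turns those numbers back into a smooth annual curve that can be compared with the raw signal.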
Figure 1
Normalized difference vegetation index signal from a point in central Wales, UK.
The monthly normalized difference vegetation index (NDVI) signal from a point in central Wales, UK (3.58°W, 52.34°N) for 1996–1999 is shown in light blue, the mean monthly signal for this period is shown in red (displaced vertically by 0.1 for clarity), the temporal Fourier fit to these data is shown in orange, and the annual, bi-annual and tri-annual components of this fit are shown in yellow, purple and green, respectively (right-hand scale). The temporal Fourier fit describes the average annual cycle at this site well.
Figure 2
Normalized difference vegetation index (NDVI) Fourier images of Europe.
a | The mean is shown in red. b | The annual amplitude is shown in blue. c | The annual phase (that is, the timing of the annual peak) is shown in green. d | All three signals are shown together, which shows how this method of analysis captures habitat seasonality across Europe (notice, for example, that the mean is generally higher in western Europe, with the exception of many parts of Spain, but the annual amplitude is higher in eastern Europe). The white arrow in the mean image points to the site from which the data for Fig. 1 were obtained.
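A composite of this kind can be approximated with a short Python sketch. It is only an illustration under assumptions: it uses NumPy, expects a stack of 12 mean monthly images, and follows the colour assignment in the caption (mean in red, annual phase in green, annual amplitude in blue). The function name `fourier_rgb` and the simple min-max stretch of each band are hypothetical choices, not the authors' rendering method.

```python
import numpy as np


def fourier_rgb(stack):
    """Build an RGB composite from a (12, rows, cols) stack of monthly images.

    Red   = mean of the annual cycle
    Green = phase (timing of the peak) of the one-cycle-per-year harmonic
    Blue  = amplitude of the one-cycle-per-year harmonic
    """
    stack = np.asarray(stack, dtype=float)
    if stack.ndim != 3 or stack.shape[0] != 12:
        raise ValueError("expected an array of shape (12, rows, cols)")
    # Per-pixel Fourier transform along the time axis.
    coeffs = np.fft.rfft(stack, axis=0) / 12.0
    mean = coeffs[0].real
    amplitude = 2.0 * np.abs(coeffs[1])
    phase = (-np.angle(coeffs[1])) % (2.0 * np.pi)

    def stretch(band):
        # Assumption: linear min-max stretch to the range 0..1 for display.
        lo, hi = band.min(), band.max()
        if hi == lo:
            return np.zeros_like(band)
        return (band - lo) / (hi - lo)

    # Phase is already bounded, so it is scaled by a full cycle instead.
    return np.dstack([stretch(mean), phase / (2.0 * np.pi), stretch(amplitude)])
```

The result has shape `(rows, cols, 3)` with values between 0 and 1, which most plotting libraries accept directly as an RGB image.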
In a GIS, the statistical relationships that are established between the disease and environmental data sets are applied at the full spatial resolution of the latter, richer data sets, to produce a 'risk map' (Fig. 3). This effectively shows the similarity of environmental conditions in unsurveyed places to environmental conditions in which the disease has been recorded as being either present or absent. This similarity is usually expressed as the probability with which each area on the ground (corresponding to one picture element, or 'pixel', of the satellite imagery) belongs to the class of areas that are known to contain the disease. Errors in risk maps arise for several reasons: sometimes the input data are out-of-date, or simply wrong; sometimes the explanatory variables that are used are inappropriate; and sometimes the model itself is wrong. Only false-negatives (that is, false predictions of absence) are clear indicators of an incorrect or inappropriate model, and much can be learnt by investigating why these arise. Even when none of these applies, risk maps often indicate larger areas as 'at risk' than are known to be affected by the disease at present (that is, many false-positives), because an organism or disease will not occupy all 'suitable' habitats. False-positives also highlight an important application of predictive risk maps: to warn health agencies of the potential spread of a disease into these areas[17].
The significant increases in the incidence of tick-borne encephalitis (TBE) over the past two decades have indeed been accompanied by new records in many supposedly 'false-positive' regions[18,19,20,21].
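The idea of a per-pixel probability of belonging to the 'disease present' class can be sketched as follows. This is one simple way to obtain such a probability, a two-class Gaussian discriminant with NumPy, and is an assumption for illustration; the text here does not specify which statistical model was used. The function names, the equal default priors and the 0.5 threshold are all hypothetical.

```python
import numpy as np


def fit_class(samples):
    """Mean vector and covariance matrix of one class of training sites."""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    cov = np.atleast_2d(np.cov(samples, rowvar=False))
    return mean, cov


def log_density(x, mean, cov):
    """Log of the multivariate normal density at each row of x."""
    diff = np.asarray(x, dtype=float) - mean
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("...i,ij,...j->...", diff, inv, diff)
    return -0.5 * (maha + logdet + mean.size * np.log(2.0 * np.pi))


def presence_probability(pixels, presence, absence, prior_presence=0.5):
    """Posterior probability that each pixel belongs to the presence class.

    `pixels`, `presence` and `absence` hold one row of environmental
    predictors (for example Fourier variables) per pixel or training site.
    """
    lp = log_density(pixels, *fit_class(presence)) + np.log(prior_presence)
    la = log_density(pixels, *fit_class(absence)) + np.log(1.0 - prior_presence)
    # Clip the log-odds so that exp() cannot overflow.
    return 1.0 / (1.0 + np.exp(np.clip(la - lp, -700.0, 700.0)))


def error_counts(probability, observed, threshold=0.5):
    """Count false-positives and false-negatives against known records."""
    predicted = np.asarray(probability) >= threshold
    observed = np.asarray(observed, dtype=bool)
    false_pos = int(np.sum(predicted & ~observed))
    false_neg = int(np.sum(~predicted & observed))
    return false_pos, false_neg
```

Applying `presence_probability` to every pixel of the predictor images gives a risk map, and `error_counts` on the surveyed sites separates the two error types discussed above: false-negatives that point to a model problem, and false-positives that may mark suitable but currently unoccupied areas.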
Figure 3
Results of statistical modelling of the distribution of vectors and disease using selected temporal Fourier-processed images as predictor variables.