Evaluating Spatial Interaction Models for Regional Mobility in Sub-Saharan Africa.

Amy Wesolowski¹, Wendy Prudhomme O'Meara², Nathan Eagle³, Andrew J Tatem⁴, Caroline O Buckee¹.

Abstract

Simple spatial interaction models of human mobility based on physical laws have been used extensively in the social, biological, and physical sciences, and in the study of the human dynamics underlying the spread of disease. Recent analyses of commuting patterns and travel behavior in high-income countries have led to the suggestion that these models are highly generalizable, and as a result, gravity and radiation models have become standard tools for describing population mobility dynamics for infectious disease epidemiology. Communities in Sub-Saharan Africa may not conform to these models, however; physical accessibility, availability of transport, and cost of travel between locations may be variable and severely constrained compared to high-income settings, informal labor movements rather than regular commuting patterns are often the norm, and the rise of mega-cities across the continent has important implications for travel between rural and urban areas. Here, we first review how infectious disease frameworks incorporate human mobility on different spatial scales and use anonymous mobile phone data from nearly 15 million individuals to analyze the spatiotemporal dynamics of the Kenyan population. We find that gravity and radiation models fail in systematic ways to capture human mobility measured by mobile phones; both severely overestimate the spatial spread of travel and perform poorly in rural areas, but each exhibits different characteristic patterns of failure with respect to routes and volumes of travel. Thus, infectious disease frameworks that rely on spatial interaction models are likely to misrepresent population dynamics important for the spread of disease in many African populations.

Year: 2015 PMID: 26158274 PMCID: PMC4497594 DOI: 10.1371/journal.pcbi.1004267

Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475

Introduction

Human mobility patterns underlie the spread of infectious diseases across spatial scales. Theoretical models of human mobility have been used to understand the spatial spread of influenza, cholera, and malaria, for example [1-20] as well as to design targeted interventions [1,5,20-22]. These models rely almost exclusively on two frameworks, the gravity model and the more recent radiation model, both of which were developed to describe regular commuting patterns in high-income settings [23-26]. In the absence of easily available data on travel behavior, these models are increasingly also being applied to models of infectious disease dynamics in low and middle-income settings. Despite the need for robust epidemiological models in places like Sub-Saharan Africa, it remains unclear if gravity and radiation models adequately describe mobility in these populations. Geographic constraints and economic drivers of travel may be substantially different in Sub-Saharan Africa than in high-income countries. Many African countries are experiencing rapid demographic changes and may have poor transportation infrastructure. Many populations remain subsistence farmers living in rural areas with limited economic opportunities, public resources, and infrastructure [27,28]. Kenya exhibits many of these attributes, for example, including highly variable population density and substantial geographic diversity, ranging from the major urban commercial center of Nairobi (population density ~4,510/km2) to the pastoral communities in the northern part of the country (see Fig 1A). Only 7% of Kenyan roads are paved, often those in and out of the capital, as is common in many African countries. 
Despite these constraints, mobility in many parts of the continent has increased dramatically over the last decade [29], with rural-to-urban migration, seasonal travel, and extensive travel for agricultural and casual laboring jobs forming important components of the emerging ecology of African populations [30].

Fig 1

The country of study and gravity and radiation model example results.

A) A population map of Kenya with district boundaries (grey) and major roads (black). For two districts, a schematic representation of B) a gravity model and C) a radiation model. The gravity model is based on the populations of the destination and origin as well as the distance between these locations. The radiation model is based on the destination and origin’s populations as well as the total population within a circle centered at the origin. This model is based on the premise that individuals living in a certain home location will consider the number of job opportunities (measured as the destination’s population proportional to the resident population) and will travel to the closest destination that would offer better benefits than the resident location.

Data sources describing these travel patterns are rare, however [31,32], so gravity (parameterized) and radiation (parameter-free) models offer intuitive and tractable analytical frameworks for describing human mobility patterns (Fig 1B and 1C). In their simplest forms both models rely on spatial population data as a proxy for the economic attractiveness of a place and assume a decay in the amount of travel with distance [23,26,33]. In the standard gravity model, Euclidean distance is often used to inform this decay rate, whereas in the radiation model, an individual is likely to travel to the nearest location that offers an improvement in current working conditions (measured via population size), with decay described as a function of the populations and distance between locations. Extensions have been proposed to improve the standard gravity model to include more relevant driving factors of travel such as the percentage of the population that is male, economic activity measures, and land cover [33]. Other formulations of the gravity model constrain the origin and destination travel and have been shown to outperform the standard gravity model [25].
By definition, neither model encompasses different types of journeys or different trip durations, which are often important aspects of travel for the spread of infectious disease. Validating these frameworks, in low and middle-income settings in particular, remains challenging. Mobile phone data sets that are routinely collected by mobile operators provide an important new source of information about the dynamics of populations on an unprecedented scale, and provide an opportunity to measure human mobility directly for entire populations [23,25,34-38]. The adoption of mobile phone technologies in Africa in particular has been rapid, providing the opportunity to study population dynamics of countries for the first time [31,35]. Given the difficulties of obtaining and sharing mobile call data records (CDRs), however, it will be important to assess whether measured travel patterns in different regions support the use of gravity and radiation models in places without mobility data. Here, we first review previous infectious disease models that have explicitly included a model of human mobility, and highlight the disparity between the models and types of mobility used for simulation and those used in studies that include epidemiological data. Next, we analyze CDRs from nearly 15 million subscribers in Kenya over the course of a year to test gravity and radiation models in this East African context. We show that both models fail to capture important aspects of mobility measured using CDRs, but in different ways. We then test their utility to describe travel over various trip durations and show differences in travel patterns between shorter and longer journeys. Finally, we highlight situations when each model outperforms the other and discuss a method to choose between models using the amount of travel.

Results

We first reviewed infectious disease models that explicitly include human mobility (Fig 2). Here, we focused only on models that represent the first time a particular formulation was used, and not subsequent versions of the same framework (see Supporting Information for the inclusion criteria and overview of papers included, S1 Table). We also included only papers that explicitly modeled both the disease dynamics and mobility patterns and have excluded papers that have not modeled both components (for example see [4,10-17]). We found nineteen studies, eleven of which were purely simulated epidemiological models [10-20] and eight of which included fits to epidemiological data [1-9]. Although these studies analyzed a range of infectious diseases, nearly all simulation studies analyzed the spread of influenza in high-income countries using commuting as the relevant type of mobility (8 out of 11). The majority of examples used a gravity model (10 papers) [2-8,10,13,17,18] and nearly all of the examples using a radiation model were for simulated disease dynamics only (2 papers) [11,12]. The examples that were fit to disease data were more varied although the majority were from low-income countries (5) [1,2,4,5,39] and described regional movement patterns (see Fig 2) [1-5]. Thus, simple gravity model frameworks are very commonly used to understand the regional spread of infectious disease in low-income settings, highlighting the importance of testing their validity and generalizability.

Fig 2

A summary of previously published papers incorporating human mobility models and infectious disease dynamics.

We reviewed nineteen papers that either simulated (simulated) disease dynamics or used epidemiological data (data). These papers covered a range of infectious diseases (see S1 Table). For each paper we identified the type of mobility model, disease model, location (high, low, or both high and low income country), and the type of movement quantified. Mobility models were classified as a gravity model, radiation model, a spatial transmission kernel, a network, or a risk surface. Disease models included a metapopulation model, metapopulation type model with stochastic fadeouts, time spent at locations with different risks of becoming infected, an individual based model (IBM), and a network. For each paper, the location of either the mobility and/or disease data determined the location of the paper with countries separated as high income or low income. Papers that focused on global disease spread were classified as both high and low income. Mobility was classified as either commuting, regional, or global movement patterns. If the paper did not explicitly state the type of mobility included in the paper the type of mobility was discerned from the spatial resolution of the data. Regional movement includes mobility between political admin units that are larger than a city, in general. If a paper included both global movements, such as airline flights, and localized commuting, then the paper was classified as global. We included papers describing various infectious diseases including cholera, malaria, dengue, measles, pertussis, rubella, influenza, and foot and mouth disease.

To test the performance of gravity and radiation models in an African setting, we analyzed regional travel across Kenya from de-identified call detail records (CDRs) at the cell tower level from 14,816,521 individual subscribers between June 2008 and June 2009, representing 92% of mobile market share (data previously described in [36]).
We have previously used these data to quantify general mobility patterns as well as travel between locations of interest, and compared them to census and travel survey data [23,34,36]. Here we focused on regional movement patterns since this is the most common spatial resolution of mobility models used in conjunction with epidemiological data in low-income settings, and regional travel currently represents a major source of uncertainty in disease models. We calculated all journeys between 69 Kenyan districts over the course of one year, ignoring travel within districts. On this spatial scale, movements between districts within the timespan of one day are almost nonexistent (see Supporting Information), so we used the most commonly used tower each day to approximate each subscriber’s location on a daily basis. We fit both an unconstrained gravity model and a radiation model to data representing the total number of journeys between districts over the course of the data set (one year, see Materials and Methods). We fit a number of constrained gravity models, although these did not perform as well as the standard gravity model (see Supporting Information). Here, we assume that travel measured by CDRs reflects “true” travel behavior, although it is likely to suffer from different types of bias, like any data on human mobility. The models varied widely in their ability to capture observed travel patterns in and out of rural versus urban districts, as illustrated by travel from Nairobi and Garissa (Fig 3). Nairobi is densely populated (total population of district 3.4 million, 10% of the country’s population), encompassing the capital and the major population and economic center of the country. Located in the middle of the country, this district is well connected by paved roads to the second largest city (Mombasa, 1.2 million) as well as to western Kenya, where nearly half of the population resides.
In this setting, both models were able to identify the primary destination locations accurately, although the radiation model predicted travel to a wider range of locations than observed in the CDRs (Fig 3A and 3B and 3C). Garissa, on the other hand, is a sparsely-populated low-income district bordering Somalia, and likely to be more similar to other rural areas in Africa than to high-income countries. For travel originating from Garissa, the predicted volumes and routes of travel were very different from empirical estimates (Fig 3D and 3E and 3F). Most strikingly, the gravity model predicted travel to a much wider range of destinations than observed, and the radiation model failed to identify the primary travel destination. These errors would be likely to lead models to over-estimate the spread of disease in the first case, and under-estimate disease importation into the capital city in the second.

Fig 3

The actual and predicted amounts of travel from two districts.

The A,D) actual amount of travel and the predicted amounts of travel from the B,E) gravity model and C,F) radiation model are shown for two districts. For each map, a continuous density surface was constructed showing the relative spatial distribution of travel. The bar plot shows the amount of travel to all other districts from the district with values displayed in increasing order of distance. Nairobi A,B,C) is the most populated district in the country and includes the capital. In all three figures, the majority of travel (shown in dark blue) is to neighboring locations. The radiation model estimates more travel to the rest of the country than the data or gravity model. Garissa D,E,F) is a rural district bordering Somalia. The majority of actual travel occurs to Nairobi, which the radiation model did not capture. The gravity model was able to predict a large amount of travel to Nairobi, but greatly over-predicts travel to the rest of the country.

The models diverged systematically in their predictions with regard to travel volume (Fig 4A and 4B) with the gravity model consistently over-predicting travel and the radiation model under-predicting travel (mean ratio of data to predicted results was 0.83 and 35.03, respectively, see S1 Fig). Although the gravity model using Euclidean distance gave a better overall fit to the data than the radiation model (gravity model adjusted R2: 0.786, radiation model adjusted R2: 0.014, see S2 Fig), this was due to the radiation model’s consistent failure to capture large volumes of human travel between major population centers. We hypothesized that one reason for the poor performance of both models in rural areas may be the impact of physical accessibility and road infrastructure on travel. This is likely to be particularly important in Sub-Saharan Africa, and adjusted measures of distance based on estimated travel times, as well as road distance, have been developed for these regions [40].
We re-fit the parameters of the gravity model using road distance and travel times and found that Euclidean distance between district centroids provided the most accurate overall predictions of travel volume across a range of scenarios including the full dataset, travel to and from the capital, and large urban centers (reduction in deviance: 63%-87%). Interestingly, in rural areas road distance noticeably outperformed all other distance measures, suggesting that travel time estimates may not accurately reflect human behavior in these regions (see Fig 4C and S2–S4 Tables).

Fig 4

The results of fitting each spatial interaction model.

A) The predicted results from both gravity and radiation models. The gravity model (shown in blue) predicts a larger total volume of travel over the course of the data set than observed in the data (ratio of predicted values to data: mean 12, 95% quantile interval 0.34–43), whereas the radiation model (shown in red) underpredicts the volume of travel (ratio of predicted values to data: mean 0.5, 95% quantile interval 0.0066–1.7). B) The ratio of predicted versus actual data from both models versus distance. For both models, the predictions over short distances were worse than over longer distances. C) We re-fit gravity models using Euclidean distance (red), travel times (blue) and road distance (green) between district centroids (circle) and population-weighted district centroids (square). The reduction in deviance of these models is shown. In general, Euclidean distance based gravity models outperformed all other distance measures, except for travel between rural areas. For this type of travel, road distance outperformed Euclidean distance (reduction in deviance: Euclidean distance 63%, road distance 72%, see Supporting Information).

We compared the distribution of errors from both models to identify “rules of thumb” for using gravity and radiation models to estimate volumes of travel (see Materials and Methods). We assumed the empirical error from each model should be normally distributed and categorized the travel routes that fall more than 2 standard deviations away from the mean (10% of routes, see Fig 5A, KS-statistic = 0.2481, p<0.001). In general, both models failed to adequately capture travel from rural areas of intermediate population density over shorter distances, especially in the western part of Kenya in the Rift Valley and Western provinces (Fig 5B and see S5 Table for further analysis).
Importantly, these rural regions of intermediate population density are likely to represent sizeable fractions of African populations; in Kenya these provinces where mobility models are systematically failing account for nearly 40% of the population (14 million individuals).

Fig 5

The utility of each model to describe travel in various settings.

A) For travel between all pairs of locations, we compared the error in the actual versus predicted amount of travel. If this error was not within defined bounds (outside of the dotted black lines) (see Materials and Methods), we determined that both models do not adequately describe this travel (shown in black). For the remaining volumes and routes of travel, we determined if the gravity model (blue) or the radiation model (red) performed better to identify which model should be used in various settings. For example, the radiation model does much better than the gravity model at predicting low volumes of travel and vice versa (shown in red versus blue). B) A schematic highlighting the situations when a radiation model is preferred over a gravity model and when caution should be taken using the predicted results of either model. Scenarios to use a gravity model over a radiation model include: travel to and from a major population centre, over short distances, and when predicting large volumes of travel. A radiation model should be used over a gravity model when describing travel between rural areas and low volumes of travel. Caution should be taken using either model if the travel is between locations of intermediate rural population and over short distances. C) We performed a logistic regression to determine when to use each model for various amounts of travel (a gravity factor). As the amount of travel increases, the number of times the gravity model outperforms the radiation model increases. For example, for the lowest amounts of travel, a radiation model is preferable to a gravity model 43% of the time. In contrast, for the largest amounts of travel, all of these routes (100%) were better predicted with a gravity model than a radiation model.
Neither the gravity nor the radiation model was consistently the superior choice, and each exhibited different spatial patterns of performance (see Fig 5B); in general, however, the radiation model outperformed the gravity model for low amounts of travel and vice versa. We calculated a naïve gravity factor, i.e. a gravity model without any fitted parameters (pop_i * pop_j / d(i,j)), and performed a logistic regression to determine which flows were better predicted using each model (see Fig 5C, Supporting Information for regression results using just populations or distance as covariates, S6 Table–adjusted R2 = 0.5703, p<0.001). We observed a strong positive correlation between the gravity factor, which is proportional to the total amount of travel, and the odds of using a gravity model (Fig 5C). These results imply that a gravity model is more likely to capture the spread of disease between major urban centers, but a radiation model may be more appropriate for modeling rural-to-urban migration. In both cases, model performance varied substantially in different locations. An important consideration for spatial models of infectious disease dynamics is the length of journeys, since it will help determine both the number of onward infections generated by an imported case and the risk of exposure to infection of a traveling individual. Gravity and radiation models do not make explicit assumptions about trip durations, but since they were primarily developed to model commuting patterns they may not be appropriate for understanding journeys of varying length. We therefore analyzed the spatial dimensions of human travel for trips of varying duration (see Table 1, Fig 6A) [19] and the ability of each model to describe these different trips. As expected, the total number of trips between districts decreased as journey duration increased (see Figs 6 and S3 and S4).
For example, the number of trips lasting between one and two weeks was on average two orders of magnitude greater than the number of trips lasting at least four months (see Supporting Information). The major routes of travel also varied with the trip duration, with longer journeys being associated with increasing distances and larger population sizes at the destination, with Nairobi in particular becoming an increasingly important longer-term destination (see Figs 6B and S5 and S6). We refit a separate gravity model for each duration of travel (note that we do not refit the radiation model since it is parameter free) (see Materials and Methods, Supporting Information). This analysis highlights the difference in the major routes of travel, where the destination population parameter increased as the trip duration increased and the importance of distance in the model decreased (see Table 1).
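
The model-choice step described above (a naïve gravity factor used as the covariate in a logistic regression) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the gradient-ascent fitting routine, and the toy data are all assumptions, and the covariate is taken here to be a standardized log of the gravity factor, which the text does not specify.

```python
import math

def gravity_factor(pop_i, pop_j, dist):
    """Naive gravity factor: pop_i * pop_j / d(i,j), with no fitted parameters."""
    return pop_i * pop_j / dist

def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(x, y, lr=0.1, steps=5000):
    """One-covariate logistic regression fit by plain gradient ascent
    on the mean log-likelihood (an assumed, deliberately simple fitter).

    x: covariate per route (e.g. standardized log gravity factor)
    y: 1 if the gravity model had the smaller error on that route, else 0
    Returns (intercept, slope).
    """
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            resid = yi - _sigmoid(b0 + b1 * xi)
            g0 += resid
            g1 += resid * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

def prob_gravity_better(x_value, b0, b1):
    """Fitted probability that the gravity model is the better choice."""
    return _sigmoid(b0 + b1 * x_value)

# Hypothetical toy data: covariate values and which model won on each route.
x_toy = [-2, -1, -1, 0, 0, 1, 1, 2]
y_toy = [0, 0, 1, 0, 1, 1, 0, 1]
b0_toy, b1_toy = fit_logistic(x_toy, y_toy)
```

A positive fitted slope corresponds to the pattern reported in the text: the larger the gravity factor, the higher the odds that the gravity model is the better predictor.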

Table 1

The gravity model parameters and fit for trips lasting various durations.

Variable	Description	Origin Pop (α)	Dest Pop (β)	Dist (γ)	Constant (k)	Reduc Dev (%)
All	Every trip regardless of duration	1.22 (1.22, 1.22)	1.22 (1.22, 1.22)	2.05 (2.05, 2.05)	-20.06 (-20.6, -20.61)	80
Week	One–two weeks	0.95 (0.95, 0.95)	0.905 (0.905, 0.905)	1.73 (1.73, 1.73)	-12.9 (-12.9, -12.9)	60
Biweek	Two weeks–one month	0.903 (0.902, 0.904)	0.907 (0.906, 0.907)	1.58 (1.58, 1.58)	-14.8 (-14.9, -14.8)	54
Month	One–two months	0.907 (0.906, 0.908)	1.086 (1.085, 1.087)	1.47 (1.47, 1.48)	-17.6 (-17.7, -17.6)	54
Month 2	Two–three months	0.91 (0.91, 0.91)	1.39 (1.39, 1.4)	1.33 (1.33, 1.33)	-22.4 (-22.4, -22.4)	55
Month 3	Three–four months	0.902 (0.9, 0.904)	1.81 (1.8, 1.81)	1.19 (1.19, 1.19)	-29.02 (-29.0, -29.1)	57
Month 4+	Four+ months	0.9 (0.89, 0.902)	1.94 (1.93, 1.94)	1.11 (1.11, 1.12)	-31.64 (-31.6, -31.7)	57

For each duration of travel we measured, the gravity model parameters (with CI in parentheses) and fit (the percentage reduction in deviance) are shown. As the duration of travel increases, the destination population (Dest Pop) parameter increases and the distance (Dist) parameter decreases.

Fig 6

The impact of the duration of journeys on travel between districts.

A) For travel between pairs of locations, we compared the number of trips (log) versus the distance (km) for all journeys (grey) and the best fit lines for trips lasting between one and two weeks (red), between two weeks and one month (blue), between one and two months (green), between two and three months (orange), between three and four months (purple), and trips lasting four months or more (yellow). As the duration of journeys increased, the amount of travel between districts decreased. B) For all trips including any trip duration (red) and those lasting 4+ months (grey), the top 5% of routes are shown based on the total amount of travel. For trips lasting short durations, there is a large amount of travel between nearby districts. For trips lasting long durations, the majority of these top routes are to/from major cities including Nairobi and Mombasa.

Discussion

Our analysis suggests that gravity and radiation models do not adequately capture movements measured by mobile phones in rural and intermediate population density areas in Kenya, areas that are characteristic of many settings in Sub-Saharan Africa. These findings bring into question the universal applicability of these frameworks, and have important implications for estimating the risk of infectious disease importation, for example. Given the ubiquity of gravity and radiation models in epidemiological frameworks, we focused on validating these fundamental frameworks as opposed to examining more recent modifications [24,41]. One important caveat is that we have compared these theoretical models to travel measured via mobile phones, which may be affected by variable ownership and usage patterns, particularly in poor or rural areas [37,42]. Nevertheless, mobile phone data currently represent one of the most direct ways to measure regional population dynamics, especially in low-income settings where commuting and travel survey data may be patchy [42,43]. Here we have focused on the regional and inter-settlement spatial scales that can be measured using CDRs, but an important next step–particularly for infectious disease prediction–is to find appropriate data to examine the performance of gravity and radiation models on extremely local spatial and short temporal scales. Future work devoted to developing a generalizable model that can accurately capture travel in Sub-Saharan Africa, particularly in rural areas with intermediate population densities, will be an important priority for the development of appropriate frameworks for a description of African population dynamics. As more mobile phone data sets become available, the generalizability of our results can be confirmed in other countries assuming mobile phone data provides a reasonable sample of the underlying population [35]. 
Spatial interaction models can provide researchers with the ability to model population dynamics in low-income and data-sparse settings, such as Sub-Saharan Africa. However, the universality of these models is questionable, especially when describing rural travel in geographically and economically heterogeneous settings. Applications reliant on the underlying population dynamics derived from either model, such as understanding the spread of an infectious disease or the role of travel in economic activity, are likely to miss important routes and types of travel commonly found in Sub-Saharan Africa.

Materials and Methods

Data sources

We analyzed anonymized mobile phone call data records (CDRs) aggregated to the routing mobile phone tower level. These data were provided by the incumbent mobile phone provider in Kenya (92% market share at the time of data acquisition) and included the timings of calls and SMS from 14,816,512 subscribers from June 2008 to June 2009 (with February 2009 missing from the data set). As in previous studies [23,34-36], subscribers were represented in the CDRs as unique hashed IDs to protect their privacy. Twelve billion mobile phone communications were analyzed, recording activity at a total of 11,920 routing towers. All subscriber data were aggregated to the district level to further preserve anonymity. In the interest of protecting privacy, limited access to the anonymized data was made available to a select set of researchers.

Quantifying travel patterns

Each entry in a CDR contains an anonymized caller ID, anonymized receiver ID, date, duration, and tower routing number for both the caller and receiver. From the CDRs, the geographic location of the caller and receiver could be approximated based on the unique longitude and latitude coordinates for each mobile phone tower. Using the CDRs, a location for each subscriber every time they either made/received a call (or SMS) could be obtained. For each day in the data set, subscribers were assigned a single tower location [35,36]. If the subscriber made at least one call on that day, then the location of the majority routing tower was assigned [35,36]. If there was no majority routing tower, then for the most likely set of towers, a single tower was randomly chosen. If the subscriber had not made a call on that day, then the location of their most recent routing tower was assigned. This provided a time series of tower location for each subscriber on each day. As done in previous studies, trips are calculated by observing when a subscriber’s tower location has changed from the previous day for the entire data set (12 months of data) [35,36]. We aggregated towers to the district-level based on the tower’s location and only trips between towers in different districts were considered to quantify regional movement patterns. In comparison to a number of other studies analyzing spatial interaction models and infectious disease dynamics, we did not focus on commuting patterns since we are describing regional movement patterns, e.g. movement within a country as opposed to within a single city, and few subscribers change districts between daytime and nighttime.
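
The daily location assignment and trip counting described above can be sketched as follows. This is an illustrative sketch rather than the authors' pipeline: the function names and input layout are hypothetical, "majority tower" is interpreted as the most frequently used tower on a day, and ties are broken at random as the text describes.

```python
from collections import Counter
import random

def daily_towers(events, days, rng=None):
    """Assign one tower per day to a single subscriber.

    events: dict mapping day -> list of tower ids used that day
            (hypothetical layout, one entry per call or SMS)
    days:   ordered list of all days in the study period
    Returns a list of (day, tower) pairs.
    """
    rng = rng or random.Random(0)
    series = []
    last = None  # stays None until the first observed activity
    for day in days:
        towers = events.get(day, [])
        if towers:
            counts = Counter(towers)
            top = max(counts.values())
            candidates = sorted(t for t, c in counts.items() if c == top)
            # Most-used tower; a random pick among tied towers.
            last = candidates[0] if len(candidates) == 1 else rng.choice(candidates)
        # On days without activity, the most recent tower is carried forward.
        series.append((day, last))
    return series

def count_trips(series, tower_to_district):
    """Count trips as day-to-day changes of district.

    Moves between towers inside the same district are ignored,
    matching the focus on regional (between-district) travel.
    """
    trips = Counter()
    prev = None
    for _, tower in series:
        if tower is None:
            continue
        district = tower_to_district[tower]
        if prev is not None and district != prev:
            trips[(prev, district)] += 1
        prev = district
    return trips
```

For a subscriber seen at towers a, a, b on day 1, silent on day 2, and at tower c on day 3, the series is a, a, c; if a and b lie in one district and c in another, this yields a single trip between the two districts.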

Duration of travel

We investigated the ability of these models to describe travel over various durations. Using the CDRs, we were able to quantify both the number of trips between districts and the duration of those journeys (in days, based on the daily location of each subscriber). For each trip between districts, we counted the number of days the subscriber spent in the visited district. Using the mobile phone data, we compared all travel (every trip between all pairs of districts over the entire data set) to journeys lasting various durations, where trips were stratified into six separate groups (see Table 1). The category “All” includes every trip taken between districts, regardless of trip duration. We grouped all trips lasting at least four months into a single category due to the length of the data set (in total 12 months of CDR data).
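
The duration stratification above can be sketched from a subscriber's daily district series. This is a hedged illustration only: the day thresholds (7, 14, 30, 60, 90, 120) are assumed conversions of the week and month categories in Table 1, and the function names are hypothetical.

```python
from itertools import groupby

# Assumed day boundaries for the six duration groups in Table 1.
BINS = [
    (7, 14, "Week"),
    (14, 30, "Biweek"),
    (30, 60, "Month"),
    (60, 90, "Month 2"),
    (90, 120, "Month 3"),
    (120, None, "Month 4+"),
]

def stays(districts):
    """Collapse a daily district series into (district, n_days) runs."""
    return [(d, sum(1 for _ in run)) for d, run in groupby(districts)]

def duration_label(n_days):
    """Return the duration group for a stay, or None if under one week."""
    for lo, hi, label in BINS:
        if n_days >= lo and (hi is None or n_days < hi):
            return label
    return None

def trips_with_duration(districts):
    """List (origin, destination, days_in_destination, label) per trip.

    The final stay is cut off by the end of the series, so its length
    is a lower bound on the true duration.
    """
    runs = stays(districts)
    return [
        (origin, dest, n_days, duration_label(n_days))
        for (origin, _), (dest, n_days) in zip(runs, runs[1:])
    ]
```

Every trip returned here would count toward the “All” category; only those with a non-empty label fall into one of the six stratified groups.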

Spatial interaction models

The gravity model is the most common spatial interaction model, in which the amount of travel ($N_{ij}$) between two locations ($i$, $j$) depends on their populations ($\mathrm{pop}_i$, $\mathrm{pop}_j$) and the physical distance separating them ($d(i,j)$) [26,35,36]:

$$N_{ij} = k \, \frac{\mathrm{pop}_i^{\alpha} \, \mathrm{pop}_j^{\beta}}{d(i,j)^{\gamma}}$$

where the parameters α, β, γ, and k are fit based on a Poisson distribution [35,44]. We chose the fitting method of Flowerdew [44], in which the amount of travel is estimated by regression assuming a Poisson family. The gravity model has been used extensively to model mobility in conjunction with models of the spatial spread of infectious diseases [2-8,10,13,17,18]. A number of additions and modifications to the gravity model have been proposed, including adding covariates such as the percentage of the population that is male [33] or putting constraints on the number of trips [25], as in the singly or doubly constrained models. Here, we fit the simplified model, since the model without covariates is the most commonly used for disease modeling [1,19-25,29,33,34]. We also fit the origin singly constrained model, production singly constrained model, and doubly constrained model (see Supporting Information). However, the non-constrained simplified gravity model outperformed these three models (increase in sum of square errors: non-constrained: 37.9%, origin constraint: 39.5%, destination constraint: 39.5%, and doubly constrained: 39.5%). We also fit separate gravity models to each set of data describing various trip durations (see Table 1 and Supporting Information). Recently, the radiation model has been proposed as an improvement on the gravity model [23]. It draws its original inspiration from a gravity model, but it is a stochastic process that requires only information on the population distribution and is parameter free.
In this model, the average amount of travel ($\langle N_{ij} \rangle$) between two locations ($i$, $j$) depends on their populations and on the total population in the circle of radius $r_{ij}$ centered at $i$, where $r_{ij} = d(i,j)$ (the circle population is $s_{ij}$):

$$\langle N_{ij} \rangle = N_i \, \frac{\mathrm{pop}_i \, \mathrm{pop}_j}{(\mathrm{pop}_i + s_{ij})(\mathrm{pop}_i + \mathrm{pop}_j + s_{ij})}$$

where $N_i$ is the amount of travel leaving $i$, which scales with $\mathrm{pop}_i$ through $T_c/T$, the proportion of the population who travels. If no data are available to fit the radiation model, then $T_c/T$ is fixed and not fit. Here we fit this proportion to the actual data (see Supporting Information), and the optimum value is $T_c/T = 1$. Recently, extensions to this model have been proposed to reflect human behavior in employment choice using various functional forms [24]; however, we have focused on the most commonly used model formulation.
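To make the two model forms concrete, the sketch below evaluates a gravity prediction and a radiation prediction for a toy set of three locations. It is an illustration under stated assumptions rather than the fitting procedure used here: the gravity parameters are hypothetical values, not fitted ones, and the Poisson regression step is not shown. The radiation function assumes that $s_{ij}$ excludes the populations of $i$ and $j$, as in the original radiation formulation, and that the travel leaving $i$ equals the traveling proportion times $\mathrm{pop}_i$. Populations and coordinates are invented for the example.

```python
import math

def gravity_flow(pop_i, pop_j, dist, k, alpha, beta, gamma):
    """Unconstrained gravity prediction: k * pop_i^alpha * pop_j^beta / dist^gamma."""
    return k * pop_i ** alpha * pop_j ** beta / dist ** gamma

def radiation_flow(i, j, pops, coords, travel_fraction=1.0):
    """Expected radiation-model flow from i to j.

    s_ij is the population within distance d(i, j) of i, excluding i and j
    (assumption: follows the original radiation formulation).
    """
    def dist(a, b):
        (xa, ya), (xb, yb) = coords[a], coords[b]
        return math.hypot(xa - xb, ya - yb)

    r = dist(i, j)
    s_ij = sum(p for name, p in pops.items()
               if name not in (i, j) and dist(i, name) <= r)
    n_i = travel_fraction * pops[i]  # assumed amount of travel leaving i
    m, n = pops[i], pops[j]
    return n_i * m * n / ((m + s_ij) * (m + n + s_ij))

# Toy example with hypothetical populations and planar coordinates (km).
pops = {"A": 1000, "B": 500, "C": 2000}
coords = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (1.0, 0.0)}

gravity_ab = gravity_flow(pops["A"], pops["B"], 5.0, k=1.0, alpha=1.0, beta=1.0, gamma=2.0)
radiation_ab = radiation_flow("A", "B", pops, coords)
radiation_ac = radiation_flow("A", "C", pops, coords)
```

In this example location C lies inside the circle from A to B, so its population enters $s_{AB}$ and lowers the predicted A to B flow, whereas nothing lies between A and C.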

Distance measures

We analyzed three separate distance measures (Euclidean, road, and travel time), each computed between both district polygon centroids and district population-weighted centroids [40]. Euclidean distance was measured as the straight-line distance between centroids. Road distance was measured using the road network data from the Kenya National Bureau of Statistics. These data, together with land cover data (www.africover.org) and topography data (http://srtm.csi.cgiar.org/), were used to construct a ‘friction surface’ that was used to estimate travel time distances, following previously outlined methods [40]. The travel time is based on a measure of friction between one location and another that takes into account land cover types, the transport network, and gradient. In general, this measure is thought to be more representative of the ease of human travel across a landscape, since it takes into account impedances to travel. As in previous methods, water bodies, land cover, slope, and the road network datasets were combined on a 1 km spatial resolution grid to empirically derive travel speeds [40]. These travel speeds were assigned to each land use type and modified based on the topography to create the ‘friction surface’. This surface was used to estimate travel times between locations using least cost methods [40], with those locations defined by population-weighted centroids (defined using high resolution population maps provided by the WorldPop Project: www.worldpop.org.uk), where these centroids were automatically adjusted to lie on the nearest road. Correlations between the measures can be found in the Supporting Information.
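The idea of a least cost travel time over a friction surface can be illustrated with a shortest-path search on a small grid. This is only a sketch: Dijkstra's algorithm on a 4-connected grid stands in for the least cost methods of [40], and the grid values, the rule that a step costs the mean friction of the two cells, and the use of None for impassable cells are all assumptions made for the example.

```python
import heapq

def least_cost_time(friction, start, goal):
    """Dijkstra over a grid; friction[r][c] is the time needed to cross a cell.

    A step between 4-connected neighbours costs the mean of the two cells'
    friction values. Cells set to None are impassable (e.g. water bodies).
    """
    rows, cols = len(friction), len(friction[0])
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        cost, (r, c) = heapq.heappop(queue)
        if (r, c) == goal:
            return cost
        if cost > best.get((r, c), float("inf")):
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and friction[nr][nc] is not None:
                new_cost = cost + (friction[r][c] + friction[nr][nc]) / 2
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    heapq.heappush(queue, (new_cost, (nr, nc)))
    return float("inf")  # goal not reachable

# Toy surface: a slow cell (10) and an impassable cell (None) in the middle row.
grid = [
    [1, 1, 1],
    [10, None, 1],
    [1, 1, 1],
]
travel_time = least_cost_time(grid, start=(0, 0), goal=(2, 0))
```

In this toy surface the direct route through the slow cell costs 11 time units, so the search returns the longer detour around the impassable cell, which costs 6.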

Choosing which data is poorly described by either model

We calculated the error of each model as the difference between the logarithm of the data and the logarithm of the estimated value (error = log(data) – log(predicted)). We took the standard assumption that these errors were normally distributed with a mean of 0 and a standard deviation of 1. For any value outside the confidence interval, we suggest caution when using these estimates (see S5 Table); about 10% of the pairs of locations were eliminated on this basis.
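A minimal sketch of this screening step is given below. The width of the confidence interval is not stated in the text, so the 95% cut-off (z = 1.96) used here is an assumption, as are the function names and the toy route values. Routes with zero observed or predicted travel would need separate handling because of the logarithm.

```python
import math

def log_error(observed, predicted):
    """Error on the log scale: log(data) - log(predicted)."""
    return math.log(observed) - math.log(predicted)

def flag_poor_fits(observed, predicted, z=1.96):
    """Return routes whose log error falls outside [-z, z] under N(0, 1).

    Assumption: z = 1.96 corresponds to a 95% interval; the level is not
    given in the text.
    """
    return [route for route in observed
            if abs(log_error(observed[route], predicted[route])) > z]

# Toy example with hypothetical observed and predicted trip counts.
observed = {("A", "B"): 100, ("A", "C"): 5, ("B", "C"): 40}
predicted = {("A", "B"): 120, ("A", "C"): 60, ("B", "C"): 35}
flagged = flag_poor_fits(observed, predicted)
```

Only the A to C route is flagged here, because the model predicts twelve times more travel than was observed on it.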

Choosing between models

For routes between districts that were well described by either model, we calculated a gravity factor, $gm = \mathrm{pop}_i \, \mathrm{pop}_j / d(i,j)$, which is equivalent to the gravity model without any fitted parameters, as a proxy for the amount of travel between locations. Using this covariate, we then performed a logistic regression to determine when the radiation model or the gravity model produced lower errors relative to the actual data for these routes:

$$\mathrm{logit}(p) = b_0 + b_1 X$$

where $p$ is the probability of choosing a gravity model over a radiation model and $X$ is the gravity factor. As this value increases, i.e. as the amount of travel increases, the probability of choosing a gravity model over a radiation model increases (see Fig 5C).
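The sketch below shows how the gravity factor and the fitted logistic relationship translate into a probability of preferring the gravity model. The coefficient values are hypothetical and chosen only to give a positive slope, consistent with the direction reported above; they are not the fitted values, and the regression fit itself is not shown.

```python
import math

def gravity_factor(pop_i, pop_j, dist):
    """Parameter-free gravity proxy: pop_i * pop_j / dist."""
    return pop_i * pop_j / dist

def prob_gravity_preferred(x, b0, b1):
    """Inverse logit of b0 + b1 * x: probability that gravity is chosen."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Hypothetical coefficients (not the fitted values from the study).
b0, b1 = -1.0, 2e-7

x_small = gravity_factor(10_000, 5_000, 200.0)   # low expected travel
x_large = gravity_factor(50_000, 20_000, 100.0)  # high expected travel

p_small = prob_gravity_preferred(x_small, b0, b1)
p_large = prob_gravity_preferred(x_large, b0, b1)
```

With a positive slope, the route with the larger gravity factor receives the higher probability of being better described by the gravity model.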

The relationship between empirical and predicted data to various factors for all trips between districts.

The ratio of the predicted to the actual data is shown against the population A) of the origin, B) of the destination, and C) the distance between the origin and destination for all trips between districts. The gravity model consistently overpredicted travel, whereas the radiation model consistently underpredicted travel. For both the origin and destination population, there was no clear bias in the ability of each model to predict the volume of travel, although both models produced more accurate estimates as the distance increased. D) The relationship between distance and the amount of travel (log), with trend lines for the data (black), the gravity model (blue), and the radiation model (red). (TIF)

The fit of each spatial interaction model.

Using Euclidean distance, we compared the estimated versus empirical amount of travel between areas of varying population size and distance for the radiation and gravity models. We calculated a Sorensen-Dice coefficient to measure the difference between predicted and total volumes of travel. The coefficient values from A) the gravity model and B) the radiation model are shown, highlighting the better performance of the gravity model. Both models performed well at predicting travel to nearby highly populated districts. In general, C) the gravity model outperformed the radiation model. We next compared the ability of both models to predict the relative amount of travel: D) gravity model, F) radiation model. Both models performed better at predicting relative travel than the total volume of travel, with G) the radiation model often outperforming the gravity model. (TIF)

The number of trips versus the distance between the origin and destination districts for various trip durations.

For all durations of travel, we compared the number of trips versus the Euclidean distance between the origin and destination. For all trip durations, the frequency of trips decays with geographic distance. As the duration of travel increases, the frequency of journeys decreases. (TIF)

The relationship between the ratio of the predicted versus actual data is shown compared to the distance between the origin and destination.

For each group of travel based on trip duration, we compared the gravity model predicted values (using Euclidean distance) versus the Euclidean distance between districts (in kilometers). In general, each gravity model overpredicted travel and produced the most accurate estimates (log(predict/data) near 0) for travel over short geographic distances. (TIF)

The top 5 percent of routes for various trip durations.

For each duration of travel, the top five percent of routes are shown. For trips of shorter duration, the most traveled routes are often to nearby districts. However, as the trip duration increases, the most traveled routes often include a major city such as Nairobi or Mombasa. (TIF)

The top 200 routes of travel for various trip durations.

Similar to S5 Fig, we have plotted the top 200 routes of travel for various trip durations. As the trip duration increased, the most traveled routes often included a major city such as Nairobi or Mombasa. (TIF)

A summary of the papers analyzed by disease.

Papers that included epidemiological disease data are labeled ‘D’ whereas those that completely simulated disease dynamics are labeled ‘S’. (DOCX)

The gravity model parameters fit for subsets of the data.

(DOCX)

The reduction in deviance from a gravity model for subsets of the data using various distance measures.

A: For the full data set, data including travel to/from: Nairobi, cities, and not including travel to/from: Nairobi, cities, the reduction in deviance (%) from fitting a gravity model is shown. For each distance measure, both population weighted centroids and non-population weighted centroids were calculated. B: For data including travel between or to/from very and moderately rural areas, the reduction in deviance (%) from fitting a gravity model is shown. For each distance measure, both population weighted centroids and non-population weighted centroids were calculated. (DOCX)

The Sorensen-Dice coefficient for subsets of the data using various distance measures.

A: For data including travel between or to/from very and moderately rural areas, the Sorensen-Dice coefficient from fitting a gravity model is shown. For each distance measure, both population weighted centroids and non-population weighted centroids were calculated. B: For data including travel between or to/from very and moderately rural areas, the Sorensen-Dice coefficient from fitting a gravity model is shown. For each distance measure, both population weighted centroids and non-population weighted centroids were calculated. (DOCX)

The ability of each model to capture various situations % (N).

(DOCX)

Regression results predicting when to use a gravity model or radiation model.

We fit a number of logistic regression equations using distance, the origin population, or the destination population as the explanatory variable. For each regression equation (see above equations), the coefficients, intercept, and model fit are shown (percentage reduction in deviance and adjusted R² value). (DOCX)

The yearly amount of travel between districts with corresponding population and distance variables.

(XLSX)

The Supporting Information text.

(DOCX)

38 in total

1. A method of fitting the gravity model based on the Poisson distribution.

Authors: R Flowerdew; M Aitkin
Journal: J Reg Sci Date: 1982-05

2. A universal model for mobility and migration patterns.

Authors: Filippo Simini; Marta C González; Amos Maritan; Albert-László Barabási
Journal: Nature Date: 2012-02-26 Impact factor: 49.962

3. Multiscale mobility networks and the spatial spreading of infectious diseases.

Authors: Duygu Balcan; Vittoria Colizza; Bruno Gonçalves; Hao Hu; José J Ramasco; Alessandro Vespignani
Journal: Proc Natl Acad Sci U S A Date: 2009-12-14 Impact factor: 11.205

4. The role of population heterogeneity and human mobility in the spread of pandemic influenza.

Authors: Stefano Merler; Marco Ajelli
Journal: Proc Biol Sci Date: 2009-10-28 Impact factor: 5.349

5. Travelling waves in the occurrence of dengue haemorrhagic fever in Thailand.

Authors: Derek A T Cummings; Rafael A Irizarry; Norden E Huang; Timothy P Endy; Ananda Nisalak; Kumnuan Ungchusak; Donald S Burke
Journal: Nature Date: 2004-01-22 Impact factor: 49.962

6. Heterogeneous length of stay of hosts' movements and spatial epidemic spread.

Authors: Chiara Poletto; Michele Tizzoni; Vittoria Colizza
Journal: Sci Rep Date: 2012-06-27 Impact factor: 4.379

7. Evolution of scaling emergence in large-scale spatial epidemic spreading.

Authors: Lin Wang; Xiang Li; Yi-Qing Zhang; Yan Zhang; Kan Zhang
Journal: PLoS One Date: 2011-07-01 Impact factor: 3.240

8. Heterogeneous mobile phone ownership and usage patterns in Kenya.

Authors: Amy Wesolowski; Nathan Eagle; Abdisalan M Noor; Robert W Snow; Caroline O Buckee
Journal: PLoS One Date: 2012-04-25 Impact factor: 3.240

9. Seasonal transmission potential and activity peaks of the new influenza A(H1N1): a Monte Carlo likelihood analysis based on human mobility.

Authors: Duygu Balcan; Hao Hu; Bruno Goncalves; Paolo Bajardi; Chiara Poletto; Jose J Ramasco; Daniela Paolotti; Nicola Perra; Michele Tizzoni; Wouter Van den Broeck; Vittoria Colizza; Alessandro Vespignani
Journal: BMC Med Date: 2009-09-10 Impact factor: 8.775

10. Human mobility in a continuum approach.

Authors: Filippo Simini; Amos Maritan; Zoltán Néda
Journal: PLoS One Date: 2013-03-28 Impact factor: 3.240
