
Multi-scale spatio-temporal analysis of human mobility.

Laura Alessandretti¹, Piotr Sapiezynski², Sune Lehmann^2,3, Andrea Baronchelli¹.

Abstract

The recent availability of digital traces generated by phone calls and online logins has significantly increased the scientific understanding of human mobility. Until now, however, limited data resolution and coverage have hindered a coherent description of human displacements across different spatial and temporal scales. Here, we characterise mobility behaviour across several orders of magnitude by analysing ∼850 individuals' digital traces sampled every ∼16 seconds for 25 months with ∼10 meters spatial resolution. We show that the distributions of distances and waiting times between consecutive locations are best described by log-normal and gamma distributions, respectively, and that natural time-scales emerge from the regularity of human mobility. We point out that log-normal distributions also characterise the patterns of discovery of new places, implying that they are not a simple consequence of the routine of modern life.

Year: 2017 PMID： 28199347 PMCID： PMC5310761 DOI： 10.1371/journal.pone.0171686

Source DB: PubMed Journal: PLoS One ISSN： 1932-6203 Impact factor: 3.240

Introduction

Characterising the statistical properties of individual trajectories is necessary to understand the underlying dynamics of human mobility and design reliable predictive models. A trajectory consists of displacements between locations and pauses at locations, where the individual stops and spends time (Fig 1). Thus, the distribution of waiting times (or pause durations), Δt, between movements and the distribution of distances, Δr, travelled between pauses are often used to quantitatively assess the dynamics of human mobility. For example, specific probability distributions of distances and waiting times characterise different types of diffusion processes. Thanks to the recent availability of data used as proxy for human trajectories including mobile phone call records (CDR), location based social networks (LBSN) data, and GPS trajectories of vehicles, the characteristic distributions of distances and waiting times between consecutive locations have been widely investigated. There is no agreement, however, on which distribution best describes these empirical datasets.

Fig 1

Example of an individual trajectory.

An individual trajectory is composed of pauses (red dots) and displacements (dashed black line). The trajectory shows the positions of one individual across 26 hours. Location is estimated from the individual’s WiFi scans as detailed in the text, and the data is sampled in 1 min bins. Red dots correspond to locations where the individual spent more than 10 consecutive minutes. The coordinates of these locations have been slightly altered to protect the subject’s privacy. The map was generated with the Matplotlib Basemap toolkit for Python (https://pypi.python.org/pypi/basemap). Map data © OpenStreetMap contributors (License: http://www.openstreetmap.org/copyright). Map tiles by Stamen Design, under CC BY 3.0.

Pioneering studies, based on CDR [1, 2] and banknote records [3], found that the distribution of displacements Δr is well approximated by a power-law, $P(\Delta r) \sim \Delta r^{-\beta}$ (or ‘Lévy distribution’ [4], as typically 1 < β < 3), and that an exponential cut-off in the distribution may control boundary effects [2]. These findings were confirmed by studies based on GPS trajectories of individuals [5-7] and vehicles [8, 9], as well as online social network data [10-12]. It has been noted, however, that power-law behaviour may fail to describe intra-urban displacements [13]. Other analyses, based on online social network data [14-16] and GPS trajectories [17-20], showed that the distribution of displacements is well fitted by an exponential curve, $P(\Delta r) \sim e^{-\lambda \Delta r}$, in particular at short distances. Finally, analyses based on GPS trajectories of taxis [21, 22] suggested that displacements may also obey log-normal distributions, $P(\Delta r) \sim (1/\Delta r)\, e^{-(\log \Delta r - \mu)^2 / (2\sigma^2)}$. In Ref. [6], the authors found that this is also the case for single-transportation trips.

Fewer studies have explored the distribution of waiting times between displacements, Δt, as trajectory sampling is often uneven (e.g., in CDR data location is recorded only when the phone user makes a call or texts, and LBSN data include the positions of individuals who actively “check-in” at specific places). Analyses based on evenly sampled trajectories from mobile phone call records [1, 23] and individuals’ GPS trajectories [5, 7] found that the distribution of waiting times can also be approximated by a power-law. A recent study based on GPS trajectories of vehicles, however, suggests that for waiting times larger than 4 hours, this distribution is best approximated by a log-normal function [24]. Several studies have highlighted the presence of natural temporal scales in individual routines: distributions of waiting times display peaks that correspond to the typical times spent at home on a typical day (∼14 hours) and at work (∼3-4 hours for a part-time job and ∼8-9 hours for a full-time job) [23, 25, 26]. Fig 2 and Table 1 compare distributions obtained using different data sources. The spectrum of results reflects the heterogeneity of the considered datasets (see Fig 2). It is known, in fact, that the spatio-temporal resolution and coverage of the data have an important influence on the results of the analyses performed [27-29].

Fig 2

The distribution of displacements P(Δr): heterogeneity of results found in the literature.

Each horizontal line corresponds to a different dataset. Lines extend from the minimum Δr (i.e. the spatial resolution of the data or the minimum value considered for the fit of the distribution), to the maximal length of displacement considered (both in meters). Colours correspond to the model fitting P(Δr) according to the study reported at the end of each line. If the distribution is not unique, but varies for different ranges of Δr, the line is divided in segments. Lines are marked with ‘∗’ if the corresponding data is modelled as a sequence of two distributions of the same type with different parameters, for different ranges Δr. Refs [2, 6, 18, 30] analyse more than one dataset. In [13] the authors analyse the same dataset for different ranges Δr. A more detailed table is presented in section “Related Work”.

Table 1

Distribution of waiting times and displacements: A comparison of over 30 datasets on human mobility.

Ref.	Data type	N	Dur.	Range Δx	P(Δx)	Sampling δt	P(Δt)
[1] (D1)	CDR	3.0 ⋅ 10⁶	1 Y	1 km to 100 km	power-law (T), β = 1.55	uneven
[1] (D2)	CDR	10³	2 W	1 km to 100 km		1 h	power-law (T), β = 1.80
[2] (D1)	CDR	10⁵	6 M	1 km to 1000 km	power-law (T), β = 1.75	uneven
[2] (D2)	CDR	206	1 W	1 km to 500 km	power-law (T), β = 1.75	2 h
[3]	Bills records	4.6 ⋅ 10⁵ bills	1.39 Y	100 m to 3200 km	10 ≤ Δx ≤ 3200 km: power-law, β = 1.59	uneven
[5] (Geolife)	GPS	32	3.42 Y	10 m to 10000 km	0.01 ≤ Δx ≤ 10 km: power-law, β₀ = 1.25; 10 < Δx ≤ 10000 km: power-law, β₁ = 1.90	2 min	power-law, β = 1.98
[6] (Nokia)	GPS	200	1.50 Y	100 m to 10 km	power-law (T), β = 1.39	10 sec
[6] (Geolife)	GPS	182	5.00 Y	100 m to 10 km	power-law (T), β = 1.57	1-5 sec
[7] (5 datasets)	GPS	101	5 M	10 m to 10 km	power-law (T), β = [1.35-1.82]	10 sec	power-law (T), β = [1.45-2.68]
[8]	Taxi (GPS)	50	6 M	1 km to 100 km	3 ≤ Δx ≤ 23 km: power-law, β₀ = 2.50; 23 < Δx ≤ 100 km: power-law, β₁ = 4.60	10 sec
[9]	Taxi (GPS)	6.6 ⋅ 10³	1 W	1 km to 100 km	power-law (T), β = 1.20	10 sec
[10]	Flickr	4.0 ⋅ 10⁴		1 km to 10000 km	power-law (T)	uneven
[11]	LBSN	2.2 ⋅ 10⁵	4 M	1 km to 500 km	power-law, β = 1.88	uneven
[12]	Twitter	1.3 ⋅ 10⁷	1 Y	1 km to 100 km	power-law, β = 1.62	uneven
[13]	LBSN	9.2 ⋅ 10⁵	6 M	1 km to 20000 km	power-law, β = 1.50	uneven
[13] (intracity)	LBSN	9.2 ⋅ 10⁵	6 M	10 m to 100 km	power-law (“poor” [13]), β = 4.67	uneven
[14]	LBSN	2.6 ⋅ 10⁵	1 Y	10 m to 50 km	exponential, λ = 0.179	uneven
[15]	LBSN	5.2 ⋅ 10⁵	1 Y	1 km to 4000 km	exponential, λ = 0.003	uneven
[16]	Twitter	1.6 ⋅ 10⁵	8 M	10 m to 4000 km	0.01 ≤ Δx ≤ 0.1 km: exponential, λ = 0.073; 0.1 < Δx ≤ 100 km: stretched power-law, β₁ = 0.45; 100 < Δx ≤ 4000 km: power-law, β₂ = 1.32	uneven
[17]	Taxi (GPS)	803	1.25 Y	1 km to 100 km	Δx ≤ 15 km: exponential, λ = 0.36; 15 < Δx ≤ 100 km: power-law, β = 3.66	30 sec
[18] (D1)	Taxi (GPS)	10⁴	3 M	1 km to 100 km	1 ≤ Δx ≤ 20 km: exponential, λ₀ = 0.23; 20 < Δx ≤ 100 km: exponential, λ₁ = 0.17	1 min
[18] (D2)	Taxi (GPS)	10⁴	2 M	1 km to 100 km	1 ≤ Δx ≤ 20 km: exponential, λ₀ = 0.24; 20 < Δx ≤ 100 km: exponential, λ₁ = 0.18	1 min
[19]	Taxi (GPS)	6.6 ⋅ 10³	1 W	2 km to 20 km	exponential, λ = [0.072-0.252]	10 sec
[20] (3 datasets)	Taxi (GPS)	10⁴	1 M	600 m to 10 km	exponential	[9-177] s	power-law
[21] (6 datasets)	Taxi (GPS)	3.0 ⋅ 10⁴	[1 M-2 Y]	1 km to 100 km	log-normal, μ = [0.77-1.32], σ = [0.67-0.87]	[24-116] s
[22]	Taxi (GPS)	1.1 ⋅ 10³	6 M	100 m to 30 km	log-normal, μ = 0.38, σ = 0.48	30 sec
[23]	Surveys	10⁴	1 Y			self-reported	power-law (T), β = 0.49
[24]	Private Cars (GPS)	7.8 ⋅ 10⁵	1 M	1 km to 500 km	superimposition Poisson	10 sec	Δt ≤ 4 h: power-law, β = 1.03; 1 ≤ Δt ≤ 200 h: log-normal, μ = 1.60, σ = 1.60
[26]	Private Cars (GPS)	3.5 ⋅ 10⁴	1 M	300 m to 100 km	polynomial	10 sec	power-law, β = 0.97
[30] (D1)	CDR	1.3 ⋅ 10⁶	1 M	1 km to 200 km	power-law, β = 2.02	uneven
[30] (D2)	CDR	6 ⋅ 10⁶	1 Y	1 km to 500 km	power-law, β = 1.75	uneven
[30] (D3)	CDR		4 Y	1 km to 100 km	power-law, β = 1.80	uneven
[34]	Travel cards	2.0 ⋅ 10⁶	1 W	100 m to 50 km	negative binomial, μ = 9.28, σ = 5.83	uneven
[42]	Travel diaries	230	1.5 M	1 km to 400 km	power-law (T), β = 1.05	self-reported
[56]	Private Cars (GPS)	7.5 ⋅ 10⁴	1 M	10 m to 500 km	0.01 ≤ Δx ≤ 20 km: exponential; 20 < Δx ≤ 150 km: power-law, β₁ = 3.30	30 sec	Δt ≤ 3 h: exponential, λ = 1.02
[57]	Taxi (GPS)		1 D	200 m to 1000 km	power-law, β = 2.70

The table reports for each dataset: the reference to the journal article/book where the study was published, the type of data (LBSN stands for Location Based Social Networks, CDR for Call Detail Record), the number of individuals (or vehicles in the case of car/taxi data) involved in the data collection, the duration of the data collection (M → months, Y → years, D → days, W → weeks), the minimum and maximum length of spatial displacements, the shape of the probability distribution of displacements with the corresponding parameters, the temporal sampling, and the shape of the distribution of waiting times with the corresponding parameters. Power-law (T) indicates a truncated power-law. The table can also be found at http://lauraalessandretti.weebly.com/plosmobilityreview.html.

First, the datasets considered have different spatial resolution and coverage, and few studies have so far considered the whole range of displacements occurring between ∼10 m and 10⁷ m (10000 km) (Fig 2). Our analysis suggests that constraining the analysis to a specific distance range may result in different interpretations of the distributions. Another difference concerns the temporal sampling in the datasets analysed so far. Uneven sampling, typical of CDR and LBSN data, (i) does not make it possible to distinguish phases of displacement from pauses, since individuals could also be active while transiting between locations, and (ii) may fail to capture patterns other than regular ones [31, 32], because individuals’ voice-call/SMS/data activity may be higher in certain preferred locations. Finally, studies focusing on displacements made using one or several specific transportation modalities (private car [24, 33], taxi [20], public transportation [34], or walking [7]) capture only a specific aspect of human mobility behaviour.

In this paper, we analyse mobility patterns of ∼850 individuals involved in the Copenhagen Network Study experiment for over 2 years [35]. Individual trajectories are determined by combining GPS and Wi-Fi scan data, resulting in a spatial resolution of ∼10 m and even sampling every ∼16 s. Trajectories span more than ∼10⁷ m. Previous studies with comparable spatial coverage (Fig 2) relied on single-transportation modality data [8], unevenly sampled data [16], or small samples (32 individuals in Ref. [5]). To our knowledge, the Copenhagen Network Study data has the best combination of spatio-temporal resolution and sample size among the datasets analysed in the literature to date (see Methods).

Results

We consider an individual to be pausing when he/she spends at least 10 consecutive minutes in the same location, and moving in the complementary case. In the following, we refer to locations as places where individuals pause. The distribution of displacements is robust with respect to variations of the pausing parameter (see S1 File for the results obtained with 15 and 20 minutes pausing).

We start by considering the three distributions most frequently reported in the literature (Table 1), namely:

The log-normal distribution of a random variable x, with parameters σ and μ, defined for σ > 0 and x > 0, with probability density function:

$$P(x) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left(-\frac{(\log x - \mu)^2}{2\sigma^2}\right) \qquad (1)$$

The Pareto distribution (i.e. power-law) of a random variable x, with parameter β, defined for x ≥ 1 and β > 1, with probability density function:

$$P(x) = (\beta - 1)\, x^{-\beta} \qquad (2)$$

The exponential distribution of a random variable x, with parameter λ, where x ≥ 0 and λ > 0, with probability density function:

$$P(x) = \lambda e^{-\lambda x} \qquad (3)$$

In Eq (2) the probability density can be shifted by x0 and/or scaled by s, as P(x) is identically equivalent to P(y)/s, with y = (x − x0)/s. In Eqs (1) and (3), P(x) is identically equivalent to P(y), with y = (x − x0). In this work, the shift (x0) and scale (s) parameters are considered as additional parameters to take into account the data resolution. With few exceptions, the results presented below also hold when no shift is imposed, x0 = 0 (see S1 File). Note also that Pareto distributions with exponential cut-off (or truncated Pareto) are considered below (see also Table 1).
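The pausing rule stated at the start of this section can be illustrated with a minimal Python sketch. It assumes the trajectory has already been reduced to one location identifier per 1-minute bin, and that a bin with unknown position (marked `None` here) interrupts a pause; both the input format and the function names are illustrative assumptions, not the pipeline used in the study.

```python
from itertools import groupby

MIN_PAUSE_BINS = 10  # at least 10 consecutive 1-minute bins in the same location


def extract_pauses(binned_locations, min_bins=MIN_PAUSE_BINS):
    """Return (location, start_bin, n_bins) for each pause in the sequence.

    binned_locations holds one location id per 1-minute bin. None marks a bin
    with no known position (assumption: such a gap ends the current run).
    """
    pauses = []
    start = 0
    for location, run in groupby(binned_locations):
        length = sum(1 for _ in run)
        if location is not None and length >= min_bins:
            pauses.append((location, start, length))
        start += length
    return pauses


def waiting_times_minutes(pauses):
    """Pause durations in minutes (one bin is one minute)."""
    return [n_bins for _, _, n_bins in pauses]


# Hypothetical trace: 12 min at home, 3 min in transit, 25 min on campus,
# 2 min without data, 9 min in a cafe (too short to count as a pause).
trace = ["home"] * 12 + ["street"] * 3 + ["campus"] * 25 + [None] * 2 + ["cafe"] * 9
pauses = extract_pauses(trace)
durations = waiting_times_minutes(pauses)
```

Changing `min_bins` to 15 or 20 reproduces the kind of robustness check mentioned in the text, under the same assumptions.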

Distribution of displacements

We start our analysis by investigating the distribution of displacements between consecutive stop-locations, P(Δr). First, we consider the overall distribution of the displacements Δr using all available data (851 individuals over 25 months). We find that P(Δr) is best described by a log-normal distribution (Eq (1)) with parameters μ = 6.78 ± 0.07 and σ = 2.45 ± 0.04. Among the three models considered, the log-normal minimises the Akaike Information Criterion (see Methods), with Akaike weight ∼1 (Fig 3, see also S1 File).
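To give the fitted parameters a physical reading, the short sketch below converts μ and σ into the median, mean and mode of a log-normal distribution. It assumes that Δr is measured in metres, that the logarithm in Eq (1) is natural, and that no shift is applied (x0 = 0); none of these is stated explicitly alongside the fit, so the resulting numbers are indicative only. Under these assumptions the median displacement is roughly 880 m.

```python
import math

MU, SIGMA = 6.78, 2.45  # fitted values reported in the text


def lognormal_summary(mu, sigma):
    """Median, mean and mode of a log-normal with parameters mu and sigma."""
    return {
        "median": math.exp(mu),
        "mean": math.exp(mu + sigma ** 2 / 2),
        "mode": math.exp(mu - sigma ** 2),
    }


summary = lognormal_summary(MU, SIGMA)
for name, value in summary.items():
    print(f"{name}: {value:.1f} m")
```

The large gap between median and mean reflects the heavy right tail of the distribution.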

Fig 3

Distribution of displacements.

Blue dotted line: data. Black dashed line: log-normal fit with characteristic parameter μ and σ. Red dashed line: Pareto fit with characteristic parameter β for Δr > 7420 m.

Second, we investigate whether this result also holds for sub-samples of the entire dataset. We bootstrap data 1000 times for samples of 200 and 100 individuals, and we verify that the best distribution is log-normal for all samples, and that the average parameters inferred through the bootstrapping procedure are consistent with the parameters found for the entire dataset (see S1 File). In fact, the errors on the value of the parameters reported above are computed by bootstrapping data for samples of 100 randomly selected individuals. This analysis ensures homogeneity within the population considered, and also takes into account that smaller sample sizes were often analysed in previous literature.

Third, we zoom in to the individual level. We find that the individual distribution of displacements is best described by a log-normal function for 96.2% of individuals. The best distribution is the Pareto distribution for 1.4%, and exponential for the remaining 2.4%. However, the number of data points per individual tends to be significantly lower in the group of individuals exhibiting Pareto or exponential distributions, so one should be cautious in interpreting the observed deviations from a log-normal distribution. Fig 4 reports the histogram of the individual μ parameters for the 96.2% of the population that is best described by a log-normal distribution, along with three examples of individual distributions.
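The bootstrap over individuals can be sketched as follows. This is a hedged illustration on synthetic data: it draws individuals without replacement (the text does not say which resampling scheme was used), pools their displacements, and fits a log-normal with the closed-form maximum-likelihood estimates rather than the numerical optimisation described in Methods. The function names and the synthetic dataset are hypothetical.

```python
import math
import random
import statistics


def fit_lognormal(values):
    """Closed-form maximum-likelihood (mu, sigma), with no shift or scale."""
    logs = [math.log(v) for v in values]
    return statistics.fmean(logs), statistics.pstdev(logs)


def bootstrap_lognormal(displacements_by_user, sample_size, n_rounds, seed=0):
    """Mean and standard deviation of (mu, sigma) across random user samples."""
    rng = random.Random(seed)
    users = sorted(displacements_by_user)
    mus, sigmas = [], []
    for _ in range(n_rounds):
        chosen = rng.sample(users, sample_size)  # assumption: no replacement
        pooled = [d for user in chosen for d in displacements_by_user[user]]
        mu, sigma = fit_lognormal(pooled)
        mus.append(mu)
        sigmas.append(sigma)
    return {
        "mu": (statistics.fmean(mus), statistics.stdev(mus)),
        "sigma": (statistics.fmean(sigmas), statistics.stdev(sigmas)),
    }


# Synthetic stand-in for the real data: 300 users, 40 displacements each.
data_rng = random.Random(42)
synthetic = {
    user: [data_rng.lognormvariate(6.78, 2.45) for _ in range(40)]
    for user in range(300)
}
# 200 rounds keep the example fast; the text reports 1000.
result = bootstrap_lognormal(synthetic, sample_size=100, n_rounds=200)
```

The second element of each tuple plays the role of the ± error quoted for the parameters.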

Fig 4

Distribution of individual displacements.

A) Frequency histogram of the 96.2% of individuals for which the individual distribution of displacements is log-normal, according to the value of the log-normal fit coefficient μ. B-C-D) Examples of the distribution of displacements P(Δr) of three individuals i1 (B), i2 (C), i3 (D) (dotted line), with the corresponding log-normal fit (dashed line). The values of the fit coefficients μ and σ are reported in each subfigure.

Finally, we look at large Δr in order to compare our results with previous studies relying on data with coarser spatial resolution. We find that limiting the analysis to large values of Δr results in the selection of a Pareto distribution (Eq (2)). We identify the threshold Δr∗ = 7420 m as the minimal resolution for which the best fit in Δr∗ < Δr < 10⁷ m is Pareto with coefficient β = 1.81 ± 0.03 and not log-normal. By bootstrapping 1000 times over samples of 100 individuals we find that . Thus, power-law distributions describe mobility behaviour only for large enough distances, while mobility patterns including distances smaller than 7420 m are better described by log-normal distributions.
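For a power-law tail of the form P(Δr) ∝ Δr^(−β) above a threshold, the exponent has a closed-form maximum-likelihood estimate, β = 1 + n / Σ log(Δr_i / Δr∗). The sketch below applies it to synthetic data. It is a simplified stand-in: the study selects the model through numerical likelihood maximisation with additional shift and scale parameters, and the threshold is taken here as given rather than estimated.

```python
import math
import random


def pareto_tail_exponent(values, x_min):
    """MLE of beta for P(x) ~ x**(-beta) on x > x_min, with its standard error."""
    tail = [v for v in values if v > x_min]
    if len(tail) < 2:
        raise ValueError("not enough points above the threshold")
    log_sum = sum(math.log(v / x_min) for v in tail)
    beta = 1.0 + len(tail) / log_sum
    std_err = (beta - 1.0) / math.sqrt(len(tail))
    return beta, std_err, len(tail)


# Synthetic tail: paretovariate(a) has density a / x**(a + 1) for x >= 1,
# so a = 0.81 corresponds to beta = 1.81 in the notation of the text.
X_MIN = 7420.0
rng = random.Random(1)
sample = [X_MIN * rng.paretovariate(0.81) for _ in range(20000)]
beta, std_err, n_tail = pareto_tail_exponent(sample, X_MIN)
print(f"beta = {beta:.3f} +/- {std_err:.3f} from {n_tail} points")
```

Repeating the estimate for a range of thresholds is one way to explore how the inferred exponent depends on the cut.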

Distribution of waiting times

We now analyse the distribution of waiting times between displacements. The best model describing the distribution of waiting times over all individuals is the log-normal distribution (Eq (1), Fig 5, see also S1 File), with parameters μ = −0.42 ± 0.04, σ = 2.14 ± 0.02. As above, errors are found by bootstrapping over samples of 100 individuals. Also, by bootstrapping we find that the log-normal distribution is the best descriptor for samples of 200 and 100 randomly selected individuals (see S1 File). As in the case of displacements, we find that restricting the analysis to large values of our observable Δt, and specifically considering only Δt > Δt∗ = 13 h, results in the selection of the Pareto distribution (Eq (2), see Fig 5), with coefficient β = 1.44 ± 0.01. We find by averaging over 100 samples of 200 individuals that . Note that the log-normal distribution is selected as the best model also when the analysis is restricted to Δt < Δt∗.

Fig 5

Distribution of waiting times between displacements.

Yellow dotted line: data. Black dashed line: Log-normal fit with characteristic parameter μ and σ. Red dashed line: Pareto fit with characteristic parameter β for Δt > 13 h.

The distribution of waiting times also shows the existence of “natural time-scales” of human mobility. We detect local maxima of the distribution at 14.0, 39.3, 64.8, and 89.9 hours. Hence, 14 hours is the typical amount of time that students in the experiment spent at home every day, in agreement with previous analyses on human mobility [23, 25, 26]. Other peaks appear for intervals Δt ≈ 14 + n ⋅ 24, with n = {2, 3…}, suggesting individuals spend several days at home. Notice also that the distribution we consider is limited to Δt < 5 days, an interval much shorter than the observation time-window (about 2 years), a fact that guarantees the absence of possible spurious effects [29]. This limit is imposed to control for the cases in which students leave their phones at home. The upper bound is arbitrarily set to 5 days; however, we have verified that results are robust with respect to variations of this choice.

Distribution of displacements between discoveries

Log-normal features also characterise patterns of exploration. We consider the temporal sequence of stop-locations that individuals visit for the first time (within our observational window) and characterise the distributions of displacements between these ‘discoveries’. We find that the distribution of distances between consecutive discoveries P(Δr) is best described as a log-normal distribution with parameters μ = 6.59 ± 0.02, σ = 1.99 ± 0.01 (Fig 6, see also S1 File). For Δr > 2800 m, the best model fitting the distribution of displacements is the Pareto distribution with coefficient β = 2.07 ± 0.02. These results are verified by bootstrapping (see S1 File).
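A minimal sketch of how displacements between discoveries could be extracted from a sequence of visits is given below. It assumes that a displacement between discoveries is the great-circle distance between two consecutively discovered locations, which is one reading of the definition above; the location names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def discovery_displacements(visits, coords):
    """Return first-time visits in order and the distances between them.

    visits: time-ordered stop-location ids; coords: id -> (lat, lon).
    """
    seen = set()
    discoveries = []
    for location in visits:
        if location not in seen:
            seen.add(location)
            discoveries.append(location)
    distances = [
        haversine_m(coords[a], coords[b])
        for a, b in zip(discoveries, discoveries[1:])
    ]
    return discoveries, distances


# Hypothetical coordinates and visit sequence, for illustration only.
coords = {
    "home": (55.700, 12.550),
    "dtu": (55.786, 12.524),
    "cafe": (55.690, 12.570),
    "park": (55.680, 12.580),
}
visits = ["home", "dtu", "home", "cafe", "dtu", "park"]
discoveries, distances = discovery_displacements(visits, coords)
```

Returns to known places ("home" and "dtu" the second time) are skipped, so only exploration steps contribute to the distribution.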

Fig 6

Distribution of displacements between discoveries.

Green dotted line: data. Black dashed line: Log-normal fit with characteristic parameter μ and σ. Red dashed line: Pareto fit with characteristic parameter β for Δr > 2800 m.

Correlations between pauses and displacements

We further investigate the properties of individual trajectories by analysing how the distance Δr of a displacement correlates with its duration Δt and with the waiting time Δt spent at destination. Fig 7A shows a positive correlation between Δr and the displacement duration Δt for Δr ≳ 300 m (p < 0.01). As Δr is the distance between the displacement origin and destination, the absence of correlation at short distances could be due to individuals not taking the fastest route. A positive correlation also characterises the distance Δr covered between origin and destination and the waiting time at destination for distances 30 m ≲ Δr ≲ 10⁴ m (p < 0.01). Instead, the correlation is negative for distances larger than 5 × 10⁴ m (Fig 7B). This could suggest that individuals break long trips with short pauses. We have verified that these results also hold when individuals’ most important locations (typically including university and home) are removed from the trajectory, implying that these correlations are not dominated by daily commuting.

Fig 7

Correlations between displacements and pauses.

A) The duration Δt of a displacement vs the distance Δr between origin and destination. The blue line is the median value of Δr and Δt computed within log-spaced 2-dimensional bins. The filled blue area corresponds to the 25-75 percentile range. The value of the Pearson correlation coefficient within the shaded grey area indicates a positive correlation, with p − value < 0.01. The dashed line is a power-law function with coefficient β, as a guide for the eye. B) The waiting time Δt at destination vs the distance Δr between origin and destination. The blue line is the median value of Δr and Δt computed within log-spaced 2-dimensional bins. The filled blue area corresponds to the 25-75 percentile range. The value of the Pearson correlation coefficient within the shaded grey area indicates a positive correlation, with p − value < 0.01. The dashed line is a power-law function with coefficient β, as a guide for the eye.
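The kind of summary shown in Fig 7 can be approximated with the sketch below, which groups points into log-spaced distance bins, reports the median and the 25-75 percentile range of the second variable in each bin, and computes a Pearson coefficient. It is a simplified reading of the caption: bins are taken along Δr only, the correlation is computed on log-transformed values, and the data are synthetic; all three choices are assumptions.

```python
import math
import statistics
from collections import defaultdict


def log_binned_quartiles(x, y, bins_per_decade=4):
    """Rows of (bin centre, 25th percentile, median, 75th percentile) of y."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        if xi > 0:
            groups[math.floor(math.log10(xi) * bins_per_decade)].append(yi)
    rows = []
    for k in sorted(groups):
        ys = sorted(groups[k])
        centre = 10 ** ((k + 0.5) / bins_per_decade)
        if len(ys) >= 2:
            q1, _, q3 = statistics.quantiles(ys, n=4)
        else:
            q1 = q3 = ys[0]
        rows.append((centre, q1, statistics.median(ys), q3))
    return rows


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


# Synthetic example: durations grow as a power of distance (exponent 0.7).
distances = [10 ** (1 + i / 20) for i in range(80)]
durations = [60 * (d / 100) ** 0.7 for d in distances]
rows = log_binned_quartiles(distances, durations)
r = pearson([math.log(d) for d in distances], [math.log(t) for t in durations])
```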

Further analysis: Selection of the best model among 68 distributions

In the previous sections we have restricted the analysis of the distributions of displacements and waiting times to the three functional forms that are most frequently found in the literature. We now repeat the selection procedure considering a list of 68 models (see S1 File for the list of distributions) in order to confirm the results described above. The distributions of displacements and displacements between discoveries are best described by log-normal distributions also when the choice is extended to 68 models, and tails (respectively for Δr > Δr∗ = 7420 m and Δr > Δr∗ = 2800 m) are better modelled as a generalised Pareto distribution, with form:

$$P(x) = (1 + \xi x)^{-1 - 1/\xi}$$

where ξ is the parameter of the model, such that x ≥ 0 if ξ ≥ 0, and 0 ≤ x ≤ −1/ξ if ξ < 0. The best model selected for the whole distribution of waiting times among the 68 models considered is a gamma distribution, defined for x ∈ (0, ∞), k > 0 and θ > 0 as:

$$P(x) = \frac{x^{k-1} e^{-x/\theta}}{\theta^{k}\, \Gamma(k)}$$

where $\Gamma(k) = \int_0^{\infty} t^{k-1} e^{-t}\, dt$. Although the gamma distribution is the best model for the distribution of waiting times (see S1 File for the result of the fit), the presence of natural scales could indicate that the whole distribution may be better described as the composition of several models.
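As a sanity check on the two functional forms named above, the sketch below writes the standard generalised Pareto density, (1 + ξx)^(−1−1/ξ), and the standard gamma density, x^(k−1) e^(−x/θ) / (θ^k Γ(k)), in plain Python and verifies numerically that each integrates to approximately one. These are the textbook parameterisations, assumed here to match the ones used in the study; the parameter values are arbitrary.

```python
import math


def gamma_pdf(x, k, theta):
    """Gamma density with shape k > 0 and scale theta > 0."""
    if x <= 0:
        return 0.0
    log_p = (k - 1) * math.log(x) - x / theta - k * math.log(theta) - math.lgamma(k)
    return math.exp(log_p)


def genpareto_pdf(x, xi):
    """Generalised Pareto density with shape xi (no shift, unit scale)."""
    if x < 0 or (xi < 0 and x >= -1 / xi):
        return 0.0
    if xi == 0:
        return math.exp(-x)  # limiting case: exponential
    return (1 + xi * x) ** (-1 - 1 / xi)


def midpoint_integral(f, a, b, steps):
    """Simple midpoint-rule quadrature of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


gamma_mass = midpoint_integral(lambda x: gamma_pdf(x, 2.0, 1.5), 0.0, 60.0, 60_000)
gpd_mass = midpoint_integral(lambda x: genpareto_pdf(x, 0.5), 0.0, 2000.0, 200_000)
```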

Discussion

Using high resolution data we have characterised human mobility patterns across a wide range of scales. We have shown that both the distribution of displacements and waiting times between displacements are best described by a log-normal distribution. We found, however, that power-law distributions are selected as the best model when only large spatial or temporal scales are considered, thus explaining (at least partially) the disagreement between previous studies. We also showed that log-normal distributions characterise the distribution of displacements between discoveries, implying that this property is not a simple consequence of the stability of human mobility but a characteristic feature of human behaviour. Finally, we have shown that there exist correlations between the length of a displacement and the waiting time at destination.

The heavy-tailed nature of human mobility has been attributed to various factors, including differences between individual trajectories [36], search optimisation [37-40], the hierarchical organisation of the street network [41] and of the transportation system [6, 24, 42]. On the other hand, log-normal distributions can result from multiplicative [43] and additive [44] processes and describe the inter-event time of different human activities such as writing emails, commenting/voting on online content [45] and creating friendship relations on online social networks [46]. Instead, the distribution of inter-event time in mobile-phone call communication activity can be described as the composition of power-laws [47-49], a feature attributed to the existence of characteristic scales in communication activity such as the time needed to answer a call, as well as the existence of circadian, weekly and monthly patterns. We also find clear signatures of circadian patterns, which could indicate that the whole distribution may be better described as the composition of several models. However, in our case the best description for times including Δt < Δt* is the gamma distribution, which thus is selected both when the whole range of scales is considered and when the analysis is restricted to short times.

Our results come from the analysis of a sample of ∼850 university students, who of course represent a very specific sample of the whole population. Nevertheless, it is worth noting that many statistical properties of CNS students’ mobility patterns are consistent with previous results, such as the distribution of the radius of gyration, the Zipf-like behaviour of the frequency-rank plot of individual locations, and the power-law tail of the distribution of displacements (β = 1.81 ± 0.03 vs. β = 1.75 ± 0.15 of [2]). Details are reported in the Supplementary Information of [50]. While identifying the mechanism responsible for the observed mobility patterns is beyond the scope of the present article, we anticipate that a more complete spatio-temporal description of human mobility will help us develop better models of human mobility behaviour [24, 51]. Our findings can also help the understanding of phenomena such as the spreading of epidemics at different spatial resolutions, since the heterogeneous nature of waiting times between displacements has a major impact on the spreading of diseases [52].

Methods

Data description and pre-processing

The Copenhagen Network Study data collection took place between September 2013 and February 2016 and involved 851 students of the Technical University of Denmark (DTU) in Copenhagen. Data collection was approved by the Danish Data Protection Agency. All participants provided informed consent by filling in an on-line consent form, and all methods were performed in accordance with the relevant guidelines and regulations. Individual trajectories were inferred by combining WiFi scan data and GPS scan data recorded on smartphones handed out to all participants. An anthropological field study included in the 2013 deployment of the experiment reported that participants did not alter their habits due to participation in the CNS experiment.

The WiFi scan data provides a time-series of wireless network scans performed by participants’ mobile devices. Each record (i, t, SSID, BSSID, RSSI) indicates the participant identifier (i), the timestamp in seconds (t), the name of the wireless network scanned (SSID), the unique identifier of the access point (AP) providing access to the wireless network (BSSID), and the signal strength in dBm (RSSI). APs do not have geographical coordinates attached, but their position tends to be fixed. The geographical position of APs is estimated with the procedure described in S1 File, which uses participants’ sequences of GPS scans to obtain AP locations and remove mobile APs. Then, we clustered geo-localised APs into “locations” using a graph-based approach. With our definition, a “location” is a connected component in the graph G, where a link exists between two APs if their distance is smaller than a threshold d (see [50], SI for more details). Here, we present results obtained for d = 2 m. However, results are robust with respect to the choice of the threshold (see also [50]).

Throughout the experiment, participants’ devices scanned for WiFi every Δt seconds. For 90% of the population, the median time between scans lies between 16 s and 60 s (see also [50], SI). Data was temporally aggregated in bins of length Δt = 60 s, since we focus here on the pauses between moves. If a participant visits more than one location within a time bin, we assign to that bin the location in which they spent the most time. Given our definition of location and the given time-binning, the median daily time coverage (the fraction of minutes/day that an individual’s position is known, where the median is taken across all days) lies between 0.6 and 0.98 for 90% of the population.
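The two pre-processing steps described above, grouping access points into locations as connected components and assigning one location per 60 s bin, can be sketched as follows. The sketch assumes AP positions are already projected to planar coordinates in metres, uses a brute-force pairwise distance check (adequate only for small examples), and approximates "most time spent" by the number of scans per location within a bin. All identifiers and data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cluster_access_points(positions, d):
    """Map each AP to a location label: connected components of the graph
    linking two APs whenever their distance is smaller than d.

    positions: AP id -> (x, y) in metres (assumed planar projection).
    """
    parent = {ap: ap for ap in positions}

    def find(ap):
        while parent[ap] != ap:
            parent[ap] = parent[parent[ap]]  # path halving
            ap = parent[ap]
        return ap

    aps = list(positions)
    for i, a in enumerate(aps):
        for b in aps[i + 1:]:
            if math.dist(positions[a], positions[b]) < d:
                parent[find(a)] = find(b)
    return {ap: find(ap) for ap in aps}


def majority_location_per_bin(observations, bin_seconds=60):
    """Assign to each time bin the location seen in most scans.

    observations: iterable of (timestamp_s, location_id), one per scan.
    """
    counts = defaultdict(Counter)
    for timestamp, location in observations:
        counts[int(timestamp // bin_seconds)][location] += 1
    return {b: c.most_common(1)[0][0] for b, c in sorted(counts.items())}


# Hypothetical APs: a, b, c form a chain with 1.5 m spacing; e is far away.
positions = {"a": (0.0, 0.0), "b": (1.5, 0.0), "c": (3.0, 0.0), "e": (50.0, 50.0)}
labels = cluster_access_points(positions, d=2.0)

scans = [(0, "L1"), (16, "L1"), (32, "L2"), (48, "L1"), (64, "L2"), (80, "L2")]
binned = majority_location_per_bin(scans)
```

Note that APs "a" and "c" end up in the same location even though they are 3 m apart, because they are linked through "b"; this chaining is a property of the connected-component definition.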

Model selection

The best model is selected using Akaike weights [53]. First, we determine the best fit parameters for each of the models via Nelder-Mead numerical likelihood maximisation [54] (maximisation is considered to fail if convergence with tolerance t = 0.0001 is not reached after 200 ⋅ N iterations, where N is the length of the data). For each model m, we compute the Akaike Information Criterion:

$$\mathrm{AIC}_m = -2 \log L_m + 2 V_m + \frac{2 V_m (V_m + 1)}{n - V_m - 1}$$

where L_m is the maximum likelihood for the candidate model m, V_m is the number of free parameters in the model, and n is the sample size. The AIC reaches its minimum value AIC_min for the model that minimises the expected information loss. Thus, AIC rewards descriptive accuracy via the maximum likelihood and penalises models with a large number of parameters. The Akaike weight w_m(AIC) of a model m corresponds to its relative likelihood with respect to a set of possible models. Measuring the Akaike weights allows us to compare the descriptive power of several models. For all distributions considered in this paper, we found one model m∗ such that w_m∗ ∼ 1 (which implies all the other models have Akaike weight very close to 0).
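The selection procedure can be illustrated with a compact sketch. It assumes the small-sample corrected form of the criterion, AIC = −2 log L + 2V + 2V(V + 1)/(n − V − 1), and the usual definition of Akaike weights, w_m = exp(−Δ_m/2) / Σ_k exp(−Δ_k/2) with Δ_m = AIC_m − AIC_min. For brevity it uses closed-form maximum-likelihood fits without shift or scale parameters instead of Nelder-Mead optimisation, and synthetic data; it is an illustration of the idea, not the code used in the study.

```python
import math
import random


def aicc(log_likelihood, n_params, n):
    """Small-sample corrected Akaike Information Criterion."""
    return (-2 * log_likelihood + 2 * n_params
            + 2 * n_params * (n_params + 1) / (n - n_params - 1))


def akaike_weights(aic_by_model):
    """Relative likelihood of each model, normalised to sum to one."""
    best = min(aic_by_model.values())
    rel = {m: math.exp(-0.5 * (a - best)) for m, a in aic_by_model.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


def loglik_lognormal(x):
    logs = [math.log(v) for v in x]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((l - mu) ** 2 for l in logs) / len(logs))
    return sum(-l - math.log(sigma) - 0.5 * math.log(2 * math.pi)
               - (l - mu) ** 2 / (2 * sigma ** 2) for l in logs)


def loglik_exponential(x):
    lam = len(x) / sum(x)
    return len(x) * math.log(lam) - lam * sum(x)


def loglik_pareto(x):
    # Density (beta - 1) * x**(-beta), defined for x >= 1.
    log_sum = sum(math.log(v) for v in x)
    beta = 1 + len(x) / log_sum
    return len(x) * math.log(beta - 1) - beta * log_sum


rng = random.Random(7)
# Synthetic log-normal sample; values below 1 are dropped so that the
# Pareto density is defined on every point.
data = [v for v in (rng.lognormvariate(6.78, 2.45) for _ in range(2000)) if v >= 1.0]
n = len(data)
aics = {
    "log-normal": aicc(loglik_lognormal(data), 2, n),
    "exponential": aicc(loglik_exponential(data), 1, n),
    "Pareto": aicc(loglik_pareto(data), 1, n),
}
weights = akaike_weights(aics)
```

On data drawn from a log-normal, the log-normal model should receive a weight very close to 1, mirroring the situation described in the text.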

Figures

All figures were generated using the Matplotlib [55] package (version 1.5.3) for Python.

Related work

We present here more detailed analysis of the literature discussed in the paper.

Supporting figures and tables (PDF).
