Literature DB >> 27493764

Is this scaling nonlinear?

J C Leitão¹, J M Miotto¹, M Gerlach¹, E G Altmann¹.

Abstract

One of the most celebrated findings in complex systems in the last decade is that different indexes y (e.g. patents) scale nonlinearly with the population x of the cities in which they appear, i.e. y∼x (β) ,β≠1. More recently, the generality of this finding has been questioned in studies that used new databases and different definitions of city boundaries. In this paper, we investigate the existence of nonlinear scaling, using a probabilistic framework in which fluctuations are accounted for explicitly. In particular, we show that this allows not only to (i) estimate β and confidence intervals, but also to (ii) quantify the evidence in favour of β≠1 and (iii) test the hypothesis that the observations are compatible with the nonlinear scaling. We employ this framework to compare five different models to 15 different datasets and we find that the answers to points (i)-(iii) crucially depend on the fluctuations contained in the data, on how they are modelled, and on the fact that the city sizes are heavy-tailed distributed.

Entities: Chemical Disease Gene Species

Keywords: allometry; scaling laws; statistical inference

Year: 2016 PMID： 27493764 PMCID： PMC4968456 DOI： 10.1098/rsos.150649

Source DB: PubMed Journal: R Soc Open Sci ISSN： 2054-5703 Impact factor: 2.963

Introduction

The study of statistical and dynamical properties of cities from a complex-systems perspective is increasingly popular [1]. A celebrated result is the scaling between a city-specific observation y (e.g. the number of patents filed in the city) and the population x of the city as [2] with a non-trivial (β≠1) exponent. Super-linear scaling (β>1) was observed when y quantifies creative or economic outputs and indicates that the concentration of people in large cities leads to an increase in the per capita (y/x) production. Sublinear scaling (β<1) was observed when y quantifies resource use and suggests that large cities are more efficient in the per capita (y/x) consumption. Since its proposal, nonlinear scaling has been reported in an impressive variety of different aspects of cities [3-9]. It has also inspired the proposal of different generative processes to explain its ubiquitous occurrence [10-14]. Scalings similar to the one in equation (1.1) appear in physical (e.g. phase transitions) and biological (e.g. allometric scaling) systems suggesting that cities share similarities with these and other complex systems (e.g. fractals). More recent results cast doubts on the significance of the β≠1 observations [15-17]. Reference [15] agrees that economic outputs are faster than linear in x, but claims that the population x has a limited explanatory factor on the per capita rate y/x of cities and function (1.1) is not better than alternative ones (see [6,18] for opposing arguments). Reference [16] focuses on the case of CO2 emissions and shows that depending on whether city boundaries or metropolitan areas are used, the value of β changes from β>1 to β<1. This point was carefully analysed in [17] for different datasets y. Through a careful study of different possible choices of city boundaries, the authors report that the evidence for β≠1 virtually vanishes. These results ask for a more careful statistical analysis that rigorously quantifies the evidence for β≠1 in different datasets. In this paper, we propose a statistical framework based on a probabilistic formulation of the scaling law (1.1) that allows us to perform hypothesis testing and model comparison. In particular, we quantify the evidence in favour of β≠1 comparing (through the Bayesian information criterion, BIC) models with β≠1 to models with β=1. We apply this approach to 15 datasets of cities from five regions and find that the conclusions regarding β vary dramatically not only depending on the datasets, but also on assumptions of the models that go beyond (1.1). We argue that the estimation of β is challenging and depends sensitively on the model because of the following two statistical properties of cities: (i) the distribution of city population has heavy tails (Zipf’s law) [1,19] and (ii) there are large and heterogeneous fluctuations of y as a function of x (heteroscedasticity). Points (i) and (ii) are shown, respectively, in figure 1a,b.

Figure 1.

Example of the data and their main statistical properties. (a) The distribution of the population of the cities for the regions considered in this paper. The roughly straight line in this rank–population plot is in agreement with Zipf’s law and shows that, in most cases, the data vary over two orders of magnitude in the population (e.g. from 100 000 to 10 million inhabitants). (b) Example of the dataset analysed in our work, in which large fluctuations are clearly visible. The paper is organized as follows. We start by describing the problem and the datasets we use (in §2) and discussing (in §3) the limitations of the usual statistical approach based on least-squares fitting in log-scale. We then propose a probabilistic formulation together with different statistical models (in §4) and describe (in §5) how they can be compared with each other and to data. Finally, we discuss our main findings (in §6) and summarize our conclusions (in §7).

Data

The general problem we are interested in is to test and estimate the parameters of equation (1.1) based on observations (x,y) for i=1,…,N cities, where x is the population and y is the amount of the quantity of interest in city i (as in figure 1b). The quantities x,y are estimated within a measurement precision which, in principle, could also be included in the analysis. However, in most cases, this information is not available, and only single measurements of x,y exist. The datasets we choose include a variety of different regions, aggregation methods to define city boundaries, and quantities y. They include the data from various countries and regions: 100 metropolitan areas of the United Kingdom (UK), aggregated as in [17]; 381 metropolitan areas of the United States of America (USA), as discussed in [12]; 459 urban areas of the USA; 472 large cities of the European Union (EU); 275 large cities from the members of the Organization for Economic Co-operation and Development (OECD); and 5565 municipalities (administrative units) from Brazil. For each database, we use indexes of economical activity (weekly income, GDP), innovation (patents filed), transportation (miles travelled, number of train stations), access to culture (number of theatres, number of cinema seats, number of cinema attendances in 1 year, etc.) and health condition (AIDS infections, death by external causes). Further details are presented in appendix A.

Limitations of the usual statistical analysis

The following three steps summarize the usual approach used to test a nonlinear scaling in equation (1.1) (see [2-4,8,11,12,16-18] for scalings in cities and [20] for scalings in biology): The parameters of equation (1.1) are chosen based on least-squares fitting in log-transformed data , i.e. α,β are such that is minimized. The quality of the fitting is quantified by the coefficient of determination . R2 close to 1 is taken as evidence of the agreement between the fit and the data. The 95% confidence interval [βmin,βmax] around β is computed from the sum of the residuals squared and β∉[βmin,βmax] is taken as an evidence that β≠1. This usual approach is appealing because of its simplicity and ease of numerical implementation. However, it contains the following assumptions and limitations that are usually ignored: The parameters obtained through least-squares fitting are maximum-likelihood estimators if (i) the data points are independent and (ii) the fluctuations around the mean , , are Gaussian distributed in with a variance independent of . The value of β obtained in the usual approach is meaningful if these assumptions hold. R2 does not quantify the statistical significance of the model, it quantifies the correlation between data and model (the amount of the variation in the data explained by the model). In particular, R2 close to one is not an evidence that the data are a likely outcome of the model. Below, we obtain that datasets are typically not consistent with the model underlying the usual approach. The confidence interval [βmin,βmax] is a range in which the true value of β is expected to be found only if the model holds [21]. Therefore, in the typical case in which the data are not compatible with the model, one cannot conclude that β≠1 based on the observation that 1∉[βmin,βmax]. Usually, in this case, both β=1 and β≠1 are incompatible with the data. A further limitation of the usual approach is that it requires removing the datapoints with y=0 (because it requires computing ). This filtering is arbitrary, because y=0 is usually a valid observation (e.g. cities without any patents filed). In the study of scaling laws in biology, the underlying hypothesis and alternatives to the usual least-squares fitting have been extensively discussed [22,23]. In city data, statistical analysis beyond the usual approach was performed in [3,5,7,8,10]. It typically amounts to an analysis of the residuals , e.g. a (visual) comparison of the residuals of the fit to the Gaussian distribution predicted by the model underlying the linear fit in log–log scale. The controversies regarding a nonlinear scaling β≠1 motivate us to search for an alternative statistical framework to test the scaling (1.1) beyond the usual approach with residual analysis.

Probabilistic models

The statistical analysis we propose is based on the likelihood of the data being generated by different models. Following [5], we assume that the index y (e.g. number of patents) of a city of size x is a random variable with probability density P(y|x). We interpret equation (1.1) as the scaling of the expectation of y with x where is computed over the ensemble of cities with fixed x. This relation does not specify the shape of P(y|x), e.g. it does not specify how the fluctuations of y around scale with x. Here, we are interested in models P(y|x) satisfying This choice corresponds to Taylor’s law [24]. It is motivated by its ubiquitous appearance in complex systems [25], where typically δ∈[1,2], and by previous analysis of city data that reported non-trivial fluctuations [8,26,27]. The fluctuations in our models aim to effectively describe the combination of different effects, such as the variability in human activity and imprecisions on data gathering. In principle, these effects can be explicitly included in our framework by considering distinct models for each of them. Below, we specify different models P(y|x) compatible with equations (4.1) and (4.2). We consider two classes of models. In the first class, which we call city models, we a priori choose a parametric form for P(y|x), and we use equations (4.1) and (4.2) to fix the free parameters. In the second class, which we call person models, we derive P(y|x) from a generative process for the assignment of y to people that is compatible with equations (4.1) and (4.2). In both cases, the likelihood of the model is written as a function of the data {(x,y)} and at most four free parameters (α,β,γ and δ).

City models

In this class of models, we assume that each data point y is an independent realization from the conditional distribution P(y|x) and therefore the log-likelihood can be written as In order to explore how the choice of P(y|x) affects the outcome of the statistical analysis, we consider two different continuous distributions (Gaussian and lognormal).[1]

Gaussian fluctuations

Here, we consider that P(y|x) is given by a Gaussian distribution with parameters and The relations (4.1) and (4.2) are fulfilled choosing the parameters as The log-likelihood (4.3) is given by This model has P(y≤0|x)>0 and therefore observations with y≤0 can be accounted for. For the observables considered here, y=0 is a valid observation but y<0 is not. We consider two cases: Fixed This is the typical fluctuation scaling found when y is the result of a sum of random variables [25]. Free The general functional form that fulfils equation (4.2). We exclude δ>2, because, in this case, the probability P(y<0|x) of negative values (not feasible for most observables y) remains large for large x.

Lognormal fluctuations

Here, we consider that P(y|x) is given by a lognormal distribution with parameters and : The relations (4.1) and (4.2) are fulfilled choosing the parameters as (see appendix B) The log-likelihood (4.3) is given by This model has P(y≤0|x)=0 and therefore observations with y≤0 cannot be accounted for. We again consider two cases: Fixed This scaling is obtained when y is the product of independent random variables. Furthermore, and the fluctuations of are independent of x and therefore the maximum-likelihood estimation of β coincides with the estimation obtained with minimum least-squares for , as discussed in §3. Free The general functional form that fulfils equation (4.2).

Person model

The starting point for this class of models is the natural interpretation of equation (1.1) that people’s efficiency (or consumption) scales with the size of the city they are living in. This motivates us to consider a generative process in which tokens (e.g. a patent, a dollar of GDP, a mile of road) are produced or consumed by (assigned to) individual persons, in the same spirit as in [12,14]. Specifically, consider j=1,…,M persons living in i=1,…,N cities, on which the population of the city i is given by x such that . Consider also that there is a total of k=1,…,Y tokens that are randomly assigned to the M persons. A super-linear (sublinear) scaling suggests that a token is more likely to be assigned to someone living in a more (less) populous city. In this spirit, we assume that the probability that a token is assigned to person j depends only on the population x( of the city where person j lives as where Z(β) is the normalization constant, i.e. . For β=1, p( j)=1/M and each person is equally likely to be assigned a token (independently of the population of its city). Equation (4.10) is a microscopic model, and we are now interested in the macroscopic behaviour of the city: the probability that a city i gets y tokens, given that its population is x. Assuming that besides their city, individuals are indistinguishable, the probability p(i) that a token is assigned to a city i is given by a sum of p(j) over persons j on city i, which contains exactly x terms. Because x(=x when the person j lives in city i, represented by j∈i, we obtain The probability of observing y tokens in each city of size x is a multinomial distribution Thus, the likelihood can be written as a function of the observed quantities (x,y) as The scaling of the average and variance of y, i.e. equations (4.1), (4.2), is recovered as where we identify that α=Y/Z(β), γ=1 and δ=1. For y≫1, this model coincides with the city model with normal fluctuations and the latter choice of parameters. Note that the fluctuations of this model account only for fluctuations of the assignment, and neglects potential fluctuations of measurement imprecisions.

Results

In this section, we compare the models presented above against our 15 datasets. In particular, we address the following questions whose answers are summarized in table 1:

Table 1.

Summary of the application of our statistical framework to 15 different databases and five models. The entries in the table represent the scaling exponent β. The value obtained through least-squares fitting in log-scale coincides with the value reported in the first column. The error bars were computed with bootstrap. The asterisk indicates that the model has a p-value higher than 0.05. If the difference ΔBIC between the BIC of each model with the same model with a fixed β=1 is below 0, the model is linear (), between zero and six is inconclusive (open circle) and higher than six (strong evidence) is super-linear (↗)/sublinear (↘). The models were also compared between each other using the respective BICs within the same noise model (grey background has lower BIC) and between all others (bold text indicates the model with the lowest BIC).

1. What is the estimated value of β? For each model, we calculate the parameters (α,β,γ,δ) that maximize (see appendix C for details). In table 1, we report β. 2. What is the error bar b around the estimated β? We estimate b using bootstrapping with replacement (see appendix D for details). In table 1, b is shown in parentheses. The interval [β−b,β+b] can be interpreted as the 95% confidence interval of β when the model is not rejected. Otherwise, it can be interpreted as the robustness of the estimated β against fluctuations in the data (cross validation). Summary of the application of our statistical framework to 15 different databases and five models. The entries in the table represent the scaling exponent β. The value obtained through least-squares fitting in log-scale coincides with the value reported in the first column. The error bars were computed with bootstrap. The asterisk indicates that the model has a p-value higher than 0.05. If the difference ΔBIC between the BIC of each model with the same model with a fixed β=1 is below 0, the model is linear (), between zero and six is inconclusive (open circle) and higher than six (strong evidence) is super-linear (↗)/sublinear (↘). The models were also compared between each other using the respective BICs within the same noise model (grey background has lower BIC) and between all others (bold text indicates the model with the lowest BIC).

Hypothesis testing

3. Are the data compatible with the model? We test the hypothesis that the data were generated by the model. Specifically, for each model, we compute a p-value that quantifies (i) whether the fluctuations in the data are compatible with the expected fluctuations from the model and (ii) whether the residuals are uncorrelated (see appendix E for details). In the case the model is not rejected, i.e. p-value >0.05, the corresponding entry in table 1 is marked with an asterisk.

Model comparison

4. What is the statistical evidence for β≠1? We quantify the evidence for β≠1 by comparing the maximum-likelihood of each model with the corresponding model where we fix β=1. We account for the different number of free parameters (e.g. to avoid overfitting) by using the BIC, , where k is the number of free parameters and N the number of observations (see appendix F for details). The difference in the BIC, ΔBIC≡BIC−BIC indicates whether the model with β≠1 provides a sufficiently better description of the data. From this we infer that, for (i) ΔBIC<0 the model with fixed β=1 (linear scaling) is better, (ii) 0≤ΔBIC<6 the evidence for β≠1 is inconclusive, and (iii) ΔBIC≥6 the model with β≠1 (nonlinear scaling) is better. In table 1, these results are indicated by the symbols (i) (linear), (ii) open circle (inconclusive), or (iii) ↘ (sublinear) or ↗(super-linear). 5. What is the statistical evidence for fluctuation scaling (Taylor’s law)? We quantify the evidence for δ≠1 (δ≠2), i.e. non-trivial scaling in the fluctuations in equation (4.2), in the models of cities with Gaussian (lognormal) noise. Within each class, we calculate ΔBIC≡BIC−BIC, where we compare the BICs of the model where (i) δ is fixed (BIC) and (ii) where δ is a free parameter (BIC). In case of ΔBIC>0, the model with δ as a free parameter (non-trivial fluctuation scaling) provides a better description of the data (see appendix F for details). In table 1, the entry for the selected model is highlighted with a grey background. 6. Which model best describes the data? We calculate the BIC of each of the five models (see appendix F for details) and select the one with the lowest BIC as the one that best describes the data. In table 1, the β of the selected model is printed in bold.

Discussion

In this section, we interpret the outcome of the statistical analysis summarized in table 1. We focus on specific findings and their significance to the problem of scaling in cities.

Data are almost never compatible with the proposed models

In almost all cases, the data are not a typical outcome of any of the five proposed models, leading to a rejection of the models (p-value <0.05). The only exceptions (marked by an asterisk in table 1) are the two lognormal models in UK-income and UK-train stations, and the Gaussian model with free δ for OECD-GDP. There are several possible reasons for the widespread rejection of the models: fluctuations of the data may differ from the fluctuations P(y|x) of the models (e.g. measurement errors are not correctly accounted for by P(y|x)); the observations are not independent (e.g. there are correlations between residuals and city size); different scalings are observed for small and large cities (as discussed in [28] and figure 3).

Figure 3.

Comparison of the model of cities and persons. (a) Scaling of the city model, i.e. lognormal with free δ, and the person model (solid lines) for the data of Brazil-AIDS (dots). While the city model captures a sublinear scaling present in small cities β=0.61, the person model describes the roughly linear scaling β=1.04 of large cities. Shaded areas represent 1 s.d. The running mean (red line) is the average (〈x〉, 〈y〉) over 50 datapoints, {x,y}, in a sliding window over the data ordered in x. (b) Cumulative distribution of heavy-tailed distribution of city sizes in terms of cities and persons, i.e. the fraction of (i) cities of size ≤x (city model) and (ii) the population in cities of size ≤x.

The rejections of the models considered here are a consequence of their strong simplifying hypothesis and show that the development of better models is needed in order to understand the observations and clarify the existence of the nonlinear scaling (1.1). It shows also that the estimated confidence interval cannot be used (in the rejected models) to discard a linear scaling β=1 [21]. Still, the widespread rejection of models does not imply that the nonlinear scaling (1.1) is rejected altogether because it is possible that the data are well described by another (unknown) model consistent with equation (4.1) but different from the ones considered here, e.g. having different fluctuations in P(y|x). These alternative models can have different fluctuation relations or can account for the known (e.g. spatial [3]) correlations in the data. In particular, the generative process underlying the person model could be generalized to account for other effects beyond city-size population (e.g. individuals could be segmented by income). Even if most models are rejected, some models can still describe the data better than others (in terms of BIC). The conclusions drawn from such model comparison analysis depend on the used set of models and may change by the introduction of a better model in the future. Our investigations of scaling laws in cities in the next sections are mostly based on model comparison: we analyse which model and parameters best describe the data, with particular interest in the parameter β.

Different datasets are best described by different models

There is no single model that best describes all databases (the bold value in table 1 appears in different rows). A systematic observation on the 15 datasets is that the person model and the Gaussian model with fixed δ are never the best ones. This indicates that the fluctuations in the (large) cities are much larger than predicted by the scaling δ=1 used in both models. For the other models, there are databases in which they are the best models: the lognormal with fixed δ=2 is the best model in the three UK cases and for USA-GDP; the lognormal model with free δ is the best model for USA-roads and EU-cinema capacity; and the Gaussian model with free δ is the best for EU-cinema usage, OECD-GDP and EU-libraries. The inclusion of the additional parameter δ in the lognormal model, related to Taylor’s law in equation (4.2), is considered beneficial in eight out of the 15 cases (shaded grey regions in the two first rows of table 1). Altogether, these results show that the model underlying the usual approach (lognormal with fixed δ) is often not the best model.

The estimated β depends on the model

Models consistent with the average scaling (4.1), but that have different assumptions regarding the fluctuations, can lead to different estimations of β. Consider the case of EU-cinema attendance. The value estimated from the lognormal model with fixed δ is β=1.46±0.19. It coincides with the usual approach (least-square fitting) and suggests a super-linear relation between the number of cinema visitors and the population of cities. However, if we allow for a different fluctuation scaling as in the lognormal model with free δ, a model that is preferred according to our BIC test, we obtain that β=1.00±0.30, i.e. a linear scaling. Conflicting conclusions are observed also in the EU-theatres database. The data and fittings for these two cases are shown in figure 2. Visual inspection of the graph can be misleading because of the log-scale and the different density of points, and shows the need for more careful (quantitative) statistical analysis. Altogether, the variation of β across different models shows that conclusions regarding β (e.g. β≠1) cannot be done independently from the analysis of the fluctuations. Considering also that different models are preferred for different databases (previous point), this confirms the practical importance of going beyond the usual approach (least-square fitting) both in terms of methods and models, as proposed in this paper.

Figure 2.

Effect of fluctuations on the estimation of β. (a) In the ‘EU cinema usage’ database, the lognormal model with δ=2 yields β=1.46, whereas free δ yields β=1.00. (b) In the ‘EU theatres’ database, the lognormal with free δ yields β=0.92, a lower value than β=1.14 obtained in the Gaussian model with free δ. Shaded areas represent the 68th percentile (±1 s.d.) of P(y|x).

Models are dominated either by the small or the large cities

The variation on the estimation of β across the different models can be better understood by analysing how the city size distribution shown in figure 1a influences the estimation of β. The least-square fitting minimizes the distance between the curve and the points in logarithmic scales (). Therefore, when data are viewed in the usual double logarithmic plot, the best curve will be the one that passes close to most points, i.e. it weights a village as much as a million-size city. The fit will be thus dominated by the large number of small cities. The disadvantage of this is that, even if the model describes well most cities, it may fail to describe the behaviour of most of the population. Our person model addresses this issue by giving the same weight to each person, leading to the problem of describing most people but potentially not most cities. To see this, consider the example of the 5565 Brazilian cities. Half of the Brazilian population lives in the 201 largest cities (3.6% of cities); yet, 50% of smallest cities account for only 8.2% of the total population. This is a direct consequence of the heavy-tailed distribution of city sizes, which holds in all our databases (figure 1a). Our city models with free δ in equation (4.2) allow cases beyond the least-squares fitting (δ=2) and person model (δ=1). The exponent δ controls how the variance of P(y|x) grows with x. A small variance for large x, obtained for small δ, will force the fitted curve (average) to pass close to the points of large cities. The weight of the large cities is inversely proportional to δ. The general considerations above explain a great extent of the variation of β across the models observed in table 1. The values obtained for the Gaussian model with δ=1 and the person model are dominated by large cities, in the lognormal δ=2 case, they are dominated by small cities, whereas for the free δ models, it depends on which best δ is obtained. In the Brazil-AIDS dataset δ≥2 and β is dominated by the small cities (δ=2 in the Gaussian model and δ=2.79 in the lognormal model). Accordingly, the values of β for these two models in the second to last row of table 1 are β≪1 in agreement with the lognormal with δ=2 case and in contrast with the Gaussian δ=1 and person model which have β>1 and are dominated by the large cities. Figure 3 shows the results for this dataset and emphasizes how different models describe different city sizes. The same reasoning also explains the values of β of other databases reported in table 1 (e.g. all UK cases). Comparison of the model of cities and persons. (a) Scaling of the city model, i.e. lognormal with free δ, and the person model (solid lines) for the data of Brazil-AIDS (dots). While the city model captures a sublinear scaling present in small cities β=0.61, the person model describes the roughly linear scaling β=1.04 of large cities. Shaded areas represent 1 s.d. The running mean (red line) is the average (〈x〉, 〈y〉) over 50 datapoints, {x,y}, in a sliding window over the data ordered in x. (b) Cumulative distribution of heavy-tailed distribution of city sizes in terms of cities and persons, i.e. the fraction of (i) cities of size ≤x (city model) and (ii) the population in cities of size ≤x. In summary, the ‘weights’ each statistical model attributes to cities have an impact on the estimated value of β and, in particular, on the visual agreement between the data and the fit in the usual double-logarithmic plots. When the scaling relation (4.1) holds for all x, the difference between the models will not be significant. However, as we showed in §6.1, data are typically not compatible with models. In the cases in which β varies substantially across models, generalization beyond the simple scaling (1.1) [6] should be considered in order to account for the x dependence of β. In this case, the heavy-tailed distribution of city sizes leads many models to be dominated either by the large amount of small cities or by the few cities containing most of the population. This reasoning explains why cut-offs in minimum city size and aggregation of cities (different city borders) [9,16,17] influence the estimated β. All these procedures have a strong influence on the small cities, which are the dominant ones in the least-square fitting (e.g. aggregation of cities into metropolitan areas reduces the number of small cities). While applying cut-offs for small cities increases the visual agreement between the data and the fit in the log–log plot, this is only justified if the scaling (1.1) is interpreted as being valid only for large cities. The latter interpretation limits the relevance of the scaling because it becomes limited to a small fraction of the total cities.

Is the scaling nonlinear?

New answers to this central question emerge from the results of our paper (summarized in table 1). In three of the 15 cases, we found models which are reasonably compatible with the data, and we can base our conclusions on these models, i.e. on the obtained β and on the model comparison with the case β=1 (arrows in table 1). This leads to the conclusion that the UK-income and UK-train stations show linear and OECD-GDP shows super-linear scaling. In the remaining 12 cases, conclusions are based solely on model comparison, and we feel more confident to give an answer to this question only when the same conclusion is obtained for models with different fluctuations (i.e. we compare the conclusions obtained in the best model with lognormal and Gaussian fluctuations). We find such an agreement in eight of the 12 cases, so that the scalings are: UK-patents and OECD-patents are linear; USA-GDP, EU-museum usage and Brazil-GDP are super-linear; USA-roads, EU-libraries and Brazil-AIDS are sublinear. For the remaining four cases, our analysis is inconclusive on the question of linear or nonlinear scaling. Two reasons can lead to this conclusion. The first is that the nonlinear scaling qualitatively changes from β<1 to β>1 depending on the assumptions of the fluctuations (e.g. EU N. theatres). The second reason is that in one of the best models there is no sufficient statistical evidence for β≠1 (marked by an open circle in table 1, EU-cinema capacity, EU-cinema usage and Brazil-external). One interesting case falling in this second reason is EU-cinema usage, for which both the lognormal with fixed δ and the best model (Gaussian with free δ) yield β>1. We still consider this case inconclusive, because the best model, despite showing β=1.13±0.11, only marginally improves (0<BIC<6) upon the model with β=1. In this case, additional data are required in order to increase the statistical evidence in favour of either situation. The possibility of reaching an inconclusive answer shows the advantage of the statistical framework proposed here. In summary, in 15 datasets, we found four linear, four super-linear and three sublinear scalings.

Conclusion

In summary, we investigated the existence of non-trivial β≠1 scalings in city datasets. We introduced five different models, showed how to compare them and how to estimate β, and finally tested our methods and models in 15 different datasets. We found that in most cases models are rejected by the data and conclusions can be based only on the comparison between the descriptive power of the different models considered here. Moreover, we found that models which differ only in their assumptions on the fluctuations can lead to different estimations of the scaling exponent β. In extreme cases, even the conclusion on whether a city index scales linearly β=1 or nonlinearly β≠1 with city population depends on assumptions on the fluctuations. A further factor contributing to the large variability of β is the broad city-size distribution that makes models to be dominated either by small or by large cities. In particular, these results show that the usual approach based on least-square fitting is not sufficient to conclude on the existence of nonlinear scaling. Recent works focused on developing generative models of urban formation that explain nonlinear scalings [10-14]. Our finding that most models are rejected by the data confirms the need for such improved models. The significance of our results on models with different fluctuations is that they show that the estimation of β and the development of generative models cannot be done as separate steps. Instead, it is essential to consider the predicted fluctuations not only in the validation of the model, but also in the estimation of β. Finally, the methods and models used in our paper can be applied to investigate scaling laws beyond cities [20,23].

15 in total

Review 1. Bivariate line-fitting methods for allometry.

Authors: David I Warton; Ian J Wright; Daniel S Falster; Mark Westoby
Journal: Biol Rev Camb Philos Soc Date: 2006-03-30

2. Urban characteristics attributable to density-driven tie formation.

Authors: Wei Pan; Gourab Ghoshal; Coco Krumme; Manuel Cebrian; Alex Pentland
Journal: Nat Commun Date: 2013 Impact factor: 14.919

3. Scaling laws between population and facility densities.

Authors: Jaegon Um; Seung-Woo Son; Sung-Ik Lee; Hawoong Jeong; Beom Jun Kim
Journal: Proc Natl Acad Sci U S A Date: 2009-08-12 Impact factor: 11.205

4. Scaling of Prosocial Behavior in Cities.

Authors: Samuel Arbesman; Nicholas A Christakis
Journal: Physica A Date: 2011-02-15 Impact factor: 3.263

5. The statistics of urban scaling and their connection to Zipf's law.

Authors: Andres Gomez-Lievano; Hyejin Youn; Luís M A Bettencourt
Journal: PLoS One Date: 2012-07-18 Impact factor: 3.240

6. Fluctuation scaling, Taylor's law, and crime.

Authors: Quentin S Hanley; Suniya Khatun; Amal Yosef; Rachel-May Dyer
Journal: PLoS One Date: 2014-10-01 Impact factor: 3.240

7. Constructing cities, deconstructing scaling laws.

Authors: Elsa Arcaute; Erez Hatna; Peter Ferguson; Hyejin Youn; Anders Johansson; Michael Batty
Journal: J R Soc Interface Date: 2015-01-06 Impact factor: 4.118

8. On scaling of scientific knowledge production in U.S. metropolitan areas.

Authors: Önder Nomaler; Koen Frenken; Gaston Heimeriks
Journal: PLoS One Date: 2014-10-29 Impact factor: 3.240

9. Large cities are less green.

Authors: Erneson A Oliveira; José S Andrade; Hernán A Makse
Journal: Sci Rep Date: 2014-02-28 Impact factor: 4.379

10. Rural to Urban Population Density Scaling of Crime and Property Transactions in English and Welsh Parliamentary Constituencies.

Authors: Quentin S Hanley; Dan Lewis; Haroldo V Ribeiro
Journal: PLoS One Date: 2016-02-17 Impact factor: 3.240

26 in total

1. From global scaling to the dynamics of individual cities.

Authors: Jules Depersin; Marc Barthelemy
Journal: Proc Natl Acad Sci U S A Date: 2018-02-20 Impact factor: 11.205

2. Tomography of scaling.

Authors: Marc Barthelemy
Journal: J R Soc Interface Date: 2019-11-27 Impact factor: 4.118

3. Recent advances in urban system science: Models and data.

Authors: Elsa Arcaute; José J Ramasco
Journal: PLoS One Date: 2022-08-17 Impact factor: 3.752

4. Human mobility in large cities as a proxy for crime.

Authors: Carlos Caminha; Vasco Furtado; Tarcisio H C Pequeno; Caio Ponte; Hygor P M Melo; Erneson A Oliveira; José S Andrade
Journal: PLoS One Date: 2017-02-03 Impact factor: 3.240

5. Population patterns in World's administrative units.

Authors: Oscar Fontanelli; Pedro Miramontes; Germinal Cocho; Wentian Li
Journal: R Soc Open Sci Date: 2017-07-05 Impact factor: 2.963

6. The scaling of crime concentration in cities.

Authors: Marcos Oliveira; Carmelo Bastos-Filho; Ronaldo Menezes
Journal: PLoS One Date: 2017-08-11 Impact factor: 3.240

7. Does environmental policy affect scaling laws between population and pollution? Evidence from American metropolitan areas.

Authors: Nicholas Z Muller; Akshaya Jha
Journal: PLoS One Date: 2017-08-09 Impact factor: 3.240

8. An overview of city analytics.

Authors: Desmond J Higham; Michael Batty; Luís M A Bettencourt; Danica Vukadinović Greetham; Peter Grindrod
Journal: R Soc Open Sci Date: 2017-02-01 Impact factor: 2.963

9. A model of urban scaling laws based on distance dependent interactions.

Authors: Fabiano L Ribeiro; Joao Meirelles; Fernando F Ferreira; Camilo Rodrigues Neto
Journal: R Soc Open Sci Date: 2017-03-22 Impact factor: 2.963

10. Unveiling relationships between crime and property in England and Wales via density scale-adjusted metrics and network tools.

Authors: Haroldo V Ribeiro; Quentin S Hanley; Dan Lewis
Journal: PLoS One Date: 2018-02-22 Impact factor: 3.240