Zhuojie Huang, Xiao Wu, Andres J Garcia, Timothy J Fik, Andrew J Tatem.
Abstract
The expanding global air network provides rapid and wide-reaching connections accelerating both domestic and international travel. To understand human movement patterns on the network and their socioeconomic, environmental and epidemiological implications, information on passenger flow is required. However, comprehensive data on global passenger flow remain difficult and expensive to obtain, prompting researchers to rely on scheduled flight seat capacity data or simple models of flow. This study describes the construction, from various publicly available air travel datasets, of an open-access modeled passenger flow matrix for all airports with a host city population of more than 100,000 and within two transfers of air travel. Data on network characteristics, city population, and local area GDP, amongst others, are utilized as covariates in a spatial interaction framework to predict the air transportation flows between airports. Training datasets based on information from various transportation organizations in the United States, Canada and the European Union were assembled. A log-linear model controlling the random effects on origin, destination and the airport hierarchy was then built to predict passenger flows on the network, and compared to the results produced using previously published models. Validation analyses showed that the model presented here produced improved predictive power and accuracy compared to previously published models, yielding the highest successful prediction rate at the global scale. Based on this model, passenger flows between 1,491 airports on 644,406 unique routes were estimated in the prediction dataset. The airport node characteristics and estimated passenger flows are freely available as part of the Vector-Borne Disease Airline Importation Risk (VBD-Air) project at: www.vbd-air.com/data.
Year: 2013 PMID: 23691194 PMCID: PMC3655160 DOI: 10.1371/journal.pone.0064317
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Descriptions of covariates used in the modeling process.
| Variables | Descriptions |
| --- | --- |
| Pop_i | The population of the origin city |
| Pop_j | The population of the destination city |
| PPP2005_i | The purchasing power index of the area that the origin airport serves |
| PPP2005_j | The purchasing power index of the area that the destination airport serves |
| PDA2005_i | The purchasing power per capita index of the area that the origin airport serves |
| PDA2005_j | The purchasing power per capita index of the area that the destination airport serves |
| Strength_i | The sum of the weights of the edges adjacent to the origin city's vertex |
| Strength_j | The sum of the weights of the edges adjacent to the destination city's vertex |
| Degree_Out_i | The degree number of the origin city on the air travel network |
| Degree_In_j | The degree number of the destination city on the air travel network |
| Closeness_Centrality_i | The mean geodesic distance between a given node and all other nodes reachable from it, calculated for the origin city. |
| Closeness_Centrality_j | The closeness centrality measure for the destination city. |
| Betweeness_Centrality_i | The number of shortest paths going through the origin airport. |
| Betweeness_Centrality_j | The betweenness centrality measure for the destination airport. |
| Inverse Distance | Inverse great circle distance between the origin and the destination airport |
| Country | Indicates whether the origin and the destination are in the same country. |
| Alternative | Number of alternative routes to the destination |
| Stops | Number of stops on the shortest route from the origin to the destination |
| MaxC | The maximum capacity along the shortest path |
| Degree Link Type | This variable identifies the types of flows between different hierarchies of airports defined by the air travel services level. |
| Economic Link Type | This variable identifies the types of flows between different hierarchies of airports |
| Haul Type | This variable differentiates the effect of long haul flights. 1 for short-haul (2000 kilometers or less), 2 for medium-haul (between 2000 and 3500 kilometers) and 3 for longer hauls (3500 or more kilometers). |
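
Several of the covariates above can be derived directly from airport coordinates and a weighted route list. The following is a minimal sketch, not the authors' implementation: the function names, the haversine formula with a mean Earth radius of 6371 km, the treatment of strength as the sum over both incoming and outgoing edges, and the handling of the exact 2000 km and 3500 km boundaries are all assumptions made for illustration.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; not stated in the source


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in kilometers (inputs in degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def inverse_distance(distance_km):
    """Inverse Distance covariate; undefined for a zero-length route."""
    if distance_km <= 0:
        raise ValueError("distance must be positive")
    return 1.0 / distance_km


def haul_type(distance_km):
    """1 = short-haul (<= 2000 km), 2 = medium-haul, 3 = long-haul (>= 3500 km)."""
    if distance_km <= 2000:
        return 1
    if distance_km < 3500:
        return 2
    return 3


def node_covariates(edges):
    """Strength, out-degree and in-degree from (origin, destination, weight) links.

    Strength is taken here as the sum of weights over all adjacent edges,
    incoming and outgoing alike (an assumption about the definition).
    """
    strength = defaultdict(float)
    degree_out = defaultdict(int)
    degree_in = defaultdict(int)
    for origin, dest, weight in edges:
        strength[origin] += weight
        strength[dest] += weight
        degree_out[origin] += 1
        degree_in[dest] += 1
    return strength, degree_out, degree_in


# Toy route list with made-up weights, purely for illustration.
toy_edges = [("ATL", "JFK", 100), ("JFK", "ATL", 90), ("ATL", "LHR", 50)]
strength, degree_out, degree_in = node_covariates(toy_edges)
```

The centrality measures, alternative routes and stops depend on shortest-path computations over the whole network and are left out of this sketch.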
Comparison of the four models with respect to prediction accuracy (in percentages).
| Model | Coverage rate of the 95% prediction intervals | Avg. coverage rate of the 95% prediction intervals (cross-validation) | Range of coverage rate of the 95% prediction intervals (cross-validation) | Coverage rate of the ±30% observation intervals | Avg. coverage rate of the ±30% observation intervals (cross-validation) | Range of coverage rate of the ±30% observation intervals (cross-validation) | Successful prediction rate | Avg. successful prediction rate (cross-validation) | Range of successful prediction rate (cross-validation) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Model 1 | 6.39 | 6.73 | [5.89,7.56] | 29.82 | 29.76 | [21.94,31.84] | 68.42 | 68.33 | [66.49,69.52] |
| Model 2 | 4.80 | 4.79 | [0.27,10.33] | 31.48 | 30.63 | [29.62,31.82] | 69.16 | 68.80 | [67.44,70.34] |
| Model 3 | 23.16 | 24.16 | [22.42,25.27] | 33.09 | 33.38 | [29.94,42.29] | 70.04 | 69.43 | [68.25,70.92] |
| Model 4 | 52.11 | 49.86 | [46.97,51.63] | 47.79 | 31.17 | [30.52,33.2] | 79.72 | 70.41 | [69.97,72.71] |
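
The coverage rates in this table can be thought of as simple hit percentages. The sketch below rests on an assumed reading of the column headers, since this page does not define them: a prediction "covers" an observation under the ±30% rule when it lies within 30% of the observed flow, and a 95% prediction interval covers an observation when the observed value falls between its bounds. The function names and toy numbers are hypothetical, and the "successful prediction rate" is not reproduced because its definition is not given here.

```python
def coverage_within_tolerance(observed, predicted, tolerance=0.30):
    """Percentage of routes whose prediction is within ±tolerance of the observed flow."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("no routes supplied")
    hits = sum(
        1 for obs, pred in zip(observed, predicted)
        if abs(pred - obs) <= tolerance * obs
    )
    return 100.0 * hits / len(observed)


def coverage_of_intervals(observed, lower, upper):
    """Percentage of observed flows that fall inside their [lower, upper] interval."""
    if not (len(observed) == len(lower) == len(upper)):
        raise ValueError("all inputs must have the same length")
    if not observed:
        raise ValueError("no routes supplied")
    hits = sum(1 for obs, lo, hi in zip(observed, lower, upper) if lo <= obs <= hi)
    return 100.0 * hits / len(observed)


# Made-up flows for four routes, purely for illustration.
observed = [100, 1000, 5000, 20000]
predicted = [120, 600, 5200, 27000]
rate = coverage_within_tolerance(observed, predicted)
```

In cross-validation, the same functions would be applied to each held-out fold, with the average and range taken across folds.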
Root Mean Squared Errors and Mean Absolute Errors for all models.
| Measurement | Categories | Number of Records | Model 1 | Model 2 | Model 3 | Model 4 |
| --- | --- | --- | --- | --- | --- | --- |
| RMSE | Observed Passenger (OP) < 10^2 | 2379 | 1680 | 2923 | 1947 | 726 |
| RMSE | OP in 10^2–10^3 | 6440 | 3536 | 5127 | 32405 | 1802 |
| RMSE | OP in 10^3–10^4 | 7314 | 7397 | 8771 | 10639 | 4346 |
| RMSE | OP in 10^4–10^5 | 4817 | 20780 | 23002 | 41585 | 21940 |
| RMSE | OP > 10^5 | 1132 | 163352 | 85897 | 216610 | 127194 |
| MAE | Observed Passenger (OP) < 10^2 | 2379 | 286 | 538 | 402 | 120 |
| MAE | OP in 10^2–10^3 | 6440 | 629 | 1073 | 1413 | 333 |
| MAE | OP in 10^3–10^4 | 7314 | 2729 | 3218 | 3140 | 1621 |
| MAE | OP in 10^4–10^5 | 4817 | 14697 | 14929 | 19689 | 13415 |
| MAE | OP > 10^5 | 1132 | 115710 | 61305 | 94447 | 89233 |
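
Errors of this kind are computed separately within each band of observed passenger numbers, with the bands split at powers of ten. The following is a minimal sketch using the standard RMSE and MAE definitions; the function name, the half-open bin boundaries (lower bound included, upper bound excluded) and the toy values are assumptions, not details taken from the paper.

```python
import math

# Bands of observed passengers, split at powers of ten (boundary handling assumed).
BINS = [(0, 1e2), (1e2, 1e3), (1e3, 1e4), (1e4, 1e5), (1e5, math.inf)]


def errors_by_bin(observed, predicted, bins=BINS):
    """Return {(low, high): {"n", "rmse", "mae"}} for each non-empty band."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    results = {}
    for low, high in bins:
        # Keep only routes whose observed flow falls in [low, high).
        errs = [p - o for o, p in zip(observed, predicted) if low <= o < high]
        if not errs:
            continue
        rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
        mae = sum(abs(e) for e in errs) / len(errs)
        results[(low, high)] = {"n": len(errs), "rmse": rmse, "mae": mae}
    return results


# Made-up flows for three routes, purely for illustration.
summary = errors_by_bin([50, 80, 500], [53, 76, 510])
```

Because RMSE squares each error before averaging, a few badly mispredicted routes within a band raise it far more than they raise MAE, which is one reason to report both.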
Figure 1. Diagnostic plots for all models.
a) Predicted vs. observed value of model 4. b) Residual vs. observed value of model 4. c) Distribution of ratio of predicted value vs. observed value in log scale with 95% confidence interval for geometric mean. d) Distribution of ratio of capacity vs. observed value in log scale with 95% confidence interval for geometric mean.
Figure 2. Predicted air traffic flows.
a) Predicted flights with passenger flows of more than 100,000. b) All possible passenger flows through direct flights originating in Atlanta. c) All possible passenger flows through one-stop flights originating in Atlanta. d) All possible passenger flows through two-stop flights originating in Atlanta. e) All airports with incoming passenger numbers of more than 5,000,000.