
Population data mobility retrieval at territory of Czechia in pandemic COVID-19 period.

Jan Platos1, Pavel Kromer1, Miroslav Voznak2, Vaclav Snasel1.   

Abstract

This article describes the methodology and the possibilities of collecting operational data at a mobile network provider. First, the architecture and the principles used in the system are described. Then, the precision of the analysis of population commuting in the region during pandemic and nonpandemic times is discussed. Moreover, several ideas about further utilization of the data are formulated and described. Finally, a graph-based approach is presented that describes the creation of the community structure between people and the means of its analysis.
© 2020 John Wiley & Sons, Ltd.


Keywords:  COVID‐19; data analysis; mobile networks; population mobility

Year:  2020        PMID: 33349746      PMCID: PMC7744891          DOI: 10.1002/cpe.6105

Source DB:  PubMed          Journal:  Concurr Comput        ISSN: 1532-0626            Impact factor:   1.831


INTRODUCTION

The tremendous penetration of mobile phones in the population brings many new challenges and many difficulties. The traffic generated by mobile phones is enormous, mainly due to colossal internet traffic. Service traffic is also generated by logging into the system, by the beginning and end of calls, and by other events. Such information may be used to gather essential characteristics of the behavior of masses, such as the population of a city or country. Using this information to track individual persons would violate legal regulations such as the General Data Protection Regulation and similar laws in each country. The use of anonymous aggregated data, however, brings new insight into behavior patterns and makes it possible to analyze specific situations. Many real-life situations may benefit from such anonymous aggregated data. The first case is the replacement of personal surveys of occupancy in ground transport. In 2014, we defined a methodology, certified by the Ministry of Transportation of the Czech Republic, in which the means of transport and the behavior of citizens were categorized and classified. Moreover, the methodology defines categories of citizens according to their behavior during the day. The most important categories are commuters and noncommuters. A commuter is a person who begins in one location, travels to another location where he or she spends some time, and then returns. Such a person is usually a worker who commutes from home to work. The commuting path may be tracked and analyzed to identify the primary transport nodes, bottlenecks, etc. A noncommuter stays the whole day in the location where he or she started the day and never leaves it. Such a complex system, which can track the commuting behavior of anonymous masses and the paths they use, may also be used to track contacts between individuals while preserving the system's anonymity.
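The commuter/noncommuter distinction can be expressed as a simple rule over one device's time-ordered locations for a day. The sketch below is an assumed, simplified reading of the categories described above and does not reproduce the certified Methodology. The function name `classify_day` is hypothetical, and so is the extra "other" label for devices that end the day somewhere else.

```python
from typing import Sequence

def classify_day(locations: Sequence[str]) -> str:
    """Classify one device-day from its time-ordered visited locations.

    Hypothetical simplification of the categories in the text:
    - "noncommuter": never leaves the starting location;
    - "commuter": leaves the starting location and returns to it;
    - "other": ends the day elsewhere (e.g., a business trip).
    """
    if not locations:
        raise ValueError("no observations for this device-day")
    start = locations[0]
    if all(loc == start for loc in locations):
        return "noncommuter"
    if locations[-1] == start:
        return "commuter"
    return "other"

# Location labels are illustrative only.
print(classify_day(["home", "home", "home"]))
print(classify_day(["home", "office", "home"]))
print(classify_day(["home", "office", "hotel"]))
```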

MOBILE PHONE LOCATION DATA ANALYSIS

Mobile (cell) phone data have been used to provide useful information about human mobility and traffic patterns for almost two decades. Mobile (cellular) network operators record the location of a mobile phone when it actively uses the network (i.e., makes or receives a call or uses mobile Internet). The information is stored primarily for billing purposes, but it can be used as a latent source of aggregate spatiotemporal information about static and dynamic human mobility and traffic patterns. In this way, mobile phone data can be exploited to study travel and commuting times, to provide information about congestion and traffic incidents, and to detect origin-destination flows. Low-level location data require transformation into spatiotemporal trajectory (path) information. Such information can then be used by high-level applications such as early warning, environmental monitoring (e.g., air pollution sensing), traffic planning, region-level travel demand estimation, and sociological (sociogeographical) research. Special attention has been paid to the relationship between mobile phone data, human mobility, and public transport. New methods have recently been introduced to infer from mobile phone location data both traffic measures (e.g., vehicle flows and densities on freeways and motorways) and top-level information such as home location, activity radius (space), travel demand, anchor points (stay locations) of daily travels, and activity types. The methods include various data preprocessing techniques such as low-pass filtering, noise removal, and thresholding. Data processing and mining are then employed in the form of cluster analysis, outlier detection, topology construction, signal analysis (dynamic time warping), density analysis, motif extraction, and, for example, visual analytics.
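One basic preprocessing step is turning low-level location records into a trajectory of stays. The sketch below shows it for a single device. It assumes records are `(timestamp, cell_id)` pairs. The `Stay` type and the "ping-pong" filter are illustrative choices, not a method taken from the cited works: a short visit to another cell between two visits to the same cell is treated as noise.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Stay:
    cell: str        # cell / BTS identifier
    start: datetime  # first record at this cell
    end: datetime    # last record at this cell

def to_trajectory(records, min_stay=timedelta(0)):
    """Convert one device's (timestamp, cell_id) records into a list of stays."""
    # Step 1: order by time and merge consecutive records at the same cell.
    stays = []
    for ts, cell in sorted(records):
        if stays and stays[-1].cell == cell:
            stays[-1].end = ts
        else:
            stays.append(Stay(cell, ts, ts))
    # Step 2 (assumed noise filter): in an A-B-A pattern where B lasted less
    # than min_stay (typical "ping-pong" handover), drop B and merge the A stays.
    cleaned = []
    for stay in stays:
        if (len(cleaned) >= 2
                and cleaned[-1].end - cleaned[-1].start < min_stay
                and cleaned[-2].cell == stay.cell):
            cleaned.pop()
            cleaned[-1].end = stay.end
        else:
            cleaned.append(stay)
    return cleaned
```

With the default `min_stay` of zero, nothing is filtered and only consecutive duplicates are merged. A threshold of a few minutes removes brief handovers between neighboring cells.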
In addition, mobile phone location data have been successfully used in the context of epidemiology. They can contribute to the modeling and estimation of human contact dynamics and disease spread. Mobile phone data and the extracted communication and mobility patterns have been used to study the spatial structure and variation of the HIV epidemic and to identify disease hot spots. Together with social network data, mobile phone data have shown great potential for disease tracking and outbreak prediction, as demonstrated on the 2010 cholera epidemic in Haiti. The 2019–2020 coronavirus pandemic is undoubtedly the biggest challenge digital and mathematical epidemiology has faced to date. Until the introduction of an efficient vaccine and/or COVID-19 treatment (unavailable as of June 2020), social distancing and economic shutdown are the only measures available to slow down the epidemic. Extensive testing and tracing and highly focused epidemiological procedures can lower the enormous costs associated with the global coronavirus suppression measures. Mobile phone data can be used to tackle the COVID-19 pandemic in several ways: by helping to establish accurate situational awareness and by evaluating the impact of various interventions performed to contain the disease. This can be achieved, for example, by estimating origin-destination flows, modeling locations and stays (hot spots), and approximating contact matrices.
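Origin-destination flows can be estimated by counting, for each device, where its day started and where it first moved. The sketch below assumes the origin is the first observed location and the destination is the first different location. This is a deliberately simple definition chosen for illustration. The function name `od_matrix` and the input layout (pseudonymous device ID mapped to a daily location sequence) are hypothetical.

```python
from collections import Counter

def od_matrix(trajectories):
    """Count origin-destination flows from per-device daily location sequences.

    trajectories: {device_id: [location, location, ...]} in time order.
    Assumption: origin = first location, destination = first location that
    differs from it; devices that never leave their origin add no flow.
    """
    flows = Counter()
    for seq in trajectories.values():
        if not seq:
            continue
        origin = seq[0]
        dest = next((loc for loc in seq[1:] if loc != origin), None)
        if dest is not None:
            flows[(origin, dest)] += 1
    return flows
```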

SPATIAL‐TEMPORAL MOBILITY OF THE POPULATION

The use of mobile networks for population mobility analysis opens an entirely new area of investigation. First, the system needs to be described.

System architecture

The architecture of the whole system is based on the architecture of, and the processes behind, wireless communication networks. The standard architecture is shown in Figure 1.
FIGURE 1

Architecture of the cell phone network

Each cell phone communicates with the nearest base transceiver station (BTS), which is the main building block of the communication network. Each BTS is connected to the servers of the communication provider. The real-world location of a BTS is based on many attributes. The main ones are the predicted number of simultaneously connected devices, the complexity of the landscape, the number of buildings and the reflections they cause around the station, and the directions toward the main locations of the devices (such as subway exits). The placement of the base stations then creates the topology of the telecommunication network, which may be used for many applications other than pure communication. The first such application, used by the phone operating system providers, is improving the phone's location by triangulation between base stations, which serves as a correction to the GPS signal. Each mobile device connected to the telecommunication network generates a vast footprint during its use. Each operation the device performs, such as short message service (SMS), audio and video calls, and data transactions, generates communication with the network that is tracked and stored in logs. Many countries have legislative regulations that define how much information needs to be stored for the police and other purposes. In the Czech Republic, this information has to be stored for 3 months.
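The logged operations can be pictured as a stream of event records tied to the BTS that handled them. Old records are removed once the legal retention period expires. The sketch below is only illustrative; real operator log formats are not described in the text. The `NetworkEvent` fields, the event-type labels, and the approximation of "3 months" as 90 days are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: the 3-month retention period approximated as 90 days.
RETENTION = timedelta(days=90)

@dataclass(frozen=True)
class NetworkEvent:
    device_id: str   # pseudonymous device identifier (hypothetical field)
    bts_id: str      # base transceiver station that handled the event
    event_type: str  # e.g., "sms", "voice", "video", "data" (hypothetical labels)
    timestamp: datetime

def purge_expired(events, now):
    """Keep only the events that are still inside the retention window."""
    return [e for e in events if now - e.timestamp <= RETENTION]
```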

Spatial‐temporal mobility analysis

Analysis of the population's mobility may be based on many different systems, for example, GPS locations, debit cards, the Internet of Things, and many others. The main advantage of mobile networks is their penetration: the Czech Republic has more registered SIM cards than inhabitants. Such penetration creates ideal conditions for using these data to analyze the population's mobility patterns. The data analysis is done in a single-day cycle, from 12:00 a.m. to 11:59 p.m. each day. The first 5 h are not very important, as this is expected to be a quiet night period. The day period starts at 5:00 a.m., and from then on, the behavior of each person/device is investigated. The location of the device at 5:00 a.m. is taken as the starting home location. The location is tracked for the whole day, and the device is expected to move from the starting location to other places. In most cases, the movement leads to the work location. The sequence of cells and BTSs used during the movement is carefully tracked and evaluated for precise mapping of the location. Of course, the movement usually starts much later than 5:00 a.m. However, 5:00 a.m. is the threshold that was investigated and defined in the Methodology as a suitable time for dividing the night and day periods. The tracking continues during the whole day until the device settles in a location in the evening and night hours. When this location is the same as in the morning, we consider it a full home location. Otherwise, the person has changed their night location for obvious reasons, such as a business trip. This approach has limitations and exceptions, which are mostly visible on Mondays, Fridays, and Saturdays. On Mondays and Fridays, many people travel from their weekend location to their work location and back. Fridays and Saturdays are also typical days for event trips with friends.
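The day cycle can be sketched for one device: determine the starting home location at the 5:00 a.m. threshold, then compare it with the location where the device ends the day. The text does not say how a device that is silent at exactly 5:00 is handled. This sketch assumes the last location observed at or before 5:00 is used, or else the first one after it. The function name `home_locations` is hypothetical.

```python
from datetime import datetime, time

DAY_START = time(5, 0)  # night/day threshold from the Methodology

def home_locations(events):
    """Determine the starting and ending locations for one device-day.

    events: list of (datetime, location) within one calendar day.
    Returns (start_location, end_location, is_full_home).
    Assumption: start = last location seen at or before 5:00 a.m., or the
    first location after 5:00 if the device was silent overnight;
    end = last location seen that day.
    """
    events = sorted(events)
    if not events:
        raise ValueError("no events for this device-day")
    before = [loc for ts, loc in events if ts.time() <= DAY_START]
    start = before[-1] if before else events[0][1]
    end = events[-1][1]
    return start, end, start == end
```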
When all devices are investigated, we obtain a heat map of the locations and stations where people appeared, together with its time distribution during the day. Such information may be aggregated into overall statistics and their development during the day. The system parameters strictly limit the granularity of the statistics. The optimal interval is 1 h because, when no other activity is performed, the communication between the device and the network is refreshed every hour. With the fast development of devices, shorter intervals, such as 30 and 15 min, may be selected. 4G networks use a shorter interval for synchronization and "i-am-alive" messages between the device and the network. 5G networks will bring a new problem because they use adaptive syncing, where the interval is not fixed.
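The heat map can be built by counting unique devices per station and time slot. The slot length is a parameter, so the same code works for the 60-, 30-, or 15-minute granularity mentioned above. The input layout `(device_id, bts_id, datetime)` and the function name `occupancy` are assumptions made for this sketch.

```python
from collections import defaultdict
from datetime import datetime

def occupancy(events, interval_minutes=60):
    """Count unique devices per (bts_id, slot index) within one day.

    events: iterable of (device_id, bts_id, datetime).
    A device seen several times at the same station in the same slot
    is counted once.
    """
    if (24 * 60) % interval_minutes:
        raise ValueError("interval must divide the day evenly")
    seen = defaultdict(set)
    for device, bts, ts in events:
        slot = (ts.hour * 60 + ts.minute) // interval_minutes
        seen[(bts, slot)].add(device)
    return {key: len(devices) for key, devices in seen.items()}
```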

UTILIZATION OF THE GATHERED DATA

The data aggregated during the day on the individual base stations must be grouped into higher territorial units. A single pylon typically carries several base stations for each technology, as well as several different technologies. The aggregation process may be defined as a sequence of steps that follow each other to bring a more globalized view of the data. The details are shown in Figure 2. Once the data are aggregated, many applications may be built on them.
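Because one pylon carries several base stations and technologies, summing per-station counts would count the same device more than once. The sketch below therefore merges sets of device identifiers by union at each level. It can be applied repeatedly (BTS to pylon/site, site to district, and so on). The lookup tables and the function name `aggregate_unique` are hypothetical.

```python
from collections import defaultdict

def aggregate_unique(devices_per_unit, parent):
    """Roll sets of device IDs up one level of the territorial hierarchy.

    devices_per_unit: {unit_id: set of device ids seen in a time slot}
    parent: {unit_id: higher unit id} (hypothetical lookup table)
    Union instead of summation: a device that used several base stations
    on the same pylon is counted only once at the higher level.
    """
    merged = defaultdict(set)
    for unit, devices in devices_per_unit.items():
        merged[parent[unit]] |= devices
    return dict(merged)
```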
FIGURE 2

Aggregation process for the day cycle

Behavior patterns in population mobility

When all the aggregated data are collected for a defined time interval, a behavior analysis may be performed. There are two basic views of the analysis. The first is based on the analysis of the occupancy of a place. The second is the population flow in time.

Static analysis of people's locations

The occupancy, or utilization, of specific base stations is relevant for evaluating the concentration of people in defined places. Such information may be used for planning public transportation services, crowd avoidance, security and safety analysis, urban planning, phenomena detection, and counting unique person visits. For each place (base station location, city part, city, or region), we can count the number of unique devices/persons in each time unit. This information becomes truly useful only when the differences between dates are analyzed. Figure 3 shows the difference between the long-term average and the current situation. The large decrease in commuters and the even larger increase in noncommuters indicate that something unusual has happened: very few people come into the area, and even fewer of them leave the area during the day. Of course, the unusual situation is the COVID-19 pandemic and the near lockdown of the whole country; more about this topic in the next section.
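A comparison like the one in Figure 3 reduces to the relative change of the current count against the long-term average for the same place and category. The sketch assumes the history holds past daily counts that are directly comparable to the current day (for example, the same weekday type). The function name `percent_change` is hypothetical. The sample numbers are invented and are not the Prague 1 data.

```python
from statistics import mean

def percent_change(history, current):
    """Percentage difference of the current count from the long-term average.

    history: past daily counts for the same place and category
    (e.g., commuters in one district); current: today's count.
    """
    baseline = mean(history)
    if baseline == 0:
        raise ValueError("long-term average is zero")
    return 100.0 * (current - baseline) / baseline

# Invented example values: commuters drop, noncommuters rise.
print(percent_change([1000, 1100, 900], 400))
print(percent_change([200, 200, 200], 500))
```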
FIGURE 3

Percentage differences in commuters and noncommuters between March 15 and May 30 in the Prague 1 city district

Long-term data collection in a specific location may be used to predict the area's future occupancy. Such information may be used to plan public transport and to estimate the number of persons for future calculations. An example of such a prediction is depicted in Figure 4. As may be seen, the prediction can be made with defined confidence intervals.
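The text does not state which forecasting model produced Figure 4. As a stand-in, the sketch below uses a simple weekday-profile baseline. Each future day is predicted as the mean of past counts on the same weekday. The interval is the mean ± z·(standard deviation), which assumes roughly normal day-to-day variation. The function name `forecast` and the default z = 1.96 (about 95%) are assumptions.

```python
from datetime import date, timedelta
from statistics import mean, stdev
from typing import Dict, List, Tuple

def forecast(history: Dict[date, float], days: int = 14, z: float = 1.96
             ) -> List[Tuple[date, float, float, float]]:
    """Weekday-profile forecast with approximate confidence intervals.

    history: {date: count}; needs at least two past values per weekday.
    Returns a list of (date, prediction, lower bound, upper bound).
    """
    by_weekday = {}
    for d, count in history.items():
        by_weekday.setdefault(d.weekday(), []).append(count)
    last = max(history)
    result = []
    for i in range(1, days + 1):
        d = last + timedelta(days=i)
        values = by_weekday[d.weekday()]
        m, s = mean(values), stdev(values)
        result.append((d, m, m - z * s, m + z * s))
    return result
```

A richer model (e.g., with trend or holiday effects) would replace the mean and deviation per weekday, but the output shape, a prediction with lower and upper bounds per day, stays the same.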
FIGURE 4

Prediction of the cumulative number of persons for 14 days

Dynamic analysis of people's locations

The population flow in time is valuable information for public transport planning at the intercity level, measuring the occupancy of transport vehicles, road capacity utilization, traffic control, and accident detection. Measuring the occupancy of public transport vehicles was the original idea, which was later extended into many other applications. Measuring this occupancy requires precise information about the road network, the station locations, and the vehicle movement logs. Figure 5 depicts the relations and the number of people who travel between two cities along the route of a public transport service, with respect to their source and destination during a day. The numbers on the edges compare the known number obtained by the Czech Statistical Agency through survey analysis with the number computed from the data. As may be seen, the dynamic analysis can deliver information about transport patterns. Because we can track the approximate position and the speed of travel, we can detect the type of vehicle used. With time-tracked GPS positions of public transport vehicles, we may detect people who follow the same pattern as the vehicles, which means they travel by that vehicle. A person traveling by car may follow the same trajectory and use the same roads, but cars travel at a different speed and make no stops on these short trips. The system is therefore able to distinguish between public transport users and drivers. Even though many factors limit the precision, the approach can achieve very high precision. Moreover, such information may be used to collect information about traffic jams, accidents, etc. Speed measurement can also detect faster vehicles, such as emergency helicopters traveling in the same area.
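Matching a device to a public transport vehicle can be sketched as follows. Compare the device's approximate positions with the vehicle's GPS positions at common timestamps, and also estimate average speed. The track format (`{seconds: (x_m, y_m)}` in a planar projection), the 500 m radius, the 80% share threshold, and both function names are illustrative assumptions, not parameters from the text.

```python
from math import hypot

def follows_vehicle(device_track, vehicle_track, radius_m=500.0, min_share=0.8):
    """Decide whether a device moved together with a tracked vehicle.

    Tracks: {timestamp_seconds: (x_m, y_m)}. Device positions are coarse
    (e.g., BTS coverage centroids), hence the generous radius. The device is
    treated as a passenger if it was within radius_m of the vehicle for at
    least min_share of the common timestamps.
    """
    common = sorted(set(device_track) & set(vehicle_track))
    if not common:
        return False
    close = sum(
        1 for t in common
        if hypot(device_track[t][0] - vehicle_track[t][0],
                 device_track[t][1] - vehicle_track[t][1]) <= radius_m
    )
    return close / len(common) >= min_share

def mean_speed_kmh(track):
    """Average speed along a track {t_seconds: (x_m, y_m)}, in km/h."""
    times = sorted(track)
    dist = sum(hypot(track[b][0] - track[a][0], track[b][1] - track[a][1])
               for a, b in zip(times, times[1:]))
    duration = times[-1] - times[0]
    return 3.6 * dist / duration if duration else 0.0
```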
FIGURE 5

Relations of commuters between different cities/stations from the city of Beroun to Prague

The direct use of the data may also bring new applications to personal transport. Especially during an infectious disease outbreak, people want to avoid large crowds while traveling. Person flow data may help predict crowded places and calculate the path between the source and destination with the smallest probability of crowded surroundings. Such applications appeared during the COVID-19 pandemic in many countries and cities.
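A least-crowded route can be found with Dijkstra's shortest-path algorithm if each road or transit segment is weighted by its predicted crowding instead of its length. The sketch assumes such non-negative weights are already available, for example the predicted number of persons on a segment in the travel time slot. The graph layout and the function name `least_crowded_path` are hypothetical.

```python
import heapq

def least_crowded_path(graph, source, target):
    """Dijkstra's algorithm with predicted crowd levels as edge weights.

    graph: {node: {neighbour: non-negative crowd weight}}.
    Returns (total_weight, path), or (inf, []) if target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return d, path[::-1]
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    return float("inf"), []
```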

Graph‐based applications

The detection of devices at base stations generates a vast stream of data from the whole network. Due to its time dependency, the data may be represented as time-dependent graphs. Such a graph, demonstrated in Figure 6, may use the information about the temporal closeness of devices/persons to create a probability network of local neighborhood connections. The left part of the figure shows how people change location over time and the occupancy of three base stations. The groups of people who appear at BTS-B in each time step are highlighted. The corresponding graph is depicted on the right. As may be seen, each time step generates cliques that evolve over time. From these time slices, a probability graph may be constructed and used to analyze the relationships between persons. Such a graph reflects the probability that a person visited a place covered by the BTS.
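The construction in Figure 6 can be sketched as follows. In each time step, the persons at the same BTS form a clique. Across the time steps, the weight of an edge is taken as the share of steps in which the pair was co-located. Using that share as the probability is an assumption for illustration. The snapshot format and the function name `colocation_graph` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def colocation_graph(snapshots):
    """Build a weighted co-location graph from time-sliced BTS occupancy.

    snapshots: list over time steps; each item is {bts_id: set of person ids}.
    Each group at one BTS in one step is a clique; the edge weight is the
    fraction of time steps in which the pair shared a BTS.
    """
    counts = Counter()
    for snapshot in snapshots:
        for persons in snapshot.values():
            for a, b in combinations(sorted(persons), 2):
                counts[(a, b)] += 1
    steps = len(snapshots)
    return {pair: n / steps for pair, n in counts.items()} if steps else {}
```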
FIGURE 6

Example of the network model based on the collected data

Such graphs may also be constructed from GPS position logs, but this requires the persons' cooperation and enabling of such functionality. Once a graph is constructed, many different distance measures may be used to determine the similarity between users based on their mobility during the day. Among others, the shortest path describes the first-level distance between nodes. As a more complex distance, the Closed Trail Distance measure may be used. The behavior of persons may then be analyzed using clustering coefficients. The behavior patterns may be compared with historical records, and current situations compared with reference situations that were analyzed precisely.
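Two of the measures named above can be sketched on an unweighted, undirected graph: the shortest-path (hop) distance and the local clustering coefficient. Such a graph might be obtained, for example, by keeping the edges of the probability graph above a chosen threshold; that threshold would be an assumption. The adjacency format `{node: set of neighbours}` and both function names are hypothetical. The Closed Trail Distance is not implemented here.

```python
from collections import deque

def hop_distance(adj, source, target):
    """Shortest-path length in edges via breadth-first search (None if unreachable)."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, d = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == target:
                return d + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None

def clustering_coefficient(adj, node):
    """Local clustering coefficient: share of neighbour pairs that are linked."""
    neigh = list(adj.get(node, ()))
    k = len(neigh)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if neigh[j] in adj.get(neigh[i], ()))
    return 2.0 * links / (k * (k - 1))
```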

Possible applications

The essential aspect of the data is its security. First, the analyses and statistics are created from the providers' anonymized data, which are fully anonymized daily. Therefore, the system cannot identify anybody directly from the data. Moreover, a person cannot be identified from their behavior. These security measures prevent the data from being used to directly map the current location of people. This is important because mentioning this methodology in a major TV news show caused concern among the general public and personal freedom activists. On the other hand, the methodology may be used in other COVID-related applications. The Czech Government uses these data to monitor citizens' general mobility and the increase in the number of noncommuters, that is, people who do not commute to work. The data are used on a day-by-day basis, and the government uses them to check compliance with the regulations. Compliance with the quarantine regulation applied to COVID-positive persons cannot be checked using the anonymous data. However, under a state of emergency, the government may authorize the police to use nonanonymous data to monitor the real movement of quarantined people. Such data were used by the Israeli secret service to monitor Israeli citizens during the first wave of the COVID pandemic. However, the practice was forbidden by the constitutional court. Many government applications use very similar approaches to track citizen mobility and contacts. Unfortunately, such applications did not attract many users due to controversial or unclear personal data handling. New applications based on the technology developed by Google and Apple, the primary cell phone operating system providers, attract a higher number of users, and these newly designed applications bring contact tracing to a completely new level.
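The providers' actual anonymization procedure is not described in the text. The sketch below shows one common pattern that is consistent with daily anonymization, offered only as an illustrative assumption. Device IDs are replaced by keyed hashes (HMAC-SHA-256) under a random key that is created each day and never stored. Aggregated cells with fewer than k devices are then suppressed. Within one day the hashes are still pseudonyms, not anonymous data. Cross-day linkage becomes infeasible only once the day's key is discarded. The threshold k = 10 and both function names are hypothetical.

```python
import hashlib
import hmac
import secrets

def daily_pseudonymizer():
    """Return a function mapping device IDs to pseudonyms valid for one day.

    A fresh random key is generated per day and is never persisted, so
    pseudonyms from different days cannot be linked after it is discarded.
    """
    key = secrets.token_bytes(32)

    def pseudonym(device_id: str) -> str:
        return hmac.new(key, device_id.encode("utf-8"), hashlib.sha256).hexdigest()

    return pseudonym

def suppress_small(counts, k=10):
    """Drop aggregated cells with fewer than k devices (k is an assumed threshold)."""
    return {cell: n for cell, n in counts.items() if n >= k}
```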

CONCLUSION

In this article, we described a system for processing the massive data from telecommunication networks that are collected by the providers and stored under national regulations. Several applications concerning the nature of the data were presented. Data collection brings completely new problems concerning the General Data Protection Regulation in the EU, national regulations, and other laws and restrictions. Aggregated data without any identification may be analyzed, and usable output produced, with a well-defined methodology. The Czech Ministry of Transportation certified the methodology. Other applications include population mobility during a time interval, comparison of mobility patterns and their changes, and anomaly detection. Moreover, several measures for data analysis were summarized. The mentioned applications need vast computational power, available in the data centers of the network providers. Using methodologies like the one described in this article, network providers will create an entirely new market for specialized analyses that are impossible with the classical schemes and data sources.

1.  AllAboard: Visual Exploration of Cellphone Mobility Data to Optimise Public Transport.

Authors:  G Di Lorenzo; M Sbodio; F Calabrese; M Berlingerio; F Pinelli; R Nair
Journal:  IEEE Trans Vis Comput Graph       Date:  2016-02       Impact factor: 4.579

2.  Tracking Disease: Digital Epidemiology Offers New Promise in Predicting Outbreaks.

Authors:  Mary Bates
Journal:  IEEE Pulse       Date:  2017 Jan-Feb       Impact factor: 0.924

3.  Socio-geography of human mobility: a study using longitudinal mobile phone data.

Authors:  Santi Phithakkitnukoon; Zbigniew Smoreda; Patrick Olivier
Journal:  PLoS One       Date:  2012-06-28       Impact factor: 3.240

4.  Unveiling Spatial Epidemiology of HIV with Mobile Phone Data.

Authors:  Sanja Brdar; Katarina Gavrić; Dubravko Ćulibrk; Vladimir Crnojević
Journal:  Sci Rep       Date:  2016-01-13       Impact factor: 4.379

5.  Population data mobility retrieval at territory of Czechia in pandemic COVID-19 period.

Authors:  Jan Platos; Pavel Kromer; Miroslav Voznak; Vaclav Snasel
Journal:  Concurr Comput       Date:  2020-11-24       Impact factor: 1.831

6.  Mobile phone data for informing public health actions across the COVID-19 pandemic life cycle.

Authors:  Nuria Oliver; Bruno Lepri; Harald Sterly; Renaud Lambiotte; Sébastien Deletaille; Marco De Nadai; Emmanuel Letouzé; Albert Ali Salah; Richard Benjamins; Ciro Cattuto; Vittoria Colizza; Nicolas de Cordes; Samuel P Fraiberger; Till Koebe; Sune Lehmann; Juan Murillo; Alex Pentland; Phuong N Pham; Frédéric Pivetta; Jari Saramäki; Samuel V Scarpino; Michele Tizzoni; Stefaan Verhulst; Patrick Vinck
Journal:  Sci Adv       Date:  2020-06-05       Impact factor: 14.136

7.  Closed trail distance in a biconnected graph.

Authors:  Vaclav Snasel; Pavla Drazdilova; Jan Platos
Journal:  PLoS One       Date:  2018-08-31       Impact factor: 3.240
