Teresa Cristóbal, Alexis Quesada-Arencibia, Gabriele Salvatore de Blasio, Gabino Padrón, Francisco Alayón, Carmelo R García.
Abstract
Millions of people use public transport systems daily, which makes these systems relevant to the epidemiology of respiratory infectious diseases from both a scientific and a health-control point of view. This article presents a methodology for obtaining epidemiological information on these diseases in the context of a public road transport system. The information is based on an estimate of the interactions between users of the public transport system that carry a risk of infection. The methodology is novel in its aim: to the best of our knowledge, no previous study in the context of epidemiology and public transport systems addresses this challenge. The information is obtained by mining the data generated by the trips of users who pay with contactless cards, so data mining underpins the methodology. The approach is comprehensive: starting from a formalisation of the problem based on epidemiological concepts and the transport activity itself, all the steps needed to obtain the required epidemiological knowledge are described and implemented. This includes estimating data that are generally unknown in public transport systems but are required to generate the desired results. The outcome is useful epidemiological data based on a complete and reliable description of all estimated potentially infectious interactions between users of the transport system. The methodology can be applied under a variety of initial specifications: epidemiological, temporal and geographic, among others. Because the information it provides covers a large number of people and large samples of interactions obtained over long periods, it also makes comparative epidemiological studies possible.
Moreover, a real use case is described, in which the methodology is applied to a road transport system that moves around 20 million passengers annually, in a period that predates the COVID-19 pandemic. The results identify the group of users most exposed to infection, which is not the largest group. Finally, it is estimated that applying a seat allocation strategy that minimises the risk of infection reduces that risk by 50%.
Keywords: COVID-19; Contact patterns; Data mining; Intelligent transport systems; Network epidemiology
Year: 2022 PMID: 36212894 PMCID: PMC9525233 DOI: 10.1007/s12652-022-04427-2
Source DB: PubMed Journal: J Ambient Intell Humaniz Comput
Summary of the relevant characteristics of different methodologies
| Work | Population type | Methodology for generating contact network | Contact types | Recorded information (Study period) | Participant sample size (contacts/interactions sample size) |
|---|---|---|---|---|---|
| Ferguson | General | Simulation based on instantaneous population density | Co-presence in households, schools and workplaces | NA | USA synthetic population of 300 million; UK synthetic population of 58.1 million (NA) |
| Longini | General | Simulation based on census data, demographic information and social network data | Co-presence in households, schools and workplaces | NA | NA |
| Wallinga | General, excluding 0–1 year | Survey (face-to-face interview) | Conversational | Age, location (1 week) | 2106 |
| Mossong | General | Survey (self-report) | Physical, conversational | Age, location, duration, frequency (1 day) | 7297 (97,904) |
| Klepac | General | Survey (self-report) | Physical, conversational | Age, location (1 day) | 40,177 |
| Meyers | General | Simulation based on census data, demographic information, and social network data | Co-presence in households, schools, workplaces, hospitals and other public places | Simulated location | Vancouver synthetic population of 2600 people |
| Chao | General | Simulation based on census data, demographic information, and census transportation | Co-presence in households, schools, workplaces and community | Simulated location | USA synthetic population of 280 million |
| Isella | Event participants | Sensor proximity (RFID) | Distance proximity | Contact participants, duration, frequency (3 months) | 100 people in scientific conference (10,000), 14,000 people in museum exhibition (23,000) |
| Cattuto | Event participants | Sensor proximity (wireless sensor) | Distance proximity | Contact participants, duration, frequency (12, 3 and 2 days) | 25 people in exhibition (8700), 575 people in scientific conference (17,000), 405 people in scientific conference (60,000) |
| Salathé | School community | Sensor proximity (wireless sensor) | Distance proximity | Contact participants, duration, frequency (1 day) | 778 (762,868) |
| Isella | Paediatric hospital service community | Sensor proximity (RFID) | Distance proximity | Contact participants, duration, frequency (7 days) | 119 (16,000) |
| Stehlé | School community | Sensor proximity (RFID) | Distance proximity | Contact participants, duration, frequency (2 days) | 242 (77,602) |
| Stopczynski | University students | Sensor proximity (Bluetooth) | Distance proximity | Contact participants, duration, frequency (28 days) | 464 (1,472,090) |
| Proposed methodology | Intercity public transport system users | Sensor proximity (contactless smart cards), data mining, data-driven simulation | Distance proximity | Contact participants, duration, frequency, vehicle (31 days) | 43,804 (176,892) |
Notation of the formal model used by the methodology
| Notation | Meaning |
|---|---|
| | Node on the transport network. Each node is associated with a stop. Subscript |
| | Set of transport network nodes |
| | Transport network arc. Each arc directly links two nodes of set |
| | Set of arcs on the transport network directly linking two nodes |
| | Directed graph representing the transport network |
| | Route on the transport network. Subscript |
| | Set of defined routes on the transport network |
| | Vehicle journey on route |
| | Set of vehicle journeys on all defined routes on the transport network |
| | Period of time |
| | Moment of time in period |
| | Set of vehicle journeys that have been completed in time period |
| | Set of vehicle journeys on route |
| | Set of vehicle journeys on route |
| | Public transport vehicle |
| | Vehicle journey on route |
Fig. 1 Example of the proposed notation on a route
Fig. 2 Schematic representation of vehicle
Fig. 3 Schema of the Graph Database
Event data structure
| Start date and time | p1 user key | p1 age group | p2 user key | p2 age group | Number of events | Total duration |
|---|---|---|---|---|---|---|
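As an illustration of how rows with these fields could be derived, the following minimal sketch aggregates pairwise co-presence on the same vehicle journey. It is not the authors' implementation: the `Trip` record, the `co_presence_events` function, the field names and the sample trips are all hypothetical, boarding and alighting times are assumed to be already known or estimated, and durations are assumed to be in minutes.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations


@dataclass(frozen=True)
class Trip:
    """Hypothetical input record: one passenger trip on one vehicle journey."""
    user_key: str
    age_group: str
    journey_id: str
    board: datetime
    alight: datetime


def co_presence_events(trips):
    """Aggregate overlapping trips into one row per pair of users."""
    by_journey = defaultdict(list)
    for trip in trips:
        by_journey[trip.journey_id].append(trip)

    rows = {}
    for journey_trips in by_journey.values():
        for a, b in combinations(journey_trips, 2):
            if a.user_key == b.user_key:
                continue
            # Two users are co-present while both are on board.
            start = max(a.board, b.board)
            end = min(a.alight, b.alight)
            if end <= start:
                continue  # the two trips do not overlap in time
            p1, p2 = sorted((a, b), key=lambda t: t.user_key)
            row = rows.setdefault((p1.user_key, p2.user_key), {
                "start": start,
                "p1_user_key": p1.user_key,
                "p1_age_group": p1.age_group,
                "p2_user_key": p2.user_key,
                "p2_age_group": p2.age_group,
                "number_of_events": 0,
                "total_duration_min": 0.0,
            })
            row["start"] = min(row["start"], start)
            row["number_of_events"] += 1
            row["total_duration_min"] += (end - start).total_seconds() / 60
    return list(rows.values())


# Made-up sample: the same two users share two vehicle journeys on one day.
trips = [
    Trip("u1", "20-24", "j1", datetime(2019, 1, 7, 8, 0), datetime(2019, 1, 7, 8, 30)),
    Trip("u2", "40-49", "j1", datetime(2019, 1, 7, 8, 10), datetime(2019, 1, 7, 8, 40)),
    Trip("u1", "20-24", "j2", datetime(2019, 1, 7, 17, 0), datetime(2019, 1, 7, 17, 20)),
    Trip("u2", "40-49", "j2", datetime(2019, 1, 7, 17, 5), datetime(2019, 1, 7, 17, 20)),
]
rows = co_presence_events(trips)
```

In this sample the pair overlaps for 20 minutes on the first journey and 15 minutes on the second, so the single resulting row has two events and a total duration of 35 minutes. Close interactions would additionally need seat positions, which this sketch does not model.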
Bodywork type data structure
| Bodywork type key | Seat identifier | |
|---|---|---|
| { |
Fig. 4 General process diagram
Distance between the estimated destination node and the actual destination node
| Distance | Number of trips | Percentage of cases (%) |
|---|---|---|
| distance = 0 (actual stop = estimated stop) | 98,926 | 48.21 |
| 0 < distance < 0.5 km | 20,630 | 10.05 |
| 0.5 km ≤ distance < 0.75 km | 21,939 | 10.69 |
| 0.75 km ≤ distance < 1 km | 5452 | 2.65 |
| 1 km ≤ distance < 3 km | 17,485 | 8.52 |
| 3 km ≤ distance | 40,751 | 19.86 |
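The percentage column can be reproduced from the trip counts: each value is the bin's share of the 205,183 trips counted in this table. The sketch below is only an arithmetic check. The bin labels use the lower bounds as printed, and truncation to two decimals is an assumption inferred from the printed values (for example, 5452 trips is 2.657% and is printed as 2.65).

```python
import math

# (lower bound of the distance bin as printed in the table, number of trips)
bins = [
    ("0 (actual stop = estimated stop)", 98_926),
    ("> 0", 20_630),
    (">= 0.5 km", 21_939),
    (">= 0.75 km", 5_452),
    (">= 1 km", 17_485),
    (">= 3 km", 40_751),
]

total = sum(count for _, count in bins)


def truncated_pct(part, whole):
    """Percentage truncated (not rounded) to two decimals; an assumed convention."""
    return math.floor(part / whole * 10_000) / 100


percentages = [truncated_pct(count, total) for _, count in bins]

# Share of trips whose estimated stop lies less than 1 km from the actual stop.
within_1km = sum(count for _, count in bins[:4]) / total * 100

for (label, count), pct in zip(bins, percentages):
    print(f"{label:35s} {count:7d} {pct:6.2f}")
print(f"within 1 km: {within_1km:.2f}%")
```

Under these figures, about 71.6% of the estimated destinations fall less than 1 km from the actual stop.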
Number of nodes (entities) of each type
| Node type | Number of entities |
|---|---|
| Bodywork type | 23 |
| User | 43,804 |
| Vehicle journey | 70,732 |
| Route | 440 |
| Stop | 2923 |
| Payment card | 44,372 |
| Vehicle | 443 |
Trips completed in the selected period
| Total trips | 996,184 |
|---|---|
| Trips with known destination | 60,545 |
| Trips with unknown destination | 935,639 |
| Trips for which the destination could be estimated | 725,145 |
| Trips for which the destination could not be estimated | 210,494 |
| Total trips entered in the Graph Database | 785,718 |
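The breakdown above can be checked with a few lines of arithmetic. The sketch below only restates the table's figures; the variable names are illustrative.

```python
total_trips = 996_184
known_destination = 60_545
unknown_destination = 935_639
estimated = 725_145
not_estimated = 210_494
entered_in_graph_db = 785_718

# The two splits in the table are internally consistent.
assert known_destination + unknown_destination == total_trips
assert estimated + not_estimated == unknown_destination

# Share of unknown destinations that could be estimated.
share_estimated = estimated / unknown_destination
# Share of all trips that ended up in the graph database.
share_entered = entered_in_graph_db / total_trips
# Difference between the listed database total and known + estimated trips.
gap = entered_in_graph_db - (known_destination + estimated)

print(f"estimated: {share_estimated:.1%}, entered: {share_entered:.1%}, gap: {gap}")
```

Roughly 77.5% of the unknown destinations could be estimated, and about 78.9% of all trips were entered in the graph database. Known plus estimated trips give 785,690, which is 28 fewer than the listed database total; the table does not say where the difference comes from.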
Fig. 5 Preliminary values obtained per age group in the period analyzed: (a) number of users per age group, (b) number of journeys per time interval, (c) quartiles of the travel time per age group, and (d) number of journeys per age group
Average number of trips made by members of each age group in the study period
| Age group | Average number of trips made |
|---|---|
| 0–14 | 11.8 |
| 15–19 | 16.6 |
| 20–24 | 16.4 |
| 25–29 | 18.1 |
| 30–39 | 20.8 |
| 40–49 | 21.6 |
| 50–59 | 20.2 |
| 60–69 | 15.9 |
| ≥ 70 | 11.9 |
Summary of median total events
| | 1 event | 2 events | > 2 events |
|---|---|---|---|
| Co-presence | 176,892 | 2897 | 225 |
| Close interaction EP | 20,941 | 270 | 13 |
| Close interaction MRP | 10,465 | 56 | 0 |
Summary of median event duration in minutes
| | 1 event | 2 events | > 2 events |
|---|---|---|---|
| Co-presence | 14 | 33 | 57 |
| Close interaction EP | 24 | 39 | 58 |
| Close interaction MRP | 24 | 40 | 59 |
Fig. 6 Event matrices: (a) co-presence interactions TM, (b) co-presence interactions RM, (c) close interactions TM applying EP, (d) close interactions RM applying EP, (e) close interactions TM applying MRP, and (f) close interactions RM applying MRP
Population of Gran Canaria and users of the transport system by age group
| Age group | Inhabitants | Users | Users as percentage of inhabitants (%) |
|---|---|---|---|
| 0–14 | 109,616 | 1009 | 0.92 |
| 15–19 | 45,409 | 10,001 | 22.02 |
| 20–24 | 45,563 | 7203 | 15.8 |
| 25–29 | 51,535 | 3421 | 6.63 |
| 30–39 | 121,009 | 4585 | 3.78 |
| 40–49 | 152,829 | 5597 | 3.66 |
| 50–59 | 137,619 | 5585 | 4.05 |
| 60–69 | 91,611 | 3303 | 3.6 |
| ≥ 70 | 96,040 | 3100 | 3.22 |
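The last column is the number of users divided by the number of inhabitants in each age group. The following sketch recomputes it from the table and picks out the age group with the highest share; the dictionary layout and names are illustrative, and truncation to two decimals is an assumption inferred from the printed values.

```python
import math

# age group: (inhabitants, users), copied from the table
population = {
    "0-14": (109_616, 1_009),
    "15-19": (45_409, 10_001),
    "20-24": (45_563, 7_203),
    "25-29": (51_535, 3_421),
    "30-39": (121_009, 4_585),
    "40-49": (152_829, 5_597),
    "50-59": (137_619, 5_585),
    "60-69": (91_611, 3_303),
    ">=70": (96_040, 3_100),
}


def truncated_pct(users, inhabitants):
    """Users as a percentage of inhabitants, truncated to two decimals (assumed)."""
    return math.floor(users / inhabitants * 10_000) / 100


share = {group: truncated_pct(u, n) for group, (n, u) in population.items()}
highest = max(share, key=share.get)
total_users = sum(users for _, users in population.values())

print(highest, share[highest], total_users)
```

The user counts add up to the 43,804 users reported for the graph database, and the 15–19 group has by far the highest share of inhabitants using the system.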