Goal-oriented possibilistic fuzzy C-Medoid clustering of human mobility patterns-Illustrative application for the Taxicab trips-based enrichment of public transport services.

Miklós Mezei¹, Imre Felde², György Eigner²,³, Gyula Dörgő⁴, Tamás Ruppert⁴, János Abonyi⁴.

Abstract

The discovery of human mobility patterns of cities provides invaluable information for decision-makers who are responsible for redesign of community spaces, traffic, and public transportation systems and building more sustainable cities. The present article proposes a possibilistic fuzzy c-medoid clustering algorithm to study human mobility. The proposed medoid-based clustering approach groups the typical mobility patterns within walking distance to the stations of the public transportation system. The departure times of the clustered trips are also taken into account to obtain recommendations for the scheduling of the designed public transportation lines. The effectiveness of the proposed methodology is revealed in an illustrative case study based on the analysis of the GPS data of Taxicabs recorded during nights over a one-year-long period in Budapest.

Year: 2022 PMID： 36201501 PMCID： PMC9536562 DOI： 10.1371/journal.pone.0274779

Source DB: PubMed Journal: PLoS One ISSN： 1932-6203 Impact factor: 3.752

Introduction

According to the UN reports, cities are responsible for approximately 70% of global carbon emissions, and the expected population living in cities will reach 6.5 billion by 2050 [1]. The transport sector is one of the main contributors to greenhouse gas emission. Rapid urban population growth, traffic congestion, and related air pollution put cities at the center of the climate mitigation agenda. These facts suggest urgent and transformative actions in urban mobility are required [2]. According to the report of Masson-Delmotte et al. on global warming [3], in 2014, transportation accounted for 23% of global energy-related CO2 emissions and by 2017 the impact of road transport was further increased by 2%, from which 44% was caused by passenger cars [4]. Another report from the European Commission states that, urban mobility accounts for 40% of all CO2 emissions of road transportation and up to 70% of other pollutants from transportation [5]. In slight contrast, another study by Toledo et al. found that individual motorized transport causes 59% of greenhouse gas emissions [6]. Shapiro et al. compared the emissions of private vehicles and public transportation, and found that public transport produces 95% less CO, 45% less CO2, and 48% less NO2 than private vehicles [7]. According to the work of Jenks et al. on the dimensions of sustainable cities, the sustainability of cities depends on environmental, transportation, social and economic issues [8]. This is well complemented by the nowadays trending smart-city concept, which supports the different fields of urban mobility to decrease carbon emissions [2]. Smart-mobility applications like mobile monitoring systems [9], traffic performance measurement [10], bicycle-sharing systems [11] and smart vehicle routing systems [12] were proved to have an advantageous environmental impact in facilitating air-pollution reduction. 
Based on the findings of Jenks et al., the improvement of transportation should be based on a better understanding of the impact of the urbanization form on travel behaviours [8]. By optimizing public transportation based on human mobility patterns, there is a possibility to avoid constant traffic jams and decrease pollution. The Taxicab trips provide a valuable and unique source of information to explore human mobility patterns of the city. As it was pointed out by Siła-Nowicka et al., the analysis of human mobility patterns is essential for understanding the evolution of size and structure of urban areas [13]. The primary goal of the analysis of these mobility patterns is to get a better overview of the system design. This trend of analysis has gained momentum, as the Internet of Things (IoT) equipment that captures movement information in real-time and at detailed spatial and temporal scales (e.g., GPS trackers [14]) has changed the ability to collect movement data [15]. These GPS trajectories enable the exploration of formerly hidden aspects of the dynamics of cities. Cities have introduced the concept of GPS sensor-equipped Taxicabs to enable Taxicab tracking services, which generate GPS trace mobility data [16]. As it was highlighted by Kumar et al., this position information provides helpful insight into the human mobility patterns of the city [17]. As an example, the work of Kaltenbrunner et al. illustrates how the data recorded by the bicycle sharing system was utilized to detect temporal and geographic mobility patterns within the city [18]. Böhm et al. used GPS traces and a microscopic model to analyse the emissions of four air pollutants from thousands of vehicles in three European cities [19]. Chen et al. presented how these trajectories could be used to analyse the emission of particle matter from braking behaviours [20]. 
Human mobility analysis approaches were reviewed by Massimiliano et al., who discussed two predictive tasks (next-location and crowd-flow prediction) and two generative tasks (trajectory and flow generation) [14]. Mobility pattern mining is also used to understand group-based travel behaviours, as presented by Du et al. [21]. Such analysis helps to diagnose and understand the residents of each region and their demand for public transportation. This is especially crucial because, according to Egger [22], transportation choices are a fundamental component of sustainability due to their impact on the economy and on further social, political, and environmental aspects. According to Badia et al., the convenience of transit systems versus cars in urban areas is generally well accepted [23], and, in particular, electric bus-based public transportation systems should be designed to improve the sustainability of cities according to the work of Majumder et al. [24]. For the design of these networks, Tang et al. developed a multi-stage machine learning framework based on recurrent neural networks (RNN) to predict the boarding stops of passengers [25]. Based on data-driven models, the stops of bus trips can also be estimated for public transport planning, as presented in another work of Tang et al. [26]. Bus driving cycles were also analysed: on-road bus speed data were extracted from GPS data, identifying five significantly different bus-driving patterns [27]. In another work, AlRukaibi and AlKheder optimized the bus stop locations in Kuwait and proposed a standard distance of 1–1.4 km between every two stops [28]. As described by Kumar et al., Taxicabs have comprehensively good coverage of the city and hence provide a basis for a reasonably good estimation of the general mobility trends of people and of city hotspots [17].
In their work, the trajectories of Taxicab positions are represented by the sequence of GPS points or by the origin-destination pair of each passenger ride. The clustering of origin-destination locations provides valuable insight into passenger movement and helps to identify where Taxicab drivers are most likely to find their next customer [17]. Clustering is also an efficient basis for an adaptive routing method for cruising Taxicabs, which directs vacant Taxicabs to the pathways with many potential passengers, as shown in the work of Yamamoto et al. [29]. Data mining techniques, such as clustering and the naive Bayesian classifier, are also applicable to historical data for building models and predicting Taxicab demand in the context of time, weather, and location [30, 31]. The mobility patterns within the city of Singapore were analysed by Kumar et al. through density-based clustering of the origin-destination pairs of passenger Taxicab rides using the DBSCAN algorithm [17]. A density-based hierarchical clustering method (DBH-CLUS) is used by Wan et al. to identify pick-up/drop-off hotspots [32], and the spatio-temporal patterns in passenger movements are discovered using spatial clustering of the origin-destination data pairs in the work of Guo et al. [33]. Although clustering is an efficient approach for grouping the rides and detecting relevant and frequent mobility patterns, its application to the design of public transportation lines reveals three major deficiencies and practical problems:
1) The outliers shift the cluster centroids, significantly hindering the detection of relevant patterns.
2) As we would like to avoid the ad-hoc installation of new public transportation stops, the cluster centroids should be selected from the existing stops of the city.
3) The assignment of rides to public transportation stops is not arbitrary: only rides starting within a walkable distance should be considered members of a cluster.
The k-means algorithm is capable of solving practical segmentation problems [34, 35], while the classical Fuzzy C-means (FCM) [36] approach is the better choice for spherical clusters [37]. The classical FCM uses a variant of a distance-based measure to define the distance between the cluster center and its members. The Possibilistic Fuzzy c-means (PFCM) algorithm was introduced by Pal et al. [38] to reduce the effect of outliers in a cluster by introducing a typicality factor into the cost function. This algorithm was further modified by Király et al. [39] to retrieve the cluster centroids from a pre-defined set and form a Fuzzy C-medoid solution. The possibilistic approach to clustering aims to address the problems associated with the constraint on the membership used in FCM. The main difference between FCM and Possibilistic C-means (PCM) [40] lies in the membership representation. In the fuzzy case, each point is a member of different clusters at a particular ratio (the membership values of each point sum to 1), so the constraint used by the FCM approach can be interpreted as a shared degree of membership (What is the share of the specific point in the cluster membership?) but not as a degree of typicality (How typical is the specific point in the cluster?) [41]. The membership value in a cluster represents the possibility of the point belonging to the cluster, whereas the typicality describes how typical the point is in the specific cluster. Since noise points or outliers are less typical in a cluster, typicality-based memberships automatically reduce the effect of noise points and outliers and considerably improve the results.
The daytime bus transportation schedules in many cities are usually well designed [42]. Late at night, however, the Taxicab is the only way of getting around. Previously, the night-bus route planning problem was investigated by leveraging Taxicab GPS traces based on the expected number of passengers along the routes [42]. Similarly, as the daytime public transportation in the city of Budapest is relatively dense, we focused on the analysis of the late-night Taxicab rides to 1) identify the mobility patterns of the city and 2) make recommendations for the design of public transportation lines. The developed PFCMD clustering algorithm aims to cluster the start and end positions of Taxicab rides to public transportation stops to see whether a well-organized public transport line could replace the group of these Taxicab rides. The resultant rides are grouped according to their position, while the start time of the lines can be determined by the temporal analysis of the start times of the Taxicab rides in the specific group. The frequently occurring start times indicate when the lines have the greatest potential to replace individual Taxicab rides. The developed analyses can also help to optimize the efficiency of the Taxicab service. We aim to modify the PFCM clustering algorithm into a Possibilistic Fuzzy C-medoid (PFCMD) algorithm that finds the clusters based on a pre-defined set of possible central points and groups the taxi rides within walking distance of these centroids. Accordingly, the contribution of the present paper is to fully describe the novel Possibilistic Fuzzy C-medoid (PFCMD) clustering algorithm and to demonstrate its applicability for the discovery of human mobility patterns based on public transportation schedules during the night shifts in Budapest. The roadmap of the paper is as follows. The developed PFCMD algorithm is described together with the problem formulation in the Methods section, where the methods of temporal analysis are also detailed.
The analysed dataset that contains the nightly taxi rides in Budapest over a year-long period, the effect of clustering parameters and the comparison of the clusters identified by the PFCMD algorithm and the k-medoid-based solutions are showcased in the Results section. Finally, the results are discussed, and the article is concluded with some last remarks in the Conclusions section.

Methods

In this section, the developed PFCMD clustering algorithm is defined. First, we introduce the problem formulation. The detailed description of the algorithm follows, and finally, the temporal analysis is briefly outlined. Let $R = [r_1, r_2, \ldots, r_N]$ be a given set of $N$ patterns, $n = 1, \ldots, N$, each of them representing a mobility pattern as a Taxicab ride. The $n$-th pattern is defined by $r_n = (p_n^{s}, p_n^{e}, t_n^{s}, t_n^{e})$, where $p_n^{s} = [p_{n,\mathrm{lat}}^{s}, p_{n,\mathrm{lon}}^{s}]$ denotes the start (pickup) and $p_n^{e} = [p_{n,\mathrm{lat}}^{e}, p_{n,\mathrm{lon}}^{e}]$ the end (drop-off) GPS latitude ($p_{n,\mathrm{lat}}^{s}$ and $p_{n,\mathrm{lat}}^{e}$) and longitude ($p_{n,\mathrm{lon}}^{s}$ and $p_{n,\mathrm{lon}}^{e}$) coordinates, and $t_n^{s}$ and $t_n^{e}$ are the start and end times, respectively. The Taxicab trips are defined based on the state identifier of the Taxicab, which indicates its operation mode. Pickups are recorded when the state identifier changes from Free (0) to Occupied (1), while a drop-off is indicated by the change of the state identifier from Occupied (1) to Free (0). A Taxicab identifier is also available, but the workload of the different Taxicabs was not analysed. The stations of public transportation are given by $s_j \in S$, $j = 1, \ldots, N_s$, where $N_s$ is the number of stations and $s_j = [s_{j,\mathrm{lat}}, s_{j,\mathrm{lon}}]$ denotes the GPS latitude and longitude coordinates of the station. We aim to assign the Taxicab rides to these stations based on the pickup and drop-off coordinates, and to find a reasonable schedule for the resulting lines. The grouping of the rides to public transportation stations is performed by clustering, while the line schedule is designed by the time series analysis of the grouped rides.
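The ride definition above (a pickup at the Free to Occupied transition, a drop-off at the Occupied to Free transition) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout `(timestamp, lat, lon, state)`, the `Ride` class and the `extract_rides` function are hypothetical names, and the records are assumed to belong to a single Taxicab and to be sorted by time.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Ride:
    pickup: Tuple[float, float]   # (lat, lon) at the Free -> Occupied transition
    dropoff: Tuple[float, float]  # (lat, lon) at the Occupied -> Free transition
    t_start: float                # pickup timestamp
    t_end: float                  # drop-off timestamp


def extract_rides(records: Iterable[Tuple[float, float, float, int]]) -> List[Ride]:
    """Build rides from time-sorted (timestamp, lat, lon, state) records of ONE taxicab.

    state: 0 = Free, 1 = Occupied (assumed encoding, following the text).
    A ride that is already in progress at the first record is skipped,
    because its pickup position is unknown.
    """
    rides: List[Ride] = []
    prev_state = None
    start = None  # (timestamp, lat, lon) of the currently open pickup
    for ts, lat, lon, state in records:
        if prev_state == 0 and state == 1:
            start = (ts, lat, lon)
        elif prev_state == 1 and state == 0 and start is not None:
            rides.append(Ride((start[1], start[2]), (lat, lon), start[0], ts))
            start = None
        prev_state = state
    return rides


if __name__ == "__main__":
    demo = [
        (0, 47.50, 19.04, 0),
        (10, 47.50, 19.05, 1),   # pickup
        (20, 47.51, 19.06, 1),
        (30, 47.52, 19.07, 0),   # drop-off
        (40, 47.52, 19.07, 0),
    ]
    print(extract_rides(demo))
```

Each resulting `Ride` corresponds to one pattern $r_n$; the Taxicab identifier is deliberately left out, since the text notes that the workload of individual Taxicabs was not analysed.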

The goal-oriented Possibilistic Fuzzy C-medoid algorithm (PFCMD)

As the clustering is performed in the geographical domain, only the pickup and drop-off coordinates are used in this step of the methodology. For easier notation, each ride is reduced to a vector containing the coordinate-based records, $x_n = [p_n^{s}, p_n^{e}]$; clustering is therefore realized in a four-dimensional space: $x_n = [p_{n,\mathrm{lat}}^{s}, p_{n,\mathrm{lon}}^{s}, p_{n,\mathrm{lat}}^{e}, p_{n,\mathrm{lon}}^{e}]$. These points are to be partitioned into $C$ clusters. The prototype of the $c$-th cluster is denoted by $v_c = [s_i, s_j]$, where $s_i, s_j \in S$ and $i \neq j$. The original PFCM algorithm [38] aims to solve the following minimization problem:

$$\min_{U, \theta, V} J_{m,\eta}(U, \theta, V; X) = \sum_{c=1}^{C} \sum_{n=1}^{N} \left( a\, u_{c,n}^{m} + b\, \tau_{c,n}^{\eta} \right) \lVert x_n - v_c \rVert^{2} + \sum_{c=1}^{C} \gamma_c \sum_{n=1}^{N} \left( 1 - \tau_{c,n} \right)^{\eta} \qquad (1)$$

subject to the constraints $\sum_{c=1}^{C} u_{c,n} = 1$ for all $n$, and $0 \le u_{c,n}, \tau_{c,n} \le 1$, while $m \ge 1$, $\eta \ge 1$, $\gamma_c > 0$. Here $u_c$ represents the $c$-th row of the membership matrix ($U$) and contains all the memberships associated with the $c$-th cluster. The typicality is represented by the typicality matrix $\theta = [\tau_{c,n}]$, $V = [v_1, \ldots, v_C]$ is the matrix of cluster centres and $X$ is the analysed dataset. The user-defined constants are the relative importance of the fuzzy membership, $a > 0$, and of the typicality value, $b > 0$. The membership value $u_{c,n}$ represents the membership of $x_n$ in the $c$-th cluster. Originally, in fuzzy c-means clustering [43], the membership values of a data point are inversely proportional to the relative distance of the data point to the $C$ cluster prototypes. However, assuming $C = 2$ and a data point equidistant from the two cluster centroids, the membership value of the data point in each cluster is 0.5, regardless of its absolute distance to the cluster centroids. Therefore, noise points far from but equidistant to the cluster centroids would receive equal membership values in both clusters, instead of the more natural choice of very low membership values. To overcome this problem, the typicality $\tau_{c,n}$ of a point in a cluster was introduced, which is interpreted as how typical the point $x_n$ is in cluster $c$ [40]. Taking advantage of both approaches, Pal et al. combined these terms into a single cost function [38].
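The equidistant-point argument can be checked numerically. The sketch below uses the standard fuzzy c-means membership update, in which the membership of a point in cluster $c$ is $1 / \sum_j (D_c / D_j)^{2/(m-1)}$; the function name `fcm_memberships` and the example distances are illustrative assumptions, and the formula requires $m > 1$ and strictly positive distances.

```python
from typing import List, Sequence


def fcm_memberships(distances: Sequence[float], m: float = 2.0) -> List[float]:
    """Standard FCM memberships of ONE point, given its distances to C prototypes.

    Only distance ratios enter the formula, so scaling all distances by the
    same factor leaves the memberships unchanged.
    """
    if m <= 1.0:
        raise ValueError("the update is only defined for m > 1")
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_c / d_j) ** exponent for d_j in distances)
        for d_c in distances
    ]


if __name__ == "__main__":
    # A point 100 m from both centroids and a noise point 50 km from both
    # receive exactly the same memberships.
    print(fcm_memberships([100.0, 100.0]))
    print(fcm_memberships([50_000.0, 50_000.0]))
    # A point three times closer to the first centroid.
    print(fcm_memberships([100.0, 300.0]))
```

Because the memberships of a point must sum to one, a distant outlier cannot be given a low membership in every cluster; this is the gap that the typicality term is meant to close.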
If $D_{c,n} = \lVert x_n - v_c \rVert > 0$ for all $c$, where $\lVert \cdot \rVert$ denotes the standard L2 vector norm, then the membership and typicality values are calculated based on Eqs 2 and 3, respectively. In the present work, we replace the original typicality function of Pal et al. [38] with a flexible negative Gompertz function of the distance (Eq 3), which models the willingness of people to walk to and from the nearest public transportation stop instead of choosing a door-to-door transportation method. The parameters α, β and γ of the typicality function make it highly flexible for the definition of a desirability trend. In the present context, this means the connection of rides that are close to the public transportation stop. The possible centroids are selected from a predefined set of points, in the present context the public transport stops. In Eqs 2 and 3, $1 \le c \le C$, $1 \le n \le N$, and $D([p_{\mathrm{lat}}, p_{\mathrm{lon}}], s)^2$ is based on the distance between the data point (the start or end of the ride) and the public transport stop that represents the centre of the given cluster. Finally, once the cluster centroids, membership and typicality values are determined, the data point $x_n$ is considered a member of every cluster in which its combined cluster membership value is above a user-defined threshold, $P$. We applied the Partition Coefficient (PC) and the Classification Entropy (CE) to evaluate the quality of the clusterings:

$$PC = \frac{1}{N} \sum_{c=1}^{C} \sum_{n=1}^{N} u_{c,n}^{2}, \qquad CE = -\frac{1}{N} \sum_{c=1}^{C} \sum_{n=1}^{N} u_{c,n} \log u_{c,n}$$

where CE values close to zero and PC values close to one indicate a well-separated cluster structure [44].
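The two validity measures can be computed directly from a membership matrix. The sketch below follows the usual definitions, PC as the mean of the squared memberships per data point and CE as the mean entropy per data point; the function names, the row-per-cluster layout of `U` and the use of the natural logarithm are assumptions made for illustration.

```python
import math
from typing import Sequence


def partition_coefficient(U: Sequence[Sequence[float]]) -> float:
    """PC = (1/N) * sum over clusters c and points n of u[c][n]**2."""
    n_points = len(U[0])
    return sum(u * u for row in U for u in row) / n_points


def classification_entropy(U: Sequence[Sequence[float]]) -> float:
    """CE = -(1/N) * sum of u[c][n] * ln(u[c][n]); zero memberships contribute 0."""
    n_points = len(U[0])
    return -sum(u * math.log(u) for row in U for u in row if u > 0.0) / n_points


if __name__ == "__main__":
    crisp = [[1.0, 0.0], [0.0, 1.0]]   # every point belongs fully to one cluster
    vague = [[0.5, 0.5], [0.5, 0.5]]   # every point is shared equally
    print(partition_coefficient(crisp), classification_entropy(crisp))
    print(partition_coefficient(vague), classification_entropy(vague))
```

A crisp partition gives PC = 1 and CE = 0, whereas the fully shared two-cluster partition gives PC = 0.5 and CE = ln 2, which matches the interpretation that PC near one and CE near zero indicate well-separated clusters.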

Results

The applicability of the proposed PFCMD algorithm is demonstrated through the analysis of human mobility patterns in Budapest, the capital city of Hungary. We focused on the night shifts to compare the most frequent patterns with the possible public transportation stops based on the C-medoid clustering method and to discover the possible public transportation routes. In this section, first, the analysed dataset, the Taxicab ride data recorded during the nights in Budapest, is introduced. This is followed by the discussion of the proposed clustering-based solution, paying special attention to parameter tuning. Finally, a recommendation for the schedule of the possible public transportation lines is derived from the temporal analysis of the start times of the rides.

The analysed taxicab rides of Budapest

The proposed PFCMD algorithm is applied to location data from Taxicabs equipped with a GPS receiver and an interface to record the actual state of the Taxicab (engaged, vacant, not in service or en route for an incoming carriage request) [16]. The analysed GPS data were recorded in 2014 in Budapest and contain 450 million position records of 801 different city Taxicabs. The public transportation data come from the official Budapest public transportation company (BKK Budapesti Közlekedési Központ Zrt.). The dataset contains all information about the BKK lines, incorporating the routes, stops, stop times, and trip information in the standard General Transit Feed Specification (GTFS) [45] format. From this information, our analysis utilizes the coordinates of the public transportation stops. As the public transportation system of the city can be considered quite dense both spatially and temporally, we focused on the night rides starting after 9:00 PM and ending before 6:00 AM. Fig 1 illustrates the relatively dense and well-distributed public transport network of Budapest, overlaid with the Taxicab routes. Therefore, our research question is whether Taxicab rides can be more sustainably replaced by well-planned public transport solutions (mainly buses). Are there significant hubs that should be connected? Are there frequent times that should be better served at night?
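The night-ride selection (start after 9:00 PM, end before 6:00 AM) can be sketched as a simple predicate. The function name `is_night_ride`, the inclusive treatment of exactly 9:00 PM, and the requirement that the whole ride fits into a single nine-hour night window are assumptions, since the text does not specify these boundary cases.

```python
from datetime import datetime, time, timedelta

NIGHT_START = time(21, 0)            # 9:00 PM
NIGHT_END = time(6, 0)               # 6:00 AM
NIGHT_LENGTH = timedelta(hours=9)    # length of one night window


def _in_night_window(t: datetime) -> bool:
    """True if the clock time lies in [21:00, 24:00) or [00:00, 06:00)."""
    return t.time() >= NIGHT_START or t.time() < NIGHT_END


def is_night_ride(t_start: datetime, t_end: datetime) -> bool:
    """Keep a ride only if it starts and ends within the same night window."""
    duration = t_end - t_start
    same_night = timedelta(0) <= duration <= NIGHT_LENGTH
    return _in_night_window(t_start) and _in_night_window(t_end) and same_night


if __name__ == "__main__":
    print(is_night_ride(datetime(2014, 5, 2, 23, 30), datetime(2014, 5, 3, 0, 15)))
    print(is_night_ride(datetime(2014, 5, 2, 20, 55), datetime(2014, 5, 2, 21, 10)))
```

The duration check guards against a record whose start and end both fall at night but on different nights, which would otherwise pass the two clock-time tests.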

Fig 1

Stop stations of public transportation (red circles) and the travels of the Taxicabs (black lines) on the left.

Some example rides with start and end location are plotted on the right side of the figure.

The resultant 436,537 rides during the analysed nights of 2014 mean an average of ∼1196 rides per night. We can assume that the barrier to changing from Taxicabs to public transportation is high for some passengers or in some cases. Moreover, the topology of Budapest can be considered quite complex, as the city is divided by the river Danube and the nightlife is mainly concentrated on the eastern side, leaving the western side calmer and less dense. In this regard, it is apparent that the planning of public transportation in Budapest is a highly complex challenge, and a careful analysis of the rides is required to ensure that passengers utilize the designed lines. The existing public transportation lines are not included in the current analysis. However, the results are comparable with the existing routes, and decision-makers can make recommendations to re-route existing lines or introduce new ones.

Clustering the public transportation data

Our main questions were: Are there significant hubs that should be connected? Are there times that should be better served during nights? By detecting the major mobility patterns of Taxicab rides and comparing the designed lines to the existing public transportation system, previously uncovered areas can be connected by introducing new lines. In order to detect the start- and end-points of these lines, we clustered the Taxicab rides using the developed PFCMD algorithm with the previously defined parameter setting, initialized from the results of a k-medoid clustering. The advantages of the applied algorithm are visible in Fig 2. The result of the k-medoid clustering contains several outlier clusters, which are indicated by the conspicuous red lines. It is clear that the public transportation system cannot aim to cover these occasional rides, some of which point out of the city; instead, it should strive to meet the needs of the bulk of the community. A very striking example is the trip to Vienna (the long red line in the left part of Fig 2), which was covered by the traditional k-medoid clustering solution, with a separate cluster dedicated to fulfilling this need. A straightforward and simple approach is to look for the public transportation stops closest to this k-medoid solution. As seen in Fig 2, this reduces the solutions to the outermost public transportation stops but does not solve the issue of occasional and unique rides. However, a PFCMD clustering solution initialized from the result of the k-medoid clustering lets go of these unique rides and looks for the hubs containing enough rides within a walkable distance. Not only are the cluster centroids selected from the public transportation stops rather than from the rides, but the parameters also allow a highly flexible setting that can be tailored to the requirements.
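The "straightforward and simple" baseline mentioned above, snapping a k-medoid centre to the nearest public transportation stops, can be sketched as follows. This is an illustration under stated assumptions: coordinates are taken to be already projected to metres so that the L2 norm is meaningful, the stop identifiers are made up, and the constraint that the start and end stops differ is not enforced.

```python
import math
from typing import Dict, Sequence, Tuple


def nearest_stop(point: Sequence[float], stops: Dict[str, Tuple[float, float]]) -> Tuple[str, float]:
    """Return (stop_id, distance in metres) of the stop closest to `point` (L2 norm)."""
    best_id, best_d = "", math.inf
    for stop_id, (sx, sy) in stops.items():
        d = math.hypot(point[0] - sx, point[1] - sy)
        if d < best_d:
            best_id, best_d = stop_id, d
    return best_id, best_d


def snap_centre(centre: Sequence[float], stops: Dict[str, Tuple[float, float]]):
    """Snap a four-dimensional centre (x_start, y_start, x_end, y_end) to stops."""
    return nearest_stop(centre[:2], stops), nearest_stop(centre[2:], stops)


if __name__ == "__main__":
    stops = {"A": (0.0, 0.0), "B": (1000.0, 0.0), "C": (0.0, 2000.0)}  # hypothetical stops
    print(snap_centre((100.0, 50.0, 900.0, 100.0), stops))
    # An outlier centre 200 km away is still snapped to the outermost stop,
    # although no passenger of that ride could walk there.
    print(snap_centre((-200_000.0, 0.0, 900.0, 0.0), stops))
```

The second call illustrates why snapping alone does not remove outlier clusters: a stop is always returned, and only the returned distance reveals that the assignment is not walkable.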

Fig 2

The start- and end-points (Sp and Ep, respectively) of the k-medoid clustering of the Taxicab rides are marked with red and black circles, respectively.

The nearest public transportation stops to these start- and end-points are marked by black (Sp) and green (Ep) asterisks. In contrast, the clusters designed by the PFCMD algorithm are marked with cyan (Sp) and magenta (Ep) triangles, respectively. The colour of the straight line connecting the related stations is the same as the colour of the starting station. The problem of outliers is well reflected in the case of the trip to Vienna (red line on the left side of the figure), and constraining the solution to the outermost public transportation stop does not solve the problem either.

Tuning the parameters of the clustering algorithm

The aim of this section is to present how the parameters can be fine-tuned to tailor the algorithm to the requirements of the analysis.
The value of the fuzziness exponent, m: in the case of a crisp m value (close to one), the resultant clusters are going to be crisp as well, with no fuzziness introduced to the system. By increasing the fuzziness parameter, the borders of the different clusters become more overlapping and less crisp. Choosing too high a fuzziness parameter is disadvantageous as well: as the membership values u are less than one, raising them to a high power m yields very small numbers, so the cluster members are primarily determined by the typicality values τ. This effect of parameter m is discussed in depth with detailed experiments in Pal et al. [38]. For specific datasets, this parameter can be tuned based on the effects of outliers: starting with one, crisp clusters are generated, while increasing its value reduces the effect of outliers. The optimal value is tuned experimentally, typically between one and two; however, higher values are possible as well. In the present work, to avoid highly amorphous clusters, a relatively strict m parameter of 1.2 was chosen.
The parameters of typicality, η, α, β, γ: as described in the Methods section, the original typicality function introduced by Pal et al. [38] was replaced by a function showing a decreasing trend, as presented in Fig 3. The shape of this function represents the willingness of a passenger to walk between the origin or destination of his/her travel and the nearest public transportation stop. The parameters are chosen to represent an approximately 500 m range within which passengers happily walk, while between 500 and 1000 m this willingness rapidly drops (α = 1, β = 100, γ = 0.01). The distance is calculated as the L2 norm distance between the start- and end-points of the travel and the public transportation stops. To preserve the shape of the typicality function and, thus, its physical meaning, η is chosen as 1.
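To get a feel for the reported typicality parameters, the sketch below evaluates a negative Gompertz curve of the walking distance. The functional form $\tau(D) = 1 - \alpha \exp(-\beta \exp(-\gamma D))$, with $D$ in metres, is an assumption chosen to be consistent with the description (a Gompertz function subtracted from one); the exact expression used by the authors is not reproduced in this text, and the function name is hypothetical.

```python
import math


def walking_typicality(distance_m: float, alpha: float = 1.0,
                       beta: float = 100.0, gamma: float = 0.01) -> float:
    """Assumed negative Gompertz typicality: 1 - alpha * exp(-beta * exp(-gamma * D)).

    Defaults are the parameter values reported in the text
    (alpha = 1, beta = 100, gamma = 0.01); distance_m is in metres.
    """
    return 1.0 - alpha * math.exp(-beta * math.exp(-gamma * distance_m))


if __name__ == "__main__":
    for d in (0, 250, 500, 750, 1000):
        print(f"{d:5d} m -> typicality {walking_typicality(d):.3f}")
```

Under this assumed form the curve stays close to one for short walks, passes through roughly one half near 500 m and is close to zero at 1000 m, which agrees qualitatively with the walking range described above.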

Fig 3

The typicality function represents the willingness of passengers to walk to a nearby public transportation stop.

This function implicitly determines how many clusters are needed to group rides nearby the transportation lines.

The coefficients of the membership function, a and b: the choice of these coefficients or weights describes the emphasis on the membership and typicality values. In order to reduce the effect of outliers, the value of b is to be increased, whereas an increased value of a favours the effect of the membership values. In the present context, the typicality part of the equation constrains the collection of rides starting and/or ending far away from each other into the same cluster. Therefore, as public transportation lines are to be designed in the present work, where the walking distance to and from the stops is a crucially important aspect of applicability, we put a much higher emphasis on the typicality values and chose a = 0.1 and b = 0.9.
The number of clusters, C: the parameter C defines the number of cluster centroids. The final number of public transportation lines can be different: routes in similar directions can be merged, or different clusters can have the same public transportation stops as their centroids. This provides the opportunity to find the dense hubs and serve their public transportation needs. Setting a relatively high C gives the algorithm the flexibility to populate the clusters optimally, and hence we can select the truly significant ones (those containing a significantly high number of rides). The true number of new public transportation lines can be determined after analysing the resultant clusters and comparing them to the existing lines. Following this consideration, we selected the number of clusters to cover a wide range of travels and applied two cluster validity measures to validate the appropriateness of the number of clusters. Finally, we set the C parameter to 50. The partition coefficient (PC = 0.9296) and the classification entropy (CE = 0.4322) indicate that the algorithm generated well-separated partitions with the selected settings.

Spatial analysis of the resultant clusters

Fig 4 shows the number of rides in each cluster (bar plot) and the proportion of covered rides (line plot). Evidently, most of the clusters consist of only a few rides, but this and the following analysis and visualization results underpin our assumption that these rides form a very sparse system in which we need to determine the most significant hubs. Consequently, only a small fraction of the rides, less than 3%, can be covered by bus lines under these strict constraints on the walking distances. Fig 4 also shows that by selecting a higher number of clusters (parameter C of the PFCMD algorithm), we give the algorithm the flexibility to populate the available clusters. After the clustering step, the clusters containing an insufficient number of rides can be neglected.
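The post-processing described here, counting the rides per cluster, dropping sparsely populated clusters and measuring the share of covered rides, can be sketched as follows. The input layout (a dictionary from cluster index to the set of ride indices whose combined membership exceeded the threshold) and all function names are assumptions for illustration.

```python
from typing import Dict, Set


def cluster_sizes(members: Dict[int, Set[int]]) -> Dict[int, int]:
    """Number of rides assigned to each cluster (heights of the bar plot)."""
    return {c: len(rides) for c, rides in members.items()}


def significant_clusters(members: Dict[int, Set[int]], min_rides: int) -> Dict[int, Set[int]]:
    """Keep only the clusters with at least `min_rides` member rides."""
    return {c: rides for c, rides in members.items() if len(rides) >= min_rides}


def coverage(members: Dict[int, Set[int]], n_total_rides: int) -> float:
    """Share of all rides that belong to at least one cluster.

    A ride above the threshold in several clusters is counted once.
    """
    covered: Set[int] = set().union(*members.values())
    return len(covered) / n_total_rides


if __name__ == "__main__":
    members = {0: {1, 2, 3}, 1: {3, 4}, 2: set()}   # toy assignment
    print(cluster_sizes(members))
    print(sorted(significant_clusters(members, min_rides=2)))
    print(coverage(members, n_total_rides=100))
```

Counting each ride once in `coverage` matters because the thresholded assignment allows a ride to be a member of more than one cluster.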

Fig 4

The number of data points in each cluster if the threshold of membership is P = 0.15 (bar plots, left axis) and the proportion of rides covered by the clusters compared to all the analysed Taxicab rides during the nights (line plot, right axis).

Some clusters have low importance due to the few supporting passengers (bar plots in part (b)). As the Taxicab rides usually handle unique and occasional travels, a small percentage of the rides can be replaced by public transportation lines. The designed lines containing at least one ride are visible together in Fig 5 (the algorithm places as many clusters as desired, even if no rides fulfil the required criteria). Naturally, these clusters can be further filtered based on the more in-depth aspects of the public transportation experts.

Fig 5

The cluster centroids are represented as lines on the map of Budapest.

The start and end points of a recommended public transport line are marked by yellow circles and black crosses, respectively. The width of the line is proportional to the size (and hence, importance) of the cluster: narrower lines represent smaller clusters, while wider lines represent bigger ones. Fig 6 showcases some examples of the resultant clusters of the PFCMD algorithm. The start- and end-points are represented by yellow dots and black crosses, respectively. The arrow in each sub-figure connects the public transportation stops serving as the start- and end-points of the designed line, and points in the travelling direction. Every cluster centre is a public transportation stop in Budapest. Evidently, one of the most important hubs is the Budapest Ferenc Liszt International Airport, as clusters 8 and 25 (parts (a) and (c) of Fig 6) point there and clusters 17 (part (b) of Fig 6), 31 and 45 originate from the international airport. Moreover, cluster 36 is responsible for shorter rides near the airport. As expected, the other dense area, with numerous clusters pointing to and from it, is the inner city centre, where most of the events occur at nighttime. Clusters 26, 32 and 48 (parts (d), (e) and (f)) are good examples of this.

Fig 6

Exemplary clusters, where the start and end points of the individual rides in the cluster are represented by yellow dots and black crosses, respectively.

The cluster centroids, marked by the start and end point of the arrow, point from one public transport stop to another.

Temporal analysis of the resultant clusters

The presented analysis not only calls attention to missing transportation directions but also recommends a schedule for these lines. After the cluster members are identified, the time schedule of the lines is analysed as well. The proposed approach assumes that there are typical time periods (e.g., typical Monday mornings) that can be aggregated for the analysis. These periods were determined by exploratory data analysis of the number of rides. Fig 7 shows day-wise and hour-wise boxplots of the distribution of the number of rides in each temporal period. Grouping the data points in this way reveals how the values are distributed across the days of the week and the hours of the day. The busiest days are Saturday and Friday, when many people are likely to arrive in the city for entertainment during the night and book Taxicabs to the city center. Similar plots can be generated for the hour-wise (or any other temporal resolution) breakdown of the rides in a cluster.
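The day-wise grouping behind such boxplots can be sketched as follows. This is an illustrative Python example using only the standard library, not the authors' code: the function name `nightly_counts_by_weekday`, the toy timestamps, and the rule that a ride after midnight belongs to the night that started on the previous evening (implemented by shifting each timestamp back by 12 hours) are assumptions.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def nightly_counts_by_weekday(start_times):
    """Return {weekday name: [ride count of each night falling on that weekday]}."""
    # Assumption: shifting by 12 h maps a whole night (evening to early morning)
    # onto the calendar date on which that night began.
    per_night = Counter((t - timedelta(hours=12)).date() for t in start_times)
    grouped = defaultdict(list)
    for night, n_rides in sorted(per_night.items()):
        grouped[WEEKDAYS[night.weekday()]].append(n_rides)
    return dict(grouped)

# Toy data (made up): two Friday nights and one Saturday night in January 2014.
rides = [
    datetime(2014, 1, 3, 22, 0),    # Friday evening
    datetime(2014, 1, 4, 1, 30),    # after midnight, still the Friday night
    datetime(2014, 1, 4, 23, 15),   # Saturday evening
    datetime(2014, 1, 10, 21, 5),   # the following Friday evening
]
by_day = nightly_counts_by_weekday(rides)
```

Each list in `by_day` holds one value per night, which is the sample a day-wise boxplot would summarise; the same grouping keyed by hour would give the hour-wise view.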

Fig 7

Temporal analysis of night rides on the full dataset.

The trend shows an increase during the week until Saturday. The hour-wise analysis shows constant usage before midnight and a decreasing trend until 5 am. The outliers on the hour-wise boxplot stem from the different characteristics of the weekends. By analyzing the start times of the rides within all clusters, suggestions can be made for the schedule of the proposed lines. Fig 8 shows the time-series analysis of all clustered rides. Based on the start times of the Taxicab rides, the distributions of Taxicab usage are similar across the weekdays. The busy periods during the night can thus be determined: these are the periods in which the related line would be most beneficial.

Fig 8

Time-series analysis of the clustered dataset.

The characteristics of the rides are close to each other during the weekdays.

The proposed analysis can be performed at any temporal resolution of interest. A detailed overview of temporal analysis solutions for Taxicab data and the determination of busy periods was presented by Varga et al. [16]. For example, by using a sufficiently fine temporal resolution (e.g., 60-, 30-, or 15-minute windows) and counting the number of rides starting in each window, the public transportation line can be scheduled for the busy periods where it is most needed.
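The window-counting idea can be illustrated with a short sketch. The Python code below is only an example under stated assumptions: the helper names `window_counts` and `busy_windows`, the `min_rides` cut-off, and the toy start times are hypothetical, and the window length is assumed to divide a day evenly.

```python
from collections import Counter
from datetime import time

def window_counts(start_times, window_minutes=30):
    """Count rides per time-of-day window; keys are the window start times."""
    counts = Counter()
    for t in start_times:
        minutes = t.hour * 60 + t.minute
        start = (minutes // window_minutes) * window_minutes  # floor to the window start
        counts[time(start // 60, start % 60)] += 1
    return counts

def busy_windows(counts, min_rides):
    """Windows with at least min_rides rides (hypothetical cut-off), sorted by clock time."""
    return sorted(w for w, n in counts.items() if n >= min_rides)

# Toy start times of the rides in one cluster (made up).
starts = [time(22, 5), time(22, 20), time(22, 40), time(23, 10),
          time(0, 15), time(0, 25), time(0, 29)]
counts = window_counts(starts, window_minutes=30)
busy = busy_windows(counts, min_rides=2)
```

Because the result is sorted by clock time, windows after midnight appear before the late-evening ones; a real schedule would reorder them along the night.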

Discussion

It is well known that Taxicab rides represent human mobility patterns reasonably well [17]. As the daytime public transportation system of Budapest is relatively dense and transparent, sometimes with multiple parallel options, the present work concentrates on Taxicab rides occurring at night, when the venues of different events (the start and end of theatre plays or cinema screenings, parties) are served by the transport system less thoughtfully and purposefully. The resultant clusters show a solid connection to the nightlife of Budapest and to people travelling to or from the airport. The individual Taxicab rides at nighttime provide a relatively sparse coverage of the city, making it difficult to connect the rides reasonably and reduce the effect of outliers. Simple k-medoid-based algorithms may incorporate outliers into the clusters, resulting in a very high bias of the data model. The proposed PFCMD, however, overcomes the problem of outliers. Moreover, the goal-oriented typicality function supports incorporating user-defined desirability functions based on the walking distance to and from the public transportation stops. The designed clusters highlight two frequent areas of the nightlife of the city: the centre, with its numerous entertainment and recreation opportunities, and the airport, where planes frequently take off and land in the very late and early hours. The temporal analysis of the clustered rides supports more sustainable planning of the timetables of public transportation lines. However, after the clustering of the analysed dataset, some clusters do not contain enough rides to further analyse the dynamics of the rides. This dataset can be considered a sample of the Taxicab rides available in Budapest, as a single Taxicab company provided the data.
However, the mobility pattern is well reflected in the results: at the time of data recording (2014), there was no direct line to the airport in Hungary; such a line was introduced in 2017 (with line ID 100E). The results illustrate that the method is suitable for calling attention to missing transportation lines and for recommending the scheduling of these lines. However, it has to be noted that the clusters do not directly represent optimized routes; the clustering algorithm generates suggestions for the experts by summarizing the demands in a sophisticated and robust way. Moreover, the derived areas and departure times provide valuable information for Taxicab drivers: these are the potential places where they can more easily secure a ride in the related time slots.
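To make the notion of a walking-distance-based desirability more concrete, the following Python sketch shows one possible form. It is not the function used in the paper: the piecewise-linear shape, the 200 m and 600 m breakpoints, the product rule combining the start and end of a ride, and all function names are hypothetical choices for illustration. Only the haversine great-circle distance is a standard formula.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def walk_desirability(distance_m, full_m=200.0, zero_m=600.0):
    """Hypothetical desirability: 1 up to full_m, falling linearly to 0 at zero_m."""
    if distance_m <= full_m:
        return 1.0
    if distance_m >= zero_m:
        return 0.0
    return (zero_m - distance_m) / (zero_m - full_m)

def ride_desirability(ride_start, ride_end, stop_start, stop_end):
    """Assumed combination: a ride fits a line only if both of its ends are walkable."""
    d_start = haversine_m(*ride_start, *stop_start)
    d_end = haversine_m(*ride_end, *stop_end)
    return walk_desirability(d_start) * walk_desirability(d_end)
```

With these assumed breakpoints, a ride whose start is 400 m from the first stop and whose end coincides with the last stop would receive a desirability of 0.5.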

Conclusions

In the present work, the importance of human mobility pattern-based public transportation design in sustainable cities is discussed. A new clustering algorithm is developed to assign the GPS-based patterns to pre-defined centre points. The proposed possibilistic fuzzy c-medoid (PFCMD) clustering algorithm can group human mobility patterns around existing public transportation stops within a walkable distance. Based on the analysis of the resultant clusters, further insights into the dynamics of the city can be derived. The applicability of PFCMD is demonstrated on the GPS data of Taxicabs, assigning the rides to the coordinates of public transportation stops in the city of Budapest, Hungary. The results show some potential routes where re-scheduled public transportation (buses) can replace Taxicab rides during the night shift. The temporal analysis of the clustered rides shows the potential days and times of the day for re-designing the lines. To stimulate further research, the MATLAB code for the proposed possibilistic fuzzy c-medoid (PFCMD) clustering algorithm is publicly available on the website of the authors (www.abonyilab.com).

14 Apr 2022

PONE-D-22-03021

Goal-oriented possibilistic fuzzy C-Medoid clustering of human mobility patterns – Illustrative application for the Taxicab trips-based enrichment of public transport services

PLOS ONE

Dear Dr. Eigner,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by May 29 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Luca Pappalardo
Academic Editor
PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that Figure 1, 3, 4, 5a, 5b, 5c, 5d, 5e and 5f in your submission contain [map/satellite] images which may be copyrighted. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For these reasons, we cannot publish previously copyrighted maps or satellite images created using proprietary data, such as Google software (Google Maps, Street View, and Earth). For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission:

a. You may seek permission from the original copyright holder of Figure 1, 3, 4, 5a, 5b, 5c, 5d, 5e and 5f to publish the content specifically under the CC BY 4.0 license. We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text: “I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.” Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission. In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

b. If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

The following resources for replacing copyrighted map figures may be helpful:

USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/
The Gateway to Astronaut Photography of Earth (public domain): http://eol.jsc.nasa.gov/sseop/clickmap/
Maps at the CIA (public domain): https://www.cia.gov/library/publications/the-world-factbook/index.html and https://www.cia.gov/library/publications/cia-maps-publications/index.html
NASA Earth Observatory (public domain): http://earthobservatory.nasa.gov/
Landsat: http://landsat.visibleearth.nasa.gov/
USGS EROS (Earth Resources Observatory and Science (EROS) Center) (public domain): http://eros.usgs.gov/#
Natural Earth (public domain): http://www.naturalearthdata.com/

3. Thank you for stating the following in your Competing Interests section: “NO authors have competing interests.” Please complete your Competing Interests on the online submission form to state any Competing Interests.
If you have no competing interests, please state "The authors have declared that no competing interests exist.", as detailed online in our guide for authors at http://journals.plos.org/plosone/s/submit-now This information should be included in your cover letter; we will change the online submission form on your behalf.

4. Thank you for stating the following financial disclosure: “Project no. 2019-1.3.1-KK-2019-00007. has been implemented with the support provided from the National Research, Development and Innovation Fund of Hungary, financed under the 2019-1.3.1-KK funding scheme. Funder: National Research, Development and Innovation Office https://nkfih.gov.hu/for-the-applicants The funders did not play any role regarding the study.” Please state what role the funders took in the study. If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." If this statement is not correct you must amend it as needed. Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

Additional Editor Comments (if provided):

The work is potentially interesting but it requires significant improvements.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly
Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: N/A

**********

3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: No
Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No
Reviewer #2: Yes

**********

5. Review Comments to the Author. Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The idea and scope of the paper are interesting. In particular, the application of such a clustering algorithm with a typicality function used to model people's willingness to walk to a bus stop is interesting. However, the work is not mature enough. Both the introduction and results sections are often unclear, the experiments performed are quite poor, and the figures provided do not help at all at understanding. In general, I think this work has potential, but it is not ready to be published. Below specific comments:

- Introduction

Sometimes very confused. It is often not clear whether the authors are referring to something that has been done in another work or they are stating something not related to any another work. I would suggest to use the formula "Authors et al. [X] studied/analysed/stated/ ..." to let the reader understand that you are referring to someone else's work.

rows 4-6: "Based on these facts, innovative changes are needed to mitigate the effects of climate change within cities, and the transport sector there is one of the main contributors to greenhouse gas emissions" this is not very clear, may be rephrased.

rows 31-33: "These GPS trajectories could be used to analyse the emission of particle matter from braking behaviours" This statement is a bit out of context here. I would move it between the examples on how GPS trajectories can be used (from row 36 on).

row 47: I would add "In particular, electric buses [...]".

row 78-79: avoiding the ad-hoc installation of new stops is more of a choice than a major deficiency of clustering application for the design of public transportation lines. I would cite this as a decision made in the authors' paper, that has both a practical and a methodological motivation, but not as a deficiency of these particular clustering applications.

rows 110-113: "The developed PFCMD clustering algorithm that aims to cluster [...]" not clear, maybe a typo (remove "that"?)

- Materials and Methods

This section is very well written and clear.

eqs. (4) and (5): what does it mean "inA" in the end of the equation?

- Figures

In general, the figures are unintelligible: they have very low resolution, too small markers in the legend, it is almost impossible to even get the colors of the markers.

Fig. 1: colors in the legend (especially in the plot on the right) are unintelligible

Fig. 2: it seems it is missing the part of this figure referenced at row 297

Fig. 3: a zoom on the right hand side of the figure would definitely help. The figure as it stands is only useful for getting the outlier theme, but not all the rest.

- Results

rows 261-268: any reference / detail / experiment about the choice of m?

- Discussion

rows 343-344: "In our work, we demonstrate that Taxicab rides represent the human mobility patterns reasonably well (similarly to [18])." this does not seem the goal of the authors' work nor what they have done.

- References

Refs [14] and [21] refer to the same work

Refs [41] and [45] refer to the same work

Reviewer #2: In this paper, the authors propose a possibilistic fuzzy c-medoid clustering algorithm to study human mobility. The proposed medoid-based clustering approach groups the typical mobility patterns within walking distance to the stations of the public transportation system. Their results demonstrate that the mobility pattern is well-reflected. In fact, in the data recording (2014), there was no direct line to the airport in Hungary, which was implemented later in 2017. The manuscript and the analysis have the potential to improve the knowledge in the human mobility field; however, I have some concerns about the methodology and some improvements that can be applied to increase the readability and the quality of the proposed manuscript:

1. The sentence starting in row 110 "The developed PFCMD clustering algorithm that aims to cluster.." seems to have an unnecessary “that”;

2. The introduction is sometimes not flowy, and it is not easy to understand what the references of some previous works refer to;

3. The time series analysis is not deep enough. I expect a deeper discussion about this crucial aspect for this work;

4. The clustering is performed for the taxi rides collected for a whole year, and after that, some public transport lines are suggested from the result of the clustering. I believe that the fact that the lines are indicated over an entire year's data and their influence on the study should be discussed better;

5. In line 156 I would say “the clustering is performed in the geographical domain” instead of “the clustering is performed in the spatial space”;

6. What is the “inA” at the end of equations 4 and 5? it is not clear to me;

7. The section “Temporal analysis of the resultant clusters” should be extended and described better;

8. In the sub-section “The number of public transportation lines, c:” the fact that the number of clusters may affect the number of lines is repeated and this may appear redundant. Furthermore, I suggest selecting c in such a way that optimizes a designed objective function;

9. How was the parameter m selected? it should be discussed;

10. How is the clustering quality measured?

11. In line 311 and 340 the authors stated “as seen/illustrated in Figure, …” the number of the Figure should be provided;

12. I think it is necessary to define an algorithm that performs the suggestions that can be made for the schedule of the lines. In general the rationale behind those choices are not that clear to me;

13. The quality of the Figures should be improved to help the reader understand better some concepts of this manuscript, for example, they have a very low resolution and the markers in the legend are too small.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: Yes: Matteo Bohm Reviewer #2: No [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. 25 Jun 2022 Reply to reviewers Title: Goal-oriented possibilistic fuzzy C-Medoid clustering of human mobility patterns – Illustrative application for the Taxicab trips-based enrichment of public transport services Authors: Miklós Mezei, Imre Felde, György Eigner, Gyula Dörgő, Tamás Ruppert, János Abonyi Ref. no.: PONE-D-22-03021 Dear Editor, We are grateful for the useful and supporting comments of the reviewers. We have addressed all the suggestions as explained below. We would like to thank the reviewers for their constructive comments. The changes in the manuscript are highlighted in blue. We hope that the revised manuscript meets your expectations. About the copyrights of our figures. The research group obtains an academic Matlab license, which is legally suitable for publishing the results generated with its use. 
All figures are created by geoplot (https://www.mathworks.com/help/matlab/ref/geoplot.html) and geoscatter (https://www.mathworks.com/help/matlab/ref/geoscatter.html) Matlab toolboxes, which are commonly used for visualization, and the reslulted plots can be inserted into open source documents. We hope these are fit to the journal copyright policy. Sincerely yours, György Eigner Reviewer 1 The idea and scope of the paper are interesting. In particular, the application of such a clustering algorithm with a tipicality function used to model people's willingness to walk to a bus stop is interesting. However, the work is not mature enough. Both the introduction and results sections are often unclear, the experiments performed are quite poor, and the figures provided do not help at all at understanding. In general, I think this work has potential, but it is not ready to be published. Below specific comments: - Introduction Sometimes very confused. It is often not clear whether the authors are referring to something that has been done in another work or they are stating something not related to any another work. I would suggest to use the formula "Authors et al. [X] studied/analysed/stated/ ..." to let the reader understand that you are referring to someone else's work. Thank you very much for the suggestion. After the careful revision of the introduction, we agree that the flow of the text needed a significant improvement. We added phrases to clarify that we refer to the work of other researchers and build on their findings. We hope that the readability and flow of the text have increased significantly. rows 4-6: "Based on these facts, innovative changes are needed to mitigate the effects of climate change within cities, and the transport sector there is one of the main contributors to greenhouse gas emissions" this is not very clear, may be rephrased. Thank you very much. The cited sentence was indeed badly phrased. 
We have rephrased it: “The transport sector is one of the main contributors to greenhouse gas emission. Rapid urban population growth, traffic congestion, and related air pollution put cities at the center of the climate mitigation agenda. These facts suggest urgent and transformative actions in urban mobility are required [2].” Also, we read the manuscript carefully and corrected the grammar and misspellings. rows 31-33: "These GPS trajectories could be used to analyse the emission of particle matter from braking behaviours" This statement is a bit out of context here. I would move it between the examples on how GPS trajectories can be used (from row 36 on). Thank you very much for the valuable suggestion. We have moved it to the suggested part of the manuscript. row 47: I would add "In particular, electric buses [...]". Thank you very much. We have added the recommended phrase. row 78-79: avoiding the ad-hoc installation of new stops is more of a choice than a major deficiency of clustering application for the design of public transportation lines. I would cite this as a decision made in the authors' paper, that has both a practical and a methodological motivation, but not as a deficiency of these particular clustering applications. Thank you very much for your carefulness. We have corrected the title of the list, stating that here we list the three significant deficiencies and practical problems/aspects. rows 110-113: "The developed PFCMD clustering algorithm that aims to cluster [...]" note clear, maybe a typo (remove "that"?) Thank you very much for your carefulness. We have removed the unnecessary “that” from the sentence. - Materials and Methods This section is very well written and clear. eqs. (4) and (5): what does it mean "inA" in the end of the equation? Thank you very much. 
We have added a more detailed description to the paragraph next to the equations on page 6 and removed the unnecessary DinA notation that was introduced to prepresent a general distance norm. - Figures In general, the figures are unintelligible: they have very low resolution, too small markers in the legend, it is almost impossible to even get the colors of the markers. Fig. 1: colors in the legend (especially in the plot on the right) are unintelligible Thank you very much for this crucial highlight. We increased the marker on the legend. Fig. 2: it seems it is missing the part of this figure referenced at row 297 Thank you, you highlighted that we need to separate these two figures. Now you can find these in Figures 3 and 4 with the references at the end of “Tuning the parameters of the clustering algorithm” (Fig. 3) and the beginning of the “Spatial analysis of the resultant clusters” (Fig 4) sections. Fig. 3: a zoom on the right hand side of the figure would definitely help. The figure as it stands is only useful for getting the outlier theme, but not all the rest. The figure aims to show the outliers as the nature of the compared clustering algorithms. We modified the figure, now it is Fig. 2. We created a subplot inside the original one, where we zoom in to the city center to show the rest of the cluster parallel with the outliers. - Results rows 261-268: any reference / detail / experiment about the choice of m? Thank you very much for the insightful question. We have extended the description of the effect and choice of the parameter m and also referenced the work of Pal et al., where detailed experiments are present. “[...] effect of parameter $m$ is discussed in-depth with detailed experiments in Pal et al. [38]. For specific datasets, this parameter can be tuned based on the effects of outliers: starting with 1, crisp clusters are generated, while increasing its value, the effect of outliers is reduced. 
The optimal value is tuned experimentally, typically between 1 and 2. However, higher values are possible as well.” - Discussion rows 343-344: "In our work, we demonstrate that Taxicab rides represent the human mobility patterns reasonably well (similarly to [18])." this does not seem the goal of the authors' work nor what they have done. Thank you very much for the suggestion, we agree with that. We have rephrased the relevant part: “In our work, we build on the assumption that Taxicab rides represent the human mobility patterns reasonably well...” Also, your comment highlighted us we need to make a clear contribution. Now you can find it at the end of the Discussion section. - References Refs [14] and [21] refer to the same work Refs [41] and [45] refer to the same work Thank you very much for your carefulness. We have corrected the references. Thank you for all the constructive critics and careful revision. We truly believe that based on these improvements, the quality of the manuscript has increased significantly. Reviewer 2 In this paper, the authors propose a possibilistic fuzzy c-medoid clustering algorithm to study human mobility. The proposed medoid-based clustering approach groups the typical mobility patterns within walking distance to the stations of the public transportation system. Their results demonstrate that the mobility pattern is well-reflected. In fact, in the data recording (2014), there was no direct line to the airport in Hungary, which was implemented later in 2017. The manuscript and the analysis have the potential to improve the knowledge in the human mobility field; however, I have some concerns about the methodology and some improvements that can be applied to increase the readability and the quality of the proposed manuscript: 1. The sentence starting in row 110 "The developed PFCMD clustering algorithm that aims to cluster.." seems to have an unnecessary “that”; Thank you very much for your carefulness. 
We have removed the unnecessary “that” from the sentence.

2. The introduction is sometimes not flowy, and it is not easy to understand what the references of some previous works refer to;

Thank you very much for the critical remark. After the careful revision of the introduction, we agree that the flow of the text needed a significant improvement. We reconstructed the sentences aiming to clarify the referred works. We hope that the readability and flow of the text have increased significantly.

3. The time series analysis is not deep enough. I expect a deeper discussion about this crucial aspect for this work; AND 4. The clustering is performed for the taxi rides collected for a whole year, and after that, some public transport lines are suggested from the result of the clustering. I believe that the fact that the lines are indicated over an entire year's data and their influence on the study should be discussed better;

We appreciate your comments. These are helpful and improved our manuscript a lot. We did not focus enough on proving our hypothesis that there are typical patterns during the rides in the time series. We extended the temporal analysis and the analysed taxicab rides sections. We added new plots to show the characteristic of the number of rides during the night shift. In Figure 8, you can notice the similar characteristic of the weekdays. We regenerated the boxplot analyses based on the complete datasets considering the 436 537 rides during the entire year (at night, between 9 p.m. and 5 a.m.). Figure 7 shows the new results. The additional information confirms that the proposed clustering algorithm explores relevant clusters in the time domain.

5. In line 156 I would say “the clustering is performed in the geographical domain” instead of “the clustering is performed in the spatial space”;

Thank you very much for your carefulness. It is corrected.

6. What is the “inA” at the end of equations 4 and 5? it is not clear to me;

Thank you very much.
We have added a more detailed description to the paragraph next to the equations on page 6 and removed the unnecessary DinA notation that was introduced to represent a general distance norm.

7. The section “Temporal analysis of the resultant clusters” should be extended and described better; AND 12. I think it is necessary to define an algorithm that performs the suggestions that can be made for the schedule of the lines. In general the rationale behind those choices are not that clear to me;

Thank you very much for the crucial comment. Revisiting the section, we agree that this part of the method was not detailed enough. We extended the description of the section, describing how the analysis of the start time of the rides in a specific cluster supports the determination of optimal public transportation schedules. Moreover, we highlighted that the work of Varga et al. describes in detail the value and method of temporal analysis of taxi rides. The discussion of the paper has also been extended to explain that the algorithm generates suggestions for the experts by summarizing the demands in a sophisticated way.

8. In the sub-section “The number of public transportation lines, c:” the fact that the number of clusters may affect the number of lines is repeated and this may appear redundant. Furthermore, I suggest selecting c in such a way that optimizes a designed objective function;

Thank you very much for the important remark. We have chosen the parameter c, the number of clusters, to be sufficiently high to allow flexibility for the algorithm. This way, the algorithm can populate these clusters with the rides, and we can analyse the resultant clusters considering only the clusters with a sufficient number of rides. We do not constrain the algorithm by strictly defining the number of clusters in advance. As shown in Fig.
4., this way, only a well-defined number of clusters will be statistically interesting for us based on the number of incorporated rides in the clusters. Thank you again for this important suggestion. We have extended the text with this further description. We hope this will clarify our parameter selection approach to the reader.

9. How was the parameter m selected? it should be discussed;

Thank you very much for the insightful question. We have extended the description of the effect and choice of the parameter m and also referenced the work of Pal et al., where detailed experiments are presented. “[...] effect of parameter $m$ is discussed in-depth with detailed experiments in Pal et al. For specific datasets, this parameter can be tuned based on the effects of outliers: starting with one, crisp clusters are generated, while increasing its value, the effect of outliers is reduced. The optimal value is tuned experimentally, typically between one and two. However, higher values are possible as well.”

10. How is the clustering quality measured?

Thank you for this crucial comment. We can measure the quality of the clustering in two ways. Firstly, we can interpret the results (as we did in the “Clustering of Public Transportation Data” section). Secondly, we can apply cluster validity measures. For that, we extended the Methods section with two equations (eq. 7 and 8) to define the Partition Coefficient (PC) and Classification Entropy (CE). These are complex measures and, thanks to that, they are independent of the distance values. The high PC and low CE values both indicate a well-separated cluster structure, which underpins our results.

11. In line 311 and 340 the authors stated “as seen/illustrated in Figure, …” the number of the Figure should be provided;

Thank you very much, we have added the number of the figures to the description. (The part at line 340 has changed, but the exact number of the particular Figure is provided in every case.)

13.
The quality of the Figures should be improved to help the reader understand better some concepts of this manuscript, for example, they have a very low resolution and the markers in the legend are too small.

Thank you very much for this crucial highlight. We increased the marker size in the legend of Figure 2 (in the previous version of the manuscript it was Figure 1). We separated the typicality and the number of data points figures into individual figures. Now you can find these in Figures 3 and 4. Also, we created a subplot inside the original Figure 2 (in the previous version of the manuscript it was Figure 3). We zoom in to the city center to show the rest of the clusters in parallel with the outliers. We replaced the figure of the boxplots (Fig. 7) and put it in the temporal analyses to prove the datasets. Also, we created a new plot to prove the characteristics of the rides. You can see it in Figure 8.

Submitted filename: Taxi-PLOS-ONE - reply to reviewers.pdf

6 Sep 2022

Goal-oriented possibilistic fuzzy C-Medoid clustering of human mobility patterns – Illustrative application for the Taxicab trips-based enrichment of public transport services

PONE-D-22-03021R1

Dear Dr. Eigner,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date.
If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Yajie Zou
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed
Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: (No Response)
Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: (No Response)
Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: (No Response)
Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: (No Response)
Reviewer #3: Yes

**********

6. Review Comments to the Author. Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)
Reviewer #3: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #3: No

**********

27 Sep 2022

PONE-D-22-03021R1

Goal-oriented possibilistic fuzzy C-Medoid clustering of human mobility patterns – Illustrative application for the Taxicab trips-based enrichment of public transport services

Dear Dr. Eigner:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Yajie Zou
Academic Editor
PLOS ONE
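
The authors' reply to point 10 of Reviewer 2 names the Partition Coefficient (PC) and the Classification Entropy (CE) but does not reproduce equations 7 and 8. As a rough illustration only, the sketch below assumes the usual textbook definitions, PC = (1/N) Σ Σ u_ik² and CE = -(1/N) Σ Σ u_ik ln u_ik, together with the standard fuzzy c-means membership formula. The function names and the distance values are hypothetical and are not taken from the manuscript, and the possibilistic (typicality) part of the authors' algorithm is not modelled here.

```python
import math


def fcm_memberships(distances, m):
    """Standard fuzzy c-means memberships of one data point.

    `distances` holds the (strictly positive) distances of the point to
    each cluster prototype; `m` > 1 is the fuzzifier.
    """
    exponent = 2.0 / (m - 1.0)
    inverse = [d ** -exponent for d in distances]
    total = sum(inverse)
    return [value / total for value in inverse]


def partition_coefficient(U):
    """PC: mean over data points of the sum of squared memberships."""
    return sum(u * u for row in U for u in row) / len(U)


def classification_entropy(U):
    """CE: mean over data points of -sum(u * ln u), with 0 * ln 0 taken as 0."""
    return -sum(u * math.log(u) for row in U for u in row if u > 0.0) / len(U)


# Hypothetical distances of three rides to two medoids (illustrative values only).
distances = [[1.0, 4.0], [3.0, 1.5], [2.0, 2.0]]

for m in (1.2, 2.0, 3.0):
    U = [fcm_memberships(row, m) for row in distances]
    print(f"m={m}: PC={partition_coefficient(U):.3f}, CE={classification_entropy(U):.3f}")
```

With these toy distances, PC should fall and CE should rise as m grows, since the memberships move toward an even split between the clusters; values of m close to 1 give nearly crisp memberships, which is consistent with the description quoted in the reply to point 9.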

2 in total

1. Traffic-related air pollution and health co-benefits of alternative transport in Adelaide, South Australia.

Authors: Ting Xia; Monika Nitschke; Ying Zhang; Pushan Shah; Shona Crabb; Alana Hansen
Journal: Environ Int Date: 2014-11-09 Impact factor: 9.621

2. Modeling carbon emissions from urban traffic system using mobile monitoring.

Authors: Daniel Jian Sun; Ying Zhang; Rui Xue; Yi Zhang
Journal: Sci Total Environ Date: 2017-05-11 Impact factor: 7.963
