Ali M S Alfosool, Yuanzhu Chen, Daniel Fuller.
Abstract
Walkability is an important measure with strong ties to our health. However, there are existing gaps in the literature. Our previous work proposed new approaches to address existing limitations. This paper explores new ways of applying transferability using transfer learning. Road networks, POIs, and road-related characteristics grow and change over time. Moreover, calculating walkability for all locations in all cities is very time-consuming. Transferability enables the reuse of already-learned knowledge for continued learning, reducing training time, resource consumption and the number of training labels required, while improving prediction accuracy. We propose ALF-Score++, which reuses trained models to generate transferable models capable of predicting walkability scores for cities not seen during training. We trained transfer-learned models for St. John's NL and Montréal QC and used them to predict walkability scores for Kingston ON and Vancouver BC. An MAE of 13.87 units (on a 0–100 scale) was achieved for transfer learning using MLP, and 4.56 units for direct training (random forest) on personalized clusters.
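The weight reuse described in the abstract can be illustrated with a minimal sketch. It assumes a plain linear model trained by gradient descent on mean squared error, standing in for the paper's MLP; the feature vectors, scores and function names are hypothetical. A model is first fit on a data-rich source city, and its weights then serve as the starting point for a short training run on a target city.

```python
# Minimal transfer-learning sketch: a linear model stands in for the MLP.
# All data are hypothetical; features are assumed scaled to [0, 1] and
# walkability scores lie on a 0-100 scale.

def predict(w, b, x):
    """Linear prediction w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def mse(w, b, X, y):
    """Mean squared error of the model on (X, y)."""
    return sum((predict(w, b, x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def train(X, y, w=None, b=0.0, lr=0.1, epochs=100):
    """Full-batch gradient descent on MSE.

    Passing w and b warm-starts training from previously learned weights,
    analogous to transferring a trained model to a new city.
    """
    n, d = len(y), len(X[0])
    w = list(w) if w is not None else [0.0] * d
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, y):
            err = predict(w, b, x) - t
            for j in range(d):
                grad_w[j] += 2 * err * x[j] / n
            grad_b += 2 * err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical source city.
X_src = [[0.1, 0.2], [0.9, 0.1], [0.2, 0.8], [0.8, 0.9], [0.5, 0.4], [0.3, 0.6]]
y_src = [60 * x1 + 30 * x2 + 5 for x1, x2 in X_src]

# Hypothetical target city with a similar but not identical relationship.
X_tgt = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
y_tgt = [55 * x1 + 35 * x2 + 8 for x1, x2 in X_tgt]

w_src, b_src = train(X_src, y_src, epochs=2000)                # pre-train on source
w_ft, b_ft = train(X_tgt, y_tgt, w=w_src, b=b_src, epochs=50)  # fine-tune on target
w_new, b_new = train(X_tgt, y_tgt, epochs=50)                  # same budget, from scratch

print("zero-shot MSE:", round(mse(w_src, b_src, X_tgt, y_tgt), 2))
print("fine-tuned MSE:", round(mse(w_ft, b_ft, X_tgt, y_tgt), 2))
print("from-scratch MSE:", round(mse(w_new, b_new, X_tgt, y_tgt), 2))
```

With the same 50-epoch budget, the warm-started model ends much closer to the target scores than the one trained from scratch, illustrating the kind of training-time saving the abstract refers to. A real MLP pipeline might also freeze or re-train selected layers, a choice this sketch does not model.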
Year: 2022 PMID: 35982116 PMCID: PMC9388587 DOI: 10.1038/s41598-022-17713-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. ALF-Score++ uses features similar to those of ALF-Score and ALF-Score+, such as road network structure, POIs, centrality measures and road embeddings. GLEPO’s linear extension of user opinions[18], which produces a global view of relative user opinions, is then aligned with these features as input to the machine learning processes. Models trained by ALF-Score++ are applicable both to cities seen and to cities unseen by the algorithms during training. Walkability estimates produced by the trained models have a high spatial resolution, are representative of user opinion and provide better insight into different regions and neighbourhoods. (Figure drawn by the authors).
Road networks of the cities used in this research, with their network and POI sizes.
| City | # of nodes | # of edges | # of POIs | Population | Total land area (km²) |
|---|---|---|---|---|---|
| Victoria, BC | 6770 | 8593 | 3318 | 85,792 | 19.47 |
| Kingston Metro, ON | 3427 | 4769 | 813 | 161,175 | 1906.82 |
| St. John’s Metro, NL | 5364 | 6851 | 592 | 205,955 | 804.63 |
| Vancouver Metro, BC | 45,125 | 60,299 | 13,321 | 2,463,431 | 2878.52 |
| Montréal Metro, QC | 76,663 | 114,414 | 10,045 | 4,247,000 | 4604.26 |
| Toronto Metro, ON | 479,520 | Over a million | 23,930 | 6,417,516 | 5905.71 |
For brevity, this paper focuses mostly on three cities: Kingston, Vancouver and Montréal. Nodes and edges are extracted from the road networks. Population and total land area figures are taken from Wikipedia.
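The fifth column of the table holds population counts (for example, 85,792 for Victoria), so a density per km² can be derived by dividing by the land area in the last column. A short sketch of that arithmetic, using only the table's own figures; the helper name is illustrative:

```python
# Population and land-area figures copied from the table above.
cities = {
    "Victoria, BC": (85_792, 19.47),
    "Kingston Metro, ON": (161_175, 1906.82),
    "St. John's Metro, NL": (205_955, 804.63),
    "Vancouver Metro, BC": (2_463_431, 2878.52),
    "Montréal Metro, QC": (4_247_000, 4604.26),
    "Toronto Metro, ON": (6_417_516, 5905.71),
}


def density_per_km2(population, area_km2):
    """People per square kilometre."""
    return population / area_km2


for city, (pop, area) in cities.items():
    print(f"{city}: {density_per_km2(pop, area):,.1f} people/km²")
```

The compact city of Victoria comes out far denser than the metro areas, whose land areas include large low-density surroundings.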
Deep neural network configurations used in the MLP and transfer-learning experiments.
| # of dense layers | Output shape range | Total parameters | Optimizer | # of epochs |
|---|---|---|---|---|
| 2 | 8–16 | 10,945 | Adam | 200 |
| 5 | 50–300 | 418,301 | Adam | 300 |
| 11 | 50–1000 | 2,673,301 | AdaMax | 400 |
| 12 | 50–800 | 2,303,001 | AdaMax | 600 |
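The Total parameters column follows from the layer widths: a fully connected layer with n_in inputs and n_out units has (n_in + 1) × n_out trainable parameters, counting one bias per unit. The input width and exact per-layer widths are not listed in the table, so the widths in this sketch are hypothetical.

```python
def dense_param_count(widths):
    """Trainable parameters of a stack of fully connected layers.

    widths[0] is the input dimension; each following entry is a layer's
    unit count. Each layer contributes (inputs + 1 bias) * units.
    """
    return sum((n_in + 1) * n_out for n_in, n_out in zip(widths, widths[1:]))


# Hypothetical example: 4 input features, hidden layers of 16 and 8 units,
# and a single regression output for the walkability score.
print(dense_param_count([4, 16, 8, 1]))
```

Because the first layer multiplies by the input width, the number of input features dominates the totals for wide first layers.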
Figure 2. Walkability results produced by three variations of ALF-Score and ALF-Score++ for the city of Montréal, QC, and their correlation. Top left: predictions from a model trained only on Montréal’s user data. Top right: predictions from a transferred model trained only on a single city’s user data (St. John’s). Bottom left: predictions from a model trained on Montréal’s user data, with the weights previously trained on St. John’s user data transferred into its training process. Bottom right: correlation between the three variations. The road network for Montréal contains more than 76,000 nodes. ALF-Score’s walkability scores range from 0 to 100 units; this range can be adjusted if needed. (Maps generated through RStudio[44] Version 1.2 using the mapview package from rstudio.com. Correlation figure generated through RStudio[44] Version 1.2 using the PerformanceAnalytics package from rstudio.com).
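The caption notes that the 0–100 score range can be adjusted. One simple way to do so, assumed here rather than taken from the paper, is a linear rescaling that maps [old_min, old_max] onto a new interval while preserving the ordering and relative spacing of scores. The function name and default target range are illustrative.

```python
def rescale(score, old_min=0.0, old_max=100.0, new_min=0.0, new_max=10.0):
    """Linearly map a score from [old_min, old_max] to [new_min, new_max]."""
    if old_max == old_min:
        raise ValueError("old range must have non-zero width")
    fraction = (score - old_min) / (old_max - old_min)
    return new_min + fraction * (new_max - new_min)


# Map a few 0-100 walkability scores onto a 0-10 scale.
print([rescale(s) for s in (0, 25, 100)])
```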
Figure 3. Experimental results of four machine learning techniques over five feature combinations for the city of Montréal, QC, with an 80–20 data split. The bars represent MAE on a 0–100 scale. RF: random forest, MLP: multilayer perceptron, SVM: support vector machine, DC: decision tree. RF provides the best overall performance. (Bar plot generated through matplotlib[45] Version 3.4.3 from matplotlib.org).
Comparison of machine learning techniques and feature combinations over an 80–20 data split (matching approach) for the city of Montréal, QC, showing their best-performing results.
| Technique | POI | POI + Network | POI + Embedding | Network + Embedding | All |
|---|---|---|---|---|---|
| Random forest | 19.65 | 18.20 | 17.13 | 15.47 | |
| MLP | 26.65 | 24.08 | 23.44 | 23.56 | 21.91 |
| SVM | 29.03 | 31.04 | 29.78 | 23.63 | 21.74 |
| Decision tree | 21.65 | 31.87 | 34.23 | 24.45 | 21.49 |
| Random forest | 24.75 | 22.19 | 22.61 | 20.28 | |
| MLP | 29.99 | 28.09 | 26.81 | 25.18 | 22.69 |
| SVM | 34.91 | 35.66 | 35.88 | 27.17 | 25.09 |
| Decision tree | 36.73 | 34.57 | 36.67 | 27.02 | 25.53 |
| Random forest | 0.7291 | 0.7590 | 0.7516 | 0.7784 | |
| MLP | 0.6386 | 0.6892 | 0.7076 | 0.7203 | 0.7520 |
| SVM | 0.6388 | 0.6258 | 0.6257 | 0.7051 | 0.7172 |
| Decision tree | 0.6014 | 0.6210 | 0.6079 | 0.7020 | 0.7249 |
Results represent MAE and RMSE errors over a range of 0–100 units, as well as .
Significant values are in [bold].
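For reference, the two error measures reported in the table are, for n predictions ŷᵢ against reference scores yᵢ, MAE = (1/n) Σ |ŷᵢ − yᵢ| and RMSE = √((1/n) Σ (ŷᵢ − yᵢ)²), both in the 0–100 units of the scores. A small sketch with made-up values:

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; penalises large misses more than MAE."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical reference scores and predictions on the 0-100 scale.
y_true = [40, 55, 70, 85]
y_pred = [50, 55, 60, 85]
print(mae(y_true, y_pred), rmse(y_true, y_pred))
```

RMSE is never smaller than MAE, which is consistent with every RMSE entry in the table exceeding the corresponding MAE entry.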
Comparison of the three experimental approaches, (1) matching, (2) combined and (3) zero-user-input, over five feature combinations and two data-split approaches, based on data from the cities of St. John’s NL and Montréal QC. Values are MAE on a 0–100 scale.
| MLP | POI | POI + Network | POI + Embedding | Network + Embedding | All |
|---|---|---|---|---|---|
| St. John’s (STJ on STJ (100%)) | 27.55 | 26.22 | 22.23 | 21.91 | 17.88 |
| Montréal (MTL on MTL (100%)) | 26.65 | 24.08 | 23.44 | 23.56 | 21.91 |
| STJ on MTL (100%) | n/a | n/a | n/a | n/a | 32.44 |
| STJ on STJ (50%) + MTL (100%) | 26.87 | 25.10 | 23.55 | 19.31 | 15.77 |
| STJ on STJ + MTL | 25.87 | 23.74 | 21.45 | 20.23 | |
| MTL on STJ (100%) | n/a | n/a | n/a | n/a | 33.89 |
| MTL on STJ + MTL | 25.11 | 22.23 | 21.67 | 20.11 | 16.23 |
| MTL on STJ (100%) + MTL (50%) | 27.67 | 24.86 | 14.43 | 21.51 | 16.73 |
| MTL on STJ (100%) + MTL (80%) | 24.84 | 20.17 | 19.92 | 18.36 | |
| MTL on STJ (100%) + MTL (20%) | 29.66 | 25.34 | 25.73 | 22.89 | 18.34 |
Significant values are in [bold].
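As a worked reading of the All column, the relative MAE reduction from adding the other city's data can be computed directly from the table. The pairing of rows below is one reasonable comparison, and the helper name is illustrative.

```python
def relative_reduction(baseline, improved):
    """Fractional decrease of an error metric relative to a baseline."""
    return (baseline - improved) / baseline


# MAE values from the "All" column above.
stj = relative_reduction(17.88, 15.77)  # STJ on STJ (100%) vs STJ on STJ (50%) + MTL (100%)
mtl = relative_reduction(21.91, 16.23)  # MTL on MTL (100%) vs MTL on STJ + MTL
print(f"St. John's: {stj:.1%}, Montréal: {mtl:.1%}")
```

By contrast, the zero-user-input rows (32.44 and 33.89) sit well above both single-city baselines.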
Comparison of the three experimental approaches, (1) matching, (2) combined and (3) zero-user-input, over five feature combinations and two data-split approaches, based on data from the cities of St. John’s NL and Montréal QC.
| MLP | POI | POI + Network | POI + Embedding | Network + Embedding | All |
|---|---|---|---|---|---|
| St. John’s (STJ on STJ (100%)) | 30.14 | 29.06 | 26.82 | 25.37 | 20.13 |
| Montréal (MTL on MTL (100%)) | 29.99 | 28.09 | 26.81 | 25.18 | 22.69 |
| STJ on MTL (100%) | n/a | n/a | n/a | n/a | 38.79 |
| STJ on STJ (50%) + MTL (100%) | 30 | 28.64 | 25.05 | 23.13 | 18.67 |
| STJ on STJ + MTL | 28.21 | 26.92 | 25.63 | 24.22 | |
| MTL on STJ (100%) | n/a | n/a | n/a | n/a | 40.33 |
| MTL on STJ + MTL | 28.84 | 25.84 | 25.36 | 23.87 | 19.17 |
| MTL on STJ (100%) + MTL (50%) | 31.02 | 27.9 | 27.11 | 24.69 | 19.71 |
| MTL on STJ (100%) + MTL (80%) | 27.31 | 24.77 | 24.82 | 22.23 | |
| MTL on STJ (100%) + MTL (20%) | 34.17 | 30.06 | 29.75 | 26.28 | 22.93 |
| St. John’s (STJ on STJ (100%)) | 0.6674 | 0.6773 | 0.7039 | 0.7219 | 0.7676 |
| Montréal (MTL on MTL (100%)) | 0.6779 | 0.6993 | 0.7080 | 0.7180 | 0.7475 |
| STJ on MTL (100%) | n/a | n/a | n/a | n/a | 0.5840 |
| STJ on STJ (50%) + MTL (100%) | 0.6771 | 0.6959 | 0.7174 | 0.7455 | 0.7755 |
| STJ on STJ + MTL | 0.6846 | 0.7185 | 0.7150 | 0.7344 | |
| MTL on STJ (100%) | n/a | n/a | n/a | n/a | 0.5756 |
| MTL on STJ + MTL | 0.6795 | 0.7118 | 0.7099 | 0.7274 | 0.7712 |
| MTL on STJ (100%) + MTL (50%) | 0.6610 | 0.7031 | 0.7128 | 0.7283 | 0.7706 |
| MTL on STJ (100%) + MTL (80%) | 0.7015 | 0.7136 | 0.7227 | 0.7459 | |
| MTL on STJ (100%) + MTL (20%) | 0.6357 | 0.6718 | 0.6658 | 0.7144 | 0.7342 |
Reflecting RMSE error range of 0–100 units and .
Significant values are in [bold].
Figure 4. Exploration of three approaches: (1) matching, (2) combined and (3) zero-user-input. The combined approach was tested extensively under various conditions. One such condition is the way the data are split, to better understand how the data affect the transfer of knowledge in transfer learning while still providing solid training and testing sets. The best performance was obtained with a completely random 80–20 split. MTL on STJ denotes predictions of scores for Montréal based only on a model trained on St. John’s. MTL on STJ+MTL, on the other hand, denotes predictions of scores for Montréal based on a model transfer-learned on both St. John’s and Montréal. (Bar plot generated through matplotlib[45] Version 3.4.3 from matplotlib.org).
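A completely random 80–20 split, as described in the caption, can be sketched with Python's standard `random` module. The seed, the function name and the integer record IDs are illustrative assumptions, not details from the paper.

```python
import random


def random_split(records, train_fraction=0.8, seed=42):
    """Shuffle a copy of the records and cut it into train and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical labelled road segments identified by integer IDs.
train_ids, test_ids = random_split(range(100))
print(len(train_ids), len(test_ids))  # 80 20
```

Using a dedicated `random.Random(seed)` instance keeps the split reproducible without touching the global random state.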
Figure 5. Top 150 features. While a noticeable difference is observed among the top 13 features, most embedding features follow a steady trend. Embedding features account for most of the feature importance. Although POI features form the largest group (530 features), only a small number of them appear in the top 150. (Bar plot generated through matplotlib[45] Version 3.4.3 from matplotlib.org).
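Counting how many features of each group appear among the top k by importance, as this figure does for k = 150, can be sketched as below. The category rule (a few centrality names, a hypothetical `emb_` prefix for embedding features, everything else treated as POI) and the importance values are made up for illustration.

```python
from collections import Counter

# Subset of centrality feature names, for illustration only.
CENTRALITY = {"Eccentricity", "Stress", "Betweenness centrality", "Degree"}


def category(name):
    """Hypothetical naming rule: centrality names, 'emb_' prefix, else POI."""
    if name in CENTRALITY:
        return "centrality"
    if name.startswith("emb_"):
        return "embedding"
    return "poi"


def top_k_by_category(importances, k):
    """Count categories among the k most important features."""
    top = sorted(importances, key=importances.get, reverse=True)[:k]
    return Counter(category(name) for name in top)


# Illustrative importance values, not results from the paper.
importances = {
    "emb_0": 0.020, "emb_1": 0.018, "restaurant_600": 0.015,
    "Eccentricity": 0.013, "bar_1800": 0.010, "emb_2": 0.009,
    "bench_1800": 0.004, "Degree": 0.0002,
}
print(top_k_by_category(importances, k=4))
```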
Figure 6. Total contribution to feature importance among the 668 features, divided into three categories: (1) centrality, (2) POI, (3) road embedding. Left: road embedding, while making up only 19% of the features, accounts for 78.7% of the total feature importance, while centrality features contribute 4.1% and POI features 17.1%. Right: when each category's share is divided by its number of features and the results are normalized, the highest per-feature contribution comes from embedding features (58.2%), followed by centrality features (38.8%), while POI features contribute only 3.1%. (Pie chart generated through matplotlib[45] Version 3.4.3 from matplotlib.org).
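The right-hand percentages can be reproduced from the left-hand ones by dividing each category's share by its feature count and renormalizing. The counts used here are the 10 centrality and 530 POI features stated in this section, with the remaining 128 of the 668 features taken to be road-embedding features, which is consistent with the 19% quoted for embeddings.

```python
# Category shares of total importance (left panel), in percent.
shares = {"embedding": 78.7, "centrality": 4.1, "poi": 17.1}
# Feature counts; the embedding count is inferred as 668 - 10 - 530.
counts = {"embedding": 668 - 10 - 530, "centrality": 10, "poi": 530}

# Average importance per feature in each category, then renormalize to 100%.
per_feature = {c: shares[c] / counts[c] for c in shares}
total = sum(per_feature.values())
normalized = {c: 100 * v / total for c, v in per_feature.items()}

for c, v in normalized.items():
    print(f"{c}: {v:.1f}%")
```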
Feature importance of all centrality features (10 features in total), which together contribute 4.1% of the total feature importance.
| Feature | Importance |
|---|---|
| Eccentricity | 0.01316184 |
| Stress | 0.004649347 |
| Betweenness centrality | 0.004590322 |
| Average shortest path length | 0.0043923 |
| Topological coefficient | 0.003773664 |
| Neighborhood connectivity | 0.003381009 |
| Radiality | 0.002233024 |
| Closeness centrality | 0.001581954 |
| Clustering coefficient | 0.001386535 |
| Degree | 0.000191574 |
Feature importance of the top 10 (of 530) POI features.
| Feature | Importance |
|---|---|
| restaurant_600 | 0.014984423 |
| bar_1800 | 0.010154083 |
| cafe_1400 | 0.007755144 |
| cafe_1600 | 0.007659399 |
| cafe_2000 | 0.007125045 |
| cafe_1800 | 0.005620239 |
| restaurant_1000 | 0.005089702 |
| restaurant_1400 | 0.004664054 |
| bench_1800 | 0.003845234 |
Together, all 530 POI features contribute 17.1% of the total feature importance.
Figure 7. Left: Can-ALE for the city of Kingston, ON. Right: walkability results produced by ALF-Score++ for the city of Kingston, ON, using the zero-user-input approach, i.e., a model trained through transfer learning on user data from two cities, St. John’s and Montréal. The road network for Kingston contains more than 3,400 nodes. ALF-Score’s walkability scores range from 0 to 100 units; this range can be adjusted if needed. (Maps generated through RStudio[44] Version 1.2 using the mapview package from rstudio.com).
Figure 8. Left: Can-ALE for the city of Vancouver, BC. Right: walkability results produced by ALF-Score++ for the city of Vancouver, BC, using the zero-user-input approach, i.e., a model trained through transfer learning on user data from two cities, St. John’s and Montréal. The road network for Vancouver contains more than 45,000 nodes. (Maps generated through RStudio[44] Version 1.2 using the mapview package from rstudio.com).
Figure 9. Correlation between ALF-Score++ and Can-ALE for four cities. Top left: Montréal, QC; top right: Kingston, ON; bottom left: Vancouver, BC; bottom right: St. John’s, NL. (Correlation figures generated through RStudio[44] Version 1.2 using the PerformanceAnalytics package from rstudio.com).
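The correlation panels compare two score vectors over the same locations. A minimal Pearson correlation coefficient is sketched below. Which coefficient the figures actually report is not stated in the caption, and the score vectors are made-up values on each index's own scale.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical ALF-Score++ (0-100) and Can-ALE values for the same five locations.
alf = [20, 35, 50, 65, 90]
can_ale = [-2.1, -0.5, 0.3, 1.8, 3.9]
print(round(pearson(alf, can_ale), 3))
```

Because Pearson's r is invariant to linear rescaling, the two indices can be compared directly even though they use different scales.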