Junsheng Huang1,2, Baohua Mao1,2,3, Yun Bai1,2, Tong Zhang1,2, Changjun Miao4.
Abstract
Various traffic-sensing technologies have been employed to facilitate traffic control. Because of factors such as malfunctioning devices and human error, missing values typically occur in Intelligent Transportation System (ITS) sensing datasets, which degrades data quality. In this study, an integrated imputation algorithm based on fuzzy C-means (FCM) and the genetic algorithm (GA) is proposed to improve the accuracy of the estimated values. The GA is applied to optimize two settings of the FCM model: the parameter of the membership degree and the number of cluster centroids. An experimental test on taxi global positioning system (GPS) data from Manhattan, New York City, is used to demonstrate the effectiveness of the integrated imputation approach. Three evaluation criteria, the root mean squared error (RMSE), correlation coefficient (R), and relative accuracy (RA), are used to verify the experimental results. Under the ±5% and ±10% thresholds, the average RAs obtained by the integrated imputation method are 0.576 and 0.785, respectively, the highest among the compared methods, indicating that the integrated imputation method outperforms the history imputation method and the conventional FCM method. In addition, the clustering imputation performs better with the Euclidean distance than with the Manhattan distance. Thus, the proposed integrated imputation method can be employed to estimate missing values in daily traffic management.
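The three evaluation criteria named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's code: RMSE and the Pearson correlation coefficient are standard, but the RA definition used here (the share of estimates whose relative error falls within the tolerance) is an assumption, since the record does not reproduce the formula. The `estimated` values are invented purely for illustration and are not results from the study.

```python
import math

def rmse(actual, estimated):
    """Root mean squared error between paired values."""
    n = len(actual)
    return math.sqrt(sum((a - e) ** 2 for a, e in zip(actual, estimated)) / n)

def correlation(actual, estimated):
    """Pearson correlation coefficient R between paired values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_e = sum(estimated) / n
    cov = sum((a - mean_a) * (e - mean_e) for a, e in zip(actual, estimated))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov / math.sqrt(var_a * var_e)

def relative_accuracy(actual, estimated, tolerance):
    """Share of estimates within +/- tolerance of the true value.

    Assumed definition (not taken from the paper); true values must be non-zero.
    """
    hits = sum(1 for a, e in zip(actual, estimated)
               if abs(e - a) / abs(a) <= tolerance)
    return hits / len(actual)

# Hypothetical true demand values and hypothetical estimates.
actual = [200, 254, 405, 444]
estimated = [208, 250, 380, 450]

ra_5 = relative_accuracy(actual, estimated, 0.05)
ra_10 = relative_accuracy(actual, estimated, 0.10)
```

Under this assumed definition, widening the tolerance from ±5% to ±10% can only keep RA the same or raise it, which matches the ordering of the two averages reported in the abstract.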
Keywords: Intelligent Transportation System; fuzzy C-means; genetic algorithm; missing values imputation
Year: 2020 PMID: 32252432 PMCID: PMC7181140 DOI: 10.3390/s20071992
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Working of GPS.
Table 1. Samples of taxi GPS data.
| PT | DT | PLO | PLA | DLO | DLA | TD | TDI |
|---|---|---|---|---|---|---|---|
| 2013-1-13 4:36 | 2013-1-13 4:46 | −73.9969 | 40.72006 | −73.9935 | 40.69304 | 600 | 3.12 |
| 2013-1-13 4:37 | 2013-1-13 4:48 | −74.0003 | 40.73007 | −73.9874 | 40.76841 | 660 | 3.39 |
| 2013-1-13 4:41 | 2013-1-13 4:45 | −73.9973 | 40.72098 | −74.0004 | 40.73238 | 240 | 1.16 |
| … | … | … | … | … | … | … | … |
PT means pick-up time; DT means drop-off time; PLO means pick-up longitude; PLA means pick-up latitude; DLO means drop-off longitude; DLA means drop-off latitude; TD means trip duration (seconds); TDI means trip distance (km).
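Trip records like the samples above have to be aggregated into fixed intervals before a demand table can be built. The sketch below shows one way to do that, assuming demand is counted by pick-up time (PT) and that the interval length divides 60 minutes evenly; both are assumptions, and the helper names `bin_start` and `demand_counts` are hypothetical.

```python
from collections import Counter
from datetime import datetime

def bin_start(timestamp, interval_min=5):
    """Floor a timestamp to the start of its aggregation interval.

    Assumes interval_min divides 60 (e.g. 5, 10, 15, 20).
    """
    minute = timestamp.minute - timestamp.minute % interval_min
    return timestamp.replace(minute=minute, second=0, microsecond=0)

def demand_counts(pickup_times, interval_min=5):
    """Count pick-ups per interval from 'YYYY-M-D H:MM' strings."""
    parsed = (datetime.strptime(t, "%Y-%m-%d %H:%M") for t in pickup_times)
    return Counter(bin_start(t, interval_min) for t in parsed)

# Pick-up times of the three sample rows in the table above.
pickups = ["2013-1-13 4:36", "2013-1-13 4:37", "2013-1-13 4:41"]
counts = demand_counts(pickups)
counts_20 = demand_counts(pickups, interval_min=20)
```

With a 5 min interval the first two trips fall into the 4:35 bin and the third into the 4:40 bin; coarser intervals merge more trips into each bin.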
Figure 2. Workflow of the research framework. MPR means missing partially at random; FCM-GA means fuzzy C-means with genetic algorithm.
Table 2. Taxi demand volume on weekdays. Question marks (?) represent missing values.
| Time interval | Monday | Tuesday | Wednesday | Thursday | Friday |
|---|---|---|---|---|---|
| 0:00:00–0:05:00 | 158 | ? | ? | 249 | 341 |
| 0:05:00–0:10:00 | 184 | 200 | 254 | ? | ? |
| 0:10:00–0:15:00 | 163 | 200 | 263 | 248 | 341 |
| … | … | … | … | … | … |
| 7:30:00–7:35:00 | ? | 405 | 421 | 417 | 407 |
| 7:35:00–7:40:00 | 399 | 444 | ? | 455 | 435 |
| 7:40:00–7:45:00 | 429 | ? | ? | ? | 468 |
| … | … | … | … | … | … |
| 23:45:00–23:50:00 | 240 | ? | 286 | 395 | 549 |
| 23:50:00–23:55:00 | 205 | 281 | 284 | 398 | 542 |
| 23:55:00–0:00:00 | ? | ? | 282 | ? | 509 |
Figure 3. Incomplete data on weekdays based on a 5 min aggregation interval.
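To score an imputation method, known values are typically hidden at a chosen missing ratio and then compared with the estimates. The sketch below hides entries uniformly at random, which is a simplifying assumption: the record mentions a "missing partially at random" (MPR) pattern but does not describe how it is generated. The function name `mask_at_random` and the seed are hypothetical; the demand values are taken from the weekday table above.

```python
import random

def mask_at_random(values, missing_ratio, seed=0):
    """Replace a share of entries with None, chosen uniformly at random.

    Returns the masked list and the sorted indices that were hidden.
    """
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    n_missing = round(missing_ratio * len(values))
    hidden = set(rng.sample(range(len(values)), n_missing))
    masked = [None if i in hidden else v for i, v in enumerate(values)]
    return masked, sorted(hidden)

# Observed demand values copied from the weekday table.
observed = [158, 184, 163, 399, 429, 240, 205, 200, 200, 405,
            444, 281, 254, 263, 421, 286, 284, 282, 249, 248]

masked, hidden = mask_at_random(observed, missing_ratio=0.25)
```

Keeping the hidden indices makes it possible to compare estimates against the original values afterwards.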
Figure 4. Flowchart of the integrated imputation algorithm.
Table 3. Optimized parameters obtained by the integrated imputation method.
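| Time Interval (min) | Manhattan Distance: Number of Cluster Centroids | Manhattan Distance: Membership-Degree Parameter | Euclidean Distance: Number of Cluster Centroids | Euclidean Distance: Membership-Degree Parameter |
|---|---|---|---|---|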
| 5 min | 12 | 1.1083 | 14 | 1.1936 |
| 10 min | 12 | 1.1123 | 14 | 1.1183 |
| 15 min | 13 | 1.1016 | 12 | 1.1145 |
| 20 min | 8 | 1.1426 | 9 | 1.1211 |
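The abstract states that the GA tunes the number of cluster centroids and the parameter of the membership degree. The sketch below shows how such a parameter (called `m` here, the usual fuzzifier) enters the standard FCM membership formula, and how the distance function can be swapped. It is an illustration under assumptions: the centroids and the point are invented, reusing the same membership formula with the Manhattan distance is an assumption, and the handling of missing coordinates is left out.

```python
def euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def memberships(point, centroids, m, distance=euclidean):
    """Standard FCM membership of one point in each cluster (requires m > 1).

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1))
    """
    dists = [distance(point, c) for c in centroids]
    # A point lying exactly on one or more centroids belongs only to those.
    zeros = [d == 0 for d in dists]
    if any(zeros):
        share = 1.0 / sum(zeros)
        return [share if z else 0.0 for z in zeros]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
            for d_i in dists]

# Hypothetical point and centroids, for illustration only.
point = [250.0, 300.0]
centroids = [[200.0, 260.0], [420.0, 440.0]]

u_sharp = memberships(point, centroids, m=1.1)
u_soft = memberships(point, centroids, m=2.0)
u_manhattan = memberships(point, centroids, m=1.1, distance=manhattan)
```

Values of `m` close to 1, like those in the table above, push each point almost entirely into its nearest cluster, while larger values spread membership across clusters.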
Figure 5. Comparisons between the missing values and the estimated values.
Figure 6. Root mean squared error (RMSE) comparisons under different data aggregation intervals and different missing ratios.
Figure 7. Correlation coefficient comparisons under different data aggregation intervals and different missing ratios.
Figure 8. Relative accuracy (RA) comparisons under different data aggregation intervals, different missing ratios, and the ±5% tolerance error range.
Figure 9. RA comparisons under different data aggregation intervals, different missing ratios, and the ±10% tolerance error range.
Figure 10. Time series comparisons with different missing ratios.
Table 4. Average computation time of FCM and FCM-GA (in seconds).
| Time Interval (min) | Missing Ratio | FCM | FCM-GA |
|---|---|---|---|
| 5 min | 5% | 32.90 | 258.30 |
| 5 min | 10% | 32.99 | 259.40 |
| 5 min | 15% | 33.01 | 260.10 |
| 5 min | 20% | 33.18 | 262.30 |
| 5 min | 25% | 34.33 | 264.87 |
| 10 min | 5% | 12.59 | 221.90 |
| 10 min | 10% | 12.61 | 224.10 |
| 10 min | 15% | 14.22 | 224.96 |
| 10 min | 20% | 14.90 | 225.64 |
| 10 min | 25% | 17.39 | 228.22 |
| 15 min | 5% | 9.08 | 168.31 |
| 15 min | 10% | 9.39 | 169.32 |
| 15 min | 15% | 9.67 | 170.20 |
| 15 min | 20% | 11.87 | 171.99 |
| 15 min | 25% | 11.90 | 172.40 |
| 20 min | 5% | 8.14 | 89.20 |
| 20 min | 10% | 8.66 | 91.20 |
| 20 min | 15% | 8.80 | 91.88 |
| 20 min | 20% | 9.22 | 92.51 |
| 20 min | 25% | 9.80 | 93.91 |