Hongtai Yang, Jianjiang Yang, Lee D Han, Xiaohan Liu, Li Pu, Shih-Miao Chin, Ho-Ling Hwang.
Abstract
Along with the rapid development of Intelligent Transportation Systems, traffic data collection technologies have advanced quickly. Innovative data collection technologies such as remote traffic microwave sensors (RTMS), Bluetooth sensors, GPS-based floating car methods, and automated license plate recognition have significantly increased the variety and volume of traffic data. Despite these advances, missing data remains a problem that poses a great challenge for data-based applications such as traffic forecasting, real-time incident detection, dynamic route guidance, and massive evacuation optimization. A thorough literature review suggests that most current imputation models either focus on the temporal nature of the traffic data and fail to consider the spatial information of neighboring locations, or assume that the data follow a certain distribution. The first issue reduces imputation accuracy, and the second limits the applicability of the corresponding imputation methods. This paper therefore presents a Kriging-based data imputation approach that fully utilizes the spatiotemporal correlation in the traffic data and does not assume that the data follow any particular distribution. A set of scenarios with different missing rates is used to evaluate the performance of the proposed method, which is compared with that of two other widely used methods: historical average and K-nearest neighbor (KNN). The comparison results indicate that the proposed method has the highest imputation accuracy and is more flexible than the other methods.
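As a rough illustration of the kind of estimator the abstract describes, the sketch below imputes one missing speed by ordinary kriging from neighboring observations in space (mile marker) and time (minutes). The exponential variogram, its parameters (`nugget`, `sill`, `rng`), the `time_scale` used to combine miles and minutes, and the sample speeds are all assumptions made for illustration; they are not the model or the values fitted in the paper.

```python
import math

def exp_variogram(h, nugget=0.0, sill=1.0, rng=1.0):
    """Exponential semivariogram (assumed model); gamma(0) = 0 by convention."""
    if h == 0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * h / rng))

def st_distance(p, q, time_scale=0.1):
    """Distance between (mile, minute) points; time_scale is an assumed conversion."""
    return math.hypot(p[0] - q[0], time_scale * (p[1] - q[1]))

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ordinary_kriging(points, values, target, variogram=exp_variogram):
    """Return (estimate, weights); the weights are constrained to sum to 1."""
    n = len(points)
    # Kriging system: pairwise semivariances bordered by the unbiasedness constraint.
    a = [[variogram(st_distance(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(st_distance(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:n]  # the last unknown is the Lagrange multiplier
    return sum(w * v for w, v in zip(weights, values)), weights

# Hypothetical observations: (mile marker, minutes relative to the missing record).
points = [(10.6, 0.0), (11.4, 0.0), (12.2, 0.0), (11.8, -5.0)]
values = [47.0, 38.0, 42.0, 55.0]
estimate, weights = ordinary_kriging(points, values, target=(11.8, 0.0))
print(estimate, weights)
```

Because the weights depend only on the variogram and the geometry of the observations, this form of kriging needs no distributional assumption on the speeds themselves, which is the property the abstract emphasizes.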
Year: 2018 PMID: 29664928 PMCID: PMC5903649 DOI: 10.1371/journal.pone.0195957
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Description of RTMS stations.
| Station | Direction | Location | Lanes | Mile marker |
|---|---|---|---|---|
| 115 | Northbound | Ellington Parkway @I24 | 2 | 10.6 |
| 117 | Northbound | Ellington Parkway @Cleveland | 2 | 11.4 |
| 119 | Northbound | Ellington Parkway @Granada | 2 | 11.8 |
| 121 | Northbound | Ellington Parkway @Douglas Ave | 2 | 12.2 |
| 123 | Northbound | Ellington Parkway @South of Trinity | 2 | 12.4 |
| 124 | Northbound | Ellington Parkway @Trinity | 2 | 13.0 |
Fig 1. RTMS stations for this study.
Data description.
| Station number | Lane | Average speed (mi/h) | Average count | Missing rate |
|---|---|---|---|---|
| 115 | 1 | 47.41 (22.67) | 5 (4) | 3.19% |
| 115 | 2 | 36.63 (31.47) | 2 (3) | 3.19% |
| 117 | 1 | 38.46 (33.65) | 3 (4) | 33.83% |
| 117 | 2 | 25.94 (31.75) | 2 (4) | 33.83% |
| 119 | 1 | 57.85 (21.71) | 5 (4) | 14.27% |
| 119 | 2 | 37.77 (27.38) | 3 (5) | 14.27% |
| 121 | 1 | 41.95 (17.56) | 5 (4) | 15.28% |
| 121 | 2 | 40.47 (24.18) | 4 (4) | 15.28% |
| 123 | 1 | 43.41 (26.52) | 4 (5) | 18.77% |
| 123 | 2 | 43.22 (26.63) | 4 (5) | 18.77% |
| 124 | 1 | 40.06 (18.46) | 4 (3) | 17.28% |
| 124 | 2 | 35.26 (21.38) | 3 (4) | 17.28% |
Fig 2. Boxplot of data missing rates by station and day of the week.
Fig 3. Initial variogram.
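For readers unfamiliar with variograms, the sketch below computes an empirical semivariogram: half the mean squared difference between pairs of observations, grouped by separation distance. Treating locations as one-dimensional positions along the corridor, the `bin_width` parameter, and the function name are illustrative assumptions; the variogram estimation used in the paper may differ.

```python
from collections import defaultdict

def empirical_variogram(locations, values, bin_width):
    """Return {bin_center: semivariance}, where the semivariance of a bin is
    the mean of (z_i - z_j)^2 / 2 over all pairs whose separation falls in it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(locations)
    for i in range(n):
        for j in range(i + 1, n):
            h = abs(locations[i] - locations[j])
            k = int(h // bin_width)  # index of the distance bin
            sums[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    return {(k + 0.5) * bin_width: sums[k] / (2.0 * counts[k])
            for k in sorted(counts)}

# Mile markers and lane 1 average speeds of the six stations in this record,
# used here only as sample input.
mile_markers = [10.6, 11.4, 11.8, 12.2, 12.4, 13.0]
lane1_speeds = [47.41, 38.46, 57.85, 41.95, 43.41, 40.06]
print(empirical_variogram(mile_markers, lane1_speeds, bin_width=0.5))
```

A parametric model (for example spherical or exponential) is then typically fitted to these binned values before kriging weights are computed.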
Performance of proposed approach.
| Quantile | % Missing | MAD(Kr) | MAD(H) | MAD(Kk) | RMSE(Kr) | RMSE(H) | RMSE(Kk) |
|---|---|---|---|---|---|---|---|
| 25% | 1.0% | 3.65 | 5.63 | 4.66 | 7.26 | ||
| 30% | 1.4% | 3.04 | 3.81 | 3.89 | 5.25 | ||
| 35% | 1.8% | 3.21 | 5.33 | 4.44 | 6.89 | ||
| 40% | 2.8% | 3.16 | 4.42 | 4.46 | 6.25 | ||
| 45% | 4.4% | 3.40 | 4.90 | 4.72 | 6.37 | ||
| 50% | 6.0% | 3.11 | 4.06 | 4.52 | 5.78 | ||
| 55% | 8.3% | 3.11 | 4.39 | 4.52 | 5.99 | ||
| 60% | 13.9% | 2.93 | 3.99 | 4.13 | 5.55 | ||
| 65% | 20.7% | 3.20 | 4.16 | 4.44 | 5.75 | ||
| 70% | 31.1% | 3.01 | 4.18 | 4.26 | 5.87 | ||
| 75% | 36.1% | 3.07 | 4.00 | 4.30 | 5.68 |
Note: Kr represents the proposed imputation method, H represents the historical average method, and Kk represents the KNN method.
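The two error measures reported in the table can be sketched as follows, assuming MAD denotes the mean absolute deviation between imputed and observed values and RMSE the root mean squared error. The function names and the sample speeds are hypothetical.

```python
import math

def mad(imputed, observed):
    """Mean absolute deviation between imputed and observed values."""
    return sum(abs(p - o) for p, o in zip(imputed, observed)) / len(observed)

def rmse(imputed, observed):
    """Root mean squared error between imputed and observed values."""
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(imputed, observed)) / len(observed)
    )

# Hypothetical speeds (mi/h): values hidden for evaluation and their imputations.
observed = [47.0, 44.0, 30.0]
imputed = [50.0, 40.0, 30.0]
print(mad(imputed, observed), rmse(imputed, observed))
```

RMSE penalizes large individual errors more heavily than MAD, so for the same set of errors RMSE is never smaller than MAD.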
Fig 4. Imputation performance of proposed approach.