Qiang Shang, Tian Xie, Yang Yu.
Abstract
Traffic accidents cause nonrecurrent congestion and road traffic injuries, and they seriously affect public safety. Predicting the duration of traffic incidents helps traffic operation and management. Most previous studies were limited to a single area and a single data source. This paper proposes a hybrid deep learning model based on multi-source incomplete data to predict the duration of traffic incidents across the U.S. The natural-language incident descriptions were parsed by a latent Dirichlet allocation (LDA) topic model and fed, together with sensor data, into a hybrid network combining bidirectional long short-term memory (Bi-LSTM) and long short-term memory (LSTM) layers for training. Compared with four benchmark models and three state-of-the-art algorithms, the proposed method achieved the lowest RMSE and MAE. The proposed model performed best for durations between 20 and 70 min. Finally, data acquisition was divided into three phases, and a phased sequential prediction model was proposed for the incomplete-data setting. The results show that model performance improved as the variables were updated.
Keywords: deep learning; duration prediction; hybrid network; intelligent transportation systems; traffic data fusion
Year: 2022 PMID: 36078617 PMCID: PMC9518162 DOI: 10.3390/ijerph191710903
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 4.614
Figure 1. The composition of incident duration.
Figure 2. The general structure of the LSTM and Bi-LSTM.
Figure 3. The structure of the proposed prediction model.
The feature variables used in this study.
| Categories | No. | Variables | Type | Coding |
|---|---|---|---|---|
| One-phase | 1 | Temperature (F) | Continuous | Numeric |
| | 2 | Wind Chill (F) | Continuous | Numeric |
| | 3 | Humidity (%) | Continuous | Numeric |
| | 4 | Pressure (in) | Continuous | Numeric |
| | 5 | Visibility (mi) | Continuous | Numeric |
| | 6 | Wind Speed (mph) | Continuous | Numeric |
| | 7 | Weather Condition | Binary | Normal = 0, Severe = 1 |
| | 8 | Amenity | Binary | Not exist = 0, Nearby exist = 1 |
| | 9 | Bump | Binary | Not exist = 0, Nearby exist = 1 |
| | 10 | Crossing | Binary | Not exist = 0, Nearby exist = 1 |
| | 11 | Give Way | Binary | Not exist = 0, Nearby exist = 1 |
| | 12 | Junction | Binary | Not exist = 0, Nearby exist = 1 |
| | 13 | No Exit | Binary | Not exist = 0, Nearby exist = 1 |
| | 14 | Railway | Binary | Not exist = 0, Nearby exist = 1 |
| | 15 | Roundabout | Binary | Not exist = 0, Nearby exist = 1 |
| | 16 | Station | Binary | Not exist = 0, Nearby exist = 1 |
| | 17 | Stop | Binary | Not exist = 0, Nearby exist = 1 |
| | 18 | Traffic Calming | Binary | Not exist = 0, Nearby exist = 1 |
| | 19 | Traffic Signal | Binary | Not exist = 0, Nearby exist = 1 |
| | 20 | Sunrise Sunset | Binary | Night = 0, Daylight = 1 |
| | 21 | Civil Twilight | Binary | Night = 0, Daylight = 1 |
| | 22 | Nautical Twilight | Binary | Night = 0, Daylight = 1 |
| | 23 | Astronomical Twilight | Binary | Night = 0, Daylight = 1 |
| Two-phase | 24 | Description of incidents | Natural language | Document |
| Three-phase | 25 | The severity of the incidents | Categorical | Four levels = (1, 2, 3, 4) |
| | 26 | Influential distance (mi) | Continuous | Continuous |
| Response | 27 | Duration | Continuous | Continuous |
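The coding rules in the feature table can be illustrated with a minimal sketch that encodes one incident record and keeps only the variables available at a given phase. The record field names, the raw labels ("Severe", "Daylight"), the `encode_record` helper, and the sample record are all hypothetical. The sketch also assumes that the phases are cumulative, so that each phase adds its variables to those of the earlier ones.

```python
# Illustrative sketch: field names, raw value labels, and the helper are
# hypothetical; only the coding rules come from the feature table.

CONTINUOUS = [
    "Temperature (F)", "Wind Chill (F)", "Humidity (%)",
    "Pressure (in)", "Visibility (mi)", "Wind Speed (mph)",
]
NEARBY_FLAGS = [
    "Amenity", "Bump", "Crossing", "Give Way", "Junction", "No Exit",
    "Railway", "Roundabout", "Station", "Stop", "Traffic Calming",
    "Traffic Signal",
]
DAYLIGHT_FLAGS = [
    "Sunrise Sunset", "Civil Twilight", "Nautical Twilight",
    "Astronomical Twilight",
]


def encode_record(record, phase):
    """Encode the variables assumed to be available at phase 1, 2, or 3."""
    if phase not in (1, 2, 3):
        raise ValueError("phase must be 1, 2, or 3")

    features = {}
    # Phase one: weather, nearby road features, and daylight indicators.
    for name in CONTINUOUS:
        features[name] = float(record[name])
    # Normal = 0, Severe = 1
    features["Weather Condition"] = int(record["Weather Condition"] == "Severe")
    # Not exist = 0, Nearby exist = 1
    for name in NEARBY_FLAGS:
        features[name] = int(bool(record[name]))
    # Night = 0, Daylight = 1
    for name in DAYLIGHT_FLAGS:
        features[name] = int(record[name] == "Daylight")

    # Phase two adds the natural-language description (kept as raw text here).
    if phase >= 2:
        features["Description"] = str(record["Description"])

    # Phase three adds severity (four levels) and influential distance.
    if phase == 3:
        severity = int(record["Severity"])
        if severity not in (1, 2, 3, 4):
            raise ValueError("severity must be one of 1, 2, 3, 4")
        features["Severity"] = severity
        features["Influential distance (mi)"] = float(
            record["Influential distance (mi)"]
        )
    return features


# Made-up sample record, for illustration only.
sample = {name: 1.0 for name in CONTINUOUS}
sample.update({name: False for name in NEARBY_FLAGS})
sample.update({name: "Daylight" for name in DAYLIGHT_FLAGS})
sample.update({
    "Weather Condition": "Severe",
    "Junction": True,
    "Description": "Accident on the right lane near the exit.",
    "Severity": 2,
    "Influential distance (mi)": 0.5,
})

for phase in (1, 2, 3):
    print(phase, len(encode_record(sample, phase)))
```

In this sketch the description is passed through as text; turning it into topic features would be a separate step.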
Figure 4. The distribution of the incident duration.
The hyperparameters of all methods.
| Methods | Hyperparameters |
|---|---|
| The proposed method | Initial learning rate (0.001); Adam optimizer; Number of neurons in hidden layers (LSTM = 128, Bi-LSTM = 64, Fully connected = 128); Dropout rate (0.4); Number of LDA topics (14). |
| LSTM | Number of neurons in hidden layers (LSTM = 128). |
| SVR | Linear kernel function. |
| LDA-LSTM | The same as our method. |
| LDA-BiLSTM | The same as our method. |
| MLP-LSTM | Number of neurons in hidden layers (LSTM = 128); MLP contains two hidden layers. |
| BSVR | Parameters optimized with a Bayesian approach. |
| Ensemble Model Based on Clustering (EMC) | K-means clustering (K = 3, 4, 5); individual models based on artificial neural networks. |
Figure 5. The validation perplexity with various numbers of topics.
Figure 6. The word clouds of the LDA topics.
The performance of different methods for the incident duration prediction.
| Methods | RMSE | MAE | MAPE |
|---|---|---|---|
| The proposed method | 15.72 | 13.12 | 0.31 |
| LSTM | 16.99 | 14.72 | 0.35 |
| SVR | 17.33 | 13.45 | 0.28 |
| LDA-LSTM | 16.25 | 13.98 | 0.34 |
| LDA-BiLSTM | 16.23 | 13.87 | 0.33 |
| MLP-LSTM | 16.21 | 13.56 | 0.31 |
| BSVR | 16.78 | 14.69 | 0.35 |
| EMC | 17.11 | 14.24 | 0.32 |
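For reference, the three error metrics in the table can be computed as in the following minimal sketch, which uses their standard definitions. It assumes that MAPE is reported as a fraction rather than a percentage, which matches the magnitude of the tabulated values but is not stated in this record. The function names and the sample durations are hypothetical and are not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction; true values must be nonzero."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up durations in minutes, for illustration only.
observed = [20.0, 40.0, 50.0]
predicted = [25.0, 30.0, 50.0]

print(rmse(observed, predicted))
print(mae(observed, predicted))
print(mape(observed, predicted))
```

The three metrics need not rank methods identically: in the table, SVR has the lowest MAPE (0.28) but the highest RMSE (17.33), because RMSE penalizes large absolute errors more heavily while MAPE weights each error by the true duration.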
Figure 7. The error of the proposed prediction model in each time period.
Figure 8. The sum of the squared distances with the number of clusters.
The performance of the phased sequential prediction model in different phases.
| Incident States | RMSE | MAE | MAPE |
|---|---|---|---|
| One-phase | 16.59 | 14.37 | 0.34 |
| Two-phase | 16.33 | 14.03 | 0.33 |
| Three-phase | 15.72 | 13.12 | 0.31 |
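To make the gain from updating variables concrete, the following short sketch computes the relative error reduction between phases from the values in the table. The `relative_reduction` helper and the dictionary layout are hypothetical; the numbers are copied from the table.

```python
# Metric values copied from the phased-prediction table.
PHASE_RESULTS = {
    "One-phase": {"RMSE": 16.59, "MAE": 14.37, "MAPE": 0.34},
    "Two-phase": {"RMSE": 16.33, "MAE": 14.03, "MAPE": 0.33},
    "Three-phase": {"RMSE": 15.72, "MAE": 13.12, "MAPE": 0.31},
}


def relative_reduction(baseline, updated):
    """Percentage reduction of each metric when moving from baseline to updated."""
    return {
        metric: 100.0 * (baseline[metric] - updated[metric]) / baseline[metric]
        for metric in baseline
    }


gain = relative_reduction(PHASE_RESULTS["One-phase"], PHASE_RESULTS["Three-phase"])
for metric, percent in gain.items():
    print(f"{metric}: {percent:.2f}% lower in three-phase than in one-phase")
```

Under this calculation, every metric decreases from one phase to the next, with the larger step occurring between the two-phase and three-phase settings.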