Chen Wang, Lin Liu, Chengcheng Xu, Weitao Lv.
Abstract
The objective of this paper is to predict the future driving risk of crash-involved drivers in Kunshan, China. A systematic machine learning framework is proposed to address three critical technical issues: (1) defining driving risk; (2) developing risky driving factors; and (3) developing a reliable and explicable machine learning model. High-risk (HR) and low-risk (LR) drivers were defined under five different scenarios. A number of features were extracted from seven years of crash/violation records. Drivers' crash/violation information from the prior two years was used to predict their driving risk in the subsequent two years. Using a one-year rolling time window, prediction models were developed for four consecutive time periods: 2013–2014, 2014–2015, 2015–2016, and 2016–2017. Four tree-based ensemble learning techniques were tested: random forest (RF), AdaBoost with decision trees, gradient boosting decision tree (GBDT), and extreme gradient boosting decision tree (XGBoost). A temporal transferability test and a follow-up study were applied to validate the trained models. The best scenario for defining driving risk was multi-dimensional, encompassing crash recurrence, severity, and fault commitment. GBDT appeared to be the best model choice across all time periods, with an acceptable average precision (AP) of 0.68 on the most recent dataset (i.e., 2016–2017). Seven of the nine top features were related to risky driving behaviors, which showed non-linear relationships with driving risk. Model transferability held within relatively short time intervals (1–2 years). An appropriate risk definition, complex violation/crash features, and advanced machine learning techniques need to be considered for the risk prediction task. The proposed machine learning approach is promising and could help safety interventions be launched more effectively.
Keywords: driving risk; machine learning; temporal transferability; traffic violation behavior
Year: 2019 PMID: 30691063 PMCID: PMC6388263 DOI: 10.3390/ijerph16030334
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Figure 1. Data usage for model development with a one-year sliding time window.
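The sliding-window scheme can be sketched in a few lines. This is only an illustration, not the authors' code: the function name `rolling_windows` and its parameters are hypothetical, and the record span of 2011 to 2017 is an assumption inferred from the "seven-year" records and the last prediction period (2016–2017).

```python
def rolling_windows(first_year, last_year, history=2, horizon=2, step=1):
    """Pair each block of `history` feature years with the following
    `horizon` label years, sliding forward by `step` years."""
    windows = []
    start = first_year
    # Stop once the label years would run past the last available year.
    while start + history + horizon - 1 <= last_year:
        feature_years = tuple(range(start, start + history))
        label_years = tuple(range(start + history, start + history + horizon))
        windows.append((feature_years, label_years))
        start += step
    return windows


# Assumption: the seven years of records cover 2011 through 2017.
windows = rolling_windows(2011, 2017)
for feature_years, label_years in windows:
    print("features from", feature_years, "-> risk labels from", label_years)
```

Under that assumption the sketch yields four feature/label pairs, whose label periods match the four prediction periods named in the abstract.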
Figure 2. A general framework of model development and analysis.
Basic statistics of high-risk/non-high-risk drivers.
| | Scenario 1 HR | Scenario 1 NHR | Scenario 2 HR | Scenario 2 NHR | Scenario 3 HR | Scenario 3 NHR | Scenario 4 HR | Scenario 4 NHR | Scenario 5 HR | Scenario 5 NHR |
|---|---|---|---|---|---|---|---|---|---|---|
| | 2520 | 384316 | 2062 | 377214 | 3632 | 376504 | 26442 | 361394 | 39438 | 348398 |
| | 2323 | 384513 | 2052 | 376784 | 3736 | 375600 | 41558 | 346278 | 49432 | 338404 |
| | 2744 | 384092 | 2229 | 375607 | 3903 | 374753 | 44629 | 343207 | 54224 | 333612 |
| | 2825 | 384011 | 2179 | 376057 | 3836 | 374852 | 50332 | 337504 | 58234 | 329602 |
| 387836 | | | | | | | | | | |
HR: High-risk drivers; NHR: Non-high-risk drivers.
Final features extracted from crash/violation records.
| Group | Feature | Variable | Abbreviation |
|---|---|---|---|
| Demographics | Gender | Male Driver | MAD |
| | | Female Driver | FD |
| | Age | Young Driver (age < 30) | YD |
| | | Middle-age Driver (30–65) | MD |
| | | Older Driver (age > 65) | OD |
| | Car Ownership | Private Car Driver | PCD |
| | | Bus Driver | BD |
| | | Large Truck Driver | LTD |
| | | Others | OE |
| | Driving Experience | Driving Experience | DE |
| | Occupation | Farmer | FM |
| | | Students | SD |
| | | Unemployed | UE |
| Crash | Crash Frequency | Cumulative Crash Involvement | CCI |
| | | Cumulative Severe Crash Involvement | CSCI |
| | Fault assignment | At-fault Crash Involvement | ACI |
| | | Severe At-fault Crash Involvement | SACI |
| | Crash Type I | Head-on Crash Involvement | HCI |
| | | Angle Crash Involvement | AGCI |
| | | Sideswipe Crash Involvement | SWCI |
| | | Rear-end Crash Involvement | RECI |
| | | Single Crash Involvement | SCI |
| | Crash Type II | Collide with Pedestrians | CWP |
| | | Collide with Motorcycles | CWM |
| | | Collide with Cyclists | CWC |
| | Intoxication | Drunk Driving/Drug Driving | DD |
| Violation | Violation Frequency | Cumulative Violation Frequency | CVF |
| | | Cumulative Violation Types | CTV |
| | | Cumulative Violation Penalty Point | CVPP |
| | Penalty Points | Maximum One-time Penalty Point | MOPP |
| | | Average Penalty Points per time | APP |
| | Penalty Fee | Cumulative Violation Penalty Fee | CVPF |
| | | Average Violation Penalty Fee per time | AVPF |
| | | Maximum One-time Penalty Fee | MOPF |
| | Dangerous Violation Counts | Red-light Running Violation | RLRV |
| | | Traffic Sign/Markings Violation | TSMV |
| | | Right-of-Way Violation | ROWV |
| | | Speeding Violation over 50% | SV50 |
| | | Speeding Violation over 20–50% | SV20 |
| | | Drunk Driving Violation | DDV |
| | | Driving with Phone Usage | DPU |
| | | Overloading Violation | OV |
| | Time Period | Late Night Violation (0–6) | LNV |
| | | Morning Peak Hour Violation (7–9) | MPHV |
| | | Evening Peak Hour Violation (17–19) | EPHV |
| | | Night Violation (20–24) | NV |
| | Location Type | Total crash > mean, severe crash > mean | TP1 |
| | | Total crash > mean, severe crash < mean | TP2 |
| | | Total crash < mean, severe crash > mean | TP3 |
| | | Total crash < mean, severe crash < mean | TP4 |
Figure 3. Receiver operating characteristic (ROC) curves of all sampling methods for the 2016–2017 period (scenario 3).
Figure 4. Convergence of four machine learning models by GA hyperparameter tuning.
Figure 5. ROC curves for the five labeling scenarios for model D development.
Figure 6. Precision–recall curves of four trained models for the 2016–2017 period.
Confusion matrices of the GBDT model on the testing dataset of the 2016–2017 period at different thresholds.
| Threshold | Observed | Predicted Non-High-Risk | Predicted High-Risk |
|---|---|---|---|
| 0.480 | Non-high-risk | 37312 | 174 |
| 0.480 | High-risk | 120 | 261 |
| 0.584 | Non-high-risk | 37444 | 42 |
| 0.584 | High-risk | 280 | 101 |
| 0.646 | Non-high-risk | 37472 | 14 |
| 0.646 | High-risk | 324 | 57 |
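To make the threshold trade-off in the confusion matrices concrete, the sketch below computes precision and recall for the high-risk class at each threshold. The counts are copied from the table above; the helper `precision_recall` and the dictionary layout are illustrative choices, not part of the original study.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall for the positive (high-risk) class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


# Counts taken from the GBDT confusion matrices (2016-2017 testing dataset).
# tp: observed HR predicted HR, fp: observed NHR predicted HR,
# fn: observed HR predicted NHR, tn: observed NHR predicted NHR.
matrices = {
    0.480: {"tn": 37312, "fp": 174, "fn": 120, "tp": 261},
    0.584: {"tn": 37444, "fp": 42, "fn": 280, "tp": 101},
    0.646: {"tn": 37472, "fp": 14, "fn": 324, "tp": 57},
}

results = {
    threshold: precision_recall(m["tp"], m["fp"], m["fn"])
    for threshold, m in matrices.items()
}

for threshold, (precision, recall) in sorted(results.items()):
    print(f"threshold={threshold:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Raising the threshold from 0.480 to 0.646 increases precision (fewer non-high-risk drivers flagged) while recall falls, since more truly high-risk drivers are missed.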
Figure 7. Feature importance (Gini index).
Figure 8. Partial dependence of top 9 important features on risk pattern.
Figure 9. Temporal transferability tests of four trained models.
Figure 10. Temporal transferability results.
Relative risk of predicted CP over NCP between 2018.1 and 2018.6.
| Outcome | Predicted Group | Predicted # of Drivers | Total Observation | Relative Risk (HR/NHR) |
|---|---|---|---|---|
| Total Crash Counts | HR | 2899 | 421 | 4.57 |
| | NHR | 384,937 | 12,217 | |
| Total major/full fault assignment | HR | 2899 | 326 | 12.49 |
| | NHR | 384,937 | 3465 | |
| Severe crash involvement | HR | 2899 | 36 | 20.97 |
| | NHR | 384,937 | 228 | |
| Severe crash with major/full fault | HR | 2899 | 26 | 62.77 |
| | NHR | 384,937 | 55 | |
| Total Property Damage Estimated | HR | 2899 | 866,546 RMB | 9.87 |
| | NHR | 384,937 | 1,467,298 RMB | |
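The relative risks for the count-based rows can be reproduced, to within rounding, as the ratio of per-driver event rates in the two predicted groups. The sketch below assumes that definition (events per predicted driver in HR divided by the same rate in NHR); the function name `relative_risk` is illustrative. The property-damage row is left out because the source does not state how that ratio was normalized.

```python
def relative_risk(events_hr, n_hr, events_nhr, n_nhr):
    """Ratio of per-driver event rates: HR group over NHR group."""
    rate_hr = events_hr / n_hr
    rate_nhr = events_nhr / n_nhr
    return rate_hr / rate_nhr


# Group sizes and observed counts copied from the follow-up table.
N_HR, N_NHR = 2899, 384937
observed = {
    "Total crash counts": (421, 12217),
    "Total major/full fault assignment": (326, 3465),
    "Severe crash involvement": (36, 228),
    "Severe crash with major/full fault": (26, 55),
}

rr = {
    outcome: relative_risk(events_hr, N_HR, events_nhr, N_NHR)
    for outcome, (events_hr, events_nhr) in observed.items()
}

for outcome, value in rr.items():
    print(f"{outcome}: {value:.2f}")
```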