| Literature DB >> 30856396 |
Matthias Schlögl1, Rainer Stütz2, Gregor Laaha3, Michael Melcher3.
Abstract
One of the main aims of accident data analysis is to derive the determining factors associated with road traffic accident occurrence. While current studies mainly use variants of count data regression to achieve this aim, the problem can also be considered as a binary classification task, with the dichotomous target variable indicating events (accidents) and non-events (no accidents). The effects of 45 variables - describing road condition and geometry, traffic volume and regulations, weather, and accident time - are analyzed using a dataset in high temporal (1 h) and spatial (250 m) resolution, covering the whole highway network of Austria over the period of four consecutive years. A combination of synthetic minority oversampling and maximum dissimilarity undersampling is used to balance the training dataset. We employ and compare a series of statistical learning techniques with respect to their predictive performance and discuss the importance of determining factors of accident occurrence from the ensemble of models. Findings substantiate that a trade-off between accuracy and sensitivity is inherent to imbalanced classification problems. Results show satisfying performance of tree-based methods which exhibit accuracies between 75% and 90% while exhibiting sensitivities between 30% and 50%. Overall, this analysis emphasizes the merits of using high-resolution data in the context of accident analysis.Keywords: Accident analysis; Binary classification; Imbalanced data; Road safety; Statistical learning
Mesh:
Year: 2019 PMID: 30856396 DOI: 10.1016/j.aap.2019.02.008
Source DB: PubMed Journal: Accid Anal Prev ISSN: 0001-4575