ChienHsing Wu, Shu-Chen Kao, Chia-Chen Chang.
Abstract
Extracting knowledge from open data on traffic accidents has attracted increasing attention from policymakers responsible for road safety. This article presents a knowledge elicitation approach to exploring the determinants of traffic accidents using open government data from an urban area in Taiwan. The collected open dataset contains 34 decisional attributes, one predictive attribute (type of injury, e.g., head, breast, leg), and 47,974 cases. Prediction models built with a classification-oriented mechanism, and the rules they generated, were compared for datasets from before (B-dataset; 30,116 cases) and after (A-dataset; 17,868 cases) the start of the Covid-19 pandemic response in the area. Prediction accuracy was acceptable but not high, at 70.73% for B-dataset and 74.77% for A-dataset. Determinants in the human and vehicle categories had higher classification ranks than those in the temporal and environmental categories. Traffic accidents involving motorcycles were 5.13% higher in A-dataset, whereas those involving cars were 4.11% lower. Leg or foot injuries were 3.46% higher in A-dataset, whereas other types of injury were up to 1.00% lower. The average support for rules in the A-dataset rule base and the simplicity of the A-dataset decision tree were higher than those of B-dataset. The research demonstrates the value of open government data in prediction model development and knowledge elicitation to support policymaking in the traffic safety domain.
Keywords: Covid-19 pandemic; Knowledge elicitation; Open government data; Traffic accidents
Year: 2022 PMID: 36032187 PMCID: PMC9398789 DOI: 10.1016/j.heliyon.2022.e10302
Source DB: PubMed Journal: Heliyon ISSN: 2405-8440
Research design.
| Feature | Description |
|---|---|
| Objectives | Disclose determinants of traffic accidents using a data mining approach; discover knowledge for traffic accident prediction; compare traffic accidents before and after the outbreak of the Covid-19 pandemic regarding the main determinants of traffic accidents, the vehicle types (car, bus, motorcycle), the injury types (e.g., head, breast, leg), and the decision rules extracted |
| Open dataset collection | Collect open government data on vehicle accidents in Taoyuan City from January 1, 2017 to June 30, 2021; the original dataset comprises 34 decisional attributes (e.g., weather, city road, speed limit, collision point, age) and one predictive attribute (injury); the original dataset comprises 47,974 cases with injury labels (e.g., head, breast, leg) |
| Data pre-processing | Granulate continuous attributes using the equal-width interval technique; group decisional attributes into four categories: temporal (5 attributes), environmental (15 attributes), human (12 attributes), and vehicle (2 attributes); divide the original dataset into two subsets (B-dataset and A-dataset) according to the critical date of the impact of Covid-19 in Taiwan (January 24, 2020) |
| Attribute dimension reduction | Reduce attribute dimensions based on the CP using the C4.5 algorithm; remove inconsistent granulated data based on the 60% consistency acceptance level |
| Mining mechanism | Conduct data mining with a tree-based algorithm (C4.5) with size of leaves equal to 2; consider the entire consistent (cleaned) granulated dataset |
| Training and testing | Use 70% of the cases for training and 30% for testing; obtain prediction accuracy |
| Knowledge discovered | Obtain decision rules with depth, support, and reliability; present findings and discuss implications of generated rules with high levels of support (e.g., 50) |
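
The granulation step in the pre-processing row can be illustrated with a minimal sketch of equal-width binning. The label style `B3of10` mirrors the age codes that appear in the rule extracts, but the function name, the value range, the number of bins, and the treatment of bin edges are all assumptions made for illustration; the source does not specify them.

```python
def equal_width_bin(value, lo, hi, n_bins=10):
    """Map a continuous value in [lo, hi] to an equal-width granule label.

    Hypothetical helper: the range, bin count, and edge handling are
    assumptions, not settings taken from the study.
    """
    if not lo <= value <= hi:
        raise ValueError("value lies outside [lo, hi]")
    width = (hi - lo) / n_bins
    # The upper edge belongs to the last bin, so clamp the index.
    index = min(int((value - lo) / width), n_bins - 1)
    return f"B{index + 1}of{n_bins}"


# Assumed age range of 0 to 100 years, split into 10 granules.
for age in (5, 25, 100):
    print(age, equal_width_bin(age, 0, 100))
```

Every granule spans the same width here, so sparsely populated ranges still get their own label; that is the usual trade-off of equal-width against equal-frequency binning.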
Ranks of CP of decisional attributes for B-dataset and A-dataset.
| Category and attribute code | Attribute name | C4.5 rank (B-dataset) | C4.5 rank (A-dataset) |
|---|---|---|---|
| Temporal | | | |
| X01 | Year | 33 | 34 |
| X02 | Month | 30 | 24 |
| X03 | Day | 34 | 33 |
| X04 | Week | 32 | 32 |
| X05 | Hour | 26 | 29 |
| Environmental | |||
| X06 | Weather (e.g., rain, storm, cloudy) | 31 | 30 |
| X07 | Light (e.g., dawn, dusk) | 25 | 27 |
| X08 | Road category (e.g., city road, country road) | 14 | 21 |
| X09 | Road type (e.g., railway, bridge, multiple intersection, overpass) | 18 | 14 |
| X10 | Speed limit (e.g., 50 km/h) | 17 | 12 |
| X11 | Road condition (e.g., railroad crossing, single lane) | 22 | 19 |
| X12 | Accident location (e.g., intersection, road section) | 19 | 18 |
| X13 | Accident site (e.g., left turn waiting zone, U-turn lane) | 15 | 15 |
| X14 | Road pavement (e.g., concrete, gravel) | 9 | 5 |
| X15 | Road pavement condition (e.g., snow or ice, wet, slippery) | 29 | 26 |
| X16 | Road pavement defect (e.g., bumpy, soft surface) | 13 | 7 |
| X17 | Road obstacles (e.g., under construction, parked vehicle) | 21 | 22 |
| X18 | Line of sight (e.g., curve, slope) | 24 | 23 |
| X19 | Signal type (e.g., traffic light, flashing signal) | 28 | 31 |
| X20 | Signal condition (e.g., normal, abnormal) | 27 | 28 |
| Human | |||
| X21 | Nationality (e.g., Taiwan, non-Taiwan) | 23 | 25 |
| X22 | Gender (e.g., male, female) | 16 | 16 |
| X23 | Age | 12 | 13 |
| X24 | Occupation (e.g., business man, affair worker) | 11 | 17 |
| X25 | Travel purpose (e.g., for work, for school) | 20 | 20 |
| X26 | Behavioral condition (e.g., parking, left-turning) | 10 | 11 |
| X27 | License status (e.g., legal, suspended) | 8 | 10 |
| X28 | License type (e.g., professional, regular) | 2 | 2 |
| X29 | License vehicle type (e.g., trailer, car, heavy, motorcycle) | 3 | 3 |
| X30 | Alcohol (e.g., zero or less than 0.15 mg/L, between 0.16 and 0.25) | 5 | 6 |
| X31 | Device use (e.g., cellphone) | 7 | 9 |
| X32 | Safety device use (e.g., helmet, no seatbelt) | 6 | 8 |
| Vehicle | |||
| X33 | Vehicle type (e.g., bus, car, motorcycle) | 1 | 1 |
| X34 | Collision point (e.g., front end, left side) | 4 | 4 |
Figure 1. Ranks of attributes for B-dataset (before pandemic response) and A-dataset (after pandemic response).
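
The dataset names ending in `top15` suggest that only the 15 best-ranked attributes were retained for each period. The sketch below, which assumes that reading, takes the ranks from the table above (listed in the order X01 to X34) and shows which attributes enter the top 15 in one period but not the other. The helper name `top_k` is hypothetical.

```python
# Ranks from the table above, in attribute order X01 ... X34.
B_RANKS = [33, 30, 34, 32, 26, 31, 25, 14, 18, 17, 22, 19, 15, 9, 29, 13, 21,
           24, 28, 27, 23, 16, 12, 11, 20, 10, 8, 2, 3, 5, 7, 6, 1, 4]
A_RANKS = [34, 24, 33, 32, 29, 30, 27, 21, 14, 12, 19, 18, 15, 5, 26, 7, 22,
           23, 31, 28, 25, 16, 13, 17, 20, 11, 10, 2, 3, 6, 9, 8, 1, 4]


def top_k(ranks, k=15):
    """Return the k best-ranked attribute codes, best first."""
    codes = [f"X{i:02d}" for i in range(1, len(ranks) + 1)]
    return [code for _, code in sorted(zip(ranks, codes))[:k]]


top_b = top_k(B_RANKS)
top_a = top_k(A_RANKS)
print("B top 15:", top_b)
print("A top 15:", top_a)
print("Only in B:", sorted(set(top_b) - set(top_a)))
print("Only in A:", sorted(set(top_a) - set(top_b)))
```

Under this assumption, occupation (X24) and road category (X08) are retained only before the pandemic response, while speed limit (X10) and road type (X09) are retained only after it.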
Removal of inconsistent data and prediction accuracy.
| Cleaning stage | Dataset | Dataset size | Prediction accuracy |
|---|---|---|---|
| Before cleaning | B-dataset-top15 | 30,116 | 55.52% |
| Before cleaning | A-dataset-top15 | 17,868 | 55.30% |
| After cleaning | B-dataset-top15-clean | 21,490 | 70.73% |
| After cleaning | A-dataset-top15-clean | 12,065 | 74.77% |
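
The source does not spell out how the 60% consistency acceptance level is applied. One plausible reading, sketched below purely as an assumption, is that cases sharing identical granulated attribute values form a group, and a group is kept only when its most frequent injury label covers at least 60% of the group. The function name and the toy cases are hypothetical.

```python
from collections import Counter, defaultdict


def remove_inconsistent(cases, threshold=0.60):
    """Keep groups of identical attribute tuples whose majority label
    share is at least `threshold`.

    Assumed interpretation of the consistency check, not the authors'
    documented procedure.
    """
    groups = defaultdict(list)
    for attributes, label in cases:
        groups[attributes].append(label)

    kept = []
    for attributes, labels in groups.items():
        majority_count = Counter(labels).most_common(1)[0][1]
        if majority_count / len(labels) >= threshold:
            kept.extend((attributes, label) for label in labels)
    return kept


# Toy data: the first group is 75% "leg", the second is split 50/50.
cases = [
    (("motorcycle", "front end"), "leg"),
    (("motorcycle", "front end"), "leg"),
    (("motorcycle", "front end"), "leg"),
    (("motorcycle", "front end"), "head"),
    (("car", "left side"), "head"),
    (("car", "left side"), "neck"),
]
kept = remove_inconsistent(cases)
print(len(cases), "cases before,", len(kept), "after")
```

A stricter variant would also drop the minority cases inside a retained group; which variant matches the study is not stated in the text.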
Relative frequency of vehicle types and injury types for B-dataset (before pandemic response) and A-dataset (after pandemic response).
| Vehicle type (VT) | B count | B relative frequency | A count | A relative frequency | Difference (A − B) |
|---|---|---|---|---|---|
| VT1 (bus) | 1699 | 7.91% | 831 | 6.89% | −1.02% |
| VT2 (car) | 16286 | 75.78% | 8647 | 71.67% | −4.11% |
| VT3 (motorcycle) | 3505 | 16.31% | 2587 | 21.44% | 5.13% |
| Total | 21490 | | 12065 | | |
| Injury type (IT) | |||||
| IT1 (head) | 1727 | 8.04% | 691 | 5.73% | −2.31% |
| IT2 (neck) | 209 | 0.97% | 121 | 1.00% | 0.03% |
| IT3 (breast) | 380 | 1.77% | 197 | 1.63% | −0.14% |
| IT4 (abdomen) | 95 | 0.44% | 55 | 0.46% | 0.01% |
| IT5 (waist) | 302 | 1.41% | 155 | 1.28% | −0.12% |
| IT6 (back) | 162 | 0.75% | 90 | 0.75% | −0.01% |
| IT7 (hand/wrist) | 3120 | 14.52% | 1639 | 13.58% | −0.93% |
| IT8 (leg/foot) | 15495 | 72.10% | 9117 | 75.57% | 3.46% |
| Total | 21490 | | 12065 | | |
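
The relative frequencies and differences in this table follow directly from the counts. A short sketch, using the vehicle-type counts above, shows the arithmetic; the difference is the A share minus the B share, so it is expressed in percentage points. The function name is an illustrative choice.

```python
def relative_frequencies(counts):
    """Convert raw counts to percentages of the column total."""
    total = sum(counts.values())
    return {key: 100 * value / total for key, value in counts.items()}


# Vehicle-type counts from the table (cleaned B- and A-datasets).
b_counts = {"bus": 1699, "car": 16286, "motorcycle": 3505}
a_counts = {"bus": 831, "car": 8647, "motorcycle": 2587}

b_freq = relative_frequencies(b_counts)
a_freq = relative_frequencies(a_counts)

for vehicle in b_counts:
    diff = a_freq[vehicle] - b_freq[vehicle]
    print(f"{vehicle}: B {b_freq[vehicle]:.2f}%, "
          f"A {a_freq[vehicle]:.2f}%, difference {diff:+.2f} points")
```

The same function applies to the injury-type rows, since both blocks share the totals of 21,490 and 12,065 cases.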
Main features of B-Rulebase and A-Rulebase
| Feature | B-Rulebase | A-Rulebase | Note |
|---|---|---|---|
| Dataset size | 21,490 | 12,065 | |
| Rules generated | 9,622 | 4,697 | |
| Mean support | 2.2334 | 2.5687 | A > B |
| Mean simplicity | 0.3799 | 0.4613 | A > B |
| No. of highly supported rules | 31 | 21 | Support ≥ 50 |
| Top 5 support values | 740, 453, 219, 218, 178 | 599, 377, 173, 153, 132 | |
Figure 2. Support and simplicity for B-Rulebase and A-Rulebase.
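
The reported mean support values coincide with the dataset size divided by the number of rules, which would follow if every cleaned case is covered by exactly one rule. That relationship is inferred from the numbers in the table rather than stated in the text; the sketch below only checks the arithmetic.

```python
def mean_support(n_cases, n_rules):
    """Average number of cases per rule, assuming each case matches one rule."""
    return n_cases / n_rules


# (dataset size, rules generated) from the table of rule-base features.
rulebases = {
    "B-Rulebase": (21490, 9622),
    "A-Rulebase": (12065, 4697),
}

for name, (n_cases, n_rules) in rulebases.items():
    print(f"{name}: mean support = {mean_support(n_cases, n_rules):.4f}")
```

Seen this way, the higher mean support of A-Rulebase reflects fewer rules relative to its size, not a larger dataset.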
Extract from B-Rulebase (support >50).
| RID | Rule conditions (ordered as X33, X28, X29, X34, X30, X32, X31, X27, X14, X26, X24, X23, X16, X08, X13; unused attributes omitted) | Class | Dep. | Sup. | Rel. | Class distribution | Simplicity |
|---|---|---|---|---|---|---|---|
| 1752 | DLS10, VHS11, DAS2, PEC1, FDC1, SLS1, RS1, SBC9, OC21, RSD4, RC5, AS9 | MW8 | 12 | 740 | 1 | 0,0,0,0,0,0,0,740 | 61.6667 |
| 1585 | DLS10, VHS11, DAS2, SLS1, SBC9, OC21, B3of10, RSD4, RC5, AS1 | MW8 | 10 | 453 | 1 | 0,0,0,0,0,0,0,453 | 45.3000 |
| 1588 | DLS10, VHS11, DAS2, FDC1, SBC9, OC21, B4of10, RC5, AS1 | MW8 | 9 | 219 | 1 | 0,0,0,0,0,0,0,219 | 24.3333 |
| 1598 | DLS10, VHS11, DAS2, SBC9, OC21, B2of10, RC5, AS1 | MW8 | 8 | 218 | 1 | 0,0,0,0,0,0,0,218 | 27.2500 |
| 5737 | DLS10, VHS14, DAS2, SBC9, OC21, B3of10, AS1 | MW8 | 7 | 178 | 1 | 0,0,0,0,0,0,0,178 | 25.4286 |
| 1683 | DLS10, VHS11, DAS2, RS1, SBC9, OC21, RC6, AS2 | MW8 | 8 | 150 | 1 | 0,0,0,0,0,0,0,150 | 18.7500 |
| 3529 | VHS12, DAS2, PEC1, SLS1, SBC9, OC21, B3of10, AS1 | MW8 | 8 | 127 | 1 | 0,0,0,0,0,0,0,127 | 15.8750 |
| 1591 | DLS10, VHS11, DAS2, PEC1, SLS1, SBC9, OC21, B5of10, RC5, AS1 | MW8 | 10 | 114 | 1 | 0,0,0,0,0,0,0,114 | 11.4000 |
| 2683 | VHS11, OC22, RC7, AS1 | MW8 | 4 | 114 | 1 | 0,0,0,0,0,0,0,114 | 28.5000 |
| 1672 | DLS10, VHS11, DAS2, SLS1, RS1, SBC9, OC21, B3of10, RSD4, RC5, AS2 | MW8 | 11 | 112 | 1 | 0,0,0,0,0,0,0,112 | 10.1818 |
| 3902 | DLS10, VHS12, RS1, SBC9, OC21, B3of10, RSD4, AS9 | MW8 | 8 | 99 | 1 | 0,0,0,0,0,0,0,99 | 12.3750 |
| 5738 | DLS10, VHS14, DAS2, SBC9, OC21, B4of10, AS1 | MW8 | 7 | 93 | 1 | 0,0,0,0,0,0,0,93 | 13.2857 |
| 2430 | DLS10, VHS11, PEC1, SBC9, OC22, B3of10, RC5, AS1 | MW8 | 8 | 92 | 1 | 0,0,0,0,0,0,0,92 | 11.5000 |
| 2772 | VHS11, OC4, B3of10, RSD4, RC5, AS1 | MW8 | 6 | 91 | 1 | 0,0,0,0,0,0,0,91 | 15.1667 |
| 3538 | VHS12, DAS2, SBC9, OC21, B4of10, RC5, AS1 | MW8 | 7 | 81 | 1 | 0,0,0,0,0,0,0,81 | 11.5714 |
| 5799 | DLS10, VHS14, DAS2, PEC1, SLS1, SBC9, OC21, B3of10, RSD4, RC5, AS9 | MW8 | 11 | 79 | 1 | 0,0,0,0,0,0,0,79 | 7.1818 |
| 4774 | DLS10, VHS13, SBC9, OC21, B3of10, RC5 | MW8 | 6 | 74 | 1 | 0,0,0,0,0,0,0,74 | 12.3333 |
| 1720 | DLS10, VHS11, DAS2, PEC1, SLS1, SBC9, OC21, B3of10, AS8 | MW8 | 9 | 73 | 1 | 0,0,0,0,0,0,0,73 | 8.1111 |
RID: Rule identifier, Dep.: Depth, Sup.: Support, Rel.: Reliability.
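
The support, reliability, and simplicity columns appear to be derivable from each rule's class distribution and depth: in every row of the extract, support equals the total of the class distribution and simplicity equals support divided by depth. Reliability is sketched here as the share of the majority class, which matches the value 1 shown for these single-class rules. These formulas are inferred from the table values and are an assumption, not definitions quoted from the source.

```python
def rule_metrics(class_distribution, depth):
    """Derive rule statistics from per-class case counts and rule depth.

    Inferred formulas (assumptions): support = total cases covered,
    reliability = majority-class share, simplicity = support / depth.
    """
    support = sum(class_distribution)
    if support == 0 or depth <= 0:
        raise ValueError("rule must cover at least one case and have depth > 0")
    majority = max(class_distribution)
    return {
        # 1-based class position; position 8 corresponds to IT8 (leg/foot).
        "predicted_class": class_distribution.index(majority) + 1,
        "support": support,
        "reliability": majority / support,
        "simplicity": support / depth,
    }


# Rule 1752 from the B-Rulebase extract: depth 12, all 740 cases in class 8.
metrics = rule_metrics([0, 0, 0, 0, 0, 0, 0, 740], depth=12)
print(metrics)
```

Because simplicity divides support by depth, a rule scores higher when it covers many cases with few conditions.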
Extract from A-Rulebase (support >50).
| RID | Rule conditions (ordered as X33, X28, X29, X34, X14, X30, X16, X32, X31, X27, X26, X10, X23, X09, X13; unused attributes omitted) | Class | Dep. | Sup. | Rel. | Class distribution | Simplicity |
|---|---|---|---|---|---|---|---|
| 915 | DLS10, VHS11, DAS2, RSD4, FDC1, SLS1, SBC9, SL50, AS9 | MW8 | 9 | 599 | 1 | 0,0,0,0,0,0,0,599 | 66.5556 |
| 834 | DLS10, VHS11, DAS2, RSD4, SBC9, SL50, B3of10, AS1 | MW8 | 8 | 377 | 1 | 0,0,0,0,0,0,0,377 | 47.1250 |
| 835 | DLS10, RS1, DAS2, SLS1, SBC9, SL50, B4of10, AS1 | MW8 | 9 | 173 | 1 | 0,0,0,0,0,0,0,173 | 19.2222 |
| 2540 | DLS10, VHS14, SBC9, B3of10, AS1 | MW8 | 5 | 153 | 1 | 0,0,0,0,0,0,0,153 | 30.6000 |
| 864 | DLS10, VHS11, DAS2, PEC1, FDC1, SBC9, SL50, B3of10, AS2 | MW8 | 9 | 132 | 1 | 0,0,0,0,0,0,0,132 | 14.6667 |
| 527 | DLS10, VHS11, RS1, DAS2, PEC1, FDC1, SBC9, SL40, AS1 | MW8 | 9 | 127 | 1 | 0,0,0,0,0,0,0,127 | 14.1111 |
| 842 | DLS10, VHS11, DAS2, SBC9, SL50, B2of10, AS1 | MW8 | 7 | 122 | 1 | 0,0,0,0,0,0,0,122 | 17.4286 |
| 838 | DLS10, VHS11, DAS2, SLS1, SBC9, SL50, B5of10, AS1 | MW8 | 8 | 115 | 1 | 0,0,0,0,0,0,0,115 | 14.3750 |
| 1421 | DLS10, VHS12, FDC1, SBC5, SL50, AS1 | MW8 | 6 | 96 | 1 | 0,0,0,0,0,0,0,96 | 16.0000 |
| 750 | DLS10, VHS11, FDC1, SBC5, SL50, AS1 | MW8 | 6 | 91 | 1 | 0,0,0,0,0,0,0,91 | 15.1667 |
| 1609 | VHS12, DAS2, SBC9, SL50, B3of10, AS1 | MW8 | 6 | 83 | 1 | 0,0,0,0,0,0,0,83 | 13.8333 |
| 2509 | DLS10, VHS14, FDC1, SBC9, SL50, B4of10, AS1 | MW8 | 7 | 69 | 1 | 0,0,0,0,0,0,0,69 | 9.8571 |
| 869 | DLS10, VHS11, RS1, DAS2, SBC9, SL50, B4of10, AS2 | MW8 | 8 | 69 | 1 | 0,0,0,0,0,0,0,69 | 8.6250 |
| 2648 | DLS10, VHS14, DAS2, SBC9, SL50, B3of10, RT3, AS9 | MW8 | 8 | 69 | 1 | 0,0,0,0,0,0,0,69 | 8.6250 |
| 2366 | VHS14, DAS2, SLS1, SBC5, SL50, B3of10 | MW8 | 6 | 68 | 1 | 0,0,0,0,0,0,0,68 | 11.3333 |
| 402 | DLS10, VHS11, DAS2, FDC1, SBC9, SL30, AS9 | MW8 | 7 | 68 | 1 | 0,0,0,0,0,0,0,68 | 9.7143 |
| 1753 | DLS10, VHS12, DAS2, RSD4, SBC9, SL50, B3of10, RT3, AS9 | MW8 | 9 | 68 | 1 | 0,0,0,0,0,0,0,68 | 7.5556 |
| 561 | DLS10, VHS11, DAS2, FDC1, SBC9, SL40, B3of10, AS9 | MW8 | 8 | 63 | 1 | 0,0,0,0,0,0,0,63 | 7.8750 |
RID: Rule identifier, Dep.: Depth, Sup.: Support, Rel.: Reliability.
Extracts from B-Rulebase and A-Rulebase with a support value more than 50.
| Attribute | B-RB | Values in B | Values in A | A-RB |
|---|---|---|---|---|
| Predictive attribute | V | {leg or foot} | {leg or foot} | V |
| Decisional attributes | | | | |
| Collision point | V | {front end, right side} | {front end, right side} | V |
| Driver license | V | {motorcycle license} | {motorcycle license} | V |
| Occupation | V | {unknown} | | |
| Alcohol test | V | {pass} | {pass} | V |
| Age | V | {from 11 to 30} | {from 21 to 30} | V |
| Behavior condition | V | {moving forward} | {moving forward} | V |
| Safety device use | V | {helmet or fasten seatbelt} | {helmet or fasten seatbelt} | V |
| Accident site | V | {regular lane, on intersection} | {regular lane, near intersection, on intersection} | V |
| License status | V | {legal} | {legal} | V |
| Road category | V | {city road} | | |
| Device use (mobile phone) | V | {no} | | |
| Road pavement | V | {asphalt} | | |
| Road pavement defect | V | {None} | | |
| Speed limit | | | {50} | V |
| Road pavement condition | | | {wet} | V |
B-RB: B-Rulebase, A-RB: A-Rulebase