Anurag Yedla, Fatemeh Davoudi Kakhki, Ali Jannesari.
Abstract
Mining is known to be one of the most hazardous occupations in the world. Many serious mining accidents have occurred worldwide over the years. Although there have been efforts to create a safer work environment for miners, the number of accidents occurring at mining sites is still significant. Machine learning techniques and predictive analytics are becoming leading resources for creating safer work environments in the manufacturing and construction industries. These techniques are leveraged to generate actionable insights that improve decision-making. A large amount of mining safety-related data is available, and machine learning algorithms can be used to analyze it, which can significantly benefit the mining industry. Decision tree, random forest, and artificial neural network models were implemented to analyze the outcomes of mining accidents. These machine learning models were also used to predict days away from work. An accident dataset provided by the Mine Safety and Health Administration was used to train the models. The models were trained separately on tabular data and narratives. A synthetic data augmentation technique using word embedding was also investigated to tackle the data imbalance problem. The performance of all the models was compared with that of a traditional logistic regression model. The results show that models trained on narratives performed better than models trained on structured/tabular data in predicting the outcome of the accident. The higher predictive power of the models trained on narratives led to the conclusion that the narratives contain additional information relevant to the injury outcome compared to the tabular entries. When predicting days away from work, the models trained on tabular data had a lower mean squared error than the models trained on narratives.
The results highlight the importance of predictors such as shift start time, accident time, and mining experience in predicting the days away from work. The F1 score of all underrepresented classes except one improved after the data augmentation technique was applied. This approach gave greater insight into the factors influencing the outcome of the accident and days away from work.
Keywords: machine learning; neural networks; word embedding
Year: 2020 PMID: 32992459 PMCID: PMC7579604 DOI: 10.3390/ijerph17197054
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Number of records in each target class before and after synthetic augmentation.
| Target Class | Count before Augmentation | Count after Augmentation |
|---|---|---|
| Class1: All Other Cases (Including 1st Aid) | 676 | 7564 |
| Class2: Days Away From Work Only | 31,653 | 31,653 |
| Class3: Days Restricted Activity Only | 16,633 | 16,633 |
| Class4: Days Away From Work & Restricted Activity | 10,025 | 10,025 |
| Class5: Fatality | 336 | 3842 |
| Class6: Injuries due to Natural Causes | 444 | 2785 |
| Class7: No Days Away From Work, No Restricted Activity | 27,627 | 27,627 |
| Class8: Occupational Illness not DEG 1–6 | 1346 | 9676 |
| Class9: Permanent Total or Permanent Partial Disability | 895 | 12,796 |
Figure 1. An example of one-hot encoding.
Figure 2. An example of encoding based on target statistics.
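The two encodings named in Figures 1 and 2 can be illustrated with a minimal sketch. The toy data and the helper names `one_hot_encode` and `target_mean_encode` are assumptions made for illustration; in particular, the plain per-category mean used here is only one form of target statistic and may differ from the variant used in the study.

```python
from collections import defaultdict

def one_hot_encode(values):
    """Map each categorical value to a 0/1 indicator vector."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        encoded.append(row)
    return categories, encoded

def target_mean_encode(values, targets):
    """Replace each category with the mean target observed for that category."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, target in zip(values, targets):
        sums[value] += target
        counts[value] += 1
    means = {value: sums[value] / counts[value] for value in sums}
    return [means[value] for value in values]

# Hypothetical toy records: mine type and days away from work.
mine_type = ["Coal", "Metal", "Coal", "Metal", "Coal"]
days_away = [10, 2, 20, 4, 30]

categories, one_hot = one_hot_encode(mine_type)
target_encoded = target_mean_encode(mine_type, days_away)

print(categories)
print(one_hot)
print(target_encoded)
```

One-hot encoding adds one column per category, whereas target-based encoding keeps a single numeric column, which matters for high-cardinality fields. In practice the target statistics should be computed on training data only, to avoid leaking the target into the features.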
Figure 3. Converting each word to a vector of length 300.
Figure 4. Vector representation of narratives.
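A common way to turn per-word vectors into a single narrative vector is to average them. The sketch below assumes that approach; the record does not state how the narrative vectors were built. The three-dimensional `toy_embeddings` dictionary stands in for a pretrained embedding with vectors of length 300, and `narrative_vector` is a hypothetical helper name.

```python
def narrative_vector(narrative, embeddings, dim):
    """Average the vectors of all known words in a narrative.

    Words missing from the embedding table are skipped; a narrative with
    no known words maps to the zero vector.
    """
    tokens = narrative.lower().split()
    known = [embeddings[token] for token in tokens if token in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

# Hypothetical toy embedding table (real word embeddings would have dim=300).
toy_embeddings = {
    "employee": [1.0, 0.0, 0.0],
    "slipped": [0.0, 1.0, 0.0],
    "ladder": [0.0, 0.0, 1.0],
}

vec = narrative_vector("Employee slipped on ladder", toy_embeddings, dim=3)
empty_vec = narrative_vector("unknown words only", toy_embeddings, dim=3)
print(vec)
```

Averaging yields a fixed-length input regardless of narrative length, at the cost of discarding word order.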
Figure 5. A simple confusion matrix.
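Accuracy and per-class F1 scores can both be derived from a confusion matrix. The following sketch uses a hypothetical two-class matrix, not data from the study, to show how a rare class can score poorly on F1 while overall accuracy stays high.

```python
def accuracy(confusion):
    """Share of records on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total

def per_class_f1(confusion):
    """F1 per class; confusion[i][j] counts true class i predicted as class j."""
    n = len(confusion)
    scores = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp
        fn = sum(confusion[k]) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return scores

# Hypothetical imbalanced example: 100 majority-class and 10 minority-class records.
toy_confusion = [
    [90, 10],  # true class 0
    [5, 5],    # true class 1
]

acc = accuracy(toy_confusion)
f1_scores = per_class_f1(toy_confusion)
macro_f1 = sum(f1_scores) / len(f1_scores)
print(acc, f1_scores, macro_f1)
```

Here the accuracy is about 0.86 while the minority-class F1 is 0.4, which is why F1 is usually reported alongside accuracy for imbalanced targets.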
Accuracy and F1 score for all the models (fixed field entries).
| Model | F1 Score | Accuracy |
|---|---|---|
| Logistic regression | 0.64 | 67% |
| Decision Tree | 0.58 | 58% |
| Random Forest | 0.66 | 66% |
| Artificial Neural Network | 0.67 | 78% |
Accuracy and F1 score for all the models (imbalanced narratives).
| Model | F1 Score | Accuracy |
|---|---|---|
| Random Forest | 0.93 | 93% |
| Artificial Neural Network | 0.60 | 92% |
Figure 6. Confusion matrix for random forest trained on injury narratives.
Figure 7. F1 score of artificial neural networks on unbalanced and augmented narratives.
MSE and RMSE for all the models.
| Model | Input | MSE | RMSE |
|---|---|---|---|
| Random forest | Fixed Field Entries | 14.65 | 3.82 |
| Random forest | Injury Narratives | 1502.61 | 38.76 |
| Artificial neural network | Fixed Field Entries | 0.38 | 0.62 |
| Artificial neural network | Injury Narratives | 5944.74 | 77.10 |
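For reference, MSE is the mean of the squared prediction errors and RMSE is its square root, so RMSE is expressed in the same unit as the target (here, days). The sketch below uses hypothetical values for `actual` and `predicted`; only the final line reuses a figure from the table, to show how the two columns relate.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target."""
    return math.sqrt(mse(y_true, y_pred))

# Hypothetical days-away-from-work values, not taken from the study.
actual = [0, 5, 10, 30]
predicted = [2, 5, 7, 25]

toy_mse = mse(actual, predicted)
toy_rmse = rmse(actual, predicted)
print(toy_mse, toy_rmse)

# Relation between the table columns: RMSE is the square root of MSE.
table_rmse = round(math.sqrt(1502.61), 2)
print(table_rmse)
```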
Predictor variables and their descriptions, in descending order of importance.
| Feature | Description |
|---|---|
| Nature of Injury | Identifies the injury in terms of its principal physical characteristics. |
| Injured body part | Identifies the body part affected by an injury. |
| Occupation | Regular job title of the accident victim. |
| Coal or Metal | Identifies if the accident occurred at a Coal or Metal/Non-Metal mine. |
| Job Experience | Experience of the person affected in the job title, calculated in decimal years. |
| Hours | Time difference between accident time and shift begin time in hours. |
| Injury Source | Identifies the object, substance, exposure, or bodily motion that directly produced or inflicted the injury. |
| Classification | Identifies the circumstances which contributed most directly to the resulting accident. |
| Activity | Specific activity the accident victim was performing at the time of the incident. |
| Accident type | Identifies the event which directly resulted in the injury/accident. |
| Sub-unit | The Sub-unit of the mining site where the accident occurred. |
| Mine experience | Total experience at a specific mine of the person affected calculated in decimal years. |
| Total experience | Total mining experience of the person affected calculated in decimal years. |