Classifying injury narratives of large administrative databases for surveillance-A practical approach combining machine learning ensembles and human review.

Helen R Marucci-Wellman, Helen L Corns, Mark R Lehto.

Abstract

Injury narratives are now available in real time and include useful information for injury surveillance and prevention. However, manually classifying the cause or events leading to injury in large batches of narratives, such as those in workers' compensation claims databases, can be prohibitive. In this study we compare the utility of four machine learning algorithms (Naïve Bayes single-word and bi-gram models, Support Vector Machine, and Logistic Regression) for classifying narratives into Bureau of Labor Statistics Occupational Injury and Illness event-leading-to-injury classifications for a large workers' compensation database. These algorithms are known to perform well at classifying narrative text and are fairly easy to implement with off-the-shelf software such as Python. We propose human-machine learning ensemble approaches that maximize the power and accuracy of the algorithms for machine-assigned codes and allow for strategic filtering of rare, emerging or ambiguous narratives for manual review. We compare human-machine approaches that filter on the prediction strength of the classifier with approaches that filter on agreement between algorithms. Regularized Logistic Regression (LR) was the best-performing single algorithm. Using this algorithm and filtering out the bottom 30% of predictions for manual review resulted in high accuracy (overall sensitivity/positive predictive value of 0.89) of the final machine-human coded dataset. The best pairings of algorithms included Naïve Bayes with Support Vector Machine: the triple ensemble NBSW=NBBI-GRAM=SVM had very high performance (overall sensitivity/positive predictive value of 0.93, with high sensitivity and positive predictive values across both large and small categories), leaving 41% of the narratives for manual review. Integrating LR into this ensemble mix improved performance only slightly.
For large administrative datasets we propose incorporation of methods based on human-machine pairings such as we have done here, utilizing readily-available off-the-shelf machine learning techniques and resulting in only a fraction of narratives that require manual review. Human-machine ensemble methods are likely to improve performance over total manual coding.
Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
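
The two filtering strategies compared in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the tuple layout, the event labels, and the strength scores are all hypothetical, and the classifier outputs are hard-coded sample data rather than the output of trained Naïve Bayes, Support Vector Machine, or Logistic Regression models.

```python
def filter_by_strength(predictions, review_fraction=0.30):
    """Route the weakest predictions to manual review.

    predictions: list of (narrative_id, code, strength) tuples, where
    strength is the classifier's confidence in its assigned code.
    Returns (machine_coded, manual_review), both lists of tuples.
    """
    ranked = sorted(predictions, key=lambda p: p[2])  # weakest first
    n_review = round(len(ranked) * review_fraction)
    return ranked[n_review:], ranked[:n_review]


def filter_by_agreement(codes_by_model):
    """Accept a machine code only when every model assigns the same one.

    codes_by_model: dict of model name -> {narrative_id: code}.
    Returns (agreed, disputed): a dict of accepted codes and a list of
    narrative ids that need manual review.
    """
    models = list(codes_by_model.values())
    agreed, disputed = {}, []
    for narrative_id in models[0]:
        assigned = {model[narrative_id] for model in models}
        if len(assigned) == 1:
            agreed[narrative_id] = assigned.pop()
        else:
            disputed.append(narrative_id)
    return agreed, disputed


# Hypothetical sample data: labels and scores are made up for illustration.
predictions = [
    ("n1", "fall", 0.95), ("n2", "struck_by", 0.40),
    ("n3", "overexertion", 0.88), ("n4", "fall", 0.55),
    ("n5", "struck_by", 0.91), ("n6", "fall", 0.35),
    ("n7", "overexertion", 0.78), ("n8", "fall", 0.99),
    ("n9", "struck_by", 0.62), ("n10", "overexertion", 0.83),
]
machine_coded, manual_review = filter_by_strength(predictions)

codes_by_model = {
    "nb_single_word": {"n1": "fall", "n2": "struck_by", "n3": "overexertion", "n4": "fall"},
    "nb_bigram": {"n1": "fall", "n2": "fall", "n3": "overexertion", "n4": "fall"},
    "svm": {"n1": "fall", "n2": "struck_by", "n3": "overexertion", "n4": "struck_by"},
}
agreed, disputed = filter_by_agreement(codes_by_model)
```

In this sketch the strength filter fixes the manual-review workload in advance (here 30%), whereas the agreement filter lets the workload depend on how often the models disagree, which is consistent with the abstract's report that the triple ensemble left 41% of narratives for manual review.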

Keywords:  Cause of injury; Injury; Injury surveillance; Machine learning; Narrative text

Year:  2016        PMID: 27863339     DOI: 10.1016/j.aap.2016.10.014

Source DB:  PubMed          Journal:  Accid Anal Prev        ISSN: 0001-4575

