Gaurav Nanda¹, Kathleen M Grattan², MyDzung T Chu³, Letitia K Davis³, Mark R Lehto⁴
1. School of Industrial Engineering, Purdue University, 315 N. Grant Street, West Lafayette, IN 47907-2023, USA. Electronic address: gnanda@purdue.edu.
2. Massachusetts Department of Public Health, 250 Washington Street, 4th Floor, Boston, MA 02108, USA. Electronic address: kathleen.grattan@state.ma.us.
3. Massachusetts Department of Public Health, 250 Washington Street, 4th Floor, Boston, MA 02108, USA.
4. School of Industrial Engineering, Purdue University, 315 N. Grant Street, West Lafayette, IN 47907-2023, USA. Electronic address: lehto@purdue.edu.
Abstract
INTRODUCTION: Studies on autocoding injury data have found that machine learning algorithms perform well for frequently occurring categories but often struggle with rare ones. Manual coding, although resource-intensive, therefore cannot be eliminated. We propose a Bayesian decision support system that autocodes a large portion of the data, filters cases for manual review, and assists human coders by presenting them with the top k predicted codes and a confusion matrix of predictions from the Bayesian models.
METHOD: We studied the prediction performance of Single-Word (SW) and Two-Word-Sequence (TW) Naïve Bayes models on a sample of data from the 2011 Survey of Occupational Injuries and Illnesses (SOII). We used agreement between the SW and TW predictions, together with various prediction strength thresholds, to decide which cases to autocode and which to filter for manual review. We also studied the sensitivity of the top k predictions of the SW model, the TW model, and the SW-TW combination, and then compared the accuracy of the codes originally assigned manually in the SOII data with that of the proposed system.
RESULTS: Assuming well-trained coders review only the 26% of cases flagged for review, the accuracy of the proposed system was estimated at 86.5%, comparable to the accuracy of the original coding of the data set (range: 73%-86.8%). Overall, the TW model had higher sensitivity than the SW model, and prediction accuracy increased when the two models agreed and as the prediction strength threshold was raised. The sensitivity of the top five predictions was 93%.
CONCLUSIONS: The proposed system appears promising for coding injury data, as it offers comparable accuracy with less manual coding.
PRACTICAL APPLICATIONS: Occupational injury data coded accurately and in a timely manner are useful for surveillance as well as for prevention activities that aim to make workplaces safer.
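The filtering logic described in the abstract (two Naïve Bayes models, agreement plus a prediction strength threshold to decide between autocoding and manual review, and top k suggestions for coders) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a multinomial Naïve Bayes with add-one smoothing, takes "prediction strength" to be the posterior probability of the top code, defines TW features as adjacent word pairs only, and uses a hypothetical rule that autocodes a case when both models agree and both posteriors reach the threshold. The toy narratives, code labels ("fall", "cut", "strain"), and threshold values are invented for illustration and are not SOII data or SOII codes.

```python
import math
from collections import Counter, defaultdict


def single_words(text):
    """SW features: the individual words of the injury narrative."""
    return text.lower().split()


def two_word_sequences(text):
    """TW features: adjacent word pairs (assumed feature definition)."""
    words = text.lower().split()
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, featurize):
        self.featurize = featurize

    def fit(self, narratives, codes):
        self.code_counts = Counter(codes)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, code in zip(narratives, codes):
            feats = self.featurize(text)
            self.feature_counts[code].update(feats)
            self.vocab.update(feats)
        self.n_cases = len(codes)
        return self

    def posteriors(self, text):
        """P(code | narrative) for every code, normalised to sum to 1."""
        feats = [f for f in self.featurize(text) if f in self.vocab]  # ignore unseen features
        log_scores = {}
        for code, n_code in self.code_counts.items():
            counts = self.feature_counts[code]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_code / self.n_cases)  # log prior
            score += sum(math.log((counts[f] + 1) / denom) for f in feats)
            log_scores[code] = score
        top = max(log_scores.values())  # subtract max for numerical stability
        unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(unnorm.values())
        return {c: p / z for c, p in unnorm.items()}

    def top_k(self, text, k=5):
        """The k most probable codes, e.g. to show to a human coder."""
        post = self.posteriors(text)
        return sorted(post, key=post.get, reverse=True)[:k]


def route(text, sw_model, tw_model, threshold=0.9):
    """Hypothetical rule: autocode only when SW and TW agree and both
    prediction strengths reach the threshold; otherwise flag for review."""
    sw_post = sw_model.posteriors(text)
    tw_post = tw_model.posteriors(text)
    sw_code = max(sw_post, key=sw_post.get)
    tw_code = max(tw_post, key=tw_post.get)
    if sw_code == tw_code and min(sw_post[sw_code], tw_post[tw_code]) >= threshold:
        return ("autocode", tw_code)
    return ("manual_review", tw_code)


def top_k_sensitivity(model, narratives, codes, k=5):
    """Share of cases whose true code appears among the model's top k."""
    hits = sum(code in model.top_k(text, k) for text, code in zip(narratives, codes))
    return hits / len(codes)


# Toy training data with hypothetical codes (not SOII data or SOII codes).
train_text = [
    "slipped on wet floor and fell",
    "fell from ladder while painting",
    "cut finger with box cutter",
    "laceration from knife while cutting meat",
    "strained back lifting heavy box",
    "back pain from lifting patient",
]
train_codes = ["fall", "fall", "cut", "cut", "strain", "strain"]

sw = NaiveBayes(single_words).fit(train_text, train_codes)
tw = NaiveBayes(two_word_sequences).fit(train_text, train_codes)

print(route("slipped on wet floor", sw, tw, threshold=0.5))   # ('autocode', 'fall')
print(route("slipped on wet floor", sw, tw, threshold=0.99))  # ('manual_review', 'fall')
print(sw.top_k("slipped on wet floor", k=2))
```

In this sketch, requiring agreement and raising the threshold both shrink the autocoded share and send more cases to coders, which mirrors the trade-off reported in the abstract; the 26% review rate and 86.5% accuracy come from the authors' tuned system on SOII data and are not something this toy example reproduces.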