Brian E Perron1, Bryan G Victor2, Gregory Bushman3, Andrew Moore3, Joseph P Ryan3, Alex Jiahong Lu4, Emily K Piellusch3. 1. Child and Adolescent Data Lab, University of Michigan, School of Social Work, 1080 S University Ave, Ann Arbor, MI, 48109, United States. Electronic address: beperron@umich.edu. 2. Indiana University School of Social Work, 902 West New York Street, Indianapolis, Indiana, 46202, United States. 3. Child and Adolescent Data Lab, University of Michigan, School of Social Work, 1080 S University Ave, Ann Arbor, MI, 48109, United States. 4. Child and Adolescent Data Lab, University of Michigan, School of Social Work, 1080 S University Ave, Ann Arbor, MI, 48109, United States; University of Michigan, School of Information, 105 S State St, Ann Arbor, MI, 48109, United States.
Abstract
BACKGROUND: State child welfare agencies collect, store, and manage vast amounts of data. However, they often do not have the right data, or the data they have are problematic or difficult to use in informing strategies to improve services and system processes. Much of this information is recorded as narrative text, and considerable resources are required to read and code these text data. Data science and text mining offer potentially efficient and cost-effective strategies for maximizing the value of these data. OBJECTIVE: The current study tests the feasibility of using text mining to extract information from unstructured text to better understand substance-related problems among families investigated for abuse or neglect. METHOD: A state child welfare agency provided written summaries from investigations of child abuse and neglect. Expert human reviewers coded 2956 investigation summaries based on whether the caseworker observed a substance-related problem. These coded documents were used to develop, train, and validate computer models that could perform the same coding automatically. RESULTS: A set of computer models achieved greater than 90% accuracy when judged against expert human reviewers. Fleiss' kappa estimates among the computer models and expert human reviewers exceeded .80, indicating that expert human reviewer ratings are exchangeable with those of the computer models. CONCLUSION: These results provide compelling evidence that text mining procedures can be a cost-effective and efficient solution for extracting meaningful insights from unstructured text data. Additional research is necessary to understand how to extract actionable insights from these under-utilized stores of data in child welfare.
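The abstract reports two agreement measures: the accuracy of each model against the expert human codes, and Fleiss' kappa computed across models and reviewers together. The sketch below shows how these two statistics are commonly computed for a binary code (1 = substance-related problem observed, 0 = not observed). It is not the authors' code: the function names `accuracy` and `fleiss_kappa` and the five-document toy data set are hypothetical, and the sketch assumes every document is coded by the same number of raters (each model counted as one rater), which Fleiss' kappa requires.

```python
from collections import Counter
from typing import Sequence


def accuracy(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Share of documents on which a model's code matches the human reference code."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("need two non-empty sequences of equal length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def fleiss_kappa(ratings: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for documents that are each coded by the same number of raters.

    ratings[i] lists the code every rater (human reviewer or model) gave
    document i, e.g. 1 = substance-related problem observed, 0 = not observed.
    Hypothetical helper for illustration, not the study's implementation.
    """
    if not ratings:
        raise ValueError("no documents to score")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(doc) != n_raters for doc in ratings):
        raise ValueError("every document needs the same number (>= 2) of ratings")

    n_docs = len(ratings)
    category_totals: Counter = Counter()  # ratings per category across all documents
    agreement_sum = 0.0                   # running sum of per-document agreement P_i

    for doc in ratings:
        counts = Counter(doc)  # n_ij: raters who put document i in category j
        category_totals.update(counts)
        # P_i = (sum_j n_ij^2 - n) / (n (n - 1)): share of rater pairs that agree
        agreement_sum += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )

    p_bar = agreement_sum / n_docs  # mean observed agreement
    total = n_docs * n_raters
    p_e = sum((c / total) ** 2 for c in category_totals.values())  # chance agreement
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating falls in one category")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical toy data: columns are reviewer A, reviewer B, and one model.
    toy = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 0], [1, 1, 1]]
    model = [row[2] for row in toy]
    reviewer_a = [row[0] for row in toy]
    print(f"accuracy vs reviewer A: {accuracy(model, reviewer_a):.3f}")  # 0.800
    print(f"Fleiss' kappa: {fleiss_kappa(toy):.3f}")                     # 0.732
```

With the toy data, the model disagrees with reviewer A on one of five documents, so its accuracy is 0.80, and kappa across all three raters is about 0.73. These numbers only illustrate the arithmetic; the study's reported values (accuracy above 90%, kappa above .80) come from its own 2956 coded summaries.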
Authors: Braja G Patra; Mohit M Sharma; Veer Vekaria; Prakash Adekkanattu; Olga V Patterson; Benjamin Glicksberg; Lauren A Lepow; Euijung Ryu; Joanna M Biernacka; Al'ona Furmanchuk; Thomas J George; William Hogan; Yonghui Wu; Xi Yang; Jiang Bian; Myrna Weissman; Priya Wickramaratne; J John Mann; Mark Olfson; Thomas R Campion; Mark Weiner; Jyotishman Pathak Journal: J Am Med Inform Assoc Date: 2021-11-25 Impact factor: 7.942