Rachel Stemerman (1), Jaime Arguello (2), Jane Brice (3), Ashok Krishnamurthy (4), Mary Houston (3), Rebecca Kitzmiller (5).
1. Carolina Health Informatics Program, The University of North Carolina, Chapel Hill, North Carolina, USA.
2. School of Information and Library Sciences, The University of North Carolina, Chapel Hill, North Carolina, USA.
3. Department of Emergency Medicine, The University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA.
4. Department of Computer Science, The University of North Carolina, Chapel Hill, North Carolina, USA.
5. School of Nursing, The University of North Carolina, Chapel Hill, North Carolina, USA.
Abstract
OBJECTIVES: Social determinants of health (SDH), key contributors to health, are rarely systematically measured and collected in the electronic health record (EHR). We investigate how to leverage clinical notes using novel applications of multi-label learning (MLL) to classify SDH in mental health and substance use disorder patients who frequent the emergency department.
METHODS AND MATERIALS: We labeled a gold-standard corpus of EHR clinical note sentences (N = 4063) with 6 SDH-related domains recommended by the Institute of Medicine for inclusion in the EHR. We then trained 5 classification models: linear Support Vector Machine, K-Nearest Neighbors, Random Forest, XGBoost, and bidirectional Long Short-Term Memory (BI-LSTM). To compare the models on MLL classification, we adopted 5 common evaluation measures: accuracy, average precision-recall (AP), area under the receiver operating characteristic curve (AUC-ROC), Hamming loss, and log loss, with the F1 score as the primary evaluation metric.
RESULTS: Our results suggested that, overall, BI-LSTM outperformed the other classification models in terms of AUC-ROC (93.9), AP (0.76), and Hamming loss (0.12). The AUC-ROC values of the MLL models across SDH-related domains ranged from 0.59 to 1.0. We found that 44.6% of our study population (N = 1119) had at least one positive documentation of SDH.
DISCUSSION AND CONCLUSION: The proposed approach of training an MLL model on an SDH-rich data source can produce a high-performing classifier using only unstructured clinical notes. We also provide evidence that model performance is associated with the lexical diversity of health professionals' wording and with the auto-generation of clinical note sentences to document SDH.
Published by Oxford University Press on behalf of the American Medical Informatics Association 2020. This work is written by US Government employees and is in the public domain in the US.
Keywords:
electronic health records; machine learning; natural language processing; social determinants of health
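As a rough illustration of the multi-label setting described in the abstract, where each clinical note sentence can carry zero, one, or several SDH domain labels, the sketch below computes two of the measures mentioned (Hamming loss and accuracy, taken here as exact-match accuracy) on a toy example. The label matrices and function names are hypothetical and are not drawn from the study; the 6 columns simply stand in for the 6 SDH-related domains, which the abstract does not enumerate.

```python
from typing import Sequence

Matrix = Sequence[Sequence[int]]


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of (sentence, label) cells where prediction and truth differ."""
    wrong = 0
    total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("rows must have the same number of labels")
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += int(t != p)
    return wrong / total


def subset_accuracy(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of sentences whose full label vector is predicted exactly."""
    matches = sum(
        1 for t_row, p_row in zip(y_true, y_pred) if list(t_row) == list(p_row)
    )
    return matches / len(y_true)


# Hypothetical data: 4 sentences x 6 placeholder SDH domains (1 = documented).
Y_TRUE = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 1],
]
Y_PRED = [
    [1, 0, 0, 0, 0, 0],  # exact match
    [0, 1, 0, 0, 0, 0],  # misses one label
    [0, 0, 0, 0, 0, 0],  # exact match
    [0, 0, 0, 1, 1, 1],  # one spurious label
]

if __name__ == "__main__":
    print(f"Hamming loss:    {hamming_loss(Y_TRUE, Y_PRED):.3f}")
    print(f"Subset accuracy: {subset_accuracy(Y_TRUE, Y_PRED):.3f}")
```

Hamming loss counts errors per label cell, so a sentence with one wrong label out of 6 is penalized only partially, whereas exact-match accuracy treats that sentence as entirely wrong. This is one reason multi-label work typically reports several measures side by side.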