Sunyang Fu (1), Bjoerg Thorsteinsdottir (2), Xin Zhang (2), Guilherme S Lopes (3), Sandeep R Pagali (2), Nathan K LeBrasseur (4), Andrew Wen (5), Hongfang Liu (5), Walter A Rocca (6), Janet E Olson (3), Jennifer St Sauver (3), Sunghwan Sohn (7).
1. Department of AI and Informatics, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA; University of Minnesota, Minneapolis, MN 55455, USA.
2. Department of Medicine, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA.
3. Department of Quantitative Health Sciences, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA.
4. Department of Physical Medicine & Rehabilitation, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA; Department of Physiology & Biomedical Engineering, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA.
5. Department of AI and Informatics, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA.
6. Department of Quantitative Health Sciences, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA; Department of Neurology, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA; Women's Health Research Center, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA.
7. Department of AI and Informatics, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA. Electronic address: Sohn.Sunghwan@mayo.edu.
Abstract
INTRODUCTION: Falls are a leading cause of unintentional injury in the elderly. Electronic health records (EHRs) offer a unique opportunity to develop models that can identify fall events. However, identifying fall events in clinical notes requires advanced natural language processing (NLP) that addresses multiple issues simultaneously, because the word "fall" is a typical homonym.
METHODS: We implemented a context-aware language model, Bidirectional Encoder Representations from Transformers (BERT), to identify falls from EHR text, and further fused the BERT model into a hybrid architecture coupled with post-hoc heuristic rules to enhance performance. The models were evaluated on real-world EHR data and compared with conventional rule-based and deep learning models (CNN and Bi-LSTM). To better understand the ability of each approach to identify falls, we further categorized fall-related concepts (i.e., risk of fall, prevention of fall, homonym) and performed a detailed error analysis.
RESULTS: The hybrid model achieved the highest F1-score at the sentence (0.971), document (0.985), and patient (0.954) levels. At the sentence level (the basic data unit in the model), the hybrid model achieved a sensitivity of 0.954, a specificity of 1.000, a positive predictive value of 0.988, and a negative predictive value of 0.999. The error analysis showed that machine learning-based approaches outperformed the rule-based approach in challenging cases that required contextual understanding. The context-aware language model (BERT) slightly outperformed the word embedding approach trained with Bi-LSTM. No single model yielded the best performance for all fall-related semantic categories.
CONCLUSION: A context-aware language model (BERT) was able to identify challenging fall events that require contextual understanding in EHR free text. Combining the model with post-hoc rules in a hybrid architecture allowed custom corrections of the BERT outputs and further improved fall detection performance.
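The hybrid idea and the sentence-level metrics reported above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the regular expressions are hypothetical stand-ins for the post-hoc heuristic rules (the abstract does not list them), `keyword_stub` is a toy replacement for the fine-tuned BERT classifier, and the six sentences are invented rather than taken from the study data. Only the metric definitions (sensitivity, specificity, positive and negative predictive value, F1) are standard.

```python
import re
from typing import Callable, Dict, List

# Hypothetical post-hoc rules (NOT the paper's rules): patterns for mentions
# that contain "fall" but do not describe an actual fall event.
NON_EVENT_PATTERNS = [
    re.compile(r"\b(risk (of|for) fall(s|ing)?|fall risk)\b", re.I),  # risk of fall
    re.compile(r"\bfall (precautions?|prevention)\b", re.I),          # prevention of fall
    re.compile(r"\bfall (of )?(19|20)\d{2}\b", re.I),                 # homonym: the season
    re.compile(r"\bfall(s|ing)? asleep\b", re.I),                     # homonym: idiom
]


def hybrid_predict(sentence: str, model_predict: Callable[[str], bool]) -> bool:
    """Run the model, then let the rules veto a positive prediction."""
    label = model_predict(sentence)
    if label and any(p.search(sentence) for p in NON_EVENT_PATTERNS):
        return False
    return label


def binary_metrics(gold: List[bool], pred: List[bool]) -> Dict[str, float]:
    """Compute standard binary classification metrics from paired labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    tn = sum(1 for g, p in zip(gold, pred) if not g and not p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    sensitivity = ratio(tp, tp + fn)  # recall
    ppv = ratio(tp, tp + fp)          # precision
    return {
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "ppv": ppv,
        "npv": ratio(tn, tn + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of ppv and sensitivity
    }


def keyword_stub(sentence: str) -> bool:
    """Toy stand-in for a trained sentence classifier (assumption)."""
    text = sentence.lower()
    return "fall" in text or "fell" in text


# Invented example sentences with invented gold labels.
sentences = [
    "Patient fell at home yesterday.",
    "Fall precautions were reviewed.",
    "She is at high risk of falls.",
    "Follow-up scheduled for fall of 2024.",
    "He slipped on ice and landed on his hip.",
    "No complaints today.",
]
gold = [True, False, False, False, True, False]

stub_pred = [keyword_stub(s) for s in sentences]
hybrid_pred = [hybrid_predict(s, keyword_stub) for s in sentences]
stub_metrics = binary_metrics(gold, stub_pred)
hybrid_metrics = binary_metrics(gold, hybrid_pred)

print("stub  :", stub_metrics)
print("hybrid:", hybrid_metrics)
```

In this toy setup the rules only remove false positives; they cannot recover the fall described without the word "fall", which is the kind of case the abstract attributes to contextual understanding by the learned model. In practice, `keyword_stub` would be replaced by the prediction function of a trained classifier.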