Sungrim Moon (1), Bridget McInnes (2), Genevieve B Melton (3)
1. School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA.
2. Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
3. Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA; Department of Surgery, University of Minnesota, Minneapolis, MN, USA.
Abstract
OBJECTIVES: Although acronyms and abbreviations in clinical text are used widely on a daily basis, relatively little research has focused upon word sense disambiguation (WSD) of acronyms and abbreviations in the healthcare domain. Since clinical notes have distinctive characteristics, it is unclear whether techniques effective for acronym and abbreviation WSD from biomedical literature are sufficient.

METHODS: The authors discuss feature selection for automated techniques and challenges with WSD of acronyms and abbreviations in the clinical domain.

RESULTS: There are significant challenges associated with the informal nature of clinical text, such as typographical errors and incomplete sentences; difficulty with insufficient clinical resources, such as clinical sense inventories; and obstacles with privacy and security for conducting research with clinical text. Although we anticipated that using sophisticated techniques, such as biomedical terminologies, semantic types, part-of-speech, and language modeling, would be needed for feature selection with automated machine learning approaches, we found instead that simple techniques, such as bag-of-words, were quite effective in many cases. Factors, such as majority sense prevalence and the degree of separateness between sense meanings, were also important considerations.

CONCLUSIONS: The first lesson is that a comprehensive understanding of the unique characteristics of clinical text is important for automatic acronym and abbreviation WSD. The second lesson learned is that investigators may find that using simple approaches is an effective starting point for these tasks. Finally, similar to other WSD tasks, an understanding of baseline majority sense rates and separateness between senses is important. Further studies and practical solutions are needed to better address these issues.
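As an illustration of the two simple ideas the abstract names, bag-of-words features and the majority sense baseline, the following is a minimal sketch and not the authors' implementation. The function names, the window size, the tokenization rule, and the toy "RA" notes are all assumptions made for this example.

```python
import re
from collections import Counter


def bag_of_words(text, target, window=5):
    """Count tokens within `window` positions of each occurrence of `target`.

    Tokenization (lowercased alphanumeric runs) is an assumption for this
    sketch; real clinical text would need more careful handling.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    target = target.lower()
    bag = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            bag.update(left + right)
    return bag


def majority_sense_baseline(labels):
    """Return the most frequent sense and the accuracy of always predicting it."""
    counts = Counter(labels)
    sense, n = counts.most_common(1)[0]
    return sense, n / len(labels)


# Hypothetical toy notes for the abbreviation "RA" (invented for illustration).
samples = [
    ("pt with RA on methotrexate", "rheumatoid arthritis"),
    ("RA flare, joint pain", "rheumatoid arthritis"),
    ("catheter tip in RA", "right atrium"),
]

features = [bag_of_words(text, "RA", window=2) for text, _ in samples]
baseline_sense, baseline_accuracy = majority_sense_baseline(
    [sense for _, sense in samples]
)
```

Any classifier trained on such features would need to beat `baseline_accuracy` to be useful, which is why the majority sense rate is worth computing first for each abbreviation.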
Keywords:
Abbreviations as Topic; Artificial Intelligence; Automated Pattern Recognition; Medical Records; Natural Language Processing