Andriy Mulyar1, Ozlem Uzuner2, Bridget McInnes2. 1. Computer Science Department, Virginia Commonwealth University, Richmond, Virginia, USA. 2. Information Sciences and Technology, George Mason University, Fairfax, Virginia, USA.
Abstract
OBJECTIVE: Clinical notes contain an abundance of important, but not readily accessible, information about patients. Systems that automatically extract this information rely on large amounts of training data, for which only limited resources exist. Furthermore, such systems are developed disjointly, so no information can be shared among task-specific systems. This bottleneck unnecessarily complicates practical application, reduces the performance of each individual solution, and incurs the engineering debt of maintaining multiple information extraction systems. MATERIALS AND METHODS: We address these challenges by developing Multitask-Clinical BERT: a single deep learning model that simultaneously performs eight clinical tasks spanning entity extraction, personal health information identification, language entailment, and similarity, by sharing representations among tasks. RESULTS: We compare the performance of our multitasking information extraction system to state-of-the-art BERT sequential fine-tuning baselines. We observe a slight but consistent performance degradation in MT-Clinical BERT relative to sequential fine-tuning. DISCUSSION: These results suggest that learning a general clinical text representation capable of supporting multiple tasks comes at the cost of exploiting dataset- or clinical-note-specific properties, compared with a single, task-specific model. CONCLUSIONS: We find that our single system performs competitively with all state-of-the-art task-specific systems while offering substantial computational savings at inference.
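The multitask design described above, one shared encoder feeding several lightweight task-specific output layers, can be illustrated with a minimal NumPy sketch. All names, dimensions, and the toy linear "encoder" below are illustrative stand-ins, not the paper's actual Clinical BERT implementation; the point is only that a single encoder pass serves every task head.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the shared Clinical BERT encoder: one set of weights
# produces a common representation consumed by every task head.
# (Toy dimensions; the real model uses BERT's transformer stack.)
HIDDEN = 16
shared_encoder = rng.normal(size=(8, HIDDEN))  # maps 8-dim input features -> hidden


def encode(x):
    """Shared representation: the same weights serve all tasks."""
    return np.tanh(x @ shared_encoder)


# Task-specific heads: lightweight linear layers on top of the shared encoding.
# Task names and output sizes are hypothetical examples of the task types
# mentioned in the abstract.
task_heads = {
    "ner": rng.normal(size=(HIDDEN, 5)),         # entity-tag logits
    "phi_deid": rng.normal(size=(HIDDEN, 2)),    # PHI / not-PHI
    "entailment": rng.normal(size=(HIDDEN, 3)),  # entail / neutral / contradict
    "similarity": rng.normal(size=(HIDDEN, 1)),  # regression score
}


def predict(task, x):
    """Run the shared encoder, then the requested task's head."""
    return encode(x) @ task_heads[task]


x = rng.normal(size=(1, 8))
h = encode(x)  # one encoder pass...
outputs = {t: h @ w for t, w in task_heads.items()}  # ...serves all tasks
```

This sharing is the source of the inference-time savings the abstract mentions: `encode(x)` is computed once rather than once per task, while each head adds only a small matrix multiply.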
Keywords:
clinical natural language processing; named entity recognition; textual entailment; semantic text similarity; multitask learning; natural language processing