Mahnoosh Kholghi1, Laurianne Sitbon2, Guido Zuccon2, Anthony Nguyen3. 1. Science and Engineering Faculty, Queensland University of Technology, Brisbane 4000, Queensland, Australia. The Australian e-Health Research Centre, CSIRO, Brisbane 4029, Queensland, Australia m1.kholghi@qut.edu.au. 2. Science and Engineering Faculty, Queensland University of Technology, Brisbane 4000, Queensland, Australia. 3. The Australian e-Health Research Centre, CSIRO, Brisbane 4029, Queensland, Australia.
Abstract
OBJECTIVE: This paper presents an automatic, active learning-based system for the extraction of medical concepts from clinical free-text reports. Specifically, (1) the contribution of active learning in reducing the annotation effort and (2) the robustness of incremental active learning framework across different selection criteria and data sets are determined. MATERIALS AND METHODS: The comparative performance of an active learning framework and a fully supervised approach were investigated to study how active learning reduces the annotation effort while achieving the same effectiveness as a supervised approach. Conditional random fields as the supervised method, and least confidence and information density as 2 selection criteria for active learning framework were used. The effect of incremental learning vs standard learning on the robustness of the models within the active learning framework with different selection criteria was also investigated. The following 2 clinical data sets were used for evaluation: the Informatics for Integrating Biology and the Bedside/Veteran Affairs (i2b2/VA) 2010 natural language processing challenge and the Shared Annotated Resources/Conference and Labs of the Evaluation Forum (ShARe/CLEF) 2013 eHealth Evaluation Lab. RESULTS: The annotation effort saved by active learning to achieve the same effectiveness as supervised learning is up to 77%, 57%, and 46% of the total number of sequences, tokens, and concepts, respectively. Compared with the random sampling baseline, the saving is at least doubled. CONCLUSION: Incremental active learning is a promising approach for building effective and robust medical concept extraction models while significantly reducing the burden of manual annotation.
OBJECTIVE: This paper presents an automatic, active learning-based system for the extraction of medical concepts from clinical free-text reports. Specifically, (1) the contribution of active learning in reducing the annotation effort and (2) the robustness of incremental active learning framework across different selection criteria and data sets are determined. MATERIALS AND METHODS: The comparative performance of an active learning framework and a fully supervised approach were investigated to study how active learning reduces the annotation effort while achieving the same effectiveness as a supervised approach. Conditional random fields as the supervised method, and least confidence and information density as 2 selection criteria for active learning framework were used. The effect of incremental learning vs standard learning on the robustness of the models within the active learning framework with different selection criteria was also investigated. The following 2 clinical data sets were used for evaluation: the Informatics for Integrating Biology and the Bedside/Veteran Affairs (i2b2/VA) 2010 natural language processing challenge and the Shared Annotated Resources/Conference and Labs of the Evaluation Forum (ShARe/CLEF) 2013 eHealth Evaluation Lab. RESULTS: The annotation effort saved by active learning to achieve the same effectiveness as supervised learning is up to 77%, 57%, and 46% of the total number of sequences, tokens, and concepts, respectively. Compared with the random sampling baseline, the saving is at least doubled. CONCLUSION:Incremental active learning is a promising approach for building effective and robust medical concept extraction models while significantly reducing the burden of manual annotation.
Authors: Rosa L Figueroa; Qing Zeng-Treitler; Long H Ngo; Sergey Goryachev; Eduardo P Wiechmann Journal: J Am Med Inform Assoc Date: 2012-06-15 Impact factor: 4.497
Authors: Guido Zuccon; Amol S Wagholikar; Anthony N Nguyen; Luke Butt; Kevin Chu; Shane Martin; Jaimi Greenslade Journal: AMIA Jt Summits Transl Sci Proc Date: 2013-03-18
Authors: Kevin De Angeli; Shang Gao; Mohammed Alawad; Hong-Jun Yoon; Noah Schaefferkoetter; Xiao-Cheng Wu; Eric B Durbin; Jennifer Doherty; Antoinette Stroup; Linda Coyle; Lynne Penberthy; Georgia Tourassi Journal: BMC Bioinformatics Date: 2021-03-09 Impact factor: 3.169