Sumithra Velupillai¹, Maria Skeppstedt², Maria Kvist³, Danielle Mowery⁴, Brian E Chapman⁵, Hercules Dalianis⁶, Wendy W Chapman⁷
1. Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, 164 40 Kista, Sweden. Electronic address: sumithra@dsv.su.se.
2. Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, 164 40 Kista, Sweden. Electronic address: mariask@dsv.su.se.
3. Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, 164 40 Kista, Sweden; Department of Learning, Informatics, Management and Ethics (LIME), Karolinska Institutet, Widerström Building, Tomtebodavägen 18A, Solna, Sweden. Electronic address: maria.kvist@karolinska.se.
4. Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, BAUM 423, Pittsburgh, PA 15206-3701, United States. Electronic address: dlm31@pitt.edu.
5. Department of Radiology, University of Utah, 729 Arapeen Drive, Salt Lake City, UT 84108, United States. Electronic address: brian.chapman@utah.edu.
6. Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, 164 40 Kista, Sweden. Electronic address: hercules@dsv.su.se.
7. Department of Biomedical Informatics, University of Utah, 26 South 2000 East, Room 5775 HSEB, Salt Lake City, UT 84112-5775, United States. Electronic address: wendy.chapman@utah.edu.
Abstract
OBJECTIVE: The ability of a cue-based system to accurately assert whether a disorder is affirmed, negated, or uncertain depends, in part, on its cue lexicon. In this paper, we continue our study of porting an assertion system (pyConTextNLP) from English to Swedish (pyConTextSwe) by creating an optimized assertion lexicon for clinical Swedish.

METHODS AND MATERIAL: We integrated cues from four external lexicons, along with generated inflections and combinations. We used subsets of a clinical corpus in Swedish. We applied four assertion classes (definite existence, probable existence, probable negated existence, and definite negated existence) and two binary classes (existence yes/no and uncertainty yes/no) to pyConTextSwe. We compared pyConTextSwe's performance with and without the added cues on a development set, and improved the lexicon further after an error analysis. We then calculated the system's final performance on a separate evaluation set.

RESULTS: Following the integration steps, we added 454 cues to pyConTextSwe. The optimized lexicon developed after the error analysis resulted in statistically significant improvements on the development set (83% F-score, overall). The system's final overall F-score on the evaluation set was 81%. For the individual assertion classes, F-scores were 88% (definite existence), 81% (probable existence), 55% (probable negated existence), and 63% (definite negated existence). For the binary classifications existence yes/no and uncertainty yes/no, final system performance was 97%/87% and 78%/86% F-score, respectively.

CONCLUSIONS: We have successfully ported pyConTextNLP to Swedish (pyConTextSwe). We have created an extensive and useful assertion lexicon for Swedish clinical text, which is publicly available and could form a valuable resource for similar studies.
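The relation between the four assertion classes, the two binary classes, and the per-class F-score can be illustrated with a minimal sketch. It is not taken from pyConTextSwe. The label strings, the helper names (`to_existence`, `to_uncertainty`, `f_score`), and the toy data are hypothetical, and two assumptions are made that the abstract does not state: that the binary classes are derived by collapsing the four assertion classes according to their names, and that "F-score" means the balanced F1 measure.

```python
# Illustrative sketch only; not the pyConTextSwe implementation.

def to_existence(label):
    # Assumption: the two non-negated classes count as existence "yes".
    return "yes" if label in ("definite_existence", "probable_existence") else "no"


def to_uncertainty(label):
    # Assumption: the two "probable" classes count as uncertainty "yes".
    return "yes" if label.startswith("probable") else "no"


def f_score(gold, predicted, target):
    """Balanced F1 for one target class over paired gold/predicted labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly found
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy annotations (hypothetical, not from the clinical corpus).
gold = [
    "definite_existence",
    "definite_existence",
    "probable_existence",
    "probable_negated_existence",
    "definite_negated_existence",
]
predicted = [
    "definite_existence",
    "probable_existence",
    "probable_existence",
    "definite_negated_existence",
    "definite_negated_existence",
]

# Per-class score on the four-class task.
f_definite = f_score(gold, predicted, "definite_existence")

# Scores on the two derived binary tasks.
f_existence_yes = f_score(
    [to_existence(g) for g in gold], [to_existence(p) for p in predicted], "yes"
)
f_uncertainty_yes = f_score(
    [to_uncertainty(g) for g in gold], [to_uncertainty(p) for p in predicted], "yes"
)
```

In this toy data, confusing a definite class with its probable counterpart lowers the four-class and uncertainty scores but leaves the existence score untouched, which is one plausible reason a binary existence score can sit well above the per-class scores.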