Facilitating information extraction without annotated data using unsupervised and positive-unlabeled learning.

Zfania Tom Korach^1,2, Sharmitha Yerneni^1, Jonathan Einbinder^2,3, Carl Kallenberg^3, Li Zhou^1,2.

Abstract

Information extraction (IE), the distillation of specific information from unstructured data, is a core task in natural language processing. For rare entities (<1% prevalence), collecting the positive examples required to train a model may require an infeasibly large sample of mostly negative ones. We combined unsupervised learning with biased positive-unlabeled (PU) learning to 1) facilitate positive example collection while maintaining the assumptions needed to 2) learn a binary classifier from the biased positive-unlabeled data alone. We tested the methods on a real-life use case of rare (<0.42%) entity extraction from medical malpractice documents. When tested on a manually reviewed random sample of documents, the PU model achieved an area under the precision-recall curve of 0.283 and an F1 of 0.410, outperforming fully supervised learning (0.022 and 0.096, respectively). The results demonstrate our method's potential to reduce the manual effort required for extracting rare entities from narrative texts. ©2020 AMIA - All rights reserved.

Year: 2021 PMID: 33936440 PMCID: PMC8075513

Source DB: PubMed Journal: AMIA Annu Symp Proc ISSN: 1559-4076
