
Facilitating information extraction without annotated data using unsupervised and positive-unlabeled learning.

Zfania Tom Korach, Sharmitha Yerneni, Jonathan Einbinder, Carl Kallenberg, Li Zhou.

Abstract

Information extraction (IE), the distillation of specific information from unstructured data, is a core task in natural language processing. For rare entities (<1% prevalence), collecting the positive examples needed to train a model may require an infeasibly large sample of mostly negative ones. We combined unsupervised learning with biased positive-unlabeled (PU) learning methods to: 1) facilitate positive example collection while maintaining the assumptions needed to 2) learn a binary classifier from the biased positive-unlabeled data alone. We tested the methods on a real-life use case of rare (<0.42%) entity extraction from medical malpractice documents. When tested on a manually reviewed random sample of documents, the PU model achieved an area under the precision-recall curve of 0.283 and F1 of 0.410, outperforming fully supervised learning (0.022 and 0.096, respectively). The results demonstrate our method's potential to reduce the manual effort required for extracting rare entities from narrative texts. ©2020 AMIA - All rights reserved.
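To illustrate why random sampling becomes infeasible for rare entities, the following minimal sketch estimates how many randomly drawn documents would need manual review to collect a given number of positives. It is not taken from the paper: the function names, the target of 100 positives, and the use of 0.42% as a point prevalence (the abstract gives it only as an upper bound) are assumptions made for illustration.

```python
def expected_reviews(n_positives: int, prevalence: float) -> float:
    """Expected number of randomly sampled documents that must be reviewed
    to find n_positives positive documents (mean of a negative binomial
    count of trials: n / p)."""
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    return n_positives / prevalence


def expected_negatives(n_positives: int, prevalence: float) -> float:
    """Expected number of negative documents reviewed along the way."""
    return expected_reviews(n_positives, prevalence) - n_positives


if __name__ == "__main__":
    target = 100  # hypothetical number of positives wanted for training
    for p in (0.01, 0.0042):  # 1% and 0.42%, the prevalence bounds in the abstract
        total = expected_reviews(target, p)
        negatives = expected_negatives(target, p)
        print(f"prevalence={p:.2%}: ~{total:,.0f} documents reviewed, "
              f"~{negatives:,.0f} of them negative")
```

Under these assumptions, the review burden grows inversely with prevalence, which is the effort the combined unsupervised and PU approach aims to reduce.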


Year:  2021        PMID: 33936440      PMCID: PMC8075513     

Source DB:  PubMed          Journal:  AMIA Annu Symp Proc        ISSN: 1559-4076


