
Active Learning-based corpus annotation--the PathoJen experience.

Udo Hahn, Elena Beisswanger, Ekaterina Buyko, Erik Faessler.

Abstract

We report on basic design decisions and novel annotation procedures underlying the development of PathoJen, a corpus of Medline abstracts annotated for pathological phenomena, including diseases as a proper subclass. This named entity type is known to be hard to delineate and to capture in annotation guidelines. We propose a two-category encoding schema that distinguishes short from long mention spans, the former covering standardized terminology (e.g., diseases), the latter accounting for less structured descriptive statements about norm-deviant states, as well as criteria and observations that might signal pathologies. The second design decision concerns the way annotation instances are sampled. Here we adopt an Active Learning-based approach, which is known to save annotation costs without sacrificing annotation quality by deliberately biasing the sample. By design, Active Learning selects 'hard' to annotate instances for human annotators, whereas 'easier' ones are passed over to the automatic classifier, whose models already incorporate previous annotation experience and gradually improve with it.
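
The sampling idea in the abstract (hard instances go to human annotators, easier ones stay with the classifier) can be illustrated with a minimal sketch. The abstract does not say which selection criterion PathoJen used, so the least-confidence score below is an assumption, and the function names, sentence ids, and probabilities are all hypothetical.

```python
from typing import Dict, List, Tuple


def least_confidence(probs: List[float]) -> float:
    """Uncertainty score: 1 minus the probability of the most likely label."""
    return 1.0 - max(probs)


def split_by_uncertainty(
    predictions: Dict[str, List[float]], batch_size: int
) -> Tuple[List[str], List[str]]:
    """Return (hard, easy) sentence ids.

    The `batch_size` most uncertain sentences are routed to human
    annotators; the remaining ones are left to the automatic classifier.
    """
    ranked = sorted(
        predictions,
        key=lambda sid: least_confidence(predictions[sid]),
        reverse=True,  # most uncertain first
    )
    return ranked[:batch_size], ranked[batch_size:]


# Hypothetical classifier output: sentence id -> label probabilities.
predictions = {
    "s1": [0.98, 0.02],
    "s2": [0.55, 0.45],
    "s3": [0.70, 0.30],
    "s4": [0.51, 0.49],
}

hard, easy = split_by_uncertainty(predictions, batch_size=2)
print("to human annotators:", hard)
print("left to classifier:", easy)
```

In a full Active Learning loop, the newly annotated hard instances would be added to the training set and the classifier retrained before the next selection round, which is how the models would gradually improve with annotation experience.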


Year: 2012
PMID: 23304300
PMCID: PMC3540513
Source DB: PubMed
Journal: AMIA Annu Symp Proc
ISSN: 1559-4076


