
CALBC silver standard corpus.

Dietrich Rebholz-Schuhmann, Antonio José Jimeno Yepes, Erik M Van Mulligen, Ning Kang, Jan Kors, David Milward, Peter Corbett, Ekaterina Buyko, Elena Beisswanger, Udo Hahn.

Abstract

The CALBC initiative aims to provide a large-scale biomedical text corpus that contains semantic annotations for named entities of different kinds. The generation of this corpus requires that the annotations from different automatic annotation systems be harmonized. In the first phase, the annotation systems from five participants (EMBL-EBI, EMC Rotterdam, NLM, JULIE Lab Jena, and Linguamatics) were gathered. All annotations were delivered in a common annotation format that included concept identifiers in the boundary assignments and that enabled comparison and alignment of the results. During the harmonization phase, the results produced by those different systems were integrated into a single harmonized corpus (the "silver standard" corpus) by applying a voting scheme. We give an overview of the processed data and of the principles of harmonization: formal boundary reconciliation and semantic matching of named entities. Finally, all submissions of the participants were evaluated against the silver standard corpus. We found that species and disease annotations are better standardized amongst the partners than the annotations of genes and proteins. The raw corpus is now available for additional named entity annotations, and parts of it will later be made available for a public challenge. We expect to improve corpus building activities both in the number of named entity classes covered and in the size of the corpus in terms of annotated documents.
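The abstract does not spell out the details of the voting scheme. As a minimal sketch only, assuming that two annotations agree when they share the same document, exact character boundaries and concept identifier, and that an annotation enters the silver standard once at least `min_votes` systems propose it (a hypothetical threshold), harmonization and the subsequent evaluation of each submission could look like this. The names `Annotation`, `harmonize`, `evaluate`, the system names and the concept identifiers are all hypothetical and are not taken from the CALBC paper.

```python
from collections import Counter
from typing import NamedTuple


class Annotation(NamedTuple):
    """One named-entity annotation (hypothetical structure)."""
    doc_id: str
    start: int       # character offset where the mention begins
    end: int         # character offset where the mention ends
    concept_id: str  # concept identifier attached to the boundary


def harmonize(submissions: dict[str, set[Annotation]], min_votes: int) -> set[Annotation]:
    """Keep annotations proposed by at least `min_votes` distinct systems.

    Simplifying assumption: agreement means exact boundaries AND the same
    concept identifier; no boundary reconciliation is attempted.
    """
    votes: Counter = Counter()
    for annotations in submissions.values():
        # Wrapping in set() ensures each system votes at most once per annotation.
        votes.update(set(annotations))
    return {ann for ann, n in votes.items() if n >= min_votes}


def evaluate(submission: set[Annotation], silver: set[Annotation]) -> tuple[float, float, float]:
    """Precision, recall and F1 of one submission against the silver standard."""
    tp = len(submission & silver)
    precision = tp / len(submission) if submission else 0.0
    recall = tp / len(silver) if silver else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example with three hypothetical systems and made-up concept identifiers.
a1 = Annotation("doc1", 0, 5, "CONCEPT_1")
a2 = Annotation("doc1", 10, 18, "CONCEPT_2")
a3 = Annotation("doc1", 10, 18, "CONCEPT_3")  # same span as a2, different concept
submissions = {"sysA": {a1, a2}, "sysB": {a1, a2}, "sysC": {a1, a3}}

silver = harmonize(submissions, min_votes=2)
for name, anns in sorted(submissions.items()):
    print(name, evaluate(anns, silver))
```

Here `sysC` agrees with the other systems on the boundaries of its second mention but assigns a different concept identifier, so under this strict matching rule that annotation receives no support and counts against the submission. Relaxing the exact-boundary requirement (for example, accepting nested or overlapping spans) would correspond to the formal boundary reconciliation mentioned in the abstract; that step is not modeled here.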


Year:  2010        PMID: 20183881     DOI: 10.1142/s0219720010004562

Source DB:  PubMed          Journal:  J Bioinform Comput Biol        ISSN: 0219-7200            Impact factor:   1.122



