
Harmonization of gene/protein annotations: towards a gold standard MEDLINE.

David Campos, Sérgio Matos, Ian Lewin, José Luís Oliveira, Dietrich Rebholz-Schuhmann.

Abstract

MOTIVATION: Named entity recognition (NER) is an elementary task in biomedical text mining. A number of NER solutions have been proposed in recent years, taking advantage of available annotated corpora, terminological resources and machine-learning techniques. Currently, the best performing solutions combine the outputs of selected annotation solutions and are measured against a single corpus. However, little effort has been spent on a systematic analysis of methods that harmonize the annotation results and measure them against a combination of Gold Standard Corpora (GSCs).
RESULTS: We present Totum, a machine learning solution that harmonizes gene/protein annotations provided by heterogeneous NER solutions. It has been optimized and measured against a combination of manually curated GSCs. Our experiments show that this approach improves the F-measure of state-of-the-art solutions by up to 10% (achieving ≈70%) in exact alignment and 22% (achieving ≈82%) in nested alignment. We demonstrate that our solution delivers reliable annotation results across the GSCs and that it is an important contribution towards a homogeneous annotation of MEDLINE abstracts.
AVAILABILITY AND IMPLEMENTATION: Totum is implemented in Java and its resources are available at http://bioinformatics.ua.pt/totum

Year:  2012        PMID: 22419783     DOI: 10.1093/bioinformatics/bts125

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  7 in total

1.  Annotation of phenotypes using ontologies: a gold standard for the training and evaluation of natural language processing systems.

Authors:  Wasila Dahdul; Prashanti Manda; Hong Cui; James P Balhoff; T Alexander Dececchi; Nizar Ibrahim; Hilmar Lapp; Todd Vision; Paula M Mabee
Journal:  Database (Oxford)       Date:  2018-01-01       Impact factor: 3.451

2.  Gimli: open source and high-performance biomedical name recognition.

Authors:  David Campos; Sérgio Matos; José Luís Oliveira
Journal:  BMC Bioinformatics       Date:  2013-02-15       Impact factor: 3.169

3.  AMELIE speeds Mendelian diagnosis by matching patient phenotype and genotype to primary literature.

Authors:  Johannes Birgmeier; Maximilian Haeussler; Cole A Deisseroth; Ethan H Steinberg; Karthik A Jagadeesh; Alexander J Ratner; Harendra Guturu; Aaron M Wenger; Mark E Diekhans; Peter D Stenson; David N Cooper; Christopher Ré; Alan H Beggs; Jonathan A Bernstein; Gill Bejerano
Journal:  Sci Transl Med       Date:  2020-05-20       Impact factor: 19.319

4.  A document processing pipeline for annotating chemical entities in scientific documents.

Authors:  David Campos; Sérgio Matos; José L Oliveira
Journal:  J Cheminform       Date:  2015-01-19       Impact factor: 5.514

5.  A modular framework for biomedical concept recognition.

Authors:  David Campos; Sérgio Matos; José Luís Oliveira
Journal:  BMC Bioinformatics       Date:  2013-09-24       Impact factor: 3.169

6.  Evaluating gold standard corpora against gene/protein tagging solutions and lexical resources.

Authors:  Dietrich Rebholz-Schuhmann; Senay Kafkas; Jee-Hyub Kim; Chen Li; Antonio Jimeno Yepes; Robert Hoehndorf; Rolf Backofen; Ian Lewin
Journal:  J Biomed Semantics       Date:  2013-10-11

7.  A gene-phenotype relationship extraction pipeline from the biomedical literature using a representation learning approach.

Authors:  Wenhui Xing; Junsheng Qi; Xiaohui Yuan; Lin Li; Xiaoyu Zhang; Yuhua Fu; Shengwu Xiong; Lun Hu; Jing Peng
Journal:  Bioinformatics       Date:  2018-07-01       Impact factor: 6.937

