Harmonization of gene/protein annotations: towards a gold standard MEDLINE.

David Campos, Sérgio Matos, Ian Lewin, José Luís Oliveira, Dietrich Rebholz-Schuhmann.

Abstract

MOTIVATION: The recognition of named entities (NER) is an elementary task in biomedical text mining. A number of NER solutions have been proposed in recent years, taking advantage of available annotated corpora, terminological resources and machine-learning techniques. Currently, the best performing solutions combine the outputs from selected annotation solutions measured against a single corpus. However, little effort has been spent on a systematic analysis of methods harmonizing the annotation results and measuring against a combination of Gold Standard Corpora (GSCs).
RESULTS: We present Totum, a machine learning solution that harmonizes gene/protein annotations provided by heterogeneous NER solutions. It has been optimized and measured against a combination of manually curated GSCs. The performed experiments show that our approach improves the F-measure of state-of-the-art solutions by up to 10% (achieving ≈70%) in exact alignment and 22% (achieving ≈82%) in nested alignment. We demonstrate that our solution delivers reliable annotation results across the GSCs and it is an important contribution towards a homogeneous annotation of MEDLINE abstracts.
AVAILABILITY AND IMPLEMENTATION: Totum is implemented in Java and its resources are available at http://bioinformatics.ua.pt/totum
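
The abstract reports F-measure under two matching criteria, exact and nested alignment. The following is a minimal sketch of how such scores could be computed from character-offset spans. It is not Totum's evaluation code: the function names are hypothetical, and the definition of a nested match used here (one span fully contains the other) is an assumption that may differ from the paper's.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def exact_match(pred: Span, gold: Span) -> bool:
    """Both boundaries must agree."""
    return pred == gold


def nested_match(pred: Span, gold: Span) -> bool:
    """Assumed definition: one span lies fully inside the other."""
    pred_in_gold = gold[0] <= pred[0] and pred[1] <= gold[1]
    gold_in_pred = pred[0] <= gold[0] and gold[1] <= pred[1]
    return pred_in_gold or gold_in_pred


def f_measure(preds: List[Span], golds: List[Span],
              match: Callable[[Span, Span], bool]) -> float:
    """Harmonic mean of precision and recall under a matching criterion."""
    # Predictions that match at least one gold annotation.
    correct = sum(any(match(p, g) for g in golds) for p in preds)
    # Gold annotations recovered by at least one prediction.
    found = sum(any(match(p, g) for p in preds) for g in golds)
    precision = correct / len(preds) if preds else 0.0
    recall = found / len(golds) if golds else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: one exact hit, one partial hit inside a gold span, one miss.
golds = [(0, 5), (10, 20), (30, 35)]
preds = [(0, 5), (12, 20), (40, 45)]

print(f_measure(preds, golds, exact_match))   # exact alignment
print(f_measure(preds, golds, nested_match))  # nested alignment
```

On this toy data the nested criterion scores higher than the exact one because the prediction (12, 20) counts as a hit only when partial containment is accepted, which matches the direction of the gap between the two figures in the abstract.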

Substances:
Proteins

Year: 2012 PMID: 22419783 DOI: 10.1093/bioinformatics/bts125

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited

7 in total

1. Annotation of phenotypes using ontologies: a gold standard for the training and evaluation of natural language processing systems.

Authors: Wasila Dahdul; Prashanti Manda; Hong Cui; James P Balhoff; T Alexander Dececchi; Nizar Ibrahim; Hilmar Lapp; Todd Vision; Paula M Mabee
Journal: Database (Oxford) Date: 2018-01-01 Impact factor: 3.451

2. Gimli: open source and high-performance biomedical name recognition.

Authors: David Campos; Sérgio Matos; José Luís Oliveira
Journal: BMC Bioinformatics Date: 2013-02-15 Impact factor: 3.169

3. AMELIE speeds Mendelian diagnosis by matching patient phenotype and genotype to primary literature.

Authors: Johannes Birgmeier; Maximilian Haeussler; Cole A Deisseroth; Ethan H Steinberg; Karthik A Jagadeesh; Alexander J Ratner; Harendra Guturu; Aaron M Wenger; Mark E Diekhans; Peter D Stenson; David N Cooper; Christopher Ré; Alan H Beggs; Jonathan A Bernstein; Gill Bejerano
Journal: Sci Transl Med Date: 2020-05-20 Impact factor: 19.319

4. A document processing pipeline for annotating chemical entities in scientific documents.

5. A modular framework for biomedical concept recognition.

6. Evaluating gold standard corpora against gene/protein tagging solutions and lexical resources.

7. A gene-phenotype relationship extraction pipeline from the biomedical literature using a representation learning approach.

Authors: Wenhui Xing; Junsheng Qi; Xiaohui Yuan; Lin Li; Xiaoyu Zhang; Yuhua Fu; Shengwu Xiong; Lun Hu; Jing Peng
Journal: Bioinformatics Date: 2018-07-01 Impact factor: 6.937