Assigning species information to corresponding genes by a sequence labeling framework.

Ling Luo, Chih-Hsuan Wei, Po-Ting Lai, Qingyu Chen, Rezarta Islamaj, Zhiyong Lu.

Abstract

The automatic assignment of species information to the corresponding genes in a research article is a critical step in the gene normalization task, in which a text-mining algorithm normalizes a gene mention and links it to a database record or identifier. Existing methods typically rely on heuristic rules based on gene and species co-occurrence in the article, but their accuracy is suboptimal. We therefore developed a high-performance method, using a novel deep learning-based framework, to identify whether there is a relation between a gene and a species. Instead of the traditional binary classification framework, in which all possible pairs of genes and species in the same article are evaluated, we treat the problem as a sequence labeling task so that only a fraction of the pairs needs to be considered. Our benchmarking results show that our approach performs significantly better than the rule-based baseline method on the species assignment task (accuracy improves from 65.8% to 81.3%). The source code and data for species assignment are freely available. Database URL: https://github.com/ncbi/SpeciesAssignment. Published by Oxford University Press 2022. This work is written by (a) US Government employee(s) and is in the public domain in the US.
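The contrast between the two framings can be illustrated with a toy sketch. The abstract does not specify the exact formulation, so this assumes one labeling pass per species mention, in which every gene token in the text receives a tag. The sentence, the tag names (`SPECIES`, `REL`, `NOREL`, `O`) and both helper functions are hypothetical and are not taken from the authors' code.

```python
# Toy comparison of the two problem framings (illustrative assumption only).

def pairwise_instances(genes, species):
    """Binary-classification framing: one instance per (gene, species) pair."""
    return [(g, s) for g in genes for s in species]


def labeling_instances(tokens, genes, species, gold):
    """Sequence-labeling framing (assumed): one tag sequence per species mention.

    gold maps a gene token index to the token index of its species mention.
    """
    instances = []
    for s in species:
        tags = ["O"] * len(tokens)
        tags[s] = "SPECIES"  # hypothetical marker for the anchor species
        for g in genes:
            tags[g] = "REL" if gold.get(g) == s else "NOREL"
        instances.append(tags)
    return instances


# Invented example sentence with token indices of the gene and species mentions.
tokens = ["human", "BRCA1", "and", "mouse", "Trp53", "and", "Mdm2"]
genes = [1, 4, 6]
species = [0, 3]
gold = {1: 0, 4: 3, 6: 3}

pairs = pairwise_instances(genes, species)
passes = labeling_instances(tokens, genes, species, gold)

print(len(pairs), "pairwise instances")
print(len(passes), "labeling passes")
for tags in passes:
    print(list(zip(tokens, tags)))
```

Under this assumption the number of model inputs grows with the number of species mentions rather than with the product of gene and species mentions, which is one way to read "only a fraction of the pairs needs to be considered".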

Year:  2022        PMID: 36227127      PMCID: PMC9558450          DOI: 10.1093/database/baac090

Source DB:  PubMed          Journal:  Database (Oxford)        ISSN: 1758-0463            Impact factor:   4.462


