Ling Luo, Chih-Hsuan Wei, Po-Ting Lai, Qingyu Chen, Rezarta Islamaj, Zhiyong Lu.
Abstract
The automatic assignment of species information to the corresponding genes in a research article is a critical step in the gene normalization task, in which a text-mining algorithm normalizes a gene mention and links it to a database record or identifier. Existing methods typically rely on heuristic rules based on gene and species co-occurrence in the article, but their accuracy is suboptimal. We therefore developed a high-performance method, using a novel deep learning-based framework, to identify whether there is a relation between a gene and a species. Instead of the traditional binary classification framework, in which all possible pairs of genes and species in the same article are evaluated, we treat the problem as a sequence labeling task so that only a fraction of the pairs needs to be considered. Our benchmarking results show that our approach performs significantly better than the rule-based baseline method on the species assignment task, improving accuracy from 65.8% to 81.3%. The source code and data for species assignment are freely available. Database URL: https://github.com/ncbi/SpeciesAssignment. Published by Oxford University Press 2022. This work is written by (a) US Government employee(s) and is in the public domain in the US.
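To illustrate why a sequence labeling framing can consider fewer instances than pairwise classification, here is a minimal sketch. It is not the authors' implementation: the toy sentences, the field names `genes` and `species`, and the choice of "one instance per sentence containing a gene mention" are all assumptions made only for this example.

```python
from itertools import product

def pairwise_instances(genes, species):
    """Binary-classification framing: one instance per (gene, species) pair."""
    return list(product(genes, species))

def sequence_labeling_instances(sentences):
    """Assumed sequence-labeling framing: one instance per sentence
    that contains at least one gene mention to be tagged."""
    return [s for s in sentences if s["genes"]]

# Hypothetical toy article, invented for illustration only.
sentences = [
    {"genes": ["BRCA1", "TP53"], "species": ["human"]},
    {"genes": ["Trp53"], "species": ["mouse"]},
    {"genes": [], "species": ["zebrafish"]},
]

all_genes = [g for s in sentences for g in s["genes"]]
all_species = [sp for s in sentences for sp in s["species"]]

# Every gene is paired with every species in the article.
n_pairwise = len(pairwise_instances(all_genes, all_species))
# Only sentences with gene mentions are processed, one pass each.
n_sequence = len(sequence_labeling_instances(sentences))

print(n_pairwise, n_sequence)
```

In this toy case the pairwise framing grows with the product of gene and species mentions, while the assumed sequence framing grows with the number of gene-bearing sentences.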
Year: 2022 PMID: 36227127 PMCID: PMC9558450 DOI: 10.1093/database/baac090
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 4.462