
Putting hands to rest: efficient deep CNN-RNN architecture for chemical named entity recognition with no hand-crafted rules.

Ilia Korvigo, Maxim Holmatov, Anatolii Zaikovskii, Mikhail Skoblov.

Abstract

Chemical named entity recognition (NER) is an active field of research in biomedical natural language processing. To facilitate the development of new and superior chemical NER systems, BioCreative released the CHEMDNER corpus, an extensive dataset of diverse manually annotated chemical entities. Most of the systems trained on the corpus rely on complicated hand-crafted rules or curated databases for data preprocessing, feature extraction and output post-processing, even though modern machine learning algorithms, such as deep neural networks, can design such rules automatically with little to no human intervention. Here we explored this approach by experimenting with various deep learning architectures for targeted tokenisation and named entity recognition. Our final model, based on a combination of convolutional and stateful recurrent neural networks with attention-like loops and hybrid word- and character-level embeddings, reaches near human-level performance on the test dataset with no manually asserted rules. To make our model easily accessible for standalone use and integration in third-party software, we have developed a Python package with a minimalistic user interface.

Keywords:  BioCreative; CHEMDNER; Chemical; Conditional random fields; Convolutional neural network; Deep learning; Named entity recognition; Neural attention; Recurrent neural network; Text mining; Tokenisation

Year:  2018        PMID: 29796778      PMCID: PMC5966369          DOI: 10.1186/s13321-018-0280-0

Source DB:  PubMed          Journal:  J Cheminform        ISSN: 1758-2946            Impact factor:   5.514


  14 in total

1.  ChemSpot: a hybrid system for chemical named entity recognition.

Authors:  Tim Rocktäschel; Michael Weidlich; Ulf Leser
Journal:  Bioinformatics       Date:  2012-04-12       Impact factor: 6.937

2.  Enhancing of chemical compound and drug name recognition using representative tag scheme and fine-grained tokenization.

Authors:  Hong-Jie Dai; Po-Ting Lai; Yung-Chun Chang; Richard Tzong-Han Tsai
Journal:  J Cheminform       Date:  2015-01-19       Impact factor: 5.514

3.  A CRF-based system for recognizing chemical entity mentions (CEMs) in biomedical literature.

Authors:  Shuo Xu; Xin An; Lijun Zhu; Yunliang Zhang; Haodong Zhang
Journal:  J Cheminform       Date:  2015-01-19       Impact factor: 5.514

4.  A comparison of conditional random fields and structured support vector machines for chemical entity recognition in biomedical literature.

Authors:  Buzhou Tang; Yudong Feng; Xiaolong Wang; Yonghui Wu; Yaoyun Zhang; Min Jiang; Jingqi Wang; Hua Xu
Journal:  J Cheminform       Date:  2015-01-19       Impact factor: 5.514

5.  ChemDataExtractor: A Toolkit for Automated Extraction of Chemical Information from the Scientific Literature.

Authors:  Matthew C Swain; Jacqueline M Cole
Journal:  J Chem Inf Model       Date:  2016-10-06       Impact factor: 4.956

6.  A document processing pipeline for annotating chemical entities in scientific documents.

Authors:  David Campos; Sérgio Matos; José L Oliveira
Journal:  J Cheminform       Date:  2015-01-19       Impact factor: 5.514

7.  CHEMDNER: The drugs and chemical names extraction challenge.

Authors:  Martin Krallinger; Florian Leitner; Obdulia Rabal; Miguel Vazquez; Julen Oyarzabal; Alfonso Valencia
Journal:  J Cheminform       Date:  2015-01-19       Impact factor: 5.514

8.  The CHEMDNER corpus of chemicals and drugs and its annotation principles.

Authors:  Martin Krallinger; Obdulia Rabal; Florian Leitner; Miguel Vazquez; David Salgado; Zhiyong Lu; Robert Leaman; Yanan Lu; Donghong Ji; Daniel M Lowe; Roger A Sayle; Riza Theresa Batista-Navarro; Rafal Rak; Torsten Huber; Tim Rocktäschel; Sérgio Matos; David Campos; Buzhou Tang; Hua Xu; Tsendsuren Munkhdalai; Keun Ho Ryu; S V Ramanan; Senthil Nathan; Slavko Žitnik; Marko Bajec; Lutz Weber; Matthias Irmer; Saber A Akhondi; Jan A Kors; Shuo Xu; Xin An; Utpal Kumar Sikdar; Asif Ekbal; Masaharu Yoshioka; Thaer M Dieb; Miji Choi; Karin Verspoor; Madian Khabsa; C Lee Giles; Hongfang Liu; Komandur Elayavilli Ravikumar; Andre Lamurias; Francisco M Couto; Hong-Jie Dai; Richard Tzong-Han Tsai; Caglar Ata; Tolga Can; Anabel Usié; Rui Alves; Isabel Segura-Bedmar; Paloma Martínez; Julen Oyarzabal; Alfonso Valencia
Journal:  J Cheminform       Date:  2015-01-19       Impact factor: 5.514

9.  Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations.

Authors:  Tsendsuren Munkhdalai; Meijing Li; Khuyagbaatar Batsuren; Hyeon Ah Park; Nak Hyeon Choi; Keun Ho Ryu
Journal:  J Cheminform       Date:  2015-01-19       Impact factor: 5.514

10.  GRAM-CNN: a deep learning approach with local context for named entity recognition in biomedical text.

Authors:  Qile Zhu; Xiaolin Li; Ana Conesa; Cécile Pereira
Journal:  Bioinformatics       Date:  2018-05-01       Impact factor: 6.937

  8 in total

1.  Ontology-Based Healthcare Named Entity Recognition from Twitter Messages Using a Recurrent Neural Network Approach.

Authors:  Erdenebileg Batbaatar; Keun Ho Ryu
Journal:  Int J Environ Res Public Health       Date:  2019-09-27       Impact factor: 3.390

2.  Dataset of solution-based inorganic materials synthesis procedures extracted from the scientific literature.

Authors:  Zheren Wang; Olga Kononova; Kevin Cruse; Tanjin He; Haoyan Huo; Yuxing Fei; Yan Zeng; Yingzhi Sun; Zijian Cai; Wenhao Sun; Gerbrand Ceder
Journal:  Sci Data       Date:  2022-05-25       Impact factor: 8.501

3.  Automated Extraction of Information From Texts of Scientific Publications: Insights Into HIV Treatment Strategies.

Authors:  Nadezhda Biziukova; Olga Tarasova; Sergey Ivanov; Vladimir Poroikov
Journal:  Front Genet       Date:  2020-12-22       Impact factor: 4.599

Review 4.  Opportunities and challenges of text mining in materials research.

Authors:  Olga Kononova; Tanjin He; Haoyan Huo; Amalie Trewartha; Elsa A Olivetti; Gerbrand Ceder
Journal:  iScience       Date:  2021-02-06

5.  Concept recognition as a machine translation problem.

Authors:  Mayla R Boguslav; Negacy D Hailu; Michael Bada; William A Baumgartner; Lawrence E Hunter
Journal:  BMC Bioinformatics       Date:  2021-12-17       Impact factor: 3.169

6.  Single Model for Organic and Inorganic Chemical Named Entity Recognition in ChemDataExtractor.

Authors:  Taketomo Isazawa; Jacqueline M Cole
Journal:  J Chem Inf Model       Date:  2022-02-24       Impact factor: 6.162

7.  Quantifying the advantage of domain-specific pre-training on named entity recognition tasks in materials science.

Authors:  Amalie Trewartha; Nicholas Walker; Haoyan Huo; Sanghoon Lee; Kevin Cruse; John Dagdelen; Alexander Dunn; Kristin A Persson; Gerbrand Ceder; Anubhav Jain
Journal:  Patterns (N Y)       Date:  2022-04-08

8.  Chemical named entity recognition in the texts of scientific publications using the naïve Bayes classifier approach.

Authors:  O A Tarasova; A V Rudik; N Yu Biziukova; D A Filimonov; V V Poroikov
Journal:  J Cheminform       Date:  2022-08-13       Impact factor: 8.489
