
Chemical identification and indexing in PubMed full-text articles using deep learning and heuristics.

Tiago Almeida, Rui Antunes, João F Silva, João R Almeida, Sérgio Matos.

Abstract

The identification of chemicals in articles has attracted considerable interest in the biomedical scientific community, given its importance in drug development research. Most previous research has focused on PubMed abstracts, and further investigation using full-text documents is required because these contain additional valuable information that remains to be explored. The manual expert task of assigning Medical Subject Headings (MeSH) terms to these articles later helps researchers find the most relevant publications for their ongoing work. The BioCreative VII NLM-Chem track fostered the development of systems for chemical identification and indexing in PubMed full-text articles. Chemical identification consisted of identifying the chemical mentions and linking these to unique MeSH identifiers. This manuscript describes the system we submitted to the challenge and the post-challenge improvements we made. We propose a three-stage pipeline that separately performs chemical mention detection, entity normalization and indexing. For chemical identification, we adopted a deep-learning solution that uses PubMedBERT contextualized embeddings followed by a multilayer perceptron and a conditional random field tagging layer. For normalization, we use sieve-based dictionary filtering followed by a deep-learning similarity search strategy. Finally, for indexing, we developed rules for identifying the most relevant MeSH codes for each article. During the challenge, our system obtained the best official results in the normalization and indexing tasks despite lower performance in the chemical mention recognition task. In a post-contest phase, we boosted our results by improving our named entity recognition model with additional techniques. The final system achieved 0.8731, 0.8275 and 0.4849 in the chemical identification, normalization and indexing tasks, respectively. The code to reproduce our experiments and run the pipeline is publicly available.
Database URL: https://github.com/bioinformatics-ua/biocreativeVII_track2
© The Author(s) 2022. Published by Oxford University Press.
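
The normalization stage described in the abstract (sieve-based dictionary filtering, then a similarity search for mentions the dictionary does not resolve) can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicon, the identifiers (placeholders, not real MeSH codes), the function names and the threshold are all hypothetical, and a character-trigram cosine similarity stands in for the deep-learning similarity search so that the example runs with the standard library only.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon: surface form -> placeholder identifier.
# The identifiers are NOT real MeSH codes.
LEXICON = {
    "aspirin": "ID-ASPIRIN",
    "acetylsalicylic acid": "ID-ASPIRIN",
    "glucose": "ID-GLUCOSE",
    "ethanol": "ID-ETHANOL",
}


def normalize_text(s):
    """Lowercase and collapse every non-alphanumeric run to one space."""
    return re.sub(r"[^a-z0-9]+", " ", s.lower()).strip()


def char_ngrams(s, n=3):
    """Bag of character n-grams over the space-padded string."""
    padded = f" {s} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_mention(mention, lexicon, threshold=0.5):
    """Return (identifier, stage) for a mention, trying sieves in order."""
    # Sieve 1: exact dictionary match.
    if mention in lexicon:
        return lexicon[mention], "exact"

    # Sieve 2: match after light text normalization.
    norm_lexicon = {normalize_text(k): v for k, v in lexicon.items()}
    key = normalize_text(mention)
    if key in norm_lexicon:
        return norm_lexicon[key], "normalized"

    # Fallback: nearest lexicon entry by trigram cosine similarity
    # (an assumed stand-in for an embedding-based search).
    query = char_ngrams(key)
    best_id, best_score = None, 0.0
    for name, identifier in norm_lexicon.items():
        score = cosine(query, char_ngrams(name))
        if score > best_score:
            best_id, best_score = identifier, score
    if best_score >= threshold:
        return best_id, "similarity"
    return None, "unlinked"


if __name__ == "__main__":
    for m in ["aspirin", "Acetylsalicylic-Acid", "aspirine", "xyz"]:
        print(m, "->", link_mention(m, LEXICON))
```

Ordering the sieves from strictest to loosest means the more precise dictionary matches take priority, and only unresolved mentions reach the approximate search, where a threshold decides whether to link at all.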

Year:  2022        PMID: 35776534      PMCID: PMC9248917          DOI: 10.1093/database/baac047

Source DB:  PubMed          Journal:  Database (Oxford)        ISSN: 1758-0463            Impact factor:   4.462

