A sequence labeling framework for extracting drug-protein relations from biomedical literature.

Ling Luo, Po-Ting Lai, Chih-Hsuan Wei, Zhiyong Lu.

Abstract

Automatically extracting interactions between chemical compounds/drugs and genes/proteins is highly beneficial to drug discovery, drug repurposing, drug design and biomedical knowledge graph construction. To promote the development of drug-protein relation extraction, the BioCreative VII challenge organized the DrugProt track. This paper describes the approach we developed for this task. In addition to the conventional text classification framework that has been widely used in relation extraction tasks, we propose a sequence labeling framework for drug-protein relation extraction. We first comprehensively compared cutting-edge biomedical pre-trained language models for both frameworks. Then, we explored several ensemble methods to further improve the final performance. In the challenge evaluation, our best submission (i.e. the ensemble of models from the two frameworks via majority voting) achieved an F1-score of 0.795 on the official test set. Further, we found that the sequence labeling framework is more efficient and achieves better performance than the text classification framework. Finally, our ensemble of the sequence labeling models with majority voting achieves the best F1-score of 0.800 on the test set. DATABASE URL: https://github.com/lingluodlut/BioCreativeVII_DrugProt. Published by Oxford University Press 2022. This work is written by (a) US Government employee(s) and is in the public domain in the US.
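
The majority-voting ensemble mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository at the DATABASE URL for that). The function name `majority_vote`, the label strings, and the rule of falling back to a "no relation" label when no label wins a strict majority are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(predictions, no_relation="NONE"):
    """Combine per-model relation labels by strict majority voting.

    predictions: list of label lists, one per model, aligned by candidate
    drug-protein pair. The fallback to `no_relation` when no label has a
    strict majority is an assumption, not taken from the paper.
    """
    if not predictions:
        return []
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must label the same candidate pairs")

    voted = []
    for labels in zip(*predictions):
        # Most frequent label among the models for this candidate pair.
        label, count = Counter(labels).most_common(1)[0]
        # Accept it only if more than half of the models agree.
        voted.append(label if count > len(labels) / 2 else no_relation)
    return voted


# Three hypothetical models, three candidate pairs (labels are illustrative).
model_outputs = [
    ["INHIBITOR", "NONE", "ACTIVATOR"],
    ["INHIBITOR", "SUBSTRATE", "NONE"],
    ["NONE", "SUBSTRATE", "AGONIST"],
]
print(majority_vote(model_outputs))
```

In this sketch, the third candidate pair receives three different labels, so no label reaches a strict majority and the pair is treated as having no relation.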

Year:  2022        PMID: 35856889      PMCID: PMC9297941          DOI: 10.1093/database/baac058

Source DB:  PubMed          Journal:  Database (Oxford)        ISSN: 1758-0463            Impact factor:   4.462


