A comparison of conditional random fields and structured support vector machines for chemical entity recognition in biomedical literature.

Buzhou Tang¹, Yudong Feng², Xiaolong Wang³, Yonghui Wu⁴, Yaoyun Zhang⁴, Min Jiang⁴, Jingqi Wang⁴, Hua Xu⁴.

Abstract

BACKGROUND: Chemical compounds and drugs (together called chemical entities) embedded in scientific articles are crucial for many information extraction tasks in the biomedical domain. However, only a very limited number of chemical entity recognition systems are publicly available, probably due to the lack of large manually annotated corpora. To accelerate the development of chemical entity recognition systems, the Spanish National Cancer Research Center (CNIO) and the University of Navarra organized a challenge on Chemical and Drug Named Entity Recognition (CHEMDNER). The CHEMDNER challenge comprises two subtasks: 1) Chemical Entity Mention recognition (CEM); and 2) Chemical Document Indexing (CDI). Our study proposes machine learning-based systems for the CEM task.
METHODS: The 2013 CHEMDNER challenge organizers provided 10,000 UTF-8-encoded PubMed abstracts, manually annotated according to a predefined annotation guideline: a training set of 3,500 abstracts, a development set of 3,500 abstracts and a test set of 3,000 abstracts. We developed two machine learning-based systems for the CEM task on this data set, one based on conditional random fields (CRF) and the other on structured support vector machines (SSVM). We also investigated the effects of three types of word representation (WR) features, generated by Brown clustering, random indexing and skip-gram, on both systems. The performance of our systems was evaluated on the test set using scripts provided by the CHEMDNER challenge organizers. The primary evaluation measures were micro-averaged precision, recall, and F-measure.
RESULTS: Our best system was among the top-ranked systems, with an official micro F-measure of 85.05%. Fixing a bug caused by inconsistent features marginally improved the system's performance, to a micro F-measure of 85.20%.
CONCLUSIONS: The SSVM-based CEM systems outperformed the CRF-based CEM systems when using the same features. Each type of WR feature was beneficial to the CEM task. Both the CRF-based and SSVM-based systems using all three types of WR features performed better than the systems using only one type of WR feature.
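
The micro-averaged precision, recall, and F-measure named in METHODS can be illustrated with a minimal sketch. It is not the official CHEMDNER evaluation script: it assumes that a predicted mention counts as correct only when its character offsets exactly match a gold mention, and the function name `micro_prf`, the `(start, end)` mention representation, and the toy data are all hypothetical.

```python
from typing import Dict, Set, Tuple

# Assumption: a mention is a (start_offset, end_offset) pair and
# dictionary keys are document identifiers.
Mention = Tuple[int, int]


def micro_prf(gold: Dict[str, Set[Mention]],
              pred: Dict[str, Set[Mention]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 pooled over all documents."""
    tp = fp = fn = 0
    for doc_id in gold.keys() | pred.keys():
        g = gold.get(doc_id, set())
        p = pred.get(doc_id, set())
        tp += len(g & p)  # predicted mentions that exactly match gold
        fp += len(p - g)  # predicted mentions with no gold match
        fn += len(g - p)  # gold mentions that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical toy data: two documents, three gold mentions.
gold = {"doc1": {(0, 7), (20, 29)}, "doc2": {(5, 12)}}
pred = {"doc1": {(0, 7), (40, 45)}, "doc2": {(5, 12)}}
precision, recall, f1 = micro_prf(gold, pred)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

Because the counts are pooled across documents before the ratios are taken, abstracts with many mentions weigh more heavily than abstracts with few, which is what distinguishes micro-averaging from macro-averaging.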

Year: 2015 PMID: 25810779 PMCID: PMC4331698 DOI: 10.1186/1758-2946-7-S1-S8

Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514

References (17 in total)

1. An overview of MetaMap: historical perspective and recent advances.

Authors: Alan R Aronson; François-Michel Lang
Journal: J Am Med Inform Assoc Date: 2010 May-Jun Impact factor: 4.497

2. A hybrid system for temporal information extraction from clinical text.

Authors: Buzhou Tang; Yonghui Wu; Min Jiang; Yukun Chen; Joshua C Denny; Hua Xu
Journal: J Am Med Inform Assoc Date: 2013-04-09 Impact factor: 4.497

3. Detection of IUPAC and IUPAC-like chemical names.

Authors: Roman Klinger; Corinna Kolárik; Juliane Fluck; Martin Hofmann-Apitius; Christoph M Friedrich
Journal: Bioinformatics Date: 2008-07-01 Impact factor: 6.937

4. CHEMDNER: The drugs and chemical names extraction challenge.

Authors: Martin Krallinger; Florian Leitner; Obdulia Rabal; Miguel Vazquez; Julen Oyarzabal; Alfonso Valencia
Journal: J Cheminform Date: 2015-01-19 Impact factor: 5.514

5. Evaluating word representation features in biomedical named entity recognition tasks.

Authors: Buzhou Tang; Hongxin Cao; Xiaolong Wang; Qingcai Chen; Hua Xu
Journal: Biomed Res Int Date: 2014-03-06 Impact factor: 3.411

6. The CHEMDNER corpus of chemicals and drugs and its annotation principles.

Authors: Martin Krallinger; Obdulia Rabal; Florian Leitner; Miguel Vazquez; David Salgado; Zhiyong Lu; Robert Leaman; Yanan Lu; Donghong Ji; Daniel M Lowe; Roger A Sayle; Riza Theresa Batista-Navarro; Rafal Rak; Torsten Huber; Tim Rocktäschel; Sérgio Matos; David Campos; Buzhou Tang; Hua Xu; Tsendsuren Munkhdalai; Keun Ho Ryu; S V Ramanan; Senthil Nathan; Slavko Žitnik; Marko Bajec; Lutz Weber; Matthias Irmer; Saber A Akhondi; Jan A Kors; Shuo Xu; Xin An; Utpal Kumar Sikdar; Asif Ekbal; Masaharu Yoshioka; Thaer M Dieb; Miji Choi; Karin Verspoor; Madian Khabsa; C Lee Giles; Hongfang Liu; Komandur Elayavilli Ravikumar; Andre Lamurias; Francisco M Couto; Hong-Jie Dai; Richard Tzong-Han Tsai; Caglar Ata; Tolga Can; Anabel Usié; Rui Alves; Isabel Segura-Bedmar; Paloma Martínez; Julen Oyarzabal; Alfonso Valencia
Journal: J Cheminform Date: 2015-01-19 Impact factor: 5.514

7. PubChem: a public information system for analyzing bioactivities of small molecules.

Authors: Yanli Wang; Jewen Xiao; Tugba O Suzek; Jian Zhang; Jiyao Wang; Stephen H Bryant
Journal: Nucleic Acids Res Date: 2009-06-04 Impact factor: 16.971

8. Recognizing clinical entities in hospital discharge summaries using Structural Support Vector Machines with word representation features.

Authors: Buzhou Tang; Hongxin Cao; Yonghui Wu; Min Jiang; Hua Xu
Journal: BMC Med Inform Decis Mak Date: 2013-04-05 Impact factor: 2.796

9. ChEBI: a database and ontology for chemical entities of biological interest.

Authors: Kirill Degtyarenko; Paula de Matos; Marcus Ennis; Janna Hastings; Martin Zbinden; Alan McNaught; Rafael Alcántara; Michael Darsow; Mickaël Guedj; Michael Ashburner
Journal: Nucleic Acids Res Date: 2007-10-11 Impact factor: 16.971

10. Overview of BioCreative II gene mention recognition.

Authors: Larry Smith; Lorraine K Tanabe; Rie Johnson nee Ando; Cheng-Ju Kuo; I-Fang Chung; Chun-Nan Hsu; Yu-Shi Lin; Roman Klinger; Christoph M Friedrich; Kuzman Ganchev; Manabu Torii; Hongfang Liu; Barry Haddow; Craig A Struble; Richard J Povinelli; Andreas Vlachos; William A Baumgartner; Lawrence Hunter; Bob Carpenter; Richard Tzong-Han Tsai; Hong-Jie Dai; Feng Liu; Yifei Chen; Chengjie Sun; Sophia Katrenko; Pieter Adriaans; Christian Blaschke; Rafael Torres; Mariana Neves; Preslav Nakov; Anna Divoli; Manuel Maña-López; Jacinto Mata; W John Wilbur
Journal: Genome Biol Date: 2008-09-01 Impact factor: 13.583

Cited by (8 in total)

1. Putting hands to rest: efficient deep CNN-RNN architecture for chemical named entity recognition with no hand-crafted rules.

Authors: Ilia Korvigo; Maxim Holmatov; Anatolii Zaikovskii; Mikhail Skoblov
Journal: J Cheminform Date: 2018-05-23 Impact factor: 5.514

2. Ensemble-based Methods to Improve De-identification of Electronic Health Record Narratives.

3. Feature engineering for drug name recognition in biomedical texts: feature conjunction and feature selection.

4. CheNER: a tool for the identification of chemical entities and their classes in biomedical literature.

5. CHEMDNER: The drugs and chemical names extraction challenge.

6. Chemical named entity recognition in patents by domain knowledge and unsupervised feature learning.

7. Recognizing software names in biomedical literature using machine learning.

8. Chemical named entity recognition in the texts of scientific publications using the naïve Bayes classifier approach.