
Automated annotation of chemical names in the literature with tunable accuracy.

Jun D Zhang, Lewis Y Geer, Evan E Bolton, Stephen H Bryant.

Abstract

BACKGROUND: A significant portion of the biomedical and chemical literature refers to small molecules. Accurately identifying and annotating the compound names that are relevant to the topic of a given publication can establish links between scientific publications and various chemical and life science databases. Manual annotation is the preferred method for this work because well-trained indexers can understand the topic of a paper as well as recognize key terms. However, considering the hundreds of thousands of new papers published annually, an automatic annotation system with high precision and relevance can be a useful complement to manual annotation.
RESULTS: An automated chemical name annotation system, MeSH Automated Annotations (MAA), was developed to annotate small molecule names in scientific abstracts with tunable accuracy. The system aims to reproduce the MeSH term annotations that indexers would create for the biomedical and chemical literature. When automated free-text matching was compared with manual indexing for 26 thousand MEDLINE abstracts, more than 40% of the automated annotations were false-positive (FP) cases. To reduce the FP rate, MAA incorporates several filters that remove "incorrect" annotations caused by nonspecific, partial, and low-relevance chemical names. Relevance was measured in part by the position of the chemical name in the text. Tunable accuracy was obtained by adding or restricting the sections of the text scanned for chemical names. The best precision obtained was 96%, at a recall of 28%. The best performance of MAA, as measured with the F statistic, was 66%, which compares favorably with other chemical name annotation systems.
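The trade-off described in RESULTS, where scanning fewer sections raises precision and lowers recall, can be illustrated with a minimal sketch. It is not the MAA implementation: the function names, the toy dictionary, the section labels, and the gold annotations are all hypothetical, and the F statistic is assumed here to be the balanced F1 measure, which the abstract does not specify.

```python
import re
from typing import Dict, List, Set, Tuple


def annotate(sections: Dict[str, str], dictionary: Set[str], scan: List[str]) -> Set[str]:
    """Return dictionary terms found as whole words in the selected sections.

    Hypothetical helper: plain dictionary matching, with none of the
    nonspecific/partial-name filters mentioned in the abstract.
    """
    found: Set[str] = set()
    for section_name in scan:
        text = sections.get(section_name, "").lower()
        for term in dictionary:
            # Whole-word match, so "ethanol" is not matched inside "methanol".
            pattern = r"(?<!\w)" + re.escape(term.lower()) + r"(?!\w)"
            if re.search(pattern, text):
                found.add(term)
    return found


def precision_recall_f1(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Compute precision, recall and balanced F1 against manual (gold) annotations."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Toy record (invented for illustration only).
sections = {
    "title": "Aspirin inhibits platelet aggregation",
    "abstract": "We compared aspirin with ibuprofen; samples were dissolved in ethanol.",
}
dictionary = {"aspirin", "ibuprofen", "ethanol"}
gold = {"aspirin", "ibuprofen"}  # what a manual indexer might have assigned

for scan in (["title"], ["title", "abstract"]):
    predicted = annotate(sections, dictionary, scan)
    print(scan, sorted(predicted), precision_recall_f1(predicted, gold))
```

In this toy case, scanning only the title finds fewer names but no false positives, while adding the abstract body recovers the second relevant compound at the cost of annotating the solvent. Under the same F1 assumption, the reported operating point of 96% precision and 28% recall would correspond to an F1 of roughly 0.43, which suggests that the best F value of 66% was obtained at a different setting of the scanned sections.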
CONCLUSIONS: Accurate chemical name annotation can help researchers not only identify important chemical names in abstracts, but also match unindexed and unstructured abstracts to chemical records. The current work was tested against MEDLINE, but the algorithm is not specific to this corpus and could potentially be applied to papers from chemical physics, materials, polymer and environmental science, as well as to patents, biological assay descriptions and other textual data.

Year: 2011   PMID: 22107874   PMCID: PMC3281788   DOI: 10.1186/1758-2946-3-52

Source DB: PubMed   Journal: J Cheminform   ISSN: 1758-2946   Impact factor: 5.514


