Ranking relations between diseases, drugs and genes for a curation task.

Simon Clematide, Fabio Rinaldi.

Abstract

BACKGROUND: One of the key pieces of information that biomedical text mining systems are expected to extract from the literature is the set of interactions among different types of biomedical entities (proteins, genes, diseases, drugs, etc.). Several large resources of curated relations between biomedical entities are currently available, such as the Pharmacogenomics Knowledge Base (PharmGKB) or the Comparative Toxicogenomics Database (CTD). Biomedical text mining systems, and in particular those which deal with the extraction of relationships among entities, could make better use of the wealth of already curated material.
RESULTS: We propose a simple and effective method based on logistic regression (also known as maximum entropy modeling) for an optimized ranking of relation candidates utilizing curated abstracts. Furthermore, we examine the effects and difficulties of using widely available metadata (i.e., MeSH terms and chemical substance index terms) for relation extraction. Cross-validation experiments result in an improvement of the ranking quality in terms of AUCiP/R by 39% (PharmGKB) and 116% (CTD) against a frequency-based baseline of 0.39 (PharmGKB) and 0.21 (CTD). For the TAP-10 metric, we achieve an improvement of 53% (PharmGKB) and 134% (CTD) against the same baseline system (0.21 for PharmGKB and 0.15 for CTD).
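The abstract reports only baseline scores and relative improvements, not the resulting absolute scores. As a rough reading aid, the following Python sketch derives the scores these figures imply, assuming the percentages are relative gains over the stated baselines. The function name `implied_score` is hypothetical, and because both the baselines and the percentages are rounded in the abstract, the derived values are approximations, not results reported by the authors.

```python
# Baselines and relative improvements exactly as stated in the abstract.
# Each entry maps (metric, database) -> (baseline score, relative improvement).
reported = {
    ("AUCiP/R", "PharmGKB"): (0.39, 0.39),
    ("AUCiP/R", "CTD"): (0.21, 1.16),
    ("TAP-10", "PharmGKB"): (0.21, 0.53),
    ("TAP-10", "CTD"): (0.15, 1.34),
}


def implied_score(baseline: float, relative_improvement: float) -> float:
    """Score implied by a relative gain over a baseline (hypothetical helper)."""
    return baseline * (1.0 + relative_improvement)


for (metric, database), (baseline, gain) in reported.items():
    score = implied_score(baseline, gain)
    print(f"{metric} {database}: baseline {baseline:.2f}, implied score about {score:.2f}")
```

Under this assumption the implied AUCiP/R values are roughly 0.54 (PharmGKB) and 0.45 (CTD), and the implied TAP-10 values are roughly 0.32 (PharmGKB) and 0.35 (CTD).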
CONCLUSIONS: Our experiments with PharmGKB and CTD show a strong positive effect for the ranking of relation candidates utilizing the vast amount of curated relations covered by currently available knowledge databases. The tasks of concept identification and candidate relation generation profit from the adaptation to previously curated material. This presents an effective and practical method suitable for conservative extension and re-validation of biomedical relations from texts that has been successfully used for curation experiments with the PharmGKB and CTD databases.

Year:  2012        PMID: 23046495      PMCID: PMC3465213          DOI: 10.1186/2041-1480-3-S3-S5

Source DB:  PubMed          Journal:  J Biomed Semantics


