Rong Xu (1), QuanQiu Wang. (1) Medical Informatics Program, Center for Clinical Investigation, Case Western Reserve University, Cleveland, Ohio, USA.
Abstract
OBJECTIVE: A comprehensive and machine-understandable cancer drug-side effect (drug-SE) relationship knowledge base is important for in silico cancer drug target discovery, drug repurposing, and toxicity prediction, and for personalized risk-benefit decisions by cancer patients. While US Food and Drug Administration (FDA) drug labels capture well-known cancer drug SE information, much cancer drug SE knowledge remains buried in the published biomedical literature. We present a relationship extraction approach to extract cancer drug-SE pairs from the literature.
DATA AND METHODS: We used 21,354,075 MEDLINE records as the text corpus. We extracted drug-SE co-occurrence pairs using a cancer drug lexicon and a clean SE lexicon that we created. We then developed two filtering approaches to remove drug-disease treatment pairs, followed by a ranking scheme to further prioritize the filtered pairs. Finally, we analyzed relationships among SEs, gene targets, and indications.
RESULTS: We extracted 56,602 cancer drug-SE pairs. The filtering algorithms improved the precision of extracted pairs from 0.252 at baseline to 0.426, a 69% relative improvement in precision with no decrease in recall. The ranking algorithm further prioritized filtered pairs and achieved a precision of 0.778 for top-ranked pairs. We showed that cancer drugs that share SEs tend to have overlapping gene targets and overlapping indications.
CONCLUSIONS: The relationship extraction approach is effective in extracting many cancer drug-SE pairs from the literature. This unique knowledge base, when combined with existing cancer drug SE knowledge, can facilitate drug target discovery, drug repurposing, and toxicity prediction.
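The lexicon-based co-occurrence step and the reported precision gain can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the co-occurrence unit, so sentence-level matching is assumed here, and the function names (`find_terms`, `cooccurrence_pairs`, `relative_improvement`), the two tiny lexicons, and the example sentences are all hypothetical. The filtering and ranking steps are not covered.

```python
import re
from collections import Counter
from itertools import product


def find_terms(sentence, lexicon):
    """Return the lexicon terms that occur as whole words or phrases in the sentence."""
    text = sentence.lower()
    return {
        term
        for term in lexicon
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", text)
    }


def cooccurrence_pairs(sentences, drug_lexicon, se_lexicon):
    """Count (drug, side effect) pairs that appear together in the same sentence.

    Sentence-level co-occurrence is an assumption made for this sketch.
    """
    counts = Counter()
    for sentence in sentences:
        drugs = find_terms(sentence, drug_lexicon)
        side_effects = find_terms(sentence, se_lexicon)
        for pair in product(sorted(drugs), sorted(side_effects)):
            counts[pair] += 1
    return counts


def relative_improvement(before, after):
    """Relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


# Toy, hypothetical inputs; these are not taken from the MEDLINE corpus.
drug_lexicon = {"cisplatin", "doxorubicin"}
se_lexicon = {"nephrotoxicity", "cardiotoxicity", "nausea"}
sentences = [
    "Cisplatin frequently causes nephrotoxicity and nausea.",
    "Doxorubicin-induced cardiotoxicity limits cumulative dosing.",
    "Nausea was reported after cisplatin infusion.",
]

pairs = cooccurrence_pairs(sentences, drug_lexicon, se_lexicon)
for (drug, side_effect), n in sorted(pairs.items()):
    print(drug, side_effect, n)

# Precision values as reported in the abstract: 0.252 at baseline, 0.426 after filtering.
print(f"{relative_improvement(0.252, 0.426):.0%}")
```

Plain co-occurrence of this kind also picks up drug-disease treatment pairs, since a drug and the condition it treats often share a sentence; that is consistent with the low baseline precision the abstract reports before filtering.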
Keywords:
Cancer Drug Toxicity; Drug Repurposing; Drug Target Discovery; Information Extraction; Natural Language Processing; Text Mining