BACKGROUND: Systems approaches to studying drug-side-effect (drug-SE) associations are emerging as an active research area for both drug target discovery and drug repositioning. However, no comprehensive drug-SE association knowledge base exists. In this study, we present a novel knowledge-driven (KD) approach to effectively extract a large number of drug-SE pairs from published biomedical literature. DATA AND METHODS: As the text corpus, we used 21,354,075 MEDLINE records (119,085,682 sentences). First, we used known drug-SE associations derived from FDA drug labels as prior knowledge to automatically find SE-related sentences and abstracts. We then extracted a total of 49,575 drug-SE pairs from MEDLINE sentences and 180,454 pairs from abstracts. RESULTS: On average, the KD approach achieved a precision of 0.335, a recall of 0.509, and an F1 score of 0.392, significantly better than an SVM-based machine learning approach (precision: 0.135, recall: 0.900, F1: 0.233), a 73.0% increase in F1 score. Through integrative analysis, we demonstrate that the higher-level phenotypic drug-SE relationships reflect lower-level genetic, genomic, and chemical drug mechanisms. In addition, we show that the extracted drug-SE pairs can be used directly in drug repositioning. CONCLUSION: In summary, we automatically constructed a large-scale knowledge base of higher-level drug-phenotype relationships, which can have great potential in computational drug discovery.
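The abstract states that known drug-SE pairs from FDA labels serve as prior knowledge for finding SE-related text, but it does not spell out how pairs are then extracted. As a rough illustration of the general idea only, the sketch below assumes a simple seed-pattern bootstrapping scheme: sentences containing a known (seed) drug-SE pair are located, the literal token sequence between the drug and SE mentions is kept as a pattern, and those patterns are applied to other sentences to propose new pairs. The lexicons, function names (`learn_patterns`, `extract_pairs`, etc.), and example sentences are hypothetical and are not the authors' pipeline.

```python
import re
from collections import Counter

# Hypothetical toy lexicons; a real system would use full drug and side-effect vocabularies.
DRUGS = {"warfarin", "aspirin", "ibuprofen"}
SIDE_EFFECTS = {"bleeding", "nausea", "rash"}


def tokenize(sentence):
    """Lowercase word tokens; a deliberately simple stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9-]+", sentence.lower())


def mentions(tokens, lexicon):
    """Return (index, term) for every token found in the lexicon."""
    return [(i, t) for i, t in enumerate(tokens) if t in lexicon]


def candidate_pairs(sentence):
    """Yield (drug, se, middle) for each drug mention followed by an SE mention,
    where middle is the tuple of tokens between the two mentions."""
    tokens = tokenize(sentence)
    for di, drug in mentions(tokens, DRUGS):
        for si, se in mentions(tokens, SIDE_EFFECTS):
            if si > di:
                yield drug, se, tuple(tokens[di + 1:si])


def learn_patterns(sentences, seed_pairs, min_count=1):
    """Keep the token sequences that link known (seed) drug-SE pairs."""
    counts = Counter(
        middle
        for s in sentences
        for drug, se, middle in candidate_pairs(s)
        if (drug, se) in seed_pairs
    )
    return {p for p, c in counts.items() if c >= min_count}


def extract_pairs(sentences, patterns):
    """Apply the learned patterns to every sentence to propose drug-SE pairs."""
    return {
        (drug, se)
        for s in sentences
        for drug, se, middle in candidate_pairs(s)
        if middle in patterns
    }


if __name__ == "__main__":
    seeds = {("warfarin", "bleeding")}  # stands in for a label-derived known pair
    corpus = [
        "Warfarin induced bleeding in elderly patients.",
        "Aspirin induced nausea in two cases.",
        "Ibuprofen was compared with placebo for rash.",
    ]
    patterns = learn_patterns(corpus, seeds)
    print(patterns)  # {('induced',)}
    print(sorted(extract_pairs(corpus, patterns)))  # [('aspirin', 'nausea'), ('warfarin', 'bleeding')]
```

Using the literal in-between tokens as a pattern is the simplest possible choice and only handles a drug mentioned before its SE; phrasings such as "bleeding associated with warfarin" are missed, which is one reason practical systems generalize patterns (for example over parse structures) and rank them before use.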