Safa Fathiamini1, Amber M Johnson2, Jia Zeng2, Alejandro Araya1, Vijaykumar Holla2, Ann M Bailey2, Beate C Litzenburger2, Nora S Sanchez2, Yekaterina Khotskaya2, Hua Xu1, Funda Meric-Bernstam3, Elmer V Bernstam4, Trevor Cohen5.
1. School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA.
2. Sheikh Khalifa Al Nahyan Ben Zayed Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
3. Sheikh Khalifa Al Nahyan Ben Zayed Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Investigational Cancer Therapeutics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Surgical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
4. School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA; Division of General Internal Medicine, Department of Internal Medicine, The University of Texas Health Science Center at Houston, TX, USA.
5. School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA.
E-mail: trevor.cohen@uth.tmc.edu; elmer.v.bernstam@uth.tmc.edu.
Abstract
INTRODUCTION: Genomic profiling information is frequently available to oncologists, enabling targeted cancer therapy. Because clinically relevant information is rapidly emerging in the literature and elsewhere, there is a need for informatics technologies to support targeted therapies. To this end, we have developed a system for Automated Identification of Molecular Effects of Drugs to help biomedical scientists curate this literature and thereby facilitate decision support. OBJECTIVES: To create an automated system that identifies assertions in the literature concerning drugs targeting genes with therapeutic implications, and to characterize the challenges inherent in automating this process in rapidly evolving domains. METHODS: We used subject-predicate-object triples (semantic predications) and co-occurrence relations generated by applying the SemRep Natural Language Processing system to MEDLINE abstracts and ClinicalTrials.gov descriptions. We applied customized semantic queries to find drugs targeting genes of interest. The results were manually reviewed by a team of experts. RESULTS: Compared to a manually curated set of relationships, recall, precision, and F2 were 0.39, 0.21, and 0.33, respectively, which represents a 3- to 4-fold improvement over a publicly available set of predications (SemMedDB) alone. Upon review of ostensibly false positive results, 26% were considered relevant additions to the reference set, and an additional 61% were considered relevant for review. Adding co-occurrence data improved results for drugs in early development, but not for their better-established counterparts. CONCLUSIONS: Precision medicine poses unique challenges for biomedical informatics systems that help domain experts find answers to their research questions. Further research is required to improve the performance of such systems, particularly for drugs in development.
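The F2 value in the results is the F-beta measure with beta = 2, which weights recall more heavily than precision: F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). This suits a curation setting where missing a relevant drug-gene assertion costs more than reviewing an extra candidate. The minimal Python sketch below assumes this standard definition (the abstract does not spell out the formula; the function name `f_beta` is illustrative) and checks that the reported precision and recall reproduce the reported F2.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Weighted harmonic mean of precision and recall (standard F-beta).

    beta > 1 favors recall; beta = 2 treats recall as twice as important
    as precision. Returns 0.0 when both inputs are zero to avoid 0/0.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract (rounded to two decimals).
precision, recall = 0.21, 0.39
print(f"F2 = {f_beta(precision, recall):.2f}")  # → F2 = 0.33
```

Because F2 sits closer to recall (0.39) than to precision (0.21), the score reflects the design choice of favoring completeness, leaving the expert reviewers to filter out false positives.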