Jun Xu1, Hee-Jin Lee1, Jia Zeng2, Yonghui Wu1, Yaoyun Zhang1, Liang-Chin Huang1, Amber Johnson2, Vijaykumar Holla2, Ann M Bailey2, Trevor Cohen1, Funda Meric-Bernstam3, Elmer V Bernstam4, Hua Xu5. 1. School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA. 2. Institute for Personalized Cancer Therapy, University of Texas MD Anderson Cancer Center, Houston, TX, USA. 3. Institute for Personalized Cancer Therapy, University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Investigational Cancer Therapeutics, University of Texas MD Anderson Cancer Center, Houston, TX, USA. 4. School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA; Division of General Internal Medicine, Department of Internal Medicine, Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA. 5. School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA; hua.xu@uth.tmc.edu.
Abstract
OBJECTIVE: Clinical trials investigating drugs that target specific genetic alterations in tumors are important for promoting personalized cancer therapy. The goal of this project is to create a knowledge base of cancer treatment trials from ClinicalTrials.gov, annotated with the genetic alterations they involve.

METHODS: We developed a semi-automatic framework that combines advanced text-processing techniques with manual review to curate genetic alteration information in cancer trials. The framework consists of a document classification system that identifies cancer treatment trials in ClinicalTrials.gov and an information extraction system that extracts gene and alteration pairs from the Title and Eligibility Criteria sections of clinical trials. By applying the framework to trials at ClinicalTrials.gov, we created a knowledge base of cancer treatment trials with genetic alteration annotations. We then evaluated each component of the framework against manually reviewed sets of clinical trials and generated descriptive statistics of the knowledge base.

RESULTS AND DISCUSSION: The automated cancer treatment trial identification system achieved a precision of 0.9944. Together with the manual review process, it identified 20 193 cancer treatment trials from ClinicalTrials.gov. The automated gene-alteration extraction system achieved a precision of 0.8300 and a recall of 0.6803. After validation by manual review, we generated a knowledge base of 2024 cancer trials labeled with specific genetic alteration information. Analysis of the knowledge base revealed a trend toward increased use of targeted therapy for cancer, as well as the most frequent gene-alteration pairs of interest. We expect this knowledge base to be a valuable resource for physicians and patients who are seeking information about personalized cancer therapy.
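To make the extraction and evaluation steps concrete, the following is a minimal Python sketch of pulling gene-alteration pairs out of eligibility text and scoring them against a manually reviewed set using precision and recall. It is an illustration only, not the system described in the abstract: the gene list, the alteration pattern, the function names `extract_pairs` and `precision_recall`, and the example criteria sentence are all assumptions made for this sketch.

```python
import re

# Hypothetical mini-lexicon; a real system would rely on curated gene
# and alteration vocabularies rather than a hard-coded list.
GENES = ["BRAF", "EGFR", "KRAS", "ALK", "HER2"]

# Assumed alteration pattern: a simple protein change such as V600E,
# or one of a few generic alteration terms.
ALTERATION = r"(?:[A-Z]\d+[A-Z]|mutations?|amplification|rearrangement|fusion|deletion)"

# Matches a gene name immediately followed by an alteration term.
PAIR_RE = re.compile(
    r"\b(?P<gene>" + "|".join(GENES) + r")\b[\s-]+(?P<alt>" + ALTERATION + r")\b"
)


def extract_pairs(text):
    """Return the set of (gene, alteration) pairs found in the text."""
    return {(m.group("gene"), m.group("alt")) for m in PAIR_RE.finditer(text)}


def precision_recall(predicted, gold):
    """Score predicted pairs against a manually reviewed gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented eligibility text, used only to exercise the sketch.
criteria = (
    "Patients must have a documented BRAF V600E mutation or EGFR amplification; "
    "tumors with mutated KRAS are excluded."
)

predicted = extract_pairs(criteria)

# Hypothetical manual annotation for the sentence above.
gold = {("BRAF", "V600E"), ("EGFR", "amplification"), ("KRAS", "mutated")}

precision, recall = precision_recall(predicted, gold)
print(sorted(predicted), precision, recall)
```

In this toy case the pattern misses the pair in which the alteration precedes the gene ("mutated KRAS"), so recall falls below precision. Gaps of this kind are one reason a semi-automatic workflow pairs automated extraction with manual review.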