Hao Liu1, Yuan Chi1, Alex Butler1, Yingcheng Sun1, Chunhua Weng2. 1. Department of Biomedical Informatics, Columbia University, New York, NY, USA. 2. Department of Biomedical Informatics, Columbia University, New York, NY, USA. Electronic address: chunhua@columbia.edu.
Abstract
OBJECTIVE: We present the Clinical Trial Knowledge Base, a regularly updated knowledge base of discrete clinical trial eligibility criteria equipped with a web-based user interface for querying and aggregate analysis of common eligibility criteria. MATERIALS AND METHODS: We used a natural language processing (NLP) tool named Criteria2Query (Yuan et al., 2019) to transform free text clinical trial eligibility criteria from ClinicalTrials.gov into discrete criteria concepts and attributes encoded using the widely adopted Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) and stored in a relational SQL database. A web application accessible via RESTful APIs was implemented to enable queries and visual aggregate analyses. We demonstrate CTKB's potential role in EHR phenotype knowledge engineering using ten validated phenotyping algorithms. RESULTS: At the time of writing, CTKB contained 87,504 distinctive OMOP CDM standard concepts, including Condition (47.82%), Drug (23.01%), Procedure (13.73%), Measurement (24.70%) and Observation (5.28%), with 34.78% for inclusion criteria and 65.22% for exclusion criteria, extracted from 352,110 clinical trials. The average hit rate of criteria concepts in eMERGE phenotype algorithms is 77.56%. CONCLUSION: CTKB is a novel comprehensive knowledge base of discrete eligibility criteria concepts with the potential to enable knowledge engineering for clinical trial cohort definition, clinical trial population representativeness assessment, electronical phenotyping, and data gap analyses for using electronic health records to support clinical trial recruitment.
OBJECTIVE: We present the Clinical Trial Knowledge Base, a regularly updated knowledge base of discrete clinical trial eligibility criteria equipped with a web-based user interface for querying and aggregate analysis of common eligibility criteria. MATERIALS AND METHODS: We used a natural language processing (NLP) tool named Criteria2Query (Yuan et al., 2019) to transform free text clinical trial eligibility criteria from ClinicalTrials.gov into discrete criteria concepts and attributes encoded using the widely adopted Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) and stored in a relational SQL database. A web application accessible via RESTful APIs was implemented to enable queries and visual aggregate analyses. We demonstrate CTKB's potential role in EHR phenotype knowledge engineering using ten validated phenotyping algorithms. RESULTS: At the time of writing, CTKB contained 87,504 distinctive OMOP CDM standard concepts, including Condition (47.82%), Drug (23.01%), Procedure (13.73%), Measurement (24.70%) and Observation (5.28%), with 34.78% for inclusion criteria and 65.22% for exclusion criteria, extracted from 352,110 clinical trials. The average hit rate of criteria concepts in eMERGE phenotype algorithms is 77.56%. CONCLUSION: CTKB is a novel comprehensive knowledge base of discrete eligibility criteria concepts with the potential to enable knowledge engineering for clinical trial cohort definition, clinical trial population representativeness assessment, electronical phenotyping, and data gap analyses for using electronic health records to support clinical trial recruitment.
Authors: J Marc Overhage; Patrick B Ryan; Christian G Reich; Abraham G Hartzema; Paul E Stang Journal: J Am Med Inform Assoc Date: 2011-10-28 Impact factor: 4.497
Authors: Kenneth J Rothman; Ellen M Mikkelsen; Anders Riis; Henrik T Sørensen; Lauren A Wise; Elizabeth E Hatch Journal: Epidemiology Date: 2009-01 Impact factor: 4.822
Authors: Catherine A McCarty; Rex L Chisholm; Christopher G Chute; Iftikhar J Kullo; Gail P Jarvik; Eric B Larson; Rongling Li; Daniel R Masys; Marylyn D Ritchie; Dan M Roden; Jeffery P Struewing; Wendy A Wolf Journal: BMC Med Genomics Date: 2011-01-26 Impact factor: 3.063
Authors: Omri Gottesman; Helena Kuivaniemi; Gerard Tromp; W Andrew Faucett; Rongling Li; Teri A Manolio; Saskia C Sanderson; Joseph Kannry; Randi Zinberg; Melissa A Basford; Murray Brilliant; David J Carey; Rex L Chisholm; Christopher G Chute; John J Connolly; David Crosslin; Joshua C Denny; Carlos J Gallego; Jonathan L Haines; Hakon Hakonarson; John Harley; Gail P Jarvik; Isaac Kohane; Iftikhar J Kullo; Eric B Larson; Catherine McCarty; Marylyn D Ritchie; Dan M Roden; Maureen E Smith; Erwin P Böttinger; Marc S Williams Journal: Genet Med Date: 2013-06-06 Impact factor: 8.822
Authors: Cong Liu; Chi Yuan; Alex M Butler; Richard D Carvajal; Ziran Ryan Li; Casey N Ta; Chunhua Weng Journal: J Am Med Inform Assoc Date: 2019-11-01 Impact factor: 4.497
Authors: V Jenkins; V Farewell; D Farewell; J Darmanin; J Wagstaff; C Langridge; L Fallowfield Journal: Br J Cancer Date: 2013-03-19 Impact factor: 7.640
Authors: Kevin Wu; Eric Wu; Michael DAndrea; Nandini Chitale; Melody Lim; Marek Dabrowski; Klaudia Kantor; Hanoor Rangi; Ruishan Liu; Marius Garmhausen; Navdeep Pal; Chris Harbron; Shemra Rizzo; Ryan Copping; James Zou Journal: AAPS J Date: 2022-04-21 Impact factor: 4.009