
The COVID-19 Trial Finder.

Yingcheng Sun, Alex Butler, Fengyang Lin, Hao Liu, Latoya A Stewart, Jae Hyun Kim, Betina Ross S Idnay, Qingyin Ge, Xinyi Wei, Cong Liu, Chi Yuan, Chunhua Weng.

Abstract

Clinical trials are the gold standard for generating reliable medical evidence. The biggest bottleneck in clinical trials is recruitment. To facilitate recruitment, tools that help patients search for relevant clinical trials have been developed, but users often suffer from information overload. With nearly 700 coronavirus disease 2019 (COVID-19) trials conducted in the United States as of August 2020, it is imperative to enable rapid recruitment to these studies. The COVID-19 Trial Finder was designed to facilitate patient-centered search of COVID-19 trials, first by location and radius distance from trial sites, and then by brief, dynamically generated medical questions that allow users to prescreen their eligibility for nearby COVID-19 trials with minimal human-computer interaction. A simulation study using 20 publicly available patient case reports demonstrates its precision and effectiveness.
© The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.

Keywords:  COVID-19; clinical trial; eligibility criteria; information filtering; questionnaire; web application

Year:  2021        PMID: 33216120      PMCID: PMC7717322          DOI: 10.1093/jamia/ocaa304

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


INTRODUCTION

Patient-to-trial matching remains a critical bottleneck in clinical research, largely because clinical trial information is recorded as free text; eligibility criteria in particular are indispensable for screening patient eligibility and yet not amenable to even simple computation. Existing clinical trial search systems are either keyword based or questionnaire based. Keyword-based search engines, such as ClinicalTrials.gov, FindMeCure.com, Janssen Global Trial Finder, or ResearchMatch, require users to search for trials using keywords, which tends to make query formulation difficult and to generate information overload. Static questionnaire systems, such as Fox Trial Finder, filter out irrelevant trials by asking users to answer a long list of preselected questions, which can be laborious and is not user-friendly.
The coronavirus disease 2019 (COVID-19) pandemic is one of the greatest challenges modern medicine has faced. As of August 2020, there have been more than 6 million confirmed cases and 180 000 reported deaths in the United States, with few approved treatments. In response to the COVID-19 emergency, clinical trials assessing the efficacy and safety of COVID-19 treatments are being created at an unprecedented rate. As of August 31, 2020, well over 3100 clinical trials have been registered in ClinicalTrials.gov, the largest clinical trial registry in the world. The need for rapid and accessible trial search tools has never been more apparent.
In this article, we describe the COVID-19 Trial Finder, an open-source semantic search engine for COVID-19 clinical trials conducted in the United States, which extends our previously published method of using dynamically generated questionnaires to enable efficient clinical trial search. It is an interactive COVID-19 trial search engine that generates a minimized, dynamic questionnaire in real time in response to user-provided answers.
It is powered by a regularly updated machine-readable dataset of all the COVID-19 trials in the United States. It is also enhanced with a Web-based visualization of the geographic distribution of COVID-19 trials in the United States to enable user-friendly navigation of the trial space. By facilitating search for appropriate COVID-19 trials in specific geographic areas, the system enables research volunteers to perform self-screening using the eligibility criteria of these COVID-19 trials. Further, it allows clinical trialists to assess the landscape of COVID-19 trials by eligibility criteria and geographical location in order to identify collaboration opportunities for similar COVID-19 studies and to improve trial response to evolving case surges. The system is accessible online (https://covidtrialx.dbmi.columbia.edu), and its source code is available at https://github.com/WengLab-InformaticsResearch/COVID19-TrialFinder. We evaluated the system on 20 published COVID-19 case reports and demonstrated its precision and efficiency.

MATERIALS AND METHODS

The COVID-19 Trial Finder consists of 2 modules, for trial indexing and trial retrieval, respectively. The trial indexing module works offline to extract entities and attributes from eligibility criteria text and to create a trial index using semantic tags, which are the extracted terms mapped to standardized clinical concepts. The retrieval module dynamically generates medical questions and iteratively filters out trials based on user answers until a sufficiently shortened list of trials is generated. Figure 1 shows the system architecture.
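The indexing idea described above (extracted terms mapped to standardized concepts, which then serve as semantic tags for a trial index) can be illustrated with a small inverted index. This is only a minimal sketch under stated assumptions: the `TERM_TO_CONCEPT` lookup is a hypothetical stand-in for the entity normalization step, the trial identifiers and terms are made up, and the function names are not taken from the released source code.

```python
from collections import defaultdict

# Hypothetical normalization table standing in for the entity
# normalization step; the real system maps terms to OMOP concepts.
TERM_TO_CONCEPT = {
    "shortness of breath": "dyspnea",
    "difficulty breathing": "dyspnea",
    "high blood pressure": "hypertension",
}


def normalize(term: str) -> str:
    """Map an extracted term to its concept name (falls back to the term itself)."""
    key = term.strip().lower()
    return TERM_TO_CONCEPT.get(key, key)


def build_index(trial_terms: dict) -> dict:
    """Map each semantic tag to the set of trial IDs whose criteria mention it."""
    index = defaultdict(set)
    for trial_id, terms in trial_terms.items():
        for term in terms:
            index[normalize(term)].add(trial_id)
    return dict(index)


# Made-up trial IDs and extracted terms, for illustration only.
extracted = {
    "TRIAL-A": ["Shortness of breath", "fever"],
    "TRIAL-B": ["difficulty breathing"],
    "TRIAL-C": ["high blood pressure", "fever"],
}
index = build_index(extracted)
print(sorted(index["dyspnea"]))
```

Keying the index by concept rather than by surface term is what lets two differently worded criteria be answered by a single question.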
Figure 1.

System architecture. (A) The trial indexing module works offline. (B) The trial retrieval module interacts with users.

Clinical trial eligibility criteria exist largely as free text, so they must be formalized into a machine-readable syntax to allow for semantic trial retrieval. In the trial indexing module, COVID-19–related trials are acquired from ClinicalTrials.gov by querying all trials indexed with “COVID-19” as their condition. Using a semi-automated method, their eligibility criteria are structured and formatted using the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), first by an automated tool, Criteria2Query, and then verified and corrected manually as needed by medical domain experts (A.B., J.K., L.S.) to overcome the limitations of Criteria2Query. With international terminologies among its terminology sets, the OMOP CDM provides a comprehensive standard vocabulary for representing clinical concepts commonly available in patient data. For example, in our system the phrase “shortness of breath” in the criteria text is first extracted by Criteria2Query through its named entity recognition module and then mapped to the concept “dyspnea” as the semantic tag through the entity normalization module; this tag is then used to index the associated trials. We provide a dataset with this article that includes 581 COVID-19 trials annotated with 10 223 semantic tags, an average of 17.6 tags per trial. After removing duplicates, 1811 distinct tags are used for the trial index. Supplementary Table 1 lists a subset of the dataset. Newly registered COVID-19 trials are added to the database weekly, and the trial index list is updated accordingly. Entities from the following 5 domains are used for question generation: condition, device, drug, measurement, and procedure. Domain-specific question templates are provided in Supplementary Table 2.
For example, “Have you ever been diagnosed with (condition concept)?” is the question template for condition entities. The templates do not exhaustively cover all possible questions but are designed to triage trials by building on prior knowledge of common eligibility criteria.
The trial retrieval module interacts with users and asks criteria-related questions to facilitate eligibility determination. Users first enter their location (eg, zip code) and then select a study type (eg, interventional, observational). Next, the 5 most frequently used criteria, covering current age, high-risk status (eg, hospital worker), COVID-19 status, current hospitalization or intensive care unit admission, and pregnancy status, are formulated as “standard questions” and posed to all users. Because these criteria are frequently used across all COVID-19 trials and thus serve as an important participant stratification step, they are posed together on a single page instead of via dynamically generated questions. Afterward, the eligibility criterion with the maximum information gain (ie, with the highest entropy) is selected and rendered into a question using the corresponding template. Based on the user’s answer to the presented question, trials for which the user is ineligible are filtered out. This process iterates, each time narrowing down the pool of candidate trials and visualizing the recruiting sites of the remaining eligible trials on an interactive map, until the user reaches a short list of trials.
Four main Web interfaces are shown in Figure 2. On the index page, users can specify the geographical area in which to search for recruiting sites by entering a zip code and an adjustable radius in section 1. Section 2 is a collapsible section containing advanced search options such as trial type, recruiting status, and keyword search.
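The location step on the index page (a zip code plus an adjustable radius) amounts to keeping only recruiting sites within a given distance of the user. The article does not state how the system computes this distance, so the following is a minimal sketch that assumes great-circle (haversine) distance and assumes the zip code has already been converted to coordinates; the coordinates are approximate and the trial identifiers are made up.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def sites_within(user, sites, radius_miles):
    """Keep the recruiting sites that lie within radius_miles of the user."""
    lat, lon = user
    return [s for s in sites
            if haversine_miles(lat, lon, s["lat"], s["lon"]) <= radius_miles]


# Approximate coordinates and made-up trial IDs, for illustration only.
user = (40.84, -73.94)  # roughly upper Manhattan
sites = [
    {"trial": "TRIAL-A", "lat": 40.71, "lon": -74.01},  # lower Manhattan
    {"trial": "TRIAL-B", "lat": 39.95, "lon": -75.17},  # Philadelphia
    {"trial": "TRIAL-C", "lat": 42.36, "lon": -71.06},  # Boston
]
nearby = sites_within(user, sites, radius_miles=100)
print([s["trial"] for s in nearby])
```

With a 100-mile radius, the two nearer sites are kept and the Boston site is dropped.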
Five “standard questions” are expected to be answered in section 3; users can skip them by clicking the “skip questions” button in section 4 and proceed to the dynamic questionnaire page. There, users can select the question type and answer one question at a time in section 5, and the candidate trial list is dynamically updated and shown in section 6. The answered questions are recorded in section 7, where users can return to previous questions to update their answers. After clicking the “Show Eligible Trials” button, users can view the eligible trials geographically. Eligible trials are listed by title in section 8, and all recruiting sites within the user-delimited area are marked with small green icons on the map in section 9. The embedded map is powered by the Google Maps application programming interface. Users can select any trial to review additional details such as study type, description, contact information, and location(s) in section 10; its recruiting sites within the geographic area specified in section 1 are then highlighted and pinpointed on the map. Additionally, participants interested in learning more about the study can follow a link to ClinicalTrials.gov.
The initial version of the COVID-19 Trial Finder was released in May, and more than 690 page visits from 20 countries were recorded by the end of August 2020, including 615 visits from 40 states in the United States, according to Google Analytics.
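The dynamic questionnaire step of the retrieval module (select the criterion with the highest entropy, render it with a template, filter by the answer, repeat) can be sketched as below. This is an illustrative sketch, not the released implementation: it assumes each trial tags a concept as either required (`"inc"`) or excluded (`"exc"`), that entropy is computed over how a concept splits the remaining candidates, and that a yes/no answer removes trials whose criterion conflicts with it. Only the condition template is quoted from the article; the drug template, trial IDs, and criteria are hypothetical.

```python
import math
from collections import Counter

# Question templates keyed by domain. The condition template is quoted from
# the article; the drug template is a hypothetical stand-in.
TEMPLATES = {
    "condition": "Have you ever been diagnosed with {concept}?",
    "drug": "Are you currently taking {concept}?",
}

# Each trial maps a (domain, concept) tag to "inc" (required) or "exc"
# (excluded). Trial IDs and criteria are made up for illustration.
TRIALS = {
    "T1": {("condition", "dyspnea"): "inc", ("condition", "diabetes"): "exc"},
    "T2": {("condition", "dyspnea"): "exc"},
    "T3": {("condition", "diabetes"): "exc"},
    "T4": {("condition", "dyspnea"): "inc"},
}


def entropy(tag, candidates):
    """Shannon entropy of how a tag splits candidates into inc/exc/absent."""
    counts = Counter(TRIALS[t].get(tag, "absent") for t in candidates)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def next_question(candidates, asked):
    """Pick the unasked tag with the highest entropy, or None if none remain."""
    tags = {tag for t in candidates for tag in TRIALS[t]} - asked
    if not tags:
        return None
    return max(sorted(tags), key=lambda tag: entropy(tag, candidates))


def apply_answer(candidates, tag, has_it):
    """Drop trials whose criterion on this tag conflicts with the answer."""
    conflict = "exc" if has_it else "inc"
    return [t for t in candidates if TRIALS[t].get(tag) != conflict]


def render(tag):
    """Turn a (domain, concept) tag into a question using its domain template."""
    domain, concept = tag
    return TEMPLATES[domain].format(concept=concept)


candidates = sorted(TRIALS)
tag = next_question(candidates, asked=set())
print(render(tag))
candidates = apply_answer(candidates, tag, has_it=True)
print(candidates)
```

In this toy example the dyspnea tag splits the four trials most evenly, so it is asked first, and a "yes" answer removes the one trial that excludes dyspnea.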
Figure 2.

Overview of the 4 main COVID-19 Trial Finder Web interfaces: (A) index page, (B) standard question page, (C) dynamic questionnaire page, and (D) visualization page. Sections 1-10 indicate 10 different features.

EXPERIMENT

We evaluated the effectiveness of the system by assessing its precision in identifying appropriate trials for users. We selected 20 patient cases in the U.S. from COVID-19 case reports curated by LitCOVID, with consideration for diversity in location, age, sex, and comorbidities, and ran simulations on our system based on these patient cases. Detailed information on the 20 cases can be found in Supplementary Table 3. For each case report, the zip code was based on the corresponding author’s address stated in the case report. We set the default radius to 100 miles to ensure adequate coverage of available trials. The 5 standard questions and multiple dynamic questions were answered based on the patient profile and reported symptoms. We continued answering the dynamic questions until no more questions could be generated and the system prompted a review of the returned trial list. An example of the question-answering process is shown in Supplementary Table 4. We then manually reviewed each identified clinical trial in the final list to confirm its relevance to the user query by examining the inclusion and exclusion criteria available at ClinicalTrials.gov against the patient case report (ie, the clinical trial was deemed relevant if the user met all inclusion criteria and no exclusion criteria). If the case report contained no information to determine whether the patient met any of the exclusion criteria, we considered the clinical trial relevant because eligibility could be further determined by the clinical research staff once the user initiates contact for possible participation. Finally, we calculated the individual search precision for each patient case as the number of trials manually confirmed relevant divided by the number of identified trials in the final list. The precision averaged over the 20 user cases was calculated to indicate the system precision.
Next, we evaluated the efficiency of the system by comparing the number of trials identified at each step of the search. The percentage of trials being filtered was the difference between the number of trials remaining after the 5 standard questions and the number remaining after the dynamically generated questions, divided by the number of trials remaining after the 5 standard questions.
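The two evaluation measures can be written out as a short worked example. This is a sketch with assumed function names: precision is taken as the share of trials in the final list confirmed relevant, and the percentage filtered uses the formula that reproduces the values reported in Table 1 (trials removed by the dynamic questions, relative to the count after the 5 standard questions). The counts for case 4 come from Table 1; the relevant-trial count of 13 is an inferred, illustrative value consistent with the reported precision of 0.93.

```python
def search_precision(num_relevant: int, num_identified: int) -> float:
    """Fraction of trials in the final list that manual review confirmed relevant."""
    if num_identified == 0:
        raise ValueError("final list is empty; precision is undefined")
    return num_relevant / num_identified


def percent_filtered(after_standard: int, after_screening: int) -> float:
    """Percentage of trials removed by the dynamic questions, relative to the
    number remaining after the standard questions."""
    if after_standard == 0:
        return 0.0
    return 100.0 * (after_standard - after_screening) / after_standard


# Case 4 in Table 1: 27 trials after the standard questions, 14 after screening.
print(round(percent_filtered(27, 14)))     # 48, matching the reported 48%
# 13 relevant trials is an inferred, illustrative count (13/14 rounds to 0.93).
print(round(search_precision(13, 14), 2))  # 0.93
```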

RESULTS

Table 1 demonstrates the diversity of the user cases. Patient age ranged from 43 days to 80 years, with 25% under 30 years of age, 55% between 30 and 60 years of age, and 20% over 60 years of age. The locations were distributed across 10 states in the United States encompassing 13 different counties. On average, 14 questions were answered for each patient case report, yielding an average precision of 79.76% in finding eligible trials. Because the number of trials found by the system varied considerably for each case (eg, only 1 trial was identified for cases 3 and 20, while 34 were identified for case 10), precision was normalized by the number of trials after screening for each case. On average, 34.8% of trials were filtered out after answering 9 dynamic questions, which is consistent with the experimental results for efficiency evaluation of DQueST.
Table 1.

Precision of COVID-19 Trial Finder in finding eligible trials for 20 user cases

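| Case | PubMed ID | Age | Sex | Location | Questions answered | Start number of trials | Trials after 5 standard questions | Trials after screening | Trials being filtered | Precision |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 32633553 | 4 y | M | New York, NY | 9 | 116 | 5 | 4 | 20% | 1 |
| 2 | 32240285 | 26 y | M | Maricopa County, AZ | 9 | 23 | 16 | 9 | 44% | 1 |
| 3 | 32522037 | 57 y | M | Ashland, KY | 6 | 10 | 1 | 1 | 0% | 1 |
| 4 | 32314699 | 56 y | F | North Chicago, IL | 22 | 35 | 27 | 14 | 48% | 0.93 |
| 5 | 32351860 | 80 y | M | Atlanta, GA | 7 | 24 | 19 | 13 | 32% | 0.92 |
| 6 | 32222713 | 56 y | M | Orange County, LA | 10 | 35 | 26 | 10 | 62% | 0.9 |
| 7 | 32464707 | 33 y | F | New York, NY | 25 | 116 | 60 | 24 | 60% | 0.88 |
| 8 | 32237670 | 34 y | F | Washington, DC | 8 | 66 | 21 | 21 | 0% | 0.86 |
| 9 | 32328364 | 74 y | M | Boca Raton, FL | 14 | 30 | 20 | 6 | 70% | 0.83 |
| 10 | 32282312 | 20 y | M | New York, NY | 24 | 116 | 60 | 34 | 43% | 0.82 |
| 11 | 32004427 | 35 y | M | Snohomish County, WA | 14 | 26 | 14 | 14 | 0% | 0.79 |
| 12 | 32592843 | 48 y | M | Newark, NJ | 24 | 110 | 34 | 27 | 21% | 0.78 |
| 13 | 32720233 | 67 y | F | New York, NY | 23 | 116 | 47 | 26 | 45% | 0.75 |
| 14 | 32322478 | 48 y | F | New York, NY | 19 | 116 | 41 | 8 | 80% | 0.75 |
| 15 | 32330356 | 54 y | M | Seattle, WA | 10 | 26 | 4 | 4 | 0% | 0.75 |
| 16 | 32404431 | 43 d | M | New York, NY | 12 | 98 | 5 | 4 | 20% | 0.75 |
| 17 | 32220208 | 73 y | F | King County, WA | 10 | 26 | 14 | 10 | 29% | 0.7 |
| 18 | 32375150 | 49 y | M | New York, NY | 20 | 116 | 92 | 68 | 26% | 0.68 |
| 19 | 32322478 | 53 y | M | New York, NY | 20 | 116 | 53 | 49 | 8% | 0.38 |
| 20 | 32368493 | 21 y | M | Miami-Dade, FL | 6 | 16 | 8 | 1 | 88% | 0 |
| Average | | | | | | | | | 34.8% | 79.76% |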

F: female; M: male.


DISCUSSION

A small number of identified trials were confirmed as irrelevant by manual review. On review, the imprecision in finding eligible trials was largely caused by the inability to generate relevant questions. For a few criteria, no questions were asked to filter out ineligible trials; these limitations fall into 3 types: location, identity, and condition. Examples, together with the unmatched criteria and the causes of the errors, are given in Table 2.
Table 2.

Examples of the 3 types of missing questions that prevent ineligible trials from being filtered out.

| Limitation type | Case No. | ID | Criteria | Error cause |
|---|---|---|---|---|
| Location | 10 | NCT04367831 | INC: New admission to eligible CUIMC ICUs within 5 d | Location question lacks granularity |
| Location | 18 | NCT04358029 | INC: Patients who have been diagnosed with COVID-19 infection at Mount Sinai Hospital | Location question lacks specificity (eg, diagnosis location) |
| Identity | 7 | NCT04349371 | INC: Employment by NewYork-Presbyterian Hospital | No question about employment |
| Identity | 11 | NCT04360850 | INC: Must be a licensed mental healthcare provider | No question about job title |
| Identity | 13 | NCT04414371 | INC: Enrolled in 4-y universities/colleges in 2020 | No question about student status |
| Condition | 4 | NCT04350593 | EXC: Severe COVID-19 | No severity question |
| Condition | 20 | NCT04431856 | INC: Have a child between 6 and 13 y | No question asked about offspring information |

COVID-19: coronavirus disease 2019; CUIMC: Columbia University Irving Medical Center; EXC: exclusion criteria; ICU: intensive care unit; INC: inclusion criteria.

The “location” limitation refers to insufficient granularity or specificity in our question template for locations. The “identity” limitation reflects the lack of questions about the identity of the participant; for example, some trials recruit clinical therapists rather than COVID-19–infected patients. For the “condition” limitation, concepts may be extracted incorrectly or missed, so that concepts are mismatched to the terms. For example, the word “severe” can be a qualifier for a condition rather than part of the condition definition. To improve the relevance and precision of trial filtering, the system could use a more granular annotation model that covers more entities and attributes, as well as a wider range of domain types within the OMOP model, such as visit, person, or observation. Further, additional question templates would allow more questions to be posed. Considering the tradeoff between finer granularity in the annotation model and the increase in annotation cost, we did not add more questions in this study, but future efforts can explore how to efficiently annotate more types of criteria to boost the precision of trial matching while maintaining the usability and ease of access needed to sustain user participation. Currently, we include only COVID-19 trials conducted in the United States in the Trial Finder application, to keep the scope manageable for evaluation and to avoid the engineering work of translating the system into other languages. Our open-source method is available for adoption and implementation by researchers across the world.
We compared the inclusion and exclusion criteria of 777 COVID-19 trials in the United States and 2318 COVID-19 trials outside the United States registered on ClinicalTrials.gov by October 1, 2020, and found 42.3% overlap (87.0% when infrequent criteria, defined as criteria that appear in <10 trials, are excluded). Criteria that differ can also be indexed with standard concepts and searched through the corresponding questions because the OMOP CDM includes international terminologies. Applying the Trial Finder system to non-U.S. trials in the future therefore appears both interesting and feasible.
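The article does not specify how the criteria overlap was measured. As one possible reading, the sketch below assumes that each trial is represented by its set of normalized criteria concepts, that overlap is the Jaccard overlap between the U.S. and non-U.S. concept sets, and that infrequent criteria are dropped per group by a trial-count threshold. The data are toy values and the threshold is lowered to 2 (the article uses 10) so the effect is visible on a handful of trials.

```python
from collections import Counter


def frequent_criteria(criteria_by_trial: dict, min_trials: int = 10) -> set:
    """Criteria concepts that appear in at least min_trials trials."""
    counts = Counter(c for crits in criteria_by_trial.values() for c in set(crits))
    return {c for c, n in counts.items() if n >= min_trials}


def overlap_pct(a: set, b: set) -> float:
    """Jaccard overlap of two sets as a percentage (an assumed definition)."""
    union = a | b
    return 100.0 * len(a & b) / len(union) if union else 0.0


# Toy data with made-up trial IDs, for illustration only.
us = {
    "US-1": ["fever", "dyspnea"],
    "US-2": ["fever", "dyspnea", "pregnancy"],
    "US-3": ["fever"],
}
non_us = {
    "X-1": ["fever", "dyspnea"],
    "X-2": ["fever", "hiv"],
    "X-3": ["dyspnea"],
}

all_overlap = overlap_pct(frequent_criteria(us, 1), frequent_criteria(non_us, 1))
freq_overlap = overlap_pct(frequent_criteria(us, 2), frequent_criteria(non_us, 2))
print(all_overlap, freq_overlap)
```

As in the reported figures, the overlap rises once criteria that occur in only a few trials are excluded, because rare, trial-specific criteria account for most of the mismatch.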

CONCLUSION

The COVID-19 Trial Finder facilitates fast search and self-eligibility screening for COVID-19 trial seekers. Despite its limitations, preliminary evaluation by emulated case reports demonstrates its precision and efficiency, showing its potential as a user-friendly COVID-19 trial search engine.

FUNDING

This work was supported by the National Library of Medicine grant R01LM009886-11 (Bridging the Semantic Gap Between Research Eligibility Criteria and Clinical Data) and National Center for Advancing Clinical and Translational Science grants UL1TR001873 and 3U24TR001579-05 (to CW).

AUTHOR CONTRIBUTIONS

YS, AB, and CW conceived the system design together. YS, AB, FL, and CL designed and implemented the system. CW supervised the design and implementation. HL, LAS, JHK, and CY contributed to the data annotation. BRSI, QG, and XW contributed to the evaluation of the system. All authors edited and approved the manuscript.

SUPPLEMENTARY MATERIAL

Supplementary material is available at Journal of the American Medical Informatics Association online.

DATA AVAILABILITY STATEMENTS

The data underlying this article are available in Dryad Digital Repository, at https://doi.org/10.5061/dryad.7h44j0zs9 (https://datadryad.org/stash/share/XWwjmqkOcRkXofSvPg-XNCahexMbEjGf4gea07KTFeA).

CONFLICT OF INTEREST

None declared.
REFERENCES

1. Qingyu Chen, Alexis Allot, Zhiyong Lu. Keep up with the latest coronavirus research. Nature. 2020-03.
2. Chi Yuan, Patrick B Ryan, Casey Ta, Yixuan Guo, Ziran Li, Jill Hardin, Rupa Makadia, Peng Jin, Ning Shang, Tian Kang, Chunhua Weng. Criteria2Query: a natural language interface to clinical databases for cohort definition. J Am Med Inform Assoc. 2019-04-01.
3. Jill M Pulley, Rebecca N Jerome, Gordon R Bernard, Erik J Olson, Jason Tan, Consuelo H Wilkins, Paul A Harris. Connecting the public with clinical trial options: The ResearchMatch Trials Today tool. J Clin Transl Sci. 2018-08.
4. Cong Liu, Chi Yuan, Alex M Butler, Richard D Carvajal, Ziran Ryan Li, Casey N Ta, Chunhua Weng. DQueST: dynamic questionnaire for search of clinical trials. J Am Med Inform Assoc. 2019-11-01.
5. Tian Kang, Shaodian Zhang, Youlan Tang, Gregory W Hruby, Alexander Rusanov, Noémie Elhadad, Chunhua Weng. EliIE: An open-source information extraction system for clinical trial eligibility criteria. J Am Med Inform Assoc. 2017-11-01.
6. Fabrício Kury, Alex Butler, Chi Yuan, Li-Heng Fu, Yingcheng Sun, Hao Liu, Ida Sim, Simona Carini, Chunhua Weng. Chia, a large annotated corpus of clinical trial eligibility criteria. Sci Data. 2020-08-27.
7. Benjamin D Wissel, P J Van Camp, Michal Kouril, Chad Weis, Tracy A Glauser, Peter S White, Isaac S Kohane, Judith W Dexheimer. An interactive online dashboard for tracking COVID-19 in U.S. counties, cities, and states in real time. J Am Med Inform Assoc. 2020-07-01.
