Neha Jain1, Kathleen F Mittendorf1, Marilyn Holt1, Michele Lenoue-Newton1, Ian Maurer2, Clinton Miller2, Matthew Stachowiak2, Michelle Botyrius2, James Cole2, Christine Micheel1,3, Mia Levy4,5.
Abstract
OBJECTIVE: As clinical trials evolve in complexity, clinical trial data models that can capture relevant trial data in meaningful, structured annotations and computable forms are needed to support accrual.
Year: 2020 PMID: 32483629 PMCID: PMC7647323 DOI: 10.1093/jamia/ocaa066
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Figure 1. Clinical trial model and workflow schematic. This high-level schematic describes the curation model components and workflow. Clinical trial documents are pulled into the clinical trial dataset from publicly available sources. Using the web-based interface, a curator creates structured assertions for trials, drawing on the terminologies and concept groups available in the data model. A single clinical trial can be broken down into multiple individual treatment cohort assertions (TCAs), each corresponding to a separate treatment arm. Once the assertions are created, they undergo a secondary manual review before being published into the clinical trial knowledge base. This knowledge base is used for clinical trial matching, for display on the My Cancer Genome website, and for multiple downstream applications.
Components of clinical trial eligibility criteria assertion model
| Category | Concepts | Definitions | Examples |
|---|---|---|---|
| Core model concepts | Eligibility criteria assertion (ECA) | Structured diagnosis with or without associated biomarker criteria as described in the trial document | Breast Cancer: [ER Negative AND PR Negative AND HER2 Negative] AND [NONE: TP53 Germline Mutations] |
| | Treatment context assertion (TCA) | Combination of the intention of treatment or sequencing of treatment and the treatment agents themselves as described in the trial document | [Niraparib AND Radiation Therapy] AND [Adjuvant Setting] |
| | Treatment arm assertion (TAA) | Linking the patient population (diagnosis and biomarker criteria) and treatment arms outlined in clinical trial document. Achieved via linking the ECA & TCA | “Breast Cancer: [ER Negative AND PR Negative AND HER2 Negative] AND [NONE TP53 Somatic or Germline Mutations]” & “[Niraparib AND Radiation Therapy] AND [Adjuvant Setting]” |
| Terminologies | Diagnosis | Solid tumor, hematologic or lymphoid malignancy. The mapping structure means that multiple disease synonyms will match to the same disease, ie, hepatic cancer, hepatic carcinoma, cancer of the liver, liver cancer, and hepatocellular carcinoma would all map to the concept of hepatocellular carcinoma | breast carcinoma, head and neck squamous cell carcinoma, acute myeloid leukemia |
| | Biomarker | Includes biomarkers related to gene variants (mutation, deletion, fusion, amplification, loss); protein variants (expression, over-expression, deficient expression); cytogenetic/chromosomal abnormalities (duplication, deletion, monosomy, trisomy, karyotype, translocations, inversions); viral markers (EBV, HPV, KSHV, MCPyV, etc.); serological markers (HLA, HLB markers); epigenetic markers (methylation status); specialty markers (microsatellite instability, tumor mutation, MMR status) | MYC amplification, MSH2 loss, ERBB2 overexpression, monosomy 7, complex karyotype, t(9; 11)(p21; q23), KMT2A Fusion, dup(1)(q10qter), EBV positive, HLA-A*02:05, MGMT promoter methylation positive, MSI-High, TMB-Low, dMMR |
| | Therapies | Any therapeutic approach used in clinical trial document and defined on NCIt including, but not limited to, targeted therapy, immunotherapy, hormonal therapy, cytotoxic agents, monoclonal antibodies, antibody-drug conjugates, vaccine therapy, and hematopoietic and bone marrow transplantation (surgical interventions and radiation therapy subtypes are excluded) | osimertinib, larotrectinib, tamoxifen, pembrolizumab, oxaliplatin, trastuzumab, brentuximab vedotin, Lu-177-DOTA-TATE, CAR T-cell therapy |
| | Therapeutic context | Describes the clinical context for treatment as would be relevant to a patient's disease state | Neoadjuvant, adjuvant, metastatic, treatment-naïve, relapse, refractory, etc. |
| Concept groups | Disease groups | A meaningful grouping of diagnoses usually based on organ systems, similarity of disease biology or other commonalities | Urinogenital Cancer group: Extragonadal Embryonal Carcinoma, Renal Pelvis and Ureter Carcinoma, Urothelial Carcinoma, Uterine Corpus Neuroendocrine Neoplasm, Bladder Small Cell Neuroendocrine Carcinoma, Cervix Carcinoma, Malignant Bladder Neoplasm, Malignant Ovarian Germ Cell Tumor, Malignant Renal Pelvis Neoplasm, Malignant Reproductive System Neoplasm, Malignant Ureter Neoplasm, Malignant Urethral Neoplasm, Ovarian Embryonal Carcinoma, Testicular Embryonal Carcinoma, Transitional Cell Carcinoma, Ureter Small Cell Carcinoma |
| | Biomarker groups | A grouping of biomarker concepts usually to accommodate biomarkers that frequently appear together in clinical trials, or are related via a single pathway, or are usually altered in a particular disease. Can also be used to accommodate NCCN-approved risk biomarker groups for prognostic risk or diagnostic classification | 11q23 abnormalities: del(11)(q10), KMT2A-AFF1 Fusion, KMT2A Fusion, KMT2A-MLLT3 Fusion, inv(11)(p15q23), KMT2A-ELL Fusion, KMT2A-MLLT10 Fusion, KMT2A-MLLT1 Fusion, KMT2A-MLLT4 Fusion, t(10; 11)(p12; q23), t(11; 19)(q23; p13.1), t(11; 19)(q23; p13.3), t(4; 11)(q21; q23), t(6; 11)(q27; q23), t(9; 11)(p21; q23), Trisomy 11 |
| | Drug groups | A grouping of drugs based on drug categories/classes that have similar mechanism of action or usually appear together in trial documents. Can accommodate drug groups that cannot be directly derived from the drug ontology in NCIt | FDA approved aromatase inhibitors: exemestane, anastrozole, letrozole |
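The relationship between the three core assertions can be illustrated with a minimal Python sketch. This is not the system's implementation: the class names, field names, `normalize_diagnosis`, and the flat required/excluded biomarker sets are all assumptions made for illustration, and the real model supports richer boolean logic and NCIt-backed terminologies. The synonym map and the drug group are copied from the table's examples.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set

# Illustrative synonym map, taken from the Diagnosis row of the table.
DIAGNOSIS_SYNONYMS = {
    "hepatic cancer": "hepatocellular carcinoma",
    "hepatic carcinoma": "hepatocellular carcinoma",
    "cancer of the liver": "hepatocellular carcinoma",
    "liver cancer": "hepatocellular carcinoma",
}

# Illustrative concept group, taken from the Drug groups row of the table.
DRUG_GROUPS = {
    "FDA approved aromatase inhibitors": frozenset(
        {"exemestane", "anastrozole", "letrozole"}
    ),
}


def normalize_diagnosis(term: str) -> str:
    """Map a disease synonym to its concept name (hypothetical helper)."""
    key = term.strip().lower()
    return DIAGNOSIS_SYNONYMS.get(key, key)


@dataclass(frozen=True)
class EligibilityCriteriaAssertion:
    """ECA: a diagnosis plus required and excluded biomarkers (simplified)."""
    diagnosis: str
    required_biomarkers: FrozenSet[str] = frozenset()
    excluded_biomarkers: FrozenSet[str] = frozenset()

    def matches(self, diagnosis: str, biomarkers: Set[str]) -> bool:
        same_disease = (
            normalize_diagnosis(diagnosis) == normalize_diagnosis(self.diagnosis)
        )
        has_required = self.required_biomarkers <= biomarkers
        has_excluded = bool(self.excluded_biomarkers & biomarkers)
        return same_disease and has_required and not has_excluded


@dataclass(frozen=True)
class TreatmentContextAssertion:
    """TCA: treatment agents plus the therapeutic context."""
    therapies: FrozenSet[str]
    context: str


@dataclass(frozen=True)
class TreatmentArmAssertion:
    """TAA: links one ECA to one TCA."""
    eca: EligibilityCriteriaAssertion
    tca: TreatmentContextAssertion


# The breast cancer example from the Core model concepts rows.
eca = EligibilityCriteriaAssertion(
    diagnosis="Breast Cancer",
    required_biomarkers=frozenset({"ER Negative", "PR Negative", "HER2 Negative"}),
    excluded_biomarkers=frozenset({"TP53 Germline Mutation"}),
)
tca = TreatmentContextAssertion(
    therapies=frozenset({"Niraparib", "Radiation Therapy"}),
    context="Adjuvant Setting",
)
taa = TreatmentArmAssertion(eca=eca, tca=tca)
```

Keeping the ECA and TCA as separate objects that a TAA merely links mirrors the table's description, in which one trial can yield several arms that reuse the same patient population or the same treatment context.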
Figure 2. Core model concepts: framework and instance. The figure shows the components of the assertions defined as core model concepts. The top section indicates the framework for eligibility criteria assertion, therapeutic context assertion, and treatment arm assertion; and the bottom section presents a real-world trial example.
Figure 3. Clinical trial curation workflow and results. (A) A broad overview of the curation workflow. There are currently 67 479 cancer-related clinical trials loaded into the system (as of 10/30/2019). Of these, 15 578 were found to have a status of recruiting or not-yet-recruiting. Of these, 9855 trials were automatically flagged for manual review as possibly containing biomarker keywords. According to the curation SOP, of the 9855 manually reviewed clinical trials, 5045 met criteria for manual curation of disease-biomarker eligibility criteria and treatment context, and the remaining 4810 trials were considered out of scope. The trials that had a biomarker-driven eligibility criterion were curated. To date, we have manually curated and created structured annotations for 5045 clinical trials. A detailed copy of the SOP is provided in the Supplementary Material. Trials included for manual curation have a recruitment status of “Recruiting” or “Not yet recruiting” and (i) are interventional, (ii) are directed toward treating cancer (not toward treating side effects or toxicities caused by cancer treatments), and (iii) contain biomarker-driven eligibility criteria (the patient’s tumor is required to have a specific biomarker to enroll on the trial). (B) The different biomarker types supported by the system for clinical trial curation. Genomic biomarkers make up the largest category, followed by protein, cytogenetic, viral, serological, and epigenetic-related biomarkers. The numbers in parentheses on the x-axis indicate the actual number of defined concepts in each category, while the instances of cumulative use across clinical trial curations are shown on the y-axis. The curated trial dataset (n = 5045) was used to calculate these numbers.
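The inclusion criteria in the caption amount to a conjunction of four conditions, which a short Python sketch can make explicit. The `TrialReview` record, its field names, and `in_curation_scope` are hypothetical; in the workflow described above, the biomarker-driven judgment is made by a human curator following the SOP, and here it is simply recorded as a boolean. The counts are those reported in the caption.

```python
from dataclasses import dataclass

# Recruitment statuses that the caption lists as in scope.
IN_SCOPE_STATUSES = {"Recruiting", "Not yet recruiting"}


@dataclass(frozen=True)
class TrialReview:
    """Outcome of a manual review of one flagged trial (hypothetical record)."""
    trial_id: str
    status: str
    interventional: bool
    treats_cancer: bool      # False if the trial targets treatment side effects
    biomarker_driven: bool   # curator's judgment, recorded as a flag


def in_curation_scope(review: TrialReview) -> bool:
    """Apply inclusion criteria (i)-(iii) plus the recruitment status check."""
    return (
        review.status in IN_SCOPE_STATUSES
        and review.interventional
        and review.treats_cancer
        and review.biomarker_driven
    )


# Funnel counts reported in the caption (as of 10/30/2019).
FLAGGED_FOR_REVIEW = 9855
CURATED = 5045
OUT_OF_SCOPE = 4810

# The two review outcomes partition the flagged trials.
assert CURATED + OUT_OF_SCOPE == FLAGGED_FOR_REVIEW
```

A usage example with made-up trial identifiers: `in_curation_scope(TrialReview("TRIAL-A", "Recruiting", True, True, True))` is `True`, while the same record with `biomarker_driven=False` falls out of scope.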
Number of entities for concept groups and germline biomarkers with usage data
| Category | Number of entities | Cumulative usage in unique trials |
|---|---|---|
| Biomarker groups | 355 | 2404 |
| Drug groups | 257 | 277 |
| Disease groups | 3 | 8 |
| Germline biomarkers | 104 | 250 |
Figure 4. Screenshot of display of curated clinical trial assertions. This screenshot from the My Cancer Genome website shows the grouping of eligibility criteria assertions (enclosed by yellow boxes) with the treatment context assertions (enclosed by red boxes). Accessed 2/2/2020.
Figure 5. Screenshot of gene and disease inclusion criteria for open trials investigating olaparib. This screenshot from the My Cancer Genome website shows the grouping of trials by disease and biomarker eligibility criteria that are investigating the use of olaparib. The total number of trials associated with each disease is shown on the data labels. The breakdown of the trials for each disease category can be viewed by hovering over the area of interest at the relevant My Cancer Genome web page (https://www.mycancergenome.org/content/drugs/olaparib/). The curated trial dataset (n = 5045) was used to calculate these numbers. Accessed 06/26/2019.
Figure 6. Landscape analysis of all cancer trials shows the (A) top diseases, (B) top biomarker alterations, and (C) top drugs in our curated knowledge base. The curated trial dataset (n = 5045) was used to calculate these numbers.