Neha M Jain1, Marilyn Holt1, Christine Micheel1,2, Mia Levy3,4. 1. Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN. 2. Division of Hematology/Oncology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN. 3. Division of Hematology/Oncology, Department of Internal Medicine, Rush University Medical Center, Chicago, IL. 4. Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN.
Abstract
PURPOSE: The field of oncology is expanding rapidly. New trials are opening as an increasing number of therapeutic agents are being investigated before they can become approved therapies. Aggregate views of these data, particularly data associated with diseases, biomarkers, and drugs, can be helpful in understanding the trends in current research as well as existing gaps in cancer care. METHODS: In this paper, we performed a landscape analysis of trials related to breast cancer and acute myeloid leukemia using structured, curated clinical trial data from the My Cancer Genome clinical trial knowledgebase. RESULTS: We performed detailed analytics on breast cancer (N = 1,128) and acute myeloid leukemia (N = 483) trial sets to highlight the top biomarkers, drug classes, and drugs, thereby supporting a full view of the biomarkers, biomarker groups, and drugs that are currently being explored in these respective diseases. CONCLUSION: Analysis and data visualization of the cancer clinical trial landscape can inform strategic planning for new trial designs and trial activation at a particular site.
The term cancer is used to describe many conditions, and cancer's complexity and diversity are reflected in cancer treatments as well. It is not surprising, therefore, that a quarter of all drug approvals in the United States are for oncology.[1] Before these drugs are approved, they go through a multistep process (preclinical studies, clinical trials, regulatory review, etc) to generate data demonstrating safety and efficacy. The ability to view these data for all oncology trials in an aggregated form can be a powerful tool for providing unique insights into the clinical trial enterprise.
CONTEXT
Key Objective
What additional information can be extracted from an aggregated view of clinical trial data?
Knowledge Generated
Aggregated data, when converted into visualizations, allowed us to elucidate the biomarkers, drugs, and drug classes that were most frequently associated with clinical trials. The expressivity of the data model plays a key role in determining the utility of these data.
Relevance
Such visualizations can be helpful in understanding treatment trends and gaps in the cancer care spectrum and, at an institutional level, in identifying opportunities for opening new trials.
Currently, publicly available sources such as cancer.gov[2] and ClinicalTrials.gov[3] have information about clinical trials. Most of this information, excluding some clinical trial metadata, is unstructured and trapped in free-text documents.[4] This makes it hard to computationally analyze these documents and extract trial elements for quick visualization. Developing informatics tools that support structuring these data elements and make these data machine-readable will open up avenues for downstream applications.
Several efforts have been made to develop tools to structure clinical trial data, including My Cancer Genome,[5] Matchminer,[6] IBM Watson,[7] MolecularMatch,[8] JAX-CKB,[9] and TrialProspector,[10] to name a few. Some of these models are fully commercial, and therefore details about their data elements are not readily available. Additionally, the aggregated data are not made publicly available. Some groups have published aggregate data analyses,[9] but the scope of curation is limited to individual genomic biomarkers and does not take into consideration cytogenetic and protein biomarkers, which are becoming increasingly therapeutically relevant.
Certain biomarkers are typically reported together in clinical trial documents, are related via a single pathway, or may be altered in relation to a specific cancer diagnosis, such as human epidermal growth factor receptor 2 (HER2)-positive, estrogen receptor (ER)-positive, EGFR-activating mutations, and poor-risk mutations in acute myeloid leukemia (AML) as defined by National Comprehensive Cancer Network (NCCN) guidelines. Curating these together as a single entity rather than as individual alterations can add value to the curated trial content, making it more relevant for patient populations. We have discussed this in detail elsewhere.[11] Most groups do not curate information about clinical trial interventional arms, the respective drugs in each of the arms related to the disease group and biomarker eligibility criteria, or the treatment context for the trial (ie, neoadjuvant, adjuvant, or metastatic). This information, when curated accurately, can provide additional context to the data and yield meaningful insights.
In collaboration with Genomoncology LLC,[12] we developed our own clinical trial data model, which has been used to manually curate > 9,100 oncology trials. The individual and aggregate data from all these trials are publicly available on the My Cancer Genome website.[5] Although these data offer insight into the design of individual trials and the landscape of the oncology trial spectrum, assessing therapy trends across all cancer trials and understanding the biomarkers being studied for drug development in selected diseases of interest warrant a deeper analysis. We chose breast cancer and AML for our analysis because these diseases especially highlight the importance of curating protein and cytogenetic biomarkers, which is a unique strength of our curation model.
The model has been described in detail elsewhere.[11] Furthermore, breast cancer is now the most common cancer in the world[13] and therefore is a disease of wide interest. AML is one of the most common hematologic malignancies in adults and is also most often associated with clinical trials. Therefore, we performed a comprehensive analysis of all clinical trials related to breast cancer and AML to assess treatment trends and gather a snapshot view of the clinical trial spectrum across the globe.
METHODS
Using the clinical trial data model described earlier,[11] we curated all therapeutic interventional breast cancer and AML trials regardless of the presence of a biomarker-driven eligibility criterion or trial recruiting status. For these trials, the clinical trial document was reviewed, and the information related to the diagnosis, biomarker eligibility criteria (wherever applicable), treatment context, and drugs was captured in a structured format. All trials curated up to December 3, 2019, were included in our analysis to allow a full view of trials for these diseases, and not just the trials that were open and recruiting at the time of analysis.
All curated data for breast cancer and AML trials were accessed from the backend of the model and extracted in a structured form as a CSV file (Data Supplement). As described in our previous publication,[11] a trial arm is a combination of disease (with or without a biomarker) and a specific treatment. Therefore, a breast cancer clinical trial exploring two different drug combinations would be organized into two trial arms. Drug classes were manually added for all drugs associated with trial arms. Drug class information was obtained from Drugbank,[14] NCIt,[15] and PubMed.[16] Some previous studies have classified vaccine therapy, viral therapy, and cell therapy as immunotherapy,[17] but for the purposes of this paper, we defined them as individual categories. Immunotherapy was used to define cytotoxic T-lymphocyte-associated protein 4 (CTLA-4), programmed cell death protein 1 (PD-1), or programmed death ligand-1 (PD-L1) inhibitors, whereas chimeric antigen receptor T-cell (CAR-T) therapies and other engineered cell therapies that directly or indirectly modulate the immune system were classified as cell therapy-based immunotherapies.
Some drugs could not be clearly classified into one category, others were relatively new and did not have much information available, and yet others were extremely specialized and could not be added to the previously defined categories. These drugs were grouped in the other category. A full list of all drugs and drug classes can be found in the Data Supplement. All data were sorted and visualized using Microsoft Excel.
For the triple-negative breast cancer (TNBC)-focused analysis, we extracted trial phase, drug names, and drug classes for all TNBC trials that were using pembrolizumab as one of the treatment drugs. These trials were manually reviewed and classified further by metastatic line of therapy into first line, second line or greater, first to third line, and any line. The data were visualized in multiple ways with sunburst plots, first oriented by line of metastatic therapy, and then by drug class. These analyses are not directly available from the website and need downloadable access to the full data set as well as significant manual data sorting and enhancements. The aggregate data can be found in the Data Supplement.
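The aggregation described above was performed in Microsoft Excel; the following Python sketch only illustrates the same kind of tally on a CSV export. The column names (`trial_id`, `arm_id`, `drug`, `drug_class`) and the sample rows are hypothetical and do not reflect the actual schema of the Data Supplement. The sketch also assumes that drug class usage is counted once per arm-drug pair, which the Methods do not state explicitly.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical extract: one row per (trial arm, drug). Column names and
# values are illustrative, not the schema of the actual Data Supplement.
SAMPLE_CSV = """trial_id,arm_id,drug,drug_class
NCT-A,1,paclitaxel,Cytotoxic therapy
NCT-A,1,pembrolizumab,Immunotherapy
NCT-A,2,carboplatin,Cytotoxic therapy
NCT-B,1,paclitaxel,Cytotoxic therapy
NCT-B,1,lapatinib,TKI
"""

def summarize_drug_classes(csv_text):
    """Return (uses per drug class, number of unique drugs per drug class)."""
    uses = Counter()            # assumption: one count per arm-drug row
    drugs = defaultdict(set)    # distinct drug names seen in each class
    for row in csv.DictReader(io.StringIO(csv_text)):
        uses[row["drug_class"]] += 1
        drugs[row["drug_class"]].add(row["drug"])
    unique = {cls: len(names) for cls, names in drugs.items()}
    return uses, unique

uses, unique = summarize_drug_classes(SAMPLE_CSV)
for cls, n in uses.most_common():
    print(f"{cls}: {n} uses, {unique[cls]} unique drugs")
```

The two return values correspond to the two kinds of panels reported in the Results: how often a drug class appears across trial arms, and how many distinct agents fall into each class.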
RESULTS
Landscape Analysis of Breast Cancer Trials
Figure 1 shows the landscape of biomarkers investigated in breast cancer trials curated in the My Cancer Genome knowledgebase. One thousand eight hundred eighty-one clinical trial arms from 1,128 clinical trials related to breast cancer were reviewed for this analysis between June 2012 and December 2019. Unsurprisingly, the top biomarker alterations across breast trials (Fig 1A) are related to protein expression of the ER and progesterone receptor (PR), and amplification or protein expression of HER2. BRCA1 and BRCA2 mutations and PIK3CA mutations also had a high number of associated trials owing to the multiple studies investigating poly (ADP-ribose) polymerase inhibitors and PIK3CA inhibitors, respectively. These trials ultimately led to the approval of several poly (ADP-ribose) polymerase inhibitors[18,19] and alpelisib.[20,21] Other biomarkers on the graph, namely MET and FGFR mutations, are also being explored for multiple drugs that are now approved or under study. Figure 1B shows all breast trials intersected by trial phase as well as treatment context (ie, neoadjuvant, adjuvant, and metastatic). As expected, the largest category of trials focused on patients in the metastatic setting (800 trials, 69.7%), followed by the neoadjuvant (226, 19.7%) and adjuvant settings (121, 10.5%). Figure 2A shows the breakdown of breast cancer trials by the drug classes being investigated in the different cohorts. Cytotoxic therapy was most widely used in clinical trial arms (n = 1,023), followed by tyrosine kinase inhibitors (TKIs; n = 571) and hormonal therapy (n = 458). Figure 2B shows the number of unique drugs in each of the studied drug classes. TKIs, with a total of 105 agents being investigated, were the top category, followed by cytotoxic therapy (n = 82) and immunotherapy or cell therapy (n = 78). Cytotoxic agents comprised 32% of the drugs in breast cancer trials.
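As a worked check on the treatment context percentages above, the short Python sketch below recomputes the shares from the three reported counts. It assumes the denominator is the sum of the three context counts (1,147) rather than the 1,128 unique trials, which would mean that a trial curated under more than one treatment context is counted once per context; the text does not state this, but the reported percentages appear consistent with it.

```python
# Counts of breast cancer trials by treatment context, as reported in the text.
context_counts = {"metastatic": 800, "neoadjuvant": 226, "adjuvant": 121}

# Assumption: the denominator is the sum over contexts, not the number of
# unique trials, so a trial curated under two contexts is counted twice.
total = sum(context_counts.values())

shares = {ctx: 100 * n / total for ctx, n in context_counts.items()}
for ctx, pct in shares.items():
    print(f"{ctx}: {n_fmt}" if False else f"{ctx}: {pct:.2f}% of {total}")
```

Using 1,128 as the denominator instead would put the metastatic share near 71%, which does not match the reported 69.7%.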
FIG 1.
Landscape analysis of all breast cancer trial arms. (A) The top biomarker alterations and (B) breast trials intersected by phase and treatment context. A total of 1,881 clinical trial arms from 1,128 clinical trials were used for this analysis.
FIG 2.
Landscape of drugs and drug classes in breast cancer trials. (A) The top drug classes being studied in breast trial cohorts and (B) the number of unique drugs in each of the drug classes being studied for breast trial cohorts. A total of 1,881 clinical trial arms from 1,128 clinical trials were used for this analysis. ADC, antibody-drug conjugate; HDAC, histone deacetylase; PARP, poly (ADP-ribose) polymerase; TKI, tyrosine kinase inhibitor.
Since tumor ER, PR, and HER2 status is routinely assessed for all invasive breast cancer diagnoses, we classified all breast cancer trial arms according to this criterion to highlight the distribution of drug discovery across this trio of biomarkers. The highest number of trial arms was associated with HER2-negative breast cancer (n = 638; 34%). TNBC was the second largest category, with 553 trial arms (29%). Interestingly, only six trial arms (0.3%) were recruiting patients who had both a positive HER2 status and a negative hormone receptor status (Fig 3A). This highlights the importance of protein biomarkers in breast cancers and makes a strong case for developing models that support curation for protein biomarkers.
FIG 3.
Breast cancer subanalysis. (A) Distribution of breast cancer clinical trials by protein biomarker eligibility. All breast cancer trial arms were distributed as per the protein biomarker criteria. A total of 1,881 clinical trial arms from 1,128 trials were used for this analysis. (B) Distribution of TNBC clinical trials by line of therapy, drug class, drug, and phase. All TNBC trials that were in the metastatic treatment setting and used pembrolizumab as one of the treatment drugs were used for this analysis (n = 33). The loops, moving out, represent metastatic line of therapy, drug classes, drugs, and trial phase, respectively. HER2, human epidermal growth factor receptor 2; HR, hormone receptor; TNBC, triple-negative breast cancer; Tx, therapy.
To interrogate the TNBC trials more deeply, Figure 3B shows TNBC trials that were in the metastatic setting and used pembrolizumab as one of the treatment drugs. Most trials explored cytotoxic therapies in combination with pembrolizumab. Kinase inhibitors with pembrolizumab was the second most common combination. Most trials were in the phase II setting (n = 19), with fewer in phase I (n = 7) and phase III (n = 4). The majority of the trials were in the metastatic any-line category, closely followed by metastatic first line. Only two trials were exploring single-agent pembrolizumab.
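A sunburst plot such as the one in Figure 3B is sized by counts at each level of a hierarchy. The Python sketch below shows one way such ring-by-ring counts could be derived, ordered as in the figure legend (line of therapy, drug class, drug, phase). The records and the names `drug_A`, `drug_B`, and `drug_C` are hypothetical placeholders rather than curated data, and the sketch assumes one record per trial and partner drug.

```python
from collections import Counter

# Hypothetical records (one per trial and partner drug); not the curated data.
trials = [
    {"line": "First line", "drug_class": "Cytotoxic therapy", "drug": "drug_A", "phase": "III"},
    {"line": "First line", "drug_class": "Cytotoxic therapy", "drug": "drug_B", "phase": "II"},
    {"line": "Any line", "drug_class": "Kinase inhibitor", "drug": "drug_C", "phase": "II"},
    {"line": "Any line", "drug_class": "Cytotoxic therapy", "drug": "drug_A", "phase": "I"},
]

# Ring order from the innermost loop outward, following the figure legend.
RINGS = ("line", "drug_class", "drug", "phase")

def sunburst_counts(records, rings=RINGS):
    """Count records under every path prefix, one prefix length per ring."""
    counts = Counter()
    for rec in records:
        path = tuple(rec[ring] for ring in rings)
        for depth in range(1, len(path) + 1):
            counts[path[:depth]] += 1
    return counts

counts = sunburst_counts(trials)
for path, n in sorted(counts.items()):
    print(" > ".join(path), n)
```

Reordering `RINGS` (for example, placing `drug_class` first) gives the second orientation mentioned in the Methods without changing the counting logic.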
Landscape Analysis of AML Clinical Trials
The landscape of global AML trials is shown in Figure 4. Figure 4A shows the top biomarker alterations featured across all AML-related clinical trial arms. Seven hundred seventy-nine clinical trial arms from 483 clinical trials related to AML were reviewed for this analysis between June 2012 and December 2019. KMT2A fusion, FLT3 mutation, and DEK-NUP214 fusion were the top biomarker alterations across these trials. Seventy-five percent of the biomarkers in Figure 4A were cytogenetic markers, highlighting the importance and prevalence of these markers in AML trials. Figure 4B displays the top biomarker alteration groups encountered while curating AML-related trial arms. NCCN[22] and European Leukemia Net (ELN)[23] guideline-driven alteration groups feature in this list along with other biomarker groups that were created by curators to facilitate efficient curation of biomarkers that are frequently featured together in multiple clinical trials. Figure 5A shows the breakdown of AML trials by the drug classes being investigated in the different cohorts. Cytotoxic therapy was still most widely used in clinical trial arms (n = 888), followed by transplant therapy (n = 160), TKIs (n = 138), and a variety of other targeted therapies and immunotherapies. Figure 5B shows the number of unique drugs in each of the studied drug classes. TKIs, with a total of 41 agents being investigated, were the top category, followed by cytotoxic therapy (n = 39) and engineered cell therapy (cell-based immunotherapy, n = 29).
FIG 4.
Landscape analysis of all AML trial arms. (A) Top biomarker alterations and (B) top biomarker groups. A total of 779 clinical trial arms from 483 clinical trials were used for this analysis. AML, acute myeloid leukemia; ELN, European Leukemia Net; FAB, French-American-British; NCCN, National Comprehensive Cancer Network.
FIG 5.
Landscape of drugs and drug classes in AML trial arms. (A) The top drug classes being studied in AML trial arms and (B) the number of unique drugs in each of the drug classes being studied for AML trial arms. A total of 779 clinical trial arms from 483 clinical trials were used for this analysis. ADC, antibody-drug conjugate; AML, acute myeloid leukemia; MAb, monoclonal antibody; TKI, tyrosine kinase inhibitor.
The treatment context associated with AML trials is summarized in Table 1. The AML trials were divided based on line of treatment (first-line v relapse or refractory), stage of treatment (induction, consolidation, or maintenance), and treatment modality (cellular or transplant therapies, noncellular therapy, or post-transplant therapy). Therefore, each trial could be a part of one, two, or all three of these categories. The highest number of trials was associated with relapse or refractory, followed closely by noncellular therapy.
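Because a single AML trial can carry a label on more than one of these three axes, the per-category counts are not mutually exclusive and will not sum to the number of trials. The Python sketch below illustrates this kind of multi-label tally; the trial identifiers and label assignments are hypothetical, and only the axis and category names are taken from the text.

```python
from collections import Counter

# Hypothetical trials; each may carry a label on any of the three axes.
aml_trials = {
    "T1": {"line": "First line", "stage": "Induction", "modality": "Noncellular therapy"},
    "T2": {"line": "Relapse or refractory", "modality": "Noncellular therapy"},
    "T3": {"line": "Relapse or refractory", "modality": "Cellular or transplant therapy"},
    "T4": {"stage": "Maintenance", "modality": "Post-transplant therapy"},
}

def count_contexts(trials):
    """Count trials per (axis, label); a trial contributes once per axis it has."""
    context_counts = Counter()
    for labels in trials.values():
        for axis, label in labels.items():
            context_counts[(axis, label)] += 1
    return context_counts

context_counts = count_contexts(aml_trials)
for (axis, label), n in context_counts.most_common():
    print(f"{axis}: {label} = {n}")

# The label total exceeds the trial count because categories overlap.
print(sum(context_counts.values()), "labels across", len(aml_trials), "trials")
```

Keeping the axis alongside each label makes it explicit which counts can be compared with one another and which overlap by construction.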
TABLE 1.
Curated Treatment Contexts in Acute Myeloid Leukemia
DISCUSSION
In this manuscript, we describe a landscape analysis of breast cancer and AML clinical trials. This curation effort supports several unique analytics that required significant manual input and are currently not available on the My Cancer Genome website.[5]
The analysis presented here highlights the importance of biomarkers and biomarker groups for curating oncology trials, especially for breast cancer and AML. Breast cancer and AML were used for this analysis for their relative abundance in the population. Furthermore, these cancers also highlight the importance of the expressivity of the My Cancer Genome clinical trial curation and data model. It is not surprising to see the large number of trials associated with the ER-, PR-, and HER2-related biomarkers for breast cancer, or the numerous AML trials associated with the risk groups. But the unique strength of our model is the inclusion of protein biomarkers for breast cancer trials (which are usually measured through immunohistochemistry tests, not genomic tests) as well as cytogenetic markers (often evaluated using fluorescence in situ hybridization tests or karyotyping) for AML trials. Other efforts at creating a landscape analysis have focused only on genomic biomarkers, an approach that does not offer a complete view into the complexity of oncology biomarkers. The curation of biomarker groups and cytogenetic markers for AML and protein biomarkers for breast cancer trials helps in appreciating the significance and prevalence of these biomarkers in the full view of other curated biomarkers.
Another unique contribution of this manuscript is the detailed analysis related to investigational drugs, drug classes, and treatment context. It is not surprising that cytotoxic chemotherapy regimens remain the backbone of most cancer treatments and are seen in abundance in clinical trials for breast cancer as well as AML.
Cytotoxic therapy–related regimens are widely used as treatment controls, which accounts for their abundance. Although transplant therapies have always been an important cornerstone of AML treatment, the emergence of cell-based immunotherapies or engineered cell therapies has been an exciting development. A total of 242 drugs were evaluated in AML trials, compared with the 31 drugs[24] approved for AML over the period explored in this analysis. Similarly, for the 479 agents being investigated in breast cancer, 68 agents[25] are US Food and Drug Administration–approved for treating breast cancer. This is not surprising, given the recent data that < 3.5% of oncology drugs that are investigated in clinical trials actually get approved.[26]
Despite the relative rarity of triple-negative breast cancer (TNBC),[27] a high number of trials were found to be investigating it. This is because of the lethality of the disease as well as the lack of therapies in this area. TNBC is an aggressive form of breast cancer that currently does not have targeted therapy standard-of-care options. It is noteworthy that the frequent inclusion of patients with TNBC in multidisease trials and phase I trials also contributed to this high percentage.
This analysis also allowed us to highlight the complexity of treatment context related to AML (Table 1). These contexts can overlap and are therefore more difficult to interpret. For example, a particular treatment could be classified as first-line therapy, induction therapy, and noncellular therapy. We relied on the information reported in the publicly available clinical trial document to perform this classification. Since trial documents can sometimes be hard to interpret, the classification was not always straightforward. Most trials could be distinguished based on line of treatment and treatment modality, but stage of treatment was not always readily available in the trial document.
In the future, simplifying or unifying treatment contexts for AML may help describe the treatment context and clinical appropriateness for these trials more efficiently.
One of the important observations was the enormous number of clinical trials that were open for both of the diseases studied. Low trial enrollment has been a stumbling block, preventing many trials from reaching completion. In light of this knowledge, it seems counterproductive to open multiple trials exploring either the same or similar drugs, resulting in a division of efforts rather than a consolidated, unified effort in one direction. Tang et al[17] found that only 5% of the 1,105 combination trials exploring PD-1 and PD-L1 agents are exploring new immunotherapy agents, whereas the remaining 95% are evaluating the five previously approved agents (avelumab, atezolizumab, durvalumab, nivolumab, and pembrolizumab) in combination with other immunotherapies, targeted therapies, or cytotoxic therapies.
To investigate the existing overlap of trials in our data set, we performed a deeper analysis for a smaller subset: metastatic TNBC trials exploring pembrolizumab. At first glance, these trials look considerably different, with minimal overlap in terms of the treatment setting and the actual drug being explored. But a closer look revealed overlap in terms of the drug classes being explored. For example, a clinician looking for a first-line metastatic trial will have to choose from nine different available trial options, four of which are exploring pembrolizumab in combination with cytotoxic therapy (Fig 3B). Such fragmentation of efforts can result in significant duplication of work, making the whole enrollment effort counterproductive.
Investing in tools that support visualization of drug and drug class data can elucidate trial data and potentially help conserve resources and prevent duplication of efforts in the future.
In conclusion, in this paper, we display the landscape of AML and breast cancer trials. Our clinical trial data model[11] allows us to record several data elements in a structured exportable format. In this paper, we presented aggregate data analysis for all breast cancer and AML trials in an effort to illuminate the oncology landscape for these two diseases. At an institutional level, such aggregated data can help visualize the trial portfolio of an institution, thereby identifying strengths and areas of gap in cancer care. Globally, this can help understand trends in oncology both in research as well as standard of care treatments. Such visualizations can help assess the landscape of oncology and guide the decision to open a new trial after careful assessment of existing trials, with the ultimate goal of preventing duplication of efforts.
Authors: Jennifer K Litton; Hope S Rugo; Johannes Ettl; Sara A Hurvitz; Anthony Gonçalves; Kyung-Hun Lee; Louis Fehrenbacher; Rinat Yerushalmi; Lida A Mina; Miguel Martin; Henri Roché; Young-Hyuck Im; Ruben G W Quek; Denka Markova; Iulia C Tudor; Alison L Hannah; Wolfgang Eiermann; Joanne L Blum Journal: N Engl J Med Date: 2018-08-15
Authors: Mark Robson; Seock-Ah Im; Elżbieta Senkus; Binghe Xu; Susan M Domchek; Norikazu Masuda; Suzette Delaloge; Wei Li; Nadine Tung; Anne Armstrong; Wenting Wu; Carsten Goessl; Sarah Runswick; Pierfranco Conte Journal: N Engl J Med Date: 2017-06-04
Authors: Satya S Sahoo; Shiqiang Tao; Andrew Parchman; Zhihui Luo; Licong Cui; Patrick Mergler; Robert Lanese; Jill S Barnholtz-Sloan; Neal J Meropol; Guo-Qiang Zhang Journal: Cancer Inform Date: 2014-12-04
Authors: David S Wishart; Yannick D Feunang; An C Guo; Elvis J Lo; Ana Marcu; Jason R Grant; Tanvir Sajed; Daniel Johnson; Carin Li; Zinat Sayeeda; Nazanin Assempour; Ithayavani Iynkkaran; Yifeng Liu; Adam Maciejewski; Nicola Gale; Alex Wilson; Lucy Chin; Ryan Cummings; Diana Le; Allison Pon; Craig Knox; Michael Wilson Journal: Nucleic Acids Res Date: 2018-01-04
Authors: Neha Jain; Kathleen F Mittendorf; Marilyn Holt; Michele Lenoue-Newton; Ian Maurer; Clinton Miller; Matthew Stachowiak; Michelle Botyrius; James Cole; Christine Micheel; Mia Levy Journal: J Am Med Inform Assoc Date: 2020-07-01