Mark T A Donoghue1, Alison M Schram2, David M Hyman2,3,4, Barry S Taylor5,6,7,8.
1. Marie-Josee and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
2. Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
3. Weill Cornell Medical College, New York, NY, USA.
4. Loxo Oncology, Stamford, CT, USA.
5. Marie-Josee and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, USA. taylorb@mskcc.org.
6. Weill Cornell Medical College, New York, NY, USA. taylorb@mskcc.org.
7. Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA. taylorb@mskcc.org.
8. Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA. taylorb@mskcc.org.
Abstract
The molecular characterization of tumors now informs clinical cancer care for many patients. This advent of molecular oncology has been driven by the expanding number of therapeutic biomarkers that can predict sensitivity to both approved agents and investigational agents. Beyond its role in driving clinical-trial enrollments and guiding therapy in individual patients, large-scale clinical genomics in oncology also represents a rapidly expanding research resource for translational scientific discovery. Here we review the progress, opportunities, and challenges of scientific and translational discovery from prospective clinical genomic screening programs now routinely conducted for patients with cancer.
There is widespread enthusiasm for prospective tumor sequencing to guide therapy selection in patients with cancer[1-3]. Yet, the utility of clinical sequencing in oncology beyond the cancer types in which it is a standard of care has been much debated. Indeed, the number of genomic alterations clinically validated as predictive biomarkers of drug response is relatively small when compared to the number of all mutant genes implicated in cancer[4-9]. Consequently, simply expanding the adoption of clinical genomics is unlikely to address the unmet needs for the majority of cancer patients given the current milieu of available therapies. Clinical sequencing has therefore begun to inform other aspects of oncology care. An emerging lesson from clinical cancer genomics, defined herein as the prospective clinical sequencing of tumor specimens to guide treatment decisions for active disease, is that these initiatives are generating large-scale data resources that can also be leveraged for scientific discovery across patients, especially when integrated with clinical annotation and treatment data. Here, we review progress in such efforts to uncover fundamental scientific discoveries that ultimately feed the clinical enterprise. We also review the unique challenges posed by this new scientific resource and strategies to overcome them to facilitate rigorous, robust, and reproducible translational science.
Evolution of genomic testing
Next-generation sequencing (NGS) has become the foundational technology for modern clinical diagnostic testing in oncology, with several laboratory-developed tests recently achieving FDA recognition[10,11]. NGS testing has become standard of care in many cancer types as a method to identify therapeutically actionable alterations in tumor DNA. Most FDA-approved and/or standard-of-care biomarker-drug associations are disease-specific and are designed to capture all oncogenic mutations in a gene (such as those in BRCA1/2, IDH1, KIT) or be mutant allele-specific (BRAF V600E, FGFR3 R248C, S249C, etc.) or alteration type-specific (ERBB2 amplifications, ALK fusions)[2]. Newer tumor-agnostic indications are emerging with the approval of immune checkpoint blockade (ICB) therapy in cancers with microsatellite instability (MSI)[12,13] and larotrectinib in solid tumors with NTRK fusions[14]. Testing can also identify patients unlikely to respond to certain therapies, as in the case of KRAS mutations and cetuximab in metastatic colorectal cancers[15]. Ultimately, given the current class of therapeutic options available, larger next-generation sequencing panels can (with notable exceptions discussed below) detect the majority of relevant genomic biomarkers across a range of abnormality types more efficiently than can conventional single-gene testing while also capturing many potentially emerging associations.
The merits and technical aspects of different clinical sequencing tests in oncology have been reviewed elsewhere[1,16-19]. The most broadly used strategy by both academic and commercial laboratories has been targeted gene panels for bulk tumor tissue DNA sequencing[20-22]. These efforts have sought to maximize applicability in the patient population, uptake, and cost effectiveness while leveraging assay designs that balance competing factors in a manner consistent with community standards and best practices[23-25] (Box 1). 
This has necessarily limited the utility of clinical sequencing data for pure discovery (Fig. 1). Multiple cross-cutting forces shape these design choices, including the 1) regulatory approval process, 2) evolving landscape of financial reimbursement, 3) size of the intended patient population, 4) clinical volume at treating centers, 5) breadth of clinical intent, 6) consent structure and complexity, and 7) institution-specific issues involved in operationalizing clinical sequencing as a routine test in a long-established clinical cancer care workflow. While our focus here is DNA and tissue-based clinical genomics in oncology, many of these issues and a host of new challenges must also be overcome for next-generation testing of circulating tumor-derived cell-free DNA (cfDNA), which is discussed elsewhere[26].
Fig. 1:
Balancing accessibility, utility, and discovery in clinical genomics.
At present, there is a trade-off between cost, accessibility, and the opportunity for novel discovery afforded by different clinical sequencing platforms. The model depicts the current inverse relationship between accessibility and cost. As the cost (x-axis) of different sequencing strategies increases (driven by the size of the genomic footprint and depth of sequencing), accessibility in the community decreases (y-axis). The most common modalities of sequencing are labeled, as are examples of the additional types of information made possible by adopting the indicated sequencing strategy. The discovery potential of a given strategy typically tracks with cost (bottom). At one end of the spectrum, small gene panels have the lowest cost but have limited discovery potential. In contrast, at the other end of the spectrum, WGS has both the highest cost and greatest discovery potential. WES, whole-exome sequencing; WGS, whole-genome sequencing; CNAs, copy number alterations.
As the clinical scope and intent of prospective sequencing in oncology are fundamentally different from those of discovery-driven retrospective research sequencing, targeted sequencing remains the methodology of greatest uptake. This is because, given the current treatment landscape in oncology, the majority of therapeutically relevant variants exist in a relatively small subset of all genes. Indeed, the most systematic effort to date to harmonize knowledge about cancer-associated mutations included 3437 unique variants across 357 diseases and 791 drug associations, yet encompassed just 415 genes[27]. Consequently, if the goal were to match patients to one of the current class of approved or investigational agents, broader-scale sequencing such as whole-exome sequencing (WES) or whole-genome sequencing (WGS) may provide limited increase in clinical benefit when compared to carefully designed large-panel targeted sequencing assays[16]. In fact, the detection of cryptic oncogenic events such as promoter mutations or exon skipping[28,29], broader allele-specific DNA copy number[30], and microsatellite instability[31] can be achieved without the reduced sensitivity associated with lower coverage WES or WGS. Nevertheless, areas for which WES and WGS have potentially greater clinical utility due to the more comprehensive sequencing of passenger mutations are the identification of key mutational and genomic signatures such as homologous recombination deficiency[32,33] as well as the identification of neoantigens, discrete subclones, and unexpected mechanisms of acquired resistance to therapy. As such, many clinical implementations of WES, WGS, and multimodality DNA and RNA sequencing exist[33-40].
Therefore, despite the value of continued larger-panel targeted clinical sequencing, the drift toward clinical WGS is likely inevitable. 
The development of new classes of drugs that target molecular aberrations not accessible by panel sequencing approaches and uniquely detected by WGS could motivate the substantial cost reductions and operational improvements that broad-based implementation of clinical WGS will require in our current health systems. The advent of clinical WGS in oncology care will likely result in a hybrid testing environment. WGS is likely to be used early in clinical care to establish a vital baseline of germline pathogenicity, somatic changes, and their actionability along with broader signatures of genomic changes[41]. Later in disease management, however, there will be clinical context-specific needs for far more sensitive testing (tissue or plasma-based) for disease monitoring or to identify emerging subclonal resistance for which clinical WGS is suboptimal. No single test will ultimately prove sufficient to address the many clinical needs in oncology care and structural considerations such as the reimbursement landscape must adapt as the utility of such hybrid testing is proven. Notwithstanding the exact configuration of clinical sequencing in oncology, it has recently become clear that the accrual of sequencing information is driving the creation of an entirely new institutional research resource that can catalyze new science and discovery.
New disease biology and richer clinical interpretation
Efforts to pool prospective sequencing data within and across institutions are generating increasingly large cohorts of molecularly profiled cancer patients, with opportunities for discovery science. Multiple aspects of clinical sequencing data make it favorable for discovery compared with retrospective research data in cancer. Among these are 1) larger cohort sizes and greater disease diversity than exist in the retrospective research domain; 2) specific quantitative attributes of clinical sequencing such as a greater depth of sequencing; 3) a distinct disease profile often composed of advanced and post-treatment metastatic disease; and 4) the potentially transformative opportunity for clinical data integration with maturing outcomes and therapeutic phenotypes. Indeed, the analysis of clinical sequencing data has demonstrated that a broad set of genomic features can be identified from existing testing modalities and that these features, when understood at a basic and translational level, can be fed back into the clinical enterprise to facilitate richer reporting and a deeper understanding of a patient’s disease. This includes, but is not limited to, novel mutant allele and broader driver genetic defect discovery and functional validation; disease-specific genomics; basic mechanisms of tumor evolution; germline genetics; biomarker discovery and validation; mechanisms of therapeutic resistance; and immunogenomics. Each is catalyzed to a different degree by key aspects of clinical sequencing data (Fig. 2). These findings are not limited to individual genes and mutations, but extend to broader genome- and disease-level (including etiological) discoveries[42].
Fig. 2:
Facets of clinical sequencing that drive translational discovery science.
Critical aspects of clinical sequencing that can impact various discovery efforts are its 1) population scale, 2) use of matched germline sequencing, 3) depth of sequencing, and 4) genomic content (see axes). Each of these facets can also drive different types of discoveries. The relative importance of each axis is displayed for various analysis types: the further a point lies from the center, the greater the importance of that axis to the analysis in question.
At the scale of individual alterations
One of the most attractive aspects of clinical sequencing in oncology from the perspective of discovery is the population sizes now being characterized. These sample sizes, albeit from sequencing of already established cancer-associated genes, are driving efforts to mine the long right tail of the frequency distribution of driver mutations across cancer patients. This seeks to overcome a major hurdle in effective precision oncology: understanding the biological and therapeutic importance of the multitude of variants of uncertain significance identified from such sequencing. To this end, a variety of computational and medium- to high-throughput functional screening approaches, reviewed in part elsewhere[2], have been developed to mine this long tail. These efforts are fueled in part by access to population-scale sequencing cohorts and have revealed increasingly rare mutations that provide foundations for accelerated clinical translation[2,43], extend biomarkers of therapeutic sensitivity to a greater number of patients[44-49], inform drug development[50], and uncover new tumor biology[51,52]. Indeed, emerging evidence whereby different mutations in the same actionable cancer gene can have distinct biochemical consequences has underscored additional layers of complexity in our mechanistic understanding of pathway signaling[53]. This can dictate different therapeutic strategies, as has now emerged for mutant BRAF[54,55] and MEK1[56]. There now exist multiple decision support systems and emerging community standards to turn this knowledge into practice[8,9,27,57,58].
There is a growing appreciation that the mere presence of an oncogenic driver mutation is not sufficient to convey its role in tumorigenesis, cancer progression, prognosis, or response to treatment. Indeed, context matters. 
For example, the importance of serial evolution of oncogenic alterations has been realized in part by the ability to infer changes in mutant zygosity from the combination of allele-specific DNA copy number data and the quantitatively accurate mutant allele frequencies from high depth of coverage in panel-based clinical sequencing data. The dosage of mutant alleles can modulate distinct molecular phenotypes[59,60], drive tumorigenesis and disease progression[61], and shape prognosis and therapeutic sensitivity[62]. Mutant oncogene dose-dependent sensitivity to targeted therapies has already been demonstrated in patients with AKT1 E17K mutations treated with AKT inhibitors[63] and in BRAF V600-mutant metastatic melanomas treated with RAF inhibitor therapy[62]. Other cryptic mechanisms of serial evolution of mutant oncogenes, such as cis-acting double mutations in PIK3CA, have been linked to greater therapeutic sensitivity[64], and additional associations may emerge for new investigational therapies such as inhibitors of KRAS G12C[65] and recently approved therapies targeting mutant oncogenes frequently affected by zygosity changes[66]. Zygosity, clonality, co-mutations, disease lineage, and germline status, among other factors, all play a vital part in understanding the role of any individual oncogenic variant in an affected tumor. More comprehensive panel clinical sequencing facilitates the feed-forward loop that propels new discoveries back into the clinical enterprise to enrich clinical reporting and inform cancer care.
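The zygosity inference described above can be made concrete with a minimal sketch. It assumes a commonly used relationship between variant allele frequency (VAF), tumor purity, and locus-specific total copy number, with admixed normal cells taken to be diploid; it is not the specific method of the studies cited here. The function names, the `tol` threshold, and the category labels are hypothetical and chosen only for illustration.

```python
def mutant_copies(vaf: float, purity: float, total_cn: int) -> float:
    """Estimate how many tumor copies of a locus carry the mutation.

    Assumes expected VAF = purity * m / (purity * total_cn + 2 * (1 - purity)),
    i.e. normal cells in the sample are diploid and wild-type at this locus.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    # Average number of copies of the locus per cell in the mixed sample.
    avg_copies = purity * total_cn + 2 * (1 - purity)
    return vaf * avg_copies / purity


def classify_zygosity(vaf: float, purity: float, total_cn: int,
                      minor_cn: int, tol: float = 0.5) -> str:
    """Illustrative rule of thumb (the thresholds are assumptions)."""
    m = mutant_copies(vaf, purity, total_cn)
    if minor_cn == 0 and m >= total_cn - tol:
        # No wild-type allele retained and essentially every copy is mutant.
        return "mutant on all copies"
    if m > 1 + tol:
        # More than one mutant copy, with wild-type copies still present.
        return "mutant allele gained"
    return "heterozygous"


if __name__ == "__main__":
    # Hypothetical tumor: 80% purity, two copies, no minor allele (copy-neutral LOH).
    print(classify_zygosity(vaf=0.8, purity=0.8, total_cn=2, minor_cn=0))
```

The sketch also shows why sequencing depth matters: the estimate scales linearly with the VAF, so noise in a low-coverage VAF propagates directly into the inferred mutant copy number.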
Broader genome-wide alterations
Beyond the DNA copy number alterations that span discrete mutant loci and genes frequently targeted by focal amplifications or deletions, the ability to infer genome-wide allele-specific copy number is facilitating discoveries at the genome scale in clinical sequencing data. Whole-genome doubling (WGD)[67,68], for example, is a tetraploidization of the tumor genome and is among the most common molecular abnormalities in cancer, albeit varying across tumor types and disease states[69]. WGD is associated with worse clinical outcomes both pan-cancer and in specific disease contexts[70]. The identification of WGD in clinical sequencing data can facilitate evolutionary analysis of individual patient tumors by revealing the order of acquisition of key alterations[70-72] that can aid biological and clinical interpretation and may ultimately help to predict chemo-sensitivity[73].
The accurate inference of mutational signatures[74] in clinical genomic data can reveal aspects of inherited susceptibility, disease etiology and pathogenesis, environmental exposures, and therapeutic sensitivity. Notable among these signatures is microsatellite instability (MSI), which is characterized by hypermutation at repetitive sequence motifs due to DNA mismatch repair (MMR) dysfunction. MSI is a tumor-agnostic biomarker of sensitivity to ICB therapy[13]. The ability to infer MSI from larger-panel clinical sequencing without assay iteration has facilitated its rapid clinical validation[31] and uptake. MSI represents a critical link between germline pathogenicity (MMR defects and predisposition to multiple cancer types[75]) and somatic phenotypes that can drive further consolidation of clinical testing by motivating simultaneous germline and somatic testing (described below). However, challenges remain for identifying and interpreting other mutational signatures[76]. 
Despite the development of algorithms for their inference from targeted sequencing data[77], not all mutational signatures are sufficiently specific in their nucleotide pattern to be accurately inferred from such data, and some may require the detection of multiple orthogonal signatures to reduce false positives, even when using broader sequencing strategies[78,79]. Moreover, incorrect attribution of such mutational signatures can complicate the clinical interpretation of key lesions and therapeutic vulnerabilities, perhaps best typified by homologous recombination deficiency and PARP inhibitor use[80,81]. The specificity and accuracy of mutational signature inference, whether driven by substitutions, DNA copy number changes[82], structural rearrangements, or the combination thereof[33,78], will improve with the clinical use of the aforementioned WGS modalities[78].
Integrated germline and somatic reporting
With adoption of routine mutational signature and zygosity inference from matched germline profiling in the context of clinical sequencing, there exists an opportunity for fully integrated germline and somatic reporting to more completely understand disease pathogenesis and inform treatment. Such an approach may ultimately go beyond addressing ambiguities regarding the origin of somatic phenotypes and define a more comprehensive view of disease as arising from the interaction of heritable and somatic factors. This shift toward integrated germline-somatic reporting is motivated in part by the rate at which pathogenic variants are identified from broad-based clinical genomic profiling in both unselected patient cohorts and those with cancer[37,83-87]. Placing germline pathogenic variants into their somatic context, be it their tumor-specific zygosity or association with broader somatic mutational signatures, is increasingly important to guide the interpretation of their biological significance. Indeed, there is emerging evidence that zygosity changes that accompany germline pathogenic variants in even classical cancer predisposition genes are variable depending on cancer type, and may reflect differences in subsequent somatic mutational signatures of that lesion and even therapy response[32].
Such discovery-based science at the intersection of germline and somatic cancer genetics is revealing new complexities in the interpretation of individual variants (Box 2). These complexities are therapeutically relevant today, such as in the case of a cancer patient found to harbor a pathogenic variant in an MMR gene but whose tumor lacks MSI and therefore would not be predicted to be sensitive to ICB therapy[12,13]. Richer clinical reporting could streamline testing for patients and inform multiple aspects of clinical care[88]; however, to our knowledge routine fully integrated clinical reporting is not presently performed at scale. 
This approach remains complicated by privacy considerations, cost concerns, and the complexity of incidental findings[89], which have potential consequences for genetic counseling[90] and can trigger cascades of subsequent care of uncertain value[91]. These are complex and both labor- and expertise-intensive endeavors[92] that require substantial infrastructure and investment, likely explaining, at least in part, the prevalence of tumor-only sequencing. Addressing the complexities of integrated germline and somatic reporting is urgent as the community is already seeking to incorporate somatic features into the classification of germline variants of uncertain significance[93].
Prospective immunogenomics
A perhaps surprising area of innovation for larger-panel clinical sequencing has been immunogenomics. Principal among these advances is the routine calculation of tumor-intrinsic biomarkers of response to ICB therapy such as tumor mutational burden (TMB)[94-97]. However, TMB can be affected by tumor purity[98], intratumoral heterogeneity[99], disease subtype[100], mutational signatures[101-103], and, from the perspective of clinical genomics, panel size and content[104-106]. Overall, considerable work remains to refine the clinical utility of TMB[107] and to better understand both the factors with which it co-varies and how it affects and confounds the identification of single-gene or pathway-based biomarkers of response to ICB therapy.
As an adjunct to TMB, other factors essential for modern immunogenomics can be readily inferred from the current generation of clinical sequencing assays, such as the genotypes of human leukocyte antigen genes (HLA-A, B, and C). These genes encode MHC class I proteins that present intracellular peptides on the cell surface to the immune system. Overall, the diversity and evolutionary divergence of the HLA class I repertoire has been associated with differences in the efficacy of ICB therapy[108,109]. In tumors, somatic mutations in these genes and related antigen processing machinery have been correlated with immune cell infiltration[110], and somatic loss-of-heterozygosity (LOH) of the HLA locus is associated with high subclonal neoantigen burden and is a mechanism of immune escape[111,112]. Notably, accurate HLA class I genotyping[113,114] and somatic HLA zygosity inference require patient-matched germline DNA sequencing, reinforcing its importance as part of the routine clinical workflow. 
This can further consolidate clinical testing modalities and inform novel enrichment and accrual strategies for the next generation of clinical trials testing HLA-specific neoantigen-directed cancer vaccines among others[115,116].
The inclusion of HLA class I genes in paired tumor and matched germline clinical sequencing also facilitates neoantigen discovery[117]. Tumor-specific neoantigens result from somatic mutations and are foreign peptides absent from normal tissue. Clonal neoantigen burden is a biomarker of response to ICB therapy[99]. Looking beyond just the burden of neoantigens, fitness models have been proposed to measure neoantigen quality, which is also associated with longer term survival in pancreatic cancer patients[118,119]. Personalized vaccine strategies are being developed to target neoantigens that autologously bind to HLA genes within cancer patients[120-123]. However, targeted clinical sequencing can only identify a small fraction of the potential neoantigens present in individual tumors. As such, WES has been primarily used to identify and prioritize putative neoantigens and vaccine targets via in silico approaches, often in conjunction with RNA sequencing to assess peptide expression[121,123]. By contrast, other neoantigen-based strategies can directly benefit from the current generation of largely targeted sequencing in oncology. One such strategy is leveraging shared or ‘public’ neoantigens that bind common HLA subtypes and can potentially benefit a larger population of patients and avoid much of the complexity and cost of developing personalized therapies from private neoantigens[124-126].
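Returning to the dependence of TMB on panel size noted earlier in this section, the following minimal sketch shows one reason why that dependence arises. It assumes that TMB is reported as somatic mutations per megabase of sequenced territory and, as a simplification, that the observed mutation count is Poisson distributed. The function names and the footprint sizes used in the example are hypothetical and do not describe any particular assay.

```python
import math


def tmb_per_mb(n_mutations: int, covered_mb: float) -> float:
    """Tumor mutational burden as mutations per megabase of covered sequence."""
    if covered_mb <= 0:
        raise ValueError("covered_mb must be positive")
    return n_mutations / covered_mb


def relative_sampling_error(true_tmb: float, covered_mb: float) -> float:
    """Coefficient of variation of the TMB estimate.

    Assumes the mutation count is Poisson with mean true_tmb * covered_mb,
    so the standard deviation of the count is the square root of its mean.
    """
    expected_count = true_tmb * covered_mb
    return 1.0 / math.sqrt(expected_count)


if __name__ == "__main__":
    # Same underlying burden (10 mutations/Mb), two illustrative footprints.
    for label, mb in [("~1 Mb panel", 1.0), ("~30 Mb exome", 30.0)]:
        cv = relative_sampling_error(true_tmb=10.0, covered_mb=mb)
        print(f"{label}: relative sampling error ~ {cv:.0%}")
```

Under these assumptions the relative error shrinks with the square root of the sequenced territory, which is one reason smaller panels give noisier TMB estimates near a clinical cutoff. Differences in gene content and variant filtering add further variability that this sketch does not model.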
Early detection and disease classification
Clinical genomics has proven enormously valuable for identifying patients with disease states that inform screening and early detection for other malignancies. For instance, prospective clinical sequencing can identify patients with solid tumors as well as clonal hematopoiesis (CH)[127]. CH is characterized by somatic mutations in hematopoietic stem and progenitor cells in patients without hematologic disease[128,129]. CH mutations typically arise in genes associated with myeloid disease and other malignancies and therefore are readily detected in prospective sequencing of cancer patients, especially those for whom matched germline sequencing is performed at high depth of coverage. As cancer patients with CH are at greater risk of developing hematologic malignancies such as AML and MDS[127], it may serve as an increasingly important facet of early detection and screening[130,131], with future clinical validation particularly dependent on large-scale clinical sequencing initiatives. Finally, broader efforts exist to integrate multiple types of alterations that can be readily inferred from targeted clinical sequencing data of individual patient tumors, coupled to novel computational methods development including those in machine learning. One such example is the development of algorithms to predict tissue of origin for cancers from clinical genomics data alone[132] that were themselves trained on large-scale clinical sequencing data. 
Such point-of-care decision support can serve as a useful adjunct to conventional histologic diagnosis and is another example of the value of discovery in clinical sequencing feeding back into the clinical enterprise, in this case potentially providing integrated pathological diagnoses.
Overall, the breadth of these discovery efforts emphasizes that, when properly controlling for the effect of prior therapy, even foundational aspects of tumorigenesis and cancer biology can be revealed from the sequencing of largely advanced and metastatic specimens.
Implementation and downstream uses
As clinical genomics has emerged as standard-of-care testing for many indications, this has led to a proliferation of different assays at major academic medical centers, which has both benefits and drawbacks. Complexities include the substantial and ongoing investments in assay development, clinical validation, and infrastructure as well as the variability in test content and capability that can complicate care, reimbursement, and trial accrual and design. By contrast, advantages include control over gene panel content, access to deep clinical annotation, sample and assay quality assessment data to drive iterative assay improvement (of diminishing returns as the feasibility of broad-based clinical WGS grows), customizable integration of reports into the electronic medical record, a unified platform for clinical use and discovery, and long-term cost effectiveness[16]. Test proliferation at academic medical centers also indirectly fosters innovation for the broader field that may not otherwise happen, through responses to institution-specific needs. These advantages are often seen only with economies of scale, so the field must democratize the expertise and benefits from such test development to the broader clinical oncology community. This could dovetail with other models that seek to build partnership with the pharmaceutical industry and government with a network of centers to facilitate information, data, and resource exchange[133].
Translational science in clinical trials
Target-driven clinical trials were an early crucible through which key lessons were learned for the implementation and use of prospective sequencing in oncology[2]. Challenges abound for the translational science incorporated into such trials. As discussed above, significant variability in clinical sequencing test capabilities exists between sites enrolling patients in multi-center trials. This effectively limits the sample size for correlative genomics to only those sites with sufficient capabilities unless trial agreements break down the barriers typical of competitive academic sites to facilitate the transfer of materials for study. In addition, the desire in many target-driven trials for central tissue testing to support companion diagnostic test development that may be redundant with the activities of the individual sites can further limit high-quality genomic studies by depleting key tissue resources. Many of these issues can be overcome with cross-laboratory concordance testing[134,135] and compliance with guidelines and standards for next-generation sequencing[23,25,136]. Also playing an important role is community-wide harmonization of key biological features across various clinical sequencing assays, such as recent efforts for tumor mutational burden[105-107]. While challenging, these efforts are likely more straightforward than harmonizing the real-time interpretation of candidate enrolling and sensitizing lesions[27] across sites without some degree of central review or formal FDA recognition[137].
As biomarker hypotheses and enrichment strategies to guide trial enrollment become more complex, some enrolling sites and institutional tests will have the necessary capabilities, while others may not. These differences in testing capability may in turn influence accrual, trial design and availability, and ultimately drug approval. 
Even if such challenges were overcome, many key scientific and clinical questions remain that are difficult to address in the setting of a target-driven clinical trial, where the populations are necessarily small and uniform molecular profiling is absent. This has led to tremendous enthusiasm for real-world data and evidence to supplement the lessons learned from clinical trials[138,139]. Prospective studies using real-world data, defined here as population-level clinical and molecular data acquired outside the context of a clinical trial, provide an avenue to address such questions, but these will require careful clinical data homogenization, the selection of adequate control populations, and more to ensure that clinical practice realities do not limit robust results.
Retrospective biomarker analyses
The greatest value of clinical sequencing for discovery is arguably the real-time integration with clinical data including maturing outcomes and treatment phenotypes. This permits both an initial discovery and its clinical cross-validation to take place at the same time and in the same cohort, which is a strategy leveraged for many of the aforementioned discoveries. Clinical and treatment annotation will be of increasing importance for key retrospective biomarker questions that are impossible to answer in the setting of a clinical trial. However, a multitude of challenges complicate rigorous retrospective analyses, which can risk erroneous clinical findings (Fig. 3a).
Fig. 3: The known unknowns and potential pitfalls of retrospective biomarker analyses.
a) Various potential confounding factors that complicate the current generation of rigorous retrospective biomarker analyses using real-world clinical sequencing data in oncology. b) Over time, as clinical practice patterns change and the patient population for which clinical sequencing is routinely performed expands beyond late-stage and treatment refractory disease, the prognostic composition of the cohort will shift, leading to potentially spurious associations with outcome. c) Biomarker analysis pan-cancer can fail to discriminate between survival differences driven by underlying biology versus therapeutic intervention when affected cancer types have very different natural histories or the biomarker itself bestows favorable or worse prognosis. d) The effect of subsequent lines of therapy can confound key clinical endpoints in clinical sequencing cohorts composed of patients with heterogeneously administered therapies. TTF, time to treatment failure; OS, overall survival.
Foremost among these challenges is the lack of rigorously validated quantitative treatment outcomes. Overall survival (OS) measured from the start of treatment is justifiably the established and preferred endpoint for demonstrating clinical benefit from a therapy of interest in clinical trials. However, to date most patients for whom clinical sequencing is obtained are not part of a therapeutic trial enrolling homogeneous patient populations who then receive highly standardized therapy, the response to which could then be measured in a standardized and quantitative manner. Moreover, an increasing number of retrospective biomarker analyses are being performed pan-cancer, driven in part by the recent excitement around tissue-agnostic biomarkers of therapeutic sensitivity[13,14]. Yet multiple potential confounding factors can plague different clinical endpoints in retrospectively collected treatment data in a pan-cancer cohort, preventing accurate assessment of clinical benefit from a given line of therapy. This is especially true of the preferred OS endpoint that, when analyzed pan-cancer, does not account for the often-profound prognostic differences between individual cancer types.
First, initial efforts to incorporate clinical genomics into active cancer care focused on the most advanced and treatment-refractory patients in need of novel therapeutic strategies, who had the poorest prognosis. Over time, as practice patterns change with the approval of biomarker-driven therapies in earlier settings, clinical sequencing has expanded to patients with earlier-stage disease. This can lead to ‘prognostic drift’ in a cohort that accrues in real time that, if not correctly adjusted for, can produce spurious associations with outcome (Fig. 3b). Second, prognosis between any two cancer types can vary widely and independently of the candidate biomarker of interest (Fig. 3c). If a cancer type with a longer natural history also has a significantly higher alteration rate of a candidate biomarker than a cancer type with a poorer prognosis, simple outcome analyses using a clinical endpoint like OS from the start of therapy may arrive at the wrong conclusion about the effect of the sensitizing biomarker. Third, several key genomic alterations being investigated as therapeutic biomarkers can alone distinguish prognostically distinct subsets within individual cancer types, as is the case with BRCA1/2 mutations in ovarian cancers[140]. Consequently, any analysis of such a biomarker using OS will inevitably be confounded by the favorable prognosis its presence bestows on the carrier and fail to discriminate between survival differences driven by underlying biology versus therapeutic intervention. Fourth, OS from the start of therapy can be confounded by therapies received both concurrently with and following the cessation of a given line of treatment (Fig. 3d). The confounding effect of concurrent and subsequent therapy is of particular concern in highly heterogeneous real-world cohorts such as those generated by clinical sequencing in oncology. For these reasons, both routine clinical trial practice and international health authority guidance suggest that OS should only be used as the primary endpoint for comparative analyses in large, rigorously designed, adequately powered, and highly controlled randomized phase III trials that enroll homogeneous patient populations who then receive highly standardized therapy[141-143].
Retrospective therapeutic biomarker analyses must therefore leverage one of multiple potentially suboptimal alternative endpoints that can be subjective or difficult to standardize. Some are more conservative or straightforward to curate than others, such as time to treatment failure or progression-free survival, which may be less prone to confounding in the real-world setting of clinical sequencing in cancer. However, they too can fail to reflect desired therapeutic effects. Time to treatment failure cannot quantify benefit for agents given for fixed durations like platinum-based chemotherapies or anti-CTLA-4 blockade, while progression-free survival can be difficult to determine without the regular interval scans and radiographic measurements that standardize determination of disease progression on the majority of clinical trials.
Ultimately, rigorous and independently validated biomarkers of sensitivity to individual therapeutic agents are urgently needed to guide effective cancer care. However, false signals can lead the biomedical community astray at high cost in time and resources for institutions and patients. So, despite the exploratory power of these cohorts and data, key biomarker questions can likely only be tested robustly in a rigorously controlled large-scale prospective clinical trial where all of the aforementioned confounding clinical variables can be annotated, controlled, and duly adjusted for. For other questions, recognizing that no real-world data approach is perfect can motivate policies that require independent cohorts for clinical cross-validation of individual findings and a focus on more conservative endpoints, which mirrors best practices in the field for signal-seeking clinical trials.
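To make the pan-cancer confounding argument above (Fig. 3c) concrete, the following is a minimal sketch using entirely hypothetical counts, not data from any cohort discussed here. It assumes two invented cancer types in which a biomarker has no effect on a simplified binary endpoint (alive at 2 years, standing in for OS), yet is more prevalent in the type with the longer natural history. The names `Stratum`, `pooled_difference`, and `stratified_difference` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stratum:
    """Counts for one cancer type / biomarker status combination."""
    cancer_type: str
    biomarker_positive: bool
    n_patients: int
    n_alive_2y: int


# Hypothetical counts (assumption): within each cancer type the 2-year
# survival rate is identical for biomarker-positive and -negative patients.
COHORT = [
    Stratum("indolent", True, 600, 480),     # 80% alive, biomarker common
    Stratum("indolent", False, 400, 320),    # 80% alive
    Stratum("aggressive", True, 100, 20),    # 20% alive, biomarker rare
    Stratum("aggressive", False, 900, 180),  # 20% alive
]


def survival_rate(rows):
    """Fraction of patients alive at 2 years across the given rows."""
    total = sum(r.n_patients for r in rows)
    alive = sum(r.n_alive_2y for r in rows)
    return alive / total


def pooled_difference(cohort):
    """Biomarker-positive minus biomarker-negative rate, ignoring cancer type."""
    positive = [r for r in cohort if r.biomarker_positive]
    negative = [r for r in cohort if not r.biomarker_positive]
    return survival_rate(positive) - survival_rate(negative)


def stratified_difference(cohort):
    """Size-weighted average of the within-cancer-type differences."""
    total = sum(r.n_patients for r in cohort)
    weighted = 0.0
    for cancer_type in sorted({r.cancer_type for r in cohort}):
        rows = [r for r in cohort if r.cancer_type == cancer_type]
        weight = sum(r.n_patients for r in rows) / total
        weighted += weight * pooled_difference(rows)
    return weighted


if __name__ == "__main__":
    print(f"pooled difference:     {pooled_difference(COHORT):+.3f}")
    print(f"stratified difference: {stratified_difference(COHORT):+.3f}")
```

With these assumed counts, the pooled comparison suggests a survival advantage of roughly 33 percentage points for biomarker-positive patients, while the comparison stratified by cancer type shows none. A real analysis would use time-to-event methods and adjust for additional covariates; this sketch only isolates the effect of ignoring cancer type.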
Tragedy of the commons
Atypical of research data, clinical sequencing datasets obtained during the course of oncology care are often not siloed in any single research laboratory and therefore can represent catalyzing institutional resources for discovery if shared broadly. Nevertheless, challenges abound. It is unclear how proper attribution and credit can be ensured for data producers, who may be part of the clinical operational enterprise distinct from the research teams leveraging these data. Nor is it clear how to ensure privacy protections for cohorts composed, in part, of still-active cancer patients, while also meeting the obligations to the broader research community for the public deposition of sensitive data types such as raw sequencing data, germline variant calls, and others. It remains difficult to mediate the use of potentially encumbered or embargoed data, for instance from clinical trials with existing agreements governing the use of data prior to the public reporting of clinical efficacy data. These issues are particularly fraught when it comes to clinical annotation and treatment phenotypes. Navigating the issues around clinical data for integration with molecular information requires buy-in from key stakeholders, from the individual treating physicians to the larger research enterprise and potential outside entities such as clinical trial sponsors, among others. Cross-institutional initiatives such as AACR GENIE[144] are seeking new ways to address these issues while aggregating clinical sequencing data from many sources for broader utilization[145].
At our academic cancer center, we have generated and shared clinical sequencing data from a prospective tumor profiling initiative for tens of thousands of tumors over the last several years to facilitate its use. All analyzed molecular data to date is shared via the cBioPortal for Cancer Genomics[146,147]. Raw data is available on institutional computing resources and is accessible to any research team at the institution via an IRB-approved process codified in the protocol to which all patients consent for clinical sequencing. These resources are updated nightly to ensure broadest availability and use. Such resources, however, require nimble governance that ensures open, broad, and rigorous utilization while helping researchers adjudicate overlapping uses. Questions have arisen such as how best to navigate two different research groups at the institution asking the same question of the same largely unpublished data, or how to handle situations when two research groups, leveraging the same institutional data resource but using slightly different analytical approaches, arrive at different conclusions. Ultimately an open dialogue among the key operational, clinical, and scientific stakeholders is necessary to ensure best use of these newer generations of institutional research resources. Our community must also ensure that rigorous metadata accompanies the publication of clinical sequencing and clinical annotation data to ensure proper use by others in the biomedical community encountering these data types for the first time.
Ultimately, unfettered data access and sharing is critical for the biomedical enterprise, catalyzing a greater body of science than can be achieved by any one group. However, such sharing raises the risk of scientific overlap and even misinterpretation that can lead to incorrect findings and stifle progress in clinical genomics. Responsibility ultimately lies both with data producers and users to ensure shared data are analyzed in a manner that is rigorous, well-documented, and trustworthy to ensure progress in improving human health and oncology.
Outlook
Significant progress remains to be made in extending the clinical benefit of prospective molecular characterization to more cancer patients. In parallel, molecular profiling initiatives will continue to grow as an increasing component of oncology care, especially as entirely new modalities of characterization, such as cell-free DNA and single-cell sequencing, mature toward clinical utilization[148-151]. While each of these new technologies will come with its own regulatory, ethical, and practical considerations and complexities, together they represent an unprecedented opportunity for scientific discovery. Our health systems must therefore mature to support this degree of routine molecular profiling, enable seamless and structured data sharing, and ensure real-time integration with deep clinical phenotyping. This will accelerate the discovery of clinical phenotypes associated with alterations in the cancer genome and drive expanded use of real-world evidence to aid in clinical and regulatory decision making.