
Phase and context shape the function of composite oncogenic mutations.

Alexander N Gorelick^1,2, Francisco J Sánchez-Rivera³, Yanyan Cai⁴, Craig M Bielski^1,2, Evan Biederstedt², Philip Jonsson⁵, Allison L Richards⁵, Neil Vasan^1,6, Alexander V Penson^1,2, Noah D Friedman^1,2, Yu-Jui Ho³, Timour Baslan³, Chaitanya Bandlamudi⁵, Maurizio Scaltriti⁴, Nikolaus Schultz^2,5,7, Scott W Lowe^3,8, Ed Reznik^9,10, Barry S Taylor^11,12,13,14.

Abstract

Cancers develop as a result of driver mutations[1,2] that lead to clonal outgrowth and the evolution of disease[3,4]. The discovery and functional characterization of individual driver mutations are central aims of cancer research, and have elucidated myriad phenotypes[5] and therapeutic vulnerabilities[6]. However, the serial genetic evolution of mutant cancer genes[7,8] and the allelic context in which they arise are poorly understood in both common and rare cancer genes and tumour types. Here we find that nearly one in four human tumours contains a composite mutation of a cancer-associated gene, defined as two or more nonsynonymous somatic mutations in the same gene and tumour. Composite mutations are enriched in specific genes, have an elevated rate of use of less-common hotspot mutations acquired in a chronology driven in part by oncogenic fitness, and arise in an allelic configuration that reflects context-specific selective pressures. Cis-acting composite mutations are hypermorphic in some genes in which dosage effects predominate (such as TERT), whereas they lead to selection of function in other genes (such as TP53). Collectively, composite mutations are driver alterations that arise from context- and allele-specific selective pressures that are dependent in part on gene and mutation function, and which lead to complex, often neomorphic, functions of biological and therapeutic importance.

Year: 2020 PMID: 32461694 PMCID: PMC7294994 DOI: 10.1038/s41586-020-2315-8

Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 69.504

To study the pattern, prevalence, and function of composite mutations in cancer, hereafter defined as two or more distinct somatic mutations in the same gene and tumor specimen, we analyzed the germline blood and matched tumor tissue of 31,359 cancer patients in whom prospective clinical sequencing was performed to guide treatment decisions for advanced and metastatic disease (Fig. 1a, Extended Data Fig. 1a, Supplementary Table 1).

Fig. 1:

Composite mutations in human cancers.

a) Schematic representation of composite mutation discovery and characterization. b) Top, statistically significant enrichment (P < 10−5) for composite mutations in tumors of increasing tumor mutational burden. Nominal P based on one-sided permutation tests for enrichment (100,000 permutations) applied independently to the subset of tumors with each indicated TMB (bottom, number of cases), n=30,505 biologically independent tumor samples with TMB ≤ 40. c) Proportion of composite mutations including the fraction ascribed to mutational processes associated with hypermutation (MSI, microsatellite instability; MMR, mismatch repair; TMZ, temozolomide-associated hypermutation; POLE, DNA polymerase epsilon-associated hypermutation; cases excluded from analysis unless otherwise noted). d) Percentage of cases with composite mutations by cancer gene function. P<10−308 (numeric limit, two-sided McNemar’s test; n=29,507 patients). e) Types of composite mutations by cancer gene function (P<10−308, numeric limit, two-sided Fisher’s exact test; n=5,954 composite mutations). Error bars in panels d-e are 95% binomial confidence intervals (CIs).

Extended Data Fig. 1:

Study cohort and rates of composite mutations.

a) Distribution of cancer types in the study cohort. b) The rate of composite mutations (22.7% of all tumors) compared to a simulated background rate (black; P = 10−5 from one-sided permutation test for enrichment with 100,000 random permutation-based simulations; no permutation exceeded the observed value). c) The observed rate of composite mutations in the primary untreated cancers of The Cancer Genome Atlas cohort (n=10,908 solid tumors) when controlling for gene content for consistency with the targeted sequencing panel of the prospective cohort studied here. In black, null distribution from sampling (see Methods). d) The observed and expected rate of composite mutations in tumors of the indicated tumor mutational burden (as in Fig. 1b; n=30,505 biologically independent tumor samples with TMB ≤ 40; P = 1×10−9 from two-sided Wilcoxon signed-rank test).

Selection for composite mutations

In total, 22.7% (n = 7,874) of tumors harbored at least one composite mutation, 56% more frequent than expected by chance when controlling for gene content and mutational burden (P < 10−5, see Methods; Extended Data Fig. 1b–c, Supplementary Table 2). Significantly more composite mutations arose than would be expected in cases of modest mutational burden (4–12 mutations/megabase, ~45% of all tumors, P < 10−5; Fig. 1b, Extended Data Fig. 1d), an enrichment that decreased in tumors of increasing mutational burden. As positive selection cannot be easily distinguished from the predominantly neutral impact of increasing mutational burden, high mutational burden tumors were considered biologically distinct and excluded from analysis (see Methods, Fig. 1c). Finally, we also found that known mechanisms of localized hypermutation explained few composite mutations overall (Extended Data Fig. 2).

Extended Data Fig. 2:

Sources of local hypermutation.

a) The number of composite mutations composed of two or more constituent variants (top) and the distribution of likely causative mutational signatures among them (bottom, see legend). Composite mutants composed of more than three mutations were increasingly produced by APOBEC-associated mutagenesis indicative of localized hypermutation[54,55], but accounted for a minority of events cohort-wide. b) Left, the somatic mutational data in the study cohort reflected the elevated mutation rates previously observed both at the positions closest to the nucleosome dyad and at DNA bound to active transcription factor binding sites[39,40]. However, mutations arising in composite events were proportionally less often proximal to such sites (defined here as within the full width at half maximum of the peak of mutation rate, red) than were singleton mutations (right, P = 10−27 and 10−47, respectively; two-sided two-sample Z-test, n=323,883 single-nucleotide substitutions arising in 471 biologically distinct melanoma samples).

Composite mutations arose more frequently in tumor suppressor genes (TSGs) than in oncogenes (17.5 versus 6.7% of all mutations; P = 10−261, two-sample Z-test) (Fig. 1d). Furthermore, 70% of composite mutations in TSGs consisted of one or more truncating variants, compared to only 13% for oncogenes (Fig. 1e), suggesting that biallelic loss drives the enrichment for composite mutations in TSGs. Lineage-specific patterns of driver mutations in individual cancer genes were, in part, reflected in the pattern of composite mutations pan-cancer (Fig. 2a and Extended Data Fig. 3a). This included a higher burden of composite mutations in PIK3CA in breast cancers, APC in colorectal cancers, CDK12 in prostate cancers, and EGFR in both lung cancers and gliomas, among others. By contrast, not all significantly mutated genes had frequent composite mutations, such as KRAS in multiple cancers or VHL in renal cell carcinomas, often reflecting serial genetic evolution by other means.

Fig. 2:

Gene and residue-specific selective pressure for composite mutations.

a) Prevalence of composite mutations by affected gene and lineage (cancer types of >100 and ≥5 total and composite-mutant cases, n=31,563 samples). Top, percent of cases with composite mutations and the expected value based on cohort size and mutational burden. Expected values are the mean percentage of 10,000 random permutations for each lineage; bars, 95% CIs. b) The significance of enrichment for composite mutations in cancer genes (FDR-adjusted P values from one-sided binomial test for enrichment, n=26,997; light gray is not significant). c) Hotspot mutation utilization among composite and singleton mutations by decreasing population-level frequency (P<10−308, two-sided Mann–Whitney U test, n=93,616 and 2,920 singleton and composite missense mutations respectively in 25,037 patients). Inset, the percent of all missense mutations comprising composite and singleton mutants that were individually significant mutational hotspots. P, two-sided two-sample Z-test for equal proportions, n=105,297 total single-nucleotide variants, error bars are 95% binomial CIs. d) Right and left are the proposed and observed temporal order of acquisition of two functional variants in composite mutations in oncogenes (from mutation clonality). TSGs shown as a negative control. P, two-sided binomial test, error bars in all panels are 95% binomial CIs (n=336 evaluable composite mutations).

Extended Data Fig. 3:

Number and distribution of composite events across genes.

a) The number and percent of cases in the study cohort harboring composite mutations in the indicated genes (right) juxtaposed to their overall mutation rate (left). Shown are the genes with a significant enrichment of composite mutations (Q < 0.01, FDR-adjusted P values from one-sided binomial test for enrichment, n=26,997, as in main text Fig. 2b), limited to the top 10 genes by significance in each category of gene function unless fewer. b) The significance of enrichment for composite mutations (n and statistical tests as described above and in main text Fig. 2b) limited to 168 oncogenes.

We next sought to determine whether individual cancer genes were enriched or depleted for composite mutations, controlling for determinants of their background mutation rate (see Methods)[9]. In total, 34 genes were significantly enriched for composite mutations (Q < 0.01; Fig. 2b, Supplementary Table 3), including both TSGs such as APC, TP53, PTEN, and MAP3K1, and oncogenes, the most significant of which was PIK3CA (9.9% of all PIK3CA mutations were composite, 95% CI 9.0–10.9; Extended Data Fig. 3b). Other frequently mutated oncogenes were not enriched for composite mutations, including IDH1, reflecting the requirement for heterozygosity in IDH-mutant cells to sustain adequate D-2HG production[10], and KRAS, which may reflect selection against further detrimental oncogenic Ras activation[8,11]. Mutational recurrence alone cannot, therefore, predict whether a cancer gene is enriched for composite mutations. Consistent with their selection, composite mutants were 2.5-fold more likely than individual mutations to include a hotspot, that is, a residue mutated in cancer more often than would be expected in the absence of selection[12,13] (P ≈ 0, two-sample Z-test for equal proportion) (Fig. 2c). Composite mutations notably lacked the hotspots of greatest positive selection (e.g. KRAS G12, BRAF V600), but were instead prevalent among less common hotspots, suggesting selective pressure is greatest for weakly functional alleles. Based on differences in their clonality, in 69% of cases the more prevalent hotspot mutation (at the population level) preceded the less prevalent mutation in oncogenes (95% CI 59–78%, Fig. 2d), consistent with a model whereby the less prevalent allele synergizes with a more potent initial hotspot mutation. TSGs exhibited no such temporal ordering, reflecting how prevalence is poorly correlated with fitness for predominantly loss-of-function mutations. Together, these data indicate a strong mutant allele-specific selective pressure for composite mutations that evolve along a chronology driven in part by oncogenic fitness.

Phase and function

The elevated rate of likely driver mutations in composite mutants led us to investigate their allelic configuration. We combined sequencing read support with clonality to phase mutations, thereby ensuring composite mutations arose in the same tumor cell population. Among evaluable composite mutants, 67% and 19% arose in cis or trans respectively (n = 977 and 275), while 14% were indeterminate (n = 210). The higher rate of cis mutants reflected, in part, reduced sensitivity for detecting trans mutations from the short-read sequencing used here, an effect we controlled for in subsequent analyses (see Methods). TSGs were significantly more likely to harbor composite mutations in trans (71% in trans, n=79 of 111), especially those with two truncating mutations consistent with biallelic inactivation. By contrast, composite-mutant oncogenes with two missense mutations were largely cis-acting (91%, n=243 of 268; P = 3×10−33, two-sided Fisher’s exact test) (Fig. 3a). Composite mutations involving silent mutations exhibited no such difference in phase among these genes, suggesting that cis mutant enrichment in oncogenes reflects selective pressure. Notably, while not precluding resistance in trans[14], all identified secondary resistance mutations arose in cis[15-17] (n=18; P = 0.02, two-sided Fisher’s test; Fig. 3b, Extended Data Fig. 4), suggesting exogenous selective pressures drive, in part, the phase of composite mutations.

Fig. 3:

Cis- and trans-acting composite mutants.

The phase of composite mutations by their a) type and affected cancer gene (P for starred comparisons from left to right are 4×10−4, 2×10−5, 3×10−33, 8×10−24, 8×10−24, and not significant was 0.3, two-sided Fisher’s exact test, n=1,062 evaluable composite mutations and error bars are 95% binomial CIs); b) association or not with acquired therapy resistance (P, two-sided Fisher’s exact test, n=1,198 evaluable composite mutations); c) affected individual oncogenes and TSGs (top and bottom, known or predicted functional mutations in ≥10 phase-able tumors, number of cases with phase-able composite mutations as indicated). d) The pattern of TP53 composite mutations with arcing lines indicating the position of pairs of mutations in ≥2 tumors; height corresponds to recurrence. At bottom, the number of mutated cases at each individual residue and the Q of significance (FDR-adjusted P value from one-sided binomial test) for each residue as arising in composite. TAD, transactivation domain; OD, oligomerization domain. e) Schematic of the experimental workflow for generating isogenic cells for phenotypic comparison of TP53 mutations. f) Heatmap of the top 30 differentially expressed genes between TP53-, TP53-, and TP53-mutant cells.

Extended Data Fig. 4:

Cis composite secondary resistance mutations.

The cis composite mutations classified as arising in post-treatment specimens due to acquired resistance to one of several molecularly targeted therapies in the study cohort.

Despite these patterns, extensive variability existed in the phase of composite mutations in individual cancer genes (Fig. 3c). EGFR, TERT, and PIK3CA had the highest percentage of cis composite mutations among oncogenes (88–97%). Remarkably, prevalent cis-acting composite mutations were observed even among canonical TSGs, comprising 77.1% of all composite mutations in these genes. Here, TP53 was notable as 43% of all phase-able composite mutations (n = 70 of 163) were cis-acting and enriched in a cluster of residues near the C-terminal end of the DNA binding domain (E287, E285, E271, and R280; Fig. 3d). While short-read sequencing technologies restrict phasing to variants within close physical proximity and potentially overestimate the prevalence of cis mutations, these data are nevertheless inconsistent with conventional loss-of-function via biallelic inactivation and may suggest a broader functional effect of composite mutations in TP53 and other TSGs. To assess the phenotypic consequence of cis-acting composite mutations in the DNA binding domain of TP53, we developed an isogenic system for acute TP53 reconstitution. As E287D was the most significant residue enriched for composite mutants, we focused on a representative TP53 R280T-E287D cis composite mutant. To model its impact in the lineage of affected tumors, we transduced Kras p53 mouse lung cancer cells with GFP-labeled retroviral constructs encoding WT, R280T, E287D, or cis R280T-E287D p53 cDNAs (residues R277T and E284D in mice) after which GFP-expressing cells were selected and RNA sequencing was performed (Fig. 3e, Extended Data Fig. 5a, Supplementary Table 5). TP53 mRNA expression was stable and robust, while TP53, TP53, and TP53 led to a decrease in p21 (CDKN1A) induction, a surrogate marker of p53 functionality (Extended Data Fig. 5b–c). TP53 cells transcriptionally resembled TP53, while TP53 cells resembled TP53 (Extended Data Fig. 5d). 
By contrast, TP53 cells had a mixed transcriptional phenotype, bearing a dominant differential expression signature equivalent to the one induced by either TP53 or TP53 while retaining a TP53-like down-regulation of the AP-1 transcription factor program (Fig. 3f, Extended Data Fig. 5e). These data correlated with human tumor genomics, whereby null-like TP53 R280T was common, but TP53 E287D was rare and nearly always arose as a composite mutation (Extended Data Fig. 5f). A second cis-acting composite mutant (TP53) similarly promoted a transcriptional program distinct from its constituent mutations (Extended Data Fig. 5g). Importantly, TP53 was not associated with increased growth in vitro or survival in vivo compared to the individual mutations (Extended Data Fig. 5h–i). Collectively, these data suggest that cis-acting TP53 composite mutations tune mutant p53 transcriptional phenotypes, leading to a selection of function absent from null-like single TP53 mutations.

Extended Data Fig. 5:

Phenotypic characterization of TP53 composite mutants.

a) TP53 R280T-E287D mutant lung adenocarcinoma. Left, mutant allele fractions of clonal TP53 mutations consistent with loss of WT TP53 (error bars, 95% binomial CIs). Expected mutant allele fractions of different copy number states are shown as horizontal lines. Mutant KEAP1 in the same tumor (with LOH) is shown for reference. Right, spanning reads indicating cis mutations. b) Right and left, TP53 and CDKN1A mRNA expression in Kras p53 mouse lung cancer cells expressing distinct p53 genotypes. Bars, average of three replicates, error bars are 95% confidence intervals. c) The aggregate Z-score per replicate for the mRNA expression of canonical p53 target genes [n=3 replicates per allele; box center is median, edges are 25th and 75th percentiles, whiskers are minima/maxima of the most extreme values]. d) Principal component analysis (PCA) of the transcriptomes of TP53 genotypes (n=3 replicates shown per condition). e) Dendrogram as in main text Fig. 3f indicating the genes of interest [effectors of the AP-1 transcription factor network (PID_AP1_PATHWAY; Q = 1.4e-7 based on mSigDB’s computed overlap with n=5,501 gene sets from the curated C2 collection)]. f) The prevalence of TP53 R280T and E287D mutations (top) and the fraction arising as composite mutants (bottom). In parentheses, corresponding mouse alleles. g) PCA of the transcriptomes of the TP53 R277K-E282K composite mutation genotypes (as in panel d, n=3 replicates per allele). h) The percentage of GFP+ FACS-purified KP LUAD cells stably transduced with pMIG-Empty or pMIG-p53-R277T-E284D and cultured in vitro for 10 days in a 60:40 mixture with untransduced parental cells (bar indicates mean, error bars are standard deviation, n=3 independent infections). i) Overall survival of the indicated genotypes stably transduced in FACS-purified KP LUAD cells (n=100,000 cells) and injected into the tail vein of immuno-compromised mice.

Conditionally dependent mutant alleles

The residue-specific transcriptional phenotypes of TP53 composite mutants suggest broader allele-specific selection among composite mutations. We therefore identified individual alleles exhibiting an excess of composite mutations (see Methods). In total, 86 mutant residues in 24 cancer genes were enriched for arising as composite variants (Q < 0.01) (Fig. 4a, Supplementary Table 4). Nearly 70% of these mutations occurred in only four genes (TP53, PIK3CA, APC, and EGFR), with few reaching saturation for discovery at the current cohort size, and 56% also arising as individually significant hotspot mutations (Fig. 2b and Extended Data Fig. 6)[13]. As with TP53, several TSGs had mutant allele-specific enrichment that may suggest selection for something other than conventional loss-of-function. In PIK3CA, mutations enriched for composite mutants (E726, E453, K111, R108, R93) were nearly always in cis when phase-able and often arose through APOBEC-associated mutagenesis (Extended Data Fig. 7). Notably, composite PIK3CA mutations drive elevated PI3K activity, downstream signaling, cell proliferation, tumor growth, and may increase PI3K inhibitor sensitivity[18], confirming that in addition to introducing passenger mutations, APOBEC and other mutational processes create numerous functional driver mutations.

Fig. 4:

Mutant allele-specific enrichment for composite mutations.

a) Enrichment significance of individual mutant residues arising in composite mutations (n=1,821 distinct mutant sites tested; n=155,241 variants overall) compared to significance of composite enrichment among genes (Q for mutant sites is FDR-adjusted one-sided Fisher’s exact test and Q for genes, refer to Fig. 2b). b) The degree of TERT expression induced by transient transfection of the indicated mutations individually or as cis composite in three melanoma cell lines. Shown is average and standard error (error bars) across n=4 or 5 replicates per allele. P, two-way ANOVA assessing expression as a function of genotype and baseline expression of each cell line (see Methods); at bottom, P values (all <10−8) from left to right are 3×10−9, 1×10−9, 2×10−9, and 2×10−11.

Extended Data Fig. 6:

Saturation analysis of genes for composite mutation detection.

Down-sampling indicates the number of residues identified as enriched for arising in composite mutations in each of four genes (Q < 0.1, FDR-adjusted one-sided Fisher’s exact tests as in Fig. 4a; n=1,000 – 26,997 patients per down-sample) as a function of the number of tumors sequenced (loess fit is shown with 95% confidence interval). Shown are the four genes that accounted for the greatest proportion of all enriched residues detected (main text Fig. 4a). EGFR appears to reach saturation for discovery of residues enriched for arising in composite, whereas the other genes have not yet reached saturation for discovery at the current cohort size.

Extended Data Fig. 7:

Mutational signature attribution among composite mutations.

a) The fraction of all composite mutations identified here in which one or both individual mutations could be unambiguously attributed to an established mutational signature. The majority of composite variants could not be directly attributed to APOBEC, UV, smoking, or other known mutational signatures. b) The fraction of composite mutations per gene in which one or both variants could be attributed to an established mutational signature.

Multiple significant residues appeared to be conditional alleles, rarely arising without a second cis activating mutation (Extended Data Fig. 8a). Among these were EGFR-mutant residues (E709, V834, and L833)[19] and the TERT promoter mutation 205G>A (Fig. 4a). TERT promoter mutations are common in human cancer[20] and create novel GABPA binding sites that promote aberrant telomerase activity[21]. 205G>A was the sixth most common TERT promoter mutant and exclusively arose in cis (n = 13 of 13) with either the highly prevalent 228G>A or 250G>A hotspots which, despite their frequency, were never together in composite (Extended Data Fig. 8b). To test if 205G>A synergizes with existing promoter mutations to enhance TERT expression, we expressed constructs with a luciferase reporter engineered to contain various TERT promoter mutations alone or as cis composite mutants in three melanoma cell lines (A375, Sk-Mel2, and Sk-Mel30). TERT 205G>A induced modest TERT expression compared to wildtype, but less than TERT 228G>A or TERT 250G>A alone. Consistently, TERT 205G>A creates a novel motif that GABPA binds with lower affinity than those created by canonical TERT hotspots (Extended Data Fig. 8c). The selective pressure for TERT 205G>A is therefore likely based on the cooperativity of tandem motifs associated with it and canonical promoter hotspots bound by GABPA heterotetramer complexes[21]. When expressing TERT 205G>A as a cis-composite with either TERT 228G>A or TERT 250G>A, thereby modeling the 205G>A-mutant human tumors, TERT expression increased relative to either mutation alone (Fig. 4b). These data suggest that 205G>A is hypermorphic, driving modestly elevated TERT expression that is weakly selected for and therefore does not arise as an individual hotspot mutation, but is instead a conditionally dependent composite allele.

Extended Data Fig. 8:

Conditional mutant alleles.

a) The number of affected cases harboring each of the indicated somatic mutations in TERT, EGFR, or PIK3CA as either individual mutations (top) or as part of composite mutants (bottom). Conditional mutations were defined as those statistically enriched for arising as part of composite mutations, but seldom as individual hotspot mutations in cancer (predominantly accompanied by a second somatic mutation). b) The incidence of TERT promoter mutations and the fraction arising as composite mutations (orange). Bottom, the co-occurrence and mutual exclusivity of composite mutations in the TERT promoter (Ps for annotated tiles are: five, 0.002; six, 3×10−7; zero, 1×10−25; two-sided Fisher’s exact test, n=29,507 patients). c) Transcription factor GABPA binding affinity for mutant and wildtype TERT promoter sequences at the 228G>A, 250G>A and the conditional 205G>A allele.

Our results indicate that composite mutations are driver alterations whose selective advantage appears to be primarily determined by their allelic configuration and context. No single model explains the context-dependent phenotypic consequences of composite mutations. In some cancer genes whose function is dosage-dependent, cis-acting composite mutants are additive and arise predominantly in weakly oncogenic alleles and genes (e.g. PIK3CA[22-24]). This suggests an evolutionary model whereby the second mutation arises through selection for hypermorphic activity beyond the level sufficient for activation by the first allele. In other genes like TP53 whose phenotypic consequences are manifold, cis mutants seem to drive functional innovation. There, the evolutionary advantage consistent with our results is via tuning subtle phenotypic differences conferred by the asymmetric combination of the output of individual mutations. Mutant cancer genes must ultimately be considered, both biologically and clinically, in their allelic context, with implications for our understanding of cancer gene function, malignant phenotypes, and therapy.

Methods

Prospective sequencing cohort

Somatic mutation data consisted of 34,650 tumor and matched normal specimens from 31,359 patients with prospectively characterized solid cancers. All patients provided written informed consent and were prospectively sequenced as part of their active care at Memorial Sloan Kettering Cancer Center (MSKCC) between Jan. 2014 and Apr. 2019 as part of an Institutional Review Board-approved research protocol (NCT01775072). Details of patient consent, sample acquisition, sequencing and mutational analysis have been previously published[25,26]. Briefly, matched tumor and blood specimens for each patient were sequenced using MSK-IMPACT, a custom hybridization capture-based next-generation sequencing assay. All samples were sequenced with one of three incrementally larger versions of the assay encompassing 341, 410, and 468 cancer-associated genes, respectively. The study cohort consisted of tumor samples with one of 429 distinct cancer subtypes. To group histological subtypes into primary cancer diagnoses, we utilized the OncoTree structured classification of disease (http://oncotree.mskcc.org). Histologic subtypes of fewer than 50 tumor samples were aggregated into a miscellaneous category and non-solid tumor types were excluded from the study cohort (as well as from analyses of The Cancer Genome Atlas data), resulting in a final cohort of 41 distinct tumor types.

Mutational data and annotation

Somatic nonsynonymous substitutions and small insertions and deletions (indels) were identified with a clinically validated pipeline as previously described[26,27]. Each mutation was classified as likely functional if it was previously reported as a mutational hotspot[12,13] or was part of a cluster of spatially co-located residues that arose in physical proximity in the folded protein in three dimensions[28]. Truncating variants were considered likely functional if they arose in known tumor suppressor genes based on gene function curated by OncoKB[29]. Finally, any additional somatic mutations not satisfying the aforementioned criteria were similarly annotated as likely functional if previously curated via literature mining by OncoKB as oncogenic, likely oncogenic, or predicted oncogenic[29]. For all composite mutants where one or both mutations were a known therapeutic target or known resistance mutation as defined by OncoKB levels 1–4, R1, or R2 alterations (annotation as of April 2019), each mutation was manually reviewed and classified as a likely resistance mutation based on the cancer type of the affected tumor sample, the existence of known resistance mutations to commonly-used targeted therapies indicated for the given cancer type, and if available, review of the clinical histories of affected patients. Composite mutations in which one mutation was an established second-site mutation (e.g. EGFR T790M in non-small cell lung cancer[17] and AR mutations in prostate cancer mediating resistance to anti-androgen therapy) were always classified as resistance mutations. Notably, composite mutations in only 3.4% of cases in this advanced and post-treatment cohort have been associated with therapy resistance, indicating that prior therapy exposure alone cannot explain their prevalence. 
However, as detailed clinical histories including prior lines of treatment and response phenotypes were not available for all patients, a small number of composite mutations are likely misclassified as non-resistance-associated.
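
The likely-functional annotation rules described in this section can be summarized as a small decision function. The sketch below is illustrative only: the dictionary keys (`is_hotspot`, `in_3d_cluster`, `is_truncating`, `gene_role`, `oncokb_call`) and the label strings are hypothetical names chosen here, not fields of the clinical pipeline or of OncoKB itself.

```python
# Hypothetical labels standing in for OncoKB's curated oncogenicity calls.
ONCOGENIC_CALLS = {"Oncogenic", "Likely Oncogenic", "Predicted Oncogenic"}


def is_likely_functional(mutation):
    """Apply the three annotation rules in the order given in the text.

    `mutation` is a dict with hypothetical keys; missing keys count as False/None.
    """
    # Rule 1: known mutational hotspot or member of a 3D spatial cluster.
    if mutation.get("is_hotspot") or mutation.get("in_3d_cluster"):
        return True
    # Rule 2: truncating variant in a known tumor suppressor gene.
    if mutation.get("is_truncating") and mutation.get("gene_role") == "TSG":
        return True
    # Rule 3: any other mutation curated as (likely/predicted) oncogenic.
    return mutation.get("oncokb_call") in ONCOGENIC_CALLS


hotspot = {"is_hotspot": True}
tsg_truncating = {"is_truncating": True, "gene_role": "TSG"}
oncogene_truncating = {"is_truncating": True, "gene_role": "oncogene"}
curated = {"oncokb_call": "Likely Oncogenic"}
```

Manual review of resistance mutations, as described above, is not modeled here because it depends on clinical histories.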

Mutational burden classification

Tumor samples were classified as hypermutated if they harbored either microsatellite instability/mismatch repair deficiency, DNA polymerase epsilon (POLE)-mediated ultra-mutation, or temozolomide (TMZ)-induced hypermutation[30]. Microsatellite instability (MSI) was considered present for any tumor with an MSISensor[31] score of greater than or equal to 10 as previously clinically validated[32]. Tumor samples with POLE, MMR, and TMZ-induced hypermutation were identified by mutational signature decomposition analysis. Briefly, in each tumor specimen with 20 or more substitutions, the proportion of mutations attributable to each of 30 known somatic mutational signatures were calculated based on a basin-hopping algorithm (https://github.com/mskcc/mutation-signatures)33. This method uses the distribution of 96 unique trinucleotides generated by 6 possible C or T-normalized single-nucleotide substitutions (i.e. C>A, C>G, C>T, T>A, T>C, T>G) and their 5’ and 3’-adjacent bases to estimate the fraction of mutations attributed to each mutational signature in each specimen. Tumor specimens for which at least 20% of its substitutions were attributed to POLE (signatures 10 or 14), TMZ (signature 11), or MMR (signatures 6, 15, 20, 21, 26) were classified as hypermutated. To classify tumor specimens with a high mutational burden compared to the majority of cancers of that type, but that otherwise lack one of these known mechanisms of hypermutation, we performed in each individual cancer type of greater than 50 tumor specimens 1-dimensional k-means clustering of the mutational burden of all tumors (nonsynonymous exonic mutations per Mb). Between 1 and 9 clusters were inferred to best describe the distribution of mutational burden per cancer type. 
The lowest-burden cluster centered at 20 or more mutations/Mb and accounting for <10% of the samples in the tumor type established the threshold for high mutational burden, and all tumor specimens in this cluster or in clusters with higher mutational burden were considered to have high mutational burden.
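The rule-based part of this classification (MSI score and signature fractions) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name and input layout are hypothetical, and summing the fractions of signatures within one etiology group is an assumption, since the text does not say whether the 20% threshold applies per signature or per group.

```python
# Thresholds and signature groups as stated in the text above.
HYPERMUTATION_SIGNATURES = {
    "POLE": (10, 14),
    "TMZ": (11,),
    "MMR": (6, 15, 20, 21, 26),
}
MSI_SCORE_CUTOFF = 10          # MSISensor score >= 10 is treated as MSI
MIN_SUBSTITUTIONS = 20         # signatures are only decomposed with >= 20 substitutions
MIN_SIGNATURE_FRACTION = 0.20  # at least 20% of substitutions


def classify_hypermutation(msi_score, n_substitutions, signature_fractions):
    """Return the list of hypermutation mechanisms detected (empty if none).

    signature_fractions maps a signature number to the fraction of
    substitutions attributed to it (hypothetical input layout).
    Assumption: fractions of signatures in the same group are summed.
    """
    mechanisms = []
    if msi_score is not None and msi_score >= MSI_SCORE_CUTOFF:
        mechanisms.append("MSI")
    if n_substitutions >= MIN_SUBSTITUTIONS:
        for name, signatures in HYPERMUTATION_SIGNATURES.items():
            total = sum(signature_fractions.get(s, 0.0) for s in signatures)
            if total >= MIN_SIGNATURE_FRACTION:
                mechanisms.append(name)
    return mechanisms
```

A tumor with an empty result would then be passed to the per-cancer-type clustering step to test for high mutational burden of unknown etiology.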

Composite mutation identification and annotation

For the purposes of this analysis, a composite mutation was the occurrence of two or more somatic mutations to the same gene within a single sequenced tumor specimen. Carriers of pathogenic germline variants with a second somatic mutation were not considered here. We identified composite mutations as arising due to somatic hypermutation or high mutational burden of unknown etiology (as defined above), or a mechanism of resistance to targeted therapy per the aforementioned annotation in non-hypermutated tumors. Any composite mutation arising in a hypermutated tumor was considered separately and excluded from primary analyses unless otherwise noted. All composite mutations not meeting these criteria were analyzed further.
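The core definition can be illustrated with a short sketch. The tuple layout and function name are hypothetical, and the input is assumed to be already restricted to somatic mutations in non-hypermutated tumors.

```python
from collections import defaultdict


def find_composite_mutations(mutations):
    """Group mutations by (sample, gene) and keep pairs with two or more variants.

    mutations: iterable of (sample_id, gene, variant) tuples (hypothetical layout).
    Returns {(sample_id, gene): sorted list of distinct variants}.
    """
    by_pair = defaultdict(set)
    for sample_id, gene, variant in mutations:
        by_pair[(sample_id, gene)].add(variant)
    return {
        pair: sorted(variants)
        for pair, variants in by_pair.items()
        if len(variants) >= 2
    }
```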

Population, gene, and residue-specific composite mutation enrichment testing

Multiple somatic mutations will accumulate in a gene in the absence of selection at a rate that correlates with the mutational burden and mutational mechanisms of a given tumor. Using a permutation-based framework, we simulated the burden of composite mutations for a given tumor mutation burden. Briefly, the true number of tumor specimens harboring a composite mutation was calculated (n_true). We assembled an m × 2 matrix, where m is the total number of nonsynonymous somatic mutations in our cohort. Each row in the matrix identified the sample and the gene in which a particular mutation arose. We constructed a null distribution by randomly permuting the second column of this matrix 100,000 times, thereby preserving the mutation burden of each gene and each tumor specimen. Upon each iteration i, the number of tumor specimens harboring a composite mutation was reassessed (n_i). An empirical p-value was calculated as the fraction of permutations satisfying n_i ≥ n_true. We used the same procedure for assessing the enrichment of composite mutations for tumor samples in ranges of specific mutational burdens. To test for enrichment or depletion for composite mutations within cancer types (in cancer types with greater than 50 profiled tumors), we used a modified permutation analysis controlling for the underlying gene-specific tendency for mutated genes within each cancer type to harbor a composite. To do so, we defined a mutation event to be a tumor sample-mutated gene tuple. A mutation event (s,g) occurs when a tumor sample s was found to harbor one or more mutations to a gene g. Then, we implemented a permutation analysis that shuffles mutations across samples in a manner that preserves 1) gene mutation burden, 2) tumor sample mutation burden, and 3) the total number of mutation events that were observed in that cancer type using the permatswap function in the R package vegan[34]. 
This final constraint enforces that the number of non-zero entries in the mutation event matrix (the binary matrix of patients and genes) remains constant for each permutation. This constraint is particularly relevant in cancer types whose mutation burden is dominated by genes that are depleted of composite mutations (e.g. KRAS in pancreatic cancer, BRAF or KRAS in thyroid cancer). We evaluated the enrichment of composite mutations in each gene by modeling composite mutation burden as a function of genomic covariates, testing the likelihood of the observed number of composite mutations (corresponding to the probability of observing this burden of composite mutations by chance) using a binomial test. To parametrize $p^{g}$ (the background rate of composite mutations in the absence of selection for each gene g), we estimated the expected number of composite mutated samples in a gene, $\hat{n}_c^{g}$, from the total number of samples with an observed mutation in the gene, $n^{g}$, such that $p^{g} = \hat{n}_c^{g} / n^{g}$. Dropping the superscript for clarity, $\hat{n}_c$ was estimated for each gene using negative binomial regression to model the observed number of composite-mutant samples in a gene $n_c$ as a function of the global background rate of composite mutations across all genes, adjusted for multiple covariates per gene including its replication timing r, coding sequence length l, the percent of GC content g and the chromatin state of the gene h. Coding sequence length and percent of GC content were obtained from the Biomart community portal[35] for Ensembl human reference genome GRCh37. For the purposes of statistical testing, the non-coding promoter region of TERT was added as a distinct unit (gene) for which we computed distinct values of percent GC content and length for the region targeted by the MSK-IMPACT assay design. Replication timing and chromatin state for each gene were obtained from previous estimates[9]. 
Additional covariates included the version of the MSK-IMPACT assay in which the gene was introduced i, and the average total DNA copy number of the gene across its mutated samples t. As the composite mutation rate for a gene depends on both the number of composite mutant tumors and the number of samples mutated (i.e. the exposure for the count of composite mutants), an offset term was added to the model that represents the log-number of tumor samples harboring mutations in the gene of interest. The observed number of composite mutant tumors for a gene was therefore modelled: $\log \mathbb{E}[n_c] = \beta_0 + \beta_r r + \beta_l l + \beta_g g + \beta_h h + \beta_i i + \beta_t t + \log(n)$, with $n_c$ following a negative binomial distribution. Using this model, we predicted the number of composite mutant tumors for each gene arising by chance, $\hat{n}_c$, calculating the expected fraction of samples with a composite mutation (out of the total number of mutated samples) in each gene, $\hat{p} = \hat{n}_c / n$. We then used a binomial test to evaluate the null hypothesis that for each gene the observed number of composite mutations arose due to random chance. Here, we modeled the incidence of composite mutations per gene using a binomial distribution, and calculated the probability of $n_c$ tumor specimens harboring composite mutations in $n$ tumor specimens by chance given $\hat{p}$, with the count modeled as $X \sim \mathrm{Binomial}(n, \hat{p})$. Our parameterization was estimated using nonsynonymous mutations, including those under positive selection in cancer (e.g. hotspots), which may reduce overall model sensitivity. We therefore evaluated multiple alternative parameterizations of $\hat{p}$, including using 1) nonsynonymous mutational data that excludes known hotspot mutations under selection, and 2) only synonymous mutations. Neither alternative parameterization produced a qualitatively distinct result for genes originally detected as significantly enriched but did increase the overall sensitivity of the test. To ensure appropriate control for potential false positive findings, we leveraged the parameterization from the complete dataset on nonsynonymous mutational data. 
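As an illustration of the per-gene binomial test, the sketch below computes the probability of observing at least a given number of composite-mutant samples among the mutated samples of a gene. The function and argument names are hypothetical, and the use of a one-sided upper tail is an assumption, since the text does not state the sidedness of the test.

```python
from math import comb


def binomial_upper_tail(n_composite, n_mutated, p_expected):
    """P(X >= n_composite) for X ~ Binomial(n_mutated, p_expected).

    n_composite: observed number of composite-mutant samples in the gene
    n_mutated:   total number of samples with a mutation in the gene
    p_expected:  background composite fraction predicted by the regression
    Assumption: a one-sided (enrichment) test. The exact sum is only
    practical for modest n_mutated; very large counts would need a
    log-space or library implementation.
    """
    return sum(
        comb(n_mutated, k) * p_expected**k * (1 - p_expected) ** (n_mutated - k)
        for k in range(n_composite, n_mutated + 1)
    )
```

For example, with 40 mutated samples, a background fraction of 0.05, and 8 observed composite-mutant samples, `binomial_upper_tail(8, 40, 0.05)` gives the chance probability of a burden at least this large.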
Moreover, we observed no difference in the rate of synonymous mutations among genes that were either enriched for composite mutations or not (P = 0.2, Mann-Whitney U test), indicating there was little evidence for the accumulation of variants in the absence of selective pressure. Finally, all unique individual mutant residues present in five or more non-hypermutated cases excluding known or likely resistance mutations were also assessed for the significance of their enrichment for arising as composite mutations. All missense, nonsense, splice-site, and translation start-site mutations at a given residue were included, as were unique mutant positions in the promoter of TERT and in-frame indels spanning known hotspots of clustered indels[13]. For each residue in a given gene, we assessed whether it arose as part of a composite mutation significantly more often than all other mutant residues in the same gene using a right-sided Fisher’s exact test. Mutant residues were considered significant at FDR-corrected Q < 0.01 (see below).
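The cohort-level permutation test described at the start of this section can be sketched as below. This is a simplified illustration under stated assumptions: the function names are hypothetical, the default number of permutations is reduced from the 100,000 used in the study, and the cancer-type-specific variant that relies on permatswap is not reproduced.

```python
import random
from collections import Counter


def count_composite_samples(samples, genes):
    """Number of samples carrying two or more mutations in any single gene."""
    pair_counts = Counter(zip(samples, genes))
    return len({sample for (sample, _), n in pair_counts.items() if n >= 2})


def permutation_p_value(samples, genes, n_permutations=1000, seed=0):
    """Empirical p-value for the observed number of composite-mutant samples.

    samples, genes: the two columns of the m x 2 mutation matrix.
    Shuffling the gene column keeps every per-sample and per-gene
    mutation count fixed while breaking their pairing.
    """
    rng = random.Random(seed)
    n_true = count_composite_samples(samples, genes)
    shuffled = list(genes)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if count_composite_samples(samples, shuffled) >= n_true:
            hits += 1
    return n_true, hits / n_permutations
```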

Attributing mutations to mutagenic processes

We attributed the individual variants that comprise composite mutations to a mutational origin using one of 30 established mutational signatures[36,37]. Mutational signature decomposition in each tumor was performed as described above and a signature was considered present if it accounted for five or more substitutions in the affected specimen (to ensure high confidence decompositions in targeted sequencing data with comparatively fewer mutations relative to broader-scale sequencing). Multiple signatures of the same etiology were merged by combining the frequency distribution of trinucleotide contexts (APOBEC signatures 2 and 13; MMR signatures 6, 15, 20, 21, and 26; smoking-associated signatures 4, 18, 24, and 29). A substitution was attributed to a mutational signature present in a given case if the probability weight of the relevant trinucleotide exceeded 10%. A substitution attributable to multiple signatures present in an affected tumor was assigned to the signature most frequently associated with the affected cancer type. To adjust for the non-specificity of trinucleotide context probabilities for smoking-associated signatures, C>A mutations regardless of trinucleotide context were considered smoking-associated in tumors for which mutational signature decomposition identified a smoking signature (in esophageal squamous and adenocarcinomas; head and neck squamous; hepatobiliary; hepatocellular; lung squamous, adenocarcinoma, and adeno-squamous; oral cavity; and renal cell carcinoma)[38]. Substitutions in a trinucleotide context of insufficient probability in any signature in an affected tumor were considered of ambiguous origin and not attributable, while mutations that could be attributed to both aging and another signature present in a given tumor were considered non-separable and classified as being of multiple signatures. 
Finally, we also considered several additional mechanisms that can drive site-specific mutation rates as potential sources of composite mutations[39,40,41]. First, we estimated the mutation rate within 1kb up- and downstream of all nucleosome dyads (obtained from https://bitbucket.org/bbglab/nucleosome-periodicity/src/master/) mapping to regions sequenced in the MSK-IMPACT panels. Having fit a spline to the mutation rate distribution, we calculated the full-width-half-maximum distances from the dyad and compared the rate of singleton and composite mutations within this region (Extended Data Fig. 2b). We conducted a similar analysis on the potential effect of active coding transcription factor binding sites (TFBSs) on composite mutations. We obtained the positions of active TFBSs in coding regions of the genome via integration with DHS binding sites in human melanocytes following an established procedure[40]. The mutation rate within 1kb of these active TFBSs was inferred using TCGA cutaneous melanoma samples from the TCGA MC3 dataset to increase the total number of mutations among melanoma samples. We then assessed the proximity of singleton and composite mutations to the elevated mutation rate at TFBSs as described for nucleosome dyads (Extended Data Fig. 2). To investigate the effect of APOBEC3A-mediated mutagenesis, we obtained the position of the optimal stem-loop DNA structure favored by APOBEC3A from published sources[41]. We investigated the overlap of such optimal sites with those mutant alleles enriched for arising as a composite mutation. In total, only 1 of 86 significant residues enriched for arising as composite mutations was at the position of the optimal APOBEC3A substrate (ARID1A S2264). Finally, we compared the rate of composite mutations involving known hotspot mutations as described above with those derived from an orthogonal method optimized to reduce false positive mutations due to site-specific mutagenesis[42]. 
Controlling for overlapping gene content, there was no difference between the proportion of composite mutations involving hotspot mutations based on the origin of the hotspot mutations [percent and 95% CI are: 9.6 (9.2–10) versus 10 (9.6–10.5), P = 0.2, two-sample Z-test], indicating that no excess of false-positive hotspots due to site-specific mutagenesis is driving the results described here.
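The per-substitution attribution rule from the first paragraph of this section can be sketched as a small decision function. All names and the input layout are hypothetical; the order in which the rules are applied (aging check before the cancer-type tie-break) and the fallback when no ranking is available are assumptions, and the special handling of C>A mutations in smoking-associated tumors is omitted.

```python
MIN_CONTEXT_PROBABILITY = 0.10  # probability weight must exceed 10%


def attribute_substitution(context, present_signatures,
                           context_probabilities, cancer_type_ranking):
    """Attribute one substitution to a signature present in the tumor.

    context:               trinucleotide context label, e.g. "T[C>T]A"
    present_signatures:    signatures detected in this tumor
    context_probabilities: {signature: {context: probability weight}}
    cancer_type_ranking:   signatures ordered from most to least frequently
                           associated with the cancer type (hypothetical input)
    Returns a signature name, "multiple", or "ambiguous".
    """
    candidates = [
        s for s in present_signatures
        if context_probabilities.get(s, {}).get(context, 0.0) > MIN_CONTEXT_PROBABILITY
    ]
    if not candidates:
        return "ambiguous"            # no signature explains this context well
    if len(candidates) == 1:
        return candidates[0]
    if "aging" in candidates:
        return "multiple"             # aging plus another signature: non-separable
    for signature in cancer_type_ranking:
        if signature in candidates:
            return signature          # tie-break by cancer-type association
    return "multiple"                 # assumption: no ranking available
```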

Phasing composite mutations

The allelic configuration of composite mutations (phase), in cis (arising on the same allele) or in trans (arising on different alleles), was inferred primarily from sequencing read support. Briefly, for each pair of somatic mutations in a composite mutant, all reads spanning the relevant loci were re-aligned to the reference genome (hg19) by pairwise sequence alignment using a Needleman-Wunsch algorithm[43]. The number of unique reads supporting both wildtype alleles (AB), both mutant alleles (ab), or a mixture of mutant and wildtype alleles (aB, Ab) were subsequently tabulated. For the purposes of the present study, composite mutations were classified in cis when: 1) three or more spanning reads supported both mutant alleles (ab ≥ 3), and 2) at least one of these variants was called by two or fewer spanning reads that called the other variant as wildtype (aB ≤ 2 | Ab ≤ 2). Composite mutations were classified in trans when: 1) each variant was supported by three or more reads that were simultaneously wildtype for its partner mutation (aB ≥ 3 and Ab ≥ 3), and 2) two or fewer reads called both mutant alleles (ab ≤ 2), and 3) the mutations arose in the same tumor cell population based on their cancer cell fractions (CCFs, see above). We note that there is an inherent difference in the sensitivity of detection for cis and trans variants, specifically that trans variants must satisfy at least two read-support positive criteria (aB ≥ 3 and Ab ≥ 3) and are required to be in the same cell, whereas cis variants require only a single positive criterion (ab ≥ 3) without any constraint of evidence for arising in the same cell. This difference in sensitivity for detection likely explains, to some extent, the increased number of cis relative to trans composite mutations. To determine the effect of this sensitivity bias, we also phased variants with at least one synonymous mutation. 
We observed no difference in the rate of synonymous composite mutations in oncogenes versus tumor suppressors (5% vs 7%, P = 0.2, Mann-Whitney U test), in contrast to the significant difference in nonsynonymous composite mutations (14% vs 35%, P < 10^-6). To control for differences in the sensitivity of detection of cis and trans mutations, analyses of the effects of allelic configuration on composite mutations compared the relative fraction of cis/trans mutations between two defined groups (e.g. oncogenes vs. TSGs). We additionally inferred the phase of select composite mutants associated with therapeutic resistance mutations in regions of clonal loss of heterozygosity [(copy-neutral-) LOH]. Composite mutants spanned by LOH were assumed in cis if the spanning locus had a minor copy number of zero and a total copy number of one or more (LOH via heterozygous loss, copy-neutral LOH, or the latter combined with subsequent genomic gains) inferred from the aforementioned purity-corrected integer copy number data from FACETS. These mutations were also required to have arisen in the same tumor cell population, as estimated from CCFs (as described above), and to have observed mutant allele frequencies (MAFs) approximately equal to the expected MAFs for a given copy number state in a cis allelic configuration (the 95% CIs of the observed MAF overlapped the expected MAF of the given copy number configuration, controlling for tumor purity). Composite mutations not satisfying any of the aforementioned conditions could not be unambiguously phased. As with other short-read sequencing data, our phasing approach is limited by the requirement that any two mutations arise within sufficient physical proximity in the genome to be spanned by common aligned sequencing reads. 
While the higher depth of sequencing coverage in our targeted clinical sequencing platform (~700-fold median in the tumor samples) does increase the likelihood of sequencing a fragment of tumor DNA encompassing both somatic mutations, and improves the quantification of CCFs by reducing measurement error[8], this limitation cannot be overcome with short-read sequencing.
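The read-support rules for phasing can be written as a small decision function. This sketch covers only the read-based criteria; the function and argument names are hypothetical, the same-cell judgment is assumed to be supplied from the CCF analysis, and the LOH-based inference is not included.

```python
def classify_phase(n_both_mutant, n_first_only, n_second_only, same_cell_population):
    """Classify a pair of mutations as "cis", "trans", or "unphased".

    n_both_mutant:  spanning reads carrying both mutant alleles (ab)
    n_first_only:   reads mutant for the first variant only (aB)
    n_second_only:  reads mutant for the second variant only (Ab)
    same_cell_population: whether CCFs place both mutations in the same cells
    """
    # cis: >= 3 reads carry both mutations, and at least one variant is
    # rarely (<= 2 reads) seen without its partner.
    if n_both_mutant >= 3 and (n_first_only <= 2 or n_second_only <= 2):
        return "cis"
    # trans: each variant is seen alone on >= 3 reads, the two are seen
    # together on <= 2 reads, and both arose in the same cell population.
    if (n_first_only >= 3 and n_second_only >= 3
            and n_both_mutant <= 2 and same_cell_population):
        return "trans"
    return "unphased"
```

The asymmetry discussed above is visible in the code: the cis branch has one positive read-count requirement, while the trans branch has two plus the same-cell condition.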

Assessing cellular context and molecular timing

We estimated the clonality of all somatic mutations in each affected tumor specimen (the cancer cell fraction or CCF) as described previously using the FACETS framework[8]. To ensure conservative estimates, all somatic mutations were assumed to have arisen on the major (more common) allele, thus minimizing the possibility of overestimating the CCF. To determine if the constituents of a composite mutation arose in the same cell, we defined a criterion based on the confidence intervals (CIs) of the CCF. Specifically, if the lower bounds of the 95% CIs of the two mutations' CCFs summed to greater than 1, the two somatic mutations in the same gene and tumor specimen were considered to exist within the same cancer cell population. If either of the two somatic mutations was clonal (the upper bound of its 95% CI overlapped 1), then both mutations were considered to have arisen in the same tumor cell population. We inferred the chronological order of two somatic mutations in each composite mutation based on their estimated CCFs. Any mutations previously associated with acquired resistance to targeted therapies were excluded, as these will arise after the originating sensitizing lesion and skew results. Only composite mutations determined to arise in the same tumor cell population (based on the sum of CCFs, described above) were considered, and we required previous evidence establishing both mutations as candidate functional driver mutations individually. The 95% CI of the CCFs of both mutations were inferred as previously described[44]. If the lower bound of the 95% CI of one mutation was greater than the upper bound of the other variant, then the first mutation was determined to have a greater clonality, and therefore to have arisen first in the tumor. Similarly, if the upper bound of the 95% CI of a mutation was less than the lower bound of the other mutation in the composite, it was considered to have arisen second. 
If the 95% CIs of CCFs of the two mutations in the composite overlapped, or if there was not sufficient evidence that the two mutations existed in the same cancer cell population in the affected tumor specimen, we considered their chronology to be indeterminate.
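The same-cell criterion and the ordering rule can be combined in a short sketch. The function names and the (lower, upper) tuple layout are hypothetical, and treating "the upper 95% CI overlapped 1" as an upper bound of at least 1.0 is an assumption.

```python
def in_same_cell_population(ci_a, ci_b):
    """ci_a, ci_b: (lower, upper) bounds of the 95% CI of each mutation's CCF."""
    either_clonal = ci_a[1] >= 1.0 or ci_b[1] >= 1.0  # assumption: overlap means >= 1
    lower_bounds_exceed_one = ci_a[0] + ci_b[0] > 1.0
    return either_clonal or lower_bounds_exceed_one


def infer_order(ci_a, ci_b):
    """Return "a_first", "b_first", or "indeterminate"."""
    if not in_same_cell_population(ci_a, ci_b):
        return "indeterminate"
    if ci_a[0] > ci_b[1]:   # a is strictly more clonal than b
        return "a_first"
    if ci_b[0] > ci_a[1]:   # b is strictly more clonal than a
        return "b_first"
    return "indeterminate"  # overlapping CIs
```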

TP53 composite mutation analysis and validation studies

For the generation of MSCV-p53-IRES-GFP constructs (pMIG-p53 cDNAs), methods were as follows. Fragments encoding wildtype, single, or composite mutant p53 cDNAs were obtained from IDT or SGI-DNA and cloned into pMIG (Addgene #9044) using standard restriction enzyme-based methods. Briefly, p53 cDNAs were amplified using primers that add BglII and EcoRI restriction sites on the 5’ and 3’ regions, respectively, and subsequently digested and cloned into linearized pMIG backbone harboring BglII and EcoRI cloning overhangs. All constructs were sequence-verified using Sanger sequencing. Primer sequences are available in Supplementary Table 5. HEK293T (ATCC CRL-3216) cells were obtained from ATCC. Murine KrasG12D/+;Trp53−/− (KP) lung adenocarcinoma (LUAD) cells were provided by the Jacks laboratory [45]. All cells were maintained in a humidified incubator at 37°C with 5% CO2 and grown in DMEM supplemented with 10% FBS and 100 IU/ml penicillin/streptomycin. For virus production, 7.5 million HEK293T cells were plated in 15cm plates the day before transfection. The following day cells were transfected with 10ug pMIG-p53 cDNA (or pMIG-Empty as control) and 10ug of pCL-Eco (Addgene #12371) using 50uL of lipofectamine 2000 (ThermoFisher). Twenty-four hours following transfection media was replaced with fresh DMEM. Two rounds of virus were harvested (at 48 and 72hrs post-transfection), pooled, and kept at 4°C until used for cell transduction. One million KP LUAD cells were seeded in 10cm plates and immediately transduced with retroviral supernatants and 8ug/mL polybrene. Cells were grown for 48hrs before purifying using fluorescence activated cell sorting (FACS). All transductions were done in triplicate. Following transduction, stable GFP+ populations were purified by FACS on a FACSAria (BD Biosciences). 120hrs post-transduction, total RNA was isolated using the RNeasy Mini Kit (Qiagen) following standard manufacturer protocols. 
Purified polyA mRNA was subsequently fragmented and first and second strand cDNA synthesis performed using standard Illumina mRNA TruSeq library preparation protocols. Double-stranded cDNA was subsequently processed for TruSeq dual-index Illumina library generation. For sequencing, pooled multiplexed libraries were sequenced on NextSeq instrumentation in high-output mode, generating approximately 12 million 76bp single-end reads per replicate condition. The resulting RNA sequencing data were analyzed by first trimming adaptor sequences using Trimmomatic[46]. Sequencing reads were aligned to GRCm38.p5 (mm10) using STAR[47], and genome-wide transcript quantification was performed using featureCounts[48]. After removing transcripts with fewer than eight aligned reads (low or undetected expression at the given library size, n=9848 transcripts retained), differentially expressed genes were identified using DESeq2, with a cutoff of absolute log2FoldChange ≥ 1 and adjusted P < 0.01 between experimental conditions[49]. Mouse genes were mapped to human homologs using gene homologies provided by the Mouse Genome Database (MGD) Project[50]. Principal components analysis was performed with output from DESeq2[49]. For fluorescent competition assays, FACS-purified KP LUAD cells stably transduced with either pMIG-Empty or pMIG-p53-R277T-E284D were mixed ~60:40 with untransduced parental cells and cultured in vitro for 10 days. The percentage of GFP+ cells was monitored over time using a Guava easyCyte HT flow cytometer (Millipore). All mouse experiments were approved by the MSKCC Institutional Animal Care and Use Committee. No pre-specified sample size was required, and 5 or 10 mice per condition were utilized. Mice were maintained under specific pathogen-free conditions and food and water were provided ad libitum. Mice (Hsd:Athymic Nude-Foxn1, abbreviated Nu/Nu) were purchased from Envigo (stock #069). 
For experiments involving orthotopic transplantation of Kras;Trp53 lung adenocarcinoma (KP LUAD) cells, 100,000 cells stably transduced with either empty vector (pMIG-Empty) or p53 mutant cDNAs (pMIG-p53-R277T, pMIG-p53-E284D, or pMIG-p53-R277T-E284D) were resuspended in 200uL of PBS and tail vein injected into 6–8 week old Nu/Nu female mice. These stable cell populations were generated and FACS-purified as described above, and injected 120hrs post-transduction.

TERT promoter mutation analysis and validation

TERT promoter mutations present in five or more patients, accounting for multiple samples per patient, were assessed for co-occurrence and mutual exclusivity among composite mutations via two-sided Fisher’s exact test. A pair of somatic mutations with P < 0.01 was considered co-occurring (or mutually exclusive) if its log odds ratio was greater (or less) than zero. To predict the affinity for GABPA to bind TERT promoter mutant alleles, 31-bp DNA sequences for wildtype or mutant TERT centered on each of 205G>A, 228G>A, and 250G>A (chr5:1295205G>A) mutations were extracted and generated by editing the appropriate base. The position frequency matrix for GABPA binding profiles in humans was acquired from JASPAR2018[51] (Matrix ID: MA0062.1), and scores quantifying the predicted affinity of GABPA for each TERT promoter sequence were calculated using TFBSTools[52]. Only binding site motifs overlapping the relevant locus in each of the wildtype and mutant sequence were retained. P-values quantifying the likelihood of a GABPA binding site in each sequence to arise by chance were calculated using TFMPvalue[53]. To assess the effect of TERT promoter composite mutations on TERT expression, A375, Sk-Mel2, and Sk-Mel30 melanoma cell lines were obtained (kindly provided by Rosen and Merghoub laboratories). pGL4.0-TERT WT, G228A, and G250A plasmids were provided by the Costello laboratory (Addgene plasmids #84924, #84926, #84925)[21]. pGL4.0-TERT G205A, G205A/G228A, and G205A/G250A plasmids were generated using the Q5 Site-Directed Mutagenesis Kit (NEB, E0554S). All plasmids were verified using Sanger sequencing. Thereafter, 1 × 10^4 cells from A375, Sk-Mel2, and Sk-Mel30 were seeded into each well of 96-well plates. Cells were transiently transfected with pGL4.0-empty vector (Promega), TERT WT, or mutant plasmids (180ng/well) along with pGL4.74[hRluc/TK] Vector (18ng/well, Promega) as internal control using Lipofectamine 3000 (Thermo Fisher). 
Dual luciferase activity measurement was performed 48 hours after transfection using the Dual-Luciferase Reporter Assay System (Promega) following the manufacturer’s instructions. The firefly luciferase activity of individual wells was normalized relative to Renilla luciferase activity. Experiments were performed in biological tetraplicates or pentaplicates. To quantify the effect of a specific TERT variant, we compared individual genotypes (e.g. G205A to WT) using linear models of Luciferase expression, where we controlled for the baseline telomerase expression of each cell line, i.e. luc ~ variant + cell line + constant where variant is a binary term encoding the presence/absence of a genotype (relative to the chosen reference), and cell line is a factor introduced to control for the contribution of each cell line’s baseline expression. All cell lines utilized for either the TERT or TP53 functional validation experiments were authenticated by short tandem repeat analysis and confirmed negative for mycoplasma.
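The linear model luc ~ variant + cell line + constant can be illustrated with an ordinary least-squares fit. The sketch below is not the authors' R code: the names are hypothetical, it uses only the Python standard library (normal equations solved by Gaussian elimination), it assumes a full-rank design, and it returns only the point estimate of the variant effect without standard errors or P-values.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_variant_effect(records, reference_line):
    """Estimate the variant coefficient in luc ~ variant + cell line + constant.

    records: list of (normalized_luc, has_variant, cell_line), where
             normalized_luc is firefly divided by Renilla activity.
    reference_line: cell line used as the baseline factor level.
    """
    other_lines = sorted({line for _, _, line in records} - {reference_line})
    # Columns: intercept, variant indicator, one dummy per non-reference line.
    rows = [
        [1.0, float(has_variant)] + [float(line == o) for o in other_lines]
        for _, has_variant, line in records
    ]
    y = [luc for luc, _, _ in records]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)[1]
```

Including the cell line as a factor means the variant coefficient reflects the shift relative to the reference genotype within each line, rather than differences in baseline expression between lines.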

Statistical analyses and figures

All statistical analyses were performed using the R statistical programming environment (version 3.5.0). Figures were generated using either base R or the ggplot2 library. Error bars indicate the 95% binomial CIs calculated using the Clopper-Pearson method, unless otherwise noted. CIs for the down-sampling analysis were calculated using the loess.sd function from the msir library. P-values for the difference in proportions were calculated using Fisher’s exact test or two-sample Z-tests, unless otherwise noted. P-values were corrected for multiple comparisons using the Benjamini-Hochberg method and reported as Q-values when applicable.
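For reference, the Benjamini-Hochberg adjustment used to produce Q-values can be sketched in a few lines. The analyses were performed in R; this standard-library Python version is only an illustration, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values (Q-values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone Q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values
```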

Data and code availability

All mutational data from the prospective sequencing cohort is available at http://download.cbioportal.org/composite_mutations_maf.txt.gz. Mutational data from The Cancer Genome Atlas was acquired from https://gdc.cancer.gov/about-data/publications/pancanatlas. RNA sequencing data were deposited in the GEO with accession number GSE136295. All other genomic and clinical data accompanies the manuscript and is available as Extended Data and Supplementary Information. All other materials are available upon request from the authors. Source code for these analyses is available at https://github.com/taylor-lab/composite-mutations.

Study cohort and rates of composite mutations.

Sources of local hypermutation.

Number and distribution of composite events across genes.

Cis composite secondary resistance mutations.

The cis composite mutations classified as arising in post-treatment specimens due to acquired resistance to one of several molecularly targeted therapies in the study cohort.

Phenotypic characterization of TP53 composite mutants.

Saturation analysis of genes for composite mutation detection.

Mutational signature attribution among composite mutations.

Conditional mutant alleles.

51 in total

1. Mutation selection and the natural history of cancer.

Authors: J Cairns
Journal: Nature Date: 1975-05-15 Impact factor: 49.962

2. Widespread Selection for Oncogenic Mutant Allele Imbalance in Cancer.

Authors: Craig M Bielski; Mark T A Donoghue; Mayur Gadiya; Aphrothiti J Hanrahan; Helen H Won; Matthew T Chang; Philip Jonsson; Alexander V Penson; Alexander Gorelick; Christopher Harris; Alison M Schram; Aijazuddin Syed; Ahmet Zehir; Paul B Chapman; David M Hyman; David B Solit; Kevin Shannon; Sarat Chandarlapaty; Michael F Berger; Barry S Taylor
Journal: Cancer Cell Date: 2018-11-01 Impact factor: 31.743

Review 3. Implementing Genome-Driven Oncology.

Authors: David M Hyman; Barry S Taylor; José Baselga
Journal: Cell Date: 2017-02-09 Impact factor: 41.582

Review 4. Lessons from the cancer genome.

Authors: Levi A Garraway; Eric S Lander
Journal: Cell Date: 2013-03-28 Impact factor: 41.582

5. The clonal evolution of tumor cell populations.

Authors: P C Nowell
Journal: Science Date: 1976-10-01 Impact factor: 47.728

Review 6. Cancer genome landscapes.

Authors: Bert Vogelstein; Nickolas Papadopoulos; Victor E Velculescu; Shibin Zhou; Luis A Diaz; Kenneth W Kinzler
Journal: Science Date: 2013-03-29 Impact factor: 47.728

7. Disruption of wild-type IDH1 suppresses D-2-hydroxyglutarate production in IDH1-mutated gliomas.

Authors: Genglin Jin; Zachary J Reitman; Christopher G Duncan; Ivan Spasojevic; David M Gooden; B Ahmed Rasheed; Rui Yang; Giselle Y Lopez; Yiping He; Roger E McLendon; Darell D Bigner; Hai Yan
Journal: Cancer Res Date: 2012-11-30 Impact factor: 12.701

8. Mutation and cancer: statistical study of retinoblastoma.

Authors: A G Knudson
Journal: Proc Natl Acad Sci U S A Date: 1971-04 Impact factor: 11.205

Review 9. Hallmarks of cancer: the next generation.

Authors: Douglas Hanahan; Robert A Weinberg
Journal: Cell Date: 2011-03-04 Impact factor: 41.582

10. Mutational heterogeneity in cancer and the search for new cancer-associated genes.

Authors: Michael S Lawrence; Petar Stojanov; Paz Polak; Gregory V Kryukov; Kristian Cibulskis; Andrey Sivachenko; Scott L Carter; Chip Stewart; Craig H Mermel; Steven A Roberts; Adam Kiezun; Peter S Hammerman; Aaron McKenna; Yotam Drier; Lihua Zou; Alex H Ramos; Trevor J Pugh; Nicolas Stransky; Elena Helman; Jaegil Kim; Carrie Sougnez; Lauren Ambrogio; Elizabeth Nickerson; Erica Shefler; Maria L Cortés; Daniel Auclair; Gordon Saksena; Douglas Voet; Michael Noble; Daniel DiCara; Pei Lin; Lee Lichtenstein; David I Heiman; Timothy Fennell; Marcin Imielinski; Bryan Hernandez; Eran Hodis; Sylvan Baca; Austin M Dulak; Jens Lohr; Dan-Avi Landau; Catherine J Wu; Jorge Melendez-Zajgla; Alfredo Hidalgo-Miranda; Amnon Koren; Steven A McCarroll; Jaume Mora; Brian Crompton; Robert Onofrio; Melissa Parkin; Wendy Winckler; Kristin Ardlie; Stacey B Gabriel; Charles W M Roberts; Jaclyn A Biegel; Kimberly Stegmaier; Adam J Bass; Levi A Garraway; Matthew Meyerson; Todd R Golub; Dmitry A Gordenin; Shamil Sunyaev; Eric S Lander; Gad Getz
Journal: Nature Date: 2013-06-16 Impact factor: 49.962

12 in total

1. Determinants of anti-PD-1 response and resistance in clear cell renal cell carcinoma.

Authors: Lewis Au; Emine Hatipoglu; Marc Robert de Massy; Kevin Litchfield; Gordon Beattie; Andrew Rowan; Desiree Schnidrig; Rachael Thompson; Fiona Byrne; Stuart Horswell; Nicos Fotiadis; Steve Hazell; David Nicol; Scott T C Shepherd; Annika Fendler; Robert Mason; Lyra Del Rosario; Kim Edmonds; Karla Lingard; Sarah Sarker; Mary Mangwende; Eleanor Carlyle; Jan Attig; Kroopa Joshi; Imran Uddin; Pablo D Becker; Mariana Werner Sunderland; Ayse Akarca; Ignazio Puccio; William W Yang; Tom Lund; Kim Dhillon; Marcos Duran Vasquez; Ehsan Ghorani; Hang Xu; Charlotte Spencer; José I López; Anna Green; Ula Mahadeva; Elaine Borg; Miriam Mitchison; David A Moore; Ian Proctor; Mary Falzon; Lisa Pickering; Andrew J S Furness; James L Reading; Roberto Salgado; Teresa Marafioti; Mariam Jamal-Hanjani; George Kassiotis; Benny Chain; James Larkin; Charles Swanton; Sergio A Quezada; Samra Turajlic
Journal: Cancer Cell Date: 2021-10-28 Impact factor: 31.743

2. Genomic and Metabolic Hallmarks of SDH- and FH-deficient Renal Cell Carcinomas.

Authors: Angela Yoo; Cerise Tang; Mark Zucker; Kelly Fitzgerald; Renzo G DiNatale; Phillip M Rappold; Kate Weiss; Benjamin Freeman; Chung-Han Lee; Nikolaus Schultz; Robert Motzer; Paul Russo; Jonathan Coleman; Victor E Reuter; Ying-Bei Chen; Maria I Carlo; Anthony J Gill; Ritesh R Kotecha; A Ari Hakimi; Ed Reznik
Journal: Eur Urol Focus Date: 2022-03-11

3. Drugging multiple same-allele driver mutations in cancer.

Authors: Ruth Nussinov; Mingzhen Zhang; Ryan Maloney; Hyunbum Jang
Journal: Expert Opin Drug Discov Date: 2021-03-26 Impact factor: 7.050

4. Characterization of tumors with ultralow tumor mutational burden in Japanese cancer patients.

Authors: Keiichi Hatakeyama; Takeshi Nagashima; Keiichi Ohshima; Sumiko Ohnami; Shumpei Ohnami; Yuji Shimoda; Akane Naruoka; Koji Maruyama; Akira Iizuka; Tadashi Ashizawa; Tohru Mochizuki; Kenichi Urakami; Yasuto Akiyama; Ken Yamaguchi
Journal: Cancer Sci Date: 2020-08-07 Impact factor: 6.716

5. Mutant p53 in cell-cell interactions. (Review)

Authors: Steven Pilley; Tristan A Rodriguez; Karen H Vousden
Journal: Genes Dev Date: 2021-04-01 Impact factor: 11.361

6. Transcriptional Networks Identify BRPF1 as a Potential Drug Target Based on Inflammatory Signature in Primary Lower-Grade Gliomas.

Authors: Mingyang Xia; Huiyao Chen; Tong Chen; Ping Xue; Xinran Dong; Yifeng Lin; Duan Ma; Wenhao Zhou; Wei Shi; Hao Li
Journal: Front Oncol Date: 2021-12-02 Impact factor: 6.244

7. Discovery through clinical sequencing in oncology. (Review)

Authors: Mark T A Donoghue; Alison M Schram; David M Hyman; Barry S Taylor
Journal: Nat Cancer Date: 2020-08-10

8. Targeting SHIP1 and SHIP2 in Cancer. (Review)

Authors: Chiara Pedicone; Shea T Meyer; John D Chisholm; William G Kerr
Journal: Cancers (Basel) Date: 2021-02-20 Impact factor: 6.639

9. PI3K inhibitors are finally coming of age. (Review)

Authors: Bart Vanhaesebroeck; Matthew W D Perry; Jennifer R Brown; Fabrice André; Klaus Okkenhaug
Journal: Nat Rev Drug Discov Date: 2021-06-14 Impact factor: 112.288

10. Co-occurring gain-of-function mutations in HER2 and HER3 modulate HER2/HER3 activation, oncogenesis, and HER2 inhibitor sensitivity.

Authors: Ariella B Hanker; Benjamin P Brown; Jens Meiler; Arnaldo Marín; Harikrishna S Jayanthan; Dan Ye; Chang-Ching Lin; Hiroaki Akamatsu; Kyung-Min Lee; Sumanta Chatterjee; Dhivya R Sudhan; Alberto Servetto; Monica Red Brewer; James P Koch; Jonathan H Sheehan; Jie He; Alshad S Lalani; Carlos L Arteaga
Journal: Cancer Cell Date: 2021-06-24 Impact factor: 38.585