Literature DB >> 31080515

Quantifying local malignant adaptation in tissue-specific evolutionary trajectories by harnessing cancer's repeatability at the genetic level.

Natsuki Tokutomi¹, Caroline Moyret-Lalle², Alain Puisieux², Sumio Sugano¹, Pierre Martinez².

Abstract

Cancer is a potentially lethal disease, in which patients with nearly identical genetic backgrounds can develop a similar pathology through distinct combinations of genetic alterations. We aimed to reconstruct the evolutionary process underlying tumour initiation, using the combination of convergence and discrepancies observed across 2,742 cancer genomes from nine tumour types. We developed a framework using the repeatability of cancer development to score the local malignant adaptation (LMA) of genetic clones, as their potential to malignantly progress and invade their environment of origin. Using this framework, we found that premalignant skin and colorectal lesions appeared specifically adapted to their local environment, yet insufficiently for full cancerous transformation. We found that metastatic clones were more adapted to the site of origin than to the invaded tissue, suggesting that genetics may be more important for local progression than for the invasion of distant organs. In addition, we used network analyses to investigate evolutionary properties at the system-level, highlighting that different dynamics of malignant progression can be modelled by such a framework in tumour-type-specific fashion. We find that occurrence-based methods can be used to specifically recapitulate the process of cancer initiation and progression, as well as to evaluate the adaptation of genetic clones to given environments. The repeatability observed in the evolution of most tumour types could therefore be harnessed to better predict the trajectories likely to be taken by tumours and preneoplastic lesions in the future.

Entities: CellLine Chemical Disease Gene Species

Keywords: Bioinformatics/Phyloinformatics; adaptation; biomedicine; cancer evolution; disease biology

Year: 2019 PMID： 31080515 PMCID： PMC6503823 DOI： 10.1111/eva.12781

Source DB: PubMed Journal: Evol Appl ISSN： 1752-4571 Impact factor: 5.183

INTRODUCTION

Cancer is a disease fuelled by somatic evolution (Nowell, 1976), in which cells from multicellular organisms switch to uncurtailed growth (Davies & Lineweaver, 2011), potentially leading them to invade distant organs and eventually to death. This evolution is at least partly explained by the accrual of (epi)genetic alterations in cells over several generations and is context‐specific, with different recurrent alterations observed in different tissue types (Greaves & Maley, 2012). The hypothesis that malignant transformation requires multiple alterations provides an explanation to why cancer incidence increases with age (Armitage & Doll, 1954) and why progression to cancer occurs via intermediate benign stages (Fearon & Vogelstein, 1990; Vogelstein et al., 2013). This is further corroborated by the observation that most solid adult tumours harbour multiple “driver” alterations, that is those likely to functionally impact cell behaviour and push it towards malignancy (Zack et al., 2013). Yet, the dynamics and stochastic nature of malignant transformation are still insufficiently understood: clinicians still lack efficient diagnostic tools to accurately predict if and when benign lesions will progress to cancer (Martinez et al., 2016), which patients at risk will develop tumours and what their genetic characteristics will be (Lässig, Mustonen, & Walczak, 2017). The heterogeneity observed between tumours of the same type demonstrates that multiple evolutionary trajectories can converge towards malignancies with similar phenotypic characteristics (Yates & Campbell, 2012). Yet, how much impact individual driver alterations have on this adaptation or how they interact is still largely unknown. In addition, although metastatic cancer still represents a major clinical challenge, the role of the genetic makeup of the primary site in the adaptation to a novel distant site is also unclear. While reconstructing the genetic history of individual patients through sequencing becomes easier (Gerlinger et al., 2012; Nik‐Zainal et al., 2012; Yates et al., 2017), the generic process of carcinogenesis underlying each tumour type remains elusive. The availability of public data sets describing the genetic characteristics of multiple specimen of various tumour types however represents an opportunity to decipher oncogenesis as a stochastic process. All tumours characterised so far are outcomes of their type‐specific evolutionary process. This is akin to Gould's contingency definition (Gould, 1990), as if “replaying the tape” of somatic evolution: recording all malignant developments starting from the mostly identical genetic backgrounds of different human individuals (Rosenberg et al., 2002). Here, we investigate these recorded outcomes to infer the evolutionary landscape of each tumour type, mapping the evolutionary trajectories that can lead normal cells to cancerous transformation. We make the following assumptions (a) cancer is of monoclonal origin, whereby the genetic background of the initial clone is detectable in all subsequent generations; (b) although different evolutionary trajectories can lead to such a clone, they all are similarly capable of invading their organ‐specific environment of origin. We thus estimate different evolutionary parameters to investigate the contribution of all drivers within genetic clones and calculate a local malignant adaptation (LMA) score resulting from their combination in nine tumour types. Similar to a fitness definition in evolutionary biology, our score can be understood as a measure of adaptation to a disease‐specific evolutionary context, leading to harmful over‐proliferation and domination of the local environment. Our model highlights differences across tumour types regarding the interactivity between driver alterations and predicts that premalignant skin and colorectal lesions are adapted to their environment, yet not as much as invasive tumours. We find that genetic landscapes of local adaptation do not explain the location of distant metastases, suggesting that adaptation to metastatic sites does not rely on genetics as much as tumorigenesis does. Finally, we suggest that networks can be used to represent stepwise genetic progression, providing useful tools to stochastically reconstruct the contingencies of the oncogenesis process and study its systemic properties.

METHODS

TCGA data

We downloaded data for 2,742 samples from The Cancer Genome Atlas (TCGA), for which we could obtain both allelic frequencies for mutations and copy number alterations (CNAs). This represented 133 bladder cancers (BLCA), 914 breast cancers (BRCA), 195 colorectal cancers (COAD), 256 glioblastomas (GBM), 296 head and neck squamous cell cancers (HNSC), 306 kidney clear cell renal cell carcinomas (KIRC), 262 lung adenocarcinomas (LUAD), 132 lung squamous cell cancers (LUSC) and 248 skin melanomas (SKCM). Raw SNP array data were normalised using the aroma R package (Ortiz‐Estevez, Aramburu, Bengtsson, Neuvial, & Rubio, 2012) with paired normals, and copy number profiles were called using ASCAT (Van Loo et al., 2010).

Naevi & melanoma data

The mutational and copy number data for 37 primary melanomas with paired precursor lesions were downloaded from Shain et al. (2015). Naevi, blue naevi and intermediate benign annotations were considered benign lesions, while intermediate malignant, melanoma in situ, melanoma (all stages), desmoplastic melanoma and blue naevus‐like melanoma annotations were considered malignant. In the few cases in which multiple malignant lesions were found for a patient, the least advanced was selected. Mutations were considered clonal when the normalised MAF values in the reported sequencing data were >0.8. To detect CNAs, we first calculated the mean and standard deviation in segment mean (i.e., logR ratios for expected chromosomal copies in a segment) for all segments from the reported normal samples. Gains and losses for the four CN drivers retained for the melanoma landscape (CDK4_gain, CDKN2A_loss, MDM2_gain, PTEN_loss) were calculated based on the reported segment mean for the relevant segment of each sample deviated significantly from the normal segment mean distribution, using a one‐tailed test with the pnorm R function. It is worth noting that, unlike whole‐exome TCGA data, these data are mostly obtained through targeted sequencing of a 293‐gene panel and may therefore miss information on given drivers.

Colorectal adenoma and carcinoma data

We retrieved whole‐genome sequencing data from 31 samples from 9 adenomas and 72 samples from 11 carcinomas (Cross et al. 2018). Mutations were called with platypus (Rimmer et al., 2014), copy number data and ploidy were obtained using the cloneHD software (Fischer, Vázquez‐García, Illingworth, & Mustonen, 2014). Mutation clonality was estimated using the EstimateClonality package (McGranahan et al., 2015) on each of the 31 samples as for the TCGA data. Mutations were considered clonal in an entire adenoma when they were predicted as clonal in ≥75% of the related samples.

MET500 data

Genomic data were retrieved from Robinson et al. (2017) and via the MET500 website (https://met500.path.med.umich.edu/). Sample annotation was manually curated to attribute a tumour type to the primary site and metastatic site. All curated annotations are reported in Supporting Information Tables 2 and 3. Sample purity was collected from the Supp. Information of the original publication; sample ploidy was calculated as the mean copy number weighed by number of targeted exons. Mutation clonality was estimated using the EstimateClonality package.

Clonality and driver alterations

Mutation clonality was estimated thanks to the EstimateClonality package (McGranahan et al., 2015); clonal mutations were defined as those for which the 95% confidence interval of the cancer cell fraction (CCF) included 1. Gain and losses of each gene in each sample were defined relatively, if the copy number deviated by 0.6 or more from the ploidy given by ASCAT in the segment that included the gene of interest. Segments with <10 probes were filtered out. Driver alterations for each tumour type were retrieved from the IntOGen website (Gonzalez‐Perez et al., 2013; Rubio‐Perez et al., 2015). Due to the lack of an appropriate clonality estimation method, all copy number alterations were considered clonal. Final matrices of clonal mutations per sample are available via github. To limit the number of potential combinations, only the 50 most frequent driver alterations that occurred in >0.5% of a cohort were selected. Due to their proximity on the chromosome, CDKN2A loss and CDKN2B loss were merged into a single event, which avoids later biases when investigating co‐occurrence. This left 34 drivers in GBM, 47 in KIRC and 50 in all other tumour types.

Quantification of evolutionary parallelism

We represented each data set as a gene per patient matrix, using 0 to denote the absence of a clonal alteration and 1 for the presence of a specific clonal alteration in a specific patient. When multiple point mutations occur in the same gene in a patient, they are reported with a value of 1 encompassing all mutations. For each tumour type, 10 randomised matrices were generated by randomly reassigning the mutations of each patient. For instance, upon simulation, a patient with 300 clonal mutations out of 17,000 potential genes will still have 300 clonal mutations albeit in different genes than the originally observed alterations. The Jaccard Index between two samples is then computed using the following formula: Where A and B are the number of alterations in each sample, and A∩B the number of overlapping alterations. All pairwise indices are computed for each matrix. Indices from the 10 simulated matrices are then pooled as randomised controls for a given tumour type.

Genetic parameters of local malignant adaptation

The selective advantage of each driver alteration was defined as the ratio between expectations and observations in a given tumour type, considering mutations and CNAs separately. The expected number of mutations occurring in a gene was calculated by dividing the total number of mutations by the total number of genes in which at least 1 mutation was observed, weighted by each gene's length in base pairs, as given by the median transcript length for the gene in Ensembl (Zerbino et al., 2018). Only clonal mutations were considered. CNA selective advantage was calculated separately for gains and losses, by the ratio of expected occurrences given all events to the actual occurrences of each CNA, giving equal probability to all genes without weighing for length. The selective advantage SA of each driver d is thus centred around 1 with a 0 lower boundary. We hypothesised that drivers that tend to occur with few other drivers had more impact on malignant progression, and thus more “self‐sufficient,” that those occurring with more additional drivers. Malignant epistatic interactions reflect the fact that two alterations can depend on each other, either positively or negatively, to mediate the potential of a clone to become malignant and invasive in its local environment. We quantified these interactions using the ratio of observed co‐occurrences of driver alterations to expectations of observing each alteration in each sample. We relied on hypergeometric probabilities, based on the specific number of alterations per sample and the number of occurrences of each alteration in the cohort. See Appendix S1 for full details and examples.

Scoring local malignant adaptation: models of parameter combination

As the prevalence and interplay of each parameter are unknown, we designed different models that correspond to different combinations of the three LMA parameters investigated. For simplicity, all models rely on summing the contribution of each driver alteration to the adaptation of each individual sample, given its alteration load and tumour‐type‐specific context. Parameters in a model are then assigned specific weights, which are optimised so as to minimise the variation of the overall score between samples in each tumour type. We separated the three parameters into two types of LMA components: intrinsic (selective advantage, self‐sufficiency) and interactive (malignant epistatic interactions). In any given tumour type, the intrinsic component is driver‐specific, while the interactive component is context‐dependent and relies on which other driver alterations are also present in a sample. We designed four models based on the combination of two criteria: (a) whether selective advantage and self‐sufficiency are separate or combined; (b) whether the malignant epistatic score of a driver is given by the mean of all its interactions with the other drivers in the sample, or by the product of these interactions. This thus corresponds to four possible designs in total, which are labelled separate_mean, separate_prod, combined_mean, combined_prod (Supporting Information Table S1). The former two models therefore comprise the three parameters weighed individually then summed, while the latter two models only comprise two individually weighted parameters, with the intrinsic component being given for each driver d by SS multiplied by SAd. Such a multiplicative relationship is more relevant than an additive one, as SA (range: 0.3–208.5) has higher variability than SS (range: 0.3–1.8).

Weight inference

To assign weights to the components of each model, we used 37 possible empirical values ranging from 0.01 to 100, symmetrically mirrored around 1. The complete list is (0.010, 0.011, 0.012, 0.014, 0.017, 0.020, 0.025, 0.033, 0.05, 0.10, 0.11, 0.12, 0.14, 0.17, 0.20, 0.25, 0.33, 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100). For all four models, we performed LMA score calculations using all possible combinations of values for each of the model's parameters. For instance, using the combined_prod model with a 0.2 factor for the intrinsic component and a 3 factor for the interactive one, the LMA score for a sample s with N drivers is given by the following equation:where E j is the malignant epistatic score for the interaction of drivers i and j. SA, SS and E depend on the context given by the tumour type of s. We calculated the standard deviation of the LMA scores from the related samples normalised by their median. We used an objective function aiming to minimise the standard deviation in normalised LMA while assessing all weight combinations, so as to match our initial assumption that most tumour‐initiating clones in a given environment have a similar score. As different tumour types can involve different development mechanisms, the weights were optimised in tissue‐specific fashion. The weight combination yielding the lowest standard deviation in normalised LMA was thus selected in each tumour type.

Network representation

Network images were produced using the cytoscape software (Shannon et al., 2003).

RESULTS

Cancer evolution is highly repeatable

In order to calculate the adaptation of a genetic clone to its local environment, we defined landscapes of LMA specific to each of 9 cancer types, based on the occurrence of their most frequent driver alterations. Our LMA score aims at scoring the potential of a genetic clone to develop malignantly in a given environment. To accurately reflect the genetic background of the clones that ultimately adapted to each environment, we only focused on clonal driver alterations (i.e., those present in all cells of a tumour). We investigated the repeatability of cancer evolution in each tumour type using a Jaccard Index based at the gene level, considering all sample‐specific sets of clonal alterations as the genotypes of similarly adapted phenotypes (Bailey, Rodrigue, & Kassen, 2015; Yeaman, Gerstein, Hodgins, & Whitlock, 2018). Results were compared to randomised distributions of mutations that followed the same mutational load per patient (see Methods). As expected given the recurrence of driver mutations, our results highlight a high parallelism within each tumour type (Figure 1a). On average, the similarity between samples from the real distribution of alterations was 1.4–5.1 times superior to the 95th percentile of those observed in our randomised controls (Figure 1b). The glioblastoma (GBM) and lung squamous cell cancer (LUSC) sets displayed particularly high parallelism, with 96% of the indices being higher than the 95th percentile of the randomised control.

Figure 1

Repeatability of cancer evolution. (a) Jaccard distances between the clonal genetic makeup of samples in the TCGA (left, vivid colours) and randomised controls (right, light colours) for each of the nine tumour types. (b) Jaccard distances in the TCGA data normalised via division by the 95th percentile of each corresponding randomised data. Horizontal red line highlights a value of 1 (no difference). The percentage of observations exceeding the simulated 95th percentile is reported left of each distribution. Boxes represent the middle quartiles; black horizontal bars represent the median of each distribution; whiskers extend up to 1.5 times the interquartile range (box height) away from the box. Outliers (beyond the whiskers) are not displayed

Genetic landscapes of local malignant adaptation: parameter definition

We identified three types of genetics‐based factors that could quantify the adaptation of cells to a determined environment: selective advantage, driver self‐sufficiency and malignant epistatic interactions (Figure 2a). Selective advantage has been the focus of numerous previous studies (Gonzalez‐Perez & Lopez‐Bigas, 2012; Lawrence et al., 2013; Martincorena et al., 2017; Zapata et al., 2018) and was defined in our case as the ratio of observed occurrence of each clonal alteration (driver or not) compared to its expected occurrence given its number of clonal observations, weighted by gene size (in base pairs). We compared our selection score to the corrected dN/dS measure obtained by Zapata et al. (2018) in a recent publication. We find that both scores are moderately, yet significantly correlated (p < 0.001, Supporting Information Figure S1). The observed variability can furthermore be explained by the fact that their analysis was based on pan‐cancer data while our genetic landscapes are tumour‐type specific, and the fact that we focus solely on clonal alterations.

Figure 2

Evolutionary parameters of Local Malignant Adaptation (LMA). (a) General scheme. Selective advantage is computed from the repartition of alterations per gene, self‐sufficiency and malignant epistatic interactions are calculated from the repartition of alterations per patient. The parameters are combined according to different models to calculate the LMA of all tumour‐initiating clones in each cohort. The parameters corresponding to the lowest deviation in the overall LMA score across specimens are selected. (b) Number of additional drivers observed in samples harbouring the 20 most frequent alterations in colorectal adenocarcinoma (COAD) and (c) lung squamous cell carcinoma (LUSC). Horizontal green bars represent the median; diamonds represent the mean. Blue means significantly fewer partners than expected; red means significantly more. Dotted line indicates the overall mean. (d) Malignant epistatic interactions between all retained drivers in COAD (top right) and SKCM (skin melanoma, bottom left). Blue indicates negative interaction due to co‐occurrences rarer than expected; red indicates positive interactions (higher co‐occurrence than expected) Driver self‐sufficiency is however a novel measure, reflecting how many additional drivers are needed on average to induce malignant development. Differences could be observed across tumour types, with for instance a high discrepancy in the number of additional drivers in colorectal adenocarcinomas and a relatively homogeneous distribution in lung squamous cancers (Figure 2b,c and Supporting Information Figure S2). Both selective advantage and self‐sufficiency measures are specific to single driver alterations in a given tumour type. We found they were significantly correlated, yet with a very high variability (R 2 = 0.04, p < 0.001, Supporting Information Figure S2). This indicates that they can provide distinct information on the impact of each alteration and that mutations highly selected for may still require numerous other alterations in order to induce full cancerous transformation. We defined malignant epistatic interactions as the ratio of co‐occurrence between two genes given their respective frequencies and the number of mutations per sample in a cohort, using hypergeometric probabilities (see Methods). In this work, these interactions differ slightly from the traditional concept of epistasis and are entirely focused on cancerous development: the interaction between two genes can favour or hamper malignant transformation on the long term, without immediately impacting selective advantage during somatic development. The most negative interaction was the one found between known antagonists BRAF and NRAS in melanoma (Curtin et al., 2005), while the most positive interactions included those between AURKA gain, APC and TP53 mutations in colorectal cancer (Figure 2d). Across cancers, CNAs tended to frequently co‐occur with each other and TP53 mutations (Supporting Information Figure S4), in agreement with the role of TP53 in promoting genome instability, which then accelerates CNA acquisition (Martinez et al., 2018; Sansregret, Vanhaesebroeck, & Swanton, 2018).

Model selection

In this study, we aim to calculate the adaptation of genetic clones that reached invasive potential in each tumour type. We decide to model LMA as the sum of the contribution of all clonal drivers in a sample using different models, corresponding to distinct combinations of the three parameters previously calculated. The selective advantage and self‐sufficiency parameters correspond to the intrinsic component of LMA, as they solely depend on individual driver properties. Malignant epistatic interactions define the interactive component of LMA, depending on interactions between all drivers present in a sample. We used four models based on these two criteria: either combining (“combined”) or separating (“separate”) selective advantage and self‐sufficiency; and either quantifying the epistatic interactions as the mean of all interactions between a driver and its partners (“mean”) or the product of these interactions (“product”). We then established the optimal weights for all components of each model that minimised the variation in LMA across all samples on a tumour‐type basis (see Methods). This objective function aims at producing data matching our assumption that the initiating clones in different tumours of the same type are all similarly well adapted in this landscape. We evaluated the relevance of these models by calculating the prevalence of each component in the LMA score of all samples on a per set basis, as well as the correlation between LMA and the number of drivers in each sample (Figure 3). Both models separating the intrinsic component of LMA were found to be optimal by down‐weighing selection strength and the resulting scores were often dominated by a single component, often being self‐sufficiency (Figure 3a,b and Supporting Information Figure S5). In addition, the separate_mean model diminished the contribution of malignant epistatic interactions, while they appeared as major contributors in several tumour types under the separate_prod model. A similar observation was made for the combined models, in which these interactions were down‐weighted in the combined_mean model while this was not the case in the combined_prod model (Figure 3c,d).

Figure 3

Model selection. (a, b) Optimal weights found for selective advantage, self‐sufficiency and malignant epistatic interactions in the “separate” models (log10 scale). (a) separate_mean; (b) separate_prod. (c, d) Optimal weights found for the intrinsic (selective advantage x self‐sufficiency) and interactive (malignant epistatic interactions) components of the combined models (log10 scale). (c) combined_mean; (d) combined_prod. (e) Distribution of R 2 values for the correlation between local malignant adaptation and number of drivers in all nine tumour types according to all four models. (f) Share of the LMA score of each tumour specimen corresponding to the intrinsic component using the combined_prod model in all tumour types Under all models and in most tumour types, the resulting LMA score of a sample was highly correlated to its number of drivers (all p < 0.001, Figure 3e). The combined_prod model was however the model in which LMA and number of drivers were the least correlated (p < 0.01 against all other models, paired Wilcoxon test). We therefore elected the combined_prod model as the most appropriate of the four models to calculate LMA, as it was less dominated by a single component and was less likely to measure adaptation as a static process of stacking up driver alterations.

Differences among and within cancer types

We measured the percentage of the LMA scores accountable to the intrinsic component for each sample of each tumour type (Figure 3f). Our results suggest our assumption that all malignancies are equivalently adapted to their local environment could rely on different evolutionary dynamics in different tissue‐specific contexts. Colorectal, head & neck and lung (adenocarcinomas and squamous cell) cancers were defined by a prevalence of the intrinsic component in the scores of their specific samples, while the interactive component prevailed in glioblastoma. Interestingly, we observed a high variability in the share of each LMA score component of all samples in many tumour types, particularly in breast cancer. These observations reflect the extensive heterogeneity recurrently observed both inter‐ and intra‐tumour at the genetic and phenotypic levels, which can be mirrored by our occurrence‐based framework to calculate LMA.

Premalignant lesions are specifically adapted to the local landscape

We analysed two published cohorts of premalignant lesions linked to melanomas and colorectal carcinomas (CRC), to understand whether these precursors differed from fully‐formed tumours in our measure of adaptation. We first analysed 31 precursor/melanoma pairs (Shain et al., 2015), including 23 benign naevi. We observed that the LMA score of the melanomas was consistently and significantly higher than the one of their paired benign precursor (p = 0.001 paired Wilcoxon test, p = 0.01 unpaired, Figure 4a).

Figure 4

Premalignant lesions. (a) Normalised LMA scores of paired melanomas (left) and their precursor lesions (right). (b) Normalised LMA scores of unpaired colorectal carcinomas (left) and adenomas (right). (c) Normalised LMA scores of naevi in the SKCM landscape (left) and in the eight other landscapes on average (right). (d) Normalised LMA scores of adenomas in the COAD landscape (left) and in the eight other landscapes on average (right) We additionally analysed 9 colorectal adenomas and 11 carcinomas from (Cross et al., 2018). Multiple samples were available for all cases (2–6 per adenoma, 4–13 per carcinoma, see Methods). The LMA of carcinomas was higher than the one of adenomas, although not significantly, possibly due to the limited sample size and absence of benign/malign pairing (p = 0.37, Wilcoxon test, Figure 4b). A similar trend was observed with significance when investigating individual biopsies rather than whole lesions, although the redundancy and uneven number of samples across cases may bias this observation (p = 0.018, Wilcoxon test, Supporting Information Figure S6). When we calculated the LMA scores of the 23 naevi and 9 adenomas in other tumour‐type‐specific landscapes, they appeared more adapted, respectively, to the melanoma and CRC landscapes than to the other landscapes on average (p < 0.001 and p = 0.005, paired Wilcoxon test, Figure 4c,d and Supporting Information Figure S7). Our model is thus able to detect that premalignant lesions are specifically adapted to their local environment, further suggesting that additional evolutionary steps are still necessary to acquire locally invasive capacity.

Metastasis does not rely on genetic adaptation to the distant site's landscape

We then applied our methods to metastatic samples, in order to shed light on whether the genetic basis of adaptation to an environment was as relevant for its metastatic colonisation as for local invasion. We used the MET500 data set of 500 metastatic samples (Robinson et al., 2017) and used sample annotation to identify 170 samples for which we had a primary site LMA landscape (i.e., any of the 9 tumour types analysed), 82 samples with an available metastatic site landscape, and 35 for which the landscape of both metastatic and primary sites was known. Manual curation was employed to attribute a relevant tumour type to each sample (Supporting Information Tables 2 and 3). As we had two different potential landscapes corresponding to lung metastases (LUAD, adenocarcinoma, and LUSC, squamous cell cancer), all calculations were replicated by taking either type as default landscape for lung metastases. For the 35 samples with existing landscapes for the primary and metastatic sites, we observed that LMA scores were consistently higher in the primary site than in the metastatic one (p < 0.001, paired Wilcoxon test, Figure 5a and Supporting Information Figure S8). This suggests that metastatic clones are more genetically adapted to the environment in which they originated than to the one they colonised. As for premalignant lesions, we analysed the LMA scores of each sample in all the other tumour types. LMA in the landscape of the primary site was significantly higher than the average LMA in the other 8 landscapes (p < 0.001, paired Wilcoxon test, Figure 4b, Supporting Information Figure S9), while we did not observe any difference when considering the adaptation in the metastatic site's landscape (Figure 5c and Supporting Information Figure S10). This suggests that these genetic clones were specifically adapted to their environment of origin. However, their genetic makeup did not provide them with the potential to specifically adapt to their metastatic site as well as local primary tumours.

Figure 5

Metastatic lesions. (a) Normalised LMA scores of 35 lesions with landscapes existing for both primary and metastatic site. Left, LMA in the primary site's landscape; right, LMA in the metastatic site's landscape. (b) Normalised LMA scores of 170 lesions with a primary site landscape in their specific landscape (left), or in the other 8 landscapes on average (right). (c) Normalised LMA scores of 82 lesions with a metastatic site landscape in their specific landscape (left), or in the other eight landscapes on average (right)

TCGA landscapes as evolutionary graphs

Our results demonstrate that our LMA model can quantify genetic adaptation to specific somatic evolutionary contexts. Our method can further be combined to a network approach to understand adaptation dynamics as individual driver alterations accrue in a clone over time. Fitness landscapes can be represented by graphs (Diaz‐Uriarte & Wren, 2018; Palmer et al., 2015) in which nodes are unique combinations of drivers, each with a distinct fitness. We reproduced such an architecture with our LMA scores, where edges connect nodes as additional drivers are added on top of previous combinations (Figure 6a). This network architecture can represent all evolutionary trajectories as successive acquisitions of any given number of driver alterations. As such a combinatorial approach is computationally heavy, we restricted our analysis to the 15 most common alterations in each tumour type. This corresponds to a maximum of 32,767 unique combinations of driver alterations per tumour type.

Figure 6

Graphs of local malignant adaptation. (a) Example graph with the five most common driver alterations in colorectal cancer (COAD). Nodes are combinations of alterations, each connected to all possible previous and posterior combinations of step‐by‐step driver acquisition. LMA is increasingly coloured in blue to red to yellow. An example of evolutionary trajectory in which a clone subsequently acquires 4 mutations is highlighted by a green dashed line throughout the network. (b) Average and (c) maximum node LMA score per number of driver alterations in all tumour types. Landscapes were limited to the 15 most prominent drivers of each tumour type. (d) Number of clonal drivers per TCGA patient and (e) number of drivers in all evolutionary trajectories allowing to reach a minimal LMA score equal to the median TCGA score of the same tumour type. Yellow bars indicate the mean of each tumour type; dotted line indicates the overall mean. (f) Correlation between the mean number of clonal drivers actually observed in TCGA patients and the mean number of drivers required to reach at least the median TCGA score in the corresponding landscape We took advantage of this framework to investigate how malignant adaptation to the local environment evolves in the nine investigated TCGA tumour types, by following how genetic clones progress through the graph by acquiring new alterations. We observed differences in LMA dynamics depending on the properties of each tumour type. We see that in many cases, LMA increases linearly on average with each novel driver, as can be expected from our approach based on summing the contextualised contributions to adaptation of each driver in a clone. However, tumour types in which our model predicted a high prevalence of malignant epistatic interactions (GBM, BRCA) display a strong deviation from linearity (Figure 6b). This is also reflected in how fast the maximum LMA score in the network is reached, with some tumour types displaying a log‐like distribution with decreasing improvement with each additional alteration, while the maximum LMA of GBM increases exponentially (Figure 6c). Interestingly, the maximum LMA for BLCA, BRCA and COAD is reached early and decreases after, respectively, 10, 12 and 13 alterations, as would be expected under diminishing returns epistasis (Chou, Chiu, Delaney, Segrè, & Marx, 2011). This highlights that our model can produce non‐linear relationships between LMA and the number of drivers, depending on the nature of malignant epistatic interactions among driver alterations. Finally, we investigated if these network representations could represent a basis on which to predict the evolutionary trajectories potentially leading to cancer. In all tumour types, we recorded all trajectories that reached a LMA score superior or equal to the median score observed in the corresponding TCGA cohort. The first node meeting such a criterion in a trajectory was considered final and its offspring nodes were not investigated. These trajectories thus represent all combinations of genes that likely lead to sufficient genetic adaptation for tumorigenesis. The number of drivers in these combinations was superior to the one observed in the actual sample, which was expected given that their LMA score had to be equal or higher than the data set's median (Figure 6d,e). The mean number of required alterations is however strongly correlated to the mean number of clonal drivers observed in each tumour type (R 2 = 0.91, p < 0.001, Figure 6f). This suggests that networks of contingency‐based adaptation metrics can thus provide a framework to both represent and study precancerous progression under a novel angle, while recapitulating the specificities of different tumour types. Their use can furthermore identify system‐level properties of the evolutionary context that funnels malignant somatic evolution in different organs and environments.

DISCUSSION

Here, we developed a methodology to investigate the progression of normal cells towards oncogenesis, by way of measuring the adaptation of genetic clones to a given environment. Our method relies on the contingency and repeatability of cancer development, observed when multiple same‐type cancers occur with distinct evolutionary trajectories in different human individuals. We built and optimised simple models to quantify LMA based on the presence of recurrent genetic alterations in nine cohorts corresponding to nine tumour types. This approach is similar to fitness landscapes that aim to map the space in which phenotypes evolve and adapt in evolutionary biology. We then applied our method to independent data sets of premalignant and metastatic lesions to analyse the impact of genetics in the adaptation to and the colonisation of different environments. Our exploratory model aimed at simplicity and still suffers from several limitations. First of all, our model is solely based on genetics and ignores how a cell's context and phenotypic state can dictate its response to specific genetic insults (Sieber, Tomlinson, & Tomlinson, 2005), which can impact tumorigenesis (Morel et al., 2017; Puisieux, Pommier, Morel, & Lavial, 2018) or lead to non‐genetic selection (Shaffer et al., 2017). It also ignores the contribution of epigenetic alterations as potential drivers; the inclusion of which will require a more general understanding on the recurrent epigenetic alterations functionally linked to tumour formation (Timp & Feinberg, 2013). There furthermore exists no method to estimate the clonality of CNAs, thus hampering our accuracy in including only the truly clonal ones in LMA calculations. Despite the fact that our model includes malignant epistatic relationships, it is unable to estimate the ordering of alterations, which can heavily influence evolutionary trajectories (Ortmann et al., 2015). As occurrence‐driven definition of epistatic interactions requires large numbers of observations, animal models in which driver combinations can be induced and followed over time could provide valuable controls for pre‐identified targets (Rogers et al., 2018). Aside from genetic alterations, our model does not include the interactive adaptation relationship between a pre‐invasive cell and its environment, the interplay between both being very likely to modify selective pressures as potential tumour‐initiating cells develop (Bissell & Radisky, 2001; Rozhok, Salstrom, & DeGregori, 2016; Scott & Marusyk, 2017). Finally, our work focuses on a single clone being responsible for initial local invasion, while it is possible that this process can involve multiple clones (Casasent et al., 2018). Such polyclonal invasion would however likely involve a recent common ancestor. Addressing these drawbacks in the future will allow to better stochastically model malignant progression. Despite these limitations, our model fits the hypothesis that carcinogenesis is likely a stepwise process. Using similar data on per sample genomic alterations from three complementary data sets on primary tumours, premalignant and metastatic lesions, we were able to investigate the genetic determinants of different stages of cancer progression. Our analyses highlighted that premalignant lesions were specifically adapted to their environment, yet insufficiently to promote local malignant invasion. This thus suggests that cancer arises when benign lesions acquire further driver alterations, at least in melanomas and colorectal carcinomas. In addition, the analysis of metastatic lesions suggested that genetics contributed to the formation of the primary tumours but were not a defining factor in the adaptation to the metastatic site. Metastasis to a specific site requires convergent evolution for clones from divergent backgrounds to adapt to the new environment (Cunningham, Brown, Vincent, & Gatenby, 2015). This adaptation may however not depend on novel genetic alterations, potentially relying on cellular plasticity (Varga & Greten, 2017) or epigenetic changes (Roe et al., 2017), or may even involve completely different genetic determinants than ab initio oncogenesis in the same site. By combining our LMA score to evolutionary approaches, we were furthermore able to create a framework to simulate genomic progression towards a malignant phenotype. This framework could capture the specificities of different tumour‐type‐specific evolutionary dynamics and suggests potential applications to forecast and control tumour evolution. Such an approach can help predict the most probable (and fastest) evolutionary route(s) linking premalignant lesions to cancer progression, given their genetic makeup. Our method however only scores malignant potential, which may differ from proliferative advantage in the tissue of origin. Interestingly, recent work on the somatic evolution of normal oesophagus suggests that alterations providing a competitive growth advantage in the normal tissue did not necessarily yield malignant phenotypes, as exemplified by the prevalence of NOTCH1 mutations in normal, but not in cancerous tissue (Martincorena et al., 2018). Combined with our findings, this would imply that to accurately simulate (and predict the outcome of) the evolutionary process underlying cancer initiation, the growth advantage and potential for malignant transformation provided by somatic genomic alterations should be assessed and integrated separately. Although our work focused on clonal mutations and tumour initiation, clonal evolution continues as cancer develops, with multiple clones co‐existing and evolving in parallel (Martinez et al., 2013). This results in genetic heterogeneity likely to foster resistance and hamper accurate diagnosis. Multi‐region sampling of individual tumours however also highlighted the repeatability of evolutionary patterns after initiation (Gerlinger et al., 2014), which can be used to predict the evolution of tumours once they become invasive (Caravagna et al., 2018). There is therefore a critical need for large, centralised public data sets, with accurate clinical annotation and the alterations recurrently selected by each treatment type, to help reconstruct treatment‐specific fitness landscapes. Adapting our methodology to represent these landscapes as graphs could serve as a basis to predict the nodes (i.e., clones) most likely to be generated by undirected genetic drift of the current population, then selected by different therapeutic pressures. By identifying genotypes associated to high fitness in a treatment‐specific landscape, and low fitness in another, this could in turn provide opportunities to channel cancer evolution towards extinction, via sequential treatment regimens inducing evolutionary bottlenecks (Dhawan et al., 2017; Goldie, Coldman, & Gudauskas, 1982; Nichol et al., 2019). As the data available on cancer genomics rapidly increase in quality and quantity, the accuracy of evolutionary frameworks aiming at forecasting cancer evolution will inevitably improve, in time providing compelling tools to guide clinical decisions and optimise therapy.

CONFLICT OF INTEREST

None declared.

AUTHOR CONTRIBUTIONS

PM designed the experiments. NT and PM analysed the data. CML and SS organised the collaboration. CML, AP, SS and PM supervised the work. PM wrote the article. All authors reviewed the article. Click here for additional data file.

1 in total

1. Assessing Cell Activities rather than Identities to Interpret Intra-Tumor Phenotypic Diversity and Its Dynamics.

Authors: Laloé Monteiro; Lydie Da Silva; Boris Lipinski; Frédérique Fauvet; Arnaud Vigneron; Alain Puisieux; Pierre Martinez
Journal: iScience Date: 2020-04-13

1 in total