Analysis of the genetic phylogeny of multifocal prostate cancer identifies multiple independent clonal expansions in neoplastic and morphologically normal prostate tissue.

Colin S Cooper^1,2,3, Rosalind Eeles^1,4, David C Wedge^5, Peter Van Loo^5,6,7, Anne Y Warren^8, Christopher S Foster^9,10, Hayley C Whitaker^11, Ultan McDermott^5, Daniel S Brewer^1,3,12, David E Neal^11,13, Gunes Gundem^5, Ludmil B Alexandrov^5, Barbara Kremeyer^5, Adam Butler^5, Andrew G Lynch^14, Niedzica Camacho^1, Charlie E Massie^11, Jonathan Kay^11, Hayley J Luxton^11, Sandra Edwards^1, Zsofia Kote-Jarai^1, Nening Dennis^4, Sue Merson^1, Daniel Leongamornlert^1, Jorge Zamora^5, Cathy Corbishley^15, Sarah Thomas^4, Serena Nik-Zainal^5, Sarah O'Meara^5, Lucy Matthews^1, Jeremy Clark^3, Rachel Hurst^3, Richard Mithen^16, Robert G Bristow^17,18,19, Paul C Boutros^17,20,21, Michael Fraser^18,19, Susanna Cooke^5, Keiran Raine^5, David Jones^5, Andrew Menzies^5, Lucy Stebbings^5, Jon Hinton^5, Jon Teague^5, Stuart McLaren^5, Laura Mudie^5, Claire Hardy^5, Elizabeth Anderson^5, Olivia Joseph^5, Victoria Goody^5, Ben Robinson^5, Mark Maddison^5, Stephen Gamble^5, Christopher Greenman^22, Dan Berney^23, Steven Hazell^4, Naomi Livni^4, Cyril Fisher^4, Christopher Ogden^4, Pardeep Kumar^4, Alan Thompson^4, Christopher Woodhouse^4, David Nicol^4, Erik Mayer^4, Tim Dudderidge^4, Nimish C Shah^11, Vincent Gnanapragasam^11, Thierry Voet^24, Peter Campbell^5, Andrew Futreal^5, Douglas Easton^25, Michael R Stratton^5.

Abstract

Genome-wide DNA sequencing was used to decrypt the phylogeny of multiple samples from distinct areas of cancer and morphologically normal tissue taken from the prostates of three men. Mutations were present at high levels in morphologically normal tissue distant from the cancer, reflecting clonal expansions, and the underlying mutational processes at work in morphologically normal tissue were also at work in cancer. Our observations demonstrate the existence of ongoing abnormal mutational processes, consistent with field effects, underlying carcinogenesis. This mechanism gives rise to extensive branching evolution and cancer clone mixing, as exemplified by the coexistence of multiple cancer lineages harboring distinct ERG fusions within a single cancer nodule. Subsets of mutations were shared either by morphologically normal and malignant tissues or between different ERG lineages, indicating earlier or separate clonal cell expansions. Our observations inform on the origin of multifocal disease and have implications for prostate cancer therapy in individual cases.

Year: 2015; PMID: 25730763; PMCID: PMC4380509; DOI: 10.1038/ng.3221

Source DB: PubMed; Journal: Nat Genet; ISSN: 1061-4036; Impact factor: 38.330

Prostate cancer is commonly multifocal[1], although the origin of multifocal disease remains controversial. Analyses of patterns of allele loss have suggested that most individual foci are independent[2,3]. However, such studies cannot exclude the presence of common underlying mutations not detected by the methods employed. Recent attempts to unravel the origins of multifocal disease using high-resolution genome technologies have also produced conflicting data, with different authors concluding either that all foci in a single prostate are related[4] or that all foci are unrelated[5]. To gain further insight into the mechanism of prostate cancer development, particularly the origin of multifocal disease, we selected three representative prostate cancers (Fig. 1, Supplementary Fig. 1) whose ERG status had been mapped using the FISH break-apart method[6,7]. Twelve cancer samples and three samples designated as morphologically normal prostate on central pathology review were analyzed by paired-end, massively parallel sequencing of complete genomes to generate comprehensive catalogues of genetic alterations (for coverage statistics, see Supplementary Table 1). For 3D representations of each prostate and for clinical characteristics, see Supplementary Fig. 2 and Supplementary Table 2, respectively. Prostates were named according to their Cancer Research UK project designation: Cases 6, 7 and 8.

Figure 1

Prostate samples chosen for whole-genome sequencing. a, ERG rearrangements determined by fluorescence in situ hybridization (FISH). Case 7 is a multifocal cancer containing two separate foci (T1/T2/T4/T5 and T3). Case 8 is also designated as a multifocal cancer (nodules T1/T2 and T3). Yellow, unrearranged normal ERG gene; red, ERG gene split but both 3′ and 5′ ends retained; green, ERG gene rearranged but only its 3′ end retained. b,c, Three-colour FISH used to distinguish different ERG-locus translocation breakpoints in Case 7. b, Positions of the three FISH probes: probe 1 (blue; BAC RP11-164E1) and probe 1a (BACs RP11-95G19, RP11-720N21, CTD-2511E13) were labeled in Aqua (Kreatech 415 Platinum Bright); probe 2 (red; fosmid G248P80319F5, 37 kb) was labeled with Cy3; and probe 3 (green; fosmid G248P86592E2, 38.5 kb) and probe 4 (BACs RP11-372O17, RP11-115E14, RP11-729O4) were labeled with FITC. The purple arrows represent the positions of ERG breakpoints detected in these experiments. For the precise positions of ERG breakpoints G and H, see Table 2. c, Left: tumor areas with ERG locus breaks G and H are indicated in light and dark green, respectively. Break J was found in an adjacent prostate section not shown in this figure. Right: representations of the ERG FISH patterns. Original FISH images are shown in Supplementary Fig. 1. “Split” denotes that 5′ and 3′ ERG signals were separated but retained in the cell. “Del” indicates that 5′ ERG signals were lost from the cell, while 3′ ERG signals were retained.

Somatic mutations absent from cancer and blood samples were observed at significant levels in morphologically normal prostate tissue distant from the cancer in Case 6 (518 substitutions) and Case 7 (454 substitutions) (Supplementary Fig. 3), some of which may be functionally significant (Table 1). The presence of substitutions in morphologically normal prostate tissue was confirmed in validation DNA-sequencing experiments at an average read depth of 10,000. Substitutions were present in an estimated ~48% and ~42% of cells in the morphologically normal samples from Case 6 and Case 7, respectively (Supplementary Fig. 3b), demonstrating clonal expansions of cells within morphologically normal prostate tissue, in agreement with studies using the mitochondrially encoded enzyme cytochrome c oxidase as a marker[8].
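
If the substitutions are assumed to be heterozygous and to lie in diploid regions, the fraction of cells carrying them is about twice the observed variant allele fraction (VAF), so ~48% and ~42% of cells correspond to VAFs of roughly 24% and 21%. The sketch below makes that conversion explicit under that assumption (it is not necessarily the estimator behind Supplementary Fig. 3b) and shows why the ~10,000-read validation depth matters: the binomial sampling error of a VAF shrinks with the square root of depth.

```python
import math


def cell_fraction_from_vaf(vaf: float, total_copies: int = 2, mutant_copies: int = 1) -> float:
    """Estimate the fraction of cells carrying a substitution from its VAF.

    Assumes every cell carries `total_copies` copies of the locus and each
    mutant cell carries `mutant_copies` mutated copies (defaults: a
    heterozygous mutation in a diploid region). Expected VAF is then
    fraction * mutant_copies / total_copies, which is inverted here.
    """
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("VAF must lie in [0, 1]")
    return min(1.0, vaf * total_copies / mutant_copies)


def vaf_standard_error(vaf: float, depth: int) -> float:
    """Binomial standard error of an observed VAF at a given read depth."""
    return math.sqrt(vaf * (1.0 - vaf) / depth)


# A VAF of ~24% in a diploid region corresponds to ~48% of cells.
print(cell_fraction_from_vaf(0.24))
# At ~10,000x the sampling error of a 24% VAF is under half a percentage point;
# at ~30x it is close to eight percentage points.
print(vaf_standard_error(0.24, 10_000), vaf_standard_error(0.24, 30))
```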

Table 1

Sample	Genomic change	Gene	Protein change	Type	% reads	Total reads	MA predicted functional impact	ANNOVAR significant algorithms
0006#N	chr9:g.131115799G>A	SLC27A4	p.V435I	missense	13.79	58	low	1
0006#N	chr14:g.20389481C>T	OR4K5	p.T239M	missense	13.25	83	high	4
0006#N	chr15:g.33873844G>T	RYR3	p.A525S	missense	33.33	48	medium	n/a
0006#N	chr4:g.88766379C>G	MEPE	p.S120*	nonsense	20.83	24	n/a	2
0007#N	chr5:g.150885254A>T	FAT2	p.S4308T	missense	23.4	47	low	5
0007#N	chr7:g.150934857G>T	CHPF2	p.R470L	missense	17.24	58	medium	5
0007#N	chr8:g.24192995G>A	ADAM28	p.D470N	missense	17.78	45	neutral	2
0007#N	chr12:g.24989522G>T	BCAT1	p.L276M	missense	26.47	34	medium	n/a

Mutations and clonal expansions in morphologically normal tissue: exonic point mutations with an indication of functional significance. Missense and nonsense mutations detected and visually confirmed in the adjacent morphologically normal tissue were tested for functional impact using the MutationAssessor.org[27] and wANNOVAR[28] services. The OR4K5 gene was excluded as a candidate because of the potential to overcall mutations in genes encoding very large proteins[29]. Since none of the remaining mutations had a high MutationAssessor (“MA”) predicted impact, we considered that epigenetic changes may provide a more likely driver of clonal expansion.
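
The legend's reasoning (exclude OR4K5, then ask whether any remaining mutation has a high MA impact) can be made explicit over the rows of Table 1. This is only an illustration of that logic, not the authors' pipeline; the tuple layout is an assumption made for the example.

```python
# Rows transcribed from Table 1: (sample, gene, protein change, MutationAssessor impact).
# None marks the nonsense mutation, for which no MA prediction is listed.
TABLE_1 = [
    ("0006#N", "SLC27A4", "p.V435I", "low"),
    ("0006#N", "OR4K5", "p.T239M", "high"),
    ("0006#N", "RYR3", "p.A525S", "medium"),
    ("0006#N", "MEPE", "p.S120*", None),
    ("0007#N", "FAT2", "p.S4308T", "low"),
    ("0007#N", "CHPF2", "p.R470L", "medium"),
    ("0007#N", "ADAM28", "p.D470N", "neutral"),
    ("0007#N", "BCAT1", "p.L276M", "medium"),
]

# Genes excluded as candidates (see the Table 1 legend).
EXCLUDED_GENES = {"OR4K5"}


def high_impact_candidates(rows, excluded):
    """Return rows that survive the exclusion list and carry a 'high' MA impact."""
    return [row for row in rows if row[1] not in excluded and row[3] == "high"]


print(high_impact_candidates(TABLE_1, EXCLUDED_GENES))  # []
print(high_impact_candidates(TABLE_1, set()))  # only the OR4K5 row
```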

Aiming to understand the subclonal architecture of the tumors and their phylogeny, we initially constructed phylogenetic trees based on copy number (Supplementary Figs. 4 and 5, Supplementary Data Set 1) and substitution data. We adapted our previously developed Bayesian Dirichlet process to identify clusters of substitutions in n dimensions[9], where n is the number of samples from the case, such that shared and unique subclones could be identified between related samples (Fig. 2d and Supplementary Fig. 6). To explore the fine details and verify the main features of the phylogenetic trees and clonal structure, a selection of substitutions from each potential relationship between samples was sequenced to an average read depth of 10,000 in independent DNA sequencing analyses, verifying 279 mutations across all samples. This provided our final integrated phylogenetic trees (Fig. 2a-c) and final list of somatic point mutations (Supplementary Data Set 2). The structure of these trees was also supported by verified insertions, deletions and breakpoints (Supplementary Data Sets 3 and 4). The single cancer mass from Patient 6 contained three independent cancer clones, represented by samples 6_T2, 6_T3 and 6_T4 (Fig. 2a), with a single verified substitution linking 6_T1/6_T2 and 6_T3. Patient 7 contained at least three independent cancer lineages: one (7_T3) representing the smaller cancer nodule and two (7_T1/7_T2 and 7_T4/7_T5) present in the larger cancer mass (Fig. 2b). Ten mutations were common to the morphologically normal prostate sample and cancer samples 7_T1 and 7_T2, and three mutations joined 7_T4/7_T5 to the separate multifocal lesion 7_T3. These observations show that Prostate 7 contains at least two clones of cells that existed before the formation of the distinct cancer lineages. Prostate 8 contained two cancer lineages, represented by 8_T1/8_T2 and 8_T3 (Fig. 2c), with 43 substitutions shared among all three tumor samples (8_T1, 8_T2 and 8_T3), 8 of which were also present in the distant morphologically normal sample 8_N.
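
The sharing patterns described above (for example, mutations common to the normal sample, 7_T1 and 7_T2, or mutations joining 7_T4/7_T5 to 7_T3) can be illustrated with a presence/absence summary. This is a deliberately simplified sketch, not the Dirichlet-process method: a mutation is called present in a sample above a hypothetical 5% cell-fraction cutoff, the sample label `7_N` and all values are invented, and the compatibility check assumes the infinite-sites model (each mutation arises once and is never lost), under which any two mutation sets on a tree must be either nested or disjoint.

```python
from collections import defaultdict
from itertools import combinations


def group_by_sharing(cell_fractions, min_fraction=0.05):
    """Group mutations by the set of samples in which they are detected.

    cell_fractions maps mutation id -> {sample name: estimated fraction of cells}.
    A mutation counts as present in a sample when its fraction is >= min_fraction.
    """
    groups = defaultdict(list)
    for mutation, fractions in cell_fractions.items():
        pattern = frozenset(s for s, f in fractions.items() if f >= min_fraction)
        groups[pattern].append(mutation)
    return dict(groups)


def incompatible_patterns(patterns):
    """Pairs of sharing patterns that cannot both sit on one tree.

    On a tree, two mutation sets must be either nested or disjoint; partially
    overlapping sets would violate the inferred phylogeny.
    """
    return [
        (a, b)
        for a, b in combinations(patterns, 2)
        if a & b and not (a <= b or b <= a)
    ]


# Hypothetical cell fractions for three mutations in a Case 7-like sample set.
example = {
    "mut1": {"7_N": 0.40, "7_T1": 0.90, "7_T2": 0.95, "7_T3": 0.00},
    "mut2": {"7_N": 0.00, "7_T1": 0.00, "7_T2": 0.00, "7_T3": 0.80},
    "mut3": {"7_N": 0.02, "7_T1": 0.88, "7_T2": 0.91, "7_T3": 0.00},
}
groups = group_by_sharing(example)
for pattern, mutations in groups.items():
    print(sorted(pattern), mutations)
print(incompatible_patterns(list(groups)))  # []: all patterns nest or are disjoint
```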

Figure 2

Phylogenies of multifocal prostate cancers. a-c, Phylogenies revealing the relationships between sample clones for each case. Each line is associated with a clone from a particular sample. The length of each line is proportional to the weighted quantity of variants on a logarithmic scale. The thickness of a line indicates the proportion of that sample made up by the clone, e.g., 48%/52% for 6_T1 and 12%/88% for 8_T3. The minor clone of 8_T3b has no detected unique variants. 8_T3 contained 43 mutations present as a 12% subclone (T3a) shared with 8_T1/8_T2. In validation experiments, 8_T3 did not contain any of the five ERG and TMPRSS2 rearrangements present in 8_T1/8_T2 (Table 2) or any of the mutations unique to 8_T1/8_T2 (10,000 depth), indicating that it represents an earlier clone of 8_T1/8_T2 seeded into tissue sample 8_T3. The various TMPRSS2-ERG translocations are indicated by their TERG ID (Table 2). d, Example 2D density plots showing the posterior distribution of the fraction of cells bearing a mutation in two samples. The fraction of cells is modeled using a Bayesian Dirichlet process. These plots illustrate samples that have shared clonal mutations (6_T1/6_T2) and branched (unrelated) mutations (7_T2/7_T3). There are two examples of samples with a subclone. 7_T2/7_T5 has a peak at (0, 0.72), which represents subclonal mutations in 72% of cells in 7_T5 that occurred only in this sample, after divergence from the other samples. Similarly, 8_T1/8_T3 has a peak at (0.54, 0), representing subclonal mutations in 54% of cells in 8_T1 only.
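
Reading a peak in panel d amounts to deciding, for each of the two samples, whether the mutation cluster is absent, subclonal or clonal. A small sketch of that reading, with hypothetical cutoffs (5% and 90%) chosen only for illustration:

```python
def classify_peak(fraction_a, fraction_b, absent_below=0.05, clonal_from=0.9):
    """Describe a peak at (fraction_a, fraction_b) in a 2D cell-fraction density.

    Each coordinate is the estimated fraction of cells carrying the mutation
    cluster in one sample. Thresholds are illustrative, not from the paper.
    """
    def state(fraction):
        if fraction < absent_below:
            return "absent"
        return "clonal" if fraction >= clonal_from else "subclonal"

    return state(fraction_a), state(fraction_b)


print(classify_peak(0.0, 0.72))  # ('absent', 'subclonal'): private subclone of sample B
print(classify_peak(0.54, 0.0))  # ('subclonal', 'absent'): private subclone of sample A
print(classify_peak(1.0, 1.0))   # ('clonal', 'clonal'): shared clonal (trunk) mutations
```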

Complex patterns of ERG alteration were observed in samples from Patient 6 and Patient 7 (Fig. 3); each main lineage contained at least one, and in some cases two, unique TMPRSS2-ERG fusions with distinct breakpoint locations within the TMPRSS2 and ERG genes (Fig. 2, Table 2). The presence of multiple distinct TMPRSS2-ERG fusions was demonstrated by direct PCR across the breakpoint and by an ERG FISH break-apart assay (Table 2, Fig. 1b,c, Supplementary Fig. 1). In this respect, TMPRSS2-ERG fusions could be considered similar to the convergent gene alterations observed in kidney cancer, where distinct alterations of genes such as SETD2, PTEN and KDM5C were observed in different parts of the same cancer[10]. A deletion on chromosome 8 exhibited a very similar pattern of alterations (Supplementary Fig. 7), but we did not see convergent evolution for other potential driver genes (Supplementary Table 3). Where two TMPRSS2-ERG fusions existed in a single lineage, we were unable to determine whether these fusions co-existed at any time in the same cell, as reported previously[11] and as implied by the phylogenetic tree. However, the FISH assay (Fig. 1b,c) demonstrated that in sample 7_T4 the two TMPRSS2-ERG fusions were present in distinct cell populations at the time the cancer sample was taken. Moreover, an additional, separate ERG breakpoint (TERG J) was detected in a region of the cancer that had not been sampled in the DNA sequencing studies. The occurrence of several TMPRSS2-ERG fusions in a single cancer mass is consistent with previous FISH-based studies reporting multiple ETS fusions in a low proportion of individual cancer foci[11]. ERG alterations are believed to represent a relatively early event in cancer development, in agreement with their occurrence in prostatic intraepithelial neoplasia (PIN)[6], but our observations suggest that they may not always be present at the very first cellular expansion.
Mutations shared either between different ERG-lineages or between cancer and morphologically normal tissue may represent earlier clonal cell expansions on the same lineage (Fig. 2a-c). Alternatively they could represent separate clones of cells within which multiple independent cancer lineages developed.

Figure 3

Patterns of ERG alterations. a-c, Circos plots highlighting ERG rearrangements present in each prostate. Each color represents a different cancer sample as indicated.

Table 2

Samples	Donor chr	Donor position	Donor strand	Junction type	Junction sequence	Acceptor chr	Acceptor position	Acceptor strand	Breakpoint	Genes	Verification	TERG ID
6_T1, 6_T2	21	39867180	+	HOMOLOGY	T	21	42877104	+	deletion	ERG-TMPRSS2	CS & P (6_T1); V (6_T1, 6_T2)	A
6_T1, 6_T4	21	39877208	+	HOMOLOGY	T	21	42871170	+	deletion	ERG-TMPRSS2	P (6_T1); V (6_T1, 6_T4)	B
6_T1, 6_T4	21	39877355	−	HOMOLOGY	CC	21	42819405	−	insertion	ERG-MX1	CS & P (6_T1); V (6_T1, 6_T4)
6_T1, 6_T4	21	39877745	+	NTS	CAT	21	39880855	+	deletion	ERG-ERG	CS & P (6_T1); V (6_T1, 6_T4)
6_T3	20	10441211	−	HOMOLOGY	G	21	39872887	+	translocation	C20orf94-ERG	CS & P & V (6_T3)
6_T3	20	10441429	+	HOMOLOGY	GT	21	42868518	−	translocation	C20orf94-TMPRSS2	CS & P & V (6_T3)
6_T3	21	39872930	+	Exact	---	21	42868510	+	deletion	ERG-TMPRSS2	CS & P & V (6_T3)	C
7_T1, 7_T2	1	205613440	+	HOMOLOGY	C	21	42857784	−	translocation	_-TMPRSS2	V (7_T1, 7_T2)
7_T1, 7_T2	2	204298424	−	HOMOLOGY	A	21	42849002	+	translocation	RAPH1-TMPRSS2	V (7_T1, 7_T2)
7_T1, 7_T2	2	204298476	+	Exact	---	19	42797705	+	translocation	RAPH1-CIC	P (7_T1); V (7_T1, 7_T2)
7_T1, 7_T2	10	120084722	−	HOMOLOGY	TG	21	42842154	+	translocation	C10orf84-TMPRSS2	CS & P (7_T1); V (7_T1, 7_T2)
7_T1, 7_T2	10	120084747	+	HOMOLOGY	AC	21	39872234	+	translocation	C10orf84-ERG	CS & P (7_T2); V (7_T1, 7_T2)
7_T1, 7_T2	21	39872152	+	HOMOLOGY	A	21	42861527	+	deletion	ERG-TMPRSS2	CS & P (7_T1); V (7_T1, 7_T2)	D
7_T1, 7_T2	21	42842403	+	Exact	---	21	42848506	−	inversion_+	TMPRSS2-TMPRSS2	CS & P (7_T1); V (7_T1, 7_T2)
7_T2	21	39831266	+	HOMOLOGY	AAAC	21	42875633	+	deletion	ERG-TMPRSS2	CS & P & V (7_T2)	E
7_T3	21	39861568	+	NTS	TA	21	42865303	+	deletion	ERG-TMPRSS2	CS & P & V (7_T3)	F
7_T4	21	39835734	+	HOMOLOGY	G	21	42867100	+	deletion	ERG-TMPRSS2	CS & P & V (7_T4)	G
7_T4	21	42841552	−	HOMOLOGY	GGCT	21	42851963	+	inversion_−	TMPRSS2-TMPRSS2	CS & P & V (7_T4)
7_T4, 7_T5	21	39868722	+	Exact	---	21	42870051	+	deletion	ERG-TMPRSS2	CS & P (7_T4); V (7_T4, 7_T5)	H
8_T1, 8_T2	21	38745261	+	HOMOLOGY	T	21	42851601	−	inversion_+	DYRK1A-TMPRSS2	P (8_T1); V (8_T1, 8_T2)
8_T1, 8_T2	21	38745286	−	HOMOLOGY	A	21	42859198	−	insertion	DYRK1A-TMPRSS2	CS & P (8_T1); V (8_T1, 8_T2)
8_T1, 8_T2	21	39831518	+	Exact	---	21	42870497	−	inversion_+	ERG-TMPRSS2	CS (8_T1); P & V (8_T1, 8_T2)	I
8_T1, 8_T2	21	42844460	−	HOMOLOGY	T	21	42851648	+	inversion_−	TMPRSS2-TMPRSS2	V (8_T1, 8_T2)
8_T1, 8_T2	21	42863787	−	HOMOLOGY	G	21	42870663	+	inversion_−	TMPRSS2-TMPRSS2	CS & P (8_T1); V (8_T1, 8_T2)

Patterns of ERG alterations. Positions and structure of each ERG breakpoint and related rearrangements. In the majority of cases, the position and structure of the breakpoint were determined by capillary sequencing using custom-designed PCR across the rearrangement breakpoint as previously described[30] (“CS” in column “Verification”) and/or by in silico reconstruction using local de novo assembly in Brass phase 2. Verification by sizing PCR products across the breakpoint using gel electrophoresis was also performed (“P”). All breakpoints were visually verified (“V”) to ensure the presence of discordant reads and were checked to ensure that they did not occur in repeat regions.
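
ERG and TMPRSS2 lie roughly 3 Mb apart on chromosome 21q22, so for the fusions annotated as deletions the distance between the donor (ERG) and acceptor (TMPRSS2) coordinates approximates the size of the interstitial deletion. The sketch below computes these spans from the Table 2 coordinates; it ignores the few bases of microhomology or non-templated sequence at each junction.

```python
# (donor position in ERG, acceptor position in TMPRSS2), both on chr21, for the
# ERG-TMPRSS2 rows of Table 2 annotated as deletions (TERG IDs A-H).
ERG_TMPRSS2_DELETIONS = {
    "A": (39867180, 42877104),
    "B": (39877208, 42871170),
    "C": (39872930, 42868510),
    "D": (39872152, 42861527),
    "E": (39831266, 42875633),
    "F": (39861568, 42865303),
    "G": (39835734, 42867100),
    "H": (39868722, 42870051),
}


def deletion_span(donor, acceptor):
    """Approximate size (bp) of the interval removed between the two breakpoints.

    Ignores the few bases of microhomology or non-templated sequence at the junction.
    """
    return abs(acceptor - donor)


for terg_id, (donor, acceptor) in ERG_TMPRSS2_DELETIONS.items():
    print(terg_id, f"{deletion_span(donor, acceptor):,} bp")
```

All eight spans fall between about 2.99 and 3.04 Mb: each lineage removes essentially the same interval, but through its own pair of breakpoints, which is what makes the TERG IDs usable as lineage markers.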

Recently, we identified 21 distinct mutational signatures from 7,042 samples across 30 different cancer types[12]. The contribution of each mutational process was calculated for prostate cancer as previously described[12,13] (Fig. 4). A signature associated with spontaneous deamination of 5-methylcytosine at CpG sequences (designated Signature 1A in ref. 12) explained ~50% of all of our mutations. Two additional signatures of unknown etiology, designated Signature 5 and Signature 8, best explained the remaining somatic mutations. Signature 5, present in all prostate samples, may reflect an endogenous mutational process[12]. Signature 8, present in two cancer samples from a single cancer nodule, is characterized by a weak C>A strand bias. Critically, these observations show that the same mutational processes, giving rise to Signatures 1A and 5, are detected both in cancer and in matched morphologically normal prostate tissue. In some cancer lineages, we identified clustering of C>T and C>G mutations, previously referred to as kataegis[14], and complex interdependent translocations and deletions, termed chromoplexy[15] (Supplementary Figs. 8 and 9).
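
Once signature profiles are known, the contribution of each to a sample's mutation catalogue can be estimated by non-negative least squares. This is a simplified stand-in rather than the exact procedure of refs. 12 and 13, and the two 4-channel signatures below are invented for illustration (real analyses use 96 trinucleotide channels and the published signature profiles).

```python
import numpy as np
from scipy.optimize import nnls

# Toy example: 2 made-up signatures over 4 mutation channels (columns sum to 1).
SIGNATURES = np.array([
    [0.70, 0.10],
    [0.10, 0.40],
    [0.10, 0.30],
    [0.10, 0.20],
])


def signature_exposures(catalogue, signatures):
    """Non-negative mutation counts attributed to each signature."""
    exposures, _residual_norm = nnls(signatures, catalogue)
    return exposures


# A catalogue generated from 50 mutations of signature 1 and 30 of signature 2.
catalogue = SIGNATURES @ np.array([50.0, 30.0])
exposures = signature_exposures(catalogue, SIGNATURES)
print(exposures)                    # close to [50, 30]
print(exposures / exposures.sum())  # relative contributions, close to [0.625, 0.375]
```

The non-negativity constraint is the key design choice: a mutational process can only add mutations, so unconstrained least squares, which can return negative exposures, is not appropriate.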

Figure 4

Relative contributions of mutational signatures to the total mutation burden of each sample. The mutational spectrum of each sample, defined by the trinucleotide context of each substitution, was deconvoluted into mutational processes using 22 distinct signatures determined from 7,042 cancers as described previously[12,13]. The signature designations (1A, 5, 8) match those reported previously[12]. For samples 7_T4 and 8_N, there were too few mutations to accurately identify the contributions of the mutational signatures.
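
The trinucleotide context gives the standard 96-channel classification: six substitution types, each referenced to the pyrimidine (C or T) of the base pair, times 16 combinations of 5′ and 3′ neighbours. A minimal sketch of that mapping, assuming the reference context is supplied on the + strand:

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def substitution_channel(context, alt):
    """Map a substitution to one of the 96 trinucleotide channels.

    context: reference bases (5' neighbour, mutated base, 3' neighbour) on the + strand.
    alt: the alternate base on the + strand.
    Substitutions are reported from the pyrimidine (C or T) reference strand.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or alt == context[1] or set(context + alt) - set("ACGT"):
        raise ValueError(f"invalid substitution {context}>{alt}")
    if context[1] in "AG":  # purine reference: switch to the opposite strand
        context, alt = reverse_complement(context), alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def all_channels():
    channels = []
    for ref, alts in (("C", "AGT"), ("T", "ACG")):
        for alt, five, three in product(alts, "ACGT", "ACGT"):
            channels.append(f"{five}[{ref}>{alt}]{three}")
    return channels


print(substitution_channel("ACG", "T"))  # A[C>T]G, a C>T transition at CpG
print(substitution_channel("CGT", "A"))  # same channel, seen from the other strand
print(len(all_channels()))               # 96
```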

Next-generation sequencing technologies have previously been used to identify critical genetic processes in prostate cancer development[15-19]. Our results demonstrate the presence of clonal expansions, or fields of cells, in the morphologically normal prostate that provide a background against which prostate cancer develops. A recent study of a 115-year-old woman identified 424 point mutations, thought to result from somatic mosaicism, in blood, a rapidly dividing tissue, but failed to detect any mutations in brain tissue[20]. The presence of mutations in blood was accompanied by telomere attrition that was not observed in other tissues. Prostate is considered a relatively quiescent tissue[21], and we found that the telomeres in morphologically normal tissue from Cases 6 and 7 had not undergone attrition, being of comparable length to telomeres in the adjacent cancer. The processes at work in morphologically normal prostate therefore appear to be distinct from those reported for blood (see Supplementary Notes for full discussion). Whether the clones of cells observed in morphologically normal prostate are generated by a pathological process or are the product of somatic mosaicism involving unexpectedly high mutation rates, the resulting clonal fields of cells may influence cancer development and/or contribute to multifocality and the presence of multiple cancer lineages in a single cancer mass. Evidence for a field effect in prostate cancer is also provided by studies demonstrating tumor-like alterations in cytomorphology, gene expression and epigenetics in adjacent morphologically normal tissue, and by the presence of multifocal disease in a high proportion of prostates. Field effects have also been proposed for oral cancer[22], head and neck cancer[23] and breast cancer[24]. Our results have implications for the use of focal therapy targeting a single cancer nodule within the prostate[25,26] and for potential chemotherapeutic approaches.
We propose that (i) focal therapy may be curative only if surrounding clonal cell populations within morphologically normal tissue are also ablated, and (ii) cancer heterogeneity may hinder therapeutic targeting and biomarker investigation.

ONLINE METHODS

Sample Selection and Fluorescence in situ Hybridisation

Samples for analysis were collected from prostatectomy patients at Addenbrooke’s Hospital (see Supplementary Table 2). The study was approved by the Trent Multicentre Research Ethics Committee. Informed consent was obtained from all patients. Prostates were sliced and processed as described previously[31]. In brief, a single 5 mm slice of the prostate was selected for research purposes. Cores of 4 or 6 mm were taken from the slice and frozen. Frozen cores were mounted vertically and sectioned transversely, giving a single 5 μm frozen section for H&E staining followed by six 50 μm sections for DNA preparation. The presence or complete absence of cancer was confirmed independently by three pathologists in central pathology review of the 5 μm H&E-stained section immediately adjacent to the sections used for DNA preparation. The ERG fluorescence in situ hybridisation break-apart assay for assessing ERG gene rearrangement was performed as described previously[6], both (i) on whole-mount formalin-fixed sections taken immediately adjacent to the research slice and (ii) on the frozen sections, immediately adjacent to the samples selected for DNA sequencing, that had initially been subjected to H&E staining. In all cases, the ERG status determined by these two methods, shown in Figure 1, was consistent.

DNA sequencing

Samples and Massively Parallel Sequencing

DNA was extracted from 18 samples from 3 patients: 12 prostate cancer samples, 3 adjacent morphologically normal prostate samples and 3 matched blood samples. Paired-end whole-genome sequencing was performed at Illumina, Inc. Paired-end libraries were manually generated from 1 μg of gDNA using the Illumina Paired End Sample Prep Kit (Catalog # PE-102-1002). Fragmentation was performed with a Covaris E220. After end repair, A-tailing and adapter ligation according to the sample prep kit instructions, libraries were manually size-selected by agarose gel electrophoresis, targeting 300 bp inserts. Adapter-ligated libraries were PCR amplified for 10 cycles and purified through a second agarose gel electrophoresis. Final libraries were quality-checked on an Agilent Bioanalyzer and quantified by qPCR and/or PicoGreen fluorimetry. Samples were clustered on Illumina v1.5 flowcells using the Illumina cBot with the TruSeq Paired End Cluster Kit v3. Flowcells were sequenced as 100 base paired-end (non-indexed) reads on the Illumina HiSeq2000 using TruSeq SBS chemistry v3 to a target depth of 50× for the tumour samples and 30× for the adjacent morphologically normal and blood samples. The Burrows-Wheeler Aligner (BWA) was used to align the sequencing data from each lane to the GRCh37 reference human genome[32]. Lanes that passed quality control were merged into a single well-annotated BAM file per sample, with duplicate reads removed. These data have been submitted to the European Genome-Phenome Archive (EGAD00001000689).
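
The 50× and 30× targets translate into read counts through the usual Lander-Waterman coverage relation (mean depth = bases sequenced / genome size). A back-of-envelope sketch, assuming a ~3.1 Gb genome and no loss to duplicates or unmapped reads (both simplifications):

```python
def read_pairs_for_depth(depth, genome_size_bp=3.1e9, read_length_bp=100):
    """Read pairs needed for a mean depth, ignoring duplicates and unmapped reads.

    Mean depth = (read pairs * 2 * read length) / genome size.
    """
    return depth * genome_size_bp / (2 * read_length_bp)


for label, depth in (("tumour", 50), ("normal/blood", 30)):
    pairs = read_pairs_for_depth(depth)
    print(f"{label}: {depth}x needs ~{pairs / 1e6:.0f} million 2x100 bp read pairs")
```

In practice somewhat more sequencing is needed, since duplicate removal and unmappable regions reduce the usable depth.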

Mutation-Calling: Substitutions

CaVEMan (Cancer Variants Through Expectation Maximization), an in-house algorithm developed at the Sanger Institute, was used to call somatic substitutions. CaVEMan uses a Bayesian expectation maximization (EM) algorithm: given the reference base, copy number status and fraction of aberrant tumor cells present in each cancer sample, it generates a probability score for each potential genotype at each genomic position. Calls with a ‘somatic’ probability of 95% or above were retained. Further post-processing filters were applied to eliminate false-positive calls arising from genomic features that generate mapping errors and from systematic sequencing artifacts. In addition to the standard filters applied in the Sanger pipeline, we designed project-specific filters to improve the positive predictive value of our callers, based on visually inspecting and calling many hundreds of variants. Visual inspection involved checking that the variant was supported by at least three reads, was absent from all reads of the control, showed no strand bias, showed no correlation between the reads containing the variant and read quality, was not at a location where indels were also detected, and was not in a poorly mapped or repeat region. Substitutions found in the WGS data of more than 2.5% of a batch of 465 non-malignant normal samples from a range of tissue types were also removed. Additional visual verification across all samples for a patient was performed for all non-intronic gene substitutions, all substitutions in adjacent morphologically normal samples, potential “field effect” substitutions, substitutions shared between adjacent morphologically normal and neoplastic samples, and the rare predicted substitutions apparently violating the inferred phylogeny.
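
The post-calling rules above can be summarized as a single predicate over a candidate call. This is a hypothetical sketch, not CaVEMan's code: the `SubstitutionCall` fields are invented summaries, strand bias is approximated crudely as "the variant must be seen on both strands", and the read-quality correlation check is omitted.

```python
from dataclasses import dataclass


@dataclass
class SubstitutionCall:
    """Hypothetical per-call summary; field names are illustrative, not CaVEMan's."""
    somatic_probability: float
    alt_reads_forward: int
    alt_reads_reverse: int
    control_alt_reads: int
    normal_panel_fraction: float  # fraction of the 465-sample normal panel carrying the call
    overlaps_indel: bool
    poorly_mapped_or_repeat: bool


def passes_project_filters(call, min_probability=0.95, min_alt_reads=3,
                           max_normal_panel_fraction=0.025):
    """Apply the post-calling rules described in the text, in simplified form."""
    alt_reads = call.alt_reads_forward + call.alt_reads_reverse
    checks = (
        call.somatic_probability >= min_probability,
        alt_reads >= min_alt_reads,
        call.control_alt_reads == 0,
        # Crude strand-bias proxy: require support on both strands.
        call.alt_reads_forward > 0 and call.alt_reads_reverse > 0,
        call.normal_panel_fraction <= max_normal_panel_fraction,
        not call.overlaps_indel,
        not call.poorly_mapped_or_repeat,
    )
    return all(checks)


good = SubstitutionCall(0.99, 4, 3, 0, 0.0, False, False)
print(passes_project_filters(good))  # True
print(passes_project_filters(SubstitutionCall(0.99, 7, 0, 0, 0.0, False, False)))  # False: one strand only
```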

Mutation-Calling: Insertions/Deletions

Insertions and deletions in the tumor, morphologically normal and matched blood control genomes were called using a modified Pindel version 0.2.0 on the NCBI37 genome build[33]. As with the substitutions, all standard Sanger pipeline filters were applied, as well as a custom filter built from the results of visually calling identified variants. Indels detected by Pindel in more than two samples from a series of hundreds of malignant non-prostate tissues were also removed. An indel detected by Pindel that failed the filters in one sample was nonetheless included if it was found in another sample from the same patient and passed all filters there. Of the indels that passed all filters, up to one hundred variants per sample were validated by capillary sequencing. In addition, visual verification across all samples for a patient was performed for all indels occurring within genes, all indels in adjacent morphologically normal samples, potential “field effect” indels, indels that were not supported by the phylogeny, and a sampling of variants from each phylogenetic relationship.
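
The cross-sample rescue rule can be expressed compactly: within one patient, an indel is kept in every sample where Pindel detected it as long as it passed all filters in at least one sample. A minimal sketch, assuming per-sample calls are available as a mapping from a hypothetical indel key (chromosome, position, ref, alt) to a pass/fail flag; the keys below are invented:

```python
def retained_indels(per_sample_calls):
    """Apply the cross-sample rescue rule for one patient.

    per_sample_calls maps sample -> {indel key: True if the call passed all filters}.
    A call that failed the filters in one sample is kept if the same indel passed
    all filters in another sample from the same patient.
    """
    passed_somewhere = {
        indel
        for calls in per_sample_calls.values()
        for indel, passed in calls.items()
        if passed
    }
    return {
        sample: {indel for indel in calls if indel in passed_somewhere}
        for sample, calls in per_sample_calls.items()
    }


# Hypothetical calls: the chr2 deletion passes in 7_T1 and is rescued in 7_T2;
# the chr5 insertion fails everywhere and is dropped.
calls = {
    "7_T1": {("chr2", 1000, "AT", "A"): True},
    "7_T2": {("chr2", 1000, "AT", "A"): False, ("chr5", 2000, "G", "GC"): False},
}
print(retained_indels(calls))
```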

Mutation-Calling: Structural Variants

Brass (Breakpoints via assembly), an in-house algorithm developed at the Sanger Institute, was used to detect structural variants. In Brass phase 1, discordant read pairs are detected and integrated to find regions of interest. These regions of interest are removed if they have been found in the matched blood normal sample, have been detected as germline in PCR validation of any other sample, are supported by a low number of reads or appear to be in a “difficult” region of the genome. For a subset of regions, validation was performed by gel electrophoresis of PCR products generated with custom-designed primers across the rearrangement breakpoint as previously described[34], and for products that gave a band, the precise location and nature of the breakpoint were determined by standard Sanger capillary sequencing. Where the PCR experiments failed, Brass phase 2 was applied to the remaining predicted somatic structural variants. This phase gathers reads around the region, including half-unmapped reads, and performs a local de novo assembly using Velvet[35]. Identifiable breakpoints have a distinctive De Bruijn graph pattern that allows the breakpoint to be reconstructed to base-pair resolution. Any breakpoints for which an exact location could not be determined were removed. To ensure that breakpoints shared between samples from a patient were detected, in silico and PCR cross-sample experiments were performed. All reported breakpoints have been visually verified to ensure the presence of discordant reads and checked to ensure that they were not in repeat regions. To detect rearrangements involved in chromoplexy, a recently described process generating chained rearrangements, we applied ChainFinder[15]. We used default parameters, selecting the rearrangements from 57 prostate genomes as background. As input copy number data, we used data derived from Affymetrix SNP 6.0 arrays and processed using ASCAT[36].
As input structural variants, for each patient, we combined all high-confidence breakpoints detected in all samples of that patient. One chained event was manually filtered out, as it combined somatic rearrangements present in separate subpopulations in different samples and hence could not have occurred as a single chromoplexy event.
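
Brass phase 1 starts from discordant read pairs. The sketch below shows a simplified definition of discordance for a forward-reverse library with ~300 bp inserts; it is not Brass's actual logic (which also weighs mapping quality and clusters supporting pairs), and the 1,000 bp cutoff is a hypothetical choice. A pair spanning a TMPRSS2-ERG interstitial deletion, for example, maps in the normal orientation but ~3 Mb apart.

```python
def is_discordant(chrom_a, pos_a, strand_a, chrom_b, pos_b, strand_b, max_span=1_000):
    """Flag a read pair whose mapping is inconsistent with a ~300 bp insert library.

    Assumes a standard forward-reverse (FR) library: the leftmost read maps to
    '+', its mate to '-', and the two lie within max_span bp (a hypothetical cutoff).
    """
    if chrom_a != chrom_b:
        return True  # inter-chromosomal, e.g. a translocation
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pos_a, strand_a), (pos_b, strand_b)]
    )
    if (left_strand, right_strand) != ("+", "-"):
        return True  # unexpected orientation, e.g. an inversion or tandem duplication
    return right_pos - left_pos > max_span  # over-long span, e.g. a deletion


print(is_discordant("chr21", 1000, "+", "chr21", 1300, "-"))          # False: normal pair
print(is_discordant("chr21", 39867000, "+", "chr21", 42877200, "-"))  # True: ~3 Mb deletion
print(is_discordant("chr20", 10441211, "-", "chr21", 39872887, "+"))  # True: translocation
```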

Mutation-Calling: Copy Number

The Battenberg algorithm was used to detect clonal and subclonal somatic copy number alterations (CNAs) and to estimate ploidy and tumour content from the NGS data as previously described[9]. Briefly, germline heterozygous SNPs are phased using Impute2, and A and B alleles are assigned. Data are segmented using piecewise constant fitting[37], and subclonal copy number segments are identified, using a t-test, as those whose B-allele frequencies deviate from the values expected if all cells have a common copy number in that segment. Ploidy and tumour content are estimated using the same method as ASCAT[36].
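
The subclonality test compares the B-allele frequencies (BAFs) in a segment with the value expected if every tumour cell shared one copy number state. With tumour content ρ and allele-specific tumour copy numbers n_A and n_B (normal cells contributing one copy of each allele), the ASCAT-style expectation is (ρ·n_B + 1 − ρ) / (ρ·(n_A + n_B) + 2(1 − ρ)). A minimal sketch, assuming per-SNP BAFs of the B allele are already available; Battenberg's haplotype phasing and its fitting of a two-state subclonal mixture are not reproduced, and the segment values are invented.

```python
import numpy as np
from scipy import stats


def expected_baf(purity, n_major, n_minor):
    """Expected B-allele frequency at heterozygous SNPs for a clonal copy number state.

    purity: fraction of tumour cells (rho); n_major/n_minor: allele-specific copy
    numbers in the tumour; normal cells contribute one copy of each allele.
    The B allele is taken to be the minor allele here.
    """
    numerator = purity * n_minor + (1 - purity)
    denominator = purity * (n_major + n_minor) + 2 * (1 - purity)
    return numerator / denominator


def looks_subclonal(observed_bafs, purity, n_major, n_minor, alpha=0.05):
    """One-sample t-test of segment BAFs against the clonal expectation."""
    result = stats.ttest_1samp(observed_bafs, expected_baf(purity, n_major, n_minor))
    return bool(result.pvalue < alpha), float(result.pvalue)


print(expected_baf(0.7, 1, 0))  # ~0.231 for clonal LOH at 70% tumour content
print(expected_baf(0.7, 1, 1))  # 0.5 for a balanced segment
# Hypothetical BAFs sitting between the LOH and balanced expectations,
# as expected if only part of the tumour carries the LOH.
segment = np.array([0.29, 0.31, 0.30, 0.32, 0.28, 0.30, 0.31, 0.29])
print(looks_subclonal(segment, purity=0.7, n_major=1, n_minor=0))
```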

Construction of phylogenetic trees

For each patient, phylogenetic trees were constructed separately using (i) copy number aberrations (CNAs) and (ii) point mutations. Clonal and subclonal CNAs were identified using the previously described Battenberg algorithm[9]. This method achieves high sensitivity for the detection of CNAs found in small proportions of cells by phasing heterozygous SNPs into parent-specific haplotype blocks. Joint analysis of SNPs within these blocks, rather than of single SNPs, allows the resolution of CNAs found in ~5% of cells at 30× sequencing depth. Matching of copy number and rearrangement breakpoints, supported by visual inspection of allele frequency and logR plots, was used to identify CNAs common to multiple samples. Point mutations were analysed using an adaptation of a previously described Bayesian Dirichlet process. Mutations within each sample are modelled as deriving from an unknown number of subclones, each of which is present in an unknown fraction of tumour cells and contributes an unknown proportion of all somatic mutations, with all the unknown parameters jointly estimated. To identify clusters of mutations common to two or more samples, the Dirichlet process was extended to two dimensions, with the fraction of tumour cells bearing a mutation in each of a pair of samples jointly estimated from the number of reads observed in each sample. The presence of clusters of unique or shared mutations can be inferred from the positions of the peaks in the resulting two-dimensional probability density.

Dirichlet process clustering

We used a previously developed Bayesian Dirichlet process to model clusters of clonal and subclonal point mutations, allowing inference of the number of subclones, the fraction of cells within each subclone and the number of mutations within each clone[36]. Within this model, the number of reads bearing the ith mutation, $y_i$, is drawn from a binomial distribution,

$$y_i \sim \mathrm{Binom}(N_i, \zeta_i \pi_i),$$

where $N_i$ is the total number of reads at the mutated base and $\zeta_i$ is the expected fraction of reads that would report a mutation present in 100% of tumour cells at that locus. $\pi_i \in (0, 1)$, the fraction of tumour cells carrying the ith mutation, is modelled as coming from a Dirichlet process. We use the stick-breaking representation of the Dirichlet process:

$$\pi_i \sim \sum_{h=1}^{\infty} \omega_h \, \delta_{\pi_h^{*}}, \qquad \omega_h = V_h \prod_{k<h} (1 - V_k), \qquad V_h \sim \mathrm{Beta}(1, \alpha), \qquad \pi_h^{*} \sim P_0,$$

where $\omega_h$ is the weight of the hth mutation cluster, i.e. the proportion of all somatic mutations specific to that cluster, $\pi_h^{*}$ is the fraction of cells carrying the mutations of that cluster, $\alpha$ is the concentration parameter and $P_0$ is the base distribution. This model was extended into n dimensions, where n is the number of related samples, with the number of mutant reads obtained from each sample modelled as an independent binomial distribution, each with an independent $\pi$ drawn with a Dirichlet process from a base distribution U(0,1). Gibbs sampling, implemented in R version 2.11.1, was used to estimate the posterior distribution of the parameters of interest. The Markov chain was run for 500 iterations, of which the first 100 were discarded as burn-in. To plot the mutation density, each possible pair of related samples was treated separately. The median of the density was estimated from the sampled values of $\pi$, each weighted by the associated value of $\omega$, using a bivariate Gaussian kernel implemented in the R library KernSmooth. Median values were then plotted using the R function ‘levelplot’, with a colour palette graduated from white (low probability of a mutation) to red (high probability of a mutation).
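
The generative side of the n-dimensional model can be written in a few lines of NumPy, which helps in checking what each symbol stands for. This is a forward simulation from a truncated stick-breaking prior, not the Gibbs sampler used for inference; the truncation at a fixed number of clusters, the renormalisation of the truncated weights, and the values α = 1, ζ = 0.5 and depth 50 are assumptions made for illustration.

```python
import numpy as np


def stick_breaking_weights(alpha, n_clusters, rng):
    """Truncated stick-breaking: omega_h = V_h * prod_{k<h} (1 - V_k), V_h ~ Beta(1, alpha)."""
    v = rng.beta(1.0, alpha, size=n_clusters)
    stick_left = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * stick_left


def simulate_reads(n_mutations, depth, zeta, n_samples=2, alpha=1.0, n_clusters=25, seed=0):
    """Draw mutant read counts from a truncated version of the n-dimensional model.

    Each cluster h has one cell fraction per sample drawn from the base
    distribution U(0, 1); each mutation joins a cluster with probability omega_h;
    its mutant reads in each sample are Binomial(depth, zeta * pi).
    """
    rng = np.random.default_rng(seed)
    omega = stick_breaking_weights(alpha, n_clusters, rng)
    omega = omega / omega.sum()  # renormalise the truncated weights
    centres = rng.uniform(0.0, 1.0, size=(n_clusters, n_samples))
    labels = rng.choice(n_clusters, size=n_mutations, p=omega)
    pi = centres[labels]                    # (n_mutations, n_samples)
    reads = rng.binomial(depth, zeta * pi)  # independent binomial per sample
    return reads, pi, omega


reads, pi, omega = simulate_reads(n_mutations=200, depth=50, zeta=0.5)
print(reads.shape, omega.round(3)[:5])
```

Inference runs in the opposite direction: given the observed `reads`, the Gibbs sampler alternates between assigning mutations to clusters and updating the cluster cell fractions and weights.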

Targeted PCR and MiSeq sequencing of selected mutations and structural variants

PCR primers for somatic substitutions and indels were designed using Primer-Z[38], with known SNPs and human repeats masked. All amplicons were designed to be at most 500 bp, and all variants of interest were checked to lie within a read generated on a 2×250 bp MiSeq run. DNA was amplified using the Phusion HotStart II DNA polymerase kit (Thermo Fisher Scientific) on a thermal cycler. DNA was denatured at 98 °C for 30 seconds, followed by 30 cycles of denaturation at 98 °C for 10 seconds, annealing at 65 °C for 20 seconds and extension at 72 °C for 20 seconds. Products were incubated at 72 °C for 5 minutes before cooling to 4 °C. All PCR products were analysed on 96-well 2% agarose E-gels with ethidium bromide (Life Technologies). Where no band was detectable, reactions were repeated using an annealing temperature of 60 °C. For each DNA sample, 2 μl of each PCR mixture were pooled. Pooled DNA was diluted 1:10 and tagged with an individual barcode (Fluidigm) using the Expand High Fidelity PCR System (Roche), following the manufacturer's protocol (Access Array System for Illumina Systems User Guide). DNA was denatured at 98 °C for 1 minute, followed by 15 cycles of denaturation at 98 °C for 15 seconds, annealing at 60 °C for 30 seconds and extension at 72 °C for 1 minute. Products were incubated at 72 °C for 3 minutes before cooling to 4 °C. Barcoded PCR samples were pooled for each patient and analysed using a 2100 Bioanalyzer (Agilent) to determine the average size of the PCR library, and by KAPA SYBR FAST qPCR (Anachem) to determine the library concentration. 2 nM of each sample was sequenced on a MiSeq (Illumina). The average sequencing depth across all mutations assessed within each patient ranged from 4900 (in 8_T1) to 16600 (in 7_T4). However, for around a fifth of the targeted mutations within each patient, the average coverage across all samples from that patient was much lower (200 or less).
Many of these low-coverage mutations had mutant allele frequencies very different from the values obtained by whole genome sequencing (WGS). These PCRs were considered to have failed and were excluded from subsequent analysis. Because of the very high coverage, a low rate of sequencing errors was observed for most mutations. This manifested as a small percentage of aberrant reads, with an allele-fraction distribution peaked close to zero and decaying rapidly and exponentially. The rate of these errors was evaluated using samples that showed no mutant reads in WGS. For this purpose, only mutations identified in samples previously shown to be phylogenetically related were included, in order to filter out low-quality or questionable calls. Allele frequencies, $f$, were converted to mutation copy numbers, $n$, as previously described[39]:

$$n = \frac{f}{\rho}\left[\rho N_t + (1 - \rho) N_n\right],$$

where $\rho$, $N_t$ and $N_n$ are, respectively, the tumour purity, the locus-specific copy number in the tumour cells and the locus-specific copy number in the blood normal cells, inferred from the Battenberg algorithm. Mutation copy numbers correspond to the fraction of cells bearing a mutation multiplied by the number of chromosomal copies bearing the mutation, and are more informative than raw allele frequencies because they are adjusted for tumour ploidy and normal cell contamination. The misread distributions were similar for the different patients, with average reported mutation copy numbers of 0.0059 ± 0.0072, 0.0032 ± 0.0070 and 0.0037 ± 0.0035 in patients 6, 7 and 8, respectively. The highest reported mutation copy number for these misreads was 0.041, so this value was used as a threshold for distinguishing mutations present in a small proportion of cells from misreads arising from sequencing errors.
Note that a mutation copy number of 0.041 corresponds to an allele frequency of ~1% for most mutations, since most mutations occur in diploid regions of the genome, where $f = \rho n / 2$, and the average tumour content across the samples is below 50%. For samples 6_T2, 6_T3 and 6_T4, nearly all mutations present in 6_T1 were detected at allele fractions slightly above the threshold used to exclude artefacts (corresponding to a mutation copy number of ~0.05). Since these mutations were exclusively those present in 6_T1, it appears that these three samples were 'contaminated' by 6_T1 at some point during the PCR experiment, although whether this contamination was physical or resulted from bleed-through of the tags used in multiplexing is unknown. Assessment of WGS data, by checking the allele frequencies in 6_T2, 6_T3 and 6_T4 of mutations identified uniquely in 6_T1, indicated that there may have been some intermixing of cells from 6_T1 with 6_T2, corresponding to a much lower percentage of cells (1.8%) and possibly arising from growth of 6_T1 cells into the region sampled as 6_T2. No evidence for intermixing of 6_T1 with 6_T3 or 6_T4 was found in WGS data. Consequently, mutations identified in 6_T1 by both WGS and PCR that appeared in the PCR data for 6_T2, 6_T3 and 6_T4 were considered validated in those samples only if they exceeded a higher threshold, a mutation copy number of 0.2, which excluded mutant reads arising from the contamination of these samples.
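The conversion and the two validation thresholds can be sketched as follows. This is a minimal illustration, assuming a normal copy number of 2 (autosomes) by default and treating "above the threshold" as strictly greater; the function names and the `contamination_suspected` flag (standing in for 6_T1-specific mutations observed in 6_T2, 6_T3 or 6_T4) are hypothetical.

```python
MISREAD_THRESHOLD = 0.041       # highest mutation copy number seen for misreads
CONTAMINATION_THRESHOLD = 0.2   # stricter threshold for possibly contaminated calls


def mutation_copy_number(f, purity, cn_tumour, cn_normal=2):
    """n = (f / rho) * (rho * N_t + (1 - rho) * N_n)."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    return f / purity * (purity * cn_tumour + (1.0 - purity) * cn_normal)


def allele_frequency(n, purity, cn_tumour, cn_normal=2):
    """Inverse of mutation_copy_number: expected allele frequency for n."""
    return n * purity / (purity * cn_tumour + (1.0 - purity) * cn_normal)


def is_validated(n, contamination_suspected=False):
    """Accept a PCR call if its mutation copy number exceeds the threshold.

    Assumption: a strict '>' comparison; the stricter threshold applies only
    to calls that could be explained by carry-over from another sample.
    """
    threshold = CONTAMINATION_THRESHOLD if contamination_suspected else MISREAD_THRESHOLD
    return n > threshold


if __name__ == "__main__":
    # Diploid locus at 50% purity: n = 0.041 maps to an allele frequency of ~1%.
    print(allele_frequency(0.041, purity=0.5, cn_tumour=2))
    print(is_validated(0.05), is_validated(0.05, contamination_suspected=True))
```

At a diploid locus with 50% purity the threshold of 0.041 gives an allele frequency of 0.01025, and a call at a mutation copy number of ~0.05, like the carry-over seen in 6_T2, 6_T3 and 6_T4, passes the misread threshold but not the contamination threshold.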

Mutational Signatures

The mutational spectrum of each sample, defined by the trinucleotide context (the bases immediately 5′ and 3′) of each substitution, was deconvoluted into mutational processes as described[12,13].
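The deconvolution itself follows the cited methods[12,13] and is not reproduced here. The sketch below covers only its input: classifying each substitution into one of the 96 pyrimidine-centred trinucleotide channels and counting them per sample. It assumes forward-strand reference trinucleotides and single-base alternate alleles; the function names are hypothetical.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def channel(context: str, alt: str) -> str:
    """Map a substitution to a pyrimidine-centred channel such as 'T[C>T]G'.

    `context` is the forward-strand reference trinucleotide (5' base,
    mutated base, 3' base) and `alt` the mutant base. Purine (A/G)
    substitutions are expressed on the opposite strand, so the mutated base
    is always C or T.
    """
    context, alt = context.upper(), alt.upper()
    if context[1] in "AG":
        context = reverse_complement(context)
        alt = alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def all_channels():
    """The 96 channels: 6 substitution classes x 16 flanking-base pairs."""
    subs = [("C", a) for a in "AGT"] + [("T", a) for a in "ACG"]
    return [
        f"{five}[{ref}>{alt}]{three}"
        for ref, alt in subs
        for five, three in product("ACGT", repeat=2)
    ]


def spectrum(mutations):
    """Count (context, alt) pairs per channel; malformed entries are ignored."""
    counts = Counter(channel(ctx, alt) for ctx, alt in mutations)
    return [counts.get(ch, 0) for ch in all_channels()]


if __name__ == "__main__":
    muts = [("TCG", "T"), ("CGA", "A"), ("ACA", "G")]
    print([(ch, n) for ch, n in zip(all_channels(), spectrum(muts)) if n])
```

A G>A change in a CGA context and a C>T change in a TCG context are the same event read from opposite strands, so both land in the T[C>T]G channel.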

Clustering of Mutations

We investigated regional clustering of substitution mutations by constructing rainfall plots, in which the distance between each somatic substitution and the substitution immediately preceding it is plotted for every mutation. These plots were constructed exactly as described previously[9].
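The quantity plotted in a rainfall plot can be computed as below. This is a minimal sketch, not the cited implementation; it assumes distances are measured within each chromosome after sorting by position, that the first substitution on a chromosome (which has no predecessor) is skipped, and that duplicate positions are dropped before taking log10. The function names are hypothetical.

```python
import math
from collections import defaultdict


def intermutation_distances(substitutions):
    """Return (chrom, pos, distance_to_previous) for each substitution.

    `substitutions` is an iterable of (chromosome, position) pairs. The first
    substitution on each chromosome has no predecessor and is skipped.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in substitutions:
        by_chrom[chrom].append(pos)
    out = []
    for chrom in sorted(by_chrom):  # lexical order, e.g. "10" sorts before "2"
        positions = sorted(by_chrom[chrom])
        for prev, cur in zip(positions, positions[1:]):
            out.append((chrom, cur, cur - prev))
    return out


def rainfall_points(substitutions):
    """(chrom, pos, log10 distance) points for plotting; zero distances dropped."""
    return [
        (chrom, pos, math.log10(d))
        for chrom, pos, d in intermutation_distances(substitutions)
        if d > 0
    ]


if __name__ == "__main__":
    subs = [("1", 100), ("1", 50), ("2", 10), ("1", 400), ("1", 1400)]
    for point in rainfall_points(subs):
        print(point)
```

Regional clusters of substitutions appear in such a plot as runs of points with unusually small inter-mutation distances, far below the genome-wide average spacing.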

