
African-specific molecular taxonomy of prostate cancer.

Weerachai Jaratlerdsiri1,2, Jue Jiang1,2, Tingting Gong1,2,3, Sean M Patrick4, Cali Willet5, Tracy Chew5, Ruth J Lyons2, Anne-Maree Haynes2, Gabriela Pasqualim6,7, Melanie Louw8, James G Kench9, Raymond Campbell10, Lisa G Horvath2,11, Eva K F Chan2,12, David C Wedge13, Rosemarie Sadsad5, Ilma Simoni Brum6, Shingai B A Mutambirwa14, Phillip D Stricker2,15, M S Riana Bornman4, Vanessa M Hayes16,17,18,19.   

Abstract

Prostate cancer is characterized by considerable geo-ethnic disparity. African ancestry is a significant risk factor, with mortality rates across sub-Saharan Africa 2.7-fold higher than global averages[1]. The contributing genetic and non-genetic factors, and associated mutational processes, are unknown[2,3]. Here, through whole-genome sequencing of treatment-naive prostate cancer samples from 183 ancestrally (African versus European) and globally distinct patients, we generate a large cancer genomics resource for sub-Saharan Africa, identifying around 2 million somatic variants. Significant African-ancestry-specific findings include an elevated tumour mutational burden, an increased percentage of genome alteration, a greater number of predicted damaging mutations, a higher total of mutational signatures, and the driver genes NCOA2, STK19, DDX11L1, PCAT1 and SETBP1. Examining all somatic mutational types, we describe a molecular taxonomy for prostate cancer that is differentiated by ancestry and defined as global mutational subtypes (GMS). By further including Chinese Asian data, we confirm that GMS-B (copy-number gain) and GMS-D (mutationally noisy) are specific to African populations, that GMS-A (mutationally quiet) is universal (all ethnicities) and that the African–European-restricted subtype GMS-C (copy-number losses) predicts poor clinical outcomes. In addition to the clinical benefit of including individuals of African ancestry, our GMS subtypes reveal different evolutionary trajectories and mutational processes, suggesting that both common genetic and environmental factors contribute to the disparity between ethnicities. Analogous to gene–environment interaction (defined here as a different effect of an environmental surrounding in people with different ancestries, or vice versa), we anticipate that GMS subtypes act as a proxy for intrinsic and extrinsic mutational processes in cancers, promoting global inclusion in landmark studies.
© 2022. Crown.

Year:  2022        PMID: 36045292      PMCID: PMC9477733          DOI: 10.1038/s41586-022-05154-6

Source DB:  PubMed          Journal:  Nature        ISSN: 0028-0836            Impact factor:   69.504


Main

Prostate cancer is a common, heterogeneous disease that is responsible annually for more than 1,400,000 new diagnoses and 375,000 deaths among men worldwide[1]. Given its highly variable natural history and diverse clinical behaviours[4], it is not surprising that genome profiling has revealed extensive intra- and intertumour heterogeneity and complexity[5,6]. The identification of oncogenic subtypes[7] and actionable drug targets[8] is moving prostate cancer management a step closer to the promise of precision medicine[7,9-12]. Although high-income countries of European ancestry are well along the road to incorporating cancer genomics into all aspects of cancer care[13], the rest of the world lags behind, with sub-Saharan Africa notably absent[14]. Prostate cancer is no exception: there is a single large-scale study from China[11], and in 2018 we provided a snapshot for sub-Saharan Africa, reporting an elevated mutational density in a mere six cases[15]. In sub-Saharan Africa, where mortality rates are more than double those of high-income countries and four times those of greater Asia, prostate cancer is the top-ranked cancer in men by both diagnoses and deaths; in southern Africa, the age-standardized rates are 65.9 and 22 per 100,000, respectively[1]. Through the Southern African Prostate Cancer Study (SAPCS), we have reported a 2.1-fold increase in aggressive disease (grades 4–5) and a 4.8-fold increase in prostate-specific antigen levels at diagnosis compared with African Americans[16]. Here we describe what is, to our knowledge, the largest cancer genomics dataset, and the largest prostate cancer genomics dataset, for sub-Saharan Africa, including 123 South African men. To control for study artefacts, an additional 53 Australian and 7 Brazilian individuals were processed simultaneously through the same high-depth whole-genome sequencing (WGS), mutation-calling and analytical framework.
Focusing on treatment-naive cases (100% of South Africans, 98% of Australians and two confirmed Brazilians) and aggressive tumours (grades 4–5 for 72.2% of South Africans, 86.8% of Australians and 85.7% of Brazilians; Extended Data Fig. 1a), sampled at biopsy (100% of South Africans) or surgery (100% of Australians, 62.5% of Brazilians), with tumour and patient-matched blood samples achieving coverages of 88.69× ± 14.78 and 44.34× ± 8.11, respectively (median ± s.d.; Supplementary Table 1), we uniformly generated, called and assessed about 2 million somatic variants. Through ancestral classification (prioritizing genetic ancestry over self-identified ethnicity), we show a greater number of acquired genetic alterations in tumours from African individuals while identifying both globally relevant and African-specific genomic subtypes. Combining our somatic variant dataset with those published for ethnically defined European[7,8,17,18] and Chinese[11] prostate cancer genomes, we reveal a prostate cancer taxonomy associated with distinct clinical outcomes. The inclusion of 2,658 cancer genomes from the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG)[13] extended our global mutational subtyping to other cancer types. Using known clock-like mutational processes in each subtype, we inferred the timing of oncogenic driver mutations within broad periods of tumour evolution and calculated mutation rates for each subtype with a distinctive tumour evolution pattern. Combined, these analyses demonstrate how global inclusion in cancer genomics can uncover previously unseen heterogeneity in the genomic and clinical behaviour of prostate cancer.
Extended Data Fig. 1

Clinical cohorts and statistical metrics.

a, Clinical and pathological characterization of patients. Pairwise comparisons between African ancestry and admixed/European ancestry, using contingency tables and Fisher's exact test, are highlighted in bold with two-sided P < 0.05 (*), < 0.01 (**) or < 0.001 (***). Summary statistics, including the median and first and third quartiles (Q1–Q3), are also presented. b, STRUCTURE analysis of bi-allelic germline variants with the logistic prior model, using K = 5 ancestral components. All African components are summed and assigned as African ancestry. c, Saturation curve for all driver types across 183 patients. Recurrent copy number gains and losses were measured using GISTIC v2 (Supplementary Methods). CDS, coding sequence; SV, structural variation. d, Spearman's correlation between different variables measured in this cohort. Dot sizes represent the magnitude of correlation, with significance defined as two-sided P < 0.01.

Genetic ancestry

Genetic ancestries of the 183 patient donors were estimated in a unified analysis that combined them with a geographically matched collection of published and unpublished deep-coverage African (n = 64) and European (n = 4) reference genomes[19]. Using 7,472,833 markers, patients were assigned as African (n = 113, all South Africans), with greater than 98% African contribution; European (n = 61; 53 Australians, 5 South Africans and 3 Brazilians), allowing for up to 10% Asian contribution (with a single outlier of 26%); or African–European admixed (n = 9; 5 South Africans and 4 Brazilians), with as little as 4% African or European contribution (Extended Data Fig. 1b).
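To illustrate how per-individual admixture proportions can be turned into ancestry labels, the sketch below sums STRUCTURE-style component proportions within continental groups (as done for the African components in Extended Data Fig. 1b) and applies the greater-than-70% cut-off described for Extended Data Fig. 7. The component names `k1` to `k5` and their mapping to groups in `COMPONENT_GROUP` are hypothetical; this is a simplified sketch under those assumptions, not the study's pipeline.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical mapping from K = 5 inferred components to continental groups;
# in practice this is decided by inspecting the reference populations.
COMPONENT_GROUP = {
    "k1": "African",
    "k2": "African",
    "k3": "African",
    "k4": "European",
    "k5": "Asian",
}


def group_fractions(q: Dict[str, float]) -> Dict[str, float]:
    """Sum one individual's component proportions within each continental group."""
    totals: Dict[str, float] = defaultdict(float)
    for component, fraction in q.items():
        totals[COMPONENT_GROUP[component]] += fraction
    return dict(totals)


def label_ancestry(q: Dict[str, float], cutoff: float = 0.70) -> str:
    """Return the dominant group if its share exceeds the cut-off, else 'Admixed'."""
    totals = group_fractions(q)
    group, share = max(totals.items(), key=lambda item: item[1])
    return group if share > cutoff else "Admixed"
```

For example, an individual with 63% African and 37% European contribution is labelled `"Admixed"` at the default cut-off but `"African"` with `cutoff=0.6`, which shows how strongly the chosen threshold shapes ancestral groupings.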

Total somatic mutations

In 183 prostate tumours, we identified 1,067,885 single-nucleotide variants (SNVs), 11,259 dinucleotides, 307,263 small insertions and deletions (indels, <50 bp), 419,920 copy-number alterations (CNAs) and 22,919 structural variants (SVs), with each mutational type elevated in tumours from African individuals (Fig. 1a). A median of 37.54% ± 5.51 of SNVs were C-to-T mutations, and the transition/transversion ratio across the cohort was 1.282. Tumours from African individuals had a higher rate of small mutations (SNVs and indels), with a median of 1.197 mutations per Mb (range 0.031–170.445), compared with those from Europeans (1.061 mutations per Mb; P = 0.013, two-sample t-test; P = 0.028 after excluding hypermutated tumours with >30 mutations per Mb). The percentage of genome alteration (PGA) was similarly greater in Africans (7.26% versus 2.82%, P = 0.021). Correlation tests of ancestry and total somatic mutations also supported these findings (false-discovery rate (FDR) = 0.009 and FDR = 0.032 for SNVs and PGA, respectively; Extended Data Fig. 1d). The six highest SV breakpoint counts per sample were all observed in African patients (928–2,284 breakpoints). No overall differences between the ancestries were observed for chromothripsis (range, 52–55%) or chromoplexy (range, 33–38%), although tumours from African individuals showed a trend towards more interchromosomal chromoplexic chains (1–6 versus 1–2). Moreover, the magnitudes of all mutational types were strongly correlated with one another (Fig. 1b). Thus, the more mutations of any given type a prostate tumour has, the more mutations of all types it is likely to have.
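The burden metrics in this paragraph reduce to simple ratios. The sketch below shows one way to compute a transition/transversion ratio from (reference, alternate) base pairs, small-mutation density per megabase, the >30 mutations per Mb hypermutation flag and PGA. The denominators (`callable_bp`, `assessed_bp`) are assumptions for illustration; the exact genome size used by the study is not stated here.

```python
from typing import Iterable, Tuple

# Transitions swap a purine for a purine (A<->G) or a pyrimidine for a
# pyrimidine (C<->T); every other base change is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_ratio(snvs: Iterable[Tuple[str, str]]) -> float:
    """Transition/transversion ratio over (ref, alt) SNV base pairs."""
    ts = tv = 0
    for ref, alt in snvs:
        if (ref, alt) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions: ratio undefined")
    return ts / tv


def mutations_per_mb(n_snvs: int, n_indels: int, callable_bp: float) -> float:
    """Small-mutation burden: (SNVs + indels) per megabase of callable genome."""
    return (n_snvs + n_indels) / (callable_bp / 1e6)


def is_hypermutated(burden_per_mb: float, threshold: float = 30.0) -> bool:
    """Flag tumours above the >30 mutations per Mb threshold used in the text."""
    return burden_per_mb > threshold


def percent_genome_altered(altered_bp: float, assessed_bp: float) -> float:
    """PGA: percentage of assessed bases lying in copy-number-altered segments."""
    return 100.0 * altered_bp / assessed_bp
```

With a hypothetical 3 Gb denominator, 3,000 SNVs plus 600 indels give 1.2 mutations per Mb. Changing the denominator rescales every tumour equally, so it matters for absolute values but not for between-group rankings.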
Fig. 1

Mutational density in prostate tumours of individuals with different ancestries.

a, The distribution of somatic aberrations (event number or number of base pairs) for 7 mutational types across 183 tumour–blood WGS pairs representing n = 61 European, n = 113 African and n = 9 admixed individuals. The box plots show the median (centre line), the 25th and 75th percentiles (box limits), and ±1.5× the interquartile range (whiskers). b, The different types of mutational burden observed in this cohort. The samples were percentile-ranked and then ordered on the basis of the sum of percentiles across the mutational types observed in each ancestral group (left). Right, Spearman correlation is shown between mutational types, with the dot size representing the magnitude of correlation and the background colour giving the statistical significance of FDR values.
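Fig. 1b orders samples by the sum of their percentile ranks across mutational types and reports Spearman correlations between types. A dependency-free sketch of both steps follows. Ties receive average ranks, and converting a rank r among n samples to the percentile 100·r/n is one of several conventions and is assumed here.

```python
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def composite_order(burdens: Dict[str, List[float]]) -> List[int]:
    """Sample indices ordered by the sum of percentile ranks across mutation types."""
    n = len(next(iter(burdens.values())))
    total = [0.0] * n
    for values in burdens.values():
        for idx, rank in enumerate(average_ranks(values)):
            total[idx] += 100.0 * rank / n  # rank -> percentile
    return sorted(range(n), key=lambda i: total[i])
```

Because Spearman's rho uses only ranks, it is robust to the heavy right tail of hypermutated tumours, which would otherwise dominate a Pearson correlation of raw counts.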

Candidate oncogenic drivers

Prostate cancer is known to have a long tail of oncogenic drivers[18] across the spectrum of different mutational types[8] (Extended Data Fig. 2). Protein-coding mutations, including those predicted to be probably or possibly damaging, were significantly more numerous per African individual (PolyPhen-2, 14 versus 11 per European individual; P = 0.022, two-sample t-test; P = 0.039 after excluding hypermutated tumours). We identified 482 coding and 167 non-coding drivers defined by the PCAWG consortium[20] (Extended Data Fig. 3a). A median of two (first to third quartile, 2–4) coding drivers per patient was observed in this study (Supplementary Table 2), with one (0–2) appearing to be specific to prostate cancer[7,8,17,18]. The coding driver genes significantly mutated among the 183 patients were FOXA1, PTEN, SPOP and TP53 (10–25 patients, FDR = 1.34 × 10−21–9.44 × 10−5), whereas non-coding driver elements included the FOXA1 3′ UTR, the SNORD3B-2 small RNA and a regulatory microRNA promoter at chr22:38381983 (FDR = 9.12 × 10−13, FDR = 6.16 × 10−9 and FDR = 0.070, respectively). Recurrent CNAs across all patients included 137 gains and 129 losses (GISTIC2, FDR < 0.10; Supplementary Table 3), some spanning driver genes (Extended Data Fig. 3b), such as DNAH2 (FDR = 2.18 × 10−7), FAM66C (FDR = 1.30 × 10−9), FOXP1 (FDR = 0.005), FXR2 (FDR = 2.18 × 10−7), PTEN (FDR = 9.61 × 10−13), SHBG (FDR = 2.18 × 10−7) and TP53 (FDR = 2.18 × 10−7).
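Many of the FDR values above come from multiple-testing correction, and Extended Data Fig. 3c names the Benjamini–Hochberg procedure for the breakpoint scan. A minimal sketch of Benjamini–Hochberg adjustment follows. It should not be assumed that every FDR in this paragraph was computed exactly this way; tools such as GISTIC2 report their own q-values.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return BH-adjusted p-values (q-values) in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest, scaling each by
    # m / rank and taking a running minimum so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A hypothesis is then declared significant at FDR level alpha when its adjusted value is at most alpha, which controls the expected proportion of false discoveries among the genes or elements reported.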
Extended Data Fig. 2

Somatic driver mutations in 183 prostate cancer patients of different ancestries.

The covariates on the right show the total number of altered samples for each mutational type. a, Search of the top 300 driver genes altered in primary prostate tumours among 183 specimens. Only driver genes discovered in PCAWG or in this study that are present in more than six patients, or that differ significantly between Africans and Europeans, are plotted. The top barplot shows the distribution of the number of prostate cancer and/or PCAWG drivers. The heatmap shows drivers found in this study (rows) for each patient (columns), coloured by mutational type. The dual barplot on the left compares gene-level mutational recurrence directly between Africans and Europeans. Bottom covariates show the clinical features of patients. The transition/transversion percentages across 183 patients are based on 1,364,210 small somatic mutations on chromosomes 1–Y. b, The bottom heatmap shows the top 22 previously reported coding driver genes in prostate cancer[7,8,17,18] observed in this study. The left barplot shows statistical support from the recurrence analysis in our study.

Extended Data Fig. 3

Discovery of prostate cancer drivers.

a, The number and types of PCAWG driver genes and elements studied in our cohort. b, Recurrent copy number alterations among 183 prostate tumours identified at a 99% confidence level using GISTIC v2 (Supplementary Methods). The figure shows GISTIC peaks of significant regions of recurrent amplification (red) or deletion (blue) supported by FDR < 0.01. c, Genome-wide scan for significantly recurrent breakpoints in our study. The quantile–quantile plot shows two-sided P values for mutational densities across 183 prostate cancer patients. Multiple-hypothesis correction using the false discovery rate (FDR; Benjamini–Hochberg method) is shown in Supplementary Table 4. Generalized linear modelling (GLM) of somatic mutation densities along the genome, adjusted for significant background mutational processes, is also shown. d, Bionano Genomics optical genome mapping at the HLA complex. Examples of HLA translocations in a European patient (ID 12543) and an African patient (ID UP2360) from this cohort are characterized by pairs of optical maps, each carrying a fusion junction with flanking fragments aligned to one side of the two reference breakpoints. Guided by the recurrent HLA breakpoints identified in this study, manual inspection of unfiltered consensus maps using Bionano Access v1.5.2 revealed a low-end fusion junction matched to chromosome 6 in the genome map of the African specimen. Note that the HLA alternate contig fused in the European tumour differs from the one suggested by short-read sequencing (chr6_GL000252v2_alt). The reference genome map is an in silico digest of the human reference hg38 with the DLE-1 enzyme. Genome map sizes are indicated on the horizontal axis in megabases (Mb). Matching fluorescent labels between the sample and reference genome maps are connected by grey lines.

Moreover, a subset of somatic SVs (2 breakpoints each; 1,328 breakpoints in total) overlapped with 156 driver genes reported in the PCAWG study[20] as altered by significantly recurrent breakpoints, and a generalized linear model adjusted for background covariates identified an additional 100 genes significantly affected by SV breakpoints (FDR = 1.3 × 10−43–0.097; Extended Data Fig. 3c and Supplementary Table 4). In more than 20% of tumours, SV breakpoints co-occurred with other mutational types within DNAH2, ERG, FAM66C, FXR2, PTEN, SHBG and TP53. Using optical genome mapping, an alternative, non-sequencing method for detecting chromosomal abnormalities[21], we validated recurrent breakpoints in HLA regions (DQA1 and DQB1 genes), identifying translocations between the 3 Mb HLA complex on chromosome 6 and its corresponding HLA alternate contigs (Extended Data Fig. 3d). Oncogenic driver alterations differed between ancestries (Fig. 2a,b). Specifically, tumours from African individuals were more likely to have CNAs and mutations in SETBP1 (frequency = 0.33, odds ratio (OR) = 0.357, P = 0.012), DDX11L1 (frequency = 0.48, OR = 0.24, P = 0.0001), STK19 (frequency = 0.25, OR = 0.215, P = 0.004) and NCOA2 (frequency = 0.51, OR = 0.172, P = 3.14 × 10−6), along with SVs in PCAT1 (frequency = 0.13, OR = 0.11, P = 0.012). By contrast, SVs in TMPRSS2 (frequency = 0.38, OR = 3.639, P = 0.0006) and ERG (frequency = 0.34, OR = 3.159, P = 0.003) were more frequent among Europeans. Although several DNA-damage repair genes and other genes previously associated with African ancestry did not differ significantly in alteration frequency between Africans and Europeans in this study, 10 were altered solely in Africans, mostly in the coding sequence (frequency = 0.009–0.035). Together, these data support enrolling more individuals from under-represented populations in precision oncology studies[22].
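The ancestry comparisons above report odds ratios and P values from Fisher's exact tests on 2 × 2 counts (Fig. 2a). A self-contained sketch follows. It assumes a table laid out as rows European/African and columns altered/unaltered, so that an African-enriched alteration gives OR < 1 as in the text; that layout is an inference from the reported direction of effects. The sketch returns the sample odds ratio (a·d)/(b·c), whereas R's `fisher.test` reports a conditional maximum-likelihood estimate, so published ORs can differ slightly.

```python
from math import comb
from typing import Tuple


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns the sample odds ratio (a*d)/(b*c) and the two-sided P value,
    obtained by summing the probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = prob(a)
    # A small relative tolerance guards against floating-point ties.
    p_value = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                  if p <= p_obs * (1 + 1e-7))
    if b * c == 0:
        odds_ratio = float("inf") if a * d > 0 else float("nan")
    else:
        odds_ratio = (a * d) / (b * c)
    return odds_ratio, min(1.0, p_value)
```

Genes altered only in Africans put a zero in the European "altered" cell, giving OR = 0, which matches the African-specific (OR = 0) genes described in Fig. 2a.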
Fig. 2

Taxonomy and differences in driver mutations in prostate cancer by ancestry.

a, The selected 35 driver genes, classified as (1) the most altered in this study (>10 patients), irrespective of ancestry (green); (2) DNA-damage repair (DDR) genes that are known to be associated with African ancestry (orange); or (3) other ancestry-associated genes studied in prostate cancer (assoc., purple). The OR, 95% confidence interval and two-sided P value (<0.05) were calculated using Fisher exact tests for count data, including 10 African-specific (OR = 0) and 3 European-specific (OR = infinity) genes. Significance was observed for TMPRSS2 (P = 0.0006), ERG (P = 0.003), SETBP1 (P = 0.012), DDX11L1 (P = 0.0001), STK19 (P = 0.004), NCOA2 (P = 3.14 × 10−6), PCAT1 (P = 0.012), PAPSS2 (P = 0.042) and MTCH2 (P = 0.014). b, The mutational frequency of the altered driver genes in Africans and Europeans by mutational type (CDS, non-coding, SV and CNA). c, An integrative clustering analysis reveals four distinct molecular subtypes of prostate cancer. The molecular subtypes are illustrated by small somatic mutations (coding regions and non-coding elements), somatic CNAs and somatic SVs. The proportion of, and association between, iCluster membership and patient ancestry are illustrated in d. Additional unsupervised consensus clustering on each data type was performed and mostly recapitulated the subtypes from the integrative analysis. d, Total somatic mutations across the four molecular subtypes in this study. The dashed lines indicate the median mutational densities across the four subtypes. For each subtype, patients are ordered by ancestry.

Integrative clustering analysis

Molecular subtyping of tumours is a standard approach in cancer genomics for stratifying patients within a homogeneous population by their patterns of somatic alteration, with implications for clinical use[9-11]. We identified five of the seven TCGA oncogenic driver-defined subtypes[7] in our study, with European patients 25% more likely than African patients to be classified into one of these subtypes (Supplementary Table 5 and Extended Data Fig. 4a–d). TMPRSS2-ERG fusions (predominantly 3 Mb deletions) were significantly more common in tumours from European individuals than in those from African individuals (37.7% versus 13.3%; OR = 3.919, P = 0.0004), whereas African patients were 1.3-fold more likely, albeit not significantly, to present with SPOP coding mutations (MATH and BTB domains).
Extended Data Fig. 4

TCGA molecular taxonomy.

a, Seven important oncogenic drivers identified by TCGA within our African and European patients. b, Coding mutations observed within the SPOP and FOXA1 genes, including a rare mutation in the BTB domain of SPOP (R221C in an African patient, KAL0072). FH, forkhead. c, ETV1 fusions in positive patients caused by copy number (CN) losses and/or structural variants (DEL, deletion; ICX, interchromosomal translocation; INV, unbalanced or balanced inversion). CN changes on chromosome 7 show the ETV1 loss, with a log2 CN ratio less than −0.2. d, ERG fusions caused by CN losses and/or structural variants.

For further molecular classification, we performed iCluster analysis on all mutational types (small mutations, CNAs and SVs), identifying four subtypes, A to D (Fig. 2c,d and Supplementary Table 6). Subtype A is mutationally quiet (1.01 mutations per Mb, 0.50 breakpoints per 10 Mb, 2% PGA); by contrast, subtype D shows the greatest mutational density (1.91 mutations per Mb, 1.08 breakpoints per 10 Mb, 31% PGA), with a mixture of CN gains and losses, whereas subtypes B and C are marked by substantial CN gains and losses, respectively (Fig. 3a). The quiet subtype appears to be common across prostate cancer studies[7,9,23], and the number of pan-cancer consensus drivers[20] increased from subtype A (median, 2 drivers) to B (median, 3 drivers), C (median, 3 drivers) and D (median, 4 drivers).
Fig. 3

Significance of somatic aberrations across four diverse subtypes.

a, Analysis of the long tail of driver genes using different combinations of mutational types (CDS, coding driver data; NC, non-coding driver data; SV, significantly recurrent breakpoint data; and CN, gene-level CN data), resulting in the identification of 124 preferentially mutated genes among the subtypes. Ordered by mutational frequency, 100 (80.6%) have been reported as significantly recurrent mutations/SV breakpoints in the PCAWG Consortium[20], and 24 (19.4%) are significantly mutated in this study (marked by asterisks). Using iClusterplus, unsupervised hierarchical clustering of all mutational types identified four prostate cancer subtypes (A–D; Fig. 2c), presented for 183 patients (rows) and 124 mutated genes (columns), with each subgroup ordered by ancestry. Ancestrally diverse subtypes A and C are mutationally quiet and are marked by CN loss, respectively. African-specific/predominant subtypes B and D are marked by CN gains and are mutationally noisy, respectively. Three genes on chromosome X, KDM6A, ATRX and ZMYM3, are considered to be significant due to the abundance of homozygous (homo.) loss present in subtype C. Chr., chromosome; hemi., hemizygous; ISUP, International Society of Urologic Pathologists; NA, not applicable. b, Kaplan–Meier plot of biochemical relapse (BCR)-free survival proportion of European patients for subtype A (n = 161) versus C (n = 19). c, Kaplan–Meier plot of the cancer survival probability of European patients for subtype A (n = 82) versus C (n = 17). For b and c, the probability estimates, 95% confidence intervals and two-sided P values (log-rank test) are indicated.

Using all of the mutational types in the analysis, 124 genes were significantly mutated across the four subtypes (FDR = 3.742 × 10−13–0.067; Fig. 3a), occurring in 31 to 183 patients (frequency, 0.17–1).
Among them, 100 genes were previously reported as oncogenic drivers in PCAWG[20], and FOXA1 and SPOP, which define TCGA subtypes, were also recovered in this analysis, whereas the 24 genes newly identified in this study were predominantly affected by SV breakpoints and CNAs. The median number of mutated genes was 28 (range 3–105) for subtype A, compared with 82 (42–109), 98 (72–112) and 93 (49–107) for subtypes B, C and D, respectively. Although different mutational types tended to co-occur within genes and/or patients (Supplementary Table 7), small mutations (coding and non-coding) were prominent in the quiet subtype A, consistent with their acquisition early in tumorigenesis[24]. The preferentially mutated genes within our tumour subtypes resemble the long tail of prostate cancer drivers[18], with a few genes affecting many tumours but most affecting only a few. These 124 genes mapped to 8 TCGA/ICGC cancer pathways (Supplementary Information and Extended Data Fig. 5). Whereas six of these pathways showed slightly elevated mutational frequencies in tumours from African individuals, genes affecting epigenetic mechanisms were significantly biased towards European individuals (OR = 5.586, P = 2.9 × 10−7; Extended Data Fig. 6b). Pathway enrichment analysis supported five functional networks among the cancer pathways, two of which, involving signal transduction and DNA checkpoint processes, interacted with five of the eight pathways (Extended Data Fig. 6a and Supplementary Table 8).
Extended Data Fig. 5

Prostate cancer genes and pathways.

Our 124 preferentially mutated genes across tumour subtypes were searched against the TCGA and ICGC cancer databases. The top affected genes in each pathway are shown with lollipop plots of their coding-mutation hotspots, where present. Mutational frequencies of each altered gene in a pathway were measured separately in Africans (n = 113) and Europeans (n = 61) and are shown on the right as percentages (AFR, EUR).

Extended Data Fig. 6

Major biological pathways and networks of prostate cancer.

a, Networks of functional interactions between driver genes are shown for each cancer pathway. Nodes represent Gene Ontology biological processes and Reactome pathways, and edges show functional interactions. b, Pathway alteration frequencies in Africans and Europeans. A sample was considered altered in a given pathway if at least one gene in the pathway had a genomic alteration (see Extended Data Fig. 5). P values are from two-sided Fisher's exact tests.
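The pathway-level rule in Extended Data Fig. 6b (a sample is altered if at least one pathway gene is altered) can be sketched as a set intersection per sample, followed by counting altered and unaltered samples within each ancestry group. The gene and sample names in the usage note are hypothetical placeholders, not genes or patients from this study.

```python
from typing import Dict, Iterable, Set, Tuple


def altered_in_pathway(sample_genes: Dict[str, Set[str]], pathway: Set[str]) -> Set[str]:
    """Samples with at least one altered gene belonging to the pathway."""
    return {sample for sample, genes in sample_genes.items() if genes & pathway}


def pathway_counts(
    sample_genes: Dict[str, Set[str]], pathway: Set[str], group: Iterable[str]
) -> Tuple[int, int]:
    """(altered, unaltered) sample counts for one ancestry group."""
    members = list(group)
    hits = altered_in_pathway(sample_genes, pathway)
    altered = sum(1 for sample in members if sample in hits)
    return altered, len(members) - altered
```

For instance, with samples `s1` to `s4` carrying alterations in hypothetical genes `GENE_A` to `GENE_D` and a pathway `{"GENE_A", "GENE_B"}`, each group yields an (altered, unaltered) pair; stacking the two groups' pairs gives the 2 × 2 table that can be passed to a two-sided Fisher's exact test such as `scipy.stats.fisher_exact`.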

Global molecular subtypes

By combining molecular profiling with patient demographics, genetic ancestry and geography, we identified a new prostate cancer taxonomy that we term GMS (Fig. 2d). Whereas all European patients from Australia (n = 53) and Brazil (n = 3) fell into GMS-A or GMS-C, tumours from African individuals were dispersed across all four subtypes. GMS-B and GMS-D were predominantly African, with GMS-B including a single patient of admixed ancestry (92% African) and GMS-D including a single admixed patient (63% African) and a single patient of European ancestry. The latter was one of only five Europeans in our study who were born and raised in Africa and, compared with the other patients of European ancestry, showed the highest mutational density across all types. Alternative consensus clustering of individual mutational types mostly recapitulated the subtypes from the integrative analysis (Supplementary Table 6). By further including Chinese Asian high-risk prostate cancer data[11] (n = 93; Extended Data Fig. 7a), we found that GMS-A is ancestrally and geographically universal, whereas GMS-D remained African specific, and a new African–Asian subtype, GMS-E, emerged. GMS-B remained African specific and GMS-C remained European–African specific. All patients were treatment naive at the time of sampling, and our European cohort had extensive follow-up data (median ± s.d., 122.5 ± 44.4 months). Biochemical relapse-free survival (Fig. 3b) and death-free survival (Fig. 3c) were better for patients presenting with the universal subtype than for those with the European–African subtype (GMS-A versus GMS-C; log-rank test, P = 0.008 and P = 0.041, respectively).
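The survival comparisons in Fig. 3b,c rest on Kaplan–Meier curves and a log-rank test. A minimal, dependency-free sketch of both follows, using the standard product-limit estimator and the Mantel–Haenszel form of the log-rank statistic with 1 degree of freedom. It omits the confidence intervals shown in the figure, and any inputs passed to it here are toy values, not patient data.

```python
from math import erfc, sqrt
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate as (event time, S(t)) steps.

    events[i] is 1 if the endpoint (e.g. relapse or death) occurred at
    times[i] and 0 if the patient was censored at that time.
    """
    survival, steps = 1.0, []
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        survival *= 1.0 - deaths / at_risk
        steps.append((t, survival))
    return steps


def logrank_p(t1: List[float], e1: List[int], t2: List[float], e2: List[int]) -> float:
    """Two-sided log-rank P value for two groups (chi-square, 1 degree of freedom)."""
    event_times = sorted({t for t, e in zip(t1 + t2, e1 + e2) if e})
    o_minus_e, variance = 0.0, 0.0
    for t in event_times:
        n1 = sum(1 for x in t1 if x >= t)
        n2 = sum(1 for x in t2 if x >= t)
        d1 = sum(1 for x, e in zip(t1, e1) if x == t and e)
        d2 = sum(1 for x, e in zip(t2, e2) if x == t and e)
        n, d = n1 + n2, d1 + d2
        o_minus_e += d1 - d * n1 / n  # observed minus expected events, group 1
        if n > 1:
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = o_minus_e ** 2 / variance
    # For 1 df, P(chi-square > x) = erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2))
```

The log-rank test weights all event times equally, so it is most sensitive to differences that persist over the whole follow-up period rather than to early or late divergence alone.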
Extended Data Fig. 7

Molecular subtypes in prostate cancer and pan-cancers.

a, Unsupervised hierarchical clustering of primary prostate tumours across three major ancestral groups, performed using total somatic mutations in normalized WGS data. Admixed individuals were also included in the prostate cancer subtypes to which they belonged. b, Molecular subtyping of total somatic mutations in pan-cancer studies, namely pancreatic, ovarian, breast and liver cancers. Raw data for small somatic mutations, structural variants and copy number alterations in each cancer were retrieved from PCAWG[13]. For each subtype, patients are ordered by ancestry. Ancestral groups are assigned using a cut-off of ancestral contribution greater than 70%; otherwise, individuals are considered admixed.

Our GMS taxonomy could inform pan-cancer studies in the following ways. First, patient sampling in the PCAWG project was largely homogeneous within each cancer type, which hindered the discovery of globally restricted subtypes[3,13] (Extended Data Fig. 7b). Second, the genetic ancestry[25] and geography of patients should be included in the molecular profiling of cancers. Finally, studies of ethnic disparity in cancer need to address genetic admixture within a sampling cohort properly: too low an ancestral cut-off appears to make individuals look highly admixed yet similar in ancestry, thereby discouraging the use of ethnically diverse samples.
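The 70% rule used for the pan-cancer comparison (Extended Data Fig. 7b legend) can be sketched as a small labelling function. This is a minimal illustration, assuming that per-patient ancestry proportions (for example, admixture estimates from a tool such as fastSTRUCTURE) are already available; `assign_ancestry` and its input format are hypothetical, not the study's code. The cut-off is a parameter because its value determines how many individuals end up labelled as admixed.

```python
from typing import Dict


def assign_ancestry(fractions: Dict[str, float], cutoff: float = 0.70) -> str:
    """Label a patient by dominant ancestry, or 'Admixed' if none exceeds the cut-off.

    `fractions` maps population names to ancestry proportions (hypothetical
    input format); `cutoff` defaults to the 70% rule from the figure legend.
    """
    total = sum(fractions.values())
    if total <= 0:
        raise ValueError("ancestry fractions must sum to a positive value")
    # Rescale so that the proportions sum to one before applying the cut-off.
    normalized = {pop: frac / total for pop, frac in fractions.items()}
    best_pop, best_frac = max(normalized.items(), key=lambda item: item[1])
    # The dominant ancestry must contribute strictly more than the cut-off.
    return best_pop if best_frac > cutoff else "Admixed"
```

With the default cut-off, a patient with 92% African ancestry is labelled `"African"`; passing `cutoff=0.95` labels the same patient `"Admixed"`, which shows how sensitive group labels are to this choice.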

New and known mutational signatures

Approximating the contribution of mutational signatures to individual cancer genomes facilitates the association of the signatures with exogenous or endogenous mutagen exposures that contribute to the development of human cancer[3]. Here we generated a set of CN and SV signatures and estimated their contributions to prostate cancer using non-negative matrix factorization[26] (Extended Data Fig. 8a,b). Combined with a known catalogue of small mutational signatures, including single-base substitutions (SBSs), doublet base substitutions (DBSs) and indels (IDs), we observed not only substantial variation in the number of mutational features but also their over-representation in tumours from African individuals (Extended Data Fig. 8c). Overall, totals across the 96 SBS, 78 DBS and 83 ID features examined were significantly higher in African individuals (SBSs, 3,399 versus 2,840 in Europeans, P = 0.014; DBSs, 42 versus 32, P = 0.006; IDs, 374 versus 360, P = 0.016; two-sample t-tests). We generated six de novo signatures for each small signature type (median cosine similarity, 0.986, 0.856 and 0.976, respectively), corresponding to 12, 7 and 8 global signatures, respectively (median cosine similarity, 0.966, 0.850 and 0.946, respectively; Extended Data Fig. 9), of which 26 are likely to be of biological origin (the exception, SBS47, possibly reflects sequencing artefacts). DBSs were about 1% as prevalent as SBSs. CN features were also more numerous in Africans (3,971 versus 2,721, P = 1.92 × 10−8), whereas the difference in SV features was not significant (94 versus 88, P = 0.100). The SV features, defined in a recent pan-cancer study[26], were mutually exclusive and included simple SVs (split according to size, replication timing and occurrence at fragile sites), templated insertions (split by size), local n-jumps and local–distant clusters. Factorization of the sample-by-mutation spectrum matrix identified six CN signatures (CN1–6) and eight SV signatures (SV1–8), as well as their contributions to each tumour.
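The extraction step described above factorizes a non-negative count matrix into signatures and per-tumour contributions. The sketch below is a minimal NumPy implementation of Lee–Seung multiplicative-update NMF, not the SigProfiler or PCAWG pipeline used in the study; the function names are illustrative. It assumes a features-by-samples matrix (the transpose of the sample-by-mutation spectrum matrix mentioned in the text) and includes the cosine similarity used to compare extracted signatures with reference profiles.

```python
import numpy as np


def nmf(V: np.ndarray, k: int, n_iter: int = 500, seed: int = 0, eps: float = 1e-10):
    """Factorize a non-negative features x samples matrix V as W @ H.

    Returns W (features x k signatures, each column summing to one) and
    H (k x samples exposures), using Lee-Seung multiplicative updates that
    reduce the squared Frobenius reconstruction error.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = V.shape
    W = rng.random((n_features, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Normalize signatures to probability profiles and move the scale into H.
    scale = W.sum(axis=0, keepdims=True)
    return W / scale, H * scale.T


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two signature profiles."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Signature-extraction tools typically also run many random restarts and choose the number of signatures from stability and reconstruction error; this sketch fixes `k` in advance.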
Extended Data Fig. 8

Known and novel mutational signatures in prostate cancer.

a, Copy number signatures in prostate cancer across 45 CN features, ranked by the mutational processes observed. The six most distinctive signatures and their major components were extracted by the NMF algorithm from 183 genomes. Bar charts represent the estimated proportion of each event feature assigned to each signature (rows sum to one). b, Structural variation signatures in prostate cancer, ranked by the mutational processes observed from small deletion to reciprocal rearrangement. The eight most distinctive signatures and their major components were extracted from 44 features by the NMF algorithm from 183 genomes. Bar charts represent the estimated proportion of each event feature assigned to each signature (rows sum to one). c, Frequency of SBS, DBS, ID, CN and SV features across 183 tumours. Colours in the bottom panel indicate ancestral groups: African, red; Admixed, green; European, blue. d, Stacked bar plots of signature exposures for each mutational type per patient, ranked by ancestral group. In many cases, certain mutational signatures occur more frequently in a tumour than others. The top-enriched small- to large-size mutational signatures for each patient are listed in Supplementary Table 9 (see Enrichment). The copy number and structural variation signatures (CN1–6 and SV1–8, respectively) are reported here for the first time in prostate cancer, and their top enrichment of signature mixture/exposure per patient appears to be significantly associated with our GMS (one-way ANOVA or Fisher’s exact test, two-sided P = 5.1 × 10−7 to 0.017), considering either de novo signatures or global mutational signatures from the Catalogue of Somatic Mutations in Cancer (COSMIC). This supports a role for GMS in explaining intrinsic and extrinsic mutational processes in cancer.

Extended Data Fig. 9

Total profiles of SBS, DBS, ID, CN and SV signatures.

The classification of each signature type (SBS, 96 classes; DBS, 78 classes; ID, 83 classes; CN, 45 classes; and SV, 44 classes) is described in Supplementary Methods. The plotted data are available in digital form (Supplementary Table 9).

We found that the full spectrum of mutational signatures (SBSs, DBSs, IDs, CNs and SVs) supports our newly described GMS. For every signature type except DBSs, the top-enriched signatures in each tumour were significantly associated with the taxonomic subtypes (P = 5.1 × 10−7 to 0.017, one-way analysis of variance (ANOVA) or Fisher exact test; Extended Data Fig. 8d). Regardless of signature type, 13 of 40 mutational signatures were negatively or positively correlated with our GMS (FDR = 4.97 × 10−13 to 0.095, Spearman correlation; Fig. 4a). Duplication signatures, including CN1 (tandem duplication), CN4 (whole-genome duplication), SV2 (insertion) and SV5 (large duplication), were biased towards the most mutationally noisy subtype (Extended Data Fig. 8a,b), with CN4 and SV5 more frequent in Africans (correlation coefficient = −0.24 with ancestry coded from African (1) to European (3), FDR = 0.005–0.006). Figure 4b shows that the duplication signatures account for at least a 1.5-fold greater proportion of genomic aberrations in GMS-B, GMS-C and GMS-D than in the universal GMS-A. Furthermore, the African-specific subtype GMS-B harboured several CN4 and SV5 genomic aberrations, composed predominantly of CN amplification (>5 copies and mainly >100 kb in length) and tandem duplication (<5 Mb in size, occurring across early to late DNA replication timing), respectively. Moreover, the mutational density of 30 of 32 genes that are highly mutated in our GMS and reported in prostate cancer was significantly correlated with different somatic signatures, most often with CN2, CN6 and SV6, signatures mainly driven by genomic deletions (FDR = 1.61 × 10−7 to 0.082).
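The FDR values quoted above (and in Fig. 4a) come from Benjamini–Hochberg correction of Spearman correlation P values. A minimal pure-Python sketch of the correction step follows; it assumes the raw P values have already been computed (for example with `scipy.stats.spearmanr`), and `benjamini_hochberg` is an illustrative helper rather than the study's code. It should agree with R's `p.adjust(p, method = "BH")`.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values (FDR) in the input order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, scaling by m / rank and
    # enforcing monotonicity with a running minimum (capped at 1).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

For example, raw P values `[0.01, 0.04, 0.03, 0.005]` are adjusted to `[0.02, 0.04, 0.04, 0.02]`.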
Fig. 4

Estimates of genomic aberrations contributed by each mutational signature.

a, Correlation plots of total mutational signatures with clinical and genomic characteristics. Dot size represents the FDR (Benjamini–Hochberg-corrected two-sided Spearman correlation P values); dot colour represents the correlation coefficient. GMS subtypes are coded 1–4 for subtypes A–D, respectively; African, admixed and European are coded 1–3, respectively. Correlations with 32 recurrently mutated genes in prostate cancer are shown on the x axis. Many small- and large-sized mutational signatures agree with the GMS. HR, homologous recombination; PSA, prostate-specific antigen. b, Sankey diagram depicting the proportion of duplication signatures observed across cancer subtypes. Duplication features, including amplification (Amp), translocation (trans) plus, local n-jump, templated insertion (ins), amplification loss of heterozygosity (LOH), gain, tandem duplication and gain LOH (Extended Data Fig. 8a,b), are summed per subtype and equally weighted to 20. Links between nodes (GMS, signatures and features) have widths proportional to the total number of CN or SV features across all patients within each GMS subtype. We propose that GMS-B represents the African-specific genomic subtype.

Evolution of GMS

Timeline estimates of individual somatic events reflect evolutionary periods that differ from one patient to another; for example, identical alterations that were clonal in one patient presented as subclonal events in another (Extended Data Fig. 10a,b). Nevertheless, they partly reveal the order of driver mutations and CNAs in each sample[24]. Aggregating single-sample orderings of all drivers and CNAs revealed evolutionary patterns unique to each GMS subtype (Fig. 5a,b and Extended Data Fig. 10c). We drew approximate cancer timelines for each GMS subtype, portraying the chronological ordering of driver genes, recurrent CNAs and signature activities interleaved with whole-genome duplication and the emergence of the most recent common ancestor, leading up to diagnosis. Significantly co-occurring drivers and CNAs are shown (OR = 2.6–97.8, P = 2.04 × 10−30 to 0.01), supporting their clonal and subclonal ordering within the reconstructed timelines. SBS and indel signatures that are abundant in each GMS subtype show changes in their mutational spectrum between the clonal and subclonal states, suggesting differences in mutation rates. Clock-like CpG-to-TpG mutations, adjusted for patient age, give a median mutation rate as low as 0.968 per Gb per year for the universal GMS-A and the highest rate, 1.315 per Gb per year, in the African-specific GMS-D. GMS-B and GMS-C have rates of 1.144 and 1.092 per Gb per year, respectively. Regarding the relative timing of somatic driver events, TP53 mutations and accompanying 17p loss are of particular interest, occurring early in GMS-C progression and at a later stage in GMS-A. 
Relative timing of driver events by the League model (Supplementary Information) is consistent with this, placing a fraction of the probability distribution of TP53 alterations at the early stage, although most of it lies at an intermediate stage of evolution (Extended Data Fig. 10d). This basic knowledge of in vivo tumour development suggests that some tumours could have a shorter latency period before reaching their malignant potential, so knowledge of the genomic heterogeneity of their primary clones is paramount for enabling early detection.
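The co-occurrence analysis above rests on 2 × 2 tables of event pairs across patients, tested with Fisher exact tests and labelled with the OR > 2 and OR < 0.5 thresholds given in the Fig. 5 legend. The sketch below is illustrative only: the helper names, the table layout and the significance level `alpha = 0.05` are assumptions, and a library implementation such as R's `fisher.test` would normally be used in practice.

```python
import math
from typing import Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    a: patients with both events, b: first event only,
    c: second event only, d: neither event.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = math.comb(n, col1)

    def table_prob(x: int) -> float:
        # Hypergeometric probability of a table with top-left cell x (margins fixed).
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_observed = table_prob(a)
    probs = [table_prob(x) for x in range(max(0, col1 - row2), min(row1, col1) + 1)]
    # Sum all tables no more likely than the observed one (tolerance for rounding).
    return min(1.0, sum(p for p in probs if p <= p_observed * (1 + 1e-7)))


def classify_interaction(a: int, b: int, c: int, d: int,
                         alpha: float = 0.05) -> Tuple[float, float, str]:
    """Return (odds ratio, P value, label) using the OR > 2 / OR < 0.5 thresholds."""
    if b * c > 0:
        odds_ratio = (a * d) / (b * c)
    else:
        odds_ratio = math.inf if a * d > 0 else math.nan
    p_value = fisher_exact_two_sided(a, b, c, d)
    if math.isnan(odds_ratio) or p_value >= alpha:
        label = "not significant"
    elif odds_ratio > 2:
        label = "co-occurring"
    elif odds_ratio < 0.5:
        label = "mutually exclusive"
    else:
        label = "unclassified"  # significant, but OR between 0.5 and 2
    return odds_ratio, p_value, label
```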
Extended Data Fig. 10

Stages of prostate tumour development.

a, Clonal architecture and its frequency in prostate cancers from Africans and Europeans. Tumours are divided into three groups: monoclonal, linear and branching polyclonal. The numbers of small somatic mutations (SSMs) and CNAs, the latter as percentage of genome alteration (PGA), are given as median (range). Cancer cell fraction (CCF) of each clone and/or subclone is shown in a circular node. To remain conservative, tumours with characteristics consistent with polytumours or multiple independent primary tumours are excluded. b, Unsupervised hierarchical clustering of CNAs between clonal (trunk) and subclonal (branch) mutations. Trunk mutations are those that occur between the root node (normal) and its only child node; all others are classified as branch mutations. Red indicates gain, blue indicates loss and rows indicate patients. Unidentified regions in trunk and branch are assumed to have neutral copy number. ConsensusClusterPlus identified seven CNA clusters as optimal among our patients. The figure shows that a trunk alteration from one patient can be more similar to a branch alteration from another patient than to trunk alterations from other patients in the cohort. c, Cancer timelines of GMS-B and GMS-D identified in this study; details are explained in Fig. 5. Significant somatic interactions based on Fisher’s exact test are indicated by odds ratio (OR) estimates and two-sided P values in the top left panels. Interaction P values between somatic events range from 3.16 × 10−22 to 0.041 in GMS-B and are 9.11 × 10−25 in GMS-D. Mutation rate plots show the median ± 2 s.e. of fitted data as dashed lines and error bands, respectively. d, Relative ordering model (PhylogicNDT LeagueModel) results for a cohort of 66 samples. 
Samples were analysed if they carried somatic events of interest present in more than 5% of the cohort and had informative clonal status for each event (16 events). Probability distributions show the uncertainty of timing for specific events in the cohort. e, Molecular timing distribution of copy number gains and loss of heterozygosity (LOH) between Africans and Europeans. Pie charts depict the distribution of the inferred mutation time for a given copy number alteration. Orange denotes early clonal gains/LOH, with a gradient to green for late gains/LOH. The size of each chart is proportional to the recurrence of this event across patients. Most gains and LOH events are inferred to be early clonal by MutationTimeR. Whole-genome duplication is more frequent in Africans (63%) than in Europeans (57%).

Fig. 5

Evolutionary history of globally mutated subtypes.

a, The cancer timeline of the universal subtype (GMS-A), spanning from the fertilized egg to the age of patients in the cohort. b, The cancer timeline of GMS-C. Estimates for major events, such as whole-genome duplication (WGD) and the emergence of the most recent common ancestor (MRCA), are used to define the early, variable, late and subclonal stages of tumour evolution in approximate chronological time. When the early and late clonal stages are uncertain, the variable stage is assigned. Driver genes and CNAs are shown at each stage if reported in previous studies[8,20] and timed by MutationTimeR. Mutational signatures (Sigs) that, on average, change over the course of tumour evolution or are substantially active are shown as described in the Supplementary Information. Dagger symbols denote alterations found to have different timing. Significant pairwise interactions between mutations and CNAs were computed to support the cancer timelines; ORs and two-sided P values were calculated using Fisher exact tests. Events are considered co-occurring when OR > 2 and mutually exclusive when OR < 0.5. Interaction P values between pairs range from 2.04 × 10−30 to 0.047 in GMS-A and from 1.64 × 10−27 to 0.045 in GMS-C. Median mutation rates, as CpG-to-TpG burden per Gb, are calculated using the age-adjusted branch length of cancer clones and maximally branching subclones. The mutation rate plots in a and b show the median ± 2 s.e. of fitted data as dashed lines and error bands, respectively. c, Schematic world map showing the distribution of GMS-A–D among ancestrally and globally diverse populations. The gene–environment interaction of GMS is shown on the right. 
The contingency table shows the number of patients with different ancestries (germline variants) stratified by subtype and associated with particular geography or environmental exposure (two-sided P = 0.0005, Fisher exact test with 2,000 bootstraps).

Discussion

Our study represents one of the largest whole-genome prostate cancer resources for sub-Saharan Africa (a summary is provided in Supplementary Table 12). Acknowledging the lack of clinical staging information for the South African patients (recruited at diagnosis), we describe a prostate cancer molecular taxonomy that identifies ancestrally distinctive GMS. Compared with previous taxonomies based on significantly mutated genes in prostate cancer[7,18], we found that GMS subtypes complement known subtypes such as those defined by SPOP and FOXA1 mutations, whereas other subtypes, including gene fusions, were under-represented in this study (Extended Data Fig. 4a). We also found that GMS subtypes correlate with mutational signatures reported in the known catalogue of somatic mutations in cancer, in which each tumour is represented by different degrees of exogenous and endogenous mutagen exposures[3]. Building on the PCAWG consortium’s analysis of evolution across 38 cancer types[24], we recognize that each GMS subtype represents a unique evolutionary history, with drivers and mutational signatures that vary between cancer stages, linking somatic evolution to a patient’s demographics. Thus, some subtypes represent rare or geographically restricted signatures that have not been observed in pan-cancer studies[3,13]. For conceptual simplicity, we considered two extreme cases, universal GMS-A versus African-specific GMS-B and GMS-D, which may have been influenced by two different mutational processes (Fig. 5c). One factor is predisposing genetics[27-29] contributing to endogenous mutational processes, especially those with significant germline–somatic interactions, such as the TMPRSS2-ERG fusion, which is less frequently observed in men of African and Asian ancestry[11,30], and germline BRCA2 mutations and somatic SPOP drivers, each co-occurring with their respective counterparts[31,32]. 
The other factor comprises modifiable environmental attributes that are specific to certain circumstances or geographical regions and that, to date, have not been observed in prostate cancer. Such attributes act as mutagenic forces leading to the positive selection of point mutations throughout life in healthy tissues[33,34] and cancers[35], blurring the boundaries between normal ageing and cancer tissues. According to Ottman[36], the above-mentioned model of gene–environment interaction is observed when a genotype has a different effect on disease in individuals with different environmental exposures or, alternatively, when an environmental exposure has a different effect on disease in individuals with different genotypes. Other GMS subtypes may reflect a combination of the two processes, warranting larger populations that capture ancestral versus ethnic and geographical diversity. As such, the study directly accounts for the large spatiogenomic heterogeneity of prostate cancer and its associated evolutionary history in understanding the disease aetiology. Our study suggests that larger genomic datasets of geo-ethnically diverse and ancestrally defined populations, analysed in a unified manner, will continue to identify rare and geographically restricted subtypes in prostate cancer and potentially other cancers. We demonstrate that ancestral and geographical attributes of patients can facilitate studies in cancer population genomics, an alternative to personalized cancer genomics, towards a better scientific understanding of nature versus nurture.

Methods

Patient cohorts and WGS

Our study included 183 treatment-naive patients with prostate cancer who were recruited under informed consent and appropriate ethics approval (Supplementary Information 2) from Australia (n = 53), Brazil (n = 7) and South Africa (n = 123). Although the groups were matched for pathological grade, prostate-specific antigen levels were, as previously reported, notably elevated in our African patients[16], and we could not exclude patients on the basis of potential metastasis because data on metastases in this cohort are unavailable. DNA extracted from fresh tissue and matched blood underwent 2 × 150 bp sequencing on the Illumina NovaSeq instrument (Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research).

WGS processing and variant calling

Each lane of raw sequencing reads was aligned to the human reference hg38 plus alternative contigs using bwa (v.0.7.15)[37]. Lane-level BAM files from the same library were merged, and duplicate reads were marked. The Genome Analysis Toolkit (GATK, v.4.1.2.0) was used for base quality score recalibration[38]. Contaminated and duplicate samples (n = 8) were removed. We implemented three main pipelines for the discovery of germline and somatic variants, the latter ranging from small (SNVs and indels) to large genomic variation (CNAs and SVs). The complete pipelines and tools used are available from the Sydney Informatics Hub (SIH), Core Research Facilities, University of Sydney (see the ‘Code availability’ section). Scalable bioinformatic workflows are described in Supplementary Information 4. Genetic ancestry was estimated using fastSTRUCTURE (v.1.0)[39], which uses variational Bayesian inference to approximate the marginal likelihood for very large variant datasets. Reference panels for the African and European ancestries compared in this study were retrieved from previous whole-genome databases[19].

Analysis of chromothripsis and chromoplexy

Clustered genomic rearrangements of prostate tumours were identified using ShatterSeek (v.0.4)[40] and ChainFinder (v.1.0.1)[41]. Our somatic SV and somatic CNA call sets were prepared and co-analysed using custom scripts (see the ‘Code availability’ section; Supplementary Information 6).

Analysis of mutational recurrence

We used three approaches to detect recurrently mutated genes or regions, one for each of three mutational types: small mutations, SVs and CNAs (Supplementary Information 7). In brief, small mutations were used to test whether a given genomic element was significantly more mutated than its adjacent background sequences. The genomic elements, retrieved from the PCAWG Consortium (Synapse ID syn5259886)[20], comprised one group of coding sequences and ten groups of non-coding regions. SV breakpoints were tested for enrichment within each gene using gamma–Poisson regression corrected for genomic covariates[12]. Focal and arm-level recurrent CNAs were examined using GISTIC (v.2.0.23)[42]. Known driver mutations in coding and non-coding regions published by PCAWG[20,43,44] were also recorded in our 183 tumours, and those specific to prostate cancer genes were also included[7,8,12,17,18].

Integrative analysis of prostate cancer subtypes

Integrative clustering of three genomic data types for 183 patients was performed using iClusterPlus[11,45] in R, with the following inputs: (1) driver genes and elements; (2) somatic CN segments; and (3) significantly recurrent SV breakpoints. We ran tune.iClusterPlus with the number of clusters ranging from 1 to 9. We also performed unsupervised consensus clustering on each of the three data types individually. Association analysis of genomic alterations with the different iCluster subtypes is described in detail in Supplementary Information 8. Differences in driver mutations, recurrent breakpoints and somatic CNAs across iCluster subtypes were reported.
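Consensus clustering repeatedly clusters random subsamples and records how often each pair of patients lands in the same cluster. The sketch below illustrates the idea only; it is not ConsensusClusterPlus or the study's configuration. It swaps in plain k-means with farthest-point initialisation as the base algorithm, and the 80% subsampling fraction and function names are assumptions.

```python
import numpy as np


def kmeans_labels(X: np.ndarray, k: int, n_iter: int = 50) -> np.ndarray:
    """Plain k-means with farthest-point initialisation; returns cluster labels."""
    centers = [X[0]]
    for _ in range(1, k):
        # Next centre: the point farthest from all centres chosen so far.
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq_dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def consensus_matrix(X: np.ndarray, k: int, n_resamples: int = 100,
                     frac: float = 0.8, seed: int = 0) -> np.ndarray:
    """Fraction of co-sampled resamples in which each pair of samples co-clusters."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_resamples):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        labels = kmeans_labels(X[idx], k)
        together[np.ix_(idx, idx)] += labels[:, None] == labels[None, :]
        sampled[np.ix_(idx, idx)] += 1.0
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(sampled > 0, together / sampled, 0.0)
```

Stable clusters give consensus values near 1 within clusters and near 0 between them; ConsensusClusterPlus summarizes such matrices across candidate numbers of clusters to help choose an optimum.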

Comparison of iCluster with Asian and pan-cancer data

To compare molecular subtypes between extant human populations, data from the Chinese Prostate Cancer Genome and Epigenome Atlas (CPGEA, PRJCA001124)[11] were merged and processed with our integrative clustering analysis across the three data types described above, with some modifications. Moreover, we leveraged PCAWG consortium data[13] to define molecular subtypes across different ethnic groups in other cancer types, using published somatic mutations, SVs and gene-level GISTIC results. Four cancer types (breast, liver, ovarian and pancreatic cancers) were considered because they included patients of primarily African, Asian or European ancestry (at least 70% contribution). Full details are provided in Supplementary Information 8.4. PCAWG[13] participants with prostate cancer were retrieved for comparison with our Australian data with clinical follow-up. Only those with European ancestry greater than 90% (n = 139) were analysed for the three genomic data types of iCluster subtyping, as well as by individual consensus clustering. Clustering results identical to those from the larger cohort described above were chosen for association analyses. Differences in biochemical relapse and lethal prostate cancer across the subtypes were assessed using Kaplan–Meier plots followed by a log-rank test for significance.
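The survival comparison above uses a log-rank test on Kaplan–Meier data. A minimal pure-Python sketch of the standard (unweighted) two-group log-rank statistic follows; the function name and input layout are assumptions, and a validated implementation such as R's `survival::survdiff` would normally be preferred.

```python
import math
from typing import List, Sequence, Tuple


def logrank_test(
    times_a: Sequence[float], events_a: Sequence[int],
    times_b: Sequence[float], events_b: Sequence[int],
) -> Tuple[float, float]:
    """Unweighted two-group log-rank test.

    events: 1 if the endpoint (e.g. biochemical relapse or death) was observed,
    0 if the patient was censored at that time. Returns (chi-square, P value).
    """
    data: List[Tuple[float, int, int]] = (
        [(t, e, 0) for t, e in zip(times_a, events_a)]
        + [(t, e, 1) for t, e in zip(times_b, events_b)]
    )
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0  # observed minus expected events in group A
    variance = 0.0
    for t in event_times:
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e)
        d_a = sum(1 for tt, e, g in at_risk if tt == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of d_a given the risk set at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of a chi-square distribution with one degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```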

Analysis of mutational signatures

Mutational signatures (SBSs, DBSs and indels), as defined by the PCAWG Mutational Signatures Working Group[3], were fitted to individual tumours with observed signature activities using SigProfiler[46]. Non-negative matrix factorization was implemented to detect de novo and global signature profiles among 183 patients and their contributions. New genome rearrangement signatures (CN and SV) were also extracted using non-negative matrix factorization, with 45 CN and 44 SV features examined across 183 tumours. We followed the PCAWG working classification and annotation scheme for genomic rearrangements[26]. Two SV callers were used to obtain exact breakpoint coordinates. Replication timing scores used to classify SVs were set at >75, 20–75 and <20 for early, mid and late replication timing, respectively[47]. Full details of analysis steps, parameters and relevant statistical tests are provided in Supplementary Information 9.
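The replication timing thresholds stated above can be written as a small classification rule. This sketch assumes scores on the scale used in ref. [47]; treating the boundary values 20 and 75 as mid timing is an assumption, because the text gives only the ranges, and the function name is illustrative.

```python
def replication_timing_class(score: float) -> str:
    """Bin a replication timing score: >75 early, 20-75 mid, <20 late.

    Boundary handling (20 and 75 counted as 'mid') is an assumption.
    """
    if score > 75:
        return "early"
    if score >= 20:
        return "mid"
    return "late"
```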

Reconstruction of cancer timelines

Timing of CN gains and driver mutations (SNVs and indels) across four epochs of cancer evolution (early clonal, unspecified clonal, late clonal and subclonal) was estimated using MutationTimeR[24]. CN gains with allele-specific states of 2 + 0, 2 + 1 and 2 + 2 (1 + 1 for a diploid genome) were considered, as these give clearer boundaries between epochs than variant allele frequency alone. Confidence intervals (tlo – tup) for timing estimates were calculated with 200 bootstraps. Mutation rates for each subtype were calculated according to ref. [24] by counting CpG-to-TpG mutations, which are attributed to spontaneous deamination of 5-methylcytosine to thymine at CpG dinucleotides and therefore act as a molecular clock. League model relative ordering was performed across all study samples to calculate the overall ranking of driver mutations and recurrent CNAs. The ranking was derived from the timing of each driver mutation and of clonal and subclonal CN segments, as described above. A full description is provided in Supplementary Information 10.
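For intuition on how gains are timed, consider a simplified point estimate under a constant mutation rate per copy. If a gain occurs at relative mutation time t (0 at the fertilized egg, 1 at the most recent common ancestor), clonal SNVs acquired before the gain on the duplicated allele are present on two copies (n2), whereas the others are on one copy (n1). For a 2 + 1 gain, n2 ∝ t and n1 ∝ t + 3(1 − t), giving t = 3n2/(2n2 + n1); the same reasoning for 2 + 0 and 2 + 2 states gives t = 2n2/(2n2 + n1). The sketch encodes only these closed forms; MutationTimeR itself fits a probabilistic model with bootstrapped confidence intervals, and the function name is illustrative.

```python
def gain_timing(n_mult2: int, n_mult1: int, state: str) -> float:
    """Point estimate of the relative time (0 to 1) of a clonal copy-number gain.

    n_mult2: clonal SNVs present on two copies (acquired before the gain)
    n_mult1: clonal SNVs present on one copy
    state: allele-specific copy number, one of '2+0', '2+1', '2+2'
    """
    total = 2 * n_mult2 + n_mult1
    if total == 0:
        raise ValueError("no informative mutations in this segment")
    if state == "2+1":
        estimate = 3 * n_mult2 / total
    elif state in ("2+0", "2+2"):
        estimate = 2 * n_mult2 / total
    else:
        raise ValueError(f"unsupported copy-number state: {state}")
    # Sampling noise can push the naive estimate above 1; clip to the valid range.
    return min(1.0, estimate)
```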
References

1.  Spatial genomic heterogeneity within localized, multifocal prostate cancer.

Authors:  Paul C Boutros; Michael Fraser; Nicholas J Harding; Richard de Borja; Dominique Trudel; Emilie Lalonde; Alice Meng; Pablo H Hennings-Yeomans; Andrew McPherson; Veronica Y Sabelnykova; Amin Zia; Natalie S Fox; Julie Livingstone; Yu-Jia Shiah; Jianxin Wang; Timothy A Beck; Cherry L Have; Taryne Chong; Michelle Sam; Jeremy Johns; Lee Timms; Nicholas Buchner; Ada Wong; John D Watson; Trent T Simmons; Christine P'ng; Gaetano Zafarana; Francis Nguyen; Xuemei Luo; Kenneth C Chu; Stephenie D Prokopec; Jenna Sykes; Alan Dal Pra; Alejandro Berlin; Andrew Brown; Michelle A Chan-Seng-Yue; Fouad Yousif; Robert E Denroche; Lauren C Chong; Gregory M Chen; Esther Jung; Clement Fung; Maud H W Starmans; Hanbo Chen; Shaylan K Govind; James Hawley; Alister D'Costa; Melania Pintilie; Daryl Waggott; Faraz Hach; Philippe Lambin; Lakshmi B Muthuswamy; Colin Cooper; Rosalind Eeles; David Neal; Bernard Tetu; Cenk Sahinalp; Lincoln D Stein; Neil Fleshner; Sohrab P Shah; Colin C Collins; Thomas J Hudson; John D McPherson; Theodorus van der Kwast; Robert G Bristow
Journal: Nat Genet; Date: 2015-05-25

2.  Comprehensive molecular classification of localized prostate adenocarcinoma reveals a tumour subtype predictive of non-aggressive disease.

Authors:  A Kamoun; G Cancel-Tassin; G Fromont; N Elarouci; L Armenoult; M Ayadi; J Irani; X Leroy; A Villers; G Fournier; L Doucet; S Boyault; L Brureau; L Multigner; A Diedhiou; M Roupret; E Compérat; P Blanchet; A de Reyniès; O Cussenot
Journal: Ann Oncol; Date: 2018-08-01

3.  Tumour genomic and microenvironmental heterogeneity for integrated prediction of 5-year biochemical recurrence of prostate cancer: a retrospective cohort study.

Authors:  Emilie Lalonde; Adrian S Ishkanian; Jenna Sykes; Michael Fraser; Helen Ross-Adams; Nicholas Erho; Mark J Dunning; Silvia Halim; Alastair D Lamb; Nathalie C Moon; Gaetano Zafarana; Anne Y Warren; Xianyue Meng; John Thoms; Michal R Grzadkowski; Alejandro Berlin; Cherry L Have; Varune R Ramnarine; Cindy Q Yao; Chad A Malloff; Lucia L Lam; Honglei Xie; Nicholas J Harding; Denise Y F Mak; Kenneth C Chu; Lauren C Chong; Dorota H Sendorek; Christine P'ng; Colin C Collins; Jeremy A Squire; Igor Jurisica; Colin Cooper; Rosalind Eeles; Melania Pintilie; Alan Dal Pra; Elai Davicioni; Wan L Lam; Michael Milosevic; David E Neal; Theodorus van der Kwast; Paul C Boutros; Robert G Bristow
Journal: Lancet Oncol; Date: 2014-11-13

4.  Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries.

Authors:  Hyuna Sung; Jacques Ferlay; Rebecca L Siegel; Mathieu Laversanne; Isabelle Soerjomataram; Ahmedin Jemal; Freddie Bray
Journal:  CA Cancer J Clin       Date:  2021-02-04       Impact factor: 508.702

5.  Prostate cancer (Review).

Authors:  Shahneen Sandhu; Caroline M Moore; Edmund Chiong; Himisha Beltran; Robert G Bristow; Scott G Williams
Journal:  Lancet       Date:  2021-08-06       Impact factor: 79.321

6.  A genomic and epigenomic atlas of prostate cancer in Asian populations.

Authors:  Jing Li; Chuanliang Xu; Hyung Joo Lee; Shancheng Ren; Xiaoyuan Zi; Zhiming Zhang; Haifeng Wang; Yongwei Yu; Chenghua Yang; Xiaofeng Gao; Jianguo Hou; Linhui Wang; Bo Yang; Qing Yang; Huamao Ye; Tie Zhou; Xin Lu; Yan Wang; Min Qu; Qingsong Yang; Wenhui Zhang; Nakul M Shah; Erica C Pehrsson; Shuo Wang; Zengjun Wang; Jun Jiang; Yan Zhu; Rui Chen; Huan Chen; Feng Zhu; Bijun Lian; Xiaoyun Li; Yun Zhang; Chao Wang; Yue Wang; Guangan Xiao; Junfeng Jiang; Yue Yang; Chaozhao Liang; Jianquan Hou; Conghui Han; Ming Chen; Ning Jiang; Dahong Zhang; Song Wu; Jinjian Yang; Tao Wang; Yongliang Chen; Jiantong Cai; Wenzeng Yang; Jun Xu; Shaogang Wang; Xu Gao; Ting Wang; Yinghao Sun
Journal:  Nature       Date:  2020-03-25       Impact factor: 49.962

7.  Sequencing of prostate cancers identifies new cancer genes, routes of progression and drug targets.

Authors:  David C Wedge; Gunes Gundem; Thomas Mitchell; Dan J Woodcock; Inigo Martincorena; Mohammed Ghori; Jorge Zamora; Adam Butler; Hayley Whitaker; Zsofia Kote-Jarai; Ludmil B Alexandrov; Peter Van Loo; Charlie E Massie; Stefan Dentro; Anne Y Warren; Clare Verrill; Dan M Berney; Nening Dennis; Sue Merson; Steve Hawkins; William Howat; Yong-Jie Lu; Adam Lambert; Jonathan Kay; Barbara Kremeyer; Katalin Karaszi; Hayley Luxton; Niedzica Camacho; Luke Marsden; Sandra Edwards; Lucy Matthews; Valeria Bo; Daniel Leongamornlert; Stuart McLaren; Anthony Ng; Yongwei Yu; Hongwei Zhang; Tokhir Dadaev; Sarah Thomas; Douglas F Easton; Mahbubl Ahmed; Elizabeth Bancroft; Cyril Fisher; Naomi Livni; David Nicol; Simon Tavaré; Pelvender Gill; Christopher Greenman; Vincent Khoo; Nicholas Van As; Pardeep Kumar; Christopher Ogden; Declan Cahill; Alan Thompson; Erik Mayer; Edward Rowe; Tim Dudderidge; Vincent Gnanapragasam; Nimish C Shah; Keiran Raine; David Jones; Andrew Menzies; Lucy Stebbings; Jon Teague; Steven Hazell; Cathy Corbishley; Johann de Bono; Gerhardt Attard; William Isaacs; Tapio Visakorpi; Michael Fraser; Paul C Boutros; Robert G Bristow; Paul Workman; Chris Sander; Freddie C Hamdy; Andrew Futreal; Ultan McDermott; Bissan Al-Lazikani; Andrew G Lynch; G Steven Bova; Christopher S Foster; Daniel S Brewer; David E Neal; Colin S Cooper; Rosalind A Eeles
Journal:  Nat Genet       Date:  2018-04-16       Impact factor: 38.330

8.  The genomic complexity of primary human prostate cancer.

Authors:  Michael F Berger; Michael S Lawrence; Francesca Demichelis; Yotam Drier; Kristian Cibulskis; Andrey Y Sivachenko; Andrea Sboner; Raquel Esgueva; Dorothee Pflueger; Carrie Sougnez; Robert Onofrio; Scott L Carter; Kyung Park; Lukas Habegger; Lauren Ambrogio; Timothy Fennell; Melissa Parkin; Gordon Saksena; Douglas Voet; Alex H Ramos; Trevor J Pugh; Jane Wilkinson; Sheila Fisher; Wendy Winckler; Scott Mahan; Kristin Ardlie; Jennifer Baldwin; Jonathan W Simons; Naoki Kitabayashi; Theresa Y MacDonald; Philip W Kantoff; Lynda Chin; Stacey B Gabriel; Mark B Gerstein; Todd R Golub; Matthew Meyerson; Ashutosh Tewari; Eric S Lander; Gad Getz; Mark A Rubin; Levi A Garraway
Journal:  Nature       Date:  2011-02-10       Impact factor: 49.962

9.  The repertoire of mutational signatures in human cancer.

Authors:  Ludmil B Alexandrov; Jaegil Kim; Gad Getz; Steven G Rozen; Michael R Stratton; Nicholas J Haradhvala; Mi Ni Huang; Alvin Wei Tian Ng; Yang Wu; Arnoud Boot; Kyle R Covington; Dmitry A Gordenin; Erik N Bergstrom; S M Ashiqul Islam; Nuria Lopez-Bigas; Leszek J Klimczak; John R McPherson; Sandro Morganella; Radhakrishnan Sabarinathan; David A Wheeler; Ville Mustonen
Journal:  Nature       Date:  2020-02-05       Impact factor: 49.962

10.  Signatures of mutational processes in human cancer.

Authors:  Ludmil B Alexandrov; Serena Nik-Zainal; David C Wedge; Samuel A J R Aparicio; Sam Behjati; Andrew V Biankin; Graham R Bignell; Niccolò Bolli; Ake Borg; Anne-Lise Børresen-Dale; Sandrine Boyault; Birgit Burkhardt; Adam P Butler; Carlos Caldas; Helen R Davies; Christine Desmedt; Roland Eils; Jórunn Erla Eyfjörd; John A Foekens; Mel Greaves; Fumie Hosoda; Barbara Hutter; Tomislav Ilicic; Sandrine Imbeaud; Marcin Imielinski; Natalie Jäger; David T W Jones; David Jones; Stian Knappskog; Marcel Kool; Sunil R Lakhani; Carlos López-Otín; Sancha Martin; Nikhil C Munshi; Hiromi Nakamura; Paul A Northcott; Marina Pajic; Elli Papaemmanuil; Angelo Paradiso; John V Pearson; Xose S Puente; Keiran Raine; Manasa Ramakrishna; Andrea L Richardson; Julia Richter; Philip Rosenstiel; Matthias Schlesner; Ton N Schumacher; Paul N Span; Jon W Teague; Yasushi Totoki; Andrew N J Tutt; Rafael Valdés-Mas; Marit M van Buuren; Laura van 't Veer; Anne Vincent-Salomon; Nicola Waddell; Lucy R Yates; Jessica Zucman-Rossi; P Andrew Futreal; Ultan McDermott; Peter Lichter; Matthew Meyerson; Sean M Grimmond; Reiner Siebert; Elías Campo; Tatsuhiro Shibata; Stefan M Pfister; Peter J Campbell; Michael R Stratton
Journal:  Nature       Date:  2013-08-14       Impact factor: 49.962
