Simona Giunta.
Abstract
Cancer is underpinned by genetic changes. In an unprecedented international effort, the Pan-Cancer Analysis of Whole Genomes (PCAWG) of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) sequenced the tumors of more than 2,500 patients across 38 different cancer types, as well as the corresponding healthy tissue, with the aim of identifying genome-wide mutations found exclusively in cancer and uncovering new genetic changes that drive tumor formation. What set this project apart from earlier efforts was its use of whole genome sequencing (WGS), which enabled the exploration of alterations beyond the coding DNA, into cancer's non-coding genome. WGS of the entire cohort made it possible to tease apart driver mutations that initiate and support carcinogenesis from passenger mutations that do not play an overt role in the disease. At least one causative mutation was found in 95% of all cancers, with tumors carrying on average 5 driver mutations. The PCAWG Project also assessed the transcriptional output altered in cancer and rebuilt the evolutionary history of each tumor, showing that initial driver mutations can occur years, if not decades, prior to diagnosis. Here, I provide a concise review of the Pan-Cancer Project papers published in February 2020, along with key computational tools and the digital framework generated as part of the project. This represents a historic effort by hundreds of international collaborators, which provides a comprehensive understanding of cancer genetics, with publicly available data and resources representing a treasure trove of information to advance cancer research for years to come.
Keywords: Cancer; Chromothripsis; Driver mutations; Genomes; PCAWG; Pan-Cancer project; RNA; Telomeres; Whole genome sequencing
Year: 2021 PMID: 34097189 PMCID: PMC8180541 DOI: 10.1007/s10555-021-09969-z
Source DB: PubMed Journal: Cancer Metastasis Rev ISSN: 0167-7659 Impact factor: 9.264
Fig. 1 Key advances in understanding cancer genomes. A timeline of key technological advances in sequencing, seminal milestones and large-cohort studies published in the last 50 years (not to scale) that have contributed to our current understanding of mutations driving cancer.
PCAWG Project major findings reviewed here. The published collection of papers can be accessed on the Nature website landing page for the PCAWG Consortium (www.nature.com/collections/afdejfafdb).
| Publication | Brief description | Chapter | Reference |
|---|---|---|---|
| Campbell PJ, Getz G, Korbel JO, Stuart JM, Jennings JL, Stein LD, et al. Pan-cancer analysis of whole genomes. Nature. 2020 10.1038/s41586-020-1969-6 | Identified driver mutations across cancer genomes | 3.1 | |
| Rheinbay E, Nielsen MM, Abascal F, Wala JA, Shapira O, Tiao G, et al. Analyses of non-coding somatic drivers in 2,658 cancer whole genomes. Nature. 2020 10.1038/s41586-020-1965-x | Analysis of the 13% of tumor samples that have non-coding mutations that drive cancer | 3.2 | |
| Alexandrov LB, Kim J, Haradhvala NJ, Huang MN, Tian Ng AW, Wu Y, et al. The repertoire of mutational signatures in human cancer. Nature. 2020 10.1038/s41586-020-1943-3 | Identified new signatures of mutational processes that cause base substitutions, small insertions and deletions, and structural variation in cancer | 3.3 | |
| Li Y, Roberts ND, Wala JA, Shapira O, Schumacher SE, Kumar K, et al. Patterns of somatic structural variation in human cancer genomes. Nature. 2020 10.1038/s41586-019-1913-9 | Identified new signatures of mutational processes that cause larger-scale structural variations associated with cancer | 3.3 | |
| Gerstung M, Jolly C, Leshchiner I, Dentro SC, Gonzalez S, Rosebrock D, et al. The evolutionary history of 2,658 cancers. Nature. 2020 10.1038/s41586-019-1907-7 | Analysis of the timings and mutational patterns in the evolution of tumors to map the progression and occurrence of each driver | 3.4 | |
| Calabrese C, Davidson NR, Demircioğlu D, Fonseca NA, He Y, Kahles A, et al. Genomic basis for RNA alterations in cancer. Nature. 2020 10.1038/s41586-020-1970-0 | Describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes, and promoter activity | 3.5.1 | |
| Zhang Y, Chen F, Fonseca NA, He Y, Fujita M, Nakagawa H, et al. High-coverage whole-genome analysis of 1220 cancers reveals hundreds of genes deregulated by rearrangement-mediated cis-regulatory alterations. Nature Communications. 2020 10.1038/s41467-019-13885-w | Analysis of the diverse transcriptional consequences of gene deregulation by non-coding regions in cancer | 3.5.2 | |
| Rodriguez-Martin B, Alvarez EG, Baez-Ortega A, Zamora J, Supek F, Demeulemeester J, et al. Pan-cancer analysis of whole genomes identifies driver rearrangements promoted by LINE-1 retrotransposition. Nature Genetics. 2020 10.1038/s41588-019-0562-0 | Evaluates “jumping” of retrotransposable elements as a driver of cancer-associated mutagenesis | 3.6 | |
| Sieverling L, Hong C, Koser SD, Ginsbach P, Kleinheinz K, Hutter B, et al. Genomic footprints of activated telomere maintenance mechanisms in cancer. Nature Communications. 2020 10.1038/s41467-019-13824-9 | Explores the known and still unknown pathways used by cancers to maintain their telomeres | 3.7 | |
| Yuan Y, Ju YS, Kim Y, Li J, Wang Y, Yoon CJ, et al. Comprehensive molecular characterization of mitochondrial genomes in human cancers. Nature Genetics. 2020 10.1038/s41588-019-0557-x | Mutational analysis of mitochondrial DNA in cancer | 3.8 | |
| Akdemir KC, Le VT, Sahaana C, Li Y, Group P-SVW, Verhaak RG, et al. Chromatin Folding Domains Disruptions by Somatic Genomic Rearrangements in Human Cancers. Nature Genetics. 2019 10.1038/s41588-019-0564-y | Alterations of 3D genome architecture in cancer | 3.9 | |
| Reyna MA, Haan D, Paczkowska M, Verbeke LPC, Vazquez M, Kahraman A, et al. Pathway and network analysis of more than 2500 whole cancer genomes. Nature Communications. 2020 10.1038/s41467-020-14351-8 | Establishes the most commonly mutated pathways and molecular processes driving cancer formation and progression | 3.10 | |
| Cortés-Ciriano I, Lee JJK, Xi R, Jain D, Jung YL, Yang L, et al. Comprehensive analysis of chromothripsis in 2,658 human cancers using whole-genome sequencing. Nature Genetics. 2020 10.1038/s41588-019-0576-7 | Analysis of chromothripsis, a mutational process found to occur early in a high proportion of all cancers and to drive tumor genetic heterogeneity | 3.11 | |
| Zapatka M, Borozan I, Brewer DS, Iskar M, Grundhoff A, Alawi M, et al. The landscape of viral associations in human cancers. Nature Genetics. 2020 10.1038/s41588-019-0558-9 | HPV integration and impaired antiviral defense drive cervical, bladder, and head-and-neck carcinomas | 3.12 | |
Recap of selected datasets and computational tools generated as part of the PCAWG project.
| Name of datasets and tools | Description | Accession link |
|---|---|---|
| PCAWG landing page | This is the recommended starting point for users wishing to access the PCAWG datasets (Most of the data is open access with some controlled access requiring approval from the ICGC) | |
| Cancer Genome Collaboratory cloud portal | Cancer Collaboratory provides academic cloud-based access to the PCAWG dataset, except for the TCGA-originated portion of the controlled data tier (see Bionimbus) (Open and controlled access) | |
| The Bionimbus | Bionimbus is a cloud portal of protected data for cloud-based access to the TCGA-originated portion of the controlled data tier (Controlled access) | |
| UCSC Xena data portal | UCSC Xena is a data portal for visualizations and analyses to integrate omics data generated by the PCAWG Consortium, including copy number, gene expression, gene fusion, promoter usage, simple somatic mutations, large somatic structural variation, mutational signatures, and phenotypic data | |
| Expression Atlas | Expression Atlas is an open science resource to find information about gene and protein expression. It enables queries across different tissues, cell types, developmental stages, and experimental conditions, across thousands of publicly available RNA-seq, microarray, and proteomics datasets | |
| PCAWG-Scout | PCAWG-Scout is a data portal that provides a framework to make on-demand, in-depth analyses over the open access PCAWG data | |
| Chromothripsis Explorer | The Chromothripsis Explorer portal enables exploration of patterns of chromothripsis in the PCAWG dataset | |
| Cancer LncRNA Census | The Cancer LncRNA Census is an ongoing effort to identify and catalogue lncRNA genes which have been causally implicated in cancer | |
| PCAWG Core Pipelines | This Dockstore site contains binaries, source code, and documentation for the open source software behind all core alignment, QC, and variant-calling pipelines used by PCAWG, packaged as portable binaries using Docker and described using workflow description languages | |
| Overture suite software tool | Overture comprises a set of open source tools for managing large genomic datasets and transferring them efficiently and reliably across the Internet | |
| Butler software tool | Butler is a workflow framework that facilitates large-scale genomic analyses on public and academic clouds while offering comprehensive error detection and self-healing capabilities (reviewed in Suppl. Text Chap. | |
| SVclone software tool | SVclone is a computational method for inferring the cancer cell fraction of structural variant (SV) breakpoints from whole genome sequencing data | |
| DriverPower software tool | DriverPower is a tool used to discover potential coding and non-coding cancer driver elements from tumor whole genome or whole exome somatic mutation sets | |
| TrackSig software tool | TrackSig is a computational framework to infer changes in somatic mutational signatures over time | |
| ActivePathways tool | ActivePathways is a tool for multivariate pathway enrichment analysis that identifies gene sets, such as pathways or Gene Ontology terms, prioritizes genes based on the significance of signals from the omics datasets, and performs pathway enrichment analysis of these prioritized genes | |
All resources described are open access, unless otherwise stated in this table.
Fig. 2 Number of mutations that drive each cancer. At least one driver mutation was found for 95% of all cancers, with an average of 4-5 driver mutations for each type of cancer [3]. Depicted are examples of cancers that have different numbers of driver mutations promoting carcinogenesis. Depending on the type of cancer, anywhere from one to ten driver mutations are required for the tumor to develop.