Carlo Ganini1,2, Ivano Amelio1, Riccardo Bertolo1,3, Pierluigi Bove1,3, Oreste Claudio Buonomo1, Eleonora Candi1,2, Chiara Cipriani1,3, Nicola Di Daniele1, Hartmut Juhl4, Alessandro Mauriello1, Carla Marani1,3, John Marshall5, Sonia Melino1, Paolo Marchetti6, Manuela Montanaro1, Maria Emanuela Natale1,3, Flavia Novelli1, Giampiero Palmieri1, Mauro Piacentini1, Erino Angelo Rendina6, Mario Roselli1, Giuseppe Sica1, Manfredi Tesauro1, Valentina Rovella1, Giuseppe Tisone1, Yufang Shi1,7,8, Ying Wang7, Gerry Melino1.
Abstract
Cancer genomes have been explored from the early 2000s through massive exome sequencing efforts, leading to the publication of The Cancer Genome Atlas in 2013. Sequencing techniques have been developed alongside this project and have allowed scientists to bypass the cost limitations of whole-genome sequencing (WGS) of single specimens by developing more accurate and extensive cancer sequencing projects, such as deep sequencing of whole genomes and transcriptomic analysis. The Pan-Cancer Analysis of Whole Genomes recently published WGS data from more than 2600 human cancers together with almost 1200 related transcriptomes. The application of WGS to a large database allowed, for the first time, a global analysis of features such as molecular signatures, large structural variations and noncoding regions of the genome, as well as the evaluation of RNA alterations in the absence of underlying DNA mutations. The vast amount of data generated still needs to be thoroughly deciphered, and the advent of machine-learning approaches will be the next step towards the generation of personalized approaches for cancer medicine. The present manuscript aims to give a broad perspective on some of the biological evidence derived from the largest sequencing attempts on human cancers so far, discussing advantages and limitations of this approach and its power in the era of machine learning.
Keywords: artificial intelligence; cancer; molecular signature; omics; whole-genome sequencing
Year: 2021 PMID: 34245122 PMCID: PMC8564642 DOI: 10.1002/1878-0261.13056
Source DB: PubMed Journal: Mol Oncol ISSN: 1574-7891 Impact factor: 6.603
Fig. 1 Global cancer genomics approaches. (A) Multiomics approach in The Cancer Genome Atlas (TCGA), the first international project to catalogue the mutational landscape of human cancers. Data from more than 10 000 patients worldwide have been analysed in terms of gene expression, copy number alterations (CNAs), DNA methylation and mutations in the coding regions of the genome, providing the mutational landscape of 12 common cancers. (B) The Pan‐Cancer Analysis of Whole Genomes (PCAWG) project analysed more than 2600 whole‐cancer genomes from the International Cancer Genome Consortium (ICGC), building upon previous data from TCGA. Cancer type alteration burden has been evaluated regarding mutations (single base substitutions, double base substitutions, small insertions and deletions), CNAs, structural variants (SVs) and RNA expression (heatmap); genomic alterations have been catalogued according to the site of occurrence (coding region of a gene, regulatory regions such as promoters, 5′ or 3′ untranslated regions (5′UTR and 3′UTR) or intron splicing variants) or as CNAs and SVs, providing a specific global profile for each gene alteration; within each class of alterations, driver mutations have been recognized, and coding point mutations, together with somatic CNAs (SCNAs), represent the highest number of driver events in cancers (bar chart) [80]. (C) ARGO, Accelerating Research in Genomic Oncology, is the ongoing phase of the global‐scale omics approach of the ICGC, aimed at collecting omics and clinical data from more than 80 000 patients, with the goal of addressing key biological and clinical questions for each cancer type. This would allow the development of personalized medicine approaches for each cancer patient.
Fig. 2 Mutational signatures of cancers. (A) Global genomics data obtained in the Pan‐Cancer Analysis of Whole Genomes (PCAWG) have been processed with fitting algorithm models to recognize mutational signatures in each cancer type. (B) The mutational profile of each cancer type can be dissected into multiple signatures according to the distribution of single base pair (bp) substitutions (SBS), double base substitutions (DBS) or small insertions/deletions (InDels); this approach allows researchers to correlate a specific signature with biological programme alterations in cancers (the APOBEC signature is an example) or with the clinical history of the patient (smoking‐associated signatures) [54]. (C) The role of signatures can be further investigated in cellular or animal models using CRISPR‐Cas9 technology and single‐guide RNA screening platforms.
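As a rough illustration of the fitting step mentioned in panel (A), the sketch below estimates how many mutations each known signature contributes to an observed mutation catalogue. It is a minimal example under stated assumptions, not the PCAWG pipeline: the six substitution classes, the two signature profiles (`SMOKING_LIKE`, `APOBEC_LIKE`), the counts and the function name `fit_exposures` are all invented for illustration, and the solver is a simple multiplicative-update form of non-negative least squares.

```python
# Toy signature fitting: find non-negative exposures so that the weighted
# sum of signature profiles approximates an observed mutation catalogue.
# All profiles and counts below are invented for illustration only.

CHANNELS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

# Hypothetical signature profiles (each sums to 1 across the channels).
SMOKING_LIKE = [0.60, 0.10, 0.10, 0.10, 0.05, 0.05]  # dominated by C>A
APOBEC_LIKE = [0.05, 0.40, 0.45, 0.04, 0.03, 0.03]   # dominated by C>G and C>T

# Hypothetical catalogue built as 300 * SMOKING_LIKE + 700 * APOBEC_LIKE.
COUNTS = [215, 310, 345, 58, 36, 36]


def fit_exposures(signatures, counts, iterations=500, eps=1e-12):
    """Estimate non-negative exposures by multiplicative updates.

    signatures: list of K profiles, each a list of C probabilities.
    counts: list of C observed mutation counts.
    Returns a list of K exposures (mutations attributed to each signature).
    """
    k = len(signatures)
    c = len(counts)
    # Start from an even split of the total mutation count.
    exposures = [float(sum(counts)) / k] * k
    for _ in range(iterations):
        # Catalogue reconstructed from the current exposures.
        recon = [sum(signatures[j][i] * exposures[j] for j in range(k))
                 for i in range(c)]
        for j in range(k):
            numer = sum(signatures[j][i] * counts[i] for i in range(c))
            denom = sum(signatures[j][i] * recon[i] for i in range(c))
            # Multiplicative step keeps every exposure non-negative.
            exposures[j] *= numer / (denom + eps)
    return exposures


if __name__ == "__main__":
    fitted = fit_exposures([SMOKING_LIKE, APOBEC_LIKE], COUNTS)
    for name, value in zip(["smoking-like", "APOBEC-like"], fitted):
        print(f"{name}: {value:.1f} mutations")
```

Because the toy catalogue was constructed from the two profiles, the fit should recover exposures close to 300 and 700. Real analyses work on 96 trinucleotide-context channels and many more signatures, where regularization and signature selection matter far more than in this example.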
Fig. 3 Mutational hotspots of cancers in noncoding regions of the genome. Mutational hotspots in cancer are frequently localized in known mutated genes and can act as drivers. Their frequency in noncoding regions has been recently evaluated [48]. Apart from known hotspots in coding regions, 25% of cancers show clusters of mutations that are localized at the 5′UTR or 3′UTR of genes, as well as on long noncoding RNAs and on their promoters. These hotspots can also be linked to specific signatures, such as UV exposure, activation‐induced cytidine deaminase and APOBEC enzyme activity [112].
Fig. 4 Evolutionary history of cancers, molecular timing and early detection. (A) The mutational history of each cancer can be evaluated from a single biopsy by considering the evolution of tumour heterogeneity. (B) The clonal allelic status of point mutations can be used as a model to classify mutations as preferentially early, variable, constant, late or subclonal. The first two classes of mutations usually harbour driver mutations among many genes, whereas the late and the subclonal classes usually do not contain driver mutations. (C) The classification of mutations according to their type [driver, CNAs, mutational signatures (Sigs)] and their allelic burden allows the reconstruction of a timeline for the development of each tumour [49], potentially extending the time for an early diagnostic approach. MRCA, most recent common ancestor.
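The idea in panel (B), reading timing from the allelic status of a point mutation, can be sketched with the standard relation between variant allele fraction (VAF), tumour purity and local copy number. This is a deliberately simplified illustration, not the method behind the figure: the function names, the tolerance `tol` and the three-way labelling are assumptions, and the gene-level "variable" and "constant" classes in the caption are not modelled.

```python
def mutation_multiplicity(vaf, purity, tumour_cn, normal_cn=2):
    """Estimate the number of mutated copies per cancer cell.

    Rearranges VAF = purity * m / (purity * tumour_cn + (1 - purity) * normal_cn).
    """
    return vaf * (purity * tumour_cn + (1 - purity) * normal_cn) / purity


def classify_timing(vaf, purity, tumour_cn, major_cn, tol=0.25):
    """Assign a coarse timing label to one point mutation.

    major_cn is the copy number of the more amplified allele at the locus.
    tol is an arbitrary tolerance chosen for this sketch.
    """
    m = mutation_multiplicity(vaf, purity, tumour_cn)
    if m < 1 - tol:
        # Fewer than one mutated copy per cancer cell: present in a subclone.
        return "subclonal"
    if major_cn >= 2 and m >= major_cn - tol:
        # Mutation sits on every copy of the gained allele: it preceded the gain.
        return "early clonal"
    if major_cn >= 2:
        # Roughly one mutated copy in a gained region: it followed the gain.
        return "late clonal"
    # No gain at this locus, so early and late cannot be told apart.
    return "clonal"


if __name__ == "__main__":
    # Hypothetical mutations in a sample with 80% purity.
    for vaf, total_cn, major_cn in [(0.57, 3, 2), (0.29, 3, 2), (0.10, 3, 2), (0.40, 2, 1)]:
        print(vaf, total_cn, major_cn, classify_timing(vaf, 0.8, total_cn, major_cn))
```

The sketch only distinguishes early from late mutations where a copy number gain provides a reference point; published timing analyses use probabilistic models across all mutations and gains in a tumour rather than hard thresholds.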
Fig. 5 Executable cancer models. (A) Experimental data from a global omics approach can be used as a matrix source for mechanistic computational models that can be continuously processed and refined using data from different cancer types and patients. This will provide data‐based mechanistic hypotheses on each cancer sample. (B) The data obtained through a global omics approach can be further integrated into a machine‐learning system, which is able to refine its ability to highlight mechanistic processes at the root of each cancer sample and can be further integrated with patient‐derived omics and clinical data to develop more precise information on cancer stage and development, ultimately allowing precise personalized medicine interventions.