OncodriveCLUSTL: a sequence-based clustering method to identify cancer drivers.

Claudia Arnedo-Pac¹, Loris Mularoni¹, Ferran Muiños¹, Abel Gonzalez-Perez¹,², Nuria Lopez-Bigas¹,²,³.

Abstract

MOTIVATION: Identification of the genomic alterations driving tumorigenesis is one of the main goals in oncogenomics research. Given the evolutionary principles of cancer development, computational methods that detect signals of positive selection in the pattern of tumor mutations have been effectively applied in the search for cancer genes. One of these signals is the abnormal clustering of mutations, which has been shown to be complementary to other signals in the detection of driver genes.
RESULTS: We have developed OncodriveCLUSTL, a new sequence-based clustering algorithm to detect significant clustering signals across genomic regions. OncodriveCLUSTL is based on a local background model derived from the simulation of mutations accounting for the composition of tri- or penta-nucleotide context substitutions observed in the cohort under study. Our method can identify known clusters and bona-fide cancer drivers across cohorts of tumor whole-exomes, outperforming the existing OncodriveCLUST algorithm and complementing other methods based on different signals of positive selection. Our results indicate that OncodriveCLUSTL can be applied to the analysis of non-coding genomic elements and non-human mutations data.
AVAILABILITY AND IMPLEMENTATION: OncodriveCLUSTL is available as an installable Python 3.5 package. The source code and running examples are freely available at https://bitbucket.org/bbglab/oncodriveclustl under the GNU Affero General Public License.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year: 2019 PMID: 31228182 PMCID: PMC6853674 DOI: 10.1093/bioinformatics/btz501

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

Identification of the alterations driving tumorigenesis is a major goal of cancer research. Knowledge of the molecular mechanisms underlying tumorigenesis is a necessary step for the implementation of precision cancer medicine. Given that cancer development is an evolutionary process, the detection of signals of positive selection in the somatic mutational pattern of genes has been exploited to identify drivers across tumor cohorts. Specifically, the non-random spatial accumulation, or clustering, of mutations along the protein sequence has been used to identify cancer drivers and provide clues about oncogenic mechanisms (Chang et al.; Tamborero et al.; Tokheim et al.). This signal is complementary to others (such as recurrence and functional impact), and thus their combination can produce more comprehensive lists of driver genes (Porta-Pardo et al.; Rheinbay et al.; Tamborero et al.). Since the rate of mutation generation across the genome is highly variable (Alexandrov et al.; Lawrence et al.; Polak et al.; Schuster-Böckler and Lehner 2012; Stamatoyannopoulos et al.), clustering-based methods face the challenge of constructing an accurate background model of the distribution of mutations to correctly assess the significance of observed clusters. Ideally, such a model would include all the genomic position-dependent covariates of the mutation rate. Alternatively, one can locally simulate the same number of mutations as observed in the region, following the probabilities of k-nucleotide context-dependent substitutions, and assess whether the distribution of mutations along the region follows the expectation (Mularoni et al.). This background model is not affected by large-scale covariates of the mutation rate (e.g. replication timing or chromatin state) and can thus be applied to any region of the genome of any species. Here we introduce OncodriveCLUSTL, a new linear clustering algorithm to detect genomic regions and elements with significant clustering signals. The algorithm is based on a local background model derived from the observed tri- or penta-nucleotide substitution frequency of a cohort. OncodriveCLUSTL identifies known mutation clusters and driver genes across TCGA cohorts. It outperforms the existing OncodriveCLUST (Tamborero et al.) and complements methods based on different signals of positive selection. We show that OncodriveCLUSTL identifies mutation clusters in human promoter regions and in mouse genes.
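As a rough illustration of the local background model described above, the following Python sketch draws simulated mutation positions within a region with probability proportional to the cohort frequency of each position's tri-nucleotide context. It is a simplified sketch under stated assumptions, not the OncodriveCLUSTL implementation: the function names and the `context_freq` dictionary are hypothetical, and each context is weighted as a whole rather than per individual substitution.

```python
import random


def trinucleotide_contexts(sequence):
    """Map each position that has both neighbours to its tri-nucleotide context."""
    return {i: sequence[i - 1:i + 2] for i in range(1, len(sequence) - 1)}


def simulate_mutations(sequence, n_mutations, context_freq, rng):
    """Sample n_mutations positions, weighted by tri-nucleotide context frequency.

    context_freq is a hypothetical {context: relative frequency} table that
    would be estimated from the cohort; contexts missing from it get weight 0.
    Note: random.Random.choices requires Python 3.6 or later and raises
    ValueError if every weight is zero.
    """
    contexts = trinucleotide_contexts(sequence)
    positions = list(contexts)
    weights = [context_freq.get(contexts[p], 0.0) for p in positions]
    return rng.choices(positions, weights=weights, k=n_mutations)
```

Repeating such a draw many times, each time with the same number of mutations as observed, would give the null distribution against which an observed clustering score can be compared.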

2 Implementation and availability

OncodriveCLUSTL is an unsupervised clustering algorithm implemented in Python 3.5. It analyzes somatic mutations that have been observed in genomic elements (GEs) across a cohort of tumor samples (Fig. 1a-1). Mutations in each GE are smoothed along its sequence using a Tukey-based kernel density function, and clusters are identified (Fig. 1a-2, 3) and scored based on the number and the shape of the distribution of mutations. Cluster scores are summed up to produce a GE clustering score. The significance of the observed clusters and GEs is assessed through the analysis of n iterations, where mutations are randomly sampled within a window of nucleotides centered at each mutation (local), following the frequency of cohort tri- or penta-nucleotide changes (Fig. 1a-4, 5; Supplementary Methods for further details). By default, P-values are adjusted using the Benjamini-Hochberg method and GEs below 1% false-discovery rate (FDR) are considered significant. OncodriveCLUSTL source code and examples are freely available at https://bitbucket.org/bbglab/oncodriveclustl. A web version of OncodriveCLUSTL can be run at https://bbglab.irbbarcelona.org/oncodriveclustl.
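The steps in this section can be sketched in a few lines of Python. The sketch below is illustrative only and rests on several assumptions that are not taken from the paper: a cluster is defined simply as a contiguous stretch of positive smoothed density, the empirical P-value uses a pseudo-count, and all function names are hypothetical. The actual cluster definition and scoring are described in the Supplementary Methods.

```python
import math


def tukey_window(n_points, alpha=0.5):
    """Tukey (tapered cosine) window; alpha is the tapered fraction."""
    if n_points == 1:
        return [1.0]
    edge = alpha * (n_points - 1) / 2.0
    window = []
    for n in range(n_points):
        m = min(n, n_points - 1 - n)  # distance to the nearest end
        if m < edge:
            window.append(0.5 * (1.0 - math.cos(math.pi * m / edge)))
        else:
            window.append(1.0)
    return window


def smooth(positions, length, window):
    """Add one window, centered on each mutation, to a per-position density."""
    half = len(window) // 2
    density = [0.0] * length
    for pos in positions:
        for k, w in enumerate(window):
            i = pos + k - half
            if 0 <= i < length:
                density[i] += w
    return density


def find_clusters(density):
    """Return (start, end) pairs, end exclusive, of runs of positive density.

    Assumption: this stands in for the method's own cluster definition.
    """
    clusters, start = [], None
    for i, d in enumerate(density):
        if d > 0 and start is None:
            start = i
        elif d <= 0 and start is not None:
            clusters.append((start, i))
            start = None
    if start is not None:
        clusters.append((start, len(density)))
    return clusters


def empirical_p_value(observed, simulated):
    """Fraction of simulated scores >= observed, with a pseudo-count."""
    hits = sum(1 for s in simulated if s >= observed)
    return (hits + 1) / (len(simulated) + 1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In this sketch, a GE would be called significant when its adjusted P-value falls below 0.01, matching the default 1% FDR threshold stated above.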

Fig. 1.

OncodriveCLUSTL algorithm and results. Overview of OncodriveCLUSTL (a). OncodriveCLUSTL detects well-known cancer genes (b) and complements methods based on different signals of positive selection (c). OncodriveCLUSTL can be successfully applied to mutations in promoter regions (d) and mouse genes (e)

3 Performance

3.1 Mutations in human protein-coding genes across 19 TCGA cohorts

OncodriveCLUSTL detects well-known cancer genes in the COSMIC Cancer Gene Census (CGC; Sondka et al.) with clusters of different sizes (Fig. 1b; Supplementary Figs S3 and S8; Supplementary Tables S2 and S3) (Ellrott et al.). It outperforms the previously developed protein-clustering method OncodriveCLUST (Tamborero et al.), which builds a background model from synonymous mutations, in both true- and false-positive rates (Supplementary Figs S4, S8 and S9; see Supplementary Methods for further details). These findings demonstrate that the improved clustering detection method and the local background model fine-tune the detection of drivers. OncodriveCLUSTL also exhibits specificity and sensitivity similar to those of the 3D protein-clustering method HotMAPS (Tokheim et al.) (Fig. 1c; Supplementary Figs S5 and S8–S11). Interestingly, although the linear clustering analysis performed by OncodriveCLUSTL can miss 3D clusters (Supplementary Fig. S10), it can identify CGC genes with clusters of truncating or silent mutations (Supplementary Fig. S10) as well as CGC genes without a PDB structure or protein model (Supplementary Fig. S11), which are missed by HotMAPS. In addition, the results of OncodriveCLUSTL complement those of methods based on distinct signals of positive selection (OncodriveFML, Mularoni et al.; dNdScv, Martincorena et al.) (Fig. 1c; Supplementary Figs S6 and S7), highlighting the value of combining methods that exploit different signals to make driver identification more comprehensive.

3.2 Mutations in promoters across a cohort of tumor whole-genomes

Consistent with the study describing the dataset (Fredriksson et al.), OncodriveCLUSTL found a significant cluster in the TERT promoter (Fig. 1d), mutations in which result in the upregulation of TERT (Supplementary Fig. S12). Significant clustering was also detected in a few other promoters; these need careful vetting before being nominated as cancer drivers, because we and others have shown that some local mutational processes can also lead to mutation clustering (Sabarinathan et al.; Zou et al.).

3.3 Mutations in C3H mouse genes in chemically induced hepatocarcinomas

As described by the authors of the dataset (Connor et al.), OncodriveCLUSTL identified significant clustering in Braf, Hras and Egfr (Fig. 1e).

4 Conclusions

OncodriveCLUSTL is a new method to identify sequence-based clustering signals across the genome. It shows satisfactory sensitivity and specificity, outperforming the existing OncodriveCLUST and complementing other methods of driver detection in coding sequences. It can also be successfully applied to the detection of mutational clustering in non-coding regions and in non-human data.

References

1. Chromatin organization is a major influence on regional mutation rates in human cancer cells.

Authors: Benjamin Schuster-Böckler; Ben Lehner
Journal: Nature Date: 2012-08-23 Impact factor: 49.962

2. Systematic analysis of noncoding somatic mutations and gene expression alterations across 14 tumor types.

Authors: Nils J Fredriksson; Lars Ny; Jonas A Nilsson; Erik Larsson
Journal: Nat Genet Date: 2014-11-10 Impact factor: 38.330

3. Exome-Scale Discovery of Hotspot Mutation Regions in Human Cancer Using 3D Protein Structure.

Authors: Collin Tokheim; Rohit Bhattacharya; Noushin Niknafs; Derek M Gygax; Rick Kim; Michael Ryan; David L Masica; Rachel Karchin
Journal: Cancer Res Date: 2016-04-28 Impact factor: 12.701

4. Human mutation rate associated with DNA replication timing.

Authors: John A Stamatoyannopoulos; Ivan Adzhubei; Robert E Thurman; Gregory V Kryukov; Sergei M Mirkin; Shamil R Sunyaev
Journal: Nat Genet Date: 2009-03-15 Impact factor: 38.330

5. The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers.

Authors: Zbyslaw Sondka; Sally Bamford; Charlotte G Cole; Sari A Ward; Ian Dunham; Simon A Forbes
Journal: Nat Rev Cancer Date: 2018-11 Impact factor: 60.716

6. Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic Pipelines.

Authors: Kyle Ellrott; Matthew H Bailey; Gordon Saksena; Kyle R Covington; Cyriac Kandoth; Chip Stewart; Julian Hess; Singer Ma; Kami E Chiotti; Michael McLellan; Heidi J Sofia; Carolyn Hutter; Gad Getz; David Wheeler; Li Ding
Journal: Cell Syst Date: 2018-03-28 Impact factor: 10.304

7. Comprehensive identification of mutational cancer driver genes across 12 tumor types.

Authors: David Tamborero; Abel Gonzalez-Perez; Christian Perez-Llamas; Jordi Deu-Pons; Cyriac Kandoth; Jüri Reimand; Michael S Lawrence; Gad Getz; Gary D Bader; Li Ding; Nuria Lopez-Bigas
Journal: Sci Rep Date: 2013-10-02 Impact factor: 4.379

8. Mutational heterogeneity in cancer and the search for new cancer-associated genes.

Authors: Michael S Lawrence; Petar Stojanov; Paz Polak; Gregory V Kryukov; Kristian Cibulskis; Andrey Sivachenko; Scott L Carter; Chip Stewart; Craig H Mermel; Steven A Roberts; Adam Kiezun; Peter S Hammerman; Aaron McKenna; Yotam Drier; Lihua Zou; Alex H Ramos; Trevor J Pugh; Nicolas Stransky; Elena Helman; Jaegil Kim; Carrie Sougnez; Lauren Ambrogio; Elizabeth Nickerson; Erica Shefler; Maria L Cortés; Daniel Auclair; Gordon Saksena; Douglas Voet; Michael Noble; Daniel DiCara; Pei Lin; Lee Lichtenstein; David I Heiman; Timothy Fennell; Marcin Imielinski; Bryan Hernandez; Eran Hodis; Sylvan Baca; Austin M Dulak; Jens Lohr; Dan-Avi Landau; Catherine J Wu; Jorge Melendez-Zajgla; Alfredo Hidalgo-Miranda; Amnon Koren; Steven A McCarroll; Jaume Mora; Brian Crompton; Robert Onofrio; Melissa Parkin; Wendy Winckler; Kristin Ardlie; Stacey B Gabriel; Charles W M Roberts; Jaclyn A Biegel; Kimberly Stegmaier; Adam J Bass; Levi A Garraway; Matthew Meyerson; Todd R Golub; Dmitry A Gordenin; Shamil Sunyaev; Eric S Lander; Gad Getz
Journal: Nature Date: 2013-06-16 Impact factor: 49.962

9. Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity.

Authors: Matthew T Chang; Saurabh Asthana; Sizhi Paul Gao; Byron H Lee; Jocelyn S Chapman; Cyriac Kandoth; JianJiong Gao; Nicholas D Socci; David B Solit; Adam B Olshen; Nikolaus Schultz; Barry S Taylor
Journal: Nat Biotechnol Date: 2015-11-30 Impact factor: 54.908

10. Short inverted repeats contribute to localized mutability in human somatic cells.

Authors: Xueqing Zou; Sandro Morganella; Dominik Glodzik; Helen Davies; Yilin Li; Michael R Stratton; Serena Nik-Zainal
Journal: Nucleic Acids Res Date: 2017-11-02 Impact factor: 16.971
