SomaticSignatures: inferring mutational signatures from single-nucleotide variants.

Julian S Gehring, Bernd Fischer, Michael Lawrence, Wolfgang Huber.

Abstract

Mutational signatures are patterns in the occurrence of somatic single-nucleotide variants that can reflect underlying mutational processes. The SomaticSignatures package provides flexible, interoperable and easy-to-use tools that identify such signatures in cancer sequencing data. It facilitates large-scale, cross-dataset estimation of mutational signatures, implements existing methods for pattern decomposition, supports extension through user-defined approaches and integrates with existing Bioconductor workflows.

AVAILABILITY AND IMPLEMENTATION: The R package SomaticSignatures is available as part of the Bioconductor project. Its documentation provides additional details on the methods and demonstrates applications to biological datasets.
CONTACT: julian.gehring@embl.de, whuber@embl.de
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2015. Published by Oxford University Press.

Year:  2015        PMID: 26163694      PMCID: PMC4817139          DOI: 10.1093/bioinformatics/btv408

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Mutational signatures link observed somatic single-nucleotide variants to mutation-generating processes (Alexandrov et al.). The identification of these signatures offers insights into the evolution, heterogeneity and developmental mechanisms of cancer (Fischer et al.; Alexandrov et al.; Nik-Zainal et al.). Existing software offers specialized functionality for this approach and has contributed to the characterization of signatures in multiple cancer types (Nik-Zainal et al.; Fischer et al.), but its reliance on custom data input and output formats limits integration into common workflows. The SomaticSignatures package aims to encourage wider adoption of mutational signatures in tumor genome analysis by providing an accessible R implementation that supports multiple statistical approaches, scales to large datasets and closely interacts with the data structures and tools of Bioconductor (R Core Team, 2015; Gentleman et al.).

2 Approach

The probability that a somatic single-nucleotide variant (SNV) occurs can depend on the sequence neighborhood, and a fruitful approach is to analyze SNV frequencies together with their immediate sequence context, the flanking 3′ and 5′ bases (Alexandrov et al.). As an example, the mutation of A to G in the sequence TAC defines the mutational motif T[A>G]C. The occurrence patterns of such motifs capture characteristics of mutational mechanisms, and the frequencies of the 96 possible motifs across all samples define the mutational spectrum. It is represented by the matrix $M_{ij}$, with $i$ enumerating the motifs and $j$ the samples. The mutational spectrum can be interpreted by decomposing $M$ into two matrices of smaller size (Nik-Zainal et al.),

$M_{ij} = \sum_{k=1}^{r} W_{ik} H_{kj} + \varepsilon_{ij}$    (1)

where the number of signatures $r$ is typically small compared to the number of samples, and the elements of the residual matrix $\varepsilon$ are minimized such that $WH$ is a useful approximation of the data. The columns of $W$ describe the composition of a signature: $W_{ik}$ is the relative frequency of somatic motif $i$ in the $k$th signature. The rows of $H$ indicate the contribution of each signature to a particular sample $j$. A primary goal of the SomaticSignatures package is the easy application of this approach to datasets in an environment that provides users with powerful visualizations and algorithms.
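The construction of motifs and of the mutational spectrum can be illustrated with a minimal sketch. It is written in plain Python for illustration only and is not the package's R interface; the names `motif`, `ALL_MOTIFS` and `spectrum` are hypothetical. The sketch assumes the common convention of reporting every variant on the strand where the reference base is a pyrimidine (C or T), which is what yields 6 substitution types × 16 flanking-base pairs = 96 motifs. Under that assumed convention, the example T[A>G]C on one strand is reported as G[T>C]A on the other.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def motif(context, alt):
    """Build a motif such as 'T[C>A]G' from a 3-base reference context
    (variant in the middle) and the alternative base.

    Assumption: purine reference bases (A, G) are collapsed onto the
    opposite strand so that the reported reference is always C or T.
    """
    ref = context[1]
    if ref in "AG":
        context = reverse_complement(context)
        alt = reverse_complement(alt)
        ref = context[1]
    return f"{context[0]}[{ref}>{alt}]{context[2]}"


# All 96 motifs: 2 pyrimidine references x 3 alternatives x 16 flanking pairs.
ALL_MOTIFS = [
    f"{five}[{ref}>{alt}]{three}"
    for ref in "CT"
    for alt in BASES
    if alt != ref
    for five, three in product(BASES, repeat=2)
]


def spectrum(calls):
    """Compute the spectrum M from (sample, context, alt) records.

    Returns (samples, M) where M[i][j] is the relative frequency of
    motif ALL_MOTIFS[i] among the SNVs of samples[j].
    """
    calls = list(calls)
    samples = sorted({sample for sample, _, _ in calls})
    counts = Counter((motif(ctx, alt), sample) for sample, ctx, alt in calls)
    totals = Counter(sample for sample, _, _ in calls)
    matrix = [[counts[(m, s)] / totals[s] for s in samples] for m in ALL_MOTIFS]
    return samples, matrix
```

Each column of the resulting matrix sums to one, so samples with very different mutation counts remain comparable. Whether to work with relative frequencies or raw counts is a design choice that depends on the decomposition method applied afterwards.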

3 Methods

Several approaches exist for the decomposition [Equation (1)] that differ in their constraints and computational complexity. In principal component analysis (PCA), for a given $r$, $W$ and $H$ are chosen such that the norm $\sum_{ij} \left( M_{ij} - \sum_{k=1}^{r} W_{ik} H_{kj} \right)^2$ is minimal and $W$ is orthonormal. Non-negative matrix factorization (NMF) (Brunet et al.) is motivated by the fact that the mutational spectrum fulfills $M_{ij} \geq 0$ and imposes the same requirement on the elements of $W$ and $H$. Different NMF and PCA algorithms allow additional constraints on the results, such as sparsity. To deduce the number $r$ of signatures present in the data, information-theoretic criteria as well as prior biological knowledge can be employed (Nik-Zainal et al.; Alexandrov et al.).
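To make the NMF variant of the decomposition concrete, the following is a minimal sketch in plain Python. It uses the multiplicative update rules of Lee and Seung for the squared residual norm, chosen here for brevity; this is not necessarily the algorithm the package applies, and the function names (`nmf`, `frobenius_error`, `normalize_signatures`) are hypothetical. Matrices are lists of rows, and the number of signatures `r` is supplied by the caller.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(M, r, n_iter=200, seed=0, eps=1e-12):
    """Approximate a non-negative matrix M (motifs x samples) by W H.

    W (motifs x r) and H (r x samples) start from positive random values;
    multiplicative updates keep every element non-negative.
    """
    rng = random.Random(seed)
    n, m = len(M), len(M[0])
    W = [[rng.random() + eps for _ in range(r)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(r)]
    for _ in range(n_iter):
        # H <- H * (W^T M) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, M), matmul(matmul(Wt, W), H)
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(r)]
        # W <- W * (M H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(M, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(r)]
             for i in range(n)]
    return W, H


def frobenius_error(M, W, H):
    """Square root of the summed squared residuals M - WH."""
    WH = matmul(W, H)
    return sum((M[i][j] - WH[i][j]) ** 2
               for i in range(len(M)) for j in range(len(M[0]))) ** 0.5


def normalize_signatures(W, H):
    """Rescale so each column of W sums to one, leaving the product WH unchanged."""
    sums = [sum(col) or 1.0 for col in zip(*W)]
    Wn = [[w / s for w, s in zip(row, sums)] for row in W]
    Hn = [[h * s for h in row] for row, s in zip(H, sums)]
    return Wn, Hn
```

After `normalize_signatures`, the columns of `W` can be read as relative motif frequencies per signature, and the rows of `H` carry the corresponding scale. The result of NMF depends on the random initialization, so in practice several runs with different seeds are usually compared.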

4 Results

SomaticSignatures is a flexible and efficient tool for inferring characteristics of mutational mechanisms, based on the methodology developed by Nik-Zainal et al. It integrates with Bioconductor tools for processing and annotating genomic variants. An analysis starts with a set of SNV calls, typically imported from a VCF file and represented as a VRanges object (Obenchain et al.). Since the original calls do not contain information about the sequence context, we construct the mutational motifs first, based on the sequence of a reference or personalized genome. Subsequently, we define the mutational spectrum M. While its columns are by default defined by the sample labels, users can specify an alternative grouping covariate, for example tumor type. Mutational signatures and their contribution to each sample’s mutational spectrum are estimated with a chosen decomposition method for a defined number of signatures. We provide convenient access to implementations for NMF and PCA, and users can apply functions with alternative decomposition methods through the API. The user interface and library of plotting functions facilitate subsequent analysis and presentation of results (Fig. 1). Accounting for technical biases is often essential, particularly when analyzing across multiple datasets. For this purpose, we provide methods to normalize for the background distribution of sequence motifs and demonstrate the adjustment for batch effects.
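Two steps of this workflow, grouping the spectrum by a covariate and normalizing for the background distribution of sequence motifs, can be sketched as follows. This is an illustration in plain Python under stated assumptions, not the package's implementation: the function names are hypothetical, motifs are assumed to be strings of the form `T[C>A]G`, and the normalization simply reweights each motif by the ratio of a target to an observed trinucleotide frequency before rescaling to relative frequencies.

```python
from collections import defaultdict


def grouped_spectrum(counts, groups):
    """Pool motif counts by a grouping covariate and convert to frequencies.

    counts: {sample: {motif: count}}
    groups: {sample: label}, e.g. tumor type; samples without an entry
            keep their own label, mirroring a per-sample default.
    Returns {label: {motif: relative frequency}}.
    """
    pooled = defaultdict(lambda: defaultdict(int))
    for sample, motif_counts in counts.items():
        label = groups.get(sample, sample)
        for m, n in motif_counts.items():
            pooled[label][m] += n
    spectrum = {}
    for label, motif_counts in pooled.items():
        total = sum(motif_counts.values())
        spectrum[label] = {m: n / total for m, n in motif_counts.items()}
    return spectrum


def context_of(m):
    """Reference trinucleotide of a motif: 'T[C>A]G' -> 'TCG'."""
    return m[0] + m[2] + m[6]


def normalize_background(freqs, observed, target):
    """Reweight motif frequencies from an observed to a target background.

    observed, target: {trinucleotide: frequency}, for example the k-mer
    frequencies of the captured exome and of the whole genome.
    """
    scaled = {
        m: f * target[context_of(m)] / observed[context_of(m)]
        for m, f in freqs.items()
    }
    total = sum(scaled.values())
    return {m: v / total for m, v in scaled.items()}
```

For instance, a motif whose trinucleotide context is twice as frequent in the sequenced regions as in the target background is down-weighted by a factor of two relative to the other motifs before the frequencies are rescaled to sum to one.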
Fig. 1. Analysis of mutational signatures for eight TCGA studies (Gehring, 2014). The observed mutational spectrum of each study (panel a) was decomposed into five distinct mutational signatures S1–S5 (panel b) with NMF. The presence of these signatures in the studies (panel c), as shown by hierarchical clustering, underlines the similarities in mutational processes of biologically related cancer types. An annotated high-resolution version of this figure is available as Supplementary Figure S1.

In the documentation of the software, we illustrate a use case by analyzing 594 607 somatic SNV calls from 2408 TCGA whole-exome sequenced samples (Gehring, 2014). The analysis, including NMF, PCA and hierarchical clustering, completes within minutes on a standard desktop computer. The different approaches yield a biologically meaningful grouping of the eight cancer types according to the estimated signatures (Fig. 1). We have applied this approach to the characterization of kidney cancer and have shown that classification of subtypes according to mutational signatures is consistent with classification based on RNA expression profiling and mutation rates (Durinck et al.).

References

1.  Metagenes and molecular pattern discovery using matrix factorization.

Authors:  Jean-Philippe Brunet; Pablo Tamayo; Todd R Golub; Jill P Mesirov
Journal:  Proc Natl Acad Sci U S A       Date:  2004-03-11       Impact factor: 11.205

2.  VariantAnnotation: a Bioconductor package for exploration and annotation of genetic variants.

Authors:  Valerie Obenchain; Michael Lawrence; Vincent Carey; Stephanie Gogarten; Paul Shannon; Martin Morgan
Journal:  Bioinformatics       Date:  2014-03-28       Impact factor: 6.937

3.  Spectrum of diverse genomic alterations define non-clear cell renal carcinoma subtypes.

Authors:  Steffen Durinck; Eric W Stawiski; Andrea Pavía-Jiménez; Zora Modrusan; Payal Kapur; Bijay S Jaiswal; Na Zhang; Vanina Toffessi-Tcheuyap; Thong T Nguyen; Kanika Bajaj Pahuja; Ying-Jiun Chen; Sadia Saleem; Subhra Chaudhuri; Sherry Heldens; Marlena Jackson; Samuel Peña-Llopis; Joseph Guillory; Karen Toy; Connie Ha; Corissa J Harris; Eboni Holloman; Haley M Hill; Jeremy Stinson; Celina Sanchez Rivers; Vasantharajan Janakiraman; Weiru Wang; Lisa N Kinch; Nick V Grishin; Peter M Haverty; Bernard Chow; Julian S Gehring; Jens Reeder; Gregoire Pau; Thomas D Wu; Vitaly Margulis; Yair Lotan; Arthur Sagalowsky; Ivan Pedrosa; Frederic J de Sauvage; James Brugarolas; Somasekar Seshagiri
Journal:  Nat Genet       Date:  2014-11-17       Impact factor: 38.330

4.  Bioconductor: open software development for computational biology and bioinformatics.

Authors:  Robert C Gentleman; Vincent J Carey; Douglas M Bates; Ben Bolstad; Marcel Dettling; Sandrine Dudoit; Byron Ellis; Laurent Gautier; Yongchao Ge; Jeff Gentry; Kurt Hornik; Torsten Hothorn; Wolfgang Huber; Stefano Iacus; Rafael Irizarry; Friedrich Leisch; Cheng Li; Martin Maechler; Anthony J Rossini; Gunther Sawitzki; Colin Smith; Gordon Smyth; Luke Tierney; Jean Y H Yang; Jianhua Zhang
Journal:  Genome Biol       Date:  2004-09-15       Impact factor: 13.583

5.  EMu: probabilistic inference of mutational processes and their localization in the cancer genome.

Authors:  Andrej Fischer; Christopher J R Illingworth; Peter J Campbell; Ville Mustonen
Journal:  Genome Biol       Date:  2013-04-29       Impact factor: 13.583

6.  Mutational processes molding the genomes of 21 breast cancers.

Authors:  Serena Nik-Zainal; Ludmil B Alexandrov; David C Wedge; Peter Van Loo; Christopher D Greenman; Keiran Raine; David Jones; Jonathan Hinton; John Marshall; Lucy A Stebbings; Andrew Menzies; Sancha Martin; Kenric Leung; Lina Chen; Catherine Leroy; Manasa Ramakrishna; Richard Rance; King Wai Lau; Laura J Mudie; Ignacio Varela; David J McBride; Graham R Bignell; Susanna L Cooke; Adam Shlien; John Gamble; Ian Whitmore; Mark Maddison; Patrick S Tarpey; Helen R Davies; Elli Papaemmanuil; Philip J Stephens; Stuart McLaren; Adam P Butler; Jon W Teague; Göran Jönsson; Judy E Garber; Daniel Silver; Penelope Miron; Aquila Fatima; Sandrine Boyault; Anita Langerød; Andrew Tutt; John W M Martens; Samuel A J R Aparicio; Åke Borg; Anne Vincent Salomon; Gilles Thomas; Anne-Lise Børresen-Dale; Andrea L Richardson; Michael S Neuberger; P Andrew Futreal; Peter J Campbell; Michael R Stratton
Journal:  Cell       Date:  2012-05-17       Impact factor: 41.582

7.  Deciphering signatures of mutational processes operative in human cancer.

Authors:  Ludmil B Alexandrov; Serena Nik-Zainal; David C Wedge; Peter J Campbell; Michael R Stratton
Journal:  Cell Rep       Date:  2013-01-10       Impact factor: 9.423

8.  Signatures of mutational processes in human cancer.

Authors:  Ludmil B Alexandrov; Serena Nik-Zainal; David C Wedge; Samuel A J R Aparicio; Sam Behjati; Andrew V Biankin; Graham R Bignell; Niccolò Bolli; Ake Borg; Anne-Lise Børresen-Dale; Sandrine Boyault; Birgit Burkhardt; Adam P Butler; Carlos Caldas; Helen R Davies; Christine Desmedt; Roland Eils; Jórunn Erla Eyfjörd; John A Foekens; Mel Greaves; Fumie Hosoda; Barbara Hutter; Tomislav Ilicic; Sandrine Imbeaud; Marcin Imielinski; Natalie Jäger; David T W Jones; David Jones; Stian Knappskog; Marcel Kool; Sunil R Lakhani; Carlos López-Otín; Sancha Martin; Nikhil C Munshi; Hiromi Nakamura; Paul A Northcott; Marina Pajic; Elli Papaemmanuil; Angelo Paradiso; John V Pearson; Xose S Puente; Keiran Raine; Manasa Ramakrishna; Andrea L Richardson; Julia Richter; Philip Rosenstiel; Matthias Schlesner; Ton N Schumacher; Paul N Span; Jon W Teague; Yasushi Totoki; Andrew N J Tutt; Rafael Valdés-Mas; Marit M van Buuren; Laura van 't Veer; Anne Vincent-Salomon; Nicola Waddell; Lucy R Yates; Jessica Zucman-Rossi; P Andrew Futreal; Ultan McDermott; Peter Lichter; Matthew Meyerson; Sean M Grimmond; Reiner Siebert; Elías Campo; Tatsuhiro Shibata; Stefan M Pfister; Peter J Campbell; Michael R Stratton
Journal:  Nature       Date:  2013-08-14       Impact factor: 49.962
