Galen F Gao1, Coyin Oh1,2,3, Gordon Saksena1, Davy Deng1,3,4, Lindsay C Westlake1, Barbara A Hill1, Michael Reich1,5, Steven E Schumacher1,3, Ashton C Berger1,3, Scott L Carter1,2,3, Andrew D Cherniack1, Matthew Meyerson1,3,6, Barbara Tabak1,3, Rameen Beroukhim1,3,7, Gad Getz1,8,9.
1. Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
2. Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, MA, USA.
3. Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA.
4. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA.
5. Department of Medicine, Division of Medical Genetics, University of California, San Diego, La Jolla, CA, USA.
6. Department of Genetics, Harvard Medical School, Boston, MA, USA.
7. Department of Medicine, Harvard Medical School, Boston, MA, USA.
8. Department of Pathology, Harvard Medical School, Boston, MA, USA.
9. Department of Pathology, Massachusetts General Hospital, Boston, MA, USA.
Abstract
MOTIVATION: Somatic copy-number alterations (SCNAs) play an important role in cancer development. Systematic noise in sequencing and array data presents a significant challenge to the inference of SCNAs for cancer genome analyses. As part of The Cancer Genome Atlas, the Broad Institute Genome Characterization Center developed the Tangent normalization method to generate copy-number profiles using data from single-nucleotide polymorphism (SNP) arrays and whole-exome sequencing (WES) technologies for over 10 000 pairs of tumors and matched normal samples. Here, we describe the Tangent method, which uses a unique linear combination of normal samples as a reference for each tumor sample to subtract systematic errors that vary across samples. We also describe a modification of Tangent, called Pseudo-Tangent, which enables denoising through comparisons between tumor profiles when few normal samples are available.

RESULTS: Tangent normalization substantially increases signal-to-noise ratios (SNRs) compared to conventional normalization methods in both SNP array and WES analyses. Tangent and Pseudo-Tangent normalizations improve the SNR by reducing noise with minimal effect on signal, and their contribution exceeds that of other steps in the analysis, such as the choice of segmentation algorithm. Tangent and Pseudo-Tangent are broadly applicable and enable more accurate inference of SCNAs from DNA sequencing and array data.

AVAILABILITY AND IMPLEMENTATION: Tangent is available at https://github.com/broadinstitute/tangent and as a Docker image (https://hub.docker.com/r/broadinstitute/tangent). Tangent is also the normalization method for the copy-number pipeline in Genome Analysis Toolkit 4 (GATK4).

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
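The idea of using a per-tumor linear combination of normal samples as a reference can be illustrated with a minimal sketch. It assumes that the combination is chosen by least squares, that is, by projecting the tumor profile onto the space spanned by the normal profiles and keeping the residual. The abstract does not state this choice, and the function names (`orthonormal_basis`, `tangent_like_normalize`) and the toy data are hypothetical; this is not the released Tangent code.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: orthonormal vectors spanning the same space as `vectors`.

    Vectors that are (numerically) linear combinations of earlier ones
    are skipped.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def tangent_like_normalize(tumor, normals):
    """Subtract from `tumor` its least-squares fit by the `normals`.

    Assumption (not from the abstract): the reference is the orthogonal
    projection of the tumor profile onto the span of the normal profiles.
    All profiles are lists of per-marker log-scale coverage values.
    """
    reference = [0.0] * len(tumor)
    for b in orthonormal_basis(normals):
        c = dot(tumor, b)
        reference = [r + c * bi for r, bi in zip(reference, b)]
    return [t - r for t, r in zip(tumor, reference)]


# Toy data (hypothetical): two systematic noise patterns seen in normals,
# and a tumor whose true signal is a gain over the last two markers.
normal_1 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
normal_2 = [1.0, 1.0, -1.0, -1.0, 0.0, 0.0]
signal = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]
tumor = [s + 0.3 * a - 0.2 * b for s, a, b in zip(signal, normal_1, normal_2)]

denoised = tangent_like_normalize(tumor, [normal_1, normal_2])
```

In this toy case the signal happens to be orthogonal to both noise patterns, so the residual recovers it almost exactly. With real data the projection can also remove part of the true signal, which is why the effect on signal is reported separately from the reduction in noise.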