E S Venkatraman (1), Adam B Olshen. (1) Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY 10021, USA. venkatre@mskcc.org
Abstract
MOTIVATION: Array CGH technologies enable the simultaneous measurement of DNA copy number for thousands of sites on a genome. We developed the circular binary segmentation (CBS) algorithm to divide the genome into regions of equal copy number. The algorithm tests for change-points using a maximal t-statistic with a permutation reference distribution to obtain the corresponding P-value. The number of computations required for the maximal test statistic is O(N^2), where N is the number of markers. This makes the full permutation approach computationally prohibitive for the newer arrays that contain tens of thousands of markers and highlights the need for a faster algorithm.
RESULTS: We present a hybrid approach to obtain the P-value of the test statistic in linear time. We also introduce a rule for stopping early when there is strong evidence for the presence of a change. We show through simulations that the hybrid approach provides a substantial gain in speed with only a negligible loss in accuracy and that the stopping rule further increases speed. We also present the analyses of array CGH data from breast cancer cell lines to show the impact of the new approaches on the analysis of real data.
AVAILABILITY: An R version of the CBS algorithm has been implemented in the "DNAcopy" package of the Bioconductor project. The proposed hybrid method for the P-value is available in version 1.2.1 or higher and the stopping rule for declaring a change early is available in version 1.5.1 or higher.
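To make the quadratic cost concrete, the following is a minimal Python sketch of the full permutation test described under MOTIVATION. It is not the DNAcopy implementation and does not include the hybrid P-value or the early stopping rule. The function names `max_arc_statistic` and `permutation_p_value` are hypothetical, and for brevity the statistic omits the variance estimate that a t-statistic would include, which is an assumption of this sketch.

```python
import math
import random
from itertools import accumulate


def max_arc_statistic(x):
    """Largest |Z| over all arcs x[i:j] compared with the rest of the data.

    Simplified (assumed) form: difference of arc mean and complement mean,
    scaled by sqrt(1/k + 1/(n - k)); no variance estimate is applied.
    """
    n = len(x)
    s = [0.0] + list(accumulate(x))  # s[k] = x[0] + ... + x[k-1]
    best = 0.0
    for i in range(n):               # two nested loops: O(N^2) arcs
        for j in range(i + 1, n + 1):
            k = j - i                # arc length
            if k == n:               # complement would be empty
                continue
            arc = s[j] - s[i]
            diff = arc / k - (s[n] - arc) / (n - k)
            z = diff / math.sqrt(1.0 / k + 1.0 / (n - k))
            best = max(best, abs(z))
    return best


def permutation_p_value(x, n_perm=200, seed=0):
    """Permutation P-value: each permutation costs another O(N^2) scan."""
    rng = random.Random(seed)
    observed = max_arc_statistic(x)
    shuffled = list(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_arc_statistic(shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

With `n_perm` permutations the total work grows as `n_perm * N^2`, which is the cost the abstract describes as prohibitive for arrays with tens of thousands of markers.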