MOTIVATION: DNA copy number aberration (CNA) is a hallmark of genomic abnormality in tumor cells. Recurrent CNA (RCNA) occurs in multiple cancer samples across the same chromosomal region and has greater implications for tumorigenesis. Commonly used methods for RCNA identification require CNA calling for individual samples before cross-sample analysis. This two-step strategy may impose a heavy computational burden and may lose overall statistical power because each sample's data are segmented and discretized. We propose a population-based approach for RCNA detection that requires no single-sample analysis and is statistically powerful, computationally efficient and particularly suitable for high-resolution and large-population studies.

RESULTS: Our approach, correlation matrix diagonal segmentation (CMDS), identifies RCNAs based on a between-chromosomal-site correlation analysis. By directly using the raw intensity ratio data from all samples and adopting a diagonal transformation strategy, CMDS substantially reduces the computational burden and returns results quickly on large datasets. Our simulation indicates that the statistical power of CMDS is higher than that of two-step approaches based on single-sample CNA calling. We applied CMDS to two real datasets, one of lung cancer from the Affymetrix array platform and one of brain cancer from the Illumina array platform, and successfully identified known regions of CNA associated with EGFR, KRAS and other important oncogenes. CMDS provides a fast, powerful and easily implemented tool for the RCNA analysis of large-scale data from cancer genomes.
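To illustrate the general idea of scoring sites by between-site correlation near the diagonal of the correlation matrix, the following Python sketch may help. It is not the published CMDS implementation: the function name `diagonal_band_scores`, the `half_width` parameter, the choice of a simple mean over the band, and the simulated data are all assumptions made for illustration. The sketch only shows why restricting attention to a diagonal band avoids building the full site-by-site matrix, and why sites inside a recurrent region tend to correlate with their neighbours across samples.

```python
import numpy as np


def diagonal_band_scores(ratios, half_width=5):
    """Mean Pearson correlation of each site with its neighbours.

    ratios: array of shape (n_samples, n_sites) holding raw intensity
    ratios (hypothetical input layout). Only pairs of sites at most
    `half_width` positions apart are correlated, i.e. a band around the
    diagonal of the correlation matrix, so the full matrix is never built.
    """
    x = np.asarray(ratios, dtype=float)
    n_sites = x.shape[1]

    # Standardize each site across samples so that the mean of a product
    # of two columns equals their Pearson correlation.
    centered = x - x.mean(axis=0)
    std = centered.std(axis=0)
    std[std == 0] = 1.0  # constant sites contribute zero correlation
    z = centered / std

    total = np.zeros(n_sites)
    count = np.zeros(n_sites)
    for lag in range(1, min(half_width, n_sites - 1) + 1):
        # r[i] is the correlation between site i and site i + lag.
        r = (z[:, :-lag] * z[:, lag:]).mean(axis=0)
        total[:-lag] += r
        total[lag:] += r
        count[:-lag] += 1
        count[lag:] += 1
    return total / np.maximum(count, 1)


# Simulated data (assumed, not from the paper): 60 samples, 200 sites,
# with a gain spanning sites 80-99 shared by the first 30 samples.
rng = np.random.default_rng(0)
data = rng.normal(0.0, 0.2, size=(60, 200))
data[:30, 80:100] += 1.0

scores = diagonal_band_scores(data, half_width=5)
inside = scores[85:95].mean()
outside = scores[:60].mean()
print(f"mean score inside region: {inside:.2f}, outside: {outside:.2f}")
```

In this toy setting the scores are high inside the shared region and near zero elsewhere, because a gain carried by only some samples makes neighbouring sites vary together across the population. How the real method transforms and segments the diagonal is not specified in the abstract, so a segmentation step is deliberately left out here.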