MOTIVATION: Array comparative genomic hybridization (aCGH) is a widely used technique for identifying chromosomal aberrations in human diseases, including cancer. Aberrations are defined as regions of increased or decreased DNA copy number relative to a normal sample. Accurately locating these aberrations has many important medical applications. However, the observed copy number changes are often corrupted by various sources of noise, which makes the boundaries hard to detect. One popular technique uses hidden Markov models (HMMs) to divide the signal into regions of constant copy number, called segments; a subsequent classification phase labels each segment as a gain, a loss or neutral. Standard HMMs, however, are sensitive to outliers, which causes over-segmentation: spurious segments that span only very short regions.

RESULTS: We propose a simple modification that makes the HMM robust to such outliers. More importantly, this modification allows us to exploit prior knowledge about the likely locations of "outliers", which are often due to copy number polymorphisms (CNPs). By "explaining away" these outliers with prior knowledge about the locations of CNPs, we can focus attention on the more clinically relevant aberrant regions. We show significant improvements over the current state-of-the-art technique (DNAcopy with MergeLevels) on previously published data from mantle cell lymphoma cell lines, and on published benchmark synthetic data augmented with outliers.

AVAILABILITY: Source code written in Matlab is available from http://www.cs.ubc.ca/~sshah/acgh.
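The idea of an outlier-robust HMM with location-specific outlier priors can be illustrated with a minimal Python sketch. This is not the authors' Matlab implementation, and the abstract does not specify their model: the sketch assumes three states (loss, neutral, gain) with Gaussian emissions on log2 ratios, a broad zero-centred Gaussian as the outlier component, fixed hand-picked parameters, and Viterbi decoding. All function names and parameter values (`segment`, `outlier_prior`, `stay`, `means`, `std`, `outlier_std`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_logpdf(x, mean, std):
    """Log density of a normal distribution, evaluated elementwise."""
    return -0.5 * np.log(2 * np.pi * std ** 2) - 0.5 * ((x - mean) / std) ** 2


def robust_log_emissions(x, means, std, outlier_prior, outlier_std):
    """Per-probe, per-state log-likelihood under an inlier/outlier mixture.

    outlier_prior[t] is the assumed prior probability that probe t is an
    outlier (for example, raised at known CNP locations). A value of zero
    everywhere recovers a standard Gaussian-emission HMM.
    """
    x = np.asarray(x, dtype=float)[:, None]                # shape (T, 1)
    eps = np.asarray(outlier_prior, dtype=float)[:, None]  # shape (T, 1)
    mu = np.asarray(means, dtype=float)[None, :]           # shape (1, K)
    with np.errstate(divide="ignore"):                     # allow log(0) = -inf
        log_in = np.log1p(-eps) + gaussian_logpdf(x, mu, std)
        log_out = np.log(eps) + gaussian_logpdf(x, 0.0, outlier_std)
    return np.logaddexp(log_in, log_out)                   # shape (T, K)


def viterbi(log_emis, log_trans, log_init):
    """Most probable state path of an HMM, computed in log space."""
    T, K = log_emis.shape
    score = log_init + log_emis[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans   # cand[i, j]: move from state i to j
        back[t] = np.argmax(cand, axis=0)
        score = cand.max(axis=0) + log_emis[t]
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


def segment(x, outlier_prior, stay=0.99, means=(-0.5, 0.0, 0.5),
            std=0.15, outlier_std=1.0):
    """Label each probe as 0 (loss), 1 (neutral) or 2 (gain)."""
    K = len(means)
    trans = np.full((K, K), (1.0 - stay) / (K - 1))
    np.fill_diagonal(trans, stay)
    log_emis = robust_log_emissions(x, means, std, outlier_prior, outlier_std)
    log_init = np.log(np.full(K, 1.0 / K))
    return viterbi(log_emis, np.log(trans), log_init)


if __name__ == "__main__":
    # Toy signal: neutral region with one outlying probe, then a gained region.
    signal = np.zeros(40)
    signal[20:] = 0.5
    signal[10] = 0.9

    no_prior = np.zeros(40)          # standard HMM: outliers are impossible
    cnp_prior = np.full(40, 0.01)    # small background outlier probability
    cnp_prior[10] = 0.5              # probe 10 lies in a (hypothetical) known CNP

    print("standard:", segment(signal, no_prior))
    print("robust:  ", segment(signal, cnp_prior))
```

In this toy setting, the standard model has to explain the single outlying probe by briefly switching state, which creates a one-probe segment; the mixture emission lets the model attribute that probe to the outlier component instead, and the raised prior at the assumed CNP location makes that explanation cheaper. In practice the parameters would be estimated from data rather than fixed as they are here.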