Roger Pique-Regi¹, Antonio Ortega, Shahab Asgharzadeh.
¹ Signal and Image Processing Institute, Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, EEB 400, 3740 McClintock Ave, Los Angeles, CA 90089-2564, USA. rpique@ieee.org
Abstract
MOTIVATION: The complexity of a large number of recently discovered copy number polymorphisms is much higher than initially thought, which makes them more difficult to detect in the presence of significant measurement noise. In this scenario, performing normalization and segmentation separately is prone to producing many false detections of changes in copy number. New approaches capable of jointly modeling the copy number and the non-copy number (noise) hybridization effects across multiple samples will potentially lead to more accurate results.
METHODS: In this article, the genome alteration detection analysis (GADA) approach introduced in our previous work is extended to a multiple sample model. The copy number component is independent for each sample and uses a sparse Bayesian prior, while the reference hybridization level is not necessarily sparse but is identical across all samples. The expectation maximization (EM) algorithm used to fit the model iteratively determines whether the observed hybridization levels are more likely due to a copy number variation or to a shared hybridization bias.
RESULTS: The proposed approach is compared with the currently used strategy of separate normalization followed by independent segmentation of each array. Real microarray data obtained from HapMap samples are randomly partitioned to create different reference sets. Using the new approach, copy number and reference intensity estimates are significantly less variable when the reference set changes, and copy numbers detected within HapMap family trios are more consistent. Finally, the running time to fit the model grows linearly in the number of samples and probes.
AVAILABILITY: http://biron.usc.edu/~piquereg/GADA.
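The decomposition described under METHODS (a reference hybridization level shared by all samples plus a sparse, per-sample copy number component) can be illustrated with a toy sketch. This is not the GADA implementation and does not use its sparse Bayesian prior or EM updates: as a stand-in, it alternates a per-probe median across samples with soft-thresholding of each sample's residual. The function names, the `threshold` parameter, the `y[sample][probe]` layout, and the noise-free synthetic data are all assumptions made for illustration.

```python
from statistics import median


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything with |value| <= threshold becomes 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_joint(y, threshold, n_iter=20):
    """Toy joint fit of y[n][m] ~ reference[m] + cn[n][m].

    y[n][m] is the intensity of sample n at probe m (hypothetical layout).
    Returns (reference, cn). This is a simplified stand-in, not the EM
    algorithm of the paper.
    """
    n_samples, n_probes = len(y), len(y[0])
    cn = [[0.0] * n_probes for _ in range(n_samples)]
    reference = [0.0] * n_probes
    for _ in range(n_iter):
        # Shared reference: per-probe median of what copy number does not explain.
        for m in range(n_probes):
            reference[m] = median(y[n][m] - cn[n][m] for n in range(n_samples))
        # Per-sample component: keep only residuals too large to be shared bias.
        for n in range(n_samples):
            for m in range(n_probes):
                cn[n][m] = soft_threshold(y[n][m] - reference[m], threshold)
    return reference, cn


# Noise-free synthetic example: 5 samples share a probe bias; sample 2 has a
# gain of 1.0 on probes 2 to 4.
bias = [0.3, -0.2, 0.1, 0.0, 0.4, -0.1]
y = [list(bias) for _ in range(5)]
for m in (2, 3, 4):
    y[2][m] += 1.0

reference, cn = fit_joint(y, threshold=0.5)
detected = sorted(
    (n, m) for n in range(len(y)) for m in range(len(bias)) if cn[n][m] != 0.0
)
print(detected)  # [(2, 2), (2, 3), (2, 4)]
```

In this toy setting the shared bias is absorbed by `reference` and only the gain in sample 2 survives in `cn`, shrunk by the threshold. The per-probe median is used here because it is robust to a minority of samples carrying a variant; the abstract's method instead weighs the two explanations probabilistically.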