Zilu Zhou (1), Weixin Wang (2), Li-San Wang (2), Nancy Ruonan Zhang (3).
Affiliations: (1) Graduate Group in Genomics and Computational Biology; (2) Department of Pathology and Laboratory Medicine, Perelman School of Medicine; (3) Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA.
Abstract
Motivation: Copy number variations (CNVs) are gains and losses of DNA segments and have been associated with disease. Many large-scale genetic association studies now perform CNV analysis using whole exome sequencing (WES) and whole genome sequencing (WGS). In many of these studies, previously collected single-nucleotide polymorphism (SNP)-array data are also available. An integrated cross-platform analysis is expected to improve resolution and accuracy, yet there is no tool for effectively combining data from sequencing and array platforms. CNV detection from sequencing data alone can also be improved by using allele-specific reads.
Results: We propose a statistical framework, the integrated CNV (iCNV) detection algorithm, which can be applied to multiple study designs: WES only, WGS only, SNP array only, or any combination of SNP and sequencing data. iCNV applies platform-specific normalization, uses allele-specific reads from sequencing, and integrates matched next-generation sequencing (NGS) and SNP-array data through a hidden Markov model. We compare integrated two-platform CNV detection using iCNV to a naïve intersection or union of the per-platform calls and show that iCNV increases sensitivity and robustness. We also assess the accuracy of iCNV on WGS data alone and show that using allele-specific reads improves CNV detection accuracy compared to existing methods.
Availability and implementation: https://github.com/zhouzilu/iCNV.
Supplementary information: Supplementary data are available at Bioinformatics online.
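To make the idea of integrating two platforms in one hidden Markov model concrete, the sketch below runs Viterbi decoding over three copy-number states, with an emission score that sums the log-likelihoods from whichever platforms have data at a position. It is a toy illustration under stated assumptions, not the iCNV model: the state means, standard deviations, the `stay` probability, and the function names `emission` and `viterbi` are all hypothetical, and the allele-specific component is omitted.

```python
import math

STATES = ("del", "normal", "dup")

# Hypothetical per-state signal means and noise levels (not taken from iCNV).
SEQ_MEAN = {"del": -2.0, "normal": 0.0, "dup": 2.0}    # normalized coverage score
SEQ_SD = 1.0
ARRAY_MEAN = {"del": -0.5, "normal": 0.0, "dup": 0.3}  # array log R ratio
ARRAY_SD = 0.2


def gaussian_loglik(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def emission(seq_x, array_x):
    """Per-state log-likelihood; a platform with no data (None) adds nothing."""
    scores = []
    for s in STATES:
        ll = 0.0
        if seq_x is not None:
            ll += gaussian_loglik(seq_x, SEQ_MEAN[s], SEQ_SD)
        if array_x is not None:
            ll += gaussian_loglik(array_x, ARRAY_MEAN[s], ARRAY_SD)
        scores.append(ll)
    return scores


def viterbi(seq_obs, array_obs, stay=0.98):
    """Most likely state path given matched observations from both platforms."""
    k = len(STATES)
    log_stay = math.log(stay)
    log_move = math.log((1 - stay) / (k - 1))
    # Uniform initial distribution over states.
    score = [math.log(1 / k) + e for e in emission(seq_obs[0], array_obs[0])]
    back = []
    for t in range(1, len(seq_obs)):
        emit = emission(seq_obs[t], array_obs[t])
        new_score, ptr = [], []
        for j in range(k):
            cands = [score[i] + (log_stay if i == j else log_move) for i in range(k)]
            best = max(range(k), key=cands.__getitem__)
            ptr.append(best)
            new_score.append(cands[best] + emit[j])
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(k), key=score.__getitem__)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [STATES[i] for i in path]


# Made-up example: a three-position deletion, with array probes at some positions only.
seq_obs = [0.1, -0.2, -2.1, -1.9, -2.2, 0.0, 0.2]
array_obs = [0.0, None, -0.5, -0.6, None, 0.05, 0.0]
calls_both = viterbi(seq_obs, array_obs)
calls_seq_only = viterbi(seq_obs, [None] * len(seq_obs))
```

With these made-up numbers, the sequencing signal alone is too weak to justify leaving the normal state, while adding the array evidence lets the joint model call the deletion. This is one way a joint model can be more sensitive than either platform alone.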