Feifei Xiao (1), Xizhi Luo (1), Ning Hao (2), Yue S Niu (2), Xiangjun Xiao (3), Guoshuai Cai (4), Christopher I Amos (3), Heping Zhang (5)
(1) Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, SC, USA.
(2) Department of Mathematics, University of Arizona, Tucson, AZ, USA.
(3) Department of Quantitative Sciences, Baylor College of Medicine, Houston, TX, USA.
(4) Department of Environmental Health Science, University of South Carolina, Columbia, SC, USA.
(5) Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA.
Abstract
MOTIVATION: Integrating multiple genetic sources for copy number variation (CNV) detection is a powerful approach to improve the identification of variants associated with complex traits. Although the widely used change-point-based methods have been shown to increase statistical power to identify variants, it remains challenging to detect CNVs with weak signals because genotyping intensity data are noisy. We previously developed modSaRa, a normal mean-based model built on a screening and ranking algorithm for CNV identification, which showed desirable sensitivity with high computational efficiency. To boost statistical power for variant identification, we present here an improved method, modSaRa2, which integrates the relative allelic intensity and external information from empirical statistics into the modeling.
RESULTS: Simulation studies showed that modSaRa2 markedly improved both sensitivity and specificity over existing methods for analyzing array-based data. The improvement is most substantial for the detection of weak CNV signals, and the method is also more stable when CNV size varies. Applying the new method to a whole-genome melanoma dataset identified novel candidate melanoma risk-associated deletions on chromosome band 1p22.2 and duplications in the 6p22, 6q25 and 19p13 regions, which may facilitate understanding of the possible roles of germline copy number variants in the etiology of melanoma.
AVAILABILITY AND IMPLEMENTATION: http://c2s2.yale.edu/software/modSaRa2 or https://github.com/FeifeiXiaoUSC/modSaRa2.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
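The general screening and ranking idea that the abstract refers to can be illustrated with a minimal sketch. This is not the modSaRa or modSaRa2 implementation: the function names, the window size `h`, and the threshold are hypothetical choices made for illustration, the input is assumed to be a single intensity sequence with piecewise-constant mean, and the integration of relative allelic intensity and empirical external information described above is omitted entirely.

```python
import numpy as np


def local_diagnostic(y, h):
    """Difference between right- and left-window means at each position.

    D[i] = mean(y[i:i+h]) - mean(y[i-h:i]) for h <= i <= n - h, and 0 elsewhere.
    A large |D[i]| suggests a jump in the mean at position i.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    c = np.concatenate(([0.0], np.cumsum(y)))  # prefix sums, c[k] = sum(y[:k])
    d = np.zeros(n)
    idx = np.arange(h, n - h + 1)
    right = (c[idx + h] - c[idx]) / h
    left = (c[idx] - c[idx - h]) / h
    d[idx] = right - left
    return d


def screen_change_points(y, h, threshold):
    """Keep positions that maximize |D| within +/- h and exceed the threshold.

    Assumption: a fixed, user-chosen threshold stands in for any
    data-driven ranking or model selection step.
    """
    d = np.abs(local_diagnostic(y, h))
    n = len(d)
    candidates = []
    for i in range(h, n - h + 1):
        lo, hi = max(0, i - h), min(n, i + h + 1)
        if d[i] >= threshold and d[i] == d[lo:hi].max():
            candidates.append(i)
    return candidates


# Noise-free toy sequence: baseline 0 with a simulated deletion on [40, 60).
signal = np.zeros(100)
signal[40:60] = -1.0
boundaries = screen_change_points(signal, h=10, threshold=0.5)
```

In this toy case the two retained positions mark the start and end of the simulated segment. With noisy data, the choice of `h` trades off sensitivity to short segments against robustness to noise, which is one reason weak signals are hard to detect from intensities alone.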