Chi Song¹, Shih-Chi Su², Zhiguang Huo³, Suleyman Vural⁴, James E Galvin⁵, Lun-Ching Chang⁶
1. Division of Biostatistics, Ohio State University, Columbus, OH 43210, USA.
2. Whole-Genome Research Core Laboratory of Human Diseases, Chang Gung Memorial Hospital, Keelung, Taiwan.
3. Department of Biostatistics, University of Florida, Gainesville, FL 32611, USA.
4. Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.
5. Comprehensive Center for Brain Health, Department of Neurology, Miller School of Medicine, University of Miami, Miami, FL 33101, USA.
6. Department of Mathematical Sciences, Florida Atlantic University, Boca Raton, FL 33431, USA.
Abstract
SUMMARY: In this article, we introduce an approach that combines hierarchical clustering with a Gaussian mixture model fitted by the expectation-maximization (EM) algorithm to detect copy number variants (CNVs) from whole exome sequencing (WES) data. We also developed an R Shiny package, "HCMMCNVs", that processes user-provided BAM files, runs the CNV detection algorithm, and visualizes the results. By applying our approach to 325 cancer cell lines spanning 22 tumor types from the Cancer Cell Line Encyclopedia (CCLE), we show that our algorithm is competitive with existing methods and that it is feasible to use multiple cancer cell lines for CNV estimation. In addition, on WES data from 120 oral squamous cell carcinoma (OSCC) samples, our algorithm, using only the tumor samples, detects CNVs with greater power than methods that use both tumors and their matched normal counterparts.
AVAILABILITY AND IMPLEMENTATION: The HCMMCNVs R Shiny software is freely available from the GitHub repository https://github.com/lunching/HCMM_CNVs and from Zenodo https://doi.org/10.5281/zenodo.4593371.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
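The abstract names the statistical core of the method (a Gaussian mixture fitted by EM) without giving its details. As a rough illustration only, the sketch below fits a generic one-dimensional Gaussian mixture by EM to one sample's per-exon log2 read-depth ratios and labels each exon as loss, neutral, or gain. Everything specific here is an assumption, not taken from the paper: three components, starting means at log2(1/2), 0 and log2(3/2) (the expected ratios for a single-copy loss or gain in a pure diploid sample), nearest-mean hard initialisation, a variance floor, and the hypothetical function names `fit_gmm_1d` and `call_states`. The hierarchical clustering step, and how HCMMCNVs combines it with the mixture model, are not shown; this is not the package's R implementation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical state names for a three-component model (an assumption).
STATE_NAMES = ("loss", "neutral", "gain")


def log_normal_pdf(x: float, mu: float, var: float) -> float:
    """Log density of N(mu, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def m_step(xs: Sequence[float], resp: List[List[float]], min_var: float):
    """Re-estimate mixture weights, means and variances from responsibilities."""
    n, k = len(xs), len(resp[0])
    weights, mus, variances = [], [], []
    for j in range(k):
        nk = sum(r[j] for r in resp) + 1e-12  # tiny guard so empty components do not divide by zero
        mu = sum(r[j] * x for r, x in zip(resp, xs)) / nk
        var = sum(r[j] * (x - mu) ** 2 for r, x in zip(resp, xs)) / nk
        weights.append(nk / n)
        mus.append(mu)
        variances.append(max(var, min_var))  # floor stops a component collapsing onto one point
    return weights, mus, variances


def fit_gmm_1d(xs: Sequence[float], init_means: Sequence[float], n_iter: int = 200,
               tol: float = 1e-9, min_var: float = 1e-4
               ) -> Tuple[List[float], List[float], List[float], List[int]]:
    """Fit a 1-D Gaussian mixture by EM; return (weights, means, variances, labels)."""
    k = len(init_means)
    # Initialise by hard-assigning each value to its nearest starting mean.
    resp = []
    for x in xs:
        nearest = min(range(k), key=lambda c: abs(x - init_means[c]))
        resp.append([1.0 if c == nearest else 0.0 for c in range(k)])
    weights, mus, variances = m_step(xs, resp, min_var)

    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: posterior component probabilities, computed with log-sum-exp for stability.
        resp, ll = [], 0.0
        for x in xs:
            logs = [math.log(w) + log_normal_pdf(x, m, v)
                    for w, m, v in zip(weights, mus, variances)]
            top = max(logs)
            lse = top + math.log(sum(math.exp(lp - top) for lp in logs))
            resp.append([math.exp(lp - lse) for lp in logs])
            ll += lse
        # M-step: update parameters from the new responsibilities.
        weights, mus, variances = m_step(xs, resp, min_var)
        if ll - prev_ll < tol:  # stop once the log-likelihood no longer improves
            break
        prev_ll = ll

    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return weights, mus, variances, labels


def call_states(log2_ratios: Sequence[float]) -> List[str]:
    """Label each value loss/neutral/gain by ranking the fitted component means."""
    init = [math.log2(0.5), 0.0, math.log2(1.5)]  # assumed pure diploid expectations
    _, mus, _, labels = fit_gmm_1d(log2_ratios, init)
    order = sorted(range(len(mus)), key=lambda c: mus[c])  # lowest mean first
    rank = {comp: i for i, comp in enumerate(order)}
    return [STATE_NAMES[rank[lab]] for lab in labels]


if __name__ == "__main__":
    # Synthetic log2 ratios for illustration only (not data from the paper).
    demo = [-1.0, -0.95, -1.05, 0.0, 0.02, -0.03, 0.58, 0.6, 0.62]
    print(call_states(demo))
```

Ranking components by their fitted means, rather than trusting the initial order, keeps the state labels sensible even if EM moves the components; a real caller would also need segmentation across neighbouring exons and a way to handle purity and ploidy, which this sketch leaves out.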