Ziqiao Wang (1, 2), Peng Wei (1)
1. Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA.
2. Quantitative Sciences Program, The University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX 77030, USA.
Abstract
MOTIVATION: Integrative genomic analysis is a powerful tool for studying the biological mechanisms underlying a complex disease or trait using multiplatform high-dimensional data, such as DNA methylation, copy number variation and gene expression. It is common to perform a large-scale genome-wide association analysis of an outcome for each data type separately and then combine the results ad hoc, which leads to a loss of statistical power and an uncontrolled overall false discovery rate (FDR).
RESULTS: We propose a multivariate mixture model framework (IMIX) that integrates multiple types of genomic data and allows modeling of inter-data-type correlations. We investigated across-data-type FDR control in IMIX and, through extensive simulations, demonstrated lower misclassification rates at a controlled overall FDR than established strategies that analyze each data type individually, such as Benjamini-Hochberg FDR control, the q-value and local FDR control. IMIX features statistically principled model selection, FDR control and computational efficiency. Applications to The Cancer Genome Atlas data provided novel multi-omics insights into the genes and mechanisms associated with the luminal and basal subtypes of bladder cancer and with the prognosis of pancreatic cancer.
AVAILABILITY AND IMPLEMENTATION: We have implemented our method in the R package 'IMIX', available at https://github.com/ziqiaow/IMIX and soon on CRAN.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
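The abstract describes IMIX only at a high level. As a rough illustration of how a mixture model can yield a single across-data-type FDR instead of one FDR per data type, here is a minimal Python sketch. It is not the code of the R package 'IMIX', and several details are assumptions: p-values are converted to z-scores with z = Φ^{-1}(1 − p); each gene belongs to one of 2^d components, one per null/non-null pattern across d data types; the null density is N(0, 1); and, unlike IMIX, data types are treated as independent within a component (no inter-data-type correlation). All function names are hypothetical.

```python
from itertools import product
from math import prod
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1): assumed null density of a z-score


def pvals_to_z(pvals, eps=1e-12):
    """Map one-sided p-values to z-scores via z = Phi^{-1}(1 - p)."""
    # Clip so that p = 0 or p = 1 does not produce an infinite z-score.
    return [STD_NORMAL.inv_cdf(1 - min(max(p, eps), 1 - eps)) for p in pvals]


def fit_mixture(z, n_iter=200):
    """EM for a 2^d-component mixture over d data types (illustrative only).

    z: one row per gene, each row holding d z-scores (one per data type).
    A component is a tuple of 0/1 flags: flag 0 means the null N(0, 1) for
    that data type, flag 1 means a non-null N(mu_k, sd_k^2). Data types are
    treated as independent within a component, so inter-data-type
    correlation is ignored in this sketch.
    """
    n, d = len(z), len(z[0])
    comps = list(product((0, 1), repeat=d))
    pi = {c: 1 / len(comps) for c in comps}  # mixing proportions
    mu, sd = [2.0] * d, [1.0] * d              # non-null parameters per type
    for _ in range(n_iter):
        alt = [NormalDist(mu[k], sd[k]) for k in range(d)]
        # E-step: posterior probability of every component for every gene.
        post = []
        for zi in z:
            w = {
                c: pi[c] * prod(
                    alt[k].pdf(zi[k]) if c[k] else STD_NORMAL.pdf(zi[k])
                    for k in range(d)
                )
                for c in comps
            }
            total = sum(w.values())
            post.append({c: w[c] / total for c in comps})
        # M-step: update mixing proportions, then non-null mean/sd per type.
        pi = {c: sum(p[c] for p in post) / n for c in comps}
        for k in range(d):
            # r[i] = posterior probability that gene i is non-null in type k.
            r = [sum(p[c] for c in comps if c[k]) for p in post]
            rsum = max(sum(r), 1e-12)
            mu[k] = sum(ri * zi[k] for ri, zi in zip(r, z)) / rsum
            var = sum(ri * (zi[k] - mu[k]) ** 2 for ri, zi in zip(r, z)) / rsum
            sd[k] = max(var ** 0.5, 1e-3)
    return post, pi, mu, sd


def select_at_fdr(post, alpha=0.05):
    """Select genes associated in at least one data type at overall FDR alpha.

    The local FDR of a gene is its posterior probability of the all-null
    component. Genes are ranked by local FDR, and the largest top-ranked set
    whose mean local FDR (an estimate of its FDR) is <= alpha is returned.
    """
    d = len(next(iter(post[0])))
    all_null = (0,) * d
    lfdr = [p[all_null] for p in post]
    order = sorted(range(len(lfdr)), key=lfdr.__getitem__)
    selected, running = [], 0.0
    for rank, i in enumerate(order, start=1):
        running += lfdr[i]
        # Ascending lfdr makes the running mean non-decreasing, so stop early.
        if running / rank > alpha:
            break
        selected.append(i)
    return sorted(selected), lfdr
```

In this sketch, `z` would be built by applying `pvals_to_z` to each data type's association p-values and aligning the rows by gene. Because every gene is ranked by a single posterior (the all-null component), the selected set carries one FDR across all data types, in contrast to running Benjamini-Hochberg or q-value procedures on each data type separately.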