Hongjie Ke (1), Zhao Ren (2), Jianfei Qi (3), Shuo Chen (4), George C Tseng (5), Zhenyao Ye (4), Tianzhou Ma (1). 1. Department of Epidemiology and Biostatistics, University of Maryland, College Park, 20742, USA. 2. Department of Statistics, University of Pittsburgh, Pittsburgh, 15260, USA. 3. Department of Biochemistry and Molecular Biology, University of Maryland, Baltimore, 21201, USA. 4. Department of Epidemiology & Public Health, University of Maryland, Baltimore, 21201, USA. 5. Department of Biostatistics, University of Pittsburgh, Pittsburgh, 15260, USA.
Abstract
MOTIVATION: Advances in high-throughput technology have made it possible to characterize a wide variety of epigenetic modifications and noncoding RNAs across the genome that are involved in disease pathogenesis via the regulation of gene expression. The high dimensionality of both epigenetic/noncoding RNA data and gene expression data makes it challenging to identify the important regulators of genes. Conducting a univariate test for each possible regulator-gene pair incurs a serious multiple-comparison burden, and direct application of regularization methods to select regulator-gene pairs is computationally infeasible. Applying fast screening to reduce the dimension before regularization is more efficient and stable than applying regularization methods alone. RESULTS: We propose a novel screening method based on robust partial correlation to detect epigenetic and noncoding RNA regulators of gene expression over the whole genome, a problem that includes both high-dimensional predictors and high-dimensional responses. Compared to existing screening methods, our method is conceptually innovative in that it reduces the dimension of both predictors and responses, and screens at both the node (regulators or genes) and edge (regulator-gene pairs) levels. We develop data-driven procedures to determine the conditional sets and the optimal screening threshold, and implement a fast iterative algorithm. Simulations and applications to long non-coding RNA and microRNA regulation in kidney cancer and to DNA methylation regulation in glioblastoma multiforme illustrate the validity and advantages of our method. AVAILABILITY: The R package, related source code and real data sets used in this paper are provided at https://github.com/kehongjie/rPCor. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.