Ruiqing Zheng (1), Min Li (1), Xiang Chen (1), Fang-Xiang Wu (1,2), Yi Pan (1,3), Jianxin Wang (1).
(1) School of Information Science and Engineering, Central South University, Changsha, China. (2) Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, Canada. (3) Department of Computer Science, Georgia State University, Atlanta, GA, USA.
Abstract
MOTIVATION: Reconstructing gene regulatory networks (GRNs) from gene expression profiles remains a major challenge in systems biology. Random forest-based methods have proved efficient for evaluating the importance of gene regulations. Nevertheless, the accuracy of traditional methods can be further improved. With time-series gene expression data, exploiting the inherent time information and modeling high-order time lags are promising strategies to improve the power and accuracy of GRN inference. RESULTS: In this study, we propose a scalable, flexible approach called BiXGBoost to reconstruct GRNs. BiXGBoost is a bidirectional method that considers both the candidate regulatory genes and the target genes of a specific gene. Moreover, BiXGBoost utilizes time information efficiently and integrates XGBoost to evaluate feature importance. Randomization and regularization are also applied in BiXGBoost to address over-fitting. Results on the DREAM4 and Escherichia coli datasets show that BiXGBoost performs well on networks of different scales. AVAILABILITY AND IMPLEMENTATION: Our Python implementation of BiXGBoost is available at https://github.com/zrq0123/BiXGBoost. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
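To make the idea of time-lagged, bidirectional inputs concrete, the following is a minimal sketch of how training samples might be assembled from a time-series expression matrix. It is not taken from the BiXGBoost repository: the function name `lagged_samples`, the `direction` argument and the exact pairing of time points are assumptions made for illustration. In the "in" direction the other genes at time t are paired with the gene of interest at time t + lag (candidate regulators); in the "out" direction the other genes at time t + lag are paired with the gene of interest at time t (candidate targets).

```python
from typing import List, Sequence, Tuple


def lagged_samples(
    expr: Sequence[Sequence[float]],
    gene: int,
    lag: int = 1,
    direction: str = "in",
) -> Tuple[List[List[float]], List[float]]:
    """Build (X, y) pairs for one gene from time-series data.

    expr[t][g] is the expression of gene g at time point t.
    Hypothetical helper for illustration only, not the authors' API.
    """
    if direction not in ("in", "out"):
        raise ValueError("direction must be 'in' or 'out'")
    if lag < 1 or lag >= len(expr):
        raise ValueError("lag must satisfy 1 <= lag < number of time points")

    X: List[List[float]] = []
    y: List[float] = []
    for t in range(len(expr) - lag):
        early, late = expr[t], expr[t + lag]
        # "in": earlier values of other genes explain the later value of `gene`.
        # "out": later values of other genes are paired with the earlier value of `gene`.
        feature_row, response_row = (early, late) if direction == "in" else (late, early)
        X.append([v for g, v in enumerate(feature_row) if g != gene])
        y.append(response_row[gene])
    return X, y


# Toy data: 4 time points, 3 genes.
expr = [[1, 10, 100], [2, 20, 200], [3, 30, 300], [4, 40, 400]]
X_in, y_in = lagged_samples(expr, gene=0, lag=1, direction="in")
X_out, y_out = lagged_samples(expr, gene=0, lag=1, direction="out")
```

Under these assumptions, each (X, y) pair could then be passed to a gradient-boosting regressor such as XGBoost, and the resulting feature importances used to score candidate edges; that step is omitted here because its details are not given in the abstract.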