Juan Xie1, Anjun Ma1, Yu Zhang2, Bingqiang Liu3, Sha Cao4, Cankun Wang1, Jennifer Xu1,5, Chi Zhang6, Qin Ma1.
1. Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA.
2. College of Computer Science and Technology, Jilin University, Changchun 130012, China.
3. School of Mathematics, Shandong University, Jinan 250100, China.
4. Department of Biostatistics, Indiana University, School of Medicine, Indianapolis, IN 46202, USA.
5. Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
6. Department of Medical & Molecular Genetics, Indiana University, School of Medicine, Indianapolis, IN 46202, USA.
Abstract
MOTIVATION: Biclustering of large-scale gene expression data holds promise for detecting condition-specific functional gene modules (i.e. biclusters). However, existing methods do not comprehensively detect all significant bicluster structures, and they have limited power when applied to expression data generated by RNA-Sequencing (RNA-Seq), especially single-cell RNA-Seq (scRNA-Seq) data, in which a large proportion of values are zero or very low.
RESULTS: We present a new biclustering algorithm, QUalitative BIClustering algorithm Version 2 (QUBIC2), which is empowered by: (i) a novel left-truncated mixture of Gaussians model for accurate assessment of multimodality in zero-enriched expression data, (ii) a fast and efficient dropout-saving expansion strategy that optimizes functional gene modules using information divergence and (iii) a rigorous statistical test for the significance of all identified biclusters in any organism, including those without substantial functional annotations. QUBIC2 demonstrated considerably improved performance in detecting biclusters compared with five other widely used algorithms on benchmark datasets from E. coli, human and simulated data. QUBIC2 also showed robust and superior performance on gene expression data generated by microarray, bulk RNA-Seq and scRNA-Seq.
AVAILABILITY AND IMPLEMENTATION: The source code of QUBIC2 is freely available at https://github.com/OSU-BMBL/QUBIC2.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
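As an illustration of the left-truncated mixture idea named in item (i), the following minimal sketch evaluates the log-likelihood of a Gaussian mixture in which values at or below a detection cutoff are treated as censored rather than as exact measurements. It is not taken from the QUBIC2 source code: the function names, the `cutoff` parameter and the choice to model low values through the mixture's cumulative probability at the cutoff are all assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution up to x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def left_censored_mixture_loglik(values, cutoff, weights, means, sds):
    """Log-likelihood of a Gaussian mixture with left-censored observations.

    Hypothetical helper (not part of QUBIC2): values above `cutoff`
    contribute the mixture density; values at or below `cutoff` (zeros and
    very low expression) contribute the mixture probability of falling at
    or below the cutoff.
    """
    total = 0.0
    for x in values:
        if x > cutoff:
            lik = sum(w * normal_pdf(x, m, s)
                      for w, m, s in zip(weights, means, sds))
        else:
            lik = sum(w * normal_cdf(cutoff, m, s)
                      for w, m, s in zip(weights, means, sds))
        total += math.log(lik)
    return total


# Illustrative log-scale expression values for one gene (made-up numbers).
log_expr = [-3.0, -3.0, 0.4, 2.1, 2.6, -3.0, 2.3]
ll = left_censored_mixture_loglik(
    log_expr, cutoff=-1.0,
    weights=[0.5, 0.5], means=[-2.0, 2.3], sds=[1.0, 0.5],
)
```

In a full model this quantity would be maximized over the weights, means and standard deviations (for example with an EM procedure), and the number of components would be chosen by a model-selection criterion; both steps are omitted here.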
Authors: Anjun Ma; Cankun Wang; Yuzhou Chang; Faith H Brennan; Adam McDermaid; Bingqiang Liu; Chi Zhang; Phillip G Popovich; Qin Ma Journal: Nucleic Acids Res Date: 2020-07-02 Impact factor: 16.971
Authors: Zhe Wang; Shiyi Yang; Yusuke Koga; Sean E Corbett; Conor V Shea; W Evan Johnson; Masanao Yajima; Joshua D Campbell Journal: NAR Genom Bioinform Date: 2022-09-13