Lun Hu (1,2), Jun Zhang (2), Xiangyu Pan (2), Hong Yan (3), Zhu-Hong You (1). (1) Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China. (2) School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China. (3) Department of Electrical Engineering, City University of Hong Kong, Hong Kong 999077, China.
Abstract
MOTIVATION: Clustering analysis of a biological network groups biological entities into functional modules, thus providing valuable insight into complex biological systems. Existing clustering techniques make use of lower-order connectivity patterns at the level of individual biological entities and their connections, but few of them can take into account higher-order connectivity patterns at the level of small network motifs.
RESULTS: Here, we present a novel clustering framework, namely HiSCF, to identify functional modules based on the higher-order structure information available in a biological network. Taking advantage of a higher-order Markov stochastic process, HiSCF is able to perform the clustering analysis by exploiting a variety of network motifs. When compared with several state-of-the-art clustering models, HiSCF yields the best accuracy in two practical clustering applications, i.e. protein complex identification and gene co-expression module detection. The promising performance of HiSCF demonstrates that considering higher-order network motifs yields new insight into the analysis of biological networks, such as the identification of overlapping protein complexes and the inference of new signaling pathways, and also reveals the rich higher-order organizational structures present in biological networks.
AVAILABILITY AND IMPLEMENTATION: HiSCF is available at https://github.com/allenv5/HiSCF.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.