
Reconstructing gene regulatory networks in single-cell transcriptomic data analysis.

Hao Dai, Qi-Qi Jin, Lin Li, Luo-Nan Chen.

Abstract

Gene regulatory networks play pivotal roles in our understanding of biological processes and mechanisms at the molecular level. Many studies have constructed sample-specific or cell-type-specific gene regulatory networks from single-cell transcriptomic data, which provide a large number of cell samples. Here, we review the state-of-the-art computational algorithms and describe various applications of gene regulatory networks in biological studies.

Keywords:  Cell-specific network; Cell-type-specific network; Computational algorithm; Gene regulatory network; Sample-specific network; Single-cell RNA sequencing

Year:  2020        PMID: 33124218      PMCID: PMC7671911          DOI: 10.24272/j.issn.2095-8137.2020.215

Source DB:  PubMed          Journal:  Zool Res        ISSN: 2095-8137


INTRODUCTION

Single-cell RNA sequencing (scRNA-seq) technology has made it possible to measure and compare transcriptomic profiles at single-cell resolution (Eberwine et al., 2014; Stegle et al., 2015). Based on scRNA-seq data, new cell types with distinct functions can be identified and cellular lineages during differentiation can be traced (Rozenblatt-Rosen et al., 2017; Villani et al., 2017). Many studies have focused on developing accurate and robust computational methods for scRNA-seq data analysis (Zeng & Dai, 2019), where a key problem is how to construct gene regulatory networks (GRNs) to pinpoint crucial factors, e.g., those that control cellular differentiation and determine phenotypes in disease progression (Iacono et al., 2019). Because scRNA-seq technology provides a large number of cell samples, gene-gene associations and transcriptional networks can be studied accurately (Dai et al., 2019). At present, many algorithms have been developed to construct GRNs from scRNA-seq data based on various principles and perspectives.

INFERENCE METHODS FOR GRNs

Correlation networks

Correlation network analysis is one of the most widely used approaches for scRNA-seq data. These networks measure gene-gene associations using correlation coefficients and are suitable for large, high-dimensional datasets. Langfelder & Horvath (2008) presented the popular weighted gene co-expression network analysis (WGCNA) method, which detects modules of highly correlated genes, identifies hub genes within these modules, and measures the relationships among modules. Many improved algorithms have since been presented. For example, the partial and semi-partial correlation (PPCOR) method (Kim, 2015) replaces the correlation coefficient in WGCNA with the semi-partial correlation, which measures the association between two variables after eliminating the effects of all other variables, so that the inferred relationship between two genes is direct rather than mediated by other genes. The part mutual information (PMI) method (Zhang et al., 2015; Zhao et al., 2016) identifies direct associations based on the concept of partial independence, and the partial information decomposition and context (PIDC) approach (Chan et al., 2017) uses partial information decomposition to determine the relationships between genes. Both PMI and PIDC draw on information theory and can measure nonlinear relationships. Correlation networks are usually undirected, meaning that the regulatory direction between two genes is unknown. However, the lag-based expression association for pseudotime-series (LEAP) approach (Specht & Li, 2017) can construct directed gene co-expression networks from pseudotime-ordered scRNA-seq data. This method computes the Pearson correlation coefficient over all possible time lags along the estimated pseudotime (without branches) and then uses the maximum correlation to construct the network. LEAP can therefore capture associations hidden by time lags, which can improve the accuracy of the inferred GRNs.
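The lag-based idea behind LEAP can be illustrated with a minimal sketch. It assumes the cells are already ordered along pseudotime and simply scans a range of lags for the strongest Pearson correlation; the function name, the toy data, and the use of the absolute value to pick the best lag are assumptions made for illustration, and the published R package may window the data differently.

```python
import numpy as np

def max_lagged_correlation(x, y, max_lag):
    """Correlate x[t] with y[t + lag] for lag = 0..max_lag and return
    (correlation, lag) for the lag with the largest absolute correlation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_corr, best_lag = 0.0, 0
    for lag in range(max_lag + 1):
        a = x[: len(x) - lag]          # earlier cells of the candidate regulator
        b = y[lag:]                    # later cells of the candidate target
        r = np.corrcoef(a, b)[0, 1]
        if abs(r) > abs(best_corr):
            best_corr, best_lag = r, lag
    return best_corr, best_lag

# Toy pseudotime-ordered profiles (hypothetical): gene_b repeats gene_a
# with a delay of three cells.
rng = np.random.default_rng(0)
gene_a = np.sin(np.linspace(0, 4 * np.pi, 200)) + 0.05 * rng.normal(size=200)
gene_b = np.roll(gene_a, 3)

corr_ab, lag_ab = max_lagged_correlation(gene_a, gene_b, max_lag=10)
corr_ba, lag_ba = max_lagged_correlation(gene_b, gene_a, max_lag=10)
print(lag_ab, round(corr_ab, 3), lag_ba, round(corr_ba, 3))
```

Because the correlation is computed in both orders, the pair with the stronger lagged correlation suggests a direction for the edge (here gene_a leading gene_b), which is what makes the resulting network directed.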
Single-cell regularized inference using time-stamped expression profiles (SINCERITIES) (Gao et al., 2018) is another approach for constructing directed GRNs from time-stamped single-cell expression profiles. This method divides the single-cell data into several time points, uses Granger causality to infer regulatory networks centered on transcription factors (TFs), and uses ridge regression and partial correlation analyses to recover the directed regulatory relationships among genes. Several methods use statistical likelihood or Bayesian networks to infer GRNs, e.g., context likelihood of relatedness (CLR) (Faith et al., 2007) and first-order autoregressive moving-average and variational Bayesian expectation-maximization (AR1MA1-VBEM) (Sanchez-Castillo et al., 2018). These methods are similar to correlation networks but, because they rest on nonlinear principles, are considered more accurate for measuring gene-gene relationships.
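The distinction between direct and indirect associations, which motivates the partial-correlation steps in PPCOR and SINCERITIES, can be shown with a small sketch. It computes full partial correlations from the inverse covariance (precision) matrix, which is a standard identity rather than the exact procedure of either package; the three simulated genes and all names are hypothetical.

```python
import numpy as np

def partial_correlation_matrix(data):
    """data: cells x genes. Returns the genes x genes matrix of partial
    correlations, each conditioned on all remaining genes."""
    cov = np.cov(data, rowvar=False)
    precision = np.linalg.pinv(cov)          # inverse covariance matrix
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)       # standard precision-matrix identity
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Simulated regulatory chain A -> B -> C: A and C are related only through B.
rng = np.random.default_rng(1)
n_cells = 2000
a = rng.normal(size=n_cells)
b = a + 0.5 * rng.normal(size=n_cells)
c = b + 0.5 * rng.normal(size=n_cells)
expr = np.column_stack([a, b, c])

plain = np.corrcoef(expr, rowvar=False)
partial = partial_correlation_matrix(expr)
print("corr(A, C)      =", round(plain[0, 2], 3))
print("pcor(A, C | B)  =", round(partial[0, 2], 3))
```

In this toy chain the ordinary correlation between A and C is high, whereas their partial correlation given B is close to zero, so only the direct edges A-B and B-C would be retained.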

Dynamic networks

In comparison to static (correlation) networks, dynamic networks are more suitable for describing changes in network dynamics, such as cellular lineages during differentiation. The Boolean model is one of the simplest approaches: it uses the values 0 and 1 to represent the absence or presence of gene expression and the Boolean operators AND, OR, and NOT to describe the interactions between genes. Because the Boolean model is relatively robust to dropout, it is quite useful for scRNA-seq data. Many methods reconstruct GRNs based on synchronous or asynchronous Boolean models, such as reduced ordered binary decision diagrams (ROBDD) (Garg et al., 2008), cellular network optimizer (CellNOptR) (Terfve et al., 2012), bool trainer (BTR) (Lim et al., 2016), single-cell network synthesis toolkit (SCNS) (Woodhouse et al., 2018), and gene modular network (GMN) (Zhang et al., 2020), and these models have been applied to find key regulators of cell fate and reveal network rewiring during cell differentiation (Moignard et al., 2015; Xu et al., 2014). However, the Boolean model must convert expression data into binary data, which may obscure gene-gene interactions. In contrast, differential equation-based models are more complex and offer high-precision predictions (Chen et al., 2009, 2010; Wang et al., 2006), but they must balance time complexity against prediction accuracy. Matsumoto et al. (2017) presented a highly efficient optimization algorithm (single-cell ordinary differential equations, SCODE) to reconstruct expression dynamics and infer GRNs from differentiating cells. This method combines a transformation of linear ordinary differential equations (ODEs) with linear regression and can reconstruct the observed expression dynamics and GRNs accurately and efficiently.
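A synchronous Boolean model of the kind described above can be sketched in a few lines. The three-gene circuit, its rules, and the function names are hypothetical and chosen only to show AND, OR, and NOT in use; none of the cited tools is implemented this way.

```python
def step(state, rules):
    """Synchronous update: every gene is recomputed from the previous state."""
    return {gene: bool(rule(state)) for gene, rule in rules.items()}

def simulate(state, rules, max_steps=50):
    """Iterate until a state repeats and return the visited trajectory."""
    trajectory = [dict(state)]
    for _ in range(max_steps):
        state = step(state, rules)
        repeated = state in trajectory
        trajectory.append(state)
        if repeated:
            break
    return trajectory

# Hypothetical circuit: A is an input, A activates B, C represses B,
# and either A or B is sufficient to activate C.
rules = {
    "A": lambda s: s["A"],
    "B": lambda s: s["A"] and not s["C"],
    "C": lambda s: s["A"] or s["B"],
}
start = {"A": True, "B": False, "C": False}
trajectory = simulate(start, rules)
final_state = trajectory[-1]
for t, s in enumerate(trajectory):
    print(t, {g: int(v) for g, v in s.items()})
```

Starting from A on, B switches on transiently and is then shut off by C, and the system settles into a fixed point. An asynchronous model would instead update one randomly chosen gene per step.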

Tree-based networks

Huynh-Thu et al. (2010) developed a tree-based algorithm (gene network inference with ensemble of trees, GENIE3), which adopts a distinctive way to infer regulatory networks. This method decomposes the prediction of a GRN into p regression problems solved by tree-based ensemble methods, e.g., Random Forests or Extra-Trees, where p is the number of genes. In each regression model, the expression pattern of one gene (the target gene) is predicted from all other genes (the input genes), and the weight of the interaction between the target and each input gene is determined by the importance of that input gene in the regression model. Several improvements to GENIE3 have been developed. For example, the GRN inference based on gradient boosting machine (GRNBoost2) method (Moerman et al., 2019) uses gradient boosting within the GENIE3 architecture to improve efficiency. Jump3 (Huynh-Thu & Sanguinetti, 2015) combines the tree-based algorithm with dynamical systems to infer GRNs by exploiting time series of expression data. The single-cell regulatory network inference and clustering (SCENIC) method (Aibar et al., 2017) removes indirect targets from the GENIE3 modules based on TF motif enrichment analysis and retains only those modules with enriched TF-binding motifs, called regulons. Generally, these methods are competitive with correlation models and are able to construct directed networks (Chen & Mar, 2018; Pratapa et al., 2020).
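The decomposition into one regression problem per target gene can be sketched as follows. To keep the example dependency-free apart from NumPy, it swaps the Random Forests of GENIE3 for a much simpler ensemble of bootstrapped single-split trees (stumps), and it scores each input gene by the share of target variance its splits remove. This is an illustrative simplification, not the GENIE3 implementation; all names and the toy data are assumptions.

```python
import numpy as np

def best_stump(x_inputs, y):
    """Return (feature index, variance reduction) of the best single split."""
    n = len(y)
    total = np.sum((y - y.mean()) ** 2)
    best_feature, best_gain = -1, 0.0
    for j in range(x_inputs.shape[1]):
        ys = y[np.argsort(x_inputs[:, j])]
        csum, csq = np.cumsum(ys), np.cumsum(ys ** 2)
        left_n = np.arange(1, n)
        # Squared error on each side of every possible split position.
        left_sse = csq[:-1] - csum[:-1] ** 2 / left_n
        right_sse = (csq[-1] - csq[:-1]) - (csum[-1] - csum[:-1]) ** 2 / (n - left_n)
        gain = total - np.min(left_sse + right_sse)
        if gain > best_gain:
            best_feature, best_gain = j, gain
    return best_feature, best_gain

def genie3_like(expr, n_trees=100, seed=0):
    """expr: cells x genes. weights[i, j] scores gene i as a predictor of gene j."""
    rng = np.random.default_rng(seed)
    n_cells, n_genes = expr.shape
    weights = np.zeros((n_genes, n_genes))
    for target in range(n_genes):                 # one regression problem per gene
        inputs = [g for g in range(n_genes) if g != target]
        y = expr[:, target]
        y = (y - y.mean()) / y.std()              # unit variance for comparability
        for _ in range(n_trees):
            rows = rng.integers(0, n_cells, size=n_cells)   # bootstrap sample
            feature, gain = best_stump(expr[rows][:, inputs], y[rows])
            if feature >= 0:
                weights[inputs[feature], target] += gain / n_cells
        weights[:, target] /= n_trees
    return weights

# Toy data: gene 1 depends nonlinearly on gene 0; gene 2 is unrelated noise.
rng = np.random.default_rng(0)
g0 = rng.normal(size=300)
g1 = np.tanh(3 * g0) + 0.1 * rng.normal(size=300)
g2 = rng.normal(size=300)
weights = genie3_like(np.column_stack([g0, g1, g2]))
print(np.round(weights, 3))
```

Ranking all entries of the weight matrix gives a list of candidate edges. In practice the input genes are often restricted to known TFs, which is what gives the ranked edges a regulatory direction.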

Deep-learning-based networks

Deep-learning frameworks have also been used to infer gene relationships. The convolutional neural network for coexpression (CNNC) approach (Yuan & Bar-Joseph, 2019) is a supervised, task-specific method in which the network is trained on positive and negative samples, e.g., known targets of TFs, known pathways for specific biological processes, and known disease genes. Depending on the type of data used for training, CNNC can predict TF targets or identify disease-related genes.

Cell-specific networks

Recently, Dai et al. (2019) presented a cell-specific network (CSN) method that constructs a network for each single cell from scRNA-seq data by testing statistical independence. Unlike the other approaches described above, this method can identify gene-gene interactions and describe network heterogeneity at the single-cell level. CSN may help to find new cell types from a network perspective and reveal “dark” genes that play important roles in the network but are generally ignored by traditional differential expression analyses. Moreover, by considering partial independence, the conditional cell-specific network (CCSN) approach (Li et al., 2020) was developed to further reduce false positives in the CSN method.
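A rough sketch of a per-cell independence test in the spirit of CSN is given below. For one cell and one gene pair, it counts the cells that lie near the chosen cell in gene x, in gene y, and in both, and compares the joint count with what independence would predict. The normalized statistic follows the form reported for CSN as far as we are aware, but the rank-based neighborhoods, the box size of 0.1, the threshold, and all function names are assumptions of this sketch rather than details taken from the text above.

```python
import math
from statistics import NormalDist
import numpy as np

def csn_edge_statistic(x, y, cell, box=0.1):
    """Normalized independence statistic for genes x and y at one cell."""
    n = len(x)
    half = max(1, int(round(box * n / 2)))
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    near_x = np.abs(rank_x - rank_x[cell]) <= half   # neighbors of the cell in x
    near_y = np.abs(rank_y - rank_y[cell]) <= half   # neighbors of the cell in y
    n_x, n_y = int(near_x.sum()), int(near_y.sum())
    n_xy = int((near_x & near_y).sum())              # neighbors in both genes
    return math.sqrt(n - 1) * (n * n_xy - n_x * n_y) / math.sqrt(
        n_x * n_y * (n - n_x) * (n - n_y))

def cell_specific_network(expr, cell, alpha=0.01, box=0.1):
    """expr: cells x genes. Returns a boolean adjacency matrix for one cell."""
    threshold = NormalDist().inv_cdf(1 - alpha)
    n_genes = expr.shape[1]
    adj = np.zeros((n_genes, n_genes), dtype=bool)
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            stat = csn_edge_statistic(expr[:, i], expr[:, j], cell, box)
            adj[i, j] = adj[j, i] = stat > threshold
    return adj

# Toy data: genes 0 and 1 are dependent, gene 2 is independent of both.
rng = np.random.default_rng(2)
x = rng.normal(size=300)
expr = np.column_stack([x, x + 0.05 * rng.normal(size=300), rng.normal(size=300)])
networks = [cell_specific_network(expr, k) for k in range(expr.shape[0])]
dep_fraction = np.mean([net[0, 1] for net in networks])
indep_fraction = np.mean([net[0, 2] for net in networks])
print(dep_fraction, indep_fraction)
```

Each cell receives its own adjacency matrix, so quantities such as the per-cell degree of a gene can be used as features for clustering or for comparing cells, which is the network-level heterogeneity the method is designed to expose.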

APPLICATION OF GENE REGULATORY NETWORKS

Method selection

The methods listed in this paper each have their own advantages and disadvantages, and the best choice depends primarily on the scientific problem to be addressed. If the study focuses on time-series-related problems, such as development, cell differentiation, or disease progression, the first choice would be an algorithm that constructs GRNs from time-ordered data. If the aim is only to compare two samples, e.g., disease versus normal state or before versus after medication, an algorithm based on static data should be selected. Directed networks provide information on the direction of a regulatory relationship, whereas undirected networks only measure its existence and strength. Nonlinear algorithms can predict the strength of a regulatory relationship more accurately but require longer computational time; linear algorithms reduce the computational time at the cost of accuracy; binary algorithms only show whether or not a regulatory relationship exists, but they are the fastest. Thus, if the purpose of a study is to explore the key regulatory factors controlling a biological process, research should focus on changes or differences in network structure rather than the strength of the regulatory relationships, and binary algorithms may be preferred. If certain regulatory factors are known to be important and the aim is to identify their upstream and downstream genes, a directed network with a linear or nonlinear algorithm can be chosen. If the aim is to simulate a biological process through a network, for example, by deleting network nodes to simulate gene knockout, then nonlinear algorithms are necessary. In addition, although the latest algorithms are often better than earlier ones, it is still important to build and compare networks constructed by different principles.
Table 1 lists the type and principle of each method, and Table 2 lists the code and source of each method for reference.
Table 1

Summary of inference methods for gene regulatory networks

| Method | Type of edge | Input data | Principle | References |
|---|---|---|---|---|
| WGCNA | Linear, undirected | Static | Pearson correlation | Langfelder & Horvath, 2008 |
| PPCOR | Linear, undirected | Static | Semi-partial correlation | Kim, 2015 |
| PMI | Nonlinear, undirected | Static | Part mutual information | Zhang et al., 2015; Zhao et al., 2016 |
| PIDC | Nonlinear, undirected | Static | Partial information decomposition | Chan et al., 2017 |
| LEAP | Linear, directed | Time-ordered | Pearson correlation | Specht & Li, 2017 |
| SINCERITIES | Linear, directed | Time-ordered | Ridge regression and partial correlation | Gao et al., 2018 |
| AR1MA1-VBEM | Nonlinear, directed | Time-ordered | Bayesian framework | Sanchez-Castillo et al., 2018 |
| ROBDD | Binary, directed | Time-ordered | Boolean model | Garg et al., 2008 |
| CellNOptR | Binary, directed | Time-ordered | Boolean model | Terfve et al., 2012 |
| BTR | Binary, directed | Time-ordered | Boolean model | Lim et al., 2016 |
| SCNS | Binary, directed | Time-ordered | Boolean model | Woodhouse et al., 2018 |
| SCODE | Nonlinear, directed | Time-ordered | Ordinary differential equations | Matsumoto et al., 2017 |
| GENIE3 | Nonlinear, directed | Static | Random Forests or Extra-Trees | Huynh-Thu et al., 2010 |
| Jump3 | Nonlinear, directed | Time-ordered | Decision trees | Huynh-Thu & Sanguinetti, 2015 |
| SCENIC | Nonlinear, directed | Static | GENIE3 and TF motif enrichment analysis | Aibar et al., 2017 |
| GRNBoost2 | Nonlinear, directed | Static | GENIE3 and gradient boosting | Moerman et al., 2019 |
| CNNC | Nonlinear, undirected | Static | Deep learning | Yuan & Bar-Joseph, 2019 |
| CSN | Nonlinear, undirected | Static / Time-ordered | Statistical independence | Dai et al., 2019 |
| CCSN | Nonlinear, undirected | Static / Time-ordered | Statistical partial independence | Li et al., 2020 |
Table 2

Sources of GRN inference methods

| Method | Code | Source |
|---|---|---|
| WGCNA | R | R package: WGCNA |
| PPCOR | R | R package: ppcor |
| PMI | MATLAB | http://www.sysbio.ac.cn/cb/chenlab/software/PCA-PMI |
| PIDC | Julia | https://github.com/Tchanders |
| LEAP | R | R package: LEAP |
| SINCERITIES | R / MATLAB | http://www.cabsel.ethz.ch/tools/sincerities.html, https://github.com/CABSEL/SINCERITIES |
| AR1MA1-VBEM | MATLAB | https://github.com/mscastillo/GRNVBEM |
| ROBDD | Java | http://si2.epfl.ch/~garg/genysis.html |
| CellNOptR | R | http://www.bioconductor.org/packages/release/bioc/html/CellNOptR.html |
| BTR | R | R package: BTR |
| SCNS | R | https://github.com/swoodhouse/SCNS-GUI |
| SCODE | R | https://github.com/hmatsu1226/SCODE |
| GENIE3 | R | http://www.montefiore.ulg.ac.be/~huynh-thu/software.html |
| Jump3 | MATLAB | http://homepages.inf.ed.ac.uk/vhuynht/software.html |
| SCENIC | R | http://scenic.aertslab.org |
| GRNBoost2 | Python | http://arboreto.readthedocs.io |
| CNNC | Python | https://github.com/xiaoyeye/CNNC |
| CSN | MATLAB | https://github.com/wys8c764/CSN |
| CCSN | MATLAB | http://sysbio.sibcb.ac.cn/cb/chenlab/soft/CCSN.zip |
In this paper, we selected several widely used algorithms to test whether they can identify proven gene regulatory relationships (Liu et al., 2020; Van Dijk et al., 2018). As shown in Table 3, most methods identified all six regulatory relationships, although two linear methods, WGCNA and SINCERITIES, did not perform well. This result is not unexpected, as nonlinear algorithms usually predict gene regulatory relationships more accurately.
Table 3

Comparison of GRN inference methods

Method | Proven gene regulatory relationship (GSE114397: VIM-ZEB1, VIM-SNAI2, VIM-MYC; GSE139343: ARID1A-ZIC1, ARID1A-SOX1, ARID1A-MAP2)
WGCNA×××××
PPCOR
PMI
LEAP
SINCERITIES×××
SCODE
SCENIC×
CSN

GRN analysis in biological studies

All methods listed in this paper use scRNA-seq data as input. Most GRN analyses need some prior information: algorithms based on time-ordered data need time-series information, and algorithms based on static data need cell-type information. Both CSN and CCSN construct one network for each single cell, so they are suitable for either time-ordered or static data. Widely used data analysis software, e.g., Seurat (Butler et al., 2018) and Monocle (Qiu et al., 2017; Trapnell et al., 2014), can help obtain cell-type or time-series information through clustering or pseudotime analyses. Regardless of which algorithm is used, the subsequent network analysis is generally similar. For each network, the first step is to identify the modules it contains. A module generally represents a functional unit, as genes performing the same function are often closely related to each other. Within each module, the number of edges connected to a node, i.e., the network degree, is an important indicator. If the network degree of a gene differs considerably between disease and normal states, or changes significantly during cell differentiation, this gene may be an important regulatory factor. If regulatory factors are known, the genes related to these factors should be considered. In addition, genes linking different modules are often very important.
Gene regulatory networks have been widely used in biological studies. For example, based on correlation network analysis, Pina et al. (2015) identified a key regulatory gene (Ddit3) in erythroid lineage programming and found that the Ddit3-Gata2 regulatory axis antagonizes myeloid programs and enables erythroid programs, which was validated experimentally. Xu et al. (2014) constructed Boolean networks composed of 30 genes related to the self-renewal and pluripotency of mouse embryonic stem cells (mESCs) cultured in serum/LIF or serum-free 2i/LIF conditions. They removed nodes from the Boolean network to simulate single and combinatorial RNA interference (RNAi) knockdown, and the post-RNAi expression levels predicted by network analysis showed good agreement with experimental tests. In addition, Moignard et al. (2015) used diffusion maps to identify the developmental trajectory of the mesoderm toward blood in the mouse based on scRNA-seq data, and then constructed Boolean networks to recapitulate blood development. The model predicted that the Erg gene is activated by Sox17 or Hoxb4, which was validated by the observation that Sox and Hox factors control early expression of Erg. Harly et al. (2019) used LEAP to identify target genes of TCF-1 during innate lymphoid cell (ILC) development and identified the role of TCF-1 in the developmental progress of ILC precursors. Sagar et al. (2020) established a γδ T-cell differentiation map based on fetal and adult thymus scRNA-seq data, using GENIE3 to construct GRNs and illustrate the differences between fetal and adult cells. Differentially expressed gene networks have also been successfully applied to recover and characterize distinct stages of γδ T-cell differentiation. Elyanow et al. (2020) presented a computational method (netNMF-sc) that uses gene-gene co-expression networks as prior knowledge to perform dimensionality reduction and imputation of scRNA-seq data with high dropout rates, and which was competitive with many other methods for these tasks.
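The degree comparison described in this section, in which a gene is flagged when its network degree differs between two conditions, can be sketched with plain edge lists. The gene labels and both toy networks are placeholders rather than data from any of the cited studies.

```python
from collections import Counter

def degrees(edges):
    """Count how many edges touch each gene in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def degree_changes(edges_a, edges_b):
    """Rank genes by the absolute change in degree from network A to network B."""
    deg_a, deg_b = degrees(edges_a), degrees(edges_b)
    genes = set(deg_a) | set(deg_b)
    changes = {g: deg_b[g] - deg_a[g] for g in genes}
    return sorted(changes.items(), key=lambda kv: (-abs(kv[1]), kv[0]))

# Placeholder networks for two conditions, e.g., normal and disease.
normal = [("G1", "G2"), ("G2", "G3"), ("G3", "G4")]
disease = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G1", "G5")]

ranking = degree_changes(normal, disease)
for gene, change in ranking:
    print(gene, change)
```

In this toy case G1 gains three edges and rises to the top of the ranking, making it the first candidate regulator to examine. With cell-specific networks, the same comparison can be made between groups of cells rather than between two bulk networks.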

FUTURE PERSPECTIVES

Although many GRN methods have been developed, GRN inference remains a challenging problem in bioinformatics and computational biology. A critical issue is the low quality of single-cell sequencing data: because RNA is obtained from only one cell, amplification noise and dropout events during sequencing are common. Recently, the integration of various single-cell omics data, such as ATAC-seq and ChIP-seq, has attracted increasing attention (Li et al., 2017; Mimitou et al., 2019; Stuart et al., 2019), and may help in the development of next-generation GRN inference algorithms for various fields, including developmental and evolutionary biology.

COMPETING INTERESTS

The authors declare that they have no competing interests.

AUTHORS’ CONTRIBUTIONS

H.D. and L.N.C. conceived the review. H.D. prepared the draft. H.D., L.L., and Q.Q.J. collected the materials. All authors contributed to the discussions. All authors read and approved the final version of the manuscript.
REFERENCES (first 10 of 44)

1.  Inferring gene regulatory networks from multiple microarray datasets.

Authors:  Yong Wang; Trupti Joshi; Xiang-Sun Zhang; Dong Xu; Luonan Chen
Journal:  Bioinformatics       Date:  2006-07-24       Impact factor: 6.937

2.  Conditional mutual inclusive information enables accurate quantification of associations in gene regulatory networks.

Authors:  Xiujun Zhang; Juan Zhao; Jin-Kao Hao; Xing-Ming Zhao; Luonan Chen
Journal:  Nucleic Acids Res       Date:  2014-12-24       Impact factor: 16.971

3.  Evaluating methods of inferring gene regulatory networks highlights their lack of performance for single cell gene expression data.

Authors:  Shuonan Chen; Jessica C Mar
Journal:  BMC Bioinformatics       Date:  2018-06-19       Impact factor: 3.169

4.  Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles.

Authors:  Jeremiah J Faith; Boris Hayete; Joshua T Thaden; Ilaria Mogno; Jamey Wierzbowski; Guillaume Cottarel; Simon Kasif; James J Collins; Timothy S Gardner
Journal:  PLoS Biol       Date:  2007-01       Impact factor: 8.029

5.  Single-Cell Network Analysis Identifies DDIT3 as a Nodal Lineage Regulator in Hematopoiesis.

Authors:  Cristina Pina; José Teles; Cristina Fugazza; Gillian May; Dapeng Wang; Yanping Guo; Shamit Soneji; John Brown; Patrik Edén; Mattias Ohlsson; Carsten Peterson; Tariq Enver
Journal:  Cell Rep       Date:  2015-06-04       Impact factor: 9.423

6.  The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells.

Authors:  Cole Trapnell; Davide Cacchiarelli; Jonna Grimsby; Prapti Pokharel; Shuqiang Li; Michael Morse; Niall J Lennon; Kenneth J Livak; Tarjei S Mikkelsen; John L Rinn
Journal:  Nat Biotechnol       Date:  2014-03-23       Impact factor: 54.908

7.  Construction and validation of a regulatory network for pluripotency and self-renewal of mouse embryonic stem cells.

Authors:  Huilei Xu; Yen-Sin Ang; Ana Sevilla; Ihor R Lemischka; Avi Ma'ayan
Journal:  PLoS Comput Biol       Date:  2014-08-14       Impact factor: 4.475

8.  Analysis on gene modular network reveals morphogen-directed development robustness in Drosophila.

Authors:  Shuo Zhang; Juan Zhao; Xiangdong Lv; Jialin Fan; Yi Lu; Tao Zeng; Hailong Wu; Luonan Chen; Yun Zhao
Journal:  Cell Discov       Date:  2020-06-30       Impact factor: 10.849

9.  CellNOptR: a flexible toolkit to train protein signaling networks to data using multiple logic formalisms.

Authors:  Camille Terfve; Thomas Cokelaer; David Henriques; Aidan MacNamara; Emanuel Goncalves; Melody K Morris; Martijn van Iersel; Douglas A Lauffenburger; Julio Saez-Rodriguez
Journal:  BMC Syst Biol       Date:  2012-10-18

10.  Synchronous versus asynchronous modeling of gene regulatory networks.

Authors:  Abhishek Garg; Alessandro Di Cara; Ioannis Xenarios; Luis Mendoza; Giovanni De Micheli
Journal:  Bioinformatics       Date:  2008-07-09       Impact factor: 6.937
