Prioritizing transcriptional factors in gene regulatory networks with PageRank.

Hongxu Ding1,2, Ying Yang1,3, Yuanqing Xue1, Lucas Seninge1, Henry Gong1, Rojin Safavi1, Andrea Califano2, Joshua M Stuart1.   

Abstract

Biological states are controlled by orchestrated transcriptional factors (TFs) within gene regulatory networks. Here we show that TFs responsible for the dynamic changes of biological states can be prioritized with temporal PageRank. We further show that such TF prioritization can be extended by integrating gene regulatory networks reverse engineered from multi-omics profiles, e.g., gene expression, chromatin accessibility, and chromosome conformation assays, using multiplex PageRank.
© 2020 The Authors.

Keywords:  Gene Network; Molecular Biology; Omics

Year:  2020        PMID: 33490923      PMCID: PMC7809505          DOI: 10.1016/j.isci.2020.102017

Source DB:  PubMed          Journal:  iScience        ISSN: 2589-0042


Introduction

Biological processes are primarily executed via gene regulatory networks (GRNs), which are controlled by key transcriptional factors (TFs) (Levine and Davidson, 2005; Califano and Alvarez, 2017). Such key TFs usually occupy the top of the gene regulatory hierarchy (Chan and Kyba, 2013). The position of a specific TF in the regulatory hierarchy depends on the number and hierarchy of its transcriptional targets, and can thus be quantified by PageRank centrality (Brin and Page, 1998; Page et al., 1999). Originally proposed for ranking search results of World Wide Web (WWW) snapshots, PageRank and related algorithms have been successfully applied to the analysis of single static biological networks (Morrison et al., 2005; Koschützki and Schreiber, 2008; Tarca et al., 2009; Iván and Grolmusz, 2011). The advent of high-throughput sequencing technologies provides unprecedented temporal and multi-dimensional biological information for understanding transcriptional regulation. For instance, transcriptional regulatory dynamics among consecutive biological states can be characterized with single-cell RNA sequencing (scRNA-Seq) (Kolodziejczyk et al., 2015) and trajectory analysis (Herring et al., 2018). Meanwhile, epigenetic regulation of gene transcription can be profiled using, e.g., chromatin accessibility (Klemm et al., 2019) and chromosome conformation (Sati and Cavalli, 2017) assays. Here, we show that within such temporal and multiplex GRNs, TFs can be prioritized with temporal (Rozenshtein and Gionis, 2016) and multiplex (Halu et al., 2013) PageRank. As an extension of the original steady-state PageRank to temporal networks, temporal PageRank ranks nodes based on their connections that change over time (Rozenshtein and Gionis, 2016). In temporal GRNs, important TFs are those connected with more time-related targets and with other important TFs. Such TFs are then considered to be at the top of the temporal gene regulatory hierarchy and are prioritized (Figure 1A, see Transparent methods).
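To make the idea of PageRank as a measure of regulatory hierarchy concrete, the following is a minimal sketch of steady-state PageRank by power iteration on a toy network. It is not the authors' implementation. The gene names, the damping factor of 0.85, and the choice to reverse each regulator-target edge so that rank flows from targets to their regulators are all assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on a directed edge list [(source, target), ...]."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new[dst] += share
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical toy GRN given as (regulator, target) pairs; names are placeholders.
regulations = [
    ("TF_A", "TF_B"), ("TF_A", "TF_C"),
    ("TF_B", "g1"), ("TF_B", "g2"),
    ("TF_C", "g3"),
]
# Assumption: reverse each edge so that rank flows from targets up to regulators.
reversed_edges = [(target, regulator) for regulator, target in regulations]
scores = pagerank(reversed_edges)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[:3])
```

In this toy network TF_A and TF_B each have two targets, but TF_A ranks higher because its targets are themselves regulators, which is the hierarchy effect described above.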
Multiplex PageRank, on the other hand, extends PageRank analysis to multiplex networks. In such networks, the same nodes might interact with one another in different layers. Multiplex PageRank is calculated on the topology of a predefined base network, using the regular PageRank values of the other, supplemental networks as edge weights and as the personalization vector (Halu et al., 2013). Therefore, GRNs reverse engineered from multi-omics assays can be integrated for TF prioritization (Figure 1B, see Transparent methods).
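The two-layer case can be sketched as follows: PageRank is first computed on a supplemental layer, and those values are then used both to weight the edges of the base layer and as its personalization (teleport) vector. This is a simplified illustration under stated assumptions, not the pageRank package's code: the function name weighted_pagerank, the toy genes, the reversed (target to regulator) edge direction, and the handling of dangling nodes are all hypothetical choices.

```python
def weighted_pagerank(nodes, edges, node_weight=None, damping=0.85,
                      tol=1e-12, max_iter=1000):
    """PageRank in which edge src->dst carries weight node_weight[dst] and
    teleportation is proportional to node_weight (uniform when None)."""
    if node_weight is None:
        node_weight = {v: 1.0 for v in nodes}
    total = sum(node_weight[v] for v in nodes)
    teleport = {v: node_weight[v] / total for v in nodes}
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = dict(teleport)
    for _ in range(max_iter):
        new = {v: 0.0 for v in nodes}
        restart = 1.0 - damping
        for src in nodes:
            if not out_links[src]:
                # Assumption: dangling rank is redistributed via the teleport vector.
                restart += damping * rank[src]
                continue
            out_weight = sum(node_weight[d] for d in out_links[src])
            for dst in out_links[src]:
                new[dst] += damping * rank[src] * node_weight[dst] / out_weight
        for v in nodes:
            new[v] += restart * teleport[v]
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical layers over the same nodes; edges point from target to regulator.
genes = ["TF_A", "TF_B", "g1", "g2"]
base_edges = [("g1", "TF_A"), ("g1", "TF_B"), ("g2", "TF_A"), ("g2", "TF_B")]
supplemental_edges = [("g1", "TF_B"), ("g2", "TF_B")]

supplemental_rank = weighted_pagerank(genes, supplemental_edges)
base_only = weighted_pagerank(genes, base_edges)
multiplex = weighted_pagerank(genes, base_edges, node_weight=supplemental_rank)
print(base_only["TF_A"], base_only["TF_B"])
print(multiplex["TF_A"], multiplex["TF_B"])
```

On the base layer alone TF_A and TF_B are indistinguishable. Once the supplemental layer, which supports only TF_B, is used for weighting and personalization, TF_B is ranked above TF_A.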
Figure 1

Graphic overview

Schematic diagrams of temporal and multiplex PageRank analysis are shown in (A) and (B), respectively.


Results

Interpreting regulatory dynamics using temporal PageRank

We first demonstrated that TFs controlling cellular state transitions can be prioritized with temporal PageRank. Specifically, we analyzed the human myoblast-muscle cell differentiation process, during which single cells were harvested and profiled every 24 hr from T0 to T72 (Trapnell et al., 2014). To provide intuition for the rationale of temporal PageRank analysis, following the schematic diagram in Figure 1A, we visualized static GRNs at neighboring timepoints, as well as the resulting temporal GRNs. Given the sizes of such static/temporal GRNs, for clear visualization we highlighted only key regulatory modules by filtering out less confident interactions (see Transparent methods). As shown in Figure 2A, two independent regulatory modules were identified at T0. The first module was controlled by cell cycle TFs TOP2A (Mjelle et al., 2015) and FOXM1 (Wierstra and Alves, 2007), in concordance with the active proliferation of myoblasts. The second module, which is responsible for the lineage identity of myoblasts, was marked by the lineage-specific TF MYF5 (Blais et al., 2005). At T24 (Figure 2B) and the following timepoints (Figure S1), the corresponding GRNs were mainly controlled by a single regulatory module. This module was composed of muscle cell-specific TFs, including muscle cell lineage markers MEF2C and ANKRD1 (Blais et al., 2005), as well as the epigenetic modifier HMGA1 (Brocher et al., 2010). Thus, the differentiation of myoblasts can be described as the sequential interplay of key TFs. We further applied temporal PageRank to the differential GRNs derived from the corresponding adjacent static counterparts. As shown in Figures 2C and S1, the regulatory dynamics of myoblast-muscle cell differentiation were recapitulated, with all the above-mentioned key TFs being discovered. We also analyzed the 33 major lineages during mouse organogenesis reported in the MOCA data sets (Cao et al., 2019) as an additional proof-of-concept (Figure S2).
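As a rough illustration of how a differential GRN might be derived from two adjacent static GRNs, the sketch below takes the interactions gained and lost between two timepoints, which is how the Figure 2C legend describes the temporal network. The exact procedure is given in the Transparent methods and is not reproduced here. The function name differential_network and all gene names are hypothetical.

```python
def differential_network(edges_before, edges_after):
    """Split the changes between two edge sets into gained and lost interactions."""
    before, after = set(edges_before), set(edges_after)
    return {"gained": after - before, "lost": before - after}


# Hypothetical (regulator, target) interactions at two adjacent timepoints.
grn_t0 = [("TF_cycle", "g1"), ("TF_cycle", "g2"), ("TF_lineage", "g3")]
grn_t24 = [("TF_lineage", "g3"), ("TF_muscle", "g4"), ("TF_muscle", "g5")]

diff = differential_network(grn_t0, grn_t24)
# Interactions present at both timepoints do not enter the differential network.
changed_edges = diff["gained"] | diff["lost"]
print(sorted(diff["gained"]))
print(sorted(diff["lost"]))
```

In this toy case the unchanged TF_lineage interaction drops out, so only regulators whose wiring changes between the two timepoints remain to be scored.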
Figure 2

PageRank analysis on the myoblast-muscle cell differentiation process

(A and B) Static scRNA-Seq GRNs for T0 and T24. Vertex and label sizes correspond to static PageRank values. Red and blue edges correspond to positive and negative interactions, respectively.

(C) Temporal scRNA-Seq GRN between T0 and T24. Vertex and label sizes correspond to temporal PageRank values. Orange and purple vertices correspond to increased and decreased gene expression from T0 to T24, respectively. Red and blue edges correspond to gained and lost interactions from T0 to T24, respectively.

(D and E) Bubble plots showing the top 20 combined static and temporal PageRank candidates by analyzing scRNA-Seq and ATAC-Seq GRNs. The size of each bubble corresponds to the degree value (number of connecting interactions). The color of each bubble corresponds to the gene expression quantified by log2(rpm+1), where rpm stands for reads per million. For static PageRank, absolute gene expression was quantified, with red and gray corresponding to high and low gene expression, respectively. For temporal PageRank, differential gene expression was quantified, with red and blue corresponding to increased and decreased gene expression, respectively.

(F and G) Heatmaps showing the contribution of ATAC-Seq GRNs in TF prioritization. The contributions of scRNA-Seq and ATAC-Seq GRNs were normalized to 1. F and G describe static and temporal PageRank analysis, respectively.


Integrating multi-omics GRNs using multiplex PageRank

We then demonstrated that GRNs reverse engineered from multi-omics assays can be integrated through multiplex PageRank for TF prioritization. Specifically, we included matching ATAC-Seq profiles (Pliner et al., 2018) of the above-mentioned differentiation process. We then constructed static and temporal GRNs and performed the corresponding PageRank analysis, following the workflow described in Transparent methods. Although the scRNA-Seq and ATAC-Seq GRNs were topologically different (Figure S1), the muscle cell signature TF MEF2C was identified with both GRN types across the entire differentiation process (Figure S3). Meanwhile, additional muscle cell TFs, e.g., KLF5 (Hayashi et al., 2016) and REST (Iannotti et al., 2013), were recapitulated by analyzing ATAC-Seq GRNs (Figure S3). These results suggested that GRNs reverse engineered from multi-omics profiles agreed on the general principle, while each provided unique insights into the gene regulatory machinery. To prioritize TFs by combining scRNA-Seq and ATAC-Seq GRNs, we performed multiplex PageRank analysis. The contributions of scRNA-Seq and ATAC-Seq GRNs were quantified in Figures 2F and 2G. Notably, multiplex PageRank can be applied to integrate GRNs under both static and temporal scenarios. As shown in Figures 2D, 2E and S4, key TFs elucidated from scRNA-Seq and ATAC-Seq GRNs were recapitulated together. As an additional proof-of-concept, we analyzed the human hematopoiesis process with multiplex PageRank, including the linear lineage progression of hematopoietic stem cell, multi-potent progenitor, and common myeloid progenitor (CMP), as well as the bifurcation from CMP to granulocyte-macrophage progenitor and megakaryocyte-erythroid progenitor. Following the same pipeline as the previous analysis, GRNs assembled from matching scRNA-Seq (Pellin et al., 2019) and ATAC-Seq (Corces et al., 2016) data sets were analyzed (Figure S5).
We further expanded multiplex PageRank to integrate gene expression, chromatin accessibility, and chromosome conformation GRNs by analyzing scRNA-Seq, ATAC-Seq, and HiChIP (Mumbach et al., 2017) profiles of human T cells. We used the scRNA-Seq GRN as the base network for the integration. As shown in Figure S6, several known crucial TFs responsible for T cell homeostasis, e.g., FOXP1 (Feng et al., 2011), and functionalities, e.g., LEF1 (Travis et al., 1991), were recapitulated among the top 20 identified TFs. Moreover, a systematic survey of the top 20 TFs was performed with GO analysis (http://geneontology.org/). As shown in Table S1, a significant number of T cell-related biological processes were recapitulated. Notably, the three included GRNs complement each other by providing unique insights into gene regulatory machinery. For instance, the prioritization of LEF1 and FOXP1 was mainly contributed by the HiChIP and ATAC-Seq GRNs, respectively.

Discussion

Taken together, by analyzing diverse biological questions, we demonstrated that key TFs responsible for biological processes can be prioritized by analyzing GRNs using PageRank. Specifically, we showed that the crucial TFs controlling the dynamic transition of biological states can be prioritized with temporal PageRank. Further, we showed that GRNs reverse engineered from multi-omics profiles can be integrated for TF prioritization with multiplex PageRank. PageRank quantifies the importance of TFs during biological processes by surveying GRN hierarchies comprehensively, and is therefore well suited for TF prioritization. In particular, PageRank analysis can prioritize TFs even if their expression patterns are obscure. For instance, as shown in Figure 2E, although no strong differential expression was observed during the transition from T0 to T24, the muscle cell-specific TF ANKRD1 was ranked #2 with temporal PageRank analysis. Meanwhile, PageRank analysis prioritizes TFs using the entire GRN hierarchy, rather than a “flattened” architecture that only considers direct targets. For instance, as shown in Figure S2, during the mouse embryo development of the inhibitory interneuron lineage from stage E10.5 to stage E11.5, the degree centrality of Sox6 was insignificant, whereas Sox6 was ranked #3 by temporal PageRank analysis. We further performed a systematic comparison between our PageRank analysis and a state-of-the-art TF prioritization algorithm, VIPER (Alvarez et al., 2016; Ding et al., 2018) (see Figure S11 for details).
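The contrast between a hierarchy-aware score and a “flattened” degree count can be illustrated with a toy network. This is a sketch under assumptions: the gene names are placeholders, the damping factor is 0.85, and edges are reversed so that rank flows from targets to regulators. It is not the analysis behind Figure S2.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on a directed edge list [(source, target), ...]."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Nodes without outgoing edges pass their rank to all nodes uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new[dst] += damping * rank[src] / len(out_links[src])
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical GRN: TF_top has a single target, but that target is a busy TF.
regulations = [("TF_top", "TF_mid")]
regulations += [("TF_mid", f"g{i}") for i in range(1, 6)]
regulations += [("TF_flat", f"h{i}") for i in range(1, 4)]

# "Flattened" view: count direct targets only.
n_targets = {}
for regulator, _ in regulations:
    n_targets[regulator] = n_targets.get(regulator, 0) + 1

# Hierarchy-aware view: PageRank on reversed (target -> regulator) edges.
scores = pagerank([(target, regulator) for regulator, target in regulations])
ranking = sorted(scores, key=scores.get, reverse=True)
print(n_targets)
print(ranking[:3])
```

TF_top has the fewest direct targets of the three regulators yet receives the highest PageRank score, because it sits above TF_mid and inherits part of its importance.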

Limitations of the study

We anticipate that PageRank analysis will provide novel and comprehensive insights into transcriptional regulation by identifying regulators that potentially reside at the top of the regulatory hierarchy. One caveat for temporal PageRank analysis is that we do not recommend applying it to highly distinct networks. Consider an extreme case in which (1) networks A and B have the same number of nodes, while A has 10 times more interactions than B, and (2) the two networks have no overlapping interactions. The differential network between A and B will then include all interactions in A and B. If temporal PageRank analysis is performed on such a differential network, the top ranks will be dominated by A nodes because A contributes 10 times more interactions. B nodes, on the other hand, will be under-appreciated even though they also convey important information describing differences between A and B. Similar biases could arise when analyzing GRNs. For instance, the epigenetic landscape of zygotes is less restricted, so more regulatory interactions are expected. In contrast, the more restricted epigenetic landscapes of terminally differentiated cells such as T cells reduce the possible regulatory interactions. Thus, temporal PageRank analysis between zygote and T cell GRNs is highly likely to ignore functional TFs in T cells. We therefore suggest applying temporal PageRank analysis only between temporally adjacent networks. As for multiplex PageRank analysis, one limitation is that the results might vary according to the choice of the base network. For instance, during the myoblast-muscle cell differentiation process, although the known myoblast-specific TFs SP1/3 (Parakati and DiMario, 2002) were prioritized by PageRank analyses based on ATAC-Seq GRNs alone (Figure S3), they were not captured when using scRNA-Seq GRNs as base networks for multiplex PageRank analyses (Figures 2D and 2E).
On the other hand, the effect of base network choice on the final multiplex PageRank integration was minor when analyzing T cells (Figure S7). Thus, one possible future direction might be developing “reciprocal” multiplex PageRank analysis for GRNs, in which feedback among multiplex networks is considered, as reported in Tu et al. (2018).
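The extreme case of two non-overlapping networks with a 10-fold difference in interactions can be worked through with simple counts. The sketch below is illustrative only: the node names and sizes are hypothetical, and it counts changed interactions per regulator rather than running temporal PageRank. Any score that grows with the number of targets would be expected to show a similar imbalance.

```python
from collections import Counter

# Hypothetical network A: two regulators with 10 targets each (20 interactions).
network_a = {(tf, f"g{i}") for tf in ("a1", "a2") for i in range(1, 11)}
# Hypothetical network B: two regulators with 1 target each (2 interactions).
# Its remaining genes (h3..h10) are isolated, so both networks span 12 nodes.
network_b = {("b1", "h1"), ("b2", "h2")}

# With no shared interactions, the differential network is the full union.
differential = network_a ^ network_b
share_from_a = len(network_a) / len(differential)

# Count changed interactions per regulator in the differential network.
changes = Counter(regulator for regulator, _ in differential)
top_two = {regulator for regulator, _ in changes.most_common(2)}
print(len(differential), round(share_from_a, 3), sorted(top_two))
```

Here 20 of the 22 differential interactions come from A, so both top regulators belong to A even though b1 and b2 are equally specific to B.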

Resource availability

Lead contact

Hongxu Ding, hding16@ucsc.edu.

Material availability

This study did not generate any new material.

Data and code availability

scRNA-Seq profiles of myoblast-muscle cell differentiation were downloaded from Gene Expression Omnibus (GEO) under accession number GSE52529. MOCA scRNA-Seq profiles were downloaded from GEO under accession number GSE119945. Hematopoiesis scRNA-Seq profiles were downloaded from GEO under accession number GSE117498. Healthy PBMC T cell scRNA-Seq profiles were downloaded from https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_10k_v3. ATAC-Seq profiles of myoblast-muscle cell differentiation were downloaded from GEO under accession number GSE109828. Hematopoiesis ATAC-Seq profiles were downloaded from GEO under accession number GSE74912. T cell ATAC-Seq and HiChIP profiles were downloaded from GEO under accession number GSE101498. The pageRank R package is available on Bioconductor and in the GitHub repository https://github.com/hd2326/pageRank. Custom scripts used to reproduce the results and figures are also available at https://github.com/hd2326/pageRank.

Methods

All methods can be found in the accompanying Transparent Methods supplemental file.
  29 in total

1.  An initial blueprint for myogenic differentiation.

Authors:  Alexandre Blais; Mary Tsikitis; Diego Acosta-Alvear; Roded Sharan; Yuval Kluger; Brian David Dynlacht
Journal:  Genes Dev       Date:  2005-02-10       Impact factor: 11.361

Review 2.  FOXM1, a typical proliferation-associated transcription factor.

Authors:  Inken Wierstra; Jürgen Alves
Journal:  Biol Chem       Date:  2007-12       Impact factor: 3.915

3.  Sp1- and Sp3-mediated transcriptional regulation of the fibroblast growth factor receptor 1 gene in chicken skeletal muscle cells.

Authors:  Rajini Parakati; Joseph X DiMario
Journal:  J Biol Chem       Date:  2001-12-26       Impact factor: 5.157

Review 4.  Chromosome conformation capture technologies and their impact in understanding genome function.

Authors:  Satish Sati; Giacomo Cavalli
Journal:  Chromosoma       Date:  2016-04-30       Impact factor: 4.316

5.  Enhancer connectome in primary human cells identifies target genes of disease-associated DNA elements.

Authors:  Maxwell R Mumbach; Ansuman T Satpathy; Evan A Boyle; Chao Dai; Benjamin G Gowen; Seung Woo Cho; Michelle L Nguyen; Adam J Rubin; Jeffrey M Granja; Katelynn R Kazane; Yuning Wei; Trieu Nguyen; Peyton G Greenside; M Ryan Corces; Josh Tycko; Dimitre R Simeonov; Nabeela Suliman; Rui Li; Jin Xu; Ryan A Flynn; Anshul Kundaje; Paul A Khavari; Alexander Marson; Jacob E Corn; Thomas Quertermous; William J Greenleaf; Howard Y Chang
Journal:  Nat Genet       Date:  2017-09-25       Impact factor: 38.330

6.  HMGA1 down-regulation is crucial for chromatin composition and a gene expression profile permitting myogenic differentiation.

Authors:  Jan Brocher; Benjamin Vogel; Robert Hock
Journal:  BMC Cell Biol       Date:  2010-08-11       Impact factor: 4.241

7.  Functional characterization of somatic mutations in cancer using network-based inference of protein activity.

Authors:  Mariano J Alvarez; Yao Shen; Federico M Giorgi; Alexander Lachmann; B Belinda Ding; B Hilda Ye; Andrea Califano
Journal:  Nat Genet       Date:  2016-06-20       Impact factor: 38.330

8.  The single-cell transcriptional landscape of mammalian organogenesis.

Authors:  Junyue Cao; Malte Spielmann; Xiaojie Qiu; Xingfan Huang; Daniel M Ibrahim; Andrew J Hill; Fan Zhang; Stefan Mundlos; Lena Christiansen; Frank J Steemers; Cole Trapnell; Jay Shendure
Journal:  Nature       Date:  2019-02-20       Impact factor: 49.962

9.  Centrality analysis methods for biological networks and their application to gene regulatory networks.

Authors:  Dirk Koschützki; Falk Schreiber
Journal:  Gene Regul Syst Bio       Date:  2008-05-15

Review 10.  Single-Cell Computational Strategies for Lineage Reconstruction in Tissue Systems.

Authors:  Charles A Herring; Bob Chen; Eliot T McKinley; Ken S Lau
Journal:  Cell Mol Gastroenterol Hepatol       Date:  2018-02-13