Weiyang Tao, Timothy R. D. J. Radstake, Aridaman Pandit.
Abstract
Changes in a few key transcriptional regulators can lead to different biological states. Extracting the key gene regulators that govern a biological state allows us to gain mechanistic insights. Most current tools perform pathway/GO enrichment analysis to identify key genes and regulators but tend to overlook gene/protein regulatory interactions. Here we present RegEnrich, an open-source Bioconductor R package that combines differential expression analysis, data-driven gene regulatory network inference, enrichment analysis, and gene regulator ranking to identify key regulators from gene/protein expression profiling data. Benchmarking on multiple gene expression datasets from gene-silencing studies showed that RegEnrich performed best when the GSEA method was used to rank the regulators. Further, RegEnrich was applied to 21 publicly available datasets on in vitro interferon stimulation of different cell types. Collectively, RegEnrich can accurately identify key gene regulators from cells in different biological states, which can be valuable for mechanistic studies of cell differentiation, cellular responses to drug stimulation, disease development, and ultimately drug development.
Year: 2022 PMID: 35017649 PMCID: PMC8752721 DOI: 10.1038/s42003-021-02991-5
Source DB: PubMed Journal: Commun Biol ISSN: 2399-3642
Fig. 1 The analytic workflow of the RegEnrich package.
RegEnrich consists of four major steps: differential expression analysis, regulator-target network construction, enrichment analysis, and regulator ranking and visualization.
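To make the enrichment and ranking steps more concrete, the following is a minimal Python sketch of a one-sided Fisher's exact test (FET), computed as the hypergeometric upper tail. It asks whether a regulator's targets are over-represented among differentially expressed genes (DEGs), and regulators are then ordered by p-value. This is a generic illustration under stated assumptions, not RegEnrich's R implementation. The function name `fet_enrichment_p`, the toy gene universe, and the regulators `TF_A`/`TF_B` are all hypothetical.

```python
from math import comb


def fet_enrichment_p(targets, degs, universe):
    """One-sided Fisher's exact test p-value (hypergeometric upper tail).

    Hypothetical helper: tests whether a regulator's targets are
    over-represented among differentially expressed genes (DEGs).
    """
    universe = set(universe)
    targets = set(targets) & universe
    degs = set(degs) & universe
    N = len(universe)        # all genes considered
    K = len(degs)            # DEGs in the universe
    n = len(targets)         # targets of this regulator
    k = len(targets & degs)  # targets that are also DEGs
    # P(X >= k) for X ~ Hypergeometric(N, K, n); math.comb returns 0 when k > n
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Toy example (hypothetical data): 20 genes, the first 5 are DEGs
universe = [f"g{i}" for i in range(20)]
degs = universe[:5]
network = {
    "TF_A": ["g0", "g1", "g2", "g3"],      # all targets are DEGs
    "TF_B": ["g10", "g11", "g12", "g13"],  # no targets are DEGs
}
pvals = {tf: fet_enrichment_p(t, degs, universe) for tf, t in network.items()}
ranking = sorted(pvals, key=pvals.get)  # smallest p-value first
print(ranking)
```

Here `TF_A` gets p = 5/4845 (about 0.001) and `TF_B` gets p = 1.0, so `TF_A` ranks first. A real analysis would also correct for multiple testing and would use a network inferred from the data rather than a hand-written dictionary.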
Fig. 2 Time consumption and memory usage of RegEnrich.
Using one CPU core, (a) the time consumed and (b) the maximum memory used by RegEnrich when analyzing gene expression datasets with different numbers of genes (2000 to 40,000) and different numbers of samples (10 to 100 for the COEN network and 50 to 200 for the GRN network).
Fig. 3 Comparison of key regulators (or hubs) identified by different methods.
Venn diagrams show the overlap between the top 50 hubs/regulators (a) defined by out-degree, out-closeness, and the RegEnrich score (using the COEN network and the FET enrichment method) and (b) defined by out-degree, out-closeness, and the VIPER package (using the ARACNe network). (c) Venn diagram showing the overlap between the top 50 hubs/regulators defined by RegEnrich (using different parameter combinations) and those defined by the VIPER package. (d) Ladder plots comparing the ranks of regulators identified by RegEnrich using different network inference methods, and those identified by RegEnrich and by VIPER. Lines connect the same regulators. Expression patterns of the top three key regulators and their targets identified by (e) RegEnrich using the COEN network and FET method and (f) VIPER. Orange lines show the normalized expression of regulators, and grey lines show the normalized expression of their targets. Brown bars below the x-axis indicate samples at week 1, and purple bars indicate samples at week 3. The analyses were performed using data from [34].
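Out-degree, one of the hub definitions compared in panels (a) and (b), counts the number of outgoing edges (targets) of each regulator in a directed regulator-target network. The minimal Python sketch below ranks regulators by out-degree, keeps the top k, and counts the three regions of a two-set Venn diagram against another method's list. The edge list, `k`, the second list, and the helper names are hypothetical. Ties are broken alphabetically, which is an arbitrary choice for this sketch only.

```python
from collections import Counter


def top_k_by_out_degree(edges, k):
    """Return the k regulators with the most outgoing edges (ties broken by name)."""
    out_degree = Counter(regulator for regulator, _target in edges)
    ranked = sorted(out_degree, key=lambda r: (-out_degree[r], r))
    return ranked[:k]


def overlap(list_a, list_b):
    """Sizes of the three regions of a two-set Venn diagram."""
    a, b = set(list_a), set(list_b)
    return {"only_a": len(a - b), "both": len(a & b), "only_b": len(b - a)}


# Hypothetical directed edges: (regulator, target)
edges = [("TF1", "g1"), ("TF1", "g2"), ("TF1", "g3"),
         ("TF2", "g1"), ("TF2", "g4"), ("TF3", "g5")]
hubs = top_k_by_out_degree(edges, k=2)  # TF1 has 3 targets, TF2 has 2
other_method = ["TF2", "TF3"]           # hypothetical top-2 list from another method
print(hubs)                             # ['TF1', 'TF2']
print(overlap(hubs, other_method))      # {'only_a': 1, 'both': 1, 'only_b': 1}
```

Out-closeness and enrichment-based scores would replace the simple edge count in `top_k_by_out_degree`, but the Venn-style comparison of the resulting top-k lists works the same way.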
Table 1 Transcription regulators identified by RegEnrich and VIPER in gene-silencing studies*.
| GEO accession | Silencing technology | No. of samples | Cell line | Silenced gene(s) | Ranking: RegEnrich (FET) | Ranking: RegEnrich (GSEA) | Ranking: ARACNE + VIPER |
|---|---|---|---|---|---|---|---|
| GSE19114[ | shRNA | 44 | BTIC | STAT3 | 2 | 1 | 7 |
| GSE17172[ | shRNA | 9 | ST486 | FOXM1 | 2 | 1 | – |
| GSE19114[ | shRNA | 12 | SNB19 | STAT3 | 14 | 5 | n/a |
| GSE2350[ | siRNA | 8 | Burkitt lymphoma cell line | BCL6 | 32 | 11 | – |
| GSE19114[ | shRNA | 12 | SNB19 | CEBPB | 50 | 24 | n/a |
| GSE19114[ | shRNA | 44 | BTIC | STAT3 & CEBPB | 2 & 285 | 1 & 957 | 6 & n/a |
| GSE19114[ | shRNA | 12 | SNB19 | STAT3 & CEBPB | 365 & 6 | 38 & 11 | n/a & n/a |
| GSE51978[ | shRNA | 9 | IMR32 | CHAF1A | 10 & 29# | 15 & 46# | – |
| GSE19114[ | shRNA | 44 | BTIC | CEBPB | 913 | 793 | n/a |
* “n/a” means that no result was obtained for the regulator of interest after the ranking procedure. “–” indicates that ARACNE failed to construct a network from the dataset. “#” marks the rankings on day 5 and day 10, respectively, according to the experimental design.
Table 2 Transcription factors identified by RegEnrich in interferon stimulation studies*.
| Type | Interferon | Time | Concentration | Cell type/line | Families of transcription factors: STAT | Families of transcription factors: IRF | Families of transcription factors: ETS | GEO accession |
|---|---|---|---|---|---|---|---|---|
| Type I | IFNa | 6 h and 12 h | 500 U/mL | HT1080 | STAT1 (1), STAT2 (6), STAT5B (22) | | ELF1 (14) | GSE31019[ |
| | | 6 h | 500 U/mL | SKOV3 | STAT1 (2), STAT2 (11) | IRF7 (5), IRF1 (13), IRF2 (18) | ETV6 (20), ELF1 (30) | GSE31019[ |
| | | 6 h | 10 U/mL | Primary hepatocytes | STAT2 (12), STAT1 (24) | IRF7 (4), IRF9 (7), IRF1 (17), IRF6 (29) | ETV7 (1), ETV6 (27), ELK4 (28) | GSE31193[ |
| | | 24 h | 10 U/mL | Primary hepatocytes | STAT2 (1), STAT4 (7), STAT1 (9) | IRF7 (3), IRF9 (11) | ETV7 (4), ELK4 (17) | GSE31193[ |
| | | 10 h | 1000 U/mL | Fibroblast | STAT1 (8), STAT2 (24) | IRF9 (7) | ELF1 (32) | GSE67737[ |
| | IFNa2 | 6 h | 1000 U/mL | Keratinocyte | STAT1 (6), STAT2 (9), STAT5A (32) | IRF7 (20), IRF1 (21), IRF6 (25), IRF2 (28) | ETV7 (11), ELF1 (26) | GSE124939[ |
| | | 18 h | 1000 U/mL | Primary macrophage | STAT3 (9), STAT1 (13) | IRF1 (26), IRF7 (27) | | GSE30536[ |
| | IFNa2b | 2 h | 1000 U/mL | EBV-transformed B lymphoblastoid cell lines | | | ELF1 (2) | GSE117637[ |
| | | 2 h | 1000 U/mL | Fibroblast | | IRF2 (6) | FLI1 (29) | GSE117637[ |
| | IFNa6 | 6 h | 1000 U/mL | Keratinocyte | STAT1 (14), STAT2 (17) | IRF7 (11), IRF1 (28), IRF6 (30) | ETV7 (9), ELF1 (32) | GSE124939[ |
| | IFNb | 6 h | 1000 U/mL | Keratinocyte | STAT1 (8), STAT2 (15) | IRF1 (5), IRF7 (6), IRF2 (19), IRF6 (20) | ETV7 (1), ELF1 (18) | GSE124939[ |
| | | 10 h | 1000 U/mL | Fibroblast | STAT1 (10), STAT2 (30) | IRF9 (8) | ETV7 (34), ELF1 (35) | GSE67737[ |
| Type II | IFNg | 6 h | 5 ng/mL | Keratinocyte | STAT3 (13), STAT1 (23), STAT2 (24) | IRF2 (1), IRF1 (5) | ETV7 (8) | GSE124939[ |
| | | 20 h | 10 ng/mL | Monocyte-derived macrophages | STAT1 (2), STAT2 (25) | IRF1 (1), IRF9 (31) | ETV7 (3) | GSE79077[ |
| | | 6 h | 20 ng/mL | Monocytes | STAT2 (9), STAT1 (11) | IRF1 (3), IRF9 (15), IRF7 (22) | ETV7 (1) | GSE36537[ |
| | | 18 h | 20 ng/mL | Monocytes | | | | GSE36537[ |
| | | 18 h | 20 ng/mL | Monocyte-derived macrophages | STAT1 (17), STAT2 (34) | IRF1 (2), IRF7 (35) | ETV7 (1) | GSE36537[ |
| | | 24 h | 20 ng/mL | Keratinocytes | STAT3 (3) | IRF1 (1), IRF2 (10), IRF7 (14) | ETV7 (2) | GSE12109[ |
| | | 24 h | 100 U/mL | Peripheral blood-derived macrophages | STAT3 (4) | IRF7 (3), IRF1 (8), IRF9 (26) | ETV7 (2), ELF4 (27) | GSE11886[ |
| | | 10 h | 1000 U/mL | Dermal fibroblast | STAT1 (8), STAT3 (23) | IRF1 (4), IRF9 (10) | ETV7 (11) | GSE67737[ |
| | | 3 h | n.a. | Monocyte-derived macrophages | STAT2 (12), STAT6 (30) | | ETS2 (5), ELK1 (24) | GSE130567[ |
*Three families of transcription factors were assessed in 21 datasets in which cells were stimulated by different interferons. Only TFs in the STAT, IRF, and ETS families that ranked in the top 35 are shown in the “Families of transcription factors” columns, in the format “Regulator (ranking)”. The analysis used the best-performing combination of parameters identified in Table 1 (i.e., the COEN network and the GSEA enrichment method).
Fig. 4 The genes consistently identified as key regulators.
Key regulators in (a) type I interferon stimulation and (b) type II interferon stimulation datasets. The full list is shown in Table 2. The top 35 regulators in each dataset were considered key regulators, and only the regulators identified in more than 25% of datasets are shown.