| Literature DB >> 26393364 |
Vipin Narang1, Muhamad Azfar Ramli2, Amit Singhal1, Pavanish Kumar1, Gennaro de Libero1, Michael Poidinger1, Christopher Monterola2.
Abstract
Human gene regulatory networks (GRN) can be difficult to interpret due to a tangle of edges interconnecting thousands of genes. We constructed a general human GRN from extensive transcription factor and microRNA target data obtained from public databases. In a subnetwork of this GRN that is active during estrogen stimulation of MCF-7 breast cancer cells, we benchmarked automated algorithms for identifying core regulatory genes (transcription factors and microRNAs). Among these algorithms, we identified K-core decomposition, pagerank and betweenness centrality algorithms as the most effective for discovering core regulatory genes in the network evaluated based on previously known roles of these genes in MCF-7 biology as well as in their ability to explain the up or down expression status of up to 70% of the remaining genes. Finally, we validated the use of K-core algorithm for organizing the GRN in an easier to interpret layered hierarchy where more influential regulatory genes percolate towards the inner layers. The integrated human gene and miRNA network and software used in this study are provided as supplementary materials (S1 Data) accompanying this manuscript.Entities:
Mesh:
Substances:
Year: 2015 PMID: 26393364 PMCID: PMC4578944 DOI: 10.1371/journal.pcbi.1004504
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 1Integrated human transcriptional and post-transcriptional GRN.
The network contains a total of 1867 miRNAs and 21,940 genes, including 1374 TFs and 20,556 non-TF genes (or mRNAs for simplicity). The numbers on the edges denote the total number of interactions between different types of nodes.
Fig 2Degree distribution of the general human TF-miRNA-mRNA network over (a) all nodes, (b) mRNAs, (c,d) TFs (in, out degrees) and (e,f) miRNAs (in, out degrees), (g,h,i) regulation of TFs, mRNAs and miRNAs by TFs, (j,k) regulation of TFs and mRNAs by miRNAs.
Statistics of degree distribution for various interactions in the integrated human GRN.
| Interaction Type | Min | Max | Mean | Median |
|---|---|---|---|---|
| TF → TF | 1 | 918 | 195.75 | 74 |
| TF ← TF | 1 | 80 | 27.06 | 25 |
| TF → mRNA | 1 | 11357 | 1377.35 | 3 |
| mRNA ← TF | 1 | 94 | 21.55 | 17 |
| TF → miRNA | 1 | 343 | 83.75 | 60.5 |
| miRNA ← TF | 1 | 74 | 9.39 | 3 |
| miRNA → TF | 1 | 595 | 99.45 | 74 |
| TF ← miRNA | 1 | 595 | 131.73 | 111 |
| miRNA → mRNA | 1 | 5741 | 893.41 | 595.5 |
| mRNA ← miRNA | 1 | 773 | 99.06 | 76 |
Fig 3Relationship between a node’s in or out-degree in the integrated human TF-miRNA-mRNA network and its expression characteristics in BioGPS (a) absolute expression level, (b) tissue specific gene expression.
Fig 4Evolution of the integrated human TF-miRNA-mRNA network with increasing number of ChIP’ed transcription factors and cell types in which transcription factors have been ChIP’ed.
As more TFs are ChIP’ed, while the shape of TF out-degree distribution remains the same (a), a proportionate number of edges are added to the network (b). Addition of these new edges leads to a linear increase in the in-degrees of mRNA nodes both for average in-degree and high in-degree mRNAs (c,d). This is similar to percolation dynamics where the frequency of both average and large size clusters increases as an increasing number of lattice spaces are filled up (e). ChIP of the same TFs in more cell types adds fewer new edges to the network (f) and the TF nodes (g) with a plateau reached beyond 10 cell types. Shown in (h) is an extrapolation of mRNA in-degree if all known TFs in the human genome (~1400) were to be ChIP’ed.
Public MCF-7 gene expression datasets downloaded from NCBI’s Gene Expression Omnibus (GEO) database.
| Category | GEO Series ID | Platform | Description | Reference |
|---|---|---|---|---|
| Datasets for building MCF-7 gene network | GSE11324 | Affymetrix HG U133 plus 2.0 | Control | [ |
| GSE11352 | Affymetrix HG U133 plus 2.0 | Control | [ | |
| GSE42619 | Agilent HG G4112F 4x44K | Control | [ | |
| GSE51403 | Illumina HiSeq 2000 RNA-seq | Control | [ |
Fig 5MCF-7 estrogen response gene network.
(a) Differentially expressed genes with FDR < 0.001 were selected from two high quality datasets GSE11324 (microarray) and GSE51403 (RNA-seq). A significant number of genes were common between the two datasets, near about 70% if a FDR cutoff of 0.05 were to be used as shown by dotted ellipses. The union list between GSE11324 and GSE51403 was selected for network construction. The union list also had a good overlap with another dataset GSE11352. (b) The final MCF-7 network consisted of 5736 nodes including 462 TFs, 58 miRNAs and 5216 mRNA genes. The numbers on the edges denote the total number of interconnections between various types of nodes.
Top 20 regulatory molecules identified by various ranking strategies in the MCF-7 estrogen response gene regulatory network.
| Ranking Strategy | Top regulatory molecules (ranked 1 to 20 from left to right) |
|---|---|
|
| hsa-miR-941, GATA4, PGR, SIM1, CXCL12, hsa-miR-653, hsa-miR-489, EGR3, SOX3, HEY2, MYBL1, hsa-miR-548M, TMEM229A, FOXE3, FOXC2, CREB3L1, GRHL3, MYB, hsa-miR-623, hsa-miR-1231 |
|
| MYC, ELF1, TAF1, E2F6, E2F1, HDAC2, EGR1, CHD2, GATA2, USF1, RAD21, IRF1, GABPA, FOXA1, AR, SMC3, SRF, RFX5, GATA3, ZNF143 |
|
| ESR1, STAT2, TP63, hsa-miR-4800-5P, GATA3, HIF1A, NFYA, TFAP2C, IRF1, SREBF1, GTF2F1, ATF3, BHLHE40, hsa-miR-4640-3P, RFX5, PBX3, SMARCC1, E2F1, USF2, FOXA1 |
|
| PURB, HNRNPD, NFAT5, ZBTB4, CHD2, STAT3, HMGB1, NAA15, ILF3, CREB1, FOXN2, NCOA3, GABPB1, HBP1, HIF1A, AFF4, ZNF143, ELK4, KLF10, ZNF367 |
|
| MYC, ELF1, TAF1, E2F6, E2F1, EGR1, HDAC2, CHD2, GATA2, RAD21, USF1, IRF1, AR, GABPA, FOXA1, SMC3, GATA3, YBX1, SRF, ZNF143 |
|
| MYC, E2F6, ELF1, YBX1, E2F1, STAT3, GABPA, CHD2, EGR1, ESR1, TAF1, SRF, ZNF143, RAD21, FOS, ELK4, GATA2, AR, TFAP2C, JUNB |
|
| MYC, MYB, ATF3, RUNX3, BRCA1, PPARA, ESR1, SMAD7, CXCL12, ASCL1, LMO2, CDKN1A, E2F1, TCERG1, SOX9, TP53, ETS2, SMAD3, TFAP2C, MEIS2 |
|
| MYC, ELF1, TAF1, E2F6, E2F1, HDAC2, EGR1, CHD2, USF1, RAD21, IRF1, GABPA, SMC3, SRF, RFX5, GATA3, ZNF143, YBX1, USF2, STAT3 |
Fig 6Randomization test to determine whether the core TFs and microRNAs identified in MCF-7 estrogen response GRN were obtained by chance.
The scatter plot compares the core number of a TF in MCF-7 estrogen response GRN (y-axis) with its average core number over 10,000 randomly sampled networks (x-axis). To be comparable the randomly sampled networks contained the same number of TFs and miRNAs as the MCF-7 estrogen response GRN.
Fig 7Randomization test to determine whether the core TFs and microRNAs identified in MCF-7 estrogen response GRN were obtained by chance.
The scatter plot compares the core number of a microRNA in MCF-7 estrogen response GRN (y-axis) with its average core number over 10,000 randomly sampled networks (x-axis). To be comparable the randomly sampled networks contained the same number of TFs and miRNAs as the MCF-7 estrogen response GRN.
Literature validation scores for various ranking strategies (lower scores are better).
| Ranking Strategy | Literature validation score |
|---|---|
| Most differentially expressed regulators | 1563 |
| Maximum out-degree | 837 |
| Maximum target fold enrichment | 1165 |
| Maximum in-degree | 1584 |
| Maximum closeness centrality | 796 |
| Maximum betweenness centrality | 718 |
| Maximum pagerank | 728 |
| Innermost K-core | 894 |
Fig 8Comparison of gene expression measurements in two repeats of the same biological experiment.
The scatter plot shows the measured fold changes of gene expression in E2 vs. Control treated MCF-7 cells in GSE11324 (x-axis) and GSE51304 (y-axis) experiments. Only genes which passed a FDR cutoff of 0.001 (a) or 0.05 (b) are shown. Although both datasets are of high quality, the absolute values of fold change are only moderately correlated. However, the direction of fold change is consistent for most of the genes.
Performance of explaining gene expression in E2 vs. control treated MCF-7 cells using core regulators identified by various ranking strategies.
Three different mathematical or AI models were used for modeling gene expression: linear regression (LR), support vector machines (classification, SVC, and regression, SVR) and principal component analysis (PCA). Performance was measured as area under the ROC curve (AUROC) for real-valued estimators and using Matthew’s correlation coefficient (MCC) for binary classifiers in 5-fold cross validation.
| Ranking Strategy | Area under ROC curve | MCC | ||
|---|---|---|---|---|
| LR | SVR | PCA | SVC | |
| Most differentially expressed regulators | 0.567 | 0.575 | 0.572 | 0.149 |
| Maximum out-degree | 0.675 | 0.683 | 0.602 | 0.289 |
| Maximum target fold enrichment | 0.681 | 0.632 | 0.590 | 0.222 |
| Maximum in-degree | 0.586 | 0.585 | 0.586 | 0.155 |
| Maximum closeness centrality | 0.676 | 0.684 | 0.602 | 0.285 |
| Maximum betweenness centrality | 0.673 | 0.682 | 0.619 | 0.287 |
| Maximum pagerank | 0.653 | 0.646 | 0.620 | 0.254 |
| Innermost K-core | 0.674 | 0.695 | 0.605 | 0.294 |
Fig 9Hierarchical organization of all regulatory molecules, including 106TFs and 58miRNAs, in MCF-7 estrogen response GRN using K-core algorithm.
TF and miRNA nodes are represented by rectangles and diamonds respectively. Nodes are colored red or green depending upon whether the molecule’s expression is up or down-regulated in E2 vs. Control cells. The hierarchy is based on the principle of network centrality where nodes which are more important for the flow of regulatory information are more towards the core. Nodes in core 1 (Myc) are most central, followed by nodes in cores 2, 3, and so on in decreasing order of centrality. Some cores have been clubbed together for ease of visualization.
Gene expression classification in the MCF-7 estrogen response GRN using various selections of regulatory nodes based on their core numbers, K, in K-core hierarchy.
In the top half of the table the innermost core regulators (K ≤ 2) are always included and the cumulative effect of adding further core regulators is measured. In the bottom half of the table the innermost core regulators (K ≤ 2) are excluded in order to measure the individual contributions of regulators at various core levels. Classification accuracy is reported in terms of area under the ROC curve (AUROC) for real valued classifiers (LR, SVR and PCA) and Matthew’s correlation coefficient (MCC) for binary classifiers (SVC).
| Selected regulators | Area under ROC curve | MCC | ||||
|---|---|---|---|---|---|---|
| K cutoff | # Regulators | # Targets | LR | PCA | SVR | SVC |
| K ≤ 15 | 111 | 5625 | 0.733 | 0.609 | 0.742 | 0.364 |
| K ≤ 10 | 89 | 5647 | 0.715 | 0.608 | 0.725 | 0.349 |
| K ≤ 7 | 61 | 5675 | 0.696 | 0.606 | 0.703 | 0.309 |
| K ≤ 3 | 45 | 5691 | 0.692 | 0.606 | 0.699 | 0.311 |
| K ≤ 2 | 29 | 5707 | 0.687 | 0.610 | 0.693 | 0.300 |
| K = 3 | 16 | 5720 | 0.584 | 0.535 | 0.580 | 0.130 |
| 3 < K ≤ 7 | 16 | 5720 | 0.570 | 0.525 | 0.565 | 0.084 |
| 7 < K ≤ 10 | 28 | 5708 | 0.569 | 0.525 | 0.568 | 0.091 |
| 10 < K ≤ 15 | 22 | 5714 | 0.569 | 0.572 | 0.609 | 0.165 |
Literature validation in terms of the average rank of genes in various cores of the K-core hierarchical organization of MCF-7 estrogen response GRN (lower scores are better).
| Core(s) | # Regulators | Average rank |
|---|---|---|
| K≤2 | 29 | 47.8 |
| K = 3 | 16 | 52.4 |
| 4≤K≤7 | 16 | 56.8 |
| 8≤K≤9 | 16 | 78.1 |
| K = 10 | 12 | 80.4 |
Performance of gene expression classification in the MCF-7 estrogen response GRN with and without the inclusion of miRNAs in the list of regulators.
Each row of the table represents a different selection of regulatory nodes based on their core number, K, in the hierarchy produced by K-core. Classification accuracy is reported in terms of the area under the ROC curve (AUROC) for LR and SVR.
| Selected regulators | With miRNAs (AUROC) | Without miRNAs (AUROC) | ||
|---|---|---|---|---|
| K cutoff | LR | SVR | LR | SVR |
| K ≤ 15 | 0.733 | 0.742 | 0.695 | 0.705 |
| K ≤ 10 | 0.715 | 0.725 | 0.695 | 0.705 |
| K ≤ 7 | 0.696 | 0.703 | 0.694 | 0.700 |
| K ≤ 3 | 0.692 | 0.699 | 0.689 | 0.696 |
Fig 10Overlap of miRNA-mRNA interactions predicted in silico by various tools.