Ivan Rodrigo Wolf, Rafael Plana Simões, Guilherme Targino Valente.
Abstract
Gene regulatory networks (GRNs) play key roles in development, phenotypic plasticity, and evolution. Although graph theory has been used to explore GRNs, the associations among topological features, transcription factors (TFs), and system essentiality are poorly understood. Here we investigated the relationships among the main GRN topological features that influence the control of essential and specific subsystems. We found that Knn (the average nearest-neighbor degree), page rank, and degree are the most relevant GRN features: they are conserved throughout evolution and are also relevant in pluripotent cells. Interestingly, life-essential subsystems are governed mainly by TFs with intermediate Knn and high page rank or degree, whereas specialized subsystems are mainly regulated by TFs with low Knn. Hence, we suggest that a high probability of TFs being visited by a random signal, together with a high probability of that signal propagating to target genes, ensures the robustness of life-essential subsystems. Gene and genome duplication is the main evolutionary process that made Knn the most relevant feature. Herein, we shed light on unexplored topological GRN features to assess how they are related to subsystems and how duplications shaped regulatory systems during evolution. The classification model generated can be found here: https://github.com/ivanrwolf/NoC/
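To make the three features named in the abstract concrete, the following is a minimal Python sketch that computes degree, Knn, and page rank on a toy network. The edge list, the node names, the damping factor of 0.85, and the choice to treat edges as undirected for degree and Knn are all assumptions made for illustration; this is not the authors' pipeline, which is available in the repository linked above.

```python
from collections import defaultdict

# Hypothetical toy GRN: (regulator, target) pairs. Names are illustrative only.
edges = [("TF1", "g1"), ("TF1", "g2"), ("TF1", "TF2"),
         ("TF2", "g2"), ("TF2", "g3")]


def undirected_neighbors(edge_list):
    """Map each node to the set of nodes it shares an edge with."""
    nbrs = defaultdict(set)
    for u, v in edge_list:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return dict(nbrs)


def degree(nbrs):
    """Degree = number of distinct neighbors (assumption: undirected view)."""
    return {node: len(s) for node, s in nbrs.items()}


def knn(nbrs):
    """Knn = mean degree of a node's neighbors."""
    deg = degree(nbrs)
    return {node: sum(deg[m] for m in s) / len(s) for node, s in nbrs.items()}


def pagerank(edge_list, damping=0.85, iterations=100):
    """Power-iteration page rank following regulator -> target edges.

    Rank held by nodes without outgoing edges is spread uniformly,
    so the ranks always sum to 1.
    """
    nodes = sorted({n for edge in edge_list for n in edge})
    out = defaultdict(list)
    for u, v in edge_list:
        out[u].append(v)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not out[node])
        new = {node: (1.0 - damping) / n + damping * dangling / n
               for node in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        rank = new
    return rank


neighbors = undirected_neighbors(edges)
deg = degree(neighbors)
knn_values = knn(neighbors)
ranks = pagerank(edges)

for node in sorted(neighbors):
    print(node, deg[node], round(knn_values[node], 3), round(ranks[node], 3))
```

Page rank can be read as the long-run probability that a random walker (the "random signal" of the abstract) sits on a node, which is why it is computed here along the direction of regulation.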
Year: 2021 PMID: 34930908 PMCID: PMC8688434 DOI: 10.1038/s41598-021-03625-w
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
The number of interactions, regulators, and targets of analyzed GRNs.
| Organism/cell type | Raw interaction | Interaction* | Target* | Regulator* | Total of instances* | References | Num. genes | Num. TFs; Reference | % genes used |
|---|---|---|---|---|---|---|---|---|---|
| Escherichia coli | 4490 | 3744 | 1594 | 197 | 1791 | [ | 4464 | 207[ | 40.12 |
| Saccharomyces cerevisiae | 17,030 | 17,030 | 3150 | 149 | 3299 | [ | 6446 | 301[ | 51.17 |
| Drosophila melanogaster | 19,657 | 14,319 | 767 | 114 | 881 | [ | 17,532 | 1052[ | 5.02 |
| Arabidopsis thaliana | 18,772 | 5117 | 3428 | 307 | 3735 | [ | 33,467 | 2451[ | 11.16 |
| Homo sapiens | 106,096 | 9591 | 2307 | 306 | 2613 | [ | 42,220 | 1639[ | 6.18 |
| mESC** | 110,517 | 110,517 | 21,025 | 40 | 21,065 | [ | – | – | – |
| mESC-J1** | 17,422 | 17,422 | 8148 | 6 | 8154 | [ | – | – | – |
| mESC-V6.5** | 5675 | 5675 | 2758 | 3 | 2761 | [ | – | – | – |
| mESC-E14** | 361 | 361 | 361 | 1 | 362 | [ | – | – | – |
*Number of interactions and nodes after filtering.
**Datasets used exclusively as test sets. The number of genes per species was retrieved from NCBI (accession numbers GCF_000005845.2, GCF_000146045.2, GCF_000001215.4, GCF_000001735.4, and GCF_000001405.39). “Num. TFs” is the number of transcription factors of each species. “% genes used” is the ratio of “Total of instances” to “Num. genes”, expressed as a percentage.
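The “% genes used” column can be re-derived from the other two columns. The sketch below does so in Python, keyed by NCBI accession; it assumes that the accessions in the footnote are listed in the same order as the table rows, and that the published values were truncated (not rounded) to two decimals, which is an inference from the numbers and not something the source states.

```python
import math

# (accession, total instances after filtering, number of genes), from the table.
# Assumption: accessions are paired with rows in the order both are listed.
rows = [
    ("GCF_000005845.2", 1791, 4464),
    ("GCF_000146045.2", 3299, 6446),
    ("GCF_000001215.4", 881, 17532),
    ("GCF_000001735.4", 3735, 33467),
    ("GCF_000001405.39", 2613, 42220),
]


def percent_genes_used(instances, genes):
    """Share of annotated genes represented in the filtered network, in %."""
    return 100.0 * instances / genes


def truncate(value, places=2):
    """Cut a positive number to a fixed number of decimals without rounding."""
    factor = 10 ** places
    return math.floor(value * factor) / factor


percentages = {acc: truncate(percent_genes_used(inst, genes))
               for acc, inst, genes in rows}

for acc, pct in percentages.items():
    print(acc, pct)
```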
Figure 1. Predictive performances. (a) Predictive performances during supervised learning. CCI: correctly classified instances; TPR: true positive rate; 1-FPR: one minus the false positive rate; MCC: Matthews correlation coefficient; ROC area: area under the receiver operating characteristic curve; PRC area: area under the precision-recall curve; (b) predictive score of the consensus models over each test set. Blue boxes and the left Y-axis depict the classification using the normal consensus model (only the CCI scores were plotted), whereas the red boxes and the right Y-axis depict the classification using the random model. The “+” indicates the mean. The Mann–Whitney test showed a p-value < 0.001 for all comparisons.
Figure 2. Decision tree, GO, and network simulation analysis. (a) The consensus tree, in which “A”, “B”, “C”, “D”, “E”, and “F” are the bins from the discretization step. Orange squares are the node features, and blue squares are the classified leaves; (b) the biological processes (rows) of genes in the tree leaves in (a) and the feature that leads to each leaf (Knn, degree, or page rank) (columns). “reg.” means regulators and “tar.” means targets. A black box indicates the presence of a given GO term in the genes at that tree leaf. The histogram in the box below the heatmap depicts the percentage of GO terms from genes that lie in each leaf type; (c) representation of hypothetical networks. The Knn was calculated for the regulators (yellow nodes). Blue nodes are genes with just one connection. The red node depicts a duplication of a blue node. The green nodes represent other regulators or genes regulated by many regulators. “I”, “II”, “III”, and “IV” represent networks in an initial state, after a gene duplication or during pervasive transcription, after duplication of a different regulator, and after duplication of the regulator for which Knn is calculated, respectively; (d) simulation of the Knn evolution of the regulators from (c). The X-axis is the degree of targets and regulators, and the Y-axis is the regulator’s Knn. The diagonal grey line is the identity line (a line on which every point has equal X and Y coordinates); because it crosses only the second point, it indicates divergence from the beginning of the simulation.
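The effect of the duplication events in panel (c) on a regulator's Knn can be illustrated with a small Python sketch. The network below, its node names (`R`, `R2`, `t1`, `t2`, `s`, `u`), and the rule that a duplicate inherits all connections of the original are assumptions chosen for illustration; this is a toy model and not the simulation behind panel (d).

```python
def build(edge_list):
    """Undirected adjacency map from a list of (regulator, target) pairs."""
    graph = {}
    for u, v in edge_list:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def knn(graph, node):
    """Mean degree of the neighbors of `node`."""
    nbrs = graph[node]
    return sum(len(graph[m]) for m in nbrs) / len(nbrs)


def duplicate(graph, node, copy_name):
    """Return a new graph in which `copy_name` inherits every edge of `node`.

    Assumption: no self-loops (autoregulation) are present.
    """
    new = {n: set(s) for n, s in graph.items()}
    new[copy_name] = set(graph[node])
    for m in graph[node]:
        new[m].add(copy_name)
    return new


# State I: regulator R with two single-link targets and one target shared with R2.
state_i = build([("R", "t1"), ("R", "t2"), ("R", "s"), ("R2", "s"), ("R2", "u")])
state_ii = duplicate(state_i, "t1", "t1_dup")     # II: a target gene duplicates
state_iii = duplicate(state_ii, "R2", "R2_dup")   # III: a different regulator duplicates
state_iv = duplicate(state_iii, "R", "R_dup")     # IV: R itself duplicates

states = {"I": state_i, "II": state_ii, "III": state_iii, "IV": state_iv}
knn_by_state = {name: knn(graph, "R") for name, graph in states.items()}

for name, value in knn_by_state.items():
    print(name, len(states[name]["R"]), round(value, 3))
```

In this toy network, duplicating a single-link target lowers the Knn of `R` because it adds a degree-1 neighbor, whereas duplicating either regulator raises it because the shared targets gain connections.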