Adrian I Campos1,2, Julio A Freyre-González3.
Abstract
Genetic regulatory networks (GRNs) have been widely studied, yet the final size and properties of these networks remain poorly understood, mainly because no network is currently complete. In this study, we analyzed the distribution of GRN structural properties across a large set of distinct prokaryotic organisms and found a set of constrained characteristics such as network density and number of regulators. Our results allowed us to estimate the number of interactions that complete networks would have, a valuable insight that could aid in the daunting task of network curation, prediction, and validation. Using state-of-the-art statistical approaches, we also provided new evidence to settle a previously stated controversy that raised the possibility of complete biological networks being random, which would attribute the observed scale-free properties to an artifact emerging from the sampling process during network discovery. Furthermore, we identified a set of properties that enabled us to assess the consistency of the connectivity distribution for various GRNs against different alternative statistical distributions. Our results favor the hypothesis that highly connected nodes (hubs) are not a consequence of network incompleteness. Finally, an interaction coverage computed for the GRNs as a proxy for completeness revealed that high-throughput based reconstructions of GRNs could yield biased networks with a low average clustering coefficient, showing that classical targeted discovery of interactions is still needed.
Year: 2019 PMID: 30842463 PMCID: PMC6403251 DOI: 10.1038/s41598-019-39866-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Density and number of regulators exhibit trends with completeness in Abasy Atlas GRNs. (a) Relationship between density values and number of genes in the network for all the existing GRNs in Abasy Atlas. (b) Relationship between time, genomic coverage, and density of the E. coli GRN. The dsRNA-GRN includes regulatory RNA interactions; the strong-GRN contains only strongly supported interactions as described in RegulonDB[21,45]. (c) Relationship between the number of genes in a regulatory network and the number of regulators.
Figure 2. Goodness of fit of Abasy Atlas GRN P(k) to alternative probability distributions. (a) Kolmogorov-Smirnov D (KS D) statistic of the GRN P(k) data against the MLE probability distributions. Higher values indicate a higher deviation (worse fit) from the fitted distribution. (b) Log-likelihood ratio test score of power law vs. other distributions. Higher values (red) indicate a preference for power law, while smaller values (blue) indicate a preference for an alternative distribution (y-axis labels). Blank spaces denote non-significant comparisons. All initial parameters of the ER graphs were generated by randomly sampling from the distribution of biologically equivalent measures (see Materials and methods). The scores depicted are the mean of 1000 random sampling experiments using a previously published information retrieval sampling scheme[32]. (c,d) KS D statistic assessing the goodness of fit of Erdős-Rényi graphs (sampled with the information retrieval scheme) to a Poisson (c) and a power-law (d) P(k) distribution. As before, higher values of KS D indicate a worse fit. Results represent the mean of 100 iterations of the sampling scheme for each combination of bait and coverage values. (e) Heatmap depicting the goodness-of-fit differences for the same sampled ER networks. Negative values would indicate a preference for power law, whereas positive values indicate the expected preference for Poisson; all of these differences are statistically significant (see Supplementary Fig. 3). Detailed annotated versions of panels (a) and (b) are available in Supplementary Fig. 4.
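As a rough illustration of the KS D statistic used in this caption, the sketch below fits a Poisson distribution to a degree sample by maximum likelihood and reports the largest gap between the empirical and fitted cumulative distributions. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names and both degree samples are hypothetical, only the Poisson alternative is covered, and the fit uses the whole degree range rather than a tail above a cutoff.

```python
import math
from collections import Counter


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return min(total, 1.0)


def ks_d_poisson(degrees):
    """KS D between the empirical degree CDF and an MLE-fitted Poisson."""
    n = len(degrees)
    lam = sum(degrees) / n  # the MLE of the Poisson rate is the sample mean
    counts = Counter(degrees)
    cumulative = 0
    d = 0.0
    # Both CDFs are step functions that jump only at integers,
    # so checking every integer up to the maximum degree is enough.
    for k in range(0, max(degrees) + 1):
        cumulative += counts.get(k, 0)
        d = max(d, abs(cumulative / n - poisson_cdf(k, lam)))
    return d


# Hypothetical degree samples, not data from the paper.
hub_like = [1] * 90 + [2] * 6 + [40, 50, 60, 70]      # a few strong hubs
homogeneous = [2, 3, 3, 4, 4, 4, 5, 5, 6, 4] * 10     # narrow spread

print("hub-like KS D:   ", round(ks_d_poisson(hub_like), 3))
print("homogeneous KS D:", round(ks_d_poisson(homogeneous), 3))
```

In this toy case the hub-dominated sample deviates far more from its fitted Poisson than the homogeneous one, which is the sense in which a higher KS D indicates a worse fit.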
Figure 3. Property incompatibilities between GRNs and theoretical network null models. (a,b) BA and ER graphs were generated to span the range of densities observed in Abasy Atlas GRNs (no significant difference between the parametrized ER networks and the GRN distribution). (c) Tail length distribution for the networks depicted in (a) (p < 0.001, Mann-Whitney U test). (d) Distribution of the average clustering coefficient for the networks depicted in (b); note the substantial differences between biological GRNs and both null models (p < 0.001, Mann-Whitney U test). (e,f) Properties of the sampled ER networks (previously reported as power law; same networks as above) were calculated. Note that the better the fit to a power law (Fig. 2c–e), the higher the deviation of actual properties such as tail length (e) and average clustering coefficient (f).
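The average clustering coefficient compared in this figure can be computed as sketched below. This is a minimal illustration that assumes the standard definition for an undirected simple graph (the fraction of a node's neighbor pairs that are themselves linked, averaged over all nodes); whether the authors treated edge direction or self-regulation differently is not stated here, and the function name and toy edge lists are hypothetical.

```python
from itertools import combinations


def average_clustering(edges):
    """Mean local clustering coefficient of an undirected simple graph."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # assumption: self-loops (autoregulation) are ignored
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    if not neighbors:
        return 0.0
    total = 0.0
    for nbrs in neighbors.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        # Count neighbor pairs that are directly connected to each other.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(neighbors)


# Toy graphs, not data from the paper.
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
star = [("hub", "x"), ("hub", "y"), ("hub", "z")]

print(average_clustering(triangle))  # every neighbor pair is linked
print(average_clustering(star))      # no neighbor pair is linked
```

A star, where one regulator targets otherwise unconnected genes, gives the lowest possible value, whereas a fully connected triangle gives the highest.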
Figure 4. Prediction of the total number of interactions in a GRN. (a–c) Models to estimate the total number of interactions in a GRN. (a) Edge regression model (EdR). (b) Density invariance model (DI), where Dg was obtained from the average density of the most complete graphs. (c) Density proportionality model (DP), where density is modeled as an exponential decay. (d–f) Dependency between the completeness comparison score and the average clustering coefficient for the different models: edge regression (d), density invariance (e), and density proportionality (f). E. coli and M. tuberculosis GRNs are represented with different colors and markers. The comparison score enables a direct comparison of the GRN completeness as predicted by our interaction coverage (derived from the models) or by the classical genomic coverage approach; it ranges from negative to positive infinity, with negative values indicating that the interaction coverage predicts the GRN to be less complete than the genomic coverage does.
Characteristics and total number of interactions predicted by the density proportionality (DP), density invariance (DI), or edge regression (EdR) approaches.
| Network | Actual number of interactions | Number of genes | Genomic coverage | Total interactions DP* | Density DP* | Total interactions DI* | Density DI* | Total interactions EdR* | Density EdR* |
|---|---|---|---|---|---|---|---|---|---|
| 196627_v2016_s17_eStrong | 2911 | 3138 | 0.708413 | 7422 (4321–13450) | 0.00075 (0.00043–0.00136) | 8866 (4549–13182) | 0.00090 (0.00046–0.00133) | 7457 (6836–8054) | 0.00075 (0.00069–0.00081) |
| 224308_v2016_sSW16 | 3040 | 4421 | 0.423886 | 11277 (6487–20734) | 0.00057 (0.00033–0.00106) | 17599 (9030–26168) | 0.00090 (0.00046–0.00133) | 10639 (9764–11481) | 0.00054 (0.00049–0.00058) |
| 451516_v2015_sRTB13 | 2039 | 2844 | 0.240155 | 6583 (3845–11879) | 0.00081 (0.00047–0.00146) | 7282 (3736–10827) | 0.00090 (0.00046–0.00133) | 6728 (6165–7269) | 0.00083 (0.00076–0.00089) |
| 511145_v2017_sRDB16_dsRNA | 6843 | 4497 | 0.537469 | 11514 (6619–21185) | 0.00056 (0.00032–0.00104) | 18210 (9343–27076) | 0.00090 (0.00046–0.00133) | 10827 (9937–11684) | 0.00053 (0.00049–0.00057) |
| 83332_v2015_s15 | 6572 | 4091 | 0.62112 | 10259 (5917–18800) | 0.00061 (0.00035–0.00112) | 15070 (7732–22407) | 0.00090 (0.00046–0.00133) | 9820 (9011–10599) | 0.00058 (0.00053–0.00063) |
*Estimate (95% confidence interval).
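The density invariance (DI) columns of the table can be approximately reproduced with the short sketch below. It assumes, without confirmation from this record, that density is defined as interactions divided by n(n − 1) for n genes, and it uses the rounded density 0.00090 printed in the table as the invariant density; the function name is hypothetical. Under those assumptions the predicted totals agree with the reported DI estimates to within rounding.

```python
def di_total_interactions(n_genes, invariant_density):
    """Predicted interactions if density stays fixed as the network completes.

    Assumption: density = interactions / (n * (n - 1)).
    """
    return invariant_density * n_genes * (n_genes - 1)


D_G = 0.00090  # rounded "Density DI" value from the table

# (network ID, number of genes, reported DI total), copied from the table.
table_rows = [
    ("196627_v2016_s17_eStrong", 3138, 8866),
    ("224308_v2016_sSW16", 4421, 17599),
    ("451516_v2015_sRTB13", 2844, 7282),
    ("511145_v2017_sRDB16_dsRNA", 4497, 18210),
    ("83332_v2015_s15", 4091, 15070),
]

for network_id, n_genes, reported in table_rows:
    predicted = di_total_interactions(n_genes, D_G)
    rel_diff = abs(predicted - reported) / reported
    print(f"{network_id}: predicted {predicted:.0f}, "
          f"reported {reported}, relative difference {rel_diff:.2%}")
```

The small residual differences are consistent with the table reporting the density rounded to two significant figures.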
Figure 5. The purely high-throughput subset of the E. coli GRN has an unexpectedly low, nonrandom clustering coefficient. Average clustering coefficient distribution of the subsampled networks of E. coli[21]. All networks have the same number of nodes. The number of edges in the random-edge-removal networks and in the purely high-throughput network is the same. The clustering coefficient of the complete M. tuberculosis v2015 GRN (83332_v2015_s15) is depicted in red for comparison.