Alberto J Martin, Sebastián Contreras-Riquelme, Calixto Dominguez, Tomas Perez-Acle.
Abstract
One of the main challenges of the post-genomic era is understanding how gene expression is controlled. Changes in gene expression lie behind diverse biological phenomena such as development, disease and adaptation to different environmental conditions. Despite the availability of well-established methods to identify these changes, tools to discern how gene regulation is orchestrated are still required. The regulation of gene expression is usually depicted as a Gene Regulatory Network (GRN), where changes in the network structure (i.e., network topology) represent adjustments of gene regulation. Like other networks, GRNs are composed of basic building blocks: small induced subgraphs called graphlets. Here we present LoTo, a novel method that uses Graphlet Based Metrics (GBMs) to identify topological variations between different states of a GRN. Under our approach, different states of a GRN are analyzed to determine the types of graphlet formed by all triplets of nodes in the network. Subsequently, graphlets occurring in one state of the network are compared to those formed by the same three nodes in another state of the network. Once the comparisons are performed, LoTo applies metrics from binary classification problems, calculated on the presence and absence of graphlets, to assess the topological similarity between both network states. Experiments performed on randomized networks demonstrate that GBMs are more sensitive to topological variation than the same metrics calculated on single edges. Additional comparisons with other common metrics demonstrate that our GBMs are capable of identifying nodes whose local topology changes between different states of the network. Notably, due to the explicit use of graphlets, LoTo captures topological variations that are disregarded by other approaches. LoTo is freely available as an online web server at http://dlab.cl/loto.
Keywords: Differential analysis; Gene Regulatory Network; Graphlet; Metric
Year: 2017 PMID: 28265516 PMCID: PMC5333545 DOI: 10.7717/peerj.3052
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
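The comparison outlined in the abstract can be illustrated with a minimal Python sketch. It is not LoTo's implementation: the function names (`graphlets`, `compare`, `f1`, `mcc`), the rule that a triplet forms a graphlet when its three nodes are weakly connected, and the choice to count a triplet whose edge pattern differs between states as both a false negative and a false positive are all assumptions made for illustration. The sketch also ignores the distinction between TF-coding and effector genes.

```python
from itertools import combinations
from math import sqrt

def graphlets(nodes, edges):
    """Map each weakly connected node triplet to its directed edge pattern (assumed definition)."""
    edge_set = set(edges)
    found = {}
    for trio in combinations(sorted(nodes), 3):
        pattern = frozenset((u, v) for u in trio for v in trio
                            if u != v and (u, v) in edge_set)
        linked_pairs = {frozenset(edge) for edge in pattern}
        if len(linked_pairs) >= 2:  # three nodes are weakly connected
            found[trio] = pattern
    return found

def compare(nodes, edges_ref, edges_other):
    """Return (TP, FP, TN, FN) counted over node triplets (hypothetical counting rule)."""
    g_ref, g_other = graphlets(nodes, edges_ref), graphlets(nodes, edges_other)
    tp = sum(1 for trio, pattern in g_ref.items() if g_other.get(trio) == pattern)
    fn = len(g_ref) - tp      # graphlets lost or altered in the other state
    fp = len(g_other) - tp    # graphlets gained or altered in the other state
    n = len(nodes)
    total_triplets = n * (n - 1) * (n - 2) // 6
    tn = total_triplets - len(set(g_ref) | set(g_other))
    return tp, fp, tn, fn

def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, fp, tn, fn):
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy example: the second state lacks the regulation b -> c.
nodes = ["a", "b", "c", "d"]
state_ref = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
state_other = [("a", "b"), ("a", "c"), ("c", "d")]
tp, fp, tn, fn = compare(nodes, state_ref, state_other)
graphlet_f1 = f1(tp, fp, fn)
graphlet_mcc = mcc(tp, fp, tn, fn)
```

In this toy case a single removed edge alters two of the three graphlets found in the reference state, which gives an intuition for why triplet-level metrics may react more strongly than edge-level ones. Enumerating all triplets is cubic in the number of nodes, so the sketch is only suitable for small examples.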
Figure 1. All possible realizations of three-node graphlets that can be defined in LoTo.
The direction of each edge indicates the direction of the transcriptional regulation. Black edges denote true interactions, and red dashed edges depict false ones. In this definition, true and false edges are given equal relevance. Adapted from Milo et al. (2002).
Description of graphlet types.
The number of required TF-coding genes, true edges and false edges is shown for each graphlet type.
| Graphlet type | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TF required | 1 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| True edges | 2 | 2 | 3 | 2 | 3 | 4 | 3 | 4 | 3 | 4 | 4 | 5 | 6 |
| False edges | 4 | 4 | 3 | 4 | 3 | 2 | 3 | 2 | 3 | 2 | 2 | 1 | 0 |
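As a quick consistency check on the table above, the following sketch transcribes its rows and verifies that true and false edges always add up to six. The interpretation that six corresponds to the 3 × 2 ordered node pairs of a triplet, with self-loops excluded, is an assumption; the names `GRAPHLET_TYPES` and `edge_slots_consistent` are illustrative.

```python
# Rows transcribed from the table above:
# graphlet type -> (TF required, true edges, false edges).
GRAPHLET_TYPES = {
    1: (1, 2, 4), 2: (2, 2, 4), 3: (2, 3, 3), 4: (2, 2, 4), 5: (2, 3, 3),
    6: (2, 4, 2), 7: (3, 3, 3), 8: (3, 4, 2), 9: (3, 3, 3), 10: (3, 4, 2),
    11: (3, 4, 2), 12: (3, 5, 1), 13: (3, 6, 0),
}

# Assumption: self-loops are not counted, so three nodes give 3 * 2 ordered pairs.
POSSIBLE_EDGES = 3 * 2

def edge_slots_consistent(table):
    """Check that true and false edges fill every possible directed edge slot."""
    return all(n_true + n_false == POSSIBLE_EDGES
               for _, n_true, n_false in table.values())

all_consistent = edge_slots_consistent(GRAPHLET_TYPES)
```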
Graphlet occurrence in the condition-specific GRNs and in the reference network.
| Graphlet type | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Reference | 329819 | 6305 | 1634 | 4338 | 1641 | 488 | 89 | 5 | 0 | 8 | 31 | 3 | 1 |
| Wild-type | 329790 | 6302 | 1634 | 4307 | 1578 | 488 | 89 | 5 | 0 | 8 | 31 | 3 | 1 |
| ompR knock-out | 329685 | 6060 | 1592 | 4154 | 1552 | 485 | 82 | 3 | 0 | 6 | 27 | 3 | 1 |
Characterization of condition-specific GRNs of E. coli.
The number of TF-coding genes (TF), total number of genes (V), existing regulations (EP) and the number of nodes that do not participate in any graphlet (NG) for the two GRNs representing wild-type E. coli and the ompR knock-out.
| GRN | TF | V | EP | NG |
|---|---|---|---|---|
| Wild-type | 196 | 1796 | 4478 | 11 |
| ompR knock-out | 189 | 1787 | 4437 | 11 |
Figure 2. Comparison between single-edge metrics and GBMs.
For each randomization procedure, average values over 1 × 10³ replicas for single-edge (solid blue line) and graphlet-based (solid red line) F1 and MCC are shown at different percentages of randomization. (A and B) show F1 for the SWAP and REMO randomizations, respectively; (C and D) show MCC for the SWAP and REMO randomizations, respectively.
Figure 3. Contribution of each graphlet type to the F1 and MCC GBMs for the two randomization procedures applied to the E. coli reference network.
For each randomization procedure, the plots show the contribution of each graphlet type to the averaged values of each metric over the 1 × 10³ replicas. The X-axis indicates the percentage of randomization, ranging from total randomization on the left-hand side to no variation on the right-hand side. The Y-axis indicates the contribution of each graphlet type to the metric as a percentage. (A and B) show F1 for the SWAP and REMO randomizations, respectively; (C and D) show MCC for the SWAP and REMO randomizations, respectively.
Correlation between differences in node centralities and GBMs for TF-coding genes.
Pearson’s (upper right) and Spearman’s (lower left) correlations computed between node centralities and GBMs calculated for TF-coding genes in the comparison between the GRNs of wild-type E. coli and the ompR knock-out. Centrality metrics are: Average Shortest Path Length (ASPL), Betweenness Centrality (BC), Closeness Centrality (CLC), Clustering Coefficient (CC), Eccentricity (ECC), Neighborhood Connectivity (NC), Stress (STR), Degree (DEG, sum of outdegree and indegree), Outdegree (ODE), and Indegree (IDE). GBMs are F1 and MCC. Statistically significant correlation coefficients (p-value ≤ 0.01) are shown in bold and their backgrounds are shaded in gray.
| | ASPL | BC | CLC | CC | ECC | NC | STR | DEG | ODE | IDE | F1 | MCC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ASPL | – | 0.058 | 0.042 | − | − | |||||||
| BC | – | −0.012 | 0.002 | 0.108 | −0.017 | 0.072 | 0.028 | 0.135 | 0.012 | 0.014 | ||
| CLC | – | −0.018 | −0.176 | −0.174 | ||||||||
| CC | – | 0.001 | −0.034 | −0.029 | ||||||||
| ECC | – | 0.092 | − | − | ||||||||
| NC | – | −0.022 | −0.003 | −0.001 | ||||||||
| STR | – | 0.057 | 0.011 | 0.127 | 0.017 | 0.019 | ||||||
| DEG | – | − | −0.173 | |||||||||
| ODE | 0.109 | – | − | − | ||||||||
| IDE | – | −0.034 | −0.030 | |||||||||
| F1 | − | − | − | − | − | − | − | − | − | − | – | |
| MCC | − | − | − | − | − | − | − | − | − | − | – |
TF-coding nodes identified by centralities and graphlet based F1.
The table shows confusion matrices of TF-coding genes whose variation in local topology was identified by differences in the centrality metrics and by the graphlet-based F1. This table was built from the comparison between the GRNs of E. coli for the wild-type and ompR knock-out conditions. In this case, nodes identified by both approaches are considered TPs; those whose topological variation was identified only by a change in node centrality are FPs; and those identified solely by F1 are FNs. Nodes that do not show any variation in their topology are TNs. Centrality metrics are: Average Shortest Path Length (ASPL), Betweenness Centrality (BC), Closeness Centrality (CLC), Clustering Coefficient (CC), Eccentricity (ECC), Neighborhood Connectivity (NC), Stress (STR), Degree (DEG, sum of outdegree and indegree), Outdegree (ODE), and Indegree (IDE).
| Centrality | TP | FP | TN | FN |
|---|---|---|---|---|
| ASPL | 40 | 8 | 131 | 18 |
| BC | 35 | 45 | 94 | 23 |
| CLC | 40 | 8 | 131 | 18 |
| CC | 23 | 1 | 138 | 35 |
| ECC | 11 | 1 | 138 | 47 |
| NC | 51 | 1 | 138 | 6 |
| STR | 30 | 8 | 131 | 28 |
| DEG | 14 | 1 | 138 | 44 |
| IDE | 13 | 1 | 138 | 45 |
| ODE | 8 | 1 | 138 | 50 |
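One possible way to read the confusion matrices above is to ask what fraction of the nodes flagged by the graphlet-based F1 (TP + FN) is also flagged by each centrality, and what fraction of the nodes flagged by a centrality (TP + FP) is confirmed by F1. The sketch below transcribes the table rows and computes both ratios; the names `CONFUSION`, `recovered_fraction` and `confirmed_fraction` are illustrative and do not come from the source.

```python
# Rows transcribed from the table above: centrality -> (TP, FP, TN, FN).
CONFUSION = {
    "ASPL": (40, 8, 131, 18), "BC": (35, 45, 94, 23), "CLC": (40, 8, 131, 18),
    "CC": (23, 1, 138, 35), "ECC": (11, 1, 138, 47), "NC": (51, 1, 138, 6),
    "STR": (30, 8, 131, 28), "DEG": (14, 1, 138, 44), "IDE": (13, 1, 138, 45),
    "ODE": (8, 1, 138, 50),
}

def recovered_fraction(tp, fp, tn, fn):
    """Share of F1-identified nodes that the centrality also identifies: TP / (TP + FN)."""
    return tp / (tp + fn)

def confirmed_fraction(tp, fp, tn, fn):
    """Share of centrality-identified nodes that F1 also identifies: TP / (TP + FP)."""
    return tp / (tp + fp)

summary = {
    name: (recovered_fraction(*row), confirmed_fraction(*row))
    for name, row in CONFUSION.items()
}

for name, (recovered, confirmed) in summary.items():
    print(f"{name}: recovered={recovered:.3f}, confirmed={confirmed:.3f}")
```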
Figure 4. ompR subnetwork.
Subnetwork formed by all graphlets in which ompR (red node) participates, showing the comparison between the wild-type and the ompR knock-out GRNs. The subnetwork elements are displayed using different colors for TF-coding genes and effector genes. TP elements are those present in both networks being compared, FN elements are those present only in the wild-type network, and FP elements are those present only in the ompR knock-out network. The small inset represents the subnetwork formed by only the direct neighbors of ompR in the comparison, using the same coloring scheme.