Pierre Blavy, Florence Gondret, Sandrine Lagarrigue, Jaap van Milgen, Anne Siegel.
Abstract
BACKGROUND: Most of the existing methods to analyze high-throughput data are based on gene ontology principles, providing information on the main functions and biological processes. However, these methods do not indicate the regulations behind the biological pathways. A critical point in this context is the extraction of information from many possible relationships between the regulated genes, and its combination with biochemical regulations. This study aimed at developing an automatic method to propose a reasonable number of upstream regulatory candidates from lists of various regulated molecules by confronting experimental data with encyclopedic information.
Year: 2014 PMID: 24635915 PMCID: PMC4004165 DOI: 10.1186/1752-0509-8-32
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Figure 1. A new unified formalism merging reactions and effects and its interpretation in a causality graph. This graphical scheme illustrates the successive steps in the conversion of irreversible reactions available in an encyclopedic database into a causality graph. (1) Common knowledge: encyclopedic information about mammalian cell metabolism and its regulation generally refers to two different concepts: i) the effects (r1), which describe the consequence of the variation induced by a regulator (e.g., a, a transcription factor; e, an enzyme) on reactions (e.g., rate of transcription, speed of reaction), and ii) the reactions (r2), which catabolize substrates (e.g., b or c) to produce newly formed molecules (products; d). Regulators may have a positive (activator) or a negative (inhibitor) effect, or may affect the reaction with a sign that is not yet properly referenced (modulator). (2) Unified formalism: a new common formalism of “regulated reactions” was proposed. The effects were modeled as reactions that were regulated by a regulator and that produced a regulated product from a non-limiting unknown substrate that was not modeled. This formalism clearly distinguished between the fluxes of substrates and products, on the one hand, and the reaction speed, on the other, the latter depending on the amounts of regulators. (3) Causality graph: under the quasi-stationary hypothesis, the reaction speed was described as an increasing function of the availability of each substrate and of the amount of each activator, a decreasing function of the amount of each inhibitor, and a monotonic function (of unknown direction) of the amount of each modulator. The causality graph described the variations in the amounts of molecules, the speeds of reactions, and the fluxes (as nodes), and predicted their consequences as edges (positive: +, negative: -, or unknown: ?). Both regulated reactions and reactions that were not explicitly regulated could be considered in this formalism.
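The conversion described in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `RegulatedReaction`, the function `causality_edges`, the node naming scheme, and the wiring of r1 and r2 are all assumptions made for illustration, and only the edge signs stated in the caption are encoded (substrates and activators increase the reaction speed, inhibitors decrease it, modulators act with an unknown sign, and the speed increases the amount of each product).

```python
from dataclasses import dataclass, field


@dataclass
class RegulatedReaction:
    """Hypothetical container for one regulated reaction."""
    name: str
    substrates: list = field(default_factory=list)
    products: list = field(default_factory=list)
    activators: list = field(default_factory=list)
    inhibitors: list = field(default_factory=list)
    modulators: list = field(default_factory=list)


def causality_edges(reaction):
    """Return signed edges (source, target, sign) for one reaction.

    Nodes are (kind, name) tuples; this naming is an assumption.
    """
    speed = ("speed", reaction.name)
    edges = []
    for s in reaction.substrates:
        # Speed increases with the availability of each substrate.
        edges.append((("availability", s), speed, "+"))
    for a in reaction.activators:
        edges.append((("quantity", a), speed, "+"))
    for i in reaction.inhibitors:
        edges.append((("quantity", i), speed, "-"))
    for m in reaction.modulators:
        # Modulators act with a sign that is not referenced.
        edges.append((("quantity", m), speed, "?"))
    for p in reaction.products:
        # A faster reaction yields more of each product.
        edges.append((speed, ("quantity", p), "+"))
    return edges


# Illustrative wiring only: an effect r1 (regulator a, product e, no
# modeled substrate) and a reaction r2 (b + c -> d, activated by e).
r1 = RegulatedReaction("r1", products=["e"], activators=["a"])
r2 = RegulatedReaction("r2", substrates=["b", "c"], products=["d"],
                       activators=["e"])
graph = causality_edges(r1) + causality_edges(r2)
```

In this sketch an effect is simply a regulated reaction with an empty substrate list, which mirrors the idea of a non-limiting substrate that is not modeled.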
Modeling encyclopedic information as a set of regulated reactions used to build a causality graph
| Encyclopedic information (a) | Count | Regulated reactions (b) | Count | Causality graph (c) | Count |
| --- | --- | --- | --- | --- | --- |
| Metabolites | 122,591 | Metabolites, genes | 132,762 | Quantity | 92,872 |
| Genes | 35,594 | | | Availability | 151,127 |
| | | Reversible reactions | 40,541 | Reaction speed | 158,554 |
| | | Irreversible reactions | 118,013 | | |
| Reversible reactions | 41,278 | Substrates | 147,009 | + | 1711,844 |
| Irreversible reactions | 110,838 | Products | 168,748 | - | 104,538 |
| Positive effects | 2,493 | Activators | 72,899 | ? | 18,636 |
| Negative effects | 854 | Inhibitors | 960 | | |
| Unknown effects | 10,251 | Modulators | 18,350 | | |
| Gene to protein | 59,956 | | | | |
(a) Obtained from the TRANSPATH database, which includes a description of biochemical reactions, protein-protein interactions, and transcription factors involved in signal transduction in mammalian cells.
(b) Reactions and effects (i.e., causal dependencies) were unified in the concept of regulated reactions, each of which corresponded to a set of substrates, products and regulators (activators, inhibitors or modulators). These regulators included transcription factors (treating the regulated gene as the product of a reaction using a non-limiting unknown substrate) and enzymes (catalyzing biochemical reactions between substrates and products). A Boolean attribute was added to distinguish between reversible and irreversible reactions.
(c) The regulated reactions were converted into a causality graph to model the variations in the amounts of molecules, fluxes and reaction speeds (nodes), and to predict their consequences (edges). Because nodes were shared between various regulated reactions, the conversion of the regulated reaction network into the causality graph led to a large increase in the number of nodes and edges.
Topological properties of regulated reaction networks and causality graphs built from various experimental lists
| Hubs ignored (c), network and parameter (d, e) | List 1 (a), level 1 (b) | List 1, level 2 | List 1, level 3 | List 2, level 1 | List 2, level 2 | List 2, level 3 | List 3, level 1 | List 3, level 2 | List 3, level 3 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 hubs ignored: regulated reaction network | | | | | | | | | |
| Molecules, % | 1.9 | 17 | 52 | 0.8 | 10 | 46 | 0.1 | 1.8 | 14 |
| Reactions, % | 1.1 | 16 | 59 | 0.6 | 9.2 | 53 | 0.05 | 1.6 | 15 |
| γ | 2.1 | 2.2 | 2.3 | 2.3 | 2.2 | 2.3 | 1.9 | 2.3 | 2.2 |
| r | 84 | 93 | 96 | 89 | 90 | 96 | 92 | 94 | 94 |
| L | 3.6 | 4.2 | - | 4.4 | 4.1 | - | 2.9 | 4.8 | - |
| D | 6 | 9 | - | 8 | 14 | - | 4 | 11 | - |
| 0 hubs ignored: causality graph | | | | | | | | | |
| Nodes, % | 1.8 | 19 | 61 | 0.6 | 10 | 55 | 0.09 | 1.4 | 16 |
| Edges, % | 2.6 | 28 | 13 | 0.1 | 16 | 72 | 0.06 | 0.3 | 19 |
| γ | 1.69 | 1.87 | 1.94 | 2.35 | 1.87 | 1.93 | 1.67 | 2.35 | 1.91 |
| r | 79 | 78 | 85 | 91 | 66 | 85 | 83 | 94 | 75 |
| 100 hubs ignored: regulated reaction network | | | | | | | | | |
| Molecules, % | 1.4 | 11 | 44 | 0.8 | 6.9 | 35 | 0.1 | 1.8 | 12 |
| Reactions, % | 0.8 | 8 | 49 | 0.6 | 5.7 | 38 | 0.05 | 1.6 | 11 |
| γ | 2.10 | 2.25 | 2.30 | 2.34 | 2.21 | 2.30 | 1.89 | 2.30 | 1.24 |
| r | 87 | 97 | 98 | 89 | 96 | 98 | 92 | 94 | 97 |
| L | 3.8 | 4.5 | 7.7 | 4.4 | 4.4 | - | 2.9 | 4.8 | 4.4 |
| D | 6 | 9 | 12 | 8 | 14 | - | 4 | 11 | 16 |
| 100 hubs ignored: causality graph | | | | | | | | | |
| Nodes, % | 1.2 | 10 | 50 | 0.6 | 6.7 | 40 | 0.09 | 1.4 | 13 |
| Edges, % | 1.3 | 7.1 | 41 | 0.1 | 3.5 | 29 | 0.06 | 0.3 | 6.6 |
| γ | 1.73 | 1.97 | 2.03 | 2.35 | 2.06 | 2.04 | 1.67 | 2.35 | 2.09 |
| r | 84 | 93 | 91 | 91 | 89 | 91 | 83 | 94 | 93 |
| 1000 hubs ignored: regulated reaction network | | | | | | | | | |
| Molecules, % | 0.7 | 3.8 | 20 | 0.8 | 3.6 | 16 | 0.1 | 1.8 | 7.1 |
| Reactions, % | 0.4 | 1.9 | 22 | 0.6 | 2.4 | 14 | 0.05 | 1.6 | 5.8 |
| γ | 2.15 | 2.21 | 2.42 | 2.34 | 2.24 | 2.38 | 1.89 | 2.30 | 2.28 |
| r | 94 | 97 | 96 | 89 | 97 | 96 | 92 | 94 | 97 |
| L | 3.8 | 4.4 | - | 4.4 | 5.0 | 5.1 | 2.9 | 4.8 | - |
| D | 6 | 9 | - | 8 | 15 | 13 | 4 | 11 | - |
| 1000 hubs ignored: causality graph | | | | | | | | | |
| Nodes, % | 0.6 | 3.0 | 23 | 0.6 | 3.0 | 15 | 0.09 | 1.4 | 6.6 |
| Edges, % | 0.6 | 2.2 | 8.9 | 0.1 | 1.0 | 5.3 | 0.06 | 0.3 | 2.3 |
| γ | 1.72 | 1.88 | 2.34 | 2.35 | 2.28 | 2.03 | 1.67 | 2.35 | 2.35 |
| r | 86 | 87 | 97 | 91 | 97 | 91 | 83 | 94 | 97 |
(a) Three case-study situations were analyzed using lists that differed in the number and nature of their targets: i) list 1 included 250 unique genes targeted by PPARA [20], ii) list 2 included 136 gene transcripts that were either up- or down-regulated after addition of PPARA agonists in cell culture [21], and iii) list 3 consisted of seven metabolites involved in the successive steps of glycolysis in mammalian cells [32]. Details are provided in Additional file 1: Table S1.
(b) The first-level neighborhood was obtained by taking all the reactions in which the input molecules were involved. The second level was obtained by adding all the new molecules involved in these reactions. The third level was an iteration of this procedure.
(c) Throughout the neighborhood computation, the n molecules involved in the largest number of reactions in cell metabolism were ignored. In the tested situations, n was 0, 100 or 1,000. After neighborhood computation, these molecules (so-called “hubs” because they shared many relationships) were added to the network only when they participated in reactions selected from the input lists.
(d) The proportion (%) of molecules and reactions (nodes and edges, respectively) of the full graph that were selected in the regulated reaction network (causality graph, respectively) was calculated.
(e) The topological properties of the network were analyzed using different network statistical parameters. An r value close to 1 indicates that the graph was scale-free. Assuming that the probability P(k) that a molecule in a network interacts with k other molecules follows a power law, $P(k) \sim k^{-\gamma}$, a high γ value indicates that there were few highly connected nodes in the network. The quantity L denotes the average shortest path length between two nodes A and B, and D corresponds to the graph diameter. Within a row, the sign “-” indicates that the computation of these parameters had failed. This analysis shows that the conversion of regulated reactions into a causality graph preserved the original structure of the network.
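The neighborhood extraction with hub filtering described in the notes above can be sketched as follows. This is one possible reading of the procedure, not the authors' code: the function names, the dictionary layout (reaction name mapped to a set of molecules), the alternation between "add reactions" and "add molecules" at successive levels, the decision to always keep the input molecules, and the toy reactions are all assumptions.

```python
from collections import Counter


def top_hubs(reactions, n):
    """Return the n molecules that take part in the most reactions."""
    counts = Counter(m for mols in reactions.values() for m in mols)
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(counts, key=lambda m: (-counts[m], m))
    return set(ranked[:n])


def neighborhood(reactions, inputs, levels, n_hubs=0):
    """Grow a reaction neighborhood around the input molecules.

    Odd levels select every reaction touching the current molecules;
    even levels add the new (non-hub) molecules of those reactions.
    """
    hubs = top_hubs(reactions, n_hubs)
    molecules = set(inputs)  # assumption: inputs are kept even if hubs
    selected = set()
    for level in range(1, levels + 1):
        if level % 2 == 1:
            selected |= {r for r, mols in reactions.items()
                         if mols & molecules}
        else:
            for r in selected:
                molecules |= reactions[r] - hubs
    # Hubs re-enter only as participants of the selected reactions.
    all_molecules = set().union(*(reactions[r] for r in selected))
    return selected, all_molecules


# Toy network (hypothetical reactions, for illustration only).
toy = {
    "r1": {"glucose", "ATP", "G6P"},
    "r2": {"G6P", "F6P"},
    "r3": {"ATP", "X", "Y"},
    "r4": {"F6P", "ATP", "FBP"},
}
with_hub_filter = neighborhood(toy, {"glucose"}, levels=3, n_hubs=1)
without_filter = neighborhood(toy, {"glucose"}, levels=3, n_hubs=0)
```

In the toy network, ATP takes part in three of the four reactions. Ignoring it keeps the third-level neighborhood to r1 and r2, whereas leaving it in pulls in every reaction, which illustrates why ignoring hubs reduces the selected fraction of the graph.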
Success in retrieving the known transcription factor regulating an input list of its gene targets
| 1 | 11.9 | 12.2 | 13.2 | 14.4 |
| 10 | 36.3 | 35.9 | 39.0 | 40.3 |
| 20 | 44.1 | 45.1 | 47.8 | 49.9 |
| 200 | 75.5 | 75.9 | 78.8 | 79.7 |
| 500 | 82.5 | 82.6 | 86.2 | 86.5 |
| 1000 | 83.7 | 83.7 | 87.6 | 87.7 |
| | ||||
| 1 | 0.0 | 0.0 | 0.0 | 0.4 |
| 10 | 0.4 | 1.2 | 1.6 | 2.4 |
| 20 | 0.4 | 1.2 | 2.4 | 4.0 |
| 200 | 5.6 | 7.2 | 14.8 | 18.8 |
| 500 | 11.6 | 13.2 | 27.2 | 28.0 |
| 1000 | 20.0 | 21.2 | 38.4 | 38.8 |
(a) The rate of success corresponds to the number of situations, out of 250 tests, in which the known transcription factor (TF) referenced in the transcriptional regulatory database (TRED) [17] was present among the n regulatory candidates that were automatically provided. Candidates were scored for coverage (i.e., the ability to explain the greatest number of targets) or for specificity (i.e., a tradeoff between the number of regulated targets and the total number of regulated molecules). Because the known TF can have the same score as a set of other candidates (i.e., ex aequo), the probability of finding the known TF among the candidates was estimated under the hypothesis that ex aequo candidates were randomly ordered. The results indicate that the rate of success increased with the number (n) of candidates considered.
(b) A total of 250 different lists of genes, each regulated by a known transcription factor, were automatically extracted from TRED. These lists contained between 1 and 352 target genes (see Additional file 2: Table S2). In a first step, each list of genes targeted by a known TF was successively submitted to analysis. In a second step, all genes included in these lists were randomly shuffled to constitute 250 lists of biologically non-relevant targets.
(c) The results indicate that the rate of success was reasonable when 50 to 100 candidates in the answer sets were considered (as indicated in bold face).
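The treatment of ex aequo candidates in note (a) can be made concrete with a short sketch. It assumes that ties are broken uniformly at random, as the note states; the function name `prob_in_top_n` and the example scores are hypothetical. If `better` candidates score strictly higher than the known TF and `tied` candidates (the TF included) share its score, the TF lands in the first n positions with probability (n - better) / tied, clipped to the interval [0, 1].

```python
def prob_in_top_n(scores, known, n):
    """Probability that `known` is among the first n candidates when
    candidates with equal scores are ordered at random."""
    s = scores[known]
    better = sum(1 for v in scores.values() if v > s)
    tied = sum(1 for v in scores.values() if v == s)  # includes `known`
    slots = n - better  # top-n positions left for the tied block
    return min(1.0, max(0.0, slots / tied))


# Hypothetical scores: one candidate beats the TF, two are tied with it.
example_scores = {"TF": 5, "A": 7, "B": 5, "C": 5, "D": 1}
p_top2 = prob_in_top_n(example_scores, "TF", 2)
```

Averaging this probability over all tested lists would give a success rate of the kind reported in the table, under the stated tie-breaking assumption.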
PPARs and other regulatory partners were automatically proposed from experimental lists of gene targets
| RXRA: PPARG | [4-29] (c) | RXRA | [2-20] | RXRA: PXR-isoform1A | [1] | RXRA: PPARA | [4] |
| RXRA: PPARA | [4-29] | RXRA: VDR | [2-20] | RXR: VDR | [3] | RXRA: PPARD | [5-8] |
| RXRA: PPARD | [4-29] | | | RXRA: PPARD | [4] | RXRG: PPARA | [5-8] |
| RXRG: PPARA | [4-29] | | | RXRA: PPARG | [14] | RXRG: PPARD | [5-8] |
| RXRG: PPARD | [4-29] | | | | | RXRA: PPARG | [9,10] |
| RXRG: PPARG | [4-29] | | | | | RXRG: PPARG | [9,10] |
| RXRG: PPARG | [4-29] | | | | | | |
| VDR | [30-2223] | VDR | [2-20] | VDR: RXRA | [3] | | |
| VDR: RXRA | [30-2223] | VDR: RXRA | [2-20] | | | | |
| VDR: calcitriol | [30-2223] | | | | | | |
| VDR: calcitriol: 9-cis-retinoic acid: RXRA | [30-2223] | | | | | | |
| VDR: BLM | [30-2223] | | | | | | |
| PPARA: RXRA | [4-29] | PPARA | [21-1702] | PPARA: RXRA | [2] | PPARA: RXRA | [4] |
| PPARA: RXRG | [4-29] | PPARA: RXRA | [21-1702] | | | | |
| PPARG: RXRA | [4-29] | PPARG: abietic acid | [21-1702] | PPARG: RXRA | [14] | PPARG: RXRA | [9,10] |
| PPARG: RXRG | [4-29] | PPARG: 15d-PGJ2 | [21-1702] | | | PPARG: RXRG | [9,10] |
| | | PPARG: azPC | [21-1702] | | | PPARG | [12] |
| PPARD: RXRA | [4-29] | PPARD | [21-1702] | PPARD: RXRA | [4] | PPARD: RXR | [5-8] |
(a) Detailed lists are provided in Additional file 1: Table S1, with i) list 1 including 250 regulated genes identified as controlled by PPARA from a literature review [20] and ii) list 2 including 136 differentially expressed genes in response to PPARA agonists in cell culture [21].
(b) Regulatory candidates for these lists were elicited by an automatic algorithm based on encyclopedic information extracted from the TRANSPATH database and modeled as a causality graph. The candidates were scored for coverage (i.e., the number of targets regulated by a given candidate) or for specificity (i.e., a tradeoff between the number of regulated targets and the total number of regulated molecules), and the 50 candidates with the highest scores were retained.
(c) A range [N1-N2] indicates that the gene, or the protein complex in which it participated, shared its score with a set of N2 - N1 + 1 molecules. When a molecule or a complex appeared more than once, only its best position was shown.
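The coverage score and the tied rank ranges used in this table can be sketched as follows. Only coverage is implemented, because the exact specificity formula is not given here. The function names, the mapping from candidates to the molecules they regulate, and the toy candidates and genes are hypothetical.

```python
def coverage_scores(regulates, targets):
    """Coverage: number of input targets regulated by each candidate."""
    targets = set(targets)
    return {cand: len(regs & targets) for cand, regs in regulates.items()}


def rank_ranges(scores):
    """Map each candidate to "[k]" or "[first-last]" when it shares
    its score with other candidates."""
    ordered = sorted(scores.items(), key=lambda kv: -kv[1])
    ranges = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to the last candidate holding the same score.
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        label = f"[{i + 1}]" if i == j else f"[{i + 1}-{j + 1}]"
        for name, _ in ordered[i:j + 1]:
            ranges[name] = label
        i = j + 1
    return ranges


# Hypothetical candidates and targets, for illustration only.
toy_regulates = {
    "cand_A": {"g1", "g2", "g3"},
    "cand_B": {"g1", "g2"},
    "cand_C": {"g2", "g3"},
    "cand_D": {"g9"},
}
toy_scores = coverage_scores(toy_regulates, ["g1", "g2", "g3"])
toy_ranges = rank_ranges(toy_scores)
```

Here `cand_B` and `cand_C` both cover two targets, so each is reported with the shared range "[2-3]", in the same way as the bracketed ranges in the table.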
Functional clusters among target genes or their regulatory candidates
| Cluster 1: | Cluster 1: |
| Cluster 2: | Cluster 2: |
| Cluster 4: | Cluster 3: |
| Cluster 7: | Cluster 4: |
| Cluster 3: | Cluster 6: |
| | Cluster 5: |
| | Cluster 8: |
| Cluster 6: | Cluster 7: |
| Cluster 5: | |
| Cluster 9: | |
(a) Transcripts experimentally identified as responsive to PPARA agonists in cell culture [21] (see Additional file 2: Table S2 for details) were clustered using the DAVID functional annotation tool [5]. Only clusters with an enrichment > 0.5 and a Benjamini score < 0.15 were kept. Finally, a synthetic description was chosen to name each cluster.
(b) Fifty upstream regulatory candidates that were automatically proposed according to the best specificity scores (see Additional file 2: Table S2 for a detailed list) were clustered. All molecules were first linked to their related genes, with heterodimer protein complexes (e.g., PPARA:RXRA) being split into their two genes (e.g., PPARA and RXRA).
(c) The results show that the input list of regulated transcripts and the output list of proposed candidates notably shared clusters related to fatty acid and cholesterol metabolism. All clusters calculated from the list of regulatory candidates were enriched in transcription factors, especially in peroxisome proliferator-activated receptor (PPAR) isotypes and their RXR partners (indicated in bold face).
Upstream candidates of a list of metabolites participating in glycolysis
| | ||
| Triose-phosphate isomerase 1 | Carbohydrate metabolic process | |
| Pyruvate carboxylase | Carbohydrate metabolic process | |
| Glucokinase (hexokinase) | Carbohydrate metabolic process | |
| Glucose-6-phosphatase | Carbohydrate metabolic process | |
| Glucose-6 phosphate isomerase | Carbohydrate metabolic process | |
| Aldose reductase | Carbohydrate metabolic process | |
| Serine dehydratase | Gluconeogenesis | |
| Glucosidase | Carbohydrate metabolic process | |
| Glucosamine-6-phosphate deaminase | Carbohydrate metabolic process | |
| Adenylate kinase | ATP metabolic process | |
| Catalase | Cellular response to growth factor stimulus | |
| | ||
| Facilitated glucose transporter member 2 (GLUT2) | Carbohydrate metabolic process | |
| Facilitated glucose transporter member 4 (GLUT4) | Carbohydrate metabolic process | |
| Heat shock 70 kDa protein 5 (glucose-regulated protein) | Activation of signaling protein activity involved in unfolded protein response | |
| Glutamate receptor | Calcium ion transport | |
| Wolfram syndrome 1 | Calcium ion homeostasis, glucose homeostasis | |
| B-cell CLL/lymphoma 2 | B-cell homeostasis | |
| | ||
| Mitogen-activated protein kinase 8 (JNK1) | JNK cascade | |
| Mitogen-activated protein kinase kinase kinase 7 | Activation of MAPK activity; JNK cascade | |
| | ||
| DNA damage inducible transcript 3 | Negative regulation of transcription, DNA-dependent; activation of signaling protein activity involved in unfolded protein response | |
| X-box binding protein | Activation of signaling protein activity involved in unfolded protein response | |
(a) The input target list included the 7 metabolites in the successive steps of the glycolytic pathway in mammalian cells [32]: Glucose; Glucose-6-phosphate; D-fructose-6-phosphate; Fructose-1,6-biphosphate-1; D-glyceraldehyde-3-phosphate; Phosphoenol-pyruvate and Pyruvate. They were assumed to have an increased abundance (i.e., a positive sign) in the situation tested.
(b) The regulatory candidates were proposed by confronting the target list with encyclopedic information within a causality graph. Gene ontology (GO) terms for biological process and function were then associated with each candidate.
Figure 2. Finding upstream regulators of various metabolites. This scheme illustrates how an input list of metabolites and the output list of regulatory candidates at the enzymatic and molecular levels were positioned together. A list of 7 metabolites participating in glycolysis (GLUC: glucose; G6P: glucose-6-phosphate; F6P: fructose-6-phosphate; F-1,6P: fructose 1,6 diphosphate; G3P: glyceraldehyde-3 phosphate; PEP: phosphoenolpyruvate; PYR: pyruvate) was submitted as input (i.e., molecules for which common or specific upstream regulators should be identified). All these metabolites were assumed to be increasingly abundant in response to an unknown external factor. Regulated reactions in which these metabolites were involved and their neighbor molecules (up to 3 levels of neighboring) were extracted from an encyclopedic database on mammalian signaling pathways and converted into a causality graph. The proposed candidates included transporters (glucose binding; GLUT2/SLC2A2 and GLUT4/SLC2A4), catalytic enzymes (GCK: glucokinase; GPI: glucose-6 phosphate isomerase; TPI: triose phosphate isomerase; PC: pyruvate carboxylase; G6PC: glucose-6-phosphatase; SDS: serine dehydratase) and integrative actors (MAP3K7; MAPK8/JNK1) in glucose homeostasis and insulin signaling.