S Döhr, A Klingenhoff, H Maier, M Hrabé de Angelis, T Werner, R Schneider.
Abstract
Pathway- or disease-associated genes may participate in more than one transcriptional co-regulation network. Such gene groups can be readily obtained by literature analysis or by high-throughput techniques such as microarrays or protein-interaction mapping. We developed a strategy that defines regulatory networks by in silico promoter analysis, finding potentially co-regulated subgroups without a priori knowledge. Pairs of transcription factor binding sites conserved in orthologous genes (vertically) as well as in promoter sequences of co-regulated genes (horizontally) were used as seeds for the development of promoter models representing potential co-regulation. This approach was applied to a Maturity Onset Diabetes of the Young (MODY)-associated gene list, which yielded two models connecting functionally interacting genes within MODY-related insulin/glucose signaling pathways. Additional genes functionally connected to our initial gene list were identified by database searches with these promoter models. Thus, data-driven in silico promoter analysis allowed integrating molecular mechanisms with biological functions of the cell.
Year: 2005 PMID: 15701758 PMCID: PMC549397 DOI: 10.1093/nar/gki230
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. General strategy for problem-oriented promoter modeling. The bold numbers to the left of the short descriptions indicate the different steps of the strategy and correspond to the numbering used in Methods and Results. Step 2 indicates selection of orthologous promoters. Genes are symbolized by squares and the three species used are indicated (human, mouse, rat). Step 3 symbolizes the generation of models, each containing two transcription factor binding sites (TFBSs), from orthologous promoter sets of individual genes obtained in Step 2. Horizontal optimization is done in Steps 4–6 across promoters from the initial problem-specific gene list (IPL). The links between promoter models and the functional association of genes in the cell are symbolized at the bottom (Step 8). For details of our application example, see Figure 4.
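The vertical step (Step 3) can be illustrated with a minimal sketch. It assumes each orthologous promoter has already been scanned for TFBSs and is given as a list of `(family, strand, position)` tuples. The family names, positions, distance limit and tolerance below are invented for illustration and are not taken from the paper, and the function names are hypothetical rather than part of any Genomatix tool.

```python
from itertools import combinations

def site_pairs(sites, max_distance=200):
    """Map (family_a, strand_a, family_b, strand_b) -> list of distances
    for all ordered site pairs lying within max_distance bp."""
    pairs = {}
    ordered = sorted(sites, key=lambda s: s[2])
    for (fam_a, str_a, pos_a), (fam_b, str_b, pos_b) in combinations(ordered, 2):
        distance = pos_b - pos_a
        if 0 < distance <= max_distance:
            pairs.setdefault((fam_a, str_a, fam_b, str_b), []).append(distance)
    return pairs

def conserved_pairs(promoters, tolerance=10, max_distance=200):
    """Return pairs found in every promoter with the same order and strands
    and with distances agreeing within `tolerance` bp."""
    per_promoter = [site_pairs(s, max_distance) for s in promoters.values()]
    shared = set(per_promoter[0]).intersection(*per_promoter[1:])
    result = {}
    for key in shared:
        # Use a distance from the first promoter as the reference spacing.
        for ref in per_promoter[0][key]:
            if all(any(abs(d - ref) <= tolerance for d in p[key])
                   for p in per_promoter):
                result[key] = ref
                break
    return result

# Hypothetical TFBS matches for one gene's orthologous promoter set.
promoters = {
    "human": [("FAM_A", "+", 100), ("FAM_B", "-", 130), ("FAM_C", "+", 300)],
    "mouse": [("FAM_A", "+", 90), ("FAM_B", "-", 124), ("FAM_D", "+", 400)],
    "rat":   [("FAM_A", "+", 95), ("FAM_B", "-", 127)],
}

print(conserved_pairs(promoters))
```

Each surviving pair would be a candidate 2-TFBS seed model. The horizontal steps could reuse the same pair representation across promoters of different genes in the IPL, presumably requiring a match in only a subset of them rather than in all.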
Figure 4. Functional association between the biological networks and promoter model-derived regulatory networks. The gray arc symbolizes the cell membrane. Dark gray symbols indicate gene products. Membrane receptors are shown inserted into the membrane (with symbolized ligand docking site outside the membrane); ion channels are shown as bipartite structures crossing the membrane; gray circles indicate intracellular proteins. The functional connections between the genes from the IPL were derived by BiblioSphere™ analysis and are indicated by gray arrows; ‘?’ indicates putative connections. M1,2,3 above the gene symbols indicates that models 1, 2 and 3 all match within the promoter of the respective gene. Shaded areas underlying the graphics indicate potential regulatory networks, which are linked by shared promoter models (regulatory network M1b and regulatory network M5a).
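Figure 4 labels each gene with the models that match its promoter. That gene-to-model mapping can be rebuilt from the match lists of the five 2-TFBS models reported in the "Model evaluation" table. The sketch below only inverts those lists; the function and variable names are illustrative assumptions.

```python
from collections import defaultdict

# Matches of models M1-M5 within the IPL, copied from the "Model evaluation" table.
model_matches = {
    "M1": ["KCNJ11", "ABCC8", "ANXA7", "GCGR", "INSRR", "IRS1", "ITPR3", "KCNJ3"],
    "M2": ["ABCC8", "ANXA7", "CACNA1H", "GIPR", "IGF1R", "KCNJ11", "LEPR",
           "PCSK1", "PCSK2"],
    "M3": ["GIPR", "KCNJ3", "CACNA1H", "IRS1", "KCNJ11"],
    "M4": ["GCG", "ANXA7", "INSR"],
    "M5": ["GLP1R", "ABCC8", "GIPR", "INS", "PCSK1", "PCSK2"],
}

def models_per_gene(matches):
    """Invert model -> genes into gene -> set of matching models."""
    gene_models = defaultdict(set)
    for model, genes in matches.items():
        for gene in genes:
            gene_models[gene].add(model)
    return gene_models

gene_models = models_per_gene(model_matches)

# Genes whose promoters match at least three different models.
hubs = sorted(g for g, models in gene_models.items() if len(models) >= 3)
print(hubs)  # ['ABCC8', 'ANXA7', 'GIPR', 'KCNJ11']
```

Genes that share a model fall into the same candidate regulatory network, so genes matching several models are the points at which those networks connect.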
Table 1. Problem-oriented gene selection: MODY
| Gene | LocusID | Description | Ortholog | Functional data (Literature) |
|---|---|---|---|---|
| ABCC8* | 6833 | ATP-binding cassette, subfamily C (CFTR/MRP), member 8 | hmr | Insulin release |
| ANXA7 | 310 | Annexin VII: calcium-channel, voltage-gated | hmr | Membrane fusion |
| CACNA1A | 773 | Calcium channel, voltage-dependent, P/Q type, alpha 1A subunit | hr | Hormone release |
| CACNA1D | 776 | Calcium channel, voltage-dependent, L type, alpha 1D subunit | h | Calcium signaling |
| CACNA1H | 8912 | Calcium channel, voltage-dependent, T type, alpha 1H subunit | h | Calcium signaling |
| GCG* | 2641 | Glucagon | hm | Glucose metabolism |
| GCGR | 2642 | Glucagon receptor | hm | Carbohydrate metabolism |
| GCK* | 2645 | Glucokinase | hmr | Glucose metabolism |
| GCKR | 2646 | Glucokinase regulatory peptide | hm | Glucose metabolism |
| GIPR* | 2696 | Gastric inhibitory polypeptide receptor | hmr | Stimulates insulin release |
| GLP1R* | 2740 | Glucagon-like peptide 1 receptor | hmr | Stimulates insulin release |
| IGF1* | 3479 | Insulin-like growth factor 1 | hmr | Glucose metabolism |
| IGF1R* | 3480 | Insulin-like growth factor 1 receptor | hmr | Carbohydrate metabolism |
| INS | 3630 | Insulin | hmr | Glucose metabolism |
| INSR* | 3643 | Insulin receptor | hmr | Carbohydrate metabolism |
| INSRR | 3645 | Insulin related receptor | hmr | Carbohydrate metabolism |
| IRS1* | 3667 | Insulin receptor substrate 1 | hmr | Inhibition of insulin signaling |
| ITPR3 | 3710 | Inositol 1,4,5-triphosphate receptor 3 | hm | Calcium channel, signaling |
| KCNJ3* | 3760 | Potassium inwardly rectifying channel, subfamily J, member 3 | hmr | Insulin release (assumed) |
| KCNJ5 | 3762 | Potassium inwardly rectifying channel, subfamily J, member 5 | hr | Insulin release (assumed) |
| KCNJ6* | 3763 | Potassium inwardly rectifying channel, subfamily J, member 6 | hm | Insulin release |
| KCNJ11* | 3767 | Potassium inwardly rectifying channel, subfamily J, member 11 | hmr | Insulin release |
| LEPR | 3953 | Leptin receptor | h | Adipose-tissue regulation |
| NPY1R | 4886 | Neuropeptide Y/peptide YY receptor Y1 | h | Gastrointestinal signaling |
| PCSK1* | 5122 | EC 3.4.21.93, proprotein convertase 1 | hmr | Insulin processing |
| PCSK2* | 5126 | EC 3.4.21.94, proprotein convertase 2 | hmr | Insulin processing |
| SLC2A2* | 6514 | Solute carrier family 2 | hmr | Carbohydrate metabolism |
The initial problem-specific list (IPL) of 27 genes; all gene names are according to HUGO officially preferred symbols (46). Availability of orthologous gene promoters is indicated by single-letter abbreviations in column 4. h = human, m = mouse, r = rat. The 15 final orthologous promoter sets used for promoter modeling are indicated by asterisks (*).
Figure 2. Model descriptions. The selected five 2-TFBSs-models (TFBSs symbolized by gray boxes) generated from promoter analysis are shown on the top (M1–M5). Naming of TFBSs is according to vertebrate matrix families in MatInspector (Genomatix). The threshold used (opt = optimized; −0.02 = optimized − 0.02) is indicated above the boxes. ‘+’ and ‘−’ signs inside the boxes indicate strand orientation of the respective TFBS. Numbers centered below the boxes denote distances between TFBSs. Extended models (M1a, M1b, M1c, M5a) are shown below models M1–M5 (newly added TFBSs are indicated by open boxes).
Figure 3. Optimization of model selectivity. The histogram shows the increase in selectivity (as defined in Methods) determined for the gene list against the Genomatix Human Promoter Database (see also Table 2). The joined boxes below the histogram indicate the different model structures with 2-, 3- or 4-TFBSs.
Table 2. Model evaluation
| Model | Origin | Model matches in IPL (27 genes) | Recall in IPL (%) | Hits in EPD (N) | Hits in EPD (%) | Hits in GPD (N) | Hits in GPD (%) | Selectivity (EPD) | Selectivity (GPD) |
|---|---|---|---|---|---|---|---|---|---|
| M1 | KCNJ11 | KCNJ11, ABCC8, ANXA7, GCGR, INSRR, IRS1, ITPR3, KCNJ3 | 30.0 | 96 | 3.2 | 1335 | 2.7 | 9.4 | 11.1 |
| M2 | ABCC8 | ABCC8, ANXA7, CACNA1H, GIPR, IGF1R, KCNJ11, LEPR, PCSK1, PCSK2 | 33.0 | 253 | 8.4 | 3283 | 6.5 | 3.9 | 5.1 |
| M3 | GIPR | GIPR, KCNJ3, CACNA1H, IRS1, KCNJ11 | 18.5 | 95 | 3.2 | 1650 | 3.3 | 5.8 | 5.6 |
| M4 | GCG | GCG, ANXA7, INSR | 11.1 | 145 | 4.8 | 3093 | 6.2 | 2.3 | 1.8 |
| M5 | GLP1R | GLP1R, ABCC8, GIPR, INS, PCSK1, PCSK2 | 22.2 | 35 | 1.2 | 484 | 1.0 | 18.5 | 22.2 |
| M1a | KCNJ11 | KCNJ11, ABCC8, ITPR3 | 11.1 | 34 | 1.1 | 490 | 1.0 | 9.8 | 11.3 |
| M1b | KCNJ11 | KCNJ11, ABCC8, ANXA7, INSRR, IRS1, ITPR3, KCNJ3 | 25.9 | 36 | 1.2 | 505 | 1.0 | 21.6 | 25.6 |
| M5a | GLP1R | GLP1R, GIPR, INS, PCSK2 | 14.8 | 20 | 0.7 | 260 | 0.5 | 22.1 | 28.5 |
| M1c | KCNJ11 | KCNJ11, ABCC8, ITPR3 | 11.1 | 15 | 0.5 | 191 | 0.4 | 22.2 | 29.2 |
Selected models and their matches found in the list (IPL) of 27 genes and in two different databases (EPD and GPD). All gene names are according to HUGO officially preferred symbols (46). Origin of the model (column 2) denotes the respective set of orthologous gene promoters used for modeling. Promoters of four genes (ABCC8, ANXA7, GIPR, KCNJ11) each match three different models, indicating highly interconnected networks. Models with three TFBSs show higher selectivity than models with two TFBSs (see the 'Hits in EPD', 'Hits in GPD' and 'Selectivity' columns: absolute match numbers, percentage of all database sequences recognized, and selectivity).
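Selectivity is defined in the Methods, which are not reproduced here. The tabulated values are nevertheless consistent with a simple enrichment ratio: recall in the IPL (%) divided by the percentage of database promoters matched. The sketch below uses that ratio as an assumption inferred from the table, not as the authors' stated formula. Because the table reports rounded percentages, recomputed values can differ from the printed ones in the last digit.

```python
IPL_SIZE = 27  # number of genes in the initial problem-specific list

# (model, number of IPL matches, % hits in EPD, % hits in GPD), from the table.
rows = [
    ("M1", 8, 3.2, 2.7),
    ("M5", 6, 1.2, 1.0),
    ("M1b", 7, 1.2, 1.0),
]

def recall_percent(n_matches, list_size=IPL_SIZE):
    """Percentage of IPL promoters matched by a model."""
    return 100.0 * n_matches / list_size

def selectivity(recall_pct, db_hit_pct):
    """ASSUMED definition: recall in the gene list relative to the
    fraction of database promoters matched by the same model."""
    return recall_pct / db_hit_pct

for model, n, epd_pct, gpd_pct in rows:
    r = recall_percent(n)
    print(model, round(r, 1),
          round(selectivity(r, epd_pct), 1),
          round(selectivity(r, gpd_pct), 1))
```

Under this reading, adding a third TFBS (M1 to M1b) leaves recall nearly unchanged while cutting database hits by more than half, which is why selectivity roughly doubles.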