Igor V Deyneko, Siegfried Weiss, Sara Leschner.
Abstract
BACKGROUND: The transcriptional activity of genes depends on many factors, such as DNA motifs, conformational characteristics of DNA, and melting, and computational approaches exist for identifying them. In real applications, however, the number of predicted features (DNA motifs, for example) can be very large. When several computational programs are applied, systematic experimental knockout of each potential element becomes impractical. Hence, one needs an approach that can integrate many heterogeneous computational methods and, on that basis, suggest selected regulatory elements for experimental verification.
Year: 2012 PMID: 22897887 PMCID: PMC3465240 DOI: 10.1186/1471-2105-13-202
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Workflow diagrams for the experimental discovery of regulatory modules. A) Conventional workflow used to identify regulatory modules in promoters. B) Workflow based exclusively on experimental research. C) Integrative approach for identification of regulatory modules.
Positive and negative characteristics of the different workflows shown in Figure 1 for the discovery of functional regulatory units
| + ease of application | + very straightforward, no parameters or thresholds | + can integrate many existing programs |
| + software is available | + guaranteed result in several rounds | + different algorithms address particular properties of promoters |
| + the spectrum of existing methods covers all particular aspects of transcriptional regulation | | + optimization of a collection of combinatorial modules instead of optimization of each module separately |
| – large number of methods to choose from (over 150 can be found on the Internet) | – may split a functional module, rendering all parts nonfunctional | – the huge number of predicted features requires much memory and CPU time => specificity filtering should be applied before module optimization |
| – relative performance of methods differs for different datasets | – high lab work and time investments | |
| – chance of a correct prediction is ~5-10% [ | | |
| – impossible to estimate the number of required rounds |
Coverage values for the most specific motif from the respective library in the positive, negative and random datasets
| Positive set ( | 0.85 | 0.77 | 0.77 | 0.77 | 0.77 | 0.85 | 0.92 | 0.77 |
| Negative set ( | 0.32 | 0.17 | 0.07 | 0.09 | 0.19 | 0.11 | 0.27 | 0.47 |
| Random set ( | 0.35 | 0.19 | 0.06 | 0.08 | 0.21 | 0.18 | 0.37 | 0.59 |
a In brackets is the total number of motifs that pass the specificity criteria.
Results of the experimental verification of predicted modules
| 1 | Meme1a + TSS | 301_1 (+), 48_1 (+) | 134_1 (−), 156_4 (−), 212_2 (−) |
| 2 | NagC + BRCZ4 + TSS | 272_1 (−) | 134_1 (−), 134_2 (−), 212_2 (−) |
| 3 | (FNR + NagC) OR (FNR + NagC + TSS) | 48_2 (−), 156_1 (−), 272_1 (−) | 156_4 (−), 134_1 (−), 134_2 (−) |
| 4 | (TGIF + FNR + NagC) OR (TGIF + FNR + NagC + TSS) | 301_1 (+), 48_1 (+) | 134_3 (+), 212_1 (+) |
| 5 | Meme1a + HNF1 | 301_1 (+), 48_1 (+) | 134_1 (−), 134_2 |
| 6 | Meme1a + FNR + NagC | 301_1 (+), 48_1 (+) | 134_1 (−), 134_2 (−), 156_4 (−) |
| 7 | MEF2 + TGIF | 301_2 (−) | |
| 8 | BRCZ4 + HNF1 | 271_1 (−) | 134_1 (−), 134_2 (−) |
| 9 | (RcsAB + MDScan3) OR (RcsAB + MDScan2) | 48_2 (−) | 134_1 (−), 134_2 (−) |
| 10 | (DME1 + RcsAB + Poly-(A)8 + TSS) | 301_1 (+) | 156_4 (−) |
| 11 | MEF2 + Meme4 | 156_2 (−), 272_2 (−), 48_2 (−) | |
a Motif identical to tusp[6].
b Fragments showing specific expression (+) support the respective regulatory module; fragments showing no expression or unspecific expression (−) reject it. After two experimental rounds, module 4 was shown to be functional.
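The module notation in the table above ("+" for motifs that occur together, "OR" for alternative motif sets) can be sketched as a small data structure. This is only an illustration under simplifying assumptions: a module is treated as a list of alternative motif sets, and a sequence matches when it carries every motif of at least one alternative. Motif order and spacing, which the original method may take into account, are ignored. The function names and the toy sequences are hypothetical and do not come from the source.

```python
from typing import FrozenSet, Iterable, List, Set

# A module is a list of alternatives; each alternative is a set of motif names.
Module = List[FrozenSet[str]]


def parse_module(text: str) -> Module:
    """Parse a string such as '(FNR + NagC) OR (FNR + NagC + TSS)'.

    Assumption: alternatives are separated by ' OR ' and motifs by '+'.
    Only the outermost parentheses of each alternative are stripped.
    """
    module: Module = []
    for alternative in text.split(" OR "):
        inner = alternative.strip().strip("()")
        module.append(frozenset(name.strip() for name in inner.split("+")))
    return module


def module_matches(module: Module, motifs_on_sequence: Set[str]) -> bool:
    """True if the sequence carries all motifs of at least one alternative."""
    return any(alternative <= motifs_on_sequence for alternative in module)


def coverage(module: Module, sequences: Iterable[Set[str]]) -> float:
    """Fraction of sequences on which the module is found (assumed definition)."""
    sequences = list(sequences)
    if not sequences:
        raise ValueError("coverage is undefined for an empty sequence set")
    return sum(module_matches(module, s) for s in sequences) / len(sequences)


if __name__ == "__main__":
    module = parse_module("(FNR + NagC) OR (FNR + NagC + TSS)")
    # Hypothetical motif annotations for three sequences.
    toy_set = [{"FNR", "NagC"}, {"FNR", "TSS"}, {"NagC", "FNR", "TSS"}]
    print(coverage(module, toy_set))
```

Under this purely set-based reading, an alternative that extends another one (such as adding TSS to FNR + NagC) does not change which sequences match, so the distinction made in the table presumably reflects information that this sketch leaves out.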
Figure 2. Tumor-specific regulatory module on the sequence of fragment 212_1. (A) Schematic representation of the regulatory module on the DNA sequence of fragment 212_1. Expression of the GFP_OVA gene under the control of fragment 212_1 in tumor (B) and spleen (C).
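Coverage values of modules on positive, negative and random datasets
| Module | Coverage C, positive set | p-value (a) | Coverage C, negative set | Rank (b) | Coverage C, random set | Rank (b) |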
| 1 | 0.92 | 3.8E-23 | 0.0181 | 1 | 0.0155 | 1 |
| 2 | 0.92 | 8.8E-14 | 0.0776 | 8 | 0.0813 | 7 |
| 3 | 1 | 2.2E-20 | 0.0350 | 3 | 0.0308 | 4 |
| 4c | 1 | 6.5E-19 | 0.0362 | 5 | 0.0399 | 6 |
| 5 | 1 | 4.1E-19 | 0.0371 | 6 | 0.0385 | 5 |
| 6 | 1 | 3.4E-23 | 0.0360 | 4 | 0.0187 | 2 |
| 7 | 1 | 3.5E-14 | 0.1101 | 10 | 0.0922 | 9 |
| 8 | 1 | 1.9E-09 | 0.2018 | 11 | 0.2137 | 11 |
| 9 | 1 | 1.6E-13 | 0.1009 | 9 | 0.1035 | 10 |
| 10 | 0.77 | 5.9E-19 | 0.0200 | 2 | 0.0223 | 3 |
| 11 | 1 | 6.8E-15 | 0.0642 | 7 | 0.0813 | 8 |
| Add1 | 1 | 5.5E-31 | 0.0091 | | 0.0047 | |
| Add2 | 1 | 3.6E-47 | 0.0000 | | 0.000267d | |
a p-values are calculated as the probability of 13·C successful hits out of 13 trials, where C is the coverage in the positive set, with the probability of success in one trial equal to the coverage C in the random set.
b Ranks are based on the ratio between the coverage C in the positive set and the coverage C in the negative set and in the random set, respectively.
c Functional module 4 does not rank high according to its p-value. One can easily find motif combinations with superior statistics, for example Add1 (Meme1 + Meme4 + TGIF) or Add2 (Meme1 + Meme2 + Meme5).
d To calculate this value, another random sequence set was generated by repeating the random splitting of the Salmonella genome 10 times. In total, the module is found on 21 of these sequences.
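The p-value footnote describes a binomial probability: a number of hits out of 13 trials with a fixed per-trial success probability. A minimal sketch follows, assuming that the number of hits is 13 times the positive-set coverage (rounded to an integer) and that the per-trial probability is the random-set coverage. Under this reading, the table values for modules 3 and 4, whose positive coverage is 1, are reproduced to the table's rounding. No such claim is made for rows with positive coverage below 1, which may follow a different convention. The function names are hypothetical.

```python
from math import comb


def binomial_probability(n_trials: int, n_hits: int, p_success: float) -> float:
    """Probability of exactly n_hits successes in n_trials independent trials."""
    return (
        comb(n_trials, n_hits)
        * p_success ** n_hits
        * (1.0 - p_success) ** (n_trials - n_hits)
    )


def module_p_value(
    coverage_positive: float,
    coverage_background: float,
    n_sequences: int = 13,
) -> float:
    """Chance of seeing the positive-set hits if each sequence matched
    only with the background (here: random-set) coverage.

    Assumption: hits = round(n_sequences * coverage_positive).
    """
    n_hits = round(n_sequences * coverage_positive)
    return binomial_probability(n_sequences, n_hits, coverage_background)


if __name__ == "__main__":
    # Module 3: positive coverage 1, random-set coverage 0.0308.
    print(f"{module_p_value(1.0, 0.0308):.1e}")
    # Module 4: positive coverage 1, random-set coverage 0.0399.
    print(f"{module_p_value(1.0, 0.0399):.1e}")
```

With 13 hits out of 13, the expression reduces to the background coverage raised to the 13th power, which is why small differences in background coverage translate into differences of several orders of magnitude in the p-value.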