Igor V Deyneko, Alexander E Kel, Olga V Kel-Margoulis, Elena V Deineko, Edgar Wingender, Siegfried Weiss.
Abstract
BACKGROUND: Accurate recognition of regulatory elements in promoters is an essential prerequisite for understanding the mechanisms of gene regulation at the level of transcription. Composite regulatory elements represent a particular type of such transcriptional regulatory elements consisting of pairs of individual DNA motifs. In contrast to the present approach, most available recognition techniques are based purely on statistical evaluation of the occurrence of single motifs. Such methods are limited in application, since the accuracy of recognition is greatly dependent on the size and quality of the sequence dataset. Methods that exploit available knowledge and have broad applicability are evidently needed.
Year: 2013 PMID: 23924163 PMCID: PMC3754795 DOI: 10.1186/1471-2105-14-241
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Distributions of PWM scores and distances between BSs in real and random CEs. (A) Distribution of PWM scores for first and second BSs in real CEs (red) and random sequence CEs (blue). Scores S and S define the rectangle OABC and perfectly separate high scoring CEs. When the scores are reduced (dashed green lines), the rectangle OA′B′C′ covers many additional true CEs, but also a large number of random CEs. Introduction of a sum of scores (diagonal EF) greatly improves the separation between real and random CEs (discontinuous line A′E′F′C′). (B) Distribution of distances between BSs and sum of matrix scores in real CEs (blue). Distance values were averaged in intervals of score values (1.75-1.80), (1.80-1.85), (1.85-1.90), (1.90-1.95) and (1.95-2.00) (red). The trend line reflects the dependence between PWM scores and distance between BSs.
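The two decision regions described in panel (A) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Pair`, `rectangle_rule` and `sum_rule`, as well as all threshold values and example scores, are hypothetical and chosen only to show how an additional cut on the sum of scores removes pairs that a relaxed rectangle would accept.

```python
from typing import NamedTuple


class Pair(NamedTuple):
    """PWM scores of the first and second binding site of a candidate CE."""
    s1: float
    s2: float


def rectangle_rule(p: Pair, t1: float, t2: float) -> bool:
    # Accept if each score passes its own threshold (rectangle region).
    return p.s1 >= t1 and p.s2 >= t2


def sum_rule(p: Pair, t1: float, t2: float, t_sum: float) -> bool:
    # Relaxed rectangle plus a cut on the sum of scores (diagonal line).
    return rectangle_rule(p, t1, t2) and (p.s1 + p.s2) >= t_sum


# Hypothetical candidates: two strong sites, one strong plus one weak
# site, and two weak sites.
pairs = [Pair(0.99, 0.98), Pair(0.97, 0.80), Pair(0.82, 0.81)]

# Assumed thresholds, for illustration only.
strict = [rectangle_rule(p, 0.95, 0.95) for p in pairs]
relaxed = [rectangle_rule(p, 0.80, 0.80) for p in pairs]
combined = [sum_rule(p, 0.80, 0.80, 1.75) for p in pairs]
```

With these assumed numbers the strict rectangle keeps only the first pair, the relaxed rectangle keeps all three, and the combined rule keeps the first two while rejecting the pair in which both sites are weak.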
Figure 2. Receiver Operating Characteristic (ROC) curves of three methods on recognition of CE NFAT/AP-1.
Figure 3. Nucleotide level correlation scores (nCC) on the TRANSCompel dataset. The graphs show nCC scores at increasing noise levels. Values for CisModule could be calculated only for the “noise0” dataset. For further details see (Klepper et al. [1]).
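The caption does not spell out how nCC is computed. The sketch below assumes the definition commonly used in motif-discovery benchmarks, namely the Pearson (Matthews) correlation between predicted and annotated nucleotide positions; the function name `ncc` and the choice to return `None` when the score is undefined are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Optional


def ncc(tp: int, fp: int, tn: int, fn: int) -> Optional[float]:
    """Nucleotide-level correlation from per-nucleotide confusion counts.

    tp: nucleotides predicted and annotated as part of a site
    fp: nucleotides predicted but not annotated
    tn: nucleotides neither predicted nor annotated
    fn: nucleotides annotated but not predicted
    """
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    if denom == 0:
        # Undefined, e.g. when the program predicts nothing at all.
        return None
    return (tp * tn - fn * fp) / denom
```

Under this definition a perfect prediction gives 1.0, a completely inverted one gives -1.0, and a program that reports no sites yields an undefined score.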
Recognition of regulatory elements in tissue specific promoters
| MatrixCatch | 1 | 4 | 7 | 4 | 5 |
| CMA | 0 | 1 | 3 | 0 | 1 |
| ModuleSearcher | 0 | 1 | 6 | 1 | 3 |
| CisModule | 0 | 0 | 1 | 1 | 2 |
Number of datasets of tissue specific promoters in which the programs found at least one module with the required level of specificity. The total number of datasets is 11.
Specificity values of regulatory modules
| Breast (24) | 5.29 | 1.65 | 2.90 | 3.66 |
| Heart (68) | 2.60 | – | 1.38 | – |
| Kidney (51) | 3.47 | 1.46 | 2.54 | – |
| Muscle (86) | 1.43 | – | 1.35 | – |
| Pancreas (61) | 2.56 | – | 1.43 | – |
| Prostate (17) | 9.54 | 6.19 | 2.49 | 6.54 |
| Thyroid (74) | 1.62 | – | 1.40 | – |
Highest values of specificity (C/C) shown by the programs in different datasets. None of the programs found modules in the datasets: Cerebellum, Liver, Spleen and Testis.
Composite element in prostate specific promoters
| Original composite element sequence | A | |||||||
| uc002uum.1 | MOB4 | −346 | + | 0.972 | 0.976 | 0.012 | 3.801e-06 | AGTT |
| uc003jwu.1 | OCLN | −213 | + | 0.973 | 0.949 | 0.202 | 9.234e-05 | AGA |
| uc003qcg.1 | EPB41L2 | −100 | + | 0.988 | 0.981 | 0.099 | 1.418e-06 | AGA |
| uc001iia.1 | NET1 | −369 | + | 0.991 | 0.765 | 0.207 | 1.091e-05 | ACC |
| uc002eby.1 | ZNF843 | −352 | + | 0.963 | 0.929 | 0.123 | 1.895e-04 | AGCCTA |
| uc004dpe.1 | SHROOM4 | +3 | + | 0.961 | 0.915 | 0.140 | 3.402e-04 | TGC |
| Sequence complementary to the original composite element sequence | A | |||||||
| uc003edg.1 | C3orf15 | −317 | – | 0.946 | 0.959 | 0.110 | 1.347e-04 | TGGC |
| uc003fsb.1 | TP63 | −345 | – | 0.924 | 0.972 | 0.228 | 2.397e-04 | ACAAA |
| uc003gno.1 | C1QTNF7 | −27 | – | 0.947 | 0.997 | 0.234 | 4.040e-06 | AAAC |
| uc003xye.1 | SULF1 | −333 | – | 0.728 | 0.987 | 0.304 | 1.456e-04 | AAAGAAA |
| uc002tah.1 | AFF3 | −149 | – | 0.922 | 0.993 | 0.046 | 4.600e-06 | TCAGAAG |
| uc003sli.1 | MAD1L1 | +2 | – | 0.761 | 0.981 | 0.276 | 1.790e-04 | TGTC |
| uc003zwl.1 | KIAA1539 | −310 | – | 0.760 | 0.959 | 0.300 | 8.385e-04 | CTCCGTA |
| uc001lwy.1 | SLC22A18 | −113 | – | 0.939 | 0.965 | 0.167 | 1.691e-04 | CGCTCCC |
| uc001wpn.1 | SDR39U1 | −12 | – | 0.767 | 0.993 | 0.313 | 3.045e-05 | TTAG |
| uc004env.1 | COL4A6 | −44 | – | 0.752 | 0.981 | 0.285 | 2.156e-04 | TGAGATG |
Composite regulatory element C/EBP / C/EBP recognized in promoters of genes active in prostate tissue. Nucleotides with significant conservation are shown in bold (within binding motifs) and italics (intermediate sequence).
1 Names according to Jacox et al. [12].
2 Beginning of the element relative to TSS.
3 S: PWM scores for the first and second C/EBP motifs; CS: composite score.