Alexis Vandenbon, Yutaro Kumagai, Shunsuke Teraguchi, Karlou Mar Amada, Shizuo Akira, Daron M Standley.
Abstract
BACKGROUND: Identification of cis- and trans-acting factors regulating gene expression remains an important problem in biology. Bioinformatics analyses of regulatory regions are hampered by several difficulties. One is that binding sites for regulatory proteins are often not significantly over-represented in the set of DNA sequences of interest, because of high levels of false positive predictions, and because of positional restrictions on functional binding sites with regard to the transcription start site.
Year: 2013 PMID: 23331723 PMCID: PMC3602658 DOI: 10.1186/1471-2105-14-26
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The effect of the random sampling approach for minimizing GC content influences. (A) The average GC content in bins of 100 bps in the region -1 kb to +1 kb is shown for 159 promoters of genes with small and large intestine-specific expression (black), and for the same number of sequences randomly sampled from the genomic set of promoters with k = 1 (blue), k = 2 (red), and k = 3 (green). Values of sampled sets are mean values with bars representing the standard deviation based on 500 sampled sets. (B) For the same dataset, the average RMSD of GC content is shown for k = 1 to 10. In this case k* is set to 2.
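The two quantities plotted in this figure, the binned GC content and the RMSD between GC profiles, can be illustrated with a minimal Python sketch. The function names `gc_profile` and `rmsd` are hypothetical and not taken from the paper, and the sketch assumes equal-length promoter sequences. It does not reproduce the authors' sampling procedure or their choice of k.

```python
import math


def gc_profile(seqs, bin_size=100):
    """Mean GC fraction per bin across a set of equal-length sequences.

    Assumption: all sequences cover the same region (e.g. -1 kb to +1 kb),
    so bin b refers to the same positions in every sequence.
    """
    n_bins = len(seqs[0]) // bin_size
    profile = []
    for b in range(n_bins):
        start, end = b * bin_size, (b + 1) * bin_size
        gc = 0
        for s in seqs:
            window = s[start:end].upper()
            gc += window.count("G") + window.count("C")
        profile.append(gc / (bin_size * len(seqs)))
    return profile


def rmsd(profile_a, profile_b):
    """Root-mean-square deviation between two GC profiles of equal length."""
    squared = [(x - y) ** 2 for x, y in zip(profile_a, profile_b)]
    return math.sqrt(sum(squared) / len(squared))
```

A lower RMSD between the profile of the input promoters and that of a sampled background set means the two sets have more similar positional GC content.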
Figure 2. Illustration of our approach for prediction of local TFBS enrichment. (A) For 159 mouse genes with high expression in small and large intestine we show the local enrichment scores in the region -500 to +500 around the TSS for 5 TFBS motifs. The h/2 values were 10 bps for TATA and RXR/RAR/VDR; 20 bps for HNF1; 50 bps for HNF4; and 200 bps for bHLH. (B) The expected local enrichment scores for the same motifs based on the genome-wide set of promoter sequences. h/2 values are as in (A). (C) log10(P) values for the local enrichment scores for the same 5 motifs in the same set of promoters. (D) Visual representation of the locally enriched regions for the 5 motifs. Boxes represent bases with significant scores and lines at both sides represent the h/2 values used. Enriched regions correspond to the regions covered by the boxes and lines.
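Panel (D) describes enriched regions as the span covered by significant bases plus the h/2 extension on each side. One way to read that description is sketched below in Python. The function name `enriched_regions` and the rule for merging adjacent intervals are assumptions made for illustration and are not the authors' implementation.

```python
def enriched_regions(significant_positions, half_width):
    """Merge significant bases, each extended by h/2, into regions.

    significant_positions: positions relative to the TSS with a
        significant local enrichment score (the "boxes").
    half_width: the h/2 value used for the motif (the "lines").
    Assumption: intervals that overlap or touch are merged into one region.
    """
    regions = []
    for pos in sorted(significant_positions):
        start, end = pos - half_width, pos + half_width
        if regions and start <= regions[-1][1] + 1:
            # Overlaps or directly follows the previous region: extend it.
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [tuple(r) for r in regions]
```

For example, with h/2 = 10, significant bases at -50, -49, and -48 would form one region from -60 to -38 under this reading, while an isolated significant base at +100 would form a separate region from 90 to 110.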
Overview of performance of several methods and measures for prediction of motif enrichment on artificial and real datasets
| Method or measure | Type | Recall (artificial) | Precision (artificial) | F-measure (artificial) | Recall (real) | Precision (real) | F-measure (real) | Reference |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LocaMo Finder (Gaussian) | local | 0.755 | 0.609 | 0.674 | 0.371 | 0.757 | 0.498 | this study |
| LocaMo Finder (uniform) | local | 0.727 | 0.519 | 0.606 | 0.343 | 0.723 | 0.465 | this study |
| RSAT (Binomial distribution) ($) | global | 0.714 | 0.285 | 0.408 | 0.429 | 0.440 | 0.434 | RSAT [ |
| ORI (**) | global | 0.677 | 0.386 | 0.492 | 0.343 | 0.563 | 0.426 | this study |
| Hypergeometric distribution (*) | global | 0.745 | 0.272 | 0.399 | 0.400 | 0.450 | 0.424 | AlignACE [ |
| Fisher’s exact test (*) | global | 0.747 | 0.276 | 0.403 | 0.400 | 0.443 | 0.420 | oPOSSUM [ |
| ORI (*) | global | 0.768 | 0.258 | 0.387 | 0.429 | 0.407 | 0.417 | this study |
| RSAT (Binomial distribution) ($$) | global | 0.591 | 0.498 | 0.541 | 0.271 | 0.607 | 0.375 | RSAT [ |
| Hypergeometric distribution (***) | global | 0.605 | 0.522 | 0.560 | 0.243 | 0.706 | 0.361 | AlignACE [ |
| Fisher’s exact test (***) | global | 0.605 | 0.530 | 0.565 | 0.243 | 0.667 | 0.356 | oPOSSUM [ |
| Casimiro | local | 0.727 | 0.053 | 0.099 | 0.629 | 0.132 | 0.218 | [ |
| Berendzen | local | 0.859 | 0.044 | 0.083 | 0.786 | 0.093 | 0.167 | [ |
| Vardhanabhuti | local | 0.409 | 0.079 | 0.133 | 0.314 | 0.090 | 0.139 | [ |
| FIRE (Information content) | global | 0.586 | 0.342 | 0.432 | 0.100 | 0.200 | 0.133 | FIRE [ |
| TFM-Explorer | local | 0.432 | 0.145 | 0.217 | 0.186 | 0.076 | 0.108 | [ |
| FREE | local | 0.155 | 0.182 | 0.167 | 0.029 | 0.013 | 0.018 | [ |
| A-GLAM | local | 0.032 | 0.259 | 0.057 | 0.000 | 0.000 | NA | [ |
For each method or measure, the type of measure (“local”: local enrichment of positioning; “global”: global enrichment), the recall, precision, and F-measure are given for the artificial and real datasets, as well as a reference. Methods are sorted by decreasing F-measure obtained on the real datasets. (*) P value threshold 0.01; (**) P value threshold 0.001; (***) P value threshold 1e-4; ($) sig threshold 0; ($$) sig threshold 2.
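The F-measure values in the table are consistent with the standard harmonic mean of recall and precision, F = 2PR / (P + R). The short Python sketch below recomputes it for the first row. The function name `f_measure` is hypothetical, and because the table reports rounded inputs, recomputed values for other rows may differ in the last digit.

```python
def f_measure(recall, precision):
    """Harmonic mean of recall and precision.

    Returns NaN when both are zero, where the measure is undefined
    (compare the "NA" entry for A-GLAM on the real datasets).
    """
    if recall + precision == 0:
        return float("nan")
    return 2 * recall * precision / (recall + precision)


# LocaMo Finder (Gaussian), values taken from the table above.
artificial = f_measure(0.755, 0.609)
real = f_measure(0.371, 0.757)
print(round(artificial, 3), round(real, 3))
```

Because the formula is symmetric in recall and precision, this check confirms the F-measure columns but cannot by itself distinguish which of the two preceding columns is recall and which is precision.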
A selection of regions of local enrichment that could not be found using standard TFBS over-representation analysis
| Tissue or cell type (gene set) | Regulatory motif | Region, start to stop (h/2) | ORI p-value | References |
| --- | --- | --- | --- | --- |
| B cells, T cells (2) | ETS domain TFs, including SPI1 or PU.1 | -10 to 0 (10) | 0.15 | [ |
| B cells, T cells (2) | HIF1 | 55 to 200 (200) | 0.11 | [ |
| testis (10) | RFX1 | -91 to 129 (200) | 0.012 | [ |
| testis (10) | CREB-binding TFs, including ATF family | -148 to 31 (100) | 0.035 | [ |
| liver (16) | Cux1 (CR3 + HD) | -103 to -90 (10) | 0.037 | [ |
| small and large intestine (19) | HNF1 | -93 to -37 (20) | 0.012 | [ |
| small and large intestine (19) | RXR, RAR, and VDR | -52 to -43 (10) | 0.065 | [ |
| testis (22) | MYB family TFs | -72 to 95 (100) | 0.025 | [ |
| testis (22) | heat shock factors | -58 to 217 (200) | 0.032 | [ |
| skeletal muscle (42) | THR alpha and beta | -30 to -15 (50) | 0.025 | [ |
A selection of regulatory motifs is shown for which regions of local enrichment were detected in mouse tissues and cell types of the GNF GeneAtlas dataset. The tissue, the start and stop position of the region, the h/2 used, the ORI p-value, and references supporting the role of the regulatory motif in the tissue in question are shown.
Figure 3. General trends of significantly locally enriched regions detected in the GNF GeneAtlas gene sets. (A) For each base in the region from -2 kb to +1 kb, the number of times it was found to be included in regions of local enrichment is shown, for 32 human and 44 mouse gene sets. The grey region indicates the region from position -300 to +300 where local enrichment was often found. (B) Human enriched regions sorted by Z score of PhastCons scores of the TFBSs within each region. (C) Human enriched regions sorted by p-value of enrichment of weak TFBSs within each region.
Regions with local enrichment of TFBSs of ETS domain TFs
| Species | Tissues and cell types (gene set) | Region of local enrichment (h/2) |
| --- | --- | --- |
| human | T cells, NK cells (5) | -141 to 51 (100) |
| human | 721 B-lymphoblasts, BM CD34+ cells (11) | -174 to 42 (100) |
| human | 721 B-lymphoblasts, BM CD34+ cells (12) | -151 to 37 (100) |
| human | B cells, Burkitt's lymphoma (13) | -148 to 23 (100) |
| human | BM CD34+ cells, 721 B-lymphoblasts (14) | -81 to 10 (50) |
| human | NK cells, T cells (15) | -97 to 34 (50) |
| mouse | B cells, T cells (2) | -106 to 17 (50) |
| mouse | skeletal muscle, heart (4) | -161 to 26 (100) |
| mouse | thymus, ovary (6) | -137 to 18 (100) |
| mouse | testis (10) | -133 to -58 (100) |
| mouse | T cells, B cells (12) | -133 to -98 (100) |
| mouse | oocyte, fertilized egg (25) | -139 to 42 (100) |
| mouse | oocyte, fertilized egg (34) | -240 to 34 (200) |
| mouse | testis (35) | -178 to 71 (200) |
| mouse | oocyte, fertilized egg (37) | -77 to -11 (50) |
Regions with local enrichment of ETS domain TF binding motifs were found in different sets of sequences. The species, the tissues and cell types associated with the promoters in which the motif was found, and the regions of local enrichment are listed.
Figure 4. Similar regions of local enrichment were detected in human and mouse promoters. For 4 regulatory motifs, enriched regions predicted in mouse and in human genes with testis-specific expression are shown.