Janet M Young, Ralf M Luche, Barbara J Trask.
Abstract
BACKGROUND: Mammalian olfactory receptors (ORs) are subject to a remarkable but poorly understood regime of transcriptional regulation, whereby individual olfactory neurons each express only one allele of a single member of the large OR gene family.
Year: 2011 PMID: 22085861 PMCID: PMC3247239 DOI: 10.1186/1471-2164-12-561
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. General characteristics of 314 putative OR promoter regions. Panel A: Interspersed repeat content, GC content and orthologous conservation scores in a 2-kb region surrounding the putative TSSs of 314 ORs. GC content is calculated in 50-bp windows across each sequence (with a 10-bp slide), averaged across all sequences in the dataset, and plotted at the center of each window. Interspersed repeat content was determined by RepeatMasker, and the proportion of promoters containing a repeat element at each base position relative to the TSS was calculated (averaged over 20-bp windows, sliding along promoters 1 bp at a time). An orthologous conservation score was calculated using SCONE [45]: the value plotted is 1 - P-value (SCONE output; see Methods), averaged over 20-bp windows sliding along promoters 1 bp at a time. The vertical dashed gray line marks the predicted TSS. Panel B: Distribution of predicted O/E binding sites in the 314 sequences, using MatInspector's default parameters ("opt", black line) or a reduced stringency ("opt-0.1", gray line). Coverage is calculated as the proportion of promoter sequences containing a predicted O/E binding site at each base pair, averaged over 20-bp windows sliding along promoters 1 bp at a time. Panel C: Distribution of predicted TATA boxes using MatInspector's optimized score threshold (black line) or a less stringent threshold (gray line) (20-bp windows, 1-bp slide).
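The windowed averages described in the Figure 1 caption can be sketched in Python as below. This is a minimal illustration, not the authors' pipeline: it assumes the promoter sequences are already extracted, equal in length and aligned on the predicted TSS, and that repeat or binding-site annotations (for example, from RepeatMasker or MatInspector output) have already been converted into per-base 0/1 masks. The function names and the mask format are hypothetical.

```python
from statistics import mean


def gc_profile(seq, window=50, step=10):
    """GC fraction in windows of `window` bp, sliding by `step` bp.

    Returns (center_position, gc_fraction) pairs; positions are 0-based
    indices into the sequence (shift them to TSS-relative coordinates as needed).
    """
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = sum(1 for base in chunk if base in "GC") / window
        center = start + (window - 1) / 2  # midpoint of the window
        profile.append((center, gc))
    return profile


def mean_gc_profile(seqs, window=50, step=10):
    """Average the per-window GC fraction across equal-length, TSS-aligned sequences."""
    profiles = [gc_profile(s, window, step) for s in seqs]
    return [
        (profiles[0][i][0], mean(p[i][1] for p in profiles))
        for i in range(len(profiles[0]))
    ]


def coverage_profile(masks, window=20):
    """Proportion of sequences covered at each base, smoothed over `window` bp.

    `masks` holds one 0/1 list per sequence (1 = base lies in a repeat or a
    predicted binding site); the smoothing window slides 1 bp at a time.
    """
    n_seqs = len(masks)
    per_base = [sum(col) / n_seqs for col in zip(*masks)]
    return [mean(per_base[i:i + window]) for i in range(len(per_base) - window + 1)]
```

The same `coverage_profile` helper serves both the repeat track in Panel A and the O/E and TATA coverage in Panels B and C, since each is a per-base proportion smoothed with a 20-bp, 1-bp-step window.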
Figure 2. Number of predicted TF binding sites within 200 bp of the TSS compared with background sequences. In each log-scale plot, the y-axis shows the total number of predicted binding sites in the set of 314 200-bp promoter regions. The x-axis shows either the number of sites found in the preceding 200-bp regions (upper panels, A and C) or the average number of sites predicted in 10,000 shuffled sequence datasets (lower panels, B and D). Each data point represents a family of transcription factor matrices in the MatBase database; MatInspector analysis was performed using the default parameters (see Additional File 4 for analysis using less stringent parameters, and Table 1 and Additional Files 5 and 6 for full results and matrix family names). Panels A and B (left) show results before masking O/E sites and TATA boxes, and panels C and D (right) show results after masking those sites. Data points for which either value is 0 are not plotted (none shows statistically significant enrichment). Solid black symbols represent matrix families showing statistically significant enrichment; solid gray symbols represent matrix families showing statistically significant depletion; the blue square highlights the V$NOLF family of matrices, which represents the O/E binding site (as expected, no O/E sites were predicted after applying the mask, so V$NOLF does not appear in panels C and D); and the red diamond highlights the O$VTBP family of matrices, which represents TATA boxes.
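The comparison with shuffled sequences can be sketched as follows. The sketch rests on assumptions the caption does not state: motif sites are counted with a plain regular expression rather than MatInspector's matrix scoring, each sequence is shuffled independently with a simple mononucleotide shuffle, and the p-value is an empirical one (the fraction of shuffled datasets with at least as many sites as observed). The "expected" count is the mean over shuffled datasets, and enrichment is observed divided by expected, which matches how the table's columns relate (for example, 167 / 27.4 ≈ 6.1 for V$NOLF). Function names are hypothetical.

```python
import random
import re


def count_sites(seqs, pattern):
    """Count possibly overlapping regex matches of `pattern` across all sequences."""
    rx = re.compile(f"(?=(?:{pattern}))")  # zero-width lookahead allows overlaps
    return sum(sum(1 for _ in rx.finditer(s)) for s in seqs)


def shuffled_background(seqs, pattern, n_shuffles=10_000, seed=0):
    """Compare observed site counts with counts in shuffled copies of the dataset.

    Assumption: each sequence is shuffled independently (mononucleotide
    composition preserved); the original study's shuffling scheme may differ.
    Returns (observed, expected, enrichment, empirical_p).
    """
    rng = random.Random(seed)
    observed = count_sites(seqs, pattern)
    null_counts = []
    for _ in range(n_shuffles):
        shuffled = ["".join(rng.sample(s, len(s))) for s in seqs]
        null_counts.append(count_sites(shuffled, pattern))
    expected = sum(null_counts) / n_shuffles
    enrichment = observed / expected if expected else float("inf")
    # Add-one correction keeps the p-value above zero; with 10,000 shuffles
    # the smallest attainable value is about 1e-4.
    as_extreme = sum(1 for c in null_counts if c >= observed)
    empirical_p = (as_extreme + 1) / (n_shuffles + 1)
    return observed, expected, enrichment, empirical_p
```

A mononucleotide shuffle is the simplest null model; a dinucleotide-preserving shuffle would be a stricter alternative when CpG or AT-richness biases are a concern.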
Summary of statistical tests for 200-bp putative promoter regions
| MatBase matrix family | MatBase family description | Consensus of arbitrarily chosen PWM | Number of predicted sites in 200-bp promoter region | Number of 200-bp promoters containing ≥ 1 site | Previous 200-bp region: number of sites expected | Previous 200-bp region: enrichment over expected | Previous 200-bp region: enrichment p-value | Shuffled sequences: number of sites expected | Shuffled sequences: enrichment over expected | Shuffled sequences: enrichment p-value | Conservation: p-value, whole sites more conserved than local background | Conservation: p-value, core nucleotides more conserved than local background | Significance levels of three tests |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| V$NOLF | Neuron-specific-olfactory factor | nncdabTCCCyngrgarbnkgn | 167 | 137 | 24 | 6.96 | 7.4E-28 | 27.4 | 6.1 | <0.0001 | 5.2E-22 | 2.6E-54 | X X X |
| O$VTBP | Vertebrate TATA binding protein factor | staTAAAwrnn | 391 | 211 | 494 | 0.79 | 1 | 311 | 1.26 | <0.0001 | 1 | 0.65 | . X . |
| V$IKRS | Ikaros zinc finger family | yyTGGGagr | 123 | 104 | 69 | 1.78 | 5.9E-5 | 65.6 | 1.88 | <0.0001 | 2.2E-4 | 4.9E-14 | X X X |
| V$NOLF | Neuron-specific-olfactory factor | nncdabTCCCyngrgarbnkgn | 653 | 265 | 198 | 3.3 | 9.6E-58 | 239.6 | 2.73 | <0.0001 | 2.7E-64 | 3.4E-130 | X X X |
| O$VTBP | Vertebrate TATA binding protein factor | staTAAAwrnn | 2388 | 309 | 2972 | 0.8 | 1 | 2119.1 | 1.13 | <0.0001 | 1 | 0.99 | . X . |
| V$IKRS | Ikaros zinc finger family | yyTGGGagr | 963 | 297 | 622 | 1.55 | 5E-18 | 710.3 | 1.36 | <0.0001 | 1.6E-29 | 1.7E-73 | X X X |
| V$ARID | AT rich interactive domain factor | AATAccvm | 140 | 94 | 89 | 1.57 | 4.6E-4 | 71.9 | 1.95 | <0.0001 | 0.53 | 0.0021 | + X + |
| V$ATBF | AT-binding TF | hhwkrttantAATTahh | 101 | 69 | 68 | 1.49 | 0.0068 | 56.3 | 1.8 | <0.0001 | 0.069 | 7.4E-08 | + X X |
| V$BCDF | Bicoid-like homeodomain TFs | abnyTAATcmnv | 152 | 119 | 102 | 1.49 | 0.001 | 131.5 | 1.16 | 0.0401 | 4.7E-17 | 4.1E-20 | + + X |
| V$BRN5 | Brn-5 POU domain factors | gCATAawttat | 327 | 165 | 282 | 1.16 | 0.037 | 217.5 | 1.5 | <0.0001 | 0.015 | 5.3E-09 | + X X |
| V$CART | Cart-1 cartilage homeoprotein 1 | cTAATtrnsynattan | 452 | 183 | 331 | 1.37 | 8.7E-6 | 318.7 | 1.42 | <0.0001 | 2.7E-17 | 2.1E-30 | X X X |
| V$DLXF | Distal-less homeodomain TFs | nntAATTan | 274 | 129 | 173 | 1.58 | 1.0E-6 | 141.8 | 1.93 | <0.0001 | 1.9E-20 | 2.7E-35 | X X X |
| V$HBOX | Homeobox TFs | raaTTTAattgaa | 510 | 192 | 327 | 1.56 | 1.3E-10 | 317.9 | 1.6 | <0.0001 | 4.3E-17 | 2.7E-30 | X X X |
| V$HOMF | Homeodomain TFs | mCTAAttnn | 646 | 214 | 449 | 1.44 | 1.4E-09 | 463.2 | 1.39 | <0.0001 | 8.6E-4 | 1.4E-13 | X X X |
| V$HOXF | Paralog hox genes 1-8, clusters A, B, C, D | nnamTAATgrggrwnn | 583 | 204 | 404 | 1.44 | 6.7E-09 | 385.2 | 1.51 | <0.0001 | 2.3E-09 | 9.3E-26 | X X X |
| V$LHXF | Lim homeodomain factors | nntwwttAATTaatnn | 557 | 187 | 396 | 1.41 | 1.0E-7 | 350.4 | 1.59 | <0.0001 | 1.8E-08 | 2.8E-28 | X X X |
| V$MYOD | Myoblast determining factors | mrgCARCwgswg | 30 | 20 | 13 | 2.31 | 0.0069 | 16.3 | 1.84 | 0.0042 | 0.017 | 0.51 | + + + |
| V$NKX1 | NK1 homeobox TFs | wgnrcyAATTrgygsnn | 140 | 75 | 89 | 1.57 | 4.6E-4 | 70.2 | 1.99 | <0.0001 | 1.2E-13 | 9.9E-21 | + X X |
| V$NKX6 | NK6 homeobox TFs | TTAAttac | 263 | 151 | 178 | 1.48 | 3.0E-5 | 155.1 | 1.7 | <0.0001 | 1.3E-07 | 6.1E-13 | X X X |
| V$PAXH | PAX homeodomain binding sites | aawaATTAnn | 152 | 68 | 95 | 1.6 | 1.7E-4 | 77.5 | 1.96 | <0.0001 | 0.0015 | 2.5E-10 | X X X |
| V$PDX1 | Pancreatic and intestinal homeodomain TF | rnTAATtagync | 193 | 96 | 131 | 1.47 | 3.4E-4 | 108.4 | 1.78 | <0.0001 | 1.4E-6 | 8.2E-16 | + X X |
| V$AP4R | AP4 and related proteins | wgaryCAGCtgyggnc | 121 | 74 | 61 | 1.98 | 5.1E-6 | 99 | 1.22 | 0.0321 | 7.7E-08 | 0.061 | X + X |
| V$DICE | Downstream Immunoglobulin Control Element | kgtySTCTccacag | 186 | 134 | 135 | 1.38 | 0.0026 | 138.1 | 1.35 | <0.0001 | 0.0026 | 0.2 | + X + |
| V$DLXF | Distal-less homeodomain TFs | nntAATTan | 1252 | 274 | 1149 | 1.09 | 0.019 | 1041.7 | 1.2 | <0.0001 | 0.0004 | 1.3E-18 | + X X |
| V$HAND | Twist subfamily of class B bHLH TFs | ccagaTGGCcccccn | 696 | 252 | 537 | 1.3 | 3.3E-6 | 619.8 | 1.12 | 0.0048 | 0.0067 | 0.0019 | X + + |
| V$NKX1 | NK1 homeobox TFs | wgnrcyAATTrgygsnn | 783 | 236 | 619 | 1.26 | 6.6E-6 | 634.4 | 1.23 | <0.0001 | 9.9E-12 | 1E-21 | X X X |
| V$PAX5 | PAX-5 B-cell-specific activator protein | bcnnnrNKCAnbgnwgnrkrgc | 227 | 139 | 180 | 1.26 | 0.011 | 192.2 | 1.18 | 0.0085 | 0.09 | 0.025 | + + + |
| V$PAX6 | PAX-4/PAX-6 paired domain binding sites | GCASbswtgmgtgmn | 664 | 249 | 555 | 1.2 | 9.8E-4 | 617 | 1.08 | 0.0354 | 0.011 | 0.0022 | + + + |
| V$PAXH | PAX homeodomain binding sites | aawaATTAnn | 999 | 247 | 889 | 1.12 | 0.0061 | 767.1 | 1.3 | <0.0001 | 4.5E-09 | 2E-20 | + X X |
| V$PDX1 | Pancreatic and intestinal homeodomain TF | rnTAATtagync | 1013 | 257 | 828 | 1.22 | 8.9E-6 | 743.9 | 1.36 | <0.0001 | 2.6E-5 | 3.7E-25 | X X X |
| V$PTF1 | Pancreas TF 1, heterotrimeric TF | bmcaCCTGyvktkttycccrw | 125 | 95 | 93 | 1.34 | 0.018 | 100.9 | 1.24 | 0.0102 | 0.015 | 0.17 | + + + |
| V$SIX3 | Sine oculis homeobox homolog 3 | nnrhnknTAATswcwncnstv | 647 | 254 | 574 | 1.13 | 0.02 | 515.5 | 1.26 | <0.0001 | 1.8E-07 | 6.6E-28 | + X X |
Results of our statistical tests for enrichment and conservation are provided for selected matrix families before and after masking O/E sites and TATA boxes. Before masking, we provide results only for selected matrix families that we discuss in the text. After masking, we provide results for any matrix family that appeared statistically significant in all three tests before applying the Bonferroni correction for multiple testing. Additional Files 5, 6, 10 and 11 give results for all matrix families, as well as for all individual matrices. Additional Files 7, 8, 9, 12 and 13 give results of analogous tests for 500-bp putative promoter regions. P-values provided here are not corrected for multiple testing, but in selecting matrices for further discussion we used the conservative Bonferroni correction. For each matrix family, we provide the description from MatBase, using the abbreviation TF for transcription factor. We also provide the consensus sequence (using IUPAC degeneracy codes) of an arbitrarily chosen matrix from each family.
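Because the table gives each consensus in IUPAC degeneracy codes, a consensus can be turned into a simple search pattern. The helper below is a hypothetical sketch for exploring those strings: an exact consensus match is much cruder than MatInspector's position weight matrix scoring, and it scans only the forward strand. It also ignores letter case, although in these consensus strings upper-case letters appear to mark the more conserved core positions.

```python
import re

# Standard IUPAC nucleotide degeneracy codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus (e.g. 'yyTGGGagr') into a regular expression.

    Case is ignored; an unknown letter raises KeyError.
    """
    parts = []
    for code in consensus.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


def find_consensus(seq, consensus):
    """Return 0-based start positions of (possibly overlapping) forward-strand matches."""
    rx = re.compile(f"(?=({consensus_to_regex(consensus)}))")
    return [m.start() for m in rx.finditer(seq.upper())]
```

For example, the V$IKRS consensus `yyTGGGagr` becomes `[CT][CT]TGGGAG[AG]`.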
The "significance levels" column summarizes the results of the three statistical tests in the following order: (a) enrichment versus the previous 200-bp region; (b) enrichment versus shuffled sequences; (c) conservation scores versus surrounding nucleotides, counting whichever of the sites test or the cores test proved more significant (see Methods). The "." symbol indicates not significant; "+" indicates significance at p ≤ 0.05 before applying the Bonferroni correction; and "X" indicates that the p-value remains significant after applying the Bonferroni correction.
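The three-symbol summary can be expressed as a small function. The number of tests used for the Bonferroni correction is not stated here, so `n_tests` is a free parameter; the value 200 in the usage line is purely illustrative and is not the number of matrix families the authors corrected for. Passing p-values reported as "<0.0001" as 1e-4 is likewise an assumption. Function names are hypothetical.

```python
def significance_symbol(p, n_tests, alpha=0.05):
    """'.' = not significant; '+' = p <= alpha before correction;
    'X' = still significant after Bonferroni correction (p <= alpha / n_tests)."""
    if p <= alpha / n_tests:
        return "X"
    if p <= alpha:
        return "+"
    return "."


def significance_levels(p_prev, p_shuffled, p_sites, p_cores, n_tests, alpha=0.05):
    """Summarize tests in the table's order: (a) previous region, (b) shuffled
    sequences, (c) conservation, taking the more significant of the sites and
    cores p-values for (c)."""
    p_conservation = min(p_sites, p_cores)
    return " ".join(
        significance_symbol(p, n_tests, alpha)
        for p in (p_prev, p_shuffled, p_conservation)
    )


# V$ARID row (after masking), with an illustrative n_tests of 200: prints "+ X +"
print(significance_levels(4.6e-4, 1e-4, 0.53, 0.0021, n_tests=200))
```

With the same illustrative threshold, the V$HAND and O$VTBP rows also come out as "X + +" and ". X .", as printed in the table.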
Figure 3. Overlap between three tests for motif importance. The Venn diagrams show the number of matrix families that are significant in each of our three statistical tests after Bonferroni correction and after masking O/E sites and TATA boxes. Panel A shows results of MatInspector scans using default parameters, and panel B shows results using less stringent MatInspector predictions.
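Counting the regions of a three-set Venn diagram like those in Figure 3 amounts to grouping matrix families by which tests they pass. A minimal sketch follows; the function name is hypothetical, and the input sets would be the families passing each Bonferroni-corrected test.

```python
from itertools import product


def venn_counts(sig_prev, sig_shuffled, sig_conserved):
    """Count families in each region of a three-set Venn diagram.

    Keys are (in_prev, in_shuffled, in_conserved) membership tuples;
    families significant in none of the tests are not counted.
    """
    sets = (set(sig_prev), set(sig_shuffled), set(sig_conserved))
    universe = set().union(*sets)
    counts = {}
    for membership in product((True, False), repeat=3):
        if not any(membership):
            continue  # the "outside all circles" region is not drawn
        counts[membership] = sum(
            1 for fam in universe
            if all((fam in s) == m for s, m in zip(sets, membership))
        )
    return counts
```

The central region, `counts[(True, True, True)]`, corresponds to families marked "X X X" in the table.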
Figure 4. O/E sites and TATA boxes are enriched near rat, human, dog and cow OR TSSs. In all panels, the black lines represent data for the 314 mouse OR promoters (i.e., the same data as shown in Figure 1). Colored lines represent putative orthologous promoter regions (determined using UCSC's liftOver utility) from rat (gray), human (red), dog (dark blue) and cow (green). In panels B and C, solid lines represent matrix matches exceeding MatInspector's default score threshold ("opt"), and dotted lines represent matches found using a less stringent score threshold ("opt-0.1"). Panel A shows that orthologous promoters from all four placental mammals examined exhibit the same reduction in repeat content and the same characteristic fluctuation in GC content near the predicted TSS. Panel B shows that O/E sites are enriched upstream of orthologous promoters in placental mammals, and panel C shows that TATA boxes are enriched upstream of orthologous TSSs. As in Figure 1, coverage is calculated as the proportion of promoter sequences containing a predicted O/E (or VTBP) binding site at each base pair, averaged over 20-bp windows sliding along promoters 1 bp at a time.