Christoph Dieterich, Steffen Grossmann, Andrea Tanzer, Stefan Röpcke, Peter F Arndt, Peter F Stadler, Martin Vingron.
Abstract
BACKGROUND: Promoters are key players in gene regulation. They receive signals from various sources (e.g. cell surface receptors) and control the level of transcription initiation, which largely determines gene expression. In vertebrates, transcription start sites and surrounding regulatory elements are often poorly defined. To support promoter analysis, we present CORG (http://corg.molgen.mpg.de), a framework for studying upstream regions including untranslated exons (5' UTR).
DESCRIPTION: The automated annotation of promoter regions integrates two kinds of information. First, statistically significant cross-species conservation within upstream regions of orthologous genes is detected. Pairwise as well as multiple sequence comparisons are computed. Second, binding site descriptions (position-weight matrices) are employed to predict conserved regulatory elements with a novel approach. Assembled EST sequences and verified transcription start sites are incorporated to distinguish exonic from other sequences. As of now, we have included 5 species in our analysis pipeline (man, mouse, rat, fugu and zebrafish). We characterized promoter regions of 16,127 groups of orthologous genes. All data are presented in an intuitive way via our web site. Users are free to export data for single genes or access larger data sets via our DAS server (http://tomcat.molgen.mpg.de:8080/das). We illustrate the benefits of our framework with two examples: phylogenetic profiling of transcription factor binding sites and detection of microRNAs close to the transcription start sites of our gene set.
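The abstract mentions position-weight matrices as binding site descriptions. As a rough illustration of the general idea, the sketch below converts base counts into log-odds weights, scans a sequence, and keeps positions that score above a threshold in two species. It is not the novel prediction approach of CORG, which the abstract does not specify. The count matrix, the uniform background, the pseudocount, the threshold and the assumption that both sequences are already aligned without gaps are all hypothetical choices made for this example.

```python
import math

# Hypothetical count matrix for a 4 bp site (one dict per position).
# The counts are illustrative only; they are not taken from TRANSFAC or CORG.
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform


def log_odds_matrix(counts, background, pseudocount=1.0):
    """Convert per-position base counts into log2-odds weights."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background[base])
            for base in "ACGT"
        })
    return matrix


def scan(sequence, pwm):
    """Yield (offset, score) for every unambiguous window of the sequence."""
    width = len(pwm)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other ambiguity codes
        yield start, sum(pwm[i][base] for i, base in enumerate(window))


def conserved_hits(seq_a, seq_b, pwm, threshold):
    """Offsets where both sequences score at or above the threshold.

    Assumes seq_a and seq_b are aligned without gaps, so that equal
    offsets refer to homologous positions.
    """
    hits_a = {pos for pos, score in scan(seq_a, pwm) if score >= threshold}
    hits_b = {pos for pos, score in scan(seq_b, pwm) if score >= threshold}
    return sorted(hits_a & hits_b)
```

Calling `conserved_hits("TTACGTAA", "GGACGTCC", log_odds_matrix(COUNTS, BACKGROUND), 4.0)` reports only the offset of the shared `ACGT` window. Requiring a hit in both species is the simplest possible conservation filter; a real pipeline would work on gapped alignments and assess significance statistically.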
Year: 2005 PMID: 15723697 PMCID: PMC555765 DOI: 10.1186/1471-2164-6-24
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Resources for validated transcription start sites
| Resource | Description |
| Eukaryotic promoter database (EPD) [44] | The Eukaryotic promoter database is the smallest of these resources, but consists largely of manually curated entries. |
| DataBase of Transcriptional Start Sites (DBTSS) [45] | The DBTSS contains reliable information on the transcriptional start sites of man and mouse promoters. It relies on the oligo-capping technique to enrich the pool of clones for full-length 5'-to-3' cDNAs. |
| H-Invitational Database (H-InvDB) [46] | H-InvDB is an international effort to integrate the annotation of 41,118 full-length human cDNA clones that are currently available from six high-throughput cDNA sequencing projects. |
| FANTOM 2 (RIKEN) [47] | The RIKEN consortium presented the FANTOM collection of RIKEN full-length cDNA clones. FANTOM stands for Functional Annotation of Mouse cDNA clones. |
| The Reference Sequence project (RefSeq) [48] | The Reference Sequence project aims to provide a comprehensive, integrated, non-redundant set of sequences, including full-length transcripts (mRNA). |
Figure 1. Genomic context of human SRF. This image is displayed after the user has selected a gene identifier on the search page. It provides the user with the genomic context of the selected gene. Known and predicted transcription start sites are shown as labelled red dots. Local similarities to homologous regions from other species are shown as connected purple boxes. Blue bars depict all upstream regions as contained in CORG. The structure of the corresponding EnsEMBL transcripts as well as the extent of RefSeq transcripts are shown in the bottom track.
Figure 2. Graphical multiple alignment view (JAVA applet). Multiple alignment view of 6 homologous sequences from 5 species. All consistent local similarities in the upstream region of SRF homologs are placed relative to the species-specific translation start sites. The distance of the aligned segment to the translation start site is almost equal for all mammals and larger for the fish. The extent of each upstream region is shown as an orange bar. Regions covered by flanking genes would be shown in red.
Figure 3. Textual multiple alignment view. Multiple alignment as rendered by CLUSTAL X. The largest multiple alignment was retrieved from the JAVA applet by a cut-and-paste operation and rendered in CLUSTAL X [36]. Conserved binding sites are highlighted by red or blue boxes. Known sites as given in TRANSFAC are marked with a dollar sign [42]. Note that the validated Egr-1 site is only conserved in mammals. This site is bound by the serum-inducible Krox-24 zinc finger protein.
Rfam non-coding RNAs in CORG. A + sign indicates that a sequence fragment from the corresponding species (hsa Homo sapiens, mmu Mus musculus, rno Rattus norvegicus, dre Danio rerio, tru Takifugu rubripes) is contained in the CORG CNB; ∅ indicates that a blast search for an orthologous sequence in the Ensembl database was unsuccessful; n.d. means no descriptive Ensembl gene annotation. The CNBs containing mir-196a-2 are shifted compared to the known microRNA sequences, preventing the detection of the correct stem-loop structure. The B column marks whether a candidate was identified by a blast search against Rfam or the microRNA Registry; the A column shows whether a hairpin structure was identified by RNAalifold. pRNAz is the p-value for being an evolutionarily conserved RNA secondary structure element, as returned by RNAz.
| CNB | B | A | pRNAz | ncRNA | hsa | mmu | rno | dre | tru | gene |
| 119596 | + | + | 0.995 | mir-34c | + | + | + | + | ∅ | n.d. (BCT-4) |
| 119607 | + | + | 0.938 | mir-34b in hsa | | | | | | |
| 119658 | + | + | 0.985 | | | | | | | |
| 159914 | + | + | 0.998 | mir-138-2 | + | + | + | + | ∅ | SLC12A3, n.d. in teleosts |
| 159932 | + | + | 0.999 | | | | | | | |
| 159939 | + | + | 0.998 | | | | | | | |
| 194777 | + | + | 0.998 | mir-196b | + | - | + | + | + | HOXA9, dre: HOXA9a and HOXA9b |
| 194820 | + | + | 0.999 | | | | | | | |
| 194839 | + | + | 0.999 | | | | | | | |
| 194941 | + | + | 0.999 | | | | | | | |
| 226470 | + | + | 0.999 | mir-10a | + | + | + | + | + | HOXB4, dre: HOXB4a and HOXB4b |
| 226514 | + | + | 0.999 | | | | | | | |
| 226555 | + | + | 0.999 | | | | | | | |
| 226677 | + | - | 0.004 | | | | | | | |
| 238163 | + | + | 0.992 | mir-10b | + | + | + | + | + | HOXD4, dre: HOXD4a, n.d. in tru |
| 238188 | + | + | 0.984 | | | | | | | |
| 238265 | + | + | 0.994 | | | | | | | |
| 391314 | + | - | 0.125 | mir-196a-2 | + | + | - | + | + | HOXC9, dre: HOXC9a |
| 391315 | + | - | 0.999 | | | | | | | |
| 391318 | + | - | 0.511 | | | | | | | |
| 470004 | + | - | 0.218 | U93 | + | + | + | 0 | + | n.d. |
| 110374 | - | + | 0.995 | IRES ? | + | + | + | + | + | DGCR8 |
| 146100 | - | + | 0.891 | | + | + | + | + | 0 | Ptf1a |
| 393794 | - | + | 0.999 | IRE | + | + | + | + | + | SLCA1 |
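To make the interplay of the B, A and pRNAz columns easier to follow, here is a minimal sketch that screens a handful of rows copied from the table above. It lists CNBs that have no blast hit but show a hairpin and a high pRNAz, and CNBs where the hairpin call and the pRNAz value disagree. The cutoff of 0.5 and the function name `summarize` are assumptions made for this illustration; the source does not state which threshold, if any, was applied.

```python
# A few rows copied from the table above as (CNB, B, A, pRNAz).
ROWS = [
    ("119596", "+", "+", 0.995),
    ("226677", "+", "-", 0.004),
    ("391314", "+", "-", 0.125),
    ("391315", "+", "-", 0.999),
    ("470004", "+", "-", 0.218),
    ("110374", "-", "+", 0.995),
    ("146100", "-", "+", 0.891),
]

DEFAULT_CUTOFF = 0.5  # assumed cutoff, not stated in the source


def summarize(rows, cutoff=DEFAULT_CUTOFF):
    """Return (novel, discordant) lists of CNB identifiers.

    novel:      no blast hit (B is "-"), hairpin found (A is "+"),
                and pRNAz above the cutoff.
    discordant: the hairpin call and the pRNAz call disagree.
    """
    novel, discordant = [], []
    for cnb, blast_hit, hairpin, p_rnaz in rows:
        structured = p_rnaz > cutoff
        if blast_hit == "-" and hairpin == "+" and structured:
            novel.append(cnb)
        if structured != (hairpin == "+"):
            discordant.append(cnb)
    return novel, discordant


if __name__ == "__main__":
    novel, discordant = summarize(ROWS)
    print("no blast hit, but structured:", novel)
    print("hairpin and pRNAz disagree:", discordant)
```

On these rows the discordant list contains CNB 391315, one of the mir-196a-2 blocks that the table caption describes as shifted relative to the known microRNA sequence.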
Figure 4. Alignment and predicted RNA structure of mir-10b. The mir-10b CNB shows the typical pattern of substitutions in a microRNA precursor hairpin: there are two well-conserved arms, of which the mature microRNA is almost perfectly conserved, and a much more variable loop region [43].
Figure 5. Alignment and predicted RNA structure of the Iron Responsive Element. The Iron Responsive Element (UTRdb [8] identifier: BB277285) shows a substitution pattern that is different from the hairpin structure in Figure 4. Additional orthologous sequences from the frog Xenopus tropicalis (xtr), the chicken Gallus gallus (gga) and the pufferfish Tetraodon nigroviridis are included.