Ian J Donaldson, Berthold Göttgens.
Abstract
Specificity of mammalian gene regulatory regions is achieved to a large extent through the combinatorial binding of sets of transcription factors to distinct binding sites, discrete combinations of which are often referred to as regulatory modules. Identification and subsequent characterization of gene regulatory modules will be a key step in assembling transcriptional regulatory networks from gene expression profiling data, with the ultimate goal of unravelling the regulatory codes that govern gene expression in various cell types. Here we describe the new bioinformatics tool, Composite Motif Discovery (CoMoDis), which streamlines computational identification of novel regulatory modules starting from a single seed motif. Seed motifs represent binding sites conserved across mammalian species. CoMoDis facilitates novel motif discovery by automating the extraction of DNA sequences flanking seed motifs and streamlining downstream motif discovery using a variety of tools, including several that utilize phylogenetic conservation criteria. CoMoDis is available at http://hscl.cimr.cam.ac.uk/CoMoDis_portal.html.
Year: 2006 PMID: 17130158 PMCID: PMC1702496 DOI: 10.1093/nar/gkl839
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The flow of data in a typical motif discovery experiment using CoMoDis. The user begins with a list of genes thought to be controlled by the same transcription factor, for which a DNA binding motif is known. CoMoDis locates all conserved motifs for this factor (‘seed motifs’) within the loci of the presumed target genes and outputs the sequences flanking each seed motif for subsequent motif discovery. Questions that should be considered when using CoMoDis are highlighted. External tools that will aid the user in completing the analysis are also shown.
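The flank extraction step described in the caption can be illustrated with a minimal Python sketch. This is not the CoMoDis implementation: the function name `extract_flanks`, the `flank` parameter and the toy locus are assumptions made for illustration, only the forward strand is scanned, and no conservation filter is applied.

```python
import re

def extract_flanks(sequence, seed_regex, flank=100):
    """Return (start, end, left_flank, right_flank) for each seed motif hit.

    Hypothetical helper: 'seed_regex' is a regular expression for the seed
    motif, 'flank' is the number of bases kept on each side of a hit.
    """
    sequence = sequence.upper()
    hits = []
    # The lookahead wrapper lets overlapping occurrences be reported too.
    for match in re.finditer(f"(?=({seed_regex}))", sequence):
        start, end = match.start(1), match.end(1)
        left = sequence[max(0, start - flank):start]   # truncated at sequence start
        right = sequence[end:end + flank]              # truncated at sequence end
        hits.append((start, end, left, right))
    return hits

# Toy locus containing one GATA seed motif.
locus = "ttgcaGATAagcctCAGCTGaa"
for start, end, left, right in extract_flanks(locus, "GATA", flank=5):
    print(start, end, left, right)
```

The flanking sequences returned this way are the kind of input that would then be passed to a motif discovery tool; coordinates are zero-based and the end position is exclusive.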
Transcription factor seed motifs currently available in CoMoDis
| In-house curated motifs | | Xie | | Ettwiller | |
|---|---|---|---|---|---|
| Name | Consensus | Identifier | Consensus | Identifier | Consensus |
| AML1 | TGYGGT | #1 (NRF-1) | RCGCANGCGY | #1 (CAAT) | CCAATC |
| AP1 | NNNSTCA | #2 (MYC) | CACGTG | #2 (SP1) | GGGCGG |
| CRE | TGACGTCA | #3 (ELK-1) | SCGGAAGY | #3 (CRE) | TGACGTCA |
| CRE | TGACG (half) | #4 (Novel) | ACTAYRNNNCCCR | #4 (ETS) | CGGAAG |
| CEBP | SYAAY | #5 (NK-Y) | GATTGGY | #5 (Ebox) | CACGTG |
| EBF | CCCNNGRG | #6 (SP1) | GGGCGGR | #6 | ACTACA |
| Ebox | CANNTG | #7 (AP-1) | TGANTCA | #7 (CRE-like) | GTGACG |
| Ebox-GATA | CANNTG-GATA | #8 (Novel) | TMTCGCGANR | #8 | CTTTGT |
| c-Myc | CAYGYG | #9 (ATF3) | TGAYRTCA | #9 (SP1-like) | CCCTCCCCC |
| MyoD | CANCWG | #10 (YY1) | GCCATNTTG | #10 | GCGCAGGCGC |
| ETS | GGAW | #11 (GABP) | MGGAAGTG | #11 | GCGCGC |
| GATA | GATA | #12 (E12) | CAGGTG | #12 | AACTTT |
| GLI1 | GACCACCCA | #13 (LEF1) | CTTTGT | #13 | CCTTTAA |
| HMG | WWCAAWG | #14 (ATF3) | TGACGTCA | #14 | TGCGCA |
| HNF1 | GTTAAT | #15 (AP-4) | CAGCTG | #15 | CTCGCGAGA |
| HNF3 | TRTTTRY | #16 (C-ETS-2) | RYTTCCTG | #16 | TTGGCT |
| HNF4 | CAAAGK | #17 (IRF1) | AACTTT | #17 (TATA) | TATAAA |
| Ikaros | HRGGAW | #18 (SREBP-1) | TCANNTGAY | #18 | AAGATGGCGG |
| Iroquois | ACANNTGT | #19 (Novel) | GKCGCN(7)TGAYG | #19 | TTTGTT |
| MEF2 | CTAWWWWTAR | #20 (E4F1) | GTGACGY | #20 | ATGCAAAT |
| MEIS1 | TGACAS | #21 (Novel) | GGAANCGGAANY | #21 | TAATTA |
| MYB | YAACNG | #22 (Novel) | TGCGCANK | #22 | TTTAAG |
| NBOX | CACNAG | #23 (CHX10) | TAATTA | #23 | CGCATGCG |
| NANOG | SATTANS | #24 (MAZ) | GGGAGGRR | #24 | ATAAAT |
| NFAT | GGAAA | #25 (ESRRA) | TGACCTY | #25 | TTTAAA |
| NFAT-AP1 | WGGAAA-TGASTCA | #26 (E4BP4) | TTAYRTAA | #26 | GCCATTTT |
| NFAT-AP1 | WGGAAA-STCA (half) | #27 (Novel) | TGGN(6)KCCAR | #27 | ATAAAA |
| NFKB | GGGRNNYYY | #28 (RSRFC4) | CTAWWWATA | #28 | TAAATA |
| NKX2.5 | CAMTTNR | #29 (Novel) | CTTTAAR | #29 (HTH) | CAGGTG |
| OCT3/4 | ATGMWWVW | #30 (Novel) | YGCGYRCGC | #30 | CTAGCAAC |
| OTX | TAATCY | #31 (Novel) | GGGYGTGNY | #31 (CRE) | TGACGC |
| p53 | RCNWGYNN*0-1*NNRCAWGY | #32 (NF-E2) | TGASTMAGC | #32 | CATTGT |
| PAX5 | RNKMANBSNWGNRKRMM | #33 (MEF-2) | YTATTTTNR | #33 | GCCATCTT |
| RE1 | NTYAGMRCCNNRGMSAG | #34 (Novel) | CYTAGCAAY | #34 | ATTTAT |
| SOX2 | CWTTGTD | #35 (MYOD) | GCANCTGNY | #35 | ATGAAT |
| SP1 (1) | GGGHGGG | #36 (FREAC-2) | RTAAACA | Ettwiller | |
| SP1 (2) | GGGSWGGG | #37 (Novel) | GTTRYCATRR | ||
| SP1 (3) | GGKGYGGG | #38 (ERR-alpha) | TGACCTTG | #1 | TAATTA |
| SRF | CCWWWWWWGG | #39 (Novel) | TCCCRNNRTGC | #2 | CAGCTG |
| STAT5 | TTCYNRGAA | #40 (STAT5A) | TTCYNRGAA | #3 (TRE) | TGAGTCA |
| TEF | CATTCC | #41 (MEIS1) | TGACAGNY | #4 (ETS) | CAGGAAGT |
| | | #42 (Novel) | TGACATY | #5 | CCCTCCC |
| | | #43 (Novel) | GTTGNYNNRGNAAC | #6 | AATAAA |
| | | #44 (OCT-X) | YATGNWAAT | #7 (Homeo-like) | AATTAA |
| | | #45 (Novel) | CCANNAGRKGGC | #8 | AGAAAA |
| | | #46 (Novel) | WTTGKCTG | #9 | ATAAAA |
| | | #47 (NF-1) | TGCCAAR | #10 | TTTCCA |
| | | #48 (C-REL) | GCGNNANTTCC | #11 (TATA-box) | TATAAATAG |
| | | #49 (SOX-9) | CATTGTYY | #12 | AGGAAA |
| | | #50 (PU.1) | RGAGGAARY | #13 | TTTCCT |
| | | | | #14 | TTCAAA |
| | | | | #15 | TGACCT |
| | | | | #16 | ATTTGCAT |
| | | | | #17 | TTGTTT |
| | | | | #18 | TTTAAA |
| | | | | #19 | TTTCAG |
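The consensus strings in the table use IUPAC nucleotide codes (for example Y for C or T, and N for any base). As a rough sketch of how such a consensus could be matched on both strands, the following Python code translates a consensus into a regular expression. The function names are hypothetical and this is not how CoMoDis itself scans sequences; the spacer shorthand N(7) is assumed to mean seven N positions, and composite or annotated entries such as `CANNTG-GATA`, `TGACG (half)` or the p53 entry are not handled.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC symbol (R <-> Y, K <-> M, B <-> V, D <-> H).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def expand_spacers(consensus):
    """Rewrite spacer shorthand such as N(7) as seven explicit N symbols."""
    return re.sub(r"([A-Z])\((\d+)\)",
                  lambda m: m.group(1) * int(m.group(2)), consensus)

def consensus_to_regex(consensus):
    """Translate an IUPAC consensus into a regular expression string."""
    return "".join(IUPAC[symbol] for symbol in expand_spacers(consensus.upper()))

def reverse_complement(consensus):
    """Reverse complement of an IUPAC consensus, with spacers expanded."""
    return expand_spacers(consensus.upper()).translate(COMPLEMENT)[::-1]

def find_sites(sequence, consensus):
    """Return sorted (start, strand) pairs for matches on either strand."""
    sequence = sequence.upper()
    sites = set()
    for strand, motif in (("+", consensus), ("-", reverse_complement(consensus))):
        # Lookahead so that overlapping sites are all reported.
        pattern = re.compile(f"(?=({consensus_to_regex(motif)}))")
        for match in pattern.finditer(sequence):
            sites.add((match.start(1), strand))
    return sorted(sites)

# AML1 consensus from the table, scanned against a toy sequence.
print(consensus_to_regex("TGYGGT"))
print(find_sites("AATGTGGTCCACCACAGG", "TGYGGT"))
```

Start positions are zero-based and always refer to the forward strand of the input; a palindromic consensus such as `CACGTG` is reported once per strand at the same position.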
Summary of motif discovery and motif scanning tools. The addresses link to the authors' websites.
| Tool | Web site | Reference |
|---|---|---|
| Motif discovery: single sequence output | | |
| BioProspector | | |
| DME | | |
| GAME | | |
| nMICA | | |
| Weeder | | |
| YMF | | |
| Motif discovery: orthologous sequence output | | |
| PhyloCon | | |
| PhyloGibbs | | |
| PhyME | | |
| Motif scanning | | |
| Clover | | |
| MotifScanner | | Unpublished |
| PROMO | | |