Nikolaus Rajewsky, Massimo Vergassola, Ulrike Gaul, Eric D Siggia.
Abstract
BACKGROUND: Regulation of gene transcription is crucial for the function and development of all organisms. While gene prediction programs that identify protein coding sequence are used with remarkable success in the annotation of genomes, the development of computational methods to analyze noncoding regions and to delineate transcriptional control elements is still in its infancy.
Year: 2002 PMID: 12398796 PMCID: PMC139975 DOI: 10.1186/1471-2105-3-30
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Summary of the input (dark blue) and output (white) of the three algorithms (light blue).
Figure 2. Ahab score for the hairy locus. Module score for the hairy locus (screenshot from our interactive web browser). Plotted is the Ahab score as a function of position in the genome. Known modules are marked as "module". Four of the known modules (stripes 1 and 5–7) score high enough to appear among the top 146 genome-wide predictions, and Ahab's predicted binding sites are mapped out in these cases. The stripe 3+4 module is not recovered.
Figure 3. The even-skipped stripe 3+7 module. Known binding sites (in blue) and sites predicted by Ahab (in red) for the even-skipped stripe 3+7 module. Sites for knirps are marked by circles and sites for hunchback by boxes. The upper (lower) half depicts binding sites for the plus (minus) strand. The height of the red symbols corresponds to the score of the sites (Eq. 4).
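The site score behind the symbol heights is defined by the paper's Eq. 4, which is not reproduced in this excerpt. As a hedged illustration only, a common way to score a candidate site against a position weight matrix is the log-odds sum sketched below. The count matrix, the pseudocount of 0.5 and the uniform background are made-up assumptions, and this is not claimed to be the paper's Eq. 4 or Ahab's actual scoring.

```python
import math

# Hypothetical count matrix for a 4 bp motif (rows = positions, columns = A, C, G, T).
# Illustrative numbers only; they are not taken from the paper.
COUNTS = [
    [8, 0, 1, 1],
    [0, 9, 0, 1],
    [1, 0, 9, 0],
    [0, 1, 0, 9],
]
BASES = "ACGT"


def log_odds_matrix(counts, background=(0.25, 0.25, 0.25, 0.25), pseudocount=0.5):
    """Convert counts to per-position log-odds weights against a background model."""
    weights = []
    for row in counts:
        total = sum(row) + pseudocount * len(row)
        weights.append([
            math.log(((n + pseudocount) / total) / bg)
            for n, bg in zip(row, background)
        ])
    return weights


def site_score(seq, weights):
    """Sum the log-odds weight of each base; higher values mean a more motif-like site."""
    return sum(weights[i][BASES.index(base)] for i, base in enumerate(seq))


W = log_odds_matrix(COUNTS)
print(round(site_score("ACGT", W), 2))  # consensus-like site: positive score
print(round(site_score("TTTA", W), 2))  # poor match: negative score
```

The pseudocount keeps unobserved bases from producing log(0); with a uniform background, a score above zero means the site is more likely under the motif model than under random sequence.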
Table 1. The 32 modules whose nearby genes are known to be patterned in the early blastoderm (upper block), and 11 additional modules (lower block) in which pairs of modules are linked to the same gene.
| Rank | Score | Gene | Location |
|---|---|---|---|
| 1 * | 37.82 | hairy (stripe 6) | up/9.2 kb |
| 2 * | 28.29 | knirps | up/1.6 kb |
| 3 * | 27.24 | tailless | up/2.6 kb |
| 5 * | 25.32 | knirps | up/1.1 kb |
| 8 | 20.80 | runt (stripe 7) | up/3.2 kb |
| 9 | 19.89 | optix = six3 | down/11 kb |
| 10 | 19.75 | Dichaete | down/2.3 kb |
| 17 | 18.88 | Tenascin-m | up/110 kb |
| 18 | 18.87 | giant | down/14.5 kb |
| 20 * | 18.76 | Kruppel | up/4.1 kb |
| 23 | 18.30 | ken | intra |
| 24 | 18.29 | giant (posterior) | up/2.1 kb |
| 25 | 18.29 | hairy (stripe 1) | up/4.7 kb |
| 27 | 18.06 | hairy (stripe 1 or 5) | up/5.4 kb |
| 34 | 17.36 | hairy (stripe 7) | up/10.4 kb |
| 36 * | 17.25 | even skipped (stripe 3+7) | up/3.5 kb |
| 37 | 17.15 | knirps-like | intra |
| 41 * | 17.02 | hairy (stripe 5) | up/6.2 kb |
| 43 | 16.94 | brinker | up/10.9 kb |
| 45 | 16.83 | pipsqueak | intra |
| 46 | 16.82 | teashirt | intra |
| 48 * | 16.80 | short gastrulation | intra |
| 51 | 16.76 | abd-A | up/17 kb |
| 54 | 16.65 | abd-B | up/15.3 kb |
| 61 | 16.20 | vnd | intra |
| 75 | 15.90 | cap n' collar | up/4.3 kb |
| 76 * | 15.89 | even skipped (stripe 2) | up/1.7 kb |
| 91 | 15.69 | runt (stripe 3) | up/9.67 kb |
| 120 * | 15.34 | tailless (proximal torso) | up/0.64 kb |
| 124 | 15.32 | proboscipedia | intra |
| 126 | 15.29 | runt | up/17.2 kb |
| 129 * | 15.24 | hunchback (central stripe) | up/3.34 kb |
| | | *Lower block: additional modules* | |
| 6 | 23.97 | Cyp6V1 | down/1.6 kb |
| 11 | 19.66 | CG13595? | down/4.5 kb |
| 14 | 19.32 | Cyp6V1 | up/6 kb |
| 55 | 16.62 | echinoid | up/58 kb |
| 58 | 16.34 | CG2118/Acf1/faf | intra |
| 69 | 16.01 | faf/Acf1/CG2118 | intra |
| 93 | 15.65 | echinoid | intra |
| 105 | 15.51 | bruno 3 | up/95 kb |
| 117 | 15.37 | CG5060 | up/31.1 kb |
| 130 | 15.22 | bruno 3 | up/25.1 kb |
| 132 | 15.18 | CG5060 | up/30.1 kb |
Modules that were used to construct weight matrices are marked with stars. The columns give the rank of each module, its score, the gene, and the location of the module with respect to the gene (upstream, downstream, or intragenic). For references and additional material see additional File 3.
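The Location column encodes each module's position relative to its gene as up or down plus a distance in kb, or intra for intragenic modules. A minimal Python sketch for turning these strings into structured values, assuming every non-intragenic entry follows the `up/… kb` or `down/… kb` pattern; the function name `parse_location` is hypothetical. Entries that do not fit the pattern raise an error rather than being silently guessed.

```python
import re
from typing import Optional, Tuple

# Matches entries such as "up/9.2 kb" or "down/11 kb"; "intra" is handled separately.
_LOCATION = re.compile(r"^(up|down)/(\d+(?:\.\d+)?)\s*kb$")


def parse_location(text: str) -> Tuple[str, Optional[float]]:
    """Return (relation, distance in kb); distance is None for intragenic modules."""
    text = text.strip()
    if text == "intra":
        return ("intra", None)
    match = _LOCATION.match(text)
    if match is None:
        raise ValueError(f"unrecognized location: {text!r}")
    return (match.group(1), float(match.group(2)))


for entry in ["up/9.2 kb", "down/11 kb", "intra"]:
    print(entry, "->", parse_location(entry))
```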
Statistics of factor binding sites for the set of 146 modules predicted by Ahab.
| | bcd | cad | dl | hb | kni | Kr | tll | torRE |
|---|---|---|---|---|---|---|---|---|
| sites | 179 | 99 | 143 | 213 | 302 | 203 | 597 | 24 |
| specificity | 3.3 | 3.3 | 3.6 | 3.4 | 2.6 | 3.6 | 2.8 | 4.3 |
| modules | 65 | 35 | 62 | 72 | 84 | 76 | 119 | 4 |
The specificity is expressed in units of standard error; e.g., a value of 3 (4) implies a 0.14% (0.003%) probability of obtaining a match as good as the median data match from random sequence.
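The two quoted probabilities agree with the one-sided upper tail of a standard normal distribution (about 0.135% beyond 3 and about 0.0032% beyond 4). The check below assumes that Gaussian interpretation, which the footnote's numbers imply but the text does not state explicitly; `one_sided_tail` is a hypothetical helper name.

```python
import math


def one_sided_tail(z: float) -> float:
    """P(Z >= z) for a standard normal variable Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


for z in (3.0, 4.0):
    print(f"z = {z}: {100 * one_sided_tail(z):.4f}%")
```

The printed values are consistent, after rounding, with the 0.14% and 0.003% given in the footnote.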
Motifs derived by Gibbs sampling the indicated modules (see additional File 1).
| Module | recovered factors (copies) | novel motifs |
|---|---|---|
| eve stripes 2, 3+7 | kni(15), bcd(9), hb/cad(9) | |
| eve stripes 5, 4+6 | hb/cad(6), kni/hb/cad(12) | RTTNSRCGSAAT(9), ATYCYGCARY/ |
| h stripes 5,6,7 | Kr(12), hb/cad(14) | GRCNWG[T/G]TSNSA (9) |
| hb (both mods) | | ATTTTCCNSC (9) |
| kni (1.1 k) | Kr(7), tll(5), hb/cad(9) | GWGWG[A/C]GWGYG(7) |
| Kr (700 bp) | bcd(5), hb/cad(7) | TWNTGATCCWS (6) |
| tll (3 mods) | kni(9), cad(9), | TCRAWSAAT/ |
The criterion for a motif to match a known factor is given in Methods; "copies" refers to all sequences in the Gibbs-derived motif. Only the consensus sequences of the most prominent unclassified motifs are shown, using IUPAC ambiguity codes (R = A/G, Y = C/T, W = A/T, S = G/C, N = any base). Matches in italics are marginal, and names linked by / co-occur within the same Gibbs motif, possibly on opposite strands.
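The consensus strings use IUPAC ambiguity codes, and some contain bracketed alternatives such as [T/G]. A hedged Python sketch of how such a consensus could be turned into a regular expression and scanned on both strands; the helper names are hypothetical, and exact consensus matching is a much cruder test than the match criterion referenced in Methods.

```python
import re

# IUPAC ambiguity codes appearing in the consensus sequences above, plus K and M for completeness.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus_to_regex(consensus: str) -> str:
    """Translate a consensus such as 'TWNTGATCCWS' or 'GRCNWG[T/G]TS' into a regex."""
    parts = []
    i = 0
    while i < len(consensus):
        ch = consensus[i]
        if ch == "[":  # bracketed alternatives such as [T/G]
            end = consensus.index("]", i)
            parts.append("[" + consensus[i + 1:end].replace("/", "") + "]")
            i = end + 1
        else:
            parts.append(IUPAC[ch])  # unknown letters raise KeyError
            i += 1
    return "".join(parts)


def find_sites(consensus: str, seq: str):
    """Return sorted (start, strand) hits; starts are plus-strand coordinates."""
    # The lookahead lets overlapping matches be reported.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    width = len(re.sub(r"\[[^\]]*\]", "X", consensus))  # a bracket group is one position
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    revcomp = seq.translate(COMPLEMENT)[::-1]
    # A hit at position p of the reverse complement covers plus-strand [L - p - width, L - p).
    hits += [(len(seq) - m.start() - width, "-") for m in pattern.finditer(revcomp)]
    return sorted(hits)


print(find_sites("TWNTGATCCWS", "CCTATTGATCCAGGG"))
```

Scanning the reverse complement with the same pattern reflects the footnote's remark that co-occurring motifs may sit on opposite strands.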
Modules predicted by Ahab from the hairy-derived Gibbs motifs that are patterned in the early blastoderm.
| Rank | Score | Gene | Location |
|---|---|---|---|
| 1 * | 42.16 | hairy (stripe 7 element) | up/10.9 kb |
| 2 * | 36.43 | hairy (stripe 6 element) | up/9.2 kb |
| 3 * | 31.43 | hairy (stripe 5 element) | up/6.2 kb |
| 4 * | 25.08 | hairy (stripe 7 element) | up/10.4 kb |
| 5 | 22.58 | hunchback (hb central stripe) | up/3.3 kb |
| 7 | 21.91 | abd-A (in iab-7 region) | up/83.2 kb |
| 8 | 20.02 | homothorax | intra |
| 9 | 19.41 | bxd (in bxd reg region) | up/18.8 kb |
| 10 | 18.55 | frizzled 2 | up/37.9 kb |
| 16 | 17.76 | hairy (not in known modules) | up/18.9 kb |
| 17 | 17.73 | abd-A | up/0.2 kb |
| 21 | 17.43 | abd-A | up/35.8 kb |
| 24 | 16.69 | fd64A | up/1.4 kb |
| 25 | 16.55 | nubbin | up/2.6 kb |
| 31 | 16.10 | extra macrochaetae | down/2.2 kb |
| 37 | 15.90 | knirps | intra |
| 46 | 15.90 | Btk29A | intra |
| 11 | 18.55 | CG6559 | up/30 kb |
| 39 | 15.72 | CG6559 | up/45 kb |
The last two modules are listed separately because they lie near a common gene. Modules that were used to construct weight matrices are marked with stars. The format follows Table 1. For references and additional material see additional File 5.
Figure 4. Argos score for the upstream regions of giant, knirps and Kruppel. Argos score to observe a 500 bp module upstream of giant, knirps and Kruppel. The bars mark known modules, and translation start is at the rightmost base.
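Figure 4 evaluates the Argos score for a 500 bp module along each upstream region. The Argos statistic itself is not defined in this excerpt, so the sketch below illustrates only a generic sliding-window count of predicted binding-site positions in 500 bp windows; it is not the Argos score. The 50 bp step, the raw counts and the site positions are all assumptions for illustration.

```python
from bisect import bisect_left
from typing import List, Tuple


def window_site_counts(site_starts: List[int], region_length: int,
                       window: int = 500, step: int = 50) -> List[Tuple[int, int]]:
    """Count predicted site start positions in each window [start, start + window)."""
    starts = sorted(site_starts)
    counts = []
    # Slide the window along the region; windows start at 0, step, 2*step, ...
    for start in range(0, max(region_length - window, 0) + 1, step):
        end = start + window
        # Two binary searches give the number of sorted positions in [start, end).
        counts.append((start, bisect_left(starts, end) - bisect_left(starts, start)))
    return counts


# Hypothetical site positions (bp from the start of an upstream region); not data from the paper.
sites = [120, 180, 260, 410, 950, 1400, 1460]
best = max(window_site_counts(sites, region_length=2000), key=lambda pair: pair[1])
print("densest 500 bp window starts at", best[0], "with", best[1], "sites")
```

Keeping the site positions sorted lets each window be counted with two binary searches instead of a full pass over all sites.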