Abstract
BACKGROUND: As more genomes are sequenced, comparative genomics approaches provide a methodology for identifying conserved regulatory elements that may be involved in gene regulation.
Year: 2006 PMID: 17217514 PMCID: PMC1780116 DOI: 10.1186/1471-2105-7-S4-S21
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The effect of (a) SMM and (b) WBMM on the size of the sequence space to be searched and on the percentage of true binding sites retained for the muscle genes. Masking percentage (MP) is defined in Methods. 't/7' represents SMM and indicates that a base is retained if it lies in a non-repeat region and is conserved in at least t of the 7 species (Methods). 't/7 W' indicates the corresponding WBMM.
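The 't/7' retention rule in the caption can be illustrated with a minimal sketch. It assumes the per-base inputs are already available: a repeat flag and a count of species in which the base is conserved. The function name `mask_by_conservation`, the input layout, and the use of `N` as the masking character are illustrative assumptions, not the authors' implementation, and whether the count includes the reference species is not stated here. WBMM is not covered because its windowing rule is defined only in Methods.

```python
from typing import Sequence


def mask_by_conservation(
    seq: str,
    in_repeat: Sequence[bool],
    conserved_in: Sequence[int],
    t: int,
    mask_char: str = "N",
) -> str:
    """Apply a 't/7'-style rule: keep a base only if it is outside repeat
    regions and conserved in at least t species; mask it otherwise.

    in_repeat[i]    -- True if position i lies in a repeat region (assumed input)
    conserved_in[i] -- number of species in which position i is conserved
                       (assumed input; how it is counted is not specified here)
    """
    if not (len(seq) == len(in_repeat) == len(conserved_in)):
        raise ValueError("seq, in_repeat and conserved_in must have equal length")

    masked = []
    for base, repeat, count in zip(seq, in_repeat, conserved_in):
        keep = (not repeat) and count >= t
        masked.append(base if keep else mask_char)
    return "".join(masked)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    seq = "ACGTACGT"
    in_repeat = [False, False, True, True, False, False, False, False]
    conserved_in = [7, 1, 7, 7, 3, 2, 4, 0]
    print(mask_by_conservation(seq, in_repeat, conserved_in, t=3))
```

Raising `t` masks more positions, which is the trade-off the figure plots: a smaller search space against a lower share of true binding sites retained.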
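Comparison of motifs identified by different programs for the muscle genes (see notes 1-5).
| Sequence set or program | Motif 1 | Motif 2 | Motif 3 | Motif 4 | Motif 5 | Combined |
| --- | --- | --- | --- | --- | --- | --- |
| Masking repeats (note 6) | GGGACATG 14/2/68 | TCAGCCCT 4/1/63 | N | N | ATCAGCCC 4/2/60 | 34/5/191 |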
| 1/7 | AGGGGGCATG 14/1/19 | N | N | N | N | 34/1/19 |
| 2/7 | GACAGCTG 14/9/41 | ACAAGG 4/1/5 | AAATAGCCCC 7/1/4 | GACATCTGGC 4/1/14 | N | 34/12/64 |
| 3/7 | CAGCTGTT 14/10/19 | CCTTATTTGG 4/2/12 | GCTAAAAATAGC 7/6/12 | N | CATACAAGGC 4/1/2 | 34/19/45 |
| 4/7 | GACAGCTG 14/9/19 | CCCAAAATAGCC 4/1/5 | CTATAAATAC 7/6/13 | N | CCATACAAGGCC 4/1/3 | 34/17/40 |
| 2/7 W | GACAGCTG 14/6/43 | TGCCCT 4/1/15 | N | GACAGCTGAG 4/1/15 | ACAAGGCC 4/1/31 | 34/9/104 |
| 3/7 W | ACAGCTGC 14/8/21 | AGGGCA 4/1/12 | GGGCTATAAA 7/2/9 | AGGGCAGC 4/1/37 | N | 34/12/79 |
| 4/7 W | CAGCTGTT 14/9/15 | CCAAATATGG 4/2/3 | CCTAAGAATAGC 7/2/5 | N | CATACAAGGC 4/1/2 | 34/14/25 |
| CompareProspector | CTGTSA 14/1/4 | KAGCYATA 4/1/1 | GYTATW 7/5/7 | CAGCTGTS 4/1/4 | N | 34/8/16 |
| Toucan (note 7) | GGGrmAGG 14/1/5 | N | N | N | CCTGCT 4/2/12 | 34/3/17 |
1 x/y/z in the table denotes: number of experimentally determined binding sites / number of experimental sites that overlap predicted sites / number of sites predicted by the discovered motif.
2 Refer to Figure 1 for the description of 't/7' as well as 't/7 W'.
3 'N' in a table cell indicates that the corresponding motif was not detected.
4 Representation of degenerate nucleotides: M = (AC), S = (GC), V = (AGC), R = (AG), Y = (CT), H = (ACT), W = (AT), K = (GT), D = (AGT), B = (GCT), N = (AGCT).
5 None of the motifs reported by our approach, CompareProspector or Toucan predicted the experimentally determined Sp1 binding site.
6 'Masking repeats' indicates that the 5000-bp upstream sequences (no gap) were used with only the repeat regions masked.
7 Motifs identified by Toucan were taken from their report [7].
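The degenerate-nucleotide codes in note 4 map directly onto character classes, so a consensus such as KAGCYATA can be scanned against a sequence with a regular expression. The sketch below is illustrative only: the helper names `motif_to_regex` and `find_sites` are hypothetical, it scans the forward strand only, and it requires exact matches to the consensus, which is not necessarily how any of the compared programs score sites.

```python
import re
from typing import List

# Codes as listed in note 4, plus the four unambiguous bases.
DEGENERATE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "S": "GC", "V": "AGC", "R": "AG", "Y": "CT",
    "H": "ACT", "W": "AT", "K": "GT", "D": "AGT", "B": "GCT",
    "N": "AGCT",
}


def motif_to_regex(motif: str) -> str:
    """Translate a degenerate consensus into a regular-expression string."""
    parts = []
    for code in motif.upper():
        bases = DEGENERATE[code]  # raises KeyError on an unknown code
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_sites(motif: str, sequence: str) -> List[int]:
    """Return 0-based start positions of all forward-strand matches.

    A lookahead is used so that overlapping matches are also reported.
    """
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Toy sequence, invented purely for illustration.
    print(motif_to_regex("KAGCYATA"))
    print(find_sites("KAGCYATA", "TTGAGCTATAGG"))
```

Lower-case codes, as in the Toucan motif GGGrmAGG, are upper-cased before lookup in this sketch; whether case carries meaning in the original Toucan report is not stated here.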
Figure 2. Comparison of the performance of CompareProspector, Toucan and our approach on the muscle genes. Sn denotes the combined sensitivity of a particular program for the muscle genes, PPR denotes the combined positive prediction rate, and AP denotes the combined average performance (Table 1). 'Mask repeats' indicates that the 5000-bp upstream sequences (no gap) were used with only the repeat regions masked. Refer to Figure 1 for the description of 't/7' and 't/7 W'.
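The x/y/z triples in the table are enough to sketch how quantities like Sn, PPR and AP could be derived. The formulas below are assumptions, since the definitions are not given in this excerpt: Sn is taken as overlap divided by experimental sites, PPR as overlap divided by predicted sites, and AP as the arithmetic mean of the two. The names `SiteCounts`, `parse_counts` and the metric functions are hypothetical.

```python
from typing import NamedTuple


class SiteCounts(NamedTuple):
    experimental: int  # x: experimentally determined binding sites
    overlap: int       # y: experimental sites overlapping predicted sites
    predicted: int     # z: sites predicted by the discovered motif


def parse_counts(cell: str) -> SiteCounts:
    """Parse an 'x/y/z' table cell such as '34/12/64'."""
    x, y, z = (int(part) for part in cell.strip().split("/"))
    return SiteCounts(x, y, z)


def sensitivity(c: SiteCounts) -> float:
    """Assumed definition: fraction of experimental sites recovered."""
    return c.overlap / c.experimental if c.experimental else 0.0


def positive_prediction_rate(c: SiteCounts) -> float:
    """Assumed definition: fraction of predicted sites that are experimental."""
    return c.overlap / c.predicted if c.predicted else 0.0


def average_performance(c: SiteCounts) -> float:
    """Assumed definition: arithmetic mean of Sn and PPR."""
    return (sensitivity(c) + positive_prediction_rate(c)) / 2


if __name__ == "__main__":
    # Combined cells copied from the last column of the table.
    rows = [("2/7", "34/12/64"), ("3/7", "34/19/45"), ("CompareProspector", "34/8/16")]
    for label, cell in rows:
        c = parse_counts(cell)
        print(label, round(sensitivity(c), 3),
              round(positive_prediction_rate(c), 3),
              round(average_performance(c), 3))
```

If the paper defines AP differently, only `average_performance` needs to change; the parsing and the two ratios follow directly from note 1.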