Erik Larsson, Per Lindahl, Petter Mostad.
Abstract
BACKGROUND: Correct temporal and spatial gene expression during metazoan development relies on combinatorial interactions between different transcription factors. As a consequence, cis-regulatory elements often colocalize in clusters termed cis-regulatory modules. These may have requirements on organizational features such as spacing, order and helical phasing (periodic spacing) between binding sites. Due to the turning of the DNA helix, a small modification of the distance between a pair of sites may sometimes drastically disrupt function, while insertion of a full helical turn of DNA (10-11 bp) between cis elements may cause functionality to be restored. Recently, de novo motif discovery methods which incorporate organizational properties such as colocalization and order preferences have been developed, but there are no tools which incorporate periodic spacing into the model.
Year: 2007 PMID: 17963524 PMCID: PMC2200674 DOI: 10.1186/1471-2105-8-418
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Schematic drawing of the model structure. The triangle and rectangle represent the first and second motif respectively. Gray boxes indicate valid locations for the second motif given the position of the first. The "phase" (distance offset) is assumed to be constant over all sequences and is determined by the algorithm.
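The idea of "valid locations for the second motif given the position of the first" can be illustrated with a small sketch. It is not the HeliCis implementation: the function name `valid_offsets` and the parameters `period`, `tolerance` and `max_distance` are hypothetical, and the default period of 10.5 bp is only an assumption based on the 10-11 bp helical turn mentioned in the abstract.

```python
def valid_offsets(phase, period=10.5, tolerance=1, max_distance=50):
    """Return the distances (in bp) from motif 1 at which motif 2 is allowed.

    A distance d is allowed when it lies within `tolerance` bp of
    phase + k * period for some integer k. All parameter names are
    illustrative assumptions, not HeliCis settings.
    """
    offsets = []
    for d in range(1, max_distance + 1):
        # Position of d within one helical period, measured from the phase.
        remainder = (d - phase) % period
        # Deviation from the nearest in-phase position on either side.
        deviation = min(remainder, period - remainder)
        if deviation <= tolerance:
            offsets.append(d)
    return offsets


def valid_positions(motif1_pos, phase, **kwargs):
    """Translate allowed distances into absolute positions for motif 2."""
    return [motif1_pos + d for d in valid_offsets(phase, **kwargs)]
```

For example, `valid_offsets(phase=3, period=10, tolerance=1, max_distance=30)` yields three windows of allowed distances centred on 3, 13 and 23 bp, which corresponds to the gray boxes in the schematic.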
Figure 2. Web interface screenshot, showing the parameter setup screen. The schematic shows valid positions for motif 2 given the position of motif 1. The image is dynamically generated to reflect the current parameter settings.
Figure 3. Performance on synthetic sequence datasets containing colocalized and periodically spaced CArG and ETS motifs with varying information content. HeliCis with different settings was compared to MEME and BioProspector. The information content of the motifs was gradually reduced by varying the number of pseudocounts, and the sensitivity of each tool was determined by calculating the fraction of correctly identified motifs. Results are averaged over 5 trials.
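To see why adding pseudocounts lowers the information content of a motif, consider the following sketch. The record does not state how information content was computed or how pseudocounts were applied, so this assumes the common Shannon-based definition for a DNA column against a uniform background, with the same pseudocount added to each of the four bases. The function name `column_information` is hypothetical.

```python
import math


def column_information(counts, pseudocount=0.0):
    """Information content (bits) of one motif column over A, C, G, T.

    Assumes a uniform background, so the maximum is 2 bits. The same
    pseudocount is added to every base before normalising (an assumption;
    the source does not describe its exact scheme).
    """
    total = sum(counts) + 4 * pseudocount
    information = 2.0
    for count in counts:
        p = (count + pseudocount) / total
        if p > 0:
            # Add p * log2(p), i.e. subtract this base's share of the entropy.
            information += p * math.log2(p)
    return information
```

A fully conserved column such as `[10, 0, 0, 0]` carries 2 bits without pseudocounts, and the value falls toward 0 as the pseudocount grows, because the base frequencies move toward uniform.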
Figure 4. Performance on synthetic sequence datasets with varying motif coverage. Datasets of 20 sequences with colocalized and periodically spaced CArG and ETS motifs were generated. The proportion of sequences containing the motifs was gradually reduced, thus making them increasingly difficult to detect. HeliCis with different settings was compared to MEME and BioProspector. The plots show sensitivity and positive predictive value (PPV = TP/(TP + FP)). Results are averaged over 5 trials.
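The two measures in the caption can be computed as in the sketch below. PPV follows the formula given above; sensitivity is taken as TP/(TP + FN), the usual definition. How a predicted site was judged "correct" is not stated in this record, so the exact match on `(sequence_id, position)` used here is an assumption, and `score_predictions` is a hypothetical helper.

```python
def score_predictions(true_sites, predicted_sites):
    """Return (sensitivity, PPV) for predicted motif sites.

    Sites are (sequence_id, position) tuples. A prediction counts as a
    true positive only on an exact match, which is an illustrative
    assumption rather than the criterion used in the paper.
    """
    true_sites = set(true_sites)
    predicted_sites = set(predicted_sites)
    tp = len(true_sites & predicted_sites)   # planted sites that were found
    fn = len(true_sites - predicted_sites)   # planted sites that were missed
    fp = len(predicted_sites - true_sites)   # predictions with no planted site
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, ppv
```

With four planted sites and three predictions of which two match, `score_predictions` gives a sensitivity of 2/4 and a PPV of 2/3.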