Deepak Sharma, Debasisa Mohanty, Avadhesha Surolia.
Abstract
RegAnalyst is a user-friendly web interface that integrates MoPP (Motif Prediction Program), MyPatternFinder (a pattern detection tool) and MycoRegDB (a mycobacterial promoter and regulatory elements database). Since motif discovery is a challenging task, numerous tools have been developed over the past few years. Strikingly, the existing programs did not detect the known consensus in all mycobacterial datasets (which epitomize degeneracy) even in the absence of noise, and their performance dropped further in the presence of noise. Consequently, we developed MoPP, a de novo, greedy (for degeneracy), 'inexact' word-based tool tailored to enumerate significantly conserved degenerate oligonucleotide motifs. Benchmarking on datasets from MycoRegDB and SCPD (http://rulai.cshl.edu/SCPD/) indicates that MoPP (i) consistently outperforms other motif discovery tools on highly degenerate as well as less degenerate datasets and (ii) successfully detects completely degenerate motifs (with no two instances of a pattern being exactly the same) even in the presence of noise. We have also developed an accessory program, MyPatternFinder, that scans a given sequence or genome for exact or approximate matches to a query motif of any length, whether identified by MoPP or defined by the user. RegAnalyst will be a valuable tool for in silico analysis of regulatory networks and can be accessed at http://www.nii.ac.in/~deepak/RegAnalyst.
Year: 2009 PMID: 19465400 PMCID: PMC2703886 DOI: 10.1093/nar/gkp388
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
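The abstract describes MyPatternFinder as scanning a sequence for exact or approximate matches to a query motif. The following is a minimal Python sketch of that idea, not the authors' implementation. The function name `find_approximate`, the `max_mismatches` parameter and the plain sliding-window mismatch count are assumptions made for illustration; the IUPAC ambiguity table is standard nomenclature.

```python
# Hypothetical sketch of approximate motif matching; not the MyPatternFinder code.
# Standard IUPAC nucleotide ambiguity codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_approximate(sequence, motif, max_mismatches=0):
    """Return (start, matched_word, mismatches) for each window of the
    sequence that matches the motif with at most max_mismatches mismatches.

    A sequence base agrees with a motif position when it is one of the
    bases allowed by that position's IUPAC code.
    """
    sequence = sequence.upper()
    motif = motif.upper()
    hits = []
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        mismatches = sum(
            base not in IUPAC[code] for base, code in zip(window, motif)
        )
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits
```

Calling `find_approximate(genome, "TATAMT", max_mismatches=1)` would list every 6-mer within one mismatch of the query. Positions are 0-based, only one strand is scanned, and a motif character outside the IUPAC table raises `KeyError`; a real genome scan would also need the reverse complement.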
Figure 1. Schematic representation of MoPP's algorithm. The unparalleled ability of MoPP in detecting degenerate motifs is due to the steps indicated in italics.
Performance comparison of MoPP with five popular motif finders on mycobacterial datasets
| Regulon | Consensus | Size | MoPP | YMF | PRISM | SCOPE | Oligo | MEME |
|---|---|---|---|---|---|---|---|---|
(a) Weeder could not be compared since the background file was not available.
(b) Pattern is highlighted in bold if it matches the consensus with not more than one mismatch and ranks among the top five. The number in parentheses indicates the rank of the pattern.
(c) Not considered a match since ≥80% of the matching residues are degenerate nucleotides or match degenerate nucleotides.
(d) According to MtbRegList (www.usherbrooke.ca/vers/MtbRegList).
Figure 2. A typical output of MoPP on analysis of a large (95 genes) and highly degenerate dataset, MycoSig50bp. MoPP successfully identified both the –10 (motifs ranked 2 and 3) and –35 (motifs ranked 1 and 3) consensus sequences. For each detected motif, the user can view (i) a colored display of patterns along with their positions (by clicking on the count link), (ii) a tabular output of patterns and their positions and (iii) the alignment and frequency matrix of patterns (by clicking on the consensus sequence).
Figure 3. (a) Performance comparison of MoPP with other motif discovery tools on 30 datasets derived from mycobacteria and yeast. *Weeder could not be assessed on mycobacterial datasets since the background file was not available. (b) Scalability of MoPP in terms of motif level success rate (mSr) and performance coefficient at binding site level (sPC) with respect to the sequence length (margin size).
Detection of exact sigma consensus sequences in the complete M. tuberculosis H37Rv genome by MyPatternFinder
| Sigma factor | Consensus sequence (Ref.) | Total number of hits | Gene | Distance from start codon (bp) |
|---|---|---|---|---|
| | TTGACW-N17-TATAMT | 0 | – | – |
| | TTGACW-N16–21-TATAMT ( | 0 | – | – |
| | TTGACW-N16–21-TATAMT ( | 20 | Rv0068 | −84 |
| | | | Rv0305c ( | −163 |
| | | | Rvnr01 ( | −225 |
| | | | Rv1403c | −84 |
| | | | Rv2011c | −50 |
| | | | Rv2487c ( | −288 |
| | | | Rv2578c | −35 |
| | | | Rv3082c ( | −44 |
| | | | Rv3760 | −485 |
| | GTTT-N17-GGGTAT ( | 4 | Rv1248c ( | −358 |
| | | | Rv3287c ( | −35 |
| | | | Rv3349c | −264 |
| | SGGAAC-N17–22-SGTTS ( | 150 | Rv0384c ( | −72 |
| | | | Rv0474 | −150 |
| | | | Rv0563 ( | −78 |
| | | | Rv0569 | −475 |
| | | | Rv1072 | −79 |
| | | | Rv1535 | −93 |
| | | | Rv1786 | −448 |
| | | | Rv1792 | −112 |
| | | | Rv1883c | −217 |
| | | | Rv2018 | −182 |
| | | | Rv2184c | −178 |
| | | | Rv2308 | −34 |
| | | | Rv2334 ( | −364 |
| | | | Rv2373c ( | −138 |
| | | | Rv2466cg | −77 |
| | | | Rv2525c | −345 |
| | | | Rv2694c | −96 |
| | | | Rv2745c | −66 |
| | | | Rv2804c | −384 |
| | | | Rv2839c ( | −313 |
| | | | Rv3179 | −321 |
| | | | Rv3482c | −248 |
| | | | Rv3597c ( | −203 |
| | | | Rv3832c | −481 |
| | | | Rv3913 ( | −66 |
(a) Gene is reported only if the consensus sequence lies ≤500 bp upstream of the start codon and the gene has a non-coding upstream region of ≥25 bp.
(b) According to Cole et al. (36).
(c) Location is relative to the translation start site as determined at http://genolist.pasteur.fr/TubercuList, except for Rv3287c (rsbW/usfX), where location is relative to the transcription start site according to Beaucher et al. (37).
(d) W = A/T; M = A/C; S = G/C.
(e) By allowing one mismatch in the consensus sequence.
(f) Also predicted to be an E. coli σ70 promoter with one mismatch.
(g) Involvement of the particular sigma factor has been experimentally verified (37,38).
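The consensus notation used in the table above combines IUPAC codes (W, M, S) with fixed or variable spacers such as N17 or N16–21. One way to scan for such a spaced consensus is to translate it into a regular expression, as in the Python sketch below. This is an illustration under stated assumptions, not the MyPatternFinder implementation: the helper names `consensus_to_regex` and `scan_consensus` are hypothetical, only exact matches on the given strand are reported, and only the IUPAC codes that appear in the table are mapped.

```python
import re

# Hypothetical helper names; only the IUPAC codes used in the table are mapped.
IUPAC_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "M": "[AC]", "S": "[CG]", "N": "[ACGT]",
}

# A token is either a spacer (N17, N16-21, N16–21) or a block of IUPAC letters.
# The range separator may be an ASCII hyphen or an en dash (\u2013).
TOKEN = re.compile(r"N(\d+)(?:[-\u2013](\d+))?|([A-Z]+)")


def consensus_to_regex(consensus):
    """Translate e.g. 'GTTT-N17-GGGTAT' into a regular expression string."""
    parts = []
    for token in TOKEN.finditer(consensus.upper()):
        low, high, block = token.groups()
        if block is not None:
            parts.append("".join(IUPAC_CLASS[code] for code in block))
        else:
            # A spacer with a single length is a range with equal bounds.
            parts.append("[ACGT]{%s,%s}" % (low, high or low))
    return "".join(parts)


def scan_consensus(sequence, consensus):
    """Return (start, matched_text) for each start position with an exact match.

    The lookahead lets matches overlap; for a variable spacer it reports
    one match per start position (the longest spacer that fits).
    """
    pattern = re.compile("(?=(%s))" % consensus_to_regex(consensus))
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]
```

A call such as `scan_consensus(upstream_region, "SGGAAC-N17–22-SGTTS")` would return 0-based start positions within the supplied string. Converting those positions into distances from a start codon, and applying the ≤500 bp and ≥25 bp criteria of footnote (a), would need gene coordinates that this sketch does not model.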
Hypoxia responsive motifs present upstream of genes
| Sequence | Score | Gene |
|---|---|---|
| | 11.8486 | Rv1039c ( |
| | 11.7186 | Rv1811 ( |
| | 11.414 | Rv1552 ( |
| | 11.2576 | Rv0345 |
| | 11.195 | Rv0877 |
| | 10.8046 | Rv3318 ( |
| | 10.761 | Rv1824 |
| | 10.7435 | Rv1881c ( |
| | 10.7301 | Rv2194 ( |
| | 10.6815 | Rv2221c ( |
| | 10.5356 | Rv1256c ( |
(a) In addition to those detected by Park et al. (13).
(b) Lower case characters show disagreement with the motif consensus.
(c) Calculated as described in Park et al. (13).
(d) According to Cole et al. (36).
(e) Induced in artificial granulomas (30).
(f) Repressed in hypoxia in M. tuberculosis H37Rv:ΔdosR (13).