Guojun Li, Bingqiang Liu, Qin Ma, Ying Xu.
Abstract
We present a new algorithm, BOBRO, for prediction of cis-regulatory motifs in a given set of promoter sequences. The algorithm substantially improves the prediction accuracy and extends the scope of applicability of existing programs, based on two key new ideas: (i) we developed a highly effective method for reliably assessing the likelihood that each position in a given promoter is the (approximate) start of a conserved sequence motif; and (ii) we developed a highly reliable way of distinguishing actual motifs from accidental ones, based on the concept of 'motif closure'. These two ideas are embedded in a classical framework that finds motifs by finding cliques in a graph, and they make this framework substantially more sensitive as well as more selective when finding motifs in a very noisy background. A comparative analysis shows that our program improved the performance coefficient from 29% to 41% compared with the best of six other state-of-the-art prediction tools on large-scale data sets of promoters from one genome, and also consistently improved it by substantial margins on another kind of large-scale data set, of orthologous promoters across multiple genomes. The power of BOBRO in dealing with noisy data was further demonstrated by identifying the motifs of the global transcriptional regulators when run over 2390 promoter sequences of Escherichia coli K12.
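The abstract reports its results as a performance coefficient but this record does not define it. The following is a minimal sketch, assuming the nucleotide-level definition common in motif-finding benchmarks: positions covered by both known and predicted sites, divided by positions covered by either. The function name `performance_coefficient` and the `(sequence_id, start, length)` site tuples are illustrative assumptions, not part of BOBRO.

```python
def performance_coefficient(known, predicted):
    """Nucleotide-level overlap between known and predicted binding sites.

    Each site is a hypothetical (sequence_id, start, length) tuple.
    Returns |K & P| / |K | P| over covered positions, or 0.0 if both are empty.
    """
    def positions(sites):
        # Expand every site into the individual positions it covers.
        return {
            (seq_id, start + offset)
            for seq_id, start, length in sites
            for offset in range(length)
        }

    k = positions(known)
    p = positions(predicted)
    union = k | p
    if not union:
        return 0.0
    return len(k & p) / len(union)


# Toy example: an 8-bp documented site and a prediction shifted by 2 bp.
known = [("promoter_1", 10, 8)]      # covers positions 10..17
predicted = [("promoter_1", 12, 8)]  # covers positions 12..19
pc = performance_coefficient(known, predicted)
print(f"PC = {pc:.2f}")
```

In the toy example the two sites share 6 positions out of 10 covered in total, so a prediction that is offset by only two bases already loses 40% of the score under this definition.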
Year: 2010 PMID: 21149261 PMCID: PMC3074163 DOI: 10.1093/nar/gkq948
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Prediction performance of seven programs on sequences with multiple motifs
| Program | Hypothetic TFs (24) | Inserted motifs (305) |
|---|---|---|
| AlignACE | 0 (0.00) | 0 (0.00) |
| Bioprospector | 10 (41.7) | 83 (27.2) |
| CONSENSUS | 7 (29.2) | 74 (24.3) |
| MDscan | 7 (29.2) | 40 (13.1) |
| MEME | 16 (66.7) | 156 (51.1) |
| Weeder | 4 (16.6) | 27 (8.9) |
| BOBRO | 22 (91.7) | 201 (65.9) |
The numbers in brackets in the header row are the total numbers of hypothetic TFs and inserted motif segments in the whole data sets, respectively. The second and third columns give the numbers of hypothetic TFs and inserted motif segments identified by the corresponding programs, with the percentage of the respective total in brackets.
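The bracketed percentages in the table follow directly from the counts and the two totals (24 hypothetic TFs, 305 inserted motif segments). A short sketch that recomputes them; the variable and function names are illustrative, and the counts are copied from the table above:

```python
TOTAL_TFS = 24      # hypothetic TFs in the whole data set
TOTAL_MOTIFS = 305  # inserted motif segments in the whole data set

# (hypothetic TFs identified, inserted motif segments identified) per program
counts = {
    "AlignACE": (0, 0),
    "Bioprospector": (10, 83),
    "CONSENSUS": (7, 74),
    "MDscan": (7, 40),
    "MEME": (16, 156),
    "Weeder": (4, 27),
    "BOBRO": (22, 201),
}


def percent(found, total):
    """Share of `total` that was found, as a percentage rounded to one decimal."""
    return round(100 * found / total, 1)


rows = {
    program: (percent(tfs, TOTAL_TFS), percent(motifs, TOTAL_MOTIFS))
    for program, (tfs, motifs) in counts.items()
}

for program, (tf_pct, motif_pct) in rows.items():
    print(f"{program:<14} TFs: {tf_pct:5.1f}%   motifs: {motif_pct:5.1f}%")
```

With conventional rounding, Weeder's 4 of 24 comes out as 16.7 rather than the 16.6 printed in the table, which suggests that value was truncated; all other entries match.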
Prediction of BOBRO on E. coli K12 co-regulated promoter sequences
BOBRO outputs 37 optimum motif closures. The names of the corresponding TFs are listed in the first column of the table. In the second column, m is the number of all predicted motifs in the respective motif closure output by BOBRO, and n is the number of those motifs that have been documented as TFBSs. The profile logos, consensus sequences and P-values of these closures are presented in the third, fourth and fifth columns, respectively.
Figure 1. Comparison between BOBRO and six other programs on 37 co-regulated data sets from E. coli K12. The numbers shown in (a) and (b) are the average values of SN, SP and PC, respectively. (a) Performance comparison with motif length information. (b) Performance comparison without motif length information. (c) Comparisons of average deviation degrees (ADD) between predicted motif lengths by MEME and BOBRO.
Figure 2. Comparison between BOBRO and other programs on orthologous promoters across multiple genomes. The top panel shows the PCs of prediction results by the seven programs on 547 E. coli promoters. The lower panels are PC, SN and SP of prediction results by BOBRO and MicroFootprinter on promoters of 10 E. coli TFs.
Figure 3. Comparisons between documented and predicted cis motifs for the eight TFs. Each blue bar represents the total number of documented motifs of the corresponding regulon, and the red bar represents the correctly predicted motifs for the corresponding regulon.