Ronna R Mallios, David M Ojcius, David H Ardell.
Abstract
BACKGROUND: Promoter identification is a first step in the quest to explain gene regulation in bacteria. It has been demonstrated that the initiation of bacterial transcription depends upon the stability and topology of DNA in the promoter region as well as the binding affinity between the RNA polymerase sigma-factor and promoter. However, promoter prediction algorithms to date have not explicitly used an ensemble of these factors as predictors. In addition, most promoter models have been trained on data from Escherichia coli. Although it has been shown that transcriptional mechanisms are similar among various bacteria, it is quite possible that the differences between Escherichia coli and Chlamydia trachomatis are large enough to recommend an organism-specific modeling effort.
Year: 2009 PMID: 19715597 PMCID: PMC2743672 DOI: 10.1186/1471-2105-10-271
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
29 experimentally verified σ66 promoters.
(a) nt distance from the lead nt of the -35 hexamer to the TLS.
(b) References: M: Mathews & Timms [19]; G: Grech et al. [17]; H: Hefty et al. [18]; T: Tan [16]
(c) Hour post infection of transcriptional activation [31]
Selecting rows with END = 32 (*) ensures non-redundant observations with regard to hexamers and HMM_SCORE.
Figure 1. Flowchart of duration HMM iteration.
Figure 2. Flowchart of Stepwise Binary Logistic Regression iteration.
Models produced by Stepwise Binary Logistic Regression Iteration and M2 Cross-Validation.
(a) The variables are listed in order of entrance into the model and the sign indicates the sign of the coefficient.
(b) ROC analysis Area Under the Curve
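As a minimal sketch of what an ROC area under the curve measures, the function below computes it as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function name `roc_auc` and the toy scores and labels are hypothetical illustrations, not values or code from the study.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    scores: model outputs (higher means more promoter-like).
    labels: 1 for a true promoter, 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.1]
toy_labels = [1, 1, 0, 1, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
```

An AUC of 1.0 means every positive outranks every negative, while 0.5 corresponds to random ordering.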
M2 duration HMM sequence alignment modifications.
Comparison of M2 Cross-Validation and predictions of comparable algorithms for 25 training set genes.
| CT | HMM2 SCORE | M2 Cross-Validation | NNPP2.2 | TSS-PREDICT | Footy |
| CT046 | -1.6 | 1/1 | 0/4 | 0/2 | 0/0 |
| CT062 | 4.0 | 1/2 | 0/0 | 1/1 | 1/1 |
| CT080 | 0.5 | 1/2 | 1/4 | 0/2 | 0/0 |
| CT091 | 1.3 | 1/3 | 1/1 | 1/1 | 0/0 |
| CT098 | 3.7 | 1/1 | 0/1 | 1/2 | 1/1 |
| CT111 | -1.0 | 1/1 | 1/3 | 0/2 | 1/1 |
| CT286 | 1.2 | 1/1 | 1/2 | 1/1 | 1/1 |
| CT322 | -2.1 | 0/0 | 0/0 | 0/2 | 0/0 |
| CT323 | 1.6 | 1/1 | 1/3 | 1/1 | 1/1 |
| CT377 | 5.3 | 1/2 | 1/3 | 1/1 | 0/0 |
| CT394 | -1.5 | 1/1 | 1/2 | 1/1 | 0/0 |
| CT439m | 1.8 | 1/1 | 0/3 | 0/0 | 1/1 |
| CT442 | -0.9 | 1/2 | 1/1 | 1/1 | 0/0 |
| CT444 | 3.2 | 2/5 | 2/5 | 1/2 | 0/0 |
| CT518 | -4.2 | 1/1 | 0/0 | 1/1 | 0/0 |
| CT557 | -3.4 | 1/1 | 0/1 | 1/1 | 0/0 |
| CT559 | -2.7 | 1/1 | 1/1 | 0/2 | 1/1 |
| CT576 | 0.6 | 1/2 | 1/3 | 1/2 | 0/0 |
| CT596 | 0.5 | 1/1 | 0/1 | 0/2 | 1/1 |
| CT674 | 4.0 | 1/2 | 1/2 | 0/0 | 0/0 |
| CT701 | -3.3 | 1/1 | 1/2 | 0/2 | 0/0 |
| CT708 | 2.6 | 1/1 | 1/2 | 1/1 | 1/1 |
| CT743 | -3.8 | 0/0 | 0/2 | 1/5 | 0/0 |
| CT752 | -3.5 | 0/0 | 1/1 | 0/2 | 1/1 |
| CT863 | 4.0 | 1/1 | 1/1 | 1/1 | 0/0 |
| Sensitivity | | 23/26 (0.89) | 17/26 (0.65) | 15/26 (0.58) | 10/26 (0.39) |
| Precision | | 23/34 (0.68) | 17/48 (0.35) | 15/38 (0.40) | 10/10 (1.0) |
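The summary rows can be reproduced from the per-gene cells. The sketch below assumes each cell reads "correct predictions / total predictions" (an interpretation consistent with the column totals, not stated in the caption) and that the 25 genes carry 26 verified promoters, as the sensitivity denominators imply. The function name `summarize` is hypothetical. The ratios it computes may differ in the last decimal place from the rounded values shown in the table.

```python
def summarize(cells, n_promoters):
    """Return (true positives, total predictions, sensitivity, precision).

    cells: strings of the form "hits/predictions", one per gene.
    n_promoters: number of experimentally verified promoters (assumed 26 here).
    """
    true_pos = 0
    predicted = 0
    for cell in cells:
        hits, made = cell.split("/")
        true_pos += int(hits)
        predicted += int(made)
    sensitivity = true_pos / n_promoters
    # Precision is undefined when an algorithm makes no predictions at all.
    precision = true_pos / predicted if predicted else float("nan")
    return true_pos, predicted, sensitivity, precision


# M2 Cross-Validation column, copied row by row from the table above.
m2_cells = [
    "1/1", "1/2", "1/2", "1/3", "1/1", "1/1", "1/1", "0/0", "1/1", "1/2",
    "1/1", "1/1", "1/2", "2/5", "1/1", "1/1", "1/1", "1/2", "1/1", "1/2",
    "1/1", "1/1", "0/0", "0/0", "1/1",
]
m2_tp, m2_predicted, m2_sens, m2_prec = summarize(m2_cells, n_promoters=26)
```

Sensitivity divides by the number of known promoters, whereas precision divides by the number of predictions made, which is why Footy can reach a precision of 1.0 with a low sensitivity.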
Comparing predictions of M2 and other algorithms for 2 training set genes not in M2 training set.
| CT | M2 | NNPP2.2 | TSS-PREDICT | Footy |
| CT665 | 0/1 | 1/3 | 0/2 | 0/0 |
| CT681 | 0/1 | 0/1 | 0/2 | 0/1 |
Figure 3. Visualization of the M2 duration HMM. The top WebLogos illustrate nucleotide frequencies in each of the hexamer positions. The bottom WebLogos convert the frequencies to bits of information.
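The conversion from frequencies to bits can be sketched with the standard sequence-logo formula: for DNA, the information at a position is 2 bits minus the Shannon entropy of its nucleotide frequencies. This is a minimal illustration that omits any small-sample correction a logo tool may apply. The function name `information_bits` and the example frequencies are hypothetical, not taken from the M2 model.

```python
import math


def information_bits(freqs):
    """Information content (bits) of one DNA position: log2(4) - Shannon entropy.

    freqs: mapping from nucleotide to frequency or count; values are
    normalized internally, so counts work as well as probabilities.
    """
    total = sum(freqs.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    entropy = 0.0
    for value in freqs.values():
        p = value / total
        if p > 0:  # 0 * log2(0) is taken as 0
            entropy -= p * math.log2(p)
    return math.log2(4) - entropy


# Hypothetical positions: one fully conserved, one completely uninformative.
conserved = information_bits({"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0})
uniform = information_bits({"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25})
```

A fully conserved position carries the maximum of 2 bits, and a position where all four nucleotides are equally likely carries none.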
Figure 4. Histogram of predicted promoter position, n = 479. POSITION marks the 5' end of the data-file 32-mer and is consequently ~40 nt upstream of the TSS. The distribution peaks with the 5' end about 68 nt upstream of the TLS, which places the TSS about 28 nt upstream of the TLS.