Frank Emmert-Streib, Antti Häkkinen, Andre S Ribeiro.
Abstract
BACKGROUND: Evidence suggests that in prokaryotes sequence-dependent transcriptional pauses affect the dynamics of transcription and translation, as well as of small genetic circuits. So far, a few pause-prone sequences have been identified from in vitro measurements of transcription elongation kinetics.
Year: 2012 PMID: 22741547 PMCID: PMC3534578 DOI: 10.1186/1471-2105-13-152
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Reactions and kinetic parameters for the gene expression model
[Table: reactions 1–23 with their rate constants and delays (reaction and parameter entries not recoverable); for reactions 17 and 18 the parameter entry reads "see above".]
Chemical reactions, rate constants (in s⁻¹), and delays (in s) used to model transcription and translation. Pro – promoter, Rp – RNA polymerase, Rib – ribosome, [RibR] – number of ribosomes translating an RNA strand, P – complete protein, U – unoccupied nucleotide, O – nucleotide occupied by Rp, A – activated nucleotide; U, O, A – the corresponding ribonucleotides. n denotes the position of the nucleotide in the sequence. ΔP – range of nucleotides that an Rp occupies, ΔP = 25. ΔR – range of ribonucleotides that a ribosome occupies, ΔR = 31. The notation X ∼ N(μ, σ²) denotes that the values of X are drawn from a normal distribution with mean μ and variance σ². Parameter values are from measurements in E. coli, mainly for LacZ [3,20,24-34,53]. The duration of protein folding after translation is completed (τ) is set according to measurements of a commonly used GFP mutant [35].
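The nucleotide states in the caption (U for unoccupied, O for occupied by an Rp) and the footprint ΔP = 25 imply a simple exclusion rule on the template. The following is a minimal sketch under stated assumptions: the helper names (`new_template`, `place`, `can_bind`, `can_step`, `step`) are hypothetical, and the rules that a new Rp needs the first ΔP nucleotides to be free and that an Rp can only step into a free nucleotide are illustrative assumptions, not the paper's reaction list.

```python
# Hypothetical sketch of nucleotide occupancy along a gene template.
# States follow the caption: "U" = unoccupied, "O" = occupied by an Rp.
# DELTA_P = 25 is the footprint of one Rp (from the caption); the
# initiation and stepping rules below are illustrative assumptions.
DELTA_P = 25


def new_template(length):
    """Return a gene template with every nucleotide unoccupied ('U')."""
    return ["U"] * length


def place(template, start):
    """Mark the DELTA_P nucleotides beginning at `start` as occupied ('O')."""
    for i in range(start, min(start + DELTA_P, len(template))):
        template[i] = "O"


def can_bind(template):
    """Assumed rule: a new Rp can initiate only if the first DELTA_P nucleotides are free."""
    return all(state == "U" for state in template[:DELTA_P])


def can_step(template, start):
    """An Rp whose footprint begins at `start` may advance by one nucleotide
    only if the nucleotide just ahead of its footprint is free (or off the end)."""
    ahead = start + DELTA_P
    return ahead >= len(template) or template[ahead] == "U"


def step(template, start):
    """Advance an Rp by one nucleotide: free its rearmost nucleotide and
    occupy the one just ahead of its footprint. Returns the new start position."""
    template[start] = "U"
    ahead = start + DELTA_P
    if ahead < len(template):
        template[ahead] = "O"
    return start + 1


# Usage: one Rp initiates, blocks the promoter region, then clears it.
gene = new_template(100)
place(gene, 0)
print(can_bind(gene))  # False: the first Rp still covers nucleotides 0-24
pos = 0
for _ in range(DELTA_P):
    pos = step(gene, pos)
print(can_bind(gene))  # True: after 25 steps nucleotides 0-24 are free again
```

Keeping an explicit per-nucleotide state list mirrors the caption's U/O notation directly; a simulator tracking many Rps would more likely store only their positions.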
Features used for classification
| # | Feature |
|---|---|
| 1 | mean autocorrelation function |
| 2 | standard deviation of autocorrelation function |
| 3 | mean autocorrelation function |
| 4 | standard deviation of autocorrelation function |
| 5 | mean cross-correlation function |
| 6 | standard deviation of cross-correlation function |
| 7 | mean cross-correlation function |
| 8 | standard deviation of cross-correlation function |
| 9 | mean decay time |
| 10 | standard deviation of decay time |
Summary of the 10 variables we use to define a feature vector for a model M.
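A ten-entry feature vector of this kind can be assembled as means and standard deviations of per-run statistics. The sketch below is one possible reading, not the paper's definition: the lag, the 1/e threshold for the decay time, and the assignment of rows to RNA or protein series (the table does not preserve which series rows 1–8 refer to) are all hypothetical choices, labeled as such in the code.

```python
import numpy as np


def autocorrelation(x, lag):
    """Normalized autocorrelation of series x at one lag (1 <= lag < len(x))."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return float(np.dot(x[:-lag], x[lag:]) / denom) if denom > 0 else 0.0


def cross_correlation(x, y, lag):
    """Normalized correlation between x(t) and y(t + lag) (1 <= lag < len(x))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x, y = x - x.mean(), y - y.mean()
    denom = np.sqrt(np.dot(x, x) * np.dot(y, y))
    return float(np.dot(x[:-lag], y[lag:]) / denom) if denom > 0 else 0.0


def decay_time(x, max_lag):
    """First lag at which the autocorrelation drops below 1/e (assumed definition)."""
    for lag in range(1, max_lag + 1):
        if autocorrelation(x, lag) < np.exp(-1.0):
            return lag
    return max_lag


def feature_vector(rna_runs, protein_runs, lag=10, max_lag=200):
    """Ten features: mean and standard deviation, across independent runs, of five
    per-run statistics. Which series each statistic uses is a hypothetical choice."""
    per_run = [
        [autocorrelation(r, lag) for r in rna_runs],            # rows 1-2 (assumed RNA)
        [autocorrelation(p, lag) for p in protein_runs],        # rows 3-4 (assumed protein)
        [cross_correlation(r, p, lag) for r, p in zip(rna_runs, protein_runs)],  # rows 5-6
        [cross_correlation(p, r, lag) for r, p in zip(rna_runs, protein_runs)],  # rows 7-8
        [decay_time(p, max_lag) for p in protein_runs],         # rows 9-10
    ]
    features = []
    for values in per_run:
        features.extend([float(np.mean(values)), float(np.std(values))])
    return np.array(features)
```

Each model M then yields one vector per batch of runs, so repeated batches give the multiple feature vectors per model that a classifier or clustering step needs.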
Six different models used for detection of pauses
| Model | Pause characteristics |
|---|---|
| A | No sequence-dependent pause sites. |
| B | Pause site at nucleotide 500. |
| C | Pause site at nucleotide 250. |
| D | Pause site at nucleotide 750. |
| E | Pause site with mean duration … |
| F | Pause site with mean duration … |
The six models, which differ in their pause characteristics, are used to detect and classify sequence-dependent pauses.
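The effect of a single pause site on elongation can be illustrated with a stochastic single-polymerase sketch. Assumptions: each one-nucleotide step is exponential with rate `K_ELONG`, the pause adds an exponential delay of mean `MEAN_PAUSE`, and the gene length, rate and pause duration are hypothetical values chosen for illustration, not the paper's parameters. Only the pause positions of models A to D are taken from the table.

```python
import random


def transcription_time(length, k_elong, pause_site=None, mean_pause=0.0, rng=random):
    """Time (s) for a single Rp to traverse `length` nucleotides.

    Each one-nucleotide step takes an exponentially distributed time with rate
    k_elong (s^-1). At `pause_site`, an extra exponentially distributed pause
    with mean `mean_pause` (s) is added. Queuing of multiple Rps is ignored."""
    t = 0.0
    for n in range(1, length + 1):
        t += rng.expovariate(k_elong)
        if n == pause_site and mean_pause > 0:
            t += rng.expovariate(1.0 / mean_pause)
    return t


# Hypothetical parameters for illustration only (not the values used in the paper).
LENGTH, K_ELONG, MEAN_PAUSE = 1000, 50.0, 5.0
PAUSE_SITES = {"A": None, "B": 500, "C": 250, "D": 750}  # positions from the table

rng = random.Random(1)
for model, site in PAUSE_SITES.items():
    pause = MEAN_PAUSE if site is not None else 0.0
    times = [transcription_time(LENGTH, K_ELONG, site, pause, rng) for _ in range(200)]
    print(model, round(sum(times) / len(times), 1))
```

In this single-Rp sketch models B, C and D have the same mean transit time; any dependence on pause position would have to come from interactions the sketch omits, such as Rps queuing behind a paused one.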
Figure 1. Average number of proteins. Average number of proteins for each model. Each time series is averaged over 10 independent runs; each data point is averaged over 100 time steps and smoothed over a window of size 20.
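The averaging described in the Figure 1 caption can be sketched in three steps: mean over runs, mean over blocks of 100 time steps, then smoothing over 20 points. Treating the smoothing as a uniform moving average is an assumption, and the input below is placeholder Poisson data rather than model output.

```python
import numpy as np


def average_and_smooth(runs, block=100, window=20):
    """Average protein-number time series as described in the caption (a sketch).

    runs: array of shape (n_runs, n_steps). Steps: mean over runs, mean over
    consecutive blocks of `block` time steps, then a moving average over
    `window` points (the smoothing kernel is an assumption)."""
    runs = np.asarray(runs, dtype=float)
    mean_series = runs.mean(axis=0)                     # average over independent runs
    n_blocks = mean_series.size // block                # drop an incomplete final block
    blocked = mean_series[: n_blocks * block].reshape(n_blocks, block).mean(axis=1)
    kernel = np.ones(window) / window                   # uniform moving-average window
    return np.convolve(blocked, kernel, mode="valid")


# Placeholder data (Poisson counts), not output of the gene expression model.
rng = np.random.default_rng(0)
runs = rng.poisson(lam=50, size=(10, 10_000))
print(average_and_smooth(runs).shape)  # (81,)
```

`mode="valid"` keeps only points where the full window fits, so 100 blocks shrink to 81 smoothed points.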
Figure 2. P-values for comparing models A and B. P-values from two-sample t-tests as a function of sample size. Top row: ΔL = 200. Bottom row: ΔL = 1,000. First column: comparison of model A with model B. Second column: comparison of different instances of model A.
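The dependence of p-values on sample size in Figure 2 can be reproduced in outline with SciPy's `ttest_ind`. This is a sketch with placeholder data: the two pools stand in for one feature computed from runs of two models, the function name is hypothetical, and subsampling without replacement is an assumed design; ΔL is not modeled.

```python
import numpy as np
from scipy.stats import ttest_ind


def pvalues_vs_sample_size(values_x, values_y, sizes, rng):
    """Two-sample t-test p-values for increasing sample sizes.

    For each n in `sizes`, draw n values (without replacement) from each pool
    of per-run feature values and test for a difference in means."""
    pvals = []
    for n in sizes:
        a = rng.choice(values_x, size=n, replace=False)
        b = rng.choice(values_y, size=n, replace=False)
        pvals.append(ttest_ind(a, b).pvalue)
    return np.array(pvals)


# Placeholder pools standing in for one feature from two models (not real data).
rng = np.random.default_rng(0)
pool_a = rng.normal(0.0, 1.0, 2000)
pool_a2 = rng.normal(0.0, 1.0, 2000)   # a second, independent instance of "model A"
pool_b = rng.normal(0.3, 1.0, 2000)
sizes = [5, 20, 100, 500]
print(pvalues_vs_sample_size(pool_a, pool_b, sizes, rng))   # A vs B
print(pvalues_vs_sample_size(pool_a, pool_a2, sizes, rng))  # A vs A
```

When the models truly differ, p-values fall as the sample size grows; for two instances of the same model they stay spread over [0, 1], which is the contrast the two columns of the figure show.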
Figure 3. Results of hierarchical clustering of the models. Hierarchical clustering of feature vectors from models B, E and F (left tree) and from models A, C and D (right tree). The labels index the feature vectors. Each model is represented by 50 feature vectors.
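Hierarchical clustering of 50 feature vectors per model, as in Figure 3, can be sketched with SciPy's `linkage` and `fcluster`. The z-scoring of features, Euclidean distance and average linkage are assumptions (the caption does not name them), and the three well-separated groups below are placeholder data standing in for three models.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_feature_vectors(vectors, n_clusters, method="average"):
    """Agglomerative clustering of feature vectors (rows of `vectors`).

    Features are z-scored first so that no single feature dominates the
    Euclidean distance; the linkage method is an assumption."""
    X = np.asarray(vectors, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # leave constant features unscaled
    X = (X - X.mean(axis=0)) / std
    Z = linkage(X, method=method, metric="euclidean")
    return fcluster(Z, t=n_clusters, criterion="maxclust")


# Placeholder: three "models", 50 ten-dimensional feature vectors each.
rng = np.random.default_rng(0)
centers = [0.0, 5.0, 10.0]
vectors = np.vstack([rng.normal(c, 1.0, size=(50, 10)) for c in centers])
labels = cluster_feature_vectors(vectors, n_clusters=3)
print([sorted(set(labels[i:i + 50])) for i in range(0, 150, 50)])
```

If each block of 50 vectors receives a single label, the features separate the models; mixed labels within a block indicate models the features cannot distinguish.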
Figure 4. Significance of correlations among features. Graphical visualization of the p-values of correlation coefficients between different features. The colors red to blue represent low to high p-values. The diagonal is shown in black to indicate that the self-correlations are not of interest.
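A p-value matrix like the one in Figure 4 can be computed pairwise with `scipy.stats.pearsonr`. Using Pearson's r is an assumption (the caption does not name the correlation coefficient), and the three placeholder features below are synthetic: the first two are strongly correlated, the third is independent.

```python
import numpy as np
from scipy.stats import pearsonr


def correlation_pvalues(features):
    """Matrix of p-values for pairwise Pearson correlations between features.

    features: array (n_samples, n_features). The diagonal is set to NaN because
    self-correlations are not of interest. Pearson's r is an assumption."""
    F = np.asarray(features, dtype=float)
    k = F.shape[1]
    P = np.full((k, k), np.nan)
    for i in range(k):
        for j in range(i + 1, k):
            _, p = pearsonr(F[:, i], F[:, j])
            P[i, j] = P[j, i] = p
    return P


# Placeholder features: column 1 tracks column 0, column 2 is independent noise.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
features = np.column_stack([base, base + 0.1 * rng.normal(size=200), rng.normal(size=200)])
print(np.round(correlation_pvalues(features), 3))
```

Leaving the diagonal as NaN plays the same role as the black diagonal in the figure: a plotting library will leave those cells blank rather than color them.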
Figure 5. Maximum likelihood estimation of the position of pause sites. Log relative likelihood for three models: model C (left), model B (middle) and model D (right). The maximum likelihood (ML) estimates of the pause-site positions are nucleotides 200, 410 and 780 (vertical blue lines); the true positions (250, 500 and 750) are indicated by vertical green lines. Horizontal lines mark the boundary of the 95% bootstrap confidence region of the ML estimates.
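Estimating a pause position by maximum likelihood over a grid of candidate positions, with a bootstrap interval, might look like the sketch below. It is not the paper's likelihood: it assumes that simulating the model with a pause at each candidate position yields per-feature means and standard deviations (`ref_mu`, `ref_sd`, hypothetical), that features are independent Gaussians, and that a percentile bootstrap over resampled observations is used. The reference curves and "observed" data are synthetic placeholders.

```python
import numpy as np


def log_likelihood(obs, mu, sd):
    """Log-likelihood of observed feature vectors (rows of obs) under independent
    Gaussians with means mu and standard deviations sd (diagonal covariance assumed)."""
    z = (obs - mu) / sd
    return float(np.sum(-0.5 * z**2 - np.log(sd) - 0.5 * np.log(2.0 * np.pi)))


def ml_position(obs, positions, ref_mu, ref_sd):
    """Return the ML pause position and the log relative likelihood curve."""
    ll = np.array([log_likelihood(obs, ref_mu[i], ref_sd[i]) for i in range(len(positions))])
    return positions[int(np.argmax(ll))], ll - ll.max()


def bootstrap_interval(obs, positions, ref_mu, ref_sd, rng, n_boot=200, level=0.95):
    """Percentile bootstrap interval for the ML position (resampling observations)."""
    estimates = []
    for _ in range(n_boot):
        sample = obs[rng.integers(0, len(obs), size=len(obs))]
        estimates.append(ml_position(sample, positions, ref_mu, ref_sd)[0])
    tail = 50.0 * (1.0 - level)
    return tuple(np.percentile(estimates, [tail, 100.0 - tail]))


# Synthetic placeholders: two features whose means vary smoothly with position.
positions = np.arange(100, 1000, 10)
x = positions / 1000.0
ref_mu = np.column_stack([x, x**2])          # hypothetical feature means per candidate
ref_sd = np.full_like(ref_mu, 0.05)          # hypothetical feature standard deviations
rng = np.random.default_rng(0)
obs = rng.normal(ref_mu[40], ref_sd[40], size=(50, 2))  # "observed" data, true position 500
est, rel = ml_position(obs, positions, ref_mu, ref_sd)
print(est, bootstrap_interval(obs, positions, ref_mu, ref_sd, rng))
```

The returned `rel` curve is zero at the estimate and negative elsewhere, which is the quantity plotted on the vertical axis of a log relative likelihood figure.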