D Ashok Reddy, Chanchal K Mitra.
Abstract
The transcription start site (TSS) region shows greater variability than other promoter elements. We examine this variability using information content as a measure. We find that the variability is significant in the block of 5 nucleotides (nt) surrounding the TSS compared with the block of 15 nt, which suggests that the region actually involved is about 5-10 nt in size. For Escherichia coli, the information content from dinucleotide substitution matrices clearly shows better discrimination, suggesting the presence of some correlations. For human, however, this effect is much weaker, and for mouse it is practically absent. We conclude that the presence of short-range correlations within the TSS region is species-dependent and not universal. We further observe that the mitochondrial control element contains other variable regions apart from the TSS. Effective comparisons can only be made on blocks; single-nucleotide comparisons do not give any detectable signal.
Year: 2006 PMID: 17127217 PMCID: PMC5054067 DOI: 10.1016/S1672-0229(06)60032-6
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Fig. 1. The information content of the mitochondrial genome near the control element as a function of the position. A. Computation by using the neighbor-independent substitution (4×4) matrix. B. Computation by using the neighbor-dependent substitution (16×16) matrix. The 240 sequences used here are from the 3’ end of the control element, which contains the potential promoter region.
Fig. 2. The information content of the mitochondrial genome near the control element computed as an average for a 5-nt overlapping block. A. Computation by using the neighbor-independent substitution (4×4) matrix. B. Computation by using the neighbor-dependent substitution (16×16) matrix. The 240 sequences used here are from the 3’ end of the control element, which contains the potential promoter region. The error bars in Panel B are too small to be seen in the plot.
Fig. 3. The average information content of the TSS regions of human, mouse, and E. coli as a function of the block size. Histograms shown in A1, B1, and C1 (top row) represent the information content determined by using the neighbor-independent matrix. Similarly, A2, B2, and C2 (bottom row) represent the information content by using the neighbor-dependent matrix. In each graph the bars represent the information content for blocks of 5 (–2 to +3), 11 (–5 to +6), and 15 (–7 to +8) nt, respectively. The positions are with reference to TSS that represents +1. The error bars in A2, B2, and C2 are too small to be seen in the plot.
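The block coordinates in Fig. 3 imply that there is no position 0: with the TSS at +1, the span –2 to +3 covers exactly 5 nt, –5 to +6 covers 11 nt, and –7 to +8 covers 15 nt. The Python sketch below illustrates that bookkeeping and averages a per-column information value over a block. It is only an illustration under stated assumptions: it uses the plain Shannon measure (2 bits minus the column entropy, uniform background) rather than the substitution-matrix-based information content used in the study, and the names `block_slice`, `column_information`, `block_information`, and `tss_index` are hypothetical.

```python
import math
from collections import Counter


def block_slice(tss_index, start, end):
    """Map TSS-relative coordinates (TSS = +1, no position 0) to a Python slice.

    tss_index is the 0-based index of the +1 nucleotide in each aligned sequence.
    """
    def to_index(pos):
        if pos == 0:
            raise ValueError("position 0 does not exist; the TSS is +1")
        # Positive positions are shifted by one because +1 sits at tss_index.
        return tss_index + (pos - 1 if pos > 0 else pos)

    return slice(to_index(start), to_index(end) + 1)


def column_information(column):
    """Shannon information (bits) of one alignment column: 2 - entropy.

    Assumption: uniform background over A, C, G, T. This is NOT the
    substitution-matrix measure of the study, only a simple stand-in.
    """
    counts = Counter(column)
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 2.0 - entropy


def block_information(sequences, tss_index, start, end):
    """Average the per-column information over the block start..end."""
    window = block_slice(tss_index, start, end)
    columns = zip(*(seq[window] for seq in sequences))
    values = [column_information(col) for col in columns]
    return sum(values) / len(values)
```

With aligned sequences in which the +1 nucleotide sits at 0-based index 7, `block_information(seqs, 7, -2, 3)` would give the 5-nt block average, and the same call with `(-5, 6)` or `(-7, 8)` the 11-nt and 15-nt averages.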
The Number of Promoter Sequences from Corresponding Databases
| Organism | No. of Sequences | Database |
|---|---|---|
| Human | 1,789 | EPD |
| Mouse | 118 | EPD |
| E. coli | 472 | PromEC |
| Mitochondria (chordate) | 240 | Entrez genome (NCBI) |