Gang-Qing Hu, Xiaobin Zheng, Yi-Fan Yang, Philippe Ortet, Zhen-Su She, Huaiqiu Zhu.
Abstract
Correct annotation of the translation initiation site (TIS) is essential both for experimental and bioinformatic studies of the prokaryotic translation initiation mechanism and for understanding gene regulation and gene structure. Here we describe ProTISA, a comprehensive database that collects TISs confirmed by a variety of available evidence for prokaryotic genomes, including Swiss-Prot experimental records, the literature, conserved domain hits and sequence alignments between orthologous genes. Moreover, by combining the predictions of our recently developed TIS post-processor, ProTISA provides a refined annotation for the public database RefSeq. The database also annotates potential regulatory signals associated with translation initiation in the region upstream of the TIS. As of July 2007, ProTISA includes 440 microbial genomes with more than 390 000 confirmed TISs. The database is available at http://mech.ctb.pku.edu.cn/protisa.
Year: 2007 PMID: 17942412 PMCID: PMC2238952 DOI: 10.1093/nar/gkm799
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Statistics of confirmed TISs (as of July 2007)
| Kingdom | Group | Genome No. | IPT No. | CDC No. | HSC No. | Gene No.ᵃ |
|---|---|---|---|---|---|---|
| Archaea | | 8 | 207 | 4238 | 1454 | 4729 |
| Archaea | | 23 | 156 | 13 175 | 7330 | 15 389 |
| Archaea | | 1 | 0 | 152 | 54 | 163 |
| Bacteria | | 2 | 0 | 1836 | 932 | 2213 |
| Bacteria | | 36 | 286 | 21 137 | 12 561 | 27 297 |
| Bacteria | | 1 | 3 | 687 | 350 | 746 |
| Bacteria | | 11 | 10 | 6782 | 5431 | 8516 |
| Bacteria | | 11 | 2 | 3405 | 2212 | 3858 |
| Bacteria | | 2 | 0 | 911 | 701 | 1099 |
| Bacteria | | 22 | 277 | 13 461 | 9682 | 16 855 |
| Bacteria | | 4 | 99 | 2011 | 1276 | 2603 |
| Bacteria | | 89 | 864 | 63 647 | 51 122 | 77 320 |
| Bacteria | | 1 | 1 | 773 | 419 | 837 |
| Bacteria | | 1 | 0 | 439 | 429 | 655 |
| Bacteria | | 53 | 138 | 31 928 | 32 043 | 45 656 |
| Bacteria | | 36 | 52 | 23 893 | 23 601 | 34 828 |
| Bacteria | | 103 | 6716 | 93 832 | 93 644 | 127 165 |
| Bacteria | | 14 | 17 | 8932 | 7416 | 11 996 |
| Bacteria | | 11 | 39 | 6051 | 4143 | 6911 |
| Bacteria | | 9 | 23 | 3906 | 2580 | 4635 |
| Bacteria | | 1 | 8 | 497 | 292 | 591 |
| Bacteria | | 1 | 0 | 499 | 359 | 663 |
| Sum | – | 440 | 8898 | 302 192 | 258 031 | 394 725 |
ᵃ Number of genes with at least one confirmed TIS. A TIS may be confirmed by more than one type of evidence, and about 1–2% of genes have more than one confirmed TIS.
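The footnote explains why Gene No. is smaller than the sum of the evidence columns in the same row (for the first archaeal row, 207 + 4238 + 1454 = 5899 against 4729 genes): one TIS can be confirmed by several evidence types, and a few genes have more than one confirmed TIS. The sketch below reproduces that counting rule on hypothetical records. It assumes each evidence column counts distinct confirmed (gene, TIS) pairs, which the table does not state explicitly; the record layout and the `summarize` helper are illustrative and are not ProTISA's schema.

```python
from collections import defaultdict

# Hypothetical evidence records: (gene_id, TIS position, evidence type).
# geneA has two confirmed TISs; its TIS at 100 is confirmed twice.
records = [
    ("geneA", 100, "IPT"),
    ("geneA", 100, "HSC"),
    ("geneA", 112, "CDC"),
    ("geneB", 540, "CDC"),
    ("geneB", 540, "HSC"),
    ("geneC", 903, "HSC"),
]

def summarize(records):
    """Count confirmed TISs per evidence type and genes with >= 1 confirmed TIS."""
    per_evidence = defaultdict(set)  # evidence type -> {(gene, TIS)}
    tis_per_gene = defaultdict(set)  # gene -> {confirmed TIS positions}
    for gene, tis, evidence in records:
        per_evidence[evidence].add((gene, tis))
        tis_per_gene[gene].add(tis)
    evidence_counts = {ev: len(pairs) for ev, pairs in per_evidence.items()}
    genes_with_tis = len(tis_per_gene)  # each gene counted once
    multi_tis_genes = sum(1 for t in tis_per_gene.values() if len(t) > 1)
    return evidence_counts, genes_with_tis, multi_tis_genes

counts, genes, multi = summarize(records)
print(counts, genes, multi)  # → {'IPT': 1, 'HSC': 3, 'CDC': 2} 3 1
```

The per-evidence counts add up to 6 for only 3 genes, mirroring the gap between the evidence columns and Gene No. in the table.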
Statistics of genomes with specific signals (as of July 2007)
| Kingdom | Group | SD_like only | Atypical only | SD_like and TA_like | SD_like and Atypical | TA_like and Atypical |
|---|---|---|---|---|---|---|
| Archaea | | – | – | 6 | 2 | – |
| Archaea | | 5 | – | 16 | – | 2 |
| Archaea | | – | – | 1 | – | – |
| Bacteria | | – | – | 2 | – | – |
| Bacteria | | 1 | – | 33 | – | 2 |
| Bacteria | | – | – | 1 | – | – |
| Bacteria | | – | 6 | – | 5 | – |
| Bacteria | | 1 | – | – | 10 | – |
| Bacteria | | 1 | – | 1 | – | – |
| Bacteria | | – | 11 | – | 11 | – |
| Bacteria | | – | – | 4 | – | – |
| Bacteria | | 79 | 2 | 8 | – | – |
| Bacteria | | 1 | – | – | – | – |
| Bacteria | | 1 | – | – | – | – |
| Bacteria | | – | 1 | – | – | – |
| Bacteria | | 15 | 1 | – | 37 | – |
| Bacteria | | 2 | – | – | 34 | – |
| Bacteria | | 82 | 1 | – | 20 | – |
| Bacteria | | 9 | – | 2 | 3 | – |
| Bacteria | | 11 | – | – | – | – |
| Bacteria | | 3 | – | 2 | 4 | – |
| Bacteria | | 1 | – | – | – | – |
| Sum | – | 212 | 22 | 76 | 126 | 4 |
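Reading "–" as zero, the column totals of the signal table can be checked against its Sum row. The five totals add up to 440, the total number of genomes in the TIS statistics table, which is consistent with each genome being assigned to exactly one signal category. The short check below encodes the rows exactly as listed in the signal table; the variable names are illustrative.

```python
# Rows of the signal table in the order listed; "–" is entered as 0.
# Columns: SD_like only, Atypical only, SD_like and TA_like,
#          SD_like and Atypical, TA_like and Atypical.
rows = [
    (0, 0, 6, 2, 0), (5, 0, 16, 0, 2), (0, 0, 1, 0, 0),   # Archaea
    (0, 0, 2, 0, 0), (1, 0, 33, 0, 2), (0, 0, 1, 0, 0),   # Bacteria
    (0, 6, 0, 5, 0), (1, 0, 0, 10, 0), (1, 0, 1, 0, 0),
    (0, 11, 0, 11, 0), (0, 0, 4, 0, 0), (79, 2, 8, 0, 0),
    (1, 0, 0, 0, 0), (1, 0, 0, 0, 0), (0, 1, 0, 0, 0),
    (15, 1, 0, 37, 0), (2, 0, 0, 34, 0), (82, 1, 0, 20, 0),
    (9, 0, 2, 3, 0), (11, 0, 0, 0, 0), (3, 0, 2, 4, 0),
    (1, 0, 0, 0, 0),
]
reported_sums = (212, 22, 76, 126, 4)  # the table's Sum row
TOTAL_GENOMES = 440                    # Sum of Genome No. in the TIS table

# Transpose the rows into columns and total each signal category.
column_sums = tuple(sum(col) for col in zip(*rows))
print(column_sums)                          # → (212, 22, 76, 126, 4)
print(sum(column_sums) == TOTAL_GENOMES)    # → True
```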
Figure 1. Sequence logos and spacer length distributions of representative signals for the genomes (A) E. coli K-12; (B) S. coelicolor; (C) A. fulgidus; and (D) Synechocystis sp. PCC 6803. Each sequence logo visualizes the positional weight matrix of the signal: the height of a letter at a given position is proportional to its frequency at that position, and a letter is drawn upside down if its frequency is lower than its background frequency. The consensus is shown below the logo. The spacer length is defined as the distance (number of nucleotides) between the TIS and each annotated signal, where the signals are identified with the positional weight matrix visualized in the sequence logo.
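The caption describes two computations: a logo in which letter height follows the positional base frequency (letters flipped when rarer than background), and a spacer length measured from the TIS to each signal found with the positional weight matrix. A minimal sketch of both follows. It is not ProTISA's own signal-finding procedure, which this section does not describe. It assumes log2-odds scoring against a uniform background, keeps a single best-scoring window per upstream region only if it reaches a hypothetical `threshold`, and counts the spacer as the nucleotides between the 3′ end of the signal and the TIS. All function names, sites and upstream sequences are hypothetical toy data.

```python
import math
from collections import Counter

BASES = "ACGT"

def frequency_matrix(sites, pseudocount=0.5):
    """Per-position base frequencies from aligned, equal-length signal sites."""
    total = len(sites) + len(BASES) * pseudocount
    matrix = []
    for i in range(len(sites[0])):
        column = Counter(site[i] for site in sites)
        matrix.append({b: (column[b] + pseudocount) / total for b in BASES})
    return matrix

def logo_letters(matrix, background):
    """Per position, (base, height, flipped): height is the base frequency,
    flipped marks bases that are rarer than in the background."""
    return [[(b, col[b], col[b] < background[b]) for b in BASES] for col in matrix]

def log_odds(matrix, background):
    """Assumed scoring scheme: log2(frequency / background) per base."""
    return [{b: math.log2(col[b] / background[b]) for b in BASES} for col in matrix]

def spacer_of_best_hit(upstream, pwm, threshold):
    """Scan a region that ends right before the TIS; return the spacer of the
    best-scoring window, or None if no window reaches the threshold."""
    width = len(pwm)
    best_score, best_spacer = None, None
    for start in range(len(upstream) - width + 1):
        window = upstream[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if best_score is None or score > best_score:
            best_score, best_spacer = score, len(upstream) - (start + width)
    if best_score is None or best_score < threshold:
        return None
    return best_spacer

def spacer_distribution(upstreams, pwm, threshold):
    """Histogram of spacer lengths over regions with an annotated signal."""
    spacers = (spacer_of_best_hit(u, pwm, threshold) for u in upstreams)
    return Counter(s for s in spacers if s is not None)

# Toy data (hypothetical): aligned SD-like sites and upstream regions.
background = {b: 0.25 for b in BASES}
sites = ["AGGAGG", "AGGAGA", "AGGAGG", "AGGTGG"]
pwm = log_odds(frequency_matrix(sites), background)
upstreams = ["TTTAAGGAGGTTTACAT", "CAGGAGGTAAATCATTA", "TTTTTTTTTTTTTTTTT"]
dist = spacer_distribution(upstreams, pwm, threshold=5.0)
print(sorted(dist.items()))  # → [(7, 1), (10, 1)]
```

The all-T region has no window above the threshold, so it contributes nothing to the distribution; a real analysis would choose the threshold and background from the genome rather than fixing them as here.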