Eduardo Pareja, Pablo Pareja-Tobes, Marina Manrique, Eduardo Pareja-Tobes, Javier Bonal, Raquel Tobes.
Abstract
BACKGROUND: Transcriptional regulation processes are the principal mechanisms of adaptation in prokaryotes. The key elements in these processes are the regulatory proteins and the regulatory DNA signals located in extragenic regions. As all extragenic spaces are putative regulatory regions, ExtraTrain covers all extragenic regions of available genomes and the regulatory proteins from bacteria and archaea included in the UniProt database.

DESCRIPTION: ExtraTrain provides integrated and easily manageable information for 679816 extragenic regions and for the genes delimiting each of them. In addition, ExtraTrain supplies a tool for exploring extragenic regions, named Palinsight, designed to detect and search for palindromic patterns. This interactive visual tool is fully integrated into the database, allowing the search for regulatory signals in user-defined sets of extragenic regions. The 26046 regulatory proteins included in ExtraTrain belong to the families AraC/XylS, ArsR, AsnC, Cold shock domain, CRP-FNR, DeoR, GntR, IclR, LacI, LuxR, LysR, MarR, MerR, NtrC/Fis, OmpR and TetR. The database follows the InterPro criteria to define these families. The information about regulators includes manually curated sets of references specifically associated with regulator entries. To achieve a sustainable and maintainable knowledge database, ExtraTrain is a platform open to contributions from the scientific community and provides a system for incorporating textual knowledge.
Year: 2006 PMID: 16539733 PMCID: PMC1453763 DOI: 10.1186/1471-2180-6-29
Source DB: PubMed Journal: BMC Microbiol ISSN: 1471-2180 Impact factor: 3.605
Table 1. Families of transcriptional regulatory proteins in bacteria and archaea.
| Family | InterPro entry | ExtraTrain entries | Action | Structural motif | DBD position |
| AraC/XylS | IPR000005 HTH_AraC | 2485 | Activator | HTH | C-terminal |
| ArsR | IPR001845 HTH_ArsR | 982 | Repressor | HTH | Central |
| AsnC | IPR000485 HTH_AsnC_lrp | 803 | Dual | HTH | N-terminal |
| Cold shock domain | IPR002059 Cold_shock | 607 | Activator | RNA-binding like | Variable |
| CRP-FNR | IPR001808 HTH_Crp | 414 | Activator/Dual | HTH | C-terminal |
| DeoR | IPR001034 HTH_DeoR | 680 | Repressor | HTH | N-terminal |
| GntR | IPR000524 HTH_GntR | 1989 | Repressor | HTH | N-terminal |
| IclR | IPR005471 HTH_IclR | 538 | Repressor | HTH | N-terminal |
| LacI | IPR000843 HTH_LacI | 1079 | Repressor | HTH | N-terminal |
| LuxR | IPR000792 HTH_LuxR | 2117 | Activator | HTH | C-terminal |
| LysR | IPR000847 HTH_LysR | 3864 | Dual | HTH | N-terminal |
| MarR | IPR000835 HTH_MarR | 1316 | Dual | HTH | Central |
| MerR | IPR000551 HTH_MerR | 1112 | Repressor | HTH | N-terminal |
| NtrC/Fis | IPR002197 HTH_Fis | 3089 | Activator | HTH | C-terminal |
| OmpR | IPR001867 Trans_reg_C | 2253 | Activator | winged helix | C-terminal |
| TetR | IPR001647 HTH_TetR | 2718 | Repressor | HTH | N-terminal |
This table contains information about the 16 families of transcription factors included in the ExtraTrain database. The first column contains the name of each family. The second column contains the identifier of the InterPro entry that defines each family. The third column contains the number of members of each family included in the database. The fourth column indicates whether the members of the family usually act as activators, as repressors, or have a dual action. The fifth column indicates the protein structural motif involved in the DNA interaction. The last column indicates the position (N-terminal, central or C-terminal) of the DNA-binding domain in the sequence of the regulatory protein.
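As a quick consistency check, the per-family entry counts in the table can be summed and compared with the 26046 regulatory proteins quoted in the abstract. The snippet below is only an illustrative sketch: the dictionary name `family_entries` is an assumption introduced here, and the counts are transcribed by hand from the table.

```python
# Entry counts transcribed from the table above (family -> ExtraTrain entries).
# The variable names are illustrative; they are not part of ExtraTrain.
family_entries = {
    "AraC/XylS": 2485,
    "ArsR": 982,
    "AsnC": 803,
    "Cold shock domain": 607,
    "CRP-FNR": 414,
    "DeoR": 680,
    "GntR": 1989,
    "IclR": 538,
    "LacI": 1079,
    "LuxR": 2117,
    "LysR": 3864,
    "MarR": 1316,
    "MerR": 1112,
    "NtrC/Fis": 3089,
    "OmpR": 2253,
    "TetR": 2718,
}

# Total number of regulator entries across the 16 families.
total = sum(family_entries.values())

# Family with the most entries in the table.
largest = max(family_entries, key=family_entries.get)

print(f"{len(family_entries)} families, {total} entries; largest family: {largest}")
```

The sum of the third column agrees with the total number of regulatory proteins stated in the abstract.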
Figure 1. Case study: constructing the working set. The extragenic regions upstream of genes encoding proteins similar to AcrR (found by BLAST) have been added to the "working set". For extragenic sequences 8, 10, 13, 16 and 17, the check-box for obtaining the complementary inverted sequence has been marked, so that all 17 upstream extragenic sequences are oriented the same way with regard to the start points of the genes. Clicking the "FASTA SEQUENCES" button returns the extragenic sequences in FASTA format. Clicking the "PALINSIGHT" button sends the sequences to the Palinsight viewer.
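The orientation step described in this caption amounts to taking the reverse complement of selected sequences. A minimal sketch follows, assuming plain A/C/G/T sequences; the function names `reverse_complement` and `orient` are hypothetical helpers written for illustration and are not part of ExtraTrain.

```python
# Translation table for unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary inverted (reverse-complement) sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient(sequences, flip_positions):
    """Reverse-complement the sequences at the given 1-based positions.

    Sequences whose position is not in `flip_positions` are returned unchanged,
    mirroring the effect of ticking the check-box for individual entries.
    """
    return [
        reverse_complement(seq) if pos in flip_positions else seq
        for pos, seq in enumerate(sequences, start=1)
    ]


# Toy input (hypothetical sequences, not taken from the database).
toy = ["AACG", "GGT", "TTTA"]
print(orient(toy, {2, 3}))
```

For the case study, the positions to flip would be {8, 10, 13, 16, 17}.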
Figure 2. Case study: Palinsight displaying the shared palindrome in sequences 1 to 10 of the working set. Using Palinsight, we detected a shared palindrome in these extragenic sequences upstream of genes encoding AcrR-similar proteins. The same palindrome is conserved in all Escherichia coli and Shigella flexneri sequences (extragenic sequences 1–6). A slightly different palindrome is conserved in Salmonella enterica and Salmonella typhimurium (extragenic sequences 7–10).
Figure 3. Case study: Palinsight displaying the shared palindrome in sequences 11 to 17 of the working set. Extragenic sequences 11–14, from Yersinia, share another identical palindrome. The palindromes detected for Erwinia carotovora, Photorhabdus luminescens and Pseudomonas syringae show more differences, but all 17 extragenic sequences conserve the palindromic motif TAC--ACA----|----TGT--GTA that appears at the top of Figures 2 and 3. The detected palindromes are candidate binding sites for AcrR and AcrR-similar proteins.
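In the DNA sense, a palindromic motif is one that reads the same as its own reverse complement. The sketch below checks this property for a gapped pattern written with `.` for unconstrained positions, as in the results table. It is not Palinsight's algorithm, which the text does not describe; the function name `is_palindromic_pattern` is an assumption made for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
WILDCARD = "."


def is_palindromic_pattern(pattern: str) -> bool:
    """Check that a gapped pattern equals its own reverse complement.

    Fixed bases must pair with their complement at the mirrored position,
    and wildcard positions must mirror wildcard positions.
    """
    n = len(pattern)
    for i in range((n + 1) // 2):
        left, right = pattern[i], pattern[n - 1 - i]
        if left == WILDCARD or right == WILDCARD:
            if left != right:
                return False
            continue
        if COMPLEMENT.get(left) != right:
            return False
    return True


# The conserved motif from the case study, with gaps written as dots.
print(is_palindromic_pattern("TAC..ACA........TGT..GTA"))  # True
```

Here TAC pairs with GTA and ACA pairs with TGT around the centre of symmetry, which is what the `|` in the motif marks.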
Figure 4. Case study: Palinsight displaying the results window of the pattern search. When we searched for the pattern TAC--ACA----|----TGT--GTA by clicking the "Search this pattern" button, we obtained the positions of this motif in the set of selected extragenic sequences. Table 3 is a copy of the complete content of this window.
Table 2. Extragenic regions upstream of genes encoding AcrR-similar proteins
| No. | Regulator ID | Gene name | Genome | Extragenic region length |
| 1 | NP_414997 | acrR | | 141 |
| 2 | NP_706357 | acrR | Shigella flexneri 2a str. 301 | 141 |
| 3 | NP_752516 | acrR | Escherichia coli CFT073 | 105 |
| 4 | NP_836135 | acrR | Shigella flexneri 2a str. 2457T | 141 |
| 5 | NP_286205 | acrR | | 141 |
| 6 | NP_308544 | acrR | | 141 |
| 7 | NP_455074 | acrR | | 141 |
| 8 | NP_806113 | acrR | | 141 |
| 9 | NP_459472 | acrR | | 141 |
| 10 | YP_151443 | acrR | | 141 |
| 11 | NP_668381 | acrR | | 144 |
| 12 | NP_992188 | acrR2 | | 144 |
| 13 | NP_406606 | acrR | | 144 |
| 14 | YP_069526 | acrR | | 143 |
| 15 | YP_049277 | acrR | | 165 |
| 16 | NP_931055 | acrR | | 129 |
| 17 | NP_794058 | PSPTO4302 | | 254 |
This table contains some data extracted from the entries corresponding to the extragenic regions that participate in the case study.
Table 3. Results of searching the Palinsight pattern
| PPP..PPP........PPP..PPP |
| TAC..ACA........TGT..GTA |
| Seq. 1 Palinsight position: 77 Pattern position: 90 |
| TACATACATTCACAAATGTATGTA |
| Seq. 2 Palinsight position: 77 Pattern position: 90 |
| TACATACATTCACAAATGTATGTA |
| Seq. 3 Palinsight position: 41 Pattern position: 54 |
| TACATACATTCACAAATGTATGTA |
| Seq. 4 Palinsight position: 77 Pattern position: 90 |
| TACATACATTCACAAATGTATGTA |
| Seq. 5 Palinsight position: 77 Pattern position: 90 |
| TACATACATTCACAAATGTATGTA |
| Seq. 6 Palinsight position: 77 Pattern position: 90 |
| TACATACATTCACAAATGTATGTA |
| Seq. 7 Palinsight position: 77 Pattern position: 90 |
| TACATACATCCATAAATGTATGTA |
| Seq. 8 Palinsight position: 77 Pattern position: 90 |
| TACATACATCCATAAATGTATGTA |
| Seq. 9 Palinsight position: 77 Pattern position: 90 |
| TACATACATCCATAAATGTATGTA |
| Seq. 10 Palinsight position: 77 Pattern position: 90 |
| TACATACATCCATAAATGTATGTA |
| Seq. 11 Palinsight position: 80 Pattern position: 93 |
| TACATACATTCGTGAATGTATGTA |
| Seq. 12 Palinsight position: 80 Pattern position: 93 |
| TACATACATTCGTGAATGTATGTA |
| Seq. 13 Palinsight position: 80 Pattern position: 93 |
| TACATACATTCGTGAATGTATGTA |
| Seq. 14 Palinsight position: 80 Pattern position: 93 |
| TACATACATTCGTGAATGTATGTA |
| Seq. 15 Palinsight position: 78 Pattern position: 91 |
| TACATACATACTTGAATGTATGTA |
| Seq. 16 Palinsight position: 65 Pattern position: 78 |
| TACAAACATACGTGAATGTATGTA |
| Seq. 17 Palinsight position: 78 Pattern position: 91 |
| TACTTACATTCGCGGTTGTTTGTA |
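A gapped-pattern search of the kind shown in this table can be approximated with a regular expression in which each `.` matches any single base. The sketch below is an assumption-laden illustration, not the Palinsight implementation. The names `pattern_to_regex` and `find_pattern` are hypothetical, and the reported positions are simple 1-based match starts, which need not follow the same convention as the "Palinsight position" and "Pattern position" columns. The flanking bases in the second example are invented for the demonstration.

```python
import re

PATTERN = "TAC..ACA........TGT..GTA"


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a dotted pattern into a regex; '.' matches exactly one of A, C, G, T."""
    body = "".join("[ACGT]" if ch == "." else re.escape(ch) for ch in pattern)
    # A lookahead with a capture group reports overlapping occurrences too.
    return re.compile(f"(?=({body}))")


def find_pattern(pattern: str, seq: str):
    """Return (1-based start, matched text) for every occurrence in seq."""
    regex = pattern_to_regex(pattern)
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(seq.upper())]


# Matched segment reported for Seq. 1 in the table above.
print(find_pattern(PATTERN, "TACATACATTCACAAATGTATGTA"))

# Segment reported for Seq. 17, embedded in hypothetical flanking bases.
print(find_pattern(PATTERN, "GG" + "TACTTACATTCGCGGTTGTTTGTA" + "C"))
```

Applied to full extragenic sequences exported in FASTA format, the same function would list every position at which the motif occurs.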