Grant H Jacobs, Augustine Chen, Stewart G Stevens, Peter A Stockwell, Michael A Black, Warren P Tate, Chris M Brown.
Abstract
Messenger RNAs, in addition to coding for proteins, may contain regulatory elements that affect how the protein is translated. These include protein and microRNA-binding sites. Transterm (http://mRNA.otago.ac.nz/Transterm.html) is a database of regions and elements that affect translation, with two major unique components. The first is the integrated results of analysis of general features that affect translation (initiation, elongation, termination) for species or strains in Genbank, processed through a standard pipeline. The second is curated descriptions of experimentally determined regulatory elements that function as translational control elements in mRNAs. Transterm focuses on protein binding sites, particularly those in 3'-untranslated regions (3'-UTR). For this release the interface has been extensively updated based on user feedback. The data are now accessible by strain rather than species; for example, there are 10 Escherichia coli strains (genomes) analysed separately. In addition to providing a repository of data, the database also provides tools for users to query their own mRNA sequences. Users can search sequences for Transterm or user-defined regulatory elements, including protein or miRNA targets. Transterm also provides a central core of links to related resources for complementary analyses.
Year: 2008 PMID: 18984623 PMCID: PMC2686486 DOI: 10.1093/nar/gkn763
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Part of the new Transterm user interface. Users select data to analyse from four datasets, e.g. ‘NCBI Genbank—One sequence for each coding sequence entry’. A taxonomic group is selected by NCBI ‘TaxId’ number (e.g. 9606), then a particular type of output (listed in Table 1) can be selected using the pull-down menu (e.g. Consensus of initiation region, Figure 2). The data selected can be all the sequences or a non-redundant set (for H. sapiens, 96 417 versus 32 763 sequences). These data can also be searched using Blast or Scan for matches.
Table 1. The key output files and a brief description of the contents of each. Further descriptions are available through the online help ‘Main Transterm Datafiles’.
| File | Contents |
|---|---|
| ClassSSN-TaxID.complete | Entries with complete CDS (have both inits + terms) |
| A: Lists of entries and identifiers in the redundant and non-redundant sets | |
| *.dat | Data: LOCUS, AccNo, Init [-20,+20], Term [−10,+10], Len, GC3, Nc |
| *.entry | Genbank names without descriptions |
| *.names | List of GenBank names (original input file) |
| *.text | Feature table outputs of TEXT information |
| *.TTSelected | Entries selected by reject_dups criteria |
| B: 5′-UTRs | |
| *.5UTR | 5′-UTRs/flanks, transterm format |
| *.5UTRnrtt | 5′-UTRs/flanks, non-redundant |
| *.5UTRnrtt.fa | 5′-UTRs/flanks, FASTA sequences, non-redundant |
| *.5UTR.fa | 5′-UTRs/flanks, FASTA sequences |
| C: Initiation codon context | |
| *.InitEntries | Entries in .init |
| *.init.fa | Initiation region, FASTA sequences |
| *.init | Initiation region |
| *.initmatrix | GCG consensus output for initiation region (NR) |
| *.initnrttbit | Bit scores for NR initiation region |
| *.initnrttchi | Chi scores for NR initiation region |
| *.initnrttcvs | CVS scores for NR initiation region |
| *.initnrtt.fa | Initiation region, FASTA sequences, non-redundant |
| *.initnrttver | Schneider info. scores, init. region, non-redundant |
| *.initver | Schneider information scores, init. region |
| D: CDS (coding sequences) | |
| *.CDS.fa | Full CDS entries, FASTA sequences |
| *.CDS | Full CDS entries |
| *.CDSnrtt.fa | Full CDS entries, FASTA sequences, non-redundant |
| *.CDSnrtt | Full CDS entries, non-redundant |
| E: Codon usage and bias | |
| *.cod | GCG format of codon usage |
| *.rscu | Output rscu table |
| *.sum | Summary of all the key values |
| F: Termination codon context | |
| *.TermEntries | Entries in .term |
| *.term.fa | Termination region, FASTA sequences |
| *.term | Termination region |
| *.termmatrix | GCG consensus output for termination region (NR) |
| *_termnr.summary | Count_signal of tetramer freq (readable output) |
| *_termnr.tet_tab | Termination tetramer (codon + 3′ base) frequencies |
| *_termnr.tri_tab | Termination trimer (codon) frequencies |
| *.termnrttbit | Bit scores for NR termination region |
| *.termnrttchi | Chi scores for NR termination region |
| *.termnrttcvs | CVS scores for NR termination region |
| *.termnrtt.fa | Termination region, FASTA sequences, non-redundant |
| *.termnrtt | NR version of .term, by old reject_dups criteria |
| *.termnrttver | Info. scores, term. region, non-redundant |
| *.termver | Information scores, term. region |
| G: 3′-UTRs | |
| *.3UTR.fa | 3′-UTRs/flanks, FASTA sequences |
| *.3UTR | 3′-UTRs/flanks |
| *.3UTRnrtt.fa | 3′-UTRs/flanks, FASTA sequences, non-redundant |
| *.3UTRnrtt | 3′-UTRs/flanks, non-redundant |
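
Two of the quantities listed in the table, GC3 and the termination tetramer (stop codon + 3′ base) frequencies, can be illustrated with a minimal sketch. This is not the Transterm pipeline: the function names and the input layout (a CDS string paired with its 3′ flank) are assumptions made for illustration, and GC3 is taken here in its usual sense of percentage G or C at third codon positions.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc3(cds: str) -> float:
    """Percentage of G or C at third codon positions of a CDS (assumed in frame)."""
    thirds = cds.upper()[2::3]  # bases at codon position 3 of each complete codon
    if not thirds:
        return 0.0
    return 100.0 * sum(base in "GC" for base in thirds) / len(thirds)


def termination_tetramers(records):
    """Count stop codon + following base over (cds, three_prime_flank) pairs.

    Hypothetical input layout: each CDS ends with its stop codon and the
    flank starts at the first base after it.
    """
    counts = Counter()
    for cds, flank in records:
        stop = cds[-3:].upper()
        if stop in STOP_CODONS and flank:
            counts[stop + flank[0].upper()] += 1
    return counts


records = [("ATGGCCTAA", "GCT"), ("ATGTGA", "A"), ("ATGTAA", "G")]
tetramers = termination_tetramers(records)
```

Entries whose last codon is not a recognised stop, or which have no 3′ flank, are skipped rather than counted, so the totals reflect only complete termination contexts.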
Figure 2. The ‘Consensus of initiation region’ files for Synechocystis PCC6803 (NBSynePCC_2-1148.initmatrix) and Pseudomonas aeruginosa PAO1 (NBPseuaeru-208964.initmatrix). The percentage of each base at each position is shown (see text for analysis). The matrix positions (Pos), shown above, run from −20 to +13, with the ATG at +1 to +3. The consensus (Cons) (>65%) is shown below. For these datasets the upper sequences were 41.7% GC3 and the lower 65.8% GC3. More comprehensive descriptions of the data are also available (Table 1).
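
The kind of matrix shown in Figure 2 can be sketched as follows: per-position base percentages over aligned initiation regions, plus a consensus that reports a base only where it exceeds 65%. This is an illustrative sketch, not the GCG consensus program that produces the `.initmatrix` files; the function names, the `-` placeholder for positions without a consensus, and the toy sequences are assumptions.

```python
from collections import Counter


def percent_matrix(aligned):
    """Per-position base percentages for equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must have equal length")
    matrix = []
    for column in zip(*aligned):
        counts = Counter(base.upper() for base in column)
        matrix.append({b: 100.0 * counts[b] / len(aligned) for b in "ACGT"})
    return matrix


def consensus(matrix, threshold=65.0):
    """Base exceeding the threshold at each position, else '-' (assumed symbol)."""
    out = []
    for column in matrix:
        base, pct = max(column.items(), key=lambda kv: kv[1])
        out.append(base if pct > threshold else "-")
    return "".join(out)


# Toy alignment: one base upstream, the ATG, one base downstream.
seqs = ["AATGA", "CATGC", "GATGG", "TATGA"]
cons = consensus(percent_matrix(seqs))
```

With real data the columns would correspond to the positions labelled in the figure, and the threshold can be changed to make the consensus more or less strict.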
Figure 3. An example of a Transterm element description (Puf3p-binding site). Elements may be described by strings, regular expressions, matrices or RNA secondary structure rules. In this case the element is simply described as a string. Users may construct more complex descriptions of the element based on the referenced literature, for example allowing mismatches, insertions or deletions.
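
As a rough illustration of string and regular-expression element descriptions, the sketch below scans a sequence for a string element while tolerating a chosen number of mismatches, and separately for a regular expression. It is not Transterm's Scan for matches tool; the function names are hypothetical, and the motif and test sequence are illustrative strings, not the Puf3p entry from the database. Insertions and deletions are not handled.

```python
import re


def find_with_mismatches(sequence, element, max_mismatches=0):
    """Return (start, matched_text, mismatches) for each window within tolerance."""
    sequence = sequence.upper().replace("U", "T")  # treat RNA and DNA alike
    element = element.upper().replace("U", "T")
    hits = []
    for start in range(len(sequence) - len(element) + 1):
        window = sequence[start:start + len(element)]
        mismatches = sum(a != b for a, b in zip(window, element))
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


def find_regex(sequence, pattern):
    """Return (start, matched_text) for non-overlapping regex matches."""
    sequence = sequence.upper().replace("U", "T")
    return [(m.start(), m.group()) for m in re.finditer(pattern, sequence)]


utr = "CCTGTAAATACCTGTACATAGG"   # made-up 3'-UTR fragment
element = "TGTAAATA"             # illustrative string element
exact = find_with_mismatches(utr, element)
relaxed = find_with_mismatches(utr, element, max_mismatches=1)
by_regex = find_regex(utr, "TGTA[ACGT]ATA")
```

Start positions are 0-based. Relaxing the string match by one mismatch and writing the variable position as a character class in the regular expression recover the same two sites here, which shows how a simple string description can be generalised.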