tRNAdb 2009: compilation of tRNA sequences and tRNA genes.

Frank Jühling, Mario Mörl, Roland K Hartmann, Mathias Sprinzl, Peter F Stadler, Joern Pütz.

Abstract

One of the first specialized collections of nucleic acid sequences in life sciences was the 'compilation of tRNA sequences and sequences of tRNA genes' (http://www.trna.uni-bayreuth.de). Here, an updated and completely restructured version of this compilation is presented (http://trnadb.bioinf.uni-leipzig.de). The new database, tRNAdb, is hosted and maintained in cooperation between the universities of Leipzig, Marburg, and Strasbourg. Reimplemented as a relational database, tRNAdb will be updated periodically and is searchable in a highly flexible and user-friendly way. Currently, it contains more than 12 000 tRNA genes, classified into families according to amino acid specificity. Furthermore, the implementation of the NCBI taxonomy tree facilitates phylogeny-related queries. The database provides various services including graphical representations of tRNA secondary structures, a customizable output of aligned or un-aligned sequences with a variety of individual and combinable search criteria, as well as the construction of consensus sequences for any selected set of tRNAs.

Year:  2008        PMID: 18957446      PMCID: PMC2686557          DOI: 10.1093/nar/gkn772

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

As a constantly increasing number of complete genomes is published, it becomes necessary to transfer existing sequence compilations to state-of-the-art IT infrastructure to cope with the challenges of the post-genomic era. tRNA molecules are among the most abundant groups of nucleic acids and are present in all types of cells and organelles. Unique features of these molecules include (i) the high degree of structural conservation, (ii) the plethora of cellular factors tRNAs interact with, and (iii) the largest diversity and highest density of nucleoside modifications found in nature. Furthermore, with respect to phylogeny, tRNAs offer a unique complexity: since each cell and eukaryotic organelle has a cohort of related but distinct tRNA species, phylogenetic analyses permit an integrated view of an entire set of tRNAs that underwent coevolution, rather than being restricted to comparison of a single RNA species. Hence, a tRNA database must fulfil specific criteria in order to address these features. The tRNAdb, with one of the largest numbers of entries among RNA databases, is an excellent model system not only for implementing and validating algorithms and processes of automated nucleic acid data transfer, but also for the development of novel sequence analysis tools (1). In its restructured version, the tRNAdb meets the demands of present-day web-based interfaces and provides a basis for integration of structure-function relationships and additional information on evolution and phylogeny.

DATABASE CONTENT AND ORGANIZATION

In the new tRNA database, sequences are stored on a MySQL database server (http://dev.mysql.com) and also as a BLAST database (2). The relational database management system implements a powerful search engine that allows access to all data and offers high flexibility in queries. In particular, the BLAST database enables highly efficient similarity searches. The database for mammalian mitochondrial tRNA sequences (Mamit-tRNA) with its user-friendly web interface (http://mamit-trna.u-strasbg.fr) served as a template for the new database. Accordingly, color codes and visualization styles were adopted from the Mamit-tRNA compilation (3). The new version of tRNAdb is based on the ‘Compilation of tRNA sequences and sequences of tRNA genes’, distributed as a collection of MS Excel spreadsheets (1). To integrate this original sequence collection into the new compilation, the complete dataset was retrieved and stored in indexed tables using several custom-made scripts. After the integrity of the individual sequences had been verified, the data were transferred into the relational database system. For detailed taxonomic queries, the tree provided by the NCBI taxonomy section (http://www.ncbi.nlm.nih.gov/sites/entrez?db=taxonomy) was implemented, providing a complete set of individual taxon names and synonyms. Furthermore, all taxon names appearing in the original tRNA compilation were manually matched with the taxonomic tree. Several outdated entries, where the organisms have been renamed or reclassified in the meantime, were identified and adjusted according to current taxonomy. In addition, the bacterial sequences were subjected to manual proofreading; corrected sequences replace the previous erroneous entries and are tagged as ‘corrected’ in the comment note. New ID management was implemented with prefixes ‘tdbD’ and ‘tdbR’ for DNA and RNA sequences, respectively.
However, for compatibility reasons, the newly designed web interface supports both the former and the new ID format. Besides the imported data sets, 255 new tRNA gene sequences retrieved from a series of completed archaeal genomes recently submitted to NCBI (Methanococcus aeolicus Nankai-3, Methanosarcina acetivorans C2A, Methanospirillum hungatei JF-1, Nanoarchaeum equitans Kin4-M, Staphylothermus marinus F1 and Sulfolobus acidocaldarius DSM 639) were scanned by tRNAscan-SE (4) and imported using a new data input interface directly connected to the database. For reasons of clarity and compatibility, all sequences are presented in the alignment format of the Mamit-tRNA database (3), including a structural annotation generated from the assignment of nucleotide position numbers. Currently, the data set contains 12 099 sequences from tRNA genes (577 organisms) and 623 tRNA sequences (104 organisms) (Table 1).
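
To illustrate how a taxonomy tree held in relational tables can support phylogeny-related queries of the kind described above, the following is a minimal sketch. It is not the tRNAdb implementation: SQLite (from the Python standard library) stands in for MySQL, the recursive query is one possible technique rather than the authors' method, and the table names, column names, sequence IDs and sample rows are all hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a toy taxonomy tree and tRNA table.

    The schema and all rows are illustrative assumptions, not tRNAdb data.
    """
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE taxon (
            id        INTEGER PRIMARY KEY,
            parent_id INTEGER REFERENCES taxon(id),
            name      TEXT NOT NULL
        );
        CREATE TABLE trna (
            id         TEXT PRIMARY KEY,
            taxon_id   INTEGER NOT NULL REFERENCES taxon(id),
            amino_acid TEXT NOT NULL,
            anticodon  TEXT NOT NULL
        );
    """)
    con.executemany("INSERT INTO taxon VALUES (?, ?, ?)", [
        (1, None, "root"),
        (2, 1, "Archaea"),
        (3, 1, "Bacteria"),
        (4, 2, "Methanosarcina acetivorans C2A"),
        (5, 2, "Sulfolobus acidocaldarius DSM 639"),
        (6, 3, "Escherichia coli"),
    ])
    # Made-up identifiers; real tRNAdb IDs carry the 'tdbD'/'tdbR' prefixes.
    con.executemany("INSERT INTO trna VALUES (?, ?, ?, ?)", [
        ("demo-gene-1", 4, "Ala", "TGC"),
        ("demo-gene-2", 5, "Phe", "GAA"),
        ("demo-gene-3", 6, "Phe", "GAA"),
        ("demo-rna-1", 4, "Met", "CAU"),
    ])
    return con


def trnas_in_clade(con, clade_name):
    """Return all tRNA rows whose organism lies in the subtree of clade_name."""
    return con.execute("""
        WITH RECURSIVE clade(id) AS (
            SELECT id FROM taxon WHERE name = ?
            UNION ALL
            SELECT t.id FROM taxon AS t JOIN clade AS c ON t.parent_id = c.id
        )
        SELECT trna.id, taxon.name, trna.amino_acid, trna.anticodon
        FROM trna JOIN taxon ON taxon.id = trna.taxon_id
        WHERE trna.taxon_id IN (SELECT id FROM clade)
        ORDER BY trna.id
    """, (clade_name,)).fetchall()


if __name__ == "__main__":
    con = build_demo_db()
    for row in trnas_in_clade(con, "Archaea"):
        print(row)
```

Storing only a parent pointer per taxon keeps the tree easy to update when organisms are renamed or reclassified; the cost is that subtree queries need a recursive walk, as in this sketch, or a precomputed lineage table.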

Table 1. Actual entries of the updated version of tRNAdb

                    Organisms                   tRNA genes                            tRNA sequences
Taxon               tRNA genes  tRNA sequences  Cytoplasm  Mitochondria  Chloroplast  Cytoplasm  Mitochondria  Chloroplast
Root                577         104             9758       1965          376          474        111           38
Cellular organisms  571         99              9705       1965          376          457        111           38
Bacteria            235         19              6368       0             0            139        0             0
Archaea             49          9               1088       0             0            76         0             0
Eukaryota           287         71              2249       1965          376          242        111           38
Viruses             6           5               53         0             0            17         0             0

The database is organized in two independent and fully searchable parts. One part combines tRNA gene sequences which, in the previously published compilation (1), were divided into the sections ‘Genomic tRNA Compilation’ [mostly identified by the annotation of complete genomes using tRNAscan-SE (4)] and ‘Compilation of tRNA Genes’. This part of the database also includes tRNA-like structures (‘TLSs’) encoded in viruses and phages. In the second part, sequences obtained from direct tRNA analysis [including identified nucleoside modifications (5,6)] are presented, corresponding to the former section ‘Compilation of tRNA Sequences’.
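
The totals quoted in the text (12 099 tRNA gene sequences and 623 tRNA sequences) can be cross-checked against the per-compartment counts of Table 1. The short sketch below does this arithmetic; the dictionary layout and the function names are illustrative choices, and the numbers are transcribed from the table rows for Bacteria, Archaea, Eukaryota and Viruses.

```python
# Counts per taxon as (cytoplasm, mitochondria, chloroplast), from Table 1.
GENES = {
    "Bacteria": (6368, 0, 0),
    "Archaea": (1088, 0, 0),
    "Eukaryota": (2249, 1965, 376),
    "Viruses": (53, 0, 0),
}
SEQUENCES = {
    "Bacteria": (139, 0, 0),
    "Archaea": (76, 0, 0),
    "Eukaryota": (242, 111, 38),
    "Viruses": (17, 0, 0),
}


def compartment_totals(table):
    """Sum each compartment column over all taxa (the 'Root' row)."""
    return tuple(sum(column) for column in zip(*table.values()))


def grand_total(table):
    """Sum all compartments over all taxa."""
    return sum(compartment_totals(table))


print(compartment_totals(GENES), grand_total(GENES))          # (9758, 1965, 376) 12099
print(compartment_totals(SEQUENCES), grand_total(SEQUENCES))  # (474, 111, 38) 623
```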

SEQUENCE SEARCH TOOL

Using the advanced functionality of MySQL- and BLAST-based databases, the new compilation provides a powerful and fast search engine. Query results are stored on the server and are linked to the corresponding session object. Furthermore, the retrieved data can be edited manually. Queries can include DNA or RNA sequences, amino acid family, anticodon, references, PubMed ID of the reference, gene description as well as comments. Taxa can be identified by searching for specific names, strains, taxonomic IDs or even synonyms. In addition, individual searches concerning sequence and/or structural characteristics (e.g. conserved or semiconserved nucleotides) are possible. The server also accepts sequence IDs of the new and the previous tRNA database as queries, and can perform BLAST searches. Query results are displayed in a clearly arranged list, and the details shown can be adjusted. Since the 3′-CCA terminus is not included in the Mamit-tRNA color code (CCA ends are not encoded in mitochondrial tRNA genes), a new color was assigned to the CCA triplet. In addition, the list covers information related to each organism, the amino acid specificity and the primary sequence of the tRNA. Optionally, the secondary structure can be displayed for each kind of sequence (DNA or RNA). For convenience, a thumbnail presentation allows a fast preview of the secondary structures. To directly highlight the cloverleaf structure of a selected tRNA, an image generator has been implemented, supporting all tRNA domains including variable stem and loop sizes. Positions of nucleotides are numbered according to conventional rules (1,7). Furthermore, an additional module was implemented providing statistical information for each alignment output to allow an easy comparison of individual sequences. As in the Mamit-tRNA database, consensus and typical structures of selected sequences can be calculated and displayed (3).
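
As an illustration of how a consensus sequence can be derived from a selected set of aligned tRNAs, the sketch below applies a simple majority rule per alignment column. The rules actually used by tRNAdb are not described here, so the threshold, the placeholder symbol `N`, the function name and the toy alignment are all assumptions.

```python
from collections import Counter


def consensus(aligned, threshold=0.5, unknown="N"):
    """Majority-rule consensus of equal-length aligned sequences.

    A column contributes its most frequent symbol (gaps included) if that
    symbol occurs in more than `threshold` of the sequences; otherwise the
    placeholder `unknown` is emitted.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    result = []
    for column in zip(*aligned):
        symbol, count = Counter(column).most_common(1)[0]
        result.append(symbol if count / len(aligned) > threshold else unknown)
    return "".join(result)


# Toy alignment (not real tRNA data); '-' marks a gap.
alignment = [
    "GCGGA-TTA",
    "GCGGACTTA",
    "GCCGA-TTA",
    "GCAGA-TAA",
]
print(consensus(alignment))  # GCNGA-TTA
```

Working column by column relies on the shared position numbering of the alignment, which is why aligned rather than raw sequences are the input.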
Most conveniently, the retrieved data can be downloaded in a variety of file formats for further investigation using other applications. Export of sequences in FASTA (8), ClustalW (9) and Vienna RNA Package (10) file formats facilitates further analysis. The representation of tRNA sequences poses additional challenges compared to those of the tRNA genes. More than 90 modified nucleosides have been characterized in tRNAs from Bacteria, Archaea and Eukarya (http://library.med.utah.edu/RNAmods/). Most of the base modifications are faithfully represented in the tRNA database. However, further processing of this information is not trivial, as the majority of RNA bioinformatics software is unable to cope with non-standard nucleotides. Hence, retrieved RNA sequences can be transformed into compatible DNA sequences.
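
The transformation of a modified RNA sequence into a DNA sequence that standard tools can read, followed by FASTA export, can be sketched as follows. This is an assumption-laden illustration, not the tRNAdb code: the token-list representation, the small mapping table (a handful of well-known modifications and their parent nucleosides) and the function names are hypothetical, and the database's own symbols for modified nucleosides may differ.

```python
# Parent nucleoside for a few common modifications (illustrative subset).
PARENT_BASE = {
    "Ψ": "U",    # pseudouridine
    "D": "U",    # dihydrouridine
    "m5U": "U",  # ribothymidine
    "m1G": "G",  # 1-methylguanosine
    "m5C": "C",  # 5-methylcytidine
    "m1A": "A",  # 1-methyladenosine
    "I": "A",    # inosine, formed by deamination of adenosine
}
STANDARD = {"A", "C", "G", "U"}


def rna_tokens_to_dna(tokens):
    """Map each (possibly modified) nucleoside token to its DNA base."""
    dna = []
    for token in tokens:
        base = PARENT_BASE.get(token, token)
        if base not in STANDARD:
            raise ValueError(f"unknown nucleoside symbol: {token!r}")
        dna.append("T" if base == "U" else base)
    return "".join(dna)


def to_fasta(seq_id, sequence, width=60):
    """Format one sequence as a FASTA record with wrapped lines."""
    lines = [f">{seq_id}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


# Toy fragment (not a real tRNA) with modified positions.
tokens = ["G", "C", "m1G", "G", "A", "D", "Ψ", "m5U", "I"]
dna = rna_tokens_to_dna(tokens)
print(to_fasta("demo_1", dna), end="")
```

The conversion is lossy by design: the modification information is discarded, so it suits downstream alignment or folding tools but not analyses of the modifications themselves.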

DISCUSSION AND CONCLUSION

Well-curated and up-to-date databases are a highly useful tool of molecular biology and genetics. While the first tRNA database edition was a valuable instrument for the tRNA research community, the overwhelming amount of newly available sequences released by the variety of different genome sequencing projects made it necessary to develop a modern relational database system. In the new edition, all sequences of the original Excel-based compilation (http://www.trna.uni-bayreuth.de) as well as complete sets of tRNA gene sequences of several recently published archaeal genomes have been included. Furthermore, the standardized NCBI taxonomy system has been implemented, leading to high compatibility with other sequence databases. The new versatile search engine allows complex query combinations concerning sequence, structure and taxonomy, thus meeting the demands of systematic investigations of tRNA sequence/structure relationships. For the next edition of this compilation, proofreading of the remaining sequences (Eukarya and Archaea) will be completed. In addition, newly published tRNA genes and tRNA sequences will be imported. Possible extensions of the database are (i) inclusion of 5′- and 3′-flanking nucleotides to extract information on tRNA maturation (11), (ii) indication of tRNA introns (12), (iii) tools to extract identity elements for aminoacylation (13), (iv) indication of anticodon editing (14), (v) display of pathological tRNA mutations (3,15), (vi) information on posttranscriptional modifications with known roles in fine-tuning tRNA structure and function (16), (vii) display of isoacceptors and isodecoders [tRNAs with identical anticodon but sequence deviations elsewhere (17)], or (viii) information on tRNA expression levels [e.g. tissue-specific differences in eukaryotes (18)].

ACCESS

tRNAdb is freely accessible at http://trnadb.bioinf.uni-leipzig.de. This article should be cited in research projects assisted by the use of the database. Comments, corrections and new entries are welcome.

FUNDING

This work was funded by the Centre National de la Recherche Scientifique (CNRS), Université Louis Pasteur Strasbourg 1, Association Française contre les Myopathies (AFM), Deutsche Forschungsgemeinschaft [DFG - MO-634/2, MO-634/3, HA 1672/7-3/4/5, and SPP-1174 (‘Metazoan Deep Phylogeny’) project STA 850/3-2] and by the French-German PROCOPE program (DAAD D/0628236, EGIDE PHC 14770PJ). Funding for open access charge: CNRS and DFG.

Conflict of interest statement. None declared.

REFERENCES (15 in total)

1.  tRNomics: analysis of tRNA genes from 50 genomes of Eukarya, Archaea, and Bacteria reveals anticodon-sparing strategies and domain-specific features.

Authors:  Christian Marck; Henri Grosjean
Journal:  RNA       Date:  2002-10       Impact factor: 4.942

2.  Basic local alignment search tool.

Authors:  S F Altschul; W Gish; W Miller; E W Myers; D J Lipman
Journal:  J Mol Biol       Date:  1990-10-05       Impact factor: 5.469

3.  Universal rules and idiosyncratic features in tRNA identity. (Review)

Authors:  R Giegé; M Sissler; C Florentz
Journal:  Nucleic Acids Res       Date:  1998-11-15       Impact factor: 16.971

4.  The RNA Modification Database: 1999 update.

Authors:  J Rozenski; P F Crain; J A McCloskey
Journal:  Nucleic Acids Res       Date:  1999-01-01       Impact factor: 16.971

5.  tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence.

Authors:  T M Lowe; S R Eddy
Journal:  Nucleic Acids Res       Date:  1997-03-01       Impact factor: 16.971

6.  Improved tools for biological sequence comparison.

Authors:  W R Pearson; D J Lipman
Journal:  Proc Natl Acad Sci U S A       Date:  1988-04       Impact factor: 11.205

7.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.

Authors:  J D Thompson; D G Higgins; T J Gibson
Journal:  Nucleic Acids Res       Date:  1994-11-11       Impact factor: 16.971

8.  C to U editing and modifications during the maturation of the mitochondrial tRNA(Asp) in marsupials.

Authors:  M Mörl; M Dörner; S Pääbo
Journal:  Nucleic Acids Res       Date:  1995-09-11       Impact factor: 16.971

9.  Diversity of tRNA genes in eukaryotes.

Authors:  Jeffrey M Goodenbour; Tao Pan
Journal:  Nucleic Acids Res       Date:  2006-11-06       Impact factor: 16.971

10.  Compilation of tRNA sequences and sequences of tRNA genes.

Authors:  Mathias Sprinzl; Konstantin S Vassilenko
Journal:  Nucleic Acids Res       Date:  2005-01-01       Impact factor: 16.971
