
GtRNAdb 2.0: an expanded database of transfer RNA genes identified in complete and draft genomes.

Patricia P Chan, Todd M Lowe.

Abstract

Transfer RNAs represent the largest, most ubiquitous class of non-protein coding RNA genes found in all living organisms. The tRNAscan-SE search tool has become the de facto standard for annotating tRNA genes in genomes, and the Genomic tRNA Database (GtRNAdb) was created as a portal for interactive exploration of these gene predictions. Since its published description in 2009, the GtRNAdb has steadily grown in content, and remains the most commonly cited web-based source of tRNA gene information. In this update, we describe not only a major increase in the number of tRNA predictions (>367000) and genomes analyzed (>4370), but more importantly, the integration of new analytic and functional data to improve the quality and biological context of tRNA gene predictions. New information drawn from other sources includes tRNA modification data, epigenetic data, single nucleotide polymorphisms, gene expression and evolutionary conservation. A richer set of analytic data is also presented, including better tRNA functional prediction, non-canonical features, predicted structural impacts from sequence variants and minimum free energy structural predictions. Views of tRNA genes in genomic context are provided via direct links to the UCSC genome browsers. The database can be searched by sequence or gene features, and is available at http://gtrnadb.ucsc.edu/.
© The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2015        PMID: 26673694      PMCID: PMC4702915          DOI: 10.1093/nar/gkv1309

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Transfer RNA (tRNA) genes play an essential role in protein translation in all living cells. Among the numerous tRNA search programs created in the last 20 years, tRNAscan-SE (1) remains a popular standard for whole-genome annotation of tRNA genes. The program has undergone major development in the past three years, and now implements a number of analytic improvements, including integration of the Infernal covariance model search program (2), isotype-specific scoring models and the ability to distinguish between cytosolic-type and nuclear-encoded mitochondrial-type tRNAs (Chan et al., in preparation). To catalog the increasing number of tRNAs found in complete genomes, we developed the Genomic tRNA Database (GtRNAdb) as a repository for all identifications made by tRNAscan-SE. The initial version of the database provided a summary overview of all tRNAs, display of tRNAscan-SE identification information, the primary sequences and predicted secondary structures of tRNAs, multi-gene alignments, and links for each tRNA to explore genomic context within the UCSC Genome Browser (3) and the Archaeal/Microbial Genome Browser (4). In addition, the GtRNAdb search module allowed search by characteristics, and a custom BLAST (5) server enabled study of tRNA gene similarity across all sequences in the GtRNAdb. Our understanding of the multifaceted roles of tRNAs has leaped forward in recent years, with new appreciation for unexpected complexity and functionality of individual tRNAs. Unusual tRNAs range from a highly tissue-specific tRNA in mammalian brains (6), to tRNAs with modifications that enable highly specific regulation of tRNA stability (7), to tRNA ‘pseudogenes’ that are involved in regulation of messenger RNA stability (8). 
Furthermore, abundant small RNAs derived from tRNAs are now known to have major roles in abnormal cell proliferation and other diseases (9–12), and are likely to emerge as important players in other contexts with improved methods for their detection (13). Given this new complexity, the need to study the functional roles of individual tRNAs has motivated development of the new GtRNAdb, which now enables the identification of unusual features within tRNAs, sequence polymorphisms in populations, epigenetic gene locus activation state and tRNA transcript abundance among different cell types. By collecting and integrating these diverse types of information in one place, we hope to accelerate the discovery of new tRNA biology and foster an appreciation of the unrecognized regulatory roles of tRNAs across all domains of life.

NEW DATABASE CONTENT AND FEATURES

The new GtRNAdb has greatly expanded in size and phylogenetic scope, now containing more than 367 000 tRNAs derived from the genomes of 155 eukaryotes, 184 archaea and 4032 bacteria (Table 1). This constitutes a more than 4x increase from the original description of the database (14). Because of the large increase in the number of species, a new ‘Quick Search’ box on the home page allows users to type in any part of a species name (e.g. ‘Dros’ or ‘coli’), and get direct access to the tRNA Gene Summary page for all matching genomes. As before, the Gene Summary page has summary statistics for all the tRNAs in a selected genome, arranged by ‘two-box’, ‘four-box’ or ‘six-box’ codon families for quick inspection of tRNA counts, intron counts and potential pseudogenes.
Table 1.

Summary statistics of the genomes and predicted tRNAs in GtRNAdb 2.0. tRNA genes were predicted by using tRNAscan-SE (1)

                                                Eukaryota     Bacteria      Archaea        Total
Number of genomes                                     155         4032          184         4371
tRNAs decoding standard 20 amino acids            118 806      237 635         8712      365 153
Selenocysteine or TCA Suppressor tRNAs                311         1346           18         1675
Other Possible Suppressor tRNAs (CTA or TTA)          263           19            1          283
Total predicted tRNAs                             119 380      239 000         8731      367 111
Predicted pseudogenes                           1 118 008         1165            1    1 119 174
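As a quick consistency check on Table 1, the per-domain category counts can be summed and compared with the reported totals. The sketch below uses only the numbers printed in the table; the variable and function names are illustrative and are not part of GtRNAdb.

```python
# Counts transcribed from Table 1 (columns: Eukaryota, Bacteria, Archaea).
TABLE_1 = {
    "standard_20_amino_acids": (118_806, 237_635, 8_712),
    "selenocysteine_or_tca_suppressor": (311, 1_346, 18),
    "other_possible_suppressor": (263, 19, 1),
}
REPORTED_TOTALS = (119_380, 239_000, 8_731)
REPORTED_GRAND_TOTAL = 367_111


def column_totals(rows):
    """Sum each domain column across all tRNA categories."""
    return tuple(sum(column) for column in zip(*rows.values()))


totals = column_totals(TABLE_1)
grand_total = sum(totals)
print(totals == REPORTED_TOTALS, grand_total == REPORTED_GRAND_TOTAL)
```

Both comparisons hold for the values as printed, so the 'Total predicted tRNAs' row is the sum of the three category rows above it.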
On the left side of the tRNA Gene Summary page are a number of menu items that give quick access to genome-specific tRNA data generated by tRNAscan-SE. A major improvement has been made to the first menu item, the tRNA Gene List, which now has fully sortable columns and a ‘Search’ box that allows dynamic filtering on any number of tRNA traits (e.g. ‘Arg TCG chr1:’). This feature is an efficient way to find specific subsets among the many dozens or hundreds of tRNAs in a given genome. Sorting tRNA genes from highest to lowest by score, number of introns or number of canonical structure mismatches also allows ranking tRNAs by traits of interest; to sort on multiple keys, hold down the shift key and select additional columns.
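The filtering and multi-key sorting behaviour described above can be approximated offline on a downloaded gene list. The following is a minimal sketch, not the website's implementation: it assumes that every whitespace-separated search term must occur somewhere in a gene's text fields, and the record layout, gene names, coordinates and scores are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrnaGene:
    # Hypothetical record layout; field names are assumptions.
    name: str
    locus: str
    isotype: str
    anticodon: str
    score: float
    introns: int
    mismatches: int


def matches_all_terms(gene, query):
    """True if every whitespace-separated term occurs in the gene's text fields."""
    haystack = f"{gene.name} {gene.locus} {gene.isotype} {gene.anticodon}".lower()
    return all(term.lower() in haystack for term in query.split())


def filter_and_sort(genes, query):
    """Keep genes matching all terms, then sort by score (high to low),
    breaking ties by number of introns (high to low)."""
    hits = [g for g in genes if matches_all_terms(g, query)]
    return sorted(hits, key=lambda g: (-g.score, -g.introns))


# Invented example records.
genes = [
    TrnaGene("tRNA-Arg-TCG-1-1", "chr1:1000-1072", "Arg", "TCG", 72.1, 0, 0),
    TrnaGene("tRNA-Arg-TCG-2-1", "chr1:5000-5072", "Arg", "TCG", 75.3, 0, 1),
    TrnaGene("tRNA-Arg-TCG-3-1", "chr11:900-972", "Arg", "TCG", 80.0, 0, 0),
    TrnaGene("tRNA-Val-AAC-1-1", "chr1:7000-7072", "Val", "AAC", 77.9, 0, 0),
]
names = [g.name for g in filter_and_sort(genes, "Arg TCG chr1:")]
print(names)
```

Including the trailing colon in 'chr1:' keeps the chr11 gene out of the result, which is presumably why the example query in the text is written that way.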

INDIVIDUAL TRNA INFORMATION PAGES

The most significant new feature of GtRNAdb is the ability to show extensive information for each tRNA gene on its own page (Figure 1). By clicking on a tRNA name in the Gene List, one is able to view a full page of rich new information, including a new feature of tRNAscan-SE 2.0 (Chan et al., in preparation): tRNA prediction based on newly built isotype-specific models. In prior versions of tRNAscan-SE, the tRNA isotype was entirely determined based on the anticodon. However, classifying tRNAs based on the highest-scoring isotype model has been shown to be more effective in bacteria with TFAM (15). Accordingly, the new individual tRNA information pages give tRNAscan-SE statistics for ‘Top Scoring / Second Best Scoring Isotype Model’, which is usually consistent with the anticodon, but sometimes is not. Disagreement could be a trait of tRNAs no longer functional in translation, or of ‘hybrid tRNAs’ which may cause ambiguous codon recognition (16). The upstream and downstream sequences flanking each tRNA gene are also included on this page to help spot regulatory motifs (e.g. poly-U termination signals). Each tRNA gene page also includes covariance model search scores that are broken down by contribution from primary sequence patterns versus secondary structures, enabling tentative identification of some types of tRNA pseudogenes.
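The comparison between the anticodon-derived isotype and the top-scoring isotype model can be illustrated with a short sketch. It assumes the per-model bit scores are already available as a mapping; the function name and the scores are hypothetical and do not reproduce tRNAscan-SE output.

```python
def compare_isotype_models(anticodon_isotype, model_scores):
    """Return the top two isotype models and whether the best one agrees
    with the isotype implied by the anticodon."""
    if len(model_scores) < 2:
        raise ValueError("need scores for at least two isotype models")
    ranked = sorted(model_scores.items(), key=lambda item: item[1], reverse=True)
    top, second = ranked[0], ranked[1]
    return {
        "top_model": top,
        "second_model": second,
        "consistent": top[0] == anticodon_isotype,
    }


# Invented scores: the first gene is consistent, the second is not.
agree = compare_isotype_models("Val", {"Val": 95.2, "Ala": 60.1, "Ile": 71.4})
disagree = compare_isotype_models("Arg", {"Arg": 50.0, "Lys": 68.0})
print(agree["consistent"], disagree["consistent"])
```

A gene flagged as inconsistent is only a candidate for closer inspection; as noted above, disagreement may reflect loss of function or a 'hybrid' tRNA.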
Figure 1.

Individual gene page for human tRNA-Val-AAC-4–1. (A) Direct link to the genomic locus of the tRNA gene with the display of related data tracks in the UCSC Genome Browser (3) is provided when available. Top scoring isotype-specific models are included to illustrate consensus (or lack thereof) in isotype classification. The rank of Val-AAC-4–1 indicates that it is the fourth highest scoring out of six human Val-AAC tRNA genes. An atypical feature for the displayed tRNA is G50:G64, a non-Watson-Crick base pair mismatch in the T-arm. Known modifications of the tRNA were retrieved from MODOMICS (18). (B) Expression of tRNA fragments derived from tRNA-Val-AAC using ARM-Seq were retrieved from published literature (13). (C) Graphic representation of tRNA secondary structure prediction from tRNAscan-SE was rendered by NAVIEW (22). Secondary structure fold using minimum free energy was generated by RNAfold (17). (D) Multiple sequence alignments of tRNA genes with the same isotype are shown with the stems highlighted and individual scores. (E) Variants from dbSNP (19) build 142 located at the tRNA-Val-AAC-4–1 locus are listed with their relative tRNA positions, alternate alleles, commonality, predicted effects and direct links to the dbSNP website for further information.

Another new type of information in this table includes ‘Atypical Features’ which highlight deviations from the canonical tRNA structure or highly conserved nucleotides (e.g. ‘G50:G64’, Figure 1A). ‘Rank of Isodecoder’ indicates how highly the current tRNA scores relative to all other tRNAs sharing the same anticodon in that genome; a higher rank may roughly correlate positively with relative usage in translation. In addition to the secondary structure predicted by sequence alignment to the tRNA covariance model (included in the original GtRNAdb), we have now added a contrasting minimum free energy (MFE) secondary structure prediction created by RNAfold (17) (Figure 1C). 
It is not yet clear how the MFE structure changes after addition of tRNA modifications, or how a highly stable non-cloverleaf fold may affect early tRNA processing, although compiling comparative structure information in the new GtRNAdb should facilitate this active area of research.
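The ‘Rank of Isodecoder’ value can be reproduced from a list of scored genes by ranking each gene within its anticodon group. This is a minimal sketch under that assumption; the tuple layout is hypothetical, and all scores except 66.4 and 77.9 bits (taken from the Figure 2 caption) are invented.

```python
from collections import defaultdict


def isodecoder_ranks(genes):
    """Map gene name -> (rank, group size) among genes sharing an anticodon.

    `genes` is an iterable of (name, isotype, anticodon, score) tuples;
    rank 1 is the highest-scoring gene in its group.
    """
    groups = defaultdict(list)
    for name, isotype, anticodon, score in genes:
        groups[(isotype, anticodon)].append((score, name))
    ranks = {}
    for members in groups.values():
        ordered = sorted(members, key=lambda m: m[0], reverse=True)
        for position, (_, name) in enumerate(ordered, start=1):
            ranks[name] = (position, len(ordered))
    return ranks


# Six Val-AAC genes; only the 77.9 and 66.4 bit scores come from the text.
val_aac = [
    ("tRNA-Val-AAC-1-1", "Val", "AAC", 77.9),
    ("tRNA-Val-AAC-2-1", "Val", "AAC", 75.0),
    ("tRNA-Val-AAC-3-1", "Val", "AAC", 70.0),
    ("tRNA-Val-AAC-4-1", "Val", "AAC", 66.4),
    ("tRNA-Val-AAC-5-1", "Val", "AAC", 60.0),
    ("tRNA-Val-AAC-6-1", "Val", "AAC", 55.0),
]
ranks = isodecoder_ranks(val_aac)
print(ranks["tRNA-Val-AAC-4-1"])
```

With these placeholder scores the gene is ranked fourth of six, matching the rank reported for tRNA-Val-AAC-4–1 in Figure 1.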

INTEGRATING EXTERNAL DATA FOR RNA MODIFICATION, TRANSCRIPTION, GENOMIC VARIATION AND EPIGENETIC INFORMATION

Because the most highly accessed species within the GtRNAdb are human, Escherichia coli K12, budding yeast and mouse, we have begun to integrate some of the growing wealth of published functional data for these species to put tRNA gene predictions in richer context. First, we integrated all available RNA modification data directly from the extremely valuable MODOMICS database (18), as tRNAs are densely modified and knowledge of these different modifications is critical for better understanding tRNA function. Second, a new method was developed in our lab in collaboration with the Phizicky lab (13) which enables greatly enhanced sequencing of tRNA fragments, as well as detection of m1A, m1G and m3C modifications across all expressed tRNAs. To enable easier access to the first global view of tRNA fragment patterns, we have integrated small RNA-seq read profiles for every human and yeast tRNA from this study (Figure 1B), and we plan to continue adding high value expression data as new studies are published. Third, we have collected the most recent data on human genome variation from dbSNP (Build 142) (19), since understanding the potential phenotypic effects of mutations in tRNA genes is an integral part of understanding their role in human disease. While it has been assumed that multiple copies of tRNAs offer redundancy in most eukaryotes, it was shown recently that a single point mutation in a single tRNA gene could cause severe neural degeneration in a particular mouse genetic background (6). In order to enable the biomedical research community to identify other important mutations, we have included not only the position of the genomic variants within the tRNA genes, but also the predicted impact of such mutations given conserved primary and secondary structure of tRNAs (Figure 1E). The goal is to predict the functional consequences of tRNA gene polymorphisms, similar to predictive programs developed for protein coding genes like SnpEff (20). 
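Reporting a variant's position within a tRNA gene requires converting a genomic coordinate into a strand-aware offset. The sketch below shows one way this could be done; it assumes 1-based inclusive coordinates and returns a simple linear position rather than canonical tRNA numbering, and it is not a description of how GtRNAdb computes the values it displays. The coordinates are invented.

```python
def relative_trna_position(variant_pos, gene_start, gene_end, strand):
    """Convert a 1-based genomic coordinate to a 1-based position within
    the tRNA gene, counted 5' to 3' along the transcribed strand.

    Returns None when the variant lies outside the gene.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if not gene_start <= variant_pos <= gene_end:
        return None
    if strand == "+":
        return variant_pos - gene_start + 1
    # On the minus strand the gene's 5' end is at the larger coordinate.
    return gene_end - variant_pos + 1


# Invented 73-nt gene spanning positions 1000-1072.
plus_pos = relative_trna_position(1009, 1000, 1072, "+")
minus_pos = relative_trna_position(1009, 1000, 1072, "-")
outside = relative_trna_position(2000, 1000, 1072, "+")
print(plus_pos, minus_pos, outside)
```

Mapping the linear position onto structural elements (for example the T-arm) would additionally require the gene's alignment to the tRNA model, which is outside the scope of this sketch.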
A key feature of the new database is its capacity to incorporate new and valuable reference data sets that provide insight into tRNA gene function as they become available. An example is the wealth of epigenomic information from the ENCODE project measuring transcription factor ChIP-seq data for a number of proteins found associated with RNA polymerase III transcription (TBP, BRF1, RPC155, POL3 and BDP1) across five different cell lines (GM12878, H1 embryonic stem cells, K562, HeLa and HepG2). We have created custom ‘session’ views for each tRNA in the human genome, accessible by clicking on links given in the ‘Genome Browser View’ (Figure 2). Other tracks displayed in these views, including the 100-vertebrate multi-genome alignments, can allow rapid assessment of the evolutionary conservation and ‘age’ of each tRNA gene in the genome. Additional types of Genome Browser Views will be added as they become available and relevant to tRNA biology.
Figure 2.

Example of ‘Genome Browser Views’ for two human tRNAs showing evolutionary conservation and ENCODE ChIP-Seq data for RNA polymerase III-associated transcription factors. (A) View of tRNA-Val-AAC-4–1, which has a lower tRNAscan-SE score (66.4 bits), is less conserved (fewer alignments to other species in the 100-Vertebrate Multi-Genome Alignment & Conservation track at bottom), and is not as transcriptionally active (red peaks from ENCODE Transcription Factor ChIP-seq data) as other more canonical Val-AAC genes. (B) View of tRNA-Val-AAC-1–1, which is a higher scoring tRNA (77.9 bits), is more conserved (across most mammals), and much more transcriptionally active (y-axis scale same as in part (A)). ChIP-seq data are from the ENCODE project (23) using antibodies to TBP (TATA-Box Binding Protein) and RPC155 (aka POLR3A, 155 kDa RNA polymerase III polypeptide A), derived from ‘Signal based on Uniform processing from the ENCODE Integrative Analysis Data Hub’ at http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/integration_data_jan2011/hub.txt.


SEARCHING THE GTRNADB BY SEQUENCE (BLAST) OR CHARACTERISTICS (TRNA SIFTER)

The custom GtRNAdb BLAST server can be used to search any given sequence against all tRNAs in the database. Options include searching for tRNA matches in all species, or only in one of the three domains of life. As before, standard BLAST options include setting the Expect value (E-value) threshold and word size (5). If tRNA matches occur in genomes available in the UCSC Genome Browser (3) or Archaeal/Microbial Genome Browser (4), users can view tRNA hits within the genome browsers by clicking on the provided links. One of the goals in developing the GtRNAdb is to provide a facile tool for comparative analysis across multiple genomes. The search capabilities of the redubbed ‘tRNA Sifter’ allow researchers to query the database with criteria including phylogenetic domain and clade, partial species name, chromosome or scaffold name, any combination of amino acid isotypes and anticodons, number of introns and the existence of a genome-encoded 3′-CCA. In this new version of GtRNAdb, we have added further search criteria, including maximum and minimum tRNA score, and number of mismatches identified in the tRNA secondary structure. Results can be viewed in the web browser interface, via links to individual tRNA information pages, or downloaded for further analysis. As an example, a peculiar trait of the previously mentioned brain-specific tRNA found in mouse (6) is that it is the only ‘high-scoring’ (>65 bits) Arg-TCT tRNA in the mouse genome that does not have an intron. Is this peculiar intron-less Arg-TCT evolutionarily shared with humans, vertebrates or more broadly? One could examine each species’ tRNAs individually, but using the tRNA Sifter, one can search for all Arg-TCT tRNAs scoring >65 bits that have no introns. In a fairly simple single-step query, one can learn that this unique intron-less Arg-TCT tRNA is indeed shared broadly among mammals, amphibians, reptiles and birds, but not among insects or nematodes. 
This type of query should enable researchers to maximize the value of comparative genomics to understand tRNA evolution and function.
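The Arg-TCT query described above can be mimicked on downloaded results with a few lines of filtering. This is a sketch only: the record fields, species entries, gene names and scores are invented for illustration, and the function is not the tRNA Sifter's own interface.

```python
from collections import defaultdict


def sift(records, isotype, anticodon, min_score, introns):
    """Group matching tRNA gene names by species.

    Each record is a dict with keys: species, name, isotype, anticodon,
    score, introns. A record matches when isotype and anticodon are equal,
    its score exceeds `min_score`, and its intron count equals `introns`.
    """
    by_species = defaultdict(list)
    for rec in records:
        if (
            rec["isotype"] == isotype
            and rec["anticodon"] == anticodon
            and rec["score"] > min_score
            and rec["introns"] == introns
        ):
            by_species[rec["species"]].append(rec["name"])
    return dict(by_species)


# Invented records standing in for a multi-genome download.
records = [
    {"species": "Mus musculus", "name": "Arg-TCT-a", "isotype": "Arg",
     "anticodon": "TCT", "score": 70.0, "introns": 0},
    {"species": "Mus musculus", "name": "Arg-TCT-b", "isotype": "Arg",
     "anticodon": "TCT", "score": 72.0, "introns": 1},
    {"species": "Homo sapiens", "name": "Arg-TCT-c", "isotype": "Arg",
     "anticodon": "TCT", "score": 71.0, "introns": 0},
    {"species": "Drosophila melanogaster", "name": "Arg-TCT-d", "isotype": "Arg",
     "anticodon": "TCT", "score": 68.0, "introns": 1},
]
hits = sift(records, "Arg", "TCT", min_score=65.0, introns=0)
print(sorted(hits))
```

Grouping by species makes it easy to see at a glance which genomes carry a high-scoring intron-less Arg-TCT gene and which do not.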

FUTURE DEVELOPMENT

With the advent of individual tRNA information pages, we are now able to introduce many new types of characterization data by linking to or drawing directly from external databases. Individual tRNA gene pages also allow us to encourage links from other databases (like RNAcentral (21)) back to specific GtRNAdb entries. We aim to advance tRNA research by continuing to bring together diverse, complementary types of experimental as well as computational analyses in an integrated platform that facilitates both advanced searches and browsing. We plan to expand these functional data links to other species in the future by collaborating with those communities. Users are also encouraged to suggest new functionality or other relevant data sets for inclusion.
REFERENCES (10 of 23 shown)

1.  tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence.

Authors:  T M Lowe; S R Eddy
Journal:  Nucleic Acids Res       Date:  1997-03-01       Impact factor: 16.971

2.  tRNA-derived microRNA modulates proliferation and the DNA damage response and is down-regulated in B cell lymphoma.

Authors:  Roy L Maute; Christof Schneider; Pavel Sumazin; Antony Holmes; Andrea Califano; Katia Basso; Riccardo Dalla-Favera
Journal:  Proc Natl Acad Sci U S A       Date:  2013-01-07       Impact factor: 11.205

3.  Endogenous tRNA-Derived Fragments Suppress Breast Cancer Progression via YBX1 Displacement.

Authors:  Hani Goodarzi; Xuhang Liu; Hoang C B Nguyen; Steven Zhang; Lisa Fish; Sohail F Tavazoie
Journal:  Cell       Date:  2015-05-07       Impact factor: 41.582

4.  RNA function. Ribosome stalling induced by mutation of a CNS-specific tRNA causes neurodegeneration.

Authors:  Ryuta Ishimura; Gabor Nagy; Ivan Dotu; Huihao Zhou; Xiang-Lei Yang; Paul Schimmel; Satoru Senju; Yasuharu Nishimura; Jeffrey H Chuang; Susan L Ackerman
Journal:  Science       Date:  2014-07-25       Impact factor: 47.728

5.  The UCSC Archaeal Genome Browser: 2012 update.

Authors:  Patricia P Chan; Andrew D Holmes; Andrew M Smith; Danny Tran; Todd M Lowe
Journal:  Nucleic Acids Res       Date:  2011-11-12       Impact factor: 16.971

6.  ViennaRNA Package 2.0.

Authors:  Ronny Lorenz; Stephan H Bernhart; Christian Höner Zu Siederdissen; Hakim Tafer; Christoph Flamm; Peter F Stadler; Ivo L Hofacker
Journal:  Algorithms Mol Biol       Date:  2011-11-24       Impact factor: 1.405

7.  CLP1 links tRNA metabolism to progressive motor-neuron loss.

Authors:  Toshikatsu Hanada; Stefan Weitzer; Barbara Mair; Christian Bernreuther; Brian J Wainger; Justin Ichida; Reiko Hanada; Michael Orthofer; Shane J Cronin; Vukoslav Komnenovic; Adi Minis; Fuminori Sato; Hiromitsu Mimata; Akihiko Yoshimura; Ido Tamir; Johannes Rainer; Reinhard Kofler; Avraham Yaron; Kevin C Eggan; Clifford J Woolf; Markus Glatzel; Ruth Herbst; Javier Martinez; Josef M Penninger
Journal:  Nature       Date:  2013-03-10       Impact factor: 49.962

8.  An integrated encyclopedia of DNA elements in the human genome.

Authors:  ENCODE Project Consortium
Journal:  Nature       Date:  2012-09-06       Impact factor: 49.962

9.  TFAM 1.0: an online tRNA function classifier.

Authors:  Helena Tåquist; Yuanyuan Cui; David H Ardell
Journal:  Nucleic Acids Res       Date:  2007-06-25       Impact factor: 16.971

10.  ARM-seq: AlkB-facilitated RNA methylation sequencing reveals a complex landscape of modified tRNA fragments.

Authors:  Aaron E Cozen; Erin Quartley; Andrew D Holmes; Eva Hrabeta-Robinson; Eric M Phizicky; Todd M Lowe
Journal:  Nat Methods       Date:  2015-08-03       Impact factor: 28.547
