Patricia P Chan, Todd M Lowe.
Abstract
Transfer RNAs represent the largest, most ubiquitous class of non-protein coding RNA genes found in all living organisms. The tRNAscan-SE search tool has become the de facto standard for annotating tRNA genes in genomes, and the Genomic tRNA Database (GtRNAdb) was created as a portal for interactive exploration of these gene predictions. Since its published description in 2009, the GtRNAdb has steadily grown in content, and remains the most commonly cited web-based source of tRNA gene information. In this update, we describe not only a major increase in the number of tRNA predictions (>367000) and genomes analyzed (>4370), but more importantly, the integration of new analytic and functional data to improve the quality and biological context of tRNA gene predictions. New information drawn from other sources includes tRNA modification data, epigenetic data, single nucleotide polymorphisms, gene expression and evolutionary conservation. A richer set of analytic data is also presented, including better tRNA functional prediction, non-canonical features, predicted structural impacts from sequence variants and minimum free energy structural predictions. Views of tRNA genes in genomic context are provided via direct links to the UCSC genome browsers. The database can be searched by sequence or gene features, and is available at http://gtrnadb.ucsc.edu/.
Year: 2015 PMID: 26673694 PMCID: PMC4702915 DOI: 10.1093/nar/gkv1309
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Summary statistics of the genomes and predicted tRNAs in GtRNAdb 2.0. tRNA genes were predicted using tRNAscan-SE (1).
| | Eukaryota | Bacteria | Archaea | Total |
|---|---|---|---|---|
| Number of genomes | 155 | 4032 | 184 | 4371 |
| tRNAs decoding standard 20 amino acids | 118 806 | 237 635 | 8712 | 365 153 |
| Selenocysteine or TCA Suppressor tRNAs | 311 | 1346 | 18 | 1675 |
| Other Possible Suppressor tRNAs (CTA or TTA) | 263 | 19 | 1 | 283 |
| Total predicted tRNAs | 119 380 | 239 000 | 8731 | 367 111 |
| Predicted pseudogenes | 1 118 008 | 1165 | 1 | 1 119 174 |
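The "Total predicted tRNAs" row is the column-wise sum of the three rows above it. The short sketch below checks that arithmetic, with the counts transcribed by hand from the table. The variable names are illustrative and not part of GtRNAdb.

```python
# Counts transcribed from the summary table.
# Column order: Eukaryota, Bacteria, Archaea.
standard_20 = [118_806, 237_635, 8_712]
sec_or_tca_suppressor = [311, 1_346, 18]
other_suppressor = [263, 19, 1]

# Values reported in the "Total predicted tRNAs" row.
reported_totals = [119_380, 239_000, 8_731]
reported_grand_total = 367_111

# Sum each column across the three tRNA categories.
column_totals = [
    a + b + c
    for a, b, c in zip(standard_20, sec_or_tca_suppressor, other_suppressor)
]
grand_total = sum(column_totals)

# True only if every recomputed value agrees with the table.
totals_match = (
    column_totals == reported_totals and grand_total == reported_grand_total
)
print(column_totals, grand_total, totals_match)
```

The recomputed sums agree with the reported row, and the grand total is consistent with the ">367000" figure quoted in the abstract.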
Figure 1. Individual gene page for human tRNA-Val-AAC-4–1. (A) Direct link to the genomic locus of the tRNA gene with the display of related data tracks in the UCSC Genome Browser (3) is provided when available. Top scoring isotype-specific models are included to illustrate consensus (or lack thereof) in isotype classification. The rank of Val-AAC-4–1 indicates that it is the fourth highest scoring out of six human Val-AAC tRNA genes. An atypical feature for the displayed tRNA is G50:G64, a non-Watson-Crick base pair mismatch in the T-arm. Known modifications of the tRNA were retrieved from MODOMICS (18). (B) Expression data for tRNA fragments derived from tRNA-Val-AAC, measured using ARM-Seq, were retrieved from published literature (13). (C) Graphic representation of the tRNA secondary structure prediction from tRNAscan-SE was rendered by NAVIEW (22). The minimum free energy secondary structure fold was generated by RNAfold (17). (D) Multiple sequence alignments of tRNA genes with the same isotype are shown with the stems highlighted and individual scores. (E) Variants from dbSNP (19) build 142 located at the tRNA-Val-AAC-4–1 locus are listed with their relative tRNA positions, alternate alleles, commonality, predicted effects and direct links to the dbSNP website for further information.
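Gene names such as tRNA-Val-AAC-4–1 pack the isotype, the anticodon and the score rank into one string. The following is a minimal sketch of how such a name might be split into fields. The regular expression, the `TrnaGeneName` type and the `parse_trna_gene_name` function are hypothetical helpers and not part of GtRNAdb or tRNAscan-SE. Reading the final number as a copy index is an assumption, since the caption explains only the rank.

```python
import re
from typing import NamedTuple


class TrnaGeneName(NamedTuple):
    isotype: str     # amino acid isotype, e.g. "Val"
    anticodon: str   # anticodon in DNA letters, e.g. "AAC"
    rank: int        # score rank among genes sharing isotype and anticodon
    copy_index: int  # final number; this interpretation is an assumption


# Accept either an ASCII hyphen or an en dash before the last number,
# because typeset text often renders the final hyphen as an en dash.
_NAME_PATTERN = re.compile(
    r"^tRNA-([A-Za-z]+)-([ACGTN]{3})-(\d+)[-\u2013](\d+)$"
)


def parse_trna_gene_name(name: str) -> TrnaGeneName:
    """Split a name like 'tRNA-Val-AAC-4-1' into its fields."""
    match = _NAME_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized tRNA gene name: {name!r}")
    isotype, anticodon, rank, copy_index = match.groups()
    return TrnaGeneName(isotype, anticodon, int(rank), int(copy_index))


gene = parse_trna_gene_name("tRNA-Val-AAC-4-1")
print(gene.isotype, gene.anticodon, gene.rank)
```

For the gene shown in the figure, the parsed rank is 4, which matches the caption's statement that it is the fourth highest scoring human Val-AAC gene.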
Figure 2. Example of ‘Genome Browser Views’ for two human tRNAs showing evolutionary conservation and ENCODE ChIP-Seq data for RNA polymerase III-associated transcription factors. (A) View of tRNA-Val-AAC-4–1, which has a lower tRNAscan-SE score (66.4 bits), is less conserved (fewer alignments to other species in the 100-Vertebrate Multi-Genome Alignment & Conservation track at bottom), and is not as transcriptionally active (red peaks from ENCODE Transcription Factor ChIP-seq data) as other more canonical Val-AAC genes. (B) View of tRNA-Val-AAC-1–1, which is a higher scoring tRNA (77.9 bits), is more conserved (across most mammals), and much more transcriptionally active (y-axis scale same as in part (A)). ChIP-seq data are from the ENCODE project (23) using antibodies to TBP (TATA-Box Binding Protein) and RPC155 (aka POLR3A, 155 kDa RNA polymerase III polypeptide A), derived from ‘Signal based on Uniform processing from the ENCODE Integrative Analysis Data Hub’ at http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/integration_data_jan2011/hub.txt.