
UTRdb and UTRsite (RELEASE 2010): a collection of sequences and regulatory motifs of the untranslated regions of eukaryotic mRNAs.

Giorgio Grillo, Antonio Turi, Flavio Licciulli, Flavio Mignone, Sabino Liuni, Sandro Banfi, Vincenzo Alessandro Gennarino, David S Horner, Giulio Pavesi, Ernesto Picardi, Graziano Pesole.

Abstract

The 5' and 3' untranslated regions of eukaryotic mRNAs (UTRs) play crucial roles in the post-transcriptional regulation of gene expression through the modulation of nucleo-cytoplasmic mRNA transport, translation efficiency, subcellular localization and message stability. UTRdb is a curated database of 5' and 3' untranslated sequences of eukaryotic mRNAs, derived from several sources of primary data. Experimentally validated functional motifs are annotated and also collated as the UTRsite database, where more specific information on the functional motifs and cross-links to interacting regulatory proteins are provided. In the current update, the UTR entries have been organized in a gene-centric structure to better visualize and retrieve 5' and 3'UTR variants generated by alternative initiation and termination of transcription and alternative splicing. Experimentally validated miRNA targets and conserved sequence elements are also annotated. The integration of UTRdb with genomic data has allowed the implementation of an efficient annotation system and a powerful retrieval resource for the selection and extraction of specific UTR subsets. All internet resources implemented for retrieval and functional analysis of 5' and 3' untranslated regions of eukaryotic mRNAs are accessible at http://utrdb.ba.itb.cnr.it/.

Year:  2009        PMID: 19880380      PMCID: PMC2808995          DOI: 10.1093/nar/gkp902

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

One of the main challenges of the post-genomic era is the understanding of the mechanisms that control the spatio-temporal regulation of gene expression. The fate of newly synthesized mRNA with respect to its nucleo-cytoplasmic transport, stability, translation efficiency and subcellular localization is determined at the post-transcriptional level. Such regulation is mostly mediated by cis-acting elements located in the 5′ and 3′ untranslated regions of mRNAs (5′UTR and 3′UTR) (1) and by miRNAs interacting with their specific targets in 3′UTRs (2,3). Various specific functional sequence elements and miRNA targets have been identified and characterized in mRNA UTRs. These elements usually correspond to short oligonucleotide tracts whose biological activity relies on a combination of their primary sequence and specific secondary structure. These motifs either act as target sites for RNA-binding factors or interact directly with the translation machinery. miRNA targets, usually located in the 3′UTR, show only degenerate complementarity to their miRNAs, tolerating several mismatches, gaps and G–U pairings outside a continuous 6–8 bp seed region at the 5′-end of the miRNA. In addition, some UTRs may be targeted by complementary natural antisense transcripts that mask RNA-binding protein or miRNA binding sites (4). Notably, it is now clear that the same gene may generate several transcript variants, through the use of alternative sites for the initiation and termination of transcription and through alternative splicing. Alternative transcripts can differ both in the coding and in the untranslated regions (5). Specifically, alternative 5′ and 3′UTRs may differentially modulate gene expression owing to the presence of different combinations of functional motifs and miRNA targets.
The availability of a large collection of functionally related sequences—such as UTRs—is invaluable for structural and functional analyses and for a better understanding of the specific role of different variants. To address this issue we have developed a new version of UTRdb, a collection of 5′ and 3′ UTR sequences derived from eukaryotic mRNAs, where the entries have been organized in a gene-centric structure in order to provide relevant information about splicing variants. Sequences collated in UTRdb were recovered from the National Center for Biotechnology Information (NCBI) RefSeq transcripts (6) using custom software. For human genes, a more comprehensive collection of UTRs is available [derived from the full set of over 300 000 alternative full-length transcripts collected in ASPicDB (7)] generated by a thorough analysis of all available EST/mRNA data. All UTRdb entries are further annotated for the occurrence of validated regulatory elements, conserved elements and structured RNAs, and miRNA targets (see below). Furthermore, the completeness of 5′UTRs is assessed by the occurrence of mapping CAGE tags (22) (if available) and that of 3′UTRs by the occurrence of a polyA signal and/or a polyA tail. We have also further expanded UTRsite, a collection of regulatory elements located in 5′ and 3′ UTRs and whose function and structure have been experimentally determined and published. The UTRsite collection may prove useful in automatic annotation projects of unknown expressed sequences as well as for finding previously undetected signals in known sequences. In the present release, the information for each UTRsite entry has been further enriched including data on functional interacting RNA-binding proteins. The gene-centric structure of UTRdb facilitates a full integration with all possible gene attributes collected in the NCBI Gene database (8) or other genomic resources such as the UCSC genome browser (9). 
In this way, the retrieval of specific UTR subsets is possible based on the features associated with each gene, for example a GO term (10), a MIM identifier (11) or a Unigene accession (12).

GENERATION OF UTRdb AND ITS INTEGRATION WITH OTHER DATABASES

UTRdb entries are automatically generated by parsing the feature tables of NCBI RefSeq and ASPicDB transcripts for the UTRef and UTRfull sections of the UTRdb database, respectively. ASPicDB contains all possible transcript isoforms for a gene, reconstructed from all available transcript and EST sequences as described in (13). UTR entries are then annotated for the occurrence of tandem and interspersed repetitive elements by using RepeatMasker (v3.2.8, March 2009; A.F.A. Smit, R. Hubley & P. Green, RepeatMasker at http://repeatmasker.org), and for known regulatory motifs collected in the UTRsite database, as detailed in (14). Each UTRsite entry (Figure 1) is prepared, reviewed and updated by expert scientists (in many cases, those who performed the experimental analysis) by using a purpose-built submission tool (15).
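The core step of deriving UTRs from an annotated transcript can be illustrated with a minimal sketch. It assumes that the feature table supplies 1-based, inclusive CDS coordinates on the mRNA sequence; the `Transcript` class, its field names and the `extract_utrs` function are hypothetical and are not taken from the custom software used to build UTRdb.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical container for one annotated mRNA (not the UTRdb schema)."""
    accession: str
    sequence: str   # full mRNA sequence, written 5' to 3'
    cds_start: int  # 1-based, inclusive position of the first CDS base
    cds_end: int    # 1-based, inclusive position of the last CDS base


def extract_utrs(tx: Transcript) -> tuple[str, str]:
    """Return (5'UTR, 3'UTR): everything before and after the annotated CDS."""
    if not (1 <= tx.cds_start <= tx.cds_end <= len(tx.sequence)):
        raise ValueError(f"{tx.accession}: CDS coordinates outside the sequence")
    five_prime_utr = tx.sequence[: tx.cds_start - 1]
    three_prime_utr = tx.sequence[tx.cds_end:]
    return five_prime_utr, three_prime_utr


if __name__ == "__main__":
    # Toy transcript: 6 nt 5'UTR, 9 nt CDS (ATG AAA TAA), 11 nt 3'UTR.
    demo = Transcript("TX_DEMO", "GGCACC" + "ATGAAATAA" + "TTTAATAAAGC", 7, 15)
    print(extract_utrs(demo))
```

Either UTR may be empty when the annotated CDS reaches the end of the transcript, which is one reason a separate completeness assessment is useful.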
Figure 1.

Sample UTRsite entry. The general information section includes the pattern syntax of the regulatory motif in a format suitable for the PatSearch software (23) and the number of hits/kb expected in a collection of randomly generated sequences with the same nucleotide composition as UTRdb. Cross-links to the RFAM database (24), transcripts, genes and RNA-binding proteins are provided where available, together with all relevant references (not shown here).

UTRdb entries are also annotated for the occurrence of validated miRNA targets collected in miRecords (16), a large, high-quality database of experimentally validated miRNA targets resulting from meticulous literature curation. Furthermore, we annotated a set of 3′UTR sequences that are highly likely to represent bona fide miRNA target recognition sites, as predicted by the HOCTAR tool (17). For a subset of seven organisms for which a suitable genome assembly is available, namely human, mouse, rat, cow, dog, chicken and Arabidopsis, we also determined the genomic coordinates of UTRs. For these species we were able to remove all redundancies arising from alternative mRNA isoforms by identifying UTRs with coincident genomic coordinates. Additional annotations are specifically provided for genome-linked UTRs. These include: (i) highly conserved sequence blocks from the 17-way PhastCons vertebrate conserved elements (18); (ii) significantly conserved tracts detected by Evofold (19); and (iii) structurally conserved non-coding RNAs detected by RNAz (20). PhastCons detects evolutionarily conserved elements using a genome-wide multiple alignment based on a phylogenetic hidden Markov model (21). Evofold is a general comparative genomics method based on phylogenetic stochastic context-free grammars for identifying RNA secondary structures encoded in the human genome and conserved in an eight-way genome-wide alignment of the human, chimpanzee, mouse, rat, dog, chicken, zebrafish and pufferfish genomes (19).
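Removing redundancy on the basis of coincident genomic coordinates can be sketched as a grouping step. The record layout below (keys `chrom`, `strand`, `exons` and `transcript`) is a hypothetical illustration, not the UTRdb schema, and the sketch assumes that two UTRs are redundant only when chromosome, strand and every exon block coincide exactly.

```python
def collapse_coincident_utrs(utrs):
    """Group UTR records that share identical genomic coordinates.

    Each record is a dict with hypothetical keys:
      chrom (str), strand ('+' or '-'),
      exons (list of (start, end) tuples), transcript (str).
    Returns a dict mapping each coordinate key to the list of source transcripts.
    """
    groups = {}
    for utr in utrs:
        # Sort exon blocks so that the key does not depend on input order.
        key = (utr["chrom"], utr["strand"], tuple(sorted(utr["exons"])))
        groups.setdefault(key, []).append(utr["transcript"])
    return groups


if __name__ == "__main__":
    demo = [
        {"chrom": "chr1", "strand": "+", "exons": [(100, 150), (300, 340)], "transcript": "isoform_a"},
        {"chrom": "chr1", "strand": "+", "exons": [(300, 340), (100, 150)], "transcript": "isoform_b"},
        {"chrom": "chr1", "strand": "+", "exons": [(100, 150), (300, 360)], "transcript": "isoform_c"},
    ]
    for coords, sources in collapse_coincident_utrs(demo).items():
        print(coords, sources)
```

Keeping the list of source transcripts for each non-redundant UTR preserves the link back to the isoforms that share it.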
RNAz evaluates conserved genomic DNA sequences for signatures of structural conservation of base pairing patterns and exceptional thermodynamic stability. We employed three sets retrieved from (20), with regions conserved with P-value >0.9 in human, mouse, rat and dog (Set 1); in human, mouse, rat, dog and chicken (Set 2); and in human, mouse, rat, dog, chicken and either fugu or zebrafish (Set 3). To assess the completeness of 5′UTRs in human and mouse we used the mapping data of the CAGE tags indicating the location of the transcription start site (22). The 5′-end of a 5′UTR is considered complete when at least five CAGE tags map to a nearby position (a window of 5 bp around the mapping position of the 5′-end of the 5′UTR). Analogously, a 3′UTR is considered complete at its 3′-end if a polyA signal and/or a polyA tail is detected in the original transcript sequence. The UTRdb and UTRsite data have been organized into relational databases using MySQL as the database management system. A novel implementation detail in this release is that several physical databases (containing UTR sequences and annotations from RefSeq and ASPicDB transcripts, chromosome coordinates of source transcripts for the seven model organisms, taxonomic data, etc.) are used to store all the information on UTRs and their annotations. The new search and retrieval system integrates the data contained in these different relational databases to return the requested UTRs and related annotations [such as the database from which a UTR was recovered (RefSeq or ASPicDB), genomic coordinates and structure, localization of miRNA targets and conserved elements, functional elements, etc.]. An exemplar entry of UTRdb is shown in Figure 2.
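The two completeness criteria can be expressed as simple predicates. The sketch below makes several assumptions that the text does not settle: the "window of 5 bp" is read as plus or minus 5 bp around the annotated 5′-end, the polyA signal is taken to be the canonical AATAAA hexamer or its common ATTAAA variant, and a polyA tail is taken to be a terminal run of at least 10 adenines. All function names, parameters and thresholds other than the five-tag minimum are hypothetical.

```python
import re

# Assumption: canonical polyadenylation hexamer plus its most common variant.
POLYA_SIGNAL = re.compile(r"AATAAA|ATTAAA")


def five_prime_complete(utr_start, cage_tag_positions, min_tags=5, window=5):
    """True if at least `min_tags` CAGE tags map within +/- `window` bp of the 5'-end."""
    nearby = sum(1 for pos in cage_tag_positions if abs(pos - utr_start) <= window)
    return nearby >= min_tags


def three_prime_complete(transcript_seq, tail_min=10, signal_region=50):
    """True if a polyA signal and/or a polyA tail is found at the transcript 3'-end.

    `tail_min` and `signal_region` are illustrative thresholds, not values from the paper.
    """
    seq = transcript_seq.upper().replace("U", "T")
    body = seq.rstrip("A")                      # sequence without any terminal A run
    has_tail = len(seq) - len(body) >= tail_min
    # Search the last `signal_region` nt of the body together with the trailing A run,
    # so that a signal ending exactly at the 3'-end is not lost by the stripping step.
    region = seq[max(0, len(body) - signal_region):]
    has_signal = POLYA_SIGNAL.search(region) is not None
    return has_signal or has_tail


if __name__ == "__main__":
    print(five_prime_complete(1000, [998, 999, 1000, 1001, 1003]))
    print(three_prime_complete("GGGCCC" + "AATAAA" + "GTCTGTCTGTCT"))
```

Both thresholds are exposed as parameters because the appropriate window and tail length depend on how the tags and transcripts were mapped.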
Figure 2.

Sample entry of the UTRdb database. The ‘Genomic Information’ and ‘Features and Annotation’ sections report information on genome mapping coordinates and on the localization of UTRsite elements, miRNA targets and conserved elements, respectively.


UTRdb CONTENT

UTRdb (UTRef section, release 2010) contains a total of 473 330 5′UTR and 527 323 3′UTR entries from 483 605 genes in 79 species (see the Supplementary Data for more information). A total of 788 370 UTRsite motifs are annotated (317 767 in the 5′UTRs and 470 603 in the 3′UTRs), together with 20 191 experimentally validated miRNA targets and 242 773 conserved regions. For human, the UTRfull section is also available, including UTRs derived from full-length transcripts collected in ASPicDB (7). Overall, UTRfull contains 124 345 5′UTRs (3.37/gene) and 194 503 3′UTRs (5.18/gene), with 348 412 annotated UTRsite motifs, 649 679 conserved elements and 105 209 experimentally validated miRNA targets.
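The figures above can be cross-checked with a few lines of arithmetic. The sketch assumes that 3.37 and 5.18 are the average numbers of 5′UTRs and 3′UTRs per gene in UTRfull; under that reading, both averages imply a similar number of human genes (roughly 37 000 to 37 500), which supports the interpretation. Variable names are illustrative only.

```python
# UTRsite motif counts reported for the UTRef section.
utrsite_5p = 317_767
utrsite_3p = 470_603
utrsite_total = 788_370
assert utrsite_5p + utrsite_3p == utrsite_total  # the two parts sum to the stated total

# UTRfull counts and (assumed) per-gene averages.
utrfull_5p, per_gene_5p = 124_345, 3.37
utrfull_3p, per_gene_3p = 194_503, 5.18

# Number of genes implied by each count/average pair.
implied_genes_5p = utrfull_5p / per_gene_5p
implied_genes_3p = utrfull_3p / per_gene_3p
print(round(implied_genes_5p), round(implied_genes_3p))
```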

AVAILABILITY OF UTRdb

UTRdb and UTRsite are accessible through a newly developed retrieval system that offers simple and advanced search forms. UTRs can be retrieved by several accession IDs, GO terms and MIM identifiers. The advanced form additionally allows the UTR subset to be refined using several criteria, including the number of mapping CAGE tags (for 5′UTRs), the length of the UTR, the number of spanning exons, and the occurrence of UTRsite motifs, conserved elements and miRNA targets. A download facility for selected UTR entries in FASTA format is also available. Further online utilities are UTRscan and UTRblast. UTRscan searches user-submitted sequences for any of the motifs collected in UTRsite. UTRblast allows database searches against any of the UTRdb sections. UTRdb, UTRsite and other related resources are publicly available at http://utrdb.ba.itb.cnr.it/.
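Once a UTR subset has been downloaded in FASTA format, a local length filter and motif scan might look like the sketch below. This is not UTRscan and does not use PatSearch pattern syntax: a plain regular expression stands in for a motif definition, and the AU-rich element core pentamer AUUUA (written ATTTA in DNA notation) is used purely as an example. The function names and the FASTA headers are hypothetical.

```python
import re


def parse_fasta(text):
    """Parse FASTA text into {identifier: sequence}; the identifier is the first header token."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()[0]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def scan_motif(records, pattern, min_length=0):
    """Return {identifier: [1-based start positions]} for sequences of at least `min_length` nt.

    A zero-width lookahead is used so that overlapping occurrences are all reported.
    """
    regex = re.compile(f"(?=({pattern}))")
    hits = {}
    for name, seq in records.items():
        if len(seq) < min_length:
            continue
        positions = [match.start() + 1 for match in regex.finditer(seq)]
        if positions:
            hits[name] = positions
    return hits


if __name__ == "__main__":
    fasta = ">utrA demo entry\nGGATTTATTTAGG\n>utrB\nGGGCCC\n>utrC\nATTTA\n"
    print(scan_motif(parse_fasta(fasta), "ATTTA", min_length=6))
```

Motifs that depend on secondary structure cannot be captured by a regular expression of this kind, which is why a dedicated pattern matcher is needed for many UTRsite entries.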

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

Ministero dell’Istruzione, dell’Università e della Ricerca, Italy: Fondo Italiano Ricerca di Base, Italy: ‘Laboratorio Internazionale di Bioinformatica’ (LIBI); Laboratorio di Bioinformatica per la Biodiversità Molecolare (MBLAB). Funding for open access charge: Ministero dell’Istruzione, Università e Ricerca, Italy.
  24 in total

1.  Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors:  M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal:  Nat Genet       Date:  2000-05       Impact factor: 38.330

2.  Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome.

Authors:  Stefan Washietl; Ivo L Hofacker; Melanie Lukasser; Alexander Hüttenhofer; Peter F Stadler
Journal:  Nat Biotechnol       Date:  2005-11       Impact factor: 54.908

Review 3.  Illuminating the silence: understanding the structure and function of small RNAs.

Authors:  Tariq M Rana
Journal:  Nat Rev Mol Cell Biol       Date:  2007-01       Impact factor: 94.444

4.  Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes.

Authors:  Adam Siepel; Gill Bejerano; Jakob S Pedersen; Angie S Hinrichs; Minmei Hou; Kate Rosenbloom; Hiram Clawson; John Spieth; Ladeana W Hillier; Stephen Richards; George M Weinstock; Richard K Wilson; Richard A Gibbs; W James Kent; Webb Miller; David Haussler
Journal:  Genome Res       Date:  2005-07-15       Impact factor: 9.043

5.  Evaluation of regulatory potential and conservation scores for detecting cis-regulatory modules in aligned mammalian genome sequences.

Authors:  David C King; James Taylor; Laura Elnitski; Francesca Chiaromonte; Webb Miller; Ross C Hardison
Journal:  Genome Res       Date:  2005-07-15       Impact factor: 9.043

Review 6.  Regulatory roles of natural antisense transcripts.

Authors:  Mohammad Ali Faghihi; Claes Wahlestedt
Journal:  Nat Rev Mol Cell Biol       Date:  2009-07-29       Impact factor: 94.444

7.  PatSearch: A program for the detection of patterns and structural motifs in nucleotide sequences.

Authors:  Giorgio Grillo; Flavio Licciulli; Sabino Liuni; Elisabetta Sbisà; Graziano Pesole
Journal:  Nucleic Acids Res       Date:  2003-07-01       Impact factor: 16.971

8.  Identification and classification of conserved RNA secondary structures in the human genome.

Authors:  Jakob Skou Pedersen; Gill Bejerano; Adam Siepel; Kate Rosenbloom; Kerstin Lindblad-Toh; Eric S Lander; Jim Kent; Webb Miller; David Haussler
Journal:  PLoS Comput Biol       Date:  2006-04-21       Impact factor: 4.475

Review 9.  Untranslated regions of mRNAs.

Authors:  Flavio Mignone; Carmela Gissi; Sabino Liuni; Graziano Pesole
Journal:  Genome Biol       Date:  2002-02-28       Impact factor: 13.583

10.  UTRdb and UTRsite: a collection of sequences and regulatory motifs of the untranslated regions of eukaryotic mRNAs.

Authors:  Flavio Mignone; Giorgio Grillo; Flavio Licciulli; Michele Iacono; Sabino Liuni; Paul J Kersey; Jorge Duarte; Cecilia Saccone; Graziano Pesole
Journal:  Nucleic Acids Res       Date:  2005-01-01       Impact factor: 16.971
