
AREsite2: an enhanced database for the comprehensive investigation of AU/GU/U-rich elements.

Jörg Fallmann1, Vitaly Sedlyarov2, Andrea Tanzer3, Pavel Kovarik2, Ivo L Hofacker4.   

Abstract

AREsite2 represents an update for AREsite, an on-line resource for the investigation of AU-rich elements (ARE) in human and mouse mRNA 3'UTR sequences. The new updated and enhanced version allows detailed investigation of AU, GU and U-rich elements (ARE, GRE, URE) in the transcriptome of Homo sapiens, Mus musculus, Danio rerio, Caenorhabditis elegans and Drosophila melanogaster. It contains information on genomic location, genic context, RNA secondary structure context and conservation of annotated motifs. Improvements include annotation of motifs not only in 3'UTRs but in the whole gene body including introns, additional genomes, and locally stable secondary structures from genome wide scans. Furthermore, we include data from CLIP-Seq experiments in order to highlight motifs with validated protein interaction. Additionally, we provide a REST interface for experienced users to interact with the database in a semi-automated manner. The database is publicly available at: http://rna.tbi.univie.ac.at/AREsite.
© The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2015        PMID: 26602692      PMCID: PMC4702876          DOI: 10.1093/nar/gkv1238

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

AU-rich elements (AREs) and GU- or U-rich elements (GREs, UREs) are sequence motifs found in many coding and non-coding RNAs. Upon interaction with RNA-binding proteins (RBPs) they can influence the half-life of RNA molecules. This interaction can induce RNA stabilization or destabilization, mediated by mechanisms that depend on the RBP and the genic context of the motif but are otherwise not fully understood. The most prominent example is AU-rich element mediated decay (AMD), an important mechanism of gene expression regulation (1). However, AMD is not the only process regulating RNA stability that depends on successful RNA-RBP interaction. RBPs also interact with GU-rich elements (GREs) and U-rich elements (UREs), which have likewise been shown to modulate mRNA half-life (2–5). So far, mostly protein coding genes have been shown to be regulated by these mechanisms, and only 3′UTR binding has been shown to regulate mRNA half-life (6). CLIP-Seq (7) was introduced only recently as a method to identify RBP binding sites in a high-throughput manner. CLIP-Seq experiments have identified many novel binding sites for RBPs involved in RNA regulation (see e.g. (4,8–10)), showing significant binding of RBPs in genic regions such as introns or 5′UTRs, with unknown regulatory function. Furthermore, experiments show that binding sites often contain only partial matches to previously annotated motifs, so that a more relaxed view of motif preferences has become necessary. The research community therefore faces novel challenges in investigating RNA-RBP interplay beyond current paradigms. In silico methods play an important role in the identification of (novel) binding sites and the prediction of their regulatory role.
Established databases like ARED (11), GRED (5), AURA (12) or the old AREsite (13) provide the user with information on motif location, accessibility and more, but are not designed to cope with more recent findings and high-throughput requests. The old AREsite focuses solely on 3′UTRs of protein coding genes, while ARED and GRED are very restricted regarding motifs. More than 40 citations and 45 000 visitors underline the need for new comprehensive bioinformatics resources in this research area, now made publicly available with AREsite2 at http://rna.tbi.univie.ac.at/AREsite.

IMPROVEMENTS

AREsite2 accounts for recent developments by extending its analysis approach to the whole gene body, instead of restricting it to 3′UTRs or introns. The choice of region of interest remains with the user. Furthermore, by applying more relaxed motif pattern definitions for annotation than, for example, ARED, we aim at a high coverage of experimentally validated and candidate binding sites relevant for the dynamics and mechanisms of RNA-RBP interaction. Experimentally validated binding sites are a solid basis for the detailed investigation of RNA elements that interact with proteins. To improve our annotation of motifs in this new release, we include binding sites from CLIPdb (14) pre-processed datasets for the prominent RBPs ELAVL1 (HuR), Zfp36 (TTP) and HNRNPD1 (Auf1) where available. Additionally, we will integrate new binding sites from experimental data when they become available, as we did, for example, with data from Mukherjee et al. (10). AREsite was, to our knowledge, the first database to include the local structuredness of ARE motif sites in terms of opening energies and accessibilities. As RNA secondary structure proves important for successful RNA-RBP interactions, we again integrated RNAplfold (15) derived accessibilities in this new release. To further improve this feature, AREsite2 incorporates stable secondary structures that overlap annotated motifs, taken from genome wide scans with RNALfoldZ (16,17). Z-score filtered locally stable RNA secondary structures were predicted for all included genomes, and their visualization is embedded using forna (18). The comprehensive manual literature search of version 1 was automated by interaction with PUBMED via the ENTREZ API. Experienced users who need semi-automatic requests can now retrieve information via a REST interface. Table 1 provides a short comparison of supported features and changes between versions 1 and 2 of AREsite.
Table 1.

Summary of features in AREsite and AREsite2, respectively

Feature                            AREsite   AREsite2
Genic features
  3′UTRs                           Yes       Yes
  5′UTRs                           -         Yes
  CDS                              -         Yes
  Introns                          -         Yes
  mRNAs                            Yes       Yes
  Non-coding RNAs                  -         Yes
Species
  H. sapiens                       Yes       Yes
  M. musculus                      Yes       Yes
  D. rerio                         -         Yes
  D. melanogaster                  -         Yes
  C. elegans                       -         Yes
Motif features
  AREs                             Yes       Yes
  UREs/GREs                        -         Yes
  Motif accessibility              Yes       Yes
  Secondary structures in overlap  -         Yes
  Conservation information         Yes       Yes
Result download                    Yes       Yes
Database dump                      -         Yes
Related literature                 Yes       Yes
REST interface                     -         Yes
Experimental evidence              -         Yes

Table 1 highlights differences between AREsite and AREsite2.

Furthermore, the backend was changed to a relational database system. This allows users to retrieve dumps of the whole database and eases maintenance and updates of the database with new experimental results, annotations and species.

Genomes and annotation

The following genomes were used for annotation of motifs and secondary structure prediction:

H. sapiens, hg38: GRCh38.p2 (Genome Reference Consortium Human Build 38), INSDC Assembly GCA_000001405.17, December 2013
M. musculus, mm10: GRCm38.p3 (Genome Reference Consortium Mouse Reference 38), INSDC Assembly GCA_000001635.5, January 2012
D. rerio, zv9: Zv9 (The Danio rerio Sequencing Project assembly Zv9), INSDC Assembly GCA_000002035.2, April 2010
D. melanogaster, BDGP6: Berkeley Drosophila Genome Project (BDGP) assembly release 6, July 2014
C. elegans, WBcel235: WS245 release of WormBase (which includes the WBcel235 version of the C. elegans reference genome), INSDC Assembly GCA_000002985.3, December 2012

Gene and transcript annotation for all genomes was retrieved from ENSEMBL (19) version 79 via the ENSEMBL Perl API. AREsite2 contains A/G/URE annotations for ∼60 000 genes in H. sapiens, ∼43 000 genes in M. musculus, ∼35 000 genes in D. melanogaster, ∼17 000 genes in D. rerio and ∼47 000 genes in C. elegans, multiplying the information content compared to version 1.

Motifs

While the previous release of AREsite includes only motifs ranging from the ARE core motif ATTTA to its extended 13-mer version WWWWATTTAWWWW, recent experiments (4,8–10) have shown that this is not enough to cover the broad variation of RBP target motifs. With this new release we cover a far broader spectrum of AU/G/U-rich motifs. Because we no longer focus on 3′UTR regions only, but include the whole gene body as well as non-protein coding genes, the database has grown significantly in size. However, this vast increase in annotated motifs also means that more motifs without (known) regulatory function are now included in the database. To cope with this and make the results more informative, we decided to integrate experimentally validated target sites of TTP, HuR and Auf1, the most prominent RBPs involved in mRNA half-life regulation, and to highlight them for the end user. To that end we used Bedtools (20) to extract intersections of annotated motifs with experimental results derived from CLIPdb or directly from the source (e.g. (10)). Motifs that overlap a CLIP signal are color coded in the output page (TTP red, Auf1 blue, HuR green, multiple bright red, no overlap gray). Motifs annotated for the gene of interest are collected in a sortable table that can be downloaded in bed, xlsx or pdf format. If overlaps with experimental data were detected, links to the corresponding dataset are provided.
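The motif annotation and color coding described above can be illustrated with a minimal Python sketch. It is not the AREsite2 implementation (which uses Bedtools on genome-wide data); the function names, the interval convention (0-based, half-open) and the toy CLIP sites are assumptions made for illustration. The IUPAC code W is expanded to A or T, and a lookahead is used so that overlapping motif occurrences are all reported.

```python
import re

# IUPAC code W stands for A or T; DNA alphabet as used in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]"}

# Color scheme as listed in the text for single-RBP overlaps.
PALETTE = {"TTP": "red", "Auf1": "blue", "HuR": "green"}


def motif_to_regex(motif):
    """Translate an IUPAC motif such as WWTTTWW into a lookahead regex,
    so that overlapping occurrences are all found."""
    body = "".join(IUPAC[base] for base in motif)
    return re.compile("(?=(" + body + "))")


def find_motifs(sequence, motif):
    """Return 0-based, half-open (start, end) intervals of every match."""
    pattern = motif_to_regex(motif)
    return [(m.start(1), m.end(1)) for m in pattern.finditer(sequence.upper())]


def overlaps(a, b):
    """True if two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def color_for(hit, clip_sites):
    """Pick a display color from the RBPs whose CLIP sites overlap the hit.
    clip_sites is a list of (rbp_name, (start, end)) tuples (hypothetical)."""
    bound = {rbp for rbp, site in clip_sites if overlaps(hit, site)}
    if not bound:
        return "gray"
    if len(bound) > 1:
        return "bright red"
    return PALETTE[bound.pop()]


if __name__ == "__main__":
    seq = "GGATTTATTTAGG"
    clip = [("HuR", (0, 4)), ("TTP", (9, 12))]  # toy CLIP intervals
    for hit in find_motifs(seq, "ATTTA"):
        print(hit, color_for(hit, clip))
```

In this toy sequence the two ATTTA occurrences share one nucleotide, which is why a plain non-overlapping regex search would miss the second one.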

Structural context

The secondary structure of an RNA molecule influences the binding probability of RBPs. Most RBPs, for example, are known to prefer single-stranded RNA for interaction. Thus, we applied RNAplfold to predict the probabilities of being unpaired for stretches of ±20 nt around annotated motifs. As in version 1 of AREsite, the results of this analysis are rendered as downloadable SVGs and help to check how accessible motifs of interest are for RBPs. Furthermore, we integrated the results of genome wide RNALfoldZ screens for locally stable RNA secondary structures. Overlaps of annotated motifs with Z-score filtered stable structures were determined for all included genomes and are part of the output. If overlaps are found, the user can investigate the structure via a linkout to forna (18).
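As a rough illustration of how an unpaired probability relates to an opening energy, the following Python sketch applies the standard thermodynamic relation ΔG_open = -RT ln(P_unpaired) and computes the ±20 nt window around a motif, clipped to the sequence ends. The function names, the gas constant rounding and the 37 °C default are assumptions for this sketch; RNAplfold itself is not called here.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal / (mol * K)


def opening_energy(p_unpaired, temperature_c=37.0):
    """Free energy (kcal/mol) needed to make a region single-stranded,
    given the probability that it is unpaired: dG = -RT * ln(p)."""
    if not 0.0 < p_unpaired <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    rt = GAS_CONSTANT * (temperature_c + 273.15)
    return -rt * math.log(p_unpaired)


def flank_window(start, end, seq_len, flank=20):
    """Extend a 0-based, half-open motif interval by `flank` nucleotides
    on each side, clipped to the bounds of the sequence."""
    return max(0, start - flank), min(seq_len, end + flank)


if __name__ == "__main__":
    # A motif that is unpaired half of the time costs about 0.43 kcal/mol to open.
    print(round(opening_energy(0.5), 3))
    print(flank_window(5, 10, 100))
```

A fully accessible motif (probability 1) has an opening energy of zero, and the energy grows as the motif becomes more likely to be buried in structure.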

Conservation

Information on the conservation of the region of interest is provided at two levels. At the level of the gene of interest, we plot ENSEMBL (19) GERP (21) conservation scores for the whole gene body where available. At the level of individual motifs, we provide multiple sequence alignments of annotated motifs, retrieved from ENSEMBL genomic alignments where available, to visualize conservation on a per-motif scale.

Literature

The ENTREZ API makes it possible to programmatically fetch publications from PUBMED for a given search string. This allows us to retrieve publications for each gene of interest in the context of A/U/GRE motifs and binding proteins, respectively. The main advantage of automatically retrieved publications is that we stay up to date with PUBMED. For convenience and transparency, the user can follow a link to PUBMED that contains the search string used, in order to repeat the query manually.
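A minimal sketch of such a programmatic query, using the public NCBI E-utilities `esearch` endpoint with its `db`, `term`, `retmax`, `sort` and `retmode` parameters, is shown below. The structure of the search string and the function name are assumptions for illustration; the search string AREsite2 actually uses is the one printed on its results page. The sketch only builds the request URL and does not contact the server.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(gene, context_terms, max_hits=20):
    """Build an esearch URL for publications that mention a gene together
    with at least one context term (motif or binding protein name).
    The query layout is a hypothetical example, not AREsite2's own."""
    quoted = " OR ".join('"{}"'.format(term) for term in context_terms)
    term = "{}[Title/Abstract] AND ({})".format(gene, quoted)
    params = {
        "db": "pubmed",      # search the PubMed database
        "term": term,        # the search string
        "retmax": max_hits,  # maximum number of PMIDs returned
        "sort": "pub_date",  # newest publications first
        "retmode": "json",   # ask for a JSON response
    }
    return ESEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_pubmed_query("CXCL2", ["AU-rich element", "HuR"], max_hits=5))
```

The returned URL can be opened in a browser or passed to any HTTP client; NCBI asks automated clients to respect its request rate limits.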

Statistics

At http://rna.tbi.univie.ac.at/AREsite/statistics, we provide an interface for the user to request the number of genes containing at least one motif of interest in their gene body. The generated bar plot illustrates how many genes contain the selected motif in either intronic or exonic parts of 3′UTR, 5′UTR, CDS and total.
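The kind of summary behind such a bar plot can be sketched in a few lines of Python. The record layout (gene, region, motif) and the region labels are hypothetical and chosen only for illustration; the real statistics are computed from the AREsite2 database.

```python
from collections import defaultdict


def genes_with_motif(annotations, motif):
    """Count distinct genes per genic region that contain the motif at
    least once. `annotations` is an iterable of (gene, region, motif)
    tuples; a gene is counted once per region and once in 'total'."""
    genes = defaultdict(set)
    for gene, region, found in annotations:
        if found == motif:
            genes[region].add(gene)
            genes["total"].add(gene)
    return {region: len(ids) for region, ids in genes.items()}


if __name__ == "__main__":
    toy = [
        ("geneA", "3UTR", "ATTTA"),
        ("geneA", "intron", "ATTTA"),
        ("geneB", "3UTR", "ATTTA"),
        ("geneB", "CDS", "GTTTG"),
        ("geneA", "3UTR", "ATTTA"),  # second hit in the same gene and region
    ]
    print(genes_with_motif(toy, "ATTTA"))
```

Using sets means that several hits of the same motif within one gene do not inflate the gene count.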

RESULTS

This section explains example output from AREsite2 for the gene Cxcl2 in Homo sapiens. If a search for the motifs ATTTA, WWTTTWW, GTTTG, TTTGTTT and AWTAAA is started, the matching database entries are presented to the user as SVG plots and HTML5 tables. For visualization we use the Gviz package for R (R Core Team, 2015). The output begins with information on the genomic location of the searched gene. Figure 1A presents the ideogram of hg38 chromosome 4 with the position of Cxcl2 highlighted. Figure 1B visualizes the gene body and known transcripts of Cxcl2 as annotated by ENSEMBL. Annotated motifs are highlighted in Figure 1C and are color coded if overlapping experimental data are available (see section Motifs). All of these figures contain a link to the ENSEMBL genome browser, where selected motifs are made available as custom tracks. ENSEMBL (19) GERP (21) conservation scores for the whole gene body are visualized in Figure 1D where available.
Figure 1.

(A) Ideogram of hg38 chromosome 4; the location of Cxcl2 is highlighted. (B) ENSEMBL (19) annotated known transcripts for Cxcl2. Exons are shown as boxes and introns as lines. The genome axis plot above indicates the orientation of the gene and its genomic location. (C) Together with Figure 1B, this plot highlights the genic location of annotated motifs and shows overlaps with experimental data in color code (see section Motifs). (D) GERP (21) conservation scores of the gene of interest are plotted if available.

The search for more sequence patterns and the parsing of the whole gene body lead to an increase in predicted motifs. Table 2 shows a comparison of genes per genome containing at least one core ARE (AUUUA), GRE (GUUUG) or URE (UUUUU). To cope with these large numbers and to help users filter potentially interesting candidates, we provide the second part of the results section. The first table (Figure 2A) provides information on the genomic and genic location of an annotated motif, as well as experimental evidence for RBP interaction, if available. The next table (Figure 2B) shows the accessibility of motifs or their occupation by overlapping stable secondary structures. Detailed conservation information for each motif is available as a multiple sequence alignment in the third table (Figure 2C). The final table lists the results of the literature search, sorted by newest publications (Figure 2D). All tables are searchable, and their content can be downloaded by the user.
Table 2.

Genes with annotated A/U/GRE in AREsite2

                   Genes with ARE     Genes with URE     Genes with GRE
Genome             Exon    Intron     Exon    Intron     Exon    Intron
H. sapiens         31k     30k        24k     17k        24k     17k
M. musculus        24k     23k        18k     13k        18k     13k
D. rerio           17k     20k        10k     10k        11k     10k
D. melanogaster    13k     9k         8k      6k         8k      5k
C. elegans         19k     17k        16k     10k        13k     9k

Table 2 lists the number of genes with at least one ARE (AUUUA), GRE (GUUUG) and URE (UUUUU) in AREsite2 for all available genomes

Figure 2.

(A) Results table containing motifs of interest, their genic location and experimental evidence for RBP interaction if available. (B) Accessibility plot for a motif of interest, showing short- and mid-range base pair probabilities. The user has the option to investigate different settings of base pair distances (default 5 nt). (C) Multiple sequence alignment of an annotated motif; the motif region is shown in red. (D) Results of the PUBMED literature search via the ENTREZ API. The search string used is printed for an easy manual copy-and-paste literature search in the PUBMED interface.


CONCLUSIONS AND PERSPECTIVES

AREsite2 presents a major update to AREsite, including three additional genomes and a large number of newly annotated motifs. Furthermore, the new backend allows for easier integration of more genomes, other motifs, and experimental and structure data. We provide the whole database as a MySQL dump and all annotated motifs in bed, bed12 and gtf format for download. The RESTful service makes it easy for advanced users to retrieve information in a semi-automatic manner without the need to download any of these files. An example script for that purpose is included in the supplementary data; the most recent version can be downloaded directly from the website. We aim to integrate more experimental data as soon as they become available, either through CLIPdb or directly from the source if feasible.

AVAILABILITY

The database is publicly available at: http://rna.tbi.univie.ac.at/AREsite. An example script for interaction with the REST interface, a database dump and motif annotation as bed, bed12 and gtf files are available at: http://rna.tbi.univie.ac.at/AREsite/bulk.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR online.
  20 in total

1.  Prediction of locally stable RNA secondary structures for genome-wide surveys.

Authors:  I L Hofacker; B Priwitzer; P F Stadler
Journal:  Bioinformatics       Date:  2004-01-22       Impact factor: 6.937

Review 2.  AU-rich elements and the control of gene expression through regulated mRNA stability.

Authors:  Timothy J Gingerich; Jean-Jacques Feige; Jonathan LaMarre
Journal:  Anim Health Res Rev       Date:  2004-06       Impact factor: 2.615

3.  Distribution and intensity of constraint in mammalian genomic sequence.

Authors:  Gregory M Cooper; Eric A Stone; George Asimenos; Eric D Green; Serafim Batzoglou; Arend Sidow
Journal:  Genome Res       Date:  2005-06-17       Impact factor: 9.043

4.  ARE-mRNA degradation requires the 5'-3' decay pathway.

Authors:  Georg Stoecklin; Thomas Mayo; Paul Anderson
Journal:  EMBO Rep       Date:  2006-01       Impact factor: 8.807

5.  BEDTools: a flexible suite of utilities for comparing genomic features.

Authors:  Aaron R Quinlan; Ira M Hall
Journal:  Bioinformatics       Date:  2010-01-28       Impact factor: 6.937

6.  Conserved GU-rich elements mediate mRNA decay by binding to CUG-binding protein 1.

Authors:  Irina A Vlasova; Nuzha M Tahoe; Danhua Fan; Ola Larsson; Bernd Rattenbacher; Julius R Sternjohn; Jayprakash Vasdewani; George Karypis; Cavan S Reilly; Peter B Bitterman; Paul R Bohjanen
Journal:  Mol Cell       Date:  2008-02-01       Impact factor: 17.970

7.  Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP.

Authors:  Markus Hafner; Markus Landthaler; Lukas Burger; Mohsen Khorshid; Jean Hausser; Philipp Berninger; Andrea Rothballer; Manuel Ascano; Anna-Carina Jungkamp; Mathias Munschauer; Alexander Ulrich; Greg S Wardle; Scott Dewell; Mihaela Zavolan; Thomas Tuschl
Journal:  Cell       Date:  2010-04-02       Impact factor: 41.582

8.  RNA Accessibility in cubic time.

Authors:  Stephan H Bernhart; Ullrike Mückstein; Ivo L Hofacker
Journal:  Algorithms Mol Biol       Date:  2011-03-09       Impact factor: 1.405

9.  AREsite: a database for the comprehensive investigation of AU-rich elements.

Authors:  Andreas R Gruber; Jörg Fallmann; Franz Kratochvill; Pavel Kovarik; Ivo L Hofacker
Journal:  Nucleic Acids Res       Date:  2010-11-11       Impact factor: 16.971

10.  ARED 3.0: the large and diverse AU-rich transcriptome.

Authors:  Tala Bakheet; Bryan R G Williams; Khalid S A Khabar
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971
