AREsite: a database for the comprehensive investigation of AU-rich elements.

Andreas R Gruber, Jörg Fallmann, Franz Kratochvill, Pavel Kovarik, Ivo L Hofacker.

Abstract

AREsite is an online resource for the detailed investigation of AU-rich elements (ARE) in vertebrate mRNA 3'-untranslated regions (UTRs). AREs are among the most prominent cis-acting regulatory elements found in 3'-UTRs of mRNAs. Various ARE-binding proteins that possess RNA stabilizing or destabilizing functions are recruited by sequence-specific motifs. Recent findings suggest an essential role of the structural mRNA context in which these sequence motifs are embedded. AREsite is the first database that allows users to quantify the structuredness of ARE motif sites in terms of opening energies and accessibility probabilities. Moreover, we provide a detailed phylogenetic analysis of ARE motifs and incorporate information about experimentally validated targets of the ARE-binding proteins TTP, HuR and Auf1. The database is publicly available at: http://rna.tbi.univie.ac.at/AREsite.

Year: 2010 PMID: 21071424 PMCID: PMC3013810 DOI: 10.1093/nar/gkq990

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

AU-rich elements (AREs) are distinct sequence elements in the 3′-untranslated region (UTR) of mRNAs, often consisting of one or several AUUUA pentamers located in an adenosine- and uridine-rich region (1). Numerous proteins directly interact with AREs, thereby modulating mRNA stability or translational efficiency. The importance of these sequence motifs has been highlighted recently by a multitude of studies pointing out that the loss of ARE-mediated mRNA control leads to severe pathologies, as AREs affect gene expression on a global scale (2–7). AREs were studied bioinformatically early on (8), and today’s estimate is that ∼7% of the human protein-coding genes contain AREs (9). However, the presence of an ARE consensus motif alone is not enough to qualify a gene as a true in vivo target of ARE-binding proteins. Recent computational and experimental evidence (10–13) and the fact that ARE-targeting proteins bind to RNA in single-stranded conformation (14) emphasize the need to analyze the structural context in which these motifs are embedded. Furthermore, the mounting comparative genomics data available can be harnessed to identify evolutionarily conserved motif sites. AREsite is the first database that combines sequence annotation of AREs with the prediction of the accessibility and evolutionary conservation of the motif site. In addition to these features, we incorporated information from an extensive expert literature search and list experimentally validated targets of the ARE-binding proteins TTP, HuR and Auf1.

DATABASE GENERATION AND CONTENT

In its current version, AREsite uses Ensembl release 56 as its data source. For human and mouse, any protein-coding gene that has at least one transcript with a 3′-UTR sequence has been added to the collection. To account for the various definitions of AREs found in the literature, we decided not to restrict the database to a single motif, but to offer the user the possibility to screen for a total of eight different consensus motifs, ranging from the plain AUUUA pentamer to the WWWWAUUUAWWWW 13-mer, which resembles the core motif embedded in a stretch of A/U residues. By default, only the representative transcript of the selected gene, which we define as the transcript with the most AUUUA occurrences in its 3′-UTR sequence, is analyzed in detail. For each transcript we list sequence statistics and calculate the fold enrichment of each motif based on an order-0 and an order-1 Markov model. Besides plain sequence annotation of ARE motifs in transcripts, AREsite also allows the researcher to study the sequence conservation of motifs at both the transcript and the genomic level. For each motif site we provide annotated alignments with highlighted conserved motifs and sequence logos (15). Finally, an overview figure in the form of a phylogenetic tree depicts the conservation pattern of all detected motif sites. Motif site accessibility, in terms of opening energies and probabilities of being unpaired, is calculated using RNAplfold (16,17). For each motif we present accessibility values for the core AUUUA pentamer. Furthermore, results are visualized in an interactive SVG plot that allows the user to explore different parameter settings (Figure 1).
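The motif screen and the order-0 fold enrichment described above can be illustrated with a minimal Python sketch. It is not the AREsite implementation: the function names are hypothetical, the IUPAC table covers only the symbols that appear in the motifs named in the text, and the expected count assumes independent nucleotides drawn with the base frequencies of the UTR itself. The order-1 model is not shown.

```python
import re
from collections import Counter

# IUPAC symbols needed for the motifs named in the text; W stands for A or U.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "W": "[AU]"}


def motif_regex(consensus):
    """Compile a consensus motif into a regex that also reports overlapping sites."""
    body = "".join(IUPAC[c] for c in consensus)
    # The lookahead lets consecutive, overlapping AUUUA pentamers all match.
    return re.compile("(?=(" + body + "))")


def find_sites(utr, consensus):
    """Return 0-based start positions of all motif sites (DNA input is accepted)."""
    rna = utr.upper().replace("T", "U")
    return [m.start() for m in motif_regex(consensus).finditer(rna)]


def fold_enrichment_order0(utr, consensus):
    """Observed site count divided by the count expected under an order-0 model."""
    rna = utr.upper().replace("T", "U")
    freq = Counter(rna)
    n = len(rna)
    p = 1.0
    for c in consensus:
        # Probability that one random position matches this motif symbol.
        p *= sum(freq[b] for b in IUPAC[c].strip("[]")) / n
    expected = p * (n - len(consensus) + 1)
    observed = len(find_sites(rna, consensus))
    return observed / expected if expected > 0 else float("nan")
```

Under these assumptions, `find_sites("GGAUUUAUUUAGG", "AUUUA")` reports both overlapping pentamers, and an enrichment value above 1 indicates that the motif occurs more often than the base composition alone would predict.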

Figure 1.

Screenshot of the interactive SVG plot showing an ARE motif site of the human TNF-alpha gene. TNF-alpha is one of the best characterized ARE-containing genes. Its ARE target site consists of several consecutive ATTTA (AUUUA) motifs, which favors the site’s accessibility. When using an SVG-capable web browser, the user can explore the target site and flanking nucleotides with different parameter settings. With default settings (u = 5), the plot shows for each nucleotide i the energy that is needed to open local secondary structure for a stretch of five nucleotides (5′–3′) ending at position i.

For the three best studied ARE-binding proteins, TTP, HuR and Auf1, the literature was screened for putative or confirmed mRNA targets. We classified the type of evidence for an mRNA being targeted by one of the three proteins by five criteria: (i) direct binding of the protein to the mRNA or its 3′-UTR (e.g. using RNA immunoprecipitation or electrophoretic mobility shift assays); (ii) an independent reporter assay confirming the functionality of the putative binding site; (iii) the loss or overexpression of the ARE-binding protein affects the mRNA level and/or (iv) the protein level of the target mRNA; (v) the stability of the target mRNA is affected by the lack or excess of the ARE-binding protein, as shown by actinomycin D chase experiments or cell-free decay assays. New references will be added on a regular basis. Figure 2 shows a typical output of an AREsite query. If the user aims for permanent storage of the search results, annotated GenBank files can be downloaded for each analyzed transcript.

Figure 2.

Snapshot of a typical AREsite results page (gene: human IL6). (A) Basic statistics about the selected gene. (B) Experimental evidence collected for this gene. For each of the ARE-binding proteins TTP, HuR and Auf1 we list the type of evidence. The user can choose to see the supporting publications, which are directly linked to PubMed. (C) Overview figure that shows all known transcripts of the selected gene and highlights detected ARE motifs in the 3′-UTRs. The representative transcript, which is analyzed in detail, is shown in a gray box. (D) Detailed summary of the analysis results for the representative transcript. For each motif site the user can choose to display accessibility plots as well as genomic and transcript alignments together with sequence logos. (E) Overview figure of the conservation analysis. Black circles (genomic alignments) and boxes (transcript alignments) indicate that the corresponding ARE motif was also detected in the sequence of the corresponding species.

Generation of alignments from transcripts

Alignments of orthologous transcripts were generated using data from the Ensembl gene orthology pipeline. For each gene entry in the database, we first collected all orthologous genes from other species that have a strict one-to-one relation. Next, we screened for transcripts that have an annotated 3′-UTR, and among those we selected the one that showed the best coverage (at least 75%) of the reference species 3′-UTR. Multiple-species whole-transcript alignments were then generated with CLUSTAL W. To investigate the sequence conservation of the motif site, we extracted the region containing the motif site plus five flanking nucleotides on each side from the alignments. Each alignment sequence was then searched with the corresponding consensus ARE motif. Finally, detected motifs were used as sequence anchors and the sequences were realigned using DIALIGN (18). The same procedure was also applied to the processed and filtered genomic alignments.
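The extraction step, cutting the motif site plus five flanking nucleotides out of a multiple alignment and then searching each species for the motif, can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used for AREsite: the alignment is assumed to be a plain dictionary of equally long gapped strings, flanks are counted in reference nucleotides, and all function names are hypothetical. The DIALIGN realignment step is not shown.

```python
import re


def column_of(aligned_ref, pos):
    """Map a 0-based ungapped reference position to its alignment column."""
    seen = -1
    for col, ch in enumerate(aligned_ref):
        if ch != "-":
            seen += 1
            if seen == pos:
                return col
    raise ValueError("position lies outside the reference sequence")


def extract_site(alignment, ref_id, start, length, flank=5):
    """Cut out the motif site plus `flank` reference nucleotides on each side."""
    ref = alignment[ref_id]
    ref_len = sum(ch != "-" for ch in ref)
    # Clamp the flanks so that sites near the sequence ends still work.
    first = max(0, start - flank)
    last = min(ref_len - 1, start + length - 1 + flank)
    lo, hi = column_of(ref, first), column_of(ref, last)
    return {name: seq[lo:hi + 1] for name, seq in alignment.items()}


def species_with_motif(site_alignment, pattern):
    """Names of sequences whose ungapped site region still contains the motif."""
    return sorted(
        name
        for name, seq in site_alignment.items()
        if re.search(pattern, seq.replace("-", ""))
    )
```

For example, with a two-species alignment and a pentamer starting at ungapped reference position 4, `extract_site(aln, "human", 4, 5)` returns the same column range for every species, so gaps in the orthologous sequences are preserved for the later realignment.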

Generation of genomic alignments

Since comparative data at the level of transcripts are still limited, we decided to also incorporate data from genome-wide alignments to get a more refined picture of the conservation pattern of motifs. These data have to be interpreted with caution, however, since there is no guarantee that the aligned sequences from other species really belong to the gene of interest. We therefore apply filtering strategies that ensure that aligned sequences are homologous over a longer stretch of nucleotides than just the motif site. Genomic alignments in MAF format were obtained for each UTR sequence from multiz-generated alignments available at the UCSC genome browser (19). For human, corresponding alignments were extracted from 46-species multiple alignments based on the human genome assembly hg19, and for mouse from 30-species multiple alignments based on the assembly mm9. The obtained alignment blocks were often too short for any practical use, so we developed a MAF processing and filtering pipeline that first merges adjacent MAF blocks into longer ones and then returns alignment windows of 120 nt with a step size of 30 nt. Finally, these windowed alignments were realigned with CLUSTAL W and filtered to contain only sequences that have a length of at least 50% of the sequence length of the reference species.
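The windowing and length filtering can be sketched in a few lines of Python. This is only an illustration of the two steps: merging of adjacent MAF blocks and the CLUSTAL W realignment are not shown, the function names are hypothetical, and two details are assumptions that the text does not specify, namely that a trailing remainder shorter than one step is dropped and that sequence length is measured without gap characters.

```python
def windows(aln_len, size=120, step=30):
    """Yield (start, end) column ranges for overlapping alignment windows."""
    if aln_len <= size:
        # Short alignments are returned as a single window (assumption).
        yield (0, aln_len)
        return
    for start in range(0, aln_len - size + 1, step):
        yield (start, start + size)


def filter_by_length(window_aln, ref_id, min_fraction=0.5):
    """Keep sequences whose ungapped length reaches min_fraction of the reference's."""
    ungapped = {name: len(seq.replace("-", "")) for name, seq in window_aln.items()}
    threshold = min_fraction * ungapped[ref_id]
    return {
        name: seq for name, seq in window_aln.items() if ungapped[name] >= threshold
    }
```

With the default settings, a merged block of 200 columns yields the windows (0, 120), (30, 150) and (60, 180), and each window is then reduced to the species that cover at least half of the reference sequence.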

Quantifying motif site accessibility

To calculate motif site accessibility in terms of opening energies and probabilities of being unpaired, we used RNAplfold (16) with different parameter settings. RNAplfold is a thermodynamic RNA folding program that calculates local base-pairing probabilities, as well as the probability that a stretch of u consecutive nucleotides is unpaired (17). These probabilities are directly related to the energy needed to open all secondary structures in the respective stretch of nucleotides. The parameter set W = 80, L = 40 models the effects of cotranscriptional folding and has previously been used to predict siRNA binding (20). AREsite also features a second parameter setting (W = 240, L = 120), which considers longer base pair spans and shows improved results on siRNA binding as well as on RNA–RNA interaction (H. Tafer, personal communication). For each detected motif site we list the accessibility values (u = 5) for the core AUUUA pentamer for both parameter settings (short range and mid range).
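The relation between the two reported quantities is the usual Boltzmann one: the opening energy of a stretch equals −RT ln P, where P is the probability that the stretch is unpaired. The following sketch converts one into the other and looks up the value for a pentamer. The folding temperature of 37 °C, the dictionary layout and the function names are assumptions made for illustration; the indexing by the end position of the stretch follows the convention described in the caption of Figure 1.

```python
import math

GAS_CONSTANT = 1.98717e-3  # kcal/(mol*K)
TEMPERATURE = 310.15       # K, i.e. 37 degrees C (assumed folding temperature)


def opening_energy(p_unpaired, rt=GAS_CONSTANT * TEMPERATURE):
    """Energy in kcal/mol needed to open a stretch, from its unpaired probability."""
    if not 0.0 < p_unpaired <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -rt * math.log(p_unpaired)


def pentamer_accessibility(probs_by_end, motif_start, u=5):
    """Look up the probability for a stretch of u nucleotides starting at motif_start.

    Positions are 1-based; probs_by_end maps the END position i of each
    stretch of length u to the probability that the stretch is unpaired.
    """
    return probs_by_end[motif_start + u - 1]
```

A fully accessible site (P = 1) has an opening energy of zero, and the energy grows as the probability of being unpaired falls, which is why the two measures rank motif sites in the same order.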

DISCUSSION

In this contribution we have introduced AREsite, a database for the detailed investigation of ARE motifs in terms of motif site accessibility and evolutionary conservation. In its current state, AREsite reports 3275 human protein-coding genes that have at least one occurrence of the consensus motif WUAUUUAUW in their 3′-UTR sequences. This corresponds to ∼16% of the human protein-coding genes. For 711 of those genes, AREsite lists experimental evidence that they are targets of ARE-binding proteins. The requirements that qualify a gene as an in vivo target of ARE-binding proteins are still poorly understood. With its conservation pattern analysis and accessibility prediction, AREsite can help researchers to unravel the underlying mechanism. Recent studies (11,13) demonstrate the great value of combining computational accessibility prediction and wet-lab data. When interpreting accessibility predictions one has to keep in mind, however, that low accessibility does not necessarily exclude a gene from being an in vivo target. mRNA regulation is a complex system, and the binding of one factor might lead to structural rearrangements that make a formerly cryptic site accessible or vice versa (21). In the context of AREs, this concept has been demonstrated by using artificially designed mRNA openers and closers to control mRNA stability (22). The accurate modeling of these combinatorial effects will be among the most challenging issues for future work.

FUNDING

University of Vienna “Research platform: Structural and Functional Analysis of mRNA Molecules Targeted by the RNA-binding Protein Tristetraprolin” (to P.K. and I.L.H.). Funding for open access charge: University of Vienna.

Conflict of interest statement. None declared.

22 in total

1. ARED: human AU-rich element-containing mRNA database reveals an unexpectedly diverse functional repertoire of encoded proteins.

Authors: T Bakheet; M Frevel; B R Williams; W Greer; K S Khabar
Journal: Nucleic Acids Res Date: 2001-01-01 Impact factor: 16.971

2. Predicting in vivo binding sites of RNA-binding proteins using mRNA secondary structure.

Authors: Xiao Li; Gerald Quon; Howard D Lipshitz; Quaid Morris
Journal: RNA Date: 2010-04-23 Impact factor: 4.942

3. Computational prediction of RNA structural motifs involved in posttranscriptional regulatory processes.

Authors: Michal Rabani; Michael Kertesz; Eran Segal
Journal: Proc Natl Acad Sci U S A Date: 2008-09-24 Impact factor: 11.205

4. The impact of target site accessibility on the design of effective siRNAs.

Authors: Hakim Tafer; Stefan L Ameres; Gregor Obernosterer; Christoph A Gebeshuber; Renée Schroeder; Javier Martinez; Ivo L Hofacker
Journal: Nat Biotechnol Date: 2008-04-27 Impact factor: 54.908

5. Deletion of the RNA-binding proteins ZFP36L1 and ZFP36L2 leads to perturbed thymic development and T lymphoblastic leukemia.

Authors: Daniel J Hodson; Michelle L Janas; Alison Galloway; Sarah E Bell; Simon Andrews; Cheuk M Li; Richard Pannell; Christian W Siebel; H Robson MacDonald; Kim De Keersmaecker; Adolfo A Ferrando; Gerald Grutz; Martin Turner
Journal: Nat Immunol Date: 2010-07-11 Impact factor: 25.606

6. A Pumilio-induced RNA structure switch in p27-3' UTR controls miR-221 and miR-222 accessibility.

Authors: Martijn Kedde; Marieke van Kouwenhove; Wilbert Zwart; Joachim A F Oude Vrielink; Ran Elkon; Reuven Agami
Journal: Nat Cell Biol Date: 2010-09-05 Impact factor: 28.824

7. RNAcontext: a new method for learning the sequence and structure binding preferences of RNA-binding proteins.

Authors: Hilal Kazan; Debashish Ray; Esther T Chan; Timothy R Hughes; Quaid Morris
Journal: PLoS Comput Biol Date: 2010-07-01 Impact factor: 4.475

8. Essential role of the RNA-binding protein HuR in progenitor cell survival in mice.

Authors: Mallika Ghosh; Hector Leonardo Aguila; Jason Michaud; Youxi Ai; Ming-Tao Wu; Annabrita Hemmes; Ari Ristimaki; Caiying Guo; Henry Furneaux; Timothy Hla
Journal: J Clin Invest Date: 2009-11-02 Impact factor: 14.808

9. The RNA-binding protein Elavl1/HuR is essential for placental branching morphogenesis and embryonic development.

Authors: Vicky Katsanou; Stavros Milatos; Anthie Yiakouvaki; Nikos Sgantzis; Anastasia Kotsoni; Maria Alexiou; Vaggelis Harokopos; Vassilis Aidinis; Myriam Hemberger; Dimitris L Kontoyiannis
Journal: Mol Cell Biol Date: 2009-03-23 Impact factor: 4.272

10. The UCSC Genome Browser database: update 2010.

Authors: Brooke Rhead; Donna Karolchik; Robert M Kuhn; Angie S Hinrichs; Ann S Zweig; Pauline A Fujita; Mark Diekhans; Kayla E Smith; Kate R Rosenbloom; Brian J Raney; Andy Pohl; Michael Pheasant; Laurence R Meyer; Katrina Learned; Fan Hsu; Jennifer Hillman-Jackson; Rachel A Harte; Belinda Giardine; Timothy R Dreszer; Hiram Clawson; Galt P Barber; David Haussler; W James Kent
Journal: Nucleic Acids Res Date: 2009-11-11 Impact factor: 16.971
