catRAPID omics: a web server for large-scale prediction of protein-RNA interactions.

Federico Agostini, Andreas Zanzoni, Petr Klus, Domenica Marchese, Davide Cirillo, Gian Gaetano Tartaglia.

Abstract

SUMMARY: Here we introduce catRAPID omics, a server for large-scale calculations of protein-RNA interactions. Our web server allows (i) predictions at proteomic and transcriptomic level; (ii) use of protein and RNA sequences without size restriction; (iii) analysis of nucleic acid binding regions in proteins; and (iv) detection of RNA motifs involved in protein recognition.
RESULTS: We developed a web server to allow fast calculation of ribonucleoprotein associations in Caenorhabditis elegans, Danio rerio, Drosophila melanogaster, Homo sapiens, Mus musculus, Rattus norvegicus, Saccharomyces cerevisiae and Xenopus tropicalis (custom libraries can also be generated). catRAPID omics was benchmarked on the recently published RNA interactomes of Serine/arginine-rich splicing factor 1 (SRSF1), Histone-lysine N-methyltransferase EZH2 (EZH2), TAR DNA-binding protein 43 (TDP43) and RNA-binding protein FUS (FUS), as well as on the protein interactomes of U1/U2 small nuclear RNAs, X inactive specific transcript (Xist) repeat A region (RepA) and Crumbs homolog 3 (CRB3) 3'-untranslated region RNAs. Our predictions are highly significant (P < 0.05) and will help experimentalists to identify candidates for further validation.
AVAILABILITY: catRAPID omics can be freely accessed on the Web at http://s.tartaglialab.com/catrapid/omics. Documentation, tutorial and FAQs are available at http://s.tartaglialab.com/page/catrapid_group.

Year: 2013 PMID: 23975767 PMCID: PMC3810848 DOI: 10.1093/bioinformatics/btt495

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

Increasing evidence indicates that ribonucleoprotein interactions are fundamental for cellular regulation (Khalil and Rinn, 2011). Moreover, several studies have highlighted the involvement of RNA molecules in the onset and progression of human diseases, including neurological disorders (Johnson ). To our knowledge, there are two sequence-based methods for the prediction of protein–RNA interactions: catRAPID (Bellucci ) and RPISeq (Muppirala ). The catRAPID algorithm exploits predictions of secondary structure, hydrogen bonding and van der Waals contributions to estimate the binding propensity of protein and RNA molecules. RPISeq is based on support vector machine (SVM) and random forest (RF) models that predict protein–RNA interactions from primary structure alone (Muppirala ). Both methods perform well, but catRAPID discriminates positive and negative cases with higher accuracy (Cirillo ) and has been tested on long non-coding RNAs (Agostini ). Here we introduce catRAPID omics to perform high-throughput predictions of protein–RNA interactions using information on the protein and RNA domains involved in macromolecular recognition.

2 WORKFLOW AND IMPLEMENTATION

The catRAPID omics server provides two main services, exploring the interaction potential of (i) a protein of interest with respect to a target transcriptome or (ii) a given RNA with respect to the nucleic acid binding proteome. Several options are available to refine the analysis in eight model organisms or custom libraries (see online documentation). For a protein query, catRAPID omics takes as input the protein sequence (FASTA format), either full-length or restricted to its nucleic acid binding regions. For a transcript query (FASTA format), the server uses the full-length sequence if it is shorter than 1200 nt; otherwise it uses fragments with predicted stable secondary structure (Agostini ).
The server automatically detects disordered proteins lacking canonical RNA binding domains, because disordered regions have been observed to be enriched in RNA binding proteins (Castello ). As RNA motifs are important for protein recognition (Kazan ), a search for these elements is also carried out. The motifs were taken from the RNA-Binding Protein DataBase (RBPDB) (Cook ), SpliceAid-F (Giulietti ) and a recent motif compendium (Ray ).
Using the distribution of interaction propensities, catRAPID omics predicts the RNA binding ability of the input protein (86% accuracy) and ranks RNA interactions (the ranking can be downloaded by the user). The output page (Fig. 1A) reports all the variables used to estimate protein–RNA associations: interaction propensity (Bellucci ), discriminative power (Bellucci ), interaction strength (Agostini ) and the presence of protein RNA binding domains as well as RNA motifs. A ‘star rating system’ ranks the binding propensities (http://service.tartaglialab.com/static_files/shared/faqs.html).
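The transcript-handling rule described above (full-length sequence below 1200 nt, fragments otherwise) can be illustrated with a minimal sketch. Only the 1200 nt threshold comes from the text. The function names, the window and step sizes, and the use of fixed overlapping windows are assumptions made for illustration: the server selects fragments by predicted secondary structure stability, which is not reproduced here.

```python
from typing import List, Tuple

MAX_FULL_LENGTH_NT = 1200  # threshold stated in the text


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def plan_rna_query(seq: str, window: int = 1000, step: int = 500) -> List[str]:
    """Return the sequences that would be scored for one transcript.

    Short transcripts are used whole. Long transcripts are cut into
    overlapping windows; this windowing is a hypothetical placeholder
    for structure-based fragment selection.
    """
    if len(seq) < MAX_FULL_LENGTH_NT:
        return [seq]
    window = min(window, len(seq))
    fragments = [seq[s:s + window]
                 for s in range(0, len(seq) - window + 1, step)]
    # Add a final window if the regular grid did not reach the 3' end.
    if (len(seq) - window) % step != 0:
        fragments.append(seq[-window:])
    return fragments


if __name__ == "__main__":
    for name, rna in parse_fasta(">toy_transcript\n" + "ACGU" * 500):
        print(name, len(rna), "nt ->", len(plan_rna_query(rna)), "fragment(s)")
```

A real pipeline would replace the windowing step with a secondary structure predictor and keep only the fragments predicted to be stable.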
As for the reference sets, ENSEMBL (version 68) is used for retrieval and classification of coding and non-coding RNAs, whereas protein sequences are gathered from the UniProtKB database (release 2012_11). Finally, catRAPID omics uses hmmscan, a Hidden Markov Model-based algorithm from the HMMER3 package (Finn ), to identify known PfamA domains (Finn ) and recognize protein regions involved in binding nucleic acid molecules. Algorithm hit significance is determined according to the PfamA ‘gathering thresholds’.
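As an illustration of the domain-detection step, the sketch below assembles an hmmscan command line that applies the Pfam gathering thresholds through the HMMER3 option `--cut_ga` and writes a per-domain table with `--domtblout`. The file names and the helper function are hypothetical, and this is not the server's actual invocation; it assumes a local Pfam-A HMM database already indexed with hmmpress.

```python
import shlex
from typing import List


def build_hmmscan_command(query_fasta: str,
                          pfam_hmm_db: str,
                          domtblout: str) -> List[str]:
    """Build an hmmscan argument list (hypothetical helper).

    --cut_ga      use each model's gathering (GA) thresholds for significance
    --domtblout   write a parseable per-domain hit table to the given file
    """
    return [
        "hmmscan",
        "--cut_ga",
        "--domtblout", domtblout,
        pfam_hmm_db,   # HMM database, e.g. Pfam-A.hmm (must be hmmpress-ed)
        query_fasta,   # protein sequences in FASTA format
    ]


if __name__ == "__main__":
    # File names below are placeholders chosen for this example.
    cmd = build_hmmscan_command("query.fasta", "Pfam-A.hmm", "domains.tbl")
    print(shlex.join(cmd))
    # To run it: subprocess.run(cmd, check=True), with HMMER3 installed.
```

The resulting table can then be filtered for Pfam families annotated as nucleic acid binding to delimit candidate binding regions.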

Fig. 1.

catRAPID omics features and performances. (A) Example of the output table showing Z-score (interaction propensity normalized with respect to experimental cases), discriminative power (with respect to training sets), interaction strength (enrichment with respect to random interactions) and presence of RNA binding domains as well as RNA motifs. Interaction scores are ranked according to a ‘star rating system’ ranging from 0 to 3 (http://service.tartaglialab.com/static_files/shared/faqs.html). A click on the text redirects to reference pages. Performances on (B) full-length proteins and (C) RNA binding protein domains. Gray is used to highlight transcriptomic studies (i.e. RNA sequencing) and red indicates proteomic analyses (i.e. mass spectrometry). The significance of our predictions was assessed using Fisher’s exact test (the dashed line corresponds to P = 0.05)

3 PERFORMANCES

The catRAPID algorithm has been previously validated on a number of protein–RNA associations (Agostini ; Bellucci ; Cirillo ; Johnson ). To evaluate the performance of catRAPID omics at large scale, we used data from recent large-scale experiments and compared predicted and experimental interactions using Fisher’s exact test. As shown in Figure 1B, performances on the human splicing factor serine/arginine-rich splicing factor 1 (SRSF1) (Sanford ) and the murine nucleic acid binding protein Histone-lysine N-methyltransferase EZH2 (EZH2) (Zhao ) are highly significant (P-values: 0.01 and 0.01, respectively). Good performances are also found for low-throughput experiments on the murine non-coding X inactive specific transcript (Xist) repeat A region (RepA) (Maenner ; Royce-Tolland ) and yeast small nuclear RNA U1 (Cvitkovic and Jurica, 2012) (P-values: 0.03 and 0.015) (Fig. 1B). To illustrate the ability of catRAPID omics to predict interactions with nucleic acid binding domains (Fig. 1C), we used murine FUS (Han ) and rat TAR DNA-binding protein 43 (TDP43) (Sephton ) (P-values: 3e-05 and 0.002) as well as the human Crumbs homolog 3 (CRB3) 3′-untranslated region (Iioka ) and yeast small nuclear RNA U2 (Cvitkovic and Jurica, 2012) (P-values: 0.001 and 2e-06).
To evaluate catRAPID’s performance on high-throughput data, we collected positive interactions (TDP43: 568, FUS: 99, SRSF1: 358, EZH2: 1141) as well as negative controls (equal in number to the positives and generated in four random extractions). Comparing the interaction scores of positives and negatives, we found enrichment (calculated as discriminative power) in 72% (TDP43), 88% (FUS), 74% (SRSF1) and 56% (EZH2) of cases. On the same datasets, the RPISeq SVM model showed enrichment in 58% (TDP43), 83% (FUS), 47% (SRSF1) and 41% (EZH2) of cases, and the RPISeq RF model in 53% (TDP43), 68% (FUS), 59% (SRSF1) and 48% (EZH2) of cases.
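The Fisher’s exact test used above to compare predicted and experimental interactions can be sketched on a 2×2 contingency table. The text does not specify how the table was built or whether the test was one- or two-sided, so the layout below (predicted versus not predicted, experimentally bound versus not bound) and the one-sided "enrichment" alternative are assumptions; the function name is hypothetical.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for enrichment in a 2x2 table.

                         bound   not bound
        predicted          a         b
        not predicted      c         d

    Returns P(X >= a) under the hypergeometric null, i.e. the probability
    of seeing at least `a` predicted-and-bound pairs by chance given the
    fixed row and column totals.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)
    # math.comb returns 0 when k > n, so impossible tables contribute nothing.
    tail = sum(comb(row1, k) * comb(row2, col1 - k)
               for k in range(a, min(row1, col1) + 1))
    return tail / total


if __name__ == "__main__":
    # Toy counts for illustration only, not data from the study.
    p = fisher_exact_greater(3, 1, 1, 3)
    print(f"P = {p:.4f}")  # 17/70, approximately 0.2429
```

With real data, a predicted interaction set would be considered significantly enriched when the resulting P-value falls below the 0.05 threshold marked by the dashed line in Figure 1.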

4 CONCLUSIONS

Despite recent technical developments, detection of protein–RNA associations remains a challenging task. For this reason, we developed an algorithm that can be used to complement experimental efforts (Zanzoni ). The catRAPID omics server offers unique features such as organism-specific proteomic and transcriptomic libraries, the possibility of generating custom datasets, analysis of long sequences and calculation of interaction specificities. Moreover, we implemented an algorithm for the detection of RNA motifs and protein RNA binding domains, which will help to retrieve recognition motifs embedded in sequences. Our server enables fast calculation of ribonucleoprotein associations and predicts the RNA binding activity of proteins with high accuracy, making it a powerful tool for designing new experiments.

REFERENCES (22 in total)

1. Predicting protein associations with long noncoding RNAs.

Authors: Matteo Bellucci; Federico Agostini; Marianela Masin; Gian Gaetano Tartaglia
Journal: Nat Methods Date: 2011-06 Impact factor: 28.547

2. Genome-wide identification of polycomb-associated RNAs by RIP-seq.

Authors: Jing Zhao; Toshiro K Ohsumi; Johnny T Kung; Yuya Ogawa; Daniel J Grau; Kavitha Sarma; Ji Joon Song; Robert E Kingston; Mark Borowsky; Jeannie T Lee
Journal: Mol Cell Date: 2010-12-22 Impact factor: 17.970

3. The Pfam protein families database.

Authors: Robert D Finn; Jaina Mistry; John Tate; Penny Coggill; Andreas Heger; Joanne E Pollington; O Luke Gavin; Prasad Gunasekaran; Goran Ceric; Kristoffer Forslund; Liisa Holm; Erik L L Sonnhammer; Sean R Eddy; Alex Bateman
Journal: Nucleic Acids Res Date: 2009-11-17 Impact factor: 16.971

4. RNAcontext: a new method for learning the sequence and structure binding preferences of RNA-binding proteins.

Authors: Hilal Kazan; Debashish Ray; Esther T Chan; Timothy R Hughes; Quaid Morris
Journal: PLoS Comput Biol Date: 2010-07-01 Impact factor: 4.475

5. Splicing factor SFRS1 recognizes a functionally diverse landscape of RNA transcripts.

Authors: Jeremy R Sanford; Xin Wang; Matthew Mort; Natalia Vanduyn; David N Cooper; Sean D Mooney; Howard J Edenberg; Yunlong Liu
Journal: Genome Res Date: 2008-12-30 Impact factor: 9.043

6. The A-repeat links ASF/SF2-dependent Xist RNA processing with random choice during X inactivation.

Authors: Morgan E Royce-Tolland; Angela A Andersen; Hannah R Koyfman; Dale J Talbot; Anton Wutz; Ian D Tonks; Graham F Kay; Barbara Panning
Journal: Nat Struct Mol Biol Date: 2010-07-25 Impact factor: 15.369

7. Identification of neuronal RNA targets of TDP-43-containing ribonucleoprotein complexes.

Authors: Chantelle F Sephton; Can Cenik; Alper Kucukural; Eric B Dammer; Basar Cenik; Yuhong Han; Colleen M Dewey; Frederick P Roth; Joachim Herz; Junmin Peng; Melissa J Moore; Gang Yu
Journal: J Biol Chem Date: 2010-11-04 Impact factor: 5.157

8. HMMER web server: interactive sequence similarity searching.

Authors: Robert D Finn; Jody Clements; Sean R Eddy
Journal: Nucleic Acids Res Date: 2011-05-18 Impact factor: 16.971

9. Efficient detection of RNA-protein interactions using tethered RNAs.

Authors: Hidekazu Iioka; David Loiselle; Timothy A Haystead; Ian G Macara
Journal: Nucleic Acids Res Date: 2011-02-07 Impact factor: 16.971

10. RBPDB: a database of RNA-binding specificities.

Authors: Kate B Cook; Hilal Kazan; Khalid Zuberi; Quaid Morris; Timothy R Hughes
Journal: Nucleic Acids Res Date: 2010-10-29 Impact factor: 16.971
