
catRAPID signature: identification of ribonucleoproteins and RNA-binding regions.

Carmen Maria Livi, Petr Klus, Riccardo Delli Ponti, Gian Gaetano Tartaglia.

Abstract

MOTIVATION: Recent technological advances revealed that an unexpectedly large number of proteins interact with transcripts even if the RNA-binding domains are not annotated. We introduce catRAPID signature to identify ribonucleoproteins based on physico-chemical features instead of sequence similarity searches. The algorithm, trained on human proteins and tested on model organisms, calculates the overall RNA-binding propensity and then predicts RNA-binding regions. catRAPID signature outperforms other algorithms in the identification of RNA-binding proteins and the detection of non-classical RNA-binding regions. Results are visualized on a webpage and can be downloaded or forwarded to catRAPID omics for predictions of RNA targets.
AVAILABILITY AND IMPLEMENTATION: catRAPID signature can be accessed at http://s.tartaglialab.com/new_submission/signature
CONTACT: gian.tartaglia@crg.es or gian@tartaglialab.com
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2015. Published by Oxford University Press.

Year:  2015        PMID: 26520853      PMCID: PMC4795616          DOI: 10.1093/bioinformatics/btv629

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

RNA-binding proteins (RBPs) use RNA-binding domains (RDs) to recognize target RNAs and to regulate co-/post-transcriptional processes. Examples of classical RDs include the RNA-recognition motif (RRM), the double-stranded RNA-binding domain (dsRRM), the K-homology (KH) domain, the RGG box and the Pumilio/FBF (PUM) domain (Lunde ). In addition to classical RDs, recent experimental studies on HeLa (Castello ), HEK293 (Baltz ) and mESC (Kwon ) cells indicate that a number of RNA-interacting proteins contain non-classical RDs (ncRDs) for which annotation is not yet available. Discovery of new RDs is a challenging task: domain-detection tools such as HMMER (Finn ) and BLAST (Camacho ) rely on sequence similarity searches to identify annotated RDs and therefore fail to recognize newly discovered RBPs. Similarly, other methods such as RNApred (Kumar ) predict RNA-binding ability using features of annotated RDs that might be different in ncRDs. Alternatives to identify RNA-binding regions include BindN+ (Wang ), PPRInt (Kumar ) and RNAbindR+ (Walia ), but these algorithms have been trained to identify single amino acids and not contiguous regions. catRAPID signature overcomes these limitations by (i) predicting the propensity of a protein to interact with RNA and (ii) identifying RNA-binding regions through physico-chemical properties instead of sequence patterns. The algorithm extends the catRAPID approach (Bellucci ), which predicts protein-RNA interactions, and the cleverSuite algorithm (Klus ), which classifies protein groups using physico-chemical features.

2 Algorithm and performances

To build catRAPID signature we exploited a number of physico-chemical properties reported in our previous publication (Klus ), in three steps:
(1) We used each physico-chemical property [e.g. structural disorder (Castello )] to build a signature, or profile, containing position-specific information arranged in sequential order from the N- to the C-terminus.
(2) We computed the Pearson correlation coefficient between signatures of annotated human RDs and same-length regions taken from RNA-binding proteins as well as negative controls (Supplementary Table S1 and online Documentation).
(3) We identified a number of discriminating physico-chemical properties, their associated RDs and correlation cutoffs (Supplementary Table S2 and online Documentation).
For each protein, we calculated the fraction of residues with correlation coefficients above the cutoffs that are associated with physico-chemical properties and RDs (Table S2; online Documentation), which we then used to train catRAPID signature. Using a Support Vector Machine with an RBF kernel (online Documentation), we built a method for the (i) identification of ribonucleoproteins and (ii) prediction of RNA-binding regions. catRAPID signature shows an AUC = 0.76 for discrimination of 950 RBPs from 950 negative cases (10-fold cross-validation; Supplementary Fig. S1, Supplementary Data). On an independent test set (Table S3) comprising 47 mouse proteins harboring ncRDs and the same number of negatives (Kwon ), we obtained accuracy = 0.71, sensitivity = 0.70, specificity = 0.72 and precision = 0.70. By contrast, conventional pattern recognition methods such as HMMER and BLAST show poor sensitivity (Table S3). Our algorithm also outperforms RNApred, whose specificity and precision are 0.25 and 0.52, respectively (Table S3). Moreover, catRAPID signature reliably detects ribonucleoproteins across different kingdoms, including M. pulmonis, E. coli, C. albicans, S. cerevisiae, A. thaliana and A. oryzae (Supplementary Fig. S2; online Documentation).
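The signature-matching idea described above (building a per-residue profile, correlating it with the signature of an annotated RD over same-length windows, and counting the fraction of residues above a cutoff) can be illustrated with a minimal sketch. This is not the authors' implementation: the Kyte-Doolittle hydropathy scale stands in for the unspecified physico-chemical properties, and the function names, the example sequences and the cutoff value are all hypothetical.

```python
from math import sqrt

# Kyte-Doolittle hydropathy values, used here ONLY as a stand-in scale;
# the properties actually used by catRAPID signature are not listed in the text.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def profile(seq, scale=KD):
    """Position-specific profile from N- to C-terminus (standard residues only)."""
    return [scale[aa] for aa in seq]


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def covered_fraction(protein, domain, cutoff):
    """Fraction of residues lying in at least one domain-length window whose
    profile correlates with the domain signature at or above `cutoff`."""
    p, d = profile(protein), profile(domain)
    w = len(d)
    covered = [False] * len(p)
    for start in range(len(p) - w + 1):
        if pearson(p[start:start + w], d) >= cutoff:
            for i in range(start, start + w):
                covered[i] = True
    return sum(covered) / len(p) if p else 0.0


# Hypothetical toy sequences: the "domain" is embedded in glycine flanks.
toy_domain = "KRFLAVDE"
toy_protein = "GGGG" + toy_domain + "GGGG"
print(covered_fraction(toy_protein, toy_domain, cutoff=0.99))
```

In a pipeline of this shape, one such fraction per (property, RD) pair would form the feature vector passed to a classifier; the text states that an SVM with an RBF kernel was used for that step.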
Training for the identification of RNA-binding regions was done on 1115 annotated RNA-binding regions. As a negative counterpart we randomly selected 1115 non-binding regions of the same length from each RBP (AUC = 0.80 in 10-fold cross-validation; Supplementary Fig. S1). On 102 ncRDs versus 102 negative mouse proteins, catRAPID signature outperforms other algorithms: accuracy = 0.67, sensitivity = 0.76, specificity = 0.60 and precision = 0.65 (Supplementary Table S4). By contrast, RNAbindR+ shows accuracy = 0.48, sensitivity = 0.53, specificity = 0.42 and precision = 0.48. Similar performances were obtained for BindN+ and PPRInt (Supplementary Table S4). In addition, we observed high performance on a protein dataset whose RNA-binding sites have been determined through X-ray and NMR (Supplementary Fig. S3 and online Documentation).
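For readers comparing the figures of merit quoted in this section, the four measures follow from the confusion matrix in the standard way. The sketch below uses the usual definitions; the function name and the counts in the usage line are hypothetical and are not taken from the paper's supplementary tables.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),     # positive predictive value
    }


# Hypothetical counts for a balanced set of 100 positives and 100 negatives.
for name, value in classification_metrics(tp=70, fp=40, tn=60, fn=30).items():
    print(f"{name}: {value:.2f}")
```

On a balanced test set, accuracy is the mean of sensitivity and specificity, which is a quick consistency check when reading such tables.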

3 Server description and example

The input of the server is a FASTA sequence. To illustrate the output with an example, we studied the RNA-binding ability of the Fragile X Mental Retardation Protein (FMRP). catRAPID signature predicts that FMRP binds to RNA (overall interaction score = 0.85; Fig. 1A; Fig. S4) and correctly identifies two peaks corresponding to the KH domains and one peak in the RGG box (Ascano ) [Fig. 1A,B and C; ‘classical’ score = 0.73]. In addition, catRAPID signature indicates that the N-terminus (amino acids 1–215; Fig. 1B) has RNA-binding ability (‘putative’ score = 0.74), which is in agreement with very recent evidence revealing the presence of a novel KH domain (Myrick ). Comparing experimental targets [number of PAR-CLIP binding sites ≥ 1] (Ascano ) with transcriptome-wide predictions for the FMRP N-terminus [amino acids 1–215; Fig. 1D] (Agostini ), we observed a significant enrichment in predicted interaction propensities (P-value < 10^-9, calculated with the Kolmogorov–Smirnov test on 105 × 10^3 transcripts, of which 7 × 10^3 are positives), which suggests that the N-terminus contributes to the RNA-binding ability of the full-length FMRP.
Fig. 1. RNA-binding ability of the Fragile X Mental Retardation Protein (FMRP). (A) The server reports the propensity of FMRP for the putative (0.74), classical (0.73) and non-classical (0.57) RBP classes, as well as an overall prediction score (0.85); (B) The profile shows protein regions and their propensity to interact with RNA. catRAPID signature correctly identifies two peaks corresponding to the central KH domains, a region in the RGG box [amino acids 527–552] at the C-terminus (Ascano ) and a recently discovered RD at the N-terminus (Myrick ). (C) Annotated RDs are shown in a table and linked to PFAM webpages; (D) Annotated and predicted RNA-binding sequences can be downloaded and/or forwarded to catRAPID omics (Agostini ) for further analysis.
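The enrichment test mentioned above for the FMRP N-terminus compares the distribution of predicted interaction propensities for experimentally supported targets with that of the remaining transcripts. A minimal sketch of the two-sample Kolmogorov–Smirnov statistic is given below; the function name and the score lists are hypothetical, and the P-value calculation is left out (a library routine such as `scipy.stats.ks_2samp` could be used for that).

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute
    difference between the two empirical cumulative distributions."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The maximum gap is always attained at one of the observed values.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)   # empirical CDF of a at x
        fb = bisect_right(b, x) / len(b)   # empirical CDF of b at x
        d = max(d, abs(fa - fb))
    return d


# Hypothetical propensity scores: targets (positives) vs. other transcripts.
positives = [0.62, 0.71, 0.80, 0.85, 0.90]
others = [0.10, 0.25, 0.40, 0.55, 0.65, 0.70]
print(ks_statistic(positives, others))
```

A statistic near 0 means the two score distributions are indistinguishable, while a value near 1 means they barely overlap.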


4 Conclusions

As newly discovered RDs are not annotated, traditional domain-detection tools fail to identify them. catRAPID signature addresses this limitation by detecting binding regions through physico-chemical features. Our algorithm will be helpful to investigate components of ribonucleoprotein complexes and to identify RNA-binding regions.
  15 in total

1.  SVM based prediction of RNA-binding proteins using binding residues and evolutionary information.

Authors:  Manish Kumar; M Michael Gromiha; Gajendra P S Raghava
Journal:  J Mol Recognit       Date:  2011 Mar-Apr       Impact factor: 2.137

2.  RNA-binding proteins: modular design for efficient function. (Review)

Authors:  Bradley M Lunde; Claire Moore; Gabriele Varani
Journal:  Nat Rev Mol Cell Biol       Date:  2007-06       Impact factor: 94.444

3.  Predicting protein associations with long noncoding RNAs.

Authors:  Matteo Bellucci; Federico Agostini; Marianela Masin; Gian Gaetano Tartaglia
Journal:  Nat Methods       Date:  2011-06       Impact factor: 28.547

4.  The mRNA-bound proteome and its global occupancy profile on protein-coding transcripts.

Authors:  Alexander G Baltz; Mathias Munschauer; Björn Schwanhäusser; Alexandra Vasile; Yasuhiro Murakawa; Markus Schueler; Noah Youngs; Duncan Penfold-Brown; Kevin Drew; Miha Milek; Emanuel Wyler; Richard Bonneau; Matthias Selbach; Christoph Dieterich; Markus Landthaler
Journal:  Mol Cell       Date:  2012-06-08       Impact factor: 17.970

5.  Insights into RNA biology from an atlas of mammalian mRNA-binding proteins.

Authors:  Alfredo Castello; Bernd Fischer; Katrin Eichelbaum; Rastislav Horos; Benedikt M Beckmann; Claudia Strein; Norman E Davey; David T Humphreys; Thomas Preiss; Lars M Steinmetz; Jeroen Krijgsveld; Matthias W Hentze
Journal:  Cell       Date:  2012-05-31       Impact factor: 41.582

6.  BLAST+: architecture and applications.

Authors:  Christiam Camacho; George Coulouris; Vahram Avagyan; Ning Ma; Jason Papadopoulos; Kevin Bealer; Thomas L Madden
Journal:  BMC Bioinformatics       Date:  2009-12-15       Impact factor: 3.169

7.  BindN+ for accurate prediction of DNA and RNA-binding residues from protein sequence features.

Authors:  Liangjiang Wang; Caiyan Huang; Mary Qu Yang; Jack Y Yang
Journal:  BMC Syst Biol       Date:  2010-05-28

8.  The RNA-binding protein repertoire of embryonic stem cells.

Authors:  S Chul Kwon; Hyerim Yi; Katrin Eichelbaum; Sophia Föhr; Bernd Fischer; Kwon Tae You; Alfredo Castello; Jeroen Krijgsveld; Matthias W Hentze; V Narry Kim
Journal:  Nat Struct Mol Biol       Date:  2013-08-04       Impact factor: 15.369

9.  FMRP targets distinct mRNA sequence elements to regulate protein expression.

Authors:  Manuel Ascano; Neelanjan Mukherjee; Pradeep Bandaru; Jason B Miller; Jeffrey D Nusbaum; David L Corcoran; Christine Langlois; Mathias Munschauer; Scott Dewell; Markus Hafner; Zev Williams; Uwe Ohler; Thomas Tuschl
Journal:  Nature       Date:  2012-12-12       Impact factor: 49.962

10.  HMMER web server: interactive sequence similarity searching.

Authors:  Robert D Finn; Jody Clements; Sean R Eddy
Journal:  Nucleic Acids Res       Date:  2011-05-18       Impact factor: 16.971

