omiXcore: a web server for prediction of protein interactions with large RNA.

Alexandros Armaos, Davide Cirillo, Gian Gaetano Tartaglia.

Abstract

SUMMARY: Here we introduce omiXcore, a server for calculating protein binding to large RNAs (> 500 nucleotides). The webserver offers (i) use of both protein and RNA sequences without size restriction, (ii) a pre-compiled library for exploring the interactions of human long intergenic RNAs and (iii) prediction of binding sites.
RESULTS: omiXcore was trained and tested on enhanced UV Cross-Linking and ImmunoPrecipitation data. The method discriminates interacting and non-interacting protein-RNA pairs and identifies RNA binding sites with Areas under the ROC curve > 0.80, which suggests that the tool is particularly useful to prioritize candidates for further experimental validation.
AVAILABILITY AND IMPLEMENTATION: omiXcore is freely accessible on the web at http://service.tartaglialab.com/grant_submission/omixcore. CONTACT: gian.tartaglia@crg.es. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2017. Published by Oxford University Press.

Year:  2017        PMID: 28637296      PMCID: PMC5870566          DOI: 10.1093/bioinformatics/btx361

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

RNA-binding proteins (RBPs) form a large group of heterogeneous molecules encompassing a vast array of biological functions and binding modalities (Marchese et al.). Identifying RNA targets is important for characterizing the roles of RBPs in physiological (Tartaglia, 2016) and pathological (Bolognesi et al.) conditions. Considerable attention has been given to long non-coding RNAs, which are implicated in important cell functions (Guttman and Rinn, 2012) but are difficult to characterize because of their tissue-dependent expression (Chen et al.). Indeed, detecting RNA interactions with RBPs requires laborious experimental procedures, such as chromatin isolation by RNA purification, to identify the protein networks bound to the RNA of interest (Chu et al.). The development of enhanced UV Cross-Linking and ImmunoPrecipitation (eCLIP) has recently provided a wealth of information on RBP binding sites at the transcriptomic level (Van Nostrand et al.). The large and homogeneous amount of data provided by eCLIP experiments represents an ideal dataset to train methods for predicting protein interactions with long non-coding RNAs. Indeed, despite considerable efforts in RNA crystallography (Zhang and Ferré-D'Amaré, 2014), the paucity of structural information makes high-throughput approaches for identifying protein-RNA interactions urgently needed. Using the catRAPID approach (Bellucci et al.), we developed the uniform fragmentation procedure to predict interaction propensities between protein and RNA fragments (Cirillo et al.). Here, we introduce omiXcore to perform predictions for long RNAs (500 nt and larger). Calibrated on eCLIP data, omiXcore allows fast and quantitative prediction of RBP interactions with human long intergenic RNAs (lincRNAs), facilitating experimental design and analysis.

2 Workflow and implementation

The omiXcore server calculates the interaction propensities of a protein sequence against (i) human lincRNAs (14 717 entries available in http://www.ensembl.org/) or (ii) a custom list of transcripts (maximum of 30 K characters). Once the user submits a protein of interest, the catRAPID signature algorithm (Livi et al.) estimates its RNA-binding ability. If the protein is predicted to interact with RNA, its partners are calculated and the binding sites visualized. To train the algorithm, we used the eCLIP interactomes of 96 RBPs (56 studied in HepG2 and 78 in K562; downloaded from https://www.encodeproject.org/ in July 2016). We mapped targets of RBPs to their canonical transcript isoforms. For each RNA, we measured the overall affinity, defined as the number of reads (average of two replicates) divided by isoform abundance (Trapnell et al.). For each RBP, we ranked the transcripts by overall affinity and computed the local affinities at each RNA site. To build the negative set, we compiled a list of transcripts that do not interact with the RBP of interest (i.e. they are not reported in the two eCLIP replicates) but bind to at least one of the other RBPs. In total we used 12 234 positive and 12 717 negative interactions (balanced set with 100 RNAs per RBP). For each protein-RNA pair, we used the uniform fragmentation procedure to calculate interaction propensities between protein and RNA fragments (Cirillo et al.). The uniform fragmentation approach divides protein and RNA sequences into overlapping segments (100 fragments for each molecule) (Cirillo et al.). This analysis is particularly useful to identify protein and RNA regions involved in the binding. We computed the mean and SD of the interaction propensities between each RNA fragment and the protein fragments, which we combined in a position-dependent vector. To predict the binding sites of a specific RNA fragment, we integrated the interaction propensities into intermediate terms h_k, from which the predicted local affinity is calculated.
The overall affinity is predicted in a similar way. Both predicted affinities are defined in the range [0,1] and fitted to their experimental counterparts by optimizing the internal weights (neural network architecture with i = 100 and k = 50; total of 1.2 × 10^6 binding regions used).
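The training-set construction and the fragment-level statistics described above can be illustrated with a minimal Python sketch. It is not the omiXcore implementation: all function names are hypothetical, the fragment lengths and the even spacing of window starts are assumptions (the text only states that segments overlap and that 100 fragments are used per molecule), and `toy_propensity` is a placeholder standing in for the catRAPID interaction propensity, which is not reproduced here.

```python
from statistics import mean, pstdev


def overall_affinity(reads_rep1, reads_rep2, abundance):
    """Reads averaged over two replicates, divided by isoform abundance."""
    return mean([reads_rep1, reads_rep2]) / abundance


def negative_set(interactome, rbp):
    """Transcripts bound by at least one other RBP but not by `rbp`.

    `interactome` (assumed layout) maps each RBP name to the set of
    transcripts it binds.
    """
    bound_by_others = set()
    for other, targets in interactome.items():
        if other != rbp:
            bound_by_others |= targets
    return bound_by_others - interactome[rbp]


def uniform_fragments(seq, n_fragments, length):
    """Cut `seq` into `n_fragments` windows of `length` with evenly spaced starts.

    Windows overlap whenever n_fragments * length exceeds len(seq).
    Sequences not longer than one window are returned as a single fragment.
    """
    if length >= len(seq) or n_fragments < 2:
        return [seq]
    last_start = len(seq) - length
    starts = [round(i * last_start / (n_fragments - 1)) for i in range(n_fragments)]
    return [seq[s:s + length] for s in starts]


def toy_propensity(protein_frag, rna_frag):
    """Placeholder score (assumption); it is NOT the catRAPID propensity."""
    basic = sum(protein_frag.count(a) for a in "KR") / len(protein_frag)
    gu = sum(rna_frag.count(b) for b in "GU") / len(rna_frag)
    return basic * gu


def fragment_features(protein, rna, n_fragments=100, prot_len=10, rna_len=20):
    """Mean and SD of the scores between each RNA fragment and all protein fragments."""
    prot_frags = uniform_fragments(protein, n_fragments, prot_len)
    rna_frags = uniform_fragments(rna, n_fragments, rna_len)
    features = []
    for rna_frag in rna_frags:
        scores = [toy_propensity(p, rna_frag) for p in prot_frags]
        features.append((mean(scores), pstdev(scores)))
    return features
```

In this sketch, `fragment_features` returns one (mean, SD) pair per RNA fragment, which plays the role of the position-dependent input described in the text; how those values are weighted and fitted is left out because the corresponding formulas are not available here.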

3 Performances

omiXcore builds on catRAPID algorithms that have previously been validated on a large number of interactions (Agostini et al.; Cirillo et al.; Livi et al.). To evaluate omiXcore performance, we employed a leave-one-out procedure on the 96 individual subsets, each corresponding to one RBP with its positive and negative interactors. Performance on RBP partners (area under the ROC curve, AUC = 0.83; Sensitivity = 0.75; Specificity = 0.78; Matthews correlation coefficient = 0.55; Fig. 1A) and RNA binding sites (AUC = 0.78; Sensitivity = 0.70; Specificity = 0.90; Fig. 1B) was assessed using a binary classification of interacting versus non-interacting pairs (cut-offs at 0.25). Cut-off points for the overall and local scores (0.5 and 0.1, respectively) were set by maximizing the distance of the ROC curve from the diagonal line (Fig. 1A and B). The correlation of 0.65 (Spearman's Rho) between experimental and predicted binding sites allows binding sites to be quantified on a continuous scale (Fig. 1B and C), which is useful for detecting low-affinity interactions (Jankowsky and Harris, 2015). On the testing set, omiXcore shows higher AUCs (in the range of 0.93–0.99) than binary classifiers such as RPIseq [RPIseq-RF: 0.50–0.60; RPIseq-SVM: 0.46–0.66] (Muppirala et al.) and Global Score [0.55–0.88; see also Supplementary Material for other performances] (Cirillo et al.).
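The binary evaluation described above can be sketched in a few lines of Python. Maximizing the distance of a ROC point from the diagonal is equivalent to maximizing Youden's J (sensitivity + specificity − 1), since that distance equals J/√2. The function names and the toy scores below are made up for illustration and are not omiXcore outputs; the sketch also assumes that both classes are present in the labels.

```python
from math import sqrt


def confusion_counts(scores, labels, cutoff):
    """Count TP, FP, TN, FN when scores >= cutoff are called interacting."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= cutoff
        if label and predicted:
            tp += 1
        elif label:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(scores, labels, cutoff):
    """Return (sensitivity, specificity, Matthews correlation coefficient)."""
    tp, fp, tn, fn = confusion_counts(scores, labels, cutoff)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


def best_cutoff(scores, labels):
    """Pick the observed score that maximizes Youden's J."""
    def youden_j(cutoff):
        sensitivity, specificity, _ = classification_metrics(scores, labels, cutoff)
        return sensitivity + specificity - 1
    return max(sorted(set(scores)), key=youden_j)


# Toy data (made up for illustration): 4 non-interacting and 5 interacting pairs.
scores = [0.05, 0.10, 0.20, 0.35, 0.30, 0.50, 0.60, 0.70, 0.90]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
cutoff = best_cutoff(scores, labels)
sensitivity, specificity, mcc = classification_metrics(scores, labels, cutoff)
print(cutoff, sensitivity, specificity, mcc)
```

On this toy set the selected cut-off is 0.5, with sensitivity 0.8, specificity 1.0 and a Matthews correlation coefficient of 0.8. In a leave-one-out setting such as the one described in the text, the same functions would be applied to each held-out RBP subset in turn.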
Fig. 1

omiXcore performances. (A) Binding partner prediction. For each RBP, the algorithm discriminates between interacting and non-interacting RNA pairs (cut-off of 0.25). (B) Within each RNA sequence, binding sites can be identified in a binary way (cut-off of 0.1) or in the continuum range (average correlation of 0.65). (C) Example of correlation between experimental and predicted binding sites: Y-box-binding protein 3 and nuclear receptor corepressor transcript (correlation of 0.80).

4 Conclusions

In this work, we introduced the omiXcore tool for predicting RBP interactions with large RNAs. The algorithm allows detection of RNA binding sites by evaluating local physicochemical properties of polypeptide and nucleotide sequences (Bellucci et al.). omiXcore was calibrated on eCLIP data (Van Nostrand et al.) and is useful to prioritize coding and non-coding RNA targets for further experimental validation. We optimized the webserver to perform fast calculations on lincRNAs, for which we provide a pre-compiled library. Indeed, lincRNAs are expressed at low abundance and regulated in a precise spatiotemporal manner, which makes their characterization particularly difficult in the wet lab.
References

1.  Predicting protein associations with long noncoding RNAs.

Authors:  Matteo Bellucci; Federico Agostini; Marianela Masin; Gian Gaetano Tartaglia
Journal:  Nat Methods       Date:  2011-06       Impact factor: 28.547

2.  Quantitative predictions of protein interactions with long noncoding RNAs.

Authors:  Davide Cirillo; Mario Blanco; Alexandros Armaos; Andreas Buness; Philip Avner; Mitchell Guttman; Andrea Cerase; Gian Gaetano Tartaglia
Journal:  Nat Methods       Date:  2016-12-29       Impact factor: 28.547

3.  Dramatic improvement of crystals of large RNAs by cation replacement and dehydration.

Authors:  Jinwei Zhang; Adrian R Ferré-D'Amaré
Journal:  Structure       Date:  2014-09-02       Impact factor: 5.006

4.  Xist recruits the X chromosome to the nuclear lamina to enable chromosome-wide silencing.

Authors:  Chun-Kan Chen; Mario Blanco; Constanza Jackson; Erik Aznauryan; Noah Ollikainen; Christine Surka; Amy Chow; Andrea Cerase; Patrick McDonel; Mitchell Guttman
Journal:  Science       Date:  2016-08-04       Impact factor: 47.728

5.  Modular regulatory principles of large non-coding RNAs.

Authors:  Mitchell Guttman; John L Rinn
Journal:  Nature       Date:  2012-02-15       Impact factor: 49.962

6.  Predicting RNA-protein interactions using only sequence information.

Authors:  Usha K Muppirala; Vasant G Honavar; Drena Dobbs
Journal:  BMC Bioinformatics       Date:  2011-12-22       Impact factor: 3.169

7.  Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP).

Authors:  Eric L Van Nostrand; Gabriel A Pratt; Alexander A Shishkin; Chelsea Gelboin-Burkhart; Mark Y Fang; Balaji Sundararaman; Steven M Blue; Thai B Nguyen; Christine Surka; Keri Elkins; Rebecca Stanton; Frank Rigo; Mitchell Guttman; Gene W Yeo
Journal:  Nat Methods       Date:  2016-03-28       Impact factor: 28.547

8.  catRAPID omics: a web server for large-scale prediction of protein-RNA interactions.

Authors:  Federico Agostini; Andreas Zanzoni; Petr Klus; Domenica Marchese; Davide Cirillo; Gian Gaetano Tartaglia
Journal:  Bioinformatics       Date:  2013-08-23       Impact factor: 6.937

9.  catRAPID signature: identification of ribonucleoproteins and RNA-binding regions.

Authors:  Carmen Maria Livi; Petr Klus; Riccardo Delli Ponti; Gian Gaetano Tartaglia
Journal:  Bioinformatics       Date:  2015-10-31       Impact factor: 6.937

10.  Advances in the characterization of RNA-binding proteins.

Authors:  Domenica Marchese; Natalia Sanchez de Groot; Nieves Lorenzo Gotor; Carmen Maria Livi; Gian G Tartaglia
Journal:  Wiley Interdiscip Rev RNA       Date:  2016-08-08       Impact factor: 9.957
