Alessio Colantoni, Jakob Rupert, Andrea Vandelli, Gian Gaetano Tartaglia, Elsa Zacco.
Abstract
Interactions between proteins and RNA are at the base of numerous cellular regulatory and functional phenomena. The investigation of the biological relevance of non-coding RNAs has led to the identification of numerous novel RNA-binding proteins (RBPs). However, defining the RNA sequences and structures that are selectively recognised by an RBP remains challenging, since these interactions can be transient and highly dynamic, and may be mediated by unstructured regions in the protein, as in the case of many non-canonical RBPs. Numerous experimental and computational methodologies have been developed to predict, identify and verify the binding between a given RBP and potential RNA partners, but navigating across the vast ocean of data can be frustrating and misleading. In this mini-review, we propose a workflow for the identification of the RNA binding partners of putative, newly identified RBPs. The large pool of potential binders selected by in-cell experiments can be enriched by in silico tools such as catRAPID, which is able to predict the RNA sequences more likely to interact with specific RBP regions with high accuracy. The RNA candidates with the highest potential can then be analysed in vitro to determine the binding strength and to precisely identify the binding sites. The results thus obtained can furthermore validate the computational predictions, offering an all-round solution to the issue of finding the most likely RNA binding partners for a newly identified potential RBP.
Keywords: clip; molecular modelling; protein–RNA interaction predictions; protein–RNA interaction validation; protein–RNA interactions
Year: 2020 PMID: 32820806 PMCID: PMC7458403 DOI: 10.1042/BST20191059
Source DB: PubMed Journal: Biochem Soc Trans ISSN: 0300-5127 Impact factor: 5.407
Figure 1. The workflow of discovering RNA partners for an RBP.
Protein and RNA sequence databases, structural information and results from RIP/CLIP experiments feed computational prediction tools such as catRAPID. The software utilises this information to define RNA sequences with high probability of interacting with a given RBP and rank them accordingly. Several in vitro techniques allow for the validation of predicted results, for the calculation of binding strength and the definition of the binding sites.
Figure 2. Bioinformatics analysis of CLIP-Seq and RIP-Seq data.
After being de-multiplexed based on sample-specific barcodes, reads undergo a pre-processing phase. Unique molecular identifiers (UMIs) are not always used, being more common in iCLIP and eCLIP protocols. When employed, they are sometimes used to remove PCR duplicates directly at this stage, but in most cases reads are simply marked based on UMI sequence, as shown by the colours assigned to trimmed reads. After the reads are aligned to the genome or to the transcriptome, post-processing is needed to filter out multi-mapped reads and to collapse reads mapping at the same position, which are likely to represent PCR duplicates; if UMI-based read marking occurred, natural duplicates, which map at the same place but have different UMIs, can be retained, as shown here. The strategy for RNA target identification and binding site detection depends on the protocol. Roughly, such approaches can be divided into transcript enrichment analysis (RIP-Seq), which is analogous to differential expression analysis, and peak-calling (all protocols). Single-nucleotide resolution can be achieved using cross-link-induced mutation sites (CIMSs) in HITS-CLIP, transitions in PAR-CLIP and cross-link-induced truncation sites (CITSs) in iCLIP/eCLIP. HITS-CLIP experiments do not always produce clear and usable CIMS patterns.
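The post-processing step described in this caption can be illustrated with a minimal Python sketch. It is not taken from any specific CLIP pipeline: the `Read` record and its fields (including `n_hits` for the number of alignment positions) are hypothetical, and UMIs are compared by exact match, whereas production tools typically also tolerate sequencing errors in the UMI.

```python
from collections import namedtuple

# Hypothetical minimal read record; real pipelines operate on BAM alignments.
Read = namedtuple("Read", ["name", "chrom", "pos", "strand", "umi", "n_hits"])


def deduplicate(reads):
    """Drop multi-mapped reads, then keep one read per (position, UMI)."""
    seen = set()
    kept = []
    for read in reads:
        if read.n_hits > 1:
            # Multi-mapped read: filtered out.
            continue
        key = (read.chrom, read.pos, read.strand, read.umi)
        if key in seen:
            # Same position and same UMI: treated as a PCR duplicate.
            continue
        seen.add(key)
        kept.append(read)
    return kept


reads = [
    Read("r1", "chr1", 100, "+", "AACG", 1),
    Read("r2", "chr1", 100, "+", "AACG", 1),  # PCR duplicate of r1
    Read("r3", "chr1", 100, "+", "GTTA", 1),  # natural duplicate: different UMI
    Read("r4", "chr2", 500, "-", "AACG", 3),  # multi-mapped
]
kept_names = [r.name for r in deduplicate(reads)]
print(kept_names)  # ['r1', 'r3']
```

Without UMIs, the key would reduce to the mapping position alone, and r3 would be collapsed together with r1 and r2.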
A summary of the different catRAPID implementations
| Name of the algorithm | Description | Input | Output |
|---|---|---|---|
| | It divides inputted protein and RNA into fragments and computes the interaction propensity between each fragment. | • A protein sequence in FASTA format. | • Interaction profile plot |
| | A variant of | • A protein sequence in FASTA format. | • Interaction profile plot |
| | It computes the interactions between a molecule (protein/RNA) and the reference set (transcriptome/nucleotide-binding proteome) of a model organism. | • Protein/RNA sequence in FASTA format. | • Graphical representation of protein sequence/domains. |
| | It allows the identification of co-expressed protein–RNA pairs in human tissues. | • A protein sequence in FASTA format | • Correlation coefficient representing the coexpression of the protein–RNA pair. |
| | It scans a protein sequence for RNA-binding regions. | • One or more protein sequences in FASTA format. | • Overall binding score. |
| | It allows the creation of a new reference set for | • One or more protein or RNA sequences. | • A library ID that can be used in catRAPID |
| | It computes the interaction strength of a protein–RNA pair with respect to a reference set of sequences of similar length. | • A protein sequence in FASTA format | • Table of interaction strength (significance of interaction propensity). |
The interaction profile plot represents the interaction score (y-axis) of the protein along the RNA sequence (x-axis), giving information about the transcript regions that are most likely to be bound by the protein;
The interaction matrix is a heatmap showing the interaction propensity between each possible fragment of the protein (y-axis) and the RNA (x-axis);
The pie chart shows the proportion of targets with a High, Moderate or Low star rating score. The star rating score weights the interaction based on the interaction propensity, the presence of RNA/DNA binding domains and the presence of known RNA motifs;
The interaction heatmap shows the interaction score of the individual amino acid-nucleotide pairs;
The binding propensity plot reports, for each amino acid (x-axis), the propensity to be part of a binding region;
The cumulative distribution function plots report the interaction score of the query protein–RNA pair within the distribution of the interaction scores from the reference set.
A more detailed explanation of the different algorithms is available on the catRAPID tutorial page (http://s.tartaglialab.com/static_files/shared/tutorial.html) and documentation page (http://s.tartaglialab.com/static_files/shared/documentation.html).
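The idea of placing a query score within the distribution of scores from a reference set, as in the cumulative distribution function plots, can be sketched in a few lines of Python. This is only an illustration of an empirical CDF and does not reproduce how catRAPID computes interaction strength; the function name and the score values are made up.

```python
from bisect import bisect_right


def empirical_cdf(query_score, reference_scores):
    """Fraction of reference scores that are <= the query score."""
    if not reference_scores:
        raise ValueError("reference set is empty")
    ordered = sorted(reference_scores)
    return bisect_right(ordered, query_score) / len(ordered)


# Made-up propensity scores, for illustration only.
reference = [-12.0, -3.5, 0.8, 4.1, 9.7, 15.2, 21.0, 30.4]
strength = empirical_cdf(18.0, reference)
print(f"{strength:.2f}")  # 0.75
```

A value close to 1 means that the query pair scores higher than most pairs in the reference set; a value close to 0 means the opposite.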
Methods for in vitro characterisation of protein–RNA binding
| Method | Principle of detection | Sample requirements | Detection range | Sample capacity | Direct measurements |
|---|---|---|---|---|---|
| | Detection of the electrophoretic mobility of the RNA–protein complex, which typically differs from that of free RNA. | • Labelled RNA. | ≥10⁻¹⁸ mol RNA. | 0.5–500 µl depending on electrophoresis setup. | |
| | Quantification of ³²P-labelled RNA via imaging screen or scintillation counter. | • About 0.1 µM labelled RNA (usually with ³²P). | ≥10⁻¹⁵ mol RNA. | Multi-well plate dot-blot setup. | |
| | Changes in fluorescence anisotropy or polarisation of excitation light upon binding. | • Fluorescent labelling of one of the partners. | nM ranges of fluorophores. | Multi-well plates. | |
| | Energy transfer between fluorophores detected as a change in fluorescence intensity. | • Two fluorophores, either one on each partner or strategically placed on one for structural studies. | Single-molecule experiments. | Single molecule to multi-well plates. | |
| | Variations in the refractive index of polarised laser light upon molecular binding. | • About 200 µl 25 nM RNA/sensor. | 1 pM < | Up to 16 channels with microfluidics. | |
| | Detection of the variation of refracted white light upon the binding of the interaction partner to the immobilised ligand on the optical fibres. | • 1–50 µg/ml of ligand, immobilised on biosensor. | 1 nM < | Single channel, 5 min per measurement (BLItz) or multi-well plate, 1–8 simultaneous channels. | |
| | Variations in temperature-induced fluorescence emission of a target as a function of the concentration of a non-fluorescent ligand. | • 1–20 µl, nM–µM concentrations. | pM < | Up to 96 samples per run in a multi-capillary system. | |
| | Voltage-dependent variations of the movement of short fluorescent DNA nanolevers attached to a gold surface upon binding of an analyte. | • Immobilisation of one binding partner. | nM < | Four flow channels with six microelectrodes for sampling per chip. | |
| | Measurement of the heat consumed or released during titration of the sample with the ligand, relative to a reference cell. | • 200 µl–2 ml of 1–2 µM receptor. | | Single cell. | |
Kinetic constants are measured directly and are used as the basis for calculating equilibrium thermodynamic parameters, apart from ITC, where the reaction enthalpy can be obtained without relying on kinetic data. Kd: equilibrium dissociation constant; kon: association rate constant; koff: dissociation rate constant; n: stoichiometry of binding; Rh: hydrodynamic radius (radius of a theoretical sphere with the same translational diffusion coefficient); ΔH: reaction enthalpy.
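As a minimal sketch of how equilibrium parameters follow from kinetic constants, the Python snippet below uses the standard relations Kd = koff/kon and ΔG = RT ln(Kd), with a 1 M standard state, together with ΔG = ΔH − TΔS. The function names and the numerical rate constants are illustrative assumptions and are not taken from the review.

```python
import math

R = 8.314462618  # gas constant, J/(mol*K)


def kd_from_rates(k_on, k_off):
    """Equilibrium dissociation constant (M) from k_on (1/(M*s)) and k_off (1/s)."""
    if k_on <= 0 or k_off <= 0:
        raise ValueError("rate constants must be positive")
    return k_off / k_on


def binding_free_energy(kd, temperature=298.15):
    """Standard binding free energy (J/mol) for a 1 M standard state."""
    return R * temperature * math.log(kd)


def entropy_term(delta_g, delta_h):
    """Entropic contribution -T*dS (J/mol), from dG = dH - T*dS."""
    return delta_g - delta_h


# Illustrative values: k_on = 1e6 1/(M*s), k_off = 1e-2 1/s.
kd = kd_from_rates(k_on=1.0e6, k_off=1.0e-2)  # 1e-8 M, i.e. 10 nM
dg = binding_free_energy(kd)                  # roughly -45.7 kJ/mol at 25 °C
print(f"Kd = {kd:.1e} M, dG = {dg / 1000:.1f} kJ/mol")
```

If ΔH is available from a calorimetric measurement, `entropy_term(dg, delta_h)` gives the remaining −TΔS contribution.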
A short overview of the major structural biology techniques with a comparison of their advantages and disadvantages for the study of protein–RNA interactions
| Method | Principle of detection | Resolution | Sample requirements | Pros/Cons |
|---|---|---|---|---|
| | Detection of the electric current induced by the magnetisation of the non-equilibrium spins in a magnetic field. Upon Fourier transform, the results can be used to determine structural constraints and produce a molecular model. | Atomic (<2 Å). | • Isotope labelling, side-chain deuteration essential for larger complexes to avoid lengthy relaxation times. | • Solution-based, can observe time-resolved experiments and kinetics, most accessible on the list, possibilities of differential isotope labelling, saturation transfer experiments and more. |
| | Detection of diffracted X-ray photons, scattered by the crystal, from which a 3D electron density map is calculated, which is then used to build the molecular structure model. | Atomic (<2 Å). | Crystals of the protein–RNA complex, frozen in liquid nitrogen. | • Highest resolution limit with free electron lasers. |
| | Based on electron microscopy, the sample images are grouped into specific projections, with a 3D model calculated based on them. | High (<5 Å). | Monodisperse sample blotted onto grids and frozen under cryogenic conditions. | • Solution-based, flexible buffer components, no need for crystals etc. |
| | Detection of diffracted X-ray photons (SAXS) or neutrons (SANS) on sample solutions under small angles (typically <10°), from which a scattering curve and a 3D shape can be calculated. | Medium (>10 Å). | • Monodisperse sample, dilution series from 1 mg/ml to 20 mg/ml. | • Investigation of molecule shape as well as other information, selective deuteration can provide valuable contrast (SANS). |