Abstract
Post-transcriptional gene regulation is mediated through complex networks of protein-RNA interactions. The targets of only a few RNA binding proteins (RBPs) are known, even in the well-characterized budding yeast. In silico prediction of protein-RNA interactions is therefore useful to guide experiments and to provide insight into regulatory networks. Computational approaches have identified RBP targets based on sequence binding preferences. We investigate here to what extent RBP-RNA interactions can be predicted based on RBP and mRNA features other than sequence motifs. We analyze global relationships between gene and protein properties in general and between selected RBPs and known mRNA targets in particular. Highly translated RBPs tend to bind to shorter transcripts, and transcripts bound by the same RBP show high expression correlation across different biological conditions. Surprisingly, a given RBP preferentially binds to mRNAs that encode interaction partners for this RBP, suggesting coordinated post-transcriptional auto-regulation of protein complexes. We apply a machine-learning approach to predict specific RBP targets in yeast. Although this approach performs well for RBPs with known targets, predictions for uncharacterized RBPs remain challenging due to limiting experimental data. We also predict targets of fission yeast RBPs, indicating that the suggested framework could be applied to other species once more experimental data are available.
Year: 2011 PMID: 21459850 PMCID: PMC3152324 DOI: 10.1093/nar/gkr160
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Features used in the correlation analysis and in the predictions
| Feature class | Features and description of data | Protein | RNA |
|---|---|---|---|
| Gene Ontology | GO RNA metabolism, GO protein biosynthesis, GO transcription, GO transport, GO DNA metabolism, GO mitochondrion, GO cell cycle, GO signaling, GO bioprocess, GO metabolism | X | X |
| Chromosomal position | Chromosome, genomic strand, chromo. Start coordinates, chromo. Stop coordinates | X | X |
| Gene physical properties | Length (ORF length), number of introns, first intron GC, first intron length | X | X |
| Protein physical properties | Isoelectric point, kDa (mass), TRP, VAL, etc. (total) and A, Y, etc. (relative) abundance of each amino acid, sulphur content and nitrogen content | X | X |
| Other physical properties | Codon Adaptation Index, protein length, codon bias, Frequency of Optimal Codons (FOP), hydropathicity (GRAVY score, indicating hydrophilicity or hydrophobicity), aromaticity (frequency of aromatic amino acids such as phenylalanine, tyrosine and tryptophan) | X | |
| Protein localization | Local. Vacuole, Local. Cytoplasm, Local. Nucleus, Local. End. Ret. | X | |
| Experimental translation | mRNA half-life, ribosome occupancy, ribosome density, mRNA levels | X | X |
| mRNA properties | mRNA properties (Vienna RNA package): stem density; number of stems; 3stems, 5stems and orfstems (per length in transcript sections); c3 c5 co (absolute number); mRNA fold. energy; score (PARS) | X | |
| Predicted protein structure | PSIPRED prediction of secondary structure: coils in struc., strands in struc., helix in struc | X | |
| UTR properties | 3′- and 5′-UTR length; 3′-UTR A cont., etc. (relative abundance of each RNA base for the two UTRs); u3AC, etc. (dinucleotide occurrence) | X | |
| Genetic interactions | Known genetic interactions from the BioGRID. | X | X |
Some features are used only for RBPs, some only for the mRNAs and some for both, as indicated. Detailed feature names and references are given in Supplementary Table S1.
Figure 1. Differences in correlations between known RBP–mRNA pairs versus randomized sets. (A) Pair-wise Spearman correlation for positive protein–mRNA pairs. (B) Average of the absolute value of the Spearman correlation in the 100 randomized negative sets. (C) Spearman correlation P-values of the correlations in the positive set depicted in (A). (D) Spearman correlation P-values of the correlations averaged over the 100 randomized sets depicted in (B). Note how some correlations between the RBP features and corresponding target features are only present in the positive set (off-diagonal quadrants).
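The comparison behind panels (A) and (B) can be sketched in a few lines: compute the Spearman correlation between one RBP feature and one mRNA feature over the known pairs, then average its absolute value over 100 re-pairings. This is a minimal illustration, not the authors' pipeline. The feature values are invented toy numbers, the function names are hypothetical, and shuffling the mRNA column is only one assumed way of building a randomized negative set.

```python
import random

def ranks(values):
    """Return 1-based ranks; tied values all receive their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def mean_abs_randomized(rbp_values, mrna_values, n_sets=100, seed=0):
    """Average |Spearman r| over n_sets random re-pairings (assumed null model)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sets):
        shuffled = list(mrna_values)
        rng.shuffle(shuffled)
        total += abs(spearman(rbp_values, shuffled))
    return total / n_sets

# Toy positive set: one entry per known RBP-mRNA pair (hypothetical values).
rbp_translation = [5.0, 3.0, 8.0, 1.0, 7.0, 2.0, 6.0, 4.0]   # RBP feature
target_length = [400, 900, 150, 1500, 300, 1200, 350, 700]   # mRNA feature

observed = spearman(rbp_translation, target_length)
baseline = mean_abs_randomized(rbp_translation, target_length)
print(f"positive set r = {observed:.2f}, randomized mean |r| = {baseline:.2f}")
```

A feature pair is of interest when the correlation in the positive set is clearly stronger than the randomized average, which is the pattern the off-diagonal quadrants of the figure highlight.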
Figure 2. A selection of Spearman correlations between features of the budding yeast protein–mRNA pairs. Blue and red lines indicate positive and negative correlations, respectively, with thicker lines indicating stronger correlations. (A) Correlations of protein features, calculated for the 40 budding yeast proteins for which mRNA targets are known (26). (B) Correlations of mRNA features, calculated for all the mRNAs that are RBP targets. (C) Correlations between protein and mRNA features only present in experimentally verified RBP–mRNA interactions. Features are circled in different colors according to the type as indicated. Features and abbreviations are explained in Supplementary Table S1.
Figure 3. Differences in expression correlation and interactions in the positive and negative sets. (A) The average absolute correlation of mRNAs bound by the same RBPs is higher when more RBPs are shared (Spearman r = 0.08, P < 10⁻¹⁶), particularly when eight RBPs bind the same two targets. (B) The absolute expression correlation between RBP and mRNA targets is higher in the positive set of RBP–mRNA pairs (black line) than it is in random pairs (gray circles with gray line showing the average). (C) There are more physical protein interactions between RBP and mRNA pairs in the positive set (black line) than in the randomized sets (gray circles with gray line showing the average). The same analysis carried out on two other interaction data sets is shown in Supplementary Figure S3.
Figure 4. Performance of machine-learning approaches in 2-fold cross validation tests. ROC curves for predictions of budding yeast RBP–mRNA interactions. (A) SVM, 10 ROC curves with average AUC = 0.75. (B) RF, 10 ROC curves with average AUC = 0.77. See ‘Materials and Methods’ section for details on ROC curves.
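For readers unfamiliar with the metric, the sketch below shows how an AUC value can be computed from classifier scores and how a single 2-fold split can be drawn. It is illustrative only: the labels and scores are made-up toy values, the function names are hypothetical, and how the 10 curves per classifier were generated is described in the paper's Materials and Methods, not reproduced here.

```python
import random

def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative; tied scores count as half a win."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def two_fold_split(n_items, seed=0):
    """Shuffle indices once and cut them into two halves; each half serves
    as the test fold while the other is used for training."""
    rng = random.Random(seed)
    idx = list(range(n_items))
    rng.shuffle(idx)
    half = n_items // 2
    return idx[:half], idx[half:]

# Toy example: 1 = interacting pair, 0 = non-interacting pair.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]   # hypothetical classifier probabilities
auc = roc_auc(labels, scores)   # 3 of the 4 positive/negative comparisons are won

fold_a, fold_b = two_fold_split(10)
print(f"AUC = {auc:.2f}; folds: {sorted(fold_a)} / {sorted(fold_b)}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the averages reported in the caption sit between the two.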
Figure 5. Predicting interactions of Nop15p, an RBP without targets in the training set. A total of 51 interactions with data can be predicted, a subset of the set identified in (51). The number in brackets is the threshold of probability for which 360 predictions are obtained, to compare the different methods. Enhanced results are obtained after adding eight positive predictions to the training set. (A) Only six of the 51 targets are predicted. (B) Enhancing the training set with the six interactions that were predicted and experimentally verified in (51), we could predict 32 of the 51 targets. Thicker lines indicate more confident predictions. Targets represented with diamonds are proteins with which Nop15p is known to physically interact. The target represented with a circle is the only mRNA target whose protein does not physically interact with Nop15p.
Figure 6. Diagram reflecting the bias present in the currently available data. This diagram shows why knowing more RBP targets helps to improve the predictions. The feature set for one of our objects is composed of protein and mRNA features, implying that all objects involving the interaction of one RBP with different mRNA targets will have a large set of identical features. Each feature is a coordinate on an axis of a multidimensional space. The 2D space represented here shows two of these axes, with the x-axis corresponding to one RBP feature and the y-axis corresponding to one mRNA feature. It follows that pairs of the same RBP with different mRNAs have the same coordinates on the x-axis. Three different training sets are represented with squares (interacting pairs) and circles (non-interacting pairs). The training defines a boundary between two regions of the 2D space, one for interacting pairs and one for non-interacting pairs. The black triangle represents a new RBP–mRNA pair with known features (i.e. its position), but unknown interaction (i.e. its shape). If we used the boundary established during SVM training shown in (A), we would not be able to conclude whether the new pair interacts. The boundary could include the triangle on either side with equal probability. However, with just one known interaction of this RBP in the training [indicated by the extra square and circle in (B) and (C) at x = 2.5] the boundary would be better defined and we could be more certain that the new pair either interacts (B) or not (C).
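The geometric point of the diagram, that every pair involving the same RBP shares its RBP coordinates, follows directly from how a pair's feature vector is assembled. The sketch below assumes the simplest construction, concatenating the RBP features with the mRNA features. The names and numbers are hypothetical and use one feature per molecule to mirror the 2D picture.

```python
def pair_vector(rbp_features, mrna_features):
    """Feature vector of an RBP-mRNA pair: RBP features followed by mRNA features."""
    return list(rbp_features) + list(mrna_features)

# One hypothetical feature per molecule, so each pair is a point (x, y).
rbp_features = {"RBP_A": [1.0], "RBP_B": [2.5]}
mrna_features = {"mRNA_1": [0.5], "mRNA_2": [2.0], "mRNA_3": [3.5]}

pairs = [(r, m) for r in rbp_features for m in mrna_features]
points = {(r, m): pair_vector(rbp_features[r], mrna_features[m]) for r, m in pairs}

# Collect the distinct x-coordinates seen for each RBP across all its pairs.
x_by_rbp = {}
for (r, _m), (x, _y) in points.items():
    x_by_rbp.setdefault(r, set()).add(x)

for r, xs in x_by_rbp.items():
    print(r, "x-coordinates:", sorted(xs))
```

Each RBP contributes a single vertical line of points, so a training set with no labelled pair on the line of a new RBP leaves the boundary unconstrained at that x-value. This is the situation in panel (A).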