Giovanni Nigita¹, Salvatore Alaimo², Alfredo Ferro³, Rosalba Giugno³, Alfredo Pulvirenti³.
Abstract
RNA editing is a post-transcriptional alteration of RNA sequences that can affect protein structure as well as RNA and protein expression. Adenosine-to-inosine (A-to-I) RNA editing is the most common post-transcriptional modification in humans: deamination of adenosine (A) converts it into inosine (I), which the translation and splicing machineries interpret as guanosine (G). Disruption of the editing machinery has been associated with various human diseases, such as cancer and neurodegenerative diseases. This biological phenomenon is catalyzed by members of the adenosine deaminase acting on RNA (ADAR) family of enzymes and occurs on double-stranded RNA (dsRNA) structures. Despite the enormous efforts made in the last decade, both the actual biological function of this phenomenon and the features of ADAR substrates remain unknown. In this work, we summarize the major computational aspects of predicting and understanding RNA editing events. We also investigate the detection of short motif sequences that potentially characterize RNA editing signals, and the use of logistic regression to model a predictor of RNA editing events. The latter, named AIRlINER (an algorithmic approach to the assessment of A-to-I RNA editing sites in non-repetitive regions), is available as a web app at: http://alpha.dmi.unict.it/airliner/. Results and comparisons with existing methods support our findings on both aspects.
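Because inosine is read as guanosine, an A-to-I editing event typically appears as an A-to-G difference when a sequence derived from RNA (for example, cDNA from RNA-seq reads) is compared with the corresponding genomic sequence. As a minimal sketch, assuming two already aligned, gap-free sequences on the sense strand, the hypothetical helper `candidate_a_to_i_sites` below lists such mismatch positions. It illustrates the signal only and is not the detection procedure used in this work; real pipelines must also filter out SNPs, sequencing errors and low-coverage positions.

```python
def candidate_a_to_i_sites(genomic: str, transcript: str) -> list[int]:
    """Return 0-based positions where an A in the genomic sequence reads as G.

    Hypothetical helper. Assumes both sequences are on the same (sense)
    strand, already aligned, and free of gaps.
    """
    if len(genomic) != len(transcript):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i
        for i, (ref, read) in enumerate(zip(genomic.upper(), transcript.upper()))
        if ref == "A" and read == "G"  # inosine is read as guanosine
    ]


print(candidate_a_to_i_sites("TTAGCATAAG", "TTGGCATGAG"))  # [2, 7]
```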
Keywords: A-to-I RNA editing; ADARs; logistic regression; motif analysis; prediction
Year: 2015 PMID: 25759810 PMCID: PMC4338823 DOI: 10.3389/fbioe.2015.00018
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
Figure 1. Statistics about the . Distribution of editing-site frequency in repetitive (A) and non-repetitive (B) edited regions (ERs). Distribution of sequence length for repetitive (C) and non-repetitive (D) ERs. The figure shows that non-repetitive ERs are shorter than repetitive ones and contain fewer editing sites.
Filtered motifs in ERs (47 edited regions).
| Motif | Sequence | Width (nt) | Type | Significance |
|---|---|---|---|---|
| 1 | CCAGGCTGGAGTGCAGTGGCGCAATCTCA | 29 | Non-palindromic | 1E-126 |
| 2 | GGATTACAGGCGTGAGCCACCGCGCCTGG | 29 | Non-palindromic | 3.60E-123 |
| 3 | GAGGTGCTGGGATTATAGGGG | 21 | Non-palindromic | 8.50E-35 |
| 4 | CCTGACCTCATGAGA | 15 | Non-palindromic | 4.10E-22 |
| 5 | AGACATGGAACCAACCTAAATGCCCACCA | 29 | Non-palindromic | 9.40E-17 |
| 6 | AGGAGGCAAAGGAAG | 15 | Non-palindromic | 7.00E-11 |
| 7 | TGGGATTGCAGGCAT | 15 | Non-palindromic | 1.20E-06 |
| 8 | TTTCATGGCTGCATAGTATTCTATTGTGT | 29 | Non-palindromic | 1.00E-05 |
| 9 | TGTAAATTAGTACAGCCTTTATGGAAAAC | 29 | Non-palindromic | 2.90E-12 |
| 10 | AGTCCCAGCTTCTCGAGAAGCTGGGACT | 28 | Palindromic | 2.7E-97 |
| 11 | TGCACCCCAGGCTGGGGTGCA | 21 | Palindromic | 8.4E-50 |
| 12 | CTTGTACTCCCAACATGTTGGGAGTACAAG | 30 | Palindromic | 5.2E-72 |
| 13 | CTTGAACCTCGGAGGTTCAAG | 21 | Palindromic | 3.9E-28 |
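The table does not say how a motif was labelled palindromic. One reading consistent with every row is a reverse-complement (inverted-repeat) test in which the central base of an odd-length motif is ignored. The sketch below implements that assumed convention; the function names are hypothetical and the criterion is an inference from the table, not the authors' stated definition.

```python
# Assumed convention (not stated in the source): a motif is "palindromic" when its
# left half is the reverse complement of its right half; for odd-length motifs the
# central base is skipped.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True if the left half equals the reverse complement of the right half."""
    seq = seq.upper()
    half = len(seq) // 2
    left, right = seq[:half], seq[len(seq) - half:]  # middle base ignored if odd
    return left == reverse_complement(right)


motifs = {
    4: "CCTGACCTCATGAGA",
    10: "AGTCCCAGCTTCTCGAGAAGCTGGGACT",
    11: "TGCACCCCAGGCTGGGGTGCA",
    13: "CTTGAACCTCGGAGGTTCAAG",
}
for number, motif in motifs.items():
    print(number, "Palindromic" if is_palindromic(motif) else "Non-palindromic")
```

Applied to all 13 motifs, this test reproduces the Type column. A strict `seq == reverse_complement(seq)` check would instead label the odd-length motifs 11 and 13 as non-palindromic, since their central G is not its own complement.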
Figure 2. Neighborhood preferences computed for experimentally verified editing sites in non-repetitive regions (A) and for random sites (B) chosen among those with no reported editing event. The neighborhood preferences are consistent with the upstream nucleotide distribution of editing-site sequence contexts reported in Eggington et al. (2011).
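The caption does not give the exact formula behind the neighborhood preferences. A common way to build such a profile, assumed here, is the relative frequency of each nucleotide at each offset from the site, computed separately for the verified editing sites (A) and the random sites (B) so the two profiles can be compared. The helper `neighborhood_preferences` and the toy windows are hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def neighborhood_preferences(windows: list[str]) -> list[dict[str, float]]:
    """Per-position nucleotide frequencies over equal-length windows.

    Each window is assumed to be centred on the (candidate) editing site, so
    index len(window) // 2 is the adenosine itself.
    """
    if not windows:
        raise ValueError("need at least one window")
    width = len(windows[0])
    if any(len(w) != width for w in windows):
        raise ValueError("all windows must have the same length")
    profile = []
    for pos in range(width):
        counts = Counter(w[pos].upper() for w in windows)
        profile.append({base: counts[base] / len(windows) for base in BASES})
    return profile


# Toy 3-nt windows (offsets -1, 0, +1) around an A; real windows would be longer.
for offset, freqs in zip((-1, 0, 1), neighborhood_preferences(["TAG", "CAG", "TAG", "AAC"])):
    print(offset, freqs)
```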
Confusion matrix computed by applying InosinePredict (Eggington et al., 2011).
| Actual value | Predicted: editing site (%) | Predicted: non-editing site (%) |
|---|---|---|
| Editing sites | 58.48 | 41.52 |
| Random sites | 60.18 | 39.82 |
The editing percentages for each site were divided into two classes (editing/non-editing) using the thresholds defined in Eggington et al. (2011).
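Each row of these confusion matrices sums to 100%, so the values are row-normalized: for each actual class, the percentage of sites called editing or non-editing. The sketch below shows that bookkeeping, assuming per-site scores are turned into binary calls with a single cut-off. The threshold of 50 in the usage line is a placeholder, not one of the thresholds from Eggington et al. (2011), and whether the comparison is `>=` or `>` is also an assumption; the function names are hypothetical.

```python
def binarize(scores: list[float], threshold: float) -> list[bool]:
    """Call a site 'editing' when its score reaches the (hypothetical) threshold."""
    return [score >= threshold for score in scores]


def row_normalized_confusion(
    actual: list[bool], predicted: list[bool]
) -> dict[str, tuple[float, float]]:
    """Rows are actual classes; values are (% called editing, % called non-editing)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    table = {}
    for label, row_name in ((True, "Editing sites"), (False, "Random sites")):
        calls = [p for a, p in zip(actual, predicted) if a == label]
        if not calls:
            raise ValueError(f"no sites in row {row_name!r}")
        called_editing = 100.0 * sum(calls) / len(calls)
        table[row_name] = (called_editing, 100.0 - called_editing)
    return table


actual = [True, True, True, True, False, False, False, False]
scores = [80.0, 65.0, 30.0, 90.0, 20.0, 70.0, 10.0, 40.0]  # toy editing percentages
print(row_normalized_confusion(actual, binarize(scores, threshold=50.0)))
# {'Editing sites': (75.0, 25.0), 'Random sites': (25.0, 75.0)}
```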
Confusion matrix computed by applying AIRlINER to our dataset.
| Actual value | Predicted: editing site (%) | Predicted: non-editing site (%) |
|---|---|---|
| Editing sites | 71.18 | 28.82 |
| Random sites | 34.05 | 65.95 |
All sites with a predicted editing probability > 0.5 were classified as editing; the remaining sites were classified as non-editing.
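The two confusion matrices can be condensed into standard summary figures. Treating the random sites as true negatives (which assumes none of them is actually edited), sensitivity is the editing-sites/editing cell and specificity the random-sites/non-editing cell, so balanced accuracy is (58.48 + 39.82) / 2 = 49.15% for InosinePredict and (71.18 + 65.95) / 2 = 68.565% for AIRlINER. The short script below only redoes that arithmetic on the reported percentages; `REPORTED` and `summarize` are illustrative names.

```python
# Row-normalised confusion matrices copied from the two tables (percent).
REPORTED = {
    "InosinePredict": {"editing": (58.48, 41.52), "random": (60.18, 39.82)},
    "AIRlINER": {"editing": (71.18, 28.82), "random": (34.05, 65.95)},
}


def summarize(matrix: dict[str, tuple[float, float]]) -> tuple[float, float, float]:
    """Return (sensitivity, specificity, balanced accuracy) in percent.

    Treats random sites as true negatives, i.e. assumes none of them is edited.
    """
    sensitivity = matrix["editing"][0]  # editing sites called editing
    specificity = matrix["random"][1]   # random sites called non-editing
    return sensitivity, specificity, (sensitivity + specificity) / 2


for tool, matrix in REPORTED.items():
    sens, spec, bal = summarize(matrix)
    print(f"{tool}: sensitivity {sens:.2f}%, specificity {spec:.2f}%, balanced accuracy {bal:.2f}%")
```

Because InosinePredict calls 60.18% of the random sites editing, its balanced accuracy stays close to the 50% expected from chance-level calls.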
Figure 3. Receiver operating characteristic (ROC) curves computed for the two prediction algorithms. We also provide the ROC curve for a variant of our algorithm (AIRlINER 4 nt), which takes into account only the 4-nt flanking region around an adenosine; this curve allows the two methods to be compared using the same flanking region. AIRlINER shows an average area under the ROC curve (AUC) of 0.7466, whereas InosinePredict achieves an AUC of 0.5072. AIRlINER 4 nt has an AUC of 0.7464.
Figure 4. Comparison between AIRlINER and InosinePredict by means of receiver operating characteristic (ROC) curves computed on the data set built from Bahn et al. (. Here we also show the ROC curve for a variant of the proposed algorithm (AIRlINER 4 nt), which takes into account only the 4-nt flanking region around an adenosine. AIRlINER shows an average area under the ROC curve (AUC) of 0.6763, whereas InosinePredict achieves an AUC of 0.4498. AIRlINER 4 nt has an AUC of 0.6435.
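The AUC values above have a direct probabilistic reading: the AUC equals the probability that a randomly chosen editing site receives a higher score than a randomly chosen random site, with ties counted as one half (the Mann-Whitney formulation). An AUC near 0.5, as for InosinePredict (0.5072 and 0.4498), therefore means the scores barely separate the two classes. The sketch below computes AUC that way; `roc_auc` is a hypothetical helper and the scores are toy values.

```python
def roc_auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the probability that a randomly chosen positive (editing site)
    scores higher than a randomly chosen negative (random site); ties count 1/2.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))  # 8/9, about 0.889
```

The pairwise loop costs O(n·m); for large data sets a rank-based version that sorts all scores once gives the same value more efficiently.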