Pufeng Du, Liyan Jia, Yanda Li.
Abstract
BACKGROUND: RNA editing is a type of post-transcriptional modification of RNA and belongs to the class of mechanisms that contribute to the complexity of transcriptomes. C-to-U RNA editing is commonly observed in plant mitochondria and chloroplasts. The in vivo mechanism of recognizing C-to-U RNA editing sites is still unknown. In recent years, many efforts have been made to computationally predict C-to-U RNA editing sites in the mitochondria of seed plants, but there is still no algorithm available for C-to-U RNA editing site prediction in the chloroplasts of seed plants.
Year: 2009 PMID: 19422723 PMCID: PMC2688514 DOI: 10.1186/1471-2105-10-135
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The summary of the dataset
| Organism | No. of Genes | Total | POS | NEG |
| | 13 | 2284 | 28 | 2256 |
| | 14 | 2885 | 27 | 2858 |
| | 17 | 1960 | 35 | 1925 |
| | 17 | 3712 | 32 | 3680 |
| | 16 | 1921 | 33 | 1888 |
| | 10 | 2362 | 20 | 2342 |
| | 22 | 3802 | 42 | 3760 |
| | 13 | 1658 | 28 | 1630 |
| | 16 | 2839 | 26 | 2813 |
| | 13 | 3311 | 23 | 3288 |
| | 13 | 3294 | 25 | 3269 |
| Overall | 164 | 30028 | 319 | 29709 |
The No. of Genes column is the number of edited genes in the organism, the Total column is the number of all cytidines in the edited genes, the POS column is the number of edited cytidines in the edited genes, and the NEG column is the number of unedited cytidines in the edited genes.
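The Overall row implies a strongly imbalanced dataset, which is worth keeping in mind when reading the accuracy figures that follow. The short check below uses only the three Overall counts from the table; the variable names are illustrative and not taken from the paper.

```python
# Overall row of the dataset summary table (values copied from the table).
total_cytidines = 30028
edited = 319        # POS column
unedited = 29709    # NEG column

# POS and NEG should partition the Total column.
assert edited + unedited == total_cytidines

# Share of cytidines that are edited, and the NEG:POS ratio.
positive_fraction = edited / total_cytidines
neg_per_pos = unedited / edited

print(f"edited fraction: {positive_fraction:.2%}")
print(f"unedited cytidines per edited cytidine: {neg_per_pos:.1f}")
```

With roughly one edited cytidine per hundred, a trivial predictor that labels every site as unedited would already reach an accuracy close to 99%, so BA and MCC are the more informative columns in the tables below.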
The performance of leave-one-species-out cross-validation
| Organism | Sen | Spe | PPV | ACC | BA | MCC |
| | 71.43% | 99.87% | 86.96% | 99.52% | 85.65% | 0.79 |
| | 92.59% | 99.79% | 80.65% | 99.72% | 96.19% | 0.86 |
| | 91.43% | 99.90% | 94.12% | 99.74% | 95.66% | 0.93 |
| | 90.63% | 99.84% | 82.86% | 99.76% | 95.23% | 0.87 |
| | 90.91% | 99.74% | 85.71% | 99.58% | 95.32% | 0.88 |
| | 100.00% | 99.87% | 86.96% | 99.87% | 99.94% | 0.93 |
| | 40.48% | 99.89% | 80.95% | 99.24% | 70.18% | 0.57 |
| | 64.29% | 99.02% | 52.94% | 98.43% | 81.65% | 0.58 |
| | 28.57% | 99.82% | 72.73% | 98.61% | 64.19% | 0.45 |
| | 76.92% | 99.75% | 74.07% | 99.54% | 88.34% | 0.75 |
| | 100.00% | 99.91% | 88.46% | 99.91% | 99.95% | 0.94 |
| | 96.00% | 99.97% | 96.00% | 99.94% | 97.98% | 0.96 |
| Overall | 80.88% | 99.81% | 82.17% | 99.61% | 90.34% | 0.81 |
Sen means sensitivity, Spe means specificity, PPV means positive predictive value, ACC means accuracy, BA means balanced accuracy and MCC means Matthews correlation coefficient. All the values were obtained with leave-one-species-out cross-validation on the training set. The performance marked with "(*)" was obtained using the extended EPES definition. The overall performance was calculated using the "(*)" performance.
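For reference, the six measures can be computed from confusion-matrix counts as in the minimal sketch below. The function name `binary_metrics` is hypothetical, and the example counts (TP=20, FN=8, FP=3, TN=2253) are an assumption: they are not reported in the source but were back-calculated from the first row of the table, using the 28 POS and 2256 NEG sites of the first dataset entry.

```python
import math

def binary_metrics(tp, fn, fp, tn):
    """Compute Sen, Spe, PPV, ACC, BA and MCC from confusion-matrix counts.

    Assumes at least one actual positive, one actual negative and one
    predicted positive, so that the simple ratios are defined.
    """
    sen = tp / (tp + fn)                   # sensitivity (recall)
    spe = tn / (tn + fp)                   # specificity
    ppv = tp / (tp + fp)                   # positive predictive value
    acc = (tp + tn) / (tp + fn + fp + tn)  # accuracy
    ba = (sen + spe) / 2                   # balanced accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sen": sen, "Spe": spe, "PPV": ppv, "ACC": acc, "BA": ba, "MCC": mcc}

# Assumed counts, inferred from the reported percentages (not from the paper).
m = binary_metrics(tp=20, fn=8, fp=3, tn=2253)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts the rounded results agree with the first table row (71.43%, 99.87%, 86.96%, 99.52%, 85.65%, 0.79), which suggests the table uses the standard definitions.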
The performance evaluated on a balanced dataset
| Organism | Sen | Spe | PPV | ACC | BA | MCC |
| | 71.43% | 100.00% | 100.00% | 85.71% | 85.71% | 0.75 |
| | 92.59% | 100.00% | 100.00% | 96.30% | 96.30% | 0.93 |
| | 91.43% | 100.00% | 100.00% | 95.71% | 95.71% | 0.92 |
| | 90.63% | 100.00% | 100.00% | 95.31% | 95.31% | 0.91 |
| | 90.91% | 100.00% | 100.00% | 95.45% | 95.45% | 0.91 |
| | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 1.00 |
| | 40.48% | 100.00% | 100.00% | 70.24% | 70.24% | 0.50 |
| | 64.29% | 100.00% | 100.00% | 82.14% | 82.14% | 0.69 |
| | 76.92% | 100.00% | 100.00% | 88.46% | 88.46% | 0.79 |
| | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 1.00 |
| | 96.00% | 100.00% | 100.00% | 98.00% | 98.00% | 0.96 |
| Overall | 80.88% | 100.00% | 100.00% | 90.44% | 90.44% | 0.82 |
Sen means sensitivity, Spe means specificity, PPV means positive predictive value, ACC means accuracy, BA means balanced accuracy and MCC means Matthews correlation coefficient. On the balanced dataset, the BA always equals the ACC.
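The equality of BA and ACC on a balanced set follows directly from the definitions. Writing $P$ and $N$ for the numbers of positive and negative samples (symbols introduced here for illustration, not used in the source) and assuming $P = N$:

```latex
\begin{aligned}
\mathrm{ACC} &= \frac{TP + TN}{P + N} = \frac{TP + TN}{2P} \\
             &= \frac{1}{2}\left(\frac{TP}{P} + \frac{TN}{N}\right)
              = \frac{\mathrm{Sen} + \mathrm{Spe}}{2} = \mathrm{BA}
\end{aligned}
```

For the first row, for instance, (71.43% + 100.00%) / 2 is approximately 85.7%, in line with both the ACC and BA columns.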
The performance in independent tests
| Test data | Sen | Spe | PPV | ACC | BA | MCC |
| 10% | 86.67% | 99.74% | 68.42% | 99.66% | 93.20% | 0.77 |
| 20% | 88.57% | 99.68% | 72.09% | 99.58% | 94.13% | 0.80 |
| 30% | 79.83% | 99.75% | 77.50% | 99.54% | 89.79% | 0.78 |
Sen means sensitivity, Spe means specificity, PPV means positive predictive value, ACC means accuracy, BA means balanced accuracy and MCC means Matthews correlation coefficient. The Test data column is the percentage of data that has been randomly selected as the test set. The remaining data are used as the training set.
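A random hold-out split of the kind described here could look like the sketch below. This is an assumption-laden illustration: the function `random_holdout_split`, the seed, and the placeholder sample list are all hypothetical, and the source does not say whether the split was stratified by class or by species, so a plain uniform shuffle is used.

```python
import random

def random_holdout_split(samples, test_fraction, seed=0):
    """Randomly select `test_fraction` of the samples as the test set.

    Returns (train, test); the remaining samples form the training set.
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_test = round(len(samples) * test_fraction)
    test_idx = set(indices[:n_test])
    train = [s for i, s in enumerate(samples) if i not in test_idx]
    test = [s for i, s in enumerate(samples) if i in test_idx]
    return train, test

# Placeholder data standing in for the cytidine sites (hypothetical).
samples = list(range(1000))
for fraction in (0.1, 0.2, 0.3):
    train, test = random_holdout_split(samples, fraction)
    print(f"test fraction {fraction:.0%}: {len(train)} train, {len(test)} test")
```

Because positives are rare in this dataset, a purely random split can leave very few edited sites in a small test set; a stratified split would be one way to control that, though the source does not indicate which was used.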
Performance test with the entire genome sequence
| Organism | Sen | Spe | PPV | ACC | BA | MCC |
| | 67.86% | 99.93% | 48.72% | 99.90% | 83.89% | 0.57 |
| | 87.50% | 99.95% | 65.12% | 99.94% | 93.72% | 0.75 |
| | 50.00% | 99.23% | 7.29% | 99.17% | 74.61% | 0.19 |
| | 84.00% | 99.94% | 58.33% | 99.93% | 91.97% | 0.70 |
| | 72.57% | 99.79% | 26.45% | 99.76% | 86.18% | 0.44 |
Sen means sensitivity, Spe means specificity, PPV means positive predictive value, ACC means accuracy, BA means balanced accuracy and MCC means Matthews correlation coefficient.
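The combination of very high Spe and low PPV in this table is what one would expect when positives are rare, since PPV depends on the prevalence of edited sites as well as on Sen and Spe. The sketch below applies the standard identity PPV = Sen·p / (Sen·p + (1 − Spe)·(1 − p)); the function name and the prevalence values are hypothetical and chosen only for illustration, as the source does not report whole-genome site counts.

```python
def ppv_from_rates(sen, spe, prevalence):
    """Positive predictive value implied by Sen, Spe and class prevalence."""
    true_pos = sen * prevalence                # expected true-positive rate
    false_pos = (1 - spe) * (1 - prevalence)   # expected false-positive rate
    return true_pos / (true_pos + false_pos)

# Sen and Spe taken from the last row of the table; prevalences are assumed.
sen, spe = 0.7257, 0.9979
for prevalence in (0.01, 0.001):
    ppv = ppv_from_rates(sen, spe, prevalence)
    print(f"prevalence {prevalence:.3f}: PPV = {ppv:.2%}")
```

Lowering the assumed prevalence by a factor of ten reduces PPV sharply even though Sen and Spe are unchanged, which is consistent with the drop in PPV relative to the cross-validation results on edited genes only.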
Figure 1. Phylogenetically skewed knowledge of chloroplast C-to-U RNA editing sites. Current knowledge of chloroplast C-to-U RNA editing sites is phylogenetically skewed. The performance of CURE-Chloroplast on different lineages of seed plants is associated with the abundance of data relating to that lineage. The column "# organisms" refers to the number of organisms in the corresponding lineage. The column "# genes" refers to the total number of edited genes.