Jonatha M Gott, Neeta Parimi, Ralf Bundschuh.
Abstract
Gene finding is complicated in organisms that exhibit insertional RNA editing. Here, we demonstrate how our new algorithm Predictor of Insertional Editing (PIE) can be used to locate genes whose mRNAs are subjected to multiple frameshifting events, and extend the algorithm to include probabilistic predictions for sites of nucleotide insertion; this feature is particularly useful when designing primers for sequencing edited RNAs. Applying this algorithm, we successfully identified the nad2, nad4L, nad6 and atp8 genes within the mitochondrial genome of Physarum polycephalum, which had gone undetected by existing programs. Characterization of their mRNA products led to the unanticipated discovery of nucleotide deletion editing in Physarum. The deletion event, which removes three adjacent A residues, was confirmed by primer extension sequencing of total RNA. This finding is remarkable because it is the first known instance of nucleotide deletion in this organelle, in contrast to nearly 500 sites of single and dinucleotide addition in characterized mitochondrial RNAs. Statistical analysis of this larger pool of editing sites reveals significant biases in the 2 nt immediately upstream of editing sites, including a reduced incidence of nucleotide repeats, in addition to the previously identified purine-U bias.
Year: 2005 PMID: 16147990 PMCID: PMC1201332 DOI: 10.1093/nar/gki820
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 5. Primer extension sequencing of nad2 DNA and mRNA showing the region encompassing the triple A deletion. Arrowheads indicate the A residues present in the DNA but missing from the bulk RNA. The missing As are indicated by a thick line at the right; inserted Cs are marked with asterisks.
Figure 1. Schematic description of the PIE algorithm. A position-specific scoring matrix generated from protein alignments of known sequences (left) is compared with the family of translation products that could potentially be generated by insertional editing throughout the gene of interest (right). See text and ref. (10) for details.
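The comparison in Figure 1 can be illustrated with a toy dynamic program. This is a minimal sketch under simplifying assumptions of our own, not the published PIE implementation described in ref. (10): each codon is either three genomic nucleotides or two genomic nucleotides plus one inserted C, every insertion costs a fixed penalty, internal stop codons are disallowed, and the standard genetic code is used. The function name `best_editing`, its parameters and the toy sequence are hypothetical.

```python
from itertools import product
import math

# Standard genetic code (NCBI table 1), assumed here for illustration only.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def best_editing(genome, pssm, insert_penalty=2.0, missing=-4.0):
    """Hypothetical toy scorer: best PSSM score over single-C editing variants.

    genome: DNA string (A, C, G, T) read in frame from index 0.
    pssm: list of {amino_acid: score} dicts, one per protein position;
          residues absent from a column score `missing`.
    Returns (score, sites); each site is the genomic index before which
    a C is inserted.
    """
    genome = genome.upper()
    n, m = len(genome), len(pssm)
    neg = -math.inf
    # score[i][j]: best score after consuming i nucleotides and j PSSM columns
    score = [[neg] * (m + 1) for _ in range(n + 1)]
    back = {}
    score[0][0] = 0.0
    for j in range(m):
        for i in range(n + 1):
            if score[i][j] == neg:
                continue
            # Unedited codon: three genomic nucleotides.
            moves = [(3, genome[i:i + 3], None)]
            # Edited codon: two genomic nucleotides plus one C at codon position p.
            pair = genome[i:i + 2]
            if len(pair) == 2:
                moves += [(2, pair[:p] + "C" + pair[p:], i + p) for p in range(3)]
            for step, codon, site in moves:
                if len(codon) < 3:
                    continue
                aa = CODE[codon]
                if aa == "*":  # no internal stop codons in this toy model
                    continue
                s = score[i][j] + pssm[j].get(aa, missing)
                if site is not None:
                    s -= insert_penalty
                if s > score[i + step][j + 1]:
                    score[i + step][j + 1] = s
                    back[(i + step, j + 1)] = (i, site)
    end = max(range(n + 1), key=lambda i: score[i][m])
    if score[end][m] == neg:
        return neg, []
    # Trace back through the stored predecessors to collect insertion sites.
    sites, i = [], end
    for j in range(m, 0, -1):
        i, site = back[(i, j)]
        if site is not None:
            sites.append(site)
    return score[end][m], sorted(sites)


# Toy example: the gene reads ATG TTC TA unedited, but a C inserted before
# genomic index 3 gives ATG CTT CTA, i.e. M-L-L.
print(best_editing("ATGTTCTA", [{"M": 5}, {"L": 5}, {"L": 5}]))  # (13.0, [3])
```

Because every column is scored independently of the others, the best variant over exponentially many insertion patterns is found in time proportional to sequence length times protein length.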
Figure 2. Characterization of the cox2 mRNA. The region of the Physarum polycephalum mitochondrial genome that contains the cox2 gene is shown, with the predicted sites of C insertion shown below. Note that only one C is expected to be added in any given cluster; relative probabilities of insertion at each site are indicated (see scale at the bottom). The experimentally determined editing sites are shown above the nucleotide in the genomic sequence that lies immediately 5′ to the inserted C. Note that when an inserted C lies above an encoded C, the exact site of C insertion is ambiguous. Numbers refer to genomic coordinates from ref. (5).
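Two ideas from the Figure 2 legend, one C per cluster with relative probabilities per site and ambiguity when the inserted C sits next to an encoded C, can be sketched together. This assumes, hypothetically, that each candidate site already carries a log-score from some model and that probabilities come from softmax normalization within the cluster; sites that produce the same edited sequence are pooled because sequencing cannot tell them apart. The function names and scores are illustrative and not part of PIE.

```python
import math
from collections import defaultdict


def insert_c(seq, site):
    """Return seq with one C inserted before index `site`."""
    return seq[:site] + "C" + seq[site:]


def cluster_probabilities(seq, site_scores):
    """Hypothetical: softmax over candidate sites in one cluster (one C per cluster).

    site_scores maps a site index to a log-score. Sites whose insertion yields
    an identical sequence are pooled into one ambiguous group.
    """
    groups = defaultdict(list)
    for site in sorted(site_scores):
        groups[insert_c(seq, site)].append(site)
    top = max(site_scores.values())  # subtract the maximum for numerical stability
    weights = {
        tuple(sites): sum(math.exp(site_scores[s] - top) for s in sites)
        for sites in groups.values()
    }
    total = sum(weights.values())
    return {sites: w / total for sites, w in weights.items()}


# Inserting a C before index 2 or 3 of GUCA gives GUCCA either way, so those
# two sites are indistinguishable; the site before index 1 gives GCUCA.
print(cluster_probabilities("GUCA", {1: 0.0, 2: 0.0, 3: 0.0}))
# {(1,): 0.3333333333333333, (2, 3): 0.6666666666666666}
```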
Figure 3. Editing sites within the polycistronic atp8/nad4L mRNA. Genomic (mtDNA) and RNA (cDNA) sequences are shown. Conceptual translation products are shown, with start and stop codons underlined. The incorrectly predicted atp8 stop codon is indicated by a dotted underline. Oligonucleotide primers mentioned in the text are indicated by a double underline. Numbers refer to genomic coordinates from ref. (5).
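Figure 3 contrasts genomic and cDNA sequences through their conceptual translations, and shows how an unrecognized editing site can produce a wrong stop codon. The sketch below applies a list of edits (C insertions and, by analogy with the triple A deletion in nad2, a deletion) to a made-up genomic sequence and translates the result. It assumes the standard genetic code, non-overlapping edits in 0-based genomic coordinates, and hypothetical function names and sequences.

```python
from itertools import product

# Standard genetic code (NCBI table 1), assumed here purely for illustration.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def apply_edits(genome, edits):
    """Apply non-overlapping edits given in genomic (0-based) coordinates.

    edits: list of ("ins", index, bases) or ("del", index, length) tuples.
    Applying from the highest index down keeps earlier coordinates valid.
    """
    seq = genome
    for kind, index, arg in sorted(edits, key=lambda e: e[1], reverse=True):
        if kind == "ins":
            seq = seq[:index] + arg + seq[index:]
        elif kind == "del":
            seq = seq[:index] + seq[index + arg:]
        else:
            raise ValueError(f"unknown edit type: {kind}")
    return seq


def translate(seq):
    """Translate frame 0 until the first stop codon or the end of seq."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODE[seq[i:i + 3].upper().replace("U", "T")]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


genome = "ATGTAAAAGGCTAA"
edits = [("ins", 3, "C"), ("del", 5, 3)]  # insert one C; delete three adjacent A
print(translate(genome))  # M  (the unedited frame reaches TAA after one codon)
edited = apply_edits(genome, edits)
print(edited, translate(edited))  # ATGCTAGGCTAA MLG
```

Inserting the C alone gives ATG CTA AAA GGC TAA (M-L-K-G); the three-nucleotide deletion then removes one codon without shifting the frame.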
Figure 4. Editing sites within the nad6 mRNA. Notations are as in the legend to Figure 3, except that the cox3 start codon is indicated by a double underline.
Total observed editing events
| | Previous coding | Total coding | Stable RNA | Previous total | Total |
|---|---|---|---|---|---|
| Editing sites | 250 | 390 | 107 | 357 | 497 |
| C insertion | 222 | 353 | 97 | 319 | 450 |
| Unambiguous | 140 | 227 | 66 | 206 | 293 |
Numbers of editing events in the mRNAs characterized before our study (Previous coding: nad7, cox1, cox3, cytb, atp1 and atp9), in all mRNAs including the ones studied here (Total coding: adds cox2, nad2, nad4L, nad6 and atp8), and in the stable RNAs (Stable RNA), together with the overall totals before our study (Previous total) and including our results (Total).
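The columns of this table are related by simple sums, which also give the number of events contributed by the five newly characterized mRNAs. The short check below uses only the counts in the table; the variable names are ours.

```python
# Counts from the "Total observed editing events" table.
previous_coding = {"editing sites": 250, "C insertion": 222, "unambiguous": 140}
total_coding = {"editing sites": 390, "C insertion": 353, "unambiguous": 227}
stable_rna = {"editing sites": 107, "C insertion": 97, "unambiguous": 66}
previous_total = {"editing sites": 357, "C insertion": 319, "unambiguous": 206}
total = {"editing sites": 497, "C insertion": 450, "unambiguous": 293}

for row in total:
    # Each total is the coding count plus the stable-RNA count.
    assert previous_total[row] == previous_coding[row] + stable_rna[row]
    assert total[row] == total_coding[row] + stable_rna[row]

# Events contributed by the five mRNAs characterized in this study.
new_in_study = {row: total_coding[row] - previous_coding[row] for row in total_coding}
print(new_in_study)  # {'editing sites': 140, 'C insertion': 131, 'unambiguous': 87}
```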
Codons created by C insertions
| Codon | A | U | G | C |
|---|---|---|---|---|
| AXC | 5 (4) | 76 (49) | 10 (5) | 40 (29) |
| ACX | 0 (0) | 0 (0) | 0 (0) | |
| UXC | 4 (2) | 11 (7) | 1 (1) | 12 (8) |
| UCX | 6 (1) | 11 (3) | 0 (0) | |
| GXC | 3 (2) | 33 (23) | 0 (0) | 29 (17) |
| GCX | 1 (1) | 5 (4) | 1 (0) | |
| CAX | 8 (5) | 3 (3) | 1 (1) | 2 (1) |
| CUX | 17 (12) | 12 (5) | 4 (2) | 5 (1) |
| CGX | 4 (3) | 6 (6) | 0 (0) | 1 (0) |
| CCX | 14 (9) | 12 (7) | 2 (2) | 2 (1) |
Counts cover all 11 characterized mRNAs; numbers in parentheses are the counts for the six previously characterized mRNAs alone. In each row, X stands for the nucleotide heading the column.
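Reading each row as a pattern in which X takes the base heading the column, the blank cells (ACC, UCC and GCC in the second-position rows) are codons already listed in another row, and the table then enumerates each of the 37 codons containing at least one C exactly once. The sketch below expands the patterns using only the table's counts and checks that reading; the names `ROWS` and `expand` are ours.

```python
from itertools import product

# (all 11 mRNAs, six previously characterized mRNAs) for X = A, U, G, C;
# None marks a blank cell in the table.
COLUMNS = "AUGC"
ROWS = {
    "AXC": [(5, 4), (76, 49), (10, 5), (40, 29)],
    "ACX": [(0, 0), (0, 0), (0, 0), None],
    "UXC": [(4, 2), (11, 7), (1, 1), (12, 8)],
    "UCX": [(6, 1), (11, 3), (0, 0), None],
    "GXC": [(3, 2), (33, 23), (0, 0), (29, 17)],
    "GCX": [(1, 1), (5, 4), (1, 0), None],
    "CAX": [(8, 5), (3, 3), (1, 1), (2, 1)],
    "CUX": [(17, 12), (12, 5), (4, 2), (5, 1)],
    "CGX": [(4, 3), (6, 6), (0, 0), (1, 0)],
    "CCX": [(14, 9), (12, 7), (2, 2), (2, 1)],
}


def expand(rows):
    """Map each concrete codon to its counts by substituting X with the column base."""
    cells = {}
    for pattern, values in rows.items():
        for base, value in zip(COLUMNS, values):
            if value is None:
                continue
            codon = pattern.replace("X", base)
            if codon in cells:
                raise ValueError(f"codon {codon} appears twice")
            cells[codon] = value
    return cells


cells = expand(ROWS)
c_codons = {"".join(c) for c in product("AUGC", repeat=3) if "C" in c}
assert set(cells) == c_codons  # every C-containing codon appears exactly once
print(len(cells), sum(t for t, _ in cells.values()), sum(p for _, p in cells.values()))
# 37 341 214
print(max(cells, key=lambda codon: cells[codon][0]))  # AUC
```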
Correlation between the two positions immediately preceding the editing site
| −2\−1 | A | U | G | Total |
|---|---|---|---|---|
| A | 6 (14) | 122 (115) | 13 (11) | 141 (62%) |
| U | 9 (3) | 16 (26) | 6 (2) | 31 (14%) |
| G | 6 (5) | 43 (39) | 0 (4) | 49 (21%) |
| C | 2 (1) | 4 (6) | 0 (1) | 6 (3%) |
| Total | 23 (10%) | 185 (82%) | 19 (8%) | 227 (100%) |
Main entries are the observed counts of each pair of bases at positions −2 (rows) and −1 (columns) relative to the editing site; numbers in parentheses are the counts expected if the two positions were independent, calculated from the percentages of the totals at each individual position.
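The expected counts follow from the row and column totals under independence (row total × column total / 227). The sketch below computes them exactly and adds a Pearson chi-square statistic as one common summary of the departure from independence; the choice of test and the names used are ours and not necessarily the analysis performed in the study.

```python
# Observed pairs from the table: rows = base at -2, columns = base at -1.
ROWS = "AUGC"
COLS = "AUG"  # the table has no C column at position -1
OBSERVED = [
    [6, 122, 13],
    [9, 16, 6],
    [6, 43, 0],
    [2, 4, 0],
]


def expected_counts(observed):
    """Expected counts if the two positions were independent: row * col / N."""
    row_totals = [sum(r) for r in observed]
    col_totals = [sum(c) for c in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed, expected):
    """Pearson chi-square statistic summed over all cells."""
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))


expected = expected_counts(OBSERVED)
stat = chi_square(OBSERVED, expected)
dof = (len(ROWS) - 1) * (len(COLS) - 1)
for i, base in enumerate(COLS):  # same base at -2 and -1, i.e. a repeat
    print(f"{base}{base}: observed {OBSERVED[i][i]}, expected {expected[i][i]:.1f}")
# AA: observed 6, expected 14.3
# UU: observed 16, expected 25.3
# GG: observed 0, expected 4.1
print(f"chi-square = {stat:.1f} with {dof} degrees of freedom")
# chi-square = 32.6 with 6 degrees of freedom
```

Several expected counts are below 5 (every cell in the C row, for instance), so the chi-square approximation is rough here and an exact test would be the more careful choice. The exact products also differ by up to about one count from some parenthesized values (U followed by U: about 25.3 versus 26), presumably because the table worked from the rounded percentages.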