Donna Koslowsky, Yanni Sun, Jordan Hindenach, Terence Theisen, Jasmin Lucas.
Abstract
One of the most striking examples of small RNA regulation of gene expression is the process of RNA editing in the mitochondria of trypanosomes. In these parasites, RNA editing involves extensive uridylate insertions and deletions within most of the mitochondrial messenger RNAs (mRNAs). Over 1200 small guide RNAs (gRNAs) are predicted to be responsible for directing the sequence changes that create start and stop codons, correct frameshifts and, for many of the mRNAs, generate most of the open reading frame. In addition, alternative editing creates the opportunity for unprecedented protein diversity. In Trypanosoma brucei, the vast majority of gRNAs are transcribed from minicircles, which are approximately one kilobase in size and encode between three and four gRNAs. The large number of minicircles (5000-10,000) and their concatenated structure make them difficult to sequence. To identify the complete set of gRNAs necessary for mRNA editing in T. brucei, we used Illumina deep sequencing of purified gRNAs from the procyclic stage. We report a near complete set of gRNAs needed to direct the editing of the mRNAs.
Year: 2013 PMID: 24174546 PMCID: PMC3919587 DOI: 10.1093/nar/gkt973
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
The major gRNA classes involved in the editing of ATPase 6
| mRNA 5′ | mRNA 3′ | Copy no. | ATPase 6: Major gRNA classes |
|---|---|---|---|
| 24 | 72 | 6 | |
| 29 | 72 | 1630 | |
| ✰31 | 75 | 2044 | |
| ✰31 | 75 | 143 | |
| ✰62 | 102 | 1435 | |
| ✰84 | 127 | 4 | |
| ✰86 | 127 | 2158 | |
| ✰105 | 152 | 54 | |
| ✰113 | 152 | 743 | |
| ✰135 | 183 | 1 | |
| ✰138 | 183 | 430 | |
| 144 | 177 | 210 | |
| ✰158 | 208 | 14 | |
| 164 | 208 | 54 | |
| ✰176 | 210 | 36 | |
| ✰165 | 208 | 24 | |
| ✰189 | 243 | 3 | |
| ✰192 | 243 | 172 | |
| ✰218 | 248 | 147 | |
| ✰224 | 269 | 864 | |
| ✰226 | 269 | 808 | |
| ✰249 | 299 | 4 | |
| ✰248 | 292 | 25 157 | |
| ✰252 | 292 | 134 | |
| ✰255 | 292 | 244 | |
| 253 | 298 | 5405 | |
| ✰266 | 313 | 586 | |
| 300 | 346 | 263 | |
| ✰301 | 345 | 647 | |
| ✰301 | 335 | 125 | |
| ✰331 | 375 | 24 736 | |
| ✰331 | 375 | 712 | |
| ✰332 | 378 | 8776 | |
| 331 | 371 | 3561 | |
| 331 | 374 | 302 | |
| ✰332 | 378 | 387 | |
| ✰332 | 378 | 144 | |
| ✰349 | 389 | 41 | |
| ✰360 | 407 | 181 | |
| 387 | 435 | 1428 | |
| ✰387 | 435 | 1049 | |
| ✰387 | 437 | 934 | |
| 413 | 461 | 3 | |
| ✰424 | 464 | 25 624 | |
| ✰427 | 467 | 2307 | |
| ✰421 | 460 | 1864 | |
| 424 | 457 | 368 | |
| ✰455 | 491 | 368 | |
| ✰455 | 497 | 22 | |
| ✰487 | 528 | 1 | |
| ✰487 | 526 | 8723 | |
| ✰521 | 567 | 232 | |
| ✰546 | 593 | 15 | |
| ✰549 | 593 | 2587 | |
| ✰549 | 592 | 181 | |
| ✰557 | 593 | 69 619 | |
| 568 | 611 | 670 | |
| 583 | 629 | 2 | |
| 589 | 629 | 854 | |
| 612 | 657 | 3 | |
| 613 | 654 | 618 | |
| ✰613 | 657 | 183 | |
| ✰638 | 689 | 5 | |
| ✰640 | 689 | 39 063 | |
| ✰647 | 689 | 678 | |
| ✰640 | 689 | 131 | |
| ✰654 | 689 | 234 | |
| ✰672 | 716 | 119 | |
| ✰680 | 714 | 7581 | |
| ✰680 | 714 | 105 | |
| ✰680 | 719 | 1291 | |
| ✰686 | 728 | 12 | |
| ✰698 | 728 | 740 | |
| ✰715 | 760 | 2 | |
| ✰715 | 755 | 2272 | |
| ✰720 | 767 | 4588 | |
| ✰720 | 767 | 165 | |
| ✰728 | 765 | 920 | |
| ✰747 | 789 | 13 | |
| 770 | 822 | 1 | |
| ✰774 | 822 | 8663 | |
The gRNA is identified by its complementarity to the 5′ (column 1) and 3′ (column 2) number of the fully edited mRNA (+1 = 0). The gRNAs were sorted based on both mRNA regions covered and on guiding sequence class. Sequence variations observed in both the 5′ non-complementary region and the 3′ U-tail were ignored in assigning sequence classes. Transcript copy numbers (column 3) were determined by adding all gRNAs of the same sequence class. Major sequence classes were defined as containing greater than 100 transcript copies. The exact sequence shown is of the most abundant transcript in each sequence class. In the case of rare gRNA transcripts, the identified gRNAs are shown regardless of copy number. The asterisk indicates the highest scoring gRNAs identified for each population. Starred (✰) gRNAs indicate novel gRNAs not found in the KISS database (http://splicer.unibe.ch/kiss). A total of 84% of the gRNAs identified in this study are ‘novel gRNAs’, not previously identified.
(a) Identified highest scoring gRNA (‘Best align’).
(b) mRNA 5′ border based on alternative sequence.
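The tallying rule in the legend (sum all transcripts of a sequence class, then call the class major if it exceeds 100 copies) can be sketched in Python. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the class labels and counts are toy values, and the step that assigns a transcript to a class (ignoring 5′ and U-tail variation) is assumed to have happened already.

```python
from collections import defaultdict

# Threshold from the legend: "greater than 100 transcript copies".
MAJOR_THRESHOLD = 100


def tally_sequence_classes(transcripts):
    """Sum copy numbers per sequence class.

    `transcripts` is an iterable of (class_label, copies) pairs. The label is
    a hypothetical identifier for the guiding-sequence class.
    """
    totals = defaultdict(int)
    for class_label, copies in transcripts:
        totals[class_label] += copies
    return dict(totals)


def major_classes(totals, threshold=MAJOR_THRESHOLD):
    """Keep only classes with strictly more than `threshold` copies."""
    return {label: n for label, n in totals.items() if n > threshold}


# Toy input (invented values, for illustration only).
toy = [("classA", 60), ("classA", 41), ("classB", 100), ("classC", 3)]
totals = tally_sequence_classes(toy)
print(totals)
print(major_classes(totals))
```

Note the strict comparison: a class with exactly 100 copies is not counted as major under the legend's wording.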
Figure 1. The gRNA-mRNA sequence alignment for fully edited ATPase 6. The cDNA sequence of the most abundant gRNA in its sequence class is shown aligned beneath the fully edited mRNA. Lowercase u’s indicate uridines added by editing; asterisks indicate encoded uridines deleted during editing. Nucleotides and deletion sites in the fully edited mRNA were numbered starting from the 5′ end (+1 = 0). gRNAs are colored (on-line version only) based on transcript abundance as follows: Blue < 100; Green < 1000; Purple < 10 000; Orange < 100 000; Red > 100 000; Black = not quantified. Watson-Crick (|) and G:U base pairs (:) are indicated. Mismatches are indicated by the number sign (#) and shown in a contrasting color. The potential mRNA sequence generated by a more abundant alternative A6 initiating gRNA is also shown. Although this gRNA would introduce a number of sequence changes (indicated in red online) downstream of the stop codon (underlined), the generated anchor sequence for the next gRNA is maintained.
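The pairing symbols used in the alignment (| for Watson-Crick, : for G:U, # for a mismatch) can be reproduced with a short Python sketch. It assumes the two strings are already aligned position by position, with the mRNA written 5′ to 3′ and the gRNA written 3′ to 5′ beneath it; the function name and the toy sequences are hypothetical, and deletion-site asterisks are not handled.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def to_rna(seq):
    """Uppercase the sequence and read cDNA T as RNA U."""
    return seq.upper().replace("T", "U")


def pairing_line(mrna, grna_aligned):
    """Return one symbol per aligned position: '|', ':' or '#'."""
    m, g = to_rna(mrna), to_rna(grna_aligned)
    if len(m) != len(g):
        raise ValueError("sequences must be aligned to the same length")
    symbols = []
    for pair in zip(m, g):
        if pair in WATSON_CRICK:
            symbols.append("|")
        elif pair in WOBBLE:
            symbols.append(":")
        else:
            symbols.append("#")
    return "".join(symbols)


# Toy example (invented sequences); lowercase u marks an inserted uridine.
mrna = "AuuGuC"
grna = "UAGCAA"
print(mrna)
print(pairing_line(mrna, grna))
print(grna)
```

In this toy case the last position is a C facing an A, which is reported as a mismatch (#).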
Figure 2. Example of sequential 5′ end truncations suggestive of a 5′–3′ exonuclease activity. The gA6 (248–292) sequence class was large (∼25 000 transcripts), containing a large number of transcripts with both 5′ truncations and sequence variations in the U-tail (length of U-tail and U-tail punctuated with other nucleotides). The transcript numbers reported are for the specific sequence shown (i.e. 6788 cDNAs with a T-13 tail).
Major gRNA classes for the COIII 699–753 region
| mRNA 5′ | mRNA 3′ | Copy number | Sequence group | Major gRNA sequence classes for the COIII 699–753 region |
|---|---|---|---|---|
| 699 | 748 | 977 | | |
| 701 | 748 | 2726 | | |
| 706 | 753 | 31 331 | 1 | |
| 707 | 753 | 8037 | 1 | |
| 713 | 753 | 595 | 1 | |
| 715 | 753 | 182 | 1 | |
| 719 | 753 | 131 | 2 | |
| 707 | 753 | 206 | 3 | |
| 706 | 753 | 6744 | 4 | |
| 707 | 753 | 3791 | 4 | |
| 713 | 753 | 154 | 4 | |
| 706 | 753 | 1214 | 5 | |
| 707 | 753 | 1124 | 5 | |
| 707 | 752 | 848 | 6 | |
Fourteen major sequence classes that fall into two distinct populations were identified that could guide the editing of the 699–753 region. The two populations are shifted by 5 nt in their guiding regions and have 13 nt differences in the overlap region (all R to R and Y to Y changes allowing them to guide the generation of the same sequence). The 706–753 population can be further divided into six major sequence groups as indicated. Within each group the different sequence classes differ in the position of the polyU tail (changing the mRNA 5′ border). The different groups, however, are defined by distinct differences in the sequence of the guiding region (nucleotide changes shown in bold and underlined).
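The "R to R and Y to Y" condition can be checked mechanically. Because G:U pairs are allowed, a guide A or G can each face an mRNA U, and a guide C or U can each face an mRNA G, so purine-for-purine and pyrimidine-for-pyrimidine substitutions in a guide can still specify the same edited sequence. The Python sketch below tests whether two aligned guide sequences differ only by such changes; the function names and the toy sequences are hypothetical and are not taken from the table.

```python
PURINES = set("AG")        # R
PYRIMIDINES = set("CU")    # Y


def base_class(base):
    """Return 'R' for purines and 'Y' for pyrimidines."""
    if base in PURINES:
        return "R"
    if base in PYRIMIDINES:
        return "Y"
    raise ValueError(f"unexpected base: {base!r}")


def differences(a, b):
    """List (position, base_a, base_b) where two aligned guides differ."""
    a = a.upper().replace("T", "U")
    b = b.upper().replace("T", "U")
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def only_class_preserving(a, b):
    """True if every difference is R-to-R or Y-to-Y."""
    return all(base_class(x) == base_class(y) for _, x, y in differences(a, b))


# Toy guides (invented): A->G and C->U changes only.
print(differences("AGCUA", "GGUUA"))
print(only_class_preserving("AGCUA", "GGUUA"))
# A purine-to-pyrimidine change breaks the condition.
print(only_class_preserving("AGCUA", "CGCUA"))
```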
Figure 3. Length of gRNA complementarity to fully edited mRNAs. The shortest and longest gRNAs identified had 24 nt and 61 nt of complementarity to their fully edited mRNA. The bulk of the sequence classes (64%) had 38–48 nt of complementarity.
Figure 4. gRNA characteristics: number of non-matched nucleotides found 5′ to the anchor (A) or 3′ to the guiding region, excluding the U-tail (B). Most of the identified gRNAs had few non-complementary nucleotides.
Figure 5. Long 5′ non-matched extensions often signal editing anomalies. gND7 (147–199) and gND7 (152–190) were both identified as initiating gRNAs for the 5′ editing domain of ND7. (A) The sequence of the two gRNAs is shown aligned below the edited ND7 mRNA sequence. A single T insertion (bold and underlined) disrupts the anchor of gND7 (152–190). (B) The putative alternative sequence generated if editing initiates with gND7 (152–190). Watson–Crick (|) and G:U base pairs (:) are indicated. The number sign (#) indicates C:A base pairs required for generation of this possible sequence.
Most common gRNA initiation sequences
| Initiating Sequence | Number of sequence classes | % | Number of transcripts | % |
|---|---|---|---|---|
| 5′ ATATAT | 214 | 35.2% | 1 320 726 | 37.4% |
| 5′ ATATAA | 128 | 21.1% | 870 728 | 24.7% |
| 5′ AAAAAA | 28 | 4.6% | 45 540 | 1.3% |
| 5′ ATATAC | 26 | 4.3% | 134 862 | 3.8% |
| 5′ ATACAA | 17 | 2.8% | 60 428 | 1.7% |
| 5′ ATATTA | 16 | 2.6% | 31 779 | 0.9% |
| 5′ ATATAG | 15 | 2.4% | 269 306 | 7.6% |
| 5′ ATAAAT | 15 | 2.6% | 23 669 | 0.7% |
| 5′ ATACAT | 13 | 2.2% | 80 453 | 2.2% |
| 5′ ATAAAA | 12 | 2.1% | 38 907 | 1.1% |
| 5′ ATAAAG | 9 | 1.5% | 176 446 | 5.0% |
| 5′ ATACTA | 8 | 1.4% | 58 797 | 1.7% |
The major sequence classes identified were grouped based on the first 6 nt and sorted based on both the number of sequence classes and the total number of transcripts found. Most transcripts (∼74%) initiated with ATATA, which includes ATATAT (37.4%), ATATAA (24.7%), ATATAC (3.8%) and ATATAG (7.6%).
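The ATATA share quoted in the legend can be recomputed from the table by grouping the 6-nt initiating sequences on their first 5 nt. The Python sketch below copies the transcript counts and percentages from the table; the function name is hypothetical. The four ATATA rows sum to 73.5% of transcripts, consistent with the rounded ∼74% in the legend.

```python
from collections import defaultdict

# Initiating sequence -> (number of transcripts, % of transcripts), from the table.
ROWS = {
    "ATATAT": (1_320_726, 37.4),
    "ATATAA": (870_728, 24.7),
    "AAAAAA": (45_540, 1.3),
    "ATATAC": (134_862, 3.8),
    "ATACAA": (60_428, 1.7),
    "ATATTA": (31_779, 0.9),
    "ATATAG": (269_306, 7.6),
    "ATAAAT": (23_669, 0.7),
    "ATACAT": (80_453, 2.2),
    "ATAAAA": (38_907, 1.1),
    "ATAAAG": (176_446, 5.0),
    "ATACTA": (58_797, 1.7),
}


def by_prefix(rows, k=5):
    """Sum transcript counts and percentages over the first k nucleotides."""
    counts = defaultdict(int)
    percents = defaultdict(float)
    for seq, (n_transcripts, pct) in rows.items():
        counts[seq[:k]] += n_transcripts
        percents[seq[:k]] += pct
    # Round to one decimal place to match the precision of the table.
    return {p: (counts[p], round(percents[p], 1)) for p in counts}


summary = by_prefix(ROWS)
for prefix, (n_transcripts, pct) in sorted(summary.items(), key=lambda kv: -kv[1][0]):
    print(prefix, n_transcripts, pct)
```

Only the twelve listed initiating sequences are included, so the percentages here do not sum to 100.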
Figure 6. Identification of gRNAs that initiate with long A-runs. (A) Examples of identified gRNAs initiating with long A-runs. Sequence complementary to the fully edited mRNA is underlined. (B) Alignment of gCR4(405–548) with its corresponding edited mRNA. (C) Alignment of gA6(521–567) with its corresponding edited mRNA. The gRNA cDNA sequence is shown aligned beneath the fully edited mRNA as described in Figure 1. The partial sequence of the downstream gRNA that directs the creation of the anchor-binding site is also shown. For both of these gRNAs, the anchor interaction (bold font) involves a single G:C base pair. Two downstream gRNAs were identified that direct editing of the A6 550–570 region. gA6(549–593) would direct the insertion of 12 U-residues, while gA6(557–593) would direct the insertion of 11 U-residues. gA6(557–593) is much more abundant (∼70 000 versus ∼2500 transcripts identified). (D) Comparison of the protein sequences generated by the conventional 12U-edited (top line) and alternatively 11U-edited A6 transcripts. The alternative protein sequence (double underlined) is 11 AA shorter.
Figure 7. Abundance of gRNA transcripts (y-axis, note log scale) that align to a respective nucleotide in the fully edited mRNA. Both nucleotides and deletion sites in the fully edited mRNA were numbered starting from the 5′ end (+1 = 0). Shaded boxes indicate identified gRNAs that cover specific editing sites. Data include only the identified conventional gRNAs. (A) Identified gRNAs for the A6 edited mRNA. (B) Identified gRNAs for CR3. (C) Identified gRNAs for ribosomal protein subunit 12 (RSP12). All individual data points were designated with open circles. Close overlapping of individual data points generates solid black lines.
Figure 8. gCYb (54–91) transcript alignment with minicircle MCP23. The minicircle sequence is shown on top with both gCYb(54–91) and the previously identified gCYb-560A aligned underneath.
Figure 9. Abundant mismatched gRNA generates edited sequence with a single AA change. (A) Two gRNAs that edited the same region as the rare gND8(161–198) were identified that contain mismatches to the conventional edited sequence of ND8 (mismatches in bold and underlined). The gRNAs are shown aligned beneath the fully edited mRNA as described in Figure 1. (B) gND8(158–186) is the most abundant of the three transcripts, and would generate an edited sequence with a single amino acid change.