Katherine Sorber, Michelle T Dimon, Joseph L DeRisi.
Abstract
Over 50% of genes in Plasmodium falciparum, the deadliest human malaria parasite, contain predicted introns, yet experimental characterization of splicing in this organism remains incomplete. We present here a transcriptome-wide characterization of intraerythrocytic splicing events, as captured by RNA-Seq data from four timepoints of a single highly synchronous culture. Gene model-independent analysis of these data, together with publicly available RNA-Seq data, using HMMSplicer, an in-house splice site detection algorithm, revealed a total of 977 new 5' GU-AG 3' and 5 new 5' GC-AG 3' junctions absent from gene models and ESTs (an 11% increase over the current annotation). In addition, 310 alternative splicing events were detected in 254 (4.5%) genes, most of which truncate open reading frames. Splicing events antisense to gene models were also detected, revealing complex transcriptional arrangements within the parasite's transcriptome. Interestingly, antisense introns overlap sense introns more than would be expected by chance, perhaps indicating a functional relationship between overlapping transcripts or an inherent organizational property of the transcriptome. Independent experimental validation confirmed over 30 new antisense and alternative junctions. Thus, this largest assemblage of new and alternative splicing events to date in Plasmodium falciparum provides a more precise, dynamic view of the parasite's transcriptome.
Year: 2011 PMID: 21245033 PMCID: PMC3089446 DOI: 10.1093/nar/gkq1223
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 2. Validation of new splicing events. (A) Shade indicates the relative abundance of each isoform. Initial outer PCR (green arrows) amplifies both isoforms from cDNA. A restriction enzyme then cuts the known isoform. Nested inner PCR (blue arrows) amplifies only the uncut, new isoform, which is then confirmed by sequencing. Gbrowse (66) windows depict validation of a skipped exon in MAL13P1.159 (B), an antisense junction in PFF0290w (C), and an alternate 3′-splice site in PFB0279w (D). All HMMSplicer junctions scoring higher than 980 are shown as either dark blue bars (known junctions) or light blue bars (new conflicting junctions). The number of reads supporting each junction is shown in the bars, while the direction of the arrow reflects the direction of the splice sites. Validation sequencing results are shown in magenta. Bowtie coverage for each nucleotide in the window is shown as a histogram. Underneath, the dark blue bars depict PlasmoDB v6.3 gene models with numbers denoting the exons, while the gold bars at the bottom of each window depict ESTs.
Putative P. falciparum splicing and non-sense-mediated decay factor homologs identified by reciprocal best hits analysis with human or S. cerevisiae sequences
| Human/Yeast | P. falciparum | Human/Yeast | P. falciparum | ||
|---|---|---|---|---|---|
| SNRPB/SMB1 | PF14_0146 | PRPF3/PRP3 | MAL13P1.45 | ||
| SNRPD1/SMD1 | PF11_0266 | NHP2L1/SNU13 | PF11_0250 | ||
| SNRPD2/SMD2 | PFB0865w | PRPF4/PRP4 | MAL13P1.385 | ||
| SNRPD3/SMD3 | PFI0475w | PRPF31/ | PFD0450c | ||
| SNRPE/SME1 | MAL13P1.253 | PPIH/- | PF08_0121 | ||
| SNRPF/SMX3 | PF11_0280 | SART1/SNU66 | PFC1060c | ||
| SNRPG/SMX2 | MAL8P1.48 | USP39/SAD1 | PF13_0096 | ||
| LSM2/LSM2 | PFE1020w | SNRNP27/- | MAL8P1.71 | ||
| LSM3/LSM3 | PF08_0049 | PRPF19/PRP19 | PFC0365w | ||
| LSM4/LSM4 | PF11_0524 | CRNKL1/CLF1 | PFD0180c | ||
| LSM5/LSM5 | PF14_0411 | CDC5L/CEF1 | PF10_0327 | ||
| LSM6/LSM6 | PF13_0142 | ISY1/ISY1 | PF14_0688 | ||
| LSM7/LSM7 | PFL0460w | BCAS2/SNT309 | PFF0695w | ||
| NAA38/LSM8 | MAL8P1.9 | XAB2/SYF1 | PFL1735c | ||
| SNRNP70/SNP1 | MAL13P1.338 | PLRG1/PRP46 | PFC0100c | ||
| SNRPA/MUD1 | MAL13P1.35 | SYF2/SYF2 | ? | ||
| SNRPC/YHC1 | PF08_0084 | SNW1/ | PFB0875c | ||
| SNRPA1/LEA1 | PF13_0362 | BUD31/ | PFE1140c | ||
| SNRPB2/MSL1 | PFI1695c | PPIE/- | ? | ||
| U2AF1/- | PF11_0200 | CCDC12/- | PF14_0490 | ||
| U2AF2/MUD2 | PF14_0656 | AQR/- | PF13_0273 | ||
| SF1/MSL5 | PFF1135w | CWC15/ | PF07_0091 | ||
| SF3A1/PRP21 | PF14_0713 | PPIL1/- | PFE1430c | ||
| SF3A2/PRP11 | PFF0970w | DHX16/PRP2 | PF10_0294? | ||
| SF3A3/PRP9 | PFI1215w | BAT1/SUB2 | PFB0445c | ||
| SF3B1/HSH155 | PFC0375c | DDX46/PRP5 | PFE0430w | ||
| SF3B2/CUS1 | PF14_0587 | SLU7/SLU7 | PFF0500c | ||
| SF3B3/RSE1 | PFL1680w | DHX38/PRP16 | MAL13P1.322 | ||
| SF3B4/HSH49 | PF14_0194 | CDC40/CDC40 | PFL0970w | ||
| SF3B5/YSF3 | PF13_0296 | PRPF18/ | PFI1115c | ||
| PHF5A/RDS3 | PF10_0179a | DHX8/PRP22 | PF10_0294? | ||
| SF3B14/- | PFL1200c | UPF1/NAM7 | PF10_0057 | ||
| DDX23/PRP28 | PFE0925c | UPF2/NMD2 | PFI1265w | ||
| CD2BP2/LIN1 | PF10_0310 | UPF3A/UPF3 | ? | ||
| EFTUD2/SNU114 | PF10_0041 | UPF3B/- | PF13_0158 | ||
| SNRNP200/BRR2 | PFD1060w | SRSF1/- | PFE0865c | ||
| TXNL4A/DIB1 | PFL1520w | SRSF12/- | PFE0160c | ||
| PRPF8/PRP8 | PFD0265w | PTBP2/- | PFF0320c | ||
| PRPF6/ | PF11_0108 | SFRS4/- | PF10_0217 | ||
| SNRNP40/- | MAL8P1.43 | TRA2B/- | PF10_0028 |
The human or S. cerevisiae factor in bold font represents the best match for the P. falciparum homolog. Homologs of spliceosomal and NMD factors not found are denoted with question marks, while SR and hnRNP factors not found are not shown. Saccharomyces cerevisiae homologs that do not reside in the same complex as their human counterparts are italicized.
(a) Plasmodium falciparum proteins described in PlasmoDB as ‘conserved Plasmodium protein’ or with descriptions that do not reflect involvement in splicing.
(b) Homologs identified only by the human sequence.
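The reciprocal best hits criterion named in the table title can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the input layout (query ID mapped to a list of `(subject, score)` tuples) and the toy scores are all assumptions made for the example.

```python
def reciprocal_best_hits(a_to_b, b_to_a):
    """Return sorted (a, b) pairs where each sequence is the other's top hit.

    a_to_b and b_to_a map a query ID to a list of (subject ID, score)
    tuples. A higher score is assumed to mean a better hit.
    """
    def best(hits):
        # Keep only the single top-scoring subject for every query.
        return {query: max(subjects, key=lambda hit: hit[1])[0]
                for query, subjects in hits.items() if subjects}

    best_ab = best(a_to_b)
    best_ba = best(b_to_a)
    # A pair is reciprocal when b's best hit points back to a.
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical toy scores, not real alignment results.
human_vs_pf = {
    "SNRPB": [("PF14_0146", 180.0), ("PF11_0266", 40.0)],
    "SNRPD1": [("PF11_0266", 150.0)],
}
pf_vs_human = {
    "PF14_0146": [("SNRPB", 175.0)],
    "PF11_0266": [("SNRPD1", 140.0), ("SNRPB", 35.0)],
}
pairs = reciprocal_best_hits(human_vs_pf, pf_vs_human)
```

A hit that is best in only one direction is dropped, which is why entries without a reciprocal match would appear as question marks in a table like the one above.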
Figure 1. (A) Histogram of 5′ GU-AG 3′ junctions found by HMMSplicer binned by score. Defaults retain all junctions supported by multiple reads scoring above 400 and all junctions supported by single reads scoring above 600. The grey line plots all reported 5′ GU-AG 3′ junctions, while the red line charts junctions that match previously known junctions in PlasmoDB v6.3 gene models or in ESTs. The blue line charts new junctions. The dashed line drawn at 1075 represents the operational score threshold. (B) Breakdown of canonical junctions with scores above 1075, with additional classification of new junctions. ‘Outside of gene model’ refers to new junctions with at least one inner edge mapped to an intergenic region. ‘Within gene model’ indicates that both inner edges mapped to the same gene model. ‘Neighboring gene models’ indicates that the inner edges mapped to neighboring gene models. (C) Comparison of the 5′- and 3′-splice site WebLogos for previously known junctions recovered versus new junctions above 1075. WebLogos calculated for human junctions are included for reference. Red bars indicate the 5′ GU-AG 3′ boundaries used for inclusion in each set. The height of each letter indicates the preference strength for that nucleotide at each position.
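The score cutoffs quoted in the Figure 1 caption can be written as a small filter. The sketch below only encodes the three numbers stated in the caption (400, 600 and 1075); the `Junction` class and the function names are hypothetical and do not reflect HMMSplicer's actual interface.

```python
from dataclasses import dataclass


@dataclass
class Junction:
    score: float  # HMMSplicer junction score
    reads: int    # number of reads supporting the junction


MULTI_READ_MIN = 400     # default cutoff, junctions with more than one read
SINGLE_READ_MIN = 600    # default cutoff, junctions with a single read
OPERATIONAL_MIN = 1075   # stricter operational threshold from the caption


def passes_default(junction):
    """Apply the read-count-dependent default cutoff ("scoring above")."""
    cutoff = MULTI_READ_MIN if junction.reads > 1 else SINGLE_READ_MIN
    return junction.score > cutoff


def passes_operational(junction):
    """Require the default cutoff and the operational threshold."""
    return passes_default(junction) and junction.score > OPERATIONAL_MIN


# Hypothetical junctions for illustration only.
candidates = [Junction(450.0, 3), Junction(450.0, 1), Junction(1200.0, 1)]
kept_default = [j for j in candidates if passes_default(j)]
kept_operational = [j for j in candidates if passes_operational(j)]
```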
Verification of new junctions in conflict with known junctions
| Gene name | PlasmoDB v6.3 description | Score | Validated | Type | Frame-shift? | Isoform difference in bp (aa) | R | T | LT/ES | S |
|---|---|---|---|---|---|---|---|---|---|---|
| PFL1810w | Conserved | 1544.2 | Yes | 5′ss | Yes | 131 | 11 | 8 | 11 | 5 |
| | | 1283.2 | Yes | 5′ss | Yes | 218 | 3 | 0 | 3 | 1 |
| PFE0390w | Conserved | 1422.1 | Yes | 5′ss | Yes | 65 | 13 | 10 | 8 | 3 |
| PF13_0138 | MSF-1 like protein | 1372 | Yes | 5′ss | Yes | 55 | 7 | 5 | 3 | 7 |
| PFI0400c | Conserved | 1369.8 | Yes | Exon skip | No | 126 ( | 2 | 1 | 19 | 1 |
| PFF0290w | Long chain polyunsaturated fatty acid elongation enzyme | 1291.8 | Yes | Antisense | – | N/A | 9 | 6 | 14 | 5 |
| MAL13P1.225 | Thioredoxin | 1277.3 | Yes | Exon skip | Yes | 34 | 2 | 6 | 0 | 0 |
| PFE0055c | Heat shock protein | 1275.4 | Yes | 5′ss | No | 36 (12) | 54 | 14 | 3 | 8 |
| MAL8P1.126 | Serine protease | 1257.4 | Yes | 5′ss | Yes | 109 | 1 | 1 | 11 | 8 |
| PF10_0025 | PF70 protein | 1256.8 | Yes | 5′ss | Yes | 74 | 18 | 2 | 0 | 0 |
| PFD1050w | Alpha-tubulin II | 1243.2 | – | Antisense | – | N/A | 1 | 0 | 5 | 1 |
| MAL13P1.159 | Thioredoxin | 1239.9 | Yes | Exon skip | No | 33 ( | 0 | 1 | 0 | 3 |
| PFC0780w | Cleavage and polyadenylation specific factor | 1231.7 | – | Antisense | – | N/A | 2 | 6 | 27 | 8 |
| PFD0775c | RNA binding protein | 1228.4 | Yes | Antisense | – | N/A | 1 | 6 | 8 | 0 |
| PF10_0194 | NoOP12-like protein | 1219.4 | Yes | Exon skip | Yes | 41 | 1 | 0 | 0 | 1 |
| PFL1440c | Conserved | 1217.6 | Yes | Exon skip | No | 57 ( | 0 | 2 | 0 | 1 |
| PF11_0291 | Conserved | 1203.5 | Yes | 5′ss | No | 39 (13) | 1 | 0 | 0 | 5 |
| PFC0360w | Activator of HSP90 ATPase homolog 1-like protein | 1200.5 | Yes | Exon skip | Yes | 223 | 1 | 1 | 3 | 0 |
| PFC0495w | Plasmepsin VI | 1192.6 | Yes | Antisense | – | N/A | 0 | 0 | 8 | 3 |
| PF14_0394 | Conserved | 1190 | Yes | 5′ss | Yes | 98 | 2 | 4 | 5 | 0 |
| MAL13P1.146 | AMP deaminase | 1189.3 | Yes | Antisense | – | N/A | 1 | 1 | 0 | 0 |
| PF11_0379 | Conserved | 1050.5 | Yes | Exon skip | No | 60 ( | 1 | 0 | 0 | 1 |
| PFL1445w | Conserved | 1041.3 | Yes | Exon skip | Yes | 85 | 0 | 6 | 0 | 0 |
| MAL13P1.16 | SNARE protein | 1034.7 | Yes | Exon skip | No | 108 ( | 0 | 0 | 0 | 4 |
| MAL13P1.277 | DNAJ-like protein | 1034.2 | Yes | Exon skip | Yes | 146 | 2 | 0 | 3 | 0 |
| PFF1210w | Phosphatidic acid phosphatase | 1032.4 | Yes | 5′ss | No | 66 (22) | 2 | 0 | 3 | 5 |
| PFB0600c | Conserved | 1026.1 | Yes | Antisense | – | N/A | 1 | 1 | 3 | 0 |
| PF14_0128 | Ubiquitin conjugating enzyme | 1018.5 | Yes | Exon skip | Yes | 104 | 0 | 1 | 3 | 0 |
| PF14_0316 | DNA topoisomerase II | 1011.4 | – | 5′ss | No | 459 (153) | 0 | 0 | 3 | 1 |
| PFB0279w | Conserved | 1010.9 | Yes | 3′ss | Yes | 97 | 1 | 4 | 3 | 0 |
| PFL1465c | Heat shock protein hslv | 1004.5 | – | Exon skip | Yes | 40 | 0 | 2 | 0 | 4 |
| PF10_0372 | Antigen UB05 | 1004.4 | – | Antisense | – | N/A | 0 | 1 | 0 | 1 |
| PF11_0182 | Conserved | 1004.1 | Yes | Exon skip | Yes | 56 | 0 | 0 | 5 | 0 |
| PFF0365c | G-protein associated signal transduction protein | 996.3 | – | Exon skip | Yes | 163 | 2 | 0 | 0 | 0 |
| PFB0445c | DEAD box helicase, UAP56 | 995.9 | – | 3′ss | Yes | 74 | 1 | 0 | 0 | 0 |
| PFD0895c | Bet3 transport protein | 991 | – | Antisense | – | N/A | 0 | 0 | 2 | 0 |
| PF10_0116 | Conserved | 989.9 | Yes | 5′ss | Yes | 74 | 0 | 0 | 2 | 0 |
| PF14_0604 | Conserved | 988.1 | Yes | Exon skip | Yes | 344 | 1 | 0 | 0 | 0 |
| PFI0560c | Conserved | 987.7 | Yes | Exon skip | Yes | 40 | 1 | 0 | 0 | 2 |
| PFB0550w | Peptide chain release factor subunit 1 | 985.5 | – | Exon skip | Yes | 155 | 0 | 0 | 2 | 0 |
| PF11_0355 | Conserved | 984.6 | Yes | Antisense | – | N/A | 1 | 1 | 0 | 0 |
Conflicts are ranked by lowest HMMSplicer score within the pair, and the black line denotes the operating HMMSplicer threshold of 1075.
(a) Validations shown in more detail in Figure 2.
For all conflict types except antisense, the new junction was evaluated for maintenance of the ORF; nucleotide and amino acid (if applicable) differences between new and known isoforms are listed. Read counts for new junctions (normalized by the number of reads mapped by Bowtie for each timepoint) are listed for ring [R, (TP1, TP0, TP8)], troph [T, (TP2, TP16, TP24)], late troph/early schizont [LT/ES, (TP3, TP32)] and schizont [S, (TP4, TP40, TP48)] timepoints.
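The ORF check and read normalization described in these table notes reduce to simple arithmetic, sketched below. The divisibility-by-three rule matches the complete table entries (for example, 36 bp with no frame-shift corresponds to 12 aa). The per-million scaling in `normalize_reads` is an assumption, since the notes state only that counts were normalized by the number of mapped reads; both function names are hypothetical.

```python
def isoform_difference(bp_difference):
    """Return (frameshift, aa_difference) for a length difference in bp.

    A difference divisible by three keeps the reading frame, so an amino
    acid difference can be reported; otherwise the frame shifts and no
    amino acid difference applies.
    """
    if bp_difference % 3 == 0:
        return (False, bp_difference // 3)
    return (True, None)


def normalize_reads(junction_reads, mapped_reads, scale=1_000_000):
    """Scale a junction read count by the timepoint's mapped read total.

    The per-million scale is an assumed convention, not taken from the
    source.
    """
    return junction_reads * scale / mapped_reads


in_frame = isoform_difference(36)      # (False, 12)
shifted = isoform_difference(131)      # (True, None)
per_million = normalize_reads(5, 2_000_000)
```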
Figure 3. WebLogo 5′- and 3′-splice site motifs for high scoring 5′ GC-AG 3′ HMMSplicer junctions (n = 12). Red bars indicate the boundaries used for inclusion in the set. The height of each letter indicates the information content for that nucleotide at each position. The large error bars derive from the small size of the input set.
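The information content that sets the letter heights in a sequence logo can be illustrated with the standard definition: the maximum entropy for the alphabet minus the observed Shannon entropy of the column. The sketch below is a simplified version that omits the small-sample correction applied by logo tools; the function name and the toy columns are assumptions.

```python
import math
from collections import Counter


def information_content(column, alphabet="ACGU"):
    """Information content, in bits, of one alignment column.

    Computed as log2(alphabet size) minus the Shannon entropy of the
    observed nucleotide frequencies. No small-sample correction is
    applied here.
    """
    counts = Counter(column)
    total = len(column)
    entropy = -sum((count / total) * math.log2(count / total)
                   for count in counts.values())
    return math.log2(len(alphabet)) - entropy


# Hypothetical columns: fully conserved, two-way split, and uniform.
conserved = information_content("GGGGGGGGGGGG")
split = information_content("GGGGGGCCCCCC")
uniform = information_content("ACGUACGUACGU")
```

With only 12 input junctions, each observed frequency is a coarse estimate, which is consistent with the large error bars noted in the caption.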
Figure 4. Breakdown of alternative splicing events detected transcriptome-wide. (A) Alternative splicing events both by type and area in the genome. Events ‘In gene models’ belong to junction groups in which at least one junction maps within a gene model in the sense direction. ‘Intergenic’ events belong to junction groups with no junctions mapping to gene models. ‘Antisense’ events occur in junction groups with at least one junction within a gene model in the antisense direction. (B) Breakdown of the 279 alternative splicing events that have the potential to change the gene model’s coding sequence. ‘Frameshift-unclear’ could not be analyzed for ORF extension or truncation without assuming which downstream junction(s) co-occur in a given isoform. (C) Histogram of alternative splicing (AS) junctions (n = 296) by ratio of AS junction reads to recovered gene model (GM) junction reads. In cases of conflict with more than one GM junction, the GM junction with the most reads was chosen as the denominator.
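The ratio plotted in panel (C) can be sketched directly from the caption: AS junction reads divided by the reads of the best-supported conflicting GM junction. The function name and example counts below are hypothetical.

```python
def as_to_gm_ratio(as_reads, gm_reads):
    """Ratio of AS junction reads to gene model (GM) junction reads.

    gm_reads lists the read counts of every conflicting GM junction.
    When there is more than one, the junction with the most reads is
    used as the denominator, as stated in the caption.
    """
    if not gm_reads:
        raise ValueError("at least one GM junction is required")
    denominator = max(gm_reads)
    if denominator == 0:
        raise ValueError("the GM junction has no supporting reads")
    return as_reads / denominator


# Hypothetical counts: one AS junction conflicting with two GM junctions.
ratio = as_to_gm_ratio(3, [10, 30])
```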
Figure 5. Characterization of antisense splice junctions. (A) Schematic of all sense and antisense junctions recovered for PFD1050w (α-tubulin II). (B) WebLogos of the 5′- and 3′-splice sites of antisense junctions. The height of each letter indicates the preference strength for that nucleotide at each position. (C) Observed and expected distributions of antisense intron overlap with sense introns. The expected distribution was calculated by first determining the probability of encountering a GU (5′-splice site) or an AG (3′-splice site) on the opposite strand of introns versus exons in the genes with mapped antisense junctions. These probabilities guided otherwise random re-placement of each antisense junction within its corresponding gene model. This re-placement was iterated 100 times, with the mean percent of nucleotide overlap with sense introns ± standard deviation shown.
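The expected-overlap calculation in panel (C) is a randomization procedure, and its general shape can be sketched as below. This simplified version places the antisense intron uniformly at random within the gene; it deliberately omits the GU/AG probability weighting described in the caption. Coordinates are assumed to be half-open and sense introns non-overlapping, and all names and numbers are hypothetical.

```python
import random
import statistics


def overlap_fraction(intron, sense_introns):
    """Fraction of an intron's nucleotides covered by sense introns.

    Intervals are half-open (start, end) pairs. Sense introns are
    assumed not to overlap one another.
    """
    start, end = intron
    covered = sum(max(0, min(end, s_end) - max(start, s_start))
                  for s_start, s_end in sense_introns)
    return covered / (end - start)


def expected_overlap(gene_length, antisense_length, sense_introns,
                     iterations=100, seed=0):
    """Mean and standard deviation of overlap under random re-placement.

    Uniform placement is a simplification; the caption describes
    placement guided by splice site dinucleotide probabilities.
    """
    rng = random.Random(seed)
    fractions = []
    for _ in range(iterations):
        start = rng.randint(0, gene_length - antisense_length)
        placed = (start, start + antisense_length)
        fractions.append(overlap_fraction(placed, sense_introns))
    return statistics.mean(fractions), statistics.stdev(fractions)


# Hypothetical gene: 1000 nt long with one sense intron at 200-400.
mean_overlap, sd_overlap = expected_overlap(1000, 100, [(200, 400)])
```

Comparing the observed overlap against this mean ± standard deviation is what supports a statement such as "more than would be expected by chance".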