Musa A Hassan, Mariane B Melo, Brian Haas, Kirk D C Jensen, Jeroen P J Saeij.
Abstract
BACKGROUND: Accurate gene model predictions and annotation of alternative splicing events are imperative for genomic studies in organisms that contain genes with multiple exons. Currently, most gene models for the intracellular parasite Toxoplasma gondii are based on computational predictions without cDNA sequence verification. Additionally, the nature and extent of alternative splicing in Toxoplasma gondii are unknown. In this study, we used de novo transcript assembly and the published type II (ME49) genomic sequence to quantify the extent of alternative splicing in Toxoplasma and to improve the current Toxoplasma gene annotations.
Year: 2012 PMID: 23231500 PMCID: PMC3543268 DOI: 10.1186/1471-2164-13-696
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Schematic of methods used for transcript assembly and genome annotation with RNA-seq. A workflow of the strategy used to assemble and annotate Toxoplasma full-length transcripts using RNA-seq data. Because genes with updated annotations may fall into multiple categories (i.e. fused genes, changed exons, and changed UTRs), we highlight only the types of variation observed in the current study.
Figure 2. 2,930 PASA transcripts did not overlap with any of the currently annotated (ToxoDB) type II (ME49) genes. Shown is PASA transcript S2826 (Black), which aligned to a genomic sequence on TGME49_chrIV but did not overlap with any of the ME49 genes. We also show the RNA-seq read pile-up on the exons (histogram, in reads per kilobase per million reads, or RPKM) and 13 reads mapping to the exon-exon junction. This transcript produces an ORF of 146 amino acids, which is homologous to the TGGT1_124090 protein from the type I GT1 strain but has no homology to any annotated ME49 protein.
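The RPKM unit used for the read pile-up can be illustrated with a minimal sketch. The function name and the example numbers below are hypothetical and are not taken from the study; the sketch assumes the usual definition of RPKM, in which read counts are normalized by feature length in kilobases and by library size in millions of mapped reads.

```python
def rpkm(read_count: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and library size must be positive")
    kilobases = feature_length_bp / 1_000          # feature length in kb
    millions = total_mapped_reads / 1_000_000      # library size in millions
    return read_count / kilobases / millions


# Hypothetical values for illustration only (not from the paper).
example_rpkm = rpkm(read_count=500, feature_length_bp=2_000,
                    total_mapped_reads=10_000_000)
print(example_rpkm)
```

Normalizing by both length and library size is what allows pile-ups on exons of different lengths, or from libraries of different depths, to be compared on one axis.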
Figure 3. Refinement of gene models as currently predicted in ToxoDB. (A) A ToxoDB gene (Blue) with discrepancies in the UTRs compared to the PASA transcript (Black). (B) Two ME49 genes (Blue) are fused into one PASA transcript (Black). (C) A PASA transcript with two novel 5’ exons lacking in the predicted ME49 gene (Black). (D) A PASA transcript (Blue) with a novel 3’ exon lacking in the ME49 gene (Blue). (E) A PASA transcript (Blue) with a novel internal exon in a region containing an intron in an ME49 gene (Black). (F) An ME49 gene (Black) with three exons fused into one in the corresponding PASA transcript (Blue).
Figure 4. Alternative splicing in Toxoplasma takes forms similar to those described in other eukaryotes. Types of alternative splicing included Alternate Donor (A), Alternate Acceptor (B), Retained or Spliced Intron (C), Retained or Spliced Exons (D), Initiation Within an Intron (E), and Alternate Terminal Exons (F). ToxoDB genes are depicted in blue and PASA transcripts in black. Red arrows indicate regions where variation is observed.
Figure 5. The ability to detect alternatively spliced isoforms using RNA-seq data is dependent on read coverage. Shown are the fractions of alternatively spliced genes (from a total of 50; Black bars) and multi-exonic genes (from a total of 5,873; Grey bars) identified in bins grouped by (A) the Log10 of the expression level in reads per kilobase (RPK) or (B) the Log10 of raw coverage. Alternative isoforms are more likely to be detected among highly expressed genes than among genes with low expression. See also additional file 9 for the relationship between read coverage and full assembly of transcripts.
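The binning described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the integer-width Log10 bins, and the example values are assumptions. For one group of genes (for example, the alternatively spliced set or the multi-exonic set), it returns the fraction of that group falling into each Log10 expression bin.

```python
import math
from collections import Counter


def bin_fractions(expression_values):
    """Map each Log10 bin to the fraction of values that fall into it.

    A value v is assigned to bin floor(log10(v)); non-positive values are
    skipped because their logarithm is undefined.
    """
    bins = Counter(math.floor(math.log10(v)) for v in expression_values if v > 0)
    total = sum(bins.values())
    return {b: n / total for b, n in sorted(bins.items())}


# Hypothetical RPK values for a small group of genes (not from the paper).
alt_spliced_rpk = [1500.0, 320.0, 2500.0, 90.0]
print(bin_fractions(alt_spliced_rpk))
```

Running the same function on both gene groups and comparing the per-bin fractions gives the kind of side-by-side comparison the black and grey bars represent.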
Table 1. Percentage spliced in (PSI) values (shown as fractions) for some of the alternatively spliced transcripts in the three clonal strains of Toxoplasma
| TGME49_053370 | Rhoptry neck 4 L1 homologue | 0.41 | 0.23 | 0.16 |
| TGME49_008740 | microneme protein, putative | 0.28 | 0.31 | 0.46 |
| TGME49_078510 | protein phosphatase 2C, putative | 0.71 | 0.72 | 0.61 |
| TGME49_038230 | serine/threonine protein phosphatase, putative | 0.48 | 0.08 | 0.09 |
| TGME49_112660 | Hypothetical protein | 0.50 | 0.55 | 0.05 |
| TGME49_097470 | Myosin light chain 2, putative | 0.34 | 0.43 | 0.02 |
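Table 2. Differential isoform usage between Pru and RH strains. Columns: gene, read counts per support class in Pru, read counts per support class in RH.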
| TGME49_008740 | (0,0):3272,(0,1):62,(1,0):14,(1,1):275 | (0,0):9355,(0,1):99,(1,0):73,(1,1):799 |
| TGME49_112660 | (0,0):580,(0,1):2,(1,0):4,(1,1):68 | (0,0):667,(0,1):23,(1,1):74 |
| TGME49_078510 | (0,0):2647,(0,1):15,(1,0):15,(1,1):178 | (0,0):4612,(0,1):18,(1,0):5,(1,1):225 |
| TGME49_097470 | (0,0):1648,(0,1):12,(1,0):22,(1,1):140 | (0,0):3796,(0,1):19,(1,1):297 |
(0,0):x indicates that x reads align to both isoforms but are not used in support of either, for reasons that include alignment to exons far removed from the splice site; (0,1):x indicates that x reads support the second isoform but not the first; (1,0):x indicates that x reads support the first isoform but not the second; (1,1):x indicates that x reads support both isoforms. Missing values for RH indicate the absence of reads supporting the alternative isoform. The gene descriptions are shown in Table 1.
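The count notation explained above can be read programmatically. The sketch below is an illustration with hypothetical names (`parse_counts`, `CLASSES`); it assumes each table cell is a comma-separated string of `(a,b):x` entries and treats any missing support class as zero reads, in line with the note on missing RH values.

```python
import re

# One "(a,b):x" entry: two single-digit flags followed by a read count.
COUNT_PATTERN = re.compile(r"\((\d),(\d)\):(\d+)")
CLASSES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def parse_counts(field: str) -> dict:
    """Parse a cell such as '(0,0):667,(0,1):23,(1,1):74' into a dict.

    Keys are (first_isoform, second_isoform) support flags; classes that
    are absent from the cell are reported as 0 reads.
    """
    counts = {cls: 0 for cls in CLASSES}
    for first, second, n in COUNT_PATTERN.findall(field):
        counts[(int(first), int(second))] = int(n)
    return counts


# TGME49_112660 cells as they appear in the table above.
pru = parse_counts("(0,0):580,(0,1):2,(1,0):4,(1,1):68")
rh = parse_counts("(0,0):667,(0,1):23,(1,1):74")
print(pru[(1, 0)], rh[(1, 0)])
```

Filling absent classes with zero keeps every row the same shape, which makes it straightforward to compare the Pru and RH columns class by class.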
Figure 6. Different isoforms of alternatively spliced transcripts are differentially expressed in diverse strains. Shown is the RNA-seq read pile-up (histogram) on the exons (horizontal bars) of alternative isoforms of the TGME49_097470 gene, which shows differential isoform usage between a type II (Pru) and a type I (RH) strain. The predominant isoform of the TGME49_097470 gene in RH has the same gene model and protein sequence as those predicted in ToxoDB.