Thomas D Otto, Daniel Wilinski, Sammy Assefa, Thomas M Keane, Louis R Sarry, Ulrike Böhme, Jacob Lemieux, Bart Barrell, Arnab Pain, Matthew Berriman, Chris Newbold, Manuel Llinás.
Abstract
Recent advances in high-throughput sequencing present a new opportunity to deeply probe an organism's transcriptome. In this study, we used Illumina-based massively parallel sequencing to gain new insight into the transcriptome (RNA-Seq) of the human malaria parasite, Plasmodium falciparum. Using data collected at seven time points during the intraerythrocytic developmental cycle, we (i) detect novel gene transcripts; (ii) correct hundreds of gene models; (iii) propose alternative splicing events; and (iv) predict 5' and 3' untranslated regions. Approximately 70% of the unique sequencing reads map to previously annotated protein-coding genes. The RNA-Seq results greatly improve existing annotation of the P. falciparum genome with over 10% of gene models modified. Our data confirm 75% of predicted splice sites and identify 202 new splice sites, including 84 previously uncharacterized alternative splicing events. We also discovered 107 novel transcripts and expression of 38 pseudogenes, with many demonstrating differential expression across the developmental time series. Our RNA-Seq results correlate well with DNA microarray analysis performed in parallel on the same samples, and provide improved resolution over the microarray-based method. These data reveal new features of the P. falciparum transcriptional landscape and significantly advance our understanding of the parasite's red blood cell-stage transcriptome.
Year: 2010 PMID: 20141604 PMCID: PMC2859250 DOI: 10.1111/j.1365-2958.2009.07026.x
Source DB: PubMed Journal: Mol Microbiol ISSN: 0950-382X Impact factor: 3.501
RNA-Seq mapping statistics against P. falciparum genome.
| Sequencing run | Undepleted | Depleted by specific oligos | Depleted by exonuclease and specific oligos |
|---|---|---|---|
| Total reads | 5 161 203 | 5 657 762 | 4 847 379 |
| % Mapped | 94 | 92 | 86 |
| % Mapped to unique locations | 15 | 24 | 52 |
| Reads mapped to rRNA | 3 120 248 | 3 269 563 | 1 034 004 |
| % Reads mapped to rRNA | 60 | 58 | 21 |
| Fold coverage | 1.14 | 2.05 | 3.75 |
| % Genome not covered | 72 | 65 | 49 |
| % Genome covered, < 5-fold | 96 | 93 | 84 |
| % Genome covered, > 10-fold | 1 | 2 | 7 |
| Max. coverage in coding sequences | 11 008 | 4 697 | 7 061 |
| Max. coverage, genome-wide | 73 864 | 95 351 | 64 552 |
| Max. average coverage in coding sequences | 2 774 | 1 296 | 1 598 |
| Genes with gmean coverage > 5 | 749 | 1 205 | 2 438 |
| Genes with gmean coverage > 10 | 93 | 191 | 499 |
Summary statistics of mapping of Illumina sequencing reads on to P. falciparum 3D7 genome from RNA-Seq runs after depletion by specific oligonucleotides and by exonuclease digestion. Oligonucleotides used for specific depletion have been described in Table S1.
Reads mapped using SSAHA2.
Coverage determined using MAQ; non-unique reads randomly partitioned over repeats.
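As a quick arithmetic check on the table above, the rRNA percentages can be recomputed from the raw read counts. The sketch below assumes the denominator is the total number of reads in each run (not the number of mapped reads); under that assumption the rounded values match the table. The dictionary layout and function name are illustrative only.

```python
# Read counts copied from the mapping-statistics table above.
runs = {
    "Undepleted": {"total": 5_161_203, "rrna": 3_120_248},
    "Depleted by specific oligos": {"total": 5_657_762, "rrna": 3_269_563},
    "Depleted by exonuclease and specific oligos": {"total": 4_847_379, "rrna": 1_034_004},
}


def rrna_percent(total_reads: int, rrna_reads: int) -> float:
    """Percentage of all reads in a run that mapped to rRNA."""
    return 100.0 * rrna_reads / total_reads


# Assumption: the table divides by total reads, not by mapped reads.
rrna_pct = {name: rrna_percent(c["total"], c["rrna"]) for name, c in runs.items()}

for name, pct in rrna_pct.items():
    print(f"{name}: {pct:.1f}% of reads map to rRNA")
```

The drop from roughly 60% to roughly 21% rRNA reads is what frees sequencing depth for mRNA in the doubly depleted run.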
Fig. 1. Workflow of short read processing for gene expression analysis by RNA-Seq. The Illumina sequencing reads are mapped with SSAHA2 (Ning ) against the Plasmodium falciparum 3D7 genome. After mapping, splice reads and coverage plots are obtained. The splice reads are used to confirm or find new splice sites as well as alternative splice sites. The coverage plots show the RNA expression levels over each base pair of the genome. To calculate the expression per CDS per time point, the coverage plots and the uniqueness plots are used. Uniqueness plots indicate the uniqueness of a particular region of the genome. Using the coverage, it is possible to identify incorrect annotation, novel transcripts and potential untranslated regions (UTRs) of protein-coding transcripts (as described in the text).
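The caption describes combining a coverage plot with a uniqueness plot to get one expression value per CDS, and the tables and figures report that value as "gmean". The exact formula is not given in this record, so the following is only a sketch under stated assumptions: that gmean is a geometric mean of per-base coverage, that only uniquely mappable positions contribute, and that a pseudocount of 1 is used so zero-coverage bases do not force the result to zero. The function name and inputs are hypothetical.

```python
import math
from typing import Sequence


def gmean_coverage(coverage: Sequence[int], unique: Sequence[bool]) -> float:
    """Geometric mean of per-base coverage over uniquely mappable positions.

    Assumptions (not taken from the paper): a pseudocount of 1 is added
    before taking logs and subtracted afterwards, and positions flagged as
    non-unique are ignored entirely.
    """
    if len(coverage) != len(unique):
        raise ValueError("coverage and unique must have the same length")
    logs = [math.log(c + 1) for c, u in zip(coverage, unique) if u]
    if not logs:
        return 0.0  # no uniquely mappable base in this CDS
    return math.exp(sum(logs) / len(logs)) - 1


# Hypothetical 4-bp region: the last base lies in a repeat and is skipped.
print(gmean_coverage([3, 3, 3, 100], [True, True, True, False]))
```

A geometric mean is less sensitive than an arithmetic mean to short coverage spikes, which is one plausible reason to prefer it for per-gene summaries.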
Fig. 2. Expression profiles of 3975 annotated genes at seven time points in the intra-erythrocytic developmental cycle (IDC) of P. falciparum 3D7 and comparison of RNA-Seq data with microarray data. A. Heat map of genes expressed in the IDC (Bozdech ), derived from the RNA-Seq data. B. Heat map of genes expressed in the IDC, derived from microarray experiments using the identical biological samples. C. Pearson correlation between the RNA-Seq and the microarray data sets.
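Panel C compares the two platforms by Pearson correlation. A minimal sketch of that statistic for a single gene's seven-point profile is shown below; the two profiles are made-up illustrative numbers, not values from the study, and any normalisation or log transformation applied by the authors is not reproduced here.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical profiles for one gene across seven time points.
rnaseq_profile = [5.0, 12.0, 40.0, 90.0, 60.0, 20.0, 8.0]
microarray_profile = [0.2, 0.5, 1.4, 2.9, 2.1, 0.8, 0.3]
print(f"r = {pearson(rnaseq_profile, microarray_profile):.3f}")
```

Because Pearson correlation is invariant to linear rescaling, the two platforms can be compared even though their expression units differ.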
Fig. 3. RNA-Seq coverage plots for selected genes and their corresponding expression profiles (expressed as gmean) at seven time points in the intra-erythrocytic developmental cycle (IDC) of P. falciparum 3D7. A. Expression profile of the multi-exon gene PF11_0152 (GTPase activator) [maximal expression 423 (gmean)]. B. Expression profiles of three adjacent genes: PFI0180w (max expression 2000), PFI0185w (no expression) and PFI0190w (max expression 780) on chromosome 9, showing opposite temporal regulation of expression for PFI0180w (alpha tubulin – black profile plot), PFI0190w (60S ribosomal protein L32 – red profile plot) and lack of expression for PFI0185w. C. Expression profile of a novel mlncRNA transcript (PF10TR002, see Table S5) identified on Chr10 (max expression 580).
Fig. 4. Use of RNA-Seq data in correction of gene models in P. falciparum 3D7. An example is shown where a previously incorrectly predicted gene model was corrected using RNA-Seq evidence for the gene PF10_0022 [Plasmodium exported protein (PHISTc)]. The coverage plots indicate that the first exon is shorter by 27 bp at the 3′ end. The arrow and black boxed areas highlight the location of structural changes incorporated in the gene PF10_0022. The correctly spliced form is confirmed by 36 known bridging reads (green features). This new splice site was confirmed by RT-PCR (orange features). Coverage plots also identified the 5′ UTR in PF10_0022 (shown by grey striped feature). The incorrect gene model was taken from the published version of the P. falciparum 3D7 genome (Gardner ).
Overview of changes to annotation of the P. falciparum 3D7 genome.
| | Previous annotation | Modified annotation | Difference |
|---|---|---|---|
| Predicted protein coding genes | 5 317 | 5 438 | 121 |
| Changes to gene structures based on RNA-Seq evidence | 423 | ||
| Predicted spliced transcripts | 2 870 | 2 952 | 82 |
| Predicted splice sites | 8 315 | 8 517 | 202 |
| Confirmed splice sites (by ≥ 1 Illumina read pair) | 6 590 | 6 891 | 301 |
| Confirmed splice sites (by ≥ 2 Illumina read pairs) | 6 095 | 6 389 | 294 |
| % confirmed splice sites (by ≥ 2 Illumina read pairs) | 73 | 75 | 2 |
| Reads confirming predicted splice sites | 453 881 | 479 011 | 25 130 |
Overview of annotation changes in the P. falciparum 3D7 genome with the aid of the RNA-Seq data during the period between March 2008 and May 2009.
Previous annotation: from May 2008, produced without using RNA-Seq data.
Modified annotation: from March 2009, edited using RNA-Seq data.
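The Difference column and the percentage row of the annotation table can be checked with simple arithmetic. The sketch below copies the counts from the table and assumes that the percentage of confirmed splice sites is the number confirmed by at least two read pairs divided by the number of predicted splice sites in the same annotation; with that assumption the rounded values agree with the table. Variable names are illustrative.

```python
# (previous, modified, reported difference), copied from the table above.
rows = {
    "Predicted protein coding genes": (5317, 5438, 121),
    "Predicted spliced transcripts": (2870, 2952, 82),
    "Predicted splice sites": (8315, 8517, 202),
    "Confirmed splice sites (>= 1 read pair)": (6590, 6891, 301),
    "Confirmed splice sites (>= 2 read pairs)": (6095, 6389, 294),
    "Reads confirming predicted splice sites": (453881, 479011, 25130),
}

# Rows whose reported difference is not modified minus previous.
mismatches = [
    label for label, (prev, mod, diff) in rows.items() if mod - prev != diff
]

# Assumed definition: confirmed (>= 2 read pairs) / predicted splice sites.
pct_prev = 100 * rows["Confirmed splice sites (>= 2 read pairs)"][0] / rows["Predicted splice sites"][0]
pct_mod = 100 * rows["Confirmed splice sites (>= 2 read pairs)"][1] / rows["Predicted splice sites"][1]

print("rows with inconsistent differences:", mismatches)
print(f"confirmed splice sites: {pct_prev:.1f}% (previous), {pct_mod:.1f}% (modified)")
```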
Overview of alternative splicing events confirmed by RNA-Seq.
| Chr. | New splice site | Number of confirming reads | Alternative splicing events | Gene identifier | Product |
|---|---|---|---|---|---|
| 1 | 294554.. 294928 | 3 | Exon skipping | PFA0345w | Centrin-1 |
| 1 | 388453.. 388750 | 4 | Exon skipping | PFA0485w | Phosphatidate cytidylyltransferase, putative |
| 2 | 140595.. 140820 | 2 | 3′ and 5′ alternative | PFB0140w | Zinc finger protein, putative |
| 2 | 231909.. 232099 | 10 | 3′ and 5′ alternative | PFB0255w | Conserved Plasmodium protein, unknown function |
| 2 | 382875.. 383071 | 2 | 3′ and 5′ alternative | PFB0410c | Phospholipase A2, putative |
| 2 | 412537.. 412701 | 3 | 3′ and 5′ alternative | PFB0455w | 60S ribosomal protein L37ae, putative |
| 2 | 276441.. 276558 | 5 | Alternative stop | PFB0305c | Merozoite surface protein 5 |
| 3 | 214831.. 215062 | 3 | 3′ and 5′ alternative | PFC0200w | 60S Ribosomal protein L44, putative |
| 3 | 458474.. 458623 | 3 | 3′ and 5′ alternative | PFC0441c | SAC3/GNAP family-related protein, putative |
| 3 | 553204.. 553287 | 13 | 3′ and 5′ alternative | PFC0571c | Conserved Plasmodium protein, unknown function |
| 3 | 566965.. 567142 | 6 | 3′ and 5′ alternative | PFC0582c | Vesicle transport v-SNARE protein, putative |
| 4 | 99991.. 100185 | 33 | 3′ and 5′ alternative | PFD0070c | rifin |
| 4 | 673403.. 673580 | 2 | 3′ and 5′ alternative | PFD0720w | Conserved ARM repeats protein, unknown function |
| 4 | 785847.. 785932 | 3 | 3′ and 5′ alternative | PFD0850c | Memo-like protein |
| 5 | 1082779.. 1082882 | 2 | 3′ and 5′ alternative | PFE1305c | ADP-ribosylation factor GTPase-activating protein, putative |
| 5 | 1147532.. 1147829 | 4 | Exon skipping | PFE1375c | Conserved Plasmodium protein, unknown function |
| 5 | 1202254.. 1202495 | 5 | Intron creation | PFE1465w | Conserved Plasmodium protein, unknown function |
| 6 | 1171453.. 1171616 | 2 | 3′ and 5′ alternative | PFF1375c | Ethanolaminephosphotransferase, putative |
| 6 | 258869.. 258956 | 24 | Intron creation | PFF0300w | RNA binding protein, putative |
| 6 | 533969.. 534599 | 18 | 3′ and 5′ alternative | PFF0630c | Conserved Plasmodium protein, unknown function |
| 6 | 533969.. 534618 | 42 | Exon skipping | PFF0630c | Conserved Plasmodium protein, unknown function |
| 6 | 534448.. 534599 | 2 | 3′ and 5′ alternative | PFF0630c | Conserved Plasmodium protein, unknown function |
| 6 | 534448.. 534618 | 6 | 3′ and 5′ alternative | PFF0630c | Conserved Plasmodium protein, unknown function |
| 6 | 794834.. 794959 | 38 | 3′ and 5′ alternative | PFF0920c | Conserved Plasmodium protein, unknown function |
| 7 | 80499.. 80835 | 3 | Alternative stop | MAL7P1.225 | Plasmodium exported protein (PHISTa-like), unknown function |
| 7 | 112961.. 113197 | 2 | 3′ and 5′ alternative | MAL7P1.229 | Cytoadherence linked asexual protein |
| 7 | 137990.. 138206 | 2 | 3′ and 5′ alternative | PF07_0004 | Plasmodium exported protein, unknown function |
| 7 | 1314328.. 1314516 | 21 | 3′ and 5′ alternative | MAL7P1.160 | Conserved Plasmodium protein, unknown function |
| 8 | 553020.. 553217 | 11 | 3′ and 5′ alternative | MAL8P1.106 | Conserved Plasmodium protein, unknown function |
| 8 | 227245.. 227377 | 19 | 3′ and 5′ alternative | MAL8P1.143 | Conserved Plasmodium protein, unknown function |
| 8 | 284523.. 284611 | 2 | 3′ and 5′ alternative | MAL8P1.138 | Alpha/beta hydrolase, putative |
| 9 | 115387.. 115522 | 18 | 3′ and 5′ alternative | PFI0125c | Serine/Threonine protein kinase, FIKK family |
| 9 | 285483.. 285587 | 59 | 3′ and 5′ alternative | PFI0280c | Autophagocytosis-associated protein, putative |
| 9 | 285483.. 285808 | 3 | Exon skipping | PFI0280c | Autophagocytosis-associated protein, putative |
| 9 | 169401.. 169747 | 3 | Alternative stop | PFI0175w | Conserved Plasmodium protein, unknown function |
| 9 | 527195.. 527472 | 2 | Exon skipping | PFI0560c | Conserved Plasmodium protein, unknown function |
| 9 | 749906.. 750048 | 2 | 3′ and 5′ alternative | PFI0890c | Organelle ribosomal protein L3 precursor, putative |
| 9 | 857945.. 858202 | 18 | Exon skipping | PFI1030c | Ubiquitin conjugating enzyme, putative |
| 9 | 1135552.. 1135898 | 8 | 3′ and 5′ alternative | PFI1375w | Cytochrome C oxidase, putative |
| 9 | 1219040.. 1219282 | 2 | 3′ and 5′ alternative | PFI1490c | Ran-binding protein, putative |
| 9 | 1427723.. 1427820 | 7 | Alternative start | PFI1740c | Ring-exported protein 2 |
| 10 | 122332.. 122507 | 7 | 3′ and 5′ alternative | PF10_0028 | RNA binding protein, putative |
| 10 | 616830.. 617522 | 2 | 3′ and 5′ alternative | PF10_0149 | Cysteinyl-tRNA synthetase, putative |
| 10 | 617363.. 617522 | 5 | 3′ and 5′ alternative | PF10_0149 | Cysteinyl-tRNA synthetase, putative |
| 10 | 630332.. 630635 | 3 | Exon skipping | PF10_0153a | Conserved Plasmodium protein, unknown function |
| 10 | 1506427.. 1506546 | 1194 | 3′ and 5′ alternative | PF10_0372 | Antigen UB05 |
| 11 | 204951.. 205102 | 9 | 3′ and 5′ alternative | PF11_0058 | RNA polymerase subunit, putative |
| 11 | 539968.. 540166 | 218 | 3′ and 5′ alternative | PF11_0149 | Rhomboid protease ROM1, putative |
| 11 | 616746.. 616900 | 4 | 3′ and 5′ alternative | PF11_0169 | SNO glutamine amidotransferase, putative |
| 11 | 736961.. 737116 | 3 | 3′ and 5′ alternative | PF11_0202 | Clathrin coat assembly protein, putative |
| 11 | 1029280.. 1029486 | 14 | 3′ and 5′ alternative | PF11_0273 | DNAJ protein, putative |
| 11 | 1431895.. 1433153 | 3 | Alternative stop | PF11_0377 | Casein kinase 1, PfCK1 |
| 12 | 194966.. 195139 | 2 | 3′ and 5′ alternative | PFL0190w | Ubiquitin conjugating enzyme E2, putative |
| 12 | 545253.. 545453 | 2 | 3′ and 5′ alternative | PFL0610w | Conserved Plasmodium protein, unknown function |
| 12 | 556129.. 556311 | 9 | 3′ and 5′ alternative | PFL0623c | Conserved Plasmodium membrane protein, unknown function |
| 12 | 675551.. 675934 | 15 | Exon skipping | PFL0825c | Conserved Plasmodium protein, unknown function |
| 12 | 848566.. 848672 | 4 | 3′ and 5′ alternative | PFL1015w | Conserved Plasmodium protein, unknown function |
| 12 | 1429555.. 1429746 | 5 | Intron creation | PFL1650w | Conserved Plasmodium protein, unknown function |
| 13 | 598721.. 598890 | 3 | 3′ and 5′ alternative | MAL13P1.70 | Conserved Plasmodium membrane protein, unknown function |
| 13 | 656337.. 656495 | 2 | 3′ and 5′ alternative | MAL13P1.82 | Phosphatidylinositol synthase |
| 13 | 656725.. 657067 | 3 | Exon skipping | MAL13P1.82 | Phosphatidylinositol synthase |
| 13 | 670259.. 670473 | 22 | 3′ and 5′ alternative | MAL13P1.84 | Protein kinase, putative |
| 13 | 892554.. 892819 | 3 | 3′ and 5′ alternative | MAL13P1.118 | 3′,5′-cyclic nucleotide phosphodiesterase |
| 13 | 1097484.. 1097716 | 2 | Intron creation | MAL13P1.144 | Translation initiation factor EIF-2B gamma subunit, putative |
| 13 | 1097497.. 1097716 | 2 | Intron creation | MAL13P1.144 | Translation initiation factor EIF-2B gamma subunit, putative |
| 13 | 1280278.. 1280427 | 5 | 3′ and 5′ alternative | MAL13P1.163 | ER lumen protein retaining receptor 1, putative |
| 13 | 2037484.. 2037617 | 5 | 3′ and 5′ alternative | MAL13P1.257 | Conserved Plasmodium protein, unknown function |
| 13 | 2093143.. 2093239 | 2 | Intron creation | MAL13P1.267 | Conserved Plasmodium protein, unknown function |
| 13 | 2094689.. 2094839 | 5 | Intron creation | MAL13P1.267 | Conserved Plasmodium protein, unknown function |
| 13 | 2242140.. 2242247 | 3 | Exon skipping | MAL13P1.277 | DNAJ-like protein, putative |
| 13 | 2242140.. 2242464 | 4 | Exon skipping | MAL13P1.277 | DNAJ-like protein, putative |
| 13 | 2438774.. 2438947 | 88 | 3′ and 5′ alternative | MAL13P1.303 | Polyadenylate-binding protein, putative |
| 13 | 2463514.. 2463766 | 10 | Exon skipping | MAL13P1.306 | Conserved Plasmodium protein, unknown function |
| 14 | 361362.. 361530 | 3 | 3′ and 5′ alternative | PF14_0089 | Conserved Plasmodium protein, unknown function |
| 14 | 446307.. 446640 | 12 | Exon skipping | PF14_0108 | Conserved Plasmodium protein, unknown function |
| 14 | 448014.. 448158 | 4 | 3′ and 5′ alternative | PF14_0778 | Conserved Plasmodium membrane protein, unknown function |
| 14 | 521860.. 522189 | 5 | Exon skipping | PF14_0128 | Ubiquitin conjugating enzyme, putative |
| 14 | 1079304.. 1079501 | 36 | 3′ and 5′ alternative | PF14_0253 | Conserved Plasmodium membrane protein, unknown function |
| 14 | 1446211.. 1446527 | 24 | Exon skipping | PF14_0338 | Conserved Plasmodium protein, unknown function |
| 14 | 2016906.. 2017099 | 36 | 3′ and 5′ alternative | PF14_0469 | Transcription factor IIIb subunit, putative |
| 14 | 2255846.. 2256076 | 9 | 3′ and 5′ alternative | PF14_0526 | Conserved Plasmodium protein, unknown function |
| 14 | 2481102.. 2481202 | 41 | 3′ and 5′ alternative | PF14_0581 | Apicoplast ribosomal protein S10 precursor, putative |
| 14 | 2587073.. 2587215 | 48 | 3′ and 5′ alternative | PF14_0607 | Conserved Plasmodium membrane protein, unknown function |
| 14 | 2812903.. 2813007 | 2 | 3′ and 5′ alternative | PF14_0653 | Derlin-2, putative |
Overview of alternative splicing events confirmed by RNA-Seq Solexa reads over seven time points used in the study.
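For anyone working with the table above programmatically, the "New splice site" column has to be parsed into numeric coordinates first. The sketch below does this for a handful of rows copied from the table and tallies their event types. It assumes the two numbers are 1-based, inclusive ends of the spliced interval, which is not stated in this record; the function names are illustrative.

```python
import re
from collections import Counter
from typing import Tuple

# A few rows copied from the table: (coordinates, confirming reads, event, gene).
sample_rows = [
    ("294554.. 294928", 3, "Exon skipping", "PFA0345w"),
    ("388453.. 388750", 4, "Exon skipping", "PFA0485w"),
    ("276441.. 276558", 5, "Alternative stop", "PFB0305c"),
    ("1202254.. 1202495", 5, "Intron creation", "PFE1465w"),
    ("1427723.. 1427820", 7, "Alternative start", "PFI1740c"),
]


def parse_span(text: str) -> Tuple[int, int]:
    """Parse a 'start.. end' cell into an integer (start, end) pair."""
    match = re.fullmatch(r"\s*(\d+)\s*\.\.\s*(\d+)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised coordinate cell: {text!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if start > end:
        raise ValueError(f"start is after end in {text!r}")
    return start, end


def span_length(text: str) -> int:
    """Length in bp, assuming 1-based inclusive coordinates (an assumption)."""
    start, end = parse_span(text)
    return end - start + 1


event_counts = Counter(event for _, _, event, _ in sample_rows)

for coords, reads, event, gene in sample_rows:
    print(f"{gene}: {event}, {span_length(coords)} bp, {reads} confirming reads")
print(dict(event_counts))
```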
Fig. 5. Use of RNA-Seq data to detect alternative splicing and exon skipping events in the IDC transcriptome of P. falciparum 3D7. A. Alternative splice sites for exon 4 of PF14_0581 (putative apicoplast ribosomal protein isoforms) highlighted by aligned bridging reads (red) from early ring time points. The boxed area highlights the location of alternative splicing in the gene PF14_0581. The dotted red line links read pairs from the same template DNA. The blue bars show reads that map to the borders of exons, across an intron. Perfectly mapping reads are not shown. B. Example of exon skipping in PF14_0108 (a predicted protein of unknown function). A new splice form was indicated by a read mapping across two introns and an exon, and its read pair (red). Both splice forms were confirmed by RT-PCR (orange boxes). The boxed area highlights the location of exon skipping in the gene PF14_0108. Only the last eight exons of PF14_0108 are shown in the figure.
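The exon-skipping case in panel B rests on a simple idea: a bridging read joins the end of one annotated exon to the start of an exon that is not its immediate neighbour. The sketch below illustrates that logic only; it is not the authors' pipeline. It assumes a forward-strand gene model with exons sorted by position and junctions given as (last base of upstream exon, first base of downstream exon). All coordinates and the function name are hypothetical.

```python
from typing import List, Optional, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive, forward strand


def skipped_exons(exons: List[Exon], junction: Tuple[int, int]) -> Optional[List[int]]:
    """Return indices of annotated exons skipped by a splice junction.

    Returns [] when the junction joins adjacent exons, and None when either
    junction end does not coincide with an annotated exon boundary (which
    would point to an alternative 5' or 3' site rather than exon skipping).
    """
    left, right = junction
    exon_by_end = {end: i for i, (_, end) in enumerate(exons)}
    exon_by_start = {start: i for i, (start, _) in enumerate(exons)}
    if left not in exon_by_end or right not in exon_by_start:
        return None
    upstream, downstream = exon_by_end[left], exon_by_start[right]
    if downstream <= upstream:
        return None  # not a valid forward-strand junction
    return list(range(upstream + 1, downstream))


# Hypothetical three-exon gene model.
model = [(100, 200), (300, 400), (500, 600)]
print(skipped_exons(model, (200, 300)))  # junction between adjacent exons
print(skipped_exons(model, (200, 500)))  # junction that jumps over the middle exon
print(skipped_exons(model, (210, 500)))  # end does not match an annotated boundary
```

In practice a minimum number of supporting reads would also be required before calling an event, in the spirit of the "≥ 2 read pairs" threshold used for confirmed splice sites.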