Integrative analysis of intraerythrocytic differentially expressed transcripts yields novel insights into the biology of Plasmodium falciparum.

Raphael D Isokpehi, Winston A Hide.

Abstract

BACKGROUND: The intraerythrocytic development of Plasmodium falciparum, the most virulent human malaria parasite, involves asexual and gametocyte stages. There has been a significant increase in disparate datasets derived from genomic and post-genomic analysis of the parasite that necessitates delivery of integrated analysis from which biological processes important to the survival of the parasite can be determined.
METHODS: In order to resolve genes associated with stage differentially expressed transcripts, we have developed and implemented an integrative approach that combines evidence from P. falciparum expressed sequence tags (ESTs), genomic, microarray, proteomic and gene ontology data.
RESULTS: A total of 143 gametocyte-overexpressed and 51 asexual-overexpressed transcripts were identified. A subset of 74 genes associated with these transcripts showed evidence of stage-correlated protein expression, of which 53 have not been experimentally characterised. Our study has revealed (1) possible regulatory mechanisms in malaria parasites' gametocyte maturation, (2) correlation between EST and microarray data for a P. falciparum gene family to present unique EST-derived information, (3) candidate drug and antigenic targets on which computational and experimental studies can be performed, and (4) the need for more empirical studies on gene and protein expression in malaria parasites.
CONCLUSION: Applying different domains of data to the same underlying gene set has yielded novel insights into the biology of the parasite and presents an approach to appraise critically the data quality of post-genomic datasets from malaria parasites.

Year:  2003        PMID: 14617379      PMCID: PMC305352          DOI: 10.1186/1475-2875-2-38

Source DB:  PubMed          Journal:  Malar J        ISSN: 1475-2875            Impact factor:   2.979


Background

Pathogen bioinformatics has been developed and applied as a vehicle to discover novel genes and to search for virulence-associated genes, combining approaches that assay gene expression, adaptive evolution and gene transfer [1-3]. In this study, layers of data about Plasmodium falciparum, obtained with gene transcript and genome sequencing as well as gene and protein expression profiling technologies, were integrated to reveal insights into previously undiscovered regulation during intraerythrocytic development. Genes that merit further analysis are described. This integrative approach uses an evidence-based assessment of disparate datasets, similar to gene structure prediction approaches that rely on accumulation of evidence such as similarity to known genes, nucleotide compositional features, intron/exon boundaries and promoter sequences [4].
The high malaria burden in Africa [5,6] necessitates increased efforts to understand the biology of the pathogen with a view to discovering new drugs, candidate vaccines and diagnostics, as well as improving existing ones. The publication of the genomes of the human malaria parasite P. falciparum and the rodent malaria parasite Plasmodium yoelii, as well as ongoing sequencing projects of other Plasmodium species, presents new opportunities to achieve the above-mentioned goals [7-9]. In addition, there have been efforts to obtain and analyse, on a large scale, gene expression profiles (transcriptome) of Plasmodium species using Expressed Sequence Tags (ESTs) [1,10-13], full length cDNAs [14], Serial Analysis of Gene Expression (SAGE) [15,16] and microarrays [17-19]. Protein expression profiles (proteome) on particular stages of the P. falciparum life cycle are also available [20,21]. The random single-pass sequencing of a cDNA library to generate short (200–500 bp) nucleotide sequences that tag an expressed gene sequence is an established method of gene discovery [22,23].
EST gene indices are generated by computer-based methods to organise these tags by assigning them into groups to remove redundancies and yield reconstructed transcripts that represent consensus sequences of each group [22,24,25]. These indices are being used to understand the complexity of the human genome, especially in providing information on alternative transcripts, non-translated transcripts, truly unique genes and extremely short genes that will complement the genome data [25]. The availability of the complete genome of P. falciparum 3D7 makes it possible to provide similar information for the parasite. In fact, additional EST and full-length cDNA sequences are required to improve the current annotation and verify predicted genes [7]. EST sequencing projects on Plasmodium have identified novel genes [1,10,13] but only limited analyses have been performed on ESTs for coordinate and differential gene expression [13].
Plasmodium ESTs from a variety of cDNA libraries are available in the GenBank EST database (dbEST). As of February 2003, 11 libraries, comprising nine asexual, one sporozoite and one gametocyte library, were available in dbEST. ESTs from some of these libraries have been indexed [1,10,13,26]. Microarrays, mRNA differential display and EST-based analysis have been used to study transcriptional differences between asexual and gametocyte stages of P. falciparum, revealing stage-specific genes [13,17,27]. These studies were done prior to the publication of the genome sequence of strain 3D7. Furthermore, in the case of Li and colleagues [13], the functional annotation was selective. An EST-based analysis with an improved functional annotation that combines the automated annotation from P. falciparum gene indices and the curated annotation in the Plasmodium Genome Database (PlasmoDB) [28] is needed.
In addition, integration of proteomic data with such analysis has been recognized as an important component in drug target identification and validation in the human genome [29]. The number of ESTs used to generate a consensus sequence in a gene index can provide a rough estimate of the mRNA abundance in the tissue or cell of origin [23]. Furthermore, statistical tests have been developed to identify genes that are differentially expressed (significantly overexpressed) in a particular tissue compared to one or more other tissues [30,31]. The differences in EST counts have been applied to understand gene expression in different metabolic pathways, tissues or stages [32-34]. These differences appear to correlate with the biology of the tissue or stage under investigation. Microarray and SAGE methods are narrower in scope but sensitive for differential gene expression studies and can be used to validate broader EST-based analysis [13].
The life cycle of P. falciparum involves stages in the female anopheline mosquito vector and stages in the human host [35]. The parasite goes through pre-erythrocytic and intraerythrocytic stages in the human host. The pre-erythrocytic stage involves invasion and growth within liver cells, whereas the intraerythrocytic cycle is a multi-stage process, which includes differentiation into asexual stages (rings, merozoites, trophozoites and schizonts) as well as sexual stages (male and female gametocytes). The clinical symptoms of malaria are produced primarily as a consequence of the asexual life cycle, while the sexual cycle, which can be divided into early (I-II) and late (III-V) gametocyte stages [36], is necessary for the development of the parasite in the mosquito. The greater research effort on gene expression in the asexual stage than in the gametocyte stage can be inferred from the number of cDNA libraries deposited in dbEST, as mentioned above.
The late (mature) stage gametocyte cDNA library (ID:10054) should contain transcripts important for gametocyte maturation and also formation of gametes and fertilization [37]. The availability of a cDNA library of 3D7 (ID:9765) asexual mixed stage (rings, trophozoites and schizonts) and genome data from the same strain presents an opportunity to determine differentially expressed transcripts between the two libraries.
Transcription and translation in malaria parasites are complex and characterized by features such as multiple transcripts, antisense transcripts, stage-specific transcripts, chromosomal clusters encoding co-expressed proteins, unspliced mRNA, gene family member-specific expression and translational control [20,38,39]. These features contribute to parasite fitness and the ability to undergo a complex life cycle. Understanding the role of these features in the regulation of important intraerythrocytic biological processes can deliver new tools for malaria control. For example, a proportion of genes involved in glycolysis, proteolysis and apicoplast targeting of nuclear encoded genes are thought to be regulated during the transition from asexual to sexual stages [7,40]. The integration of data from EST sequencing with those from genomic, microarray and proteomic technologies could provide insights into molecular mechanisms that contribute to the regulation of these processes.
The significant increase in disparate datasets from genome sequencing and post-genomic analysis of P. falciparum necessitates delivery of integrated analysis from which biological processes important to the survival of the parasite can be determined. The integrated approach developed has identified stage-overexpressed genes with computational and experimental evidence to support their functional analysis. Furthermore, the approach is demonstrated as a means to appraise critically the data quality of the increasing number of post-genomic datasets from malaria parasites.

Methods

Integrative analysis approach

The integrative analysis approach that was used to combine genomic, expressed sequence tag, microarray, proteomic and gene ontology data from P. falciparum 3D7 is presented in Figure 1. The starting integrative criterion was significant overexpression of a transcript in a stage relative to the other stage. Criteria used and their acceptable ranges are presented in Table 1.
Figure 1

Simplified flowchart of integrative analysis of Plasmodium falciparum data. Flowchart symbols: rounded rectangle, start or end; rectangle, process; diamond, decision.

Table 1

Threshold values for steps in integrative analysis of Plasmodium falciparum data

Criterion and acceptable range
Reconstructed transcript derived from minimum of 5 ESTs
Agreement of pairwise differential expression statistics at P < 0.05
Maximum BLASTX E-value of 10^-10 against predicted proteins
Correlation of functional annotation with Plasmodium falciparum gene indices
Evidence that protein is expressed in same stage as gene
Gene Ontology classification: proteolysis, glycolysis or localised to plastid
Microarray: Published data on a gene family

Expressed sequence tags and transcript reconstruction

Expressed Sequence Tags derived from P. falciparum 3D7 mixed asexual stage (dbEST ID: 9765) and gametocyte (III-V) stages (dbEST ID: 10054) cDNA libraries were retrieved using Sequence Retrieval System (SRS) version 7.02 from EMBL database (Release 74, March 2003). These sets of ESTs were sequenced by Washington University Plasmodium EST Project [13]. A total of 15,126 ESTs consisting of 11,872 asexual and 3,254 gametocyte ESTs were downloaded. Transcript reconstruction of these ESTs was performed using stackPACK clustering system version 2.2 [22,24] as described previously for reconstructing Plasmodium transcripts [1]. Briefly, the process starts with removal of artifactual sequences such as repeats and vector sequences. The "clean" sequences are grouped using a loose clustering approach into clusters and the clusters assembled into contigs. The alignments of sequences that make up these assembled clusters are analysed to produce consensus sequences of maximal length representing the reconstructed transcripts. stackPACK was chosen for its ability to provide extended consensus sequences [41] (Hide et al. in preparation). Clusters containing only a single sequence are called singletons. A gene index, manufactured by such a method, is therefore a non-redundant representation of a set of reconstructed gene fragments that approximates to the best available representation of genes for that organism. The clustering was unsupervised in that known sequences such as mRNA, full-length cDNA, previously reconstructed ESTs or exon constructs were not used to guide the process. This type of clustering was required to provide valid input data for the software used to calculate the differential expression statistics applied in this study.
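The grouping step described above can be illustrated with a deliberately small sketch. It is not the stackPACK algorithm: it swaps in single-linkage grouping by shared k-mers (using union-find) purely to show how "clean" ESTs end up in multi-sequence clusters or as singletons. The function names, the word length k and the toy sequences are all assumptions made for this illustration, and reverse-complement matches are ignored.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def loose_cluster(ests, k=12):
    """Group ESTs that share at least one k-mer (single linkage, hypothetical rule).

    ests maps an EST name to its nucleotide sequence.
    """
    parent = {name: name for name in ests}

    def find(x):
        # Walk to the cluster representative, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # k-mer -> first EST in which it was observed
    for name, seq in ests.items():
        for word in kmers(seq.upper(), k):
            if word in first_seen:
                parent[find(name)] = find(first_seen[word])
            else:
                first_seen[word] = name

    clusters = defaultdict(list)
    for name in ests:
        clusters[find(name)].append(name)
    return list(clusters.values())


def split_singletons(clusters):
    """Separate multi-sequence clusters from clusters holding a single EST."""
    multi = [c for c in clusters if len(c) > 1]
    singletons = [c[0] for c in clusters if len(c) == 1]
    return multi, singletons
```

A real gene index additionally assembles each cluster into contigs and derives a consensus sequence, which this sketch does not attempt.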

Differential gene expression analysis

Audic-Claverie (AC) and Chi-square (χ²) 2 × 2 statistical tests for differential gene expression were used to identify stage-overexpressed transcripts. These pairwise tag statistics are based on EST counts of contigs (assembled clusters) with at least five ESTs since, for a 95% confidence interval, the first value that is significantly different from 0 is 5 [30,32]. The calculation of these statistics was implemented with the web version of the IDEG6 software, with a significance threshold of 0.05 [31]. A suite of PERL scripts was written to extract EST counts from the output of stackPACK 2.2 and present the input dataset in the format required by IDEG6. Data extracted from the output file of IDEG6 were (1) contig description; (2) observed and normalised EST counts from the two libraries; and (3) probability that a transcript is differentially expressed, as represented by P-values for the two tests. Transcripts for which the P-values for both statistics were less than 0.05 were taken as differentially expressed. Since these statistics identify differentially expressed transcripts, the terms asexual-overexpressed and gametocyte-overexpressed were used for transcripts (or genes) with significant overexpression in mixed asexual stage and late stage gametocytes respectively.
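The two pairwise tests and the "both P-values below 0.05" rule can be sketched as follows. This is an illustration under stated assumptions, not a reimplementation of IDEG6: the AC probability follows the published Audic-Claverie formula, but the way it is turned into a tail P-value (taking the smaller tail) is an assumed convention, the chi-square test is the plain Pearson form without continuity correction, and all function names are hypothetical. Here x and y are the EST counts of one contig in the two libraries, and n1 and n2 are the library sizes (for example 11,872 asexual and 3,254 gametocyte ESTs).

```python
import math


def ac_probability(x, y, n1, n2):
    """Audic-Claverie p(y | x): probability of y ESTs in library 2 given x in library 1."""
    ratio = n2 / n1
    log_p = (y * math.log(ratio)
             + math.lgamma(x + y + 1) - math.lgamma(x + 1) - math.lgamma(y + 1)
             - (x + y + 1) * math.log(1 + ratio))
    return math.exp(log_p)


def ac_p_value(x, y, n1, n2):
    """Smaller tail of p(. | x) at the observed y (assumed convention)."""
    below = sum(ac_probability(x, k, n1, n2) for k in range(y))
    lower = below + ac_probability(x, y, n1, n2)   # P(Y <= y)
    upper = max(0.0, 1.0 - below)                  # P(Y >= y)
    return min(lower, upper, 1.0)


def chi2_p_value(x, y, n1, n2):
    """Pearson chi-square test (1 d.f.) on the 2 x 2 table [[x, n1-x], [y, n2-y]]."""
    a, b, c, d = x, n1 - x, y, n2 - y
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0
    stat = (n1 + n2) * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with one degree of freedom.
    return math.erfc(math.sqrt(stat / 2))


def is_differential(x, y, n1, n2, alpha=0.05, min_ests=5):
    """Apply the contig-size filter, then require both tests to be significant."""
    if x + y < min_ests:
        return False
    return ac_p_value(x, y, n1, n2) < alpha and chi2_p_value(x, y, n1, n2) < alpha
```

For example, is_differential(0, 20, 11872, 3254) asks whether a contig built from 20 gametocyte ESTs and no asexual ESTs passes both tests given the library sizes used in this study.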

Protein expression profiles and functional annotation of transcripts

Annotated protein predictions (release 4.0) of the whole genome sequence of P. falciparum 3D7 were obtained from the PlasmoDB website. A total of 5,334 predicted protein sequences were obtained. The overview page for each gene was retrieved using wget and saved as a Hypertext Markup Language (HTML) file on a local computer to allow ease of manipulation without accessing the database over the Internet. A PERL script was used to query each page for the words sporozoite, merozoite, trophozoite or gametocyte preceded by an apostrophe (') and followed by a specific text, as for the gametocyte: 'gametocyte stage peptide fragment(s) detected by mass spectrometry'. A match of this text was taken as evidence of expression, and protein expression at the stage was assigned 1, or else 0 for no evidence. Thus, a 4-digit binary accession that indicates evidence for expression in sporozoite, merozoite, trophozoite and gametocyte is used to represent the 15 protein expression profiles presented by Florens et al. [20], with an additional accession for lack of evidence in all stages (0000).
Reconstructed transcripts were annotated on the basis of similarity searches using NCBI BLASTX version 2.2.1 against predicted proteins of P. falciparum 3D7. The statistical significance cut-off was set at an E-value of 10^-10, following that of Carlton et al. [1]. Since an unsupervised clustering was performed, the annotations obtained were correlated with the TIGR P. falciparum Gene Index (Version 6.0, Release Date – January 11, 2003) and the Apicomplexan EST Database (ApiESTDB) to support the functional annotation. Both of these indices were generated with supervised clustering. The correlation was done by computational extraction of the associated annotation of the TIGR Tentative Consensus (TC), followed by manual checking to determine if the annotation obtained in our analysis was identical to that of the TIGR TCs. This was done only for differentially expressed contigs. If the annotations were not identical, the reconstructed sequence was excluded from further analysis. ApiESTDB was consulted when additional support was required to make a decision.
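The 4-digit binary accession can be illustrated with a short sketch in Python rather than the PERL used in the study. It assumes that the evidence text for every stage follows the same wording as the gametocyte example quoted above, which the source states only by analogy; the function names and the sample page text in the usage note are hypothetical.

```python
# Stage order fixes the digit order of the accession.
STAGES = ("sporozoite", "merozoite", "trophozoite", "gametocyte")

# Assumed to be identical for all four stages (only the gametocyte wording is quoted).
EVIDENCE_SUFFIX = " stage peptide fragment(s) detected by mass spectrometry"


def expression_accession(page_html):
    """Build the 4-digit accession: 1 if the stage's evidence text occurs, else 0."""
    bits = []
    for stage in STAGES:
        marker = "'" + stage + EVIDENCE_SUFFIX  # stage name preceded by an apostrophe
        bits.append("1" if marker in page_html else "0")
    return "".join(bits)


def expressed_in(accession, stage):
    """True if the accession records protein evidence for the given stage."""
    return accession[STAGES.index(stage)] == "1"
```

A saved overview page containing only the trophozoite and gametocyte evidence strings would yield 0011, and a page with none of them yields 0000, giving the 16 possible accessions (15 expression profiles plus no evidence).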

Mining gene ontology annotation associated with transcripts

Genes classified as being involved in glycolysis (GO:0006096), proteolysis (GO:0006508) or targeted to the plastid (GO:0009536) were retrieved by searching PlasmoDB gene overview page for the respective GO identification (ID) number in a similar way as described for the protein expression profile except the search text was the respective GO ID preceded by the greater than sign (>) for example >GO:0006096. This text limits the search to the Gene Ontology section of the gene overview page. The number of genes retrieved was: 20 for glycolysis, 98 for proteolysis and 553 for plastid component. This corresponds to values obtained from the web-based PlasmoDB query page.
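The GO-based retrieval can be sketched in the same way. The three GO identifiers and the use of a preceding greater-than sign come from the text above; the function names and the assumption that saved pages are held in a dictionary keyed by gene locus name are illustrative only.

```python
GO_TERMS = {
    "glycolysis": "GO:0006096",
    "proteolysis": "GO:0006508",
    "plastid": "GO:0009536",
}


def go_classes(page_html):
    """Return the classes whose GO ID appears directly after a '>' in the page."""
    return {name for name, go_id in GO_TERMS.items() if ">" + go_id in page_html}


def genes_by_go(pages):
    """Map each class to the loci whose saved overview page matches it.

    pages: dict of gene locus name -> HTML text of the saved overview page.
    """
    result = {name: [] for name in GO_TERMS}
    for locus, html in pages.items():
        for name in go_classes(html):
            result[name].append(locus)
    return result
```

Requiring the leading '>' means that a GO ID mentioned elsewhere on the page, outside the Gene Ontology section, is not counted.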

Correlation of EST-based abundance with microarray expression levels

The numbers of ESTs used to generate a reconstructed sequence were retrieved from the FASTA sequence description line of all reconstructed sequences generated by stackPACK 2.2. The levels of expression or average signal intensities obtained from microarray experiments on the serine repeat antigen (SERA) gene family of P. falciparum [19,42-44] were compared with the levels of expression obtained using ESTs. This gene family is characterised by a cysteine proteinase framework [39] and was selected because its members are annotated as being involved in proteolysis. Published microarray studies on this family were available, which facilitated comparative analysis with EST data.

Results

Transcript reconstruction and functional annotation of transcripts

Transcript reconstruction using stackPACK 2.2 resulted in 1,760 contigs and 3,391 singletons. A total of 569 transcripts had an EST count of at least five ESTs. Functional annotation by similarity searching was performed for all reconstructed transcripts. A total of 210 transcripts that were differentially expressed were manually checked for correlation with TIGR and/or ApiESTDB P. falciparum gene indices. This process yielded 194 transcripts with correlated functional annotation.

Differential expression transcripts and protein expression profiling

The majority of the stage-overexpressed transcripts were from the late gametocyte stage. However, the mixed asexual stage had the highest percentage (83%) of genes with evidence of protein expression in the same stage (stage-correlated protein expression) compared to 31% for the late gametocyte stage. The observations are summarised in Tables 2 to 5. The 194 transcripts differentially expressed between the two libraries consisted of 51 from the mixed asexual stage and 143 from the late gametocyte stage. The complete list with transcript identification used in this study, correlated transcripts in the TIGR P. falciparum gene index, gene locus name, gene product description, representative EST or ESTs (for genes with representation from both libraries), observed and normalized EST counts for the two stages, as well as protein expression profile, are presented in the additional files 1 and 2 for mixed asexual stage and late gametocyte stage respectively. A list of stage-overexpressed transcripts that match those of Li et al. [13] is presented in additional file 3.
Table 2

Summary of functional annotation and protein expression of Plasmodium falciparum transcripts

Transcripts | Number
Differentially expressed | 210
Correlated functional annotation | 194
Stage-overexpressed |
  Mixed asexual stage | 51
  Late stage gametocyte | 143
With significant match to predicted proteins |
  Mixed asexual stage | 48
  Late stage gametocyte | 128
Correlated protein expression |
  Mixed asexual stage | 40
  Late stage gametocyte | 38
Table 5

Distribution of protein expression profiles for Plasmodium falciparum stage-overexpressed genes

Gene category | Binary accession (a) | Count
Asexual-overexpressed | |
  With protein expression | 1111, 0111, 1011, 1101, 1110, 0011, 0101, 0110, 1010, 1100, 0010, 0100 | 40
  Without protein expression | 0000, 1001, 0001, 1000 | 8
Gametocyte-overexpressed | |
  With protein expression | 1111, 0111, 1011, 1101, 0011, 0101, 1001, 0001 | 34
  Without protein expression | 0000, 1110, 0110, 1010, 1100, 0010, 0100, 1000 | 87

a 4-digit binary accession for protein expression evidence in sporozoite, merozoite, trophozoite and gametocyte.

A total of 128 gametocyte-overexpressed and 48 asexual-overexpressed transcripts had a significant match with the predicted P. falciparum 3D7 proteins. Seventy-four genes (40 asexual-overexpressed, 34 gametocyte-overexpressed) showed evidence of stage-correlated protein expression (Tables 3 and 4). The well-studied S-antigen (PF10_0343) is one of the eight asexual-overexpressed genes without stage-correlated protein expression. Four gametocyte-overexpressed genes (PFB0730w, PFI1210w, PF10_0115 and PFL0105w) had more than one reconstructed transcript. Multiple transcripts were generated when the reconstructed transcripts associated with a gene were not contiguous, and thus were not assembled into the same contig. Fifty-three of the 74 genes were classified as novel in that the description of the gene product is either labelled hypothetical protein or contains the word putative.
Table 3

Asexual-overexpressed Plasmodium falciparum transcripts

Transcript (a) | TIGR Tentative Consensus (b) | Gene locus name (c) | Description of gene product | Representative EST(s) (d)
cn672 | TC6879 | PFI0265c | rhoptry protein, putative | BI670632
cn1243 | TC6890 TC6891 | PFL1385c | 101 kd malaria antigen | BI670667
cn656 | TC6894 | PF11_0098 | endoplasmic reticulum-resident calcium binding protein | BI670528 BM274707
cn346 | TC6883 TC6884 TC6885 | PF14_0598 (e) | glyceraldehyde-3-phosphate dehydrogenase | BI670581 BM273393
cn659 | TC6886 TC6887 | PFB0340c (g) | cysteine protease, putative | BI670678
cn646 | TC6895 | PF14_0102 | rhoptry-associated protein 1 | BI670673
cn1292 | TC6896 | PFI0875w | Heat shock protein | BI670644
cn634 | TC6897 TC6898 TC8065 | MAL13P1.214 | phosphoethanolamine N-methyltransferase, putative | BI670572
cn1258 | TC6900 | PFI1445w | hypothetical protein | BI670690
cn1175 | TC6899 | PFC0120w | Cytoadherence linked asexual protein, CLAG | BI670808
cn637 | TC6921 | PFE0165w | actin depolymerizing factor, putative | BI813965 BM274236
cn1246 | TC6922 | MAL8P1.142 (g) | proteasome beta-subunit | BI670563
cn628 | TC6926 | PF10_0203 | ADP-ribosylation factor | BI814382
cn1338 | TC6943 | PF14_0141 | ribosomal protein L10, putative | BI670722
cn1375 | TC6945 | MAL7P1.77 | hypothetical protein | BI814179
cn1569 | TC6954 TC6955 | PFE0915c | proteasome subunit beta type 1 | BI670682
cn1255 | TC6969 TC7520 | PFB0445c | helicase, putative | BI670715
cn604 | TC6958 | PFL0210c | eukaryotic initiation factor 5a, putative | BI670597
cn1249 | TC6970 | PF07_0054 | histone h2b, putative | BI670668
cn1465 | TC6959 | PF14_0368 | 2-Cys peroxiredoxin | BI670633
cn581 | TC6975 | PF14_0543 (f) | hypothetical protein, conserved | BI814501
cn1219 | TC6956 | PF10_0345 | merozoite surface protein-3 | BI670568
cn1339 | TC6992 | PFL1420w | macrophage migration inhibitory factor homolog, putative | BI815759
cn1396 | TC6971 | PF10_0121 | hypoxanthine phosphoribosyltransferase | BI814714
cn567 | TC6917 | PF10_0268 | merozoite capping protein-1 | BI670775
cn1555 | TC7001 | PFI0155c | ras family GTP-ase, putative | BI814010
cn561 | TC7038 | PF10_0016 | acyl CoA binding protein, putative | BI815304
cn1165 | TC7015 | PFD0240c | hypothetical protein | BI816061
cn1379 | TC7007 | PF07_0087 (f) | hypothetical protein | BI813959
cn1475 | TC6914 | PFI1090w | s-adenosylmethionine synthetase, putative | BI813864
cn1811 | TC6989 TC6990 | PF14_0323 | calmodulin | BI814267
cn564 | TC6993 | PFE1050w | adenosylhomocysteinase (S-adenosyl-L-homocysteine hydrolase) | BI814536
cn613 | TC7023 TC8311 | PFB0490c | hypothetical protein | BI815328
cn1485 | TC7032 | PF13_0228 | 40S ribosomal subunit protein S6, putative | BI670560
cn1681 | TC7025 | PF13_0328 | proliferating cell nuclear antigen | BI813993
cn558 | TC7018 | PF14_0678 | exported protein 2 | BI670646
cn1605 | TC6904 | MAL13P1.130 | hypothetical protein | BI814223
cn1997 | TC7030 | PFE0660c | uridine phosphorylase, putative | BI814451
cn557 | TC7036 | PF13_0092 | cholinephosphate cytidylyltransferase | BI814410
cn1368 | TC7086 | PF14_0569 | hypothetical protein | BI814420

a Transcript generated by stackPACK 2.2. b TIGR Tentative Consensus correlated with transcript available at . c Gene can be viewed at . d EST can be retrieved at . e Gene involved in glycolysis. f Apicoplast-targeted gene. g Gene involved in proteolysis.

Table 4

Gametocyte-overexpressed Plasmodium falciparum transcripts

Transcript (a) | TIGR Tentative Consensus (b) | Gene locus name (c) | Description of gene product | Representative EST(s) (d)
cn298 | TC6923 TC7279 TC9304 | PFD0310w | sexual stage-specific protein precursor | BI814617 BM273325
cn156 | TC6995 | PFL0795c | hypothetical protein | BI813971 BM273682
cn144 | TC7077 | PF11_0525 (f) | hypothetical protein | BM273367
cn369 | TC6974 | PF10_0264 | 40S ribosomal protein, putative | BI814069 BM273547
cn57 | TC7312 TC7511 | PFL2420w | hypothetical protein | BM273440
cn271 | TC6963 | PFB0730w | DNA helicase, putative | BM273418
cn291 | TC6911 | PF07_0029 | heat shock protein 86 | BI670622 BM273491
cn43 | TC6936 | PFL2215w | actin | BM273378
cn105 | TC7084 | PF07_0061 | hypothetical protein | BI936117 BM273354
cn168 | TC6963 | PFB0730w | DNA helicase, putative | BM273308
cn178 | TC6987 | PFI1210w | hypothetical protein | BM274237
cn337 | TC7315 | PF08_0081 | hypothetical protein | BM274748
cn404 | TC7057 | PF10_0115 | QF122 antigen | BM273319 BQ596378
cn46 | TC7235 | PFL0105w | hypothetical protein | BM273988 BQ577236
cn246 | TC7159 | PF14_0359 | hypothetical protein, conserved | BI814120 BM273571
cn60 | TC7496 | PF10_0328 | hypothetical protein | BM273370
cn155 | TC7437 | PF11_0294 (e) | ATP-dependent phosphofructokinase, putative | BM273524
cn269 | TC7203 | MAL6P1.306 | hypothetical protein | BI815038 BM273934
cn347 | TC6987 | PFI1210w | hypothetical protein | BM273395
cn19 | TC7561 | MAL13P1.148 | P. falciparum myosin | BM274131
cn683 | TC7619 | PFD0235c | hypothetical protein | BM274865
cn833 | TC7170 | PFL1070c | endoplasmin homolog precursor, putative | BI670681 BM273857
cn71 | TC6893 | PFL0105w | hypothetical protein | BM274046
cn93 | TC7763 | PF11_0460 | hypothetical protein | BM273313
cn165 | TC7103 | PF13_0165 | hypothetical protein | BI670714 BM273638
cn288 | TC7304 | PF10_0165 | DNA polymerase delta catalytic subunit | BM274252
cn685 | TC7766 | PF11_0331 | t-complex protein 1, alpha subunit, putative | BM273631
cn717 | TC7621 | PF10_0115 | QF122 antigen | BM273917
cn737 | TC8144 | PFL1395c | hypothetical protein | BM273513
cn832 | TC7423 | PFI0460w | hypothetical protein | BM273947
cn49 | TC7047 | PF10_0242 | hypothetical protein | BM274006 BQ597262
cn248 | TC7431 | PFD0685c | chromosome associated protein, putative | BI936055 BM274686
cn326 | TC7394 | PFC0570c | hypothetical protein | BM273462 BU496460
cn750 | TC7788 | PF10_0256 | hypothetical protein | BM273642 BQ452171
cn945 | TC7533 | PFA0460c | tubulin-specific chaperone a, putative | BM273558 BQ451292
cn982 | TC7573 | MAL6P1.48 | hypothetical protein, expressed | BI814116 BM273303
cn681 | TC7652 | PFE0845c | 60S ribosomal subunit protein L8, putative | BM273443 BU495298
cn805 | TC7301 | MAL13P1.120 | splicing factor, putative | BI815872 BM274487

a Transcript generated by stackPACK 2.2. b TIGR Tentative Consensus correlated with transcript available at . c Gene can be viewed at . d EST can be retrieved at . e Gene involved in glycolysis. f Apicoplast-targeted gene.

In order to identify gametocyte-overexpressed genes that also have stage-correlated protein expression in the proteomics data of Lasonder et al. [21], the spreadsheet file containing 1,289 unique malaria proteins from that study was processed to yield a 3-digit binary accession representing evidence for protein expression of genes in trophozoites/schizonts, gametocytes and gametes. Fifteen of the 34 gametocyte-overexpressed genes were detected by both proteomic analyses (Table 6). Our analysis points to the need to clarify potential confusion in the annotation of the sexual stage specific protein precursor or Pfs16 (PFD0310w), a known marker for the earliest events of sexual differentiation [45]. The locus name (PF11_0318) of another gene, PF16, may be assigned to this gene [21]. PF16 has sequence similarity to a sperm flagella protein localized to the central pair of the axoneme. The gametocyte-overexpressed gene identified in this study was confirmed to be Pfs16 and not PF16 by the identical functional annotation of the associated consensus sequence from this study and that in the TIGR P. falciparum gene index.
Table 6

Gametocyte-overexpressed Plasmodium falciparum genes with correlated protein expression in two proteomic studies

Gene locus name | Description of gene product | Protein expression binary accession (a): Florens et al. [20] (b) | Lasonder et al. [21] (c)
PFA0460c | tubulin-specific chaperone a, putative | 0001 | 011
PFD0310w | sexual stage-specific protein precursor | 0011 | 111
PFD0685c | chromosome associated protein, putative | 0101 | 010
PFE0845c | 60S ribosomal subunit protein L8, putative | 0111 | 111
PF07_0029 | heat shock protein 86 | 1111 | 111
PF10_0165 | DNA polymerase delta catalytic subunit | 0111 | 010
PF10_0242 | hypothetical protein | 0111 | 111
PF10_0264 | 40S ribosomal protein, putative | 0111 | 111
PF11_0294 | ATP-dependent phosphofructokinase, putative | 0001 | 011
PF11_0331 | t-complex protein 1, alpha subunit, putative | 1111 | 111
PF11_0525 | hypothetical protein | 1001 | 010
PFL0795c | hypothetical protein | 0001 | 011
PFL1070c | endoplasmin homolog precursor, putative | 1111 | 111
PFL2215w | actin | 1111 | 111
PF14_0359 | hypothetical protein, conserved | 0111 | 111

a Evidence of expression: 0, no evidence; 1, with evidence. b 4-digit binary accession for protein expression evidence in sporozoite, merozoite, trophozoite and gametocyte. c 3-digit binary accession for protein evidence in trophozoite/schizont, gametocyte and gametes.

The identified asexual-overexpressed genes that have been experimentally characterised have known roles in protein degradation, purine salvage, rhoptry biogenesis and protein trafficking, schizont rupture, merozoite invasion, phospholipid biosynthesis, nuclear metabolism, oxidative stress defense, cell proliferation and membrane biogenesis. Glyceraldehyde-3-phosphate dehydrogenase (PF14_0598) and ATP-dependent phosphofructokinase (PF11_0294) are two of 20 genes known to be involved in glycolysis. They demonstrate differential expression and show evidence of stage-correlated protein expression. Microarray average intensities [19] available in PlasmoDB for PF11_0294 support its gametocyte-overexpression when compared to a closely related gene, PFI0755c, that also codes for a phosphofructokinase and shows protein expression in intraerythrocytic stages [20,21]. The microarray expression values for PFI0755c in trophozoite and schizont stages are 17,223.33 and 7,894 respectively, in contrast to ~1,600 in both stages for PF11_0294. Inspection of the predicted protein features of PF11_0294 revealed the presence of two protein domains: gonadotropin-releasing domain, GnRH (Pfam ID: PF00446) and laminin N-terminal (Domain VI) (Pfam ID: PF00055). These domains are found in proteins that are extracellular and have a role in regulation of germ cell development. PFB0340c, a cysteine protease and member of the SERA gene family, was significantly overexpressed in mixed asexual stage.
Other genes in the SERA family for which EST data were available were checked for correlation of functional annotation and their EST count retrieved. As shown in Table 7, the EST counts were variable across the gene family consistent with microarray-based studies [42-44]. There was EST evidence for expression of PFB0345c (SERA4), PFB0340c (SERA5) and PFB0335c (SERA6), the three central genes that were demonstrated to be essential for asexual stage growth [42]. The GenBank accession numbers of a representative EST from these genes are BI936220, BI815392 and BQ633262 respectively. PFB0340c showed the highest EST count and microarray intensity values during asexual development of the parasite. Furthermore, multiple contigs mapped to this gene, which may represent alternative transcripts.
Table 7

Correlation of EST abundance and microarray intensity associated with SERA gene family

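Microarray intensity values c are given for Le Roch et al. [43] (R, T, S), Bozdech et al. [19] (T, S) and Wu et al. [44] (Asyn).

| Gene (Locus name) | EST count a | Comments b, Miller et al. [42] | R [43] | T [43] | S [43] | T [19] | S [19] | Asyn [44] |
|---|---|---|---|---|---|---|---|---|
| SERA8 (PFB0325c) | - | -/+ | 35.3 | 10.4 | 39.3 | - | - | 179 |
| SERA7 (PFB0330c) d | 7 | -/+ | 160.5 | 982.1 | 1298 | 2238 | 5475.83 | 2415 |
| SERA6 (PFB0335c) e | 2 | + | 200.7 | 588.6 | 1012.6 | 1695.17 | 4802.83 | 3428 |
| SERA5 (PFB0340c) e, f | 98 | + | 1255.4 | 4623.7 | 10265.5 | 13253.67 | 59511.17 | 28613 |
| SERA4 (PFB0345c) e | 4 | + | 200 | 496.7 | 1456.7 | 3115.17 | 10053.17 | 2273 |
| SERA3 (PFB0350c) | - | + | 87.3 | 341 | 579.7 | - | 6319.83 | 4572 |
| SERA2 (PFB0355c) | - | -/+ | 185.4 | 219.4 | 399.1 | - | - | 1401 |
| SERA1 (PFB0360c) | 2 | -/+ | 125.9 | 178.1 | 615.7 | - | - | 376 |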

a -, no ESTs observed. b Comments on gene expression: -/+, low or absent expression; +, expression confirmed by RT-PCR and microarray. c R, Rings; T, Trophozoite; S, Schizont; Asyn, asynchronous culture; -, no expression value reported. d EST count of TIGR TC7227. e Central genes in the SERA locus that could not be disrupted in study [42]. f Gene with multiple transcripts, TC6886 (BI670678) and TC6962 (BI814535).
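The agreement between EST abundance and microarray intensity in Table 7 can be summarised with a rank correlation. The sketch below is illustrative only and is not an analysis reported in the study: it computes Spearman's rho (with average ranks for ties) between the EST counts and the asynchronous-culture intensities of Wu et al. [44], under the assumption that "-" (no ESTs observed) is treated as a count of 0. The helper names `average_ranks` and `spearman` are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Table 7 rows in order SERA8 ... SERA1; "-" EST counts entered as 0 (assumption).
est_counts = [0, 7, 2, 98, 4, 0, 0, 2]
asyn_intensity = [179, 2415, 3428, 28613, 2273, 4572, 1401, 376]

rho = spearman(est_counts, asyn_intensity)
print(f"Spearman rho = {rho:.2f}")
```

With only eight genes and several tied EST counts, such a coefficient is a rough descriptive summary rather than a test of significance.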

Out of the 17 transcripts (four asexual and 13 gametocyte) associated with genes targeted to the apicoplast, only two genes, MAL13P1.281 and PFE0145w, have similarities to known genes (glutamate-tRNA ligase and 50S ribosomal subunit protein L28). There was evidence of protein expression in at least one asexual stage for two (PF07_0087, PF14_0543) of the four asexual-overexpressed genes (Table 3). Six gametocyte-overexpressed genes showed evidence for expression in the sporozoite stage, while only PF11_0525 showed evidence in both the sporozoite and gametocyte stages. PF11_0525 has predicted protein motifs that indicate its likely function. The domains are IQ (calmodulin-binding motif, Pfam ID: PF00612) and LysM (lysin motif, Pfam ID: PF01476), which is a general peptidoglycan-binding module. A list of apicoplast-targeted genes with stage-overexpressed transcripts is presented in additional file 4.

Discussion

An integrative approach was used to determine genes associated with transcripts differentially expressed between mixed asexual stage and late stage gametocyte parasites. The publication of the genome sequences of two malaria parasites presents opportunities for post-genomic era malaria research, including gene discovery and comprehensive understanding of gene expression [46]. The study has revealed (1) possible regulatory mechanisms in malaria parasites' gametocyte maturation, (2) correlation between EST and microarray data for a P. falciparum gene family to present unique EST-derived information, (3) candidate genes on which computational and experimental studies can be performed, and (4) the need for more empirical studies on gene and protein expression in malaria parasites.

A total of 569 contigs was used to determine stage-overexpression. This represents 366 more contigs than were described by Li et al. [13], reflecting inclusion of new mixed asexual stage ESTs deposited after March 2002. Only 21 of the 24 significantly stage-specific transcripts identified by Li et al. [13] were among our stage-overexpressed transcripts after correlation of functional annotation. Both studies demonstrate the asexual-overexpression of the gene for glyceraldehyde-3-phosphate dehydrogenase (GAPDH), an important gene in the glycolytic pathway [47].

Gene and protein expression were observed, as well as protein domain evidence, for specialization or adaptation of ATP-dependent phosphofructokinase (PF11_0294) for metabolic coupling of glucose utilization and maturation of gametocytes in malaria parasites. This enzyme is of major regulatory importance in Plasmodium and has been characterised only in Plasmodium berghei [48]. In addition, it has been proposed as a potential drug target in protozoan parasites [49]. Two genes (PF11_0294, PFI0755c) annotated as phosphofructokinase are present in the genome [7]. This is consistent with the fact that many key enzymes in the glycolytic pathway occur as isoenzymes [48]. Interestingly, PF11_0294 possesses a gonadotropin-releasing domain (GnRH) and a laminin N-terminal domain (Domain VI) that are thought to regulate germ cell development. PFI0755c does not contain these domains.

PF11_0525 is the only apicoplast-targeted gene associated with a gametocyte-overexpressed transcript that showed stage-correlated protein expression. The fact that germ cell biology is conserved in evolution enables us to speculate on the possible roles of this protein. The calmodulin (CaM) binding site has been extensively studied in a sperm autoantigen (Sp17), which is a zona binding protein and a member of the family of CaM binding proteins that contain the IQ motif in the CaM binding domain. This domain has a regulatory role and undergoes proteolytic processing at the initiation of an acrosome reaction [50]. Some bacterial proteins such as hydrolytic enzymes contain the general peptidoglycan-binding module (LysM) and have a role in cell-wall penetration [51]. PF11_0525 does not have evidence of a bipartite peptide for apicoplast targeting and thus may be targeted to the organelle via a different mechanism, or it may no longer function in the plastid.

The EST counts of the SERA gene family are comparable with the gene expression levels observed in microarray experiments. Both technologies agree that expression levels of members are variable, as is expression of central genes during the asexual stage of the parasite. PFB0340c (SERA5) is the first described member of the family [39] and is also a malaria vaccine candidate [52]. The EST count observed for PFB0340c is consistent with high gene expression levels in trophozoites and schizonts in published microarray experiments. Specifically, Miller et al. [42] and Aoki et al. [52] observed PFB0340c to be substantially more strongly transcribed than other SERA genes.

The increasing amount of published and unpublished data from microarray, SAGE, EST and differential display studies on malaria parasites shows that pairwise correlation is required. Comparison of such datasets obtained from different gene expression technologies can complement less sensitive technologies, hence adding value to data generation from these methods. For example, this study provides the identity of ESTs and also potential alternative transcripts that can be used to further characterize the SERA central genes. Furthermore, PFB0325c (SERA8) did not have EST evidence, consistent with the low or absent expression observed in the microarray studies. However, there was evidence of its expression in the sporozoite stage, indicating the gene may be functional in other stages of the life cycle, as speculated by Miller et al. [42]. Large-scale comparative expression analysis of gene families in multiple malaria parasites is needed to advance the knowledge of their evolution and their role during intraerythrocytic development.

The two uncharacterized genes from which we speculate functional insights, PF11_0294 and PF11_0525, have putative orthologues in P. yoelii yoelii (PY05918 and PY06990 respectively) [8] and were also detected in two independent proteomic analyses as expressed in the mature gametocyte stage [20,21]. These observations strengthen the need for further studies on these genes and the possibility of studies with model malaria parasites.

In general, various categories of candidate genes were provided that can be intensively studied as drug targets, antigenic targets, epidemiological or clinical markers. Eighty-seven of the 121 gametocyte-overexpressed genes did not show evidence of stage-correlated protein expression, while 15 of those with such evidence were corroborated by the two proteomics studies. These corroborated genes represent a set of gametocyte-overexpressed genes with correlated transcription and translation data and are thus candidates for studies on gametocyte maturation in malaria parasites. A shortlist of stage-overexpressed genes targeted to the plastid is presented to facilitate studies to understand the regulation of plastid metabolism in malaria parasites. This study has identified the lack of correlation between gene and protein expression of the asexual-overexpressed S-antigen, consistent with observations from published proteome analysis [20]. This observation, those from the gametocyte-overexpressed transcripts, and the comparison of outputs from EST clustering efforts demonstrate that our integrative approach can be used to compare outputs of different post-genomic analyses. The analysis indicates the need for additional empirical studies on gene and protein expression in malaria parasites. Such studies could improve current understanding of discrepancies between gene and protein expression profiling data, as well as the detection of proteins with unique characteristics such as proteolytic processing, post-translational modification and sub-cellular location.

Conclusions

The value of integrating a variety of datasets to unravel undiscovered regulation in biological processes during the gametocyte maturation stages of P. falciparum was demonstrated. Furthermore, comparative analysis of EST and microarray data was performed on the SERA gene family to advance the knowledge of their gene regulation and additional functional genomics reagents were presented to facilitate their study. Finally, the integrative approach was shown as a means to appraise critically the data quality of the increasing number of post-genomic datasets from malaria parasites.

Additional File 1

Plasmodium falciparum asexual-overexpressed transcripts

Additional File 2

Plasmodium falciparum gametocyte-overexpressed transcripts

Additional File 3

Correlated stage-overexpressed transcripts in this study and that of Li et al. [13]

Additional File 4

Plasmodium falciparum candidate genes for studies into plastid metabolism