In Silico Prediction of Evolutionarily Conserved GC-Rich Elements Associated with Antigenic Proteins of Plasmodium falciparum.

Porkodi Panneerselvam, Praveen Bawankar, Surashree Kulkarni, Swati Patankar.

Abstract

Because the Plasmodium falciparum genome is AT-rich, the presence of GC-rich regions suggests functional significance. Evolution imposes selection pressure to retain functionally important coding and regulatory elements. Hence, searching for evolutionarily conserved, GC-rich intergenic regions in an AT-rich genome should help in discovering new coding regions and regulatory elements. We used elevated GC content in intergenic regions, coupled with sequence conservation against P. reichenowi, which is evolutionarily closely related to P. falciparum, to identify potential sequences of functional importance. Interestingly, ~30% of the GC-rich, conserved sequences were associated with antigenic proteins encoded by var and rifin genes. The majority of sequences identified in the 5' UTR of var genes are represented by short expressed sequence tags (ESTs) in cDNA libraries, signifying that they are transcribed in the parasite. Additionally, 19 sequences were located in the 3' UTR of rifins, and 4 of these also have overlapping ESTs. Further analysis showed that several sequences associated with var genes have the capacity to encode small peptides. A previous report has shown that upstream peptides can regulate the expression of var genes; hence, we propose that these conserved GC-rich sequences may play roles in regulation of gene expression.

Keywords:  Plasmodium; antigenic variation; comparative genomics; genome bias; regulatory elements

Year:  2011        PMID: 22375094      PMCID: PMC3283219          DOI: 10.4137/EBO.S8162

Source DB:  PubMed          Journal:  Evol Bioinform Online        ISSN: 1176-9343            Impact factor:   1.625


Introduction

Regulatory motifs that allow fine-tuning of gene expression are of interest in the malaria parasite Plasmodium falciparum. These include promoters, mRNA stability motifs and translation regulatory sequences. Some regulatory motifs also encode non-coding RNAs (ncRNAs) that in turn regulate expression of genes. The importance of regulatory motifs should not be underestimated in the parasite, since mechanisms of regulation of gene expression are still being elucidated in this human pathogen.1–3 The comparative genomics approach has been successfully employed to identify evolutionarily well-conserved regulatory elements in C. elegans, S. cerevisiae and Homo sapiens.4–6 This is based on the rationale that functionally important sequences are often conserved among species. Comparative genomics has also been used in Plasmodium species to identify regulatory motifs.7 Another feature of the Plasmodium falciparum genome that has proved useful in the search for new regulatory elements is nucleotide bias. Plasmodium falciparum has an unusually AT-rich genome,8 with an average AT content of 80% that increases to 90% in intergenic regions. In such a biased genome, local regions of increased GC content in the non-coding regions appear to correlate with functionally important features.
For example, a conserved, GC-rich region found upstream of heat shock protein (hsp) genes is a functionally important DNA regulatory element.9,10 In two reports, including one from our group, noncoding RNAs (ncRNAs) were identified in Plasmodium falciparum by searching for conserved GC-rich intergenic regions.10,11 Similarly, nucleotide compositional contrast has been used to identify ncRNAs in the AT-rich genomes of Dictyostelium discoideum and hyperthermophiles.12–14 This type of screen exploited the fact that most RNA regulatory elements carry out their functions by inter-molecular or intra-molecular base pairing; hence an increase in GC content, especially in an AT-rich genome, would result in RNAs having more stable secondary structures.15 Most of these reports also used comparative genomics and evolutionary conservation as a tool to assess functional significance. The choice of genomes used for comparative genomics is critical. In a bioinformatics screen described previously,11 we chose P. yoelii for identifying conserved, GC-rich intergenic regions, which were shown to encode ncRNAs, because the complete genome of this species was available. However, with the recent availability of other Plasmodium genomes, it is likely that other genomes might be equally useful for comparative genomics. Indeed, P. yoelii and P. falciparum appear to have diverged >100 million years ago,16 whereas P. falciparum has been shown to be most closely related to the chimpanzee malaria parasite P. reichenowi.17–19 Apart from housekeeping genes, several ORFs that encode cell surface proteins in P. falciparum are conserved between P. falciparum and P. reichenowi; these include CSP,20 MSP221 and var CSA.22 In contrast, the var, rifin and stevor multigene families that are involved in antigenic variation in P. falciparum are represented by a single multi-gene family (yir) in P. yoelii that is most closely related to the vir family in P. vivax.23,24 Over the entire genome, P. yoelii is most closely related to the other rodent malaria parasites P. berghei and P. chabaudi.25
In this report we ask whether regulatory elements can be identified by a bioinformatics screen using elevated GC content in the P. falciparum genome, followed by sequence conservation in other Plasmodium species. Due to the large evolutionary distance between P. falciparum and P. yoelii, we hypothesized that the choice of these two genomes for comparative genomics may not identify regulatory elements associated with immunogenic genes that are specifically expressed in P. falciparum and not in P. yoelii. Hence, for identification of genomic sequences that might be involved in host-specific functions, e.g., evasion of the immune system or regulation of antigenic variation genes, a primate malaria parasite genome would be more appropriate for the comparative genomics part of any bioinformatics screen. We show that a large number of GC-rich sequences are conserved in the genomes of P. falciparum and the primate parasite P. reichenowi. Many of these GC-rich sequences flank genes involved in antigenic variation, and some may be transcribed and translated. Several reports in the literature show that short RNAs can regulate transcription26 and short ORFs can regulate translation of downstream genes.27,28 Indeed, one of these reports shows that an upstream ORF regulates expression of certain var genes.29 We suggest that the sequences identified in this study may play roles in regulation of antigenic gene expression at the level of transcriptional or translational control.

Materials and Methods

GC% filter source data

The genome of Plasmodium falciparum 3D7 was downloaded chromosome by chromosome from the online database PlasmoDB (http://www.plasmodb.org/). Exon locations of all protein-coding genes were also downloaded from the same database. Because exon location data were unavailable in the newer version, PlasmoDB 5.2, all the data were downloaded from the older version, PlasmoDB 4.4.

GC% C program algorithm

A C program was written that reads large text files of the Plasmodium falciparum genome. The program divides the genome into 70-base chunks using a sliding window that advances 10 bases at a time. It uses the exon location data to exclude chunks that fall within ORFs, and then calculates the GC% of each remaining chunk. An output FASTA file was generated containing the sequences of all 70-base chunks that have greater than 35% GC and lie outside ORFs. Overlapping 70-base chunks with greater than 35% GC were combined and treated as a single sequence. All such chunks were associated with their chromosomal locations; because overlapping chunks were merged, some of the resulting sequences are longer than 70 bases.
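
The windowing procedure described above can be sketched in Python as follows. This is an illustrative sketch only, not the original C program: the function names, the 0-based half-open coordinates, the step of 10 bases between windows, and the rule that a window is discarded if it overlaps any exon are all assumptions made for the example.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def overlaps_any(start, end, intervals):
    """True if the half-open window [start, end) overlaps any interval."""
    return any(start < i_end and i_start < end for i_start, i_end in intervals)


def gc_rich_regions(genome, exons, window=70, step=10, min_gc=0.35):
    """Return merged (start, end) regions whose windows exceed min_gc.

    Assumptions (not stated in the source): windows advance by `step`
    bases, and a window is skipped if it overlaps any exon at all.
    """
    regions = []
    for start in range(0, len(genome) - window + 1, step):
        end = start + window
        if overlaps_any(start, end, exons):
            continue
        if gc_fraction(genome[start:end]) > min_gc:
            if regions and start < regions[-1][1]:
                # Overlaps the previous GC-rich window: merge the two.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions
```

Because overlapping windows are merged, a run of consecutive GC-rich windows is reported as one region longer than 70 bases, which mirrors the merged sequence lengths (80, 90, 100 bases and so on) listed in the tables.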

Sequence Conservation Source Data

The genome contigs of Plasmodium species viz. P. yoelii, P. vivax, P. reichenowi, P. berghei, P. gallinaceum, P. knowlesi and P. chabaudi were downloaded from PlasmoDB 5.2. The Washington University BLAST version 2.0 (WU-BLAST), downloaded from http://www.blast.wustl.edu/, was employed to analyze sequence conservation. This BLAST version was installed on a Linux machine.

Shell Script

A shell script was written that took each sequence from the output FASTA file (containing sequences with GC content greater than 35%) and fed it into the BLAST software. It performed BLAST of all chunks in each of the query files against all the available contigs in the database file. The E value cut-off was set at 1e–10. Positive controls for this strategy were rRNAs, tRNAs and the sequences identified earlier with Plasmodium yoelii by Upadhyay et al. In short, after running the BLAST analysis of GC-rich sequences using different genomes, we checked whether the 43 annotated tRNAs, 27 annotated rRNAs and 18 ncRNA sequences identified by Upadhyay et al were correctly identified.
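
The filtering and positive-control check described above might look like the following Python sketch. It assumes the BLAST hits have already been parsed into (query, subject, E value) tuples; the tuple layout, the function names and the inclusive treatment of the cut-off are hypothetical, and no WU-BLAST command-line options are implied.

```python
def conserved_queries(hits, max_evalue=1e-10):
    """Return the set of query IDs with at least one hit at or below the cut-off.

    `hits` is an iterable of (query_id, subject_id, evalue) tuples, a
    hypothetical layout chosen for this example.
    """
    return {query for query, _subject, evalue in hits if evalue <= max_evalue}


def control_recovery(conserved, controls):
    """Count how many positive-control IDs were recovered by the screen."""
    control_set = set(controls)
    return len(conserved & control_set), len(control_set)
```

A call such as `control_recovery(conserved_queries(hits), trna_ids)` would then report how many of the annotated tRNAs passed the screen, which is the kind of check used here to validate the cut-offs.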

Results and Discussion

Use of the P. reichenowi genome for comparative genomics can identify novel GC-rich conserved sequences

Previous work in our lab had used a bioinformatics strategy to identify GC-rich sequences present in intergenic regions that were conserved between P. falciparum and P. yoelii. This screen used two cut-offs (35% GC followed by an E value cut-off of 1e-10) and identified 18 sequences, many of which were found to be small molecular weight RNAs, also known as non-coding RNAs (ncRNAs). These cut-offs were appropriate in searching for ncRNAs since we were able to identify all 43 annotated tRNAs and 27 annotated rRNAs from the P. falciparum genome. We hypothesized that using the same strategy but with different genomes for the comparative genomics part of the screen might give more GC-rich, conserved sequences that are associated with host-specific functions. These sequences might be regulatory DNA sequences, ncRNAs or protein-encoding regions. To ensure that the 35% GC cut-off was appropriate for identifying such regulatory sequences, and particularly to be sure that the probability of finding the GC-rich sequence was greater than chance, we did a simple statistical analysis. The average GC content of the 23 megabase P. falciparum genome (19%) was compared to the GC content of the 70-base chunks used in the screen (35%) with a Chi-square test using Minitab software. The P value of this test was 0.0003, indicating that the probability of finding a 35% GC-rich sequence of 70 bases in the P. falciparum genome by chance is very low. Hence, any GC-rich sequences identified should be significantly different from the genome in their nucleotide content. We proceeded to test our hypothesis that sequences greater than 35% GC-rich and conserved in other Plasmodium species might be regulatory sequences associated with host-specific functions. To test this, we initially performed the bioinformatics screen using only chromosome 1 of P. falciparum.
This screen retained the original parameters of GC threshold and BLAST cut-off (>35% GC and an E value cut-off of 1e-10); however, the BLAST analysis was performed against seven Plasmodium species: P. yoelii, P. reichenowi, P. berghei, P. vivax, P. gallinaceum, P. knowlesi and P. chabaudi. For all genomes except P. reichenowi, no new GC-rich, conserved sequences were identified. Interestingly, eighty-five new sequences could be identified when the screen involved comparison with the chimpanzee parasite, P. reichenowi. No new sequences were identified when BLAST was performed against the macaque parasite P. knowlesi and the human parasite P. vivax. This is consistent with reports that P. knowlesi falls in the same phylogenetic group as P. vivax.19 Hence, P. reichenowi was chosen as the most appropriate genome for the comparative analysis to identify regulatory elements in P. falciparum.
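
The comparison of a 70-base window against the 19% genome-wide GC background can be illustrated with a one-degree-of-freedom Chi-square goodness-of-fit test. The sketch below is an assumption about how such a test could be set up; the exact Minitab configuration is not described in the text, so this is not meant to reproduce the reported P value. The function name and the choice of a two-category (GC versus AT) test are hypothetical.

```python
import math


def chi_square_gc(gc_count, length, background_gc):
    """Goodness-of-fit test of a window's GC count against a background.

    Uses two categories (GC and AT), hence one degree of freedom. For one
    degree of freedom the chi-square survival function equals
    erfc(sqrt(stat / 2)), so no external statistics library is needed.
    """
    expected_gc = length * background_gc
    expected_at = length - expected_gc
    observed_at = length - gc_count
    stat = ((gc_count - expected_gc) ** 2 / expected_gc
            + (observed_at - expected_at) ** 2 / expected_at)
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

For example, `chi_square_gc(25, 70, 0.19)` tests a window with 25 G or C bases out of 70 (just above the 35% threshold) against the 19% background.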

Proximal Intergenic Sequences

The bioinformatics screen was repeated using the entire P. falciparum genome to identify GC-rich sequences with a cut-off of 35% GC; these sequences were compared for conservation against the complete P. reichenowi genome (E value cut-off of 1e-10), yielding ~1500 conserved GC-rich regions. To further prioritize these sequences, an additional parameter was applied that restricted the output to sequences lying within 500 bases of the start or stop codons of annotated ORFs (termed proximal intergenic regions). The rationale was that a majority of DNA regulatory elements and translational control elements are generally found within 500 bases of the start or stop codons of flanking genes. Hence we decided to select sequences that could lie within the 5′ or 3′ UTRs of P. falciparum genes. Very few P. falciparum UTRs have been annotated; nevertheless, Watanabe et al conclude from their analysis of a cDNA library that the 5′ UTRs of P. falciparum genes are unusually long, averaging 346 bp.30 Golightly et al report a 3′ UTR of 450 bp in the mRNA of Pgs28, an ookinete protein of the avian parasite P. gallinaceum.31 Hence, we defined all the intergenic sequences within 500 bp of the coding region as ‘proximal intergenic sequences’. Those intergenic sequences that lie more than 500 bp from the coding sequence were designated as ‘deep intergenic sequences’. Neafsey et al have also suggested that conserved CpG dinucleotides enriched in proximal intergenic regions might function as regulatory elements.32 With these criteria in mind, the ~1500 new GC-rich sequences identified during the bioinformatics analysis described in this report were pruned down to 151 by screening for proximal intergenic sequences (see Supplementary Table 1).
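
The proximal versus deep classification can be sketched as a simple distance check. In this illustrative Python example the function name, the (start, end) coordinate tuples and the way distance is measured (difference between the nearest coordinates of the region and the coding sequence) are assumptions; only the 500-base threshold comes from the text.

```python
def classify_intergenic(region, genes, max_distance=500):
    """Label an intergenic region as 'proximal' or 'deep'.

    `region` and each entry of `genes` are (start, end) tuples on the same
    chromosome; `genes` must contain at least one coding region. The
    distance is taken as the gap between the nearest coordinates.
    """
    start, end = region
    gaps = []
    for gene_start, gene_end in genes:
        if end < gene_start:
            gaps.append(gene_start - end)      # region lies upstream of the gene
        elif gene_end < start:
            gaps.append(start - gene_end)      # region lies downstream of the gene
        else:
            gaps.append(0)                     # region touches or overlaps the gene
    nearest = min(gaps)
    label = "proximal" if nearest <= max_distance else "deep"
    return label, nearest
```

Applying such a filter to every conserved GC-rich region and keeping only those labelled proximal corresponds to the pruning step from ~1500 to 151 sequences.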

Immunogenic Proteins are Conserved in P. falciparum and P. reichenowi

Having shown that 151 sequences that are GC-rich and present in intergenic regions are conserved between P. falciparum and P. reichenowi, we wished to test our hypothesis that these might be associated with antigenic genes that are found in these two species. As a first step towards this, we tested whether families of antigenic genes found in P. falciparum are also present in P. reichenowi. A comparison of the chimpanzee genome with the human genome shows that our closest living relatives share 96 percent of our DNA. Humans and chimpanzees originate from a common ancestor and are believed to have diverged some six million years ago.33 Interestingly, the human malaria parasite P. falciparum diverged from the chimpanzee malaria parasite P. reichenowi around 5–7 million years ago,17,34 suggesting that the primate parasites may have diverged during the same period as their hosts. Several studies have shown that P. falciparum is most closely related to P. reichenowi.20,21 This is true not only for housekeeping genes but also for genes that encode proteins involved in host-parasite interactions. These include some of the var genes that encode the PfEMP family of proteins important for antigenic variation and evasion of the host immune response. Indeed, Trimnell et al22 have shown that fragments of the var1CSA and var2CSA genes are conserved between P. falciparum and P. reichenowi, suggesting an ancient origin of some var loci. Like P. falciparum, P. reichenowi has also been shown to express key invasion proteins such as EBLs and MAEBLs.35,36 To further test the extent of relatedness of the parasites, an analysis was done for other genes involved in antigenic variation. Antigenic proteins of P. falciparum involved in host-pathogen interactions were chosen and BLAST analysis of the genes was performed with P. reichenowi contigs (PlasmoDB BLAST server, blastn).
Two genes were chosen at random from each of the PfEMP, rifin and stevor families of antigenic surface proteins, and the P. yoelii genome was used for comparison. Table 1 shows the results of this analysis.
Table 1

BLAST analysis of antigenic proteins.

Gene ID and gene product | No. of hits with P. yoelii | E value of best hit with P. yoelii | No. of hits with P. reichenowi | E value of best hit with P. reichenowi | Best hit with P. reichenowi
MAL13P1.1 PfEMP1 | 2 | 0.027 | 109 | 2e–87 | Pr_3502696.c000023469.Contig1
PF07_0051 PfEMP1 | 27 | 3e–5 | 107 | 7e–54 | Pr_3502696.c000023041.Contig1
MAL13P1.2 RIFIN | 2 | 0.017 | 194 | e-128 | Pr_3502696.c000027339.Contig1
PFF0850c RIFIN | 0 |  | 37 | 2e–95 | Pr_3502696.c000023726.Contig1
MAL13P1.505 STEVOR | 1 | 0.015 | 35 | e-140 | Pr_3502696.c000023791.Contig1
PFI0045c STEVOR | 4 | 0.014 | 34 | e-136 | Pr_3502696.c000023791.Contig1

Note: Two members of the PfEMP, rifin and stevor families were chosen arbitrarily from the P. falciparum genome and BLAST was performed against the genomes of Plasmodium yoelii and Plasmodium reichenowi.

Except for the var gene PF07_0051, the antigenic genes tested gave fewer than 5 matches to the P. yoelii genome. PF07_0051 showed 27 matches with a best E value of 3e–5, indicating that this var gene may have weak homology to sequences in the P. yoelii genome. This is consistent with reports on P. yoelii genome analysis, which found no genes showing homology to the var gene family.8 Instead, the P. yoelii genome contains a multigene family (yir) that shows homology to the P. vivax vir multigene family.23,24 In contrast, 34–194 matches of the var, rifin and stevor genes were obtained by using BLAST against the P. reichenowi genome, and these matches gave extremely low E values (as low as e-140), indicating that the sequences are highly conserved. The high numbers of matches obtained (e.g., 194 with a rifin gene) indicate that P. reichenowi, like P. falciparum, also has three different families of antigenic proteins. Hence the data suggest that the P. falciparum genome is more similar to the genome of P. reichenowi than to that of P. yoelii when antigenic variation genes are analyzed.

Sequences Proximal to var Genes

Having shown that antigenic variation genes are conserved in P. falciparum and P. reichenowi and that 151 GC-rich sequences are also conserved in the two genomes, the next question was whether these GC-rich sequences flanked antigenic variation genes. As mentioned in the previous section, sequestration and rosetting are key determinants of P. falciparum pathogenesis, and these processes are mediated by the proteins encoded by the var gene family, called Plasmodium falciparum Erythrocyte Membrane Protein 1 (PfEMP1). To evade immunity and extend infections, parasites clonally vary the PfEMP1 proteins that are expressed on the surface of the infected red blood cells.37 Mechanisms of regulation of var genes have been a topic of intense research due to the clinical importance of these genes.38,39 Expression of var genes is regulated by two regions with separate promoters, one upstream of the coding region and a second within the intron.40 Upstream promoters of var genes fall into four major sequence classes, upsA, upsB, upsC and upsE,41 of which upsA-, upsB- and upsE-type var genes lie in sub-telomeric regions and upsC-type var genes lie in internal clusters. Recent evidence indicates that var genes are activated by recruitment of the promoter to a perinuclear site that is permissive for transcription42 and also that the PfSIR2 regulator plays a role in var gene silencing.43,44 Recent studies indicate that ncRNAs associate with chromatin and thus regulate the expression of the var gene family.45 Additionally, an upstream ORF can regulate certain var genes.29 Interestingly, the BLAST result with P. reichenowi showed that 27 of the proximal intergenic GC-rich sequences flank var genes (listed in Table 2). All these sequences lie in the 5′ UTR of the flanking var genes and most are less than 20 bp away from the predicted ORF of PfEMP1 proteins.
The close proximity of the GC-rich sequences to the var ORF led us to ask whether these sequences might be transcribed either as short RNAs or as part of the var mRNA transcripts.
Table 2

Conserved GC rich sequence associated with var genes.

Candidate | Location | PfEMP1 Associated | GC% | Identity | Associated ESTs
PfNC1.1var | Chr 1: 29631 to 29730 | PFA0005w | 37 | 58/100 | AU088275
PfNC1.2var | Chr 1: 616621 to 616710 | PFA0765c | 38.9 | 33/90 | AU088275 and AU087013
PfNC2.1var | Chr 2: 25101 to 25230 | PFB0010w | 40.8 | 56/130 | AU087013 and AU088275
PfNC2.2var | Chr 2: 923651 to 923750 | PFB1055c | 42 | 58/100 | AU087013 and AU088275
PfNC3.1var | Chr 3: 33511 to 33640 | PFC0005w | 38.5 | 72/130 | AU088275
PfNC3.2var | Chr 3: 1034931 to 1035030 | PFC1120c | 41 | 42/100 | AU088275
PfNC4.1var | Chr 4: 35061 to 35150 | PFD0005w | 46.7 | 32/90 | AU088275
PfNC4.2var | Chr 4: 606841 to 606930 | PFD0635c | 42.2 | 38/90 | AU088275
PfNC4.3var | Chr 4: 970091 to 970160 | PFD1005c | 35 | 34/70 | AU088275
PfNC4.4var | Chr 4: 981221 to 981290 | PFD1015c | 37 | 36/70 |
PfNC4.5var | Chr 4: 1183861 to 1183950 | PFD1245c | 45.6 | 31/90 | AU088275
PfNC6var | Chr 6: 3401 to 3500 | PFF0010w | 42 | 38/100 | AU088275
PfNC7.1var | Chr 7: 30531 to 30670 | MAL7P1.212 | 37.9 | 81/140 | AU088275 and AU087013
PfNC7.2var | Chr 7: 605971 to 606040 | MAL7P1.50 | 37 | 35/70 | AU088275
PfNC7.3var | Chr 7: 614461 to 614570 | PF07_0050 | 40 | 38/110 | AU088275
PfNC7.4var | Chr 7: 644311 to 644440 | MAL7P1.55 | 41.5 | 43/130 | AU087013
PfNC8.1var | Chr 8: 22251 to 22330 | PF08_0142 | 41.3 | 41/80 | AU087013
PfNC8.2var | Chr 8: 441381 to 441450 | PF08_0106 | 38 | 34/70 | AU087013
PfNC8.3var | Chr 8: 1399241 to 1399340 | MAL8P1.220 | 38 | 38/100 | AU088275 and AU087013
PfNC9.1var | Chr 9: 19931 to 20070 | PFI0005w | 40.7 | 92/140 | AU088275
PfNC9.2var | Chr 9: 1503331 to 1503430 | PFI1830c | 37 | 38/100 | AU088275
PfNC10var | Chr 10: 28351 to 28490 | PF10_0001 | 36.4 | 76/100 | AU088275
PfNC11var | Chr 11: 24021 to 24150 | PF11_0007 | 40 | 67/130 | AU088275
PfNC12.1var | Chr 12: 32601 to 32670 | PFL0020w | 37.1 | 41/70 | AU088275
PfNC12.2var | Chr 12: 774191 to 774300 | PFL0935c | 38.2 | 53/110 | AU088275 and AU087013
PfNC12.3var | Chr 12: 1704411 to 1704490 | PFL1955w | 46.3 | 36/80 | AU088275 and AU087013
PfNC12.4var | Chr 12: 2248951 to 2249040 | PFL2665c | 45.6 | 56/90 | AU088275

Note: All candidates are found in the 5′ UTRs of var genes and are within 150 bases of the start codon.

A search of PlasmoDB revealed that the Sugano malaria cDNA library30,46,47 has identified several short transcripts (ESTs AU088275 and AU087013) in the 5′ UTRs of var genes. An analysis of the GC-rich sequences that are proximal to var genes showed that all except PfNC4.4var overlap with at least one of the two ESTs AU088275 and AU087013. The two ESTs are transcribed from the same strand as the PfEMP1 mRNA, and AU088275 and AU087013 showed alignment with 30 and 16 regions of the P. falciparum genome, respectively. This bioinformatics study was able to identify 23 out of 30 and 10 out of 16 regions in the case of AU088275 and AU087013, respectively. The GC-rich sequences that were not identified in this study are less conserved in P. reichenowi and hence did not pass the BLAST cut-off of 1e–10. The presence of short transcripts that overlap with the GC-rich sequences identified in this bioinformatics screen suggests that these sequences are indeed transcribed. PfNC4.4var was the only sequence with no associated ESTs, and this sequence lies 190 bases away from the annotated PfEMP1. A BLAST was performed with the sequence of PfNC4.4var against the genome of P. falciparum and we identified 6 matches that were all proximal to PfEMP1 genes. To test whether any short RNAs are associated with the sequence PfNC4.4var, we performed Northern analysis on mixed stage asexual parasites using strand-specific probes. These results indicate that the sequence is not expressed in mixed stage asexual parasites (data not shown); perhaps the expression of this sequence is below the limit of detection by Northern analysis or is stage-specific. Alternatively, the sequence may function as a DNA regulatory element rather than as RNA, or may be involved in translational control of the flanking var gene. The sequences of the ESTs AU088275 and AU087013 were compared with each other and with the sequence PfNC4.4var using ClustalW (http://www.ebi.ac.uk/clustalw/).
The scores obtained show that the ESTs AU088275 and AU087013 are 68% similar to each other at the sequence level, while the sequence PfNC4.4var is quite distinct from either of these ESTs, showing 25%–32% sequence similarity in the ClustalW analysis. Further analysis of the ESTs showed that AU088275 and AU087013 are in the 5′ UTRs of var genes of the upsB or upsBsh subtypes, while sequence PfNC4.4var is found in the 5′ UTRs of 7 var genes of the upsC subtype. Having shown that the GC-rich sequences that flank var genes are found in short transcripts, we next asked whether these sequences have the capacity to encode proteins, either as upstream ORFs (uORFs) or as N-terminal extensions of the annotated var genes. Indeed, a majority of the GC-rich sequences showed the presence of uORFs ranging in size from minimal ORFs (1 amino acid) to 21 amino acids. Some uORFs recur across a majority of the GC-rich regions (the pentapeptide MYATI is found 20 times), whereas others are found less frequently (MYQNTTKPCMPRYKPRMHDIM is found once). Interestingly, when all the GC-rich sequences that flank var genes were aligned with each other, the most conserved sequences (highlighted in grey with asterisks) were found to encode the uORF pentapeptide MYATI (Fig. 1). In contrast, sequence conservation was poor in the regions surrounding the uORF. This suggests an evolutionary pressure to maintain the uORF-encoding sequences, indicating that these sequences may have functional importance. A sequence alignment between var-associated GC-rich sequences of P. falciparum and P. reichenowi (Fig. 2) shows significant sequence similarity between PfNC12.4var and the homologous region from P. reichenowi, and the uORF MYATI is conserved between the two species.
Figure 1

Sequence alignment of proximal upstream regions of upsB var genes.

Notes: The box shows that conserved GC-rich sequences contain the putative upstream ORF MYATI. Grey highlights show conserved sequences, indicating that sequences flanking the putative uORF are less conserved than the regions encoding the uORF. The annotated start codon is highlighted in grey.

Figure 2

BLAST result of PfNC12.4var against the Plasmodium reichenowi genome.

Note: The regions of conservation are shown with stars and the uORF is highlighted in grey.

uORFs have been shown to play important roles in translational control. For example, a minimal uORF can regulate translation of certain HIV genes.48 This minimal ORF (consisting of only a start and a stop codon) overlaps with the start codon of the vpu gene, and mutating the start and stop codons of this minimal ORF results in a reduction of translation of the downstream env gene. Upstream AUGs and uORFs in human and rodent genes appear to regulate translation initiation by the ribosome scanning machinery.27 Finally, and most pertinently for this work, the presence of uORFs has been shown to regulate the expression of the downstream var gene.29 We propose that the uORFs identified in this report, which flank var genes at the 5′ regions, may play similar roles in regulation of var gene expression.
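
A search for uORFs of the kind discussed for the var-associated sequences can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function name is hypothetical, only the forward strand is scanned, the standard genetic code is assumed, and an ORF is reported only if an in-frame stop codon is found within the supplied sequence.

```python
BASES = "TCAG"
# Standard genetic code, with codons ordered T, C, A, G at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def find_uorfs(dna):
    """Return (start_index, peptide) for every ATG followed by an in-frame stop."""
    dna = dna.upper()
    uorfs = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        peptide = []
        for pos in range(start, len(dna) - 2, 3):
            residue = CODON_TABLE.get(dna[pos:pos + 3])
            if residue is None:
                break  # ambiguous base: give up on this reading frame
            if residue == "*":
                uorfs.append((start, "".join(peptide)))
                break
            peptide.append(residue)
    return uorfs
```

With this sketch, a start codon immediately followed by a stop codon yields the one-residue peptide "M", which corresponds to the minimal ORF described above.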

Sequences Proximal to rifin Genes

Rifin genes constitute the largest multi-gene family in the P. falciparum genome, with 149 members. Transcription from rifin genes is highest at the ring and early trophozoite stages, and proteins encoded by these mRNAs are localized to the Maurer’s clefts.49,50 The presence of antibodies against RIFINs in patient sera suggests that these proteins are indeed exposed on the surface of erythrocytes.51 More recently, a PEXEL/VTS transport signal, found in proteins exported from the parasite vacuole to the erythrocyte, was observed in RIFIN proteins, consistent with a potential cell surface localization.52,53 The function of RIFINs is unknown; however, these proteins may be involved in cytoadherence. Similar to var genes, rifin genes are also clonally variable, although the mechanisms underlying the two processes appear to be different. A search of the proximal intergenic GC-rich sequences obtained in our screen of the P. falciparum genome shows that 19 sequences flank rifin genes. The list of sequences is shown in Table 3. All the sequences except for one (PfNC10.1rif) lie in the 3′ UTR of rifin genes and are 1 to 500 bases away from the stop codon of the rifin open reading frame. PfNC10.1rif is located in the 5′ UTR of rifin gene PF10_0002w. Four of the GC-rich regions that flank rifins are associated with short ESTs (BI816203 and BQ577081), and all the ESTs are transcribed from the same strand as the rifin gene. There is a paucity of information regarding regulation of rifin gene expression. A recent study has mapped promoter elements that are required for expression of one rifin gene (PF11_0009) that is highly expressed in 3D7 parasites.54 The promoter elements include two repressor regions that are bound by nuclear proteins expressed at different stages of the parasite life cycle.
While 5′ flanking sequences are essential for transcriptional regulation, it is tempting to speculate that events in the 3′ UTRs of rifin genes, particularly those involving the GC-rich sequences discovered in this study, may play roles in gene regulation.
Table 3

Conserved GC rich regions associated with rifin genes.

Candidate | Location | Associated RIFIN | GC% | Identity | Associated ESTs
PfNC1.1rif | Chr 1: 62341 to 62410 | PFA0045c | 35% | 36/70 |
PfNC1.2rif | Chr 1: 81921 to 81990 | PFA0080c | 38% | 55/70 |
PfNC2rif | Chr 2: 32951 to 33020 | PFB0015c | 35% | 36/70 |
PfNC3rif | Chr 3: 1025611 to 1025680 | PFC1115w | 35% | 34/70 | BI816203
PfNC4rif | Chr 4: 67831 to 67940 | PFD0025w | 37.3% | 78/110 |
PfNC6rif | Chr 6: 1352101 to 1352170 | PFF1575w | 37% | 50/70 |
PfNC7.1rif | Chr 7: 45441 to 45540 | MAL7P1.215 | 35% | 79/100 |
PfNC7.2rif | Chr 7: 55261 to 55330 | MAL7P1.217 | 37% | 45/70 | BQ577081
PfNC7.3rif | Chr 7: 1454751 to 1454820 | PF07_0134 | 37% | 47/70 |
PfNC9.1rif | Chr 9: 42361 to 42460 | PFI0025c | 34% | 80/100 |
PfNC9.2rif | Chr 9: 1479191 to 1479290 | PFI1810w | 35% | 81/100 |
PfNC10.1rif | Chr 10: 39021 to 39090 | PF10_0002w | 35.7% | 89/100 |
PfNC10.2rif | Chr 10: 47981 to 48050 | PF10_0005 | 35.7% | 88/100 |
PfNC10.3rif | Chr 10: 1623881 to 1623950 | PF10_0398 | 35.7% | 98/100 |
PfNC12.1rif | Chr 12: 43711 to 43790 | PFL0025c | 33.8% | 92/100 | BQ577081
PfNC12.2rif | Chr 12: 2239401 to 2239480 | PFL2660w | 35% | 87/100 |
PfNC13.1rif | Chr 13: 30631 to 30700 | MAL13P1.2 | 37.1% | 94/100 | BQ577081
PfNC13.2rif | Chr 13: 53591 to 53670 | PF13_0006 | 40% | 95/100 |

Note: All candidates are found in the 3′ UTRs of rifin genes.

Conclusion

In conclusion, this report shows that a bioinformatics strategy involving a search for GC-rich intergenic regions that are conserved between P. falciparum and P. reichenowi can be used to uncover conserved GC-rich sequences proximal to antigenic variation genes. These sequences are transcribed and may also encode short upstream ORFs. It will be of interest to test the functional importance of these sequences in regulation of antigenic variation and clinical disease.
Table S1

List of 151 GC-rich sequences proximal to annotated genes identified in P. falciparum.

| S.no | Name of the candidate | Proximal gene and orientation | Proximal gene | Distance from the proximal gene | GC% and identity |
|---|---|---|---|---|---|
| 1 | Chr 1: 62341 to 62410 | | RIFIN | 10 | 35% and 36/70 |
| 2 | Chr 1: 81921 to 81990 | | RIFIN | 8 | 38% and 55/70 |
| 3 | Chr 1: 197781 to 197850 | | Ubiquitin carboxyl terminal hydrolase | 5 | 35% and 62/70 |
| 4 | Chr 1: 556971 to 557100 | | PfEMP1 | 320 | 35% and 126/130 |
| 5 | Chr 1: 616621 to 616710 | | PfEMP1 | 8 | 38.9% and 33/90 |
| 6 | Chr 1: 29631 to 29730 | | PfEMP1 | 3 | 37% and 58/100 |
| 7 | Chr 1: 503731 to 503800 | | Hypothetical protein | | 35% and 40/70 |
| 8 | Chr 2: 25101 to 25230 | | PfEMP1 | 2 | 40.8% and 56/130 |
| 9 | Chr 2: 32951 to 33020 | | RIFIN | 10 | 35% and 36/70 |
| 10 | Chr 2: 54331 to 54410 | | STEVOR isoform gam beta | 8 | 43.8% and 73/80 |
| 11 | Chr 2: 147571 to 147640 | | Hypothetical protein | 7 | 37% and 58/70 |
| 12 | Chr 2: 165911 to 165990 | | Hypothetical protein | 154 | 35% and 57/80 |
| 13 | Chr 2: 197361 to 197430 | | Hypothetical protein | 445 | 35% and 66/70 |
| 14 | Chr 2: 301801 to 301970 | | Cysteine protease putative | 8 | 40.6% and 162/170 |
| 15 | Chr 2: 473501 to 473600 | | Protein kinase putative | 10 | 37% and 93/100 |
| 16 | Chr 2: 923651 to 923750 | | PfEMP1 | 3 | 42% and 58/100 |
| 17 | Chr 3: 8031 to 8160 | | Hypothetical protein | 234 | 36.2% and 93/130 |
| 18 | Chr 3: 8461 to 8540 | | Hypothetical protein | | 35% and 62/80 |
| 19 | Chr 3: 10961 to 11050 | | Hypothetical protein | 216 | 34.4% and 74/90 |
| 20 | Chr 3: 33511 to 33640 | | PfEMP1 | 1 | 38.5% and 72/130 |
| 21 | Chr 3: 443031 to 443120 | | Hypothetical protein | 197 | 34.4% and 80/90 |
| 22 | Chr 3: 540901 to 540980 | | Hypothetical protein | | 35% and 40/50 |
| 23 | Chr 3: 686651 to 686720 | | Protein kinase putative | 107 | 38% and 31/70 |
| 24 | Chr 3: 691491 to 691560 | | Protein kinase putative | 3 | 35% and 66/70 |
| 25 | Chr 3: 1025611 to 1025680 | | Rifin | 12 | 35% and 34/70 |
| 26 | Chr 3: 1034931 to 1035030 | | Var gene | 7 | 41% and 42/100 |
| 27 | Chr 3: 1046441 to 1046510 | | Hypothetical protein | 351 | 35% and 35/70 |
| 28 | Chr 3: 1051191 to 1051280 | | Hypothetical protein | 213 | 34.4% and 65/90 |
| 29 | Chr 4: 35061 to 35150 | | PfEMP1 | 3 | 46.7% and 32/90 |
| 30 | Chr 4: 67831 to 67940 | | RIFIN | 445 | 37.3% and 78/110 |
| 31 | Chr 4: 311851 to 311920 | | Hypothetical protein and lysine decarboxylase | 36 and 298 | 35% and 42/70 |
| 32 | Chr 4: 336301 to 336410 | | Sexual stage specific precursor and hypothetical protein | 135 and 5 | 39.1% and 106/110 |
| 33 | Chr 4: 500431 to 500520 | | Hypothetical protein | 159 | 35.6% and 65/90 |
| 34 | Chr 4: 606841 to 606930 | | PfEMP1 | 9 | 42.2% and 38/90 |
| 35 | Chr 4: 667311 to 667410 | | GTP binding protein | 197 | 34% and 95/100 |
| 36 | Chr 4: 851901 to 851970 | | Hypothetical protein | 404 | 35% and 54/70 |
| 37 | Chr 4: 862591 to 862660 | | CGI141 protein homolog, and hypothetical protein | 322 and 218 | 35% and 56/70 |
| 38 | Chr 4: 970091 to 970160 | | PfEMP1 | 41 | 35% and 34/70 |
| 39 | Chr 4: 981221 to 981290 | | PfEMP1 | 191 | 37% and 36/70 |
| 40 | Chr 4: 1064251 to 1064380 | | Hypothetical protein | | 37.7% and 80/130 |
| 41 | Chr 4: 1183861 to 1183950 | | PfEMP1 | 13 | 45.6% and 31/90 |
| 43 | Chr 5: 619951 to 620030 | | Hypothetical protein | 8 | 38.8% and 56/80 |
| 44 | Chr 6: 3401 to 3500 | | PfEMP1 | 3 | 42% and 38/100 |
| 45 | Chr 6: 296831 to 296920 | | Translation initiation factor IF2 | 38 | 34.4% and 77/90 |
| 46 | Chr 6: 661791 to 661880 | | Hypothetical proteins | 6 and 551 | 31.1% and 55/90 |
| 47 | Chr 6: 672491 to 672560 | | Hypothetical protein | | 35% and 64/70 |
| 48 | Chr 6: 1352101 to 1352170 | | RIFIN | 80 | 37% and 50/70 |
| 49 | Chr 7: 30531 to 30670 | | PfEMP1 | 3 | 37.9% and 81/140 |
| 50 | Chr 7: 45441 to 45540 | | RIFIN | 447 | 35% and 79/100 |
| 51 | Chr 7: 55261 to 55330 | | RIFIN | 1 | 37% and 45/70 |
| 52 | Chr 7: 98671 to 98740 | | Hypothetical protein | 215 | 37% and 70/70 |
| 53 | Chr 7: 614461 to 614570 | | PfEMP1 | 3 | 40% and 38/110 |
| 54 | Chr 7: 644311 to 644440 | | PfEMP1 | 8 | 41.5% and 43/130 |
| 55 | Chr 7: 1012111 to 1012190 | | Conserved GTP binding protein and cell cycle control protein cwf15 homologue | 390 and 108 | 32.5% and 44/80 |
| 56 | Chr 7: 1145161 to 1145250 | | MAL7_28Sa | 7 | 47.8% and 64/90 |
| 57 | Chr 7: 1155801 to 1155890 | | Hypothetical protein | 6 | 36.7% and 87/90 |
| 58 | Chr 7: 1393981 to 1394050 | | Hypothetical protein | 287 | 35% and 70/70 |
| 59 | Chr 7: 1454751 to 1454820 | | RIFIN | 10 | 37% and 47/70 |
| 60 | Chr 7: 605971 to 606040 | | PfEMP1 | 71 | 37% and 35/70 |
| 61 | Chr 8: 22251 to 22330 | | PfEMP1 | 39 | 41.3% and 41/80 |
| 62 | Chr 8: 98871 to 99000 | | MAL8b_28s rRNA | 176 | 36.2% and 117/130 |
| 63 | Chr 8: 99751 to 99850 | | PF08_tmp2 | 164 | 36% and 98/100 |
| 64 | Chr 8: 187771 to 187870 | | Hypothetical protein | | 31% and 93/100 |
| 65 | Chr 8: 233921 to 233990 | | Tubulin gamma chain | 45 | 35% and 66/70 |
| 66 | Chr 8: 441381 to 441450 | | PfEMP1 | 80 | 38% and 34/70 |
| 67 | Chr 8: 1289051 to 1289130 | | PF08_tmp1 rRNA and putative senescence associated protein | 225 and 129 | 50% and 53/80 |
| 68 | Chr 8: 1289821 to 1289940 | | Senescence associated protein | 163 | 40% and 116/120 |
| 69 | Chr 8: 1290151 to 1290250 | | Senescence associated protein | 493 | 37% and 96/100 |
| 70 | Chr 8: 1399241 to 1399340 | | PfEMP1 | 7 | 38% and 38/100 |
| 71 | Chr 8: 1410561 to 1410640 | | Hypothetical protein | | 35% and 56/80 |
| 72 | Chr 8: 1412661 to 1412770 | | Hypothetical protein | 10 | 38.2% and 52/110 |
| 73 | Chr 9: 19931 to 20070 | | PfEMP1 | 10 | 40.7% and 92/140 |
| 74 | Chr 9: 42361 to 42460 | | RIFIN | 449 | 34% and 80/100 |
| 75 | Chr 9: 369131 to 369220 | | Formylmethionine deformylase | 349 | 31.1% and 55/90 |
| 76 | Chr 9: 406351 to 406460 | | Transporter protein and hypothetical protein | 74 and 54 | 33.6% and 108/110 |
| 77 | Chr 9: 632261 to 632330 | | Hypothetical protein | 63 | 35% and 58/70 |
| 78 | Chr 9: 749991 to 750080 | | Large ribosomal subunit protein L3, prokaryotic (50S)like | 115 | 32.2% and 88/90 |
| 79 | Chr 9: 757341 to 757430 | | Hypothetical protein | | 33.3% and 50/90 |
| 80 | Chr 9: 907861 to 907930 | | Hypothetical protein | | 37% and 64/90 |
| 81 | Chr 9: 1092431 to 1092510 | | NAD synthase and Hypothetical protein | 11 and 324 | 32.5% and 75/80 |
| 82 | Chr 9: 1107301 to 1107400 | | Hypothetical protein | 2 | 44% and 100/100 |
| 83 | Chr 9: 1130251 to 1130370 | | Cytochrome c oxidase subunit, | 321 | 32.5% and 103/120 |
| 84 | Chr 9: 1283801 to 1283870 | | Hypothetical protein | 359 | 37% and 70/70 |
| 85 | Chr 9: 1291101 to 1291170 | | Hypothetical protein | 84 | 35% and 54/70 |
| 86 | Chr 9: 1293991 to 1294060 | | Peptide release factor and DHHC type zinc finger protein | 353 and 199 | 37% and 62/70 |
| 87 | Chr 9: 1314241 to 1314350 | | mRNA processing protein and Hypothetical protein | 526 and 192 | 35.5% and 101/110 |
| 88 | Chr 9: 1479191 to 1479290 | | RIFIN | 467 | 35% and 81/100 |
| 89 | Chr 9: 1503331 to 1503430 | | PfEMP1 | 7 | 37% and 38/100 |
| 90 | Chr 10: 28351 to 28490 | | PfEMP1 | 1 | 36.4% and 76/100 |
| 91 | Chr 10: 39021 to 39090 | | RIFIN | 17 | 35.7% and 89/100 |
| 92 | Chr 10: 47981 to 48050 | | RIFIN | 3 | 35.7% and 88/100 |
| 93 | Chr 10: 125441 to 125510 | | Hypothetical protein | 430 | 37.1% and 57/70 |
| 94 | Chr 10: 274111 to 274200 | | Hypothetical protein | 182 | 37.8% and 87/90 |
| 95 | Chr 10: 401521 to 401590 | | Hypothetical protein | | 37.1% and 54/70 |
| 96 | Chr 10: 694111 to 694200 | | Hypothetical protein | 10 | 35.6% and 50/90 |
| 97 | Chr 10: 886941 to 887040 | | Hypothetical proteins | 4 and 347 | 34% and 98/100 |
| 98 | Chr 10: 960765 to 960854 | | Hypothetical protein | | 32.2% and 88/90 |
| 99 | Chr 10: 1162615 to 1162694 | | Hypothetical protein | | 35% and 79/80 |
| 100 | Chr 10: 1211885 to 1211964 | | Hypothetical protein | 80 | 36.3% and 77/80 |
| 101 | Chr 10: 1231655 to 1231784 | | Hypothetical protein | 19 | 33.8% and 100/130 |
| 102 | Chr 10: 1623881 to 1623950 | | RIFIN | 22 | 35.7% and 98/100 |
| 103 | Chr 11: 5321 to 5460 | | Hypothetical protein | 388 | 37.1% and 115/140 |
| 104 | Chr 11: 5631 to 5730 | | Hypothetical protein | 118 | 34% and 66/100 |
| 105 | Chr 11: 6141 to 6220 | | Hypothetical protein | 51 | 35% and 65/80 |
| 106 | Chr 11: 6311 to 6420 | | Hypothetical protein | 221 and 401 | 35.5% and 73/110 |
| 107 | Chr 11: 6581 to 6760 | | Hypothetical proteins | 491 and 61 | 34.4% and 149/180 |
| 108 | Chr 11: 6901 to 7110 | | Hypothetical protein | 71 | 36.2% and 145/210 |
| 109 | Chr 11: 7271 to 7420 | | Hypothetical protein | 232 and 340 | 41.3% and 106/150 |
| 110 | Chr 11: 7501 to 7600 | | Hypothetical protein | 462 and 180 | 33% and 57/100 |
| 111 | Chr 11: 7941 to 8210 | | Hypothetical protein | 61 | 40% and 119/270 |
| 112 | Chr 11: 8411 to 8720 | | Hypothetical protein | 262 | 34.2% and 251/310 |
| 113 | Chr 11: 18451 to 18530 | | Hypothetical protein | 17 | 35% and 69/80 |
| 114 | Chr 11: 24021 to 24150 | | PfEMP1 | 10 | 49% and 67/120 |
| 115 | Chr 11: 150597 to 150706 | | Hypothetical protein | 103 | 36.4% and 107/110 |
| 116 | Chr 11: 151007 to 151126 | | Hypothetical protein | 105 | 35% and 118/120 |
| 117 | Chr 11: 317947 to 318026 | | Hypothetical protein | 99 | 32.5% and 80/80 |
| 118 | Chr 11: 347207 to 347276 | | Hypothetical protein | 240 | 37.1% and 69/70 |
| 119 | Chr 11: 569106 to 569175 | | Hypothetical protein | 56 and 220 | 35.7% and 69/70 |
| 120 | Chr 11: 796606 to 796695 | | Hypothetical protein | 173 | 35.6% and 90/90 |
| 121 | Chr 11: 1395814 to 1395883 | | Hypothetical protein | | 36.8% and 66/70 |
| 122 | Chr 11: 1417434 to 1417503 | | Hypothetical protein | 453 | 35.7% and 66/70 |
| 123 | Chr 11: 1527984 to 1528073 | | Hypothetical protein | | 35.6% and 81/90 |
| 124 | Chr 11: 1663894 to 1664013 | | Hypothetical protein | 216 | 30.8% and 117/120 |
| 125 | Chr 11: 1918214 to 1918333 | | Hypothetical protein | 217 and 74 | 33.3% and 112/120 |
| 126 | Chr 11: 1927134 to 1927213 | | Hypothetical protein | 5 and 255 | 36.3% and 44/80 |
| 127 | Chr 11: 1929634 to 1929743 | | Hypothetical protein | 58 | 38.2% and 78/110 |
| 128 | Chr 11: 1929974 to 1930053 | | Hypothetical protein | 282 | 36.3% and 58/80 |
| 129 | Chr 11: 2010214 to 2010293 | | RIFIN | 4 | 40% and 97/100 |
| 130 | Chr 12: 32601 to 32670 | | PfEMP1 | 33 | 37.1% and 41/70 |
| 131 | Chr 12: 43711 to 43790 | | RIFIN | 10 | 33.8% and 92/100 |
| 132 | Chr 12: 774191 to 774300 | | PfEMP1 | 1 | 38.2% and 53/110 |
| 133 | Chr 12: 1360341 to 1360430 | | Hypothetical protein | 190 | 34.4% and 54/90 |
| 134 | Chr 12: 1404951 to 1405020 | | Hypothetical protein | 149 | 37.1% and 67/70 |
| 135 | Chr 12: 1529361 to 1529430 | | Hypothetical protein | 379 | 37.1% and 70/70 |
| 136 | Chr 12: 1704411 to 1704490 | | PfEMP1 | 10 | 46.3% and 36/80 |
| 137 | Chr 12: 1739561 to 1739630 | | PfEMP1 | | 37.5% and 35/70 |
| 138 | Chr 12: 2239401 to 2239480 | | RIFIN | 1 | 35% and 87/100 |
| 139 | Chr 12: 2248951 to 2249040 | | PfEMP1 | 6 | 45.6% and 56/90 |
| 140 | Chr 13: 30631 to 30700 | | RIFIN | 8 | 37.1% and 94/100 |
| 141 | Chr 13: 53591 to 53670 | | RIFIN | 500 | 40% and 95/100 |
| 142 | Chr 13: 977431 to 977520 | | Aspartyl (acid) protease, putative | 55 | 38.9% and 82/90 |
| 143 | Chr 13: 2517471 to 2517550 | | Hypothetical protein | 123 | 33.8% and 79/80 |
| 144 | Chr 13: 2791651 to 2791760 | | Hypothetical protein conserved | 227 and 76 | 30.9% and 102/110 |
| 145 | Chr 13: 2799331 to 2799400 | | MAL13_5.8SrRNA rRNA | 311 | 35.7% and 57/70 |
| 146 | Chr 14: 141091 to 141160 | | Hypothetical protein and acid phosphatase | 206 and 250 | 37.1% and 62/70 |
| 147 | Chr 14: 472871 to 472940 | | GTP-binding protein, putative | 2 | 35.7% and 59/70 |
| 148 | Chr 14: 989170 to 989279 | | DNA directed DNA polymerase | 82 | 35.5% and 103/110 |
| 149 | Chr 14: 1086373 to 1086442 | | Hypothetical protein | 380 | 35.7% and 69/70 |
| 150 | Chr 14: 1213632 to 1213711 | | Hypothetical protein | 177 | 35% and 80/80 |
| 151 | Chr 14: 1540333 to 1540412 | | Translocation protein sec62, putative | 170 | 35% and 77/80 |
| 152 | Chr 14: 2247864 to 2247933 | | Protein phosphatase 2C, putative | 349 | 35.7% and 69/70 |