Plasmodium knowlesi genome sequences from clinical isolates reveal extensive genomic dimorphism.

Miguel M Pinheiro1, Md Atique Ahmed2, Scott B Millar1, Theo Sanderson3, Thomas D Otto3, Woon Chan Lu4, Sanjeev Krishna5, Julian C Rayner3, Janet Cox-Singh1.   

Abstract

Plasmodium knowlesi is a newly described zoonosis that causes malaria in the human population that can be severe and fatal. The study of P. knowlesi parasites from human clinical isolates is relatively new and, in order to obtain maximum information from patient sample collections, we explored the possibility of generating P. knowlesi genome sequences from archived clinical isolates. Our patient sample collection consisted of frozen whole blood samples that contained excessive human DNA contamination and, in that form, were not suitable for parasite genome sequencing. We developed a method to reduce the amount of human DNA in the thawed blood samples in preparation for high throughput parasite genome sequencing using Illumina HiSeq and MiSeq sequencing platforms. Seven of fifteen samples processed had sufficiently pure P. knowlesi DNA for whole genome sequencing. The reads were mapped to the P. knowlesi H strain reference genome and an average mapping of 90% was obtained. Genes with low coverage were removed leaving 4623 genes for subsequent analyses. Previously we identified a DNA sequence dimorphism on a small fragment of the P. knowlesi normocyte binding protein xa gene on chromosome 14. We used the genome data to assemble full-length Pknbpxa sequences and discovered that the dimorphism extended along the gene. An in-house algorithm was developed to detect SNP sites co-associating with the dimorphism. More than half of the P. knowlesi genome was dimorphic, involving genes on all chromosomes and suggesting that two distinct types of P. knowlesi infect the human population in Sarawak, Malaysian Borneo. We use P. knowlesi clinical samples to demonstrate that Plasmodium DNA from archived patient samples can produce high quality genome data. We show that analyses, of even small numbers of difficult clinical malaria isolates, can generate comprehensive genomic information that will improve our understanding of malaria parasite diversity and pathobiology.

Year:  2015        PMID: 25830531      PMCID: PMC4382175          DOI: 10.1371/journal.pone.0121303

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Plasmodium knowlesi is a malaria parasite of old world macaques that causes zoonotic malaria in humans [1]. P. knowlesi has been widely used as an experimental model leading to seminal discoveries in aspects of malaria biology, including antigenic variation, vaccine development and erythrocyte invasion (for example [2,3,4]). More recently, the discovery of severe cases of P. knowlesi malaria in the human population has re-kindled human-disease focussed research on this important parasite [5]. P. knowlesi lacks unique morphological characteristics and human infections are often mis-diagnosed as P. malariae or other Plasmodium species[6]. Novel P. knowlesi-specific PCR assays now allow accurate identification of P. knowlesi malaria and PCR-confirmed cases are continuously reported across Southeast Asia, including severe and fatal cases in Malaysia [7,8,9,10]. P. knowlesi is a widespread human infectious agent in Southeast Asia, yet we currently know very little about naturally circulating parasite populations that enter the human host or the factors that are associated with severe disease. In Sarawak, Malaysian Borneo, we found that P. knowlesi parasitaemia is associated with disease severity [8,9]. To study the relationship between parasitaemia and variation in the proteins that are involved in invasion of human erythrocytes, short regions of two P. knowlesi invasion genes, P. knowlesi normocyte binding protein (Pknbp) xa and Pknbpxb, were sequenced from more than 100 human infections [11]. Both gene fragments were polymorphic and the Pknbpxa fragment was dimorphic with distinct co-associating polymorphisms that segregated into two clusters. In the study cohort, patients were infected with parasites with either Pknbpxa dimorphic type at almost equal frequency but only alleles found in one dimorphic type associated with markers of disease severity [11]. 
While this suggests a potential link between invasion phenotypes, parasitaemia and virulence, it is critical to extend the study beyond a candidate gene level and out to the whole genome. A reference P. knowlesi genome sequence has been generated from the macaque-adapted experimental H strain [12], but P. knowlesi genome sequences from clinically well-characterised isolates are not currently available. The generation of parasite genome sequences from clinical Plasmodium samples requires a leucocyte depletion step to minimise the amount of contaminating human DNA. However, many archived sample collections exist, including our own collection of frozen whole blood samples from patients with P. knowlesi malaria, that have not been leucocyte depleted before freezing. Adapting depletion approaches to these frozen sample sets would unlock a wealth of genomic information. Here we report a method to deplete human DNA from frozen clinical malaria samples and render them suitable for whole genome sequencing. The method exploits two assumptions; 1) that not all leucocytes are lysed when whole blood goes through one freeze/thaw cycle and 2) the more robust parasites would survive the same treatment either in intact infected red blood cells (IRBCs) or as free parasites released from lysed erythrocytes. We developed a simple filtration method to remove leucocytes and recover parasite-rich pellets for Plasmodium genome sequencing. The method offers the malaria research community a means to interrogate Plasmodium species genome data in important archived sample collections. In this case, we use the approach to generate genome sequence data from six previously frozen P. knowlesi clinical isolates, and show that the Pknbpxa dimorphism may extend across the P. knowlesi genome.

Materials and Methods

Patient samples

Archived frozen whole blood samples were used from adult patients recruited into a non-interventional study with informed signed consent that included use of samples in related studies. Patient consent forms are securely stored in the University of St Andrews. Patient recruitment and consent protocols were approved by the Medical Research and Ethics Committee, Ministry of Health Malaysia and the Ethics Committee Faculty of Medicine and Health Sciences, University Malaysia Sarawak. The use of the samples in the study reported here was further approved by the University of St Andrews Teaching and Research Ethics Committee.

Human DNA depletion using Whatman filter paper

EDTA blood samples from P. knowlesi patients were collected and stored at -40°C. The samples were thawed and the volume measured before gentle mixing in ice-cold PBS at a ratio of 300ul thawed blood per 5mL cold PBS. The mixture was pipetted into a 10mL syringe barrel whose base was lined with 3 layers of Whatman No 3 filter paper (6 µm pore size) to remove small lymphocytes, with 3 layers of Whatman No 1 filter paper (11 µm pore size) on top to remove larger surviving leucocytes. The filter papers were cut to fit the internal diameter of the syringe and pre-wet with PBS before use. Not more than 10mL of diluted blood was loaded per syringe column. The filtrate was collected into sterile 50mL centrifuge tubes by centrifugation at 125g for 2 minutes at 4°C. The columns were washed through with 10mL volumes of cold PBS and each wash was collected into the filtrate tube by centrifugation as above until the filters were no longer blood-stained. The total combined filtrate, up to 40mL, was centrifuged at 2000g for 10 minutes at 4°C to pellet any surviving IRBCs and free parasites. Pellets were re-suspended in 1mL cold PBS, transferred to 1.5mL Eppendorf tubes and recovered by centrifugation at 14,000g for 2 minutes at 4°C. Pellets were re-suspended and washed in 1mL cold PBS and collected by centrifugation as described. This wash step was repeated two more times. The washed IRBC/parasite pellets were suspended in 20ul Proteinase K (QIAGEN) followed by 200ul cold PBS. The mixture was vortexed thoroughly before DNA extraction using the QIAamp Blood Mini kit (QIAGEN) with RNase A, as per the manufacturer's instructions. For samples with more than 100,000 parasites/ul blood the initial blood dilution step was 150ul thawed blood into 5mL cold PBS.

TaqMan qPCR multiplexed for human and P. knowlesi DNA

Plasmodium-specific 18S SSU rRNA primers Plasmo1 (5′ GTTAAGGGAGTGAAGACGATCAGA) and Plasmo2 (5′ AACCCAAAGACTTTGATTTCTCATAA) were used [13] with the published P. knowlesi TaqMan probe 5′ CTCTCCGGAGATTAGAACTCTTAGATTGCT labelled with 5'FAM and 3'BHQ1 [14]. The human DNA primers Plat1-A (5′ CTTACCACATCCGCTCCATC) and Plat1-B (5′ TTCACACTCTCCGTCACATTG) were used with the probe 5′ HEX/CACATCCCC/ZEN/AGTGCCGAGTTAGA/3IABkFQ. The qPCR master mix contained 250nM Plasmo1, 250nM Plasmo2, 250nM Plat1-A, 250nM Plat1-B, 125nM Pk probe, 125nM Plat1-Probe, 1 x Roche RT-PCR Master Mix and 1ul DNA template in a 20ul final volume. qPCR cycling was 10 minutes at 95°C, followed by 45 cycles of 10 seconds at 95°C, 30 seconds at 57°C and 1 second at 72°C using the Roche LightCycler 480 II.

Illumina sequencing

DNA was quantified (Qubit Fluorometric Quantitation, Invitrogen, Life Technologies) and sheared into fragments of 400–600 bp. Illumina libraries were generated using a) the PCR free protocol (NoPCR) [15] or b) the standard library preparation using the KAPA enzyme [16] with eight PCR cycles. NoPCR libraries were sequenced on the Illumina HiSeq 2000 platform for 100 paired-end cycles and standard PCR libraries were sequenced on the Illumina MiSeq for 150 paired-end cycles using V4 or V5 SBS sequencing kits and proprietary reagents according to the manufacturer's recommended protocol (https://icom.illumina.com/). Data from the Illumina sequencing machines were analysed using the RTA1.6, RTA1.8 or GA v0.3 analysis pipelines.

Reference genome

The Plasmodium knowlesi H strain reference genome version 11.1 GeneDB (www.genedb.org/Homepage/Pknowlesi) was downloaded from PlasmoDB (www.plasmodb.org) [12,17,18]. The region corresponding to the pknbpxa gene (PKH_146970 and PKH_146980) in chromosome 14 was partially missing and fragmented in the current reference genome and we corrected for this using the published pknbpxa gene sequence (GenBank accession number EU867791.1) [2]. Common non-coding DNA regions upstream and downstream of the pknbpxa gene were located in both the Plasmodium knowlesi strain H reference genome and the published pknbpxa gene. With this information it was possible to replace the pknbpxa gene (PKH_146970) in the reference genome sequence with the published EU867791.1 gene sequence without disrupting subsequent mapping. The pknbpxb gene, which was not annotated correctly, was rectified using the EU867792.1 published gene sequence [2].

Genome sequence mapping

HiSeq and MiSeq reads from P. knowlesi enriched, human DNA depleted, samples are deposited in the EMBL-EBI European Nucleotide Archive (http://www.ebi.ac.uk)[19]. The archive references are for HiSeq: ERR274221; ERR274222; ERR274224; ERR274225 and MiSeq: ERR366425 and ERR366426. Sequences mapping to the human genome, representing patient DNA, were removed from this data in the sequencing pipeline. The reads were mapped to the corrected P. knowlesi H strain reference genome sequence (PlasmoDB-11.1_PknowlesiH_Genome.fasta) using Bowtie-2 [20] followed by Bedtools to summarise the coverage of each genome [21].

Single Nucleotide Polymorphism (SNP) calling

Samtools mpileup with the threshold base quality set to 13 was used with BCFtools to generate Variant Call Format (VCF) files for each P. knowlesi genome sequence [22]. varFilter (BCFtools) was applied and all SNP sites with an allele frequency of less than 0.9 were removed. Insertions and deletions were not included in any of the analyses or scripts. Only SNP sites with a minimum coverage of 13 were taken into consideration.
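The filters described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: it assumes each VCF record carries the `DP4` INFO tag (reference and alternate read counts on each strand) and the `INDEL` flag, as Samtools/BCFtools output of that generation typically does, and it approximates allele frequency as the alternate-read fraction derived from `DP4`. The function names are hypothetical.

```python
MIN_DEPTH = 13           # minimum coverage stated in the text
MIN_ALT_FRACTION = 0.9   # minimum allele frequency stated in the text


def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=40;DP4=1,0,20,19' into a dict."""
    info = {}
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        info[key] = value  # flags such as INDEL map to an empty string
    return info


def keep_snp(vcf_line):
    """Return True if a VCF data line passes the depth and allele-frequency filters."""
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt, info = fields[3], fields[4], parse_info(fields[7])
    # Skip indels; multi-allelic sites are also skipped (a simplification).
    if "INDEL" in info or len(ref) != 1 or len(alt) != 1:
        return False
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in info["DP4"].split(","))
    depth = ref_fwd + ref_rev + alt_fwd + alt_rev
    if depth < MIN_DEPTH:
        return False
    return (alt_fwd + alt_rev) / depth >= MIN_ALT_FRACTION
```

A line with 39 of 40 reads supporting the alternate base would be kept, while a site with 10 reads in total, or one with an even split between reference and alternate reads, would be dropped.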

Linkage Disequilibrium analysis of full-length pknbpxa sequences extracted from P. knowlesi genome sequence data

We used Artemis [12,17,23,24] and the VCF files to generate full-length pknbpxa gene sequences as fasta files from each of the genome sequences (n = 6). The fasta files were converted to the Haploview compatible PLINK format [25]. Linkage disequilibrium analysis was performed on the full-length coding region of the Pknbpxa sequences in Haploview using default parameters [26,27]. Nucleotide diversity (π) was calculated in DnaSP [28] using a 400bp window length with a step size of 25bp.
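For readers unfamiliar with the sliding-window π calculation, the following Python sketch shows the idea: π is taken as the mean number of pairwise differences per site, computed in 400bp windows moved in 25bp steps. It is an illustration only and assumes gap-free, equal-length aligned sequences; DnaSP applies its own handling of gaps and missing data, which is not reproduced here. The function names are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site for equal-length aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


def sliding_pi(seqs, window=400, step=25):
    """Yield (window_start, pi) for each full window along the alignment."""
    length = len(seqs[0])
    for start in range(0, length - window + 1, step):
        yield start, nucleotide_diversity([s[start:start + window] for s in seqs])
```

Calling `list(sliding_pi(sequences))` on the six full-length sequences would give one π value per window, which can then be plotted along the gene.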

Identification of polymorphisms genome-wide co-associating with the Pknbpxa fragment dimorphism

An algorithm was developed to identify SNP sites in each genome sequence (n = 6), co-associating with the P. knowlesi Pknbpxa dimorphic pattern already identified in a small fragment of this gene [11] and also visible in Artemis on chromosome 14 at the Pknbpxa locus (Fig. 1). Briefly the script was designed to screen VCF files to identify each SNP and test if the SNP co-associated with SNPs defining the Pknbpxa dimorphism. Co-associating patterns were predefined in the algorithm to describe which kind of symbols (SNP pattern) each genome required to fit within either of the P. knowlesi Pknbpxa dimorphic forms. Every time a SNP fit the pattern the event was signalled (recorded). Finally, an image was created to show the density of all SNP sites and then the co-associating SNPs for each chromosome (S1 Fig.).
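The pattern test described above can be sketched as follows. The in-house script itself is not reproduced in the text, so this is only one plausible reading of it: a site is counted as co-associating when every isolate in one Pknbpxa cluster carries the same base, every isolate in the other cluster carries the same base, and the two bases differ. The isolate identifiers and function names are hypothetical.

```python
def fits_dimorphism(alleles, cluster1, cluster2):
    """Return True if a site separates the two clusters perfectly.

    alleles  -- dict mapping isolate ID to the base called at this site
    cluster1 -- isolate IDs of one Pknbpxa dimorphic type
    cluster2 -- isolate IDs of the other dimorphic type
    """
    bases1 = {alleles[i] for i in cluster1}
    bases2 = {alleles[i] for i in cluster2}
    # One base shared within each cluster, and a different base between clusters.
    return len(bases1) == 1 and len(bases2) == 1 and bases1 != bases2


def count_co_associating(sites, cluster1, cluster2):
    """Count sites whose allele pattern matches the predefined dimorphism."""
    return sum(fits_dimorphism(a, cluster1, cluster2) for a in sites)
```

With three isolates per cluster, a site reading A, A, A in one cluster and G, G, G in the other would be recorded, whereas a site with a mixed pattern inside either cluster would not.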
Fig 1

A screen shot of Artemis DNA view comparing six Plasmodium knowlesi genome sequences from patient isolates to the Plasmodium knowlesi H strain reference genome sequence.

The P. knowlesi normocyte binding protein xa locus on chromosome 14 is shown. The screen shot shows segregation of the sequences from patient isolates into two groups, (n = 3 in each group) and the dimorphism is clearly visible.

Testing the distribution of co-associated SNPs defined in the Pknbpxa dimorphism

To identify positions on each P. knowlesi chromosome where the density of co-associating sites was more evident, a Chi square test of independence was applied, followed by a calculation of adjusted residuals. For this, each chromosome was divided into 30 equal parts and a contingency table was created to reflect the number of SNPs co-associating with the Pknbpxa dimorphism per part per chromosome. Adjusted residuals were calculated for the contingency table and a threshold of > 3.00 for more co-associating SNPs than expected and < -3.00 for fewer co-associating SNPs than expected was applied to the resulting values. By applying these thresholds it was possible to identify, within 99.7% limits of confidence, regions of each chromosome with a higher or lower than expected density of co-associating SNPs.
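The adjusted residual for a cell of a contingency table is the difference between the observed and expected counts divided by its estimated standard error. The Python sketch below computes it for any table given as a list of rows. The layout of the table used in the study is not fully specified in the text, so the example layout (one row for co-associating SNPs, one for all other SNPs, one column per chromosome segment) is an assumption, and the function names are hypothetical.

```python
from math import sqrt


def adjusted_residuals(table):
    """Adjusted standardised residuals for an r x c contingency table (list of rows).

    Assumes at least two rows and two columns, all with non-zero totals.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            variance = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            out.append((observed - expected) / sqrt(variance))
        residuals.append(out)
    return residuals


def flag_segments(residual_row, threshold=3.0):
    """Label each segment as 'high', 'low' or 'expected' using the +/-3.00 threshold."""
    return [
        "high" if r > threshold else "low" if r < -threshold else "expected"
        for r in residual_row
    ]
```

Passing the row of residuals for co-associating SNPs to `flag_segments` would mark the chromosome segments that exceed the thresholds quoted above.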

Gene Ontology (GO)

P. knowlesi genes were analysed using Blast2GO http://www.blast2go.com version 2.7.2 [29]. All genes with complete coverage (4623) were blasted against nr@ncbi database and an InterProScan 5 analysis was performed [30]. The Gene Ontology classification was done with default parameters [31]. Genes with no (0) co-associating SNP sites, with >0–<10 (1–9) and >9 (≥ 10) SNPs that co-associated with the P. knowlesi genome-wide dimorphism were identified within each resulting GO group.

Testing for enrichment of dimorphic genes in particular GO subgroups

Genes with dimorphic SNP sites were tested for statistically significant enrichment in GO subgroups using topGO (Enrichment analysis for Gene Ontology, R package version 2.14.0; Adrian Alexa and Jorg Rahnenfuhrer, 2010; http://www.bioconductor.org/packages/release/bioc/html/topGO.html). For this we selected two groups of genes, those with one or more dimorphic SNP sites (≥1) and a separate group with ten or more dimorphic SNP sites (≥10), and analysed each for enrichment against all genes with at least one SNP, whether or not dimorphic.
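The general idea behind a GO enrichment test can be illustrated with a one-sided hypergeometric (Fisher-type) test, written here in Python. topGO offers several algorithms and test statistics, and the text does not state which were used, so this sketch shows the classical test only and should not be read as the exact procedure applied in the study. The function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p(total_genes, total_hits, group_genes, group_hits):
    """One-sided hypergeometric p-value: probability of seeing at least
    group_hits dimorphic genes in a GO group of size group_genes, when
    total_hits of the total_genes background genes are dimorphic."""
    upper = min(group_genes, total_hits)
    tail = sum(
        comb(total_hits, k) * comb(total_genes - total_hits, group_genes - k)
        for k in range(group_hits, upper + 1)
    )
    return tail / comb(total_genes, group_genes)
```

For example, `enrichment_p(background_size, background_dimorphic, group_size, group_dimorphic)` would be evaluated once per GO term, with the background restricted to genes carrying at least one SNP as described above.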

Results

Human DNA depletion from frozen whole blood samples

Frozen whole blood samples from fifteen P. knowlesi patients, with parasite counts ranging from 10,000–400,000 parasites/ul, were thawed and human DNA depleted using an in-house method. Briefly, white blood cells were removed by filtration through Whatman filter paper followed by parasite recovery as described in detail (see Methods section). Total human and parasite DNA was quantified using qPCR (Table 1). Nine of fifteen isolates had the required >100ng of P. knowlesi DNA, and seven of the nine had <80% human DNA contamination, the cut-off for Plasmodium genome sequencing, and were suitable for sequencing (Table 1). Five and two DNA samples were used to generate NoPCR and PCR sequencing libraries and multiplexed in a single lane on Illumina HiSeq and MiSeq platforms respectively (Table 1). The remaining six samples had insufficient P. knowlesi DNA and/or >80% human DNA (hDNA) contamination (Table 1).
Table 1

Clinical samples human DNA depleted using the Whatman filtration method.

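Sample ID | Parasites/ul | Accession ID § | Total hDNA (ng) qPCR | Total Pk DNA (ng) qPCR | % Pk DNA
47        | 128,500 | ND          | 302  | 294  | 49.3
Duplicate |         | ERR366425*  | 131  | 238  | 64.5
87        | 47,000  | NS          | 225  | 51   | 18.7
Duplicate |         | NS          | 68   | 5    | 6
73        | 40,000  | ERR366426*  | 33   | 233  | 87.6
Duplicate |         | ND          | 106  | 162  | 60.4
91        | 29,000  | NS          | 269  | 37   | 12
220       | 10,000  | NS          | 735  | 24   | 3.1
48        | 86,000  | ND          | 1402 | 1167 | 45.4
Duplicate |         | ERR274221** | 421  | 824  | 65.7
50A       | 139,000 | ERR274222** | 317  | 2775 | 89.8
Duplicate |         | ND          | 415  | 3280 | 88.8
50B       | 390,000 | ERR274223   | 597  | 1239 | 67.5
55        | 186,500 | NS          | 1235 | 428  | 25.7
Duplicate |         | NS          | 3418 | 433  | 11.2
58        | 149,000 | ERR274224** | 336  | 506  | 60
62        | 139,000 | NS          | 988  | 71   | 6.7
178       | 104,000 | NS          | 103  | 11   | 9.4
233       | 326,000 | NS          | 1015 | 313  | 23.5
258       | 58,000  | NS          | 560  | 94   | 14.3
Duplicate |         | NS          | 245  | 59   | 19.4
299       | 66,000  | ERR274225** | 626  | 984  | 61.1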

Samples sequenced using Illumina HiSeq or MiSeq sequencing platforms are labelled. The amount of Parasite DNA recovered and per cent human DNA contamination are given.

*MiSeq.

**HiSeq.

NS = Not suitable.

ND = Duplicate sample suitable but not sequenced.

§ Accession number EMBL-EBI European Nucleotide Archive (http://www.ebi.ac.uk).
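The two suitability thresholds quoted in the text (>100ng of P. knowlesi DNA and <80% human DNA) can be expressed as a short Python check. This sketch assumes that the "% Pk DNA" column is parasite DNA as a percentage of total human plus parasite DNA, which is an inference from the tabulated values rather than a stated definition. The labels in Table 1 may also reflect considerations not captured by these two thresholds. The function names are hypothetical.

```python
MIN_PK_DNA_NG = 100      # required amount of P. knowlesi DNA (from the text)
MAX_HUMAN_PERCENT = 80   # cut-off for human DNA contamination (from the text)


def percent_pk_dna(human_ng, pk_ng):
    """Parasite DNA as a percentage of total (human + parasite) DNA."""
    return 100 * pk_ng / (human_ng + pk_ng)


def meets_criteria(human_ng, pk_ng):
    """Apply both thresholds quoted in the text to one qPCR measurement."""
    human_percent = 100 - percent_pk_dna(human_ng, pk_ng)
    return pk_ng > MIN_PK_DNA_NG and human_percent < MAX_HUMAN_PERCENT
```

Using the values listed for sample 73 (33ng human DNA, 233ng parasite DNA), `percent_pk_dna(33, 233)` rounds to 87.6, matching the table, and `meets_criteria(33, 233)` is true. For sample 91 (269ng and 37ng) the check fails on both thresholds.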

DNA obtained from frozen P. knowlesi clinical isolates generated high coverage genome sequence

P. knowlesi sequence data was generated from seven patient isolates, five from HiSeq runs and two from MiSeq runs. The HiSeq runs generated >36 million reads and MiSeq >5 million reads. An average mapping of >90% was obtained for both HiSeq and MiSeq data producing an average coverage of >140x for HiSeq and >30x for MiSeq. The total number of reads mapped and not mapped, percent human DNA contamination and coverage per genome sequence are summarized in Table 2.
Table 2

P. knowlesi clinical isolate genome sequence summary report.

Sequence ID             | ERR366426 | ERR366425 | ERR274221 | ERR274223 | ERR274222 | ERR274225 | ERR274224
Technology              | MiSeq     | MiSeq     | HiSeq     | HiSeq     | HiSeq     | HiSeq     | HiSeq
Sample Ref.             | SKS-047   | SKS-073   | SKS-048   | SKS-050B  | SKS-050A  | SKS-299   | SKS-058
Total Reads             | 6003990   | 6130562   | 45217760  | 47211338  | 58756176  | 51769792  | 41469862
Total Reads Mapped      | 4731148   | 5049125   | 37095534  | 38967367  | 48466910  | 43203711  | 34547072
Total Reads Not Mapped  | 1272842   | 1081437   | 8122226   | 8243971   | 10289266  | 8566081   | 6922790
% Reads Not Mapped      | 21        | 17        | 17        | 17        | 17        | 16        | 16
% Human DNA             | 1.58      | 1.59      | 0.75      | 0.48      | 0.14      | 0.71      | 0.73
Coverage                | 32        | 34        | 166       | 175       | 218       | 195       | 156
Number of SNPs          | 267055    | 260888    | 318427    | 317051    | 317072    | 304641    | 304147
Number Dimorphic SNPs   | 42771     | 42771     | 42771     | 42771     | 42771     | 42771     | 42771
Zero Coverage (%)       | 7.8       | 7.9       | 5.5       | 5.8       | 5.8       | 5.9       | 6
Coverage >1 (%)         | 91.6      | 91.4      | 94.1      | 93.9      | 93.9      | 93.7      | 93.6
Coverage >5 (%)         | 90.1      | 89.9      | 93.5      | 93.3      | 93.4      | 93.1      | 92.9
Coverage >10 (%)        | 88.4      | 88.4      | 92.9      | 92.8      | 93        | 92.6      | 92.4

Genome Size 23487363

P. knowlesi genome analysis

The reads were mapped to the P. knowlesi H strain reference genome following correction of the Pknbpxa locus (see Materials and Methods section). Two genome sequences (ERR274222 and ERR274223) were generated from a single patient, representing pre- and post-treatment samples. Only the pre-treatment sample sequence, ERR274222, was included in subsequent analyses, along with sequences from five other patients, all collected pre-treatment. The sequences covered 5228 genes, including genes and gene fragments annotated as genes of unknown function. Data from 605 genes were excluded because coverage was zero at one or more base positions, leaving 4623 genes in subsequent analyses. This filter excluded all but five of the 195 SICAVar genes and gene fragments and all but three of the 67 KIR genes and gene fragments. Both of these gene families are highly polymorphic, and the gene sequences in these contemporary clinical isolates are likely to be very different to those in the historical monkey-adapted reference genome, so mapping issues and low coverage in these gene sets are to be expected. Of the remaining genes, 2180 (47.2%) were annotated as genes with unknown function. The SNP distribution across the genome is shown in S1 Fig.
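The coverage filter described above (excluding any gene with zero coverage at one or more base positions) can be sketched in Python as follows. The sketch works on a single per-base depth track. Whether the filter was applied to each isolate separately or to a combined track is not stated in the text, so applying the function once per isolate and keeping genes that pass in every isolate is an assumption. The data layout and function names are hypothetical.

```python
def fully_covered(depth, start, end):
    """True if every base in the half-open interval [start, end) has depth > 0."""
    return all(d > 0 for d in depth[start:end])


def filter_genes(depth_by_chrom, genes):
    """Keep the IDs of genes with no zero-coverage position.

    depth_by_chrom -- dict: chromosome name -> list of per-base read depths
    genes          -- list of (gene_id, chromosome, start, end) tuples, 0-based
    """
    return [
        gene_id
        for gene_id, chrom, start, end in genes
        if fully_covered(depth_by_chrom[chrom], start, end)
    ]
```

Per-base depths of the kind assumed here could be taken from the Bedtools coverage summary mentioned in the Methods.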

Dimorphism extends across and beyond Pknbpxa

In a previous study we identified a DNA sequence dimorphism in a fragment (885bp) representing 10% of the P. knowlesi normocyte binding protein (Pknbp)xa gene, which codes for a protein involved in red blood cell invasion. To determine the extent of the dimorphism across the gene, full-length Pknbpxa (PKH_146970) coding sequences were assembled from the six genome sequences obtained from the same patient cohort. Ninety-one (91) Pknbpxa SNPs co-associated with the dimorphic pattern (r² = 1). This dimorphism effectively divides the Pknbpxa gene sequences into two clusters of sequence types, with Pknbpxa sequences from three genomes falling into cluster 1 and three into cluster 2. Nucleotide diversity (π) was higher across the clusters (π = 0.01441) than within each cluster (π = 0.00518 for cluster 1 and π = 0.00868 for cluster 2; n = 3 each). Cluster 1 was less diverse than cluster 2, consistent with the Pknbpxa nucleotide diversity found in the previous study, but the significance of this difference cannot be estimated based on six sequences. The P. knowlesi genome sequences from clinical isolates were viewed in Artemis, a genome browser and annotation tool, and referenced to the P. knowlesi H strain genome sequence. Two SNP patterns emerged and the Pknbpxa dimorphism was clearly visible, with sequences from three patient isolates clustering into each pattern (Fig. 1). To test whether the dimorphism extended beyond the boundaries of the Pknbpxa gene, SNP association with the Pknbpxa dimorphism was examined first along chromosome 14 and then genome-wide using an in-house script (see Materials and Methods section). The dimorphic SNP pattern was evident at multiple genetic loci on all chromosomes (S1 Fig.). The relative distribution of co-associated SNPs on each chromosome was determined by dividing each chromosome into 30 equal parts and using the Chi squared test of independence to test expected and observed events (S1 Table).
Although the dimorphism extends across the full genome, its intensity and distribution are not uniform, nor is it clustered in any particular chromosomal region. The position and number of non-synonymous and synonymous SNPs co-associating with the dimorphism per gene per chromosome are represented in Fig. 2.
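The r² statistic quoted for the 91 co-associating Pknbpxa SNPs measures how completely two sites are correlated. A minimal Python sketch for two biallelic sites is given below. It assumes haploid sequences (appropriate for blood-stage Plasmodium) with alleles coded 0 or 1 per isolate, and both sites must be polymorphic. It illustrates the statistic only and is not the Haploview implementation. The function name is hypothetical.

```python
def r_squared(site_a, site_b):
    """r^2 between two biallelic sites coded 0/1 per haploid sequence."""
    n = len(site_a)
    p_a = sum(site_a) / n                                     # allele frequency at site A
    p_b = sum(site_b) / n                                     # allele frequency at site B
    p_ab = sum(a and b for a, b in zip(site_a, site_b)) / n   # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                      # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
```

Two sites that both split six isolates into the same three-versus-three groups give r² = 1, which is the situation described for the SNPs defining the dimorphism.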
Fig 2

The number and position of SNP sites per gene co-associating with the P. knowlesi genome-wide dimorphism.

Non-synonymous polymorphisms (red) are shown above the line and synonymous polymorphisms (blue) are shown below the line. The line is drawn at zero. The chromosomes are drawn to scale and the height of the bars represents the number of SNP sites per gene per region of each chromosome. The scale is given in the boxed area and is the number of SNP sites per gene.

More than half of the P. knowlesi genes in the genome, 2801 of 4623 genes (60.8%), appear to be dimorphic. Within the dimorphic group the number of dimorphic SNP sites per gene varied widely. For example, while Pknbpxa had a total of 326 SNPs, of which 91 (27.9%) co-associated with the dimorphism, a related gene of similar size, Pknbpxb, had a total of 197 SNPs, of which only 5 (2.2%) co-associated with the dimorphism (S2 Table). Applying a more conservative cut-off identified 507 genes with ≥10 co-associating SNPs, representing 11% of genes in the genome with adequate coverage. Of these, 301 (59.5%) were annotated as genes of unknown function. The chromosome location and annotated function of the remaining 206 genes are listed in S2 Table. Notable genes within this high stringency dimorphic group included 12 of the 27 (44%) genes annotated as transcription factors with AP2 domains in the P. knowlesi genome. Several genes associated with drug resistance in other Plasmodium species, such as the putative multidrug resistance-associated protein PkMRP1 (PKH_144590) and the putative multidrug resistance protein PkMDR 2 (PKH_125840; S2 Table), were also dimorphic, while the putative chloroquine resistance transporter (CRT) gene (PKH_010710) had 23 SNPs, none of which were dimorphic and only one of which conferred an amino acid change. The enrichment of dimorphic genes among genes encoding transcription factors with AP2 domains was obvious and was identified manually. We then used Gene Ontology (GO) tools (Blast2GO) to examine whether other P. knowlesi dimorphic genes were enriched in GO groups that served particular biological functions.
Genes were sorted into GO term groups and sub-groups with putative or known molecular function, cellular process activity and biological process activity (Table 3). We then calculated the proportion of genes with ≥1 dimorphic SNP in each GO group (Table 3). Most of the GO term groups had, as expected, approximately 60% dimorphic genes but there was variation (Table 3). If dimorphic genes have evolved randomly over time then the proportion of genes with dimorphic SNP sites in the GO groups would not be expected to differ from the distribution of genes with dimorphic SNP sites in the whole genome, that is: 39% of genes with no dimorphic SNPs, 50% of genes with 1–9 dimorphic SNPs and 11% of genes with ten or more dimorphic SNPs. Several GO sub-groups had more genes than expected with 1–9 dimorphic SNP sites and ≥10 dimorphic SNP sites, for example genes with molecular transducer activity, nucleic acid binding transcription factor activity and membrane association (Fig. 3a, 3b and 3c). There were also sub-groups of genes where dimorphic SNP sites were under-represented, including structural molecule activity, developmental process and immune system process (Table 3 and Fig. 3). We used topGO to test for statistically significant enrichment of dimorphic genes in GO term groups (Table 4). In the first instance all genes with at least one dimorphic SNP were analysed and there was significant enrichment, particularly in the ion binding, helicase activity and tRNA metabolic process GO term groups (Table 4). Genes with ≥10 dimorphic SNPs were significantly enriched in the nucleic acid binding transcription factor activity and kinase activity GO term groups (Table 4).
Table 3

Summary of P. knowlesi gene ontology (GO) analysis and the proportion of genes in each group with dimorphic SNPs.

Gene ontology (GO) group | GO-ID | GO subgroup | Total number of genes | Total genes in dimorphism | Proportion in dimorphism
Molecular function   | GO:0060089 | Molecular transducer activity                      | 5    | 3   | 0.60
                     | GO:0000988 | Protein binding transcription factor activity      | 8    | 4   | 0.50
                     | GO:0001071 | Nucleic acid binding transcription factor activity | 31   | 23  | 0.74
                     | GO:0030234 | Enzyme regulator activity                          | 41   | 24  | 0.59
                     | GO:0005215 | Transporter activity                               | 101  | 63  | 0.62
                     | GO:0005198 | Structural molecule activity                       | 151  | 67  | 0.44
                     | GO:0003824 | Catalytic activity                                 | 999  | 653 | 0.65
                     | GO:0005488 | Binding                                            | 1031 | 664 | 0.64
Cellular Processes   | GO:0031012 | Extracellular matrix                               | 1    | 0   | 0.00
                     | GO:0005576 | Extracellular region                               | 6    | 3   | 0.50
                     | GO:0031974 | Membrane-enclosed lumen                            | 55   | 34  | 0.62
                     | GO:0016020 | Membrane                                           | 15   | 12  | 0.80
                     | GO:0005623 | Cell                                               | 1054 | 573 | 0.54
                     | GO:0043226 | Organelle                                          | 813  | 423 | 0.52
                     | GO:0032991 | Macromolecular complex                             | 447  | 229 | 0.51
Biological processes | GO:0051704 | Multi-organism process                             | 5    | 4   | 0.80
                     | GO:0002376 | Immune system process                              | 3    | 1   | 0.33
                     | GO:0022610 | Biological adhesion                                | 4    | 4   | 1.00
                     | GO:0032502 | Developmental process                              | 8    | 2   | 0.25
                     | GO:0040011 | Locomotion                                         | 10   | 5   | 0.50
                     | GO:0000003 | Reproduction                                       | 7    | 3   | 0.43
                     | GO:0023052 | Signaling                                          | 72   | 40  | 0.56
                     | GO:0065007 | Biological regulation                              | 109  | 59  | 0.54
                     | GO:0071840 | Cellular component organization or biogenesis      | 184  | 99  | 0.54
                     | GO:0050896 | Response to stimulus                               | 157  | 93  | 0.59
                     | GO:0051179 | Localization                                       | 255  | 155 | 0.61
                     | GO:0044699 | Single-organism process                            | 264  | 149 | 0.56
                     | GO:0009987 | Cellular process                                   | 1215 | 721 | 0.59
                     | GO:0008152 | Metabolic process                                  | 1211 | 716 | 0.59

Gene Ontology assigned using Blast2GO—Software for Biologists, http://www.blast2go.com.

Fig 3

P. knowlesi genes are grouped by gene ontology (GO) terms.

The percent of genes in each GO sub-group of a) molecular function, b) cellular processes and c) biological processes are shown, n = the total number of mapped annotated genes in each GO sub-group. Percent of genes in GO subgroups with: no dimorphic SNP sites shown in brown, genes with 1–9 dimorphic SNP sites turquoise and genes with ≥10 dimorphic SNP sites purple. The expected percent of genes with 1–9 dimorphic SNPs (50%) is marked with a turquoise hatched line and the expected percent of genes with ≥10 dimorphic SNPs (11%) is marked with a purple hatched line. Gene ontology was assigned using Blast2GO—Software for Biologists, http://www.blast2go.com.

Table 4

topGO gene enrichment analysis.

Gene Ontology (Blast2GO) | topGO term functional description | topGO ID | p (≥1 dimorphic SNPs; ≥10 dimorphic SNPs)
Molecular Function
Binding | Ion binding | GO:0043167 | 0.000077; 0.01011
Catalytic activity | Helicase activity | GO:0004386 | 0.000095
Catalytic activity | ATPase activity | GO:0016887 | 0.0014; 0.02964
Catalytic activity | Hydrolase activity, acting on glycosyl bonds | GO:0016798 | 0.0328
Catalytic activity | DNA binding | GO:0003677 | 0.0121
Catalytic activity | GTPase activity | GO:0003924 | 0.0421
Nucleic acid binding transcription factor activity | Nucleic acid binding transcription factor activity | GO:0001071 | 0.000037
Catalytic activity | Kinase activity | GO:0016301 | 0.00067
Cellular Component
Cell; Organelle | Microtubule organizing center | GO:0005815 | 0.021
Cell | Plasma membrane | GO:0005886 | 0.037
Cell; Organelle | Nucleus | GO:0005634 | 0.0029
Cell; Organelle | Cytoskeleton | GO:0005856 | 0.0402
Biological Process
Metabolic process; Cellular process | tRNA metabolic process | GO:0006399 | 0.00033
Metabolic process; Cellular process | Cellular amino acid metabolic process | GO:0006520 | 0.00357
Single-organism process; Metabolic process; Cellular process | Mitosis | GO:0007067 | 0.00849
Metabolic process; Cellular process | Nucleobase-containing compound catabolic process | GO:0034655 | 0.01453
Multi-organism process | Transport | GO:0006810 | 0.03512
Metabolic process; Cellular process | Cellular protein modification process | GO:0006464 | 0.03678
Metabolic process; Cellular process | Cellular nitrogen compound metabolic process | GO:0034641 | 0.028
Gene Ontology assigned using Blast2GO—Software for Biologists, http://www.blast2go.com.
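
As an illustration of the kind of test that underlies a GO enrichment p value, the following is a minimal sketch of a one-sided hypergeometric (Fisher's exact) test in Python. It is not the authors' topGO run: the topGO algorithm and settings are not stated in this section, the function name `enrichment_p` is hypothetical, and the counts in the usage lines are invented for illustration. Only the total of 4623 analysed genes and the expected 11% of genes with ≥10 dimorphic SNPs come from the text.

```python
from math import comb


def enrichment_p(total_genes, dimorphic_genes, term_genes, term_dimorphic):
    """One-sided hypergeometric p value: P(X >= term_dimorphic).

    total_genes      -- all genes in the analysis
    dimorphic_genes  -- genes classed as dimorphic (e.g. >=10 dimorphic SNPs)
    term_genes       -- genes annotated with the GO term
    term_dimorphic   -- genes with the GO term that are also dimorphic
    """
    upper = min(term_genes, dimorphic_genes)
    denom = comb(total_genes, term_genes)
    # Sum the probability of observing term_dimorphic or more dimorphic genes
    # when term_genes genes are drawn at random from total_genes.
    tail = sum(
        comb(dimorphic_genes, k)
        * comb(total_genes - dimorphic_genes, term_genes - k)
        for k in range(term_dimorphic, upper + 1)
    )
    return tail / denom


# Hypothetical GO term: 40 annotated genes, 12 of them dimorphic,
# against a background of 4623 genes with about 11% (508) dimorphic.
p = enrichment_p(4623, 508, 40, 12)
print(f"p = {p:.3g}")
```

A small p value here would indicate that the GO term contains more dimorphic genes than expected from the genome-wide proportion, which is the comparison the hatched expectation lines in the figure make visually.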


Discussion

Here we describe a method for enriching Plasmodium DNA from frozen whole blood samples collected from patients with malaria. The method required at least 200 µl of whole blood at >40,000 parasites/µl to obtain sufficient parasite DNA for genome sequencing platforms. Parasite DNA recovery was inconsistent and human DNA contamination was the main problem. Nonetheless, seven of fifteen patient samples had sufficiently enriched P. knowlesi DNA to produce high quality genome sequences using Illumina sequencing platforms. The success may in part be because P. knowlesi is less AT rich (62%) than other Plasmodium genomes [12], perhaps reducing amplification bias. Combining the frozen sample filtration method described here with the methylated DNA digestion and target-enriched sequencing approaches described by others [16,32] may yield valuable Plasmodium genome data from many precious pre-existing frozen clinical sample collections.

In a previous study we identified a sequence dimorphism in a fragment (885 bp) of the P. knowlesi normocyte binding protein (Pknbp)xa that codes for a protein involved in red blood cell invasion [11]. Pknbpxa dimorphic cluster 2 contained alleles associated with markers of disease severity, implying that dimorphic cluster 2 may contain more virulent parasites than cluster 1. Our genome data revealed that the dimorphism extended along the full-length (>8000 bp) Pknbpxa coding region, along chromosome 14 and beyond. SNPs co-associating with the Pknbpxa dimorphism were distributed genome-wide across all chromosomes. Interestingly, even within the limitation that only six samples were sequenced, the dimorphism comprised numerous non-synonymous substitutions, suggesting, for the first time, that there may be at least two distinct types of P. knowlesi circulating in Sarawak, Malaysian Borneo, and that some may be more virulent than others.

Dimorphic loci have been described in many Plasmodium species, particularly in merozoite surface antigens and invasion ligands of P. falciparum and P. vivax [33,34,35]. In P. ovale, dimorphic characteristics at selected loci prompted the division of P. ovale into two sub-species [35]. Even so, the evolution and maintenance of allelic dimorphisms in Plasmodium species is difficult to explain [34]. Here we demonstrate a genome-wide dimorphism, involving more than half of the genes in the P. knowlesi genome, including genes coding for functions that range from exposed parasite surfaces to protected internal sites. The sub-division of P. knowlesi into distinct types will require further sequence confirmation, yet the genome-wide nature of the dimorphism is striking. Although there was significant enrichment of dimorphic genes in several GO functional groups, it is not clear what is driving a genome-wide dimorphism in P. knowlesi.

Interestingly, twelve genes implicated in parasite lifecycle stage-specific transcription, the putative transcription factors with Apicomplexan Apetala2 (AP2) domains [36,37,38,39], were dimorphic. Variation at these loci may mark genetically distinct lifecycle characteristics, isolating P. knowlesi into strains or subspecies. In addition, all nine members of the ABC, ABC C transporter protein family of genes, annotated in the P. knowlesi genome, were dimorphic [12,40]. These genes are found in all phyla and represent an ancient gene family that, in eukaryotes, expels a wide range of unwanted substrates [41]. This family of genes includes P. knowlesi PkMDR2 and PkMRP1, which were both polymorphic and dimorphic, implying selection pressure at these loci. PkMDR2 and PkMRP1 are orthologues of P. falciparum PfMDR2 and PfMRP1, genes that carry genetic markers of drug resistance, including resistance to mefloquine [40,42]. Tantalizingly, experimental lines of P. knowlesi were found to be innately resistant to mefloquine in Rhesus monkeys, and clinical isolates did not respond well to mefloquine ex vivo [43,44]. Patients with uncomplicated P. knowlesi infections responded to mefloquine, but one patient with severe disease exhibited RIII type resistance [45,46,47].

Selection at these promiscuous transporter loci in zoonotic parasites that, unlike P. falciparum, are not under conventional drug selection pressure may at first seem surprising. However, domestic and wild animals eat plants with bio-active properties: they self-medicate [48]. The jungles of Sarawak are considered un-mined treasure-troves of plant species with medicinal properties that are freely available to the animal species living there, including the macaque reservoir of P. knowlesi [49]. Selection at Plasmodium loci that have evolved to eliminate natural toxins then assumes biological relevance. Unfortunately, these loci also evolve to eliminate antimalarial compounds when used to treat patients with malaria. P. knowlesi is a relatively 'un-tamed' Plasmodium species; therefore P. knowlesi genomes may retain ancient and diverse genetic signatures that are presently invisible in heavily drug-selected, human-host-restricted parasite populations such as P. falciparum and P. vivax.

High throughput pathogen genome sequencing is a powerful new tool for infectious disease research. Here we use Illumina HiSeq and MiSeq platforms to produce high quality P. knowlesi genome sequences from difficult archived frozen samples. Analysis of the sequences uncovered a P. knowlesi genome-wide dimorphism that suggests there are at least two types of P. knowlesi parasites in our patient cohort. We further discovered dimorphic genes among transporter genes that are important in antimalarial drug resistance. Genome-wide pathogen analyses, of even a small number of clinical malaria isolates, instantly added context to our understanding of Plasmodium pathobiology, particularly through between-species comparison.

P. knowlesi genome SNP density map.

Six P. knowlesi genome sequences from patient isolates were mapped to the P. knowlesi reference genome. Sites that differ from the reference are shown as blue bars (all SNP sites) or grey bars (SNP sites co-associating with the P. knowlesi genome-wide dimorphism). Each bar is 1 pixel wide and represents a DNA fragment 809 bases long. The height of each bar represents the number of SNP sites per 809-base fragment. Gaps correspond to regions with low coverage (see results section) or where the reference genome is incomplete (runs of 'N'). (PNG)
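
The bar heights described in this caption amount to counting SNP sites in consecutive 809-base windows. A minimal Python sketch of that binning follows; it is an illustration and not the authors' plotting code. The function name `snp_density` and the example positions are hypothetical, and positions are assumed to be 1-based coordinates on a single chromosome.

```python
from collections import Counter


def snp_density(positions, window=809):
    """Count SNP sites per fixed-width window.

    positions -- iterable of 1-based SNP coordinates on one chromosome
    window    -- window width in bases (809 in the figure caption)
    Returns a dict mapping 0-based window index to SNP count; windows
    without SNPs are omitted.
    """
    counts = Counter((pos - 1) // window for pos in positions)
    return dict(sorted(counts.items()))


# Hypothetical SNP positions, for illustration only.
example = [1, 809, 810, 1700, 2500]
print(snp_density(example))
```

Windows absent from the result would still need to be distinguished from the low-coverage and 'N' gaps mentioned in the caption, which this sketch does not attempt.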

Distribution of co-associating SNPs by chromosome in six P. knowlesi genome sequences from human isolates.

Each chromosome was divided into 30 equal parts. (PDF)
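
Dividing a chromosome into a fixed number of equal parts, rather than fixed-width windows, can be sketched as below. This is an assumed reading of the caption, not the authors' script: the function name `part_counts`, the chromosome length and the positions are hypothetical, and positions are taken to be 1-based.

```python
def part_counts(positions, chrom_length, parts=30):
    """Count SNP sites in each of `parts` equal segments of a chromosome.

    positions    -- iterable of 1-based SNP coordinates
    chrom_length -- chromosome length in bases
    parts        -- number of equal segments (30 in the caption)
    """
    counts = [0] * parts
    for pos in positions:
        # Scale the coordinate to a segment index; the min() guards the
        # last base against falling outside the final segment.
        idx = min((pos - 1) * parts // chrom_length, parts - 1)
        counts[idx] += 1
    return counts


# Hypothetical 3000-base chromosome, so each part spans 100 bases.
counts = part_counts([1, 100, 101, 3000], chrom_length=3000)
print(counts[0], counts[1], counts[29])
```

Because every chromosome yields the same number of segments, counts from chromosomes of different lengths can be laid side by side, at the cost of segments covering different numbers of bases.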

List of annotated genes from six clinical isolates with >9 SNPs that co-associate with the Plasmodium knowlesi genome-wide dimorphism.

(PDF)
  48 in total

1.  Identification of a transcription factor in the mosquito-invasive stage of malaria parasites.

Authors:  Masao Yuda; Shiroh Iwanaga; Shuji Shigenobu; Gunnar R Mair; Chris J Janse; Andrew P Waters; Tomomi Kato; Izumi Kaneko
Journal:  Mol Microbiol       Date:  2009-02-10       Impact factor: 3.501

2.  Two nonrecombining sympatric forms of the human malaria parasite Plasmodium ovale occur globally.

Authors:  Colin J Sutherland; Naowarat Tanomsing; Debbie Nolder; Mary Oguike; Charlie Jennison; Sasithon Pukrittayakamee; Christiane Dolecek; Tran Tinh Hien; Virgilio E do Rosário; Ana Paula Arez; João Pinto; Pascal Michon; Ananias A Escalante; Francois Nosten; Martina Burke; Rogan Lee; Marie Blaze; Thomas Dan Otto; John W Barnwell; Arnab Pain; John Williams; Nicholas J White; Nicholas P J Day; Georges Snounou; Peter J Lockhart; Peter L Chiodini; Mallika Imwong; Spencer D Polley
Journal:  J Infect Dis       Date:  2010-05-15       Impact factor: 5.226

Review 3.  The ABCs of multidrug resistance in malaria.

Authors:  Jan B Koenderink; Reginald A Kavishe; Sanna R Rijpma; Frans G M Russel
Journal:  Trends Parasitol       Date:  2010-06-11

4.  BEDTools: a flexible suite of utilities for comparing genomic features.

Authors:  Aaron R Quinlan; Ira M Hall
Journal:  Bioinformatics       Date:  2010-01-28       Impact factor: 6.937

5.  Severe Plasmodium knowlesi malaria in a tertiary care hospital, Sabah, Malaysia.

Authors:  Timothy William; Jayaram Menon; Giri Rajahram; Leslie Chan; Gordon Ma; Samantha Donaldson; Serena Khoo; Charlie Frederick; Jenarun Jelip; Nicholas M Anstey; Tsin Wen Yeo
Journal:  Emerg Infect Dis       Date:  2011-07       Impact factor: 6.883

6.  A TaqMan real-time PCR assay for the detection and quantitation of Plasmodium knowlesi.

Authors:  Paul C S Divis; Sandra E Shokoples; Balbir Singh; Stephanie K Yanow
Journal:  Malar J       Date:  2010-11-30       Impact factor: 2.979

7.  The European Nucleotide Archive.

Authors:  Rasko Leinonen; Ruth Akhtar; Ewan Birney; Lawrence Bower; Ana Cerdeno-Tárraga; Ying Cheng; Iain Cleland; Nadeem Faruque; Neil Goodgame; Richard Gibson; Gemma Hoad; Mikyung Jang; Nima Pakseresht; Sheila Plaister; Rajesh Radhakrishnan; Kethi Reddy; Siamak Sobhany; Petra Ten Hoopen; Robert Vaughan; Vadim Zalunin; Guy Cochrane
Journal:  Nucleic Acids Res       Date:  2010-10-23       Impact factor: 16.971

8.  InterProScan 5: genome-scale protein function classification.

Authors:  Philip Jones; David Binns; Hsin-Yu Chang; Matthew Fraser; Weizhong Li; Craig McAnulla; Hamish McWilliam; John Maslen; Alex Mitchell; Gift Nuka; Sebastien Pesseat; Antony F Quinn; Amaia Sangrador-Vegas; Maxim Scheremetjew; Siew-Yit Yong; Rodrigo Lopez; Sarah Hunter
Journal:  Bioinformatics       Date:  2014-01-21       Impact factor: 6.937

9.  Susceptibility of human Plasmodium knowlesi infections to anti-malarials.

Authors:  Farrah A Fatih; Henry M Staines; Angela Siner; Mohammed Atique Ahmed; Lu Chan Woon; Erica M Pasini; Clemens Hm Kocken; Balbir Singh; Janet Cox-Singh; Sanjeev Krishna
Journal:  Malar J       Date:  2013-11-19       Impact factor: 2.979

10.  A cascade of DNA-binding proteins for sexual commitment and development in Plasmodium.

Authors:  Abhinav Sinha; Katie R Hughes; Katarzyna K Modrzynska; Thomas D Otto; Claudia Pfander; Nicholas J Dickens; Agnieszka A Religa; Ellen Bushell; Anne L Graham; Rachael Cameron; Bjorn F C Kafsack; April E Williams; Manuel Llinas; Matthew Berriman; Oliver Billker; Andrew P Waters
Journal:  Nature       Date:  2014-02-23       Impact factor: 49.962
