Ernest Diez Benavente1, Ana Rita Gomes1,2, Jeremy Ryan De Silva3, Matthew Grigg4, Harriet Walker1, Bridget E Barber4,5, Timothy William5,6,7, Tsin Wen Yeo4, Paola Florez de Sessions8, Abhinay Ramaprasad9, Amy Ibrahim1, James Charleston1, Martin L Hibberd1,8, Arnab Pain9, Robert W Moon1, Sarah Auburn4, Lau Yee Ling3, Nicholas M Anstey4, Taane G Clark10,11, Susana Campino12.
Abstract
The zoonotic Plasmodium knowlesi parasite is the most common cause of human malaria in Malaysia. Genetic analysis has shown that the parasites are divided into three subpopulations according to their geographic origin (Peninsular or Borneo) and, in Borneo, their macaque host (Macaca fascicularis or M. nemestrina). Whilst evidence suggests that genetic exchange events have occurred between the two Borneo subpopulations, the picture is unclear in the less-studied Peninsular strains. One difficulty is that P. knowlesi-infected individuals tend to present with low parasitaemia, leading to samples with insufficient DNA for whole genome sequencing. Here, using a parasite-selective whole genome amplification approach on unprocessed blood samples, we were able to analyse recent genomes sourced from both Peninsular Malaysia and Borneo. The analysis provides evidence that recombination events are present in the Peninsular Malaysia parasite subpopulation, which has acquired fragments of the M. nemestrina-associated subpopulation genotype, including the DBPβ and NBPXa erythrocyte invasion genes. The NBPXb invasion gene has also been exchanged within the macaque host-associated subpopulations of Malaysian Borneo. Our work provides strong evidence that exchange events are far more ubiquitous than expected and should be taken into consideration when studying the highly complex P. knowlesi population structure.
Year: 2019 PMID: 31285495 PMCID: PMC6614422 DOI: 10.1038/s41598-019-46398-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Comparison of whole genome sequencing before and after parasite enrichment using SWGA.
| Sample ID | Parasitemia p/µl* (%) | Sample type | Reads aligned to | % genome with coverage >5-fold | % genes with coverage >5-fold | % intergenic regions with coverage >5-fold | Mean coverage | Total N SNPs |
|---|---|---|---|---|---|---|---|---|
| 1 | 320 (0.006%) | No SWGA | 2.45 | 0.64 | 0.80 | 0.52 | 1.71 | 1,797 |
| | | SWGA | 12.11 | 19.08 | 23.36 | 16.51 | 11.94 | 59,031 |
| 2 | 539 (0.01%) | No SWGA | 0.81 | 0.04 | 0.03 | 0.04 | 1.28 | 15 |
| | | SWGA | 3.66 | 6.49 | 7.54 | 5.97 | 4.38 | 14,746 |
| 3 | 851 (0.017%) | No SWGA | 3.15 | 3.88 | 4.69 | 3.34 | 2.25 | 11,974 |
| | | SWGA | 27.95 | 43.32 | 53.40 | 37.14 | 17.52 | 143,483 |
| 4 | 1581 (0.03%) | No SWGA | 1.14 | 0.08 | 0.08 | 0.09 | 1.34 | 127 |
| | | SWGA | 20.37 | 11.79 | 14.59 | 10.05 | 8.96 | 34,314 |
| 5 | 3554 (0.07%) | No SWGA | 1.22 | 0.32 | 0.35 | 0.30 | 1.58 | 628 |
| | | SWGA | 6.31 | 28.01 | 35.21 | 23.32 | 6.51 | 79,139 |
| 6 | 5300 (0.1%) | No SWGA | 2.31 | 3.38 | 4.34 | 2.66 | 2.18 | 10,479 |
| | | SWGA | 17.10 | 48.26 | 59.66 | 41.19 | 15.71 | 159,652 |
| 7 | 5875 (0.11%) | No SWGA | 1.87 | 0.28 | 0.31 | 0.26 | 1.55 | 609 |
| | | SWGA | 20.26 | 54.74 | 66.95 | 47.41 | 18.72 | 179,304 |
| 8 | 10634 (0.2%) | No SWGA | 2.34 | 2.67 | 3.55 | 2.00 | 2.10 | 8,147 |
| | | SWGA | 22.33 | 60.50 | 73.96 | 52.40 | 23.61 | 208,202 |
| 9 | ND | No SWGA | 10.59 | 2.51 | 2.58 | 2.56 | 2.14 | 2,197 |
| | | SWGA | 44.54 | 32.03 | 38.01 | 28.87 | 8.97 | 119,157 |
| 10 | 26368 (0.5%) | No SWGA | 11.01 | 31.36 | 36.22 | 28.77 | 4.05 | 104,805 |
| | | SWGA | 42.21 | 48.26 | 59.89 | 40.99 | 18.04 | 162,920 |
*These results are from single runs, not pooled samples; on average, a total of 2 billion bp (human and parasite) was sequenced per sample.
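The effect of SWGA in the table can be summarised as a per-sample fold change in mean coverage. The following is a minimal sketch, not part of the original analysis: the mean-coverage values are copied from the table, while the names `MEAN_COVERAGE` and `fold_change` are illustrative assumptions.

```python
# Mean coverage per sample, copied from the table above:
# (sample ID, mean coverage without SWGA, mean coverage with SWGA)
MEAN_COVERAGE = [
    (1, 1.71, 11.94),
    (2, 1.28, 4.38),
    (3, 2.25, 17.52),
    (4, 1.34, 8.96),
    (5, 1.58, 6.51),
    (6, 2.18, 15.71),
    (7, 1.55, 18.72),
    (8, 2.10, 23.61),
    (9, 2.14, 8.97),
    (10, 4.05, 18.04),
]


def fold_change(before, after):
    """Ratio of mean coverage after SWGA to mean coverage before SWGA."""
    return after / before


# Hypothetical summary structure: sample ID -> fold change in mean coverage.
fold_changes = {sid: fold_change(b, a) for sid, b, a in MEAN_COVERAGE}

for sid, fc in fold_changes.items():
    print(f"sample {sid}: {fc:.1f}-fold increase in mean coverage")
```

The same ratio could be computed for any other column of the table, for example the percentage of the genome covered more than 5-fold.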
Figure 1. Correlation of parasitaemia (%) and genome coverage (>5 reads) in amplified DNA samples. Parasitaemia data were available for 13 amplified samples used in this study. Coverage increases with parasitaemia (R-squared = 0.6).
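An R-squared value of this kind can be obtained from a simple linear regression of coverage on parasitaemia. The sketch below is illustrative only: it uses the nine SWGA samples with known parasitaemia from the table above rather than the 13 samples in the figure, and it assumes untransformed axes, so it is not expected to reproduce the reported value of 0.6. The function name `r_squared` is an assumption.

```python
def r_squared(xs, ys):
    """R-squared of a simple linear regression of ys on xs.

    With one predictor and an intercept, this equals the squared
    Pearson correlation coefficient.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


# Parasitaemia (%) and % genome with coverage >5-fold after SWGA,
# copied from the table (sample 9 is omitted because parasitaemia is ND).
parasitaemia = [0.006, 0.01, 0.017, 0.03, 0.07, 0.1, 0.11, 0.2, 0.5]
coverage = [19.08, 6.49, 43.32, 11.79, 28.01, 48.26, 54.74, 60.50, 48.26]

r2 = r_squared(parasitaemia, coverage)
print(f"R-squared on the table subset: {r2:.2f}")
```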
Figure 2. Neighbour-joining tree for 103 P. knowlesi isolates shows three main clusters. The tree shows the expected split into three clusters associated with: (i) Peninsular Malaysia (purple tips), (ii) Malaysian Borneo Macaca nemestrina (Mn-Pk, green) and (iii) Malaysian Borneo M. fascicularis (Mf-Pk, blue). The tree also shows the correct positioning of the 4 newly generated Peninsular isolates (in red) within the Peninsular cluster, and the clustering of the 16 new Malaysian Borneo isolates from Sabah (in orange) within either the Mf-Pk or the Mn-Pk associated cluster. Bootstrapping was performed (n = 100) and all the nodes that split the relevant subpopulations had a value greater than 90.
Figure 3. P. knowlesi isolates with the highest genomic coverage from Peninsular Malaysia (P137 and P050) present with genetic exchange events from the Mn-Pk subpopulation. Peninsular isolate P137 was compared to DIM5 (top row of panels) as a representative of the Mn-Pk cluster, and to DIM6 (second row of panels) as a representative of the Mf-Pk cluster; these isolates were selected based on sequencing quality and completeness, and an absence of evidence of multiplicity of infection. Isolate P050 was compared to the same isolates in the bottom 4 panels. In the top-left panel, each green dot represents a 50 kbp section of the DIM5 genome. Its position on the X-axis is defined by the average SNP π obtained by comparing its sequence in a pairwise manner to the same syntenic 50 kbp fragment in each isolate in the Mn-Pk cluster; its position on the Y-axis is defined by the average SNP π obtained by comparing it to the same fragment in the Peninsular isolates. This average SNP π defines the similarity of each dot to the different clusters. The top-right panel shows the same data as the top-left panel, with the P137 50 kbp fragments highlighted in purple for clarity. The same analysis was conducted in the second row of panels, but using an Mf-Pk cluster isolate and the average SNP π to Mf-Pk as the X-axis. The dashed line represents the linear regression for the coloured dots in each plot. The regions of interest (in light green in the right panels) were identified by finding the fragments of the Peninsular isolate genomes that presented low similarity to the Peninsular cluster and high similarity to either the Mn-Pk (green) or Mf-Pk (blue) cluster; these fragments have the highest residuals from the regression. After further filtering through the assessment of individual genes, we report the set of results in Supplementary Table 2.
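The window-based comparison described in this caption can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: π is taken here as the proportion of differing sites between two aligned window sequences, missing calls are coded as `N`, and all function names and the toy values are hypothetical.

```python
def pairwise_pi(seq_a, seq_b):
    """Proportion of differing sites between two aligned window sequences.

    Sites where either sequence has a missing call ('N') are skipped.
    """
    compared = 0
    diffs = 0
    for a, b in zip(seq_a, seq_b):
        if a == "N" or b == "N":
            continue
        compared += 1
        diffs += a != b
    return diffs / compared if compared else float("nan")


def mean_pi_to_cluster(window_seq, cluster_seqs):
    """Average pairwise pi between one window and the same window in each cluster isolate."""
    values = [pairwise_pi(window_seq, s) for s in cluster_seqs]
    values = [v for v in values if v == v]  # drop NaN (no comparable sites)
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def top_residual_windows(xs, ys, k):
    """Indices of the k windows with the largest positive regression residuals."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    order = sorted(range(len(residuals)), key=lambda i: residuals[i], reverse=True)
    return order[:k]


# Toy values (made up for illustration): one entry per 50 kbp window of a
# Peninsular isolate. x = mean pi to the Mn-Pk cluster, y = mean pi to the
# Peninsular cluster. The last window is close to Mn-Pk and far from Peninsular.
pi_to_mn = [0.004, 0.006, 0.008, 0.010, 0.012, 0.005]
pi_to_peninsular = [0.002, 0.003, 0.004, 0.005, 0.006, 0.009]

candidates = top_residual_windows(pi_to_mn, pi_to_peninsular, k=1)
print("candidate exchange windows:", candidates)
```

In this sketch a large positive residual marks a window that is more divergent from the Peninsular cluster than the regression predicts from its distance to the comparison cluster, which matches the selection criterion in the caption; the follow-up gene-level filtering is not modelled.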
Figure 4. Haplotype plots and neighbour-joining trees for three Duffy Binding Protein invasion genes ((A) DBPα, (B) DBPβ, (C) DBPγ) provide insights into the population dynamics of the different haplotypes. Only isolates with at least 30-fold coverage across the gene were used in each plot: 81 isolates for DBPα, 89 isolates for DBPβ and 70 isolates for DBPγ. A strong genetic divergence of the sequences from the different clusters was found for each of the 3 genes, and the Peninsular cluster had the highest diversity across all 3 genes (A–C). Red stars indicate isolates whose subpopulation clustering differs from their whole genome clustering, suggesting a genetic exchange. Bootstrapping was performed (n = 100) and all the nodes that split the relevant subpopulations and/or exchange events had a value greater than 84.
Figure 5. Haplotype plots and neighbour-joining trees for two Normocyte Binding Protein invasion genes ((A) NBPXa, (B) NBPXb) provide insights into the population dynamics of the gene haplotypes. Only isolates with at least 30-fold coverage across the gene were used in each plot: 88 isolates for NBPXa and 90 isolates for NBPXb. Red stars indicate isolates whose subpopulation clustering differs from their whole genome clustering, suggesting a genetic exchange. For the NBPXa gene (A), evidence of strong genetic divergence of the sequences from the different clusters was found. However, the NBPXb gene (B, right) presented a fairly distinct pattern of diversity: the clusters are separated by small genetic distances, making the separation between them less obvious. Bootstrapping was performed (n = 100) and all the nodes that split the relevant subpopulations and/or exchange events had a value greater than 82.