Roberto Amato1,2, Sarah Auburn3, Richard D Pearson1,2, Olivo Miotto1,2,4, Jacob Almagro-Garcia2, Chanaki Amaratunga5, Seila Suon6, Sivanna Mao7, Rintis Noviyanti8, Hidayat Trimarsanto8, Jutta Marfurt3, Nicholas M Anstey3, Timothy William9, Maciej F Boni10, Christiane Dolecek10, Tinh Tran Hien10, Nicholas J White4, Pascal Michon11,12, Peter Siba11, Livingstone Tavul11, Gabrielle Harrison13,14, Alyssa Barry13,14, Ivo Mueller13,14, Marcelo U Ferreira15, Nadira Karunaweera16, Milijaona Randrianarivelojosia17, Qi Gao18, Christina Hubbart2, Lee Hart2, Ben Jeffery2, Eleanor Drury1, Daniel Mead1, Mihir Kekre1, Susana Campino1, Magnus Manske1, Victoria J Cornelius1,2, Bronwyn MacInnis1, Kirk A Rockett1,2, Alistair Miles1,2, Julian C Rayner1, Rick M Fairhurst5, Francois Nosten4,19, Ric N Price3,20, Dominic P Kwiatkowski1,2.
Abstract
The widespread distribution and relapsing nature of Plasmodium vivax infection present major challenges for the elimination of malaria. To characterize the genetic diversity of this parasite in individual infections and across the population, we performed deep genome sequencing of >200 clinical samples collected across the Asia-Pacific region and analyzed data on >300,000 SNPs and nine regions of the genome with large copy number variations. Individual infections showed complex patterns of genetic structure, with variation not only in the number of dominant clones but also in their level of relatedness and inbreeding. At the population level, we observed strong signals of recent evolutionary selection both in known drug resistance genes and at new loci, and these varied markedly between geographical locations. These findings demonstrate a dynamic landscape of local evolutionary adaptation in the parasite population and provide a foundation for genomic surveillance to guide effective strategies for control and elimination of P. vivax.
Year: 2016 PMID: 27348299 PMCID: PMC4966634 DOI: 10.1038/ng.3599
Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330
Figure 1. Defining the accessible genome
When short-read sequencing data from clinical samples of Plasmodium vivax are aligned to the 14 chromosomes comprising the Sal1 reference genome, there is low coverage and mapping quality in subtelomeric hypervariable regions (red) and three internal hypervariable regions (orange). Excluding these regions, we defined a core genome (white) which comprises 94.4% of the chromosomal sequence; coordinates are given in Supplementary Table 2. Aggregated across all samples, 99% of nucleotide positions in the core genome are alignable (≤10% of reads with mapping quality 0) compared to 86% in subtelomeric and 85% in internal hypervariable regions; and 94% of positions in the core genome have ≥5× read depth compared to 37% in subtelomeric and 54% in internal hypervariable regions. When genome assemblies for other P. vivax strains¹⁵ were aligned to the Sal1 reference genome, the genome-wide coverage was 88.5% for India VII, 89.1% for Mauritania I and 89.6% for North Korea strains, whereas the coverage across the core genome was 98.5%, 98.7% and 99.0% respectively.
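The two per-position criteria in this caption (alignable: at most 10% of reads with mapping quality 0; covered: read depth of at least 5×) can be illustrated with a minimal Python sketch. The function name `summarise_region` and its inputs are hypothetical, and the sketch assumes the per-position values have already been aggregated across samples; the caption does not state how that aggregation was done.

```python
def summarise_region(mq0_fraction, depth, max_mq0=0.10, min_depth=5):
    """Return (fraction alignable, fraction adequately covered) for one region.

    mq0_fraction: per-position fraction of reads with mapping quality 0
    depth:        per-position read depth
    Both inputs are assumed to be pre-aggregated across samples.
    """
    if len(mq0_fraction) != len(depth):
        raise ValueError("inputs must have the same length")
    n = len(depth)
    if n == 0:
        raise ValueError("region is empty")
    # A position is alignable if at most max_mq0 of its reads have MAPQ 0.
    alignable = sum(1 for f in mq0_fraction if f <= max_mq0)
    # A position is covered if its depth reaches min_depth.
    covered = sum(1 for d in depth if d >= min_depth)
    return alignable / n, covered / n


if __name__ == "__main__":
    # Toy values for four positions, not data from the study.
    print(summarise_region([0.0, 0.05, 0.10, 0.5], [12, 4, 5, 0]))
```

Applying such a function separately to core, subtelomeric and internal hypervariable coordinates would give figures comparable in form to the percentages quoted in the caption.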
Gene categories enriched for high N/S ratio, nucleotide diversity, and Tajima’s D.
Each metric is represented by its median and P value by Mann-Whitney test, comparing genes in a given category versus all others, with bold font indicating significant values (P<0.05 after Bonferroni correction). Rows are ordered by π. N/S=non-synonymous/synonymous ratio. π=nucleotide diversity per base. D=Tajima’s D. No Pf/Py ortholog=genes that lack a known ortholog in P. falciparum/P. yoelii. TM domain=genes containing a transmembrane domain. Max schizont=maximum expression during the intraerythrocytic cycle was in late schizont stage⁴². Max sporozoite/zygote/ookinete=maximum expression in the sporozoite/zygote/ookinete⁴³. These estimates are based on high-quality SNPs in genes with ≥10 SNPs in the subset of 148 samples used for detailed population comparisons as described in Methods. Estimates for individual genes, including all SNPs or restricted to high-quality SNPs, are given in Supplementary Dataset 1.
| Comparison | Genes | N/S | P (N/S) | π | P (π) | D | P (D) |
|---|---|---|---|---|---|---|---|
| No Pf ortholog | 97 | 2.23 | | 7.3×10⁻⁴ | | -1.86 | |
| No Py ortholog | 251 | 1.86 | | 6.7×10⁻⁴ | | -1.92 | |
| Max schizont | 844 | 1.60 | | 6.1×10⁻⁴ | | -2.04 | |
| Max sporozoite | 422 | 1.43 | 3.6×10⁻¹ | 6.0×10⁻⁴ | 3.2×10⁻² | -2.03 | 6.9×10⁻² |
| Signal peptide | 569 | 1.46 | 6.5×10⁻² | 6.0×10⁻⁴ | | -1.95 | |
| TM domain | 646 | 1.50 | 1.9×10⁻² | 5.9×10⁻⁴ | | -1.98 | |
| Max ookinete | 230 | 1.40 | 6.1×10⁻¹ | 5.8×10⁻⁴ | 2.0×10⁻¹ | -2.08 | 8.6×10⁻¹ |
| Has paralog | 206 | 1.38 | 3.4×10⁻¹ | 5.7×10⁻⁴ | 6.4×10⁻² | -2.01 | 5.8×10⁻³ |
| Max zygote | 339 | 1.35 | 2.6×10⁻² | 5.4×10⁻⁴ | 7.4×10⁻¹ | -2.10 | 8.9×10⁻¹ |
| All genes | 3062 | 1.43 | | 5.5×10⁻⁴ | | -2.07 | |
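Two of the tabulated metrics, nucleotide diversity per base (π) and Tajima's D, can be illustrated with a short Python sketch. It uses the standard textbook estimators on biallelic haplotypes coded as strings of `0` and `1`; the function name and input encoding are hypothetical, and this is not claimed to be the authors' pipeline, which is described in their Methods.

```python
import math


def diversity_and_tajimas_d(haplotypes):
    """Return (pi per base, Tajima's D) for equal-length 0/1 haplotype strings."""
    n = len(haplotypes)
    if n < 4:
        # The variance term of D is zero for n < 4, so D is undefined there.
        raise ValueError("need at least 4 haplotypes")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must have equal length")

    pairs = n * (n - 1) / 2
    seg_sites = 0
    mean_pairwise_diff = 0.0
    for site in range(length):
        alt = sum(1 for h in haplotypes if h[site] == "1")
        if 0 < alt < n:  # segregating site
            seg_sites += 1
            mean_pairwise_diff += alt * (n - alt) / pairs

    pi_per_base = mean_pairwise_diff / length
    if seg_sites == 0:
        return pi_per_base, float("nan")

    # Constants from Tajima's (1989) formulation.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    d = (mean_pairwise_diff - seg_sites / a1) / math.sqrt(variance)
    return pi_per_base, d


if __name__ == "__main__":
    # Toy input: three singleton variants among four haplotypes.
    print(diversity_and_tajimas_d(["0000", "1000", "0100", "0010"]))
```

An excess of rare variants, as in the toy input, pulls D below zero, which is the direction of every median D in the table.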
Figure 2. Copy number variation
Common forms of copy number variation in a region of chromosome 8 with a deletion of the first three exons of PVX_094265; in regions of chromosomes 6 and 14 with copy number variations of pvdbp and PVX_101445 respectively; and in a region of chromosome 10 where multiple genes including pvmdr1 are duplicated. The top panel shows an illustrative sample for each genomic region: the upper trace shows GC-normalised coverage with inferred copy number marked by a red line; the lower trace shows the proportion of read pairs mapping in opposing directions, indicating the presumptive breakpoints of a duplication (note that not all samples have identical breakpoints; see Supplementary Dataset 2). The lower panel shows the number of samples in each population having a copy number other than one: western Thailand (WTH, n=88), western Cambodia (WKH, n=19) and Papua Indonesia (PID, n=41).
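One simple way to turn a normalised coverage trace into an integer copy number is to compare the median coverage of a window with a single-copy baseline. The Python sketch below illustrates that idea only; the function name, the use of a median and the rounding rule are assumptions and are not taken from the paper.

```python
from statistics import median


def infer_copy_number(window_coverage, baseline_coverage):
    """Estimate integer copy number for one genomic window.

    window_coverage:   GC-normalised coverage values inside the window
    baseline_coverage: typical coverage of single-copy sequence (assumed known)
    """
    if baseline_coverage <= 0:
        raise ValueError("baseline coverage must be positive")
    ratio = median(window_coverage) / baseline_coverage
    # Round half up; the ratio is never negative, so int() truncation is safe.
    return int(ratio + 0.5)


if __name__ == "__main__":
    # Toy values: a duplicated window and a deleted window.
    print(infer_copy_number([98, 102, 100], 50))
    print(infer_copy_number([3, 0, 2], 50))
```

Because blood-stage parasites are haploid, the expected baseline is one copy, so a result of 0 corresponds to a deletion and a result of 2 or more to an amplification.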
Figure 3. Genetic structure of mixed infections
Panel A shows the distribution of F across all samples. F is analogous to an inbreeding coefficient²⁷, and a value of 1 indicates a perfect clone. Left: Distribution of F in western Thailand (WTH), western Cambodia (WKH) and Papua Indonesia (PID), showing median (thick line) and inter-quartile range (thin line). Middle: Distribution of F stratified by the number of dominant clones in a sample and by whether they are related to each other, showing median (thick line) and inter-quartile range (thin line). Right: Distribution of F (vertical axis) and the proportion of heterozygous genotype calls (horizontal axis) in samples with different numbers of dominant clones.
Each row of panel B shows an illustrative sample. Left: non-reference allele frequency (NRAF) distribution across all heterozygous SNPs. Right: vertical axis is heterozygosity calculated in 20 kb bins with the scale truncated (0–0.03) to highlight runs of homozygosity (RoH). Sample a is near-clonal, as evidenced by F = 1 and a lack of heterozygous SNPs. Samples b-e each contain two dominant clones, as evidenced by the bimodal NRAF distribution. Sample b contains two unrelated clones (no RoH). Sample c contains two partially related clones (RoH across a minority of the genome). Sample d contains two meiotic siblings (RoH extending over ~50% of the genome). Sample e contains two clones that are the product of inbreeding over multiple generations (RoH extending over ~80% of the genome). Sample f appears to contain a complex mixture of related parasites (the relatively flat NRAF distribution indicates multiple dominant clones, but there is substantial RoH).
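The binned heterozygosity shown in panel B can be approximated with a short Python sketch. It assumes that heterozygosity in a bin is the fraction of genotyped SNPs in that bin called heterozygous, and that a bin with zero heterozygosity counts towards a run of homozygosity; both definitions, and the function names, are illustrative assumptions rather than the paper's exact method.

```python
from collections import defaultdict


def binned_heterozygosity(calls, bin_size=20_000):
    """Map each bin start to the fraction of its SNP calls that are heterozygous.

    calls: iterable of (position, is_heterozygous) for one chromosome.
    """
    totals = defaultdict(int)
    hets = defaultdict(int)
    for pos, is_het in calls:
        start = (pos // bin_size) * bin_size
        totals[start] += 1
        if is_het:
            hets[start] += 1
    return {start: hets[start] / totals[start] for start in sorted(totals)}


def homozygous_bin_fraction(bin_het, max_het=0.0):
    """Fraction of bins whose heterozygosity does not exceed max_het."""
    if not bin_het:
        raise ValueError("no bins supplied")
    low = sum(1 for h in bin_het.values() if h <= max_het)
    return low / len(bin_het)


if __name__ == "__main__":
    # Toy calls spanning three 20 kb bins.
    calls = [(1000, False), (15000, False), (25000, True),
             (30000, False), (45000, False)]
    bins = binned_heterozygosity(calls)
    print(bins, homozygous_bin_fraction(bins))
```

Under these assumptions, a homozygous-bin fraction near 0.5 would resemble sample d (meiotic siblings) and a fraction near 0.8 would resemble sample e.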
Figure 4. Parasite population structure
Population structure is evident by principal components analysis (panel A), ADMIXTURE (panel B) and on a neighbour-joining tree (panel C). ADMIXTURE analysis identifies three major components of population structure which correspond to the three largest groups of samples, i.e. western Thailand (n=88), western Cambodia (n=37) and Papua Indonesia (n=55). The neighbour-joining tree shows how these three major components encompass the Southeast Asian and Pacific islands (Malaysia, Papua Indonesia, Papua New Guinea), the western part of mainland Southeast Asia (western Thailand, Myanmar, and China) and the eastern part of the mainland (Cambodia, Vietnam, eastern Thailand, and Laos). Samples from other parts of the world (India, Sri Lanka, Madagascar, and Brazil) are separated from Southeast Asian samples by long branches.
Figure 5. Population-specific signatures of recent positive selection
Metrics of extended haplotype homozygosity were estimated in 88 samples from western Thailand (WTH), 19 from western Cambodia (WKH) and 41 from Papua Indonesia (PID). The strongest evidence for recent selection was identified by XP-EHH (i.e. by comparing populations) and in most cases this was supported by iHS tests within individual populations. The horizontal axis represents genome position, with chromosomes 1-14 shown in alternating colours. The vertical axis shows the results of XP-EHH and iHS tests, represented by −log₁₀ P values on a scale of 0 to 15. The dashed line shows the Bonferroni-corrected threshold for genome-wide significance; red points mark significant P values. Loci with ≥2 SNPs with significant P values within 80 kb of each other are marked by red lines in the tracks labelled ‘Selected regions’. The iHS signal on chromosome 13 in WKH was confined to two adjacent SNPs and is therefore not marked as significant. These signatures are described in more detail in Supplementary Table 6 and Supplementary Figure 7.
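The rule for marking selected regions (a Bonferroni-corrected threshold, then at least two significant SNPs within 80 kb of each other) can be sketched in Python as follows. The function name, the `alpha=0.05` default and the chaining of neighbouring significant SNPs into one region are assumptions. The sketch implements only the stated rule and does not reproduce any further filtering, such as the exclusion of the chromosome 13 signal mentioned in the caption.

```python
def selected_regions(snps, n_tests, alpha=0.05, max_gap=80_000, min_snps=2):
    """Group significant SNPs into candidate regions.

    snps:    iterable of (chromosome, position, p_value)
    n_tests: number of tests used for the Bonferroni correction
    Returns a list of (chromosome, start, end, n_significant_snps).
    """
    cutoff = alpha / n_tests  # Bonferroni-corrected threshold
    significant = sorted((c, pos) for c, pos, p in snps if p < cutoff)

    regions = []
    cluster = []

    def flush():
        # Keep a cluster only if it holds enough significant SNPs.
        if len(cluster) >= min_snps:
            regions.append((cluster[0][0], cluster[0][1],
                            cluster[-1][1], len(cluster)))

    for chrom, pos in significant:
        # Start a new cluster on a new chromosome or after a gap > max_gap.
        if cluster and (chrom != cluster[-1][0]
                        or pos - cluster[-1][1] > max_gap):
            flush()
            cluster = []
        cluster.append((chrom, pos))
    flush()
    return regions


if __name__ == "__main__":
    # Toy P values, not results from the study.
    toy = [(10, 100_000, 1e-9), (10, 150_000, 1e-8), (10, 400_000, 1e-9),
           (13, 5_000, 1e-9), (13, 5_100, 0.01)]
    print(selected_regions(toy, n_tests=1000))
```

On the plotted scale, the same cutoff would appear as a dashed line at −log₁₀(alpha / n_tests).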