David J Winter1, M Andreína Pacheco1,2, Andres F Vallejo3, Rachel S Schwartz1, Myriam Arevalo-Herrera3,4, Socrates Herrera3, Reed A Cartwright1,5, Ananias A Escalante1,2.
Abstract
Plasmodium vivax is the most prevalent malarial species in South America and exerts a substantial burden on the populations it affects. The control and eventual elimination of P. vivax are global health priorities. Genomic research contributes to this objective by improving our understanding of the biology of P. vivax and through the development of new genetic markers that can be used to monitor efforts to reduce malaria transmission. Here we analyze whole-genome data from eight field samples from a region in Córdoba, Colombia, where malaria is endemic. We find considerable genetic diversity within this population, a result that contrasts with earlier studies suggesting that P. vivax had limited diversity in the Americas. We also identify a selective sweep around a substitution known to confer resistance to sulphadoxine-pyrimethamine (SP). This is the first observation of a selective sweep for SP resistance in this species. These results indicate that P. vivax has been exposed to SP pressure even when the drug is not in use as a first-line treatment for patients afflicted by this parasite. We identify multiple non-synonymous substitutions in three other genes known to be involved in drug resistance in Plasmodium species. Finally, we found extensive microsatellite polymorphisms. Using this information, we developed 18 polymorphic and easy-to-score microsatellite loci that can be used in epidemiological investigations in South America.
Year: 2015 PMID: 26709695 PMCID: PMC4692395 DOI: 10.1371/journal.pntd.0004252
Source DB: PubMed Journal: PLoS Negl Trop Dis ISSN: 1935-2727
Scheme used to produce alternative reference genomes.
| Reference | A | C | G | T |
|---|---|---|---|---|
| First shift | T | A | C | G |
| Second shift | G | T | A | C |
| Third shift | C | G | T | A |
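The substitution scheme in this table can be applied mechanically to a sequence. The following is a minimal sketch, not the authors' pipeline: the function name `shift_reference` and the choice to leave characters other than A, C, G and T (such as N) unchanged are assumptions made for illustration.

```python
# Translation tables copied from the rows of the scheme above.
# Keys are shift numbers; each table maps the reference base to its substitute.
SHIFTS = {
    1: str.maketrans("ACGT", "TACG"),  # first shift
    2: str.maketrans("ACGT", "GTAC"),  # second shift
    3: str.maketrans("ACGT", "CGTA"),  # third shift
}


def shift_reference(seq: str, shift: int) -> str:
    """Return seq with every A/C/G/T replaced according to the given shift.

    Assumption: characters outside A/C/G/T (for example N) are left as they are.
    """
    if shift not in SHIFTS:
        raise ValueError("shift must be 1, 2 or 3")
    return seq.upper().translate(SHIFTS[shift])


if __name__ == "__main__":
    reference = "ACGTN"
    for shift in (1, 2, 3):
        print(shift, shift_reference(reference, shift))
```

Under this scheme, the reference and its three shifted versions carry four different bases at every A/C/G/T position.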
Summary of sequencing data.
| Sample | Reads (millions) | Proportion of reads mapped to Sal-I | Mean Coverage | p5 | p10 | p20 | Proportion of minority bases |
|---|---|---|---|---|---|---|---|
| 446 | 27.754 | 0.007 | 1.7 | 0.041 | 0.001 | 0.000 | 7.3 × 10⁻⁴ |
| 494 | 33.337 | 0.043 | 10.9 | 0.972 | 0.733 | 0.064 | 1.11 × 10⁻³ |
| 495 | 29.129 | 0.084 | 19.8 | 0.993 | 0.979 | 0.636 | 8.5 × 10⁻⁴ |
| 496 | 35.048 | 0.106 | 29.5 | 0.994 | 0.990 | 0.952 | 1.1 × 10⁻⁴ |
| 498 | 24.365 | 0.010 | 1.9 | 0.075 | 0.001 | 0.000 | 7.8 × 10⁻⁴ |
| 499 | 18.224 | 0.278 | 42.1 | 0.995 | 0.993 | 0.986 | 9.1 × 10⁻⁴ |
| 500 | 25.050 | 0.004 | 0.8 | 0.005 | 0.000 | 0.000 | 7.6 × 10⁻⁴ |
| 503 | 23.108 | 0.066 | 12.2 | 0.983 | 0.825 | 0.100 | 9.1 × 10⁻⁴ |
“p5”, “p10” and “p20” refer to the proportion of sites covered by at least 5, 10 and 20 reads, respectively. “Proportion of minority bases” refers to the proportion of mapped reads at a given site that contained a base that made up a minor fraction of all mapped bases.
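As an illustration of how the p5, p10 and p20 columns are defined, the sketch below computes the proportion of sites at or above a depth threshold from a list of per-site read depths. The input format and the function name `proportion_covered` are assumptions; the depths in the example are made up and are not data from this study.

```python
from typing import Sequence


def proportion_covered(depths: Sequence[int], min_depth: int) -> float:
    """Proportion of sites whose read depth is at least min_depth."""
    if not depths:
        return 0.0
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered / len(depths)


if __name__ == "__main__":
    # Hypothetical per-site depths for a ten-site region.
    depths = [0, 3, 5, 8, 10, 12, 19, 20, 25, 40]
    for threshold in (5, 10, 20):
        print(f"p{threshold} = {proportion_covered(depths, threshold):.3f}")
```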
Fig 1. Summary of genomic data.
Segments from outside to inside: P. vivax chromosomes with exonic regions shaded green; regions excluded from variant calling in this study; heatmap of in 10kb windows, high-diversity regions are darker and the maximum value is 0.0018; heatmap of in 10kb windows; mean number of samples covered per base in 10kb windows.
Fig 2. Reference base frequency spectra.
The first eight panels show, for each of our samples, the distribution of reference-base frequencies in sequencing reads across all sites in the P. vivax genome. The final panel shows the same distribution for reads generated from Sal-I, a known single infection. Red triangles mark the frequency expected if a single read containing a non-reference base were mapped to a site with the average sequencing coverage for that sample (given in parentheses after the sample name). The majority of sites in all samples contain only reference bases; these sites were removed to allow clearer visualization of the distributions.
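The position of the red triangles follows from simple arithmetic: at a site with coverage c where exactly one read carries a non-reference base, the reference-base frequency is (c − 1)/c. The sketch below makes that explicit. It assumes that the single non-reference read is counted within the coverage c and clamps the result at zero for coverage below one; both choices, and the function names, are illustrative assumptions.

```python
def reference_frequency(ref_reads: int, total_reads: int) -> float:
    """Frequency of the reference base among reads mapped to one site."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return ref_reads / total_reads


def expected_single_nonref_frequency(mean_coverage: float) -> float:
    """Reference frequency if one of mean_coverage reads is non-reference.

    Assumption: the non-reference read is included in mean_coverage.
    The value is clamped at 0.0 when mean coverage is below one read.
    """
    if mean_coverage <= 0:
        raise ValueError("mean_coverage must be positive")
    return max(0.0, (mean_coverage - 1.0) / mean_coverage)


if __name__ == "__main__":
    # Mean coverage values taken from the sequencing summary table.
    for sample, coverage in (("494", 10.9), ("496", 29.5), ("499", 42.1)):
        print(sample, round(expected_single_nonref_frequency(coverage), 3))
```

Higher coverage pushes the expected frequency closer to 1, so a single discordant read is harder to distinguish from a site that contains only reference bases.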
Summary of SNV data.
| | Total | Exonic | Intronic | 3′ UTR | 5′ UTR | Intergenic |
|---|---|---|---|---|---|---|
| Number of SNVs | 33855 | 16413 | 3107 | 808 | 1328 | 14333 |
| Density | 1.8 | 1.2 | 2.4 | 1.9 | 1.8 | 2.0 |
| | 7.0 × 10⁻⁴ | 5.9 × 10⁻⁴ | 7.9 × 10⁻⁴ | 6.5 × 10⁻⁴ | 6.4 × 10⁻⁴ | 7.5 × 10⁻⁴ |
| | 6.8 × 10⁻⁴ | 5.7 × 10⁻⁴ | 7.5 × 10⁻⁴ | 6.2 × 10⁻⁴ | 6.1 × 10⁻⁴ | 7.2 × 10⁻⁴ |
Fig 3. Allele sharing among P. vivax populations.
Note: Ellipses for the Peruvian [18], Cambodian and Madagascan [31] populations include only those SNVs that were reported in the cited studies and that do not fall in a region excluded from our study.
Breakdown of singletons.
| Sample | Singletons this study | Singletons all studies | Callable sites (%) |
|---|---|---|---|
| 446 | 660 (78.7) | 242 (28.9) | 38.1 |
| 498 | 10 (1.0) | 8 (0.8) | 45.4 |
| 499 | 2974 (158.5) | 1623 (86.5) | 85.3 |
| 494 | 3134 (168.0) | 1606 (86.1) | 84.8 |
| 495 | 837 (44.7) | 520 (27.8) | 85.1 |
| 496 | 2361 (126.0) | 1291 (68.9) | 85.2 |
| 500 | 122 (35.3) | 32 (9.3) | 15.7 |
| 503 | 2806 (150.1) | 1523 (81.4) | 85.0 |
| Total | 12913 (111.9) | 6854 (59.4) | – |
Values in parentheses are number of singleton SNVs per million callable sites.
Values in the Callable sites column are the percentage of all sites that were callable for a given sample.
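The normalization in parentheses can be reproduced with one division. This sketch assumes the absolute number of callable sites is known for each sample; the function name and the example numbers are hypothetical and are not taken from the table.

```python
def singletons_per_million(singletons: int, callable_sites: int) -> float:
    """Singleton SNVs per million callable sites."""
    if callable_sites <= 0:
        raise ValueError("callable_sites must be positive")
    return singletons / (callable_sites / 1_000_000)


if __name__ == "__main__":
    # Hypothetical sample: 50 singletons over 2 million callable sites.
    print(singletons_per_million(50, 2_000_000))
```

Normalizing this way lets low-coverage samples with few callable sites be compared with well-covered ones.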
Summary of microsatellite loci discovered in this study.
| Name | Position | Motif | Repeats | Alleles (genomes) | Alleles (validation) |
|---|---|---|---|---|---|
| CLAIM1 | 2:96021 | ATGC | 5–7 | 4 | 4 |
| CLAIM2 | 2:261105 | AC | 6–13 | 4 | 4 |
| CLAIM3 | 5:280204 | AACAGC | 8–11 | 4 | 3 |
| CLAIM4 | 5:1139547 | AT | 8–9 | 3 | 4 |
| CLAIM5* | 5:1195225 | AAAT | 6–10 | 4 | 4 |
| CLAIM6* | 5:1248584 | AT | 8–10 | 3 | 2 |
| CLAIM7 | 6:436676 | AT | 8–9 | 3 | 1 |
| CLAIM8* | 7:761767 | AT | 8–9 | 3 | 2 |
| CLAIM9 | 7:1343057 | AG | 8–12 | 3 | 1 |
| CLAIM10* | 8:439431 | AT | 8–10 | 4 | 4 |
| CLAIM11 | 9:1048573 | AAAT | 6–7 | 3 | 4 |
| CLAIM12* | 9:1454922 | AT | 8–9 | 3 | 2 |
| CLAIM13* | 10:1348254 | AT | 8–9 | 3 | 2 |
| CLAIM14 | 10:1368121 | AT | 3–8 | 3 | 3 |
| CLAIM15 | 12:2076009 | AC | 7–11 | 4 | 3 |
| CLAIM16* | 12:2101147 | AT | 8–9 | 3 | 2 |
| CLAIM17* | 13:950389 | AT | 7–8 | 3 | 2 |
| CLAIM18 | 14:1469652 | AT | 6–14 | 4 | 3 |
Loci marked with an asterisk were developed by comparing P. vivax sequences to P. cynomolgi. For the numbers of alleles, “genomes” refers to the number identified from genomic data and “validation” to the number detected from the validation panel. PCR primers for each locus are given in S2 Table.
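To make the Motif and Repeats columns concrete, the sketch below scans a sequence for perfect tandem repeats with a regular expression. It is a generic illustration and not the discovery method used in the study; the function name, the motif length range of 2 to 6 bases and the minimum of five repeats are all assumptions.

```python
import re
from typing import List, Tuple


def find_microsatellites(
    seq: str, min_unit: int = 2, max_unit: int = 6, min_repeats: int = 5
) -> List[Tuple[int, str, int]]:
    """Return (0-based start, motif, repeat count) for perfect tandem repeats.

    The lazy quantifier makes the regex try the shortest motif first, so a run
    of ATATAT... is reported with motif AT rather than ATAT.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAAAA
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence containing an (AT)8 repeat.
    example = "GGC" + "AT" * 8 + "CCG"
    print(find_microsatellites(example))
```

Comparing the repeat count found at the same locus in different samples gives the number of alleles, which is what the two allele columns of the table report.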
Fig 4. A selective sweep around dhps.
Two measures of genetic diversity (y-axis) were calculated in overlapping windows across a portion of chromosome 14 (positions on the x-axis). The dashed path under the x-axis represents the position of exons in the current annotation of the P. vivax reference genome. Exons of dhps are shaded black; all others are shaded gray. Dashed lines represent the genome-wide average of each diversity measure.
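The caption does not say which two diversity measures were used. As a hedged illustration of an overlapping-window calculation, the sketch below uses nucleotide diversity (π) for biallelic sites, which is one common choice and is an assumption here. It also assumes one haploid genotype per sample and that every site in a window is callable; the function names and input format are hypothetical.

```python
from typing import List, Tuple


def site_pi(alt_count: int, n_samples: int) -> float:
    """Expected pairwise difference at one biallelic site."""
    if n_samples < 2:
        return 0.0
    return 2 * alt_count * (n_samples - alt_count) / (n_samples * (n_samples - 1))


def windowed_pi(
    variants: List[Tuple[int, int, int]],
    start: int,
    end: int,
    window: int,
    step: int,
) -> List[Tuple[int, int, float]]:
    """Per-site pi in overlapping windows [w_start, w_start + window).

    variants holds (position, alt_count, n_samples) tuples.
    Windows overlap whenever step is smaller than window.
    """
    results = []
    w_start = start
    while w_start + window <= end:
        w_end = w_start + window
        total = sum(
            site_pi(alt, n) for pos, alt, n in variants if w_start <= pos < w_end
        )
        results.append((w_start, w_end, total / window))
        w_start += step
    return results


if __name__ == "__main__":
    # Hypothetical variants: (position, samples with the alternate allele, samples called).
    variants = [(120, 4, 8), (480, 1, 8), (950, 2, 8)]
    for w_start, w_end, pi in windowed_pi(variants, 0, 1000, 500, 250):
        print(w_start, w_end, round(pi, 5))
```

A selective sweep would appear as a run of windows whose values fall well below the genome-wide average, which is the pattern the figure describes around dhps.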
Non-synonymous variants at drug resistance loci.
| Locus | Chromosome | Position | Codon | Amino Acid | Number of samples |
|---|---|---|---|---|---|
| dhps | 14 | 1257856 | gCc/gGc | A383G | 8 (8) |
| dhfr | 5 | 964761 | Agc/Cgc | S58R* | 4 (7) |
| | 5 | 964763 | agC/agA* | S58R* | 3 (7) |
| | 5 | 964939 | aGc/aAc | S117N* | 6 (6) |
| mdr1 | 10 | 363169 | tAc/tTc | Y976F | 3 (8) |
| | 10 | 363223 | aCg/aTg* | T958M* | 8 (8) |
| | 10 | 363374 | Atg/Ctg | M908L | 8 (8) |
| | 10 | 364598 | Gat/Aat | D500N | 4 (8) |
| | 10 | 365435 | Gtg/Ttg | V221L | 5 (8) |
| PVX_124085 | 14 | 2043859 | Caa/Gaa* | Q1419E* | 1 (7) |
| | 14 | 2044528 | Ccg/Tcg | P1196S | 3 (8) |
| | 14 | 2044798 | Gcg/Tcg* | A1106S* | 2 (6) |
| | 14 | 2044975 | Ctg/Atg | L1047M | 1 (7) |
| | 14 | 2045050 | Gtg/Atg* | V1022M* | 7 (7) |
| | 14 | 2046072 | aGc/aTc | S681I | 3 (6) |
| | 14 | 2047233 | aGg/aTg* | R294M* | 7 (7) |
| | 14 | 2047816 | Gaa/Caa* | E100Q* | 2 (6) |
“Number of samples” refers to the number of genomic sequences that contained this allele in the present study; the number of samples from which any allele could be reliably called appears in parentheses. Variants marked with an asterisk were also recorded from a population in the Amazon basin in Peru [18]. Note that the numbering of amino acid substitutions in PVX_124085 differs between these studies because different annotations were used in each case.
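The Codon and Amino Acid columns can be cross-checked by translating both codons with the standard genetic code. The sketch below is a minimal illustration, assuming the codons are written in coding-strand orientation with the changed base capitalized; the function name `codon_change` is hypothetical.

```python
# Standard genetic code, built in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def codon_change(entry: str) -> tuple:
    """Translate a 'ref/alt' codon pair such as 'gCc/gGc' into amino acids.

    A trailing asterisk (the table's marker for variants also seen in Peru)
    is ignored.
    """
    ref, alt = entry.rstrip("*").upper().split("/")
    return CODON_TABLE[ref], CODON_TABLE[alt]


if __name__ == "__main__":
    for entry, label in (("gCc/gGc", "A383G"), ("tAc/tTc", "Y976F"), ("aCg/aTg*", "T958M")):
        ref_aa, alt_aa = codon_change(entry)
        # The first and last letters of the label should match the translation.
        print(entry, ref_aa, alt_aa, label[0] == ref_aa and label[-1] == alt_aa)
```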