Literature DB >> 31262308

Detection of low-density Plasmodium falciparum infections using amplicon deep sequencing.

Angela M Early^1,2, Rachel F Daniels^3,4, Timothy M Farrell^3,4, Jonna Grimsby⁵, Sarah K Volkman^3,4,6, Dyann F Wirth^3,4, Bronwyn L MacInnis^3,4, Daniel E Neafsey^3,4.

Abstract

BACKGROUND: Deep sequencing of targeted genomic regions is becoming a common tool for understanding the dynamics and complexity of Plasmodium infections, but its lower limit of detection is currently unknown. Here, a new amplicon analysis tool, the Parallel Amplicon Sequencing Error Correction (PASEC) pipeline, is used to evaluate the performance of amplicon sequencing on low-density Plasmodium DNA samples. Illumina-based sequencing of two Plasmodium falciparum genomic regions (CSP and SERA2) was performed on two types of samples: in vitro DNA mixtures mimicking low-density infections (1-200 genomes/μl) and extracted blood spots from a combination of symptomatic and asymptomatic individuals (44-653,080 parasites/μl). Three additional analysis tools-DADA2, HaplotypR, and SeekDeep-were applied to both datasets and the precision and sensitivity of each tool were evaluated.
RESULTS: Amplicon sequencing can contend with low-density samples, showing reasonable detection accuracy down to a concentration of 5 Plasmodium genomes/μl. Due to increased stochasticity and background noise, however, all four tools showed reduced sensitivity and precision on samples with very low parasitaemia (< 5 copies/μl) or low read count (< 100 reads per amplicon). PASEC could distinguish major from minor haplotypes with an accuracy of 90% in samples with at least 30 Plasmodium genomes/μl, but only 61% at low Plasmodium concentrations (< 5 genomes/μl) and 46% at very low read counts (< 25 reads per amplicon). The four tools were additionally used on a panel of extracted parasite-positive blood spots from natural malaria infections. While all four identified concordant patterns of complexity of infection (COI) across four sub-Saharan African countries, COI values obtained for individual samples differed in some cases.
CONCLUSIONS: Amplicon deep sequencing can be used to determine the complexity and diversity of low-density Plasmodium infections. Despite differences in their approach, four state-of-the-art tools resolved known haplotype mixtures with similar sensitivity and precision. Researchers can therefore choose from multiple robust approaches for analysing amplicon data, however, error filtration approaches should not be uniformly applied across samples of varying parasitaemia. Samples with very low parasitaemia and very low read count have higher false positive rates and call for read count thresholds that are higher than current default recommendations.

Entities: Chemical

Keywords: Haplotype calling; Malaria; Molecular epidemiology; Molecular surveillance; Multiclonal infection; Multiplicity of infection; Plasmodium; Targeted amplicon deep sequencing; Within-host diversity

Mesh：

Year: 2019 PMID： 31262308 PMCID： PMC6604269 DOI： 10.1186/s12936-019-2856-1

Source DB: PubMed Journal: Malar J ISSN： 1475-2875 Impact factor: 2.979

Background

Amplicon deep sequencing is an increasingly utilized genotyping approach that provides a cost-effective strategy to profile the genetic diversity of pathogen infections. Like single nucleotide polymorphism (SNP)-based genotyping methods, both the data-generation and data-analysis steps of amplicon sequencing are highly scalable, allowing for studies of hundreds to thousands of samples. Additionally, amplicons can be designed to cover long genetic segments composed of multiple variants, allowing for the identification of complete DNA sequences (or haplotypes) in a targeted genomic region. When targeting a highly polymorphic genomic region, a single amplicon can distinguish among hundreds of unique haplotypes [1], providing higher resolution than either SNP-based or length-based genotyping approaches. This improves estimates of the number of lineages within polyclonal infections (or complexity of infection; COI) [2-4], permits the discovery of unknown alleles [5-7], and provides increased information for haplotype-based analyses of epistasis and linkage disequilibrium [8]. Amplicon analysis in Plasmodium has been adapted to multiple sequencing platforms depending on the desired cost, sample size, and sequence length [3, 9–11]. Because of this high resolution and flexibility, amplicon-based methods have been utilized in a range of applications, including studies of allele-specific vaccine efficacy [1], disease severity [10], clearance rate [12], within-host competition [13], relapse rate [9], drug resistance [5-7], host selection [8], and population structure [8, 14]. Amplicon sequencing has high sensitivity for the detection of minority parasite lineages within an infection, and is of particular interest in longitudinal studies that track intra-host dynamics [3, 4]. When used to detect known single variant markers, amplicon sequences can be analysed with relatively straightforward approaches. Longer, complex haplotypes, however, require more sophisticated analysis methods. Amplicon sequencing data are known to be subject to PCR and sequencing artifacts, particularly for genomic regions with high A/T-content and high rates of homopolymerism [15, 16]. In addition, library preparation method and primer choice can influence the types and extent of errors [17]. Correctly identifying sequence errors is therefore a challenge when applying amplicon sequencing to Plasmodium falciparum. Fortunately, several new analytical tools have been developed in recent years to address these challenges [18-21]. Unlike approaches that use reference datasets or cluster sequences with hard percent-identity thresholds, these new methods are more flexible and can distinguish among sequences that differ by only a single nucleotide change [22]. When Plasmodium concentrations are reasonably high, these approaches have been demonstrated to be robust. To date, however, these methods have not been thoroughly evaluated on low-density Plasmodium samples. It is therefore unclear whether additional considerations are required when interpreting amplicon sequencing data from infections with low parasitaemia. This study assesses amplicon sequencing’s lower limit of detection using four analysis tools, and further evaluates each tool’s accuracy and capacity to recover quantitative information on the relative abundance of different haplotypes within infections. Three of these tools—DADA2 [18], HaplotypR [19], and SeekDeep [20]—were previously published and developed to contend with any Plasmodium amplicon. The fourth—the Parallel Amplicon Sequencing Error Correction (PASEC) pipeline—is a distance- and abundance-based error-correction tool that was specifically tailored for use with CSP and SERA2 amplicons [1], and it is formally presented here for the first time. Amplicon sequencing of densely polymorphic regions in the P. falciparum CSP and SERA2 genes was applied to two sample collections. The first—a set of in vitro human/parasite DNA mixtures that mimic low-density parasite infections—was designed to test the limit of detection for amplicon sequencing. The second sample set consisted of DNA extracted from dried blood spots from malaria-infected individuals collected on filter paper in sub-Saharan Africa. This allowed a comparison of analysis approaches using conditions under which samples are typically collected and processed. All four tools detected P. falciparum haplotypes with high sensitivity, and additionally were able to discriminate between major and minor haplotypes with reasonable accuracy. Additionally, PASEC was able to identify a SERA2 indel in patient samples due to its incorporation of prior knowledge on sequence composition. Overall, the results show that low parasitaemia does not preclude amplicon analysis of P. falciparum samples, although researchers should expect reduced sensitivity and reduced precision with low read-count samples (< 100 reads/amplicon) and at parasite DNA concentrations under 5 genomes/μl.

Methods

Sample assembly and composition

Mock Plasmodium/human DNA mixtures

Mixtures of DNA from cultured P. falciparum parasites were combined with human genomic DNA to construct samples that mimic human infections. DNA from up to five culture-adapted parasite lines were combined in various proportions and number (Fig. 1; for exact sample composition and nucleotide differences between clones see Additional file 1: Tables S1, S2). Stock mixtures of 200 genomic copies/μl of DNA template were prepared by real-time PCR quantification of copies/μl in triplicate relative to a plasmid containing a single copy of the quantification target gene [23]. These stock solutions were then diluted to the indicated concentrations in sequencing-grade water and 10 ng commercial human DNA (Promega Corp cat#G3041) was added to all samples. After mixing and dilution, a subset of samples were re-quantified using the same qPCR protocol and reported sample concentrations were adjusted as needed. Plasmodium-free negative control samples were also constructed. These contained either 10 ng of human DNA or only water.

Fig. 1

Mock and natural infection sample composition. a Mock infection samples were constructed from mixtures of P. falciparum and human DNA to mimic the parasite DNA concentrations found in extracted low-density infections. b DNA from up to five clonal cultured parasite lines was combined to create each mock sample, leading to within-sample haplotype counts of one to four. c Natural infection samples were previously collected and extracted from a combination of symptomatic patients and asymptomatic carriers [1]. Parasite densities were determined by blood smear

Natural infections

Previously extracted DNA from 95 blood spots, obtained from individuals infected with P. falciparum, was re-amplified and re-sequenced as part of this study. These samples were acquired from both symptomatic and asymptomatic individuals from four countries in sub-Saharan Africa as part of the RTS,S malaria vaccine phase 3 trial and had parasite densities that ranged from 44 to 653,080 parasites/μl as determined by blood smear (Fig. 1; [24]). Full details on sampling and extraction, including human subjects approval for use of these samples, are provided in Neafsey et al. [1]. In brief, samples were collected as blood spots on Whatman FTA cards, shipped to the Broad Institute, and stored in desiccators until processing. DNA was extracted in batches of 95 samples plus one blank control card using seven 3-mm punches and the automated Chemagen Chemagic bead-based extraction platform. Total DNA was stored at − 80 °C until re-amplification and sequencing.

Positive control plasmid

A plasmid containing synthetic target amplicon sequences for both CSP and SERA2 was obtained from a commercial vendor (Invitrogen/Thermo Fisher Scientific) and served as a positive control during the PCR amplification step. Outside the primer regions, the plasmid sequence contains nucleotide variants not observed in natural P. falciparum isolates so that any instances of contamination can be readily identified. The plasmid map can be found in Additional file 1: Fig. S1.

PCR and sequencing

Two regions from the CSP (PF3D7_0304600) and SERA2 (PF3D7_0207900) genes were PCR amplified as previously described [1]. In brief, 5 μl of ~ 0.5 ng/μl DNA served as template for the initial PCR which amplified the targeted regions. A second PCR was carried out to index samples and create the full sequencing constructs. The final CSP and SERA2 amplicons cover 288 and 258 nucleotides, respectively (Pf3D7_03_v3:221,352–221,639; Pf3D7_02_v3:320,763–321,020). Both amplicons cover sequence regions of high nucleotide diversity in sub-Saharan Africa to maximize the number of distinct haplotypes that can be detected across samples from this geographic area. All DNA samples and negative controls were amplified and sequenced in duplicate. Paired-end 250-bp reads were generated in one MiSeq run conducted on a pool of 384 PCR products. Unless otherwise noted, each PCR/sequencing technical replicate was analysed as a distinct sample. Before downstream analysis, raw sequencing data were demultiplexed and aligned to amplicon reference sequences to remove all non-Plasmodium sequences.

Sample analysis with PASEC

For each sample, paired-end reads were merged using FLASH [25] and aligned with BWA-MEM v0.7.12-r1039 [26] to the amplicon regions of the P. falciparum reference genome assembly (PlasmoDB v.9.0 3D7). Two short homopolymeric tracts in CSP were masked from analysis, as such regions are highly error-prone in Illumina sequencing and these specific tracts were not known to harbour natural polymorphisms. Masked coordinates are given in Additional file 3. Within each sample, haplotypes were filtered according to a set of pre-specified thresholds developed by Neafsey et al. [1]. Haplotypes were required to (1) cover the entire amplicon region, (2) have no uncalled bases, (3) be supported by at least two sets of merged read pairs (henceforth referred to simply as “reads”), and (4) have an intra-sample frequency ≥ 0.01. To account for single nucleotide errors introduced during PCR and sequencing, the filtered haplotypes were clustered based on nucleotide distance and read depth. To inform the stringency of this clustering, the relative read depth of putative PCR errors in the dataset was analysed. Across all the mock samples, there were 208 erroneous haplotypes that (1) differed from a true haplotype within the same sample by a single nucleotide change, and (2) appeared only once in the sequencing run, making it unlikely that contamination was the source of the error. 92% of these erroneous haplotypes were present at a within-sample frequency that was as at least eight times less than the frequency of the correct “parent” haplotype (Additional file 1: Fig. S2). This 8:1 ratio was, therefore, used throughout the whole data set: if two haplotypes within the same sample differed by only one nucleotide and had a read coverage ratio ≥ 8:1, they were merged, maintaining the identity of the more common haplotype. This same 8:1 threshold was used in the initial application of the PASEC pipeline with data from sub-Saharan Africa where both nucleotide and haplotypic diversity was high at these amplicons [1]. In studies where it is necessary to discriminate between closely related haplotypes with a high frequency skew, however, a different cutoff could be applied. Previous implementations of PASEC removed all potential chimeric reads and applied sample read depth filters (≥ 200 reads for one of the two amplicons) [1, 8]. Here, these metrics were analysed, but hard filters were not applied to the samples before the downstream analyses presented in the results. The results therefore represent a minimally—not an optimally—filtered data set. For most applications, further filtration is recommended, as discussed below. Full details on the PASEC pipeline, its customizable parameters, and its implementation in this study are found in Additional files 2 and 3 and at https://github.com/tmfarrell/pasec.

Sample analysis with DADA2, HaplotypR, and SeekDeep

All samples were independently analysed using three additional amplicon analysis tools: DADA2 [18], HaplotypR [19], and SeekDeep v.2.6.0 [20]. Beyond the changes detailed below, input parameters deviated only modestly from the default settings. Parameters and scripts used for executing each pipeline can be found in Additional file 3. While previous implementations of PASEC applied a 200 reads/sample threshold, no read count filters were applied at the sample level in the analysis comparisons. SeekDeep gives the option of grouping data from technical PCR/sequencing replicates of the same sample and applying clustering and filtering to this grouped data to increase confidence in final calls. The pipeline was therefore run under two conditions: grouping technical replicates (the recommended, default SeekDeep approach; “SeekDeep2x”) and treating each PCR/sequencing replicate independently (“SeekDeep1x”). This permitted more equivalent comparisons among pipelines that do not incorporate replicate information and allowed for a determination of whether a single replicate is sufficient for making accurate haplotype calls. For HaplotypR, the command-line interface was extended in two ways. First, it was altered to return full haplotype sequences as opposed to only bases at variant positions. Second, the trimming input command was expanded to allow each amplicon to have different lengths. The version of HaplotypR used in this analysis can be found at https://github.com/tmfarrell/HaplotypR. After running the pipeline, the authors’ recommended sample-level filtering was applied to the data. Specifically, each sample was required to have a minimum of 25 reads, and individual haplotypes needed to have a minimum of 3 reads and a within-host frequency of at least 0.1%.

Comparison of analysis tools

All four tools were assessed for their ability to resolve haplotypes at within-sample frequencies down to 1% using the mock low-parasitaemia samples. Two performance metrics were computed by comparing expected vs. observed haplotypes in each sample: sensitivity (proportion of all expected haplotypes that were observed) and precision (proportion of all observed haplotypes that were expected). For sensitivity calculations, only haplotypes present at a concentration of at least 1 copy/μl of DNA template (5 copies/PCR reaction) were considered. For each tool, samples were only included in the performance metric calculation if at least one haplotype was identified. Except for the SeekDeep2x implementation, each PCR/sequencing replicate was analysed as a distinct sample.

Results

Sequencing coverage for low-density mock infections and natural infections from sub-Saharan Africa

In total, 148 DNA mixtures of known haplotypic composition, 190 blood samples from sub-Saharan Africa, 12 positive-control plasmid samples, and 4 negative-control samples without Plasmodium DNA were PCR amplified for CSP and SERA2 and sequenced on a single Illumina MiSeq run. The 148 mock infections were constructed to mimic infections with low parasite density and contained between 1 and 200 P. falciparum genomes/μl (Fig. 1a). We assume that these values roughly correspond to parasite densities of 1 and 200 parasites/μl as sampled peripheral blood is heavily enriched for ring-stage infected cells containing only a single parasite genome. Actual extracted DNA concentrations will vary, however, based on the volume of blood extracted, the extraction efficiency, and the DNA suspension volume. In the initial amplification step, 5 μl of DNA template were used, so samples at the lowest end of this distribution (1 genome/μl) should have had, on average, five genomic copies per PCR reaction. After sequencing, 145 samples had full-length read coverage for at least one of the two amplicons. For each amplicon, initial raw coverage across these samples ranged from 0 to 280,876 reads. After implementing the PASEC pipeline, coverage ranged from 0 to 31,787 reads. Coverage was sufficient for both amplicons, although median coverage was higher for CSP than for SERA2 (1872 vs. 909; Fig. 2a). All samples with low coverage (< 100 reads) had Plasmodium DNA concentrations below 21 genomes/μl. Overall, however, coverage and genome copy number were only weakly correlated (Spearman’s ρ = 0.55, P = 9.3 × 10−14; Fig. 2b), suggesting that stochastic factors influence read counts for low parasitaemia samples in general.

Fig. 2

Sequencing coverage of mock and natural infection samples. Overall sequencing coverage was lower for mock infection (a) than natural infection (c) samples (Mann–Whitney U Test, P = 1 × 10−7) although natural infections had a higher proportion of samples with no reads. Total read coverage (reads combined from both amplicons) correlated weakly with parasite genome concentration for mock infections (b) and parasitaemia for natural infections (d) Read coverage was higher for the samples from natural infections (Fig. 2c). These samples were extracted from dried blood spots and had parasite densities that ranged from 44 to 653,080 parasites/μl as determined by microscopy of blood smears. As with the mock infections, coverage was generally higher for samples with higher parasite loads, but this correlation was low (Spearman’s ρ = 0.31, P = 1.1 × 10−9; Fig. 2d). While read coverage was higher, overall sequencing success was lower for the natural than for the mock infections (Fig. 2c), a likely result of difficulties with extracting high quality DNA from the stored filter paper blood spots. As would be expected under this scenario, failure rate was not evenly distributed across the natural infection samples, suggesting some experienced a higher degree of degradation. Each of the 95 blood samples was PCR amplified and sequenced in duplicate, yielding two CSP and two SERA2 technical replicates per initial blood sample extraction, or 340 total amplicon samples. Of these 340 amplicon samples, 94 (25%) had low read counts (< 100 reads). These failures clustered in a small number of blood samples, suggesting that amplification and sequencing success is dependent on sample quality: only 33 (35%) of the blood samples experienced any amplicon failure and 18 samples (19%) received low read counts for all 4 amplicon attempts.

Absolute haplotype concentration affects the probability of sequencing success

One challenge of amplicon sequencing analysis is to correctly resolve individual haplotypes present within an infection at varying concentrations. Each mock sample contained between one and four unique haplotypes at the CSP and SERA2 amplicons present at concentrations of 1–200 copies/μl (Fig. 1b). Overall, there was a high recovery of these expected haplotypes from each of the samples. PASEC correctly identified all haplotypes present at a concentration of 30 copies/μl or higher and 96% of haplotypes with concentrations over 20 copies/μl. Conversely, only 41% of haplotypes with 1–5 copies/μl were recovered (Fig. 3a). As further discussed in the tool comparison below, this haplotype sensitivity is only slightly influenced by the post-sequencing analysis method and instead is driven by a failure to initially amplify and/or sequence these low frequency haplotypes.

Fig. 3

Identification of haplotypes in mock samples. a Detection of known haplotypes within the mock samples was dependent on the haplotype concentration (copies/μl) within the DNA sample. 5 μl of DNA template were used in the first round PCR amplification step prior to sequencing. Error bars represent the binomial-estimated standard deviation. b Across all mock samples, 31% of identified haplotypes were erroneous, but these haplotypes were generally supported by fewer reads than correct haplotypes. The number of nucleotide (nt) errors per haplotype was calculated as the nucleotide distance between an observed haplotype and the closest expected haplotype within the sample

Amplicon sequencing retains some information on within-sample haplotype frequencies, even at low concentrations

When performing direct short-read sequencing, relative read depth can be used to infer sample features like genotype ratios or genome copy number variations. During construction of amplicon libraries, however, PCR amplification prior to sequencing introduces stochastic variation in the final read counts. Nevertheless, analysis of the final read ratios in the mock samples shows that some information about the original haplotype ratios can be recovered. For samples with at least 100 reads, the correlation between the haplotypic ratio in the template DNA and final read ratio was strong across all haplotypes (Pearson’s r = 0.82, P < 0.001), but weaker for haplotypes with intermediate frequencies between 0.1 and 0.9 (Pearson’s r = 0.60, P < 0.001; Additional file 1: Fig. S3). In 73% of samples with at least a 4% margin between the two most prevalent haplotypes, read ratio correctly identified the most prevalent haplotype in the starting DNA mixture. Again, low read count reduced the probability of identifying the correct major haplotype (Fig. 4a). Similarly, major haplotype identification was less accurate in samples with very low total Plasmodium DNA concentration (< 5 genomes/μl; Fig. 4b).

Fig. 4

Proportion of mock samples where the major haplotype was correctly identified. Identification of the major haplotype within a sample was less reliable at a low read counts and b low parasite genome concentrations. Samples were excluded from the analysis if the difference in prevalence between the top two haplotypes was less than 4%. Error bars represent the binomial-estimated standard deviation

Erroneous haplotypes have lower read support than correct haplotypes

Read support is a useful indicator of the likelihood that a called haplotype is correct. Haplotypes with single-read support were largely sequencing artifacts, with only 0.030% matching a haplotype sequence known to be present in the sample mixtures. The default PASEC pipeline therefore requires haplotypes to have read support ≥ 2, a filter that eliminated 89.0% of CSP and 85.8% of SERA2 initially called haplotypes from the dataset. After minimal filtration, 0.75% of the total reads were erroneous, a percentage close to that previously reported by Hathaway et al. on a different dataset analysed with their tool SeekDeep (0.8%) [20]. Overall, this resulted in 31% of identified haplotypes being erroneous. Both erroneous reads and erroneous haplotypes were unevenly distributed across samples, however, making it possible to reduce the false positive rate with further filtration. First, erroneous haplotypes showed lower read support than true haplotypes (Fig. 3b). Raising the minimum haplotype read depth from two to five reads increased precision from 0.81 to 0.91 while having a smaller impact on sensitivity, which was lowered from 0.71 to 0.68 (Additional file 1: Fig. S4). Second, erroneous reads were more prevalent in samples with low read depth and/or low parasite concentration (Additional file 1: Fig. S5), which results in low precision within these sample groups specifically (Fig. 5). Finally, the number of reads supporting erroneous haplotypes differs between samples with low and high read depth. In samples with fewer than 100 reads, 68% of identified haplotypes were erroneous and 86% of these erroneous haplotypes had fewer than five supporting reads. In samples with at least 100 reads, 15% of identified haplotypes were erroneous but only 32% of these had fewer than five supporting reads. Therefore, in instances where samples with low read count must be included, researchers may decide to apply filters that are dependent on sample read depth, similar to the parasitaemia-dependent frequency filters created by Mideo et al. [12].

Fig. 5

Error rates are higher for samples with low read counts and/or low parasite density. Sensitivity and precision are affected by a read count per amplicon and b parasite genome concentration. All results were obtained with the PASEC pipeline on the full set of mock samples using only minimal filtration. 95% confidence intervals were estimated with 1000 bootstrapped data set replicates

Frequency and source of haplotype errors in the mock samples

The PASEC pipeline contains customized filtration and error-correction steps to remove erroneous CSP and SERA2 haplotypes. The filtration and error-correction steps in PASEC were designed to address three main sources of erroneous haplotypes: sequencing errors, chimeric reads, and sample contamination. The frequency of these error types and the efficacy of the various PASEC filters are discussed in more detail below. To provide a more complete profile of the error types found in amplicon data, the presented results are minimally—not optimally—filtered. As most filters will result in a tradeoff between sensitivity and specificity, researchers can tailor the exact level of filtering to their specific data set and scientific question.

Nucleotide sequence errors

The majority of erroneous haplotypes are expected to result from sequence errors (nucleotide substitutions or indels) that occur during Illumina sequencing or the initial rounds of PCR. The PASEC pipeline accounted for these errors with two approaches: (1) hard masking error-prone sequence regions and (2) clustering haplotypes that differed by a single nucleotide and had a read coverage ratio ≥ 8:1. Hard masking was applied to two homopolymeric regions in CSP composed of 9 and 6 poly-Ts. In the raw data, erroneous indels within these two regions were detected in 5.7% and 1.2% of full-length reads. While true indels might occur in these sequences in natural populations, this high artifactual indel rate suggests that inference of variants in these regions would be too unreliable using Illumina sequencing. Compared to masking, the clustering of haplotypes had an even greater impact on reducing nucleotide errors: 57.0% of CSP haplotypes and 47.9% of SERA2 haplotypes were eliminated at this step. In the final minimally filtered dataset, approximately half of the erroneous haplotypes (51%) differed from a true haplotype by one or two nucleotide changes and were likely the result of Illumina sequencing or PCR errors. As discussed above, these haplotypes were supported by fewer reads than true haplotypes (Fig. 3b) and were more prevalent in samples with low read count. Additional filtration could therefore be applied on these factors to further reduce the false positive rate after assessing the potential need to detect closely related haplotypes with a high frequency skew.

Chimeric reads

Chimeric reads are false recombinant haplotypes generated during PCR amplification. While a necessary consideration when performing amplicon sequencing, their overall impact on the mock sample analysis was minimal. Potential chimeras were identified with the isBimera function in DADA2 [18], which identifies all haplotypes that could be constructed from a simple combination of two other haplotypes within the same sample. This analysis flagged 7 CSP and 16 SERA2 samples as containing a total of 36 chimeric haplotypes. Eleven (31%) of the flagged haplotypes were in fact true haplotypes known to be within the given sample. Further analysis showed that 20 of the 25 flagged erroneous haplotypes were only one nucleotide change away from another haplotype in the sample, and the remaining five were related by two nucleotide changes. This suggests that these haplotypes may have resulted from PCR or sequencing error instead of chimeric read formation. Eighteen (78%) of the flagged samples had total read counts under 200, the read threshold previously used with the PASEC pipeline [1]. The increased stochasticity associated with low-read samples may explain why these haplotypes were not merged as part of the PASEC sequencing error filter. Correctly identifying chimeric reads in natural infections presents an additional challenge, especially in regions of high malaria prevalence where recombination among haplotypes will be higher. Of the 50 most common CSP sequences detected in sub-Saharan Africa [8], 38 (76%) were flagged as chimeric combinations by DADA2. Researchers must therefore consider additional factors like population-level haplotype frequency when identifying chimeric reads in natural infections [19, 20].

Cross-sample or environmental contamination

A large percentage (49%) of erroneous haplotypes had no evidence of chimerism and were unlikely to have resulted from sequencing errors as they were ≥ 3 nucleotide changes away from any true haplotype within a given sample. 68% of these haplotypes were present in other samples from the same MiSeq run, suggesting cross-sample or environmental contamination. The remaining haplotypes occurred only once in the whole dataset and may have resulted from environmental contamination. A small amount of cross-sample or environmental contamination was also observed in the negative control samples that contained either water (N = 2) or human DNA (N = 2). These four Plasmodium-free samples contained 5, 7, 16, and 20 reads, respectively. All of these read counts fell well below per-sample threshold of 200 reads that was used previously with the PASEC pipeline [1].

Comparison of PASEC with three state-of-the-art amplicon analysis tools

The performance of PASEC—a pipeline that has been carefully tuned for use with the CSP and SERA2 amplicons in P. falciparum—was compared to that of three analysis tools that were developed to be applied to amplicons from any genomic region: DADA2 [18], HaplotypR [19], and SeekDeep [20]. All four of these tools were designed to detect low-frequency haplotypes and differentiate unique haplotypes with single-nucleotide resolution. There are, however, differences in the analytical approaches. For instance, during error filtration PASEC and HaplotypR rely mainly on variant frequency and read depth, while SeekDeep incorporates k-mer frequencies and base quality scores and DADA2 further models sequencer-specific error likelihoods. SeekDeep additionally allows users to incorporate replicate PCR and sequencing runs into the analysis. This approach provides higher confidence for differentiating between sequencing errors and true haplotypes that differ at only a single nucleotide. However, as the mock samples did not provide the opportunity to discriminate between such closely related haplotypes, this SeekDeep feature was not evaluated in the trial. While all these tools have undergone rigorous testing, no previous study has focused on their performance under extremely low parasite densities (but see [12]). Here, each tool was applied to the mock samples and it was evaluated on (1) the proportion of all expected haplotypes that were observed (sensitivity) and (2) the proportion of observed haplotypes that were expected (precision).

Sensitivity and precision

Overall, the four tools performed comparably on the mock sample panel, although they showed more variability in precision than in sensitivity (Fig. 6). What differs most between pipelines is their ability to filter out erroneous haplotypes, not identify correct haplotypes. For instance, while the sensitivity of SeekDeep1x—the SeekDeep implementation using only one technical replicate—was comparable to the other four pipelines, its precision was substantially lower, driven by the identification of a high number of erroneous haplotypes. The use of replicate samples in SeekDeep2x greatly decreased the tool’s false positive rate, increasing precision with a small cost in sensitivity.

Fig. 6

Sensitivity and precision of five analysis pipelines for the detection of haplotypes in mock samples. a Analysis approaches vary more in precision than in sensitivity. b Performance of all pipelines improves when considering only samples that had at least 100 reads for an individual amplicon. Data shown include results from both the CSP and SERA2 amplicons. 95% confidence intervals were estimated with 1000 bootstrapped data set replicates Each tool’s performance varied to some extent across amplicons. This variation was not consistent across pipelines, and as a result, the pipelines’ rank order for precision and sensitivity was different for CSP and SERA2 (Table 1; Additional file 1: Fig. S6).

Table 1

Sensitivity and precision of each pipeline (mean [95% CI])

	DADA2	HaplotypR	PASEC	SeekDeep1x	SeekDeep2x
All samples
Sensitivity
All	0.66 [0.62, 0.70]	0.66 [0.62, 0.70]	0.71 [0.68, 0.75]	0.72 [0.68, 0.76]	0.62 [0.56, 0.68]
CSP	0.66 [0.61, 0.71]	0.64 [0.59, 0.70]	0.70 [0.64, 0.75]	0.70 [0.65, 0.75]	0.61 [0.53, 0.69]
SERA2	0.65 [0.59, 0.70]	0.68 [0.62, 0.74]	0.73 [0.68, 0.78]	0.73 [0.68, 0.79]	0.63 [0.55, 0.71]
Precision
All	0.81 [0.77, 0.84]	0.88 [0.85, 0.90]	0.81 [0.78, 0.85]	0.25 [0.23, 0.27]	0.68 [0.63, 0.74]
CSP	0.72 [0.67, 0.77]	0.94 [0.91, 0.97]	0.86 [0.81, 0.89]	0.26 [0.23, 0.28]	0.77 [0.69, 0.84]
SERA2	0.91 [0.87, 0.94]	0.82 [0.78, 0.86]	0.77 [0.72, 0.82]	0.25 [0.22, 0.28]	0.61 [0.53, 0.68]
Samples with ≥ 100 reads
Sensitivity
All	0.83 [0.80, 0.86]	0.84 [0.81, 0.86]	0.83 [0.81, 0.86]	0.83 [0.80, 0.86]	0.78 [0.78, 0.87]
CSP	0.82 [0.78, 0.86]	0.82 [0.77, 0.86]	0.82 [0.78, 0.86]	0.82 [0.78, 0.86]	0.84 [0.78, 0.89]
SERA2	0.85 [0.80, 0.89]	0.86 [0.81, 0.90]	0.85 [0.80, 0.89]	0.85 [0.81, 0.89]	0.82 [0.75, 0.88]
Precision
All	0.83 [0.80, 0.86]	0.89 [0.87, 0.92]	0.92 [0.90, 0.94]	0.26 [0.24, 0.28]	0.79 [0.74, 0.84]
CSP	0.75 [0.70, 0.79]	0.94 [0.91, 0.96]	0.95 [0.92, 0.97]	0.27 [0.24, 0.30]	0.88 [0.83, 0.93]
SERA2	0.92 [0.88, 0.95]	0.84 [0.80, 0.88]	0.90 [0.86, 0.93]	0.25 [0.22, 0.28]	0.71 [0.63, 0.78]

Sensitivity and precision of each pipeline (mean [95% CI])

Effect of sample read depth and genome copy number

All five pipelines showed reduced performance at low parasite concentrations (< 5 genomes/μl of template or < 25 genomes/PCR reaction; Additional file 1: Fig. S7) and at very low read depths (< 25 reads/sample; the exception being HaplotypR, which filtered out samples with < 25 reads). In particular, SeekDeep2x performed best on samples with at least 100 reads (Fig. 6b). Parasite genome copy number also affected the tools’ success at returning any data for a sample (i.e., resolving at least one haplotype within that sample). Overall, the pipelines reported haplotypes within 78% (HaplotypR), 81% (DADA2), 84% (SeekDeep2x), 89% (PASEC), and 96% (SeekDeep1x) of the samples (Additional file 1: Fig. S8A). The majority of the samples returning no data contained Plasmodium DNA concentrations under 5 genomes/μl (Additional file 1: Fig. S8B).

Determination of major haplotype frequency

As reported above, PASEC correctly identified the expected major haplotype in 73% of the mock samples. Misidentification of the expected haplotype could result from errors in the pipeline or stochasticity during sample construction, PCR amplification and sequencing. Strongly suggesting that stochasticity in sample processing and sequencing plays a role, the frequency estimate for each sample’s major haplotype was highly correlated between tools (Pearson’s r for all pairs > 0.85, P < 0.001; Additional file 1: Fig. S9A). The correlation between tools was even higher when limiting the analysis to samples with at least 100 reads (Pearson’s r for all pairs > 0.97, P < 0.001; Additional file 1: Fig. S9B). All tools, therefore, arrive at comparable frequency estimates based on the number of reads produced per haplotype.

Analysis of natural infection samples from sub-Saharan Africa with the four tools

All five pipelines were then applied to newly generated amplicon data from 95 previously extracted parasite positive blood spots from four countries in sub-Saharan Africa (Fig. 1c) [1]. These biological samples were PCR amplified and sequenced in duplicate, yielding 190 independently sequenced samples for each of the two amplicons. With the exception of SeekDeep2x, the technical replicates were again treated as separate samples in the analysis step. All tools were run with the same parameters used for the mock samples. The tools differed in the total number of unique haplotypes identified across the samples, with estimates ranging from 48 to 336 for CSP and 38 to 412 for SERA2 (Additional file 1: Fig. S10). For both amplicons, SeekDeep1x and DADA2 identified substantially more haplotypes than the other approaches, although a large percentage of these haplotypes were found at within-sample frequencies under 1%, raising the possibility that they were artifacts. Only PASEC identified a three nucleotide indel in SERA2 that was found on seven different haplotypic backgrounds. This was because the PASEC hard filters permitted this indel to remain based on its prior observation in African parasites [1]. Consistent with expectations for sub-Saharan Africa, the majority of the natural infection samples contained multiple P. falciparum parasite haplotypes. COI was estimated for each sample as the maximum number of unique haplotypes identified at either of the two amplicons. With the exception of SeekDeep1x, all four tools produced similar trends of mean COI per country (Fig. 7; Additional file 1: Fig. S11). The overall higher number of haplotypes identified with SeekDeep1x is also in keeping with the observation that SeekDeep showed lower precision on the mock samples than the other tools when run with single replicates (Fig. 6).

Fig. 7

Mean COI estimates for four sub-Saharan African study sites made by the five analysis pipelines. COI was defined as the maximum number of haplotypes retrieved for the sample from either of the two amplicons. Amplicon-specific estimates are found in Additional file 1: Fig. S11

Discussion

Amplicon sequencing of complex haplotypic regions is a powerful tool being applied to an increasing range of questions in malaria research. This highly scalable approach can accurately estimate COI, identify distinct haplotypes within polyclonal infections, and permit temporal tracking of distinct clones, however, reliable analysis requires a thorough understanding of potential error sources. Previous applications and evaluations of amplicon sequencing have focused on moderate to high density infections. Here, the performance of amplicon sequencing was assessed for the first time under a scenario of extremely low parasite densities (1–200 genomes/μl of DNA template), which mimicked samples that could be obtained from asymptomatic carriers. The results show that amplicon sequencing remains a viable approach under such challenging scenarios, as it was able to detect 77% of individual haplotypes present at concentrations of 5–10 genomic copies/μl when using 5 μl of template per PCR reaction. The ability of Illumina-based amplicon sequencing to reliably detect Plasmodium DNA at these extremely low concentrations shows that it has a limit of detection on par with standard nested PCR [27] and qPCR [28] methods. While amplicon sequencing is successful at low parasite densities, analysis of such samples presents unique challenges, particularly when parasite DNA concentration drops below 5 genomes/μl. At these low concentrations, overall sample-level error rates are higher and quantification of haplotype ratios is less accurate, regardless of the applied analysis tool. Researchers should, therefore, take steps to lower false positive rates in this challenging class of samples. Since erroneous haplotypes are generally supported by fewer reads (Fig. 3b) and samples with lower read counts have a higher proportion of false haplotypes (Additional file 1: Fig. S5), it should be standard practice to raise read thresholds when analysing low parasitaemia or low coverage samples. PASEC’s high performance was the result of hand-tuning for use with the amplicons CSP and SERA2. This included the hard masking of difficult-to-sequence homopolymer runs in the CSP amplicon and the a priori identification of indels in SERA2. As a result of this customization, it was the only tool to identify a naturally occurring three nucleotide deletion in SERA2 that is present in Africa. Importantly, however, this study shows that three other tools—DADA2, HaplotypR, and SeekDeep—also provide robust results when prior knowledge of the error profile of an individual amplicon is unavailable and rapid, parallelized analysis is not needed. Amplicon sequencing will become more useful as further methodological development is undertaken. For instance, ongoing updates to SeekDeep (made after v. 2.6.0, which is used here) have focused on improving both sensitivity and specificity, especially with low read-depth and single-replicate samples (github.com/bailey-lab/SeekDeep). In this analysis, precision varied most among tools, resulting from their different approaches towards error correction. As the rank order of the tools’ precision differed between the two amplicons, however, the relative success of these different approaches seems dependent on genetic context. Evaluation of these tools on a larger set of diverse amplicons will be required to formulate an understanding of how specific genetic characteristics drive these differences in precision. In the meantime, with PASEC and SeekDeep in particular, users can increase precision by implementing a simple 100 read threshold at the sample level (Table 1) or by calibrating filters with variable read thresholds when parasite concentrations are known [12]. Additional increases in precision will require further development in areas like contaminant identification, and this work is ongoing [29, 30]. These advances will also improve sensitivity with low-frequency haplotypes as more refined error identification could lessen the need for stringent cutoffs like the 1% within-sample read count filter recommended with PASEC. Further improvements in sensitivity, however, will largely rely on changes upstream of the analysis stage as the inability to detect a haplotype generally resulted from a failure to capture it at the amplification or sequencing stage. This is reflected by the roughly equivalent sensitivities for the four evaluated tools. The exact error profile described here is not directly portable to studies that use alternative amplicons and PCR protocols or that employ different sequencing methods. Still, it likely provides reasonable guidelines for the use of amplicon sequencing with low-density samples. Mideo et al. [12] previously implemented sample-level filtering with a different CSP amplicon that was sequenced using Ion Torrent technology. Using a dilution series of mock samples, they evaluated the relationship between parasite density and haplotype error rate, allowing haplotype frequency cutoffs to shift as a function of sample parasite DNA concentration. As in the study here, they found that the proportion of erroneous reads within a sample increased dramatically below 6 genomic copies/μl. However, while this similarity is suggestive, it should not preclude future evaluations with different protocols, and researchers should continue to inform filtration parameters with study-specific error estimates. Similarly, studies that use other amplicons or sample from different geographic regions must consider the expected haplotype diversity within the targeted parasite population. This knowledge can refine filtering at both the nucleotide and haplotype level. At the nucleotide level, segments prone to sequencing errors can be hard masked, and alternatively, known variants—like difficult-to-sequence indels—can be permitted to pass through otherwise stringent filters. Such filtration is directly incorporated into PASEC but could also be performed post hoc with other analysis tools. At the haplotype level, comparing the frequencies of haplotypes within samples, within plates, and across the entire population can help flag sequencing errors, chimeric reads, and instances of contamination. All the amplicon analysis pipelines used here rely on population-level information either gathered previously or drawn simultaneously from the dataset to inform filtering. As filtering cutoffs directly affect both sensitivity and specificity, however, researchers should make informed decisions regarding the expected sensitivity/specificity tradeoffs, especially in the instances where filtering levels are not manually set by the user. In addition for large studies, filtering can be implemented in an iterative way as more data are acquired for a given population.

Conclusion

As demonstrated here with the new tool PASEC, amplicon sequencing can be applied to samples with both low and high parasite densities, although the consistent detection of parasite clones with very low prevalence (< 5 genomes/μl of extracted DNA) is challenging. When used under their recommended conditions, three other versatile analysis tools (DADA2, HaplotypR, and SeekDeep) showed similar performance compared to PASEC. Overall, all tools performed well, and so final choice of analysis method will depend largely on study design (e.g., the inclusion of technical PCR/sequencing replicates), the read coverage of the samples, and expectations regarding the targeted Plasmodium genotypes (e.g., the potential presence of indels or the need to differentiate between low frequency haplotypes with a single SNP difference). Regardless of the tool used, however, it should be standard practice to raise read thresholds when analysing amplicon data from samples with low parasitaemia or low coverage (< 100 reads) and to tailor final filters based on haplotype frequencies within the study population. Additional file 1. Additional tables and figures. Additional file 2. Parallel amplicon sequencing error correction (PASEC) workflow. Additional file 3. Pipeline configuration and parameter files.

29 in total

1. Exposing malaria in-host diversity and estimating population diversity by capture-recapture using massively parallel pyrosequencing.

Authors: Jonathan J Juliano; Kimberly Porter; Victor Mwapasa; Rithy Sem; William O Rogers; Frédéric Ariey; Chansuda Wongsrichanalai; Andrew Read; Steven R Meshnick
Journal: Proc Natl Acad Sci U S A Date: 2010-11-01 Impact factor: 11.205

2. High-throughput pooling and real-time PCR-based strategy for malaria detection.

Authors: Steve M Taylor; Jonathan J Juliano; Paul A Trottman; Jennifer B Griffin; Sarah H Landis; Paluku Kitsa; Antoinette K Tshefu; Steven R Meshnick
Journal: J Clin Microbiol Date: 2009-11-25 Impact factor: 5.948

3. FLASH: fast length adjustment of short reads to improve genome assemblies.

Authors: Tanja Magoč; Steven L Salzberg
Journal: Bioinformatics Date: 2011-09-07 Impact factor: 6.937

4. Genetic Diversity and Protective Efficacy of the RTS,S/AS01 Malaria Vaccine.

Authors: D E Neafsey; M Juraska; T Bedford; D Benkeser; C Valim; A Griggs; M Lievens; S Abdulla; S Adjei; T Agbenyega; S T Agnandji; P Aide; S Anderson; D Ansong; J J Aponte; K P Asante; P Bejon; A J Birkett; M Bruls; K M Connolly; U D'Alessandro; C Dobaño; S Gesase; B Greenwood; J Grimsby; H Tinto; M J Hamel; I Hoffman; P Kamthunzi; S Kariuki; P G Kremsner; A Leach; B Lell; N J Lennon; J Lusingu; K Marsh; F Martinson; J T Molel; E L Moss; P Njuguna; C F Ockenhouse; B Ragama Ogutu; W Otieno; L Otieno; K Otieno; S Owusu-Agyei; D J Park; K Pellé; D Robbins; C Russ; E M Ryan; J Sacarlal; B Sogoloff; H Sorgho; M Tanner; T Theander; I Valea; S K Volkman; Q Yu; D Lapierre; B W Birren; P B Gilbert; D F Wirth
Journal: N Engl J Med Date: 2015-10-21 Impact factor: 91.245

5. Using Amplicon Deep Sequencing to Detect Genetic Signatures of Plasmodium vivax Relapse.

Authors: Jessica T Lin; Nicholas J Hathaway; David L Saunders; Chanthap Lon; Sujata Balasubramanian; Oksana Kharabora; Panita Gosi; Sabaithip Sriwichai; Laurel Kartchner; Char Meng Chuor; Prom Satharath; Charlotte Lanteri; Jeffrey A Bailey; Jonathan J Juliano
Journal: J Infect Dis Date: 2015-03-06 Impact factor: 5.226

6. A phase 3 trial of RTS,S/AS01 malaria vaccine in African infants.

Authors: Selidji Todagbe Agnandji; Bertrand Lell; José Francisco Fernandes; Béatrice Peggy Abossolo; Barbara Gaelle Nfono Ondo Methogo; Anita Lumeka Kabwende; Ayola Akim Adegnika; Benjamin Mordmüller; Saadou Issifou; Peter Gottfried Kremsner; Jahit Sacarlal; Pedro Aide; Miguel Lanaspa; John J Aponte; Sonia Machevo; Sozinho Acacio; Helder Bulo; Betuel Sigauque; Eusébio Macete; Pedro Alonso; Salim Abdulla; Nahya Salim; Rose Minja; Maxmillian Mpina; Saumu Ahmed; Ali Mohammed Ali; Ali Takadir Mtoro; Ali Said Hamad; Paul Mutani; Marcel Tanner; Halidou Tinto; Umberto D'Alessandro; Hermann Sorgho; Innocent Valea; Biébo Bihoun; Issa Guiraud; Berenger Kaboré; Olivier Sombié; Robert Tinga Guiguemdé; Jean Bosco Ouédraogo; Mary J Hamel; Simon Kariuki; Martina Oneko; Chris Odero; Kephas Otieno; Norbert Awino; Meredith McMorrow; Vincent Muturi-Kioi; Kayla F Laserson; Laurence Slutsker; Walter Otieno; Lucas Otieno; Nekoye Otsyula; Stacey Gondi; Allan Otieno; Victorine Owira; Esther Oguk; George Odongo; Jon Ben Woods; Bernhards Ogutu; Patricia Njuguna; Roma Chilengi; Pauline Akoo; Christine Kerubo; Charity Maingi; Trudie Lang; Ally Olotu; Philip Bejon; Kevin Marsh; Gabriel Mwambingu; Seth Owusu-Agyei; Kwaku Poku Asante; Kingsley Osei-Kwakye; Owusu Boahen; David Dosoo; Isaac Asante; George Adjei; Evans Kwara; Daniel Chandramohan; Brian Greenwood; John Lusingu; Samwel Gesase; Anangisye Malabeja; Omari Abdul; Coline Mahende; Edwin Liheluka; Lincoln Malle; Martha Lemnge; Thor G Theander; Chris Drakeley; Daniel Ansong; Tsiri Agbenyega; Samuel Adjei; Harry Owusu Boateng; Theresa Rettig; John Bawa; Justice Sylverken; David Sambian; Anima Sarfo; Alex Agyekum; Francis Martinson; Irving Hoffman; Tisungane Mvalo; Portia Kamthunzi; Rutendo Nkomo; Tapiwa Tembo; Gerald Tegha; Mercy Tsidya; Jane Kilembe; Chimwemwe Chawinga; W Ripley Ballou; Joe Cohen; Yolanda Guerra; Erik Jongert; Didier Lapierre; Amanda Leach; Marc Lievens; Opokua Ofori-Anyinam; Aurélie Olivier; Johan Vekemans; Terrell Carter; David Kaslow; Didier Leboulleux; Christian Loucq; Afiya Radford; Barbara Savarese; David Schellenberg; Marla Sillman; Preeti Vansadia
Journal: N Engl J Med Date: 2012-11-09 Impact factor: 91.245

7. Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform.

Authors: Melanie Schirmer; Umer Z Ijaz; Rosalinda D'Amore; Neil Hall; William T Sloan; Christopher Quince
Journal: Nucleic Acids Res Date: 2015-01-13 Impact factor: 16.971

8. Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors: Heng Li; Richard Durbin
Journal: Bioinformatics Date: 2009-05-18 Impact factor: 6.937

9. A general SNP-based molecular barcode for Plasmodium falciparum identification and tracking.

Authors: Rachel Daniels; Sarah K Volkman; Danny A Milner; Nira Mahesh; Daniel E Neafsey; Daniel J Park; David Rosen; Elaine Angelino; Pardis C Sabeti; Dyann F Wirth; Roger C Wiegand
Journal: Malar J Date: 2008-10-29 Impact factor: 2.979

10. Characterizing and measuring bias in sequence data.

Authors: Michael G Ross; Carsten Russ; Maura Costello; Andrew Hollinger; Niall J Lennon; Ryan Hegarty; Chad Nusbaum; David B Jaffe
Journal: Genome Biol Date: 2013-05-29 Impact factor: 13.583

14 in total

1. Genetic Diversity of Plasmodium falciparum and Distribution of Antimalarial Drug Resistance Mutations in Symptomatic and Asymptomatic Infections.

Authors: Elangwe M Sarah-Matio; Emilie Guillochon; Antoine Berry; Isabelle Morlais; Sandrine E Nsango; Luc Abate; Christelle M Ngou; Gaelle A Bouopda; Lionel B Feufack-Donfack; Albert N Bayibéki; Majoline Tchioffo Tsapi; Arthur Talman; Alejandro Marin-Menendez; Lawrence Ayong; Antoine Claessens; Thierry Lefèvre
Journal: Antimicrob Agents Chemother Date: 2022-07-06 Impact factor: 5.938

2. Parasites and blood-meal hosts of the tsetse fly in Tanzania: a metagenomics study.

Authors: Ju Yeong Kim; Jun Ho Choi; Sung-Hyun Nam; Robert Fyumagwa; Tai-Soon Yong
Journal: Parasit Vectors Date: 2022-06-22 Impact factor: 4.047

3. Design and implementation of multiplexed amplicon sequencing panels to serve genomic epidemiology of infectious disease: A malaria case study.

Authors: Emily LaVerriere; Philipp Schwabl; Manuela Carrasquilla; Aimee R Taylor; Zachary M Johnson; Meg Shieh; Ruchit Panchal; Timothy J Straub; Rebecca Kuzma; Sean Watson; Caroline O Buckee; Carolina M Andrade; Silvia Portugal; Peter D Crompton; Boubacar Traore; Julian C Rayner; Vladimir Corredor; Kashana James; Horace Cox; Angela M Early; Bronwyn L MacInnis; Daniel E Neafsey
Journal: Mol Ecol Resour Date: 2022-05-03 Impact factor: 8.678

4. Exposure to Diverse Plasmodium falciparum Genotypes Shapes the Risk of Symptomatic Malaria in Incident and Persistent Infections: A Longitudinal Molecular Epidemiologic Study in Kenya.

Authors: Kelsey M Sumner; Elizabeth Freedman; Judith N Mangeni; Andrew A Obala; Lucy Abel; Jessie K Edwards; Michael Emch; Steven R Meshnick; Brian W Pence; Wendy Prudhomme-O'Meara; Steve M Taylor
Journal: Clin Infect Dis Date: 2021-10-05 Impact factor: 9.079

5. A targeted amplicon sequencing panel to simultaneously identify mosquito species and Plasmodium presence across the entire Anopheles genus.

Authors: Alex Makunin; Petra Korlević; Naomi Park; Scott Goodwin; Robert M Waterhouse; Katharina von Wyschetzki; Christopher G Jacob; Robert Davies; Dominic Kwiatkowski; Brandyce St Laurent; Diego Ayala; Mara K N Lawniczak
Journal: Mol Ecol Resour Date: 2021-06-19 Impact factor: 8.678

6. Genotyping cognate Plasmodium falciparum in humans and mosquitoes to estimate onward transmission of asymptomatic infections.

Authors: Wendy Prudhomme-O'Meara; Steve M Taylor; Kelsey M Sumner; Elizabeth Freedman; Lucy Abel; Andrew Obala; Brian W Pence; Amy Wesolowski; Steven R Meshnick
Journal: Nat Commun Date: 2021-02-10 Impact factor: 14.919

Review 7. Antimalarial Drug Resistance and Implications for the WHO Global Technical Strategy.

Authors: Matthew M Ippolito; Kara A Moser; Jean-Bertin Bukasa Kabuya; Clark Cunningham; Jonathan J Juliano
Journal: Curr Epidemiol Rep Date: 2021-03-14

8. Sensitive universal detection of blood parasites by selective pathogen-DNA enrichment and deep amplicon sequencing.

Authors: Briana R Flaherty; Joel Barratt; Meredith Lane; Eldin Talundzic; Richard S Bradbury
Journal: Microbiome Date: 2021-01-02 Impact factor: 14.650

9. An open dataset of Plasmodium falciparum genome variation in 7,000 worldwide samples.

Authors: Ambroise Ahouidi; Mozam Ali; Jacob Almagro-Garcia; Alfred Amambua-Ngwa; Chanaki Amaratunga; Roberto Amato; Lucas Amenga-Etego; Ben Andagalu; Tim J C Anderson; Voahangy Andrianaranjaka; Tobias Apinjoh; Cristina Ariani; Elizabeth A Ashley; Sarah Auburn; Gordon A Awandare; Hampate Ba; Vito Baraka; Alyssa E Barry; Philip Bejon; Gwladys I Bertin; Maciej F Boni; Steffen Borrmann; Teun Bousema; Oralee Branch; Peter C Bull; George B J Busby; Thanat Chookajorn; Kesinee Chotivanich; Antoine Claessens; David Conway; Alister Craig; Umberto D'Alessandro; Souleymane Dama; Nicholas Pj Day; Brigitte Denis; Mahamadou Diakite; Abdoulaye Djimdé; Christiane Dolecek; Arjen M Dondorp; Chris Drakeley; Eleanor Drury; Patrick Duffy; Diego F Echeverry; Thomas G Egwang; Berhanu Erko; Rick M Fairhurst; Abdul Faiz; Caterina A Fanello; Mark M Fukuda; Dionicia Gamboa; Anita Ghansah; Lemu Golassa; Sonia Goncalves; William L Hamilton; G L Abby Harrison; Lee Hart; Christa Henrichs; Tran Tinh Hien; Catherine A Hill; Abraham Hodgson; Christina Hubbart; Mallika Imwong; Deus S Ishengoma; Scott A Jackson; Chris G Jacob; Ben Jeffery; Anna E Jeffreys; Kimberly J Johnson; Dushyanth Jyothi; Claire Kamaliddin; Edwin Kamau; Mihir Kekre; Krzysztof Kluczynski; Theerarat Kochakarn; Abibatou Konaté; Dominic P Kwiatkowski; Myat Phone Kyaw; Pharath Lim; Chanthap Lon; Kovana M Loua; Oumou Maïga-Ascofaré; Cinzia Malangone; Magnus Manske; Jutta Marfurt; Kevin Marsh; Mayfong Mayxay; Alistair Miles; Olivo Miotto; Victor Mobegi; Olugbenga A Mokuolu; Jacqui Montgomery; Ivo Mueller; Paul N Newton; Thuy Nguyen; Thuy-Nhien Nguyen; Harald Noedl; Francois Nosten; Rintis Noviyanti; Alexis Nzila; Lynette I Ochola-Oyier; Harold Ocholla; Abraham Oduro; Irene Omedo; Marie A Onyamboko; Jean-Bosco Ouedraogo; Kolapo Oyebola; Richard D Pearson; Norbert Peshu; Aung Pyae Phyo; Chris V Plowe; Ric N Price; Sasithon Pukrittayakamee; Milijaona Randrianarivelojosia; Julian C Rayner; Pascal Ringwald; Kirk A Rockett; Katherine Rowlands; Lastenia Ruiz; David Saunders; Alex Shayo; Peter Siba; Victoria J Simpson; Jim Stalker; Xin-Zhuan Su; Colin Sutherland; Shannon Takala-Harrison; Livingstone Tavul; Vandana Thathy; Antoinette Tshefu; Federica Verra; Joseph Vinetz; Thomas E Wellems; Jason Wendler; Nicholas J White; Ian Wright; William Yavo; Htut Ye
Journal: Wellcome Open Res Date: 2021-07-13

10. Implementing parasite genotyping into national surveillance frameworks: feedback from control programmes and researchers in the Asia-Pacific region.

Authors: Rintis Noviyanti; Olivo Miotto; Alyssa Barry; Jutta Marfurt; Sasha Siegel; Nguyen Thuy-Nhien; Huynh Hong Quang; Nancy Dian Anggraeni; Ferdinand Laihad; Yaobao Liu; Maria Endang Sumiwi; Hidayat Trimarsanto; Farah Coutrier; Nadia Fadila; Najia Ghanchi; Fatema Tuj Johora; Agatha Mia Puspitasari; Livingstone Tavul; Leily Trianty; Retno Ayu Setya Utami; Duoquan Wang; Kesang Wangchuck; Ric N Price; Sarah Auburn
Journal: Malar J Date: 2020-07-27 Impact factor: 2.979