Angela M Early, Rachel F Daniels, Timothy M Farrell, Jonna Grimsby, Sarah K Volkman, Dyann F Wirth, Bronwyn L MacInnis, Daniel E Neafsey.
Abstract
BACKGROUND: Deep sequencing of targeted genomic regions is becoming a common tool for understanding the dynamics and complexity of Plasmodium infections, but its lower limit of detection is currently unknown. Here, a new amplicon analysis tool, the Parallel Amplicon Sequencing Error Correction (PASEC) pipeline, is used to evaluate the performance of amplicon sequencing on low-density Plasmodium DNA samples. Illumina-based sequencing of two Plasmodium falciparum genomic regions (CSP and SERA2) was performed on two types of samples: in vitro DNA mixtures mimicking low-density infections (1-200 genomes/μl) and extracted blood spots from a combination of symptomatic and asymptomatic individuals (44-653,080 parasites/μl). Three additional analysis tools (DADA2, HaplotypR, and SeekDeep) were applied to both datasets, and the precision and sensitivity of each tool were evaluated.
Keywords: Haplotype calling; Malaria; Molecular epidemiology; Molecular surveillance; Multiclonal infection; Multiplicity of infection; Plasmodium; Targeted amplicon deep sequencing; Within-host diversity
Year: 2019 PMID: 31262308 PMCID: PMC6604269 DOI: 10.1186/s12936-019-2856-1
Source DB: PubMed Journal: Malar J ISSN: 1475-2875 Impact factor: 2.979
Fig. 1 Mock and natural infection sample composition. (a) Mock infection samples were constructed from mixtures of P. falciparum and human DNA to mimic the parasite DNA concentrations found in extracted low-density infections. (b) DNA from up to five clonal cultured parasite lines was combined to create each mock sample, leading to within-sample haplotype counts of one to four. (c) Natural infection samples were previously collected and extracted from a combination of symptomatic patients and asymptomatic carriers [1]. Parasite densities were determined by blood smear.
Fig. 2 Sequencing coverage of mock and natural infection samples. Overall sequencing coverage was lower for mock infection (a) than natural infection (c) samples (Mann–Whitney U Test, P = 1 × 10⁻⁷), although natural infections had a higher proportion of samples with no reads. Total read coverage (reads combined from both amplicons) correlated weakly with parasite genome concentration for mock infections (b) and parasitaemia for natural infections (d).
Fig. 3 Identification of haplotypes in mock samples. (a) Detection of known haplotypes within the mock samples was dependent on the haplotype concentration (copies/μl) within the DNA sample. 5 μl of DNA template were used in the first round PCR amplification step prior to sequencing. Error bars represent the binomial-estimated standard deviation. (b) Across all mock samples, 31% of identified haplotypes were erroneous, but these haplotypes were generally supported by fewer reads than correct haplotypes. The number of nucleotide (nt) errors per haplotype was calculated as the nucleotide distance between an observed haplotype and the closest expected haplotype within the sample.
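The per-haplotype error count described in the Fig. 3 caption can be illustrated with a short sketch. It assumes the haplotypes are already aligned to equal length and that "nucleotide distance" means the number of mismatched positions (Hamming distance); the source does not say how insertions or deletions were counted, and the function names here are hypothetical.

```python
from typing import Sequence


def nucleotide_distance(a: str, b: str) -> int:
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def errors_to_closest_expected(observed: str, expected: Sequence[str]) -> int:
    """Distance from an observed haplotype to the nearest expected haplotype.

    Assumption: the closest expected haplotype is the one with the fewest
    mismatches; a result of 0 means the observed haplotype is correct.
    """
    if not expected:
        raise ValueError("at least one expected haplotype is required")
    return min(nucleotide_distance(observed, e) for e in expected)
```

For example, an observed haplotype `"ACGT"` in a sample expected to contain `"ACGA"` and `"TTTT"` would be scored as one nucleotide error under these assumptions.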
Fig. 4 Proportion of mock samples where the major haplotype was correctly identified. Identification of the major haplotype within a sample was less reliable at (a) low read counts and (b) low parasite genome concentrations. Samples were excluded from the analysis if the difference in prevalence between the top two haplotypes was less than 4%. Error bars represent the binomial-estimated standard deviation.
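A minimal sketch of the two calculations mentioned in the Fig. 4 caption follows. It rests on assumptions not stated in the source: that "prevalence" is a haplotype's within-sample proportion (here taken from a hypothetical dictionary of per-haplotype values such as read counts), and that the binomial-estimated standard deviation of a proportion p over n samples is sqrt(p(1 - p)/n). The function names and the `min_gap` parameter are illustrative only.

```python
import math
from typing import Dict, Optional


def call_major_haplotype(
    counts: Dict[str, float], min_gap: float = 0.04
) -> Optional[str]:
    """Return the most prevalent haplotype, or None if the sample is excluded.

    A sample is excluded when it has no data or when the top two haplotypes
    differ in within-sample proportion by less than `min_gap` (4% by default).
    """
    total = sum(counts.values())
    if total <= 0:
        return None
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[0][1] / total
    second = ranked[1][1] / total if len(ranked) > 1 else 0.0
    if top - second < min_gap:
        return None
    return ranked[0][0]


def binomial_sd(p: float, n: int) -> float:
    """Standard deviation of an observed proportion p estimated from n trials."""
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1.0 - p) / n)
```

Under these assumptions, a sample with values `{"A": 51, "B": 49}` would be excluded (a 2% gap), while `{"A": 60, "B": 40}` would yield `"A"` as the major haplotype.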
Fig. 5 Error rates are higher for samples with low read counts and/or low parasite density. Sensitivity and precision are affected by (a) read count per amplicon and (b) parasite genome concentration. All results were obtained with the PASEC pipeline on the full set of mock samples using only minimal filtration. 95% confidence intervals were estimated with 1000 bootstrapped data set replicates.
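The sensitivity, precision, and bootstrapped confidence intervals reported in these captions can be sketched as below. This is an illustration under stated assumptions rather than the authors' procedure: sensitivity is taken as the fraction of expected haplotypes that were observed, precision as the fraction of observed haplotypes that were expected, both pooled across samples; the bootstrap resamples whole samples with replacement and uses percentile intervals. The `Sample` type and function names are hypothetical.

```python
import random
from typing import List, Sequence, Set, Tuple

# Each sample is (expected haplotypes, observed haplotypes).
Sample = Tuple[Set[str], Set[str]]


def sensitivity_precision(samples: Sequence[Sample]) -> Tuple[float, float]:
    """Pooled sensitivity and precision across a collection of samples."""
    true_pos = sum(len(exp & obs) for exp, obs in samples)
    n_expected = sum(len(exp) for exp, _ in samples)
    n_observed = sum(len(obs) for _, obs in samples)
    sens = true_pos / n_expected if n_expected else float("nan")
    prec = true_pos / n_observed if n_observed else float("nan")
    return sens, prec


def _percentile_interval(values: List[float], alpha: float) -> Tuple[float, float]:
    """Percentile interval over the non-NaN replicate values."""
    kept = sorted(v for v in values if v == v)  # v == v is False for NaN
    if not kept:
        return float("nan"), float("nan")
    lo_idx = int((alpha / 2) * len(kept))
    hi_idx = max(int((1 - alpha / 2) * len(kept)) - 1, lo_idx)
    return kept[lo_idx], kept[hi_idx]


def bootstrap_ci(
    samples: Sequence[Sample],
    n_replicates: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Bootstrap (sensitivity CI, precision CI) by resampling samples."""
    rng = random.Random(seed)
    sens_vals: List[float] = []
    prec_vals: List[float] = []
    for _ in range(n_replicates):
        replicate = [rng.choice(samples) for _ in samples]
        sens, prec = sensitivity_precision(replicate)
        sens_vals.append(sens)
        prec_vals.append(prec)
    return _percentile_interval(sens_vals, alpha), _percentile_interval(prec_vals, alpha)
```

Resampling at the level of whole samples keeps each sample's expected and observed haplotypes together; the source does not specify the resampling unit, so other choices (for example, resampling individual haplotype calls) are equally plausible.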
Fig. 6 Sensitivity and precision of five analysis pipelines for the detection of haplotypes in mock samples. (a) Analysis approaches vary more in precision than in sensitivity. (b) Performance of all pipelines improves when considering only samples that had at least 100 reads for an individual amplicon. Data shown include results from both the CSP and SERA2 amplicons. 95% confidence intervals were estimated with 1000 bootstrapped data set replicates.
Sensitivity and precision of each pipeline (mean [95% CI])
| | DADA2 | HaplotypR | PASEC | SeekDeep1x | SeekDeep2x |
|---|---|---|---|---|---|
| All samples | |||||
| Sensitivity | |||||
| All | 0.66 [0.62, 0.70] | 0.66 [0.62, 0.70] | 0.71 [0.68, 0.75] | 0.72 [0.68, 0.76] | 0.62 [0.56, 0.68] |
| | 0.66 [0.61, 0.71] | 0.64 [0.59, 0.70] | 0.70 [0.64, 0.75] | 0.70 [0.65, 0.75] | 0.61 [0.53, 0.69] |
| | 0.65 [0.59, 0.70] | 0.68 [0.62, 0.74] | 0.73 [0.68, 0.78] | 0.73 [0.68, 0.79] | 0.63 [0.55, 0.71] |
| Precision | |||||
| All | 0.81 [0.77, 0.84] | 0.88 [0.85, 0.90] | 0.81 [0.78, 0.85] | 0.25 [0.23, 0.27] | 0.68 [0.63, 0.74] |
| | 0.72 [0.67, 0.77] | 0.94 [0.91, 0.97] | 0.86 [0.81, 0.89] | 0.26 [0.23, 0.28] | 0.77 [0.69, 0.84] |
| | 0.91 [0.87, 0.94] | 0.82 [0.78, 0.86] | 0.77 [0.72, 0.82] | 0.25 [0.22, 0.28] | 0.61 [0.53, 0.68] |
| Samples with ≥ 100 reads | |||||
| Sensitivity | |||||
| All | 0.83 [0.80, 0.86] | 0.84 [0.81, 0.86] | 0.83 [0.81, 0.86] | 0.83 [0.80, 0.86] | 0.78 [0.78, 0.87] |
| | 0.82 [0.78, 0.86] | 0.82 [0.77, 0.86] | 0.82 [0.78, 0.86] | 0.82 [0.78, 0.86] | 0.84 [0.78, 0.89] |
| | 0.85 [0.80, 0.89] | 0.86 [0.81, 0.90] | 0.85 [0.80, 0.89] | 0.85 [0.81, 0.89] | 0.82 [0.75, 0.88] |
| Precision | |||||
| All | 0.83 [0.80, 0.86] | 0.89 [0.87, 0.92] | 0.92 [0.90, 0.94] | 0.26 [0.24, 0.28] | 0.79 [0.74, 0.84] |
| | 0.75 [0.70, 0.79] | 0.94 [0.91, 0.96] | 0.95 [0.92, 0.97] | 0.27 [0.24, 0.30] | 0.88 [0.83, 0.93] |
| | 0.92 [0.88, 0.95] | 0.84 [0.80, 0.88] | 0.90 [0.86, 0.93] | 0.25 [0.22, 0.28] | 0.71 [0.63, 0.78] |
Fig. 7 Mean COI estimates for four sub-Saharan African study sites made by the five analysis pipelines. COI was defined as the maximum number of haplotypes retrieved for the sample from either of the two amplicons. Amplicon-specific estimates are found in Additional file 1: Fig. S11.
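The COI definition in the Fig. 7 caption translates directly into a small calculation. The sketch below assumes each sample is represented as a mapping from amplicon name to the set of haplotypes retrieved for it; this data layout and the function names are hypothetical.

```python
from typing import Dict, Sequence, Set


def sample_coi(haplotypes_by_amplicon: Dict[str, Set[str]]) -> int:
    """COI for one sample: the largest haplotype count over its amplicons."""
    return max((len(h) for h in haplotypes_by_amplicon.values()), default=0)


def mean_coi(samples: Sequence[Dict[str, Set[str]]]) -> float:
    """Mean COI across the samples from one study site."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(sample_coi(s) for s in samples) / len(samples)
```

For instance, a sample with two CSP haplotypes and one SERA2 haplotype would have a COI of 2 under this definition.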