Amanda Ross, Cristian Koepfli, Xiaohong Li, Sonja Schoepflin, Peter Siba, Ivo Mueller, Ingrid Felger, Thomas Smith.
Abstract
People living in endemic areas often harbour several malaria infections at once. High-resolution genotyping can distinguish between infections by detecting the presence of different alleles at a polymorphic locus. However, the number of infections may not be accurately counted, since parasites from multiple infections may carry the same allele. We use simulation to determine the circumstances under which the number of observed genotypes is likely to be substantially less than the number of infections present, and we investigate the performance of two methods for estimating the numbers of infections from high-resolution genotyping data. The simulations suggest that the problem is not substantial in most datasets: the disparity between the mean numbers of infections and of observed genotypes was small when there were 20 or more alleles, 20 or more blood samples, a mean number of infections of 6 or less, and a most common allele with a frequency no greater than 20%. The issue of multiple infections carrying the same allele is unlikely to be a major component of the errors in PCR-based genotyping. Simulations also showed that, with heterogeneity in allele frequencies, the observed frequencies are not a good approximation of the true allele frequencies. The first method that we proposed to estimate the numbers of infections assumes that the observed frequencies are a good approximation and hence did poorly in the presence of heterogeneity. In contrast, the second method, by Li et al., estimates both the numbers of infections and the true allele frequencies simultaneously and produced accurate estimates of the mean number of infections.
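A short worked calculation illustrates why shared alleles hide infections. It is only a sketch, assuming that each infection carries a single allele drawn independently with frequency $p_i$; the symbols $n$ (infections in a blood sample), $A$ (number of alleles) and $G$ (number of distinct alleles observed) are introduced here for illustration and are not notation from the paper.

```latex
\mathbb{E}[G \mid n] \;=\; \sum_{i=1}^{A} \bigl(1 - (1 - p_i)^{n}\bigr),
\qquad
\text{and for equal frequencies } p_i = 1/A:\quad
\mathbb{E}[G \mid n] \;=\; A\Bigl(1 - \bigl(1 - \tfrac{1}{A}\bigr)^{n}\Bigr).
```

With $A = 20$ equally frequent alleles and $n = 5$ infections this gives $20\,(1 - 0.95^{5}) \approx 4.52$, so on average about half an infection goes uncounted under these assumptions.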
Year: 2012 PMID: 22952595 PMCID: PMC3430681 DOI: 10.1371/journal.pone.0042496
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Factors investigated by the simulations.
| parameter | values |
| number of alleles | 2, 3, 4, 5, 7, 15, |
| frequency of most common allele | |
| mean number of infections | 1.13, 1.27, 1.58, 2.31, 3.16, 4.07, |
| distribution of numbers of blood samples | |
| number of blood samples | 5, 10, 15, 20, 30, 50, |
The baseline scenario is indicated by bold font.
All alleles have equal frequency.
The remaining alleles have equal frequency.
Figure 1. Impact of factors on the simulated mean number of genotypes.
a: mean number of infections; b: number of alleles; c: frequency of most common allele; d: number of blood samples. Solid line: mean number of infections; shaded polygon: minimum and maximum of mean observed number of genotypes for the 101 simulated sets of blood samples. Baseline case: 100 children, 20 alleles of equal frequency, the number of infections per blood sample follows a zero-truncated Poisson distribution with a mean of 5.03 (Table 1). Each panel shows the effect of varying one factor from the baseline case.
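The baseline case described in this caption can be sketched as a small simulation. This is a minimal illustration, not the authors' code: the function names are hypothetical, each infection is assumed to carry one independently drawn allele, and the underlying Poisson rate of 5 is an assumption chosen because its zero-truncated mean, 5 / (1 - e^-5), is about 5.03.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw from a Poisson(lam) using Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_zt_poisson(lam, rng):
    """Draw from a zero-truncated Poisson by rejecting zeros."""
    while True:
        k = sample_poisson(lam, rng)
        if k > 0:
            return k


def simulate_set(n_samples, freqs, lam, rng):
    """Simulate one set of blood samples.

    Returns (mean number of infections, mean number of observed genotypes).
    Assumption: every infection carries exactly one allele, drawn
    independently according to freqs.
    """
    alleles = list(range(len(freqs)))
    total_infections = 0
    total_genotypes = 0
    for _ in range(n_samples):
        n = sample_zt_poisson(lam, rng)
        carried = rng.choices(alleles, weights=freqs, k=n)
        total_infections += n
        total_genotypes += len(set(carried))  # identical alleles collapse into one genotype
    return total_infections / n_samples, total_genotypes / n_samples


if __name__ == "__main__":
    rng = random.Random(2012)
    equal_freqs = [1 / 20] * 20  # 20 alleles of equal frequency
    mean_inf, mean_gen = simulate_set(100, equal_freqs, 5.0, rng)
    print(f"mean infections: {mean_inf:.2f}, mean observed genotypes: {mean_gen:.2f}")
```

Repeating `simulate_set` many times and recording the minimum and maximum of the second return value would give a band comparable in spirit to the shaded polygon, under the assumptions above.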
Figure 2. Model performance: estimated numbers of infections from the simulated sets of blood samples.
LH column: Method 1; RH column: Method 2. Solid line: true mean number of infections; shaded polygon: minimum and maximum of mean observed number of genotypes for the 101 sets of blood samples for each scenario. Boxplots: estimates of the mean number of infections for each of the 101 sets. Baseline scenario: 100 children, 20 alleles of equal frequency, the number of infections per blood sample follows a zero-truncated Poisson distribution with a mean of 5.03 (Table 1).
Figure 3. Actual and observed frequency of the most common allele.
Solid line: line of equality; shaded polygon: the minimum and maximum observed frequencies of the most common allele for the 101 sets (each of 100 blood samples) for each scenario. There were 20 alleles: the frequency of the most common allele is shown on the x-axis, and the remaining 19 had equal frequency. The number of infections per blood sample followed a zero-truncated Poisson distribution with a mean of 5.03.
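The gap between actual and observed frequencies can be approximated analytically. The sketch below rests on several assumptions that are not stated in the source: each infection carries one independently drawn allele, the observed frequency of an allele is taken to be its share of all genotype detections across blood samples, and the ratio of expected detections is used as a stand-in for the expected observed frequency in a large set. The function names are hypothetical.

```python
import math


def detection_prob(p, lam):
    """P(allele with frequency p is detected in a blood sample).

    The number of infections is zero-truncated Poisson with underlying rate
    lam. By Poisson thinning, infections carrying the allele are Poisson(lam * p),
    and we condition on at least one infection in total.
    """
    return (1.0 - math.exp(-lam * p)) / (1.0 - math.exp(-lam))


def approx_observed_freqs(freqs, lam):
    """Approximate observed frequencies as each allele's share of expected detections."""
    det = [detection_prob(p, lam) for p in freqs]
    total = sum(det)
    return [d / total for d in det]


def skewed_freqs(top, n_alleles=20):
    """One allele with frequency `top`; the remaining alleles share the rest equally."""
    rest = (1.0 - top) / (n_alleles - 1)
    return [top] + [rest] * (n_alleles - 1)


if __name__ == "__main__":
    for top in (0.05, 0.2, 0.5):
        observed = approx_observed_freqs(skewed_freqs(top), lam=5.0)[0]
        print(f"actual {top:.2f} -> approximate observed {observed:.3f}")
```

Under these assumptions the observed frequency equals the actual one when all alleles are equally common and falls below it as the most common allele becomes more dominant, because repeated occurrences of that allele within a sample are counted only once.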
Figure 4. Model performance when the assumed distribution for the numbers of infections is incorrect.
The observed numbers of genotypes were simulated using numbers of infections following negative binomial distributions; however, the models assume that the numbers of infections follow a Poisson distribution. The dispersion parameter of the negative binomial distribution was set to 3. A small value of this parameter indicates larger variation and skew compared to a Poisson distribution, whereas a value of 100 is similar to a Poisson. LH: Method 1; RH: Method 2. Solid line: true mean number of infections; shaded polygon: minimum and maximum of mean observed number of genotypes for the 101 sets of blood samples for each scenario. Boxplots: estimates of the mean number of infections for each of the 101 sets. We simulated blood samples from 100 children and 20 alleles of equal frequency (Table 1).
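The role of the dispersion parameter can be made concrete with the variance of the negative binomial. This is a sketch assuming the common mean-dispersion parameterisation and ignoring any zero truncation; the symbols $N$, $\mu$ and $k$ are introduced here for illustration, since the parameter's symbol is missing from the caption.

```latex
\operatorname{Var}(N) \;=\; \mu + \frac{\mu^{2}}{k},
\qquad
\operatorname{Var}(N) \to \mu \ \text{ as } k \to \infty \ \text{(the Poisson case)}.
```

For a mean of $\mu = 5$, a dispersion of $k = 3$ gives a variance of $5 + 25/3 \approx 13.3$, well above the Poisson variance of 5, whereas $k = 100$ gives $5.25$, which is close to the Poisson value.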
Data summary and estimated mean numbers of infections.
| parameter | | | |
| number of blood samples included | 76 | 80 | 79 |
| number of alleles detected | 33 | 67 | 31 |
| frequency of most common allele | 15% | 5% | 24% |
| mean number of observed genotypes | 1.71 (1.47, 1.95) | 2.19 (1.89, 2.49) | 2.21 (1.90, 2.53) |
| maximum number of observed genotypes | 7 | 6 | 6 |
| heterozygosity | 0.93 | 0.98 | 0.87 |
| Estimated mean number of infections: | | | |
| method 1 | 1.87 (1.57, 2.22) | 2.26 (1.89, 2.80) | 2.58 (2.07, 3.28) |
| method 2 | 1.80 (1.41, 2.23) | 2.26 (1.70, 2.89) | 2.54 (2.23, 2.86) |
The expected heterozygosity is the probability that 2 clones taken at random from the population carry different alleles.
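This definition corresponds to a standard formula, given here as a sketch: the symbols $H_E$, $A$ and $p_i$ (the frequency of allele $i$) are introduced for illustration, and the source does not say whether a finite-sample correction was applied.

```latex
H_E \;=\; 1 - \sum_{i=1}^{A} p_i^{2}
```

The sum $\sum_i p_i^{2}$ is the probability that two randomly chosen clones carry the same allele, so $H_E$ approaches 1 when there are many alleles of low frequency and falls as a single allele becomes dominant.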