Josephine Malinga1,2, Polycarp Mogeni3, Irene Omedo3, Kirk Rockett4, Christina Hubbart4, Anne Jeffreys4, Thomas N Williams3,5, Dominic Kwiatkowski4,6, Philip Bejon3,7, Amanda Ross8,9.
Abstract
Knowledge of how malaria infections spread locally is important both for the design of targeted interventions aiming to interrupt malaria transmission and the design of trials to assess the interventions. A previous analysis of 1602 genotyped Plasmodium falciparum parasites in Kilifi, Kenya collected over 12 years found an interaction between time and geographic distance: the mean number of single nucleotide polymorphism (SNP) differences was lower for pairs of infections which were both a shorter time interval and shorter geographic distance apart. We determine whether the empiric pattern could be reproduced by a simple model, and what mean geographic distances between parent and offspring infections and hypotheses about genotype-specific immunity or a limit on the number of infections would be consistent with the data. We developed an individual-based stochastic simulation model of households, people and infections. We parameterized the model for the total number of infections, and population and household density observed in Kilifi. The acquisition of new infections, mutation, recombination, geographic location and clearance were included. We fit the model to the observed numbers of SNP differences between pairs of parasite genotypes. The patterns observed in the empiric data could be reproduced. Although we cannot rule out genotype-specific immunity or a limit on the number of infections per individual, they are not necessary to account for the observed patterns. The mean geographic distance between parent and offspring malaria infections for the base model was 0.5 km (95% CI 0.3-1.5), for a distribution with 58% of distances shorter than the mean. Very short mean distances did not fit well, but mixtures of distributions were also consistent with the data.
For a pathogen which undergoes meiosis in a setting with moderate transmission and a low coverage of infections, analytic methods are limited but an individual-based model can be used with genotyping data to estimate parameter values and investigate hypotheses about underlying processes.
Year: 2019 PMID: 31836742 PMCID: PMC6911066 DOI: 10.1038/s41598-019-54348-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Quantities in the simulation model.
| Quantity | Units of measurement |
|---|---|
| number of SNPs | positive integer |
| age of an infection | five-day time-step |
| time from start of simulation | five-day time-step |
| number of new infections for a given infection | Integer, greater or equal to zero |
| mean number of new infections for an infection of a given age | Numeric, greater or equal to zero |
| mean number of new infections for an infection at a given time-step | Numeric, greater or equal to zero |
| relative infectiousness to mosquitoes of an infection of a given age | Numeric, 0 ≤ |
| probability of mutation for one allele per infection cycle | Probability |
| probability of clearance per five-day time-step | Probability |
| probability of recombination conditional on multiple infections in a host | Probability |
| parameter σ for distance between parent and offspring infection (mean = σ√(2/π)) | Kilometres |
Figure 1. Examples of the half-normal distribution probability density function for positive values of the distance between parent and offspring infections. The dotted lines mark the mean of the distribution; red line = 0.4 km, blue line = 1.2 km.
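As an illustration of the curves described in this caption, the following sketch evaluates a half-normal density for a chosen mean distance. It assumes the standard half-normal parameterisation, in which the mean equals σ√(2/π); the function names are hypothetical and are not taken from the authors' code.

```python
import math


def sigma_from_mean(mean_km):
    """Scale parameter sigma of a half-normal distribution with the given mean."""
    return mean_km * math.sqrt(math.pi / 2.0)


def half_normal_pdf(x_km, sigma):
    """Half-normal density at distance x_km (zero for negative distances)."""
    if x_km < 0:
        return 0.0
    return math.sqrt(2.0 / math.pi) / sigma * math.exp(-x_km ** 2 / (2.0 * sigma ** 2))


if __name__ == "__main__":
    for mean_km in (0.4, 1.2):  # the two means marked in Figure 1
        sigma = sigma_from_mean(mean_km)
        at_zero = half_normal_pdf(0.0, sigma)
        at_mean = half_normal_pdf(mean_km, sigma)
        print(f"mean={mean_km} km sigma={sigma:.3f} km "
              f"pdf(0)={at_zero:.3f} pdf(mean)={at_mean:.3f}")
```

The density is largest at zero distance and falls away monotonically, which is why short parent-offspring distances are the most probable under this distribution.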
Inputs to the model.
| Parameter | Value | Reference |
|---|---|---|
| population size | 260,000 people | [ |
| area size (km²) | 891 | [ |
| median number of people per homestead (IQR) | 8 (6–11) | |
| median homestead density per km² (IQR) | 20 (7–58) | (a) |
| estimated parasite prevalence | 60% in 1998; 10% in 2011 | [ |
| number of infections per km² | 200 in 1998; 25 in 2011 | (b) |
| mean number of new infections per five-day time-step (μ) | Varies | (c) |
| probability of clearance per five-day time-step | 0.025 | (d)[ |
| probability of mutation per SNP | 2.0 × 10⁻⁷ | (e) |
| relative infectiousness of an infection by age of infection | Varies | (f)[ |
(a) The co-ordinates for households in the simulated 9 km by 9 km grid were randomly drawn from two independent uniform distributions.
(b) The expected number of infections was derived from microscopy and RDT prevalence data and multiplied by the MOI corresponding to the prevalence in a systematic review ([32], Malinga et al. submitted).
(c) The values of μ are fixed according to the phase of the simulation, so as to produce the numbers of infections estimated for Kilifi in (b).
(d) Assuming an exponential decay function, a probability of clearance of 0.025 per five-day time-step corresponds to a mean duration of infection of 200 days[56–58].
(e) The probability of a mutation per base pair per generation has been estimated to be 1.7 × 10⁻⁹ [59]. We multiplied this by the estimated number of generations in the liver stage, for the gametocytes and in the asexual blood-stage, each of 48 hours, before the mature gametocytes are taken up in a blood meal. The probability of at least one mutation per SNP per infection cycle is then approximately 2.0 × 10⁻⁷.
(f) Assumes an exponential decay function estimated from a period of higher infectiousness of an infection (probability of infection >5%)[41].
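The arithmetic behind notes (a), (d) and (e) can be sketched as follows. This is a minimal illustration using only the figures quoted above; the function names are hypothetical. The number of parasite generations per infection cycle is not stated in the text, so it is back-calculated here from the two quoted mutation probabilities as an assumption.

```python
import random

TIME_STEP_DAYS = 5
P_CLEARANCE = 0.025          # per five-day time-step, note (d)
MU_PER_BP_PER_GEN = 1.7e-9   # per base pair per generation, note (e)
P_MUTATION_PER_SNP = 2.0e-7  # per SNP per infection cycle, note (e)


def mean_duration_days(p_clear, step_days=TIME_STEP_DAYS):
    """Mean infection duration when clearance has a constant per-step probability."""
    # The expected number of steps until clearance is 1 / p_clear.
    return step_days / p_clear


def implied_generations(p_cycle=P_MUTATION_PER_SNP, mu=MU_PER_BP_PER_GEN):
    """Generations per cycle implied by the quoted figures (assumption, not a source value)."""
    # For small mu, P(at least one mutation in n generations) is roughly n * mu.
    return p_cycle / mu


def sample_households(n, side_km, rng):
    """Household coordinates drawn from two independent uniform distributions, as in note (a)."""
    return [(rng.uniform(0.0, side_km), rng.uniform(0.0, side_km)) for _ in range(n)]


if __name__ == "__main__":
    print("mean duration (days):", mean_duration_days(P_CLEARANCE))
    print("implied generations per cycle:", round(implied_generations(), 1))
    print("example households:", sample_households(3, 9.0, random.Random(1)))
```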
Simulated scenarios.
| Parameters to be estimated by a grid search | Value |
|---|---|
| parameter for the distance between parent and offspring infections (in kilometers)** | 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 1.00, 1.20, 1.50, 2.00, 2.50, 3.00 |
| probability of recombination resulting from multiply infected individuals | 0.00, 0.50, 1.00 |
| imported infections per 1000 people per year (model variant (ii)) | |
| total number of current infections per person (model variant (iii)) | |
| maximum number of recently seen similar genotypes for new infection to fail to establish (model variant (iv)) | |
| number of SNPs different for defining ‘similar’ genotypes (model variant (iv)) | |
| number of time-steps for counting recently seen similar genotypes (model variant (iv)) | |
| heterogeneity between houses in transmission (model variant (v)) | |
*The base model scenario is indicated by bold font. **The mean of a half-normal distribution is given by σ√(2/π).
Figure 2. Ability of the method to recover known parameter values from simulated data. Black solid line: the true parameter value of σ. Red dashed line: the log likelihood. The log likelihood is a measure of the support from the data for a parameter value. The estimated parameter value is the one which coincides with the maximum log likelihood. In this figure, the estimated parameter values are correct since they align with the black lines. The method and simulated data used the base model with the probability of recombination in multiply infected individuals set to 0.5 (details of the base model are given in the Methods section).
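The estimation step described in this caption, choosing the grid value of σ with the highest log likelihood, can be sketched as below. The σ grid is the one listed under "Simulated scenarios"; the log-likelihood function is a toy stand-in introduced purely for illustration and is not the likelihood used in the paper, which compares simulated and observed SNP differences.

```python
SIGMA_GRID_KM = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80,
                 1.00, 1.20, 1.50, 2.00, 2.50, 3.00]


def grid_search(sigmas, log_likelihood):
    """Return (best_sigma, profile), where profile maps each sigma to its log likelihood."""
    profile = {s: log_likelihood(s) for s in sigmas}
    best_sigma = max(profile, key=profile.get)
    return best_sigma, profile


def toy_log_likelihood(sigma, true_sigma=0.60):
    """Hypothetical stand-in peaked at true_sigma; NOT the paper's likelihood."""
    return -(sigma - true_sigma) ** 2


if __name__ == "__main__":
    best, profile = grid_search(SIGMA_GRID_KM, toy_log_likelihood)
    print("estimated sigma (km):", best)
```

When the data are simulated with a known σ, as in Figure 2, the method is working if the value returned by the search coincides with the value used to simulate.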
Figure 3. Patterns of the log likelihood for different values of recombination for the base model. The x-axis shows the value of σ, the parameter for the distance between parent and offspring infections. The log likelihood is a measure of support from the data for the parameter values. Red solid line: the base model with probability of recombination in multiply infected individuals set to 0.5; Green line: 0.0; Blue line: 1.0.
Estimated mean distance between parent and offspring infections for each model variant.
| Model Variant | Mean distance in km* (95% CI) |
|---|---|
| Base model | 0.5 (0.3–1.5) |
| Model variant (ii) | |
| Model variant (iii) | |
| Model variant (iv) | |
| Model variant (v) | |
*The mean of a half-normal distribution is given by σ√(2/π). In all cases, the probability of a short distance is highest and 58% of parent-offspring distances are shorter than the mean value (Fig. 1). Model variant (ii): the base model with imported infections; Model variant (iii): the base model with a limit on the number of current infections per person; Model variant (iv): the base model with immunity to recently seen similar genotypes; Model variant (v): the base model with heterogeneity in transmission at the household level.
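The 58% figure in this footnote follows from the half-normal distribution itself and does not depend on σ. A short check, assuming the standard half-normal cumulative distribution function P(X < x) = erf(x / (σ√2)); the function name is hypothetical:

```python
import math


def fraction_below_mean():
    """P(X < E[X]) for a half-normal X; the scale sigma cancels out."""
    # E[X] = sigma * sqrt(2/pi) and P(X < x) = erf(x / (sigma * sqrt(2))),
    # so P(X < E[X]) = erf(1 / sqrt(pi)).
    return math.erf(1.0 / math.sqrt(math.pi))


if __name__ == "__main__":
    print(f"{100 * fraction_below_mean():.1f}% of distances are shorter than the mean")
```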
Figure 4. Patterns of the log likelihood by σ, the parameter for the distance, for the different model variants. Red line: the base model; Blue line: the base model with imported infections (model variant ii); Brown line: the base model with a limit on the number of current infections per person (model variant iii); Purple line: the base model with immunity to recently seen genotypes (model variant iv); Green line: the base model with heterogeneity in transmission at the household level (model variant v). Model variant (vi) (combinations of the above variants) is not shown since it produced similar results.
Figure 5. Patterns of the log likelihood by σ, the parameter for the distance, for data simulated from mixture distributions. Data were simulated assuming a 50:50 mixture distribution for short and longer distances of movement. Black line: 0.05 & 3.0 km; Blue line: 0.1 & 2.5 km; Green line: 0.4 & 4.0 km.
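A 50:50 mixture of short and longer distances, as described in this caption, could be simulated along the following lines. This is a sketch only: it assumes both components are half-normal and that the paired values in the caption are the σ parameters of the two components, neither of which is stated explicitly here. The function names are hypothetical.

```python
import math
import random


def sample_mixture_distance(rng, sigma_short, sigma_long, p_short=0.5):
    """Draw one parent-offspring distance (km) from a two-component half-normal mixture."""
    sigma = sigma_short if rng.random() < p_short else sigma_long
    # The absolute value of a zero-mean normal draw is half-normal with scale sigma.
    return abs(rng.gauss(0.0, sigma))


def mixture_mean(sigma_short, sigma_long, p_short=0.5):
    """Theoretical mean of the mixture, using mean = sigma * sqrt(2/pi) per component."""
    weighted_sigma = p_short * sigma_short + (1.0 - p_short) * sigma_long
    return weighted_sigma * math.sqrt(2.0 / math.pi)


if __name__ == "__main__":
    rng = random.Random(42)
    draws = [sample_mixture_distance(rng, 0.1, 2.5) for _ in range(20000)]
    print("empirical mean (km):", round(sum(draws) / len(draws), 3))
    print("theoretical mean (km):", round(mixture_mean(0.1, 2.5), 3))
```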
Figure 6. Predicted effect of time and distance on the number of SNPs different between pairs of infections. These predictions are from the base model with the best-fitting value for mean distance (0.4 km).
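The quantity plotted here, the number of SNP differences between a pair of infections together with their separation in time and space, can be computed as in the sketch below. The record layout (`day`, `x_km`, `y_km`, `snps`) and the toy records are made up for illustration and are not data or structures from the study.

```python
import math
from itertools import combinations


def snp_differences(genotype_a, genotype_b):
    """Number of SNP positions at which two equal-length genotypes differ."""
    if len(genotype_a) != len(genotype_b):
        raise ValueError("genotypes must be typed at the same SNP positions")
    return sum(a != b for a, b in zip(genotype_a, genotype_b))


def pairwise_table(infections):
    """For every pair of infections, return (days apart, km apart, SNP differences)."""
    rows = []
    for a, b in combinations(infections, 2):
        days = abs(a["day"] - b["day"])
        km = math.hypot(a["x_km"] - b["x_km"], a["y_km"] - b["y_km"])
        rows.append((days, km, snp_differences(a["snps"], b["snps"])))
    return rows


# Made-up toy records, not data from the study.
TOY_INFECTIONS = [
    {"day": 0, "x_km": 0.0, "y_km": 0.0, "snps": "AACGT"},
    {"day": 30, "x_km": 0.3, "y_km": 0.4, "snps": "AACGA"},
    {"day": 400, "x_km": 3.0, "y_km": 4.0, "snps": "TTCGA"},
]

if __name__ == "__main__":
    for days, km, diffs in pairwise_table(TOY_INFECTIONS):
        print(f"{days:4d} days, {km:.2f} km apart: {diffs} SNP differences")
```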
Figure 7. Plot of residuals by geographic distance. The base model was used with the best-fitting value for mean geographic distance (0.4 km). We found no evidence of differences in the estimated parameter values between the different 6 km by 6 km squares, or over the course of the study period.
Figure 8. Patterns of the log likelihood by σ, the parameter for distance, for different values of the input parameters. Red solid line: reference scenario for sensitivity analysis (base model with constant transmission, 13,000 initial infections with only the warm-up and model run-in period, uniform distribution of households); Blue dashed line: reference scenario with a lower number of initial infections (1000 initial infections with an additional run-in period); Green dashed line: reference scenario with half the warm-up period (500 time-steps); Purple solid line: reference scenario with 50% clustering of households in the study area.