Marcia C Castro1, Mathieu Maheu-Giroux2, Christinah Chiyaka3,4, Burton H Singer3. 1. Department of Global Health and Population, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America. 2. Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom. 3. Emerging Pathogens Institute, University of Florida, Gainesville, Florida, United States of America. 4. School of Social and Community Medicine, University of Bristol, Bristol, United Kingdom.
Abstract
Methodology to estimate malaria incidence rates from a commonly occurring form of interval-censored longitudinal parasitological data-specifically, 2-wave panel data-was first proposed 40 years ago based on the theory of continuous-time homogeneous Markov Chains. Assumptions of the methodology were suitable for settings with high malaria transmission in the absence of control measures, but are violated in areas experiencing fast decline or that have achieved very low transmission. No further developments that can accommodate such violations have been put forth since then. We extend previous work and propose a new methodology to estimate malaria incidence rates from 2-wave panel data, utilizing the class of 2-component mixtures of continuous-time Markov chains, representing two sub-populations with distinct behavior/attitude towards malaria prevention and treatment. Model identification, or even partial identification, requires context-specific a priori constraints on parameters. The method can be applied to scenarios of any transmission intensity. We provide an application utilizing data from Dar es Salaam, an area that experienced steady decline in malaria over almost five years after a larviciding intervention. We conducted sensitivity analysis to account for possible sampling variation in input data and model assumptions/parameters, and we considered differences in estimates due to submicroscopic infections. Results showed that, assuming defensible a priori constraints on model parameters, most of the uncertainty in the estimated incidence rates was due to sampling variation, not to partial identifiability of the mixture model for the case at hand. Differences between microscopy- and PCR-based rates depend on the transmission intensity. Leveraging on a method to estimate incidence rates from 2-wave panel data under any transmission intensity, and from the increasing availability of such data, there is an opportunity to foster further methodological developments, particularly focused on partial identifiability and the diversity of a priori parameter constraints associated with different human-ecosystem interfaces. As a consequence there can be more nuanced planning and evaluation of malaria control programs than heretofore.
Methodology to estimate malaria incidence rates from a commonly occurring form of interval-censored longitudinal parasitological data-specifically, 2-wave panel data-was first proposed 40 years ago based on the theory of continuous-time homogeneous Markov Chains. Assumptions of the methodology were suitable for settings with high malaria transmission in the absence of control measures, but are violated in areas experiencing fast decline or that have achieved very low transmission. No further developments that can accommodate such violations have been put forth since then. We extend previous work and propose a new methodology to estimate malaria incidence rates from 2-wave panel data, utilizing the class of 2-component mixtures of continuous-time Markov chains, representing two sub-populations with distinct behavior/attitude towards malaria prevention and treatment. Model identification, or even partial identification, requires context-specific a priori constraints on parameters. The method can be applied to scenarios of any transmission intensity. We provide an application utilizing data from Dar es Salaam, an area that experienced steady decline in malaria over almost five years after a larviciding intervention. We conducted sensitivity analysis to account for possible sampling variation in input data and model assumptions/parameters, and we considered differences in estimates due to submicroscopic infections. Results showed that, assuming defensible a priori constraints on model parameters, most of the uncertainty in the estimated incidence rates was due to sampling variation, not to partial identifiability of the mixture model for the case at hand. Differences between microscopy- and PCR-based rates depend on the transmission intensity. Leveraging on a method to estimate incidence rates from 2-wave panel data under any transmission intensity, and from the increasing availability of such data, there is an opportunity to foster further methodological developments, particularly focused on partial identifiability and the diversity of a priori parameter constraints associated with different human-ecosystem interfaces. As a consequence there can be more nuanced planning and evaluation of malaria control programs than heretofore.
Estimation of incidence, and in some cases recovery, rates for malaria infection is a central objective of ongoing community surveillance programs [1]. By incidence rate we mean the number of new infections acquired in a small interval of time per person at risk (i.e. uninfected) at the beginning of the interval. Analogously, a recovery rate is the number of terminations of infection in a small interval of time per person at risk (i.e. infected) at the beginning of the interval.It is important to observe that incidence rate, as defined above, is not the same as incidence, as commonly used in the contemporary malaria literature [e.g., see reference 2]. In the latter case, incidence is defined as [Number of positive species-specific clinical cases observed during the duration of a survey]/[(Number of people observed over the survey duration) * (duration of survey)]. We quantitatively compare and contrast these notions in the Discussion section.Ideally one would like to have continuous histories of infection status on designated populations so that incidence rates could be directly ascertained over short intervals starting at any designated time. This would facilitate showing the impact of local-in-time weather events, seasonal variation, and the impact of intervention strategies on these rates at arbitrary times of interest to health service workers, government personnel setting malaria control policy, and research investigators.Continuous infection status histories are virtually never available at a community level. The most common longitudinal data collection plan is a time series of 2-wave panel data sets (or observations of infection status on the same individuals taken at two time points separated by an interval of length Δ), with spacing between waves varying from a few weeks [3,4] to several months [5]. Thus, there are unobserved transitions between states (uninfected and infected) that create a challenge for estimation of incidence and recovery rates. To address this challenge, we require specification of a class of continuous-time 2-state stochastic process models of infection status dynamics that must be shown to be consistent with observed data, and within which estimation of incidence and recovery rates is feasible.The first attempt to carry out this program was by Bekessy et al. [6] using data from the Garki malaria surveys [3] in Kano State, Nigeria from 1970–1975. They introduced the 2-state continuous time Markov chains as candidates to represent the unobserved infection dynamics. The empirical question associated with this choice was whether or not a transition matrix P(Δ) generated by a member of this class of models could represent a transition matrix arising from field data. Here Δ is the time interval between observations collected at a survey date T1 and a later survey date T2. P(Δ) is a 2x2 transition matrix associated with a continuous time Markov chain with entries p(Δ) = conditional probability that an individual has infection status j at the end of a time interval of length Δ given that his/her infection status at the beginning of the interval is i. Here, i and j can take on the values 1 = uninfected or 2 = infected. is a 2x2 stochastic matrix with entries n/(n + n), where n counts the number of individuals observed to be in state i at the beginning of interval Δ and in state j at the end of the interval. The entries are interpreted as conditional frequencies of observing a transition i → j between the consecutive survey dates.For the Garki surveys, Δ ≈ 10 weeks. The transition matrices P(Δ) have a representation in terms of incidence and recovery rates given by:Here q1 is the incidence rate at any time t in the interval Δ, and q2 is the corresponding recovery rate. Bekessy et al. [6] showed that a 2x2 stochastic matrix, P*, has a representation of the form Eq (1) if and only ifThus, the empirically determined stochastic matrices can be generated by observations taken at two points in time on a continuous time Markov chain provided trace . Formal statistical tests of this hypothesis were put forth by Singer & Cohen [7].Interestingly, Bekessy et al. [6], Singer & Cohen [7], and Molineaux & Gramiccia [3] found that all pairs of consecutive surveys during the baseline period of data collection in the Garki project satisfied eq (2). However, there were pairs of consecutive surveys conducted during the intervention phase of the project where trace . In these instances, Molineaux & Gramiccia [3] claimed that incidence and recovery rates were not estimable. While this assertion is correct for the class of model eq (1), the unanswered question as of 1980 was: what alternative and substantively defensible models could generate transition matrices satisfying trace and within which incidence and recovery rates could be estimated?Surprisingly, to the best of our knowledge, this question has not been taken up in the past 40 years. Nevertheless, its importance stems from the fact that 2-wave panel data in a diversity of malaria surveys/surveillance projects have a majority of their transition matrices, , satisfying trace . In addition, incidence and recovery rates are important quantities for evaluation of malaria intervention programs.The purposes of this paper are to: (i) present a class of models with associated 2x2 transition matrices, P*, within which those satisfying eq (1) are nested, some members of which satisfy trace P* < 1, and for all of which incidence and recovery rates can be calculated; (ii) exhibit identifiability and/or partial identifiability criteria arising from specific malaria contexts that ensure uniqueness, or highly constrained non-uniqueness of incidence and recovery rates; and (iii) apply the models and methods in (i) and (ii) to a time series of 2-wave panel data sets from Dar es Salaam, Tanzania as an example of the applicability of the proposed method [5,8,9]. To facilitate dissemination and utilization of the methodology by the malaria community, we developed a code to calculate incidence rates from longitudinal data utilizing the R package v.2.15.1 [10], which we make available as Supporting Information in a documented version (S1 Code) and as a R file (S2 Code). To allow replication of our results, we provide the Dar es Salaam data utilized in this paper (Tables 1 and 2).
Table 1
Elements of the 2x2 matrix and time interval between two consecutive survey rounds, extracted from the UMCP data, and used to estimate incidence rates by wards.
Consecutive pairs of survey rounds
Municipality
Ward
n11
n12
n21
n22
Time Interval
R12
Kinondoni
Magomeni
78
7
50
8
42.1
R12
Kinondoni
Mikocheni
76
12
13
1
40.2
R12
Kinondoni
Mwananyamala
35
23
2
5
28.3
R12
Kinondoni
Mzimuni
82
8
4
0
34.8
R12
Kinondoni
Ndugumbi
111
12
81
14
40.0
R12
Temeke
Azimio
28
48
1
8
29.1
R12
Temeke
Keko
64
2
2
0
32.8
R12
Temeke
Kurasini
115
20
46
7
41.6
R12
Temeke
Miburani
92
22
8
1
37.8
R12
Temeke
Mtoni
95
15
49
12
39.0
R12
Ilala
Buguruni
39
5
1
0
32.5
R12
Ilala
Ilala
104
5
47
2
40.1
R12
Ilala
Kipawa
88
18
59
6
42.0
R12
Ilala
Mchikichini
150
28
15
3
36.8
R12
Ilala
Vingunguti
32
27
2
2
29.1
R23
Kinondoni
Magomeni
146
26
23
4
41.8
R23
Kinondoni
Mikocheni
99
13
25
4
36.4
R23
Kinondoni
Mwananyamala
145
14
53
8
40.5
R23
Kinondoni
Mzimuni
138
18
23
5
40.6
R23
Kinondoni
Ndugumbi
142
6
29
1
39.0
R23
Temeke
Azimio
91
9
82
19
40.9
R23
Temeke
Keko
164
29
20
1
41.3
R23
Temeke
Kurasini
124
23
19
6
41.1
R23
Temeke
Miburani
121
23
18
2
39.5
R23
Temeke
Mtoni
111
12
29
1
39.0
R23
Ilala
Buguruni
148
22
32
4
42.2
R23
Ilala
Ilala
151
7
24
0
38.5
R23
Ilala
Kipawa
155
26
49
5
41.1
R23
Ilala
Mchikichini
167
7
33
6
38.5
R23
Ilala
Vingunguti
115
7
52
3
40.8
R34
Kinondoni
Magomeni
214
14
36
3
42.6
R34
Kinondoni
Mikocheni
171
7
22
0
44.2
R34
Kinondoni
Mwananyamala
197
13
42
2
45.9
R34
Kinondoni
Mzimuni
178
5
34
0
43.3
R34
Kinondoni
Ndugumbi
219
10
17
1
44.7
R34
Temeke
Azimio
99
18
27
3
45.3
R34
Temeke
Keko
137
7
33
0
43.5
R34
Temeke
Kurasini
212
8
41
1
43.5
R34
Temeke
Miburani
221
3
37
2
45.6
R34
Temeke
Mtoni
176
15
23
1
45.4
R34
Ilala
Buguruni
128
8
31
1
42.1
R34
Ilala
Ilala
161
2
11
0
45.6
R34
Ilala
Kipawa
210
9
40
1
42.5
R34
Ilala
Mchikichini
182
14
17
1
45.9
R34
Ilala
Vingunguti
125
14
10
2
45.5
R45
Kinondoni
Magomeni
186
3
15
0
36.2
R45
Kinondoni
Mikocheni
224
3
15
0
35.3
R45
Kinondoni
Mwananyamala
209
10
7
1
37.5
R45
Kinondoni
Mzimuni
201
1
8
0
39.1
R45
Kinondoni
Ndugumbi
154
3
15
0
36.1
R45
Temeke
Azimio
115
1
16
0
39.1
R45
Temeke
Keko
135
0
12
0
43.6
R45
Temeke
Kurasini
187
3
9
0
34.1
R45
Temeke
Miburani
92
3
3
0
38.1
R45
Temeke
Mtoni
110
10
16
1
36.5
R45
Ilala
Buguruni
178
27
7
1
38.8
R45
Ilala
Ilala
159
14
11
1
32.9
R45
Ilala
Kipawa
169
14
10
1
35.6
R45
Ilala
Mchikichini
139
11
12
1
33.7
R45
Ilala
Vingunguti
151
28
15
6
37.2
R56
Kinondoni
Magomeni
46
0
2
0
52.3
R56
Kinondoni
Mikocheni
143
2
2
0
50.8
R56
Kinondoni
Mwananyamala
87
0
6
0
51.1
R56
Kinondoni
Mzimuni
116
1
3
0
49.7
R56
Kinondoni
Ndugumbi
66
0
0
0
52.6
R56
Temeke
Azimio
67
1
0
0
51.1
R56
Temeke
Keko
104
1
3
0
48.9
R56
Temeke
Kurasini
63
0
1
0
49.1
R56
Temeke
Miburani
177
5
9
0
51.7
R56
Temeke
Mtoni
93
0
4
0
55.6
R56
Ilala
Buguruni
80
3
13
0
49.2
R56
Ilala
Ilala
50
2
5
0
46.3
R56
Ilala
Kipawa
146
7
12
0
56.5
R56
Ilala
Mchikichini
89
3
6
0
61.6
R56
Ilala
Vingunguti
106
2
6
0
50.7
Note: The column Time Interval (in weeks) represents the input in the R code (S2 Code). The columns n11, n12, n21, and n22 are the elements of the 2x2 matrix described as in the R code.
Table 2
Elements of the 2x2 matrix and time interval (in weeks) between two consecutive survey rounds, extracted from the UMCP data, and used to estimate incidence rates by stage of the survey round.
Consecutive pairs of survey rounds
Stage
n11
n12
n21
n22
Time Interval
R12
I
95
98
5
15
28.9
R12
II
185
15
7
0
33.7
R12
III
318
62
36
5
37.9
R12
IV
310
32
177
28
39.7
R12
V
281
45
155
21
41.9
R23
I
351
30
187
30
40.7
R23
II
450
69
75
10
41.4
R23
III
387
43
76
12
38.2
R23
IV
404
25
82
2
38.8
R23
V
425
75
91
15
41.3
R34
I
421
45
79
7
45.6
R34
II
443
20
98
1
43.0
R34
III
574
24
76
3
45.3
R34
IV
556
27
51
2
45.2
R34
V
636
31
117
5
42.8
R45
I
475
39
38
7
37.8
R45
II
514
28
27
1
40.1
R45
III
455
17
30
1
35.3
R45
IV
423
27
42
2
35.0
R45
V
542
20
34
1
35.3
R56
I
260
3
12
0
51.0
R56
II
300
5
19
0
49.3
R56
III
409
10
17
0
54.1
R56
IV
209
2
9
0
52.5
R56
V
255
7
15
0
54.0
Note: The column Time Interval represents the input in the R code (S2 Code). The columns n11, n12, n21, and n22 are the elements of the 2x2 matrix described as in the R code.
Note: The column Time Interval (in weeks) represents the input in the R code (S2 Code). The columns n11, n12, n21, and n22 are the elements of the 2x2 matrix described as in the R code.Note: The column Time Interval represents the input in the R code (S2 Code). The columns n11, n12, n21, and n22 are the elements of the 2x2 matrix described as in the R code.An important feature of our considerations is allowance for non-identifiability in situations where subject-matter-based constraints are too weak to ensure unique parameter identification from input information. Depending upon the particular setting, the extent of non-identifiability—or partial identifiability—may be sufficiently limited that the unidentified parameters are restricted to a narrow range of values, with accompanying incidence rates also varying over a small range. This is, indeed, the situation in Dar es Salaam, the site of our empirical data. As a consequence, we show how to explicitly incorporate variability due to a small degree of under-identification together with sampling variability to produce composite uncertainty intervals for incidence rates. Useful discussions of partial identifiability, including in the setting of mixture models, are given by Manski [11], Gustafson [12], and Henry et al. [13].
Methods
Ethics statement
Ethical approval to use the UMCP data was provided by the Harvard T.H. Chan School of Public Health Institutional Review Board (Protocol # 20323–101).
Two-component mixtures of continuous time Markov chains
We begin with the observation that every 2x2 stochastic matrix, P, with non-zero entries can be a transition matrix for at least one 2-component mixture of continuous-time Markov chains. Thus, P has a representation of the form
where S is a diagonal matrix with entries s, i = 1,2 in the unit interval, [0,1]. U and V are each stochastic matrices having representations of the form eq (1). Hence they satisfy the condition eq (2): trace U > 1 and trace V > 1. It will be convenient to represent the matrices P, U, and V as points in the unit square; namely p = (p1,p2), u = (u1,u2), and v = (v1,v2). The coordinates p,u, and v, i = 1,2 are the diagonal entries in P, U, and V respectively.Depending on the empirical setting, we will also have occasion to consider data of the form p = (p1,0) and p = (0,p2); i.e. points on the boundary of the unit square. In the first of these conditions, P has a representation of the form eq (3), but with u = (1,0) (hence, trace U = 1) and trace V > 1. For p = (0,p2), the representation eq (3) also holds but now with u = (0,1) and trace V > 1. Data of the form p = (p1,0) occur frequently in the Dar es Salaam data, as discussed later. In these boundary cases, U still has a representation of the form eq (1) but with q = (0,−∞) when p = (p1,0), and q = (−∞,0) when p = (0,p2). In the first case, the interpretation of q2 = ∞ is that there is an infinitely fast recovery rate. This would be associated with a population where everyone is on prophylaxis, or where effective anti-malarial drugs are administered immediately following a diagnosis of infection. In the second case, q1 = ∞ corresponds to an infinitely fast incidence rate. This would be a situation where there is instantaneous new infection of any exposed individual.For data corresponding to p in the interior of the unit square—i.e. with non-zero entries—we calculate the incidence and recovery rates for the mixture eq (3) via:In terms of U and V, the rates and , i = 1,2 can be expressed as (note that these formulas are entries in matrix logarithms of and – see e.g. [7]):When p = (p1,0), we haveThe recovery rate is ∞ for every s2 > 0. When s2 = 0, we haveAn analogous argument yields r1 and r2 when p = (0,p2).
Identification
We first re-express Eq (3) via the system of equations
where (s,u,v) ∈ [0,1]6 for i = 1,2 and u1+u2>1, v1+v2>1.Given p = (p1,p2), eq (6) is an under-identified system. Additional subject-matter motivated constraints must be imposed to either identify (s,u,v) uniquely or restrict this vector to a small set in [0,1]6∩((u,v):u1+u2>1, v1+v2>1). In the context of malaria in Dar es Salaam, we impose the constraints: u1 ≤ 0.2, u2 = 1, 0.9A full rationale for the above restrictions will be given later when we present the Dar es Salaam case study. However, the central point here is that a system of constraints such as these is essential for parameter identification or partial identification. The conditions that v1 − v2 be ‘large’ and v1 < 1 require additional comment. First, it is a matter of judgment about what is a high probability of being observed uninfected at consecutive surveys of the V-process, while still not being a sure thing—i.e. v1 = 1. In identifying parameters, we first select v1 ∈[0.9,1) and secondarily choose v2 as small as possible consistent with the other constraints. Two examples will serve to illustrate the issues.
Example 1
Suppose p = (p1,p2) = (0.7,0.2). This is a case where trace P = p1 + p2 < 1. Now, set v1 = 0.95. Then (s1,u1) is a point on the curve 0.95s1 − s1u1 − 0.25 = 0, subject to the constraints u1 ≤ 0.2 and s1 < 0.5. This is a curve of admissibility for (s1,u1). Then, from 0.20 = s2 + (1 − s2)v2 and v1 + v2 > 1, we have v2 = [0.20 − s2]/(1 − s2) > 1 − v1 = 0.05. The left hand equality represents a curve of admissibility for (s2,v2). The two curves represent the extent of non-identifiability of the parameters in eq (6) given the a priori constraints for Dar es Salaam. This variation could be further reduced if, for example, we specified s1 based on additional empirical evidence.Some judgment enters in choosing v2 > 0.05. We want v2 close to 0.05 to make v1 − v2 ‘large’, but not so close that v1 + v2 = 1 + ε yields an unreasonably small value of ε. The concern here is that the contribution to the overall incidence rate from the V-process is given by , where Δ is the time between consecutive surveys. As ε → 0, . This is, of course, physically impossible. Thus, we choose ε = 0.008 based on the fact that larger values would not have much influence on , while smaller values, by another power of 10 or more, would yield unrealistically high conversion rates from negative to positive parasitological status. Now, ε = 0.008 implies that v2 = 0.058 and s2 = 0.15. This leaves (s1,u1) on the curve 0.95s1 − s1u1 − 0.25 = 0 as the only source of non-identifiability. Table 3 shows the impact of this source of variation on the incidence rate r1.
Table 3
Influence of variation in (s1,u1)on the incidence rate r1.
s1
u1
r1
0.263
0.006
0.0381
0.300
0.117
0.0204
0.320
0.168
0.0184
0.330
0.192
0.0177
0.333
0.199
0.0175
In Table 3, , with and . We set Δ = 40 weeks, which is roughly the average spacing between consecutive surveys in the Dar es Salaam data, as described later. The point (s1,u1) = (0.263, 0.006) corresponds to a very low probability of being observed uninfected at two consecutive surveys of the U-process. Thus, we retain the other 4 points as plausible values and include the variation in r1 corresponding to them in our overall assessment of variability in the incidence rate. We again emphasize that this is variability induced by non-identifiability. It is an important feature of the methodology introduced herein. Overall variation in incidence rates is a composite of sampling variability and non-identifiability variation, detailed later.
Example 2
Suppose p = (p1,0) with p1 > 0.5. With v1 + v2 = 1 + ε, if we set ε = 0.008 as in example 1, then r1 is given by eq (5) and the model is identified. The values of v1 and v2 are determined by the choice of μ > 0 in the specification v1 = (1 − μ)p1 with μ small. As a numerical example, suppose p1 = 0.9 and μ = 0.05. Then v1 = 0.95 x 0.9 = 0.855. Since 0.9 = s1 + (1 − s1)v1, we have s1 = 0.31. Then v2 = 1.008 − 0.855 = 0.153 and v1 − v2 = 0.702. Finally, we have , since . Then r1 = 0.01217 using Δ = 40 weeks.Although it does not arise in the Dar es Salaam example, we briefly indicate identification of v when p = (p1,0) and p1 < 0.5. Here we must have v2 > v1 as seen in the following simple, but generic, example. Suppose p1 = 0.4 and v1 = 0.95 p1 = 0.38. From 0.4 = s1 + (1 − s1)v1, we obtain s1 = 0.0322. Then from v1 + v2 = 1 + ε, we have v2 = 1 + ε − v1, and with ε = 0.008, v2 = 0.628. Then v1 − v2 = −0.248.
Assessing variability of incidence rates
When p = (p1,p2) is in the interior of the unit square, we generate 1,000 tables by doing binomial sampling for row 1 with probability p1 and for row 2 with probability p2 with sample sizes n11 + n12 and n21 + n22, respectively. If, for a particular table, the system of eq (6) has a unique solution (s,u,v), subject to context-specific constraints, then we compute an incidence rate, r1, for that table. If there is a zone of non-identifiability, as previously exemplified by the equation 0.95s1 − s1u1 − 0.25 = 0 in Example 1, then we compute r1 for each of 100 values u1 (which then determines s1) subject to the a priori constraints on s1 and u1. This yields a set of incidence rates that reflect variation due to non-identification. We used the minimum, median, and maximum values of r1 from each such set of 100 values and viewed them as the summary rates for the particular table. Finally we take the summary rates, for tables where non-identifiability is an issue, and the unique rate for tables when the system eq (6) is identified, and rank this full set of rates. We designate the 2.5th percentile and the 97.5th percentile of the ranked list as the upper and lower bounds on a 95% variation interval for the incidence rate of the original table. This takes both sampling variability and variation due to non-identifiability into account.When p = (p1,0), we treat the 0 as a structural zero—in the case of Dar es Salaam—and only do binomial sampling on the first row to generate 1,000 tables having this same structure. We then describe the variation in r1 in the same manner as indicated above. In applications where p = (p1,0) does not have a structural zero, we perturb the second coordinate to a small value—e.g. 10−5 or less—and do binomial sampling for the second row with this value. Then we proceed as in the above paragraph to calculate a confidence interval for r1.
Submicroscopic infection
Light microscopy has limitations as a technology for diagnosing Plasmodium
falciparum infections, particularly in low transmission settings [14-16]. In a recent study of Okell et al. [16], the supplementary information for the paper contains an especially interesting and useful table comparing prevalence estimates using microscopy and PCR on the same blood samples. The data come from a wide variety of settings, and exhibit considerable variation in prevalence rates as ascertained via microscopy. The prevalence ratio, p = [prevalence rate from microscopy]/[prevalence rate from PCR] provides a basis for adjusting empirical microscopy rates to what you would expect to find if PCR had actually been done on the same blood samples. This calculation will, of course, only yield adjusted prevalence rates. For our longitudinal data, it would have been desirable to have microscopy and PCR based estimates of p12,p21 and p22, from which we could directly recover p11. However, the lack of identifiability of transition probabilities from prevalence rates can still be dealt with in particular settings, such as Dar es Salaam, by invoking an additional, and obviously context dependent, constraint. We exhibit the methodology on p = (0.7, 0.2)–the transition rates in example 1–augmented by a table of counts with entries n consistent with these values. Using an adjusted table of counts, , and thereby an adjusted vector, , we calculate the incidence rate, , that represents what we might have expected to find if PCR had been done on the blood samples in Dar es Salaam.We introduce the table of counts {n,1 ≤ i, j ≤ 2} with n11 = 100, n12 = 43, n21 = 90, and n22 = 23. For this table, p = (0.7, 0.2). It is one of a myriad of tables that could have been selected to illustrate our points about submicroscopic infection. However, it is comparable in size to many of the sub-ward tables in the Dar es Salaam data, and thus especially useful for illustrating an adjustment methodology.The prevalences at the initial and final rounds of data collection for the above table are: at initial survey = [90 + 23]/256 = 0.4414, and at final survey = [23 + 43]/256 = 0.2578. From Table S1 in Okell et al. [16], we find the prevalence from microscopy that is closest to the prevalence at initial survey in Dar es Salaam given by 0.4414. This is the prevalence of 0.481 based on data from Guinea Bissau. The corresponding prevalence ratio is p = 0.551. Thus our estimate for a corresponding PCR-based prevalence rate at the initial survey is 0.4414/0.551 = 0.8011. In contrast to the initial survey situation, there are four nearby microscopy-based prevalence rates to associate with the prevalence rate for the final survey given above by 0.2578. These values, their associated prevalence ratios, and our estimate for the corresponding PCR-based prevalence rates are shown in Table 4. Each PCR rate is equal to 0.2578/p.
Table 4
Microscopy and PCR-based prevalence rates and prevalence ratios.
Microscopy rate
Prevalence ratio, p
PCR rate
0.246
0.565
0.456
0.247
0.521
0.495
0.250
0.891
0.289
0.255
0.331
0.829
Source: Table S1 of Okell et al. [36].
Source: Table S1 of Okell et al. [36].For our analysis, we use the average of these PCR rates, namely 0.5173. To obtain an associated table of counts , we first observe that total count from the microscopy-based table with entries n. From PCR prevalence at initial survey = 0.8011, we obtain . From PCR prevalence at final survey = 0.5173, we obtain . Adding and subtracting to the equation for total count, we can rewrite it as . Then we have that . To be consistent with the microscopy-based vector, p = (0.7, 0.2), where obviously p1 > p2, we choose to ensure that . Table 5 shows some choices for estimated PCR-based tables.
Table 5
An example of incidence rates, , along the curve of values based on PCR-based tables.
n11*
n22*
p1*
p2*
n12*
n21*
s1*
u1*
q1(U)*
r1*
10
91
0.244
0.444
41
114
0.1758
0.006
0.1279
0.0244
30
111
0.588
0.541
21
94
0.1953
0.100
0.0576
0.0132
40
121
0.784
0.590
11
84
0.2210
0.199
0.0404
0.0107
Here we use as an example and (third line of Table 5) for subsequent calculations. To obtain what is interpreted as a PCR-based incidence rate, we proceed as in the previously described microscopy-based rate calculations. First set . Then from , we obtain . Using , we set . Then , and . Thus , with ε = 0.13 and Δ = 40. Along the curve of values given by , we obtain values of and as indicated in Table 5.
Results
Case study: Malaria in Dar es Salaam
Dar es Salaam
Dar es Salaam is the largest city and economic capital of the United Republic of Tanzania with a population of 4.4 million as recorded by the 2012 Population Census. The city is divided into three municipalities: (i) Ilala, the administrative area, where almost all government Offices and ministries are located; (ii) Temeke, the industrial area, which holds the main manufacturing centers and the Dar es Salaam Port, and has the largest concentration of low income dwellers; and (iii) Kinondoni, the most populated municipality that concentrates most of the high income housing in the city. Municipalities are divided into wards, and the smallest administrative component is the ten-cell unit.Analogous to other urban areas in sub-Saharan Africa, fast urban growth was observed in Dar es Salaam in recent decades, but not accompanied by effective provision of infrastructure. It is estimated that in Dar es Salaam 70% of the population lives in unplanned settlements [17], with precarious access to improved drinking water, sanitation [18], and waste collection. Although an extensive network of drains is available [19-21], many of the drains lack proper and regular maintenance resulting in waste accumulation, flooding, water stagnation, and the proliferation of breeding sites for disease vectors [22,23].The dominant malaria vectors in Dar es Salaam are Anopheles gambiae s.s. and An. funestus [24], and they breed mostly in aquatic habitats found in drains, ditches, construction pits, other human-made roles, and habitats associated with urban agriculture [23]. Plasmodium falciparum is responsible for more than 90% of malaria infections, and transmission is perennial [25] with seasonal variations related to the rainy seasons.
Dar es Salaam Urban Malaria Control Program (UMCP)
The methodology here proposed was applied to data from the UMCP, a large-scale larviciding intervention conducted in urban Dar es Salaam [8]. Reasons to choose the UMCP were three-fold. First, the availability of longitudinal data, specifically time series of two-wave panel data sets; six survey rounds were collected over a period of 4 years and 7 months (May-2004 to Dec-2008). Second, the intervention was shown to be successful [5,24,26], with the prevalence of malaria infection decreasing from 20.8% (95% CI: 16.8–24.9%) in the first survey round to 1.7% (95% CI: 1.4–2.1%) in the last [5]. Thus, the data describe a scenario of declining transmission to very low levels, similar to what is currently observed in countries/regions reaching malaria pre-elimination stages. Third, the calculation of incidence rates was a necessary input for the cost-effectiveness analysis of interventions [9].The UMCP started in 2004 targeting 15 wards (Fig 1), five in each of the three municipalities, covering approximately 56 km2 and more than 610,000 habitants [8]. After a period of baseline data collection and local capacity building, the program introduced the use of larvicides in three phases: in March 2006 it was introduced in 3 wards; in April 2007 it was expanded to 6 additional wards; and in April 2008 all 15 wards were being treated with larvicides. The time period of the survey rounds did not exactly match the launching of the intervention, so wards were exposed to larval control for different periods of time (Fig 2).
Fig 1
Urban Malaria Control Program wards, Dar es Salaam.
The UMCP covered 15 of the 73 wards in Dar es Salaam. Larviciding was implemented in three phases such that the coverage moved from three wards to nine, and then to all 15 between 2006 and 2008.
Fig 2
Timeline of data collection and larviciding intervention.
The first survey round was launched in 05/2004. The first phase of the larviciding intervention started on 03/2006, the second on 05/2007, and the last on 04/2008. Each survey round had five stages, and three wards of each municipality are included in each stage. Month overlaps reflect slightly different duration of data collection in each ward. The wards included in each stage were: Stage I—Vingunguti, Mwananyamala, and Azimio; Stage II—Buguruni, Mzimuni, and Keko; Stage III—Mchikichini, Mikocheni, and Miburani; Stage IV—Ilala, Ndugumbi, and Mtoni; and Stage V—Kipawa, Magomeni, and Kurasini.
Urban Malaria Control Program wards, Dar es Salaam.
The UMCP covered 15 of the 73 wards in Dar es Salaam. Larviciding was implemented in three phases such that the coverage moved from three wards to nine, and then to all 15 between 2006 and 2008.
Timeline of data collection and larviciding intervention.
The first survey round was launched in 05/2004. The first phase of the larviciding intervention started on 03/2006, the second on 05/2007, and the last on 04/2008. Each survey round had five stages, and three wards of each municipality are included in each stage. Month overlaps reflect slightly different duration of data collection in each ward. The wards included in each stage were: Stage I—Vingunguti, Mwananyamala, and Azimio; Stage II—Buguruni, Mzimuni, and Keko; Stage III—Mchikichini, Mikocheni, and Miburani; Stage IV—Ilala, Ndugumbi, and Mtoni; and Stage V—Kipawa, Magomeni, and Kurasini.The UMCP wards revealed some variability regarding behavioral, socioeconomic, and geographical characteristics. While individual knowledge about malaria transmission was similar across wards, an index of household assets and, especially, housing conditions (quality of walls, screening, and source of water) showed important differences. For example, 47% of houses interviewed in Ilala had access to a good source of water, while in Temeke this number was only 19%. Use of a mosquito net, however, was relatively high, but declined significantly following the successful implementation of larval control [27]. The organization of the urban space was very heterogeneous, combining patches of planned areas (both higher and lower housing quality), squatter dwellings, urban agriculture, open spaces, industry, commerce, and areas of recent expansion (where foundations of houses, usually unfinished, often accumulate rainwater and offer conditions for mosquito breeding).
Data structure
Data available for estimating incidence rates consist of a time series of 2-wave panels covering six survey rounds collected over a period of 4 years and 7 months (May-2004 to Dec-2008), in the 15 UMCP wards [5,8,24]. Each survey round had five stages, and in each stage interviews were conducted in 3 out of the 15 UMCP wards (one in each municipality)–the order of the municipalities interviewed in each stage remained the same throughout the study (Fig 2). Therefore, five 2-wave panel data sets are available including baseline and intervention phases. A total of 5,223 participants were followed up twice, 2,349 three times, 1,236 four times, 472 five times, and 99 individuals participated in every survey round. Intervals between consecutive rounds of data collection varied from 6–12 months. Thus, repeated observations on the same individual can be separated by one or more unobserved changes in infection status (interval-censored data).Individuals in the survey were tested for a malaria infection, upon consent. Finger-pricked blood samples were analyzed using Giemsa-stained thick smear microscopy. Quality check was conducted on a 10% sample of blood slides at the Muhimbili University of Health and Allied Sciences—MUHAS (a center of excellence in laboratory analysis), indicating a 94.5% specificity rate and 95.7% sensitivity rate [22]. All individuals found to be infected with malaria were treated with appropriate front-line regimens—sulphadoxine-pyrimethamine until August 2006, after which it was replaced with artesunate-amodiaquine.
Rationale for model constraints
We assume that there are two types of individuals governed, respectively, by the Markov chains with transition probabilities U and V. When p ∈ (0,1)2 = interior of the unit square, U represents people who do not follow anti-malarial drug regimens if they become infected, who do not sleep under bed nets, and whose houses lack adequate screening and tend to be in close proximity to larval habitats. Once infected, they remain so—also experiencing some superinfection—for sustained periods of time. This is reflected in the a priori assumptions that u1 ≤ 0.2 and u2 = 1.The V process represents individuals who go for a diagnosis at the first sign of possible malaria symptoms and follow a prescribed drug treatment regimen if they are found to be infected. The quantification of this behavior is to have v2 small (and certainly < 0.5), which implies a high frequency, 1 − v2, of infected to uninfected transitions between surveys. Since all infected individuals are treated, a large value of 1 − v2 implies that the treatment regimen is usually followed. Further, reinfection rates will be low (corresponding to high v1, and hence low 1 − v1). The V-process people usually sleep under bednets, live in houses with effective screening of doors and windows, and generally, do not expose themselves to the local anopheles mosquitoes. This leads to the requirement that v1 − v2 should be large.Finally, we assume that the U-process people are a minority in a locality where a control program is operative and local knowledge about malaria tends to be high. Thus, we assume, as a minimal condition, that s1 < 0.5.When p = (p1,0), we assume that p2 = 0 is a structural zero. Thus, anyone found infected is transferred to the uninfected state via a regimen of anti-malarial drugs. Then U takes on the special form u = (1,0), and , . This means that there are no transitions from uninfected to infected, and immediate transition from infected to uninfected. The V population operates under the same assumptions discussed above.
Estimated incidence rates in the context of the UMCP (based on microscopy)
Estimates of incidence rates (cases per week) based on the UMCP data, detailed by pairs of consecutive survey rounds (Fig 3 and Table 6) and by ward (Fig 4 and Table 7) were obtained utilizing the methodology here proposed. Estimated rates ranged from 0.1202 cases per week in R12 (0.1537 in Azimio ward)–baseline period when no larviciding had been launched yet—to 0.0010 cases per week in R56 (a third of the wards had a rate of zero)–the intervention achieved full coverage in R6. An overall declining trend was observed, even before larviciding was introduced in three wards in March 2006 [5]. As the coverage of the intervention expanded, fluctuation in the rates was reduced, and rather stable rates were observed during R56. Just for illustration, we also plotted rainfall information lagged by one month [5] in Fig 3. Although seasonality in malaria transmission following the rains is often observed, rainfall patterns per se cannot fully explain fluctuations in incidence and prevalence rates, since many other factors can potentially overcome or augment the impact of rains [5].
Fig 3
Estimated incidence rates, variability intervals, and 1-month lagged rainfall by stages of the survey rounds.
Variability intervals consider the uncertainty due to sampling and partial identifiability of the mixture model. For ease of visualization, the variability interval for the first point in the graph was truncated. Each survey round of the UMCP conducted interviews in three wards at a time, or stages. Thus, each round had five stages in order to cover all 15 wards. Rainfall estimates were obtained from the National Oceanic and Atmospheric (NOAA) Climate Prediction Center (CPC), have a spatial resolution of eight kilometers [35], and were lagged by 30 days.
Table 6
Estimated incidence rates and variability intervals by stages of the survey rounds.
Pairs of survey rounds
Stage
Estimated incidence rate
Variability interval due to partial identifiability
Variability interval considering partial identifiability and sampling variability
2.5th
97.5th
2.5th
97.5th
2.5th of Minima
97.5th of Maxima
Minima
Maxima
R12
I
0.12489
0.11395
0.16314
0.04345
0.16462
0.03847
0.18411
0.02349
0.18935
R12
II
0.01084
0.01084
0.01084
0.00578
0.01662
0.00578
0.01662
0.00289
0.02096
R12
III
0.01102
0.00875
0.02296
0.00756
0.02430
0.00657
0.05292
0.00531
0.05635
R12
IV
0.00603
0.00479
0.01257
0.00379
0.01363
0.00326
0.03135
0.00252
0.03791
R12
V
0.00844
0.00670
0.01760
0.00556
0.01873
0.00487
0.04135
0.00398
0.04788
R23
I
0.00495
0.00393
0.01032
0.00308
0.01127
0.00260
0.02554
0.00195
0.02873
R23
II
0.00823
0.00653
0.01715
0.00567
0.01819
0.00506
0.03873
0.00394
0.04657
R23
III
0.00670
0.00532
0.01397
0.00435
0.01502
0.00380
0.03314
0.00245
0.03917
R23
IV
0.00384
0.00305
0.00801
0.00226
0.00887
0.00194
0.02022
0.00133
0.02617
R23
V
0.00929
0.00738
0.01937
0.00653
0.02041
0.00585
0.04360
0.00526
0.05127
R34
I
0.00542
0.00430
0.01130
0.00364
0.01209
0.00322
0.02656
0.00246
0.03122
R34
II
0.00257
0.00204
0.00536
0.00142
0.00611
0.00121
0.01394
0.00061
0.01790
R34
III
0.00227
0.00180
0.00473
0.00130
0.00527
0.00112
0.01206
0.00067
0.01389
R34
IV
0.00263
0.00209
0.00547
0.00155
0.00602
0.00138
0.01354
0.00107
0.01655
R34
V
0.00278
0.00221
0.00579
0.00171
0.00630
0.00148
0.01456
0.00106
0.01733
R45
I
0.00514
0.00408
0.01072
0.00336
0.01152
0.00290
0.02601
0.00239
0.03315
R45
II
0.00330
0.00262
0.00687
0.00204
0.00761
0.00185
0.01731
0.00139
0.02004
R45
III
0.00261
0.00208
0.00545
0.00138
0.00614
0.00121
0.01369
0.00060
0.01724
R45
IV
0.00439
0.00348
0.00914
0.00264
0.01003
0.00230
0.02325
0.00179
0.02827
R45
V
0.00258
0.00205
0.00538
0.00143
0.00594
0.00132
0.01347
0.00061
0.01747
R56
I
0.00109
0.00109
0.00109
0.00000
0.00254
0.00000
0.00254
0.00000
0.00363
R56
II
0.00162
0.00162
0.00162
0.00032
0.00324
0.00032
0.00324
0.00000
0.00453
R56
III
0.00215
0.00215
0.00215
0.00086
0.00365
0.00086
0.00365
0.00043
0.00451
R56
IV
0.00088
0.00088
0.00088
0.00000
0.00220
0.00000
0.00220
0.00000
0.00352
R56
V
0.00241
0.00241
0.00241
0.00103
0.00447
0.00103
0.00447
0.00034
0.00550
Fig 4
Estimated incidence rates, variability intervals, and prevalence of malaria infection by wards and consecutive pairs of survey rounds.
Variability intervals consider the uncertainty due to sampling and partial identifiability of the mixture model. Prevalence rates were calculated using two distinct sets of data. First, the longitudinal data also utilized for the calculation of incidence rates (Prevalence—Follow-Up). Second, cross-sectional data collected at the same time and wards of the longitudinal data, but in different tencell units (Prevalence—Cross-Section). Each column of graphs shows wards from one municipality: the first column has wards from Kinondoni, the second column those from Temeke, and the third column shows wards from Ilala. Wards in the first line of the Figure were included in the first phase of the intervention; second and third line of graphs include wards targeted during the second phase of the intervention; and the last two lines show the wards included in the third and last phase of the intervention. The line indicating the onset of the larval control was placed on the pair of survey round at which any impact of each phase of the larval control could be observed. The Roman number after the name of the ward indicates the stage when interviews were conducted in each survey round (refer to Fig 2). For ease of visualization, estimates and variability intervals for Azimio and Vingunguti in R12 were truncated, as well as the variability interval for Mwananyamala in R12.
Table 7
Estimated incidence rates and variability intervals by wards and consecutive pairs of survey rounds.
Pairs of survey rounds
Ward
Estimated incidence rates
Variability interval due to partial identifiability
Partial Identifiability & Sampling Variability
2.5th
97.5th
2.5th
97.5th
2.5th of Minima
97.5th of Maxima
Minima
Maxima
Municipality: Ilala
R12
Buguruni
0.0170
0.0170
0.0170
0.0034
0.0306
0.0034
0.0306
0.0000
0.0442
R12
Ilala
0.0029
0.0023
0.0061
0.0007
0.0084
0.0005
0.0227
0.0000
0.0295
R12
Kipawa
0.0104
0.0082
0.0216
0.0060
0.0243
0.0050
0.0579
0.0027
0.0690
R12
Mchikichini
0.0109
0.0087
0.0228
0.0068
0.0246
0.0058
0.0559
0.0040
0.0665
R12
Vingunguti
0.0519
0.0428
0.0970
0.0263
0.1565
0.0189
0.1827
0.0117
0.2025
R23
Buguruni
0.0078
0.0062
0.0164
0.0046
0.0182
0.0039
0.0414
0.0011
0.0538
R23
Ilala
0.0056
0.0056
0.0056
0.0016
0.0104
0.0016
0.0104
0.0000
0.0128
R23
Kipawa
0.0090
0.0071
0.0187
0.0054
0.0208
0.0046
0.0480
0.0038
0.0546
R23
Mchikichini
0.0027
0.0021
0.0056
0.0009
0.0071
0.0006
0.0192
0.0003
0.0237
R23
Vingunguti
0.0036
0.0029
0.0075
0.0013
0.0093
0.0012
0.0239
0.0000
0.0298
R34
Buguruni
0.0036
0.0028
0.0075
0.0014
0.0092
0.0014
0.0225
0.0000
0.0294
R34
Ilala
0.0013
0.0013
0.0013
0.0000
0.0033
0.0000
0.0033
0.0000
0.0059
R34
Kipawa
0.0025
0.0020
0.0052
0.0009
0.0063
0.0009
0.0149
0.0004
0.0181
R34
Mchikichini
0.0040
0.0032
0.0083
0.0020
0.0097
0.0018
0.0231
0.0011
0.0286
R34
Vingunguti
0.0057
0.0045
0.0118
0.0029
0.0137
0.0025
0.0329
0.0010
0.0423
R45
Buguruni
0.0087
0.0069
0.0181
0.0053
0.0196
0.0048
0.0436
0.0035
0.0523
R45
Ilala
0.0063
0.0050
0.0131
0.0032
0.0154
0.0028
0.0365
0.0014
0.0470
R45
Kipawa
0.0055
0.0044
0.0115
0.0028
0.0133
0.0025
0.0304
0.0015
0.0396
R45
Mchikichini
0.0056
0.0044
0.0116
0.0023
0.0137
0.0020
0.0334
0.0004
0.0432
R45
Vingunguti
0.0108
0.0085
0.0224
0.0067
0.0246
0.0057
0.0565
0.0039
0.0639
R56
Buguruni
0.0036
0.0036
0.0036
0.0000
0.0072
0.0000
0.0072
0.0000
0.0119
R56
Ilala
0.0040
0.0040
0.0040
0.0000
0.0101
0.0000
0.0101
0.0000
0.0162
R56
Kipawa
0.0039
0.0039
0.0039
0.0011
0.0068
0.0011
0.0068
0.0000
0.0096
R56
Mchikichini
0.0026
0.0026
0.0026
0.0000
0.0060
0.0000
0.0060
0.0000
0.0077
R56
Vingunguti
0.0018
0.0018
0.0018
0.0000
0.0044
0.0000
0.0044
0.0000
0.0062
Municipality: Kinondoni
R12
Magomeni
0.0050
0.0040
0.0104
0.0018
0.0129
0.0017
0.0332
0.0006
0.0415
R12
Mikocheni
0.0087
0.0069
0.0181
0.0042
0.0210
0.0040
0.0503
0.0011
0.0615
R12
Mwananyamala
0.0359
0.0285
0.0748
0.0225
0.1382
0.0160
0.1808
0.0106
0.1956
R12
Mzimuni
0.0124
0.0124
0.0124
0.0047
0.0202
0.0047
0.0202
0.0000
0.0279
R12
Ndugumbi
0.0062
0.0050
0.0130
0.0029
0.0150
0.0025
0.0382
0.0012
0.0463
R23
Magomeni
0.0093
0.0073
0.0193
0.0056
0.0213
0.0048
0.0482
0.0031
0.0592
R23
Mikocheni
0.0082
0.0065
0.0170
0.0041
0.0198
0.0035
0.0486
0.0015
0.0583
R23
Mwananyamala
0.0056
0.0044
0.0116
0.0028
0.0135
0.0022
0.0323
0.0016
0.0416
R23
Mzimuni
0.0073
0.0058
0.0152
0.0040
0.0173
0.0032
0.0407
0.0025
0.0485
R23
Ndugumbi
0.0027
0.0021
0.0056
0.0008
0.0073
0.0007
0.0189
0.0000
0.0275
R34
Magomeni
0.0037
0.0029
0.0077
0.0018
0.0090
0.0015
0.0214
0.0006
0.0296
R34
Mikocheni
0.0043
0.0043
0.0043
0.0012
0.0080
0.0012
0.0080
0.0000
0.0099
R34
Mwananyamala
0.0035
0.0027
0.0072
0.0017
0.0084
0.0015
0.0206
0.0004
0.0267
R34
Mzimuni
0.0031
0.0031
0.0031
0.0006
0.0061
0.0006
0.0061
0.0000
0.0080
R34
Ndugumbi
0.0025
0.0020
0.0052
0.0011
0.0065
0.0010
0.0155
0.0004
0.0232
R45
Magomeni
0.0021
0.0021
0.0021
0.0000
0.0050
0.0000
0.0050
0.0000
0.0071
R45
Mikocheni
0.0018
0.0018
0.0018
0.0000
0.0043
0.0000
0.0043
0.0000
0.0055
R45
Mwananyamala
0.0031
0.0025
0.0065
0.0014
0.0076
0.0012
0.0181
0.0005
0.0265
R45
Mzimuni
0.0006
0.0006
0.0006
0.0000
0.0025
0.0000
0.0025
0.0000
0.0043
R45
Ndugumbi
0.0026
0.0026
0.0026
0.0000
0.0060
0.0000
0.0060
0.0000
0.0077
R56
Magomeni
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
R56
Mikocheni
0.0013
0.0013
0.0013
0.0000
0.0033
0.0000
0.0033
0.0000
0.0053
R56
Mwananyamala
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
R56
Mzimuni
0.0008
0.0008
0.0008
0.0000
0.0025
0.0000
0.0025
0.0000
0.0050
R56
Ndugumbi
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
Municipality: Temeke
R12
Azimio
0.1510
0.1403
0.1851
0.1200
0.1906
0.0276
0.2074
0.0204
0.2228
R12
Keko
0.0045
0.0045
0.0045
0.0000
0.0112
0.0000
0.0112
0.0000
0.0180
R12
Kurasini
0.0091
0.0072
0.0190
0.0053
0.0210
0.0046
0.0494
0.0036
0.0582
R12
Miburani
0.0131
0.0104
0.0273
0.0078
0.0298
0.0070
0.0644
0.0051
0.0874
R12
Mtoni
0.0090
0.0071
0.0187
0.0048
0.0216
0.0038
0.0531
0.0028
0.0623
R23
Azimio
0.0056
0.0045
0.0117
0.0022
0.0143
0.0020
0.0363
0.0005
0.0557
R23
Keko
0.0093
0.0074
0.0194
0.0056
0.0213
0.0051
0.0472
0.0023
0.0609
R23
Kurasini
0.0097
0.0077
0.0203
0.0057
0.0223
0.0047
0.0524
0.0030
0.0639
R23
Miburani
0.0104
0.0082
0.0216
0.0060
0.0243
0.0053
0.0558
0.0035
0.0732
R23
Mtoni
0.0064
0.0051
0.0133
0.0030
0.0155
0.0025
0.0371
0.0017
0.0495
R34
Azimio
0.0087
0.0069
0.0181
0.0049
0.0203
0.0042
0.0485
0.0015
0.0635
R34
Keko
0.0054
0.0054
0.0054
0.0023
0.0101
0.0023
0.0101
0.0008
0.0124
R34
Kurasini
0.0021
0.0017
0.0045
0.0008
0.0054
0.0006
0.0135
0.0004
0.0197
R34
Miburani
0.0008
0.0006
0.0016
0.0002
0.0024
0.0000
0.0068
0.0000
0.0107
R34
Mtoni
0.0044
0.0035
0.0092
0.0023
0.0107
0.0021
0.0251
0.0012
0.0331
R45
Azimio
0.0011
0.0011
0.0011
0.0000
0.0032
0.0000
0.0032
0.0000
0.0054
R45
Keko
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
R45
Kurasini
0.0023
0.0023
0.0023
0.0000
0.0053
0.0000
0.0053
0.0000
0.0075
R45
Miburani
0.0040
0.0040
0.0040
0.0000
0.0094
0.0000
0.0094
0.0000
0.0121
R45
Mtoni
0.0058
0.0046
0.0122
0.0027
0.0144
0.0023
0.0339
0.0005
0.0429
R56
Azimio
0.0014
0.0014
0.0014
0.0000
0.0042
0.0000
0.0042
0.0000
0.0070
R56
Keko
0.0009
0.0009
0.0009
0.0000
0.0028
0.0000
0.0028
0.0000
0.0047
R56
Kurasini
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
R56
Miburani
0.0026
0.0026
0.0026
0.0005
0.0052
0.0005
0.0052
0.0000
0.0062
R56
Mtoni
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
Estimated incidence rates, variability intervals, and 1-month lagged rainfall by stages of the survey rounds.
Variability intervals consider the uncertainty due to sampling and partial identifiability of the mixture model. For ease of visualization, the variability interval for the first point in the graph was truncated. Each survey round of the UMCP conducted interviews in three wards at a time, or stages. Thus, each round had five stages in order to cover all 15 wards. Rainfall estimates were obtained from the National Oceanic and Atmospheric (NOAA) Climate Prediction Center (CPC), have a spatial resolution of eight kilometers [35], and were lagged by 30 days.
Estimated incidence rates, variability intervals, and prevalence of malaria infection by wards and consecutive pairs of survey rounds.
Variability intervals consider the uncertainty due to sampling and partial identifiability of the mixture model. Prevalence rates were calculated using two distinct sets of data. First, the longitudinal data also utilized for the calculation of incidence rates (Prevalence—Follow-Up). Second, cross-sectional data collected at the same time and wards of the longitudinal data, but in different tencell units (Prevalence—Cross-Section). Each column of graphs shows wards from one municipality: the first column has wards from Kinondoni, the second column those from Temeke, and the third column shows wards from Ilala. Wards in the first line of the Figure were included in the first phase of the intervention; second and third line of graphs include wards targeted during the second phase of the intervention; and the last two lines show the wards included in the third and last phase of the intervention. The line indicating the onset of the larval control was placed on the pair of survey round at which any impact of each phase of the larval control could be observed. The Roman number after the name of the ward indicates the stage when interviews were conducted in each survey round (refer to Fig 2). For ease of visualization, estimates and variability intervals for Azimio and Vingunguti in R12 were truncated, as well as the variability interval for Mwananyamala in R12.Disaggregating by individual wards (Fig 4) raises an important issue: the lack of a perfect match between the time and duration of each stage/survey round, and the time when each phase of the intervention was launched. Phase 1 started in March 2006 (see Fig 2), slightly after the onset of the main rainy season, and at the end of data collection in round 3 (R3). Considering that there is a biological time lag between reducing larval survival and reducing malaria transmission from human to human, and the initial programmatic challenges of launching the intervention [28,29] (augmented by the heavy rains), we expect that incidence rates estimated for R34 (in the three wards targeted with the intervention) should be the first to potentially reflect changes due to the larval control. However, each one of these three wards (first line of three graphs in Fig 4) was interviewed at a different stage, and thus exposed to the intervention for different periods of time. Between March 2006 and the beginning of interviews there were about 180, 240, and 300 days in Buguruni, Mikocheni, and Kurasini wards, respectively. Except from Buguruni, incidence rates stabilized at lower levels after R34. One peculiarity of Buguruni is the intense migration from northern areas of the country.Phase 2 started in May 2007, a month after interviews for R5 had started, expanding larviciding application to six additional wards (second and third lines of graphs in Fig 4). Here, between May 2007 and the beginning of the interviews there were about 30 days for Mzimuni and Keko, 60 for Miburani, and 90 for Ilala. In the case of Vingunguti and Mwananyamala, however, R5 interviews started in April 2007, before the introduction of larviciding. Thus, changes in incidence rates observed in these wards for the period R45 cannot be associated with larval control.Phase 3 started in April 2008, a month after interviews for R6 had started, expanding larviciding application to the remaining six wards (fourth and fifth lines of graphs in Fig 4). The number of days between the onset of phase 3 and R6 interviews was about 180 for Magomeni and Kipawa, 150 for Ndugumbi and Mtoni, and 60 for Mchikichini. Azimio, however, had R6 data collection between March-May 2008, and thus data for this ward were not suitable to capture any potential contribution of the larval control intervention.A crude analysis of the estimated incidence rates by wards is also difficult due to the large diversity in social, economic, and environmental conditions. In addition, changes in the urban landscape are dynamic, reducing or augmenting the risk of malaria transmission. As a result, wards interviewed at the same period (thus subjected to same rainfall pattern), targeted by the same intervention phase, may show distinct trends in incidence rates. For example, Ndugumbi and Mtoni were both targeted during the 3rd phase of larviciding, and interviewed during stage IV of each survey round (see Fig 2), but while Ndugumbi stabilized at very low incidence rates even before larval control, Mtoni did show a steep decline in rates after the introduction of larviciding (see Fig 4). Also, Mwananyamala and Vingunguti were targeted during the 2nd phase of the larval control intervention, and interviews conducted during stage I of each survey round. Estimated incidence rates for both wards were among the highest for R12, reducing significantly in R23. However, while Mwananyamala maintained low rates (even before the introduction of larviciding), Vingunguti observed an increase in incidence rates in R34 and R45. Since interviews in Stage I started before the launching of phase 2 of the intervention, any impact would only be detected in R56, and a decline was indeed observed in Vingunguti by then. Distinct patterns were also observed for Magomeni and Kipawa—both interviewed in stage V of each survey round and targeted in phase 2 of the intervention. In such a complex scenario, even the consideration of a large number of variables to define control wards (to be compared with intervention wards) is a near to impossible task. Important to note, however, are the large uncertainty intervals observed in some wards (Fig 4), mainly due to sampling variability (as detailed next).
Sensitivity analysis
Sensitivity analysis of the incidence rate estimates by survey round (Table 6) and by ward (Table 7) was assessed through variation intervals that reflect uncertainty from non-identifiability of the mixture model, and variation intervals that combine both non-identifiability and sampling variation. Results indicate that variation from non-identifiability of the mixture model here proposed was rather small, as we would expect considering the rigorous context-based constraint conditions that we assumed. Thus, the bulk of the uncertainty in r1 could be attributed to sampling variability. Considering the level of detail used for the estimation of incidence rates by ward (15 wards * 5 pairs of consecutive survey rounds = 75 estimates), the fact that individuals who tested positive for a malaria infection were treated with anti-malarials, and the decline in transmission observed in the UMCP (even before the larviciding intervention was launched), implied that the small counts for n21, n22, and n12 (Table 1) that were observed in some pairs of survey rounds were not unexpected.Thus, sampling variation becomes rather important, especially for p2. The magnitude of uncertainty was smaller for the estimates by stage of the survey round since data were more aggregated (5 stages * 5 pairs of consecutive survey rounds = 25 estimates), and thus larger counts for n21, n22, and n12 (Table 2) obtained.The largest variation intervals in estimated rates were observed in three wards in R12 that recorded particularly high counts for n12 and thus lower values for p1; Azimio had p1 = 0.368 (and thus p2 > p1 and v1 − v2 is small), Vingunguti had p1 = 0.542 (and v1 − v2 is small), and Mwananyamala had p1 = 0.603 (and v1 − v2 is small).
Comparison of incidence and prevalence rates
We compared estimated incidence rates with prevalence rates calculated directly from survey data. Since the UMCP collected both longitudinal and cross-sectional data [5], we were able to calculate two sets of prevalence rates: one utilizing the follow-up data (the same data used to calculate incidence rates), and another utilizing the cross-sectional data collected at the same time of the follow-up data, but in different ten-cell units. Fig 5 shows that, overall, prevalence rates followed the same pattern of the incidence rates, becoming much lower and more stable after phase 3 of the intervention. The same was observed when analyzing the rates disaggregated by wards (Fig 4). The two prevalence rates were similar in magnitude, with only three exceptions: Azimio, Vingunguti, and Mwananyamala, at the time when incidence rates were also extremely high.
Fig 5
Estimated incidence rates, variability intervals, and prevalence of malaria infection by stages of the survey rounds.
Variability intervals of the incidence rates consider the uncertainty due to sampling and partial identifiability of the mixture model. Prevalence rates were calculated using two distinct sets of data. First, the longitudinal data also utilized for the calculation of incidence rates (Prevalence—Follow-Up). Second, cross-sectional data collected at the same time and wards of the longitudinal data, but in different tencell units (Prevalence—Cross-Section). Each survey round of the UMCP conducted interviews in three wards at a time, or stages. Thus, each round had five stages in order to cover all 15 wards.
Estimated incidence rates, variability intervals, and prevalence of malaria infection by stages of the survey rounds.
Variability intervals of the incidence rates consider the uncertainty due to sampling and partial identifiability of the mixture model. Prevalence rates were calculated using two distinct sets of data. First, the longitudinal data also utilized for the calculation of incidence rates (Prevalence—Follow-Up). Second, cross-sectional data collected at the same time and wards of the longitudinal data, but in different tencell units (Prevalence—Cross-Section). Each survey round of the UMCP conducted interviews in three wards at a time, or stages. Thus, each round had five stages in order to cover all 15 wards.It was reassuring that prevalence rates calculated from follow-up and cross-sectional data had similar values (with a few exceptions when the intensity of transmission was high). Despite possible intra-ward variability in malaria transmission, the ten-cell units sampled for follow-up and during cross-sectional surveys seem to be representative of the transmission pattern in the ward, which is reflected in similar prevalence rates for both data types.
Estimated incidence rates based on PCR adjustments
The same pattern seen for estimated incidence rates based on microscopy data was observed for PCR-adjusted incidence rates (S1 Fig). Prior to the final survey rounds (R56) of data collection, the estimated PCR-adjusted rates were slightly higher than those estimated based on microscopy data. However, as larviciding was scaled-up, estimated PCR-adjusted incidence rates near the end of the program did not differ qualitatively from what was calculated based on microscopy. Nevertheless if microscopy were to indicate an incidence rate of zero—consistent with elimination—the PCR rates being positive but small would not support such a claim.
Discussion
The methodology we introduced for estimating incidence rates from 2-wave panel data satisfying trace utilized 2-component mixtures of continuous-time Markov chains as a class of models to represent such data. This was accompanied by an unavoidable imposition of inequality constraints to facilitate parameter identification or, in some instances, partial identification. The application of this strategy for estimation of incidence rates in an urban malaria control initiative in Dar es Salaam represented a practical instance where such tools played a decisive role in quantifying reductions in malaria acquisition rates in a low endemicity setting. With this general methodology and particular example in hand, several further issues regarding incidence rate estimation require clarification.
a) Alternative specification of waiting time distributions
The continuous-time Markov chains all have exponentially distributed waiting time distributions in each state, which implies that their hazard rates are constant. Two-wave panel data, assumed to be generated by some 2-state continuous-time stochastic process, is not sufficiently rich to provide a basis for testing this hypothesis. However, several basic facts about malaria in diverse ecosystems make this assumption untenable. For example, in the Garki study [3], persons who survive repeated episodes of malaria in infancy and childhood have antibody titers that ensure an increasing hazard rate in the infected state—i.e. the longer an individual has detectable parasites, the more likely he/she is to clear parasites free of any intervention, and return to the uninfected state. For uninfected individuals at the end of a dry season, the hazard rate for onset of a new infection is also increasing, corresponding to the propensity for rain and, thereby, standing water. One of many possible parameterizations of these qualitative ideas is given by the waiting time distributions F((t), t > 0, where i = 1,2 designate the states uninfected (i = 1) and infected (i = 2), and having probability density functions
and hazard rates Here h(t) is: increasing if α > 1, constant if α = 1, and decreasing if α < 1.The family of Gamma densities eq (7) are the basis for obtaining an expression for P(Δ) within the class of semi-Markov models by solving the backward integral equation system [30].
where δ = 1if i = j, δ = 0 if i ≠ j, and 1 ≤ i, j ≤ r with M = ‖m‖ an r x r stochastic matrix having m = 0 for 1 ≤ i, j ≤ r.Specification eq (8) holds for general r-state process and waiting time densities f(t), 1 ≤ i ≤ r. For our purposes, r = 2, m12 = m21 = 1, and we focus on the 2-parameter family eq (7). Here we set t = Δ and identify p(0,Δ) with the (i,j) entry in P(Δ).The equation system eq (8) is amenable to the following interpretation: (i) when i ≠ j, p(0,t) consists of the sum of products of three factors: the probability of a first departure from state i at time s, the probability of a transition from state i to state k at that instant, and the probability of transferring to state j by some combination of moves in the interval t−s The summation is over all intermediate states k and all time divisions s in (0, t); (ii) when i = j, in addition to the preceding term, there is the probability of not transferring out of state i during (0, t). This is given by the first term.With empirical data, solving the system of equations for parameter values as in the Gamma specification above, requires a priori context-dependent constraints on the parameters—to secure identification or partial identification—and numerical inversion calculations in eq (8). Going back to the empirical analyses in the Garki baseline surveys [3], the disconcerting issue that now arises is that the entire set of 2-wave panel data sets shown in Singer & Cohen [7], with incidence and recovery rates computed within the class of continuous time Markov chain models, could just as well have been used to estimate incidence and recovery rates within the class of 2-state semi-Markov models with Gamma distributed waiting times. The same can, of course, be said for the rates computed in the prior section for Dar es Salaam now using, in some of the trace P < 1 cases, 2-component mixtures of semi-Markov models with analogous inequality constraints facilitating identification or partial identification of parameters.
b) In low-endemicity settings, do we need to take formal account of the possibility of many (i.e., 3 or more) unobserved transitions?
If we use a crude incidence rate given by , this essentially assumes that there is at most one unobserved transition in the inter-survey interval, Δ. For the Garki surveys where Δ = 10 weeks, we find, not surprisingly, that r1(We also find that r1( < r1( when trace in Dar es Salaam, but this is decidedly influenced by the long inter-survey intervals. As two examples, consider the ward Buguruni, for survey intervals R23 (with Δ = 42.2 weeks) and R56 (with Δ = 49.2 weeks). For R23, r1( = 0.0031 and r1( = 0.0081. For R56, r1( = 0.0007 and r1( = 0.0039. Thus, during the last survey interval, R56, when the larvicide intervention was operating effectively, we still have r1( and r1( differing by a factor of 5.6. Taking formal account of the possibility of multiple unobserved transitions clearly makes a difference, and is preferable to the crude incidence rate, generally.
c) Comparison of incidence rates and incidence defined as Inc = [Number of positive species-specific clinical cases observed during the duration of a survey]/[(Number of people observed over the survey duration) x (duration of survey)] [2]
For the 2-wave panel data in Dar es Salaam, we have:
where N = n11+n12+n21+n22. Using eq (9) we compare Inc with r1( for two wards, Buguruni and Kurasini, early in the larviciding program, R23, and at the end of it, R56 (Table 8).
Table 8
Comparison of incidence (Inc) and incidence rates for Buguruni and Kurasini wards.
Survey Interval
Ward
Inc
r1(Mixture)
R23
Buguruni
0.0068
0.0081
Kurasini
0.0095
0.0100
R56
Buguruni
0.0034
0.0039
Kurasini
0.00032
0
The general lesson here is that Inc < r1( except for survey rounds where there are almost no infected cases. Indeed, for Kurasini at R56 we have a raw table of counts given by . Here there is no apparent transmission between survey rounds 5 and 6. The infected individual at survey is, in accordance with the study protocol, treated with anti-malarial drugs. The general inequality Inc < r1( is basically a consequence of the fact that unobserved transitions are not taken into account in Inc.In moderate to high transmission areas, we anticipate that Inc will be substantially downward biased as a result of lack of formal consideration of unobserved transitions.
d) Estimating incidence from single and multiple prevalence surveys
Different methods have been proposed to generate incidence (as defined in this paper, or the rate of occurrence of an event) from prevalence data [31-34]. As more applications of the methodology here introduced are made, analysis that would produce both incidence and prevalence rates could provide a unique opportunity to evaluate the best approach to obtain incidence rate estimates from prevalence rates—currently, it is unclear what is the best strategy. Such an exercise could lead to clear recommendations that would have both a wide applicability in malaria endemic countries and a crucial importance for National Malaria Control Programs (e.g., planning and evaluation of the cost-effectiveness of interventions).In conclusion, this paper introduced new methodology for estimating incidence rates from 2-wave panel data with interval censoring, which is applicable in the many cases where the extant Markovian models are inapplicable. The methodology is suitable to settings with any malaria transmission level, given the availability of longitudinal data. In addition, we present a strategy for quantifying the uncertainty in estimation of incidence rates. It is hereby distributed with a well-documented programming code that allows the use of the method in R software.
R-code to estimate malaria incidence rates from interval-censored longitudinal data, and to calculate variability intervals for the estimated rates.
This file contains a commented version of the code written in R. The.R file is also available for download (S2 Code).(DOCX)Click here for additional data file.This file contains the.R file (which can be directly read and used in R).(R)Click here for additional data file.
Estimated incidence rates and PCR-adjusted incidence rates, and partial identifiability intervals by wards and consecutive pairs of survey rounds.
Each column of graphs shows wards from one municipality: the first column has wards from Kinondoni, the second column those from Temeke, and the third column shows wards from Ilala. Wards in the first line of the Figure were included in the first phase of the intervention; second and third line of graphs include wards targeted during the second phase of the intervention; and the last two lines show the wards included in the third and last phase of the intervention. The line indicating the onset of the larval control was placed on the pair of survey round at which any impact of each phase of the larval control could be observed. The Roman number after the name of the ward indicates the stage when interviews were conducted in each survey round. Assumptions for the calculation of PCR-based rates were extracted from Okell et al. [36].(TIFF)Click here for additional data file.
Authors: J F Trape; E Lefebvre-Zante; F Legros; P Druilhe; C Rogier; H Bouganali; G Salem Journal: Am J Trop Med Hyg Date: 1993-06 Impact factor: 2.345
Authors: Kim A Lindblade; Laura Steinhardt; Aaron Samuels; S Patrick Kachur; Laurence Slutsker Journal: Expert Rev Anti Infect Ther Date: 2013-06 Impact factor: 5.091
Authors: Katherine E Battle; Carlos A Guerra; Nick Golding; Kirsten A Duda; Ewan Cameron; Rosalind E Howes; Iqbal R F Elyazar; J Kevin Baird; Robert C Reiner; Peter W Gething; David L Smith; Simon I Hay Journal: Sci Data Date: 2015-08-18 Impact factor: 6.444