
Epidemiological methods in diarrhoea studies--an update.

Wolf-Peter Schmidt¹, Benjamin F Arnold, Sophie Boisson, Bernd Genser, Stephen P Luby, Mauricio L Barreto, Thomas Clasen, Sandy Cairncross.

Abstract

BACKGROUND: Diarrhoea remains a leading cause of morbidity and mortality but is difficult to measure in epidemiological studies. Challenges include the diagnosis based on self-reported symptoms, the logistical burden of intensive surveillance and the variability of diarrhoea in space, time and person.
METHODS: We review current practices in sampling procedures to measure diarrhoea, and provide guidance for diarrhoea measurement across a range of study goals. Using 14 available data sets, we estimated typical design effects for clustering at household and village/neighbourhood level, and measured the impact of adjusting for baseline variables on the precision of intervention effect estimates.
RESULTS: Incidence is the preferred outcome measure in aetiological studies, health services research and vaccine trials. Repeated prevalence measurements (longitudinal prevalence) are appropriate in high-mortality settings where malnutrition is common, although many repeat measures are rarely useful. Period prevalence is an inadequate outcome if an intervention affects illness duration. Adjusting point estimates for age or diarrhoea at baseline in randomized trials has little effect on the precision of estimates. Design effects in trials randomized at household level are usually <2 (range 1.0–3.2). Design effects for larger clusters (e.g. villages or neighbourhoods) vary greatly among different settings and study designs (range 0.1–25.8).
CONCLUSIONS: Using appropriate sampling strategies and outcome measures can improve the efficiency, validity and comparability of diarrhoea studies. Allocating large clusters in cluster randomized trials is compromised by unpredictable design effects and should be carried out only if the research question requires it.


Year: 2011 PMID: 22268237 PMCID: PMC3235024 DOI: 10.1093/ije/dyr152

Source DB: PubMed Journal: Int J Epidemiol ISSN: 0300-5771 Impact factor: 7.196

Introduction

Diarrhoeal diseases remain a leading cause of morbidity and mortality in children worldwide. Reliable field data from epidemiological studies are required to study diarrhoea epidemiology and the effect of interventions, but diarrhoea remains a condition difficult to measure. Systematic reviews of diarrhoea interventions have found a great variety of approaches to measure diarrhoea. The past decade saw a trend towards less intensive active diarrhoea surveillance, the use of repeated diarrhoea prevalence measures instead of incidence as outcome measure and a greater recognition of recent advances in the design of cluster randomized trials. In this article we review current practices in conducting epidemiological studies on diarrhoeal diseases with an emphasis on randomized controlled trials (RCTs) in low-income populations, including cluster randomized trials. We discuss crucial methodological problems to be considered in the planning stage of a trial, but several issues should also be relevant for observational studies.

Literature search methods and data sets

We searched the database MEDLINE for the years 1970–2009 without language restrictions, using the search terms [diarrh(o)ea AND trial], [diarrh(o)ea AND measurement], [diarrh(o)ea AND recall] and [diarrh(o)ea AND longitudinal prevalence]. We screened the reference lists of relevant articles and contacted authors and experts in the field for further identification of relevant articles. We further used original data sets from different field sites across the world (in part described previously) to address issues of design effect and adjustment for baseline variables in RCTs. These data sets came from the authors of this article or were made available to us by other researchers in the field (see ‘Acknowledgements’).

Reporting and recording diarrhoea symptoms

Case definitions for diarrhoea commonly are either based on reported signs and symptoms (stool frequency, presence of blood or mucus) or based on local disease perception. For example, a study in Ghana identified seven different local terms for symptoms compatible with diarrhoea. Relying on local disease definitions requires extensive qualitative research and piloting, but such work can provide important insights that are useful for a study as a whole. Most studies continue to use the WHO definition of diarrhoea, defined as ‘the passage of 3 or more loose or liquid stools per day, or more frequently than is normal for the individual’. A stringent definition that does not depend on local disease concepts may reduce subjectivity and perhaps also the risk of bias, but this has not yet been shown in practice. While not necessarily having more clinical validity, using the WHO definition facilitates comparison across sites (Box 1). Asking study participants specifically for the presence or absence of ‘3 or more loose or liquid stools per day’ may unnecessarily force a decision by the respondent that may be prone to bias. Therefore, some diarrhoea trials record stool frequency and then apply the WHO definition post-hoc.

Studies have shown that the longer the recall period, the greater the imprecision (especially underestimation) of prevalence estimates. By assuming that reported prevalence in the last 24 h was 100% accurate, these studies may have overestimated recall error, since the higher diarrhoea prevalence closer to the day of the visit may indicate that people remember diarrhoea during the past 7 days as having occurred more recently than was actually the case. In a study from Peru, mothers reported the correct prevalence of diarrhoea but often were inaccurate in reporting the exact day when it occurred. Recall error depends on the severity and duration of symptoms.

A decline in reported diarrhoea with time on study (independent of treatment) has been noted in diverse populations. Intensive surveillance including frequent home visits can lower the reported diarrhoea prevalence, perhaps due to ‘reporting fatigue’. Recall can be more complete in groups of higher socio-economic status, leading to bias when comparing different populations. Recall error may not be a big problem in studies exploring disease trends or comparing diarrhoea risk between treatment arms, if it can be assumed that recall error is non-differential among the groups compared. This assumption, however, is difficult to verify in unblinded trials. There are numerous theoretical possibilities for treatment effects to be biased. For example, allocation to the control group may lead to diarrhoea episodes being remembered more acutely out of frustration at not receiving the intervention. Alternatively, allocation to a treatment group may lead participants to not report disease episodes or field staff to not record disease episodes under the expectation that the intervention is effective. Due to such biases, even a diarrhoea reduction of 50% observed in unblinded trials may be compatible with no true effect.

Given the complexity of validating reported diarrhoeal disease in community-based surveys, investigators should take steps to minimize measurement error whenever possible. A 7-day recall period is commonly used in diarrhoea trials, but a shorter recall period may reduce subjectivity of reporting and possibly bias in unblinded trials. Using a 2- or 3-day recall often leads to only a small to moderate loss of power compared with a 7-day recall period, especially if diarrhoea is common, if the number of measurements per individual exceeds 10 or 12 and in cluster randomized trials. Instead of asking for diarrhoea in the previous 24, 48 or 72 h, one could consider asking for whole calendar days only (Did you have diarrhoea today? Yesterday? The day before yesterday?). Such questions are usually easier to ask and to answer. Numerous studies have demonstrated that symptom recall beyond 7 days is unreliable, and we do not recommend it. In any case, the final choice of the recall period should only be made after pilot-testing different approaches in a given setting.

Incidence or longitudinal prevalence as outcome measure

Diarrhoea can be measured as incidence (new episodes per person-time) or prevalence (disease presence at time t, Box 1). Incidence does not account for episode duration, an important risk factor for adverse outcomes. In settings where diarrhoea is common, it can be difficult to distinguish one episode from the next. Two to three days have been suggested as the most appropriate period to separate distinct episodes, an approach widely in use today. If diarrhoea is quite rare, it makes sense to use a longer gap (e.g. 6 days) to separate distinct episodes, since episodes are unlikely to occur close together by chance. Such definitions are, to some extent, arbitrary and will inevitably cause some misclassification. Methods have been developed to allow the comparison of studies using different definitions. Measuring incidence especially in high-risk settings can require close disease surveillance (e.g. one to three times/week) to establish beginning and end of episodes. However, a rough incidence estimate can be obtained by repeated period prevalence measurements assuming that diarrhoea preceded by a period without diarrhoea represents a new episode.

Incidence is an appropriate measure if the duration of illness is not of particular interest. A new episode can be interpreted as a case of pathogen transmission to a new host, which for disease control or vaccine research can be more important than episode duration. This applies to disease surveillance in middle- and high-income settings with low risk of malnutrition and diarrhoea-related mortality. For example, a study in the UK compared the incidence of diarrhoea in the community with cases reported to surveillance agencies. The duration of episodes was of little importance. Health service and vaccine researchers are often more interested in incident episodes than prevalence, focusing on the incidence of episodes with pre-defined characteristics, e.g. episodes of long duration or with blood/mucus, or watery diarrhoea for the surveillance of cholera.

Such studies often use passive case finding instead of intensive active surveillance, e.g. by measuring the incidence of hospital admissions. This approach allows obtaining detailed clinical data and causative agents assessed by health professionals, often at a higher standard compared with field data. Measuring the incidence of hospital admission biases the data towards severe episodes, which are often the episodes of highest public health interest. Since only a fraction of diarrhoea episodes are seen at hospitals, the study population receiving the intervention will have to be large. On the other hand, there is no need for repeated surveillance visits. Passive recording of the incidence of hospital admissions may be less prone to observer and responder bias than diarrhoea incidence recorded through active surveillance because, although bias cannot be excluded, study participants are less likely to decide on health-care use based on treatment allocation. If the aim of a study is to obtain detailed clinical data on all, not just severe, episodes, close active surveillance (e.g. contacting participants at least once a week) is usually required, especially if stool samples are collected.

Outside clinical studies, prevalence rather than incidence is often the outcome measure of choice, especially if prevalence can be measured repeatedly in the same individual. Repeated measurements provide an estimate of an individual's proportion of time ill, also termed ‘longitudinal prevalence’ (LP). The ideal settings for using LP as outcome are low-income, high-risk populations where preventing adverse outcomes such as death and malnutrition is important. LP is a better predictor of such complications than incidence.

Table 1 shows the results from two large RCTs conducted in Guatemala (Household water treatment intervention) and Brazil (Vitamin A supplementation). In the Guatemala trial, the interventions reduced the incidence of diarrhoea by 24%, whereas the mean LP (days with diarrhoea/days observed) was reduced by only 14%. This was because the intervention mostly prevented short episodes. In contrast, the Brazil study achieved only a small reduction in the incidence, which, however, masks the impact of the intervention on the duration of illness, leading to an LP reduction of 12% (note the differences in the P-values for LP vs incidence). In both cases it can be argued that longitudinal prevalence is the more appropriate way to measure public health impact.
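The two outcome measures discussed above can be derived from the same daily records. The following is a minimal sketch, not the authors' analysis code: it assumes one person's diary is a list of 0/1 values per calendar day, and the function names, the example diary and the choice to treat the start of follow-up as diarrhoea-free are illustrative assumptions.

```python
from typing import List


def count_episodes(days: List[int], gap: int = 2) -> int:
    """Count episodes in a daily 0/1 diarrhoea diary.

    A new episode starts on a diarrhoea day preceded by at least `gap`
    consecutive diarrhoea-free days.
    """
    episodes = 0
    free_run = gap  # assumption: the start of follow-up counts as diarrhoea-free
    for ill in days:
        if ill:
            if free_run >= gap:
                episodes += 1
            free_run = 0
        else:
            free_run += 1
    return episodes


def longitudinal_prevalence(days: List[int]) -> float:
    """Proportion of observed days with diarrhoea (LP)."""
    return sum(days) / len(days)


# Hypothetical 14-day diary: ill on days 0, 1, 3, 6, 7 and 8.
diary = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
print(count_episodes(diary, gap=2))    # single free day 2 does not split an episode
print(count_episodes(diary, gap=1))    # a 1-day gap splits it
print(longitudinal_prevalence(diary))  # days ill / days observed
```

The episode count changes with the chosen gap while LP does not, which is the sense in which LP needs no definition to separate episodes.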

Table 1

Incidence vs longitudinal prevalence of diarrhoea: impact on study results and interpretation in two randomized trials

	Incidence reduction (%)	P	Mean LP reduction (%)	P
Guatemala (n = 2982)
Water treatment	−24	0.001	−14	0.185
Brazil (n = 1180)
Vitamin A	−7	0.18	−12	0.06
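
As a rough illustration of why the two measures in Table 1 diverge: days ill per day observed equals episodes per day observed multiplied by days ill per episode, so relative changes in incidence and mean episode duration multiply. The sketch below applies this identity to the rounded percentages in Table 1. It assumes the identity carries over to the reported mean LP and ignores rounding and unequal follow-up, so the output is indicative only and is not a result reported by the trials.

```python
def implied_duration_change(incidence_change_pct: float, lp_change_pct: float) -> float:
    """Percentage change in mean episode duration implied by LP = incidence x duration."""
    ratio = (1 + lp_change_pct / 100) / (1 + incidence_change_pct / 100)
    return (ratio - 1) * 100


# Rounded values taken from Table 1.
guatemala = implied_duration_change(incidence_change_pct=-24, lp_change_pct=-14)
brazil = implied_duration_change(incidence_change_pct=-7, lp_change_pct=-12)

print(f"Guatemala: {guatemala:+.1f}% mean episode duration")
print(f"Brazil:    {brazil:+.1f}% mean episode duration")
```

A positive value for Guatemala is what one would expect if mostly short episodes were prevented (the remaining episodes are longer on average), and a negative value for Brazil is in line with an effect on illness duration.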

Individuals tend to differ more in the number of disease days than in the number of episodes they experience, since the variation in the duration of episodes increases the standard deviation (SD) of LP compared with incidence. LP studies may require a larger sample size than incidence studies, if the exposure variable has no effect on episode duration. If, however, the exposure variable is associated with shorter episodes (as in the Brazil Vitamin A study), using LP increases power because the effect size should be larger (Table 1). If an intervention reduces predominantly short episodes as in Guatemala (Table 1), incidence may be more powerful, but may also be less informative for public health purposes. Table 2 summarizes advantages and disadvantages of using incidence vs prevalence measurements.

Table 2

Comparison between incidence and longitudinal prevalence

	Incidence	LP
Suitable setting	Low diarrhoea risk	High diarrhoea risk
	Malnutrition and case fatality uncommon	Malnutrition and case fatality a public health problem
Suitable research objectives	Disease surveillance and control	Burden of disease
	Health services research	Adverse outcomes
	Vaccine research	Nutrition studies
	Aetiological research	
Data interpretation	Disease transmission	Burden of disease
		Risk of adverse outcomes
Definition to separate episodes	Required	Not required
Sampling frequency	Usually requires frequent and regular sampling, unless passive surveillance is used	Sampling at long or irregular intervals possible and often logistically efficient
Study power	Larger than for LP if exposure or treatment has no effect on episode duration	Larger than for incidence if exposure or treatment reduces episode duration


At what intervals should diarrhoea prevalence be measured?

Diarrhoea prevalence can be measured at long and irregular intervals because a prevalence measurement requires no information on when an episode started. Incidence may also be estimated by infrequent or irregular sampling, e.g. by assuming that any diarrhoea occurring within the recall period is a new episode if no diarrhoea was present early in the recall period. This, however, is when recall error may be greatest, making such incidence estimates potentially unreliable. Infrequent sampling can reduce costs and may increase validity, since frequent measurements may compromise the willingness of participants to report illness. Frequent measurements may lead to a better compliance with the intervention and a lower reported prevalence of diarrhoea, at least if each visit includes procedures that are clearly related to the intervention (e.g. water testing in a household water chlorination intervention).

Many repeat measurements of diarrhoea prevalence often provide little additional study power compared with fewer measurements. Clustering of disease in high-risk individuals means that if an individual reports being diseased at the time of a survey, he/she is more likely to have been ill on any other day than an individual reported healthy. The more illness is clustered in individuals, the more disease absence or presence in an individual at one point in time is representative of the true disease experience. Consider, as an example, a study in which weekly surveillance visits are conducted over 1 year, each time recording the daily point prevalence of diarrhoea over the 7 days since the last visit (a 1-week recall period), an approach resulting in continuous daily diarrhoea data. It has been shown that a study in which visits are conducted every 4 weeks instead of every week (again using a 1-week recall period) only requires a 15–30% larger sample size, while reducing the number of visits per person by 75%. In cluster randomized trials, the sample size increase in this example would be even smaller. Many measurements in the same cluster (e.g. more than 12 per year) yield little additional power, especially if within-cluster correlation of disease or cluster size is large.

Of note, studies using diarrhoea as the ‘exposure’ variable require more precise estimates of an individual’s burden of diarrhoea than studies with diarrhoea as ‘outcome’. For example, many studies have examined the effect of diarrhoea LP (the exposure variable) on mortality, malnutrition, or the risk of other infectious diseases (the outcomes). Imprecision in the measurement of diarrhoea as an ‘exposure’ variable (e.g. due to infrequent sampling) usually biases the effect estimate towards no effect (‘regression dilution bias’). Often, more than 15 to 20 visits will be required to limit bias. Also, when measuring diarrhoea as an ‘exposure’ variable, a short recall period (e.g. 3 days) may be preferable to minimize bias. If diarrhoea is the ‘outcome’ measure, imprecise diarrhoea estimates due to infrequent visits will only affect the precision of the effect estimate, not the effect size.

Temporary absence of study participants and logistical constraints can cause prevalence measurements to be taken at irregular intervals. This is not a problem if irregularity occurs at random or at least similarly between comparison groups, which often should be the case. The later analysis should be weighted by the number of measurements in an individual.
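
The visit-frequency example above (weekly vs 4-weekly visits over 1 year) can be worked through as simple arithmetic. The sketch below is illustrative only: the starting sample of 1000 individuals is a hypothetical figure, and the 15% and 30% inflation factors are the bounds quoted in the text.

```python
def total_visits(n_individuals: int, weeks_of_follow_up: int, interval_weeks: int) -> int:
    """Total home visits when each person is visited once every `interval_weeks`."""
    return n_individuals * (weeks_of_follow_up // interval_weeks)


def inflate(n_individuals: int, percent: int) -> int:
    """Increase a sample size by a whole-number percentage."""
    return n_individuals * (100 + percent) // 100


N_WEEKLY = 1000  # hypothetical sample size for the weekly-visit design
weekly = total_visits(N_WEEKLY, weeks_of_follow_up=52, interval_weeks=1)

for pct in (15, 30):
    n = inflate(N_WEEKLY, pct)
    four_weekly = total_visits(n, weeks_of_follow_up=52, interval_weeks=4)
    saving = 100 * (1 - four_weekly / weekly)
    print(f"+{pct}% individuals: {four_weekly} visits instead of {weekly} "
          f"({saving:.0f}% fewer visits overall)")
```

Under these assumptions, the larger sample only partly offsets the reduction in visits per person, so the total number of visits still falls by roughly two-thirds.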

Point prevalence vs period prevalence

While some investigators choose to record point prevalence (‘On which of the last X days did you suffer from diarrhoea?’), others collect period prevalence data (‘Did you have diarrhoea at any time during the last X days?’, Box 1). Period prevalence data are often used in large demographic and health surveys. Recording period prevalence may be simpler, but can reduce the difference (if expressed as a prevalence ratio) between two study groups because an individual with an episode several days long may be recorded as having the same disease experience as a person in the other group with only 1 disease day, i.e. period prevalence data bias the prevalence ratio towards no effect, especially if the disease is common (e.g. more than five episodes per person-year). Perhaps counter-intuitively, period prevalence as an imprecise outcome measure can achieve a higher study power than point prevalence even if the recall period is the same, because differences between individuals (i.e. the coefficient of variation of the mean LP) are reduced. However, period prevalence data are inappropriate to capture changes in illness duration. Effect sizes will be strongly biased towards no effect if an intervention primarily works by reducing episode duration, and will be exaggerated if an intervention primarily reduces short episodes (Table 1, Guatemala study). To conclude, investigators need to balance the advantages of using period prevalence data (easy to collect, slightly more powerful in many situations) with the risk of bias, which depends on the effect of the factor under study on illness duration. The collection of daily point prevalence with a limited recall period provides flexibility to use either outcome measure, but investigators should specify in advance which is to serve as the main study outcome to protect against selectively choosing a measure that provides the result most aligned with the investigators’ pre-conception.
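
The bias described above can be seen in a toy data set. The sketch below uses entirely hypothetical 7-day diaries for four people per arm, in which the intervention shortens episodes from 4 days to 1 day without preventing any. The function names are illustrative.

```python
from typing import List

Diary = List[int]  # one 0/1 entry per day of the recall window


def period_prevalence(diaries: List[Diary]) -> float:
    """Share of persons with diarrhoea on at least one day of the window."""
    return sum(1 for d in diaries if any(d)) / len(diaries)


def point_prevalence(diaries: List[Diary]) -> float:
    """Share of person-days with diarrhoea, pooled over the window."""
    return sum(sum(d) for d in diaries) / sum(len(d) for d in diaries)


healthy = [0, 0, 0, 0, 0, 0, 0]
control = [[0, 1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 0, 0, 0], healthy, healthy]
intervention = [[0, 1, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0], healthy, healthy]

period_ratio = period_prevalence(intervention) / period_prevalence(control)
point_ratio = point_prevalence(intervention) / point_prevalence(control)
print(f"Period prevalence ratio: {period_ratio:.2f}")
print(f"Point prevalence ratio:  {point_ratio:.2f}")
```

In this constructed example the period prevalence ratio shows no effect at all, whereas the point prevalence ratio reflects the shorter episodes.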

Adjusting for baseline diarrhoea and age

In many trials, investigators measure diarrhoea at baseline (before randomization). In general, baseline measurements in trials may serve to (i) verify randomization success, (ii) adjust the final analysis for imbalances and (iii) increase precision of the treatment effect by including the baseline measure as a covariate in an adjusted analysis. The latter two uses require that the baseline measure be strongly associated with the later outcome to be effective. Concerns over imbalances in diarrhoea prevalence at baseline have in the past prevented or severely delayed publication of trials.

However, caution is warranted in interpreting baseline diarrhoea data, specifically when used to verify the success of randomization. Most demographic variables commonly assessed at recruitment, such as date of birth, gender, family size or socio-economic status, do not change rapidly (if at all) and may later be used to adjust for imbalances. In contrast, diarrhoea prevalence is highly variable over time. If an individual has diarrhoea at baseline, it indicates that they may be more prone to diarrhoea during the follow-up period, but this depends on the within-person clustering of disease in a given setting. Typically, diarrhoea trials are designed to detect a certain difference between trial arms given a pre-specified number of repeat measurements (often more than 10), assuming a chance of false positivity of, say, 0.05. A ‘single’ measurement at baseline in the same number of people has a considerable chance of suggesting a relevant imbalance where there may be none.

It has been suggested that multiple baseline measurements collected during a run-in period could improve the efficiency of studies with both continuous and incidence rate outcomes, but this may not necessarily apply to diarrhoea. For example, Figure 1 plots the village-level diarrhoea incidence in 11 control villages from a randomized trial of solar water disinfection in Bolivia. The baseline diarrhoea measurement included 6 weeks of surveillance (six measurements per individual) that were collected 6 months before the intervention. As Figure 1 illustrates, baseline incidence bears no relation to incidence during the year-long intervention period. Several factors may have contributed to this, such as the long gap between baseline measurement and the actual trial and, in particular, the high spatial and temporal variability of diarrhoea often observed in the field. This contrasts with strong associations between baseline village HIV prevalence and subsequent incidence, or between baseline height-for-age Z-scores and subsequent height measurements.

Figure 1

Village-level diarrhoea incidence during a 12-month follow-up period in 11 control villages that participated in an intervention trial of solar water disinfection. Vertical lines mark bootstrapped 95% confidence intervals. The follow-up incidence is plotted against baseline incidence measured over a 6-week period (A), and against the village rank in baseline incidence over that same period (B).

Table 3 shows the effect of adjusting for baseline diarrhoea (single measurement) or age on the effect estimate and standard error (SE) in studies available to us (age usually is a strong predictor of diarrhoea). In some cases, adjusting for baseline diarrhoea or age can have a relevant effect on the effect estimates (e.g. Kenya and Colombia), and often reduces the SE. However, adjusting for covariates in RCTs by using statistical models in general can lead to bias, and should be conducted with caution. The protocol for adjusted analyses in randomized trials to gain study power or reduce bias should be pre-specified and reserved for large studies, where statistical models may be less biased. The age adjustments shown in Table 3, however, do not suggest a great gain in study power in large studies. In cluster randomized trials the gain in power due to baseline adjustment may be even lower than in individually randomized trials, especially if the between-cluster variation is high. Based on these results and Figure 1, we infer that baseline diarrhoea would make a poor matching or stratification variable in a trial's design.

Table 3

Effect of adjusting for baseline diarrhoea or age on point estimate and SE

				Crude analysis			Adjusted analysis
References	Country	Age range (years)	N	PR	SE	P	PR	SE	P	SE change (%)
Adjustment for baseline diarrhoea
Clasen et al.⁶⁵	Bolivia	0–80	317	0.55	0.16	0.042	0.52	0.16	0.038	+1
Boisson et al.⁶⁶	Congo	0–84	1144	0.85	0.15	0.336	0.88	0.15	0.447	+1
Colford et al.⁶⁷	USA	55–95	770	0.90	0.06	0.119	0.90	0.06	0.123	+0.1
Clasen et al.⁶⁸	Colombia	0–82	684	0.54	0.14	0.017	0.54	0.13	0.015	−1
Boisson et al.⁶⁹	Ethiopia	0–91	1516	0.75	0.09	0.011	0.74	0.08	0.007	−3
Trotta⁵⁹	Peru	0.5–1.5	483	0.98	0.20	0.902	1.03	0.18	0.850	−9
Tiwari et al.⁷⁰	Kenya	<15	216	0.37	0.13	0.004	0.31	0.12	0.002	−9
Adjustment for age
Tiwari et al.⁷⁰	Kenya	<15	216	0.37	0.13	0.004	0.37	0.13	0.005	+1
Colford et al.⁶⁷	USA	55–95	770	0.90	0.06	0.119	0.91	0.06	0.129	+0.3
VAST¹²	Ghana	0–5	1918	0.99	0.01	0.316	1.01	0.01	0.289	−1
Reller et al.⁴⁴	Guatemala	0–80	2980	0.86	0.08	0.106	0.86	0.08	0.112	−2
Boisson et al.⁶⁹	Ethiopia	0–91	1516	0.75	0.09	0.011	0.75	0.08	0.009	−3
Boisson et al.⁶⁶	Congo	0–84	1144	0.85	0.15	0.336	0.88	0.14	0.434	−3
Clasen et al.⁶⁸	Colombia	0–82	684	0.54	0.14	0.017	0.43	0.13	0.010	−6
Clasen et al.⁶⁵	Bolivia	0–80	317	0.55	0.16	0.042	0.48	0.12	0.005	−23

Age adjustment was made with age as categorical variable (<1 year, 1 to <2 years, 2 to <3 years, 3 to <5 years, 5 to <10 years, 10 to <15 years, ≥15 years), except for the US elderly population (55–64, 65–74, 75–84 and 85–95 years); PR, prevalence ratio.

To conclude, a single baseline measurement of diarrhoea should primarily be useful to confirm trial procedures and familiarize study participants and field staff with measurement procedures. Occasionally, it has been observed that the first or a single measurement in a trial may provide implausibly high estimates compared with follow-up visits. Participants concerned about potentially not being included in a trial may over-report the disease at first visit. A baseline measurement that would not be included in a later analysis may limit the impact of this possible effect.

Group-level clustering and design effect

Many diarrhoea studies need to consider clustering of diarrhoea in households or villages/neighbourhoods, e.g. if an intervention is randomized at group level. The effect of clustering can be expressed as the design effect DEFF, the factor by which the sample size needs to be increased to account for clustering:

$$\mathrm{DEFF} = 1 + (m - 1) \times \mathrm{ICC}$$

where m is the number of individuals per cluster, and ICC is the intra-cluster correlation coefficient. Estimating ICC and DEFF is one of the most challenging aspects in complex diarrhoea trials. Both depend on factors such as (i) mean number of persons per cluster, (ii) mean number of measurements per person, (iii) within-person correlation of diarrhoea (which strongly depends on the age range included) and (iv) the differences in diarrhoea risk between clusters (i.e. the between-cluster variability). In areas where a substantial proportion of diarrhoea occurs as localized epidemics shifting from place to place, between-cluster variability (i.e. ICC and DEFF) will be high because some areas may be experiencing an outbreak at the time of study, whereas others are not. In addition, the DEFF increases if cluster size and number of measurements per individual vary, which is usually the case in field studies. Calculating the DEFF for diarrhoea as a binary outcome based on an ICC estimate is not straightforward, perhaps best highlighted by the many different methods available. Estimating the ICC treating diarrhoea LP as a continuous outcome is problematic since follow-up time usually differs between individuals. Alternatively, the DEFF can be estimated directly from the SEs of the log prevalence ratio or log rate ratio resulting from clustered and unclustered analyses:

$$\mathrm{DEFF} = \frac{\mathrm{SE}_{\mathrm{clustered}}^{2}}{\mathrm{SE}_{\mathrm{unclustered}}^{2}}$$

where SEclustered is the standard error from an analysis accounting for clustering, and SEunclustered is the standard error from an analysis ignoring clustering. We calculated DEFFs from the data of several randomized trials available to us using this formula (for details, see footnote of Table 4). We calculated DEFFs separately for within-person and within-cluster correlation of disease to show the effect of group-level clustering in addition to the design effect due to within-person correlation.
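
The two routes to a design effect described in this section, the standard ICC-based expression DEFF = 1 + (m − 1) × ICC and the ratio of squared standard errors given in the footnote of Table 4, can be sketched as small helper functions. The function names and all input values below are hypothetical and chosen only to illustrate the arithmetic; they are not estimates from any of the trials.

```python
import math


def deff_from_icc(m: float, icc: float) -> float:
    """Design effect from mean cluster size m and intra-cluster correlation."""
    return 1 + (m - 1) * icc


def deff_from_se(se_clustered: float, se_unclustered: float) -> float:
    """Design effect as the ratio of squared standard errors."""
    return (se_clustered / se_unclustered) ** 2


def inflated_sample_size(n_unclustered: int, deff: float) -> int:
    """Sample size after multiplying by the design effect, rounded up."""
    # round() guards against floating-point noise before taking the ceiling
    return math.ceil(round(n_unclustered * deff, 6))


# Hypothetical inputs: 6 persons per household, ICC of 0.05.
deff = deff_from_icc(m=6, icc=0.05)
print(f"DEFF from ICC: {deff:.2f}")
print(f"Sample size: {inflated_sample_size(1000, deff)} instead of 1000")

# Hypothetical SEs of a log prevalence ratio with and without clustering.
print(f"DEFF from SEs: {deff_from_se(se_clustered=0.15, se_unclustered=0.10):.2f}")
```

Because the ICC-based expression assumes equal cluster sizes, it tends to understate the DEFF when cluster size or the number of measurements per person varies, as noted above.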

Table 4

Design effects in diarrhoea studies

Authors	Country	Age range (years)	Individuals (N)	Follow-up method	Weeks follow-up per person, mean (SD)	Outcome measure	Mean LP (% days or weeks ill)	Persons per cluster, mean (SD)	DEFF overall	DEFF due to clustering by person	DEFF due to clustering by household or village
Household clustering
Clasen et al.⁶⁵	Bolivia (rural)	0–80	317	Monthly visits with 7-day recall	4.8 (0.5)	Weekly period prevalence	3.7	5.4 (2.4)	1.3	1.3	1.0
Colford et al.⁶⁷	USA (rural)	55–95	770	Weekly visits with 7-day recall	45.1 (13.6)	Weekly period prevalence	7.4	1.4 (0.5)	2.4 (1.4)^a	2.3 (1.3)^a	1.0 (1.0)^a
Reller et al.⁷⁹	Guatemala (rural)	0–80	2980	Weekly visits with 7-day recall	52 (0)	Daily point prevalence	2.5	6.3 (2.4)	46.8 (7.1)^a	44.0 (5.9)^a	1.1 (1.2)^a
Arnold et al.⁸⁰	India (rural)	0–5	1184	Monthly visits with 7-day recall	11.1 (2.5)	Weekly period prevalence	1.8	1.4 (0.6)	1.4	1.3	1.1
Luby et al.⁸¹	Bangladesh (rural)	0–5	1278	Monthly visits with 2-day recall	4.7 (1.2)	2-days period prevalence	11.0	1.3 (0.5)	3.1	2.5	1.2
Boisson et al.⁶⁹	Ethiopia (rural)	0–91	1516	Monthly visits with 7-day recall	10 (1.2)	Weekly period prevalence	3.5	4.8 (2.4)	1.4	1.2	1.2
Tiwari et al.⁷⁰	Kenya (rural)	0–15	216	Monthly visits with 7-day recall	5.3 (1.4)	Daily point prevalence	3.6	3.3 (1.4)	7.4	5.7	1.3
Mausezahl et al.¹⁰	Bolivia (rural)	0–5	725	Weekly visits with 7-day recall	32.9 (9.9)	Weekly period prevalence	9.4	1.7 (0.7)	14.2 (6.1)^a	10.1 (4.0)^a	1.4 (1.5)^a
Boisson et al.⁶⁶	DRC (rural)	0–84	1144	Monthly visits with 7-day recall	10.6 (1.2)	Weekly period prevalence	3.5	4.8 (2.4)	2.5	1.8	1.4
Luby et al.⁸²	Pakistan (urban)	0–15	4691	Twice-weekly visit with 3–4 day recall	44.8 (11.0)	Weekly period prevalence	3.4	5.2 (2.2)	6.2 (4.5)^a	4.3 (2.8)^a	1.4 (1.6)^a
Van der Hoek et al.⁸³	Pakistan (rural)	0–80	1500	Weekly visits with 7-day recall	58.4 (10.0)	Daily point prevalence	1.5	6.7 (2.8)	57.5 (7.5)^a	37.5 (4.8)^a	1.5 (1.6)^a
Luby et al.⁸⁴	Pakistan (urban)	0–66	8949	Twice-weekly visit with 3–4 day recall	34.2 (7.6)	Weekly period prevalence	5.0	6.7 (2.6)	5.0 (3.7)^a	3.3 (2.0)^a	1.5 (1.9)^a
Clasen et al.⁶⁸	Colombia (rural)	0–82	684	Monthly visits with 7-day recall	3.8 (0.5)	Weekly period prevalence	7.1	5.0 (2.5)	3.2	1.5	2.1
Clasen et al.⁸⁵	Bolivia (rural)	1–86	278	Monthly visits with 7-day recall	3.9 (0.3)	Weekly period prevalence	16.7	5.6 (1.8)	4.6	1.4	3.2
Village/neighbourhood clustering
Barreto et al.⁸⁶	Brazil (urban)	0–5	1880	Thrice-weekly visit with 2–3 day recall	37.6 (17.9)	Daily point prevalence	2.9	76.4 (24.5)	1.0	10.2	0.1
Reller et al.⁷⁹	Guatemala (rural)	0–80	2980	Weekly visits with 7-day recall	52 (0)	Daily point prevalence	2.5	248.5 (159.7)	50.2 (6.1)^a	44.0 (4.7)^a	1.1 (1.3)^a
Trotta⁵⁹	Peru (urban)	0–2	483	Weekly visits with 7-day recall	6.7 (0.6)	Daily point prevalence	9.6	24.2 (1.0)	6.6	5.5	1.2
Arnold et al.⁸⁰	India (rural)	0–5	1184	Monthly visits with 7-day recall	11.1 (2.5)	Weekly period prevalence	1.8	51.4 (17.4)	1.6	1.3	1.3
Mausezahl et al.¹⁰	Bolivia (rural)	0–5	725	Weekly visits with 7-day recall	32.9 (9.9)	Weekly period prevalence	9.4	33.0 (7.6)	19.2 (7.4)^a	10.1 (4.0)^a	1.9 (1.8)^a
Van der Hoek et al.⁸³	Pakistan (rural)	0–80	1500	Weekly visits with 7-day recall	58.4 (10.0)	Daily point prevalence	1.5	146.8 (66.7)	115.1 (38.2)^a	37.5 (4.8)^a	3.1 (7.9)^a
Luby et al.⁸¹	Bangladesh (rural)	0–5	1278	Monthly visits with 2-day recall	4.7 (1.2)	2-days period prevalence	11.0	12.4 (2.7)	14.8	2.7	5.5
Luby et al.⁸²	Pakistan (urban)	0–15	4691	Twice-weekly visit with 3–4 day recall	44.8 (11.0)	Weekly period prevalence	3.4	130.3 (34.2)	56.7 (46.8)^a	4.3 (2.8)^a	13.2 (16.6)^a
Luby et al.⁸⁴	Pakistan (urban)	0–66	8949	Twice-weekly visit with 2–3 day recall	34.2 (7.6)	Weekly period prevalence	5.0	190 (42.7)	82.0 (51.5)^a	3.2 (2.0)^a	25.8 (25.4)^a

Design effects were calculated as DEFF = (SEclustered)²/(SEunclustered)²; diarrhoea was treated as a binary variable. Prevalence of diarrhoea (days or weeks with diarrhoea over days or weeks observed) was compared between treatment and control using log-binomial regression (family = binomial, link = log). Clustering was accounted for by using robust SE. SEunclustered was calculated ignoring any within-person or within-group correlation of diarrhoea. SEclustered to account for clustering by person was estimated with person identification (ID) as the cluster variable. SEclustered to account for clustering at group level used the group ID as cluster variable (the unit of randomization). If the original data provided no treatment variable, clusters were allocated post-hoc at random to a simulated treatment and control arm. DEFF due to clustering at group level was calculated as DEFFoverall/DEFFperson.

^a DEFFs in brackets show the same analyses for incidence (new episodes per days or weeks observed), calculated analogously to prevalence, using Poisson regression without robust SE (SEunclustered) and with robust SE to account for clustering (SEclustered).
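The two ratios defined in the footnote can be illustrated with a minimal sketch. This is not the authors' code: the function names and the standard errors below are hypothetical, and in practice the SEs would come from fitting the same regression model once without and once with robust (cluster) standard errors.

```python
def design_effect(se_clustered: float, se_unclustered: float) -> float:
    """DEFF = SE_clustered^2 / SE_unclustered^2 for the same effect estimate."""
    if se_unclustered <= 0:
        raise ValueError("se_unclustered must be positive")
    return (se_clustered / se_unclustered) ** 2


def group_design_effect(deff_overall: float, deff_person: float) -> float:
    """DEFF attributable to group-level clustering: DEFF_overall / DEFF_person."""
    if deff_person <= 0:
        raise ValueError("deff_person must be positive")
    return deff_overall / deff_person


# Hypothetical standard errors of a log prevalence ratio (illustration only).
se_unclustered = 0.050  # ignoring all correlation
se_person = 0.075       # robust SE, person ID as cluster variable
se_group = 0.090        # robust SE, group ID (unit of randomization) as cluster variable

deff_person = design_effect(se_person, se_unclustered)
deff_overall = design_effect(se_group, se_unclustered)
deff_group = group_design_effect(deff_overall, deff_person)

print(f"DEFF person:  {deff_person:.2f}")
print(f"DEFF overall: {deff_overall:.2f}")
print(f"DEFF group:   {deff_group:.2f}")
```

Applying `group_design_effect` to the rounded values published in Table 4 (for example an overall DEFF of 3.1 and a within-person DEFF of 2.5) approximately reproduces the group-level column, although small discrepancies due to rounding are to be expected.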

DEFFs for ‘household’ clustering are quite similar across studies, ranging from one to approximately three regardless of the study design (Table 4). In contrast, we found very different design effects, ranging from 0.1 to 25.8, if the unit of clustering was large (villages or neighbourhoods). In one case (urban Brazil), the design effect was much smaller for the analysis accounting for neighbourhood clustering than for the analysis accounting for within-person clustering only. In this setting an individually randomized trial may require a larger sample size than a cluster randomized trial, because children in the same cluster had very different diarrhoea risks, whereas the cluster-level diarrhoea risks were similar.
For six studies with continuous diarrhoea records we did the same calculation for incidence of new episodes (Table 4, DEFFs in brackets), mostly resulting in much lower within-person DEFFs and slightly higher household DEFFs compared with prevalence data. The DEFFs for incidence vs prevalence due to village/neighbourhood clustering were quite different in three of the six studies (rural Bolivia, rural Pakistan and urban Brazil). Overall, DEFFs in trials randomizing large clusters are difficult to predict unless previous data from the same site are available. Randomization of large clusters should perhaps be ‘avoided like the plague’ unless the research question requires it.

The DEFFs due to ‘within-person’ correlation depend very strongly on the number of measurements (Table 4), showing again that many repeated measurements contribute little to study power. Continuous surveillance of daily point prevalence generally results in extremely large within-person DEFFs because of high day-to-day correlation. DEFFs are much reduced if measurements are either reduced to period prevalence or separated by intervals between measurements.

Repeat measures add to the complexity of sample size calculations for cluster randomized diarrhoea trials, since the number of measurements also affects ‘group level’ ICC and hence DEFF. Sample size calculations for diarrhoea trials may have to be pragmatic and, even more so than for diseases that do not recur, undergo an iterative process testing different sampling intervals and cluster sizes. Several approaches are available. If diarrhoea is common, it can make sense to treat diarrhoea as a continuous variable (e.g. LP or number of episodes per person) and remove one level of complexity. This requires knowledge of the mean LP or number of episodes and the SD given a specified number of measurements (examples have been published elsewhere). The sample size resulting from simple formulae for the comparison of two means can then be multiplied by a group-level DEFF deemed appropriate (Table 4).

Note that the presence of several levels of clustering (e.g. person, household and area) does not necessarily require accounting for all of them in a later analysis. In cluster-randomized trials it is often sufficient to incorporate clustering at the level of the unit of randomization, i.e. the level of independence. This is because lower-level correlation of disease should increase the between-cluster variation at higher levels, which increases the SE accordingly.
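The approach of inflating a two-means sample size by a group-level DEFF can be sketched as follows. The sketch assumes the standard normal-approximation formula for two arms of equal size and equal SD, n per arm = 2(z_{1−α/2} + z_{power})² SD²/δ². The function names and the input values (a difference in mean LP of 1 percentage point, an SD of 8 percentage points and a DEFF of 2) are hypothetical and chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm_two_means(delta: float, sd: float,
                        alpha: float = 0.05, power: float = 0.80) -> float:
    """Unclustered sample size per arm for comparing two means (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)
    return 2 * ((z_alpha + z_power) * sd / delta) ** 2


def n_per_arm_clustered(delta: float, sd: float, deff: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Inflate the unclustered sample size by an assumed group-level design effect."""
    return ceil(n_per_arm_two_means(delta, sd, alpha, power) * deff)


# Hypothetical inputs: mean LP of 5% vs 4% (delta = 0.01), SD of individual LP = 0.08.
n_simple = ceil(n_per_arm_two_means(delta=0.01, sd=0.08))
n_cluster = n_per_arm_clustered(delta=0.01, sd=0.08, deff=2.0)

print(f"Persons per arm, no clustering: {n_simple}")
print(f"Persons per arm, DEFF = 2:      {n_cluster}")
```

Because the group-level DEFF is hard to predict for large clusters, it may be useful to rerun such a calculation over a range of plausible DEFFs rather than rely on a single value.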

Conclusion

When planning a study that measures diarrhoea, investigators must jointly consider the interdependent methodological points we have discussed in this article, which include recall periods, measures of disease occurrence (incidence vs prevalence), sampling frequencies and design effects. For example, the sampling frequency and the choice of the measure of disease occurrence can both influence the design effect. Conversely, the design effect can influence the choice of the sampling frequency or the recall period, because a strong design effect limits the study power gained from frequent sampling and long recall periods. Further, study settings differ from one another, especially in their logistics, which in turn has great implications for the study design. In some places it is difficult to recruit and train many field workers; in others it may be difficult to recruit study participants. As a consequence, it is difficult to develop universally applicable guidelines or a simple algorithm to identify the best way to measure diarrhoea in a specific study. In Table 5 we list examples of diarrhoea studies and suggest approaches to measure diarrhoea. None of our suggestions is meant to be absolute. As already suggested by Table 2, investigators must consider the research question first, as many critical decisions depend on it. For example, incidence of diarrhoea (such as hospital admissions) could be the preferred measure in vaccine trials. Point or period prevalence measured at long intervals could be ideal for large environmental health interventions in high-risk populations where many villages and individuals need to be surveyed over a long time. A high-risk study population here means a setting where malnutrition and case fatality are a public health problem. Some studies (such as Demographic and Health Surveys) require obtaining precise absolute prevalence figures, for which collecting point prevalence data with a short recall period is most suitable.

Table 5

Examples of different epidemiological studies and suggested sampling strategy

	Study example	Suggested sampling strategy
1	Context: RCT of household level food hygiene promotion to reduce the burden of diarrhoea, delivered by community health workers to mothers of young children Study population: children aged <5 years Logistics: adequate budget, trained staff and large eligible population available	Outcome measure: LP Surveillance duration: 1 year Sampling frequency: every 6–8 weeks (∼6–9 contacts) Recall period: 3 days Data type: point prevalence Comment: Incidence is not suitable as the treatment aims to lower disease burden, for which LP is likely to be a better measure. Sampling at intervals (with a corresponding increase in the overall sample size) is chosen to decrease survey effects and bias. 3-day recall is chosen to minimize recall error. The study is done over 1 year to study potential seasonal effects in food contamination. For the sample size within-household clustering can be ignored as the average number of young children per household is usually small (less than two).
2	Context: RCT of household level food hygiene promotion to reduce the burden of diarrhoea, delivered by community health workers to mothers of young children Study population: children aged <5 years Logistics: tight budget, trained staff scarce, large eligible population available	Outcome measure: LP Surveillance duration: 5 months Sampling frequency: every 4 weeks (∼6 contacts) Recall period: 7 days Data type: period prevalence Comment: Incidence is not suitable as the treatment aims to lower disease burden, for which LP is likely to be a better measure. Sampling at intervals (with a corresponding increase in the overall sample size) is chosen to decrease survey effects and bias. 7-day recall (period prevalence) is chosen to maximize power. The study is restricted to 5 months because of the tight budget, focussing on the hot season where food contamination may be most common. For the sample size within-household clustering can be ignored as the number of young children per household is small.
3	Context: RCT of household level food hygiene promotion to reduce the burden of diarrhoea, delivered by community health workers to mothers of young children Study population: children aged <5 years Logistics: tight budget, trained staff scarce, eligible population small (e.g. refugee camp)	Outcome measure: LP Surveillance duration: 5 months Sampling frequency: every 2 weeks (∼12 contacts) Recall period: 3 days Data type: point prevalence Comment: Incidence is not suitable as the treatment aims to lower disease burden, for which LP is likely to be a better measure. Frequent sampling is chosen to make the most of the small sample size. Short recall (point prevalence) is chosen to minimize recall error. Because of the short visit intervals, longer recall periods do not add much power.⁵⁶ The study is restricted to 5 months because of the tight budget, focussing on the hot season where food contamination may be most common. For the sample size within-household clustering can be ignored as the average number of young children per household is usually small (less than two).
4	Context: RCT of a new vaccine against a pathogen causing severe diarrhoea Study population: children aged <5 years Logistics: adequate budget, trained staff and large eligible population available	Outcome measure: incidence Surveillance duration: 12 months Sampling approach: passive surveillance of hospital admissions Recall period: Not applicable (NA) Data type: incidence of severe episodes Comment: Incidence is suitable as the treatment aims to lower disease transmission of a specific pathogen. Passive surveillance is chosen because a vaccine can be delivered relatively easily to a large study population, focussing on episodes of particular clinical interest. Because hospital admissions do not allow estimating the effect of the vaccine on LP (a better marker for adverse effects on nutritional status), one could consider adding a substudy with active surveillance similar to Example 1, as was done in a Vitamin A trial in Ghana.¹²
5	Context: cluster RCT of a large rural sanitation programme delivered at village level Study population: all ages Logistics: tight budget, trained staff scarce, large eligible population available	Outcome measure: LP Surveillance duration: 1 year Sampling frequency: every 6–8 weeks (∼6–9 contacts) Recall period: 7 days Data type: period prevalence Comment: Incidence is not suitable as the treatment aims to lower disease burden, for which LP is likely to be a better measure. Sampling at long intervals (with a corresponding increase in the number of included villages) is chosen to limit the number of surveillance teams and transport costs. The sampling procedure aims to measure the outcome in one village per day per team. In a cluster randomized trial, more frequent surveillance rounds add relatively little power. 7-day recall (period prevalence) is chosen to maximize power. Data on 3-day point prevalence can be obtained in addition as a secondary outcome. The effect of the intervention in children aged <5 years can be a secondary outcome. Because of the great uncertainties in study power due to the cluster-design, it is preferable to include all household members to maximize power. This is specifically the case if there is little reason to assume the intervention will affect young and older ages differently.
6	Context: observational study with recurrent infections as exposure (e.g. to study association between diarrhoea and reduction in weight-for-age Z-score) Study population: children aged <5 years Logistics: adequate budget, trained staff and large eligible population available	Outcome measure: LP Surveillance duration: 1 year Sampling frequency: every 2 weeks (∼20–25 visits) Recall period: 3 days Data type: point prevalence Comment: Frequent sampling is chosen to minimize bias towards no effect. Efforts should be made to keep study participants happy and interested. Bias is not a great concern in an observational study without differential treatment of study participants. Short recall period is chosen to minimize bias that could exaggerate the effect size.
7	Context: observational study >1 year, aimed at detailed exploration of clinical features of individual episodes (e.g. illness duration, severity, clinical signs and symptoms, stool testing for pathogens) Study population: children aged <5 years Logistics: adequate budget, trained staff and large eligible population available	Outcome measure: incidence Surveillance duration: 1 year Sampling frequency: once a week (∼50 contacts) Recall period: 7 days Data type: point prevalence data from which incidence can be calculated Comment: Frequent sampling is chosen to accurately establish the beginning and end of episodes, and to record clinical signs and symptoms in detail. Efforts should be made to keep study participants happy and interested. Bias is not a great concern in an observational study without treatment allocation of study participants. Continuous disease records may be needed, but depending on the budget, the surveillance period can be cut into blocks of, e.g., 6–8 weeks where surveillance is intense. This could allow capturing different seasons where different pathogens may circulate (dry cold season, wet season, hot season).
8	Context: demographic and health survey (DHS). The aim of the survey is to gain information on a range of topics, but the investigator also wishes to explore risk factors for diarrhoea (e.g. water, sanitation, socio-economic status) Study population: all ages Logistics: adequate budget, trained staff and large eligible population available	Outcome measure: LP Sampling frequency: one visit Recall period: 2–3 days Data type: point prevalence Comment: A short recall period is preferred to minimize recall error. A DHS usually aims to estimate prevalence as an absolute figure, not primarily to compare two groups, and therefore requires accurate data. Given the large sample size of most DHS surveys, loss of power due to a short recall period is normally not a big issue. Point prevalence data may often be easier to interpret and compare than period prevalence data, since diarrhoea definitions used in most DHS and epidemiological studies are based on disease experience during one day.

We did not describe a number of important methodological challenges in diarrhoea trials that have been discussed elsewhere, such as the clinical definition of disease severity or objective proxy markers for diarrhoea in trials of interventions that cannot be blinded. We also did not discuss recent advances in diagnostic tools for pathogen identification currently in use in some population-based studies.

Diarrhoea continues to be a major global health problem, and there is an ongoing debate over identifying research priorities and the development of cheap and effective interventions, given the limited funding. Whereas standard clinical trial procedures are often adequate to assess the effect of a vaccine or drug on diarrhoea in individuals, environmental interventions aiming at diarrhoea control are often much more complex, and more difficult to evaluate with randomized trials. Efficient methods to measure diarrhoea should allow research conducted with the same resources to yield more valid and generalizable results, especially in settings where resources are scarce.

Funding

This work was funded by the Wellcome Trust (WT082569AIA).

Diarrhoea day	Since diarrhoea symptoms occur intermittently, diarrhoea case definitions in epidemiology are usually based on the nature and frequency of symptoms experienced during one day (or 24 h). A diarrhoea case is therefore equivalent to a ‘diarrhoea day’. For example, the WHO definition requires the occurrence of ‘3 or more loose or liquid stools per day’.
Diarrhoea episode	One or more diarrhoea days occurring closely in time, presumably caused by a single agent or the interaction of multiple causative agents (e.g. as super-infection). Defining a diarrhoea episode requires deciding on how many diarrhoea-free days separate independent episodes. This decision is necessarily pragmatic especially in high-risk settings, as it is usually difficult to know whether diarrhoea days occurring closely in time belong to the same episode or not.
Diarrhoea incidence	The number of diarrhoea episodes per person-time (incidence density) or over a defined period of time (cumulative incidence).
Diarrhoea point prevalence	The proportion of the population experiencing a diarrhoea day at the time of interest, e.g. the day of a surveillance visit or the day before.
Diarrhoea period prevalence	The proportion of the population experiencing at least 1 day with diarrhoea over a pre-defined time window (recall period) prior to a given point in time, e.g. a surveillance visit by the study team.
Recall period	The period of time over which the occurrence of diarrhoea is assessed at each contact with a study participant (e.g. phone call or home visit). To measure point prevalence, the recall period is treated as individual days (for example: ‘on which of the last 7 days did you have diarrhoea?’). To measure period prevalence, the recall period is treated as a single time window (e.g. ‘did you have diarrhoea on any day during the last 7 days?’). Thus, when using a 7-day recall period, a single surveillance visit yields seven point prevalence datapoints, but only one period prevalence datapoint.
Longitudinal prevalence	The proportion of time an individual has diarrhoea. This can either be the proportion of days with diarrhoea (for point prevalence), or the proportion of time windows with at least 1 diarrhoea day (for period prevalence). For example, a person reporting diarrhoea on 10% of days has a longitudinal point prevalence of diarrhoea of 10%. A person reporting diarrhoea at any time in the last week in 10% of weeks of surveillance has a longitudinal period prevalence of 10%. Note that while prevalence is a population measure of disease occurrence, longitudinal prevalence (LP) is an individual measure. A person can have an LP of 10%, but not a prevalence of 10%. At population level, LP is best described by the mean and SD of individual LP values.
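
The definitions above can be made concrete with a small sketch that derives each measure from daily records (1 = diarrhoea day, 0 = diarrhoea-free day). The function names, the example records and the choice of two diarrhoea-free days to separate episodes are assumptions made for illustration only. The sketch also treats the start of the record as diarrhoea-free, which ignores left-censoring.

```python
from typing import List, Sequence


def longitudinal_prevalence(days: Sequence[int]) -> float:
    """Individual measure: proportion of observed days with diarrhoea."""
    return sum(days) / len(days)


def period_prevalence_windows(days: Sequence[int], window: int = 7) -> List[bool]:
    """One datapoint per complete window: True if at least 1 diarrhoea day in it."""
    return [any(days[i:i + window])
            for i in range(0, len(days) - window + 1, window)]


def count_episodes(days: Sequence[int], gap: int = 2) -> int:
    """Count episodes; a new one starts after at least `gap` diarrhoea-free days."""
    episodes = 0
    free_run = gap  # assumption: the record starts after a diarrhoea-free spell
    for d in days:
        if d:
            if free_run >= gap:
                episodes += 1
            free_run = 0
        else:
            free_run += 1
    return episodes


def point_prevalence(records: Sequence[Sequence[int]], day: int) -> float:
    """Population measure: proportion of persons with diarrhoea on a given day."""
    return sum(r[day] for r in records) / len(records)


# Hypothetical 21-day records for two children.
child_a = [0, 1, 1, 0, 1, 0, 0] + [0] * 7 + [0, 0, 0, 0, 0, 1, 0]
child_b = [0] * 21

weeks = period_prevalence_windows(child_a)
print("Longitudinal point prevalence:", longitudinal_prevalence(child_a))
print("Longitudinal period prevalence:", sum(weeks) / len(weeks))
print("Episodes (2 diarrhoea-free days):", count_episodes(child_a, gap=2))
print("Incidence per person-day:", count_episodes(child_a, gap=2) / len(child_a))
print("Point prevalence on day 2:", point_prevalence([child_a, child_b], day=1))
```

With these records, changing `gap` from 2 to 1 turns two episodes into three, which illustrates why the episode definition is necessarily a pragmatic choice.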

82 in total

1. Reducing diarrhea through the use of household-based ceramic water filters: a randomized, controlled trial in rural Bolivia.

Authors: Thomas F Clasen; Joseph Brown; Simon Collin; Oscar Suntura; Sandy Cairncross
Journal: Am J Trop Med Hyg Date: 2004-06 Impact factor: 2.345

2. Causal inference methods to study nonrandomized, preexisting development interventions.

Authors: Benjamin F Arnold; Ranjiv S Khush; Padmavathi Ramaswamy; Alicia G London; Paramasivan Rajkumar; Prabhakar Ramaprabha; Natesan Durairaj; Alan E Hubbard; Kalpana Balakrishnan; John M Colford
Journal: Proc Natl Acad Sci U S A Date: 2010-12-13 Impact factor: 11.205
