Wolf-Peter Schmidt, Benjamin F Arnold, Sophie Boisson, Bernd Genser, Stephen P Luby, Mauricio L Barreto, Thomas Clasen, Sandy Cairncross.
Abstract
BACKGROUND: Diarrhoea remains a leading cause of morbidity and mortality but is difficult to measure in epidemiological studies. Challenges include the diagnosis based on self-reported symptoms, the logistical burden of intensive surveillance and the variability of diarrhoea in space, time and person.
Year: 2011 PMID: 22268237 PMCID: PMC3235024 DOI: 10.1093/ije/dyr152
Source DB: PubMed Journal: Int J Epidemiol ISSN: 0300-5771 Impact factor: 7.196
Incidence vs longitudinal prevalence of diarrhoea: impact on study results and interpretation in two randomized trials
| Intervention | Incidence reduction (%) | P-value | Mean LP reduction (%) | P-value |
|---|---|---|---|---|
| Water treatment | −24 | 0.001 | −14 | 0.185 |
| Vitamin A | −7 | 0.18 | −12 | 0.06 |
Comparison between incidence and longitudinal prevalence
| | Incidence | LP |
|---|---|---|
| Suitable setting | Low diarrhoea risk; malnutrition and case fatality uncommon | High diarrhoea risk; malnutrition and case fatality a public health problem |
| Suitable research objectives | Disease surveillance and control; health services research; vaccine research; aetiological research | Burden of disease; adverse outcomes; nutrition studies |
| Data interpretation | Disease transmission | Burden of disease; risk of adverse outcomes |
| Definition to separate episodes | Required | Not required |
| Sampling frequency | Usually requires frequent and regular sampling, unless passive surveillance is used | Sampling at long or irregular intervals possible and often logistically efficient |
| Study power | Larger than for LP if exposure or treatment has no effect on episode duration | Larger than for incidence if exposure or treatment reduces episode duration |
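
The study-power row can be made concrete with a toy calculation. The sketch below assumes the simplification that LP is roughly (number of episodes × mean episode duration) / days observed, with non-overlapping episodes. The function names and all numbers are hypothetical and are not taken from the trials summarized here.

```python
def incidence(episodes: float, person_days: float) -> float:
    """Episodes per person-day of observation."""
    return episodes / person_days


def longitudinal_prevalence(episodes: float, mean_duration_days: float,
                            person_days: float) -> float:
    """Proportion of observed days with diarrhoea (assumes non-overlapping episodes)."""
    return episodes * mean_duration_days / person_days


PERSON_DAYS = 365.0  # one child followed for one year (hypothetical)

# Hypothetical control child: 6 episodes lasting 4 days each.
control = {"episodes": 6, "duration": 4.0}
# Hypothetical treatment A shortens episodes but does not prevent them.
shorter = {"episodes": 6, "duration": 3.0}
# Hypothetical treatment B prevents episodes but leaves duration unchanged.
fewer = {"episodes": 4, "duration": 4.0}


def ratios(arm: dict, ref: dict) -> tuple[float, float]:
    """Return (incidence ratio, LP ratio) of an arm versus the reference."""
    ir = (incidence(arm["episodes"], PERSON_DAYS)
          / incidence(ref["episodes"], PERSON_DAYS))
    lpr = (longitudinal_prevalence(arm["episodes"], arm["duration"], PERSON_DAYS)
           / longitudinal_prevalence(ref["episodes"], ref["duration"], PERSON_DAYS))
    return ir, lpr


for name, arm in [("shorter episodes", shorter), ("fewer episodes", fewer)]:
    ir, lpr = ratios(arm, control)
    print(f"{name}: incidence ratio {ir:.2f}, LP ratio {lpr:.2f}")
```

In this toy setting, shortening episodes changes only the LP ratio, whereas preventing episodes changes both ratios by the same factor, which is the intuition behind the power comparison in the table.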
Figure 1. Village-level diarrhoea incidence during a 12-month follow-up period in 11 control villages that participated in an intervention trial of solar water disinfection. Vertical lines mark bootstrapped 95% confidence intervals. The follow-up incidence is plotted against baseline incidence measured over a 6-week period (A), and against the village rank in baseline incidence over that same period (B).
Effect of adjusting for baseline diarrhoea or age on point estimate and SE
| References | Country | Age range (years) | n | Crude PR | Crude SE | Crude P-value | Adjusted PR | Adjusted SE | Adjusted P-value | SE change (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Clasen | Bolivia | 0–80 | 317 | 0.55 | 0.16 | 0.042 | 0.52 | 0.16 | 0.038 | +1 |
| Boisson | Congo | 0–84 | 1144 | 0.85 | 0.15 | 0.336 | 0.88 | 0.15 | 0.447 | +1 |
| Colford | USA | 55–95 | 770 | 0.90 | 0.06 | 0.119 | 0.90 | 0.06 | 0.123 | +0.1 |
| Clasen | Colombia | 0–82 | 684 | 0.54 | 0.14 | 0.017 | 0.54 | 0.13 | 0.015 | −1 |
| Boisson | Ethiopia | 0–91 | 1516 | 0.75 | 0.09 | 0.011 | 0.74 | 0.08 | 0.007 | −3 |
| Trotta | Peru | 0.5–1.5 | 483 | 0.98 | 0.20 | 0.902 | 1.03 | 0.18 | 0.850 | −9 |
| Tiwari | Kenya | <15 | 216 | 0.37 | 0.13 | 0.004 | 0.31 | 0.12 | 0.002 | −9 |
| Tiwari | Kenya | <15 | 216 | 0.37 | 0.13 | 0.004 | 0.37 | 0.13 | 0.005 | +1 |
| Colford | USA | 55–95 | 770 | 0.90 | 0.06 | 0.119 | 0.91 | 0.06 | 0.129 | +0.3 |
| VAST | Ghana | 0–5 | 1918 | 0.99 | 0.01 | 0.316 | 1.01 | 0.01 | 0.289 | −1 |
| Reller | Guatemala | 0–80 | 2980 | 0.86 | 0.08 | 0.106 | 0.86 | 0.08 | 0.112 | −2 |
| Boisson | Ethiopia | 0–91 | 1516 | 0.75 | 0.09 | 0.011 | 0.75 | 0.08 | 0.009 | −3 |
| Boisson | Congo | 0–84 | 1144 | 0.85 | 0.15 | 0.336 | 0.88 | 0.14 | 0.434 | −3 |
| Clasen | Colombia | 0–82 | 684 | 0.54 | 0.14 | 0.017 | 0.43 | 0.13 | 0.010 | −6 |
| Clasen | Bolivia | 0–80 | 317 | 0.55 | 0.16 | 0.042 | 0.48 | 0.12 | 0.005 | −23 |
Age adjustment was made with age as a categorical variable (<1 year, 1 to <2 years, 2 to <3 years, 3 to <5 years, 5 to <10 years, 10 to <15 years, ≥15 years), except for the US elderly population (55–64, 65–74, 75–84 and 85–95 years); PR, prevalence ratio.
Design effects in diarrhoea studies
| Authors | Country | Age range (years) | Individuals (n) | Follow-up method | Weeks follow-up per person, mean (SD) | Outcome measure | Mean LP (% days or weeks ill) | Persons per cluster, mean (SD) | DEFF overall | DEFF due to clustering by person | DEFF due to clustering by household or village |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Clasen | Bolivia (rural) | 0–80 | 317 | Monthly visits with 7-day recall | 4.8 (0.5) | Weekly period prevalence | 3.7 | 5.4 (2.4) | 1.3 | 1.3 | 1.0 |
| Colford | USA (rural) | 55–95 | 770 | Weekly visits with 7-day recall | 45.1 (13.6) | Weekly period prevalence | 7.4 | 1.4 (0.5) | 2.4 (1.4) | 2.3 (1.3) | 1.0 (1.0) |
| Reller | Guatemala (rural) | 0–80 | 2980 | Weekly visits with 7-day recall | 52 (0) | Daily point prevalence | 2.5 | 6.3 (2.4) | 46.8 (7.1) | 44.0 (5.9) | 1.1 (1.2) |
| Arnold | India (rural) | 0–5 | 1184 | Monthly visits with 7-day recall | 11.1 (2.5) | Weekly period prevalence | 1.8 | 1.4 (0.6) | 1.4 | 1.3 | 1.1 |
| Luby | Bangladesh (rural) | 0–5 | 1278 | Monthly visits with 2-day recall | 4.7 (1.2) | 2-days period prevalence | 11.0 | 1.3 (0.5) | 3.1 | 2.5 | 1.2 |
| Boisson | Ethiopia (rural) | 0–91 | 1516 | Monthly visits with 7-day recall | 10 (1.2) | Weekly period prevalence | 3.5 | 4.8 (2.4) | 1.4 | 1.2 | 1.2 |
| Tiwari | Kenya (rural) | 0–15 | 216 | Monthly visits with 7-day recall | 5.3 (1.4) | Daily point prevalence | 3.6 | 3.3 (1.4) | 7.4 | 5.7 | 1.3 |
| Mausezahl | Bolivia (rural) | 0–5 | 725 | Weekly visits with 7-day recall | 32.9 (9.9) | Weekly period prevalence | 9.4 | 1.7 (0.7) | 14.2 (6.1) | 10.1 (4.0) | 1.4 (1.5) |
| Boisson | DRC (rural) | 0–84 | 1144 | Monthly visits with 7-day recall | 10.6 (1.2) | Weekly period prevalence | 3.5 | 4.8 (2.4) | 2.5 | 1.8 | 1.4 |
| Luby | Pakistan (urban) | 0–15 | 4691 | Twice-weekly visit with 3–4 day recall | 44.8 (11.0) | Weekly period prevalence | 3.4 | 5.2 (2.2) | 6.2 (4.5) | 4.3 (2.8) | 1.4 (1.6) |
| Van der Hoek | Pakistan (rural) | 0–80 | 1500 | Weekly visits with 7-day recall | 58.4 (10.0) | Daily point prevalence | 1.5 | 6.7 (2.8) | 57.5 (7.5) | 37.5 (4.8) | 1.5 (1.6) |
| Luby | Pakistan (urban) | 0–66 | 8949 | Twice-weekly visit with 3–4 day recall | 34.2 (7.6) | Weekly period prevalence | 5.0 | 6.7 (2.6) | 5.0 (3.7) | 3.3 (2.0) | 1.5 (1.9) |
| Clasen | Colombia (rural) | 0–82 | 684 | Monthly visits with 7-day recall | 3.8 (0.5) | Weekly period prevalence | 7.1 | 5.0 (2.5) | 3.2 | 1.5 | 2.1 |
| Clasen | Bolivia (rural) | 1–86 | 278 | Monthly visits with 7-day recall | 3.9 (0.3) | Weekly period prevalence | 16.7 | 5.6 (1.8) | 4.6 | 1.4 | 3.2 |
| Barreto | Brazil (urban) | 0–5 | 1880 | Thrice-weekly visit with 2–3 day recall | 37.6 (17.9) | Daily point prevalence | 2.9 | 76.4 (24.5) | 1.0 | 10.2 | 0.1 |
| Reller | Guatemala (rural) | 0–80 | 2980 | Weekly visits with 7-day recall | 52 (0) | Daily point prevalence | 2.5 | 248.5 (159.7) | 50.2 (6.1) | 44.0 (4.7) | 1.1 (1.3) |
| Trotta | Peru (urban) | 0–2 | 483 | Weekly visits with 7-day recall | 6.7 (0.6) | Daily point prevalence | 9.6 | 24.2 (1.0) | 6.6 | 5.5 | 1.2 |
| Arnold | India (rural) | 0–5 | 1184 | Monthly visits with 7-day recall | 11.1 (2.5) | Weekly period prevalence | 1.8 | 51.4 (17.4) | 1.6 | 1.3 | 1.3 |
| Mausezahl | Bolivia (rural) | 0–5 | 725 | Weekly visits with 7-day recall | 32.9 (9.9) | Weekly period prevalence | 9.4 | 33.0 (7.6) | 19.2 (7.4) | 10.1 (4.0) | 1.9 (1.8) |
| Van der Hoek | Pakistan (rural) | 0–80 | 1500 | Weekly visits with 7-day recall | 58.4 (10.0) | Daily point prevalence | 1.5 | 146.8 (66.7) | 115.1 (38.2) | 37.5 (4.8) | 3.1 (7.9) |
| Luby | Bangladesh (rural) | 0–5 | 1278 | Monthly visits with 2-day recall | 4.7 (1.2) | 2-days period prevalence | 11.0 | 12.4 (2.7) | 14.8 | 2.7 | 5.5 |
| Luby | Pakistan (urban) | 0–15 | 4691 | Twice-weekly visit with 3–4 day recall | 44.8 (11.0) | Weekly period prevalence | 3.4 | 130.3 (34.2) | 56.7 (46.8) | 4.3 (2.8) | 13.2 (16.6) |
| Luby | Pakistan (urban) | 0–66 | 8949 | Twice-weekly visit with 2–3 day recall | 34.2 (7.6) | Weekly period prevalence | 5.0 | 190 (42.7) | 82.0 (51.5) | 3.2 (2.0) | 25.8 (25.4) |
Design effects were calculated as $\mathrm{DEFF} = SE_{\text{clustered}}^2 / SE_{\text{unclustered}}^2$; diarrhoea was treated as a binary variable. Prevalence of diarrhoea (days or weeks with diarrhoea over days or weeks observed) was compared between treatment and control using log-binomial regression (family = binomial, link = log). Clustering was accounted for by using robust SE. $SE_{\text{unclustered}}$ was calculated ignoring any within-person or within-group correlation of diarrhoea. $SE_{\text{clustered}}$ to account for clustering by person was estimated with person identification (ID) as the cluster variable. $SE_{\text{clustered}}$ to account for clustering at group level used the group ID as the cluster variable (the unit of randomization). If the original data provided no treatment variable, clusters were allocated post-hoc at random to a simulated treatment and control arm. DEFF due to clustering at group level was calculated as $\mathrm{DEFF}_{\text{overall}} / \mathrm{DEFF}_{\text{person}}$.
(a) DEFFs show the same analyses for incidence (new episodes/days or weeks observed), calculated analogously to prevalence, using Poisson regression without ($SE_{\text{unclustered}}$) and with robust SE to account for clustering ($SE_{\text{clustered}}$).
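
The ratio-of-squared-SEs definition above can be sketched in a few lines. The table's DEFFs come from log-binomial (or Poisson) regression with robust SE comparing two arms. The sketch below swaps in a simpler stand-in: the cluster-robust (sandwich) SE of a single overall prevalence, with no small-sample correction. All function names and the example data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def se_unclustered(y):
    """SE of the mean of binary outcomes, ignoring any correlation."""
    n = len(y)
    p = sum(y) / n
    return sqrt(p * (1 - p) / n)


def se_clustered(y, cluster_ids):
    """Cluster-robust SE of the mean: residuals are summed within each
    cluster before squaring (no small-sample correction; an assumption)."""
    n = len(y)
    p = sum(y) / n
    resid_sums = defaultdict(float)
    for yi, cid in zip(y, cluster_ids):
        resid_sums[cid] += yi - p
    return sqrt(sum(r * r for r in resid_sums.values())) / n


def deff(y, cluster_ids):
    """DEFF = SE_clustered^2 / SE_unclustered^2."""
    se0 = se_unclustered(y)
    if se0 == 0:
        raise ValueError("outcome does not vary; DEFF is undefined")
    return (se_clustered(y, cluster_ids) / se0) ** 2


def deff_group_level(y, person_ids, group_ids):
    """DEFF due to group-level clustering = DEFF_overall / DEFF_person."""
    return deff(y, group_ids) / deff(y, person_ids)


# Hypothetical data: 4 persons in 2 villages, 3 observed weeks each.
y = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
person = ["p1"] * 3 + ["p2"] * 3 + ["p3"] * 3 + ["p4"] * 3
village = ["A"] * 6 + ["B"] * 6

print("DEFF by person:", round(deff(y, person), 2))
print("DEFF overall (village):", round(deff(y, village), 2))
print("DEFF due to village:", round(deff_group_level(y, person, village), 2))
```

Two sanity checks follow from the definition: when every observation is its own cluster the DEFF is 1, and when outcomes are perfectly correlated within equal-sized clusters the DEFF equals the cluster size.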
Examples of different epidemiological studies and suggested sampling strategy
| | Study example | Suggested sampling strategy |
|---|---|---|
| 1 | Context: RCT of household-level food hygiene promotion to reduce the burden of diarrhoea, delivered by community health workers to mothers of young children. Study population: children aged <5 years. Logistics: adequate budget, trained staff and large eligible population available. | Outcome measure: LP. Surveillance duration: 1 year. Sampling frequency: every 6–8 weeks (∼6–9 contacts). Recall period: 3 days. Data type: point prevalence. Incidence is not suitable as the treatment aims to lower disease burden, for which LP is likely to be a better measure. Sampling at intervals (with a corresponding increase in the overall sample size) is chosen to decrease survey effects and bias. A 3-day recall is chosen to minimize recall error. The study is done over 1 year to study potential seasonal effects in food contamination. For the sample size, within-household clustering can be ignored as the average number of young children per household is usually small (less than two). |
| 2 | Context: RCT of household-level food hygiene promotion to reduce the burden of diarrhoea, delivered by community health workers to mothers of young children. Study population: children aged <5 years. Logistics: tight budget, trained staff scarce, large eligible population available. | Outcome measure: LP. Surveillance duration: 5 months. Sampling frequency: every 4 weeks (∼6 contacts). Recall period: 7 days. Data type: period prevalence. Incidence is not suitable as the treatment aims to lower disease burden, for which LP is likely to be a better measure. Sampling at intervals (with a corresponding increase in the overall sample size) is chosen to decrease survey effects and bias. A 7-day recall (period prevalence) is chosen to maximize power. The study is restricted to 5 months because of the tight budget, focussing on the hot season, when food contamination may be most common. For the sample size, within-household clustering can be ignored as the number of young children per household is small. |
| 3 | Context: RCT of household-level food hygiene promotion to reduce the burden of diarrhoea, delivered by community health workers to mothers of young children. Study population: children aged <5 years. Logistics: tight budget, trained staff scarce, eligible population small (e.g. refugee camp). | Outcome measure: LP. Surveillance duration: 5 months. Sampling frequency: every 2 weeks (∼12 contacts). Recall period: 3 days. Data type: point prevalence. Incidence is not suitable as the treatment aims to lower disease burden, for which LP is likely to be a better measure. Frequent sampling is chosen to make the most of the small sample size. Short recall (point prevalence) is chosen to minimize recall error. Because of the short visit intervals, longer recall periods do not add much power. The study is restricted to 5 months because of the tight budget, focussing on the hot season, when food contamination may be most common. For the sample size, within-household clustering can be ignored as the average number of young children per household is usually small (less than two). |
| 4 | Context: RCT of a new vaccine against a pathogen causing severe diarrhoea. Study population: children aged <5 years. Logistics: adequate budget, trained staff and large eligible population available. | Outcome measure: incidence. Surveillance duration: 12 months. Sampling approach: passive surveillance of hospital admissions. Recall period: not applicable (NA). Data type: incidence of severe episodes. Incidence is suitable as the treatment aims to lower disease transmission of a specific pathogen. Passive surveillance is chosen because a vaccine can be delivered relatively easily to a large study population, focussing on episodes of particular clinical interest. Because hospital admissions do not allow estimating the effect of the vaccine on LP (a better marker for adverse effects on nutritional status), one could consider adding a substudy with active surveillance similar to Example 1, as was done in a Vitamin A trial in Ghana. |
| 5 | Context: cluster RCT of a large rural sanitation programme delivered at village level. Study population: all ages. Logistics: tight budget, trained staff scarce, large eligible population available. | Outcome measure: LP. Surveillance duration: 1 year. Sampling frequency: every 6–8 weeks (∼6–9 contacts). Recall period: 7 days. Data type: period prevalence. Incidence is not suitable as the treatment aims to lower disease burden, for which LP is likely to be a better measure. Sampling at long intervals (with a corresponding increase in the number of included villages) is chosen to limit the number of surveillance teams and transport costs. The sampling procedure aims to measure the outcome in one village per day per team. In a cluster randomized trial, more frequent surveillance rounds add relatively little power. A 7-day recall (period prevalence) is chosen to maximize power. Data on 3-day point prevalence can be obtained in addition as a secondary outcome. The effect of the intervention in children aged <5 years can be a secondary outcome. Because of the great uncertainties in study power due to the cluster design, it is preferable to include all household members to maximize power. This is specifically the case if there is little reason to assume the intervention will affect young and older ages differently. |
| 6 | Context: observational study with recurrent infections as exposure (e.g. to study the association between diarrhoea and reduction in weight-for-age). Study population: children aged <5 years. Logistics: adequate budget, trained staff and large eligible population available. | Outcome measure: LP. Surveillance duration: 1 year. Sampling frequency: every 2 weeks (∼20–25 visits). Recall period: 3 days. Data type: point prevalence. Frequent sampling is chosen to minimize bias towards no effect. Efforts should be made to keep study participants happy and interested. Bias is not a great concern in an observational study without differential treatment of study participants. A short recall period is chosen to minimize bias that could exaggerate the effect size. |
| 7 | Context: observational study >1 year, aimed at detailed exploration of clinical features of individual episodes (e.g. illness duration, severity, clinical signs and symptoms, stool testing for pathogens). Study population: children aged <5 years. Logistics: adequate budget, trained staff and large eligible population available. | Outcome measure: incidence. Surveillance duration: 1 year. Sampling frequency: once a week (∼50 contacts). Recall period: 7 days. Data type: point prevalence data from which incidence can be calculated. Frequent sampling is chosen to accurately establish the beginning and end of episodes, and to record clinical signs and symptoms in detail. Efforts should be made to keep study participants happy and interested. Bias is not a great concern in an observational study without treatment allocation of study participants. Continuous disease records may be needed, but depending on the budget, the surveillance period can be cut into blocks of, e.g., 6–8 weeks where surveillance is intense. This could allow capturing different seasons where different pathogens may circulate (dry cold season, wet season, hot season). |
| 8 | Context: demographic and health survey (DHS). The aim of the survey is to gain information on a range of topics, but the investigator also wishes to explore risk factors for diarrhoea (e.g. water, sanitation, socio-economic status). Study population: all ages. Logistics: adequate budget, trained staff and large eligible population available. | Outcome measure: LP. Sampling frequency: one visit. Recall period: 2–3 days. Data type: point prevalence. A short recall period is preferred to minimize recall error. A DHS usually aims to estimate prevalence as an absolute figure, not primarily to compare two groups, and therefore requires accurate data. Given the large sample size of most DHS surveys, loss of power due to a short recall period is normally not a big issue. Point prevalence data are often easier to interpret and compare than period prevalence data, since diarrhoea definitions used in most DHS and epidemiological studies are based on disease experience during one day. |

| Term | Definition |
|---|---|
| Diarrhoea day | Since diarrhoea symptoms occur intermittently, diarrhoea case definitions in epidemiology are usually based on the nature and frequency of symptoms experienced during one day (or 24 h). A diarrhoea case is therefore equivalent to a ‘diarrhoea day’. For example, the WHO definition requires the occurrence of ‘3 or more loose or liquid stools per day’. |
| Diarrhoea episode | One or more diarrhoea days occurring closely in time, presumably caused by a single agent or the interaction of multiple causative agents (e.g. as super-infection). Defining a diarrhoea episode requires deciding on how many diarrhoea-free days separate independent episodes. This decision is necessarily pragmatic especially in high-risk settings, as it is usually difficult to know whether diarrhoea days occurring closely in time belong to the same episode or not. |
| Diarrhoea incidence | The number of diarrhoea episodes per person-time (incidence density) or over a defined period of time (cumulative incidence). |
| Diarrhoea point prevalence | The proportion of the population experiencing a diarrhoea day at the time of interest, e.g. the day of a surveillance visit or the day before. |
| Diarrhoea period prevalence | The proportion of the population experiencing at least 1 day with diarrhoea over a pre-defined time window (recall period) prior to a given point in time, e.g. a surveillance visit by the study team. |
| Recall period | The period of time over which the occurrence of diarrhoea is assessed at each contact with a study participant (e.g. phone call or home visit). To measure point prevalence, the recall period is treated as individual days (for example: ‘on which of the last 7 days did you have diarrhoea?’). To measure period prevalence, the recall period is treated as a single time window (e.g. ‘did you have diarrhoea on any day during the last 7 days?’). Thus, when using a 7-day recall period, a single surveillance visit yields seven point-prevalence datapoints, but only one period-prevalence datapoint. |
| Longitudinal prevalence | The proportion of time an individual has diarrhoea. This can either be the proportion of days with diarrhoea (for point prevalence), or the proportion of time windows with at least 1 diarrhoea day (for period prevalence). For example, a person reporting diarrhoea on 10% of days has a longitudinal point prevalence of diarrhoea of 10%. A person reporting diarrhoea at any time in the last week, in 10% of weeks of surveillance has a longitudinal period prevalence of 10%. Note that while prevalence is a population measure of disease occurrence, LP is an individual measure. A person can have an LP of 10%, but not a prevalence of 10%. At population level, LP is best described by the mean and SD of individual LP values. |
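
The definitions above can be applied to a daily symptom diary as follows. This is a minimal sketch for a single individual with a complete record: the function names, the example diary, and in particular the default of two diarrhoea-free days to separate episodes are illustrative assumptions (the definitions stress that this choice is pragmatic).

```python
def point_lp(days):
    """Longitudinal point prevalence: proportion of observed days that are
    diarrhoea days. `days` is a sequence of 0/1 flags, one per day."""
    return sum(days) / len(days)


def period_lp(days, window=7):
    """Longitudinal period prevalence: proportion of non-overlapping
    `window`-day blocks with at least one diarrhoea day.
    A trailing incomplete block is dropped."""
    n_windows = len(days) // window
    if n_windows == 0:
        raise ValueError("record is shorter than one window")
    hits = sum(any(days[i * window:(i + 1) * window]) for i in range(n_windows))
    return hits / n_windows


def count_episodes(days, gap=2):
    """Count episodes: a diarrhoea day starts a new episode if it is the first
    in the record or follows at least `gap` diarrhoea-free days.
    The default gap of 2 is an assumption, not a recommendation."""
    episodes = 0
    clean_run = None  # diarrhoea-free days since the last diarrhoea day
    for ill in days:
        if ill:
            if clean_run is None or clean_run >= gap:
                episodes += 1
            clean_run = 0
        elif clean_run is not None:
            clean_run += 1
    return episodes


# Hypothetical 21-day diary for one child (1 = diarrhoea day).
diary = [0, 1, 1, 0, 1, 0, 0,
         0, 0, 0, 0, 0, 0, 0,
         1, 0, 0, 0, 0, 0, 0]

print("Point LP:", point_lp(diary))
print("Weekly period LP:", period_lp(diary, window=7))
print("Episodes (gap=2):", count_episodes(diary, gap=2))
print("Incidence per day:", count_episodes(diary, gap=2) / len(diary))
```

Changing `gap` alters the episode count, and therefore the incidence, without changing either LP measure, which illustrates why LP needs no definition to separate episodes.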