
The impact of mass deworming programmes on schooling and economic development: an appraisal of long-term studies.

Sophie Jullien, David Sinclair, Paul Garner.

Abstract

Background: Documents from advocacy and fund-raising organizations for child mass deworming programmes in low- and middle-income countries cite unpublished economic studies claiming long-term effects on health, schooling and economic development.
Methods: To summarize and appraise these studies, we searched for and included all long-term follow-up studies based on cluster-randomized trials included in a 2015 Cochrane review on deworming. We used Cochrane methods to assess risk of bias, and appraised the credibility of the main findings. Where necessary we contacted study authors for clarifications.
Results: We identified three studies (Baird 2016, Ozier 2016 and Croke 2014) evaluating effects more than 9 years after cluster-randomized trials in Kenya and Uganda. Baird and Croke evaluate short additional exposures to deworming programmes in settings where all children were dewormed multiple times. Ozier evaluates potential spin-off effects to infants living in areas with school-based deworming. None of the studies used pre-planned protocols nor blinded the analysis to treatment allocation.
Conclusions: In the context of reliable epidemiological methods, all three studies are at risk of substantial methodological bias. They therefore help in generating hypotheses, but should not be considered to provide reliable evidence of effects.


Keywords: Helminths; bias; children; cluster analyses; parasitic worms

Substances:
Anthelmintics

Year: 2016 PMID: 28161712 PMCID: PMC5841841 DOI: 10.1093/ije/dyw283

Source DB: PubMed Journal: Int J Epidemiol ISSN: 0300-5771 Impact factor: 7.196

Introduction

Soil-transmitted helminths remain common in many low- and middle-income countries, despite some evidence that global infection intensity may be declining. The worms are unpleasant, cause discomfort and, with heavy infections, can undermine nutritional status and lead to serious complications. It is obvious that children with symptomatic infection should be treated. It is also obvious that repeated mass treatment of whole communities with effective drugs will reduce the overall worm burden where helminths are common, at least in the short term. What is not obvious is whether mass deworming programmes have any measurable long-term effect on health and nutrition at the community level. A 2015 Cochrane review of trials administering multiple rounds of deworming treatment found little or no effect on average weight gain or average haemoglobin across 10 trials with more than 38 000 participants. The review authors (who include one of the authors of this paper) interpreted this as reasonable evidence of no effect. Others have suggested that the trials were simply too short or were poorly designed for detecting effects once infected children are dispersed among large numbers of uninfected children.

However, much of the advocacy and fund-raising for mass deworming programmes in children has drawn on studies reporting long-term effects on school attendance and economic development. This advocacy contributed to the decision by India to run the largest national deworming programme in the world (targeting 270 million children in schools and preschools in 2016), and the Cochrane review has been criticized for excluding the studies cited for these effects. The objective of this paper was therefore to use health technology assessment methods based on epidemiological principles to appraise the methods of these studies and to interpret their findings in the light of this appraisal.

Methods

Inclusion criteria

We included all follow-up studies based on randomized or quasi-randomized experimental trials (termed ‘base trials’) included in the 2015 Cochrane review. We included all outcome domains identified by the literature in this field as important for decision making and included in the Cochrane review (see Figure 1): nutritional status (measured by weight, height and haemoglobin); physical well-being (measured by exercise tolerance or self-reported measures); school attendance (measured by days present at school or years of school enrolment); and cognition and school performance (measured by formal tests and exam performance). In addition, we included all economic productivity outcomes the author teams deemed to be reasonable overall measures.

Figure 1.

Logic model for the effects of community deworming.

Reproduced with permission from Taylor-Robinson 2015.


Search strategy

We identified the main unpublished studies being cited by reviewing the reference lists of prominent papers and the webpages of deworming advocates. We also searched PubMed for published follow-up studies to all cluster-randomized studies included in the Cochrane review by using the search terms: ((deworm*[Title/Abstract] OR helminth*[Title/Abstract]) AND (Alderman[Author] OR Awasthi[Author] OR Hall[Author] OR Stoltzfus[Author] OR Wiria[Author] OR Rousham[Author] OR Miguel[Author])). Two authors independently screened the search results and applied the inclusion criteria.
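The structure of the search string can be made explicit by assembling it from its two term lists. The following is only an illustrative sketch: the function and variable names are ours, and nothing here is meant to suggest the search was actually run programmatically.

```python
# Illustrative sketch: rebuild the PubMed search string from its components.
# The names TOPIC_TERMS, AUTHORS and build_query are hypothetical.

TOPIC_TERMS = ["deworm*", "helminth*"]
AUTHORS = ["Alderman", "Awasthi", "Hall", "Stoltzfus", "Wiria", "Rousham", "Miguel"]


def build_query(topic_terms, authors):
    """Combine topic terms (OR) and author names (OR), joined by AND."""
    topics = " OR ".join(f"{term}[Title/Abstract]" for term in topic_terms)
    names = " OR ".join(f"{name}[Author]" for name in authors)
    return f"(({topics}) AND ({names}))"


query = build_query(TOPIC_TERMS, AUTHORS)
print(query)
```

Keeping the author list separate from the topic terms makes it easy to see that the search is restricted to first authors of the cluster-randomized base trials.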

Risk of bias assessment

For the base trials, we described the study design, setting, population, intervention and control. We assessed the risk of bias using the Cochrane tool for appraising randomized controlled trials and considered the potential for bias to also influence the results of the follow-up studies. For each follow-up study, we described the study design, population, timing, intervention and control group exposure to mass deworming, analytical approach, outcome measurement and reporting, and results. For the risk of bias assessment, we adapted the Cochrane tool for randomized controlled trials to take into account the additional risks posed by cross-sectional sampling from communities, many years after the planned experiment finished, as follows.
Selection bias: we considered the randomization process of the base trial, the methods for selection of a proportional sample and the balance of potential confounders between groups.
Measurement and detection bias: we considered the methods used to blind those collecting and analysing the data to the treatment allocation.
Attrition bias: we considered loss of clusters, exclusion of participants after enrolment, migration in and out of the study area and the proportion and potential impact of missing outcome data.
Selective reporting bias: we considered the use of a pre-planned protocol, the number of outcomes assessed and the potential for false-positive results, changes in the reporting of outcomes over time, and inclusion of important findings (showing association, or showing a lack of association) in the abstract.
For each domain, we classified studies as: ‘low risk’ when appropriate methods were described to reduce the potential for bias; ‘high risk’ when the methods described were inadequate to negate the potential for bias to influence the results; and ‘unclear risk’ when the impacts of any methodological problems were uncertain or there was insufficient information to make a clear judgment. We refined these assessments after contacting the study authors for additional information.

Outcome credibility assessment

We first summarized the effect size and 95% confidence interval for all outcomes reported by the studies across the policy-important domains. As the included studies are outside the scope of what would normally be included in a Cochrane review, we familiarized ourselves with the study methods and findings and discussed which factors would be likely to be important when appraising the results. We then applied this appraisal systematically across studies. We then further assessed the credibility of all the main findings reported in the abstracts, by considering: the evidence base for the stated effect from the main text (we considered an effect to be present if P < 0.05); the power of the study to detect this effect; the consistency of the effect across subgroups; the consistency of the effect across similar or related outcomes; and the robustness of the effect to adjustment for multiple inferences [although statistical adjustment for multiple testing is of limited value without a pre-planned analytical protocol, we considered the effect to be robust if the false discovery rate (FDR) q-value < 0.05]. We also considered whether intermediate effects were present or absent on plausible causal pathways, and the plausibility of the effect in relation to the intensity of the intervention.
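To illustrate what the q-value threshold means in practice, the sketch below computes false discovery rate q-values with the Benjamini-Hochberg step-up procedure. This section does not state which FDR procedure produced the q-values we relied on, so Benjamini-Hochberg is an assumption made purely for illustration, and the P-values are hypothetical rather than taken from any of the studies.

```python
# Illustrative sketch: Benjamini-Hochberg q-values for a set of P-values.
# Assumption: BH is used as a stand-in FDR procedure; the P-values are made up.

def bh_q_values(p_values):
    """Return BH-adjusted q-values in the same order as the input P-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


p = [0.001, 0.03, 0.04, 0.30, 0.80]  # hypothetical P-values for five outcomes
q = bh_q_values(p)
nominal = [value < 0.05 for value in p]  # 'effect present' by the P < 0.05 rule
robust = [value < 0.05 for value in q]   # 'robust' by the q < 0.05 rule
print(q, nominal, robust)
```

In this made-up example three outcomes have P < 0.05 but only one has q < 0.05, which is the kind of discrepancy the robustness criterion is designed to expose.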

Results

We identified three unpublished, long-term follow-up studies based on two cluster-randomized trials from Kenya and Uganda (see Table 1). One additional study was excluded, as it was not based on a randomized experiment.

Table 1.

Characteristics of the base trials and the long-term follow-up studies

Study ID (versions)	Base trial					Follow-up study
Study ID (versions)	Study ID	Country	Population	Number randomized (clusters)	Intervention	Population	Data collection	Sample size	Timing	Difference in deworming exposure between study groups
Baird series (2010, 2011a, 2011b, 2012, 2015, 2016)	Miguel and Kremer 2004	Kenya	School children aged between 6 and 18 years^a	32 565^b (75)	Deworming^c every 6 months at school, plus health promotion	Adults aged 19 to 26 years who participated in the base trial as children	Questionnaire survey	5084	9 to 11 years after base trial started	2.4 additional years of deworming
Ozier series (2011, 2014, 2015, 2016)	Miguel and Kremer 2004	Kenya	School children aged between 6 and 18 years^a	32 565^b (75)	Deworming^c every 6 months at school, plus health promotion	Children aged 8 to 15 years who now attend the base trial schools, but were too young at the time of the trial to have participated	Field survey	21 309 for height and weight; 2371 for cognitive assessment	11 to 12 years after base trial started	Exposure to the ‘spill-over’ effects of deworming during the first year of life
Croke 2014 (2014)	Alderman 2006	Uganda	Pre-school children aged 1 to 7 years	27 995 (50)	Deworming^d every 6 months at child health days (CHD)	Children aged 6 to 16 years who live in the area of the base trial and may have participated as children	Large-scale survey unrelated to base trial	763	10 to 11 years after base trial started	2 additional doses of deworming tablets

^a In Miguel and Kremer 2004: girls aged 13 years or older were not intended to receive the drug intervention due to potential drug teratogenicity. However, some did receive deworming treatment.

^b Miguel and Kremer 2004 was a quasi-randomized trial using sequential allocation.

^c In Miguel and Kremer 2004, deworming medication was given as albendazole every 6 months (600 mg in 1998 and 400 mg in 1999) plus praziquantel at 40 mg/kg annually. It is estimated that 72% of children in the intervention and 5% of children in control groups received this.

^d In Alderman 2006, deworming medication was given as albendazole 400 mg every 6 months. It is estimated that the deworming coverage increased from 21.7% before the intervention started in 2000 to 65.8% in 2003 in the intervention group, and from 23.9 to 34.6% in the control group (according to a cluster survey of households in all parishes, including 750 households in each group).

All three follow-up studies are economic working papers available online but not formally published (Appendix 1, available as Supplementary data at IJE online). The study by Baird has been presented online in six iterations, although we were only able to access five (2011a, 2011b, 2012, 2015, 2016). The study by Ozier has been presented online in four iterations (2011, 2014, 2015, 2016). The search for published studies returned 94 online records, of which none were judged relevant to this review: 8 reports corresponded to the base trials in the Cochrane review, 11 were studies older than the base trials, 40 were not relevant to deworming interventions and the remaining 35 were not long-term follow-up studies.

Kenya trial (Miguel and Kremer 2004)

The base trial for the first two studies was conducted in Busia District, in Western Kenya, by Miguel and Kremer. The intervention comprised deworming drugs administered every 6 months, plus regular worm prevention education through public health lectures, wall charts and training of teachers. Seventy-five schools, with 32 565 pupils aged between 6 and 18 years, were allocated sequentially to one of three treatment groups: group 1 received the intervention from 1998, group 2 from 1999 and all groups received the intervention from 2001 onwards. Risk of bias assessment. The quasi-randomized design means that there is a small risk of systematic differences between groups, and this risk will probably still be present in the follow-up studies. In addition, any effects observed in the follow-up studies may be attributable to the effects of the public health education activities rather than the anthelmintic drugs. Although some would argue this is part of the intervention, it is not the main component of most large national deworming initiatives. A complete risk of bias assessment is available in Table 2. Of note, an independent replication analysis of this trial was carried out in 2015, which found errors in the analysis of reported effects on haemoglobin and nutritional status; the authors now acknowledge that these effects are not ‘statistically significant’. In a second replication that used the original authors’ analytical approach, the externalities were also not demonstrable, but the original trial authors have adjusted the parameters, conducted new analyses and contest this.

Table 2.

Risk of bias assessments for the base trials

Study ID	Selection bias		Reporting and detection bias		Attrition bias	Other biases
Study ID	Sample selection	Confounding	Blinding of outcome assessors	Blinding of data analysis	Attrition bias	Contamination	Co-intervention
Miguel 2004	HIGH RISK Systematic allocation (non-random) Subsamples described as ‘random’ but no details given	UNCLEAR RISK Groups broadly similar according to comparison of variables at baseline, but missing data to assess and confirm it	HIGH RISK Not blinded	UNCLEAR RISK Blinding not described	HIGH RISK No clusters were lost Considerable missing data for all outcomes.	LOW RISK Deworming coverage of 5% in the control group Transfer rate into a different school between 2% and 8%, with similar proportions among the three groups	HIGH RISK Worm prevention education through regular public health lectures, wall charts and training of teachers^a Other school-based interventions simultaneously in 27/75 project schools
Alderman 2006	LOW RISK Cluster randomized controlled trial	LOW RISK Balanced baseline characteristics	HIGH RISK Not blinded	HIGH RISK Not blinded	LOW RISK Two clusters were lost	HIGH RISK Children dewormed in 2003: 65.8% in intervention group, 34.6% in the control group	LOW RISK None

^a Some may view this as part of the intervention, but current global policy advocates drug distribution, not intensive school health education.


Baird study (reported in a series of papers 2010-16)

The Baird series of reports presents analyses of a questionnaire survey of 5084 adults, 9 to 11 years after they participated in the Kenyan trial. The analysis compares adults from schools which began receiving the intervention in 1998 and 1999, with adults from schools which did not receive the intervention until 2001 (see Appendix 2, available as Supplementary data at IJE online). As all participants eventually received the intervention, this study looks for effects attributable to the intervention group receiving an additional 2.4 years of the deworming intervention compared with the control group (Table 1). The paper presents data on nutritional, health, schooling and labour market outcomes. Risk of bias assessment. The survey sampled adults from a complete list of all children who attended the schools in the base trial. The sample was selected using computerized randomization, and stratified by school, grade and gender (see Table 3). Baseline data were presented for age and academic performance prior to the base trial, and although these appear balanced, this is probably insufficient to exclude the possibility of confounding due to the quasi-randomized design of the base trial. The analysis did not follow a pre-planned protocol, and those analysing the data were not blinded to treatment allocation.

Table 3.

Risk of bias assessments of the long-term follow-up studies

Study ID	Selection bias		Reporting and detection bias		Attrition bias	Selective reporting
Study ID	Sample selection	Confounding	Blinding of outcome assessors	Blinding of data analysis	Attrition bias	Selective reporting
Baird series	LOW RISK Computer-generated random sampling from the eligible population, stratified by school, grade, and gender	UNCLEAR RISK Age and academic performance before base trial appeared similar, but other potential confounders not presented Uncertain risk of confounding due to the quasi-randomized design of the base trial	LOW RISK Outcome assessors were unaware of how treatment would be defined in the analysis	HIGH RISK Not blinded	LOW RISK 2/75 clusters not included in the analysis Effective tracking rate of 82.7%	HIGH RISK No a priori analytical plan Multiple significance testing Inconsistency of outcome reporting over time Post-hoc subgroup analyses presented as main results in the abstract Important findings of no effect not reported in abstract
Ozier series	LOW RISK Computer-generated random sampling from eligible population^a	UNCLEAR RISK Data on potential confounders are not provided separately for intervention and control groups. Only two cohorts (of seven) contain relevant randomized comparisons. Additional analyses of the whole sample are at uncertain risk of confounding due to secular trends^a	LOW RISK Outcome assessors were unaware of how treatment would be defined in the analysis	HIGH RISK Not blinded	UNCLEAR RISK Around 28% of sample excluded as they had migrated into the area after the base trial. Migration out of the area, which would represent missing data, is not well quantified. 2/75 clusters not included in the analysis	UNCLEAR RISK No a priori analytical plan. Important finding of no effect on height not reported in abstract until the 2016 version. Data on weight not reported at all
Croke 2014	UNCLEAR RISK Selection of villages described as ‘random’ but method not specified Selection of households within villages by systematic selection	UNCLEAR RISK Some confounders (access to water and private education) appear unbalanced	LOW RISK Data were collected through a larger survey conducted for other reasons and unrelated to the base study	HIGH RISK Not blinded	HIGH RISK 28/50 clusters not included in analysis Numeracy and literacy test outcomes available for 710/763 children (6.9% missing data) Potential migration out of the area not addressed	UNCLEAR RISK No a priori analytical plan

^a Ozier series: of the seven annual cohorts, none of the children born in 1995 or 1996 lived in areas with active deworming programmes in the first year of life, whereas all the children born in 2001 did. Analyses across all seven cohorts therefore represent a mixture of randomized and observational data.

The five versions of the Baird series available online to mid-2016 contain substantially different analyses which appear exploratory, and there is a high risk of false-positive results given that the number of hypotheses tested for statistical significance increased from 228 in Baird 2011a to 650 in Baird 2016, largely due to the introduction of subgroup analyses (see Appendix 3, available as Supplementary data at IJE online). This process appears to be at high risk of reporting bias, and a narrative analysis suggests selective reporting as follows. Some outcomes reported in early versions were dropped from later versions. It is not made clear to the reader why, but it is likely to be due to the failure to demonstrate an effect (for example, cognitive test results were reported in 2011a, with no apparent effect on Raven's matrices or English vocabulary, but are absent in 2016). Effects are presented for outcomes which appear to be part of a larger undisclosed data set (for example, ‘self-reported health rated as very good’ presented without additional categories; and ‘Kenyan women who participated as girls have fewer miscarriages’ without presenting other health-related outcomes). Results from post hoc subgroup analyses are given prominence in the abstract and results (for example, an increase in secondary school attendance in females is claimed in the 2016 abstract, but no effect was apparent in the whole sample, and disaggregation by sex only appeared from 2012 onwards). The abstract changed substantially between versions, but none reported important findings of no effect (for example, there were no effects apparent on body mass index or height but these are not reported in any of the five abstracts; see Table 4 and Appendix 4, available as Supplementary data at IJE online).
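The scale of the false-positive risk can be illustrated with simple arithmetic on the two hypothesis counts quoted above (228 and 650). The sketch below assumes a 0.05 significance threshold, that every null hypothesis is true, and (for the second quantity only) that the tests are independent. These are simplifying assumptions of ours: the outcomes in the Baird series are clearly correlated, and the calculation says nothing about whether any particular reported finding is false.

```python
# Illustrative sketch: how many 'significant' results chance alone would produce.
# Assumptions: alpha = 0.05, all nulls true, independent tests (for the probability).

ALPHA = 0.05
TESTS = {"Baird 2011a": 228, "Baird 2016": 650}  # hypothesis counts from the text


def expected_false_positives(m, alpha=ALPHA):
    """Expected number of tests with P < alpha if every null hypothesis is true."""
    return m * alpha


def prob_at_least_one(m, alpha=ALPHA):
    """Probability of at least one P < alpha among m independent true nulls."""
    return 1.0 - (1.0 - alpha) ** m


for version, m in TESTS.items():
    print(version, round(expected_false_positives(m), 1), prob_at_least_one(m))
```

Under these assumptions roughly 11 nominally significant results would be expected by chance in Baird 2011a and roughly 32 in Baird 2016, which is why the absence of a pre-planned protocol matters so much for interpretation.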

Table 4.

Assessment of selective reporting in Baird 2011a

Policy-important domains	Abstract		Tables and appendices
Policy-important domains	Number of outcomes reported as beneficial	Number of outcomes reported as no effect	Number of outcomes reported with P < 0.05	Number of outcomes reported with P > 0.05
Nutritional status	0	0	0	3
Physical well-being	1	0	2	0
School enrolment and attendance	1^a	0	1	2
School performance and tests of cognition	1	0	1	6
Economic productivity	6^b	0	13^c	19^c

^a,^b P < 0.1 and > 0.05.

^c Economic productivity measured in: hours worked (seven subgroups); missed days (four subgroups); occupational subgroups (12); wage subsamples/derivative measures (nine).

To further examine the influence of selective reporting, we compared the ‘statistically significant’ findings (P < 0.05) presented in the abstract, with the overall findings presented in Baird 2011a (Table 4). In the abstract, Baird reports that physical well-being, school enrolment and attendance and school performance or cognition are significantly higher in the group receiving earlier deworming. However, in the main text tables, only one of the seven outcomes measuring school performance/cognition is statistically significant. Similarly, for school attendance, an effect was only apparent in one of the three outcomes reported. Economic productivity was more complicated, as there were numerous subgroup analyses and a variety of derivative measures; an effect was apparent in 13 outcomes, with a further 19 reporting no statistically significant effect. Credibility assessment. In Table 5 we attempt to provide a balanced presentation of the key results from Baird 2016, stratified by the policy-relevant outcome domains, and in Table 6 we present our credibility assessment for the outcomes reported in the abstract of Baird 2016.
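The comparison between abstract and tables can be reproduced directly from the counts in Table 4. The sketch below simply tallies, for each domain, how many outcomes in the tables reached P < 0.05 against how many were reported at all; the dictionary and function names are ours, and the counts are transcribed from Table 4 without any reanalysis.

```python
# Illustrative sketch: tally of Table 4 (Baird 2011a).
# Tuple order: (beneficial in abstract, no effect in abstract,
#               P < 0.05 in tables, P > 0.05 in tables)
TABLE_4 = {
    "Nutritional status": (0, 0, 0, 3),
    "Physical well-being": (1, 0, 2, 0),
    "School enrolment and attendance": (1, 0, 1, 2),
    "School performance and tests of cognition": (1, 0, 1, 6),
    "Economic productivity": (6, 0, 13, 19),
}


def share_significant(sig, nonsig):
    """Fraction of outcomes in the tables with P < 0.05."""
    total = sig + nonsig
    return sig / total if total else float("nan")


summary = {}
for domain, (abs_pos, abs_null, sig, nonsig) in TABLE_4.items():
    summary[domain] = (abs_pos, abs_null, sig, sig + nonsig,
                       share_significant(sig, nonsig))
    print(f"{domain}: {sig}/{sig + nonsig} significant in tables; "
          f"abstract reports {abs_pos} beneficial, {abs_null} no effect")
```

The tally makes the asymmetry explicit: no finding of no effect appears in the abstract for any domain, although 30 of the 47 outcomes in the tables have P > 0.05.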

Table 5.

Summary of effects reported in the long-term follow-up studies

Policy-important domains	Reported outcomes (unit of measurement)	Effect size (95% CI)
Policy-important domains	Reported outcomes (unit of measurement)	Baird series	Ozier series	Croke 2014
Nutritional status	Body mass index	0.02 kg/m² higher	-	-
	(kg/m²)	(0.07 lower to 0.11 higher)	-	-
	Height	0.11 cm shorter	0.20 cm taller^a	-
	(cm)	(0.65 shorter to 0.43 taller)	(0.39 shorter to 0.80 taller)	-
	Haemoglobin	0.10 g/dl higher^b^,^c	-	-
	(g/dl)	(0.06 lower to 0.27 higher)	-	-
Physical well-being	Self-reported health status^d	4.0% more	-	-
	(% rated as ‘very good’)	(0.4 more to 7.6 more)	-	-
	Poor health in the past month	0.11 days fewer^e	-	-
	(workdays missed)	(0.38 fewer to 0.17 more)	-	-
School enrolment and attendance	School enrolment	-	-	1.86% higher
	(%)	-	-	(0.72 lower to 4.44 higher)
	School enrolment	0.29 years more	-	-
	(total years)	(0.00 more to 0.58 more)	-	-
	Secondary school attendance	3.0% higher	-	-
	(%)	(4.0 lower to 10.0 higher)	-	-
School performance and tests of cognition	Had to repeat at least one grade	6.3% higher	-	-
	(%)	(2.7 higher to 9.9 higher)	-	-
	Passed secondary school entrance exam	5.0% higher	-	-
	(%)	(1.2 lower to 11.2 higher)	-	-
	Raven’s matrices test score^f	1.1% lower^g	22.0% higher	-
	(normalized scores, %)	(10.7 lower to 8.5 higher)	(6.4 higher to 37.6 higher)	-
	English vocabulary test score	7.6% higher^h	16.1% higher	16.4% higher
	(normalized scores, %)	(3.4 lower to 18.6 higher)	(3.1 lower to 35.3 higher)	(17.74 lower to 50.54 higher)
	Math score	-	-	30.1% higher
	(normalized scores, %)	-	-	(0.81 lower to 61.0 higher)
Economic productivity	Hours worked per week	1.58 h more	-	-
	(hours)	(0.50 fewer to 3.66 more)	-	-
	Monthly earnings (waged employment plus self-employed earnings)	226 higher^i	-	-
		(1162 lower to 1614 higher)	-	-
	Monthly earnings (waged employment only)	26.9% more	-	-
	Monthly earnings (waged employment only)	(9.9% more to 43.9% more)	-	-

^a Ozier also reports height-for-age and stunting, which are consistent with the findings for height.

^b Baird 2011a reported control group estimate of 126.1 and coefficient estimate of 1.03 but no unit of measure, and we assume they used g/l (SI units); we report this outcome as g/dl.

^c Findings on haemoglobin are not reported in the Baird 2016 version, but are in Baird 2011a and 2011b.

^d The Baird series also report the proportion of women who had experienced a miscarriage, which was lower in the intervention group. It is excluded from this table as it seems a spurious outcome to present in isolation without measuring a large range of other potential health outcomes.

^e Findings on workdays missed due to poor health in the past month are not reported in the Baird 2016 version, but are in Baird 2011a. In Baird 2011b, this outcome is reported for the out-of-school subsample only.

^f Ozier used the 12 questions in set B of the Raven's Progressive Matrices. Baird gives no further details on the questions used for assessing the Raven’s matrices test score.

^g Findings on Raven’s matrices test score are not reported in the Baird 2016 version; they are in Baird 2011a only.

^h Findings on English vocabulary test score are not reported in the Baird 2016 version, but are in Baird 2011a, 2011b and 2012.

^i The unit of this outcome is not reported, although we could assume it is the local currency.
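Readers who wish to compare the precision of the estimates in Table 5 can back-calculate an approximate standard error and two-sided P-value from each 95% confidence interval. The sketch below assumes the intervals are symmetric and based on a normal approximation; the studies may have used other methods (for example cluster-robust or t-based intervals), so the results are approximations rather than the authors' own statistics. The function name is ours.

```python
# Illustrative sketch: approximate SE and P-value from a 95% confidence interval.
# Assumption: symmetric, normal-approximation intervals.
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z95 = STD_NORMAL.inv_cdf(0.975)  # about 1.96


def summarize(lower, upper):
    """Return (point estimate, standard error, two-sided P) implied by a 95% CI."""
    estimate = (lower + upper) / 2
    se = (upper - lower) / (2 * Z95)
    z = estimate / se
    p = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return estimate, se, p


# Intervals transcribed from Table 5 (normalized test scores, %).
ozier_ravens = summarize(6.4, 37.6)       # Ozier series, Raven's matrices
croke_english = summarize(-17.74, 50.54)  # Croke 2014, English vocabulary
print(ozier_ravens, croke_english)
```

The midpoint check is also a quick way to spot transcription problems in a table, since a reported point estimate should sit at the centre of a symmetric interval.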

Table 6.

Outcome appraisal of all outcomes reported in the abstract of Baird 2016

Outcomes reported in the abstract		Evidence base for stated effect	Effect present in whole sample?^a	Effect robust to adjustment for multiple inference?^b	Effect consistent across related outcomes?^c
Men	‘Stay enrolled for more years of primary school’	Men from intervention areas had higher total years enrolled in primary school (P < 0.05)	Yes	No	No	No statistically significant difference in the total number of school grades attained (P > 0.1), and adults from intervention areas more likely to have repeated at least one grade (P < 0.01)
	‘Work 17% more hours each week’	Men from intervention areas worked more hours in the past week (P < 0.05)	No	No	–	–
	‘Spend more time in non-agricultural self-employment’	A borderline effect on hours worked in non-agricultural self-employment in men (P < 0.1)	Yes (P < 0.05)	Remains borderline	No	No statistically significant difference in monthly non-agricultural earnings (P > 0.1)
	‘Spend more time in manufacturing’	Men from intervention areas had a higher manufacturing job indicator (P < 0.05)	Yes	No	No	No statistically significant effect on hours worked in waged employment (P > 0.1), and no statistically significant difference in monthly non-agricultural earnings (P > 0.1)
	‘Miss one fewer meals per week’	Men from intervention areas had eaten more meals the previous day (P < 0.01)	Yes	Yes	–	–
Women	‘One-quarter more likely to have attended secondary school’	Women from intervention areas had higher secondary school attendance (P < 0.05)	No	No	No	No statistically significant difference in the number of school grades attained (P > 0.1)
	‘Reallocate time from traditional agriculture into cash crops’	Women from intervention areas had a higher ‘grows cash crop’ indicator (P < 0.05)	Yes	No	–	–
	‘Reallocate time from traditional agriculture into non-agricultural self-employment’	Women from intervention areas worked more hours in non-agricultural self-employment in the past week (P < 0.05)	Yes	No	No	No statistically significant difference in monthly non-agricultural earnings (P > 0.1)

^a The subgroup analysis by sex was not introduced until the third edition of the Baird series and so is considered post hoc. We considered the effect to be present in the whole sample if P < 0.05 for both sexes combined.

^b The authors of the Baird series conducted adjustments for multiple inference. We considered the effect robust to adjustment if the false discovery rate (FDR) q-value was < 0.05.

^c With so many outcomes presented, we considered whether the effects of related outcomes consistently suggested benefit.
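The multiple-inference criterion used in the table can be illustrated with a short sketch. It assumes the Benjamini-Hochberg false discovery rate procedure, which is one common way to obtain q-values and may differ from the adjustment actually applied in the Baird series. The function name `bh_qvalues` and the p-values are hypothetical and are not taken from any of the studies.

```python
def bh_qvalues(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical p-values for a family of eight related outcomes.
p_hypothetical = [0.004, 0.03, 0.04, 0.20, 0.45, 0.60, 0.75, 0.90]
q_values = bh_qvalues(p_hypothetical)

nominal = [p < 0.05 for p in p_hypothetical]   # significant before adjustment
robust = [q < 0.05 for q in q_values]          # significant after adjustment

for p, q in zip(p_hypothetical, q_values):
    print(f"P = {p:.3f}  q = {q:.3f}")
```

In this invented family, three outcomes have P < 0.05 but only one retains q < 0.05, which is the kind of pattern recorded as ‘No’ in the robustness column.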

In their 2016 abstract, Baird et al. state that men stayed ‘enrolled for more years of primary school’, and women were ‘approximately one-quarter more likely to have attended secondary school’. These statements are supported by ‘statistically significant’ results within the text, but presentation of these two results in isolation could be regarded as misleading, as there is other information that is required for a balanced interpretation: (i) these effects were not present in the whole sample, and are only apparent in post hoc subgroup analyses which the analysis was not adequately powered to examine; (ii) neither result is robust to the authors’ own adjustments for multiple inferences; and (iii) these are selected positive findings among a group of results for similar or related outcomes, that either show no effect (there was no evidence of an increase in the number of school grades attained in either sex), or provide an alternative explanation for these effects (those in the intervention group were actually more likely to have repeated a grade).

The abstract then uses these selected measures of educational effects to explain apparent shifts in the labour market, which are presented as beneficial. However, it is not clear to us which of these shifts represent a genuine economic improvement. For example, the number of hours women worked in agriculture appears lower in the intervention group and is presented as a benefit, but the number of hours worked by men appears higher. In reality, an effect in either direction could be interpreted as a benefit due to the alternative explanations of better health (enabling longer hours in manual work), or better education (enabling a move to higher skilled work). It is perhaps more useful to note that there was no evidence of an increase in hours worked in waged employment, and no evidence of an increase in non-agricultural earnings (waged earnings plus self-employed profits).

The authors clarified that the sample size was calculated to detect a 15% relative increase in secondary school attendance in the whole sample. The analysis was therefore not powered to look for subgroup effects. Furthermore, the sample size calculation does not seem to have been adjusted for the cluster design.
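To show why an unadjusted sample size calculation matters in a cluster-randomized design, the following minimal sketch applies the standard design effect, DE = 1 + (m − 1) × ICC, where m is the average cluster size and ICC is the intracluster correlation coefficient. All inputs are hypothetical values chosen for arithmetic convenience; none comes from the Baird series, and the function names are our own.

```python
import math


def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Standard design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def cluster_adjusted_n(n_unadjusted: int, mean_cluster_size: float, icc: float) -> int:
    """Inflate an individually randomized sample size by the design effect."""
    inflated = n_unadjusted * design_effect(mean_cluster_size, icc)
    # Round first so floating-point noise cannot push ceil() up by one.
    return math.ceil(round(inflated, 6))


# Hypothetical inputs, for illustration only.
n_unadjusted = 1000      # participants needed if individuals were randomized
mean_cluster_size = 41   # participants sampled per cluster
icc = 0.05               # assumed intracluster correlation coefficient

deff = design_effect(mean_cluster_size, icc)
n_needed = cluster_adjusted_n(n_unadjusted, mean_cluster_size, icc)
# Information actually available if the sample size is NOT inflated.
effective_n = n_unadjusted / deff

print(f"design effect = {deff:.2f}")
print(f"cluster-adjusted sample size = {n_needed}")
print(f"effective sample size without adjustment = {effective_n:.0f}")
```

Under these assumed values, an unadjusted sample of 1000 carries roughly the information of 333 independent participants, so a calculation that ignores clustering would overstate the power of the analysis.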

Ozier study (reported in a series of papers 2011-16)

The Ozier series report a field survey of 21 309 children attending the schools, quasi-randomized by the Kenyan trial 11 to 12 years earlier. These children were too young at the time of the original trial to have received deworming treatment through the school-based programme. The analysis compares outcomes within each birth cohort from 1995 to 2001. Children aged less than 1 year living in communities where the deworming intervention had started are classified as the intervention group, and those living in communities where deworming had not yet started are classified as controls. The difference between these two groups is theoretically only that the children in the intervention group may have benefited from decreased worm prevalence among older siblings and the community during the first year of life, whereas the children in the control group did not.

Risk of bias assessment. The field survey conducted cognitive tests on a computer-generated random sample representing approximately 12% of the eligible population. This sample covered seven annual school cohorts from 1995 to 2001. Only the 1998 and 1999 cohorts contain quasi-randomized comparisons relevant to the study question. In the 1995 and 1996 cohorts, none of the children lived in areas with active deworming programmes during the first year of life; and in 2001, all the children lived in areas with active deworming programmes. Analyses across the whole sample (seven cohorts) are thus secondary observational analyses, with unknown secular changes potentially confounding the findings (see Table 3). Data collection was appropriately blinded to treatment allocation, but again data analysis was not blinded and was not guided by a pre-planned protocol. Important findings of no apparent effect on height and height-for-age were not reported in the abstract until the 2016 version (despite being one of the main a priori hypotheses, according to communication with the authors). Although weight data were collected for 21 309 children, they were not part of the analysis and not presented.

Credibility assessment. In the 2016 abstract, Ozier states that exposure to the spill-over effects of deworming programmes during the first year of life produced ‘large cognitive effects, comparable to between 0.5 and 0.8 years of schooling’. This statement is based on demonstrable effects on two out of five cognitive tests (Raven’s matrices and verbal fluency; P < 0.05) and a trend towards benefit on all five tests. These positive effects are taken from analyses across the whole sample, which include non-randomized data. However, following communication with the authors, additional tables were produced confirming these effects were still apparent in analyses limited to the quasi-randomized cohorts from 1998 and 1999. It should, however, be noted that the revised analysis is substantially underpowered to reliably detect these effects (communication with the authors confirmed that only the analyses including all seven annual cohorts were adequately powered). The authors themselves explain the lack of effect on height to be related to the low worm load in young children. We consider that this observation, along with the very low intensity of the intervention being tested, calls into question the plausibility of the stated effect.

Uganda trial (Alderman 2006)

The base trial for the third study was conducted in Eastern Uganda by Alderman et al. The intervention was implemented through Child Health Days (CHD) and comprised albendazole 400 mg every 6 months. Fifty parishes in five districts were identified as having heavy worm loads and randomly allocated to the intervention and control arms. Over the 3-year programme from 2000 to 2003, children in both groups attended 1.74 CHDs on average, with only the intervention group scheduled to be dewormed but both groups receiving additional health services such as vaccination and health promotion. Participants were pre-school children aged between 1 and 7 years, and deworming became routine and free for everyone shortly after the end of the study.

Risk of bias assessment. The base trial used a truly random method of allocation (a coin toss), but although deworming was the intended difference between intervention and control groups, up to 35% of those in control areas were also dewormed, from private clinics or shops (see Table 2).

Croke study 2014

Croke uses a large-scale questionnaire survey conducted in Uganda 7 to 8 years after the end of the trial by Alderman. The survey was unrelated to the base trial but covered some of the same parishes, and included 763 children who would have been aged between 1 and 7 years at the time of the base trial and who therefore might have participated. The study compares children living within the intervention parishes of the base trial with children living in the control parishes. The difference between the two groups (ignoring migration in and out of the area) is therefore likely to be less than two additional doses of albendazole during the 3 years of the programme. The analysis reports on numeracy and literacy test outcomes.

Risk of bias assessment. The sampling method is reported as random, but the descriptions of sampling are inadequate to make a clear assessment of the risk of selection bias (see Table 3). Data acquired through correspondence with the author report on 11 covariates, among which the treatment group appears to have had better access to water (24% of individuals compared with 3%) and private education (14% compared with 9%). The data collection process was unrelated to the deworming base trial and so unlikely to have been influenced by it, but data analysis was not blinded. The risk of attrition bias is high, with only 22 of the 50 parishes recruited by Alderman included in the sample (10 from the intervention group and 12 from the control group), and no assessment of the effects of migration. There was no pre-planned protocol.

Credibility assessment. Croke states that children who lived in intervention parishes during the base trial period had ‘test scores 0.2 to 0.4 standard deviations higher than those in control parishes’. Setting statistical significance at P < 0.05, this effect was not present in the raw data and was only apparent after adjustment for age, gender and survey year. No formal power calculations were conducted and the analysis is substantially underpowered to detect these effects, with less than a third of the sample size calculated by Ozier. The authors found no evidence of an effect on school enrolment, but do not report this in the abstract.

Discussion

In summary, of the three included long-term follow-up studies: the Baird series reports possible effects on secondary school attendance and job sector choices, 9 to 11 years after a head start of 2.4 years of additional school-based deworming; the Ozier series reports possible externalities on cognitive development in children living in areas with school-based deworming during the first year of life; and Croke 2014 reports possible effects on English and maths test scores 10 to 11 years after less than two additional doses of deworming tablets during early childhood. All the reports present these as clear evidence of benefit of deworming programmes.

Long-term studies of the effects of public health interventions are complex and difficult to do. We therefore acknowledge the hard work of the study authors and research teams. However, from our epidemiological standpoint we find substantial reason to doubt the validity and plausibility of these findings, given the information provided and the process of analysis that has been documented. As such, we believe they should be regarded as hypothesis-generating, rather than as reliable evidence of effects to support large-scale deworming programmes in low- and middle-income countries.

First, we note that these studies do not provide the evidence of cumulative effects from multiple rounds of deworming that some have called for and others have attributed to them. In all three studies, most participants in both the intervention and control groups would have been ‘dewormed’ multiple times during their pre-school or school years, and the largest ‘intervention’ under evaluation was an additional 2.4 years of deworming medication in the Baird series. The majority of children in these studies would therefore be worm-free, or would have reduced worm counts during much of their childhood, and consequently the consistent finding of no effect on height or weight across all three studies is unsurprising. More subtle nutritional pathways for the observed effects, such as via micronutrient status, also seem unlikely to act over such short durations.

Second, we are concerned about the selective reporting of favourable results in the abstracts, especially after multiple significance testing and post hoc subgroup analyses. All three papers hail from an economic discipline, but we assess them against current epidemiological standards and make no apology for that. The policy under evaluation is a public health programme, and the potential for bias exists irrespective of discipline. We do however acknowledge that some of the problems exist, at least in part, due to current norms within economics and the reporting requirements of economic journals (such as strict word limits for abstracts).

Whereas some economists may argue that the accuracy of conclusions is improved over time through the refinement and addition of new analyses, we are concerned that the process risks cumulative selective reporting, and our analysis provides some indicators that this may be the case. Of note, none of these three studies worked to a pre-planned analytical protocol, and although this has been a standard requirement within epidemiology for some time, it has only recently been recognized as important within economics. This was also noted in the replication analyses of the primary trials. Nevertheless, this approach may well produce misleading results and conclusions, and statistical correction of multiple testing is insufficient to correct selective reporting.

The abstract to the Baird series exclusively presents the positive results, and leaves readers unaware of the multiple findings of no evidence of effect and the conflicting findings within the analysis. For example: no evidence of effect on markers of nutrition (weight or height); no evidence of effect on multiple tests of cognition (the same tests as reported by the Ozier series); an increase in the need to repeat a school grade in the intervention group (an alternative explanation for the observed increase in years in school, and consistent with the finding of no overall increase in the number of grades achieved); no evidence of effect on secondary school attendance prior to subgrouping by gender (only the whole sample is adequately powered to detect an effect), suggesting a potentially spurious subgroup finding; little evidence of effect on secondary school attendance in females after adjustment for multiple significance testing (P = 0.084 after adjustment); and no evidence of effect on monthly earnings. We do know that post hoc analyses increase the risk of type 1 errors (finding an effect when there is no effect present). Item 18 of the 2010 CONSORT statement for the reporting of randomized trials specifies that post hoc analyses should be clearly labelled as such and considered as exploratory. In addition, the explanatory note states that ‘Post hoc subgroup comparisons (analyses done after looking at the data) are especially likely not to be confirmed by further studies’.

At face value, there is a consistency of findings across the two remaining studies by Ozier and Croke. Both studies have substantial methodological and plausibility limitations which should temper their interpretation, but the observed effect after such a small deworming exposure probably deserves further consideration and should be amenable to testing through well-designed randomized trials.
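The point made above about type 1 errors can be illustrated with simple arithmetic. The sketch below computes the probability of at least one false-positive result when several outcomes or subgroups are each tested at P < 0.05 and no true effect exists. It assumes the tests are independent, which is a simplification (related outcomes are usually correlated), and the numbers of tests shown are hypothetical rather than counts from any of the three studies.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Hypothetical numbers of unplanned comparisons.
for n_tests in (1, 5, 10, 20):
    rate = familywise_error_rate(n_tests)
    print(f"{n_tests:2d} tests: chance of a spurious 'effect' = {rate:.0%}")
```

With a single pre-specified test the risk is 5% by construction; under the independence assumption it rises to about 40% with 10 tests and about 64% with 20, which is why post hoc findings are best treated as exploratory.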
More generally, there appears to be a tendency for advocates of deworming to ‘build a case’ for deworming, by drawing together evidence which supports their prior beliefs and ignoring or dismissing the evidence that does not. This ‘confirmation bias’ is common, but runs counter to current standards in transparent, evidence-informed decision making and has led to the claims of these studies being cited verbatim without appropriate appraisal. Government ministries responsible for resource allocation, philanthropists supporting these programmes, and the public who are subjected to them require transparency about what effects could reasonably be expected. If a community in a given setting has a high prevalence of untreated worm infections, then mass deworming programmes may well be an effective way to reach and treat a large number of children. If, however, the problem is poor school attendance or low educational attainment, then these are problems which probably require different solutions.

Conclusion

These three studies all have substantial problems in their methods and analysis, which leave unanswered questions about the use of these studies to justify the effectiveness of deworming programmes. They help in generating hypotheses. Decisions about whether or not to implement mass treatment programmes, calculations around programme cost, and advocacy to the public should be based on reliable estimates of effects, informed by robust evidence.

Supplementary Data

Supplementary data are available at IJE online.

Funding

This work was supported through the Effective Health Care Research Programme Consortium, funded by UK aid from the UK Government for the benefit of developing countries (Grant: 5242). The views expressed in this review do not necessarily reflect UK government policy. The funding source had no role in identifying the research topic, nor in the design, data collection and analysis, decision to publish or preparation of the manuscript.

Conflict of interest: P.G. is Director of the Evidence Building and Synthesis Research Consortium that receives money to increase the number of evidence-informed decisions by intermediary organizations, including WHO and national decision makers, which benefit the poor in middle- and low-income countries. D.S. and S.J. are employed as part of this Consortium. P.G. is the co-ordinator of a WHO Collaborating Centre for Evidence Synthesis for Infectious and Tropical Diseases [http://apps.who.int/whocc/default.aspx; UNK234] and one of the Centre’s aims is to help WHO in its role as an intermediary in communicating reliable summaries of research evidence to policy makers. P.G. is an author of the Cochrane review evaluating the effects of community-based deworming on health, nutrition and school participation. P.G. receives support from COUNTDOWN, a grant to the Liverpool School of Tropical Medicine from the Department for International Development to promote control of neglected tropical diseases in developing countries, including soil-transmitted helminths.

Key Messages

The long-term societal effects of mass deworming programmes for soil-transmitted helminths in low- and middle-income countries are contested. Advocates cite economic studies reporting long-term effects on health, schooling and economic development. We sought and appraised these studies using health technology assessment methods based on epidemiological principles. In the 11 reports from three studies, we found multiple potential sources of bias in the study methods, analysis and reporting. Of particular concern are: the lack of pre-planned protocols; multiple hypothesis testing followed by selective reporting of favourable results; and post hoc subgroup analyses. Our interpretation is that these trials do not provide credible evidence to support the claims of long-term effects. However, they raise interesting hypotheses that could be considered in further research.

