
Challenges in determining causality: An ongoing critique of Bendavid et al's 'Assessing mandatory stay-at-home and business closure effects on the spread of COVID-19'.

Lonni Besançon¹, Gideon Meyerowitz-Katz², Emilio Zanetti Chini³, Hermann Fuchs⁴, Antoine Flahault⁵.


Year: 2021; PMID: 33998694; PMCID: PMC8209814; DOI: 10.1111/eci.13599

Source DB: PubMed; Journal: Eur J Clin Invest; ISSN: 0014-2972; Impact factor: 5.722


CONFLICT OF INTEREST

The authors declare no conflict of interest.

Dear Editors,

We are happy to respond to Bendavid et al (henceforth, the authors) on the matter of their paper. Given that the subject matter affects lives across the globe, we are pleased to have the opportunity to continue this worthwhile discussion. The authors have written a response to our initial concerns, but we feel that it falls short in a number of key ways. The paper therefore still does not provide a useful assessment of the efficacy of Non‐Pharmaceutical Interventions (NPIs) against COVID‐19.

SAMPLE SIZE AND ASSESSMENT CRITERIA

We are confused by the authors’ response to our questions regarding sample size and the inclusion/exclusion criteria they used. First, on sample size: the authors have indeed combined regional estimates, but even within the paper itself they agree that there are 16 primary comparisons among the total sample of 10 countries. The primary analysis is therefore limited to a very small sample size of 10 (or perhaps 16), a choice that substantially limits the analysis. An analogous argument in the field of clinical trials would be to claim that a study with 100 patients does not have a sample size of 100 because the drug acted on millions of cells in each patient, even though the results are ultimately aggregated by patient. The Johns Hopkins review of the paper also noted this. The subnational data analysis is one of the strengths of the initial manuscript, but including only 10 of the many countries with available subnational data is one of the study's key limitations.

Concerning the exclusion criteria, the authors’ own response seems to point to a contradiction. In their initial manuscript, the authors explained that they included only countries with available subnational data. In their response, the authors note that they also excluded countries with restrictive measures but few cases. This first highlights that this exclusion criterion was not presented in the original manuscript. The authors then argue that the exclusion is justified because there is ‘no evidence beyond the anecdotal’ that restrictive NPIs can control cases. This makes very little sense, since it is precisely the question the paper is presumably attempting to answer. Excluding these countries seems to be a clear example of confounding by indication.
Suppose more restrictive NPIs (mrNPIs) are indeed associated with fewer cases. If countries with very low numbers of cases are then excluded, the analysis is by design biased against finding an effect of mrNPIs where one exists. Finally, on the matter of sample size and exclusion criteria, the authors have not excluded only countries with few cases. Countries with many cases, such as Brazil, would have been fairly trivial to include, yet they also seem to have been excluded. All of these points considered, our initial criticism of the sample size and inclusion criteria remains valid despite the authors’ response.
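The toy simulation below makes the exclusion argument concrete. Every number in it is a hypothetical assumption chosen for illustration: the growth-rate distribution, the 100 seed cases, the 30‐day horizon and the 5,000‐case threshold. None of it is taken from the authors' data or model. The sketch drops mrNPI units that end with few cases, mimicking the exclusion described in the authors' response. It then compares the estimated difference in mean daily growth with the true effect.

```python
import math
import random
import statistics


def simulate_units(n_units, true_effect, seed=2021):
    """Hypothetical subnational units as (is_mrnpi, daily_log_growth, cases_after_30_days)."""
    rng = random.Random(seed)
    units = []
    for i in range(n_units):
        is_mrnpi = i % 2 == 0  # alternate units between the two policy arms
        growth = rng.gauss(0.20, 0.05) - (true_effect if is_mrnpi else 0.0)
        cases = 100 * math.exp(30 * growth)  # 100 seed cases grown for 30 days
        units.append((is_mrnpi, growth, cases))
    return units


def estimated_effect(units, min_cases_mrnpi=0.0):
    """Mean growth of less restrictive units minus mean growth of mrNPI units,
    after dropping mrNPI units whose case count is below min_cases_mrnpi."""
    kept = [u for u in units if not (u[0] and u[2] < min_cases_mrnpi)]
    mr = [g for is_mr, g, _ in kept if is_mr]
    lr = [g for is_mr, g, _ in kept if not is_mr]
    return statistics.mean(lr) - statistics.mean(mr)


units = simulate_units(n_units=2000, true_effect=0.10)
full = estimated_effect(units)
truncated = estimated_effect(units, min_cases_mrnpi=5000)
print(f"all units: {full:.3f}, low-case mrNPI units dropped: {truncated:.3f}")
```

With every unit kept, the estimate sits close to the true 0.10. After the exclusion it shrinks markedly, because the units in which mrNPIs worked best are exactly the ones removed.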

COUNTRY CLASSIFICATION

The authors respond to the criticism that their classification decisions were arbitrary by simply disagreeing. Yet in neither manuscript have the authors provided any rationale for their categorization. Nor have they given any coding scheme for classifying countries, should anyone wish to extend their analysis in future work. This seems, in effect, an admission that the classification is arbitrary or subjective. If these decisions were not arbitrary, the authors should publish a full account of what distinguishes a more restrictive from a less restrictive NPI country. That account should pay particular attention to how vastly subnational units can differ. Indeed, such an account seems extremely important for the paper and for the authors’ argument more broadly. The authors assert that their distinction ‘characterizes the countries well’, but there appears to be no factual basis for this claim. The analysis needs a rigorous examination of what makes an NPI ‘more’ or ‘less’ restrictive, and of why each country was categorized as such. Without it, the analysis simply represents the opinions of the authors and has no underlying scientific rationale. The authors may consider these countries more or less restrictive, but unless they explain why and how these classifications came about, it is hard to draw meaning from the analysis.

By many measures, South Korea is in fact a ‘more’ restrictive country. As explained in one of the letters, it had one of the longest school closures in the world. School closures are considered one of the strictest NPIs, as the recent heated debate over this measure has shown. The Stringency Index calculated by OurWorldInData for all the countries specified reinforces this point. On that index, South Korea is one of the countries that implemented much stricter NPIs during the period examined (see Figure 1).
The picture is even more compelling for the Containment and Health Index (see Figure 2), on which South Korea is the second most restrictive country, behind only Italy. Like any index, these two have inherent limitations, but they provide an objective categorization of countries based on how restrictive their measures have been. Applied to the countries selected by the authors, the two indexes add to our initial argument that South Korea's measures would be considered restrictive in most countries. They also show that the authors’ classification does not hold in many respects. The authors have provided their coding scheme for country classification neither in their initial article nor in their response. Our argument that the classification is arbitrary or subjective therefore stands. The authors may, of course, disagree with this categorization of South Korea as an mrNPI country. If so, they should provide an objective reason rather than simply dismissing the criticism.
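A transparent coding scheme need not be elaborate. As a purely illustrative sketch, one could average a published daily index, such as the OurWorldInData Stringency Index, over each country's analysis window and split the countries at the median. The country names and index values below are hypothetical placeholders, not OurWorldInData figures. The median split is our own assumption, not a rule proposed by the authors.

```python
from statistics import mean, median


def classify_by_index(daily_index, cutoff_day):
    """Label each country by its mean daily index over days [0, cutoff_day).

    daily_index maps country name -> list of daily index values (0-100).
    Countries at or above the median of these means are labelled 'more restrictive'.
    """
    averages = {c: mean(values[:cutoff_day]) for c, values in daily_index.items()}
    threshold = median(averages.values())
    labels = {
        c: "more restrictive" if avg >= threshold else "less restrictive"
        for c, avg in averages.items()
    }
    return labels, averages


# Hypothetical index values, for illustration only
example = {
    "Country A": [20, 40, 60, 80],
    "Country B": [10, 10, 30, 30],
    "Country C": [50, 70, 70, 70],
}
labels, averages = classify_by_index(example, cutoff_day=4)
print(labels)
```

Whatever the exact rule, the point is that it can be written down and rerun by anyone. That is what we are asking the authors to provide.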

FIGURE 1

Stringency Index of all countries included by the authors, up to the maximum cut‐off date specified in the supplementary materials of the original manuscript. England is not included, as OurWorldInData only provides Stringency Index data for the United Kingdom. Image source: OurWorldInData

FIGURE 2

Containment and Health Index of all countries included by the authors, up to the maximum cut‐off date specified in the supplementary materials of the original manuscript. England is not included, as OurWorldInData only provides Containment and Health Index data for the United Kingdom. Image source: OurWorldInData

ISSUES IN THE ‘POLICY’ VARIABLE

There are two points concerning the modelling that we would like to question and rebut. First, in Section 4 of their reply, the authors settle the definition of the ‘Policy’ variable as dichotomous. Surprisingly, they then write that they ‘implement panel regression model where coefficient on Policy variables identify “breaks” [quotation marks in the original] in case growth patterns in each sub‐national unit following the implementation of each NPI identified by specific Policy variables rather than a difference‐in‐difference as suggested by Zanetti Chini’. The econometric literature calls these ‘structural breaks’. The term distinguishes a break that has persistent effects on the path of the time series from changes explained by cyclical oscillations or pure noise. Such breaks cannot be identified by the coefficient of Policy_{pcit}. This is a discrete‐choice model for panel data, not a model for structural breaks. Structural breaks require entirely different models and statistical treatment, such as splines, and must be formally tested. In any case, this meaning cannot simply be ascribed, sic et simpliciter, to a coefficient.

Second, all three letters raised the issues of timing and lags. The authors respond that these do not make a difference, pointing out that the ‘timing of each NPI in each subnational unit of each country is explicitly modeled in the Policy_{pcit} variables’. We think that this answer misses the point of all three letters. There is no single number of days between the declaration of an NPI and a notable effect on daily case numbers: some responses come earlier, others later. Many factors, such as individual behavioural responses, have to be taken into account. There is therefore a distribution of time lags, which produces a smooth temporal onset of the effect.
Because Policy is a binary variable in the model, any NPI‐induced growth reductions that occur before the day Policy switches to 1 are attributed to the pre‐NPI period. The not yet fully developed reductions in the days immediately after are then taken as the complete NPI effect. This shrinks the estimated pre‐post difference, even if Policy switches on exactly at the mean of the lag distribution.
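This attenuation can be checked numerically. The sketch below assumes, purely for illustration, a full growth reduction of −0.30, a normally distributed lag with a mean of 10 days and a standard deviation of 4 days, and a 20‐day window. None of these values comes from the authors' paper. The true growth change builds up as the cumulative distribution of the lag, and Policy switches from 0 to 1 at the mean lag. The ordinary least squares slope on this binary regressor, which equals the difference in means between the two periods, is then compared with the full effect.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ols_slope(x, y):
    """OLS slope of y on x with an intercept (a constant baseline is absorbed by it)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var = sum((xi - x_bar) ** 2 for xi in x)
    return cov / var


def binary_policy_estimate(full_effect, mean_lag, lag_sd, days):
    """Coefficient on a 0/1 Policy indicator when the true effect phases in smoothly."""
    days = list(days)
    # True change in daily log growth: the effect accumulates as the lag CDF
    growth_change = [full_effect * normal_cdf((t - mean_lag) / lag_sd) for t in days]
    # Binary Policy variable switched on at the mean of the lag distribution
    policy = [1.0 if t >= mean_lag else 0.0 for t in days]
    return ols_slope(policy, growth_change)


estimate = binary_policy_estimate(full_effect=-0.30, mean_lag=10, lag_sd=4, days=range(20))
print(round(estimate, 3))  # -0.204
```

Under these assumptions, the binary regressor recovers only about two‐thirds of the true reduction, even though it is switched on exactly at the mean lag.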

DATA CUT‐OFF

In their response, Bendavid et al correctly state: ‘Fuchs worries about omitting the period of declining daily case numbers...’. Fuchs's reason was that this decline is claimed to be the main benefit of rigorous NPIs, and that it provides the most prominent negative contributions to growth rates. In their original paper, the authors defined such negative contributions as the signature of NPIs. At the same time, they suppressed the most prominent of these contributions, those from the period after rigorous NPIs began, without explicit mention or justification. Mention is now supplied in their response: ‘The data that we include cover the period up to the elimination of rapid growth in the first wave’. In other words, the data end around the peak in daily case numbers, and the subsequent decline is excluded, to the detriment of the signature of rigorous NPIs. A justification for the data cut‐off is still missing.
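A toy example shows how a cut‐off at the peak of daily cases discards the negative growth rates that the original paper treats as the signature of NPIs. The linearly declining growth rates and the 100 starting cases are hypothetical.

```python
import math

# Hypothetical daily log growth rates after rigorous NPIs: +0.25 falling to -0.15 over 21 days
growth = [0.25 - 0.02 * t for t in range(21)]

# Daily case counts implied by those growth rates, starting from 100 cases
cases = [100.0]
for g in growth:
    cases.append(cases[-1] * math.exp(g))

# Cut-off at the peak of daily cases; growth[k] takes cases[k] to cases[k + 1]
peak_day = max(range(len(cases)), key=cases.__getitem__)
growth_to_peak = growth[:peak_day]

print(peak_day, round(min(growth_to_peak), 2), round(min(growth), 2))  # 13 0.01 -0.15
```

Every growth rate kept under the cut‐off is positive. The negative contributions, which the original paper itself defines as the NPI signature, all fall after the peak.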

ISSUES ON CASES REDUCTION

The arguments in the authors’ response about ‘not very implausible values’ of 0.4 or −0.28 in logarithmic growth change due to rigorous NPIs miss the point. In the original paper, the authors described as ‘modest’ the largest beneficial growth change that their analysis conceded to rigorous NPIs, −0.28. Fuchs criticized this description as misleading. A logarithmic growth change of −0.28 is equivalent to halving the daily case number within 2.5 days. It is thus sufficient to neutralize the most dramatic exponential increase in COVID‐19 cases observed, which is not a ‘modest’ reduction by any reasonable definition. Nor is this substantial growth reduction of −0.28 as exceptional as the authors present it. Through the various approximations discussed above, their analysis systematically attenuates the estimated effect of rigorous NPIs.
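The 2.5‐day figure follows from treating −0.28 as a change in the daily logarithmic growth rate. On its own, such a change multiplies daily cases by exp(−0.28) ≈ 0.76 each day, so cases halve after ln 2 / 0.28 days. A short check:

```python
import math

growth_change = -0.28  # change in daily log growth rate attributed to rigorous NPIs

daily_factor = math.exp(growth_change)           # multiplicative change in cases per day
halving_days = math.log(2) / abs(growth_change)  # days for cases to halve at this rate

print(round(daily_factor, 2), round(halving_days, 2))  # 0.76 2.48
```

By symmetry, the same change exactly offsets an epidemic whose cases double every 2.5 days.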

ESTIMATING THE NPI EFFECT

The authors’ dismissal of Diff‐in‐Diff is surprising. In the equation on p. 3 of the original paper, θ0 are fixed effects of subnational units and δct are country‐specific day‐of‐week fixed effects. This is a canonical Diff‐in‐Diff specification: subnational units of a given country differ in levels but not in trend, and the authors assume the trend is common to all subnational units of that country. As mentioned in Zanetti Chini's letter, this assumption is not sound, as the data for Italian regions, for example, show. If Diff‐in‐Diff is not used in this context, it is impossible to understand how the model parameters were estimated. Were they obtained by least squares? If so, of what kind: grouped or pooled? Each of these estimators relies on specific assumptions that need to be properly discussed in the context of the empirical strategy. Without this information, replicating the code is of little use, because the statistical methodology behind it is missing.

Moreover, the authors' stated reason for not using Diff‐in‐Diff estimation is not really a reason. Contrary to their response, our mention of Diff‐in‐Diff was not a suggestion but an attempt to understand precisely what they did. They write: ‘We do not pass a strong verdict on the role of parallel trend assumption for causal identification here, but note that if it were indeed critical, that would invalidate most assessments of NPI effects that use similar econometric approaches, since the baseline trends are unique and highly nonlinear in each subnational unit’. This sentence does not seem to make sense with respect to the uniqueness of the trend. Are the authors not using two units for each comparison, so that a small panel with i = 2, and hence with two individual trends, can be constructed? Or are they computing a common trend for these two units? If so, how is this done?
Is it via cointegration analysis? Neither submission explains this. The assertion is also inaccurate with respect to nonlinearity: a substantial portion of the econometric literature addresses nonlinear panel data (and discrete‐choice models). Finally, the authors’ overconfidence in randomization seems inappropriate. They write: ‘Randomization has been increasingly used for assessing the impact of real‐world policies, and the value of knowing the benefits of NPIs, especially those with large health and welfare costs, would be enormous’. Some of the past literature argues in the opposite direction. Estimates from experiments can be severely biased when the comparison is made using different models, so the use of nonexperimental estimators remains fully justified.
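For readers less familiar with the estimator, the sketch below computes a canonical two‐unit, two‐period Diff‐in‐Diff on hypothetical daily growth series. The levels, trends, −0.10 effect and day‐10 start are invented for illustration, and this is not the authors' model. When the two units share a trend, the estimate recovers the effect. When the treated unit's trend is steeper, the trend difference is absorbed into the estimate. This is why the parallel‐trend assumption needs to be discussed rather than set aside.

```python
from statistics import mean


def growth_series(level, trend, effect, n_days, start_day):
    """Hypothetical daily growth: level + trend * t, plus `effect` from start_day onwards."""
    return [level + trend * t + (effect if t >= start_day else 0.0) for t in range(n_days)]


def did(treated, control, start_day):
    """2x2 Diff-in-Diff: change in the treated unit minus change in the control unit."""
    pre_t, post_t = mean(treated[:start_day]), mean(treated[start_day:])
    pre_c, post_c = mean(control[:start_day]), mean(control[start_day:])
    return (post_t - pre_t) - (post_c - pre_c)


control = growth_series(level=0.20, trend=-0.005, effect=0.0, n_days=20, start_day=10)
parallel = growth_series(level=0.25, trend=-0.005, effect=-0.10, n_days=20, start_day=10)
diverging = growth_series(level=0.25, trend=-0.010, effect=-0.10, n_days=20, start_day=10)

print(round(did(parallel, control, 10), 3))   # -0.1: parallel trends, effect recovered
print(round(did(diverging, control, 10), 3))  # -0.15: trend gap of -0.05 added to the effect
```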

CONCLUSION

Overall, we are forced to restate our previous position: this paper does not allow us to meaningfully assess the efficacy of NPIs against COVID‐19. It is not possible to know from this study whether restrictive NPIs work or not, or even how we might define a country's response as more or less ‘restrictive’.

REFERENCES

1. Gretchen Vogel. Data in paper about Swedish schoolchildren come under fire. Science. 2021-03-05.

2. Lonni Besançon, David Steadson, Antoine Flahault. Open Schools, Covid-19, and Child and Teacher Morbidity in Sweden. N Engl J Med. 2021-03-01.

3. Thomas Hale, Noam Angrist, Rafael Goldszmidt, Beatriz Kira, Anna Petherick, Toby Phillips, Samuel Webster, Emily Cameron-Blake, Laura Hallas, Saptarshi Majumdar, Helen Tatlow. A global panel database of pandemic policies (Oxford COVID-19 Government Response Tracker). Nat Hum Behav. 2021-03-08.

4. Emilio Zanetti Chini. Letter to Editor. Eur J Clin Invest. 2021-04-04.

5. Lonni Besançon, Gideon Meyerowitz-Katz, Antoine Flahault. Sample size, timing, and other confounding factors: Toward a fair assessment of stay-at-home orders. Eur J Clin Invest. 2021-03-02.

6. Eran Bendavid, Christopher Oh, Jay Bhattacharya, John P A Ioannidis. Authors Response to Letters to the editor regarding: 'Assessing mandatory stay-at-home and business closure effects on the spread of COVID-19'. Eur J Clin Invest. 2021-03-29.

7. Lonni Besançon, Gideon Meyerowitz-Katz, Emilio Zanetti Chini, Hermann Fuchs, Antoine Flahault. Challenges in determining causality: An ongoing critique of Bendavid et al's 'Assessing mandatory stay-at-home and business closure effects on the spread of COVID-19'. Eur J Clin Invest. 2021-06-18.

8. Austan Goolsbee, Chad Syverson. Fear, lockdown, and diversion: Comparing drivers of pandemic economic decline 2020. J Public Econ. 2020-11-25.

9. Jan M Brauner, Sören Mindermann, Mrinank Sharma, Leonid Chindelevitch, Yarin Gal, Jan Kulveit, David Johnston, John Salvatier, Tomáš Gavenčiak, Anna B Stephenson, Gavin Leech, George Altman, Vladimir Mikulik, Alexander John Norman, Joshua Teperowski Monrad, Tamay Besiroglu, Hong Ge, Meghan A Hartwick, Yee Whye Teh. Inferring the effectiveness of government interventions against COVID-19. Science. 2020-12-15.

10. Eran Bendavid, Christopher Oh, Jay Bhattacharya, John P A Ioannidis. Assessing mandatory stay-at-home and business closure effects on the spread of COVID-19. Eur J Clin Invest. 2021-02-01.
