To avoid the noncausal association between environmental factor and COVID-19 when using aggregated data: Simulation-based counterexamples for demonstration.

Shi Zhao

Abstract

In infectious disease epidemiology, an association between an independent factor and disease incidence (or death) counts may fail to infer the factor's association with disease transmission (or mortality risk). To explore the underlying role of environmental factors in the course of the COVID-19 epidemic, this study highlights the importance of following the definitions of epidemiological metrics and systematic analytical procedures. Caution is needed when interpreting associations based on aggregated data, and overinterpretation should be avoided. The existing analytical approaches to address the inferential failure described in this study are also discussed.
Copyright © 2020 Elsevier B.V. All rights reserved.

Keywords:  COVID-19; Epidemic; Modelling; Reproduction number; Statistical inference

Year:  2020        PMID: 32798858      PMCID: PMC7415212          DOI: 10.1016/j.scitotenv.2020.141590

Source DB:  PubMed          Journal:  Sci Total Environ        ISSN: 0048-9697            Impact factor:   7.963


Background and problem

Identifying environmental factors, e.g., meteorological factors and air pollutants, that affect the transmission and mortality risk of COVID-19 is important for understanding the features of the ongoing COVID-19 pandemic (Daraei et al., 2020; Ran et al., 2020). Pani et al. found associations between several meteorological factors and COVID-19 incidence under a time series study design, and they implied similar associations with COVID-19 transmission (Pani et al., 2020). Frontera et al. found positive associations between the levels of several air pollutants and COVID-19 cases and deaths under a cross-region ecological study design (Frontera et al., 2020). They concluded that “air pollution may have a strong impact on the high rate of infection and mortality”. Similar analytical approaches and main findings appeared in other recent studies (Şahin, 2020; Suhaimi et al., 2020; Shahzad et al., 2020). In infectious disease epidemiology, the incidence time series, commonly known as the epidemic curve, is driven by the disease's transmission process, which is strongly determined by (i) the transmissibility, and (ii) the seed cases in the latest few days (Bauch et al., 2005; Cori et al., 2013; Lin et al., 2020; Riou and Althaus, 2020; Wallinga and Lipsitch, 2007; Wang et al., 2020; Wu et al., 2020a; Zhao et al., 2020a; Zhao et al., 2020b). Thus, autocorrelation is highly likely to undermine statistical inference on the role of independent factors when the aggregated COVID-19 incidence time series is used directly. Time series and ecological studies are two commonly used study designs for exploring the relationship between an independent factor and the course of a disease, e.g., (Pani et al., 2020; Frontera et al., 2020). Both of these classic study designs adopt aggregated (instead of individual-level) datasets.
Hence, caution is needed when interpreting associations based on aggregated data, and overinterpretation should be avoided. In this study, simple simulation-based counterexamples demonstrate that the association between an independent factor and disease incidence (or death) counts fails to infer the association with disease transmission (or mortality risk). The existing analytical approaches to address this inferential failure are also discussed.

Simulation framework and construction of counterexamples

To mimic the real-world COVID-19 epidemic growing process, the simple modelling framework in Tuite and Fisman (2020) is adopted with the mean serial interval, τ, at 5 days, referring to previous studies (Zhao et al., 2020c; Ferretti et al., 2020; Ganyani et al., 2020; He et al., 2020; Zhao, 2020), and population size, N, at 10 million. Then, for the t-th day since the first COVID-19 case, the daily number of new cases is $c_t = c_{t-1} \cdot r_t^{1/\tau}$, where $c_0 = 1$ for the first seed case at the start of the outbreak. Here, $r_t$ denotes the effective reproduction number, and thus $r_t = R_t \cdot [1 - C_t / N]$. $R_t$ is the reproduction number, a well-accepted metric quantifying the instantaneous transmissibility of an infectious disease (Zhao et al., 2020a). $C_t$ is the cumulative number of cases at the t-th day, and obviously $C_t \geq C_{t-1} > 0$. The values of the $R_t$ time series are directly assumed in our counterexamples, and $R_t$ could also be a constant in some scenarios.
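The growth recursion described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's code: the function name `simulate_epidemic` is hypothetical, the depletion term is computed from the cumulative count up to the previous day (one possible reading of the definition), and the effective reproduction number is clipped at zero so the fractional power stays real-valued.

```python
def simulate_epidemic(R_series, tau=5.0, N=10_000_000, c0=1.0):
    """Simulate daily and cumulative case counts for a given R time series."""
    daily = [c0]        # c_0 = 1 seed case
    cumulative = [c0]   # C_0 = c_0
    for R_t in R_series:
        # Assumption: susceptible depletion uses the cumulative count up to
        # the previous day; max() keeps r_t non-negative if C approaches N.
        r_t = max(0.0, R_t * (1.0 - cumulative[-1] / N))
        c_t = daily[-1] * r_t ** (1.0 / tau)
        daily.append(c_t)
        cumulative.append(cumulative[-1] + c_t)
    return daily, cumulative


# Constant R = 2 for 30 days: early on, daily cases roughly double
# every tau = 5 days because depletion is still negligible.
daily, cumulative = simulate_epidemic([2.0] * 30)
print(round(daily[5], 3), round(daily[30], 1))
```

Because each day's count is the previous day's count multiplied by a growth factor, the series is autocorrelated by construction, which is the property the counterexamples exploit.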

Counterexample #1: a time series study

In this counterexample, the time series study design as in Pani et al. (2020) is considered. An independent factor X is considered as a determinant of the reproduction number (R) of COVID-19. Three pre-defined scenarios are considered: scenario (I), the correlation between X and R is negative; scenario (II), the correlation between X and R is positive; and scenario (III), the correlation between X and R is zero. In Fig. 1A, the factor X time series is generated by using an arithmetic sequence with additive random noise terms. For COVID-19, R largely ranges from 1.5 to 3 (Riou and Althaus, 2020; Wang et al., 2020; Wu et al., 2020a; Tuite and Fisman, 2020; Ferretti et al., 2020; He et al., 2020; Li et al., 2020; Zhao et al., 2020d), and the range from 1.5 to 2.5 is used in the simulation. In the same fashion, in Fig. 1B, three types of R time series are generated with decreasing, increasing and constant trends matching scenarios (I)–(III), respectively. In Fig. 1C, the COVID-19 epidemic curves are simulated in terms of the daily number of cases by using the three R time series in Fig. 1B. Similarly, Fig. 1D shows the cumulative number of COVID-19 cases. The consistency between two kinds of pairwise associations (or correlations) is examined: the association between X and R, and the association between X and the number of COVID-19 cases, i.e., daily or cumulative. An inconsistency between these two associations indicates an inferential failure, namely that the association between factor X and the number of COVID-19 cases fails to infer the association between X and R.

Fig. 1

The demonstrative trends of factor X (panel A), reproduction number (R, panel B), daily number of cases (panel C), cumulative number of COVID-19 cases (panel D), and their pairwise relationships (panels E, F and G). In panels B–G, the scenarios (I)–(III) are represented in cyan, purple and gold, respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
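The time series counterexample can be reproduced in spirit with the short Python sketch below. All numerical choices are assumptions for illustration only and are not taken from the original figures: the 40-day horizon, the slope and noise level of X, the small noise added to R, and the fixed random seed. The helper names `pearson` and `simulate_daily_cases` are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_daily_cases(R_series, tau=5.0, N=10_000_000):
    """Daily cases from c_t = c_{t-1} * r_t**(1/tau), starting at c_0 = 1."""
    daily, c, cum = [], 1.0, 1.0
    for R_t in R_series:
        r_t = max(0.0, R_t * (1.0 - cum / N))  # assumption: uses C_{t-1}
        c = c * r_t ** (1.0 / tau)
        cum += c
        daily.append(c)
    return daily


rng = random.Random(0)   # fixed seed so the sketch is reproducible
T = 40                   # hypothetical number of simulated days
# Factor X: arithmetic sequence plus additive noise (illustrative values).
X = [10.0 + 0.5 * t + rng.gauss(0.0, 1.0) for t in range(T)]
trend = [t / (T - 1) for t in range(T)]  # rises linearly from 0 to 1
R_scenarios = {
    "I (R decreasing)": [2.5 - s + rng.gauss(0.0, 0.05) for s in trend],
    "II (R increasing)": [1.5 + s + rng.gauss(0.0, 0.05) for s in trend],
    "III (R constant)": [2.0 + rng.gauss(0.0, 0.05) for _ in trend],
}

results = {}
for name, R in R_scenarios.items():
    cases = simulate_daily_cases(R)
    results[name] = (pearson(X, R), pearson(X, cases))
    print(f"{name}: corr(X, R) = {results[name][0]:+.2f}, "
          f"corr(X, cases) = {results[name][1]:+.2f}")
```

Since R stays above 1 in every scenario, the daily case count keeps rising, so an upward-trending X correlates positively with cases regardless of how X relates to R.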

Counterexample #2: an ecological study

In this counterexample, the same cross-region ecological study design is considered as in Frontera et al. (2020). As the model setting, n = 51 regions are considered with different region-specific reproduction numbers (R), each set as a constant, and case fatality ratios (CFR); thus, 51 COVID-19 outbreaks are simulated. For the i-th region, the reproduction number is denoted by $R_i$ and the case fatality ratio by $CFR_i$, and both are constants for a specific region. For COVID-19, $R_i$ is set from 1.5 to 3 (Riou and Althaus, 2020; Wang et al., 2020; Wu et al., 2020a; Tuite and Fisman, 2020; Ferretti et al., 2020; He et al., 2020; Li et al., 2020; Zhao et al., 2020d), and $CFR_i$ ranges from 1.5% to 3% (Verity et al., 2020; Russell et al., 2020; Wu et al., 2020b) across different regions. The relationship $CFR_i = (4.5 - R_i)/100$ is formulated for simple demonstration, as shown in the color code bar in Fig. 2. In the simulation, the time lag between onset of illness and death is ignored for simplicity, which should not affect our main conclusions. Fig. 2A shows the simulated COVID-19 epidemic curves of the 51 regions, i.e., each curve represents the outbreak in one region. Fig. 2B shows the trends of the daily number of COVID-19 deaths for each region.
Fig. 2

The demonstrative trends of daily number of COVID-19 cases (panel A), deaths (panel B), relationship between factor X and case fatality ratio (panel C), and association between factor X and cumulative number of deaths (panel D). Panels (A)–(D) share the same color code of reproduction number (R) and case fatality ratio that is shown at the top of this figure. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

For a factor, X, negatively correlated with CFR, see Fig. 2C, the association between X and the cumulative number of deaths is examined across all 51 regions. As in counterexample #1, the consistency between two pairwise associations (or correlations) is examined: the pre-defined negative association between X and CFR, and the association between X and the cumulative number of COVID-19 deaths. In the same fashion, an inconsistency between these two associations indicates an inferential failure, namely that the association between factor X and the number of COVID-19 deaths fails to infer the association between X and CFR.
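The ecological counterexample can be sketched in the same way. The region-specific reproduction numbers and the linear CFR rule follow the text; everything else is an assumption made for illustration: the 50-day horizon, the form and noise level of X, the fixed seed, and the helper names `pearson` and `cumulative_cases`.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cumulative_cases(R, T=50, tau=5.0, N=10_000_000):
    """Cumulative cases after T days for a constant reproduction number R."""
    c, cum = 1.0, 1.0
    for _ in range(T):
        r = max(0.0, R * (1.0 - cum / N))  # assumption: uses C_{t-1}
        c = c * r ** (1.0 / tau)
        cum += c
    return cum


rng = random.Random(1)  # fixed seed so the sketch is reproducible
n = 51
R = [1.5 + 1.5 * i / (n - 1) for i in range(n)]   # 1.5, ..., 3.0
CFR = [(4.5 - R_i) / 100 for R_i in R]            # 3%, ..., 1.5%
# Hypothetical factor X, constructed to be negatively correlated with CFR.
X = [10.0 - 200.0 * cfr + rng.gauss(0.0, 0.3) for cfr in CFR]
# No onset-to-death lag, so cumulative deaths = CFR * cumulative cases.
deaths = [cfr * cumulative_cases(R_i) for cfr, R_i in zip(CFR, R)]

corr_X_CFR = pearson(X, CFR)
corr_X_deaths = pearson(X, deaths)
print(f"corr(X, CFR)    = {corr_X_CFR:+.2f}")
print(f"corr(X, deaths) = {corr_X_deaths:+.2f}")
```

Regions with a higher R accumulate far more cases, which outweighs their lower CFR, so cumulative deaths rise with X even though mortality risk falls with X.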

Results and discussion

For counterexample #1, Fig. 1E shows the relationship between factor X and the three types of R series, which matches the scenarios (I)–(III) introduced above. However, in Fig. 1F and G, factor X is positively associated with the daily or cumulative number of COVID-19 cases in all scenarios. Therefore, the proposed inferential failure is demonstrated. Furthermore, even though this counterexample merely presents the scenarios when R > 1 for demonstration, the conclusions may still hold when R < 1 as well as in more complex contexts.
For counterexample #2, the predefined negative correlation between the factor X and CFR is shown in Fig. 2C. However, the association between X and the cumulative number of deaths is clearly positive, see Fig. 2D, which is opposite to the association in Fig. 2C. Thus, the positive association between factor X and the cumulative number of deaths fails to infer the underlying true relationship between X and the mortality risk of COVID-19 (in terms of CFR). A similar inferential failure also occurs when using the daily number of deaths (data not shown). Furthermore, even though this counterexample presents the situation when CFR is negatively correlated with R for demonstration, similar kinds of inferential failure may still occur in alternative situations and in more complex contexts.
The epidemic curve of an infectious disease is driven by its transmission process, which can be measured by several metrics including R. The daily number of cases is strongly determined by the strength of transmissibility and the number of seed cases in the latest few days, where the relevant time window is determined by the serial interval. Hence, autocorrelation is highly likely to undermine statistical inference on transmission-driving factors when the case time series is used directly, i.e., without first quantifying the disease transmissibility.
Caution is needed to avoid this kind of noncausal association between independent factors and COVID-19 incidence. It is important to transform the incidence data to transmissibility by using plausible analytical approaches (Cori et al., 2013; Leung et al., 2020; Wallinga and Teunis, 2004; Ali et al., 2018), and to check the association between transmission rate and external factors thereafter, e.g., (Ran et al., 2020; Yao et al., 2020a). The mortality risk of COVID-19 is reflected by both the number of disease-induced deaths and the number of COVID-19 cases. Thus, the COVID-19 CFR is important for properly quantifying the mortality risk. It is important to transform the case and death data to CFR by using analytical frameworks from previous studies (Verity et al., 2020; Russell et al., 2020; Wu et al., 2020b; Leung et al., 2020; Yang et al., 2020; Jung et al., 2020). One may then examine the association between CFR and external factors directly, e.g., (Yao et al., 2020b; Yao et al., 2020c). Recently, an increasing number of studies have presented ‘associations’ between environmental factors and COVID-19 transmission, for two main reasons. First, in the previous literature, weather parameters are commonly regarded as critical drivers of respiratory virus transmission (Pica and Bouvier, 2012; Kutter et al., 2018), e.g., SARS-CoV (Yuan et al., 2006; Chan et al., 2011), MERS-CoV (Van Doremalen et al., 2013), and influenza (Ali et al., 2018; Dalziel et al., 2018). Second, exposure to air pollution may damage the immune system and thus lower the resistance to viral infections. Hence, associations between environmental factors and COVID-19 transmission or related mortality risks are possible, and consistent with previous findings for other respiratory viruses. However, the counterexamples demonstrate that implausible conclusions may be reached due to flawed analytical procedures.
This study highlights the importance of following systematic analytical procedures to explore the role of environmental factors in infectious disease epidemiology with aggregated datasets.
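The recommendation to convert incidence into transmissibility before testing associations can be illustrated within the same simple growth model. The sketch below only inverts the noise-free recursion used in the simulations; it is not the estimator of Cori et al. (2013) or of the other cited frameworks, which are designed for real, noisy surveillance data. The function names `simulate` and `recover_R` and the 40-day horizon are hypothetical.

```python
def simulate(R_series, tau=5.0, N=10_000_000):
    """Daily and cumulative cases from c_t = c_{t-1} * r_t**(1/tau)."""
    daily, cumulative = [1.0], [1.0]
    for R_t in R_series:
        r_t = max(0.0, R_t * (1.0 - cumulative[-1] / N))  # uses C_{t-1}
        daily.append(daily[-1] * r_t ** (1.0 / tau))
        cumulative.append(cumulative[-1] + daily[-1])
    return daily, cumulative


def recover_R(daily, cumulative, tau=5.0, N=10_000_000):
    """Invert the growth recursion to recover R_t from case counts."""
    R_hat = []
    for t in range(1, len(daily)):
        r_t = (daily[t] / daily[t - 1]) ** tau              # effective r_t
        R_hat.append(r_t / (1.0 - cumulative[t - 1] / N))   # undo depletion
    return R_hat


T = 40
R_true = [2.5 - t / (T - 1) for t in range(T)]  # scenario (I): 2.5 down to 1.5
daily, cumulative = simulate(R_true)
R_hat = recover_R(daily, cumulative)
max_error = max(abs(a - b) for a, b in zip(R_true, R_hat))
print(f"largest |R_true - R_hat| = {max_error:.2e}")
```

Once the reproduction number series has been recovered, its correlation with a factor X reflects the pre-defined relationship rather than the shared upward trend in case counts.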

Conclusive remarks

It is important to follow systematic analytical procedures to explore the role of environmental factors in the course of the COVID-19 pandemic. To avoid noncausal associations as well as overinterpretation, the definitions of epidemiological metrics for infectious diseases should be introduced and clarified before the analytical procedures are applied. Although some simplifications have been made, the two counterexamples proposed in this study apply to other infectious diseases in alternative contexts.

Ethics approval and consent to participate

Not applicable.

Data availability

No real-world data was used in this work.

Authors' contributions

S Zhao conceived the study, carried out the analysis, drafted the letter, discussed the results, critically read and revised the letter.

Funding

This work was not funded.

Disclaimer

The funding agencies had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.

Declaration of competing interest

None declared.

References (10 of 40 shown)

1.  Stability of Middle East respiratory syndrome coronavirus (MERS-CoV) under different environmental conditions.

Authors:  N van Doremalen; T Bushmaker; V J Munster
Journal:  Euro Surveill       Date:  2013-09-19

2.  Estimating the time interval between transmission generations when negative values occur in the serial interval data: using COVID-19 as an example.

Authors:  Shi Zhao
Journal:  Math Biosci Eng       Date:  2020-05-11       Impact factor: 2.080

3.  Estimates of the severity of coronavirus disease 2019: a model-based analysis.

Authors:  Robert Verity; Lucy C Okell; Ilaria Dorigatti; Peter Winskill; Charles Whittaker; Natsuko Imai; Gina Cuomo-Dannenburg; Hayley Thompson; Patrick G T Walker; Han Fu; Amy Dighe; Jamie T Griffin; Marc Baguelin; Sangeeta Bhatia; Adhiratha Boonyasiri; Anne Cori; Zulma Cucunubá; Rich FitzJohn; Katy Gaythorpe; Will Green; Arran Hamlet; Wes Hinsley; Daniel Laydon; Gemma Nedjati-Gilani; Steven Riley; Sabine van Elsland; Erik Volz; Haowei Wang; Yuanrong Wang; Xiaoyue Xi; Christl A Donnelly; Azra C Ghani; Neil M Ferguson
Journal:  Lancet Infect Dis       Date:  2020-03-30       Impact factor: 25.071

4.  Severe air pollution links to higher mortality in COVID-19 patients: The "double-hit" hypothesis.

Authors:  Antonio Frontera; Lorenzo Cianfanelli; Konstantinos Vlachos; Giovanni Landoni; George Cremona
Journal:  J Infect       Date:  2020-05-21       Impact factor: 6.072

5.  Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia.

Authors:  Qun Li; Xuhua Guan; Peng Wu; Xiaoye Wang; Lei Zhou; Yeqing Tong; Ruiqi Ren; Kathy S M Leung; Eric H Y Lau; Jessica Y Wong; Xuesen Xing; Nijuan Xiang; Yang Wu; Chao Li; Qi Chen; Dan Li; Tian Liu; Jing Zhao; Man Liu; Wenxiao Tu; Chuding Chen; Lianmei Jin; Rui Yang; Qi Wang; Suhua Zhou; Rui Wang; Hui Liu; Yinbo Luo; Yuan Liu; Ge Shao; Huan Li; Zhongfa Tao; Yang Yang; Zhiqiang Deng; Boxi Liu; Zhitao Ma; Yanping Zhang; Guoqing Shi; Tommy T Y Lam; Joseph T Wu; George F Gao; Benjamin J Cowling; Bo Yang; Gabriel M Leung; Zijian Feng
Journal:  N Engl J Med       Date:  2020-01-29       Impact factor: 176.079

6.  Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study.

Authors:  Joseph T Wu; Kathy Leung; Gabriel M Leung
Journal:  Lancet       Date:  2020-01-31       Impact factor: 79.321

7.  Estimating the serial interval of the novel coronavirus disease (COVID-19) based on the public surveillance data in Shenzhen, China, from 19 January to 22 February 2020.

Authors:  Kai Wang; Shi Zhao; Ying Liao; Tiantian Zhao; Xiaoyan Wang; Xueliang Zhang; Haiyan Jiao; Huling Li; Yi Yin; Maggie H Wang; Li Xiao; Lei Wang; Daihai He
Journal:  Transbound Emerg Dis       Date:  2020-06-19       Impact factor: 4.521

8.  Modelling the effective reproduction number of vector-borne diseases: the yellow fever outbreak in Luanda, Angola 2015-2016 as an example.

Authors:  Shi Zhao; Salihu S Musa; Jay T Hebert; Peihua Cao; Jinjun Ran; Jiayi Meng; Daihai He; Jing Qin
Journal:  PeerJ       Date:  2020-02-27       Impact factor: 2.984

9.  The role of the environment and its pollution in the prevalence of COVID-19.

Authors:  Hasti Daraei; Kimia Toolabian; Marzieh Kazempour; Mohammad Javanbakht
Journal:  J Infect       Date:  2020-06-12       Impact factor: 6.072

10.  Imitation dynamics in the mitigation of the novel coronavirus disease (COVID-19) outbreak in Wuhan, China from 2019 to 2020.

Authors:  Shi Zhao; Lewi Stone; Daozhou Gao; Salihu S Musa; Marc K C Chong; Daihai He; Maggie H Wang
Journal:  Ann Transl Med       Date:  2020-04