Etsuji Suzuki, Toshiharu Mitsuhashi, Toshihide Tsuda, Eiji Yamamoto.
Abstract
Confounding is a major concern in epidemiology. Despite its significance, the different notions of confounding have not been fully appreciated in the literature, leading to confusion of causal concepts in epidemiology. In this article, we aim to highlight the importance of differentiating between the subtly different notions of confounding from the perspective of counterfactual reasoning. Using a simple example, we illustrate the significance of considering the distribution of response types to distinguish causation from association, highlighting that confounding depends not only on the population chosen as the target of inference, but also on the notions of confounding in distribution and confounding in measure. This point has been relatively underappreciated, partly because some literature on the concept of confounding has used only the exposed and unexposed groups as the target populations, whereas it would be helpful to use the total population as the target population. Moreover, to clarify a further distinction between confounding "in expectation" and "realized" confounding, we illustrate the usefulness of examining the distribution of exposure status in the target population. To grasp the explicit distinction between confounding in expectation and realized confounding, we need to understand the mechanism that generates exposure events, not the product of that mechanism. Finally, we graphically illustrate this point, highlighting the usefulness of directed acyclic graphs in examining the presence of confounding in distribution under the notion of confounding in expectation.
Keywords: Bias; Confounding; Counterfactual; Directed acyclic graphs; Response types
Year: 2016 PMID: 28142011 PMCID: PMC5328726 DOI: 10.1016/j.je.2016.09.003
Source DB: PubMed Journal: J Epidemiol ISSN: 0917-5040 Impact factor: 3.211
Table 1. Characteristics of the four smoking subjects during the target time period.[a]
| Subject ID | Sex | History of asbestos exposure | Smoking | Lung cancer (observed) | Lung cancer if quit smoking (i.e., exposure)[b] | Lung cancer if did not quit (i.e., non-exposure)[b] | Response type |
|---|---|---|---|---|---|---|---|
| Subject #1 | Male | Yes | Quit | Diseased | Diseased | (Diseased) | Doomed |
| Subject #2 | Male | No | Did not quit | Diseased | (Non-diseased) | Diseased | Preventive |
| Subject #3 | Female | No | Quit | Non-diseased | Non-diseased | (Non-diseased) | Immune |
| Subject #4 | Female | No | Did not quit | Diseased | (Non-diseased) | Diseased | Preventive |
[a] Effect of smoking cessation on lung cancer.
[b] Parentheses indicate that these particular outcomes are counterfactual.
Fig. 1. Typology of four notions of confounding. DAGs are primarily useful for examining the presence of confounding in the first quadrant. DAG, directed acyclic graph.
Table 2. Response types and their distribution in Table 1.
| Response type | Response under exposure | Response under non-exposure | Description | Proportion in exposed group[a] | Proportion in unexposed group[b] | Proportion in total population[c] |
|---|---|---|---|---|---|---|
| 1 | 1 | 1 | Doomed | $p_1 = 1/2$ | $q_1 = 0$ | $r_1 = 1/4$ |
| 2 | 1 | 0 | Causal | $p_2 = 0$ | $q_2 = 0$ | $r_2 = 0$ |
| 3 | 0 | 1 | Preventive | $p_3 = 0$ | $q_3 = 2/2$ | $r_3 = 2/4$ |
| 4 | 0 | 0 | Immune | $p_4 = 1/2$ | $q_4 = 0$ | $r_4 = 1/4$ |
Effect of smoking cessation on lung cancer (1 = diseased, 0 = non-diseased). The associational risk difference is calculated as $(p_1 + p_2) - (q_1 + q_3) = 1/2 - 2/2 = -1/2$. Note that the distribution in this table applies to scenario #2 in Table 3.
[a] The causal risk difference in the exposed group is defined as $(p_1 + p_2) - (p_1 + p_3) = p_2 - p_3 = 0 - 0 = 0$.
[b] The causal risk difference in the unexposed group is defined as $(q_1 + q_2) - (q_1 + q_3) = q_2 - q_3 = 0 - 2/2 = -2/2$.
[c] As shown in Table 1, the exposed and unexposed groups are of equal size (two subjects each), so the proportion of response type $i$ in the total population, $r_i$, can be calculated as $p_i/2 + q_i/2$. The causal risk difference in the total population is defined as $(r_1 + r_2) - (r_1 + r_3) = r_2 - r_3 = 0 - 2/4 = -2/4$.
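The proportions and the four risk differences in Table 2 can be checked with exact fractions. This is a sketch that assumes Table 1 (scenario #2) is encoded as a list of (subject, quit smoking?, response type) tuples; `SUBJECTS`, `TYPES` and `proportions` are hypothetical names.

```python
from fractions import Fraction

# Hypothetical encoding of Table 1 (scenario #2): (subject, quit smoking?, response type).
# Exposure = quitting smoking.
SUBJECTS = [
    ("#1", True, "doomed"),
    ("#2", False, "preventive"),
    ("#3", True, "immune"),
    ("#4", False, "preventive"),
]
# Response types 1-4 in the order of Table 2.
TYPES = ["doomed", "causal", "preventive", "immune"]


def proportions(types):
    """Return {i: proportion of response type i} for a group of response types."""
    n = len(types)
    return {i: Fraction(types.count(t), n) for i, t in enumerate(TYPES, start=1)}


exposed = [t for _, is_exposed, t in SUBJECTS if is_exposed]
unexposed = [t for _, is_exposed, t in SUBJECTS if not is_exposed]
p = proportions(exposed)              # p_i
q = proportions(unexposed)            # q_i
r = proportions(exposed + unexposed)  # r_i (= p_i/2 + q_i/2 with equal group sizes)

# Risk under exposure = types 1 + 2; risk under non-exposure = types 1 + 3.
associational_rd = (p[1] + p[2]) - (q[1] + q[3])
causal_rd_exposed = (p[1] + p[2]) - (p[1] + p[3])
causal_rd_unexposed = (q[1] + q[2]) - (q[1] + q[3])
causal_rd_total = (r[1] + r[2]) - (r[1] + r[3])

print(associational_rd, causal_rd_exposed, causal_rd_unexposed, causal_rd_total)
# prints: -1/2 0 -1 -1/2
```

`Fraction` reduces automatically, so the table's −2/2 prints as −1 and −2/4 as −1/2. The associational risk difference (−1/2) coincides with the causal one only in the total population, not in the exposed (0) or unexposed (−1) group.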
Table 3. Six possible scenarios when the numbers of the exposed and unexposed groups are balanced.[a]
| Scenario | Actually exposed: subject IDs | Actually exposed: response types | Actually unexposed: subject IDs | Actually unexposed: response types |
|---|---|---|---|---|
| Scenario #1 | #1, #2 | doomed, preventive | #3, #4 | immune, preventive |
| Scenario #2 | #1, #3 | doomed, immune | #2, #4 | preventive, preventive |
| Scenario #3 | #1, #4 | doomed, preventive | #2, #3 | preventive, immune |
| Scenario #4 | #2, #3 | preventive, immune | #1, #4 | doomed, preventive |
| Scenario #5 | #2, #4 | preventive, preventive | #1, #3 | doomed, immune |
| Scenario #6 | #3, #4 | immune, preventive | #1, #2 | doomed, preventive |
Scenarios #1 and #3 are identical from the perspective of counterfactual reasoning, because the distributions of response types are the same in these scenarios. Similarly, scenarios #4 and #6 are identical from the perspective of counterfactual reasoning. Consequently, these six scenarios are grouped into a total of four patterns in terms of the distributions of response types.
Exposure status shown in Table 1 corresponds to scenario #2.
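Grouping the six scenarios into four patterns amounts to comparing the distribution (multiset) of response types in each group while ignoring which subject holds which type. A minimal sketch, assuming `itertools.combinations` over subject IDs sorted as #1 to #4 reproduces the scenario order of Table 3; `RESPONSE_TYPE`, `scenarios` and `pattern` are hypothetical names.

```python
from collections import Counter
from itertools import combinations

# Response types from Table 1 (hypothetical encoding).
RESPONSE_TYPE = {"#1": "doomed", "#2": "preventive", "#3": "immune", "#4": "preventive"}
SUBJECT_IDS = sorted(RESPONSE_TYPE)


def scenarios():
    """Yield (exposed, unexposed) subject IDs for every 2-vs-2 split, in Table 3 order."""
    for exposed in combinations(SUBJECT_IDS, 2):
        unexposed = tuple(s for s in SUBJECT_IDS if s not in exposed)
        yield exposed, unexposed


def pattern(exposed, unexposed):
    """Distribution of response types in each group; who holds which type is ignored."""
    def distribution(group):
        return frozenset(Counter(RESPONSE_TYPE[s] for s in group).items())
    return distribution(exposed), distribution(unexposed)


all_scenarios = list(scenarios())
patterns = {}
for number, (exp_ids, unexp_ids) in enumerate(all_scenarios, start=1):
    patterns.setdefault(pattern(exp_ids, unexp_ids), []).append(number)

print(len(all_scenarios), sorted(patterns.values()))
# prints: 6 [[1, 3], [2], [4, 6], [5]]
```

The output reproduces the note above: scenarios #1 and #3, and #4 and #6, are indistinguishable from the perspective of counterfactual reasoning.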
Table 4. Expected values of the estimators for risk and their difference in Situations 1 and 2.
| Scenario | Observed risk, exposed | Observed risk, unexposed | Observed RD | Situation 1: weight[a] | Situation 1: observed risk × weight, exposed | Situation 1: observed risk × weight, unexposed | Situation 1: RD estimator | Situation 2: weight[b] | Situation 2: observed risk × weight, exposed | Situation 2: observed risk × weight, unexposed | Situation 2: RD estimator |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Scenario #1 | 1/2 | 1/2 | 0 | 1/6 | 1/12 | 1/12 | | 16/33 | 8/33 | 8/33 | |
| Scenario #2 | 1/2 | 2/2 | −1/2 | 1/6 | 1/12 | 2/12 | | 4/33 | 2/33 | 4/33 | |
| Scenario #3 | 1/2 | 1/2 | 0 | 1/6 | 1/12 | 1/12 | | 4/33 | 2/33 | 2/33 | |
| Scenario #4 | 0/2 | 2/2 | −2/2 | 1/6 | 0 | 2/12 | | 4/33 | 0 | 4/33 | |
| Scenario #5 | 0/2 | 1/2 | −1/2 | 1/6 | 0 | 1/12 | | 4/33 | 0 | 2/33 | |
| Scenario #6 | 0/2 | 2/2 | −2/2 | 1/6 | 0 | 2/12 | | 1/33 | 0 | 1/33 | |
| Expected value | | | | | 1/4 | 3/4 | −1/2 | | 4/11 | 7/11 | −3/11 |
RD, risk difference.
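[a] In Situation 1, each of the four subjects quits smoking with probability 1/2, so, given that two subjects are exposed, the six scenarios occur at random with equal probability (weight 1/6 each). See Table 3 and eAppendix 1 for details.
[b] In Situation 2, the probabilities of quitting smoking are 2/3 for each of the two males and 1/3 for each of the two females. Scenario #1 (both males quit, neither female quits) therefore has probability $(2/3)^4$, whereas scenario #6 (both females quit, neither male quits) has probability $(1/3)^4$, so scenario #1 is expected to occur 16 times (i.e., $2^4$) as often as scenario #6. Likewise, each of scenarios #2–5 (one male and one female quit) has probability $(2/3)^2(1/3)^2$ and is expected to occur four times (i.e., $2^2$) as often as scenario #6. See Table 3 and eAppendix 1 for details.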
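The expected values in Table 4 are probability-weighted averages of the per-scenario risk estimates. The sketch below recomputes both situations exactly, assuming the quitting probabilities stated in the footnotes and renormalising over the six balanced (2-vs-2) scenarios of Table 3; the dictionary layout and the function names `observed_outcome` and `expected_estimates` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical encoding of Table 1: subject -> (sex, response type).
SUBJECTS = {
    "#1": ("male", "doomed"),
    "#2": ("male", "preventive"),
    "#3": ("female", "immune"),
    "#4": ("female", "preventive"),
}


def observed_outcome(response_type, quit_smoking):
    """Observed lung cancer (1) or not (0) for a response type and exposure status."""
    if response_type == "doomed":
        return 1
    if response_type == "immune":
        return 0
    if response_type == "preventive":
        return 0 if quit_smoking else 1
    return 1 if quit_smoking else 0  # "causal"


def expected_estimates(p_quit):
    """Expected risk in exposed, risk in unexposed, and RD over the six balanced scenarios.

    p_quit maps sex -> probability of quitting. Scenario probabilities are
    renormalised so that the six 2-vs-2 scenarios of Table 3 sum to 1.
    """
    ids = sorted(SUBJECTS)
    rows = []
    for exposed in combinations(ids, 2):
        unexposed = [s for s in ids if s not in exposed]
        prob = Fraction(1)
        for s in ids:
            p = p_quit[SUBJECTS[s][0]]
            prob *= p if s in exposed else 1 - p
        risk_exp = Fraction(
            sum(observed_outcome(SUBJECTS[s][1], True) for s in exposed), 2)
        risk_unexp = Fraction(
            sum(observed_outcome(SUBJECTS[s][1], False) for s in unexposed), 2)
        rows.append((prob, risk_exp, risk_unexp))
    total = sum(prob for prob, _, _ in rows)
    e_exp = sum(prob / total * r1 for prob, r1, _ in rows)
    e_unexp = sum(prob / total * r0 for prob, _, r0 in rows)
    return e_exp, e_unexp, e_exp - e_unexp


situation1 = expected_estimates({"male": Fraction(1, 2), "female": Fraction(1, 2)})
situation2 = expected_estimates({"male": Fraction(2, 3), "female": Fraction(1, 3)})
print(*situation1)  # prints: 1/4 3/4 -1/2
print(*situation2)  # prints: 4/11 7/11 -3/11
```

In Situation 1 the expected RD (−1/2) equals the causal RD in the total population (−2/4), whereas in Situation 2 it does not (−3/11), which is the distinction between the two exposure-generating mechanisms.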
Fig. 2. DAG for a study of the effect of smoking cessation on lung cancer. The presence of the dashed arrow from “Male” to “Smoking Cessation” is determined by comparing the probabilities of the two males and the two females quitting smoking. If we use a DAG with signed edges in Situation 2, all the edges, including the dashed arrow, are positive. Under the signed DAG approach, the sign of a backdoor path from “Smoking Cessation” to “Lung Cancer” is the product of the signs of the edges that constitute that path, so we can conclude that the sign of the bias is positive. This is consistent with the fact that, when the target population is the total population, there is positive bias in Situation 2 in the notion of confounding in expectation (the expected RD of −3/11 in Table 4 exceeds the causal RD of −2/4 in the total population). DAG, directed acyclic graph.
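The signed-DAG argument in the Fig. 2 caption can be paired with a numerical check of the bias. In this sketch the backdoor path is modelled hypothetically as Smoking Cessation ← Male → Lung Cancer (the figure itself is not reproduced here, so any intermediate nodes on the path are an assumption); the caption states only that every edge is positive in Situation 2. The expected RD and the causal RD are copied from Table 4 and the Table 2 footnote.

```python
from fractions import Fraction
from math import prod

# Hypothetical signed encoding of the backdoor path in Fig. 2 (Situation 2).
# The caption states only that all edges are positive; modelling the path as
# Smoking Cessation <- Male -> Lung Cancer is an assumption made here.
BACKDOOR_PATH_SIGNS = {
    ("Male", "Smoking Cessation"): +1,  # dashed arrow: males quit with 2/3 vs 1/3
    ("Male", "Lung Cancer"): +1,        # assumed positive, as the caption states
}


def path_sign(edge_signs):
    """Sign of a path in a signed DAG: the product of the signs of its edges."""
    return prod(edge_signs.values())


# Situation 2 expected RD (Table 4) and causal RD in the total population (Table 2).
expected_rd_situation2 = Fraction(-3, 11)
causal_rd_total = Fraction(-2, 4)
bias = expected_rd_situation2 - causal_rd_total

print(path_sign(BACKDOOR_PATH_SIGNS), bias)  # prints: 1 5/22
```

A positive path sign predicts positive bias, and the numerical bias, −3/11 − (−2/4) = 5/22, is indeed positive. The signed-DAG rule gives only the direction of the bias, not its magnitude.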