
An investigation of the significance of residual confounding effect.

Wenbin Liang¹, Yuejen Zhao², Andy H Lee³.

Abstract

BACKGROUND: Observational studies are commonly conducted in health research. However, because they lack randomization, the estimated associations between the outcome and the exposure can be affected by unmeasured confounding factors. It is important to determine how likely it is that a significant association observed between an outcome variable and a noncausally related exposure is introduced by residual confounding factors.
METHODS: A simulation approach is developed based on the sufficient cause model to test the likelihood of significant associations observed between a noncausally related exposure and the outcome.
RESULTS: Based on the estimates from all 500 replicates, the association between the exposure and the outcome is found to be significant in 386 (77%) replicates when all confounders (component causes) are controlled for in the model. However, when a subset of real component causes and some noncausal factors are controlled for in the model, the association between exposure and the outcome becomes significant in 487 (97%) replicates.
CONCLUSION: Even when all confounding factors are known and controlled for using conventional multivariate analysis, the observed association between exposure and outcome can still be dominated by residual confounding effects. Therefore, an observed significant association apparently provides limited evidence for a causal relationship.



Year: 2014 PMID: 24696862 PMCID: PMC3947891 DOI: 10.1155/2014/658056

Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411

1. Introduction

Ethical and budgetary constraints often limit the application of experimental study designs in health research, so observational designs such as cohort and case-control studies have been widely adopted as methodological alternatives [1-5]. However, because of the lack of randomization, the resulting estimates can be influenced by uncontrolled or unmeasured confounders, which typically cause the estimates to deviate from their true values [6-12]. According to the epidemiological literature, a confounder must meet the following conditions: (i) it is a cause of the disease, or a proxy for one or more causes, in unexposed people; (ii) it is correlated with the exposure in the study population; and (iii) it is not an intermediate step in the causal pathway between the exposure and the disease [1, 13–16]. To deal with confounding effects, known or suspected confounders are measured together with the exposure and outcome of interest. Multivariate analyses are then performed to estimate the association between the exposure and the outcome while attempting to remove the effects of these known or suspected confounders [8, 13, 17–19]. Under the sufficient cause model, a sufficient cause is a complete causal mechanism, defined as a combination of minimal conditions (necessary elements) and events that inevitably produces disease; the necessary elements that constitute a sufficient cause are called component causes [2]. In practice, the component causes and the compositions of sufficient causes are often unknown, and measurement errors and misclassification of exposures, confounders, and outcomes are present at the same time [8, 20–23]. Consequently, the estimated associations between the outcome and the exposure are still likely to be affected by unmeasured confounding factors. For example, even in well-designed studies, significant protective associations have been observed between outcomes and exposures that are not truly protective; these associations were in fact produced by unmeasured confounding factors [24, 25].
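To make the sufficient cause model concrete, the sketch below represents each sufficient cause as a set of component causes: a sufficient cause is complete only when all of its components are present, and the disease occurs when at least one sufficient cause is complete. The function names and the components A to D are hypothetical illustrations, not part of the study.

```python
from typing import FrozenSet, Mapping, Sequence

# A sufficient cause is modelled as a set of component-cause names (hypothetical labels).
SufficientCause = FrozenSet[str]


def sufficient_cause_complete(cause: SufficientCause, present: Mapping[str, bool]) -> bool:
    """A sufficient cause is complete only when every one of its component causes is present."""
    return all(present.get(component, False) for component in cause)


def disease_occurs(causes: Sequence[SufficientCause], present: Mapping[str, bool]) -> bool:
    """Disease occurs when at least one sufficient cause is complete."""
    return any(sufficient_cause_complete(cause, present) for cause in causes)


# Hypothetical example: two sufficient causes that share component "B".
causes = [frozenset({"A", "B"}), frozenset({"B", "C", "D"})]
print(disease_occurs(causes, {"A": True, "B": True}))             # True
print(disease_occurs(causes, {"B": True, "C": True}))             # False: D is absent
print(disease_occurs(causes, {"B": True, "C": True, "D": True}))  # True
```

In this toy example, B belongs to both sufficient causes, so it is necessary for the disease but not sufficient on its own.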
It is thus important to investigate how likely it is that a significant association observed between an outcome variable and a noncausally related exposure is introduced by residual confounding factors. In this study, we develop a simulation approach to test the likelihood of observing significant associations between a noncausally related exposure and the outcome variable using standard multivariate analysis, assuming that the compositions of sufficient causes are unknown and that either all risk factors/component causes or only some of them are known and controlled for. The study has two objectives: (1) to investigate the likelihood of false positive findings in observational studies; and (2) to propose a simulation framework for assessing epidemiologic methods that deal with confounding effects.

2. Methods

2.1. Overview of the Simulation

The simulation process follows the sufficient cause model [2]. For an event to occur, at least one sufficient cause has to occur. The components of a sufficient cause are randomly chosen from a pool of variables with low to moderate mutual correlation, which includes the exposure of interest and 99 other variables. The exposure of interest is set to be noncausal for the outcome; therefore, it is never chosen as a component of a sufficient cause. Given the correlation among the 100 variables, every chosen variable is a potential confounder of the association between the exposure and the outcome. The association between the exposure and the outcome is then estimated using a logistic regression model, while controlling for (i) all component causes; and (ii) some of the component causes (selected at random) together with some noncausal factors. The simulations comprise 500 replicates, each generated through an independent process. All simulations are performed using Stata release 12. The procedures involved in each replicate are outlined below. Details of the simulation procedure, including the sufficient cause model and the estimation process, are provided in the Appendix.

Step 1. Generate a pool of random variables with low to moderate correlation from the uniform [0, 1) distribution: $T_{100\times 50000} = \{T_{in}\}$, $i = 1, 2, \ldots, 100$, $n = 1, 2, \ldots, 50000$.

Step 2. Determine the composition of sufficient causes and the threshold values of their components. The number of types of sufficient causes for Y is randomly chosen from $\{1, 2, \ldots, 9\}$. The components of each type of sufficient cause are randomly selected from $T_i$, $i = 2, 3, \ldots, 100$; $T_1$ is taken as the exposure, which is set to be noncausal for Y. For each observation, a sufficient cause occurs when each of its components has a value higher than its specific threshold value. The threshold value is specific to each component as well as to each type of sufficient cause, and it is randomly chosen from a uniform [0.5, 0.9) distribution. This allows the threshold values to vary between components, and between different sufficient causes for the same component. To reflect the fact that exact threshold values are typically unknown, the $T_{in}$ are then dichotomized into binary variables $X_{in}$, $i = 1, 2, \ldots, 100$, $n = 1, 2, \ldots, 50000$, using the rule $X_{in} = 1$ if $T_{in} > 0.7$ and $X_{in} = 0$ otherwise. The mean of the uniform [0.5, 0.9) distribution (0.7) is used instead of the exact threshold values in order to account for unavoidable measurement errors and misclassification in confounders and exposures.

Step 3. Generate competing events for Y, $E_n$, $n = 1, 2, \ldots, 50000$. Note that $E_n$ is independent of T and X.

Step 4. Generate small random errors for Y, representing measurement errors in the outcome and smoothing the computing process. $Q_n$ is a Bernoulli-distributed random variable that is independent of E and X and accounts for only a small proportion of the variance of Y.

Step 5. Determine the status (occurred or not occurred) of Y.

Step 6. Determine, through a random process, the "known" (not necessarily truly causal) causal factors for Y.

Details of Steps 1 to 6 can be found in the Appendix.

Step 7. Estimate the effect of $X_1$ on Y when all component causes are identified and no noncausal factor is mistaken for a causal factor:

$$\operatorname{logit} P(Y_n = 1) = \beta_0 + \beta_1 X_{1n} + \sum_{i=2}^{100} C_i \beta_i X_{in},$$

where $C_i$ indicates whether $X_i$ is involved in at least one sufficient cause of Y, that is, $C_i = 1$ if true and $C_i = 0$ otherwise. Here, $\beta_1$ and $\beta_i$ are the estimated effects of $X_1$ and of each component cause on Y, respectively.

Step 8. Estimate the effect of $X_1$ on Y when only some component causes are known and some noncausal factors are mistaken for causal factors:

$$\operatorname{logit} P(Y_n = 1) = \beta_0' + \beta_1' X_{1n} + \sum_{i=2}^{100} K_i \beta_i' X_{in},$$

where $K_i$ indicates whether $X_i$ is "known" or suspected to be involved in at least one sufficient cause of Y, and $\beta_1'$ and $\beta_i'$ are the estimated effects of $X_1$ and of each of the "known" risk factors on Y, respectively.
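The data-generating and estimation steps above (Steps 1, 2, 5 and 7 of the procedure) can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' Stata code: the shared-factor construction of the correlated uniform pool, the correlation value `rho`, the 2 to 5 components per sufficient cause, the reduced sample size, and all function names are hypothetical. The competing events (Step 3), the error term (Step 4) and the selection of "known" factors (Step 6) are omitted because their details are given only in the Appendix. The logistic model is fitted by plain Newton-Raphson so that the sketch needs nothing beyond NumPy.

```python
import numpy as np


def correlated_uniform_pool(n_obs, n_vars, rho, rng):
    """Step 1 (sketch): an n_obs x n_vars matrix of correlated values in [0, 1).

    ASSUMPTION: correlation comes from one shared standard-normal factor
    (latent pairwise correlation rho); each column is then rank-transformed
    onto an evenly spaced [0, 1) grid, i.e. uniform up to discreteness.
    """
    shared = rng.standard_normal((n_obs, 1))
    noise = rng.standard_normal((n_obs, n_vars))
    latent = np.sqrt(rho) * shared + np.sqrt(1 - rho) * noise
    ranks = latent.argsort(axis=0).argsort(axis=0)  # ranks 0..n_obs-1 within each column
    return ranks / n_obs


def draw_sufficient_causes(n_vars, rng, max_components=5):
    """Step 2 (sketch): 1 to 9 types of sufficient cause, each with its own thresholds.

    Column 0 plays the role of the exposure T_1 and is never chosen as a component.
    ASSUMPTION: each type has 2..max_components components (not stated in the text).
    """
    causes = []
    for _ in range(rng.integers(1, 10)):  # number of types drawn from 1..9
        k = rng.integers(2, max_components + 1)
        components = rng.choice(np.arange(1, n_vars), size=k, replace=False)
        thresholds = rng.uniform(0.5, 0.9, size=k)  # uniform [0.5, 0.9)
        causes.append((components, thresholds))
    return causes


def sufficient_cause_outcome(T, causes):
    """Y = 1 when at least one sufficient cause has every component above its threshold.

    The competing events E_n and the error term Q_n (Steps 3-4) are omitted here.
    """
    y = np.zeros(T.shape[0], dtype=bool)
    for components, thresholds in causes:
        y |= np.all(T[:, components] > thresholds, axis=1)
    return y.astype(float)


def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson logistic regression with an intercept; returns (coef, se)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ beta)))
        hessian = A.T @ (A * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, A.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-(A @ beta)))
    hessian = A.T @ (A * (p * (1 - p))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(hessian)))


if __name__ == "__main__":
    rng = np.random.default_rng(2014)
    T = correlated_uniform_pool(20000, 100, rho=0.2, rng=rng)
    causes = draw_sufficient_causes(100, rng)
    y = sufficient_cause_outcome(T, causes)
    X = (T > 0.7).astype(float)  # dichotomize at 0.7, the mean of uniform [0.5, 0.9)
    components = sorted({int(c) for comps, _ in causes for c in comps})
    # Step 7: exposure (column 0) plus all component causes.
    beta, se = fit_logistic(X[:, [0] + components], y)
    print(f"incidence of Y: {y.mean():.4f}; exposure log-OR {beta[1]:.3f} (SE {se[1]:.3f})")
```

Because the exposure shares the latent factor with every component cause but is adjusted for only through the 0.7-dichotomized components, this setup can leave a residual association between the exposure and Y; whether it reaches significance in a given run depends on the random draws.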

3. Results

Data from replicate 1 are used as an example. Table 1 shows details of the sufficient causes and their components for replicate 1. Overall, the incidence rate (per 1000 observation units) of Y is 32.4; it is 20.2 among unexposed observations (X₁ = 0) and 89.0 among exposed observations (X₁ = 1). This yields a crude exposed-to-unexposed risk ratio of 4.4 (89.0/20.2), even though the exposure is not causal for Y. Moreover, as shown in Table 2, the correlations between the exposure and the confounders are low, and the level of misclassification of confounder status is low. Even when all confounding factors (component causes) are controlled for in the model, the effect of the exposure remains significant (P < 0.001). Table 3 shows that the effect of the exposure is biased further away from the null when only a subset of the real component causes and some noncausal factors are controlled for in the model.
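The crude risk ratio follows directly from the two group incidences. The sketch below also computes the crude odds ratio implied by the same rounded incidences; that value is not reported in the paper, it assumes the incidence per 1000 observation units can be read as a risk, and it is shown only to put the adjusted odds ratios in Table 3 in context.

```python
# Incidence per 1000 observation units reported for replicate 1.
incidence_exposed = 89.0    # X_1 = 1
incidence_unexposed = 20.2  # X_1 = 0

crude_risk_ratio = incidence_exposed / incidence_unexposed
print(f"crude risk ratio: {crude_risk_ratio:.1f}")  # 4.4

# ASSUMPTION: incidence per 1000 read as a risk, to derive the implied crude odds ratio.
p1, p0 = incidence_exposed / 1000, incidence_unexposed / 1000
crude_odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
print(f"crude odds ratio: {crude_odds_ratio:.1f}")  # 4.7
```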

Table 1

Sufficient causes and their components for replicate 1.

Type of sufficient cause	Components (cut-off points)	Observed frequency (of 50,000 observations)
A	X₁₇ (0.847), X₅₀ (0.850)	421
B	X₇ (0.521), X₂₉ (0.881), X₅₃ (0.619)	515
C	X₁₈ (0.754), X₂₀ (0.626), X₂₁ (0.504), X₃₈ (0.642), X₉₁ (0.617)	741

As described in the simulation design and the Appendix, the total number of sufficient causes and the components of each sufficient cause vary between replicates and are determined by independent random processes (e.g., in replicate 1, sufficient cause A has two components, X₁₇ and X₅₀, and sufficient cause B has three components, X₇, X₂₉, and X₅₃).

Table 2

Source and magnitude of bias in replicate 1.

Confounder/component	Correlation with exposure¹	Percentage of misclassification²
X₁₇	0.183	13.4%
X₅₀	0.160	14.4%
X₇	0.135	26.7%
X₂₉	0.150	15.5%
X₅₃	0.181	11.6%
X₁₈	0.227	5.89%
X₂₀	0.155	10.8%
X₂₁	0.292	31.2%
X₃₈	0.188	7.9%
X₉₁	0.282	11.4%

¹Measured as the correlation coefficient between the binary form of the component (occurred or not occurred) and the binary form of the exposure in the 50,000 observations of replicate 1.

²Measured as 1 minus the proportion of correct classification of confounder/component status (occurred or not occurred) in the 50,000 observations of replicate 1.
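The two quantities in Table 2 can be computed as in the following sketch, which follows the footnote definitions: the correlation coefficient between two binary variables (the phi coefficient) and 1 minus the proportion of correct classification. The function names are hypothetical, and the illustration uses an independent uniform variable rather than the study's correlated pool, so it is not expected to reproduce the values in Table 2.

```python
import numpy as np


def phi_coefficient(a, b):
    """Footnote 1: Pearson correlation between two 0/1 arrays (the phi coefficient)."""
    return float(np.corrcoef(np.asarray(a, float), np.asarray(b, float))[0, 1])


def misclassification(true_status, observed_status):
    """Footnote 2: 1 minus the proportion of correctly classified observations."""
    true_status = np.asarray(true_status, bool)
    observed_status = np.asarray(observed_status, bool)
    return float(1.0 - np.mean(true_status == observed_status))


# Hypothetical illustration for one component with threshold 0.847 (X_17 in sufficient
# cause A): true status uses the exact threshold, observed status uses the 0.7 cut-off.
rng = np.random.default_rng(0)
t_component = rng.uniform(size=50_000)
true_status = t_component > 0.847
observed_status = t_component > 0.7
print(f"misclassification: {misclassification(true_status, observed_status):.3f}")
```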

Table 3

Estimates from multivariate analysis in replicate 1.

	Model adjusted for all component causes			Model adjusted for randomly selected component causes and noncausal factors
Variable	Odds ratio	95% CI, lower	95% CI, upper	Odds ratio	95% CI, lower	95% CI, upper
X₁ (exposure)	1.31	1.17	1.48	1.71	1.52	1.92
X₅	—			1.48	1.32	1.65
X₇	1.61	1.44	1.81	—
X₁₁	—			1.95	1.73	2.20
X₁₄	—			1.69	1.50	1.89
X₁₇	2.45	2.18	2.75	—
X₁₈	4.55	4.03	5.13	—
X₂₀	2.67	2.38	2.99	—
X₂₁	1.49	1.32	1.67	1.93	1.71	2.17
X₂₃	—			1.47	1.31	1.65
X₂₉	2.68	2.38	3.01	—
X₃₂	—			1.40	1.26	1.57
X₃₇	—			1.41	1.26	1.57
X₃₈	3.10	2.76	3.48	—
X₅₀	2.41	2.16	2.70	—
X₅₃	1.90	1.69	2.13	2.12	1.90	2.37
X₅₇	—			2.06	1.83	2.32
X₆₉	—			1.39	1.25	1.56
X₉₀	—			1.17	1.04	1.31
X₉₁	2.28	2.03	2.56	—

—: variables not included in the model.
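The P value quoted for the exposure can be checked against Table 3. Assuming the 95% confidence intervals are Wald intervals computed on the log-odds scale (the usual form for logistic regression output), the coefficient is the log of the odds ratio and its standard error is the log-width of the interval divided by 2 × 1.96. The recovered values are approximate because the published figures are rounded; the function name is hypothetical.

```python
import math

Z_95 = 1.959964  # standard-normal quantile for a two-sided 95% interval


def wald_from_or_ci(odds_ratio, lower, upper, z_crit=Z_95):
    """Back-calculate (log-OR, SE, z, two-sided P) from an OR and its 95% CI.

    ASSUMPTION: the CI is a Wald interval on the log scale, so
    SE = (ln(upper) - ln(lower)) / (2 * z_crit).
    """
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P from the standard normal
    return beta, se, z, p


rows = {
    "all component causes": (1.31, 1.17, 1.48),
    "random subset + noncausal factors": (1.71, 1.52, 1.92),
}
for label, row in rows.items():
    beta, se, z, p = wald_from_or_ci(*row)
    print(f"{label}: log-OR {beta:.3f}, SE {se:.3f}, z {z:.1f}, P {p:.1e}")
```

For the exposure this gives z of roughly 4.5 in the fully adjusted model and roughly 9 in the model with a random subset of component causes and noncausal factors, consistent with P < 0.001 in both.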

Based on the estimates from all replicates, the association between the exposure and the outcome Y is significant in 386 (77%) of the 500 replicates when all confounders (component causes) are controlled for in the model. However, when a subset (rather than all) of the real component causes and some noncausal factors are controlled for in the model, the association between the exposure and Y becomes significant in 487 (97%) of the 500 replicates. In addition, Figure 1 indicates that, when all real component causes are adjusted for, the significant estimates of the exposure effect are on average substantially smaller than the effects of the real component causes. The mean (standard deviation) and the 25th, 50th, and 75th percentiles of the significant coefficients (natural logarithm of the odds ratio) are 0.22 (0.17), 0.14, 0.18, and 0.25, respectively, for the noncausal exposure, and 0.73 (0.79), 0.23, 0.42, and 0.927, respectively, for the real component causes.
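Summary statistics of this kind can be computed from per-replicate estimates with a helper like the one below. It assumes that "significant" means a two-sided Wald test at the 5% level and uses the sample standard deviation; the paper does not state these choices, and NumPy's default percentile interpolation may differ slightly from Stata's. The function name and the small input arrays are hypothetical.

```python
import numpy as np

Z_95 = 1.959964  # two-sided 5% critical value of the standard normal


def summarize_significant(coefs, ses, z_crit=Z_95):
    """Summarize the coefficients (log-odds ratios) whose Wald test is significant.

    Expects at least two significant coefficients (needed for the sample SD).
    """
    coefs = np.asarray(coefs, dtype=float)
    ses = np.asarray(ses, dtype=float)
    significant = np.abs(coefs / ses) > z_crit
    kept = coefs[significant]
    q25, q50, q75 = np.percentile(kept, [25, 50, 75])
    return {
        "n_significant": int(significant.sum()),
        "share_significant": float(significant.mean()),
        "mean": float(kept.mean()),
        "sd": float(kept.std(ddof=1)),
        "percentiles_25_50_75": (float(q25), float(q50), float(q75)),
    }


# Shares of significant replicates reported in the text.
print(f"{386 / 500:.0%} and {487 / 500:.0%}")  # 77% and 97%

# Hypothetical per-replicate estimates, only to show the call.
print(summarize_significant([0.5, 0.1, -0.6, 0.3], [0.1, 0.1, 0.1, 0.5]))
```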

Figure 1

Difference in estimate distributions between the exposure and the real component causes. The upper and lower adjacent lines indicate the upper and lower adjacent values, respectively; the upper and lower edges of the boxes indicate 75th percentiles and 25th percentiles, respectively; and the white lines in the boxes indicate the medians. (The upper limit of the graph is set to 2.409, the 95th percentile for coefficients of the real component causes).
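The adjacent values in Figure 1 follow the usual box-plot convention: the most extreme observations lying within 1.5 times the interquartile range of the nearer quartile. A minimal sketch of that definition, assuming the Tukey convention (which, as we understand it, is also the one used by Stata's `graph box`) and NumPy's default percentile method; the function name is hypothetical.

```python
import numpy as np


def adjacent_values(x):
    """Lower and upper adjacent values of a box plot (Tukey's 1.5 x IQR rule)."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower = x[x >= q1 - 1.5 * iqr].min()  # smallest value inside the lower fence
    upper = x[x <= q3 + 1.5 * iqr].max()  # largest value inside the upper fence
    return float(lower), float(upper)


print(adjacent_values([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))  # (1.0, 9.0)
```

Values beyond the adjacent values (such as 100 in this toy input) are plotted as outside points rather than extending the whisker.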

4. Discussion

In observational studies, when a statistically significant association between an exposure and the outcome arises in multivariate analysis, it is usually considered supportive evidence for a causal relationship [8]. We adopted the sufficient cause model in the simulation process to investigate how likely it is that a significant association between the exposure and the outcome is observed when there is no causal association between the two in an observational study setting. The results indicate that significant associations between the exposure and a noncausally related outcome occur in more than 70% of replicates, even when all confounders (causal factors) are assumed to be known to researchers and controlled for in the multivariate analysis. In reality, many component causes of a disease are unknown [8, 20–23]. Moreover, the simulation results suggest that, under the conventional multivariate analysis approach, residual confounding effects remain strong enough to influence the observed associations, so an observed significant association provides only limited evidence for a causal relationship. Therefore, new methods are required to handle residual confounding effects. The simulation design adopted in this study can also serve as a platform for evaluating the performance of such methods.

Our simulation design has several advantages. First, although all component causes and sufficient causes are determined through random processes, they are all tracked and measured, unlike collected data, in which much of the information on component causes and sufficient causes is unknown and unmeasurable. Second, for specific exposures and outcomes, information from the existing literature can easily be incorporated into the simulation design. Third, the simulation design can be adjusted to fit specific prior assumptions about the distributions of and correlations among the component causes and the exposure, as well as the compositions of sufficient causes. Hence, it is possible to obtain estimates of the effect of the exposure under different prior assumptions.

5. Conclusion

This study demonstrates that even when all confounding factors are known and controlled for using conventional multivariate analysis, the observed association between exposure and outcome can still be dominated by residual confounding effects. An observed significant association apparently provides limited evidence for a causal relationship.

References

1. Confounding in health research.

Authors: S Greenland; H Morgenstern
Journal: Annu Rev Public Health Date: 2001

2. Confounding and confounders.

Authors: R McNamee
Journal: Occup Environ Med Date: 2003-03

3. What is a cause and how do we know one? A grammar for pragmatic epidemiology.

Authors: M Susser
Journal: Am J Epidemiol Date: 1991-04-01

4. Confounding in epidemiological studies: why "independent" effects may not be all they seem.

Authors: G D Smith; A N Phillips
Journal: BMJ Date: 1992-09-26

5. The impact of confounder selection criteria on effect estimation.

Authors: R M Mickey; S Greenland
Journal: Am J Epidemiol Date: 1989-01

6. Toward a clearer definition of confounding.

Authors: C R Weinberg
Journal: Am J Epidemiol Date: 1993-01-01

7. Postmenopausal estrogen therapy and cardiovascular disease. Ten-year follow-up from the nurses' health study.

Authors: M J Stampfer; G A Colditz; W C Willett; J E Manson; B Rosner; F E Speizer; C H Hennekens
Journal: N Engl J Med Date: 1991-09-12

8. Effect of potentially modifiable risk factors associated with myocardial infarction in 52 countries (the INTERHEART study): case-control study.

Authors: Salim Yusuf; Steven Hawken; Stephanie Ounpuu; Tony Dans; Alvaro Avezum; Fernando Lanas; Matthew McQueen; Andrzej Budaj; Prem Pais; John Varigos; Liu Lisheng
Journal: Lancet Date: 2004 Sep 11-17

9. Estrogen plus progestin and the risk of coronary heart disease.

Authors: JoAnn E Manson; Judith Hsia; Karen C Johnson; Jacques E Rossouw; Annlouise R Assaf; Norman L Lasser; Maurizio Trevisan; Henry R Black; Susan R Heckbert; Robert Detrano; Ora L Strickland; Nathan D Wong; John R Crouse; Evan Stein; Mary Cushman
Journal: N Engl J Med Date: 2003-08-07

10. Evaluating epidemiological evidence: a simple test.

Authors: Wenbin Liang
Journal: Int J Med Sci Date: 2013-08-28
