Debbie A Lawlor, Kate Tilling, George Davey Smith.
Abstract
Triangulation is the practice of obtaining more reliable answers to research questions through integrating results from several different approaches, where each approach has different key sources of potential bias that are unrelated to each other. With respect to causal questions in aetiological epidemiology, if the results of different approaches all point to the same conclusion, this strengthens confidence in the finding. This is particularly the case when the key sources of bias of some of the approaches would predict that findings would point in opposite directions if they were due to such biases. Where there are inconsistencies, understanding the key sources of bias of each approach can help to identify what further research is required to address the causal question. The aim of this paper is to illustrate how triangulation might be used to improve causal inference in aetiological epidemiology. We propose a minimum set of criteria for use in triangulation in aetiological epidemiology, summarize the key sources of bias of several approaches and describe how these might be integrated within a triangulation framework. We emphasize the importance of being explicit about the expected direction of bias within each approach, whenever this is possible, and seeking to identify approaches that would be expected to bias the true causal effect in different directions. We also note the importance, when comparing results, of taking account of differences in the duration and timing of exposures. We provide three examples to illustrate these points.
Keywords: Aetiological epidemiology; Mendelian randomization; RCTs; causality; instrumental variables; natural experiments; negative control studies; triangulation; within-sibships studies
Year: 2016 PMID: 28108528 PMCID: PMC5841843 DOI: 10.1093/ije/dyw314
Source DB: PubMed Journal: Int J Epidemiol ISSN: 0300-5771 Impact factor: 7.196
Key sources of bias in different aetiological epidemiology approaches
| Approach | Description | Key assumptions | Key sources of bias |
| --- | --- | --- | --- |
| Randomized controlled trials | Prospective intervention study in which people are randomly allocated to comparison groups that are given different interventions | Intervention groups are similar with the exception of the intervention | Lack of concealment of the random allocation. Failure to maintain the original randomized status of participants when comparing outcomes and lack of blinding to which group participants have been randomised. Differential loss to follow-up, for example due to adverse effects of the intervention or a perception that there is no benefit |
| Multivariable regression in observational data | The application of multivariable regression to observational data | No residual confounding (all confounders are accurately measured and controlled for). Participants are not selected to participate or to be included in analyses in a way that produces spurious associations. Any misclassification of exposure is not related to the outcome, and vice versa, and misclassification of covariables is not systematically related to outcome or exposure | Unmeasured or poorly measured confounders (residual confounding). Reverse causality. Misclassification of exposure is related to the outcome (or vice versa). Differential missing data between exposure levels, for example due to loss to follow-up in prospective cohort studies or reporting bias in case-control studies |
| Cross-context comparisons | Compares results between two or more populations in different contexts that result in confounding structures being different | Different results between populations are due to different confounding structures and not due to true differences in causal effects between populations. Similar results between populations cannot be explained by confounding, given the differences between the populations/contexts in their confounding structures. There are no other (than confounding) sources of bias that could explain similar or different results between the two populations | Confounders are, in fact, the same in the populations being compared. For observed confounders, differences between the two populations should be established. There are different sources of bias (over and above different confounding structures), for example differential misclassification of exposure or outcome that investigators are unaware of. Measurement of the exposure and outcome and the quality of these measurements should be the same or very similar in the populations being compared |
| Different control groups | Use of two or more different control groups in a case-control study, where the bias for the different control groups is expected to be in different directions | The different sources of bias for the different control groups are different and would produce different results | If biases are in fact the same in the different control groups being compared, the inference made when comparing them will be misleading. If there are different sources of bias between the different control groups, but these nonetheless distort the finding in the same direction, this will also be misleading. Inference may be incorrect, if there are a priori incorrect assumptions about one of the control groups being least biased for the specific research question. This may be less statistically efficient than having just one control group, as with fixed resources it would imply using a smaller number of controls for each of the two groups (and possibly a smaller number of cases, as resources would be required to recruit two different sources of controls) |
| Natural experiments | Populations are compared in the belief that biases, such as confounding structures, are similar between them. One (or more) of the populations has had a ‘natural’ exposure or has been ‘quasi-randomly’ exposed. Natural exposures include, e.g., flood or famine; quasi-randomization includes, e.g., differences in the timing of introduction of policies, such as smoking bans in public places | The populations being compared are similar with the exception of the naturally or quasi-randomized exposure | Populations differ on characteristics that confound the association. Misclassification of the outcome is related to the naturally occurring exposure. Ideally, identical methods for measuring the outcome should be used in each population. If associations are measured at the aggregated population level but interpreted as if they apply to individuals within the population, there may be bias due to the ecological fallacy |
| Within sibling comparisons | Assesses associations within sibships: comparing outcomes between sibs who are discordant for the exposure. Controls for observed and unobserved shared (familial) confounding | There is little or no individual-level confounding. Any misclassification of exposure or outcome is similar in the siblings | Individual level confounding could occur when siblings are raised in different environments. This approach works best when there is strong family-level confounding, with modest main effects, or where correlation within sibships is much stronger for the confounders than it is for the exposure of interest. This may be the case when examining the effect of intrauterine or early infancy exposures on outcomes assessed several years later. |
| Instrumental variable (IV) analyses | IVs are variables that are robustly associated with an exposure but not with confounders of the exposure and outcome ( | IV is associated with exposure. IV is not associated with confounders of exposure-outcome association. IV is not related to the outcome other than via its association with the exposure (the exclusion restriction criteria) | IV is not truly associated with exposure in the population being studied. There should be robust evidence (e.g. replicated in several different studies) that the IV is related to the exposure, and ideally its association in the study population should be established. If the statistical magnitude of association of the IV with exposure in a study is small, there may be weak instrument bias which would bias towards the results of the confounded exposure-outcome association in one-sample IV analyses and towards the null with two-sample IVs |
| IVs to test intermediates in RCTs | IV is randomization to an intervention that affects an intermediate of the randomized intervention; this intermediate is the exposure of interest (e.g. shown in | As above | Violation of the exclusion restriction criteria is likely to be the main source of bias. Comparing results from multiple IVs that work in different ways to affect the intermediate (e.g. comparing results from RCTs to different antihypertensives to determine the causal effect of BP on CHD |
| Genetic IVs in observational data (MR) | IV is one or more genetic variant(s) that have been shown to robustly relate to exposure | As above | Violation of the exclusion restriction criteria, as a result of genuine (also known as horizontal) pleiotropy ( |
| Non-genetic IVs in observational data | IV is non-genetic; examples include use of exposures in other family members as IVs for the index participants’ exposure, or a ‘naturally’ occurring phenomenon (such as famine or flood); this approach is commonly used in natural experiments | As above | Association of the IV with confounders of the exposure-outcome association is more likely with this approach than with IVs for intermediates in an RCT or with MR. Violation of the exclusion restriction criteria is possible; given the wide range of non-genetic IVs that could potentially be used, the extent to which this may be a major source of bias is hard to state in a general way. Weak instrument bias is possible. |
| Exposure negative control studies | Aims to reproduce the same conditions as the ‘real’ study, but uses a different (negative control) exposure that is not plausibly causally related to outcome | The key sources of bias, including specific confounders, misclassification bias and other biases, are the same for the real and negative control exposures. The negative control exposure does not have a causal effect on the real outcome. To sensibly compare the real and negative control exposure, they should ideally be similarly scaled. This should be possible when negative control exposures are used to test critical or sensitive periods (see section on duration and timing of exposure being assessed with different exposures) | There are differences in the sources of bias between the real and negative control exposure. Attempts to explore this (e.g. exploring the association of observed confounders with the negative control exposure) should be made. There is a real (but unknown) causal effect of the negative control exposure on the outcome |
| Outcome negative control studies | As above, except here a different outcome is selected for the negative control study | As above, except here a different outcome is selected for the negative control study | As above, if either assumption is violated there could be biased inference from the comparison of the real with the negative control study |
We have tried to list most of the key sources of bias for different approaches, but the extent to which these are a key bias in any given triangulation example will depend upon the question being asked and the approaches and data being used to answer this. For example, in general, violation of the exclusion restriction criteria will be a key source of bias in MR studies and use of IVs to test intermediates in RCT; but as we discuss in the section on ‘What we mean by key sources of bias’, sometimes the source will be the same for these two approaches and sometimes it will not. Furthermore, in the second illustrative example, whereas we recognize that violation of the exclusion restriction criteria might bias the IV testing of glucose effects in the RCT, the assumptions we had to make about change in glucose in the control arm are (in that specific example) likely to be a bigger source of bias. The direction of any bias will depend on the question being asked and the approaches and data being used.
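The within-sibling row of the table above can be made concrete with a small simulation. The following is a minimal sketch, not an analysis from any of the studies discussed: the data are simulated, and the effect size, number of sibling pairs, and the names `TRUE_EFFECT`, `N_PAIRS` and `ols_slope` are all hypothetical. It assumes a confounder that is fully shared within a sibship and no individual-level confounding, which is the situation in which the approach is expected to work best.

```python
import random

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (model with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(2016)   # fixed seed so the sketch is reproducible
TRUE_EFFECT = 0.5           # hypothetical causal effect of exposure on outcome
N_PAIRS = 5000              # hypothetical number of sibling pairs

x_all, y_all = [], []       # pooled data, ignoring family structure
dx, dy = [], []             # within-pair differences

for _ in range(N_PAIRS):
    u = rng.gauss(0, 1)     # family-level confounder shared by both siblings
    pair = []
    for _sib in range(2):
        x = u + rng.gauss(0, 1)                          # exposure depends on u
        y = TRUE_EFFECT * x + 1.0 * u + rng.gauss(0, 1)  # outcome depends on x and u
        pair.append((x, y))
        x_all.append(x)
        y_all.append(y)
    # Differencing within the pair removes the shared confounder u.
    dx.append(pair[0][0] - pair[1][0])
    dy.append(pair[0][1] - pair[1][1])

naive_slope = ols_slope(x_all, y_all)   # confounded by u
within_slope = ols_slope(dx, dy)        # sibling-difference estimate

print(f"naive slope:  {naive_slope:.2f}")
print(f"within slope: {within_slope:.2f} (true effect {TRUE_EFFECT})")
```

Under these assumptions the pooled estimate is pulled away from the true effect by the shared confounder, while the within-pair estimate is not. If the confounder differed between siblings, the within-pair estimate would no longer be protected.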
Figure 1. Illustrative example of instrumental variable analyses in RCTs and Mendelian randomization studies to answer aetiological questions of the effect of a risk factor (LDLc) on an outcome (CHD). The figure shows directed acyclic graphs (DAGs) of instrumental variable (IV) analyses to test the causal effect of low-density lipoprotein cholesterol (LDLc) on coronary heart disease (CHD). In (a) and (b), the IV is randomization to receiving a statin or not (i.e. this is an example of IV analyses to test an intermediate in an RCT); statins are 3-hydroxy-3-methylglutaryl-coenzyme A (HMG-CoA) reductase inhibitors. In (c) and (d), the IV is genetic variants in the HMGCR gene (i.e. this is an MR study); these variants mimic HMG-CoA reductase inhibition. In (e) and (f), the IV is genetic variants (MR) that are independent of those in the HMGCR gene. The three key assumptions of IV analyses are illustrated in (a), (c) and (e): (i) the IV ‘Z’ [randomization to statins in (a) and genetic variants related to LDLc in (c) and (e)] is (or is plausibly) robustly related to the risk factor (LDLc in all figures); (ii) the IV is not related to confounders (shown by letter C in all figures) for the risk factor-outcome association (shown by the lack of an arrow from C to Z in all figures); (iii) the IV only affects the outcome ‘Y’ (CHD) through its effect on the risk factor ‘X’ (LDLc). This last assumption is known as the exclusion restriction criteria. In the RCT of statins example, we know that assumption (i) is true, and if the RCT is well conducted then assumption (ii) will be true. If, however, statins are directly (independently of LDLc) related to other factors which then affect CHD, assumption (iii) will be violated and the estimated causal effect will be a biased estimate of the true effect of LDLc.
There is some evidence that statins do relate to a wide range of other lipids and fatty acids in addition to LDLc, though whether these are caused by the statins independently of LDLc and affect CHD is currently unknown. If they do [as shown as an illustrative example in (b)], then the estimate of the LDLc effect on CHD is likely to be biased (what is assumed to be the effect of LDLc on CHD will be the combined effect of LDLc and other lipids/fatty acids on CHD). In the MR example of variants in the HMGCR gene, we know that assumption (i) is correct and there is evidence that assumption (ii) is also likely to be true. As with the RCT example, in MR we are often most worried about violation of assumption (iii) due to genuine (horizontal) pleiotropy, i.e. that variants in HMGCR influence other factors independently of LDLc which in turn (independently of LDLc) affect CHD (d). As these variants are mimicking the action of statins, any pleiotropy is likely to be similar to that seen for statins (d). By contrast, (e) and (f) show the use of genetic variants that are unrelated to HMGCR. Although there may still be violation of the exclusion restriction criteria (due to genuine pleiotropy) with these variants, it is unlikely to be related to violation of the exclusion restriction criteria in an RCT of statins because the variants have been selected on the basis that their actions are on a different path from those of statins.
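The effect of violating assumption (iii) on an IV ratio estimate can be illustrated with simulated data. The sketch below is an assumption-laden toy, not a reanalysis of any statin trial or MR study: the outcome is treated as continuous for simplicity (CHD is a binary outcome in practice), and `TRUE_EFFECT`, `DIRECT_EFFECT`, the sample size and all coefficients are hypothetical values chosen only to make the bias visible.

```python
import random

def mean(values):
    return sum(values) / len(values)

def wald_ratio(z, x, y):
    """IV ratio estimate: (difference in mean outcome by IV group)
    divided by (difference in mean risk factor by IV group)."""
    x1 = [xi for zi, xi in zip(z, x) if zi == 1]
    x0 = [xi for zi, xi in zip(z, x) if zi == 0]
    y1 = [yi for zi, yi in zip(z, y) if zi == 1]
    y0 = [yi for zi, yi in zip(z, y) if zi == 0]
    return (mean(y1) - mean(y0)) / (mean(x1) - mean(x0))

rng = random.Random(314)   # fixed seed so the sketch is reproducible
N = 40000                  # hypothetical sample size
TRUE_EFFECT = 0.3          # hypothetical effect of risk factor X on outcome Y
DIRECT_EFFECT = -0.2       # hypothetical Z -> Y path that bypasses X

z, x, y_valid, y_violated = [], [], [], []
for _ in range(N):
    zi = rng.randint(0, 1)                  # IV Z: random allocation
    c = rng.gauss(0, 1)                     # confounder C of X and Y, unrelated to Z
    xi = -1.0 * zi + c + rng.gauss(0, 1)    # Z lowers the risk factor X
    yi = TRUE_EFFECT * xi + c + rng.gauss(0, 1)
    z.append(zi)
    x.append(xi)
    y_valid.append(yi)                          # Z affects Y only through X
    y_violated.append(yi + DIRECT_EFFECT * zi)  # exclusion restriction violated

estimate_valid = wald_ratio(z, x, y_valid)
estimate_violated = wald_ratio(z, x, y_violated)

print(f"exclusion restriction holds:    {estimate_valid:.2f}")
print(f"exclusion restriction violated: {estimate_violated:.2f}")
```

With these hypothetical coefficients the first estimate should sit close to the true effect, whereas the second absorbs the direct path and overstates it. The direction of the bias follows from the sign chosen for `DIRECT_EFFECT` and would differ in other settings.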
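Different approaches used in triangulation to determine the effect of systolic blood pressure on CHD
| Approach | Study description | Key sources of bias | Duration and timing of exposure |
| --- | --- | --- | --- |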
| Multivariable regression in prospective cohort studies | Prospective Cohorts Collaboration: an individual participant meta-analysis of 958 074 adults (61 studies) aged 40–69 with no previous history of CVD. Exposure = SBP (in both Ference paper and original paper); outcome = fatal CHD | Residual confounding by adiposity and height, both of which | From baseline SBP assessment to death or end of follow-up (mean 13.2 years) |
| IV of intermediate in RCTs | Systematic review and meta-analyses of 25 RCTs including 109 797 participants with no clinical evidence of cardiovascular disease before randomization. Authors of the original paper calculated ratios of difference in log odds CHD ÷ difference in SBP/DBP by randomized group for each antihypertensive, and meta-analysed these. Exposure = SBP (in Ference (triangulation) paper but is actually the combined SBP and DBP effect in the original paper); outcome = fatal or non-fatal CHD | Ference | From randomization to end of follow-up (mean 4.6 years) |
| MR | Two-sample MR. Genetic variants from the International Consortium for Blood Pressure genome-wide association study (ICBP) that had reached genome-wide levels of statistical significance were used. | We undertook a number of sensitivity analyses to explore the possibility of bias due to: (i) the fact that the ICBP results were for SBP adjusted for BMI; and (ii) violation of the exclusion restriction criteria ( | Whole of the participant’s life, and so mean age at end of follow-up or becoming a CHD case (54.9 years) |
In Supplementary text (available at IJE online) we provide full details of how we assessed a range of potential key sources of bias and their likely direction; here we describe the ones that we concluded were the key sources.
For this example, the three approaches were compared within one paper by Ference and colleagues, who undertook the MR study themselves but used results from published meta-analyses of prospective cohort and RCT studies. In preparing this table and the detailed Supplementary text, we reviewed both the paper by Ference et al. and the two papers from which they took results.
Figure 2. Triangulation of effect of systolic blood pressure on CHD risk from three approaches (RCT, multivariable regression and MR). Both graphs show the effect of exposure to 10 mmHg lower systolic blood pressure (SBP) on risk of coronary heart disease (CHD). In (a), squares represent the effect estimate for the association between 10 mmHg lower SBP and the risk of CHD; horizontal lines represent 95% confidence intervals (CI). The relative risk ratios (RRR) and their 95% CI are given for each approach on the right-hand side of the graph. The P-values to the right of the RRR values (P diff) test the null hypothesis that results from the different approaches are consistent with results from the first MR study (reference study). In (b), squares represent the proportional risk reduction (1−risk ratio) of CHD per 10 mmHg lower SBP plotted against the estimated mean length of exposure to 10 mmHg lower SBP; vertical lines represent 1 standard error (SE) above and below the point estimate of proportional risk reduction. Results are plotted against the estimated duration of exposure to lower SBP for each approach. Reproduced from reference 53 with permission.
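The two quantities named in this caption, the proportional risk reduction (1−risk ratio) and a P-value for the difference between two approaches, can be sketched as follows. This is one common way to compare two ratio estimates and is not necessarily the method used for the figure: it assumes the two estimates are independent, that the confidence intervals are symmetric on the log scale, and that a normal approximation holds. The function names and every number below are hypothetical.

```python
import math

def se_log_from_ci(lower, upper, z_crit=1.96):
    """Standard error of a log risk ratio recovered from its 95% CI,
    assuming the CI is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)

def proportional_risk_reduction(risk_ratio):
    """Proportional risk reduction, defined as 1 - risk ratio."""
    return 1.0 - risk_ratio

def p_difference(rr_a, ci_a, rr_b, ci_b):
    """Two-sided P-value for the null hypothesis that two independent
    risk ratios are equal, using a z-test on the log scale."""
    se_a = se_log_from_ci(*ci_a)
    se_b = se_log_from_ci(*ci_b)
    z = (math.log(rr_a) - math.log(rr_b)) / math.sqrt(se_a ** 2 + se_b ** 2)
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical estimates per 10 mmHg lower SBP (not taken from the figure).
reference = (0.60, (0.50, 0.72))    # e.g. a reference approach
comparison = (0.80, (0.75, 0.85))   # e.g. a second approach

p_diff = p_difference(comparison[0], comparison[1], reference[0], reference[1])
print(f"PRR reference:  {proportional_risk_reduction(reference[0]):.2f}")
print(f"PRR comparison: {proportional_risk_reduction(comparison[0]):.2f}")
print(f"P diff: {p_diff:.4f}")
```

A small P diff indicates that the two estimates are statistically inconsistent. It does not by itself say whether the difference reflects bias or differences in the duration of exposure captured by each approach.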
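Different approaches used in triangulation to determine the effect of maternal circulating pregnancy glucose on birthweight
| Approach | Study description | Key sources of bias | Duration and timing of exposure |
| --- | --- | --- | --- |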
| Multivariable regression (European) | Multivariable regression in 6008 European mother-offspring pairs. Adjusted for offspring sex and gestational age | Residual confounding by maternal socioeconomic position, age, parity and adiposity, which would result in | Fasting glucose assessed at single time point (24–28 weeks of gestation). For cumulative effect we assumed this is from then until birth (i.e. the last 12–16 weeks) |
| Cross-cohort comparison | Comparing multivariable regression between 750 Pakistani origin and 607 White British origin mother-offspring pairs | Because of differences in the associations of SEP with fasting glucose between the two populations [mean difference in glucose by maternal education in Pakistani 0.00 mmol/l (-0.03, 0.04) and in White British 0.04 (0.02, 0.06); by receipt of income support in Pakistani -0.25 (-0.44, -0.07) and in White British 0.10 (0.01, 0.21)], if an association in White British were due to SEP confounding, we would expect a | Fasting glucose assessed at single time point (26–28 weeks of gestation). For cumulative effect we assumed this is from then until birth (i.e. last 12–14 weeks) |
| MR | Use of a weighted allele score of genetic variants known to be robustly associated with fasting glucose as an IV in 11 493 European mother-offspring pairs | Methods, including sensitivity analyses, were undertaken in the paper to explore the possibility of bias due to: (i) weak instruments; and (ii) violation of the exclusion restriction criteria. On the basis of these we concluded that these results may be somewhat biased towards the null as a result of adjusting for offspring genetic variants (see | Assumed this approach tests fasting glucose across the whole of pregnancy |
| IV of intermediate in RCT | 958 women with mild gestational diabetes mellitus randomized to dietary advice, glucose monitoring and insulin treatment if necessary or usual care. We calculated an IV ratio estimate (difference in birthweight by randomised group ÷ difference in glucose by randomised group) | Glucose was not monitored in the control arm and we brought the baseline value forward. Maternal fasting glucose levels increase in the second and third trimesters of pregnancy. As a result the denominator of the IV ratio estimate (i.e. difference in fasting glucose by randomized group) is likely to have been an underestimate of the true difference and the IV estimate of the effect of fasting glucose on birthweight an | Randomization and fasting glucose assessed at ∼ 29 weeks of gestation. For a cumulative effect we assume differences were present for the last 11 weeks of pregnancy |
In Supplementary text (available at IJE online) we provide full details of how we assessed a range of potential key sources of bias and their likely direction; here we describe the ones that we concluded were the key sources.
SEP, socioeconomic position.
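The IV ratio estimate described for the RCT row (difference in birthweight by randomized group ÷ difference in glucose by randomized group) can be written out from summary statistics. The sketch below uses hypothetical numbers, not the trial's results, and approximates the standard error with a first-order delta method that assumes the numerator and denominator are uncorrelated. It also shows why an underestimated denominator inflates the ratio.

```python
import math

def iv_ratio(diff_outcome, diff_exposure):
    """IV ratio estimate: difference in outcome by randomized group
    divided by difference in exposure by randomized group."""
    return diff_outcome / diff_exposure

def iv_ratio_se(diff_outcome, se_outcome, diff_exposure, se_exposure):
    """First-order delta-method SE of the ratio, assuming zero covariance
    between the numerator and denominator (an approximation)."""
    term_outcome = (se_outcome / diff_exposure) ** 2
    term_exposure = (diff_outcome * se_exposure / diff_exposure ** 2) ** 2
    return math.sqrt(term_outcome + term_exposure)

# Hypothetical summary statistics (intervention minus control).
diff_birthweight_g = -100.0     # difference in mean birthweight, grams
se_birthweight_g = 20.0
diff_glucose_observed = -0.5    # difference in fasting glucose, mmol/l
se_glucose = 0.05

estimate = iv_ratio(diff_birthweight_g, diff_glucose_observed)
estimate_se = iv_ratio_se(diff_birthweight_g, se_birthweight_g,
                          diff_glucose_observed, se_glucose)

# If the true between-group glucose difference were larger than observed
# (hypothetically -0.8 mmol/l), the per-mmol/l effect would be smaller.
estimate_if_larger_gap = iv_ratio(diff_birthweight_g, -0.8)

print(f"IV ratio: {estimate:.0f} g per mmol/l (SE {estimate_se:.1f})")
print(f"IV ratio with larger glucose difference: {estimate_if_larger_gap:.0f} g per mmol/l")
```

Because the glucose difference sits in the denominator, understating it pushes the estimated effect of glucose on birthweight away from the null, which is the direction of bias discussed for the RCT row.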
Figure 3. Results for triangulation across different approaches to determine the effect of maternal circulating pregnancy glucose on birthweight. a: Difference in mean birthweight (g) per 1 mmol/l greater fasting glucose. b: Difference in mean birthweight (g) per 1 mmol/l greater fasting glucose against the cumulative number of weeks of exposure (in completed gestational weeks) to 1 mmol/l greater fasting glucose. In (a), the effects are shown of 1 mmol/l maternal gestational fasting glucose on difference in mean birthweight in grams (g) from different approaches. MV, multivariable regression in prospective pregnancy cohorts; Euro, European-origin mother-offspring pairs; W Brit, White British mother-offspring pairs; minimal adjust, adjusted for infant sex and gestational age only; full adjust, fuller adjustment with additional adjustment for maternal age, BMI, parity, education and receipt of income support. In (b), the estimates are shown of the fuller adjusted MV analyses in White British (WB) and Pakistani (P) mother-offspring pairs, together with the IV analyses in the RCT and MR approaches, plotted against estimated length of cumulative exposure to fasting glucose for each approach in completed gestational weeks. The mean length of exposure in the MV of White British and Pakistani pairs is the same (13 weeks), but in order to visualize both they have been separated to 12.5 and 13.5 weeks. The regression line is forced through zero and shows that the RCT result appears to be an outlier (exaggerating the effect of glucose on birthweight).
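A regression line forced through zero, as described for panel (b), has a closed-form slope: the sum of x·y divided by the sum of x². The sketch below fits such a line to effect estimates plotted against weeks of exposure and reports the point furthest from the line. It is an unweighted fit chosen for simplicity; whether the figure's line was weighted is not stated here. The labels, weeks and effect sizes are hypothetical placeholders, not the values plotted in the figure.

```python
def slope_through_origin(x, y):
    """Least-squares slope for the model y = slope * x (no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Hypothetical points: (label, weeks of exposure, birthweight difference in g).
points = [
    ("MV-WB", 13, 60.0),
    ("MV-P", 13, 70.0),
    ("MR", 40, 200.0),
    ("RCT", 11, 150.0),
]

weeks = [p[1] for p in points]
effects = [p[2] for p in points]

slope = slope_through_origin(weeks, effects)

# Residual = observed effect minus the effect predicted by the fitted line.
residuals = {label: eff - slope * wk for label, wk, eff in points}
furthest = max(residuals, key=lambda label: abs(residuals[label]))

print(f"slope: {slope:.2f} g per week of exposure")
for label, res in residuals.items():
    print(f"{label}: residual {res:+.1f} g")
print(f"furthest from line: {furthest}")
```

Forcing the line through zero encodes the assumption that zero weeks of exposure produces zero effect. Note that an outlying point also pulls the fitted slope towards itself, so residuals from such a fit are only a rough screen.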
Different approaches used in triangulation to determine the effect of having been breastfed on later body mass index
| Approach | Study description | Key sources of bias |
| --- | --- | --- |
| Multivariable regression | Systematic review and meta-analysis of prospective cohort studies in participants largely of European origin, including up to 355 301 participants in different analyses. BMI assessed across ages from 1 to 70 years | Residual confounding by maternal SEP and BMI (the majority of studies did not adjust for these) which would produce an effect estimate that is an |
| Cross-context comparison | Multivariable regression in a UK cohort (BMI assessed at mean age 9; | It was demonstrated that SEP did not relate to breastfeeding in the LMICs or was in the opposite direction (more affluent and educated women being less likely to breastfeed) to that seen in the UK (breastfeeding more common in the more affluent and educated women). If an association in UK participants was due to SEP confounding, we would expect a |
| Within-sibship comparisons | Three studies were identified which examined associations of being breastfed with BMI. Two of these used the same data from the US National Longitudinal Study of Adolescent Health (Add Health) and had similar results; | In both studies breastfeeding was retrospectively reported by mothers when the children were adolescents, and misclassification is potentially the key source of bias here, which would be likely to result in an |
| RCT | 17 046 Belarusian women with healthy singleton births were randomized to a breastfeeding promotion intervention or usual care. The intervention resulted in marked differences in ever breastfeeding, duration of breastfeeding and whether breastfeeding was exclusive. An intention-to-treat analysis was used to assess the impact of these differences on BMI at age 6.5 ( | Because the intervention affected multiple aspects of breastfeeding, it was not possible to do a formal IV analysis, but the intention-to-treat analyses would have similar assumptions to an IV analysis to test an intermediate. There would be violation of the exclusion restriction criteria if the intervention, in addition to influencing breastfeeding, also affected the mothers, such that they had a tendency to a healthier lifestyle more generally, including encouraging their child to have a healthier diet and be more active postnatally. That violation would result in an effect estimate that was an |
| Negative control study | We tried to identify outcomes in the ALSPAC cohort | If the negative control outcomes are biologically influenced by the exposure, the interpretation of this approach would be biased. However, we consider it unlikely that house infestation by pigeons or mice would be affected by being breastfed, other than through confounding or other sources of bias. We explored associations of observed confounders with these outcomes, and these verified that they were appropriate negative controls ( |
In Supplementary text (available at IJE online) we provide full details of how we assessed a range of potential key sources of bias; here we describe the ones that we concluded were the key sources. As the length of exposure (to breastfeeding) varied across the studies or was not measured in some, we have not commented on duration of exposure; timing will be similar in all approaches.