
The Causal Effects of Education on Health Outcomes in the UK Biobank.

Neil M Davies^1,2, Matt Dickson^3, George Davey Smith^1,2, Gerard J van den Berg^1,4, Frank Windmeijer^1,4.

Abstract


Keywords: ROSLA; education; genomic confounding; instrumental variable analysis

Year: 2018 PMID: 30406209 PMCID: PMC6217998 DOI: 10.1038/s41562-017-0279-y

Source DB: PubMed Journal: Nat Hum Behav ISSN: 2397-3374


Educated people are generally healthier, have fewer comorbidities and live longer than people with less education.1–3 Much of the evidence about the effects of education comes from observational studies, which can be affected by residual confounding. Natural experiments, such as increases to minimum school leaving age laws, are a potentially more robust source of evidence about the causal effects of education. Previous studies have exploited this natural experiment using population-level administrative data to investigate mortality, and surveys to investigate the effect on morbidity.1,2,4 Here, we add to the evidence using data from a large sample from the UK Biobank.5 We exploit the raising of the minimum school leaving age in the UK in September 1972 as a natural experiment.6 We used a regression discontinuity design to investigate the causal effects of remaining in school. We found consistent evidence that remaining in school causally reduced risk of diabetes and mortality in all specifications. We do not know if the differences in outcomes across education groups are because education directly causes these outcomes, by affecting behaviors such as smoking, or if these differences are due to other factors, such as socioeconomic or genomic differences.
Whether education causes differences in outcomes later in life has been the subject of considerable debate by epidemiologists, economists and other social scientists.1–3,7–17 Economists have argued that in addition to its effects on income, a substantial portion of the benefits of education accrue via its potential effects on mortality and morbidity.3 Epidemiologists have found that people who attended university have higher fluid intelligence in adulthood.18 These associations are robust to adjustment for parental social class and adolescent cognition, which has been taken by some as proof that education causes later outcomes.19 Despite this, many epidemiologists and economists are acutely aware that correlations and multivariable adjusted regressions can be unreliable evidence of causation.20–22 The ideal experiment to test this hypothesis, randomizing the age at which children leave school, is unlikely to be ethical, cost-effective, or timely. A more feasible, and potentially robust, research design is to exploit natural experiments that affected when people left school but are not related to confounding factors.23,24 One widely used natural experiment is a change to the legal minimum school leaving age. Such changes forced some people to stay in school for longer than they would have otherwise chosen. In September 1972, the school leaving age increased from age 15 to 16 for children in England. Before the reform, the vast majority of those who left school at age 15 went into the labor force and found employment. The 1971 census indicated that in April 1971 32% of 15-year-olds were non-students, of whom 87% were in the labor force. At this time, the unemployment rates in this group were 21.7% and 14.9% for males and females respectively.25 Government discussions at the time of the reform raised concerns about the impact of the immediate withdrawal of 400,000 15-year-olds from the labor force as a result of the reform.
School leavers at this time were strongly attached to the labor market.26 Researchers have previously used this policy change to investigate the effects of forcing students to stay in school longer using administrative data and longitudinal cohort studies.2,4,27,28 However, the cohort studies had relatively small samples and, as a result, produced relatively imprecise estimates of the effects of education. Previous results from administrative data lacked detailed information needed to identify people born in England affected by the reform, or on many outcomes of interest such as cognition or clinical measures of aging such as grip strength. In the current study, we used the raising of the school leaving age in 1972 as a natural experiment to estimate the causal effects of schooling. We used a regression discontinuity design and data from the UK Biobank.29,30 We add to the literature in two ways. First, this is the largest sample with detailed individual-level information from the school years immediately before and after the reform. Second, we used genome-wide data to demonstrate that the observational associations of education and other outcomes are likely to suffer from genomic confounding. Of the 502,644 participants in the UK Biobank, who were all aged between 37 and 74 at recruitment in 2008, 390,412 were born in England, (see Supplementary Figure 1 for a flow diagram of inclusion and exclusion of participants in this study, and Supplementary Table 1 for a description of their characteristics). The youngest participants, those born between 1960 and 1971, obtained more education than those born earlier in the twentieth century (Figure 1). This is consistent with the well-documented secular increase in the length of education over the period.2 UK Biobank includes 11,240 and 10,898 participants who turned 15 years old in the last year before and the first year after the school leaving age increased. 
Before the reform, 85% of participants remained in school after the age of 15, whereas after the reform almost 100% of participants remained in school after the age of 15. The proportions of men and women who remained in school after age 15 increased over time (Supplementary Figure 2). Participants born in July and August could still technically leave school before their 16th birthday, which is why participants born in the summer term were more likely to report leaving school before the age of 16.

Figure 1

Years of full-time education by quarter of birth. Each dot represents the proportion who left education before the given age per quarter. The black line indicates the first cohort of participants who were affected by the reform implemented in September 1972. These participants were born in or after September 1957 and faced a minimum school leaving age of 16. This is a one-year increase compared to those born before September 1957. The participants who did not have a university degree were asked, “What age did you leave full-time education?” People who were born in the summer (July-August) were still able to leave school at age 15. N=384,743.
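
The cohort definition in the caption above can be written out as a short sketch. The function names are illustrative and are not taken from the study's analysis code; only the September 1957 cutoff, the leaving ages, and the July-August exception come from the text.

```python
from datetime import date

# Cutoff from the text: participants born in or after September 1957
# faced a minimum school leaving age of 16.
REFORM_CUTOFF = date(1957, 9, 1)


def affected_by_rosla(date_of_birth: date) -> bool:
    """Return True if the birth cohort was subject to the 1972 reform."""
    return date_of_birth >= REFORM_CUTOFF


def minimum_leaving_age(date_of_birth: date) -> int:
    """Nominal minimum school leaving age implied by the birth cohort."""
    return 16 if affected_by_rosla(date_of_birth) else 15


def summer_born(date_of_birth: date) -> bool:
    """Flag July-August births, who could still leave before turning 16."""
    return date_of_birth.month in (7, 8)
```

A participant born on 31 August 1957 is treated as unaffected, and one born on 1 September 1957 as affected.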

People who remained in school after age 15 had higher birth weights, had mothers who were less likely to smoke during pregnancy, were more likely to have been breastfed, were more likely to have parents who were alive, and had fewer siblings (Supplementary Table 2). In addition, they had more genetic variants (single nucleotide polymorphisms (SNPs)) known to associate with higher educational attainment31 (Supplementary Table 2). This suggests that the association of educational attainment and later outcomes will suffer from residual genomic confounding. In comparison, there were few detectable pre-existing differences between people affected and unaffected by the reform. The only detectable difference was that the parents of participants in the first year affected by the reform were more likely to be alive when they attended the assessment center in 2008-2010 (4.3 (95% confidence interval (95%CI): 2.5 to 6.1) and 3.7 (95%CI: 2.6 to 4.8) percentage points for father and mother respectively). These associations could be due to age effects, because on average the parents of those in the first year affected by the reform will be a year younger than the parents of those in the previous school year. Alternatively, having more educated, and potentially richer, offspring may increase parents’ longevity, perhaps via improved care.32 There was some evidence that fewer participants in the younger cohort were breastfed. On average, participants in the cohorts before and after the reform had similar numbers of education-associated genetic variants. This suggests that associations of the reform and later outcomes are unlikely to suffer from residual genomic confounding. The participants affected by the reform are, by definition, an average of one year younger than those who were not affected. The raw differences above do not account for this age difference.
There was little evidence of manipulation around the discontinuity (McCrary robust bias-corrected regression discontinuity manipulation test p=0.21).33 In this section we report two comparisons: first, the differences between participants who chose to stay in school after the age of 15 and those who left, and second, the regression discontinuity results. The regression discontinuity results are the difference between participants not affected by the reform (those born before September 1957) and those affected by it (those born in or after September 1957). On average, participants who chose to stay in school after age 15 had better outcomes later in life. They were less likely: to be diagnosed with hypertension, diabetes, a stroke or a heart attack, to die, smoke or have ever smoked, and were more likely to be diagnosed with depression (left columns in Table 1). Rates of cancer diagnoses were similar across education levels. Participants who remained in school had stronger grips, lower arterial stiffness, and lower systolic and diastolic blood pressure. They also reported higher incomes, were taller, thinner, achieved higher scores on the intelligence test, drank more, watched less television, and exercised less. There was little difference in happiness.

Table 1

The associations of remaining in school after age 15, and attending school after the raising of the school leaving age (ROSLA), with outcomes. Participants born between September 1956 and August 1958. ROSLA = raising of the school leaving age. Estimated using robust linear regression, with standard errors clustered by year and month of birth. All estimates adjust for the month of birth and sex. The same sample was used for both the conventional linear regression and ROSLA analyses. Inverse probability weights were used to correct for under-sampling of participants who left school at age 15 (weight=1.8857). The differences in outcomes between those who remained in and left school at age 15 are included for comparison, and may suffer from residual confounding. * denotes mean differences.

	Left school after age 15					Affected by ROSLA			
Outcome	N	Risk/Mean difference	95% CI lower	95% CI upper	P-value	Risk/Mean difference	95% CI lower	95% CI upper	P-value
Hypertension	21,768	-0.039	-0.057	-0.021	1.9E-4	-0.018	-0.026	-0.010	9.0E-5
Diabetes	22,049	-0.019	-0.031	-0.008	0.002	-0.008	-0.011	-0.005	3.5E-6
Stroke	22,110	-0.006	-0.011	-0.002	0.009	-0.003	-0.005	-0.001	0.001
Heart attack	22,110	-0.011	-0.017	-0.005	9.5E-4	-0.003	-0.004	-0.002	2.5E-5
Depression	21,085	0.031	0.017	0.045	9.7E-5	-0.003	-0.010	0.005	0.47
Cancer	22,011	-0.006	-0.020	0.008	0.38	-0.005	-0.011	0.001	0.09
Died	22,138	-0.008	-0.013	-0.003	0.004	-0.005	-0.007	-0.002	0.001
Ever smoked	22,086	-0.205	-0.228	-0.183	1.9E-15	-0.023	-0.034	-0.012	3.0E-4
Currently smoke	22,086	-0.141	-0.155	-0.127	1.7E-16	-0.009	-0.014	-0.003	0.004
Income over £18k	19,921	0.174	0.154	0.195	8.0E-15	0.024	0.019	0.029	2.3E-10
Income over £31k	19,921	0.296	0.274	0.318	4.1E-19	0.052	0.047	0.058	6.7E-16
Income over £52k	19,921	0.256	0.239	0.274	3.2E-20	0.032	0.020	0.043	1.1E-5
Income over £100k	19,921	0.079	0.071	0.087	2.5E-16	0.005	-0.001	0.012	0.08
Grip strength (kg)*	21,989	1.215	0.947	1.484	2.6E-9	0.551	0.476	0.626	1.7E-13
Arterial Stiffness*	8,537	-0.750	-0.931	-0.570	1.2E-8	-0.113	-0.223	-0.003	0.04
Height (cm)*	22,077	1.765	1.517	2.014	3.6E-13	0.286	0.193	0.379	1.7E-6
BMI (kg/m²)*	22,055	-1.235	-1.478	-0.992	2.9E-10	-0.252	-0.324	-0.179	2.6E-7
Diastolic blood pressure (mmHg)*	21,494	-0.877	-1.377	-0.377	0.001	-0.069	-0.291	0.154	0.53
Systolic blood pressure (mmHg)*	21,492	-1.688	-2.444	-0.933	1.2E-4	-0.611	-0.923	-0.299	4.9E-4
Intelligence (0 to 13)*	8,540	1.653	1.458	1.849	9.0E-15	0.148	0.086	0.210	5.8E-5
Happiness (0 to 5 Likert)*	8,626	0.008	-0.047	0.062	0.77	-0.015	-0.039	0.009	0.21
Alcohol consumption (1 low, 5 high)*	22,123	0.316	0.229	0.404	1.3E-7	0.036	0.009	0.064	0.01
Hours of television viewing per day*	21,206	-0.834	-0.916	-0.752	1.5E-16	-0.137	-0.172	-0.102	3.0E-8
Moderate exercise (days/week)*	21,330	-0.480	-0.639	-0.321	2.2E-6	0.005	-0.040	0.049	0.84
Vigorous exercise (days/week)*	21,379	-0.129	-0.207	-0.051	0.002	0.010	-0.019	0.038	0.50
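
As a rough illustration of how the inverse probability weight in the Table 1 caption enters a comparison of the two cohorts, the sketch below computes a weighted difference in mean outcomes. It is a simplification under stated assumptions: it omits the adjustment for month of birth and sex and the clustered standard errors used for Table 1, the function names are illustrative, and the records are hypothetical toy values, not UK Biobank data.

```python
# Weight reported in the Table 1 caption for participants who left school at 15.
LEFT_AT_15_WEIGHT = 1.8857


def ipw_weight(left_at_15):
    """Up-weight the under-sampled participants who left school at age 15."""
    return LEFT_AT_15_WEIGHT if left_at_15 else 1.0


def weighted_mean(outcomes, weights):
    """Weighted average of the outcomes."""
    return sum(y * w for y, w in zip(outcomes, weights)) / sum(weights)


def weighted_rosla_difference(records):
    """Weighted mean outcome in the affected cohort minus the unaffected cohort.

    records: iterable of (outcome, affected_by_rosla, left_at_15) tuples.
    """
    groups = {True: ([], []), False: ([], [])}
    for outcome, affected, left_at_15 in records:
        outcomes, weights = groups[affected]
        outcomes.append(outcome)
        weights.append(ipw_weight(left_at_15))
    return weighted_mean(*groups[True]) - weighted_mean(*groups[False])


# Hypothetical toy records (outcome, affected, left_at_15); not real data.
toy_records = [
    (1, False, True),
    (0, False, False),
    (0, True, False),
    (0, True, False),
]
toy_difference = weighted_rosla_difference(toy_records)
```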

Turning to the regression discontinuity results, there was little evidence that the reform affected rates of depression, diastolic blood pressure, and rates of moderate and vigorous exercise (right columns in Table 1). For the other outcomes, the effect of the reform was consistent in direction with the association of choosing to remain in school and the outcomes. We found some evidence that the reform may have had a larger effect on men’s likelihood of earning more than £31,000 (p-value for interaction=0.008), but little evidence of interactions by gender with any other outcomes (Supplementary Tables 3 and 4). There was some evidence that the reform had larger effects on participants predicted to leave before the age of 16: specifically, increasing the likelihood of earning over £18,000 or £31,000, and increasing grip strength, happiness, and alcohol consumption (Supplementary Table 5). As a sensitivity analysis we repeated the analyses reported in Table 1 using Calonico, Cattaneo, and Titiunik (2014) optimal bandwidths (reported in Supplementary Table 6; sex-stratified results in Supplementary Tables 7 and 8). These bandwidths are calculated using each outcome and the running variable (the difference between the participant’s date of birth and 1st of September 1957 in months). They minimize the mean squared error of the estimates. The bandwidths ranged from 24 to 65.4 months, greater than the 12 months used for the results above. These analyses allow for differential linear time trends either side of the reform. This substantially increased the sample size and statistical power (standard errors fell by a factor of between 1.25 and 4). The results were consistent in direction with the main results reported in Table 1, except for cancer, income over £100,000 and happiness. However, these differences are consistent with sampling error.
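
The running variable and the idea of differential linear trends on either side of the reform can be sketched as follows. This is a minimal illustration, not the estimator used in the study: it fits an unweighted straight line on each side of the cutoff and takes the gap between the two fitted values at the cutoff, and it omits covariates, inverse probability weights, clustered standard errors and the Calonico, Cattaneo and Titiunik bandwidth selection. All function names are illustrative.

```python
from datetime import date

CUTOFF = date(1957, 9, 1)


def running_variable(date_of_birth):
    """Whole months between date of birth and 1 September 1957.

    Negative values mean the participant was born before the cutoff.
    """
    return (date_of_birth.year - CUTOFF.year) * 12 + (
        date_of_birth.month - CUTOFF.month
    )


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_estimate(points, bandwidth):
    """Jump in the outcome at the cutoff, allowing a separate slope each side.

    points: iterable of (months_from_cutoff, outcome) pairs.
    bandwidth: number of months kept on each side of the cutoff.
    """
    left = [(x, y) for x, y in points if -bandwidth <= x < 0]
    right = [(x, y) for x, y in points if 0 <= x < bandwidth]
    intercept_left, _ = fit_line(*zip(*left))
    intercept_right, _ = fit_line(*zip(*right))
    return intercept_right - intercept_left
```

Because the running variable is centered on the cutoff, each intercept is the fitted outcome at the cutoff, so their difference is the estimated discontinuity.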
Supplementary Tables 9, 10 and 11 provide the regression discontinuity results using a one-year bandwidth without inverse probability weights (see methods below). The associations reported in Table 1 are valid tests of the null hypotheses that education does not affect the outcomes. However, these associations are not informative about the size of the effect of remaining in school. We estimated the effect of remaining in school using instrumental variable analysis. Participants affected by the reform were 23.0 (95%CI: 21.7, 24.4) percentage points more likely to remain in school past age 15 than those who were unaffected. This suggests that these analyses are unlikely to suffer from weak instrument bias (min partial F-statistic=811). In Supplementary Table 12 we report instrumental variable estimates of the effect of remaining in school past the age of 15. The instrumental variable estimates are consistent in direction with the effect of the reform described above. There was evidence that the linear regression overestimated the effect of remaining in school on rates of ever or current smoking, income, intelligence, sedentary behavior, and exercise (all Hausman test for difference p<0.007). The instrumental variable results imply that staying in school increases the likelihood of earning more than £18,000, £31,000 or £52,000 by 11.1 (95%CI: 8.9 to 13.3), 24.0 (95%CI: 21.8 to 26.2) and 14.6 (95%CI: 9.8, 19.3) percentage points respectively. These results exceeded the Benjamini and Hochberg (1995) false discovery rate threshold at δ=0.05 for 18 of the 25 outcomes.34 Supplementary Figures 3 and 4 plot the point estimates and confidence intervals for the conventional linear regression and the instrumental variable estimates using a 12 month bandwidth. Supplementary Tables 13 and 14 report the instrumental variable results stratified by sex.
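
With a single binary instrument, the instrumental variable estimate is, in its simplest form, the ratio of the effect of the reform on the outcome to the effect of the reform on remaining in school (the Wald ratio). The sketch below applies that ratio to the unadjusted figures reported in Table 1 and in the text. It is an approximation for illustration only: the study's estimates in Supplementary Table 12 come from adjusted models, so this ratio is not expected to reproduce them exactly. The dictionary keys and function name are illustrative.

```python
def wald_ratio(reduced_form, first_stage):
    """Effect of the reform on the outcome divided by its effect on exposure."""
    return reduced_form / first_stage


# First stage from the text: 23.0 percentage points more likely to remain
# in school past age 15.
FIRST_STAGE = 0.230

# Reduced-form risk differences from the right-hand columns of Table 1.
reduced_forms = {
    "income_over_18k": 0.024,
    "income_over_31k": 0.052,
    "income_over_52k": 0.032,
    "diabetes": -0.008,
}

wald_estimates = {
    outcome: wald_ratio(effect, FIRST_STAGE)
    for outcome, effect in reduced_forms.items()
}
```

The ratios scale each reduced-form difference up by roughly a factor of four, which reflects that only about a quarter of the affected cohort changed their schooling because of the reform.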
There was little evidence the reform had larger effects on men than women, except for the likelihood of having income above £31,000 (p-value for interaction=0.009). We investigated whether the differences in the outcomes seen in the regression discontinuity results could be solely explained by the aging process using a difference-in-difference approach. We created a series of non-overlapping negative control samples which contained participants born in consecutive school years in the 10 years before and after the reform. For each of these samples, we allocated the younger cohort to a “placebo” reform (see Supplementary Figure 1 for diagram and sample sizes). Within each of these negative control samples all the participants experienced the same minimum school leaving age. Therefore any differences between the younger and older school cohort cannot be due to the raising of the school leaving age in 1972, and are likely to be due to the aging process and not an effect of education. Forest plots of the differences in the outcomes for the negative control analyses are reported in the supplementary materials (Supplementary Figures 5 to 29). There was evidence of an effect of age. On average, younger participants in both the ROSLA and negative control cohorts were less likely to report having had a diagnosis of hypertension, a heart attack, or cancer, to die during follow-up, and to currently smoke. They also reported higher incomes, had higher grip strength and lower arterial stiffness, were taller and slimmer, had lower diastolic and systolic blood pressure, had higher scores on the intelligence tests, were less sedentary, and did less moderate exercise. The effect of the reform on diastolic blood pressure was similar to year-on-year differences seen before the reform, but smaller than differences observed after the reform.
The effect of the reform on likelihood of earning over £18,000 and £52,000 was similar to the year-on-year differences observed before the reform, but larger than the differences observed after the reform. The effects of the reform on the outcomes after accounting for age are shown in Figure 2. The effect of the reform exceeded the false discovery threshold for: diabetes, stroke, mortality, former smoker, current smoker, earning over £18,000 or £31,000, grip strength, BMI, intelligence, alcohol consumption, and sedentary behavior. We report sensitivity analyses of the overall result without using inverse probability weights (see methods below) in Supplementary Figures 30 and 31. The effects of the reform exceed the false discovery rate threshold in both the weighted and unweighted analysis for diabetes, stroke, mortality and grip strength.
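
For readers unfamiliar with the Benjamini and Hochberg (1995) procedure referred to above, the following is a minimal sketch of the standard step-up rule. It is a generic implementation, not the study's code, and the function name is illustrative.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Step-up rule: find the largest rank k such that the k-th smallest
    p-value is at most (k / m) * alpha, then reject the k smallest.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_passing_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            largest_passing_rank = rank
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= largest_passing_rank:
            rejected[index] = True
    return rejected
```

Because the rule is step-up, a p-value that misses its own rank threshold can still be rejected when a larger p-value meets its threshold.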

Figure 2

The effect of the reform on each outcome estimated via difference in differences. The units in the top panel are reported on the absolute risk difference scale (risk differences per 100 people). This is interpreted as the change in the number of events per 100 people affected by the reform. The units for the bottom panel differ by outcome and are listed in the legend on left hand side. All estimates control for gender and month of birth. Estimates are the difference between the year-on-year difference in outcome across the raising of the school leaving age compared to the average year on year difference. Estimated using robust linear regression, with standard errors clustered by month of birth and weighting. Differences and confidence intervals calculated using Bland-Altman tests.61 The estimates for diabetes, stroke, mortality, former and current smoking, income over £18k, and £31k, grip strength, BMI, intelligence, alcohol consumption and sedentary behavior exceed Benjamini and Hochberg (1995) threshold for multiple hypothesis testing. Max N=262,348.
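
The comparison described in the caption, the year-on-year difference across the reform minus the average year-on-year difference in the negative control samples, reduces to the arithmetic below. The numbers in the example are hypothetical and chosen only to show the calculation; the sketch ignores the weighting, clustering and confidence interval calculation described in the caption.

```python
def difference_in_differences(rosla_difference, placebo_differences):
    """Year-on-year difference across the reform, net of the average
    year-on-year difference in the negative control (placebo) samples."""
    average_placebo = sum(placebo_differences) / len(placebo_differences)
    return rosla_difference - average_placebo


# Hypothetical risk differences, for illustration only.
example_effect = difference_in_differences(
    rosla_difference=-0.008,
    placebo_differences=[-0.002, -0.004, -0.003],
)
```

In this toy example the placebo cohorts differ by -0.003 per year on average, so -0.005 of the -0.008 difference across the reform remains after accounting for age.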

This study provides some of the strongest evidence to date about the causal effects of education. We found that the raising of the school leaving age in 1972 affected some health outcomes. A conservative analysis is to focus on the effects which were consistently found across all estimation methods. We found there was consistent evidence that the reform had generally beneficial effects on risk of diabetes and mortality. Finally, we found molecular genetic evidence that regression discontinuity designs using raising of the school leaving age are unlikely to suffer from residual genomic confounding. Clark and Royer found that participants of the Health Survey for England and the General Household Survey affected by the reform were 26.1 (95%CI: 23.0 to 29.2) percentage points more likely to stay in school after age 15.2 After correcting for under-sampling of people who left school at 15, we found a slightly smaller difference (23.0, 95%CI: 21.7 to 24.4). Clark and Royer found that people affected by the reform may have had lower mortality between the ages of 40 and 44 (odds-ratio=0.95, 95%CI: 0.89 to 1.01), but found no detectable effects on current or ever smoking, or drinking. Figure 3 presents sensitivity analyses using identical bandwidths and covariates as in Clark and Royer for mortality, current and ever smoking, and drinking alcohol (coded here as a binary variable, rather than the ordinal variable used in our main analysis). As with our main results, the estimates using Clark and Royer’s specification suggest those affected by the reform had a substantially lower risk of mortality (odds-ratio=0.58, 95%CI: 0.39 to 0.87) (Figure 3). Furthermore, this difference was greater than the average year-on-year difference in mortality seen before and after the reform (Supplementary Figure 11).

Figure 3

The effect of the 1972 reform on mortality, smoking, ever smoking and alcohol consumption from the Office for National Statistics Census (summary data from the entire English and Welsh population) and the General Health Survey for England (min N=47,177) (▲) (Clark and Royer, 2013) and (■) the UK Biobank. All estimates adjust for the month of birth, sex, and a linear time trend which can differ before and after the reform. Estimated using robust linear regression, with standard errors clustered by month of birth and weighting. Current and ever smoking and alcohol consumption additionally adjust for age cubed. Inverse probability weights were used to correct for under-sampling of participants who left school at age 15 (weight=1.8857). The bandwidths are 74, 72, 74, and 138 months for mortality, current smoking, ever smoking, and drinking alcohol, respectively. In this analysis alcohol consumption is coded as a binary variable equal to one if the participant states they ever drink (93.3%); in the main results alcohol is coded as an ordinal variable. Mortality results are log odds of death. The Clark and Royer mortality results relate to the risk of mortality in the five years between the ages of 40 and 44, whereas UK Biobank participants were between the ages of 42 and 62 and follow-up spanned 7.78 years (over the period 10th May 2006 to 17th February 2014).

The difference between the UK Biobank and Clark and Royer mortality results may be because the UK Biobank participants were almost ten years older (mean age=53.2 years) than the Clark and Royer sample. Clark and Royer sampled those aged 40 to 44 and had a five year follow-up. The 5 year mortality rate for this age group is 0.79%.35 The five leading causes of death for this age group in 2001 were cancer (22.9%), ischemic heart disease (14.9%), alcohol related disease (13.3%), suicides (12.1%) and accidental injuries (7.0%). In contrast, the subsample of the UK Biobank used in the study is comprised of individuals aged between 42 and 62 and has a 7.78 year follow-up. The 8 year probability of mortality between the ages of 42 and 62 was 3.44% in 2008. The five leading causes of death for this age group in 2008 were cancer (37.0%), ischemic heart disease (20.0%), alcohol related disease (9.0%), cerebrovascular diseases (5.7%) and chronic obstructive pulmonary disease (4.8%). Therefore, the absolute probability of mortality is over four times as high in the UK Biobank, and the causes of death differ. In particular, the risk of mortality due to smoking related illness, such as ischemic heart disease, cancer (particularly lung cancer), and chronic obstructive pulmonary disease was much higher in UK Biobank. Therefore it is possible that Clark and Royer’s sample was too young to detect any difference in mortality. Finally, Clark and Royer could not exclude immigrants, who were not affected by the reform, from their sample. This could attenuate their estimates towards the null. In the sensitivity analysis reported in Figure 3, our estimates of the effect of the reform on smoking and alcohol consumption were almost identical to Clark and Royer. However, we found some evidence that the reform affected alcohol consumption and smoking rates using an ordinal measure of alcohol consumption, and tighter bandwidths. 
These effects exceeded the age effects found in the difference-in-difference analysis for the inverse probability weighted but not in the unweighted analysis. This suggests that the reform may have affected the frequency of alcohol consumption in those who drink alcohol, but had little effect on whether participants drank or not. Epidemiologists have argued that education has causal effects on intelligence later in life. Richards and Sacker found that educational attainment by age 26 was associated with intelligence at age 53,36 which they argue was evidence that education had a causal effect on intelligence.19 However, Deary and Johnson raised doubts about this interpretation and called for greater clarity about the assumptions underlying these analyses.22 We found modest evidence of a causal effect of education on intelligence later in life from the inverse probability weighted estimates. This suggests the raw differences in intelligence between those who remain and leave school at age 15 may over-estimate the effect of schooling on cognition. Our results are also consistent with Nguyen and colleagues, who used increases in the legal school leaving ages in the United States to investigate the effects of education on risk of dementia later in life.24 They found evidence that education reduced the risk of dementia. We cannot test this hypothesis directly in the UK Biobank because too few participants have been diagnosed with dementia. People with more education were much less likely to smoke. However, it is not clear whether this is due to a causal effect of education. Gilman and colleagues found the association between education and smoking status was attenuated in sibling fixed effects designs.37 We found evidence that participants affected by the reform were less likely to smoke, or have ever smoked. 
Educated participants drank more heavily, but the instrumental variable estimates suggested that this was likely to be an over-estimate of the causal effect of education on alcohol consumption. However, these effects only exceeded the false discovery rate in the weighted analysis. We found some evidence that the effects of the reform on income were greatest in participants who would otherwise have been expected to leave at age 15. Our results are consistent with those of Turley and colleagues who used data from the UK Biobank to investigate heterogeneity in the effects of education on BMI and blood pressure. They used a 110 month bandwidth and a triangle kernel to weigh their results. Their results allowing for differential linear trends before and after the reform suggested that remaining in school caused a 0.42 (95%CI: -0.30 to 1.14) kg/m2 reduction in BMI, and a 2.3 (95%CI: -0.1 to 4.7) percentage point reduction in risk of diabetes.38 A key strength of our study is that we used a natural experiment to identify the effects of education. The raising of the school leaving age in 1972 provided exogenous variation in the length of schooling. We found few pre-existing differences between participants on either side of the reform, suggesting that it can be used as a potentially valid instrumental variable.39 A strength of our study is that it uses one of the largest samples to date to investigate the effects of education on a wide range of outcomes. Our outcomes were recorded both in clinics and via linked NHS mortality registry data. This means our outcomes are likely to suffer from relatively little measurement error. Furthermore, we were able to restrict our sample to people born in England who were affected by the reform. In addition, we used genome-wide data to show that this natural experiment is unlikely to suffer from residual genomic confounding. Participants unaffected and affected by the reform had very similar genome-wide scores for education. 
A potential limitation of our study is that our treatment group, people affected by the reform, are one year younger than our control group, those born in the last school year unaffected by the reform. Many of the outcomes we investigated increase linearly or log-linearly over time. This means it is difficult to determine if any of the differences we observed in the regression discontinuity design with 12 month bandwidths were due to an additional year of aging or the reform. We addressed this by using a difference-in-difference approach to estimate the average effects of a year of aging (Figure 2), and allowed for a differential linear time trend before and after the reform as a sensitivity analysis using wider bandwidths (Supplementary Tables 4 to 6). These results suggest that aging rather than the reform is likely to explain the differences observed across the regression discontinuity for outcomes such as height. However, it is likely the reform affected outcomes where substantial effects remained in the difference-in-difference analysis. A representative sample is not a necessary condition for making causal inferences.40 Nevertheless, collider (attenuation) bias could affect our results because Biobank is a volunteer sample, which over-sampled more educated people. People affected by the reform may be more likely to participate in the study.41 This could cause less educated people, who would have remained in school had they attended school after the reform (the compliers), to be under-represented in UK Biobank. This could attenuate our results towards the null, because these marginal students would reduce the average outcome in the “treatment” group, and be missing from the “control” group. This would improve the control group’s outcomes relative to the treatment group. Despite these differences we found little evidence that people affected by the reform were more likely to participate in UK Biobank (see Supplementary Figure 32).
In our primary analysis we used inverse probability weighting to account for this sampling. This requires the assumption that the participants sampled in UK Biobank who left school at age 15 are representative of the population that left school at age 15. However, this issue warrants further investigation in future research. There was limited time to collect measures during the participants’ assessment center visits; therefore, our measure of intelligence is relatively coarse. Despite this, participants who remained in school had substantially higher intelligence. The instrumental variable estimates suggest that this difference substantially overestimates the causal effect. Finally, our instrumental variable results are estimates of the local average treatment effect of schooling.42 They can be interpreted (“point identified”) either under the assumption that the reform had a monotonic effect on likelihood of staying in school (monotonicity), or that the effects of schooling on the outcomes were not affected by the reform (no effect modification).43 Under the monotonicity assumption, our results are estimates of the causal effects of being forced to remain in school after the age of 15, on those who would otherwise have left school. These effects may not be externally valid to infer either the effects of compelling students to remain in school for longer, or of the effects of education on other populations.44,45 In particular, these results may not be valid estimates of the effect of education on “always takers”, that is people who would always remain in school regardless of the reform. Under the no effect modification assumption, we identify the average effect of education on those who remained in school. At a minimum, our results are internally valid estimates of the effects of schooling on people affected by the reform. Does education affect outcomes later in life?
Yes, whilst education is not the panacea implied by naïve multivariable adjusted regression, in this sample increasing the length of compulsory schooling had substantial benefits. We found robust evidence that staying in school is likely to have causal effects on risk of diabetes and mortality. These results add to our understanding of the long-term consequences of educational decisions in childhood and adolescence.

Materials and Methods

Data

We used data from 502,624 participants of the UK Biobank project.29 The participants, aged between 37 and 74, were originally recruited between 2006 and 2010. In our regression discontinuity analysis, we restricted our sample to participants who were born in England in the school cohorts immediately before and after the reform took place. We did this because the sample born in these years is large enough to precisely identify the effects of schooling.

Exposure: left school after age 15

The participants were asked if they had a college or university degree. If they did not have a degree, they were asked at what age they left full-time education. We coded participants who reported having a degree as leaving full-time education at age 21. Participants who did not report having a degree and did not have data on the age at which they left education were coded as missing.
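
The coding rule above can be sketched in a few lines. This is a minimal illustration only: the function and argument names are hypothetical and are not taken from the UK Biobank data dictionary or the authors' code.

```python
from typing import Optional

# Age assigned to degree holders, as stated in the text.
DEGREE_LEAVING_AGE = 21


def age_left_education(has_degree: Optional[bool],
                       reported_age: Optional[int]) -> Optional[int]:
    """Return the coded age at leaving full-time education, or None if missing."""
    if has_degree:
        return DEGREE_LEAVING_AGE
    # No degree reported: fall back on the self-reported leaving age (may be missing).
    return reported_age


def left_school_after_15(has_degree: Optional[bool],
                         reported_age: Optional[int]) -> Optional[bool]:
    """Binary exposure: True if the participant left school after age 15."""
    age = age_left_education(has_degree, reported_age)
    return None if age is None else age > 15
```

A participant with a degree is coded as leaving at 21 regardless of the reported age, and a participant with neither a degree nor a reported age stays missing rather than being assigned a value.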

Outcomes

Health outcomes

The participants were asked whether they had ever been diagnosed by a doctor with the following health conditions: hypertension, stroke, type 2 diabetes, or heart attack. They were asked if they had ever had a whole week where they felt depressed or down. The death of the participants was defined using linked NHS mortality registry data. Follow-up for the linked mortality data started with the first death on 10th May 2006 and ended with the last recorded death on 17th February 2014. The cancer diagnoses were taken from the national cancer registries. The first recorded cancer diagnosis was on 20th September 1957 and the last on 25th October 2013.

Height, BMI, blood pressure, arterial stiffness, grip strength, and intelligence

Height and weight were measured during the participants’ visit to a UK Biobank assessment center. Two measures of diastolic and systolic blood pressure were recorded via an electronic blood pressure monitor. The measurements were taken two minutes apart. Arterial stiffness was measured using an electronic measuring device. Grip strength was measured in kilograms using a hydraulic hand dynamometer. We residualized the measures of grip strength and arterial stiffness to control for potential between-device heterogeneity. Fluid intelligence was measured as the number of logic puzzles, out of 13, that the participants could answer correctly in 2 minutes.

Health behaviors and income

During their assessment center visit, the participants were asked to report their health behaviors. They were asked how frequently they consumed alcohol. This is coded 6 if they drank every day, 5 for three or four times a week, 4 for once or twice a week, 3 for one to three times a month, 2 for special occasions only, and 1 for never. They were asked if they smoked, or had ever smoked. They were asked how often they moderately and vigorously exercised in a typical week. Finally, they were asked if their pre-tax income was below £18,000, between £18,000 and £30,999, between £31,000 and £51,999, between £52,000 and £100,000, or above £100,000. Participants who did not answer these questions were coded as missing.

Genotype data

The participants provided a blood sample. This sample was used to extract DNA, which was genotyped using the Axiom and BiLEVE genome-wide arrays. These arrays genotyped around 800,000 SNPs for each participant. The genotyping data was used to impute SNPs which were not directly genotyped, using the 1000 genomes and UK10K reference panels. The imputation produced a likelihood of each participant having a specific genotype (e.g. AA=0.1, TA=0.9, and TT=0). This resulted in a dataset of around 80,000,000 SNPs. For each participant, we created a genome-wide allele score by summing the number of genetic variants they had that were associated with higher educational attainment. We weighted each variant by its association with education reported in a large genome-wide association study, using a version of the GWAS not including UK Biobank.31 This study reported the association of 8,259,394 genetic variants with years of education in a meta-analysis of 64 studies. We normalized the allele score to have mean zero and standard deviation one. This score explains only a minority (r² = 1.32% in the full Biobank sample) of the variation in educational attainment explained by genome-wide data.31,46,47 This is because of the limited statistical power of existing genome-wide association studies of educational attainment. One consequence of this is that the genetic score is too poor a proxy for the total genetic effects on educational attainment to be used as a conventional covariate in a regression. Therefore, we use the educational attainment genome-wide score to test whether, on average, participants affected by the reform had more genetic variants known to associate with education.39
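
As a rough illustration of how a weighted, normalized allele score of this kind can be built, the sketch below uses NumPy on toy data. It assumes that imputed genotype probabilities are first converted to expected allele counts (dosages), which the text does not state explicitly. All names, dosages and weights are hypothetical.

```python
import numpy as np


def expected_dosage(p_het: float, p_hom_effect: float) -> float:
    """Expected count of the education-increasing allele from genotype probabilities.

    Assumption: the score uses expected dosages rather than best-guess genotypes.
    """
    return p_het + 2.0 * p_hom_effect


def allele_score(dosages, weights) -> np.ndarray:
    """Weighted sum of dosages, normalized to mean 0 and standard deviation 1.

    dosages: (n_participants, n_variants) array of allele counts in [0, 2].
    weights: (n_variants,) array of per-variant GWAS effect sizes.
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    raw = dosages @ weights
    return (raw - raw.mean()) / raw.std()  # population SD (ddof=0), an assumption


# Toy data: 4 participants, 3 variants (values are made up for illustration).
toy_dosages = [[0.9, 2.0, 1.0],
               [0.0, 1.0, 2.0],
               [2.0, 0.0, 1.0],
               [1.0, 1.0, 0.0]]
toy_weights = [0.02, 0.01, 0.03]
scores = allele_score(toy_dosages, toy_weights)
```

For the example genotype likelihoods in the text (AA=0.1, TA=0.9, TT=0), and taking T as the education-increasing allele purely for illustration, `expected_dosage(0.9, 0.0)` gives an expected count of 0.9.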

Statistical methods

We use the changes in the school leaving age to identify the effects of schooling on a range of outcomes. Our empirical strategy has five steps. First, we estimated the effect of the reforms on the proportion of participants who remained in school after age 15. Second, we investigated the associations of potential confounders with educational attainment and across the cohorts affected by the reform.39 Third, we used a regression discontinuity design to estimate the effect of the reform on the outcomes. Fourth, we used instrumental variable estimators to estimate the effects of remaining in school. For continuous outcomes, we used conventional Wald estimators;48 for binary outcomes, we used semi-parametric additive structural mean models.43 To address concerns about multiple hypothesis testing, we report whether the instrumental variable results for each outcome exceed a Benjamini and Hochberg (1995) false discovery rate threshold at δ=0.05 across 25 outcomes.34 Fifth, we conducted a difference-in-difference analysis.34
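
The Benjamini and Hochberg step-up procedure mentioned above can be sketched as follows. This is a generic textbook implementation, not the authors' code, and the p-values in the example are made up.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR level q.

    Step-up rule: sort the m p-values, find the largest rank k such that
    p_(k) <= (k / m) * q, and reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for 8 outcomes (the paper tests 25).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
example_rejected = benjamini_hochberg(example_p, q=0.05)
```

Because the rule is step-up, a p-value that fails its own rank threshold is still rejected whenever a larger p-value further down the ordering passes its threshold.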

Inverse probability weighting

The UK Biobank is a volunteer sample, and as a result people who left school at age 16 were less likely to attend the clinics than in previous studies (17.5% versus 33% reported in Clark and Royer, 2013). Non-random (endogenous) sampling can induce associations in the sampled data, even if an exposure has no causal effect on an outcome.49 This is a particular concern when attempting to draw causal inferences. If the probability of sampling is known, then inverse probability weights can be used to account for the non-random sampling.50 Therefore, we corrected for the non-random sampling using inverse probability weights (equal to 33/17.5=1.8857) for participants who left school at age 15.51 This assumes that the participants who reported leaving school at age 15 are a representative sample of the sub-population who left at 15. If this assumption does not hold, for example if the sampled participants who left at 15 were healthier than those in the population, then the estimates could underestimate the differences between the groups. We report the unweighted results as a sensitivity analysis in the appendix.
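
The weighting arithmetic above can be illustrated with a short sketch. It assumes, as the text implies, that participants who left at 15 receive weight 33/17.5 and that everyone else receives weight 1. The function names and toy data are hypothetical.

```python
def ipw_weights(left_at_15, population_share=0.33, sample_share=0.175):
    """Weight early leavers by population share / sample share; others get 1."""
    w = population_share / sample_share  # 33 / 17.5, about 1.8857
    return [w if flag else 1.0 for flag in left_at_15]


def weighted_mean(values, weights):
    """Weighted mean of an outcome."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Toy example: one participant who left at 15 (outcome 1) and one who stayed (outcome 0).
toy_weights = ipw_weights([True, False])
toy_mean = weighted_mean([1.0, 0.0], toy_weights)
```

In the toy example the early leaver counts for about 1.89 observations, so the weighted mean moves from the unweighted 0.5 towards that participant's outcome.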

Identification

The raising of the school leaving age will be a valid natural experiment for testing whether remaining in school at age 15 affects later outcomes under the following three assumptions. First, participants who attended school after the leaving age was increased must be more likely to stay in school. Second, there must be no pre-existing differences between the cohort who attended school in the year immediately before and immediately after the reform. Finally, the reform must not have any other direct effects on the outcomes. We can test the first assumption by investigating whether participants affected by the reform are more likely to stay in school. We can falsify the second assumption by investigating if there were any pre-existing differences between those affected and unaffected by the reform. The final assumption cannot be empirically tested, and could be invalid if the reform also affected the labor market around the time that the participants entered the workforce. However, claimant count statistics for the UK show that the cohorts entering the labor force immediately before and after the reform faced broadly similar conditions, with increases in unemployment related to the oil crises of the 1970s not being seen until 1975 onwards.52,53 In particular, youth unemployment was almost as low as all age unemployment in the years immediately before the reform, around 5 to 7% for males and 2 to 3% for females, compared with 5% and 1.5% respectively for all age unemployment. This continued to be the case in 1974 when the first post-reform cohort entered the labor market: youth unemployment was 3.6% and 2.0% compared with 3.5% and 1.0% for all age unemployment rate for males and females respectively.25

The effect of the reform on educational attainment

We used a fuzzy regression discontinuity design to estimate the effects of increasing the school leaving age from age 15 to 16 on the proportion of students who report leaving school before the age of 15. To investigate the effect of the reform on school attendance, we estimated a regression of staying in school after age 15 on a dummy variable equal to one if the participant was a member of the cohort affected by the reform, and equal to zero if they were not affected. In this and all subsequent analyses we included covariates for the month of birth, to control for seasonality, and sex. In contrast to Clark and Royer (2013), we do not include a term for birth cohort because our regression discontinuity results are restricted to people born in the single school years immediately before and after the reform. The regression discontinuity design is identified by assuming that the reform is independent of the unobserved confounding factors, and has no other direct effects on the outcome. The effect of the reform on the probability of participants staying in school after the age of 15, our parameter of interest, is the effect of remaining in school on those who were affected by the reform. We report this parameter on the risk or mean difference scale for binary and continuous outcomes, respectively. Our regressions allow for general forms of heteroskedasticity and clustering by year and month of birth.

Specification tests

We compared the associations of seven potential confounders, C, with the exposure (left school after the age of 15), E, and with the indicator of the reform, D. We estimated these associations conditional on the same set of covariates as above, and the standard errors allow for clustering by year and month of birth. In addition, we tested for manipulation of the forcing variable (number of months from 1st September 1957 to the participant’s birthday) using McCrary density tests, which assess selection across the period before and after the reform.33,54

Effects of increasing the school leaving age on outcomes in later life

Regression discontinuity

We estimated the associations of leaving school after age 15 with the outcomes, and the association of the reform with each of the outcomes, using two linear regressions. The first is a linear regression of each of the health outcomes on whether the participant remained in school after the age of 15. The second is a regression of each health outcome on the reform indicator. As above, each regression includes terms for sex and month of birth to account for the season of birth. The second regression is a valid test of the null hypothesis that remaining in school does not affect the outcomes.

We tested whether the reform had larger effects on people who would otherwise have been expected to leave school at age 15. We estimated the probability that a participant would remain in school after the age of 15 using logistic regression and data from individuals born before 31st August 1956. This model included indicators for the participants’ assessment center, year and month of birth, sex, whether their mother smoked during pregnancy, whether they were breastfed, their number of brothers and sisters, the normalized genome-wide education score, and their ethnicity. Missing data were replaced with the mean, and indicator variables for missing values were included. We then regressed each outcome on the reform indicator, the predicted probability of remaining in education from the logistic regression, and their interaction. For each outcome we report the coefficient on the reform indicator, the coefficient on the interaction term, and the effect of the reform. The effect of the reform on participants predicted to leave is indicated by φ1, and the effect on those expected to stay is indicated by φ1 + φ2. As with the main results above, we adjusted for sex and month of birth, and the interaction of these variables with predicted education.55

As a sensitivity analysis we used a regression discontinuity design with variable month bandwidths to investigate the robustness of our findings. In our main analysis above we present the difference in outcomes between the last school cohort of participants before the reform (those born between September 1956 and August 1957) and the first cohort affected by the reform (those born between September 1957 and August 1958). This is a regression discontinuity analysis with a bandwidth of one year. It is a fuzzy regression discontinuity design, as the reform only increased the probability of staying in school.56 In a sensitivity analysis we investigated whether our results were sensitive to the size of the bandwidth around the reform. We did this by repeating our instrumental variable analyses on a sample defined using Calonico, Cattaneo, and Titiunik (2014) optimal bandwidths.57 Analyses using these bandwidths use the same specification as the instrumental variable analyses described above and, in addition, include linear time trends that vary either side of the reform. We estimated the optimal bandwidths using the rdbwselect command in Stata.
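
The displayed regression equations for this section are not reproduced in this version of the text. One set of equations consistent with the verbal description is sketched below. Only E, D, φ1 and φ2 appear in the text; every other symbol is assumed notation: Y_i for the outcome, X_i for sex and month of birth, and \hat{P}_i for the predicted probability of remaining in education. The exact covariate and interaction terms the authors used may differ.

```latex
\begin{align}
Y_i &= \beta_0 + \beta_1 E_i + \beta_2' X_i + u_i
  && \text{(association with remaining in school)} \\
Y_i &= \gamma_0 + \gamma_1 D_i + \gamma_2' X_i + v_i
  && \text{(association with the reform)} \\
Y_i &= \phi_0 + \phi_1 D_i + \phi_2 \,(D_i \times \hat{P}_i) + \phi_3 \hat{P}_i
      + \phi_4' X_i + \phi_5' (X_i \times \hat{P}_i) + e_i
  && \text{(interaction with predicted education)}
\end{align}
```

Under this form, setting \hat{P}_i = 0 gives a reform effect of φ1 for participants predicted to leave, and setting \hat{P}_i = 1 gives φ1 + φ2 for those predicted to stay, which matches the interpretation given in the text.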

Instrumental variables

We estimated the causal effect of schooling using instrumental variable estimators. We estimated mean differences using Wald estimators48 for the continuous outcomes, and risk differences using additive structural mean models for the binary outcomes.43 These models can be identified by making one of three assumptions.43 First, for the continuous outcomes we could assume that staying in school has the same effect on the outcomes for all participants. This identifies the average effect of staying in school but is implausible for binary outcomes.58 Second, for the binary outcomes, we could assume a monotonic relationship between the reform and the participants’ likelihood of staying in school after the age of 15. This requires that there were no participants who were “defiers”, who would have remained in school if they were not affected by the reform, but would have left school if they were affected by the reform. Under monotonicity, the instrumental variable estimators estimate a local average treatment effect, which in the potential outcomes framework is E[Y(1) − Y(0)|E(1) − E(0) > 0]. This is the effect of treatment in the sub-group of participants whose decisions were affected by the reform,48 that is, the people in the year after the reform who would have chosen to leave school at 15 had the reform not been introduced. Finally, we could assume that the effects of education are not affected by the reform (no effect modification). This would identify the effects of education on participants who remained in school.

We report the partial F-statistic of the association of remaining in school, E, with the reform, D. We also report the test for endogeneity (using a C-statistic, which is a heteroskedasticity-robust Hausman test,59,60 of whether E[Ew] = 0). This implicitly tests for differences between the linear regression and instrumental variable estimates.60 All estimates allow for clustered standard errors by year and month of birth and include controls for sex and month of birth.
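
The Wald estimator for a binary instrument can be illustrated with a minimal sketch: the effect of the reform on the outcome divided by the effect of the reform on remaining in school. The version below deliberately omits the covariates (sex and month of birth), the weighting and the clustered standard errors used in the paper. All names and data are hypothetical.

```python
import numpy as np


def wald_estimate(y, e, d) -> float:
    """Unadjusted Wald ratio for a binary instrument.

    y: outcome, e: exposure (remained in school), d: instrument (affected by reform).
    """
    y = np.asarray(y, dtype=float)
    e = np.asarray(e, dtype=float)
    d = np.asarray(d).astype(bool)
    reduced_form = y[d].mean() - y[~d].mean()   # effect of reform on outcome
    first_stage = e[d].mean() - e[~d].mean()    # effect of reform on schooling
    return reduced_form / first_stage


# Toy data in which the outcome is exactly 1 + 2 * exposure.
toy_d = [0, 0, 0, 0, 1, 1, 1, 1]
toy_e = [0, 0, 1, 1, 0, 1, 1, 1]
toy_y = [1, 1, 3, 3, 1, 3, 3, 3]
toy_wald = wald_estimate(toy_y, toy_e, toy_d)
```

In the toy data the reform raises the share remaining in school from 0.50 to 0.75 and the mean outcome from 2.0 to 2.5, so the ratio recovers the built-in effect of 2. A weak first stage makes this ratio unstable, which is why the partial F-statistic is reported alongside the estimates.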

Difference-in-difference

We were concerned that differences between the two school years may occur because the participants affected by the reform were a year younger on average than participants unaffected by the reform. To investigate this, we estimated the year-on-year differences in each outcome for the five non-overlapping two-year cohorts in the 10 years before and after the reform. Otherwise, we used an identical specification to the regression discontinuity analysis above. There were no changes to the school leaving age between each of these years. Therefore, any year-on-year differences observed in these “negative control cohorts” must be due to other factors, such as age effects, and cannot be an effect of raising the school leaving age in 1972. We compared these estimates using forest plots, which are reported in the supplementary materials. We pooled the year-on-year differences from the 5 negative control samples from before and after the reform using the Stata command metan. We calculated the difference between this pooled estimate and the difference between the years before and after the reform. We estimated the difference and the standard error of this difference using Bland-Altman tests.61
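
The pooling and comparison steps can be sketched as follows. This assumes a fixed-effect inverse-variance pooled estimate and a z-test for the difference between two independent estimates. The text does not state which metan options were used, so the pooling model is an assumption, and the numbers in the example are made up.

```python
import math


def pooled_fixed_effect(estimates, standard_errors):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


def difference_of_estimates(b1, se1, b2, se2):
    """Difference between two independent estimates, with SE, z and two-sided p."""
    diff = b1 - b2
    se = math.sqrt(se1 ** 2 + se2 ** 2)
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return diff, se, z, p


# Hypothetical year-on-year differences from negative control cohorts.
control_pooled, control_se = pooled_fixed_effect([1.0, 3.0], [1.0, 1.0])
# Hypothetical difference across the reform, compared with the pooled control estimate.
did, did_se, did_z, did_p = difference_of_estimates(4.0, 1.0, control_pooled, control_se)
```

Subtracting the pooled negative-control difference removes the part of the across-reform difference that would be expected from one extra year of age alone.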
