Literature DB >> 35490552

Complex trait methylation scores in the prediction of major depressive disorder.

Miruna C Barbu1, Carmen Amador2, Alex S F Kwong3, Xueyi Shen3, Mark J Adams3, David M Howard4, Rosie M Walker5, Stewart W Morris5, Josine L Min6, Chunyu Liu7, Jenny van Dongen8, Mohsen Ghanbari9, Caroline Relton6, David J Porteous5, Archie Campbell5, Kathryn L Evans5, Heather C Whalley3, Andrew M McIntosh3.   

Abstract

BACKGROUND: DNA methylation (DNAm) is associated with time-varying environmental factors that contribute to major depressive disorder (MDD) risk. We sought to test whether DNAm signatures of lifestyle and biochemical factors were associated with MDD to reveal dynamic biomarkers of MDD risk that may be amenable to lifestyle interventions.
METHODS: Here, we calculated methylation scores (MS) at multiple p-value thresholds for lifestyle (BMI, smoking, alcohol consumption, and educational attainment) and biochemical (high-density lipoprotein (HDL) and total cholesterol) factors in Generation Scotland (GS) (N=9,502) and in a replication cohort (ALSPACadults, N=565), using CpG sites reported in previous well-powered methylome-wide association studies. We also compared their predictive accuracy for MDD to a MDD MS in an independent GS sub-sample (N=4,432).
FINDINGS: Each trait MS was significantly associated with its corresponding phenotype in GS (βrange=0.089-1.457) and in ALSPAC (βrange=0.078-2.533). Each MS was also significantly associated with MDD before and after adjustment for its corresponding phenotype in GS (βrange=0.053-0.145). After accounting for relevant lifestyle factors, MS for educational attainment (β=0.094) and alcohol consumption (MSp-value<0.01-0.5; βrange=-0.069-0.083) remained significantly associated with MDD in GS. Smoking (AUC=0.569) and educational attainment (AUC=0.585) MSs could discriminate MDD from controls better than the MDD MS (AUC=0.553) in the independent GS sub-sample. Analyses implicating MDD did not replicate across ALSPAC, although the direction of effect was consistent for all traits when adjusting for the MS corresponding phenotypes.
INTERPRETATION: We showed that lifestyle and biochemical MS were associated with MDD before and after adjustment for their corresponding phenotypes (pnominal<0.05), but not when smoking, alcohol consumption, and BMI were also included as covariates. MDD results did not replicate in the smaller, female-only independent ALSPAC cohort (NALSPAC=565; NGS=9,502), potentially due to demographic differences or low statistical power, but effect sizes were consistent with the direction reported in GS. DNAm scores for modifiable MDD risk factors may contribute to disease vulnerability and, in some cases, explain additional variance to their observed phenotypes. FUNDING: Wellcome Trust.
Copyright © 2022 The Authors. Published by Elsevier B.V. All rights reserved.

Entities:  

Keywords:  Avon longitudinal study of parents and children; DNA methylation; Environmental factors; Generation Scotland; Major depressive disorder; Methylation score

Mesh:

Year:  2022        PMID: 35490552      PMCID: PMC9062752          DOI: 10.1016/j.ebiom.2022.104000

Source DB:  PubMed          Journal:  EBioMedicine        ISSN: 2352-3964            Impact factor:   11.205


Evidence before this study

Major depressive disorder (MDD) is a prevalent psychiatric disorder that is known to result from a complex combination of genetic and environmental risk factors. Polygenic risk scores only account for approximately 1.5-3.2% of the variance in MDD, and previous evidence has also shown associations with a number of lifestyle risk factors, including alcohol consumption, smoking, and body mass index. These factors are also known to have widespread effects on the methylome. In addition, differential DNA methylation has recently been associated with MDD, although the variance explained in the disorder remains small. Although there is evidence of DNA methylation links to both MDD and environmental factors, the epigenetic signature of these factors in relation to MDD has not been investigated thus far. To assess the existing evidence for epigenetic signatures of environmental risk factors for MDD and their association with MDD, we searched Google Scholar for studies from inception to 2021, using the following search terms: “MDD OR DNA methylation environmental risk factors”, “MDD OR lifestyle factors DNA methylation”, “MDD OR MDD environmental risk factors”, “epigenome-wide association studies of smoking OR alcohol OR BMI OR MDD”, “Methylation risk scores AND smoking AND BMI AND alcohol AND MDD”. We also examined reference lists and citations of relevant publications. We did not find any studies that looked specifically at the epigenetic signatures of lifestyle and environmental risk factors for MDD and associations with MDD. We therefore sought to investigate these associations in the current study.

Added value of this study

To our knowledge, this is one of a few studies to look at epigenetic signatures of lifestyle and biochemical factors that confer risk to MDD and their association with MDD in two large, population-based cohorts. Using previous large-scale epigenome-wide association studies, we report associations between 6 complex traits and their epigenetic signature in both cohorts investigated here. In our main cohort, we further report associations between the epigenetic signature of the 6 complex traits and MDD, although these associations become non-significant when accounting for further lifestyle variables. Our MDD results were not replicated in the second cohort. Our findings here indicate that lifestyle factors attenuate the relationship between the epigenetic signature of MDD-relevant environmental risk factors and MDD. The study highlights the importance of lifestyle factors in MDD-DNA methylation associations.

Implications of all the available evidence

Our findings suggest that, although there are associations between MDD and a number of environmental variables, the association between their epigenetic signature and MDD is attenuated when considering a number of lifestyle factors. Investigating the epigenetics of disease-relevant modifiable factors may uncover useful biomarkers for disease stratification as well as treatment options that may be responsive to lifestyle modifications. However, the relationship between DNA methylation and MDD is incompletely understood, and future studies, both cross-sectional and longitudinal, will be able to shed light on the trajectory of DNA methylation in relation to both lifestyle factors and MDD.

Role of funding sources

Our funding sources were not involved in the study preparation/design, analysis/interpretation of data, or in the writing and submission of this report. Alt-text: Unlabelled box

Introduction

Major depressive disorder (MDD) is a prevalent psychiatric disorder and is a leading cause of disability worldwide. MDD is moderately heritable (h2=37%) and is known to result from a complex combination of genetic and environmental risk factors. Polygenic risk scores (PRS) derived from large-scale genome-wide association studies (GWAS) explain approximately 1.5–3.2% of MDD risk in independent cohorts. In addition, a number of modifiable lifestyle factors are known to associate with MDD, including alcohol intake, smoking, sleeping pattern, diet, and body mass index (BMI)., Recently, methylome-wide association studies (MWAS) have begun to identify depressive symptom associations with differential DNA methylation (DNAm) at cytosine-phosphate-guanine (CpG) sites annotated to genes implicated in disorder- and neural-related traits., Further, methylation scores (MS) for MDD explain additional variance in the disorder when modelled alongside PRS and risk-associated environmental factors, such as smoking, alcohol consumption, and BMI., However, a large proportion of variance in MDD remains unexplained after accounting for MDD genetic and methylation risk alongside environmental factors. Recent studies using both methylome-wide association and penalised regression methods have identified DNAm markers for modifiable lifestyle factors, that are measured peripherally in whole blood and can be used for MS estimation.9, 10, 11, 12, 13 There are now well-established MWAS for a number of lifestyle factors that are relevant to MDD, including smoking, BMI, and alcohol consumption. In addition, using penalised regression, McCartney et al. showed that DNAm predictors for complex traits, including BMI, smoking, educational attainment, and total and HDL cholesterol increased the variance explained in these traits when modelled alongside PRS. This finding is of interest in the application to multifactorial diseases, where modelling PRS alongside MS for relevant risk factors may enhance prediction. For instance, a recent study showed that a risk model combining lung cancer PRS, a smoking-associated MS, and environmental factors such as pack years predicted lung cancer with a higher accuracy than models including individual scores (AUCPRS=0.571, AUCMS=0.628, AUCjoint=0.654), with the increase in AUC being mostly attributable to the MS. The study indicates that calculating MS for disease-relevant environmental factors may uncover biomarkers for disease stratification and treatment that may be responsive to lifestyle modifications. Although several environmental factors with widespread effects on the methylome are known to be associated with MDD, the associations between their epigenetic signatures and MDD has not yet been investigated. Risk prediction models including methylation scores for dynamically changing MDD-associated environmental variables have the potential to increase prediction accuracy for the disorder by capturing an archive of longitudinal exposure. In addition, DNAm biomarkers based on environmental risk factors may lead to the development of novel techniques to measure the efficacy of lifestyle interventions more rapidly, providing potentially useful feedback to both clinicians and patients. Here, we selected four lifestyle factors (smoking status, alcohol consumption, BMI, educational attainment) and 2 biochemical variables (total cholesterol, high-density lipoprotein (HDL) cholesterol) in N=9,502 individuals in Generation Scotland: the Scottish Family Health Study (GS) that are phenotypically associated with MDD., We then conducted a literature search to identify well-powered MWAS of these traits. The aim of the current study was to compute MSs for these MDD-associated risk factors using methylome-wide significant CpGs.9, 10, 11, 12, 13 For those variables where full summary statistics were available (alcohol consumption, educational attainment, smoking status), we additionally calculated MS using four additional p-value thresholds, including p < 0.01, 0.05, 0.1, and 0.5 to investigate whether MSs that include a larger number of CpGs would increase prediction. Associations between the MSs and MDD and their corresponding phenotypes were assessed in 9502 individuals in GS. We further split GS into a training (N=5,078) and testing (N=4432) sample to calculate a MDD MS and compared this to complex trait MS in the testing GS sub-sample (N=4432). We used an age-matched subset of mothers (mean age=47.96 years, N=565) in the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort to replicate findings in GS. To investigate further concordant signals with MDD, we assessed whether single nucleotide polymorphisms (SNPs) associated with methylation at CpG sites comprising each MS (mQTLs) were colocalised with SNPs associated with MDD.

Methods

Training panels

To conduct our analyses, we included summary-level data from 6 previous MWASs (educational attainment, HDL cholesterol, total cholesterol, smoking status, alcohol consumption, and BMI9, 10, 11, 12, 13). For educational attainment, smoking status, and alcohol consumption, full summary statistics were available and obtained directly from the respective authors. For BMI, HDL and total cholesterol, methylome-wide summary statistics were obtained from the EWAS catalog (http://www.ewascatalog.org/), after permission was obtained from the EWAS Catalog team. Further information regarding each MWAS, including cohort selection, statistical analysis, and demographic information, is available in the Supplementary Table 1 and in each study.9, 10, 11, 12, 13

Target panels

Generation Scotland – Scottish family health study (GS)

GS is a family-based population cohort aiming to investigate the genetic and environmental causes of health and disease in approximately 24,000 participants aged 18–98 years in Scotland. Baseline data was collected between 2006 and 2011 and includes detailed information on a broad range of variables, including lifestyle and environmental factors, mental health, and medication., DNA is also available from blood samples taken at the time of recruitment from more than 20,000 consenting participants.

Avon longitudinal study of parents and children (ALSPAC)

ALSPAC is a population-based study in the South-West of England aiming to investigate the effects of multiple factors on health and development. Pregnant women were recruited between April 1991 and December 1992, with the initial number of pregnancies enrolled being 14,541. The cohort now consists of 13,761 mothers, their partners, and their 14,901 children (now young adults).19, 20, 21 The main replication sample in the current study comes from the mothers’ follow-up timepoint (mean age=47.96; see Table 2 for further demographic characteristics). Further information regarding the sample is given in the Supplemental Materials.
Table 2

Demographic characteristics for individuals with a MDD diagnosis and controls in ALSPAC (N=565, female only); CSE=certificate of secondary education. *= significant differences between MDD cases and controls.

Demographic characteristicMDD diagnosis (N=67)No MDD Diagnosis (N=498)Significance testing
Age (mean, SD)48.57 (4.5)47.954t(80.94)=-1.18, p=0.238
BMI (mean, SD)25.05 (3.67)24.55 (3.33)t(79.71)=1.30, p=0.2
Smoking status (%)Χ2(2)=3.49, p=0.174
Current smokerNever smoked tobaccoMissing9 (13%)58 (87%)0 (0%)35 (7%)462 (92.8%)1 (0.2%)
Alcohol consumption (%)Χ2(5)=3.32, p=0.651
 Never drank5 (7%)39 (8%)
 Monthly or less13 (20%)67 (13%)
 2-4 times/month13 (20%)92 (18.8%)
 2-3 times/week19 (28%)188 (38%)
 5-4 or more times/week17 (25%)111 (22%)
 Missing0 (0%)1 (0.2%)
*Educational attainmentΧ2(4)=10.37, p=0.035
No qualificationCSEVocationalO-levelA-levelDegree0 (0%)8 (12%)7 (10%)19 (28%)23 (34%)10 (16%)0 (0%)26 (5%)25 (4.5%)162 (32.5%)161 (32%)124 (26%)
HDL cholesterol (mean, SD)1.45 (0.49)1.50 (0.36)t(85.87)=0.04, p=0.968
Total cholesterol (mean, SD)4.87 (0.84)4.91 (0.85)t(104.74)= -0.71, p=0.484

Ethics

GS received ethical approval from NHS Tayside Research Ethics Committee (REC reference number 05/S1401/89) and has Research Tissue Bank Status (reference: 20/ES/0021). Written informed consent was obtained from all participants. ALSPAC received ethical approval from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. Written informed consent was obtained from all participants and consent for biological samples has been collected in accordance with the Human Tissue Act (2004). Please note that the study website contains details of all the data that is available through a fully searchable data dictionary and variable search tool: http://www.bristol.ac.uk/alspac/researchers/our-data/.

Phenotypes

GS

MDD status was measured using the axis-I Structured Clinical Interview of the Diagnostic and Statistical Manual, version IV (SCID) and was administered to participants who answered “yes” to either of two screening questions (N=1626, see Supplementary Materials). Control participants were defined as those individuals who answered “no” to the two screening questions (see Supplementary Materials) or did not fulfil criteria for a diagnosis of current or previous MDD following the SCID interview (N=7876). Individuals fulfilling criteria for bipolar disorder or those who self-reported either bipolar disorder or schizophrenia (N=11) were excluded. Educational attainment was measured by asking participants: “What is the highest educational qualification you have obtained?” with nine available answers, detailed in the Supplementary Materials. BMI was computed using height (cm) and weight (kg) as measured by clinical staff at baseline recruitment. Participants reported their smoking status (never, former, current) as well as the number of units of alcohol consumed during the past week. Finally, concentrations of HDL and total cholesterol in blood were measured at baseline by mmol/L.

ALSPAC

MDD was measured using the Edinburgh Postnatal Depression Scale (EPDS). Briefly, participants were asked to mark the response closest to how they have been feeling in the past 7 days on a 10-item scale, where the total score is 30 and a score above 13 indicates MDD. We transformed the scores into a binary variable, where MDD cases were those who scored above 13 (N=67) and controls were those with a total score of ≤13 (N=498). BMI was computed using height (cm) and weight (kg) as measured by clinical staff at baseline recruitment. Participants reported whether they currently smoke as well as alcohol consumption frequency (see Supplementary Materials). Concentrations of HDL and total cholesterol were measured by mmol/L. Educational attainment was recorded by asking participants: “What is the highest educational qualification you have obtained?” with six available answers, detailed in the Supplementary Materials.

DNA methylation

Genome-wide DNAm data profiled from whole blood samples was available for 9,537 individuals in GS using the Illumina Human-MethylationEPIC BeadChip. The DNAm data was initially released in two sets (set 1N=5,087; set 2N=4,450). DNAm data was pre-processed and quality checked for all individuals in the present study, including participant removal due to a number of reasons, including sex mismatch (Nremoved=24), having more than 1% CpG sites with a detection p-value>0.05 (Nremoved=52), showing evidence of dye bias, being an outlier for bisulphite conversion control probes (Nremoved=1), having a median methylated signal intensity more than 3 standard deviations lower than expected (Nremoved=74), and other technical issues (Nremoved=602). A total of 10,495 CpG sites were removed due to low beadcount, poor detection p-value, and sub-optimal binding. Firstly, R package “minfi” was used to read in the IDAT files, compute M and beta values, and remove probes with large detection p-values, and to compute principal components (PC) of control probes (see Supplementary Tables 2 and 3). Secondly, correction was applied for technical variation, where M values were included as outcome variables in a mixed linear model adjusting for appointment date and Sentrix ID (random effects), jointly with Sentrix position, batch, clinic, year, weekday, and 10 PCs (fixed effects); and biological variation by fitting residuals of as outcome variables in a second mixed linear model adjusting for genetic and common family shared environmental contributions (random effects classed as G: common genetic; K: kinship; F: nuclear family; C: couple; and S: sibling) and sex, age, and estimated cell types proportions (CD8T, CD4T, NK, Bcell, Mono, Gran) (fixed effects). Cross-reactive (N=42,558) and polymorphic (N=10,971) CpGs, obtained from McCartney et al. were removed from the final dataset, resulting in 674,246 CpGs across the 22 autosomes. The Illumina Infinium HumanMethylation450 Beadchip was used for measuring genome-wide DNA methylation data from blood sample for all samples. The R package “mefill” was used for pre-processing and normalisation. Probes were removed based on background detection (p>0.05) and if they reach beyond the 3 times inter-quantile range from 25% and 75%. R function “beta2m” from the “lumi” package was used for M-value transformation. Cross-reactive and polymorphic CpGs (N=34,881), identified by Chen et al. were removed, resulting in 447,975 CpGs across the 22 autosomes remaining for analysis.

Statistical methods

All analyses were conducted using R (version 3.6.3) in a Linux environment. The R code for the current analyses is available in the Supplementary RMarkdown File.

MDD-relevant factor selection

To identify risk factors for MDD for building MSs, we first ran logistic regression models to identify nominal associations between environmental, lifestyle, and biochemical variables available in GS, included as predictor variables in separate models, and MDD, included as an outcome variable. The complete list of investigated variables is included in the Supplementary Table 4. Age and sex were included as covariates in all models. Following this, we conducted a literature search to identify previous MWASs where DNAm signatures of significantly associated factors were uncovered (see Supplementary Materials). To meet inclusion criteria, studies needed to use the 450K or EPIC array in peripheral blood in an adult population; provide access to methylome-wide findings where full summary statistics were not available; and include smoking status as a covariate where relevant. GS was not included in any of the MWASs. ALSPAC was included in the BMI MWAS as a replication cohort, however, we used weights from the discovery cohort to calculate the BMI MS. We identified and calculated MSs for six factors: total cholesterol, HDL cholesterol, educational attainment, smoking status, alcohol consumption, and BMI.9, 10, 11, 12, 13

MS calculation

All previous MWASs were conducted using the Illumina 450K array. The overlap between CpGs identified in previous MWASs and CpGs available in GS and ALSPAC is presented in Supplementary Table 5. For each trait, MSs were calculated for all individuals in GS and ALSPAC with available DNAm data (N=9,502 and N=565, respectively) by taking the sum of the product of methylome-wide significant CpGs and their estimated weights in each MWAS.9, 10, 11, 12, 13 Where full summary statistics were available (educational attainment, smoking status, alcohol consumption), we also calculated MS at 4 further p-value thresholds: 0.01, 0.05, 0.1, and 0.5.11, 12, 13 In GS, we calculated a MDD MS to compare to each complex trait MS. As there are no previous well-powered MWAS of MDD to date, we split the GS sample into a training (N=5078) and testing (N=4432) sample by set, as detailed above. We then applied least absolute shrinkage and selection operator (LASSO) penalised regression on 450K array CpG sites measured in the individuals in the training sample. Briefly, depression status was regressed on age, sex and 10 genetic principal components, as in previous studies, and extracted residuals from this model were input as the dependent variable in the regression model. Tenfold cross-validation was applied, and the mixing parameter was set to 1 for our LASSO penalty.

Association of MS with corresponding traits

In both cohorts, the associations between each MS and their corresponding traits were assessed using logistic, linear, and ordinal logistic regression (depending on each trait, included as an outcome variable). Each MS was included as a predictor variable in separate models. Technical and biological variables (GS: age and sex; ALSPAC: age, 20 methylation PCs, and estimated proportions for five white blood cell types (CD4T, CD8T, natural killer cells, B-cells, Granulocytes, estimated using the Houseman method)) were included as covariates in each model. In GS, methylation PCs and cell type estimations were regressed out during pre-processing of the DNAm data and were therefore not included as covariates in downstream analyses. In ALSPAC, we calculated methylation PCs by first residualizing DNAm data on age, sex, and array, and then applying principal component analysis (PCA) on the residualised data. McFadden's R2 was calculated to determine the proportion of variance in each trait explained by each MS.

Association of MS with MDD

We then tested whether each MS was associated with MDD in GS (N=9502) using logistic regression, where MDD was included as an outcome variable, and each MS was included as a predictor variable. Three statistical models were performed for each MS individually, differing in covariates included. The example below is demonstrated using BMI MS, and these models were repeated for all other MS: Model 1: MDD ∼ age + sex + BMI MS, where the association between each MS and MDD without confounding variables was assessed. Model 2: MDD ∼ age + sex + BMI + BMI MS, where, for each MS, its corresponding phenotype in GS was included to estimate how much variance each MS explains in MDD when adjusting for its corresponding phenotypic measure. Model 3: MDD ∼ age + sex + BMI + smoking status + pack years + alcohol consumption + BMI MS, where further lifestyle factors, known to associate with both MDD and DNAm, were included to observe the proportion of variance explained by the MS when adjusting for these factors. The same models were run in ALSPAC (N=565), with differing technical and biological covariates: age, 20 methylation PCs, and estimated proportions for five white blood cell types (CD4T, CD8T, natural killer cells, B-cells, granulocytes). BMI, alcohol consumption, and smoking status were included in model 3 as lifestyle factors in ALSPAC. In a subset of GS (N=4432), which was created by splitting the sample into a training and testing sample, we further investigated whether an MDD MS would explain more variance in MDD than complex trait MS. The area under the curve was calculated for each MS and a ROC curve showing the ability of each score to discriminate between MDD cases and controls is shown in Figure 3 below.
Figure 3

Receiver Operating Characteristic (ROC) curve indicating the sensitivity (y-axis) and specificity (x-axis) of environmental MS (Bonferroni-corrected CpGs) and MDD MS for MDD. The AUC estimates are indicated in black for each predictor in each graph, and the AUC estimate for MDD MS is indicated in red in all graphs.

Finally, we conducted sensitivity analyses by stratifying the GS sample by sex and running models 2 and 3 in a women-only sample (N=5615, MDD cases=1163), as the ALSPAC sample consisted of women only; and including smoking status as a covariate in model 1 described above, to observe whether this attenuates the relationship between complex trait MS and MDD due to the widespread effect of smoking on the methylome.

Colocalization analysis

We hypothesised that some CpG sites included in the complex trait MS will have shared variants with MDD-associated SNPs. We used Howard et al.’s MDD GWAS for MDD-associated SNPs and GoDMC summary statistics (http://www.godmc.org.uk/) for mQTL analysis., We used the package “gwasglue” (https://mrcieu.github.io/gwasglue/) to extract SNPs that were +/- 1Mb of each of the 102 genome-wide significant SNPs identified in Howard et al. and then extracted the same SNPs within those regions from the GoDMC mQTL analysis. We used the “coloc.abf” function with default parameters in the “coloc” package in R to perform colocalization analysis for each SNP association. The method tests for five mutually exclusive scenarios in a genetic region: H0: there exist no causal variants for either trait; H1: there exists a causal variant for trait one only; H2: there exists a causal variant for trait two only; H3: there exist two distinct causal variants, one for each trait; and H4: there exists a single causal variant common to both traits. For regions of interest with a posterior probability of >0.5, we performed a manual look-up to identify whether any of the loci in these regions colocalize with genetic variation influencing CpG sites that comprise MS for the 6 complex traits investigated here, including smoking, alcohol consumption, educational attainment, BMI, HDL and total cholesterol. Our funding sources were not involved in the study preparation/design, analysis/interpretation of data, or in the writing and submission of this report.

Results

Demographic characteristics

In GS, there were N=9,502 individuals included in the final analyses (Table 1). In ALSPAC, there were N=565 individuals included in the final analyses (Table 2). Significant differences between cases and controls are indicated in Tables 1 and 2. For model 3, the sample size decreased due to exclusion of individuals who had incomplete lifestyle, disorder, and biochemical data (GS sample for final model=7,890; ALSPAC sample for final model=404). Demographic characteristics for the sub-sample used to test the MDD MS in GS (N=4,432) are included in Supplementary Table 6.
Table 1

Demographic characteristics for individuals with an MDD diagnosis and controls in GS (N=9,502); CSE=certificate of secondary education; GCSE=general certificate of secondary education; NVQ=national vocational qualification; HND=higher national diploma; HNC=higher national certificate. *= significant differences between MDD cases and controls.

Demographic characteristicMDD diagnosis (N=1,626)No MDD Diagnosis (N=7,876)Significance testing
*Age (mean, SD)48.23 (12.05)50.16 (13.83)t(2614)=5.7, p=1.35 × 10−8
*Sex (%)F=1,163 (72%)F=4,452 (57%)Χ2(1)=125.43, p=4.096 × 10−29
*BMI (mean, SD)27.41 (5.7)26.78 (4.89)t(2141)= -4.05, p=5.23 × 10−5
Alcohol units (mean, SD)10.41 (12.51)10.64 (12.09)t(2009)=0.64, p=0.522
*Smoking status (%)Χ2(3)=85.76, p=1.78 × 10−18
Current smoker395 (24%)1,215 (16%)
Former smokers (quit < 1 year ago)47 (3%)227 (3%)
Former smokers (quit > 1 year ago)454 (28%)2,155 (27%)
Never smoked tobacco696 (43%)4,105 (52%)
Missing34 (2%)174 (2%)
*Pack years (mean, SD)9.11 (14.18)7.66 (14.05)t(2270)=-3.58, p=3.49 × 10−4
*Educational attainmentΧ2(8)=16.29, p=0.038
No qualification134 (8%)634 (8%)
Other51 (3%)191 (3%)
School leavers’ certificate47 (3%)380 (5%)
CSE/equivalent4 (0.25%)31 (0.5%)
Standard grade/O-level/GCSE/equivalent192 (11.75%)968 (12%)
Higher grade/A-level/AS-level/equivalent150 (9%)729 (9%)
NVQ/HND/HNC/equivalent145 (9%)646 (8%)
Other professional/technical qualification334 (21%)1,561 (19.5%)
College/University degree461 (28%)2,190 (28%)
Missing108 (7%)546 (7%)
HDL cholesterol (mean, SD)1.48 (0.42)1.48 (0.41)t(2327)=0.14, p=0.891
Total cholesterol (mean, SD)5.21 (1.07)5.16 (1.06)t(2331)=-1.62, p=0.105
Demographic characteristics for individuals with an MDD diagnosis and controls in GS (N=9,502); CSE=certificate of secondary education; GCSE=general certificate of secondary education; NVQ=national vocational qualification; HND=higher national diploma; HNC=higher national certificate. *= significant differences between MDD cases and controls. Demographic characteristics for individuals with a MDD diagnosis and controls in ALSPAC (N=565, female only); CSE=certificate of secondary education. *= significant differences between MDD cases and controls. Associations between environmental and biochemical factors and MDD in GS (N=9,502) in logistic regression models. All variables are significantly associated with MDD apart from educational attainment. Prior to identifying well-powered MWASs of potential environmental risk factors, in those individuals with available DNAm data in GS (N=9,502) we ran regression models where age and sex were included as covariates, to measure associations of environmental, lifestyle, and biochemical variables with MDD. All variables investigated, as well as results from regression analyses, are available in Supplementary Table 4. Table 3 below indicates results for those variables that were nominally associated with MDD in GS and were also identified as having an established DNAm signature in previous well-powered MWAS. Educational attainment was not associated with MDD in GS, however it has been widely investigated in relation to DNAm and was therefore included in subsequent analyses here.
Table 3

Associations between environmental and biochemical factors and MDD in GS (N=9,502) in logistic regression models. All variables are significantly associated with MDD apart from educational attainment.

TraitBetaP-valueR2 (%)
HDL cholesterol-0.1160.00010.9%
Total cholesterol0.0690.0160.6%
Smoking status0.5671.13 × 10−163.2%
Alcohol (units)0.1030.000710.8%
BMI0.1491 × 10−80.9%
Educational attainment-0.0030.7660.6%

MS and corresponding traits

Each MS, which was included as a predictor, was first investigated in relation to its corresponding phenotype, as outcome, in regression models, along with technical and biological covariates (GS: age and sex; ALSPAC: age, 20 methylation PCs, and 5 cell types (CD4T, CD8T, natural killer cells, B-cells, granulocytes)). All MSs were associated with their phenotypic counterparts in both GS and ALSPAC. R2 for all analyses are presented in Supplementary Figs. 1 and 2. Tables 4 and 5 present results for both cohorts.
Table 4

Associations between environmental factors (outcome) and their corresponding MS in GS (N=9,502), where age and sex were included as covariates, in linear, logistic, and ordinal regression models. Where available (educational attainment, smoking status, alcohol units), associations are presented for MS calculated at multiple significance thresholds (p=methylome-wide (MW, Bonferroni-corrected CpGs), <0.01, <0.05, <0.1, <0.5).

MSOutcomeBetaP-valueR2 (%)
HDL cholesterol (MW)HDL cholesterol0.189< 2 × 10−163.5%
Total cholesterol (MW)Total cholesterol0.117< 2 × 10−161.4%
BMI (MW)BMI0.407< 2 × 10−1616.5%
Educational attainment
 MWEducational attainment0.3132.6 × 10−591.25%
 0.010.2782.04 × 10−461.07%
 0.050.2435.99 × 10−360.93%
 0.10.2253.03 × 10−310.86%
 0.50.2039.13 × 10−260.78%
Smoking status
 MWSmoking status1.457< 2 × 10−1624.1%
 0.011.251< 2 × 10−1618.8%
 0.051.158< 2 × 10−1616.6%
 0.11.120< 2 × 10−1615.7%
 0.51.040< 2 × 10−1613.8%
Alcohol units
 MWAlcohol units0.244< 2 × 10−165.9%
 0.010.1371.69 × 10−421.9%
 0.050.1149.82 × 10−301.3%
 0.10.1052.59 × 10−251.1%
 0.50.0899.03 × 10−190.8%
Table 5

Associations between environmental factors (outcome) and their corresponding MSs in ALSPAC (N=565), where age, 20 methylation PCs, and 5 cell types were included as covariates, in linear, logistic, and ordinal regression models. Where available (educational attainment, smoking status, alcohol units), associations are presented for MS calculated at multiple significance thresholds (p=methylome-wide (MW, Bonferroni-corrected CpGs), <0.01, <0.05, <0.1, <0.5).

MSOutcomeBetaP-valueR2 (%)
HDL cholesterol (MW)HDL cholesterol0.0780.0081.062%
Total cholesterol (MW)Total cholesterol-0.1160.0031.322%
BMI (MW)BMI1.1791.28 × 10−64.82%
Educational attainment
 MWEducational attainment0.2360.0045.26%
 0.010.2680.0085.16%
 0.050.1990.0714.93%
 0.10.2480.0315.01%
 0.50.3470.0055.21%
Smoking status
 MWSmoking status-2.5334.30 × 10−1419.08%
 0.01-2.4011.05 × 10−1212.75%
 0.05-2.3022.07 × 10−1110.79%
 0.1-2.2241.31 × 10−109.85%
 0.5-1.9911.08 × 10−87.67%
Alcohol units
 MWAlcohol units0.6601.02 × 10−83.26%
 0.010.5933.76 × 10−62.73%
 0.050.5431.88 × 10−52.53%
 0.10.5104.76 × 10−52.43%
 0.50.4155.42 × 10−42.20%

MS and MDD

We then examined each MS, as a predictor, with MDD as the outcome, in logistic regression models. Table 6 includes results from the regression model 1 (covariates: GS: age, sex; ALSPAC: age, 20 methylation PCs, and 5 cell types), model 2 (covariates: model 1+corresponding phenotype for each MS for both cohorts) and model 3 (covariates: GS: model 2+4 lifestyle factors; ALSPAC: model 2+3 lifestyle factors). Figures 1 and 2 include R2 for models 2 and 3 in GS and ALSPAC, respectively. Supplementary Tables 7 and 8 include R2 for all models in GS and ALSPAC, respectively.
Table 6

Associations between MDD and MS in GS and ALSPAC across three incremental models differing in covariates included (model 1 covariates: GS (N=9,502): age, sex; ALSPAC (N=565): age, 20 methylation PCs, and 5 cell types; model 2 covariates: model 1+corresponding phenotype for each MS for both cohorts; model 3 covariates: GS (N=7,890): model 2+4 lifestyle factors; ALSPAC (N=404): model 2+3 lifestyle factors), in logistic regression models. Where available (educational attainment, smoking status, alcohol units), associations are presented for MS calculated at multiple significance thresholds (p=methylome-wide (MW, Bonferroni-corrected CpGs), <0.01, <0.05, <0.1, <0.5). Statistically significant results are represented in bold.

GS
Model 1
Model 2
Model 3
MSBetaP-valueBetaP-valueBetaP-value
HDL cholesterol (MW)-0.1133.81 × 10−5-0.0970.0006-0.0620.05
Total cholesterol (MW)-0.0770.005-0.0860.002-0.0430.167
BMI (MW)0.1383.83 × 10−70.0920.0020.0510.128
Educational attainment
 MW-0.1429.77 × 10−8-0.1453.59 × 10−70.0940.038
 0.01-0.1487.25 × 10−8-0.1522.05 × 10−7-0.0440.198
 0.05-0.1256.30 × 10−6-0.1291.09 × 10−5-0.0420.203
 0.1-0.1201.65 × 10−5-0.1242.19 × 10−5-0.0460.153
 0.5-0.1097.37 × 10−5-0.1176.45 × 10−5-0.0520.109
Smoking status
 MW0.1602.89 × 10−90.0530.0950.0330.334
 0.010.1594.01 × 10−90.0700.0220.0500.128
 0.050.1577.58 × 10−90.0740.0140.0540.099
 0.10.1551.14 × 10−80.0750.0130.0540.094
 0.50.1484.83 × 10−80.0720.0150.0520.104
Alcohol units
 MW0.0610.030.0590.0440.0180.561
 0.01-0.0610.03-0.0660.031-0.0690.026
 0.05-0.0830.004-0.0850.005-0.0790.010
 0.1-0.0890.002-0.0900.003-0.0820.008
 0.5-0.0970.0006-0.0970.001-0.0830.007
Figure 1

Variance in MDD (indicated by R2 (%) on the y-axis) explained by each MS in (a) model 2 (covariates: age, sex, each MS's corresponding phenotype) and (b) model 3 (covariates: model 2 + 4 lifestyle factors, BMI, smoking, pack years, and alcohol consumption) in GS (N=9,502) in logistic regression models (N=9,502). Where available, R2 is calculated for MS at different thresholds (educational attainment, smoking status, alcohol consumption). MW=methylome-wide (Bonferroni-corrected CpGs). * = p-value < 0.05; ** = p-value < 1 × 10−5.

Figure 2

Variance in MDD (indicated by R2 (%) on the y-axis) explained by each MS in (a) model 2 (covariates: age, 20 methylation PCs, and 5 cell types, each MS's corresponding phenotype) and (b) model 3 (covariates: model 2 + 3 lifestyle factors, BMI, smoking, and alcohol consumption) in ALSPAC (N=565) in logistic regression models (N=565). Where available, R2 is calculated for MS at different thresholds (educational attainment, smoking status, alcohol consumption). MW=methylome-wide (Bonferroni-corrected CpGs). * = p-value < 0.05.

Variance in MDD (indicated by R2 (%) on the y-axis) explained by each MS in (a) model 2 (covariates: age, sex, each MS's corresponding phenotype) and (b) model 3 (covariates: model 2 + 4 lifestyle factors, BMI, smoking, pack years, and alcohol consumption) in GS (N=9,502) in logistic regression models (N=9,502). Where available, R2 is calculated for MS at different thresholds (educational attainment, smoking status, alcohol consumption). MW=methylome-wide (Bonferroni-corrected CpGs). * = p-value < 0.05; ** = p-value < 1 × 10−5. Variance in MDD (indicated by R2 (%) on the y-axis) explained by each MS in (a) model 2 (covariates: age, 20 methylation PCs, and 5 cell types, each MS's corresponding phenotype) and (b) model 3 (covariates: model 2 + 3 lifestyle factors, BMI, smoking, and alcohol consumption) in ALSPAC (N=565) in logistic regression models (N=565). Where available, R2 is calculated for MS at different thresholds (educational attainment, smoking status, alcohol consumption). MW=methylome-wide (Bonferroni-corrected CpGs). * = p-value < 0.05. As the replication analyses in ALSPAC consisted of women only, we further stratified the GS sample by sex and ran models 2 and 3 in a women-only sample (N=5615, MDD cases=1163), with results available in Supplementary Table 9. Briefly, analyses restricted to women in GS showed similar results to the sex-adjusted analyses in GS, where MS were associated with MDD after adjustment for their phenotypic counterparts, but not when including further lifestyle factors. In addition, due to the known effects of smoking on the methylome, we included smoking status as a covariate in model 1 for all non-smoking traits to identify whether this attenuates the relationship between MDD and complex trait MS without adjusting for other covariates. Results are available in Supplementary Table 10. In both GS and ALSPAC, the effect for all complex trait MS was attenuated by the inclusion of smoking. In GS, all complex trait MS remained significant in their association with MDD, except for educational attainment. In ALSPAC, results remained non-significant as below.

MS and MDD – subset analysis

To investigate whether an MDD MS would out-perform complex trait MSs in the discrimination between MDD cases and controls, we additionally trained an MDD MS in a subset of individuals with DNAm data in GS (N=5078, MDD=1223), where 78 CpGs were selected (Supplementary Table 11). We then calculated an MDD MS in a second subset (N=4432, MDD=408). See Figure 3 for a Receiver Operating Characteristic (ROC) curve showing the ability of each complex trait MS and MDD MS to discriminate between MDD cases and controls. The two complex trait MS that outperformed MDD MS performance (AUC=0.553) are smoking (AUC=0.569) and educational attainment (AUC=0.585). We applied a DeLong test to identify whether this outperformance is statistically significant. Educational attainment and smoking MS were both non-significant when compared to a MDD MS in predicting MDD: educational attainment D=1.814, p=0.07; smoking D=0.913, p=0.36, indicating that although the AUC for the two complex traits is higher, the difference is not statistically significant. Receiver Operating Characteristic (ROC) curve indicating the sensitivity (y-axis) and specificity (x-axis) of environmental MS (Bonferroni-corrected CpGs) and MDD MS for MDD. The AUC estimates are indicated in black for each predictor in each graph, and the AUC estimate for MDD MS is indicated in red in all graphs. Colocalization analysis indicated that there was no strong evidence (PP4>0.8, PP4/PP3 >5) for a single SNP being associated with both MDD and DNAm at CpGs encompassing the MS. The posterior probability for one region was supportive of a suggestive co-localized association signal for both MDD and DNAm in that region (PP4=0.71). Within this region, the SNP with the highest posterior probability of being a causal SNP (66%) was rs73163796, which colocalized with genetic variation influencing a smoking-associated CpG site, cg15099418. Supplementary Excel File 1 contains results for all 102 regions investigated in colocalization analysis.

Discussion

We created MSs for 6 environmental and biochemical risk factors for MDD, namely HDL and total cholesterol, BMI, educational attainment, smoking status, and alcohol consumption, in two cohorts, GS (N=9,502) and ALSPAC (N=565). Methylome-wide scores, and where available, scores at multiple p-value thresholds (educational attainment, smoking status, and alcohol consumption), showed significant associations with their corresponding traits and with MDD after adjustment for their phenotypic counterparts. Most findings attenuated and became non-significant after adjustment for further lifestyle factors. Smoking and education MS marginally outperformed a MDD MS in discriminating between MDD cases and controls in a GS sub-sample (N=4,432). Finally, colocalization analysis showed that genetic variants are shared between a smoking associated CpG site (cg15099418) and MDD. Each MS was significantly associated with its corresponding phenotype in both cohorts (GS βrange=0.089–1.457; ALSPAC βrange=0.078–2.533). All training MWASs consisted of large sample sizes (Nrange=725 (HDL and total cholesterol) – 15,907 (smoking status)), included relevant covariates, and results were consistent where replication cohorts were included (see Supplementary Table 1). All of the training MWAS for phenotypes investigated here were sufficiently predictive of the trait in our independent samples (see Tables 4 and 5). The variance explained by MS was 5% for HDL cholesterol and 12%–27.3% alcohol consumption in Braun et al. and Liu et al., respectively. Other studies that applied penalised regression to derive methylation predictors of environmental factors identified similar proportions of variance explained: 12.5% for BMI and alcohol consumption, 60.9% for smoking, 2.5% for educational attainment, 2.7% for total cholesterol, and 15.6% for HDL cholesterol. These results are consistent with findings here. In contrast, we previously found that a MDD MS explains 1.75% of the variance in MDD, and attenuates when including lifestyle factors (0.68%). This indicates that, although there is evidence of an association between DNAm and MDD, the relationship is not as strong as with lifestyle factors, which is in line with previous evidence. Associations between environmental factors (outcome) and their corresponding MS in GS (N=9,502), where age and sex were included as covariates, in linear, logistic, and ordinal regression models. Where available (educational attainment, smoking status, alcohol units), associations are presented for MS calculated at multiple significance thresholds (p=methylome-wide (MW, Bonferroni-corrected CpGs), <0.01, <0.05, <0.1, <0.5). Associations between environmental factors (outcome) and their corresponding MSs in ALSPAC (N=565), where age, 20 methylation PCs, and 5 cell types were included as covariates, in linear, logistic, and ordinal regression models. Where available (educational attainment, smoking status, alcohol units), associations are presented for MS calculated at multiple significance thresholds (p=methylome-wide (MW, Bonferroni-corrected CpGs), <0.01, <0.05, <0.1, <0.5). Associations between MDD and MS in GS and ALSPAC across three incremental models differing in covariates included (model 1 covariates: GS (N=9,502): age, sex; ALSPAC (N=565): age, 20 methylation PCs, and 5 cell types; model 2 covariates: model 1+corresponding phenotype for each MS for both cohorts; model 3 covariates: GS (N=7,890): model 2+4 lifestyle factors; ALSPAC (N=404): model 2+3 lifestyle factors), in logistic regression models. Where available (educational attainment, smoking status, alcohol units), associations are presented for MS calculated at multiple significance thresholds (p=methylome-wide (MW, Bonferroni-corrected CpGs), <0.01, <0.05, <0.1, <0.5). Statistically significant results are represented in bold. MSs calculated at different p-value thresholds (methylome-wide, 0.01, 0.05, 0.1, 0.5) indicated that the most predictive threshold for each trait was the most conservative one. CpGs meeting a less stringent p-value threshold in the score captured less phenotypic variance for each corresponding trait (R2: smoking=24.1% and 19.08% for methylome-wide threshold compared to 13.8% and 7.67% for p < 0.05 threshold in GS and ALSPAC, respectively; education=1.25% and 5.26% for methylome-wide threshold compared to 0.78% and 5.21% for p < 0.05 threshold in GS and ALSPAC, respectively; alcohol=5.9% and 3.26% for methylome-wide threshold compared to 0.8% and 2.20% for p < 0.05 threshold in GS and ALSPAC, respectively). Consistent with this pattern, previous studies suggest that the optimal p-value threshold strongly depends on the epigenetic architecture of the trait, as well as on the strength of supporting data. Lifestyle traits such as smoking, alcohol, and BMI show widespread associations with peripheral blood DNAm,9, 10, 11, 12, 13 and indeed MWAS for lifestyle traits investigated here have large sample sizes as well as the largest number of associated CpGs at a methylome-wide threshold. All CpGs significantly associated with educational attainment were also found to be associated with smoking, which may explain the similar association pattern to lifestyle factors. In GS, each MS was significantly associated with MDD before corresponding trait and lifestyle factor adjustment, although this was only replicated for methylome-wide educational attainment in ALSPAC. When including smoking status in the covariate-free MS-MDD associations, the effects for all complex trait MSs were attenuated but, with the exception of educational attainment, remained significant (Supplementary Table 10). Although the CpGs (N=11) associated with educational attainment, were adjusted for smoking status in the original MWAS, all 11 were also smoking-associated CpGs, indicating that the two traits share an epigenetic signature and that self-reported smoking status may not be sufficient to correct for smoking signals. In GS, the epigenetic signature of each trait explained additional variance to its phenotypic counterpart in MDD, although effect sizes were small across all traits (methylome-wideβrange=0.053–0.145). For traits where MSs were available at multiple p-value thresholds, the variance explained increased for MSs that included CpGs at a larger p-value threshold for smoking (R2=0.03% methylome-wide MS to 0.07% p < 0.05) and alcohol consumption (R2=0.05% methylome-wide MS to 0.1% p < 0.05) and decreased for educational attainment (R2=0.28% methylome-wide MS to 0.20% p < 0.05). This suggests that for lifestyle traits with widespread effects on the methylome, including a larger number of associated CpG sites increased prediction accuracy for MDD, although the effect size did not differ significantly (smokingβ methylome-wide MS=0.053 to smokingβ p < 0.05=0.072; alcohol consumptionβ methylome-wide MS=0.059 to alcohol consumptionβ p < 0.05=-0.097; educational attainmentmethylome-wide MS=-0.145 to educational attainmentβ p < 0.05=-0.117). After adjusting for lifestyle factors that are known to associate with MDD and DNAm (smoking, pack years, BMI, and alcohol consumption), methylome-wide educational attainment and alcohol consumption at 4 p-value thresholds (p < 0.01-0.5) remained significantly associated with MDD, suggesting that certain lifestyle factors may attenuate the relationship between epigenetic signatures of specific traits and MDD. This is surprising given that smoking is known to have much larger effects on the methylome than alcohol consumption. However, smoking was included as a covariate in all previous MWASs used for the MSs calculated here, while the smoking MWAS did not adjust for any lifestyle factors, suggesting that the smoking MS captures other environment-related CpGs whose effect is attenuated by phenotypic measures of lifestyle factors. This pattern of results is consistent with previous studies where disease-relevant phenotypes attenuate associations between MS for complex traits and disease. For instance, Yu et al. investigated associations between smoking MS and lung cancer before and after adjustment for phenotypic smoking status and pack years. They found that the phenotypes attenuated the association between smoking MS and lung cancer, with odds ratio decreasing across different risk score quartiles. Here, we showed that MS for complex traits enhance MDD risk prediction when added to phenotypic measures of these traits. DNA methylation may represent an archive of exposure to environmental factors that are relevant to MDD and may contribute to disease vulnerability. However, lifestyle factors may play an important role in this relationship. Here, they were shown to attenuate the association between MDD and complex trait MS, indicating that they may interact with widespread DNAm in their association with MDD. This is not surprising, as we have previously found that a MDD MS was significantly associated with smoking and alcohol consumption. Although in our previous study the MDD MS enhanced MDD risk prediction when modelled alongside lifestyle factors, here complex trait MS effects are attenuated in the same scenario. This may show that phenotypic measures of lifestyle factors play a greater role in MDD development than methylation marks for complex traits. However, effect sizes here were small, and further studies will be needed to determine the extent of the role DNA methylation plays in MDD. While the association between each MS and their corresponding phenotype was replicated in ALSPAC, analyses investigating MDD were not. When the MS was modelled together with its phenotypic counterpart, effects were in the same direction with GS across all traits with the exception of BMI, which was positive in GS but negative in ALSPAC. The opposite pattern was shown in terms of variance explained for MS calculated at multiple thresholds, where explained variance in MDD decreased with a less stringent threshold for smoking (R2=0.06% methylome-wide MS to 0% p < 0.05) and alcohol consumption (R2=0.32% methylome-wide MS to 0% p < 0.05) and increased for educational attainment (R2=0.14% methylome-wide MS to 0.97% p < 0.05). The variance explained in MDD did not exceed 1% for any of the traits in GS. Larger cohorts may, therefore, be required to elucidate the link between MDD and epigenetic signatures of lifestyle factors. The MWASs used to create MS in the current study uncovered CpGs localised to a number of genes that may be of relevance to MDD. The MWAS of educational attainment identified genes implicated in neuronal, immune, and developmental processes; alcohol consumption-associated CpGs were localised to genes involved in cellular response to stress and chemicals, and immune functions; BMI-associated CpGs were linked to genes that played a role in lipid metabolism, inflammation, metabolic, cardiovascular, respiratory, and neoplastic disease; smoking-related methylation marks were localised to genes implicated in smoking-related diseases (osteoporosis, colorectal cancers, chronic obstructive pulmonary disease, pulmonary function, cardiovascular disease, rheumatoid arthritis); finally, HDL and total cholesterol-associated CpGs were annotated to genes implicated in cholesterol metabolism. Most processes identified in these MWAS have also been previously associated with MDD and antidepressant use, specifically immune and neuronal processes., It is therefore possible that some associations between MDD and complex trait MS in this study may arise as a result of the processes in which the genes above participate, although further studies are needed to confirm this. Differences in results between the two cohorts may be attributable to sample size (NGS=9,502; NALSPAC=565) and phenotypic differences. Firstly, the ALSPAC sample consists of women only. However, analyses restricted to women in GS (N=5,615) showed similar results to the sex-adjusted analyses in GS (see Supplementary Table 9), indicating that the lack of replication may be due to factors such as the much smaller sample size in ALSPAC (N=565) rather than sex. Further, although the replication sample was matched in age (GSmean age=49.82, ALSPACmean age=47.96), there were differences in lifestyle factors between the two cohorts. For instance, 18% of participants in GS smoked at the time of blood draw, as opposed to 8% in ALSPAC; 28% and 24% of individuals held a university degree in GS and ALSPAC, respectively. In addition to this, all participants in ALSPAC had some form of education qualification, whereas 8% of GS participants held no qualifications. Finally, BMI was lower in ALSPAC (mean=24.99) as compared to GS (mean=26.89). Further, similarly to GS, all MSs in ALSPAC explained a significant proportion of variance in their corresponding phenotypic traits, with non-replicating analyses occurring only when MDD was investigated. MDD was assessed differently in the two cohorts: in GS, this was measured using SCID, while in ALSPAC, MDD status was determined by classifying participants with a score of >13 on the EPDS as cases. Previous studies have shown that EPDS approximated SCID-based prevalence overall, although considerable heterogeneity between cohorts may play a role in this approximation. A MDD MS tested in a GS sub-sample (N=4,432) was outperformed by smoking and education MSs in predicting MDD, although predictive values were low for all MS (MDDAUC=0.553, smokingAUC=0.569, educationAUC=0.585). DNAm is highly predictive of smoking, and there is a strong overlap of smoking-associated CpGs in the educational attainment MWAS used to calculate the MS. The two predictors are also highly correlated (r=-0.720). Results therefore suggest that epigenetic signatures of lifestyle traits showing more widespread associations with DNAm are marginally more predictive of MDD than an MDD-specific predictor, although current results are limited by lack of large MWAS of MDD. There are several key strengths to this study. Firstly, GS is one of the largest population-based cohorts containing DNAm and a broad range of lifestyle, disorder, and environmental variables. Secondly, this study provides insight into associations between MSs for lifestyle and biochemical factors in relation to MDD across multiple p-value thresholds. Finally, we used a second large, population-based study as a replication cohort, which similarly contains a range of lifestyle and environmental variables, in addition to DNAm. Despite these strengths, a number of potential limitations to the current study also need to be considered. Firstly, although GS uses the EPIC array, capturing DNAm at approximately 850K sites, previous MWAS are limited by use of the 450K array, which measures methylation at 450K CpGs. Using a larger array may improve the predictive accuracy for environmental traits, which in turn may lead to more precise associations in relation to MDD. Secondly, DNAm was collected from blood samples in both cohorts and in all previous MWASs, which may not be the most relevant tissue for MDD. However, previous studies have shown robust associations between peripheral blood-based methylation predictors and MDD.,, Participants in GS and ALSPAC are predominantly of European ancestry, and the generalisability to diverse ancestries is unknown. Finally, as mentioned above, ALSPAC contained only women who smoked less and had a lower BMI than GS participants; GS participants reported a higher level of educational attainment than ALSPAC women. Although stratifying the GS cohort by women only indicated that results are not due to sex differences, results here may be due to these phenotypic differences, and future studies should select replication cohorts that are analogous to the training cohort. In the current study we showed that epigenetic signatures of lifestyle and biochemical factors are associated with MDD after adjustment for their phenotypic counterparts, but not when including a broader number of lifestyle factors. Results were not replicated in a second cohort, which may be due to phenotypic differences compared to the main cohort as well as the much smaller sample size. Lifestyle variables are significant in terms of DNAm-related risk to MDD, and efforts should be made in future to disentangle the relationship between these lifestyle factors, DNAm, and MDD. Our study demonstrates the value and necessity of large DNAm datasets for discovery and replication within and between cohorts.

Contributors

MCB, HCW, and AMM were responsible for the project conceptualisation, methodology and validation. MCB, ASFK, MJA, and AC carried out the data curation. CA, RMW, SWM, and KLE were responsible for DNA methylation data quality check and pre-processing in Generation Scotland and ALSPAC. CL, JvD, and MG were responsible for providing summary statistics for the current study. JLM was responsible for providing results in the form of summary statistics (GoDMC). CR and DJP are project directors for ALSPAC and Generation Scotland, respectively. MCB, HCW, and AMM were responsible for the decision to submit. MCB was responsible for formal analysis and writing the original draft and visualisation. XS additionally verified the underlying data and analysis. MCB, CA, ASFK, XS, MJA, DMH, RMW, SWM, JLM, CL, JvD, MG, CR, DJP, AC, KLE, HCW, and AMM reviewed versions of the manuscript. MCB, HCW, and AMM were responsible for manuscript editing and review. AMM and HCW were responsible for supervision, project administration, resources, and funding acquisition. MCB, CA, ASFK, XS, MJA, DMH, RMW, SWM, JLM, CL, JvD, MG, CR, DJP, AC, KLE, HCW, and AMM were not precluded from accessing data in the study and they accept responsibility to submit for publication. All authors read and approved the final version of the manuscript.

Data sharing statement

Data used in the preparation of this article were obtained from the Generation Scotland (GS) cohort (https://www.ed.ac.uk/generation-scotland) and the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort (http://www.bristol.ac.uk/alspac/). Qualified researchers can request access to GS data through a research proposal, accessed at the following link: https://www.ed.ac.uk/generation-scotland/for-researchers/access and to ALSPAC, accessed at the following link: http://www.bristol.ac.uk/alspac/researchers/access/. Scripts for the analyses in this project can be found in the Supplementary RMarkdown file.

Declaration of interests

MCB has received financial support from Edinburgh Neuroscience Researcher's Fund, Wellcome Trust Institutional Translational Partnership Award Innovation Competition, and Research Adaptation Fund to attend courses and conferences in the past. RMW has received financial support from Alzheimer's Research UK (ARUK) to attend the ARUK annual conference (2021 and 2022). JLM is supported by the UK Medical Research Council Integrative Epidemiology Unit at the University of Bristol. AC is a University of Edinburgh Medical Research Ethics Committee member. JvdD was supported by NWO Large Scale infrastructures, X-Omics (184.034.019). Remaining authors report no conflicts of interest.
  37 in total

1.  Cohort Profile: Generation Scotland: Scottish Family Health Study (GS:SFHS). The study, its participants and their potential for genetic research on health and illness.

Authors:  Blair H Smith; Archie Campbell; Pamela Linksted; Bridie Fitzpatrick; Cathy Jackson; Shona M Kerr; Ian J Deary; Donald J Macintyre; Harry Campbell; Mark McGilchrist; Lynne J Hocking; Lucy Wisely; Ian Ford; Robert S Lindsay; Robin Morton; Colin N A Palmer; Anna F Dominiczak; David J Porteous; Andrew D Morris
Journal:  Int J Epidemiol       Date:  2012-07-10       Impact factor: 7.196

2.  A methylation study of long-term depression risk.

Authors:  Shaunna L Clark; Mohammad W Hattab; Robin F Chan; Andrey A Shabalin; Laura K M Han; Min Zhao; Johannes H Smit; Rick Jansen; Yuri Milaneschi; Lin Ying Xie; Gerard van Grootheest; Brenda W J H Penninx; Karolina A Aberg; Edwin J C G van den Oord
Journal:  Mol Psychiatry       Date:  2019-09-09       Impact factor: 15.992

3.  Epigenetic Signatures of Cigarette Smoking.

Authors:  Roby Joehanes; Allan C Just; Riccardo E Marioni; Luke C Pilling; Lindsay M Reynolds; Pooja R Mandaviya; Weihua Guan; Tao Xu; Cathy E Elks; Stella Aslibekyan; Hortensia Moreno-Macias; Jennifer A Smith; Jennifer A Brody; Radhika Dhingra; Paul Yousefi; James S Pankow; Sonja Kunze; Sonia H Shah; Allan F McRae; Kurt Lohman; Jin Sha; Devin M Absher; Luigi Ferrucci; Wei Zhao; Ellen W Demerath; Jan Bressler; Megan L Grove; Tianxiao Huan; Chunyu Liu; Michael M Mendelson; Chen Yao; Douglas P Kiel; Annette Peters; Rui Wang-Sattler; Peter M Visscher; Naomi R Wray; John M Starr; Jingzhong Ding; Carlos J Rodriguez; Nicholas J Wareham; Marguerite R Irvin; Degui Zhi; Myrto Barrdahl; Paolo Vineis; Srikant Ambatipudi; André G Uitterlinden; Albert Hofman; Joel Schwartz; Elena Colicino; Lifang Hou; Pantel S Vokonas; Dena G Hernandez; Andrew B Singleton; Stefania Bandinelli; Stephen T Turner; Erin B Ware; Alicia K Smith; Torsten Klengel; Elisabeth B Binder; Bruce M Psaty; Kent D Taylor; Sina A Gharib; Brenton R Swenson; Liming Liang; Dawn L DeMeo; George T O'Connor; Zdenko Herceg; Kerry J Ressler; Karen N Conneely; Nona Sotoodehnia; Sharon L R Kardia; David Melzer; Andrea A Baccarelli; Joyce B J van Meurs; Isabelle Romieu; Donna K Arnett; Ken K Ong; Yongmei Liu; Melanie Waldenberger; Ian J Deary; Myriam Fornage; Daniel Levy; Stephanie J London
Journal:  Circ Cardiovasc Genet       Date:  2016-09-20

4.  Cohort Profile: the Avon Longitudinal Study of Parents and Children: ALSPAC mothers cohort.

Authors:  Abigail Fraser; Corrie Macdonald-Wallis; Kate Tilling; Andy Boyd; Jean Golding; George Davey Smith; John Henderson; John Macleod; Lynn Molloy; Andy Ness; Susan Ring; Scott M Nelson; Debbie A Lawlor
Journal:  Int J Epidemiol       Date:  2012-04-16       Impact factor: 7.196

5.  Methylome-wide association study of antidepressant use in Generation Scotland and the Netherlands Twin Register implicates the innate immune system.

Authors:  Miruna C Barbu; Floris Huider; Archie Campbell; Carmen Amador; Mark J Adams; Mary-Ellen Lynall; David M Howard; Rosie M Walker; Stewart W Morris; Jenny Van Dongen; David J Porteous; Kathryn L Evans; Edward Bullmore; Gonneke Willemsen; Dorret I Boomsma; Heather C Whalley; Andrew M McIntosh
Journal:  Mol Psychiatry       Date:  2021-12-08       Impact factor: 13.437

6.  Integration of disease association and eQTL data using a Bayesian colocalisation approach highlights six candidate causal genes in immune-mediated diseases.

Authors:  Hui Guo; Mary D Fortune; Oliver S Burren; Ellen Schofield; John A Todd; Chris Wallace
Journal:  Hum Mol Genet       Date:  2015-03-05       Impact factor: 6.150

7.  Individual and joint contributions of genetic and methylation risk scores for enhancing lung cancer risk stratification: data from a population-based cohort in Germany.

Authors:  Haixin Yu; Janhavi R Raut; Ben Schöttker; Bernd Holleczek; Yan Zhang; Hermann Brenner
Journal:  Clin Epigenetics       Date:  2020-06-18       Impact factor: 6.551

8.  Epigenome-wide association study of depression symptomatology in elderly monozygotic twins.

Authors:  A Starnawska; Q Tan; M Soerensen; M McGue; O Mors; A D Børglum; K Christensen; M Nyegaard; L Christiansen
Journal:  Transl Psychiatry       Date:  2019-09-02       Impact factor: 6.222

9.  Impact of admixture and ancestry on eQTL analysis and GWAS colocalization in GTEx.

Authors:  Nicole R Gay; Michael Gloudemans; Margaret L Antonio; Nathan S Abell; Brunilda Balliu; YoSon Park; Alicia R Martin; Shaila Musharoff; Abhiram S Rao; François Aguet; Alvaro N Barbeira; Rodrigo Bonazzola; Farhad Hormozdiari; Kristin G Ardlie; Christopher D Brown; Hae Kyung Im; Tuuli Lappalainen; Xiaoquan Wen; Stephen B Montgomery
Journal:  Genome Biol       Date:  2020-09-11       Impact factor: 13.583

10.  Multiple lifestyle factors and depressed mood: a cross-sectional and longitudinal analysis of the UK Biobank (N = 84,860).

Authors:  Jerome Sarris; Russell Thomson; Fiona Hargraves; Melissa Eaton; Michael de Manincor; Nicola Veronese; Marco Solmi; Brendon Stubbs; Alison R Yung; Joseph Firth
Journal:  BMC Med       Date:  2020-11-12       Impact factor: 8.775

View more
  1 in total

1.  Editorial: Psychiatric Comorbidities in the Epilepsies: Extensive Mechanisms and Broad Questions.

Authors:  Rafael Naime Ruggiero; Jose Eduardo Peixoto-Santos; Lezio Soares Bueno-Junior; Kette D Valente; Joao Pereira Leite
Journal:  Front Integr Neurosci       Date:  2022-06-17
  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.