Literature DB >> 29187746

Genome-wide haplotype-based association analysis of major depressive disorder in Generation Scotland and UK Biobank.

David M Howard¹, Lynsey S Hall², Jonathan D Hafferty², Yanni Zeng^2,3, Mark J Adams², Toni-Kim Clarke², David J Porteous⁴, Reka Nagy³, Caroline Hayward^3,5, Blair H Smith^5,6, Alison D Murray^5,7, Niamh M Ryan⁴, Kathryn L Evans^4,8, Chris S Haley³, Ian J Deary^5,8,9, Pippa A Thomson^4,8, Andrew M McIntosh^2,5,8.

Abstract

Genome-wide association studies using genotype data have had limited success in the identification of variants associated with major depressive disorder (MDD). Haplotype data provide an alternative method for detecting associations between variants in weak linkage disequilibrium with genotyped variants and a given trait of interest. A genome-wide haplotype association study for MDD was undertaken utilising a family-based population cohort, Generation Scotland: Scottish Family Health Study (n = 18,773), as a discovery cohort with UK Biobank used as a population-based replication cohort (n = 25,035). Fine mapping of haplotype boundaries was used to account for overlapping haplotypes potentially tagging the same causal variant. Within the discovery cohort, two haplotypes exceeded genome-wide significance (P < 5 × 10-8) for an association with MDD. One of these haplotypes was nominally significant in the replication cohort (P < 0.05) and was located in 6q21, a region which has been previously associated with bipolar disorder, a psychiatric disorder that is phenotypically and genetically correlated with MDD. Several haplotypes with P < 10-7 in the discovery cohort were located within gene coding regions associated with diseases that are comorbid with MDD. Using such haplotypes to highlight regions for sequencing may lead to the identification of the underlying causal variants.

Entities: Chemical

Mesh：

Year: 2017 PMID： 29187746 PMCID： PMC5802488 DOI： 10.1038/s41398-017-0010-9

Source DB: PubMed Journal: Transl Psychiatry ISSN： 2158-3188 Impact factor: 7.989

Introduction

Major depressive disorder (MDD) is a complex and clinically heterogeneous condition with core symptoms of low mood and/or anhedonia over a period of at least two weeks. MDD is frequently comorbid with other clinical conditions, such as cardiovascular disease[1], cancer[2] and inflammatory diseases[3]. This complexity and comorbidity suggests heterogeneity of aetiology and may explain why there has been limited success in identifying causal genetic variants[4-7], despite heritability estimates ranging from 28 to 37%[8,9]. Single-nucleotide polymorphism (SNP)-based analyses are unlikely to fully capture the variation in regions surrounding the genotyped markers, including untyped lower-frequency variants and those that are in weak linkage disequilibrium (LD) with the common SNPs on many genotyping arrays. Haplotype-based analysis may help improve the detection of causal genetic variants as, unlike single SNP-based analysis, it is possible to assign the strand of sequence variants and combine information from multiple SNPs to identify rarer causal variants. A number of studies[10-12] have identified haplotypes associated with MDD, albeit by focussing on particular regions of interest. In the current study, a family and population-based cohort Generation Scotland: Scottish Family Health Study (GS:SFHS) was utilised to ascertain genome-wide haplotypes in closely and distantly related individuals[13]. A haplotype-based association analysis was conducted using MDD as a phenotype, followed by additional fine mapping of haplotype boundaries with a replication and meta-analysis performed using the UK Biobank cohort[14].

Materials and methods

Discovery cohort

The discovery phase of the study used the family and population-based Generation Scotland: Scottish Family Health Study (GS:SFHS) cohort[13], consisting of 23,960 individuals of whom 20,195 were genotyped with the Illumina OmniExpress BeadChip (706,786 SNPs). Individuals with a genotype call rate <98% were removed, as well as those SNPs with a call rate <98%, a minor allele frequency (MAF) < 0.01 or those deviating from Hardy–Weinberg equilibrium (P < 10−6). Individuals who were identified as population outliers through principal component analyses of their genotypic information were also removed[15]. Following quality control there were 19,904 GS:SFHS individuals (11,731 females and 8173 males) that had genotypic information for 561,125 autosomal SNPs. These individuals ranged from 18–99 years of age with an average age of 47.4 years and a standard deviation of 15.0 years. There were 4933 families that had at least two related individuals, this included 1799 families with two members, 1216 families with three members and 829 families with four members. The largest family group consisted of 31 related individuals and there were 1789 individuals that had no other family members within GS:SFHS.

Replication cohort

The population-based UK Biobank[16] (provided as part of project #4844) was used as a replication cohort to assess those haplotypes within GS:SFHS with P < 10−6. The UK Biobank data consisted of 152,249 individuals with genomic data for 72,355,667 imputed variants[17]. The SNPs genotyped in GS:SFHS were extracted from the UK Biobank data and those variants with an imputation accuracy <0.8 were removed, leaving 555,782 variants in common between the two cohorts. Those genotyped individuals listed as non-white British and those that had also participated in GS:SFHS were removed from within UK Biobank, leaving a total of 119,955 individuals.

Genotype phasing and haplotype formation

The genotype data for GS:SFHS and UK Biobank was phased using SHAPEIT v2.r837[18]. Genome-wide phasing was conducted on the GS:SFHS cohort, while the phasing of UK Biobank was conducted on a 50 Mb window centred on those haplotypes identified within GS:SFHS with P < 10−6. The relatedness within GS:SFHS made it suitable for the application of the duoHMM method, which improves phasing accuracy by also incorporating family information[19]. The default window size of 2 Mb was used for UK Biobank and a 5 Mb window was used for GS:SFHS as larger window sizes have been demonstrated to be beneficial when there is increased identity by descent (IBD) in the population[18]. The number of conditioning states per SNP was increased from the default of 100 states to 200 states to improve phasing accuracy, with the default effective population size of 15,000 used. To calculate the recombination rates between SNPs during phasing the HapMap phase II b37[20] was used. This build was also used to partition the phased data into haplotypes. Three window sizes (1cM, 0.5cM and 0.25cM) were used to establish the SNPs that formed each haplotype[21]. Each window was then moved along the genome by a quarter of the respective window size. There were a total of 97,333 windows with a mean number of SNPs per window of 157, 79 and 34 for the 1, 0.5 and 0.25cM windows, respectively. Windows that were less five SNPs in length were removed. The frequency (p) of each observed haplotype (A) was calculated as:where a represents all other haplotypes in that window. A chi-squared test for Hardy–Weinberg equilibrium (X 2) for each haplotype was calculated as:where n is the number of individuals and q = 1 − p. Haplotypes with 0.995 < p < 0.005 or with X 2 > 24 (P < 10−6) were not tested for association, however, they were included within the alternative haplotype. Following this quality control there were a total of 2,618,094 haplotypes remaining for analysis. The reported haplotype positions relate to the outermost SNPs within each haplotype are in base pair (bp) position according to GRCh37. To approximate the number of independently segregating haplotypes the clump command within Plink v1.90[22] was applied. This provides an estimation of the Bonferroni correction required for multiple testing. When applying an LD r threshold of <0.4 there were 1,070,216 independently segregating haplotypes within GS:SFHS, equating to a P-value < 5 × 10−8 for genome-wide significance. This threshold is also frequently applied to SNP-based and sequence-based association studies to account for multiple testing[23].

Phenotype ascertainment

Discovery cohort

Within GS:SFHS a diagnosis of MDD was made using initial screening questions and the Structured Clinical Interview for the Diagnostic and Statistical Manual of Mental Disorders (SCID)[24]. The SCID is an internationally validated approach to identifying episodes of depression and was conducted by clinical nurses trained in its administration. Further details regarding this diagnostic assessment have been described previously[25]. In this study, MDD was defined by at least one instance of a major depressive episode which initially identified 2659 cases, 17,237 controls and 98 missing (phenotype unknown) individuals. In addition, the psychiatric history of cases and controls was examined using the Scottish Morbidity Record[26]. Within the control group, 1072 participants were found to have attended at least one psychiatry outpatient clinic and were excluded from the study. In addition, 47 of the MDD cases were found to have additional diagnoses of either bipolar disorder or schizophrenia in psychiatric inpatient records and were also excluded from the study. These participants had given prior consent for anonymised access to routine administrative clinical data. In total there were 2605 MDD cases and 16,168 controls following the removal of individuals based on patient records and population stratification, equating to a prevalence of 13.9% for MDD in this cohort.

Replication cohort

Within the UK Biobank cohort, 25,035 participants (12,528 males and 12,507 females) completed a touchscreen assessment of depressive symptoms and previous treatment. These participants ranged from 40 to 79 years of age with a mean age of 57.8 years and a standard deviation of 8.0 years. On the basis of their responses to items from the Patient Health Questionnaire, diagnostic status was defined as either ‘probable single lifetime episode of major depression’ or ‘probable recurrent major depression (moderate and severe)’ and with control status defined as ‘no mood disorder’ using the definitions provided by Smith et al.[14]. MDD Cases were defined by reporting that they had ever been depressed/down for a whole week (UK Biobank field number 4598); plus this was for at least a two week period (UK Biobank field number 4609); plus this was for at least one episode (UK Biobank field number 4620); plus ever seen a GP (UK Biobank field number 2090) or psychiatrist (UK Biobank field number 2100) for nerves, anxiety, tension or depression. Alternatively, MDD cases were also defined by reporting that they had ever been uninterested in things or unable to enjoy the things you used to for at least a whole week (UK Biobank field number 4631); plus this was for at least a two week period (UK Biobank field number 5375); plus this was for at least one episode (UK Biobank field number 5386); plus ever seen a GP (UK Biobank field number 2090) or psychiatrist (UK Biobank field number 2100) for nerves, anxiety, tension or depression. In total there were 8508 cases and 16,527 controls, equating to a trait prevalence of 34.0% in this cohort, after the removal of individuals with insufficient information or ambiguous phenotypes.

Statistical approach

A mixed linear model was used to conduct an association analysis using GCTA v1.25.0:[27] where y was the vector of binary observations for MDD. β was the matrix of fixed effects, including haplotype, sex, age and age[2]. Each unique haplotype was represented as a distinct allele and was either coded as 0, 1 or 2 depending on the number of haplotypes carried by that individual. u was fitted as a random effect taking into account the genomic relationships (MVN (0,), where G was a SNP-based genomic relationship matrix[28]). was a random effect fitting a second genomic relationship matrix G (MVN (0,) which modelled only the more closely related individuals[29]. G was equal to G except that off-diagonal elements <0.05 were set to 0. X, Z 1 and Z 2 were the corresponding incidence matrices. ε was the vector of residual effects and was assumed to be normally distributed, MVN (0,I ). The inclusion of the second genomic relationship matrix, G , was deemed desirable as the fitting of the single matrix G alone resulted in significant population stratification (intercept = 1.029 ± 0.003, λGC = 1.026) following examination with LD score regression[30]. The fitting of both genomic relationship matrices simultaneously produced no evidence of bias due to population stratification (intercept = 1.002 ± 0.003, λGC = 1.005). A mixed linear model was used to assess the haplotypes in UK Biobank, which were identified in the discovery cohort with P < 10−6 using GCTA v1.25.0:[27] where y was the vector of binary observations for MDD. β was the matrix of fixed effects, including haplotype, sex, age, age[2], genotyping batch and recruitment centre. u was fitted as a random effect taking into account the SNP-based genomic relationships (MVN (0,).X and Z 1 were the corresponding incidence matrices and ε was the vector of residual effects and was assumed to be normally distributed, MVN (0, I ). Replication success was judged on the statistical significance of each haplotype using an inverse variance-weighted meta-analysis across both cohorts conducted using Metal[31].

Fine mapping

The method described above examines the effect of each haplotype against all other haplotypes in that window. Therefore, a haplotype could be assessed against similar haplotypes containing the same causal variant, limiting any observed phenotypic association. To investigate whether there were causal variants located within directly overlapping haplotypes of the same window size, fine mapping of haplotype boundaries was used. Where there were directly overlapping haplotypes, each with P < 10−3 and with an effect in the same direction, i.e., both causal or both preventative, then any shared consecutive regions formed a new haplotype that was assessed using the mixed-model described previously. This new haplotype was assessed using all individuals and was required to be at least five SNPs in length. A total of 47 new haplotypes were assessed from within 26 pairs of directly overlapping haplotypes.

Results

An association analysis for MDD was conducted using 2,618,094 haplotypes and 47 fine mapped haplotypes within the discovery cohort, GS:SFHS. A genome-wide Manhattan plot of –log10 P-values for these haplotypes is provided in Fig. 1 with a q–q plot provided in Supplementary Fig. S1. Within the discovery cohort, two haplotypes exceeded genome-wide significance (P < 5 × 10−8) for an association with MDD, one located on chromosome 6 and the other located on chromosome 10. There were 12 haplotypes with P < 10−6 in the discovery cohort with replication sought for these haplotypes using UK Biobank. Summary statistics from both cohorts and the meta-analysis for these 12 haplotypes are provided in Table 1. The protein coding genes which overlap these 12 haplotypes along with the observed haplotype frequencies within the two cohorts are provided in Table 2. The SNPs and alleles that constitute these 12 haplotypes are provided in Supplementary Table S1.

Fig. 1

Manhattan plot representing the –log10 P-values for an association between each assessed haplotype in the Generation Scotland: Scottish Family Health Study cohort and Major Depressive Disorder

Table 1

The genetic association between major depressive disorder and 12 haplotypes in the generation Scotland: Scottish Family Health Study (GS:SFHS) discovery cohort (where P < 10−6), the replication cohort (UK Biobank) and a meta-analysis

Haplotype			GS:SFHS		UK biobank		Meta-analysis
Chr.	Position (bp)	Window size (cM)	Odds ratio (95% CI)	P-value	Odds ratio (95% CI)	P-value	Odds ratio (95% CI)	P-value
6^a	108,338,267 − 108,454,437	0.34	1.83 (1.53–2.16)	7.06 × 10 ⁻⁹	1.11 (1.01–1.22)	3.62 × 10 ⁻²	1.26 (1.16–1.37)	3.14 × 10⁻⁷
6	108,407,662–108,454,437	0.25	1.68 (1.42–1.96)	8.17 × 10⁻⁸	1.14 (1.04–1.24)	4.47 × 10 ⁻³	1.25 (1.16–1.35)	4.38 × 10 ⁻⁸
7	139,682,412–139,708,901	0.25	2.17 (1.67–2.73)	4.37 × 10⁻⁷	0.87 (0.68–1.08)	2.20 × 10⁻¹	1.28 (1.08–1.49)	4.67 × 10⁻³
8	79,700,362–80,387,861	0.5	1.98 (1.56–2.46)	9.02 × 10⁻⁷	1.06 (0.86–1.28)	5.93 × 10⁻¹	1.36 (1.18–1.56)	6.29 × 10⁻⁵
8	79,759,499–80,156,474	0.25	1.77 (1.47–2.10)	7.90 × 10⁻⁸	1.05 (0.91–1.21)	5.06 × 10⁻¹	1.28 (1.15–1.42)	1.14 × 10⁻⁵
10	4,588,261–4,822,210	0.5	2.33 (1.83–2.91)	8.50 × 10 ⁻⁹	1.15 (0.80–1.59)	4.39 × 10⁻¹	1.67 (1.40–1.98)	7.92 × 10⁻⁸
11^a	2,260,854–2,437,425	0.41	1.64 (1.38–1.91)	2.86 × 10⁻⁷	1.00 (0.87–1.34)	9.91 × 10⁻¹	1.26 (1.10–1.34)	1.32 × 10⁻⁴
12	48,159,721–48,263,828	0.25	2.00 (1.58–2.47)	4.78 × 10⁻⁷	0.97 (0.79–1.17)	7.36 × 10⁻¹	1.29 (1.12–1.48)	6.51 × 10⁻⁴
12	116,904,503–117,062,860	0.25	2.13 (1.64–2.69)	9.90 × 10⁻⁷	1.04 (0.79–1.34)	7.79 × 10⁻¹	1.45 (1.22–1.71)	5.37 × 10⁻⁵
15	49,206,902–49,260,601	0.25	2.03 (1.62–2.48)	9.21 × 10⁻⁸	1.09 (0.88–1.32)	4.04 × 10⁻¹	1.41 (1.22–1.61)	4.39 × 10⁻⁶
15	93,806,447–93,851,224	0.5	1.58 (1.34–1.83)	4.47 × 10⁻⁷	0.93 (0.81–1.05)	2.38 × 10⁻¹	1.16 (1.05–1.27)	2.50 × 10⁻³
15	93,821,340–93,845,622	0.25	1.52 (1.31–1.75)	8.67 × 10⁻⁷	0.91 (0.81–1.03)	1.37 × 10⁻¹	1.13 (1.03–1.23)	6.97 × 10⁻³

Bold values indicate genome-wide statistical significance (P < 5 × 10−8) was achieved in the GS:SFHS cohort or the meta-analysis, or that nominal statistical significance (P < 0.05) was achieved in the UK Biobank. Base pair (bp) positions are based on build GRCh37. aindicates haplotype boundaries defined by the fine mapping approach. { indicates linkage disequilibrium (r ) > 0.5 between haplotypes in the GS:SFHS cohort

Table 2

Protein coding genes located overlapping with the 12 haplotypes with P < 10−6 in the generation Scotland: Scottish family health study (GS:SFHS) discovery cohort and the frequencies of those haplotypes in GS:SFHS and UK Biobank

			Haplotype frequency
Chr.	Position (bp)	Protein coding genes	GS:SFHS	UK Biobank
6	108,338,267–108,454,437	OSTM1	0.0152	0.0197
6	108,407,662–108,454,437	OSTM1	0.0193	0.0241
7	139,682,412–139,708,901	TBXAS1	0.0066	0.0069
8	79,700,362–80,387,861	IL7	0.0076	0.0081
8	79,759,499–80,156,474	IL7	0.0147	0.0157
10	4,588,261–4,822,210		0.0064	0.0027
11	2,260,854–2,437,425	ASCL2, CLorf21, TSPAN32, CD81, TSSC4, TRPM5	0.0196	0.0187
12	48,159,721–48,263,828	SLC48A1, RAPGEF3, HDAC7, VDR	0.0078	0.0090
12	116,904,503–117,062,860	MAP1LC3B2	0.0057	0.0045
15	49,206,902–49,260,601	SHC4	0.0082	0.0080
15	93,806,447–93,851,224		0.0224	0.0206
15	93,821,340–93,845,622		0.0265	0.0243

Base pair (bp) positions are based on build GRCh37 with protein coding regions obtained from Ensembl, GRCh37.p13. Haplotype frequencies were calculated using unrelated individuals and excluding UK Biobank participants recruited in Glasgow or Edinburgh. { indicates a linkage disequilibrium (r ) > 0.5 between haplotypes in the GS:SFHS cohort

Manhattan plot representing the –log10 P-values for an association between each assessed haplotype in the Generation Scotland: Scottish Family Health Study cohort and Major Depressive Disorder The genetic association between major depressive disorder and 12 haplotypes in the generation Scotland: Scottish Family Health Study (GS:SFHS) discovery cohort (where P < 10−6), the replication cohort (UK Biobank) and a meta-analysis Bold values indicate genome-wide statistical significance (P < 5 × 10−8) was achieved in the GS:SFHS cohort or the meta-analysis, or that nominal statistical significance (P < 0.05) was achieved in the UK Biobank. Base pair (bp) positions are based on build GRCh37. aindicates haplotype boundaries defined by the fine mapping approach. { indicates linkage disequilibrium (r ) > 0.5 between haplotypes in the GS:SFHS cohort Protein coding genes located overlapping with the 12 haplotypes with P < 10−6 in the generation Scotland: Scottish family health study (GS:SFHS) discovery cohort and the frequencies of those haplotypes in GS:SFHS and UK Biobank Base pair (bp) positions are based on build GRCh37 with protein coding regions obtained from Ensembl, GRCh37.p13. Haplotype frequencies were calculated using unrelated individuals and excluding UK Biobank participants recruited in Glasgow or Edinburgh. { indicates a linkage disequilibrium (r ) > 0.5 between haplotypes in the GS:SFHS cohort The two haplotypes on chromosome 6 (LD r = 0.74) with P < 10−6 in the discovery cohort both achieved nominal significance (P < 0.05) in the replication cohort (although these would not survive multiple testing correction for the 12 SNPs tested in the replication data set), with one reaching genome-wide significance (P < 5 × 10−8) in the meta-analysis. A regional association plot of the region surrounding these haplotypes within GS:SFHS is provided in Fig. 2. Fine mapping was used to form the most significant haplotype within the discovery cohort. Two directly overlapping 0.5 cM haplotypes consisting of 28 SNPs were identified between 108,335,345 and 108,454,437 bp (rs7749081–rs212829). These two haplotypes had P-values of 3.24 × 10−5 and 5.57 × 10−5, respectively, and differed at a single SNP (rs7749081). Exclusion of this single SNP defined a new 27 SNP haplotype that had a genome-wide significant association with MDD (P = 7.06 × 10−9). Calculating the effect size at the population level[32], the estimates of the contribution of the two haplotypes to the total genetic variance was 2.09 × 10−4 and 2.38 × 10−4, respectively, within GS:SFHS. None of the individual SNPs located within either haplotype were associated with MDD in either cohort (P ≥ 0.05).

Fig. 2

Regional association plot representing the –log10 P-values for an association between haplotypes in the Generation Scotland: Scottish Family Health Study cohort and Major Depressive Disorder within the 107.4–107.6 Mb region on chromosome 6. The start and end position (using build GRCh37) of haplotypes represent the outermost SNP positions within the windows examined. The warmth of colour represents the r with the genome-wide significant haplotype located between 108,338,267 and 108,454,437 bp A genome-wide significant haplotype (P = 8.50 × 10−9) was identified on chromosome 10 within GS:SFHS using a 0.5 cM window. A regional association plot of the region surrounding this haplotype is provided in Fig. 3. This haplotype had an odds ratio (OR) of 2.33 (95% confidence interval (CI): 1.83 – 2.91) in the discovery cohort and an OR of 1.15 (95% CI: 0.80–1.59) in the replication cohort. These were the highest ORs observed in the respective cohorts. The estimate of the contribution of this haplotype to the total genetic variance was 2.29 × 10−4 in the discovery cohort. Association analysis of the 92 SNPs on this haplotype revealed that one SNP in GS:SFHS (rs17133585) and two SNPs in UK Biobank (rs12413638 and rs10904290) were nominally significant (P < 0.05), although none had P-values < 0.001.

Fig. 3

Discussion

Twelve haplotypes were identified in the discovery cohort with P < 10−6 of which two were significant at the genome-wide level (P < 5 × 10−8) in the discovery cohort and one which was genome-wide significant (P < 5 × 10−8) in the meta-analysis. A power analysis[33] was conducted using the genotype relative risks observed in the discovery cohort, the sample sizes and haplotype frequencies in the replication cohort and the prevalence of MDD reported for a structured clinical diagnosis of MDD in other high income counties (14.6%)[34]. There was sufficient power (>0.99) to detect the twelve haplotypes with P < 10−6 identified in the discovery cohort within the replication cohort at a significance threshold of 0.05. There are several reasons why the effect sizes observed in the replication cohort were lower than those observed in the discovery cohort. The causal loci may have been in lower LD with the assessed haplotypes in the replication cohort than in the discovery cohort lessening the observed effect. The phenotypes across the two cohorts were potentially heterogeneous (certainly with regards to the prevalence in each population) so the assessed haplotypes may have had differing effects on each cohort’s phenotype. A complementary approach to replication is to identify the gene coding regions within haplotypes that potentially provide a biologically informative explanation for an association with MDD. Those haplotypes with P < 10−7 in the discovery cohort and the gene coding regions that they overlap are discussed below. The two haplotypes on chromosome 6 overlapped with the Osteopetrosis Associated Transmembrane Protein 1 (OSTM1) coding gene. OSTM1 is associated with neurodegeneration[35,36] and melanocyte function[37], and alpha-melanocyte-stimulating hormone has been shown to have an effect on depression-like symptoms[38-40]. This haplotype lies within the 6q21 region that has been associated with bipolar disorder[41-45], a disease that shares symptoms with MDD and has a correlated phenotypic liability of 0.64[46]. This may indicate either a pleiotropic effect or clinical heterogeneity, whereby patients may be misdiagnosed, i.e., patients may have MDD and transition to bipolar disorder in the future or are sub-threshold for bipolar disorder and instead given a diagnosis of MDD. The haplotype identified on chromosome 8 overlapped with the Interleukin 7 (IL7) protein coding region. IL7 is involved in maintaining T-cell homoeostasis[47] and proliferation[48], which in turn contributes to the immune response to pathogens. It has been proposed that impaired T-cell function may be a factor in the development of MDD[49], with depressed subjects found to have elevated[50] or depressed levels[51] of IL7 serum. There is conjecture as to whether MDD causes inflammation or represents a reaction to an increased inflammatory response[52,53], but it is most likely to be a bidirectional relationship[51]. The haplotype on chromosome 10 overlapped with two RNA genes: long intergenic non-protein coding RNA 704 (LINC00704) and long intergenic non-protein coding RNA 705 (LINC00705). The function of these non-protein coding genes is unreported. However, a study of cardiac neonatal lupus, which is a rare autoimmune disease demonstrated an association for a SNP (rs1391511) which is 15kb from LINC00705. Two Dutch studies[54,55] have identified a variant (rs8023445) on chromosome 15 located within the SRC (Src homology 2 domain containing) family, member 4 (SHC4) gene coding region that has a moderate degree of association with MDD (P = 1.64 × 10−5 and P = 9 × 10−6, respectively). A variant (rs10519201) within the SHC4 coding region was also found to have an association (P = 6.16 × 10−6) with Obsessive-Compulsive Personality Disorder in a UK-based study[56]. SHC4 is expressed in neurons[57] and regulates BDNF-induced MAPK activation[58], which has been shown to be a key factor in MDD pathophysiology[59]. The SHC4 region overlaps with the haplotype on chromosome 15 identified in the discovery cohort (located at 49,206,902–49,260,601 bp) and, therefore, further research to examine the association between the SHC4 region and psychiatric disorders could be warranted. Haplotype-based analyses are capable of tagging variants due to the LD between the untyped variants and the multiple flanking genotyped variants which make up the inherited haplotype. This approach should provide greater power when there is comparatively higher IBD sharing, such as in GS:SFHS which was a family-based cohort, where there is a greater likelihood that a single haplotype is tagging the same causal variant across that population. The UK Biobank was selected as replication cohort as it is a large population-based sample that was expected to be genetically similar to the GS:SFHS discovery cohort. This was confirmed by the similarity of the observed haplotype frequencies (Table 2) between the two cohorts. The prevalence of MDD observed in the discovery cohort (13.7%) was comparable to that reported (14.6%) within similar populations[34]. However, in the replication cohort, the trait prevalence was notably higher (34.0%), most likely due to the differing methods of phenotypic ascertainment. Additional work could seek to replicate the findings in further cohorts, as well as full meta-analysis of all haplotypes within those cohorts. An additive model was used to analyse the haplotypes and alternative approaches could implement a dominant model or an analysis of diplotypes (haplotype pairs) for association with MDD.

Conclusions

This study identified two haplotypes within the discovery cohort that exceeded genome-wide significance for association with a clinically diagnosed MDD phenotype. One of these haplotypes was nominally significant in the replication cohort and was in LD with a haplotype that was genome-wide significant in the meta-analysis. The genome-wide significant haplotype on chromosome 6 was located on 6q21, which has been shown previously to be related to psychiatric disorders. There were a number of haplotypes approaching genome-wide significance located within genic regions associated with diseases that are comorbid with MDD and, therefore, these regions warrant further investigation. The total genetic variance explained by the haplotypes identified was small, however, these haplotypes potentially represent biologically informative aetiological subtypes for MDD and merit further analysis. Supplementary Figure S1 Supplementary Table 1

56 in total

1. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies.

Authors: Brendan K Bulik-Sullivan; Po-Ru Loh; Hilary K Finucane; Stephan Ripke; Jian Yang; Nick Patterson; Mark J Daly; Alkes L Price; Benjamin M Neale
Journal: Nat Genet Date: 2015-02-02 Impact factor: 38.330

2. Common SNPs explain a large proportion of the heritability for human height.

Authors: Jian Yang; Beben Benyamin; Brian P McEvoy; Scott Gordon; Anjali K Henders; Dale R Nyholt; Pamela A Madden; Andrew C Heath; Nicholas G Martin; Grant W Montgomery; Michael E Goddard; Peter M Visscher
Journal: Nat Genet Date: 2010-06-20 Impact factor: 38.330

3. ShcD interacts with TrkB via its PTB and SH2 domains and regulates BDNF-induced MAPK activation.

Authors: Yuangang You; Weiqi Li; Yanhua Gong; Bin Yin; Boqin Qiang; Jiangang Yuan; Xiaozhong Peng
Journal: BMB Rep Date: 2010-07 Impact factor: 4.778

4. Abnormal pituitary function during melancholia: reduced alpha-melanocyte-stimulating hormone secretion and increased intact ACTH non-suppression.

Authors: M Maes; C DeJonckheere; C Vandervorst; C Schotte; P Cosyns; J Raus; E Suy
Journal: J Affect Disord Date: 1991-07 Impact factor: 4.839

Review 5. Homeostasis of naive and memory T cells.

Authors: Charles D Surh; Jonathan Sprent
Journal: Immunity Date: 2008-12-19 Impact factor: 31.745

6. A second generation human haplotype map of over 3.1 million SNPs.

Authors: Kelly A Frazer; Dennis G Ballinger; David R Cox; David A Hinds; Laura L Stuve; Richard A Gibbs; John W Belmont; Andrew Boudreau; Paul Hardenbol; Suzanne M Leal; Shiran Pasternak; David A Wheeler; Thomas D Willis; Fuli Yu; Huanming Yang; Changqing Zeng; Yang Gao; Haoran Hu; Weitao Hu; Chaohua Li; Wei Lin; Siqi Liu; Hao Pan; Xiaoli Tang; Jian Wang; Wei Wang; Jun Yu; Bo Zhang; Qingrun Zhang; Hongbin Zhao; Hui Zhao; Jun Zhou; Stacey B Gabriel; Rachel Barry; Brendan Blumenstiel; Amy Camargo; Matthew Defelice; Maura Faggart; Mary Goyette; Supriya Gupta; Jamie Moore; Huy Nguyen; Robert C Onofrio; Melissa Parkin; Jessica Roy; Erich Stahl; Ellen Winchester; Liuda Ziaugra; David Altshuler; Yan Shen; Zhijian Yao; Wei Huang; Xun Chu; Yungang He; Li Jin; Yangfan Liu; Yayun Shen; Weiwei Sun; Haifeng Wang; Yi Wang; Ying Wang; Xiaoyan Xiong; Liang Xu; Mary M Y Waye; Stephen K W Tsui; Hong Xue; J Tze-Fei Wong; Luana M Galver; Jian-Bing Fan; Kevin Gunderson; Sarah S Murray; Arnold R Oliphant; Mark S Chee; Alexandre Montpetit; Fanny Chagnon; Vincent Ferretti; Martin Leboeuf; Jean-François Olivier; Michael S Phillips; Stéphanie Roumy; Clémentine Sallée; Andrei Verner; Thomas J Hudson; Pui-Yan Kwok; Dongmei Cai; Daniel C Koboldt; Raymond D Miller; Ludmila Pawlikowska; Patricia Taillon-Miller; Ming Xiao; Lap-Chee Tsui; William Mak; You Qiang Song; Paul K H Tam; Yusuke Nakamura; Takahisa Kawaguchi; Takuya Kitamoto; Takashi Morizono; Atsushi Nagashima; Yozo Ohnishi; Akihiro Sekine; Toshihiro Tanaka; Tatsuhiko Tsunoda; Panos Deloukas; Christine P Bird; Marcos Delgado; Emmanouil T Dermitzakis; Rhian Gwilliam; Sarah Hunt; Jonathan Morrison; Don Powell; Barbara E Stranger; Pamela Whittaker; David R Bentley; Mark J Daly; Paul I W de Bakker; Jeff Barrett; Yves R Chretien; Julian Maller; Steve McCarroll; Nick Patterson; Itsik Pe'er; Alkes Price; Shaun Purcell; Daniel J Richter; Pardis Sabeti; Richa Saxena; Stephen F Schaffner; Pak C Sham; Patrick Varilly; David Altshuler; Lincoln D Stein; Lalitha Krishnan; Albert Vernon Smith; Marcela K Tello-Ruiz; Gudmundur A Thorisson; Aravinda Chakravarti; Peter E Chen; David J Cutler; Carl S Kashuk; Shin Lin; Gonçalo R Abecasis; Weihua Guan; Yun Li; Heather M Munro; Zhaohui Steve Qin; Daryl J Thomas; Gilean McVean; Adam Auton; Leonardo Bottolo; Niall Cardin; Susana Eyheramendy; Colin Freeman; Jonathan Marchini; Simon Myers; Chris Spencer; Matthew Stephens; Peter Donnelly; Lon R Cardon; Geraldine Clarke; David M Evans; Andrew P Morris; Bruce S Weir; Tatsuhiko Tsunoda; James C Mullikin; Stephen T Sherry; Michael Feolo; Andrew Skol; Houcan Zhang; Changqing Zeng; Hui Zhao; Ichiro Matsuda; Yoshimitsu Fukushima; Darryl R Macer; Eiko Suda; Charles N Rotimi; Clement A Adebamowo; Ike Ajayi; Toyin Aniagwu; Patricia A Marshall; Chibuzor Nkwodimmah; Charmaine D M Royal; Mark F Leppert; Missy Dixon; Andy Peiffer; Renzong Qiu; Alastair Kent; Kazuto Kato; Norio Niikawa; Isaac F Adewole; Bartha M Knoppers; Morris W Foster; Ellen Wright Clayton; Jessica Watkin; Richard A Gibbs; John W Belmont; Donna Muzny; Lynne Nazareth; Erica Sodergren; George M Weinstock; David A Wheeler; Imtaz Yakub; Stacey B Gabriel; Robert C Onofrio; Daniel J Richter; Liuda Ziaugra; Bruce W Birren; Mark J Daly; David Altshuler; Richard K Wilson; Lucinda L Fulton; Jane Rogers; John Burton; Nigel P Carter; Christopher M Clee; Mark Griffiths; Matthew C Jones; Kirsten McLay; Robert W Plumb; Mark T Ross; Sarah K Sims; David L Willey; Zhu Chen; Hua Han; Le Kang; Martin Godbout; John C Wallenburg; Paul L'Archevêque; Guy Bellemare; Koji Saeki; Hongguang Wang; Daochang An; Hongbo Fu; Qing Li; Zhen Wang; Renwu Wang; Arthur L Holden; Lisa D Brooks; Jean E McEwen; Mark S Guyer; Vivian Ota Wang; Jane L Peterson; Michael Shi; Jack Spiegel; Lawrence M Sung; Lynn F Zacharia; Francis S Collins; Karen Kennedy; Ruth Jamieson; John Stewart
Journal: Nature Date: 2007-10-18 Impact factor: 49.962

7. Advantages and pitfalls in the application of mixed-model association methods.

Authors: Jian Yang; Noah A Zaitlen; Michael E Goddard; Peter M Visscher; Alkes L Price
Journal: Nat Genet Date: 2014-02 Impact factor: 38.330

8. A detailed examination of cytokine abnormalities in Major Depressive Disorder.

Authors: N M Simon; K McNamara; C W Chow; R S Maser; G I Papakostas; M H Pollack; A A Nierenberg; M Fava; K K Wong
Journal: Eur Neuropsychopharmacol Date: 2007-08-03 Impact factor: 4.600

9. A prospective evaluation of the directionality of the depression-inflammation relationship.

Authors: Jesse C Stewart; Kevin L Rand; Matthew F Muldoon; Thomas W Kamarck
Journal: Brain Behav Immun Date: 2009-05-04 Impact factor: 7.217

10. Depression and cardiac disease: epidemiology, mechanisms, and diagnosis.

Authors: Jeff C Huffman; Christopher M Celano; Scott R Beach; Shweta R Motiwala; James L Januzzi
Journal: Cardiovasc Psychiatry Neurol Date: 2013-04-07

7 in total

1. Electronic health record and genome-wide genetic data in Generation Scotland participants.

Authors: Shona M Kerr; Archie Campbell; Jonathan Marten; Veronique Vitart; Andrew M McIntosh; David J Porteous; Caroline Hayward
Journal: Wellcome Open Res Date: 2017-09-18

2. Cardiac Troponin T and Troponin I in the General Population.

Authors: Paul Welsh; David Preiss; Caroline Hayward; Anoop S V Shah; David McAllister; Andrew Briggs; Charles Boachie; Alex McConnachie; Sandosh Padmanabhan; Claire Welsh; Mark Woodward; Archie Campbell; David Porteous; Nicholas L Mills; Naveed Sattar
Journal: Circulation Date: 2019-04-24 Impact factor: 29.690

3. Haplotype-based genome-wide association studies for carcass and growth traits in chicken.

Authors: Hui Zhang; Lin-Yong Shen; Zi-Chun Xu; Luke M Kramer; Jia-Qiang Yu; Xin-Yang Zhang; Wei Na; Li-Li Yang; Zhi-Ping Cao; Peng Luan; James M Reecy; Hui Li
Journal: Poult Sci Date: 2020-03-26 Impact factor: 3.352

4. Blood-based epigenome-wide analyses of cognitive abilities.

Authors: Daniel L McCartney; Robert F Hillary; Eleanor L S Conole; Daniel Trejo Banos; Danni A Gadd; Rosie M Walker; Cliff Nangle; Robin Flaig; Archie Campbell; Alison D Murray; Susana Muñoz Maniega; María Del C Valdés-Hernández; Mathew A Harris; Mark E Bastin; Joanna M Wardlaw; Sarah E Harris; David J Porteous; Elliot M Tucker-Drob; Andrew M McIntosh; Kathryn L Evans; Ian J Deary; Simon R Cox; Matthew R Robinson; Riccardo E Marioni
Journal: Genome Biol Date: 2022-01-17 Impact factor: 13.583

5. The eMERGE genotype set of 83,717 subjects imputed to ~40 million variants genome wide and association with the herpes zoster medical record phenotype.

Authors: Ian B Stanaway; Taryn O Hall; Elisabeth A Rosenthal; Melody Palmer; Vivek Naranbhai; Rachel Knevel; Bahram Namjou-Khales; Robert J Carroll; Krzysztof Kiryluk; Adam S Gordon; Jodell Linder; Kayla Marie Howell; Brandy M Mapes; Frederick T J Lin; Yoonjung Yoonie Joo; M Geoffrey Hayes; Ali G Gharavi; Sarah A Pendergrass; Marylyn D Ritchie; Mariza de Andrade; Damien C Croteau-Chonka; Soumya Raychaudhuri; Scott T Weiss; Matt Lebo; Sami S Amr; David Carrell; Eric B Larson; Christopher G Chute; Laura Jarmila Rasmussen-Torvik; Megan J Roy-Puckelwartz; Patrick Sleiman; Hakon Hakonarson; Rongling Li; Elizabeth W Karlson; Josh F Peterson; Iftikhar J Kullo; Rex Chisholm; Joshua Charles Denny; Gail P Jarvik; David R Crosslin
Journal: Genet Epidemiol Date: 2018-10-08 Impact factor: 2.135

6. Exploring effective approaches for haplotype block phasing.

Authors: Ziad Al Bkhetan; Justin Zobel; Adam Kowalczyk; Karin Verspoor; Benjamin Goudey
Journal: BMC Bioinformatics Date: 2019-10-30 Impact factor: 3.169

7. Genome-wide haplotype association study in imaging genetics using whole-brain sulcal openings of 16,304 UK Biobank subjects.

Authors: Slim Karkar; Claire Dandine-Roulland; Jean-François Mangin; Yann Le Guen; Cathy Philippe; Jean-François Deleuze; Morgane Pierre-Jean; Edith Le Floch; Vincent Frouin
Journal: Eur J Hum Genet Date: 2021-03-04 Impact factor: 4.246

7 in total