Using phenotype risk scores to enhance gene discovery for generalized anxiety disorder and posttraumatic stress disorder.

Frank R Wendt^1,2, Gita A Pathak^3,4, Joseph D Deak^3,4, Flavio De Angelis^3,4, Dora Koller^3,4, Brenda Cabrera-Mendoza^3,4, Dannielle S Lebovitch^5,6,7,8, Daniel F Levey^3,4, Murray B Stein^9,10,11, Henry R Kranzler^12,13, Karestan C Koenen^14,15,16, Joel Gelernter^3,4,17,18, Laura M Huckins^5,6,7,8,19,20, Renato Polimanti^21,22.

Abstract

UK Biobank (UKB) is a key contributor in mental health genome-wide association studies (GWAS) but only ~31% of participants completed the Mental Health Questionnaire ("MHQ responders"). We predicted generalized anxiety disorder (GAD), posttraumatic stress disorder (PTSD), and major depression symptoms using elastic net regression in the ~69% of UKB participants lacking MHQ data ("MHQ non-responders"; NTraining = 50%; NTest = 50%), maximizing the informative sample for these traits. MHQ responders were more likely to be female, from higher socioeconomic positions, and less anxious than non-responders. Genetic correlation of GAD and PTSD between MHQ responders and non-responders ranged from 0.636 to 1.08; both were predicted by polygenic scores generated from independent cohorts. In meta-analyses of GAD (N = 489,579) and PTSD (N = 497,803), we discovered many novel genomic risk loci (13 for GAD and 40 for PTSD). Transcriptomic analyses converged on altered regulation of prenatal dorsolateral prefrontal cortex in these disorders. Our results provide one roadmap by which sample size and statistical power may be improved for gene discovery of incompletely ascertained traits in the UKB and other biobanks with limited mental health assessment.

Year: 2022 PMID: 35181757 PMCID: PMC9133008 DOI: 10.1038/s41380-022-01469-y

Source DB: PubMed Journal: Mol Psychiatry ISSN: 1359-4184 Impact factor: 13.437

Introduction

Psychiatric disorders are highly polygenic, with thousands of risk loci across the genome contributing to their liability. Because of this polygenicity, extremely large sample sizes are required to detect the small individual effects associated with risk alleles.[1-6] Biobanks and consortia play a critical role in organizing, curating, and facilitating large genetic studies of mental health and psychopathology.[7-10] The UK Biobank (UKB) is a resource of homogeneously ascertained participants with detailed information on physical health, anthropometric measurements, and sociodemographic characteristics, among other domains. A primary limitation of UKB for studying mental health is the limited availability of participant responses to voluntary mental health questions and surveys. Among the approximately 502 000 UKB participants, only 31% completed the online Mental Health Questionnaire (herein termed “MHQ responders”).[11] These missing data impose an upper limit on the UKB sample that is available for genetic studies using direct information. Indeed, many studies have had only modest success with risk locus discovery when studying psychopathologies in the subset of MHQ responders.[4, 12] We hypothesized that carefully selected features ascertained in the entire UKB could permit genetic studies of MHQ phenotypes in the UKB participants who did not complete the survey (herein termed “MHQ non-responders”).[13] We demonstrate the reliability of studying the collection of comorbid phenotypes, hereafter referred to as a co-phenome,[13] using several independent methods. Here we maximized the use of unrelated individuals from the UKB (more than doubling the available sample size relative to only MHQ responders) for genome-wide association studies (GWAS) of generalized anxiety disorder (GAD) and posttraumatic stress disorder (PTSD) symptoms.
In meta-analyses adjusted for the effects of the major co-phenome correlate and an important transdiagnostic feature of internalizing psychopathologies, neuroticism, we identified multi-omic and cross-phenotype contributions of genes expressed in the prenatal brain. Using these novel GAD and PTSD data, we report putative cross-phenotype drug repurposing targets and identify drugs that may induce adverse effects that resemble anxiety symptoms. Our results provide one roadmap by which sample size and statistical power may be improved for gene discovery of incompletely ascertained traits in the UKB and other biobanks with limited mental health assessment.

Subjects and Methods

UKB Participants and Genetic Data

The UKB is a population-based cohort of over 502 000 participants that assesses a wide range of factors including physical health, anthropometric measurements, circulating biomarkers, and sociodemographic characteristics. UKB individual-level data were accessed through application reference number 58146. UKB has approval from the North West Multi-Center Research Ethics Committee (MREC) as a Research Tissue Bank (RTB). This approval means that researchers do not require separate ethical clearance and can operate under the RTB approval. A subset of individuals (N=157 366) completed an online mental health questionnaire (MHQ)[11] covering topics of self-reported mental health and well-being. GAD-7[14], PCL-6[15], and PHQ-9[14] were derived from the MHQ using summed totals of participants' responses to various questions (Supplementary Methods). Mean scores among MHQ-responders of European (EUR) descent were 8.97±3.09 (N=124 534) for GAD-7, 6.59±3.68 (N=126 219) for PCL-6, and 11.73±3.67 (N=110 291) for PHQ-9. Briefly, UKB participants were genotyped using a custom Axiom array capturing genome-wide genetic variation and short insertions/deletions, including coding variants across a range of minor allele frequencies and markers providing good coverage for imputation in EUR populations. UKB was imputed using the Haplotype Reference Consortium reference panel.[8]

Million Veteran Program Phenotypes

Independent assessments of anxiety, PTSD, and depression phenotypes were obtained from the MVP (Supplementary Methods). We used GWAS data from a two-item anxiety trait (GAD-2, N=199 611),[16] a 17-item PTSD trait (PCL-17, N=186 689),[5] and a broad depression trait for which cases were identified by 18 ICD codes (N=1 154 267).

First-Pass Feature Selection

The UKB assesses thousands of potentially informative phenotypes for predicting a given outcome. We selected features with >200 000 responses, not part of the MHQ, and lacking highly dimensional structure (e.g., ICD-9/10 codes, medication endorsements), and those attributes available through special requests (e.g., greenspace and water percentages). The final feature set included 772 phenotypes.

Elastic Net Regression Parameter Optimization

To further refine the feature list, we selected phenome-wide Spearman correlates of GAD-7, PCL-6, and PHQ-9 (tested in 132 016 unrelated MHQ-responders) where the estimate of ρ was based on data from at least 10% of the sample. Multiple iterations of elastic net regression were performed in glmnet[17] using varying thresholds of Spearman’s rho for feature inclusion. We tested three training and test proportions (25%train∣75%test, 50%train∣50%test, and 75%train∣25%test) and four thresholds of rho: ρ>0.3, >0.25, >0.2, >0.15. Standardized 50-fold cross-validation used the best-fit penalizing parameter lambda. Parameter combination success was determined by comparing predicted outcomes to direct-report outcomes. Using the optimal feature inclusion settings, feature weights (elastic net β) were extracted and used to calculate phenotype risk scores (PheRS).
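The correlation-based feature filter described above can be sketched as follows. This is a minimal illustration in standard-library Python, not the authors' pipeline (which used glmnet in R); the function names `spearman_rho` and `select_features` and the toy data are hypothetical, and the sketch assumes the threshold is applied to the signed ρ, which the text does not state explicitly.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def select_features(features, outcome, threshold=0.2):
    """Keep features whose rho with the outcome exceeds the threshold.

    Assumption: the signed rho is compared to the threshold.
    """
    return [name for name, vals in features.items()
            if spearman_rho(vals, outcome) > threshold]


# Hypothetical toy data: three candidate features and one outcome.
outcome = [1, 2, 3, 4, 5]
features = {"a": [2, 4, 6, 8, 10], "b": [5, 4, 3, 2, 1], "c": [3, 1, 4, 1, 5]}
kept = select_features(features, outcome, threshold=0.2)
```

In this sketch the retained features would then be passed to the elastic net model; the cross-validated fit itself is left to a dedicated library.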

Co-phenome Risk Scores

PheRS are the weighted sum of the co-phenome:[13] $\mathrm{PheRS} = \sum_{i=1}^{N} w_i x_i$, where N is the number of phenotypes determined by Spearman’s ρ, x_i is the response to phenotype i (set to 0 if the trait response was coded as missing, “prefer not to answer,” or a comparable derivative indicating a non-answer to the question), and w_i is the effect size (β) obtained from elastic net regression. Sample sizes for each feature are provided in Supplementary Table 1.
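As a small worked illustration of the weighted sum, the sketch below scores one participant. The function name `phers`, the feature names, and the weight for "tenseness" are hypothetical; only the neuroticism weight (0.286, the GAD-7 elastic net β reported in the Results) comes from the text.

```python
# Response codes treated as non-answers and therefore scored as 0.
NON_ANSWERS = {None, "missing", "prefer not to answer"}


def phers(responses, weights):
    """Weighted sum of a participant's responses over the selected features."""
    total = 0.0
    for feature, weight in weights.items():
        value = responses.get(feature)  # absent features count as missing
        if value in NON_ANSWERS:
            value = 0
        total += weight * value
    return total


# Hypothetical weights; 0.286 is the reported GAD-7 beta for neuroticism score.
weights = {"neuroticism_score": 0.286, "tenseness": 0.1}
participant = {"neuroticism_score": 10, "tenseness": "prefer not to answer"}
score = phers(participant, weights)  # only neuroticism contributes
```

Because non-answers are scored as 0, a participant with many unanswered features receives a lower score regardless of true symptom burden, which is one reason per-feature sample sizes matter.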

GWAS and Meta-Analysis

GWAS were performed in unrelated EUR MHQ-responders and non-responders separately. A detailed description of sample quality control is provided in the Supplementary Methods or at https://pan.ukbb.broadinstitute.org/docs/technical-overview. Linear regression was performed in PLINK 2.0 using SNPs with imputation INFO scores>0.8, minor allele frequencies>0.01, missingness<0.05, and Hardy-Weinberg equilibrium P-values>1x10−10. We included age, sex, age×sex, and the first ten within-ancestry principal components as covariates in each GWAS. GWAS of each trait were meta-analyzed together and again with the MVP counterpart GWAS using METAL.[18] Per meta-analyzed GWAS, we applied a genome-wide significance threshold of P<5x10−8. To account for multiple testing, we considered a study-wide significance threshold of P<1.25x10−8, calculated as 0.05 divided by 1 000 000 LD-independent SNPs in EUR, 2 meta-analyses, and 2 internalizing traits.
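The study-wide threshold and the meta-analysis step can be illustrated with a short sketch. The threshold arithmetic follows the text directly. The meta-analysis function is an assumption: it shows a fixed-effects inverse-variance weighted combination, which is one of the schemes METAL offers, but the text does not state which scheme was used. Function names and the example effect sizes are hypothetical.

```python
import math


def study_wide_threshold(alpha=0.05, independent_snps=1_000_000,
                         meta_analyses=2, traits=2):
    """Bonferroni-style threshold: alpha divided by each multiplicity factor."""
    return alpha / independent_snps / meta_analyses / traits


def ivw_meta(betas, ses):
    """Fixed-effects inverse-variance weighted meta-analysis for one SNP.

    Returns the pooled effect, its standard error, and a two-sided P-value.
    """
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, p


threshold = study_wide_threshold()

# Hypothetical per-cohort estimates for a single SNP.
beta, se, p = ivw_meta([0.10, 0.10], [0.02, 0.02])
significant = p < threshold
```

Pooling two equally precise cohorts shrinks the standard error by a factor of √2, which is the basic mechanism by which adding MHQ non-responders increases power.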

Reliability of Co-Phenome Risk Scores

We determined the reliability of predicted traits and PheRS in several ways. First, within MHQ-responders and non-responders, we correlated quantitative outcomes, PheRS, and case/control status. Second, we tested genetic correlation (rg) within and between MHQ-responders and non-responders. As controls, we included GWASs of neuroticism (expected positive rg) and subjective well-being (expected negative rg).[19] SNP-heritability (h²) and rg were calculated using Linkage Disequilibrium Score Regression (LDSC) with the 1000 Genomes Project (1kGP) EUR reference. Liability-scale h² estimates were generated using population prevalence estimates of 16% for GAD,[20] 7% for PTSD,[21] and 20% for MDD.[22] Third, we calculated polygenic risk scores (PRS) for each unrelated EUR participant in the UKB using GAD-2[16] and PTSD PCL-17[5] from the Million Veteran Program (MVP). To our knowledge, these GWAS represent the largest and most powerful genetic assessments of GAD and PTSD outcomes with no known overlap with UKB. PRS were calculated with PRSice v2[23] with the following clumping parameters to select linkage disequilibrium independent variants: r²=0.001, P=1, in 10 000-kb windows. Relationships between PRS, PheRS, quantitative outcomes, and case/control status were adjusted for age, sex, age×sex, and ten within-ancestry principal components.

Functional Annotation

Liability loci were mapped with Multi-marker Analysis of GenoMic Annotation (MAGMA v1.08) implemented in FUMA v1.6a[24] using 2-kb window and r>0.6.[25] Enrichment of tissue transcriptomic profiles was tested relative to Genotype-Tissue Expression (GTEx v8[26]) 53 tissues and the BrainSpan Atlas of the Developing Human Brain[27] age-stratified brain tissues. Cell-type transcriptomic profile enrichments were performed using 13 human-specific transcriptomic datasets related to the brain (Supplementary Methods) and assessed in three ways: (1) profile enrichment within each dataset, (2) within-dataset conditionally independent profile enrichment and (3) across-dataset conditionally independent profile enrichment.[25] Hi-C coupled MAGMA was used to perform gene-based association tests in the context of fetal brain chromatin interaction data using the 1kGP EUR reference.[28]

Locus Fine Mapping

LD-independent regions with ≤10 causal variants were fine-mapped to determine the 95% credible set using susieR.[29] A variant’s credible set membership (i.e., the variant is among the most likely causal variants) was evaluated using the posterior inclusion probability (PIP). PIP ranges from 0-1 with values closer to 1 indicating greater causal probability.
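To make the idea of a 95% credible set concrete, the sketch below ranks variants by PIP and accumulates them until the target coverage is reached. This is a simplified illustration that assumes a single causal signal whose probabilities sum to one; susieR constructs credible sets per modelled effect and applies additional filters, so this is not its algorithm. The function name and variant IDs are hypothetical.

```python
def credible_set(pips, coverage=0.95):
    """Smallest set of top-ranked variants whose cumulative probability
    reaches the requested coverage (single-signal simplification)."""
    total = sum(pips.values())  # normalise in case values do not sum to 1
    ranked = sorted(pips.items(), key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant, pip in ranked:
        chosen.append(variant)
        cumulative += pip / total
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical posterior inclusion probabilities for one locus.
locus = {"rs_a": 0.60, "rs_b": 0.30, "rs_c": 0.07, "rs_d": 0.03}
cs95 = credible_set(locus)
```

A locus dominated by one high-PIP variant yields a small credible set, whereas a locus with diffuse probabilities yields a large one.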

Causal Effect of Medication Use

Twenty-three GWAS of medication use[60] were evaluated for genetic overlap and causal relationships with GAD and PTSD. Each GWAS tested associations between ~7 million SNPs and medication endorsement (e.g., diuretics, opioids, antidepressants) in more than 320 000 European ancestry participants from the UKB. Data may be accessed here: https://cnsgenomics.com/content/data. We performed Mendelian randomization (MR) to test the bidirectional causal relationship between two traits. MR relies on three assumptions about the genetic instrument: (i) SNPs are associated with the exposure, (ii) SNPs are not associated with confounding factors, and (iii) SNPs are associated with the outcome only through their association with the exposure. Using the R package TwoSampleMR, we tested different MR methods to account for instrumental variable weakness and perform sensitivity tests. Latent causal variable (LCV[56]) analysis infers genetic causal relationships between trait pairs. LCV assumptions include (i) symmetry in cross-trait shared genetic architectures arises from a latent genetic component rather than a non-genetic confounder and (ii) a single latent factor mediates trait relationships. LCV modelling was implemented in R using the 1kGP EUR reference. The genetic causality proportion (gĉp) is the degree to which genetic risk for trait 1 is causal for trait 2. Gĉp estimates range from 0 to 1 with values closer to 1 indicating fully causal relationships. The Supplementary Material provides a detailed description of these methods.
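One commonly used MR estimator, the fixed-effects inverse-variance weighted (IVW) estimate, can be sketched as below. This is an illustration of the general technique rather than the TwoSampleMR API or the specific set of methods the authors ran; the function name and SNP-level numbers are hypothetical.

```python
import math


def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effects IVW causal estimate from per-SNP summary statistics.

    Equivalent to a weighted average of per-SNP Wald ratios
    (beta_outcome / beta_exposure) with weights beta_exposure^2 / se_outcome^2.
    """
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    estimate = numerator / denominator
    standard_error = math.sqrt(1.0 / denominator)
    return estimate, standard_error


# Hypothetical instruments: each SNP's outcome effect is half its exposure effect.
bx = [0.1, 0.2, 0.3]
by = [0.05, 0.10, 0.15]
se = [0.01, 0.01, 0.01]
estimate, standard_error = ivw_mr(bx, by, se)
```

The estimate is only interpretable as causal if the three instrument assumptions listed above hold, which is why sensitivity analyses with alternative estimators are typically run alongside it.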

Drug Repurposing

Drug repurposing was performed using Gene2drug[30] which uses gene-set enrichment analysis to reveal pathways of genes up- or down-regulated by a drug based on gene expression profiles from ConnectivityMap.[31] Gene2drug reports a P-value for the Kolmogorov-Smirnov statistic. Each drug is assigned an enrichment score (“EScore”) to describe the magnitude and direction of regulation with EScores>0 indicating upregulation and EScores<0 indicating downregulation. Gene Ontology terms were selected by positionally mapping lead SNPs to the nearest gene. When >1 gene mapped to a lead SNP, we retained the gene with the greatest probability of loss of function intolerance. Gene Ontology (GO) terms were extracted from ShinyGO[32] after multiple testing correction (FDR<0.05, P=4.55x10−6 based on 72 394 human gene sets) and tested with Gene2drug.

Results

A study overview is provided in Fig. 1.

Fig. 1 ∣

Study design for understanding the genetic architectures of internalizing co-phenomes.

Features (i.e., comorbid phenotypes) were correlated with GAD-7, PCL-6, and PHQ-9. Outcomes were predicted using elastic net regression in two ways: (i) each quantitative outcome was predicted as the dependent variable in elastic net regression and (ii) elastic net regression weights were used to calculate a co-phenome risk score.

Elastic net features and regression

GAD-7, PCL-6, and PHQ-9 quantitative scores were derived in MHQ responders.[11] After multiple testing correction for 772 phenotypes (FDR<0.05), GAD-7, PCL-6, and PHQ-9 were correlated with 312, 347, and 358 phenotypes, respectively (Supplementary Table 1). We tested different combinations of training-test ratios and feature inclusion thresholds defined by Spearman’s rho (ρ) relative to each trait (Supplementary Table 2).[33] We predicted each outcome in MHQ-non-responders using the elastic net regression parameters with the lowest root mean square error (50∣50 for GAD-7 and PCL-6 and 75∣25 for PHQ-9). Using ρ>0.20 as a feature inclusion threshold, we predicted GAD-7 with 19 phenotypes (observed versus predicted ρ=0.33, P<2x10−16), PCL-6 with 15 phenotypes (observed versus predicted ρ=0.21, P<2x10−16), and PHQ-9 with 17 (observed versus predicted ρ=0.33, P<2x10−16) phenotypes. “Neuroticism score” was the feature most strongly correlated with, and a major predictor of, internalizing symptoms (“neuroticism score” versus GAD-7 ρ=0.482, P<4.13x10−307, elastic net β=0.286; “neuroticism score” versus PCL-6 ρ=0.378, P<4.13x10−307, elastic net β=0.103; “neuroticism score” versus PHQ-9 ρ=0.41, P<4.13x10−307, elastic net β=0.03). The remaining predictors of each trait capture relevant relationships, including features such as “tenseness” and “frequency of tiredness in the last two weeks” (Supplementary Table 3).
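The FDR<0.05 criterion used here and elsewhere can be illustrated with the Benjamini-Hochberg step-up procedure. The text reports only that an FDR correction was applied, so the choice of Benjamini-Hochberg is an assumption; the function name and P-values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking which tests pass FDR control at alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    largest_passing_rank = 0
    for rank, index in enumerate(order, start=1):
        # Step-up rule: find the largest rank k with p_(k) <= alpha * k / m.
        if pvalues[index] <= alpha * rank / m:
            largest_passing_rank = rank
    rejected = [False] * m
    for index in order[:largest_passing_rank]:
        rejected[index] = True
    return rejected


# Hypothetical P-values for four phenotype correlations.
flags = benjamini_hochberg([0.001, 0.04, 0.03, 0.5])
```

With 772 phenotypes tested per outcome, a procedure of this kind determines how many correlates are carried forward to the ρ-threshold filter.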

Characteristics of MHQ-responders and non-responders

PheRS were more strongly correlated with predicted internalizing outcomes (MHQ-non-responders) than the directly ascertained outcome likely due to the dependence of these variables in MHQ-non-responders (Supplementary Tables 4 and 5). All predicted quantitative outcomes were greater in magnitude among MHQ-non-responders suggesting more severe symptoms compared to MHQ-responders. The difference was minor for PCL-6 (Cohen’s d=−0.048, P=9.96x10−27) and PHQ-9 (Cohen’s d=0.096, P=2.21x10−144) but was large for GAD-7 (Cohen’s d=−0.749, P=1x10−322; MHQ-responder mean=8.97, s.d.=3.09; non-responder mean=12.36, s.d.=4.78). Based on these observations, UKB participants with the highest “neuroticism scores” (i.e., 12; mean MHQ-responder probability=97.7%, s.d.=0.151) were 6.04-times more likely to contribute to the MHQ than those with the lowest “neuroticism score” (i.e., 0; MHQ-responder probability=16.2%, s.d.=0.872, Pdiff=1.03x10−203). This effect appeared strongest among participants with medium and low GAD scores (GAD-7=14 and GAD-7=7, respectively) but was attenuated among those with higher GAD scores (GAD-7=21; Supplementary Fig. 1). We expand upon these observations in the Supplementary Methods and Results.

SNP-based heritability

We used multiple tests to verify that elastic net-predicted outcomes and PheRS capture the same genetic liability as true observations of each outcome. We performed three GWAS for each trait (Supplementary Fig. 2): quantitative score, PheRS, and case-control status derived from quantitative scores. Though elastic-net prediction accuracies were low, genetic analyses captured similar information to that of direct-report data. Due to the high elastic net weight of “neuroticism score”, the difference between MHQ-responder and non-responder “neuroticism scores,” and the heritable component of neuroticism,[19] we analyzed GWAS only after subjecting their effect sizes to multi-trait conditioning with a GWAS of neuroticism.[19, 34, 35] After conditioning (Supplementary Table 6), all GWAS had h estimates that differed significantly from zero (GAD range: MHQ-responder GAD-7-PheRS [h=0.81%, s.e.=0.40, P=0.043] to MHQ-non-responder GAD-7-PheRS [h=3.52%, s.e.=0.32, P=3.82x10−28]; PTSD range: MHQ-responder PTSD [h=1.88%, s.e.=0.21, P=3.55x10−19] to MHQ-responder PCL-6 [h=5.57%, s.e.=0.46, P=9.35x10−35]; depression range: MHQ-non-responder current depression [h=1.61%, s.e.=0.10, P=2.55x10−58] to MHQ-non-responder PHQ-9 [h=5.89%, s.e.=0.39, P=1.62x10−51]). Unless otherwise noted, all in-text results reflect GWAS after multi-trait conditioning with neuroticism and pre-conditioning results are in Supplementary Material. Several h estimates differed significantly between the MHQ-responder and non-responder GWAS, but there was no evidence of systematic over- or under-estimation of h in either cohort (Fig. 2 and Supplementary Table 6).

Fig. 2 ∣

Verifying the concordant genetic architectures of true and predicted internalizing outcomes.

a, SNP-heritability (h²) of each internalizing outcome and the current largest unrelated sampling of a corresponding phenotype (GAD-2, PCL-17, and broad depression) after multi-trait conditioning with neuroticism. Each data point is the trait h² point estimate and error bars represent the 95% confidence interval (CI) associated with each estimate. b, Genetic correlation (rg) within and between internalizing outcomes derived from the Mental Health Questionnaire (MHQ responders) and those predicted in the MHQ non-responders of the UKB before (bottom left triangle) and after (top right triangle) multi-trait conditioning with neuroticism. Pale text indicates a phenotype from the MHQ non-responders and dark text indicates a phenotype from the MHQ responders. Each rg heatmap contains a control phenotype with expected positive rg (the largest corresponding Million Veteran Program (MVP) phenotype) and a control phenotype with expected negative rg (subjective well-being). All rg estimates survive multiple testing correction (FDR<0.05).

To evaluate the cross-ancestry portability of European-ancestry-derived PheRS, we calculated PheRS for each trait in five other ancestries defined by the Pan-UKB project: African (N=868 MHQ-responders), Admixed American (N=879 MHQ-responders), Central/South Asian (N=1 109 MHQ-responders), East Asian (N=601 MHQ-responders), and Middle Eastern (N=270 MHQ-responders). GAD-7, PCL-6, and PHQ-9 were calculated using responses to the MHQ. For all ancestries, the correlation between PheRS and the true quantitative trait was lower than that in the European population, supporting the weak translation of PheRS feature weights across populations. The maximum correlation among diverse ancestry cohorts was ρ=0.413 for GAD-7 and GAD-7-PheRS among Central/South Asians, ρ=0.449 for PCL-6 versus PCL-6-PheRS among Africans, and ρ=0.518 for PHQ-9 versus PHQ-9-PheRS among Middle Eastern individuals. The GWAS of diverse ancestry PheRS also resulted in non-significant h estimates (Supplementary Tables 4 and 6).

Genetic overlap between MHQ-responders and non-responders

The rg estimates between MHQ-responder and non-responder GWAS were high (MHQ-responder versus non-responder GAD-7-PheRS rg=1.55, s.e.=0.406, P=1.0x10−4; PCL-6-PheRS rg=1.19, s.e.=0.097, P=2.05x10−34; PHQ-9-PheRS rg=1.15, s.e.=0.084, P=3.52x10−42) and likely exceed one as a consequence of conditioning. The corresponding MVP phenotype had high rg with each MHQ-responder and non-responder phenotype (Supplementary Table 7). All traits were negatively genetically correlated with subjective well-being. We next evaluated how genetic effects detected in the MVP predicted internalizing outcomes in MHQ-responders and non-responders (Fig. 2 and Supplementary Fig. 3). MHQ-non-responders generally had greater PRS Z-scores and R² relative to MHQ-responders (Supplementary Table 8), likely reflecting greater statistical power and higher mean symptom scores of the MHQ-non-responder sample. Regression coefficients for GAD and PTSD PheRS and predicted case-control status presented similar power improvements among MHQ-non-responders. Due to complete sample overlap between UKB depression and MVP broad depression,[3] PRS were not performed for PHQ-9.

Gene discovery through meta-analysis

We meta-analyzed MHQ-responders and non-responders to describe how using the entire UKB enhances gene discovery. Then, we meta-analyzed the two UKB cohorts with the MVP. Results of meta-analyzed depression (UKB only) offered no increase in sample size or h² relative to MVP broad depression,[3] and therefore were omitted from in silico analyses. Per GWAS (P<5x10−8), we discovered (i) 10 and 12 risk loci for GAD when meta-analyzing with GAD-7 and GAD-7 PheRS, respectively and (ii) 32 and 26 risk loci for PTSD when meta-analyzing with PCL-6 and PCL-6 PheRS, respectively. Of these, 70% of GAD-7, 50% of GAD-7 PheRS, 46.8% of PCL-6, and 23.1% of PCL-6 PheRS loci were part of a credible set (Supplementary Tables 9-11). Some detected loci have prior evidence of association with GAD (e.g., PHF2-rs12376738 and resistance to depression- and anxiety-like symptoms[36] and memory consolidation[37]), PTSD (e.g., IL2-rs45510091 and low dose cytokine treatments to reverse anxious symptoms), or related symptoms.[38] After study-wide multiple testing correction (P<1.25x10−8), 7 and 6 loci were associated with GAD in meta-analyses using GAD-7 and GAD-7 PheRS, respectively and 22 and 19 loci were associated with PTSD in meta-analyses using PCL-6 and PCL-6 PheRS, respectively (Fig. 3 and Supplementary Tables 9 and 10). Positional mapping identified six genes common to the GAD and PTSD GWAS: ADAD1-IL2-IL21-KIAA1109 cluster, CRHR1-MAPT-NSF-PLEKHM1-WNT3 cluster, FAM120-FAM120AOS-PHF2 cluster, MAD1L1, SOX6, and TMEM106B.

Fig. 3 ∣

SNP annotation of GAD and PTSD GWAS.

The bottom row shows Manhattan plots for each trait. Two horizontal dashed lines in each plot show the genome-wide significance threshold per phenotype (P<5x10−8) and study-wide (P<1.25x10−8). Above each Manhattan plot are Combined Annotation Dependent Depletion (CADD) scores and RegulomeDB scores for each genome-wide significant locus.

The out-sample PRS and prenatal transcriptomic enrichment in the following sections were performed using the most powerful conditioned meta-analysis for each outcome (i.e., highest h² z-score): GAD meta-analysis using GAD-7 PheRS (h²=2.96%, s.e.=0.16, P=2.06x10−76) and PTSD meta-analysis using PCL-6 PheRS (h²=4.08%, s.e.=0.18, P=8.86x10−114; Fig. 3 and Supplementary Table 12).

Out-Sample PRS

We evaluated overlap of GAD and PTSD with previous GWAS of anxiety and PTSD traits from FinnGen (KRA_PSY_ANXIETY N=15 770 cases and 161 129 controls; F5_PTSD N=781 cases and 161 390 controls), the PGC (PTSD v1 N=2 424 cases and 7 113 controls)[39], and ANGST (N=17 310).[40] At all P-value thresholds (PT), GWAS from this study predicted all out-sample GWAS (P<0.05; Supplementary Table 13). The maximum association for each trait was: GAD versus FinnGen KRA_PSY_ANXIETY (R²=0.029%, PT=0.1, P=2.34x10−13) and PTSD versus PGC PTSD v1 (R²=0.006%, PT=0.3, P=6.03x10−4). GAD and PTSD meta-analyses were used to predict reexperiencing and self-reported anxiousness in individual-level data from the Philadelphia Neurodevelopmental Cohort (PNC)[41, 42] and Yale-Penn[43, 44] (Fig. 4 and Supplementary Material). All PRS models were significant with at least one PT (P<0.05) but the best prediction was observed for the corresponding trait: GAD and PNC self-reported anxiousness (R²=0.103%, PT=5x10−8, P=0.015) and PTSD and PNC reexperiencing (R²=0.874%, PT=1x10−7, P=1.57x10−4).

Fig. 4 ∣

Out-sample polygenic prediction of relevant phenotypes.

Maximum observed association (R²) between polygenic risk scores (PRS) for GAD and PTSD outcomes in this study and out-sample GAD and PTSD phenotypes from large consortia (ANGST, FinnGen, and PGC using summary-level PRS in PRSice v1.25, Panel A) and individual-level cohorts informative for mental health outcomes (Philadelphia Neurodevelopmental Cohort (PNC) and Yale-Penn using PRSice v2, Panel B).

Prenatal Transcriptomic Enrichment

GAD and PTSD GWAS were enriched for Brodmann Area 9 (BA9, part of the dorsolateral prefrontal cortex (DLPFC)) transcriptomic profiles (GAD β=0.022, s.e.=0.007, P=9.07x10−4; PTSD β=0.030, s.e.=0.007, P=2.85x10−5; Supplementary Table 14). Each GWAS was also associated with transcriptomic profiles from late-mid prenatal tissue (Fig. 5): GAD β=0.041, s.e.=0.014, P=0.003; PTSD β=0.042, s.e.=0.015, P=0.003 (Supplementary Table 15). These findings were complemented with 3-D chromatin-aware gene-based association in fetal brain tissue. After study-wide multiple testing correction (FDR<0.05), 86 and 584 genes were associated with GAD and PTSD, respectively (Supplementary Table 16), including CRHR1 (GAD P=1.54x10−5; PTSD P=1.75x10−8), THSD7A (GAD P=1.27x10−8; PTSD P=1.29x10−7), and LAMB2 (GAD P=5.16x10−5; PTSD P=4.44x10−7).

Fig. 5 ∣

Prenatal transcriptomic signatures of GAD and PTSD outcomes.

a, Enrichment of transcriptomic profiles from prenatal tissue based on BrainSpan 11 developmental stages. Each bar represents the results from one-sided tests for enrichment of a given transcriptomic profile. Effect size estimates (β) are color coded. Dashed horizontal lines indicate the significance threshold after multiple testing correction (FDR<0.05) across all tests. b, Manhattan plots of Hi-C coupled gene-based association studies of GAD and PTSD in fetal paracentral tissue. Each data point represents a single gene positionally aligned across each autosome. The height of each point along the y-axis indicates the significance of association between gene and phenotype with each colored data point indicating a significantly associated gene after analysis-wide multiple testing correction (P<9.43x10−7). A subset of genes are labeled and all genes are provided in Supplementary Table 16.

Consistent with tissue and 3-D chromatin data, cell-type enrichments reinforce the contribution of prenatal development in GAD and PTSD liability (Supplementary Tables 17 and 18). GAD and PTSD GWAS were enriched for independent signals from post-conception prefrontal cortex neurons.[45] In PTSD, two brain cell types had cross-dataset significant effects: GABAergic neurons from gestational week 26 (GW26) prefrontal cortex tissue (β=0.041, s.e.=0.012, P=2.64x10−4) and from the midbrains of 6-to-11-week-old embryos (β=0.246, s.e.=0.050, P=5.58x10−7). Cross-data set analyses support partial independence of these signals with primary effects from 6-to-11-week-old midbrain neurons (proportional significance of midbrain GABAergic neurons given prefrontal cortex GABAergic neurons: PSMid_NbGaba,GW26_Gaba=0.700; PSGW26_NbGaba,Mid_Gaba=0.269).[25]

Drug Effects and Repurposing

GAD and PTSD GWAS were most strongly genetically correlated with opioid use (GAD r=0.530, s.e.=0.035, P=1.08x10−50; PTSD r=0.603, s.e.=0.028, P=1.27x10−100) and antidepressant use (GAD r=0.597, s.e.=0.041, P=1.94x10−48; PTSD r=0.632, s.e.=0.035, P=3.23x10−74; Supplementary Table 19). We detected one putative causal relationship (FDR<0.05) between vasodilator use and GAD (gĉp=0.093, s.e.=0.285, P=1.04x10−4; Supplementary Table 20), but two-sample MR between MVP GAD-2 and vasodilator use was insufficiently powered to support this causal hypothesis (Supplementary Results and Supplementary Table 21). We next applied gene-ontology based drug repurposing using 9 GAD and 17 PTSD genes (Supplementary Table 22) and detected 87 GAD and 28 PTSD gene-sets (FDR<0.05, P=4.55x10−6 based on 72,394 human gene sets,[32] Supplementary Table 23). After multiple testing correction (FDR<0.05 applied per trait) we uncovered upregulation of GAD gene-sets in the context of aminohippuric acid, a putative biomarker of depression and anxiety disorders (P=1.08x10−5, EScore=0.917; Supplementary Table 24).[46]

Discussion

Extremely large cohorts are required to discover polygenic signals associated with anxiety[12, 16, 40, 47] and PTSD.[4, 5, 39, 48] Biobanks such as the UKB offer an opportunity to boost sample size, power, and trait ascertainment homogeneity. In practice, UKB mental health studies are limited by MHQ response (31% of participants[8, 11, 12]) and by non-random missingness in questionnaire participation. Because of the large proportion of missingness in the UKB MHQ, we aimed to maximize the sample size informative for GAD and PTSD by studying the genetic architecture of comorbid phenotype patterns (PheRS) associated with these traits. Predicted outcomes and PheRS reliably capture the genetic architecture of GAD and PTSD.
Unsurprisingly, “neuroticism score” contributed substantial predictive power to elastic net regression. In the context of socioeconomic variables and internalizing spectrum psychopathologies, higher neuroticism scores were paradoxically associated with higher probability of MHQ participation. However, this effect appears to be driven by participants with low-to-medium GAD-7 scores and therefore may be due to liability to a subtype rather than neuroticism more broadly. Our data suggest that the interplay between neuroticism and GAD may affect MHQ participation bias. This could be due to the elevated anxiousness/tenseness elements rather than the worry/vulnerability elements of neuroticism.[49] We initially hypothesized that prior internalizing studies in the UKB were underpowered due to sample size, but the data reported here support a more refined hypothesis. Following previous evidence of a two-factor model of neuroticism, the depletion of a subtype (e.g., the worry/vulnerability factor) in MHQ responders relative to non-responders may at least partially explain the limited success of anxiety and PTSD GWAS in the UKB.[34, 49] We leveraged the higher neuroticism scores of MHQ non-responders to more than double the sample size upper limit for GWAS of GAD and PTSD outcomes while enriching the sample for individuals with objectively more severe symptoms.
This procedure resulted in detection of more than twice as many genomic risk loci associated with anxiety and PTSD relative to previous studies. Meta-analyses using PheRS were more powerful than meta-analyses using predicted quantitative traits. Though h² differences between PheRS and quantitative outcomes were relatively small, we hypothesize that PheRSs capture slightly more accurate information about each trait because they are derived from tangential responses to questions not ascertained in the context of mental health (i.e., as part of the MHQ). Therefore, studying genetic liability to PheRS, in combination with directly ascertained symptoms, may help reduce analytic noise in self-reported assessments.[50]
Several approaches to locus functional annotation converged on fetal/prenatal biology. These findings are interesting given the childhood to mid-adult onset of internalizing disorders.[51] We attribute our observations to the improved statistical power of a larger sample size rather than to multi-trait conditioning with neuroticism.[19, 34] Consistent with previous studies,[52, 53] the DLPFC was identified here as a tissue of interest for GAD and PTSD. We extended these observations to cell-type and 3-D chromatin interaction data to detect gestational week GABAergic neurons and several genes of interest with effects in fetal brain tissue (GAD: TMEM106B; PTSD: CRHR1, LAMB2). In a prior single-cell RNA-seq study of the DLPFC (N = 1,057 neurons), the late gestational periods detected in our study were most enriched for genes related to axon guidance, neuron differentiation, and axonogenesis.[45, 54]
We utilized the improved power of our meta-analyses to identify potential drug targets and/or drugs that induce anxiety and PTSD symptoms as adverse effects. We detected a relationship between vasodilator use and GAD that could not be confirmed using a two-sample MR approach but has been detected in epidemiology research.[55] The partial causal effect size of vasodilator use on GAD was small, so MR might be underpowered to detect this result considering (i) the requirement for non-overlapping samples and (ii) biases in latent causal estimates in the presence of strong pleiotropic effects among highly polygenic traits.[56] The discordance between these methods may reflect a causal relationship between GAD and vasodilator use that transcends a genetically regulated molecular relationship (e.g., regulatory or proteomic elements).
The results from our study expand prior findings on the biology of GAD and PTSD, but there are several limitations to consider. First, while capturing very similar genetic liability to GAD and PTSD, elastic net-predicted phenotypes were weakly correlated with known GAD-7 and PCL-6 scores among MHQ responders. Thus, these values have no utility for epidemiological studies of GAD or PTSD, even though they are valuable in genetic studies of psychopathology.[57] Second, our study demonstrated that PheRS feature weights derived from EUR participants do not generalize to individuals of other ancestries. Lack of generalizability may be attributed to genetic differences and/or documented variability in healthcare experiences across racial and ethnic groups.[58] Our group and others aim to recognize and reduce these health disparities using carefully tested PheRS for these populations.[13] Third, machine learning identifies patterns in data, not necessarily trait relationships. We identified mathematically informative and biologically meaningful predictors of GAD and PTSD symptoms, but these features, their predictive patterns, and the regression weights reported here may not translate outside the UK Biobank. Future studies need to investigate how well PheRS created in one cohort generalize to other cohorts. Finally, solutions to non-random missingness can be influenced by the proportion of missingness. Our data support the non-random nature of UKB MHQ missingness with respect to certain features of mental health (e.g., higher neuroticism scores), but this attribute of data missingness may not extend to other biobanks. Future work will require detailed investigation of the type of missingness observed, its proportion, and how best to fill those gaps, including the use of other machine learning or imputation pipelines.
The PheRS derived here permit studies of GAD and PTSD in the whole UKB cohort. Our results provide one roadmap by which the community may improve sample size and statistical power for enhanced risk locus discovery in the context of incompletely ascertained traits in the UKB and other biobanks with limited mental health assessment. We use these data to present biological underpinnings uncovered by the largest GWAS meta-analysis of these traits to date.
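
As an illustration of why combining strata can improve power, the following is a minimal sketch of a fixed-effect, inverse-variance-weighted meta-analysis for a single variant. It is not the pipeline used in this study: the weighting scheme is an assumption (this section does not state which scheme was applied), the function name `ivw_meta` is hypothetical, and the effect sizes and standard errors are invented for illustration only.

```python
import math


def ivw_meta(estimates):
    """Fixed-effect inverse-variance-weighted meta-analysis of one variant.

    estimates: list of (beta, se) pairs, one pair per stratum.
    Returns (pooled_beta, pooled_se, z).
    """
    # Each stratum is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


# Hypothetical per-variant estimates (illustrative values, not study results).
responders = (0.020, 0.008)      # directly ascertained symptoms
non_responders = (0.018, 0.006)  # predicted outcome or PheRS

beta, se, z = ivw_meta([responders, non_responders])
z_single = [b / s for b, s in (responders, non_responders)]
print(f"pooled beta={beta:.5f}, se={se:.4f}, z={z:.2f}; single-stratum z={z_single}")
```

With concordant effects in both strata, the pooled standard error is smaller than either input standard error, so the pooled test statistic exceeds that of each stratum alone.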

Data availability

All data used to generate figures for this study are provided as Supplementary Material. Elastic net weights are provided as Supplementary Material. GWAS summary data are accessible via Zenodo (DOI: 10.5281/zenodo.4767570). This research has been conducted using the UK Biobank Resource (application reference no. 58146), which is available to bona fide researchers through approved access. Out-of-sample polygenic risk scoring utilized the Yale-Penn cohort (dbGaP Study Accession: phs000425.v1.p1) and the Philadelphia Neurodevelopmental Cohort (dbGaP Study Accession: phs000607.v3.p2). The dbGaP data used herein are available for approved-access download through the dbGaP data request portal.
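
For readers who wish to reuse the published elastic net weights, the sketch below shows one way such weights might be applied to a new participant. It assumes the score is a linear combination of features (as in a fitted elastic net regression) and that features are coded and scaled exactly as in the training data. The function name `apply_weights`, the feature names, and all numeric values are hypothetical and are not taken from the Supplementary Material.

```python
def apply_weights(features, weights, intercept=0.0):
    """Return intercept + sum(weight * feature value) for one participant.

    features: dict mapping feature name -> value for one participant.
    weights: dict mapping feature name -> regression weight.
    Raises KeyError if any weighted feature is absent, rather than
    silently treating a missing value as zero.
    """
    missing = sorted(name for name in weights if name not in features)
    if missing:
        raise KeyError(f"participant lacks weighted features: {missing}")
    return intercept + sum(w * features[name] for name, w in weights.items())


# Hypothetical weights and participant record (illustration only).
weights = {
    "neuroticism_score": 0.30,
    "deprivation_index": 0.05,
    "seen_doctor_for_nerves": 0.80,
}
participant = {
    "neuroticism_score": 1.5,
    "deprivation_index": -0.4,
    "seen_doctor_for_nerves": 1,
}

score = apply_weights(participant, weights)
print(round(score, 3))
```

Raising an error on missing features is a deliberate choice in this sketch: given the non-random missingness discussed above, imputing zeros without inspection could bias the resulting scores.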

45 in total

1. Million Veteran Program: A mega-biobank to study genetic influences on health and disease.

Authors: John Michael Gaziano; John Concato; Mary Brophy; Louis Fiore; Saiju Pyarajan; James Breeling; Stacey Whitbourne; Jennifer Deen; Colleen Shannon; Donald Humphries; Peter Guarino; Mihaela Aslan; Daniel Anderson; Rene LaFleur; Timothy Hammond; Kendra Schaa; Jennifer Moser; Grant Huang; Sumitra Muralidhar; Ronald Przygodzki; Timothy J O'Leary
Journal: J Clin Epidemiol Date: 2015-10-09 Impact factor: 6.437

2. An abbreviated PTSD checklist for use as a screening instrument in primary care.

Authors: Ariel J Lang; Murray B Stein
Journal: Behav Res Ther Date: 2005-05

Review 3. The Patient Health Questionnaire Somatic, Anxiety, and Depressive Symptom Scales: a systematic review.

Authors: Kurt Kroenke; Robert L Spitzer; Janet B W Williams; Bernd Löwe
Journal: Gen Hosp Psychiatry Date: 2010-05-07 Impact factor: 3.238

4. Reproducible Genetic Risk Loci for Anxiety: Results From ∼200,000 Participants in the Million Veteran Program.

Authors: Daniel F Levey; Joel Gelernter; Renato Polimanti; Hang Zhou; Zhongshan Cheng; Mihaela Aslan; Rachel Quaden; John Concato; Krishnan Radhakrishnan; Julien Bryois; Patrick F Sullivan; Murray B Stein
Journal: Am J Psychiatry Date: 2020-01-07 Impact factor: 18.112

5. Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions.

Authors: David M Howard; Mark J Adams; Toni-Kim Clarke; Jonathan D Hafferty; Jude Gibson; Masoud Shirali; Jonathan R I Coleman; Saskia P Hagenaars; Joey Ward; Eleanor M Wigmore; Clara Alloza; Xueyi Shen; Miruna C Barbu; Eileen Y Xu; Heather C Whalley; Riccardo E Marioni; David J Porteous; Gail Davies; Ian J Deary; Gibran Hemani; Klaus Berger; Henning Teismann; Rajesh Rawal; Volker Arolt; Bernhard T Baune; Udo Dannlowski; Katharina Domschke; Chao Tian; David A Hinds; Maciej Trzaskowski; Enda M Byrne; Stephan Ripke; Daniel J Smith; Patrick F Sullivan; Naomi R Wray; Gerome Breen; Cathryn M Lewis; Andrew M McIntosh
Journal: Nat Neurosci Date: 2019-02-04 Impact factor: 28.771

Review 6. Overview of the BioBank Japan Project: Study design and profile.

Authors: Akiko Nagai; Makoto Hirata; Yoichiro Kamatani; Kaori Muto; Koichi Matsuda; Yutaka Kiyohara; Toshiharu Ninomiya; Akiko Tamakoshi; Zentaro Yamagata; Taisei Mushiroda; Yoshinori Murakami; Koichiro Yuji; Yoichi Furukawa; Hitoshi Zembutsu; Toshihiro Tanaka; Yozo Ohnishi; Yusuke Nakamura; Michiaki Kubo
Journal: J Epidemiol Date: 2017-02-08 Impact factor: 3.211

7. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression.

Authors: Naomi R Wray; Stephan Ripke; Manuel Mattheisen; Maciej Trzaskowski; Enda M Byrne; Abdel Abdellaoui; Mark J Adams; Esben Agerbo; Tracy M Air; Till M F Andlauer; Silviu-Alin Bacanu; Marie Bækvad-Hansen; Aartjan F T Beekman; Tim B Bigdeli; Elisabeth B Binder; Douglas R H Blackwood; Julien Bryois; Henriette N Buttenschøn; Jonas Bybjerg-Grauholm; Na Cai; Enrique Castelao; Jane Hvarregaard Christensen; Toni-Kim Clarke; Jonathan I R Coleman; Lucía Colodro-Conde; Baptiste Couvy-Duchesne; Nick Craddock; Gregory E Crawford; Cheynna A Crowley; Hassan S Dashti; Gail Davies; Ian J Deary; Franziska Degenhardt; Eske M Derks; Nese Direk; Conor V Dolan; Erin C Dunn; Thalia C Eley; Nicholas Eriksson; Valentina Escott-Price; Farnush Hassan Farhadi Kiadeh; Hilary K Finucane; Andreas J Forstner; Josef Frank; Héléna A Gaspar; Michael Gill; Paola Giusti-Rodríguez; Fernando S Goes; Scott D Gordon; Jakob Grove; Lynsey S Hall; Eilis Hannon; Christine Søholm Hansen; Thomas F Hansen; Stefan Herms; Ian B Hickie; Per Hoffmann; Georg Homuth; Carsten Horn; Jouke-Jan Hottenga; David M Hougaard; Ming Hu; Craig L Hyde; Marcus Ising; Rick Jansen; Fulai Jin; Eric Jorgenson; James A Knowles; Isaac S Kohane; Julia Kraft; Warren W Kretzschmar; Jesper Krogh; Zoltán Kutalik; Jacqueline M Lane; Yihan Li; Yun Li; Penelope A Lind; Xiaoxiao Liu; Leina Lu; Donald J MacIntyre; Dean F MacKinnon; Robert M Maier; Wolfgang Maier; Jonathan Marchini; Hamdi Mbarek; Patrick McGrath; Peter McGuffin; Sarah E Medland; Divya Mehta; Christel M Middeldorp; Evelin Mihailov; Yuri Milaneschi; Lili Milani; Jonathan Mill; Francis M Mondimore; Grant W Montgomery; Sara Mostafavi; Niamh Mullins; Matthias Nauck; Bernard Ng; Michel G Nivard; Dale R Nyholt; Paul F O'Reilly; Hogni Oskarsson; Michael J Owen; Jodie N Painter; Carsten Bøcker Pedersen; Marianne Giørtz Pedersen; Roseann E Peterson; Erik Pettersson; Wouter J Peyrot; Giorgio Pistis; Danielle Posthuma; Shaun M Purcell; Jorge A Quiroz; Per Qvist; John P Rice; Brien P 
Riley; Margarita Rivera; Saira Saeed Mirza; Richa Saxena; Robert Schoevers; Eva C Schulte; Ling Shen; Jianxin Shi; Stanley I Shyn; Engilbert Sigurdsson; Grant B C Sinnamon; Johannes H Smit; Daniel J Smith; Hreinn Stefansson; Stacy Steinberg; Craig A Stockmeier; Fabian Streit; Jana Strohmaier; Katherine E Tansey; Henning Teismann; Alexander Teumer; Wesley Thompson; Pippa A Thomson; Thorgeir E Thorgeirsson; Chao Tian; Matthew Traylor; Jens Treutlein; Vassily Trubetskoy; André G Uitterlinden; Daniel Umbricht; Sandra Van der Auwera; Albert M van Hemert; Alexander Viktorin; Peter M Visscher; Yunpeng Wang; Bradley T Webb; Shantel Marie Weinsheimer; Jürgen Wellmann; Gonneke Willemsen; Stephanie H Witt; Yang Wu; Hualin S Xi; Jian Yang; Futao Zhang; Volker Arolt; Bernhard T Baune; Klaus Berger; Dorret I Boomsma; Sven Cichon; Udo Dannlowski; E C J de Geus; J Raymond DePaulo; Enrico Domenici; Katharina Domschke; Tõnu Esko; Hans J Grabe; Steven P Hamilton; Caroline Hayward; Andrew C Heath; David A Hinds; Kenneth S Kendler; Stefan Kloiber; Glyn Lewis; Qingqin S Li; Susanne Lucae; Pamela F A Madden; Patrik K Magnusson; Nicholas G Martin; Andrew M McIntosh; Andres Metspalu; Ole Mors; Preben Bo Mortensen; Bertram Müller-Myhsok; Merete Nordentoft; Markus M Nöthen; Michael C O'Donovan; Sara A Paciga; Nancy L Pedersen; Brenda W J H Penninx; Roy H Perlis; David J Porteous; James B Potash; Martin Preisig; Marcella Rietschel; Catherine Schaefer; Thomas G Schulze; Jordan W Smoller; Kari Stefansson; Henning Tiemeier; Rudolf Uher; Henry Völzke; Myrna M Weissman; Thomas Werge; Ashley R Winslow; Cathryn M Lewis; Douglas F Levinson; Gerome Breen; Anders D Børglum; Patrick F Sullivan
Journal: Nat Genet Date: 2018-04-26 Impact factor: 38.330

8. International meta-analysis of PTSD genome-wide association studies identifies sex- and ancestry-specific genetic risk loci.

Authors: Caroline M Nievergelt; Adam X Maihofer; Torsten Klengel; Elizabeth G Atkinson; Chia-Yen Chen; Karmel W Choi; Jonathan R I Coleman; Shareefa Dalvie; Laramie E Duncan; Joel Gelernter; Daniel F Levey; Mark W Logue; Renato Polimanti; Allison C Provost; Andrew Ratanatharathorn; Murray B Stein; Katy Torres; Allison E Aiello; Lynn M Almli; Ananda B Amstadter; Søren B Andersen; Ole A Andreassen; Paul A Arbisi; Allison E Ashley-Koch; S Bryn Austin; Esmina Avdibegovic; Dragan Babić; Marie Bækvad-Hansen; Dewleen G Baker; Jean C Beckham; Laura J Bierut; Jonathan I Bisson; Marco P Boks; Elizabeth A Bolger; Anders D Børglum; Bekh Bradley; Megan Brashear; Gerome Breen; Richard A Bryant; Angela C Bustamante; Jonas Bybjerg-Grauholm; Joseph R Calabrese; José M Caldas-de-Almeida; Anders M Dale; Mark J Daly; Nikolaos P Daskalakis; Jürgen Deckert; Douglas L Delahanty; Michelle F Dennis; Seth G Disner; Katharina Domschke; Alma Dzubur-Kulenovic; Christopher R Erbes; Alexandra Evans; Lindsay A Farrer; Norah C Feeny; Janine D Flory; David Forbes; Carol E Franz; Sandro Galea; Melanie E Garrett; Bizu Gelaye; Elbert Geuze; Charles Gillespie; Aferdita Goci Uka; Scott D Gordon; Guia Guffanti; Rasha Hammamieh; Supriya Harnal; Michael A Hauser; Andrew C Heath; Sian M J Hemmings; David Michael Hougaard; Miro Jakovljevic; Marti Jett; Eric Otto Johnson; Ian Jones; Tanja Jovanovic; Xue-Jun Qin; Angela G Junglen; Karen-Inge Karstoft; Milissa L Kaufman; Ronald C Kessler; Alaptagin Khan; Nathan A Kimbrel; Anthony P King; Nastassja Koen; Henry R Kranzler; William S Kremen; Bruce R Lawford; Lauren A M Lebois; Catrin E Lewis; Sarah D Linnstaedt; Adriana Lori; Bozo Lugonja; Jurjen J Luykx; Michael J Lyons; Jessica Maples-Keller; Charles Marmar; Alicia R Martin; Nicholas G Martin; Douglas Maurer; Matig R Mavissakalian; Alexander McFarlane; Regina E McGlinchey; Katie A McLaughlin; Samuel A McLean; Sarah McLeay; Divya Mehta; William P Milberg; Mark W Miller; Rajendra A Morey; Charles Phillip Morris; 
Ole Mors; Preben B Mortensen; Benjamin M Neale; Elliot C Nelson; Merete Nordentoft; Sonya B Norman; Meaghan O'Donnell; Holly K Orcutt; Matthew S Panizzon; Edward S Peters; Alan L Peterson; Matthew Peverill; Robert H Pietrzak; Melissa A Polusny; John P Rice; Stephan Ripke; Victoria B Risbrough; Andrea L Roberts; Alex O Rothbaum; Barbara O Rothbaum; Peter Roy-Byrne; Ken Ruggiero; Ariane Rung; Bart P F Rutten; Nancy L Saccone; Sixto E Sanchez; Dick Schijven; Soraya Seedat; Antonia V Seligowski; Julia S Seng; Christina M Sheerin; Derrick Silove; Alicia K Smith; Jordan W Smoller; Scott R Sponheim; Dan J Stein; Jennifer S Stevens; Jennifer A Sumner; Martin H Teicher; Wesley K Thompson; Edward Trapido; Monica Uddin; Robert J Ursano; Leigh Luella van den Heuvel; Miranda Van Hooff; Eric Vermetten; Christiaan H Vinkers; Joanne Voisey; Yunpeng Wang; Zhewu Wang; Thomas Werge; Michelle A Williams; Douglas E Williamson; Sherry Winternitz; Christiane Wolf; Erika J Wolf; Jonathan D Wolff; Rachel Yehuda; Ross McD Young; Keith A Young; Hongyu Zhao; Lori A Zoellner; Israel Liberzon; Kerry J Ressler; Magali Haas; Karestan C Koenen
Journal: Nat Commun Date: 2019-10-08 Impact factor: 14.919

9. Mental health in UK Biobank - development, implementation and results from an online questionnaire completed by 157 366 participants: a reanalysis.

Authors: Katrina A S Davis; Jonathan R I Coleman; Mark Adams; Naomi Allen; Gerome Breen; Breda Cullen; Chris Dickens; Elaine Fox; Nick Graham; Jo Holliday; Louise M Howard; Ann John; William Lee; Rose McCabe; Andrew McIntosh; Robert Pearsall; Daniel J Smith; Cathie Sudlow; Joey Ward; Stan Zammit; Matthew Hotopf
Journal: BJPsych Open Date: 2020-02-06

10. The UK Biobank resource with deep phenotyping and genomic data.

Authors: Clare Bycroft; Colin Freeman; Desislava Petkova; Gavin Band; Lloyd T Elliott; Kevin Sharp; Allan Motyer; Damjan Vukcevic; Olivier Delaneau; Jared O'Connell; Adrian Cortes; Samantha Welsh; Alan Young; Mark Effingham; Gil McVean; Stephen Leslie; Naomi Allen; Peter Donnelly; Jonathan Marchini
Journal: Nature Date: 2018-10-10 Impact factor: 49.962