Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use.

Mengzhen Liu¹, Yu Jiang^2,3, Robbee Wedow^4,5,6, Yue Li^7,8, David M Brazel^4,9,10, Fang Chen^2,3, Gargi Datta¹, Jose Davila-Velderrain^7,8, Daniel McGuire^2,3, Chao Tian¹¹, Xiaowei Zhan^12,13, Hélène Choquet¹⁴, Anna R Docherty^15,16, Jessica D Faul¹⁷, Johanna R Foerster¹⁸, Lars G Fritsche¹⁸, Maiken Elvestad Gabrielsen¹⁹, Scott D Gordon²⁰, Jeffrey Haessler²¹, Jouke-Jan Hottenga²², Hongyan Huang^23,24, Seon-Kyeong Jang¹, Philip R Jansen^25,26, Yueh Ling^2,9, Reedik Mägi²⁷, Nana Matoba²⁸, George McMahon²⁹, Antonella Mulas³⁰, Valeria Orrù³⁰, Teemu Palviainen³¹, Anita Pandit¹⁸, Gunnar W Reginsson³², Anne Heidi Skogholt¹⁹, Jennifer A Smith^17,33, Amy E Taylor²⁹, Constance Turman^23,24, Gonneke Willemsen²², Hannah Young¹, Kendra A Young³⁴, Gregory J M Zajac¹⁸, Wei Zhao³³, Wei Zhou³⁵, Gyda Bjornsdottir³², Jason D Boardman^4,5,6, Michael Boehnke¹⁸, Dorret I Boomsma²², Chu Chen²¹, Francesco Cucca³⁰, Gareth E Davies³⁶, Charles B Eaton³⁷, Marissa A Ehringer^4,38, Tõnu Esko^8,27, Edoardo Fiorillo³⁰, Nathan A Gillespie^15,20, Daniel F Gudbjartsson^32,39, Toomas Haller²⁷, Kathleen Mullan Harris^40,41, Andrew C Heath⁴², John K Hewitt^4,43, Ian B Hickie⁴⁴, John E Hokanson³⁴, Christian J Hopfer^4,45, David J Hunter^23,24,46, William G Iacono¹, Eric O Johnson⁴⁷, Yoichiro Kamatani²⁸, Sharon L R Kardia³³, Matthew C Keller^4,43, Manolis Kellis^7,8, Charles Kooperberg²¹, Peter Kraft^23,24,48, Kenneth S Krauter^4,9, Markku Laakso^49,50, Penelope A Lind⁵¹, Anu Loukola³¹, Sharon M Lutz⁵², Pamela A F Madden⁴², Nicholas G Martin²⁰, Matt McGue¹, Matthew B McQueen^4,38, Sarah E Medland⁵¹, Andres Metspalu²⁷, Karen L Mohlke⁵³, Jonas B Nielsen⁵⁴, Yukinori Okada^28,55, Ulrike Peters^21,56, Tinca J C Polderman²⁵, Danielle Posthuma^25,57, Alexander P Reiner^21,56, John P Rice⁵⁸, Eric Rimm^24,59, Richard J Rose⁶⁰, Valgerdur Runarsdottir⁶¹, Michael C Stallings^4,43, Alena Stančáková⁴⁹, Hreinn Stefansson³², Khanh K Thai¹⁴, Hilary A Tindle⁶², Thorarinn Tyrfingsson⁶¹, Tamara L Wall⁶³, David R Weir¹⁷, 
Constance Weisner¹⁴, John B Whitfield²⁰, Bendik Slagsvold Winsvold⁶⁴, Jie Yin¹⁴, Luisa Zuccolo^29,65, Laura J Bierut⁵⁸, Kristian Hveem^19,66,67, James J Lee¹, Marcus R Munafò^65,68, Nancy L Saccone⁶⁹, Cristen J Willer^35,54,70, Marilyn C Cornelis⁷¹, Sean P David⁷², David A Hinds¹¹, Eric Jorgenson¹⁴, Jaakko Kaprio^31,73, Jerry A Stitzel^4,38, Kari Stefansson^32,74, Thorgeir E Thorgeirsson³², Gonçalo Abecasis¹⁸, Dajiang J Liu^75,76, Scott Vrieze⁷⁷.

Abstract

Tobacco and alcohol use are leading causes of mortality that influence risk for many complex diseases and disorders[1]. They are heritable[2,3] and etiologically related[4,5] behaviors that have been resistant to gene discovery efforts[6-11]. In sample sizes up to 1.2 million individuals, we discovered 566 genetic variants in 406 loci associated with multiple stages of tobacco use (initiation, cessation, and heaviness) as well as alcohol use, with 150 loci evidencing pleiotropic association. Smoking phenotypes were positively genetically correlated with many health conditions, whereas alcohol use was negatively correlated with these conditions, such that increased genetic risk for alcohol use is associated with lower disease risk. We report evidence for the involvement of many systems in tobacco and alcohol use, including genes involved in nicotinic, dopaminergic, and glutamatergic neurotransmission. The results provide a solid starting point to evaluate the effects of these loci in model organisms and more precise substance use measures.

Year: 2019 PMID: 30643251 PMCID: PMC6358542 DOI: 10.1038/s41588-018-0307-5

Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330

An analysis overview is provided in Supplementary Figure 1; all independent associated variants are in Supplementary Tables 1–5; and Quantile-Quantile (QQ), Manhattan, and LocusZoom plots are shown in Supplementary Figures 2–12. Smoking initiation phenotypes included age of initiation of regular smoking (AgeSmk; N=341,427; 10 associated variants) and a binary phenotype indicating whether an individual had ever smoked regularly (SmkInit; N=1,232,091; 378 associated variants). Heaviness of smoking was measured with cigarettes per day (CigDay; N=337,334; 55 associated variants). Smoking cessation (SmkCes; N=547,219; 24 associated variants) was a binary variable contrasting current versus former smokers. Available measures of alcohol use were simpler, with drinks per week (DrnkWk; N=941,280; 99 associated variants) widely available and similarly measured across studies. See the Supplementary Note and Supplementary Tables 6–7 for phenotype definition details. The four smoking phenotypes were genetically correlated with one another (Figure 1; Supplementary Table 8). DrnkWk was not highly genetically correlated with the smoking phenotypes (rg~.10) except for SmkInit (rg~.34, p=6.7×10⁻⁶³), suggesting that the sequence variants affecting alcohol use and those affecting initiation of smoking overlap substantially. The phenotypes were highly genetically correlated across constituent studies (Supplementary Table 9), suggesting that phenotypic heterogeneity had only a minor impact on the present results, even across Western Europe and the United States. Smoking phenotypes were genetically correlated in expected directions with many behavioral, psychiatric, and medical phenotypes (Figure 1, Supplementary Table 10). Genetic variation associated with increased alcohol use was associated with greater levels of risky behavior (rg=.20, p=1.8×10⁻⁷) and cannabis use (rg=.36, p=6.2×10⁻¹⁰), but with less risk of disease, for almost all diseases (Figure 1, Supplementary Table 10).

Figure 1.

Genetic correlations between substance use phenotypes and phenotypes from other large genome-wide association studies.

Genetic correlations between each of the phenotypes are shown in the first 5 rows, with heritability estimates displayed down the diagonal. All genetic correlations and heritability estimates were calculated using LD Score Regression. Blue shading represents negative genetic correlations, and red shading represents positive correlations, with increasing color intensity reflecting increasing strength of a correlation. A single asterisk reflects significant genetic correlations at the p<.05 level. Double asterisks reflect significant genetic correlations at the Bonferroni-correction p<.000278 level (corrected for 180 independent tests). Note that SmkCes was oriented such that higher scores reflected current smoking, and for AgeSmk lower scores reflect earlier ages of initiation, both of which are typically associated with negative outcomes. AgeSmk=Age of Initiation of Smoking; CigDay=Cigarettes per Day; SmkInit=Smoking Initiation; SmkCes=Smoking Cessation; DrnkWk=Drinks per Week.

Using a novel method to evaluate multivariate genetic correlation at the locus (versus global) level, we observed 150 loci that affected multiple substance use phenotypes (Supplementary Table 11), summarized in Figure 2. Patterns of pleiotropy across phenotypes were highly diverse, with only three loci significantly associated with all five phenotypes. These three loci included associations implicating Phosphodiesterase 4B (PDE4B) and Cullin 3 (CUL3). PDE4B regulates the availability of the second messenger cAMP and thereby affects signal transduction, and is down-regulated by chronic nicotine administration in rats[12]. CUL3 has wide-ranging effects, including on ubiquitination and protein degradation, and de novo mutations in CUL3 are associated with rare diseases affecting response to the mineralocorticoid aldosterone[13], which itself is affected by smoking[14] and associated with alcohol use[15]. In addition to testing for pleiotropy, we also used MTAG[16] to leverage the observed genetic correlations to increase power for locus discovery. Using this method, we discovered 1,193 independent, genome-wide significantly associated common variants (MAF > 1%; 173 for AgeSmk, 89 for CigDay, 83 for SmkCes, 692 for SmkInit, and 156 for DrnkWk) listed in Supplementary Table 12 and described further in the supplement.

Figure 2.

Pleiotropy.

Depicted here are results from the multivariate analysis of pleiotropy. For each locus, the method returns the best fitting solution of which phenotypes were associated with that locus. All loci with one or more associated phenotypes are shown here. For example, every locus associated with AgeSmk was found to be pleiotropic for other phenotypes (green, blue, red, purple, and fuchsia bars), and no locus showed association with only AgeSmk (no dark grey bar for AgeSmk). When sample sizes are unequal across phenotypes, the method also improves power for those phenotypes with smaller samples. The total number of loci associated with each trait (whether pleiotropic or not) from these analyses was 40 (AgeSmk), 48 (SmkCes), 72 (CigDay), 111 (DrnkWk), and 278 (SmkInit). Full information is in Supplementary Table 11.

Phenotypic variation accounted for by our initial 566 conditionally independent genome-wide significant variants from the initial GWAS ranged from 0.1% (SmkCes) to 2.3% (SmkInit; see Figure 3). SNP heritability calculated using LD Score Regression[17] ranged from 4.2% for DrnkWk to 8.0% for CigDay (Figure 3; Supplementary Table 13), consistent with estimates using individual-level data[18], SNP heritabilities calculated from the largest individual contributing studies (Supplementary Table 13), and prior work[19]. The results suggest that these phenotypes are highly polygenic and the majority of the heritability is accounted for by variants below standard GWAS thresholds.

Figure 3.

Heritability and polygenic prediction.

The light gray bars reflect SNP heritability, estimated with LD Score Regression. The light blue and gold bars reflect the predictive power of polygenic risk scores in Add Health and the Health and Retirement Study (HRS), respectively. Despite the 41-year generational gap between participants from these two studies, and major tobacco-related policy changes during that time, the polygenic scores are similarly predictive in both samples. Error bars are 95% confidence intervals estimated with 1000 bootstrapped repetitions. Dark gray bars represent the total phenotypic variance explained by only genome-wide significant SNPs. H2=heritability.
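As a rough illustration of the quantities plotted in Figure 3, the sketch below builds a polygenic score as a weighted sum of allele dosages and bootstraps a 95% confidence interval for the variance it explains. The toy data, the function names, and the use of a plain squared correlation as the variance-explained measure are all assumptions made for illustration; the scoring pipeline actually used (including covariates) is described in the Supplementary Note, not here.

```python
import random
from statistics import mean


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0..2) using per-variant effect sizes."""
    return sum(d * w for d, w in zip(dosages, weights))


def r_squared(x, y):
    """Squared Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def bootstrap_ci(scores, phenotypes, n_boot=1000, seed=0):
    """Percentile 95% interval for r_squared, resampling individuals with replacement."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(r_squared([scores[i] for i in idx],
                               [phenotypes[i] for i in idx]))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]


# Hypothetical toy data: 3 variants, 8 individuals (not from the study).
toy_weights = [0.075, -0.014, -0.012]
toy_dosages = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 0, 0],
               [1, 2, 2], [2, 2, 1], [0, 2, 0], [1, 0, 1]]
toy_phenotype = [10.0, 14.0, 21.0, 9.0, 12.0, 18.0, 8.0, 15.0]

toy_scores = [polygenic_score(d, toy_weights) for d in toy_dosages]
print(r_squared(toy_scores, toy_phenotype))
print(bootstrap_ci(toy_scores, toy_phenotype, n_boot=1000))
```

The percentile bootstrap is one common choice; the caption states only that 1000 bootstrapped repetitions were used, not which interval construction was applied.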

To further investigate the polygenicity, polygenic risk scores (Supplementary Table 14) were computed on the Add Health[20] and Health and Retirement Study[21] datasets, which are representative of their birth cohorts in the United States, and represent exposures to different tobacco policy environments. Add Health participants were born, on average, in 1979; average birth year in the Health and Retirement Study was 1938. Despite these generational differences, the polygenic score performed similarly in both samples. It accounted for approximately 1%, 4%, 1%, 4%, and 2.5% of variance in AgeSmk, CigDay, SmkCes, SmkInit, and DrnkWk, respectively, about half of the estimated SNP heritability of these traits (Figure 3). More concretely, in Add Health and the Health and Retirement Study, respectively, a one SD increase in the CigDay risk score corresponded to two and three additional daily cigarettes; a one SD increase in the SmkInit risk score corresponded to a 12% and 10% increased risk of regularly smoking; and a one SD increase in the DrnkWk risk score corresponded to one additional drink per week in both datasets.

Cell/tissue enrichment[22] was observed across all five phenotypes within core histone marks from multiple central nervous system (CNS) tissues (Supplementary Figures 13–15, Supplementary Tables 15–16). Enrichment was observed in tissues from cortical and sub-cortical regions in the CNS. Structure and function of these regions have been robustly associated with individual differences in frequencies, magnitudes, and clinical characteristics of alcohol use, and substance use/misuse generally, in human imaging research. Our results include significant enrichment across phenotypes and histone marks in the hippocampus[23], inferior temporal pathways[24], dorsolateral and medial prefrontal cortex[25], caudate, and striatum[26]. Consistent with gene and pathway findings described below, alcohol and nicotine use affect dopaminergic and glutamatergic neurotransmission among these brain regions, compromising reward-based learning and facilitating drug-seeking behavior[26]. Enrichment within other cell/tissue groups and specific cell/tissue types included immune and liver cells but was less consistent across analytical approaches.

We manually reviewed all genes implicated by the GWAS or gene-based tests (see Supplementary Tables 1–5 for the full catalogue of implicated genes; Supplementary Tables 17–21 for gene and gene-set test results). We replicated known associations between multiple variants in nicotine metabolism gene CYP2A6 with CigDay (p=4.0×10⁻⁹⁹) and SmkCes (p=1.6×10⁻⁴⁸). We replicated an association signal in alcohol metabolism gene ADH1B associated with DrnkWk, identifying in that locus 11 conditionally independently associated variants (lowest p<2.2×10⁻³⁰³).

All drugs of abuse activate the mesolimbic dopamine system reward pathway[27], and dopamine-related genes have long been popular candidate genes. We found that variants near the widely studied dopamine receptor D2 (DRD2)[28] were associated across phenotypes, including CigDay, SmkCes, and DrnkWk (p=6.5×10⁻¹², 1.1×10⁻¹⁰, and 4.9×10⁻¹¹, respectively) but not with AgeSmk or SmkInit, suggesting that these variants are less relevant in early stages of nicotine use. Other specific dopamine-related genes only showed associations with smoking phenotypes, including multiple associations between CigDay and SmkCes with dopamine beta-hydroxylase (DBH, p=9.8×10⁻²⁴ and 1.2×10⁻³⁵, respectively)[9], an enzyme necessary to convert dopamine to norepinephrine. SmkInit was associated with variation near protein phosphatase 1 regulatory subunit 1B (PPP1R1B, p=3.9×10⁻⁸), a signal transduction gene that affects synaptic plasticity and reward-based learning in the striatum[29,30] and contributes to the behavioral effects of nicotine in mice[31]. In pathway analyses, dopamine gene sets were enriched only in SmkInit, where the exemplar pathway ‘reactome dopamine neurotransmitter release cycle’ was enriched (p=9.2×10⁻⁵; Figure 4; Supplementary Table 18).

Figure 4.

Correlations among exemplary DEPICT gene sets.

There were 68 clusters available for Smoking initiation and 10 for Drinks Per Week (CigDay, AgeSmk, and SmkCes did not have > 1 exemplary sets.) Blue shading represents positive correlations, and red shading represents negative correlations, with increasing color intensity reflecting increasing strength of a correlation. Cluster names are truncated for space, with a full list of all names in Supplementary Table 18. The number after each name is the number of gene sets in each cluster. The matrix naturally falls into three blue superclusters along the diagonal. The largest supercluster contains primarily gene sets related to neurotransmitter receptors, ion channels (sodium, potassium, calcium), learning/memory, and other aspects of CNS function. The middle supercluster includes gene sets defined by regulation of transcription and translation, including RNA binding and transcription factor activity. The final supercluster is composed primarily of gene sets related to development of the nervous system.

Neuronal acetylcholine nicotinic receptors are the initial site of nicotine action in the brain and have long been implicated in nicotine use and dependence[32]. With the exception of CHRNA7, all CNS-expressed nicotinic receptor genes were significantly associated with one or more smoking phenotypes, many reported here for the first time. Enrichment was also noted for nicotinic receptor-related pathways and genes in smoking phenotypes (Supplementary Tables 17–21). There was no evidence of association of nicotinic receptor genes or pathways with DrnkWk, despite the use of nicotinic receptor partial agonists (e.g., varenicline) in the treatment of alcohol dependence[33].

Associations with SmkInit highlighted structures and functions related to long-term potentiation and reward-related learning and memory, systems that affect reward processing and addiction[28,34,35]. Glutamate is an important neurotransmitter mediating these processes, and exemplar pathways related to glutamate were significantly enriched in SmkInit (e.g., ‘extracellular-glutamate-gated ion channel’, p=9.9×10⁻⁷; ‘post-NMDA receptor activation events’, p=5.5×10⁻⁵; and ‘DLG4 PPI subnetwork’, p=4.5×10⁻¹²; Supplementary Table 18). DLG4 affects NMDA receptors and potassium channel clusters, and plays a central role in glutamatergic models of reward-related learning[35]. Individual associated genes related to these pathways included glutamate ionotropic receptor NMDA type subunit 2A (GRIN2A, p=3.4×10⁻¹¹) and homer scaffolding protein 2 (HOMER2, p=3.1×10⁻¹⁴), which affects addictive behavior in mice[35,36] and regulates glutamate metabotropic receptor 1 (GRM1). Pathways enriched in SmkInit also included sodium, potassium, and calcium voltage-gated channels (Figure 4, Supplementary Table 18), essential to neuronal excitability and signaling.

Alcohol is known to affect glutamatergic signaling pathways[37], and over half of the enriched pathways for DrnkWk clustered within the exemplar ‘glutamate ionotropic receptor kainate type subunit 2 (GRIK2) PPI subnetwork’ (Figure 4, Supplementary Table 18). Not all DrnkWk-enriched pathways involved the brain, however, as glucose and carbohydrate processing pathways were associated with DrnkWk but no smoking phenotype, perhaps suggesting that alcohol consumption is influenced by individual differences in one’s ability to process calorie-rich alcoholic beverages.

Finally, we discovered variation in and around gene-rich regions including corticotropin releasing hormone receptor 1 (CRHR1; p=1.6×10⁻¹⁷) and urocortin (UCN; p=8.1×10⁻⁴⁵), associated with DrnkWk but not smoking. UCN encodes an endogenous ligand for CRHR1 and CRHR2[38]. CRH affects hormones involved in the stress response, including cortisol, and has been associated with the stress response and relapse to drug taking in animals[39,40].

Specific mechanisms by which implicated genes influence substance use in humans are largely unknown, even for those genes reported above involving systems such as neurotransmission, reward-related learning and memory, and the stress response. To prioritize genes for functional experimentation, we tabulated conditionally independent genome-wide significant nonsynonymous variants (Table 1). In the 406 GWAS loci, 4% of sentinel variants were nonsynonymous, representing a significant enrichment (p=2.5×10⁻¹⁰; 0.4% of variants with MAF>0.1% in the imputation panel[41] were nonsynonymous). Several genes in Table 1 have been previously associated with substance use/addiction (see Supplementary Table 22 for a list of previous associations), and two variants have been functionally validated (rs1229984 and rs16969968)[42,43]. The others have not, but in some cases their genes interact with established molecular targets of addiction and may themselves be suitable targets for further investigation. For example, rs1024323 in G protein-coupled receptor (GPCR) kinase 4 (GRK4) was associated with CigDay (p=8.7×10⁻⁹) and lies within a locus associated with AgeSmk. GRK4 is involved in the regulation of GPCRs including metabotropic glutamate receptor 1 (GRM1)[44], GABAB receptors[45], and dopamine receptor D1 (DRD1) and D3 (DRD3) in the kidneys and cerebellum, and is involved in essential hypertension[46]. GRK4 is also expressed in the midbrain and forebrain[46,47], but no research has evaluated its impact on substance use behavior. To take one more example, the nonsynonymous variant in SLC39A8 affects zinc and manganese transport, is highly pleiotropic for complex phenotypes, and may impair inflammation, glutamatergic neurotransmission, and regulation of various metals in the body[48].

Table 1.

Nonsynonymous sentinel variants.

The sentinel variant in approximately 4% of loci was nonsynonymous. Shown here are all nonsynonymous sentinel variants, and all nonsynonymous variants in near-perfect LD with a sentinel variant. If the listed gene was also associated (through single variant or gene-based test) with another phenotype, that phenotype is listed in parentheses. Several genes have been implicated in previous studies of substance use/addiction, including CHRNA5, BDNF, GCKR, and ADH1B.

Phenotype	Gene	rsID	Chr	Position	REF	ALT	AF	Beta	p	N	Q
CigDay (SmkCes)	CHRNA5	rs16969968[a]	15	78,882,925	G	A	.34	.075	1.2×10⁻²⁷⁸	330,721	.34
CigDay	HIST1H2BE	rs7766641	6	26,184,102	G	A	.27	−.014	2.9×10⁻¹⁰	335,553	.78
CigDay (AgeSmk)	GRK4	rs1024323	4	3,006,043	C	T	.38	−.012	8.7×10⁻⁹	337,334	.17
SmkInit	REV3L	rs462779[a]	6	111,695,887	G	A	.81	−.019	4.5×10⁻²⁹	1,232,091	.67
SmkInit (DrnkWk)	BDNF	rs6265	11	27,679,916	C	T	.20	−.016	2.8×10⁻¹⁹	1,232,091	.13
SmkInit	RHOT2	rs1139897	16	720,986	G	A	.23	−.012	1.8×10⁻¹⁵	1,232,091	.61
SmkInit (DrnkWk)	ZNF789	rs6962772[a]	7	99,081,730	A	G	.15	−.015	2.1×10⁻¹⁴	1,232,091	.92
SmkInit	BRWD1	rs4818005[a]	21	40,574,305	A	G	.58	−.010	3.9×10⁻¹⁴	1,232,091	.75
SmkInit	ENTPD6	rs6050446	20	25,195,509	A	G	.97	.035	8.8×10⁻¹³	1,225,969	.33
SmkInit	RPS6KA4	rs17857342[a]	11	64,138,905	T	G	.38	−.010	9.8×10⁻¹²	1,232,091	.16
SmkInit	FAM163A	rs147052174	1	179,783,167	G	T	.02	.037	2.3×10⁻¹⁰	1,232,091	.59
SmkInit	PRRC2B	rs34553878	9	134,907,263	A	G	.11	.016	1.2×10⁻⁹	1,232,091	.28
SmkInit	ADAM15	rs45444697[a]	1	155,033,918	C	T	.21	.010	5.3×10⁻⁹	1,232,091	.46
SmkInit	MMS22L	rs9481410[a]	6	97,677,118	G	A	.76	.010	1.1×10⁻⁸	1,232,091	.04
SmkInit	QSER1	rs62618693	11	32,956,492	C	T	.04	−.020	2.1×10⁻⁸	1,232,091	1.00
DrnkWk	ADH1B	rs1229984	4	100,239,319	T	C	.96	.060	2.2×10⁻³⁰⁸	941,280	.05
DrnkWk	GCKR	rs1260326	2	27,730,940	T	C	.60	.008	8.1×10⁻⁴⁵	941,280	.10
DrnkWk	SLC39A8	rs13107325	4	103,188,709	C	T	.07	−.009	1.5×10⁻²²	941,280	.33
DrnkWk	SERPINA1	rs28929474	14	94,844,947	C	T	.02	−.012	1.3×10⁻¹¹	941,280	.50
DrnkWk (SmkInit)	ACTR1B	rs11692465	2	98,275,354	G	A	.09	.008	2.5×10⁻¹¹	937,516	.40
DrnkWk	TNFSF12–13	rs3803800	17	7,462,969	A	G	.79	.004	1.5×10⁻¹⁰	941,280	.67
DrnkWk	HGFAC	rs3748034	4	3,446,091	G	T	.14	−.005	1.7×10⁻⁸	941,280	.65

Note: Phenotype abbreviations are defined in Figure 1. Chr=Chromosome; REF=reference allele; ALT=alternate allele; AF=allele frequency of ALT allele; Q=Cochran’s Q statistic p-value.

[a] These variants were not themselves sentinel, but were in near-perfect LD with a sentinel variant (R² > .99, from the 1000 Genomes European population). The scale of Beta is on the unit of the standard deviation of the phenotype. For binary phenotypes the standard deviation was calculated from the weighted average prevalence across all studies included in the meta-analysis (available in Supplementary Table 7).
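To make the Beta scale concrete: a 0/1 phenotype with prevalence K has standard deviation sqrt(K(1 − K)), so a standardized effect can be converted back to an approximate per-allele change in probability on the linear scale. The sketch below assumes that reading of the table note; the prevalence value is a placeholder chosen for illustration, not a figure from Supplementary Table 7, and the function names are hypothetical.

```python
import math


def binary_sd(prevalence):
    """Standard deviation of a 0/1 phenotype with the given prevalence."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    return math.sqrt(prevalence * (1.0 - prevalence))


def beta_to_risk_difference(beta_sd_units, prevalence):
    """Approximate per-allele change in probability (linear scale)."""
    return beta_sd_units * binary_sd(prevalence)


# Placeholder prevalence (an assumption, not taken from the supplement).
example_prevalence = 0.5
# Beta for rs6265 (BDNF) on SmkInit, as listed in Table 1.
example_beta = -0.016

print(beta_to_risk_difference(example_beta, example_prevalence))
```

With a different prevalence the same Beta maps to a different absolute change, which is why the weighted average prevalence matters for interpreting the binary-phenotype rows.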

Ultimately, substance use is embedded in a complex web of causal relations[49] (e.g., Figure 1), and caution must be exercised in drawing strong causal conclusions. However, the present findings represent a major step forward in understanding the etiology of these complex, disease-relevant behaviors. In particular, statistical and interpretive power were both enabled by simultaneously studying multiple related substance use behaviors representing different stages of use and substances. More precise measurements, including evaluating age and environment as moderators for these dynamic phenotypes[50], functional research, and complementary gene mapping approaches (e.g., sequencing) will aid in the discovery of mechanisms by which implicated genes may affect substance use and related disease risk.

METHODS

This article is accompanied by a Supplementary Note with further details, as well as the Life Sciences Reporting Summary.

Generation of summary statistics.

Participants in all studies were genotyped on genome-wide arrays. The majority of studies imputed their genotypes to the Haplotype Reference Consortium[41] using the University of Michigan Imputation Server (see URLs)[51]. Several studies did not impute using the imputation server, due to data sharing restrictions, computational, and/or resource limitations (described in the Supplementary Note). All studies used either Minimac3[51] or IMPUTE2[52] for imputation. GWAS summary statistics were generated in each study sample using RVTESTS[53] according to a standard analysis plan. Studies composed primarily of classically related individuals (e.g., family studies) first regressed out covariates including genetic principal components under a linear model, inverse-normalized the residuals (except for 23andMe), and tested for an additive effect of each variant under a linear mixed model with a genetic kinship matrix. Family studies followed this analysis for all phenotypes, even binary phenotypes such as smoking initiation and cessation. Studies of entirely classically unrelated individuals followed the same analysis for quasi-continuous phenotypes (AgeSmk, CigDay, DrnkWk), but estimated additive genetic effects under a logistic model for binary phenotypes (SmkInit and SmkCes). Quality control checks were applied to ensure quality of both the phenotypes and genotypes. For each phenotype and covariate, distribution statistics including the minimum, maximum, quartiles, median, mean, and standard deviation were examined. We ensured that these statistics were within expected limits given the phenotype definitions and any scale transformations per the analysis plan. We also evaluated simple relationships among phenotypes. When discrepancies were noted we contacted the original study for clarification or re-analysis, or the data were removed from further analysis. Phenotypic statistics are presented in Supplementary Tables 6 and 7. 
Extensive genetic quality control and filtering was performed on the contributed summary statistics from each cohort. We removed imputed variants with imputation quality less than 0.3 (the estimated squared correlation between the imputed dosage and true dosage). We compared the per-study allele labels and allele frequencies with those of the imputation reference panels, and removed or reconciled mismatches. For quantitative traits, we plotted the variance of the score statistics against the sample size, and tested whether the trait residuals in each study were properly normalized and whether the trait analyzed between studies was measured and analyzed using the same unit.
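The inverse-normalization step applied to residuals can be sketched as a rank-based transform: residuals are ranked, ranks are mapped to quantiles, and quantiles are passed through the inverse normal CDF. The text does not state which rank offset or tie-handling rule was used, so the offset of 0.5 and the averaging of tied ranks below are assumptions, and the function name is hypothetical.

```python
from statistics import NormalDist


def inverse_normal_transform(values, offset=0.5):
    """Rank-based inverse normal transform.

    Ties receive the average of their ranks. The quantile for rank r is
    (r - offset) / (n - 2*offset + 1); offset=0.5 gives (r - 0.5) / n.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - offset) / (n - 2.0 * offset + 1.0))
            for r in ranks]


# Hypothetical residuals after regressing out covariates.
residuals = [3.2, -1.0, 0.4, 0.4, 7.9]
print(inverse_normal_transform(residuals))
```

The transformed values keep the ordering of the residuals but follow an approximately standard normal distribution, which makes effect sizes comparable across studies that measured the phenotype on different scales.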

Meta-Analysis.

Meta-analysis was performed centrally using the software package rareGWAMA (see URLs). All statistical tests in the meta-analysis or secondary analyses of the meta-analytic results (e.g., polygenic risk scoring, functional enrichment, MTAG, Genomic SEM) were two-sided. Given that rarer variants and/or behavioral phenotypes may show between-study heterogeneity in allele frequencies, imputation qualities, or genetic architecture, we extended existing methods and developed a novel fixed effects approach that accounts for between-study heterogeneity. Specifically, the methods aggregated weighted Z-score statistics, i.e. $Z_{meta} = \sum_k w_k Z_k / \sqrt{\sum_k w_k^2}$, where $Z_k$ is the Z-score statistic in study k. The weight $w_k$ is defined by $w_k = \sqrt{N_k R_k^2 p_k (1 - p_k)}$, where $p_k$ is the variant allele frequency, $R_k^2$ is the imputation quality, and $N_k$ is the sample size for study k. Under the null and with the present sample sizes, $Z_{meta}$ is normally distributed. The weights are proportional to the sample genotype variance. When the trait is uniformly measured and the allele frequency is similar, the method is approximately equivalent to meta-analysis of sample-size-weighted Z-scores. Yet, the method accounts for between-study heterogeneity in imputation accuracy and allele frequencies. The use of a fixed effects model, the most common approach in GWAS meta-analysis of single ancestry groups, appeared acceptable given the apparent lack of substantial meta-analytic effect heterogeneity (see Cochran’s Q and I² statistics in Supplementary Tables 1–5). Population stratification and cryptic relatedness were addressed during the generation of summary statistics by each local study through the use of kinship-based linear mixed models[54] and genetic principal components[55]. Residual stratification was further corrected at the meta-analytic level with study-specific genomic controls[56] (calculated separately for variants with MAF ≥ 1% and 0.1% ≤ MAF < 1%; Supplementary Table 23) applied to each study’s results prior to meta-analysis.
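A minimal sketch of the weighted Z-score aggregation follows. It assumes the per-study weight takes the form sqrt(N × R² × p × (1 − p)), which is consistent with the description above (weights built from sample size, imputation quality, and allele frequency, reducing to sample-size weighting when the latter two are similar) but is not the rareGWAMA implementation itself. Function names and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def study_weight(n, allele_freq, rsq):
    """Assumed weight: sqrt(N * R^2 * p * (1 - p))."""
    return math.sqrt(n * rsq * allele_freq * (1.0 - allele_freq))


def meta_z(studies):
    """Combine per-study Z-scores.

    studies: iterable of (z, n, allele_freq, imputation_rsq) tuples.
    Returns sum(w_k * z_k) / sqrt(sum(w_k ** 2)).
    """
    numerator = 0.0
    sum_sq_weights = 0.0
    for z, n, allele_freq, rsq in studies:
        w = study_weight(n, allele_freq, rsq)
        numerator += w * z
        sum_sq_weights += w * w
    if sum_sq_weights == 0.0:
        raise ValueError("all study weights are zero")
    return numerator / math.sqrt(sum_sq_weights)


def two_sided_p(z):
    """Two-sided p-value under a standard normal null."""
    return 2.0 * NormalDist().cdf(-abs(z))


# Hypothetical per-study inputs: (Z, N, allele frequency, imputation R^2).
example_studies = [
    (2.1, 50_000, 0.30, 0.98),
    (1.4, 12_000, 0.28, 0.85),
    (3.0, 200_000, 0.31, 0.99),
]
z_combined = meta_z(example_studies)
print(z_combined, two_sided_p(z_combined))
```

A poorly imputed or low-frequency variant in one study contributes less to the combined statistic than its raw sample size alone would suggest, which is the stated motivation for the weighting.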
A locus was defined as a 1 Mb region surrounding the “sentinel” variant (the variant in the locus with the lowest p-value). When any two such loci overlapped or abutted, they were collapsed into a single locus. Variants within each locus were subjected to conditional analysis using a novel partial correlation-based score statistic using cohort-level summary statistics[57] implemented in a sequential forward selection framework. The method requires marginal association statistics and approximated covariance matrices among them, and performs favorably compared to existing methods[57] (Supplementary Table 24). Covariances among effects were based upon the linkage disequilibrium information estimated from a subset of the Haplotype Reference Consortium[41]. We applied multiple post-meta-analysis variant filters to ensure robustness of reported findings. To reduce artifacts arising from a small number of studies, we excluded any variant that was present in two or fewer studies. For each variant in the meta-analysis, we calculated the effective sample size $N_{eff} = \sum_k N_k R_k^2$, where $N_k$ is the sample size in study k and $R_k^2$ is the imputation quality. We removed variants with effective sample sizes < 10% of the total sample size to ensure only well-imputed variants with a modicum of power were included. We also excluded all variants with minor allele frequency less than 0.001, the lower bound of moderate imputation accuracy with the currently best available imputation reference panel[41]. Variants with MAF > 1% are expected to be imputed with high accuracy. Results from the application of post-meta-analysis filters are displayed in Supplementary Table 25. After applying variant filters and obtaining our final meta-analytic results, we calculated genomic controls and maximum/median per-variant sample sizes. Sample sizes ranged from 337,334 for cigarettes per day to 1,232,091 for smoking initiation.
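The locus definition and the post-meta-analysis filters described above can be sketched as follows. Two points are assumptions rather than statements from the text: the 1 Mb window is taken to be centered on the sentinel variant (500 kb on each side), and "total sample size" is taken to mean the total meta-analysis sample size for the phenotype. The effective sample size is computed as the sum over studies of N × R². All names and example values are hypothetical.

```python
def collapse_loci(sentinels, window=1_000_000):
    """Merge sentinel-centered windows that overlap or abut.

    sentinels: list of (chromosome, position) pairs.
    Returns a sorted list of (chromosome, start, end) loci.
    """
    half = window // 2
    intervals = sorted((chrom, max(0, pos - half), pos + half)
                       for chrom, pos in sentinels)
    merged = []
    for chrom, start, end in intervals:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def effective_n(per_study):
    """Sum of N_k * R2_k over the studies in which the variant is present."""
    return sum(n * rsq for n, rsq in per_study)


def passes_filters(per_study, maf, total_n,
                   min_studies=3, min_fraction=0.10, min_maf=0.001):
    """Apply the three filters: study count, effective N, and MAF."""
    return (len(per_study) >= min_studies
            and effective_n(per_study) >= min_fraction * total_n
            and maf >= min_maf)


# Hypothetical sentinel positions on two chromosomes.
print(collapse_loci([(1, 1_000_000), (1, 1_800_000),
                     (1, 5_000_000), (2, 1_000_000)]))

# Hypothetical variant seen in three studies: (N_k, R2_k) pairs.
variant_studies = [(100_000, 0.95), (50_000, 0.60), (20_000, 0.40)]
print(effective_n(variant_studies))
print(passes_filters(variant_studies, maf=0.004, total_n=1_232_091))
```

Sorting the windows first means a single left-to-right pass is enough to merge chains of overlapping loci, however many sentinels fall close together.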
QQ plots, LD intercept tests, and genomic control values indicate that Type I error rates were well controlled, for common and low-frequency variants (Supplementary Figure 2, Supplementary Table 26). All conditionally independent variants were plotted in LocusZoom and included in Supplementary Figures 1–12. All plots were visually inspected, suspicious loci were identified (see Supplementary Table 27) and removed from further consideration. To ensure LD information was available between sentinel variants and others in the locus, we used surrogate variants for eight loci (Supplementary Table 28). We estimated the extent of pleiotropy for each genome-wide associated locus from our GWAS using an Empirical Bayes approach (i.e. whether a given locus is simultaneously associated with multiple phenotypes). Using summary association statistics from a given locus as input, the method estimated the 5×5 genetic correlation of the locus and the posterior probability of association for all possible phenotype configurations, while accounting for genome-wide genetic correlations and trait residual correlations. In cases where loci associated with different phenotypes overlapped, the locus was expanded in size. Statistical details are available in the Supplementary Note, Section 3.3. We applied MTAG[16] to variants with MAF>1% from the final meta-analysis results for each phenotype, using the other four phenotypes to increase power for locus discovery. Genomic controls and LD Intercept tests of the MTAG results were well controlled (Supplementary Table 29), and Manhattan and QQ plots well-behaved (Supplementary Figures 16 and 17). GCTA-COJO[58] was used to identify conditionally independent variants (listed in Supplementary Table 12). All loci were plotted with LocusZoom, visually inspected, with suspicious loci identified (e.g., those without LD support; see Supplementary Table 30) removed from further consideration. 
Additional details, including testing of MTAG model assumptions, are provided in the Supplementary Note. Finally, we also applied Genomic SEM[59] to our five phenotypes to formally model and factor their correlation structure. See Supplementary Figure 18, Supplementary Table 31, and the Supplementary Note for further details.
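The locus definition and the post-meta-analysis variant filters described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that the 1 Mb region is centred on the sentinel variant (500 kb on each side) and that "total sample size" means the total for the phenotype being analysed. The function names (`merge_loci`, `effective_n`, `passes_filters`) and the input layout are hypothetical.

```python
WINDOW = 1_000_000  # assumed: 1 Mb locus centred on the sentinel variant


def merge_loci(sentinels):
    """Turn (chrom, position) sentinels into merged (chrom, start, end) loci.

    Loci on the same chromosome that overlap or abut are collapsed into one.
    """
    half = WINDOW // 2
    intervals = sorted((c, max(0, p - half), p + half) for c, p in sentinels)
    merged = []
    for chrom, start, end in intervals:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def effective_n(per_study):
    """Effective sample size: sum over studies of N_k * R_k^2.

    per_study is a list of (sample_size, imputation_r2) pairs, one pair per
    study in which the variant was present.
    """
    return sum(n * r2 for n, r2 in per_study)


def passes_filters(per_study, maf, total_n):
    """Apply the three variant filters described in the text."""
    if len(per_study) <= 2:   # present in two or fewer studies
        return False
    if maf < 0.001:           # below the MAF lower bound
        return False
    return effective_n(per_study) >= 0.10 * total_n
```

For example, `merge_loci([(1, 1_000_000), (1, 2_000_000)])` returns a single locus spanning 500,000 to 2,500,000 on chromosome 1, because the two 1 Mb windows abut.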

Genome-wide significant threshold.

The primary focus was to test variants with MAF≥1%, as these will be imputed with high confidence. The statistical significance threshold applied to meta-analysis of all variants with MAF≥1% was 5×10⁻⁸, consistent with widespread convention in GWAS of European individuals. Since our imputation procedure is expected to provide some marginal level of accuracy down to MAF of 0.1%, we also conducted an exploratory association test for low-frequency variants with 0.1%<MAF<1%, to which we applied a statistical significance threshold of p<5×10⁻⁹. Only two such low-frequency variants surpassed the conventional common variant threshold of p<5×10⁻⁸. Of these two, one low-frequency variant, associated with SmkInit, survived the more stringent multiple testing correction (rs181508347, intergenic, MAF=0.0096, p=5×10⁻¹⁰), and is included in our count of discovered loci and in Supplementary Table 4. The more stringent threshold applies a correction for ~10 million tests, which is approximately the number of conditionally independent variants tested once the MAF lower bound was extended from 1% to 0.1%. We calculated this threshold using three existing methods[60-62]. These methods make use of the eigenvalues of the matrix of LD (measured in R²) between SNPs, calculated with a spectral decomposition. We estimated the number of independent tests using the genotype data from a subset of the Haplotype Reference Consortium panel[41]. We first calculated LD blocks across the genome using the algorithm implemented in PLINK version 1.9[63] with default settings, and then we lowered the MAF threshold to 0.1% to accommodate all low-frequency variants. Next, we calculated the effective number of independent tests within each LD block and between LD blocks using the aforementioned three methods, which we aggregated to get the total number of independent tests. 
The three techniques estimated the number of independent tests at 9.8–10.1 million, similar to other independent estimates[64]. A total of 278 sentinel variants (including the one genome-wide significant low-frequency variant) had p < 5×10⁻⁹, out of the original 406 with p < 5×10⁻⁸.
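As a rough illustration of how an eigenvalue-based effective number of tests leads to a significance threshold, the Python sketch below uses a simple stand-in estimator, (Σλ)²/Σλ². For a symmetric matrix this equals trace(R)² divided by the sum of squared entries, so no explicit spectral decomposition is needed. This estimator is an assumption chosen for brevity and is not claimed to be one of the three cited methods. The sketch also sums within-block estimates only and omits the between-block step. All function names are hypothetical.

```python
def effective_tests(ld):
    """Effective number of independent tests for one LD block.

    ld is a symmetric correlation-like matrix (list of lists). The estimate
    is (sum of eigenvalues)^2 / (sum of squared eigenvalues), computed from
    the trace and the sum of squared entries.
    """
    n = len(ld)
    sum_eigen = sum(ld[i][i] for i in range(n))
    sum_eigen_sq = sum(ld[i][j] ** 2 for i in range(n) for j in range(n))
    return sum_eigen ** 2 / sum_eigen_sq


def total_independent_tests(blocks):
    """Aggregate per-block estimates (within-block contribution only)."""
    return sum(effective_tests(block) for block in blocks)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for a family-wise error rate alpha."""
    return alpha / n_tests
```

Three uncorrelated variants give an effective count of 3, and three perfectly correlated variants give 1. With roughly 10 million independent tests and a family-wise error rate of 0.05, `bonferroni_threshold(0.05, 10_000_000)` gives the 5×10⁻⁹ threshold quoted in the text.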

Heritability.

We used univariate and bivariate LD Score Regression[17] to assess the heritability of each phenotype and to estimate a variety of genetic correlations. Analyses included (1) LD Score Regression intercept tests to evaluate the extent to which population stratification or cryptic relatedness may artificially inflate our summary statistics; (2) estimation of genetic correlations across our five phenotypes; (3) estimation of genetic correlations computed within a phenotype but between the larger contributing studies, as an estimate of the extent to which phenotypes were measuring the same genetic risk in different studies; and (4) estimation of genetic correlation between the five phenotypes and a wide variety of other phenotypes related to smoking and alcohol behaviors for which GWAS have already been made publicly available. Under standard assumptions, bivariate score regression produces unbiased estimates of genetic correlation, even in the presence of sample overlap[65]. Accordingly, to estimate the extent of genetic correlation between each of our phenotypes, and between our phenotypes and other phenotypes related to nicotine and alcohol use, we used standard procedures in LD Score Regression[22]. To be included in these analyses, variants were restricted to those present in HapMap3 with MAF>0.01. Standard errors were estimated with a block jackknife over all variants. We estimated the proportion of variance explained by the set of all conditionally independently associated variants. The joint effects of variants in a locus were approximated by $\hat{\beta}_{\mathrm{joint}} = V^{-1}U$, where $U$ is the vector of single-variant score statistics and $V$ is the covariance matrix between them. The phenotypic variance explained by the independently associated variants in a locus is given by $\hat{\beta}_{\mathrm{joint}}^{\top}\,\mathrm{cov}(G)\,\hat{\beta}_{\mathrm{joint}}$, where cov(G) is the genotype covariance estimated from the Haplotype Reference Consortium panel.
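The variance-explained calculation can be sketched in a few lines of Python. The sketch assumes the joint effects are obtained by solving V β = U for the score statistics U and their covariance V, and that the phenotype has unit variance so that the quadratic form β' cov(G) β can be read as a proportion. The helper names (`solve`, `joint_effects`, `variance_explained`) and the toy numbers in the usage note are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def joint_effects(u, v):
    """Joint effect estimates: the solution of V beta = U."""
    return solve(v, u)


def variance_explained(beta, cov_g):
    """Quadratic form beta' cov(G) beta for one locus."""
    n = len(beta)
    return sum(beta[i] * cov_g[i][j] * beta[j]
               for i in range(n) for j in range(n))
```

With toy inputs `u = [1.0, 2.0]`, `v = [[2.0, 0.0], [0.0, 4.0]]`, and `cov_g = [[1.0, 0.5], [0.5, 1.0]]`, the joint effects are both 0.5 and the quadratic form evaluates to 0.75. Summing this quantity over loci would give the total for the set of independently associated variants.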

Polygenic scoring.

Polygenic risk scores (PRS) were computed using LDpred[66], which accounts for linkage disequilibrium between variants. Since we do not know the variance-covariance matrix of the effects in the training sample (here, the GWAS results), we replace this matrix with a block diagonal matrix estimated using LD patterns from the prediction cohorts, after dropping cryptically related individuals and ancestry outliers. Smoking and alcohol use rates are influenced by secular trends and policy changes over the last half century. We therefore selected two independent prediction cohorts, the Health and Retirement Study (HRS)[21] and the National Longitudinal Study of Adolescent to Adult Health (Add Health)[20]. The HRS is a nationally representative study of U.S. households that began in 1992; the mean birth year of respondents is 1938 (SD=9.3), and the mean age at the time of assessment is 57.6 (SD=8.9). Add Health is a nationally representative sample of U.S. adolescents enrolled in grades 7 through 12 during the 1994–1995 school year. The mean birth year of respondents was 1979 (SD=1.8), and the mean age at assessment (here, wave 4) was 29.0 (SD=1.8). In the HRS, ~57% of respondents reported ever smoking regularly, and these respondents smoked ~13 cigarettes per day. In Add Health, slightly fewer respondents (~53%) reported ever smoking regularly, and these respondents smoked ~11 cigarettes per day on average (Supplementary Table 14). For each of our five phenotype scores, we used variants that overlapped with HapMap3 (~1.1 million) to construct the scores. Prediction accuracy was estimated using ordinary least squares regression of a given phenotype (AgeSmk, CigDay, SmkInit, SmkCes, or DrnkWk) on the polygenic score and covariates including age, sex, age × sex interaction, and the first ten genetic principal components. Prediction accuracy comes from a two-step process where we first regress the phenotype on a standard set of covariates without including the PRS. 
Then, the PRS predictor is added and the difference in the coefficient of determination (R²) is calculated. For our quantitative phenotypes, AgeSmk, CigDay, and DrnkWk, the predictive power of the PRS is the change in R² between the regression without the PRS and the regression with the PRS. For our two binary phenotypes, SmkInit and SmkCes, we measure the incremental pseudo-R² from probit regressions. 95% confidence intervals around all R² values are bootstrapped with 1000 repetitions each. The same polygenic scoring procedure was applied to the MTAG results (Supplementary Table 32).
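The two-step incremental R² for a quantitative phenotype can be sketched in plain Python as follows. This is an illustration rather than the authors' code. It fits ordinary least squares through the normal equations, and it assumes a percentile bootstrap for the confidence interval, since the text does not state which bootstrap interval was used. The probit pseudo-R² used for the binary phenotypes is not covered. All function names are hypothetical.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(y, columns):
    """R^2 from OLS of y on an intercept plus the given predictor columns."""
    n = len(y)
    x = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(x[0])
    xtx = [[sum(row[j] * row[k] for row in x) for k in range(p)]
           for j in range(p)]
    xty = [sum(x[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in x]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def incremental_r2(y, covariates, prs):
    """R^2 with covariates plus PRS, minus R^2 with covariates alone."""
    return r_squared(y, covariates + [prs]) - r_squared(y, covariates)


def bootstrap_ci(y, covariates, prs, reps=1000, seed=0):
    """Percentile 95% interval for the incremental R^2 (assumed method)."""
    rng = random.Random(seed)
    n = len(y)
    stats = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(incremental_r2(
            [y[i] for i in idx],
            [[col[i] for i in idx] for col in covariates],
            [prs[i] for i in idx]))
    stats.sort()
    return stats[int(0.025 * reps)], stats[int(0.975 * reps) - 1]
```

Here `covariates` is a list of columns (for example age, sex, their product, and the principal components), each the same length as `y`, and `prs` is the polygenic score column.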

Epigenomic enrichment.

To detect genome-wide functional and tissue-specific epigenomic enrichments, we performed enrichment analyses by heritability stratification using Linkage Disequilibrium Score Regression as implemented in the LDSC software (v1.0.0). Annotation-stratified LD scores were estimated using dichotomized/binary annotations, 1000 Genomes Project samples with European ancestry, and one million base-pair LD windows by default. LDSC then determines functional enrichment of the GWAS traits by partitioning heritability according to the variance explained by the LD-linked SNPs belonging to each functional category[22]. Statistical enrichment was defined as the ratio between the percentage of heritability explained by variants in each annotated category and the percentage of variants covered by that category. A resampling approach was used to estimate standard errors[22]. Following standard procedure, we trained a baseline LDSC model using the 52 non-cell-type-specific functional categories (plus one category that includes all SNPs) and used the observed z-scores of HapMap3 SNPs for each trait. We tested cell-group enrichments over 10 pre-defined cell-group annotations[22]. The cell-group annotations are the result of aggregating 220 cell-type-specific annotations over 4 histone marks (H3K4me1, H3K4me3, H3K9ac, H3K27ac) and 100 well-defined cell types. To detect which specific epigenomes contribute to the group-level enrichment, we performed 220 tests over each individual annotation. Multiple testing was accounted for through Bonferroni correction within phenotype, with 10 tests for the cell-group annotation enrichment analyses and 220 tests for the cell-specific enrichment analyses. As a complementary method to LDSC, we also applied a recently developed mixture model learning approach[67], and report these results in Supplementary Figure 13.
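The enrichment ratio and the within-phenotype Bonferroni correction reduce to simple arithmetic, sketched below in Python. The function names and the numbers in the usage note are made up for illustration and are not results from the study. In practice the heritability shares would come from the LDSC partitioned-heritability output.

```python
def enrichment(h2_category, h2_total, snps_category, snps_total):
    """Share of heritability in a category divided by its share of SNPs."""
    return (h2_category / h2_total) / (snps_category / snps_total)


def bonferroni_significant(p_values, n_tests, alpha=0.05):
    """Flag p-values below alpha / n_tests (correction within one phenotype)."""
    threshold = alpha / n_tests
    return [p < threshold for p in p_values]
```

For instance, a category holding 5% of SNPs but 30% of heritability has an enrichment of about 6. With 10 cell-group tests the per-test threshold is 0.005, and with 220 cell-type-specific tests it is about 2.3×10⁻⁴.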

Gene and Gene-Set Tests.

For each phenotype, we used SEQMINER[68] and the UCSC genome browser annotations (refGene; retrieved December 15, 2017) to annotate all conditionally independent genome-wide significant variants. We identified all genes (all variants 5′ to 3′ UTR) harboring at least one variant in LD (r²>0.3) with any conditionally independent variant. See Supplementary Tables 1–5. We conducted a manual review of all genes implicated within each locus, overlap with the GWAS catalogue (Supplementary Table 33), and all pathways identified by PASCAL and DEPICT (described below). We considered a gene to be implicated if it harbored variation in LD with a conditionally independent genome-wide significant variant, or if it was located within the locus and was significant by the PASCAL gene-based test. PASCAL[69] was used for gene-based and pathway analysis to test genes and canonical pathways from MSigDb (Supplementary Tables 20–21). Default settings were used to test all variants within all genes. DEPICT[70] was used to identify enrichment within tissues/cell types and reconstituted gene sets (also known as “pathways”). For each phenotype, variants from the GWAS were clumped using 500 kb flanking regions with the LD cutoff r² > 0.1 (based on 1000 Genomes phase 1 release v3, the default in DEPICT). We used DEPICT to understand genetic signals beyond the genome-wide significant loci that surpass the conventional 5×10⁻⁸ threshold, and so included all variants with p<5×10⁻⁵. DEPICT tissue enrichment results are displayed in Supplementary Figure 15, where enrichment relative to genes in random sets of loci is indicated by red shading. To cluster DEPICT reconstituted gene sets, we used affinity propagation clustering[71] and calculated the correlation between each resulting “exemplary gene set” in Figure 4. Genes, gene sets, and tissue/cell enrichments were considered significant when their false discovery rate was below 0.05. 
All such significant DEPICT results are reported in Supplementary Tables 17–19. PASCAL and DEPICT were also applied in the same fashion to the MTAG summary statistics (Supplementary Tables 34–39).

Statistics.

The GWAS meta-analysis was conducted using chi-square statistics based upon an imputation-quality-aware fixed-effect meta-analysis approach. Two-sided p-values were calculated. The MTAG and Genomic SEM test statistics were computed from the GWAS meta-analysis results, and two-sided p-values were similarly calculated from the chi-square distribution. The pleiotropic analysis was conducted based upon an empirical Bayes approach. The prior distribution for the effect sizes was assumed to be a mixture of a point mass at 0 (representing the possibility that the locus is not associated with the trait) and a normal distribution (representing the possibility that the locus is associated). The hyper-parameters were estimated by maximizing the marginal likelihood. The method properly accounts for the local genetic correlation and residual correlation between phenotypes. The posterior probability of association (PPA) for each locus was estimated for each possible combination of the 5 phenotypes, and the combination with the highest PPA was reported for each locus.
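To make the point-mass-plus-normal prior concrete, here is a deliberately simplified Python sketch for a single trait and a single z-score. The method described above is multivariate across five phenotypes and estimates its hyper-parameters by maximizing the marginal likelihood. In this sketch the prior probability of association (`prior_assoc`) and the variance of the normal component (`slab_variance`) are simply supplied by the caller, and both names are hypothetical. A two-sided p-value helper is included. It equals the upper tail of a chi-square distribution with 1 degree of freedom evaluated at z².

```python
import math


def normal_pdf(x, variance):
    """Density of a zero-mean normal distribution with the given variance."""
    return math.exp(-x * x / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def posterior_prob_association(z, prior_assoc, slab_variance):
    """Posterior probability that the effect is non-zero, given a z-score.

    Under no association z ~ N(0, 1). Under association the standardized
    effect is drawn from N(0, slab_variance), so marginally
    z ~ N(0, 1 + slab_variance).
    """
    alt = prior_assoc * normal_pdf(z, 1.0 + slab_variance)
    null = (1.0 - prior_assoc) * normal_pdf(z, 1.0)
    return alt / (alt + null)


def two_sided_p(z):
    """Two-sided p-value for a z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

With `prior_assoc=0.5` and `slab_variance=3.0`, a z-score of 0 gives a posterior probability of 1/3, and the probability rises toward 1 as |z| grows.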

68 in total

1. Clustering by passing messages between data points.

Authors: Brendan J Frey; Delbert Dueck
Journal: Science Date: 2007-01-11 Impact factor: 47.728

Review 2. Dysfunction of the prefrontal cortex in addiction: neuroimaging findings and clinical implications.

Authors: Rita Z Goldstein; Nora D Volkow
Journal: Nat Rev Neurosci Date: 2011-10-20 Impact factor: 34.870

3. Methamphetamine Addiction Vulnerability: The Glutamate, the Bad, and the Ugly.

Authors: Karen K Szumlinski; Kevin D Lominac; Rianne R Campbell; Matan Cohen; Elissa K Fultz; Chelsea N Brown; Bailey W Miller; Sema G Quadir; Douglas Martin; Andrew B Thompson; Georg von Jonquieres; Matthias Klugmann; Tamara J Phillips; Tod E Kippin
Journal: Biol Psychiatry Date: 2016-10-13 Impact factor: 13.382

4. The nature of nurture: Effects of parental genotypes.

Authors: Augustine Kong; Gudmar Thorleifsson; Michael L Frigge; Bjarni J Vilhjalmsson; Alexander I Young; Thorgeir E Thorgeirsson; Stefania Benonisdottir; Asmundur Oddsson; Bjarni V Halldorsson; Gisli Masson; Daniel F Gudbjartsson; Agnar Helgason; Gyda Bjornsdottir; Unnur Thorsteinsdottir; Kari Stefansson
Journal: Science Date: 2018-01-26 Impact factor: 47.728

Review 5. Role of corticotropin-releasing factor in drug addiction: potential for pharmacological intervention.

Authors: Marian L Logrip; George F Koob; Eric P Zorrilla
Journal: CNS Drugs Date: 2011-04 Impact factor: 5.749

6. The structure of genetic and environmental risk factors for common psychiatric and substance use disorders in men and women.

Authors: Kenneth S Kendler; Carol A Prescott; John Myers; Michael C Neale
Journal: Arch Gen Psychiatry Date: 2003-09

7. Genetic contributors to variation in alcohol consumption vary by race/ethnicity in a large multi-ethnic genome-wide association study.

Authors: E Jorgenson; K K Thai; T J Hoffmann; L C Sakoda; M N Kvale; Y Banda; C Schaefer; N Risch; J Mertens; C Weisner; H Choquet
Journal: Mol Psychiatry Date: 2017-05-09 Impact factor: 15.992

8. A reference panel of 64,976 haplotypes for genotype imputation.

Authors: Shane McCarthy; Sayantan Das; Warren Kretzschmar; Olivier Delaneau; Andrew R Wood; Alexander Teumer; Hyun Min Kang; Christian Fuchsberger; Petr Danecek; Kevin Sharp; Yang Luo; Carlo Sidore; Alan Kwong; Nicholas Timpson; Seppo Koskinen; Scott Vrieze; Laura J Scott; He Zhang; Anubha Mahajan; Jan Veldink; Ulrike Peters; Carlos Pato; Cornelia M van Duijn; Christopher E Gillies; Ilaria Gandin; Massimo Mezzavilla; Arthur Gilly; Massimiliano Cocca; Michela Traglia; Andrea Angius; Jeffrey C Barrett; Dorrett Boomsma; Kari Branham; Gerome Breen; Chad M Brummett; Fabio Busonero; Harry Campbell; Andrew Chan; Sai Chen; Emily Chew; Francis S Collins; Laura J Corbin; George Davey Smith; George Dedoussis; Marcus Dorr; Aliki-Eleni Farmaki; Luigi Ferrucci; Lukas Forer; Ross M Fraser; Stacey Gabriel; Shawn Levy; Leif Groop; Tabitha Harrison; Andrew Hattersley; Oddgeir L Holmen; Kristian Hveem; Matthias Kretzler; James C Lee; Matt McGue; Thomas Meitinger; David Melzer; Josine L Min; Karen L Mohlke; John B Vincent; Matthias Nauck; Deborah Nickerson; Aarno Palotie; Michele Pato; Nicola Pirastu; Melvin McInnis; J Brent Richards; Cinzia Sala; Veikko Salomaa; David Schlessinger; Sebastian Schoenherr; P Eline Slagboom; Kerrin Small; Timothy Spector; Dwight Stambolian; Marcus Tuke; Jaakko Tuomilehto; Leonard H Van den Berg; Wouter Van Rheenen; Uwe Volker; Cisca Wijmenga; Daniela Toniolo; Eleftheria Zeggini; Paolo Gasparini; Matthew G Sampson; James F Wilson; Timothy Frayling; Paul I W de Bakker; Morris A Swertz; Steven McCarroll; Charles Kooperberg; Annelot Dekker; David Altshuler; Cristen Willer; William Iacono; Samuli Ripatti; Nicole Soranzo; Klaudia Walter; Anand Swaroop; Francesco Cucca; Carl A Anderson; Richard M Myers; Michael Boehnke; Mark I McCarthy; Richard Durbin
Journal: Nat Genet Date: 2016-08-22 Impact factor: 38.330

Review 9. The effect of alcohol consumption on the adolescent brain: A systematic review of MRI and fMRI studies of alcohol-using youth.

Authors: Sarah W Feldstein Ewing; Ashok Sakhardande; Sarah-Jayne Blakemore
Journal: Neuroimage Clin Date: 2014-07-05 Impact factor: 4.881

10. A rare missense mutation in CHRNA4 associates with smoking behavior and its consequences.

Authors: T E Thorgeirsson; S Steinberg; G W Reginsson; G Bjornsdottir; T Rafnar; I Jonsdottir; A Helgadottir; S Gretarsdottir; H Helgadottir; S Jonsson; S E Matthiasson; T Gislason; T Tyrfingsson; T Gudbjartsson; H J Isaksson; H Hardardottir; A Sigvaldason; L A Kiemeney; A Haugen; S Zienolddiny; H J Wolf; W A Franklin; A Panadero; J I Mayordomo; I P Hall; E Rönmark; B Lundbäck; A Dirksen; H Ashraf; J H Pedersen; G Masson; P Sulem; U Thorsteinsdottir; D F Gudbjartsson; K Stefansson
Journal: Mol Psychiatry Date: 2016-03-08 Impact factor: 15.992
