Genome-wide association analyses for lung function and chronic obstructive pulmonary disease identify new loci and potential druggable targets.

Louise V Wain, Nick Shrine, María Soler Artigas, A Mesut Erzurumluoglu, Boris Noyvert, Lara Bossini-Castillo, Ma'en Obeidat, Amanda P Henry, Michael A Portelli, Robert J Hall, Charlotte K Billington, Tracy L Rimington, Anthony G Fenech, Catherine John, Tineka Blake, Victoria E Jackson, Richard J Allen, Bram P Prins, Archie Campbell, David J Porteous, Marjo-Riitta Jarvelin, Matthias Wielscher, Alan L James, Jennie Hui, Nicholas J Wareham, Jing Hua Zhao, James F Wilson, Peter K Joshi, Beate Stubbe, Rajesh Rawal, Holger Schulz, Medea Imboden, Nicole M Probst-Hensch, Stefan Karrasch, Christian Gieger, Ian J Deary, Sarah E Harris, Jonathan Marten, Igor Rudan, Stefan Enroth, Ulf Gyllensten, Shona M Kerr, Ozren Polasek, Mika Kähönen, Ida Surakka, Veronique Vitart, Caroline Hayward, Terho Lehtimäki, Olli T Raitakari, David M Evans, A John Henderson, Craig E Pennell, Carol A Wang, Peter D Sly, Emily S Wan, Robert Busch, Brian D Hobbs, Augusto A Litonjua, David W Sparrow, Amund Gulsvik, Per S Bakke, James D Crapo, Terri H Beaty, Nadia N Hansel, Rasika A Mathias, Ingo Ruczinski, Kathleen C Barnes, Yohan Bossé, Philippe Joubert, Maarten van den Berge, Corry-Anke Brandsma, Peter D Paré, Don D Sin, David C Nickle, Ke Hao, Omri Gottesman, Frederick E Dewey, Shannon E Bruse, David J Carey, H Lester Kirchner, Stefan Jonsson, Gudmar Thorleifsson, Ingileif Jonsdottir, Thorarinn Gislason, Kari Stefansson, Claudia Schurmann, Girish Nadkarni, Erwin P Bottinger, Ruth J F Loos, Robin G Walters, Zhengming Chen, Iona Y Millwood, Julien Vaucher, Om P Kurmi, Liming Li, Anna L Hansell, Chris Brightling, Eleftheria Zeggini, Michael H Cho, Edwin K Silverman, Ian Sayers, Gosia Trynka, Andrew P Morris, David P Strachan, Ian P Hall, Martin D Tobin.

Abstract

Chronic obstructive pulmonary disease (COPD) is characterized by reduced lung function and is the third leading cause of death globally. Through genome-wide association discovery in 48,943 individuals, selected from extremes of the lung function distribution in UK Biobank, and follow-up in 95,375 individuals, we increased the yield of independent signals for lung function from 54 to 97. A genetic risk score was associated with COPD susceptibility (odds ratio per 1 s.d. of the risk score (~6 alleles) (95% confidence interval) = 1.24 (1.20-1.27), P = 5.05 × 10⁻⁴⁹), and we observed a 3.7-fold difference in COPD risk between individuals in the highest and lowest genetic risk score deciles in UK Biobank. The 97 signals show enrichment in genes for development, elastic fibers and epigenetic regulation pathways. We highlight targets for drugs and compounds in development for COPD and asthma (genes in the inositol phosphate metabolism pathway and CHRM3) and describe targets for potential drug repositioning from other clinical indications.

Year:  2017        PMID: 28166213      PMCID: PMC5326681          DOI: 10.1038/ng.3787

Source DB:  PubMed          Journal:  Nat Genet        ISSN: 1061-4036            Impact factor:   38.330


Maximally attained lung function and subsequent lung function decline together determine the risk of developing Chronic Obstructive Pulmonary Disease (COPD)[1,2]. COPD, characterised by irreversible airflow obstruction and chronic airway inflammation, is the third leading cause of death globally[3]. Smoking is the primary risk factor for COPD but not all smokers develop COPD and more than 25% of COPD cases occur in never-smokers[4]. Patients with COPD exhibit variable presentation of symptoms and pathology, with or without exacerbations, with variable amounts of emphysema and with differing rates of progression. Although risk factors for COPD are known, including smoking and environmental exposures in early[5,6] and later life, the causal mechanisms are not well understood[7]. Disease-modifying treatments for COPD are required[7]. Understanding genetic factors associated with reduced lung function and COPD susceptibility could inform drug target identification, risk prediction and stratified prevention or treatment. Previous genome-wide association studies (GWAS) of COPD identified several independent COPD-associated variants[8-10] but the rate and scale of discovery has been limited by available sample sizes. We conducted a powerful GWAS for lung function, and followed up the robustly-associated variants in COPD case-control studies. Although previous GWAS have reported genome-wide significant associations with lung function[11-16], there has not been a comprehensive study confirming the effect of these variants on COPD susceptibility. 
In this study, we hypothesised that: (i) undertaking GWAS of lung function of unprecedented power and scale would detect novel loci associated with quantitative measures of lung function; (ii) collectively these variants would be associated with the risk of developing COPD, and (iii) aggregate analyses of all novel and previously-reported signals of association, and the identification of genes through which their effects are mediated, would reveal further insight into biological mechanisms underlying the associations. Together these findings could provide potential novel targets[17] for therapeutic intervention and pinpoint existing drugs which could be candidates for repositioning[18] for the treatment of COPD.

Results

43 new signals for lung function

For stage 1, genome-wide association analyses of forced expired volume in 1 second (FEV1), forced vital capacity (FVC) and FEV1/FVC were undertaken in 48,943 individuals from the UK BiLEVE study[16] who were selected from the extremes of the lung function distribution in UK Biobank (total n=502,682). From analysis of 27,624,732 variants, 81 independent variants associated with one or more traits with P<5×10⁻⁷ were selected for follow-up in stage 2, consisting of a further 95,375 independent individuals from UK Biobank, the SpiroMeta consortium and UK Households Longitudinal Study (UKHLS) (Supplementary Table 1). No evidence of sample overlap between stage 1 and stage 2 studies or between stage 2 studies was identified using LD score regression (Supplementary Table 2). Following meta-analysis of stage 1 and stage 2 results, 43 signals showed genome-wide significant (P<5×10⁻⁸) association with one or more of FEV1, FVC or FEV1/FVC (Table 1, Supplementary Table 3 and Supplementary Figure 1). We report these 43 signals as novel independent signals (Figure 1), almost doubling the number of confirmed independent genomic signals for lung function to 97 (Supplementary Table 4). Of the 43 novel signals, 33 represented novel loci whilst 10 were statistically independent signals (conditional P<5×10⁻⁷) within 500kb of another association signal. Based on an assumed heritability of 40%[19,20] for each lung function trait, the novel signals explained 4.3% of the heritability of FEV1, 3.2% for FVC and 5.2% for FEV1/FVC, bringing the total heritability explained by the 97 signals to 9.6%, 6.4% and 14.3%, respectively. The estimated effect sizes of lung function associated variants in children were correlated with those in adults (r=0.65, 73 variants with high imputation quality, Supplementary Figure 2). A genetic risk score based on these 73 variants was also significantly associated with FEV1 and FEV1/FVC in children (per risk allele β (s.e.) = -0.0177 (0.0040), P=1.03×10⁻⁵ and per risk allele β (s.e.) = -0.0213 (0.0037), P=1.27×10⁻⁸, respectively), but not with FVC (per risk allele β (s.e.) = -0.0037 (0.0041), P=0.366).
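The combination of stage 1 and stage 2 estimates can be illustrated with a fixed-effect inverse-variance-weighted meta-analysis. This is a minimal sketch that assumes this weighting scheme (the text here does not name the exact method), and the function name `inverse_variance_meta` is hypothetical. The inputs are the rounded stage 1 and stage 2 meta-analysis estimates for rs17513135 from Table 1, so the pooled beta and standard error agree with the table to three decimals, while the P value can differ from the published one because of rounding and genomic control.

```python
import math

def inverse_variance_meta(estimates):
    """Fixed-effect inverse-variance-weighted meta-analysis.

    `estimates` is a list of (beta, se) pairs, one per study.
    Returns the pooled beta, its standard error and a two-sided P value.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided P value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p

# rs17513135 (Table 1): stage 1 estimate and stage 2 meta-analysis estimate.
beta, se, p = inverse_variance_meta([(-0.047, 0.008), (-0.033, 0.006)])
print(round(beta, 3), round(se, 3))
```

Larger studies have smaller standard errors and so receive more weight, which is why the pooled estimate sits closer to the stage 2 value.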
Table 1

Stage 1 and stage 2 association results for the 43 novel signals of association with lung function.

Where the discovery variant was not available in replication cohorts but a proxy with r² > 0.8 was available, the proxy was used for replication in all cohorts (proxies are marked with * in rsid column). For discovery the standard errors and P values are genomic controlled except for conditional analyses (“Conditioned on” SNP is given in rsid column) where unadjusted standard errors and P values are given. Genomic controlled results were used for SpiroMeta. Unadjusted results were used for UK Biobank or UKHLS where genome-wide inflation factors were not available. Values are missing from stage 2 studies where there was quality control failure due to poor imputation (info < 0.5) or low minor allele count (MAC < 3). In the meta-analysis of the Stage 2 replication cohorts the 39 variants showing independent replication (Bonferroni correction for 81 tests: P<6.17×10⁻⁴) have P value in bold. Nearest gene gives either the two nearest genes on either side of the variant or, for a variant within a gene, the gene and the variant's location within it. Stage 1 association results (FEV1, FVC and FEV1/FVC) for the 54 signals of association that have been previously reported are given in Supplementary Table 4b.

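| Top trait | Rsid (conditioned on) | Position b37 | Nearest gene(s) | Non-coding/coding allele | Effect allele frequency | Stage 1 (UK BiLEVE) beta | Stage 1 se | Stage 1 P | Stage 2 UK Biobank beta | Stage 2 UK Biobank se | Stage 2 SpiroMeta beta | Stage 2 SpiroMeta se | Stage 2 UKHLS beta | Stage 2 UKHLS se | Stage 2 meta beta | Stage 2 meta se | Stage 2 meta P | Stage 1+2 meta beta | Stage 1+2 meta se | Stage 1+2 meta P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FEV1/FVC | rs17513135 | 1:40035686 | LOC101929516 (intron) | C/T | 23.15% | -0.047 | 0.008 | 1.25E-09 | -0.034 | 0.008 | -0.025 | 0.009 | -0.030 | 0.020 | -0.033 | 0.006 | **1.17E-08** | -0.038 | 0.005 | 2.31E-16 |
| FEV1/FVC | rs1192404 (rs12140637) | 1:92068967 | CDC7/TGFBR3 | A/G | 16.21% | -0.046 | 0.009 | 1.10E-07 | -0.047 | 0.009 | -0.046 | 0.010 | -0.033 | 0.023 | -0.050 | 0.007 | **9.31E-14** | -0.048 | 0.005 | 6.09E-20 |
| FEV1/FVC | rs12140637 | 1:92374517 | TGFBR3/BRDT2 | C/T | 31.30% | -0.036 | 0.007 | 3.49E-07 | -0.014 | 0.008 | -0.019 | 0.008 | -0.042 | 0.018 | -0.020 | 0.005 | **1.46E-04** | -0.026 | 0.004 | 1.18E-09 |
| FVC | rs200154334 | 1:118862070 | SPAG17/TBX15 | AT/- | 24.79% | 0.054 | 0.008 | 9.70E-12 | 0.025 | 0.008 | 0.023 | 0.009 | 0.001 | 0.020 | 0.024 | 0.006 | **1.69E-05** | 0.034 | 0.005 | 8.20E-14 |
| FEV1/FVC | rs6688537 | 1:239850588 | CHRM3 (intron) | C/A | 50.60% | -0.037 | 0.007 | 2.74E-08 | -0.042 | 0.006 | -0.023 | 0.008 | -0.049 | 0.017 | -0.039 | 0.005 | **4.05E-15** | -0.038 | 0.004 | 6.72E-22 |
| FEV1/FVC | rs61332075 | 2:239316560 | TRAF3IP1/ASB1 | G/C | 12.30% | 0.060 | 0.010 | 2.93E-09 | 0.025 | 0.010 | 0.021 | 0.012 | 0.029 | 0.025 | 0.026 | 0.007 | **5.11E-04** | 0.038 | 0.006 | 2.55E-10 |
| FEV1/FVC | rs1458979 | 3:55150677 | CACNA2D3/WNT5A | A/G | 50.11% | -0.035 | 0.007 | 1.52E-07 | -0.021 | 0.006 | -0.010 | 0.008 | -0.031 | 0.017 | -0.019 | 0.005 | **1.07E-04** | -0.025 | 0.004 | 4.42E-10 |
| FVC | rs1490265 | 3:67452043 | SUCLG2 (intron) | C/A | 70.79% | 0.039 | 0.007 | 1.03E-07 | 0.022 | 0.007 | 0.008 | 0.008 | 0.036 | 0.018 | 0.019 | 0.005 | **3.27E-04** | 0.026 | 0.004 | 1.58E-09 |
| FEV1/FVC | rs2811415 | 3:127991527 | EEFSEC (intron) | A/G | 84.04% | -0.057 | 0.009 | 2.64E-10 | -0.017 | 0.009 | -0.023 | 0.010 | -0.041 | 0.022 | -0.023 | 0.007 | **4.53E-04** | -0.035 | 0.005 | 5.52E-11 |
| FEV1/FVC | rs56341938* | 3:168715808 | LOC100507661/MECOM | A/G | 51.34% | 0.034 | 0.007 | 3.38E-07 | 0.037 | 0.006 | 0.013 | 0.008 | - | - | 0.027 | 0.005 | **1.97E-08** | 0.029 | 0.004 | 4.52E-14 |
| FEV1/FVC | rs13110699 (rs2045517) | 4:89815695 | FAM13A (intron) | T/G | 82.51% | -0.045 | 0.008 | 1.29E-07 | -0.037 | 0.008 | -0.030 | 0.009 | -0.014 | 0.024 | -0.035 | 0.006 | **7.80E-09** | -0.038 | 0.005 | 7.86E-15 |
| FVC | rs91731 | 5:33334312 | LOC340113/TARS | C/A | 90.53% | -0.070 | 0.011 | 8.10E-10 | -0.031 | 0.011 | -0.047 | 0.013 | 0.000 | 0.028 | -0.038 | 0.008 | **7.88E-06** | -0.049 | 0.007 | 4.31E-13 |
| FEV1/FVC | rs1551943 | 5:52195033 | ITGA1 (intron) | G/A | 23.01% | -0.052 | 0.008 | 3.12E-11 | -0.041 | 0.008 | -0.019 | 0.009 | -0.031 | 0.020 | -0.035 | 0.006 | **2.35E-09** | -0.041 | 0.005 | 1.92E-18 |
| FVC | rs2441026 | 5:53444498 | ARL15 (intron) | C/T | 46.27% | 0.034 | 0.007 | 4.59E-07 | 0.023 | 0.006 | 0.025 | 0.008 | 0.006 | 0.017 | 0.024 | 0.005 | **6.59E-07** | 0.027 | 0.004 | 2.75E-12 |
| FEV1/FVC | rs7713065 | 5:131788334 | C5orf56 (intron) | A/C | 73.67% | 0.039 | 0.007 | 2.21E-07 | 0.029 | 0.007 | 0.014 | 0.008 | 0.017 | 0.019 | 0.024 | 0.005 | **8.29E-06** | 0.029 | 0.004 | 2.77E-11 |
| FEV1 | rs3839234 | 5:148596693 | ABLIM3 (intron) | G/- | 47.01% | -0.038 | 0.007 | 8.87E-09 | -0.023 | 0.006 | -0.014 | 0.008 | 0.001 | 0.017 | -0.019 | 0.005 | **7.71E-05** | -0.026 | 0.004 | 4.48E-11 |
| FEV1/FVC | rs10515750 (rs1990950) | 5:156810072 | CYFIP2 (intron) | C/T | 7.18% | -0.063 | 0.012 | 2.61E-07 | -0.050 | 0.012 | -0.040 | 0.014 | -0.033 | 0.032 | -0.048 | 0.009 | **2.62E-07** | -0.054 | 0.007 | 5.26E-13 |
| FEV1/FVC | rs28986170 (rs2070600, rs9272528*) | 6:31556155 | LST1 (intron) | -/AA | 7.52% | 0.075 | 0.013 | 2.30E-08 | 0.034 | 0.014 | - | - | 0.096 | 0.036 | 0.048 | 0.014 | 6.49E-04 | 0.063 | 0.010 | 1.56E-10 |
| FEV1 | rs114229351 (rs34864796) | 6:32648418 | HLA-DQB1/HLA-DQA2 | T/C | 17.53% | -0.046 | 0.009 | 1.15E-07 | -0.026 | 0.008 | - | - | -0.045 | 0.026 | -0.030 | 0.008 | **1.78E-04** | -0.037 | 0.006 | 2.12E-10 |
| FEV1/FVC | rs141651520 | 6:73670095 | KCNQ5 (intron) | TTCTAT/- | 20.10% | 0.042 | 0.008 | 3.38E-07 | 0.049 | 0.008 | 0.026 | 0.009 | 0.025 | 0.020 | 0.042 | 0.006 | **5.49E-12** | 0.042 | 0.005 | 9.93E-18 |
| FEV1/FVC | rs10246303 | 7:7286445 | C1GALT1 (3’ UTR) | A/T | 41.74% | -0.034 | 0.007 | 4.42E-07 | -0.013 | 0.006 | -0.016 | 0.008 | -0.019 | 0.017 | -0.016 | 0.005 | 1.29E-03 | -0.022 | 0.004 | 2.35E-08 |
| FEV1/FVC | rs72615157 | 7:99635967 | ZKSCAN1 (3’ UTR) | G/A | 16.73% | 0.046 | 0.009 | 2.68E-07 | 0.015 | 0.009 | 0.030 | 0.010 | 0.030 | 0.022 | 0.024 | 0.007 | **2.56E-04** | 0.032 | 0.005 | 1.98E-09 |
| FEV1 | rs12698403 | 7:156127246 | LOC389602/LOC285889 | G/A | 44.36% | -0.036 | 0.007 | 7.43E-08 | -0.025 | 0.006 | -0.025 | 0.008 | -0.012 | 0.017 | -0.026 | 0.005 | **1.43E-07** | -0.029 | 0.004 | 1.11E-13 |
| FEV1 | rs7872188 | 9:4124377 | GLIS3 (intron) | C/T | 40.17% | -0.038 | 0.007 | 1.80E-08 | -0.019 | 0.007 | -0.020 | 0.008 | 0.005 | 0.017 | -0.019 | 0.005 | **1.41E-04** | -0.026 | 0.004 | 1.59E-10 |
| FVC | rs10870202 (rs10858246) | 9:139257411 | DNLZ (intron) | T/C | 50.01% | -0.033 | 0.006 | 3.25E-07 | -0.016 | 0.006 | -0.017 | 0.008 | -0.027 | 0.017 | -0.019 | 0.00 | **1.54E-04** | -0.024 | 0.004 | 9.32E-10 |
| FEV1/FVC | rs3847402 | 10:30267810 | SVIL/KIAA1462 | G/A | 40.57% | -0.036 | 0.007 | 1.00E-07 | -0.017 | 0.007 | -0.027 | 0.008 | -0.007 | 0.017 | -0.021 | 0.005 | **3.84E-05** | -0.027 | 0.004 | 7.72E-11 |
| FVC | rs7095607 | 10:69957350 | MYPN (intron) | G/A | 49.52% | -0.037 | 0.007 | 3.93E-08 | -0.021 | 0.006 | -0.029 | 0.008 | -0.030 | 0.017 | -0.027 | 0.005 | **2.26E-08** | -0.031 | 0.004 | 8.67E-15 |
| FEV1 | rs2509961 | 11:62310909 | AHNAK (intron) | T/C | 38.21% | 0.036 | 0.007 | 1.68E-07 | 0.030 | 0.007 | 0.017 | 0.008 | 0.025 | 0.017 | 0.027 | 0.005 | **1.07E-07** | 0.030 | 0.004 | 1.49E-13 |
| FEV1 | rs145729347* | 11:86442733 | ME3/PRSS23 | G/C | 15.08% | -0.056 | 0.009 | 1.67E-09 | -0.020 | 0.009 | -0.016 | 0.010 | - | - | -0.018 | 0.007 | 5.36E-03 | -0.031 | 0.005 | 8.58E-09 |
| FEV1 | rs567508 | 11:126008910 | CDON/RPUSD4 | G/A | 84.96% | 0.050 | 0.009 | 1.11E-07 | 0.029 | 0.009 | 0.013 | 0.010 | 0.053 | 0.024 | 0.026 | 0.007 | **1.08E-04** | 0.034 | 0.005 | 4.77E-10 |
| FEV1 | rs1494502 | 12:65824670 | MSRB3 (intron) | A/G | 36.20% | 0.036 | 0.007 | 2.72E-07 | 0.020 | 0.007 | 0.012 | 0.008 | 0.030 | 0.017 | 0.019 | 0.005 | **1.33E-04** | 0.025 | 0.004 | 9.80E-10 |
| FEV1/FVC | rs113745635 | 12:95554771 | FGD6 (intron) | C/T | 21.20% | -0.050 | 0.008 | 3.47E-10 | -0.039 | 0.008 | -0.018 | 0.009 | -0.061 | 0.020 | -0.036 | 0.006 | **1.41E-09** | -0.041 | 0.005 | 8.46E-18 |
| FVC | rs35506 | 12:115500691 | TBX3/MED13L | T/A | 71.25% | 0.037 | 0.007 | 4.31E-07 | 0.021 | 0.007 | 0.019 | 0.008 | 0.011 | 0.018 | 0.021 | 0.005 | **1.08E-04** | 0.027 | 0.004 | 9.87E-10 |
| FEV1/FVC | rs1698268 | 14:84309664 | LINC01467/LINC00911 | A/T | 29.44% | -0.039 | 0.007 | 1.12E-07 | -0.023 | 0.007 | -0.003 | 0.010 | 0.000 | 0.018 | -0.016 | 0.006 | 4.20E-03 | -0.025 | 0.005 | 3.19E-08 |
| FEV1/FVC | rs72724130 | 15:41977690 | MGA (intron) | A/T | 5.70% | -0.075 | 0.014 | 2.05E-07 | -0.046 | 0.014 | -0.039 | 0.021 | 0.007 | 0.035 | -0.043 | 0.012 | **2.62E-04** | -0.056 | 0.009 | 9.58E-10 |
| FEV1/FVC | rs12591467 (rs10851839) | 15:71788387 | THSD4 (intron) | C/T | 68.38% | 0.037 | 0.007 | 6.45E-08 | 0.021 | 0.007 | 0.011 | 0.008 | 0.030 | 0.018 | 0.019 | 0.005 | **2.17E-04** | 0.026 | 0.004 | 5.65E-10 |
| FEV1/FVC | rs66650179 | 15:84261689 | SH3GL3 (intron) | A/- | 81.34% | -0.048 | 0.009 | 2.60E-08 | -0.030 | 0.008 | - | - | -0.035 | 0.021 | -0.036 | 0.008 | **1.79E-05** | -0.042 | 0.006 | 3.71E-12 |
| FEV1/FVC | rs62070270* | 17:28263980 | EFCAB5 (intron) | A/G | 45.65% | -0.041 | 0.007 | 6.71E-10 | -0.036 | 0.006 | -0.021 | 0.008 | - | - | -0.030 | 0.005 | **8.00E-10** | -0.034 | 0.004 | 7.29E-18 |
| FEV1/FVC | rs11658500 | 17:36886828 | CISD3 (intron) | G/A | 14.16% | -0.051 | 0.009 | 4.70E-08 | -0.031 | 0.009 | -0.011 | 0.011 | -0.069 | 0.025 | -0.029 | 0.007 | **5.11E-05** | -0.037 | 0.006 | 7.22E-11 |
| FVC | rs6140050 | 20:6632901 | CASC20/BMP2 | C/A | 63.34% | 0.040 | 0.007 | 5.95E-09 | 0.026 | 0.007 | 0.028 | 0.008 | -0.011 | 0.017 | 0.026 | 0.005 | **5.23E-07** | 0.031 | 0.004 | 6.39E-14 |
| FEV1 | rs72448466 | 20:62363640 | ZGPAT (intron) | GT/- | 67.28% | -0.038 | 0.007 | 1.09E-07 | -0.020 | 0.007 | -0.029 | 0.008 | -0.032 | 0.017 | -0.027 | 0.005 | **3.68E-07** | -0.030 | 0.004 | 4.31E-13 |
| FEV1 | rs11704827 | 22:18450287 | MICAL3 (intron) | A/T | 23.14% | 0.049 | 0.008 | 6.08E-10 | 0.021 | 0.008 | 0.021 | 0.009 | 0.047 | 0.020 | 0.025 | 0.006 | **1.44E-05** | 0.033 | 0.005 | 8.32E-13 |
| FEV1 | rs2283847 | 22:28181399 | MN1 (intron) | C/T | 55.51% | -0.038 | 0.007 | 2.40E-08 | -0.026 | 0.007 | -0.014 | 0.008 | -0.003 | 0.021 | -0.021 | 0.005 | **3.65E-05** | -0.027 | 0.004 | 3.41E-11 |
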
Figure 1

Manhattan plots of genome-wide association results for FEV1 (top), FEV1/FVC (middle) and FVC (bottom). Previously reported signals are highlighted in dark blue (except signals with P>5×10⁻⁴ in this study), and novel signals are coloured in red. Signals are highlighted for the trait with which they showed strongest association only. The red and blue lines correspond to the genome-wide significance level (P=5×10⁻⁸, -log10P=7.3) and the threshold used to select signals for follow-up in stage 2 (P=5×10⁻⁷, -log10P=6.3), respectively. Labels show the nearest gene to the novel sentinel variants. There were 2 independent novel signals near CDC7 and TGFBR3 on chromosome 1 (labelled as CDC7/TGFBR3). See Supplementary Table 3 for full results. Image was created using a modified version of the R package qqman.

Using the stage 1 results, a 95% ‘credible set’ of variants (i.e. the set of variants that were 95% likely to contain the underlying causal variant, based on Bayesian refinement) was defined for all (novel and previously reported) association signals for which this was feasible (67 signals; Online Methods, Supplementary Figures 3, 4 and 5 and Supplementary Table 5). Thirteen of these signals were fine-mapped to ≤10 plausible causal variants, and for 63 of the 67 fine-mapped signals the sentinel (lowest P value) variant was also the top-ranked variant by posterior probability. In addition, by refining six chromosome 6 MHC region association signals using imputation of classical alleles and amino acid changes (Online Methods), we identified an amino acid change at position 57 of the MHC class II HLA-DQB1 gene product, HLA-DQβ1 (alanine compared to non-alanine), as the main driver of signals in the MHC region for both FEV1 (β (s.e.) = 0.048 (0.007), P=5.71×10⁻¹³, Supplementary Figure 6a) and FEV1/FVC (β (s.e.) = 0.062 (0.007), P=1.17×10⁻²⁰, Supplementary Figure 6c). Secondary non-HLA gene signals in the MHC region remained after conditioning on the HLA-DQβ1 position 57 variant, for rs34864796:G>A (near ZKSCAN3, FEV1; conditional β (s.e.) = -0.058 (0.01), P=1.26×10⁻⁹, Supplementary Figure 6b) and rs2070600:C>T (in AGER, FEV1/FVC; conditional β (s.e.) = 0.120 (0.013), P=4.23×10⁻²⁰, Supplementary Figure 6d) (Supplementary Table 6). We identified that 29 of the lung function-associated signals had previously shown genome-wide significant association in GWAS of traits other than lung function or COPD. This included associations with inflammatory bowel disease (Crohn’s disease and/or ulcerative colitis, 3 signals) and height (9 signals, 3 of which showed a consistent direction of effect on height and the lung function measure with which they were most strongly associated) (Supplementary Table 7).
With the exception of KANSL1[16], there was no significant (P<5.15×10⁻⁴) association with smoking for any of the signals (Supplementary Table 8).
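The idea of a 95% credible set can be sketched as follows. This is an illustration only: it assumes Wakefield-style approximate Bayes factors, a single causal variant per signal, equal prior probability for every variant and a prior effect-size variance of 0.04, none of which is specified in the text above. The function name and the three variants are hypothetical.

```python
import math

def credible_set(variants, prior_variance=0.04, coverage=0.95):
    """Return the smallest set of variants whose posterior probabilities
    sum to at least `coverage`.

    `variants` is a list of (variant_id, beta, se) tuples for one signal.
    Assumes one causal variant and a flat prior across variants.
    """
    log_abfs = []
    for _, beta, se in variants:
        v = se ** 2
        r = prior_variance / (v + prior_variance)
        z = beta / se
        # Log of the approximate Bayes factor in favour of association.
        log_abfs.append(0.5 * math.log(1.0 - r) + 0.5 * r * z ** 2)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_abfs)
    relative = [math.exp(x - top) for x in log_abfs]
    total = sum(relative)
    posteriors = [x / total for x in relative]
    ranked = sorted(zip((v[0] for v in variants), posteriors),
                    key=lambda pair: pair[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant_id, posterior in ranked:
        chosen.append((variant_id, posterior))
        cumulative += posterior
        if cumulative >= coverage:
            break
    return chosen

# Hypothetical signal with three variants (id, beta, se).
example = [("var_a", -0.047, 0.008), ("var_b", -0.030, 0.008), ("var_c", -0.010, 0.008)]
result = credible_set(example)
print([variant_id for variant_id, _ in result])
```

When one variant has much stronger evidence than its neighbours, as in this toy example, the credible set shrinks to that variant alone; signals with many variants in strong linkage disequilibrium yield larger sets.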

95 variants and COPD susceptibility

The disease-relevance of lung function-associated variants has been questioned[21]. Therefore we tested association with COPD susceptibility for variants representing 95 of the 97 lung function associated signals in up to 20,086 COPD cases and 215,630 controls (data were unavailable for further study for the X-chromosome variant, rs7050036:A>T near AP1S2, and a rare variant, chr12:114743533:C>T) (Supplementary Table 9). These cases and controls comprised the COPD study at deCODE Genetics[22] (COPD cases defined using spirometry, population-based controls excluding known cases, up to 1,964 moderate-severe cases, up to 142,262 controls), three lung resection cohorts[23-25] (COPD definition based on spirometry, 310 moderate-severe cases, 332 controls), four case-control studies employing post-bronchodilator spirometry[8-10,26-29] (5,778 moderate-severe cases, 3,950 controls), two studies within which COPD was determined from electronic medical records[30] (eMR, total 1,487 cases, 15,138 controls), additional UK Biobank samples (COPD definition based on spirometry, 984 moderate-severe[31] cases and 26,561 controls) and UK BiLEVE (COPD definition based on spirometry, 9,563 moderate-severe cases, 27,387 controls). UK BiLEVE COPD cases and controls were only used for single variant COPD association tests for the subset of 47 variants discovered independently from UK BiLEVE (that is, excluding the 43 variants discovered using the UK BiLEVE data described in this paper and 5 variants reported in our previous study in the UK BiLEVE population[16]). Across all 95 variants, 51 showed nominal COPD association (P<0.05) and 30 showed associations with COPD susceptibility reaching a Bonferroni corrected threshold for 95 tests (P<5.26×10⁻⁴, Supplementary Table 10). Of these 30, 27 were variants discovered independently from UK BiLEVE and 3 were from the 48 lower-powered association tests not including UK BiLEVE cases and controls.
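The Bonferroni thresholds quoted in this paper for 81, 95 and 97 tests are consistent with dividing a 5% significance level by the number of tests. A short check, in which the function name is hypothetical and the 5% level is inferred from the reported thresholds:

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

# 81 stage 2 follow-up variants, 95 variants tested for COPD, 97 signals in total.
thresholds = {n: f"{bonferroni_threshold(n):.2e}" for n in (81, 95, 97)}
print(thresholds)
```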
Using a risk score based on the available 95 sentinel variants or their best proxies, and using data from up to 9,791 COPD cases and 120,462 controls (Online Methods), the meta-analysis OR (95% CI) per standard deviation change in risk score (~6 alleles) was 1.24 (1.20-1.27), P=5.05×10⁻⁴⁹ (Figure 2a, Supplementary Table 11). We observed considerable heterogeneity in effect estimates between the different COPD studies (I²=92%), which had different approaches to ascertainment of COPD cases and variable disease severity. In UK Biobank (including UK BiLEVE) we found broadly similar effect size estimates for moderate-severe COPD to those in COPD case-control studies employing post-bronchodilator spirometry (OR=1.42 versus 1.36) and therefore we undertook further modelling, showing a gradation in susceptibility to moderate-severe COPD across deciles of allelic risk score (Online Methods). The risk of moderate-severe COPD was more than three times higher in the top decile than the bottom decile (OR 3.71, 95% CI 3.34 to 4.12, Figure 2b). The estimated proportion of COPD cases attributable to allelic risk scores above the first decile (population attributable risk fraction) was 48.0% (95% CI 43.6 to 52.2%).
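A weighted allelic risk score of the kind described above can be sketched in a few lines. This is a simplified illustration, not the authors' pipeline: the weights, dosages and function names are hypothetical, and the last step assumes a log-linear relationship between the standardised score and COPD odds.

```python
import statistics

def weighted_risk_score(dosages, weights):
    """Sum of risk-allele dosages (0 to 2 per variant), each scaled by its weight."""
    if len(dosages) != len(weights):
        raise ValueError("one weight is needed per variant")
    return sum(d * w for d, w in zip(dosages, weights))

def standardise(scores):
    """Express each score in standard-deviation units relative to the sample."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mean) / sd for s in scores]

def decile(score, cut_points):
    """Decile (1 to 10) of a score, given the nine cut points of the distribution."""
    return 1 + sum(score > c for c in cut_points)

# Hypothetical weights for three variants and dosages for four people.
weights = [0.05, 0.03, 0.04]
people = [[0, 1, 2], [2, 2, 2], [1, 0, 0], [1, 1, 1]]
scores = [weighted_risk_score(p, weights) for p in people]
z_scores = standardise(scores)

# Under a log-linear model, an odds ratio of 1.24 per s.d. (the meta-analysis
# estimate above) implies odds of 1.24 ** z for someone z s.d. above the mean.
relative_odds = [1.24 ** z for z in z_scores]

# Decile cut points from a hypothetical score distribution of 100 values.
cut_points = statistics.quantiles(list(range(1, 101)), n=10)
print(decile(1, cut_points), decile(100, cut_points))
```

In a real analysis the cut points would come from the study population's own score distribution, and each decile would then be compared with the first in a regression adjusted for covariates.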
Figure 2

Genetic Risk Score associations with COPD susceptibility. (a) Forest plot of COPD results for the risk score analysis. Odds ratios per standard deviation of the risk score (~6 alleles) are presented for each study. Studies are grouped according to study design and phenotyping: “eMR”, electronic medical records, which used ICD codes to define COPD (DiscovEHR also used spirometry to refine the COPD definition); “case-control”, COPD case-control, which used post-bronchodilator spirometry to define COPD; “lung resection cohort”, which used a combination of pre and post-bronchodilator spirometry to define COPD; the Icelandic Biobank, deCODE, where cases were selected from a population based study and a study of COPD patients and defined using a spirometric definition, controls were selected as individuals within the cohort that were not known cases (no spirometric definition was used for controls); and UK Biobank (excluding UK BiLEVE), which used spirometry to define both COPD cases and controls. Further details are provided in the Supplementary Note. (b) Odds ratios for spirometrically-defined COPD for weighted genetic risk score deciles in UK Biobank (10,547 cases, pre-bronchodilator % predicted FEV1<80% and FEV1/FVC<0.7, and 53,948 controls, FEV1/FVC>0.7 and % predicted FEV1>80%, weights derived from non-discovery populations). For each decile, odds ratios were obtained using a logistic regression adjusted for age, age², sex, height, smoking status, pack-years and the first 10 ancestry principal components. The OR comparing the 10th and the 1st decile in ever-smokers only was 3.35 (95% CI 2.93 to 3.84) and in never-smokers only was 4.27 (95% CI 3.61 to 5.06).

We tested association of individual variants and the 95-variant risk score with COPD exacerbations in subsets of individuals from UK Biobank, deCODE, four COPD case-control studies and two eMR studies (total 2,462 COPD exacerbation cases, 15,288 COPD non-exacerbation controls) and the Lung Health Study (100 exacerbation cases, 4,002 controls). There was no association of individual variants or genetic risk score with acute exacerbations of COPD (Supplementary Tables 12 and 13). To evaluate whether these variants showed disease-relevant associations in a non-European population, we studied 71 variants for which data were available in 7,116 COPD cases (20,919 controls) and 5,292 exacerbation cases (1,824 controls) from the China Kadoorie Biobank cohort (CKB) (Supplementary Tables 10 to 13). The allelic risk score was associated with COPD susceptibility (OR per standard deviation change in risk score (95% CI) = 1.08 (1.04-1.11), P=4.2×10⁻⁶), suggesting some shared genetic contributions to COPD in European and East Asian descent populations. Thirty-nine of the variants showed a consistent direction of effect on COPD in European and Chinese samples and seven of these were significant (P<0.05). Two signals were significant after correction for multiple testing (Supplementary Table 10c). To assess the impact of including individuals with asthma in a COPD case-control analysis, we tested for association with COPD in UK Biobank both before and after excluding individuals with self-reported doctor-diagnosed asthma and show that the effect size estimates were similar (Supplementary Figure 7).

Implicated genes highlight pathways and druggable targets

Gene expression and genotype data from lung, blood and multi-tissue resources were queried to identify whether the top variant at each of the 97 signals, or a proxy, was significantly associated with changes in expression of any gene (i.e. was an eQTL for any gene). Using this approach, and identification of deleterious variants within the association signal (Online Methods, Supplementary Table 14), we implicated 234 genes with potentially causal effects on lung function (Supplementary Table 15). These 234 genes were enriched (False Discovery Rate (FDR) ≤5%) in elastic fibre pathways and in “signalling events mediated by the Hedgehog family”, the latter including CDON implicated by a novel intergenic signal (rs567508, between CDON and RPUSD4) on chromosome 11. We narrowed this group of 234 genes to 68 high-priority genes which were implicated via a deleterious variant or on stricter criteria for gene expression co-localisation (sentinel variant and top expression variant r²≥0.9, Table 2). We found that the 68 high-priority genes were overrepresented (FDR≤5%) among a number of gene ontology terms including SH3 domain binding, GTPase binding, actin binding and fibroblast migration (Supplementary Table 16). Alternative approaches to pathway analyses, which instead use all genome-wide association results, supported previous reports of enrichment of histone and systemic lupus erythematosus pathways[14-16] and additional autoimmune and inflammatory pathways (Supplementary Table 17). Tests for tissue-specific enrichment of lung function signals overlapping histone marks identified enrichment in fetal lung, fetal heart and fibroblasts (H3K4me1), and stomach smooth muscle (H3K4me1 and H3K4me3) (Supplementary Table 18).
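The enrichment results above are reported at a false discovery rate of 5% or lower. One common way to control the FDR is the Benjamini-Hochberg procedure; the sketch below assumes that procedure for illustration only, since the text here does not state which FDR method was used. The function name and the P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical enrichment P values for four pathways.
pathway_p = [0.001, 0.02, 0.03, 0.5]
q_values = benjamini_hochberg(pathway_p)
enriched = [q <= 0.05 for q in q_values]
print(enriched)
```

Pathways whose adjusted value is at or below 0.05 would be called enriched at an FDR of 5%.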
Table 2

Genes implicated as high-priority genes for novel genome-wide significant and previously-reported signals using expression data and functional annotation. #Variant did not reach P<5.15×10⁻⁴ (Bonferroni corrected P value for 97 tests) in this study for any trait. *Gene implicated as it contained a deleterious variant (Supplementary Table 14); all other genes implicated by co-localisation of GWAS and eQTL signal. (*) implicated by both co-localisation of eQTL and GWAS, and a deleterious variant. All 234 genes implicated are listed in Supplementary Table 15.

| Genome-wide significant trait (additional traits with P<5.15×10⁻⁴) | Variant ID (position b37) | Nearest gene(s) | High-priority genes |
|---|---|---|---|
| Novel signals | | | |
| FEV1/FVC (FVC) | rs17513135 (chr1:40,035,686) | LOC101929516 (intron) | PABPC4 |
| FEV1/FVC (FEV1) | rs6688537 (chr1:239,850,588) | CHRM3 (intron) | CHRM3 |
| FEV1/FVC (FEV1) | rs2811415 (chr3:127,991,527) | EEFSEC (intron) | RUVBL1 |
| FEV1/FVC (-) | rs13110699 (chr4:89,815,695) | FAM13A (intron) | FAM13A |
| FEV1 (FVC, FEV1/FVC) | rs3839234 (chr5:148,596,693) | ABLIM3 (intron) | GRPEL2, ABLIM3 |
| FEV1/FVC (FEV1) | rs10515750 (chr5:156,810,072) | CYFIP2 (intron) | ADAM19 |
| FEV1/FVC (FEV1) | rs200003338 (chr6:31,556,155) | LST1 (intron) | MICB*, MICA* |
| FEV1/FVC (FEV1) | rs10246303 (chr7:7,286,445) | C1GALT1 (3’ UTR) | C1GALT1 |
| FVC (FEV1) | rs10870202 (chr9:139,257,411) | DNLZ (intron) | INPP5E, CARD9 |
| FVC (FEV1) | rs7095607 (chr10:69,957,350) | MYPN (intron) | MYPN* |
| FEV1 (FVC) | rs2509961 (chr11:62,310,909) | AHNAK (intron) | ROM1, EML3, MTA2, GANAB, C11orf83* |
| FEV1/FVC (-) | rs59835752 (chr17:28,265,330) | EFCAB5 (intron) | EFCAB5, CRYBA1, SSH2, SLC6A4 |
| FEV1/FVC (FEV1) | rs11658500 (chr17:36,886,828) | CISD3 (intron) | CISD3* |
| FEV1 (FVC) | rs72448466 (chr20:62,363,640) | ZGPAT (intron) | LIME1 |
| Previously-reported signals | | | |
| FEV1 (FVC) | rs6681426 (chr1:150,586,971) | MCL1/ENSA | GOLPH3L |
| FEV1/FVC (-) | rs4328080 (chr1:219,963,088) | LYPLAL1/RNU5F-1 | SLC30A10 |
| FEV1 (FVC, FEV1/FVC) | rs2571445 (chr2:218,683,154) | TNS1 (exon) | TNS1* |
| FEV1/FVC (-) | rs10498230 (chr2:229,502,503) | SPHKAP/PID1 | SPHKAP |
| FVC (FEV1) | rs1595029 (chr3:158,241,767) | RSRC1 (intron) | RSRC1 |
| FEV1 (FVC, FEV1/FVC) | rs10516526 (chr4:106,688,904) | GSTCD (intron) | INTS12, GSTCD, NPNT |
| FEV1/FVC (FEV1, FVC) | rs34712979 (chr4:106,819,053) | NPNT (intron) | NPNT |
| FEV1/FVC (FEV1) | rs138641402 (chr4:145,445,779) | GYPA/HHIP-AS1 | HHIP |
| FEV1/FVC (-) | rs153916 (chr5:95,036,700) | SPATA9/RHOBTB3 | RHOBTB3 |
| FEV1/FVC (FEV1) | rs1990950 (chr5:156,920,756) | ADAM19 (intron) | ADAM19 |
| FEV1 (FVC, FEV1/FVC) | rs34864796 (chr6:27,459,923) | ZNF184/LINC01012 | OR2B2* |
| FEV1/FVC (FEV1) | rs2857595 (chr6:31,568,469) | NCR3/AIF1 | MICB* |
| FEV1/FVC (-) | rs2070600 (chr6:32,151,443) | AGER (exon) | AGER(*) |
| FEV1 (FVC, FEV1/FVC) | rs114544105 (chr6:32,635,629) | HLA-DQB1/HLA-DQA2 | HLA-DQB1*, APOM, RNF5 |
| FEV1/FVC (FEV1) | rs113096699 (chr6:142,745,883) | GPR126 (intron) | GPR126 |
| FEV1/FVC (-) | rs148274477 (chr6:142,838,173) | GPR126/LOC153910 | GPR126* |
| FVC (FEV1) | rs10858246 (chr9:139,102,831) | QSOX2 (intron) | QSOX2 |
| FVC (FEV1) | rs2348418 (chr12:28,689,514) | CCDC91 (intron) | FLJ35252 |
| FEV1/FVC# (-) | rs11172113 (chr12:57,527,283) | LRP1 (intron) | LRP1 |
| FEV1# (-) | rs7155279 (chr14:92,485,881) | TRIP11 (intron) | ATXN3 |
| FEV1# (-) | rs117068593 (chr14:93,118,229) | RIN3 (exon) | RIN3(*) |
| FEV1/FVC (FEV1) | rs10851839 (chr15:71,628,370) | THSD4 (intron) | THSD4 |
| FEV1/FVC (-) | rs12447804 (chr16:58,075,282) | MMP15 (intron) | MMP15 |
| FEV1/FVC (FEV1) | rs3743609 (chr16:75,467,021) | CFDP1 (intron) | TMEM170A, BCAR1, CFDP1 |
| FEV1 (FVC, FEV1/FVC) | rs35524223 (chr17:44,192,590) | KANSL1 (intron) | KANSL1(*), MAPT(*), ARL17B, ARL17A, LRRC37A4, NUDT1, LRRC37A, CRHR1, LRRC37A2, ARHGAP27, FMNL1, PLEKHM1, WNT3, NSF, SPPL2C* |
| FEV1 (FVC) | rs7218675 (chr17:73,513,185) | TSEN54 (intron) | CASKIN2, TSEN54* |
| FEV1/FVC (-) | rs113473882 (chr19:41,124,155) | LTBP4 (intron) | LTBP4* |

Approved drugs, or drugs in development, target the protein products of 7 of the 234 genes (Supplementary Table 19a). These include 3 high-priority genes: CHRM3, SLC6A4 and CRHR1. CHRM3 and SLC6A4 were both implicated by novel signals (rs6688537:C>A in an intron of CHRM3 and rs59835752:-/A in an intron of EFCAB5, respectively) and encode targets for drugs approved for the treatment of asthma and COPD (CHRM3, muscarinic acetylcholine receptor M3) and anxiety and depression (SLC6A4, serotonin transporter). CRHR1 (implicated by rs35524223:T>A in an intron of KANSL1) encodes the corticotropin releasing factor receptor 1, which is a target for compounds in development for the treatment of anxiety, depression and irritable bowel syndrome. The other 4 genes include NDUFA12 (implicated by rs113745635:C>T in an intron of FGD6), encoding an NADH dehydrogenase which is a target for metformin hydrochloride, primarily used to treat type 2 diabetes, and ITK (implicated by rs10515750 in an intron of CYFIP2), encoding a tyrosine-protein kinase, a target for the cancer drug pazopanib. Using STRING[32] to find proteins that interact with the proteins encoded by the high-priority genes, we highlighted further druggable targets (Supplementary Table 19b). These included the PI3-kinase p110-delta subunit (part of the inositol phosphate metabolism pathway with INPP5E, which was implicated as a high-priority gene by rs10870202 in an intron of DNLZ, and a target for compounds in development for the treatment of COPD and asthma), and matrix metalloproteinases 1, 8 and 7 (targets for doxycycline, which is an antibiotic and anti-malarial).

Discussion

In this study, the power gained by sampling from the extremes of a large biobank whilst retaining the power of a quantitative trait analysis, coupled with strategies to improve coverage of the genome and extensive follow-up, enabled a near-doubling of the number of signals of association with lung function identified to date. We further explored 95 variants, representing 43 novel signals and 52 previously reported signals, and showed that collectively these variants are strongly associated with COPD susceptibility. Using functional evidence from eQTL studies and deleterious variants to link signals to genes, we identified that 41 of the 97 lung function signals are also the strongest signals of association for expression of, or contain deleterious variants within, 68 genes (which we term “high-priority genes”). Amongst these, novel signals in or near FAM13A and ADAM19, both previously associated with lung function and COPD susceptibility[9,33], along with evidence that these signals are themselves eQTLs for FAM13A and ADAM19, provide further evidence for FAM13A and ADAM19 themselves being the drivers of those signals. There was significant enrichment amongst the 68 genes for SH3 domain (including ADAM19), GTPase and actin binding, and fibroblast migration, highlighting the potential importance of pathways relating to the cytoskeleton. The 68 genes identified as high-priority included genes at novel signals encoding targets for which there are approved drugs or drugs in development (Supplementary Table 19). Of note, the muscarinic acetylcholine receptor M3, encoded by CHRM3, is a well-characterised drug target for which many approved drugs exist, including for the treatment of asthma and obstructive lung disease. 
SLC6A4 encodes a serotonin transporter, a target for a number of drugs approved for treating depression and anxiety disorders, one of which (nortriptyline hydrochloride) has been trialed for use in inflammatory skin disorders (psoriasis and eczema); HTR4, which encodes a serotonin receptor, was identified in one of the earliest lung function GWAS[13]. INPP5E, identified as a high-priority gene for a novel signal of association with FVC (and FEV1) on chromosome 9, encodes inositol polyphosphate-5-phosphatase E, a component of the inositol phosphate metabolism pathway. Another component of the same pathway, phosphoinositide 3-kinase (PI3K) delta, is a target of drugs under development for the treatment of a range of indications including COPD and asthma. Mutations in INPP5E cause ciliopathy (Joubert and MORM syndromes). Protective genetic variants that reduce the function or expression of a target protein could be mimicked by drugs and so are of particular interest. The minor allele (MAF 17%) at the novel signal in an intron of FAM13A was associated with decreased expression of FAM13A in lung tissue and reduced risk of COPD. This, together with recent evidence from a study of the Fam13a knockout mouse[34], suggests that pharmacological inhibition of FAM13A may be protective. Extending our pathway analyses to all 234 genes implicated by gene expression or deleterious variants, we observed enrichment of genes related to the “signalling events mediated by the Hedgehog family” pathway. Hedgehog signalling plays a crucial role in early development. Three members of this pathway, PTCH1, TGFB2 and HHIP, have been previously reported as likely causal genes underlying lung function association signals[35]. In this study, we additionally report PTHLH, encoding a parathyroid hormone-like hormone, and CDON, encoding a Hedgehog co-receptor, as likely causal genes (the latter at a novel signal).
For the 73 well-imputed variants available in children, effect size estimates were correlated (r=0.62) with those in adults. Should this pattern of correlation apply across all 97 lung-function-associated variants, then this would suggest that many of these variants may act, at least in part, via effects on lung development. Elastic fibre pathways were over-represented; products of elastin degradation have been shown to be elevated during acute exacerbations of COPD[36,37]. In addition, degradation of elastin by excess neutrophil-released elastase in the lung leads to emphysema in individuals with alpha-1 antitrypsin deficiency. CARD9, another high-priority gene at a novel signal, encodes an adaptor protein involved in neutrophil recruitment in respiratory fungal infection[38]. Tissue-specific enrichment of lung function signals overlapping H3K4me1 was seen in stomach smooth muscle. Although comparable H3K4me1 data were not available for airway smooth muscle, similar findings have been reported previously for rectal smooth muscle[39]. The 17q21.31 inversion has previously been associated with lung function. Custom imputation of additional structural variation at the locus, along with eQTL evidence and deleterious variants in the gene, suggested that KANSL1 may drive the association. Amongst the novel signals reported in this study, SNPs in an intron of EEFSEC on chromosome 3 are correlated with expression of the nearby gene RUVBL1. Both KANSL1 and RUVBL1 encode members of histone modification complexes. A novel signal on chromosome 20 (rs72448466, intronic in ZGPAT), which showed association with FVC almost as strong as its association with FEV1, is an eQTL for the telomere gene, RTEL1. Although rs72448466:->GT was not the strongest eQTL for RTEL1 (r2=0.6 with the top eQTL variant), RTEL1 is of interest as it has recently been implicated in familial pulmonary fibrosis[40].
Variant rs72448466 has also been associated with inflammatory bowel disease, prostate cancer and atopic dermatitis. Our implication of genes of potential functional relevance to the 97 signals was based on gene expression data (eQTL) and associated deleterious variants within a gene. Although eQTL evidence currently gives the best in silico indication of which gene (or genes) might be functionally relevant to a signal, conclusive evidence for a causal relationship between SNP genotype and gene expression can only be obtained through direct molecular experiments. Six signals of association have been previously identified within the HLA region. Using a custom imputation approach, we identified the presence of alanine (compared to aspartic acid, valine or serine) at amino acid position 57 in HLA-DQβ1 as associated with decreased lung function and the main driver of signals in this region. The presence of alanine is also strongly associated with risk of type 1 diabetes[41]. The three lung function traits we studied are correlated. The overall and genetic correlations were: 0.88 and 0.87 between FEV1 and FVC; 0.46 and 0.35 between FEV1 and FEV1/FVC; and 0.038 and -0.17 between FVC and FEV1/FVC (transformed traits, as studied in UK Biobank and SpiroMeta[15], respectively). One might expect variants showing strongest association with FEV1 and FEV1/FVC to be of greatest relevance for COPD, and genetic correlations of -0.76 and -0.9 have been reported between COPD and FEV1 and FEV1/FVC, respectively[42]. We show, however, that variants associated with one of these traits also tend to be associated with one of the other two lung function traits studied (for example, all but 2 signals for FVC are also associated (P<0.05) with FEV1, Supplementary Table 4). Although classification of COPD in UK Biobank was based on pre-bronchodilator spirometry, we have previously shown that this leads to minimal misclassification of moderate-severe (GOLD 2-4) COPD[43].
The effect size estimates for COPD associations could be influenced by differences in case ascertainment between the follow-up studies. Motivated by avoidance of potential winner’s curse bias for the 48 variants discovered using UK BiLEVE, we excluded UK BiLEVE from individual variant analyses. However, this excluded 9,563 moderate to severe COPD cases, and therefore the significance of COPD association tests for these variants should be interpreted with caution. Notably, we found effect size estimates only slightly smaller in deeply-characterised COPD case-control studies than in UK Biobank (OR per SD change in allelic risk score 1.36 compared to 1.42). Whilst we show an appreciable proportion of COPD cases could be attributable to allelic risk scores above the first decile, great caution must be exercised in interpretation of population attributable risk fraction estimates given considerations of shared etiologic responsibility[44]. The lung function-associated variants we report were not associated with acute exacerbations of COPD. Although more powerful studies of exacerbations will be required, this suggests that different genetic mechanisms could underlie risk of acute exacerbations. A threshold of P<5x10-8 is a valid threshold for genome-wide significance in GWAS analyses of common variants[45]. Our genotyping and imputation strategy resulted in testing of 27.6 million variants of which 21.6 million had MAF<5% and 18.2 million had MAF<1%. Although all of our 43 signals were common, had we adopted a stricter threshold for genome-wide significance, for example, P<1x10-8 (recommended in a recent report of significance thresholds in whole genome sequencing[45]), only two of our signals (rs10246303:A>T in the 3’ UTR of C1GALT1 on chromosome 7, and rs1698268:A>T near LINC00911 on chromosome 14) would not have reached significance. 
Thirty-nine of the 43 signals were additionally supported by statistically significant independent replication in stage 2 (P<0.05/43, Supplementary Table 3). In summary, our study provides the most comprehensive evidence yet regarding genetic variants associated with lung function and their association with susceptibility to COPD, with a more than threefold difference in COPD risk between the highest and lowest allelic risk score deciles. Whilst translation of GWAS findings can take some years and requires extensive additional work, selecting genetically supported targets could double the drug development success rate[17]. The future clinical relevance of our findings includes contributions towards understanding of disease pathogenesis, identification of drug targets for targeting or repositioning of drugs[18], and potentially improved prediction of COPD or its subtypes.

Data Availability Statement

The stage 1 (UK BiLEVE) genome-wide association results for FEV1, FVC and FEV1/FVC are available from UK Biobank at http://www.ukbiobank.ac.uk/. The sources of all other data utilised in this study can be found in the Online Methods and Supplementary Note.

Online Methods

Study Governance

UK Biobank has ethical approval from the NHS National Research Ethics Service (Ref 11/NW/0382). Informed consent was obtained from all participants. All other studies were approved by an appropriate ethics committee or data protection authority (Supplementary Note).

Stage 1 study sample selection

A genome-wide discovery study for variants associated with lung function measures was performed in 48,943 individuals from the UK BiLEVE[16] subset of UK Biobank (UK BiLEVE, stage 1). In brief, UK Biobank comprised 502,682 individuals of whom 275,939 were of self-reported European-ancestry and had ≥2 Forced Expired Volume in 1s (FEV1) and Forced Vital Capacity (FVC) measures (Vitalograph Pneumotrac 6800, Buckingham, UK) passing ATS/ERS criteria[46]. Based on the best (highest) available FEV1 measurement, 50,008 individuals from groups with extreme low (n=10,002), near-average (n=10,000) and extreme high (n=5,002) % predicted FEV1 were selected from amongst never-smokers (total n=105,272) and the same numbers from amongst the heavy-smokers (mean 35 pack-years of smoking, total n=46,758). FEV1, FVC and FEV1/FVC distributions are summarised in Supplementary Figure 8. Genotyping was undertaken using the Affymetrix Axiom UK BiLEVE array[16] and imputed to the 1000 Genomes Project Phase 1[47] and UK10K[48,49] combined panel. A total of 27,624,732 imputed or directly genotyped autosomal variants with imputation quality (info) >0.5 and minor allele count (MAC) ≥3 were included in the analysis. In total, 48,943 unrelated individuals passed all quality control steps and were used in this analysis.

Association testing and selection of signals from stage 1 for follow-up in stage 2

Power calculations were undertaken using Quanto (see URLs) (Supplementary Figure 9). For stage 1, genome-wide association studies of FEV1, FVC and FEV1/FVC were undertaken separately in heavy-smokers and never-smokers and then meta-analysed for each trait. Each trait was regressed on age, age2, sex, height, the first 10 principal components of genetic ancestry and pack years of smoking (in smokers), and the residuals were ranked and transformed to inverse normally distributed Z-scores. For the first 26 lung function variants reported[11,13,14,50] we showed Stage 2 effect size estimates[14] were comparable with those from inverse normally distributed Z-scores in UK BiLEVE (Supplementary Figure 10). Subsequently these Z-scores were used for genome-wide association testing using an additive genetic model (SNPTEST v2.5). The full genome-wide stage 1 results are available via UK Biobank (see URLs). From each of the three discovery GWAS, signals were selected for follow-up in stage 2 if they met an initial threshold of P<5x10-7. Low MAC variants (MAC between 3 and 20) were selected for follow-up only if the imputation quality (info) exceeded 0.8. Independence of signals was determined as follows: the most strongly associated (P<5x10-7) variant within a 1Mb region was selected as a putative signal and then the analysis was repeated for that 1Mb region conditioning on the most strongly associated variant. Any variant which then had a conditional P<5x10-7 was assigned as a secondary putative signal and also included in the conditional analysis. This was repeated until no variants with P<5x10-7 remained within the 1Mb region. Results were confirmed using a joint conditional analysis (GCTA[51]) and visual inspection of region plots. Previously reported signals were not included in the final list of putative signals to be taken for follow-up in stage 2.
Where novel signals for different traits were in linkage disequilibrium (r2 > 0.2), the variant for the trait with the most significant association was followed up. Due to the extended LD structure in the MHC region, conditional analyses and GCTA were run over a 9Mb region (chr6:24,126,750-33,126,689). Two pairs of signals previously reported as being independent (rs16909859:G>A[11] and rs16909898:A>G[14] in PTCH1, and rs34712979:G>A[16] and rs6856422:T>G[15], in NPNT) were found to be correlated in our data.
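The rank-based inverse normal transformation applied to the regression residuals in this section can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the rank offset of 0.5, the handling of ties by mean rank, and the function name `inverse_normal_transform` are all assumptions, and the input values are made up.

```python
from statistics import NormalDist


def inverse_normal_transform(residuals, offset=0.5):
    """Map residuals to standard normal quantiles according to their rank.

    Assumption: quantile = (rank - offset) / n, with tied values sharing
    their mean rank. The source does not state which variant was used.
    """
    n = len(residuals)
    order = sorted(range(n), key=lambda i: residuals[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j across a run of tied values.
        while j + 1 < n and residuals[order[j + 1]] == residuals[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based mean rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - offset) / n) for r in ranks]


# Illustrative residuals only (not study data).
z_scores = inverse_normal_transform([0.31, -1.20, 0.05, 2.40, -0.47])
```

The transformed values keep the ordering of the residuals but follow a standard normal distribution by construction, which is what allows effect sizes to be compared across traits and strata on a common scale.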

Stage 2 – follow-up in independent studies (quantitative lung function)

Putative novel signals of association from stage 1 were followed up in three independent sets of samples (stage 2): (i) an independent subset of UK Biobank participants (UK Biobank, n=49,727), (ii) a population-based consortium (SpiroMeta, n=38,199)[15] and (iii) the UK Households Longitudinal Study (UKHLS, n=7,449). We did not include these studies in Stage 1 because (ii) was to be utilised for independent replication, and (i) and (iii) were not yet available when Stage 1 was undertaken. Each signal was followed up only for the trait with which it was most strongly associated in Stage 1. The first tranche of genotype data and imputation output (merged 1000 Genomes Project Phase 3 and UK10K imputation panel) from UK Biobank was released in May 2015 (see URLs) and comprised the 49,979 individuals originally genotyped for UK BiLEVE (an unrelated subset of 48,943 of whom were used as discovery in this study) and an additional 102,757 individuals selected at random from the entire UK Biobank. From these 102,757 individuals, we initially selected 51,117 samples that had lung function measurements (FEV1 and FVC) meeting ATS/ERS criteria and had covariates age, sex, height, principal components and smoking status recorded. Following further exclusion of individuals with sex mismatches (n=41), individuals of non-European ancestry (based on k-means clustering of principal components 1 and 2 with 4 clusters, n=124) and one individual from each pair of related samples (KING relatedness > 0.088 [2nd degree], n=1,225), a total of 49,727 individuals remained for analysis. The details of the SpiroMeta consortium analysis (including contributing studies, spirometry details and methods) are described elsewhere[15]. In brief, this was an inverse variance weighted fixed effects meta-analysis of 17 studies with imputation to the 1000 Genomes Project Phase 1 reference panel.
Within each study, FEV1, FVC and FEV1/FVC were adjusted for age, age2, sex, height and population structure, separately for ever and never-smokers. Inverse normal transformed residuals were then tested for association within each smoking stratum assuming an additive genetic effect and then meta-analysed. Genomic control was applied to account for residual population structure. We only included SpiroMeta meta-analysis results in the meta-analysis in this study if Neffective > 70% (i.e. >70% of 38,199), where Neffective is the effective sample size after scaling for imputation quality[15]. Summary statistics of a GWAS of FEV1, FVC and FEV1/FVC in 7,449 individuals were available from UKHLS (Supplementary Note). SNPs were genotyped using the Illumina Infinium HumanCoreExome BeadChip Kit and imputed against the same 1000 Genomes Project + UK10K combined imputation panel as used in discovery in this study. Association testing was performed separately for ever and never-smokers with covariates age, age2, sex, height and ancestry principal components, as for Stage 1. We only included UKHLS results in the meta-analysis in this study if imputation info >0.5 and MAC ≥3.
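Two of the filters in this section, genomic control and the Neffective inclusion rule, lend themselves to a short sketch. The code below assumes the conventional form of genomic control (lambda estimated as the median 1 d.f. chi-square statistic divided by 0.4549, standard errors inflated by the square root of lambda, and no correction when lambda is below 1); the source does not spell out these details, and all function names are hypothetical.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(betas, ses):
    """Estimate the genomic inflation factor from per-variant estimates."""
    chi2 = [(b / s) ** 2 for b, s in zip(betas, ses)]
    return median(chi2) / CHI2_1DF_MEDIAN


def apply_genomic_control(ses, lam):
    """Inflate standard errors by sqrt(lambda).

    Assumption: results are left unchanged when lambda <= 1.
    """
    factor = max(lam, 1.0) ** 0.5
    return [s * factor for s in ses]


def passes_n_effective(n_effective, n_total=38199, min_fraction=0.70):
    """Inclusion rule from the text: Neffective must exceed 70% of 38,199."""
    return n_effective > min_fraction * n_total
```

For example, a variant with an effective sample size of 30,000 would be retained under this rule, whereas one with 26,000 would not, since 70% of 38,199 is about 26,739.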

Meta-analysis of stage 1 and stage 2

All meta-analyses were undertaken using fixed effects inverse variance weighting which takes directionality of association into account. Effect estimates for all variants followed up in stage 2 were meta-analysed across the three stage 2 studies and then the combined result was meta-analysed with stage 1 results. Where the discovery variant was not present in any stage 2 study, a proxy (r2>0.8) that was available in all stage 1 and stage 2 studies was used. We report signals with association P<5x10-8 in the meta-analysis of stages 1 and 2 as novel signals of association with lung function.
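As a minimal sketch of fixed effects inverse variance weighting, the function below pools per-study effect estimates and standard errors and returns a two-sided P value from the normal approximation. The function name and the numeric inputs are illustrative assumptions, not values from the study.

```python
from math import sqrt
from statistics import NormalDist


def ivw_meta(betas, ses):
    """Fixed effects inverse variance weighted meta-analysis.

    Each study is weighted by 1 / se^2; the sign of each beta is kept,
    so the direction of association is taken into account.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = sqrt(1.0 / total)
    p_value = 2.0 * NormalDist().cdf(-abs(beta / se))
    return beta, se, p_value


# Illustrative numbers only: pool three stage 2 studies first,
# then combine that result with the stage 1 estimate.
stage2 = ivw_meta([0.030, 0.025, 0.040], [0.008, 0.009, 0.020])
combined = ivw_meta([0.035, stage2[0]], [0.007, stage2[1]])
```

Under a fixed effects model, pooling stage 2 first and then combining with stage 1 gives the same estimate as a single meta-analysis of all four inputs, so the two-step ordering described above does not change the final result.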

Assessment of stage 1 and stage 2 sample overlap by LD score regression

LD score regression was used to assess the extent of confounding. Absence of significant confounding indicates that factors such as sample overlap and/or population stratification are not evident. Pre-computed LD scores from a European population were used (see URLs), based on genotypes for 1,293,150 HapMap3 SNPs in samples from the 1000 Genomes Project EUR population. Association results were filtered (info > 0.9 and MAF > 1%) before running LD score regression on (i) 3 pairwise meta-analyses of results from UK BiLEVE (stage 1) and UK Biobank (stage 2), UK BiLEVE and SpiroMeta and UK Biobank and SpiroMeta; (ii) bivariate analyses of the 3 pairs of cohorts.

Effect sizes in adults and children

The effects of variants on lung function in children were also tested in 5,062 children from ALSPAC (mean age 8.6) and 1,220 children from the Raine study (mean age 8.1). Data were available for 81 of the 97 variants (a proxy variant with r2>0.7 was used for 11 signals) with imputation quality >0.5 of which 73 had imputation quality >0.8 (71 variants in ALSPAC and 35 in the Raine study). Association results from the two cohorts were combined using inverse variance weighted meta-analysis. A weighted risk score was approximated using pooled single SNP results, as described in Dastani et al[52], and weights obtained using estimated effect sizes from either SpiroMeta[15] summary data (for SNPs discovered in UK Biobank), or from UK Biobank (for SNPs discovered elsewhere). The risk score was tested for the three lung function traits: FEV1, FVC and FEV1/FVC.
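One commonly used way to approximate a weighted risk score association from pooled single SNP results is sketched below. The exact estimator used in this study is not reproduced in the text, so the formula here (score effect = sum of w·beta/se² divided by sum of w²/se², assuming independent SNPs) should be read as an assumption, as should the function and argument names.

```python
from math import sqrt


def risk_score_from_summary(weights, betas, ses):
    """Approximate the effect of a weighted risk score from summary data.

    weights: per-SNP weights (e.g. effect sizes from an external study)
    betas, ses: per-SNP effect estimates and standard errors for the
                trait being tested
    Assumes the SNPs are uncorrelated with one another.
    """
    numerator = sum(w * b / s ** 2 for w, b, s in zip(weights, betas, ses))
    denominator = sum(w ** 2 / s ** 2 for w, s in zip(weights, ses))
    return numerator / denominator, sqrt(1.0 / denominator)
```

With a single SNP and a weight of 1 the function returns that SNP's own estimate and standard error, which is a convenient sanity check.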

Refinement of signals

A Bayesian method[53] was used to fine-map lung function-associated signals to the set of variants that were 95% likely to contain the underlying causal variant (assuming that the causal variant has been analysed). This was undertaken for novel signals and for previously-reported signals which reached P<10-5 in the stage 1 results. Following van de Bunt et al.[54] we set the value of a prior W=0.4 in the approximate Bayes Factor formula. Signals in the HLA were not included. We re-imputed our 48,943 discovery samples across the HLA (chr6:29,607,078-33,267,103 (b37)) using IMPUTE2 v2.3.1 with a reference panel incorporating classical HLA alleles and amino acid changes[55]. The reference panel contained haplotypes for 5,225 samples from the type 1 diabetes genetics consortium (T1DGC) across 8,961 biallelic variants comprised of 5,863 directly genotyped biallelic SNPs and 3,098 surrogate biallelic variants encoding multiallelic SNPs, indels, classical HLA alleles and amino acid changes. Association testing was then undertaken as described for stage 1 for FEV1 and FEV1/FVC.
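The credible set construction can be illustrated with Wakefield's approximate Bayes factor. In this sketch the value 0.4 is passed in the position of W, the prior variance of the effect size in that formula; whether the source intends 0.4 as a variance or a standard deviation is not stated, so this is an assumption. Equal prior probability for every variant in the region is also assumed, and the function names are hypothetical.

```python
from math import exp, log


def log_abf(beta, se, prior_w=0.4):
    """Log approximate Bayes factor in favour of association."""
    v = se ** 2
    z_squared = (beta / se) ** 2
    shrink = prior_w / (v + prior_w)
    # log of sqrt(V / (V + W)) * exp(z^2 / 2 * W / (V + W))
    return 0.5 * log(1.0 - shrink) + 0.5 * z_squared * shrink


def credible_set(variants, coverage=0.95, prior_w=0.4):
    """Return the smallest set of variants reaching the target coverage.

    variants: list of (variant_id, beta, se) for one signal.
    Returns a list of (variant_id, posterior_probability), best first.
    """
    logs = [(vid, log_abf(b, s, prior_w)) for vid, b, s in variants]
    top = max(value for _, value in logs)
    # Subtract the maximum before exponentiating to avoid overflow.
    scaled = [(vid, exp(value - top)) for vid, value in logs]
    total = sum(w for _, w in scaled)
    ranked = sorted(((vid, w / total) for vid, w in scaled),
                    key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for vid, posterior in ranked:
        chosen.append((vid, posterior))
        cumulative += posterior
        if cumulative >= coverage:
            break
    return chosen
```

A signal with one dominant variant yields a credible set of size one, whereas a region of equally supported variants in high linkage disequilibrium yields a large set, reflecting the limited resolution of the data.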

Effects of lung function associated variants on other traits

To identify whether the novel and previously reported lung function-associated variants had been reported in previous GWAS as associated with traits other than lung function and COPD, we queried the GWAS Catalog[56] (last update: 13/03/2016, downloaded on 17/03/16) and GRASP[57] (v2.0, downloaded on 17/03/16) for genome-wide significant (P<5x10-8) signals using the 95% credible set (if calculated) or all proxy SNPs (r2>0.8) within 2Mb of the top variant in our data.

Clinical relevance – COPD susceptibility and risk of COPD exacerbations in European and Chinese populations

The effect on COPD susceptibility of up to 95 out of the 97 lung function-associated signals was tested in the COPD study at deCODE Genetics (deCODE COPD study) (1,964 COPD cases and 142,262 controls for single-variant analyses and 1,248 COPD cases and 74,700 controls for risk score analyses), in three lung resection studies: Groningen, Laval and UBC (310 COPD cases and 332 controls), in the following COPD case-control studies: COPDGene Study (2,812 COPD cases and 2,534 controls), Evaluation of COPD Longitudinally to Identify Predictive Surrogate End-points (ECLIPSE) (1,736 COPD cases and 176 controls), National Emphysema Treatment Trial (NETT) and Normative Aging Study (NAS) (NETT/NAS, 376 COPD cases and 435 controls) and the Norway GenKOLS study (Genetics of Chronic Obstructive Lung Disease) (854 cases and 805 controls), in the following eMR studies: Mount Sinai BioMe Biobank (BioMe, 207 COPD cases and 1,817 controls) and Geisinger-Regeneron DiscovEHR Study (DiscovEHR, 1,280 COPD cases and 13,321 controls for single-variant analyses and 1,264 COPD cases and 13,032 controls for risk score analyses), and in UK Biobank (not including UK BiLEVE samples, 984 cases and 26,561 controls in total) and UK BiLEVE (9,563 moderate-severe cases, 27,387 controls). rs7050036, located in chromosome X, and chr12:114743533, with MAF= 0.15%, were not present in most studies and therefore were excluded from these analyses, bringing the 97 signals to 95. Of the 95 signals, 47 signals were previously discovered independently of UK BiLEVE and were tested for association using all available COPD cases and controls (20,086 COPD cases and 215,630 controls). The remaining 48 signals were discovered using UK BiLEVE data and so were tested for association using 10,523 COPD cases and 188,243 controls (UK BiLEVE excluded). 
The effect on risk of COPD exacerbation was additionally tested in the Lung Health Study (LHS) (100 COPD exacerbation cases and 4,002 COPD controls) as well as subsets of UK Biobank (including UK BiLEVE, 647 cases and 9,900 controls), COPDGene (557 cases and 2,255 controls), ECLIPSE (278 cases and 1,458 controls), NETT/NAS (87 cases and 277 controls), GenKOLS (120 cases and 734 controls), BioMe (8 cases and 199 controls) and DiscovEHR (774 cases and 472 controls). Analyses of the effect of lung function variants on COPD susceptibility and on risk of COPD exacerbations in a Chinese ancestry population were undertaken using the China Kadoorie Biobank prospective cohort (CKB) within which data were available for 71 (single variant analyses) or 70 (risk score analyses) of the 95 variants (or proxies) for analyses of COPD susceptibility (7,116 COPD cases and 20,919 controls) and risk of COPD exacerbation (5,292 cases and 1,824 controls). Further details of all studies, including case and control definitions are in the Supplementary Note and Supplementary Table 20. To test the single variant associations with COPD susceptibility and risk of exacerbation, logistic regression using age, age2, sex, and height as covariates (unless otherwise indicated, Supplementary Note) and assuming an additive genetic effect was used. To test the joint effect of these variants, risk alleles in the subset of the 95 signals with data available in each study (from 86 to 95) were summed to create an unweighted genetic risk score and logistic regression was used to test the effect of the risk score, as a continuous variable, on COPD status and COPD exacerbation status (adjusted for age, age2, sex and height, unless otherwise indicated, Supplementary Note). Results, both from single variant and risk scores, were meta-analysed separately for studies where similar study design and phenotyping was used: eMR, case-control and lung resection, and results were also meta-analysed across studies. 
Inverse variance weighted meta-analysis was used. In CKB, analyses were adjusted for sex, age, age2, height, region (n=10) and disease status (n=5) and final results were GC-corrected based on genome-wide inflation estimates. Heterogeneity was tested using I2[58]. We calculated odds ratios for spirometrically-defined COPD for weighted risk score deciles in UK Biobank (incorporating UK BiLEVE, 10,547 cases, pre-bronchodilator % predicted FEV1<80% and FEV1/FVC<0.7, and 53,948 controls, FEV1/FVC>0.7 and % predicted FEV1>80%). The weighting of the risk score was undertaken using COPD logOR calculated in studies free of winner’s curse bias (Supplementary Table 21). We scaled the logOR so that the weights summed to 95.
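Two of the calculations in this section are simple enough to sketch: the I2 heterogeneity statistic, and the scaling of logOR weights so that they sum to 95 before building a weighted risk score. The code assumes that every logOR is expressed for the risk allele (so all weights are positive) and that allele counts are coded 0, 1 or 2; function names are hypothetical.

```python
def i_squared(betas, ses):
    """I2 heterogeneity statistic (percent), based on Cochran's Q."""
    weights = [1.0 / s ** 2 for s in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def scale_weights(log_odds_ratios, target_sum=95.0):
    """Rescale weights so that they add up to target_sum."""
    total = sum(log_odds_ratios)
    return [target_sum * x / total for x in log_odds_ratios]


def weighted_risk_score(risk_allele_counts, weights):
    """Sum of risk allele counts multiplied by their scaled weights."""
    return sum(c * w for c, w in zip(risk_allele_counts, weights))
```

Scaling the weights to sum to the number of variants keeps the weighted score on roughly the same scale as an unweighted count of risk alleles, which makes the two scores easier to compare.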

Population attributable risk fraction calculation

The population attributable risk fraction (PARF) was calculated using the formula below where P(E) is the probability of the exposure, in this case the probability of having more risk alleles than those in the lowest decile of the risk score (P(E) =0.9), and the OR refers to the odds of having COPD for individuals in deciles 2 to 10 of the risk score compared to the odds of having COPD for individuals in the lowest decile (decile 1) of the risk score. The ORs were calculated separately in ever and heavy-smokers using a logistic regression adjusted for age, age2, sex, height and the first 10 ancestry principal components, and an additional pack-years adjustment for heavy-smokers, and were then meta-analysed using inverse variance weighting. Confidence intervals were estimated using the formula above with the lower and upper bound of the meta-analysed OR estimated by logistic regression. These analyses were run using UK Biobank data and the COPD case definition described above: individuals with % predicted FEV1<80% and FEV1/FVC<0.7 were selected as COPD cases and those with FEV1/FVC>0.7 and % predicted FEV1>80% were selected as controls.
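The formula itself is not reproduced in this copy of the text. The definitions of P(E) and OR given above match the standard Levin form, PARF = P(E)(OR - 1) / (1 + P(E)(OR - 1)), with the odds ratio standing in for the relative risk. The sketch below assumes that form; it is not confirmed by the source, and the function names are hypothetical.

```python
def parf(odds_ratio, p_exposed=0.9):
    """Population attributable risk fraction (Levin form, assumed here).

    p_exposed is P(E); 0.9 corresponds to deciles 2 to 10 of the score.
    """
    excess = p_exposed * (odds_ratio - 1.0)
    return excess / (1.0 + excess)


def parf_with_ci(or_estimate, or_lower, or_upper, p_exposed=0.9):
    """Apply the same formula to the OR and to its confidence bounds."""
    return tuple(parf(x, p_exposed) for x in (or_estimate, or_lower, or_upper))
```

Because the function is monotonically increasing in the odds ratio, plugging in the lower and upper bounds of the meta-analysed OR gives the corresponding bounds of the PARF, as described for the confidence intervals.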

Implication of causal genes

In order to implicate the likely causal gene (or genes) for each of the novel and previously-reported signals (97 in total), we employed functional annotation and analysis of gene expression data. All variants within 25kb, variants within 500kb and with r2>0.5 with the top SNP at each signal and variants within 1Mb and with r2>0.8 with the top SNP were annotated using ENSEMBL’s Variant Effect Predictor (VEP). A variant was labelled as deleterious if it was a missense coding variant that was annotated as ‘deleterious’ by SIFT, ‘probably damaging’ or ‘possibly damaging’ by PolyPhen-2, had a CADD scaled score ≥ 20 (CADD_PHRED ≥ 20), or had a GWAVA score > 0.5. The deleterious variants were each, in turn, included as a covariate in the association analysis for the top SNP. If inclusion of the deleterious variant as a covariate reduced the association signal for the top SNP such that P>0.01, that deleterious variant was deemed to explain part of the signal. If annotation (e.g. a coding variant) implicated a specific gene, then the gene was classified as a high-priority gene for the relevant signal. At each signal, the sentinel SNP and top proxies with r2>0.4 and within 2Mb, no limit on number of proxies, were used to query 3 eQTL resources: lung eQTL[23,24,59], blood eQTL[60] and GTEx[61] (artery (aorta and tibial), adrenal gland, colon sigmoid, esophagus (gastroesophageal junction and mucosa), transformed fibroblasts, lung, spleen, skin (sun exposed lower leg), stomach, testis, thyroid, whole blood). A False Discovery Rate (FDR) of 10% was used as a threshold for significance in the lung and blood eQTL datasets and 5% in GTEx (due to the large number of different tissues and cells, and small sample size). A gene was classified as a potential causal gene if the sentinel SNP or proxy (r2>0.4) showed significant evidence of being an eQTL signal for that gene.
Genes were further classified as high-priority genes if the variant most strongly associated with the lung function traits (or a proxy with r2>0.9) was also the variant most strongly associated with expression of the gene in one or more of the eQTL datasets (i.e. there was co-localisation of the lung function associated SNP and the gene expression associated SNP). Due to extended linkage disequilibrium across the MHC region, only high-priority genes were identified for the signals in the MHC.
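The rule for labelling a variant as deleterious, and the conditional test that follows it, can be written as a small predicate. The sketch reads the rule as: a missense variant flagged by SIFT or PolyPhen-2, or any variant with CADD_PHRED ≥ 20, or any variant with GWAVA > 0.5. That grouping of the criteria is one interpretation of the sentence in the text, and the class, field and function names are hypothetical. PolyPhen-2 names its two damaging classes "probably damaging" and "possibly damaging", which are the labels used here.

```python
from dataclasses import dataclass
from typing import Optional

# PolyPhen-2 prediction classes treated as damaging.
POLYPHEN_DAMAGING = {"probably damaging", "possibly damaging"}


@dataclass
class Annotation:
    is_missense: bool
    sift: Optional[str] = None         # e.g. "deleterious" or "tolerated"
    polyphen: Optional[str] = None     # e.g. "probably damaging"
    cadd_phred: Optional[float] = None
    gwava: Optional[float] = None


def is_deleterious(a: Annotation) -> bool:
    """Apply the thresholds given in the text to one annotated variant."""
    predicted_damaging = a.is_missense and (
        a.sift == "deleterious" or a.polyphen in POLYPHEN_DAMAGING
    )
    high_cadd = a.cadd_phred is not None and a.cadd_phred >= 20
    high_gwava = a.gwava is not None and a.gwava > 0.5
    return predicted_damaging or high_cadd or high_gwava


def explains_signal(conditional_p: float, threshold: float = 0.01) -> bool:
    """True if conditioning on the variant pushes the top SNP to P > 0.01."""
    return conditional_p > threshold
```

A variant passing `is_deleterious` would then be added as a covariate for the top SNP, and `explains_signal` applied to the resulting conditional P value.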

Pathway analyses

The genes implicated for each signal (high-priority genes only and all genes) were tested for enrichment of gene sets and pathways using ConsensusPathDB[62]. Pathways or gene sets represented entirely by genes implicated by the same association signal were excluded. Pathways or gene sets represented by 2 or more genes from the same association signal were flagged. Pathway enrichment using all genome-wide P values was undertaken using MAGENTA[63] as previously described[15]. Gene sets/pathways with FDR<5% either including the HLA region or excluding the HLA region were reported.

Tissue specific enrichment of overlap of histone marks

Two methods were used to test for enrichment of the 97 signals of association with lung function for H3K4me1 and H3K4me3 histone marks in up to 127 different tissue and cell types from the ENCODE and RoadMap projects[39]. First, enrichment was investigated using a hypergeometric test (as previously described[39]) using SNPs from the GWAS Catalog (hg19, downloaded 02/11/2015) as background. The GWAS Catalog was pruned within each contributing GWAS study to retain only SNPs that were at least 1Mb apart within that study resulting in 18,202 SNPs for further analysis. BEDtools was used to calculate overlap with precomputed “gapped peaks” for H3K4me1 and H3K4me3 histone marks and a hypergeometric test was used to test the significance of enrichment of the 97 lung function variants compared to the background of GWAS Catalog SNPs. Control for multiple testing was undertaken by picking 97 random variants from the pruned GWAS Catalog and repeating the enrichment computation. FDR was calculated from 10,000 randomizations and FDR=10% was used as a threshold. The second method used, GoShifter, calculates overlap enrichment against a null distribution generated by locally shifting annotations[64]. Linkage disequilibrium was calculated using the stage 1 population. Precomputed “narrow peaks” for H3K4me1 and H3K4me3 histone marks from the Roadmap project were used. Tissues/cell types with overlap enrichment P<0.05 are reported.
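The hypergeometric test used in the first method can be sketched directly from its definition. Here N is the number of background SNPs, K the number of background SNPs overlapping the histone mark in a given tissue, n the number of lung function variants and k the number of those that overlap. Whether the lung function variants are counted within the background is not stated, so the treatment here is an assumption, as is the function name.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n items are drawn without replacement from N items,
    of which K are 'successes' (SNPs overlapping the histone mark)."""
    denominator = comb(N, n)
    upper = min(n, K)
    numerator = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return numerator / denominator
```

Exact integer arithmetic with `math.comb` avoids the rounding problems that arise when individual binomial coefficients are computed in floating point for a background of about 18,000 SNPs.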

Druggability

We searched the ChEMBL database (v21, last update: 01/02/2016, downloaded on 11/02/16) to identify whether any of the implicated genes encoded proteins that were targets for approved drugs, or drug compounds in development. We additionally searched for genes predicted to interact (parameters: STRING score ≥0.90; maximum of 10 interactions per gene) with each of the high-priority genes[32].
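The STRING filter described here (score ≥0.90, at most 10 interactions per gene) can be expressed as a small helper. The sketch assumes scores on a 0 to 1 scale and assumes that, when a gene has more than 10 qualifying partners, the highest-scoring ones are kept; the source does not say how the 10 were chosen. The function name and input layout are hypothetical.

```python
def top_interactors(interactions, min_score=0.90, max_per_gene=10):
    """Keep high-confidence interaction partners for each query gene.

    interactions: iterable of (query_gene, partner, combined_score),
                  with combined_score on a 0 to 1 scale.
    Returns {query_gene: [(partner, score), ...]} sorted by score,
    truncated to max_per_gene entries.
    """
    by_gene = {}
    for gene, partner, score in interactions:
        if score >= min_score:
            by_gene.setdefault(gene, []).append((partner, score))
    return {
        gene: sorted(partners, key=lambda item: item[1], reverse=True)[:max_per_gene]
        for gene, partners in by_gene.items()
    }


# Placeholder identifiers and scores, for illustration only.
example = top_interactors([
    ("GENE_A", "PARTNER_1", 0.95),
    ("GENE_A", "PARTNER_2", 0.85),
    ("GENE_A", "PARTNER_3", 0.99),
])
```

The resulting partner lists would then be looked up in the drug target database in the same way as the implicated genes themselves.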