Genetic landscape of chronic obstructive pulmonary disease identifies heterogeneous cell-type and phenotype associations.

Phuwanat Sakornsakolpat1,2, Dmitry Prokopenko1,3, Brian D Hobbs1,4, Michael H Cho5,6, Maxime Lamontagne7, Nicola F Reeve8, Anna L Guyatt8, Victoria E Jackson8, Nick Shrine8, Dandi Qiao1, Traci M Bartz9,10,11, Deog Kyeom Kim12, Mi Kyeong Lee13, Jeanne C Latourelle14, Xingnan Li15, Jarrett D Morrow1, Ma'en Obeidat16, Annah B Wyss13, Per Bakke17, R Graham Barr18, Terri H Beaty19, Steven A Belinsky20, Guy G Brusselle21,22,23, James D Crapo24, Kim de Jong25,26, Dawn L DeMeo1,4, Tasha E Fingerlin27,28, Sina A Gharib29, Amund Gulsvik17, Ian P Hall30,31, John E Hokanson32, Woo Jin Kim33, David A Lomas34, Stephanie J London13, Deborah A Meyers15, George T O'Connor35,36, Stephen I Rennard37,38, David A Schwartz39,40, Pawel Sliwinski41, David Sparrow42, David P Strachan43, Ruth Tal-Singer44, Yohannes Tesfaigzi20, Jørgen Vestbo45, Judith M Vonk25,26, Jae-Joon Yim46, Xiaobo Zhou1, Yohan Bossé7,47, Ani Manichaikul48,49, Lies Lahousse21,50, Edwin K Silverman1,4, H Marike Boezen25,26, Louise V Wain8,51, Martin D Tobin8,51.   

Abstract

Chronic obstructive pulmonary disease (COPD) is the leading cause of respiratory mortality worldwide. Genetic risk loci provide new insights into disease pathogenesis. We performed a genome-wide association study in 35,735 cases and 222,076 controls from the UK Biobank and additional studies from the International COPD Genetics Consortium. We identified 82 loci associated with COPD at P < 5 × 10−8; 47 of these had previously been described in association with either COPD or population-based measures of lung function. Of the remaining 35 new loci, 13 were associated with lung function in 79,055 individuals from the SpiroMeta consortium. Using gene expression and regulation data, we identified functional enrichment of COPD risk loci in lung tissue, smooth muscle, and several lung cell types. We found 14 COPD loci shared with either asthma or pulmonary fibrosis. COPD genetic risk loci clustered into groups based on associations with quantitative imaging features and comorbidities. Our analyses provide further support for the genetic susceptibility and heterogeneity of COPD.

Year:  2019        PMID: 30804561      PMCID: PMC6546635          DOI: 10.1038/s41588-018-0342-2

Source DB:  PubMed          Journal:  Nat Genet        ISSN: 1061-4036            Impact factor:   38.330


Introduction

Chronic obstructive pulmonary disease (COPD) is a disease of enormous and growing global burden[1], ranked third as a global cause of death by the World Health Organization in 2016[2]. Environmental risk factors, predominantly cigarette smoking, account for a large fraction of disease risk, but there is considerable variability in COPD susceptibility among individuals with similar smoking exposure. Studies in families and in populations demonstrate that genetic factors account for a substantial fraction of disease susceptibility. As in other adult-onset complex diseases, common variants likely account for the majority of population genetic susceptibility[3,4]. Our previous efforts identified 22 genome-wide significant loci[5]. Expanding the number of loci can lead to novel insights into disease pathogenesis, not only through discovery of novel biology at individual loci[6,7], but also across loci, via identification of functional links, specific cell types, and phenotypes[5]. We performed a genome-wide association study combining previously described studies from the International COPD Genetics Consortium (ICGC)[5] with additional subjects from the UK Biobank[8], a population-based study of several hundred thousand subjects with assessments of lung function and cigarette smoking. Through bioinformatic and computational analyses, we determined the likely set of variants, genes, cell types, and biologic pathways implicated by these associations. Finally, we assessed our genetic findings for relevance to COPD-specific, respiratory, and other phenotypes.

Results

Genome-wide association study of COPD

We included a total of 257,811 individuals from 25 studies in the analysis, including studies from the International COPD Genetics Consortium and UK Biobank (Figure 1). We defined COPD based on pre-bronchodilator spirometry according to modified Global Initiative for Chronic Obstructive Lung Disease (GOLD) criteria for moderate to very severe airflow limitation[9], as done previously[5]. This definition resulted in 35,735 cases and 222,076 controls (Supplementary Table 1). We tested the association of COPD with 6,224,355 variants in a fixed-effects meta-analysis of the 25 studies. The linkage disequilibrium score regression (LDSC)[10] intercept (1.0377, s.e. 0.0094) provided no evidence of confounding by population substructure.
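A fixed-effects model pools per-study log odds ratios, weighting each by the inverse of its variance. The Python sketch below illustrates the idea under simplifying assumptions: each input is an OR with its 95% CI, the standard error is recovered from the CI width on the log scale, and the two inputs are the UK Biobank and ICGC estimates for rs72673419 from Table 1, treated as two strata purely for illustration (the actual analysis pooled 25 studies). `fixed_effects_meta` is a hypothetical helper, not the software used in the study.

```python
import math

Z95 = 1.959964  # two-sided 95% standard normal quantile

def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Recover log(OR) and its standard error from an OR and its 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)
    return math.log(odds_ratio), se

def fixed_effects_meta(studies):
    """Inverse-variance weighted fixed-effects meta-analysis of log odds ratios.

    studies: iterable of (OR, CI lower, CI upper) tuples.
    Returns (pooled OR, CI lower, CI upper, two-sided Wald P).
    """
    num = 0.0
    den = 0.0
    for odds_ratio, lo, hi in studies:
        b, se = log_or_and_se(odds_ratio, lo, hi)
        w = 1.0 / se ** 2  # inverse-variance weight
        num += w * b
        den += w
    beta = num / den
    se = math.sqrt(1.0 / den)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return (math.exp(beta), math.exp(beta - Z95 * se),
            math.exp(beta + Z95 * se), p)

# rs72673419 (C1orf87): UK Biobank and ICGC estimates from Table 1
pooled = fixed_effects_meta([(1.13, 1.07, 1.18), (1.19, 1.08, 1.31)])
print("OR %.2f (%.2f-%.2f), P = %.1e" % pooled)
```

With these inputs the pooled estimate rounds to the meta-analysis OR of 1.14 (1.09–1.19) shown in Table 1. The P value will not match the reported value exactly, because the published figure comes from the full meta-analysis of unrounded study-level estimates.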
Figure 1

Study design

COPD, chronic obstructive pulmonary disease; FEV1, forced expiratory volume in one second; FVC, forced vital capacity; ARIC, Atherosclerosis Risk in Communities.

We identified 82 loci (defined using 2-Mb windows) at genome-wide significance (P < 5 × 10−8) (Figures 1 and 2; Supplementary Figures 1 and 2). Forty-seven of the 82 loci had previously been described as genome-wide significant in COPD[5,11] or lung function[12-20] (Supplementary Table 2), leaving 35 loci that were novel at the time of analysis (Table 1). We then sought to replicate these novel loci. Given the strong genetic correlation between population-based lung function and COPD, we tested the lead variant at each locus for association with forced expiratory volume in 1 s (FEV1) or FEV1/forced vital capacity (FVC) in 79,055 individuals from SpiroMeta[21] (Supplementary Table 3). Thirteen loci (C1orf87, DENND2D, DDX1, SLMAP, BTC, FGF18, CITED2, ITGB8, STN1, ARNTL, SERP2, DTWD1, and ADAMTSL3) replicated at a Bonferroni-corrected threshold (one-sided P < 0.05/35; Table 1). Although they did not meet this strict threshold, an additional 14 novel loci were nominally significant in SpiroMeta (consistent direction of effect and one-sided P < 0.05): ASAP2, EML4, VGLL4, ADCY5, HSPA4, CCDC69, RREB1, ID4, IER3, RFX6, MFHAS1, COL15A1, TEPP, and THRA (Table 1). All 82 loci showed a consistent direction of effect with either FEV1 or the FEV1/FVC ratio in SpiroMeta (Table 1 and Supplementary Table 2). We note that 9 of our 35 novel loci were recently described in a contemporaneous analysis of lung function in UK Biobank[21]. None of the novel loci appeared to be explained by cigarette smoking, and variant effect sizes were similar in ever- and never-smokers and with or without self-reported asthmatics (Supplementary Note). In addition, we found no significant differences in variant effects by sex (Supplementary Note). Together, the 82 genome-wide significant variants explain up to 7.0% of the phenotypic variance on the liability scale, assuming a COPD prevalence of 10%, although these effects are likely overestimated in the discovery sample.
This represents up to a 48% increase in COPD phenotypic variance explained by genetic loci, compared with the 4.7% explained by the 22 loci reported in a recent GWAS of COPD[5].
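The replication rule can be sketched as follows, assuming the one-sided P value is obtained by halving the two-sided SpiroMeta P value when the effect is in the expected direction (a negative beta, i.e. the COPD risk allele lowering FEV1 or FEV1/FVC) and taking the better of the two lung function traits. These are assumptions for illustration, not the study's documented procedure; the inputs are the rounded betas and P values for DDX1, ASAP2, and TESK2 from Table 1.

```python
NOVEL_LOCI = 35
BONFERRONI = 0.05 / NOVEL_LOCI  # about 1.4e-3

def one_sided_p(beta, p_two_sided):
    """One-sided P for an effect in the expected direction.

    Assumption: the COPD risk allele should lower FEV1 or FEV1/FVC,
    so beta < 0 is the consistent direction.
    """
    if beta < 0:
        return p_two_sided / 2
    return 1 - p_two_sided / 2

def replication_status(estimates):
    """Classify a novel locus from (beta, two-sided P) pairs for FEV1 and FEV1/FVC."""
    best = min(one_sided_p(beta, p) for beta, p in estimates)
    if best < BONFERRONI:
        return "Bonferroni"
    if best < 0.05:  # only reachable with a consistent (negative) beta
        return "nominal"
    return "not significant"

# Rounded SpiroMeta (FEV1, FEV1/FVC) estimates from Table 1
print(replication_status([(-0.02, 3.1e-6), (-0.02, 7.1e-6)]))  # DDX1 -> Bonferroni
print(replication_status([(0.00, 9.6e-1), (-0.01, 3.3e-2)]))   # ASAP2 -> nominal
print(replication_status([(-0.01, 3.0e-1), (-0.00, 4.1e-1)]))  # TESK2 -> not significant
```

Because a one-sided P below 0.05 is impossible when the beta is non-negative under this rule, the "nominal" branch automatically enforces the directional-consistency requirement stated in the text.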
Figure 2

Manhattan plot

P-values are two-sided based on Wald statistics (35,735 cases and 222,076 controls) without multiple comparison adjustment. Loci are labeled with the closest gene to the lead variant.

Table 1

Meta-analysis results showing 35 loci novel for COPD and lung function

rsID | HGVS name | Closest gene | Locus | Risk/alt. allele | Risk allele frequency | UK Biobank OR | UK Biobank 95% CI | UK Biobank P value | ICGC OR | ICGC 95% CI | ICGC P value | Meta-analysis OR | Meta-analysis 95% CI | Meta-analysis P value | SpiroMeta FEV1 beta | SpiroMeta FEV1 P value | SpiroMeta FEV1/FVC beta | SpiroMeta FEV1/FVC P value
rs72673419 | NC_000001.10:g.60913143C>T | C1orf87 | 1p32.1 | T/C | 0.05 | 1.13 | 1.07–1.18 | 2.0E-06 | 1.19 | 1.08–1.31 | 2.9E-04 | 1.14 | 1.09–1.19 | 4.0E-09 | −0.02 | 1.3E-01 | −0.05 | 3.0E-04
rs629619 | NC_000001.10:g.111738108C>T | DENND2D | 1p13.3 | T/C | 0.20 | 1.08 | 1.05–1.11 | 7.5E-08 | 1.10 | 1.04–1.16 | 8.2E-04 | 1.08 | 1.06–1.11 | 2.9E-10 | −0.01 | 4.6E-01 | −0.02 | 9.3E-04
rs10929386 | NC_000002.11:g.15906179C>T | DDX1 | 2p24.3 | C/T | 0.49 | 1.06 | 1.03–1.08 | 1.2E-06 | 1.07 | 1.03–1.12 | 1.8E-03 | 1.06 | 1.04–1.08 | 9.1E-09 | −0.02 | 3.1E-06 | −0.02 | 7.1E-06
rs62259026 | NC_000003.11:g.57746515C>T | SLMAP | 3p14.3 | C/T | 0.75 | 1.07 | 1.04–1.09 | 1.8E-06 | 1.07 | 1.02–1.12 | 3.8E-03 | 1.07 | 1.04–1.09 | 2.4E-08 | −0.03 | 3.3E-05 | −0.01 | 6.4E-02
rs4585380 | NC_000004.11:g.75673363G>A | BTC | 4q13.3 | G/A | 0.74 | 1.07 | 1.04–1.10 | 1.2E-07 | 1.06 | 1.02–1.11 | 8.1E-03 | 1.07 | 1.05–1.09 | 3.4E-09 | −0.02 | 3.7E-04 | −0.02 | 1.3E-03
rs12519165 | NC_000005.9:g.170901586A>T | FGF18 | 5q35.1 | A/T | 0.38 | 1.08 | 1.05–1.10 | 2.7E-10 | 1.02 | 0.97–1.08 | 4.0E-01 | 1.07 | 1.05–1.09 | 1.1E-09 | −0.02 | 6.5E-03 | −0.03 | 2.4E-07
rs646695 | NC_000006.11:g.140280398T>C | CITED2 | 6q24.1 | C/T | 0.24 | 1.07 | 1.05–1.10 | 3.0E-08 | 1.09 | 1.04–1.14 | 3.4E-04 | 1.08 | 1.05–1.10 | 4.6E-11 | −0.02 | 5.6E-04 | 0.00 | 7.5E-01
rs2040732 | NC_000007.13:g.20418134C>T | ITGB8 | 7p21.1 | C/T | 0.58 | 1.07 | 1.05–1.09 | 5.2E-09 | 1.03 | 0.99–1.07 | 1.7E-01 | 1.06 | 1.04–1.08 | 6.9E-09 | −0.02 | 1.8E-03 | −0.01 | 1.0E-02
rs1570221 | NC_000010.10:g.105656874G>A | STN1 | 10q24.33 | A/G | 0.35 | 1.06 | 1.03–1.08 | 3.5E-06 | 1.07 | 1.03–1.12 | 1.4E-03 | 1.06 | 1.04–1.08 | 2.2E-08 | −0.02 | 9.5E-04 | −0.01 | 3.9E-02
rs4757118 | NC_000011.9:g.13171236C>T | ARNTL | 11p15.2 | T/C | 0.54 | 1.07 | 1.04–1.09 | 1.1E-08 | 1.04 | 1.00–1.08 | 7.3E-02 | 1.06 | 1.04–1.08 | 3.8E-09 | −0.01 | 2.8E-01 | −0.02 | 2.2E-03
rs9525927 | NC_000013.10:g.44842503G>A | SERP2 | 13q14.11 | G/A | 0.19 | 1.07 | 1.04–1.10 | 2.9E-06 | 1.10 | 1.05–1.16 | 1.4E-04 | 1.08 | 1.05–1.10 | 2.8E-09 | −0.02 | 2.2E-02 | −0.02 | 2.2E-03
rs72731149 | NC_000015.9:g.49984710G>C | DTWD1 | 15q21.2 | G/C | 0.91 | 1.12 | 1.07–1.17 | 8.9E-07 | 1.12 | 1.04–1.20 | 2.6E-03 | 1.12 | 1.08–1.16 | 8.3E-09 | −0.05 | 7.9E-07 | −0.03 | 9.0E-04
rs10152300 | NC_000015.9:g.84392907G>A | ADAMTSL3 | 15q25.2 | G/A | 0.23 | 1.09 | 1.06–1.11 | 2.4E-10 | 1.08 | 1.02–1.13 | 4.7E-03 | 1.08 | 1.06–1.11 | 4.2E-12 | −0.01 | 6.4E-02 | −0.02 | 1.5E-03
rs955277 | NC_000002.11:g.9290357C>T | ASAP2 | 2p25.1 | T/C | 0.61 | 1.08 | 1.05–1.10 | 2.7E-10 | 1.04 | 0.99–1.08 | 8.6E-02 | 1.07 | 1.05–1.09 | 1.9E-10 | 0.00 | 9.6E-01 | −0.01 | 3.3E-02
rs12466981 | NC_000002.11:g.42433247C>T | EML4 | 2p21 | C/T | 0.73 | 1.06 | 1.03–1.08 | 8.1E-06 | 1.08 | 1.03–1.13 | 1.2E-03 | 1.06 | 1.04–1.09 | 4.9E-08 | −0.01 | 4.4E-02 | −0.01 | 6.4E-02
rs2442776 | NC_000003.11:g.11640601G>A | VGLL4 | 3p25.3 | G/A | 0.15 | 1.09 | 1.06–1.13 | 5.7E-08 | 1.10 | 1.04–1.16 | 8.9E-04 | 1.09 | 1.06–1.12 | 2.0E-10 | −0.01 | 6.4E-02 | −0.01 | 1.3E-01
rs4093840 | NC_000003.11:g.123077042T>A | ADCY5 | 3q21.1 | A/T | 0.47 | 1.07 | 1.05–1.09 | 1.6E-09 | 1.04 | 1.00–1.09 | 4.8E-02 | 1.06 | 1.04–1.08 | 3.9E-10 | −0.01 | 8.5E-02 | −0.00 | 3.8E-01
rs62375246 | NC_000005.9:g.132439010T>A | HSPA4 | 5q31.1 | A/T | 0.26 | 1.08 | 1.05–1.10 | 9.6E-09 | 1.03 | 0.98–1.08 | 2.7E-01 | 1.06 | 1.04–1.09 | 2.2E-08 | −0.02 | 1.3E-02 | −0.01 | 2.2E-01
rs979453 | NC_000005.9:g.150595073A>G | CCDC69 | 5q33.1 | G/A | 0.34 | 1.07 | 1.05–1.10 | 1.6E-09 | 1.02 | 0.97–1.06 | 5.1E-01 | 1.06 | 1.04–1.08 | 1.4E-08 | −0.01 | 7.2E-02 | −0.00 | 6.4E-01
rs1334576 | NC_000006.11:g.7211818G>A | RREB1 | 6p24.3 | A/G | 0.42 | 1.07 | 1.05–1.09 | 4.2E-09 | 1.02 | 0.99–1.06 | 2.2E-01 | 1.06 | 1.04–1.08 | 1.2E-08 | 0.00 | 6.9E-01 | −0.01 | 2.9E-02
rs9350191 | NC_000006.11:g.19842661C>T | ID4 | 6p22.3 | T/C | 0.85 | 1.11 | 1.08–1.15 | 6.2E-12 | 1.14 | 1.05–1.23 | 1.9E-03 | 1.12 | 1.09–1.15 | 5.1E-14 | −0.02 | 4.9E-02 | −0.01 | 1.5E-01
rs2284174 | NC_000006.11:g.30713580T>C | IER3 | 6p21.33 | C/T | 0.22 | 1.12 | 1.10–1.15 | 3.4E-19 | 1.12 | 1.05–1.21 | 1.5E-03 | 1.12 | 1.10–1.15 | 2.1E-21 | −0.01 | 5.2E-02 | −0.02 | 3.0E-02
rs674621 | NC_000006.11:g.117257018T>C | RFX6 | 6q22.1 | C/T | 0.32 | 1.06 | 1.03–1.08 | 1.3E-06 | 1.07 | 1.03–1.12 | 1.4E-03 | 1.06 | 1.04–1.08 | 7.6E-09 | −0.01 | 3.6E-02 | −0.01 | 6.9E-02
rs9329170 | NC_000008.10:g.8697658C>G | MFHAS1 | 8p23.1 | C/G | 0.86 | 1.10 | 1.06–1.14 | 2.0E-08 | 1.09 | 1.03–1.15 | 4.9E-03 | 1.10 | 1.07–1.13 | 3.6E-10 | −0.02 | 4.6E-02 | −0.01 | 6.5E-02
rs10760580 | NC_000009.11:g.101661650G>A | COL15A1 | 9q22.33 | G/A | 0.71 | 1.08 | 1.06–1.11 | 1.6E-10 | 1.04 | 0.99–1.09 | 1.1E-01 | 1.07 | 1.05–1.10 | 1.2E-10 | −0.02 | 1.5E-02 | −0.02 | 1.4E-02
rs8044657 | NC_000016.9:g.58022625G>A | TEPP | 16q21 | G/A | 0.90 | 1.12 | 1.07–1.16 | 1.7E-07 | 1.09 | 1.01–1.17 | 1.8E-02 | 1.11 | 1.07–1.15 | 1.1E-08 | −0.02 | 9.9E-02 | −0.02 | 2.8E-02
rs62065216 | NC_000017.10:g.38218773G>A | THRA | 17q21.1 | A/G | 0.42 | 1.06 | 1.04–1.09 | 1.2E-07 | 1.06 | 1.01–1.10 | 2.2E-02 | 1.06 | 1.04–1.08 | 8.1E-09 | −0.01 | 7.8E-02 | −0.00 | 8.0E-01
rs4660861 | NC_000001.10:g.45946636G>T | TESK2 | 1p34.1 | G/T | 0.57 | 1.05 | 1.03–1.08 | 7.2E-06 | 1.07 | 1.03–1.12 | 1.2E-03 | 1.06 | 1.04–1.08 | 4.4E-08 | −0.01 | 3.0E-01 | −0.00 | 4.1E-01
rs7650602 | NC_000003.11:g.141147414T>C | ZBTB38 | 3q23 | C/T | 0.45 | 1.07 | 1.04–1.09 | 6.7E-09 | 1.01 | 0.97–1.06 | 5.1E-01 | 1.06 | 1.04–1.08 | 4.9E-08 | 0.00 | 8.4E-01 | −0.01 | 3.2E-01
rs34651 | NC_000005.9:g.72144005C>T | TNPO1 | 5q13.2 | C/T | 0.08 | 1.10 | 1.06–1.15 | 7.5E-07 | 1.12 | 1.02–1.22 | 1.2E-02 | 1.11 | 1.07–1.15 | 3.0E-08 | −0.00 | 7.6E-01 | 0.00 | 7.5E-01
rs798565 | NC_000007.13:g.2752152G>A | AMZ1 | 7p22.3 | G/A | 0.71 | 1.09 | 1.06–1.11 | 2.8E-11 | 1.00 | 0.95–1.05 | 8.8E-01 | 1.07 | 1.04–1.09 | 3.9E-09 | −0.01 | 1.2E-01 | −0.00 | 6.1E-01
rs7866939 | NC_000009.11:g.85126163T>C | RASEF | 9q21.32 | C/T | 0.33 | 1.06 | 1.03–1.08 | 1.3E-06 | 1.07 | 1.02–1.12 | 3.8E-03 | 1.06 | 1.04–1.08 | 1.7E-08 | −0.01 | 2.5E-01 | −0.01 | 1.1E-01
rs7958945 | NC_000012.11:g.115947901A>G | MED13L | 12q24.21 | G/A | 0.36 | 1.06 | 1.04–1.09 | 1.0E-07 | 1.07 | 1.02–1.11 | 2.8E-03 | 1.06 | 1.04–1.09 | 1.0E-09 | −0.01 | 1.5E-01 | −0.00 | 6.2E-01
rs72626215 | NC_000019.9:g.46294136G>A | DMWD | 19q13.32 | G/A | 0.73 | 1.06 | 1.04–1.09 | 1.7E-06 | 1.11 | 1.05–1.17 | 7.3E-05 | 1.07 | 1.05–1.10 | 1.7E-09 | 0.00 | 4.8E-01 | −0.01 | 1.2E-01
rs73158393 | NC_000022.10:g.33335386C>G | SYN3 | 22q12.3 | C/G | 0.74 | 1.06 | 1.04–1.09 | 1.1E-06 | 1.09 | 1.03–1.14 | 1.4E-03 | 1.07 | 1.05–1.09 | 7.7E-09 | −0.00 | 7.7E-01 | −0.01 | 3.3E-01

CI, confidence interval; OR, odds ratio; ICGC, International COPD Genetics Consortium; HGVS, Human Genome Variation Society. This table shows association statistics in UK Biobank (21,081 cases and 179,711 controls), ICGC cohorts (14,654 cases and 42,365 controls), the overall meta-analysis (35,735 cases and 222,076 controls), and SpiroMeta studies (FEV1 and FEV1/FVC; n = 79,055). P values are two-sided based on Wald statistics (COPD) and t statistics (FEV1 and FEV1/FVC) without multiple comparison adjustment. Rows are grouped by replication in SpiroMeta: the first 13 loci (C1orf87 to ADAMTSL3) were significant using Bonferroni correction for novel loci (one-sided P < 0.05/35); the next 14 (ASAP2 to THRA) were nominally significant (one-sided P < 0.05); and the last 8 (TESK2 to SYN3) were not significant (directionally consistent only).

Identification of secondary association signals

We used approximate conditional and joint analysis[22] to find secondary signals at each of the 82 genome-wide significant loci. We found 82 secondary signals at 50 loci, giving a total of 164 independent associations across the 82 loci (Supplementary Table 4). Of the 50 loci containing secondary associations, 33 were at loci previously described for COPD or lung function, and six were at Bonferroni-replicated novel loci. Of the 82 secondary associations, 20 reached genome-wide significance (P < 5 × 10−8) (Supplementary Table 4). Of 61 novel (not previously described in COPD or lung function) independent associations, 21 reached a region-wise Bonferroni-corrected threshold in unconditioned associations from SpiroMeta (one-sided P < 0.05 divided by the number of novel independent associations at the locus; Methods and Supplementary Table 4).
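The region-wise threshold divides 0.05 by the number of novel independent associations at each locus, so loci with more novel signals face a stricter cutoff. A minimal Python sketch of that rule; the locus labels, signal identifiers, and one-sided P values in the example are hypothetical.

```python
from collections import defaultdict

def regionwise_bonferroni(signals, alpha=0.05):
    """Return IDs of novel signals passing a locus-specific Bonferroni threshold.

    signals: list of (locus, signal_id, one_sided_p) tuples, novel signals only.
    The threshold at each locus is alpha divided by the number of novel
    independent signals at that locus.
    """
    per_locus = defaultdict(int)
    for locus, _, _ in signals:
        per_locus[locus] += 1
    passed = []
    for locus, signal_id, p in signals:
        if p < alpha / per_locus[locus]:
            passed.append(signal_id)
    return passed

# Hypothetical example: locus A has two novel signals (threshold 0.025),
# locus B has one (threshold 0.05).
example = [
    ("A", "A1", 0.010),
    ("A", "A2", 0.030),
    ("B", "B1", 0.040),
]
print(regionwise_bonferroni(example))  # ['A1', 'B1']
```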

Tissue and specific cell types

In determining the tissue in which COPD genetic variants act to increase COPD risk, lung is the obvious tissue to consider. However, COPD is a systemic disease[23,24], and within the lung the cell types that collectively contribute to disease pathogenesis are largely unknown. Furthermore, available databases include cell types relevant to lung (e.g., smooth muscle) that were sampled from other organs (e.g., the gastrointestinal tract). To identify putative causal tissues and cell types, we assessed heritability enrichment in integrated genome annotations at the single-tissue level[25] and in tissue-specific epigenomic marks[26]. As previously described, lung tissue showed the largest enrichment (enrichment = 9.25, P = 1.36 × 10−9), though significant enrichment was also seen in heart (enrichment = 6.85, P = 3.83 × 10−8) and the gastrointestinal (GI) tract (enrichment = 5.53, P = 6.45 × 10−11). In an analysis of enriched epigenomic marks, the most significant enrichment was in fetal lung and GI smooth muscle DNase hypersensitivity sites (DHS; P = 6.75 × 10−8) and H3K4me1 marks (P = 7.31 × 10−7) (Supplementary Table 5). To identify the source of association within lung tissue, we tested for heritability enrichment using single-cell chromatin accessibility (ATAC-Seq)[27] and gene expression (RNA-Seq) data from human[28,29] and murine[30] lung (Supplementary Table 5). Using LD score regression on murine ATAC-Seq data, we found enrichment of chromatin accessibility in several cell types, including endothelial cells (most significant) and type 1 and type 2 alveolar cells (the latter among the highest fold enrichments; Supplementary Table 5a). Results using LD score regression[31] or SNPsea[32] on single-cell RNA-Seq data varied, with nominally significant P values for genes expressed in type 2 alveolar cells, basal-like cells, club cells, fibroblasts, and smooth muscle cells (Supplementary Tables 5b,c).
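In stratified LD score regression, the enrichment of an annotation is the share of SNP heritability it captures divided by the share of SNPs it contains; an enrichment of 9.25 therefore means the annotation carries roughly nine times the heritability expected from its size. The sketch below assumes the reported enrichments follow this definition; the input proportions are hypothetical.

```python
def heritability_enrichment(prop_h2, prop_snps):
    """Enrichment = (proportion of SNP heritability in an annotation) /
    (proportion of SNPs in that annotation)."""
    if not 0 < prop_snps <= 1:
        raise ValueError("prop_snps must lie in (0, 1]")
    return prop_h2 / prop_snps

# Hypothetical annotation: 2% of SNPs carrying 18.5% of SNP heritability
print(round(heritability_enrichment(0.185, 0.02), 2))
```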

Fine-mapping of associated loci

To identify the most likely causal variants at each locus, we performed fine mapping using Bayesian credible sets[33]. Of 160 potential primary and secondary association signals (excluding four variants in the major histocompatibility complex [MHC] region), 61 independent signals had a 99% credible set with fewer than 50 variants, and 34 signals had credible sets with fewer than 20 variants (Supplementary Figure 3). Eighteen loci had a single variant with a posterior probability of driving association (PPA) greater than 60%, including the NPNT (4q24) locus, where the association could be fine-mapped to a single intronic variant, rs34712979 (NC_000004.11:g.106819053G>A; see Supplementary Note and Supplementary Table 6). Most sets included variants that overlapped genic enhancers in lung-related cell types (e.g., fetal lung fibroblasts, fetal lung, and adult lung fibroblasts) and were predicted to alter transcription factor binding motifs (Supplementary Table 6). Of the 61 credible sets with fewer than 50 variants, eight contained at least one deleterious variant. These deleterious variants included (1) missense variants affecting TNS1, RIN3, ADGRG6, ADAM19, ATP13A2, BTC, and CRLF3; and (2) a splice donor variant affecting the lincRNA AP003059.2.
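Bayesian credible sets are commonly built by converting each variant's association statistics into an approximate Bayes factor, normalizing these into posterior probabilities under the assumption of a single causal variant per signal, and adding variants in decreasing order of posterior probability until the target coverage (here 99%) is reached. The sketch below uses Wakefield's approximate Bayes factor with a prior effect variance W = 0.04; the choice of Bayes factor, the prior, and the example effect sizes are assumptions, not details taken from this study.

```python
import math

def log_abf(beta, se, w=0.04):
    """Natural-log Wakefield approximate Bayes factor (association vs. null).

    w is the assumed prior variance of the true log odds ratio.
    """
    v = se ** 2
    r = w / (v + w)
    z = beta / se
    return 0.5 * math.log(1 - r) + 0.5 * r * z * z

def credible_set(betas, ses, coverage=0.99, w=0.04):
    """Return (indices in the credible set, posterior probabilities).

    Assumes exactly one causal variant in the region and equal prior
    probability for every variant, so posteriors are normalized Bayes factors.
    """
    labf = [log_abf(b, s, w) for b, s in zip(betas, ses)]
    m = max(labf)  # subtract the maximum for numerical stability
    weights = [math.exp(x - m) for x in labf]
    total = sum(weights)
    ppa = [x / total for x in weights]
    order = sorted(range(len(ppa)), key=lambda i: ppa[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += ppa[i]
        if cumulative >= coverage:
            break
    return chosen, ppa

# Hypothetical summary statistics for three variants in one region
print(credible_set([0.06, 0.058, 0.02], [0.01, 0.01, 0.01])[0])  # [0, 1]
```

Here the first variant alone has a posterior probability of roughly 0.76, so the second must be added before the cumulative probability reaches 99%.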

Candidate target genes

In most cases, the closest gene to a lead SNP will not be the gene most likely to be the causal or effector gene of disease-associated variants[34-36]. Thus, to identify the potential effector (‘target’) genes underlying these genetic associations, we integrated additional molecular information including gene expression, gene regulation (open chromatin and methylation data), chromatin interaction, co-regulation of gene expression with gene sets, and coding variant data (Methods and Figure 3).
Figure 3

Identification of target genes

(a) Overview of datasets used to identify target genes at genome-wide significant loci. (b) Regional association plots at the ADAMTSL3 locus showing GWAS (top), chromatin interaction in lung tissue (middle), and expression quantitative trait loci (bottom). GWAS P-values are two-sided based on Wald statistics (35,735 cases and 222,076 controls). Expression quantitative trait loci (eQTL) P-values are two-sided based on t statistics (1,038 samples). P-values for Hi-C data were calculated using a binomial distribution from a spline-fitted, outlier-filtered distribution of contacts. None of the P-values were adjusted for multiple comparisons. GREx, gene-based association using gene expression; mQTL, colocalization with methylation quantitative trait loci; Cod., significant single variant or gene-based association tests for deleterious coding variants from exome data; Hi-C, significant chromatin interaction identified in human lung or the IMR90 cell line; DHS, overlap with DNase hypersensitivity sites; GSet, prioritized genes from DEPICT.

At the 82 loci, 472 genes within ±1 Mb of the top associated variants were implicated by analysis of at least one dataset; 106 genes were implicated by lung gene expression[37,38], and an additional 50 genes by two or more other datasets (methylation[39], chromatin interaction[40], open chromatin regions[41], similarity in gene sets[42], or deleterious coding variants[43]; Figure 3), for a total of 156 genes meeting these more stringent criteria. Excluding loci in the MHC region, the median number of potentially implicated genes per locus was four, with a maximum of 17 genes (7q22.1 and 17q21.1). The median distance of implicated genes to the top associated variants was 346 kb. Of the 82 loci, 60 (73%) included the nearest gene among the implicated genes. We identified 20 genes with supportive evidence from exome sequencing data. Two genes (ADAM19 and ADAMTSL3) were implicated by five datasets (Figure 3), and another two (EML4 and RIN3) were implicated by four datasets. A summary of all genes implicated using these approaches is included in Supplementary Table 7.
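The stricter gene-selection rule described above (implicated by lung gene expression, or by at least two of the other datasets) can be written as a simple filter. A minimal Python sketch; the dataset labels and gene names are hypothetical placeholders, not the study's annotations.

```python
def stringent_genes(evidence):
    """Select genes meeting the stricter criterion.

    evidence: dict mapping gene -> set of dataset labels that implicate it.
    A gene passes if it is implicated by lung gene expression, or by at
    least two of the other datasets.
    """
    selected = []
    for gene, sources in evidence.items():
        others = sources - {"lung_expression"}
        if "lung_expression" in sources or len(others) >= 2:
            selected.append(gene)
    return sorted(selected)

# Hypothetical evidence labels; not the study's actual annotations
evidence = {
    "GENE_A": {"lung_expression"},
    "GENE_B": {"methylation", "hi_c"},
    "GENE_C": {"open_chromatin"},
    "GENE_D": {"lung_expression", "coding"},
}
print(stringent_genes(evidence))  # ['GENE_A', 'GENE_B', 'GENE_D']
```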

Associated pathways

To gain further functional insight into the associated genetic loci, we performed gene-set enrichment analysis using DEPICT[42]. Of 165 gene sets enriched at a false discovery rate (FDR) < 5%, 44% were related to the developmental process term, with a nominal P for lung development of 1.02 × 10−6; significant sub-terms included lung alveolus development (P = 0.0003) and lung morphogenesis (P = 0.0005). We also found enrichment of extracellular matrix-related pathways, including laminin binding, integrin binding, mesenchyme development, cell-matrix adhesion, and actin filament bundles. Additional pathways of note included histone deacetylase binding, the Wnt receptor signaling pathway, SMAD binding, the MAPK cascade, and the transmembrane receptor protein serine/threonine kinase signaling pathway. Full enrichment analysis results, including the top genes for each DEPICT gene set, are shown in Supplementary Table 8.
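DEPICT estimates the FDR with its own procedure; the sketch below substitutes the standard Benjamini–Hochberg step-up rule only to illustrate what thresholding gene sets at FDR < 5% does. The P values are hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return indices of hypotheses rejected at false discovery rate q (BH step-up)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Step-up: keep the largest rank whose P value is under its threshold.
        if pvalues[i] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])

# Hypothetical gene-set P values
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(pvals))  # [0, 1]
```

Because the rule steps up from the largest rank that passes, a P value can be rejected even when it misses its own threshold; for example, `benjamini_hochberg([0.04, 0.04])` rejects both hypotheses.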

Identification of drug targets

GWAS can also help identify drug targets, either at the level of individual genes[18,44,45] or genome-wide[46,47]. Of 482 candidate target genes, 60 could be targeted by at least one approved or in-development drug[48], totaling 428 drugs with 144 different modes of action (Supplementary Table 9). Druggable targets at novel loci for COPD and lung function included ABHD6, CDKL2, GSTO2, KCNC4, PDHB, SLK, and TRPM7. We also identified drugs for repurposing in COPD using transcriptome-wide associations and drug-induced gene expression signatures[49] (Supplementary Note).

Phenotypic effects of COPD-associated variants

To characterize the phenotypic effects of the 82 genome-wide significant loci, we performed a phenome-wide association analysis within the deeply phenotyped COPDGene study (Methods). We assessed common patterns of phenotype associations for the 82 loci using hierarchical clustering of scaled Z scores for the variant-phenotype associations. We identified two clusters of variants differentially associated with two sets of phenotypes (Supplementary Figure 4). Because these two variant-phenotype clusters appeared to be driven by computed tomography (CT) imaging features, we repeated the variant clustering using only quantitative CT imaging features. We again found two clusters of variants, differentiated by association with quantitative emphysema, emphysema distribution, gas trapping, and airway phenotypes (Figure 4a). Additionally, we evaluated the association of the 82 genome-wide significant variants in a prior GWAS of quantitative CT features of emphysema and airways[50] (Supplementary Table 10).
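The clustering step can be sketched in plain Python: standardize each phenotype column of a variant-by-phenotype Z-score matrix, then merge variants agglomeratively. The linkage method (average), the distance (Euclidean), column-wise scaling, and the toy Z scores are all assumptions for illustration; the text does not specify them. In practice a library routine such as `scipy.cluster.hierarchy.linkage` would normally be used.

```python
import math
from statistics import mean, pstdev

def scale_columns(matrix):
    """Center and scale each phenotype column (variants are rows)."""
    scaled_cols = []
    for col in zip(*matrix):
        mu, sd = mean(col), pstdev(col)
        scaled_cols.append([(x - mu) / sd if sd > 0 else 0.0 for x in col])
    return [list(row) for row in zip(*scaled_cols)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(rows, n_clusters):
    """Agglomerative clustering with average linkage, stopping at n_clusters."""
    clusters = [[i] for i in range(len(rows))]

    def cluster_distance(c1, c2):
        return mean([euclidean(rows[i], rows[j]) for i in c1 for j in c2])

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)

# Hypothetical Z scores: columns are emphysema, gas trapping, airway wall thickness
z_scores = [
    [4.0, 3.5, 0.5],  # variant 0
    [3.8, 3.0, 0.2],  # variant 1
    [0.3, 0.8, 3.9],  # variant 2
    [0.5, 0.4, 4.2],  # variant 3
]
print(average_linkage(scale_columns(z_scores), n_clusters=2))  # [[0, 1], [2, 3]]
```

In this toy example, the two variants associated mainly with emphysema-type features fall into one cluster and the two associated mainly with airway wall thickness fall into the other.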
Figure 4

Effects on COPD-related and other phenotypes

(a) Heatmap of scaled computed tomography (CT) quantitative imaging associations with the 34 genome-wide significant variants (known and replicated novel associations) with at least nominal (P < 0.05) association with any CT imaging feature in COPDGene non-Hispanic white participants. Cluster 1 variants are more associated with airway imaging features, and Cluster 2 variants are more associated with emphysema imaging features. Variants are referred to by the closest gene. (b) Overlap of genome-wide significant loci of COPD and selected traits from the GWAS Catalog. (c) Genome-wide overlap between COPD and pulmonary fibrosis (left) and asthma (right). PRM emphysema, emphysema quantified by parametric response mapping; UL, upper lobe of the lung; LL, lower lobe of the lung; Pi10, airway wall thickness calculated from regressing the square root of the airway wall area with the airway internal perimeter; CAD, coronary artery disease; BMD, bone mineral density.

We also examined all genome-wide significant loci in the NHGRI-EBI GWAS Catalog[51] (Supplementary Figure 5, Supplementary Table 11), looking for trait-associated variants in linkage disequilibrium (r² > 0.2) with our lead COPD-associated variants. Many variants were associated with anthropometric measures including height and body mass index (BMI), blood cell measurements (red and white cells), and cancers. COPD is well known to have many common comorbidities, such as coronary artery disease (CAD), type 2 diabetes mellitus (T2D), osteoporosis, and lung cancer. Across these diseases and 13 additional traits, we confirmed the previously reported genome-wide genetic correlation (estimated using linkage disequilibrium score regression[52]) of COPD with lung function, asthma, and height, and found evidence of modest correlation between COPD and lung cancer (Supplementary Note). However, at individual loci, using a more stringent linkage disequilibrium threshold (r² > 0.6), we found evidence of shared genetic risk for these comorbid diseases and COPD, including a genome-wide significant variant near PABPC4 associated with T2D, four variants associated with CAD (near CFDP1, DMWD, STN1, and TNS1), and a variant near SPPL2C associated with bone mineral density (Figure 4b).
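The linkage disequilibrium thresholds above refer to r², the squared correlation between alleles at two variants. For biallelic variants it can be computed from haplotype and allele frequencies as r² = D² / [pA(1 − pA)pB(1 − pB)], where D = pAB − pA·pB. A minimal Python sketch, assuming phased haplotype frequencies from a reference panel are available; the frequencies below are hypothetical.

```python
def ld_r2(p_ab, p_a, p_b):
    """Squared correlation r^2 between two biallelic variants.

    p_ab: frequency of haplotypes carrying allele A at variant 1 and allele B at variant 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom

def overlap_label(r2):
    """Apply the two r^2 thresholds used in the text."""
    if r2 > 0.6:
        return "stringent overlap"
    if r2 > 0.2:
        return "overlap"
    return "no overlap"

# Hypothetical haplotype frequencies for one COPD variant and one catalog variant
for p_ab in (0.20, 0.25, 0.30):
    r2 = ld_r2(p_ab, p_a=0.4, p_b=0.3)
    print(round(r2, 3), overlap_label(r2))
```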

Overlapping loci with asthma and pulmonary fibrosis

Based on our previous identification of genetic overlap of COPD with asthma and with pulmonary fibrosis, we examined loci for specific overlap with these two diseases. For asthma, previously reported variants at ID2, ZBTB38, C5orf56, MICA, AGER, HLA-DQB1, ITGB8, CLEC16A, and THRA were in linkage disequilibrium (r² > 0.2) with one of our variants. For pulmonary fibrosis, in addition to our previously described overlap at FAM13A, DSP, and 17q21, we noted overlapping associations at ZKSCAN1 and STN1 (Supplementary Table 12). To examine overlap more closely, we applied a Bayesian method (gwas-pw[53]) to COPD associations from our current GWAS and to previous GWASs of asthma (limited to those of European ancestry) and pulmonary fibrosis[54,55]. To mitigate the effect of including asthmatics among our COPD cases, we removed self-reported asthmatics from UK Biobank for the asthma overlap analysis (Methods). We identified 14 shared genome segments (posterior probability > 70%), 9 with asthma and 5 with pulmonary fibrosis (Figure 4c, Supplementary Table 13). In addition to the three segments shared with pulmonary fibrosis identified in the previous study[5] (FAM13A, DSP, and the 17q21 locus, here nearest CRHR1), we identified two new segments, at loci near ZKSCAN1 and STN1 (formerly known as OBFC1). Shared variants between COPD and pulmonary fibrosis all had opposite effects (i.e., increasing risk for COPD but protective for pulmonary fibrosis). For asthma, we identified five shared segments in the 6p21–22 region, as well as segments at ADAM19, ARMC2, ELAVL2, and STAT6. With the exception of STAT6, overlapping variants showed the same direction of effect.

Discussion

Genetic factors play an important role in COPD susceptibility. We examined genetic risk of COPD in a genome-wide association study of 35,735 cases and 222,076 controls. We identified 82 genome-wide significant loci for COPD, of which 47 had previously been identified in genome-wide association studies of COPD or population-based lung function. Of the 35 loci not previously described at the time of analysis, 13 replicated in an independent study of population-based lung function. We used several data sources to attempt to assign causal genes at each locus, identifying 156 genes at 82 loci that were supported by either gene expression or a combination of at least two other data sources. Our results identify specific genes, cell types, and biologic pathways for targeted study and also suggest a genetic basis for the clinical heterogeneity seen in COPD. Our study supports a role for early-life events in the risk of COPD. Gene set enrichment analysis identified developmental pathways both specific to the lung (e.g., lung morphogenesis and lung alveolus development) and related to the lung (e.g., the canonical Wnt receptor[56,57], MAPK/ERK, and nerve growth factor receptor signaling pathways). We also confirmed enrichment of heritability in epigenomic marks of fetal lung. Our findings are consistent with epidemiologic studies demonstrating that a substantial portion of the risk of COPD may develop in early life: genetic variants may set initial lung function[58] and patterns of growth[58-60]. While further work will be needed to confirm the causal variants and the genes they affect, testing the role of these genes in murine or ex vivo models relevant to lung development (for example, determining whether perturbing these genes changes the proliferation and differentiation of lung epithelial progenitors in induced pluripotent stem cell-derived lung alveolar type 2 cells[44]) could provide experimental evidence of their role in early-life susceptibility.
Ultimately, the goal of this work is to identify subsets of high-risk individuals, or targets for intervention, early in the disease course, as well as molecular candidates that may affect lung repair and regeneration[61]. Apart from genes related to lung development, our analyses highlighted several genes and pathways already of interest in COPD therapy (e.g., CHRM3 and acetylcholine receptor inhibitors, and the MAPK pathway), supporting the role of genetic analyses in finding therapeutic targets[18,62], as well as newer genes that could inform future functional studies. We identified interleukin 17 receptor D (IL17RD) as a potential effector gene at the 3p14 locus. Numerous studies have examined the role of IL-17A in COPD[63], and IL17RD can differentially regulate pathways employed by IL-17A[64]. Chitinase acidic (CHIA) at 1p13.3, which encodes a protein that degrades chitin[65], exhibits lung-specific expression[66,67]. CHIA variants have been associated with FEV1[68], asthma[69-72], and acidic mammalian chitinase activity[71,73]. We identified several potential effector genes related to extracellular matrix, cell adhesion, cell-cell interactions, and elastin-associated microfibrils[74-76], some of which have been previously identified in studies of lung function[15]. These include integrin family members that mediate cell-matrix communication (e.g., ITGA1, ITGA2, and ITGA8[77-79]), a gene encoding an integrin ligand (NPNT[80]), and genes encoding matrix proteins (e.g., MFAP2 and ADAMTSL3). ADAMTSL3 plays a role in cell-matrix interactions related to fibrillin assembly and microfibril biogenesis[81-83] and, among our candidate effector genes, was supported by the greatest number of bioinformatic analyses (five datasets, tied with ADAM19). Recombinant forms of other ADAMTS-like proteins have been shown experimentally to promote and enhance fibrillin and microfibril deposition and assembly[84,85]. ADAMTSL3 may play a role in preventing ADAMTS-mediated emphysematous destruction of lung tissue in COPD.
In addition to identifying the effector gene, knowing the effector cell type is critical for functional studies. We identified an overall enrichment of epigenomic marks in lung tissue and smooth muscle (also identified in studies of lung function[16]). The smooth muscle association was found in gastrointestinal cell types, because respiratory smooth muscle is absent from the analyzed datasets. We also analyzed single-cell data to identify the specific lung cell types in which our top variants may function. We found evidence for enrichment of several cell types, including but not limited to endothelial cells, alveolar type 2 cells, and basal-like cells. Each of these cell types has been postulated to have a role in the development of COPD[86-88], and our data are consistent with heterogeneity in the lung cell types contributing to COPD susceptibility. The lung comprises at least 40 different resident cell types[89], most of which are not distinctly represented in these datasets. Thus, while our findings support the investigation of specific cell types in further functional studies, they also highlight the need for profiling of lung-relevant cell types and for locus-specific analyses. Characterization of functional variant effects could lead to better disease subtyping and more targeted therapy for COPD. Cluster analysis of hundreds of COPD-associated features in the extensively phenotyped COPDGene cohort showed heterogeneous effects of genetic variants on COPD-related phenotypes, including computed tomography (CT) measurements of airway abnormalities and emphysema, which are well-described sources of heterogeneity in COPD[90-92]. Across hundreds of diseases and traits in the GWAS Catalog, we identified overlapping associations in multiple organ systems, including comorbidities such as coronary artery disease and type 2 diabetes mellitus (T2D), as well as bone mineral density.
The COPD-associated PABPC4 locus was also associated with T2D[93] and C-reactive protein (CRP) level[94]. Although the causal gene at this locus and its contribution to COPD are unknown, its association with T2D may suggest a shared disease pathway and shared drug targets. Together, the identification of variable associations of COPD risk loci with sub-phenotypes and other diseases[95,96] may enable more nuanced approaches to COPD therapy. Overall, our phenotype, gene, and pathway analyses illustrate the utility both of searching for overall enrichment of genetic signals and of identifying in detail the effects of individual variants or groups of variants. We performed additional analyses specific to two diseases that overlap with COPD: asthma and pulmonary fibrosis. While a genome-wide genetic correlation between COPD and asthma has been described previously[5], our analysis is the first to identify specific genomic segments shared between asthma and COPD. Although the effects at most of these shared segments were concordant in direction, one segment of particular interest, near STAT6, showed opposite directions of effect in the two diseases. STAT6 plays a role in T helper type 2 (Th2)-dependent inflammation and is activated by interleukin-4 and interleukin-13 (IL-4 and IL-13)[97]. IL-13, in turn, has been found to be increased in asthmatic airways[98] but decreased in severe emphysema[99]. In pulmonary fibrosis, variants at all overlapping loci have the opposite direction of effect to that in COPD[5]. These effects raise the possibility that a specific therapy for one disease could increase the risk of the other, which may be worth evaluating in treatment trials. The reasons why genetic effects diverge between COPD and fibrosis are unclear, but these opposite effects could point to molecular switches that influence why some smokers develop emphysema while others develop pulmonary fibrosis. 
While pulmonary fibrosis is an uncommon disease and was specifically excluded in several of our COPD case-control cohorts, interstitial lung abnormalities are increasingly recognized as a potential precursor to fibrosis, and an inverse relationship between these abnormalities and emphysema has been identified previously[100]. Mechanistically, some have hypothesized that divergent derangement of the Wnt and Notch signaling pathways[101] and of mesenchymal cell fate[102] may be responsible for the distinct development of these two diseases. We also describe an overlapping region at the STN1 (previously known as OBFC1) locus. STN1 plays a role in telomere maintenance[103]; shortened telomeres have been observed in both COPD and idiopathic pulmonary fibrosis (IPF)[104,105], and rare genetic variants in the telomerase pathway have been implicated in both pulmonary fibrosis and emphysema, albeit with concordant effects on both diseases[106]. Although our study is a large genome-wide association study of COPD, individuals meeting our criteria for COPD in the UK Biobank may differ from those in other studies, especially in smoking history. We used the same definition of COPD as in our prior analysis[5], which included non-smokers. Our use of pre-bronchodilator spirometry to define COPD (allowing us to maximize sample size), as well as population-based lung function for replication, could bias our findings against variants that are associated only with more severe forms of COPD. We did not exclude other causes of airway obstruction such as asthma, noting that asthma frequently overlaps with, and is misdiagnosed in, COPD[107]. We performed several additional analyses to determine whether our results were driven by, or differed markedly according to, smoking status, asthma, or the use of pre- rather than post-bronchodilator spirometry to define COPD. 
The results of these additional analyses did not indicate a substantial impact of these factors on our overall findings and, together with prior analyses[5,16], suggest that bias due to these factors is likely small. However, our study was not designed to identify differences between subgroups, and we cannot rule out a role for studying more severe disease or disease subtypes. We note that the alpha-1 antitrypsin locus (SERPINA1) was genome-wide significant in smaller studies of emphysema and of smokers with severe COPD[108]. In the current study, the PiZ allele (NC_000014.8:g.94844947C>T, rs28929470) had P = 2.2 × 10−5 using moderate-to-severe cases (FEV1 < 80% predicted) and a smaller P-value (1.4 × 10−6) using severe cases (FEV1 < 50% predicted), despite the smaller sample size, a phenomenon we have described previously[11]. Thus, despite the strong overlap of COPD with quantitative spirometry, new loci may be identified through studies of sufficiently large subsets of COPD patients with more specific and homogeneous COPD phenotypes. Given suggestive evidence of replication, using a related (but not identical) phenotype, for additional novel loci beyond the 13 meeting a Bonferroni-corrected significance threshold, we chose to include all loci significant in discovery in subsequent analyses, recognizing that we likely included some false-positive associations. Our study focused on relatively common variants, predominantly in individuals of European ancestry; more detailed studies of rare variants, the human leukocyte antigen (HLA) region, and other ethnicities are warranted, but broader multi-ethnic analyses are limited by the number of cases in currently available cohorts. Although sex differences in COPD have been reported[109], we did not identify significant sex-specific differences in the effect sizes of the 82 top variants. Future studies with more subjects, together with methodological advances, may be needed to elucidate this effect. 
The global burden of COPD is increasing. Our work finds a substantial number of new loci for COPD and uses multiple lines of supportive evidence to identify potential genes and pathways for both existing and novel loci. Further investigation of the genetic overlap of COPD with other respiratory diseases and the phenotypic effects of top loci finds new shared loci for asthma and idiopathic pulmonary fibrosis and suggests heterogeneity across COPD-associated loci. Together, these insights provide multiple new avenues for investigation of the underlying biology and the potential therapeutics in this deadly disease.

Methods

Study populations

The UK Biobank is a population-based cohort of 502,682 individuals[8]. To assess lung function, we used measures of forced expiratory volume in 1 second (FEV1) and forced vital capacity (FVC) derived from the spirometry blow volume-time series data, subjected to additional quality control based on ATS/ERS criteria[110] (Supplementary Note). As in our previous study[5], we defined COPD using pre-bronchodilator spirometry according to modified Global Initiative for Chronic Obstructive Lung Disease (GOLD) criteria for moderate to very severe airflow limitation[9]: FEV1 less than 80% of the predicted value (using reference equations from Hankinson et al.[111]) and a ratio of FEV1 to FVC less than 0.7. Consistent with our previous analyses and the enrollment criteria of COPD case-control datasets[112], we did not exclude individuals based on self-reported asthma. Genotyping was performed using the Axiom UK BiLEVE array and the Axiom Biobank array (Affymetrix, Santa Clara, California, USA), and genotypes were imputed to the Haplotype Reference Consortium (HRC) version 1.1 panel[113]. We invited participants in the prior International COPD Genetics Consortium (ICGC) COPD genome-wide association study to provide case-control association results (except the 1958 British Birth Cohort, to avoid sample overlap with the replication sample). ICGC cohorts performed case-control association analyses based on pre-bronchodilator measurements of FEV1 and FEV1/FVC, and cases were identified using the modified GOLD criteria described above. Studies were imputed to 1000 Genomes reference panels. Detailed cohort descriptions and cohort-specific methods have been published previously[5] (Supplementary Note). All studies complied with all relevant ethical regulations. Ethical/regulatory boards approved the study protocol for each study (Supplementary Note). We obtained informed consent from all participating individuals. 
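The case definition above can be written as a simple rule. This is a minimal sketch, assuming predicted FEV1 has already been computed (e.g., from the Hankinson reference equations, which are not reproduced here); the `Spirometry` type and `is_copd_case` function are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Spirometry:
    fev1: float            # pre-bronchodilator FEV1 (litres)
    fvc: float             # pre-bronchodilator FVC (litres)
    fev1_predicted: float  # predicted FEV1 (litres), assumed precomputed from reference equations


def is_copd_case(s: Spirometry) -> bool:
    """Modified GOLD 2-4 rule: FEV1 < 80% predicted AND FEV1/FVC < 0.7."""
    if s.fvc <= 0 or s.fev1_predicted <= 0:
        raise ValueError("FVC and predicted FEV1 must be positive")
    pct_predicted = 100.0 * s.fev1 / s.fev1_predicted
    ratio = s.fev1 / s.fvc
    return pct_predicted < 80.0 and ratio < 0.7


print(is_copd_case(Spirometry(fev1=1.8, fvc=3.2, fev1_predicted=3.0)))
```

Note that obstruction alone (FEV1/FVC < 0.7 with FEV1 at or above 80% predicted, i.e., GOLD 1) does not meet this moderate-to-very-severe definition.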
Based on the strong genetic overlap of lung function and COPD[5], we looked up selected significant variants for FEV1 and FEV1/FVC in the SpiroMeta consortium meta-analysis[21]. Briefly, SpiroMeta comprised a total of 79,055 individuals from 22 studies imputed to either the 1000 Genomes Project Phase 1 reference panel (13 studies) or the HRC (9 studies). Each study performed linear regression adjusting for age, age², sex, and height, using rank-based inverse normal transforms; adjusted for population substructure using principal components or linear mixed models; and either analyzed ever- and never-smokers separately or, for studies of related subjects, included smoking as a covariate. Genomic control was applied to individual studies, and results were combined using a fixed-effects meta-analysis[21].
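The per-study processing and combination steps named above can be sketched in plain Python. This is an illustration under stated assumptions, not the SpiroMeta pipeline. The Blom offset (c = 3/8) for the inverse normal transform is our choice. Genomic control is shown as the usual median chi-square inflation factor, and the fixed-effects step is shown as inverse-variance weighting, one common fixed-effects scheme. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, median
from typing import List, Sequence, Tuple

_STD_NORMAL = NormalDist()


def rank_inverse_normal(values: Sequence[float], c: float = 3 / 8) -> List[float]:
    """Rank-based inverse normal transform with a Blom-type offset c; ties get average ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [_STD_NORMAL.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


def genomic_control(betas: Sequence[float], ses: Sequence[float]) -> Tuple[float, List[float]]:
    """Estimate lambda_GC from the median chi-square and inflate SEs by sqrt(lambda)."""
    chi2 = [(b / s) ** 2 for b, s in zip(betas, ses)]
    null_median = _STD_NORMAL.inv_cdf(0.75) ** 2  # median of chi-square with 1 df, about 0.455
    lam = max(1.0, median(chi2) / null_median)    # correct inflation only, never deflation
    return lam, [s * sqrt(lam) for s in ses]


def fixed_effects_meta(betas: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, float]:
    """Inverse-variance weighted fixed-effects estimate, its SE, and a two-sided P-value."""
    weights = [1.0 / s ** 2 for s in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = sqrt(1.0 / total)
    p = 2 * _STD_NORMAL.cdf(-abs(beta / se))
    return beta, se, p
```

In practice genomic control is estimated across all variants genome-wide within each study before the per-variant estimates are combined.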

Genome-wide association analysis

In UK Biobank, we performed logistic regression of COPD, adjusting for age, sex, genotyping array, smoking pack-years, ever-smoking status, and principal components of genetic ancestry. Association analysis was done using PLINK 2.0 alpha[114] (downloaded on December 11, 2017) with the Firth-fallback setting, which uses Firth regression when quasi-complete separation or convergence failure of standard logistic regression occurs. We performed a fixed-effects meta-analysis of all ICGC cohorts and UK Biobank using METAL (version 2010-08-01)[115]. We assessed population substructure and cryptic relatedness using the linkage disequilibrium (LD) score regression intercept[10]. We defined a genetic locus as a 2-Mb window (+/−1 Mb) around a lead variant, with conditional analyses as described below. To maximize our power to identify existing loci and discover new ones, we examined all loci reaching the genome-wide significance threshold of P < 5 × 10−8. We first classified loci as previously described (evidence of prior association with lung function[12-20,116,117] or COPD[5,11,118]) or novel. We considered a signal previously reported if it was in the same LD block in Europeans[119] as, and in at least moderate LD (r2 >= 0.2) with, a previously reported variant. For novel loci, we attempted replication by testing the association of each lead variant with either FEV1 or FEV1/FVC ratio in SpiroMeta, using one-sided P-values with Bonferroni correction for the number of novel loci examined. Novel loci not meeting the Bonferroni-corrected threshold were assessed for nominal significance (one-sided P < 0.05) or directional consistency with FEV1 and FEV1/FVC ratio in SpiroMeta. Cigarette smoking is the major environmental risk factor for COPD, and genetic loci associated with cigarette smoking have been reported[5,120]. 
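The locus definition and the tiered replication criteria can be sketched as below. The greedy clumping, where the most significant remaining variant seeds a +/−1 Mb locus, is one straightforward reading of "a 2-Mb window around a lead variant". The rule of taking the smaller one-sided P-value across FEV1 and FEV1/FVC is our interpretation of "either". Both are assumptions, and the names `Variant`, `define_loci`, and `replication_status` are hypothetical.

```python
from statistics import NormalDist
from typing import List, NamedTuple, Tuple


class Variant(NamedTuple):
    rsid: str
    chrom: str
    pos: int   # base-pair position
    p: float   # discovery P-value


def define_loci(variants: List[Variant], p_threshold: float = 5e-8,
                half_window: int = 1_000_000) -> List[Tuple[Variant, List[Variant]]]:
    """Greedy clumping: each remaining lead variant claims significant variants within +/-1 Mb."""
    remaining = sorted((v for v in variants if v.p < p_threshold), key=lambda v: v.p)
    loci = []
    while remaining:
        lead = remaining[0]
        members = [v for v in remaining
                   if v.chrom == lead.chrom and abs(v.pos - lead.pos) <= half_window]
        loci.append((lead, members))
        remaining = [v for v in remaining if v not in members]
    return loci


def replication_status(z_fev1: float, z_ratio: float, n_novel: int, alpha: float = 0.05) -> str:
    """Classify a novel locus from SpiroMeta Z-scores oriented to the COPD risk allele.

    A negative Z (risk allele lowers lung function) is the expected direction, so the
    one-sided P-value is the lower-tail probability Phi(z).
    """
    std_normal = NormalDist()
    p_one_sided = min(std_normal.cdf(z_fev1), std_normal.cdf(z_ratio))
    if p_one_sided < alpha / n_novel:
        return "Bonferroni-replicated"
    if p_one_sided < alpha:
        return "nominal"
    if z_fev1 < 0 and z_ratio < 0:
        return "direction-consistent"
    return "not replicated"
```

With 35 novel loci, for example, the Bonferroni-corrected one-sided threshold in this sketch is 0.05 / 35, or about 1.4 × 10−3.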
Although we adjusted for cigarette smoking in our analysis, we further examined smoking effects by testing each locus for association with cigarette smoking and by performing separate analyses of ever- and never-smokers in UK Biobank. We tested for sex-specific genetic effects of genome-wide significant variants using stratified analysis and interaction testing, with a 5% Bonferroni-corrected threshold for significance (Supplementary Note).
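A summary-statistic version of the sex-difference test can be sketched with the standard Z-test for a difference between two independent stratified estimates. This is a swapped-in illustration. The study's interaction testing may instead fit a genotype-by-sex term in the regression. The default of 82 tests for the Bonferroni correction (one per top variant) is our assumption, and `sex_difference_test` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def sex_difference_test(beta_m: float, se_m: float, beta_f: float, se_f: float,
                        n_tests: int = 82, alpha: float = 0.05) -> Tuple[float, float, bool]:
    """Z-test of male vs female log-odds ratios; returns (z, two-sided P, Bonferroni-significant)."""
    z = (beta_m - beta_f) / sqrt(se_m ** 2 + se_f ** 2)  # strata assumed independent
    p = 2 * NormalDist().cdf(-abs(z))
    return z, p, p < alpha / n_tests
```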

Identification of independent associations at genome-wide significant loci

We identified specific independent associations at genome-wide significant loci using GCTA-COJO[22]. This method performs approximate conditional and joint analysis using summary statistics and representative LD information. As the UK Biobank provided most of the sample, we used 10,000 randomly drawn unrelated individuals from this discovery dataset as an LD reference sample. We scaled genome-wide significance to a 2-Mb region, resulting in a locus-wide significance threshold of 8 × 10−5, or 2 × 10−6 for variants in the major histocompatibility complex (MHC) region (chr6:28477797–33448354 in hg19). We created regional association plots with LocusZoom using 1000 Genomes EUR reference data[121] (November 2014 release).
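One reading of the 2-Mb scaling reproduces the stated threshold. It assumes the threshold grows in proportion to the fraction of the genome tested and a genome length of roughly 3 × 10⁹ bp; the text does not state the genome length used.

```latex
\alpha_{\text{locus}}
  = \alpha_{\text{GW}} \times \frac{L_{\text{genome}}}{L_{\text{window}}}
  = 5\times10^{-8} \times \frac{3\times10^{9}\ \text{bp}}{2\times10^{6}\ \text{bp}}
  = 7.5\times10^{-5} \approx 8\times10^{-5}
```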

Identification and prioritization of tissues and cell types, candidate variants, genes, and pathways

Identification of enriched tissues and specific cell types

We used LD Score Regression (LDSC) to estimate the enrichment of disease heritability in functional annotations[26] and in regions of specifically expressed genes[31]. We used LDSC baseline model annotations (e.g., conserved regions, promoter flanking regions), tissue-specific annotations from the Roadmap Epigenomics Program[31], integrated tissue annotations from GenoSkyline[25], and cell type-specific chromatin accessibility data (ATAC-Seq)[27]. We used four single-cell gene expression (RNA-Seq) datasets to identify specific cell types (Supplementary Note): 1) lung epithelial cells from normal and pulmonary fibrosis human lung[28] (Gene Expression Omnibus [GEO] accession GSE86618), 2) putative alveolar type 2 cells derived from human induced pluripotent stem cells (iPSCs)[29] (GSE96642), and mouse lungs at 3) embryonic day 18.5 (E18.5) and 4) postnatal day 1 (P1) from Whitsett et al. (unpublished, available at LungMAP[30]). We also used SNPsea[32] to identify cell types enriched in genome-wide significant loci (Supplementary Note). For the Roadmap annotations and gene expression datasets, we reported only coefficient estimates and P-values, because these analyses used --h2-cts, which does not report fold enrichment.
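For reference, fold enrichment in partitioned LDSC is conventionally the proportion of SNP heritability attributed to an annotation C divided by the proportion of SNPs that fall in C. Here h²_g(C) is the heritability attributed to C, h²_g the total SNP heritability, M_C the number of SNPs in C, and M the total number of SNPs. As we understand the cell-type-specific mode, it instead tests whether the per-SNP coefficient τ_C is greater than zero, conditional on the baseline annotations, which is why only coefficients and P-values are available for those analyses.

```latex
\text{Enrichment}(C) = \frac{h^2_g(C) \,/\, h^2_g}{M_C \,/\, M},
\qquad
H_0\colon \tau_C = 0 \quad \text{vs.} \quad H_1\colon \tau_C > 0
```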

Fine-mapping of independent association signals at genome-wide significant loci

We used Bayesian fine-mapping at each locus to identify the credible set: the set of variants with a 99% probability of containing a causal variant. Briefly, for each genome-wide significant locus, we calculated approximate Bayes factors[33] of association. We then selected variants in each locus such that their cumulative posterior probability was equal to or greater than 0.99, using an unscaled variance. At loci with multiple independent associations, we used statistics from approximate conditional analysis with GCTA, conditioning each index variant on the other independent variants at the locus. At other loci, we used unconditioned statistics from our meta-analysis. Details on the characterization of variant effects are summarized in the Supplementary Note.
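A minimal credible-set construction can be sketched with Wakefield-style approximate Bayes factors. The sketch assumes a single causal variant per signal and equal prior probabilities across variants. The prior effect-size variance W = 0.04 is a commonly used default that we assume here; the text does not give it. Function names are hypothetical.

```python
from math import exp, log
from typing import List, Tuple


def log_abf(beta: float, se: float, w: float = 0.04) -> float:
    """Log approximate Bayes factor (association vs null) for one variant; w is the prior variance."""
    v = se ** 2
    r = w / (v + w)
    z2 = (beta / se) ** 2
    return 0.5 * log(1 - r) + 0.5 * r * z2


def credible_set(variants: List[Tuple[str, float, float]], coverage: float = 0.99,
                 w: float = 0.04) -> List[Tuple[str, float]]:
    """variants: (rsid, beta, se). Returns (rsid, posterior) pairs until cumulative PP >= coverage."""
    labfs = [log_abf(b, s, w) for _, b, s in variants]
    top = max(labfs)
    weights = [exp(x - top) for x in labfs]   # subtract the max for numerical stability
    total = sum(weights)
    posteriors = sorted(((wt / total, rsid) for wt, (rsid, _, _) in zip(weights, variants)),
                        reverse=True)
    chosen, cumulative = [], 0.0
    for pp, rsid in posteriors:
        chosen.append((rsid, pp))
        cumulative += pp
        if cumulative >= coverage:
            break
    return chosen
```

At loci with several independent signals, the same routine would be run once per signal on the conditional statistics described above.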

Identification of target genes

We used several computational approaches, with corresponding available datasets, to identify target genes in genome-wide significant loci. We used two methods that utilize gene expression data: 1) S-PrediXcan and 2) DEPICT. We used S-PrediXcan[37] to identify genes whose genetically regulated expression is associated with COPD, using data from the Lung-eQTL consortium[38] (1,038 lung tissue samples) as the expression quantitative trait loci (eQTL) and gene expression reference database. S-PrediXcan tests for association between a trait and imputed gene expression using summary statistics. Here, we performed S-PrediXcan using models for protein-coding genes within +/− 1 Mb of top-associated variants at genome-wide significant loci. We used DEPICT (Data-driven Expression Prioritized Integration for Complex Traits)[42] to prioritize genes from ‘reconstituted’ gene sets. We also used additional information on gene regulation, including epigenetic data: 1) regulatory fine mapping, 2) methylation quantitative trait loci (mQTL), and 3) chromosome conformation capture. We used regulatory fine mapping (regfm[41]) to overlap the 99% credible interval (CI) variants at each GWAS locus with open chromatin regions defined by DNase hypersensitivity sites (DHS). DHS cluster accessibility state was then associated with gene expression levels (for 13,771 genes) from 22 tissues in the Roadmap Epigenomics Project[41]. Using both the overlap of 99% CI variants with DHS and the association between DHS state and transcript levels, regfm calculates a posterior probability of association for each gene within +/− 1 Mb of the lead SNP at each GWAS locus. We also searched for overlapping mQTL data from lung tissue, as described recently[39]. 
To determine whether these signals colocalized (rather than being related through linkage disequilibrium), we performed colocalization analysis between our GWAS and mQTL results in genome-wide significant loci using eCAVIAR[122] (eQTL and GWAS CAusal Variants Identification in Associated Regions; Supplementary Note). We also used publicly available chromosome conformation capture data[40]. Using HUGIn[123] (Hi-C Unifying Genomic Interrogator), we queried association statistics for chromatin contacts (i.e., long-range chromatin interactions) between top associated variants and nearby gene promoters in lung cells and tissue (the IMR90 fetal lung fibroblast cell line and human lung tissue[40]). We retained only the strongest association (i.e., smallest P-value) for each cell line or primary cell type in the analysis. Finally, we searched for signals from deleterious variants by querying the consequences of variants within 99% credible sets containing fewer than 50 variants (Supplementary Note). We also searched for rare coding variants, based on exome sequencing results in the COPDGene, Boston Early-Onset COPD (BEOCOPD), and International COPD Genetics Network (ICGN) studies, as described previously[43]. In brief, we performed exome sequencing on 485 severe COPD cases and 504 smoking-resistant controls from the COPDGene study and on 1,554 subjects ascertained through 631 probands with severe COPD from the BEOCOPD and ICGN studies. Details of the statistical tests for single-variant and gene-based analyses are summarized in the Supplementary Note. For each dataset described above, we used Bonferroni-corrected P-values or a fixed posterior probability threshold to determine target genes at each locus. We reported protein-coding genes within +/−1 Mb of a top associated variant. We restricted our search to genes from the GRCh37 server in biomaRt[124] with updated HUGO Gene Nomenclature Committee (HGNC) names (downloaded from the HGNC database of human gene names on June 7, 2018). 
For each locus, we used a 5% Bonferroni-corrected threshold (i.e., P < 0.05 divided by the number of genes at that locus) to determine significance for four data types: gene expression data, chromatin conformation capture data, co-regulation of gene expression, and exome sequencing results. For the two remaining datasets, regfm and eCAVIAR, we used a fixed threshold of 0.1 on the posterior probability of gene association with a GWAS locus. We considered as target genes those implicated by gene expression or by a combination of at least two other datasets (e.g., methylation and chromatin conformation capture data).
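The decision rule above can be sketched as follows. Two points are our assumptions. "Gene expression" is taken to mean the S-PrediXcan result alone, with DEPICT co-regulation counted among the "other" datasets. The posterior threshold is applied as "at least 0.1". The `GeneEvidence` fields and `call_target_genes` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GeneEvidence:
    """Per-gene evidence at one locus; None means the data type gave no result for this gene."""
    gene: str
    expression_p: Optional[float] = None    # S-PrediXcan (gene expression)
    hic_p: Optional[float] = None           # chromatin conformation capture
    coregulation_p: Optional[float] = None  # DEPICT co-regulation
    exome_p: Optional[float] = None         # exome sequencing
    regfm_pp: Optional[float] = None        # regfm posterior probability
    ecaviar_pp: Optional[float] = None      # eCAVIAR mQTL colocalization posterior


def call_target_genes(locus_genes: List[GeneEvidence], alpha: float = 0.05,
                      pp_threshold: float = 0.1) -> List[str]:
    bonferroni = alpha / len(locus_genes)  # 0.05 divided by the number of genes at this locus

    def p_hit(p: Optional[float]) -> bool:
        return p is not None and p < bonferroni

    def pp_hit(pp: Optional[float]) -> bool:
        return pp is not None and pp >= pp_threshold  # ">=" is an assumption

    targets = []
    for g in locus_genes:
        other_hits = sum([p_hit(g.hic_p), p_hit(g.coregulation_p), p_hit(g.exome_p),
                          pp_hit(g.regfm_pp), pp_hit(g.ecaviar_pp)])
        if p_hit(g.expression_p) or other_hits >= 2:
            targets.append(g.gene)
    return targets
```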

Identification of pathways

To identify enriched pathways in COPD-associated loci, we performed gene-set enrichment analysis using the “reconstituted” gene sets from DEPICT, as described above[42]. We defined significant gene sets using a false discovery rate (FDR) < 5%.

Effects on COPD-related and other phenotypes

COPD is a complex and heterogeneous disorder comprising different biologic processes and specific phenotypic effects. In addition, many loci discovered by GWAS have pleiotropic effects. To identify these effects, we performed a) identification of overlapping genetic loci between related disorders (asthma and pulmonary fibrosis); b) genetic association studies of our genome-wide significant findings using COPD-related phenotypes, including a cluster analysis to identify groups of variants that may act via similar mechanisms; c) a look-up of top variants in prior GWAS of COPD-related quantitative computed tomography (CT) imaging features; d) a look-up of associations with other diseases/traits using the GWAS Catalog; and e) estimation of the genetic correlation between COPD and other diseases/traits. To identify overlapping loci between COPD and other respiratory disorders, we used gwas-pw[53] to perform pairwise analysis of GWAS. This method searches for shared genomic segments[119] using an adaptive significance threshold, allowing detection of loci that do not reach genome-wide significance. We identified shared segments or variants using a posterior probability of colocalization greater than 0.7[53]. We obtained GWAS summary statistics from previous studies of pulmonary fibrosis[55] and of asthma in Europeans[54]. For the overlap analysis of COPD with asthma, we examined the influence of including individuals with self-reported asthma on both the overlap of discrete GWAS loci (using gwas-pw) and genome-wide genetic correlation (using LD score regression), by performing these analyses in the meta-analysis of ICGC studies and the UK Biobank (with individuals with asthma removed from cases in the latter). 
To assess heterogeneous effects of COPD susceptibility loci on COPD-related features (phenotypes), we evaluated associations of our genome-wide significant SNPs with 121 detailed phenotypes (e.g., lung function, computed tomography-derived metrics, biomarkers, and comorbidities) available in 6,760 non-Hispanic white COPDGene participants. We calculated Z-scores for each SNP-phenotype combination relative to the COPD risk allele to create a SNP-by-phenotype Z-score matrix. We retained each COPD-related phenotype with at least one nominally significant association with one of our genome-wide significant COPD SNPs, leaving 107 phenotypes. We then oriented the Z-scores for each phenotype to be positive (based on the sign of the median Z-score) to avoid clustering based on direction of association. To avoid clustering phenotypes only by strength of association with SNPs, we scaled Z-scores within each phenotype by subtracting the mean Z-score and dividing by the standard deviation of Z-scores for that phenotype. We then scaled Z-scores across SNPs to avoid clustering SNPs only by their relative strength of association with phenotypes. We then performed hierarchical clustering of the scaled Z-scores of associations between SNPs and phenotypes to identify clusters of SNPs and phenotypes, both for all 107 phenotypes and for the subset of 26 quantitative imaging phenotypes. We clustered variants both in the set of all genome-wide significant variants in discovery and in the subset of known variants plus novel variants meeting a strict Bonferroni threshold in SpiroMeta replication (Supplementary Note). We further examined top variant associations with COPD-related traits through a look-up of top variants in a prior GWAS of quantitative emphysema and airway CT features in 12,031 subjects[50]. 
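The orientation and two-way scaling steps that precede clustering can be sketched in pure Python. We assume "scaled across SNPs" means standardizing each SNP's row across phenotypes, and we use the sample standard deviation. Function names are hypothetical.

```python
from statistics import mean, median, stdev
from typing import List


def _standardize(values: List[float]) -> List[float]:
    """Subtract the mean and divide by the sample SD; a constant vector maps to zeros."""
    mu, sd = mean(values), stdev(values)
    return [(x - mu) / sd if sd > 0 else 0.0 for x in values]


def prepare_z_matrix(z: List[List[float]]) -> List[List[float]]:
    """z[i][j] is the Z-score of SNP i (COPD risk allele) with phenotype j."""
    n_snp, n_pheno = len(z), len(z[0])
    cols = [[z[i][j] for i in range(n_snp)] for j in range(n_pheno)]
    # 1) Orient each phenotype so that its median Z across SNPs is non-negative.
    cols = [[-x for x in col] if median(col) < 0 else col for col in cols]
    # 2) Standardize within each phenotype (column) so strongly associated phenotypes do not dominate.
    cols = [_standardize(col) for col in cols]
    # 3) Standardize within each SNP (row) so SNPs do not cluster by overall association strength.
    rows = [[cols[j][i] for j in range(n_pheno)] for i in range(n_snp)]
    return [_standardize(row) for row in rows]
```

The returned matrix could then be passed to a hierarchical clustering routine such as `scipy.cluster.hierarchy.linkage`. The linkage method and distance metric used in the study are not stated here.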
To examine overlap of our COPD results with other traits, we downloaded genome-wide significant associations (P < 5 × 10−8) from the GWAS Catalog[51] on April 10, 2018. For each pair of COPD- and trait-associated variants within the same LD block in Europeans[119], we computed LD using the European ancestry panel[125] and considered the variants overlapping if they were in at least moderate LD (r2 >= 0.2). We estimated genetic correlation between COPD and other diseases/traits using LD Hub[52], a web interface for LDSC. We assessed the results using a 5% Bonferroni-corrected significance level. We queried our target genes in the Drug Repurposing Hub[48]. This resource contains comprehensive annotations of launched drugs, drugs in phases 1–3 of clinical development, previously approved drugs, and preclinical or tool compounds, curated from publicly available sources (e.g., ChEMBL and DrugBank) and proprietary sources. We performed a drug-gene expression similarity analysis[49] (the Query) using a ranked gene set from a gene-based association test[37] (Supplementary Note).
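The overlap rule can be sketched as follows. Here LD is approximated by the squared Pearson correlation of genotype dosages (0/1/2) in a reference panel, a common stand-in for haplotype-based r². The LD-block labels and function names are hypothetical.

```python
from typing import Sequence


def dosage_r2(g1: Sequence[float], g2: Sequence[float]) -> float:
    """Squared Pearson correlation of two genotype-dosage vectors from the same individuals."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return 0.0  # a monomorphic variant carries no LD information
    return cov * cov / (v1 * v2)


def variants_overlap(block1: str, block2: str, g1: Sequence[float], g2: Sequence[float],
                     min_r2: float = 0.2) -> bool:
    """Same European LD block AND at least moderate LD (r2 >= 0.2)."""
    return block1 == block2 and dosage_r2(g1, g2) >= min_r2
```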

Reporting Summary

We provide further information on research design in the Life Sciences Reporting Summary linked to this article.

Data availability statement

The genome-wide association summary statistics are available from the database of Genotypes and Phenotypes (dbGaP) under accession phs000179.v5.p2 and via the UK Biobank. Derived phenotypic data for COPD case-control status are also available in the UK Biobank.
  121 in total

1.  LD Score regression distinguishes confounding from polygenicity in genome-wide association studies.

Authors:  Brendan K Bulik-Sullivan; Po-Ru Loh; Hilary K Finucane; Stephan Ripke; Jian Yang; Nick Patterson; Mark J Daly; Alkes L Price; Benjamin M Neale
Journal:  Nat Genet       Date:  2015-02-02       Impact factor: 38.330

2.  A Chronic Obstructive Pulmonary Disease Susceptibility Gene, FAM13A, Regulates Protein Stability of β-Catenin.

Authors:  Zhiqiang Jiang; Taotao Lao; Weiliang Qiu; Francesca Polverino; Kushagra Gupta; Feng Guo; John D Mancini; Zun Zar Chi Naing; Michael H Cho; Peter J Castaldi; Yang Sun; Jane Yu; Maria E Laucho-Contreras; Lester Kobzik; Benjamin A Raby; Augustine M K Choi; Mark A Perrella; Caroline A Owen; Edwin K Silverman; Xiaobo Zhou
Journal:  Am J Respir Crit Care Med       Date:  2016-07-15       Impact factor: 21.405

3.  Genetic loci associated with chronic obstructive pulmonary disease overlap with loci for lung function and pulmonary fibrosis.

Authors:  Brian D Hobbs; Kim de Jong; Maxime Lamontagne; Yohan Bossé; Nick Shrine; María Soler Artigas; Louise V Wain; Ian P Hall; Victoria E Jackson; Annah B Wyss; Stephanie J London; Kari E North; Nora Franceschini; David P Strachan; Terri H Beaty; John E Hokanson; James D Crapo; Peter J Castaldi; Robert P Chase; Traci M Bartz; Susan R Heckbert; Bruce M Psaty; Sina A Gharib; Pieter Zanen; Jan W Lammers; Matthijs Oudkerk; H J Groen; Nicholas Locantore; Ruth Tal-Singer; Stephen I Rennard; Jørgen Vestbo; Wim Timens; Peter D Paré; Jeanne C Latourelle; Josée Dupuis; George T O'Connor; Jemma B Wilk; Woo Jin Kim; Mi Kyeong Lee; Yeon-Mok Oh; Judith M Vonk; Harry J de Koning; Shuguang Leng; Steven A Belinsky; Yohannes Tesfaigzi; Ani Manichaikul; Xin-Qun Wang; Stephen S Rich; R Graham Barr; David Sparrow; Augusto A Litonjua; Per Bakke; Amund Gulsvik; Lies Lahousse; Guy G Brusselle; Bruno H Stricker; André G Uitterlinden; Elizabeth J Ampleford; Eugene R Bleecker; Prescott G Woodruff; Deborah A Meyers; Dandi Qiao; David A Lomas; Jae-Joon Yim; Deog Kyeom Kim; Iwona Hawrylkiewicz; Pawel Sliwinski; Megan Hardin; Tasha E Fingerlin; David A Schwartz; Dirkje S Postma; William MacNee; Martin D Tobin; Edwin K Silverman; H Marike Boezen; Michael H Cho
Journal:  Nat Genet       Date:  2017-02-06       Impact factor: 38.330

4.  Hhip haploinsufficiency sensitizes mice to age-related emphysema.

Authors:  Taotao Lao; Zhiqiang Jiang; Jeong Yun; Weiliang Qiu; Feng Guo; Chunfang Huang; John Dominic Mancini; Kushagra Gupta; Maria E Laucho-Contreras; Zun Zar Chi Naing; Li Zhang; Mark A Perrella; Caroline A Owen; Edwin K Silverman; Xiaobo Zhou
Journal:  Proc Natl Acad Sci U S A       Date:  2016-07-21       Impact factor: 11.205

5.  Heritability of chronic obstructive pulmonary disease and related phenotypes in smokers.

Authors:  Jin J Zhou; Michael H Cho; Peter J Castaldi; Craig P Hersh; Edwin K Silverman; Nan M Laird
Journal:  Am J Respir Crit Care Med       Date:  2013-10-15       Impact factor: 21.405

6.  Global Strategy for the Diagnosis, Management, and Prevention of Chronic Obstructive Lung Disease 2017 Report. GOLD Executive Summary.

Authors:  Claus F Vogelmeier; Gerard J Criner; Fernando J Martinez; Antonio Anzueto; Peter J Barnes; Jean Bourbeau; Bartolome R Celli; Rongchang Chen; Marc Decramer; Leonardo M Fabbri; Peter Frith; David M G Halpin; M Victorina López Varela; Masaharu Nishimura; Nicolas Roche; Roberto Rodriguez-Roisin; Don D Sin; Dave Singh; Robert Stockley; Jørgen Vestbo; Jadwiga A Wedzicha; Alvar Agustí
Journal:  Am J Respir Crit Care Med       Date:  2017-03-01       Impact factor: 21.405

7.  UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age.

Authors:  Cathie Sudlow; John Gallacher; Naomi Allen; Valerie Beral; Paul Burton; John Danesh; Paul Downey; Paul Elliott; Jane Green; Martin Landray; Bette Liu; Paul Matthews; Giok Ong; Jill Pell; Alan Silman; Alan Young; Tim Sprosen; Tim Peakman; Rory Collins
Journal:  PLoS Med       Date:  2015-03-31       Impact factor: 11.069

8.  Risk loci for chronic obstructive pulmonary disease: a genome-wide association study and meta-analysis.

Authors:  Michael H Cho; Merry-Lynn N McDonald; Xiaobo Zhou; Manuel Mattheisen; Peter J Castaldi; Craig P Hersh; Dawn L Demeo; Jody S Sylvia; John Ziniti; Nan M Laird; Christoph Lange; Augusto A Litonjua; David Sparrow; Richard Casaburi; R Graham Barr; Elizabeth A Regan; Barry J Make; John E Hokanson; Sharon Lutz; Tanda Murray Dudenkov; Homayoon Farzadegan; Jacqueline B Hetmanski; Ruth Tal-Singer; David A Lomas; Per Bakke; Amund Gulsvik; James D Crapo; Edwin K Silverman; Terri H Beaty
Journal:  Lancet Respir Med       Date:  2014-02-07       Impact factor: 30.700

9.  A genome-wide association study of pulmonary function measures in the Framingham Heart Study.

Authors:  Jemma B Wilk; Ting-Hsu Chen; Daniel J Gottlieb; Robert E Walter; Michael W Nagle; Brian J Brandler; Richard H Myers; Ingrid B Borecki; Edwin K Silverman; Scott T Weiss; George T O'Connor
Journal:  PLoS Genet       Date:  2009-03-20       Impact factor: 5.917

10.  The genetic architecture of type 2 diabetes.

Authors:  Christian Fuchsberger; Jason Flannick; Tanya M Teslovich; Anubha Mahajan; Vineeta Agarwala; Kyle J Gaulton; Clement Ma; Pierre Fontanillas; Loukas Moutsianas; Davis J McCarthy; Manuel A Rivas; John R B Perry; Xueling Sim; Thomas W Blackwell; Neil R Robertson; N William Rayner; Pablo Cingolani; Adam E Locke; Juan Fernandez Tajes; Heather M Highland; Josee Dupuis; Peter S Chines; Cecilia M Lindgren; Christopher Hartl; Anne U Jackson; Han Chen; Jeroen R Huyghe; Martijn van de Bunt; Richard D Pearson; Ashish Kumar; Martina Müller-Nurasyid; Niels Grarup; Heather M Stringham; Eric R Gamazon; Jaehoon Lee; Yuhui Chen; Robert A Scott; Jennifer E Below; Peng Chen; Jinyan Huang; Min Jin Go; Michael L Stitzel; Dorota Pasko; Stephen C J Parker; Tibor V Varga; Todd Green; Nicola L Beer; Aaron G Day-Williams; Teresa Ferreira; Tasha Fingerlin; Momoko Horikoshi; Cheng Hu; Iksoo Huh; Mohammad Kamran Ikram; Bong-Jo Kim; Yongkang Kim; Young Jin Kim; Min-Seok Kwon; Juyoung Lee; Selyeong Lee; Keng-Han Lin; Taylor J Maxwell; Yoshihiko Nagai; Xu Wang; Ryan P Welch; Joon Yoon; Weihua Zhang; Nir Barzilai; Benjamin F Voight; Bok-Ghee Han; Christopher P Jenkinson; Teemu Kuulasmaa; Johanna Kuusisto; Alisa Manning; Maggie C Y Ng; Nicholette D Palmer; Beverley Balkau; Alena Stančáková; Hanna E Abboud; Heiner Boeing; Vilmantas Giedraitis; Dorairaj Prabhakaran; Omri Gottesman; James Scott; Jason Carey; Phoenix Kwan; George Grant; Joshua D Smith; Benjamin M Neale; Shaun Purcell; Adam S Butterworth; Joanna M M Howson; Heung Man Lee; Yingchang Lu; Soo-Heon Kwak; Wei Zhao; John Danesh; Vincent K L Lam; Kyong Soo Park; Danish Saleheen; Wing Yee So; Claudia H T Tam; Uzma Afzal; David Aguilar; Rector Arya; Tin Aung; Edmund Chan; Carmen Navarro; Ching-Yu Cheng; Domenico Palli; Adolfo Correa; Joanne E Curran; Denis Rybin; Vidya S Farook; Sharon P Fowler; Barry I Freedman; Michael Griswold; Daniel Esten Hale; Pamela J Hicks; Chiea-Chuen Khor; Satish Kumar; Benjamin Lehne; Dorothée Thuillier; Wei Yen Lim; Jianjun Liu; Yvonne T van der Schouw; Marie Loh; Solomon K Musani; Sobha Puppala; William R Scott; Loïc Yengo; Sian-Tsung Tan; Herman A Taylor; Farook Thameem; Gregory Wilson; Tien Yin Wong; Pål Rasmus Njølstad; Jonathan C Levy; Massimo Mangino; Lori L Bonnycastle; Thomas Schwarzmayr; João Fadista; Gabriela L Surdulescu; Christian Herder; Christopher J Groves; Thomas Wieland; Jette Bork-Jensen; Ivan Brandslund; Cramer Christensen; Heikki A Koistinen; Alex S F Doney; Leena Kinnunen; Tõnu Esko; Andrew J Farmer; Liisa Hakaste; Dylan Hodgkiss; Jasmina Kravic; Valeriya Lyssenko; Mette Hollensted; Marit E Jørgensen; Torben Jørgensen; Claes Ladenvall; Johanne Marie Justesen; Annemari Käräjämäki; Jennifer Kriebel; Wolfgang Rathmann; Lars Lannfelt; Torsten Lauritzen; Narisu Narisu; Allan Linneberg; Olle Melander; Lili Milani; Matt Neville; Marju Orho-Melander; Lu Qi; Qibin Qi; Michael Roden; Olov Rolandsson; Amy Swift; Anders H Rosengren; Kathleen Stirrups; Andrew R Wood; Evelin Mihailov; Christine Blancher; Mauricio O Carneiro; Jared Maguire; Ryan Poplin; Khalid Shakir; Timothy Fennell; Mark DePristo; Martin Hrabé de Angelis; Panos Deloukas; Anette P Gjesing; Goo Jun; Peter Nilsson; Jacquelyn Murphy; Robert Onofrio; Barbara Thorand; Torben Hansen; Christa Meisinger; Frank B Hu; Bo Isomaa; Fredrik Karpe; Liming Liang; Annette Peters; Cornelia Huth; Stephen P O'Rahilly; Colin N A Palmer; Oluf Pedersen; Rainer Rauramaa; Jaakko Tuomilehto; Veikko Salomaa; Richard M Watanabe; Ann-Christine Syvänen; Richard N Bergman; Dwaipayan Bharadwaj; Erwin P Bottinger; Yoon Shin Cho; Giriraj R Chandak; Juliana C N Chan; Kee Seng Chia; Mark J Daly; Shah B Ebrahim; Claudia Langenberg; Paul Elliott; Kathleen A Jablonski; Donna M Lehman; Weiping Jia; Ronald C W Ma; Toni I Pollin; Manjinder Sandhu; Nikhil Tandon; Philippe Froguel; Inês Barroso; Yik Ying Teo; Eleftheria Zeggini; Ruth J F Loos; Kerrin S Small; Janina S Ried; Ralph A DeFronzo; Harald Grallert; Benjamin Glaser; Andres Metspalu; Nicholas J Wareham; Mark Walker; Eric Banks; Christian Gieger; Erik Ingelsson; Hae Kyung Im; Thomas Illig; Paul W Franks; Gemma Buck; Joseph Trakalo; David Buck; Inga Prokopenko; Reedik Mägi; Lars Lind; Yossi Farjoun; Katharine R Owen; Anna L Gloyn; Konstantin Strauch; Tiinamaija Tuomi; Jaspal Singh Kooner; Jong-Young Lee; Taesung Park; Peter Donnelly; Andrew D Morris; Andrew T Hattersley; Donald W Bowden; Francis S Collins; Gil Atzmon; John C Chambers; Timothy D Spector; Markku Laakso; Tim M Strom; Graeme I Bell; John Blangero; Ravindranath Duggirala; E Shyong Tai; Gilean McVean; Craig L Hanis; James G Wilson; Mark Seielstad; Timothy M Frayling; James B Meigs; Nancy J Cox; Rob Sladek; Eric S Lander; Stacey Gabriel; Noël P Burtt; Karen L Mohlke; Thomas Meitinger; Leif Groop; Goncalo Abecasis; Jose C Florez; Laura J Scott; Andrew P Morris; Hyun Min Kang; Michael Boehnke; David Altshuler; Mark I McCarthy
Journal:  Nature       Date:  2016-07-11       Impact factor: 69.504

94 in total

1.  Trait Insights Gained by Comparing Genome-Wide Association Study Results using Different Chronic Obstructive Pulmonary Disease Definitions.

Authors:  Jaehyun Joo; Brian D Hobbs; Michael H Cho; Blanca E Himes
Journal:  AMIA Jt Summits Transl Sci Proc       Date:  2020-05-30

2.  A Genetic Risk Score Associated with Chronic Obstructive Pulmonary Disease Susceptibility and Lung Structure on Computed Tomography.

Authors:  Elizabeth C Oelsner; Victor E Ortega; Benjamin M Smith; Jennifer N Nguyen; Ani W Manichaikul; Eric A Hoffman; Xiuqing Guo; Kent D Taylor; Prescott G Woodruff; David J Couper; Nadia N Hansel; Fernando J Martinez; Robert Paine; Meilan K Han; Christopher Cooper; Mark T Dransfield; Gerard Criner; Jerry A Krishnan; Russell Bowler; Eugene R Bleecker; Stephen Peters; Stephen S Rich; Deborah A Meyers; Jerome I Rotter; R Graham Barr
Journal:  Am J Respir Crit Care Med       Date:  2019-09-15       Impact factor: 21.405

Review 3.  Pharmacogenomics of chronic obstructive pulmonary disease.

Authors:  Craig P Hersh
Journal:  Expert Rev Respir Med       Date:  2019-04-08       Impact factor: 3.772

4.  Transcriptome-wide association study reveals candidate causal genes for lung cancer.

Authors:  Yohan Bossé; Zhonglin Li; Jun Xia; Venkata Manem; Robert Carreras-Torres; Aurélie Gabriel; Nathalie Gaudreault; Demetrius Albanes; Melinda C Aldrich; Angeline Andrew; Susanne Arnold; Heike Bickeböller; Stig E Bojesen; Paul Brennan; Hans Brunnstrom; Neil Caporaso; Chu Chen; David C Christiani; John K Field; Gary Goodman; Kjell Grankvist; Richard Houlston; Mattias Johansson; Mikael Johansson; Lambertus A Kiemeney; Stephen Lam; Maria T Landi; Philip Lazarus; Loic Le Marchand; Geoffrey Liu; Olle Melander; Gadi Rennert; Angela Risch; Susan M Rosenberg; Matthew B Schabath; Sanjay Shete; Zhuoyi Song; Victoria L Stevens; Adonina Tardon; H-Erich Wichmann; Penella Woll; Shan Zienolddiny; Ma'en Obeidat; Wim Timens; Rayjean J Hung; Philippe Joubert; Christopher I Amos; James D McKay
Journal:  Int J Cancer       Date:  2019-12-09       Impact factor: 7.396

5.  Chromatin Landscapes of Human Lung Cells Predict Potentially Functional Chronic Obstructive Pulmonary Disease Genome-Wide Association Study Variants.

Authors:  Christopher J Benway; Jiangyuan Liu; Feng Guo; Fei Du; Scott H Randell; Michael H Cho; Edwin K Silverman; Xiaobo Zhou
Journal:  Am J Respir Cell Mol Biol       Date:  2021-07       Impact factor: 6.914

Review 6.  A Bioinformatics Crash Course for Interpreting Genomics Data.

Authors:  Daniel M Rotroff
Journal:  Chest       Date:  2020-07       Impact factor: 9.410

Review 7.  Machine Learning Characterization of COPD Subtypes: Insights From the COPDGene Study.

Authors:  Peter J Castaldi; Adel Boueiz; Jeong Yun; Raul San Jose Estepar; James C Ross; George Washko; Michael H Cho; Craig P Hersh; Gregory L Kinney; Kendra A Young; Elizabeth A Regan; David A Lynch; Gerald J Criner; Jennifer G Dy; Stephen I Rennard; Richard Casaburi; Barry J Make; James Crapo; Edwin K Silverman; John E Hokanson
Journal:  Chest       Date:  2019-12-28       Impact factor: 9.410

8.  Mixed-model admixture mapping identifies smoking-dependent loci of lung function in African Americans.

Authors:  Andrey Ziyatdinov; Margaret M Parker; Amaury Vaysse; Terri H Beaty; Peter Kraft; Michael H Cho; Hugues Aschard
Journal:  Eur J Hum Genet       Date:  2019-12-13       Impact factor: 4.246

Review 9.  Why is Disease Penetration So Variable? Role of Genetic Modifiers of Lung Function in Alpha-1 Antitrypsin Deficiency.

Authors:  Brian D Hobbs; Michael H Cho
Journal:  Chronic Obstr Pulm Dis       Date:  2020-07

10.  Chloride channels regulate differentiation and barrier functions of the mammalian airway.

Authors:  Mu He; Bing Wu; Wenlei Ye; Daniel D Le; Adriane W Sinclair; Valeria Padovano; Yuzhang Chen; Ke-Xin Li; Rene Sit; Michelle Tan; Michael J Caplan; Norma Neff; Yuh Nung Jan; Spyros Darmanis; Lily Yeh Jan
Journal:  Elife       Date:  2020-04-14       Impact factor: 8.140
