Literature DB >> 27835950

Targeted high-throughput sequencing of candidate genes for chronic obstructive pulmonary disease.

Hans Matsson1,2, Cilla Söderhäll3,4, Elisabet Einarsdottir3,5, Maxime Lamontagne6, Sanna Gudmundsson7, Helena Backman8, Anne Lindberg9, Eva Rönmark8, Juha Kere3,5, Don Sin10, Dirkje S Postma11, Yohan Bossé6,12, Bo Lundbäck13, Joakim Klar7.   

Abstract

BACKGROUND: Reduced lung function in patients with chronic obstructive pulmonary disease (COPD) is likely due to both environmental and genetic factors. We report here a targeted high-throughput DNA sequencing approach to identify new and previously known genetic variants in a set of candidate genes for COPD.
METHODS: Exons in 22 genes implicated in lung development as well as 61 genes and 10 genomic regions previously associated with COPD were sequenced using individual DNA samples from 68 cases with moderate or severe COPD and 66 controls matched for age, gender and smoking. Cases and controls were selected from the Obstructive Lung Disease in Northern Sweden (OLIN) studies.
RESULTS: In total, 37 genetic variants showed association with COPD (p < 0.05, uncorrected). Several variants previously discovered to be associated with COPD from genetic genome-wide analysis studies were replicated using our sample. Two high-risk variants were followed-up for functional characterization in a large eQTL mapping study of 1,111 human lung specimens. The C allele of a synonymous variant, rs8040868, predicting a p.(S45=) in the gene for cholinergic receptor nicotinic alpha 3 (CHRNA3) was associated with COPD (p = 8.8 x 10-3). This association remained (p = 0.003 and OR = 1.4, 95 % CI 1.1-1.7) when analysing all available cases and controls in OLIN (n = 1,534). The rs8040868 variant is in linkage disequilibrium with rs16969968 previously associated with COPD and altered expression of the CHRNA5 gene. A follow-up analysis for detection of expression quantitative trait loci revealed that rs8040868-C was found to be significantly associated with a decreased expression of the nearby gene cholinergic receptor, nicotinic, alpha 5 (CHRNA5) in lung tissue.
CONCLUSION: Our data replicate previous result suggesting CHRNA5 as a candidate gene for COPD and rs8040868 as a risk variant for the development of COPD in the Swedish population.

Entities:  

Keywords:  Association; CHRNA5; COPD; Lung development; Sequencing; eQTL

Mesh:

Substances:

Year:  2016        PMID: 27835950      PMCID: PMC5106844          DOI: 10.1186/s12890-016-0309-y

Source DB:  PubMed          Journal:  BMC Pulm Med        ISSN: 1471-2466            Impact factor:   3.317


Background

Chronic obstructive pulmonary disease (COPD), characterised by a persistent airflow obstruction [1], is a life-threatening disease accounting for 6 % of all deaths globally in 2012 [2]. The development of the disease is influenced by environmental determinants, most commonly cigarette smoking, genetic risk factors and possible genetic protective factors [3]. Candidate gene association studies have suggested several potential COPD susceptibility genes, and genome-wide association studies (GWAS) have identified multiple COPD susceptibility loci [4]. However, genetic mapping in families with high penetrance for a disease gene variant can be helpful in pinpointing new susceptibility genes even for multifactorial traits. Recently, we reported mutations in the gene for fibroblast growth factor 10 (FGF10) involved in lung development, as a possible cause of COPD in families from Sweden [5]. Hence, a monogenic form of COPD could result from mutations in FGF10. To date, the only other known monogenic form of COPD is alpha 1-antitrypsin deficiency caused by disruption of the alpha-1-antiproteinase (SERPINA1) gene [6]. Typically in GWAS, common polymorphisms are tested for association. In this study, we provide an alternative approach with the aim to perform an in-depth analysis of exons of candidate genes for COPD by using high-throughput sequencing. This allowed us to detect the full spectrum of single nucleotide variation at any frequency in selected genomic regions and to also capture variants with a potential functional effect on gene expression levels. We show here that targeted high throughput sequencing using a well-defined population-based case–control sample can i) assess the impact of common variants in genes important for lung development, and ii) test genetic variants in a large set of candidate genes and genomic regions for association with COPD. To accomplish this we captured and sequenced 22 genes implicated in lung development as well as 61 genes and 10 genomic regions previously associated with COPD. The sample used here is comprised of cases and controls from The Obstructive Lung Disease in Northern Sweden (OLIN) studies. The population in northern Sweden, an admixture of three different ethnic groups (Swedes, Finns and Saami), showed a dramatic growth of population size since the 18th century from a relatively small founder population [7]. This resulted in founder effects that significantly reduced the heterogeneity of this population, making it suitable for genetic association studies of multifactorial phenotypes, such as COPD [8]. This study assessed Swedish COPD cases and controls and assessed detected variants in candidate genes for association with COPD. We replicated a previous described association signal in CHRNA3, which also associated with lower CHRNA5 gene expression. The DNA capture design and targeted sequencing used here show potential to detect known single nucleotide variants in association with COPD with the additional potential to also detect low-frequent variants. The result presented here using the relative limited sample size could be replicated using our targeted capture design in larger samples from different populations.

Methods

Patient material and ethics statement

The OLIN studies are an on-going research program focused on asthma, allergy and COPD. It started 30 years ago [9] and now involves more than 50,000 subjects from northern Sweden. Within OLIN, a COPD-cohort was identified at re-examination of several cohorts in 2002–2004 [10]. At recruitment, COPD (n = 993) was defined using the fixed ratio of FEV1 / FVC < 0.70 (forced expiratory volume in 1 s / forced vital capacity). When calculating the ratio FEV1 / FVC, the highest values of FEV1 and the highest value of forced vital capacity (FVC) or slow vital capacity (SVC) were used. This has support in the GOLD documents [1] and is acknowledged in the recent ERS task force guidelines for epidemiological studies on COPD [11]. An age and gender matched control population (n = 993) without obstructive lung function impairment was also recruited [10]. Since 2005 the OLIN COPD cohort with corresponding controls is followed up annually with a basic program including spirometry and interviews regarding symptoms and morbidity [12]. We initially selected 96 COPD cases (18 non-smokers, 43 former smokers and 35 smokers) from those who had an FEV1 < 80 % of predicted value in 2005 and either FEV1 / FVC < LLN (lower limit of normal) in 2010 or were rapid decliners with an annual FEV1 decline of ≥ 60 ml between 2005 and 2010. We also identified a set of 96 age- and gender-matched controls (33 smokers and 63 former smokers) with normal lung function. These 96 cases and 96 controls are henceforth termed the OLIN discovery sample (Table 1). Furthermore, we defined an OLIN replication sample consisting of individuals from the OLIN COPD study for which DNA was available (n = 1,534). From this group we classified individuals as cases when FEV1 / FVC was lower than LLN in 2010, or if they had a yearly FEV1 decline from 2005 to 2010 of at least 60 ml (n = 256). The remaining individuals were used as a reference group (n = 1,278). The physiological parameters for the OLIN replication sample included average age (cases 64 years, SD 11; controls 66 years, SD 11; P = 0.0012), gender (cases 123 females, 133 males; controls 569 females, 709 males; P = 0.30), smoking habits (cases 11 pack/year, SD 15; controls 23 pack/year, SD 17; P = 4.2 × 10−9), weight (cases 73 kg, SD 15; controls 77 kg, SD 14; P = 1.2 × 10−4) and height (cases 168 cm, SD 9; controls 168 cm, SD 10; P = 0.8). The phenotype description of this sample included measures of FVC (cases 3.29, SD 1.01; controls 3.50, SD 1. 03; P = 0.0031), FEV1 (cases 1.90, SD 0.67; controls 2.64, SD 0.81; P = 9.5 × 10−43) and FEV1% of predicted values (cases 67 %, SD 17 %; controls 95 % SD 16 %; P = 5.3 × 10−73), as well as the FEV1/FVC ratio (cases 0.57, SD 0.08; controls 0.75, SD 0.07; P = 2.0 × 10−108) when investigated the fifth year of the evaluations. For description of individual reference values, see Additional file 1.
Table 1

Pulmonary function in patients and controls

AverageFVC (cm3)FVC LLN (cm3)FVC pred (cm3)FEV1 (cm3)FEV1 LLN (cm3)FEV1 pred (cm3)FEV1%pred (%)FEV1 / FVCFEV1 / FVC LLNFEV / FVC pred
Cases2.69 (0.95)2.87 (0.81)3.79 (0.81)1.50 (0.59)2.01 (0.55)2.78 (0.61)53 (14)0.56 (0.11)0.62 (0.03)0.73 (0.03)
Controls3.96 (0.90)2.77 (0.77)3.83 (0.77)3.11 (0.69)2.05 (0.55)2.82 (0.59)110 (9)0.79 (0.05)0.62 (0.03)0.74 (0.03)
P value6.3 × 10−18 0.420.754.0 × 10−41 0.620.582.5 × 10−81 2.5 × 10−45 0.240.23

Values in parenthesis denote standard deviations. FVC Forced vital capacity, LLN Lower limit of normal, FVC pred, predicted FVC based on age, gender and length in the population. FEV1 Forced expiratory volume in 1 s, FEV1 Predicted FEV1 based on age, gender and length in the population, FEV1%pred, FEV1 Divided by predicted FEV1 in the population, FEV1 / FVC ratio when investigated the fifth year of the evaluations. The lower limit of normal (LLN) values where calculated by subtracting 1.645 × RSD (residual standard deviation) from the predicted value (pred). FEV1%pred is the measured FEV1 value divided by the predicted value (i.e., FEV1 / FEV1pred). P values are calculated using two-sided Student’s t-test assuming equal variance. SD Standard deviation

Pulmonary function in patients and controls Values in parenthesis denote standard deviations. FVC Forced vital capacity, LLN Lower limit of normal, FVC pred, predicted FVC based on age, gender and length in the population. FEV1 Forced expiratory volume in 1 s, FEV1 Predicted FEV1 based on age, gender and length in the population, FEV1%pred, FEV1 Divided by predicted FEV1 in the population, FEV1 / FVC ratio when investigated the fifth year of the evaluations. The lower limit of normal (LLN) values where calculated by subtracting 1.645 × RSD (residual standard deviation) from the predicted value (pred). FEV1%pred is the measured FEV1 value divided by the predicted value (i.e., FEV1 / FEV1pred). P values are calculated using two-sided Student’s t-test assuming equal variance. SD Standard deviation P values for differences in parameters between cases and controls were calculated using two-sided Student’s t-tests, assuming equal variance. The ethics board of Umeå University (Dnr 04-045 M, supplement approved 2005-06-13) approved the use of individual phenotypic data and DNA samples for genetic research.

Sequencing and quality controls

In total 22 genes implicated in lung development, 61 genes and 10 genomic regions previously associated with COPD (Additional files 1 and 2), were investigated using targeted sequencing of captured genomic regions (HaloPlex Protocol Version A, Agilent Technologies, Santa Clara, CA). Regions of 1.5 kb of genomic sequence, including specific intergenic polymorphisms, was also included in the design. The regions of interest (ROI) were designed to target all known exons of major/known transcripts and at least 20 base pairs (bp) of intronic sequences flanking each exon. The sequence capture design included 953 target regions spanning 204.384 bp with 95.9 % (196.066 bp) coverage an average. Captured genomic regions were subjected to high throughput paired-end (100 bp read module) sequencing (HiSeq2000, Illumina, San Diego, CA) at the Science for Life Laboratory in Uppsala, Sweden. Sequence reads were aligned to the hg19 reference genome and single nucleotide variants (SNVs) were called using GATK Unified Genotyper (GATK bundle v.2.2) [13]. Next, we enriched for high quality SNVs by removing SNVs with low confidence (QD < 1.5), Phred scaled quality score (<50) and SNVs within SNV clusters. These high quality variants are henceforth referred to as ‘variants’. We also removed individual cases and controls with sequencing read depth consistently < 10 reads. The strategy for filtering and quality controls are illustrated in Fig. 1.
Fig. 1

Flow chart of the strategy for variant calling, quality control and association tests. QC = quality control

Flow chart of the strategy for variant calling, quality control and association tests. QC = quality control

Statistical analyses

Test for genetic association and genetic effect was performed for each predicted variant separately using the discovery sample. In addition, rs8040868 and rs11728716 were tested for association using the available OLIN sample (replication sample) as, according to RegulomeDB, these two markers present with potential functional effects on gene regulation (Table 2). Tests for allelic association of individual variants with COPD were performed using the Fisher’s exact test. Results were considered statistically significant when p < 0.05. No adjustment for multiple testing was performed in these analyses. Effect size was measured using odds-ratios (OR) with 95 % confidence intervals (CI).
Table 2

Associated genetic variants in the discovery sample set (n = 96 cases and n = 96 controls)

ChrPos (bp)SNP rs IDAltRefF c (%)F p (%)OR (95 % CI) P valueGeneCodingRG scoreExAc Eur frequencya CADD_PHREDb
111023056972989301GA10.022.12.5 (1.0-6.2)0.038 GSTM1 40.24NA
1110233057111436983CT11.626.72.8 (1.2-6.2)0.013 GSTM1 70.24NA
221699193512694384AC29.417.90.5 (0.3-0.9)0.04 XRCC5 5NANA
221866922561741262CT13.60NA0.027 TNS1 p.(Asp1722Ser)40.1312.49
22187469902303381TA7.71.50.2 (0.0-0.9)0.019 TNS1 6NANA
355520778566926TG39.225.00.5 (0.3-0.9)0.017 WNT5A 3aNANA
410663869772671840GA2.311.45.3 (1.5-18.9)5.8 × 10−3 GSTCD 6NANA
410664767972671858TC3.010.53.7 (1.2-11.7)0.026 GSTCD 7NANA
410675599611728716AG3.421.07.4 (2.5-22.5)6.5 × 10−5 GSTCD 1fNANA
5582842083805557CT22.011.90.5 (0.2-0.9)0.034 PDE4D 7NANA
5582842833805556GA22.011.80.5 (0.2-0.9)0.033 PDE4D 60.84NA
5582866251553114CT22.011.80.5 (0.2-0.9)0.033 PDE4D 70.84NA
514199386717223611TC10.63.00.3 (0.1-0.8)0.025 FGF1 5NANA
61427031372143390TC2.919.27.9 (1.6-37.6)4.5 × 10−3 GPR126 p.(Asp373=)70.12NA
61511975019322290CT17.49.00.5 (0.2-1.0)0.047 MTHFD1L 5NANA
6151206894147872265TC7.62.20.3 (0.1-1.0)0.049 MTHFD1L 70.002NA
6151263456803451AG41.855.71.8 (1.1-2.9)0.04 MTHFD1L 7NANA
6151264132803448TC37.950.71.7 (1.0-2.7)0.037 MTHFD1L 6NANA
61521835511643821AG25.839.01.8 (1.1-3.1)0.026 ESR1 6NANA
84255253041272375CG1.56.94.8 (1.0-22.8)0.034 CHRNB3 5NANA
9982395033780573AG18.38.70.4 (0.2-0.9)0.041 PTCH1 5NANA
10817062816413520GA0.85.98.2 (1.0-66.4)0.036 SFTPD p.(Ser45=)50.07NA
1012335809641301039GC25.02.80.1 (0.0-0.7)0.017 FGFR2 4NANA
11102738499632009TC29.650.82.5 (1.5-4.1)6.7 × 10−4 MMP12 7NANA
122373756611046992AG21.233.11.8 (1.1-3.2)0.039 SOX5 6NANA
1211022491660258652TC2.410.54.7 (1.0-22.9)0.05 TRPV4 5NANA
121102249221861810AC36.653.82.0 (1.1-3.8)0.04 TRPV4 5NANA
1211023203259870578AG7.81.60.2 (0.0-0.9)0.034 TRPV4 4NANA
1211023203459940634TG7.81.60.2 (0.0-0.9)0.034 TRPV4 4NANA
15714340292004101AT0.85.67.5 (0.9-61.6)0.035 THSD4 7NANA
15787901892292115GA4.80NA0.03 IREB2 7NANA
15789111818040868CT31.847.82.0 (1.2-3.2)8.8 × 10−3 CHRNA3 p.(Val53=)1f0.41NA
1616130514903880AC15.927.22.0 (1.1-3.6)0.027 ABCC1 4NANA
16162057419673292CG4.50NA0.013 ABCC1 6NANA
1616230290212087AG38.652.21.7 (1.1-2.8)0.028 ABCC1 50.44NA
1616235366113328089AG6.91.50.2 (0.0-1.0)0.034 ABCC1 5NANA
201596739041275442TC9.117.72.1 (1.0-4.5)0.049 MACROD2 p.(Thr100Met)40.183.98

Alt, non-reference allele. Ref, reference allele, OR Odds ratio with 95 % confidence intervals, F c Frequency in controls, F p Frequency in cases (patients), RG Score, RegulomeDB score, NA Not available

aNon-Finnish European allele frequency extracted from ExAc v.0.3.1

bThe “PHRED-scaled” CADD score based on ranks of all SNV in the hg19 genome reference

Associated genetic variants in the discovery sample set (n = 96 cases and n = 96 controls) Alt, non-reference allele. Ref, reference allele, OR Odds ratio with 95 % confidence intervals, F c Frequency in controls, F p Frequency in cases (patients), RG Score, RegulomeDB score, NA Not available aNon-Finnish European allele frequency extracted from ExAc v.0.3.1 bThe “PHRED-scaled” CADD score based on ranks of all SNV in the hg19 genome reference Visualization SNPs associated with COPD located in the CHRNA3/5 region was made using LocusZoom v1.1 [14] available on http://locuszoom.sph.umich.edu/locuszoom/. RefSeq gene/transcript case–control tests for aggregation of genetic variants in the targeted genomic regions were performed using PLINK/SEQ v0.1 [15]. Tests were divided into an analysis of rare variants with minor allele frequency (MAF) < 5 % or common variants (MAF ≥ 5 %). The UNIQ test, which identifies unique risk alleles, was utilized using default parameters to count the total number of alleles found only in cases (risk variants). Similarly, the SKAT burden tests, which assesses excess of rare alleles in cases compared to controls, was also utilized. Since both UNIQ and SKAT burden are 1-sided tests, we also swapped the phenotype information and analyzed the effects in both directions (excess of alleles in cases or controls) separately to capture evidence of both risk and protective alleles. Due to the matched design of the case and control groups no covariate adjustments (age, sex, pack-years) were performed in analysis using the discovery sample. Linkage disequilibrium information of associated variants was extracted from genotypes from the sequencing analysis using Haploview 4.2 [16].

Functional analysis of associated variants

To study the possible effect of associated variants on gene expression, we used information from RegulomeDB, a database that combines ENCODE data sets (chromatin immunoprecipitation sequencing (ChIP-seq) peaks, DNase I hypersensitivity peaks, DNase I footprints) with additional data sources (ChIP-seq data from the NCBI Sequence Read Archive, conserved motifs, expression quantitative trait loci (eQTL), and experimentally validated functional variants) [17]. A scoring system is based on the confidence of the functionality of variants, a lower score corresponding to stronger confidence. Subcategories are used to denote additional functional annotations. Combined Annotation Dependent Depletion (CADD) scores were used to assess potential structural and functional effect of associated nonsynonymous variants [18].

Lung expression quantitative trait loci analyses

The existence of expression quantitative trait loci (eQTLs) was investigated as previously described using genotyping and gene expression data from 1,111 patients who underwent lung surgery at one of three sites, Laval University (discovery sample), University of British Columbia, and University of Groningen (replication sample sets) (referred to as Laval, UBC, and Groningen) [19, 20]. The eQTL data is derived from non-tumour lung parenchymal samples and expression data were adjusted for age, gender, and smoking status. Estimated P-values for each region were Bonferroni-corrected for multiple testing based on the number of SNPs and probe sets (number of SNPs x number of probe sets) and were considered significant if corrected p < 0.05.

SNP genotyping for validation of rs11728716 and rs8040868

Individuals from the OLIN replication sample (n = 1,534) were genotyped for the rs11728716 and rs8040868 variants (99.2 % and 99.8 % success rate, respectively) at the Uppsala Genome Center (Uppsala, Sweden) using commercially available TaqMan assays (Life Technologies, Carlsbad, CA). Assay conditions were according to manufacturer’s recommendations. Effect size was estimated by comparing ORs with 95 % CI between cases and controls. Furthermore, to assess smoking dependence, we measured association and effect sizes also between the groups ‘non-smokers’ and ‘ever smokers’, and between ‘current smokers’ and ‘former smokers’.

Results

Selection of cases and controls for the discovery sample

The characteristics of each sample are listed in this section as value ± standard deviation. Cases and controls were matched for age (cases: 68 ± 10 years; controls: 66 ± 11 years; p = 0.15), gender (cases: 35 females, 61 males; controls: 31 females, 65 males; p = 0.65) and smoking habits (cases: 26 ± 19 pack/year; controls: 28 ± 12 pack/year; p = 0.53). Both groups were also closely matched for weight (cases: 76 ± 15 kg; controls: 77 ± 15 kg; p = 0.85) and height (cases: 169 ± 9 cm; controls: 169 ± 9 cm; p = 0.7). No non-smokers were included in the control group to avoid false negative results. The cases presented a significant reduction in lung function consistent with moderate or severe COPD. This is illustrated by a reduced FVC (cases: 2.69 ± 0.95 L; controls: 3.96 ± 0.90 L; p = 6.3 × 10−18), FEV1 (cases: 1.50 ± 0.59 L; controls: 3.11 ± 0.69 L; p = 4.0 × 10−41) and FEV1% of predicted values (cases: 53 ± 14 %; controls: 110 ± 9 %; p = 2.5 × 10−81), as well as the FEV1/FVC ratio (cases: 0.56 ± 0.11; controls: 0.79 ± 0.05; p = 2.5 × 10−45) when investigated the fifth year of the evaluations (Table 1).

Test for association between genetic variants and COPD

We identified 2,151 SNVs after analysis of the sequenced target regions. After variant and sample quality control procedures, 1588 SNVs and 68 cases and 66 controls were retained in the downstream analysis (Fig. 1). Out of the 1588 variants, we identified 37 variants with significantly different allele frequencies in cases and controls (henceforth referred to as ‘associated variants’) (Table 2). We initially detected two novel variants in the discovery sample: GRCh37.p13, 5:g.157002804C > G in the ADAM19 gene and GRCh37.p13, 7:g.73477874C > A in ELN. However, using Sanger sequencing of the same sample we excluded both variants, as they were monomorphic. Three of the associated variants were shown to be unique to controls including missense variant rs61741262 (p.Asn1722Ser) in TNS1. The most significantly associated variants were all intronic (GSTCD, rs11728716, p = 6.5 × 10−5, OR = 7.4 (2.5-22.5) and MMP12, rs632009, p = 6.7 × 10−4, OR = 2.5 (1.5-4.1). Although the majority of the associated variants were intronic (or intergenic), five were protein-coding (Table 2). Of these, two variants predicted amino-acid substitutions (missense variants): p.(Thr100Met) in MACROD2 and p.(Asn1722Ser) in TNS1 respectively. The p.(Asn1722Ser) variant could be potential damaging based on a relative high CADD score or 12.49. Of the coding variants, we found that rs2143390, predicting a p.(D373=) in GPR126 (p = 0.005, OR = 7.9 (1.6-37.6)), rs6413520 in SFTPD (p.(Ser45=), p = 0.036, OR = 8.2 (1.0-66.4)), rs8040868 in CHRNA3 (p.(Val53=), p = 8.8 × 10−3, OR = 2.0 (1.2-3.2)) and rs41275442 in MACROD2 (p.(Thr100Met), p = 0.049, OR = 2.1 (1.0-4.5)) conferred moderate to high risk for COPD (Table 2). We tested the presence of variants uniquely found in cases or controls as well as gene burden tests in cases against controls as specified in PLINK/SEQ v0.1 using standard settings. We noted that the ADAM19, WNT2, CHRNA5, NOS3 and PTCH1 genes all harbor rare variants (MAF < 5 %) uniquely found in cases (Additional file 3). Conversely, the FGF8, CTNNB1 and HHIP genes contain rare variants uniquely found in the control sample (Additional file 3). Neither gene burden analysis (SKAT) or analysis of rare alleles (MAF < 5 %) yielded significant results. However, by performing a joint analysis with only common alleles (MAF ≥ 5 %) in target regions using SKAT, we showed a significant gene burden for the genes GSTCD, FGF1, ELN and ESR1 (Additional file 4).

Haplotypes and linkage disequilibrium

We identified five regions with associated variants in pairwise LD (r2 > 0.7, D´ = 1.0). The regions were located at the GSTM1 gene locus on chromosome 1 (rs72989301-rs111436983), GSTCD on chromosome 4 (rs72671840-rs72671858), PDE4D on chromosome 5 (rs3805557- rs3805556- rs1553114), MTHFDIL on chromosome 6 (rs803451- rs803448) and TRPV4 on chromosome 12 (rs59870578-rs59940634) (Additional file 5). Furthermore, the variant rs8040868 on chromosome 15q21.1 is in pairwise LD (r2 = 0.76, D’ = 1.0) with rs16969968, a nonsynonymous variant previously associated with expression of the CHRNA5 gene [21]. The rs16969968 variant was included in our capture design but it did not reach significant association in the OLIN discovery sample (OR 1.6; p = 0.07) (Additional file 6).

In silico analysis of predicted functions of associated variants

According to RegulomeDB, all 37 associated variants were located within known and predicted regulatory elements in intergenic regions (Table 2). We noted that a variant in CHRNA3 (rs8040868) and a variant in GSTCD (rs11728716) each showed a RegulomeDB score of “1f”, denoting the presence of transcription binding site or DNAse peak.

Lung eQTL results

According to RegulomeDB, the rs8040868 (CHRNA3) and rs11728716 (GSTCD) variants could present with potential functional effects on gene regulation. To determine if these variant could represent eQTL, we analysed the genotypes and gene expression data in the discovery sample (Laval) as well as replication samples (UBC and Groningen). One of these variants, rs8040868:C > T, was confirmed to be significantly associated with gene expression of the nearby gene CHRNA5 in all three data sets, with the C allele (minor allele) associated with lower CHRNA5 expression (Fig. 2 and Additional file 7). Interestingly, we could also see a high correlation between rs8040868 and expression of an anti-sense transcript (AF147302) of unknown function from the adjacent IREB2 gene region (data not shown). AF147302 is likely a result of strong bi-directional promoter activity in this region [22].
Fig. 2

CHRNA5 gene expression levels in the lungs according to genotype groups for rs8040868. The left y-axis represents standardised gene expression levels in the lung with heterozygous genotype group set to zero. The x-axis represents the three genotyping groups, TT, CT and CC (risk allele C), for the variant in (a) the discovery set (Laval) n = 408; P = 4.2 × 10−10, and the replication sets (b) UBC, n = 287; P = 2.31 × 10−7, and (c) Groningen, n = 342; P = 1.5 × 10−6. The number of subjects per genotype group is indicated in parenthesis. The right y-axis shows the proportion of the gene expression variance explained by the variant (black bar)

CHRNA5 gene expression levels in the lungs according to genotype groups for rs8040868. The left y-axis represents standardised gene expression levels in the lung with heterozygous genotype group set to zero. The x-axis represents the three genotyping groups, TT, CT and CC (risk allele C), for the variant in (a) the discovery set (Laval) n = 408; P = 4.2 × 10−10, and the replication sets (b) UBC, n = 287; P = 2.31 × 10−7, and (c) Groningen, n = 342; P = 1.5 × 10−6. The number of subjects per genotype group is indicated in parenthesis. The right y-axis shows the proportion of the gene expression variance explained by the variant (black bar)

The rs8040868 variant is associated with COPD in the OLIN replication sample

We also investigated pulmonary data from the replication sample (n = 1,534; cases = 256, controls = 1278). Analysis using RegulomeDB predicted both rs11728716 and rs8040868 variants as being functional (score 1f for both variants). We therefore selected these two variants for genotyping in all available OLIN samples (n = 1,534). The frequency of the rs8040868-C allele was 35 % in the reference group (n = 1,278) and 42 % in the cases (n = 256) resulting in a significant association (p = 0.003) and an OR of 1.4 (95 % CI 1.1-1.7) for COPD. SweGen variant frequency database reports a 39 % frequency of rs8040868 in 1000 whole genomes representing a cross-section of the Swedish population (https://swefreq.nbis.se). The frequency of the homozygous rs8040868-CC genotype was 12 % in the reference group and significantly higher (18 %) in the cases (p = 0.018). When comparing smoking status using all genotyped individuals, no significant difference in allele frequency between neither the groups ‘non-smokers’ (n = 589) and ‘ever smokers’ (n = 943) (OR 1.1 95 % CI 1.0-1.7; p = 0.09) nor between the groups ‘current smokers’ (n = 312) and ‘former smokers’ (n = 631) (OR 1.2 95 % CI 1.0-1.4; p = 0.11) was seen. The latter test was used to assess nicotine dependency and aptitude for smoking cessation under the assumption that a genetic variant associated with these traits would be underrepresented in a former smoking group as compared to a group of current smokers, i.e., harder to quit smoking. The tests for association with smoking must however be taken with caution as the confidence intervals are wide and a larger sample size would be needed for replication. Analysis of rs11728716 using the OLIN replication sample (n = 1,534) revealed no association with COPD (p = 0.07; OR = 2.2). In order to test if rs11728716 is associated with severe COPD, we stratified the available COPD cases based on severity and selected cases with FEV1%pred < 40 % and FEV1/FVC < LLN. Our results show a significant association (p = 0.017) between rs11728716 and the group of severe COPD (n = 14). The allele frequency of rs11728716-A was 10 % among cases with severe COPD and 4 % in the controls.

Discussion and conclusion

Genetic variants influencing lung function in children and adults may ultimately lead to the development of COPD [23]. Since limited disease-specific therapy for COPD is available, an improved knowledge of genetic variants modulating the pathogenic mechanisms underlying COPD is greatly needed. We aimed here to identify genetic variants within, or close to, the coding regions of genes and loci previously associated with COPD, or in genes involved in lung development. We opted for a qualitative rather than a quantitative approach with the selection of cases with moderate or severe COPD and progressive decline in lung function. Furthermore, controls were all smokers without COPD that, in our study design, can aid the identification of potential protective genetic variants and aid detection of genetic variants associated with severe COPD. When applying a Bonferroni correction for the total number of variants detected, no variants showed statistically significant association. We did, however, identify several variants with a likely biological significance, as indicated by high effect sizes (odds ratio), that we believe warrants further investigation in a larger sample. Furthermore, potential functional effects of variants were investigated using data from a large number of lung samples and we describe here a COPD lung eQTL. When comparing our association data with the lung eQTL data (discovery data set from Laval University), we could identify a variant associated with COPD that was also associated with level of gene expression (Fig. 2). This variant, synonymous variant (rs8040868) in CHRNA3 on chromosome 15, confers a risk for the development of COPD in both our OLIN discovery sample with moderate or severe COPD and our OLIN replication sample including all available COPD cases and controls in OLIN (OR 1.4, p = 0.003). In the lung eQTL data, we could see a correlation of the C allele of rs8040868 with lower expression levels of CHRNA5 (Fig. 2), and, to a lesser extent, also CHRNA3 and PSMA4, which are located in close proximity to CHRNA5. The α-nicotinic receptor (CHRNA3/5) gene locus on chromosome 15q25.1 is associated with COPD, lung cancer and peripheral arterial disease, as well as other smoking related conditions [24, 25] and nicotine addiction [26, 27]. Recently, the CHRNA3/5 locus was implicated in all-cause mortality among smokers in a Finnish cohort [2]. The rs8040868-C allele associates with both reduced pulmonary function and lung cancer [24, 25, 28, 29] and affects DNA-methylation and transcription of CHRNA5 [30]. Furthermore, rs8040868 is also in LD with a nearby variant (rs16969968) previously reported to be associated with expression levels of CHRNA5 in the lung [21]. The direction of effect is the same for both SNPs, with the minor alleles associated with reduced expression of CHRNA5. Also recently, rs16969968 was found to be the most significantly associated variant in an exome array analysis in a study including more than 6,100 COPD cases and 6,000 control subjects across five cohorts [31]. Several genetic variants showed association with COPD in our population, but did not correlate with gene expression levels in the lung, including previously identified variants in the genes glutathione S-transferase, c-terminal domain containing (GSTCD), surfactant protein D (SFTPD) and matrix metalloproteinase-12 (MMP12) [32-36]. We identified a haplotype consisting of three risk-conferring variants, rs72671840, rs72671858 and rs11728716 (G-T-A haplotype), at the GSTCD gene locus on chromosome 4q24. The variant rs11728716 has previously been associated with lung function [32-34] and is likely to affect the transcription of GSTCD. We show here that rs11728716 was associated with severe COPD using the OLIN replication sample. Although intriguing, due to the limited number of severe COPD cases used in this study, this result needs further verification in a larger sample. The other two variants (rs72671840 and rs72671858) are of unknown function [37]. GSTCD encodes a glutathione S-transferase C-terminal domain protein involved in detoxification by catalysing conjugation of glutathione to products of oxidative stress. We found association between COPD and rs6413520, a synonymous variant, p.(Ser45=), within SFTPD on chromosome 10q22.3. This variant conferred a high risk (OR = 8.2) for COPD in our study and has previously been reported to be associated with COPD susceptibility [36]. SFTPD encodes surfactant protein D, of importance for the regulation of oxidant production, inflammatory responses, and apoptotic cell clearance in the lung [38]. We also identified rs632009, in the MMP12 gene on chromosome 11q22.3, to confer moderate risk. Matrix metalloproteinases (MMPs) are involved in both tissue remodelling and repair and several members of the MMP family have been implicated in COPD pathology [35, 39, 40]. In this study, we also found association (uncorrected) with novel susceptibility variants. Several variants in the G-protein-coupled receptor 126 (GPR126) gene on chromosome 6q24.1 have previously been associated with FEV1 / FVC ratio [32]. GPR126 belongs to a superfamily of G protein-coupled receptors and is involved in cell signalling and adhesion. Studies in mice show an induction of Gpr126 expression between embryonic day 7 and 11 with expression in the developing heart and face as well as a high expression in the adult lung [41]. We found significant association between a synonymous variant in the GPR126 gene (rs2143390, p.(Asp373=)) and COPD. The alternative T allele is highly overrepresented in cases compared to controls (p = 4.5 × 10−3, OR = 7.9). We also focused our attention to the chromosome 4q31 locus upstream of HHIP, previously shown to be associated with expression of the gene [20, 42]. The HHIP upstream region belongs to one of the so far strongest COPD association signals [43], but no association could be seen in our case–control groups for any upstream variants. The sequencing approach allowed us to detect rare alleles in both cases and controls. We therefore performed gene burden tests to find evidence of overrepresented rare or common variants in individual genes or transcripts in the cases or controls, respectively. Interestingly, we found that the genes ADAM19, WNT2, CHRNA5, NOS3 and PTCH1 all contain rare variants (MAF < 5 %) uniquely found in cases of the OLIN discovery sample. These variants, and especially the coding variants with predicted functional effect, could be followed up in a larger case–control sample for verification and further genetic and functional analysis. We assessed 83 genes and 10 genomic regions of 1.5 kb size for variants associated with COPD in a sample from Northern Sweden. Still, one limitation of our study is that the targeted capture design may exclude yet unknown genomic regions that can harbour genetic variation influencing COPD. Also, the two novel variants detected after sequencing were monomorphic and an assessment of the false discovery rate using HaloPlex with subsequent Illumina sequencing would be helpful in order to evaluate our set of candidate genes as a gene panel for COPD. Furthermore, we cannot rule out that some findings are influenced by population substructure and replication of our result in different populations is essential. It is also possible that some risk variants were not identified due to the limited number of cases and controls used for sequencing. Using a conservative Bonferroni correction based on the 1588 variants detected resulted in no variants reached significant association with COPD. However, we believe there is no definite consensus regarding the type of multiple testing procedures to use in targeted sequencing based approaches. Furthermore, many parameters such as variant quality checks, genotyping success rate and sequencing depth limit will influence the number of variants found, and consequently, multiple testing adjustments. Also, in addition to include genes including variants previously associated with COPD or asthma, we explored if a set of genes involved in lung development would harbour variants in association with COPD in the Swedish discovery sample. Therefore, as the study is exploratory with a mixed hypothesis the p values for association testing in this study are not corrected for multiple testing. Despite the limited size of the discovery sample used here, we identified several high-risk genetic variants for COPD and we replicated several previous GWAS results. In particular, our results support the CHRNA5 gene as a likely candidate gene for COPD where the rs8040868-C allele confers a risk for the disease in the Swedish population. Furthermore, we indicate the advantage of using less heterogeneous populations in the studies of complex disorders.
  39 in total

1.  DREG, a developmentally regulated G protein-coupled receptor containing two conserved proteolytic cleavage sites.

Authors:  Tetsuo Moriguchi; Keiko Haraguchi; Naoko Ueda; Masato Okada; Toshio Furuya; Tetsu Akiyama
Journal:  Genes Cells       Date:  2004-06       Impact factor: 1.891

2.  The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

Authors:  Aaron McKenna; Matthew Hanna; Eric Banks; Andrey Sivachenko; Kristian Cibulskis; Andrew Kernytsky; Kiran Garimella; David Altshuler; Stacey Gabriel; Mark Daly; Mark A DePristo
Journal:  Genome Res       Date:  2010-07-19       Impact factor: 9.043

3.  Polymorphisms in surfactant protein-D are associated with chronic obstructive pulmonary disease.

Authors:  Marilyn G Foreman; Xiangyang Kong; Dawn L DeMeo; Sreekumar G Pillai; Craig P Hersh; Per Bakke; Amund Gulsvik; David A Lomas; Augusto A Litonjua; Steven D Shapiro; Ruth Tal-Singer; Edwin K Silverman
Journal:  Am J Respir Cell Mol Biol       Date:  2010-05-06       Impact factor: 6.914

4.  Obstructive lung disease in northern Sweden: respiratory symptoms assessed in a postal survey.

Authors:  B Lundbäck; L Nyström; L Rosenhall; N Stjernberg
Journal:  Eur Respir J       Date:  1991-03       Impact factor: 16.671

Review 5.  Risk factors for the development of chronic obstructive pulmonary disease.

Authors:  E K Silverman; F E Speizer
Journal:  Med Clin North Am       Date:  1996-05       Impact factor: 5.456

Review 6.  Immunomodulatory roles of surfactant proteins A and D: implications in lung disease.

Authors:  Amy M Pastva; Jo Rae Wright; Kristi L Williams
Journal:  Proc Am Thorac Soc       Date:  2007-07

7.  Genome-wide association and large-scale follow up identifies 16 new loci influencing lung function.

Authors:  María Soler Artigas; Daan W Loth; Louise V Wain; Sina A Gharib; Ma'en Obeidat; Wenbo Tang; Guangju Zhai; Jing Hua Zhao; Albert Vernon Smith; Jennifer E Huffman; Eva Albrecht; Catherine M Jackson; David M Evans; Gemma Cadby; Myriam Fornage; Ani Manichaikul; Lorna M Lopez; Toby Johnson; Melinda C Aldrich; Thor Aspelund; Inês Barroso; Harry Campbell; Patricia A Cassano; David J Couper; Gudny Eiriksdottir; Nora Franceschini; Melissa Garcia; Christian Gieger; Gauti Kjartan Gislason; Ivica Grkovic; Christopher J Hammond; Dana B Hancock; Tamara B Harris; Adaikalavan Ramasamy; Susan R Heckbert; Markku Heliövaara; Georg Homuth; Pirro G Hysi; Alan L James; Stipan Jankovic; Bonnie R Joubert; Stefan Karrasch; Norman Klopp; Beate Koch; Stephen B Kritchevsky; Lenore J Launer; Yongmei Liu; Laura R Loehr; Kurt Lohman; Ruth J F Loos; Thomas Lumley; Khalid A Al Balushi; Wei Q Ang; R Graham Barr; John Beilby; John D Blakey; Mladen Boban; Vesna Boraska; Jonas Brisman; John R Britton; Guy G Brusselle; Cyrus Cooper; Ivan Curjuric; Santosh Dahgam; Ian J Deary; Shah Ebrahim; Mark Eijgelsheim; Clyde Francks; Darya Gaysina; Raquel Granell; Xiangjun Gu; John L Hankinson; Rebecca Hardy; Sarah E Harris; John Henderson; Amanda Henry; Aroon D Hingorani; Albert Hofman; Patrick G Holt; Jennie Hui; Michael L Hunter; Medea Imboden; Karen A Jameson; Shona M Kerr; Ivana Kolcic; Florian Kronenberg; Jason Z Liu; Jonathan Marchini; Tricia McKeever; Andrew D Morris; Anna-Carin Olin; David J Porteous; Dirkje S Postma; Stephen S Rich; Susan M Ring; Fernando Rivadeneira; Thierry Rochat; Avan Aihie Sayer; Ian Sayers; Peter D Sly; George Davey Smith; Akshay Sood; John M Starr; André G Uitterlinden; Judith M Vonk; S Goya Wannamethee; Peter H Whincup; Cisca Wijmenga; O Dale Williams; Andrew Wong; Massimo Mangino; Kristin D Marciante; Wendy L McArdle; Bernd Meibohm; Alanna C Morrison; Kari E North; Ernst Omenaas; Lyle J Palmer; Kirsi H Pietiläinen; Isabelle Pin; Ozren Pola Sbreve Ek; Anneli Pouta; Bruce M Psaty; Anna-Liisa Hartikainen; Taina Rantanen; Samuli Ripatti; Jerome I Rotter; Igor Rudan; Alicja R Rudnicka; Holger Schulz; So-Youn Shin; Tim D Spector; Ida Surakka; Veronique Vitart; Henry Völzke; Nicholas J Wareham; Nicole M Warrington; H-Erich Wichmann; Sarah H Wild; Jemma B Wilk; Matthias Wjst; Alan F Wright; Lina Zgaga; Tatijana Zemunik; Craig E Pennell; Fredrik Nyberg; Diana Kuh; John W Holloway; H Marike Boezen; Debbie A Lawlor; Richard W Morris; Nicole Probst-Hensch; Jaakko Kaprio; James F Wilson; Caroline Hayward; Mika Kähönen; Joachim Heinrich; Arthur W Musk; Deborah L Jarvis; Sven Gläser; Marjo-Riitta Järvelin; Bruno H Ch Stricker; Paul Elliott; George T O'Connor; David P Strachan; Stephanie J London; Ian P Hall; Vilmundur Gudnason; Martin D Tobin
Journal:  Nat Genet       Date:  2011-09-25       Impact factor: 38.330

Review 8.  Growth factors in lung development and disease: friends or foe?

Authors:  Tushar J Desai; Wellington V Cardoso
Journal:  Respir Res       Date:  2001-10-09

9.  Genome-wide joint meta-analysis of SNP and SNP-by-smoking interaction identifies novel loci for pulmonary function.

Authors:  Dana B Hancock; María Soler Artigas; Sina A Gharib; Amanda Henry; Ani Manichaikul; Adaikalavan Ramasamy; Daan W Loth; Medea Imboden; Beate Koch; Wendy L McArdle; Albert V Smith; Joanna Smolonska; Akshay Sood; Wenbo Tang; Jemma B Wilk; Guangju Zhai; Jing Hua Zhao; Hugues Aschard; Kristin M Burkart; Ivan Curjuric; Mark Eijgelsheim; Paul Elliott; Xiangjun Gu; Tamara B Harris; Christer Janson; Georg Homuth; Pirro G Hysi; Jason Z Liu; Laura R Loehr; Kurt Lohman; Ruth J F Loos; Alisa K Manning; Kristin D Marciante; Ma'en Obeidat; Dirkje S Postma; Melinda C Aldrich; Guy G Brusselle; Ting-hsu Chen; Gudny Eiriksdottir; Nora Franceschini; Joachim Heinrich; Jerome I Rotter; Cisca Wijmenga; O Dale Williams; Amy R Bentley; Albert Hofman; Cathy C Laurie; Thomas Lumley; Alanna C Morrison; Bonnie R Joubert; Fernando Rivadeneira; David J Couper; Stephen B Kritchevsky; Yongmei Liu; Matthias Wjst; Louise V Wain; Judith M Vonk; André G Uitterlinden; Thierry Rochat; Stephen S Rich; Bruce M Psaty; George T O'Connor; Kari E North; Daniel B Mirel; Bernd Meibohm; Lenore J Launer; Kay-Tee Khaw; Anna-Liisa Hartikainen; Christopher J Hammond; Sven Gläser; Jonathan Marchini; Peter Kraft; Nicholas J Wareham; Henry Völzke; Bruno H C Stricker; Timothy D Spector; Nicole M Probst-Hensch; Deborah Jarvis; Marjo-Riitta Jarvelin; Susan R Heckbert; Vilmundur Gudnason; H Marike Boezen; R Graham Barr; Patricia A Cassano; David P Strachan; Myriam Fornage; Ian P Hall; Josée Dupuis; Martin D Tobin; Stephanie J London
Journal:  PLoS Genet       Date:  2012-12-20       Impact factor: 5.917

10.  Bidirectional promoters are the major source of gene activation-associated non-coding RNAs in mammals.

Authors:  Masahiro Uesaka; Osamu Nishimura; Yasuhiro Go; Kinichi Nakashima; Kiyokazu Agata; Takuya Imamura
Journal:  BMC Genomics       Date:  2014-01-17       Impact factor: 3.969

View more
  7 in total

1.  A genome-wide association study of quantitative computed tomographic emphysema in Korean populations.

Authors:  Sooim Sin; Hye-Mi Choi; Jiwon Lim; Jeeyoung Kim; So Hyeon Bak; Sun Shim Choi; Jinkyeong Park; Jin Hwa Lee; Yeon-Mok Oh; Mi Kyeong Lee; Brian D Hobbs; Michael H Cho; Edwin K Silverman; Woo Jin Kim
Journal:  Sci Rep       Date:  2021-08-17       Impact factor: 4.996

2.  Whole exome sequencing analysis in severe chronic obstructive pulmonary disease.

Authors:  Dandi Qiao; Asher Ameli; Dmitry Prokopenko; Han Chen; Alvin T Kho; Margaret M Parker; Jarrett Morrow; Brian D Hobbs; Yanhong Liu; Terri H Beaty; James D Crapo; Kathleen C Barnes; Deborah A Nickerson; Michael Bamshad; Craig P Hersh; David A Lomas; Alvar Agusti; Barry J Make; Peter M A Calverley; Claudio F Donner; Emiel F Wouters; Jørgen Vestbo; Peter D Paré; Robert D Levy; Stephen I Rennard; Ruth Tal-Singer; Margaret R Spitz; Amitabh Sharma; Ingo Ruczinski; Christoph Lange; Edwin K Silverman; Michael H Cho
Journal:  Hum Mol Genet       Date:  2018-11-01       Impact factor: 5.121

3.  From COPD epidemiology to studies of pathophysiological disease mechanisms: challenges with regard to study design and recruitment process: Respiratory and Cardiovascular Effects in COPD (KOLIN).

Authors:  Anne Lindberg; Robert Linder; Helena Backman; Jonas Eriksson Ström; Andreas Frølich; Ulf Nilsson; Eva Rönmark; Viktor Johansson Strandkvist; Annelie F Behndig; Anders Blomberg
Journal:  Eur Clin Respir J       Date:  2017-12-17

4.  Integration of SNP Disease Association, eQTL, and Enrichment Analyses to Identify Risk SNPs and Susceptibility Genes in Chronic Obstructive Pulmonary Disease.

Authors:  Yang Liu; Kun Huang; Yahui Wang; Erqiang Hu; Benliang Wei; Zhaona Song; Yuqing Zou; Luanfeng Ge; Lina Chen; Wan Li
Journal:  Biomed Res Int       Date:  2020-12-29       Impact factor: 3.411

5.  Progesterone activates GPR126 to promote breast cancer development via the Gi pathway.

Authors:  Wentao An; Hui Lin; Lijuan Ma; Chao Zhang; Yuan Zheng; Qiuxia Cheng; Chuanshun Ma; Xiang Wu; Zihao Zhang; Yani Zhong; Menghui Wang; Dongfang He; Zhao Yang; Lutao Du; Shiqing Feng; Chuanxin Wang; Fan Yang; Peng Xiao; Pengju Zhang; Xiao Yu; Jin-Peng Sun
Journal:  Proc Natl Acad Sci U S A       Date:  2022-04-08       Impact factor: 12.779

6.  Study on polymorphisms in CHRNA5/CHRNA3/CHRNB4 gene cluster and the associated with the risk of non-small cell lung cancer.

Authors:  Yiting Sun; Jiaye Li; Chang Zheng; Baosen Zhou
Journal:  Oncotarget       Date:  2017-12-20

7.  PulmonDB: a curated lung disease gene expression database.

Authors:  Ana B Villaseñor-Altamirano; Marco Moretto; Mariel Maldonado; Alejandra Zayas-Del Moral; Adrián Munguía-Reyes; Yair Romero; Jair S García-Sotelo; Luis A Aguilar; Oscar Aldana-Assad; Kristof Engelen; Moisés Selman; Julio Collado-Vides; Yalbi I Balderas-Martínez; Alejandra Medina-Rivera
Journal:  Sci Rep       Date:  2020-01-16       Impact factor: 4.379

  7 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.