Novel idiopathic pulmonary fibrosis susceptibility variants revealed by deep sequencing.

Jose M Lorenzo-Salazar1,2, Shwu-Fan Ma3,2, Jonathan Jou4,2, Pei-Chi Hou3, Beatriz Guillen-Guio5, Richard J Allen6, R Gisli Jenkins7, Louise V Wain6,8, Justin M Oldham9, Imre Noth3, Carlos Flores1,5,10.   

Abstract

BACKGROUND: Specific common and rare single nucleotide variants (SNVs) increase the likelihood of developing sporadic idiopathic pulmonary fibrosis (IPF). We performed target-enriched sequencing on three loci previously identified by a genome-wide association study to gain a deeper understanding of the full spectrum of IPF genetic risk and performed a two-stage case-control association study.
METHODS: A total of 1.7 Mb of DNA from 181 IPF patients was deep sequenced (>100×) across 11p15.5, 14q21.3 and 17q21.31 loci. Comparisons were performed against 501 unrelated controls and replication studies were assessed in 3968 subjects.
RESULTS: 36 SNVs were associated with IPF susceptibility in the discovery stage (p<5.0×10−8). After meta-analysis, the strongest association corresponded to rs35705950 (p=9.27×10−57) located upstream from the mucin 5B gene (MUC5B). Additionally, a novel association was found for two co-inherited low-frequency SNVs (<5%) in MUC5AC, predicting a missense amino acid change in mucin 5AC (lowest p=2.27×10−22). Conditional and haplotype analyses in 11p15.5 supported the existence of an additional contribution of MUC5AC variants to IPF risk.
CONCLUSIONS: This study reinforces the significant IPF associations of these loci and implicates MUC5AC as another key player in IPF susceptibility.

Year:  2019        PMID: 31205927      PMCID: PMC6556557          DOI: 10.1183/23120541.00071-2019

Source DB:  PubMed          Journal:  ERJ Open Res        ISSN: 2312-0541


Introduction

Idiopathic pulmonary fibrosis (IPF), a devastating interstitial lung disease with unknown aetiology, encompasses a highly heterogeneous and unpredictable clinical course [1]. IPF remains an incurable condition, although lung transplant can improve long-term survival [2] and two antifibrotic therapies are effective at slowing disease progression [3, 4]. Identifying genetic risk factors will allow a better understanding of the causative molecular pathways involved in disease pathogenesis and guide novel therapeutic approaches to support the development of precision medicine approaches in IPF. The existence of a familial form of pulmonary fibrosis (FPF) and the recognition that pulmonary fibrosis occurs in several rare genetic disorders strongly suggest that genetic factors influence susceptibility and prognosis [5]. Rare variants in surfactant-encoding genes (SFTPC and SFTPA2) and telomere integrity genes (TERT, TERC, RTEL1 and PARN) are associated with both FPF and IPF [6-11]. Additionally, common single nucleotide variants (SNVs) predicting risk of sporadic IPF have been identified at 17 independent loci by means of several genome-wide scale studies [12-16]. Reviewed in detail elsewhere [17], variants at these loci in aggregate currently explain roughly 25–30% of the disease risk, and support a major role of telomere maintenance, cell adhesion/wound healing, fibrogenic and immunity/host defence pathways in IPF development. In conducting one of the largest genome-wide association studies (GWASs) in IPF, our group [12] confirmed that the common SNV rs35705950 of MUC5B is the strongest known risk factor for the disease [12-16], and identified additional novel common susceptibility SNVs with milder effects in the genes encoding Toll-interacting protein (TOLLIP, 11p15.5) and signal peptide peptidase like 2C (SPPL2C, 17q21.31). 
One of the most striking results of this study was that it revealed allelic heterogeneity in 11p15.5 given the existence of replicable common independent risk SNVs in MUC5B and TOLLIP, and possibly other nearby genes. Additionally, a fourth locus involving the gene encoding MAM domain containing glycosylphosphatidylinositol anchor 2 (MDGA2, 14q21.3) reached genome-wide significance in the second stage of our study, but could not be replicated in a third case–control sample of our study [12] nor in other GWASs conducted to date. Because incomplete overlap of results across distinct GWASs is common, and since MDGA2 is a paralogue for a potential biomarker of IPF disease activity [18], this locus remains of potential importance. As progress has been made in identifying susceptibility loci across many diseases, it is increasingly being shown that multiple nearby, but independent, signals often underlie strong susceptibility loci [19]. This observation, along with increasingly available and affordable high-throughput sequencing technologies, provides a valuable opportunity to better characterise previously identified risk loci. Here, we use a fine mapping approach based on target-enriched DNA sequencing to assess the full spectrum of variants in three IPF-associated genomic loci previously identified in our GWAS.

Materials and methods

Institutional review boards and ethics committees at participating centres approved the study. All participants provided written informed consent (see supplementary methods).

Discovery study

Study subjects

A total of 181 IPF subjects were obtained from the University of Chicago Natural History study (n=138), the Correlating Outcomes with biochemical Markers to Estimate Time-progression study (COMET; n=22) and the AntiCoagulant Effectiveness in IPF study (ACE; n=21). The majority of these patients (60.8%) overlapped with those used for the discovery stage in our previous GWAS [12]. However, for this study, we prioritised the cases based on the existence of sufficient DNA quantity for the targeted next-generation sequencing (NGS) experiments and high DNA integrity. Subjects were European-Americans, had an average age of 67 years at diagnosis, and respiratory symptoms including dyspnoea on exertion and/or cough for at least 3 months. A high-resolution computed tomography scan with a probable or definite usual interstitial pneumonia (UIP) pattern was required according to published diagnostic guidelines [2]. A surgical lung biopsy was obtained in 37.3% of patients, all confirming UIP. None of them had a clinically significant exposure to known fibrogenic agents or suffered from other known causes of interstitial lung disease. Patient details are listed in table 1.
TABLE 1

Clinical and demographic characteristics of idiopathic pulmonary fibrosis cases included in the discovery study

| | University of Chicago | COMET | ACE | p-value |
|---|---|---|---|---|
| Subjects | 138 | 22 | 21 | |
| Age years | 69±9 | 63±8 | 69±6 | 0.02 |
| Male | 108 (78.3) | 14 (63.6) | 16 (76.2) | 0.46 |
| Ever-smoker | 95 (75.4) | 17 (77.3) | 15 (71.4) | 0.91 |
| FVC % pred | 67.2±16.7 | 72.1±12.9 | 53.9±15.4 | 8.0×10−4 |
| DLCO % pred | 47.8±16.8 | 46.4±14.2 | 33.1±19.5 | 1.7×10−3 |
| Transplant | 12 (8.7) | 0 (0) | | 0.17 |
| Death | 68 (49.3) | 1 (4.5) | 3 (14.3) | 2.7×10−5 |
| Follow-up months | 38.5±24.8 | 36.3±2.8 | 35.1±4.0 | 0.41 |
| Time to death months | 27.8±17.9 | 9.57# | 4.0±2.9 | 0.06 |

Data are presented as n, mean±sd or n (%), unless otherwise stated. FVC: forced vital capacity; DLCO: diffusing capacity of the lung for carbon monoxide. #: mean.

Sequencing, variant calling, validation and association testing

Sequencing (>100× depth) and variant calling were performed in regions of interest (ROIs) spanning 1.7 Mb (supplementary table S1 and supplementary methods). The dataset obtained from the 181 cases was used for a case–control association study, where unrelated European individuals from the 1000 Genomes Project (1KGP; www.internationalgenome.org) were used as controls (n=501; release May 2, 2013). Single-variant association tests are typically underpowered for rare variants [20]. However, given the previously reported large effect for some of the variants in IPF [14] and the design of the study, we were interested in identifying variants with large, similar effect sizes within ROIs and not in delineating the most likely causal gene(s). Therefore, association testing was performed individually for each SNV. Effect sizes (odds ratios) and 95% confidence intervals were assessed with PLINK version 1.07 (http://zzz.bwh.harvard.edu/plink) under logistic regression models for biallelic loci with call rates >95%. Principal components were derived with Eigensoft version 6.0.1 [21] using a subset of 2342 variants with reduced linkage disequilibrium (r²<0.15). The first two principal components were used to project the genetic ancestry of cases in the 1KGP dataset for visual inspection of the clustering. In addition, the first five principal components were included in the regression models to account for population stratification, and no evidence for inflation of the association results was observed (λ=1.00). Variants were annotated according to the minor allele frequency (MAF) in 1KGP, classifying them in two tiers (common/low frequency) based on a 5% threshold in controls. Significantly associated low-frequency variants were subjected to validation by Sanger sequencing (supplementary table S2 and supplementary methods).
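The inflation check mentioned above (λ=1.00) can be illustrated with a minimal sketch. It assumes the common median-based definition of the genomic inflation factor for 1-d.f. tests; the text does not state how λ was computed, and the function name and the uniform p-values below are hypothetical.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Median-based genomic inflation factor for 1-d.f. association tests.

    Assumes every p-value lies strictly between 0 and 1.
    """
    norm = NormalDist()
    # Convert each two-sided p-value to a 1-d.f. chi-square statistic (z squared).
    chi2 = [norm.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    # Median of a 1-d.f. chi-square distribution under the null hypothesis.
    expected_median = norm.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median


# Hypothetical, evenly spaced p-values standing in for a well-calibrated scan.
n = 1001
null_p = [(i + 0.5) / n for i in range(n)]
lam = genomic_inflation(null_p)
print(f"lambda = {lam:.2f}")
```

A value near 1 indicates that the bulk of the test statistics behaves as expected under the null, which is the pattern reported here after adjusting for five principal components.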

Conditional and haplotype analyses in 11p15.5

Including the newly identified risk variants from this study, the top hits for this locus have been described in three mucin-encoding genes (MUC2, MUC5AC and MUC5B) and the TOLLIP gene. However, as the top hit of MUC2 (rs7934606) [16, 22] falls outside of the ROI targeted by our NGS experiments, seven risk variants from three genes were included in final analyses: 1) rs34474233 and rs34815853, the two tightly linked variants of MUC5AC identified in the current study; 2) rs12802931 (from this study) and rs35705950 [12–14, 16, 22], the two 5′-flanking variants of MUC5B; and 3) rs111521887, rs5743894 and rs5743890, the three GWAS hits mapping near or within TOLLIP [12]. A formal conditional analysis taking the linkage disequilibrium structure of 11p15.5 into account was applied using the GCTA-COJO method [23], conditioning the risk variants to rs35705950 of MUC5B. In addition, haplotype associations were conducted in PLINK comparing the frequency of combinations of the risk variants between cases and controls with logistic regressions adjusted for five principal components. Combinations with frequencies >1% were reconstructed from all seven variants together and from variants from each of the gene pairs. Statistical significance was set at p<2.0×10−3 after a Bonferroni correction considering all haplotypes tested.
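The haplotype significance threshold can be reproduced with a short sketch. It assumes a family-wise error rate of 0.05 and 25 tested haplotypes (the number of common haplotypes reported in the Results); the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# Assumption: 25 haplotypes tested at a family-wise alpha of 0.05.
haplotype_threshold = bonferroni_threshold(0.05, 25)
print(f"{haplotype_threshold:.1e}")
```

The same arithmetic with 36 tests gives roughly 0.0014, the threshold applied in the replication stage.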

Replication study and meta-analysis of results

Replication was assessed in data from a study consisting of 602 IPF cases and 3366 UK Biobank controls as described by Allen et al. [16] (see supplementary methods for additional information). Random effects meta-analysis was performed with METASOFT version 2.0.1 [24] to estimate the overall effect size of associated SNVs across the discovery and replication studies. Replication was declared for risk variants satisfying the same direction of effects as in the discovery study, with p<0.0014 in the replication stage (corresponding to a Bonferroni-like correction threshold of 0.05/36) and with a genome-wide significant association (p<5×10−8) in the meta-analysis of both stages.
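As an illustration of random-effects pooling, the sketch below applies the DerSimonian-Laird estimator to the discovery and replication odds ratios reported for rs35705950 in table 4. This is an assumption-laden stand-in, not METASOFT itself: the standard errors are back-calculated from the published 95% confidence intervals, the function names are hypothetical, and the p-values reported by METASOFT are not reproduced.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # two-sided 95% normal quantile (about 1.96)


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Recover log(OR) and its standard error from a reported 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)
    return math.log(odds_ratio), se


def dersimonian_laird(effects, ses):
    """Random-effects pooled estimate on the log-OR scale (needs >= 2 studies)."""
    w = [1.0 / se ** 2 for se in ses]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # between-study variance
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    pooled_se = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled_se, tau2


# rs35705950: (OR, CI low, CI high) for discovery and replication, from table 4.
studies = [(6.18, 4.28, 8.94), (4.11, 3.31, 5.11)]
effects, ses = zip(*(log_or_and_se(*s) for s in studies))
pooled, pooled_se, tau2 = dersimonian_laird(effects, ses)
meta_or = math.exp(pooled)
meta_ci = (math.exp(pooled - Z95 * pooled_se), math.exp(pooled + Z95 * pooled_se))
print(f"OR {meta_or:.2f} (95% CI {meta_ci[0]:.2f}-{meta_ci[1]:.2f}), tau2={tau2:.3f}")
```

Under these assumptions the pooled odds ratio and interval come out close to the meta-analysis values listed for this variant in table 4.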

Results

Quality control of called variants in the discovery study

A total of 18 234 variants (13 932 SNVs and 4302 indels) were identified among IPF samples. The Ti/Tv ratio (i.e. the ratio of numbers of transitions versus transversions) was 2.192, within the range of expected ratios for whole genomes (i.e. 2.1–2.3) [25, 26]. This is not unexpected as a large fraction of the ROIs are nonexonic sequences. Based on this, we inferred a false discovery rate (FDR) of 3.3%. We also evaluated the MAF and concordance of genotypes of called variants from NGS with those from the array data of our GWAS [12]. For the 231 variants that had genotype data in both datasets, MAFs showed a near-perfect linear correlation (Pearson correlation R²=0.998) and genotype concordance was 96.1% (95% CI 95.9–96.4%). The genotype discrepancies between the array and NGS were attributed to missing genotypes on the array and the FDR estimates. Association testing in the discovery study was conducted in the subset of 10 245 biallelic variants that had genotypes in >95% of individuals, which implies an 86.9% overlap of imputed variants with our previous GWAS assessment in these loci [12]. Genetic ancestry projections of cases and 1KGP samples from all continents based on a subset of the biallelic variants demonstrated clustering of the patients with Europeans (supplementary figure S1), supporting their recorded ethnicity.
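The Ti/Tv quality metric can be sketched as follows. This is a minimal illustration on hypothetical (reference, alternate) allele pairs, not the variant-calling pipeline used in the study; the function name and example data are assumptions.

```python
BASES = {"A", "C", "G", "T"}
# Transitions swap purine with purine (A<->G) or pyrimidine with pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snvs):
    """Ratio of transitions to transversions for (ref, alt) SNV pairs."""
    ti = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            raise ValueError(f"not a single nucleotide variant: {ref}>{alt}")
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1  # every other single-base change is a transversion
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ti / tv


# Hypothetical calls: three transitions and one transversion.
example_snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]
ratio = ti_tv_ratio(example_snvs)
print(ratio)
```

Indels are excluded from this metric, so only the SNV calls would enter such a calculation.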

Association in the discovery study

36 variants reached genome-wide significance (nine in 11p15.5, 17 in 14q21.3 and 10 in 17q21.31). Only 14% of these variants were assessed in our previous GWAS [12]. Most of them were located in introns or flanking regions (table 2 and figure 1). The strongest signals corresponded to rs35705950 within MUC5B in 11p15.5 (MAF 10.8% in controls; p=2.69×10−22), rs12586854 within MDGA2 in 14q21.3 (MAF 42.8% in controls; p=6.81×10−19) and rs55938136 within CRHR1 in 17q21.31 (MAF 1.8% in controls; p=3.37×10−28). Besides, another variant of MUC5B located ∼8.1 kb away from the 5′ region of the gene was also strongly associated with IPF (rs12802931: MAF 18.3%; p=3.72×10−16), although it was not independent from rs35705950 (p=0.731 conditioning on rs35705950). Strikingly, three coding low-frequency variants of MUC5AC were among the significant findings (p≤4.15×10−9): one with a synonymous prediction (rs371630624) and two others (rs34815853 and rs34474233) affecting the same codon leading to a missense amino acid change (p.Ala5353Lys: MAF 4.4% in controls) that was supported by the sequencing results (figure 2). Individually, they are predicted by PolyPhen (http://genetics.bwh.harvard.edu/pph2) to be benign (rs34815853) and possibly damaging (rs34474233), but the simultaneous effects of the two are unknown. Orthogonal validation by Sanger sequencing strongly supported that the two missense variants were true positives (figure 2); however, it did not support the existence of the variant with synonymous prediction (i.e. false positive). Besides these three, only two other low-frequency variants from 14q21.3 (rs543453148: MAF 0.4% in controls) and 17q21.31 (rs55938136: MAF 1.8% in controls) were significantly associated with IPF. Sanger sequencing of rs543453148 suggested the existence of variation but with alleles that were unaligned to those recorded by NGS. Sanger results were fully congruent with the NGS for rs55938136. 
In the context of our previous results [12], while several other SNVs reached genome-wide significance in 11p15.5, none of the three TOLLIP risk variants (rs111521887, rs5743894 and rs5743890) previously evidenced were significant in this study (figure 1). As for 14q21.3 and 17q21.31, neither of the two top hits reported before was nominally significant in this study (rs7144383: p=0.181; rs17690703: p=0.639). The SNV at rs4898572, an intronic variant in strong linkage disequilibrium with rs7144383 in MDGA2, was also not significant in this study (p=0.191).
TABLE 2

Association results reaching genome-wide significance in the discovery study

| SNV | Chr. | Position (hg19) | Effect allele | MAF# | OR (95% CI) | p-value | Nearby gene | Function/location |
|---|---|---|---|---|---|---|---|---|
| rs371630624 | 11p15.5 | 1 213 302 | C | 0.001 | 1942 (245.6–15 360) | 7.18×10−13 | MUC5AC | Synonymous |
| rs34474233¶ | 11p15.5 | 1 219 152 | A | 0.044 | 4.08 (2.56–6.49) | 2.99×10−9 | MUC5AC | Missense (Ala5353Lys) |
| rs34815853¶ | 11p15.5 | 1 219 153 | A | 0.044 | 4.01 (2.52–6.38) | 4.15×10−9 | MUC5AC | Missense (Ala5353Lys) |
| rs12802931 | 11p15.5 | 1 236 164 | G | 0.183 | 3.76 (2.73–5.16) | 3.72×10−16 | MUC5B | 8.1 kb 5′ of MUC5B |
| rs35705950 | 11p15.5 | 1 241 221 | T | 0.108 | 6.18 (4.28–8.93) | 2.69×10−22 | MUC5B | 3.1 kb 5′ of MUC5B |
| rs200243273 | 11p15.5 | 1 266 716 | C | 0.227 | 0.27 (0.17–0.43) | 3.55×10−8 | MUC5B/RP11-532E4.2 | Missense/intronic |
| rs4963073 | 11p15.5 | 1 362 949 | G | 0.300 | 3.23 (2.12–4.921) | 4.91×10−8 | CTD-2245O6.1 | 31 kb 3′ of CTD-2245O6.1 |
| rs4963072 | 11p15.5 | 1 362 953 | G | 0.300 | 3.34 (2.19–5.11) | 2.63×10−8 | CTD-2245O6.1 | 31 kb 3′ of CTD-2245O6.1 |
| rs71469892 | 11p15.5 | 1 416 119 | G | 0.491 | 0.22 (0.15–0.31) | 2.15×10−16 | BRSK2 | Intronic |
| rs145898170 | 14q21.3 | 47 574 913 | G | 0.458 | 0.47 (0.36–0.62) | 4.71×10−8 | MDGA2 | Intronic |
| rs199838022 | 14q21.3 | 47 574 922 | C | 0.458 | 0.45 (0.34–0.57) | 7.14×10−9 | MDGA2 | Intronic |
| rs12586854 | 14q21.3 | 47 576 151 | T | 0.428 | 0.18 (0.12–0.26) | 6.81×10−19 | MDGA2 | Intronic |
| rs11157543 | 14q21.3 | 47 576 203 | C | 0.300 | 0.13 (0.07–0.22) | 5.97×10−14 | MDGA2 | Intronic |
| rs11157544 | 14q21.3 | 47 576 205 | C | 0.427 | 0.30 (0.21–0.41) | 1.37×10−13 | MDGA2 | Intronic |
| rs12586856 | 14q21.3 | 47 576 217 | G | 0.304 | 0.18 (0.11–0.29) | 3.77×10−13 | MDGA2 | Intronic |
| rs11157545 | 14q21.3 | 47 576 231 | T | 0.463 | 0.44 (0.34–0.59) | 7.81×10−9 | MDGA2 | Intronic |
| rs183643415 | 14q21.3 | 47 576 246 | A | 0.182 | 0.04 (0.01–0.12) | 2.77×10−8 | MDGA2 | Intronic |
| rs150322840 | 14q21.3 | 47 576 252 | T | 0.216 | 0.13 (0.07–0.24) | 3.90×10−10 | MDGA2 | Intronic |
| rs8005465 | 14q21.3 | 47 716 040 | A | 0.461 | 0.37 (0.27–0.51) | 4.41×10−10 | MDGA2 | Intronic |
| rs543453148 | 14q21.3 | 47 751 911 | A | 0.004 | 25.22 (8.29–76.73) | 1.30×10−8 | MDGA2 | Intronic |
| rs12890180 | 14q21.3 | 47 788 012 | G | 0.393 | 0.39 (0.28–0.53) | 2.91×10−9 | MDGA2 | Intronic |
| rs73251857 | 14q21.3 | 47 800 734 | G | 0.154 | 0.06 (0.02–0.14) | 9.59×10−10 | MDGA2 | Intronic |
| rs7141653 | 14q21.3 | 47 828 946 | C | 0.363 | 0.25 (0.17–0.37) | 1.65×10−11 | MDGA2 | Intronic |
| rs7145329 | 14q21.3 | 47 931 577 | T | 0.376 | 0.34 (0.24–0.49) | 2.59×10−9 | MDGA2 | Intronic |
| rs4900770 | 14q21.3 | 47 938 755 | A | 0.498 | 0.38 (0.28–0.53) | 9.10×10−9 | MDGA2 | Intronic |
| rs58731325 | 14q21.3 | 48 009 745 | G | 0.469 | 0.34 (0.25–0.47) | 7.35×10−11 | MDGA2 | Noncoding transcript/intronic |
| rs115811519 | 17q21.31 | 43 677 790 | C | 0.070 | 4.93 (2.83–8.58) | 1.68×10−8 | RP11-707O23.1 | 7 kb 3′ of RP11-707O23.1 |
| rs56383763 | 17q21.31 | 43 682 323 | C | 0.242 | 0.07 (0.03–0.16) | 1.75×10−9 | CTC-501O10.1 | 17 kb 5′ of CRHR1 |
| rs373417 | 17q21.31 | 43 691 173 | T | 0.239 | 0.10 (0.05–0.20) | 1.24×10−10 | CRHR1 | 6.5 kb 5′ of CRHR1 |
| rs7221124 | 17q21.31 | 43 764 301 | A | 0.265 | 0.04 (0.02–0.09) | 6.70×10−14 | CRHR1 | Intronic |
| rs55938136 | 17q21.31 | 43 798 360 | A | 0.018 | 151.90 (62.14–371.50) | 3.37×10−28 | CRHR1 | Intronic |
| rs11870844 | 17q21.31 | 44 141 279 | A | 0.257 | 3.98 (2.82–5.62) | 3.85×10−15 | KANSL1 | Intronic |
| rs371996525 | 17q21.31 | 44 183 317 | A | 0.244 | 0.04 (0.02–0.11) | 2.17×10−10 | KANSL1 | Intronic |
| rs142920272 | 17q21.31 | 44 301 840 | C | 0.248 | 0.10 (0.05–0.20) | 7.45×10−11 | KANSL1 | Intronic |
| rs2668637 | 17q21.31 | 44 322 960 | G | 0.095 | 5.32 (3.09–9.13) | 1.43×10−9 | KANSL1/LRRC37A | Intergenic |
| rs2696618 | 17q21.31 | 44 325 635 | C | 0.249 | 6.74 (4.02–11.31) | 5.09×10−13 | KANSL1/LRRC37A | 23 kb 5′ of KANSL1 |

SNV: single nucleotide variant; Chr.: chromosome; MAF: minor allele frequency. #: MAF in Europeans from the 1000 Genomes Project (low-frequency variants in italic); ¶: because of their complete linkage disequilibrium, these variants can be merged into rs71464134. The functional information provided corresponds to the predicted change for the merged reference sequence.

FIGURE 1

Regional association plots of a) 11p15.5, b) 14q21.3 and c) 17q21.31 with annotations of previously detected signals (rs35705950 in chromosome 11, rs7144383 in chromosome 14 and rs17690703 in chromosome 17). Chromosomal position is shown in Mb. Significance is represented on a −log10(p-value) scale. A threshold minor allele frequency in controls of 0.05 was used to stratify the results derived by common versus low-frequency variants. Colours reflect linkage disequilibrium (r²) values against the top hit on each region according to the European population data from the 1000 Genomes Project.

FIGURE 2

Detailed pile-up view of sequence reads mapping and Sanger sequencing results of the two MUC5AC variants affecting the missense change.

Elucidating distinctive gene contributions in the 11p15.5 region

Previous evidence highlights the importance of the 11p15.5 region harbouring mucin genes and TOLLIP in IPF susceptibility and mortality [12-16]. Given the novel finding of hits in MUC5AC, we performed further association analyses focusing on the topmost significant risk variant combinations from this region. In total, seven risk variants showing variable linkage disequilibrium relationships in the discovery study (figure 3) resided in the 11p15.5 captured by NGS experiments. These variants from three genes (MUC5AC, MUC5B and TOLLIP) were used to reconstruct the 25 most common haplotypes as a result of distinct gene combinations (supplementary table S3). 12 of these were associated with IPF irrespective of the model adjustments, eight of them with statistically significant risk effects. Among all risk combinations, those defined by MUC5AC together with MUC5B variants showed the largest effect (OR 6.44; p≤1.3×10−11), while intermediate ORs in the range of 3.39–4.03 (p≤1.8×10−4) were generally found for combinations containing variants from each of these two genes separately. A formal association test of any of the genome-wide significant 11p15.5 variants in the discovery study conditioned to rs35705950 of MUC5B resulted in attenuation of all the signals (table 3). However, they remained nominally significant for the two MUC5AC variants (rs34474233 and rs34815853: p≤6.27×10−3), suggesting an additional contribution to IPF risk. The haplotypes of any of the TOLLIP risk variants with those from MUC5AC and/or MUC5B had no evident effects in terms of the odds ratios or significance.
FIGURE 3

Linkage disequilibrium plot of r² and D′ estimates in the discovery study for the risk variants in MUC5AC, MUC5B and TOLLIP. Each diamond of the linkage disequilibrium plot represents a pairwise comparison, with its values schematically symbolised by a colour gradient, ranging from red (stronger linkage disequilibrium) to white (reduced linkage disequilibrium).
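The two pairwise statistics shown in the plot can be computed from haplotype and allele frequencies, as in the minimal sketch below. It uses the standard definitions of D, D′ and r² for two biallelic loci; the function name is hypothetical and the example frequencies are illustrative rather than taken from the study data.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic, polymorphic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Illustrative case: two alleles at 4.4% that always travel together.
d_prime, r2 = ld_stats(p_ab=0.044, p_a=0.044, p_b=0.044)
print(f"D' = {d_prime:.2f}, r2 = {r2:.2f}")
```

Two variants in complete linkage disequilibrium give D′ and r² of 1, whereas a haplotype frequency equal to the product of the allele frequencies gives 0 for both.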

TABLE 3

Association results of 11p15.5 with or without conditioning on rs35705950

| Nearby gene | Function/location | SNV | Unconditioned p-value | Conditioned p-value |
|---|---|---|---|---|
| MUC5AC | Missense (Ala5353Lys) | rs34474233# | 2.99×10−9 | 4.12×10−3 |
| MUC5AC | Missense (Ala5353Lys) | rs34815853# | 4.15×10−9 | 6.27×10−3 |
| MUC5B | 8.1 kb 5′ of MUC5B | rs12802931 | 3.72×10−16 | 0.731 |
| MUC5B/RP11-532E4.2 | Missense/intronic | rs200243273 | 3.55×10−8 | 1.44×10−4 |
| CTD-2245O6.1 | 31 kb 3′ of CTD-2245O6.1 | rs4963073 | 4.91×10−8 | 1.48×10−6 |
| CTD-2245O6.1 | 31 kb 3′ of CTD-2245O6.1 | rs4963072 | 2.63×10−8 | 2.66×10−6 |
| BRSK2 | Intronic | rs71469892 | 2.15×10−16 | 1.29×10−9 |

The rs371630624 variant at MUC5AC was excluded from this analysis as it was not supported by Sanger sequencing. #: these variants can be merged into rs71464134.

Of the 36 variants that reached genome-wide significance in the discovery study, 10 variants had nominal significance in the replication study, had the same direction of effects as in the discovery study and resulted in a meta-analysis p<5×10−8: five were located on 11p15.5 and the remaining five on 17q21.31 (table 4). However, only four of them reached the adjusted significance threshold (p<1.4×10−3) in the replication study, all corresponding to MUC5AC and MUC5B genes. Replication was not supported for the 14q21.3 variants. In meta-analysis the most significant findings were those corresponding to MUC5B: rs35705950 (OR 4.90, 95% CI 3.30–7.28; p=9.27×10−57) and the linkage disequilibrium proxy rs12802931 (OR 2.96, 95% CI 1.93–4.53; p=4.60×10−35). Most importantly, these results strongly supported the association of the two MUC5AC variants rs34474233 (OR 3.39, 95% CI 2.65–4.32; p=2.27×10−22) and rs34815853 (OR 3.37, 95% CI 2.64–4.30; p=3.02×10−22) predicting a missense change in the protein.
TABLE 4

Variants showing nominal significance in the replication study, with the same direction of effects as in the discovery study and that met the genome-wide significance level in the meta-analysis

| SNV | Chr. | Position (hg19) | Gene | Effect/noneffect allele | MAF | Discovery OR (95% CI) | Discovery p-value | Replication OR (95% CI) | Replication p-value | Meta-analysis OR (95% CI) | Meta-analysis p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|
| rs34474233 | 11 | 1 219 152 | MUC5AC | A/G | 0.044 | 4.08 (2.56–6.49) | 2.99×10−9 | 3.15 (2.37–4.20) | 4.10×10−14 | 3.39 (2.65–4.32) | 2.27×10−22 |
| rs34815853 | 11 | 1 219 153 | MUC5AC | A/C | 0.044 | 4.01 (2.53–6.37) | 4.15×10−9 | 3.16 (2.37–4.20) | 4.13×10−14 | 3.37 (2.64–4.30) | 3.02×10−22 |
| rs12802931 | 11 | 1 236 164 | MUC5B | G/A | 0.183 | 3.76 (2.73–5.16) | 3.72×10−16 | 2.42 (2.02–2.90) | 6.07×10−22 | 2.96 (1.93–4.53) | 4.60×10−35 |
| rs35705950 | 11 | 1 241 221 | MUC5B | T/G | 0.108 | 6.18 (4.28–8.94) | 2.69×10−22 | 4.11 (3.31–5.11) | 1.86×10−37 | 4.90 (3.30–7.28) | 9.27×10−57 |
| rs4963072 | 11 | 1 362 953 | CTD-2245O6.1 | G/C | 0.300 | 3.34 (2.18–5.11) | 2.63×10−8 | 1.29 (1.08–1.54) | 5.30×10−3 | 1.59 (0.38–6.65) | 4.91×10−8 |
| rs56383763 | 17 | 43 682 323 | CTC-501O10.1 | C/T | 0.242 | 0.07 (0.03–0.16) | 1.75×10−9 | 0.82 (0.68–0.97) | 2.42×10−2 | 0.24 (0.02–2.82) | 2.13×10−8 |
| rs373417 | 17 | 43 691 173 | CRHR1 | T/C | 0.239 | 0.10 (0.05–0.20) | 1.24×10−10 | 0.82 (0.69–0.98) | 2.72×10−2 | 0.29 (0.04–2.36) | 1.59×10−9 |
| rs371996525 | 17 | 44 183 317 | KANSL1 | A/C | 0.244 | 0.04 (0.02–0.11) | 2.17×10−10 | 0.80 (0.67–0.95) | 1.26×10−2 | 0.19 (0.01–3.44) | 1.98×10−9 |
| rs142920272 | 17 | 44 301 840 | KANSL1 | C/T | 0.248 | 0.10 (0.05–0.20) | 7.45×10−11 | 0.83 (0.69–0.98) | 3.07×10−2 | 0.29 (0.04–2.35) | 1.11×10−9 |
| rs2696618 | 17 | 44 325 635 | KANSL1/LRRC37A | C/G | 0.249 | 6.74 (4.02–11.31) | 5.09×10−13 | 1.25 (1.05–1.49) | 1.05×10−2 | 2.28 (0.28–18.51) | 4.40×10−12 |

SNV: single nucleotide variant; Chr.: chromosome; MAF: minor allele frequency.

Discussion

In recent years, there has been growing evidence that genetic factors play an important role in IPF. However, a large fraction of genetic risk remains unexplained [22]. Here, we screened 1.7 Mb from three loci of interest and tested association for IPF susceptibility in two stages comprising 4650 unrelated subjects. The initial stage identified 36 variants (average MAF 26.6% in controls) that reached genome-wide significance. Three of these constituted validated low-frequency SNVs (<5%), suggesting a minor impact of low-frequency variants in IPF susceptibility in these loci. By locus, the top signals at 11p15.5 reinforced that the strongest risk corresponds to the previously described MUC5B promoter variant rs35705950. Besides this, two tightly linked low-frequency SNVs at 11p15.5 (rs34474233 and rs34815853) that predicted the p.Ala5353Lys amino acid change in MUC5AC were associated with IPF for the first time. Studies conditioned on rs35705950 of MUC5B and replication of results in independent case–control samples further supported that the MUC5AC p.Ala5353Lys change has an additional contribution to IPF risk. Regarding the results of 14q21.3 and 17q21.31, we observed no evidence of replication based on an independent study with larger sample size. Besides this, none of the two top hits that have been linked to TOLLIP in the literature [12] were nominally significant in the discovery study despite the large overlap of samples (60.8%). This would have been determined by the statistical power of the discovery, as it was <35% for detecting the reported effects of these variants (not shown). Thousands of genetic variants have been reported for association with complex traits [27], the majority being frequent in the population (>5%) [28]. The contribution of low-frequency variants (<5%) in diseases such as those affecting blood lipid levels and cardiovascular disease [29], among others, has just started to be unearthed, facilitated by exome and whole-genome sequencing. 
There are only a few GWASs of IPF completed so far, all showing risk loci linked to common variants (MAF 11–54% for European ancestry populations) [6, 12, 16, 22]. Besides rs35705950 of MUC5B, which has a strong effect in the disease [12-16], other common SNVs associated thus far have milder effects with regard to IPF risk. One of the possibilities underlying these GWAS signals is the existence of underlying low-frequency variants with strong disease effects, as has been recently demonstrated for other well-known IPF genes by exome sequencing experiments (TERT, RTEL1 and PARN) [30]. In that scenario, such variation would be better ascertained for disease significance through NGS as conducted in this study. Thus, we focused on three genomic loci to uncover low-frequency variants with strong effects in IPF. Notably, we recognise the limitation to provide precise evaluations of low-frequency variants in IPF due to the small discovery sample size. However, our results support some contribution from low-frequency SNVs to IPF susceptibility in these loci, given that among the 36 genome-wide significant hits, only three of them with validation support (two in 11p15.5 and one in 17q21.31) were low-frequency variants in the controls. Despite that, two of these variants result in the same missense amino acid change for MUC5AC, encoded by another mucin gene located in 11p15.5. As MUC5AC p.Ala5353Lys was observed in 4.4% of controls and in as much as 13.8% of the cases, it associates with a relatively strong effect on IPF susceptibility. Hypersecretion of mucins, most abundantly the glycoproteins MUC5AC and MUC5B, is common during respiratory tract inflammation via cytokine stimulation (interleukin-13 and epidermal growth factor) [31]. Chronic hypersecretion and changes in the mucus viscosity can promote its accumulation in the airways, compromising the immune response and perpetuating tissue damage, leading to disease exacerbations [17]. 
Although the common variant rs35705950 of MUC5B results in increased mucin gene expression in lung tissues [14, 32] and is the strongest known risk factor for IPF, the exact mechanistic links between the enhanced production of this mucin and the development of IPF are incompletely understood. In IPF, overexpression of MUC5B and reduced expression of MUC5AC have been described in goblet cells located in the lung lesions in comparison with controls [32, 33]. Regulation of MUC5AC derives from the activation of cellular stress, damage and repair pathways, suggesting a key role during disrupted homeostasis [31]. Its activity has been implicated in epithelial wound healing after mucosal injury [34]. Therefore, aberrant upregulation of MUC5B and downregulation or activity alterations of MUC5AC may synergise to alter mucus cell differentiation [35] and disrupt epithelial organisation. We speculate that the MUC5AC p.Ala5353Lys variant may, therefore, be promoting mucus production, either by directly increasing MUC5AC or indirectly by triggering further increases of MUC5B in the bronchiole [36]. Alternatively, altered glycosylation of this mucin could also contribute to impaired tissue remodelling and promote the disease [37]. Collectively, this evidence along with the results from our study mark MUC5AC as another biologically plausible IPF susceptibility gene. Further experiments will be needed to evaluate the potentially relevant cellular mechanisms. One of the strengths of this study is that we have provided fine-grained variant information from entire and well-recognised IPF loci, enlarging the spectrum of frequencies for SNVs in entire genes and flanking regions involved in the previously evidenced GWAS hits [12]. This is an important contribution as the bulk (>90%) of genetic risk factors involved in complex traits are located in noncoding sequences, supporting the weight of variation regulating transcription in the susceptibility of complex diseases [38]. 
Moreover, despite the challenge of sequencing the inaccessible repetitive mucin-encoding regions [39], our analytic procedures kept false variant calls at low levels. These robust results were made possible by the high mean depth of coverage reached in the sequencing experiments (>100×). This challenging task, however, imposed some major limitations. First, the discovery study was greatly facilitated by the use of a public database of controls for association testing, which is suitable and advantageous in NGS-based disease mapping approaches [40]. However, because the sequencing depth differed between cases and controls, the quality of the sequencing results most likely differed between them as well, which carries a considerable risk of sequencing artefacts and other technical issues that can introduce systematic errors [40]. To minimise this possibility, we used an orthogonal sequencing method to validate the key findings and replicated the results in independent cases and controls. Further NGS studies with larger sample sizes will help to assess the impact of known and unknown genetic variation in these regions, as well as the role of other types of variants besides SNVs.
Please note: supplementary material is not edited by the Editorial Office, and is uploaded as it has been supplied by the author. Supplementary material: 00071-2019_supplementary_material
References (40 in total, 10 listed)

1.  Principal components analysis corrects for stratification in genome-wide association studies.

Authors:  Alkes L Price; Nick J Patterson; Robert M Plenge; Michael E Weinblatt; Nancy A Shadick; David Reich
Journal:  Nat Genet       Date:  2006-07-23       Impact factor: 38.330

Review 2.  Regulation of airway mucin gene expression.

Authors:  Philip Thai; Artem Loukoianov; Shinichiro Wachi; Reen Wu
Journal:  Annu Rev Physiol       Date:  2008       Impact factor: 19.318

3.  Potential etiologic and functional implications of genome-wide association loci for human diseases and traits.

Authors:  Lucia A Hindorff; Praveen Sethupathy; Heather A Junkins; Erin M Ramos; Jayashri P Mehta; Francis S Collins; Teri A Manolio
Journal:  Proc Natl Acad Sci U S A       Date:  2009-05-27       Impact factor: 11.205

4.  A genome-wide association study identifies an association of a common variant in TERT with susceptibility to idiopathic pulmonary fibrosis.

Authors:  T Mushiroda; S Wattanapokayakit; A Takahashi; T Nukiwa; S Kudoh; T Ogura; H Taniguchi; M Kubo; N Kamatani; Y Nakamura
Journal:  J Med Genet       Date:  2008-10       Impact factor: 6.318

5.  Mucin gene expression in intestinal epithelial cells in Crohn's disease.

Authors:  M P Buisine; P Desreumaux; E Leteurtre; M C Copin; J F Colombel; N Porchet; J P Aubert
Journal:  Gut       Date:  2001-10       Impact factor: 23.059

6.  Deletion of exon 4 from human surfactant protein C results in aggresome formation and generation of a dominant negative.

Authors:  Wen-Jing Wang; Surafel Mulugeta; Scott J Russo; Michael F Beers
Journal:  J Cell Sci       Date:  2003-02-15       Impact factor: 5.285

7.  Telomerase mutations in families with idiopathic pulmonary fibrosis.

Authors:  Mary Y Armanios; Julian J-L Chen; Joy D Cogan; Jonathan K Alder; Roxann G Ingersoll; Cheryl Markin; William E Lawson; Mingyi Xie; Irma Vulto; John A Phillips; Peter M Lansdorp; Carol W Greider; James E Loyd
Journal:  N Engl J Med       Date:  2007-03-29       Impact factor: 91.245

8.  Central role of Muc5ac expression in mucous metaplasia and its regulation by conserved 5' elements.

Authors:  Hays W J Young; Olatunji W Williams; Divay Chandra; Lindsey K Bellinghausen; Guillermina Pérez; Alberto Suárez; Michael J Tuvim; Michelle G Roy; Samantha N Alexander; Seyed J Moghaddam; Roberto Adachi; Michael R Blackburn; Burton F Dickey; Christopher M Evans
Journal:  Am J Respir Cell Mol Biol       Date:  2007-04-26       Impact factor: 6.914

9.  Adult-onset pulmonary fibrosis caused by mutations in telomerase.

Authors:  Kalliopi D Tsakiri; Jennifer T Cronkhite; Phillip J Kuan; Chao Xing; Ganesh Raghu; Jonathan C Weissler; Randall L Rosenblatt; Jerry W Shay; Christine Kim Garcia
Journal:  Proc Natl Acad Sci U S A       Date:  2007-04-25       Impact factor: 11.205

10.  Short telomeres are a risk factor for idiopathic pulmonary fibrosis.

Authors:  Jonathan K Alder; Julian J-L Chen; Lisa Lancaster; Sonye Danoff; Shu-chih Su; Joy D Cogan; Irma Vulto; Mingyi Xie; Xiaodong Qi; Rubin M Tuder; John A Phillips; Peter M Lansdorp; James E Loyd; Mary Y Armanios
Journal:  Proc Natl Acad Sci U S A       Date:  2008-08-27       Impact factor: 11.205

