
Genome-wide association and linkage identify modifier loci of lung disease severity in cystic fibrosis at 11p13 and 20q13.2.

Fred A Wright1, Lisa J Strug, Vishal K Doshi, Clayton W Commander, Scott M Blackman, Lei Sun, Yves Berthiaume, David Cutler, Andreea Cojocaru, J Michael Collaco, Mary Corey, Ruslan Dorfman, Katrina Goddard, Deanna Green, Jack W Kent, Ethan M Lange, Seunggeun Lee, Weili Li, Jingchun Luo, Gregory M Mayhew, Kathleen M Naughton, Rhonda G Pace, Peter Paré, Johanna M Rommens, Andrew Sandford, Jaclyn R Stonebraker, Wei Sun, Chelsea Taylor, Lori L Vanscoy, Fei Zou, John Blangero, Julian Zielenski, Wanda K O'Neal, Mitchell L Drumm, Peter R Durie, Michael R Knowles, Garry R Cutting.   

Abstract

A combined genome-wide association and linkage study was used to identify loci causing variation in cystic fibrosis lung disease severity. We identified a significant association (P = 3.34 × 10⁻⁸) near EHF and APIP (chr11p13) in p.Phe508del homozygotes (n = 1,978). The association replicated in p.Phe508del homozygotes (P = 0.006) from a separate family-based study (n = 557), with P = 1.49 × 10⁻⁹ for the three-study joint meta-analysis. Linkage analysis of 486 sibling pairs from the family-based study identified a significant quantitative trait locus on chromosome 20q13.2 (log₁₀ odds = 5.03). Our findings provide insight into the causes of variation in lung disease severity in cystic fibrosis and suggest new therapeutic targets for this life-limiting disorder.

Year:  2011        PMID: 21602797      PMCID: PMC3296486          DOI: 10.1038/ng.838

Source DB:  PubMed          Journal:  Nat Genet        ISSN: 1061-4036            Impact factor:   38.330


Lung disease is the major source of morbidity and mortality in cystic fibrosis (CF), a recessive disorder caused by mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene. Allelic variation in CFTR does not explain the wide variation in severity of lung disease[1]; however, studies of twins and siblings demonstrate substantial heritability underlying differences in lung function measures in CF patients (h² > 0.5)[2]. Candidate gene studies have produced conflicting results, with only a few large-scale replications accounting for a small proportion of heritable variation in CF lung function[3,4]. Identification of other genetic modifiers could reveal potential mechanisms for variation in lung function in CF, as well as for common diseases such as chronic obstructive pulmonary disease (COPD), and suggest new targets for intervention. Whole-genome methods provide an attractive approach to identify modifier loci of Mendelian disorders. However, CF presents numerous challenges, such as (1) collecting multiple years of lung function measures to accurately classify lung disease severity; (2) selecting the appropriate study design to identify common and rare variants; (3) accruing sufficient sample sizes; and (4) accounting for potential interaction between CFTR and modifier loci. To overcome these challenges, we formed a North American CF Gene Modifier Consortium to identify modifiers of lung disease severity and other phenotypes. For lung disease in CF, the forced expiratory volume in 1 second (FEV1) is the most clinically useful measure of lung disease severity and is a well-established predictor of survival[5,6]. However, comparison of FEV1 measures across a broad age range of CF patients is confounded by decline with age and mortality attrition.
To account for these confounders, the Consortium developed a quantitative lung disease phenotype based on multiple measures of FEV1 over 3 years[7] that displays robust genetic influence (h² = 0.51)[8]. The Consortium is composed of three samples of CF patients recruited using different study designs. The Genetic Modifier Study (GMS) consists of unrelated patients homozygous for the common CF allele F508del (HGVS nomenclature: p.Phe508del), recruited from extremes of lung function[9]. The Canadian Consortium for Genetic Studies (CGS) enrolled unrelated patients having pancreatic insufficiency from a population-based sample[10]. The CF Twin and Sibling Study (TSS) recruited families where two or more surviving children have CF[2]. The GMS and CGS were designed for association analysis, while the TSS was designed for both linkage and association, providing an opportunity to detect rarer variants or poorly tagged loci. As many current genome-wide association studies (GWASs) employ sample sizes that are several-fold larger than those available for the CF population, we sought to maximize power by (1) testing association using combined data from GMS and CGS, followed by replication using the association evidence from TSS, and (2) testing linkage using the TSS, followed by SNP association testing in linked regions in the unrelated patients in GMS and CGS. We also restricted analysis to patients bearing two severe loss-of-function CFTR alleles and to a subset of these patients who had identical CFTR genotypes (homozygosity for F508del).

RESULTS

Genome-wide association analysis of lung disease severity in CF

A total of 3,467 CF patients are represented in three study designs (Table 1, Supplementary Note). Patients in the GMS and 60% of the patients in the CGS and TSS are F508del homozygotes (F508del/F508del), while the remainder has other severe exocrine pancreatic CFTR genotypes[2,9,10]. The three samples showed consistent distributions of the lung disease phenotype, with the mid-range under-represented in GMS due to the extremes-of-phenotype design (Figure 1). Patients were contemporaneously genotyped using the Illumina 610-Quad array® in a single facility with stringent quality control (Online Methods). Association scans for the GMS and CGS used an additive model adjusted for sex and principal components as described[11]. Results were combined using a directional meta-analysis approach for (1) GMS and CGS, n=2,494 and (2) GMS and CGS F508del/F508del, n=1,978 (power analysis shown in Supplementary Figure 1).
Table 1

Characteristics of patients enrolled by the three studies comprising the North American CF Gene Modifier Consortium

Characteristic | Genetic Modifier Study (GMS) | Canadian Consortium for Genetic Studies (CGS) | Twins & Sibs Study (TSS)
Lead Institution(s) | Univ. of North Carolina/Case Western | Hosp. Sick Children | Johns Hopkins
Design | Extremes-of-Phenotype Unrelated | Population-Based Unrelated | Family-Based
Type of Evidence | Association | Association | Linkage and association
Number of patients | 1,137 (Severe, n = 406; Mild, n = 731) | 1,357 | 973 (a) (486 sibling pairs)
Age, Mean ± SD (yrs) | Severe: 15.2 ± 4.6; Mild: 27.5 ± 9.8 | 18.5 ± 9.5 | 15.5 ± 7.8
Age, Range (yrs) | Severe: 8-25; Mild: 15-56 | 6-49 | 6-55
Male n (%) | Severe: 194 (47.8%); Mild: 405 (55.4%) | 734 (54.1%) | 521 (53.5%)
Caucasian n (%) (b) | 1,137 (100.0%) | 1,180 (87.0%) | 898 (92.3%)
F508del/F508del n (%) | 1,137 (100.0%) | 841 (62.0%) | 557 (57.2%)
Pancreatic Exocrine Insufficient n (%) | 1,137 (100.0%) | 1,357 (100.0%) | 973 (100.0%)

(a) 420 two-sib families, 20 three-sib families, 1 four-sib family and 69 singletons.

(b) Based on self-identified ancestry and principal components analysis.

Figure 1

Histograms of the Consortium lung phenotype for the three cystic fibrosis studies show similar average phenotypes. The phenotype mean is above zero due to a lower bound placed by the survival correction, as well as cohort effects of improving lung function. (a) The two designs using unrelated individuals. All of the patients in the Genetic Modifier Study (GMS) are F508del/F508del at CFTR. These patients were oversampled at extremes of an initial entry phenotype, in order to improve power, and the original severe/mild designations are colored separately. In contrast, the Canadian Consortium for Genetic Studies (CGS) is population based, representing a range of pancreatic insufficient CFTR genotypes. (b) Patients enrolled in the family-based Twin and Sibling Study (TSS) show a similar distribution of the Consortium lung phenotype as the population-based CGS.

The combined GMS and CGS analysis identified seven regions with suggestive association (P ≤ 1/570,725 = 1.75 × 10⁻⁶) (Figure 2 and Table 2). When analysis was restricted to F508del/F508del patients, the EHF-APIP region on 11p13 achieved genome-wide significance at rs12793173 (P = 3.34 × 10⁻⁸, explaining 1.0% of the phenotype variation in GMS and 2.2% in CGS F508del/F508del). We verified the significance by permutation analysis and by developing an alternative conditional likelihood approach that acknowledged the GMS extremes-of-phenotype design (Online Methods, Supplementary Figure 2). With the inclusion of CF-relevant covariates (sex, BMI and previously associated genes), association for rs12793173 was even stronger (P = 9.42 × 10⁻⁹ for GMS and CGS F508del/F508del; Supplementary Table 1). Two purported modifiers of CF lung disease, TGFB1 and IFRD1, did not achieve genome-wide significance. TGFB1 did, however, achieve P-values in the range of 10⁻³ to 10⁻⁴ in the GMS sample, depending on additional covariates (Supplementary Table 1).
Figure 2

Genome-wide Manhattan plots for the cystic fibrosis Consortium lung function phenotype, combining the association evidence from GMS and CGS samples across 570,725 SNPs. The black dashed line represents the Bonferroni threshold for genome-wide α = 0.05, while the green dashed line is the suggestive association threshold, expected once per genome scan. SNPs are plotted in Mb relative to their position on each chromosome (alternating blue and black). (a) Results from GMS (n = 1,137, all of whom are F508del/F508del) combined with all of the CGS patients (n = 1,357). Seven regions reach suggestive significance. (b) Results from the combined evidence of GMS (n = 1,137) and the CGS F508del/F508del (n = 841). A region on chromosome 11p13 reaches genome-wide significance (P = 3.34 × 10⁻⁸).

Table 2

Significant and suggestive association results for GMS and CGS, with replication values for TSS

SNP | Chr | Base pair (a) | Nearest gene | Category (b) | Risk allele (c) | Non-risk allele (c) | (Minor allele) Freq. (d) | GMS coef. (e) | CGS F508del/F508del coef. | CGS All coef. | Analysis with max significance | P-value: GMS+CGS F508del/F508del | P-value: GMS+CGS All | P-value: TSS (f) | P-value: Joint (g)
rs12793173 | 11 | 34,790,780 | APIP/EHF | Significant | C | T | (C) 0.24 | 0.16 | 0.20 | 0.12 | GMS+CGS F508del/F508del | 3.34E-08 | 1.76E-06 | 0.006 | 1.49E-09
rs1403543 | X | 115,216,220 | AGTR2 | Suggestive | A | G | (G) 0.49 | 0.22 | 0.07 | 0.11 | GMS+CGS All | 1.61E-05 | 2.58E-07 | 0.053 | 1.71E-06
rs9268905 | 6 | 32,540,055 | HLA-DRA | Suggestive | C | G | (C) 0.32 | 0.16 | 0.10 | 0.12 | GMS+CGS All | 1.42E-05 | 2.81E-07 | 0.032 | 1.21E-07
rs4760506 | 12 | 91,857,181 | EEA1 | Suggestive | G | A | (A) 0.45 | 0.16 | 0.10 | 0.10 | GMS+CGS All | 6.77E-06 | 8.56E-07 | 0.594 | 9.15E-05
rs12883884 | 14 | 69,586,936 | SLC8A3 | Suggestive | T | G | (G) 0.39 | 0.12 | 0.15 | 0.12 | GMS+CGS All | 1.20E-06 | 9.56E-07 | 0.223 | 7.81E-06
rs12188164 | 5 | 481,236 | AHRR | Suggestive | A | C | (A) 0.38 | 0.08 | 0.12 | 0.15 | GMS+CGS All | 5.92E-04 | 1.34E-06 | 0.136 | 3.65E-06
rs11645366 | 16 | 60,934,654 | CDH8 | Suggestive | C | T | (T) 0.23 | 0.17 | 0.13 | 0.13 | GMS+CGS All | 1.23E-05 | 1.52E-06 | 0.182 | 7.03E-06

(a) NCBI build 36.

(b) Significant and suggestive imply P ≤ (0.05/570,725) = 8.76 × 10⁻⁸ or P ≤ (1/570,725) = 1.75 × 10⁻⁶, respectively, for at least one analysis (GMS+CGS F508del/F508del or GMS+CGS All).

(c) Alleles indexed to the forward strand of NCBI build 36; the risk allele is the allele associated with worse lung function.

(d) Minor allele frequencies are listed for GMS+CGS All. Study-specific MAFs are provided in Supplementary Table 1.

(e) Coefficients refer to the average reduction in the Consortium lung phenotype for each copy of the risk allele.

(f) TSS direction-consistent association P-value, for TSS F508del/F508del only or TSS All Patients, selected according to the GMS+CGS result with maximum significance.

(g) Joint meta-analysis P-value for GMS, CGS, and TSS, with selection of patients (F508del/F508del only, or All Patients) according to the GMS+CGS result with maximum significance.

The SNPs in the significant region and the six suggestive regions in GMS and CGS were evaluated for association in TSS using Merlin[12], while accounting for family structure. To be consistent with the GMS and CGS allelic effect, each replication test was one-sided, with the TSS sample (all or F508del/F508del patients) for each suggestive SNP chosen to be consistent with the GMS and CGS sample set providing maximum significance. Covariates for sex and four principal components[11] were included for TSS. The SNP attaining genome-wide significance in GMS and CGS (rs12793173, F508del/F508del) demonstrated significant association in the TSS F508del/F508del sample (P = 0.006; Bonferroni-corrected P = 0.041 for the seven replication tests; Table 2). Two of the suggestive SNPs provided modest evidence in TSS: rs9268905 near HLA-DRA (P = 0.032) and rs1403543 near AGTR2 (P = 0.053), with neither significant after correcting for the seven replication tests. We next performed a joint analysis, shown to be more powerful than testing followed by replication[13], using a weighted meta-analysis procedure (Online Methods). Using all patients, rs12793173 attained genome-wide significance (P = 1.12 × 10⁻⁸). For this patient set, rs568529, a SNP in high LD (r² > 0.9) with rs12793173, achieved slightly greater significance (P = 9.75 × 10⁻⁹). As in the earlier analysis, restricting to F508del/F508del patients increased the significance of EHF-APIP (P = 1.49 × 10⁻⁹ for rs12793173 (Table 2), P = 8.28 × 10⁻¹⁰ for rs568529). In the HLA class II region, a SNP (rs2395185, ~1 kb from the suggestive SNP rs9268905 identified from GMS and CGS) approached genome-wide significance using all patients (P = 9.02 × 10⁻⁸; Supplementary Figure 3). SNPs in AGTR2 remained suggestive for all patients (rs5952206, P = 1.25 × 10⁻⁷) and for F508del/F508del patients (rs7060450, P = 3.67 × 10⁻⁷). Figure 3 shows the GMS and CGS results for an 800 kb interval including EHF-APIP.
The minimum P-value appears in an intergenic region 3′ to both EHF and APIP. A second peak at rs286873 (P = 5.62 × 10⁻⁷) near EHF exhibited low linkage disequilibrium (r² < 0.2) with the primary SNP (Figure 3). After conditioning on the primary finding (rs12793173), rs286873 had regional statistical significance (corrected P = 0.0029), suggesting additional regional genetic variants (Supplementary Figure 4). We repeated the testing after MACH imputation[14]. The imputed SNPs in the region identified the same EHF/APIP interval, with minimum P = 1.45 × 10⁻⁸ at rs535719, at a position 19 kb closer to APIP than rs12793173. None of the imputed SNPs produced substantially improved association evidence (Supplementary Figure 5). Neither total copy number nor allele-specific copy number (Online Methods) models met genome-wide significance (illustrative Manhattan plot in Supplementary Figure 6). Finally, after sequencing the exonic regions of EHF and APIP in 48 patients with mild pulmonary disease and 48 patients with severe pulmonary disease from the GMS, no additional genetic variation was found that offered insight into putative modifying roles (data not shown).
Figure 3

A plot of the association evidence in GMS and CGS F508del/F508del in the chromosome 11p13 EHF/APIP region (NCBI build 36, LocusZoom viewer). Colors represent HapMap CEU linkage disequilibrium r² with the most significant SNP, rs12793173 (P = 3.34 × 10⁻⁸). The secondary peak at rs286873 has relatively low r² with the primary peak.

Linkage of lung disease severity in CF to chromosome 20q13.2

Linkage analysis revealed a genome-wide significant multipoint LOD score of 5.03 at rs4811626, located at 53.81 Mb (~85 cM) on chromosome 20q13.2 (nominal P = 7.9 × 10⁻⁷; genome-wide[15] P = 2.3 × 10⁻³; Figure 4). A second, more modest linkage signal was on chromosome 1p22.21, with a multipoint LOD score of 2.48 for rs941031 at 91.07 Mb (119 cM). Inclusion of BMI-Z, an important covariate of CF lung function (Supplementary Table 1), increased the LOD score for the linkage peak on 20q13.2 to 5.72 (genome-wide P = 5.05 × 10⁻⁴ at rs4811645, which is 0.07 cM (0.13 Mb) from rs4811626; Figure 5), while linkage on chromosome 1p22.21 decreased to LOD 1.67. Thus, anthropometric measures are not major contributors to the linkage on 20q13.2 but may be playing a role on 1p22.21. We estimated that the QTL at 20q13.2 accounts for close to 50% of the variation in lung function in the CF sibling pairs (Supplementary Figure 7); however, this estimate is highly likely to be biased upward due to winner's curse[16].
Figure 4

Genome-wide linkage scan for the Consortium lung phenotype of 486 sibling pairs in the family-based TSS, adjusted for sex. A QTL with a genome-wide significant LOD=5.03 was found on 20q13.2. LOD scores with SNPs used in the linkage panel are plotted in cM relative to their position on each chromosome (alternating blue and black).

Figure 5

Regional analysis of the QTL on chromosome 20q13.2. (a) A detailed chromosome 20 linkage plot for the Consortium lung phenotype in the TSS study, with covariate sex (essentially the same result as for no covariates) and with covariates sex and BMI. (b) Association evidence from the GMS and CGS F508del/F508del patients, in the 1-LOD support interval provided by TSS. A region centromeric to CBLN4 and MC3R on 20q13.2 shows suggestive evidence of association, with the greatest evidence at rs6024460 (P = 1.34 × 10⁻⁴).

A region of 1.31 Mb on 20q13.2, demarcated by 1 LOD unit below the maximum (when BMI-Z was used as a covariate), was analyzed for association in the combined GMS and CGS samples. A 16 kb cluster of SNPs in high LD (rs6092179, rs6024437, rs8125625, rs6024454 and rs6024460; r² > 0.8) located ~200 kb from CBLN4 generated the lowest P-values in the combined GMS and CGS F508del/F508del samples (Figure 5). The SNP with the lowest P-value (rs6024460; P = 1.34 × 10⁻⁴) reached regional significance (corrected P = 0.041). Association in the TSS identified a SNP (rs6069437) with marginal association (uncorrected P = 0.014) that displays weak LD with the GMS and CGS cluster of SNPs. Imputation did not identify any SNPs exhibiting a lower P-value for association than rs6024460 (Supplementary Figure 8).

A combined false discovery rate approach corroborates genome-wide significance of loci on chromosomes 11 and 20

To evaluate association and linkage in a single framework, linkage information was used to reprioritize genome-wide association using extensions of the false discovery rate (FDR)[17] via the stratified FDR (SFDR)[18] and weighted FDR (WFDR)[19]. We (1) obtained linkage-weighted q-values representing the combined evidence at each SNP, and (2) re-ranked GWAS results by linkage-weighted q-values (see Online Methods). Results are presented from the WFDR; results were confirmed using the SFDR (data not shown). SNPs with q-values less than 0.05 were declared to be genome-wide significant (Table 3). SNPs in the EHF-APIP region on chromosome 11 are highly significant (low q-values), because of the strong association (Table 3). After accounting for linkage, the q-values for SNPs under the linkage peak on chromosome 20 are considerably decreased. The results presented in Table 3 illustrate that the linked SNPs on chromosome 20 are now top ranked genome-wide, while they were ranked 154th or lower, prior to incorporating the linkage information. The top-ranked SNP by the WFDR analysis was rs6092179 at 53.81 Mb on chr 20 (WFDR q-value=0.015, Table 3). SNP rs6092179 is within an LD block containing 4 other SNPs (rs6024437, rs8125625, rs6024454 and rs6024460), all demonstrating association with CF lung function and q-value <0.05. A rank-based q-value Manhattan plot demonstrates that chromosome 11 and chromosome 20 both attain genome-wide significance (Supplementary Figure 9).
Table 3

Combined association and linkage-weighted FDR q-values and genome-wide ranks for SNPs with WFDR q-values genome-wide significant (< 0.05)

Chr | SNP | Base Pair | GMS+CGS F508del/F508del Association P value | FDR q-value (a) | FDR rank | WFDR q-value (b) | WFDR rank
11 | rs9313897162895 | | 5.08E-06 | 0.0124 | 7 | 0.0383 | 16
11 | rs11032829 | 34705078 | 2.29E-06 | 0.008 | 3 | 0.0277 | 8
11 | rs7924717 | 34732907 | 2.76E-06 | 0.008 | 1 | 0.0218 | 6
11 | rs10466455 | 34737512 | 3.86E-07 | 0.0124 | 6 | 0.0375 | 14
11 | rs7929679 | 34762425 | 1.47E-06 | 0.008 | 5 | 0.0282 | 10
11 | rs10836312 | 34767019 | 1.56E-07 | 0.008 | 2 | 0.0277 | 7
11 | rs525202 | 34778524 | 1.34E-07 | 0.008 | 4 | 0.0277 | 9
20 | rs7265042 | 53790816 | 1.14E-03 | 0.7852 | 1976 | 0.0459 | 18
20 | rs6098782 | 53791974 | 1.84E-03 | 0.7349 | 865 | 0.0459 | 17
20 | rs910668 | 53794753 | 1.09E-03 | 0.5029 | 175 | 0.015 | 2
20 | rs6092176 | 53799109 | 1.51E-03 | 0.5581 | 255 | 0.015 | 5
20 | rs6092179 | 53812440 | 1.93E-04 | 0.484 | 154 | 0.015 | 1
20 | rs6024437 | 53813962 | 1.61E-04 | 0.7553 | 1116 | 0.0353 | 13
20 | rs8125625 | 53820352 | 2.49E-04 | 0.516 | 207 | 0.015 | 3
20 | rs6024454 | 53826840 | 2.56E-04 | 0.7615 | 1348 | 0.0381 | 15
20 | rs6024460 | 53828948 | 1.34E-04 | 0.7349 | 854 | 0.0296 | 12
20 | rs11907114 | 53862354 | 2.79E-03 | 0.554 | 250 | 0.015 | 4
20 | rs1326022 | 54277432 | 1.16E-03 | 0.7344 | 824 | 0.0296 | 11

(a) Benjamini-Hochberg approach based on association P-value.

(b) Weighted FDR using combined linkage information and association P-values.

Rows in bold indicate the top-ranked SNPs before incorporating linkage evidence (rs7924717 on chromosome 11) and after (rs6092179 on chromosome 20).

DISCUSSION

We identified two new loci containing genetic variants contributing to variation in lung function in CF patients. The success of this project reflected: 1) coordinated analysis of three independent samples of the CF population (representing ~15% of all patients in North America) where each study subject was characterized by the same quantitative measure of lung function; 2) simultaneous genotyping of samples using a single platform, which allowed for data cleaning using relatedness assessments and removal of poor quality genotypes based on parent-to-child transmission predictions; 3) analyzing for loci with small effect sizes using association, and loci of major effect (even in the presence of substantial allelic heterogeneity) using linkage. Moreover, we garnered increased power from an extremes-of-phenotype sample, while a population-based sample allowed for the development of a phenotype with external validity. The association at chr11p13 is in an intergenic region 3′ to APIP and EHF with regulatory features including: i) significant conservation across species, ii) open chromatin (DNase hypersensitivity and FAIRE-Seq), and iii) DNase hypersensitive patterns suggesting cell-type specificity (http://genome.ucsc.edu). The UCLA Gene Expression Tool (UGET, http://genome.ucla.edu/~jdong/GeneCorr.html)[20,21] indicates correlation of expression of nearby genes, including strong correlation of EHF to ELF5, both epithelial-specific transcription factors; APIP to PDHX, which have the same promoter region; and EHF to APIP. APIP (Apaf-1-interacting protein) is known to inhibit apoptosis by binding to APAF-1, an important activator of caspase-9[22,23], and by APAF-1-independent activation of AKT and ERK1/2[24]. EHF is a member of the epithelial-specific Ets transcription factors that share a conserved Ets domain[25-27].
EHF can be induced in bronchial epithelial cells, smooth muscle cells and fibroblasts[28,29], leading to transcriptional repression of a subset of ETS/AP-1-responsive genes activated by MAP-kinase pathways[26,28], and in airway it may serve as an important regulator of differentiation under conditions of stress and inflammation[26,27]. Both genes show evidence of robust expression in lung and trachea, with APIP showing ubiquitous expression across tissues and EHF showing highest expression in trachea (http://www.ncbi.nlm.nih.gov/UniGene and http://www.ncbi.nlm.nih.gov/geo)[30]. Interestingly, cis-eQTL signatures for APIP are reported for lymphocytes and monocytes (eqtl.uchicago.edu). Comparing the eQTLs to the direction of phenotype-genotype association suggests that increased expression of APIP may be associated with decreased lung function, implying that inhibition of apoptosis worsens CF lung disease. This hypothesis is consistent with the emerging concepts that delayed neutrophil clearance, due to reduced apoptosis in neutrophils in the airways of CF patients, could lead to a hyperinflammatory state and more severe lung disease[31,32] and that inhibition of apoptosis contributes to goblet cell metaplasia, a central feature in CF airway pathophysiology[33]. All 5 genes within the 1-LOD support interval in the chromosome 20 linkage region (Figure 5) are expressed in either fetal or adult lung or in bronchial epithelial cells (http://genome.ucsc.edu/). The 16 kb cluster of SNPs associated with lung function in the GMS and CGS samples is located ~200 kb to 500 kb centromeric to the five genes. None of the SNPs lies within a segment of open chromatin identified in the 16 kb region in Normal Human Bronchial Epithelial cells (http://genome.ucsc.edu). No eQTLs in lymphocytes (eqtl.uchicago.edu), miRNAs (http://www.mirbase.org) or DNase I hypersensitive sites in Small Airway Epithelial cells map to the 16 kb region.
However, this does not exclude the possibility that the associated region regulates expression of any of the five genes or more distant genes. Among the five genes, MC3R has been implicated in weight maintenance and regulation of energy balance in animals and humans[34-36]. Variation in resting energy expenditure has been correlated with lung function measurements, lung tissue damage and lung disease exacerbation in CF patients[37,38]. MC3R has also been implicated as a modulator of neutrophil accumulation in a murine model of lung inflammation[39], a key feature of CF lung disease, as noted above. Other genes of interest within the linkage peak encode Crk-associated substrate scaffolding (CASS) 4 (CASS4/HEPL), a relative of proteins implicated in cell attachment, migration establishing polarity, invasion and phagocytosis of bacterial pathogens[40], and Aurora kinase A (AURKA), which has been shown to interact with Hef1/NEDD9, a member of the CASS family that mediates cytokinesis in late mitosis and facilitates disassembly of primary cilia[41]. Twin studies in adults demonstrate that FEV1 is under strong genetic influence[42,43], and at least three loci (GSTCD, TNS1 and HTR4) have been reproducibly associated with this measure[44-46]. Multiple replicated loci have also been associated with variation in the FEV1/FVC ratio[45,46], and at least two of these loci (HHIP and FAM13) show reproducible association with COPD[44,47,48]. While the lung phenotype used here was based on FEV1, none of the above loci coincides with the regions identified in this study, and neither of the loci identified here occurs within the top 2000 associations for FEV1 or FEV1/FVC[45,46]. Common variation in the EHF/APIP region is estimated to alter the lung function measure in the GMS and CGS F508del/F508del patients by ~0.2 units of the quantitative lung disease phenotype per allele (Table 2).
Translated into more familiar clinical terms, the 0.2 unit difference is approximately equivalent to a mean difference in FEV1 percent predicted of 5.1 ± 1.9, corresponding to a mean difference in FEV1 of 254 ± 86 mL in patients over 18 years of age (Online Methods). The QTL on chromosome 20 may account for a sizeable fraction of lung function variation in CF. Using simulations described by Blangero and colleagues[16], we estimate that this locus accounts for a maximum of 46% and a minimum of 4% of the variance in the CF siblings (Online Methods). In summary, our association and linkage approach provided complementary findings with the identification of two significant loci harboring genes of biologic relevance for CF. Of particular note for modifier searches in other monogenic diseases is the potential importance of minimizing variation in the causative gene. When we confined association analysis to patients with identical CFTR genotypes (i.e., F508del/F508del), one of the 7 suggestive loci achieved genome-wide significance, despite the reduction in sample size due to the exclusion of 38% of subjects in the CGS sample with other CFTR genotypes. The remaining suggestive loci contain biologically intriguing candidate modifiers that will be evaluated in future studies. Finally, the identification of genetic loci that modify lung function in CF should provide new insight leading to the development of novel therapies for this devastating condition.

ONLINE METHODS

Genotyping and quality control

DNA from whole blood or transformed lymphocytes was hybridized to the Illumina 610-Quad® platform at Genome Quebec (McGill University and Genome Quebec Innovation Centre) using 96-well plates with CEPH and one replicate control per plate. Illumina BeadStudio® was used to call genotypes, and identity was confirmed by Sequenom® fingerprinting. SNPs were removed if they were monomorphic, were missing > 10% of calls or had > 1% Mendelian errors in TSS trios. Finally, 570,725 autosomal and X-chromosome SNPs were selected, as well as 158 chromosome Y SNPs and 138 mitochondrial SNPs. Duplicate discordance was 0.004% in GMS, and similar for the other studies. Sample exclusions included: initial call rate below 98%, unexpected close relatives or duplicate enrollments, unresolved sex mismatches, aneuploidy or outlying heterozygosity (> 5 standard deviations from the mean of 31.6%). Overlap with 542 Illumina GoldenGate® SNPs in GMS revealed platform discordance of 0.07%. Families with > 5% Mendelian errors were excluded. Twenty-eight patient samples were excluded (GMS 6, CGS 17, TSS 5) due to genotyping failure or artifacts, two GMS samples were excluded due to outlying ancestry (by PC analysis), and eight GMS samples were excluded for > second degree relation with other samples. Reported findings were verified using Illumina GenomeStudio V1.0.2® module V1.0.10 and manually-assisted calling.
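The SNP-level and sample-level filters described above can be sketched as simple predicates. This is a minimal illustration that assumes per-SNP and per-sample summary statistics have already been computed; the function names and input layout are hypothetical and do not reproduce the BeadStudio or Consortium pipeline, and only the numeric thresholds come from the text.

```python
from statistics import mean, stdev

def keep_snp(minor_allele_count, missing_rate, mendel_error_rate):
    """SNP-level filter using the thresholds stated in the text."""
    return (minor_allele_count > 0          # drop monomorphic SNPs
            and missing_rate <= 0.10        # drop SNPs missing > 10% of calls
            and mendel_error_rate <= 0.01)  # drop SNPs with > 1% Mendelian errors

def heterozygosity_outliers(het_by_sample, n_sd=5.0):
    """Return sample IDs whose heterozygosity lies > n_sd SDs from the mean.

    Assumption: mean and SD are estimated from the same samples being screened.
    """
    values = list(het_by_sample.values())
    mu, sd = mean(values), stdev(values)
    return {s for s, h in het_by_sample.items() if abs(h - mu) > n_sd * sd}

def keep_sample(call_rate, is_het_outlier):
    """Sample-level filter: call rate of at least 98% and no outlying heterozygosity."""
    return call_rate >= 0.98 and not is_het_outlier

# Made-up example values, for illustration only.
print(keep_snp(minor_allele_count=120, missing_rate=0.02, mendel_error_rate=0.0))
print(keep_sample(call_rate=0.975, is_het_outlier=False))
```

Other exclusions listed in the text (relatedness, sex mismatches, aneuploidy) need pedigree or intensity data and are not covered by this sketch.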

Association testing

Regressions for the lung phenotype were performed separately for GMS, all CGS, and CGS F508del/F508del using an additive model in PLINK v. 1.07[49], adjusted for sex and genotype principal components (PCs)[11]. Using the PLINK z-statistics for GMS and CGS, the standard meta-analysis z-statistic[50] was $z = w_{GMS} z_{GMS} + w_{CGS} z_{CGS}$, with weights inversely proportional to standard errors, and common reference alleles for directional consistency. "Suggestive" association used the approximate threshold 1/(number of SNPs) = 1/570,725 = 1.75 × 10⁻⁶, and significant association the Bonferroni threshold P < 0.05/570,725 = 8.76 × 10⁻⁸. For males, X-chromosome genotypes followed PLINK defaults (0 or 1 minor alleles; alternative coding resulted in no qualitative changes). Permutations of genotypes relative to phenotypes and covariates (1,000) were used to refine the thresholds. From this pool of permutations, 10,000 permuted meta-analyses were computed. The obtained significance thresholds for a genome-wide error of 0.05 were P = 1.07 × 10⁻⁷ (GMS and CGS) and P = 1.05 × 10⁻⁷ (GMS and CGS F508del/F508del). Consequently, P < 5 × 10⁻⁸ achieves false positive error control at genome-wide α < 0.05, even correcting for two separate GWAS analyses. Regional multiple-comparisons correction (after highlighting a region) used the Bonferroni correction for the regional SNPs. TSS association analysis was performed in 973 CF siblings and for the 557-patient F508del/F508del subset using the Merlin variance-components additive model framework[12], corrected for linkage, family structure, sex, and 4 PCs. Missing genotypes (0.125%) were inferred to increase power[51]. Joint analyses of GMS, CGS and TSS used the meta-analysis approach described above.
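The two thresholds and the weighted combination of study-level z-statistics can be sketched as follows. This is an illustration only: it assumes the weights are rescaled so that their squares sum to one (which keeps the combined statistic standard normal under the null), the function names are hypothetical, and the example z-scores and standard errors are made up rather than taken from the study.

```python
import math
from statistics import NormalDist

N_SNPS = 570_725  # number of SNPs tested, as reported in the text

def thresholds(n_snps=N_SNPS, alpha=0.05):
    """Return (significant, suggestive) P-value thresholds."""
    return alpha / n_snps, 1.0 / n_snps

def meta_z(z_scores, std_errors):
    """Combine per-study z-statistics with weights proportional to 1/SE.

    Assumption: weights are scaled so that sum(w**2) == 1.
    All z-scores must be signed relative to the same reference allele.
    """
    raw = [1.0 / se for se in std_errors]
    norm = math.sqrt(sum(w * w for w in raw))
    weights = [w / norm for w in raw]
    return sum(w * z for w, z in zip(weights, z_scores))

def two_sided_p(z):
    """Two-sided P-value for a standard normal statistic."""
    return 2.0 * NormalDist().cdf(-abs(z))

significant, suggestive = thresholds()
print(f"significant: {significant:.2e}, suggestive: {suggestive:.2e}")

z = meta_z([4.0, 3.5], [0.03, 0.04])  # made-up inputs for two studies
print(round(z, 3), two_sided_p(z) < significant)
```

Because the weights depend only on the standard errors, the same function extends to three studies for a joint analysis by passing three z-scores and three standard errors.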

A combined conditional likelihood approach

We devised a novel approach using the assumption that CGS represents a random population sample, whereas GMS sampling was conditional on the observed phenotypes. Letting g be the number of SNP minor alleles, the phenotypes y were pre-adjusted for sex and the study-specific PCs. We assumed an additive model $y = \beta_0 + \beta_1 g + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$. The full likelihood conditioned on GMS sampling was $L = \prod_{i \in CGS} f(y_i \mid g_i) \prod_{j \in GMS} P(g_j \mid y_j)$, where $P(g \mid y) = f(y \mid g) P(g) / \sum_{g'} f(y \mid g') P(g')$. Finally, we computed the SNP-specific statistic 2 × (log-likelihood ratio), with $\beta_1 = 0$ as the null, and compared it to $\chi^2_1$. The approach assumes the effect sizes are the same in GMS and CGS, which is true under the null.
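One way to read the combined conditional likelihood is sketched below. It assumes that CGS patients contribute the density of the phenotype given the genotype, that GMS patients contribute the probability of the genotype given the phenotype (via Bayes' rule), and that genotype frequencies follow Hardy-Weinberg proportions for a known minor allele frequency. These are assumptions made for illustration, the function names are hypothetical, and parameter maximization is left out.

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def genotype_prior(maf):
    """P(g) for g = 0, 1, 2 minor alleles, assuming Hardy-Weinberg proportions."""
    return [(1.0 - maf) ** 2, 2.0 * maf * (1.0 - maf), maf ** 2]

def log_likelihood(b0, b1, sigma, maf, cgs, gms):
    """Log-likelihood for lists of (y, g) pairs, with y pre-adjusted for covariates."""
    prior = genotype_prior(maf)
    ll = 0.0
    for y, g in cgs:  # population sample: phenotype given genotype
        ll += math.log(normal_pdf(y, b0 + b1 * g, sigma))
    for y, g in gms:  # extremes sample: genotype given phenotype
        joint = [prior[k] * normal_pdf(y, b0 + b1 * k, sigma) for k in range(3)]
        ll += math.log(joint[g] / sum(joint))
    return ll

def lrt_statistic(ll_alternative, ll_null):
    """2 x (log-likelihood ratio), to be compared to a chi-square with 1 df."""
    return 2.0 * (ll_alternative - ll_null)

# Made-up data, for illustration only.
cgs = [(0.3, 0), (-0.1, 1), (-0.6, 2)]
gms = [(1.4, 0), (-1.2, 1)]
print(log_likelihood(0.2, -0.2, 1.0, 0.24, cgs, gms))
```

In practice the log-likelihood would be maximized twice, once with the slope free and once with it fixed at zero, and the two maxima passed to `lrt_statistic`.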

Power Analyses

Power analyses for the combination of GMS and CGS assumed an additive genetic model, with effect β1 on the average phenotype for each minor allele. The results for GMS and CGS F508del/F508del are in Supplementary Figure 1. For each simulation the weighted meta-analysis P-values were compared to 5 × 10⁻⁸.

Genotype imputation

MACH (autosomes, http://www.sph.umich.edu/csg/abecasis/mach/) and IMPUTE (chromosome X, http://mathgen.stats.ox.ac.uk/impute/impute.html) imputation was conducted for 1,162 GMS patients, 1,254 self-reported CGS “Caucasian” patients and 60 CEU reference samples from HapMap I/II. Some of these individuals were later used for TSS, and association analyses considered only unique subsets in GMS and CGS, respectively (Table 1). Imputation yielded data for ~2,544,000 autosomal and ~65,000 chromosome X SNPs.

Copy-number analysis

Copy number variants (CNVs) were detected using PennCNV (2008Nov19 version)[52] and genoCNV (version 1.08)[53] using default parameters in 1,103 GMS and 1,301 CGS samples. CNVs with fewer than 5 probes or showing <1% variation were used, resulting in 3,008/4,868 probes from genoCNV/PennCNV in GMS and 3,015/4,663 probes from genoCNV/PennCNV in CGS. Genotype PCs were used to control stratification.

Linkage Marker Selection

19,566 SNPs were selected from the Illumina platform with minor allele frequency >0.4 and r <0.01 between adjacent SNPs, using Merlin[54]. HapMap II recombination data were used to integrate genetic and physical map positions. Average inter-marker distance was 0.18 cM, or 0.13 Mbp. Physical positions not appearing in HapMap were estimated assuming uniform recombination between known adjacent SNPs. The average marker information content was ~0.9 (multipoint) and ~0.31 (two-point).

Linkage Analysis

Variance components were estimated in SOLAR (Sequential Oligogenic Linkage Analysis Routines)[55], with similar results from Merlin[54], using multipoint IBD probabilities obtained from Merlin. LOD scores were computed with and without covariates (sex and average BMI Z-score). Multipoint LOD > 2.0 was considered suggestive and LOD > 3.7 was considered genome-wide significant[15].
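To relate the LOD thresholds above to more familiar test statistics, the following sketch converts a LOD score to a likelihood-ratio chi-square and a pointwise p-value. It relies on the standard asymptotic result that a variance-component test on the boundary follows a 50:50 mixture of a point mass at zero and a chi-square with 1 degree of freedom. That mixture is an assumption introduced here and is not stated in the text, and the function names are illustrative.

```python
import math


def lod_to_chi2(lod: float) -> float:
    """Likelihood-ratio chi-square equivalent of a LOD score: 2 * ln(10) * LOD."""
    return 2.0 * math.log(10.0) * lod


def lod_to_pointwise_p(lod: float) -> float:
    """Pointwise p-value under a 50:50 mixture of chi2(0) and chi2(1).

    This is the usual asymptotic null for a single variance component
    tested on the boundary (assumed here).
    """
    if lod <= 0.0:
        return 1.0
    chi2 = lod_to_chi2(lod)
    # Upper tail of chi-square with 1 df is erfc(sqrt(x / 2)); halve it for the mixture.
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    for lod in (2.0, 3.7, 5.03):
        print(f"LOD {lod:>4}: chi2 = {lod_to_chi2(lod):.2f}, "
              f"pointwise P = {lod_to_pointwise_p(lod):.2e}")
```

These are pointwise values only; genome-wide significance additionally accounts for the number of effectively independent positions scanned.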

WFDR and SFDR methods

Let $P_i$ be the p-value of an association test for SNP i, i = 1,…,m. Converting p-values to q-values[56] controls the FDR. SNPs with q-values less than the FDR threshold value (e.g. γ = 0.05) are declared significant. The expected proportion of false positives among all the positives is then controlled at level γ. Note that ranking SNPs by p-value or by q-value is equivalent. Let $Z_i$ be the linkage score of SNP i obtained from a GWL study. For the SFDR method, the m SNPs are divided into K disjoint strata based on the prior linkage information[57]. Consider K = 2 and assign each SNP i to stratum 1 (the high priority group) or stratum 2 (the low priority group) according to whether the linkage score $Z_i$ exceeds a threshold C (we used C = 3.3, corresponding to significant linkage[15]). Q-values are then calculated separately for each stratum of SNPs, achieving FDR control in each stratum (Sun et al., 2006). Ranks of the GWAS SNPs are determined by the q-values, with the original association p-values used to break any q-value ties. WFDR calculates a weighting factor $W_i$ for each SNP i, with weights subject to two constraints: $W_i \ge 0$ and $\bar{W} = \sum_i W_i / m = 1$. The weight $W_i$ is proportional to the linkage signal $Z_i$ for SNP i (e.g. $W_i = \exp(B \cdot Z_i)/\nu$, $\nu = \sum_i \exp(B \cdot Z_i)/m$, and B = 1) (Roeder et al., 2006), and the FDR procedure is applied to the set of weight-adjusted p-values, $P_i / W_i$, i = 1,…,m. We use B = 2 in the present analysis. The WFDR and SFDR were implemented in a Perl program called SFDR, available at http://www.utstat.toronto.edu/sun/Software/SFDR/index.html.
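The two procedures can be sketched in a few lines of Python. This is not the SFDR Perl program referenced above. It is a minimal illustration in which Benjamini-Hochberg adjusted p-values stand in for the q-values of [56], weight-adjusted p-values are capped at 1, and all function names are hypothetical.

```python
import math


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (a simple stand-in for q-values)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def linkage_weights(z_scores, b=2.0):
    """W_i = exp(B * Z_i) / nu, with nu chosen so that the weights average to 1."""
    raw = [math.exp(b * z) for z in z_scores]
    nu = sum(raw) / len(raw)
    return [r / nu for r in raw]


def wfdr_qvalues(pvalues, z_scores, b=2.0):
    """Weighted FDR: apply the FDR procedure to P_i / W_i (capped at 1)."""
    weights = linkage_weights(z_scores, b)
    adjusted = [min(1.0, p / w) for p, w in zip(pvalues, weights)]
    return bh_qvalues(adjusted)


def sfdr_qvalues(pvalues, z_scores, c=3.3):
    """Stratified FDR with K = 2: q-values computed separately for Z_i > C and Z_i <= C."""
    q = [1.0] * len(pvalues)
    for in_high_stratum in (True, False):
        idx = [i for i, z in enumerate(z_scores) if (z > c) == in_high_stratum]
        stratum_q = bh_qvalues([pvalues[i] for i in idx])
        for i, qi in zip(idx, stratum_q):
            q[i] = qi
    return q


if __name__ == "__main__":
    # Hypothetical association p-values and linkage scores for four SNPs
    p = [0.01, 0.04, 0.03, 0.5]
    z = [4.0, 0.5, 3.5, 1.0]
    print("SFDR:", sfdr_qvalues(p, z))
    print("WFDR:", wfdr_qvalues(p, z))
```

In this toy input the two SNPs with linkage scores above C form their own small stratum, so each pays a multiple-testing penalty relative to two tests instead of four.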

Phenotype variation attributable to association and to linkage

The proportion of variation due to each SNP was measured as the change in regression sums of squares vs. the smaller model with the SNP removed[58]. Using the genome-scan threshold of P = 5 × 10⁻⁸ and minimum P = 3.34 × 10⁻⁸ in the chromosome 11p13 region for GMS and CGS F508del/F508del patients, we estimate a 57.4% reduction in effect size compared to the nominal result. Using the joint analysis based on GMS, CGS F508del/F508del and TSS F508del/F508del patients, the observed minimum P = 8.28 × 10⁻⁸ results in a ~28.0% reduction of the effect size. Using the rough parallel to explained variation in the trait, the estimated explained variation for 11p13 remains 1%-2%. For a linkage study of comparable size (n = 500 sibling pairs), with a phenotype heritability of 0.5, the bias attributed to the winner’s curse varies from approximately 0.46 down to zero as the true (unmeasured) heritability attributable to the QTL increases[16]. While it is not possible to quantify the magnitude of this bias in this single study, these calculations provide an upper bound on the bias of 0.38 to 0.46 and a lower bound of 0.04 to 0.12.

Estimation of changes in the CF lung phenotype upon FEV1 %predicted and airway flow

Using 973 TSS individuals, a hypothetical quantity of 0.2 was added to each individual’s lung phenotype, to correspond to the effect size observed for the significant association of SNPs near EHF/APIP. The average raw FEV1 (in liters) was then back-extrapolated[8] and FEV1 percent predicted values were generated using the predictive equations[59,60]. Height and age adjustments used to calculate the original quantitative lung phenotype were preserved. The average increase (mean ± SD) in FEV1 percent predicted corresponding to a 0.2-unit increase of our lung phenotype was 5.09% ± 1.90% [n = 841; range: 0.00–14.53%]. The corresponding average increase in raw FEV1 was 253.5 ± 85.9 mL in adult subjects (>18 years) [n = 244; range: 0.0–630.0 mL].
  56 in total

1.  Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies.

Authors:  Andrew D Skol; Laura J Scott; Gonçalo R Abecasis; Michael Boehnke
Journal:  Nat Genet       Date:  2006-01-15       Impact factor: 38.330

2.  In silico method for inferring genotypes in pedigrees.

Authors:  Joshua T Burdick; Wei-Min Chen; Gonçalo R Abecasis; Vivian G Cheung
Journal:  Nat Genet       Date:  2006-08-20       Impact factor: 38.330

Review 3.  The melanocortin system and energy balance.

Authors:  Andrew A Butler
Journal:  Peptides       Date:  2006-01-23       Impact factor: 3.750

4.  PLINK: a tool set for whole-genome association and population-based linkage analyses.

Authors:  Shaun Purcell; Benjamin Neale; Kathe Todd-Brown; Lori Thomas; Manuel A R Ferreira; David Bender; Julian Maller; Pamela Sklar; Paul I W de Bakker; Mark J Daly; Pak C Sham
Journal:  Am J Hum Genet       Date:  2007-07-25       Impact factor: 11.025

5.  Family-based association tests for genomewide association scans.

Authors:  Wei-Min Chen; Goncalo R Abecasis
Journal:  Am J Hum Genet       Date:  2007-09-18       Impact factor: 11.025

6.  PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data.

Authors:  Kai Wang; Mingyao Li; Dexter Hadley; Rui Liu; Joseph Glessner; Struan F A Grant; Hakon Hakonarson; Maja Bucan
Journal:  Genome Res       Date:  2007-10-05       Impact factor: 9.043

7.  Heritability of lung disease severity in cystic fibrosis.

Authors:  Lori L Vanscoy; Scott M Blackman; Joseph M Collaco; Amanda Bowers; Teresa Lai; Kathleen Naughton; Marilyn Algire; Rita McWilliams; Suzanne Beck; Julie Hoover-Fong; Ada Hamosh; Dave Cutler; Garry R Cutting
Journal:  Am J Respir Crit Care Med       Date:  2007-03-01       Impact factor: 21.405

8.  Suppression of hypoxic cell death by APIP-induced sustained activation of AKT and ERK1/2.

Authors:  D-H Cho; H-J Lee; H-J Kim; S-H Hong; J-O Pyo; C Cho; Y-K Jung
Journal:  Oncogene       Date:  2006-11-06       Impact factor: 9.867

9.  Celsius: a community resource for Affymetrix microarray data.

Authors:  Allen Day; Marc R J Carlson; Jun Dong; Brian D O'Connor; Stanley F Nelson
Journal:  Genome Biol       Date:  2007       Impact factor: 13.583

10.  HEF1-dependent Aurora A activation induces disassembly of the primary cilium.

Authors:  Elena N Pugacheva; Sandra A Jablonski; Tiffiney R Hartman; Elizabeth P Henske; Erica A Golemis
Journal:  Cell       Date:  2007-06-29       Impact factor: 41.582
