Hanley Kingston, Adrienne M Stilp, William Gordon, Jai Broome, Stephanie M Gogarten, Hua Ling, John Barnard, Shannon Dugan-Perez, Patrick T Ellinor, Stacey Gabriel, Soren Germer, Richard A Gibbs, Namrata Gupta, Kenneth Rice, Albert V Smith, Michael C Zody, Scott M Blackman, Garry Cutting, Michael R Knowles, Yi-Hui Zhou, Margaret Rosenfeld, Ronald L Gibson, Michael Bamshad, Alison Fohner, Elizabeth E Blue.
Abstract
CFTR F508del (c.1521_1523delCTT, p.Phe508delPhe) is the most common pathogenic allele underlying cystic fibrosis (CF), and its frequency varies in a geographic cline across Europe. We hypothesized that genetic variation associated with this cline is overrepresented in a large cohort (N > 5,000) of persons with CF who underwent whole-genome sequencing and that this pattern could result in spurious associations between variants correlated with both the F508del genotype and CF-related outcomes. Using principal-component (PC) analyses, we showed that variation in the CFTR region disproportionately contributes to a PC explaining a relatively high proportion of genetic variance. Variation near CFTR was correlated with population structure among persons with CF, and this correlation was driven by a subset of the sample inferred to have European ancestry. We performed genome-wide association studies comparing persons with CF with one versus two copies of the F508del allele; this allowed us to identify genetic variation associated with the F508del allele and to determine that standard PC-adjustment strategies eliminated the significant association signals. Our results suggest that PC adjustment can adequately prevent spurious associations between genetic variants and CF-related traits and is therefore an effective tool to control for population structure even when population structure is confounded with disease severity and a common pathogenic variant.
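The confounding mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' pipeline: it assumes two hypothetical subpopulations, simulated genotypes, and ordinary least squares in place of the GWAS models used in the study. All function names, sample sizes, and effect sizes are illustrative assumptions.

```python
import numpy as np


def simulate(n_per_pop=500, n_markers=1000, seed=0):
    """Simulate two subpopulations whose outcome differs by population only."""
    rng = np.random.default_rng(seed)
    pop = np.repeat([0, 1], n_per_pop)

    # Background markers: allele frequencies drift between the two populations.
    freq0 = rng.uniform(0.1, 0.9, n_markers)
    freq1 = np.clip(freq0 + rng.normal(0.0, 0.15, n_markers), 0.05, 0.95)
    freqs = np.where(pop[:, None] == 0, freq0, freq1)
    background = rng.binomial(2, freqs).astype(float)

    # Test variant: strongly differentiated, but with no effect on the outcome.
    variant = rng.binomial(2, np.where(pop == 0, 0.2, 0.6)).astype(float)

    # The outcome depends on population membership, not on any genotype.
    outcome = 1.0 * pop + rng.normal(0.0, 1.0, pop.size)
    return background, variant, outcome


def top_pcs(genotypes, k):
    """Return sample scores on the first k PCs of standardized genotypes."""
    centered = genotypes - genotypes.mean(axis=0)
    sd = centered.std(axis=0)
    keep = sd > 0  # drop monomorphic markers, which cannot be standardized
    standardized = centered[:, keep] / sd[keep]
    u, s, _ = np.linalg.svd(standardized, full_matrices=False)
    return u[:, :k] * s[:k]


def genotype_t_stat(genotype, outcome, covariates=None):
    """t statistic for the genotype term in an ordinary least-squares fit."""
    columns = [np.ones_like(outcome), genotype]
    if covariates is not None:
        columns.append(covariates)
    design = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    residuals = outcome - design @ beta
    dof = outcome.size - design.shape[1]
    sigma2 = residuals @ residuals / dof
    cov_beta = sigma2 * np.linalg.inv(design.T @ design)
    return beta[1] / np.sqrt(cov_beta[1, 1])


background, variant, outcome = simulate()
t_unadjusted = genotype_t_stat(variant, outcome)
t_adjusted = genotype_t_stat(variant, outcome, covariates=top_pcs(background, 2))
print(f"unadjusted t = {t_unadjusted:.2f}, PC-adjusted t = {t_adjusted:.2f}")
```

In this toy setting the test variant has no causal effect, so any signal in the unadjusted fit comes from population structure alone. Including the leading PCs as covariates is expected to shrink it toward zero.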
Keywords: CFTR F508del; genome-wide association study; population structure
Year: 2022 PMID: 35647563 PMCID: PMC9136666 DOI: 10.1016/j.xhgg.2022.100117
Source DB: PubMed Journal: HGG Adv ISSN: 2666-2477
Summary of the Cystic Fibrosis Genome Project participants
| Total | 1,809 | 1,783 | 1,347 | 4,939 |
| Birth year: mean (range) | 1991 (1943–2011) | 1982 (1946–2007) | 2000 (1992–2006) | 1990 (1943–2011) |
| Age diagnosed, years: mean (SD) | 2.4 (5.7) | 2.4 (4.5) | 0.9 (1.7) | 2.0 (4.5) |
| Genotype: N (%) | ||||
| CFTR F508del carriers | 1,643 (90.8) | 1,706 (95.7) | 1,244 (92.4) | 4,593 (93.0) |
| CFTR F508del homozygotes | 875 (48.4) | 1,282 (71.9) | 722 (53.6) | 2,879 (58.3) |
| Male: N (%) | 946 (52.3) | 1,004 (56.3) | 673 (50.0) | 2,623 (53.1) |
| Empirical ancestry: N (%) | ||||
| African | 32 (1.8) | 25 (1.4) | 33 (2.4) | 90 (1.8) |
| Native American | 86 (4.8) | 46 (2.6) | 64 (4.8) | 196 (4.0) |
| East Asian | 4 (0.2) | 0 (0) | 1 (0.1) | 5 (0.1) |
| European | 1,681 (92.9) | 1,710 (95.9) | 1,247 (92.6) | 4,638 (93.9) |
| South Asian | 6 (0.3) | 2 (0.1) | 2 (0.1) | 10 (0.2) |
Values are provided for the 4,939 participants who passed quality control and were included in the PCAs. Estimated ancestry is defined as the ancestry group with the highest probability estimated by Somalier analysis. Details limited to carriers are presented in Table S1.
CFTR F508del homozygotes were included in the count of carriers.
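The ancestry rule in the table notes (assign each participant to the group with the highest estimated probability) can be sketched as follows. The dictionary layout and the probabilities are hypothetical and do not reflect Somalier's actual output format. The final line recomputes one percentage from counts given in the table.

```python
from collections import Counter


def assign_ancestry(probabilities):
    """Return the ancestry label with the highest estimated probability.

    Ties resolve to the first label encountered (an arbitrary choice here).
    """
    return max(probabilities, key=probabilities.get)


def summarize(labels):
    """Return {label: (count, percent)} with percentages rounded to 0.1."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}


# Hypothetical per-participant probabilities, for illustration only.
samples = [
    {"AFR": 0.02, "AMR": 0.03, "EAS": 0.00, "EUR": 0.94, "SAS": 0.01},
    {"AFR": 0.71, "AMR": 0.05, "EAS": 0.01, "EUR": 0.22, "SAS": 0.01},
    {"AFR": 0.01, "AMR": 0.55, "EAS": 0.02, "EUR": 0.41, "SAS": 0.01},
    {"AFR": 0.00, "AMR": 0.02, "EAS": 0.00, "EUR": 0.97, "SAS": 0.01},
]
labels = [assign_ancestry(p) for p in samples]
summary = summarize(labels)

# Percentage of participants with European empirical ancestry,
# using the counts from the table (4,638 of 4,939).
cfgp_european_pct = round(100 * 4638 / 4939, 1)
```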
Figure 1. Population structure within the entire CFGP (n = 4,939)
Pairwise principal-component (PC) plots are shown for PCs 1–4 with frequency distributions and percentage of variance explained by each PC on the diagonal. Ancestry estimates indicate the ancestry with the highest estimated proportion using Somalier. Abbreviations: AFR, sub-Saharan African; AMR, Native American; EAS, East Asian; EUR, European; SAS, South Asian.
Figure 2. Correlation between PCs and genomic position
(A and B) Correlations between PCs (y axis) and genomic position (x axis) are shown for the (A) CFGP (n = 4,939) and (B) CFGP participants with estimated European ancestry >80% (n = 4,567). The number of PCs shown is the number used to calculate the genetic relatedness matrix and, for the total CFGP dataset, used in the PC-adjusted GWAS analysis. Color-coded regions include 7q31.2 (CFTR, pink) and three regions that have previously shown evidence of long-range LD: 2q21.1-2q22.1 (LCT, teal), 6p22.3-6p21.2 (the major histocompatibility complex, orange), and the 8p23 inversion polymorphism (purple).
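The quantity plotted in this figure, the correlation between each PC and variants along the genome, can be computed as in the sketch below. It is a minimal example under stated assumptions, not the authors' code: genotypes are coded as 0/1/2 allele counts in a samples-by-variants array, every variant is polymorphic, and the toy data are made up.

```python
import numpy as np


def pc_variant_correlation(genotypes, pc_scores):
    """Pearson correlation of every PC (rows) with every variant (columns).

    genotypes: (n_samples, n_variants) allele counts, variants in genomic order.
    pc_scores: (n_samples, n_pcs) sample scores on each PC.
    Assumes no column is constant; a constant column would divide by zero.
    """
    g = np.asarray(genotypes, dtype=float)
    p = np.asarray(pc_scores, dtype=float)
    g_z = (g - g.mean(axis=0)) / g.std(axis=0)
    p_z = (p - p.mean(axis=0)) / p.std(axis=0)
    return p_z.T @ g_z / g.shape[0]


# Toy data: four samples, three variants, one PC.
genotypes = np.array([[0, 2, 1],
                      [1, 1, 0],
                      [2, 0, 1],
                      [1, 1, 2]])
pc_scores = np.array([[-1.0], [0.0], [1.0], [0.0]])
corr = pc_variant_correlation(genotypes, pc_scores)
print(corr)
```

Plotting each row of `corr` against variant position would give a view analogous to the figure. A run of high absolute correlations within one region suggests that the PC reflects local variation in that region in addition to genome-wide ancestry.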
Figure 3. GWASs for CFTR F508del heterozygosity versus homozygosity
(Top) The baseline model adjusted for site and relatedness. (Bottom) The PC-adjusted model. Association signals are measured as −log₁₀(p values). The plot is truncated at p = 1 × 10⁻¹⁰, as the peak at CFTR on chr7 reaches p < 1 × 10⁻³⁰⁰ under both models. The genome-wide significance level, p < 5 × 10⁻⁸, is indicated by the horizontal line.
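The plotting conventions in this caption (a −log₁₀ scale, truncation at p = 1 × 10⁻¹⁰, and a significance line at p = 5 × 10⁻⁸) amount to a few lines of arithmetic. The sketch below uses hypothetical function and constant names; only the two numeric thresholds come from the caption.

```python
import math

GENOME_WIDE_P = 5e-8   # genome-wide significance threshold from the caption
PLOT_FLOOR_P = 1e-10   # p value at which the plot is truncated


def neg_log10_p(p, floor=PLOT_FLOOR_P):
    """Return -log10(p), capping p values smaller than the plot floor."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log10(max(p, floor))


def is_genome_wide_significant(p):
    """True when p is below the genome-wide significance threshold."""
    return p < GENOME_WIDE_P


# Height of the horizontal significance line on the -log10 scale.
significance_line = -math.log10(GENOME_WIDE_P)

# A peak as extreme as the CFTR signal is drawn at the truncation height.
capped_peak = neg_log10_p(1e-300)
```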
Regions of the genome significantly associated with CFTR F508del heterozygosity versus homozygosity under the baseline model
| 1q31.3 | rs2813164 | A | G | 0.33 | 0.33 | 1.74 × 10⁻⁸ | |
| 1q41 | rs853741 | G | A | 0.95 | 0.94 | 2.62 × 10⁻⁹ | |
| 2q14.3 | rs1911632 | A | C | 0.17 | 0.20 | 7.84 × 10⁻⁹ | |
| 2q22.1 | rs533344 | T | A | 0.72 | 0.70 | 5.58 × 10⁻⁹ | |
| 3p14.1 | rs11127729 | T | C | 0.95 | 0.95 | 3.28 × 10⁻⁸ | |
| 6q27 | rs9455973 | G | A | 0.08 | 0.10 | 4.56 × 10⁻⁸ | |
| 7q31.2 | rs7802924 | A | G | 0.09 | 0.85 | 1 × 10⁻³⁰⁰ | |
| 9q21.32 | rs6559779 | A | G | 0.10 | 0.10 | 4.95 × 10⁻⁸ | |
| 10q25.2 | rs1923653 | A | G | 0.94 | 0.92 | 4.47 × 10⁻⁸ | |
| 12p11.1 | rs949473 | G | A | 0.14 | 0.16 | 8.99 × 10⁻⁹ |
Significant p value threshold: 5 × 10⁻⁸. The baseline association model is adjusted for site and a genetic relatedness matrix. Sequence positions of association peaks are provided on the GRCh38 map. Alternate allele frequencies (AAFs) are given for non-Finnish Europeans in gnomAD v.3.1.1 and within the CFGP.