Hailiang Huang1,2, Ming Fang3,4, Luke Jostins5,6, Maša Umićević Mirkov7, Gabrielle Boucher8, Carl A Anderson7, Vibeke Andersen9,10, Isabelle Cleynen11, Adrian Cortes5,12, François Crins3,4, Mauro D'Amato13,14,15, Valérie Deffontaine3,4, Julia Dmitrieva3,4, Elisa Docampo3,4, Mahmoud Elansary3,4, Kyle Kai-How Farh1,2,16, Andre Franke17, Ann-Stephan Gori3,4, Philippe Goyette8, Jonas Halfvarson18, Talin Haritunians19, Jo Knight20, Ian C Lawrance21,22, Charlie W Lees23, Edouard Louis3,24, Rob Mariman3,4, Theo Meuwissen25, Myriam Mni3,4, Yukihide Momozawa3,4,26, Miles Parkes27, Sarah L Spain7,28, Emilie Théâtre3,4, Gosia Trynka7, Jack Satsangi23, Suzanne van Sommeren29, Severine Vermeire11,30, Ramnik J Xavier2,31, Rinse K Weersma29, Richard H Duerr32,33, Christopher G Mathew34,35, John D Rioux8,36, Dermot P B McGovern19, Judy H Cho37, Michel Georges3,4, Mark J Daly1,2, Jeffrey C Barrett7.
Abstract
Inflammatory bowel diseases are chronic gastrointestinal inflammatory disorders that affect millions of people worldwide. Genome-wide association studies have identified 200 inflammatory bowel disease-associated loci, but few have been conclusively resolved to specific functional variants. Here we report fine-mapping of 94 inflammatory bowel disease loci using high-density genotyping in 67,852 individuals. We pinpoint 18 associations to a single causal variant with greater than 95% certainty, and an additional 27 associations to a single variant with greater than 50% certainty. These 45 variants are significantly enriched for protein-coding changes (n = 13), direct disruption of transcription-factor binding sites (n = 3), and tissue-specific epigenetic marks (n = 10), with the last category showing enrichment in specific immune cells among associations stronger in Crohn's disease and in gut mucosa among associations stronger in ulcerative colitis. The results of this study suggest that high-resolution fine-mapping in large samples can convert many discoveries from genome-wide association studies into statistically convincing causal variants, providing a powerful substrate for experimental elucidation of disease mechanisms.
Year: 2017 PMID: 28658209 PMCID: PMC5511510 DOI: 10.1038/nature22969
Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 49.962
Extended Data Figure 1. Power of the fine-mapping analysis.
Power (y axis) to identify the causal variant in a correlated pair (strength of correlation shown by color) increases with the significance of the association (x axis), and therefore with sample size and effect size. The vertical dashed line shows the genome-wide significance level. To estimate the relationship between the strength of an association and our ability to fine-map it, we assumed that the association has only two candidate causal variants, and we defined the signal as successfully fine-mapped if the ratio of Bayes factors between the true causal variant and the non-causal variant is greater than 10 (a 91% posterior, assuming equal priors for the two candidate variants). Using equation (8) in Supplementary Methods, we obtain an expression for logBF (not reproduced here), in which θ* is the maximum likelihood estimate of the parameter values. The log-likelihood ratio follows a chi-square distribution (expression not reproduced here), in which λ is the chi-square statistic of the lead variant and r is the correlation coefficient between the two variants. Because of the additive property of the chi-square distribution, logBF follows a non-central chi-square distribution with 1 degree of freedom and non-centrality parameter λ(1 − r²)/2. Therefore, the power can be calculated as the probability that logBF > log(10), that is, one minus the cumulative distribution function of the non-central chi-square distribution evaluated at log(10).
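The power calculation described in this legend can be sketched in Python. This is a minimal sketch, assuming that "log" means the natural logarithm and taking the non-centrality parameter λ(1 − r²)/2 exactly as stated above; the function names are hypothetical. With 1 degree of freedom the non-central chi-square variable can be written as (Z + μ)² with μ = √nc, which gives a closed-form tail probability using only the standard library.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ncx2_df1_sf(x: float, nc: float) -> float:
    """P(X > x) for X ~ non-central chi-square, 1 df, non-centrality nc.

    With 1 df, X = (Z + mu)^2 where Z ~ N(0, 1) and mu = sqrt(nc), so
    P(X > x) = P(Z + mu > sqrt(x)) + P(Z + mu < -sqrt(x)).
    """
    if x <= 0.0:
        return 1.0
    t = math.sqrt(x)
    mu = math.sqrt(nc)
    return (1.0 - normal_cdf(t - mu)) + normal_cdf(-t - mu)


def fine_mapping_power(lam: float, r: float, bf_threshold: float = 10.0) -> float:
    """Probability that logBF exceeds log(bf_threshold) for a two-variant signal.

    lam: chi-square statistic of the lead variant.
    r: correlation coefficient between the two candidate variants.
    Assumes natural logs and the non-centrality lam * (1 - r^2) / 2.
    """
    nc = lam * (1.0 - r * r) / 2.0
    return ncx2_df1_sf(math.log(bf_threshold), nc)


def posterior_from_bf_ratio(bf_ratio: float) -> float:
    """Posterior of the causal variant with equal priors on the two candidates."""
    return bf_ratio / (1.0 + bf_ratio)


if __name__ == "__main__":
    print(round(posterior_from_bf_ratio(10.0), 3))  # 0.909, the "91% posterior"
    for r in (0.5, 0.8, 0.95):
        print(r, [round(fine_mapping_power(lam, r), 2) for lam in (30, 60, 120)])
```

As in the figure, power rises with λ and falls as r approaches 1; at r = 1 the two variants are indistinguishable and the power collapses to the null tail probability P(χ²₁ > log 10).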
Study samples.
Genotyped samples in each batch for healthy controls (Control), Crohn’s disease (CD) and ulcerative colitis (UC). Batches were grouped into cohorts for further analysis (see ‘Controlling for population structure, batch effects and other confounders’ in Methods).
| Batch | Control | CD | UC | Cohort |
|---|---|---|---|---|
| IMSGC | 5740 | 0 | 0 | imbalanced |
| NIDDK | 1786 | 3653 | 3020 | balanced |
| D. Ellinghaus | 4559 | 2696 | 1006 | balanced |
| E. Theatre | 713 | 1109 | 559 | balanced |
| H. Huang | 3 | 551 | 316 | imbalanced |
| J. Barrett | 4397 | 2715 | 2835 | balanced |
| K. Fransen | 1598 | 1234 | 430 | balanced |
| L. Jostins | 1354 | 1252 | 1063 | balanced |
| P. Gregersen | 1611 | 0 | 0 | imbalanced |
| R. Duerr | 1696 | 321 | 1611 | balanced |
| S. Rich | 4259 | 0 | 0 | imbalanced |
| S. Sommeren | 107 | 77 | 201 | balanced |
| S. Vermeire | 922 | 1539 | 838 | balanced |
| T. Balschun | 5511 | 1882 | 1683 | balanced |
| T. Haritunians | 1 | 1938 | 1066 | imbalanced |
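As a quick arithmetic check, the per-batch counts can be summed in Python. The dictionary below simply transcribes the table, and the totals agree with the 67,852 individuals reported in the abstract.

```python
# Per-batch sample counts transcribed from the table above:
# batch -> (controls, Crohn's disease, ulcerative colitis, cohort)
BATCHES = {
    "IMSGC": (5740, 0, 0, "imbalanced"),
    "NIDDK": (1786, 3653, 3020, "balanced"),
    "D. Ellinghaus": (4559, 2696, 1006, "balanced"),
    "E. Theatre": (713, 1109, 559, "balanced"),
    "H. Huang": (3, 551, 316, "imbalanced"),
    "J. Barrett": (4397, 2715, 2835, "balanced"),
    "K. Fransen": (1598, 1234, 430, "balanced"),
    "L. Jostins": (1354, 1252, 1063, "balanced"),
    "P. Gregersen": (1611, 0, 0, "imbalanced"),
    "R. Duerr": (1696, 321, 1611, "balanced"),
    "S. Rich": (4259, 0, 0, "imbalanced"),
    "S. Sommeren": (107, 77, 201, "balanced"),
    "S. Vermeire": (922, 1539, 838, "balanced"),
    "T. Balschun": (5511, 1882, 1683, "balanced"),
    "T. Haritunians": (1, 1938, 1066, "imbalanced"),
}


def phenotype_totals(batches):
    """Return (controls, CD, UC) summed over all batches."""
    controls = sum(b[0] for b in batches.values())
    cd = sum(b[1] for b in batches.values())
    uc = sum(b[2] for b in batches.values())
    return controls, cd, uc


def cohort_totals(batches):
    """Return the total number of samples in each cohort."""
    sizes = {}
    for controls, cd, uc, cohort in batches.values():
        sizes[cohort] = sizes.get(cohort, 0) + controls + cd + uc
    return sizes


controls, cd, uc = phenotype_totals(BATCHES)
print(controls, cd, uc, controls + cd + uc)  # 34257 18967 14628 67852
print(cohort_totals(BATCHES))
```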
Extended Data Figure 2. Procedures in the fine-mapping analysis.
Details for each stage are described in Methods. The dashed line means the imputation was performed only once after the manual inspection (not iteratively).
Figure 1. Fine-mapping procedure and output using the SMAD3 region as an example.
a, 1) We merge overlapping signals across methods; 2) select a lead variant (black triangle) and phenotype (color); and 3) choose the best model. Details for each step are available in Methods. b, Example fine-mapping output. This region has been mapped to two independent signals. For each signal, we report the phenotype it is associated with (colored), the variants in the credible set, and their posterior probabilities.
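Panel b reports, for each signal, a credible set of variants with their posterior probabilities. A widely used recipe, assuming a single causal variant per signal and equal priors for all variants, is to normalize per-variant Bayes factors into posteriors and then keep the highest-posterior variants until their cumulative posterior reaches a target coverage. The minimal sketch below assumes natural-log Bayes factors and 95% coverage; the function names and example values are hypothetical, and it is not necessarily the exact procedure described in Methods.

```python
import math
from typing import Dict, List, Tuple


def posteriors_from_log_bf(log_bf: Dict[str, float]) -> Dict[str, float]:
    """Convert natural-log Bayes factors into posterior probabilities,
    assuming one causal variant per signal and equal priors."""
    m = max(log_bf.values())  # subtract the maximum for numerical stability
    weights = {v: math.exp(b - m) for v, b in log_bf.items()}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


def credible_set(post: Dict[str, float], coverage: float = 0.95) -> List[Tuple[str, float]]:
    """Smallest set of variants, taken in order of decreasing posterior,
    whose cumulative posterior reaches `coverage`."""
    ranked = sorted(post.items(), key=lambda kv: kv[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant, p in ranked:
        chosen.append((variant, p))
        cumulative += p
        if cumulative >= coverage:
            break
    return chosen


if __name__ == "__main__":
    example = {"varA": 12.0, "varB": 10.5, "varC": 6.0, "varD": 2.0}  # hypothetical log BFs
    post = posteriors_from_log_bf(example)
    for variant, p in credible_set(post):
        print(variant, round(p, 3))
```

Here varA alone carries about 82% of the posterior, so varB must be added before the 95% coverage is reached.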
Figure 2. Summary of fine-mapped associations.
a, Independent signals: 68 loci contain one association and 26 loci contain multiple associations. b, Number of variants in credible sets: 18 associations were fine-mapped to a single variant and 116 to ≤ 50 variants. c, Distribution of the posterior probabilities of variants in credible sets containing ≤ 50 variants.
Extended Data Figure 3. Variance explained.
Variance explained by secondary, tertiary, … variants as a fraction of the primary signal at each locus.
Variants having posterior probability >50%.
| Variant | Chr | Position | Ns | Phe | AF | Prob | INFO | Func | Annotation |
|---|---|---|---|---|---|---|---|---|---|
| Signals mapped to a single variant | |||||||||
| rs7307562 | 12 | 40724960 | 2 | CD | 0.398 | 0.999 | 1 | ||
| rs2066844 | 16 | 50745926 | 10 | CD | 0.063 | 0.999 | 0.8 | C | |
| rs2066845 | 16 | 50756540 | 10 | CD | 0.022 | 0.999 | 1 | C | |
| rs6017342 | 20 | 43065028 | 2 | UC | 0.544 | 0.999 | 1 | E | |
| rs61839660 | 10 | 6094697 | 2 | CD | 0.094 | 0.999 | 1 | E | |
| rs5743293 | 16 | 50763781 | 10 | CD | 0.964 | 0.999 | 1 | C | |
| rs6062496 | 20 | 62329099 | 1 | IBD | 0.587 | 0.996 | 1 | T | |
| rs141992399 | 9 | 139259592 | 3 | IBD | 0.005 | 0.995 | 1 | C | |
| rs35667974 | 2 | 163124637 | 1 | UC | 0.021 | 0.994 | 1 | C | |
| rs74465132 | 7 | 50304782 | 3 | IBD | 0.034 | 0.994 | 1 | T,E | |
| rs4676408 | 2 | 241574401 | 1 | UC | 0.508 | 0.994 | 0.99 | ||
| rs5743271 | 16 | 50744688 | 10 | CD | 0.007 | 0.993 | 1 | C | |
| rs10748781 | 10 | 101283330 | 2 | IBD | 0.55 | 0.990 | 1 | E | |
| rs35874463 | 15 | 67457698 | 2 | IBD | 0.054 | 0.989 | 1 | C,E | |
| rs72796367 | 16 | 50762771 | 10 | CD | 0.023 | 0.983 | 1 | ||
| rs1887428 | 9 | 4984530 | 1 | IBD | 0.603 | 0.974 | 0.97 | ||
| rs41313262 | 1 | 67705900 | 5 | CD | 0.014 | 0.973 | 1 | C | |
| rs28701841 | 6 | 106530330 | 2 | CD | 0.116 | 0.971 | 1 | ||
| Signals mapped to 2-50 variants and the lead variant has posterior probability > 50% | |||||||||
| rs76418789 | 1 | 67648596 | 5 | CD | 0.006 | 0.937 | 0.59 | C | |
| rs7711427 | 5 | 40414886 | 3 | CD | 0.633 | 0.919 | 1 | ||
| rs1736137 | 21 | 16806695 | 2 | CD | 0.407 | 0.879 | 1 | ||
| rs104895444 | 16 | 50746199 | 10 | CD | 0.003 | 0.865 | 1 | C | |
| rs56167332 | 5 | 158827769 | 2 | IBD | 0.353 | 0.845 | 1 | ||
| rs104895467 | 16 | 50750810 | 10 | CD | 0.002 | 0.833 | 1 | C | |
| rs630923 | 11 | 118754353 | 2 | CD | 0.153 | 0.820 | 0.98 | ||
| rs3812565 | 9 | 139272502 | 3 | IBD | 0.402 | 0.815 | 1 | Q | eQTL of |
| rs4655215 | 1 | 20137714 | 3 | UC | 0.763 | 0.784 | 1 | E | Gut_H3K27ac |
| rs145530718 | 19 | 10568883 | 3 | CD | 0.023 | 0.762 | 0.97 | ||
| rs6426833 | 1 | 20171860 | 3 | UC | 0.555 | 0.752 | 1 | ||
| chr20: | 20 | 43258079 | 2 | CD | 0.041 | 0.736 | 0.88 | ||
| rs17229679 | 2 | 199560757 | 2 | UC | 0.028 | 0.716 | 1 | ||
| rs4728142 | 7 | 128573967 | 1 | UC | 0.448 | 0.664 | 1 | E | Immune_H3K4me1 |
| rs2143178 | 22 | 39660829 | 2 | IBD | 0.157 | 0.662 | 1 | T,E | NFKB TFBS, Gut_H3K27ac |
| rs34536443 | 19 | 10463118 | 3 | CD | 0.038 | 0.649 | 1 | C | |
| rs138425259 | 16 | 50663477 | 10 | UC | 0.009 | 0.648 | 0.92 | ||
| rs146029108 | 9 | 139329966 | 3 | CD | 0.036 | 0.643 | 0.92 | ||
| rs12722504 | 10 | 6089777 | 2 | CD | 0.26 | 0.615 | 1 | ||
| rs60542850 | 19 | 10488360 | 3 | IBD | 0.17 | 0.591 | 0.89 | ||
| rs2188962 | 5 | 131770805 | 1 | CD | 0.44 | 0.590 | 1 | E,Q | Gut_H3K27ac, |
| rs2019262 | 1 | 67679990 | 5 | IBD | 0.4 | 0.586 | 1 | ||
| rs3024493 | 1 | 206943968 | 2 | IBD | 0.171 | 0.537 | 1 | E | Immune_H3K4me1 |
| rs7915475 | 10 | 64381668 | 3 | CD | 0.304 | 0.528 | 1 | ||
| rs77981966 | 2 | 43777964 | 1 | CD | 0.077 | 0.521 | 1 | ||
| rs9889296 | 17 | 32570547 | 1 | CD | 0.264 | 0.512 | 1 | ||
| rs2476601 | 1 | 114377568 | 1 | CD | 0.908 | 0.508 | 1 | C | |
Ns: number of independent signals in the locus. Phe: phenotype. AF: allele frequency. Prob: posterior probability of being the causal variant. INFO: imputation quality (INFO score). Func: functional annotations, coded as coding (C), disrupting transcription factor binding sites (T), overlapping epigenetic peaks (E) and colocalization with eQTL (Q).
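The Func column can be tallied to connect this table with the counts in the abstract (13 protein-coding, 3 transcription factor binding site and 10 epigenetic-peak variants among the 45). A short Python sketch, with the Func codes transcribed in table order and an empty string where no annotation is listed:

```python
from collections import Counter

# Func codes for the 18 signals mapped to a single variant, in table order
SINGLE = ["", "C", "C", "E", "E", "C", "T", "C", "C", "T,E",
          "", "C", "E", "C,E", "", "", "C", ""]
# Func codes for the 27 signals mapped to 2-50 variants (lead posterior > 50%)
MULTI = ["C", "", "", "C", "", "C", "", "Q", "E", "", "", "", "", "E",
         "T,E", "C", "", "", "", "", "E,Q", "", "E", "", "", "", "C"]


def tally(func_codes):
    """Count each annotation code; one variant can carry several codes."""
    counts = Counter()
    for entry in func_codes:
        counts.update(code for code in entry.split(",") if code)
    return counts


counts = tally(SINGLE + MULTI)
print(len(SINGLE + MULTI), dict(counts))
```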
Figure 3. Functional annotation of causal variants.
a, Proportion of credible variants that are protein coding, disrupt or create transcription factor binding site (TFBS) motifs, or are synonymous, sorted by posterior probability. b, Epigenetic peaks overlapping credible variants in cell and tissue types from the Roadmap Epigenomics Consortium (ref. 39). Significant enrichment is marked with asterisks. c, d, Proportion of credible variants that overlap core immune peaks for H3K4me1 (c) or core gut peaks for H3K27ac (d) (Methods). In panels a, c and d, the vertical dotted lines mark 50% posterior probability and the horizontal dashed lines show the background proportions of each functional category.
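Panels a, c and d relate the proportion of credible variants in a functional category to posterior probability. One plausible way to compute such a curve, sketched here under stated assumptions, is to take, for each posterior cut-off, the fraction of credible variants at or above that cut-off that carry the annotation, and compare it with a background proportion. The input pairs and the background value below are hypothetical, and the exact binning used in the figure may differ.

```python
from typing import List, Sequence, Tuple


def proportion_annotated_above(
    variants: Sequence[Tuple[float, bool]], cutoffs: Sequence[float]
) -> List[Tuple[float, float, int]]:
    """For each posterior cut-off, return (cut-off, fraction annotated, n)
    among variants whose posterior probability is >= the cut-off."""
    rows = []
    for c in cutoffs:
        kept = [annotated for post, annotated in variants if post >= c]
        frac = sum(kept) / len(kept) if kept else float("nan")
        rows.append((c, frac, len(kept)))
    return rows


if __name__ == "__main__":
    # (posterior probability, overlaps the annotation?): hypothetical values
    credible = [(0.99, True), (0.90, True), (0.60, False), (0.30, True),
                (0.10, False), (0.05, False), (0.02, False), (0.01, False)]
    background = 0.02  # hypothetical background proportion
    for c, frac, n in proportion_annotated_above(credible, [0.0, 0.5, 0.9]):
        print(f"posterior >= {c:.1f}: {frac:.2f} of {n} variants "
              f"({frac / background:.0f}x background)")
```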
Colocalization with eQTL.
The number of IBD credible sets that colocalize with eQTLs using the naïve, frequentist and Bayesian approaches. Significant observations are boldfaced. ‘Number of credible sets’ reports the number of credible sets that have MAF above the cut-off.
| Tissue/cell line | Method | Overlaps observed | Overlaps expected | P value | Dataset | MAF cut-off | Number of credible sets |
|---|---|---|---|---|---|---|---|
| whole blood | Naïve | 3 | 3.7 | 0.746 | GODOT | 0.005 | 113 |
| whole blood | Naïve | 8 | 4.2 | 0.060 | Westra | 0.05 | 95 |
| CD14 IFN stimulated | Naïve | 4 | 3.2 | 0.398 | Fairfax | 0.04 | 98 |
| CD14 LPS 2h stimulated | Naïve | 1 | 2.1 | 0.869 | Fairfax | 0.04 | 98 |
| CD14 LPS 24h stimulated | Naïve | 5 | 2.5 | 0.106 | Fairfax | 0.04 | 98 |
| CD8 | Naïve | 1 | 0.3 | 0.306 | ULg | 0.05 | 95 |
| CD14 | Naïve | 0 | 0.2 | 1.000 | ULg | 0.05 | 95 |
| CD15 | Naïve | 1 | 0.2 | 0.199 | ULg | 0.05 | 95 |
| CD19 | Naïve | 0 | 0.1 | 1.000 | ULg | 0.05 | 95 |
| platelets | Naïve | 0 | 0.0 | 1.000 | ULg | 0.05 | 95 |
| colon | Naïve | 1 | 0.2 | 0.202 | ULg | 0.05 | 95 |
| rectum | Naïve | 1 | 0.2 | 0.189 | ULg | 0.05 | 95 |
| CD8 | Frequentist | 3 | 1.5 | 0.186 | ULg | 0.05 | 95 |
| CD14 | Frequentist | 4 | 2.3 | 0.180 | ULg | 0.05 | 95 |
| CD15 | Frequentist | 1 | 1.8 | 0.863 | ULg | 0.05 | 95 |
| CD19 | Frequentist | 0 | 1.4 | 1.000 | ULg | 0.05 | 95 |
| platelets | Frequentist | 0 | 0.1 | 1.000 | ULg | 0.05 | 95 |
| colon | Frequentist | 3 | 1.7 | 0.216 | ULg | 0.05 | 95 |
| CD8 | Bayesian | 1 | 0.8 | 0.566 | ULg | 0.05 | 95 |
| CD14 | Bayesian | 1 | 0.9 | 0.595 | ULg | 0.05 | 95 |
| CD15 | Bayesian | 0 | 0.7 | 1.000 | ULg | 0.05 | 95 |
| CD19 | Bayesian | 0 | 0.6 | 1.000 | ULg | 0.05 | 95 |
| platelets | Bayesian | 0 | 0.1 | 1.000 | ULg | 0.05 | 95 |
| ileum | Bayesian | 2 | 0.4 | 0.069 | ULg | 0.05 | 95 |
| rectum | Bayesian | 2 | 0.6 | 0.124 | ULg | 0.05 | 95 |
Figure 4. Number of credible sets that colocalize with eQTLs.
Distributions of the number of colocalizations expected by chance (violins) and the observed number of colocalizations with their P values (dots). Both the background and the observed numbers were calculated using the “Frequentist colocalization using conditional P values” approach (Methods).
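The P values compare each observed colocalization count with counts obtained by chance. Given a set of null counts, a one-sided empirical P value is commonly computed as sketched below. How the null replicates were generated is described in Methods and is not reproduced here, so the null counts in the example are hypothetical, as is the function name.

```python
from typing import Sequence


def empirical_p_value(observed: int, null_counts: Sequence[int]) -> float:
    """One-sided empirical P value: share of null replicates with at least as
    many colocalizations as observed, with a +1 correction so P is never 0."""
    at_least = sum(1 for n in null_counts if n >= observed)
    return (at_least + 1) / (len(null_counts) + 1)


if __name__ == "__main__":
    null = [0, 1, 1, 2, 2, 2, 3, 1, 0, 2]  # hypothetical chance counts
    print(empirical_p_value(3, null))  # 2/11
    print(empirical_p_value(0, null))  # 1.0
```

Under this definition an observed count of zero always gives P = 1, which is consistent with the 1.000 entries for zero observed overlaps in the colocalization table.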
Extended Data Figure 4. Functional annotations.
a, Functional annotation for 45 variants having posterior probability > 50%. b, Functional annotation for 116 association signals that are fine-mapped to ≤ 50 variants. Annotations are defined in Methods. We additionally grouped eQTLs into “Immune/Blood” (CD4+, CD8+, CD19+, CD14+, CD15+, platelets) and “Gut” (ileum, transverse colon and rectum). The eQTLs were generated from the ULg dataset using the “frequentist colocalization using conditional P values” approach (Methods).
Extended Data Figure 5. Size of credible sets.
Comparison of credible set sizes for primary signals obtained with each of our fine-mapping methods (methods 1, 2 and 3), the combined approach (as adopted in the final results) and the approach described in Maller et al. (ref. 6) (y axis), against set sizes obtained with an R > 0.6 cut-off (x axis). Fine-mapping maps most signals to fewer variants than the R > 0.6 cut-off.
Extended Data Figure 6. Distributions of the allele frequency and the imputation quality.
Panels a-c: distribution of the risk allele frequency for 45 variants having > 50% posterior probability plotted against (a) posterior probability, (b) significance of the association as –log10(P), and (c) odds ratio of the association. Variants are color coded according to their functions. Odds ratio for IBD associations was the larger of odds ratios for CD and UC. Panels d-f: distribution of imputation quality (INFO measure from the IMPUTE2 program) for variants having MAF ≥5% (d), between 5% and 1% (e) and <1% (f).
Genomic inflation.
Genomic inflation factors and LD score regression intercept for Crohn’s disease (CD), ulcerative colitis (UC) and both (IBD). a, Genomic inflation factors using the first four, five and six principal components. The factors were calculated using 2,853 background variants from the Immunochip. b, Genomic inflation factors for subsets of the data (using five principal components for the same 2,853 background variants). Balanced, imbalanced and down-sampled cohorts are defined in Methods. Numbers in brackets indicate the 95% confidence interval for the inflation factors (only estimated for the down-sampled cohorts). c, LD score regression intercept and genomic inflation factors (λ and λ) from the largest IBD meta-analyses with genome-wide data (CD:GWAS and UC:GWAS).
| a (principal components) | CD | UC | IBD |
|---|---|---|---|
| PC 1-4 | 1.41 | 1.31 | 1.38 |
| PC 1-6 | 1.28 | 1.25 | 1.32 |
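The genomic inflation factors above are conventionally defined as the median 1-df association chi-square statistic divided by the median of the null χ²₁ distribution (about 0.4549). A minimal sketch assuming this standard definition; the statistics in the example are hypothetical stand-ins for the 2,853 background variants, and the function name is illustrative.

```python
import statistics
from typing import Sequence

# Median of the chi-square distribution with 1 degree of freedom
CHI2_1DF_MEDIAN = 0.4549364231195724


def genomic_inflation(chisq: Sequence[float]) -> float:
    """Genomic control lambda: observed median chi-square over the null median."""
    return statistics.median(chisq) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    stats = [0.1, 0.3, 0.59, 0.9, 2.5]  # hypothetical 1-df chi-square statistics
    print(round(genomic_inflation(stats), 2))
```

A value of 1 means the background statistics match the null; values above 1, as in the table, indicate inflation from residual structure or other confounders.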
Extended Data Figure 7. Merging and adjudicating signals across methods.
The number of signals for each method is shown in brackets, and for each method a black bar indicates a signal with p < 1.35 × 10⁻⁶ and a grey bar a signal that does not reach that threshold. The colored bar shows the final status of each signal after merging and model selection (Methods). “Low info” corresponds to INFO < 0.8 (the threshold used for signals reported by one or two methods) and “rare and imputed” to MAF < 0.01 with no genotyped variants in the credible set, regardless of INFO (Methods).
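The two exclusion labels in this legend can be written as a small rule. The sketch below encodes only what the legend states: “low info” applies INFO < 0.8 to signals reported by one or two of the three methods, and “rare and imputed” applies to MAF < 0.01 with no genotyped variant in the credible set, regardless of INFO. The `Signal` fields (including using the lead variant's INFO and MAF) and the order in which the checks are applied are hypothetical; the actual adjudication is described in Methods.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signal:
    n_methods: int       # how many of the three methods reported the signal
    lead_info: float     # INFO of the lead variant (assumed field)
    lead_maf: float      # minor allele frequency of the lead variant (assumed field)
    any_genotyped: bool  # True if any credible-set variant was directly genotyped


def exclusion_label(s: Signal) -> Optional[str]:
    """Return the legend's exclusion label, or None if neither rule applies.
    Checking "rare and imputed" first is an assumption."""
    if s.lead_maf < 0.01 and not s.any_genotyped:
        return "rare and imputed"  # applies regardless of INFO
    if s.n_methods <= 2 and s.lead_info < 0.8:
        return "low info"  # INFO threshold only for 1- or 2-method signals
    return None


if __name__ == "__main__":
    print(exclusion_label(Signal(2, 0.6, 0.20, True)))   # low info
    print(exclusion_label(Signal(3, 0.6, 0.20, True)))   # None
    print(exclusion_label(Signal(3, 0.99, 0.005, False)))  # rare and imputed
```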