Cindy Im¹, Yadav Sapkota², Wonjong Moon², Minae Kawashima³, Minoru Nakamura⁴, Katsushi Tokunaga³, Yutaka Yasui⁵,⁶.
Abstract
Primary biliary cholangitis (PBC) susceptibility loci have largely been discovered through single-SNP association testing. In this study, we report genic haplotype patterns associated with PBC risk genome-wide in two Japanese cohorts. Of the 74 genic PBC risk haplotype candidates detected with a novel methodological approach in a discovery cohort of 1,937 Japanese individuals, nearly two-thirds (49 haplotypes; Bonferroni-corrected P < 6.8 × 10⁻⁴) were replicated in an independent Japanese cohort (N = 949). In addition to corroborating known PBC-associated loci (TNFSF15, HLA-DRA), risk haplotypes may model cis-interactions that regulate gene expression. For example, one replicated haplotype association (9q32–9q33.1; OR = 1.7, P = 3.0 × 10⁻²¹) consists of intergenic SNPs outside the human leukocyte antigen (HLA) region that overlap regulatory histone mark peaks in liver and blood cells and are significantly associated with TNFSF8 expression in whole blood. We also replicated a novel haplotype association involving non-HLA SNPs mapped to UMAD1 (7p21.3; OR = 15.2, P = 3.9 × 10⁻⁹) that overlap enhancer peaks in liver and memory Th cells. Our analysis demonstrates the utility of haplotype association analysis for discovering and characterizing PBC susceptibility loci.
Year: 2018 PMID: 29773854 PMCID: PMC5958065 DOI: 10.1038/s41598-018-26112-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
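The replication threshold and replication rate quoted in the abstract follow from simple arithmetic. The sketch below assumes a family-wise α of 0.05 divided evenly across the 74 discovery candidates; the abstract does not state α, but 0.05/74 ≈ 6.76 × 10⁻⁴ matches the quoted 6.8 × 10⁻⁴.

```python
# Minimal sketch: reproduce the replication threshold quoted in the abstract,
# ASSUMING a family-wise alpha of 0.05 split evenly across the 74 candidates.
ALPHA = 0.05          # assumed family-wise error rate
N_CANDIDATES = 74     # genic haplotype candidates from the discovery cohort
N_REPLICATED = 49     # candidates replicated in the independent cohort

bonferroni_threshold = ALPHA / N_CANDIDATES
replication_fraction = N_REPLICATED / N_CANDIDATES

print(f"Bonferroni threshold: {bonferroni_threshold:.2e}")  # 6.76e-04
print(f"Replicated fraction:  {replication_fraction:.3f}")  # 0.662
```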
Figure 1. Distributions of haplotype association test p-values for dropped versus selected 3-SNP haplotypes in the replication cohort. Side-by-side histograms compare the haplotype association p-values of dropped and selected haplotypes in the replication cohort (N = 949). Selected 3-SNP haplotypes are those whose permutation-based evaluation statistics fall in the top percentile (values less than −11.4).
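The caption defines selection by the top percentile of a permutation-based statistic, where lower (more negative) values rank higher. The sketch below shows only that final thresholding step, assuming a nearest-rank percentile; it does not reproduce how the permutation statistic itself is computed, and the candidate IDs and values are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def lower_tail_cutoff(stats: List[float], pct: float = 1.0) -> float:
    """Nearest-rank cutoff for the lowest `pct` percent of the statistics."""
    ordered = sorted(stats)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def select_top_percentile(stats_by_id: Dict[str, float],
                          pct: float = 1.0) -> Tuple[Dict[str, float], float]:
    """Keep candidates whose statistic is at or below the lower-tail cutoff."""
    cutoff = lower_tail_cutoff(list(stats_by_id.values()), pct)
    selected = {k: v for k, v in stats_by_id.items() if v <= cutoff}
    return selected, cutoff


# Toy input: 200 hypothetical candidates with statistics 0, -1, ..., -199.
toy = {f"h{i}": -float(i) for i in range(200)}
selected, cutoff = select_top_percentile(toy)
print(sorted(selected), cutoff)  # ['h198', 'h199'] -198.0
```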
Selected examples of replicated 3-SNP haplotypes.
| Chr | 3-SNP haplotype or logic tree | Gene analytic windows with 3-SNP haplotype | # SNPs in gene windows | Discovery: permutation-based selection statistic | Discovery OR | Discovery P | Replication OR | Replication P | Combined OR | Combined P | # with 0 haplotype copies (% cases) | # with 1 haplotype copy (% cases) | # with 2 haplotype copies (% cases) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | (rs3129881 = C) or ((rs375244 = A) and (rs3132947 = G)) | | 153–163 | −69.730 | 4.937 | 1.8 × 10⁻²⁰ | 2.324 | 1.6 × 10⁻⁵ | 3.665 | 2.3 × 10⁻²⁴ | 17 | 360 | 2,509 |
| 6 | ((rs9268831 = T) or (rs9269190 = T)) or (rs9270652 = C) | | 158–169 | −43.565 | 3.640 | 3.2 × 10⁻²³ | 2.315 | 1.9 × 10⁻⁷ | 3.075 | 7.3 × 10⁻²⁹ | 46 | 496 | 2,344 |
| 6 | (rs9295704 = C) and ((rs2451752 = A) and (rs2575174 = C))ᵃ | | 46–52 | −12.263 | 0.362 | 1.7 × 10⁻¹⁰ | 0.374 | 1.3 × 10⁻⁵ | 0.365 | 8.9 × 10⁻¹⁵ | 2,571 | 304 | 11 |
| 7 | ((rs12671658 = T) or (rs12702656 = A)) and (rs11768586 = G)ᵃ | | 204–292 | −16.424 | 0.040 | 8.7 × 10⁻⁶ | 0.112 | 3.6 × 10⁻⁴ | 0.066 | 3.9 × 10⁻⁹ | 2,802 | 84 | 0 |
| 9 | (rs4979484 = C) or ((rs13300483 = T) and (rs7028891 = G))ᵃ | | 121–156 | −28.431 | 1.746 | 4.2 × 10⁻¹⁷ | 1.528 | 8.6 × 10⁻⁶ | 1.672 | 3.0 × 10⁻²¹ | 775 | 1,434 | 677 |

Abbreviation: #, number.
ᵃ Contains no HLA region SNPs.
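The logic trees in the table above combine allele conditions with `and`/`or`, and the last three columns count individuals carrying 0, 1 or 2 copies. As a minimal sketch, assuming phased genotypes and that a tree is evaluated separately on each of an individual's two haplotypes (our reading of the copy counts, not a detail stated here), the copy count could be computed as below. The nested-tuple tree encoding and the function names are hypothetical, and alleles written as `x` are placeholders for any non-matching allele.

```python
from typing import Dict, Tuple

# Hypothetical encoding: ("snp", rsID, allele) leaves combined by ("and"|"or", left, right).
Tree = Tuple
Haplotype = Dict[str, str]  # rsID -> allele carried on one phased chromosome


def evaluate(tree: Tree, hap: Haplotype) -> bool:
    """Return True if a single phased haplotype satisfies the logic tree."""
    op = tree[0]
    if op == "snp":
        _, rsid, allele = tree
        return hap.get(rsid) == allele
    left = evaluate(tree[1], hap)
    right = evaluate(tree[2], hap)
    if op == "and":
        return left and right
    if op == "or":
        return left or right
    raise ValueError(f"unknown operator: {op!r}")


def haplotype_copies(tree: Tree, hap1: Haplotype, hap2: Haplotype) -> int:
    """Number of an individual's two haplotypes (0, 1 or 2) that satisfy the tree."""
    return int(evaluate(tree, hap1)) + int(evaluate(tree, hap2))


# Chr 6 example from the table: (rs3129881 = C) or ((rs375244 = A) and (rs3132947 = G))
TREE = ("or", ("snp", "rs3129881", "C"),
        ("and", ("snp", "rs375244", "A"), ("snp", "rs3132947", "G")))

# Toy haplotypes; "x" stands in for any non-matching allele.
h1 = {"rs3129881": "C", "rs375244": "x", "rs3132947": "x"}
h2 = {"rs3129881": "x", "rs375244": "A", "rs3132947": "G"}
h3 = {"rs3129881": "x", "rs375244": "A", "rs3132947": "x"}

print(haplotype_copies(TREE, h1, h2))  # 2
print(haplotype_copies(TREE, h1, h3))  # 1
```

The nested tuples keep the sketch dependency-free; the same recursive evaluation applies to any tree depth.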
Single SNP and component 2-SNP haplotype effects for example replicated 3-SNP haplotypes.
| Chr | 3-SNP haplotype or logic tree | Tree ORᵇ | Tree Pᵇ | Single SNP | Alternative allele | SNP ORᶜ | SNP Pᶜ | Component 2-SNP haplotype | Pair ORᵈ | Pair Pᵈ |
|---|---|---|---|---|---|---|---|---|---|---|
| 6 | (rs3129881 = C) or ((rs375244 = A) and (rs3132947 = G)) | 3.665 | 2.3 × 10⁻²⁴ | rs3129881 | C | 2.227 | 1.8 × 10⁻²³ | rs375244 = A and rs3132947 = G | 1.238 | 7.9 × 10⁻⁵ |
| | | | | rs375244 | A | 0.980 | 0.715 | | | |
| | | | | rs3132947 | G | 2.117 | 1.4 × 10⁻¹⁶ | | | |
| 6 | ((rs9268831 = T) or (rs9269190 = T)) or (rs9270652 = C) | 3.075 | 7.3 × 10⁻²⁹ | rs9268831 | T | 1.324 | 1.6 × 10⁻⁷ | rs9268831 = T or rs9269190 = T | 1.633 | 2.8 × 10⁻¹⁶ |
| | | | | rs9269190 | T | 1.252 | 1.3 × 10⁻⁴ | rs9268831 = T or rs9270652 = C | 1.321 | 2.0 × 10⁻⁵ |
| | | | | rs9270652 | C | 1.109 | 0.057 | rs9269190 = T or rs9270652 = C | 1.925 | 1.2 × 10⁻¹⁸ |
| 6 | (rs9295704 = C) and ((rs2451752 = A) and (rs2575174 = C))ᵃ | 0.365 | 8.9 × 10⁻¹⁵ | rs9295704 | C | 0.669 | 1.5 × 10⁻⁷ | rs9295704 = C and rs2451752 = A | 0.399 | 3.2 × 10⁻¹⁴ |
| | | | | rs2451752 | A | 0.951 | 0.457 | rs9295704 = C and rs2575174 = C | 0.638 | 1.2 × 10⁻⁷ |
| | | | | rs2575174 | C | 0.940 | 0.393 | rs2451752 = A and rs2575174 = C | 0.933 | 0.228 |
| 7 | ((rs12671658 = T) or (rs12702656 = A)) and (rs11768586 = G)ᵃ | 0.066 | 3.9 × 10⁻⁹ | rs12671658 | T | 1.023 | 0.682 | rs12671658 = T and rs11768586 = G | 0.104 | 1.4 × 10⁻⁶ |
| | | | | rs12702656 | A | 1.039 | 0.585 | rs12702656 = A and rs11768586 = G | 0.000 | 0.953 |
| | | | | rs11768586 | G | 0.865 | 0.010 | | | |
| 9 | (rs4979484 = C) or ((rs13300483 = T) and (rs7028891 = G))ᵃ | 1.672 | 3.0 × 10⁻²¹ | rs4979484 | C | 1.365 | 0.005 | rs13300483 = T and rs7028891 = G | 1.637 | 1.2 × 10⁻¹⁹ |
| | | | | rs13300483 | T | 1.584 | 1.4 × 10⁻¹⁷ | | | |
| | | | | rs7028891 | G | 1.574 | 2.8 × 10⁻¹⁷ | | | |

ᵃ Contains no HLA region SNPs.
ᵇ 3-SNP haplotype OR and p-value in the combined sample (N = 2,886).
ᶜ Single-SNP ORs and p-values, assuming an additive genetic effect model for the specified alternative allele.
ᵈ 2-SNP haplotype ORs and p-values, assuming an additive genetic effect model for the specified haplotype pattern.
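The footnotes above state that the single-SNP and 2-SNP ORs assume an additive genetic effect model: the log-odds of being a case change linearly with the number of copies (0, 1 or 2) of the allele or haplotype pattern. A minimal, dependency-free sketch of such a fit follows, assuming an unadjusted logistic regression fitted by Newton-Raphson with a Wald test; the study may have used covariates or a different test, and the toy counts are invented for illustration.

```python
import math
from typing import List, Tuple


def additive_or(copies: List[int], status: List[int],
                max_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float]:
    """Fit logit P(case) = b0 + b1 * copies by Newton-Raphson.

    Returns the per-copy odds ratio exp(b1) and its two-sided Wald p-value.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(copies, status):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system information * step = score.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    # Var(b1) is the (1, 1) entry of the inverse information from the last iteration.
    se_b1 = math.sqrt(h00 / det)
    z = b1 / se_b1
    return math.exp(b1), math.erfc(abs(z) / math.sqrt(2.0))


# Toy data: 100 non-carriers (30 cases), 100 one-copy carriers (60 cases).
copies = [0] * 100 + [1] * 100
status = [1] * 30 + [0] * 70 + [1] * 60 + [0] * 40
odds_ratio, p_value = additive_or(copies, status)
print(round(odds_ratio, 3))  # 3.5
```

With only 0/1 carriers, the fitted OR equals the 2×2 cross-product ratio (60 × 70)/(40 × 30) = 3.5, which makes a convenient sanity check; with 2-copy carriers present, the single slope enforces the additive (log-linear) trend.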
Comparison of logic regression and benchmark methods to detect 3-SNP haplotypes in the discovery cohort, N = 1,937.
| Chr | Method A (proposed, logic regression): # gene windows with replicated 3-SNP haplotypes | Method A: gene window with best p-value | Method A: best p-value | Method B (benchmark, 3-SNP sliding windows): # tests with P < 3.2 × 10⁻⁸ | Method B: # gene windows with at least one haplotype with P < 3.2 × 10⁻⁸ | Method B: gene window with best p-value | Method B: best p-value | Comparison: # gene window matches between methods A and B |
|---|---|---|---|---|---|---|---|---|
| 2 | 0 | NA | NA | 1 | 1 | | 1.1 × 10⁻⁹ | 0 |
| 3 | 0 | NA | NA | 5 | 3 | | 3.6 × 10⁻¹⁸ | 0 |
| 6 | 143 | | 6.1 × 10⁻²⁷ | 1,352 | 173 | | 2.2 × 10⁻³⁰ | 123 |
| 7 | 9 | | 8.7 × 10⁻⁶ | 6 | 3 | | 2.1 × 10⁻¹⁵ | 0 |
| 8 | 0 | NA | NA | 4 | 4 | | 1.1 × 10⁻⁸ | 0 |
| 9 | 12 | | 2.5 × 10⁻¹⁷ | 74 | 16 | | 3.2 × 10⁻¹³ | 12 |
| 18 | 0 | NA | NA | 10 | 5 | | 1.6 × 10⁻¹⁰ | 0 |
Abbreviation: #, number.
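Method B in the comparison table tests haplotypes formed by runs of three consecutive SNPs, and the last column counts gene windows flagged by both methods. A minimal sketch of those two bookkeeping steps follows, assuming the match count is the intersection of the two methods' flagged gene windows (the per-chromosome counts above are consistent with this reading, but the table does not define it). The haplotype association tests themselves are not reproduced, and all SNP and gene-window names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def sliding_3snp_windows(snps: List[str]) -> List[Tuple[str, ...]]:
    """All runs of three consecutive SNPs, in map order (empty if < 3 SNPs)."""
    return [tuple(snps[i:i + 3]) for i in range(len(snps) - 2)]


def windows_with_hits(best_p: Dict[str, float], threshold: float) -> Set[str]:
    """Gene windows whose best haplotype p-value passes the threshold."""
    return {window for window, p in best_p.items() if p < threshold}


def window_matches(windows_a: Set[str], windows_b: Set[str]) -> int:
    """Number of gene windows flagged by both methods."""
    return len(windows_a & windows_b)


# Toy data; SNP and gene-window names are hypothetical.
print(sliding_3snp_windows(["rs1", "rs2", "rs3", "rs4", "rs5"]))
method_b = windows_with_hits({"GENE_A": 1e-9, "GENE_B": 5e-7, "GENE_C": 2e-12}, 3.2e-8)
method_a = {"GENE_C", "GENE_D"}  # windows with replicated logic-tree haplotypes
print(sorted(method_b), window_matches(method_a, method_b))  # ['GENE_A', 'GENE_C'] 1
```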
Figure 2. Selected histone modification and DNase peak enrichment analysis results. The enrichment analyses compared the 106 SNPs in replicated 3-SNP haplotypes with nominally associated single SNPs in gene regions. The figure shows −log10-transformed enrichment test p-values (−log10(P)) for the 15 blood/liver cell types in which haplotype SNPs are significantly enriched for both H3K4me1 and H3K4me3 peaks. Dashed and dotted vertical lines show Bonferroni-corrected p-value thresholds based on the number of blood/liver cell types for which Roadmap Epigenomics assay data were available (29 and 11 cell types for histone mark and DNase peaks, respectively).
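The Figure 2 caption does not name the enrichment test. As a hedged sketch, one common choice is a one-sided Fisher's exact test on peak overlap (haplotype SNPs versus comparison SNPs), and the vertical reference lines can be reproduced as Bonferroni thresholds on the −log10(P) scale, assuming α = 0.05 divided by 29 (histone marks) and 11 (DNase) cell types. The 2×2 counts in the example are toy values, and the function names are hypothetical.

```python
import math


def fisher_enrichment_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided (enrichment) Fisher exact p-value for the 2x2 table
    [[a, b], [c, d]]: rows = haplotype SNPs / comparison SNPs,
    columns = overlapping a peak / not overlapping a peak."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = math.comb(n, col1)
    # Sum hypergeometric probabilities for tables at least as extreme as observed.
    tail = sum(math.comb(row1, x) * math.comb(n - row1, col1 - x)
               for x in range(a, min(row1, col1) + 1))
    return tail / total


def bonferroni_neglog10(alpha: float, n_tests: int) -> float:
    """Bonferroni threshold expressed on the -log10(P) scale."""
    return -math.log10(alpha / n_tests)


# Reference lines, assuming alpha = 0.05.
HISTONE_LINE = bonferroni_neglog10(0.05, 29)  # about 2.76
DNASE_LINE = bonferroni_neglog10(0.05, 11)    # about 2.34

print(round(HISTONE_LINE, 2), round(DNASE_LINE, 2))  # 2.76 2.34
print(fisher_enrichment_p(3, 1, 1, 3))                # 17/70, about 0.243
```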
Figure 3. Visualization of two replicated 3-SNP haplotype logic trees containing SNPs in the HLA region. The plotted region spans chr6:32156782–32585905 (hg19), corresponding to the red rectangle in the chromosomal ideogram. The top data track shows single-SNP association results and haplotype association results from the benchmark (“3-SNP sliding window”) and proposed (“3-SNP logic tree”) methods for this region. The annotation tracks below show RefSeq genes; a heatmap of Roadmap Epigenomics[21] H3K4me1 (Enh) and H3K4me3 (TSS) ChIP-seq peaks in primary B cells (E032) and primary T cells (E034); and the top significant blood (BLD), lymphoblastoid (LCL), and liver (LIV) eQTLs (expression quantitative trait loci) associated with SNPs in replicated haplotypes (GTEx Consortium)[20].
Functional annotations of SNPs in selected replicated 3-SNP haplotypes.
| Chr | 3-SNP haplotype | SNP | Chr position (hg19) | Ontology | Mapped gene | DHS overlapᵃ, # EIDs (# PBC EIDs) | H3K4me1 overlapᵃ, # EIDs (# PBC EIDs) | H3K4me3 overlapᵃ, # EIDs (# PBC EIDs) | Bound proteinᵇ | # Altered motifsᶜ | Significant eQTLsᵈ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | (rs3129881 = C) or ((rs375244 = A) and (rs3132947 = G)) | rs3129881 | 32409484 | intronic | | 0 (0) | 42 (20) | 50 (17) | GM12878 (OCT2, POL2, POL24H8, POU2F2); GM12891 (OCT2, POL2, POL24H8, POU2F2); GM12892 (POL2, POL24H8) | 3 | Whole Blood (…) |
| | | rs375244 | 32191457 | intronic | | 0 (0) | 69 (15) | 38 (3) | NA | 3 | Whole Blood (…) |
| | | rs3132947 | 32176782 | intronic | | 0 (0) | 2 (1) | 4 (0) | NA | 3 | Whole Blood (…) |
| 6 | ((rs9268831 = T) or (rs9269190 = T)) or (rs9270652 = C) | rs9268831 | 32427748 | intergenic | | 5 (1) | 31 (19) | 71 (16) | GM18951 (POL2) | 1 | Whole Blood (…) |
| | | rs9269190 | 32448500 | intergenic | | 2 (2) | 4 (4) | 2 (2) | NA | 2 | Whole Blood |
| | | rs9270652 | 32565905 | intergenic | | 1 (1) | 1 (1) | 2 (0) | NA | 0 | NA |
| 6 | (rs9295704 = C) and ((rs2451752 = A) and (rs2575174 = C)) | rs9295704ᵉ | 26704816 | intergenic | | 0 (0) | 13 (1) | 8 (1) | NA | 4 | Whole Blood (…) |
| | | rs2451752ᵉ | 26648013 | intronic | ZNF322 | 0 (0) | 1 (1) | 3 (1) | NA | 0 | Whole Blood (…) |
| | | rs2575174ᵉ | 25885552 | intergenic | | 0 (0) | 5 (0) | 0 (0) | NA | 2 | Whole Blood (…) |
| 7 | ((rs12671658 = T) or (rs12702656 = A)) and (rs11768586 = G) | rs12671658ᵉ | 7842281 | intronic | | 1 (0) | 6 (0) | 2 (0) | NA | 7 | NA |
| | | rs12702656ᵉ | 7851742 | intronic | | 0 (0) | 9 (3) | 0 (0) | NA | 4 | NA |
| | | rs11768586ᵉ | 7849806 | intronic | | 1 (0) | 6 (2) | 1 (1) | NA | 0 | NA |
| 9 | (rs4979484 = C) or ((rs13300483 = T) and (rs7028891 = G)) | rs4979484ᵉ | 117751450 | intergenic | | 22 (6) | 33 (20) | 11 (7) | GM12878 (BATF, NFKB); … | 3 | NA |
| | | rs13300483ᵉ | 117643362 | intergenic | | 0 (0) | 11 (6) | 2 (0) | NA | 2 | Whole Blood (…) |
| | | rs7028891ᵉ | 117645015 | intergenic | | 0 (0) | 4 (3) | 1 (1) | NA | 3 | Whole Blood (…) |

Abbreviations: EID, epigenome identifier; dist, distance; #, number; DHS, DNase I hypersensitivity site; eQTL, expression quantitative trait loci.
ᵃ Number of consolidated cell types (EIDs) in which the SNP overlaps a peak from the queried epigenomic assay (Roadmap Epigenomics Mapping Consortium processed data, Kundaje et al.)[21]. Values in parentheses (“# PBC EIDs”) count peak overlaps only among the 29 blood/liver cell types available in Roadmap Epigenomics.
ᵇ Bound protein: regulatory protein-binding ChIP-seq peak overlaps for the specified proteins, reported for blood- or liver-related cell lines only (HaploReg v4, Ward and Kellis)[37].
ᶜ Altered motifs: the number of regulatory motifs predicted to be affected by the SNP, based on position weight matrix (PWM) score changes (HaploReg v4, Ward and Kellis)[37].
ᵈ eQTLs: significant eQTLs reported for whole blood, lymphoblastoid, and liver cell types only (GTEx Consortium[20]; HaploReg v4, Ward and Kellis)[37].
ᵉ Non-HLA SNPs.
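The overlap columns in the table above count the Roadmap cell types (EIDs) whose assay peaks cover each SNP, with a second count restricted to the 29 blood/liver EIDs. A minimal sketch of that count follows, assuming peaks are stored as BED-style intervals (0-based start, exclusive end) and that the table's hg19 positions are 1-based. The EIDs and peak coordinates are hypothetical toy data, not Roadmap files.

```python
from typing import Dict, List, Set, Tuple

# (chrom, start, end): assumed BED-style, 0-based start, exclusive end.
Peak = Tuple[str, int, int]


def overlap_counts(chrom: str, pos_1based: int,
                   peaks_by_eid: Dict[str, List[Peak]],
                   pbc_eids: Set[str]) -> Tuple[int, int]:
    """Return (# EIDs with a peak covering the SNP, # of those in pbc_eids)."""
    pos0 = pos_1based - 1  # convert the SNP position to 0-based
    hits = {eid for eid, peaks in peaks_by_eid.items()
            if any(c == chrom and start <= pos0 < end for c, start, end in peaks)}
    return len(hits), len(hits & pbc_eids)


# Hypothetical peaks for three hypothetical epigenomes.
PEAKS = {
    "EID_A": [("chr9", 117751000, 117752000)],
    "EID_B": [("chr9", 117700000, 117700500)],
    "EID_C": [("chr9", 117751400, 117751500)],
}
PBC_EIDS = {"EID_A", "EID_B"}  # stand-in for the 29 blood/liver EIDs

# rs4979484 position from the table; output mirrors the "total (PBC)" cell style.
total, pbc = overlap_counts("chr9", 117751450, PEAKS, PBC_EIDS)
print(f"{total} ({pbc})")  # 2 (1)
```

Getting the 0-based/1-based conversion wrong shifts every overlap test by one base, which matters for SNPs sitting on a peak boundary.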