Seung-Soo Kim1, Adam D Hudgins1, Jiping Yang1, Yizhou Zhu1, Zhidong Tu2, Michael G Rosenfeld3, Teresa P DiLorenzo4,5,6,7, Yousin Suh1,8.
Abstract
Type 1 diabetes (T1D) is an organ-specific autoimmune disease, whereby immune cell-mediated killing leads to loss of the insulin-producing β cells in the pancreas. Genome-wide association studies (GWAS) have identified over 200 genetic variants associated with risk for T1D. The majority of the GWAS risk variants reside in the non-coding regions of the genome, suggesting that gene regulatory changes substantially contribute to T1D. However, identification of causal regulatory variants associated with T1D risk and their affected genes is challenging due to incomplete knowledge of non-coding regulatory elements and the cellular states and processes in which they function. Here, we performed a comprehensive integrated post-GWAS analysis of T1D to identify functional regulatory variants in enhancers and their cognate target genes. Starting with 1,817 candidate T1D SNPs defined from the GWAS catalog and LDlink databases, we conducted functional annotation analysis using genomic data from various public databases. These include 1) Roadmap Epigenomics, ENCODE, and RegulomeDB for epigenome data; 2) GTEx for tissue-specific gene expression and expression quantitative trait loci data; and 3) lncRNASNP2 for long non-coding RNA data. Our results indicated a prevalent enhancer-based immune dysregulation in T1D pathogenesis. We identified 26 high-probability causal enhancer SNPs associated with T1D, and 64 predicted target genes. The majority of the target genes play major roles in antigen presentation and immune response and are regulated through complex transcriptional regulatory circuits, including those in HLA (6p21) and non-HLA (16p11.2) loci. These candidate causal enhancer SNPs are supported by strong evidence and warrant functional follow-up studies.
Year: 2021 PMID: 34529725 PMCID: PMC8445446 DOI: 10.1371/journal.pone.0257265
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Overview of T1D SNP integrative annotation analyses.
A. Candidate T1D SNPs (n = 1,817) comprised both GWAS-identified SNPs (Tier 1, n = 129, p < 5E-8) and SNPs in high LD (Tier 2, n = 1,688, r² > 0.6 and D′ = 1). To prioritize high-confidence candidate T1D SNPs, four independent analyses, involving five public DBs, were conducted: 1) Integration of Roadmap enhancer and ENCODE TFBS annotations to identify functional enhancer SNPs; 2) Querying RegulomeDB to find functional evidence of causal T1D SNPs; 3) Mining GTEx data to identify eQTL-gene associations for T1D SNPs; 4) Utilizing lncRNASNP2 data to identify T1D SNPs located in lncRNA regions. Independently, the SNPsea algorithm was applied to the candidate T1D SNPs to identify tissue-specific enrichment of T1D risk loci gene expression, and the CoDeS3D algorithm was applied to find evidence of potential physical interactions between candidate T1D SNPs and risk locus genes. B. Venn diagram showing the overlap between prioritized T1D SNPs, GTEx enhancer eQTLs, and lncRNASNP2 lncRNA data. Among the 26 high-probability causal enhancer T1D SNPs, we found 15 eQTLs associated with 64 downstream genes (S3 Table), and 4 SNPs located in 5 lncRNAs (S4 Table). The highest priority SNPs, rs3129716 and rs886424, are both GTEx eQTLs and are located in lncRNA regions. C. Tissue distribution of GTEx eQTLs and their corresponding genes. Of the 48 tissues, whole blood and skeletal muscle have the largest numbers (14 SNPs) of eQTLs, with 20 genes implicated by the high-probability SNPs. In pancreas, there are 9 eQTLs among the high-probability SNPs, with 9 corresponding genes. The numbers of enhancer eQTLs and high-probability SNPs are colored as yellow and red, respectively. The numbers of genes paired with enhancer eQTLs and high-probability SNPs are colored as green and blue, respectively. Whole blood and pancreas are marked in bold.
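The two-tier candidate definition in panel A can be illustrated with a minimal Python sketch. Only the thresholds (p < 5E-8 for Tier 1; r² > 0.6 and D′ = 1 for Tier 2) come from the caption. The `SnpRecord` type, the `assign_tier` function, and the placeholder SNP identifiers are hypothetical names introduced for illustration and are not part of the authors' pipeline.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the Fig 1A caption.
GWAS_P_THRESHOLD = 5e-8   # Tier 1: genome-wide significance
LD_R2_THRESHOLD = 0.6     # Tier 2: r^2 must exceed this value
LD_DPRIME_REQUIRED = 1.0  # Tier 2: D' must equal this value


@dataclass
class SnpRecord:
    """Hypothetical container for one candidate SNP (illustration only)."""
    rsid: str
    gwas_p: Optional[float] = None   # GWAS association p-value, if reported
    r2: Optional[float] = None       # LD r^2 with a Tier 1 SNP, if computed
    d_prime: Optional[float] = None  # LD D' with a Tier 1 SNP, if computed


def assign_tier(snp: SnpRecord) -> Optional[int]:
    """Return 1 for a GWAS-identified SNP, 2 for a high-LD SNP, else None."""
    if snp.gwas_p is not None and snp.gwas_p < GWAS_P_THRESHOLD:
        return 1
    if (
        snp.r2 is not None
        and snp.d_prime is not None
        and snp.r2 > LD_R2_THRESHOLD
        and math.isclose(snp.d_prime, LD_DPRIME_REQUIRED)
    ):
        return 2
    return None


if __name__ == "__main__":
    # Placeholder records with made-up values, not data from the study.
    toy = [
        SnpRecord("snp_placeholder_1", gwas_p=1e-9),
        SnpRecord("snp_placeholder_2", r2=0.8, d_prime=1.0),
        SnpRecord("snp_placeholder_3", r2=0.5, d_prime=1.0),
    ]
    for record in toy:
        print(record.rsid, assign_tier(record))
```

Checking the GWAS criterion first means a SNP that satisfies both conditions is counted once, as Tier 1; whether the study handled such cases the same way is an assumption.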
High-probability causal enhancer SNPs, eQTLs, and lncRNAs.
| Rsid | DB source | Chr | Position | High-probability causal enhancer SNPs: GTEx eQTLs Gene# | High-probability causal enhancer SNPs: lncRNASNP2 lncRNA# | ENCODE Canonical TF-binding Motifs |
|---|---|---|---|---|---|---|
| rs1049053 | LDlink | chr6 | 32634405 | - | - | - |
| rs9274626 | LDlink | chr6 | 32636040 | - | - | - |
| rs9388486 | LDlink | chr6 | 126661154 | - | 1 | - |
| rs3024493 | LDlink | chr1 | 206943968 | - | - | - |
| rs3024505 | GWAS | chr1 | 206939904 | - | - | - |
| rs478222 | GWAS | chr2 | 25301755 | 3 | - | - |
| rs11715915 | LDlink | chr3 | 49455330 | 9 | - | - |
| rs6997 | LDlink | chr3 | 49453834 | 8 | - | - |
| rs9814873 | LDlink | chr3 | 49454112 | 9 | - | - |
| rs7725052 | GWAS | chr5 | 40487270 | - | - | - |
| rs7731626 | GWAS | chr5 | 55444683 | - | - | - |
| rs68037604 | LDlink | chr11 | 2212487 | - | 1 | - |
| rs10876870 | LDlink | chr12 | 56478002 | 3 | - | - |
| rs4759229 | LDlink | chr12 | 56474480 | 3 | - | - |
| rs12919083 | LDlink | chr16 | 11188930 | - | - | - |
| rs7203793 | LDlink | chr16 | 11182134 | - | - | - |
| rs3746923 | LDlink | chr21 | 43826344 | - | - | - |
| rs229544 | LDlink | chr22 | 37593080 | 1 | - | MAX, USF1 |
Prioritized T1D SNPs are enriched on chromosomes 6 and 16, which are marked with gray shading. Rs886424, rs3129716, rs1264361, and rs9268606 in the HLA region (chromosome 6p21.33, see Fig 2A) and rs4788084, rs62031562, rs743590, and rs762633 in the chromosome 16p11.2 locus (see Fig 3A) are marked in bold.
1. Integration of the Roadmap, ENCODE ChIP-seq, and RegulomeDB.
2. 64 genes are associated with 15 T1D eQTLs. See full list of gene-eQTL pairs in S3 Table.
3. 4 SNPs are located within 5 lncRNA regions. See full list of lncRNA-SNP pairs in S6 Table.
Fig 2. Complex transcriptional regulatory circuits in the HLA locus (6p21).
A. Regulatory circuits of the high-probability causal enhancer T1D SNPs, rs886424 and rs3129716, in the chromosome 6 HLA locus, including 25 associated genes. Red colored arrows indicate that the SNP is associated with increased expression of the indicated gene and blue colored arrows indicate an association with decreased expression of the indicated gene, according to GTEx eQTL data. The two eQTLs are located at lncRNA regions NONHSAG043434.2 (HGNC Symbol: LINC00243) and NONHSAT207123.1, respectively. B. Relative median gene expression levels of the 25 genes in the HLA locus from GTEx RNA-seq data. Based on their tissue-specific expression patterns, four representative gene groups were discovered by hierarchical clustering. The genes highly expressed in each organ are marked with colored boxes corresponding to the clustered groups. C. A heatmap for T1D-associated eQTL effect size, using the slope values of the eQTL-gene pairs from the GTEx eQTL analysis. D. A summary table of the two eQTLs rs886424 and rs3129716, including their 25 associated genes (“Gene” column), hierarchical clustering results of the tissue-specific gene expressions (“Cluster” column), directional effects of eQTLs on the genes (“Direction” column), and tissue numbers affected by eQTLs (“Tissue N” column). The genes with whole blood eQTLs are marked as bold font.
Fig 3. Complex transcriptional regulatory circuits in the 16p11.2 locus.
A. This diagram shares the same rules for color, shape, and arrows as Fig 2. Regulatory circuits of the four T1D-associated eQTLs (rs4788084, rs762633, rs743590, and rs62031562) and their 19 associated genes. The four eQTLs share a common direction of association with 16 of the 19 genes. Two of the genes, SGF29 and RABEP2, are associated with three of the eQTLs, rs762633, rs743590, and rs62031562. The expression of SBK1 is associated only with rs4788084. The enhancer region harboring rs4788084 contains a canonical binding motif for the TF EBF1. B. Relative gene expression levels of the 19 genes are displayed as a heatmap. By hierarchical clustering, two representative groups were found with high tissue-specific expression patterns. The genes highly expressed in each organ are marked with colored boxes corresponding to the clustered groups. C. A heatmap for T1D-associated eQTL effect size, using the slope values of the eQTL-gene pairs from the GTEx eQTL analysis. D. A summary table of the four eQTLs rs4788084, rs743590, rs762633 and rs62031562, including their 19 associated genes (“Gene” column), hierarchical clustering results of the tissue-specific gene expressions (“Cluster” column), directional effects of eQTLs on the genes (“Direction” column), and tissue numbers affected by eQTLs (“Tissue N” column).
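The captions for Fig 2 and Fig 3 read the direction of each eQTL effect from the GTEx slope value and report how many genes share a common direction across eQTLs. A minimal Python sketch of that bookkeeping follows. It assumes that a positive slope denotes increased expression and a negative slope decreased expression. The function names and the toy gene and SNP labels are hypothetical, and the slope values are invented for illustration.

```python
from typing import Dict, List


def effect_direction(slope: float) -> str:
    """Map an eQTL slope to a direction label (assumed sign convention)."""
    if slope > 0:
        return "increased"
    if slope < 0:
        return "decreased"
    return "none"


def genes_with_shared_direction(slopes: Dict[str, Dict[str, float]]) -> List[str]:
    """Return genes whose associated eQTLs all point in the same direction.

    `slopes` maps gene -> {snp -> slope}; a gene qualifies only if every
    slope has the same nonzero sign.
    """
    shared = []
    for gene, by_snp in slopes.items():
        directions = {effect_direction(s) for s in by_snp.values()}
        if len(directions) == 1 and "none" not in directions:
            shared.append(gene)
    return sorted(shared)


# Invented toy values; not taken from GTEx or from the study.
toy_slopes = {
    "GENE_A": {"snp_1": 0.42, "snp_2": 0.31},
    "GENE_B": {"snp_1": -0.20, "snp_2": -0.15},
    "GENE_C": {"snp_1": 0.10, "snp_2": -0.05},
}

if __name__ == "__main__":
    print(genes_with_shared_direction(toy_slopes))
```

With real data, slopes from different tissues for the same SNP-gene pair would need to be summarized first; how the study aggregated across tissues is not stated in the captions.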
Genes involved in immune response represent the largest functional category of eQTL-associated genes.
| Ensembl ID | Symbol | Name | Note | High-probability SNPs | Hclust group |
|---|---|---|---|---|---|
| ENSG00000204520.8 | | MHC class I polypeptide-related sequence A | HLA locus | - | 3 |
| ENSG00000204525.10 | | Major histocompatibility complex, class I, C | HLA locus | | |
| ENSG00000206503.7 | | Major histocompatibility complex, class I, A | HLA locus | | |
| ENSG00000204632.7 | | Major histocompatibility complex, class I, G | HLA locus | | |
| ENSG00000204516.5 | | MHC class I polypeptide-related sequence B | HLA locus | - | |
| ENSG00000196126.6 | | Major histocompatibility complex, class II, DR beta 1 | HLA locus | | |
| ENSG00000237541.3 | | Major histocompatibility complex, class II, DQ alpha 2 | HLA locus | | |
| ENSG00000232629.4 | | Major histocompatibility complex, class II, DQ beta 2 | HLA locus | | |
| ENSG00000204257.10 | | Major histocompatibility complex, class II, DM alpha | HLA locus | | |
| ENSG00000242574.4 | | Major histocompatibility complex, class II, DM beta | HLA locus | | |
| ENSG00000196735.7 | | Major histocompatibility complex, class II, DQ alpha 1 | HLA locus | | |
| ENSG00000179344.12 | | Major histocompatibility complex, class II, DQ beta 1 | HLA locus | | |
| ENSG00000198502.5 | | Major histocompatibility complex, class II, DR beta 5 | HLA locus | | |
| ENSG00000176920.10 | | Fucosyltransferase 2 | H antigen | - | 21 |
| ENSG00000223534.1 | | HLA-DQB1 antisense RNA 1 | Non-coding RNA | - | |
| ENSG00000176998.3 | | HLA complex group 4 | Non-coding RNA | - | 15 |
| ENSG00000206337.6 | | HLA complex P5 | Non-coding RNA | - | |
| ENSG00000103811.11 | | Cathepsin H | Proteinase | - | |
| ENSG00000204622.6 | | Major histocompatibility complex, class I, J (pseudogene) | Pseudogene | - | |
| ENSG00000230795.2 | | Major histocompatibility complex, class I, K (pseudogene) | Pseudogene | | |
| ENSG00000229391.3 | | Major histocompatibility complex, class II, DR beta 6 (pseudogene) | Pseudogene | | |
| ENSG00000196301.3 | | Major histocompatibility complex, class II, DR beta 9 (pseudogene) | Pseudogene | | |
| ENSG00000237669.1 | | HLA complex group 4 pseudogene | Pseudogene | - | 11 |
| ENSG00000173531.11 | | Macrophage stimulating 1 | - | rs11715915, rs6997, rs9814873 | 10 |
| ENSG00000133466.9 | | C1q and TNF related 6 | B-cell receptor | rs229544 | 3 |
| ENSG00000178188.10 | | SH2B adaptor protein 1 | Cytokine receptor | | 1 |
| ENSG00000105397.9 | | Tyrosine kinase 2 | Cytokine | - | |
| ENSG00000197272.2 | | Interleukin 27 | Cytokine | | 10 |
| ENSG00000204616.6 | | Tripartite motif containing 31 | Cytokine | - | 20 |
| ENSG00000160856.16 | | Fc receptor like 3 | Fc receptor-like | - | |
| ENSG00000240053.8 | | Lymphocyte antigen 6 family member G5B | Glycophosphatidylinositol | - | 1 |
| ENSG00000204421.2 | | Lymphocyte antigen 6 family member G6C | Glycophosphatidylinositol | - | 16 |
| ENSG00000156711.12 | | Mitogen-activated protein kinase 13 | Inflammation | - | 16 |
| ENSG00000111540.11 | | RAB5B, member RAS oncogene family | SMAD | rs10876870, rs4759229 | 3 |
| ENSG00000166949.11 | | SMAD family member 3 | SMAD | - | 3 |
| ENSG00000005020.8 | | Src kinase associated phosphoprotein 2 | Src | - | 2 |
| ENSG00000184293.3 | | C-type lectin like 1 | T cell costimulator | - | |
| ENSG00000119919.9 | | NK2 homeobox 3 | T cell differentiation, TF | - | 12 |
| ENSG00000213658.6 | | Linker for activation of T cells | T cell, TCR | - | 17 |
| ENSG00000171862.5 | | Phosphatase and tensin homolog | T cell, TCR | - | 3 |
| ENSG00000105287.8 | | Protein kinase D2 | T cell, TCR | - | |
| ENSG00000110852.4 | | C-type lectin domain family 2 member B | - | - | 3 |
| ENSG00000182179.6 | | Ubiquitin like modifier activating enzyme 7 | - | rs11715915, rs6997, rs9814873 | |
| ENSG00000163599.10 | | Cytotoxic T-lymphocyte associated protein 4 | - | - | |
| ENSG00000150637.4 | | CD226 molecule | - | - | |
| ENSG00000136153.15 | | LIM domain 7 | - | - | 13 |
| ENSG00000172575.7 | | RAS guanyl releasing protein 1 | - | - | 14 |
| ENSG00000164068.11 | | Ring finger protein 123 | - | rs11715915, rs6997, rs9814873 | 19 |
| ENSG00000224389.4 | | Complement C4B (Chido blood group) | Complement factor | | 27 |
| ENSG00000244731.3 | | Complement C4A (Rodgers blood group) | Complement factor | | 27 |
| ENSG00000187796.9 | | Caspase recruitment domain family member 9 | - | - | |
| ENSG00000164062.8 | | Acylaminoacyl-peptide hydrolase | Acylpeptide hydrolase | rs11715915, rs6997, rs9814873 | 18 |
| ENSG00000167914.6 | | Gasdermin A | Bactericidal activity | - | 16 |
| ENSG00000172057.5 | | ORMDL sphingolipid biosynthesis regulator 3 | Protein binding | - | 10 |
| ENSG00000204540.6 | | Psoriasis susceptibility 1 candidate 1 | Inflammation, Psoriasis | | 11 |
| ENSG00000185010.9 | | Coagulation factor VIII | Blood coagulation | - | 9 |
| ENSG00000176046.7 | | Nuclear protein 1, transcriptional regulator | Transcription factor | | 17 |
See S7 Table for full information on the 159 eQTL genes. Genes in gray shading are members of Hclust groups 7 and 8, which are enriched for genes involved in antigen presentation (see S1 Fig). The variants rs886424, rs3129716, rs1264361, and rs9268606, which reside in enhancer regions in the HLA locus (chromosome 6p21.33, see Fig 2A), and rs4788084, rs62031562, rs743590, and rs762633, which reside in enhancer regions in the chromosome 16p11.2 locus (see Fig 3A), are marked in bold.
*Hclust group: The result of hierarchical clustering by gene expression patterns (see S1 Fig).