Frauke Degenhardt, Gabriele Mayr, Mareike Wendorff, Gabrielle Boucher, Eva Ellinghaus, David Ellinghaus, Hesham ElAbd, Elisa Rosati, Matthias Hübenthal, Simonas Juzenas, Shifteh Abedian, Homayon Vahedi, B K Thelma, Suk-Kyun Yang, Byong Duk Ye, Jae Hee Cheon, Lisa Wu Datta, Naser Ebrahim Daryani, Pierre Ellul, Motohiro Esaki, Yuta Fuyuno, Dermot P B McGovern, Talin Haritunians, Myhunghee Hong, Garima Juyal, Eun Suk Jung, Michiaki Kubo, Subra Kugathasan, Tobias L Lenz, Stephen Leslie, Reza Malekzadeh, Vandana Midha, Allan Motyer, Siew C Ng, David T Okou, Soumya Raychaudhuri, John Schembri, Stefan Schreiber, Kyuyoung Song, Ajit Sood, Atsushi Takahashi, Esther A Torres, Junji Umeno, Behrooz Z Alizadeh, Rinse K Weersma, Sunny H Wong, Keiko Yamazaki, Tom H Karlsen, John D Rioux, Steven R Brant, Andre Franke.
Abstract
Inflammatory bowel disease (IBD) is a chronic inflammatory disease of the gut. Genetic association studies have identified the highly variable human leukocyte antigen (HLA) region as the strongest susceptibility locus for IBD and specifically DRB1*01:03 as a determining factor for ulcerative colitis (UC). However, for most of the association signal such a delineation could not be made because of tight structures of linkage disequilibrium within the HLA. The aim of this study was therefore to further characterize the HLA signal using a transethnic approach. We performed a comprehensive fine mapping of single HLA alleles in UC in a cohort of 9272 individuals of African American, East Asian, Puerto Rican, Indian and Iranian descent and 40 691 previously analyzed Caucasians, additionally analyzing whole HLA haplotypes. We computationally characterized the binding of associated HLA alleles to human self-peptides and analyzed the physicochemical properties of the HLA proteins and predicted self-peptidomes. Highlighting alleles of the HLA-DRB1*15 group and their correlated HLA-DQ-DR haplotypes, we not only identified consistent associations (regarding effect directions/magnitudes) across different ethnicities but also identified population-specific signals (regarding differences in allele frequencies). We observed that DRB1*01:03 is mostly present in individuals of Western European descent and hardly present in non-Caucasian individuals. We found peptides predicted to bind to risk HLA alleles to be rich in positively charged amino acids. We conclude that the HLA plays an important role in UC susceptibility across different ethnicities. This research further implicates specific features of peptides that are predicted to bind risk and protective HLA proteins.
Year: 2021 PMID: 33555323 PMCID: PMC8098114 DOI: 10.1093/hmg/ddab017
Source DB: PubMed Journal: Hum Mol Genet ISSN: 0964-6906 Impact factor: 6.150
Figure 1. HLA regional association plots. Association analysis results for imputed and genotyped SNVs (gray) and four-digit HLA alleles (yellow) are shown for (A) 373 African American cases and 590 controls (AA), (B) 13 927 Caucasian cases and 26 764 controls (EUR) and (C) 709 Japanese cases and 3169 controls (JPN), as well as (D) the meta-analysis (META) results from the analysis with RE2 (26) at variants with a MAF > 1% in the respective cohorts (including 17 276 cases and 32 975 controls from nine different cohorts). The association plots for the remaining populations are provided in Supplementary Material, Fig. S3. The curves in (A–C) show the P-value of the meta-analysis (PVALUE_RE2). In (D), the overlying curve shows I² as a measure of heterogeneity in the meta-analysis, indicating the heterogeneity of effects and allele frequencies in that region. Dashed lines indicate the thresholds of genome-wide (P = 5 × 10⁻⁸) and nominal significance (P = 10⁻⁵). The association analyses indicate HLA class II as the most strongly associated susceptibility region across the different populations. In the Korean and the Japanese populations, a strong association signal is also seen for B*52:01 and C*12:02, both alleles being in strong LD with the HLA class II loci DRB1*15:02, DQA1*01:02 and DQB1*06:01, i.e. another population-specific haplotype association exists in these ethnicities.
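The heterogeneity measure I² shown in panel (D) can be illustrated with a minimal sketch. It assumes the standard definition based on Cochran's Q with inverse-variance weights; the function name `i_squared` and the example inputs are hypothetical and are not taken from the study, whose exact computation is not described in this caption.

```python
def i_squared(effects, std_errors):
    """Return I^2 in percent for per-cohort effect estimates.

    effects: per-cohort effect sizes (e.g. log odds ratios); hypothetical inputs.
    std_errors: matching standard errors, all strictly positive.
    """
    # Inverse-variance weights and the fixed-effect pooled estimate.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    # I^2 is the share of Q exceeding its expectation (df), floored at 0.
    return max(0.0, (q - df) / q) * 100.0
```

Values near 0 indicate that cohort effects agree within sampling error, whereas values near 100 indicate that most of the observed variation reflects differences between cohorts.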
Figure 3. Haplotypes for associated HLA alleles. For a selection of associated HLA alleles, we show the most frequently observed risk (A) and protective (B) haplotypes in the respective populations: African American (AA), Puerto Rican (PRI), Caucasian (EUR), Maltese (MLT), Iranian (IRN), North Indian (IND), Chinese (CHN), Korean (KOR) and Japanese (JPN). We show only DRB1-DQA1-DQB1 haplotypes with a frequency > 1% in the case individuals of each respective population. The most frequently observed C-B alleles in each population were then added if the C-B-DRB1-DQA1-DQB1 haplotype occurred in at least five individuals. HLA-DRB3/4/5 alleles were taken from (15) and calculated on the basis of individuals hemizygous for HLA-DRB3/4/5 (i.e. carrying only one HLA-DRB1 observed with either HLA-DRB3, -DRB4 or -DRB5 and one DRB1*01, DRB1*08 or DRB1*10, which are not observed with any of the HLA-DRB3/4/5).
Figure 2. HLA single allele association analysis results at 2- and 4-digit resolution for MHC class II loci (A) HLA-DRB3/4/5, (B) HLA-DRB1 and (C) HLA-DQA1-DQB1. Allele frequency (AF; common defined as AF > 1%), odds ratio (OR), P-value (P) and whether an allele had a P-value < 0.05 (circle symbol) are shown for the respective population (e.g. circles with black boundary and red color represent an allele that is common and associated with risk). We depict association results of the analysis of the African American (AA), Puerto Rican (PRI), Caucasian (EUR), Maltese (MLT), Iranian (IRN), North Indian (IND), Chinese (CHN), Korean (KOR) and Japanese (JPN) cohorts and the meta-analysis (META) with I_SQUARE as an indicator of allelic heterogeneity and the P-value of association (PVALUE_RE2), combined here with single study P-values. Only HLA alleles that are significant in the meta-analysis, have an AF > 1% in at least one population and have a marginal post-imputation probability > 0.6 are shown. The strongest association signals in the meta-analysis are for risk alleles of the DRB1*15 group, i.e. DRB1*15:01, DRB1*15:02 and DRB1*15:03, and the alleles located on the same respective haplotype (Fig. 3). OR values > 5.0 or < 0.2 were ‘ceiled’ at 5.0 and 0.2, respectively (rare and nonsignificant alleles may have larger/smaller ORs). The ‘consistent alleles’ that are highlighted in Figure 3 are shown in bold type on the left side. Null alleles at the HLA-DRB3/4/5 loci are described as DRB3*00:00, DRB4*00:00 and DRB5*00:00, respectively.
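The ‘ceiling’ of odds ratios for display can be sketched as a simple clamp. This is an illustrative sketch only; the function name `cap_odds_ratio` is hypothetical, and the bounds are the 0.2 and 5.0 limits stated in the caption.

```python
def cap_odds_ratio(odds_ratio, lower=0.2, upper=5.0):
    """Clamp an odds ratio to the plotting range [lower, upper]."""
    # Values inside the range are returned unchanged.
    return min(max(odds_ratio, lower), upper)
```

With these bounds, an OR of 7.3 is drawn at 5.0 and an OR of 0.05 at 0.2, so extreme estimates from rare alleles do not stretch the color scale.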
Figure 6. Frequency of DRB1*01:03 across populations available in the Allele Frequency Net Database. (A) ‘Worldmap’, (B) zoom into the European continent. Frequencies are binned and shown within the ranges noted by AF. Allele frequencies of DRB1*01:03 are lower across central Europe than in the UK, Spain, India, South Africa, USA and coastal regions of South America. The figures were created using the R package rworldmap. Frequencies were extracted from the Allele Frequency Net Database (35) for populations with more than 100 individuals. To plot the geographic locations, we converted the assigned degrees and minutes to decimal degrees. We deleted all non-Caucasian populations with USA coordinates prior to plotting.
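The coordinate conversion mentioned in the caption can be sketched as follows. This is a minimal illustration; it assumes each coordinate is given as whole degrees, minutes and a hemisphere letter (N, S, E or W), which is an assumption about the input format, and the function name `dms_to_decimal` is hypothetical.

```python
def dms_to_decimal(degrees, minutes, hemisphere):
    """Convert degrees and minutes to signed decimal degrees.

    hemisphere: 'N', 'S', 'E' or 'W' (assumed input convention).
    """
    # One degree is 60 minutes.
    value = degrees + minutes / 60.0
    # Southern latitudes and western longitudes are negative.
    if hemisphere.upper() in ("S", "W"):
        value = -value
    return value
```

For example, 54 degrees 30 minutes N becomes 54.5, and 10 degrees 15 minutes W becomes -10.25.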
Figure 4. Clustering of DRB1 proteins according to preferential peptide binding and combined peptide-binding motifs. (MIDDLE CLUSTER): For five sets of 200 000 unique random human peptides, the percentile rank scores of preferential peptide binding were calculated using NetMHCIIpan-3.2 (29) for all DRB1 proteins whose alleles were significant in the genetic meta-analysis of the HLA and had an AF > 1% in at least one cohort. We additionally included DRB1*01:03. Within each set, the top 2% binders (according to the NetMHCIIpan-3.2 threshold) were used to perform a clustering on the pairwise correlations between two alleles using complete observations only. We show clustering results for peptide set 2. Labels were colored according to risk (red) or protective (blue). (BINDING MOTIFS): Top 2% binders were combined for proteins DRB1*11:01/04 and DRB1*13:01 (RISK 1); DRB1*12:01, DRB1*14:04 and DRB1*15:01/03 (RISK 2); DRB1*04:01/05, DRB1*07:01, DRB1*09:01 and DRB1*10:01 (PROT 1); and DRB1*04:03/04/06 (PROT 2). For this analysis, shared peptides (10% top binders) between at least two of the groups were deleted from the set. Here we depict the results for human peptide set 2. Peptide motifs were plotted using Seq2Logo (30). The color scheme shows the chemistry of the amino acids: red, positively charged; blue, negatively charged; green, polar; purple, neutral; black, hydrophobic.
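One possible reading of "pairwise correlations between two alleles using complete observations only" is a Pearson correlation computed over the peptides for which both alleles have a score. The sketch below illustrates that reading only; the function name `pairwise_complete_correlation` and the use of `None` for a missing score are assumptions, and the study itself used R rather than Python.

```python
import math

def pairwise_complete_correlation(scores_a, scores_b):
    """Pearson correlation of two score lists, skipping incomplete pairs.

    scores_a, scores_b: per-peptide scores for two alleles, aligned by
    peptide; None marks a missing score (assumed convention).
    """
    # Keep only peptides scored for both alleles.
    pairs = [(a, b) for a, b in zip(scores_a, scores_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete observations")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation undefined for constant scores")
    return cov / math.sqrt(var_a * var_b)
```

A matrix of such correlations between all allele pairs could then serve as the similarity input to a hierarchical clustering.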
Figure 5. Clustering according to chosen physicochemical properties of amino acids within the peptide-binding pockets. We only show sites with variable information in pockets (P) 1, 4, 6, 7 and 9 and only proteins for which the genetic analysis was significant (meta-analysis PVALUE_RE2 < 0.05) and for which at least one cohort had AF > 1%. We additionally show DRB1*01:03. Clustering was performed using the hclust function of the R package stats. The box below the cluster plot shows positions of P1, 4, 6, 7 and 9 of the beta (B) chain of the molecules (as defined in Supplementary Material, Table S5). Here we show combined scores F1 (A) and F3 (B) derived from a factor analysis of 54 unique amino acid properties (31). F1 captures polarity and hydrophobicity of the amino acid, whereas F3 captures amino acid size and bulkiness. For F1, high values indicate larger hydrophobicity, polarity and hydrogen donor abilities, whereas low values indicate nonpolar amino acids. For F3, high values indicate larger and bulkier amino acids, whereas low values indicate smaller, more flexible amino acids. We additionally show the residue volume (C) as a measure of pocket size and a score termed ‘hydrogen acceptor’ (HB-acceptor) (D), which describes the ability of an amino acid to participate in hydrogen bonds and corresponds to the number of atoms within the side chain that can accept a hydrogen. Additional information for the ‘charge’ parameter and the analysis for DQA1-DQB1 can be found in Supplementary Material, Figs S9 and S10.