Catherine Tang, Davide Bagnara, Nicholas Chiorazzi, Matthew D Scharff, Thomas MacCarthy.
Abstract
Somatic hypermutation (SHM) of the immunoglobulin variable (IgV) loci is a key process in antibody affinity maturation. The enzyme activation-induced deaminase (AID) initiates SHM by creating C → U mismatches on single-stranded DNA (ssDNA). AID preferentially targets hotspot motifs in the context of WRC/GYW (W = A/T, R = A/G, Y = C/T), particularly WGCW overlapping hotspots, where hotspots appear opposite each other on both strands. Subsequent recruitment of the low-fidelity DNA repair enzyme Polymerase eta (Polη) during mismatch repair creates additional mutations at WA/TW sites. Although there are more than 50 functional immunoglobulin heavy chain variable (IGHV) segments in humans, the fundamental differences between these genes and their ability to respond to all possible foreign antigens are still poorly understood. To better understand this, we generated profiles of WGCW hotspots in each of the human IGHV genes and found the expected high frequency in complementarity determining regions (CDRs) that encode the antigen binding sites, but also an unexpectedly high frequency of WGCW in certain framework (FW) sub-regions. Principal Components Analysis (PCA) of these overlapping AID hotspot profiles revealed that one major difference between IGHV families is the presence or absence of WGCW in a sub-region of FW3 sometimes referred to as "CDR4." Further differences between members of each family (e.g., IGHV1) are primarily determined by their WGCW densities in CDR1. We previously suggested that the co-localization of AID overlapping and Polη hotspots was associated with high mutability of certain IGHV sub-regions, such as the CDRs. To evaluate the importance of this feature, we extended the WGCW profiles, combining them with local densities of Polη (WA) hotspots, thus describing the co-localization of both types of hotspots across all IGHV genes. We also verified that co-localization is associated with higher mutability.
PCA of the co-localization profiles showed CDR1 and CDR2 as being the main contributors to variance among IGHV genes, consistent with the importance of these sub-regions in antigen binding. Our results suggest that AID overlapping (WGCW) hotspots alone or in conjunction with Polη (WA/TW) hotspots are key features of evolutionary variation between IGHV genes.
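The hotspot motifs named in the abstract can be located with simple pattern matching. The following is a minimal sketch, not the authors' pipeline: it expands the IUPAC codes given above (W = A/T, R = A/G, Y = C/T) into regular expressions and uses zero-width lookaheads so that overlapping occurrences are all reported. The function name `find_sites`, the pattern names, and the toy sequence are illustrative assumptions.

```python
import re

# IUPAC expansions taken from the abstract: W = A/T, R = A/G, Y = C/T.
# Zero-width lookaheads let overlapping occurrences all be reported.
WRC = re.compile(r"(?=[AT][AG]C)")     # AID hotspot, top strand
GYW = re.compile(r"(?=G[CT][AT])")     # AID hotspot, bottom strand (reverse complement of WRC)
WGCW = re.compile(r"(?=[AT]GC[AT])")   # overlapping AID hotspot (WRC and GYW at the same GC)
WA = re.compile(r"(?=[AT]A)")          # Pol eta hotspot, top strand
TW = re.compile(r"(?=T[AT])")          # Pol eta hotspot, bottom strand (reverse complement of WA)


def find_sites(pattern, seq):
    """Return the 0-based start position of every match, including overlaps."""
    return [m.start() for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    toy = "AGCTAGCATTA"  # hypothetical sequence, not an IGHV gene
    for name, pat in [("WRC", WRC), ("GYW", GYW), ("WGCW", WGCW), ("WA", WA), ("TW", TW)]:
        print(name, find_sites(pat, toy))
```

Positions here mark where each motif starts; which base within the motif is treated as the mutable site is a separate choice that this sketch leaves open.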
Keywords: B cell receptor (BCR); activation induced deaminase (AID); computational immunology; dimensionality reduction; immunoglobulin heavy chain; somatic hypermutation (SHM); unsupervised learning
Year: 2020 PMID: 32425948 PMCID: PMC7204545 DOI: 10.3389/fimmu.2020.00788
Source DB: PubMed Journal: Front Immunol ISSN: 1664-3224 Impact factor: 7.561
Figure 1. Identifying WGCW hotspot regions. (A) We show the moving window profile for WGCW overlapping AID hotspots for IGHV1-69. The shaded areas mark CDR1 and CDR2. (B) Site-by-site calculation of the average number of WGCW hotspots found in a window of size 31 (+/– 15 nt around each site). The bold line indicates the average across the 56 human IGHV genes and is colored according to sub-region. The shaded region represents +/– 1 standard deviation at each site.
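The moving-window profile described in this caption can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: a hotspot is counted by its start position, windows are truncated at the sequence ends, the per-site spread uses the sample standard deviation, and all profiles are assumed to be the same length (for example, after alignment). The function names are hypothetical.

```python
import statistics


def window_profile(site_positions, seq_length, half_width=15):
    """Count hotspot starts within +/- half_width nt of each site.

    half_width=15 gives the window of size 31 used in the caption.
    Assumption: windows are simply truncated at the sequence ends.
    """
    marks = [0] * seq_length
    for pos in site_positions:
        marks[pos] += 1
    profile = []
    for i in range(seq_length):
        lo = max(0, i - half_width)
        hi = min(seq_length, i + half_width + 1)
        profile.append(sum(marks[lo:hi]))
    return profile


def per_site_mean_sd(profiles):
    """Per-site mean and sample standard deviation across equal-length profiles."""
    columns = list(zip(*profiles))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return means, sds


if __name__ == "__main__":
    # Hypothetical hotspot start positions in a toy 11-nt sequence.
    print(window_profile([0, 4], 11, half_width=2))
```

Applying `window_profile` to each gene and then `per_site_mean_sd` to the stacked profiles gives the kind of average line and +/– 1 standard deviation band shown in panel (B).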
Figure 2. Principal components analysis (PCA) of functional human IGHV genes. (A) PCA scores: the WGCW hotspot distribution profiles of the 56 functional human IGHV genes analyzed, projected onto the first two principal components (PC1 and PC2). The proportion of variance in the WGCW hotspot distribution profiles captured by each PC is shown in parentheses. Each gene is colored according to its IGHV family. Gene labels located far from their corresponding dot are attached by a fine line to avoid overlapping labels. (B) PCA loadings plot, where each dot represents a site and its relative contribution to each of the first two PCs. Distance from the origin (where PC1 and PC2 intersect) indicates the magnitude of each site's loading. Colors indicate the sub-region of each site. Dots enclosed by colored lines indicate high-contributing sites for each category (CDR1, CDR2, FW3).
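To make the terms "scores" and "loadings" concrete, the sketch below computes only the first principal component of a small matrix of profiles (one row per gene, one column per site) using power iteration on the column-centered data. It is a teaching example under stated assumptions, not the method used for the figure: the source does not say which PCA implementation was used, a standard library routine would normally return all components at once, and the function name and toy data are hypothetical.

```python
import math


def first_principal_component(profiles, iterations=200):
    """Return (loadings, scores, explained_fraction) for PC1.

    profiles: list of equal-length numeric rows (genes x sites).
    Power iteration on X^T X, where X is the column-centered matrix.
    """
    n_rows, n_cols = len(profiles), len(profiles[0])
    means = [sum(row[j] for row in profiles) / n_rows for j in range(n_cols)]
    centered = [[row[j] - means[j] for j in range(n_cols)] for row in profiles]

    # Deterministic, non-uniform start vector (an arbitrary choice).
    v = [1.0 / (j + 1) for j in range(n_cols)]
    for _ in range(iterations):
        proj = [sum(x * w for x, w in zip(row, v)) for row in centered]
        nxt = [sum(centered[i][j] * proj[i] for i in range(n_rows))
               for j in range(n_cols)]
        norm = math.sqrt(sum(c * c for c in nxt))
        if norm == 0.0:  # no variance, or start vector orthogonal to the data
            break
        v = [c / norm for c in nxt]

    # Scores: each gene's coordinate along PC1. Loadings: each site's weight in PC1.
    scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
    total = sum(x * x for row in centered for x in row)
    explained = sum(s * s for s in scores) / total if total else 0.0
    return v, scores, explained


if __name__ == "__main__":
    toy = [[0, 0], [1, 1], [2, 2]]  # hypothetical profiles, not real data
    loadings, scores, explained = first_principal_component(toy)
    print(loadings, scores, explained)
```

In this reading, a dot in panel (A) is one gene's pair of scores (PC1, PC2), and a dot in panel (B) is one site's pair of loadings.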
Figure 3. Distribution of mutation frequency between WGCW/WA sub-regions and non-WGCW/WA sub-regions. The distribution of the observed mutation frequencies is shown separately for WGCW/WA sub-regions (blue) and non-WGCW/WA sub-regions (red) for each individual IGHV gene. One-sided t-tests comparing the two distributions were performed for each gene. Significant p-values are indicated by asterisks above each plot (*p < 0.05; **p < 0.01; ***p < 0.001; ****p < 0.0001).
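Binomial test results showing mutability of all WGCW/WA sites (CDRs + FWs).
| IGHV gene | Mutations in WGCW/WA sites | Mutations in non-WGCW/WA sites | Total mutations | Percent mutations in WGCW/WA sites | Number of WGCW/WA sites | IGHV gene length (ungapped) | Percent WGCW/WA sites | P-value | FDR-corrected p-value |
|---|---|---|---|---|---|---|---|---|---|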
| IGHV1-2*02 | 960 | 13,928 | 14,888 | 6.4 | 10 | 270 | 3.7 | ||
| IGHV1-3*01 | 2,380 | 7,684 | 10,064 | 23.6 | 31 | 270 | 11.5 | ||
| IGHV1-8*01 | 907 | 7,931 | 8,838 | 10.3 | 18 | 270 | 6.7 | ||
| IGHV1-18*01 | 7,472 | 11,325 | 18,797 | 39.8 | 57 | 270 | 21.1 | ||
| IGHV1-24*01 | 76 | 3,840 | 3,916 | 1.9 | 3 | 270 | 1.1 | 4.52 × 10^−6 | 4.86 × 10^−6 |
| IGHV1-45*02 | 75 | 1,738 | 1,813 | 4.1 | 8 | 270 | 3.0 | 3.04 × 10^−3 | 3.11 × 10^−3 |
| IGHV1-46*01 | 4,032 | 7,368 | 11,400 | 35.4 | 51 | 270 | 18.9 | ||
| IGHV1-58*01 | 459 | 2,153 | 2,612 | 17.6 | 33 | 270 | 12.2 | 1.56 × 10^−15 | 1.77 × 10^−15 |
| IGHV1-69*01 | 3,399 | 5,778 | 9,177 | 37.0 | 54 | 270 | 20.0 | ||
| IGHV2-5*01 | 130 | 3,706 | 3,836 | 3.4 | 7 | 273 | 2.6 | 1.14 × 10^−3 | 1.19 × 10^−3 |
| IGHV2-26*01 | 956 | 2,896 | 3,852 | 24.8 | 43 | 273 | 15.8 | ||
| IGHV3-7*01 | 4,731 | 12,711 | 17,442 | 27.1 | 40 | 270 | 14.8 | ||
| IGHV3-9*01 | 5,192 | 16,114 | 21,306 | 24.4 | 30 | 270 | 11.1 | ||
| IGHV3-13*01 | 4,349 | 9,487 | 13,836 | 31.4 | 48 | 267 | 18.0 | ||
| IGHV3-15*01 | 2,546 | 6,362 | 8,908 | 28.6 | 45 | 276 | 16.3 | ||
| IGHV3-20*01 | 764 | 2,441 | 3,205 | 23.8 | 32 | 270 | 11.9 | ||
| IGHV3-21*01 | 3,446 | 13,941 | 17,387 | 19.8 | 28 | 270 | 10.4 | ||
| IGHV3-23*01 | 17,854 | 31,432 | 49,286 | 36.2 | 54 | 270 | 20.0 | ||
| IGHV3-30-3*01 | 3,907 | 4,442 | 8,349 | 46.8 | 68 | 270 | 25.2 | ||
| IGHV3-30*01 | 751 | 2,813 | 3,564 | 21.1 | 40 | 270 | 14.8 | ||
| IGHV3-33*01 | 6,621 | 12,254 | 18,875 | 35.1 | 45 | 270 | 16.7 | ||
| IGHV3-43*01 | 908 | 1,869 | 2,777 | 32.7 | 52 | 270 | 19.3 | ||
| IGHV3-48*01 | 1,052 | 4,097 | 5,149 | 20.4 | 28 | 270 | 10.4 | ||
| IGHV3-49*03 | 1,432 | 1,737 | 3,169 | 45.2 | 82 | 276 | 29.7 | ||
| IGHV3-53*01 | 2,170 | 5,770 | 7,940 | 27.3 | 35 | 267 | 13.1 | ||
| IGHV3-64*01 | 1,321 | 1,022 | 2,343 | 56.4 | 84 | 270 | 31.1 | ||
| IGHV3-66*01 | 951 | 2,740 | 3,691 | 25.8 | 35 | 267 | 13.1 | ||
| IGHV3-72*01 | 1,142 | 2,444 | 3,586 | 31.8 | 44 | 276 | 15.9 | ||
| IGHV3-73*01 | 1,192 | 1,412 | 2,604 | 45.8 | 72 | 276 | 26.1 | ||
| IGHV3-74*01 | 3,855 | 6,355 | 10,210 | 37.8 | 49 | 270 | 18.1 | ||
| IGHV4-4*02 | 529 | 6,769 | 7,298 | 7.2 | 14 | 270 | 5.2 | 3.09 × 10^−14 | 3.41 × 10^−14 |
| IGHV4-30-2*01 | 382 | 3,862 | 4,244 | 9.0 | 16 | 273 | 5.9 | 3.01 × 10^−16 | 3.50 × 10^−16 |
| IGHV4-30-4*01 | 451 | 3,236 | 3,687 | 12.2 | 22 | 273 | 8.1 | 1.89 × 10^−18 | 2.32 × 10^−18 |
| IGHV4-31*01 | 455 | 1,632 | 2,087 | 21.8 | 35 | 273 | 12.8 | ||
| IGHV4-34*01 | 1,746 | 25,944 | 27,690 | 6.3 | 16 | 267 | 6.0 | 1.51 × 10^−2 | 1.51 × 10^−2 |
| IGHV4-38-2*01 | 294 | 1,739 | 2,033 | 14.5 | 24 | 270 | 8.9 | 1.96 × 10^−16 | 2.35 × 10^−16 |
| IGHV4-39*01 | 3,202 | 8,585 | 11,787 | 27.2 | 38 | 273 | 13.9 | ||
| IGHV4-59*01 | 2,085 | 15,502 | 17,587 | 11.9 | 20 | 267 | 7.5 | ||
| IGHV4-61*01 | 438 | 2,059 | 2,497 | 17.5 | 27 | 273 | 9.9 | ||
| IGHV5-10-1*03 | 694 | 2,589 | 3,283 | 21.1 | 39 | 270 | 14.4 | ||
| IGHV5-51*01 | 2,584 | 12,755 | 15,339 | 16.8 | 34 | 270 | 12.6 | ||
| IGHV6-1*01 | 3,610 | 8,402 | 12,012 | 30.1 | 51 | 279 | 18.3 | ||
| IGHV7-4-1*01 | 1,394 | 3,237 | 4,631 | 30.1 | 43 | 270 | 15.9 | ||
Expected mutation probability (“Percent WGCW/WA Sites”) is (number of WGCW/WA sites)/(IGHV gene length, ungapped). Observed fraction of mutations in WGCW/WA sites (“Percent Mutations in WGCW/WA Sites”) is (number of mutations in WGCW/WA sites)/(total mutations). P-values are from a binomial test and are given both raw and FDR-corrected (see Materials and Methods).
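The test described in this note can be sketched in a few lines. Two points are assumptions, since the details are in the Materials and Methods rather than here: the test is taken to be one-sided (asking whether WGCW/WA sites receive more mutations than their share of the gene length predicts), and the FDR correction is taken to be the Benjamini-Hochberg procedure. The function names are hypothetical, and the tail probability is summed in log space so that totals in the tens of thousands do not overflow.

```python
import math


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1, summed in log space."""
    if k <= 0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Values from the IGHV4-34*01 row: 1,746 of 27,690 mutations fall in
    # WGCW/WA sites, which make up 16 of 267 positions.
    expected = 16 / 267
    print(binom_upper_tail(1746, 27690, expected))
```

If the one-sided assumption matches the authors' test, the IGHV4-34*01 example should give a raw p-value close to the tabulated 1.51 × 10−2. Adjusted values require the full list of raw p-values, so `benjamini_hochberg` is meant to be called once on all genes together.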
Figure 4. PCA of overlapping AID (WGCW) and Polη (WA/TW) hotspots. Plots equivalent to Figure 2 but using the combined overlapping AID and Polη (WGCW/WA) hotspot distributions. (A) Corresponding PCA scores of the co-localized profiles. Gray arrows point to IGHV genes with co-localized profiles enriched in CDR1, and black arrows indicate IGHV genes with an especially strong co-localization signal focused in CDR2. (B) Corresponding PCA loadings colored according to relevant sub-region. Gene labels located far from their corresponding dot are attached by a fine line to avoid overlapping labels.
Figure 5. Co-localized WGCW/WA profiles for functional and non-functional IGHV genes. Site-by-site calculation of the average number of WGCW/WA co-localized hotspots found in a window of size 31 (+/– 15 nt around each site) for (A) functional IGHV genes and (B) non-functional IGHV genes. The bold line indicates the average across the respective genes and is colored according to sub-region. The shaded region represents +/– 1 standard deviation at each site.