Different classes of genomic inserts contribute to human antibody diversity.

Mikhail Lebedin^1,2,3, Mathilde Foglierini^4,5, Svetlana Khorkova^1,2,6, Clara Vázquez García^1,3, Christoph Ratswohl^1,7, Alexey N Davydov^8, Maria A Turchaninova^2,6, Claudia Daubenberger^9, Dmitriy M Chudakov^2,6,8, Antonio Lanzavecchia^4, Kathrin de la Rosa^1,3,10.

Abstract

Recombination of antibody genes in B cells can involve distant genomic loci and contribute a foreign antigen-binding element to form hybrid antibodies with broad reactivity for Plasmodium falciparum. So far, antibodies containing the extracellular domain of the LAIR1 and LILRB1 receptors represent unique examples of cross-chromosomal antibody diversification. Here, we devise a technique to profile non-VDJ elements from distant genes in antibody transcripts. Independent of the preexposure of donors to malaria parasites, non-VDJ inserts were detected in 80% of individuals at frequencies of 1 in 10^4 to 10^5 B cells. We detected insertions in heavy, but not in light chain or T cell receptor transcripts. We classify the insertions into four types depending on the insert origin and destination: 1) mitochondrial and 2) nuclear DNA inserts integrated at VDJ junctions; 3) inserts originating from telomere-proximal genes; and 4) fragile sites incorporated between J-to-constant junctions. The latter class of inserts was exclusively found in memory and in in vitro-activated B cells, while all other classes were already detected in naïve B cells. More than 10% of inserts preserved the reading frame, including transcripts with signs of antigen-driven affinity maturation. Collectively, our study unravels a mechanism of antibody diversification that is layered on the classical V(D)J and switch recombination.

Keywords: B cell diversity; antibody repertoire; insert

Year: 2022 PMID: 36037353 PMCID: PMC9457163 DOI: 10.1073/pnas.2205470119

Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 12.779

The generation of B cell diversity relies on two main mechanisms. The primary repertoire emerges in early B cell development by activation of recombination-activating gene (RAG) enzymes. Random and imprecise joining of numerous variable, diversity, and joining (V, D, J) gene segments assembles a V(D)J exon that encodes the antibody variable region (1). Upon antigen encounter, B cells express activation-induced cytidine deaminase (AID) to further diversify the antibody repertoire in secondary lymphoid organs (2). AID mediates DNA nicks and double-strand breaks (DSB) in the variable as well as in the switch region, thereby initiating somatic hypermutation (SHM) and class switch recombination (CSR) (3). SHMs adjust the antibody affinity, while CSR replaces the constant domain of antibodies by switching from IgM to IgG, IgE, or IgA, conferring different immune effector functions. Using antigen-based screening, we previously identified antibodies that gain Plasmodium parasite reactivity through integration of the extracellular immunoglobulin (Ig)-like domains of the leukocyte-associated immunoglobulin-like receptor 1 (LAIR1) (4, 5) or of the leukocyte immunoglobulin-like receptor B1 (LILRB1) (6). In six of nine donors, the >300 bp LAIR1 insert was positioned between V and D/J segments, while in the remaining three donors a LAIR1 exon with flanking introns integrated into the switch region and was spliced into the J-to-constant junction of the mRNA. In three donors with LILRB1 inserts, two extra exons were exclusively detected in the J-to-constant junction. LAIR1 and LILRB1 are originally encoded on chromosome (chr) 19. The recombination process to integrate inserts into the antibody locus on chr 14 relies on the generation of a DNA break acceptor site and the availability of an insert substrate. RAG and AID are enzymes known to cut at specific sites in Ig loci and are therefore likely to provide the acceptor site.
For example, transfected DNA is integrated into VDJ and switch regions with a 7- and 100-fold higher frequency than into average genomic sites (7). In contrast, the mechanism that may generate insert substrates is less clear. Both RAG and AID were previously shown to excise pieces of DNA that can be reinserted into the genome (8). Likewise, the RAG machinery was shown to insert recombination signal sequence (RSS)-containing Ig gene fragments into non-Ig sites in vitro (9–15). Reciprocally, in two human cases of follicular lymphoma, the BCL2 gene with cryptic RSS was excised from chr 18 and inserted into the Ig locus (16). In another study, microRNA-125b-1 was found inserted at the rearranged Ig locus in a case of B cell acute lymphoblastic leukemia (17). In contrast to LAIR1 insertions, BCL2 and MIR125B1 (microRNA-125b) were accompanied by a deletion from their original loci, suggesting RAG-mediated cut-and-pasting. Instead, endogenous LAIR1 alleles in B cells carrying the insertion remained intact, suggesting a copy-and-paste mechanism (4). Examples of large sequences inserted at DSBs have been reported in different experimental systems and in vivo. In yeast, the absence of the Dna2 nuclease promotes duplicates of genomic DNA fragments that are captured at DSBs (18). In human nonlymphoid cells, natural DSBs can be repaired by large templated DNA patches deriving from duplication of retrotransposons and reversely transcribed RNA (19, 20). In murine pro-B cells deficient for RAG2, inserts deriving from highly transcribed genes and early replicating fragile sites (ERFSs) integrated at an I-SceI restriction site (21). A distinct form of genomic aberrations is chromosomal translocations found in certain cancers (22, 23). Intriguingly, individuals endemically exposed to malaria are at higher risk to develop endemic Burkitt lymphoma arising from germinal center B cells (24, 25). 
In mice, Plasmodium chabaudi infection leads to chronically stimulated germinal centers with high levels of AID, thereby predisposing B cells for genomic instability and translocations (26). LAIR1-containing antibodies, despite their prevalence in about 10% of Africans exposed to malaria, have not been detected in a cohort of more than 800 European individuals (5). It remains to be established whether malaria plays an exclusive role in selection of LAIR1 antibodies or also contributes to their generation. So far, large genomic DNA insertions have been observed in the absence of malaria in the genomic switch region of plasmacytoma (27), as well as primary human B cells of healthy European donors (5). Here, we apply an unbiased, systematic approach to identify ectopic inserts in human antibody transcripts and address the general relevance of inserts to antibody diversity. Our methodology overcomes technical difficulties to enrich and screen for large insert-containing antibody transcripts. We characterize numerous antibody insertions in different B cell subsets, thereby shedding light on the molecular mechanisms involved. Importantly, we show that inserts in Ig transcripts occur in the vast majority of donors, independent of Plasmodium falciparum preexposure. Contrasting the classic recombination by predefined segments and addition of random nucleotides, our data suggest that ectopic inserts can contribute another layer of diversity forming the human antibody repertoire.

Results

Suppression PCR and a Data-Processing Pipeline Identify Insert-Containing Antibody Transcripts.

Previous repertoire studies have systematically omitted insert-containing antibodies because: 1) PCR preferentially amplifies shorter products, 2) size selection steps remove long PCR amplicons, and 3) limited read lengths prevent detection of chimeric sequences. To overcome these limitations, we developed an approach based on suppression PCR (28) to achieve selective amplification of rare, long antibody transcripts. In the first amplification step, we introduce inverted repeats that allow the formation of intramolecular hairpins. During downstream amplification, short amplicons are disfavored due to circularization, while the ends of long amplicons remain accessible for primer annealing. To enable reliable detection of V segments and isotype, we designed PCR primers that bind in framework region (FR) 3 at least 30 bp upstream of the CDR3 and at least 14 bp downstream in CH1 exons. While a conventional PCR failed to amplify a LAIR1-containing IgM transcript already at a 1:10 dilution, our approach detected insert-transcripts in the presence of a 10^3- to 10^4-fold excess of polyclonal IgM transcripts. As insert-containing antibody transcripts may be of low abundance and different size, we designed spike-in probes of J-to-constant inserts of defined length (50, 100, 250, 500, and 750 bp) using LAIR1 and ICAM1 genes as templates. Probes of each insert size were mixed equimolarly and spiked into primary B cell cDNA to represent 0.1% or 0.01% of total IgG sequences. A dedicated data-processing pipeline was developed and applied, identifying insertions from non-IGH loci ranging from 50 to 250 bp. Insertions of 500 bp and above were excluded, which was expected as the suppression effect was optimized to capture LAIR1-like events about 300 bp in length.
Finally, detection of the natural diversity of insert-containing transcripts was confirmed with monoclonal cell lines and primary blood samples of donors containing LAIR1-antibodies, revealing 1 versus 16 and 43 somatically hypermutated clones, respectively. Collectively, these results validate the suppression PCR method and demonstrate that this unbiased approach can be used to identify antibody mRNAs containing inserts of sizes ranging from 50 to 300 bp.
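The pipeline itself is not described in this section. As a minimal sketch of the core idea, assume a read has already been aligned so that the end of the upstream IGH match and the start of the downstream IGH match are known; a candidate insert is then the unaligned stretch between them. The names `IghFlanks` and `extract_insert`, and the 50 bp default cutoff (borrowed from the smallest spike-in probe), are illustrative assumptions and not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IghFlanks:
    """Read coordinates (0-based) of the IGH matches flanking a candidate insert."""
    upstream_end: int      # index just past the last base of the upstream IGH match
    downstream_start: int  # index of the first base of the downstream IGH match


def extract_insert(read: str, flanks: IghFlanks, min_len: int = 50) -> Optional[str]:
    """Return the unaligned sequence between the IGH flanks, or None if it is too short."""
    if not 0 <= flanks.upstream_end <= flanks.downstream_start <= len(read):
        raise ValueError("flank coordinates must be ordered and lie inside the read")
    gap = read[flanks.upstream_end:flanks.downstream_start]
    return gap if len(gap) >= min_len else None


# Toy read (hypothetical): 40 nt of V/J sequence, a 60 nt foreign stretch, 20 nt of CH1.
toy_read = "A" * 40 + "G" * 60 + "C" * 20
toy_insert = extract_insert(toy_read, IghFlanks(upstream_end=40, downstream_start=100))
short_gap = extract_insert(toy_read, IghFlanks(upstream_end=40, downstream_start=45))
```

A real workflow would additionally map the extracted stretch to the genome to confirm that it originates outside the IGH locus.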

Inserts Are Found in Donors of Different Origin and at Various Junctions of Heavy Chain Segments.

To determine the prevalence of insert-containing Ig transcripts in individuals and their possible relationship to exposure to malaria, we screened 56 healthy individuals living in Europe and Africa by suppression PCR, from which we isolated a total of 57.6 × 10^6 peripheral blood B cells (Dataset S1). Overall, we identified 1,822 antibody transcripts containing inserts that mapped to regions outside of the IGH locus (Dataset S2). Insert-containing IGH transcripts were detected in the majority of individuals (45 of 56 donors, 80.4%) (Fig. 1). Frequencies were comparable between European and African individuals, ranging from 1.67 × 10^-6 to 1.74 × 10^-4 per B cell. The absence of insert-containing transcripts in six monoclonal B cell lines excludes PCR recombination artifacts, confirming the specificity of the suppression PCR approach (Fig. 1).
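To make the frequency figures concrete, the following sketch divides the number of insert-containing transcripts by the number of B cells screened. The pooled value it produces is not reported in the text; it treats all 1,822 transcripts as independent events across the whole cohort, which is a simplifying assumption, and the function name `min_insert_frequency` is hypothetical.

```python
def min_insert_frequency(insert_transcripts: int, b_cells_screened: float) -> float:
    """Minimum insert frequency per B cell: detected inserts / cells analyzed."""
    if b_cells_screened <= 0:
        raise ValueError("b_cells_screened must be positive")
    return insert_transcripts / b_cells_screened


# Totals quoted in the text: 1,822 transcripts from 57.6 x 10^6 peripheral blood B cells.
pooled_frequency = min_insert_frequency(1_822, 57.6e6)
print(f"pooled minimum frequency: {pooled_frequency:.2e} per B cell")
```

The pooled estimate falls inside the per-donor range quoted above, which is what one would expect from a cohort-wide average.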

Fig. 1.

Molecular characteristics of insertions in antibody transcripts. (A) Minimum frequencies of unique inserts and analyzed cell numbers of polyclonal B cells isolated from blood of European (n = 39) and African (n = 17) donors, and monoclonal cell lines (n = 6). Frequency calculations are based on three (n = 45), two (n = 7), and one (n = 4) biological replicates for each donor. (B) Insert types classified by the position of the insert between antibody V, D, J, and CH1 (constant) segments. Blue: variable (V-D-J) insert; gray: nonclassified; red: insert between VDJ and CH1 domain (J-CH1); green: P/N-nucleotides. Bottom scheme depicts the putative genomic structure of a switch region insert allowing exon splicing between J and CH1. (C) Frequencies, n = numbers, and (D) classification of inserts by origin from genes: introns, exons, or intergenic regions. (E) Mapping of inserts identified in blood-derived B cells to nuclear chromosomes. Blue: VDJ-inserts; red: J-CH1 inserts; pale colors: inserts detected in in vitro-activated B cells. Numbers indicate genomic distance in kilobases if not stated otherwise. Dark-pink rectangles mark the 10 most common hotspots (hs). Boxes show top three hotspot donor sites. Gray rectangles: exons; red or blue rectangles: inserts. (F) Portion of top four hotspot donating VDJ and J-CH1 inserts. (G) Length distribution of detected insertions. 352 biological replicates from 56 donors. (H) Inserts mapped to mtDNA. Zoom-in box on D-loop region with origins of heavy (OH) and light strand replication sites (OL). L, H1, H2: light, heavy-1, heavy-2 strand promoters. Yellow squares and green circles depict cryptic splice acceptor and donor sites.

The insertions were classified according to the insert position in the antibody transcript (Fig. 1).
In the first group of transcripts, denoted as VDJ inserts, fragments were inserted in the antibody CDR3 region: between V and DJ (V-DJ, 33.4%), VD and J (VD-J, 5.5%), or, when no D segment could be assigned, between V and J (V-J, 17.7%). Rare transcripts contained two distinct inserts, both between V-D and D-J segments (V-D-J, 0.7%). In the second group of transcripts, insertions were located between the VDJ and the constant region (J-CH1 inserts, 26.2%). Of J-CH1 inserts, 45% spanned entire exons with the 5′- and 3′-ends precisely matching the original splice sites (Fig. 1), while for a fraction of the remaining J-CH1 inserts we could detect cryptic splice sites. These findings suggest that J-CH1 inserts are the product of a genomic insert comprising an exon or a cryptic exon with flanking introns that is spliced. We also identified transcripts missing the J-segment with inserts positioned between a V segment and constant domain (V-CH1, 11.8%). This insert type may represent CDR3 region integrations that are accompanied by the deletion of the J segment via genomic loss or alternative splicing. In conclusion, within the size limitation, our experimental approach revealed numerous unique inserts at frequencies ranging from 10^-4 to 10^-6 per B cell in most of the individuals analyzed, irrespective of their origin and preexposure to the malaria parasite.
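The positional classes described above can be expressed as a small lookup on the IGH segments that flank each insert. This is a sketch under the assumption that segment annotation has already been done upstream; `classify_transcript` and the tuple encoding of flanks are hypothetical names chosen for illustration.

```python
from typing import List, Tuple

# (segment upstream of the insert, segment downstream of the insert) -> class label
_SINGLE_INSERT_CLASSES = {
    ("V", "D"): "V-DJ",     # insert between V and an assigned D-J
    ("D", "J"): "VD-J",     # insert between V-D and J
    ("V", "J"): "V-J",      # no D segment could be assigned
    ("J", "CH1"): "J-CH1",  # insert between VDJ and the constant region
    ("V", "CH1"): "V-CH1",  # J segment missing from the transcript
}


def classify_transcript(flanks: List[Tuple[str, str]]) -> str:
    """Classify a transcript by the segments flanking each of its inserts."""
    if len(flanks) == 2 and set(flanks) == {("V", "D"), ("D", "J")}:
        return "V-D-J"  # two distinct inserts, one at each CDR3 junction
    if len(flanks) == 1:
        return _SINGLE_INSERT_CLASSES.get(flanks[0], "nonclassified")
    return "nonclassified"


labels = [
    classify_transcript([("V", "D")]),
    classify_transcript([("J", "CH1")]),
    classify_transcript([("V", "D"), ("D", "J")]),
    classify_transcript([("D", "CH1")]),
]
```

Anything that does not match one of the listed patterns falls through to "nonclassified", mirroring the gray category in Fig. 1.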

Insert Substrates Originate from Nuclear and Mitochondrial DNA.

We found that 85.7% of all inserts derived from the nuclear genome and map to all chromosomes (Fig. 1). The majority of insertions were unique, but certain inserts were detected multiple times. Of all nuclear insertions, 18.3% originated from 10 prominent regions (Fig. 1). Such hotspots were primarily associated with J-CH1 insert events (Fig. 1 and Dataset S3). Certain genes were frequently detected, such as CSNK1D, QSOX2, and PNPLA7 (boxes in Fig. 1). The detected insert length ranged from 29 to 563 bp, with a median of 160 bp (Fig. 1). Of all inserts, 14.3% originated from mitochondrial (mt) DNA (Fig. 1), many of which derived from hotspots. The most prominent hotspot donated 44.1% of mtDNA inserts and overlapped with the D-loop region containing transcription and replication initiation sites (Fig. 1). MT-CYB, MT-ND5, and MT-ND4 genes donated 36.4% of mtDNA inserts. Of note, mitochondrial inserts were exclusively found in VDJ junctions. We hypothesized that the bacterial ancestry of mtDNA with its lack of exon-intron structures might prevent an insert from splicing between J and CH1 segments. To investigate this hypothesis, we reanalyzed our published genomic dataset of the antibody switch regions sequenced by MinION (5) and found that none of 232 genomic switch inserts detected in six European individuals was of mtDNA origin. We conclude that mtDNA donates inserts exclusively to VDJ regions. Certain mtDNA inserts (3.4%) carried small deletions, which can be attributed to splicing at cryptic sites and points to the processing of such transcripts by the nuclear splicing machinery within the nucleus of a cell. Collectively, the data show that inserts deriving from mtDNA exclusively integrate between V-D-J segments, while fragments of nuclear DNA are found at both V-D-J and J-CH1 junctions.
Furthermore, our analysis reveals distinct hotspots in the nuclear genome and in the D-loop region of mtDNA that provide a disproportionate share of inserted templates.

ERFS and R-Loops Contribute to J-CH1 Inserts.

We further defined the donor regions of nuclear DNA and observed that 70.0% and 88.4% of VDJ and J-CH1 inserts, respectively, originated from mRNA-encoding regions, suggesting a possible link between insert source and transcription. To quantify transcription of insert donor regions, we used a published RNA sequencing dataset for human immune cells (29). Both VDJ and J-CH1 inserts originated from genes that are highly transcribed in human peripheral blood B cells (Fig. 2). Previous studies have linked transcription to DNA damage (30, 31), possibly by formation of RNA/DNA hybrids called R-loops (32). To test whether inserts originate from such structures, we compared our dataset to an R-loop immunoprecipitation sequencing (DRIP-seq) database derived from the human leukemic cell line K562 (33). As a control, we generated in silico datasets simulated to originate from randomized positions in the genome (Materials and Methods). J-CH1 insertions, but not VDJ insertions, displayed proximity to R-loop regions, with 45.4% of inserts overlapping with R-loops. In contrast, only 24.4% overlap was found for in silico controls (Fig. 2).
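The comparison against randomized controls can be sketched as follows: compute the fraction of insert donor sites that overlap an annotated feature set, then repeat the calculation for length-matched intervals placed at random positions. This toy version uses a single chromosome, half-open coordinates, and made-up intervals; `overlap_fraction` and `shuffled_controls` are hypothetical helpers, and the actual simulation procedure is the one referenced in Materials and Methods, which is not reproduced here.

```python
import random
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one toy chromosome


def overlaps_any(query: Interval, features: List[Interval]) -> bool:
    """True if the query shares at least one base with any feature."""
    q_start, q_end = query
    return any(q_start < f_end and f_start < q_end for f_start, f_end in features)


def overlap_fraction(queries: List[Interval], features: List[Interval]) -> float:
    """Fraction of query intervals that overlap at least one feature."""
    if not queries:
        return 0.0
    return sum(overlaps_any(q, features) for q in queries) / len(queries)


def shuffled_controls(queries: List[Interval], chrom_len: int,
                      rng: random.Random) -> List[Interval]:
    """Length-matched control intervals at uniformly random positions."""
    controls = []
    for start, end in queries:
        length = end - start
        new_start = rng.randrange(0, chrom_len - length + 1)
        controls.append((new_start, new_start + length))
    return controls


# Made-up example data (not from the study).
r_loops = [(100, 200), (500, 600)]
donor_sites = [(150, 160), (190, 210), (300, 310), (600, 650)]
observed = overlap_fraction(donor_sites, r_loops)
controls = shuffled_controls(donor_sites, chrom_len=10_000, rng=random.Random(0))
control_fraction = overlap_fraction(controls, r_loops)
```

Repeating the shuffle many times gives a null distribution against which the observed overlap can be judged.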

Fig. 2.

Features of insert substrates deriving from nuclear DNA. (A) Expression level (transcripts per million, TPM) of VDJ and J-CH1 compared to S-PCR (previously detected genomic inserts in the switch region) (5) insert donor genes and the transcriptome in B cells. (B) Distance (kb) of the insert donor site to the closest R-loop determined by DRIP-seq. (C) Distance (Mb) to ERFS sites retrieved by lift-over of murine data to human. (D) GC content in percent of the inserted sequence. (E) Distance (kb) of the insert donor site to closest LINE element. (F) Euler diagram depicting overlap of insert donor sites with AID-bound sites. (G) Overlap of insert donor sites with RAG-mediated off-targets and AID translocations. (H) Median insert donor site proximity (kb) to the AID-bound sites. For all panels: red, J-CH1 inserts; blue, VDJ-inserts; gray, in silico-generated control data (182,200 artificial inserts, 100 technical replicates). n = number of inserts. Black numbers: median. The black lines in the boxplots represent the median; the bottom and the top of the boxes represent the 25th and 75th percentiles. Red numbers: overlap in percent. Normality tested by Shapiro–Wilk test, significance computed by Wilcoxon signed-rank test. ns, P ≥ 0.05, *P < 0.05, **P < 0.01, ***P < 0.001, and ****P < 0.0001.

In malaria infection-driven germinal centers, DNA damage in replicating B cells preferentially occurs at ERFSs but not at common fragile sites (CFS) (26), which are genomic regions prone to break during late DNA replication (34, 35). DNA breakage at ERFSs is likely to be induced by replicative stress and is independent of AID activity (36). Insert origins showed significant proximity to ERFSs previously described for murine B cells (Fig. 2), while no proximity was observed for documented human CFSs. Inserts showed an elevated GC-content (Fig. 2), sharing this feature with ERFS (36), while CFSs were shown to be AT-rich (37–39).
In addition, inserts originated from genes that are significantly longer than the average human gene (mean 36.7 kb for J-CH1 and 100.7 kb for VDJ inserts). As long interspersed retrotransposable element (LINE)-mediated DNA repair was observed in mammalian cells and human genetic diseases (40), we tested if inserts may originate from these elements. Reported LINEs yielded a significant overlap with VDJ inserts but not J-CH1 inserts (Fig. 2), while no overlap was found for short interspersed retrotransposable elements (SINEs). To test if AID off-target activity may contribute to the described events, the donor sites were analyzed for overlaps with AID off-target regions using a chromatin immunoprecipitation-sequencing (ChIP-seq) database of murine B cells (41). While a random control displayed 22.8 to 32% overlap, 52.3% of J-CH1 inserts and 24.4% of VDJ inserts derived from regions that were shown to bind AID (Fig. 2 and Dataset S4). As AID binding is not sufficient for the induction of a DSB (41), we compared the locations of insert origins with available datasets of AID-mediated DSBs, translocations, or mutagenesis (31, 42–47). We found a moderate overlap of J-CH1 inserts detected in memory B cells with AID-mutated sites (47) (14.3% of the inserts, 1.47% of the control dataset) and with AID-associated translocations characterized by high-throughput genome-wide translocation sequencing (31) (5.7% of the inserts, 0.58% of the control dataset). From 234 documented AID translocation hotspots that derived from translocation-capture sequencing (43) and were converted from mice to human syntenic regions, only five genes were detected in eight unique insert-containing transcripts (Fig. 2). We detected significant proximity to AID target sites for J-CH1 inserts in both naïve and memory B cells (Fig. 2) but not in in vitro-activated B cells.
While the overlaps in naïve B cells argue against AID off-target activity generating inserted fragments, a lack in activated cells may be explained by a limited AID activity in vitro. Instead, in vitro activation significantly increased acquisition of J-CH1 inserts deriving from ERFSs, suggesting that during B cell activation and proliferation inserts are provided by replication stress rather than AID off-targeting. Inserts did not overlap with described RAG off-targets (48) (Fig. 2) and, importantly, cryptic RSS flanking the insert were as frequently detected as in random controls. To match inserts detected in transcripts with events observed in genomic DNA, we made use of our previously published dataset of switch region insertions (5). As expected, more than any other insertion type, J-CH1 insertions shared similarities with fragments detected by switch PCR (S-PCR), including their origin from highly expressed genes, overlap with R-loops, and proximity to ERFS and AID-bound sites (Fig. 2). Only the GC-content of genomic switch inserts was lower, which can be explained by the presence of introns that are less GC-rich compared to exons (Fig. 2). In conclusion, our data suggest that distinct molecular mechanisms contribute to insert substrates. While generation of the insert fragments appears to be independent of RAG activity, our results suggest that a portion of VDJ inserts derive from LINE elements. J-CH1 inserts may originate from ERFSs and R-loop proximal regions; a contribution of AID off-targeting may be moderate but cannot be excluded.
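To put the "moderate overlap" with AID datasets in perspective, the observed and control percentages quoted above can be turned into a simple fold-over-control ratio. The ratio is not a statistic reported in the text; it is plain arithmetic on the quoted percentages, and `fold_over_control` is a hypothetical helper.

```python
def fold_over_control(observed_pct: float, control_pct: float) -> float:
    """Ratio of the observed overlap percentage to the control overlap percentage."""
    if control_pct <= 0:
        raise ValueError("control_pct must be positive")
    return observed_pct / control_pct


# Percentages quoted in the text for J-CH1 inserts from memory B cells.
aid_mutated_fold = fold_over_control(14.3, 1.47)   # AID-mutated sites
translocation_fold = fold_over_control(5.7, 0.58)  # AID-associated translocations
print(f"{aid_mutated_fold:.1f}x and {translocation_fold:.1f}x over control")
```

Both ratios come out close to tenfold, even though the absolute share of inserts involved stays small.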

Acceptor Break Sites Occur at Distinct B Cell Developmental Stages.

VDJ inserts were detected in all populations analyzed at a similar frequency, namely pre-B cells, naïve, and memory B cells (Dataset S2), suggesting the involvement of RAG recombinase in providing acceptor sites during heavy chain rearrangement. Interestingly, the analysis of N nucleotides revealed that an insert donor template may be subject to TdT-mediated N nucleotide addition. We expected that J-CH1 inserts mainly occur in memory B cells as acceptor sites can be provided by AID-mediated DSBs in the genomic switch region (Fig. 1, bottom scheme). However, J-CH1 inserts were detected in naïve B cells, suggesting that certain J-CH1 inserts can be acquired in the absence of AID activity. Since inserts deriving from ERFSs were detected only after CD40L/interleukin (IL)-4 in vitro stimulation, we expected that B cell activation increases insert frequency. However, suppression PCR analysis revealed no significant change in frequency, which might be explained by a lower efficiency of splicing after in vitro stimulation. Nevertheless, we observed profound qualitative differences, as J-CH1 inserts detected in naïve B cells derived from telomere-proximal regions (Fig. 3), while ERFS proximity was exclusive for activated cells. Of note, insertions detected by genomic S-PCR (5) are not prone to originate from the subtelomeric regions (Fig. 3), which is a feature they share with the J-CH1 inserts detected in in vitro-activated cells.

Fig. 3.

Telomere-proximal J-CH1 inserts span multiple exons and are shared between distinct donors. (A) Distance of inserts to chromosome ends detected by suppression PCR for naïve, memory (Mem), and in vitro-activated (Act) B cells, and by S-PCR (previously detected genomic inserts in the switch region) (5); in silico controls in gray. Black text above the boxes: median in Mb. n = number of inserts. (B) Percentage and numbers of multiple exon inserts. (C) Schematic representation of a CSNK1D insert with the original insert donor locus, the putative IGH structure, and the detected transcript. (D) Telomere proximity and estimated genomic length calculated by 5′- and 3′-end coordinates of insert flanks of VDJ and J-CH1 inserts detected in primary and Act B cells (semitransparent dots). (E) CDR3 region alignments for CSNK1D-containing insert transcripts. Donor names (Don; n = 8), B cell population from which inserts were isolated (Pop), frame preservation (Prod), VDJ segment usage, and isotype (Iso) are shown. Gray shade density represents homology. The black lines in the boxplots represent the median; the bottom and the top of the boxes represent the 25th and 75th percentiles. ns, P ≥ 0.05, *P < 0.05, **P < 0.01, ***P < 0.001, and ****P < 0.0001.

J-CH1 Inserts of Telomere-Proximal Origin Span Multiple Exons and Are Shared between Donors.

We observed that 10 to 20% of inserts comprise multiple consecutive exons of one donor gene (Fig. 3), with introns spanning up to multiple kilobases. For example, the CSNK1D gene donated an insert with two exons that are 10.1 kb apart in the donor gene (Fig. 3), and the QSOX2 gene donated up to three exons covering a genomic distance of 2.9 kb (Fig. 1, upper box). J-CH1 inserts deriving from telomere-proximal regions are thus likely to span genomic distances of >1 kb (Fig. 3). Inserts deriving from the CSNK1D gene were detected in 8 of 56 donors. In three of them, the CSNK1D insert was observed in two to five distinct antibodies with unique CDR3 junctions (Fig. 3). Similarly, QSOX2 inserts were detected in 12 antibodies of 7 donors (Dataset S2). In conclusion, J-CH1 inserts in naïve B cells belong to a particular insert class likely resulting from splicing of multiexon genomic insertions that originate from telomere-proximal hotspots.
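The genomic length of a multiexon insert and its distance to the chromosome end can both be estimated from the donor coordinates of the outermost exons. The sketch below uses invented coordinates on a toy 1 Mb chromosome (they are not the real CSNK1D or QSOX2 positions), half-open intervals, and hypothetical function names.

```python
from typing import Tuple

Exon = Tuple[int, int]  # half-open [start, end) in donor-chromosome coordinates


def genomic_span(first_exon: Exon, last_exon: Exon) -> int:
    """Estimated genomic length from the 5' end of the first exon to the 3' end of the last."""
    return last_exon[1] - first_exon[0]


def intron_gap(first_exon: Exon, last_exon: Exon) -> int:
    """Genomic distance separating two exons that are adjacent in the transcript."""
    return last_exon[0] - first_exon[1]


def telomere_distance(start: int, end: int, chrom_len: int) -> int:
    """Distance from an interval to the nearer chromosome end."""
    return min(start, chrom_len - end)


# Toy example: two exons 10.1 kb apart, near the end of a 1 Mb chromosome.
exon_a: Exon = (980_000, 980_150)
exon_b: Exon = (990_250, 990_400)
gap_bp = intron_gap(exon_a, exon_b)
span_bp = genomic_span(exon_a, exon_b)
to_telomere_bp = telomere_distance(exon_a[0], exon_b[1], chrom_len=1_000_000)
```

Because only exons survive splicing, the transcript alone reveals the span between the outermost exon ends, which is a lower bound on the length of the integrated genomic fragment.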

Ectopic Inserts Were Undetectable in Light Chain or T Cell Receptor Transcripts.

To address whether V-J junctions of light chains serve as acceptor sites for inserts, we applied the suppression PCR technique to Ig-κ (IGK) transcripts. In 27 samples from 9 donors, no light chain inserts were detected. This finding was somewhat unexpected, as our data suggested that RAG might mediate insertions during heavy chain rearrangement. Mapping the reads to the IGK locus revealed frequent inclusions of inter-J sequences and alternatively spliced exons in the transcripts, confirming enrichment of long natural transcripts by suppression PCR. To clarify if alternative IGK transcripts, such as inter-J splice variants, could mask insert detection, we studied the IGL locus with consecutive J-C cassettes that would exclude integration of inter-J sequences. No inserts were found in λ light chain transcripts in six samples from a single donor. However, the interlaced structure of the IGL locus does not prevent the inclusion of cryptic J-C exons, which dominated the size-selected λ light chain amplification. Similar to the light chains, suppression PCR in two donors targeting T cell receptor (TCR)A and TCRB transcripts did not reveal any ectopic insert, but pointed to a significant incorporation of inter-J fragments and J-C alternative exons. We conclude that suppression PCR did not extract any ectopic inserts in TCR and light chain transcripts, suggesting that the IGH locus is most permissive for insert acquisition.

In-Frame Insert Transcripts Somatically Hypermutate and Class Switch.

To address the potential contribution of inserts to the functional antibody repertoire, we analyzed the frequency of in-frame transcripts and signatures of antigen-driven affinity maturation. A total of 13.2% of inserts were in-frame; these were more frequently detected at J-CH1 junctions (32.8% of J-CH1 inserts; 11.1% of total inserts) (Fig. 4). In contrast, 3.1% of all VDJ inserts (1.8% of total inserts) preserved the frame. To investigate if inserts could contribute to immune responses, we analyzed in-frame insert-containing transcripts isolated from memory B cells. Of in-frame VDJ inserts, 4.2% and 16.7% derived from IgM memory and IgG/A class-switched memory B cells, respectively; and 21.2% and 51% of J-CH1 inserts derived from IgM memory and IgG/A class-switched memory B cells, respectively (Fig. 4). SHMs were detected in VDJ in-frame inserts of memory but not naïve or bone marrow-derived pre-B cells (Fig. 4). Our data thus provide evidence that B cells expressing in-frame insert-containing transcripts underwent antigen-driven affinity maturation.
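Two small checks may help with this paragraph. The first is a simplified frame test: a junction that starts on a codon boundary preserves the frame if its length is a multiple of three and it contains no in-frame stop codon. The second reproduces the conversion from "share of VDJ inserts" to "share of total inserts", assuming the four VDJ position classes quoted earlier (33.4%, 5.5%, 17.7%, and 0.7%) make up the VDJ group. The function name and the stop-codon criterion are assumptions for illustration, not the authors' stated definition of "in-frame".

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def preserves_frame(junction: str) -> bool:
    """True if a codon-aligned junction (insert included) keeps an open reading frame."""
    seq = junction.upper()
    if len(seq) % 3 != 0:
        return False  # downstream segment would be shifted out of frame
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return not any(codon in STOP_CODONS for codon in codons)


frame_checks = [
    preserves_frame("ATGGCCAAA"),  # 3 codons, no stop
    preserves_frame("ATGGCCAA"),   # length not divisible by 3
    preserves_frame("ATGTAAGCC"),  # in-frame TAA stop
    preserves_frame("ATGGTAAGC"),  # TAA present but out of frame
]

# Share of all inserts that are in-frame VDJ inserts, from the quoted percentages.
vdj_share_of_total = 33.4 + 5.5 + 17.7 + 0.7        # percent of all inserts
in_frame_vdj_of_total = 0.031 * vdj_share_of_total  # percent of all inserts
```

The computed share rounds to the 1.8% of total inserts given in the text.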

Fig. 4.

In-frame inserts show signs of antigen-driven affinity maturation in memory B cells and are compatible with expression when grafted into recombinant antibodies. (A) In- and out-of-frame insert frequencies for different insert types. Each point represents an insert frequency of 1 of 56 donors analyzed (352 biological replicates). The black lines in the boxplots represent the median; the bottom and the top of the boxes represent the 25th and 75th percentiles. (B) Pie chart depicts out-of-frame (O-F) and in-frame (I-F) insertion frequencies with two subpies deconvoluting contributing B cell subpopulations. Schemes represent insert position of J-CH1 (red) and VDJ (blue) inserts in the antibody protein. n = insert numbers. NA = IgM transcripts derived from bulk sorting. (C) Schematic representation of in-frame VDJ insert-containing transcripts detected in IgG/IgM memory cells, naïve B cells, and pre-B cells. Red circles mark the SHMs. (D) Scheme depicting the selection of recombinant backbone antibodies with known specificity for grafting of in-frame inserts, which was based on a most favorable alignment of VDJ joints. SNTG2 is shown as an example. (E) Indicated constructs (15 VDJ Abs and 8 J-CH1 Abs) were expressed in Expi293F cells and the concentration of IgG in culture supernatants was determined by ELISA. (F) Protein G purified insert-antibody grafts were analyzed for their ability to bind the target antigen of the backbone antibody, an irrelevant antigen (Irr Ag), and bovine serum albumin (BSA). Green rectangles in C and D denote the N nucleotides. In A and B, in vitro-activated cells were excluded from analysis.

To assess the effect of ectopic insertions on the stability and binding properties of antibodies by in vitro expression, we cloned 20 of the most frequently detected in-frame fragments (8 J-CH1 and 12 VDJ) into available recombinant antibodies that share VDJ-homology with the insertion-carrier transcript (Fig. 4 and Dataset S5).
One VDJ insert (WBP1L) was cloned into four antibody backbones. In total, 14 of 15 VDJ and all J-CH1 insert-containing antibodies (further called VDJ Abs and J-CH1 Abs) were produced, detected by ELISA in culture supernatants, and purified by protein G (Fig. 4). Except for WBP1L, FI-KDM2B, and FI-HNRNPUL1 insertions, single bands of expected molecular weight compared to the backbone antibodies were observed. Next, we determined antibody binding toward the antigen recognized by the backbone antibody (Fig. 4 and Dataset S5). As expected, 9 of 14 purified VDJ Abs lost reactivity to the original target. Interestingly, 2 of 14 VDJ Abs presented with moderate target binding, but at the same time gained weak reactivity to irrelevant control antigens. Three of 14 VDJ Abs gained unspecific binding properties to all tested targets. Six of eight purified J-CH1 Abs maintained binding and in part gained weak to moderate unspecific binding. We conclude that the antibody structure is permissive for an insertion of 35 to 76 amino acids between VDJ and CH1 domains and for 33 to 86 amino acid insertions in the heavy chain CDR3 region.

Discussion

Frequency of Inserts in the Human Antibody Repertoire.

By developing a target-independent approach, we identified insert-containing antibody transcripts in the vast majority of individuals of a genetically diverse cohort. Inserts were detected at frequencies of 10^−4 to 10^−5 B cells, of which about 90% were out-of-frame. This finding is consistent with a limited contribution of inserts to total B cell diversity. We have to consider, however, that the antibody repertoire is dynamic and includes spatial and temporal diversity. Naïve B cells are continuously renewed and, at a given time, the number of circulating clones in all body compartments may range between 10^9 (49) and 10^11 (50) naïve and total B cells, respectively. LAIR1 inserts, despite being undetectable in nonexposed subjects, were readily selected in at least 5% of malaria-exposed individuals (5), demonstrating that inserts can contribute to the antibody response. This may especially apply when the insert is a pathogen receptor, but it is also possible that random insertions generate new antigen-binding domains that further diversify through somatic mutation. Insert detection was limited not only by the number of B cells analyzed but also by the fact that suppression PCR is size-selective. As PCR and screening conditions limit the detectable range, the obtained numbers need to be interpreted as minimum insert frequencies. Therefore, the contribution of inserts to diversity might be higher than estimated in this study.
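As a back-of-the-envelope illustration of the scale argument above, the following sketch multiplies an insert frequency by a B cell pool size. It assumes, purely for illustration, that a frequency measured in sampled B cells extrapolates uniformly to the whole pool and that a flat 10% of inserts are in-frame; the function and parameter names are hypothetical and not part of the study.

```python
def expected_insert_cells(pool_size, cells_per_insert, in_frame_percent=10):
    """Rough count of insert-bearing B cells in a pool.

    pool_size: number of B cells in the pool (e.g. 10**9 naive, 10**11 total)
    cells_per_insert: one insert per this many B cells (e.g. 10**4 or 10**5)
    in_frame_percent: assumed share of inserts that preserve the reading frame
    """
    with_insert = pool_size // cells_per_insert
    in_frame = with_insert * in_frame_percent // 100
    return with_insert, in_frame


# Conservative case: naive pool with the lower frequency.
low = expected_insert_cells(10**9, 10**5)
# Generous case: total pool with the higher frequency.
high = expected_insert_cells(10**11, 10**4)
print(low, high)
```

Under these simplifying assumptions, even the conservative case leaves a nonnegligible number of in-frame insert-bearing cells available for selection, which is the point the paragraph makes qualitatively.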

Four Insert Classes Defined by Acceptor Break Sites and Insert Origins.

Our results suggest a classification of inserts into four classes based on two parameters: the site of insertion and the source of the insert (Table 1). V-D-J inserts derive either from nuclear genes (nucVDJ) or from mitochondrial DNA (mtVDJ), and J-CH1 inserts derive either from ERFS (nucJC) or from telomere-proximal regions (telJC).

Table 1.

Four distinct insert classes: nucVDJ, mtVDJ, nucJC, and telJC were defined

	nucVDJ	mtVDJ	nucJC	telJC
Acceptor site	RAG-mediated	RAG-mediated	AID-mediated	Switch region fragility, RAG or AID
Insert source	Nuclear DNA: LINE elements, very large genes (mean 100 kb), slightly GC-rich	mtDNA: D-loop	Nuclear DNA: ERFS proximal, highly GC-rich, large genes (mean 50 kb)	Nuclear DNA: telomere-proximal, R-loop proximal, highly GC-rich
Occurrence	Prior to antigen encounter	Prior to antigen encounter	After B cell activation	Prior to antigen encounter
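The two-parameter classification in Table 1 can be read as a simple lookup keyed by insertion site and insert source. The sketch below is only an illustration of that reading; the key strings and the `classify_insert` helper are hypothetical names introduced here, not identifiers from the study's scripts.

```python
from typing import NamedTuple, Optional


class InsertClass(NamedTuple):
    name: str
    acceptor_site: str
    occurrence: str


# Keys are (insertion site, insert source); values follow Table 1.
INSERT_CLASSES = {
    ("VDJ", "nuclear"): InsertClass(
        "nucVDJ", "RAG-mediated", "prior to antigen encounter"),
    ("VDJ", "mitochondrial"): InsertClass(
        "mtVDJ", "RAG-mediated", "prior to antigen encounter"),
    ("J-CH1", "ERFS-proximal"): InsertClass(
        "nucJC", "AID-mediated", "after B cell activation"),
    ("J-CH1", "telomere-proximal"): InsertClass(
        "telJC", "switch region fragility, RAG or AID",
        "prior to antigen encounter"),
}


def classify_insert(site: str, source: str) -> Optional[InsertClass]:
    """Return the Table 1 class, or None for combinations the table does not cover."""
    return INSERT_CLASSES.get((site, source))


print(classify_insert("J-CH1", "telomere-proximal").name)
```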

We assume that nucVDJ and mtVDJ inserts occur prior to antigen encounter as a consequence of repair at RAG-mediated breaks. The origins of nucVDJ inserts in part overlapped with LINE elements and large genes. The latter are known to be prone to DNA damage due to a clash of transcription with replication (51, 52). Although cellular divisions of B cell progenitors are limited, it may be possible that in rare cases single-stranded DNA may be released from stalled replication forks (53) serving as patches for the repair of DNA breaks. We observed that mtVDJ inserts derived from the mitochondrial chromosome but not from genomic mitochondrial pseudogenes (54). NUMTogenesis, the transfer of mtDNA to the nucleus, has been suggested to play a role in cancer development (55) and 19% of templated sequence insertion polymorphisms found in the human genome derive from mtDNA (56). Intriguingly, mtDNA nuclear insertion was detected in a zygote or early after fertilization but not in experimentally induced I-SceI DSBs in a leukemia cell line (20). These results point to either a different mechanism exploited by mtDNA or to a distinct cellular state at which mtDNA donates inserts. NucJC inserts were mainly present in memory and in vitro class-switched B cells, suggesting that acceptor break sites are generated by AID. Substrate sources, instead, overlapped with ERFSs that are independent of AID activity (36). The observed overlap of some detected inserts with AID-bound sites may result from transcription and open chromatin contributing to both AID binding and fragment generation. Naïve B cells harbored telJC inserts that comprised multiple exons. It remains unclear how acceptor sites between the J-to-constant region are generated, since AID may only rarely be expressed at the early stages of B cell development (57). 
TelJC insert substrates and acceptor sites may, for example, be mediated by RAG proteins that excise DNA over large genomic distances and have the potential to function as a transposase that targets GC-rich sequences and hairpins (12, 14, 58). Considering that telJC inserts may span multiple kilobases in the genome, conventional techniques in the past may have erroneously classified telJC inserts as translocation events. Linear amplification methods (59) could in the future be combined with suppression PCR to study the genomic architecture and deliver further mechanistic insights into the nature of the phenomenon. However, a dual-sided approach may be needed to clearly distinguish a multikilobase genomic insert from a translocation. Finally, the LAIR1 gene, which donates inserts in malaria-exposed individuals (5), shares characteristics with inserts identified in this study. The LAIR1 gene is 19 kb long, which is almost double the average gene size, and is proximal (at 4.26 Mb distance) to the chromosome end, 8.6 kb apart from the closest R-loop. LAIR1 is highly expressed in progenitor and naïve B cells, as well as in a fraction of memory B cells. LAIR1 does not overlap with ERFSs; however, it contains a CFS. Together, these features hint at LAIR1 inserts resembling nucVDJ and telJC insert classes.

Insert Contribution to the Antibody Repertoire.

VDJ inserts and J-CH1 inserts have to meet different requirements to be expressed as a protein. VDJ inserts integrate into the CDR3 region and thus need to preserve the open reading frame to allow positive B cell selection. J-CH1 insertions, in contrast, can be alternatively spliced, giving rise to B cell receptors (BCRs) with and without an insert. As VDJ inserts are likely to arise during B cell development, they may be subject to selection. Despite rare exceptions (60), the vast majority of B cells are committed to expressing a single antibody heavy chain due to allelic exclusion (61–64). Therefore, detection of heavy chain transcripts containing hypermutated, in-frame VDJ inserts suggests that inserts contribute to functional diversity. J-CH1 inserts comprise entire exons and are susceptible to alternative splicing. Besides having a direct impact on specificity, the addition of an insert between the VDJ and CH1 domains could also affect protein conformation and binding, as well as clustering of BCRs, which was shown to depend on the spatial organization in the plasma membrane (65). Despite detecting several insert transcripts compatible with expression, we did not detect another LAIR1-like example. Two main factors may be crucial: gain of a particular specificity by the integration of a pathogen receptor and chronic stimulation by that pathogen. The latter would increase the likelihood of activating a rare B cell clone. In the future, an exclusive screening of memory B cells of donors exposed to chronic infections may enable isolation of other functional insert antibodies. Finally, our work not only provides a tool to unravel an additional layer of antibody diversity, but also provides molecular insights into the mechanisms of insert acquisition that might be of particular relevance in chronic infectious diseases.

Materials and Methods

Human Specimens.

Blood from healthy individuals was obtained from the German and Swiss Red Cross. In all cases, written informed consent was obtained and samples were anonymized. All healthy donor samples tested negative for HIV, HBV, and HCV. Peripheral blood mononuclear cells (PBMCs) from malaria preexposed volunteers were obtained from the prevaccination period of the P27A vaccine phase Ib trial (ClinicalTrials.gov Identifier: NCT01949909, Pan African Clinical Trial Registry identifier: PACTR201310000683408). This study was conducted with approval from the Tanzanian Food and Drug Administration (TFDA; Dar-es-Salaam, TFDA13/CTR/004/03), National Institute for Medical Research (NIMR; Dar-es-Salaam, NIMR/HQ/R8a/Vol.IX/1742), and ethical review boards at Ifakara Health Institute and the University of Lausanne, and all volunteers provided informed consent before blood donation. Bone marrow specimens were obtained from AllCells.

Extraction of Human PBMCs.

Blood samples were diluted 1:1 with Dulbecco’s phosphate-buffered saline (DPBS, Sigma, Cat#D8537-500ML) containing 2 mM EDTA and separated using Ficoll density gradient centrifugation (Roth, Cat#0642.2). PBMCs were either immediately processed or resuspended in 90% FBS with 10% DMSO, frozen in freezing chambers at −80 °C, and transferred to liquid nitrogen. Frozen PBMCs or cells extracted from bone marrow material were thawed at 37 °C for 4 min and gradually diluted in a 10-fold excess of B cell medium (composition is described below in B Cell Activation). Thawed cells were centrifuged for 5 min at 350 × g, resuspended in B cell medium, and used for flow cytometry, in vitro activation, or RNA extraction.

Cell Enrichment and Flow Cytometry.

PBMCs were enriched for CD19+ cells by magnetic cell separation (MACS) according to the manufacturer’s manual (Miltenyi Biotec, Cat#130-050-301). For T cell analysis, PBMCs were enriched by MACS using anti-CD3 microbeads (Miltenyi Biotec, Cat#130-050-101). For FACS analysis and cell sorting, CD19+ B cells were incubated with antibodies specific for CD19, CD27, CD38, IgM, IgG, IgD, IgA, and κ light chains. CD3+ T cells were FACS-sorted using anti-CD3, anti-CD4, and anti-CD8 antibodies. Bone marrow samples were thawed, washed, and incubated with antibodies specific for CD10, CD34, CD38, CD19, and the IgM µ-chain. After sorting, collected cells were immediately processed or frozen in 90% FBS with 10% DMSO. Cell populations were sorted according to the following phenotypes: 1) naïve B cells: CD19+CD27−IgM+IgD+; 2) IgM memory B and Activated IgM cells: CD19+CD27+IgM+; 3) IgG/A+ switched memory and Activated IgG/A+ cells: CD19+CD27+IgG/A+; 4) IgK+ B cells: CD19+IgK+; 5) IgL+ B cells: CD19+IgK−; 6) CD4 T cells: CD3+CD4+; 7) CD8 T cells: CD3+CD8+; 8) pro-B cells: CD38+CD19+CD10+CD34+µ-chain−; 9) pre-B cells: CD38+CD19+CD10+CD34−µ-chain−; 10) immature B cells: CD38+CD19+CD10+CD34− µ-chain+.

B Cell Activation.

Sorted naïve B cells were seeded at 3 × 10^4/cm^2 and cocultured with irradiated cells expressing CD40L at a 10:1 ratio. Cells were cultured in B cell medium: complete RPMI plus 10% FBS, 1% sodium pyruvate, 1% nonessential amino acids, 1% penicillin-streptomycin (Thermo Fisher), 1% GlutaMAX, 0.1% 2-mercaptoethanol, 0.02 ng/mL transferrin (Sigma), and 0.1 mg/mL kanamycin (Serva). Cells were activated with 25 ng/mL IL-4. Every 3 d cells were restimulated by adding 25 ng/mL IL-4. On the seventh day, cells were reseeded at a density of 10^6 cells/mL, provided with fresh CD40L-expressing cells, and activated with 25 ng/mL IL-4 every 3 d. On the 14th day, cells were collected, sorted as described above, and immediately processed by extracting either total RNA or genomic DNA.

Suppression PCR.

Total RNA was isolated from up to 10^5 cells with the RNeasy Mini Kit (Qiagen) according to the manufacturer’s protocol and either stored at −80 °C or processed immediately. cDNA synthesis was performed using SuperScript IV Reverse Transcriptase (Thermo Fisher) with 1 µM final primer concentration (hlGM or an equimolar mixture of hlGA and hlGG; for primer sequences, see Dataset S6), keeping the reaction at a minimum of 55 °C to avoid off-target priming. Reactions were purified with 1.4× AMPure XP beads (Beckman Coulter) according to the manufacturer’s recommendations and eluted in TE buffer (Tris⋅HCl 10 mM, EDTA 1 mM). All PCRs described below were performed with Q5 Hot Start Polymerase (New England Biolabs) in the recommended buffer with 0.25 mM of each dNTP and 0.2 μM of each primer if not stated otherwise. PCR-I: purified cDNA was amplified with a mixture of suppression primers annealing to V gene segments and constant regions (Dataset S6) using the following thermocycling settings: 98 °C 30 s, (98 °C 10 s, 55 °C 10 s, 72 °C 10 s) × 10, 72 °C 60 s. Amplicons were purified with 0.8× AMPure XP beads. PCR-II: tagged amplicons with inverted repeats on both ends were amplified with the distal_22 primer at a final concentration of 0.5 µM and the following thermocycling settings: 98 °C 30 s, (98 °C 10 s, 52.7 °C 10 s, 72 °C 10 s) × 37, 72 °C 60 s. Products were purified with 0.8× AMPure XP beads. PCR-III: to prepare for the introduction of Illumina adapters, suppression products were amplified with U1- and U2-overhang primers with the following thermocycling conditions: 98 °C 30 s, (98 °C 10 s, 55 °C 10 s, 72 °C 10 s) × 10, 72 °C 60 s. Products were purified with 0.8× AMPure XP beads. iPCR: unique indices and flow cell adapters were introduced with FC1-i5X-U1 and FC2-i7X-U2 primers (X = index identifier) by the following program: 98 °C 30 s, (98 °C 10 s, 55 °C 10 s, 72 °C 10 s) × 5, 72 °C 60 s. PCR products were analyzed by agarose gel electrophoresis and purified with 0.8× AMPure XP beads. 
Concentration was measured either by DS-11 spectrophotometer (DeNovix) or Qubit HS DNA kit (Thermo Fisher). The size distribution of the products was estimated by gel image processing with GelAnalyzer software (66) and used to calculate the molarity. Indexed samples were mixed equimolarly, repurified with 0.6× AMPure XP beads, measured with the BioAnalyzer High Sensitivity DNA kit (Agilent), and prepared for sequencing with 2 × 300 MiSeq Reagent kits v3 (Illumina) according to the manufacturer’s manual. Libraries for light chains and T-cell receptor chains were prepared following the same procedure with distinct primers (Dataset S6).
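The conversion from a measured mass concentration and product size to molarity, and from there to equimolar pooling volumes, can be sketched as follows. This is a minimal illustration, not the calculation used in the study: it assumes the common approximation of 660 g/mol per base pair of double-stranded DNA and collapses the gel-derived size distribution to a single mean length; all function names are hypothetical.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (standard approximation, assumed here)


def dsdna_molarity_nm(conc_ng_per_ul, mean_length_bp):
    """Convert a dsDNA mass concentration (ng/µL) to molarity in nM (= fmol/µL)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_length_bp)


def equimolar_volumes_ul(libraries, fmol_per_library):
    """Volume (µL) of each library that contributes the same molar amount.

    libraries: dict mapping a library name to (conc_ng_per_ul, mean_length_bp)
    fmol_per_library: molar amount each library should contribute to the pool
    """
    return {
        name: fmol_per_library / dsdna_molarity_nm(conc, length)
        for name, (conc, length) in libraries.items()
    }


# Hypothetical example values, not measurements from the study.
pool = equimolar_volumes_ul({"lib_A": (6.6, 500), "lib_B": (13.2, 500)}, 10.0)
print(pool)
```

Because the molarity scales inversely with fragment length, two libraries with the same mass concentration but different mean sizes need different pooling volumes, which is why the size distribution enters the calculation.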

Molecular Cloning of Spike-In Plasmids.

Control plasmids were generated by introducing LAIR1 or ICAM1 inserts into an antibody vector encoding the IgG heavy chain. Fragments of the designed constructs were amplified from total PBMC cDNA with the corresponding primers and included: 1) IGH variable domain (amplified with VH3-23_Fwd_NcoI and IGHJ4_Rev_BamHI); 2) IGHG1 constant gene fragment (IGHG_Fwd_NheI, IGHG_Rev_SalI); and 3) LAIR1 and ICAM1 fragments of variable length (LAIR1_Fwd_BamHI, ICAM1_Fwd_BamHI, and LAIR1_Rev_X_NheI or ICAM1_Rev_X_NheI, where X is the amplified fragment length). The purified PCR products were digested with the corresponding restriction enzymes and ligated into NcoI/SalI-digested pAL2-T vector (Evrogen) with T4 DNA ligase (Thermo Fisher Scientific). To prepare spike-in samples, the IgG transcript copy number was assessed by qPCR, and plasmids were diluted to either 200 or 20 fg/µL each and spiked into cDNA of bulk B cells at 1:1,000 and 1:10,000 ratios. The mixtures were used for suppression PCR.

Selection of In-Frame Inserts and Cloning into Recombinant Antibodies.

Suppression PCR does not allow us to obtain the full-length V segment sequence or the sequence of the light chain. To express the insertion-containing antibodies in the most favorable VDJ context, we aligned the available VDJ sequence to our in-house antibody library and selected, for each insertion, the backbone with the highest VDJ homology. The WBP1L insertion was cloned into four different backbones to assess the backbone influence, resulting in 15 total VDJ insertion-containing antibodies (Dataset S5). The insertions were either synthesized by overlap-extension PCR or amplified from human genomic DNA by PCR and introduced into the backbone by Gibson assembly (primer sequences in Dataset S6). Successful cloning was confirmed by Sanger sequencing.

Expression and Purification of Grafted Insert Antibodies.

Heavy chain constructs containing the selected insertions were mixed with the corresponding light chain plasmids to a final amount of 1.2 µg/mL of production. The plasmids were mixed with polyethylenimine (PEI, #23966-1, Polysciences Europe; 4 µg/mL of production) in OptiMEM (#31985047, Life Technologies; 100 µL/mL of production). The DNA-PEI mixture was incubated at room temperature for 20 min and added dropwise to the Expi293F culture (Thermo Scientific) seeded at 2 million cells per milliliter in Expi293 expression medium (#A1435102, Thermo Scientific) on the previous day. The supernatant was harvested 72 h posttransfection, filtered through a 0.45-µm membrane, and analyzed. Antibodies were purified through protein G pull-down using an Ab SpinTrap kit (#28408347, Cytiva). Protein concentration in the eluates was calculated from the 280-nm absorbance measured by DS-11 spectrophotometer (DeNovix).

Insert Antibodies Concentration and Binding Analysis.

The concentration of the antibodies in the Expi293F supernatant was determined by ELISA. Half-area high binding polystyrene plates (#7626991, Greiner) were coated with 25 µL of a 10 ng/µL PBS dilution of goat anti-human IgG (#2040-01, Southern Biotechnologies) overnight at +4 °C. The next day, plates were blocked with 1% BSA in PBS (PBS-BSA) for 1 h at room temperature, washed three times with 0.1% Tween in PBS (PBST), and incubated with serial dilutions of the Expi293F supernatant (72 h posttransfection) in PBS-BSA for 1 h at room temperature. Human IgG preparation (#0150-01, Southern Biotechnologies) was used as a standard. Washed plates were incubated with a 1:500 PBS-BSA dilution of goat anti-human IgG conjugated with alkaline phosphatase (#2040-04, Southern Biotechnologies) for 1 h at room temperature, and the washing procedure was repeated. Plates were incubated with a 250 µg/mL solution of 4-nitrophenyl phosphate (#S0942, Sigma Aldrich) for 30 min and analyzed for 405-nm absorbance with a Cytation 5 device (Biotek). For all antibodies expressed and purified at a detectable level, we determined the binding affinity toward the backbone target (SARS-CoV-2 RBD for 2M-10B11, S309, and S2H13; gp140 for F105; Influenza HA for FI and FY), the irrelevant viral antigen (gp140 for 2M-10B11, S309, and S2H13; SARS-CoV-2 RBD for F105, FI, and FY), and BSA via ELISA. The procedure was performed according to the protocol described above with the following changes: 1) the plates were coated with the corresponding antigens listed above and 2) protein G-purified antibodies were used instead of Expi293F supernatant. Corresponding backbone antibodies were used as a standard. The optical density (OD) measurements were analyzed by in-house R scripts, and the effective dilution/concentration 50 (ED50/EC50) was calculated using four-parameter curve fitting (Dataset S5).
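The study's curve fitting was done with in-house R scripts that are not reproduced here. As an illustration of the model form only, the sketch below evaluates a four-parameter logistic (4PL) curve and inverts it to read a concentration off a fitted standard curve. The parameter names (`bottom`, `top`, `ec50`, `hill`) and example values are assumptions, and no fitting step is shown.

```python
def four_pl(x, bottom, top, ec50, hill):
    """Four-parameter logistic response at concentration (or dilution) x.

    bottom: response at x = 0
    top: response as x grows without bound
    ec50: x at which the response is halfway between bottom and top
    hill: slope factor
    """
    return top + (bottom - top) / (1.0 + (x / ec50) ** hill)


def inverse_four_pl(y, bottom, top, ec50, hill):
    """Concentration that yields response y (y must lie strictly between bottom and top)."""
    return ec50 * ((bottom - top) / (y - top) - 1.0) ** (1.0 / hill)


# Hypothetical standard-curve parameters, not values from Dataset S5.
params = dict(bottom=0.05, top=2.0, ec50=50.0, hill=1.2)
od_at_ec50 = four_pl(50.0, **params)
print(od_at_ec50, inverse_four_pl(od_at_ec50, **params))
```

By construction, the response at `x = ec50` is the midpoint of `bottom` and `top`, which is what makes EC50 (or ED50 for dilution series) a convenient single-number summary of a binding curve.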

Cell Lines.

EBV-immortalized cell lines derived from African donors were produced in an earlier study (4).

Suppression PCR Bioinformatics Analysis.

For the bioinformatics analysis of suppression PCR data, 300-nt paired-end (PE) reads were trimmed to remove adapters and poor-quality base calls using Trim Galore (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/, v0.4.3, parameters --nextera --paired -q 20 --length 100). Quality score per base position was assessed using FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Trimmed reads were aligned in paired-end mode to the GRCh38.p12 human genome assembly using the Burrows-Wheeler alignment tool (v0.7.17, parameters: bwa mem) (67). The IGH locus (V, D, J, and constant genes) was defined on GRCh38 with the following coordinates: chr14 105586437–106879844 (heavy). Light chain and TCR loci were mapped to: chr2 88848000–90362000 (κ), chr22 22026076–22922913 (λ), chr14 21621904–22552132 (TCR-α), and chr7 142299011–142813287 (TCR-β). The corresponding target locus (IGH/IGK/IGL/TCRA/TCRB) was excluded from the insertion mapping to distinguish between a genomic insertion and cryptic exon splicing. To find insertions within the VDJ rearranged sequences, we selected genomic ranges where sequence coverage was above 10 reads, with a sequence length between 20 and 2,000 bp. Genome coverage was calculated using BEDTools (https://bedtools.readthedocs.io/en/latest/index.html, v2.27.1) (68), and a dedicated Python script using pysam (https://github.com/pysam-developers/pysam) was written to identify potential inserts. Potential inserts were assigned if they fulfilled the following criteria: 1) one mapping read is chimeric with the IGH region; 2) it has no secondary alignment; 3) its mapping quality is equal to or higher than 5; and 4) its mate is mapped. The list of potential inserts was annotated using GENCODE v29 (69) and BEDOPS tools (70). The mtDNA origin of the insertions was confirmed by comparing the E-values of the alignment against mitochondrial and nuclear genomes. 
Original reads were retrieved, pooled, and used as input for de novo assembly using Trinity software (71). BLAST was used to define the insert coordinates within the contig (command-line version, v2.7.1+). The contig sequence bordering the insert was annotated by IgBLAST (72). Finally, we validated contig sequences that: 1) contain an insert embedded by V-J genes and the CH1 gene or by the V gene and J-CH1 genes; and 2) contain the last 15 bp of one of the V primers (two mismatches allowed) plus 5 bp belonging to the V gene (one mismatch allowed). To identify exon–exon junctions, we used JAGuaR to generate a junction database of the human transcriptome (100-bp junction length) and then aligned the contig sequences against this database with BLAST (parameters: -task megablast -perc_identity 95). Inserts covering more than 70% of the junction length (70 bp) were tagged as having an exon–exon junction.
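The four read-level criteria for assigning potential inserts, together with the exclusion of the target locus, can be expressed as a single predicate. The sketch below is a simplified illustration and not the authors' pysam script: `ReadRecord` is a hypothetical stand-in for an aligned read, its field names are assumptions, and only the IGH coordinates and the mapping quality threshold are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

IGH_LOCUS = ("chr14", 105586437, 106879844)  # GRCh38 coordinates given in the text
MIN_MAPQ = 5


@dataclass
class ReadRecord:
    """Hypothetical, simplified view of one aligned read."""
    chrom: str
    pos: int
    mapq: int
    has_secondary_alignment: bool
    mate_is_mapped: bool
    # (chrom, pos) of the other part of a split (chimeric) alignment, if any
    chimeric_partner: Optional[Tuple[str, int]] = None


def in_locus(chrom, pos, locus=IGH_LOCUS):
    name, start, end = locus
    return chrom == name and start <= pos <= end


def is_potential_insert_read(read):
    partner = read.chimeric_partner
    return (
        not in_locus(read.chrom, read.pos)      # target locus itself is excluded
        and partner is not None
        and in_locus(*partner)                  # 1) chimeric with the IGH region
        and not read.has_secondary_alignment    # 2) no secondary alignment
        and read.mapq >= MIN_MAPQ               # 3) mapping quality of at least 5
        and read.mate_is_mapped                 # 4) mate is mapped
    )


candidate = ReadRecord("chr19", 54350000, 30, False, True, ("chr14", 105600000))
print(is_potential_insert_read(candidate))
```

Keeping the criteria in one predicate makes it easy to see that a read mapping inside the IGH locus is never counted as an insert, which is how cryptic exon splicing within the locus is kept apart from genuine genomic insertions.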

Insert Frequencies and Feature Analysis.

Frequencies of the insertions were determined for each donor as well as distinct B cell populations by dividing the number of detected unique inserts by the number of analyzed B cells for a given sample. Insertions were mapped to nuclear and mtDNA using in-house R scripts (73) (https://github.com/lebmih/LAIR). Details on in silico controls generation and calculation of the distance and overlap with previously reported genomic regions such as R-loops, ERFS, CFS, LINEs, SINEs, AID, and RAG off-target sites, are provided in . CDR3 sequence alignment was performed with the MUSCLE algorithm using UGENE software (74).
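The frequency definition above amounts to counting each distinct insert once per sample and dividing by the number of B cells analyzed. A minimal sketch follows; how an insert is keyed for uniqueness (here an arbitrary hashable identifier) is an assumption, and the function name is hypothetical.

```python
def insert_frequency(insert_keys, n_cells_analyzed):
    """Unique inserts per analyzed B cell for one sample.

    insert_keys: iterable of hashable identifiers, one per detected insert
                 transcript; repeated keys are counted once
    n_cells_analyzed: number of B cells that went into the sample
    """
    if n_cells_analyzed <= 0:
        raise ValueError("n_cells_analyzed must be positive")
    return len(set(insert_keys)) / n_cells_analyzed


# Hypothetical sample: four detections, three distinct inserts, 300,000 cells.
detections = ["geneA:junction1", "geneA:junction1", "geneB:junction1", "chrM:junction1"]
print(insert_frequency(detections, 300_000))
```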

Alternative Splicing.

Illumina reads obtained from suppression PCR libraries of IGH, IGL, and IGK chain profiling were processed as follows: reads were paired by PEAR software (75), adapters were removed with Trimmomatic (76), and constant genes were mapped by an in-house script in R (73). The variable domain was annotated by IgBLAST (77), and the alternative exons were quantified and extracted by an in-house script in R. Extracted sequences were mapped to the corresponding locus by BLAST+ (78) and visualized as a histogram in R. Cryptic splice sites were annotated by the Human Splicing Finder web service (79).

P/N Nucleotide Analysis.

To measure the P/N nucleotide length, we analyzed the transcript sequence surrounding the inserted fragment. To extract the junction sequence, we detected the conserved second framework cysteine residue through a TT[AG][CATG]TGT[GA][CT] pattern search and the conserved J-segment tryptophan through a CTGGGGC[CA][AG][ATG]GG[AGC]AC[AC][AC][CT] pattern search. The nucleotide sequence between the conserved Cys and Trp residues was characterized with IMGT junction analysis (https://imgt.org/IMGTindex/IMGTJunctionAnalysis.php).
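The two pattern searches translate directly into regular expressions. The sketch below uses the patterns quoted above and returns the nucleotides between the Cys and Trp codons; the exact boundaries (just after the TGT codon inside the first pattern, just before the TGG codon inside the second) are an assumption about how "between the conserved Cys and Trp residues" is delimited, and the function name and example sequence are hypothetical.

```python
import re

# Patterns as quoted in the text.
CYS_PATTERN = re.compile(r"TT[AG][CATG]TGT[GA][CT]")
TRP_PATTERN = re.compile(r"CTGGGGC[CA][AG][ATG]GG[AGC]AC[AC][AC][CT]")


def extract_junction(transcript):
    """Return the sequence between the conserved Cys and Trp codons, or None."""
    seq = transcript.upper()
    cys = CYS_PATTERN.search(seq)
    if cys is None:
        return None
    # Look for the J-segment motif only downstream of the Cys motif.
    trp = TRP_PATTERN.search(seq, cys.end())
    if trp is None:
        return None
    junction_start = cys.start() + 7  # TGT occupies offsets 4-6 of the Cys match
    junction_end = trp.start() + 1    # TGG begins one base into the Trp match
    return seq[junction_start:junction_end]


# Synthetic example sequence, not taken from the dataset.
example = "GCCGTGTATTACTGT" + "GCGAGAGATCGGTACTTTGACTAC" + "TGGGGCCAGGGAACCCTGGTC"
print(extract_junction(example))
```

The extracted junction would then be the input for a junction annotation step such as the IMGT analysis named in the text; in an insert-containing transcript, the inserted fragment and its flanking P/N nucleotides lie within this span.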

Statistical Analysis.

Sample size was not predetermined and was limited by the availability of the donor material. Researchers were not blinded to the studied groups. Data were analyzed for normality with the Shapiro–Wilk test. Because colocalization and overlap data were not normally distributed, the Wilcoxon rank-sum test was used, with the null hypothesis stating the absence of a significant difference (80). Plots were created using the ggplot2 and gridExtra packages in R 3.6.2 (73). In all boxplots, the black lines represent the median, the bottom and the top of the boxes represent the 25th and 75th percentiles, and the whiskers extend from the borders of the box for 1.5× the interquartile range. Significance levels are shown according to the legend: ns, P ≥ 0.05, *P < 0.05, **P < 0.01, ***P < 0.001, and ****P < 0.0001. Schemes and loci depictions were generated with Inkscape software (not-to-scale schemes) or with ggplot2 (to-scale maps).
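The significance legend maps P values to labels by nested thresholds. A minimal sketch of that mapping is given below; the function name is hypothetical and the thresholds are those stated in the legend.

```python
def significance_label(p_value):
    """Map a P value to the label used in the figure legends."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    # Check the strictest threshold first so the strongest label wins.
    for threshold, label in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p_value < threshold:
            return label
    return "ns"


for p in (0.2, 0.03, 0.004, 0.0005, 0.00001):
    print(p, significance_label(p))
```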

78 in total

1. Insertion of excised IgH switch sequences causes overexpression of cyclin D1 in a myeloma tumor cell.

Authors: A Gabrea; P L Bergsagel; M Chesi; Y Shou; W M Kuehl
Journal: Mol Cell Date: 1999-01 Impact factor: 17.970

2. Dna2 nuclease deficiency results in large and complex DNA insertions at chromosomal breaks.

Authors: Yang Yu; Nhung Pham; Bo Xia; Alma Papusha; Guangyu Wang; Zhenxin Yan; Guang Peng; Kaifu Chen; Grzegorz Ira
Journal: Nature Date: 2018-12-05 Impact factor: 49.962

3. Collisions between replication and transcription complexes cause common fragile site instability at the longest human genes.

Authors: Anne Helmrich; Monica Ballarino; Laszlo Tora
Journal: Mol Cell Date: 2011-12-23 Impact factor: 17.970

4. Insertion of microRNA-125b-1, a human homologue of lin-4, into a rearranged immunoglobulin heavy chain gene locus in a patient with precursor B-cell acute lymphoblastic leukemia.

Authors: T Sonoki; E Iwanaga; H Mitsuya; N Asou
Journal: Leukemia Date: 2005-11 Impact factor: 11.528

5. Sequence at insertion site of E.Tn retrotransposon into an immunoglobulin switch region suggests a role for switch recombinase.

Authors: L A Elenich; W A Dunnick
Journal: Nucleic Acids Res Date: 1991-01-25 Impact factor: 16.971

6. Effect of a malaria suppression program on the incidence of African Burkitt's lymphoma.

Authors: A Geser; G Brubaker; C C Draper
Journal: Am J Epidemiol Date: 1989-04 Impact factor: 4.897

7. Deep-sequencing identification of the genomic targets of the cytidine deaminase AID and its cofactor RPA in B lymphocytes.

Authors: Arito Yamane; Wolfgang Resch; Nan Kuo; Stefan Kuchen; Zhiyu Li; Hong-wei Sun; Davide F Robbiani; Kevin McBride; Michel C Nussenzweig; Rafael Casellas
Journal: Nat Immunol Date: 2010-11-28 Impact factor: 25.606

8. Prevalent, Dynamic, and Conserved R-Loop Structures Associate with Specific Epigenomic Signatures in Mammals.

Authors: Lionel A Sanz; Stella R Hartono; Yoong Wearn Lim; Sandra Steyaert; Aparna Rajpurkar; Paul A Ginno; Xiaoqin Xu; Frédéric Chédin
Journal: Mol Cell Date: 2016-06-30 Impact factor: 17.970

9. A genome-wide analysis of common fragile sites: what features determine chromosomal instability in the human genome?

Authors: Arkarachai Fungtammasan; Erin Walsh; Francesca Chiaromonte; Kristin A Eckert; Kateryna D Makova
Journal: Genome Res Date: 2012-03-28 Impact factor: 9.043

10. IMGT, the international ImMunoGeneTics information system: a standardized approach for immunogenetics and immunoinformatics.

Authors: Marie-Paule Lefranc
Journal: Immunome Res Date: 2005-09-20