
AID to overcome the limitations of genomic information by introducing somatic DNA alterations.

Tasuku Honjo¹, Masamichi Muramatsu¹, Hitoshi Nagaoka¹, Kazuo Kinoshita¹, Reiko Shinkura¹.

Abstract

The immune system has adopted somatic DNA alterations to overcome the limitations of the genomic information. Activation induced cytidine deaminase (AID) is an essential enzyme that regulates class switch recombination (CSR), somatic hypermutation (SHM) and gene conversion (GC) of the immunoglobulin gene. AID is known to be required for DNA cleavage of S regions in CSR and V regions in SHM. However, its molecular mechanism is a focus of extensive debate. The RNA editing hypothesis postulates that AID edits an as yet unknown mRNA to generate specific endonucleases for CSR and SHM. By contrast, the DNA deamination hypothesis assumes that AID deaminates cytosine in DNA, followed by DNA cleavage by base excision repair enzymes. We summarize the basic knowledge of the molecular mechanisms of CSR and SHM and then discuss the importance of AID not only in immune regulation but also in genome instability.


Keywords: Activation induced cytidine deaminase; DNA deamination; RNA editing; class switch recombination; genome instability; somatic hypermutation

Year: 2006 PMID: 25873751 PMCID: PMC4323042 DOI: 10.2183/pjab.82.104

Source DB: PubMed Journal: Proc Jpn Acad Ser B Phys Biol Sci ISSN: 0386-2208 Impact factor: 3.493

Introduction

One of the most striking features of the complete human genome sequencing is that the human genome may contain as few as 30,000 genes, only twofold more than those in the fruit fly or worm genomes.[1),2)] Such a number of genes may be too small to support highly sophisticated biological functions such as the immune system in humans. Evolution of the vertebrates adopted somatic alteration of genetic information after birth to overcome this problem. This implies that the genome is not a fixed blueprint but rather a scenario of life that requires ad libs. The immune system is known for taking advantage of a series of genetic alterations during lymphocyte differentiation as well as after stimulation with antigen. First, antigen receptor genes are assembled by site-specific recombination of subexon segments of the variable (V) region gene, namely V, diversity (D) and joining (J) segments.[3)] Each step of VDJ recombination is programmed, ordered, and tightly regulated by a number of factors including cytokines provided by stromal cells. The regulation prevents generation of more than one copy of functional V exon in the heavy (H) and light (L) chain gene loci (allelic exclusion). Subsequently, mature B lymphocytes, which have completed functional VDJ recombination of both H and L chain genes, express IgM on the surface and migrate to the secondary lymphoid organs such as spleen and lymph nodes, where they encounter antigens. B lymphocytes activated by antigen stimulation proliferate vigorously in lymphoid follicles and often form special microenvironments called germinal centers, where the second wave of genetic alterations, namely class switch recombination (CSR) and somatic hypermutation (SHM), takes place in the immunoglobulin gene loci.[4)] SHM takes place in the V region of both H and L chain genes, introducing a million times more point mutations than the genome-wide background. SHM followed by selection leads to generation of high-affinity antibodies. 
CSR replaces the immunoglobulin CH gene to be expressed from Cμ to Cγ, Cε, or Cα, resulting in switching of immunoglobulin isotype from IgM to either IgG, IgE, or IgA, respectively, without changing the antigen specificity. Each isotype determines the manner in which captured antigens are eliminated or the location where the immunoglobulin is delivered and accumulated. Thus, CSR and SHM generate quite distinct products in entirely different targets, i.e., CH and VL/H, respectively. The discovery of activation induced cytidine deaminase (AID) and its function provided unexpected findings, as this single protein regulates all antigen-induced genetic alterations with distinct features, including point mutation (SHM) and region-specific recombination (CSR).[5),6)] That observation hinted at the enormous complexity of the strategies used by living organisms to overcome the limitation of genomic information, although the exact function of AID is still a matter of considerable debate.[7),8)] On the other hand, AID is a dangerous protein that can introduce mutations in the genome. In fact, over-expression or ectopic expression of AID has been shown to induce tumors.[9),10)] AID was also suggested to be involved in chromosomal translocation between c-myc and IgH locus in plasmacytoma.[11)] Recent reports have shown that AID is induced by several types of viral infection, implicating a wider role of AID in tumorigenesis or mutagenesis.[12)–15)]

Molecular basis of CSR

Region specific recombination

The immunoglobulin C locus contains an ordered array of CH genes,[16)] each flanked at its 5′ region by a switch (S) region composed of tandem repetitive sequences with many palindromes.[17)] CSR takes place between two S regions, resulting in looped out deletion of intervening DNA segments as circular DNA.[18)–20)] Since the Cμ gene is located at the VH proximal end of the CH gene cluster, CSR between Sμ and another S region 5′ to a CH gene brings that particular CH gene adjacent to the VH exon. CSR in the S regions is preceded by transcription of the two S regions starting from the I promoter located 5′ to each S region. Since mutations at splicing donor sites of the transcripts reduce CSR,[21),22)] not only transcription but also splicing of transcripts appears to be important, which gives rise to germline transcripts containing the I and CH exon sequences. Structures of S regions have common features, although their exact primary sequences are diverged.[23)] Each mammalian S region contains scattered but conserved guanine(G)-rich pentameric sequences, which are major tandem repeat units in the Sμ region.[24)] In mouse, Sγ sequences are mostly repetition of 49-bp repeats, and Sε consists of 40-bp repeating units.[23)] Sα region consists of 80-bp unit sequences.[25)] Another important feature of the S region is the presence of abundant palindromic sequences, which can form the stem loop structure in a denatured state. 
Similar repetitive sequences are found in S regions of human, chicken, frog, cow, pig, camel, shrew and rabbit.[26)–33)] Requirement of the S region for CSR was first demonstrated by an in vitro assay system using artificial switch substrates, in which the absence of S sequences completely abolished CSR.[34)–40)] This finding is consistent with the inability to express the structurally normal human pseudo Cγ gene without the S region.[41),42)] Deletion of the major portion of the Sμ core region from mouse causes a reduced frequency of CSR, the IgG1 production decreases to a half, but clearly significant levels of many isotypes are found in sera.[43)] The results suggest that scattered pentameric unit sequences upstream of the Sμ region may serve less efficiently as a functional S region. More recently, Cogne and colleagues generated mice defective of all Sμ pentameric sequences and found 10∼100 fold less efficient class switching without affecting germline transcription, clearly suggesting an essential role of Sμ pentameric sequences.[44)] Shinkura et al. assessed IgG1 class switch efficiency in Sγ 1-deleted IgH locus using genetically modified mice, and found almost no IgG1 class switching in those B cells without affecting IgG1 germline transcription.[45)] Taken together, G rich repeats in S region are critical to CSR. The products of CSR are a deleted chromosomal IgC locus and looped-out circular DNA. Transchromosomal recombination is also demonstrated between a transgene and the endogenous locus[46),47)] or even between endogenous loci[48)] in mice, and in μ-to-α switching in normal rabbits.[28),49)] Joining of cleaved ends appears to be mediated by a non-homologous end joining (NHEJ) repair system,[50)–52)] which also plays an essential role in VDJ recombination. 
Extensive analyses of switch recombination junctions have revealed no consensus sequences in the proximity of the breakpoints nor homologous sequences between two recombined parental sequences.[23),53),54)] These results indicate that CSR is a unique type of recombination that does not belong to either homologous or site-specific recombination but belongs rather to region-specific recombination. This is consistent with the finding that primary sequences of S regions are not important for CSR (see below), although there is a report claiming the existence of class-specific factors.[55)] Distribution of breakpoints is mostly within the S region, but they are also found in 5′ and 3′ flanking regions of S regions.[53),54),56)] In fact, the breakpoints are rather confined within the intronic region of germline transcripts, namely between downstream of the Iμ exon and upstream of the Cμ exon.[54)] However, recombination junction points can be observed in the Iμ exon when the Sμ region is deleted from the mouse genome.[44)] The molecular mechanism of CSR can be divided into three steps: (a) selection of target S region, (b) recognition of target structure and cleavage by a putative recombinase and (c) repair and ligation. None of these steps has been fully understood. However, recent studies using mice genetically manipulated by transgenes and gene targeting and by switching B cell lines carrying artificial switch substrates have expanded our knowledge.[38),40),57)–63)] Furthermore, comparison of these steps with those of SHM has revealed a striking similarity, which, together with the common requirement of AID, provides an important clue for understanding the molecular mechanisms of CSR and SHM. Since the third step is clearly distinct between CSR and SHM, this review does not cover it.
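The looped-out deletion described in this section can be illustrated with a toy model. The sketch below treats the heavy-chain constant locus as an ordered list and joins Sμ to a chosen downstream S region, returning the rearranged chromosome and the excised circle. The gene order is the commonly cited mouse order and is supplied here as an assumption, since the text does not list it; element names such as `Smu` and the function `class_switch` are hypothetical labels chosen for illustration only.

```python
from typing import List, Tuple

# Assumed mouse CH gene order (not given in the text); each S region sits 5' of its CH gene.
MOUSE_IGH_LOCUS: List[str] = [
    "VDJ", "Smu", "Cmu", "Cdelta",
    "Sgamma3", "Cgamma3", "Sgamma1", "Cgamma1",
    "Sgamma2b", "Cgamma2b", "Sgamma2a", "Cgamma2a",
    "Sepsilon", "Cepsilon", "Salpha", "Calpha",
]


def class_switch(locus: List[str], target_s: str) -> Tuple[List[str], List[str]]:
    """Join Smu to a downstream S region.

    Returns (chromosome, circle): the chromosome keeps a hybrid S region
    followed by the target CH gene; the circle carries the reciprocal
    junction plus every element that lay between the two S regions.
    """
    donor = locus.index("Smu")
    acceptor = locus.index(target_s)
    if acceptor <= donor:
        raise ValueError("target S region must lie downstream of Smu")
    chromosome = locus[:donor] + [f"Smu/{target_s}"] + locus[acceptor + 1:]
    circle = [f"{target_s}/Smu"] + locus[donor + 1:acceptor]
    return chromosome, circle


chromosome, circle = class_switch(MOUSE_IGH_LOCUS, "Salpha")
print(chromosome)
print(circle)
```

In this simplified picture, switching to IgA leaves the VH exon directly upstream of Cα on the chromosome, while Cμ and all intervening CH genes end up on the circle. Breakpoint positions within the S regions are ignored.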

CSR assay systems using artificial switch substrates in cultured cells

A number of groups reported artificial mini-chromosomal constructs to dissect the recognition target by a putative CSR recombinase.[34)–40),64),65)] These constructs have many different features. It is important to evaluate each construct to determine whether it meets basic requirements of CSR in vivo. Gene disruption studies have shown that CSR depends on transcription of the S region,[59),66)–68)] splicing of transcription products[21),22)] and the presence of the S region.[43)–45)] CSR is absolutely dependent on AID.[5),6)] From these criteria some of the systems may not represent real CSR, and the interpretation of data from such constructs is limited and so excluded from this review. Studies on artificial switch substrates have also shown the requirement of a pair of S regions,[38),40),57),58)] their transcription[38),40),57),58)] and AID[69)] for CSR. The splicing requirement has been partially demonstrated in the artificial system.[40)] Important messages obtained from artificial substrates are (a) the I exon and C exons are dispensable,[38),40),57),58)] (b) the palindromic sequence (not the primary sequences of S regions) is important,[38),70)] and (c) the transcription level of S regions correlates to efficiency of CSR.[71)]
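To make the notion of a palindromic sequence concrete, the following minimal sketch scans a DNA string for reverse-complement palindromes, that is, stretches that read the same 5′ to 3′ on both strands and could therefore pair with themselves in a stem when the DNA is denatured. The function names and the length limits are illustrative assumptions, and the test sequence uses the well-known EcoRV recognition site (GATATC) rather than any actual S region sequence.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_palindromes(seq: str, min_len: int = 4, max_len: int = 12) -> List[Tuple[int, str]]:
    """List (start, subsequence) for every reverse-complement palindrome.

    A perfect reverse-complement palindrome must have even length, so only
    even window sizes between min_len and max_len are scanned.
    """
    seq = seq.upper()
    first_even = min_len + (min_len % 2)  # round an odd min_len up to the next even size
    hits = []
    for length in range(first_even, max_len + 1, 2):
        for start in range(len(seq) - length + 1):
            window = seq[start:start + length]
            if window == reverse_complement(window):
                hits.append((start, window))
    return hits


print(find_palindromes("TTGATATCTT", min_len=6, max_len=6))
```

A perfect palindrome is only the simplest case. Real stem-loop prediction also allows mismatches and a loop between the two arms, which this sketch does not attempt.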

Selection of target S regions by cytokines

Cytokine stimulation activates a specific I promoter and induces synthesis of germline transcripts containing the I and CH exon sequences. Since germline transcription almost always precedes cytokine-induced CSR, two groups[72),73)] proposed the accessibility model that germline transcription of the S region opens its chromatin structure, allowing a putative CSR recombinase to access a particular S region. This hypothesis is generally well supported by a number of experiments that clearly indicate a close linkage of the CSR target with the transcribed S region.[59)–63),66)–68),74)–78)] The original accessibility model postulated that CSR recombinase exists before induction of germline transcription and chromatin accessibility is limiting to initiate CSR. However, artificial constructs containing a constitutive promoter for each of two S regions introduced in CH12F3-2 B lymphoma cell line[79)] were unable to switch unless cytokine stimulation was given.[38)] In addition, a protein synthesis inhibitor cycloheximide blocks cytokine-induced CSR, suggesting that de novo protein synthesis is required for CSR.[80)] Such experiments imply that cytokine stimulation plays at least two roles: (a) induction of germline transcription associated with chromatin opening and (b) induction of de novo synthesis of CSR recombinase or its activator. If germline transcription is required only for opening the chromatin locus of S regions, a minimal level of transcription may be sufficient and quantitative increment of transcription would not affect the CSR efficiency. Experiments using an artificial switch substrate containing a tetracycline-inducible promoter in place of the I promoter clearly demonstrated that germline transcription levels quantitatively correlate with the CSR efficiency.[71)] Experiments using transgenic loci[59)–63)] and also artificial switch constructs[38),40),57),58)] showed the I promoter to be dispensable and replaceable by any promoter. 
The I promoters are regulated by signaling of cytokine receptors and CD40.[81)–90)] Regulatory regions of many I promoters have been extensively studied and shown to contain several binding motifs of transcription factors that are regulated by specific cytokines.[91)–102)] The involvement of enhancers in CSR has been also extensively characterized. The 30 kb region just downstream of the most 3′ CH gene (Cα) contains four known enhancer elements including HS3a, HS1,2, HS3b, and HS4. Neither of two DNase I hypersensitive sites, HS1,2 and HS3a in the 3′ enhancer is required for efficient CSR,[78)] but the intronic enhancer is required.[67),74),75),103)] It was reported that deletion of the two 3′ enhancers HS3b and HS4, severely impairs germline transcription as well as CSR.[104)] However, switch constructs without enhancers are capable of undergoing CSR if they have strong promoters for S regions.[38)] Enhancers are probably required in vivo to support efficient transcription. Gene replacement studies have shown that splicing disruption of germline transcripts causes severe reduction or abolishing of CSR,[21),22)] which suggests that spliced germline transcripts or loading of spliceosomes on the S region may be important for CSR. In fact, the distribution of breakpoints is closely associated with the intronic region of germline transcripts.[54)] However, another interpretation could be that CSR is reduced because the absence of splicing decreases transcription efficiency. In summary, the selection of a target S region among many S regions is mediated by transcription from the particular I promoter of that S region. Since the level of transcription is correlated with the CSR efficiency, the amount of transcription machinery loaded on the S region, the stem-loop structure of the denatured S region during transcription, or spliceosomes associated with the transcribed S region may play important roles.[71),105)] These three are not mutually exclusive.

Recognition and cleavage of S regions by CSR recombinase

The S region primary sequence is not important to CSR because, in CH12F3-2 B cell line that specifically switches to IgA, the Sα sequence of the switch construct can be replaced by the Sγ 1 or Sε sequence without changing switching efficiency.[38)] Inverted orientation of the Sα region in the artificial substrate is efficient for CSR, although a severe reduction of CSR efficiency was observed in B cells carrying an inverted Sγ 1 region.[45)] In vitro artificial constructs containing S regions of various species or their derivatives demonstrate that the most important features of the S region are not repetitive sequences nor G-rich sequences but palindromic sequences.[70)] AT-rich sequences of the Xenopus S region, but not G-rich repetitive sequences of the telomere, support CSR. Most strikingly, the multiple cloning sequence of the Bluescript plasmid, which contains many palindromic sequences, was able to replace S region functionally,[70)] which suggests that the palindromic nature of the S region primary sequences is most important. Since palindromic sequences are rich in the S region, the stem-loop structure can be formed transiently in S regions when they are denatured during transcription.[106),107)] Such a stem-loop structure is proposed as a recognition target.[27),70),108),109)] Mussmann et al. [27)] identified CSR breakpoints in close proximity to the transition sites from a stem to a loop structure, based on the single stranded DNA folding program. Reaban & Griffin [110)] found that in vitro transcription of supercoiled plasmids containing the murine Sα sequence leads to loss of a superhelical turn. Analysis of the less supercoiled plasmid showed the formation of RNA/DNA hybrid by the nascent RNA transcript. Based on this in vitro observation, they proposed that the R-loop structure could be a recognition target by CSR recombinase. 
A similar structure was detected by in vitro transcription in a wider range of S regions.[111),112)] Overexpression of germline transcripts in trans or E. coli RNase H failed to support requirement of germline transcripts and R-loop formation in B cells.[71)] Another type of recognition structure has been proposed by the finding that single-stranded G-rich sequences self-associate to form four-stranded structure.[113)] This unique structure is probably held together by Hoogsteen pairing. Since the G-rich sequences exist not only in the immunoglobulin S region but also in gene promoters and chromosomal telomeres, such four-stranded structures were proposed to be recognition targets in various biological systems including CSR. There is no question about involvement of double-strand cleavages in CSR, at least one each in two S regions. An important question is how they are generated. One possibility is double-strand cleavages by an endonuclease. Another possibility is two successive nicks in each S region, generating staggered double-strand cleavages. The single-strand tail of the staggered cleavage product may be processed for the subsequent end joining mechanism mediated by the NHEJ repair system. The other possibility is a nick cleavage followed by transesterification as catalyzed by RAGs, producing a double-strand cleavage with a blunt end and a hairpin end.[114)] As shown for VDJ recombination, CSR can theoretically generate inversion products. In fact, the organization of the chicken CH locus suggests inversion-type CSR in vivo.[115)] Chen et al.[116)] have shown that inversion-type CSR is enhanced in switch substrates when two S regions are transcribed by two separate promoters in the opposite direction. Using such substrates, nucleotide sequences of junction points in more than 30 inversion-type switch products were determined. 
Since the two breakpoints of a single recombination event are retained in the substrate by inversion-type recombination, one can estimate the mode of cleavage by examination of deletion or duplication during CSR. The majority of them contain either duplication or deletion. Duplication during recombination can be explained only by a staggered cleavage with the 5′ overhang, followed by DNA synthesis to convert single-stranded tails to double-stranded ends. Deletion of sequences can be explained by multiple blunt end cleavages or by a staggered cleavage with the 3′ overhang, followed by exonuclease chewing of single-stranded tails. It is known that cleaved ends of the CSR target have to be double-stranded for the NHEJ repair system. The distance of two nicks can vary, and variable lengths of deletion or duplication can be explained by variation in the distance of two nicks. The results suggest that staggered cleavage is more likely than blunt-ended double-strand cleavage and nick coupled with transesterification as the first step of CSR. The cleaved ends of S regions have to be repaired and joined together to give rise to looped-out circular DNA and deleted chromosome. The NHEJ system is clearly involved in joining of cleaved ends of S regions because Ku-80 and Ku-70, which form a complex with DNA-PKcs to function as DNA-PK, are shown to be required for CSR.[51),52)] SCID mice contain a leaky mutation in DNA-PKcs. DNA-PKcs deficient mice with an IgH and IgL knock-in are completely defective for CSR except reduced IgG1 switching.[117)] On the other hand, SCID mutants with another IgH and IgL knock-in are almost normal for switching to IgG1, IgG2b, IgG3, IgE and IgA.[118)] The discrepancy is puzzling but reminiscent of the requirement of UNG protein but not U removal activity (see below).
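The argument that duplications imply a 5′ overhang and deletions a 3′ overhang can be checked with a small model. The sketch below is a simplification under stated assumptions: the duplex is represented by its top strand written 5′ to 3′, each nick is given as an index on top-strand coordinates, a 5′ overhang is always filled in by DNA synthesis, and a 3′ overhang is always removed by an exonuclease. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BreakOutcome:
    overhang: str   # "5'", "3'" or "blunt"
    event: str      # "duplication", "deletion" or "none"
    segment: str    # sequence between the two nicks
    left_end: str   # double-stranded left end after processing
    right_end: str  # double-stranded right end after processing


def resolve_staggered_break(seq: str, top_nick: int, bottom_nick: int) -> BreakOutcome:
    """Process a break made by one nick on each strand.

    top_nick < bottom_nick: the protruding tails are 5' ends, fill-in
    synthesis copies the segment onto both ends (duplication).
    top_nick > bottom_nick: the protruding tails are 3' ends, trimming
    removes the segment from both ends (deletion).
    """
    if not (0 <= top_nick <= len(seq) and 0 <= bottom_nick <= len(seq)):
        raise ValueError("nick positions must lie within the sequence")
    low, high = sorted((top_nick, bottom_nick))
    segment = seq[low:high]
    if top_nick < bottom_nick:
        overhang, event = "5'", "duplication"
    elif top_nick > bottom_nick:
        overhang, event = "3'", "deletion"
    else:
        overhang, event = "blunt", "none"
    # In both staggered cases the processed ends reduce to the same slices.
    return BreakOutcome(overhang, event, segment, seq[:bottom_nick], seq[top_nick:])


print(resolve_staggered_break("AAACCCGGGTTT", top_nick=3, bottom_nick=6))
print(resolve_staggered_break("AAACCCGGGTTT", top_nick=6, bottom_nick=3))
```

In this model the length of the duplicated or deleted segment equals the distance between the two nicks, which is how variable nick spacing would produce the variable lengths described above.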

Somatic hypermutation

Distribution of mutations and their target specificity

Lines of evidence have clearly shown that the complementarity determining regions (CDRs) are preferred targets of mutation to the framework regions of the Ig V regions, as originally proposed by Wu et al. [119)] The three-dimensional structure of Ig has established the functional significance of CDR as antigen binding site. Subsequently, extensive studies have been carried out to look for any primary sequences associated with mutation sites, and these revealed a few preferred motifs, among which the RGYW motif is approximately twofold more frequently mutated than by chance.[120)–123)] However, because of the selection after hypermutation, whether CDR and the RGYW motif are the preferred targets of the mutation event or selection had to be examined. To solve this problem, nucleotide sequences of nonfunctional V genes, especially out-of-frame VDJ recombination products, were investigated. Such studies indicate that CDRs are indeed preferred targets of mutation, as compared with framework regions.[124)] In addition, the RGYW motif contains the mutation more frequently even in nonfunctional V genes. Furthermore, contrary to previous suggestions,[125),126)] there is no strand bias for the mutation target.[123),127)] The importance of primary sequences is confirmed by observing the reduction or increase of mutability after changing a few bases in transgenes.[120),128)] However, the primary sequence is not the only determinant of mutation targets because not all RGYW motifs are mutated, and the RGYW motif inserted in other environments loses its mutability.[120)] It is interesting to note that the RGYW motif includes the AGCT palindromic sequence, which is most abundantly found in S sequences. 
The microsequence specificity of mutations introduced during SHM and those introduced meiotically during neutral evolution is generally similar, suggesting that the enzyme machinery incorporating mutations may be shared.[129)] The Ig gene is not the only target of hypermutation.[130)–135)] The bcl-6 gene in human and mouse B cells accumulates mutations with slightly lower frequency.[130),131),136)] Translocated c-myc genes also contain frequent mutations.[134),135)] Many other genes like β-globin accumulate extensive mutations when driven by the Ig promoter and intron enhancer as transgene.[137)–141)] Interestingly, insertion of the EPS sequence (tandem restriction site sequences for EcoRV and PvuII), which contains abundant palindrome sequences, strongly augments the SHM frequency in surrounding V gene sequences.[137),141)] The EPS insert, which contains the E47 motif, is mutated many times more frequently than the flanking Ig sequences.[142)] Kotani et al. [10)] pointed out that most hypermutated genes including Ig genes contain the E47 motif. Interestingly, Kolchanov et al. [143)] pointed out that not only V genes but also other hypermutated genes including c-myc contain palindromic sequences. Furthermore, CDRs overlap the stem-loop structure predicted by the computer program.[143)] Since Brenner & Milstein[144)] proposed that hypermutation is introduced by DNA cleavage and error-prone repair, initiation of SHM by DNA cleavage has been taken for granted. A considerable list of error-prone DNA polymerases has recently been identified.[145)–148)] Among these, DNA polymerases η,[149),150)] ζ,[151)–153)] θ,[154),155)] and Rev1[156)] are reported to be involved in SHM. Since the absence of any one of these polymerases does not completely abolish SHM, error-prone polymerases appear to be redundant. 
However, they have some preference for incorporation of bases, as has been characterized by biochemical properties of polymerases.[149)–155)] These results suggest a scenario that after cleavage of V genes, error-prone DNA polymerases introduce mismatched bases, which will be either fixed or corrected by mismatch repair enzymes. It is important to note that the primary DNA cleavage site recognized by a putative endonuclease of SHM and the actual base change site detected as mutations may not be the same. We can only determine the outcome of base changes that are probably generated during repair or DNA synthesis after DNA cleavage. Taking all these results together, the following conclusion is drawn for SHM target specificity; (a) some motifs are preferred but not absolutely required, and (b) palindromic sequences containing the E47 motif are not only preferred targets but also stimulators of SHM. Therefore, the stem-loop structure based on palindromic sequences is a most likely candidate for a recognition target of SHM endonuclease.
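The RGYW motif discussed in this section is written in IUPAC ambiguity codes (R = A or G, Y = C or T, W = A or T). As a minimal sketch, the code below expands such a motif into a regular expression and reports every position where it occurs, including overlapping hits. WRCY is searched as well because it is the reverse complement of RGYW, so it marks the same hotspot read from the opposite strand. The function names are illustrative assumptions, not an established tool.

```python
import re
from typing import List

# Subset of IUPAC nucleotide codes needed for the motifs used here.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "N": "[ACGT]",
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def motif_positions(seq: str, motif: str = "RGYW") -> List[int]:
    """Return 0-based start positions of all (overlapping) motif matches."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    # A lookahead keeps the match zero-width so overlapping hits are found.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", seq.upper())]


sequence = "TTAGCTAA"
print(motif_positions(sequence, "RGYW"))   # hits on the top strand
print(motif_positions(sequence, "WRCY"))   # hits on the bottom strand
print(reverse_complement("AGCT") == "AGCT")
```

AGCT satisfies both RGYW and WRCY and is its own reverse complement, which is the overlap between the SHM hotspot motif and the palindromic S region sequence noted in the text.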

Quantitative correlation of V region transcription with SHM frequency

The transcription requirement of hypermutation target genes has been demonstrated by several convincing experiments.[138),157)–159)] Transgenic mice carrying a transgene, in which the Ig promoter is duplicated upstream of the Cκ region but downstream of the intronic enhancer, accumulate SHM not only in the V gene but also in the Cκ gene, which normally does not mutate.[138)] The authors postulate that a putative mutator is guided to the target DNA by RNA polymerase. Another series of transgenic experiments, including deletion of V promoter and replacement of V promoter with that of RNA polymerase I, have clearly demonstrated that the level of transcription parallels with the frequency of SHM.[158)] The most direct quantitative correlation between the transcription level and SHM frequency is demonstrated using 18–81 pre B cell transfectants with a GFP transgene containing a point mutation to block its expression. Transcription of the GFP gene is controlled by the tetracycline-inducible promoter, and the level of transcripts from the GFP gene is almost directly correlated with the frequency of SHM measured by expression of GFP.[157)] Methylation is responsible for suppression of SHM probably through reducing the transcriptional efficiency.[160),161)] The distance from the promoter determines the initiation site of mutation in the target gene. Several transgenes that contain insertion and deletion between the promoter and V gene shifted not only distribution[138),162)] but also frequency of mutations.[163),164)] Again the efficiency of transcription has to be carefully evaluated to compare the frequency of mutations in various transgenes. 
The enhancer requirement for SHM was shown by deletion of Eμ and 3′κ enhancers from transgenes.[163),165),166)] In turn, insertion of the enhancer facilitated SHM in non-Ig genes.[163),167)] However, the transcription level may be of primary importance, and the presence of the enhancer may not be crucial,[168)] in agreement with the fact that the tetracycline-inducible promoter of the mutating GFP gene described above does not have an enhancer.[157)]

Cleavage of target DNA

Evidence to demonstrate that DNA cleavage in the V gene is an obligatory intermediate of SHM is limited in spite of general acceptance. Sale & Neuberger [169)] reported that extra nucleotides are inserted preferentially into the V gene of hypermutating Ramos B cell line when terminal deoxynucleotidyl transferase is overexpressed. The distribution of inserted oligonucleotides agrees broadly with the general distribution of point mutations. Extensive deletions or insertions are also observed in V genes of human memory B cells, which probably reflect aberrant products of SHM.[170),171)] The ligation-mediated PCR method was employed to detect double-strand breakage (DSB) in hypermutating B cells.[172),173)] These studies have shown that distribution of DSBs overlaps with CDR and the RGYW motif. Although these experiments show that DSBs exist in the V gene at much higher frequency than that in the C gene, many of them were AID independent.[174),175)] Since background DSBs are not rare in proliferating cells,[176)] it is important to assess what fraction of DSBs are intermediates of hypermutation. Furthermore, the sites of mutations do not necessarily correlate with those of cleavage. In case of CSR, recombination joining sites (probably cleavage sites) and mutation sites can be 50 ∼ 300 bp apart.[177)] Whether DSBs are the primary product or a secondary repair product of nicking should also be examined. Kong & Maizels,[178)] using a similar method, have found that the majority of cleavages in the V region of mutating B cells are single-stranded nicks. Nagaoka et al.[179)] used focus formation of γH2AX (phosphorylated histone H2AX) as a marker of DSB and showed that DSB is induced in the V region in an AID dependent manner.

Discovery of AID and its physiological properties

Isolation, expression and structure of AID

AID cDNA was isolated from a murine B lymphoma line CH12F3-2 by subtractive cDNA hybridization using mRNAs from class switch-stimulated and nonstimulated CH12F3-2 cells.[80)] AID mRNA is found only in activated B cells, most prominently in the germinal center B cells. AID mRNA encodes a protein of 198 amino acid residues containing a unique cytidine deaminase motif at residues 55–94, which includes three essential residues (H56, C87, and C90) for zinc binding and catalytic activity. These residues are highly conserved among all members of the cytidine deaminase family, including metabolic cytidine deaminase in E. coli. The orthologue of AID is found in vertebrates but not in invertebrates.[180),181)] The amino acid sequence of AID has the strongest homology with that of apolipoprotein (apo) B mRNA editing catalytic polypeptide 1 (APOBEC-1), a well-characterized RNA editing enzyme of apoB100 mRNA that encodes the cholesterol carrier protein in low-density lipoprotein (LDL).[182)] APOBEC-1 recognizes the structure of apoB100 mRNA through a cofactor called APOBEC-1 complementation factor (ACF),[183)] which guides the APOBEC-1 catalytic center to the specific cytosine (C) at position 6666. APOBEC-1 converts this C to uracil (U) and generates apoB48 mRNA that encodes the chylomicron protein component, a carrier of triglyceride. Both APOBEC-1 and AID form a dimer.[184),185)]
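The effect of a single C-to-U deamination on an mRNA can be shown in a few lines. The sketch below converts one cytosine to uracil and checks whether the edited codon is a stop codon. It relies on a detail that is not stated in the text but is well established for apoB mRNA editing: the edited cytosine is the first base of a CAA (glutamine) codon, so editing yields the stop codon UAA, which is how the shorter apoB48 product arises. The function names are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def deaminate(rna: str, index: int) -> str:
    """Return the RNA with the cytosine at `index` converted to uracil."""
    if rna[index] != "C":
        raise ValueError(f"base at index {index} is {rna[index]!r}, not C")
    return rna[:index] + "U" + rna[index + 1:]


def is_stop(codon: str) -> bool:
    """True if the three-base RNA codon terminates translation."""
    return codon.upper() in STOP_CODONS


codon_before = "CAA"                        # glutamine codon (assumed editing site context)
codon_after = deaminate(codon_before, 0)    # C -> U at the first position
print(codon_before, is_stop(codon_before))
print(codon_after, is_stop(codon_after))
```

The same base change applied to DNA, as proposed by the DNA deamination hypothesis, would create a U:G mismatch rather than a new codon directly. This sketch covers only the RNA editing case.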

AID is essential and sufficient for CSR and SHM

The physiological function of AID is clearly demonstrated by studies on AID-deficient animals and patients.[5),6)] AID-deficient mice show the complete loss of class switching and accumulation of IgM in sera and feces. Patients with an autosomal recessive hereditary disease called hyper-IgM syndrome type 2 (HIGM2) have severe defects in class switching. Genetic linkage analysis of the disease locus using polymorphic markers has revealed that the mutation is mapped on chromosome 12p13, which coincided with the human AID gene locus determined by the FISH analysis.[186)] Subsequent molecular studies demonstrated that all HIGM2 patients have mutations in the AID gene, and all these mutated AID cDNAs are shown to be defective in CSR by in vitro assays described below.[185)] The B lymphocytes from HIGM2 patients and AID-deficient mice are unable to switch isotypes by in vitro stimulation. Surprisingly, AID-deficient memory B cells of HIGM2 patients do not have SHM, and repeated antigen stimulation of AID-deficient mice does not lead to accumulation of mutations in the antigen-specific V region.[5),6)] The major phenotypes of AID deficiency are summarized in Table I.

Table I.

Hyper IgM syndrome type II and mouse AID deficiency

Absence of IgG, IgE and IgA

Increased IgM in sera and feces

Defect of CSR in stimulated B cells

Defect of SHM by Ag administration

Enlarged germinal centers

Recurrent infection

Ectopic expression of AID induces class switching and hypermutation in non-B cells, such as fibroblasts,[69),187)] hybridomas[188),189)] and T cells,[69)] which carry artificial constructs for measuring CSR and SHM. DT40 chicken B cells in which the AID gene is mutated cannot undergo GC, which can be recovered by transfection of the AID cDNA.[190),191)] These results clearly demonstrate that AID is essential and sufficient for all three different genetic alterations induced by antigen stimulation of B cells. The other enzymes and cofactors are probably expressed ubiquitously.

Phenotypes in AID-deficient mice

As compared with RAG-2−/− mice, AID−/− mice are relatively healthy under the SPF condition and resistant to infection with a virulent strain of influenza virus.[192)] However, AID-deficient mice are more susceptible to secondary infection with higher doses, indicating that nonmutated IgM has significant protection capacity against low-dose virus but tailored Igs with SHM and CSR are important for protection from infection with higher-dose virus. HIGM2 patients suffer from recurrent infections, which cause hypertrophy of lymph nodes and enlarged germinal centers.[6)] AID-deficient mice also have enlarged germinal centers[5)] and accumulation of activated IgM+ B cells and IgM plasma cells in all lymphoid tissues, but especially in the gut lamina propria.[193)] Accumulation of IgM+ B cells and plasma cells in the intestine of AID−/− mice is explained by (1) blockade of in situ class switching in the gut lymphoid tissues of AID−/− mice, where local IgM+ B cells ordinarily switch preferentially to IgA and differentiate to IgA plasma cells,[194)] and (2) sustained activation of the immune system due to the absence of intestinal IgA, causing continuous recruitment of immune cells to the gut, which leads to hypertrophy of Peyer’s patches and protrusion of isolated lymphoid follicles.[193)] The absence of normal hypermutated intestinal IgA causes a profound dysregulation of the gut microflora, especially an excessive proliferation of anaerobic bacteria.
The anaerobes detected in all segments of AID−/− small intestine are nonpathogenic, commensal strains usually found in flora of the large intestine.[193)] Among them, the major population is represented by segmented filamentous bacteria, strict anaerobes that cannot be cultured, at least with available microbiological techniques, and that attach strongly to the mucosal epithelium.[195)] An antibiotic treatment inhibiting anaerobe expansion or reconstitution of IgA production in AID−/− small intestine recovers the normal composition of gut flora and abolishes both local and systemic activation of the immune system.[193),195)] Thus, IgA secreted into the gut lumen appears to function not only for protection against pathogenic bacterial or viral antigens but also for the homeostasis of the nonpathogenic gut flora, which is essential to prevent overstimulation of the nonmucosal immune system.[196)] Unmutated IgMs, although secreted into the gut lumen, cannot prevent the excessive and aberrant expansion of anaerobes.[193),195)]

Molecular mechanism for regulation of CSR and SHM by AID

The proposal of an RNA-editing mechanism as the function of AID was originally based on the observation that AID has its strongest homology with the RNA-editing enzyme APOBEC-1.[5),80)] According to the RNA-editing hypothesis, AID recognizes a putative mRNA precursor and converts it to mRNA encoding an endonuclease. The endonuclease thus generated cleaves either the V region in SHM or the S region in CSR (Fig. 1a). Thus, translation of edited mRNA is mandatory for cleavage of DNA after AID expression.

Fig. 1.

Schematic representation of RNA editing and DNA deamination models. DNA cleavage mechanisms by the two models are schematically represented. The repair phase is believed to be the same, except that SHM can be introduced by replication according to the DNA deamination model.

In contrast, the DNA-deamination hypothesis (Fig. 1b) was proposed based on the observation that AID can induce mutations in a variety of genes in Escherichia coli.[197)] Because it is very difficult to imagine that cofactors and targets of AID required for RNA editing are conserved between mammals and E. coli, it was proposed that AID directly deaminates C nucleotides on DNA to uracil (U).[197)] Deamination of C on DNA generates the U:G mismatch pair, which is quickly recognized by base excision repair enzymes, including uracil DNA glycosylase (UNG) and apyrimidinic endonuclease, which are responsible for U-nucleotide removal and the cleavage of phosphodiester bonds at apyrimidinic sites, respectively (Fig. 1b). According to this model, DNA cleavage is not obligatory for SHM, because the U:G mismatch pair can be converted to thymidine:adenine (T:A) during replication, resulting in the mutation from C to T or G to A.[197),198)] The distinction between the two hypotheses lies in the requirement for de novo protein synthesis in the RNA-editing hypothesis and the involvement of U removal by UNG in DNA cleavage in the DNA-deamination hypothesis. Although many reviews of the function of AID have focused mainly on the DNA-deamination hypothesis,[199)–207)] this issue is totally unsettled. We have compared and critically examined experimental data for and against the two models to elucidate the function of AID (Tables II and III).
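The replication route to mutation in the DNA-deamination model can be traced with a minimal sketch. It assumes a toy four-base duplex written position by position (strand polarity is ignored for simplicity), and all names are hypothetical. It follows only the path in which the U is not removed before replication, so that U templates an A.

```python
# Base pairing used during replication; U (deaminated C) templates A.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def deaminate(strand: str, index: int) -> str:
    """Convert the C at 0-based `index` to U, as proposed for AID on DNA."""
    if strand[index] != "C":
        raise ValueError(f"expected C at index {index}, found {strand[index]}")
    return strand[:index] + "U" + strand[index + 1:]


def pair(template: str) -> str:
    """Return the strand synthesized against `template`, position by position."""
    return "".join(PAIRING[base] for base in template)


def point_mutations(before: str, after: str):
    """List (index, old_base, new_base) for every position that differs."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(before, after)) if x != y]


top = "ACGT"                      # toy sequence, assumption for illustration
bottom = pair(top)                # original partner strand
damaged_top = deaminate(top, 1)   # creates the U:G mismatch
new_bottom = pair(damaged_top)    # first replication: U templates A
new_top = pair(new_bottom)        # second replication: A templates T

print(point_mutations(top, new_top))        # change seen on the deaminated strand
print(point_mutations(bottom, new_bottom))  # change seen on the partner strand
```

Read from the deaminated strand the change is C to T, and read from the partner strand it is G to A, which matches the two outcomes named in the text.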

Table II.

Supporting data to RNA editing model

Supporting data	Counter arguments
① Protein synthesis requirements for DNA cleavage in CSR and SHM by AID	① Other labile factors
② Homology between AID and APOBEC 1: evolutionary similarities; cytoplasmic-nucleus shuttling protein; requirement of cofactors for target specificity

Table III.

Supporting and contradictory data to DNA deamination model

Supporting data	Counter arguments
① AID deaminates DNA	① RNA editing enzyme also deaminates DNA
② CSR reduction in UNG^−/− mice	② U removal is dispensable for CSR. UNG is not required for DNA cleavage in CSR and SHM
③ AID association with DNA	③ No association of AID with DNA
④ AID overexpression causes GC biased SHM	④ Not always. AID overexpression can cause AT biased SHM


How AID can differentially regulate SHM and CSR

Any model for the function of AID must explain the critical issues of SHM and CSR: how the V and S regions are specifically targeted for SHM and CSR, respectively; why transcription is required for both SHM and CSR; and how regions 3′ proximal to the promoter are specifically mutated during SHM. Among these, the most critical is the molecular mechanism of distinguishing target loci. Discrimination is required between the rearranged VH and S regions, which are only about 5–7 kilobases apart, and also between nonimmunoglobulin targets of SHM and other nonmutated but actively transcribed genes. As discussed below, neither of the present models can resolve these issues conclusively. Functional studies on AID mutants have provided clear insight into the first issue. AID mutants with truncation or replacement at the carboxy terminus are almost completely devoid of CSR activity but retain full SHM activity in vitro as well as in vivo.[6),185),208)] Conversely, several amino-terminal mutants lack SHM but retain more than 80% of CSR activity.[209)] Furthermore, point mutations in the S region are introduced by ‘CSR+ SHM−’ mutants but not by ‘SHM+ CSR−’ mutants. These results suggest that CSR and SHM depend on the interaction of specific cofactors with separate domains of AID. Although the molecular nature of the specific cofactors remains to be elucidated, it is likely that the target specificity to the V- and S-region DNA is determined by the cofactors but not by AID itself. According to the RNA-editing hypothesis, the CSR-specific cofactor would recognize precursor mRNA for CSR-specific endonuclease, and this precursor mRNA would be edited by the ‘collaboration’ of AID and the CSR-specific cofactor, generating mRNA for CSR endonuclease that recognizes the S region but not the V region. A similar mechanism is also applicable to generation of V region-specific endonuclease.
Alternatively, the edited mRNA may encode the specific guiding factors that associate with the specific target DNA as well as an endonuclease common to the V- and S-region DNA. In any case, these scenarios can easily explain why AID can differentially regulate SHM and CSR. In contrast, the DNA-deamination hypothesis would explain why specific cofactors are required for the interaction of AID with specific target DNA. In that case, target recognition of DNA is mediated by the cofactors and AID may not bind to that DNA directly. One of the proteins required for replication, replication protein A, has been proposed to be a factor that guides AID to specific target DNA.[210)] Because replication protein A is known to associate with any single-stranded DNA, it remains to be determined how replication protein A guides AID to specific targets of CSR, SHM or both.

Evidence for AID involvement in DNA cleavage

Direct evidence that AID is involved in DNA cleavage was obtained using focus formation of γ-H2AX as a marker of DSB. The γ-H2AX focus at the immunoglobulin heavy-chain (Igh) locus is formed in B cells undergoing CSR but not in AID-deficient B cells, as shown by overlapping images of fluorescence in situ hybridization of the Igh locus and immunostaining using antibody to γ-H2AX.[211)] Similarly, the AID-dependent γ-H2AX localization in the Igh locus in CSR-stimulated B cells has been demonstrated by chromatin immunoprecipitation using antibody to γ-H2AX.[212),213)] Chromatin immunoprecipitation has also shown that AID-dependent γ-H2AX localizes specifically in the V region and surrounding IGH locus in human lymphoma BL2 cells.[179)] Because a carboxy-terminal truncation mutant of AID was used that can catalyze SHM but not CSR,[185),209),214)] the induction of DNA breakage was associated with SHM but not CSR. It was also demonstrated that overexpression of AID can induce the γ-H2AX localization in VH and C in BL2 cells.[215)] Although DSBs assessed by the γ-H2AX focus formation are induced by AID expression in hypermutating B cells, it does not necessarily mean that DSBs are an obligatory intermediate of SHM. Single-strand nick cleavage is probably sufficient to introduce SHM in the V region,[178),216)] but frequent nick cleavages generate DSBs with ‘staggered’ ends. In fact, a human B cell line (Ramos) has been reported to contain microdeletions in the V region, suggesting that DSBs may occur in some B cells that mutate extensively.[217)]

Tumorigenesis by constitutive expression of AID

Transgenic mice with AID cDNA under the control of the chicken β actin promoter develop T cell tumors and die by 85 weeks without exception.[9)] The onset of tumors varies from 4 to 40 weeks, depending on the copy number of the transgene. By surface phenotype, T cell tumors appear to originate from either thymic or peripheral T cells. In rare cases, tumors develop in lung, liver and muscle tissues (our unpublished data). To our surprise, no B lymphomas have been detected in the AID-transgenic mice. In both types of T lymphomas, a large number of mutations accumulate in the V gene (10^−3) but very infrequently in the C gene (10^−4) of the T cell receptor, with a distribution profile of mutations reminiscent of SHM accumulation in Ig V genes. Mutation accumulation is also identified in the c-myc gene. Mutation target genes are selective, because many transcriptionally active genes do not have mutations. No common chromosomal translocation, other than sporadic ones, is found in these tumors.[9)] Therefore, mutation accumulation leading to tumor development is related not only to deficiency in DNA repair genes but also to uncontrolled AID activity, thus identifying AID as the first active mutator in vertebrates.

Future perspectives

To overcome the limitations of the genomic information, evolution has adopted a strategy for modifying DNA that cannot avoid the risk of genome instability. The genetic alterations of the immune system take place in two different phases of lymphocyte differentiation (Fig. 2). The first step, VDJ recombination, occurs in the bone marrow or thymus. This DNA rearrangement is precisely regulated by the differentiation program and coupled with the differentiation step of lymphocytes. RAG-1 and -2 are enzymes that catalyze VDJ recombination. Once B lymphocytes complete VDJ recombination, they express IgM and move to the periphery, where they encounter antigen. The antigen-induced DNA alterations CSR, SHM and GC are all mediated by AID. To our knowledge, AID is the first enzyme encoded in the genome that has been shown to physiologically induce mutations in the genome. The previously identified mutator genes carry defective mutations in repair mechanisms. The tumorigenic activity of AID in mammalian cells has been demonstrated by transgenic expression of AID in mice.[9)] Chromosomal translocation associated with plasmacytoma formation is shown to be dependent on AID.[11)] It is therefore absolutely mandatory to regulate the function of AID at several different steps such as expression, decay, and target selection. It will be particularly interesting to discover which mechanism nature has selected for the dual action of AID: ‘shield’ by immune diversification, or ‘sword’ by genome mutation. Is it direct reaction on DNA or an additional control via RNA editing? Reports that infection with Epstein-Barr virus or hepatitis C virus[12),15)] induces AID expression suggest the existence of a delicate and fascinating interplay between pathogen, genome alteration, tumorigenesis and host defense, in which AID is central.

Fig. 2.

Pathogen induced DNA alterations by AID.
