Jiao Wang1, Yingchun Liu1, Ying Liu1,2, Kaixin Du1, Shuqi Xu1, Yuchen Wang1, Mart Krupovic2, Xiangdong Chen1. 1. State Key Laboratory of Virology, College of Life Sciences, Wuhan University, Wuhan 430072, China. 2. Unit of Molecular Biology of the Gene in Extremophiles, Department of Microbiology, Institut Pasteur, Paris 75015, France.
Abstract
Genomes of halophilic archaea typically contain multiple loci of integrated mobile genetic elements (MGEs). Despite the abundance of these elements, however, mechanisms underlying their site-specific integration and excision have not been investigated. Here, we identified and characterized a novel recombination system encoded by the temperate pleolipovirus SNJ2, which infects haloarchaeon Natrinema sp. J7-1. SNJ2 genome is inserted into the tRNAMet gene and flanked by 14 bp direct repeats corresponding to attachment core sites. We showed that SNJ2 encodes an integrase (IntSNJ2) that excises the proviral genome from its host cell chromosome, but requires two small accessory proteins, Orf2 and Orf3, for integration. These proteins were co-transcribed with IntSNJ2 to form an operon. Homology searches showed that IntSNJ2-type integrases are widespread in haloarchaeal genomes and are associated with various integrated MGEs. Importantly, we confirmed that SNJ2-like recombination systems are encoded by haloarchaea from three different genera and are critical for integration and excision. Finally, phylogenetic analysis suggested that IntSNJ2-type recombinases belong to a novel family of archaeal integrases distinct from previously characterized recombinases, including those from the archaeal SSV- and pNOB8-type families.
Integrases are site-specific recombinases that catalyze recombination between two specific DNA molecules, resulting in integration and excision of mobile genetic elements (MGEs) into and from host chromosomes (1). This process occurs without an external energy source and involves several well-orchestrated steps comprising strand breakage, exchange and rejoining (2). Recombinases are widespread in both cellular organisms and various MGEs, including plasmids (3,4), viruses (5), pathogenicity islands (6), genomic islands (7), integrons and gene cassettes (8), and integrative and conjugative elements (9). Correspondingly, they play multiple fundamental biological roles in the three domains of life, including integration/excision of viral genomes (10), elimination of dimers from replicated chromosomes during chromosome segregation (Xer/dif system) (11) and modulation of cell-surface components for immune escape (Hin/hix system) (12). They also maintain proper copy number, partition plasmid (Flp/frt system) and bacteriophage (Cre/lox system) genomes (13,14), and acquire, reorder and stockpile various genes (IntI system) (15).

Most integrases can be assigned to one of two recombinase superfamilies, the tyrosine and serine superfamilies. This classification is based on the nucleophilic amino acid residue that attacks the DNA phosphodiester bond to form a covalent protein–DNA linkage (16,17). Tyrosine recombinases can be further subdivided into two groups based on the directionality of the catalyzed site-specific recombination: complex unidirectional and simple bidirectional tyrosine-type integrases (2). In the complex unidirectional group, directionality is modulated by a class of small accessory proteins known as recombination directionality factors (RDFs). RDFs are typically small proteins lacking clear sequence conservation (18) that play architectural roles in the reactions catalyzed by their cognate recombinases. Some RDFs also function as transcriptional regulators (19–21). Importantly, there is no apparent genomic coupling between integrase and rdf genes, and the two genes are often separated in the genome. In contrast, simple bidirectional tyrosine-type integrases catalyze DNA recombination in a reversible fashion without the assistance of other factors (22).

Most tyrosine recombinases possess seven conserved residues: ArgI, Glu/AspI, Lys, HisII, ArgII, His/TrpIII (where Roman numerals correspond to the three catalytic signature motifs I, II and III) and the essential catalytic Tyr residue (23). These residues play catalytic roles in several enzymes. Specifically, the Tyr and Lys residues serve as nucleophile and general acid catalysts, respectively (24). The ArgI and ArgII residues neutralize the negative charge during the transition state and activate the scissile phosphate for attack by the catalytic tyrosine residue. Finally, the His/TrpIII, HisII and Glu/AspI residues stabilize the transition state (23–25). Importantly, bacterial tyrosine-type integrases contain a signature RI…HIIXXRII…Y tetrad (where X is any residue).

In archaea, two unique families of MGE-encoded integrases have been reported: SSV-type (RI…KIIXXRII…Y) and pNOB8-type (RI…YIIXXRII…Y) integrases. Integrases of both families harbor non-canonical substitutions within the active site tetrad (Table 1). SSV-type integrases are encoded by the spindle-shaped fuselloviruses, SSV1 and SSV2, that infect hyperthermophilic archaea of the genus Sulfolobus (phylum Crenarchaeota) (26–30). Uniquely, their integrase (int) gene is interrupted upon integration by an att site located within this gene (31). By contrast, pNOB8-type integrases are exclusively found in the conjugative pNOB8-like plasmids of Sulfolobus (31,32). Unlike SSV-type integrases, pNOB8-type integrases retain an intact int gene upon integration because their att site is located upstream of the int gene (31). Many archaeal head-tailed viruses (order Caudovirales) have been predicted to carry integrase genes (33); however, none of these have been studied experimentally.
Table 1. Analysis of the conserved residues of SNJ2 and other integrases based on sequence alignment of representative tyrosine recombinase superfamily proteins

| Consensus | ArgI | Glu/AspI* | Lys^a | HisII | ArgII | His/TrpIII* | Tyr |
|---|---|---|---|---|---|---|---|
| Int SNJ2 | R165 | G168 | K199 | H295 | R298 | A321 | Y330 |
| Int Lambda | R212 | D215 | K235 | H308 | R311 | H333 | Y342 |
| Flp S. cerevisiae | R191 | D194 | K223 | H305 | R308 | W330 | Y400 |
| XerC E. coli | R148 | E151 | K172 | H240 | R243 | H266 | Y275 |
| XerD E. coli | R148 | E151 | K172 | H244 | R247 | H270 | Y279 |
| Cre P1 | R173 | E176 | K201 | H289 | R292 | W315 | Y324 |
| Int BJ1 | R170 | E173 | K197 | T264 | R267 | W290 | Y300 |
| Int phiCh1 | R48 | E51 | R79 | H172 | R175 | W198 | Y207 |
| Int SSV1 | R211 | E214 | R240 | K278 | R281 | R304 | Y314 |
| Int pNOB8 | R277 | E280 | K305 | Y379 | R382 | R405 | Y417 |
| Int pTN3 | R286 | E289 | K336 | A382 | R385 | R417 | Y427 |
The traditional consensus tetrad that differentiates between the bacterial and archaeal sources of int is shaded in light gray. Substitutions in the conserved sites are bolded. *Two conserved residues (Glu/AspI and His/TrpIII) are present as one of two alternative amino acid types in tyrosine recombinases. a, the catalytic lysine (arginine) was determined to be present if it was located within four residues of IntSNJ2 K199 in the alignment.
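The tetrad patterns described in the Introduction can be checked directly against the residues listed in Table 1. The following is a minimal sketch rather than part of the original analysis: the residue letters and positions are copied from Table 1, while the dictionary layout, the function name `classify_tetrad` and the output labels are illustrative assumptions.

```python
# (residue, position) at ArgI, HisII, ArgII and Tyr, copied from Table 1.
# Only a subset of rows is shown; the data structure itself is hypothetical.
TABLE1_TETRADS = {
    "Int SNJ2":   [("R", 165), ("H", 295), ("R", 298), ("Y", 330)],
    "Int Lambda": [("R", 212), ("H", 308), ("R", 311), ("Y", 342)],
    "Int BJ1":    [("R", 170), ("T", 264), ("R", 267), ("Y", 300)],
    "Int SSV1":   [("R", 211), ("K", 278), ("R", 281), ("Y", 314)],
    "Int pNOB8":  [("R", 277), ("Y", 379), ("R", 382), ("Y", 417)],
    "Int pTN3":   [("R", 286), ("A", 382), ("R", 385), ("Y", 427)],
}

# Labels keyed by the residue found at the HisII position (illustrative wording).
PATTERN_LABELS = {
    "H": "canonical R...HXXR...Y",
    "K": "SSV-type R...KXXR...Y",
    "Y": "pNOB8-type R...YXXR...Y",
}


def classify_tetrad(tetrad):
    """Classify a tetrad by the residue occupying the HisII position."""
    (arg1, _), (second, second_pos), (arg2, arg2_pos), (tyr, _) = tetrad
    # The outer residues ArgI, ArgII and Tyr must be conserved.
    if (arg1, arg2, tyr) != ("R", "R", "Y"):
        return "outer residues not conserved"
    # "XX" means exactly two residues separate the HisII and ArgII positions.
    if arg2_pos - second_pos != 3:
        return "spacing is not XX"
    return PATTERN_LABELS.get(second, f"other substitution ({second})")


for name, tetrad in TABLE1_TETRADS.items():
    print(f"{name}: {classify_tetrad(tetrad)}")
```

Under this check, IntSNJ2 carries the canonical tetrad even though its Glu/AspI and His/TrpIII positions, which lie outside the tetrad, are substituted.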
The accumulation of archaeal genome sequences in public databases has fuelled the discovery of novel integrating MGEs that encode diverse tyrosine recombinases (34). However, very little is known about these integrases due to the lack of efficient genetic systems and relatively poor annotation of the archaeal genomes. Previously, we identified a novel provirus in the haloarchaeon Natrinema sp. J7-1. This virus, SNJ2, is the first isolated temperate member of the family Pleolipoviridae (35). We also found 17 SNJ2-like proviruses in the genomes of other haloarchaea belonging to 10 different genera. These proviruses all share a unique conserved gene cluster and appear to utilize a similar integration strategy (36). Haloarchaea isolated from widespread geographical locations all contain SNJ2-like proviral regions, suggesting that temperate pleolipoviruses represent a ubiquitous group of viruses ‘hiding’ in archaeal genome databases. Identification of the site-specific recombination machinery encoded by these viruses is necessary to understand their life cycle. In this study, we characterized a novel archaeal recombination system of SNJ2 that was important for the excision, integration and maintenance of this virus in host cells. Furthermore, this recombination system comprised the int gene, occasionally genes for auxiliary factors, and the cognate att site. These were encoded by various proviral regions or MGEs in haloarchaeal genomes from 22 different genera and 5 families. Collectively, our results provide the first insight into the mechanisms controlling a site-specific recombination system used by a prevalent group of haloarchaeal MGEs, which may promote host genome plasticity and horizontal gene transfer (HGT).
MATERIALS AND METHODS
Bioinformatic analysis of site-specific recombination elements
Alignments of multiple protein sequences were performed using MUSCLE (37) or ClustalW (38). For homology detection, we used a combination of the NCBI Conserved Domain Search (39), PSI-BLAST (40) and HHpred (41). For phylogenetic analyses, protein sequences were aligned with Promals3D (42). Poorly aligned (low information content) positions were removed using the gappyout function of Trimal (43). The maximum likelihood phylogenetic tree of multiple tyrosine recombinase families was constructed using the latest version of the PhyML program (44). This program automatically selects the best-fit substitution model for a given alignment. The best model identified by PhyML was LG +G4 +I +F (LG, Le-Gascuel matrix; G4, Gamma shape parameter: fixed, number of categories: 4; I, proportion of invariable sites: fixed; F, equilibrium frequencies: empirical). The phylogenetic tree of the 38 IntSNJ2-type integrases was inferred using the neighbour-joining method in the MEGA software package (version 7.0) (45). Potential open reading frames (ORFs) larger than 50 codons were predicted using ORF Finder (NCBI) or the Vector NTI Advance® 11.5 software. Start codons were determined by selecting for the largest possible ORFs with one of the three start codons (ATG, GTG and TTG). Pairwise identity percentages between amino acid sequences were computed using the EMBOSS Needle tool (http://www.ebi.ac.uk/Tools/psa/emboss_needle/). Promoters were predicted using the Neural Network Promoter Prediction tool (http://www.fruitfly.org/seq_tools/promoter.html).
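The ORF and start-codon rule described above can be illustrated with a short sketch. This is not the ORF Finder or Vector NTI implementation. It assumes the standard stop codons (TAA, TAG, TGA), scans only the forward strand (the reverse strand would be handled by scanning the reverse complement), and interprets "larger than 50 codons" as more than 50 coding codons excluding the stop codon. The function name `longest_orfs` is hypothetical.

```python
START_CODONS = {"ATG", "GTG", "TTG"}   # the three start codons named in the text
STOP_CODONS = {"TAA", "TAG", "TGA"}    # assumption: standard stop codons


def longest_orfs(seq, min_codons=50):
    """Return (start, end, frame) for forward-strand ORFs.

    For every stop codon, the most upstream in-frame start codon since the
    previous stop is used, which yields the largest possible ORF.
    Coordinates are 0-based; `end` is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        first_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if first_start is None and codon in START_CODONS:
                first_start = i                      # most upstream start
            elif codon in STOP_CODONS:
                if first_start is not None:
                    n_codons = (i - first_start) // 3  # stop codon excluded
                    if n_codons > min_codons:
                        orfs.append((first_start, i + 3, frame))
                first_start = None                   # reset after each stop
    return orfs


# Toy sequence: TTG start, an internal in-frame GTG, then a TGA stop.
toy = "TTG" + "GCT" * 10 + "GTG" + "GCT" * 50 + "TGA"
print(longest_orfs(toy))
```

In the toy sequence the upstream TTG is chosen over the internal GTG because it gives the longer ORF, mirroring the selection rule stated in the text.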
Strains, culture conditions and transformation procedures
All strains used in this study are listed in Supplementary Table S1. J7 and other haloarchaeal strains were cultured on Halo-2 or 18% modified growth medium (MGM) as previously described (46). Agar plates contained 15 g Bacto Agar (BD) per litre. Casamino Acids medium (Hv-Ca) was prepared according to the online protocol (http://www.haloarchaea.com/resources/halohandbook/Halohandbook_2009_v7.2mds.pdf), except Casamino Acids (Sigma-Aldrich) replaced the peptone and yeast extract in 18% MGM to provide selection pressure. When needed, 5-fluoroorotic acid (5-FOA) was added at a final concentration of 0.04 mg/ml, while mevinolin (Mev) antibiotic was added at 5 μg/mL in 18% MGM for haloarchaeal cultures. Escherichia coli strains, DH5α and JM110, were cultured in Luria-Bertani medium at 37°C. When needed, ampicillin (0.1 mg/ml) and chloramphenicol (0.05 mg/ml) were added to the media. The modified polyethylene glycol (PEG) method was used to transform halobacteria as previously described (46). DH5α and JM110 were transformed using the CaCl2 method (47).
Plasmid construction
Plasmid information is provided in Supplementary Table S2, and the oligonucleotides used for plasmid construction are summarized in Supplementary Table S3. Five classes of plasmids were constructed and classified according to functionality, as follows: (i) pNBF (kindly provided by Prof. Yuping Huang, Wuhan University) used for construction of pyrF deletion mutants carried upstream and downstream flanking sequences of pyrF (whose product was homologous to the enzyme orotidine-5′-phosphate decarboxylase), with an Mev resistance gene. The pNBK-ORFx and pUC-mev-pyrF-ORFx used to generate the orfx deletion and orfx::pyrF deletion-insertion mutants were constructed based on the basic vectors, pNBK-F (gift from Prof. Yuping Huang) and pUC-mev (46). pNBK-ORFx contained target gene flanking sequences with an intrinsic pyrF cassette as the selective marker, while pUC-mev-pyrF-ORFx contained the target gene flanking sequences separated by the pyrF cassette with Mev resistance. (ii) pYCJ-1416 used for complementation of int-null mutants. The promoter and coding regions of int were amplified by polymerase chain reaction (PCR) using DNA from J7-1-F Δorf2–3 as a template, and then cloned into the vector pYC-J (46). (iii) IntSNJ2 and IntSNJ2-type integration vectors, pCF-X and pCF-Z, were used in the integration assay. To construct these plasmids, fragments X and Z containing different parts of the Int operon were amplified using the genomic DNA of strains J7-1-F Δorfx and three haloarchaea, Haloterrigena thermotolerans DSM 11522, Natronorubrum bangense JCM 10635 and Natrinema versiforme JCM 10478, respectively. These were cut and ligated into the vector pUC-M-pyrF (48) (pCF), and were selectively marked by the pyrF cassette. We created different point mutations in pCF-2207 to generate Orf2/Orf3 frame-shift fragments and conserved site substitution of IntSNJ2 fragments using PCR-based overlap extension mutagenesis (49). 
(iv) Plasmids containing truncated IntSNJ2 and IntSNJ2-type excision recombination vectors, pSU-Y and pFJ6-W, were created using fragments Y and W with the pCF-X and Z as templates, respectively. These were cut with HindIII–XbaI and MfeI–SphI, then cloned into the corresponding basic vectors, pSU19 (50) and pFJ6 (48). (v) The mini-chimeric vector, pUC-mev-oriN-MH, for the excision assay was constructed using four steps. First, an SnaB–MfeI oriC10 (48) replicon fragment digested from pFJ6 was ligated into the vector pUC-mev to generate pUC-mev-oriN. Second, two fragments, PropyrF (promoter of pyrF)-attBOP and pyrF-attB'OP', were amplified using overlapping PCR. Third, the two fragments were fused together by overlapping PCR containing the restriction sites of MfeI and HindIII at the 5′ end, respectively. Finally, the digested fragment was ligated into the MfeI–HindIII-digested pUC-mev-oriN vector.
Construction of ΔpyrF and ΔorfxSNJ2 strains
Deletion of pyrF in Natrinema sp. J7-1 and J7-3 was performed as previously described for Haloferax volcanii (51). Briefly, transformants carrying the non-replicative vector pNBF were cultured in 18% MGM with Mev antibiotic to promote the first homologous recombination. The second recombination was then promoted by culturing in the rich medium, Halo-2. Cultures were then plated on Halo-2 plates with 5-FOA to select for pyrF deletion mutants, designated as J7-1-F and J7-3-F, respectively. The orfx and orfx::pyrF strains were constructed based on J7-1-F in a similar manner. The knockout vectors, pNBK-ORFx and pUC-mev-pyrF-ORFx, containing the 5′ and 3′ flanking sequences of the orfx gene were transformed into J7-1-F. The resultant strains underwent two rounds of homologous recombination for allelic exchange by culturing in different media. The cultures were then plated on Halo-2 plates with 5-FOA and on Hv-Ca plates to select for deletion and deletion-insertion recombinants, respectively. At least three independent clones of each strain were confirmed by PCR with primer pairs (listed in Supplementary Table S3) located at the outer edge and inside the deletion regions.
Induction, verification of virus and infection procedures
Natrinema sp. J7 cultures at the stationary phase were treated with 1 μg/ml mitomycin C (MMC; Roche) for 30 min at 37°C. Cells were then collected by centrifugation and resuspended in fresh Halo-2 medium, followed by incubation for about 48 h to obtain culture lysates containing virus particles (36). Cells and debris were removed by centrifugation (10 000 × g, 10 min), followed by filtration through a Millipore Millex filter (0.45 μm). The filtrates were collected and stored. For infection, the filtrate was incubated with J7 culture for 1 h at 37°C. The culture was then plated on the Halo-2 or Hv-Ca medium to select for infected strains. To verify the presence of the marked virus particles in culture supernatants, the supernatant was first treated with DNase I (TaKaRa) to remove chromosomal and proviral DNA and then used as template for PCR to detect the circularized viral genome with specific primer pairs (Supplementary Table S3).
Co-transcription analyses and determination of transcript 5′/3′-ends
For analysis of orf1–4 co-transcription, total RNA was first extracted from exponentially growing J7-1 cultures with/without MMC treatment using TRIzol (Invitrogen). RNA quantity and quality were assessed with an Eppendorf BioSpectrometer fluorescence (at 260, 280 and 230 nm) and by agarose gel electrophoresis with ethidium bromide staining. Purified RNA was pre-treated with gDNA Eraser Reagent to remove genomic DNA before it was used as a template for Reverse Transcription-PCR (RT-PCR). RT-PCR was performed according to the manufacturer’s protocol (PrimeScript™ RT reagent Kit with gDNA Eraser; TaKaRa) using reverse transcriptase and random primers to synthesize the cDNA. To detect genomic DNA contamination, negative controls consisting of RT-PCR without reverse transcriptase were included for each reaction mixture. The resulting cDNA was used as a template for transcription analysis, with specific primers (Supplementary Table S3) designed to amplify the intragenic and intergenic regions of orf1-orf4. To determine the transcript 5′- and 3′-ends, RNA circularization and extraction were performed as previously described (52). Briefly, purified self-ligated RNA was treated with gDNA eraser to remove genomic DNA and then used for reverse transcription. The cDNA of 5′-3′ ligated RNA was then amplified with gene-specific primer pairs orf1-edge/orf3-edge, followed by a second PCR with nested primer pairs Nest-F/Nest-R (Supplementary Table S3 and Figure S5). Nested PCR considerably enhanced amplification specificity and eliminated false-positive fragments from the first PCR reaction. Nested PCR products of the 5′-3′ ligated RNA were subsequently analyzed by sequencing.
Quantitative and qualitative detection of site-specific recombination in vivo
For the excision assay (schematic shown in Supplementary Figure S6A), a series of Int-expressing plasmids, pSU-Y (10 μl), were constructed and transformed into J7-3-F/pUC-mev-oriN-MH competent cells (2 ml, OD600 at 0.8–1.0) containing the recombination sites, attBOP (257 bp) and attB'OP' (162 bp; hybrid of virus attachment site attP'OP and host chromosome attachment site attBOB'; Supplementary Figure S6A). For the integration assay (Supplementary Figure S6B), a series of IntSNJ2 and IntSNJ2-type recombination vectors, pCF-X and pCF-Z, were transformed into J7-3-F competent cells. Transformed cells were recovered with 1 ml of Halo-2 medium at 37°C for 12 h and then plated on solid Hv-Ca medium and incubated at 45°C for 7–8 days. Integration and excision frequencies were calculated as the number of transformants on solid Hv-Ca medium obtained with 1 μg of plasmid DNA. Qualitative tests of the recombination junctions were performed using PCR and enzyme digestion. The PCR-based recombination assay was previously described (36). MMC-treated/untreated cultures were centrifuged to collect cells, and cell pellets were lysed by suspension in the same volume of distilled water. Lysates were used as PCR amplification templates. The primer pairs (Supplementary Table S3) used for detection of the att sites were designed to target the borders between the MGE (proviral) genome and the host chromosome.
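The frequency definition above (transformants per 1 μg of plasmid DNA) amounts to a simple normalization. The sketch below is illustrative only: the function name and the `plated_fraction` parameter are assumptions introduced here to account for cases where only part of the recovered culture is plated, and the example numbers are invented, not data from this study.

```python
def recombination_frequency(colonies, plasmid_ug, plated_fraction=1.0):
    """Transformants on Hv-Ca medium normalized to 1 ug of plasmid DNA.

    colonies        -- colonies counted on the selective plate
    plasmid_ug      -- mass of plasmid DNA used for transformation (ug)
    plated_fraction -- hypothetical: fraction of the recovered culture plated
    """
    if plasmid_ug <= 0:
        raise ValueError("plasmid_ug must be positive")
    if not 0 < plated_fraction <= 1:
        raise ValueError("plated_fraction must be in (0, 1]")
    # Scale the count up to the whole culture, then normalize per ug of DNA.
    return colonies / plated_fraction / plasmid_ug


# Invented example: 120 colonies from 0.5 ug of plasmid, whole culture plated.
print(recombination_frequency(120, 0.5))
```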
RESULTS
Temperate virus SNJ2 encodes a novel tyrosine integrase
The pleolipovirus SNJ2 was previously isolated from Natrinema sp. J7-1 cell culture supernatants (36,53). It carries a 14 bp attachment core site identical to the 3′-distal region of the host tRNA-Met gene. This enables the virus to site-specifically integrate into the host chromosome by recombining with the tRNA-Met gene, and subsequently be excised as a circular DNA molecule (36). Here, SNJ2 genome annotation revealed that ORF1 encodes a putative integrase, IntSNJ2 (accession no: AFO55992). Among profile HMMs available on the HHpred website, Orf1 best matched (HHpred probability of 100) the site-specific recombinases XerC (COG4973) and XerD (COG4974) (54). Alignment of IntSNJ2 with representative tyrosine recombinases revealed that this enzyme contains five invariant active site residues of the RI…K…HIIXXRII…Y pentad (Table 1 and Supplementary Figure S1), which are conserved in canonical tyrosine recombinases (23). However, in IntSNJ2, the typical Glu/AspI site and His/TrpIII sites were substituted with Gly and Ala residues, respectively (Table 1). Consistently, IntSNJ2 had low pairwise sequence identity (1.7–21.3%) with tyrosine recombinases from all other characterized families (Supplementary Table S4).

To further characterize the relationship between SNJ2-like integrases and other tyrosine recombinases, we created a maximum likelihood phylogenetic tree of representative sequences from diverse tyrosine recombinase families. The global phylogeny generally complied with the predefined tyrosine recombinase classification and available biochemical data (Figure 1 and Supplementary Figure S2). Notably, archaeal recombinases involved in chromosome dimer resolution, named XerA (55), were at the base of a bacterial clade, including the paralogous XerC and XerD recombinases. Putative integrases of archaeal BJ1 (56)-like and phiCh1 (57)-like viruses formed a common clade with Cre-like recombinases of P1-like phages and yeast Flp-like recombinases, distinguishing them from other bona fide viral integrases (Figure 1). For phiCh1-like viruses, this placement is consistent with the experimental evidence that phiCh1 recombinase catalyzes inversion (rather than integration) of a gene cassette encoding tail fibre proteins (58). Importantly, SNJ2-like integrases formed a large, well-supported clade with the MGE-encoded integrases from hyperthermophilic archaea. In this clade, SSV-type integrases encoded by crenarchaeal MGE formed a sister group to pTN3-like integrases encoded by plasmids and viruses infecting members of the order Thermococcales (phylum Euryarchaeota) (59). Such topology supports the possibility that the two integrase families evolved from a common ancestor, possibly prior to the divergence of the Crenarchaeota and Euryarchaeota phyla. Furthermore, SNJ2-like and pNOB8-like family integrases were at the base of this archaeal clade, suggesting that the unorthodox SSV-type and pTN3-like integrases evolved from the more typical archaeal integrases, which act on att sites located outside of the int gene. Finally, the SNJ2-like integrases were close to the root of the major clade of archaeal MGE-encoded integrases (Figure 1), suggesting that SNJ2-like integrases diverged from the remaining archaeal homologs early in their evolution. These results strongly suggest that IntSNJ2 belongs to a novel family of tyrosine integrases.
Figure 1.
Maximum likelihood phylogenetic analysis of tyrosine recombinases. The tree contains representative members from different tyrosine recombinase families, including shufflon-specific DNA recombinases (4); dusA-associated integrases (DAI); phage integrases; integrases from integrative and conjugative elements (IntC); integron integrases (IntI); genomic island integrases (IntG); site-specific recombinases involved in chromosome dimer resolution in archaea (XerA) and bacteria (XerC/D); yeast Flp-like flippases (FLP); phage P1-like recombinases (Cre); archaeal SSV-type, pNOB8-type and pTN3-type integrases; and integrases encoded by haloarchaeal BJ1-like and phiCh1-like viruses. The SNJ2 clade contains putative integrases encoded by proviruses that are integrated in various haloarchaeal genomes. The SNJ2 integrase is indicated with an asterisk. Numbers at the main branch points represent the Bayesian-like transformation of aLRT (aBayes) local support values. The scale bar represents the number of substitutions per site. Branches with support values below 70% were collapsed. The tree in which all nodes are labeled is shown in Supplementary Figure S2.
IntSNJ2 excises and integrates SNJ2 provirus into the host chromosome
To determine the biological function of the putative IntSNJ2 protein, we constructed a Natrinema sp. J7-1-F strain in which ORF1 (IntSNJ2) was deleted from the SNJ2 proviral region (Supplementary Table S1 and Figures S3A and B), referred to as J7-1-F Δorf1. To determine which SNJ2 provirus products affected excision, a series of J7-1-F Δorfx strains were constructed in which different regions of the integrated SNJ2 provirus were deleted (Supplementary Figure S3B and Table S1). The in-frame deleted regions spanned the entire proviral genome, except the hybrid attBOP and attP'OB' sites (B/B' and P/P' are the host and virus DNA flanking the 14 bp core site (O) respectively; Figure 2A). Deleted ORFs were not essential for SNJ2 excision if we were able to detect the expected circularized form of the provirus with a concomitant reconstitution of the attP'OP site (Supplementary Figure S3B). For essential ORFs, however, we would not detect the attP'OP site. The attP'OP site was not detected in culture supernatants of strains J7-1-F Δorf1 and J7-1-F Δorf1–3, indicating that Orf1 (IntSNJ2) was critical for SNJ2 excision (Figure 2A and Supplementary Figure S3B). Alternatively, the ORFs corresponding to proviruses SNJ2Δorf3, SNJ2Δorf2–3, SNJ2Δorf9–11, SNJ2Δorf4–25 and SNJ2Δorf20–25 were not necessary for SNJ2 excision (Figure 2A and Supplementary Figure S3B). To confirm these results, we complemented strains J7-1-F Δorf1 and J7-1-F Δorf1–3 with plasmid pYCJ-1416 (Supplementary Table S2) carrying IntSNJ2 and tested whether excision was restored upon MMC induction. As shown in Supplementary Figure S3C, we confirmed that IntSNJ2 was the only product required for SNJ2 provirus excision.
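The relationship between the hybrid sites of the integrated provirus (attBOP and attP'OB') and the products of excision (attBOB' on the chromosome and attP'OP on the circular viral genome) can be pictured with a toy string model. This is a conceptual sketch only and does not model the enzymatic mechanism. The 14-nt core used below is a made-up placeholder, not the real SNJ2 core sequence, and the flanking strings and the function name `excise` are likewise hypothetical.

```python
def excise(chromosome, core):
    """Recombine two direct repeats of `core` and return (chromosome, circle).

    The chromosome keeps a single core copy (B-O-B'); the excised circle,
    written linearly, is the proviral body followed by one core copy, so its
    closed junction reads P'-O-P.
    """
    first = chromosome.find(core)
    second = chromosome.find(core, first + len(core)) if first != -1 else -1
    if first == -1 or second == -1:
        raise ValueError("two direct repeats of the core site are required")
    empty_site = chromosome[:first + len(core)] + chromosome[second + len(core):]
    circle = chromosome[first + len(core):second + len(core)]
    return empty_site, circle


# Placeholder sequences (all hypothetical, chosen only for readability).
CORE = "ACGTACGTTGCATG"            # stand-in for the 14 bp core site O
B, B_PRIME = "GGGGGG", "CCCCCC"    # host DNA flanking the core
PROVIRUS_BODY = "ATATATATAT"       # viral DNA between the two core copies

integrated = B + CORE + PROVIRUS_BODY + CORE + B_PRIME
chromosome, circle = excise(integrated, CORE)
print(chromosome)   # B-O-B'
print(circle)       # proviral body + O
```

In this picture, detecting the attP'OP junction by PCR corresponds to amplifying across the point where the two ends of `circle` are joined, which exists only after excision.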
Figure 2.
Excision and integration of SNJ2 variants lacking one or more ORFs. (A) Schematic containing the different orfx deletions in the SNJ2 provirus genome and results of SNJ2Δorfx excision. SNJ2Δorfx excision from J7-1-F Δorfx cells was detected by PCR of the reconstituted attP'OP site using primer set b/c (short arrow). ‘Y’ or ‘N’ indicated that the attP'OP fragment could (Y) or could not (N) be detected from the corresponding knockout strains. (B) Detection of SNJ2Δorfx::pyrF virus particles in the supernatants of corresponding J7 cultures. The radA gene located in the J7 chromosome was used as a control to detect chromosomal DNA contamination. +/−, with/without correct PCR products. (C) Integration of the SNJ2Δorfx::pyrF viruses was detected by successful amplification of the integration site attBOP (292 bp) using primer pairs a/b when J7-3-F and J7-3-F/pYCJ-1416 were infected by the corresponding viruses. Lane M, Trans 2K PlusII marker. Wild-type SNJ2 and H2O were used in this system as positive and negative controls, respectively.
To test if IntSNJ2 was also necessary for provirus integration, we first generated defective SNJ2 viruses lacking ORF1 (encoding IntSNJ2). We introduced the orfx::pyrF (x = 1, 1–3) mutation into J7-1-F cells complemented with pYCJ-1416. Upon induction, IntSNJ2-mediated excision produced defective SNJ2 viruses marked with pyrF (60) (SNJ2Δorfx::pyrF, Supplementary Figure S4A). Control mutants (orfx::pyrF; x = 2–3, 20–24, 4–24) were also constructed in the J7-1-F strain to generate their corresponding SNJ2 variants. PCR amplification of pyrF and the circularized viral genome containing the attP'OP site confirmed virus particle production. PCR of the chromosomal radA gene confirmed that there was no contamination by chromosomal DNA. All marked and defective SNJ2 particles were induced and detected in the J7 culture except for SNJ2Δorf4–24::pyrF, which was not packaged because all genes encoding virus structural proteins were lost (Figure 2B and Supplementary Figure S4B).

Finally, to determine the role of IntSNJ2 in integration, SNJ2Δorf1::pyrF, SNJ2Δorf1–3::pyrF, SNJ2Δorf2–3::pyrF and SNJ2Δorf20–24::pyrF viruses were incubated with Natrinema sp. J7-3-F cells (Supplementary Table S1) lacking SNJ2 and containing an empty attBOB' site. The attBOP site would only be detected by PCR from the infected strains using the host-specific primer-a and the virus-specific primer-b (Figure 2A) if the SNJ2 variants could integrate into the host chromosome. Cells infected with SNJ2Δorf2–3::pyrF, SNJ2Δorf20–24::pyrF or wild-type SNJ2, but not SNJ2Δorf1::pyrF or SNJ2Δorf1–3::pyrF, yielded detectable amplification products (Figure 2C), suggesting that IntSNJ2 was critical for integration. Accordingly, SNJ2Δorf1::pyrF-infected cells were unstable when passaged onto Halo-2 rich culture medium, and could not be selected on Hv-Ca medium. In contrast, incubating J7-3-F/pYCJ-1416 (Supplementary Table S1) cells with SNJ2Δorf1::pyrF and SNJ2Δorf1–3::pyrF restored integration (Figure 2C), and allowed cells to be stably selected on Hv-Ca medium. These results indicate that SNJ2 lacking int could adsorb to and infect J7-3-F cells, but lost the ability to integrate into the host chromosome (Figure 2B and C). Taken together, these findings demonstrated that int (orf1) was required to maintain the SNJ2 genome in host cells and to site-specifically recombine with the host chromosome.
IntSNJ2 is co-transcribed with orf2 and orf3
Previous research indicated that int and the adjacent ORF2 and ORF3 are transcribed in the same direction, whereas the nearby ORF4 is transcribed in the opposite direction (Figure 2A) (36). These three genes overlap with one another by 8 (orf3-orf2) and 20 (orf2-orf1) bp, respectively, indicating that they likely form an operon and are co-transcribed from a promoter upstream of orf3. To test this hypothesis, we analyzed the transcription products of int and its adjacent ORFs by RT-PCR on cDNA prepared from J7-1 cells. Both int and orf4 were transcribed (Figure 3A), and co-transcription of int and orf2, orf2 and orf3, and int and orf3 was readily detected. However, no co-transcription was detected for orf3 and orf4, or for int and orf4 (Figure 3A, right panel). These results confirmed that int forms an operon with orf2 and orf3, and that the three genes are co-transcribed.
Figure 3.
Orf1, orf2 and orf3 form an operon with the transcription start site located upstream of orf3. (A) Co-transcription analysis of orf1 to orf2 (1034 bp), orf2 to orf3 (635 bp) and orf1 to orf3 (1474 bp). Orf4 was not co-transcribed with orfs1–3. Lanes: +, positive control with J7–1 genomic DNA as the template; −, negative control with total RNA as the template; RT, RT (Reverse Transcription)-PCR with cDNA as the template; M, DNA marker. (B) Promoter and transcription start sites of the operon formed by orf1-orf3. Two transcription start sites (+1) are bolded and indicated by a bent arrow. The ORF3 start codon ATG is highlighted in an open box. The predicted TATA boxes are shown in dark gray and the BRE (transcription factor B recognition element) CGAAA motif is underlined.
Next, to determine the transcription start and termination sites of the orf1–3 operon, the 5′- and 3′-ends of the transcripts were identified by sequencing the PCR products obtained from two consecutive nested PCR reactions using the circularized cDNA as the template (as illustrated in Supplementary Figure S5). All transcripts initiated at one of two sites, 132 or 114 bp upstream of the start codon of orf3 (ATGORF3) (Figure 3B). However, transcripts terminated at different positions downstream of int, yielding transcripts of variable length. These results suggested that this operon contained two transcription start sites. Indeed, two promoters were identified upstream of orf3, including one typical promoter containing the transcription factor B recognition element (BRE, -54 CGAAA -50) (52) upstream of the TATA box (-28 TTTTTATT -21), and another containing a putative purine-rich BRE element (61) (-75 GAAAGAA -69) and the TATA box (-45 TATATA -40) (Figure 3B). The existence of two initiation sites implied that expression of the IntSNJ2 operon is subject to regulation, similar to that of the phosphotransferase system in Haloferax mediterranei (62).
Orf2 and Orf3 have limited effects on SNJ2 excision but play important roles in virus integration
Genes within an operon often function in the same pathway. Thus, we tested the possibility that orf2 and orf3 act together with IntSNJ2 to integrate or excise the SNJ2 provirus (Figure 2). Products of these two ORFs were detected by tandem mass spectrometry (LC-ESI MS/MS) (36), indicating that both ORFs are translated into proteins. orf2 (accession no. AFO55991) encodes a 111-amino acid protein containing a strongly predicted coiled-coil domain in the N-terminal region, which could mediate homotypic or heterotypic protein–protein interactions. The 141-aa protein product of orf3 (accession no. AFO55990) contains a MarR-like winged helix-turn-helix DNA-binding motif. Notably, this motif is also found in a group of RDFs that control the recombination direction of complex site-specific recombination systems (63–65).
To investigate whether Orf2 and Orf3 affect site-specific recombination, we first attempted to establish an excision system in J7-3-F by replacing its attBOB' site with a fragment containing attBOP-pyrF-attB'OP'; however, this approach was unsuccessful. Instead, we used a chimeric plasmid, pUC-mev-oriN-MH (Supplementary Table S2), containing the replication origin of Natrinema sp. J7 (oriN, oriC10 of J7 with an autonomously replicating sequence) (48) and the attBOP-pyrF-attB'OP' fragment (Supplementary Figure S6A). In the presence of IntSNJ2 and accessory factors, recombination between attBOP and attB'OP' (Supplementary Figure S6A) on pUC-mev-oriN-MH would invert rather than excise the fragment and concomitantly form the new junction sites, attBOB' and attP'OP. The reoriented pyrF could complement the pyrF-defective strain, J7-3-F, allowing for selection on Hv-Ca plates (Supplementary Figure S6A). The recombination efficiency was then evaluated by counting the number of transformants obtained per μg of plasmid DNA. Recombinants could be further verified by PCR amplification of the recombinant attBOB' and attP'OP sites and by SacI restriction digestion, which yields a restriction pattern for the newly formed attBOB'-pyrF fragment that differs from the one obtained with the original plasmid (Supplementary Figure S6C). IntSNJ2 and its related factors were provided from a series of non-replicating plasmids, pSU19-Y, that carried the int operon with different gene deletions: the single-gene deletions Δorf1 (Y = 1048), Δorf2 (Y = 1936) and Δorf3 (Y = 1844), the double-gene deletion Δorf2-orf3 (Y = 1416) and the triple-gene deletion Δorf1–3 (Y = 367) (Figure 4A and Supplementary Table S2). Expression of IntSNJ2 alone from plasmid pSU19-1416 yielded a recombination efficiency of ∼80% of that obtained when IntSNJ2 was expressed together with Orf2 and Orf3 from pSU19-2017 (Figure 4A). Deletion of either orf2 (pSU19-1936) or orf3 (pSU19-1844) did not affect the recombination efficiency. However, deletion of IntSNJ2 or the operon promoter (pSU19-1048 or -367) completely abolished recombination, as shown by a lack of transformants (Figure 4A). These results indicated that IntSNJ2 mediated excision independently of Orf2 and Orf3, although Orf2 and Orf3 could partially promote this process.
Figure 4.
The effects of Orf2 and Orf3 on IntSNJ2-mediated recombination efficiency. (A) Scheme of truncated recombination elements and their corresponding excision efficiency. Excision efficiencies are shown as black bars normalized to the fragment of 2071 (whole orf1-orf3 operon) represented as averages and standard deviations. This assay was performed in triplicate for each construct. Significance testing was performed using a one-sample t-test (*P < 0.05). (B) Scheme of truncated recombination elements and their corresponding integration efficiency. The recombination efficiencies (average) of different plasmids are normalized to pCF-2207 (whole orf1-orf3 operon with attP'OP site), with the efficiency (average ± standard deviation) of integration indicated (n ≥ 3). Significance testing was performed using a one-sample t-test (***P < 0.001, **P < 0.01).
We also studied how these two proteins affected integration. A series of non-replicating plasmids (pCF-X marked by pyrF) carrying the attP'OP region and different int operon gene deletions (Supplementary Table S2) were constructed and transformed into J7-3-F cells (Figure 4B and Supplementary Figure S6B). Because J7-3-F is defective in pyrF, only transformants in which the plasmid had integrated into the host chromosome expressed pyrF and grew on Hv-Ca plates (Supplementary Figure S6B). Integrants were further verified by PCR amplification of the recombined attBOP and attP'OB' sites (Supplementary Figure S6D). The integration efficiency was evaluated by counting the number of transformants per μg of plasmid DNA. Transformation of the plasmid pCF-2207, containing the whole operon, yielded ∼1.23 × 10³ transformants per μg of DNA on selection plates (Figure 4B). As expected, transformation of plasmids lacking the int gene, pCF-1238 (Δint) or pCF-243 (Δorf1–3), did not produce transformants (Figure 4B). Interestingly, plasmid pCF-1638 (Δorf2–3) yielded fewer than ten transformants per μg of plasmid DNA, suggesting that Orf2 and Orf3 significantly enhanced integration. Deletion of either orf3 (pCF-1980) or orf2 (pCF-2072) decreased the integration efficiency to approximately 40% of the efficiency conferred by the whole operon (Figure 4B). These results indicated that Orf2 and Orf3 promoted IntSNJ2-mediated integration in a partially redundant manner. To confirm this, frame-shift mutations were introduced into orf2 and orf3 in the plasmid carrying the whole operon. These frame-shift mutations dramatically reduced recombination efficiency (Figure 4B), confirming that both Orf2 and Orf3 were critical to efficient integration.
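The normalization and significance test named in the Figure 4 legend can be illustrated with a short Python sketch. It is a hedged example rather than the authors' analysis: the replicate counts are placeholders (not data from this study), the helper names `relative_efficiency` and `one_sample_t` are hypothetical, and scaling every replicate by the mean of the whole-operon construct before testing against 1.0 is an assumption about how the normalization was done. Only the t statistic and degrees of freedom are computed; a P value would additionally require a t-distribution table or a statistics package.

```python
from math import sqrt
from statistics import mean, stdev

def relative_efficiency(counts_per_ug, reference_counts_per_ug):
    """Scale each replicate by the mean count of the reference construct."""
    reference = mean(reference_counts_per_ug)
    return [count / reference for count in counts_per_ug]

def one_sample_t(values, mu=1.0):
    """Return (t statistic, degrees of freedom) for H0: mean(values) == mu."""
    n = len(values)
    # stdev needs at least two values; the statistic is undefined if all are equal
    standard_error = stdev(values) / sqrt(n)
    return (mean(values) - mu) / standard_error, n - 1

# Placeholder replicate counts (transformants per microgram of plasmid DNA).
# These numbers are invented for illustration and are NOT results from the study.
whole_operon = [1000, 1000, 1000]
single_deletion = [400, 500, 600]

ratios = relative_efficiency(single_deletion, whole_operon)
t_stat, dof = one_sample_t(ratios)
print(f"mean relative efficiency = {mean(ratios):.2f}, t = {t_stat:.2f}, df = {dof}")
```

A one-sample test fits this design because the reference construct is fixed at 1.0 by the normalization, so only the test construct contributes replicate variance.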
SNJ2-type site-specific recombination elements are widespread in haloarchaea and are associated with various pleolipoproviral regions and other MGEs
The above results indicated that IntSNJ2-mediated site-specific recombination requires the virus-encoded accessory factors for integration but not for excision. This distinguished the SNJ2 system from the typical lambda-type tyrosine Int (66) and phiC31-type serine Int families (67,68), which both require RDFs for excision. IntSNJ2 was also distinct from the archaeal SSV-type integrases, which catalyze recombination in a simple and reversible fashion (30). To determine the distribution of these novel recombination systems, we searched the non-redundant protein database at NCBI for IntSNJ2 homologs by Position-Specific Iterated (PSI-) BLAST, using the IntSNJ2 amino acid sequence as a query. Over 500 integrases encoded on haloarchaeal chromosomes showed an identity higher than 30% with IntSNJ2 and a sequence coverage over 90%. From these hits, we selected 46 (denoted as No. 2 to 47) IntSNJ2-like recombinases with pairwise identities ranging from 34.6 to 88.8% (Supplementary Table S5). These integrases were found in chromosomes of haloarchaea belonging to 22 different genera and 5 families (Supplementary Table S5). Alignment of the corresponding amino acid sequences revealed that they were all tyrosine recombinases containing the RI…K…HIIXXRII…Y pentad, but with the Glu/AspI site replaced by Gly/AlaI (proportion: 15/31) and His/TrpIII replaced by Ala/ValIII (proportion: 35/11) (Supplementary Figure S7).
To identify features conserved in the SNJ2-like recombination systems, we attempted to delineate the exact borders of the MGEs encoding these integrases by searching for direct repeats that correspond to the att sites and flank the integrated MGEs. Putative att sites were predicted for all MGEs except for elements residing in the incompletely sequenced genomes of H. thermotolerans DSM 11522 (No. 20) and Natrialba chahannaoensis JCM 10990 (No. 44 and 45) (Supplementary Table S5). In almost all cases, one border of the MGE overlapped with a tRNA gene, as in the IntSNJ2 recombination system. The three exceptions were the att sites of the MGEs of Halomicrobium mukohataei DSM 12286 (No. 24), Halopenitus sp. DYS4 (No. 33) and Natronococcus jeotgali DSM 18795 (No. 47), which were all located in intergenic regions (Supplementary Table S5).
Phylogenetic analysis of these IntSNJ2-like integrases revealed that they could be further subdivided into eight subgroups (Supplementary Table S5). In each of the subgroups, integrases and the corresponding att sites were coupled. Specifically, the sequences of the att sites in each subgroup tended to be identical and to use the same tRNA target gene, with the notable exception of subgroup VIII (Figure 5 and Supplementary Table S5). In this subgroup, all members targeted tRNA-Pro except for the MGE of Halopiger xanaduensis SH-6 (No. 34, accession no. AEH38855), which instead used tRNA-Trp. Furthermore, the distances between the 3′ end of the int genes and the 5′ end of the att sites were often similar or identical within each subgroup (Figure 5). For instance, in subgroup I, all four att sites were identical (14 bp long) and located 48 bp downstream of the 3′ end of the int genes (Figure 5). The uniform arrangement of the recombination elements in each subgroup suggested that they originated from a common ancestor and spread to different chromosomes by HGT.
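The search for direct repeats flanking an integrated element can be approximated with a simple seed-and-extend scan. The sketch below is not the procedure used in this study (the tool is not named in this section); it is a minimal illustration that assumes exact, non-overlapping repeats on the same strand, and the function name `find_direct_repeats`, its parameters and the toy sequence are all hypothetical.

```python
def find_direct_repeats(seq, min_len=10, min_gap=1):
    """Return sorted (start1, start2, repeat) tuples for exact direct repeats.

    Every min_len-mer is indexed, then each pair of occurrences is extended to
    the right. Pairs that could be extended to the left are skipped, because
    the same repeat is already reported from its leftmost seed.
    """
    seeds = {}
    for i in range(len(seq) - min_len + 1):
        seeds.setdefault(seq[i:i + min_len], []).append(i)

    repeats = set()
    for positions in seeds.values():
        for x, a in enumerate(positions):
            for b in positions[x + 1:]:
                if a > 0 and seq[a - 1] == seq[b - 1]:
                    continue  # not the leftmost seed of this repeat
                length = min_len
                # extend while both copies agree and the first stays left of the second
                while (b + length < len(seq) and a + length < b
                       and seq[a + length] == seq[b + length]):
                    length += 1
                if b - (a + length) >= min_gap:
                    repeats.add((a, b, seq[a:a + length]))
    return sorted(repeats)

# Toy "chromosome": an element flanked by two copies of a placeholder core.
CORE = "GATCGTAC"  # stand-in for an att core, not a real sequence
toy = "AAAAA" + CORE + "GGGGCCCC" + CORE + "TTTTT"
hits = find_direct_repeats(toy, min_len=8)
print(hits)
```

On real genomes, candidate repeats would still need to be filtered by context, for example by requiring that one copy overlaps the 3′ end of a tRNA gene and that an int gene lies close to the other copy.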
Figure 5.
Relatedness of putative IntSNJ2-type integrases and their corresponding conserved core target sequences. The phylogenetic tree of 39 putative haloarchaeal tyrosine integrases was inferred using MEGA7. Numbers at nodes represent percentages of bootstrap support based on a neighbor-joining analysis of 1000 resampled datasets. GenBank identifiers of protein sequences used to generate this phylogenetic tree are presented following the ordinal numbers (1–39, Supplementary Table S5). The distances (bp) between the 3′ end of int genes and the 5′ end of att core sites are shown in the middle panel. The conserved att core sequences targeted by integrases and overlapping with the same tRNA genes belonging to eight groups are shown.
Because the integration reaction of the SNJ2 recombination system was highly dependent on Orf2 and Orf3, we searched in the proximity of the int genes carried by other haloarchaeal MGEs for potential homologs of these proteins, namely proteins containing a coiled-coil structure or a DNA-binding domain. In the MGEs, we identified 11 proteins containing coiled-coil domains (ORFs indicated with bright green arrows, Figure 6) and 25 proteins with DNA-binding domains (ORFs indicated in dark green, Figure 6). However, further analysis revealed that only five MGEs, No. 4, 20, 29, 30 and 38, had SNJ2 Orf2 and Orf3 homologs. In 14 MGEs, no candidates for either domain-containing protein were identified, presumably because they had defective recombination systems or had evolved to function without accessory factors.
Figure 6.
IntSNJ2-type integrases are associated with various MGEs. Genes belonging to the same functional categories are indicated by the use of the same colors (top left legend) in the genome comparison between SNJ2 (1) and the other MGEs (2–39, ordinal numbers consistent with Figure 5). The level of identity between different homologs is indicated in shades of gray (top left legend). The proviral regions encoding the typical IntSNJ2-type recombination elements are numbered in red, including integrases (green arrow) and Orf2SNJ2-like (bright green arrow)/Orf3SNJ2-like (dark green solid arrow) proteins. Newly predicted putative ORFs are marked by an asterisk.
Analysis of the 38 MGEs containing SNJ2-like recombination systems revealed that 33 were pleolipoproviruses (or remnants thereof), with 4 belonging to the genus Alphapleolipovirus and 29 to Betapleolipovirus (Figure 6). The remaining five were MGEs in which the majority of ORFs encode functionally uncharacterized proteins with no known counterparts (Figure 6). As expected, all putative pleolipoprovirus regions encoded conserved core genes (VP3-like membrane protein, VP4-like spike protein and VP8-like ATPase, shown with purple arrows) (35,69). Moreover, 11 of these putative pleolipoproviral regions encoded an SNJ2 Orf8-like protein (36), which is homologous to archaeal Orc1/Cdc6 proteins (origin recognition complex 1/cell division control protein 6) (70). This suggested that these proviral regions may be excised from the host to form circular genomes and replicate using host machinery. Notably, two elements, No. 2 (assembled, Supplementary Figure S8) and 19, lacked all the core pleolipoviral genes encoding proteins involved in virion formation and, as a result, were reduced to gene cassettes encoding only the putative replication-associated protein and the integration/excision module (Figure 6).
SNJ2-type site-specific recombinases from three haloarchaea are functional
We demonstrated that IntSNJ2-type site-specific recombination systems are widespread in haloarchaea and are associated with various MGEs. To determine if these recombinases were functional, we further analyzed SNJ2-type integrases from subgroup I (Figure 5). These integrases were encoded within haloarchaea belonging to different genera of the order Natrialbales and showed high sequence identity (≥81.4%) to IntSNJ2 (Supplementary Table S5). Like SNJ2, their associated MGEs were excised from the host chromosome as closed circular molecules, subsequently reforming the native tRNA sequences (Figure 7A). Nucleotide sequence alignments of the putative DNA recombination junctions (attBOP, attP'OB', attP'OP, attBOB') from Natrinema sp. J7-1, H. thermotolerans DSM 11522, N. bangense JCM 10635, N. versiforme JCM 10478 and Halobiforma haloterrestris DSM 13078 (indicated as No. 1, 2, 3, 4 and 5) revealed not only the conserved common core att site (O), but also highly conserved flanking sequences (B arm on the chromosome and P/P' arms on MGEs) (Figure 7A). Additionally, all five haloarchaea harbored an identical 23 bp continuous sequence in the B arm. The B arms of JCM 10635 (No. 3) and DSM 13078 (No. 5) were more similar to that of J7-1 (No. 1) than to those of DSM 11522 (No. 2) and JCM 10478 (No. 4), as they shared more consensus nucleotide positions and longer tRNA genes. Thus, we chose three representative haloarchaea (No. 2, 3 and 4) to test whether their MGEs were active. PCR and sequencing data obtained from the amplicons of the circularized attP'OP sites (Supplementary Figure S9A) and reformed tRNA genes confirmed that the precise MGE termini and the putative attachment sites were involved in recombination, indicating that the proviral regions were active (Figure 7A).
Figure 7.
IntSNJ2-type recombination elements mediated excision and integration in Natrinema sp. J7. (A) Diagram of MGE excision in J7-1 (1), Haloterrigena thermotolerans DSM 11522 (2), Natronorubrum bangense JCM 10635 (3), Natrinema versiforme JCM 10478 (4) and Halobiforma haloterrestris DSM 13078 (5). The att core (O) site (sequences shaded in gray) overlapped with the 3′ end of the tRNA gene (underlined sequences) of the corresponding organism. The archaeal chromosome and proviral (MGEs) DNA flanking the core sites are designated as B/B' (green line) and P/P' (orange line) arms, respectively. Identical nucleotides are bolded (uppercase) in the alignments of the sites (B, B', O, P, P') from five different haloarchaea. Consensus amino acid sequences of Int (C-terminal) are shown below the nucleotide sequences. (B) Cross integration reactivity of three IntSNJ2-type recombination elements (origin from strains 2, 3, 4 in [A]). IntSNJ2-type elements are shown as operons with putative promoter (predicted by the promoter predictor NNPP version 2.2). ‘Y’ or ‘N’ indicated the fragments with different operon length that could (Y) or could not (N) integrate into J7–3-F. (C) Cross excision reactivity of IntSNJ2-type recombination elements in the excision defective strains J7–1-F Δorf1 and J7–1-F Δorf1–3. Excision was confirmed by PCR amplification of the fragments W (2486, 1642, 2448) in defective strains, which were complemented by plasmids pFJ6-W (lane , circularized attPOP' (lane , 363 bp for SNJ2Δorf1; 1355 bp for SNJ2Δorf1–3) and restored chromosome junction attBOB' (lane , 387 bp).
To test the ability of the three MGEs to site-specifically integrate into the SNJ2-free strain J7-3-F (ΔpyrF), which carries only the attBOB' site, we amplified fragments containing the int-attP'OP cassette (including the native predicted promoter, integrase gene and att site; Figure 7B) of each of the three proviral regions and ligated them into the non-replicating plasmid pCF, which carries pyrF. Following plasmid transformation into J7-3-F (ΔpyrF) cells, only transformants in which a plasmid carrying an int-attP'OP operon had integrated into the chromosome would grow on Hv-Ca plates (as illustrated in Supplementary Figure S6B). As expected, integrants were obtained from cells transformed with plasmids carrying any one of the three tested int-attP'OP operons, but not with plasmids lacking the int gene or the attP'OP site (Figure 7B). Furthermore, fragments corresponding to the attBOP and attP'OB' sites were amplified from cultures of these integrants (Supplementary Figure S9B), suggesting that all three recombinases mediated site-specific integration (Figure 7B).
Together, these results showed that the three tested IntSNJ2-type recombination systems catalyzed cross-reactive integration in cells of the SNJ2 host J7-3-F, suggesting that they may also be able to substitute for IntSNJ2 during excision. To test this, we used the strains J7-1-F Δorf1–3 and J7-1-F Δorf1, which carry intact att sites but are defective for SNJ2 excision (as illustrated in Figure 2A). Plasmids pFJ6-W (W = 2486, 1642 and 2448; Figure 7B and Supplementary Table S2) containing the integrase genes, the putative SNJ2 orf2- or orf3-like accessory protein-encoding genes, and the predicted promoters of the three proviral regions were constructed and transformed into the IntSNJ2-defective host cells. The transformed cultures of J7-1-F Δorf1–3 and J7-1-F Δorf1 were induced by MMC, and cell-free supernatants were analyzed for the presence of virus (SNJ2Δorf1–3 and SNJ2Δorf1). PCR revealed attP'OP site amplification from culture supernatants (Figure 7C), indicating that all three integrases recognized the att site on the J7 chromosome and excised the defective SNJ2 provirus. Collectively, these results suggest that SNJ2-type recombinases from subgroup I mediate both integration and excision in J7 cells.
DISCUSSION
IntSNJ2 belongs to a novel family of tyrosine recombinases
In this study, we identified over 500 SNJ2-type integrases (including IntSNJ2) encoded within haloarchaeal genomes. Comparison of these integrases with tyrosine recombinases from other known families revealed that SNJ2-type integrases belong to a distinct family. Three main features distinguish SNJ2-type integrases from the known hyperthermophilic archaeal SSV-type and pNOB8-type integrases. First, IntSNJ2-type integrases have a catalytic tetrad of RI…HIIXXRII…Y that is conserved in tyrosine recombinases from Bacteria and Eukarya (Table 1), while SSV- and pNOB8-type integrases usually contain the RI…KIIXXRII…Y and RI…YIIXXRII…Y tetrad sequences, respectively. Second, in IntSNJ2-like integrases, the two conserved sequence motifs Glu/AspI and His/TrpIII, which play important structural roles and stabilize the transition state of the integrase–DNA complex, are replaced with Gly/Ala and Ala/Val, respectively (Supplementary Figure S7). Further mutagenesis of these conserved residues in IntSNJ2 confirmed their essentiality for recombination activity (Supplementary Table S6). Third, IntSNJ2-type recombinases are deeply rooted within the major clade of archaeal tyrosine recombinases. Specifically, SNJ2-type integrases diverged from the remaining archaeal homologs early in evolution to catalyze recombination in the intracellular environment specific to halophilic archaea (Figure 1).
SNJ2-type integrases could be classified into eight subgroups based on the similarity of their amino acid sequences. Interestingly, the members of each subgroup had similar or sometimes identical att core sequences (i.e. the same tRNA genes used as insertion targets) and similar distances between the 3′ end of the int genes and the 5′ end of the att core sequences (Figure 5). However, the integrated MGEs in each Int subgroup did not necessarily belong to the same MGE family. For instance, subgroups I and V included both alphapleolipoviruses and betapleolipoviruses, whereas the novel uncharacterized MGEs, despite their relatively close genomic relationship (Figure 6), were distributed across subgroups IV, VI, VII and VIII, and were accordingly integrated into diverse tRNA genes (Figure 5). This indicated that the integrase genes, auxiliary factors and cognate att site form a recombination module that is horizontally exchanged between diverse haloarchaeal MGEs. Furthermore, phylogenetic analysis indicated that integrases do not cluster according to cellular taxonomy, but rather according to the specificity of the target att site, indicating that the evolution of the integrase and its target site is coupled. A similar observation has been made for SSV-type integrases encoded by hyperthermophilic spindle-shaped viruses (71), suggesting that this may be a general feature of archaeal site-specific recombination systems. Further investigation of the relationship between the structure of integrases and their recognized sites may reveal the underlying basis of this phenomenon (72,73).
The role of IntSNJ2 in the maintenance of SNJ2 virus in host cells
None of the previously isolated pleolipoviruses is lytic; instead, they continuously release virions without rupturing the cell envelope, which only slightly retards host cell growth. Such viruses are produced by infected cells even after several culture passages (74,75). Similar to other members of Pleolipoviridae, SNJ2 is non-lytic, although it was difficult to measure host cell growth during viral release because SNJ2 is co-produced with another virus, the sphaerolipovirus SNJ1, which is lytic upon induction (36). Notably, even though pleolipovirus-derived proviruses are found in numerous haloarchaeal genomes (36,74,76), SNJ2 is to date the only isolated temperate pleolipovirus. In this study, IntSNJ2 was critical not only for the excision and integration of the SNJ2 genome, but also for its maintenance in host cells. As expected, the integrase-deficient virus, SNJ2Δint::pyrF, still infected J7-3-F host cells, but had lost the ability to integrate into the host chromosome. Unexpectedly, the integrase-deficient virus could not persist in infected cells, as revealed by the lack of colonies when infected cells (J7-3-F SNJ2Δint::pyrF) were grown on selective Hv-Ca plates. This contrasts with other studied pleolipoviruses, which remain in host cells even without integration into the host genome (74,75). It is possible that IntSNJ2 evolved to function in virus replication and/or viral genome partitioning during cell division. Indeed, such roles have been previously reported for tyrosine recombinases encoded by the coliphage P1 and the two-micron plasmid of Saccharomyces cerevisiae, which are maintained in host cells due to the presence of the Cre/lox and Flp/frt site-specific recombination systems, respectively (14,77,78). Thus, while most non-integrating betapleolipoviruses and alphapleolipoviruses persist in the carrier state for several host passages, SNJ2 relies on IntSNJ2 not only for integration/excision, but possibly also for episomal viral genome partitioning into daughter cells.
The roles of Orf2SNJ2 and Orf3SNJ2 during IntSNJ2-mediated recombination
Although IntSNJ2 alone catalyzed SNJ2 provirus excision in J7 cells, Orf2SNJ2 and Orf3SNJ2 significantly increased integration efficiency (Figure 4). Orf2SNJ2 and Orf3SNJ2 formed an operon with IntSNJ2, but not all identified IntSNJ2-like recombination systems harbored the two integration factors. This suggested that IntSNJ2 was distinct both from the simple bidirectional recombinases (e.g. Cre, Flp and SSV-type integrases), which catalyze recombination alone in a simple and reversible fashion, and from the complex unidirectional recombinases (e.g. λ, HP1, P2), which need accessory proteins (e.g. IHF and Xis or Cox) to catalyze excision (18,20,79). Orf2SNJ2 contains a coiled-coil domain and might belong to the Prefoldin superfamily of molecular chaperones that modulate protein folding and stabilization, suggesting that Orf2 may interact with IntSNJ2 to promote its folding or stabilize certain intermediate conformations during recombination. Orf3SNJ2 contains the MarR-like winged helix-turn-helix domain and is likely to act as a DNA-binding protein that interacts with the conserved flanking sequences (B/B' and P'/P arms in Figure 7A) of the core att site (O) during integration. We hypothesize that Orf2SNJ2 and Orf3SNJ2 act as molecular companions of IntSNJ2 to influence the structure of the IntSNJ2–DNA complex and increase recombination efficiency, thereby regulating the balance between the integration and excision activities of IntSNJ2 in vivo.
IntSNJ2-type site-specific recombination systems promote fluidity of the haloarchaeal genomes
Various MGEs, including viruses, plasmids, and transposons, play key roles in the evolution of their hosts by promoting genome plasticity and HGT through site-specific and homologous recombination (59,80,81). We showed that the SNJ2-type site-specific recombination system is widespread and is associated not only with various pleolipoproviral regions but also with novel MGEs of unknown provenance (Supplementary Table S5). Notably, the SNJ2-like recombination module provided the only connection between these novel MGEs and pleolipoviruses (Figure 6). Such loose connectivity between different families of MGEs is consistent with the previous observation that the global archaeal virus network is sparsely interconnected through a small number of connector genes that function at the MGE–host interface (82). Integration of MGEs mediated by IntSNJ2-type integrases promotes gene flow into the host chromosomes. For instance, H. mukohataei DSM 12286 and Haloarcula marismortui ATCC 43049 harbor five (No. 10, 15, 22, 24 and 36) and three (No. 7, 28 and 37) MGEs (Figure 6), respectively. Moreover, integrases encoded by proviruses residing within one haloarchaeal genus can catalyze recombination in haloarchaea belonging to at least two other genera, suggesting that IntSNJ2-type recombination systems can promote inter-generic gene flow. Such inter-generic gene flow was previously observed in haloarchaea residing in an isolated Antarctic lake (83). Finally, recombination plays a key role in the evolution of MGEs and even drives transitions between different MGE classes (82,84–86). In this context, the two MGEs evidently derived from betapleolipoviruses through loss of the virion morphogenetic module (No. 2 and 19 in Figure 6) are particularly interesting.
Likely, these two elements propagate as integrative plasmids devoid of the extracellular stage typical of bona fide viruses, and epitomize the dynamics of evolutionary transition between viruses and plasmids.
In summary, a novel archaeal tyrosine recombinase encoded by the SNJ2 virus mediates site-specific recombination that differs substantially from the previously characterized simple or complex recombination systems. IntSNJ2-like integrases are widespread in genomes of halophilic archaea, where they are associated with various pleolipoproviruses and novel MGEs of unknown provenance. Future studies involving the recombination systems of haloarchaeal MGEs are necessary to further classify genome evolution and gene flow in haloarchaea, and to discover new viruses hidden in the abundant ‘dark matter’ (34) islands in archaeal genomes.