Comparative genomic analysis of mycobacteriophage Tweety: evolutionary insights and construction of compatible site-specific integration vectors for mycobacteria.

Thuy T Pham, Deborah Jacobs-Sera, Marisa L Pedulla, Roger W Hendrix, Graham F Hatfull.

Abstract

Mycobacteriophage Tweety is a newly isolated phage of Mycobacterium smegmatis. It has a viral morphology with an isometric head and a long flexible tail, and forms turbid plaques from which stable lysogens can be isolated. The Tweety genome is 58 692 bp in length, contains 109 protein-coding genes, and shows significant but interrupted nucleotide sequence similarity with the previously described mycobacteriophages Llij, PMC and Che8. However, overall the genome possesses mosaic architecture, with gene products being related to other mycobacteriophages such as Che9d, Omega and Corndog. A gene encoding an integrase of the tyrosine-recombinase family is located close to the centre of the genome, and a putative attP site has been identified within a short intergenic region immediately upstream of int. This Tweety attP-int cassette was used to construct a new set of integration-proficient plasmid vectors that efficiently transform both fast- and slow-growing mycobacteria through plasmid integration at a chromosomal locus containing a tRNA(Lys) gene. These vectors are maintained well in the absence of selection and are completely compatible with integration vectors derived from mycobacteriophage L5, enabling the simple construction of complex recombinants with genes integrated simultaneously at different chromosomal positions.

Year:  2007        PMID: 17660435      PMCID: PMC2884959          DOI: 10.1099/mic.0.2007/008904-0

Source DB:  PubMed          Journal:  Microbiology (Reading)        ISSN: 1350-0872            Impact factor:   2.777


INTRODUCTION

The isolation and comparative genomic analysis of 30 mycobacteriophages (viruses that infect mycobacterial hosts) reveals them to have high genetic diversity and to typically contain genomes that are genetic mosaics with modules shared with other phage genomes (Hatfull ; Pedulla ). Nucleotide sequence comparison of these genomes shows that while they are overall highly diverse, there are several smaller clusters within the group with genomes that are more similar to each other than to other mycobacteriophage genomes. Similar clusters were revealed through a metaproteomic analysis in which all 3350 putative protein products were organized into related ‘phamilies’ and the genomes compared according to the presence or absence of phamily members (Hatfull ). The availability of groups of closely related phages in the context of a larger, more diverse, group significantly enhances the power of comparative genomic analyses. In particular, comparisons among closely related sequences have the potential to reveal the nature of individual mutational steps of phage evolution, unconfounded by multiple overlapping events. Characterization of mycobacteriophage genomes not only provides insights into viral diversity and evolution but also offers a large, diverse and complex toolbox from which a variety of applications useful for mycobacterial genetics can be derived. A recent example is the identification of mycobacteriophage genes encoding recombination functions related to RecE and RecT which, while rare among mycobacteriophages, are both found in phage Che9c (van Kessel & Hatfull, 2007). These have been utilized to develop a recombineering system to facilitate the construction of gene-replacement mutants by allelic exchange in both Mycobacterium smegmatis and Mycobacterium tuberculosis (van Kessel & Hatfull, 2007). 
Other examples include the use of phage immunity loci as genetically selectable markers (Donnelly-Wu ; Jain & Hatfull, 2000), regulated gene expression systems (Brown ), and exploitation of phage integration systems (Lee ). The construction of integration-proficient plasmid vectors based on the integration system of mycobacteriophage L5 enables the simple insertion of genes into the chromosomes of both fast- and slow-growing mycobacteria (Lee ; Stover ) and similar vectors based on other phages such as Ms6 have been described previously (Freitas-Vieira ). Provided that the phage-encoded recombination directionality factor (RDF) (Lewis & Hatfull, 2001) is not present in these vectors then the integrated DNAs are more stably maintained in the absence of selection than extrachromosomal plasmid vectors; however, excise-independent integrase-mediated excisive recombination can lead to plasmid loss and accumulation of excised derivatives, especially when the integrated sequences express products deleterious to growth of the recombinant (Springer ). These events can be avoided by using transient expression of integrase to construct recombinants such that the int gene is not present in the stably transformed strains (Hatfull, 2004; Peña ). Introduction of the L5 excise (gene 36) into integrated recombinants leads to efficient integrase-mediated excision (Lewis & Hatfull, 2000) and this has been exploited to determine whether genes are essential for mycobacterial growth (Parish ) and to switch integrated plasmid copies (Pashley & Parish, 2003). A primary benefit of these integration-proficient vector systems is that they enable the construction of single-copy recombinants that avoid the phenotypic effects of multicopy recombinants, including phage and drug resistance (Banerjee ; Barsom & Hatfull, 1996). 
However, there are often genetic applications that require the insertion of more than one element into the chromosome and methods have been described to piggy-back multiple insertions using L5 integration vectors (Saviola & Bishai, 2004), to manipulate Ms6 vectors to confer integration at different chromosomal loci (Vultos ), or to use alternative integration systems such as those derived from serine-integrases φRv1 and Bxb1 (Bibb ; Kim ). These approaches have significant limitations, including reduced frequency, limited strain utilization, or, in the case of the serine-integrases, interruption of chromosomal genes (Kim ; Ojha ). There is thus a need for additional integration-proficient vectors that are fully compatible with other vector systems. In this paper we describe the isolation and genomic characterization of mycobacteriophage Tweety and the development of integration-proficient plasmids carrying the Tweety attP–int region that efficiently transform both fast- and slow-growing mycobacteria. These Tweety-derived vectors integrate at a tRNA(Lys) gene as distinct from the tRNA(Gly) chromosomal locus used by L5-derived vectors and are fully compatible, such that co-transformants with both integrating vector systems can be recovered from a single electroporation. M. smegmatis recombinants derived by Tweety-mediated integration are more stably maintained than recombinants derived using L5 integration-proficient vectors in the presence of their cognate integrases and should prove to be useful additions to the arsenal of tools available for mycobacterial genetic manipulation.

METHODS

Bacterial strains.

Mycobacterium smegmatis mc2155 and Mycobacterium bovis bacille Calmette–Guérin (BCG) have been described previously (Jacobs ; Snapper ); electrocompetent cells were prepared as described previously (Bibb & Hatfull, 2002) and transformed using 0.05–1 μg DNA. Media were supplemented with carbenicillin (50 μg ml−1), cycloheximide (10 μg ml−1), kanamycin (20 μg ml−1), hygromycin (50 μg ml−1) or gentamicin (10 μg ml−1) as required.

Phage isolation and genome sequencing.

Tweety was isolated from a moist soil sample from a lawn in the Oakland district near the University of Pittsburgh (PA, USA). Tweety was plaque purified and sequenced using a shotgun approach as described previously (Pedulla ; Sarkis & Hatfull, 1998). The GenBank accession number is EF536069.

Plasmids and DNA.

Plasmid pMH94 is an L5 integration-proficient vector that has been described previously (Lee ). Plasmids pJV39 and pJV44 were kind gifts from Julia van Kessel, University of Pittsburgh. Plasmid pJV39 is similar to pMH94 but confers hygromycin resistance (HygR) instead of kanamycin resistance (KanR). Plasmids pTTP1A and pTTP1B were constructed as follows. Two primers with XhoI restriction sites were designed and used to amplify the attP and int region from Tweety genomic DNA. This 1.7 kb fragment was inserted by blunt-end cloning into vector pMOSBlue. A clone containing Tweety attP and int was identified and digested with XhoI, and the fragment was subcloned into SalI-digested pMH94. Both pTTP1A and pTTP1B contain the Tweety attP and int, oriE, and kanamycin- and ampicillin-resistance genes. Plasmids pTTP1A and pTTP1B differ in regard to the orientation of the attP–int region with respect to the plasmid backbone. DNA manipulations and agarose gel electrophoresis were as described by Sambrook .

PCR assays.

Site-specific integration between the homologous sequences of Tweety attP and M. smegmatis attB was confirmed using PCR assays. Transformants were prepared for colony PCR by suspending in 200 μl H2O, vortexing 20 times, and heating at 95 °C for 5 min. Approximately 5 μl of the colony mix was used along with Pfu polymerase (Stratagene), dNTPs (10 mM) and 5 % (v/v) DMSO. The four primers used to amplify the attL and attR regions of the recombinant chromosome were TTP1a (5′-CAGTCACGACGTTGTAAAACGACGG-3′), TTP1b (5′-GTCACCGAAAGGCGTGCCCTTGTC-3′), TTP1d (5′-GACCGCTTCAAGAGCGAGCAGTAC-3′) and TTP1e (5′-TCCCGTTGAATATGGCTCATAACACCC-3′). PCR products were analysed by gel electrophoresis.

Plasmid stability.

Transformants of M. smegmatis derived from pTTP1A and pTTP1B were inoculated into Middlebrook 7H9 medium containing ADC [albumin (5 g l−1), dextrose (2 g l−1), NaCl (0.85 g l−1)], Tween 80 (0.05 %) and kanamycin (20 μg ml−1) and grown to saturation. Cultures were diluted 1 : 10 000 into antibiotic-free medium and allowed to grow back to saturation (Lee ). The cultures were repeatedly diluted and grown for a total of approximately 35 generations. Cell samples were then plated for single colonies on solid 7H10/ADC medium in the presence and absence of kanamycin to determine the proportion of antibiotic-resistant colonies.

Electron microscopy.

A suspension of CsCl-purified virions was applied to a sample grid with a carbon-coated nitrocellulose film, stained with 2 % uranyl acetate, and examined in a FEI Morgagni 268 transmission electron microscope equipped with an AMT digital camera system.

RESULTS

Isolation and genomic sequencing of mycobacteriophage Tweety

Mycobacteriophage Tweety was isolated from a soil sample from the Oakland district of Pittsburgh, and was identified (without amplification) as a p.f.u. on a lawn of M. smegmatis mc2155. Following plaque purification, high-titre stocks were prepared and Tweety virions were examined by electron microscopy. Tweety particles have a morphology typical of the Siphoviridae, with an isometric head approximately 60 nm in diameter and a flexible tail approximately 175 nm long (Fig. 1); this morphology is the most common one found among mycobacteriophages (Hatfull ). A small, possibly double-layered, baseplate structure is visible at the tip of the tail, with some apparently flexible fibrous structures extending beyond it. Tweety forms lightly turbid plaques on lawns of M. smegmatis, a plaque morphology that is extremely common among mycobacteriophages.
Fig. 1.

Virion morphology of Tweety. Electron micrograph of mycobacteriophage Tweety particles. Scale bar, 100 nm.

Preliminary restriction analysis of Tweety dsDNA from virions indicated that it is distinct from all previously characterized mycobacteriophages, and its complete genomic sequence was determined using a shotgun strategy. Tweety virion DNA is 58 692 bp in length; it has unique ends with 10 base, single-stranded cohesive 3′-extensions (left end; 3′-GCCTTCCGCG). The Tweety genome is 61.7 mol% G+C, similar to other mycobacteriophage and mycobacterial genomes. Nucleotide sequence comparison with the 30 previously sequenced mycobacteriophage genomes revealed significant sequence similarity with mycobacteriophages Che8, Llij, PMC and, to a lesser degree, Che9d (Fig. 2). The extent of sequence similarity between Tweety and Che8, Llij and PMC appears to be highest in the leftmost parts of these genomes, while being weaker and discontinuous towards the rightmost parts.
Fig. 2.

Nucleotide sequence comparisons of the Tweety, Che8, Che9d, Llij and PMC genomes. The extent of DNA sequence similarity among these mycobacteriophage genomes is illustrated in a Dotter plot using a sliding window of 25 bp (Sonnhammer & Durbin, 1995).

Organization of the Tweety genome

Analysis of the Tweety genome reveals 109 potential ORFs (Table 1), all except eight of which are transcribed in the rightwards direction (Fig. 3). The overall genome organization shares similarities with other mycobacteriophages such as PMC, Llij and Che8 and differs from phages such as L5, D29 and their near relatives, in which genes in the right half of the genome are transcribed leftwards (Ford ; Hatfull & Sarkis, 1993; Hatfull ; Pedulla ). Most of the Tweety genome is utilized as protein-coding regions, although there are small non-coding regions between gene 109 and the right terminus, and between genes 42 and 43, and genes 44 and 45. We have not identified any tRNA, transfer-messenger RNA (tmRNA) or other small RNA genes. An integrase gene (43) of the tyrosine recombinase family lies close to the centre of the genome, and the left arm (genes 1–42) is very similar in organization and sequence to the corresponding parts of the Che8, Llij and PMC genomes (Fig. 3), with the main differences at the right end of the left arm. A putative stem–loop terminator for rightwards transcription is positioned at coordinates 33 784–33 827 immediately following the integrase gene.
Table 1.

Coordinates and putative functions of Tweety genes

Gene | F/R* | Start position | Stop position | Mol. mass (kDa) | Nearest homologue (% identity)† | Function
1 | F | 109 | 564 | 17.2 | PMC gp1 (100 %) |
2 | F | 587 | 2224 | 61.3 | Llij gp2 (94 %) | Terminase
3 | F | 2255 | 3625 | 51.2 | Llij gp3 (99 %) | Portal
4 | F | 3612 | 4367 | 27.6 | Che8 gp4 (92 %) |
5 | F | 4393 | 5043 | 24.6 | Che8 gp5 (97 %) |
6 | F | 5062 | 5883 | 29.2 | Llij gp6 (95 %) | Major capsid subunit
7 | F | 5883 | 6449 | 19.8 | Che8 gp7 (61 %) |
8 | F | 6446 | 6778 | 11.7 | Che8 gp8 (77 %) |
9 | F | 6781 | 7110 | 11.9 | Llij gp9 (89 %) |
10 | F | 7097 | 7504 | 14.8 | Llij gp10 (66 %) |
11 | F | 7633 | 8442 | 30.1 | Llij gp11 (98 %) | Major tail subunit
12 | F | 8579 | 9130 | 20.6 | Llij gp12 (85 %) |
13 | F | 9150 | 9515 | 14.0 | Llij gp13 (96 %) |
14 | F | 9534 | 13 064 | 119.8 | Che8 gp14 (44 %) | Tapemeasure protein
15 | F | 13 065 | 14 774 | 64.1 | Llij gp15 (95 %) |
16 | F | 14 860 | 16 569 | 64.2 | Che8 gp16 (98 %) |
17 | F | 16 606 | 17 454 | 29.1 | Llij gp17 (97 %) |
18 | F | 17 451 | 19 988 | 88.7 | Llij gp18 (81 %) |
19 | F | 19 985 | 21 871 | 67.1 | Llij gp19 (89 %) | Carboxypeptidase
20 | F | 21 868 | 22 116 | 9.0 | Llij gp20 (100 %) |
21 | F | 22 143 | 23 033 | 30.4 | Llij gp20 (81 %) |
22 | F | 23 051 | 23 392 | 11.2 | Llij gp21 (75 %) |
23 | F | 23 415 | 23 669 | 9.1 | Che8 gp23 (95 %) |
24 | F | 23 682 | 24 320 | 21.9 | Llij gp23 (90 %) |
25 | F | 24 317 | 25 297 | 30.7 | Llij gp24 (71 %) |
26 | R | 25 301 | 25 492 | 7.1 | Omega gp45 (100 %) |
27 | F | 25 642 | 25 926 | 9.8 | Llij gp26 (78 %) |
28 | F | 26 076 | 26 228 | 5.5 | Che9d gp32 (100 %) |
29 | F | 26 263 | 26 535 | 9.8 | Llij gp29 (97 %) |
30 | F | 26 532 | 27 743 | 45.4 | Ms6 gp2 (94 %) | Lysin A
31 | F | 27 740 | 28 741 | 37.2 | Ms6 gp3 (96 %) | Lysin B
32 | F | 28 751 | 28 984 | 8.0 | Ms6 gp4 (80 %) | Holin
33 | F | 28 981 | 29 355 | 14.3 | Ms6 gp5 (73 %) |
34 | F | 29 380 | 29 613 | 8.7 | Llij gp34 (92 %) |
35 | F | 29 600 | 30 421 | 31.3 | Llij gp35 (99 %) | DNA polymerase III ε subunit?
36 | F | 30 508 | 30 729 | 8.0 | Llij gp36 (98 %) |
37 | F | 30 722 | 30 814 | 3.5 | Che8 gp39 (100 %) |
38 | R | 30 923 | 31 063 | 5.8 | Che8 gp42 (75 %) |
39 | R | 31 050 | 31 391 | 12.9 | Llij gp38 (100 %) |
40 | R | 31 391 | 31 600 | 7.1 | Llij gp39 (100 %) |
41 | R | 31 640 | 31 879 | 9.2 | Omega gp119 (88 %) |
42 | R | 31 879 | 32 046 | 6.4 | Omega gp118 (100 %) |
43 | F | 32 478 | 33 767 | 48.9 | Llij gp40 (100 %) | Integrase
44 | R | 33 843 | 34 301 | 16.1 | PMC gp39 (100 %) |
45 | R | 34 564 | 35 097 | 19.9 | Llij gp42 (63 %) |
46 | F | 35 251 | 35 556 | 11.7 | M. therm. hyp.‡ (41 %) |
47 | F | 35 615 | 36 619 | 37.9 | PMC gp42 (95 %) | Antirepressor?
48 | F | 36 616 | 36 831 | 7.9 | PMC gp43 (100 %) |
49 | F | 36 917 | 37 114 | 7.6 | Che8 gp52 (100 %) |
50 | F | 37 155 | 37 592 | 16.8 | Che9d gp57 (58 %) |
51 | F | 37 589 | 37 777 | 7.0 | Che9d gp58 (70 %) |
52 | F | 37 774 | 38 004 | 8.8 | Che8 gp55 (72 %) |
53 | F | 38 001 | 38 609 | 22.8 | Che8 gp56 (38 %) |
54 | F | 38 644 | 39 768 | 35.9 | Che8 gp57 (94 %) |
55 | F | 39 847 | 40 128 | 10.9 | Che8 gp58 (71 %) |
56 | F | 40 125 | 40 472 | 12.7 | PMC gp53 (94 %) | M. tuberculosis secreted protein?
57 | F | 40 472 | 40 873 | 15.8 | PMC gp54 (55 %) | WhiB
58 | F | 40 870 | 41 370 | 19.0 | PMC gp55 (71 %) |
59 | F | 41 367 | 41 732 | 13.7 | PMC gp57 (63 %) |
60 | F | 41 732 | 41 887 | 6.4 | Llij gp57 (86 %) |
61 | F | 41 884 | 42 114 | 8.4 | Llij gp58 (74 %) |
62 | F | 42 111 | 42 257 | 5.5 | Che9d gp71 (91 %) |
63 | F | 42 283 | 42 543 | 9.8 | PMC gp60 (84 %) |
64 | F | 42 540 | 43 118 | 20.8 | Llij gp60 (94 %) |
65 | F | 43 115 | 43 471 | 13.7 | Llij gp61 (94 %) | Endonuclease
66 | F | 43 468 | 44 124 | 23.9 | Che9d gp77 (85 %) | DNA methylase
67 | F | 44 108 | 44 341 | 8.5 | Che8 gp72 (100 %) |
68 | F | 44 341 | 44 487 | 5.5 | Che8 gp73 (100 %) |
69 | F | 44 484 | 45 269 | 28 | Corndog gp7 (56 %) | DNA methylase
70 | F | 45 248 | 45 598 | 12.8 | PMC gp78 (84 %) |
71 | F | 45 591 | 45 827 | 8.8 | NDM |
72 | F | 45 824 | 46 936 | 38.8 | Corndog gp7 (57 %) | DNA methylase
73 | F | 47 076 | 47 234 | 6.2 | Llij gp67 (90 %) |
74 | F | 47 222 | 47 596 | 14.3 | NDM |
75 | F | 47 750 | 48 310 | 21.5 | D. rad. TerF§ (46 %) | HNH endonuclease
76 | F | 48 310 | 48 573 | 9.5 | Llij gp70 (88 %) |
77 | F | 48 570 | 48 770 | 7.6 | NDM |
78 | F | 48 767 | 48 961 | 7.1 | PMC gp73 (96 %) |
79 | F | 48 958 | 49 341 | 13.5 | Omega gp81 (100 %) |
80 | F | 49 334 | 49 522 | 7.6 | Catera gp14 (76 %) |
81 | F | 49 480 | 50 028 | 20.5 | NDM |
82 | F | 50 025 | 50 309 | 10.8 | Llij gp77 (100 %) |
83 | F | 50 302 | 50 490 | 7.0 | Llij gp78 (78 %) |
84 | F | 50 560 | 50 931 | 14.3 | Che8 gp89 (89 %) |
85 | F | 50 918 | 51 091 | 6.5 | NDM |
86 | F | 51 088 | 51 372 | 10.4 | Omega gp77 (88 %) |
87 | F | 51 494 | 51 688 | 7.5 | Che8 gp91 (95 %) |
88 | F | 51 661 | 51 951 | 11.5 | NDM |
89 | F | 51 948 | 52 127 | 6.9 | Llij gp85 (85 %) |
90 | F | 52 127 | 52 264 | 5.1 | PMC gp86 (100 %) |
91 | F | 52 261 | 52 434 | 6.8 | PMC gp87 (100 %) |
92 | F | 52 487 | 52 654 | 6.3 | PMC gp88 (53 %) |
93 | F | 52 651 | 52 914 | 10.0 | Che9d gp10 (51 %) |
94 | F | 52 911 | 53 552 | 24.8 | PMC gp92 (81 %) |
95 | F | 53 662 | 53 886 | 8.0 | PMC gp93 (97 %) |
96 | F | 53 883 | 54 047 | 6.2 | PMC gp94 (96 %) |
97 | F | 54 054 | 54 290 | 8.9 | PMC gp95 (97 %) |
98 | F | 54 235 | 54 402 | 6.8 | PMC gp96 (69 %) |
99 | F | 54 399 | 54 800 | 14.6 | PMC gp97 (84 %) |
100 | F | 54 797 | 55 039 | 8.8 | PMC gp98 (64 %) |
101 | F | 55 032 | 55 193 | 6.4 | Che8 gp107 (96 %) |
102 | F | 55 234 | 55 737 | 19.1 | Omega gp2 (40 %) | Ser/Thr protein kinase?
103 | F | 55 743 | 56 279 | 19.8 | NDM |
104 | F | 56 284 | 56 958 | 25.3 | D. desul.∥ Dde043 (30 %) | Glycosyltransferase?
105 | F | 57 039 | 57 245 | 7.9 | Llij gp97 (92 %) |
106 | F | 57 242 | 57 367 | 4.5 | NDM |
107 | F | 57 393 | 57 602 | 7.7 | NDM |
108 | F | 57 599 | 58 225 | 23.6 | PMC gp103 (89 %) |
109 | F | 58 225 | 58 551 | 12.1 | PMC gp104 (85 %) | HNH endonuclease

*Direction of transcription: F, forward (rightwards); R, reverse (leftwards).

†NDM, no database match.

‡M. therm. hyp., Mycobacterium thermoresistibile hypothetical protein.

§D. rad., Deinococcus radiodurans.

∥D. desul., Desulfovibrio desulfuricans.

Fig. 3.

Map of the Tweety genome and comparison to maps of Che8, Llij, PMC and Che9d. Genomes are represented by horizontal lines with putative genes shown as boxes above (transcribed rightwards) or below (transcribed leftwards) each genome; the number of each gene is shown within each box. The diagonal arrow indicates a programmed translational frameshift between Tweety genes 12 and 13. All genes have been assorted into phamilies (Phams) of related sequences using the computer program ‘Phamerator’ (S. Cresawn, R. W. Hendrix & G. F. Hatfull, unpublished data); the phamily number is displayed above each gene and the boxes colour-coordinated accordingly. Note that the Pham numbers differ from those described previously (Hatfull ). Putative gene functions are noted. (A larger version of this figure is available as supplementary data with the online version of this paper.)

The Tweety genomic left arm: virion structure and assembly genes

Many of the Tweety left arm genes are probably involved in virion structure and assembly, and genes 2, 3, 11 and 14 encode putative terminase, portal, major tail subunit and tapemeasure functions respectively, based on sequence similarity to proteins with established functions. Genes 15, 18, 19, 21, 24 and 25 may all encode minor tail proteins, and we note that the gp19 sequence suggests a carboxypeptidase function, as seen also in several other mycobacteriophage genomes. The two ORFs (12 and 13) between the major tail subunit (11) and tapemeasure genes (14) are arranged such as to express the product of gene 12 (gp12) and a larger protein putatively generated via a translational −2 frameshift approximately 50 bp from the end of gene 12. By analogy with phage lambda, the gp12 and gp12/13 products are probably involved as chaperones in tail assembly; the programmed frameshift is one of the best-conserved features of dsDNA tailed phages (Xu ). The tapemeasure gene is so named because the size of the encoded protein determines the length of the tail (Katsura & Hendrix, 1984; Pedulla ). In most cases the proportionality constant relating the two is 0.15 nm tail length per amino acid of tapemeasure protein, corresponding to an α-helical structure for the tapemeasure protein. The measured length of the Tweety tail is 175 nm (above), and the 1176 amino acids of the tapemeasure protein would make an α-helix of about 176 nm, agreeing very closely with prediction. The major capsid subunit is likely to be encoded by gene 6, since we previously showed (unpublished observations) that the Che8 major capsid subunit is Che8 gp6, which is 99 % identical to Tweety gp6. When the sequence databases were searched with the Tweety gp6 sequence using the psi-blast algorithm, more than 100 phage capsid proteins were found, most with very low levels of similarity. 
Interestingly, after the near-perfect matches of Llij, PMC and Che8, the best matches are to the major capsid proteins of Escherichia coli phage T7 and its relatives, with some other mycobacteriophage capsid proteins farther down the list. The Tweety lysis genes (30–32) are located at the right end of the left arm and encode lysin A (gp30), lysin B (gp31) and holin (gp32) functions respectively. Tweety gp35 has weak but significant similarity (25 % identity; E-value 10^-5) to a putative DNA polymerase III ε subunit of Xanthomonas phage OP1, and the position of a DNA metabolism gene in the left arm is an unusual feature (also found in phages Che8 and Llij). Mycobacteriophage Cjw1 encodes a homologue of Tweety gp35 (Cjw1 gp115), although in this genome it is located at the right end of the right arm (Pedulla ). The Tweety left arm encodes seven proteins (gp15, gp18, gp19, gp20, gp21, gp24 and gp25) that are all part of an extremely large phamily of minor tail proteins that have complex sequence relationships. Tweety gp18 is nearly identical throughout its entire length to Llij gp18 and PMC gp18, but the similar gene in Che8 encodes two proteins, gp18 and gp19. A notable departure of the Tweety left arm from its Che8, Llij and PMC relatives is the apparent splitting of Llij gene 20, Che8 gene 21 and PMC gene 20 into Tweety genes 20 and 21 (Fig. 3). The DNA sequences of these genes are very closely related although Tweety contains a 1 base deletion at codon 66 that shortens the ORF (see Supplementary Fig. S1, available with the online version of this paper); Tweety gene 21 corresponds to the 3′ end of this segment, although it has a somewhat poor ribosome-binding site and it is uncertain whether it is expressed. The deletion does not appear to result from a sequencing error (Supplementary Fig. S2) and thus probably corresponds to a genomic change with specific biological consequences for virion particles. 
We note that a similar single-base deletion in the side tail fibre gene of phage lambda has a specific effect on adsorption to E. coli (Hendrix & Duda, 1992) and these may thus reflect the types of mutations that fuel the high degree of variation seen among phage tail fibre proteins (Desplats & Krisch, 2003; Leiman ).
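The tapemeasure arithmetic in this section can be reproduced from the gene 14 coordinates in Table 1. The following is a minimal sketch, not part of the original analysis; it assumes that the listed stop coordinate includes the termination codon (an assumption that happens to give the 1176 residues quoted in the text), and the function names are illustrative only.

```python
# Coordinates of Tweety gene 14 (tapemeasure) as listed in Table 1.
GENE14_START = 9534
GENE14_STOP = 13064
# Proportionality constant quoted in the text (nm of tail per residue).
NM_PER_RESIDUE = 0.15


def residues_from_coordinates(start: int, stop: int) -> int:
    """Return the number of amino acids encoded by an ORF.

    Assumption: coordinates are inclusive and the stop coordinate
    includes the termination codon.
    """
    length_bp = stop - start + 1
    if length_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length_bp // 3 - 1  # subtract the termination codon


def predicted_tail_length_nm(n_residues: int,
                             nm_per_residue: float = NM_PER_RESIDUE) -> float:
    """Predicted tail length if the tapemeasure protein is fully alpha-helical."""
    return n_residues * nm_per_residue


residues = residues_from_coordinates(GENE14_START, GENE14_STOP)
tail_nm = predicted_tail_length_nm(residues)
print(f"gp14: {residues} residues, predicted tail length {tail_nm:.1f} nm")
```

Under these assumptions the prediction lands within about 1.5 nm of the 175 nm tail measured by electron microscopy.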

The Tweety genomic right arm

The right arm genes (44–109) are organized distinctly differently from those of phages Che8, Llij and PMC (Fig. 3) and show evident mosaicism, with numerous insertions and deletions, and many genes related to others dispersed throughout other mycobacteriophage genomes. Only a few functions of these right arm genes can be predicted, although these include three possible restriction endonucleases (65, 75 and 109) and three probable DNA methylases (66, 69 and 72). The product of gene 47 is similar to proteins with antirepressor activities, although the immunity functions of Tweety or PMC (which carries a homologue of this protein) have yet to be characterized. We also note that gp57 is related to WhiB-family transcriptional regulators, which are quite common among mycobacteriophage genomes. Tweety also encodes an apparent glycosyl transferase (gp104), a function that has been seen occasionally in other phage genomes, though none of these is a member of the sequence family represented by Tweety gp104. Its specific role in the Tweety life cycle is unknown, but since this class of enzymes is associated with modifications of both bacterial cell walls and DNA, it could be involved either in phage exclusion or in protection from restriction. Tweety gp102 has weak sequence similarity to parts of bacterial serine/threonine protein kinases.

Tweety gp54: a protein with multiple tetrapeptide repeats

Tweety gp54 is a remarkable protein with high sequence similarity (>95 % identity) at both its N- and C-termini to the corresponding parts of Che8 gp57 and PMC gp51 (Fig. 4). The first striking aspect of Tweety gene 54 is the presence of a central core of very high mol% G+C that is prominent within a mol% G+C scan of the entire Tweety genome (Fig. 4a). Although such a deviation from the average mol% G+C is often indicative of the introduction of DNA elements by horizontal genetic exchange, in this case this seems unlikely. The segment of high mol% G+C corresponds to an apparent expansion of a G+C-rich repeated sequence present in all three related proteins (Supplementary Tables S1 and S2). At the nucleotide level the minimum repeat unit is 12 bp long, of which the first six positions (and their encoded alanine residues) are invariant (Supplementary Table S1). Curiously, positions nine and twelve, which correspond to third codon positions in the utilized reading frame, are also invariant, with greater variation occurring at repeat positions seven (34 Gs, 11 Ts, 3 Cs), eight (45 Gs, 3 As), ten (38 As, 10 Ts) and eleven (38 Gs, 10 As), corresponding to first and second codon positions (Supplementary Table S1). Nevertheless, only two different amino acids are encoded at the fourth residue of the tetrapeptide repeat (serine 38 times, tyrosine 10 times), and three at the third amino acid position (glycine 34 times, tryptophan 11 times, glutamine 3 times) (Supplementary Table S2). This pattern of substitutions within the repeated elements is consistent with selection for variation within this protein.
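The nucleotide and amino acid tallies quoted above can be cross-checked against each other, because repeat positions seven and eight are the first two bases of the third codon and positions ten and eleven are the first two bases of the fourth codon. The sketch below is only an illustrative consistency check using the counts given in the text and the standard genetic code; it assumes that serine is encoded by AGY codons here, which is what the reported A count at position ten implies.

```python
from collections import Counter

# Amino acid counts at tetrapeptide residues 3 and 4, as reported in the text.
THIRD_RESIDUE = {"Gly": 34, "Trp": 11, "Gln": 3}
FOURTH_RESIDUE = {"Ser": 38, "Tyr": 10}

# First two codon bases for each residue (standard genetic code).
# Assumption: serine is encoded by AGY rather than TCN codons in this repeat.
CODON_PREFIX = {"Gly": "GG", "Trp": "TG", "Gln": "CA", "Ser": "AG", "Tyr": "TA"}


def base_counts(residue_counts):
    """Derive base tallies at the first and second codon positions."""
    first, second = Counter(), Counter()
    for residue, n in residue_counts.items():
        prefix = CODON_PREFIX[residue]
        first[prefix[0]] += n
        second[prefix[1]] += n
    return dict(first), dict(second)


pos7, pos8 = base_counts(THIRD_RESIDUE)
pos10, pos11 = base_counts(FOURTH_RESIDUE)
print("position 7:", pos7)
print("position 8:", pos8)
print("position 10:", pos10)
print("position 11:", pos11)
```

The derived tallies agree with the nucleotide counts reported for positions seven, eight, ten and eleven, and each residue position sums to the 48 repeats present in Tweety gp54.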
Fig. 4.

An unusual repeated sequence within Tweety gene 54. (a) Plot of mol% G+C across the Tweety genome, revealing a region of very high mol% G+C within gene 54. The approximate positions where changes in mol% G+C occur are indicated. (b) Alignment of Tweety gp54, Che8 gp57 and PMC gp51, with amino acid identities shown by asterisks. Conserved substitutions are indicated by colons, and semi-conserved substitutions by periods. The red box indicates the amino acid sequence of Tweety gp54 that corresponds to the segment of high mol% G+C in panel a. (c) Sequence of Tweety gp54 showing the locations of repeated sequences. The repeats can be organized as octapeptide repeats (shown as alternating green and red boxes), or as tetrapeptide repeats (shown as alternating darker- and lighter-coloured boxes). Alignments of the sequences of the nucleotide and tetrapeptide repeats are shown in Supplementary Tables S1 and S2 respectively.
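A mol% G+C scan of the kind plotted in panel (a) can be sketched as a sliding-window calculation. The window size and step used for the published plot are not stated, so the values below are placeholders, and the toy sequence is hypothetical rather than Tweety sequence.

```python
def gc_percent(seq: str) -> float:
    """Return mol% G+C of a nucleotide sequence."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def gc_scan(seq: str, window: int, step: int):
    """Return (start offset, mol% G+C) for each full window along seq."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        (start, gc_percent(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Hypothetical toy sequence with a G+C-rich core flanked by A+T-rich DNA.
toy = "ATATATAT" + "GCGGCGGC" + "ATATATAT"
for start, value in gc_scan(toy, window=8, step=8):
    print(start, value)
```

Applied to a real genome sequence with a window of a few hundred base pairs, a scan like this would show a G+C-rich repeat region such as the one in gene 54 as a local peak above the genome average.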

This repeat sequence is reminiscent of variable region 2 (VR2) in Bordetella phage BMP-1 (Liu , 2004), in which a 24 bp element (which includes a 19 bp repeat followed by one of three possible 5 bp spacers) is repeated 9–20 times, depending on the phage isolate, in gene bbp36; the role of this variable segment is unknown although it does not appear to reflect changes in host tropism (Liu ). While the number of tetrapeptide repeats differs among Tweety gp54, Che8 gp57 and PMC gp51 (48 in Tweety, 26 in Che8 and 15 in PMC), these do not simply correspond to the variants observed in different BMP-1 isolates, since the encoded amino acid sequence also differs; in all three phages, the first two positions are invariant alanines, but the composition of the last two positions is distinctly different (Supplementary Table S2). The function of these gene products and the utility of this repeat and its variation are not known, although the finding of similar structures in otherwise unrelated mycobacteriophages and Bordetella phages suggests that these may be more widespread throughout phage populations than had been previously recognized. Finally, we note that the entire ORF is absent from mycobacteriophage Llij, even though closely related homologues flanking this gene in Tweety, PMC and Che8 are present (Fig. 3). Presumably, Tweety gp54 is not essential for viral growth, as has been demonstrated for BMP-1 bbp36 (Liu ).

Tweety integration functions

At the 5′ side of the integrase gene (43) there is a region of approximately 500 bp that lacks protein-coding potential and is a plausible location for the attP site. Comparison of this region with the M. smegmatis genome using blastn revealed a short segment of sequence identity (45/47 identical base pairs) that overlaps the 3′ end of a host tRNA(Lys) gene, a common target for phage integration (Fig. 5a). This indicates that the attP site lies upstream of the Tweety int gene and that Tweety integrates at an attB site located at coordinates 4 847 939–4 847 986 in the M. smegmatis genome. This arrangement also suggests that integration of Tweety results in reconstruction of a hybrid but functional tRNA gene of which the sequence 3′ to the extreme 5′-side of the anticodon stem is phage-derived (Fig. 5b). Interestingly, the two base differences between Tweety and the M. smegmatis genome correspond to the innermost-paired bases in the TψC loop of the tRNA (Fig. 5b). Comparison with other mycobacterial genomes shows that this tRNA and the putative attB sites are conserved in M. tuberculosis, M. bovis, Mycobacterium leprae and Mycobacterium avium. We also note that mycobacteriophages Che8, Llij and PMC contain near-identical integrases and putative attP sites, and probably integrate at the same chromosomal location. Che9d has a more distantly related integrase (39 % amino acid sequence identity) and a different putative attP site that we predict recombines at a tRNA(Met) gene (see below).
Fig. 5.

Integration of Tweety into the M. smegmatis genome. (a) Sequence alignment of a segment of the Tweety genome immediately upstream of the int gene (43) (coordinates 32 362–32 290) with part of the M. smegmatis genome (coordinates 4 847 908–4 847 991) reveals a common core sequence (boxed); two non-identical positions are shown in bold type. The position of the M. smegmatis tRNA(Lys) gene (Msmeg 4746) is indicated by the arrow. (b) Structure of tRNA(Lys), with an arrow indicating the position corresponding to the 5′ position of the common core. Positions that change following Tweety integration are shown in bold. (c) Organization of integration-proficient plasmid pTTP1B and its integration into the M. smegmatis chromosome.

Tweety-based integration-proficient plasmid vectors

The putative attB site is at a distinct location from those previously described for phages L5, Ms6 and Bxb1. We therefore reasoned that integration-proficient vectors derived from phage Tweety would integrate independently from those derived from other phages and could thus be used in conjunction with them without interference. Furthermore, the conservation of the host tRNA(Lys) gene provides a potentially broad host range for integrating plasmids. To construct such vectors, a 1.7 kbp segment of the Tweety genome corresponding to the int gene and ∼400 bp of upstream sequence containing the putative attP site was PCR amplified and cloned into a plasmid vector that carries a kanamycin-resistance gene and cannot replicate in mycobacteria (Fig. 5c). The two plasmids with the attP–int segment in either orientation (pTTP1A and pTTP1B) were electroporated into M. smegmatis and the numbers of kanamycin-resistant transformants determined (Table 2); both plasmids efficiently transformed M. smegmatis, yielding approximately 10^5 transformants per μg DNA. PCR analysis showed that every transformant tested derived from integration of the plasmid sequences at the predicted attB site (data not shown).
Table 2.

Transformation of M. smegmatis and BCG by Tweety and L5 integration-proficient vectors

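Transformants are given as c.f.u. (μg DNA)⁻¹ under KanR, HygR or combined KanR/HygR selection.

| Plasmid(s) | Features | Host | KanR | HygR | KanR/HygR |
|---|---|---|---|---|---|
| pTTP1A | Tweety attP–int; KanR | M. smegmatis | 2×10⁵ | | |
| | | BCG | 4×10⁵ | | |
| pTTP1B | Tweety attP–int; KanR | M. smegmatis | 1×10⁵ | | |
| | | BCG | 3×10⁵ | | |
| pMH94 | L5 attP–int; KanR | M. smegmatis | 2×10⁵ | | |
| | | BCG | 2×10⁵ | | |
| pJV39 | L5 attP–int; HygR | M. smegmatis | | 3×10⁵ | |
| | | BCG | | 4×10⁵ | |
| pTTP1A + pJV39 | Tweety attP–int; KanR + L5 attP–int; HygR | M. smegmatis | 2×10⁴ | 6×10⁴ | 2×10³ |
| | | BCG | 2×10⁴ | 1×10⁴ | 2×10³ |
| pTTP1B + pJV39 | Tweety attP–int; KanR + L5 attP–int; HygR | M. smegmatis | 2×10⁴ | 4×10⁴ | 8×10³ |
| | | BCG | 2×10⁴ | 7×10³ | 2×10³ |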
To test whether these Tweety integration-proficient vectors are fully compatible with the previously described L5 integration-proficient vectors, we performed co-electroporations with either pTTP1A or pTTP1B DNA and pJV39, an L5 integration vector conferring hygromycin resistance. Co-transformants were readily recovered, indicating that these plasmid integration systems do not interfere with each other (Table 2). We also prepared electrocompetent cells carrying an L5-integration-proficient plasmid vector and showed that pTTP1A efficiently transforms this strain (data not shown). A similar series of experiments was performed using BCG with similar outcomes, although the overall transformation frequencies were somewhat lower (Table 2).
The stability of integration-proficient vectors depends on the absence of the phage-encoded excise gene. Plasmids pTTP1A and pTTP1B contain no annotated ORFs other than the integrase gene, so we presume that the putative excise gene is absent. We have not been able to identify any putative excise gene by sequence analysis, although the best candidate is gene 44, not only because it is adjacent to int, but also because there are related copies in phages Che8, PMC and Llij, which encode identical integrases (Fig. 3).
To test for plasmid stability we grew M. smegmatis transformants in the absence of antibiotic selection for approximately 35 generations and then determined the proportion of recovered colonies that had lost the plasmid drug-resistance gene. Under these conditions, we observed that approximately 15 % of cells had lost an L5-derived integrated plasmid (pMH94), whereas only 3.3 % and 7.4 % had lost plasmids pTTP1A and pTTP1B, respectively. As noted previously for L5 vectors, the stability of these Tweety vectors could probably be further increased by using a transient integrase-expression system (Hatfull, 2004, 2006).
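The cumulative loss figures above can be put on a common per-generation scale. The sketch below is a minimal illustration, not the authors' analysis: it assumes that loss is irreversible, occurs with a constant probability p per generation, and confers no growth advantage, so that the fraction retaining the plasmid after n generations is (1 − p)^n. The function name `per_generation_loss` is hypothetical.

```python
def per_generation_loss(fraction_lost: float, generations: float) -> float:
    """Estimate a constant per-generation loss probability p.

    Assumes retained fraction = (1 - p) ** generations, i.e. irreversible
    loss at a constant rate with no fitness difference (an assumption).
    """
    if not 0.0 <= fraction_lost < 1.0:
        raise ValueError("fraction_lost must be in [0, 1)")
    if generations <= 0:
        raise ValueError("generations must be positive")
    return 1.0 - (1.0 - fraction_lost) ** (1.0 / generations)


# Fractions lost after ~35 generations without selection, as reported in the text.
observed = {"pMH94 (L5)": 0.15, "pTTP1A": 0.033, "pTTP1B": 0.074}

for plasmid, lost in observed.items():
    rate = per_generation_loss(lost, 35)
    print(f"{plasmid}: about {rate:.2e} loss per generation")
```

Under these assumptions the ranking of the three plasmids is unchanged, but the per-generation values make it easier to compare experiments run for different numbers of generations.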

DISCUSSION

We have presented here the genome of mycobacteriophage Tweety, a new mycobacteriophage with several interesting and novel features, and its exploitation for the development of integration-proficient vectors that are compatible with those described previously. The Tweety genome is most closely related to those of Che8, PMC and Llij (Hatfull ) (Fig. 2), and this close similarity allows more fine-scale conclusions about evolutionary changes than are available from comparisons among more distantly related phages. Only a few of the Tweety gene functions can be readily predicted, although these include several possible restriction endonucleases and several DNA methylases. However, these do not form well-defined restriction–modification cassettes, and combinations of these are not well conserved in the other closely related mycobacteriophages (Fig. 3). For example, homologues of Tweety gp65 are found in Che8, Llij, PMC and Che9d and there are more distant relatives in mycobacteriophages Cjw1 and Wildcat (Hatfull ). However, none of these has a closely linked DNA modification function that can be readily recognized.
The presence of a gene encoding a putative family 2 glycosyltransferase (gp104) in the Tweety genome is intriguing since, to our knowledge, this is the first finding of a member of this sequence family of glycosyltransferases in any phage genome. Similar enzymes have been shown previously to be involved in sugar modifications of bacterial cell walls, and gp104 could play a role in phage exclusion similar to the role proposed for the glucosyltransferase in phage SfV (Bastin ); however, it is also possible that Tweety gp104 could be involved in DNA modification. Phage T4 and its close relatives encode two glycosyltransferases, and these have long been known to add glucose to hydroxymethyl cytosine residues in phage DNA. If the Tweety enzyme also adds sugars to DNA, this would be an example of analogous but not homologous proteins carrying out the same function in different phages. Other examples include phage lysins, integrases and head-maturation proteases. There do not appear to be any closely related homologues of Tweety gene 104 in any other sequenced mycobacterial genome, and it is therefore unclear from where this gene was acquired. We note that the gene immediately upstream, 103, has no identifiable homologues in other phage genomes or elsewhere.
Tweety gp54 is unusual with respect to the repeated sequence within the ORF that significantly expands the length of the gene relative to its homologues in phages Che8 (gp57) and PMC (gp51). While the functions of these genes are still unknown, these structures are interesting in their organizational similarity to the VR2 region of Bordetella phage BMP-1. The BMP-1 bbp36 gene that contains VR2 is not essential for phage growth, and we note that Llij does not contain a homologue of Tweety gp54 even though similar flanking genes are present, suggesting that it is not essential for mycobacteriophage growth either. Repeats similar to those in Tweety gp54 are commonly associated with intrinsically unstructured proteins (Tompa, 2003).
The development of integration-proficient vectors with site specificities distinct from those developed previously will provide important tools for constructing recombinant mycobacterial strains. The need for such vectors is illustrated by the development of secondary applications for those derived from phages Ms6 and L5 (Saviola & Bishai, 2004; Vultos ), in which either secondary attB sites have been introduced or specificities have been altered mutationally, albeit with significant loss of efficiency (Vultos ). The Tweety integration vectors not only transform both fast- and slow-growing strains efficiently, but do so in a manner that is fully compatible with integration vectors derived from L5 (Table 2) and Bxb1 (data not shown); it is likely that they are also compatible with Ms6-derived vectors. The Tweety vectors are also maintained with reasonable stability in the absence of drug selection, and somewhat more so than the L5-derived vectors. We have not yet been able to identify the Tweety recombination-directionality factor by sequence comparisons, which is perhaps not surprising given the high sequence divergence of these proteins (Lewis & Hatfull, 2001), although Tweety gp44 remains the best candidate for this function.
While integrase genes can be readily identified in phage genomes, the locations of the attP sites require somewhat closer examination. The putative location of Tweety attP was indicated by sequence comparison with the M. smegmatis genome, and its identification was facilitated by the use of an attB site that overlaps a host tRNA gene which is reconstructed following integration. Thus finding a long common core (40 bases or more) that overlaps a host tRNA gene is strongly predictive of the attB site location. We have extended this approach to identify potential attB sites of other mycobacteriophage integrases in order to identify those that are the best candidates for development of additional integration-proficient vectors with new specificities (Table 3). Using this approach, we predict that phages Che9d, Che9c, Halo and Omega integrate at tRNA(Met), tRNA(Tyr), tRNA(Arg) and tRNA(Leu) genes respectively, using attB sites that are distinct from those of L5, Ms6 and Tweety; three of these phages have conserved attB sites in M. tuberculosis (Table 3), suggesting that these could be potential broad-host-range integration systems. Interestingly, the Halo integration site is similar to that suggested previously for beta family phages of the Corynebacteria (Cianciotto ). This strategy is not applicable for those phages that use serine integrases, although we have identified the attB site for the Bxz2 serine integrase, which is located within the Msmeg_5156 ORF, using experimental approaches (Table 3). The Bxz2 attP and its attB sites share only a 4 bp common core and thus could not simply be identified bioinformatically.
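The prediction rule described above (a long common core between the phage region next to int and the host genome, overlapping a host tRNA gene) can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used in the study, which relied on blastn and tolerates mismatches. Here only exact matches are considered, and the function names `longest_common_core` and `overlaps`, the toy sequences and the toy tRNA coordinates are all hypothetical.

```python
def longest_common_core(phage: str, host: str) -> tuple[str, int, int]:
    """Return (core, phage_start, host_start) for the longest exact shared substring.

    Uses a dynamic-programming table of common-suffix lengths, keeping
    only the previous row in memory.
    """
    phage, host = phage.upper(), host.upper()
    best_len = best_p = best_h = 0
    prev = [0] * (len(host) + 1)
    for i in range(1, len(phage) + 1):
        curr = [0] * (len(host) + 1)
        for j in range(1, len(host) + 1):
            if phage[i - 1] == host[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_p = i - best_len
                    best_h = j - best_len
        prev = curr
    return phage[best_p:best_p + best_len], best_p, best_h


def overlaps(start: int, end: int, feat_start: int, feat_end: int) -> bool:
    """True if half-open intervals [start, end) and [feat_start, feat_end) overlap."""
    return start < feat_end and feat_start < end


# Toy data (hypothetical): a short "host" segment with an annotated tRNA gene
# at 0-based half-open coordinates (2, 18), and a "phage" region next to int.
host_segment = "TTGACCGGGCCCGTAGCTCAGAACT"
phage_region = "ACGTCCCGTAGCTCAGTTTT"
trna_gene = (2, 18)

core, p_start, h_start = longest_common_core(phage_region, host_segment)
h_end = h_start + len(core)
print(f"core={core} length={len(core)} host={h_start}-{h_end}")
print("overlaps tRNA gene:", overlaps(h_start, h_end, *trna_gene))
```

On real genomes the quadratic table would be impractical, so an indexed search such as blastn is the natural choice; the sketch only makes the decision rule explicit, including the length threshold (40 bases or more in the text) that would be applied to the reported core.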
Table 3.

Mycobacteriophage integration systems and putative integration sites

| Phage(s) | Integrase type | attB location | M. smegmatis coordinates | M. tuberculosis? (bp identity) |
|---|---|---|---|---|
| L5, D29, Che12 | Tyr | tRNA(Gly) | 4 764 522–4 764 564 | Yes (42/43) |
| Tweety, Che8, Llij, PMC | Tyr | tRNA(Lys) | 4 847 934–4 847 981 | Yes (46/48) |
| Ms6 | Tyr | tRNA(Ala) | 2 213 189–2 213 214 | Yes (26/26) |
| Che9d | Tyr | tRNA(Met) | 4 532 823–4 532 867 | Yes? (42/45) |
| Halo | Tyr | tRNA(Arg) | 6 410 365–6 410 399 | No |
| Omega | Tyr | tRNA(Leu) | 3 328 692–3 328 735 | Yes? (39/42) |
| Che9c | Tyr | tRNA(Tyr) | 1 228 421–1 228 480 | Yes? (53/57) |
| CJW1, 244 | Tyr | ? | ? | ? |
| Bxb1, U2, Bethlehem | Ser | groEL1 | | No |
| Bxz2 | Ser | Msmeg_5156 | 5 259 344–5 259 347 | No |
In summary, the genomic analysis of mycobacteriophage Tweety and the development of new integration-proficient vectors further illustrate the general utility of mycobacteriophage studies for mycobacterial genetics. Most of the Tweety genomic functions have yet to be explored or exploited, but this phage promises to have potential utility for understanding other important aspects of mycobacterial and bacteriophage biology and evolution.
  39 in total

1.  Identification and characterization of mycobacteriophage L5 excisionase.

Authors:  J A Lewis; G F Hatfull
Journal:  Mol Microbiol       Date:  2000-01       Impact factor: 3.501

2.  Instability and site-specific excision of integration-proficient mycobacteriophage L5 plasmids: development of stably maintained integrative vectors.

Authors:  B Springer; P Sander; L Sedlacek; K Ellrott; E C Böttger
Journal:  Int J Med Microbiol       Date:  2001-03       Impact factor: 3.473

3.  Origins of highly mosaic mycobacteriophage genomes.

Authors:  Marisa L Pedulla; Michael E Ford; Jennifer M Houtz; Tharun Karthikeyan; Curtis Wadsworth; John A Lewis; Debbie Jacobs-Sera; Jacob Falbo; Joseph Gross; Nicholas R Pannunzio; William Brucker; Vanaja Kumar; Jayasankar Kandasamy; Lauren Keenan; Svetsoslav Bardarov; Jordan Kriakov; Jeffrey G Lawrence; William R Jacobs; Roger W Hendrix; Graham F Hatfull
Journal:  Cell       Date:  2003-04-18       Impact factor: 41.582

Review 4.  Intrinsically unstructured proteins evolve by repeat expansion.

Authors:  Peter Tompa
Journal:  Bioessays       Date:  2003-09       Impact factor: 4.345

5.  The diversity and evolution of the T4-type bacteriophages.

Authors:  Carine Desplats; Henry M Krisch
Journal:  Res Microbiol       Date:  2003-05       Impact factor: 3.992

6.  Use of the mycobacteriophage L5 excisionase in Mycobacterium tuberculosis to demonstrate gene essentiality.

Authors:  T Parish; J Lewis; N G Stoker
Journal:  Tuberculosis (Edinb)       Date:  2001       Impact factor: 3.131

7.  Control of directionality in integrase-mediated recombination: examination of recombination directionality factors (RDFs) including Xis and Cox proteins.

Authors:  J A Lewis; G F Hatfull
Journal:  Nucleic Acids Res       Date:  2001-06-01       Impact factor: 16.971

8.  Transcriptional regulation and immunity in mycobacteriophage Bxb1.

Authors:  S Jain; G F Hatfull
Journal:  Mol Microbiol       Date:  2000-12       Impact factor: 3.501

9.  Reverse transcriptase-mediated tropism switching in Bordetella bacteriophage.

Authors:  Minghsun Liu; Rajendar Deora; Sergei R Doulatov; Mari Gingery; Frederick A Eiserling; Andrew Preston; Duncan J Maskell; Robert W Simons; Peggy A Cotter; Julian Parkhill; Jeff F Miller
Journal:  Science       Date:  2002-03-15       Impact factor: 47.728

10.  Integration and excision of the Mycobacterium tuberculosis prophage-like element, phiRv1.

Authors:  Lori A Bibb; Graham F Hatfull
Journal:  Mol Microbiol       Date:  2002-09       Impact factor: 3.501
