Christopher D Bayliss, Martin J Callaghan, E Richard Moxon.
Abstract
Phase variable restriction-modification (R-M) systems are widespread in Eubacteria. Haemophilus influenzae encodes a phase variable homolog of Type III R-M systems. Sequence analysis of this system in 22 non-typeable H.influenzae isolates revealed a hypervariable region in the central portion of the mod gene, whereas the res gene was conserved. Maximum likelihood (ML) analysis indicated that most sites outside this hypervariable region experienced strong negative selection, but there was evidence of positive selection at a few sites in adjacent regions. A phylogenetic analysis of 61 Type III mod genes revealed clustering of these H.influenzae mod alleles with mod genes from pathogenic Neisseriae and, based on sequence analysis, indicated horizontal transfer of the mod-res complex between these species. Neisserial mod alleles also contained a hypervariable region, and all mod alleles exhibited variability in the repeat tract. We propose that this hypervariable region encodes the target recognition domain (TRD) of the Mod protein and that variability results in alterations to the recognition sequence of this R-M system. We argue that the high allelic diversity and phase variable nature of this R-M system have arisen due to selective pressures exerted by diversity in bacteriophage populations, but that they also have implications for other fitness attributes of these bacterial species.
Year: 2006 PMID: 16914439 PMCID: PMC1557822 DOI: 10.1093/nar/gkl568
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Repeat numbers and mod types for H.influenzae and N.meningitidis strains and isolates
| Strain/isolate designation (a) | MLST type (b) | Repeat number (c) | Initiation codon (d) | Mod type | Nucleotide identity to Rd, % (e) | | | Simple designation (REBASE designations) (f) |
|---|---|---|---|---|---|---|---|---|
| M.Hin1056ModP-1 | | | | | | | | |
| | 47 | 32 | Proximal | 1 | 100 | 100 | 100 | (M.HindORF1056P) |
| M.Hin1056ModP-2 | | | | | | | | |
| 86-028 | — | 16 | Distal | 2 | 94 | 79 | 94 | (M.Hin86ORF1217P, |
| 375 | 3 | 16 | Distal | 2 | 94 | ND | ND | M.Hin375ORFAP, |
| 432 | 40 | 16 | Distal | 2 | 94 | 79 | ND | M.Hin432ORFAP, |
| 1124 | 12 | 16 | Distal | 2 | 94 | 79 | 94 | M.Hin1124ORFAP, |
| 1247 | 33 | 19 | Distal | 2 | 94 | ND | ND | M.Hin1247ORFAP) |
| M.Hin1056ModP-3 | | | | | | | | |
| 667 | 57 | 0 | Distal | 3 | 94 | 80 | 92 | (M.Hin667ORFAP, |
| 1180 | 2 | 0 | Distal | 3 | 94 | ND | ND | M.Hin1180ORFAP, |
| 1181 | 2 | 0 | Distal | 3 | 94 | ND | ND | M.Hin1181ORFAP, |
| 1292 | 2 | 0 | Distal | 3 | 94 | ND | ND | M.Hin1292ORFAP) |
| M.Hin1056ModP-4 | | | | | | | | |
| 1231 | 34 | 2 | Proximal | 4 | 95 | ND | ND | (M.Hin1231ORFAP, |
| 1232 | 34 | 2 | Proximal | 4 | 95 | ND | ND | M.Hin1232ORFAP, |
| | 257 | 3 | None | 4 | 95 | 78 | 95 | M.Hin2846ORFAP) |
| M.Hin1056ModP-5 | | | | | | | | |
| 285 | 39 | 7 | None | 5 | 94 | 78 | ND | (M.Hin285ORFAP, |
| 477 | 1 | 14 | Distal | 5 | 94 | ND | ND | M.Hin477ORFAP, |
| 981 | 42 | 8 | Distal | 5 | 94 | ND | ND | M.Hin981ORFAP, |
| 1200 | 36 | 10 | None | 5 | 94 | ND | ND | M.Hin1200ORFAP, |
| 1268 | 36 | 17 | Distal | 5 | 94 | ND | ND | M.Hin1268ORFAP) |
| M.Hin1056ModP-6 | | | | | | | | |
| 1158 | 11 | 0 | Distal | 6 | 93 | ND | ND | (M.Hin1158ORFAP, |
| 1159 | 11 | 0 | Distal | 6 | 93 | ND | ND | M.Hin1159ORFAP) |
| M.Hin1056ModP-7A | | | | | | | | |
| 486 | 41 | 0 | Distal | 7A | 94 | ND | ND | (M.Hin486ORFAP) |
| M.Hin1056ModP-7B | | | | | | | | |
| 162 | 37 | 0 | Distal | 7B | 95 | 81 | 96 | (M.Hin162ORFAP) |
| M.Hin1056ModP-8 | | | | | | | | |
| 1008 | 43 | 0 | Distal | 8 | 94 | 77 | ND | (M.Hin1008ORFAP) |
| M.Hin1056ModP-9 | | | | | | | | |
| 1209 | 13 | 19 | Distal | 9 | 92 | ND | ND | (M.Hin1209ORFAP, |
| 1233 | 13 | 5 | Proximal | 9 | 92 | 77 | ND | M.Hin1233ORFAP) |
| M.Hin1056ModP-10 | | | | | | | | |
| | 99 | 16 | Distal | 10 | 95 | 83 | 94 | (M.Hin2866ORFAP) |
| M.Nme1056ModP-11 | | | | | | | | |
| | — | 20 | Proximal | 11 | 93 | 79 | 91 | (M.NmeBORF1375P) |
| M.Nme1056ModP-12 | | | | | | | | |
| | — | 3 | None | 12 | 95 | 77 | 92 | (M.NmeAORF1590P) |
| M.Ngo1056ModP-13 | | | | | | | | |
| | — | 37 | Distal | 13 | 92 | 76 | ND | (M.NgoORFC707P) |
(a) NTHi isolate numbers are in normal type. Designations of strains for which genomic data were obtained are in italics.
(b) From H.influenzae MLST database at .
(c) All 5′-AGCC except strain Rd, which has 5′-AGTC.
(d) The distal initiation codon is the predicted initiation codon (5′-ATG) of HI1058. The proximal initiation codon is a 5′-ATG that is 53 bp upstream of the repeat tract.
(e) Comparison of the nucleotide sequence of each isolate with the sequence from strain Rd. The N-terminal region includes the sequences from the distal initiation codon to the repeat tract, the repeat tract, and the 264 bp downstream of the repeat tract.
(f) REBASE designations were developed in consultation with Richard Roberts.
ND = no data.
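Because the repeat unit is a tetranucleotide, each gain or loss of one repeat changes the tract length by 4 nt and so shifts the downstream reading frame. The sketch below tabulates this for rows copied from the table, reading the "Initiation codon" column as the initiation codon that is in frame at the observed repeat number (an interpretation, not something the table states outright). The function names are illustrative and not taken from the paper.

```python
from collections import defaultdict

REPEAT_UNIT_LEN = 4  # 5'-AGCC tetranucleotide repeat (table footnote c)

# (isolate, repeat number, initiation codon, mod type), copied from the table.
ROWS = [
    ("86-028", 16, "Distal", "2"),
    ("375", 16, "Distal", "2"),
    ("432", 16, "Distal", "2"),
    ("1124", 16, "Distal", "2"),
    ("1247", 19, "Distal", "2"),
    ("1231", 2, "Proximal", "4"),
    ("1232", 2, "Proximal", "4"),
    ("285", 7, "None", "5"),
    ("477", 14, "Distal", "5"),
    ("981", 8, "Distal", "5"),
    ("1200", 10, "None", "5"),
    ("1268", 17, "Distal", "5"),
    ("1209", 19, "Distal", "9"),
    ("1233", 5, "Proximal", "9"),
]


def frame_offset(n_repeats, unit_len=REPEAT_UNIT_LEN):
    """Repeat tract length in nucleotides, modulo the codon length of 3."""
    return (n_repeats * unit_len) % 3


def codons_by_offset(rows):
    """Map mod type -> frame offset -> sorted initiation codons seen in the table."""
    seen = defaultdict(lambda: defaultdict(set))
    for _isolate, n_repeats, codon, mod_type in rows:
        seen[mod_type][frame_offset(n_repeats)].add(codon)
    return {
        mod_type: {offset: sorted(codons) for offset, codons in offsets.items()}
        for mod_type, offsets in seen.items()
    }


if __name__ == "__main__":
    for mod_type, offsets in codons_by_offset(ROWS).items():
        print(mod_type, offsets)
```

Within each mod type the offset maps to a single initiation codon, but the mapping is not the same for every type: in types 2, 4 and 9 an offset of 1 goes with the distal codon and 2 with the proximal codon, whereas in type 5 an offset of 2 goes with the distal codon. Length differences in the sequences flanking the repeat tract would produce such a shift, although the table alone does not establish the cause.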
Figure 1. Schematic diagram of the phase variable Type III R-M system of H.influenzae and a summary of the allelic variation of H.influenzae isolates in this locus. Panel (a) shows the reading frames as open or filled rectangles. The intergenic region is shown as a line. The striped box in mod marks the position of the repeat tract whilst the diagonal lines in res signify that the full gene is not represented in the figure. The HI numbers are from the annotation of the H.influenzae strain Rd genome sequence (). The relative positions of the distal (Dis) and proximal (Pro) 5′-ATG initiation codons and of the DPPY motif, characteristic of many R-M systems, are also marked. The top diagram of panel (b) specifies the variations in length (due to insertions/deletions), in nucleotides, of different regions of this locus. These regions are: (i) rnhB; (ii) intergenic sequences; (iii) sequences between distal initiation codon and repeat tract; (iv) repeat tract; (v) conserved sequences; (vi) region containing N-terminal MTase motifs; (vii) variable region; (viii) region containing C-terminal MTase motifs; (ix) res. The lower diagram indicates the extent of the partial sequences of the mod locus (upper line) and of the full-length sequences of mod (middle line) and res (lower line). The numbers below these lines signify the polymorphic sites in each region observed in comparisons of multiple sequences, with the number of H.influenzae isolates analysed (see text) listed on the right-hand side of the figure.
Figure 2. Amino acid sequences of full-length H.influenzae Mod proteins. Amino acid sequences were derived from the nucleotide sequences of mod for seven NTHi strains. These sequences were aligned with those of related Mod protein sequences present in the genome sequences of H.influenzae strains R2866 (R2866mod), R2864 (R2864mod) and Rd (Rdmod), N.meningitidis strains MC58 (NMB1375mod) and Z2491 (NMA1590mod) and N.gonorrhoeae strain FA1090 (NGOC707mod). The latter four proteins are identical with the following REBASE genes: M.Hindorf1056P, M.NmeBorf1375P, M.NmeAorf1590P and M.NgoorfC707P. Identical and conserved amino acids are highlighted with black or grey backgrounds, respectively. Missing amino acids are indicated by a dot. Plus signs (+) on the bottom line mark semi-conserved motifs present in 69 Type III R-M systems (Supplementary Figure 4A and data not shown). Asterisks (*) mark amino acids that are conserved in 75% of the 69 Mod proteins.
Positively selected sites in H.influenzae mod genes
| Codon | Amino acid | |
|---|---|---|
| 53* | D | 2.206 |
| 81* | S | 2.256 |
| 105 | S | 2.125 |
| 196** | S | 2.282 |
| 201** | T | 2.280 |
| 229* | V | 2.232 |
| 242* | N | 2.239 |
| 244** | G | 2.289 |
(a) Codon sites are given with reference to the amino acid sequence in Supplementary (Figure 2).
(b) Amino acids are those encoded by codons in the Rd mod sequence.
(c) Codons not marked with an asterisk were under positive selection at or above the 90% level; a single asterisk indicates 95% or greater posterior probability of positive selection; and a double asterisk indicates 99% or greater posterior probability.
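The asterisk convention in the last footnote amounts to a simple threshold rule on the posterior probability of positive selection. A minimal sketch follows; the function name is illustrative, and because the table does not list the posterior probabilities themselves, any values passed in are hypothetical.

```python
def selection_marker(posterior):
    """Return the table's marker for a posterior probability in [0, 1].

    "**" for >= 0.99, "*" for >= 0.95, "" for >= 0.90 (listed without an
    asterisk), and None when the site falls below the 90% reporting level.
    """
    if not 0.0 <= posterior <= 1.0:
        raise ValueError("posterior must lie in [0, 1]")
    if posterior >= 0.99:
        return "**"
    if posterior >= 0.95:
        return "*"
    if posterior >= 0.90:
        return ""
    return None
```

Under this rule, codon 105 (no asterisk) would correspond to a posterior probability of at least 0.90 but below 0.95, and codons 196, 201 and 244 to 0.99 or above.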
Figure 3. Phylogenetic trees of Mod amino acid sequences from known and putative Type III R-M systems. Amino acid sequences for Mod proteins of Type III R-M systems were obtained from genomic databases and from the REBASE database and aligned with seven full-length NTHi Mod amino acid sequences. Alignments were then trimmed manually to include only the semi-conserved regions (A) or to exclude the repeat and variable regions (B). Alignments included 69 Mod sequences (A) or a sub-set of these sequences (B). Phylogenetic trees were generated from these alignments using a neighbor-joining (NJ) algorithm implemented in the program MEGA, version 3 (34).
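Neighbor-joining builds a tree from a matrix of pairwise distances between aligned sequences. The sketch below computes one such matrix, using the proportion of differing sites (p-distance) with gapped positions skipped pair by pair. The caption does not say which distance model or gap treatment was used in MEGA, so both choices are assumptions, and the three short sequences are made up for illustration.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(alignment):
    """Symmetric matrix of p-distances as a dict of dicts keyed by sequence name."""
    names = list(alignment)
    matrix = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        d = p_distance(alignment[a], alignment[b])
        matrix[a][b] = d
        matrix[b][a] = d
    return matrix


# Hypothetical toy alignment; these are not real Mod sequences.
TOY_ALIGNMENT = {
    "modA": "MKTLDPPYQA",
    "modB": "MKSLDPPYQA",
    "modC": "MR-LDPPYHA",
}

if __name__ == "__main__":
    for name, row in distance_matrix(TOY_ALIGNMENT).items():
        print(name, row)
```

Trimming the alignment to the semi-conserved regions, as in panel (A), would correspond to restricting each sequence to the chosen columns before the matrix is computed.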