Yoshikazu Furuta, Kentaro Abe, Ichizo Kobayashi.
Abstract
The mobility of restriction-modification (RM) gene complexes and their association with genome rearrangements is a subject of active investigation. Here we conducted systematic genome comparisons and genome context analysis on fully sequenced prokaryotic genomes to detect RM-linked genome rearrangements. RM genes were frequently found to be linked to mobility-related genes such as integrase and transposase homologs. They were flanked by direct and inverted repeats at a significantly high frequency. Insertion with long target duplication was observed for restriction Types I, II, III and IV. We found several RM genes flanked by long inverted repeats, some of which had apparently inserted into a genome with a short target duplication. In some cases, only a portion of an apparently complete RM system was flanked by inverted repeats. We also found a unit composed of RM genes and an integrase homolog that integrated into a tRNA gene. An allelic substitution of a Type III system with a linked Type I and IV system pair, and allelic diversity in the putative target recognition domain of Type IIG systems, were observed. This study revealed the possible mobility of all types of RM systems, and the diversity in their mobility-related organization.
Year: 2010 PMID: 20071371 PMCID: PMC2853133 DOI: 10.1093/nar/gkp1226
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Various modes of DNA recombination that result in target sequence duplication. (a) Insertion of a DNA transposon typically results in direct repeats of <10 bp, although the Mycoplasma transposon IS1630 forms long and variable target duplications of 19–26 bp. (b) Insertion by site-specific recombination. (c) Insertion with long and variable target duplication.
Figure 2. Search design for finding repeats flanking RM genes. White boxes indicate RM-related genes; thick black lines indicate 1 kb of flanking sequence. Each curved arrow indicates a pair of flanking (black line) sequences. When three genes are included, six pairs result.
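One possible reading of this search design is that, for a cluster of n adjacent RM-related genes, the 1-kb sequence upstream of gene i is paired with the 1-kb sequence downstream of gene j for every i ≤ j. That gives n(n+1)/2 pairs, which is six for three genes. The Python sketch below enumerates the pairs under that assumption; the function name and the gene labels are hypothetical and are not taken from the paper.

```python
from itertools import combinations_with_replacement

def flank_pairs(genes):
    """Enumerate (upstream flank of gene_i, downstream flank of gene_j), i <= j.

    Assumption: `genes` lists adjacent RM-related genes in genome order,
    and every contiguous run of genes is treated as a candidate unit.
    """
    return [(f"up:{first}", f"down:{last}")
            for first, last in combinations_with_replacement(genes, 2)]

pairs = flank_pairs(["M", "S", "R"])
for pair in pairs:
    print(pair)
print(len(pairs))  # 6, matching the three-gene case in the caption
```

Each returned pair would then be compared by whatever alignment tool is in use; that step is not shown here.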
Classification of RM locus pairs
| Classification | Number of RM locus pairs |
|---|---|
| Homology detected in entire RM regions | |
| A. flanking 5-kb region ≥50% aligned | 244 |
| B. flanking 5-kb region <50% aligned | 24 |
| Homology partially detected in RM regions | |
| C. flanking 5-kb region ≥50% aligned | 99 |
| D. flanking 5-kb region <50% aligned | 18 |
| No homology detected in RM region | |
| flanking 5-kb region ≥50% aligned | |
| E. Substitution | 116 |
| F. Indel: RM insertion with long target duplication | 9 |
| F. Indel: others | 149 |
| G. flanking 5-kb region <50% aligned | 101 |
| Total | 760 |
The criterion for classifying a pair as substitution (E) or indel (F) is the length of the subject genome region that corresponds to the RM region in the query genome. If this region is longer than 1 kb, the case is classified as a substitution; if it is shorter than 1 kb, it is classified as an indel.
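The decision rules in the table and its note can be written as a small function. The following Python sketch is only an illustration: the function name, argument names, and the encoding of the homology state as strings are assumptions, and the treatment of a region of exactly 1 kb (classified here as an indel) is not specified in the source.

```python
def classify_pair(rm_homology, flank_aligned_fraction, subject_region_bp=None):
    """Return the category letter (A-G) for one RM locus pair.

    rm_homology: "entire", "partial", or "none" (homology in the RM region).
    flank_aligned_fraction: aligned fraction (0-1) of the flanking 5-kb region.
    subject_region_bp: length of the subject genome region corresponding to
        the query RM region; needed only to separate E from F.
    """
    conserved_flanks = flank_aligned_fraction >= 0.5
    if rm_homology == "entire":
        return "A" if conserved_flanks else "B"
    if rm_homology == "partial":
        return "C" if conserved_flanks else "D"
    if rm_homology != "none":
        raise ValueError(f"unknown homology state: {rm_homology!r}")
    if not conserved_flanks:
        return "G"
    if subject_region_bp is None:
        raise ValueError("subject_region_bp is required to separate E from F")
    # Assumption: a region of exactly 1 kb falls into the indel class.
    return "E" if subject_region_bp > 1000 else "F"

print(classify_pair("none", 0.9, subject_region_bp=2500))  # E
print(classify_pair("none", 0.9, subject_region_bp=300))   # F
```

Category F is further divided in the table into insertions with long target duplication and others; that subdivision depends on the repeat analysis and is not modeled here.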
Insertion with long target duplication
| Symbol | Species | Strain with/without insert | Inserted RM genes | Upstream–target identity/repeat (bp/bp) | Downstream–target identity/repeat (bp/bp) | Upstream–downstream identity/repeat (bp/bp) |
|---|---|---|---|---|---|---|
| a | H. influenzae | 86-028NP/Rd KW20 | NTHI0188 (I, M), NTHI0192 (I, S), NTHI0193 (I, R) | 45/46 | 45/46 | 44/46 |
| b | H. influenzae | 86-028NP/Rd KW20 | NTHI1460 (II, M), NTHI1459 (II, R) | 43/46 | 45/46 | 44/46 |
| c | T. thermophilus | HB27/HB8 | TTC1877 (IIGS, RM), TTC1880 (II, M) | 47/47 | 47/47 | 47/47 |
| d | X. fastidiosa | Temecula 1/9a5c | PD1608 (II, R), PD1607 (II, M) | 45/45 | 45/45 | 45/45 |
| e | V. vulnificus | CMCP6/YJ016 | VV1_2037 (I, R), VV1_2031 (I, M), VV1_2030 (I, S) | 49/49 | 49/49 | 49/49 |
Figure 3. RM systems inserted with long target duplications. White triangles indicate a repeated sequence. (a) Comparison within H. influenzae. The 46-bp repeat sequence in Rd KW20 overlaps with 15 bp at the 3′ end of the HI0105 gene. (b) Comparison within H. influenzae. The 46-bp repeat sequence in Rd KW20 overlaps with 2 bp at the 3′ end of the HI1589 gene. (c) Comparison within T. thermophilus. Underlined sequences represent the entire tRNA coding region of the query genome in (c2), (d2) and (e2). (d) Comparison within V. vulnificus. The sequence in (d2) corresponds to the strand complementary to the tRNA. (e) Comparison within X. fastidiosa. In the original annotation of X. fastidiosa Temecula 1, the C-terminus of an integrase family gene (PD1606) overlapped with 144 bp of the C-terminus of the M gene homolog (PD1607). No overlap of the two genes occurs in the annotation of the same sequence in X. fastidiosa M23, shown here.
Figure 4. Allelic RM systems in a Xanthomonas locus. Homologous regions are indicated in gray.
Figure 5. Allelic diversity in Thermus and Rhodopseudomonas. (a) Allelic Type II systems in T. thermophilus HB8 and HB27. (a1) Alignment. Gray indicates sequence similarity. (a2) Codon usage and GC content of the third nucleotides of codons (GC3) of relevant RM genes and all HB8 genes. (a3) Codon usage and GC3 of the relevant M gene and all HB27 genes. (b) Allelic Type II M gene and Type IIG RM genes in Rhodopseudomonas palustris HaA2 and CGA009. (b1) Alignment. (b2) Codon usage and GC3 of HaA2 genes. (b3) Codon usage and GC3 of CGA009 genes. Black dots indicate another RM gene.
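GC3, as used in Figure 5, is the GC content at the third nucleotide of each codon. A minimal Python sketch of that calculation follows; the function name is hypothetical, and the choices to count every complete codon (including any stop codon) and to ignore a trailing partial codon are assumptions, not details from the paper.

```python
def gc3(cds):
    """Fraction of G or C at third codon positions of a coding sequence.

    Assumptions: `cds` starts in frame; a trailing partial codon is ignored;
    ambiguous bases such as N count as non-GC.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3      # drop an incomplete final codon
    third_bases = seq[2:usable:3]         # positions 3, 6, 9, ...
    if not third_bases:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)

# Codons ATG GCC AAA TTC have third bases G, C, A, C.
print(gc3("ATGGCCAAATTC"))  # 0.75
```

Comparing the GC3 of an RM gene with the distribution of GC3 over all genes of the same genome is one way to flag a gene whose composition differs from its host.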
Figure 6. Allelic diversity in the target recognition domain of Type IIG proteins. Homologous regions are in gray. (a) Alleles of a Type IIG RM gene at a C. jejuni locus. (b) Alleles of Type II M and Type IIG RM genes at an S. thermophilus locus.
Figure 7. Frequency of genes flanked by (a) direct or inverted repeats, (b) direct repeats and (c) inverted repeats. The vertical axis indicates the percentage of the 11 554 compared RM-system-flanking sequence pairs. See the ‘Materials and Methods’ section for control gene calculations. Black and white bars represent frequencies of flanking repeats and control genes, respectively. White circles indicate the ratio of RM systems to control genes for repeat frequency.
Figure 8. Screening and classification of RM rearrangements. (a) Screening procedure for RM systems flanked by repeats. The number of RM systems selected is boxed. See text for details. (b) Classification in the genome comparison analysis. (b1) Insertion with long target duplication or integration by site-specific recombination. Repeated sequences align with the sequence in the other genome. (b2) Substitution. The repeated sequences do not align with the other genome. (b3) Transposon-like structure. The outer, shorter, direct repeats (triangles) align with the other genome, but the inner, longer, inverted repeats (arrows) do not.
RM systems flanked by repeat sequences aligned with subject genome sequence
| Query sequence ID | Class or phylum | Compared sequence ID | Homology group | Left–right | Left–subject | Right–subject | Direction | RM type |
|---|---|---|---|---|---|---|---|---|
| (a) Insertion with long target duplication | | | | | | | | |
| NC_000921 | ε-proteobacteria | NC_008086 | Figure S7(a) | 417/423 | 345/370 | 343/370 | D | Type IIP |
| NC_007510 | β-proteobacteria | NC_011000 | Figure S7(b) | 384/412 | 233/263 | 234/270 | D | Type IV |
| NC_010698 | ε-proteobacteria | NC_000915 | Figure S7(c) | 388/397 | 372/397 | 371/397 | D | Type IIP |
| NC_008086 | ε-proteobacteria | NC_011333 | Figure S7(c) | 368/393 | 370/393 | 361/393 | D | Type IIP |
| NC_010698 | ε-proteobacteria | NC_008229 | Figure S7(d) | 224/238 | 140/153 | 201/223 | D | Type IIP |
| NC_009839 | ε-proteobacteria | NC_003912 | Figure S7(e) | 195/208 | 195/208 | 208/208 | D | Type III |
| NC_009707 | ε-proteobacteria | NC_008787 | Figure S7(e) | 106/110 | 105/107 | 98/107 | D | Type III |
| NC_002946 | β-proteobacteria | AM886294 | Figure S7(f) | 98/104 | 74/76 | 70/76 | D | Type IIS |
| NC_011035 | β-proteobacteria | AM886294 | Figure S7(f) | 99/104 | 70/76 | 75/76 | D | Type IIS |
| NC_008086 | ε-proteobacteria | NC_008229 | Figure S7(d) | 98/104 | 98/103 | 94/103 | D | Type IIP |
| NC_000921 | ε-proteobacteria | NC_008229 | Figure S7(d) | 99/103 | 84/90 | 82/90 | D | Type IIP |
| NC_000915 | ε-proteobacteria | NC_008229 | Figure S7(d) | 100/103 | 98/103 | 95/103 | D | Type IIP |
| NC_000915 | ε-proteobacteria | NC_008086 | Figure S7(g) | 83/96 | 67/73 | 55/63 | D | Type IIP |
| NC_000921 | ε-proteobacteria | NC_011498 | Figure S7(g) | 89/95 | 81/93 | 82/93 | D | Type IIP |
| NC_010296 | cyanobacteria | AM778953 | Figure S7(h) | 88/88 | 88/88 | 88/88 | D | Type II |
| NC_010698 | ε-proteobacteria | NC_011498 | Figure S7(i) | 59/64 | 60/62 | 57/62 | D | Type IIS,P |
| NC_008782 | β-proteobacteria | NC_011992 | Figure S7(j) | 57/64 | 36/38 | 37/41 | D | Type IIG |
| NC_010698 | ε-proteobacteria | NC_008086 | Figure S7(g) | 48/51 | 24/25 | 19/20 | D | Type IIP |
| NC_000915 | ε-proteobacteria | NC_011333 | Figure S7(k) | 41/41 | 37/37 | 37/37 | D | Type IIS |
| NC_007146 | γ-proteobacteria | NC_000907 | Figure S7(l) | 38/39 | 38/39 | 39/39 | D | Type I |
| NC_009567 | γ-proteobacteria | NC_000907 | Figure S7(l) | 38/39 | 38/39 | 39/39 | D | Type I |
| NC_009480 | actinobacteria | NC_010407 | Figure S7(m) | 29/29 | 23/25 | 23/25 | D | Type IIP |
| NC_009566 | γ-proteobacteria | NC_000907 | Figure S7(l) | 24/25 | 25/25 | 24/25 | D | Type I |
| (b) Substitution | | | | | | | | |
| NC_008751 | δ-proteobacteria | NC_002937 | Figure S6(a) | 37/43 | – | – | D | Type I |
| NC_003116 | β-proteobacteria | NC_011035 | Figure S6(b) | 23/24 | – | – | D | Type IIG |
| NC_000921 | ε-proteobacteria | NC_010698 | Figure S6(c) | 20/22 | – | – | D | Type IIP |
| (c) Transposon-like RM system | | | | | | | | |
| NC_006834 | γ-proteobacteria | NC_010717 | | 60/65 | – | – | I | Type IIP |
| NC_007705 | γ-proteobacteria | NC_010717 | | 60/65 | – | – | I | Type IIP |
| NC_011035 | β-proteobacteria | NC_003112 | | 26/26 | – | – | I | Type IIG |
Figure 9. RM systems flanked by direct repeats in one genome and with a corresponding empty site in another genome. White triangles indicate repeat sequences. (a)–(c) Subject genomes with an empty site are not depicted. (d) Insertion of partial M and R genes. Positions of motifs in M genes are depicted in squares.
Figure 10. RM system components flanked by inverted repeats. Black or white arrows indicate inverted repeats.
Figure 11. Transposon-like structure of RM systems flanked by repeat sequences. Triangles and arrows represent different sets of repeated sequences. (a) Type II RM genes in X. oryzae pv. oryzae KACC10331 are flanked by 65-bp inverted repeats (aligned below). The resulting unit is further flanked by 8-bp direct repeats (underlined), which are identical to the 8-bp sequence at the empty locus in X. oryzae pv. oryzae PXO99A. The short direct repeat sequences are flanked by part of the predicted recognition sequence of the RM system (boxed) in the other genome. (b) Type II genes in N. gonorrhoeae NCCP11945 are flanked by 26-bp inverted repeats (aligned below). The resulting unit is further flanked by 8-bp direct repeats, which are identical to the 8-bp sequence at the empty locus in N. meningitidis MC58.
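The distinction between direct and inverted flanking repeats, and identity values of the form matches/length such as 60/65, can be illustrated with a short Python sketch. It is an ungapped, position-by-position comparison of two equal-length sequences, which is a simplifying assumption; the values reported in the tables presumably come from sequence alignments. The function names and the example sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def repeat_identity(left, right, direction="D"):
    """Return (matches, length) for two equal-length flanking sequences.

    direction "D" compares the sequences as direct repeats;
    direction "I" compares `left` with the reverse complement of `right`.
    Assumption: no gaps are allowed.
    """
    if direction == "I":
        right = reverse_complement(right)
    elif direction != "D":
        raise ValueError("direction must be 'D' or 'I'")
    if len(left) != len(right):
        raise ValueError("ungapped comparison needs equal-length sequences")
    matches = sum(a == b for a, b in zip(left.upper(), right.upper()))
    return matches, len(left)

print(repeat_identity("ACGTACGT", "ACGTACGA", "D"))  # (7, 8)
print(repeat_identity("GGTACCAT", "ATGGTACC", "I"))  # (8, 8)
```

In a transposon-like arrangement such as the one described in this caption, the inner pair would be checked with direction "I" and the outer 8-bp pair with direction "D".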