Takakazu Kaneko, Hiroko Maita, Hideki Hirakawa, Nobukazu Uchiike, Kiwamu Minamisawa, Akiko Watanabe, Shusei Sato.
Abstract
The complete nucleotide sequence of the genome of the soybean symbiont Bradyrhizobium japonicum strain USDA6T was determined. The genome of USDA6T is a single circular chromosome of 9,207,384 bp. The genome size is similar to that of the genome of another soybean symbiont, B. japonicum USDA110 (9,105,828 bp). Comparison of the whole-genome sequences of USDA6T and USDA110 showed colinearity of major regions in the two genomes, although a large inversion exists between them. A significantly high level of sequence conservation was detected in three regions on each genome. The gene constitution and nucleotide sequence features in these three regions indicate that they may have been derived from a symbiosis island. An ancestral, large symbiosis island, approximately 860 kb in total size, appears to have been split into these three regions by unknown large-scale genome rearrangements. The two integration events responsible for this appear to have taken place independently, but through comparable mechanisms, in both genomes.
Year: 2011 PMID: 24710291 PMCID: PMC3927601 DOI: 10.3390/genes2040763
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Size of DNA fragments resulting from restriction digestion of the USDA6T genome.
| Strain (method) | Total size (kb) | Fragment sizes (kb) |
|---|---|---|
| USDA6T (PFGE-gel) | 9,295 | 2,200, 1,850, 1,800, 1,500, 740, 650, 460, 95 |
| USDA6T (computed) | 9,207 | 2,322, 1,702, 1,694, 1,407, 788, 682, 508, 103 |
| USDA110 (computed) | 9,106 | 2,152, 1,911, 1,481, 954, 784, 445, 413, 333, 162, 126, 124, 112, 106, 2, 1 |
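
As a rough consistency check on the table above, the fragment sizes in each USDA6T row can be summed and the PFGE estimate compared with the sequence-derived value. The following is a minimal sketch; the names `fragments_kb`, `total_kb`, and `percent_difference` are illustrative and not from the paper, and the values are copied from the table.

```python
# Fragment sizes (kb) copied from the table above.
fragments_kb = {
    "USDA6T (PFGE-gel)": [2200, 1850, 1800, 1500, 740, 650, 460, 95],
    "USDA6T (computed)": [2322, 1702, 1694, 1407, 788, 682, 508, 103],
}


def total_kb(sizes):
    """Sum the fragment sizes (kb) of one digest."""
    return sum(sizes)


def percent_difference(estimate, reference):
    """Signed difference of estimate from reference, as a percentage."""
    return 100.0 * (estimate - reference) / reference


pfge_total = total_kb(fragments_kb["USDA6T (PFGE-gel)"])
computed_total = total_kb(fragments_kb["USDA6T (computed)"])
print(pfge_total, computed_total)
print(round(percent_difference(pfge_total, computed_total), 2))
```

The computed fragments sum to 9,206 kb rather than the listed 9,207 kb, which is presumably a rounding effect of reporting each fragment to the nearest kb.
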
General features of USDA6T and USDA110 genomes.
| Feature | USDA6T | USDA110 |
|---|---|---|
| Size (bp) | 9,207,384 | 9,105,828 |
| G + C content (%) | 63.67 | 64.06 |
| tRNA coding genes | 51 | 50 |
| rRNA gene clusters | 2 | 1 |
| Protein-encoding genes | 8,829 | 8,317 |
| Gene density (bp per gene) | 1,043 | 1,095 |
| Genes assigned to COG | 5,859 | 5,834 |
| Not in COGs | 2,970 | 2,483 |
| Genomic islands | 15 | 14 |
| Insertion sequences | 69 | 104 |
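
The gene density values in the table are consistent with genome size divided by the number of protein-encoding genes. The sketch below reproduces them under that assumption; the function name `gene_density_bp` and the dictionary layout are illustrative only.

```python
# Genome sizes and protein-encoding gene counts copied from the table above.
genomes = {
    "USDA6T": {"size_bp": 9_207_384, "protein_genes": 8_829},
    "USDA110": {"size_bp": 9_105_828, "protein_genes": 8_317},
}


def gene_density_bp(size_bp, n_genes):
    """Average number of base pairs per protein-encoding gene (assumed definition)."""
    return size_bp / n_genes


for name, g in genomes.items():
    print(name, round(gene_density_bp(g["size_bp"], g["protein_genes"])))
```
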
Figure 1. Schematic representations of circular replicons in B. japonicum strains. (a) USDA6T chromosome and (b) USDA110. The scale towards the outside of each map indicates genomic location (in kb). The bars in the outermost circle and the second circle show the positions of the putative protein-encoding genes in clockwise and counter-clockwise directions, respectively. The putative genes are represented by 18 colors, based on Clusters of Orthologous Groups (COG) assignments, as described in Supplemental Figure 1. The third circle from the outside indicates positions of structural RNA genes. In the fourth circle, the black bars indicate areas of putative genomic islands inserted in a tRNA gene. Three red bars within the fifth circle represent regions corresponding to the possible symbiosis island. The sixth circle shows the distribution of insertion sequences (ISs), as black bars. The innermost and second circles from the center show the GC skew values (yellow and purple) and the average GC percentage (blue and red), respectively, calculated using a window size of 10 kb. The maps were drawn using the GenomeViz program.
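
The windowed GC statistics mentioned in the legend can be sketched as follows. This is a minimal illustration, not the GenomeViz implementation: it assumes non-overlapping windows and the common definition of GC skew as (G − C)/(G + C), and the function name `gc_windows` is hypothetical.

```python
def gc_windows(seq, window=10_000):
    """Yield (start, gc_percent, gc_skew) for consecutive non-overlapping windows.

    Assumes GC skew = (G - C) / (G + C); a window without G or C gets skew 0.0.
    """
    seq = seq.upper()
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_percent = 100.0 * (g + c) / len(chunk)
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        yield start, gc_percent, gc_skew


# Toy example with a 4-base window; a real genome would use window=10_000.
for start, gc, skew in gc_windows("GGCACCGT", window=4):
    print(start, gc, round(skew, 3))
```
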
Number of protein genes assigned into Clusters of Orthologous Groups (COG) categories.
| COG category | USDA110 entire genome | USDA6T entire genome | USDA6T GIs |
|---|---|---|---|
| J: Translation, ribosomal structure, and biogenesis | 197 (2.4%) | 203 (2.3%) | 1 (0.2%) |
| K: Transcription | 468 (5.6%) | 487 (5.5%) | 10 (1.7%) |
| L: Replication, recombination, and repair | 333 (4.0%) | 290 (3.3%) | 30 (5.1%) |
| D: Cell cycle control, cell division, and chromosome partitioning | 33 (0.4%) | 30 (0.3%) | 0 (0.0%) |
| T: Signal transduction mechanisms | 330 (4.0%) | 346 (3.9%) | 30 (5.1%) |
| M: Cell wall/membrane/envelope biogenesis | 263 (3.2%) | 245 (2.8%) | 9 (1.5%) |
| N: Cell motility | 243 (2.9%) | 233 (2.6%) | 9 (1.5%) |
| O: Posttranslational modification, protein turnover, and chaperones | 213 (2.6%) | 216 (2.4%) | 16 (2.7%) |
| C: Energy production and conversion | 419 (5.0%) | 411 (4.7%) | 4 (0.7%) |
| G: Carbohydrate transport and metabolism | 389 (4.7%) | 409 (4.6%) | 4 (0.7%) |
| E: Amino acid transport and metabolism | 717 (8.6%) | 710 (8.0%) | 14 (2.4%) |
| F: Nucleotide transport and metabolism | 93 (1.1%) | 89 (1.0%) | 0 (0.0%) |
| H: Coenzyme transport and metabolism | 183 (2.2%) | 179 (2.0%) | 0 (0.0%) |
| I: Lipid transport and metabolism | 289 (3.5%) | 271 (3.1%) | 2 (0.3%) |
| P: Inorganic ion transport and metabolism | 281 (3.4%) | 296 (3.4%) | 5 (0.8%) |
| Q: Secondary metabolites biosynthesis, transport and catabolism | 356 (4.3%) | 356 (4.0%) | 11 (1.9%) |
| R: General function prediction only | 583 (7.0%) | 599 (6.8%) | 26 (4.4%) |
| S: Function unknown | 444 (5.3%) | 489 (5.5%) | 19 (3.2%) |
| not in COGs | 2,483 (29.9%) | 2,970 (33.6%) | 402 (67.9%) |
| Total | 8,317 | 8,829 | 592 |
The percentage of assigned genes out of the total number of genes is shown in parentheses.
The number of predicted genes assigned inside fifteen GIs of USDA6T is shown.
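
The percentages in parentheses can be reproduced by dividing each category count by the column total. The sketch below does this for two rows of the table; the helper name `percent_of_total` is illustrative, and the counts are copied from the table.

```python
# Column totals copied from the "Total" row of the table above.
totals = {"USDA110": 8_317, "USDA6T": 8_829, "USDA6T GIs": 592}

# Counts for two example rows, copied from the table above.
rows = {
    "J": {"USDA110": 197, "USDA6T": 203, "USDA6T GIs": 1},
    "not in COGs": {"USDA110": 2_483, "USDA6T": 2_970, "USDA6T GIs": 402},
}


def percent_of_total(count, total):
    """Share of genes in a category, as a percentage rounded to one decimal."""
    return round(100.0 * count / total, 1)


for category, counts in rows.items():
    shares = {col: percent_of_total(n, totals[col]) for col, n in counts.items()}
    print(category, shares)
```

On these numbers, genes without a COG assignment make up roughly two thirds of the genes inside the genomic islands, compared with about one third of the whole USDA6T genome.
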
Figure 2. A typical phylogenetic tree using orthologous genes with a single linkage cluster. A gene product (BJ6T39800: recombinase) from USDA6T was selected based on an ortholog cluster corresponding to the phylogenetic pattern of ITS sequences in Bradyrhizobiaceae. Orthologs from USDA110 and other Bradyrhizobium sp. members (BTAi1 and ORS278) were analyzed using the neighbor-joining method, and the resulting phylogenetic tree is depicted. An ortholog from M. loti MAFF303099 was used as the outgroup for this tree.
Figure 3. Comparison of USDA6T protein-encoding genes with the USDA110 genome sequence, at the nucleotide sequence level. The plot was generated by using the results of similarity searches of protein-coding regions between USDA6T and USDA110 genome sequences by BLASTN. A total of 6,789 genes with significant similarity were divided into 23 groups based on the % identity in the alignments. The horizontal axis represents the range of % identity in each alignment. The vertical axis represents the number of genes with significant similarity.
Distribution of Insertion Sequences (ISs) in the genomes of USDA6T and USDA110.
| IS name | USDA110 entire genome | USDA110 symbiosis island A | USDA110 symbiosis island C | USDA6T entire genome | USDA6T symbiosis island A | USDA6T symbiosis island C |
|---|---|---|---|---|---|---|
| RSalpha | 15 | 8 | 2 | 9 | 7 | 2 |
| RSbeta | 12 | 7 | 4 | 7 | 3 | 4 |
| FK1 | 6 | 5 | 1 | 5 | 4 | 1 |
| IS1632 | 2 | 2 | 0 | 2 | 2 | 0 |
| ISB20 | 3 | 2 | 1 | 2 | 1 | 1 |
| ISB27 | 3 | 2 | 1 | 3 | 2 | 1 |
| ISBj2 | 10 | 9 | 0 | 10 | 8 | 2 |
| ISBj3 | 4 | 2 | 0 | 1 | 1 | 0 |
| ISBj4 | 6 | 0 | 1 | 1 | 0 | 1 |
| ISBj5 | 5 | 1 | 2 | 5 | 1 | 2 |
| ISBj6 | 1 | 0 | 1 | 0 | 0 | 0 |
| ISBj7 | 11 | 9 | 2 | 10 | 8 | 2 |
| ISBj8 | 4 | 2 | 2 | 5 | 4 | 1 |
| ISBj9 | 2 | 0 | 1 | 0 | 0 | 0 |
| ISBj10 | 1 | 1 | 0 | 2 | 0 | 2 |
| ISBj11 | 5 | 3 | 0 | 1 | 1 | 0 |
| ISBj12 | 1 | 1 | 0 | 2 | 0 | 2 |
| ISBj13 | 3 | 3 | 0 | 1 | 1 | 0 |
| ISBj14 | 7 | 3 | 0 | 1 | 0 | 1 |
| ISBj15 | 3 | 2 | 1 | 2 | 2 | 0 |
| Total | 104 | 62 | 19 | 69 | 45 | 22 |
Figure 4. Percent identity plot between USDA110 and USDA6T genome sequences. (a) Linear pairwise comparison of regions corresponding to “Locus A” containing “distinct identical genes” from the two genomes. The gray horizontal bars (with a scale corresponding to each genome sequence position) represent the reverse strand of the USDA6T genome and the forward strand of the USDA110 genome. Regions with alignment up to an E-value of 10⁻⁴ are represented by highlighted connecting colored lines between USDA6T and USDA110. Colors indicate the % nucleotide identity in the alignment output by BLASTN, according to the vertical scale on the right. Open triangles indicate the positions of tRNA genes on the genome. Arrowheads indicate the locations of gene clusters involved in symbiosis or nitrogen fixation; (b) Linear pairwise comparison of regions corresponding to “Locus C”.
List of genomic islands of USDA6T.
| GI name | tRNA gene (gene_ID) | GI-left end | GI-right end | GI-length (bp) | GC% | duplication length (bp) | gene_ID of integrase |
|---|---|---|---|---|---|---|---|
| BJ6TGI01 | | 911,924 | 937,940 | 26,017 | 55.7 | 50 | BJ6T08690 |
| BJ6TGI02 | | 1,307,833 | 1,310,731 | 2,899 | 62.7 | 18 | - |
| BJ6TGI03 | | 2,046,607 | 2,146,757 | 100,151 | 59.7 | 48 | BJ6T20830 |
| BJ6TGI04 | | 3,118,227 | 3,192,248 | 74,022 | 59.5 | 46 | BJ6T31070 |
| BJ6TGI05 | | 3,192,295 | 3,236,376 | 44,082 | 60.9 | 46 | BJ6T31620 |
| BJ6TGI06 | | 3,236,423 | 3,322,375 | 85,953 | 59.0 | 46 | - |
| BJ6TGI07 | | 3,322,421 | 3,326,439 | 4,019 | 58.6 | 45 | BJ6T32590 |
| BJ6TGI08 | | 4,371,641 | 4,428,687 | 57,047 | 58.8 | 47 | BJ6T42570 |
| BJ6TGI09 | | 5,251,688 | 5,269,566 | 17,879 | 58.4 | 14 | BJ6T51650 |
| BJ6TGI10 | | 5,380,012 | 5,394,916 | 14,905 | 59.3 | 53 | BJ6T52900 |
| BJ6TGI11 | | 7,326,115 | 7,332,736 | 6,622 | 62.7 | 47 | - |
| BJ6TGI12 | | 7,332,777 | 7,363,200 | 30,424 | 62.1 | 39 | - |
| BJ6TGI13 | | 7,790,844 | 7,797,321 | 6,478 | 61.1 | 48 | - |
| BJ6TGI14 | | 8,494,412 | 8,539,486 | 45,075 | 58.5 | 49 | BJ6T82530 |
| BJ6TGI15 | | 8,539,534 | 8,591,258 | 51,725 | 59.5 | 47 | BJ6T82650 |
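
The GI lengths in the table are consistent with inclusive, 1-based coordinates, that is, length = right end − left end + 1. The sketch below checks this for three rows; the helper name `inclusive_length` is illustrative, and the coordinates are copied from the table.

```python
# (left end, right end, reported length in bp) for three rows of the table above.
islands = {
    "BJ6TGI01": (911_924, 937_940, 26_017),
    "BJ6TGI02": (1_307_833, 1_310_731, 2_899),
    "BJ6TGI15": (8_539_534, 8_591_258, 51_725),
}


def inclusive_length(left, right):
    """Length of a region whose left and right coordinates are both included."""
    return right - left + 1


for name, (left, right, reported) in islands.items():
    print(name, inclusive_length(left, right), reported)
```
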
Figure 5. Comparative genomic analysis among three strains: USDA6T, USDA110, and BTAi1. The total non-redundant number of deduced proteins from the three strains is 13,837. The number of proteins per genome is given inside the circles representing the bacterial strains. The overlapping sections indicate shared numbers of proteins. The proportion of the entire protein number is shown in parentheses.