Michael Black, Paula Moolhuijzen, Brett Chapman, Roberto Barrero, John Howieson, Mariangela Hungria, Matthew Bellgard.
Abstract
The symbiotic relationship between legumes and nitrogen-fixing bacteria is critical for agriculture, as it may have profound impacts on lowering costs for farmers, on land sustainability, on soil quality, and on mitigation of greenhouse gas emissions. However, despite the importance of these symbioses to the global nitrogen cycle, very few rhizobial genomes have been sequenced so far, although there are some ongoing efforts to sequence elite strains. In this study, the genomes of fourteen selected strains of the order Rhizobiales, all previously fully sequenced and annotated, were compared to assess differences between the strains and to investigate the feasibility of defining a core 'symbiome': the essential genes required by all rhizobia for nodulation and nitrogen fixation. Comparison of these whole genomes revealed valuable information, such as several events of lateral gene transfer, particularly in the symbiotic plasmids and genomic islands, which have contributed to a better understanding of the evolution of contrasting symbioses. Unique genes were also identified, as well as the absence of symbiotic genes that were expected to be found. Protein comparisons also revealed a variety of similarities and differences in several groups of genes, including those involved in nodulation, nitrogen fixation, exopolysaccharide production, and Type I to Type VI secretion systems, among others, and identified some key genes that could be related to host specificity and/or a better saprophytic ability. However, while several significant differences in the type and number of proteins were observed, the evidence presented suggests that no simple core symbiome exists. A more abstract, systems-biology concept of nitrogen-fixing symbiosis may be required. The results also highlight that comparative genomics is a valuable tool for capturing the specificities and generalities of each genome.
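The core 'symbiome' question can be phrased as set operations over gene-family presence: the strict core is the intersection across all genomes, the pan-genome is the union, and a relaxed core keeps families found in at least k of n genomes. The sketch below is a minimal illustration, assuming each genome has already been reduced to a set of ortholog-cluster IDs; the genome names and family IDs are hypothetical placeholders, not data from this study.

```python
from collections import Counter

# Hypothetical input: genome -> set of ortholog-cluster (gene-family) IDs.
# These IDs are illustrative placeholders, not results from this study.
presence = {
    "genome_A": {"F1", "F2", "F3", "F4", "F5", "F6"},
    "genome_B": {"F1", "F2", "F3", "F4", "F5"},
    "genome_C": {"F4", "F5", "F7"},
}


def core_families(presence):
    """Families found in every genome (strict core)."""
    return set.intersection(*presence.values())


def pan_families(presence):
    """Families found in at least one genome (pan-genome)."""
    return set().union(*presence.values())


def families_in_at_least(presence, k):
    """Families found in at least k genomes (a relaxed 'soft' core)."""
    counts = Counter(f for fams in presence.values() for f in fams)
    return {f for f, c in counts.items() if c >= k}


print(sorted(core_families(presence)))            # ['F4', 'F5']
print(len(pan_families(presence)))                # 7
print(sorted(families_in_at_least(presence, 2)))  # ['F1', 'F2', 'F3', 'F4', 'F5']
```

A strict intersection is sensitive to a single missing or misannotated gene in one genome, which is one reason a k-of-n threshold is often used instead.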
Year: 2012 PMID: 24704847 PMCID: PMC3899959 DOI: 10.3390/genes3010138
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
RefSeq identifiers for the fourteen Rhizobiales genomes.
| Species Code | Chromosome | Plasmid 1 | Plasmid 2 | Plasmid 3 | Plasmid 4 | Plasmid 5 | Plasmid 6 |
|---|---|---|---|---|---|---|---|
| azc | NC_009937 | - | - | - | - | - | - |
| bja | NC_004463 | - | - | - | - | - | - |
| bbt | NC_009485 | NC_009475 | - | - | - | - | - |
| bra | NC_009445 | - | - | - | - | - | - |
| mlo | NC_002678 | NC_002679 | NC_002682 | - | - | - | - |
| mes | NC_008254 | NC_008242 | NC_008243 | NC_008244 | - | - | - |
| ret | NC_007761 | NC_007762 | NC_007763 | NC_007764 | NC_004041 | NC_007765 | NC_007766 |
| rec | NC_010994 | NC_010998 | NC_010996 | NC_010997 | - | - | - |
| rlg | NC_012850 | NC_012848 | NC_012858 | NC_012853 | NC_012852 | NC_012854 | - |
| rlt | NC_011369 | NC_011368 | NC_011366 | NC_011370 | NC_011371 | - | - |
| rle | NC_008380 | NC_008382 | NC_008383 | NC_008379 | NC_008381 | NC_008384 | NC_008378 |
| rhi | NC_012587 | NC_000914 | NC_012586 | - | - | - | - |
| smd | NC_009636 | NC_009620 | NC_009621 | NC_009622 | - | - | - |
| sme | NC_003047 | NC_003037 | NC_003078 | - | - | - | - |
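To work with these replicons locally, each row's accessions can be requested from NCBI in one call. A minimal sketch, assuming the NCBI E-utilities `efetch` endpoint with `db=nuccore` and FASTA output; the `ROWS` sample and the helper name `efetch_fasta_url` are hypothetical, and the code only builds request URLs rather than downloading anything.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# A few rows from the table above: chromosome accession first, then plasmids.
ROWS = [
    ["NC_009937"],
    ["NC_009485", "NC_009475"],
    ["NC_003047", "NC_003037", "NC_003078"],
]


def efetch_fasta_url(accessions):
    """Build one efetch URL that returns all replicons of a genome as FASTA text."""
    params = {
        "db": "nuccore",
        "id": ",".join(accessions),  # comma-separated list; urlencode escapes the commas
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


for row in ROWS:
    print(f"{row[0]}: {len(row) - 1} plasmid(s)")
    print(efetch_fasta_url(row))
```

Passing such a URL to `urllib.request.urlopen` would return the FASTA text; NCBI asks E-utilities users to stay within its request-rate limits, so bulk downloads should be throttled.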
Summary statistics of the fourteen Rhizobiales genomes.
| Species Code | No. of Plasmids | Total Genome Length (bp) | Protein-Coding Genes | tRNA Genes | Pseudogenes | GC Content (%) | Proportion of Genome that is Gene Coding (%) | Host (scientific/common name) |
|---|---|---|---|---|---|---|---|---|
| azc | 0 | 5,369,772 | 4717 | 63 | - | 67 | 89 | |
| bja | 0 | 9,105,828 | 8317 | 56 | - | 64 | 86 | |
| bbt | 1 | 8,493,513 | 7621 | 70 | 90 | 64 | 85 | |
| bra | 0 | 7,456,587 | 6717 | 66 | 35 | 65 | 85 | |
| mlo | 2 | 7,596,297 | 7272 | 57 | - | 62 | 86 | |
| mes | 3 | 4,935,185 | 4543 | 68 | 40 | 61 | 89 | |
| ret | 6 | 6,530,228 | 5963 | 59 | 32 | 61 | 86 | |
| rec | 3 | 6,448,048 | 6056 | 60 | 15 | 61 | 86 | |
| rlg | 5 | 7,418,122 | 7001 | 63 | 75 | 61 | 86 | |
| rlt | 4 | 6,872,702 | 6415 | 65 | 45 | 61 | 86 | |
| rle | 6 | 7,751,309 | 7143 | 61 | 37 | 61 | 86 | Tribe Viciae – |
| rhi | 2 | 6,891,900 | 6363 | 70 | - | 63 | 87 | 112 legume species and the non-legume |
| smd | 3 | 6,817,576 | 6213 | 63 | 43 | 61 | 87 | |
| sme | 2 | 6,691,694 | 6218 | 66 | 4 | 62 | 86 | |
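The GC content and gene-coding proportion columns are simple ratios over the assembled sequence. The sketch below shows one common way to compute them, assuming a nucleotide string (case-insensitive) and gene coordinates given as 1-based inclusive `(start, end)` pairs with `start <= end`; overlapping genes are merged so shared bases are counted once. The function names are hypothetical, genes spanning the origin of a circular replicon are not handled, and the exact procedure behind the table's values is not stated in this entry.

```python
def gc_content(seq):
    """Percentage of G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt if acgt else 0.0


def coding_percentage(genome_length, genes):
    """Percentage of the genome covered by at least one gene.

    genes: iterable of (start, end), 1-based inclusive, start <= end.
    Overlapping or adjacent intervals are merged before summing.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(genes):
        if cur_end is None or start > cur_end + 1:
            # Gap before this gene: close the current merged interval.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlaps or touches the current interval: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / genome_length


print(round(gc_content("ATGCGCNN"), 2))                       # 66.67
print(coding_percentage(100, [(1, 10), (5, 20), (50, 59)]))  # 30.0
```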
Figure 1. 16S rRNA gene tree built from all 16S rRNA gene copies in each of the fourteen genomes compared. Taxonomic analysis was performed with MEGA5 [68] by constructing a bootstrapped neighbor-joining (NJ) gene tree under the Jukes-Cantor substitution model.
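The Jukes-Cantor model corrects the observed proportion of differing sites p for multiple substitutions at the same site, giving d = -(3/4) ln(1 - 4p/3); the distance is undefined once p reaches 0.75. A minimal sketch of the pairwise distance that would feed an NJ tree, assuming pre-aligned sequences and pairwise deletion of gaps and ambiguous bases (MEGA5 offers several gap-handling options, and which one was used is not stated here). The function names are illustrative.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites, skipping positions where either base is not A/C/G/T."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def jukes_cantor(a, b):
    """Jukes-Cantor corrected distance; infinite when p >= 0.75 (saturation)."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(round(jukes_cantor("ACGTACGTAC", "ACGTACGTAA"), 4))  # 0.1073
```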
Figure 2. FRECKLE DNA dotplot of the fourteen Rhizobiales chromosomes, constructed from nucleotide FASTA files using an in-house script.
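The in-house script behind FRECKLE is not described here, but the idea of a DNA dotplot can be sketched with exact word matches: every position pair where two sequences share a k-mer becomes a dot, and matches against the reverse complement reveal inversions. This is a generic k-mer dotplot under that assumption, not the FRECKLE method; the function names and the default `k = 11` are hypothetical choices.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s):
    """Reverse complement of an uppercase DNA string."""
    return s.translate(COMPLEMENT)[::-1]


def kmer_dotplot(x, y, k=11):
    """Return (forward, reverse) lists of (i, j) dots.

    forward: x[i:i+k] == y[j:j+k]
    reverse: revcomp(x[i:i+k]) == y[j:j+k]  (inverted matches)
    """
    x, y = x.upper(), y.upper()
    # Index every k-mer position in y once, then look up each k-mer of x.
    index = defaultdict(list)
    for j in range(len(y) - k + 1):
        index[y[j:j + k]].append(j)
    forward, reverse = [], []
    for i in range(len(x) - k + 1):
        word = x[i:i + k]
        forward.extend((i, j) for j in index.get(word, ()))
        reverse.extend((i, j) for j in index.get(revcomp(word), ()))
    return forward, reverse


fwd, rev = kmer_dotplot("AAACCCGGG", "AAACCCGGG", k=3)
print(rev)  # [(3, 6), (4, 5), (5, 4), (6, 3)]
```

For whole chromosomes the dot lists become very large, so in practice dots are usually binned to the plot's pixel resolution before drawing.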
Summary of KEGG pathway and protein orthologs in the fourteen Rhizobiales genomes.
| Relevant KEGG Pathway (number of KEGG protein orthologs) | azc | bja | bbt | bra | mlo | mes | ret | rec | rlg | rlt | rle | rhi | smd | sme |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.1 Carbohydrate Metabolism | 240 | 295 | 295 | 296 | 274 | 245 | 249 | 279 | 257 | 256 | 299 | 262 | 267 | 273 |
| 1.2 Energy Metabolism | 114 | 147 | 146 | 145 | 113 | 114 | 112 | 111 | 99 | 99 | 118 | 111 | 124 | 138 |
| 1.3 Lipid Metabolism | 52 | 63 | 72 | 72 | 72 | 53 | 61 | 76 | 58 | 57 | 81 | 56 | 58 | 71 |
| 1.4 Nucleotide Metabolism | 102 | 96 | 100 | 99 | 111 | 105 | 100 | 108 | 97 | 99 | 112 | 97 | 98 | 114 |
| 1.5 Amino Acid Metabolism | 221 | 249 | 268 | 258 | 275 | 224 | 235 | 261 | 227 | 241 | 272 | 239 | 236 | 253 |
| 1.6 Metabolism of Other Amino Acids | 50 | 58 | 62 | 57 | 60 | 48 | 54 | 58 | 54 | 57 | 58 | 56 | 54 | 62 |
| 1.7 Glycan Biosynthesis and Metabolism | 30 | 29 | 31 | 33 | 32 | 25 | 33 | 35 | 32 | 34 | 36 | 31 | 22 | 32 |
| 1.8 Metabolism of Cofactors and Vitamins | 117 | 132 | 143 | 137 | 131 | 103 | 122 | 126 | 114 | 117 | 130 | 119 | 117 | 124 |
| 1.9 Biosynthesis of Polyketides and Terpenoids | 29 | 29 | 42 | 43 | 29 | 27 | 28 | 36 | 28 | 27 | 34 | 27 | 27 | 34 |
| 1.10 Biosynthesis of Other Secondary Metabolites | 8 | 21 | 32 | 35 | 31 | 20 | 27 | 34 | 26 | 26 | 37 | 26 | 23 | 28 |
| 1.11 Xenobiotics Biodegradation and Metabolism | 85 | 134 | 166 | 163 | 97 | 72 | 58 | 129 | 60 | 59 | 138 | 55 | 53 | 101 |
| 2.1 Transcription | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 2.2 Translation | 134 | 131 | 128 | 129 | 130 | 123 | 128 | 128 | 129 | 129 | 127 | 124 | 129 | 132 |
| 2.3 Folding, Sorting and Degradation | 40 | 40 | 38 | 38 | 36 | 34 | 36 | 37 | 38 | 38 | 37 | 36 | 38 | 38 |
| 2.4 Replication and Repair | 69 | 74 | 71 | 71 | 70 | 70 | 71 | 71 | 73 | 71 | 73 | 70 | 71 | 71 |
| 3.1 Membrane Transport | 119 | 141 | 121 | 114 | 172 | 135 | 159 | 163 | 138 | 131 | 162 | 165 | 148 | 170 |
| 3.2 Signal Transduction | 57 | 61 | 55 | 57 | 50 | 39 | 50 | 55 | 48 | 48 | 53 | 49 | 46 | 49 |
| 4.2 Cell Motility | 38 | 41 | 43 | 43 | 34 | 38 | 40 | 41 | 40 | 40 | 41 | 40 | 40 | 40 |
| Total KEGG Protein Orthologs | 1509 | 1745 | 1817 | 1794 | 1721 | 1479 | 1567 | 1752 | 1522 | 1533 | 1812 | 1567 | 1555 | 1734 |
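Because the last row is a per-genome total, the table can be checked by summing each column. A small sketch, assuming the rows above are transcribed into Python lists in the column order azc to sme; it recomputes the totals and reports any genome whose sum differs from the reported value. The names `COUNTS`, `REPORTED_TOTALS`, and `column_totals` are hypothetical.

```python
CODES = ["azc", "bja", "bbt", "bra", "mlo", "mes", "ret",
         "rec", "rlg", "rlt", "rle", "rhi", "smd", "sme"]

# Ortholog counts per KEGG category, transcribed from the table above (order = CODES).
COUNTS = {
    "1.1 Carbohydrate Metabolism": [240, 295, 295, 296, 274, 245, 249, 279, 257, 256, 299, 262, 267, 273],
    "1.2 Energy Metabolism": [114, 147, 146, 145, 113, 114, 112, 111, 99, 99, 118, 111, 124, 138],
    "1.3 Lipid Metabolism": [52, 63, 72, 72, 72, 53, 61, 76, 58, 57, 81, 56, 58, 71],
    "1.4 Nucleotide Metabolism": [102, 96, 100, 99, 111, 105, 100, 108, 97, 99, 112, 97, 98, 114],
    "1.5 Amino Acid Metabolism": [221, 249, 268, 258, 275, 224, 235, 261, 227, 241, 272, 239, 236, 253],
    "1.6 Metabolism of Other Amino Acids": [50, 58, 62, 57, 60, 48, 54, 58, 54, 57, 58, 56, 54, 62],
    "1.7 Glycan Biosynthesis and Metabolism": [30, 29, 31, 33, 32, 25, 33, 35, 32, 34, 36, 31, 22, 32],
    "1.8 Metabolism of Cofactors and Vitamins": [117, 132, 143, 137, 131, 103, 122, 126, 114, 117, 130, 119, 117, 124],
    "1.9 Biosynthesis of Polyketides and Terpenoids": [29, 29, 42, 43, 29, 27, 28, 36, 28, 27, 34, 27, 27, 34],
    "1.10 Biosynthesis of Other Secondary Metabolites": [8, 21, 32, 35, 31, 20, 27, 34, 26, 26, 37, 26, 23, 28],
    "1.11 Xenobiotics Biodegradation and Metabolism": [85, 134, 166, 163, 97, 72, 58, 129, 60, 59, 138, 55, 53, 101],
    "2.1 Transcription": [4] * 14,
    "2.2 Translation": [134, 131, 128, 129, 130, 123, 128, 128, 129, 129, 127, 124, 129, 132],
    "2.3 Folding, Sorting and Degradation": [40, 40, 38, 38, 36, 34, 36, 37, 38, 38, 37, 36, 38, 38],
    "2.4 Replication and Repair": [69, 74, 71, 71, 70, 70, 71, 71, 73, 71, 73, 70, 71, 71],
    "3.1 Membrane Transport": [119, 141, 121, 114, 172, 135, 159, 163, 138, 131, 162, 165, 148, 170],
    "3.2 Signal Transduction": [57, 61, 55, 57, 50, 39, 50, 55, 48, 48, 53, 49, 46, 49],
    "4.2 Cell Motility": [38, 41, 43, 43, 34, 38, 40, 41, 40, 40, 41, 40, 40, 40],
}
REPORTED_TOTALS = [1509, 1745, 1817, 1794, 1721, 1479, 1567,
                   1752, 1522, 1533, 1812, 1567, 1555, 1734]


def column_totals(counts):
    """Sum each genome's column across all categories."""
    return [sum(column) for column in zip(*counts.values())]


totals = column_totals(COUNTS)
mismatches = {code: (got, want)
              for code, got, want in zip(CODES, totals, REPORTED_TOTALS) if got != want}
print(mismatches or "all totals match")  # all totals match
```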
Summary of selected MCL BLASTline protein cluster groups.
| Cluster Family | Clusters | Proteins | Proteins in Chromosomes | Proteins in Plasmids | Percent in Chromosomes | Percent in Plasmids |
|---|---|---|---|---|---|---|
| Pan-genome (all 14) | 1126 | 28110 | 23686 | 4424 | 84.26 | 15.74 |
| 13 Fix+ genomes | 105 | 1126 | 619 | 507 | 54.97 | 45.03 |
| 11 NodABC+ genomes | 9 | 113 | 60 | 53 | 53.10 | 46.90 |
|  | 206 | 1150 | 1141 | 9 | 99.22 | 0.78 |
|  | 857 | 2981 | 2976 | 5 | 99.83 | 0.17 |
|  | 577 | 1224 | 1224 | 0 | 100.00 | 0.00 |
|  | 86 | 192 | 167 | 25 | 86.98 | 13.02 |
|  | 214 | 2424 | 2081 | 343 | 85.85 | 14.15 |
|  | 161 | 1661 | 1140 | 521 | 68.63 | 31.37 |
|  | 155 | 1347 | 1122 | 225 | 83.30 | 16.70 |
|  | 51 | 347 | 191 | 156 | 55.04 | 44.96 |
|  | 92 | 286 | 197 | 89 | 68.88 | 31.12 |
|  | 253 | 555 | 182 | 373 | 32.79 | 67.21 |
|  | 6 | 21 | 5 | 16 | 23.81 | 76.19 |
|  | 242 | 767 | 476 | 291 | 62.06 | 37.94 |
|  | 123 | 262 | 71 | 191 | 27.10 | 72.90 |
|  | 352 | 1866 | 1436 | 430 | 76.96 | 23.04 |
|  |  |  |  |  |  |  |
|  | 956 | 986 | 986 | 0 | 100.00 | 0.00 |
|  | 1760 | 1839 | 1839 | 0 | 100.00 | 0.00 |
|  | 1051 | 1107 | 1005 | 102 | 90.79 | 9.21 |
|  | 959 | 987 | 987 | 0 | 100.00 | 0.00 |
|  | 1706 | 1809 | 1613 | 196 | 89.17 | 10.83 |
|  | 857 | 899 | 761 | 138 | 84.65 | 15.35 |
|  | 333 | 344 | 196 | 148 | 56.98 | 43.02 |
|  | 563 | 563 | 374 | 189 | 66.43 | 33.57 |
|  | 576 | 602 | 260 | 342 | 43.19 | 56.81 |
|  | 490 | 492 | 286 | 206 | 58.13 | 41.87 |
|  | 542 | 550 | 276 | 274 | 50.18 | 49.82 |
|  | 758 | 797 | 356 | 441 | 44.67 | 55.33 |
|  | 598 | 639 | 317 | 322 | 49.61 | 50.39 |
|  | 472 | 498 | 170 | 328 | 34.14 | 65.86 |
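The cluster groups above come from MCL (Markov clustering) applied to an all-against-all BLAST similarity graph. MCL alternates expansion (squaring the column-stochastic matrix, which spreads random-walk flow) and inflation (raising entries to a power r and renormalizing, which strengthens strong flows and weakens weak ones) until the matrix stops changing; clusters are then read from the rows that retain mass. The sketch below is a naive dense-matrix version for small graphs, assuming a symmetric similarity matrix whose weights might be, for instance, -log10 BLAST E-values; the parameter defaults are illustrative, and the production `mcl` program uses sparse matrices and more careful pruning.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (in place); all-zero columns are left alone."""
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                m[i][j] /= s
    return m


def matmul(a, b):
    """Dense square matrix product, O(n^3); fine only for toy inputs."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(sim, inflation=2.0, max_iter=100, tol=1e-6, prune=1e-9):
    """Naive Markov clustering of a symmetric similarity matrix; returns index lists."""
    n = len(sim)
    # Add self-loops, then make the matrix column-stochastic.
    m = [[float(sim[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    normalize_columns(m)
    for _ in range(max_iter):
        prev = m
        m = matmul(m, m)                                      # expansion (power 2)
        m = [[v ** inflation for v in row] for row in m]      # inflation
        normalize_columns(m)
        m = [[v if v > prune else 0.0 for v in row] for row in m]  # drop tiny flows
        normalize_columns(m)
        if max(abs(m[i][j] - prev[i][j]) for i in range(n) for j in range(n)) < tol:
            break
    # Each row that keeps mass is an attractor; its nonzero columns form one cluster.
    clusters, seen = [], set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 0)
        if members and members not in seen:
            seen.add(members)
            clusters.append(sorted(members))
    return clusters


# Two disconnected triangles: flow can never cross between them.
sim = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(sim))  # [[0, 1, 2], [3, 4, 5]]
```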
Figure 3. Phylogeny of NodD proteins from the Rhizobiales genomes in this study, inferred as a neighbor-joining (NJ) gene tree from a BLOSUM-62 matrix alignment of the proteins in the NodD cluster. The number in each sequence label is the GenBank protein GI number. Colored lines indicate the genus of each genome, as coded in Tables S1 to S18, and mark confirmed NodD proteins; black lines indicate proteins that are only putative members of the LysR family.
Figure 4. Phylogeny of NifN/K proteins from the Rhizobiales genomes in this study, inferred as a neighbor-joining (NJ) gene tree from a BLOSUM-62 matrix alignment of the proteins in the NifN/K cluster. The number in each sequence label is the GenBank protein GI number. Colored lines indicate the genus of each genome, as coded in Tables S1 to S18.
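The NodD and NifN/K trees (like the 16S tree in Figure 1) use neighbor joining. Given a pairwise distance matrix, NJ repeatedly joins the pair (i, j) that minimizes Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of row i, and replaces the pair with a new node whose distance to each remaining node k is (d(i, k) + d(j, k) - d(i, j)) / 2. The sketch below implements that textbook procedure, assuming the distances are already computed (how the study converted its BLOSUM-62 alignments into distances is not described here) and using a standard five-taxon toy matrix rather than data from this study; ties in Q are broken by first occurrence, and no bootstrapping is performed.

```python
def neighbor_joining(names, dist):
    """Textbook neighbor joining; returns an unrooted tree in Newick format."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = [[float(v) for v in row] for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]
        # Pick the pair minimizing the Q criterion (first occurrence wins ties).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to their new parent node.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        parent = f"({nodes[i]}:{li:.4f},{nodes[j]}:{lj:.4f})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distances from the new parent to every remaining node.
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] for a in keep]
        for row, value in zip(d, new_row):
            row.append(value)
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [parent]
    # Resolve the last three nodes as an unrooted trifurcation.
    l0 = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    l1 = d[0][1] - l0
    l2 = d[0][2] - l0
    return f"({nodes[0]}:{l0:.4f},{nodes[1]}:{l1:.4f},{nodes[2]}:{l2:.4f});"


names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(names, dist))
# (d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);
```

Because this toy matrix is additive, NJ recovers the branch lengths exactly; real alignment-derived distances are rarely additive, so negative or zero branch lengths can appear and are usually clamped or reported as-is.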