Marike Palmer, Pieter de Maayer, Michael Poulsen, Emma T Steenkamp, Elritha van Zyl, Teresa A Coutinho, Stephanus N Venter.
Abstract
The genus Pantoea incorporates many economically and clinically important species. The plant-associated species, Pantoea agglomerans and Pantoea vagans, are closely related and are often isolated from similar environments. Plasmids conferring certain metabolic capabilities are also shared between these two species. The genomes of two isolates obtained from fungus-growing termites in South Africa were sequenced, assembled and annotated. A high number of orthologous genes are conserved within and between these species. The difference in genome size between P. agglomerans MP2 (4,733,829 bp) and P. vagans MP7 (4,598,703 bp) can largely be attributed to the differences in plasmid content. The genome sequences of these isolates may shed light on the common traits that enable P. agglomerans and P. vagans to co-occur in plant- and insect-associated niches.
Keywords: Bacteria; Insect; Pantoea; Symbiosis
Year: 2016 PMID: 26937267 PMCID: PMC4774006 DOI: 10.1186/s40793-016-0144-z
Source DB: PubMed Journal: Stand Genomic Sci ISSN: 1944-3277
Fig. 1 Photomicrographs of source organisms. The source organisms for (a) P. agglomerans MP2 and (b) P. vagans MP7, stained with safranin
Classification and general features of P. agglomerans MP2 and P. vagans MP7
| MIGS ID | Property | P. agglomerans MP2 | Evidence code^a | P. vagans MP7 | Evidence code^a |
|---|---|---|---|---|---|
| | Classification | Bacteria | NAS | Bacteria | NAS |
| | | | NAS | | NAS |
| | | | NAS | | NAS |
| | | | NAS | | NAS |
| | | | NAS | | NAS |
| | | | NAS | | NAS |
| | | | NAS | | NAS |
| | Gram stain | Negative | NAS | Negative | NAS |
| | Cell shape | Straight rods | NAS | Short rods | NAS |
| | Motility | Motile | NAS | Motile | NAS |
| | Sporulation | Non-sporeforming | NAS | Non-sporeforming | NAS |
| | Temperature range | Mesophile | NAS | Mesophile | NAS |
| | Optimum temperature | 30 °C | NAS | 30 °C | NAS |
| | pH range; Optimum | 4–8; 5–6 | IDA | 4–9; 5–6 | IDA |
| | Carbon source | D-Glucose, L-arabinose, D-galactose, maltose, D-mannitol, D-mannose, L-rhamnose, sucrose, trehalose, D-xylose | NAS | Malonic acid, L-ornithine, D-glucose, L-arabinose, D-ribose, D-galactose, sucrose, maltose | NAS |
| | Energy source | Chemoorganotroph | NAS | Chemoorganotroph | NAS |
| | Terminal electron receptor | Not available | | Not available | |
| MIGS-6 | Habitat | Termite | IDA | Termite | IDA |
| MIGS-6.3 | Salinity | Not available | | Not available | |
| MIGS-22 | Oxygen requirement | Facultative anaerobic | NAS | Facultative anaerobic | NAS |
| MIGS-15 | Biotic relationship | Potential termite symbiont | | Potential termite symbiont | |
| MIGS-14 | Pathogenicity | Not available | | Not available | |
| MIGS-4 | Geographic location | Pretoria, South Africa | | Mookgophong, South Africa | |
| MIGS-5 | Sample collection | January 2010 | | January 2010 | |
| MIGS-4.1 MIGS-4.2 | Latitude – Longitude | S25 43 45.6 E28 14 09.9 | | S24 40 30.5 E28 47 50.4 | |
| MIGS-4.3 | Depth | N/A | | N/A | |
| MIGS-4.4 | Altitude | 1344 m | | 1046 m | |
^a Evidence codes: IDA, Inferred from Direct Assay; TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are derived from the Gene Ontology project
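The latitude and longitude entries in the table above appear to be written as hemisphere letter, degrees, minutes and seconds. The following is a minimal sketch of converting them to signed decimal degrees, assuming that reading of the format is correct; the function names `dms_to_decimal` and `parse_site` are illustrative and not part of the original record.

```python
import re

def dms_to_decimal(token: str) -> float:
    """Convert one 'S25 43 45.6'-style token to signed decimal degrees.

    Assumes the layout: hemisphere letter, degrees, minutes, seconds.
    """
    match = re.fullmatch(r"([NSEW])(\d+) (\d+) (\d+(?:\.\d+)?)", token.strip())
    if match is None:
        raise ValueError(f"unrecognised coordinate: {token!r}")
    hemisphere, degrees, minutes, seconds = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South and West are negative by convention.
    return -value if hemisphere in "SW" else value

def parse_site(text: str) -> tuple:
    """Split a 'lat lon' string into (latitude, longitude) in decimal degrees."""
    lat_token, lon_token = re.findall(r"[NSEW]\d+ \d+ \d+(?:\.\d+)?", text)
    return dms_to_decimal(lat_token), dms_to_decimal(lon_token)

# Coordinates copied from the table; the dictionary keys are the strain names.
sites = {
    "MP2": "S25 43 45.6 E28 14 09.9",
    "MP7": "S24 40 30.5 E28 47 50.4",
}

for strain, raw in sites.items():
    lat, lon = parse_site(raw)
    print(f"{strain}: {lat:.4f}, {lon:.4f}")
```

Decimal degrees are usually easier to pass to mapping tools than the space-separated form used in the table.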
Fig. 2 Maximum likelihood phylogenetic tree indicating the phylogenetic relationship of the sequenced isolates. The maximum likelihood (ML) tree was constructed from an alignment of concatenated atpD, carA, gyrB, infB, recA and rpoB gene sequences [57]. The tree was constructed with MEGA 6 [49] using the general time reversible (GTR) model [36] with estimation of the proportion of invariable sites and a gamma distribution. Bootstrap support values were calculated from 1000 bootstrap replicates. Several strains (including type strains, indicated with “T”) of Pantoea species for which genome sequences are publicly available were included in the analysis [GenBank accessions: P. agglomerans 190 [26]: GCA_000731125.1, P. vagans C9-1 [10]: GCA_000148935.1, P. anthophila 11–2 [10]: GCA_000969395.1, P. stewartii subsp. indologenes LMG 2632T [38]: GCA_000757405.1, P. stewartii subsp. stewartii DC283 [38]: GCA_000248395.2, P. ananatis LMG 2665T [38]: GCA_000710035.1, P. ananatis LMG 20103 [38]: GCA_000025405.2, P. septica FF5 [9]: GCA_000612605.1, P. dispersa EGD-AAK13 [26]: GCA_000465555.2, P. rodasii ND03 [11]: GCA_000801085.1, P. rwandensis ND04 [11]: GCA_000759475.1]. Type strains of species of the sister genera Tatumella [Tatumella ptyseos LMG 7888T [31, 52]: GCA_000439895.1 and Tatumella morbirosei LMG 23360T [31]: GCA_000757425.2 (GenBank accessions)] and Erwinia [44, 55] [Erwinia billingiae LMG 2613T [39]: GCA_000196615.1, Erwinia pyrifoliae DSM 12163 [34]: GCA_000026985.1, Erwinia tasmaniensis Et-99: GCA_000026185.1 (GenBank accessions)], for which genome sequences are available, were also included. Brenneria goodwinii OBR-1 [GCA_001049335.1 (GenBank accession)] was used as the outgroup
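The tree in Fig. 2 rests on a concatenated alignment of six genes. The sketch below illustrates only the concatenation step, joining per-gene alignments into one supermatrix and recording the partition boundaries. It is not the authors' pipeline: the function name `concatenate`, the gap-padding of missing genes, and the six-base toy sequences are all assumptions made for illustration.

```python
# Gene order taken from the Fig. 2 caption.
GENE_ORDER = ["atpD", "carA", "gyrB", "infB", "recA", "rpoB"]

def concatenate(alignments, gene_order=GENE_ORDER):
    """Join per-gene alignments ({gene: {taxon: aligned_seq}}) into a supermatrix.

    Returns (supermatrix, partitions), where partitions holds 1-based
    inclusive (gene, start, end) coordinates. A taxon missing from a gene
    is padded with gaps (an assumption, not something the source states).
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for gene in gene_order:
        aln = alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to one length")
        length = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions

# Toy, made-up six-base alignments; real gene alignments are far longer.
toy = {
    "atpD": {"MP2": "ATGGCA", "MP7": "ATGGCT", "C9-1": "ATGGCT"},
    "carA": {"MP2": "ATGAAA", "MP7": "ATGAAG"},  # C9-1 deliberately missing
    "gyrB": {"MP2": "ATGTCG", "MP7": "ATGTCA", "C9-1": "ATGTCA"},
    "infB": {"MP2": "ATGACC", "MP7": "ATGACT", "C9-1": "ATGACT"},
    "recA": {"MP2": "ATGGAT", "MP7": "ATGGAC", "C9-1": "ATGGAC"},
    "rpoB": {"MP2": "ATGCTG", "MP7": "ATGCTA", "C9-1": "ATGCTA"},
}

supermatrix, partitions = concatenate(toy)
for gene, first, last in partitions:
    print(f"{gene}: {first}-{last}")
```

Keeping the partition coordinates alongside the supermatrix makes it possible to apply per-gene substitution models later, should the tree-building software support them.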
Project information
| MIGS ID | Property | P. agglomerans MP2 | P. vagans MP7 |
|---|---|---|---|
| MIGS-31 | Finishing quality | High-quality draft | High-quality draft |
| MIGS-28 | Libraries used | 500 bp | 500 bp |
| MIGS-29 | Sequencing platforms | Illumina HiSeq mate-pair | Illumina HiSeq mate-pair |
| MIGS-31.2 | Fold coverage | 179 × | 184 × |
| MIGS-30 | Assemblers | Velvet | Velvet |
| MIGS-32 | Gene calling method | RAST | RAST |
| | GenBank ID | JPKQ00000000.1 | JPKP00000000.1 |
| | GenBank Date of Release | 23/9/2014 | 23/9/2014 |
| | GOLD ID | Gp0099200 | Gp0099199 |
| | BIOPROJECT | PRJNA254768 | PRJNA254769 |
| MIGS-13 | Source material identifier | SAMN02905153 | SAMN02905155 |
| | Project relevance | Potential termite symbiont | Potential termite symbiont |
Summary of the genomes
| Label | Size (kb) | Topology | INSDC identifier | RefSeq ID |
|---|---|---|---|---|
| P. agglomerans MP2 | | | | |
| Chromosome 1 | 3988.2 | circular | JPKQ01000001-13 | NZ_JPKQ01000001.1-13.1 |
| Plasmid 1 | 184.9 | circular | JPKQ01000014 | NZ_JPKQ01000014.1 |
| Plasmid 2 | 292.9 | circular | JPKQ01000015 | NZ_JPKQ01000015.1 |
| Plasmid 3 | 531.5 | circular | JPKQ01000016 | NZ_JPKQ01000016.1 |
| P. vagans MP7 | | | | |
| Chromosome 1 | 3913.1 | circular | JPKP01000001-6 | NZ_JPKP01000001.1-6.1 |
| Plasmid 1 | 176.9 | circular | JPKP01000007 | NZ_JPKP01000007.1 |
| Plasmid 2 | 508.6 | circular | JPKP01000008 | NZ_JPKP01000008.1 |
Fig. 3 The genome structure of P. agglomerans MP2. The genome consists of 1 chromosome and 3 plasmids. The order of the contigs was based on the publicly available complete genome sequence of P. vagans C9-1 [45]. The sizes of the contigs varied significantly, with the smallest being just below 5 kbp (contig 5) and the largest being just under 800 kbp (contig 3). The open reading frames (ORFs) for the forward and reverse strands are indicated in the inner tracks, flanked by the COG classes associated with the respective ORFs. The GC content across the genome is indicated in black, with the GC skew (calculated as (G − C)/(G + C)) indicated in green and purple [48]
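The GC skew named in the caption, (G − C)/(G + C), is normally computed over windows along the sequence. The following is a minimal sketch of that calculation; the window and step sizes, the function name `gc_skew`, and the short test sequence are assumptions for illustration and are not taken from the figure.

```python
def gc_skew(sequence, window=1000, step=1000):
    """Return (window_start, skew) pairs, with skew = (G - C) / (G + C).

    Windows with no G or C are reported as 0.0 to avoid division by zero.
    A trailing partial window is ignored.
    """
    sequence = sequence.upper()
    results = []
    for start in range(0, len(sequence) - window + 1, step):
        chunk = sequence[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, skew))
    return results

# Made-up 16-base sequence: a G-rich half followed by a C-rich half.
demo = "GGGCAAAACCCGTTTT"
for start, skew in gc_skew(demo, window=8, step=8):
    print(start, skew)
```

On a circular bacterial chromosome the sign of the skew typically changes near the origin and terminus of replication, which is why it is often drawn in two colours.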
Fig. 4 The genome structure of P. vagans MP7. The genome consists of 1 chromosome and 2 plasmids. The order of the contigs was based on the publicly available complete genome sequence of P. vagans C9-1 [45]. The contigs varied in size, with the largest (contig 2) being approximately 1,010 kbp and the smallest (contig 6) being just below 50 kbp. The predicted ORFs are indicated in the inner tracks and are flanked by the COG classes associated with each of the ORFs. The GC content of the various regions within the genome is indicated in black, with the GC skew indicated in green and purple [48]
Nucleotide content and gene count levels of the genomes
| Attribute | P. agglomerans MP2: Value | P. agglomerans MP2: % of total^a | P. vagans MP7: Value | P. vagans MP7: % of total^a |
|---|---|---|---|---|
| Genome size (bp) | 4,733,829 | 100 % | 4,598,703 | 100 % |
| DNA coding (bp) | 4,043,819 | 85.4 % | 3,948,783 | 85.9 % |
| DNA G + C (bp) | 2,614,812 | 55.2 % | 2,541,699 | 55.3 % |
| DNA scaffolds | 16 | - | 8 | - |
| Total genes^b | 4449 | - | 4277 | - |
| Protein coding genes | 4355 | 100 % | 4181 | 100 % |
| RNA genes | 94 | 2.2 % | 91 | 2.2 % |
| Pseudo genes | - | - | 2 | 0.1 % |
| Genes in internal clusters | - | - | - | - |
| Genes with function prediction | 3470 | 79.7 % | 3351 | 80.1 % |
| Genes assigned to COGs | 3686 | 84.6 % | 3608 | 86.3 % |
| Genes with Pfam domains | 2124 | 48.8 % | 2064 | 49.4 % |
| Genes with signal peptides | 810 | 18.6 % | 768 | 18.4 % |
| Genes with transmembrane helices | 927 | 21.3 % | 906 | 21.7 % |
| CRISPR repeats | 4 | 0.09 % | 3 | 0.07 % |
^a The percentage of total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome
^b Also includes pseudogenes and other genes
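The base-pair percentages in the table above follow directly from the genome sizes. As a minimal sketch, the values can be recomputed from the table's own numbers; the dictionary layout and the helper name `percent` are illustrative only.

```python
# Base-pair values copied from the nucleotide content table.
GENOMES = {
    "P. agglomerans MP2": {"size": 4_733_829, "coding": 4_043_819, "gc": 2_614_812},
    "P. vagans MP7": {"size": 4_598_703, "coding": 3_948_783, "gc": 2_541_699},
}

def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)

for strain, values in GENOMES.items():
    coding_pct = percent(values["coding"], values["size"])
    gc_pct = percent(values["gc"], values["size"])
    print(f"{strain}: coding {coding_pct} %, G + C {gc_pct} %")

# Difference in genome size between the two isolates, in base pairs.
size_difference = (
    GENOMES["P. agglomerans MP2"]["size"] - GENOMES["P. vagans MP7"]["size"]
)
print(f"Size difference: {size_difference:,} bp")
```

The two genomes differ by 135,126 bp, which is the gap the abstract attributes largely to plasmid content.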
Number and proportion of genes associated with 25 COG functional categories
| Code | P. agglomerans MP2: Value | P. agglomerans MP2: % of total^a | P. vagans MP7: Value | P. vagans MP7: % of total^a | Description |
|---|---|---|---|---|---|
| J | 196 | 4.50 % | 194 | 4.54 % | Translation |
| A | 1 | 0.02 % | 2 | 0.05 % | RNA processing and modification |
| K | 358 | 8.22 % | 331 | 7.74 % | Transcription |
| L | 147 | 3.38 % | 137 | 3.20 % | Replication, recombination and repair |
| B | - | - | - | - | Chromatin structure and dynamics |
| D | 42 | 0.96 % | 42 | 1.00 % | Cell cycle control, Cell division, chromosome partitioning |
| Y | - | - | - | - | Nuclear structure |
| V | 48 | 1.10 % | 50 | 1.17 % | Defence mechanisms |
| T | 228 | 5.24 % | 225 | 5.26 % | Signal transduction mechanisms |
| M | 239 | 5.49 % | 242 | 5.66 % | Cell wall/membrane biogenesis |
| N | 90 | 2.07 % | 92 | 2.15 % | Cell motility |
| Z | - | - | - | - | Cytoskeleton |
| W | - | - | - | - | Extracellular structures |
| U | 78 | 1.79 % | 82 | 1.92 % | Intracellular trafficking and secretion |
| O | 137 | 3.15 % | 133 | 3.11 % | Posttranslational modification, protein turnover, chaperones |
| C | 209 | 4.80 % | 206 | 4.82 % | Energy production and conversion |
| G | 395 | 9.07 % | 378 | 8.84 % | Carbohydrate transport and metabolism |
| E | 405 | 9.30 % | 405 | 9.47 % | Amino acid transport and metabolism |
| F | 96 | 2.20 % | 100 | 2.34 % | Nucleotide transport and metabolism |
| H | 164 | 3.77 % | 165 | 3.86 % | Coenzyme transport and metabolism |
| I | 117 | 2.69 % | 106 | 2.48 % | Lipid transport and metabolism |
| P | 244 | 5.60 % | 248 | 5.80 % | Inorganic ion transport and metabolism |
| Q | 77 | 1.77 % | 69 | 1.61 % | Secondary metabolites biosynthesis, transport and catabolism |
| R | 450 | 10.33 % | 430 | 10.05 % | General function prediction only |
| S | 393 | 9.02 % | 387 | 9.05 % | Function unknown |
| - | 669 | 15.36 % | 669 | 15.64 % | Not in COGs |
^a The total is based on the total number of predicted protein coding genes in the annotated genomes
Average nucleotide identity (ANI) values for the sequenced isolates and additional strains representative of the lineages of Pantoea
| | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | --- | 98.06 | 90.66 | 90.83 | 87.96 | 78.79 | 78.87 | 78.73 | 78.83 | 78.05 |
| | 98.75 | --- | 91.88 | 91.81 | 89.08 | 79.89 | 79.72 | 79.64 | 79.89 | 78.95 |
| | 90.66 | 91.12 | --- | 96.62 | 87.56 | 78.79 | 78.81 | 78.75 | 78.75 | 78.1 |
| | 90.87 | 91.17 | 96.71 | --- | 87.57 | 78.9 | 78.84 | 78.69 | 78.6 | 78.11 |
| | 88.03 | 88.49 | 87.65 | 87.59 | --- | 78.97 | 78.9 | 78.72 | 78.92 | 77.93 |
| | 78.65 | 79.28 | 78.71 | 78.77 | 78.81 | --- | 83.77 | 83.62 | 77.19 | 76.69 |
| | 79.01 | 79.48 | 78.99 | 78.98 | 79.05 | 83.87 | --- | 98.99 | 77.54 | 76.92 |
| | 78.58 | 79.2 | 78.59 | 78.6 | 78.57 | 83.6 | 98.72 | --- | 77.13 | 76.61 |
| | 78.68 | 79.35 | 78.69 | 78.64 | 78.85 | 77.3 | 77.37 | 77.27 | --- | 82.97 |
| | 78.03 | 78.44 | 78.02 | 78.01 | 77.97 | 76.81 | 76.78 | 76.73 | 83.02 | --- |
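ANI matrices such as the one above are often read against a species-level cutoff. The sketch below groups the table's entries by reciprocal ANI, assuming the commonly cited cutoff of about 95 %, which this record does not itself state. Because the strain labels are not shown in the table, the code refers to strains only by their 1-based row position; the function name `species_groups` is illustrative.

```python
# ANI values copied row by row from the table; None marks the diagonal.
ANI = [
    [None, 98.06, 90.66, 90.83, 87.96, 78.79, 78.87, 78.73, 78.83, 78.05],
    [98.75, None, 91.88, 91.81, 89.08, 79.89, 79.72, 79.64, 79.89, 78.95],
    [90.66, 91.12, None, 96.62, 87.56, 78.79, 78.81, 78.75, 78.75, 78.10],
    [90.87, 91.17, 96.71, None, 87.57, 78.90, 78.84, 78.69, 78.60, 78.11],
    [88.03, 88.49, 87.65, 87.59, None, 78.97, 78.90, 78.72, 78.92, 77.93],
    [78.65, 79.28, 78.71, 78.77, 78.81, None, 83.77, 83.62, 77.19, 76.69],
    [79.01, 79.48, 78.99, 78.98, 79.05, 83.87, None, 98.99, 77.54, 76.92],
    [78.58, 79.20, 78.59, 78.60, 78.57, 83.60, 98.72, None, 77.13, 76.61],
    [78.68, 79.35, 78.69, 78.64, 78.85, 77.30, 77.37, 77.27, None, 82.97],
    [78.03, 78.44, 78.02, 78.01, 77.97, 76.81, 76.78, 76.73, 83.02, None],
]

def species_groups(matrix, cutoff=95.0):
    """Group row positions (1-based) whose ANI meets the cutoff in both directions.

    The 95 % default is an assumed, commonly used species boundary.
    """
    n = len(matrix)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            # ANI is not symmetric, so require both directions to pass.
            if min(matrix[i][j], matrix[j][i]) >= cutoff:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i + 1)
    return sorted(groups.values())

print(species_groups(ANI))
```

Under that assumed cutoff, rows 1 and 2, rows 3 and 4, and rows 7 and 8 pair up, while the remaining rows stand alone.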