Roshan Kumar1, Helianthous Verma1, Shazia Haider2, Abhay Bajaj1, Utkarsh Sood1, Kalaiarasan Ponnusamy3, Shekhar Nagar1, Mallikarjun N Shakarad1, Ram Krishan Negi1, Yogendra Singh1, J P Khurana4, Jack A Gilbert5,6,7, Rup Lal1.
Abstract
Species belonging to the genus Novosphingobium are found in many different habitats and have been identified as metabolically versatile. Through comparative genomic analysis, we identified habitat-specific genes and regulatory hubs that could determine habitat selection for Novosphingobium spp. Genomes from 27 Novosphingobium strains isolated from diverse habitats such as rhizosphere soil, plant surfaces, heavily contaminated soils, and marine and freshwater environments were analyzed. Genome size and coding potential were widely variable, differing significantly between habitats. Phylogenetic relationships between strains were less likely to describe functional genotype similarity than the habitat from which they were isolated. In this study, strains (19 out of 27) with a recorded habitat of isolation, and at least 3 representative strains per habitat, comprised four ecological groups: rhizosphere, contaminated soil, marine, and freshwater. Sulfur acquisition and metabolism were the only core genomic traits to differ significantly in proportion between these ecological groups; for example, alkane sulfonate (ssuABCD) assimilation was found exclusively in all of the rhizospheric isolates. When we examined osmolytic regulation in Novosphingobium spp. through ectoine biosynthesis, which was assumed to be marine habitat specific, we found that it was also present in isolates from contaminated soil, suggesting its relevance beyond the marine system. Novosphingobium strains were also found to harbor a wide variety of mono- and dioxygenases, responsible for the metabolism of several aromatic compounds, suggesting their potential to act as degraders of a variety of xenobiotic compounds. Protein-protein interaction analysis revealed β-barrel outer membrane proteins as habitat-specific hubs in each of the four habitats: freshwater (Saro_1868), marine water (PP1Y_AT17644), rhizosphere (PMI02_00367), and soil (V474_17210). These outer membrane proteins could play a key role in habitat demarcation and extend our understanding of the metabolic versatility of the Novosphingobium species.

IMPORTANCE This study highlights the significant role of a microorganism's genetic repertoire in structuring the similarity between Novosphingobium strains. The results suggest that the phylogenetic relationships were mostly influenced by metabolic trait enrichment, which is possibly governed by the microenvironment of each microbe's respective niche. Using core genome analysis, the enrichment of a certain set of genes specific to a particular habitat was determined, which provided insights on the influence of habitat on the distribution of metabolic traits in Novosphingobium strains. We also identified habitat-specific protein hubs, which suggested delineation of Novosphingobium strains based on their habitat. Examining the available genomes of ecologically diverse bacterial species and analyzing the habitat-specific genes are useful for understanding the distribution and evolution of functional and phylogenetic diversity in the genus Novosphingobium.
Keywords: Novosphingobium; core genome; habitat-specific genes; pangenome; regulatory hubs
Year: 2017 PMID: 28567447 PMCID: PMC5443232 DOI: 10.1128/mSystems.00020-17
Source DB: PubMed Journal: mSystems ISSN: 2379-5077 Impact factor: 6.496
General genome characteristic features of the genus Novosphingobium
| Strain | Source of isolation | Genome size (bp) | No. of contigs/replicons | GC content (%) | No. of genes | No. of essential marker genes | % completeness | Genomic island size (bp) | Accession no. | Reference |
|---|---|---|---|---|---|---|---|---|---|---|
| | Freshwater lake | 4,750,579 | 50 | 65.6 | 4,304 | 106 | 99.07 | 501,818 | | Unpublished data |
| | Freshwater lake | 4,232,088 | 84 | 59.4 | 4,074 | 106 | 99.07 | 286,348 | | Unpublished data |
| | Freshwater lake | 4,267,112 | 149 | 65.5 | 3,948 | 104 | 97.20 | 219,491 | | Unpublished data |
| | Acidic lake water | 3,708,535 | 55 | 64.3 | 3,496 | 104 | 97.20 | 248,334 | | Unpublished data |
| | The sample obtained at a depth of 410 m from a borehole sample that was drilled at the Savannah River Site | 4,233,314 | One Chr and two plasmids | 65.1 | 4,124 | 106 | 99.07 | 64,422 | | Aylward et al. ( |
| | Isolated from a surface water sample of the southwest basin of Lake Grosse Fuchskuhle | 3,963,850 | 14 | 65.4 | 3,721 | 105 | 98.13 | 172,938 | | Unpublished data |
| | Sunken wood from Suruga Bay | 5,361,448 | 33 | 65.4 | 5,202 | 103 | 96.26 | 528,404 | | Ohta et al. ( |
| | Mangrove sediment | 5,027,021 | 85 | 63.4 | 4,887 | 106 | 99.07 | 135,248 | | Unpublished data |
| | Muddy sediment of Ulsan Bay | 5,457,578 | One Chr and five plasmids | 63.1 | 5,087 | 106 | 99.07 | 203,560 | | Choi et al., 2015 ( |
| | Marine water and oil interface | 5,313,905 | One Chr and three plasmids | 63.3 | 5,135 | 106 | 99.07 | 181,419 | | D’Argenio et al. ( |
| | Southeast coastal plain, subsurface core at 180-m depth | 4,885,942 | 54 | 63.2 | 4,838 | 106 | 99.07 | 148,732 | | Unpublished data |
| | Isolated from the plant rhizosphere | 6,537,300 | 65 | 63.7 | 6,279 | 105 | 98.13 | 322,771 | | Unpublished data |
| | Rhizosphere of | 5,611,617 | 187 | 65.9 | 5,367 | 105 | 98.13 | 435,323 | | Unpublished data |
| | Isolated from root of plant | 6,952,763 | 194 | 64.5 | 6,330 | 104 | 97.20 | 636,815 | | Unpublished data |
| | Hexachlorocyclohexane-contaminated soil | 5,307,348 | 26 | 64 | 5,220 | 104 | 97.20 | 264,580 | | Pearce et al. ( |
| | Carbofuran-exposed agricultural soil | 5,024,847 | 243 | 63.1 | 5,036 | 106 | 99.07 | 328,926 | | Nguyen et al. ( |
| | Hexachlorocyclohexane-contaminated soil | 4,857,928 | 156 | 64.6 | 4,749 | 105 | 98.13 | 292,630 | | Saxena et al. ( |
| | Isolated from polychlorinate | 5,236,092 | 234 | 63.8 | 5,224 | 106 | 99.07 | 342,845 | | Unpublished data |
| | Rhizosphere of | 6,269,463 | 166 | 64.5 | 6,945 | 100 | 93.46 | 303,084 | | Unpublished data |
| | Steeping fluid of eroded bamboo slips | 4,909,165 | 491 | 65.1 | 4,715 | 104 | 97.20 | 825,534 | | Unpublished data |
| | Derived from an | 3,715,735 | 22 | 64.1 | 3,675 | 106 | 99.07 | 235,969 | | Unpublished data |
| | New Zealand pulp mill effluent | 4,148,048 | 77 | 64 | 3,867 | 106 | 99.07 | 255,167 | | Unpublished data |
| | Biofilm of a bioreactor fed with polychlorinated phenols | 6,304,486 | 115 | 65.1 | 6,079 | 106 | 99.07 | 279,955 | | Unpublished data |
| | Grapevine crown gall tumor | 4,539,029 | 166 | 62.7 | 4,513 | 104 | 97.20 | 295,587 | | Gan et al., 2012 ( |
| | Isolated from activated sludge of sewage treatment plant | 4,291,514 | 54 | 61.3 | 4,223 | 105 | 98.13 | 319,713 | | Unpublished data |
| | Not available | 4,836,455 | 70 | 65.7 | 4,452 | 106 | 99.07 | 612,524 | | Unpublished data |
| | Isolated from a cold fluidized-bed process treating chlorophenol-contaminated groundwater | 4,407,848 | 53 | 65.7 | 4,266 | 105 | 98.13 | 528,688 | | Unpublished data |
The number of contigs/replicons or the number of chromosomes (Chr) and plasmids is shown.
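The % completeness values in the table are consistent with dividing the number of essential marker genes found by a reference set of 107 markers (for example, 106/107 ≈ 99.07%). The set size of 107 is inferred from the tabulated numbers and is not stated in the text, so the following minimal sketch treats it as an assumption; the function name is hypothetical.

```python
# Assumption: the reference set holds 107 essential marker genes. This value
# is inferred from the table (106 markers -> 99.07%), not stated in the text.
ASSUMED_MARKER_SET_SIZE = 107


def percent_completeness(markers_found, set_size=ASSUMED_MARKER_SET_SIZE):
    """Return genome completeness as a percentage rounded to two decimals."""
    if not 0 <= markers_found <= set_size:
        raise ValueError("markers_found must lie between 0 and set_size")
    return round(100.0 * markers_found / set_size, 2)


# Marker counts that occur in the table.
for found in (100, 103, 104, 105, 106):
    print(found, percent_completeness(found))
```

Under this assumption, each marker count in the table reproduces the reported percentage in the same row.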
FIG 1 Core and pangenome of 27 Novosphingobium strains plotted against the number of genomes. (A) Core genome. The x axis shows the number of genomes, and the y axis shows the core genome size (number of genes) of Novosphingobium spp. (B) Pangenome. The x axis shows the number of genomes added, and the y axis shows the increase in pangenomic content of Novosphingobium spp. with the addition of genomes. The sizes of the core and pangenome clusters were computed using the BDBH algorithm. To assess the robustness of the calculation, the built-in program ran 10 sampling experiments (n = 10) in which genomes were added in random order to estimate the stability of the core and pangenome. The best-fit Tettelin curve represents the regression line for the core and pangenome.
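The sampling procedure described in the Fig. 1 legend (genomes added in random order, repeated 10 times) can be illustrated with a small sketch. This is not the BDBH algorithm or the program used in the study: it assumes ortholog clusters have already been assigned, and the strain names and cluster identifiers below are hypothetical toy data. The Tettelin curve fit is not included.

```python
import random


def accumulation_curves(genomes, n_samples=10, seed=0):
    """Track core and pangenome sizes as genomes are added in random order.

    genomes maps a strain name to the set of gene-cluster IDs it carries.
    Returns (core_curves, pan_curves), one list of sizes per random ordering.
    """
    rng = random.Random(seed)
    names = list(genomes)
    core_curves, pan_curves = [], []
    for _ in range(n_samples):
        order = names[:]
        rng.shuffle(order)
        core = set(genomes[order[0]])  # clusters shared by all genomes so far
        pan = set(genomes[order[0]])   # clusters seen in any genome so far
        core_sizes, pan_sizes = [len(core)], [len(pan)]
        for name in order[1:]:
            core &= genomes[name]
            pan |= genomes[name]
            core_sizes.append(len(core))
            pan_sizes.append(len(pan))
        core_curves.append(core_sizes)
        pan_curves.append(pan_sizes)
    return core_curves, pan_curves


# Hypothetical toy data, not taken from the study.
toy_genomes = {
    "strain_A": {"c1", "c2", "c3", "c4"},
    "strain_B": {"c1", "c2", "c5"},
    "strain_C": {"c1", "c2", "c3", "c6"},
    "strain_D": {"c1", "c2", "c7"},
}
core_curves, pan_curves = accumulation_curves(toy_genomes)
print(core_curves[0], pan_curves[0])
```

Whatever the order of addition, every core curve ends at the intersection of all genomes and every pangenome curve ends at their union; the spread between orderings at intermediate points is what the repeated sampling is meant to expose.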
FIG 2 Phylogenetic clustering of 27 Novosphingobium strains. (A) Phylogeny based on 400 conserved marker genes with 1,000 bootstraps by using S. indicum B90A as an outgroup. (B and C) Average nucleotide identity (ANI)-based phylogeny was constructed with 220 orthologous genes and the whole genome, respectively. The bars represent 1 nucleotide substitution per position.
FIG 3 The protein-protein interaction (PPI) networks of the four habitats, i.e., freshwater, marine water, rhizosphere, and soil. Expanded view of the networks imported from Cytoscape, where nodes represent proteins and edges represent physical interactions. The nodes in the four habitats (freshwater, marine water, rhizosphere, and contaminated soil) are represented as filled circles colored light red, green, dark blue, and light blue, respectively. The edges in all habitats are represented as grey lines. The significant, sparsely distributed hubs in the four habitat networks are represented by colored circles: purple (freshwater), dark blue (marine), orange (rhizosphere), and pink (contaminated soil).
FIG 4 Topological properties of the PPI networks in the four habitats (freshwater, marine water, rhizosphere, and soil). The Pearson correlation coefficient values (r²) and probability of degree distributions P(k) (A), average clustering coefficient (B), and average neighborhood connectivity of the PPI network (C) are shown. All of these properties follow a power law distribution, which is characteristic of a scale-free network and suggests a hierarchical organization in the network.
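As an illustration of how a degree distribution P(k) can be checked against a power law, the sketch below computes P(k) from an edge list and fits a straight line to log P(k) versus log k by least squares, reporting the exponent and r². This log-log regression is an assumed, simplified procedure chosen for illustration and is not necessarily the method used to produce Fig. 4; the edge list and function names are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Return {k: P(k)} for an undirected edge list.

    Nodes without any edge never appear in the list and are not counted.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())
    total = len(degree)
    return {k: counts[k] / total for k in sorted(counts)}


def fit_power_law(p_of_k):
    """Fit log P(k) = -gamma * log k + c; return (gamma, r_squared)."""
    xs = [math.log(k) for k in p_of_k]
    ys = [math.log(p) for p in p_of_k.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("need at least two distinct degrees with differing P(k)")
    return -sxy / sxx, (sxy * sxy) / (sxx * syy)


# Hypothetical toy network: one hub with four neighbours plus one extra edge.
toy_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"), ("a", "b")]
p_of_k = degree_distribution(toy_edges)
gamma, r_squared = fit_power_law(p_of_k)
print(p_of_k, gamma, r_squared)
```

A high r² on the log-log scale is compatible with scale-free behavior, although a regression on so few points, as in this toy network, carries little weight.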
FIG 5 Schematic representation of different modes of environmental sulfur uptake and utilization within the Novosphingobium genus. The three different routes for sulfur assimilation are shown. Sulfur assimilation as inorganic sulfur (sulfates and thiosulfates) (A), via ssuABC (transport system) and ssuD (FMNH2-dependent alkane sulfonate monooxygenase) (B), and via taurine transport and metabolism by tauABC (transport system) and tauD (taurine dioxygenase) (C). APS, adenosine phosphosulfate; PAPS, phosphoadenosine phosphosulfate; SUOX, sulfite oxidase.
FIG 6 Matrix and dual dendrogram based on the presence/absence of sulfur metabolism genes in 19 Novosphingobium genomes belonging to four different habitats, viz., contaminated soil (C), rhizosphere (R), freshwater (F), and marine water (M). The colored and white boxes represent the presence and absence of a gene, respectively. The dendrograms were constructed from the matrix of sulfur metabolism genes using Pearson correlation and hierarchical clustering.
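The clustering behind a figure such as Fig. 6 can be sketched as follows: each genome is a 0/1 profile over genes, the distance between two genomes is taken as 1 minus their Pearson correlation, and clusters are merged agglomeratively. The choice of average linkage is an assumption (the legend does not name the linkage method), and the genome labels and presence/absence values are hypothetical toy data, not results from the study. Only the genome axis is clustered here; transposing the matrix would cluster the genes in the same way.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: correlation undefined, treated as 0
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles):
    """Agglomerative clustering with distance 1 - r; returns the merge steps."""
    dist = {frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
            for a, b in combinations(profiles, 2)}

    def cluster_distance(pair):
        left, right = pair
        total = sum(dist[frozenset((a, b))] for a in left for b in right)
        return total / (len(left) * len(right))

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        left, right = min(combinations(clusters, 2), key=cluster_distance)
        merges.append((left, right, cluster_distance((left, right))))
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left + right)
    return merges


# Hypothetical presence (1) / absence (0) profiles; columns are genes.
genes = ["ssuA", "ssuB", "ssuC", "ssuD", "tauA", "tauD"]
toy_profiles = {
    "R1": [1, 1, 1, 1, 0, 0],
    "R2": [1, 1, 1, 1, 1, 0],
    "M1": [0, 0, 0, 0, 1, 1],
    "F1": [0, 0, 0, 0, 0, 1],
}
merges = average_linkage(toy_profiles)
for left, right, distance in merges:
    print(left, right, round(distance, 3))
```

In this toy example the two profiles carrying all four ssu genes join each other before joining the remaining two, which is the kind of grouping a presence/absence dendrogram makes visible.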
FIG 7 Correlation between the ability of Novosphingobium strains from four different habitats to degrade aromatic and xenobiotic compounds. (A) The heat map represents clustering of genomes based on the presence of different aromatic degradation pathways. (B) Principal-component analysis (PCA) plot using strain-specific degradation pathways. (C) Distribution of mono- and dioxygenase genes within Novosphingobium genomes.
Characteristic features of predicted phages within the genus Novosphingobium
| Strain | Putative | Region length (kbp) | % GC | Total no. of CDS | No. of phage proteins | No. of hypothetical proteins | Integrase/transposase | Short-chain dehydrogenase/reductase SDR | Translation initiation factor IF-2 | Transcription elongation factor NusA | CopG/Arc/MetJ/Ars family transcriptional regulator | Phage shock protein PspC | Methylase | Hsp33 protein | NADPH-dependent FMN reductase | LexA repressor | Putative lipoprotein | 5-Oxoprolinase | OmpA/MotB | Nuclease | Plasmid pRiA4b ORF-3 family protein | Serine | ATPase subunit C | Dioxygenase | Protein-tyrosine-phosphatase | Arsenical resistance | Membrane dipeptidase | Amino acid permease |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | GCGCCTGATGCGC | 57.2 | 61.40 | 63 | 31 | 32 | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − |
| | CTCCCGCTCCGCCA | 39.2 | 62.16 | 55 | 33 | 21 | − | − | − | − | − | − | 1 | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − |
| | Unresolved | 23.8 | 65.32 | 30 | 20 | 8 | − | − | − | − | − | − | − | 1 | 1 | − | − | − | − | − | − | − | − | − | − | − | − | − |
| | Unresolved | 12 | 63.81 | 17 | 13 | 4 | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − |
| | CAAGGCAGGGAA | 34.9 | 63.00 | 49 | 22 | 25 | − | − | − | − | − | − | − | − | − | 1 | 1 | − | − | − | − | − | − | − | − | − | − | − |
| | Unresolved | 16.8 | 69.11 | 20 | 11 | 8 | − | − | − | − | − | − | − | − | 1 | − | − | − | − | − | − | − | − | − | − | − | − | − |
| | TGCGCGGCGCCTT | 35.2 | 63.73 | 48 | 30 | 18 | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − |
| | Unresolved | 16.7 | 68.88 | 24 | 13 | 10 | − | − | − | − | − | − | − | − | 1 | − | − | − | − | − | − | − | − | − | − | − | − | − |
| | GGGCGGTTAGCTCAGTTGGTAGAGCATCTCGTTTACAC | 40.3 | 63.01 | 55 | 33 | 22 | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − |
| | GACGGCGCCGAGCAT | 40.5 | 65.26 | 37 | 27 | 10 | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − |
| | Unresolved | 35.7 | 64.76 | 54 | 28 | 21 | − | − | − | − | 1 | − | − | − | − | − | − | − | − | − | − | − | − | 1 | 1 | 1 | − | − |
| | TTCGGATCAGGCTCT | 25.9 | 61.12 | 26 | 14 | 7 | 3 | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | 1 | 1 |
| | GAGGGTGAGATG | 36.1 | 61.61 | 27 | 13 | 9 | 2 | − | − | − | − | − | − | − | − | − | − | 2 | − | − | − | − | − | − | − | − | − | − |
| | Unresolved | 19.3 | 68.18 | 27 | 15 | 7 | − | − | − | − | − | − | − | − | 1 | − | − | − | 2 | − | − | − | − | − | − | − | − | − |
| | CGCCGCCGCTGGTCG | 49.9 | 61.54 | 46 | 29 | 15 | 1 | − | − | − | − | − | − | − | − | − | − | − | − | 1 | − | − | − | − | − | − | − | − |
| | Unresolved | 18.5 | 63.16 | 24 | 14 | 5 | 3 | − | − | − | 1 | − | − | − | − | − | − | − | − | − | 1 | − | − | − | − | − | − | − |
| | CCGACCAAAGCACGAACCCGCTCCGCGGGAGAGTCGCTTGGGGTGCCGTAGCGTAGTATTGTTCAGGCTTTGCGTGCGGC | 24.6 | 62.74 | 31 | 14 | 12 | − | − | 1 | 1 | 1 | 1 | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − |
| | Unresolved | 23.3 | 63.07 | 29 | 24 | 5 | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − |
| | AGGAGCCCACGC | 35.3 | 62.47 | 43 | 29 | 14 | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − |
| | Unresolved | 23.8 | 64.75 | 31 | 23 | 8 | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − |
| | Unresolved | 13.5 | 64.21 | 15 | 12 | − | 2 | − | − | − | − | − | − | − | − | − | − | − | − | − | − | 1 | − | − | − | − | − | − |
| | Unresolved | 30.9 | 62.27 | 43 | 28 | 15 | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − |
| | Unresolved | 27.9 | 64.58 | 34 | 26 | 7 | − | 1 | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − |
| | Unresolved | 19.6 | 68.73 | 25 | 15 | 9 | − | − | − | − | 1 | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − |
| | GATCAGCTTGCTATGGACAAGACAACCACACGGCC | 23.5 | 59.62 | 23 | 13 | 9 | 1 | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − |
| | CGGATTTTAAGTCCGCAGCGTCTACCATTCCGCCACGCCCGCAC | 37.4 | 63.76 | 51 | 34 | 15 | − | − | − | − | − | − | − | − | − | 1 | − | − | − | 1 | − | − | − | − | − | − | − | − |
| | Unresolved | 25.1 | 66.40 | 29 | 19 | 8 | − | − | − | − | − | − | − | 1 | − | − | − | − | − | − | − | − | 1 | − | − | − | − | − |
| | Unresolved | 16.3 | 67.62 | 21 | 11 | 9 | − | − | − | − | − | − | − | − | 1 | − | − | − | − | − | − | − | − | − | − | − | − | − |
| | TTGATGGCGACGC | 52.3 | 60.80 | 37 | 27 | 10 | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − | − |
CDS, coding sequences.
The presence or absence (−) of genes encoding the indicated protein or characteristic is shown. If the gene is present, the number of genes is shown.