Huansheng Cao, Yohei Shimura, Morgan M Steffen, Zhou Yang, Jingrang Lu, Allen Joel, Landon Jenkins, Masanobu Kawachi, Yanbin Yin, Ferran Garcia-Pichel.
Abstract
Water bloom development due to eutrophication constitutes a case of niche specialization among planktonic cyanobacteria, but the genomic repertoire allowing bloom formation in only some species has not been fully characterized. We posited that the habitat relevance of a trait begets its underlying genomic complexity, so that traits within the repertoire would be differentially more complex in species successfully thriving in that habitat than in close species that cannot. To test this for the case of bloom-forming cyanobacteria, we curated 17 potentially relevant query metabolic pathways and five core pathways selected according to existing ecophysiological literature. The available 113 genomes were split into those of blooming (45) or nonblooming (68) strains, and an index of genomic complexity for each strain's version of each pathway was derived. We show that strain versions of all query pathways were significantly more complex in bloomers, with complexity in fact correlating positively with strain blooming incidence in 14 of those pathways. Five core pathways, relevant everywhere, showed no differential complexity or correlations. Gas vesicle, toxin and fatty acid synthesis, amino acid uptake, and C, N, and S acquisition systems were most strikingly relevant in the blooming repertoire. Further, we validated our findings using metagenomic gene expression analyses of blooming and nonblooming cyanobacteria in natural settings, where pathways in the repertoire were differentially overexpressed according to their relative complexity in bloomers, but not in nonbloomers. We expect that this approach may find applications to other habitats and organismal groups.
Importance
We pragmatically delineate the trait repertoire that enables organismal niche specialization. We based our approach on the tenet, derived from evolutionary and complex-system considerations, that genomic units that can significantly contribute to fitness in a certain habitat will be comparatively more complex in organisms specialized to that habitat than their genomic homologs found in organisms from other habitats. We tested this in cyanobacteria forming harmful water blooms, for which decades-long efforts in ecological physiology and genomics exist. Our results essentially confirm that genomics and ecology can be linked through comparative complexity analyses, providing a tool that should be of general applicability for any group of organisms and any habitat, and enabling the posing of grounded hypotheses regarding the ecogenomic basis for diversification.
Keywords: Microcystis aeruginosa; adaptation; comparative genomics; cyanobacteria; cyanobacterial bloom; ecogenomics; ecophysiology; genomic complexity; metatranscriptome; water blooms
Year: 2020 PMID: 32605986 PMCID: PMC7327172 DOI: 10.1128/mBio.01155-20
Source DB: PubMed Journal: mBio Impact factor: 7.867
FIG 1 Phylogeny of the 43 cyanobacterial strains used, based on 16S rRNA gene sequences. Blooming species are in blue and nonblooming species in red. Morphological types are indicated according to the traditional Rippka groups.
FIG 2 The core and query pathways in Aphanizomenon flos-aquae NIES-81. Each component is detailed in the corresponding tables (Tables S2 to S24). The pathways labeled with dashed borders are not complete (due to the absence of required components) in this strain but may be complete in others.
Ratio of complexity metrics of core and query pathways between blooming and nonblooming strains. Metric columns give the ratio with its P value in parentheses.
| Pathway | Mean ratio | NP | NC | NPC | LP | FLP | LC | FLC | PC |
|---|---|---|---|---|---|---|---|---|---|
| Vesicle | 6.6 | ||||||||
| Toxin | 2.5 | NA | NA | NA | NA | NA | |||
| Osmosis | 2.4 | ||||||||
| PUFA | 2 | 1.1 (1 × 10⁻¹) | |||||||
| AAT | 1.9 | ||||||||
| PBS | 1.8 | ||||||||
| MAA | 1.7 | 1.7 (2 × 10⁻¹) | 1.6 (2 × 10⁻¹) | 1.7 (5 × 10⁻²) | 1.7 (2 × 10⁻¹) | ||||
| NF | 1.7 | ||||||||
| CCM | 1.7 | ||||||||
| Sugar | 1.6 | ||||||||
| Sulfur | 1.6 | ||||||||
| MetalR | 1.5 | ||||||||
| N | 1.4 | 1.1 (8 × 10⁻²) | 1.1 (4 × 10⁻¹) | ||||||
| TEVit | 1.4 | 1.1 (4 × 10⁻¹) | |||||||
| P | 1.3 | 1.1 (1 × 10⁻¹) | 1.1 (4 × 10⁻¹) | ||||||
| DrugR | 1.3 | 0.6 (3 × 10⁻¹) | 1.2 (2 × 10⁻¹) | ||||||
| ORS | 1.1 | 1.0 (4 × 10⁻¹) | 1.0 (4 × 10⁻¹) | ||||||
| PSI | 1 | ||||||||
| PSII | 1 | 1.0 (4 × 10⁻¹) | 1.0 (4 × 10⁻¹) | ||||||
| Calvin | 0.9 | 1.0 (1 × 10⁻¹) | 1.0 (3 × 10⁻¹) | 1.0 (7 × 10⁻¹) | 1.0 (4 × 10⁻¹) | 0.7 (6 × 10⁻²) | 1.0 (4 × 10⁻¹) | 0.6 (5 × 10⁻²) | 1.0 (3 × 10⁻¹) |
| Glycolysis | 0.9 | 1.0 (9 × 10⁻¹) | 1.0 (7 × 10⁻¹) | 1.0 (7 × 10⁻¹) | 1.0 (5 × 10⁻¹) | 0.7 (8 × 10⁻²) | 1.0 (5 × 10⁻¹) | 0.9 (5 × 10⁻²) | 1.0 (7 × 10⁻¹) |
| ETC | 0.9 | 1.0 (1 × 10⁻¹) | 1.0 (2 × 10⁻¹) | 1.0 (8 × 10⁻¹) | 1.0 (6 × 10⁻¹) | 0.8 (9 × 10⁻²) | 1.0 (8 × 10⁻¹) | 0.8 (8 × 10⁻²) | 1.0 (2 × 10⁻¹) |
| PPP | 1.0 | 1.0 (5 × 10⁻¹) | 1.0 (3 × 10⁻¹) | 1.0 (7 × 10⁻¹) | 1.0 (4 × 10⁻¹) | 0.8 (6 × 10⁻²) | 1.0 (7 × 10⁻¹) | 0.9 (5 × 10⁻²) | 1.0 (6 × 10⁻¹) |
| TCA | 0.9 | 1.0 (5 × 10⁻¹) | 1.0 (5 × 10⁻¹) | 1.0 (7 × 10⁻¹) | 1.0 (5 × 10⁻¹) | 0.9 (5 × 10⁻²) | 1.0 (3 × 10⁻¹) | 0.7 (5 × 10⁻²) | 1.0 (7 × 10⁻¹) |
NP, total number of proteins in pathway; NC, total number of protein complexes (including multiprotein complexes and singular proteins); NPC, total number of proteins in the complete multiprotein complexes; LP, total length (in base pairs) of nucleotide sequences encoding all the proteins in pathway; FLP, fraction of total coding length of proteins (the ratio of LP to genome size); LC, total length (in base pairs) of proteins in the complete multiprotein complexes; FLC, fraction of total coding length of proteins in the complete multiprotein complexes (the ratio of LC to genome size); PC, the ratio of NC to the total number of complexes in the reference protein set; NA, not applicable.
P values indicate the significance of the difference between numerator and denominator of each ratio according to Wilcoxon sign-rank tests. Significant ratios are in boldface.
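The derived metrics in the footnote (FLP, FLC, and PC) are simple ratios of the raw counts and lengths, and can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `PathwayRecord` container, the function names, and the example numbers are all hypothetical, and taking the blooming-to-nonblooming ratio as a ratio of group means is an assumption, since the source does not state how values were aggregated.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PathwayRecord:
    """Raw measurements for one pathway in one strain (hypothetical container)."""
    n_proteins: int   # NP: number of proteins in the pathway
    n_complexes: int  # NC: protein complexes, including singular proteins
    lp: int           # LP: coding length of all pathway proteins, in bp
    lc: int           # LC: coding length of proteins in complete complexes, in bp
    genome_size: int  # genome size, in bp


def derived_metrics(rec, n_reference_complexes):
    """Compute FLP, FLC and PC as defined in the table footnote."""
    return {
        "FLP": rec.lp / rec.genome_size,               # LP relative to genome size
        "FLC": rec.lc / rec.genome_size,               # LC relative to genome size
        "PC": rec.n_complexes / n_reference_complexes,  # NC relative to reference set
    }


def group_ratio(blooming, nonblooming):
    """Blooming-to-nonblooming ratio of group means (assumed aggregation)."""
    return mean(blooming) / mean(nonblooming)


# Made-up numbers, for illustration only.
rec = PathwayRecord(n_proteins=12, n_complexes=4, lp=9_000, lc=6_000,
                    genome_size=5_000_000)
metrics = derived_metrics(rec, n_reference_complexes=5)
ratio = group_ratio([0.004, 0.006], [0.002, 0.003])
```

A ratio above 1 for a metric would indicate that, on average, the blooming strains score higher on that metric than the nonblooming strains.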
FIG 3 Average relative complexity (GCI) in blooming and nonblooming strains for each of the 24 pathways. B, blooming; NB, nonblooming. The background color of each plot represents the P values of the comparison between blooming and nonblooming strains, based on Wilcoxon sign-rank tests.
FIG 4 Bubble plots of Pearson’s correlations between GCI and BII in each pathway. The circle size represents either the coefficient of determination (R²) (A) or the slope of the correlation (B), with the significance (P) of the correlation indicated by color. Two correlations are also shown as examples: Vesicle (C) and Calvin cycle (D).
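The two quantities encoded by circle size, R² and slope, can both be obtained from the same sums of squares. The sketch below is a plain-Python illustration under stated assumptions: it treats BII as the x variable and GCI as the y variable (the caption does not say which is which), the function name `pearson_fit` is hypothetical, and the data are invented.

```python
from math import sqrt


def pearson_fit(x, y):
    """Return (r, r_squared, slope) for a least-squares line of y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)                    # spread of x
    syy = sum((b - my) ** 2 for b in y)                    # spread of y
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # co-variation
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    r = sxy / sqrt(sxx * syy)   # Pearson correlation coefficient
    slope = sxy / sxx           # least-squares slope of y on x
    return r, r * r, slope


# Invented values: x plays the role of BII, y the role of GCI (assumption).
bii = [0, 1, 2, 3]
gci = [0.2, 0.4, 0.5, 0.9]
r, r_squared, slope = pearson_fit(bii, gci)
```

R² measures how tightly the points follow a line, while the slope measures how steeply complexity changes along the x variable, which is why the two panels can rank pathways differently.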
FIG 5 Gas vesicle (Vesicle) (A) and oxidative stress resistance (OSR) (B) genes present in the 113 genomes studied, sorted by blooming capacity.
FIG 6 Correlations between the expression levels and GCI of the pathways studied in three different aquatic habitats. Metatranscriptomes were from eutrophic freshwater (Lake Erie, A and B; Harsha Lake, C and D), oligotrophic freshwater (Sparkling Lake, E and F), and oligotrophic ocean water (western Arctic Ocean, Canada, G and H). Reference genomes representing the dominant species in these habitats were used for mapping: M. aeruginosa NIES-843 (Lake Erie and Harsha Lake), Oscillatoria nigro-viridis PCC7112 (Sparkling Lake), and Synechococcus sp. WH8102 (western Arctic Ocean, Canada). Pearson correlations were performed with (A, C, E, and G) or without (B, D, F, and H) core pathways. The coefficient of determination (R²) and the significance (P) of the correlations are shown in the top left corner of each panel. GCI values were subjected to angular transformation.
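The angular transformation mentioned in the caption is conventionally the arcsine of the square root of a proportion. A minimal sketch follows, assuming GCI values lie in the range 0 to 1 (the caption does not state the range) and using a hypothetical function name and invented values.

```python
from math import asin, pi, sqrt


def angular_transform(p):
    """Arcsine square-root transform of a proportion, in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("angular transformation expects a value in [0, 1]")
    return asin(sqrt(p))


# Invented GCI-like proportions, for illustration only.
gci_values = [0.0, 0.25, 0.5, 1.0]
transformed = [angular_transform(v) for v in gci_values]
```

The transform maps the interval from 0 to 1 onto 0 to π/2 and stretches values near the ends of the range, which is the usual reason for applying it to proportions before a correlation analysis.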