Comparative Genomics of Cyclic di-GMP Metabolism and Chemosensory Pathways in Shewanella algae Strains: Novel Bacterial Sensory Domains and Functional Insights into Lifestyle Regulation.

Alberto J Martín-Rodríguez¹, Shawn M Higdon¹, Kaisa Thorell²,³, Christian Tellgren-Roth⁴, Åsa Sjöling¹, Michael Y Galperin⁵, Tino Krell⁶, Ute Römling¹.

Abstract

Shewanella spp. play important ecological and biogeochemical roles, due in part to their versatile metabolism and swift integration of stimuli. While Shewanella spp. are primarily considered environmental microbes, Shewanella algae is increasingly recognized as an occasional human pathogen. S. algae shares the broad metabolic and respiratory repertoire of Shewanella spp. and thrives in similar ecological niches. In S. algae, nitrate and dimethyl sulfoxide (DMSO) respiration promote biofilm formation strain specifically, with potential implication of taxis and cyclic diguanosine monophosphate (c-di-GMP) signaling. Signal transduction systems in S. algae have not been investigated. To fill these knowledge gaps, we provide here an inventory of the c-di-GMP turnover proteome and chemosensory networks of the type strain S. algae CECT 5071 and compare them with those of 41 whole-genome-sequenced clinical and environmental S. algae isolates. Besides comparative analysis of genetic content and identification of laterally transferred genes, the occurrence and topology of c-di-GMP turnover proteins and chemoreceptors were analyzed. We found S. algae strains to encode 61 to 67 c-di-GMP turnover proteins and 28 to 31 chemoreceptors, placing S. algae near the top in terms of these signaling capacities per Mbp of genome. Most c-di-GMP turnover proteins were predicted to be catalytically active; we describe in them six novel N-terminal sensory domains that appear to control their catalytic activity. Overall, our work defines the c-di-GMP and chemosensory signal transduction pathways in S. algae, contributing to a better understanding of its ecophysiology and establishing S. algae as an auspicious model for the analysis of metabolic and signaling pathways within the genus Shewanella.

IMPORTANCE Shewanella spp. are widespread aquatic bacteria that include the well-studied freshwater model strain Shewanella oneidensis MR-1. In contrast, the physiology of the marine and occasionally pathogenic species Shewanella algae is poorly understood. Chemosensory and c-di-GMP signal transduction systems integrate environmental stimuli to modulate gene expression, including the switch from a planktonic to sessile lifestyle and pathogenicity. Here, we systematically dissect the c-di-GMP proteome and chemosensory pathways of the type strain S. algae CECT 5071 and 41 additional S. algae isolates. We provide insights into the activity and function of these proteins, including a description of six novel sensory domains. Our work will enable future analyses of the complex, intertwined c-di-GMP metabolism and chemotaxis networks of S. algae and their ecophysiological role.

Keywords: Shewanella; c-di-GMP; chemotaxis; sensing; signal transduction; whole-genome sequencing

Year: 2022 PMID: 35311563 PMCID: PMC9040814 DOI: 10.1128/msystems.01518-21

Source DB: PubMed Journal: mSystems ISSN: 2379-5077 Impact factor: 7.324

INTRODUCTION

The gammaproteobacterial genus Shewanella comprises more than 70 species of facultative anaerobes that thrive in aquatic ecosystems, such as the water column, sedimental microbial communities, the microbiota of animals, and biofilms (1, 2). The renowned physiological versatility of Shewanella spp. is reflected by their isolation from habitats with remarkably diverse environmental conditions with respect to temperature, from polar to tropical (3, 4); salinity, including hypersaline environments (5); pressure, including deep-sea environments (6); and oxygen concentration, including hypoxic or anoxic waters (7). Their widespread distribution and outstanding metabolic toolset make Shewanella spp. important players in global biogeochemical cycles, and significant research and technical efforts are devoted to exploiting their biotechnological potential (1). While most Shewanella spp. are regarded as environmental bacteria, there is increasing evidence of the pathogenic potential in some species. The species Shewanella algae is responsible for approximately 80% of all Shewanella infections in humans (8). In the majority of cases, patients have underlying conditions, but occasionally, healthy individuals are affected (8, 9). Besides, S. algae has been reported to be the causative agent of animal infections, including mass mortality events of reared abalone (10) and ulcer disease in reared fish (11, 12), which underscores its pathogenic potential to higher eukaryotic organisms. Prokaryotic genome plasticity contributes to niche adaptation, which can be reflected at the genomic level by a variable accessory genome content, single nucleotide polymorphisms, and genome rearrangements. Strain-specific adaptation involving gene acquisition or loss has been described for a representative set of Shewanella spp. (13). Horizontal gene transfer has been recognized as an important driving force shaping Shewanella genomes, including genes involved in metal reduction (7, 14). 
We have previously reported strain-specific, respiration-driven biofilm formation in S. algae (15), a phenotype and potential niche adaptation mechanism in which sensing and signal transduction systems may play a role. The switch from a free-living, planktonic lifestyle to a sessile, biofilm mode of life constitutes a major physiological challenge for bacteria. In most Gram-negatives, the second messenger cyclic diguanosine monophosphate (c-di-GMP) regulates this lifestyle transition. Thus, high (local) c-di-GMP levels can correlate with a biofilm phenotype, and low (local) c-di-GMP pools are frequently associated with motile behavior (16, 17). Biosynthesis of c-di-GMP from GTP is performed by GGDEF domain-containing proteins termed diguanylate cyclases (DGCs), whereas its breakdown into GMP and pGpG is catalyzed by c-di-GMP-specific phosphodiesterases (PDEs) through either EAL or HD-GYP domains (16, 17). Dual GGDEF-EAL domain-containing proteins can possess either DGC or PDE activity, or both, or be catalytically inactive (16, 17). Most c-di-GMP turnover proteins harbor N-terminal sensory input domains that respond to environmental stimuli to control catalytic activities of the output domains (18). Local, target-specific cytosolic pools of c-di-GMP are increasingly recognized to play important regulatory roles (19). An example is c-di-GMP production by AdrA (DgcC) in Salmonella enterica serovar Typhimurium and Escherichia coli, which largely acts in situ to activate the cellulose synthase BcsA (20, 21). Shewanella spp. are among the bacteria with the highest number and density of GGDEF, EAL, and HD-GYP domain-encoding genes in their genomes (https://www.ncbi.nlm.nih.gov/Complete_Genomes/c-di-GMP.html), which represents the c-di-GMP proportion of the "bacterial IQ" (22, 23). This suggests a significant impact of c-di-GMP signaling on the physiology of Shewanella spp. (24–26), including S. algae (15), which is also observed in other aquatic gammaproteobacteria such as Vibrio spp. (27, 28).
The complex physiology of Shewanella spp. requires fine-tuned integration of environmental stimuli with regulation at all levels from gene expression to posttranslational modifications. Chemosensory pathways represent major bacterial signal transduction systems that can be classified into three main families. Members of the Fla family mediate flagellum-based chemotaxis, those of the TFP family are associated with type IV pilus-based motility, whereas the ACF family systems carry out alternative cellular functions like the control of c-di-GMP levels (29). Chemotactic movements are most frequently chemoattraction (i.e., movements to more optimal conditions for growth or survival) whereas there are relatively few cases of chemorepulsion (30). Energy taxis is an alternative mode of flagellum-based swimming motility that results in migration toward environments with more optimal levels of metabolic resources (30). In a canonical chemotaxis chemosensory pathway, chemoeffectors bind to the extracytosolic ligand-binding domains of chemoreceptors, an event that triggers a molecular stimulus that is transmitted across the membrane, where it modulates the activity of the CheA autokinase, which in turn modulates the phosphorylation of the response regulator CheY. Only the phosphorylated form of CheY binds to the flagellar motor, allowing chemotaxis to occur. In addition, the sensitivity of chemosensory pathways is adjusted to the chemoeffector concentration by the concerted action of the methyltransferase CheR and methylesterase CheB, which control the methylation state of several glutamate residues in the chemoreceptor’s signaling domain (31, 32). In addition, chemosensory pathways employ a number of auxiliary proteins that are present in some, but not all, pathways, such as the CheY phosphatases CheC, CheX, and CheZ (29). 
Chemotaxis has been extensively studied in the model species E. coli. However, over the last decades chemotaxis has also been studied in other species with a different lifestyle, revealing a wide diversity in the number and type of chemoreceptors as well as in the signaling mechanisms (33, 34). S. algae, inhabiting environmental and host-associated niches with different levels of trophism, constitutes an attractive archetypal model for the understanding of Shewanella species chemotactic systems and their involvement in niche colonization, evasion, and pathogenesis. Chemotaxis of marine bacteria is currently poorly investigated, revealing a need for further research in this area. There is evidence that Shewanella spp. perform chemotaxis (35, 36) and energy taxis (37). In addition, there is initial evidence for an additional chemosensory pathway in Shewanella (38) that is currently of unknown function. In this work, we define the repertoire of c-di-GMP turnover proteins and chemosensory systems in a collection of 42 S. algae strains, including the type strain, CECT 5071T (=DSM 9167T), whose complete genome sequence has been recently obtained (39). Through whole-genome sequencing (WGS), that included closing of 4 additional genomes, and subsequent bioinformatic analysis, we analyzed the pangenome of this strain collection and identified a diverse set of accessory genes that suggest a significant extent of physiological heterogeneity. We describe here the reference c-di-GMP proteome of the type strain S. algae CECT 5071, which, according to the presence and distribution of key signature motifs, consists of predicted catalytically active and inactive members. Reconstruction of the GGDEF, EAL, and HD-GYP domain phylogenies revealed a diverse origin of the members of this proteome, and intraspecific comparisons showcased substantial variability among isolates. 
Analysis of the N-terminal regions of c-di-GMP turnover proteins unraveled six novel bacterial sensory domains, including a previously unrecognized variant of the CSS domain. Furthermore, we dissect the chemosensory pathways of these 42 strains, including chemoreceptor topology, diversity, and gene synteny. Altogether, our work provides a comprehensive analysis aimed toward disentangling the complex and intertwined networks regulating motile and biofilm behaviors in this bacterial species and their ecophysiological implications.

RESULTS AND DISCUSSION

Pangenome analysis reveals distinct accessory genes in genetically independent strain backgrounds.

We initiated our study by analyzing genome sequence characteristics and strain assignment to the species S. algae, since misidentification of Shewanella sp. isolates has been documented, including the reclassification of Shewanella upenei (strain CCUG 58400 in our set) and Shewanella haliotis as later heterotypic synonyms of S. algae (40–42). The identity of the isolates as S. algae was confirmed by multilocus sequence typing (MLST) (41) and pairwise digital DNA-DNA hybridization (dDDH) (43; data not shown). All strains belong to the same subspecies, as determined by a dDDH >79% cutoff (43; data not shown). S. algae genomes ranged from 4.66 to 5.07 Mbp, with a GC content in the range from 52.8% to 53.2%, which agreed with reported data for this species (40). To evaluate the WGS diversity of the 42 S. algae isolates at the nucleotide composition level, we calculated the genomic distance, measured as the Jaccard Similarity Index (JSI), between each unique isolate pairing with MinHash sketches of their respective WGS assemblies (Fig. 1). The JSI for each of the all-by-all comparisons conducted spanned a range from 0.34 to 1, where lower values confirmed greater genomic distance and a value of 1 indicated absolute genomic similarity. Queries of all 42 MinHash genome sketches against the Genome Taxonomy Database (GTDB) corroborated the dDDH and MLST results by producing an array of 17 best-matching reference genomes with an average query to subject JSI of 0.53. These queries included three sequenced isolates that had a 100% match to S. algae GTDB references, namely, the type strain CECT 5071, CCUG 58400 (originally reported as S. upenei 20-23RT and later as a heterotypic synonym of S. algae as indicated above), and A41 (strain designation equivalent to NCTC 10738) (41). The all-by-all comparison of genome similarity among the 42 S. algae assemblies revealed each isolate to represent a distinct strain, except isolates HUD-H4 and HUD-I2, which were retrieved from the same individual before and after antibiotic treatment (see Table S1 in the supplemental material). Higher JSI values were also observed for the HUD-G3::20533 and 5043::A97 isolate WGS assembly comparisons. Taken together, our analyses showcased substantial phylogenomic diversity within this sequenced group of isolates.
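To make the MinHash comparison concrete, the following minimal Python sketch estimates the JSI between two sequences from bottom-s sketches of their k-mer hashes. It is an illustration only and not the software used in this study; the function names, the k-mer length, the sketch size, and the hash function are all assumptions, and reverse-complement (canonical) k-mers are ignored for brevity.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all overlapping k-mers in a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer with a stable (non-random) hash."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k=21, size=500):
    """Bottom-s MinHash sketch: the `size` smallest k-mer hash values."""
    return sorted(hash_kmer(km) for km in kmers(seq, k))[:size]


def jaccard_estimate(sketch_a, sketch_b, size=500):
    """Estimate the Jaccard index from two bottom-s sketches.

    The `size` smallest hashes of the merged sketches form a random
    sample of the union; the fraction found in both sketches estimates
    |A intersect B| / |A union B|.
    """
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


if __name__ == "__main__":
    # Hypothetical toy sequences, far shorter than a real genome assembly.
    a = sketch("ACGTAC", k=3)
    b = sketch("ACGTTT", k=3)
    print(round(jaccard_estimate(a, b), 3))
```

When both sequences contain fewer distinct k-mers than the sketch size, as in the toy example, the estimate equals the exact Jaccard index of the two k-mer sets; for whole genomes it is an approximation whose precision improves with larger sketches.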

FIG 1

WGS-based distance and taxonomic comparisons using MinHash. An all-by-all pairwise comparison of the MinHash genome sketch for each S. algae strain. The clustered distance matrix depicts the JSI, also referred to as Jaccard distance, for each unique strain pairing based on comparisons of WGS-assembly nucleotide composition. JSI values closer to 1 (darker) showcase higher WGS similarity, while values approaching 0 (lighter) reflect greater dissimilarity. The column annotation bar indicates each strain’s isolation source. Row annotation box colors reflect the S. algae reference genome from the GTDB with the highest JSI value to each of the 42 strains included in the study. Row annotation bar plot values represent the JSI between the query and GTDB subject strains.

TABLE S1

List of sequenced S. algae strains and relevant WGS parameters. DOCX file, 0.03 MB.

Next, we focused on analyzing the genomic diversity of our S. algae strains. To estimate the total gene pool of the 42 S. algae strains and assess their core and accessory gene repertoire, we performed a pangenome analysis. Upon successive addition of each genome, the number of protein-coding gene clusters containing homologs in the pangenome maintained a positive slope, indicating that the pangenome is essentially open, and reached a total of 10,122 (Fig. 2A). This indicated a substantial genetic diversity among the genomes of the sequenced S. algae strains, consistent with their diverse origins in terms of time, host, disease, and geography (Table S1), as well as their phylogenomic distance (Fig. 1). Conversely, the number of conserved genes decreased with the addition of each genome, resulting in 3,427 core genome genes (Fig. 2A). Thus, the S. algae pangenome is far from saturated. 
Overall, core genes (i.e., genes shared by all isolates) represented 37.65% of the pangenome, while shell genes (genes shared by 2 or more isolates) and cloud genes (genes unique to a single isolate) comprising the accessory genome represented 28.56% and 33.79%, respectively (Fig. 2B). Evaluating pangenome homologous gene cluster frequency demonstrated that genomic presence for the majority of accessory genome content was sparse, with more than half of the clusters being present in 25% or less of the S. algae isolates (Fig. 2C). Remarkably, these intra-specific proportions are similar to those reported for Shewanella in other comparative genomic studies involving a lower number of genomes belonging to diverse species (13, 14). Detailed content from our pangenome analysis is provided in Data Set S1 in the supplemental material.
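The pangenome categories and the rarefaction behavior described above can be illustrated with a short Python sketch. It applies the category definitions exactly as stated in the text (core, present in all isolates; shell, present in 2 or more; cloud, present in one); the thresholds applied by the pangenome pipeline itself may differ. The isolate names, gene cluster names, and function names are hypothetical and serve only as a toy example.

```python
from collections import Counter


def classify_clusters(genomes):
    """Label every gene cluster as 'core', 'shell', or 'cloud'.

    `genomes` maps an isolate name to the set of gene clusters it carries.
    """
    n_isolates = len(genomes)
    counts = Counter(c for clusters in genomes.values() for c in clusters)
    labels = {}
    for cluster, n_present in counts.items():
        if n_present == n_isolates:
            labels[cluster] = "core"    # shared by all isolates
        elif n_present >= 2:
            labels[cluster] = "shell"   # shared by 2 or more, but not all
        else:
            labels[cluster] = "cloud"   # unique to a single isolate
    return labels


def rarefaction(genomes, order):
    """Return (pangenome size, core size) after each genome is added."""
    pan, core, curve = set(), None, []
    for name in order:
        clusters = genomes[name]
        pan |= clusters                                        # union grows
        core = set(clusters) if core is None else core & clusters  # intersection shrinks
        curve.append((len(pan), len(core)))
    return curve


if __name__ == "__main__":
    # Hypothetical presence data for three toy isolates.
    toy = {
        "iso1": {"gyrB", "higB", "nrfA3"},
        "iso2": {"gyrB", "higB", "ureC"},
        "iso3": {"gyrB"},
    }
    print(classify_clusters(toy))
    print(rarefaction(toy, ["iso1", "iso2", "iso3"]))
```

A pangenome curve that keeps rising as genomes are added, while the core curve falls and then levels off, is the pattern described for Fig. 2A; in practice the addition order is usually permuted many times and the curves are averaged, which this sketch omits.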

FIG 2

Pangenome analysis of S. algae strains. (A) Pangenome rarefaction curves illustrating the growth in pangenome size (top, purple) and stabilization of the core genome (bottom, green) after subsequent genome additions during construction. (B) Pie chart showcasing relative proportions of the core, shell, and cloud pangenome categories. (C) Gene frequency bar plot reflecting the number of homologous gene clusters present (y axis) in respective proportions of the isolate population used to construct the S. algae pangenome (x axis). Bars to the far right indicate gene clusters comprising the core genome. (D) Clustered pangenome dendrogram based on the respective accessory genome (gene presence/absence for shell and cloud genomic categories) profiles of each independent S. algae genome. Each column represents a homologous gene cluster with blue indicating presence and white indicating absence. Dendrogram tip point colors designate the isolation source. (E) Summary of S. algae pangenome KEGG pathway annotations. Bar colors correspond to the respective pangenome category (red, cloud; blue, core; green, shell), and bar heights indicate the percentage of total gene clusters comprising each category that received KEGG pathway annotations.

DATA SET S1

Gene presence and absence in S. algae genomes. Shown are output data from pangenome analysis using the Roary pipeline. GFF files, including Prokka annotations used in pangenome analysis, are available at https://github.com/ctmrbio/Salgae_c-di-GMP_analysis. XLSX file, 2.4 MB.

To contextualize the relationships between isolates based on their pangenome, we clustered the strains based on their accessory genome content (Fig. 2D). Clades formed when comparing homologous gene cluster profiles were similar to those observed when assessing relationships with JSI (Fig. 1). 
While isolates HUD-H4 and HUD-I2 shared the same distinct accessory genome profile, this clustering also revealed substantial similarities in accessory genome content between the clinical isolate pairs HUD-G3::20533 and 5043::A97, consistent with their WGS-based similarities (Fig. 1). Next, to gain functional insights into the core, shell, and cloud genome contents of the 42 S. algae strains, KEGG pathway annotations were generated for each pangenome category (Fig. 2E). Most core genome genes belonged to amino acid and carbohydrate metabolism, representing together about 25% of the core genome content. Other significantly populated KEGG pathways in the core genome were energy metabolism, metabolism of cofactors and vitamins, and signal transduction, altogether representing about 20% of the core genome content. Carbohydrate metabolism pathways also comprised the majority of accessory (cloud and shell) genes. Notably, the second most populated accessory genome category was glycan biosynthesis and metabolism, suggesting a rather diverse glycobiology among S. algae strains that so far has only been incipiently explored (44). Signal transduction, secondary metabolite biosynthesis, and cellular community pathways were also significantly represented in the accessory genome. Among cloud genes, we identified some noteworthy features, including antibiotic resistance determinants such as sul2, aph(3'')-Ib, aph(6)-Id, floR, tetR, and tetD (strain 150735, locus tags JKK46_16645, JKK46_16650, JKK46_16655, JKK46_16670, JKK46_16685, and JKK46_16690). An additional nitrite reductase, besides the two constitutive nrfA paralogs, was identified in strain 97087 (locus tag I6M44_21660), which also contained a copper resistance operon (locus tags I6M44_04940, I6M44_04945, I6M44_04950, and I6M44_04955). 
Accessory genes included genes associated with representative toxin-antitoxin systems (e.g., higBA, strain CCUG 12945, locus tags I6M59_21290 and I6M59_21295), restriction-modification systems (e.g., CCUG 20533, locus tags I6M54_19235 to I6M54_19245), and a fimbrial biogenesis cluster (strain CCUG 24987, locus tags I6M53_13090 to I6M53_13105). A urease operon found in strain CCUG 789 (locus tags I6M58_18955 to I6M58_18985) is exceptional, as S. algae has been reported to be urease negative (41). Besides, numerous putative c-di-GMP turnover genes and chemotaxis protein-coding genes were identified in the accessory pangenome (see below). Of note, the accessory nitrite reductase NrfA of S. algae 97087, a strain that lacks the dimethyl sulfoxide reductase operon (15), is more closely related to NrfA proteins of Shewanella xiamenensis and Shewanella putrefaciens than to either of the two NrfA paralogs seen in other strains of S. algae (see Fig. S1A in the supplemental material). This finding supports the possibility that horizontal gene transfer of anaerobic respiration pathways between S. algae and other Shewanella spp. has occurred (15). The genomic context for nrfA-3 in S. algae 97087 suggests acquisition via transposition (Fig. S1B). Also of note, while all strains contained qnrA and bla-OXA-55 genes, the human wound isolate 150735 carried additional virulence factors, including antibiotic resistance determinants encoded by a chromosomal genomic island at positions 3670095 to 3775315 that contained IS91 and Tn3 family transposase elements and a type IV conjugal transfer system, as well as phage-related proteins (data not shown).

FIG S1

Phylogeny and gene synteny of nitrite reductases in S. algae genomes. (A) Maximum likelihood phylogenetic reconstruction of evolutionary relationships of NrfA subunits in S. algae genomes. The evolutionary history was inferred by using the Whelan and Goldman model as implemented in MEGA X software. Bootstrap support values (in %) after 1,000 iterations are indicated. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. (B) Gene synteny of nitrite reductases in S. algae genomes, including the unique third reductase of S. algae 97087. TIF file, 1.0 MB.

Collectively, the components of our pangenome analysis highlight the distinct genomic features and rich genetic diversity among these 42 isolates. As part of our ongoing efforts to understand S. algae biofilm physiology in response to environmental stimuli, the following sections showcase a detailed analysis of cyclic di-GMP metabolism and chemosensory networks.

The Shewanella algae c-di-GMP proteome is mostly conserved among strains.

Biofilm formation in Shewanella spp. is typically controlled by the bacterial second messenger c-di-GMP (15, 45, 46). To explore the diversity of c-di-GMP metabolizing proteins in the 42 S. algae strains, we first identified putative c-di-GMP turnover proteins in the type strain CECT 5071. This search revealed 63 open reading frames (ORFs): 33 proteins with a GGDEF domain (PF00990 domain in Pfam [47]) being putative DGCs, 4 proteins with an EAL domain (PF00563 in Pfam) with putative c-di-GMP PDE activity, 19 hybrid proteins with a GGDEF-EAL domain combination, and 7 putative c-di-GMP PDEs with an HD-GYP domain. The type strain c-di-GMP turnover protein density was 12.79 per Mbp, which is close to the average for Shewanella spp., but much higher than the bacterial average of 4.11 per Mbp (calculated from the data in https://www.ncbi.nlm.nih.gov/Complete_Genomes/c-di-GMP.html). With S. algae CECT 5071 being the type strain and the reference model, we thoroughly dissected its c-di-GMP turnover proteome. We performed sequence alignments of GGDEF, EAL, and HD-GYP domains, encoded in the genome of this strain, investigated the presence of key structural motifs, and reconstructed their phylogenies with respect to reference proteins. Of the 33 GGDEF domain proteins of S. algae CECT 5071, 27 domains contain the GG(D/E)EF signature motifs indicative of catalytic activity (see Fig. S2A in the supplemental material). Four domains from proteins WT_01800, WT_01911, WT_02417, and WT_02823 contain neither the GG(D/E)EF motif nor additional consensus motifs required for catalytic activity, making their functionality as c-di-GMP synthases unlikely. Two proteins were excluded from this comparison. Protein WT_04201 is 169 amino acids long and consists solely of a stand-alone GGDEF domain with no N-terminal signaling domain and the eponymous active-site motif changed to NHHLF. 
WT_00162 is highly truncated, consisting of only 76 amino acids corresponding to the C terminus of the GGDEF domain. While GGDEF domains with altered signature motifs are usually catalytically inactive, they may play regulatory roles (16). An RxxD autoinhibitory I-site (48) involved in allosteric c-di-GMP binding that restricts the DGC activity was identified in 21 of the 31 full-length GGDEF domain proteins (Fig. S2A).

FIG S2

Structure-based alignments of GGDEF, EAL, and HD-GYP domains of S. algae CECT 5071. (A) Structure-based alignment of all full-length GGDEF domains. Only proteins with a full-length GGDEF domain were considered; WT_04201 had been excluded. The crystal structure of the GGDEF domain from Caulobacter vibrioides protein PleD (PDB ID 2V0N) was used as a reference. α-Helices are indicated with coils, β-strands with arrows, and the bottom line indicates residue accessibility. Blue dots indicate GGDEF domain proteins, and red dots indicate GGDEF-EAL domain proteins. Green triangles indicate predicted catalytically nonfunctional domains. (B) Structure-based alignment of EAL domains. Only proteins with a complete EAL domain were considered. The EAL domain from Klebsiella pneumoniae protein BlrP1 (PDB ID 3GG1) was used as a reference. Blue dots indicate EAL domain proteins, and red dots indicate GGDEF-EAL domain proteins. Green triangles indicate predicted catalytically nonfunctional domains. (C) Structure-based alignment of HD-GYP domains. The HD-GYP domain of Persephonella marina protein PmGH (PDB ID 4MDZ) was used as a reference. PDF file, 1.5 MB.

A phylogenetic reconstruction of GGDEF domains of S. algae CECT 5071 with respect to the reference GGDEF domains from other bacterial species is presented in Fig. 3. Thirty-one of the 33 GGDEF-domain proteins contain N-terminal fragments that could serve as potential sensor domains. 
Twenty-one of these proteins are predicted to be membrane bound and contain one or more transmembrane helices with periplasmic or membrane-embedded N-terminal domains, most of which were not identified by SMART, Pfam, or CD-search. N-terminal sensor domains include the small ligand binding PAS/PAC domain and GAF domain in three proteins each. These domains are most commonly associated with GGDEF and EAL signaling domains, can bind small molecules such as NO, oxygen, and nucleotides, and sense light, temperature, and the redox status. Periplasmic substrate-binding domains (PBPb; PF00497 in Pfam) were found in WT_00716, WT_03640, and WT_03791, 7TMR-DISMED2 integral membrane domains in WT_00221, WT_01017, and WT_02230, and a dCache domain in WT_04108. The identification of novel N-terminal domains in these and other c-di-GMP turnover proteins is described in the next section.
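As a rough illustration of the motif-based classification described in this section, the following Python sketch scans a GGDEF domain sequence for the GG(D/E)EF active-site motif and for an RxxD motif a short distance upstream of it. This plain pattern search is a deliberate simplification and is not the structure-based alignment used here; the function name, the upstream window length, and the peptide strings in the example are assumptions made for illustration only.

```python
import re

# Active-site signature GG(D/E)EF and the I-site pattern RxxD (x = any residue).
ACTIVE_SITE = re.compile(r"GG[DE]EF")
I_SITE = re.compile(r"R..D")


def classify_ggdef(domain_seq, window=12):
    """Report motif presence in a GGDEF domain sequence.

    `window` (an assumed value) is how many residues upstream of the
    active-site motif are searched for an RxxD pattern.
    """
    seq = domain_seq.upper()
    match = ACTIVE_SITE.search(seq)
    if match is None:
        # No canonical signature, e.g. a degenerate motif such as NHHLF.
        return {"active_site": False, "i_site": False}
    upstream = seq[max(0, match.start() - window):match.start()]
    return {"active_site": True, "i_site": I_SITE.search(upstream) is not None}


if __name__ == "__main__":
    # Made-up toy peptides, not sequences from S. algae.
    examples = {
        "with_i_site": "MKLRESDLAARYGGEEFAV",
        "no_i_site": "MKLAESALAARYGGDEFAV",
        "degenerate": "MKLRESDLAARYNHHLFAV",
    }
    for name, peptide in examples.items():
        print(name, classify_ggdef(peptide))
```

A positive motif hit only suggests catalytic competence or product inhibition; as noted above, additional consensus residues are required for activity, so such a scan is at best a first-pass filter before alignment against a reference structure.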

FIG 3

Phylogenetic tree of full-length GGDEF domains of S. algae CECT 5071 and domain architectures of the corresponding proteins. Experimentally characterized GGDEF domain proteins AdrA (STM0385), YfiN (STM2672), YciR (STM1703), and YfgF (STM2503) from Salmonella Typhimurium and PleD (PDB ID 2V0N) from Caulobacter vibrioides were chosen as reference points. The circles on the right indicate GGDEF domains with (filled blue circles) or without (open blue circles) the active-site GG(D/E)EF motif and the presence of the autoinhibitory RxxD motif (black circles) and/or the EAL domain with (filled green circles) or without (open green circles) the eponymous EAL motif. Node support values above 30% are indicated. Protein names are followed by their domain architectures, determined by searches with SMART, ScanProsite, HHPred, and CDVist. Domains identified by HHSearch as implemented in CDVist are displayed if the probability was ≥90.0%. Protein lengths (in amino acid residues) are indicated at the bottom. The domain names and their Pfam database (47) entries are as follows: GGDEF, PF00990; EAL, PF00563; HAMP, PF00672; GAF, PF01590; PAS (or PAS+PAC), PF00989; CHASE, PF03924; TPR, PF00515; DUF, PF11849; Reg_prop, PF07494; YYY, PF07495; NIT, PF08376; GAPES4, PF17157; SGL, PF08450; REC, Response_reg, PF00072; SpoVT_C, PF15714; dCache, PF02743; 5TM-5TMR_LYT, PF07694; PBPb, SBP_bac_3, PF00497; 7TMR-DISMED2, PF07696; 7TMR-DISM_7TM, PF07695; MASE1, PF05231; MASE2, PF05230; MASE3, PF17159; ANAPC5, PF12862; FleQ, PF06490; ECF-ribofla_trS, PF07155; TOM20_plant, PF06552; Protoglobin, PF11563; DivIC, PF04977. For MASE6 and MASE7, see Fig. S5; for DgcCoil, see Fig. S6.

FIG S5

Sequence alignment and membrane topology of the MASE6, MASE7, and MASE8 (membrane-associated sensor) domains of S. algae strains. The proteins are listed under their GenBank or RefSeq accession numbers and are linked to the respective entries in the NCBI protein database; the numbers indicate the positions of the residues within the respective proteins. S. algae proteins are shown on the top line; the second line shows transmembrane (TM) segments and their topology, predicted by TMHMM2.0 (http://www.cbs.dtu.dk/services/TMHMM/). Uncharged residues in predicted TM segments are shaded yellow, positively charged residues used to predict membrane topology are in blue, and negatively charged residues are in red. Conserved residues are shown in bold and indicated as follows: acidic, red; positively charged, blue; hydroxyl-containing, purple; hydrophobic, yellow shading; small/turn (Gly, Ala, Pro), green shading. The numbers of omitted residues are in white on a black background. (A) Sequence alignment of the MASE6 domain. The alignment includes proteins from gammaproteobacteria, including Vibrio cholerae (AAF94753) and Pseudomonas mendocina (ABP86790); alphaproteobacterium Rhodospirillum rubrum (ABC24185); zetaproteobacterium Mariprofundus micogutta (GAV20351); a spirochete (OQY32249), three firmicutes (PKM87762, PKM68690, and SHI20590), and a member of Balneolaeota (AXI99330). All listed proteins consist of MASE6 and GGDEF domains, except for the last two, SHI20590 from Sporobacter termitidis with a MASE6-PAS-GGDEF-HD-GYP domain architecture and AXI99330 from “Candidatus Cyclonatronum proteinivorum,” which has the MASE6-HisKA-HATPase-REC domain architecture. (B) Predicted membrane topology of the MASE6 domain of WT_00655 protein drawn by Protter (http://wlab.ethz.ch/protter/); TM segments are numbered as in panel A. Positions of the highly conserved residues Glu48 and Asn149 are marked with red and purple arrows, respectively, and conserved aromatic residues are indicated with yellow circles. (C) Sequence alignment of the MASE7 domain. 
The alignment includes proteins from a variety of bacteria, including proteobacteria (first 10 sequences), Bacteroidetes (OFX80737 and ), Leptospira biflexa (ABZ94876), two firmicutes (PKM64352 and OGS53999), and two Mycobacterium spp. (AAS03635 and AF017731). With one exception, sequences from proteobacteria and firmicutes combine MASE7 and GGDEF domains; the OGS53999 protein from an uncharacterized firmicute contains an additional hemerythrin domain. In proteins from Bacteroidetes and Leptospira, MASE7 is connected to histidine kinases; the proteins from Mycobacterium spp. and EKF59967 from Agrobacterium albertimagni are adenylate cyclases. The last sequence represents the N-terminal domain of the experimentally characterized adenylate cyclase Rv1625c (CYA1_MYCTU) from Mycobacterium tuberculosis H37Rv. Mutating Arg43→Ala or Arg44→Gly decreased the activity of the enzymatic domain almost 3-fold; the double mutant had less than 1% of wild-type adenylate cyclase activity, whereas mutating these Arg residues to Lys did not affect the enzyme activity. (D) Sequence alignment and membrane topology of the MASE8 domain. The upper block shows sequences from gammaproteobacteria; the next block shows sequences from other proteobacteria, followed by sequences from members of other phyla (Firmicutes, Saccharibacteria, and Nitrospinae). Most proteins have the MASE8-GGDEF architecture, but Bradyrhizobium tropiciagri protein WP_212020812 has a MASE8-GGDEF-EAL architecture, while sequences from the alphaproteobacterial Sulfitobacter sp. strain SK011 (AXI43256) and Labrenzia suaedae (SHN17585) combine MASE8 with the adenylate/guanylate cyclase domain (PF00211); in the betaproteobacteria Methylobacterium tarhaniae (KMO44214) and Undibacterium pigrum (PXX44803), MASE8 is combined with histidine kinases. The last line shows an alignment of the MASE8 domain with the 6TM transmembrane region of mammalian adenylyl cyclase (PF16214), which highlights their limited similarity. 
Download FIG S5, PDF file, 0.3 MB. Sequence alignments of novel N-terminal domains of S. algae c-di-GMP turnover proteins and a new version of the CSS (PF12792) sensor domain. S. algae proteins are shown on the top line. Residue coloring and other details are as in Fig. S5. (A) Sequence alignment and membrane topology of the VUPS (vitamin uptake-like sensor) domain. The alignment includes proteins from gammaproteobacteria (upper block), other proteobacteria (second block), other bacterial phyla (third block), and archaea (fourth block). The bottom block shows the sequences of the putative 7-cyanodeazaguanine (preQ0) transporters from E. coli (YhhQ) and Chlamydia trachomatis and the respective COG and Pfam entries (COG1738, PF02592), which show a distant relationship with the VUPS domain. Included in this alignment are diguanylate cyclases, c-di-GMP phosphodiesterases, and histidine kinases with VUPS-GGDEF (QTE90264, ABS66673, ACN17581), VUPS-PAS-GGDEF-EAL (SDH91283, TKD14289, KGM01083), VUPS-PAS-HisKA-HATPase (NPD45052, OQD43493), VUPS-PAS-PAS-HisKA-HATPase (MBC8504223), and VUPS-PAS-HisKA-HATPase-REC (QEE14403) domain architectures. (B) Sequence alignment of DGCcoil, the coiled-coil N-terminal domain of S. algae protein WT_00119. All sequences are from gammaproteobacteria; the four blocks represent sequences from Alteromonadales, Legionella pneumophila, Vibrionales, and Pseudomonadales, respectively. The second line shows secondary structure prediction by Jpred4 (http://www.compbio.dundee.ac.uk/jpred/). The coiled-coil structure of the upper and bottom parts of this alignment has been confirmed by several coiled-coil prediction programs, including DeepCoil (https://toolkit.tuebingen.mpg.de/tools/deepcoil). 
The Vibrio cholerae protein VC0900 (CdgG, AAF94062) and Pseudomonas aeruginosa protein PA5487 (DgcH, AAG08872) have been shown to be active diguanylate cyclases that are involved in biofilm formation and other c-di-GMP-related functions; they are both widespread in the members of the respective genera. (C) Sequence alignment of the 2TM membrane anchor of S. algae protein WT_00055 and related proteins from Shewanella spp. or Aeromonas spp. All proteins in this alignment have the same 2TM-GAF-GGDEF domain architecture. (D) Sequence alignment of two types of CSS domains. The top line shows the sequence of the atypical CSS/CxxC domain from S. algae CECT 5071 protein WT_03853 (GenBank accession no. QTE76250), linked to its GenBank entry. The second line shows secondary structure prediction for WT_03853 by Jpred4 (http://www.compbio.dundee.ac.uk/jpred/). Other proteins are listed under their UniProt accession numbers and are linked to the respective UniProt entries. The upper group includes atypical CSS domains with an extra CxxC motif from the gammaproteobacteria Shewanella, Aeromonas, Photobacterium, and Vibrio spp. and the acidobacterium “Candidatus Koribacter versatilis.” The bottom group includes typical CSS (PF12792) domains from experimentally characterized proteins of Escherichia coli (PdeB and PdeC) and Pseudomonas aeruginosa (Arr). Conserved Cys residues are shown in white on a red background, Ser residues of the CSS motif are in purple; conserved hydrophobic residues are shaded yellow. Download FIG S6, PDF file, 0.2 MB. The four EAL domain proteins retain all recognized active-site residues for c-di-GMP-hydrolyzing PDE activity (Fig. S2B). A phylogenetic reconstruction of S. algae CECT 5071 EAL domains with respect to reference EAL domains from other bacterial species is presented in Fig. 4. WT_01776 and WT_03588 represent stand-alone EAL domains. 
The putative c-di-GMP PDE WT_01622 is a response regulator of the PvrR/RocR family that has the REC-EAL domain architecture, combining the EAL domain with the phosphoacceptor (receiver [REC]) domain of two-component signal transduction systems (49) (Fig. 4). A novel divergent CSS motif domain (described below) was identified in WT_03853.

FIG 4

Phylogenetic tree of the EAL domains of S. algae CECT 5071 and domain architectures of the corresponding proteins. EAL domains of experimentally characterized proteins YciR (STM1703), YfgF (STM2503), YhjH (STM3611), and YjcC (STM4264) from Salmonella Typhimurium, YahA from E. coli, Tbd1265 from Thiobacillus denitrificans (PDB ID 3N3T), and BlrP1 from Klebsiella pneumoniae (PDB ID 3GG1) were chosen as reference points. Other details and domain shapes are as in Fig. 2, except for the BLUF (PF04940), CBS (PF00571), LapD_MoxY_N (PF16448), CSS-motif (PF12792), and DUF3369 (PF11849) domains.

Of the 19 GGDEF-EAL domain proteins, 15 GGDEF domains contain a GGDEF motif, with 5 of them containing an RxxD I-site motif, while the conservation of the consensus motif indicates functionality for 16 EAL domains. The GGDEF domains of proteins WT_00615, WT_03649, and WT_03908 do not contain a GG(D/E)EF motif, but the EAL domains of these proteins could be functional. Two proteins, WT_00995 (ortholog of E. coli CsrD [50, 51]) and WT_01800 (ortholog of Pseudomonas fluorescens LapD [52, 53]), possess unconventional EVF and ELF active-site motifs, respectively. Based on the conservation of the catalytically important motifs, WT_00995 might possess PDE activity, as has been observed for PigX of Serratia sp. (54), while WT_01800 is likely not catalytically active. The EAL domain of WT_01795 has the eponymous EAL motif replaced by the DVR motif and also lacks other motifs involved in metal binding and catalysis, suggesting a lack of catalytic activity (Fig. 3; Fig. S2A and B). With respect to the N-terminal signaling domains, 10 proteins contain at least one PAS domain and 3 proteins contain at least one GAF domain (Fig. 3). Two proteins contain a REC (receiver) domain characteristic of two-component phosphotransfer systems, and 11 proteins contain at least one transmembrane helix (Fig. 3). The GGDEF and EAL domain superfamilies are two of the most abundant bacterial protein superfamilies. 
Analysis of the phylogenetic relationships of 50 of the 52 GGDEF domains, excluding WT_00162 and WT_04201, with a truncated and a highly divergent GGDEF domain, respectively, showed that GGDEF domains of S. algae are distantly related, clustering rather with GGDEF domains of similar domain architecture from other species (Fig. 3; Fig. S2A) as previously observed (55, 56). Similarly, S. algae EAL domains are distantly related, with the closest homologs of similar domain architecture found in other species (Fig. 4). These results generalize previous findings that EAL domains cluster according to their domain architectures rather than according to species (55), suggesting that most of the respective genes were vertically inherited from a common ancestor and subsequently evolved with respect to domain structure and sequence diversification, although horizontal gene transfer also played a role (7).

The HD-GYP domain (57) is characterized by c-di-GMP-specific (and occasionally cGAMP-specific) PDE activity (58, 59). Seven HD-GYP domain proteins with characteristic consensus amino acid signature motifs suggesting catalytic activity were identified. The HD-GYP domains of functionally or structurally characterized reference proteins of the two HD-GYP subfamilies and the cGAMP PDE HD-GYP domain of VCA0681 (60) were aligned with S. algae HD-GYP domains (Fig. S2C). Three S. algae HD-GYP domain proteins have an N-terminal REC domain, three have the DUF3391 (Pfam PF11871) domain, which has not yet been experimentally characterized but whose structural model is available on the Pfam website (61), and one protein has a GAF sensor domain. Phylogenetic analysis (Fig. 5) revealed that HD-GYP domain proteins in S. algae CECT 5071 belong to the two previously identified subfamilies (58). Based on the conservation of the Fe-coordinating Glu residue (E185 in the structure of Persephonella marina PmGH [PDB ID 4MDZ]) (Fig. S2C), four HD-GYP domains of S. algae (WT_00636, WT_00868, WT_01695, and WT_03715) contain a predicted trinuclear metal center and are likely to hydrolyze c-di-GMP to GMP, whereas the other three HD-GYP domain proteins (WT_01528, WT_03378, and WT_04118) contain a binuclear center and likely hydrolyze c-di-GMP only to the linear dinucleotide pGpG (61). No close homolog of the distinct cGAMP-specific HD-GYP domain of the Vibrio cholerae protein VCA0681 (60) is encoded in S. algae CECT 5071.
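The motif-based predictions above (presence of a GG(D/E)EF motif, with or without an RxxD I-site) amount to a pattern search over each protein sequence. The following is a minimal Python sketch under stated assumptions: the function name `scan_ggdef`, the `isite_window` parameter, and the toy sequences used to exercise it are hypothetical and are not taken from the S. algae proteome. Searching a fixed window upstream of the motif for RxxD is a simplification; the assignments in the text rest on curated alignments (Fig. S2A), not on a regular expression.

```python
import re

# GG(D/E)EF signature motif of GGDEF domains.
GGDEF_MOTIF = re.compile(r"GG[DE]EF")
# RxxD I-site motif, where x is any residue.
ISITE_MOTIF = re.compile(r"R..D")


def scan_ggdef(seq, isite_window=15):
    """Return one record per GG(D/E)EF motif found in a protein sequence.

    isite_window is an illustrative assumption: the number of residues
    upstream of the motif that are searched for an RxxD pattern.
    """
    seq = seq.upper()
    hits = []
    for match in GGDEF_MOTIF.finditer(seq):
        start = match.start()
        upstream = seq[max(0, start - isite_window):start]
        hits.append({
            "motif": match.group(),
            "position": start + 1,  # 1-based residue number
            "has_rxxd_upstream": bool(ISITE_MOTIF.search(upstream)),
        })
    return hits
```

A call such as `scan_ggdef(sequence)` returns an empty list for a degenerate domain that lacks the motif. A hit shows only that the pattern is present; it does not demonstrate catalytic activity.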

FIG 5

Phylogenetic tree of HD-GYP domains of S. algae CECT 5071 and domain architectures of the corresponding proteins. The experimentally characterized HD-GYP domain proteins from Vibrio cholerae (VCA0681, PDB ID 5Z7C), Pseudomonas aeruginosa PAO1 (PA4108 and PA4781; PDB ID 4R8Z), Persephonella marina (PDB ID 4MDZ), and Xanthomonas campestris (RpfG) were chosen as reference points for HD-GYP domains. Other details and domain shapes are as in Fig. 2, except for the HD-GYP domain (COG2206 in the COG database [72]) and DUF3391 (PF11871 in Pfam).

We subsequently retrieved all putative c-di-GMP metabolizing proteins from the other 41 S. algae genomes. The number of c-di-GMP metabolizing proteins identified ranged from 61 (S. algae A57) to 67 (S. algae A41), with relative c-di-GMP turnover protein densities between 12.38 per Mbp (S. algae A56 and S. algae A97) and 13.53 per Mbp (S. algae A41) (Fig. 6A). The distribution of conserved, nonconserved, and distantly related c-di-GMP turnover orthologs of the type strain in the other 41 isolates is presented in Fig. 6B. Two proteins, WT_00103 and WT_00162, are absent in most or all other strains. Thus, WT_00103 orthologs were only found in 11 of the 41 strains, suggesting that this putative DGC is uncommon. WT_00103 is located in the relA-recN genomic region at positions 3302276 to 3319231 in the genome of strain CECT 5071 (39), and part of this region is contained in a genomic island (see Fig. S3A in the supplemental material). In S. algae strain G1, which lacks this DGC, the same genomic region is divergently transcribed and located at a different position in the complete genome sequence (Fig. S3B). The truncated, stand-alone GGDEF domain protein WT_00162 is unique to the type strain and might be vestigial. Divergent orthologs of the GGDEF domain proteins WT_02322 and WT_02823 and the GGDEF-EAL domain protein WT_03275 were frequently found in other strains, suggesting that these proteins might be under evolutionary pressure in S. algae.
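The density figures above are protein counts divided by genome size in Mbp. As a small worked sketch in Python, the relation can be inverted for strain A41, the only strain for which the text gives both the count (67) and the density (13.53 per Mbp). The function names are hypothetical, and the genome size obtained this way is only an implied value from rounded numbers, not a figure reported in the text.

```python
def turnover_density(n_proteins, genome_size_mbp):
    """c-di-GMP turnover proteins per Mbp of genome."""
    return n_proteins / genome_size_mbp


def implied_genome_size(n_proteins, density_per_mbp):
    """Genome size (Mbp) implied by a protein count and a density."""
    return n_proteins / density_per_mbp


# Strain A41: 67 proteins at 13.53 per Mbp (both values from the text).
size_a41_mbp = implied_genome_size(67, 13.53)
print(f"Implied genome size of A41: {size_a41_mbp:.2f} Mbp")
```

The same calculation is not possible for the lower bound, because the text gives the lowest count (A57) and the lowest density (A56 and A97) for different strains.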

FIG 6

Cyclic di-GMP turnover proteome of Shewanella algae strains. (A) Distribution of GGDEF, EAL, GGDEF+EAL, and HD-GYP domains in S. algae genomes (bars, left axis) and total putative c-di-GMP turnover protein density per genome (dots, right axis). (B) Heat map showing the degree of conservation of putative orthologs of CECT 5071 c-di-GMP turnover proteins in the other 41 S. algae genomes. Green, proteins that are ≥90% identical in length (Palign ≥ 90%) and have ≥95% amino acid sequence identity (Pident ≥ 95%); yellow, proteins with either Palign > 90% and Pident > 75% or Palign > 50% and Pident > 90%; red, proteins with Palign < 50% and Pident < 75%; orange, annotated proteins showing single nucleotide changes disrupting the open reading frames or split across contigs.

Genomic islands of S. algae strains CECT 5071 and G1. Shown are the genomic islands of the type strain CECT 5071 and the acute enteritis isolate G1, as determined by IslandViewer 4 using different prediction methods: integrated (red), IslandPath-DIMOB (blue), SIGI-HMM (orange), and IslandPick (red). Relevant gene clusters are highlighted. Distinct genes in the relA-recN genomic region depicted for CECT 5071 and G1 are shaded gray and represented within a dashed box. The locus tags of the first and last genes represented in each scheme are indicated. Download FIG S3, TIF file, 0.7 MB.

The phylogenetic positions of GGDEF, EAL, and HD-GYP domains identified in the 41 other S. algae strains with reference to S. algae CECT 5071 proteins are shown in Fig. 7A to C, and the domain architectures and key features of representative proteins are presented in Fig. S4 in the supplemental material. These include the membrane-bound GGDEF domain proteins 159418_02667 and 97087_01449 (5TM helices each) and 789_00791 (7TM helices), with only 97087_01449 showing the RxxD inhibitory I-site. 
A cytoplasmic PAC-GGDEF domain protein was identified in strains A41 and G1 (shown is that of strain G1, G1_04334), in a distinct genomic island that also contained a cytoplasmic GGDEF-EAL domain protein with 3 PAS domains and one GAF domain (G1_04336), along with chemosensory proteins (Fig. S3B). A membrane-bound GGDEF-EAL domain protein with a HAMP domain typical of bacterial transmembrane sensory proteins was found in strain A291 (A291_04011) lacking GGDEF and EAL motifs and an I-site, suggesting catalytic inactivity. Unique cytoplasmic EAL domain proteins were identified in strains of different genetic backgrounds, including HUD-G3_03949, which shows a similar architecture to WT_01622, including an N-terminal REC domain. A putative HD-GYP c-di-GMP PDE, 50501_04362, was found to be unique to the clinical isolate CCUG 50501. This PDE is predicted to be cytoplasmic, with an N-terminal GAF domain that participates in small molecule binding and protein dimerization (62), and a C-terminal CBS domain involved in the regulation of functions in response to the cellular energy status (63). This unique HD-GYP protein is located upstream of a helix-turn-helix (HTH)-containing transcriptional regulator and an HBM-LBD-type chemoreceptor (50501_04360). The three-gene cluster is flanked by transposases (locus tags I6M49_16805 to I6M49_16835 in the draft genome sequence; GenBank accession no. JADZHO000000000.1), a tell-tale sign of acquisition via horizontal gene transfer. Two proteins, 72678_00888 and HUD-G3_02699, are present in most of the strains (Fig. 6). Taken together, the c-di-GMP proteome of S. algae is conserved to a large extent, but exhibits some degree of variability across strains with clear evidence of horizontal acquisition of c-di-GMP turnover genes in some isolates. Cyclic di-GMP gene gain, loss, or sequence divergence might relate to adaptation to specific environmental conditions that are yet to be determined.
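The color criteria given in the legend to Fig. 6B can be read as a small decision rule on two quantities, the aligned-length percentage (Palign) and the percent identity (Pident). The Python sketch below is an illustration under stated assumptions: the function name, the order in which the rules are tested, and the `"unclassified"` fallback are not from the source. The fallback exists because the thresholds as printed do not cover every combination (for example, Palign 60% with Pident 80%).

```python
def classify_ortholog(palign, pident, disrupted=False):
    """Assign a Fig. 6B heat-map color from Palign and Pident (both in %).

    disrupted marks annotated proteins with a broken open reading frame
    or split across contigs; testing it first is an assumption.
    """
    if disrupted:
        return "orange"
    if palign >= 90 and pident >= 95:
        return "green"
    if (palign > 90 and pident > 75) or (palign > 50 and pident > 90):
        return "yellow"
    if palign < 50 and pident < 75:
        return "red"
    # The printed thresholds leave some combinations unassigned.
    return "unclassified"
```

For instance, `classify_ortholog(95, 80)` falls in the yellow class, whereas a protein covering only a third of the reference at 40% identity would be red.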

FIG 7

Phylogenies of the GGDEF, EAL, and HD-GYP domains of S. algae c-di-GMP turnover proteins not encoded by the type strain. The phylogenies of the GGDEF (A), EAL (B), and HD-GYP (C) domains identified in the 41 S. algae genomes in the context of type strain domains are displayed by GrapeTree representations (146). Nodes representing CECT 5071 proteins are in red, reference c-di-GMP turnover proteins (listed in the legends to Fig. 1 and 3) are in magenta, and distinct c-di-GMP turnover proteins from other strains are in turquoise. S. algae proteins are shown under their Prokka gene tags (see Table S2 for GenBank accession numbers). Representative c-di-GMP turnover proteins encoded in S. algae genomes but absent in the type strain. Domain topology and the presence of a GGDEF domain with the GGDEF motif, the RxxD motif, and an EAL domain with the EAL motif are indicated. Protein names are listed under their GenBank accession numbers and are followed by their domain architectures, determined by searches with SMART, ScanProsite, HHPred, and CDVist followed by manual curation. Domains identified by HHSearch as implemented in CDVist are displayed if the probability was ≥90.0%. Protein lengths (in amino acid residues) are indicated at the bottom. The domain names and their Pfam database entries are as follows: GGDEF, PF00990; EAL, PF00563; HD-GYP (COG2206 in the COG database); PAS (or PAS+PAC), PF00989; GAF, PF01590; Protoglobin, PF11563; LapD_MoxY_N, PF16448; HAMP, PF00672; REC, Response_reg, PF00072; CBS, PF00571. N-terminal transmembrane segments of the first two proteins, 159418_02667 (GenBank accession no. MBO2661730) and 97087_01449 (GenBank accession no. MBO2626571) represent variants of the MASE8 domain, shown in Fig. S5. Download FIG S4, TIF file, 0.3 MB. Equivalence between Prokka-generated locus tags and NCBI GenBank locus tags for relevant sequences used in figures. Download Table S2, DOCX file, 0.02 MB.

The S. algae c-di-GMP proteome encodes novel N-terminal sensor domains and a new CSS domain variant.

Examination of the amino acid sequences of the c-di-GMP turnover proteins from S. algae revealed that a substantial fraction of them had uncharacterized N-terminal sequences that could represent new sensor domains. As a first step toward the characterization of the sensory capacity of S. algae, we constructed the alignments of these homologous domains from the respective proteins found in various organisms, predicted their structural organization and transmembrane topology, identified the conserved residues, and submitted them to the Pfam protein domain database (47). We provide here the description of five novel membrane-embedded domains, an N-terminal cytoplasmic coiled-coil domain, and a distinct variant of the previously characterized CSS motif domain. Previous studies resulted in the identification of a variety of periplasmic sensor domains (34, 64) and a relatively small number of integral membrane domains, which included the NO-binding MHYT domain, named after its conserved sequence motif (65, 66), 5TMR of the 5TMR-LYT and 7TM-DISM_7TM domains (67), and five MASE (membrane-associated sensor) domains (18, 68, 69), of which only MASE1 (Pfam domain PF05231) has been experimentally characterized and was found to sense aspartate in E. coli PdeF (YfgF) and control the proteolytic turnover of DgcE (70, 71). Analysis of the S. algae CECT 5071 GGDEF domain proteins WT_00655 (GenBank accession no. QTE77313.1), WT_00826 (GenBank accession no. QTE77466.1), and WT_02160 (GenBank accession no. QTE78688.1) revealed that their GGDEF domains are preceded by hydrophobic fragments that consist of six predicted transmembrane (TM) segments. Iterative PSI-BLAST searches revealed that the TM regions of WT_00655 and WT_00826 are related and contain the same 6TM sensor domain (hereafter MASE6), whereas WT_02160 contains another 6TM domain, hereafter MASE7 (see Fig. S5A to C in the supplemental material). 
In addition to the GGDEF domains, MASE6 and MASE7 were found in association with histidine sensor kinases, and MASE7 was also associated with the adenylate cyclase output domain (PF00211 in Pfam), confirming that they both comprise promiscuous sensor domains in diverse bacterial receptors. Analysis of the membrane topology of these domains revealed that they contain short N-terminal hydrophilic regions located in the cytoplasm, which are followed by six TM segments that are connected by very short cytoplasmic and periplasmic loops. These loops are too short to form any separate ligand-binding domains in the periplasm and show poor sequence conservation, which suggests that signals are likely sensed by the membrane-embedded portions of MASE6 and MASE7 domains. Indeed, both domains contain several conserved Phe, Tyr, and Trp residues that could be involved in binding aromatic compounds. Alternatively, these domains could modulate the catalytic activity of the downstream domains through protein-protein interactions, as has been recently demonstrated for the MASE1-containing DgcE (71). Comparative analysis of S. algae genomes revealed one more 6TM integral membrane sensor domain in a DGC from strain 97087, named MASE8, which is widespread among various classes of Proteobacteria and is also found in representatives of some other phyla (Fig. S5D). In addition to DGCs, MASE8 was found in association with adenylate/guanylate cyclase output domain and in histidine sensor kinases (Fig. S5D), which confirmed its identity as a sensor domain. One more integral membrane sensory domain, seen in DGCs from 32 out of 42 strains of S. algae, albeit not in the type strain, is distantly related to previously characterized proteins. A shorter, divergent stand-alone version of this domain, which corresponds to the E. 
coli protein YhhQ, a member of the COG1738 family in the COG database (72), has been described as an essential component of the transport system for queuosine precursor (preQ0) 7-cyanodeazaguanine (73, 74). The YhhQ protein has 6 predicted TM segments and is listed in Pfam as “Putative vitamin uptake transporter Vut_1” (PF02592). In S. algae and other gammaproteobacteria, the YhhQ-related N-terminal sensor domain has an extra TM segment and combines with the GGDEF domain to form a widely conserved DGC (see Fig. S6A in the supplemental material). This domain, which we named VUPS (vitamin uptake-like sensor), is found in a variety of bacteria in combination with PAS, GGDEF, and EAL domains, as well as with histidine sensor kinases. Several archaea also encode histidine sensor kinases with VUPS as their sensor domain (Fig. S6A). The DGC WT_00119 (GenBank accession no. QTE76827.1) of S. algae CECT 5071 combines the GGDEF domain with a 350-amino-acid long N-terminal α-helical cytoplasmic domain. Analysis by programs such as PCOILS, MultiCoil2, and DeepCoil (75–77) indicated that a large part of this domain consists of a coiled-coil structure (Fig. S6B). Furthermore, the MARCOIL/LOGICOIL combination (78) predicted the quaternary structure of this domain as either a tetramer (four-helix bundle [79]) or an antiparallel dimer. The combination of this domain with the GGDEF domain is widespread in gammaproteobacteria and two proteins with such a domain architecture have been experimentally characterized as active DGCs in V. cholerae (CdgG, locus tag VC0900; GenBank accession no. AAF94062) and P. aeruginosa (DgcH, locus tag PA5487; GenBank accession no. AAG08872) (80, 81). Accordingly, we designated this domain DGCcoil (DGC-associated coil). 
In contrast to the periplasmic α-helical sensor domains containing a predicted coiled coil, such as HBM (PF16591), DAHL (PF19443), and the sensor domain of TorS (PDB ID 3I9W), which have been previously described in DGCs, chemoreceptors, and histidine sensor kinases (82–85), the DGCcoil domain is cytoplasmic and shows no sequence (or predicted structural) similarity to either of these domains. It remains to be seen whether it binds any ligand, is responsible for some kind of protein-protein interaction, or just stimulates dimerization of the associated GGDEF domains. Finally, S. algae DGC WT_00055 (GenBank accession no. QTE76780.1) consists of the cytoplasmic GAF and GGDEF domains, which are preceded by a short N-terminal transmembrane region that consists of two TM segments. Sequence alignment of this 2TM hairpin (Fig. S6C), which is found exclusively in Shewanella and Aeromonas spp., shows few conserved residues, suggesting that this hairpin might serve simply as a membrane anchor, rather than a full-fledged sensor domain. An interesting new domain was also found in S. algae c-di-GMP phosphodiesterases. Sequence analysis of the S. algae EAL domain protein WT_03853 (GenBank accession no. QTE76250.1) showed that it contains an unusual N-terminal periplasmic sensor domain. While this domain was not recognized by any Pfam domain model, it was clearly related to the CSS motif sensor (PF12792) domain (86), but with an additional conserved CxxC motif (hereafter, CSS_CxxC domain) (Fig. S6D). Proteins with the CSS-EAL domain architecture, comprising COG4943 in the COG database (72), are widespread in bacteria, with multiple paralogous genes identified: for example, in E. coli (5 genes), Salmonella enterica (5 genes), and Pseudomonas aeruginosa (3 genes). Figure S6D shows an alignment of these diverse CSS domains with the experimentally studied CSS domains from PdeB and PdeC proteins of E. coli (86). 
A typical CSS motif sensor domain contains two conserved Cys residues that can form a disulfide bridge and undergo dithiol-disulfide transitions, allowing regulation of the PDE activity of the cytoplasmic EAL domain in response to the redox conditions in the periplasm (86). One of the P. aeruginosa proteins with the CSS-EAL domain architecture, PA2818, activates biofilm formation in response to subinhibitory concentrations of the aminoglycoside antibiotic tobramycin (87). S. algae encodes only one CSS domain-containing protein, WT_03853, but its CSS motif domain carries two additional Cys residues that form a conserved CxxC motif. We identified CSS_CxxC domains in other Shewanella spp. as well as other members of Alteromonadales (Aeromonas, Oceanimonas, Pseudoalteromonas, and Thalassomonas spp.), Vibrionales (Vibrio, Enterovibrio, and Photobacterium spp.), and in some acidobacteria (“Candidatus Koribacter versatilis” and Terracidiphilus sp.). Representative sequences are shown in Fig. S6D. The CxxC motif is characteristic of redox proteins that are involved in the formation, isomerization, or reduction of disulfide bonds (88, 89). The presence of two distinct pairs of cysteines that are capable of dithiol-disulfide transitions may allow such a CSS domain to perform a wider range of responses to redox changes or respond to different redox changes that ultimately lead to alterations in c-di-GMP levels. E. coli and several other organisms appear to accomplish this task by using several different CSS-EAL proteins. The full breadth of redox-responsive c-di-GMP turnover proteins remains to be characterized.

In addition, S. algae DGC WT_02823 (GenBank accession no. QTE79282.1) and dual GGDEF-EAL domain proteins WT_01391 (GenBank accession no. QTE80346.1), WT_03908 (GenBank accession no. QTE76295.1), and WT_01971 (GenBank accession no. QTE78520.1) contain uncharacterized periplasmic N-terminal domains that are found in many S. algae strains and across Shewanella spp. but rarely, if ever, outside this genus.

In summary, initial characterization of c-di-GMP turnover proteins in Shewanella algae, a marine organism with a high c-di-GMP IQ, has unraveled a number of novel integral membrane, periplasmic, and cytoplasmic N-terminal signaling domains whose sensory specificity and/or protein-protein interaction capacity remains to be determined.
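The distinction drawn above between a typical CSS domain and the CSS_CxxC variant rests on an extra pair of cysteines spaced as CxxC. A minimal Python sketch of how such a pattern could be located in a sequence is given below. The function names are hypothetical, and no real sequence is used, since WT_03853 is identified in the text only by accession. A lookahead is used so that overlapping occurrences are all reported.

```python
import re

# CxxC: two cysteines separated by any two residues. The lookahead makes
# the match zero-width so that overlapping motifs are not skipped.
CXXC_MOTIF = re.compile(r"(?=(C..C))")


def find_cxxc(seq):
    """Return (1-based position, motif) for every CxxC in the sequence."""
    return [(m.start() + 1, m.group(1)) for m in CXXC_MOTIF.finditer(seq.upper())]


def cysteine_positions(seq):
    """Return the 1-based positions of all Cys residues."""
    return [i + 1 for i, residue in enumerate(seq.upper()) if residue == "C"]
```

Finding the pattern says nothing about whether the two cysteines actually undergo dithiol-disulfide transitions; it only flags candidates for alignments such as the one in Fig. S6D.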

Chemosensory pathways of S. algae are conserved across strains and include an F7 pathway of unknown function.

Cyclic di-GMP homeostasis and biofilm formation are functionally intertwined with chemosensory pathways, including chemoreceptors. There is a significant number of studies showing that chemoreceptors (90–94) and chemosensory signaling genes (95–98) affect biofilm formation/dispersion and/or c-di-GMP levels. In P. aeruginosa, there is evidence for two chemosensory pathways, wsp and chp, that sense surfaces and modulate in turn c-di-GMP levels initiating biofilm formation (99). Current information suggests that there are multiple mechanisms by which chemosensory signaling modulates biofilm formation, and there is a considerable research need for the identification of the underlying molecular details. To this end, we present here an analysis of the repertoire of chemoreceptor and chemosensory signaling genes in S. algae. S. algae CECT 5071 encodes 29 chemoreceptors that are likely to stimulate two chemosensory pathways conserved across all 42 strains (see Fig. S7 in the supplemental material). Classification of pathway-signaling genes by the MIST 3.0 database indicates the presence of the F6 pathway, likely to be involved in chemotaxis, as well as the F7 pathway. The gene arrangement of the latter pathway shows similarities to the P. aeruginosa che2 pathway, which is of unknown function but not involved in chemotaxis (100). Chemotaxis experiments with P. aeruginosa in-frame deletion mutants with mutations in cheB2, cheA2, and mcpB/aer2 (the only receptor predicted to stimulate the che2 pathway) had chemotaxis and aerotaxis phenotypes indistinguishable from those of the wild type (101). The number of chemoreceptors in S. algae strains ranged from 28 to 31, which suggests a relatively conserved chemosensory repertoire, in contrast to the diversity found in P. aeruginosa or Pseudomonas putida strains. While genomes of some bacteria harbor an extremely large number of chemoreceptor genes, such as 90 for Caryophanon latum (102), the average in bacteria is 14 (103). 
The presence of 29 chemoreceptors thus reflects an above average chemosensory capacity for S. algae. So far there are no experimental data available on chemoeffectors recognized by any of these receptors. Of the 29 chemoreceptors of strain CECT 5071 (Fig. 8A), 12 present a ligand-binding domain (LBD) of the Cache superfamily that corresponds to the largest family of extracytoplasmic bacterial sensor domains (64). Thus, of these, 3 chemoreceptors contain a single Cache (sCache) domain of subtype 2, and 9 chemoreceptors contain a double Cache (dCache) domain (7 of subtype 1, 1 of subtype 3, and 1 dual Cache_3-Cache_2). Along with the four-helix-bundle (4HB) domain, the dCache domain is the prevalent LBD in prokaryotes and is homologous to, but distinct from the PAS superfamily, containing a long N-terminal α-helix and two α/β-type modules (64). Many dCache domains bind different amines (104), and most known chemoreceptors for amino acids possess a dCache domain (105). In other species, chemoreceptors with sCache_2 domains were found to primarily bind small organic acids (64, 105).

FIG 8

Shewanella algae chemoreceptors. (A) Domain architectures of S. algae chemoreceptors. Shown are the topologies of the reference chemoreceptor proteome of the type strain CECT 5071 (black lettering), as well as chemoreceptors identified in other sequenced S. algae strains with distinct topologies (blue lettering). Chemoreceptors that were not annotated in databases but identified through homology modeling are indicated with the suffix “-like”. (B) Heat map showing conserved (green), divergent (yellow) or absent (red) S. algae CECT 5071 chemoreceptors in all other 41 S. algae genomes. Color criteria are as in Fig. 6. Gene synteny of flagellar systems F6 and F7 of S. algae. Che proteins and relevant gene clusters are shown. Download FIG S7, TIF file, 0.7 MB. Two chemoreceptors of the type strain present a four-helix bundle (4HB) 4HB_MCP_1 domain, which is also widespread in Bacteria and Archaea and is known to bind a diverse array of ligands (79), such as amino acids (105), C6 ring carboxylic acids (105), boric acid (106), tricarboxylic acid (TCA) cycle intermediates (107), or polycyclic aromatic hydrocarbons (108). A single chemoreceptor in the S. algae chromosome presents an NIT domain, predicted to bind nitrate and nitrite (109). By analogy to the sole so-far-characterized nitrate chemoreceptor McpN (110), this chemoreceptor was found to be downregulated in strain CECT 5071 in the presence of a high extracellular nitrate concentration (15), therefore supporting the notion of a role in nitrate recognition and nitrate-mediated signal transduction. Homology modeling identified an HBM-like LBD in WT_00533 (82) that was not identified by Pfam. Six chemoreceptors of strain CECT 5071 exhibit unknown LBDs. Four MCPs, WT_01293, WT_02279, WT_00032, and WT_00231, exhibit a noncanonical topology characterized by two consecutive transmembrane regions without an LBD. WT_01293 protein has three PAS domains, WT_02279 has a C-terminal LBD, and WT_00231 has an N-terminal PAS domain. 
The latter receptor has the typical topology of an Aer receptor, which, by analogy to the studied homologs with the same topology, is likely to mediate aerotaxis (the PAS domain contains bound heme for oxygen sensing) (111). WT_01293 has similarities to the Aer2/McpB chemoreceptor of P. aeruginosa (99, 112). The presence of multiple PAS domains is a characteristic feature observed for many Aer2/McpB homologs (38). By analogy to the P. aeruginosa chemoreceptor, WT_01293 is proposed to stimulate the F7 pathway (homologous to the che2 pathway in P. aeruginosa), which may have a function unrelated to chemotaxis. WT_02279 contains a C-terminal hemerythrin domain, which in other species was found to bind oxygen or nitric oxide (113, 114), suggesting that this receptor may also mediate aerotactic responses.

Three soluble MCPs, WT_02605, WT_03599, and WT_02400, complete the chemosensory repertoire of the type strain. WT_02605 and WT_03599 contain two N-terminal PAS domains and one N-terminal PAS domain, respectively, whereas WT_02400 contains at the C terminus a CZB (chemoreceptor zinc-binding, PF13682) domain, which is commonly found in proteins involved in chemotaxis, c-di-GMP signaling, and nitrate/nitrite sensing (115). The characterized CZB domain-containing chemoreceptor TlpD of Helicobacter pylori was found to mediate repellent responses to oxidative stress (116), as well as chemoattraction to bleach (117), and similar responses may be mediated by WT_02400.

Figure 8B summarizes the presence and degree of conservation of chemoreceptors in all 42 S. algae strains. For instance, the WT_02386 ortholog was absent in strains 404 and G1, and the WT_03104 ortholog was not present in strain A41. Distant orthologs of WT_00678 sharing 77 to 79% sequence identity with the type strain protein were identified in five S. algae strains, suggesting significant sequence variability in this specific chemoreceptor. Chemoreceptors found in other S. algae genomes that lack an ortholog in the type strain did not show topologies or LBD types distinct from those mentioned above (Fig. 8A). Homology modeling predicted a 4HB-like LBD (118) in the receptor 789_03200 of S. algae CCUG 789.

Chemoreceptors can be classified according to the length of their signaling domain (119). Since this domain forms a coiled-coil structure, the length is usually expressed in heptad repeats (H). Studies of P. aeruginosa have shown that the length of the signaling domain can be associated with the pathway it stimulates (120). For example, the 22 P. aeruginosa PAO1 chemoreceptors of the 40-heptad repeats (40H) family all stimulate the che pathway for chemotaxis, whereas the sole chemoreceptor of the 36H class (Aer2/McpB) was predicted to stimulate the che2 pathway. By analogy to this study, we propose that the 26 40H chemoreceptors of S. algae stimulate the chemotaxis pathway, whereas the sole 36H receptor (WT_01293) is proposed to stimulate the F7 pathway. This prediction thus coincides with the above-mentioned similarities in receptor topology between P. aeruginosa Aer2/McpB and WT_01293.

While genes encoding receptors stimulating the F6 pathway are often distributed across the genome, a typical feature of F7 pathways is that the receptor that stimulates this pathway is encoded in a gene cluster together with the signaling proteins. This is the case for the 36H receptor WT_01293 and its homologs in other S. algae strains (Fig. S7). In addition, the gene synteny of this cluster (cheYAW-mcp, cheRB) is remarkably similar to that of cluster II of P. aeruginosa, which codes for the F7 pathway (cheYAW-mcp, cheRB). Overall, the S. algae chemosensory system exhibits significant parallelism with that of P. aeruginosa PAO1, which encodes 22 40H receptors, two 24H receptors, one 36H receptor, and one 44H receptor, with the 36H receptor (Aer2/McpB) stimulating the F7 pathway (99). The 36H Aer2/McpB receptor of P. aeruginosa and its homolog in V. cholerae are known to sense oxygen at the PAS domain harboring the heme cofactor (121, 122). A similar function is therefore plausible for the S. algae 36H receptor. The function of the F7 pathway has not been identified but appears to be associated with virulence in P. aeruginosa (100, 123).

Of note, analysis of the acute enteritis isolates G1 and A41 of S. algae revealed the presence of a putative integrative and conjugative element containing a putative aerotaxis receptor, frequently associated with virulence (124), a CheV homolog, a cytosolic GGDEF-EAL domain-containing protein with one GAF and three PAS N-terminal sensor domains, a PAS domain-containing chemoreceptor with a topology similar to that of the P. aeruginosa BdlA (biofilm dispersion locus A) chemoreceptor (92), and a divergently transcribed GGDEF domain-containing protein (Fig. S3B). Such a genetic environment suggests a potential role of cyclic di-GMP signaling in the coordinated regulation of chemotaxis and virulence in S. algae.
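
The pathway assignment proposed in this section amounts to a lookup from heptad class to chemosensory pathway. The minimal Python sketch below illustrates that rule only. The dictionary, the function names, and the receptor labels `example_40H_receptor` and `example_24H_receptor` are hypothetical and are not part of the study's pipeline; only the 36H class of WT_01293 is taken from the text.

```python
from collections import defaultdict

# Rule proposed in the text by analogy to P. aeruginosa:
# 40H receptors feed the F6 (chemotaxis) pathway, the 36H receptor feeds F7.
PATHWAY_BY_HEPTAD_CLASS = {
    "40H": "F6 (chemotaxis)",
    "36H": "F7 (function unknown)",
}


def predict_pathway(heptad_class):
    """Return the proposed pathway for a heptad class, or 'unassigned'."""
    return PATHWAY_BY_HEPTAD_CLASS.get(heptad_class, "unassigned")


def group_by_pathway(receptors):
    """Group receptor locus tags by their proposed pathway."""
    groups = defaultdict(list)
    for locus_tag, heptad_class in receptors.items():
        groups[predict_pathway(heptad_class)].append(locus_tag)
    return dict(groups)


# WT_01293 is the 36H receptor named in the text; the other two labels
# are placeholders used only to exercise the lookup.
receptors = {
    "WT_01293": "36H",
    "example_40H_receptor": "40H",
    "example_24H_receptor": "24H",
}
groups = group_by_pathway(receptors)
```

Classes without a proposed pathway fall through to `unassigned` rather than being forced into F6 or F7, which mirrors the fact that the text makes a proposal only for the 40H and 36H classes.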

Concluding remarks.

Phylogenomic and pangenome analyses of 42 S. algae isolates revealed a relatively high degree of divergence, with most isolates being genetically distinct. Thus, S. algae shows an endemic population structure; however, the occurrence of closely related strains distinct in space and time suggests that clones with stable genetic repertoire exist in the population. The notable features of the S. algae accessory genome included horizontally acquired respiration-related genes and antibiotic resistance determinants, as well as genes for urea hydrolysis enzymes and chemosensory and c-di-GMP turnover proteins. With 63 c-di-GMP turnover proteins and 29 chemoreceptors in the reference (type) strain, S. algae is near the top in Bacteria in terms of signaling capacity per Mb of the genome sequence. The apparent redundancy of putative c-di-GMP turnover proteins likely reflects a highly flexible c-di-GMP network.

Analysis of structural features of GGDEF, EAL, and HD-GYP domains of the type strain suggests that 27 out of 33 GGDEF-containing proteins, 16 of 19 dual GGDEF-EAL proteins, and all 7 HD-GYP domains retain the ability to synthesize or hydrolyze c-di-GMP. In contrast, 12 GGDEF domains and 3 EAL domains deviate from the respective canonical motifs that are indicative of catalytic activity. Ongoing efforts at the functional analysis of the complete reference c-di-GMP proteome are expected to shed light on the catalytic activities of these DGCs or PDEs. Phylogenetic analyses of S. algae GGDEF, EAL, and HD-GYP domains using heterologous reference proteins, in combination with protein topology and domain architecture analysis, show that S. algae c-di-GMP turnover proteins generally cluster according to their domain architectures rather than according to species, suggesting inheritance from a common gammaproteobacterial ancestor. Analysis of the domain topology of proteins involved in c-di-GMP metabolism (Fig. 3 and 5) reveals that the majority of sensor domains are located in the cytosol, indicative of a central role of cytosolic signals in modulating c-di-GMP levels. In contrast to GGDEF and EAL proteins, HD-GYP proteins have only two types of sensor domain, GAF and DUF3391, which may suggest a narrow range of signals modulating the activity of this protein family. The significant number of RpfG family response regulators with the REC-HD-GYP domain architecture (61) suggests an important input from sensor kinases. The occurrence of c-di-GMP turnover proteins in the accessory genome of certain isolates raises questions about their contribution to S. algae fitness inside or outside a host and the rewiring of the signaling network upon their introduction. The comparative analysis of the c-di-GMP turnover proteome of S. algae isolates also revealed that the core proteins WT_02322, WT_02823, and WT_03275, while conserved in domain architecture, exhibit the highest degree of sequence variability across isolates, suggesting evolutionary pressure toward positive selection of those signaling proteins.

The rich sensory repertoire of S. algae is reflected in the diversity of the N-terminal sensory domains of c-di-GMP turnover proteins (Fig. 3 and 4). We have described here several previously uncharacterized ones, including three new MASE (membrane-associated sensor) domains (MASE6 to MASE8), a novel VUPS (vitamin uptake-like sensor) domain, a novel DGCcoil (DGC-associated coil) domain, and a new CSS_CxxC variant of the CSS-motif sensory domain (86) containing two additional Cys residues that presumably broaden its redox-sensing palette. Unraveling the full breadth of signals and protein-protein interactions of these novel domains will require extensive experimental investigation.

The analysis of the chemosensory pathways of S. algae revealed the existence of two conserved pathways: an F6 pathway involved in chemotaxis and an F7 pathway of unknown function.
The resemblance of the S. algae and P. aeruginosa chemosensory systems is noteworthy. Unlike P. aeruginosa strains, however, which vary widely in chemosensory content, S. algae strains have a rather narrow set of chemoreceptors. Inferences on signal recognition by S. algae chemoreceptors were made based on an array of in silico tools, including homology modeling. There are currently no experimental data on the chemoeffectors recognized by these chemoreceptors. This knowledge gap represents a clear research need that extends to other marine bacteria; closing it is likely to yield valuable ecophysiological insights. Notably, genomic analysis of the acute enteritis isolates G1 and A41 revealed a putative integrative and conjugative element containing a putative aerotaxis receptor, a CheV homolog, two c-di-GMP turnover proteins, and a PAS domain-containing chemoreceptor similar to P. aeruginosa BdlA (biofilm dispersion locus A), known to be involved in virulence. As S. algae virulence and infectivity determinants in the human host are virtually uncharacterized, the analysis of this element may provide a valuable entry point for the study of S. algae infectivity and lifestyle regulation in the intestinal tract. Altogether, we have used state-of-the-art methodologies to construct a comprehensive atlas of S. algae c-di-GMP and chemosensory systems. This atlas establishes S. algae as a model organism for the analysis of signaling pathways and will serve as a reference for future genetic and functional studies in Shewanella.

MATERIALS AND METHODS

Bacterial strains and whole-genome sequencing.

Shewanella algae strains (Table S1) were grown in LB medium at 37°C prior to DNA isolation with either (i) the GenElute bacterial genomic DNA kit (Merck) for samples submitted for WGS with Illumina MiSeq or (ii) the Genomic-tip 500/G kit (Qiagen) for samples submitted for Oxford Nanopore or PacBio sequencing, as detailed below.

(i) PacBio sequencing.

Multiplexed sequencing libraries for strain 150735 were constructed using the SMRTbell Express template prep kit 2.0 and the Barcoded Overhang Adapter kit 8A (PacBio, Menlo Park, CA, USA). Libraries were sequenced on one Sequel SMRT cell according to the manufacturer’s recommendations (PacBio).

(ii) Oxford Nanopore sequencing.

Genomic DNA samples of strains A59, A291, and G1 were barcoded using the SQK-RBK004 Rapid Barcoding kit and sequenced on the MinION platform, flow cell version FLO-MIN106. Fast5 files were base called and demultiplexed using the Guppy software v4.2.2. Hybrid assemblies were performed using the Unicycler software v0.4.8.

(iii) Illumina sequencing.

Sequencing libraries from all other 37 S. algae strains were prepared using the TruSeq Nano DNA library prep (Illumina, San Diego, CA, USA) and sequenced at the Center for Translational Microbiome Research, Karolinska Institute, on an Illumina MiSeq instrument with 2 × 300-bp paired-end reads. Sequencing reads were processed using the BACTpipe bacterial assembly and annotation pipeline v2.6.0 (https://github.com/ctmrbio/BACTpipe). The long reads from the PacBio sequencing were assembled using HGAP4 included in SMRT Link v8 (PacBio). The contigs from the long-read assemblies were manually circularized and then linearized to start close to the origin of replication (at the start of the dnaA locus). For each strain, contigs from the short-read assemblies were ordered according to their mapping to the type strain CECT 5071. Contigs spanning the dnaA locus were split 5′ of the dnaA gene. The sequence part of the contig containing dnaA was placed at the beginning of the assembly and the upstream part at the end of the matching contigs. Non-unique-mapping contigs and nonmapping contigs were added after the last matching contig. Prokka (125) annotations of draft and complete genomes were used for primary bioinformatics; therefore, Prokka-generated locus tags were used for in silico analyses. Genome sequences were deposited in GenBank (BioProject PRJNA526057) under accession no. SAMN11083189 (CECT 5071) (39) and SAMN16670435 to SAMN16670475 (all other 41 strains), along with the other accession numbers provided in the “Data availability” section below, and annotation was added by the NCBI Prokaryotic Genome Annotation Pipeline (126). The correspondence between the Prokka gene locus tags, GenBank accession number genome locus tags, and GenBank accession numbers for relevant sequences is presented in Table S2 in the supplemental material.
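
The contig handling described above (rotating a circular contig so that it starts at dnaA, and splitting and reordering draft contigs around the dnaA locus) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the procedure used in the study, which was performed manually: the function names are hypothetical, coordinates are 0-based, and dnaA is assumed to lie on the forward strand (a reverse-strand gene would require reverse complementing first).

```python
def rotate_to_start(circular_seq, start):
    """Linearize a circular sequence so that it begins at position `start`."""
    if not 0 <= start < len(circular_seq):
        raise ValueError("start must lie within the sequence")
    return circular_seq[start:] + circular_seq[:start]


def split_at_dnaa(contig, dnaa_start):
    """Split a contig 5' of dnaA; return (dnaA-containing part, upstream part)."""
    return contig[dnaa_start:], contig[:dnaa_start]


def order_draft_assembly(dnaa_contig, dnaa_start, mapped_contigs, other_contigs):
    """Order draft contigs as described in the text.

    dnaA-containing part first, then contigs ordered by their mapping to the
    reference, then the part upstream of dnaA, then non-unique-mapping and
    nonmapping contigs.
    """
    dnaa_part, upstream_part = split_at_dnaa(dnaa_contig, dnaa_start)
    return [dnaa_part, *mapped_contigs, upstream_part, *other_contigs]


# Toy sequences; "GGG" stands in for the start of the dnaA locus.
rotated = rotate_to_start("AAACCCGGGTTT", 6)
ordered = order_draft_assembly("TTTGGGCC", 3, ["CONTIG_A", "CONTIG_B"], ["UNMAPPED"])
```

Rotation preserves the sequence length and content, so a rotated circular contig can always be rotated back; splitting, by contrast, introduces a contig boundary that was not present in the assembler output.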

Whole-genome sequence assembly comparisons and pangenome analysis.

WGS assemblies were compared using Sourmash 4.2.1 (127). With a k-size of 31 and scaled setting of 1,000, genome sketches for each S. algae strain were subjected to an all-by-all comparison to evaluate their pairwise relative genomic distance (measured as the Jaccard index between two assemblies). Taxonomic assessment of each WGS MinHash sketch was achieved by querying each genome sketch against GTDB (128) r202 (available at https://osf.io/wxf9z/) to identify the best match based on Jaccard similarity via the search function of Sourmash. Code used to generate Fig. 1 is available at https://github.com/ctmrbio/Salgae_c-di-GMP_analysis. Pairwise comparisons between the WGS of the type strain CECT 5071 and the WGSs of the other 41 strains were also performed using the Genome-to-Genome Distance Calculator v2.1 (Leibniz Institute DSMZ) under the recommended settings (43), and pairwise digital DNA-DNA hybridization (dDDH) values were inferred. The thresholds for species and subspecies delineation were set at dDDH >70% and dDDH >79%, respectively, as previously described (40, 43).

Pangenome construction was performed using the Roary pipeline (129) with a BLASTp (130) identity threshold of 90%. The gene presence/absence content is available in Data Set S1, and GFF files including Prokka annotations are available at https://github.com/ctmrbio/Salgae_c-di-GMP_analysis. Subsequent pangenome analyses using gene presence/absence data from Roary were carried out with the R package Pagoo 0.3.9 (131). eggNOG-mapper 2.1.6 (132) was used to generate KEGG orthology annotations for each homologous gene cluster identified by Roary via DIAMOND queries against the eggNOG v5.0 database (133). KEGG orthology annotations for each gene cluster were then mapped to their corresponding KEGG pathways by referencing the KEGG BRITE file br08901.keg and a KEGG pathway to KEGG orthology ID mapping file (http://rest.kegg.jp/link/ko/pathway). Code used to generate Fig. 2 is available at https://github.com/ctmrbio/Salgae_c-di-GMP_analysis. Other relevant phylogenetic analyses were performed with MEGA X (134). Genomic islands were predicted using IslandViewer 4 (135).
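
To make the two comparison metrics in this section concrete, the following Python sketch computes an exact Jaccard index between the k-mer sets of two sequences and applies the dDDH thresholds stated above. It is a simplified illustration and not the Sourmash implementation: Sourmash works on hashed, canonical k-mers subsampled according to the scaled setting, whereas this sketch uses every forward-strand k-mer as a plain string. The function names and the short toy sequences are hypothetical.

```python
def kmer_set(seq, k=31):
    """Return the set of all forward-strand k-mers of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_index(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_ddh(ddh_percent):
    """Apply the thresholds from the text: >79% subspecies, >70% species."""
    if ddh_percent > 79:
        return "same subspecies"
    if ddh_percent > 70:
        return "same species"
    return "different species"


# Toy example with k=4 so that the sets are small enough to check by hand.
a = kmer_set("ACGTACGT", k=4)   # {ACGT, CGTA, GTAC, TACG}
b = kmer_set("ACGTAAAA", k=4)   # {ACGT, CGTA, GTAA, TAAA, AAAA}
similarity = jaccard_index(a, b)  # 2 shared k-mers out of 7 distinct ones
```

The Jaccard index is a similarity (1.0 for identical k-mer sets), so a distance-like quantity, if needed, would be derived from it, for example as one minus the index.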

Bioinformatic analyses of cyclic di-GMP metabolizing proteins.

Putative diguanylate cyclases (containing the GGDEF domain) and putative c-di-GMP phosphodiesterases (containing either EAL or HD-GYP domains) were identified in S. algae proteomes using searches with BLAST (130), ProSite (136), and HMMER (137). The occurrence of a GGDEF, EAL, and/or HD-GYP domain in candidate proteins was verified manually by running the protein sequence through additional domain recognition programs such as CDD-Search (138), SMART (139), and Pfam (47). Poorly conserved domains were identified with HHpred (140). A search for orthologs of S. algae CECT 5071 c-di-GMP turnover proteins in other S. algae genomes was performed by BLASTp (130) followed by manual curation of the resulting hits. For the initial multiple alignments, the entire protein sequences including truncated domains were aligned with ClustalX (141). The sequence alignment was then manually assessed and curated in GeneDoc (142) by removing gaps and/or adjacent domains. For structure-based alignments, the respective GGDEF, EAL, and HD-GYP domains were aligned against the reference proteins as indicated in the legends to the figures. Structure-based alignment was performed with ESPript 3.0 (143) using default parameters. Cyclic di-GMP turnover protein maximum likelihood trees were produced using RAxML-NG (144) with default parameters (LG substitution matrix, gamma-distributed substitution rates, 100 bootstrap replicates) on the Vital-IT website at https://raxml-ng.vital-it.ch/. The trees including the domain predictions were visualized using iTOL (145). GrapeTree (146) was used for the visualization of the phylogenetic relationships of GGDEF, EAL, and HD-GYP domains of S. algae c-di-GMP turnover proteins not encoded in the type strain chromosome.

The newly described sensor domains were inferred from uncharacterized N-terminal sequences preceding the GGDEF and/or EAL domains. The respective sequences from S. algae were used as queries in PSI-BLAST (130) and HMMER (137) searches, and representative sequences from diverse bacteria and archaea were selected for inclusion in the displayed alignments. The novelty of identified domains and/or their remote similarity to any known domains was checked by HHpred (140). Transmembrane segments and their membrane topology were predicted with TMHMM 2.0 (147) and verified by checking the respective entries in UniProt and inspecting the distribution of charged residues according to the “positive-inside rule” (148). In addition, predicted membrane topology was visualized by Protter (149). The putative N-terminal coiled-coil domain in the WT_00119 protein was analyzed with the prediction programs MultiCoil2 (75), PCOILS, and DeepCoil (76, 77), accessed through the MPI Bioinformatics Toolkit (140). The quaternary structure of this domain was predicted with LOGICOIL (78).
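
The positive-inside rule mentioned above states that loops on the cytoplasmic side of the membrane tend to be enriched in Lys and Arg. A minimal Python sketch of such a check is shown below. It is an illustration only and does not reproduce the manual inspection carried out in the study: the function names, the 15-residue flanking window, and the toy sequence are assumptions, and the transmembrane coordinates are taken as 1-based inclusive (start, end) pairs supplied by the user.

```python
POSITIVE = frozenset("KR")  # Lys and Arg


def split_loops(seq, tm_segments):
    """Return N-terminal tail, inter-helix loops, and C-terminal tail, in order."""
    loops, prev_end = [], 0
    for start, end in sorted(tm_segments):
        loops.append(seq[prev_end:start - 1])
        prev_end = end
    loops.append(seq[prev_end:])
    return loops


def membrane_proximal(loop, index, last_index, flank):
    """Keep only the residues within `flank` positions of a helix."""
    if index == 0:
        return loop[-flank:]          # N-terminal tail: end next to helix 1
    if index == last_index:
        return loop[:flank]           # C-terminal tail: start next to last helix
    if len(loop) <= 2 * flank:
        return loop
    return loop[:flank] + loop[-flank:]


def predict_orientation(seq, tm_segments, flank=15):
    """Return 'N-in', 'N-out', or 'undetermined' from Lys/Arg counts."""
    if flank <= 0:
        raise ValueError("flank must be positive")
    if not tm_segments:
        return "undetermined"
    loops = split_loops(seq, tm_segments)
    last = len(loops) - 1
    counts = [sum(aa in POSITIVE for aa in membrane_proximal(l, i, last, flank))
              for i, l in enumerate(loops)]
    n_side, other_side = sum(counts[0::2]), sum(counts[1::2])
    if n_side > other_side:
        return "N-in"                 # N terminus on the positively charged side
    if other_side > n_side:
        return "N-out"
    return "undetermined"


# Toy two-helix protein: positively charged tails, acidic central loop.
toy = "MKRK" + "L" * 20 + "DDS" + "L" * 20 + "RRKK"
orientation = predict_orientation(toy, [(5, 24), (28, 47)])
```

Loops at even positions (the N-terminal tail, the second loop, and so on) lie on the same side of the membrane as the N terminus, which is why their counts are pooled before the comparison.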

Identification and analysis of chemosensory systems.

The identification of chemoreceptors followed a method similar to that described above for c-di-GMP turnover proteins. Thus, a chemoreceptor is defined as a protein that contains an MCP signaling domain identified by SMART (139), ProSite (136), Pfam (47), or CDvist (150), followed by manual curation of the obtained hits. Transmembrane regions were identified using TMHMM 2.0 (147). Chemoreceptor ligand-binding domains (LBDs) were classified according to Pfam (47). Where a potential LBD (defined as a domain flanked by two transmembrane regions) was not annotated in Pfam, homology modeling using Phyre2 (151) was used to approximate the LBD. The chemosensory signaling proteins were extracted from the MiST 3.0 database (152).
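
The working definition of a potential LBD used above, a region flanked by two transmembrane regions, can be expressed as a short Python sketch. This is a hedged illustration rather than the authors' procedure: the function name, the 50-residue minimum length, and the toy sequences are assumptions introduced here, and transmembrane coordinates are taken as 1-based inclusive (start, end) pairs.

```python
def extract_candidate_lbd(seq, tm_segments, min_length=50):
    """Return the region between the first two TM helices, or None.

    None is returned when fewer than two TM helices are given or when the
    intervening region is shorter than `min_length`, as in receptors whose
    two TM regions are consecutive and enclose no ligand-binding domain.
    """
    if len(tm_segments) < 2:
        return None
    (_, end_first), (start_second, _) = sorted(tm_segments)[:2]
    region = seq[end_first:start_second - 1]
    return region if len(region) >= min_length else None


# Toy receptor with a 60-residue region between two 20-residue helices.
with_lbd = "M" * 5 + "L" * 20 + "A" * 60 + "L" * 20 + "K" * 10
candidate = extract_candidate_lbd(with_lbd, [(6, 25), (86, 105)])

# Toy receptor with two nearly consecutive helices (4-residue hairpin loop).
hairpin = "M" * 5 + "L" * 20 + "GSGS" + "L" * 20 + "K" * 10
no_candidate = extract_candidate_lbd(hairpin, [(6, 25), (30, 49)])
```

The extracted region is only a candidate: whether it folds into a recognizable LBD still has to be established by domain annotation or, failing that, by homology modeling as described in the text.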

Data availability.

The genome sequences used in this study are deposited in GenBank (BioProject PRJNA526057) under the following accession numbers (for BioSample numbers, strain names are given in parentheses): complete genomes, SAMN16670464 (150735), SAMN16670468 (G1), SAMN16670438 (A59), SAMN16670445 (A291), SAMN11083189 (CECT 5071); draft genomes, SAMN16670460 (404), SAMN16670458 (789), SAMN16670449 (CCUG 48086), SAMN16670450 (CCUG 38646), SAMN16670457 (CCUG 56496), SAMN16670447 (CCUG 72638), SAMN16670456 (CCUG 72678), SAMN16670465 (97087), SAMN16670459 (CCUG 58400), SAMN16670452 (CCUG 20533), SAMN16670467 (5043), SAMN16670453 (CCUG 15259), SAMN16670455 (CCUG 526), SAMN16670437 (A58), SAMN16670436 (A57), SAMN16670475 (950570), SAMN16670462 (6F5), SAMN16670440 (A41), SAMN16670446 (A292), SAMN16670435 (A56), SAMN16670466 (159418), SAMN16670470 (590722), SAMN16670469 (669801), SAMN16670451 (CCUG 24987), SAMN16670448 (CCUG 50501), SAMN16670463 (28011), SAMN16670454 (CCUG 12945), SAMN16670442 (A93), SAMN16670441 (A65), SAMN16670443 (A94), SAMN16670439 (A60), SAMN16670444 (A97), SAMN16670471 (HUD-G3), SAMN16670461 (SF7), SAMN16670473 (HUD-H4), SAMN16670474 (HUD-I2), SAMN16670472 (HUD-D8).
