Comparative Genomics Suggests a Taxonomic Revision of the Staphylococcus cohnii Species Complex.

Anna Lavecchia1, Matteo Chiara1,2, Caterina De Virgilio3, Caterina Manzari1, Carlo Pazzani4, David Horner1,2, Graziano Pesole1,3,5, Antonio Placido1.   

Abstract

Staphylococcus cohnii (SC), a coagulase-negative bacterium, was first isolated in 1975 from human skin. Early phenotypic analyses led to the delineation of two subspecies (subsp.), Staphylococcus cohnii subsp. cohnii (SCC) and Staphylococcus cohnii subsp. urealyticus (SCU). SCC was considered to be specific to humans, whereas SCU apparently demonstrated a wider host range, from lower primates to humans. The type strains ATCC 29974 and ATCC 49330 have been designated for SCC and SCU, respectively. Comparative analysis of 66 complete genome sequences, including a novel SC isolate, revealed unexpected patterns within the SC complex, both in terms of genomic sequence identity and gene content, highlighting the presence of three phylogenetically distinct groups. Based on our observations, and on the current guidelines for taxonomic classification of bacterial species, we propose a revision of the SC species complex. We suggest that SCC and SCU should be regarded as two distinct species, SC and SU (Staphylococcus urealyticus), and that two distinct subspecies, SCC and SCB (SC subsp. barensis, represented by the novel strain isolated in Bari), should be recognized within SC. Furthermore, since large-scale comparative genomics studies recurrently suggest inconsistencies or conflicts in taxonomic assignments of bacterial species, we believe that the approach proposed here might be considered for more general application.
© The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Keywords:  Staphylococcus cohnii; DNA–DNA hybridization analyses; average nucleotide identity; comparative genomics; genome shotgun sequencing; phylogenetic analyses

Year:  2021        PMID: 33576800      PMCID: PMC8086632          DOI: 10.1093/gbe/evab020

Source DB:  PubMed          Journal:  Genome Biol Evol        ISSN: 1759-6653            Impact factor:   3.416


Significance

In recent years, as the extent of its involvement in a multitude of human and animal infections has become evident, much research has been focused on Staphylococcus cohnii. Moreover, S. cohnii is also widely used as a model in the development of biotechnological applications and antibacterial medical devices. Its relevance notwithstanding, comparative genomic studies of this species complex are lacking, and the current work suggests the need for a major taxonomic revision. More generally, the computational approach presented here can be applied on a larger scale, both for the resolution of complex or conflicting taxonomic assignments and as a tool to contribute to the understanding of the history of biomedically and biotechnologically important traits at species and subspecies levels.

Introduction

Staphylococcus cohnii (SC), a Gram-positive bacterium of the Coagulase-Negative Staphylococci (CoNS) group, was first isolated by Schleifer and Kloos in 1975. The name cohnii was adopted in memory of Ferdinand Julius Cohn, a German botanist and bacteriologist (Schleifer and Kloos 1975). According to the current classification, which is fundamentally based on phenotypic traits, SC includes two subspecies (subsp.): Staphylococcus cohnii subsp. cohnii (SCC) and Staphylococcus cohnii subsp. urealyticus (SCU). The SCC ATCC 29974 and SCU ATCC 49330 isolates have been designated as the type strains for SCC and SCU, respectively (Kloos and Wolfshohl 1983, 1991). The original spelling, SC subsp. urealyticum (sic), was corrected by Sneath to SC subsp. urealyticus (SCU) in 1992 (Sneath 1992). A larger colony size, distinct pigmentation, differences in fatty acid profile, and the presence of metabolic activities, including β-glucuronidase and β-galactosidase activities, delayed alkaline phosphatase activity, and the ability to produce acid aerobically from α-lactose, discriminate SCU from SCC. Moreover, SCC was originally reported to colonize only humans, whereas SCU can also colonize other primates (Kloos and Wolfshohl 1991). More recently, SCU has also been isolated from healthy dogs (Bean et al. 2017) and goats (Seni et al. 2019).

Similar to most CoNS, SCC and SCU are typically commensal bacteria of the skin and mucous membranes (Waldon et al. 2002; Crossley et al. 2009). However, several opportunistically pathogenic strains have been described and implicated in nosocomial infections, including meningitis, primary septic arthritis, septicaemia, brain abscess, and catheter invasion (Okudera et al. 1991; Mastroianni et al. 1996; Basaglia et al. 2003; Yamashita et al. 2005; Adeyemi et al. 2010; Mendoza-Olazarán et al. 2017). The ability to form biofilms appears to play an important role in staphylococcal virulence (Yong et al. 2019), and biofilm-associated infections are of particular concern because they are often difficult to resolve with antibiotics. Recent studies suggest that, similar to other CoNS, SC can adhere to and invade human HeLa cells through the formation of biofilms (Szczuka et al. 2016). In addition, strains of SC, especially those isolated from hospital environments, including pediatric wards and intensive care units, have been reported to be resistant to several antibiotics (Szewczyk et al. 2000, 2004; Song et al. 2017). Indeed, multidrug-resistant CoNS bacteria constitute an emerging source of concern for Public Health Organizations (David and Elliott 2015; Moawad et al. 2019), as they have been associated with an increasing proportion of nosocomial infections, and because they can act as a reservoir of resistance determinants for Staphylococcus aureus through horizontal gene transfer (Otto 2013; Winstel et al. 2013; Larsen et al. 2017; Argemi et al. 2019).

Comparative genomic and phylogenetic studies allow the characterization of evolutionary dynamics and the identification of genes and pathways potentially involved in pathogenesis and/or antibiotic resistance. Here we present comprehensive analyses of the complete collection of 65 publicly available SC genomes as well as that of a novel strain. We uncover striking patterns of genomic evolution, including high levels of genomic diversity and differential gene acquisition and loss, which suggest a taxonomic revision of the SC species complex. We propose that SC should be divided into two species, SC and SU (Staphylococcus urealyticus). Moreover, two subspecies, SCC and SCB (SC subsp. barensis, exemplified by our novel strain isolated in Bari), should be distinguished within SC. Of more general interest, we describe an approach based on the integration of different types of phylogenetic, genomic, and gene content analyses that could be applied on a larger scale for the resolution of complex or conflicting taxonomic assignments.

Materials and Methods

Isolation of SC5 and Preliminary Taxonomic Detection

A Nunc bioassay plate containing Luria Bertani (LB) agar supplemented with 0.2 mM potassium chromate and chloramphenicol (12.5 µg/ml) was prepared and placed on the worktop of a class II biological safety cabinet. Airflow was switched on for 1 h, and the plate was subsequently incubated for 16 h at 37°C. Sixty-two colonies grew. All colonies were replicated on LB agar containing 150 mM Cr(VI). Only five colonies survived (Lavecchia et al. 2018) and were subjected to a preliminary taxonomic characterization through partial 16S rDNA amplification and Sanger sequencing. One colony, preliminarily assigned to Staphylococcus cohnii and labeled SC5, was then subjected to whole-genome sequencing. Polymerase chain reaction (PCR) amplification of the SC5 16S rDNA was performed using 200 ng of DNA (see DNA Isolation section) and the following primers: 1 µM Forward 5′-TACGGGAGGCAGCAGTAG-3′ (16S rDNA position 369–386) and 1 µM Reverse 5′-CATGGTGTGACGGGCGGT-3′ (position 1424–1441). The reaction mixture (final volume 50 µl) was completed with 200 µM of each dNTP, 2 mM MgCl2, and 2 U Taq DNA polymerase (Thermo Fisher Scientific, Waltham, MA). Thermal cycling was performed as follows: 94°C for 5 min (polymerase activation), followed by 30 cycles of 94°C for 30 s, 55°C for 15 s, and 72°C for 1 min. Sanger sequencing was performed by Macrogen (Amsterdam, Netherlands), and the preliminary taxonomic assessment was made by probing the 16S rRNA (Bacteria and Archaea) database available at the NCBI (https://blast.ncbi.nlm.nih.gov/Blast.cgi), using the BLASTN algorithm.
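As a quick consistency check on the primer description above, the following minimal sketch verifies that each primer length matches its stated 16S rDNA coordinates and derives the expected amplicon length. It assumes both coordinate ranges refer to the same 1-based numbering of the 16S rDNA; the helper names are illustrative only.

```python
# Primer sequences and 16S rDNA coordinates as stated in the text.
FORWARD = "TACGGGAGGCAGCAGTAG"  # positions 369-386
REVERSE = "CATGGTGTGACGGGCGGT"  # positions 1424-1441

def span_length(start: int, end: int) -> int:
    """Number of positions covered by an inclusive, 1-based interval."""
    return end - start + 1

def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

# Each primer should be exactly as long as the interval it is said to cover.
forward_ok = len(FORWARD) == span_length(369, 386)
reverse_ok = len(REVERSE) == span_length(1424, 1441)

# Expected amplicon: from the first base of the forward primer
# to the last base covered by the reverse primer (assumed shared numbering).
amplicon_length = span_length(369, 1441)

# Sequence the reverse primer anneals to, read on the forward strand.
reverse_site = reverse_complement(REVERSE)

print(forward_ok, reverse_ok, amplicon_length, reverse_site)
```

Under these assumptions the expected product is roughly 1.07 kb, which is consistent with a partial 16S rDNA amplicon.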

DNA Isolation, Library Preparation, and Sequencing

Genomic DNA was extracted with the DNeasy Blood and Tissue kit (Qiagen, Hilden, Germany). The library was prepared using the Nextera XT library prep workflow (Illumina, San Diego, CA), and 2 × 250 nt paired-end reads were generated on an Illumina MiSeq instrument.

Genome Assembly and Annotation

Raw data were processed using a modified version of the “Fosmid1” pipeline in the A-GAME (Chiara et al. 2018) Galaxy framework (Afgan et al. 2018). Quality trimming was executed using the sliding-window operation in Trimmomatic with default parameters (Bolger et al. 2014). Overlapping reads were merged using PEAR with standard parameters (Zhang et al. 2014). The final assembly was performed with the SPAdes assembler (version 3.50) using k-mer sizes of 33, 55, 77, 99, and 121 nt (Bankevich et al. 2012). Annotation was performed with PROKKA using default parameters (Seemann 2014).

Staphylococcus cohnii Genomes Used in This Study and Annotation of Protein-Coding Genes

The complete collection of 65 S. cohnii (SC) genome assemblies (including SC subsp. cohnii ATCC 29974 and SC subsp. urealyticus ATCC 49330), as available in GenBank on July 1, 2020, was downloaded from the NCBI assembly database using the “Download Assemblies” link of the web interface. To avoid possible ascertainment biases, all the genomes were reannotated using the procedure described above. Annotations of protein-coding genes, as obtained from Prokka, were used in all the subsequent analyses. A complete list of the accession numbers of the genomes used in this study is provided in supplementary table S1, Supplementary Material online.

Calculation of Average Nucleotide Identity and In Silico DNA–DNA Hybridization

Average Nucleotide Identity based on BLAST (ANIb) between all 66 genomes (SC5 included) was computed according to the method described by Richter and Rosselló-Móra (2009), as implemented by a custom script, which is available at https://github.com/cvulpispaper/compute_anib. In silico DNA–DNA hybridization (DDH) was computed using the GGDC (Genome-to-Genome Distance Calculator 2.1) available from https://ggdc.dsmz.de/ggdc.php# (Meier-Kolthoff et al. 2013). As recommended in Auch et al. (2010), all the comparisons performed in this study were based on formula 2.
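The authors' custom ANIb script is not reproduced here. The sketch below only illustrates the general idea of ANIb as it is commonly described: the query genome is cut into fixed-size fragments, each fragment is searched against the reference with BLASTN, and the identities of the retained best hits are averaged. The fragment size (1,020 nt) and the filters (more than 30% identity over at least 70% of the fragment) are the values usually quoted for this method and are assumptions here, as are all names; the BLAST step itself is replaced by a list of precomputed hits.

```python
from dataclasses import dataclass
from statistics import mean

FRAGMENT_LEN = 1020   # assumed fragment size (nt)
MIN_IDENTITY = 30.0   # assumed minimum percent identity of a retained hit
MIN_COVERAGE = 0.70   # assumed minimum aligned fraction of the fragment

@dataclass
class Hit:
    """Best BLASTN hit of one query fragment (hypothetical record layout)."""
    fragment_len: int
    aligned_len: int
    identity: float  # percent identity

def fragment(seq: str, size: int = FRAGMENT_LEN) -> list[str]:
    """Cut a sequence into consecutive fragments; a short trailing piece is dropped."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]

def anib(hits: list[Hit]) -> float:
    """Mean percent identity over the hits that pass both filters."""
    kept = [
        h.identity
        for h in hits
        if h.identity > MIN_IDENTITY
        and h.aligned_len / h.fragment_len >= MIN_COVERAGE
    ]
    if not kept:
        raise ValueError("no fragment passed the identity/coverage filters")
    return mean(kept)

# Toy input: two hits pass, one fails coverage, one fails identity.
example_hits = [
    Hit(1020, 1000, 96.0),
    Hit(1020, 500, 99.0),
    Hit(1020, 900, 94.0),
    Hit(1020, 1020, 25.0),
]
print(anib(example_hits))
```

In practice the calculation is usually run in both directions (each genome as query and as reference) and the two values are reported or averaged; that step is omitted from this sketch.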

Clusters of Orthologous Genes, Core, and Accessory Genome

The makeblastdb utility, as incorporated in the blast+ software package (Camacho et al. 2009), was used to prepare a BLAST protein database containing all of the protein-coding genes predicted by Prokka in the 66 SC genomes included in this study. All-against-all BLASTp (Altschul et al. 1990) was performed using the BLOSUM80 matrix, accepting only best reciprocal hits with e-value ≤1e−5 and where “second-best” hits from the same genome produce bit scores <90% of that associated with the best match. Putative clusters of orthologous genes (COGs) were established as groups of best reciprocal BLAST hits. Core genes were defined as COGs containing single representatives from all genomes included in our analyses (or all genomes within major groups) and accessory genes as COGs with incomplete representation. The program used for the identification of COGs is available at https://github.com/cvulpispaper/compute_anib.
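The filtering rules described above can be sketched as follows. This is not the authors' program: it is a minimal illustration that takes BLASTp hits as `(query, subject, bit score, e-value)` tuples, assumes gene identifiers of the hypothetical form `genome|gene`, and assumes that the e-value filter is applied before the second-best comparison.

```python
from collections import defaultdict

MAX_EVALUE = 1e-5          # e-value threshold from the text
SECOND_BEST_RATIO = 0.90   # second-best hit must score below 90% of the best

def genome_of(gene_id: str) -> str:
    """Genome label of a gene identifier written as 'genome|gene' (assumed format)."""
    return gene_id.split("|", 1)[0]

def best_hits(hits):
    """Map (query gene, target genome) to its unambiguous best subject gene."""
    candidates = defaultdict(list)
    for query, subject, bitscore, evalue in hits:
        if evalue <= MAX_EVALUE and genome_of(query) != genome_of(subject):
            candidates[(query, genome_of(subject))].append((bitscore, subject))
    best = {}
    for key, scored in candidates.items():
        scored.sort(reverse=True)  # highest bit score first
        top_score, top_subject = scored[0]
        if len(scored) > 1 and scored[1][0] >= SECOND_BEST_RATIO * top_score:
            continue  # second-best hit in the same genome is too close: ambiguous
        best[key] = top_subject
    return best

def reciprocal_best_hits(hits):
    """Set of gene pairs that are each other's best hit across two genomes."""
    best = best_hits(hits)
    pairs = set()
    for (query, _target), subject in best.items():
        if best.get((subject, genome_of(query))) == query:
            pairs.add(tuple(sorted((query, subject))))
    return pairs

# Toy data: A|g1 and B|g1 are reciprocal best hits; A|g2 is ambiguous
# because its two hits in genome B have nearly identical bit scores.
toy_hits = [
    ("A|g1", "B|g1", 200.0, 1e-50),
    ("B|g1", "A|g1", 198.0, 1e-49),
    ("A|g2", "B|g2", 150.0, 1e-30),
    ("A|g2", "B|g3", 145.0, 1e-29),
    ("B|g2", "A|g2", 150.0, 1e-30),
]
rbh_pairs = reciprocal_best_hits(toy_hits)
print(rbh_pairs)
```

Grouping the resulting pairs into clusters (for example as connected components of the pair graph) is a further step that the text does not specify in detail and that is left out here.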

Estimation of Completeness of Core and Accessory Genomes

Sizes of core and accessory genomes were estimated by rarefaction analyses based on random resampling of genomic sequences of each major group. For each number of strains considered (2–28 for B, 2–22 for A1, 2–16 for A2, and 2–66 for SC), the inferred sizes of core and accessory genomes were recorded for 10,000 replicates of randomly selected combinations of genomes. Plots were prepared showing the mean and standard deviation of these statistics.
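The resampling scheme described above can be illustrated with a short sketch. It assumes that each genome is represented as a set of COG identifiers (so copy number is ignored, unlike the single-representative criterion used for core genes in the study), and it uses a small default number of replicates instead of 10,000 so that it runs instantly. All names are illustrative.

```python
import random
from statistics import mean, stdev

def core_and_accessory(genomes):
    """Core = COGs present in every genome; accessory = all other COGs observed."""
    core = set.intersection(*genomes)
    pan = set.union(*genomes)
    return len(core), len(pan) - len(core)

def rarefaction(genomes, n_replicates=100, seed=0):
    """For each subset size k, summarize core/accessory sizes over random subsets.

    Returns {k: (core mean, core sd, accessory mean, accessory sd)}.
    """
    rng = random.Random(seed)
    curve = {}
    for k in range(2, len(genomes) + 1):
        sizes = [core_and_accessory(rng.sample(genomes, k)) for _ in range(n_replicates)]
        cores = [c for c, _ in sizes]
        accessories = [a for _, a in sizes]
        curve[k] = (mean(cores), stdev(cores), mean(accessories), stdev(accessories))
    return curve

# Toy gene content for four genomes (hypothetical COG identifiers).
toy_genomes = [
    {"c1", "c2", "a"},
    {"c1", "c2", "b"},
    {"c1", "c2", "a", "c"},
    {"c1", "c2"},
]
curve = rarefaction(toy_genomes)
for k, stats in curve.items():
    print(k, stats)
```

Plotting the means with their standard deviations against the number of genomes gives the kind of curves described in the text: the core size decreases and the accessory size increases as genomes are added.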

Phyletic Patterns and Clustering of Gene Presence/Absence Profiles

The phyletic pattern of gene presence/absence in the genomes of the 66 SC isolates was inferred directly by comparison of clusters of orthologous genes. Only COGs containing ten or more genes were considered in this analysis. A matrix of gene presence/absence was compiled, with genes on the rows and isolates on the columns. A value of 0 was used to indicate the absence of a gene, and a value of 1 its presence. A correlation-based distance matrix of gene presence/absence profiles was obtained by applying the cor and dist functions from the stats package of the R programming language with default parameters (Pearson correlation and Euclidean distances, respectively). Clustering was performed by applying the hclust function with median linkage, from the same package.
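The two-step construction of the distance matrix can be mirrored outside R. The sketch below, written in plain Python for illustration, first computes Pearson correlations between isolate columns (the default behaviour of `cor` on a matrix) and then Euclidean distances between the rows of that correlation matrix (the default behaviour of `dist`). The toy matrix and function names are assumptions; the hierarchical clustering step with median linkage is not reproduced.

```python
from math import dist, sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(matrix):
    """Isolate-by-isolate correlations; input rows are genes, columns are isolates."""
    columns = list(zip(*matrix))
    return [[pearson(a, b) for b in columns] for a in columns]

def distance_matrix(corr):
    """Euclidean distances between the rows (correlation profiles) of corr."""
    return [[dist(r1, r2) for r2 in corr] for r1 in corr]

# Toy presence/absence matrix: 5 genes (rows) x 4 isolates (columns).
# Isolates 1-2 share most genes, as do isolates 3-4.
presence = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
    [1, 0, 1, 0],
]
corr = correlation_matrix(presence)
dm = distance_matrix(corr)
print(round(dm[0][1], 3), round(dm[0][2], 3))
```

Note that an isolate whose column is constant (all 0 or all 1) has zero variance and would make the correlation undefined; the sketch does not handle that case.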

Phylogenetic Analyses

The conceptually translated sequences of the 1,468 SC core genes were independently aligned using Muscle (Edgar 2004), and ambiguously aligned regions were excluded using the GBlocks software (Castresana 2000). Maximum-likelihood phylogenetic reconstruction and bootstrap analyses of concatenated alignments were performed using the software PHYML (Guindon et al. 2009) under the WAG (Whelan and Goldman 2001) substitution model, suggested by the software ProtTest (Darriba et al. 2011) to best fit the data, with a proportion of invariable sites and four gamma-distributed substitution rate categories.
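The concatenation step that precedes tree inference can be sketched as below. This is an illustrative helper, not part of the published pipeline: it assumes each per-gene alignment is available as a mapping from strain name to aligned sequence, and it simply joins the genes in a fixed order while checking that every strain is present and that sequences within a gene have equal length.

```python
def concatenate_alignments(alignments, strains):
    """Join per-gene alignments into one supermatrix, one sequence per strain.

    alignments: list of dicts mapping strain name -> aligned sequence.
    strains: strain names, in the order to be used for every gene.
    """
    parts = {strain: [] for strain in strains}
    for index, alignment in enumerate(alignments):
        missing = [s for s in strains if s not in alignment]
        if missing:
            raise ValueError(f"gene {index}: missing strains {missing}")
        lengths = {len(alignment[s]) for s in strains}
        if len(lengths) != 1:
            raise ValueError(f"gene {index}: sequences differ in length")
        for strain in strains:
            parts[strain].append(alignment[strain])
    return {strain: "".join(chunks) for strain, chunks in parts.items()}

# Toy example with two short "core gene" alignments (hypothetical sequences).
toy_alignments = [
    {"SC5": "MK-L", "ATCC29974": "MKAL", "ATCC49330": "MRAL"},
    {"SC5": "GTV", "ATCC29974": "GTV", "ATCC49330": "GSV"},
]
supermatrix = concatenate_alignments(toy_alignments, ["SC5", "ATCC29974", "ATCC49330"])
print(supermatrix)
```

Restricting the input to core genes guarantees that every strain contributes a sequence to every gene, which is why no gap-padding for absent genes is needed in this sketch.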

Statistical Analyses

Welch t-test P-values for the comparison of ANIb distributions and the size of the core and accessory genomes were computed by means of the t.test function as implemented in the stats R package.
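For reference, the quantities behind a Welch t-test can be written out explicitly. The sketch below computes the test statistic and the Welch–Satterthwaite degrees of freedom for two samples with unequal variances; turning these into a P-value requires the Student t distribution, which R's `t.test` handles and which is not included here. The input values are toy numbers, not data from the study.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)  # squared standard error of the mean of x
    vy = variance(y) / len(y)  # squared standard error of the mean of y
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

# Toy samples with clearly different variances.
t_stat, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t_stat, 4), round(df, 4))
```

Unlike the classical Student t-test, this form does not pool the two variances, which is why the degrees of freedom are generally not an integer.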

Results

Isolation and Whole-Genome Shotgun Sequencing of a Novel Strain of Staphylococcus cohnii

Five strains of staphylococci were isolated from a disused class II biological safety cabinet during a study aiming to identify bacterial strains resistant to hexavalent chromate (Lavecchia et al. 2018). Preliminary taxonomic analyses based on partial 16S rDNA Sanger sequencing identified four of these strains as Staphylococcus arlettae (Lavecchia et al. 2018), whereas one isolate showed high levels of similarity (99.1% and 98.8%, respectively) with the type strains of SCC (Staphylococcus cohnii subsp. cohnii) and SCU (Staphylococcus cohnii subsp. urealyticus). This isolate, preliminarily named SC5, was subsequently subjected to whole-genome shotgun sequencing using an Illumina MiSeq instrument. A total of 2,402,324 paired-end reads were obtained, with an average insert size of 254.74 bp, providing a theoretical 230× coverage of the genome. Raw reads were subjected to quality trimming and assembly by means of a modified version of the Fosmid1 pipeline as incorporated in A-GAME (Chiara et al. 2018). Salient features of the SC5 genome assembly are summarized in table 1. An overall good level of contiguity was observed, with more than 90% of the assembly incorporated in contigs >100 kb in size (N90 108 kb). A total of 10 rRNA, 61 tRNA, and 2,510 protein-coding genes were predicted by in silico annotation of the genome. Of note, the emrA and emrB genes, implicated in chromate and ampicillin co-resistance in Staphylococcus aureus LZ 01 (Zhang et al. 2016), were identified in the genome of SC5. These genes are also observed in the genomes of the four S. arlettae chromium-resistant strains isolated from the same environment. The draft genome sequence of SC5 was deposited in NCBI under the accession number JAALCY000000000, BioSample accession number SAMN14142771, and BioProject ID PRJNA607668.
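The theoretical coverage quoted above can be reproduced with simple arithmetic. The sketch assumes that the reported read count refers to individual 250 nt reads and takes the SC5 genome size of 2.62 Mb from table 1; under these assumptions the result is close to the stated 230×.

```python
# Values taken from the text and table 1; read length is the nominal 2 x 250 nt run.
N_READS = 2_402_324
READ_LENGTH_NT = 250
GENOME_SIZE_NT = 2_620_000  # 2.62 Mb

def theoretical_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size

coverage = theoretical_coverage(N_READS, READ_LENGTH_NT, GENOME_SIZE_NT)
print(f"{coverage:.1f}x")
```

Because quality trimming and read merging shorten or remove reads, the effective coverage used by the assembler is expected to be somewhat lower than this nominal figure.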
Table 1

Main Genome Assembly Features of SC5, SCC ATCC 29974 and SCU ATCC 49330 Strains

Strain | Size (Mb) | GC (%) | Contigs | N50 (kb) | Proteins | rRNAs | tRNAs | DDH (%)a vs. SC5 | DDH (%)a vs. SCC | DDH (%)a vs. SCU | ANIb (%)b vs. SC5 | ANIb (%)b vs. SCC | ANIb (%)b vs. SCU
SC5 | 2.62 | 32.2 | 34 | 806 | 2,510 | 10 | 61 | - | 67.5 | 41.9 | - | 95.4 | 91.0
SCC ATCC 29974 | 2.71 | 32.6 | 83 | 114 | 2,422 | 9 | 58 | 67.5 | - | - | 95.4 | - | 91.5
SCU ATCC 49330 | 2.67 | 32.5 | 22 | 326 | 2,457 | 13 | 61 | 41.9 | - | - | 91.0 | 91.5 | -

Note: DDH and ANIb values between strains.

a DDH (cut-off for species affiliation > 70%);

b ANIb (cut-off for species affiliation > 96%). SC5, our isolate; SCC ATCC 29974, Staphylococcus cohnii subsp. cohnii (type strain) and SCU ATCC 49330, Staphylococcus cohnii subsp. urealyticus (type strain).

In Silico DNA-DNA Hybridization Analyses

In silico DNA-DNA hybridization (DDH) analyses were performed to refine the taxonomic delineation of SC5. Strikingly, although 16S rDNA taxonomic assignment suggested that SC5 was closely related to SCC, in silico hybridization assays against the SCC ATCC 29974 and SCU ATCC 49330 type strains recovered somewhat unexpected patterns. Indeed (table 1), the observed DDH values, 67.5% and 41.9% for SCC and SCU, respectively, were borderline or well below the cut-off value of 70% that is normally used to delineate species by this method (Meier-Kolthoff et al. 2013; Colston et al. 2014; Garrido-Sanz et al. 2016). A systematic comparison of in silico hybridization profiles of SC5 against the complete collection of the other 65 SC draft genomes considered in this study (supplementary table S1, Supplementary Material online) was performed. Notably, contrasting patterns of sequence similarity profiles (supplementary table S2, Supplementary Material online) were observed. SC5 showed DDH levels > 90% with 15 isolates (supplementary fig. S1, Supplementary Material online), but lower than 70% with 22 (including the SCC ATCC 29974 type strain) and around 45% with the remaining 28 isolates (including the SCU ATCC 49330 type strain).

Analysis of Genomic Identity

Levels of pairwise genome identity between all currently available SC genomes were established by Average Nucleotide Identity based on BLAST (ANIb). Hierarchical clustering of ANIb profiles was applied to identify clades/groups of SC genomes with similar levels of genome identity. As shown in figure 1 and supplementary figure S2, Supplementary Material online, and consistent with patterns of in silico DDH profiles, the results of these analyses suggested the presence of three distinct clusters with different mutual levels of ANIb within the SC species complex. The first two groups of isolates, referred to hereafter as A1 and A2, correspond to SCC isolates and are more closely related, whereas the third group (referred to as B) is more distantly related to A1 and A2 and is composed exclusively of SCU isolates. A1 incorporates 22 strains, including the SCC type strain ATCC 29974, whereas A2 is composed of 16 strains and includes SC5. Finally, group B contains 28 isolates, including the type strain SCU ATCC 49330.
Fig. 1

Heatmap of ANIb between genomes of Staphylococcus cohnii isolates. ANIb values are represented using a gray scale color map, with darker colors indicating higher levels of identity, according to the scale represented on the top. Strain identifiers are indicated on the rows. The panel on the left indicates cluster memberships, according to the following color codes: green = B, dark purple = A1, and light purple = A2. Columns and row dendrograms are used to group SC strains based on patterns of genome identity profiles. The novel SC5 isolate and the two type strains SCC ATCC 29974 and SCU ATCC 49330 are highlighted in red and underlined. The SE4.1, SE 4.2, and SE3.10 strains that are also discussed in the text are highlighted in red.

Comparisons of genomic identity levels between the draft genomes of SCC and SCU show an average ANIb of 89.43%, a value that is well below the cut-off normally considered for inclusion in the same bacterial species (Otto 2008; Varghese et al. 2015). This approach also supports the presence of two distinct clusters within SCC, with significantly different levels of ANIb (t-test P-value ≤1e−16) and an average genome sequence identity of 95.82% (supplementary fig. S2, Supplementary Material online), a value that is normally considered borderline for the identification of bacterial species (Otto 2008; Varghese et al. 2015). Taken together, our analyses of genomic similarity profiles by means of two independent methods strongly suggest that, according to the current guidelines for the delineation of bacterial species, SCC and SCU should be considered as two distinct species. Consistent with this consideration, we observed that the ANIb value recovered from the comparison of the two type strains SCC ATCC 29974 and SCU ATCC 49330 is 91.5%. Notably, comparisons of the draft genome assembly of SC5 with the SCC ATCC 29974 and SCU ATCC 49330 type strains resulted in ANIb values of 95.4% and 91.0%, respectively (table 1).
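To make the use of the two cut-offs concrete, the following sketch applies the thresholds given in the note to table 1 (DDH > 70%, ANIb > 96%) to the pairwise values reported for SC5 and the two type strains. The function name is illustrative, and the strict "all available metrics must exceed their cut-off" rule is an assumption made for this example rather than a procedure stated in the text.

```python
DDH_CUTOFF = 70.0   # percent, from the note to table 1
ANIB_CUTOFF = 96.0  # percent, from the note to table 1

# (strain pair) -> (DDH %, ANIb %), as reported in table 1 and in the text.
# DDH between the two type strains is not reported there, hence None.
PAIRS = {
    ("SC5", "SCC ATCC 29974"): (67.5, 95.4),
    ("SC5", "SCU ATCC 49330"): (41.9, 91.0),
    ("SCC ATCC 29974", "SCU ATCC 49330"): (None, 91.5),
}

def exceeds_species_cutoffs(ddh, anib):
    """True only if every available metric is above its species cut-off."""
    checks = []
    if ddh is not None:
        checks.append(ddh > DDH_CUTOFF)
    if anib is not None:
        checks.append(anib > ANIB_CUTOFF)
    return bool(checks) and all(checks)

results = {pair: exceeds_species_cutoffs(*values) for pair, values in PAIRS.items()}
for pair, passed in results.items():
    print(pair, "above cut-offs" if passed else "below cut-offs")
```

None of the three pairs clears both thresholds. The SC5 versus SCC values fall only slightly short, which matches the "borderline" wording used in the text, whereas both comparisons involving SCU are far below the cut-offs.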

Clusters of Orthologous Genes and Phylogenetic Analyses

Clusters of orthologous genes (COGs) as well as core and accessory genomes were established using an approach based on best reciprocal BLAST hits. A total of 5,456 clusters of putative orthologs with more than one gene and 5,044 singleton genes were identified. Hierarchical clustering of phyletic patterns of gene presence/absence profiles was applied to identify SC isolates with a similar gene content. Consistent with our previous observations, three distinct groups were observed, identical in size and composition to A1, A2, and B. Notably, although the A1 and A2 clades were delineated very clearly by this analysis (fig. 2), suggesting a similar gene content within isolates of these groups, a somewhat more heterogeneous pattern was observed for group B, suggesting a higher plasticity of the pan-genome, possibly associated with lateral gene transfer.
Fig. 2

Heatmap of gene presence/absence profiles of Staphylococcus cohnii isolates. Similarity of gene presence/absence profiles was estimated by computing pairwise Pearson correlation values between all 66 genomes considered in the study. Pearson correlation coefficients are represented using a gray scale color map. Darker colors indicate higher correlation (similarity) of gene presence/absence profiles. Strain identifiers are indicated on the rows. The panel on the left indicates cluster memberships, with the color codes defined in figure 1. Similar to figure 1, dendrograms are applied to the columns and rows to delineate groups of isolates with similar gene presence/absence profiles. SC5 and the two type strains SCC ATCC 29974 and SCU ATCC 49330 are highlighted in red and underlined. The SE4.1, SE 4.2, and SE3.10 strains are highlighted in red.

Analysis of the SC core genome provides additional evidence for the different gene content in groups A1, A2, and B. Indeed, when all 66 available genomes are considered, a core genome of 1,468 genes was recovered, whereas notable differences were observed in the size of the core genome between the three groups (fig. 3).
Fig. 3

Core and accessory genome. (A) Plot of core genome size in the Staphylococcus cohnii species complex and in the three distinct groups of SC (A1, A2, and B) delineated in this study. X axis = number of genomes and Y axis = number of genes. (B) Plot of accessory genome size in S. cohnii and in the three distinct groups of SC identified by this study. X axis = number of genomes and Y axis = number of genes.

Although the estimated core genome size for the A2 cluster was 1,889 genes, the core genomes of A1 and B were larger, with estimated sizes of 1,993 and 1,953 genes, respectively. As our analyses are based on nearly equivalent numbers of isolates for every group, this difference is unlikely to be the result of biased sampling, but might reflect a tendency for a more compact genome with a reduced number of genes in the A2 clade. Consistent with this hypothesis, we observe that the average number of predicted protein-coding genes is significantly reduced in A2 (average 2,566) with respect to A1 (2,618) and B (2,651), with P-values of 0.06 and 0.022, respectively, according to a Welch t-test. Although our analyses suggest that the accessory genome of SC is relatively open (fig. 3) and that additional genes are likely to be discovered as new genomic sequences become available, we note once again that the accessory genome in A2 is substantially reduced with respect to A1 and B, again consistent with systematic differences in gene content.

Phylogenetic analyses of the concatenated alignment of 1,468 core genes (fig. 4) recovered a tree with a topology consistent with the clustering of the isolates based on genome identity levels, providing an additional line of evidence for the presence of three distinct clades within the SC species complex. However, according to our phylogenetic analyses, A1 is not strictly monophyletic: a distinct basal clade formed by three isolates (SE4.1, SE 4.2, and SE3.10) is observed. Interestingly, a similar pattern is also found in figure 1, where the same group of isolates (SE4.1, SE 4.2, and SE3.10) forms an identical basal clade, suggesting overall reduced levels of genome identity between these three strains and the other strains included in A1. Notably, the same pattern is not recovered when clustering of isolates based on gene content is considered (fig. 2). This indicates that the gene content of SE4.1, SE 4.2, and SE3.10 is highly consistent with that of other strains in the A1 cluster. Taken together, these observations might suggest widespread lateral gene transfer between SE4.1, SE 4.2, and SE3.10 and other SC isolates or, alternatively, that while having a gene content similar to that of the strains included in the A1 group, SE4.1, SE 4.2, and SE3.10 are overall highly divergent at the sequence level, possibly indicating faster evolutionary rates and/or ongoing diversifying selection. Importantly, we underscore that these three strains were isolated from a similar environment (rice seeds) and geographic location (India), consistent with the hypothesis that the observed reduction in genome/protein identity levels might reflect the regional/environmental diversity of bacterial communities (Lozupone and Knight 2008).
Fig. 4

Phylogenetic tree of Staphylococcus cohnii isolates based on concatenated alignment of 1,468 core genes. Branch colors indicate the different groups identified in this study, according to the color code defined in figure 1. Bootstrap values below 95 are reported on the corresponding branches. The SE4.1, SE 4.2, and SE3.10 strains are marked in red. SC5 isolate and the two type strains SCC ATCC 29974 and SCU ATCC 49330 are highlighted in red and underlined.

Molecular Discrimination of A1, A2, and B Strains

Interestingly, analyses of core genome composition between the three SC groups proposed here identify 8, 6, and 10 genes that are universally present in the genomes of A1, A2, and B strains, respectively, and consistently absent from genomes of the other groups. These genes (table 2) are not adjacent in the genomes, suggesting that they do not represent operons. PCR assays based on the presence/absence profiles of these genes might be used to discriminate between members of the A1, A2, and B groups.
Table 2

A1, A2, and B Strain-Specific Genes

Group | Accession | Annotation
A1 | WP_103211109.1 | ABC-F family ATP-binding cassette domain-containing protein
A1 | WP_019468720.1 | tRNA (N6-isopentenyl adenosine(37)-C2)-methylthiotransferase MiaB
A1 | WP_019468192.1 | MFS transporter
A1 | WP_040030451.1 | Aldehyde dehydrogenase family protein
A1 | WP_019468481.1 | Trigger factor
A1 | WP_103211478.1 | M15 family metallopeptidase
A1 | WP_040030229.1 | Class 1b ribonucleoside-diphosphate reductase
A1 | WP_019468907.1 | Hypothetical protein
A2 | WP_107523479.1 | Orotidine-5′-phosphate decarboxylase
A2 | WP_107505199.1 | BtrH N-terminal domain-containing protein
A2 | WP_019468295.1 | Uracil phosphoribosyltransferase
A2 | WP_181187692.1 | Hypothetical protein
A2 | WP_107384484.1 | Arsenate reductase
A2 | WP_107384163.1 | LytTR family transcriptional regulator
B | WP_073342256.1 | NAD(P)/FAD-dependent oxidoreductase
B | WP_073344697.1 | DUF4352 domain-containing protein
B | WP_046206585.1 | DUF4064 domain-containing protein
B | WP_073344420.1 | ABC transporter substrate-binding protein
B | WP_073345446.1 | NAD(P)H-dependent oxidoreductase
B | WP_103161674.1 | Anion permease
B | WP_103161408.1 | Glucose 1-dehydrogenase
B | WP_073343781.1 | Energy-coupling factor transporter
B | WP_073341550.1 | Amino acid ABC transporter ATP-binding
B | WP_073342088.1 | Thymidylate synthase

Note.—The protein ID and the relevant annotation is shown for each gene.
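The criterion used to select these genes (present in every genome of one group and absent from every genome of the other groups) can be expressed compactly once gene presence is stored as sets of strains. The sketch below is an illustration with made-up strain and gene names; it is not the procedure or code used in the study.

```python
def group_specific_genes(presence, groups):
    """Genes carried by all members of exactly one group and by no other strain.

    presence: dict mapping gene -> set of strains carrying it.
    groups: dict mapping group name -> set of member strains (non-overlapping).
    """
    specific = {name: [] for name in groups}
    for gene, carriers in presence.items():
        for name, members in groups.items():
            # Equality means: present in every member and in nobody else.
            if carriers == members:
                specific[name].append(gene)
    return specific

# Toy data with hypothetical strain and gene identifiers.
toy_groups = {
    "A1": {"s1", "s2"},
    "A2": {"s3", "s4"},
    "B": {"s5", "s6", "s7"},
}
toy_presence = {
    "geneX": {"s1", "s2"},                                   # A1-specific
    "geneY": {"s3", "s4"},                                   # A2-specific
    "geneZ": {"s5", "s6"},                                   # missing from s7
    "geneW": {"s1", "s2", "s3", "s4", "s5", "s6", "s7"},     # core gene
    "geneV": {"s5", "s6", "s7"},                             # B-specific
}
markers = group_specific_genes(toy_presence, toy_groups)
print(markers)
```

Genes selected in this way are natural candidates for the presence/absence PCR assays mentioned in the text, since a single amplification result is informative about group membership.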

Discussion

Difficulties in the taxonomic assignment of a novel Staphylococcus isolate based on well-established genome identity metrics (ANIb and DDH), prompted us to perform an extensive phylogenomic analysis of the Staphylococcus cohnii (SC) species complex, based on all (65) currently available draft genome assemblies. Analyses based on genomic identity levels, core and accessory genome size, gene content, and phylogenetic approaches consistently and strongly suggest that the SC species complex, as currently defined is composed of at least three distinct groups, and advocate a revision of the current phylogenetic classification of SC. Based on analyses presented in this study, and on the current guidelines and best practices for the classification of bacterial species (Varghese et al. 2015), we propose that the two SC subspecies, as described by Schleifer and Kloos (1975), should be instead regarded as two distinct species: SC, corresponding to groups A1 and A2 in this work (fig. 2) and Staphylococcus urealyticus (SU), corresponding to group B. Additionally, the revised classification of SC should include two subspecies: Staphylococcus cohnii subsp. cohnii (SCC), corresponding to group A1 and Staphylococcus cohnii subsp. barensis (SCB), corresponding to group A2 and including our novel isolate. Importantly, although the three different clades were not associated with any evident morphological or phenotypic difference, possible criteria for an effective discrimination of SU, SCC, and SCB are proposed in the current study. These include the application of standard approaches based either on DDH and ANIb, two genome-wide similarity metrics, which are considered a reference standard for delineation of bacterial species. Moreover, according to our analyses, a limited, but consistent number of lineage-specific genes is observed in each of SU, SCC, and SCB. 
In principle, simple tests based on targeted PCR resequencing of these genes could be used to develop highly effective molecular assays for the discrimination of both the species and the subspecies proposed here. Although increased future environmental sampling may help to overcome any ascertainment bias inherent in the available cohort, and to resolve the uncertainty in the phylogenetic placement of the three isolates from rice seeds, our study demonstrates a wider host range for SCC than previously hypothesized (Kloos and Wolfshohl 1983, 1991), one that includes isolates from plants (rice seed) and vegetable liquid food (soy sauce) (table 3). Conversely, SCB strains displayed a narrower host range, as 15 out of 16 strains were isolated from Bos taurus intramammary infections (bovine mastitis) and one (the novel strain described in this study) was isolated from a disused biological cabinet. Intriguingly, we observe that isolates of SCB also show a more compact genome and a significant reduction in gene content compared to SCC, a consideration that might at least in part explain its narrower, more specialized host range. The apparent host range of SU, based on the source of isolation as reported in the metadata associated with GenBank genome submissions, is consistent with previous reports, with members of this candidate species isolated both from humans (e.g., skin, blood, catheter) and from other animals, including ducks, dogs, cows, and goats (table 3).
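The notion of lineage-specific genes used above as candidate PCR targets can be illustrated with a minimal sketch. It assumes a gene presence/absence map per strain and a group label per strain; the strain names, gene names, and the function `lineage_specific_genes` are hypothetical placeholders and do not reproduce the pipeline used in this study. A gene is counted as specific to a group only if it is found in every member of that group and in no strain outside it.

```python
from typing import Dict, Set


def lineage_specific_genes(
    presence: Dict[str, Set[str]], groups: Dict[str, str]
) -> Dict[str, Set[str]]:
    """For each group, return genes present in all members and absent elsewhere."""
    result: Dict[str, Set[str]] = {}
    for group in sorted(set(groups.values())):
        members = [s for s, g in groups.items() if g == group]
        others = [s for s, g in groups.items() if g != group]
        # Genes shared by every member of the group.
        shared = set.intersection(*(presence[s] for s in members))
        # Genes observed in at least one strain outside the group.
        seen_elsewhere = set().union(*(presence[s] for s in others))
        result[group] = shared - seen_elsewhere
    return result


# Toy input (hypothetical strains and genes, for illustration only).
presence = {
    "strain_a": {"g1", "g2", "g5"},
    "strain_b": {"g1", "g2"},
    "strain_c": {"g1", "g3"},
    "strain_d": {"g1", "g3", "g5"},
    "strain_e": {"g1", "g4"},
}
groups = {
    "strain_a": "A1",
    "strain_b": "A1",
    "strain_c": "A2",
    "strain_d": "A2",
    "strain_e": "B",
}

specific = lineage_specific_genes(presence, groups)
for group, genes in specific.items():
    print(group, sorted(genes))
```

In this toy input, `g1` is shared by all strains and `g5` occurs in two groups, so neither qualifies; only genes confined to a single group remain. A strict all-members criterion is sensitive to assembly gaps in draft genomes, so a real analysis might relax it.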
Table 3

Staphylococcus cohnii Genome Metadata

| Strain | Isolation Source | Assembly | BioSample | BioProject | Geographic Location | Current Rank | Proposed Rank |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SC5 | Biological Safety Cabinet | | SAMN14142771 | PRJNA607668 | Bari, Italy | SCC | SCB |
| SNUC 2659 | Bos taurus bovine mastitis | GCA_003035875.1 | SAMN06172961 | PRJNA342349 | Quebec, Canada | SC | SCB |
| SNUC 5656 | Bos taurus bovine mastitis | GCA_003035785.1 | SAMN06172970 | PRJNA342349 | Quebec, Canada | SC | SCB |
| SNUC 5133 | Bos taurus bovine mastitis | GCA_003577875.1 | SAMN06172969 | PRJNA342349 | Quebec, Canada | SC | SCB |
| SNUC 1036 | Bos taurus bovine mastitis | GCA_003035975.1 | SAMN06172951 | PRJNA342349 | Ontario, Canada | SC | SCB |
| SNUC 1120 | Bos taurus bovine mastitis | GCA_003035965.1 | SAMN06172955 | PRJNA342349 | Quebec, Canada | SC | SCB |
| SNUC 969 | Bos taurus bovine mastitis | GCA_003039995.1 | SAMN06172950 | PRJNA342349 | Quebec, Canada | SC | SCB |
| SNUC 4643 | Bos taurus bovine mastitis | GCA_003577905.1 | SAMN06172967 | PRJNA342349 | Ontario, Canada | SC | SCB |
| SNUC 1071 | Bos taurus bovine mastitis | GCA_003578125.1 | SAMN06172953 | PRJNA342349 | Ontario, Canada | SC | SCB |
| SNUC 2129 | Bos taurus bovine mastitis | GCA_003039915.1 | SAMN06172958 | PRJNA342349 | Quebec, Canada | SC | SCB |
| SNUC 5124 | Bos taurus bovine mastitis | GCA_003035485.1 | SAMN06172968 | PRJNA342349 | Quebec, Canada | SC | SCB |
| SNUC 3213 | Bos taurus bovine mastitis | GCA_003577955.1 | SAMN06172962 | PRJNA342349 | Ontario, Canada | SC | SCB |
| SNUC 2486 | Bos taurus bovine mastitis | GCA_003035905.1 | SAMN06172960 | PRJNA342349 | Atlantic, Canada | SC | SCB |
| SNUC 3829 | Bos taurus bovine mastitis | GCA_003035865.1 | SAMN06172964 | PRJNA342349 | Ontario, Canada | SC | SCB |
| SNUC 4546 | Bos taurus bovine mastitis | GCA_003577915.1 | SAMN06172965 | PRJNA342349 | Ontario, Canada | SC | SCB |
| SNUC 4640 | Bos taurus bovine mastitis | GCA_003035505.1 | SAMN06172966 | PRJNA342349 | Ontario, Canada | SC | SCB |
| FDARGOS_538 | Human clinical isolate | GCA_003956025.1 | SAMN10163250 | PRJNA231221 | Missing | SC | SCC |
| SE4.4 | Rice seed | GCA_001876785.1 | SAMN03097241 | PRJNA263233 | India | SC | SCC |
| SE4.3 | Rice seed | GCA_001876755.1 | SAMN03097240 | PRJNA263231 | India | SC | SCC |
| ATCC 29974 | Human isolate | GCA_900240165.1 | SAMEA104410613 | PRJEB22856 | Liverpool, UK | SCC | SCC |
| SE4.5 | Rice seed | GCA_001876805.1 | SAMN03097242 | PRJNA263234 | India | SC | SCC |
| NCTC 11041 | Human skin | GCA_002902365.1 | SAMN06177162 | PRJNA339206 | London, UK | SCC | SCC |
| NCTC 11041 | Human skin | GCA_900458255.1 | SAMEA3871778 | PRJEB6403 | USA | SC | SCC |
| G22B2 | Human gall bladder | GCA_000981215.1 | SAMN03352186 | PRJNA275680 | Chandigarh, India | SCC | SCC |
| hu-01 | Human skin swab sample | GCA_000513495.2 | SAMN02388844 | PRJNA225658 | Hangzhou, China | SCC | SCC |
| AL1 | Soy sauce (plant food) | GCA_000292305.1 | SAMN02471867 | PRJNA171726 | Malaysia | SC | SCC |
| DE0361 | Environmental | GCA_007673385.1 | SAMN11792521 | PRJNA543692 | Durham, North Carolina, USA | SC | SCC |
| DE0431 | Environmental | GCA_007668065.1 | SAMN11792591 | PRJNA543692 | Durham, North Carolina, USA | SC | SCC |
| DE0071 | Environmental | GCA_007679825.1 | SAMN11792231 | PRJNA543692 | Durham, North Carolina, USA | SC | SCC |
| DE0552 | Environmental | GCA_007666185.1 | SAMN11792712 | PRJNA543692 | Durham, North Carolina, USA | SC | SCC |
| DE0325 | Environmental | GCA_008764065.1 | SAMN11792485 | PRJNA543692 | Durham, North Carolina, USA | SC | SCC |
| NBRC 109713 | Human skin | GCA_007992675.1 | SAMD00172682 | PRJDB1638 | Missing | SCC | SCC |
| DE0360 | Environmental | GCA_007673395.1 | SAMN11792520 | PRJNA543692 | Durham, North Carolina, USA | SC | SCC |
| YNSA55 | Human | GCA_005861955.1 | SAMN11775280 | PRJNA543691 | Yunnan, China | SC | SCC |
| H62 | Environmental (air) | GCA_001650645.1 | SAMN04591361 | PRJNA316869 | California, USA | SC | SCC |
| SE4.1 | Rice seed | GCA_001876705.1 | SAMN03097238 | PRJNA263229 | India | SC | SCC |
| SE4.2 | Rice seed | GCA_001876735.1 | SAMN03097239 | PRJNA263230 | India | SC | SCC |
| SE3.10 | Rice seed | GCA_001876725.1 | SAMN03097237 | PRJNA263228 | India | SC | SCC |
| FDARGOS_334 | Human peripheral blood | GCA_002984565.1 | SAMN06173347 | PRJNA231221 | Maryland, USA | SC | SU |
| 532 | Human catheter | GCA_000972575.1 | SAMN03449104 | PRJNA279286 | Nuevo Leon, Monterrey, Mexico | SCC | SU |
| 57 | Human blood | GCA_000972565.1 | SAMN03449103 | PRJNA279286 | Nuevo Leon, Monterrey, Mexico | SCU | SU |
| DSM 6718 = ATCC 49330 | Human skin | GCA_002902235.1 | SAMN05977987 | PRJNA339206 | Braunschweig, Germany | SCU | SU |
| SW120 | Canis lupus ear swab | GCA_001896245.1 | SAMN06043532 | PRJNA354224 | Ballarat, Australia | SCU | SU |
| RIT614 | Mobile phone (urban biome) | GCA_003725375.1 | SAMN10392928 | PRJNA504471 | USA | SC | SU |
| MF1844 | Poultry processing equipment | GCA_001651275.1 | SAMN04479463 | PRJNA311173 | Norway | SC | SU |
| 073AN | Goat perineal | GCA_900097955.1 | SAMEA3109313 | PRJEB2655 | Tanzania, Africa | SCU/SCC | SU |
| SNUC 156 | Bos taurus bovine mastitis | GCA_003035945.1 | SAMN06172949 | PRJNA342349 | Alberta, Canada | SC | SU |
| SNUC 1322 | Bos taurus bovine mastitis | GCA_003039955.1 | SAMN06172956 | PRJNA342349 | Ontario, Canada | SC | SU |
| SNUC 5 | Bos taurus bovine mastitis | GCA_003036005.1 | SAMN06172948 | PRJNA342349 | Alberta, Canada | SC | SU |
| SNUC 5710 | Bos taurus bovine mastitis | GCA_003035835.1 | SAMN06172971 | PRJNA342349 | Quebec, Canada | SC | SU |
| SNUC 1091 | Bos taurus bovine mastitis | GCA_003039975.1 | SAMN06172954 | PRJNA342349 | Ontario, Canada | SC | SU |
| SNUC 2341 | Bos taurus bovine mastitis | GCA_003035925.1 | SAMN06172959 | PRJNA342349 | Atlantic, Canada | SC | SU |
| SNUC 1638 | Bos taurus bovine mastitis | GCA_003039935.1 | SAMN06172957 | PRJNA342349 | Ontario, Canada | SC | SU |
| SNUC 3536 | Bos taurus bovine mastitis | GCA_003577935.1 | SAMN06172963 | PRJNA342349 | Ontario, Canada | SC | SU |
| DE0122 | Environmental | GCA_007679195.1 | SAMN11792282 | PRJNA543692 | Durham, North Carolina, USA | SC | SU |
| DE0450 | Environmental | GCA_007667835.1 | SAMN11792610 | PRJNA543692 | Durham, North Carolina, USA | SC | SU |
| DE0303 | Environmental | GCA_007674235.1 | SAMN11792463 | PRJNA543692 | Durham, North Carolina, USA | SC | SU |
| NBRC 109766 | Human skin | GCA_007992755.1 | SAMD00172686 | PRJDB6345 | Missing | SCU | SU |
| DE0536 | Environmental | GCA_007666405.1 | SAMN11792696 | PRJNA543692 | Durham, North Carolina, USA | SC | SU |
| DE0534 | Environmental | GCA_007666455.1 | SAMN11792694 | PRJNA543692 | Durham, North Carolina, USA | SC | SU |
| DE0524 | Environmental | GCA_007666625.1 | SAMN11792684 | PRJNA543692 | Durham, North Carolina, USA | SC | SU |
| DE0550 | Environmental | GCA_008763915.1 | SAMN11792710 | PRJNA543692 | Durham, North Carolina, USA | SC | SU |
| DE0506 | Environmental | GCA_007666955.1 | SAMN11792666 | PRJNA543692 | Durham, North Carolina, USA | SC | SU |
| SNUDS 2 | Duck brain | GCA_900240165.1 | SAMN06286343 | PRJNA369449 | Seoul, South Korea | SCU | SU |
| UBA2625 | Environmental (metal) | GCA_002359925.1 | SAMN06452948 | PRJNA348753 | New York City, USA | SC | SU |
| SNUC 1067 | Bos taurus bovine mastitis | GCA_003578165.1 | SAMN06172952 | PRJNA342349 | Ontario, Canada | SC | SU |

Metadata as reported in GenBank.

Colors were assigned according to the three cluster memberships as in figure 1, A1  = dark purple; A2 = light purple; B = green. Current ranks: SC, Staphylococcus cohnii; SCC, Staphylococcus cohnii subsp. cohnii; SCU, Staphylococcus cohnii subsp. urealyticus. Proposed ranks: SCB, Staphylococcus cohnii subsp. barensis; SCC, Staphylococcus cohnii subsp. cohnii; SU, Staphylococcus urealyticus. In bold, SC5 and type strains (ATCC 29974 and ATCC 49330).
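The host-range figure quoted in the Discussion for SCB (15 out of 16 strains from bovine mastitis) can be re-derived from Table 3 with a short tally. The sketch below hard-codes the strain names and isolation sources of the 16 rows whose proposed rank is SCB, exactly as listed in the table; the variable names are illustrative only.

```python
from collections import Counter

# (strain, isolation source) for the 16 isolates with proposed rank SCB in Table 3.
scb_isolates = [("SC5", "Biological Safety Cabinet")] + [
    (f"SNUC {n}", "Bos taurus bovine mastitis")
    for n in (2659, 5656, 5133, 1036, 1120, 969, 4643, 1071,
              2129, 5124, 3213, 2486, 3829, 4546, 4640)
]

# Count how many isolates come from each isolation source.
source_counts = Counter(source for _, source in scb_isolates)

for source, n in source_counts.most_common():
    print(f"{source}: {n}/{len(scb_isolates)}")
```

The same tally could be applied to the SCC and SU rows to compare host ranges, keeping in mind that the isolation sources are free-text GenBank metadata and may need harmonizing first.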

Consistent with the problems encountered in the taxonomic classification of our novel SCB isolate, we observed that the taxonomic classifications of several SC isolates deposited in GenBank are incomplete or discordant with our analyses. For example, strain 532 (Mendoza-Olazarán et al. 2017) is assigned to SCC in the NCBI BioSample but is likely a member of SCU. Furthermore, strain 073AN (NCBI draft genome accession number FMPF00000000.1), isolated in Tanzania (Africa) from a goat, is labeled as SCC in the associated BioSample and BioProject submissions (SAMEA3109313 and ERS576551) (INSDC, last update 2016-09-28, https://www.ncbi.nlm.nih.gov/biosample/SAMEA3109313/), although both the original publication (Seni et al. 2019) and our own analyses indicate that it is SCU. These misclassifications are likely consequences of problems in the initial classification of the isolates, arising from ambiguities in the delineation of the SC species complex. Consistent with this hypothesis, we observe that in many cases the classification of SC strains is limited to the species level. Misclassified sequences and errors in initial taxonomic classification and annotation can occur and be propagated, resulting in the unintentional misclassification of bacterial strains (Federhen 2015). Accordingly, we believe that the approach and criteria presented here may be of general interest and could be applied on a larger scale for the resolution of complex or conflicting taxonomic assignments.
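The kinds of labeling problems described above (species-level only, conflicting, or discordant ranks) can be flagged mechanically by comparing the Current Rank and Proposed Rank columns of Table 3. The following is a minimal sketch and not part of the original analysis. It assumes that the legacy label SCC is compatible with either proposed subspecies of SC (SCC or SCB) and that SCU is compatible with SU; this mapping, the function `label_status`, and the status names are illustrative assumptions. The example rows are taken from Table 3.

```python
# Assumed correspondence between legacy GenBank labels and proposed ranks.
COMPATIBLE = {"SCC": {"SCC", "SCB"}, "SCU": {"SU"}}


def label_status(current: str, proposed: str) -> str:
    """Classify a GenBank label relative to the proposed rank."""
    if "/" in current:
        # More than one legacy label is attached to the same isolate.
        return "conflicting"
    if current == "SC":
        # Classified at species level only, with no subspecies given.
        return "species-level only"
    if proposed in COMPATIBLE.get(current, set()):
        return "consistent"
    return "discordant"


# (strain, current rank, proposed rank), copied from Table 3.
rows = [
    ("532", "SCC", "SU"),
    ("57", "SCU", "SU"),
    ("073AN", "SCU/SCC", "SU"),
    ("FDARGOS_538", "SC", "SCC"),
    ("ATCC 29974", "SCC", "SCC"),
]

status = {strain: label_status(current, proposed) for strain, current, proposed in rows}
for strain, result in status.items():
    print(f"{strain}: {result}")
```

Under these assumptions, strain 532 is reported as discordant and strain 073AN as conflicting, in line with the two examples discussed in the text.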

Data Availability

The SC5 draft genome is available in NCBI under the accession number JAALCY000000000, BioSample accession number SAMN14142771, and BioProject ID PRJNA607668.

Supplementary Material

Supplementary tables S1–S2 and figures S1–S2 are available at Genome Biology and Evolution online.