Complete genome sequence of Salmonella enterica subspecies arizonae str. RKS2983.

Chun-Xiao Wang1, Song-Ling Zhu1, Xiao-Yu Wang1, Ye Feng2, Bailiang Li1, Yong-Guo Li3, Randal N Johnston4, Gui-Rong Liu1, Jin Zhou5, Shu-Lin Liu6.   

Abstract

Salmonella arizonae (also called Salmonella subgroup IIIa) is a Gram-negative, non-spore-forming, motile, rod-shaped, facultatively anaerobic bacterium. S. arizonae strain RKS2983 was isolated from a human in California, USA. S. arizonae lies somewhere between Salmonella subgroups I (human pathogens) and V (also called S. bongori; usually non-pathogenic to humans) and so is an ideal model organism for studies of bacterial evolution from non-human pathogen to human pathogens. We hence sequenced the genome of RKS2983 for clues of genomic events that might have led to the divergence and speciation of Salmonella into distinct lineages with diverse host ranges and pathogenic features. The 4,574,836 bp complete genome contains 4,203 protein-coding genes, 82 tRNA genes and 7 rRNA operons. This genome contains several characteristics not reported to date in Salmonella subgroup I or V and may provide information about the genetic divergence of Salmonella pathogens.

Keywords:  Facultative anaerobe; Genomic evolution; Host-adapted; S. enterica subspecies arizonae RKS2983; Salmonella pathogenicity islands

Year:  2015        PMID: 26203341      PMCID: PMC4511000          DOI: 10.1186/s40793-015-0015-z

Source DB:  PubMed          Journal:  Stand Genomic Sci        ISSN: 1944-3277


Introduction

Salmonella are Gram-negative facultative anaerobic bacteria of the family Enterobacteriaceae inhabiting the gastrointestinal tract of a wide variety of animals. There are currently over 2,600 serotypes (also called serovars) documented in the genus Salmonella. Based on chromosomal DNA hybridization experiments and MLEE, Salmonella are currently classified into two species, S. enterica and S. bongori (formerly subgroup V). The species S. enterica is further divided into six subspecies, enterica, salamae, arizonae, diarizonae, houtenae, and indica, corresponding to the former subgroups I, II, IIIa, IIIb, IV and VI, respectively. Additionally, subgroup VII was described by Boyd et al. [1],[2]. Salmonella taxonomy is a dynamic field of research and many issues remain unsolved, especially regarding species definition [3]-[5]. To avoid confusion, we therefore use the traditional classification system and the terms subgroup and serotype rather than subspecies or serovar (see the more detailed explanation in [5]). Most Salmonella infections in warm-blooded animals are caused by subgroup I serotypes, whereas non-subgroup I serotypes are typically associated with cold-blooded vertebrates and rarely colonize the intestines of warm-blooded animals. Salmonella evolved from a common ancestor with Escherichia coli about 120–150 million years ago [6],[7]. During the evolutionary process, several key genomic events might have led the bacteria to diverge, such as gene mutation and gene acquisition or loss [8]. Importantly, numerous lines of evidence have indicated that gene acquisition and loss are the major force driving the evolution of virulence in Salmonella [9]. In fact, it has been postulated that the evolution of Salmonella-specific virulence can be divided into three phases. The first phase is the split of Salmonella and E. coli by the acquisition of Salmonella pathogenicity island 1 (SPI-1), which is present in all lineages of Salmonella but absent from E. coli. SPI-1 encodes virulence factors that strengthen the infection of Salmonella serotypes by different mechanisms, including the invasiveness of the bacteria into intestinal epithelial cells [10], induction of neutrophil recruitment, and secretion of intestinal fluid [11]-[13].
The second phase is the divergence of Salmonella into S. bongori and S. enterica; the latter, pathogenic lineage acquired SPI-2 [14]-[17], which contains genes encoding a type III secretion system that is required for survival in macrophages [18]. The third phase is the adaptation of subgroup I to warm-blooded animals, but the key genomic events involved remain unknown. Genome sequencing efforts in Salmonella have mostly focused on subgroup I serotypes, largely due to their pathogenicity in humans. In this study, we sequenced the genome of a strain from subgroup IIIa (also known as S. arizonae), which lies between subgroups I and V in evolution. Based on the important evolutionary position of subgroup IIIa, we anticipated that its genomic comparisons with other subgroups, especially subgroups I and V, may provide novel insights into the evolutionary transition of Salmonella adaptation from cold- to warm-blooded hosts.

Organism information

Classification and features

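S. arizonae is classified in Class Gammaproteobacteria, Order "Enterobacteriales", Family Enterobacteriaceae and Genus Salmonella (Table 1). S. arizonae was first described in 1939 under the name Salmonella dar es salaam and was categorized as Salmonella subgroup IIIa, later named Salmonella enterica subsp. arizonae [19]. S. arizonae is a rare cause of human infection and is naturally found in reptiles.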
Table 1

Classification and general features of RKS2983

| MIGS ID | Property | Term | Evidence code (a) |
|  | Current classification | Domain Bacteria | TAS [34] |
|  |  | Phylum Proteobacteria | TAS [35] |
|  |  | Class Gammaproteobacteria | TAS [35],[36] |
|  |  | Order "Enterobacteriales" | TAS [37]-[39] |
|  |  | Family Enterobacteriaceae | TAS [39],[40] |
|  |  | Genus Salmonella | TAS [40]-[41] |
|  |  | Species Salmonella enterica | TAS [41],[42] |
|  |  | Subspecies Salmonella enterica subsp. arizonae | TAS [42] |
|  |  | Strain RKS2983 | TAS [42] |
|  |  | Serovar 62:z36:- | TAS [42] |
|  | Gram stain | Negative | IDA |
|  | Cell shape | Rod-shaped | IDA |
|  | Motility | Motile | IDA |
|  | Sporulation | Non-sporulating | IDA |
|  | Temperature range | Mesophilic | IDA |
|  | Optimum temperature | 35°C–37°C | IDA |
|  | pH | 7.2–7.6 | IDA |
|  | Carbon source | Glucose | IDA |
| MIGS-6 | Habitat | Human | TAS [42] |
| MIGS-6.3 | Salinity | Medium | IDA |
| MIGS-22 | Oxygen requirement | Facultative anaerobe | IDA |
| MIGS-15 | Biotic relationship | Endophyte | IDA |
| MIGS-14 | Pathogenicity | Pathogenic | IDA |
| MIGS-4 | Geographic location | California, USA | TAS [42] |
| MIGS-5 | Sample collection time | 1985 | TAS [42] |
| MIGS-4.1 | Latitude | Not reported | NAS |
| MIGS-4.2 | Longitude | Not reported | NAS |
| MIGS-4.3 | Depth | Not reported | NAS |
| MIGS-4.4 | Altitude | Not reported | NAS |

a.) Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [43].

We obtained RKS2983 from the Salmonella Genetic Stock Center (SGSC) as one of the strains in the Salmonella Reference Collection C (SARC6) [2]; it was initially isolated from a human in California in 1985. Like other Salmonella bacteria, it is Gram-negative, with a diameter of around 0.7 to 1.5 μm and a length of 2 to 5 μm, facultatively anaerobic, non-spore-forming, and predominantly motile with peritrichous flagella. The bacteria were grown at 37°C in Luria broth at pH 7.2-7.6. Detailed information on the strain can be found at SGSC [20].

Genome sequencing information

Genome project history

This complete genome project was deposited in the Genomes On-Line Database (GOLD) and the complete genome sequence of strain RKS2983 was deposited at DDBJ/EMBL/GenBank under the accession CP006693.1. Table 2 presents the project information and its association with MIGS version 2.0 [21].
Table 2

Project information

| MIGS ID | Property | Term |
| MIGS-31 | Finishing quality | Finished |
| MIGS-28 | Libraries used | Illumina paired-end library and SOLiD mate-pair library (2 × 50 bp) |
| MIGS-29 | Sequencing platforms | Illumina HiSeq 2000 and SOLiD 3.0 |
| MIGS-31.2 | Fold coverage | 100 × |
| MIGS-30 | Assemblers | SOAPdenovo v1.05 |
| MIGS-32 | Gene calling method | Glimmer, as used in the RAST pipeline |
|  | GenBank ID | CP006693.1 |
|  | GenBank date of release | September 22, 2014 |
|  | GOLD ID | GI686507741 |
|  | BIOPROJECT | PRJNA215272 |
| MIGS-13 | Source material identifier | CDC 409–85 |
|  | Project relevance | Evolution in bacteria |

Growth conditions and DNA isolation

RKS2983 was cultured to mid-logarithmic phase in 50 ml of Luria Broth on a gyratory shaker at 37°C. DNA was isolated from the cells using a CTAB bacterial genomic DNA isolation method [22].

Genome sequencing and assembly

The genome of RKS2983 was sequenced on two platforms, SOLiD 3.0 and Illumina HiSeq 2000. First, genomic DNA was sequenced on the Illumina platform using the paired-end strategy (2 × 100 bp); details of library construction and sequencing can be found at the Illumina web site [23]. The Illumina HiSeq 2000 reads were assembled with SOAPdenovo v1.05, and the assembly contained 103 scaffolds with a total size of 4.5 Mb. Then, genomic DNA was sheared into 3 kb fragments with the Hydroshear instrument and sequenced on a SOLiD sequencer using the mate-pair strategy (2 × 50 bp) according to the instrument manual (Applied Biosystems). The two data sets were then assembled together with Velvet v1.2.09, and the final assembly contained 20 scaffolds. Gaps between contigs were closed by PCR amplification and sequencing of the products on an ABI3730 sequencer.

Genome annotation

Genes were predicted by Rapid Annotation using Subsystem Technology (RAST) [24] with Glimmer 3 [25], followed by manual curation. The predicted coding sequences (CDSs) were translated and searched against the National Center for Biotechnology Information non-redundant database and the Clusters of Orthologous Groups (COG) database. These data sources were combined to assign a product description to each predicted protein. We then compared the predicted genes with the annotated genes from four available genomes: S. typhi Ty2, S. typhimurium LT2 (AE006468) [26], S. arizonae RKS2980 (CP000880) [27] and S. bongori NCTC 12419 (NC_015761) [17]. Non-coding genes and miscellaneous features were predicted using tRNAscan-SE [28], RNAmmer [29], Rfam [30] and TMHMM [31].

Genome properties

The genome (Figure 1) consists of a chromosome of 4,574,836 bp (51.5% GC content) with 4,390 genes predicted, including 4,203 protein-coding genes, 22 rRNA genes, 82 tRNA genes and 98 pseudogenes. The properties and the statistics of the genome are summarized in Tables 3 and 4.
Figure 1

Graphical circular map of the S. arizonae RKS2983 genome. From the outside to the center: genes on forward strand (color by COG categories), genes on reverse strand (color by COG categories), GC content, and GC skew. The map was generated with the CGviewer software.

Table 3

Nucleotide content and gene count levels of the genome

| Attribute | Value | % of total (a) |
| Genome size (bp) | 4,574,836 |  |
| G + C content (bp) | 2,356,040 | 51.50 |
| Coding region (bp) | 3,924,843 | 85.79 |
| Total genes | 4,390 |  |
| rRNA genes | 22 | 0.50 |
| tRNA genes | 82 | 1.87 |
| Protein-coding genes | 4,203 | 95.70 |
| Pseudogenes | 98 | 2.23 |
| Frameshifted genes | 78 | 1.78 |
| Genes assigned to COGs | 3,383 | 77.06 |

a.) The total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome.
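The percentages in Table 3 can be recomputed from the raw counts. The following is a minimal sketch, not part of the original analysis; it assumes that base-pair rows are divided by the genome size and that gene-count rows are divided by the total gene count of 4,390, which is the denominator the tabulated values appear to be consistent with. The helper name `percent` is hypothetical.

```python
# Values copied from Table 3.
GENOME_SIZE_BP = 4_574_836
TOTAL_GENES = 4_390


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Base-pair rows: denominator is the genome size.
gc_percent = percent(2_356_040, GENOME_SIZE_BP)
coding_percent = percent(3_924_843, GENOME_SIZE_BP)

# Gene-count rows: denominator assumed to be the total gene count.
trna_percent = percent(82, TOTAL_GENES)
cog_percent = percent(3_383, TOTAL_GENES)

print(gc_percent, coding_percent, trna_percent, cog_percent)
```

Under these assumptions the four values agree with the G + C content, coding region, tRNA and COG rows of the table.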

Table 4

Number of genes associated with the 25 general COG functional categories

| Code | Value | % of total (a) | Description |
| J | 168 | 3.83 | Translation, ribosomal structure and biogenesis |
| A | 1 | 0.02 | RNA processing and modification |
| K | 0 | 0.00 | Transcription |
| L | 216 | 4.92 | Replication, recombination and repair |
| B | 0 | 0.00 | Chromatin structure and dynamics |
| D | 32 | 0.73 | Cell cycle control, mitosis and meiosis |
| Y | 0 | 0.00 | Nuclear structure |
| V | 44 | 1.00 | Defense mechanisms |
| T | 106 | 2.41 | Signal transduction mechanisms |
| M | 223 | 5.08 | Cell wall/membrane biogenesis |
| N | 89 | 2.03 | Cell motility |
| Z | 0 | 0.00 | Cytoskeleton |
| W | 0 | 0.00 | Extracellular structures |
| U | 44 | 1.00 | Intracellular trafficking and secretion |
| O | 137 | 3.12 | Posttranslational modification, protein turnover, chaperones |
| C | 240 | 5.47 | Energy production and conversion |
| G | 307 | 6.99 | Carbohydrate transport and metabolism |
| E | 314 | 7.15 | Amino acid transport and metabolism |
| F | 76 | 1.73 | Nucleotide transport and metabolism |
| H | 142 | 3.23 | Coenzyme transport and metabolism |
| I | 88 | 2.00 | Lipid transport and metabolism |
| P | 182 | 4.15 | Inorganic ion transport and metabolism |
| Q | 53 | 1.21 | Secondary metabolites biosynthesis, transport and catabolism |
| R | 311 | 7.08 | General function prediction only |
| S | 348 | 7.93 | Function unknown |
| - | 1007 | 22.94 | Not in COGs |

a.) The total is based on the total number of protein coding genes in the annotated genome.


Insights from the genome sequence

We first looked into the genetic relatedness of S. arizonae and other Salmonella lineages. For this, we identified the 945 genes common to the 25 sequenced Salmonella strains analyzed in this study by BLAST comparison, categorizing genes as common when they showed >70% DNA identity and a gene length ratio >0.7, and concatenated them. The multiple sequence alignment program MAFFT [32] was used to align the gene sequences of the S. arizonae and other Salmonella strains. Phylogenetic trees were constructed from the aligned gene sequences with MEGA 4.0 [33], using the Neighbor-Joining method with 1,000 bootstrap replicates. The tree showed that S. arizonae RKS2983 was positioned between subgroup I and S. bongori, and that all subgroup I strains clustered together (Figure 2).
Figure 2

Phylogenetic tree highlighting the position of S. arizonae RKS2983 (shown in bold) relative to strains of other Salmonella lineages. The corresponding GenBank accession numbers are displayed in parentheses. The tree was built based on the comparison of concatenated nucleotide sequences of 945 conserved genes in all strains. Individual orthologous sequences were aligned by the MAFFT program [32] and concatenated. The phylogenetic tree was constructed by using the MEGA 4.0 software [33] with Neighbor-Joining method. The bootstrap values are shown at branch points.
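The common-gene criterion used for this tree (>70% DNA identity and a gene length ratio >0.7) can be written as a simple filter over pairwise BLAST hits. This is a minimal sketch under stated assumptions: the `Hit` record and the function names are hypothetical, and the length ratio is assumed to be the shorter gene length divided by the longer one, which the text does not define explicitly.

```python
from dataclasses import dataclass

MIN_IDENTITY = 70.0      # percent DNA identity threshold from the text
MIN_LENGTH_RATIO = 0.7   # gene length ratio threshold from the text


@dataclass
class Hit:
    """One pairwise BLAST hit between two genes (hypothetical record)."""
    query_len: int      # length of the query gene in bp
    subject_len: int    # length of the subject gene in bp
    identity: float     # percent DNA identity, 0-100


def length_ratio(hit: Hit) -> float:
    # Assumption: ratio of the shorter gene to the longer gene.
    shorter = min(hit.query_len, hit.subject_len)
    longer = max(hit.query_len, hit.subject_len)
    return shorter / longer


def is_common(hit: Hit) -> bool:
    """Apply both thresholds as strict inequalities, as written in the text."""
    return hit.identity > MIN_IDENTITY and length_ratio(hit) > MIN_LENGTH_RATIO


examples = [Hit(1000, 900, 85.0), Hit(1000, 600, 95.0), Hit(1000, 1000, 70.0)]
print([is_common(h) for h in examples])
```

A gene would then count as common to all strains only if a passing hit exists in every genome compared; that bookkeeping is left out of the sketch.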

The core gene data of S. arizonae RKS2983, S. bongori NCTC 12419 and S. typhimurium LT2 (representing subgroup I) are presented in Figure 3. There are 2,823 genes common to all three genomes and 926 genes specific to RKS2983. SPI-2 is in the set of 516 genes common to RKS2983 and LT2 and is absent from NCTC 12419. As many as 1,017 genes are in LT2 but not in the other two genomes; we postulate that some of these genes may be associated with virulence to warm-blooded hosts.
Figure 3

Venn diagram showing the core genes in S. arizonae RKS2983, S. bongori NCTC 12419 and S. typhimurium LT2. The core genes were identified using BLAST with the parameters set at “>70% DNA identity and >0.7 gene length ratio”.
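The regions of a three-genome Venn diagram such as Figure 3 follow from plain set operations once each genome is reduced to a set of gene-family identifiers. The sketch below is illustrative only: the function name `venn3` and the toy identifiers `g1` to `g7` are hypothetical and do not correspond to real genes or to the counts reported in the text.

```python
def venn3(a: set, b: set, c: set) -> dict:
    """Split three gene sets into the seven regions of a Venn diagram."""
    return {
        "core": a & b & c,          # shared by all three genomes
        "a_b_only": (a & b) - c,    # shared by a and b, absent from c
        "a_c_only": (a & c) - b,    # shared by a and c, absent from b
        "b_c_only": (b & c) - a,    # shared by b and c, absent from a
        "a_only": a - b - c,        # specific to a
        "b_only": b - a - c,        # specific to b
        "c_only": c - a - b,        # specific to c
    }


# Toy gene-family identifiers, for illustration only.
rks2983 = {"g1", "g2", "g3", "g4"}
nctc12419 = {"g1", "g2", "g5"}
lt2 = {"g1", "g3", "g6", "g7"}

parts = venn3(rks2983, nctc12419, lt2)
for region, genes in parts.items():
    print(region, sorted(genes))
```

In this layout, a gene set such as SPI-2 (present in RKS2983 and LT2, absent from NCTC 12419) would fall in the `a_c_only` region.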

We compared these genomes for the presence or absence of Salmonella pathogenicity islands (SPIs) and found that RKS2983 shared some of the SPIs with NCTC 12419 and others with LT2 or Ty2 (Table 5), providing opportunities for evolutionary studies on the acquisition of SPIs during the transition of Salmonella from cold- to warm-blooded animal pathogens.
Table 5

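Distribution of known SPIs in four representative genomes of the genus Salmonella

| Genomic island | S. bongori 12419 | S. arizonae RKS2983 | S. typhimurium LT2 | S. typhi Ty2 |
| SPI-1 | + | + | + | + |
| SPI-2 | - | + | + | + |
| SPI-3 | + | - | + | + |
| SPI-4 | + | + | + | + |
| SPI-5 | + | + | + | + |
| SPI-6 | - | - | + | + |
| SPI-7 | - | - | - | + |
| SPI-8 | - | - | - | + |
| SPI-9 | + | + | + | + |
| SPI-10 | - | - | - | + |
| SPI-11 | + | + | + | + |
| SPI-12 | - | - | + | + |
| SPI-13 | + | + | + | - |
| SPI-14 | - | + | + | - |
| SPI-15 | - | - | - | + |
| SPI-16 | - | - | + | + |
| SPI-17 | - | - | - | + |
| SPI-18 | - | - | - | + |
| SPI-19 | - | - | - | - |
| SPI-20 | + | + | - | - |
| SPI-21 | + | + | - | - |
| SPI-22 | - | - | - | - |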

+ means SPI is present in the serotype.

- means SPI is absent in the serotype.
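The presence/absence pattern in Table 5 lends itself to simple queries, for example which SPIs RKS2983 shares with one lineage but not another. The sketch below transcribes the table into a dictionary; the names `SPI_PATTERN` and `spis_matching` are hypothetical, and the code only restates the table rather than reproducing the authors' analysis.

```python
GENOMES = ("S. bongori 12419", "S. arizonae RKS2983",
           "S. typhimurium LT2", "S. typhi Ty2")

# Presence (+) / absence (-) per genome, in GENOMES order, from Table 5.
SPI_PATTERN = {
    1: "++++", 2: "-+++", 3: "+-++", 4: "++++", 5: "++++", 6: "--++",
    7: "---+", 8: "---+", 9: "++++", 10: "---+", 11: "++++", 12: "--++",
    13: "+++-", 14: "-++-", 15: "---+", 16: "--++", 17: "---+", 18: "---+",
    19: "----", 20: "++--", 21: "++--", 22: "----",
}


def present(spi: int, genome: str) -> bool:
    """True if the given SPI is marked '+' for the given genome."""
    return SPI_PATTERN[spi][GENOMES.index(genome)] == "+"


def spis_matching(must_have, must_lack):
    """SPI numbers present in every genome of must_have and absent from every genome of must_lack."""
    return [
        spi for spi in sorted(SPI_PATTERN)
        if all(present(spi, g) for g in must_have)
        and not any(present(spi, g) for g in must_lack)
    ]


gained_after_bongori = spis_matching(
    ["S. arizonae RKS2983", "S. typhimurium LT2"], ["S. bongori 12419"])
shared_with_bongori_only = spis_matching(
    ["S. bongori 12419", "S. arizonae RKS2983"],
    ["S. typhimurium LT2", "S. typhi Ty2"])
print(gained_after_bongori, shared_with_bongori_only)
```

Read this way, the table places SPI-2 and SPI-14 in RKS2983 and LT2 but not in S. bongori, while SPI-20 and SPI-21 are shared by S. bongori and RKS2983 only.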


Conclusions

S. arizonae is phylogenetically positioned between S. bongori and subgroup I and shares some pathogenicity-associated genes with S. bongori and others with subgroup I lineages. Therefore, genome analyses of S. arizonae may provide important clues to key genomic events that might have facilitated the evolution of warm-blooded animal pathogens from cold-blooded parasites.

Abbreviations

SARC: Salmonella reference collection C
CTAB: Cetyl trimethyl ammonium bromide
MLEE: Multilocus enzyme electrophoresis

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

CXW carried out the genome sequence analysis and drafted the manuscript. SLZ and BL participated in genome sequence analysis. XYW participated in PCR amplification and sequencing of the PCR products by ABI3130 sequencer. RJ, JZ and GRL participated in the study design and provided reagents for the project. YGL and JZ provided the SOLiD and ABI Sequencing platform. SLL conceived the study and finalized the manuscript. All authors read and approved the final manuscript.
References (10 of 34 shown)

A Krogh; B Larsson; G von Heijne; E L Sonnhammer. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol. 2001-01-19.

Sam Griffiths-Jones; Alex Bateman; Mhairi Marshall; Ajay Khanna; Sean R Eddy. Rfam: an RNA family database. Nucleic Acids Res. 2003-01-01.

M Hensel; J E Shea; S R Waterman; R Mundy; T Nikolaus; G Banks; A Vazquez-Torres; C Gleeson; F C Fang; D W Holden. Genes encoding putative effector proteins of the type III secretion system of Salmonella pathogenicity island 2 are required for bacterial virulence and proliferation in macrophages. Mol Microbiol. 1998-10.

A J Bäumler; R M Tsolis; T A Ficht; L G Adams. Evolution of host adaptation in Salmonella enterica (Review). Infect Immun. 1998-10.

T M Lowe; S R Eddy. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997-03-01.

R F Doolittle; D F Feng; S Tsang; G Cho; E Little. Determining divergence times of the major kingdoms of living organisms with a protein clock. Science. 1996-01-26.

M Hensel; J E Shea; A J Bäumler; C Gleeson; F Blattner; D W Holden. Analysis of the boundaries of Salmonella pathogenicity island 2 and the corresponding chromosomal region of Escherichia coli K-12. J Bacteriol. 1997-02.

J G Lawrence; H Ochman. Molecular archaeology of the Escherichia coli genome. Proc Natl Acad Sci U S A. 1998-08-04.

Steffen Porwollik; Michael McClelland. Lateral gene transfer in Salmonella (Review). Microbes Infect. 2003-09.

D M Cirillo; R H Valdivia; D M Monack; S Falkow. Macrophage-dependent induction of the Salmonella pathogenicity island 2 type III secretion system and its role in intracellular survival. Mol Microbiol. 1998-10.