Voula Alexandraki, Maria Kazou, Jochen Blom, Bruno Pot, Effie Tsakalidou, Konstantinos Papadimitriou.
Abstract
Streptococcus thermophilus ACA-DC 2 is a newly sequenced strain isolated from traditional Greek yogurt. Among the 14 fully sequenced strains of S. thermophilus currently deposited in the NCBI database, the ACA-DC 2 strain has the smallest chromosome, containing 1,731,838 bp. The annotation of its genome revealed the presence of 1,850 genes, including 1,556 protein-coding genes, 70 RNA genes and 224 potential pseudogenes. The large number of pseudogenes, together with the absence of pathogenic features, suggests that strain ACA-DC 2 evolved through genome decay processes, most probably due to adaptation to the milk ecosystem. Analysis revealed the existence of one complete lactose-galactose operon, several proteolytic enzymes, one exopolysaccharide cluster, stress response genes and four putative antimicrobial peptides. Interestingly, one CRISPR-cas system and one orphan CRISPR, both carrying only one spacer, were predicted, indicating low activity or inactivation of the cas proteins. Nevertheless, four putative restriction-modification systems were identified that may compensate for any deficiencies of the CRISPR-cas system. Furthermore, whole genome phylogeny indicated three distinct clades within S. thermophilus. Comparative analysis among selected strains representative of each clade, including strain ACA-DC 2, revealed a high degree of conservation at the genomic scale, but also strain-specific regions. Unique genes and genomic islands of strain ACA-DC 2 contained a number of genes potentially acquired through horizontal gene transfer events that could be related to important technological properties for dairy starters. Our study suggests that strain ACA-DC 2 has genomic traits compatible with the production of fermented dairy foods.
Keywords: CRISPR; Extended genome report; Horizontal gene transfer; Streptococcus thermophilus; Stress genes; Yogurt
Year: 2017 PMID: 28163827 PMCID: PMC5282782 DOI: 10.1186/s40793-017-0227-5
Source DB: PubMed Journal: Stand Genomic Sci ISSN: 1944-3277
Fig. 1 Photomicrographs of S. thermophilus ACA-DC 2. The images were obtained with (a) optical microscopy at 1000x magnification of Gram-stained cells and (b) transmission electron microscopy of cells stained with 10% (w/v) PTA. The scale bar in (b) corresponds to 1 μm
Classification and general features of S. thermophilus strain ACA-DC 2 according to the MIGS recommendations [39]
| MIGS ID | Property | Term | Evidence code (a) |
|---|---|---|---|
| | Classification | Domain Bacteria | TAS [ |
| | | Phylum Firmicutes | TAS [ |
| | | Class Bacilli | TAS [ |
| | | Order Lactobacillales | TAS [ |
| | | Family Streptococcaceae | TAS [ |
| | | Genus Streptococcus | TAS [ |
| | | Species Streptococcus thermophilus | TAS [ |
| | | Strain: ACA-DC 2 | TAS (this study) |
| | Gram stain | Positive | IDA |
| | Cell shape | Coccus | IDA |
| | Motility | Non-motile | IDA |
| | Sporulation | Non-sporulating | NAS |
| | Temperature range | 30–50 °C | TAS [ |
| | Optimum temperature | 42 °C | TAS [ |
| | pH range; Optimum | 5–7; 6.5 | TAS [ |
| | Carbon source | Lactose; saccharose; D-glucose; galactose | IDA |
| MIGS-6 | Habitat | Yogurt | TAS [ |
| MIGS-6.3 | Salinity | 2% NaCl (w/v) | TAS [ |
| MIGS-22 | Oxygen requirement | Microaerophilic | TAS [ |
| MIGS-15 | Biotic relationship | Free-living | NAS |
| MIGS-14 | Pathogenicity | Non-pathogen | NAS |
| MIGS-4 | Geographic location | Greece | TAS [ |
| MIGS-5 | Sample collection | 1988 | NAS |
| MIGS-4.1 | Latitude | Unknown | |
| MIGS-4.2 | Longitude | Unknown | |
| MIGS-4.4 | Altitude | Unknown | |
(a) Evidence codes: IDA, inferred from direct assay; TAS, traceable author statement (i.e., a direct report exists in the literature); NAS, non-traceable author statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [54]
Fig. 2 Phylogenetic tree highlighting the position of S. thermophilus ACA-DC 2 relative to other Streptococcus species. The tree was constructed based on 16S rRNA gene sequences. GenBank accession numbers are presented in parentheses and type strains are indicated with a superscript T. Strains with a complete genome sequence are marked with an asterisk. 16S rRNA gene sequences were aligned using MUSCLE [55]. The phylogenetic tree was built by the Maximum Likelihood method within MEGA7 software [56] using the Tamura-Nei substitution model [57]. Lactococcus lactis subsp. lactis NCDO 604T served as the outgroup. Bootstrap values were derived from 1,000 replicates. The scale bar indicates an estimated 0.01 nucleotide changes per nucleotide position
Project information
| MIGS ID | Property | Term |
|---|---|---|
| MIGS-31 | Finishing quality | Finished |
| MIGS-28 | Libraries used | Illumina genomic Nextera XT library; |
| MIGS-29 | Sequencing platforms | Illumina HiSeq2500; PacBio RSII |
| MIGS-31.2 | Fold coverage | 636x |
| MIGS-30 | Assemblers | ABySS v1.5.1; BLASR; SSPACE v1.0; GapFiller v1.10 |
| MIGS-32 | Gene calling method | Prodigal; MetaGeneAnnotator; FGENESB |
| | Locus Tag | STACADC2 |
| | GenBank ID | LT604076 |
| | GenBank Date of Release | 29-Jul-2016 |
| | GOLD ID | NA |
| | BIOPROJECT | PRJEB14916 |
| MIGS-13 | Source Material Identifier | ACA-DC 2 |
| | Project relevance | Dairy isolate |
Genome statistics
| Attribute | Value | % of Total |
|---|---|---|
| Genome size (bp) | 1,731,838 | 100.00 |
| DNA coding (bp) | 1,356,670 | 78.34 |
| DNA G + C (bp) | 679,104 | 39.21 |
| DNA scaffolds | 1 | 100.00 |
| Total genes | 1,850 | 100.00 |
| Protein coding genes | 1,556 | 84.11 |
| RNA genes | 70 | 3.78 |
| Pseudo genes | 224 | 12.11 |
| Genes in internal clusters | NA | NA |
| Genes with function prediction | 1,182 | 63.89 |
| Genes assigned to COGs | 1,327 | 71.73 |
| Genes with Pfam domains | 1,318 | 71.24 |
| Genes with signal peptides | 127 | 6.86 |
| Genes with transmembrane helices | 339 | 18.32 |
| CRISPR repeats | 2 | |
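The "% of Total" column can be recomputed from the counts in the table. The following is a minimal sketch, assuming that base-pair attributes are expressed relative to the genome size and gene attributes relative to the total gene count; the variable and function names are illustrative and not taken from the original report.

```python
# Counts copied from the genome statistics table.
GENOME_SIZE_BP = 1_731_838
TOTAL_GENES = 1_850

# Attributes measured in base pairs (denominator: genome size).
bp_attributes = {
    "DNA coding (bp)": 1_356_670,
    "DNA G + C (bp)": 679_104,
}

# Attributes measured in genes (denominator: total genes).
gene_attributes = {
    "Protein coding genes": 1_556,
    "RNA genes": 70,
    "Pseudo genes": 224,
    "Genes with function prediction": 1_182,
    "Genes assigned to COGs": 1_327,
    "Genes with Pfam domains": 1_318,
    "Genes with signal peptides": 127,
    "Genes with transmembrane helices": 339,
}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Recompute every percentage in one dictionary.
percentages = {name: percent(n, GENOME_SIZE_BP) for name, n in bp_attributes.items()}
percentages.update({name: percent(n, TOTAL_GENES) for name, n in gene_attributes.items()})

for name, value in percentages.items():
    print(f"{name}: {value:.2f}%")
```

The three gene classes (protein coding, RNA and pseudogenes) add up to the 1,850 total genes, which is a quick consistency check on the table.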
Fig. 3 Circular map of S. thermophilus ACA-DC 2 genome features generated with the CGView tool. From periphery to center: protein coding genes on the forward strand colored by COG category assignment; genes on the forward strand; protein coding genes on the reverse strand colored by COG category assignment; genes on the reverse strand; GC content; GC skew; genome region in kbp
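The GC content and GC skew tracks in such a map are typically computed over sliding windows along the chromosome. The sketch below illustrates the standard definitions, GC content = (G + C) / window length and GC skew = (G − C) / (G + C). The window and step sizes and the toy sequence are assumptions for illustration; the settings actually used for the figure are not stated here.

```python
from typing import Iterator, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq: str) -> float:
    """(G - C) / (G + C); defined as 0.0 when the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def sliding_windows(seq: str, window: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) for each full-length window."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]


# Hypothetical toy sequence; a real analysis would read the chromosome
# (GenBank LT604076) from a FASTA file instead.
toy = "GGGCATGCCCATGGGCAT"
for start, sub in sliding_windows(toy, window=6, step=6):
    print(start, round(gc_content(sub), 2), round(gc_skew(sub), 2))
```

In bacterial chromosomes the sign of the GC skew commonly changes near the replication origin and terminus, which is why the track is useful on a circular map.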
Number of genes associated with general COG functional categories
| Code | Value | % of total | Description |
|---|---|---|---|
| J | 146 | 9.38 | Translation, ribosomal structure and biogenesis |
| A | 0 | 0.00 | RNA processing and modification |
| K | 89 | 5.72 | Transcription |
| L | 136 | 8.74 | Replication, recombination and repair |
| B | 0 | 0.00 | Chromatin structure and dynamics |
| D | 16 | 1.03 | Cell cycle control, cell division, chromosome partitioning |
| V | 39 | 2.51 | Defense mechanisms |
| T | 43 | 2.76 | Signal transduction mechanisms |
| M | 80 | 5.14 | Cell wall/membrane biogenesis |
| N | 3 | 0.19 | Cell motility |
| U | 20 | 1.29 | Intracellular trafficking and secretion |
| O | 55 | 3.53 | Posttranslational modification, protein turnover, chaperones |
| C | 40 | 2.57 | Energy production and conversion |
| G | 66 | 4.24 | Carbohydrate transport and metabolism |
| E | 160 | 10.28 | Amino acid transport and metabolism |
| F | 67 | 4.31 | Nucleotide transport and metabolism |
| H | 49 | 3.15 | Coenzyme transport and metabolism |
| I | 33 | 2.12 | Lipid transport and metabolism |
| P | 67 | 4.31 | Inorganic ion transport and metabolism |
| Q | 13 | 0.84 | Secondary metabolites biosynthesis, transport and catabolism |
| R | 63 | 4.05 | General function prediction only |
| S | 215 | 13.82 | Function unknown |
| - | 229 | 14.72 | Not in COGs |
Percentages are based on the total number of protein coding genes in the genome (1,556)
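The percentages in the COG table can be reproduced by dividing each count by the 1,556 protein coding genes. The sketch below is illustrative only; the dictionary layout and names are assumptions. It also shows that the per-category counts add up to more than the number of protein coding genes, which is consistent with some genes being assigned to more than one COG category.

```python
PROTEIN_CODING_GENES = 1_556

# Counts per COG category, copied from the table ("-" means not in COGs).
cog_counts = {
    "J": 146, "A": 0, "K": 89, "L": 136, "B": 0, "D": 16, "V": 39,
    "T": 43, "M": 80, "N": 3, "U": 20, "O": 55, "C": 40, "G": 66,
    "E": 160, "F": 67, "H": 49, "I": 33, "P": 67, "Q": 13, "R": 63,
    "S": 215, "-": 229,
}

# Percentage of protein coding genes in each category, two decimals.
cog_percent = {
    code: round(100.0 * count / PROTEIN_CODING_GENES, 2)
    for code, count in cog_counts.items()
}

# Sum over all rows, including the "not in COGs" row.
total_assignments = sum(cog_counts.values())

# Genes with at least one COG assignment.
genes_in_cogs = PROTEIN_CODING_GENES - cog_counts["-"]

print(cog_percent["E"], total_assignments, genes_in_cogs)
```

The value of `genes_in_cogs` agrees with the "Genes assigned to COGs" row of the genome statistics table.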
Fig. 4 Synteny plot of the CRISPR loci between S. thermophilus strains ACA-DC 2 and LMD-9. The synteny of the two regions was calculated by the KODON software. In both strains the cas genes are denoted in blue. Gene csm6 in strain ACA-DC 2 is a potential pseudogene and is denoted in yellow. The pyrD and pyrF genes, colored in beige, define the upstream and downstream limits of the CRISPR loci. Percentages displayed in the ribbon areas correspond to the % identity between the nucleotide sequences
Fig. 5 Comparative genomics of S. thermophilus strains. (a) Whole genome phylogeny of S. thermophilus strains with available complete genome sequences. The phylogenetic tree was calculated in EDGAR and is presented as a cladogram ignoring branch length. The strains S. salivarius NCTC 8616 and Lactococcus lactis subsp. cremoris MG1363 were used as outgroups. Colored boxes indicate the three distinct S. thermophilus branches identified. Strains in each colored branch designated in bold were used for further comparative analysis. (b) Whole genome alignments of S. thermophilus strains KLDS 3.1003, JIM 8232 and ACA-DC 2 using Circoletto. Red and blue ribbons correspond to regions of >98% and 80–98% identity, respectively. White regions correspond to strain-specific loci. (c) Venn diagram analysis of S. thermophilus strains KLDS 3.1003, JIM 8232 and ACA-DC 2, as implemented in EDGAR
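A three-strain Venn diagram of this kind partitions gene families into those shared by all strains (the core), those shared by two strains only, and those unique to one strain. The sketch below shows that partitioning with plain set operations. It is not EDGAR's implementation, and the gene family labels (`g1` to `g6`) are hypothetical placeholders rather than data from the study.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def venn_regions(gene_sets: Dict[str, Set[str]]) -> Dict[Tuple[str, ...], Set[str]]:
    """Map each combination of strains to the genes found in exactly those strains."""
    names = list(gene_sets)
    regions = {}
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            # Genes present in every strain of the combination...
            shared = set.intersection(*(gene_sets[n] for n in combo))
            # ...minus genes that also occur in any strain outside it.
            others = set().union(*(gene_sets[n] for n in names if n not in combo))
            regions[combo] = shared - others
    return regions


# Hypothetical gene family labels, for illustration only.
gene_sets = {
    "ACA-DC 2": {"g1", "g2", "g3", "g5"},
    "JIM 8232": {"g1", "g2", "g4"},
    "KLDS 3.1003": {"g1", "g3", "g4", "g6"},
}

regions = venn_regions(gene_sets)
core = regions[("ACA-DC 2", "JIM 8232", "KLDS 3.1003")]
unique_to_aca_dc_2 = regions[("ACA-DC 2",)]
print(sorted(core), sorted(unique_to_aca_dc_2))
```

Because every gene family falls into exactly one region, the region sizes sum to the size of the union of all sets, which is a convenient sanity check on any real ortholog table.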
Fig. 6 Additional genomic features of S. thermophilus ACA-DC 2. (a) Circular map of the S. thermophilus ACA-DC 2 genome as generated by IslandViewer 3. Highlighted regions correspond to genomic islands (GIs). GIs are colored within the circular map according to the prediction method used: five GIs in orange and eight GIs in blue were predicted with SIGI-HMM and IslandPath-DIMOB, respectively. Twelve integrated GIs are presented on the periphery in red. The black line plot represents the GC content (%) of the genomic sequence. (b) Distribution of genes in GIs and unique genes of S. thermophilus ACA-DC 2 into COG categories