Francisco Salvà-Serra, Daniel Jaén-Luchoro, Hedvig E Jakobsson, Lucia Gonzales-Siles, Roger Karlsson, Antonio Busquets, Margarita Gomila, Antoni Bennasar-Figueras, Julie E Russell, Mohammed Abbas Fazal, Sarah Alexander, Edward R B Moore.
Abstract
We present the first complete, closed genome sequences of Streptococcus pyogenes NCTC 8198T and CCUG 4207T, the type strain of the type species of the genus Streptococcus. S. pyogenes is an important human pathogen that causes a wide range of infectious diseases. S. pyogenes NCTC 8198T and CCUG 4207T derive from deposits of the same strain in two different culture collections. NCTC 8198T was sequenced using a PacBio platform, and its genome sequence was assembled de novo using HGAP. CCUG 4207T was sequenced using Illumina and Oxford Nanopore platforms, and a de novo hybrid assembly combining both sets of reads was generated using SPAdes. Both strategies yielded closed genome sequences of 1,914,862 bp, identical in both length and sequence. Combining short-read Illumina and long-read Oxford Nanopore data compensated for the expected error rate of nanopore sequencing, producing a genome sequence indistinguishable from the one determined with PacBio. Sequence analyses revealed five prophage regions, a CRISPR-Cas system, numerous virulence factors and no relevant antibiotic resistance genes. These two complete genome sequences of the type strain of S. pyogenes will serve as valuable taxonomic and genomic references for infectious disease diagnostics, as well as for future studies and applications within the genus Streptococcus.
Year: 2020 PMID: 32669560 PMCID: PMC7363880 DOI: 10.1038/s41598-020-68249-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Sequencing workflows. Illustration showing the origin and passage history of the strain, and the whole-genome sequencing workflows performed in parallel by the NCTC and the CCUG to determine the complete genome sequence of the type strain of S. pyogenes. Map created using the online server MapChart (www.mapchart.net).
Results of whole-genome sequencing of S. pyogenes NCTC 8198T and CCUG 4207T from the four sequencing runs performed on the PacBio RSII (two runs), Illumina HiSeq 2500 and Oxford Nanopore MinION platforms.
| Platform | PacBio RSII | PacBio RSII | Illumina HiSeq 2500 | Oxford Nanopore MinION |
|---|---|---|---|---|
| Run SRA accession number | ERR550482 | ERR550487 | SRR8631872 | SRR10092043 |
| Total number of reads | 163,469 | 163,274 | 10,260,942 | 135,020 |
| Total yield (Mb) | 420 | 694 | 1,293 | 1,122 |
| Sequencing depth | 219 X | 362 X | 675 X | 586 X |
| Average read length (bp) | 2,570 | 4,250 | 126 | 8,307 |
| Read length N50 (bp) | 4,555 | 9,809 | 126 | 14,598 |
| Longest read (bp) | 35,267 | 41,160 | 126 | 97,527 |
| Average Phred quality score | 2.7 | 3.8 | 35 | 11.5 |
The total number of reads, total yield (Mb), sequencing depth, average read length (bp), read length N50 (bp), longest read (bp) and average Phred quality score are shown for each sequencing run. The SRA accession number of each run is indicated.
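The depth row can be reproduced from the yield row by dividing the number of bases sequenced by the genome length, and a Phred score Q maps to a per-base error probability P = 10^(-Q/10). The sketch below assumes that depth formula (it reproduces all four reported values, but the text does not state how depth was computed). Converting an averaged Phred score gives only a rough error rate, because the mean of Q values is not the Q of the mean error probability.

```python
# Sketch: recompute the "Sequencing depth" row and interpret Phred scores.
# Assumption: depth = total yield / genome length; this formula is inferred
# because it reproduces the table, not taken from the authors' methods.

GENOME_LENGTH_BP = 1_914_862  # closed genome length reported for both assemblies

# Run accession -> total yield in Mb, as listed in the table
YIELD_MB = {
    "ERR550482": 420,      # PacBio RSII
    "ERR550487": 694,      # PacBio RSII
    "SRR8631872": 1_293,   # Illumina HiSeq 2500
    "SRR10092043": 1_122,  # Oxford Nanopore MinION
}


def sequencing_depth(yield_mb: float, genome_bp: int = GENOME_LENGTH_BP) -> int:
    """Average depth (X): bases sequenced divided by genome length, rounded."""
    return round(yield_mb * 1_000_000 / genome_bp)


def phred_to_error(q: float) -> float:
    """Per-base error probability implied by a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


for run, mb in YIELD_MB.items():
    print(f"{run}: {sequencing_depth(mb)} X")

# Illumina (Q35) vs. Oxford Nanopore (Q11.5) average quality
print(f"Q35   -> {phred_to_error(35):.5f}")    # about 0.03% error
print(f"Q11.5 -> {phred_to_error(11.5):.3f}")  # about 7% error
```

Run as-is, it prints 219, 362, 675 and 586 X, matching the table, and shows why the nanopore reads alone (roughly 7% error per base at Q11.5) benefit from polishing with Illumina data.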
General information and genomic features of S. pyogenes NCTC 8198T = CCUG 4207T.
| Section | Feature | NCTC 8198T | CCUG 4207T |
|---|---|---|---|
| General information | Organism | Streptococcus pyogenes | Streptococcus pyogenes |
| | Taxonomy ID | 1314 | 1314 |
| | Strain | S.F. 130T = NCTC 8198T = CCUG 4207T | S.F. 130T = NCTC 8198T = CCUG 4207T |
| | Collection date | ca. 1926 | ca. 1926 |
| | Host | Homo sapiens | Homo sapiens |
| | Host disease | Scarlet fever | Scarlet fever |
| | Geographic location | United Kingdom, Manchester | United Kingdom, Manchester |
| Assemblies | Sequencing platforms | PacBio RSII | Illumina HiSeq 2500 + Oxford Nanopore MinION |
| | Assembly method | HGAP version 3 | SPAdes version 3.11.0 |
| | Assembly coverage | 581 X | 186 X (Illumina) + 576 X (Oxford Nanopore) |
| | GenBank accession number | LN831034 | CP028841 |
| | Assembly accession number | GCA_002055535.1 | GCA_004028355.1 |
| | BioProject ID | NCTC_3000 (PRJEB6403) | TAILORED-Treatment (PRJNA302716) |
| | Finishing quality | Closed complete genome | Closed complete genome |
| | Number of contigs | 1 | 1 |
| | Number of N's | 0 | 0 |
| | Total length (bp) | 1,914,862 | 1,914,862 |
| | GC content (%) | 38.5 | 38.5 |
| RefSeq annotation | Annotation method | PGAP version 4.1 | PGAP version 4.7 |
| | Number of genes (total) | 2,007 | 2,009 |
| | Total coding sequences (CDSs) | 1,918 | 1,920 |
| | Protein-coding sequences | 1,866 | 1,860 |
| | Pseudogenes | 52 | 60 |
| | Number of RNA genes | 89 | 89 |
| | tRNA | 67 | 67 |
| | Non-coding RNA | 4 | 4 |
| | Ribosomal RNA | 18 (6 operons) | 18 (6 operons) |
| | Hypothetical proteins | 334 | 306 |
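The annotation counts in this table are nested, so their totals can be cross-checked. The sketch below assumes that the CDS total includes pseudogenes, which is what the reported numbers imply (1,866 + 52 = 1,918 and 1,860 + 60 = 1,920). The dictionary keys are hypothetical names chosen for illustration.

```python
# Sketch: check the internal consistency of the RefSeq (PGAP) feature counts.
# Assumption: "Total coding sequences (CDSs)" = protein-coding + pseudogenes.

ANNOTATIONS = {
    "NCTC 8198T (PGAP 4.1)": dict(genes=2_007, cds=1_918, protein=1_866,
                                  pseudo=52, rna=89, trna=67, ncrna=4, rrna=18),
    "CCUG 4207T (PGAP 4.7)": dict(genes=2_009, cds=1_920, protein=1_860,
                                  pseudo=60, rna=89, trna=67, ncrna=4, rrna=18),
}


def check_counts(c: dict) -> list[str]:
    """Return a list of inconsistencies (empty if all totals add up)."""
    problems = []
    if c["cds"] + c["rna"] != c["genes"]:
        problems.append("CDSs + RNA genes != total genes")
    if c["protein"] + c["pseudo"] != c["cds"]:
        problems.append("protein-coding + pseudogenes != total CDSs")
    if c["trna"] + c["ncrna"] + c["rrna"] != c["rna"]:
        problems.append("tRNA + ncRNA + rRNA != RNA genes")
    return problems


for name, counts in ANNOTATIONS.items():
    print(name, check_counts(counts) or "all totals consistent")
```

Both annotations pass all three checks, so the two-gene difference between them comes entirely from the CDS counts (two extra CDSs in the PGAP 4.7 annotation).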
Figure 2. Genome atlas of the type strain of Streptococcus pyogenes. The atlas was built with the online server GView from the genome sequence annotated with PGAP 4.7 and available in RefSeq (NZ_CP028841.1). Tracks, from outside to inside: backbone; CDSs and prophage regions (coloured in blue); GC content deviation (GC-rich towards the outside, GC-poor towards the inside); GC skew (excess of guanine over cytosine towards the outside, and vice versa); and CDSs coloured by COG category (where assigned).
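The two GC tracks in such an atlas are conventionally computed over windows along the sequence: GC content deviation is a window's GC fraction minus the genome-wide GC fraction, and GC skew is (G - C)/(G + C), positive where guanine exceeds cytosine. A minimal sketch, assuming non-overlapping windows of a hypothetical size (GView's actual window and step settings are not stated in the text):

```python
# Sketch: windowed GC-content deviation and GC skew, as plotted in genome atlases.
# Assumptions: non-overlapping windows; the window size is a hypothetical default.


def gc_tracks(seq: str, window: int = 10_000) -> list[tuple[int, float, float]]:
    """Return (window start, GC deviation, GC skew) for each window of seq."""
    seq = seq.upper()
    genome_gc = (seq.count("G") + seq.count("C")) / len(seq)
    tracks = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        deviation = (g + c) / len(chunk) - genome_gc  # > 0: GC-rich (outward)
        skew = (g - c) / (g + c) if g + c else 0.0    # > 0: G excess (outward)
        tracks.append((start, deviation, skew))
    return tracks


# Toy input: a GC-rich, G-biased window followed by an AT-only window
print(gc_tracks("GGGCATAT", window=4))  # [(0, 0.5, 0.5), (4, -0.5, 0.0)]
```

Atlas tools typically use overlapping (sliding) windows for smoother curves; non-overlapping windows keep the sketch short.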
Prophage regions identified by the software PHASTER.
| Region | Completeness | Start (bp) | End (bp) | Length (bp) | No. of CDSs | GC content (%) |
|---|---|---|---|---|---|---|
| SF130.1 | Questionable | 526,644 | 570,537 | 43,894 | 65 | 38.6 |
| SF130.2 | Intact | 818,527 | 860,412 | 41,886 | 59 | 38.3 |
| SF130.3 | Intact | 1,058,125 | 1,107,536 | 49,412 | 67 | 37.4 |
| SF130.4 | Intact | 1,216,850 | 1,259,402 | 42,553 | 60 | 37.7 |
| SF130.5 | Intact | 1,348,482 | 1,405,407 | 56,926 | 70 | 39.1 |
For each region, the estimated completeness, the start and end positions in the genome, the length, the number of CDSs and the GC content are indicated.
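The Length column follows from the Start and End columns when both coordinates are inclusive (length = end - start + 1), which holds for all five regions. The sketch below recomputes the lengths and, as a derived figure that the text itself does not state, the fraction of the chromosome covered by the predicted prophage regions:

```python
# Sketch: prophage region lengths from inclusive coordinates, and the share
# of the 1,914,862-bp chromosome they cover (derived arithmetic only).

GENOME_LENGTH_BP = 1_914_862

PROPHAGES = {  # region: (start, end), as listed in the table
    "SF130.1": (526_644, 570_537),
    "SF130.2": (818_527, 860_412),
    "SF130.3": (1_058_125, 1_107_536),
    "SF130.4": (1_216_850, 1_259_402),
    "SF130.5": (1_348_482, 1_405_407),
}


def region_length(start: int, end: int) -> int:
    """Both endpoints belong to the region, hence the + 1."""
    return end - start + 1


lengths = {name: region_length(*span) for name, span in PROPHAGES.items()}
total = sum(lengths.values())
print(lengths)
print(f"{total:,} bp of prophage = {100 * total / GENOME_LENGTH_BP:.1f}% of the genome")
```

By this arithmetic the five regions span 234,671 bp, about 12.3% of the chromosome.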
Figure 3. CRISPR-Cas system of the type strain of S. pyogenes. (A) Genomic architecture of the subtype I-C CRISPR-Cas system of S. pyogenes CCUG 4207T (= NCTC 8198T), formed by seven cas genes followed by a CRISPR array of six direct repeats (DR) and five spacers. The stepped line crossing the cas3 gene indicates a frameshift. (B) Hairpin structure formed by the consensus sequence of the direct repeats of the CRISPR array of S. pyogenes CCUG 4207T (= NCTC 8198T).
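In a CRISPR array, each spacer sits between two consecutive direct repeats, so an array with n repeats carries n - 1 spacers (six DRs and five spacers here). A toy sketch of recovering spacers by splitting an array on its repeat consensus follows. The repeat and spacer sequences are hypothetical placeholders, not the CCUG 4207T sequences, and real arrays often contain degenerate repeats that an exact string split would miss.

```python
# Toy sketch: recover spacers from a CRISPR array given its repeat consensus.
# REPEAT and SPACERS are HYPOTHETICAL sequences used only for illustration.

REPEAT = "GGGGTTTTGG"  # hypothetical 10-nt direct repeat
SPACERS = ["ACGTACGTAA", "TTGCATGCAA", "CCATGGATCC", "AAGGCCTTAA", "TGCATGCATA"]


def split_array(array: str, repeat: str) -> list[str]:
    """Return the sequences found between consecutive exact repeat copies."""
    parts = array.split(repeat)
    # An array that begins and ends with a repeat yields empty outer pieces;
    # the inner pieces are the spacers.
    return parts[1:-1]


# Build an array of the form DR-S1-DR-S2-DR-S3-DR-S4-DR-S5-DR
array = REPEAT + REPEAT.join(SPACERS) + REPEAT
found = split_array(array, REPEAT)
print(f"{array.count(REPEAT)} repeats, {len(found)} spacers")  # 6 repeats, 5 spacers
```

Dedicated CRISPR finders tolerate mismatches in the repeats; the exact split is only meant to show the repeat/spacer alternation.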