Literature DB >> 28571573

Comparison of molecular serotyping approaches of Streptococcus agalactiae from genomic sequences.

Georgia Kapatai1, Darshana Patel2, Androulla Efstratiou2, Victoria J Chalker3.   

Abstract

BACKGROUND: Group B streptococcus (GBS) capsular polysaccharide is one of the major virulence factors underlying invasive GBS disease and a component of forthcoming vaccines. Serotype classification of GBS is based on the capsule polysaccharide of which ten variants are known to exist (Ia, Ib, II-IX). Current methods for GBS serotype assignment rely on latex agglutination or PCR while more recently a whole genome sequencing method was reported. In this study, three distinct algorithms for serotype assignment from genomic data were assessed using a panel of 790 clinical isolates.
METHODS: The first approach utilised the entire capsular locus coupled with a mapping methodology. The second approach continues from the first and utilised a SNP-based methodology across the conserved cpsD-G region to differentiate serotypes Ia-VII and IX. Finally the third approach used the variable cpsG -K region coupled with a mapping methodology. All three approaches were assessed for typeability (percentage of isolates assigned a serotype) and concordance to the latex agglutination methodology.
RESULTS: Following comparisons, the third approach using the variable cpsG-K region demonstrated the best performance with 99.9% typeability and 86.7% concordance. Overall, of the 105 discordant isolates, 71 were resolved following retesting of latex agglutination and whole genome sequencing, 20 failed to assign a serotype using latex agglutination and only 14 were found to be truly discordant on re-testing. Comparison of this final approach with the previously described assembly-based approach returned identical results.
CONCLUSIONS: These results demonstrated that molecular capsular typing using whole genome sequencing and a mapping-based approach is a viable alternative to the traditional, latex agglutination-based serotyping method and can be implemented in a public health microbiology setting.

Entities:  

Keywords:  Group B Streptococci; Serotyping; Streptococcus agalactiae; Whole genome sequencing

Mesh:

Year:  2017        PMID: 28571573      PMCID: PMC5455115          DOI: 10.1186/s12864-017-3820-5

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


Background

Streptococcus agalactiae (group B streptococcus, GBS) is a leading cause of neonatal sepsis and meningitis worldwide [1]. Increasingly GBS is also an important cause of infections in immunosuppressed adults and the elderly [2]. A rise in the incidence of disease has been noted across multiple countries [3]. This is of particular concern because GBS is associated with a high morbidity and mortality [4]. Although no GBS vaccine is currently available conjugate polysaccharide vaccines covering the most common serotypes are in development [5]. Serotype classification of GBS is based on the capsule polysaccharide of which ten variants are known to exist (Ia, Ib, II-IX). The prevalence and distribution of serotypes differ between geographical regions, ethnic populations and clinical presentations [6]. The serotypes also differ in their virulence potential. Serotype III for example is associated with a significant proportion of neonatal disease particularly late-onset disease which presents between 7 to 89 days of age. In addition, serotype III is strongly associated with neonatal meningitis cases. The majority of serotype III isolates belong to multi-locus sequence type 17 which is associated with poor outcome of disease [7]. Accurate assignment of serotypes is important particularly for assessing serotype distributions in vaccine coverage and post-vaccine surveillance studies. The capsular polysaccharide is encoded on the cps locus and is composed of 16-18 genes [8]. The cpsA to -F genes are located at one end distal to the cpsL, NeuB, −D, −A and –C genes at the other, and these genes are highly conserved across the ten serotypes. In the central region from cpsG to –K in serotypes Ia-VII and IX and from cpsR to -K for serotype VIII the presence of genes and/or the sequence similarity varies between the serotypes (Fig. 1).
Fig. 1

Comparison of the cps loci of all 10 serotypes (Ia, Ib, III-IX). The cps loci extracted from the reference strains where aligned using progressiveMauve and the gene regions were annotated using Artemis. The genes within the variable region cpsG-K are coloured in orange

Comparison of the cps loci of all 10 serotypes (Ia, Ib, III-IX). The cps loci extracted from the reference strains where aligned using progressiveMauve and the gene regions were annotated using Artemis. The genes within the variable region cpsG-K are coloured in orange Multiple phenotypic serotyping methods such as latex agglutination, enzyme-based immunoassays and flow cytometry experiments using anti-capsular monoclonal antibodies have been described for GBS [9-11]. These assays can have limited typeability, can be subjective and are not able to assign all isolates to a type resulting in a high number of non-typeable isolates. Genotypic methods such as PCR-based DNA hybridisation, real-time PCR and restriction fragment length polymorphism assays can identify genetic variants in the cps locus that can be used to assign isolates to a serotype [12-15]. With the continuous reduction in cost of whole-genome sequencing (WGS) and the rapid development of bioinformatic infrastructures to analyse and store the large amount of data generated, WGS can provide a feasible approach to perform GBS serotyping. A recent study has described an approach to successfully determine the GBS serotype from WGS data [16]. The approach used was based on differences in the variable region of the cps locus and was able to assign isolates to all ten serotypes with high concordance with previous typing methods. In this study we sought to develop, compare and validate multiple WGS-based serotyping assays for GBS to determine the optimal methodology for implementation into a public health reference microbiology laboratory.

Methods

GBS reference strains and clinical isolates

A panel of 10 GBS reference strains (serotypes Ia to IX; Additional file 1: Table S1) were available from the archives of the Public Health England (PHE) Reference Laboratory. The clinical isolates included in the study include isolates referred to the reference laboratory between April 2014 and December 2015 as well as retrospective isolates from 2010 for uncommon serotypes. In total, 790 isolates were used to compare and validate the WGS-based serotyping methods.

Serotyping using latex agglutination

Isolates were cultured onto Columbia agar plates supplemented with horse blood (Oxoid Ltd., Hampshire, UK) and incubated aerobically at 37 °C for 24 h. Serological classification based on capsular polysaccharide types Ia, Ib and II to IX was performed using latex agglutination according to the manufacturer’s recommendations (Statens Serum Institute, Copenhagen, Denmark).

DNA extraction and sequencing

Strains were pre-lysed in a lysis buffer composed of 2 mg lysozyme (Sigma-Aldrich, Dorset, UK), 120 U mutanolysin (Sigma-Aldrich), 400 μg RNase A (Qiagen, Manchester, UK), 20 μL proteinase K (>600 mAU/mL) and 15 μL of overnight culture. Cell suspensions were incubated for 1 h at 37 °C, 2 h at 56 °C followed by 1 h at 80 °C. DNA was extracted using the QIAsymphony SP system and the QIAsymphony DSP mini kit (Qiagen) according to the manufacturers’ recommendations. DNA concentrations were measured using the Quant-iT dsDNA Broad-Range Assay Kit (Life Technologies, Paisley, UK) and GloMaxR 96 Microplate Luminometer (Promega, Southampton, UK). For sequencing preparation, a Nextera XT DNA Library Preparation Kit (Illumina, San Diego, CA, USA) was used followed by sequencing using a HiSeq 2500 System (Illumina) and the 2 × 100-bp paired-end mode.

Bioinformatic processing

Casava 1.8.2 (Illumina) was used to deplex the samples and FASTQ reads were processed with Trimmomatic to remove bases from the trailing end that fall below a PHRED score of 30 [17]. K-mer identification software (https://github.com/phe-bioinformatics/kmerid) was used to compare the sequence reads with a panel of curated NCBI Reference Sequence Database genomes to identify the species. FASTQ reads from all sequences in this study (n = 790) were submitted to the European Nucleotide Archive (ENA) using the ena_submission tool (https://github.com/phe-bioinformatics/ena_submission) and can be found at the PHE Pathogens BioProject PRJEB18093 at ENA (http://www.ebi.ac.uk/ena/data/view/PRJEB18093; Additional file 2: Table S2).

Reference strain analysis

Ten GBS reference strains were sequenced and the genomic reads were assembled using SPAdes. The capsular locus sequences from all isolates were extracted using a BLAST query with the cps locus sequence of Streptococcus agalactiae strain A4 (accession number: DQ359707), a serotype V strain that covers the majority of the capsular genes [8, 18]. Artemis was used to annotate the capsular locus sequences [19]. Fastq reads (Accession numbers: ERS1462786 – ERS1462795) and annotated capsular locus sequences (Accession numbers: LT671983-LT671992; http://www.ebi.ac.uk/ena/data/view/LT671983-LT671992) for all reference strains are available in the PHE Pathogens BioProject PRJEB18093 at ENA (http://www.ebi.ac.uk/ena/data/view/PRJEB18093; Additional file 1: Table S1).

De novo assembly

Genomic reads were assembled using SPAdes (version 2.5.1) de novo assembly software [20] with the following parameters ‘spades.py --careful -1 strain.1.fastq.gz -2 strain.2.fastq -t 4 -k 33,55,77,85,93’. The resulting contigs.fasta file was converted into a BLAST database using blast + (version 2.2.27) [21] and queried using selected query sequence.

Sequence alignment

Capsular locus sequences were aligned using progressiveMauve [22] to visualise presence/absence of genes and define variable and conserved regions (Fig. 1). The cpsD-G conserved regions were also aligned using MEGA6 [23] to investigate single nucleotide polymorphisms (SNP).

Bioinformatic analysis for serotype prediction

Reads from each readset were mapped to a multi-fasta file containing the reference sequences using bowtie2 (version 2.1.0; following options used: --fr --minins 300 --maxins 1100 -k 99,999 -D 20 -R 3 -N 0 -L 20 -I S,1,0.50) [24]. A threshold coverage of >90% of the length of the sequence, minimum depth of 5 reads per bp and a mean depth of >20 reads over the entire length of the sequence was implemented during this step. SAMtools (version 0.1.19) [25] mpileup was used to detect non-reference positions (SNPs) in each readset. BAQ computation was disabled and anonymous read pairs were included in variant calling (option –B and –A). The mpileup file was parsed and SNPs were filtered based on base quality (>30) and number of reads (>5).

Results

Selecting SNPs for serotype identification

Serotype identification using PCR amplification followed by Sanger sequencing was previously described by Kong et al. for serotypes Ia to VII and later by Slotvet et al. to include serotype IX [26, 27]. These two publications identified 52 single nucleotide polymorphisms (SNP) and one repeat region within the conserved cpsD-G region and based upon their results were able to differentiate serotypes Ia to VII and IX. Serotype VIII was not included due to an inability to amplify the target region using the primers detailed in those studies. In this study, we examined the previously described SNPs using the capsular locus sequences from 10 reference strains. The capsular operon regions covering genes cpsD up to the 5′ end of cpsG for nine serotypes were aligned and all SNPs were recorded. Serotype VIII was excluded due to the lack of cpsF and cpsG genes; instead of the standard cpsD-cpsE-cpsF-cpsG-cpsH gene pattern, serotype VIII has cpsD-cpsE-cpsR-cpsH (Fig. 1). Previously published variants were compared to the results of the reference strain capsular locus alignment and 47 SNPs and the repeat region were concordant between the two sets. The SNPs at positions 138, 198, 249, 1173 and 1527 were not found in the respective reference strains (Additional file 3: Table S3). Furthermore, based on the alignment of the reference strains, 11 novel SNPs were identified that differentiated the nine serotypes. These were added to the 48 published variants and the new set of 59 variants was validated using clinical isolates. Isolates were prospectively sequenced as referred from hospital laboratories therefore, at this initial stage of the study, genomic data were available for only a limited number of isolates. In total, 28 isolates were included in the initial validation in addition to the 10 reference strains; 3-6 isolates were selected for each serotype, except serotypes VI and IX where only one clinical isolate had been sequenced and serotype VII where no representative clinical isolates had been received during that period (Additional file 4: Table S4). Following analysis, serotype-specific SNPs were confirmed in clinical isolates for all serotypes except serotype III which as previously described is subdivided into different subtypes [26]. To simplify the process, all SNPs present only in certain serotype III subtypes were removed (n = 14), as well as the repeat region, leaving a final set of 44 SNPs (Table 1).
Table 1

SNP sites used for identification of serotypes Ia-VII and XI in the conserved region, cpsD to 5' end of cpsG

Positions previously described [26, 27] are highlighted in green whereas SNPs identified in this study are highlighted in blue. SNPs highlighted in yellow are unique to a single serotype

SNP sites used for identification of serotypes Ia-VII and XI in the conserved region, cpsD to 5' end of cpsG Positions previously described [26, 27] are highlighted in green whereas SNPs identified in this study are highlighted in blue. SNPs highlighted in yellow are unique to a single serotype

Assessment of targets for molecular capsular typing

Typeability

Firstly, the entire capsular locus was used as a target; reads were mapped to the capsular loci extracted from the 10 reference strains and serotype was assigned based on mapping coverage along the length of the reference sequence (mapping coverage >90%). Using this approach, a single serotype was assigned in 163 cases (20.6%) whereas in one case the highest percentage mapping coverage observed was below 90% resulting in a ‘Failed’ tag and in the remaining 629 cases the method was not able to distinguish between two serotypes that both exhibited >90% coverage. This result demonstrated higher than 90% similarity between the capsular sequences of serotypes Ia and III (n = 535), IX and V (n = 71) and IX and VII (n = 17) (Additional file 5: Table S5). This was confirmed by aligning the two sequences within each pair using BLAST. The high similarity between Ia and III was also previously described by Chaffin et al. [28]. Contamination was suspected for the remaining four cases, each corresponding to different pairs, considering that the percent coverage seen using BLAST was <90%. The 167 cases where a serotype was assigned successfully, correspond mainly to five serotypes (Ib, n = 51; II, n = 68; IV, n = 29; VI, n = 10; VIII, n = 2), with the remaining two cases corresponding to samples assigned Ia and IX, respectively (Table 2). The serotypes corresponding to the respective pairs (Ia/III and IX/VII) were missed due to mapping coverage slightly below 90% (88.11% and 89.78%, respectively).
Table 2

Detailed summary of the results (typeability and concordance) of the three WGS methods

cpsG-KFull CPSSNP-basedGrant total
TRUEFALSENTTRUEFALSENTTRUEFALSENTn/aa
Latex agglutinationIa155164167151155179
Ib441643984314361
II474741648
III3361863483201618378
IV20120120122
IX94121093113
V5710760559370
VI7317473111
VII7777
VIII282268210
NT29121727230
Grant total68410511204262865396392829

TRUE corresponds to concordant with latex agglutination results, FALSE corresponds to discordant, NT is non-typeable whereas n/a is not applicable for the method (relevant only for the SNP approach)

Serotype VIII with both latex agglutination and Full CPS

Detailed summary of the results (typeability and concordance) of the three WGS methods TRUE corresponds to concordant with latex agglutination results, FALSE corresponds to discordant, NT is non-typeable whereas n/a is not applicable for the method (relevant only for the SNP approach) Serotype VIII with both latex agglutination and Full CPS Secondly, SNPs within the conserved cpsD-G region were investigated amongst serotypes Ia-VII and IX. The reads were first mapped to the full capsule and any isolates with serotype VIII capsule were precluded from further analysis (n = 2). Reads were mapped to the cpsD-G region of S. agalactiae strain 3139 (region 3166-5955: accession number AF332908.1) and a serotype was assigned based on the bases present in the 44 SNP positions described above (Table 1). Using this approach, serotype was assigned to 749 cases (94.8%) whereas in the remaining 39 cases, serotype was not assigned due to either low base mapping (n = 17), a mixed base (variant present in 20-80% of the reads) in certain SNP positions (n = 2) or an incomplete SNP profile match (n = 20) (Additional file 6: Table S6). In two cases no SNPs were detected; one of the isolates had also failed with the previous method with a highest percent coverage of 71.93% whereas the second isolate was the Ia isolate called as a single serotype by the previous method. Further investigation, revealed that the failed isolate, when mapped to the full capsular locus reference file, had no reads mapping in the region corresponding to the 3’end of cpsC – cpsH inclusive. Similar investigation for the Ia isolate revealed a smaller region of no coverage within the cpsD-G region (1549-2069 bps), which covered 12 of the 44 SNP positions. Due to the low coverage (<90%) the SNP analysis was not performed. However, based on the 32 SNP positions that were covered this isolate matched (32/32) the serotype Ia SNP profile. The variable region cpsG-K (Fig. 1) was the third and final region to be evaluated in this study. This region was previously successfully used to assign serotype from WGS data using de novo assembly followed by BLAST analysis [16]. In this study, a mapping approach was used with the cpsG-K regions extracted from the capsular locus sequences of the 10 reference strains used as a reference. This approach assigned a serotype to 789 of the isolates (99.9%) and the one isolate that failed with highest coverage of 66.97% was the same isolate that failed with the two previous methods.

Concordance

Concordance of the three molecular serotyping assays was observed compared to the latex agglutination method. Concordance was observed in 120 of the 162 typed isolates (74.1%) using the full capsular locus approach (Table 3). The 42 discordant isolates were investigated by repeat serotyping and/or WGS. In 22 cases, discordances were resolved; in seven cases repeat WGS was concordant with the serotype whereas in the remaining 15 cases repeating the serotype analysis gave a result concordant with the WGS. The one failed isolate mentioned previously was also investigated; originally typed serologically as serotype VI, repeat serological analysis was not able to assign a serotype due to auto-agglutination with all sera.
Table 3

Investigation of discordant isolates in this dataset (n = 107)

SampleIDcpsG-KFull CPSSNP-basedSNPhitsOriginal ResultsRetested
LAB_SEROTYPEG-KvsLABCPSvsLABSNPvsLABRetested labRetested WGSResolved
PHEGBS0002IIIISerotype undetermined n = 42/44IXFALSEFALSENTNTNo
PHEGBS0007IaIa,IIIIa n = 44/44IXFALSENTFALSEIaYes
PHEGBS0016IIIIII n = 44/44IXFALSEFALSEFALSEIbIINo
PHEGBS0017VV,IXV n = 44/44VIFALSENTFALSENTNo
PHEGBS0022VV,IXV n = 44/44VIIIFALSENTFALSENTNo
PHEGBS0024IaIa,IIIIa n = 44/44VIIIFALSENTFALSEIaYes
PHEGBS0025IaIa,IIIIa n = 44/44IXFALSENTFALSEIaYes
PHEGBS0026IaIa,IIIIa n = 44/44VIIIFALSENTFALSEIaYes
PHEGBS0027IaIa,IIIIa n = 44/44VIIIFALSENTFALSEIaYes
PHEGBS0028IIIIII n = 44/44VIIIFALSEFALSEFALSEIaIINo
PHEGBS0029IbIbIb n = 44/44VIIIFALSEFALSEFALSEIbYes
PHEGBS0030VV,IXV n = 44/44VIIIFALSENTFALSENTNo
PHEGBS0033VV,IXV n = 44/44VIFALSENTFALSENTNo
PHEGBS0035IaIaSerotype undetermined n = 0/44NTFALSEFALSENTIaYes
PHEGBS0037VIVIVI n = 44/44IIIFALSEFALSEFALSEVIYes
PHEGBS0043IaIa,IIIIa n = 44/44NTFALSENTFALSEIaYes
PHEGBS0070IaIa,IIIIa n = 44/44NTFALSENTFALSEIaYes
PHEGBS0075IIIIII n = 44/44NTFALSEFALSEFALSEIIYes
PHEGBS0082VV,IXV n = 44/44NTFALSENTFALSEVYes
PHEGBS0084IVIVIV n = 44/44IbFALSEFALSEFALSEIVYes
PHEGBS0086IaIa,IIIIa n = 44/44NTFALSENTFALSEIaYes
PHEGBS0091IIIIII n = 44/44NTFALSEFALSEFALSEIaIINo
PHEGBS0098VV,IXV n = 44/44NTFALSENTFALSEVYes
PHEGBS0103IVIVIV n = 44/44IbFALSEFALSEFALSEIVYes
PHEGBS0134IaIa,IIIIa n = 44/44IVFALSENTFALSEIaYes
PHEGBS0152IaIa,IIIIa n = 44/44NTFALSENTFALSEIaYes
PHEGBS0157IVIV,IIIIV n = 44/44IbFALSENTFALSEIVYes
PHEGBS0163IIIISerotype undetermined n = 42/44IbFALSEFALSENTIaIINo
PHEGBS0167IIIIII,IaIII n = 44/44NTFALSENTFALSEIIIYes
PHEGBS0176IIIIII,IaSerotype undetermined n = 42/44VFALSENTNTNTNo
PHEGBS0195IaIa,IIIIa n = 44/44IIIFALSENTFALSEIIIIIIYes
PHEGBS0196IaIa,IIIIa n = 44/44IbFALSENTFALSEIbIbYes
PHEGBS0198IbIbIb n = 44/44VFALSEFALSEFALSEVVYes
PHEGBS0200VV,IXV n = 44/44IaFALSENTFALSEIaIaYes
PHEGBS0201IIIIII,IaIII n = 44/44IaFALSENTFALSEIaIaYes
PHEGBS0203IaIa,IIIIa n = 44/44VFALSENTFALSEVVYes
PHEGBS0204IaIa,IIIIa n = 44/44IIIFALSENTFALSEIIIIIIYes
PHEGBS0205VV,IXV n = 44/44IIIFALSENTFALSEIIIIIIYes
PHEGBS0208IVIVIV n = 44/44VFALSEFALSEFALSEVVYes
PHEGBS0209IIIIII,IaIII n = 44/44IaFALSENTFALSEIaIaYes
PHEGBS0210VV,IXV n = 44/44IaFALSENTFALSEIaIaYes
PHEGBS0211IaIa,IIIIa n = 44/44IIIFALSENTFALSEIIIIIIYes
PHEGBS0212IaIa,IIIIa n = 44/44IIIFALSENTFALSEIIIIIIYes
PHEGBS0215IIIIII,IaIII n = 44/44IaFALSENTFALSEIaIaYes
PHEGBS0216IIIIII,IaSerotype undetermined n = 42/44IbFALSENTNTIbIbYes
PHEGBS0217IaIa,IIIIa n = 44/44IIIFALSENTFALSEIIIIIIYes
PHEGBS0218IbIbIb n = 44/44IIIFALSEFALSEFALSEIIIIIIYes
PHEGBS0220IIIIII,IaIII n = 44/44IbFALSENTFALSEIbIbYes
PHEGBS0222IbIbIb n = 44/44IIIFALSEFALSEFALSEIIIIIIYes
PHEGBS0225IIIIII,IaIII n = 44/44IaFALSENTFALSEIaIaYes
PHEGBS0227IaIa,IIIIa n = 44/44IbFALSENTFALSEIbIbYes
PHEGBS0228IIIIII,IaIII n = 44/44IaFALSENTFALSEIaIaYes
PHEGBS0229IbIbIb n = 44/44VFALSEFALSEFALSEVVYes
PHEGBS0231VV,IXV n = 44/44IIIFALSENTFALSEIIIIIIYes
PHEGBS0232IaIa,IIIIa n = 44/44IbFALSENTFALSEIaYes
PHEGBS0233IVIVIV n = 44/44IIIFALSEFALSEFALSEIIIIIIYes
PHEGBS0236IIIIII,IaIII n = 44/44IbFALSENTFALSEIbIbYes
PHEGBS0238IbIbIb n = 44/44IIIFALSEFALSEFALSEIIIIIIYes
PHEGBS0240IIIIII,IaIII n = 44/44IaFALSENTFALSEIaIaYes
PHEGBS0241IIIIII,IaIII n = 44/44VIFALSENTFALSEVIVIYes
PHEGBS0242IaIa,IIIIa n = 44/44IIIFALSENTFALSEIIIIIIYes
PHEGBS0264IIIIII n = 44/44NTFALSEFALSEFALSENTIINo
PHEGBS0266IVIVIV n = 44/44IbFALSEFALSEFALSENTIVNo
PHEGBS0271IIIIII n = 44/44NTFALSEFALSEFALSEIIYes
PHEGBS0313IVIVIV n = 44/44IbFALSEFALSEFALSEIVYes
PHEGBS0327IVIVIV n = 44/44IbFALSEFALSEFALSEIVYes
PHEGBS0352IIIIII n = 44/44IbFALSEFALSEFALSEIIYes
PHEGBS0373VV,IXV n = 44/44NTFALSENTFALSENTVNo
PHEGBS0390IIIIII n = 44/44VFALSEFALSEFALSEVIINo
PHEGBS0394IIIIII n = 44/44NTFALSEFALSEFALSEIIYes
PHEGBS0398IIIIII n = 44/44NTFALSEFALSEFALSEVIINo
PHEGBS0404IIIIII n = 44/44NTFALSEFALSEFALSENTIINo
PHEGBS0432IIIIII n = 44/44VFALSEFALSEFALSENTNo
PHEGBS0433IVIVIV n = 44/44IbFALSEFALSEFALSEIVYes
PHEGBS0460IaIa,IIIIa n = 44/44IIIFALSENTFALSENTNo
PHEGBS0492IIIIII,IaIII n = 44/44NTFALSENTFALSEIIIYes
PHEGBS0520VV,IXV n = 44/44NTFALSENTFALSENTNo
PHEGBS0554IIIIII n = 44/44NTFALSEFALSEFALSENTNo
PHEGBS0568IIIIII n = 44/44IaFALSEFALSEFALSENTNo
PHEGBS0586IIIIII n = 44/44VFALSEFALSEFALSENTNo
PHEGBS0600IIIISerotype undetermined n = 41/44IIIFALSEFALSENTNTNo
PHEGBS0603IIIIII,IaIII n = 44/44NTFALSENTFALSENTNo
PHEGBS0607VIIVII,IXVII n = 44/44IIIFALSENTFALSEIIINo
PHEGBS0610IIIIII,IaIII n = 44/44IaFALSENTFALSEIaNo
PHEGBS0611IaIa,IIIIa n = 44/44IIIFALSENTFALSEIIINo
PHEGBS0615IaIa,IIIIa n = 44/44VFALSENTFALSEIaYes
PHEGBS0620IaIa,IIISerotype undetermined n = 41/44IIIFALSENTNTIIIIIIYes
PHEGBS0627IIIIII n = 44/44IbFALSEFALSEFALSEIaNo
PHEGBS0640IIIIII,IaIII n = 44/44VIIIFALSENTFALSEIIIYes
PHEGBS0641IIIIII,IaIII n = 44/44NTFALSENTFALSEIIIYes
PHEGBS0642IIIIII,IaIII n = 44/44IaFALSENTFALSEIIIYes
PHEGBS0643IIIIII,IaIII n = 44/44IaFALSENTFALSEIaNo
PHEGBS0644IIIIII,IaIII n = 44/44NTFALSENTFALSEIIIYes
PHEGBS0671IVIVSerotype undetermined n = 38/44IaFALSEFALSENTIaIVNo
PHEGBS0728IIIIII n = 44/44NTFALSEFALSEFALSENTNo
PHEGBS0737IIIIII n = 44/44VFALSEFALSEFALSEVNo
PHEGBS0738VV,IXSerotype undetermined n = 42/44IaFALSENTNTVYes
PHEGBS0761VV,IXV n = 44/44NTFALSENTFALSEVYes
PHEGBS0772IaIa,IIISerotype undetermined n = 38/44NTFALSENTNTIaYes
PHEGBS0821VIVIVI n = 44/44NTFALSEFALSEFALSEVIYes
PHEGBS0822IbIbIb n = 44/44NTFALSEFALSEFALSEIbYes
PHEGBS0823IaIa,IIIIII n = 44/44NTFALSENTFALSENTNo
PHEGBS0824IbIbIb n = 44/44IaFALSEFALSEFALSEIaIbNo
PHEGBS0825VIIVII,IXVII n = 44/44NTFALSENTFALSEVIIYes
PHEGBS0826VIVIVI n = 44/44IaFALSEFALSEFALSEVIYes
Investigation of discordant isolates in this dataset (n = 107) The SNP-based approach resulted in concordance with serology for 653 of 749 typed isolates (87.2%). The remaining 96 discordant isolates were investigated by repeating serology and/or WGS. Discordance was resolved in 67 cases; in 29 cases repeat WGS was concordant with the laboratory serotype whereas in the remaining 37 cases repeating the serological analysis gave a result concordant with the WGS. Of the 96 discordant isolates, 37 were also discordant with the previous method; the remaining five were non-typeable by the SNP-based method (Table 3). Finally, using the variable cpsG-K region concordance was observed in 684 of the 789 cases (86.7%). The remaining 105 discordant isolates, included all discordant isolates from the first method (n = 42) and 95/96 of the discordant isolates from the second method, in addition to 10 non-typeable by the SNP-based method. All discordant isolates were retested with latex agglutination by an investigator blinded to the previous results. One isolate discordant with the SNP-based method was correctly assigned serotype Ia with the cpsG-K method; using the full capsule method the serotype pair Ia/III was called and with the SNP-based method serotype III; following retesting in the laboratory serotype V was assigned which did not agree with either method. Further investigation into this isolate, using mapping analysis and de novo assembly followed by BLAST analysis, revealed that whereas the cpsD-G region corresponded to serotype III (44/44 SNP matches) this isolate contained a serotype Ia cpsH gene [28], suggesting a recombination event may have taken place. The serological method was repeated for all discordant isolates (n = 105) and in 40 cases the new result corresponded to the predicted serotype (by WGS) whereas in 20 cases no serotype was assigned due to lack of reaction or reaction with multiple sera. Of the remaining 45 discordant cases, 44 were re-sequenced and in 31 of these cases the new predicted serotype was concordant with the serological result. Overall following retesting only 14 cases remain truly discordant excluding the 20 non-typeable by sera (Table 3).

Comparison of mapping and assembly approach

The results of the third mapping-approach, that uses the variable cpsG-K region, were compared to the previously described assembly method followed by BLAST [16]. All 790 isolates were assembled and serotype was assigned following BLAST analysis of the contigs against the variable cpsG-K region from all 10 serotypes. The results of this approach were concordant with the results from the mapping approach for 721/790 isolates (Additional file 2: Table S2). One isolate that had previously failed with the mapping approach, with all three methods, due to low coverage of the capsular operon region (depth > = 5 is used as threshold) return no hits with BLAST. Serotype was not assigned for the remaining 68 isolates due to lack of a complete match; the cpsG-K region was present but split into two or more contigs. Further investigation in a subset of these cases (n = 20) revealed that in all cases the serotype assigned by mapping was also the serotype with the highest coverage along the multiple contigs in the assembly/BLAST approach.

Discussion

The conserved region cpsD-G was previously used for molecular capsular typing in Group B streptococci using a SNP-based approach with PCR amplification and Sanger sequencing [26, 27], whereas more recently the variable region cpsG-K was used to predict serotype from WGS data using a de novo assembly and BLAST analysis approach [16]. In this study, these two capsular locus target regions as well as the full capsular locus sequence were investigated for their efficacy in predicting serotype in GBS from WGS data. We used a mapping approach followed by SNP-based approach for the cpsD-G region on a validation panel of 790 GBS isolates. The full capsular locus approach was able to predict 5/10 serotypes (Ib, II, IV, VI, VIII) with distinct capsular locus sequences but was unable to distinguish the remaining five, including serotype III which is the most prevalent serotype in the UK. These latter serotypes were resolved to one of two highly similar capsular loci (Ia/III, V/IX, VII/IX) each sharing more than 90% similarity. Previous publications have reported the similarity of the serotype Ia and III capsular loci; the main difference lies in the cpsH gene [28]. Further investigation into the three serotypes (V, VII & IX) of the remaining two pairs (V/IX & VII/IX) showed high sequence variability for cpsM based on the read mapping patterns for serotypes V and IX; that is in addition to the presence of cpsN in serotype V (V: cpsG-cpsH-cpsM-cpsN-cpsO-cpsJ-cpsK, IX: cpsG-cpsH-cpsM-cpsO-cpsJ-cpsK). For the VII/IX pair, the two serotypes differ in the presence of cpsO in serotype IX and cpsI in VII (VII: cpsG-cpsH-cpsM-cpsI-cpsJ-cpsK). Furthermore, fragmented mapping (reads mapping to some region but not others) between cpsM and cpsK suggests high sequence variability in cpsJ as well. The SNP-based approach failed to predict a serotype in 39 cases due to low mapping coverage or a mixed base at certain SNP positions for approximately half the cases; this dependence on specific positions is one of the disadvantages of using a SNP-based methodology coupled with mapping and strict quality criteria (depth > = 5 and base quality > = 30). However, the remaining unassigned cases were of particular concern; in these cases failure to predict serotype was due to a mixed SNP profile, this is a 41-43/44 match to a serotype-specific SNP profile with one to three SNP positions matching the expected base of another serotype. The positions affected include 12 SNPs located in cpsE and cpsF, including 10 previously published (419, 429, 437, 457, 486, 645, 1026, 1611, 1627 and 1866). However, based on previous studies [26], three of these positions have alternative variants in serotype III subtypes. Specifically, at position 437, the presence of a T instead of a C is the only difference between serotype II and III-4 SNP profiles of region cpsD-G; three isolates have 43/44 SNPs matching the serotype II SNP profile and a T in position 437 suggesting that these isolates are indeed serotype III subtype 4 which also agrees with the serological result and the result from the cpsG-K method. Furthermore, in positions 486 and 1026 the presence of G and A instead of A and G, respectively are indicative of serotype III subtype 3 and based on this, three isolates with a partial 42/44 serotype III profile and G and A in 486 and 1026 respectively would be assigned serotype III subtype 3. This also agrees with the cpsG-K assignment and in one of the cases with the serological result. For the remaining 7 positions, the presence of these alternate variants, even though occurring rarely, suggests that variants at these positions are not conserved within a serotype. Finally, mapping the reads to the variable cpsG-K region produced the highest typeability (99.9%) with only a single isolate remaining non-typeable. However, this isolate was non-typeable with all methods and was shown to lack a large part of the capsular locus (cpsC-H) therefore the lack of serotype prediction was correct. Using this method 86.7% concordance was observed with only 105 isolates, which included all (but one) of the discordant isolates from the two previous methods and 10 non-typeable isolates by the SNP method. Therefore, this number represents the discordance between the molecular and serological approach. Following retesting using both serological and WGS methods only 14 isolates remained truly discordant. The initially high discordance rate can mainly be attributed to the traditional methodology of latex agglutination which can be subjective; specifically re-serotyping using latex agglutination resulted in the same serotype in only 39/105 cases indicating latex agglutination or lab transcription error. This high ambiguity with the serological method may be in part due to a problem with a particular batch as some of the sera used were shown to agglutinate poorly when tested with controls. This included the Ia sera resulting in a number of serotype Ia isolates (n = 18) being mis-identified as rarer types or non-typeables and was resolved when a new batch was used. Furthermore, isolates referred to the reference lab are inoculated onto an agar plate and following overnight incubation a sweep of material is usually tested for serotyping whereas for retesting subculture from a single colony was used. Therefore the presence of a mixed population in the initial culture could result in ambiguous serotyping results downstream in the investigation. However not all discordance can be attributed to the traditional methodology. Cases where the traditional re-serotyping was consistent but repeat WGS resulted in a different serotype could be attributed to a laboratory transcription error. Overall, this study demonstrated that molecular capsular typing using WGS is a viable alternative to the traditional serotyping method and the most efficient method between the three investigated was using a mapping approach on the variable cpsG-K method. These results further support the evidence provided by Sheppard et al. [16], however, unlike the assembly-based methodology used in that study, a mapping approach as described here is more appropriate for use in reference microbiology. Generally, mapping is faster and can provide more precise, region-specific, quality metrics than genome assembly. Furthermore, comparisons of the two approaches, revealed lower typeability with the assembly approach followed by blasting (721/790 vs 789/790 with the mapping approach) due to incomplete assembly of the capsular operon region.

Conclusions

The use of genomics to determine serotype of GBS is critical to Public Health, improving on current subjective methodologies, enabling determination of serotype in strains that were previously designated nontypeable using phenotypic testing. Ultimately it will enable testing, remote determination of serotype and collation of data from strains without reliance of specialised testing centres. The ability to determine the serotype is essential to enable identification of cross infections linked to healthcare associated infection and monitoring of specific serotypes in differing patient groups. It will have increasing importance when considering the serological profile of the bacterial population in response to forthcoming vaccinations. Information on historical GBS strains used as reference for the 10 serotypes (Ia-IX). (XLSX 11 kb) Metadata for all clinical isolates used in this study (n = 790). (XLSX 36 kb) Investigation of the previously published SNPs using the 10 reference strains. SNPs highlighted in yellow do not correlate with the respective reference and are removed from subsequent analysis. (XLSX 19 kb) Investigation of the SNPs (previously published and novel) identified using multiple alignment of the reference cps loci using a small a small dataset of clinical samples (n = 28). SNP positions highlighted in grey represent SNPs not found in all isolates of a particular serotype. These positions were removed and the remaining SNPs represent the final set used in this study. (XLSX 23 kb) Sequence analysis for the serotype pairs identified using the full cps locus approach. Hit coverage corresponds to the mapping coverage observed during analysis whereas the BLAST coverage corresponds to the percent coverage seen during pairwise alignment using BLAST. (XLSX 8 kb) Investigation of the not-typeable (“Serotype undetermined”) isolates from the SNP-based approach. The pileup file created during the analysis was investigated in detail to determine the aetiology behind the incomplete SNP profiles. (XLSX 19 kb)
  28 in total

1.  Inhibition enzyme-linked immunosorbent assay for serotyping of group B streptococcal isolates.

Authors:  G Arakere; A E Flores; P Ferrieri; C E Frasch
Journal:  J Clin Microbiol       Date:  1999-08       Impact factor: 5.948

2.  Structural and genetic diversity of group B streptococcus capsular polysaccharides.

Authors:  Michael J Cieslewicz; Donald Chaffin; Gustavo Glusman; Dennis Kasper; Anup Madan; Stephani Rodrigues; Jessica Fahey; Michael R Wessels; Craig E Rubens
Journal:  Infect Immun       Date:  2005-05       Impact factor: 3.441

3.  Incidence of invasive group B streptococcal disease and pathogen genotype distribution in newborn babies in the Netherlands over 25 years: a nationwide surveillance study.

Authors:  Vincent Bekker; Merijn W Bijlsma; Diederik van de Beek; Taco W Kuijpers; Arie van der Ende
Journal:  Lancet Infect Dis       Date:  2014-10-19       Impact factor: 25.071

4.  The serotype of type Ia and III group B streptococci is determined by the polymerase gene within the polycistronic capsule operon.

Authors:  D O Chaffin; S B Beres; H H Yim; C E Rubens
Journal:  J Bacteriol       Date:  2000-08       Impact factor: 3.490

5.  progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement.

Authors:  Aaron E Darling; Bob Mau; Nicole T Perna
Journal:  PLoS One       Date:  2010-06-25       Impact factor: 3.240

6.  Safety and immunogenicity of an investigational maternal trivalent group B streptococcus vaccine in healthy women and their infants: a randomised phase 1b/2 trial.

Authors:  Shabir A Madhi; Clare L Cutland; Lisa Jose; Anthonet Koen; Niresha Govender; Frederick Wittke; Morounfolu Olugbosi; Ajoke Sobanjo-Ter Meulen; Sherryl Baker; Peter M Dull; Vas Narasimhan; Karen Slobod
Journal:  Lancet Infect Dis       Date:  2016-04-29       Impact factor: 25.071

7.  Increasing burden of invasive group B streptococcal disease in nonpregnant adults, 1990-2007.

Authors:  Tami H Skoff; Monica M Farley; Susan Petit; Allen S Craig; William Schaffner; Ken Gershman; Lee H Harrison; Ruth Lynfield; Janet Mohle-Boetani; Shelley Zansky; Bernadette A Albanese; Karen Stefonek; Elizabeth R Zell; Delois Jackson; Terry Thompson; Stephanie J Schrag
Journal:  Clin Infect Dis       Date:  2009-07-01       Impact factor: 9.079

8.  Analysis of group B streptococcal isolates from infants and pregnant women in Portugal revealing two lineages with enhanced invasiveness.

Authors:  E R Martins; M A Pessanha; M Ramirez; J Melo-Cristino
Journal:  J Clin Microbiol       Date:  2007-08-15       Impact factor: 5.948

9.  Latex assay for serotyping of group B Streptococcus isolates.

Authors:  H-C Slotved; J Elliott; T Thompson; H B Konradsen
Journal:  J Clin Microbiol       Date:  2003-09       Impact factor: 5.948

Review 10.  An overview of global GBS epidemiology.

Authors:  Kirsty Le Doare; Paul T Heath
Journal:  Vaccine       Date:  2013-08-28       Impact factor: 3.641

View more
  12 in total

1.  Microevolution of Streptococcus agalactiae ST-261 from Australia Indicates Dissemination via Imported Tilapia and Ongoing Adaptation to Marine Hosts or Environment.

Authors:  Minami Kawasaki; Jerome Delamare-Deboutteville; Rachel O Bowater; Mark J Walker; Scott Beatson; Nouri L Ben Zakour; Andrew C Barnes
Journal:  Appl Environ Microbiol       Date:  2018-08-01       Impact factor: 4.792

Review 2.  Streptococcus agalactiae in pregnant women in Brazil: prevalence, serotypes, and antibiotic resistance.

Authors:  Cilicia S do Nascimento; Nayara F B Dos Santos; Rita C C Ferreira; Carla R Taddei
Journal:  Braz J Microbiol       Date:  2019-08-20       Impact factor: 2.476

3.  Identifying large-scale recombination and capsular switching events in Streptococcus agalactiae strains causing disease in adults in the UK between 2014 and 2015.

Authors:  Uzma Basit Khan; Elita Jauneikaite; Robert Andrews; Victoria J Chalker; Owen B Spiller
Journal:  Microb Genom       Date:  2022-03

4.  Uncertainties in Screening and Prevention of Group B Streptococcus Disease.

Authors:  Kirsty Le Doare; Paul T Heath; Jane Plumb; Natalie A Owen; Peter Brocklehurst; Lucy C Chappell
Journal:  Clin Infect Dis       Date:  2019-08-01       Impact factor: 9.079

5.  Genomic characterisation of perinatal Western Australian Streptococcus agalactiae isolates.

Authors:  Lucy L Furfaro; Barbara J Chang; Charlene M Kahler; Matthew S Payne
Journal:  PLoS One       Date:  2019-10-02       Impact factor: 3.240

6.  Phylogeny, recombination, and invasiveness of group B Streptococcus revealed by genomic comparisons of its global strains.

Authors:  Enze Lin; Shengmei Zou; Yue Wang; Chien-Chung Lee; Cheng-Hsun Chiu; Ye Feng
Journal:  Eur J Clin Microbiol Infect Dis       Date:  2020-10-16       Impact factor: 3.267

7.  Molecular Identification of Invasive Non-typeable Group B Streptococcus Isolates From Denmark (2015 to 2017).

Authors:  Hans-Christian Slotved; Kurt Fuursted; Ioanna Drakaki Kavalari; Steen Hoffmann
Journal:  Front Cell Infect Microbiol       Date:  2021-03-29       Impact factor: 5.293

8.  Phenotypic and molecular analysis of nontypeable Group B streptococci: identification of cps2a and hybrid cps2a/cps5 Group B streptococcal capsule gene clusters.

Authors:  Areej Alhhazmi; Gregory J Tyrrell
Journal:  Emerg Microbes Infect       Date:  2018-08-08       Impact factor: 7.163

9.  Identification of Group B Streptococcus Serotypes and Genotypes in Late Pregnant Women and Neonates That Are Associated With Neonatal Early-Onset Infection in a South China Population.

Authors:  Zhu Yao; Wu Jiayin; Zheng Xinyi; Chen Ling; He Mingyuan; Ma Simin; Lin Yayin; Lin Xinzhu; Chen Chao
Journal:  Front Pediatr       Date:  2020-05-27       Impact factor: 3.418

10.  Antibiotic Resistance and Molecular Epidemiological Characteristics of Streptococcus agalactiae Isolated from Pregnant Women in Guangzhou, South China.

Authors:  Zhaomin Cheng; Pinghua Qu; Peifeng Ke; Xiaohan Yang; Qiang Zhou; Kai Lan; Min He; Nannan Cao; Sheng Qin; Xianzhang Huang
Journal:  Can J Infect Dis Med Microbiol       Date:  2020-04-30       Impact factor: 2.471

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.