James R Knight, Eileen M Dunne, E Kim Mulholland, Sudipta Saha, Catherine Satzke, Adrienn Tothpal, Daniel M Weinberger.
Abstract
Serotyping of Streptococcus pneumoniae is a critical tool in the surveillance of the pathogen and in the development and evaluation of vaccines. Whole-genome DNA sequencing and analysis is becoming increasingly common and is an effective method for pneumococcal serotype identification of pure isolates. However, because of the complexities of the pneumococcal capsular loci, current analysis software requires samples to be pure (or nearly pure) and only contain a single pneumococcal serotype. We introduce a new software tool called SeroCall, which can identify and quantitate the serotypes present in samples, even when several serotypes are present. The sample preparation, library preparation and sequencing follow standard laboratory protocols. The software runs as fast as or faster than existing identification tools on typical computing servers and is freely available under an open source licence at https://github.com/knightjimr/serocall. Using samples with known concentrations of different serotypes as well as blinded samples, we were able to accurately quantify the abundance of different serotypes of pneumococcus in mixed cultures, with 100 % accuracy for detecting the major serotype and up to 86 % accuracy for detecting minor serotypes. We were also able to track changes in serotype frequency over time in an experimental setting. This approach could be applied in both epidemiological field studies of pneumococcal colonization and experimental laboratory studies, and could provide a cheaper and more efficient method for serotyping than alternative approaches.
Keywords: pneumococcus; serotyping; whole genome sequencing
Year: 2021 PMID: 33355528 PMCID: PMC8115901 DOI: 10.1099/mgen.0.000494
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
Comparison of run times for SeroCall, SeroBA and PneumoCaT
| Running time (MM:SS) | SeroCall | SeroBA | PneumoCaT |
|---|---|---|---|
| Minimum | 0:14 | 0:26 | 0:17 |
| Mean | 0:35 | 1:43 | 1:16 |
| Maximum | 1:06 | 2:39 | 2:49 |
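The run times in the table above can be compared more directly by converting them to seconds and taking ratios against SeroCall. The following is a minimal sketch of that arithmetic only; the dictionary layout and the function names `to_seconds` and `ratio_to_serocall` are illustrative assumptions and are not part of any of the three tools.

```python
# Run times copied from the table (minutes:seconds); the layout is an assumption.
RUN_TIMES = {
    "SeroCall": {"min": "0:14", "mean": "0:35", "max": "1:06"},
    "SeroBA": {"min": "0:26", "mean": "1:43", "max": "2:39"},
    "PneumoCaT": {"min": "0:17", "mean": "1:16", "max": "2:49"},
}


def to_seconds(mmss: str) -> int:
    """Convert an 'MM:SS' string (spaces around the colon allowed) to seconds."""
    minutes, seconds = (int(part) for part in mmss.replace(" ", "").split(":"))
    return minutes * 60 + seconds


def ratio_to_serocall(tool: str, stat: str) -> float:
    """Return how many times longer `tool` ran than SeroCall for the given statistic."""
    return to_seconds(RUN_TIMES[tool][stat]) / to_seconds(RUN_TIMES["SeroCall"][stat])


if __name__ == "__main__":
    for tool in ("SeroBA", "PneumoCaT"):
        for stat in ("min", "mean", "max"):
            print(f"{tool} {stat}: {ratio_to_serocall(tool, stat):.2f}x SeroCall")
```

On the mean, this gives 103 s for SeroBA and 76 s for PneumoCaT against 35 s for SeroCall, i.e. roughly 2.9 and 2.2 times longer, respectively.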
Fig. 1. Call accuracy for all-by-all, in silico mixtures of 3.5 million reads. Accuracy of SeroCall for each serotype/percentage combination, displayed as vertical bar charts evaluating the calls from pairwise mixing the serotype at the given percentage against each of the other serotypes. Each call was evaluated to see if SeroCall made the correct call, did not call that serotype, had additional serotype calls or called a different serotype. Correct calls were also evaluated as to whether the reported percentage was within 15 % of that serotype’s input percentage or not. Missing (or white) bar charts reflect samples with too few reads to perform the in silico mixture (i.e. the serotype 10C data contained 688 206 reads, so mixtures of 3.5 million reads using percentages ≥25 % could not be generated).
Fig. 2. Comparison of true and estimated serotype percentage. Comparison of the true and estimated percentage of each individual serotype in multiple mixed samples using two serotypes (blue), a mixed sample with three serotypes (red) and a mixed sample with five serotypes (yellow). (Note: two 20 % serotypes from the five-serotype mix were reported as 20.9 and 21.0 %, and so overlay each other in the figure.)
Fig. 3. SeroCall accuracy for PneuCarriage samples. (a) Serotype calling accuracy for the 65 PneuCarriage blind testing samples, using a first round of 1.9 million reads per sample and a second round of 4.6 million reads per sample. (b) Comparison of SeroCall quantification [‘abundance (sequencing)’] and known PneuCarriage abundances [‘abundance (actual)’], for 32 mixed samples.
Fig. 4. Quantitation of replicate mixtures. Replicate testing using mixtures of 2, 3, 5 and 10 serotypes. Replicates were cultured for 2, 4, 6 or 8 h before selection for sequencing.
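The evaluation described in the Fig. 1 caption can be sketched in a few lines of Python. This is an illustrative reading of the caption, not SeroCall's own code: the function names are hypothetical, the "within 15 %" check is assumed to mean an absolute difference of 15 percentage points (the caption does not say whether it is absolute or relative), and the mapping of outcomes to the four categories is one plausible interpretation.

```python
def reads_needed(total_reads: int, percentage: float) -> int:
    """Reads a serotype must contribute to make up `percentage` of a mixture."""
    return round(total_reads * percentage / 100)


def mixture_feasible(available_reads: int, total_reads: int, percentage: float) -> bool:
    """True if a serotype's read set is large enough for the requested share."""
    return reads_needed(total_reads, percentage) <= available_reads


def classify_call(target: str, expected: dict, called: dict, tolerance: float = 15.0) -> str:
    """Classify the call for `target` in one mixture.

    expected: serotype -> input percentage of the in silico mixture
    called:   serotype -> percentage reported by the caller
    tolerance: ASSUMED to be absolute percentage points
    """
    unexpected = set(called) - set(expected)  # serotypes called but not in the mix
    if target in called:
        if unexpected:
            return "additional calls"
        if abs(called[target] - expected[target]) <= tolerance:
            return "correct, within tolerance"
        return "correct, outside tolerance"
    return "different serotype" if unexpected else "not called"


if __name__ == "__main__":
    # Serotype 10C had 688 206 reads; a 25 % share of 3.5 million reads needs more.
    print(reads_needed(3_500_000, 25), mixture_feasible(688_206, 3_500_000, 25))
    # Hypothetical mixture and call, for illustration only.
    print(classify_call("A", {"A": 80, "B": 20}, {"A": 78.5, "B": 21.5}))
```

The read-budget check reproduces the caption's example: 25 % of 3.5 million reads is 875 000, more than the 688 206 reads available for serotype 10C, so that mixture cannot be generated.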