Marcel Dahms, Simone Eiserloh, Jürgen Rödel, Oliwia Makarewicz, Thomas Bocklitz, Jürgen Popp, Ute Neugebauer.
Abstract
Streptococcus pneumoniae, commonly referred to as pneumococci, can cause severe and invasive infections, which are major causes of communicable disease morbidity and mortality in Europe and globally. The differentiation of S. pneumoniae from other Streptococcus species, especially from other oral streptococci, has proved to be particularly difficult and tedious. In this work, we evaluate whether Raman spectroscopy holds potential for a reliable differentiation of S. pneumoniae from other streptococci. Raman spectra of eight different S. pneumoniae strains and four other Streptococcus species (S. sanguinis, S. thermophilus, S. dysgalactiae, S. pyogenes) were recorded and their spectral features analyzed. Together with Raman spectra of 59 Streptococcus patient isolates, they were used to train and optimize binary classification models (PLS-DA). The effect of normalization on the model accuracy was compared, as one example of the optimization potential for future modelling. Optimized models were used to identify S. pneumoniae from other streptococci in an independent, previously unknown data set of 28 patient isolates. For this small data set, a balanced accuracy of around 70% could be achieved. Improvement of the classification rate is expected with optimized model parameters and algorithms as well as with a larger spectral database for training.
Keywords: bacteria; binary PLS-DA classification models; chemometrics; clinical isolates; pneumococcus; raman spectroscopy; streptococcus
Year: 2022 PMID: 35937698 PMCID: PMC9353136 DOI: 10.3389/fcimb.2022.930011
Source DB: PubMed Journal: Front Cell Infect Microbiol ISSN: 2235-2988 Impact factor: 6.073
Overview of number of aggregated spectra used for modelling and testing.
| Type | Training spectra (laboratory strains + patient isolates) | Test* spectra (patient isolates) |
|---|---|---|
| S. pneumoniae | 1177 (8) + 1142 (25) = 2319 (33) | 122 (3) |
| Other streptococci | 682 (4) + 1575 (34) = 2257 (38) | 1157 (25) |
| total | 1859 (12) + 2717 (59) = 4576 (71) | 1279 (28) |
The number in brackets gives the number of independent laboratory strains/patient isolates. (more detailed information about laboratory strains and patient isolates is given in ).
*Identification of strains in the test data set was not disclosed until Raman prediction was finished.
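The counts in the overview table can be checked with a few lines of arithmetic, and they also show how strongly the test set is imbalanced. The sketch below recomputes the row totals and the share of S. pneumoniae spectra in the test set, then uses the standard relation between sensitivity, specificity, prevalence and positive predictive value. The sensitivity and specificity plugged in (56.6% and 75.5%) are the spectrum-level test values reported for Model 1 in the performance summary; all variable and function names are illustrative and not taken from the original work.

```python
# (spectra, isolates) pairs copied from the overview table.
train_lab = {"S. pneumoniae": (1177, 8), "Other streptococci": (682, 4)}
train_patient = {"S. pneumoniae": (1142, 25), "Other streptococci": (1575, 34)}
test = {"S. pneumoniae": (122, 3), "Other streptococci": (1157, 25)}


def add_pairs(a, b):
    """Add two (spectra, isolates) pairs element-wise."""
    return (a[0] + b[0], a[1] + b[1])


# Row totals of the training columns and the overall test total.
train_total = {k: add_pairs(train_lab[k], train_patient[k]) for k in train_lab}
test_total = add_pairs(test["S. pneumoniae"], test["Other streptococci"])

# Fraction of test spectra that belong to the positive class.
prevalence = test["S. pneumoniae"][0] / test_total[0]


def ppv_from_rates(sensitivity, specificity, prevalence):
    """Positive predictive value implied by sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Spectrum-level test values reported for Model 1 (56.6 % and 75.5 %).
ppv_model_1 = ppv_from_rates(0.566, 0.755, prevalence)

print(train_total, test_total)
print(f"prevalence = {100 * prevalence:.1f} %, implied PPV = {100 * ppv_model_1:.1f} %")
```

With fewer than one in ten test spectra coming from S. pneumoniae, even a moderate false-positive rate among the other streptococci outweighs the true positives, which is consistent with the low PPV values reported for the test data.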
Figure 1. Raman mean spectra of the laboratory strains. Preprocessed Raman mean spectra of the eight different S. pneumoniae strains (orange) and the four different non-pneumococcal Streptococcus species (turquoise) presented as overlaid spectra to visualize spectral differences. The insets highlight Raman bands of interest discussed in the corresponding assignment in .
Figure 2. Difference spectrum (blue, bottom) and PLSR loadings (black) for the models (A) with and (B) without vector normalization applied to the spectral data in the training set. Difference spectra (computed by subtracting the mean spectrum of S. pneumoniae from the mean spectrum of other streptococci) are scaled to match the loadings scale and are depicted in blue. Loadings are shown in black, with increasing components organized from bottom to top. For each spectrum, the “zero line” on the y-axis (contribution of Raman intensity) is indicated with a dotted line. ROC curves using these 10 or 9 loadings, respectively, are depicted in .
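The two operations named in this caption can be sketched in plain Python. The sketch assumes that "vector normalization" means scaling each spectrum to unit Euclidean (L2) norm, which is a common convention but is not spelled out in this record; the difference spectrum follows the caption (mean of other streptococci minus mean of S. pneumoniae). The three-point "spectra" are made-up toy values, and all function names are illustrative.

```python
import math


def vector_normalize(spectrum):
    """Scale a spectrum to unit Euclidean norm (assumed meaning of vector normalization)."""
    norm = math.sqrt(sum(v * v for v in spectrum))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero spectrum")
    return [v / norm for v in spectrum]


def mean_spectrum(spectra):
    """Point-wise mean over a list of equally long spectra."""
    n = len(spectra)
    return [sum(column) / n for column in zip(*spectra)]


def difference_spectrum(other_spectra, pneumo_spectra):
    """Mean of other streptococci minus mean of S. pneumoniae, as in the caption."""
    mean_other = mean_spectrum(other_spectra)
    mean_pneumo = mean_spectrum(pneumo_spectra)
    return [o - p for o, p in zip(mean_other, mean_pneumo)]


# Toy data (not measured values): the second spectrum has the same shape
# as the first but twice the overall intensity.
pneumo_raw = [[3.0, 4.0, 0.0], [6.0, 8.0, 0.0]]
other_raw = [[0.0, 3.0, 4.0]]

pneumo = [vector_normalize(s) for s in pneumo_raw]
other = [vector_normalize(s) for s in other_raw]

diff = difference_spectrum(other, pneumo)
print(pneumo)
print(diff)
```

After normalization the two toy S. pneumoniae spectra coincide, so the difference spectrum reflects band shape rather than overall signal intensity. Without normalization, intensity variations between measurements would enter the difference spectrum and the model loadings directly.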
Typical Raman bands found in streptococci spectra ( ), assignment to functional groups and rough estimation of relative intensity of the respective Raman bands in S. pneumoniae (S. p.) and other streptococci (o. S.).
| ν [cm⁻¹] | DNA/RNA | Proteins | Lipids | Carbohydrates | S. p. | o. S. |
|---|---|---|---|---|---|---|
| 2936 | ** | CH3 (str), CH2 (str) | CH3 (str) | CH2 (str), CH (str) | + | ++ |
| 2880 | CH2 (str) | CH2 (str) | ++ | + | ||
| 2856 | CH2 (str) | CH2 (str) | ++ | + | ||
| 1660 | Amide I (C=O) | C=C (str) | + | ++ | ||
| 1572 | G/A (Ring, str) | + | ++ | |||
| 1448 | C-H (def) | CH2/CH3 (def) | + | + | ||
| 1336-1376 | T/A/G | C-H (def), Trp | CH, CH2 (def) | + | +++ | |
| 1244 | Amide III (C-N, N-H) | + | + | |||
| 1096 | C-C | COC glycosidic bond | ++ | + | ||
| 1004 | Phe | + | + | |||
| 956 | Trp, Val, Tyr | COC glycosidic bond | ++ | + | ||
| 872 | Trp | N+(CH3)3 | ++ | |||
| 856 | Tyr, Pro | CC, COC, glycosidic bond | ++ | + | ||
| 784 | C/U/T (str) | ++ | + | |||
| 720 | A | N+(CH3)3 (str) | ++ | + | ||
**No exact information possible. S. p., S. pneumoniae; o. S., other streptococci; str, stretching vibration; def, deformation vibration; A, adenine; G, guanine; C, cytosine; T, thymine; U, uracil; Phe, phenylalanine; Trp, tryptophan; Tyr, tyrosine; Val, valine; Pro, proline; +, present; ++/+++, increased.
Figure 3. Score plots for components 1 and 2 for the PLS models. (A) shows scores for the model using vector normalization, (B) for the model using no normalization. Pairs plots of all 7 PLSR scores are depicted in and with different color coding in . Model performance during auto-prediction (using the training data set) and during prediction of the unknown test data set is shown in .
Summary on model performance of the automatically optimized models during auto-prediction (training data set, left) and during prediction of unknown test data set (right).
| | Auto-prediction (training data set) | | | | Prediction of unknown test data | | | |
|---|---|---|---|---|---|---|---|---|
| Model | Sens. | Spec. | PPV | Bal. acc. | Sens. | Spec. | PPV | Bal. acc. |
| 1 (with vector norm.)* | 68.9 (78.8) | 91.3 (89.5) | 89.1 (86.7) | 80.1 (84.1) | 56.6 (33.3) | 75.5 (72.0) | 19.6 (12.5) | 66.0 (52.7) |
| 2 (without norm.)** | 57.9 (51.5) | 69.2 (76.3) | 65.9 (65.4) | 63.5 (63.9) | 63.1 (66.7) | 76.1 (76.0) | 21.8 (25.0) | 69.6 (71.3) |
Numbers in brackets represent the corresponding value after majority vote per patient/isolate. A detailed overview of the prediction is given in (auto-prediction) and (test data).
*Model 1: with vector normalization using 10 components, discrimination threshold: -0.01;
**Model 2: without normalization using 9 components, discrimination threshold: -0.091.
Sens., sensitivity; Spec., specificity; PPV, positive predictive value; Bal. acc., balanced accuracy. All values are given in %.
The positive class is S. pneumoniae.
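The metrics in the summary table, and the majority vote per isolate mentioned in its footnote, can be illustrated with a short sketch. The confusion counts used below for Model 1 on the test isolates (1 of 3 S. pneumoniae isolates and 18 of 25 other isolates correctly assigned) are not stated in this record; they are back-calculated here from the reported percentages and the isolate totals and should be read as an assumption. The toy predictions, function names and the tie-breaking behavior are likewise illustrative only.

```python
from collections import Counter, defaultdict


def majority_vote(spectrum_predictions):
    """Collapse (isolate_id, predicted_label) pairs to one label per isolate.

    Ties go to the label seen first for that isolate (Counter.most_common order).
    """
    votes = defaultdict(Counter)
    for isolate_id, label in spectrum_predictions:
        votes[isolate_id][label] += 1
    return {iso: counts.most_common(1)[0][0] for iso, counts in votes.items()}


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, PPV and balanced accuracy in percent."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    ppv = tp / (tp + fp)
    bal_acc = (sens + spec) / 2.0
    return {name: round(100.0 * value, 1)
            for name, value in [("sens", sens), ("spec", spec),
                                ("ppv", ppv), ("bal_acc", bal_acc)]}


# Toy spectrum-level predictions for two hypothetical isolates.
toy_predictions = [
    ("isolate_A", "S. pneumoniae"), ("isolate_A", "other"), ("isolate_A", "S. pneumoniae"),
    ("isolate_B", "other"), ("isolate_B", "other"),
]
per_isolate = majority_vote(toy_predictions)

# Assumed isolate-level counts for Model 1 on the test set (back-calculated,
# positive class = S. pneumoniae): 1 TP, 2 FN, 18 TN, 7 FP.
model_1_test = binary_metrics(tp=1, fn=2, tn=18, fp=7)

print(per_isolate)
print(model_1_test)
```

Under these assumed counts the function returns 33.3, 72.0, 12.5 and 52.7, matching the bracketed test values listed for Model 1. With only three S. pneumoniae isolates in the test set, a single isolate changes the sensitivity by a third, so the isolate-level figures should be interpreted with caution.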