Maaike J C van den Beld¹,², John W A Rossen²,³,⁴, Noah Evers¹, Mirjam A M D Kooistra-Smid²,⁵, Frans A G Reubsaet¹.
Abstract
Shigella spp. and E. coli are closely related and cannot be distinguished using matrix-assisted laser desorption-ionization time-of-flight mass spectrometry (MALDI-TOF MS) with commercially available databases. Here, three alternative MALDI-TOF MS approaches to identify and distinguish Shigella spp., E. coli, and its pathotype EIEC were explored and evaluated using spectra of 456 Shigella spp., 42 E. coli, and 61 EIEC isolates. Identification with a custom-made database correctly identified >94% of Shigella isolates at the genus level and >91% of S. sonnei and S. flexneri isolates at the species level, but the distinction of S. dysenteriae, S. boydii, and E. coli was poor. With biomarker assignment, 98% of S. sonnei isolates were correctly identified, although specificity was low. Discriminating markers for S. dysenteriae, S. boydii, and E. coli were not assigned at all. Classification models using machine learning correctly identified 96% of Shigella isolates, but most E. coli isolates were also assigned to Shigella. None of the proposed alternative approaches was suitable for identifying Shigella spp., E. coli, and EIEC in clinical diagnostics, reflecting their relatedness and taxonomic classification. We suggest using MALDI-TOF MS to identify the Shigella spp./E. coli complex, but other tests should be used to distinguish between them.
Keywords: EIEC; Escherichia coli; MALDI-TOF MS; Shigella spp.; biomarker assignment; custom-made database; machine-learning classifiers
Year: 2022 PMID: 35208889 PMCID: PMC8878589 DOI: 10.3390/microorganisms10020435
Source DB: PubMed Journal: Microorganisms ISSN: 2076-2607
Isolates used in this study, divided into training and test sets.

| Species and Serotype/O-Type | Training Set (n) | Origin | Test Set (n) | Origin |
|---|---|---|---|---|
| | 2 | CIP 57.28T; A1 | 1 | 1 ci¹ |
| | 5 | A2; 4 ci¹ | 4 | 4 ci¹ |
| | 5 | AMC-43-G-93; 4 ci¹ | 3 | 3 ci¹ |
| | 2 | AMC 43-G-86; 1 ci¹ | 0 | |
| | 1 | AMC 43-G-84 | 0 | |
| | 1 | AMC 43-G-81 | 1 | 1 ci¹ |
| | 1 | AMC 43-G-76 | 1 | 1 ci¹ |
| | 2 | A58: 1646; 1 ci¹ | 1 | 1 ci¹ |
| | 1 | A2050-52 | 0 | |
| | 2 | 2 ci¹ | 1 | 1 ci¹ |
| | 1 | NCTC 11867 | 0 | |
| | 1 | NCTC 11868 | 0 | |
| Total number of S. dysenteriae | 24 | | 12 | |
| | 3 | B1A; 2 ci¹ | 0 | |
| | 5 | B1B; 4 ci¹ | 5 | 5 ci¹ |
| | 4 | 4 ci¹ | 3 | 3 ci¹ |
| | 32 | CIP 82.48T; B2A; 30 ci¹ | 32 | 32 ci¹ |
| | 1 | B2B | 3 | 3 ci¹ |
| | 2 | B3A; 1 ci¹ | 14 | 14 ci¹ |
| | 2 | B3B; B3C | 3 | 3 ci¹ |
| | 1 | B4A | 4 | 4 ci¹ |
| | 4 | 5 ci¹ | 0 | |
| | 1 | B4B | 0 | |
| | 3 | 3 ci¹ | 0 | |
| | 1 | B5 | 1 | 1 ci¹ |
| | 10 | B6; 9 ci¹ | 10 | 10 ci¹ |
| | 1 | | | |
| | 2 | 2 ci¹ | 2 | 2 ci¹ |
| | 2 | 2 ci¹ | 0 | |
| | 5 | 5 ci¹ | 0 | |
| Total number of S. flexneri | 79 | | 77 | |
| | 2 | AMC-43-G-58; 1 ci¹ | 3 | 3 ci¹ |
| | 3 | CIP 82.50T; P288; 1 ci¹ | 4 | 4 ci¹ |
| | 1 | D1 | 0 | |
| | 2 | AMC-43-G-63; 1 ci¹ | 2 | 2 ci¹ |
| | 2 | P143; 1 ci¹ | 0 | |
| | 1 | CDC 9771 (D19) | 0 | |
| | 1 | AMC 4006 (Lavington) | 0 | |
| | 0 | | 1 | 1 ci¹ |
| | 1 | 1296/7 | 0 | |
| | 1 | 430 | 1 | 1 ci¹ |
| | 1 | 34 | 0 | |
| | 0 | | 1 | 1 ci¹ |
| | 0 | | 1 | 1 ci¹ |
| | 0 | | 1 | 1 ci¹ |
| | 1 | CDC C-703 | 0 | |
| | 1 | 1 ci¹ | 1 | 1 ci¹ |
| Total number of S. boydii | 17 | | 15 | |
| S. sonnei | 117 | CIP 82.49T; 116 ci¹ | 115 | 115 ci¹ |
| EIEC | 30 | DSM 9027; DSM 9028; CCUG 11335; CCUG 38080; CCUG 38092; CCUG 38093; EW227; 1624-56; 1184-68; 145/46; L119B-10; 19 ci¹ | 31 | 31 ci¹ |
| Other | 11 | 7 STEC ci¹; 4 EPEC ci¹ | 11 | 8 STEC ci¹; 3 EPEC ci¹ |
| Other² | 10 | 5 mussel, 3 pigeon, 2 turkey | 10 | 4 mussel, 3 pigeon, 2 turkey, 1 oyster |

¹ ci = clinical isolate. ² Isolated from animals. All other entries are reference isolates.
Figure 1. The classes in the different discrimination levels to which isolates were assigned.
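The four discrimination levels can be made concrete with a small lookup table. The class names below follow the Figure 4 legend and the results table; the mapping as code, the helper names, and the rule that a prediction counts as correct at a level when it falls in the same class as the true identity are illustrative assumptions, and predictions are assumed to be expressed as species-level classes.

```python
from __future__ import annotations

# Hypothetical encoding of the discrimination levels; class names follow the
# Figure 4 legend. This is an illustration, not the authors' code.
LEVELS = ("genus", "pathotype", "group", "species")

SPECIES_TO_CLASSES: dict[str, tuple[str, str, str]] = {
    # species-level class: (genus, pathotype, group)
    "S. dysenteriae": ("Shigella", "Shigella/EIEC", "Shigella spp."),
    "S. flexneri": ("Shigella", "Shigella/EIEC", "Shigella spp."),
    "S. boydii": ("Shigella", "Shigella/EIEC", "Shigella spp."),
    "S. sonnei": ("Shigella", "Shigella/EIEC", "Shigella spp."),
    "EIEC": ("Escherichia", "Shigella/EIEC", "EIEC"),
    "Other E. coli": ("Escherichia", "Other E. coli", "Other E. coli"),
}


def classes_at_all_levels(species: str) -> dict[str, str]:
    """Return the class an isolate belongs to at each discrimination level."""
    genus, pathotype, group = SPECIES_TO_CLASSES[species]
    return {"genus": genus, "pathotype": pathotype, "group": group, "species": species}


def correct_at_level(true_species: str, predicted_species: str, level: str) -> bool:
    """A prediction is correct at `level` if both labels share the same class there."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    return classes_at_all_levels(true_species)[level] == classes_at_all_levels(predicted_species)[level]
```

For example, an S. sonnei isolate predicted as S. flexneri counts as correct at the genus, pathotype and group levels but not at the species level, while an EIEC isolate predicted as S. boydii is correct only at the pathotype level.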
Discrimination scheme of biomarkers: percentage of isolates in the training set carrying each biomarker.

| Biomarkers (m/z) | 2691 | 2877 | 3129 | 3636 | 3647 | 3930 | 3939 | 4163 | 4189 | 4368 | 4501 | 4769 | 4775 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 92 | 4 | 100 | 0 | 0 | 0 | 100 | 0 | 100 | 100 | 100 | 0 | 100 |
| | 100 | 0 | 100 | 0 | 63 | 53 | 18 | 1 | 94 | 97 | 99 | 22 | 97 |
| | 88 | 0 | 100 | 18 | 0 | 0 | 94 | 0 | 100 | 88 | 100 | 18 | 100 |
| | 56 | 49 | 59 | 22 | 0 | 1 | 56 | 17 | 89 | 68 | 56 | 23 | 98 |
| EIEC | 100 | 0 | 100 | 3 | 6 | 0 | 97 | 0 | 97 | 100 | 94 | 26 | 97 |
| Other | 52 | 24 | 52 | 71 | 0 | 5 | 62 | 38 | 90 | 67 | 57 | 67 | 100 |

| Biomarkers (m/z) | | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 100 | 100 | 8 | 92 | 52 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 100 |
| | 76 | 97 | 55 | 99 | 45 | 99 | 99 | 4 | 42 | 1 | 83 | 0 | 41 | 23 |
| | 76 | 94 | 18 | 88 | 59 | 100 | 88 | 18 | 6 | 18 | 0 | 6 | 12 | 88 |
| | 69 | 86 | 27 | 73 | 0 | 74 | 62 | 13 | 1 | 17 | 0 | 28 | 18 | 75 |
| EIEC | 71 | 100 | 39 | 100 | 42 | 94 | 97 | 19 | 0 | 23 | 6 | 3 | 16 | 84 |
| Other | 19 | 86 | 0 | 67 | 0 | 57 | 67 | 48 | 10 | 52 | 0 | 24 | 43 | 33 |

| Biomarkers (m/z) | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 0 | 0 | 100 | 100 | 0 | 0 | 100 | 100 | 0 | 0 | 100 | 0 | 0 |
| | 5 | 4 | 90 | 94 | 4 | 17 | 82 | 92 | 8 | 9 | 86 | 0 | 0 |
| | 0 | 18 | 82 | 88 | 12 | 18 | 82 | 88 | 6 | 18 | 82 | 0 | 12 |
| | 15 | 15 | 82 | 38 | 12 | 16 | 85 | 54 | 15 | 13 | 70 | 34 | 36 |
| EIEC | 6 | 16 | 77 | 87 | 13 | 23 | 77 | 81 | 6 | 19 | 77 | 0 | 0 |
| Other | 43 | 38 | 38 | 24 | 48 | 48 | 62 | 38 | 43 | 48 | 48 | 0 | 5 |
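Each cell of the discrimination scheme is the share of training isolates in a class whose spectrum contains the listed m/z peak. A minimal sketch of that tally, assuming each isolate has been reduced to a list of peak m/z values and that a biomarker counts as present when a peak lies within a hypothetical ±500 ppm window; the study's actual peak picking and mass tolerance are not given here, and the function names and toy peak lists are invented for illustration.

```python
from __future__ import annotations


def has_peak(peaks: list[float], target_mz: float, tol_ppm: float = 500.0) -> bool:
    """True if any peak lies within +/- tol_ppm of target_mz.

    The 500 ppm default is a hypothetical tolerance, not a value from the study.
    """
    tol = target_mz * tol_ppm / 1e6
    return any(abs(mz - target_mz) <= tol for mz in peaks)


def biomarker_frequencies(isolates, biomarkers, tol_ppm=500.0):
    """Percentage of isolates per class carrying each biomarker.

    isolates: iterable of (class_label, peak_list) pairs.
    Returns {class_label: {biomarker_mz: percentage}}; Python's round()
    sends exact halves to the nearest even number.
    """
    hits: dict[str, dict[float, int]] = {}
    totals: dict[str, int] = {}
    for label, peaks in isolates:
        totals[label] = totals.get(label, 0) + 1
        row = hits.setdefault(label, {mz: 0 for mz in biomarkers})
        for mz in biomarkers:
            if has_peak(peaks, mz, tol_ppm):
                row[mz] += 1
    return {
        label: {mz: round(100 * n / totals[label]) for mz, n in row.items()}
        for label, row in hits.items()
    }


# Toy peak lists (invented for illustration, not study data).
training = [
    ("EIEC", [2691.2, 3129.5, 4775.0]),
    ("EIEC", [2691.0, 4775.4]),
    ("Other", [3636.1, 4775.2]),
]
print(biomarker_frequencies(training, [2691, 3129, 3636, 4775]))
```

A fixed ppm window scales with mass, which is the usual way to express MALDI-TOF mass accuracy; a fixed Da window would be an equally valid assumption for a sketch.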
Figure 2. Dendrogram of the main spectrum profiles (MSPs) of training isolates. Blue = cluster 1; green = cluster 2; red = cluster 3. Yellow/blue vertical band = clusters distinguished manually at a distance level of 50–100 relative units, with species designations from the culture-based identification algorithm.
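A dendrogram like the one in Figure 2 comes from agglomerative clustering of pairwise distances between spectra, and cutting it at a chosen distance level yields the clusters. A sketch assuming a precomputed symmetric distance matrix and average linkage (UPGMA); both the distance measure and the linkage are assumptions, since the caption does not state them, and the threshold here is in the units of the matrix rather than the caption's relative units.

```python
from __future__ import annotations


def upgma_clusters(dist: list[list[float]], threshold: float) -> list[list[int]]:
    """Average-linkage (UPGMA) clustering of a symmetric distance matrix.

    Repeatedly merges the two closest clusters and stops once the closest pair
    is farther apart than `threshold`, i.e. it cuts the dendrogram at that level.
    Returns clusters as sorted lists of item indices.
    """
    clusters: dict[int, list[int]] = {i: [i] for i in range(len(dist))}

    def linkage(a: int, b: int) -> float:
        # Average of all pairwise distances between members of clusters a and b.
        pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
        return sum(pairs) / len(pairs)

    next_id = len(dist)
    while len(clusters) > 1:
        a, b = min(
            ((x, y) for x in clusters for y in clusters if x < y),
            key=lambda pair: linkage(*pair),
        )
        if linkage(a, b) > threshold:
            break
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())
```

With two tight pairs of spectra that are far from each other, a low cut keeps the pairs apart and a high cut merges everything into one cluster; the choice of cut level therefore determines how many clusters are reported.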
Figure 3. Number of different species among the first 10 matches per spot with the direct smear method. Identity (x-axis) was assigned using the culture-based identification algorithm. Black horizontal bars represent the median number of species; the interquartile ranges (25th–75th percentile) are indicated by the blue vertical bars, and the 5th–95th percentile intervals by the black vertical lines. Outliers are indicated with blue dots.
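The quantity plotted in Figure 3 can be computed per spot by ranking the database matches by score and counting the distinct species among the top 10, then summarising the counts for each identity. A sketch; the (species, score) input format and the function names are hypothetical, and `statistics.quantiles` with `method="inclusive"` (linear interpolation) is one of several quantile conventions, not necessarily the one used for the figure.

```python
from __future__ import annotations

from statistics import median, quantiles


def distinct_species_in_top_matches(matches: list[tuple[str, float]], k: int = 10) -> int:
    """Number of different species among the k best-scoring matches for one spot."""
    top_k = sorted(matches, key=lambda m: m[1], reverse=True)[:k]
    return len({species for species, _score in top_k})


def summarize(counts: list[int]) -> dict[str, float]:
    """Median, quartiles and 5th/95th percentiles of per-spot counts.

    Uses linear interpolation between data points (method="inclusive").
    """
    q1, _q2, q3 = quantiles(counts, n=4, method="inclusive")
    pct = quantiles(counts, n=20, method="inclusive")  # 5th, 10th, ..., 95th
    return {"median": median(counts), "q1": q1, "q3": q3, "p5": pct[0], "p95": pct[-1]}
```

For a spot whose four matches are E. coli, S. sonnei, E. coli and S. flexneri, `distinct_species_in_top_matches` returns 3; with `k=2` it returns 2.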
Correct identification results of isolates from the test set.
| Class | Direct smear: Bruker databases¹, n (%) | Direct smear: custom databases, n (%) | Direct smear: Bruker databases¹ + custom, n (%) | Direct smear: biomarker assignment, n (%) | Direct smear: classifier models, n (%) | Ethanol: Bruker databases¹, n (%) | Ethanol: custom databases, n (%) | Ethanol: Bruker databases¹ + custom, n (%) | Ethanol: biomarker assignment, n (%) | Ethanol: classifier models, n (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Genus level | | | | | | | | | | |
| Shigella | 19 (9) | 205 (94) | 205 (94) | 10 (5) | 209 (96) | 12 (6) | 207 (95) | 205 (94) | 15 (7) | 217 (100) |
| Escherichia | 49 (94) | 26 (50) | 29 (56) | NA | 11 (21) | 47 (90) | 35 (67) | 37 (71) | NA | 4 (8) |
| Unassigned² | 2 (1) | 1 (0.4) | 3 (1) | 257 (96) | 0 (0) | 1 (0.4) | 0 (0) | 3 (1) | 250 (93) | 0 (0) |
| Pathotype level | | | | | | | | | | |
| Shigella/EIEC | NA | 233 (94) | 241 (97) | 217 (88) | 145 (58) | NA | 245 (99) | 242 (98) | 225 (92) | 147 (59) |
| Other | 21 (100) | 6 (29) | 10 (48) | NA | 14 (67) | 21 (100) | 11 (52) | 13 (62) | NA | 6 (29) |
| Unassigned² | 2 | 1 (0.4) | 3 (1) | 46 (17) | 0 (0) | 0 (0) | 0 (0) | 3 (1) | 27 (10) | 0 (0) |
| Group level | | | | | | | | | | |
| Shigella spp. | 19 (9) | 205 (94) | 205 (94) | 193 (89) | 131 (60) | 12 (6) | 207 (95) | 205 (94) | 195 (90) | 134 (62) |
| EIEC (n = 31) | NA | 9 (29) | 8 (26) | NA | 2 (6) | NA | 19 (61) | 19 (61) | NA | 0 (0) |
| Other | 21 (100) | 6 (29) | 10 (48) | NA | 13 (62) | 21 (100) | 11 (52) | 13 (62) | NA | 7 (33) |
| Unassigned² | 2 (1) | 1 (0.4) | 3 (1) | 49 (23) | 0 (0) | 1 (0.4) | 0 (0) | 3 (1) | 36 (13) | 0 (0) |
| Species level | | | | | | | | | | |
| S. dysenteriae | 5 (45) | 5 (45) | 5 (45) | 0 (0) | 0 (0) | 4 (36) | 7 (64) | 6 (55) | 0 (0) | 0 (0) |
| S. flexneri | NA | 70 (91) | 70 (91) | 24 (31) | 6 (8) | NA | 73 (95) | 73 (95) | 30 (39) | 3 (4) |
| S. boydii | NA | 1 (7) | 0 (0) | 0 (0) | 0 (0) | NA | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
| S. sonnei | NA | 110 (96) | 110 (96) | 113 (98) | 92 (80) | NA | 112 (97) | 112 (97) | 108 (94) | 101 (88) |
| EIEC (n = 31) | NA | 9 (29) | 8 (26) | 0 (0) | 1 (3) | NA | 19 (61) | 19 (61) | 1 (3) | 3 (10) |
| Other | 21 (100) | 6 (29) | 10 (48) | 0 (0) | 12 (57) | 21 (100) | 11 (52) | 13 (62) | 0 (0) | 4 (19) |
| Unassigned² | 2 (1) | 1 (0.4) | 3 (1) | 85 (32) | 0 (0) | 1 (0.4) | 0 (0) | 3 (1) | 97 (36) | 0 (0) |

NA = not applicable, as no discriminating peaks were assigned to these classes. ¹ Bruker MALDI Biotyper database (V8.0.0.0) and the Bruker Security-Relevant Library (V1.0.0.0). ² Number of isolates that could not be assigned to a class.
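Each n (%) cell gives the number of correctly assigned test isolates and that number as a percentage of the class size. The snippet recomputes a few species-level cells as a consistency check, assuming the denominators are the test-set totals from the isolate table (77 S. flexneri, 115 S. sonnei, 31 EIEC) and round-half-up rounding; neither assumption is stated in the results table itself, and the helper names are hypothetical.

```python
from __future__ import annotations

# (test-set size, {column: (count, reported %)}), copied from the isolate table
# and the species-level rows of the results table. Using the isolate-table
# totals as denominators is an assumption.
SPECIES_LEVEL = {
    "S. flexneri": (77, {"custom DB, direct smear": (70, 91),
                         "custom DB, ethanol": (73, 95),
                         "biomarker assignment, direct smear": (24, 31)}),
    "S. sonnei": (115, {"custom DB, direct smear": (110, 96),
                        "biomarker assignment, direct smear": (113, 98),
                        "classifier models, ethanol": (101, 88)}),
    "EIEC": (31, {"custom DB, direct smear": (9, 29),
                  "custom DB, ethanol": (19, 61)}),
}


def percent_half_up(count: int, total: int) -> int:
    """100 * count / total rounded half up, using integer arithmetic only."""
    return (200 * count + total) // (2 * total)


def mismatches(table) -> list[str]:
    """List every cell whose reported percentage differs from the recomputed one."""
    problems = []
    for cls, (total, cells) in table.items():
        for column, (count, reported) in cells.items():
            recomputed = percent_half_up(count, total)
            if recomputed != reported:
                problems.append(f"{cls} / {column}: {count}/{total} = {recomputed}%, table says {reported}%")
    return problems


print(mismatches(SPECIES_LEVEL))  # an empty list means every checked cell is consistent
```

Integer arithmetic avoids floating-point surprises at exact halves, where Python's built-in `round()` would round to the nearest even number instead.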
Figure 4. Principal component analysis (PCA) of isolates in the training set. (a) Colored at genus level: beige = Shigella, teal = Escherichia; (b) Colored at pathotype level: black = Shigella/EIEC, green = E. coli (other than EIEC); (c) Colored at group level: orange = Shigella spp., yellow = EIEC, purple = E. coli (other than EIEC); (d) Colored at species level: light blue = S. dysenteriae, red = S. flexneri, green = S. boydii, pink = S. sonnei, blue = EIEC, light grey = Other E. coli.
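PCA projects each isolate's spectrum onto the orthogonal directions of largest variance, so panels (a) to (d) show the same score plot colored by different class labels. A dependency-free sketch, assuming spectra have already been binned into equal-length intensity vectors (a hypothetical preprocessing step not described here). It finds components by power iteration with deflation, which suits small toy inputs; a real analysis would typically use an SVD routine from a numerical library.

```python
from __future__ import annotations

import math
import random


def pca_scores(rows: list[list[float]], n_components: int = 2,
               iters: int = 500, seed: int = 0) -> list[list[float]]:
    """Project each row (one isolate's binned intensities) onto the leading
    principal components, found by power iteration on the covariance matrix
    with deflation after each component.

    Assumes the data have at least `n_components` directions with nonzero variance.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(x[i] * x[j] for x in centered) / (n - 1) for j in range(d)] for i in range(d)]

    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(d)]  # random positive start vector
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        # Deflation: remove the found component so the next pass finds the next one.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(x[j] * c[j] for j in range(d)) for c in components] for x in centered]
```

The sign of each component is arbitrary, so two PCA implementations can produce mirror-image plots of the same data; only the relative positions of the isolates carry meaning.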