Karin Höjer Holmgren, Lina Mörén, Linnea Ahlinder, Andreas Larsson, Daniel Wiktelius, Rikard Norlin, Crister Åstot.
Abstract
Route determination of sulfur mustard was accomplished through comprehensive nontargeted screening of chemical attribution signatures (CAS). Sulfur mustard samples prepared via 11 different synthetic routes were analyzed using gas chromatography/high-resolution mass spectrometry. A large number of compounds were detected, and multivariate data analysis of the mass spectrometric results enabled the discovery of route-specific signature profiles. The performance of two supervised machine learning algorithms for retrospective synthetic route attribution, orthogonal partial least squares discriminant analysis (OPLS-DA) and random forest (RF), was compared using external test sets. Complete classification accuracy was achieved for the test set samples (2/2 and 9/9, respectively) when classification models were used to resolve the one-step routes starting from ethylene and the thiodiglycol chlorination methods used in the two-step routes. Retrospective determination of the initial thiodiglycol synthesis method in sulfur mustard samples after chlorination was more difficult. Nevertheless, the large number of markers detected using the nontargeted methodology enabled correct assignment of 5/9 test set samples using OPLS-DA and 8/9 using RF. RF was also used to construct an 11-class model with a total classification accuracy of 10/11. The developed methods were further evaluated by classifying sulfur mustard spiked into soil and textile matrix samples. Because of matrix effects and the low spiking level (0.05% w/w), route determination was more challenging in these cases. Nevertheless, acceptable classification performance was achieved during external test set validation: chlorination methods were correctly classified for 12/18 spiked soil samples and 11/15 spiked textile samples.
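The TDG-synthesis figures quoted above (5/9 for OPLS-DA, 8/9 for RF) are pooled over the three M3 models. A minimal Python sketch of that tally, using the per-class correct/total counts as transcribed from the crude test-set table in this record; the dictionary layout and the `tally` helper are illustrative and not part of the original work:

```python
# Correct/total test-set predictions per route for models M3a-M3c (crude HD),
# transcribed from the model-characteristics table in this record.
OPLS_DA = {
    "R1": (1, 1), "R4": (0, 1), "R7": (0, 1),  # M3a
    "R2": (1, 1), "R5": (1, 1), "R8": (0, 1),  # M3b
    "R3": (1, 1), "R6": (0, 1), "R9": (1, 1),  # M3c
}
RF = {
    "R1": (1, 1), "R4": (1, 1), "R7": (1, 1),  # M3a
    "R2": (1, 1), "R5": (1, 1), "R8": (1, 1),  # M3b
    "R3": (1, 1), "R6": (0, 1), "R9": (1, 1),  # M3c
}


def tally(counts):
    """Pool per-class (correct, total) pairs into one overall (correct, total)."""
    correct = sum(c for c, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return correct, total


print("OPLS-DA:", tally(OPLS_DA))  # OPLS-DA: (5, 9)
print("RF:", tally(RF))            # RF: (8, 9)
```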
Year: 2021 PMID: 33709707 PMCID: PMC8041246 DOI: 10.1021/acs.analchem.0c04555
Source DB: PubMed Journal: Anal Chem ISSN: 0003-2700 Impact factor: 6.986
Figure 1. Schematic overview of sulfur mustard (HD) synthesis routes. Two-step routes (R1–R9) proceed via the intermediate thiodiglycol (TDG routes). In the one-step ethylene routes (R10 and R11), HD is produced directly from ethylene.
Overview of Samples

| synthesis route | crude HD samples: training set | crude HD samples: test set | spiked matrix samplesᵃ: training set (soil + textile)ᵇ | spiked matrix samplesᵃ: test set (soil + textile) |
|---|---|---|---|---|
| R1 | 4 | 1 | 6 + 6 | 2 + 2 |
| R2 | 4 | 1 | 6 + 6 | 2 + 2 |
| R3 | 4 | 1 | 6 + 6 | 2 + 2 |
| R4 | 4 | 1 | 6 + 6 | 2 + 2 |
| R5 | 4 | 1 | 6 + 6 | 2 + 2 |
| R6 | 4 | 1 | 6 + 6 | 2 + 2 |
| R7 | 4 | 1 | 6 + 6 | 2 + 2 |
| R8 | 4 | 1 | 6 + 6 | 2 + 2 |
| R9 | 4 | 1 | 6 + 6 | 2 + 2 |
| R10 | 4 | 1ᶜ | 6 + 6 | 2 + 2 |
| R11 | 4 | 1ᶜ | 6 + 6 | 2 + 2 |

ᵃ Soil and textile matrices spiked with crude HD.
ᵇ Pooled crude samples spiked in triplicate on two occasions.
ᶜ Pooled crude sample.
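The per-route counts in this table multiply out to the overall design sizes. A short Python sketch of that arithmetic, assuming (as the footnotes suggest) that "6 + 6" and "2 + 2" denote soil + textile replicates; the totals are counted before the PCA outlier exclusion described under the matrix-model table, and the names used are illustrative:

```python
# Per-route sample counts from the sample overview table (routes R1-R11).
N_ROUTES = 11
CRUDE = {"train": 4, "test": 1}
# Assumption: "6 + 6" and "2 + 2" are soil + textile counts.
SPIKED = {
    "train": {"soil": 6, "textile": 6},  # triplicates on two occasions per matrix
    "test": {"soil": 2, "textile": 2},
}


def design_totals():
    """Total number of samples in each set, summed over all routes."""
    return {
        "crude train": N_ROUTES * CRUDE["train"],
        "crude test": N_ROUTES * CRUDE["test"],
        "spiked train": N_ROUTES * sum(SPIKED["train"].values()),
        "spiked test": N_ROUTES * sum(SPIKED["test"].values()),
    }


print(design_totals())
# {'crude train': 44, 'crude test': 11, 'spiked train': 132, 'spiked test': 44}
```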
Figure 3. Flowchart illustrating sample use and workflow.
Figure 2. Hierarchical decision tree in which the first model (M1) differentiates between ethylene-route and TDG-route samples, the second model (M2) differentiates between the three chlorination methods, and models M3a–M3c differentiate between the methods of TDG synthesis. This model tree was applied to the datasets for both the crude HD samples and the spiked matrix samples.
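The model tree can be read as a cascade: a sample reaches a more specific model only after the level above has assigned it. A minimal Python sketch, assuming each trained model is available as a callable that returns a class label; `classify_route` and the stub models are hypothetical stand-ins, not the authors' OPLS-DA or RF models:

```python
from typing import Callable, Dict, Sequence

Features = Sequence[float]
Model = Callable[[Features], str]  # hypothetical: a trained classifier returning a label

ETHYLENE = "R(10, 11)"


def classify_route(x: Features, m1: Model, m2: Model, m3: Dict[str, Model]) -> str:
    """Walk one sample down the M1 -> M2 -> M3a/b/c decision tree."""
    # Level 1 (M1): ethylene routes vs. TDG routes R(1-9).
    if m1(x) == ETHYLENE:
        return ETHYLENE
    # Level 2 (M2): chlorination method, i.e. one group of three TDG routes.
    group = m2(x)
    # Level 3 (M3a-M3c): TDG synthesis method within that group.
    return m3[group](x)


# Toy stand-in models for illustration only; they key on made-up feature values.
def m1_stub(x: Features) -> str:
    return ETHYLENE if x[0] > 0.5 else "R(1-9)"


GROUPS = ["R(1, 4, 7)", "R(2, 5, 8)", "R(3, 6, 9)"]


def m2_stub(x: Features) -> str:
    return GROUPS[int(x[1]) % 3]


def make_m3_stub(routes: Sequence[str]) -> Model:
    return lambda x: routes[int(x[2]) % 3]


m3_stubs = {
    "R(1, 4, 7)": make_m3_stub(["R1", "R4", "R7"]),
    "R(2, 5, 8)": make_m3_stub(["R2", "R5", "R8"]),
    "R(3, 6, 9)": make_m3_stub(["R3", "R6", "R9"]),
}

print(classify_route([0.9, 0, 0], m1_stub, m2_stub, m3_stubs))  # R(10, 11)
print(classify_route([0.1, 1, 2], m1_stub, m2_stub, m3_stubs))  # R8
```

One consequence of this design is that a sample misassigned by M1 or M2 cannot be corrected by the downstream M3 models, so the lower levels inherit the errors of the upper ones.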
Characteristics of Classification Models and Correctly Predicted Crude Test Set Samples

| classification model | attribution capacity | class | OPLS-DA comp.ᵃ | OPLS-DA R²X(cum) | OPLS-DA R²Y(cum) | OPLS-DA Q²(cum) | OPLS-DA prediction | RF OOB error (%) | RF prediction |
|---|---|---|---|---|---|---|---|---|---|
| M1crude | ethylene or TDG routes | R(1–9) | 1 + 2 + 0 | 0.72 | 0.99 | 0.95 | 9/9 | 2.3 | 9/9 |
| | | R(10, 11) | | | | | 2/2 | | 2/2 |
| M2crude | chlorination methods | R(1, 4, 7) | 2 + 2 + 0 | 0.61 | 0.98 | 0.95 | 3/3 | 0.0 | 3/3 |
| | | R(2, 5, 8) | | | | | 3/3 | | 3/3 |
| | | R(3, 6, 9) | | | | | 3/3 | | 3/3 |
| M3acrude | TDG synthesis methods of R(1, 4, 7) samples | R1 | 2 + 2 + 0 | 0.70 | 0.98 | 0.90 | 1/1 | 33.3 | 1/1 |
| | | R4 | | | | | 0/1 | | 1/1 |
| | | R7 | | | | | 0/1 | | 1/1 |
| M3bcrude | TDG synthesis methods of R(2, 5, 8) samples | R2 | 2 + 1 + 0 | 0.63 | 0.98 | 0.88 | 1/1 | 16.7 | 1/1 |
| | | R5 | | | | | 1/1 | | 1/1 |
| | | R8 | | | | | 0/1 | | 1/1 |
| M3ccrude | TDG synthesis methods of R(3, 6, 9) samples | R3 | 2 + 1 + 0 | 0.55 | 0.97 | 0.91 | 1/1 | 8.3 | 1/1 |
| | | R6 | | | | | 0/1 | | 0/1 |
| | | R9 | | | | | 1/1 | | 1/1 |

ᵃ Comp. shows the number of components (x/y joint predictive variation + variation in x orthogonal to y + variation in y orthogonal to x) included in each model.
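R²Y(cum) and Q²(cum) in these tables share the form 1 − (residual sum of squares)/(total sum of squares). R²Y uses the fitted values, whereas Q² uses predictions for samples left out during cross-validation (the predictive residual sum of squares, PRESS), which makes Q² the more conservative measure of predictive ability; R²X(cum) is the analogous explained fraction for the X block. A minimal Python sketch for a single 0/1 class-membership Y column with made-up numbers; it does not reproduce the component-wise cumulative calculation that OPLS-DA software may use:

```python
def explained_fraction(y, y_hat):
    """1 - SS(residual) / SS(total about the mean of y).

    With fitted values this is R^2; with cross-validated (held-out)
    predictions the residual sum of squares is PRESS and the result is Q^2.
    """
    mean_y = sum(y) / len(y)
    ss_total = sum((v - mean_y) ** 2 for v in y)
    ss_resid = sum((v - p) ** 2 for v, p in zip(y, y_hat))
    return 1.0 - ss_resid / ss_total


# Illustrative values only (not from the study): a 0/1 class-membership column.
y = [1.0, 1.0, 0.0, 0.0]
fitted = [0.95, 0.90, 0.05, 0.10]       # model fitted on all samples
cross_validated = [0.9, 0.8, 0.2, 0.1]  # each sample predicted by a model that excluded it

print(round(explained_fraction(y, fitted), 3))           # 0.975
print(round(explained_fraction(y, cross_validated), 3))  # 0.9
```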
Figure 4. Distribution of the 1097 potential chemical attribution signatures (CAS) found in crude HD and spiked matrices.
Classification Model Characteristics and Correctly Predicted Test Set Matrix Samplesᵇ

| classification model | attribution capacity | class | OPLS-DA comp.ᵃ | OPLS-DA R²X(cum) | OPLS-DA R²Y(cum) | OPLS-DA Q²(cum) | OPLS-DA pred. soil | OPLS-DA pred. textile | RF OOB error (%) | RF pred. soil | RF pred. textile |
|---|---|---|---|---|---|---|---|---|---|---|---|
| M1matrix | ethylene or TDG routes | R(10, 11) | 1 + 2 + 0 | 0.45 | 0.99 | 0.97 | - | - | 0 | - | - |
| | | R(1–9) | | | | | | | | | |
| M2matrix | chlorination methods | R(1, 4, 7) | 2 + 3 + 0 | 0.46 | 0.97 | 0.8 | 5/6 | 4/4 | 0 | 5/6 | 4/4 |
| | | R(2, 5, 8) | | | | | 5/6 | 5/5 | | 5/6 | 5/5 |
| | | R(3, 6, 9) | | | | | 2/6 | 2/6 | | 2/6 | 2/6 |
| M3amatrix | TDG synthesis methods of R(1, 4, 7) samples | R1 | 2 + 2 + 0 | 0.75 | 0.93 | 0.66 | 2/2 | 0/1 | 2.9 | 2/2 | 1/1 |
| | | R4 | | | | | 0/2 | 0/1 | | 0/2 | 0/1 |
| | | R7 | | | | | 2/2 | 2/2 | | 2/2 | 2/2 |
| M3bmatrix | TDG synthesis methods of R(2, 5, 8) samples | R2 | 2 + 3 + 0 | 0.80 | 0.97 | 0.91 | 2/2 | 1/1 | 2.8 | 0/2 | 0/1 |
| | | R5 | | | | | 2/2 | 2/2 | | 2/2 | 2/2 |
| | | R8 | | | | | 2/2 | 2/2 | | 2/2 | 2/2 |
| M3cmatrix | TDG synthesis methods of R(3, 6, 9) samples | R3 | 2 + 0 + 0 | 0.73 | 0.64 | 0.26 | 0/2 | 0/2 | 0 | 0/2 | 0/2 |
| | | R6 | | | | | 2/2 | 2/2 | | 2/2 | 2/2 |
| | | R9 | | | | | 2/2 | 0/2 | | 2/2 | 2/2 |

ᵃ Comp. shows the number of components (x/y joint predictive variation + variation in x orthogonal to y + variation in y orthogonal to x) included in each model.
ᵇ In the initial PCA model, two outliers were detected among the training set samples and three among the test set samples; these were excluded from further analysis.
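The chlorination-method accuracies quoted in the abstract for spiked samples (12/18 soil, 11/15 textile) are pooled from the three M2matrix class rows of this table. A short Python sketch of that pooling, with counts transcribed from the table (OPLS-DA and RF give identical counts for this model); the names are illustrative:

```python
# (correct, total) per chlorination class for M2matrix, transcribed from the table;
# classes R(1, 4, 7), R(2, 5, 8), R(3, 6, 9), in that order.
M2_MATRIX = {
    "OPLS-DA": {"soil": [(5, 6), (5, 6), (2, 6)], "textile": [(4, 4), (5, 5), (2, 6)]},
    "RF": {"soil": [(5, 6), (5, 6), (2, 6)], "textile": [(4, 4), (5, 5), (2, 6)]},
}


def pooled(pairs):
    """Sum per-class (correct, total) pairs into one (correct, total)."""
    return sum(c for c, _ in pairs), sum(t for _, t in pairs)


for method, by_matrix in M2_MATRIX.items():
    for matrix, pairs in by_matrix.items():
        correct, total = pooled(pairs)
        print(f"{method} {matrix}: {correct}/{total} ({correct / total:.0%})")
# soil: 12/18 (67%), textile: 11/15 (73%), identical for OPLS-DA and RF
```

Pooling hides where the errors fall: most misclassifications come from the R(3, 6, 9) class, which scores 2/6 in both matrices.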
Prediction Performance of 11-Class RF Models for Crude HD (M4crude) and Spiked Matrices (M4matrix)ᵃ

| model | OOB error (%) | test set samples: crude HD | test set samples: spiked soil | test set samples: spiked textile |
|---|---|---|---|---|
| M4crude | 20.5 | 10/11 | - | - |
| M4matrix | 0.8 | - | | |

ᵃ Number of correct/total predictions.
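The OOB error (%) reported for the RF models is estimated without a separate test set: each training sample is scored only by the ensemble members whose bootstrap sample did not contain it, and the error is the fraction of samples whose majority out-of-bag vote is wrong. A minimal stdlib-Python sketch of that bookkeeping for a generic bagged ensemble, assuming a hypothetical `fit_predict(in_bag, oob)` learner interface; the toy 1-nearest-neighbour learner is a swapped-in stand-in for a decision tree, and real RF implementations additionally subsample features at each split:

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical learner interface: train on in-bag indices, return labels for OOB indices.
FitPredict = Callable[[List[int], List[int]], List[str]]


def oob_error(labels: Sequence[str], n_models: int, fit_predict: FitPredict, seed: int = 0) -> float:
    """Out-of-bag error of a bagged ensemble, as a fraction (x100 for %)."""
    rng = random.Random(seed)
    n = len(labels)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_models):
        in_bag = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        drawn = set(in_bag)
        oob = [i for i in range(n) if i not in drawn]  # samples this member never saw
        if not oob:
            continue
        for i, pred in zip(oob, fit_predict(in_bag, oob)):
            votes[i][pred] += 1
    scored = [i for i in range(n) if votes[i]]
    if not scored:
        raise ValueError("no sample was ever out of bag; increase n_models")
    wrong = sum(votes[i].most_common(1)[0][0] != labels[i] for i in scored)
    return wrong / len(scored)


# Toy one-feature data (illustrative only) and a 1-nearest-neighbour stand-in learner.
X = [0.1, 0.2, 0.3, 1.1, 1.2, 1.3]
y = ["A", "A", "A", "B", "B", "B"]


def nearest_neighbour(in_bag: List[int], oob: List[int]) -> List[str]:
    return [y[min(in_bag, key=lambda j: abs(X[j] - X[i]))] for i in oob]


print(f"OOB error: {100 * oob_error(y, 50, nearest_neighbour):.1f}%")
```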
Prediction Performance of OPLS-DA and RF Classification Models for Chlorination Methods in Spiked Matrices (M2matrix)ᵃ

| true class | OPLS-DA predicted: R1, 4, 7 | OPLS-DA predicted: R2, 5, 8 | OPLS-DA predicted: R3, 6, 9 | OPLS-DA predicted: no class | RF predicted: R1, 4, 7 | RF predicted: R2, 5, 8 | RF predicted: R3, 6, 9 |
|---|---|---|---|---|---|---|---|
| R1, 4, 7 | 1 | 1 | |||||
| R2, 5, 8 | 1 | 3 | 1 | ||||
| R3, 6, 9 | 1 | 7 | 8 | ||||
ᵃ Number of correct predictions.
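The OPLS-DA part of this table includes a "no class" column for samples that were not assigned to any chlorination class, whereas a majority-vote RF always returns some class. A minimal Python sketch of how such a confusion table can be built, assuming a simple rule in which a sample is assigned to the class with the highest predicted Y value only if that value reaches a threshold; the 0.5 threshold, the `assign` helper and the example predictions are hypothetical, not the authors' settings:

```python
from collections import Counter
from typing import Dict, List

CLASSES = ["R1, 4, 7", "R2, 5, 8", "R3, 6, 9"]
NO_CLASS = "no class"


def assign(y_pred: Dict[str, float], threshold: float = 0.5) -> str:
    """Pick the class with the highest predicted Y, or 'no class' if it is below threshold."""
    best = max(y_pred, key=y_pred.get)
    return best if y_pred[best] >= threshold else NO_CLASS


def confusion(true: List[str], pred: List[str]) -> Dict[str, Dict[str, int]]:
    """Rows: true class; columns: predicted class plus the 'no class' bin."""
    counts = Counter(zip(true, pred))
    return {t: {p: counts[(t, p)] for p in CLASSES + [NO_CLASS]} for t in CLASSES}


# Hypothetical predicted Y values for three test samples (illustrative only).
predictions = [
    {"R1, 4, 7": 0.8, "R2, 5, 8": 0.1, "R3, 6, 9": 0.1},
    {"R1, 4, 7": 0.2, "R2, 5, 8": 0.3, "R3, 6, 9": 0.4},
    {"R1, 4, 7": 0.1, "R2, 5, 8": 0.2, "R3, 6, 9": 0.7},
]
truth = ["R1, 4, 7", "R3, 6, 9", "R2, 5, 8"]
assigned = [assign(p) for p in predictions]
print(assigned)  # ['R1, 4, 7', 'no class', 'R3, 6, 9']
print(confusion(truth, assigned)["R3, 6, 9"][NO_CLASS])  # 1
```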