Cathrin Veenaas, Anna Linusson, Peter Haglund.
Abstract
Comprehensive two-dimensional (2D) gas chromatography (GC×GC) coupled to mass spectrometry (GC×GC-MS), which enhances selectivity compared to GC-MS analysis, can be used for non-directed analysis (non-target screening) of environmental samples. Additional tools that aid in identifying unknown compounds are needed to handle the large amount of data generated. These tools include retention indices, which characterize the relative retention of compounds, and methods for predicting them. In this study, two quantitative structure-retention relationship (QSRR) approaches for prediction of retention times (1tR and 2tR) and indices (linear retention indices (LRIs) and a new polyethylene glycol-based retention index (PEG-2I)) in GC×GC were explored, and their predictive power was compared. In the first method, molecular descriptors combined with partial least squares (PLS) analysis were used to predict times and indices. In the second method, the commercial software package ChromGenius (ACD/Labs), based on a "federation of local models," was employed. Overall, the PLS approach exhibited better accuracy than the ChromGenius approach. Although average errors for the LRI prediction via ChromGenius were slightly lower, PLS was superior in all other cases. The average deviations between the predicted and the experimental value were 5% and 3% for the 1tR and LRI, and 5% and 12% for the 2tR and PEG-2I, respectively. These results are comparable to or better than those reported in previous studies. Finally, the developed model was successfully applied to an independent dataset and led to the discovery of 12 wrongly assigned compounds. The results of the present work represent the first-ever prediction of the PEG-2I.
Keywords: Federation of local models; GC × GC; Non-target analysis; Partial least squares (PLS); Quantitative structure–retention relationship (QSRR); Retention-time prediction
Year: 2018 PMID: 30361914 PMCID: PMC6244764 DOI: 10.1007/s00216-018-1415-x
Source DB: PubMed Journal: Anal Bioanal Chem ISSN: 1618-2642 Impact factor: 4.142
Root-mean-square error of prediction (RMSEP) for PLS models of varying complexities using the test set and the lowest and highest measured values of each response
| Descriptors used in model | RMSEP*, 1tR (s) | RMSEP*, LRI | RMSEP*, 2tR (s) | RMSEP*, PEG-2I |
|---|---|---|---|---|
| MOE only | 142 (6) | 116 (8) | 0.37 (8) | 15.3 (8) |
| MOE and Percepta | 118 (7) | 104 (6) | 0.33 (6) | 13.2 (7) |
| MOE, Percepta, and manually transformed descriptors | | | 0.33 (6) | 13.3 (7) |
| MOE, Percepta, manually transformed, and normalized-to-weight descriptors | 119 (7) | 106 (7) | | |
| All, auto-transformed | 114 (9) | 107 (8) | 0.32 (7) | 20.2 (8) |
| All, except those with high uncertainty | 123 (6) | 114 (6) | 0.32 (7) | 14.4 (7) |
| All, except those of low importance | 132 (6) | 108 (8) | 0.30 (7) | 14.2 (8) |
| All, except those with high uncertainty and low importance (stepwise removal) | 145 (4) | – | 0.41 (2) | – |
| Lowest measured value | 270 | 808 | 1.68 | 0 |
| Highest measured value | 3325 | 3413 | 6.62 | 215.1 |
* The number of PLS components is shown in parentheses. The italicized values indicate the model with the lowest RMSEP for each response. 1tR, LRI, 2tR, and PEG-2I are the first-dimension retention time, linear retention index, second-dimension retention time, and polyethylene glycol–based second-dimension retention index, respectively
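The RMSEP values above can be read against the lowest and highest measured values listed in the table. The following is a minimal sketch, assuming the standard definition of RMSEP (square root of the mean squared difference between predicted and experimental values); the function names and the example LRI values are hypothetical and are not taken from the study.

```python
import math


def rmsep(predicted, experimental):
    """Root-mean-square error of prediction for paired values."""
    if len(predicted) != len(experimental) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def fraction_of_range(error, lowest, highest):
    """Express an error as a fraction of the measured range of a response."""
    return error / (highest - lowest)


# Hypothetical LRI values, for illustration only (not data from the study).
experimental_lri = [1000.0, 1500.0, 2000.0, 2500.0]
predicted_lri = [1100.0, 1400.0, 2100.0, 2400.0]
example_rmsep = rmsep(predicted_lri, experimental_lri)

# Values from the table: LRI RMSEP of 104 ("MOE and Percepta"),
# with measured LRIs spanning 808 to 3413.
lri_fraction = fraction_of_range(104, 808, 3413)

print(f"example RMSEP: {example_rmsep:.1f}")
print(f"LRI RMSEP as a fraction of the measured range: {lri_fraction:.3f}")
```

Expressed this way, an LRI RMSEP of 104 corresponds to roughly 4% of the measured LRI range.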
Root-mean-square error of prediction (RMSEP) for each model optimization step with ChromGenius using the test set and lowest and highest measured values of each response
| Model settings | RMSEP*, 1tR (s) | RMSEP*, LRI | RMSEP*, 2tR (s) | RMSEP*, PEG-2I |
|---|---|---|---|---|
| Dice coefficient (25 compounds) | | 93 | 0.28 | 14.6 |
| Dice coefficient (20 compounds) | 160 | | 0.32 | 17.2 |
| Euclidean distance (25 compounds) | 196 | 105 | | |
| Euclidean distance (20 compounds) | 204 | 98 | 0.29 | 15.5 |
| Best model setting, no Abraham parameters | 176 | 155 | 0.34 | 17.5 |
| Best model setting, only Abraham parameters | | 153 | 0.34 | 22.6 |
| Three instead of four molecules used per parameter | 160 | 135 | | 18.1 |
| Lowest measured value | 270 | 808 | 1.68 | 0 |
| Highest measured value | 3325 | 3413 | 6.62 | 215.1 |
* The italic and the bold values indicate the results of the final model and the best model from the first step, respectively. 1tR, LRI, 2tR, and PEG-2I are the first-dimension retention time, linear retention index, second-dimension retention time, and polyethylene glycol–based second-dimension retention index, respectively
Fig. 1. Predicted vs. experimental values for the external validation set using PLS. 1tR, LRI, 2tR, and PEG-2I are the first-dimension retention time, linear retention index, second-dimension retention time, and polyethylene glycol-based second-dimension retention index, respectively
Fig. 2. Predicted vs. experimental values for the external validation set using ChromGenius. 1tR, LRI, 2tR, and PEG-2I are the first-dimension retention time, linear retention index, second-dimension retention time, and polyethylene glycol-based second-dimension retention index, respectively
Prediction errors (RMSEP) and average deviations of the predicted values from the experimental values for all four models, built with PLS and ChromGenius, on the external validation set and the test set
| | External validation set* 1tR | External validation set* LRI | External validation set* 2tR | External validation set* PEG-2I | Test set* 1tR | Test set* LRI | Test set* 2tR | Test set* PEG-2I |
|---|---|---|---|---|---|---|---|---|
| PLS | | | | | | | | |
| Average relative deviation | 5% | 4% | 5% | 12% | 7% | 5% | 6% | 16% |
| Average deviation | 80 s | 74 | 0.19 s | 7.8 | 85 s | 74 | 0.20 s | 8.2 |
| RMSEP | 109 s | 95 | 0.27 s | 11.3 | 121 s | 105 | 0.29 s | 12.2 |
| ChromGenius | | | | | | | | |
| Average relative deviation | 6% | 3% | 4% | 12% | 9% | 3% | 5% | 17% |
| Average deviation | 115 s | 60 | 0.16 s | 7.8 | 124 s | 57 | 0.17 s | 9.2 |
| RMSEP | 143 s | 85 | 0.23 s | 11.8 | 158 s | 84 | 0.26 s | 14.5 |
* 1tR, LRI, 2tR, and PEG-2I are the first-dimension retention time, linear retention index, second-dimension retention time, and polyethylene glycol–based second-dimension retention index, respectively. The results for the PEG-2I models include compounds that were extrapolated due to a narrow PEG range
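The two deviation measures in the table can be illustrated with a short sketch. It assumes that "average deviation" is the mean absolute difference between predicted and experimental values and that "average relative deviation" is the mean of those differences divided by the experimental value; the source does not spell out the formulas, so both definitions, the function names, and the example values are assumptions.

```python
def average_deviation(predicted, experimental):
    """Mean absolute difference between predicted and experimental values."""
    if len(predicted) != len(experimental) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / len(predicted)


def average_relative_deviation(predicted, experimental):
    """Mean of |predicted - experimental| / |experimental| (assumed definition)."""
    if len(predicted) != len(experimental) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    ratios = []
    for p, e in zip(predicted, experimental):
        if e == 0:
            # A zero experimental value makes the ratio undefined; how such
            # cases were handled in the study is not stated in the source.
            raise ValueError("relative deviation is undefined for a zero experimental value")
        ratios.append(abs(p - e) / abs(e))
    return sum(ratios) / len(ratios)


# Hypothetical first-dimension retention times in seconds (illustration only).
experimental_1tr = [400.0, 1000.0, 2000.0]
predicted_1tr = [420.0, 950.0, 2100.0]

mean_abs = average_deviation(predicted_1tr, experimental_1tr)
mean_rel = average_relative_deviation(predicted_1tr, experimental_1tr)
print(f"average deviation: {mean_abs:.1f} s")
print(f"average relative deviation: {mean_rel:.1%}")
```

Because the relative measure divides by the experimental value, responses with values near zero (the lowest measured PEG-2I is 0) can inflate it, which is one possible reading of the larger percentages reported for that index.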
95th percentiles defining the range of error associated with the predictions of each final model
| | 1tR (s) | LRI | 2tR (s) | PEG-2I |
|---|---|---|---|---|
| PLS | 214 | 189 | 0.53 | 21.0 |
| ChromGenius | 258 | 160 | 0.48 | 23.6 |
| Average | 195 | 140 | 0.41 | 19.7 |
1tR, LRI, 2tR, and PEG-2I are the first-dimension retention time, linear retention index, second-dimension retention time, and polyethylene glycol–based second-dimension retention index, respectively
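A 95th-percentile error can serve as a tolerance window when checking tentative identifications against predicted values. The sketch below is one possible way to do this, not the procedure used in the study: it assumes a nearest-rank percentile of absolute errors, and the helper names, compound labels, and example values are all hypothetical. Only the window of 189 LRI units is taken from the table (PLS, LRI).

```python
def nearest_rank_percentile(values, percent):
    """Nearest-rank percentile: smallest value with at least percent% of data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ordered = sorted(values)
    rank = -(-percent * len(ordered) // 100)  # ceiling of percent * n / 100
    return ordered[rank - 1]


def flag_suspect_assignments(candidates, window):
    """Return names whose experimental value lies outside predicted +/- window."""
    return [name for name, predicted, experimental in candidates
            if abs(predicted - experimental) > window]


# Hypothetical absolute prediction errors (illustration only): 10, 20, ..., 200.
abs_errors = [10.0 * i for i in range(1, 21)]
window_from_errors = nearest_rank_percentile(abs_errors, 95)

# Window from the table: 95th percentile of the PLS LRI prediction error.
PLS_LRI_WINDOW = 189

# Hypothetical tentative identifications: (name, predicted LRI, experimental LRI).
candidates = [
    ("compound A", 1500.0, 1550.0),
    ("compound B", 1800.0, 2100.0),
]
suspects = flag_suspect_assignments(candidates, PLS_LRI_WINDOW)
print(window_from_errors, suspects)
```

Other percentile conventions (for example, linear interpolation between ranks) give slightly different windows on small datasets, so the choice should be stated alongside the result.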