Alessandra Biancolillo¹, Maria Anna Maggi², Sebastian Bassi³, Federico Marini³, Angelo Antonio D'Archivio¹.
Abstract
Phenoxy acid herbicides are used worldwide and are potential contaminants of drinking water. Reversed-phase high-performance liquid chromatography (RP-HPLC) is commonly used to monitor phenoxy acid herbicides in water samples. The RP-HPLC retention of phenoxy acids is affected by both mobile phase composition and pH, but the combined (synergistic) effect of these two factors, which also depends on the structure and pKa of the solutes, cannot be easily predicted. In this paper, to support the setup of RP-HPLC analyses of phenoxy acids under linear mobile phase gradients, we modelled the simultaneous effect of molecular structure and elution conditions (pH, initial acetonitrile content of the eluent and gradient slope) on solute retention. In particular, the chromatographic conditions and the molecular descriptors computed for the analyzed compounds were used to estimate the retention factor k by Partial Least Squares (PLS) regression. Finally, a variable selection approach, Genetic Algorithms (GA), was used to reduce model complexity and ease interpretation. The PLS model, calibrated on the retention data of 15 solutes and subsequently tested on three external analytes, provided satisfactory and reliable results.
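The abstract describes predicting retention from elution conditions plus molecular descriptors with PLS regression. The sketch below is a minimal PLS1 (single-response) NIPALS implementation in NumPy, written for illustration only: the paper does not specify its algorithm or preprocessing, and the autoscaling of X, the function names `pls1_nipals`/`pls1_predict` and the synthetic data are all assumptions. In the study's setting, each row of `X` would presumably hold one solute under one elution condition (φi, ϕ and pH followed by that solute's descriptors) and `y` its retention response (log k, as in Figure 2).

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Minimal PLS1 via NIPALS (illustrative sketch).

    Assumption: X is autoscaled and y mean-centred inside the function.
    Returns coefficients for scaled X, scores T, loadings P, scaling info.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    x_mean, x_std = X.mean(axis=0), X.std(axis=0, ddof=1)
    x_std[x_std == 0] = 1.0            # guard against constant columns
    y_mean = y.mean()
    Xk, yk = (X - x_mean) / x_std, y - y_mean

    n, p = Xk.shape
    W, P = np.zeros((p, n_components)), np.zeros((p, n_components))
    T, q = np.zeros((n, n_components)), np.zeros(n_components)
    for a in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)         # weight vector
        t = Xk @ w                     # score vector (latent variable)
        tt = t @ t
        p_a, q_a = Xk.T @ t / tt, yk @ t / tt
        Xk = Xk - np.outer(t, p_a)     # deflate X
        yk = yk - q_a * t              # deflate y
        W[:, a], P[:, a], T[:, a], q[a] = w, p_a, t, q_a
    b = W @ np.linalg.solve(P.T @ W, q)   # b = W (P'W)^-1 q
    return b, T, P, (x_mean, x_std, y_mean)


def pls1_predict(X, b, scaling):
    """Predict y for new rows using the scaling learned at calibration."""
    x_mean, x_std, y_mean = scaling
    return ((np.asarray(X, dtype=float) - x_mean) / x_std) @ b + y_mean


# Hypothetical synthetic data: 40 samples, 5 predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 4.0
b, T, P, scaling = pls1_nipals(X, y, n_components=2)
y_hat = pls1_predict(X, b, scaling)
```

With as many components as predictors, PLS1 reproduces the ordinary least-squares fit; its usefulness here lies in keeping far fewer latent variables than there are correlated descriptors, with the number of components normally chosen by cross-validation.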
Keywords: HPLC; PLS regression; gradient elution; molecular descriptors; phenoxy acid herbicides; retention prediction
Year: 2020 PMID: 32168813 PMCID: PMC7144001 DOI: 10.3390/molecules25061262
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Molecular structure and pKa value (taken from the literature [38]) of the investigated acids.
| Name | pKa |
|---|---|
| 2,4-D | 2.73 |
| 2,4,5-T | 2.83 |
| 2,4,5-TP | 2.84 |
| Clopyralid | 2.29 |
| Dichlorprop | 3.10 |
| MCPA | 3.13 |
| Mecoprop | 3.10 |
| Triclopyr | 3.97 |
| Benzoic acid | 4.19 |
| Salicylic acid | 2.97 |
| 2-Iodobenzoic acid | 2.93 |
| 3-Iodobenzoic acid | 3.85 |
| 4-Iodobenzoic acid | 4.00 |
| Phenylacetic acid | 4.31 |
| 4-Chlorophenylacetic acid | 2.76 |
| 4-Nitrophenylacetic acid | 3.85 |
| Phenoxyacetic acid | 3.17 |
| 4-Chlorophenoxyacetic acid | 3.56 |
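The eluent pH levels (2, 3 and 4) bracket most of the tabulated pKa values, which is why pH and pKa interact in determining retention. As an illustration that is not part of the original analysis, the Henderson-Hasselbalch equation gives the fraction of each acid present in its neutral form, which is generally the more retained form in RP-HPLC. Treating every solute as a simple monoprotic acid is an assumption (clopyralid, for instance, also carries a pyridine nitrogen); the dictionary `PKA` is a hypothetical subset of the table.

```python
# Subset of pKa values copied from the table above
PKA = {
    "2,4-D": 2.73,
    "MCPA": 3.13,
    "Salicylic acid": 2.97,
    "Benzoic acid": 4.19,
}


def neutral_fraction(pH: float, pKa: float) -> float:
    """Fraction of a monoprotic acid in its un-ionized form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))


for name, pka in PKA.items():
    fractions = ", ".join(f"pH {pH}: {neutral_fraction(pH, pka):.2f}" for pH in (2, 3, 4))
    print(f"{name:15s} {fractions}")
```

At pH 2 benzoic acid is almost entirely neutral, while at pH 4 only about 5% of 2,4-D remains un-ionized, so the same pH change affects the solutes very differently.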
Chromatographic conditions utilized to collect the retention data.
| Parameter | Setting |
|---|---|
| Column | Kinetex C18 (Phenomenex) |
| Eluent | water-acetonitrile; flow rate: 1 mL min−1 |
| Starting acetonitrile volume fraction (φi) | 30, 40, 50, 60, 70% |
| Eluent pH | 2, 3, 4 |
| Application time of the linear composition gradient (from φi to 100%) | none (isocratic), 15, 20, 25 min |
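The conditions in the table can be enumerated as a grid, with the gradient slope ϕ derived from φi and the gradient time. This sketch rests on two assumptions not stated in the table: that every combination was actually run, and that ϕ is expressed as the change in acetonitrile volume fraction per minute (0 for isocratic runs). The names `PHI_I`, `PH`, `T_GRAD` and `gradient_slope` are hypothetical.

```python
from itertools import product

PHI_I = [0.30, 0.40, 0.50, 0.60, 0.70]   # starting acetonitrile volume fraction
PH = [2, 3, 4]                           # eluent pH
T_GRAD = [None, 15, 20, 25]              # gradient time in min; None = isocratic


def gradient_slope(phi_i, t_grad):
    """Assumed definition: rise in acetonitrile fraction per minute, phi_i -> 1.0."""
    if t_grad is None:
        return 0.0                       # no gradient applied
    return (1.0 - phi_i) / t_grad


design = [
    {"phi_i": phi_i, "pH": pH, "t_grad": t, "slope": gradient_slope(phi_i, t)}
    for phi_i, pH, t in product(PHI_I, PH, T_GRAD)
]
print(len(design))  # 60 (5 x 3 x 4 combinations)
```

For example, starting at φi = 30% with a 20 min ramp gives ϕ = 0.70/20 = 0.035 min−1, i.e. 3.5% acetonitrile per minute.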
Figure 1. PLS model. Legend: calibration set, blue dots; test set, orange diamonds.
Parameters of the genetic algorithm.
| Parameters | Value |
|---|---|
| Number of chromosomes | 30 |
| Probability of selection in the original population | 0.015 |
| Maximum number of variables per chromosome | 30 |
| Probability of mutation | 0.01 |
| Probability of cross-over | 0.50 |
| Backward stepwise selection every | 100 iterations |
| Number of runs | 100 |
| Number of evaluations per run | 200 |
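These parameters configure a genetic algorithm that searches over binary masks of candidate predictors. The sketch below is a simplified single-run GA written for illustration, not the authors' implementation. It interprets "probability of selection in the original population" as the per-variable inclusion probability of the initial chromosomes, uses rank-based parent selection, uniform crossover, bit-flip mutation and steady-state replacement of the worst chromosome, and omits the periodic backward stepwise elimination. The function `ga_select`, the fitness `adjusted_r2` and the demo data are hypothetical; the paper's fitness was based on PLS models.

```python
import numpy as np


def ga_select(fitness, n_vars, n_chrom=30, p_init=0.015, max_vars=30,
              p_mut=0.01, p_cross=0.5, n_offspring=200, seed=None):
    """One run of a simplified GA for variable selection (sketch).

    Chromosomes are boolean masks; fitness(mask) returns a score to maximise.
    Backward stepwise elimination is NOT implemented here.
    """
    rng = np.random.default_rng(seed)

    def repair(mask):
        # keep between 1 and max_vars selected variables
        idx = np.flatnonzero(mask)
        if idx.size > max_vars:
            mask[rng.choice(idx, idx.size - max_vars, replace=False)] = False
        if not mask.any():
            mask[rng.integers(n_vars)] = True
        return mask

    pop = np.array([repair(rng.random(n_vars) < p_init) for _ in range(n_chrom)])
    scores = np.array([fitness(m) for m in pop], dtype=float)
    for _ in range(n_offspring):
        # two distinct parents, chosen with probability proportional to fitness rank
        ranks = scores.argsort().argsort() + 1
        i, j = rng.choice(n_chrom, size=2, replace=False, p=ranks / ranks.sum())
        child = pop[i].copy()
        if rng.random() < p_cross:
            take = rng.random(n_vars) < 0.5        # uniform crossover
            child[take] = pop[j][take]
        child ^= rng.random(n_vars) < p_mut        # bit-flip mutation
        child = repair(child)
        if any(np.array_equal(child, m) for m in pop):
            continue                               # duplicate: not evaluated
        score = fitness(child)
        worst = scores.argmin()
        if score > scores[worst]:                  # steady-state replacement
            pop[worst], scores[worst] = child, score
    best = scores.argmax()
    return pop[best].copy(), scores[best]


# Hypothetical demo: 20 candidate predictors, only 0, 1 and 2 are informative
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 20))
y = X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 2] + 0.1 * rng.normal(size=60)


def adjusted_r2(mask):
    """Illustrative fitness: adjusted R^2 of an OLS fit (the paper used PLS)."""
    A = np.column_stack([np.ones(len(y)), X[:, mask]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    n, k = len(y), int(mask.sum())
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


mask, score = ga_select(adjusted_r2, n_vars=20, n_offspring=2000, seed=0)
print(np.flatnonzero(mask), round(score, 3))
```

Repeating the run (the table lists 100 runs) and counting how often each variable appears in the best chromosomes yields a selection frequency, the quantity used to order the selected predictors.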
List of the 29 predictors selected by the GA in order of decreasing frequency.
| Variable | Description | Block |
|---|---|---|
| φi | starting acetonitrile volume fraction in the eluent | - |
| ϕ | gradient slope | - |
| pH | eluent pH | - |
| MATS2i | Moran autocorrelation of lag 2 weighted by ionization potential | 2D autocorrelations |
| nCb- | number of substituted benzene C(sp2) | Functional group counts |
| Mor21u | signal 21/unweighted | 3D-MoRSE descriptors |
| MATS4p | Moran autocorrelation of lag 4 weighted by polarizability | 2D autocorrelations |
| Eta_beta_A | eta average VEM count | ETA indices |
| TDB05s | 3D Topological distance-based descriptors - lag 5 weighted by I-state | 3D autocorrelations |
| RDF070u | Radial Distribution Function - 070/unweighted | RDF descriptors |
| RDF025s | Radial Distribution Function - 025/weighted by I-state | RDF descriptors |
| Mor06s | signal 06/weighted by I-state | 3D-MoRSE descriptors |
| G2i | 2nd component symmetry directional WHIM index/weighted by ionization potential | WHIM descriptors |
| MW | molecular weight | Constitutional indices |
| nCsp2 | number of sp2 hybridized Carbon atoms | Constitutional indices |
| MATS5m | Moran autocorrelation of lag 5 weighted by mass | 2D autocorrelations |
| Mor28v | signal 28/weighted by van der Waals volume | 3D-MoRSE descriptors |
| R7e+ | R maximal autocorrelation of lag 7/weighted by Sanderson electronegativity | GETAWAY descriptors |
| nN | number of Nitrogen atoms | Constitutional indices |
| piPC06 | molecular multiple path count of order 6 | Walk and path counts |
| RDF080u | Radial Distribution Function - 080/unweighted | RDF descriptors |
| RDF090u | Radial Distribution Function - 090/unweighted | RDF descriptors |
| RDF070s | Radial Distribution Function - 070/weighted by I-state | RDF descriptors |
| Mor04m | signal 04/weighted by mass | 3D-MoRSE descriptors |
| Mor21p | signal 21/weighted by polarizability | 3D-MoRSE descriptors |
| G2e | 2nd component symmetry directional WHIM index/weighted by Sanderson electronegativity | WHIM descriptors |
| E2s | 2nd component accessibility directional WHIM index/weighted by I-state | WHIM descriptors |
| R5s | R autocorrelation of lag 5/weighted by I-state | GETAWAY descriptors |
| CATS2D_03_LL | Lipophilic-Lipophilic at lag 03 | CATS2D |
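Several selected predictors (MATS2i, MATS4p, MATS5m) are Moran autocorrelations, which compare an atomic property between atoms separated by a given number of bonds. The sketch below computes the standard Moran coefficient, MATS_d = [(1/Δ_d) Σ_{i<j, d_ij=d} (w_i − w̄)(w_j − w̄)] / [(1/A) Σ_i (w_i − w̄)²], where A is the number of atoms, d_ij the topological distance between atoms i and j, and Δ_d the number of atom pairs at distance d. The property scaling and hydrogen handling of the descriptor software used in the paper are not reproduced, the zero returned for empty lags is an assumed convention, and the molecule and weights in the example are hypothetical.

```python
from collections import deque


def topological_distances(n_atoms, bonds):
    """All-pairs bond-count distances on the molecular graph, via BFS."""
    neighbours = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    dist = [[-1] * n_atoms for _ in range(n_atoms)]
    for source in range(n_atoms):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if dist[source][v] < 0:
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist


def moran_autocorrelation(weights, bonds, lag):
    """Moran coefficient at topological distance `lag` for atomic properties `weights`."""
    n = len(weights)
    dist = topological_distances(n, bonds)
    mean = sum(weights) / n
    dev = [w - mean for w in weights]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n) if dist[i][j] == lag]
    denominator = sum(d * d for d in dev) / n
    if not pairs or denominator == 0:
        return 0.0  # assumed convention for empty lags or constant properties
    numerator = sum(dev[i] * dev[j] for i, j in pairs) / len(pairs)
    return numerator / denominator


# Hypothetical 3-atom chain 0-1-2 with made-up property values 1, 2, 3
print(moran_autocorrelation([1.0, 2.0, 3.0], [(0, 1), (1, 2)], lag=2))  # about -1.5
```

The negative value at lag 2 reflects that the two terminal atoms deviate from the mean in opposite directions.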
Regression coefficients extracted from the PLS model based on the 29 descriptors selected by GA.
| Descriptor | Coefficient |
|---|---|
| φi | −0.686 |
| ϕ | −0.091 |
| pH | 0.212 |
| MATS2i | 0.061 |
| nCb- | 0.097 |
| Mor21u | −0.067 |
| MATS4p | 0.080 |
| Eta_beta_A | −0.073 |
| TDB05s | −0.042 |
| RDF070u | −0.025 |
| RDF025s | 0.0366 |
| Mor06s | 0.0900 |
| G2i | −0.0400 |
| MW | 0.093 |
| nCsp2 | 0.039 |
| MATS5m | 0.008 |
| Mor28v | 0.001 |
| R7e+ | 0.034 |
| nN | −0.048 |
| piPC06 | 0.0658 |
| RDF080u | −0.015 |
| RDF090u | 0.0188 |
| RDF070s | 0.016 |
| Mor04m | 0.025 |
| Mor21p | −0.081 |
| G2e | −0.007 |
| E2s | 0.025 |
| R5s | −0.037 |
| CATS2D_03_LL | 0.0300 |
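Ranking the coefficients by absolute value shows which predictors dominate the model: the starting acetonitrile fraction φi outweighs every other term. Comparing magnitudes this way assumes the coefficients refer to autoscaled predictors, which is usual for PLS but is not stated in the table; since no intercept is reported, the snippet does not attempt predictions. The names `COEF` and `ranked` are hypothetical.

```python
# PLS regression coefficients copied from the table above.
# Assumption: they refer to autoscaled predictors, so magnitudes are comparable.
COEF = {
    "φi": -0.686, "ϕ": -0.091, "pH": 0.212, "MATS2i": 0.061, "nCb-": 0.097,
    "Mor21u": -0.067, "MATS4p": 0.080, "Eta_beta_A": -0.073, "TDB05s": -0.042,
    "RDF070u": -0.025, "RDF025s": 0.0366, "Mor06s": 0.0900, "G2i": -0.0400,
    "MW": 0.093, "nCsp2": 0.039, "MATS5m": 0.008, "Mor28v": 0.001, "R7e+": 0.034,
    "nN": -0.048, "piPC06": 0.0658, "RDF080u": -0.015, "RDF090u": 0.0188,
    "RDF070s": 0.016, "Mor04m": 0.025, "Mor21p": -0.081, "G2e": -0.007,
    "E2s": 0.025, "R5s": -0.037, "CATS2D_03_LL": 0.0300,
}

# Sort by absolute coefficient, largest first
ranked = sorted(COEF.items(), key=lambda kv: abs(kv[1]), reverse=True)
for name, b in ranked[:6]:
    effect = "raises" if b > 0 else "lowers"
    print(f"{name:12s} {b:+.3f}  ({effect} the predicted response)")
```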
Figure 2. PLS model built on the reduced set of 29 predictors selected by GA: scores plot of the training samples. Points are colored according to their log k values.
Figure 3. Variable loadings projected onto the first two latent variables of the PLS model.