Jamshed Akbar, Shahid Iqbal, Fozia Batool, Abdul Karim, Kim Wei Chan.
Abstract
Quantitative structure-retention relationships (QSRRs) have been successfully developed for naturally occurring phenolic compounds in a reversed-phase liquid chromatographic (RPLC) system. A total of 1519 descriptors were calculated from the optimized structures of the molecules using the MOPAC2009 and DRAGON software packages. The data set of 39 molecules was divided into training and external validation sets. For feature selection and mapping we used step-wise multiple linear regression (SMLR), unsupervised forward selection followed by step-wise multiple linear regression (UFS-SMLR) and artificial neural networks (ANN). Stable and robust models with significant predictive ability, as judged by validation statistics, were obtained, and chance correlation was ruled out. The ANN models performed better than the other two approaches. The HNar, IDM, Mp, GATS2v, DISP and 3D-MoRSE (signals 22, 28 and 32) descriptors, which are based on atomic van der Waals volume, electronegativity, mass and polarizability, were found to have significant effects on the retention times. The possible implications of these descriptors in RPLC are discussed. All the models were able to predict the retention times of phenolic compounds and showed remarkable validation, robustness, stability and predictive performance.
Year: 2012 PMID: 23203132 PMCID: PMC3509648 DOI: 10.3390/ijms131115387
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Descriptors used in the study.
| Method/Type | Descriptors |
|---|---|
| | Total energy, electronic energy, core-core repulsion, dielectric energy, dipole moment, ionization energy, energies of highest occupied molecular orbital ( |
| | Constitutional, topological, molecular walk counts, BCUT, Galvez topological charge indices, 2D autocorrelations, charge descriptors, aromaticity indices, Randic molecular profiles, geometrical, RDF, 3D-MoRSE, WHIM, GETAWAY, functional groups, atom-centered fragments, empirical and properties. |
Correlations of the descriptors in SMLR model.
| | HNar | GATS2v | DISPe | Mor32e | Ke |
|---|---|---|---|---|---|
| HNar | 1.0000 | | | | |
| GATS2v | −0.0482 | 1.0000 | | | |
| DISPe | 0.1253 | 0.1566 | 1.0000 | | |
| Mor32e | −0.4053 | −0.4069 | 0.0784 | 1.0000 | |
| Ke | 0.4727 | 0.4644 | 0.1360 | −0.3608 | 1.0000 |
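A low level of inter-correlation among the selected descriptors is usually what such a table is meant to demonstrate. The following minimal sketch reads the reported lower triangle and lists the strongest pairwise correlation. It assumes the rows follow the same descriptor order as the column headers, and the 0.9 cut-off is a hypothetical illustration rather than a value taken from the study.

```python
# Lower triangle of the SMLR descriptor correlation table (values as reported).
# Assumption: rows are in the same order as the column headers.
names = ["HNar", "GATS2v", "DISPe", "Mor32e", "Ke"]
lower = [
    [1.0000],
    [-0.0482, 1.0000],
    [0.1253, 0.1566, 1.0000],
    [-0.4053, -0.4069, 0.0784, 1.0000],
    [0.4727, 0.4644, 0.1360, -0.3608, 1.0000],
]


def off_diagonal_pairs(names, lower):
    """Yield (row descriptor, column descriptor, r) for every pair below the diagonal."""
    for i, row in enumerate(lower):
        for j in range(i):
            yield names[i], names[j], row[j]


def strongest_pair(names, lower):
    """Return the descriptor pair with the largest absolute correlation."""
    return max(off_diagonal_pairs(names, lower), key=lambda t: abs(t[2]))


def collinear_pairs(names, lower, threshold=0.9):
    """Return pairs whose |r| reaches a (hypothetical) collinearity cut-off."""
    return [t for t in off_diagonal_pairs(names, lower) if abs(t[2]) >= threshold]


print(strongest_pair(names, lower))   # ('Ke', 'HNar', 0.4727)
print(collinear_pairs(names, lower))  # []
```

With the tabulated values, no pair comes close to the illustrative cut-off; the largest absolute correlation is 0.4727.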
Figure 1. Representative y-scrambling plot (SMLR model).
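Y-scrambling tests for chance correlation by refitting a model after randomly permuting the response values; a genuine model should fit the true response far better than any scrambled one. The sketch below illustrates the idea with a one-descriptor least-squares line instead of the multi-descriptor SMLR model, and the data, the number of rounds and the seed are all hypothetical.

```python
import random


def r_squared(x, y):
    """R^2 of a one-descriptor least-squares line (equals the squared Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def y_scramble(x, y, n_rounds=200, seed=0):
    """Refit after shuffling y and collect the R^2 of each scrambled model."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        shuffled = list(y)
        rng.shuffle(shuffled)
        scores.append(r_squared(x, shuffled))
    return scores


# Hypothetical toy data: one descriptor and a response that depends on it.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 7.2, 7.9, 9.1, 9.8]

true_r2 = r_squared(x, y)
scrambled = y_scramble(x, y)
# Fraction of scrambled models that fit at least as well as the real one.
fraction_as_good = sum(s >= true_r2 for s in scrambled) / len(scrambled)
print(round(true_r2, 3), fraction_as_good)
```

A fraction near zero suggests that the original fit is unlikely to be a chance correlation.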
UFS-selected descriptors with R²max = 0.90.
| Descriptors | Name | Type |
|---|---|---|
| IDM | Mean information content on the distance magnitude | Topological |
| MATS6p | Moran autocorrelation-lag6/weighted by atomic polarizabilities | 2D-autocorrelations |
| Mp | Mean atomic polarizability (scaled on carbon atom) | Constitutional |
| E1e | 1st component accessibility directional WHIM index/weighted by atomic Sanderson electronegativities | WHIM |
| MATS6e | Moran autocorrelation-lag6/weighted by atomic Sanderson electronegativities | 2D-autocorrelations |
| Mor30m | 3D-MoRSE-signal 30/weighted by atomic masses | 3D-MoRSE |
| AROM | Aromaticity | Aromatic indices |
| E3u | 3rd component accessibility directional WHIM index/unweighted | WHIM |
| Mor22v | 3D-MoRSE-signal 22/weighted by atomic volume | 3D-MoRSE |
| Mor28e | 3D-MoRSE-signal 28/weighted by atomic Sanderson electronegativities | 3D-MoRSE |
| Mor29m | 3D-MoRSE-signal 29/weighted by atomic masses | 3D-MoRSE |
| DISPm | d COMMA2 value/weighted by atomic masses | Geometrical |
| PJI3 | 3D Petitjean shape index | Geometrical |
| G3s | 3rd component symmetry directional WHIM index/weighted by atomic electrotopological states | WHIM |
| MATS5e | Moran autocorrelation-lag5/weighted by atomic Sanderson electronegativities | 2D-autocorrelations |
| PJI2 | 2D Petitjean shape index | Topological |
| SIC4 | Structural information content (neighbourhood symmetry of 4-order) | Topological |
| E2p | 2nd component accessibility directional WHIM index/weighted by atomic polarizabilities | WHIM |
| Mor12e | 3D-MoRSE-signal 12/weighted by atomic Sanderson electronegativities | 3D-MoRSE |
| IVDE | Mean information content vertex degree equality | Topological |
| SPI | Superpendentic index | Topological |
| HATS7p | Leverage-weighted autocorrelation of lag 7/weighted by atomic polarizabilities | GETAWAY |
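The role of the R²max threshold in unsupervised forward selection can be illustrated with a simplified sketch. It starts from the least correlated pair of descriptors and then repeatedly adds the descriptor with the smallest squared multiple correlation against those already kept, stopping once every remaining descriptor exceeds R²max. This is a reading of the general UFS idea under stated assumptions, not the implementation used in the study: the function names and toy columns are hypothetical, columns are assumed non-constant, and r2max is assumed to be below 1 in practical use.

```python
from math import sqrt


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def unit_centered(col):
    """Center a column and scale it to unit length (assumes it is not constant)."""
    mean = sum(col) / len(col)
    centered = [v - mean for v in col]
    norm = sqrt(dot(centered, centered))
    return [v / norm for v in centered]


def residual(col, basis):
    """Part of col orthogonal to an orthonormal basis (modified Gram-Schmidt)."""
    r = list(col)
    for b in basis:
        k = dot(r, b)
        r = [ri - k * bi for ri, bi in zip(r, b)]
    return r


def ufs_sketch(columns, r2max=0.90):
    """Return the descriptor names kept by a simplified forward selection."""
    z = {name: unit_centered(col) for name, col in columns.items()}
    names = list(z)
    # Start from the pair with the smallest absolute pairwise correlation.
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    first, second = min(pairs, key=lambda p: abs(dot(z[p[0]], z[p[1]])))
    selected, basis = [], []

    def keep(name):
        r = residual(z[name], basis)
        norm = sqrt(dot(r, r))
        basis.append([v / norm for v in r])
        selected.append(name)

    keep(first)
    keep(second)
    remaining = [n for n in names if n not in selected]
    while remaining:
        # For a unit-length column, R^2 against the kept set is 1 - |residual|^2.
        r2 = {}
        for n in remaining:
            r = residual(z[n], basis)
            r2[n] = 1.0 - dot(r, r)
        best = min(remaining, key=r2.get)
        if r2[best] > r2max:
            break  # everything left is too redundant
        keep(best)
        remaining.remove(best)
    return selected


# Hypothetical toy descriptors: "b" is nearly a multiple of "a".
toy = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "b": [2.1, 4.1, 5.8, 8.2, 9.9, 12.1],
    "c": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
}
print(ufs_sketch(toy, r2max=0.90))
```

On the toy data only one of the two nearly collinear descriptors survives alongside "c", which is the kind of redundancy reduction the threshold is meant to enforce.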
Figure 2. Experimental and predicted retention times (RT) for training and validation sets: (a) SMLR model; (b) UFS-SMLR model.
Architecture and validation statistics of the optimal ANNs.
| | SMLR-ANN | UFS-SMLR-ANN |
|---|---|---|
| No. of neurons in the input layer | 4 | 5 |
| No. of neurons in the hidden layer | 6 | 5 |
| No. of neurons in the output layer | 1 | 1 |
| Hidden weight decay | 0.01 | 0.01 |
| Output weight decay | 0.01 | 0.01 |
| Hidden activation function | Tanh | Exponential |
| Output activation function | Tanh | Logistic |
| PRESSext | 1.4841 | 1.1021 |
| | 0.8145 | 0.8622 |
| Training error | 0.0013 | 0.0047 |
| Test error | 0.0021 | 0.0009 |
| Validation error | 0.0042 | 0.0031 |
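As a reminder of how a statistic such as PRESSext is typically obtained, the sketch below computes the predictive residual sum of squares on an external set, together with one common definition of an external Q² that references the training-set mean. The Q² definition and all numbers are assumptions for illustration; the source does not state which formula was used.

```python
def press(observed, predicted):
    """Predictive residual sum of squares: sum of squared prediction errors."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def q2_ext(observed, predicted, training_mean):
    """External Q^2 relative to the training-set mean (one common definition)."""
    total = sum((o - training_mean) ** 2 for o in observed)
    return 1.0 - press(observed, predicted) / total


# Hypothetical external-set retention times (min), not values from the study.
observed = [2.0, 3.0, 4.0, 5.0]
predicted = [2.1, 2.8, 4.3, 4.9]
training_mean = 3.5  # hypothetical mean RT of the training set

press_ext = press(observed, predicted)
q2 = q2_ext(observed, predicted, training_mean)
print(round(press_ext, 4), round(q2, 4))  # 0.15 0.97
```

A smaller PRESS and a Q² closer to 1 both indicate better external predictions, which is the direction in which the two ANN columns can be compared.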
Experimental and predicted retention times (RT) of naturally occurring phenolic compounds.
| Sr No. | Compound | Experimental RT (min) | Predicted RT (min), SMLR | Predicted RT (min), UFS-SMLR | Predicted RT (min), SMLR-ANN | Predicted RT (min), UFS-SMLR-ANN |
|---|---|---|---|---|---|---|
| 1 | Gallic acid | 1.63 | 1.82 | 2.12 | 1.94 | 2.54 |
| 2 | Gentisic acid | 3.02 | 3.36 | 3.65 | 3.28 | 3.49 |
| 3 | Protocatechuic acid | 2.43 | 2.61 | 3.04 | 2.67 | 2.94 |
| 4 | Salicylic acid | 3.96 | 3.93 | 4.23 | 3.89 | 4.04 |
| 5 | Syringic acid | 3.27 | 3.36 | 2.58 | 3.10 | 2.61 |
| 6 | Vanillic acid | 3.14 | 3.29 | 3.05 | 3.07 | 2.93 |
| 7 | 2,4-Dihydroxybenzoic acid | 3.26 | 2.67 | 3.13 | 2.76 | 3.05 |
| 8 | 3-Methoxybenzoic acid | 4.32 | 4.25 | 3.53 | 4.37 | 3.31 |
| 9 | 4-Hydroxybenzoic acid | 2.94 | 2.88 | 3.60 | 2.90 | 3.45 |
| 10 | Caffeic acid | 3.24 | 2.69 | 3.31 | 2.74 | 3.08 |
| 11 | Chlorogenic acid | 3.07 | 3.26 | 3.13 | 3.16 | 2.78 |
| 12 | Ferulic acid | 3.80 | 3.84 | 4.11 | 3.84 | 3.89 |
| 13 | | 3.88 | 3.69 | 3.94 | 3.67 | 3.71 |
| 14 | | 4.07 | 4.39 | 4.42 | 4.31 | 4.37 |
| 15 | | 3.63 | 3.47 | 3.70 | 3.45 | 3.54 |
| 16 | Sinapic acid | 3.85 | 3.86 | 3.80 | 3.89 | 3.59 |
| 17 | | 4.69 | 4.80 | 4.38 | 4.69 | 4.14 |
| 18 | Dihydrocaffeic acid | 3.00 | 2.84 | 2.52 | 2.85 | 2.57 |
| 19 | Homovanillic acid | 3.22 | 3.29 | 3.08 | 3.14 | 3.00 |
| 20 | DOPAC | 2.34 | 2.11 | 2.27 | 2.19 | 2.59 |
| 21 | 4-Hydroxyphenylacetic acid | 2.92 | 3.34 | 2.64 | 3.28 | 2.79 |
| 22 | Ellagic acid | 3.80 | 3.90 | 3.65 | 4.07 | 3.27 |
| 23 | Vanillin | 3.49 | 3.52 | 3.18 | 3.45 | 3.05 |
| 24 | Tyrosol | 2.73 | 3.00 | 2.80 | 3.05 | 2.77 |
| 25 | Apigenin | 5.14 | 5.01 | 4.88 | 5.16 | 4.99 |
| 26 | Chrysin | 5.92 | 6.18 | 5.78 | 5.77 | 5.62 |
| 27 | Luteolin | 4.76 | 4.33 | 4.82 | 4.45 | 4.90 |
| 28 | Luteolin-7- | 3.81 | 4.10 | 4.32 | 4.10 | 4.24 |
| 29 | Kaempferide | 6.06 | 5.65 | 5.91 | 5.66 | 5.74 |
| 30 | Myricetin | 4.28 | 3.98 | 4.03 | 3.98 | 4.00 |
| 31 | Quercetin | 4.76 | 4.28 | 4.87 | 4.39 | 4.89 |
| 32 | Rutin | 3.73 | 3.91 | 3.62 | 3.82 | 3.62 |
| 33 | Hesperidin | 3.94 | 3.71 | 4.23 | 3.73 | 4.26 |
| 34 | Isosakuranetin | 5.94 | 5.74 | 5.45 | 5.68 | 5.43 |
| 35 | Naringenin | 5.11 | 5.05 | 4.87 | 5.20 | 5.04 |
| 36 | (+)-Catechin | 2.99 | 3.91 | 4.07 | 3.89 | 3.63 |
| 37 | (−)-Epicatechin | 3.26 | 3.66 | 3.67 | 3.63 | 3.28 |
| 38 | Genistein | 5.09 | 5.15 | 5.12 | 5.37 | 5.21 |
| 39 | (+)-Taxifolin | 3.85 | 3.57 | 4.02 | 3.51 | 3.78 |
For ANN models, compounds are labelled with one letter if they belong to the test set and with another if they belong to the validation set; unlabelled compounds are in the training set.
Figure 3. Experimental and predicted retention times (RT) for training, test and validation sets: (a) SMLR-ANN model; (b) UFS-SMLR-ANN model.
Figure 4. Residual plot for QSRR models.
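The quantities behind a residual plot can be recomputed directly from the tabulated retention times. The sketch below uses five rows copied from the table of experimental and predicted RT values and takes the residual as experimental minus predicted, which is an assumed sign convention; the source does not state the one used in the figure.

```python
MODELS = ["SMLR", "UFS-SMLR", "SMLR-ANN", "UFS-SMLR-ANN"]

# (compound, experimental RT, predictions in the order of MODELS), in minutes.
# Rows copied from the retention-time table; only a subset is shown.
ROWS = [
    ("Gallic acid", 1.63, [1.82, 2.12, 1.94, 2.54]),
    ("Salicylic acid", 3.96, [3.93, 4.23, 3.89, 4.04]),
    ("Chrysin", 5.92, [6.18, 5.78, 5.77, 5.62]),
    ("(+)-Catechin", 2.99, [3.91, 4.07, 3.89, 3.63]),
    ("Genistein", 5.09, [5.15, 5.12, 5.37, 5.21]),
]


def residuals(rows, models):
    """Map each model to a list of (compound, experimental - predicted)."""
    out = {m: [] for m in models}
    for compound, experimental, predictions in rows:
        for model, predicted in zip(models, predictions):
            out[model].append((compound, round(experimental - predicted, 2)))
    return out


def worst_case(rows, models):
    """For each model, the compound with the largest absolute residual."""
    return {m: max(res, key=lambda t: abs(t[1]))
            for m, res in residuals(rows, models).items()}


for model, (compound, res) in worst_case(ROWS, MODELS).items():
    print(f"{model}: {compound} ({res:+.2f} min)")
```

Within this subset, (+)-catechin has the largest absolute residual for three of the four models, and gallic acid for UFS-SMLR-ANN.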