Roya Khosrokhavar, Jahan Bakhsh Ghasemi, Fereshteh Shiri.
Abstract
In the present work, support vector machine (SVM) and multiple linear regression (MLR) techniques were used for a quantitative structure-property relationship (QSPR) study of the retention time (tR) of 67 mycotoxins (aflatoxins, trichothecenes, roquefortines and ochratoxins) in standardized liquid chromatography-UV-mass spectrometry, based on molecular descriptors calculated from the optimized 3D structures. Descriptors were first screened with missing-value, zero and multicollinearity tests (cutoff value 0.95), and a genetic algorithm was then used for variable selection to identify the most relevant descriptors. QSPR models were built with both MLR and SVM. The robustness of the models was characterized by statistical validation and by the applicability domain (AD). The predictions of both models are in good agreement with the experimental values: the correlation and predictability, measured by r² and q², are 0.931 and 0.932, respectively, for SVM and 0.923 and 0.915, respectively, for MLR. The applicability domain was investigated using a Williams plot. The effects of the different descriptors on the retention times are described.
Keywords: MLR; QSPR; SVM; William’s Plot; genetic algorithm; mycotoxins
Year: 2010 PMID: 20957079 PMCID: PMC2956080 DOI: 10.3390/ijms11093052
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
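The descriptor screening mentioned in the abstract (missing-value, zero and multicollinearity tests with a 0.95 cutoff) could be sketched roughly as follows. This is only an illustration under stated assumptions: the "zero" test is read here as dropping constant descriptors, the first descriptor of a highly correlated pair is the one kept, and the function names and toy data are hypothetical rather than taken from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def prescreen(descriptors, cutoff=0.95):
    """Return the names of descriptors that pass all three tests.

    descriptors: dict mapping name -> list of values (None marks a missing value).
    """
    kept = {}
    for name, values in descriptors.items():
        if any(v is None for v in values):      # missing-value test
            continue
        if len(set(values)) == 1:               # constant ("zero") test, assumed meaning
            continue
        # Multicollinearity test: drop if |r| with an already kept descriptor exceeds the cutoff.
        if any(abs(pearson(values, other)) > cutoff for other in kept.values()):
            continue
        kept[name] = values
    return list(kept)

# Hypothetical toy data, not from the paper.
toy = {
    "d1": [1.0, 2.0, 3.0, 4.0],
    "d2": [2.1, 3.9, 6.2, 8.0],   # nearly collinear with d1
    "d3": [0.0, 0.0, 0.0, 0.0],   # constant
    "d4": [1.0, None, 2.0, 3.0],  # contains a missing value
    "d5": [4.0, 1.0, 3.0, 2.0],
}
print(prescreen(toy))  # ['d1', 'd5']
```

The descriptors that survive such a filter would then be passed on to the genetic algorithm for variable selection.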
Details of the constructed QSPR model.
| Descriptor | Coefficient | Mean effect | VIF |
|---|---|---|---|
| C logP | 2.6951(±0.2248) | 5 | 1.006 |
| ElcE | −0.0002(±0.0001) | 8 | 1.246 |
| DPLL | −1.091(±0.2981) | −3.875 | 1.556 |
| LUMO | −1.6922(±0.5521) | 0.594 | 1.287 |
| Constant | 3.1912(±1.7569) | _ | _ |
C logP = Octanol/water partition coefficient
ElcE = Electronic energy
DPLL = Dipole length
LUMO = Lowest unoccupied molecular orbital energy
VIF = Variance inflation factor
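Read as a linear equation, the coefficients in the table above give tR = 3.1912 + 2.6951·ClogP − 0.0002·ElcE − 1.091·DPLL − 1.6922·LUMO. The sketch below simply evaluates that equation; the function name and the example descriptor values are hypothetical, and because the ElcE coefficient is reported to a single significant figure, predictions from these rounded coefficients are only approximate.

```python
# Coefficients and intercept as reported in the model table (uncertainties omitted).
COEFFS = {"ClogP": 2.6951, "ElcE": -0.0002, "DPLL": -1.091, "LUMO": -1.6922}
INTERCEPT = 3.1912

def predict_tr(descriptors):
    """Evaluate the reported MLR equation for one compound (tR in minutes)."""
    return INTERCEPT + sum(COEFFS[name] * descriptors[name] for name in COEFFS)

# Hypothetical descriptor values, chosen only to show the call.
example = {"ClogP": 2.0, "ElcE": -30000.0, "DPLL": 1.5, "LUMO": -0.8}
print(round(predict_tr(example), 2))
```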
Correlation matrix for MLR model.
| | tR | C logP | ElcE | DPLL | LUMO |
|---|---|---|---|---|---|
| tR | 1 | | | | |
| C logP | 0.821263 | 1 | | | |
| ElcE | −0.21234 | 0.05977 | 1 | | |
| DPLL | −0.07144 | 0.004813 | −0.32903 | 1 | |
| LUMO | −0.12041 | −0.05044 | 0.000773 | −0.45025 | 1 |
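A variance inflation factor can be obtained from the descriptor-descriptor correlations as the corresponding diagonal element of the inverse correlation matrix. The sketch below applies that standard identity to the four descriptors in the matrix above, using a small Gauss-Jordan inversion so that it needs no external library. It is an illustration only: values computed from these rounded correlations may differ from the VIF values tabulated with the model, for example if those were computed on a different subset of compounds.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

NAMES = ["ClogP", "ElcE", "DPLL", "LUMO"]
# Descriptor-descriptor correlations from the matrix above (symmetric, unit diagonal).
R = [
    [1.0,       0.05977,   0.004813, -0.05044],
    [0.05977,   1.0,      -0.32903,   0.000773],
    [0.004813, -0.32903,   1.0,      -0.45025],
    [-0.05044,  0.000773, -0.45025,   1.0],
]

R_inv = invert(R)
vif = {name: R_inv[i][i] for i, name in enumerate(NAMES)}
for name, value in vif.items():
    print(f"{name}: {value:.3f}")
```

All four values stay close to 1, which is consistent with the low multicollinearity suggested by the small off-diagonal correlations.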
Figure 1. The selection of the optimal epsilon for SVM (C = 4).
Figure 2. The selection of the optimal capacity factor for SVM (ɛ = 0.01).
Comparison of experimental and predicted values of tR for prediction set by MLR and SVM models.
| No. | Exp. tR | MLR Pred. tR | MLR RE (%) | SVM Pred. tR | SVM RE (%) |
|---|---|---|---|---|---|
| 21 | 5.1 | 4.97 | 2.55 | 5.03 | 1.37 |
| 4 | 6.6 | 6.91 | −4.7 | 7.99 | −21.06 |
| 23 | 7.4 | 7.03 | 5 | 8.35 | −12.84 |
| 41 | 8.59 | 8.88 | −3.38 | 10.08 | −17.35 |
| 3 | 10.33 | 9.44 | 8.62 | 10.25 | 0.77 |
| 38 | 10.51 | 11.43 | −8.75 | 12 | −14.18 |
| 24 | 11.28 | 12.03 | −6.65 | 12.37 | −9.66 |
| 27 | 13.69 | 11.51 | 15.92 | 11.74 | 14.24 |
| 34 | 14.15 | 11.48 | 18.87 | 12.53 | 11.45 |
| 13 | 15.03 | 14.52 | 3.39 | 15.18 | −1 |
| 25 | 15.56 | 14.61 | 6.11 | 14.79 | 4.95 |
| 37 | 17 | 14.29 | 15.94 | 15.08 | 11.29 |
| 11 | 18.02 | 15.7 | 12.87 | 16.37 | 9.16 |
| 46 | 18.6 | 18.91 | −1.67 | 19.39 | −4.25 |
| 65 | 20 | 22.66 | −13.3 | 22.11 | −10.55 |
| 29 | 21.12 | 22.61 | −7.05 | 20.43 | 3.27 |
| 55 | 21.6 | 20.74 | 3.98 | 19.84 | 8.15 |
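The RE (%) columns are consistent with a relative error defined as 100 × (experimental − predicted) / experimental. That definition is inferred from the tabulated numbers rather than stated in this record; the short check below recomputes three rows of the table with it.

```python
def relative_error(exp, pred):
    """Relative error in percent, assuming RE = 100 * (exp - pred) / exp."""
    return 100.0 * (exp - pred) / exp

# (compound No., experimental tR, MLR prediction, SVM prediction) copied from the table.
rows = [
    (21, 5.1, 4.97, 5.03),
    (4, 6.6, 6.91, 7.99),
    (27, 13.69, 11.51, 11.74),
]
for no, exp, mlr, svm in rows:
    print(no, round(relative_error(exp, mlr), 2), round(relative_error(exp, svm), 2))
# 21 2.55 1.37
# 4 -4.7 -21.06
# 27 15.92 14.24
```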
Figure 3. tR estimated by MLR (top panel) and SVM (bottom panel) modeling versus experimental values, and residuals versus experimental tR.
The statistical parameters obtained by applying the MLR and SVM methods to the prediction set.
| Parameters | MLR | SVM |
|---|---|---|
| RMSEP | 1.504 | 1.341 |
| REP | 10.902 | 9.719 |
| SEP | 1.551 | 1.382 |
| q² | 0.915 | 0.932 |
| R² | 0.923 | 0.931 |
| (R² − R₀²)/R² | 0.001 | 0.0118 |
| (R² − R′₀²)/R² | 0.0108 | 0.0011 |
| rₘ² | 0.894 | 0.833 |
| k | 0.996 | 0.891 |
| k′ | 0.926 | 1.045 |
| NDS | 4 | 4 |
RMSEP = Root mean square error of prediction.
REP = Relative error of prediction.
SEP = Standard error of prediction.
NDS = Number of descriptors.
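The first three rows of this table can be recomputed from the prediction-set values listed earlier. The sketch below assumes the usual definitions, which are not spelled out in this record: RMSEP = sqrt(Σ(y − ŷ)²/n), SEP = sqrt(Σ(y − ŷ)²/(n − 1)) and REP = 100 × RMSEP / mean(y). With these assumptions the results agree with the tabulated values to within about 0.01.

```python
import math

# Experimental and predicted tR for the 17 prediction-set compounds, in table order.
EXP = [5.1, 6.6, 7.4, 8.59, 10.33, 10.51, 11.28, 13.69, 14.15,
       15.03, 15.56, 17.0, 18.02, 18.6, 20.0, 21.12, 21.6]
MLR = [4.97, 6.91, 7.03, 8.88, 9.44, 11.43, 12.03, 11.51, 11.48,
       14.52, 14.61, 14.29, 15.7, 18.91, 22.66, 22.61, 20.74]
SVM = [5.03, 7.99, 8.35, 10.08, 10.25, 12.0, 12.37, 11.74, 12.53,
       15.18, 14.79, 15.08, 16.37, 19.39, 22.11, 20.43, 19.84]

def press(y, y_hat):
    """Sum of squared prediction residuals."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))

def rmsep(y, y_hat):
    return math.sqrt(press(y, y_hat) / len(y))

def sep(y, y_hat):
    return math.sqrt(press(y, y_hat) / (len(y) - 1))

def rep(y, y_hat):
    return 100.0 * rmsep(y, y_hat) / (sum(y) / len(y))

for label, pred in (("MLR", MLR), ("SVM", SVM)):
    print(label, round(rmsep(EXP, pred), 3), round(rep(EXP, pred), 3), round(sep(EXP, pred), 3))
```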
Figure 4. Williams plot of standardized residuals versus leverage.
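A Williams plot is usually read against two limits: a warning leverage h* = 3(p + 1)/n, where p is the number of descriptors and n the number of training compounds, and a standardized-residual band of ±3. Both limits are common QSPR conventions assumed here, not values taken from this record. The training-set size of 50 used below is likewise an assumption (67 compounds minus the 17 listed in the prediction set), and the function names are hypothetical.

```python
def warning_leverage(n_descriptors, n_training):
    """Conventional warning leverage h* = 3 (p + 1) / n (assumed convention)."""
    return 3.0 * (n_descriptors + 1) / n_training

def in_domain(leverage, std_residual, h_star, residual_limit=3.0):
    """True if a compound lies inside both limits of the Williams plot."""
    return leverage <= h_star and abs(std_residual) <= residual_limit

# 4 descriptors (NDS in the statistics table); n = 50 is an assumed training-set size.
h_star = warning_leverage(4, 50)
print(h_star)                        # 0.3
print(in_domain(0.10, 1.5, h_star))  # True: low leverage, small residual
print(in_domain(0.40, 0.5, h_star))  # False: leverage above h*
```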
Experimental retention time (tR) of 67 compounds.
| No. | Compound | tR (min) | No. | Compound | tR (min) |
|---|---|---|---|---|---|
| 1 | Aflatoxicol I | 12.45 | 9 | Austocystin A | 21.57 |
| 2 | Aflatoxin B1 | 11.50 | 10 | Averufin | 25.65 |
| 3 | Aflatoxin B2 | 10.33 | 11 | 5-Methoxysterigmatocystin | 18.02 |
| 4 | Aflatoxin B2 α | 6.60 | 12 | Dihydroxysterigmatocystin | 17.70 |
| 5 | Aflatoxin G1 | 10.16 | 13 | Methoxysterigmatocystin | 15.03 |
| 6 | Aflatoxin G2 | 8.97 | 14 | Sterigmatocystin | 18.91 |
| 7 | Aflatoxin G2α | 5.00 | 15 | Norsolorinic acid | 31.08 |
| 8 | Aflatoxin M1 | 7.21 | 16 | Parasiticol | 10.73 |
| 17 | Nivalenol | 1.27 | 27 | HT-2 Toxin | 13.69 |
| 18 | Fusarenone X | 2.35 | 28 | T-2 Toxin | 17.06 |
| 19 | Deoxynivalenol | 1.54 | 29 | Acetyl-T-2 toxin | 21.12 |
| 20 | 3-Acetyldeoxynivalenol | 5.21 | 30 | Trichodermin | 16.13 |
| 21 | 15- | 5.10 | 31 | Trichodermol | 9.69 |
| 22 | Scirpentriol | 1.82 | 32 | 7-α-Hydroxytrichodermol | 2.59 |
| 23 | 15-Acetoxyscirpenol | 7.40 | 33 | Verrucarol | 2.89 |
| 24 | Diacetoxyscirpenol | 11.28 | 34 | 4,15-Diacetylverrucarol | 14.15 |
| 25 | 3α-Acetyldiacetoxyscirpenol | 15.56 | 35 | Trichothecin | 16.29 |
| 26 | Neosolaniol | 3.19 | 36 | Trichothecolone | 3.63 |
| 37 | Trichoverrol A | 10.16 | | | |
| 38 | Agroclavine-I | 17.00 | 51 | Ergotamin | 19.60 |
| 39 | Auranthine | 10.51 | 52 | Fumigaclavine C | 21.40 |
| 40 | Aurantiamine | 10.49 | 53 | Marcfortine A | 19.59 |
| 41 | Aurantioclavine | 14.30 | 54 | Marcfortine B | 17.39 |
| 42 | Chanoclavine-I | 8.59 | 55 | Meleagrin | 18.90 |
| 43 | Costaclavine | 17.00 | 56 | Oxalin | 21.60 |
| 44 | Cyclopenin | 11.60 | 57 | Pyroclavine | 14.81 |
| 45 | Cyclopenol | 6.20 | 58 | Roquefortine C | 20.50 |
| 46 | Cyclopeptin | 12.05 | 59 | Roquefortine D | 6.09 |
| 47 | Dihydroergotamin | 18.60 | 60 | Rugulovasine A and B | 8.43 |
| 48 | Elymoclavine | 5.34 | 61 | Secoclavine | 20.40 |
| 49 | Epoxyagroclavine-I | 10.00 | 62 | α-Ergocryptin | 19.20 |
| 50 | Ergocristine | 25.10 | | | |
| 63 | Ochratoxin α | 5.60 | 66 | Ochratoxin B-ethyl ester | 19.41 |
| 64 | Ochratoxin A-methyl ester | 22.49 | 67 | Ochratoxin α-methyl ester | 16.16 |
| 65 | Ochratoxin B-methyl ester | 20.00 | | | |
Parameters of genetic algorithm (GA).
| Parameter | Value |
|---|---|
| Cross-validation | Random subset |
| Number of subsets | 4 |
| Population size | 64 |
| Mutation rate | 0.005 |
| Window width | 2 |
| Initial term% | 20% |
| Maximum generation | 100 |
| Convergence (%) | 50 |
| Cross-over | Double |
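To show how settings like these typically enter a genetic algorithm for variable selection, here is a minimal sketch that uses the tabulated population size, mutation rate, initial term fraction, generation limit and double (two-point) cross-over. Everything else is an assumption made for illustration: the fitter half of the population is kept and bred, the fitness function is a hypothetical toy score rather than a cross-validated model error, and the cross-validation, window-width and convergence settings are not implemented.

```python
import random

POP_SIZE = 64           # "Population size"
MUTATION_RATE = 0.005   # "Mutation rate", read here as a per-bit flip probability
INITIAL_TERMS = 0.20    # "Initial term%": fraction of descriptors switched on at the start
MAX_GENERATIONS = 100   # "Maximum generation"

def two_point_crossover(a, b, rng):
    """Swap the segment between two cut points ("Double" cross-over)."""
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def mutate(chrom, rng):
    """Flip each bit independently with probability MUTATION_RATE."""
    return [bit ^ 1 if rng.random() < MUTATION_RATE else bit for bit in chrom]

def run_ga(n_descriptors, fitness, seed=0):
    """Return the fittest chromosome (list of 0/1 flags) after MAX_GENERATIONS."""
    rng = random.Random(seed)
    pop = [[1 if rng.random() < INITIAL_TERMS else 0 for _ in range(n_descriptors)]
           for _ in range(POP_SIZE)]
    for _ in range(MAX_GENERATIONS):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:POP_SIZE // 2]        # keep the fitter half (assumed scheme)
        rng.shuffle(parents)
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            c1, c2 = two_point_crossover(a, b, rng)
            children += [mutate(c1, rng), mutate(c2, rng)]
        pop = parents + children             # population size stays at POP_SIZE
    return max(pop, key=fitness)

# Hypothetical toy fitness: reward four "true" descriptors, penalize any extras.
TARGET = {0, 3, 7, 12}

def toy_fitness(chrom):
    selected = {i for i, bit in enumerate(chrom) if bit}
    return len(selected & TARGET) - 0.5 * len(selected - TARGET)

best = run_ga(20, toy_fitness)
print([i for i, bit in enumerate(best) if bit], toy_fitness(best))
```

In a real application the fitness would be a cross-validated error of the MLR model built on the selected descriptors, which is where the random-subset cross-validation setting would come in.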