Karl Marti Toots¹, Sulev Sild¹, Jaan Leis¹, William E Acree², Uko Maran¹.
Abstract
Ionic liquids (ILs) are known for their unique characteristics as solvents and electrolytes. New ILs are therefore being developed and adapted as innovative chemical environments for a variety of applications, in which their properties need to be understood at the molecular level. Computational data-driven methods provide a means of understanding properties at the molecular level, and quantitative structure–property relationships (QSPRs) provide the framework for this. The framework is commonly used to study the properties of molecules in ILs as an environment; the opposite perspective, in which the property is considered as a function of the ionic liquid itself, has not been explored. The aim of the present study was to address this gap and to develop QSPRs that explain molecular interactions in ionic liquids in terms of the structure of the cationic moiety. A wide range of applications in electrochemistry, separation and extraction chemistry depends on the partitioning of solutes between the ionic liquid and the surrounding environment, which is characterized by the gas–ionic liquid partition coefficient. To model this property as a function of cation structure, a series of ionic liquids sharing a common bis(trifluoromethylsulfonyl)imide anion, [Tf2N]⁻, was selected, and partition coefficients were modeled for benzene, hexane and cyclohexane. Multiple linear regression (MLR), support vector regression (SVR) and Gaussian process regression (GPR) were used to derive data-driven models, and their performance was compared. Cross-validation coefficients of determination in the range 0.71–0.93, along with other performance statistics, indicated good accuracy for all data series and machine learning methods. Analysis and interpretation of the descriptors revealed that, in general, higher lipophilicity and dispersion interaction capability and lower polarity of the cation lead to a higher partition coefficient for benzene, hexane, cyclohexane and hydrocarbons in general.
Applicability domain analysis found no highly influential outliers, and the models are applicable to a wide selection of cation families of variable size and polarity and of aliphatic or aromatic nature.
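The property modeled throughout is the gas–ionic liquid partition coefficient, which the abstract names but does not define. It is commonly defined as the equilibrium ratio of the solute's molar concentration in the IL phase to its concentration in the gas phase, and the figures report its decimal logarithm, log K. The notation below is an illustrative sketch of that standard definition, not taken from the paper:

$$
K = \frac{C_{\mathrm{solute}}^{\mathrm{IL}}}{C_{\mathrm{solute}}^{\mathrm{gas}}}, \qquad \log K = \log_{10} K
$$

For example, log K = 3 would mean the solute is 1000 times more concentrated in the IL than in the gas phase at equilibrium.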
Keywords: Ionic liquid; QSPR; gas–ionic liquid partition coefficient; Gaussian process regression; molecular interactions; multiple linear regression; support vector regression
Year: 2022 PMID: 35886881 PMCID: PMC9323540 DOI: 10.3390/ijms23147534
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 6.208
Hyperparameters of the SVR and GPR final models.
| Model | C | ε | γ |
|---|---|---|---|
| SVRh | 1 | 0.001 | auto |
| SVRc | 5 | 0.001 | 0.1 |
| SVRb | 1 | 0.001 | scale |

| Model | | | |
|---|---|---|---|
| GPRh | 0.478 | 0.00947 | 3.7 |
| GPRc | 0.364 | 0.00888 | 9.52 |
| GPRb | 3.13 | 0.00215 | 2.91 |
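Two SVR models list γ as `auto` or `scale` rather than a number. These strings match the `gamma` options of scikit-learn's `SVR`, where `auto` means 1/n_features and `scale` means 1/(n_features · Var(X)), with the variance taken over every entry of the descriptor matrix. The minimal stdlib sketch below resolves those settings and evaluates an RBF kernel exp(−γ‖x − z‖²). It assumes an RBF kernel, which the table does not state; the helper names and the 3 × 2 descriptor matrix are hypothetical.

```python
import math
from statistics import pvariance

def resolve_gamma(gamma, X):
    """Turn a gamma setting into a number, following scikit-learn's SVR conventions."""
    n_features = len(X[0])
    if gamma == "auto":
        return 1.0 / n_features
    if gamma == "scale":
        # population variance over every entry of the descriptor matrix
        flat = [v for row in X for v in row]
        return 1.0 / (n_features * pvariance(flat))
    return float(gamma)

def rbf_kernel(x, z, gamma):
    """Assumed RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Hypothetical 3-compound, 2-descriptor matrix (illustrative values only)
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]
for setting in ("auto", "scale", 0.1):
    g = resolve_gamma(setting, X)
    print(setting, round(g, 4), round(rbf_kernel(X[0], X[1], g), 4))
```

Because `scale` depends on the spread of the descriptors, the same γ label can correspond to different numeric values for different data series.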
Statistical parameters of final linear and non-linear models on all cross-validation folds.
| Data series | Set | R² (MLR) | R² (SVR) | R² (GPR) | RMSE (MLR) | RMSE (SVR) | RMSE (GPR) | CCC (MLR) | CCC (SVR) | CCC (GPR) |
|---|---|---|---|---|---|---|---|---|---|---|
| Hexane | train | 0.944 | 0.966 | 0.957 | 0.092 | 0.071 | 0.080 | 0.971 | 0.982 | 0.978 |
| Hexane | test | 0.919 | 0.926 | 0.924 | 0.101 | 0.098 | 0.097 | 0.960 | 0.957 | 0.957 |
| Cyclohexane | train | 0.915 | 0.946 | 0.940 | 0.102 | 0.081 | 0.085 | 0.955 | 0.972 | 0.969 |
| Cyclohexane | test | 0.891 | 0.910 | 0.903 | 0.110 | 0.097 | 0.097 | 0.942 | 0.953 | 0.950 |
| Benzene | train | 0.791 | 0.973 | 0.935 | 0.068 | 0.025 | 0.038 | 0.883 | 0.986 | 0.966 |
| Benzene | test | 0.717 | 0.869 | 0.788 | 0.072 | 0.051 | 0.057 | 0.813 | 0.928 | 0.869 |
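The table reports three agreement statistics between experimental and predicted log K. The stdlib sketch below shows how they are conventionally computed: R² as 1 − SS_res/SS_tot, RMSE as the square root of the mean squared residual, and Lin's concordance correlation coefficient CCC = 2s_xy / (s_x² + s_y² + (x̄ − ȳ)²) with population (1/n) moments. Whether the study used exactly these variants (for instance 1/n versus 1/(n − 1) moments in CCC) is an assumption, and the five data points are invented.

```python
import math

def r2(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def rmse(y, y_hat):
    """Root-mean-square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def ccc(y, y_hat):
    """Lin's concordance correlation coefficient with population (1/n) moments."""
    n = len(y)
    mx, my = sum(y) / n, sum(y_hat) / n
    vx = sum((a - mx) ** 2 for a in y) / n
    vy = sum((b - my) ** 2 for b in y_hat) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(y, y_hat)) / n
    return 2 * sxy / (vx + vy + (mx - my) ** 2)

# Invented observed and predicted log K values for five compounds
y_obs = [1.2, 1.5, 1.9, 2.4, 2.8]
y_pred = [1.3, 1.4, 2.0, 2.3, 2.9]
print(round(r2(y_obs, y_pred), 3), round(rmse(y_obs, y_pred), 3), round(ccc(y_obs, y_pred), 3))
```

Unlike a plain correlation coefficient, CCC falls below 1 when predictions are systematically shifted or rescaled relative to the observations, which is why it complements R² and RMSE.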
Figure 1. Predicted vs. experimental log K scatter plots for each MLR model, with training set observations in blue and validation set values in orange. Compounds are numbered in ascending log K order (Tables S1–S3).
Figure 2. Predicted vs. experimental log K scatter plots for each SVR model, with training set observations in blue and validation set values in orange. Compounds are numbered in ascending log K order (Tables S1–S3).
Figure 3. Predicted vs. experimental log K scatter plots for each GPR model, with training set observations in blue and validation set values in orange. Compounds are numbered in ascending log K order (Tables S1–S3).
Standardized regression coefficients of descriptors for the linear models and permutation importance of descriptors for the linear and non-linear models. Where possible, columns correspond to the same or a similar descriptor across models.
| Model | Descriptor values |
|---|---|
| Standardized regression coefficients | |
| MLRh | −0.326, −0.089, −0.126, 0.060, −0.133 |
| MLRc | −0.329, 0.101, −0.153, −0.126 |
| MLRb | −0.042, −0.131, −0.055 |
| Permutation importance | |
| MLRh | 1.43, 0.096, 0.214, 0.048, 0.237 |
| MLRc | 1.79, 0.156, 0.376, 0.264 |
| MLRb | 1.50, 0.272, 0.166 |
| SVRh | 0.324, 0.155, 0.422, 0.189, 0.351 |
| SVRc | 1.04, 0.0938, 0.226, 0.237 |
| SVRb | 0.837, 0.218, 0.511, 0.319 |
| GPRh | 1.92, 0.0472, 0.476, 0.564 |
| GPRc | 1.33, 0.0664, 0.254, 0.0129, 0.513 |
| GPRb | 0.638, 0.985, 1.01 |
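The table combines two quantities. A standardized coefficient rescales a raw MLR coefficient as β_j · s(x_j) / s(y), so descriptors on different scales become comparable; permutation importance measures how much a fitted model's error grows when one descriptor column is randomly shuffled, which works for non-linear models too. The sketch below assumes mean squared error as the error measure and a simple average over 20 shuffles; the caption does not state the metric or the number of repeats, and the helper names and toy two-descriptor linear model are hypothetical. scikit-learn offers a library version as `sklearn.inspection.permutation_importance`.

```python
import random
from statistics import mean, pstdev

def standardized_coefs(coefs, X, y):
    """Rescale raw linear-model coefficients: beta_j * sd(x_j) / sd(y)."""
    sd_y = pstdev(y)
    return [b * pstdev([row[j] for row in X]) / sd_y for j, b in enumerate(coefs)]

def mse(model, X, y):
    """Mean squared error of model predictions."""
    return mean((model(row) - target) ** 2 for row, target in zip(X, y))

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE after shuffling each descriptor column in turn."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # copy each row with only descriptor j replaced by its shuffled value
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        scores.append(mean(increases))
    return scores

# Hypothetical fitted linear model with two descriptors (values invented)
COEFS = [2.0, 0.1]

def toy_model(row):
    return 0.5 + sum(c * v for c, v in zip(COEFS, row))

X = [[float(i), float((7 * i) % 5)] for i in range(10)]
y = [toy_model(row) for row in X]
print(standardized_coefs(COEFS, X, y))
print(permutation_importance(toy_model, X, y))
```

For a linear model both measures rank descriptors similarly, while permutation importance is the one that also applies to SVR and GPR.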
Descriptor structural contribution and related solvent interaction based on descriptor analysis.
| Solvent interaction | Main structural contribution | MLR descriptors | SVR descriptors | GPR descriptors |
|---|---|---|---|---|
| | Atom count/chain length | | | |
| | Molecule surface area | | | |
| | Branching | | | |
| | Lipophilicity | | | |
| | Gasteiger charge | | | |
| | Electronegativity | | | |
| | Bond order | | | |
| | Heteroatoms/hydrogen-bonding atoms | | | |

\* Descriptors that relate to multiple structural contributions.
Figure 4. Influence plot for the hexane MLR model. Dotted horizontal lines mark the limits beyond which compounds are possible outliers, and vertical lines mark the limit beyond which compounds have high leverage. Point size is scaled by the Cook's distance (D) of each point. Cations are numbered in ascending log K order (Tables S1–S3).
Figure 5. Influence plot for the cyclohexane MLR model. Dotted horizontal lines mark the limits beyond which compounds are possible outliers, and vertical lines mark the limit beyond which compounds have high leverage. Point size is scaled by the Cook's distance (D) of each point. Cations are numbered in ascending log K order (Tables S1–S3).
Figure 6. Influence plot for the benzene MLR model. Dotted horizontal lines mark the limits beyond which compounds are possible outliers, and vertical lines mark the limit beyond which compounds have high leverage. Point size is scaled by the Cook's distance (D) of each point. Cations are numbered in ascending log K order (Tables S1–S3).
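Influence plots like Figures 4–6 combine three per-compound diagnostics from an ordinary least-squares fit: the leverage h_i (diagonal of the hat matrix X(XᵀX)⁻¹Xᵀ), the internally studentized residual r_i = e_i / (s·√(1 − h_i)), and Cook's distance D_i = e_i² h_i / (p s² (1 − h_i)²), where p counts the fitted parameters including the intercept. The stdlib sketch below computes all three. The cut-offs drawn in the figures are not given, so the limits used here (|r| > 3, and h* = 3p/n as in QSPR Williams plots) are assumptions, as are the helper names and the single-descriptor toy data. statsmodels draws a comparable chart with `influence_plot` in `statsmodels.graphics.regressionplots`.

```python
import math

def invert(a):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]

def influence(X, y):
    """OLS with intercept: coefficients, leverages, studentized residuals, Cook's D."""
    D = [[1.0] + list(row) for row in X]  # design matrix with intercept column
    n, p = len(D), len(D[0])
    xtx = [[sum(D[k][i] * D[k][j] for k in range(n)) for j in range(p)] for i in range(p)]
    xtx_inv = invert(xtx)
    xty = [sum(D[k][i] * y[k] for k in range(n)) for i in range(p)]
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [y[k] - sum(b * d for b, d in zip(beta, D[k])) for k in range(n)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance
    lev = [sum(D[k][i] * xtx_inv[i][j] * D[k][j] for i in range(p) for j in range(p))
           for k in range(n)]
    stud = [e / math.sqrt(s2 * (1 - h)) for e, h in zip(resid, lev)]
    cook = [e * e / (p * s2) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]
    return beta, lev, stud, cook

# Hypothetical single-descriptor data; the last compound lies far from the rest
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [15.0]]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 31.0]
beta, lev, stud, cook = influence(X, y)
n, p = len(X), len(beta)
h_star = 3 * p / n  # assumed leverage warning limit
for i, (h, r, d) in enumerate(zip(lev, stud, cook), start=1):
    flags = ("high leverage " if h > h_star else "") + ("outlier" if abs(r) > 3 else "")
    print(f"{i}: h={h:.3f} r={r:+.2f} D={d:.3f} {flags}")
```

The leverages always sum to p, so a single compound far from the others in descriptor space stands out on the horizontal axis even when its residual is small.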
Structures and abbreviations of cations.
| | | |
|---|---|---|
| [PrOHMMorp]+ | [4-CNBPy]+ | [EtOHMIm]+ |
| [EtOHM3Am]+ | [Et3S]+ | [1-PrOHPy]+ |
| [CNMeM2iPAm]+ | [(Meo)2Im]+ | [M3BAm]+ |
| [BzPy]+ | [MeoeMMorp]+ | [EtOHM2iPAm]+ |
| [TDC]+ | [C1,9(M2iPAm)2]2+ | [BzMPyrr]+ |
Figure 7. Data series preparation workflow.
SVR hyperparameter tuning values.
| Hyperparameter | Values tested |
|---|---|
| C | 0.001, 0.005, 0.1, 0.5, 1, 5, 10, 50, 100, 500, 1000 |
| ε | 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0, 10.0 |
| γ | 0.001, 0.005, 0.01, 0.05, 0.1, 'auto', 'scale' |
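With 11 values of C, 9 of ε and 7 of γ, an exhaustive search over this grid evaluates 11 × 9 × 7 = 693 combinations, assuming a full Cartesian grid, which the table implies but does not state. The sketch below enumerates the grid and checks that the final SVR settings from the hyperparameter table lie inside it. The dictionary keys follow scikit-learn's `SVR` parameter names (`C`, `epsilon`, `gamma`), and a dict of lists in this form is what `GridSearchCV(param_grid=...)` accepts; the study's actual search code is not given, so the constant names are hypothetical.

```python
from itertools import product

# Values transcribed from the SVR tuning table
C_VALUES = [0.001, 0.005, 0.1, 0.5, 1, 5, 10, 50, 100, 500, 1000]
EPSILON_VALUES = [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0, 10.0]
GAMMA_VALUES = [0.001, 0.005, 0.01, 0.05, 0.1, "auto", "scale"]

# Every combination of the three hyperparameters (full Cartesian grid)
grid = [
    {"C": c, "epsilon": e, "gamma": g}
    for c, e, g in product(C_VALUES, EPSILON_VALUES, GAMMA_VALUES)
]

# Final SVR settings reported for the hexane, cyclohexane and benzene models
final = {
    "SVRh": {"C": 1, "epsilon": 0.001, "gamma": "auto"},
    "SVRc": {"C": 5, "epsilon": 0.001, "gamma": 0.1},
    "SVRb": {"C": 1, "epsilon": 0.001, "gamma": "scale"},
}
print(len(grid))
print({name: params in grid for name, params in final.items()})
```

All three selected models use the smallest ε in the grid, which suggests, though does not prove, that the search favored a narrow insensitive tube for these data.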