Shahram Lotfi, Shahin Ahmadi, Parvin Kumar.
Abstract
Ionic liquids (ILs) have attracted intense attention owing to their unique properties, such as high thermal stability, negligible vapour pressure, high dissolution capacity and high ionic conductivity, as well as their wide applications in scientific fields including organic synthesis, catalysis and industrial extraction processes. Many applications of ILs depend on the melting point (Tm). In the present manuscript, a quantitative structure-property relationship (QSPR) approach is therefore used to develop a model for predicting the melting points of imidazolium ILs. The Monte Carlo algorithm of the CORAL software is applied to build a robust QSPR model for calculating the Tm values of 353 imidazolium ILs. The hybrid optimal descriptor is computed from a combination of SMILES and hydrogen-suppressed molecular graphs (HSGs) and used to generate the QSPR models. Internal and external validation parameters are employed to evaluate the predictability and reliability of the QSPR model. Four splits are prepared from the dataset, and in each split the compounds are randomly distributed into four sets: training set (≈33%), invisible training set (≈31%), calibration set (≈16%) and validation set (≈20%). For the validation sets, the statistical parameters R² (validation), Q² (validation) and IIC (validation) are in the ranges 0.7846-0.8535, 0.7687-0.8423 and 0.7424-0.8982, respectively. For mechanistic interpretation, the structural attributes responsible for the increase or decrease of Tm are also extracted. This journal is © The Royal Society of Chemistry.
Year: 2021 PMID: 35497322 PMCID: PMC9042335 DOI: 10.1039/d1ra06861j
Source DB: PubMed Journal: RSC Adv ISSN: 2046-2069 Impact factor: 4.036
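
The random distribution of each split into four sets, as described in the abstract, could be sketched as follows. This is only an illustration under stated assumptions: the function names, the seed handling and the rounding of the cut points are hypothetical, and CORAL's own splitting procedure may differ.

```python
import random

# Approximate set fractions quoted in the abstract.
SET_NAMES = ("training", "invisible_training", "calibration", "validation")
FRACTIONS = (0.33, 0.31, 0.16, 0.20)


def split_dataset(items, fractions=FRACTIONS, seed=0):
    """Randomly partition items into four disjoint sets (hypothetical helper)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    # Cumulative cut points; the last set takes whatever remains.
    bounds, total = [0], 0.0
    for fraction in fractions[:-1]:
        total += fraction
        bounds.append(round(total * n))
    bounds.append(n)
    return {
        name: shuffled[bounds[i]:bounds[i + 1]]
        for i, name in enumerate(SET_NAMES)
    }


def make_splits(items, n_splits=4):
    """Build several independent random splits, one seed per split."""
    return [split_dataset(items, seed=s) for s in range(n_splits)]


if __name__ == "__main__":
    compounds = [f"IL-{i}" for i in range(353)]  # placeholder identifiers
    for number, split in enumerate(make_splits(compounds), start=1):
        print(number, {name: len(members) for name, members in split.items()})
```

Using a different seed per split keeps the four splits reproducible while still assigning compounds to different sets in each one.
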
The mathematical relationship of validation parameters used for the predictive potential of QSPR models
| The criterion of the predictive potential | References |
|---|---|
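
Several of the criteria named in this document (R², CCC, IIC, MAE) can be computed from observed and predicted Tm values. The sketch below uses the standard textbook definitions of the squared Pearson correlation, Lin's concordance correlation coefficient, MAE and RMSE. The IIC function follows the definition commonly quoted in the CORAL literature (correlation coefficient scaled by the ratio of the mean absolute errors of negative and non-negative residuals); treating that as the definition used here is an assumption, as are all function names.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _sums(obs, pred):
    """Return the means and the centred sums of squares and cross-products."""
    mo, mp = _mean(obs), _mean(pred)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    sxx = sum((o - mo) ** 2 for o in obs)
    syy = sum((p - mp) ** 2 for p in pred)
    return mo, mp, sxy, sxx, syy


def pearson_r(obs, pred):
    _, _, sxy, sxx, syy = _sums(obs, pred)
    return sxy / math.sqrt(sxx * syy)


def r_squared(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    return pearson_r(obs, pred) ** 2


def ccc(obs, pred):
    """Lin's concordance correlation coefficient."""
    mo, mp, sxy, sxx, syy = _sums(obs, pred)
    return 2 * sxy / (sxx + syy + len(obs) * (mo - mp) ** 2)


def mae(obs, pred):
    return _mean([abs(o - p) for o, p in zip(obs, pred)])


def rmse(obs, pred):
    return math.sqrt(_mean([(o - p) ** 2 for o, p in zip(obs, pred)]))


def iic(obs, pred):
    """Index of ideality of correlation (assumed definition, see lead-in)."""
    residuals = [o - p for o, p in zip(obs, pred)]
    negative = [abs(d) for d in residuals if d < 0]
    non_negative = [abs(d) for d in residuals if d >= 0]
    if not negative or not non_negative:
        return 0.0  # degenerate case; returning 0 is a choice of this sketch
    mae_neg, mae_pos = _mean(negative), _mean(non_negative)
    return pearson_r(obs, pred) * min(mae_neg, mae_pos) / max(mae_neg, mae_pos)
```

The cross-validated Q² criteria are left out on purpose, since they require refitting the model and the document does not give enough detail to reproduce that step.
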
The summary of statistical quality and criteria of predictability of the QSPR models
| Split | Set | n | R² | CCC | IIC | Q² | | | | | | MAE | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Training | 113 | 0.7862 | 0.8803 | 0.6548 | 0.7802 | 0.7848 | | | | 31.3 | 24.6 | 408 |
| | Invisible training | 112 | 0.7864 | 0.8865 | 0.7868 | 0.7780 | 0.7830 | | | | 23.5 | 18.8 | 405 |
| | Calibration | 55 | 0.8196 | 0.9029 | 0.9053 | 0.8079 | 0.8112 | 0.8103 | 0.3341 | 0.8070 | 24.4 | 18.8 | 241 |
| | Validation | 54 | 0.8204 | 0.8954 | 0.8972 | 0.8060 | 0.8086 | 0.7419 | 0.1335 | 0.0279 | 28.0 | 22.8 | 238 |
| 2 | Training | 116 | 0.8023 | 0.8903 | 0.7278 | 0.7958 | 0.7998 | | | | 26.4 | 20.6 | 463 |
| | Invisible training | 111 | 0.8334 | 0.8859 | 0.6057 | 0.8277 | 0.8299 | | | | 26.4 | 20.4 | 545 |
| | Calibration | 57 | 0.8256 | 0.9071 | 0.9005 | 0.8137 | 0.8163 | 0.8144 | 0.8301 | 0.8136 | 24.5 | 20.2 | 260 |
| | Validation | 50 | 0.8535 | 0.9133 | 0.8982 | 0.8423 | 0.8271 | 0.7889 | 0.0764 | 0.0215 | 24.7 | 20.7 | 280 |
| 3 | Training | 109 | 0.8116 | 0.8960 | 0.7922 | 0.8052 | 0.8064 | | | | 24.9 | 19.2 | 461 |
| | Invisible training | 107 | 0.8226 | 0.8900 | 0.8274 | 0.8155 | 0.8195 | | | | 27.9 | 22.5 | 487 |
| | Calibration | 62 | 0.7809 | 0.8687 | 0.8837 | 0.7665 | 0.7287 | 0.7267 | 0.6810 | 0.7747 | 33.3 | 26.1 | 214 |
| | Validation | 56 | 0.7846 | 0.8818 | 0.7784 | 0.7687 | 0.6838 | 0.6965 | 0.0256 | 0.0218 | 27.5 | 22.3 | 197 |
| 4 | Training | 118 | 0.8232 | 0.9031 | 0.8195 | 0.8183 | 0.8188 | | | | 25.2 | 18.8 | 540 |
| | Invisible training | 107 | 0.8551 | 0.9038 | 0.7369 | 0.8503 | 0.8471 | | | | 23.1 | 17.6 | 620 |
| | Calibration | 62 | 0.8177 | 0.8952 | 0.9042 | 0.8035 | 0.8224 | 0.8154 | 0.7975 | 0.8093 | 26.8 | 22.0 | 269 |
| | Validation | 47 | 0.8323 | 0.8986 | 0.7424 | 0.8163 | 0.7888 | 0.7077 | 0.1623 | 0.0182 | 23.9 | 17.3 | 223 |
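
The ranges quoted in the abstract for the validation sets can be checked against the four validation rows of this table. The short sketch below does that arithmetic. It assumes that the first five numeric columns of each row are n, R², CCC, IIC and Q², in that order; the dictionary layout and the function name are illustrative only.

```python
# Validation-set rows transcribed from the summary table (splits 1 to 4).
# Column meaning (n, R2, CCC, IIC, Q2) is an assumption about the table layout.
VALIDATION_ROWS = {
    1: {"n": 54, "R2": 0.8204, "CCC": 0.8954, "IIC": 0.8972, "Q2": 0.8060},
    2: {"n": 50, "R2": 0.8535, "CCC": 0.9133, "IIC": 0.8982, "Q2": 0.8423},
    3: {"n": 56, "R2": 0.7846, "CCC": 0.8818, "IIC": 0.7784, "Q2": 0.7687},
    4: {"n": 47, "R2": 0.8323, "CCC": 0.8986, "IIC": 0.7424, "Q2": 0.8163},
}


def stat_range(rows, key):
    """Return (min, max) of one statistic across all splits."""
    values = [row[key] for row in rows.values()]
    return min(values), max(values)


if __name__ == "__main__":
    for key in ("R2", "Q2", "IIC"):
        low, high = stat_range(VALIDATION_ROWS, key)
        print(f"{key}: {low:.4f}-{high:.4f}")
    # R2: 0.7846-0.8535
    # Q2: 0.7687-0.8423
    # IIC: 0.7424-0.8982
```

The three printed ranges agree with the values given in the abstract, which supports the column reading assumed above.
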
Fig. 1 Experimental Tm versus predicted Tm values (A) and residuals of Tm versus predicted Tm (B) for the four QSPR models constructed using TF2.
List of the promoters of increase/decrease of Tm extracted from split 2 using TF2
| Type of descriptors | No. | SA | CWs Probe 1 | CWs Probe 2 | CWs Probe 3 | NT | NiT | NC | Defect[SA] | Comments |
|---|---|---|---|---|---|---|---|---|---|---|
| Promoters of Tm increase | | | | | | | | | | |
| Graph-based descriptors | 1 | VS2-C…5… | 1.34960 | 2.43334 | 1.31153 | 108 | 99 | 50 | 0.0003 | Valence shell of the second order for aliphatic carbon atom equal to 5 |
| | 2 | PT3-C…5… | 0.10558 | 1.51087 | 0.92938 | 99 | 87 | 41 | 0.0006 | The presence of the path of length 3 equal to 5 for a carbon atom |
| | 3 | PT2-C…4… | 0.24752 | 2.00169 | 0.35048 | 72 | 72 | 37 | 0.0003 | The presence of the path of length 2 equal to 4 for a carbon atom |
| SMILES based descriptors | 1 | 1……….. | 7.29691 | 5.95405 | 6.34387 | 113 | 112 | 55 | 0.0000 | Presence of a cyclic ring |
| | 2 | c……….. | 1.71525 | 2.42427 | 0.64969 | 109 | 102 | 55 | 0.0002 | Presence of aromatic carbon |
| | 3 | n……….. | 2.38850 | 1.33337 | 0.67039 | 108 | 102 | 55 | 0.0002 | Presence of aromatic nitrogen |
| | 4 | c…(……. | 2.49408 | 0.42892 | 2.49631 | 101 | 91 | 48 | 0.0002 | Branching at an aromatic carbon |
| | 5 | (…C…(… | 1.27092 | 2.30761 | 2.86614 | 88 | 89 | 39 | 0.0006 | Combination of aliphatic carbon with two branching |
| | 6 | c…(…C… | 1.33043 | 0.11257 | 0.06315 | 83 | 80 | 39 | 0.0003 | Aromatic carbon joined by branching with the aliphatic carbon atom |
| | 7 | […1……. | 0.89010 | 1.76115 | 0.62621 | 73 | 74 | 32 | 0.0006 | Presence of branching connected to the ring |
| | 8 | n…(……. | 0.94747 | 1.97741 | 2.03406 | 71 | 60 | 32 | 0.0005 | Presence of aromatic nitrogen and branching |
| Promoters of Tm decrease | | | | | | | | | | |
| Graph-based descriptors | 1 | VS2-F…6… | −0.31025 | −0.58746 | −0.85323 | 67 | 70 | 30 | 0.0005 | Valence shell of second-order equal to 6 for a fluorine atom |
| | 2 | PT2-C…1… | −0.76524 | −0.99261 | −0.74397 | 85 | 81 | 45 | 0.0004 | The presence of the path of length 2 equal to 1 for a carbon atom |
| SMILES based descriptors | 1 | BOND10000000 | −2.42991 | −1.67430 | −2.03277 | 60 | 58 | 30 | 0.0001 | Presence of double bonds and absence of triple and stereochemical bonds |
| | 2 | […S……. | −0.94276 | −0.55305 | −0.90267 | 35 | 37 | 13 | 0.0015 | Combination of branching and aliphatic sulphur |
| | 3 | […C……. | −1.90870 | −0.15225 | −0.77063 | 76 | 67 | 32 | 0.0009 | Presence of branching connected to aliphatic carbon |
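
Every attribute in this table has correlation weights (CWs) of the same sign in all three probes. Reading a consistently positive CW as a promoter of Tm increase and a consistently negative CW as a promoter of Tm decrease is the usual CORAL convention; that reading is taken as an assumption in the sketch below. The CW triples are copied from a few rows of the table, the dictionary keys paraphrase the Comments column, and the last entry is a made-up mixed-sign case added only to show the third outcome.

```python
# Correlation weights (probe 1, probe 2, probe 3) for a few attributes.
ATTRIBUTE_CWS = {
    "cyclic ring": (7.29691, 5.95405, 6.34387),
    "aromatic carbon": (1.71525, 2.42427, 0.64969),
    "VS2 equal to 6 for fluorine": (-0.31025, -0.58746, -0.85323),
    "BOND10000000": (-2.42991, -1.67430, -2.03277),
    "hypothetical mixed attribute": (0.5, -0.2, 0.1),  # not from the table
}


def classify_promoter(cws):
    """Classify an attribute by the sign of its CWs across all probes."""
    if all(cw > 0 for cw in cws):
        return "increase"
    if all(cw < 0 for cw in cws):
        return "decrease"
    return "undefined"  # sign changes between probes


if __name__ == "__main__":
    for name, cws in ATTRIBUTE_CWS.items():
        print(f"{name}: {classify_promoter(cws)}")
```

Requiring the same sign in every probe is the conservative choice: an attribute whose weight flips sign between Monte Carlo runs gives no stable mechanistic reading.
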
Comparison between some previous models and the present study for the prediction of Tm of imidazolium ILs
| Descriptor type | Feature selection method | Machine learning method | Data set size (training) | Data set size (test) | R² (training) | R² (test) | RMSD (training) | RMSD (test) | Ref. |
|---|---|---|---|---|---|---|---|---|---|
| CODESSA | BMLR | MLR | 16 | 3 | 0.90 | 0.9815 | 19.2 | 13.2 | |
| | | | 25 | 4 | 0.92 | 0.8622 | 15.2 | 29.1 | |
| PaDEL-descriptor | Tree feature selection | MLR | 291 | — | 0.78 | — | 18.2 | — | |
| Group contribution descriptors | — | Group contribution method | 190 | — | 0.90 | — | 28.2 | — | |
| Artificial neural networks | Multilayer perceptron network (MLP) | ANN | 97 | — | 0.99 | — | — | — | |
| CODESSA | BMLR | MLR | 57 | — | 0.74 | — | 29.2 | — | |
| | | | 25 | — | 0.75 | — | 14.5 | — | |
| | | | 18 | — | 0.94 | — | 17.7 | — | |
| | | | 45 | — | 0.69 | — | 20.0 | — | |
| Dragon and CODESSA | — | PLS | 22 | — | 0.95 | — | — | — | |
| | — | | 62 | — | 0.87 | — | — | — | |
| Materials Studio | Genetic algorithm | MLR | 50 | 10 | 0.88 | 0.74 | 29.9 | | |
| | | BA-ANN | 50 | 10 | 0.91 | 0.95 | 12.2 | | |
| CORAL | Monte-Carlo | LR | 226 | 109 | 0.83 | 0.85 | 26.0 | 24.7 | This work |
BMLR: best multilinear regression method, PLS: partial least squares, MLR: multiple linear regression, ANN: artificial neural network.