Predicting the Surface Tension of Deep Eutectic Solvents Using Artificial Neural Networks.

Tarek Lemaoui1,2, Abir Boublia3,2, Ahmad S Darwish4,5, Manawwer Alam6, Sungmin Park7, Byong-Hun Jeon8, Fawzi Banat4,5, Yacine Benguerba1, Inas M AlNashef4,5,2.   

Abstract

Studies on deep eutectic solvents (DESs), a new class of "green" solvents, are attracting increasing attention from researchers, as evidenced by the rapidly growing number of publications in the literature. One of the main advantages of DESs is that they are tailor-made solvents, and therefore, the number of potential DESs is extremely large. It is essential to have computational methods capable of predicting the physicochemical properties of DESs, which are needed in many industrial applications and research. Surface tension is one of the most important properties required in many applications. In this work, we report a relatively generalized artificial neural network (ANN) for predicting the surface tension of DESs. The database used can be considered comprehensive because it contains 1571 data points from 133 different DES mixtures in 520 compositions prepared from 18 ions and 63 hydrogen bond donors in a temperature range of 277-425 K. The ANN model uses molecular parameter inputs (Sσ-profiles) derived from the conductor-like screening model for real solvents. The training and testing results show that the best performing ANN architecture consisted of two hidden layers with 15 neurons each (9-15-15-1). The proposed ANN was excellent in predicting the surface tension of DESs, as R² values of 0.986 and 0.977 were obtained for training and testing, respectively, with an overall average absolute relative deviation of 2.20%. The proposed models represent an initiative to promote the development of robust models capable of predicting the properties of DESs based only on molecular parameters, leading to savings in investigation time and resources.
© 2022 The Authors. Published by American Chemical Society.

Year:  2022        PMID: 36120015      PMCID: PMC9475633          DOI: 10.1021/acsomega.2c03458

Source DB:  PubMed          Journal:  ACS Omega        ISSN: 2470-1343


Introduction

The chemical industry is highly dependent on organic solvents, most of which are harmful, toxic, and expensive and generate waste residues that can cause significant damage to health and safety and contribute to atmospheric pollution.[1] Applying green chemistry and engineering concepts to make processes more sustainable and environmentally friendly has therefore become necessary. Accordingly, one of the 12 principles of green chemistry is that harmful solvents must be avoided, substituted with more sustainable alternatives, or used in limited quantities, and many researchers have focused their attention on developing greener solvents. These solvents must meet specific conditions to qualify as eco-efficient green media, with characteristics such as biodegradability, recyclability, low price, accessibility, and nontoxicity.[2] For these reasons, research on ionic liquids (ILs) has accelerated, and ILs have attracted considerable attention as a class of green solvents owing to their unique physicochemical characteristics.[3] ILs are salts in the liquid state with a low melting point (<373 K), consisting mainly of organic cations with organic or inorganic anions. Also, because of their low vapor pressure, ILs are recyclable, making them more effective and environmentally friendly. However, the poor biodegradability and toxicity of some families of ILs remain a challenge that obstructs their industrial application.[4] Another problem with some ILs is their complex and expensive synthesis procedure. To overcome the drawbacks of ILs, deep eutectic solvents (DESs) have been developed and are considered alternative green solvents to conventional organic solvents and ILs.
Most DESs are generally inexpensive and simple to prepare from natural substances that are easily accessible.[5] Abbott and his team reported the first DES in 2003, a eutectic mixture composed of a quaternary salt (choline chloride) that functions as a hydrogen bond acceptor (HBA) and urea that functions as a hydrogen bond donor (HBD) in a molar ratio of 1:2.[6] A DES can be defined as a mixture of two or more compounds whose eutectic point temperature is significantly lower than that of the ideal mixture. The depression is created by strong intermolecular (H-bond) interactions between the HBD and the HBA and, in some cases, by other noncovalent interactions. DESs have been applied in the literature as an alternative to traditional solvents in many applications, such as catalysis, separation, biochemistry, electrochemistry, and nanotechnology. Therefore, understanding the physical properties of DESs in general, and surface tension in particular, is crucial to evaluating their feasibility in various applications. Surface tension (γ) is defined as the tendency of the fluid to obtain the minimum possible surface area.[7] Many experimental studies have reported on the surface tension of DESs.[8] According to their findings, the main factors that affect the surface tension of DESs are their constituents, the composition of the mixtures, and the intermolecular interactions between HBAs and HBDs.[8] For example, extremely viscous DESs (such as choline chloride-based DESs with polyols/sugars) have high surface tension.[9] Nevertheless, obtaining experimental surface tension data for each DES is time-consuming and expensive because of the theoretically infinite combinations of HBAs/HBDs and their molar ratios. Thus, the development of computational models to predict the surface tension of DESs is essential for their use in various applications.
Table 1 lists the predictive models available in the literature (to the best of our knowledge) for predicting the surface tension of DESs. Haghbakhsh et al.[10] developed three models utilizing a data set of 553 data points from 112 DES compositions. The first model, which utilizes corresponding-states inputs (Tc, Pc, Vc, and ω), demonstrated an average absolute relative deviation (AARD) of 8.80%. In their second paper, the authors developed another two models using group contribution and atomic contribution inputs, and their results showed that the group-contribution-based model performed the best with an AARD of 7.59%. Cea-Klapp et al.[11] predicted the surface tension of DESs by combining the density gradient theory with the perturbed-chain statistical associating fluid theory (PC-SAFT + DGT). Their results showed that an AARD of 1.26% with a maximum variation of 8% was achieved for 34 DES compositions with 334 experimental data points. Also, because their method utilizes the PC-SAFT equation of state, the surface tension trend for mixtures of DESs with other co-solvents can be qualitatively captured, giving it an advantage over other approaches. Nonetheless, the method requires density data in order to fit the PC-SAFT binary interaction parameters (k) for each DES system. More recently, Khajeh[12] developed two multiple linear regression (MLR) models, one utilizing descriptors obtained from the Dragon software and the other utilizing group contribution. The database utilized consisted of 781 experimental data points from 126 DES compositions, and their results showed that the quantitative structure–property relationship (QSPR) model outperformed the group contribution model with AARD values of 3.67 and 5.16%, respectively.
Table 1

Comparison between the State-of-the-Art Models in the Literature for Predicting the Surface Tension of DESs

year | number of DESs | data points | method | AARD % | refs
2020 | 112 | 553 | CS | 8.80 | Haghbakhsh et al.[10]
2021 | 112 | 553 | GC, AC | 7.59, 7.80 | Haghbakhsh et al.[13]
2022 | 34 | 334 | PC-SAFT-DGT | 1.26 | Cea-Klapp et al.[11]
2022 | 126 | 781 | QSPR, GC | 3.67, 5.16 | Khajeh et al.[12]
2022 | 520 | 1571 | ANN | 2.20 | this work

Abbreviations: CS: corresponding states, GC: group contribution, AC: atomic contribution, PC-SAFT-DGT: perturbed chain statistical associating fluid theory coupled with density gradient theory, QSPR: quantitative structure–property relationship, and ANN: artificial neural network.
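
The AARD values compared in Table 1 are not defined explicitly in the text. A minimal sketch, assuming the conventional definition (mean of |predicted − experimental| / experimental, expressed in percent), is given here; the function name and the sample values are illustrative and are not taken from the data set.

```python
def aard_percent(predicted, experimental):
    """Average absolute relative deviation (AARD) in percent.

    Assumes the conventional definition: 100/N * sum(|pred - exp| / |exp|).
    """
    if len(predicted) != len(experimental) or not experimental:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - e) / abs(e) for p, e in zip(predicted, experimental))
    return 100.0 * total / len(experimental)


# Illustrative surface tensions in mN/m (hypothetical, not from the database).
exp_gamma = [50.0, 40.0, 25.0]
pred_gamma = [51.0, 39.0, 25.0]
example_aard = aard_percent(pred_gamma, exp_gamma)
```

Because the deviation is relative, an error of 1 mN/m weighs more heavily for a low-surface-tension DES than for a high-surface-tension one.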

Artificial neural networks (ANNs) have been developed as a powerful method for modeling complex processes. By applying experimental data throughout the learning phase, ANNs help determine the outputs of a system by finding patterns and interactions within a given data set.[14] Numerous reports in the literature have shown the high accuracy of molecular-based ANN models for property prediction.[15−17] For example, Bagh et al.[16] evaluated the applicability of an ANN model to predict the electrical conductivities of 18 ammonium- and phosphonium-based DESs and reported an AARD of 4.4%. Adeyemi et al.[17] developed an ANN bagging model to predict the density of amine-based DESs and reported an R² value of 0.999 for nine DESs. As for surface tension, to the best of our knowledge, no molecular-based machine learning (ML) model for predicting the surface tension of DESs has yet been reported. For the case of ILs, Atashrouz et al.[18] predicted the surface tension of 59 ILs (801 data points) using an ANN model based on thermodynamic properties (lower boiling temperature, molar density, critical pressure, acentric factor, and critical compressibility factor). Their model achieved a remarkable performance with an AARD of 4.5%.
Nonetheless, as with any modeling technique, ANNs also suffer from several disadvantages, such as their tendency to overfit, their high computational requirements, and their low interpretability, which stems from their “black box” nature.[19] Surface tension plays a critical role in identifying the suitability of solvents, especially in the operation and design of mass transfer processes such as extraction, absorption, and distillation.[15] Therefore, in this work, we develop the first ANN model that can predict the surface tension of DESs from their molecular-level structure alone. The inputs of the ANN model are selected to be Sσ-profiles, which are molecular-based parameters that can easily be computed from the conductor-like screening model for real solvents (COSMO-RS). Sσ-profiles have previously been used in ML models such as MLR, support vector machines, genetic algorithms, and ANNs for their reliability in describing solvents and their mixtures.[20] Also, to ensure that the developed ANN model is reliable and robust, the database used includes, to the best of our knowledge, all the surface tension measurements of DESs published in the literature up to the time of writing. Following model development, the ANN model was externally validated and also tested through an applicability domain assessment. A schematic summary of the method used in this work is shown in Figure 1.
Figure 1

Summary of the methodology scheme used in this work.

Methods

Database

In this work, 1571 experimental data points on surface tension (γ/mN m⁻¹) from 133 different DES mixtures with 520 compositions prepared from 4 anions, 14 cations, and 63 HBDs were used to develop the ANN model. Table 2 lists the compositions and references of the DESs used. The data set covers a wide range of surface tension measurements (17.62–80.68 mN m⁻¹) and temperatures (277–425 K) for binary and ternary DES compositions. Note that the data set does not account for the influence of pressure on the surface tension of DESs because pressure-dependent experimental data are not widely reported in the literature. Thus, the pressure has been fixed at 100 kPa for the data set. Additionally, because water is a critical factor that influences surface tension, the water content of all DESs was also considered in the mixture compositions. The experimental surface tensions, DES compositions, temperatures, and corresponding references are given in full detail in Table S1 in the Supporting Information. Additionally, the surface tensions of all 520 DES compositions at 298 K are compiled and summarized in Table S2.
Table 2

List of Investigated DESs with Their Temperature Range, Experimental Surface Tensions, Number of Data Points, and Corresponding References

# | abbreviation | T/K | γ/mN m⁻¹ | n | refs
DES1 | [AcCh][Cl]:U | 313 | 65.10 | 1 | (21)
DES2 | [ATPP][Br]:DEG | 298–243 | 40.86–49.37 | 30 | (22)
DES3 | [ATPP][Br]:TEG | 298–243 | 40.11–48.25 | 30 | (23)
DES4 | [BA][Br]:Gly | 298 | 44.90 | 1 | (24)
DES5 | [BTP][Cl]:DEG | 293–353 | 32.71–66.68 | 7 | (25)
DES6 | [BTP][Cl]:EG | 298 | 66.93 | 1 | (26)
DES7 | [Ch][Cl]:1,2-ButOH | 293–311 | 31.10–34.70 | 40 | (27)
DES8 | [Ch][Cl]:1,3-ButOH | 293–311 | 31.90–40.10 | 40 | (27)
DES9 | [Ch][Cl]:1,4-ButOH | 293–311 | 45.30–47.60 | 50 | (27)
DES10 | [Ch][Cl]:2,3-ButOH | 293–311 | 32.30–35.60 | 40 | (27)
DES11 | [Ch][Cl]:BenA:H2O | 333–353 | 46.90–51.53 | 5 | (28)
DES12 | [Ch][Cl]:CA:H2O | 278–338 | 46.72–70.49 | 27 | (29, 30)
DES13 | [Ch][Cl]:CA | 313 | 60.35 | 1 | (31)
DES14 | [Ch][Cl]:DEG | 293–353 | 34.16–48.40 | 7 | (25)
DES15 | [Ch][Cl]:DGA | 303–343 | 58.30–67.69 | 5 | (32)
DES16 | [Ch][Cl]:EG:H2O | 278–338 | 54.15–56.90 | 25 | (33)
DES17 | [Ch][Cl]:EG | 277–298 | 45.70–51.40 | 64 | (27, 34)
DES18 | [Ch][Cl]:Fru | 298–358 | 59.00–75.00 | 28 | (35)
DES19 | [Ch][Cl]:Glu:H2O | 298–338 | 65.80–78.70 | 17 | (36)
DES20 | [Ch][Cl]:Glu | 293–358 | 68.60–75.00 | 18 | (37, 38)
DES21 | [Ch][Cl]:Gly:H2O | 313–333 | 42.55–56.12 | 2 | (30)
DES22 | [Ch][Cl]:Gly | 293–328 | 45.60–63.70 | 56 | (27)
DES23 | [Ch][Cl]:HexOH | 316–334 | 41.00–43.60 | 40 | (27)
DES24 | [Ch][Cl]:LacA:H2O | 313–333 | 32.02–42.42 | 2 | (30)
DES25 | [Ch][Cl]:LacA | 298–338 | 45.70–48.00 | 9 | (39)
DES26 | [Ch][Cl]:LevA:H2O | 298 | 39.35 | 1 | (40)
DES27 | [Ch][Cl]:Mal:H2O | 313–333 | 37.36–74.49 | 2 | (30)
DES28 | [Ch][Cl]:MalA:H2O | 323 | 57.10–68.20 | 4 | (41)
DES29 | [Ch][Cl]:MalA | 298–425 | 52.30–65.70 | 3 | (42)
DES30 | [Ch][Cl]:MEA | 298–358 | 44.40–49.60 | 28 | (43)
DES31 | [Ch][Cl]:Nin:H2O | 308–333 | 61.02–63.70 | 6 | (44)
DES32 | [Ch][Cl]:OA:H2O | 298 | 60.80 | 1 | (45)
DES33 | [Ch][Cl]:OA | 298 | 75.30 | 1 | (46)
DES34 | [Ch][Cl]:PAA:Act | 298 | 41.86 | 1 | (47)
DES35 | [Ch][Cl]:PEG200:Act | 298 | 22.55–45.56 | 9 | (48)
DES36 | [Ch][Cl]:PEG200:EtAc | 298 | 20.26–43.54 | 9 | (48)
DES37 | [Ch][Cl]:PEG200:Eth | 298 | 20.94–43.15 | 9 | (48)
DES38 | [Ch][Cl]:PEG200:FeCl3:Act | 298 | 22.54–39.97 | 9 | (48)
DES39 | [Ch][Cl]:PEG200:FeCl3:EtAc | 298 | 20.70–41.68 | 9 | (48)
DES40 | [Ch][Cl]:PEG200:FeCl3:Eth | 298 | 21.15–37.45 | 9 | (48)
DES41 | [Ch][Cl]:PEG200:FeCl3:H2O | 298 | 41.46–49.84 | 9 | (48)
DES42 | [Ch][Cl]:PEG200:FeCl3:IsoOH | 298 | 18.18–40.31 | 9 | (48)
DES43 | [Ch][Cl]:PEG200:H2O | 298 | 33.88–34.46 | 5 | (48)
DES44 | [Ch][Cl]:PEG200:H2O | 298 | 45.83–49.21 | 9 | (48)
DES45 | [Ch][Cl]:PEG200:IsoOH | 298 | 19.19–40.05 | 9 | (48)
DES46 | [Ch][Cl]:PEG200 | 298–353 | 35.97–55.03 | 28 | (49)
DES47 | [Ch][Cl]:PEG200:FeCl4 | 298–338 | 31.32–35.59 | 5 | (48)
DES48 | [Ch][Cl]:PEG400 | 298–338 | 43.12–45.62 | 5 | (48)
DES49 | [Ch][Cl]:PenOH | 298 | 47.50 | 1 | (45)
DES50 | [Ch][Cl]:Ph | 298 | 35.46 | 1 | (40)
DES51 | [Ch][Cl]:TFA:H2O | 313 | 35.90 | 1 | (21)
DES52 | [Ch][Cl]:U:H2O | 307–337 | 52.84–74.43 | 16 | (50)
DES53 | [Ch][Cl]:U:H2O | 293–425 | 38.70–57.20 | 8 | (51)
DES54 | [Ch][Cl]:Xyl:H2O | 278–338 | 70.36–80.68 | 25 | (29)
DES55 | [DEEA][Cl]:DEG | 293–353 | 33.67–64.95 | 7 | (25)
DES56 | [EA][Br]:Gly | 298 | 57.60 | 1 | (24)
DES57 | [EA][Cl]:Ace | 313 | 46.30 | 1 | (21)
DES58 | [EA][Cl]:TFA | 313 | 30.10 | 1 | (21)
DES59 | [EA][Cl]:U | 313 | 52.90 | 1 | (21)
DES60 | [MPPyr][N(SO2CF3)2]:EG | 298 | 38.00–38.40 | 3 | (34)
DES61 | [MTP][Br]:DEG | 293–353 | 29.92–62.74 | 7 | (25)
DES62 | [MTP][Br]:EG | 298–328 | 44.64–51.29 | 14 | (15, 38)
DES63 | [MTP][Br]:Gly | 298–328 | 55.95–59.35 | 7 | (38)
DES64 | [MTP][Br]:MDEA | 298–353 | 39.19–43.06 | 21 | (23)
DES65 | [MTP][Br]:MEA | 298–358 | 44.00–55.30 | 28 | (43)
DES66 | [MTP][Br]:TEG | 298–328 | 47.03–49.85 | 7 | (15)
DES67 | [N-DEEA][Cl]:EG | 298–328 | 44.57–51.29 | 14 | (15, 38)
DES68 | [N-DEEA][Cl]:Gly | 298–328 | 55.16–59.35 | 14 | (38)
DES69 | [N-DEEA][Cl]:TFA | 298–328 | 37.51–40.27 | 7 | (15)
DES70 | [PA][Br]:Gly | 298 | 51.70 | 1 | (24)
DES71 | [TBA][Br]:AA | 298 | 34.50 | 1 | (52)
DES72 | [TBA][Br]:DEG | 298–353 | 32.23–53.50 | 7 | (25)
DES73 | [TBA][Br]:EG | 298 | 53.31 | 1 | (26)
DES74 | [TBA][Br]:FA | 298 | 37.20 | 1 | (52)
DES75 | [TBA][Br]:MalA | 298 | 38.20 | 1 | (52)
DES76 | [TBA][Br]:MEA | 298–358 | 33.20–36.10 | 28 | (43)
DES77 | [TBA][Br]:OA | 298 | 42.70 | 1 | (52)
DES78 | [TBA][Br]:PA | 298 | 32.40 | 1 | (52)
DES79 | [TBA][Cl]:Arg | 313–353 | 35.80–40.40 | 15 | (53)
DES80 | [TBA][Cl]:AspA | 313–353 | 33.90–43.40 | 15 | (53)
DES81 | [TBA][Cl]:GluA | 313–353 | 31.20–39.10 | 15 | (53)
DES82 | [TBA][Cl]:Met | 313 | 41.80 | 1 | (53)
DES83 | [TBA][HSO4]:BA | 333–353 | 38.98–42.60 | 5 | (28)
DES84 | [TBA][HSO4]:DGA | 303–343 | 42.82–43.89 | 5 | (32)
DES85 | [TBA][HSO4]:Nin | 308–333 | 38.18–43.23 | 6 | (44)
DES86 | [TEA][Br]:BA | 333–353 | 42.11–52.59 | 10 | (28)
DES87 | [TPA][Br]:EG | 303–353 | 41.91–46.99 | 18 | (54)
DES88 | [TPA][Br]:Gly | 303–353 | 45.77–53.15 | 18 | (54)
DES89 | [TPA][Br]:TEG | 303–353 | 42.07–46.55 | 18 | (54)
DES90 | Bet:CA | 293–333 | 42.90–46.30 | 5 | (8)
DES91 | Glu:Pae:H2O | 288–338 | 62.30–71.30 | 21 | (36)
DES92 | Mat:Pae | 303–343 | 37.88–43.36 | 27 | (55)
DES93 | Men:CaA | 298 | 27.50–29.04 | 4 | (56)
DES94 | Men:CapA | 298 | 29.41 | 1 | (56)
DES95 | Men:OcA | 298 | 28.04 | 1 | (57)
DES96 | Men:OcA | 298–333 | 18.98–26.67 | 40 | (58)
DES97 | PDA:1,4-ButOH | 293–318 | 38.98–46.79 | 114 | (59)
DES98 | PEG200:LacA:Act | 298 | 23.73–43.41 | 9 | (48)
DES99 | PEG200:LacA:EtAc | 298 | 19.93–39.88 | 9 | (48)
DES100 | PEG200:LacA:Eth | 298 | 20.98–42.11 | 9 | (48)
DES101 | PEG200:LacA:H2O | 298 | 44.46–48.40 | 9 | (48)
DES102 | PEG200:LacA:IsoOH | 298 | 18.23–39.61 | 9 | (48)
DES103 | PEG200:NMA | 298–338 | 42.30–45.17 | 5 | (48)
DES104 | PEG200:NMA:Act | 298 | 20.53–40.95 | 18 | (48)
DES105 | PEG200:NMA:EtAc | 298 | 19.66–39.34 | 18 | (48)
DES106 | PEG200:NMA:Eth | 298 | 19.66–40.09 | 18 | (48)
DES107 | PEG200:NMA:H2O | 298 | 39.85–47.40 | 18 | (48)
DES108 | PEG200:NMA:IsoOH | 298 | 17.62–38.60 | 18 | (48)
DES109 | PEG200:NMA | 298–338 | 38.02–44.17 | 10 | (48)
DES110 | PEG200:ThU:Act | 298 | 22.22–44.16 | 9 | (48)
DES111 | PEG200:ThU:EtAc | 298 | 20.33–40.52 | 9 | (48)
DES112 | PEG200:ThU:Eth | 298 | 21.21–41.65 | 9 | (48)
DES113 | PEG200:ThU:H2O | 298 | 45.06–49.98 | 9 | (48)
DES114 | PEG200:ThU:IsoOH | 298 | 18.81–39.64 | 9 | (48)
DES115 | PEG200:ThU | 298–338 | 41.79–45.08 | 5 | (48)
DES116 | PEG400:ThU:Act | 298 | 23.02–44.04 | 9 | (48)
DES117 | PEG400:ThU:EtAc | 298 | 21.46–42.84 | 9 | (48)
DES118 | PEG400:ThU:Eth | 298 | 19.43–43.68 | 9 | (48)
DES119 | PEG400:ThU:H2O | 298 | 36.15–42.12 | 9 | (48)
DES120 | PEG400:ThU:IsoOH | 298 | 18.13–42.12 | 9 | (48)
DES121 | PEG400:bor | 298–338 | 40.70–42.22 | 5 | (48)
DES122 | Thy:CaA | 298 | 31.75 | 1 | (56)
DES123 | Thy:Cam | 298 | 28.43 | 1 | (57)
DES124 | Thy:CapA | 298 | 30.35 | 1 | (56)
DES125 | Thy:FuA | 298 | 29.09 | 1 | (57)
DES126 | TMG:GlyA | 298 | 32.30 | 1 | (60)
DES127 | TMG:ManA | 298 | 55.92 | 1 | (60)
DES128 | TMG:PAA | 298 | 64.50 | 1 | (60)
DES129 | TMG:Ace | 298 | 40.74 | 1 | (60)
DES130 | ZnCl2:EG | 293–307 | 49.04–53.00 | 6 | (61)
DES131 | ZnCl2:HexOH | 293–305 | 53.59–57.90 | 5 | (61)
DES132 | ZnCl2:U | 297–303 | 45.71–49.44 | 6 | (61)
DES133 | ZnCl2:U | 296–303 | 68.80–73.12 | 4 | (61)

All data points were reported at approximately 100 kPa.

All DES constituents involved are summarized as follows: (a) anions (bromide [Br], chloride [Cl], hydrogen sulfate [HSO4], and bis(trifluoromethylsulfonyl)imide [N(SO2CF3)2]); (b) cations (acetylcholine [AcCh], allyltriphenylphosphonium [ATPP], benzyltriphenylphosphonium [BTP], butylammonium [BA], choline [Ch], N,N-diethylenethanolammonium [DEEA], ethylammonium [EA], N-methyl-N-propylpyrrolidinium [MPPyr], methyltriphenylphosphonium [MTP], N,N-diethylethanolammonium [N-DEEA], propylammonium [PA], tetrabutylammonium [TBA], tetraethylammonium [TEA], and tetrapropylammonium [TPA]); (c) HBDs (1,2-butanediol [1,2-ButOH], 1,3-butanediol [1,3-ButOH], 1,4-butanediol [1,4-ButOH], 2,3-butanediol [2,3-ButOH], acetic acid [AA], acetamide [Ace], acetone [Act], arginine [Arg], aspartic acid [AspA], benzilic acid [BenA], betaine [Bet], borneol [bor], citric acid [CA], capric acid [CaA], camphor [Cam], caprylic acid [CapA], diethylene glycol [DEG], diglycolic acid [DGA], ethylene glycol [EG], ethyl acetate [EtAc], ethanol [Eth], formic acid [FA], iron(III) chloride [FeCl3], fructose [Fru], 2-furoic acid [FuA], glucose [Glu], glutamic acid [GluA], glycerol [Gly], glycolic acid [GlyA], water [H2O], 1,6-hexanediol [HexOH], isopropanol [IsoOH], lactic acid [LacA], levulinic acid [LevA], maltose [Mal], malonic acid [MalA], d-(+)-mandelic acid [ManA], matrine [Mat], N-methyldiethanolamine [MDEA], monoethanolamine [MEA], dl-menthol [Men], methionine [Met], ninhydrin [Nin], N-methyl acetamide [NMA], oxalic acid [OA], octanoic acid [OcA], propionic acid [PA], phenylacetic acid [PAA], paeonol [Pae], 1,3-propanediamine [PDA], polyethylene glycol 200 [PEG200], polyethylene glycol 400 [PEG400], 1,5-pentanediol [PenOH], phenol [Ph], triethylene glycol [TEG], 2,2,2-trifluoroacetamide [TFA], thiourea [ThU], thymol [Thy], trimethyl glycine [TMG], urea [U], xylitol [Xyl], and zinc chloride [ZnCl2]).

Development of the σ-Profiles

The COSMO-RS theory predicts thermodynamic properties by creating a virtual conductor around each molecule; the surface area and charge density of each resulting surface segment are then calculated, and the σ-profile is determined from them.[62] To perform the COSMO-RS calculations, the first step is to build the 3D molecular structures and optimize their ground-state geometries. In this work, the calculation of molecular energy and geometric optimization was carried out for each molecule using the def-TZVP “triple-ζ valence polarized” basis set and the generalized gradient approximation BP86 “Becke-Perdew 86”.[20] Geometrical optimizations were carried out using Turbomole software (TmoleX version 4.5.1). The density convergence threshold for the self-consistent field was set at 10⁻⁶ hartree.[20] The files obtained for each molecule were then exported as “COSMO” files and imported into COSMOThermX 2022. Examples of the 3D structures of the modeled anions, cations, and HBD molecules using COSMOThermX are presented in Figure 2. The molecular polarity is graphically represented by the colors blue and red, where blue is the positive “hydrogen-donating” polarity surface, while red represents the negative “hydrogen-accepting” surface. The green areas characterize neutral or “nonpolar” molecular surfaces.
Figure 2

Examples of the developed COSMO structures in this work of four representative (a) anions, (b) cations, and (c) HBDs.

Calculation of the Sσ-profile Descriptors

Using the generated molecular surfaces shown in Figure 2, the polarity distributions (σ-profiles) of the anions, cations, and HBDs were calculated. The σ-profile of a molecule is a probability distribution that quantifies the relative probability of a molecular surface segment having a certain screening charge density.[63] The curves in the σ-profile also indicate the concentration of a particular atom in the molecule.[64] As a result, the integrated area under the σ-profile curve may be used to obtain a description of the surface of a molecule, which is designated as the Sσ-profile. The Sσ-profile molecular parameter is an a priori quantum chemistry parameter that characterizes the concentration and type of atoms within a certain σ-range. For more information on the Sσ-profile molecular descriptor, the reader is directed to the work of Torrecilla et al.[64] It should be noted that the accuracy of the developed models could be substantially increased if the σ-profiles were partitioned into 51 regions of 0.001 e/Å² width, as this would allow for a more detailed description of the molecule;[65] however, it would also lead to a very complex model as a result of having 51 inputs. Therefore, a compromise should be made between the complexity and the accuracy of the developed model. Several research groups in the literature utilized Sσ-profiles in 6 regions,[66] 8 regions,[67] and 10 regions.[68] In our previous work, we tested several Sσ-profile discretizations in 4, 6, 8, 10, and 12 regions for the prediction of the pH of DESs using MLR and ANN approaches, and our results showed that an 8-level discretization of the Sσ-profile was the best compromise between accuracy and the number of input parameters.
Additionally, the 8-level discretization was found to be sufficient to effectively represent the polarization influence of all functional groups constituting the solvent’s structure.[20] This discretization also provides a clear representation of the 3 main categorical regions: (1) the HBD region, (2) the nonpolar region, and (3) the HBA region, which are further divided into [S1, S2, and S3], [S4 and S5], and [S6, S7, and S8], respectively. For example, the HBD region can be considered as the addition of three regions, where the chemical information of strong HBD groups is compiled within [S1], standard HBD groups are compiled within [S2], and weak HBD groups are compiled within [S3]. Therefore, in this work, an 8-level discretization of the Sσ-profiles was also utilized. First, the COSMO files (Figure 2) were loaded into the BIOVIA COSMOtherm software (version 2022) to calculate the σ-profiles of all 81 constituents (anions, cations, and HBD molecules), which were then imported into Excel. The Sσ-profile of each constituent was then calculated by entering the σ-profile data into MATLAB and computing the integral under the curve in each of the 8 distinct regions using the trapz() function. Thereafter, the Sσ-profiles of the modeled DESs were defined as the molar weighted average of those of the constituents, which is the conventional method utilized in the literature.[20] In this average, xHBA and xHBD are the mole fractions of the HBA (anion + cation) and the HBD, respectively, while Si is the descriptor in region i, from 1 to 8 (e/Å²). Table S3 lists the calculated Sσ-profile descriptors for the 81 DES constituents investigated in this work.
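
The two steps described above (trapezoidal integration of a σ-profile over 8 regions, followed by molar weighted averaging) can be sketched as follows. This is a minimal illustration, not the authors' MATLAB workflow: the mixing rule is assumed to take the form $S_i^{DES} = x_{HBA} S_i^{HBA} + x_{HBD} S_i^{HBD}$, the region boundaries (`edge_idx`) are hypothetical because the text does not list them, and the flat toy profiles are placeholders rather than COSMOtherm output.

```python
def trapezoid(x, y):
    """Trapezoidal-rule integral of y over x (the rule behind MATLAB's trapz)."""
    return sum((x[k + 1] - x[k]) * (y[k] + y[k + 1]) / 2.0
               for k in range(len(x) - 1))


def s_sigma(sigma, p_sigma, edge_idx):
    """Integrate a sigma-profile over consecutive regions.

    edge_idx holds grid indices of the region boundaries; adjacent regions
    share their boundary point. The boundaries are an assumption here.
    """
    return [trapezoid(sigma[a:b + 1], p_sigma[a:b + 1])
            for a, b in zip(edge_idx[:-1], edge_idx[1:])]


def mix_descriptors(x_hba, s_hba, x_hbd, s_hbd):
    """Molar weighted average of HBA and HBD descriptors, region by region."""
    return [x_hba * a + x_hbd * d for a, d in zip(s_hba, s_hbd)]


# Toy sigma grid (e/A^2) with 17 points split into 8 regions of equal width.
sigma = [(k - 8) * 0.003 for k in range(17)]
edge_idx = [0, 2, 4, 6, 8, 10, 12, 14, 16]

# Flat placeholder profiles; real profiles would come from COSMOtherm.
s_hba = s_sigma(sigma, [1.0] * 17, edge_idx)
s_hbd = s_sigma(sigma, [2.0] * 17, edge_idx)

# A 1:2 HBA:HBD molar ratio gives x_HBA = 1/3 and x_HBD = 2/3.
s_des = mix_descriptors(1 / 3, s_hba, 2 / 3, s_hbd)
```

In practice the HBA descriptor would itself combine the cation and anion contributions before mixing, and mixtures with more than two constituents would need one term per constituent.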

Artificial Neural Network

The ANN model, inspired by the anatomy of biological neurons, is composed of a network of mathematical functions called “neuron nodes” that relate the various components and layers of the network together. Neurons are directly connected through links that go through an activation function. The activated and deactivated neuron nodes are collected to create the necessary output response.[19] The primary feature of this pattern is to analyze the data and find patterns and interactions within the data sets.[19] ANNs have been widely used to address various engineering challenges and are well known for their high accuracy and robustness in solving complex problems. ANNs may effectively replace statistical analysis techniques such as autocorrelation, multivariable regression, trigonometric, and linear regression.[69] In this work, the hidden neurons within the neural network (H and HH) are defined through a tanh activation function,[17] which bounds the neuron values to a range between −1 and 1 (−1 denotes a deactivated neuron, while 1 denotes an activated neuron). In these definitions, W represents the weight coefficient of the connection between the input of the layer and the hidden neuron, b represents the intercept bias of the hidden neuron, the subscript m represents the number of the weight coefficient, the subscript n represents the number of the neuron, the subscript p represents the hidden layer (1 or 2), and H and HH denote the neurons in hidden layer 1 and hidden layer 2, respectively. The surface tension (γ) is the final output of the ANN. In this study, the 8 Sσ-profile descriptors and the temperature in K were selected as the network’s inputs, while the surface tension of the DESs was chosen as the output.
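
For orientation, a fully connected network with two tanh hidden layers that is consistent with the symbols defined above can be written as follows. This is a generic sketch rather than the published equations: the input symbol $I_m$, the layer sizes $N_1$ and $N_2$, and the output-layer coefficients $W_n^{out}$ and $b^{out}$ are notation assumed here, and any internal scaling of the tanh argument applied by the software is not shown.

```latex
\begin{aligned}
H_n    &= \tanh\Big(b_{n,1} + \sum_{m=1}^{9}   W_{m,n,1}\, I_m\Big), & n &= 1,\dots,N_1 \\
HH_n   &= \tanh\Big(b_{n,2} + \sum_{m=1}^{N_1} W_{m,n,2}\, H_m\Big), & n &= 1,\dots,N_2 \\
\gamma &= b^{out} + \sum_{n=1}^{N_2} W_n^{out}\, HH_n
\end{aligned}
```

Here $I_1,\dots,I_8$ stand for the eight Sσ-profile descriptors and $I_9$ for the temperature in K.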
The neural network toolbox of the John’s Macintosh Project statistical software (JMP SAS 15) was used to design the fully connected multilayer perceptron ANN models, where 25% of the training data set (271 data points) was used for internal cross-validation. The training algorithm used was the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm. The network’s learning rate was fixed at 0.1, the number of tours was set to 100,000, and a squared penalty method was used for optimization. Input normalization was not used, and the ANN layers were fully connected without using node drop-out. All other options in the JMP SAS 15 software were kept as default.
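
The forward pass of such a fully connected multilayer perceptron can be sketched in a few lines. The sketch below is only an illustration of the architecture, assuming the 9-15-15-1 layout reported in the abstract and a plain tanh activation; it does not reproduce JMP's internals or the BFGS training, and the random weights and the input vector are placeholders.

```python
import math
import random


def dense_tanh(inputs, weights, biases):
    """One fully connected layer with tanh activation (one weight row per neuron)."""
    return [math.tanh(b + sum(w * x for w, x in zip(row, inputs)))
            for row, b in zip(weights, biases)]


def predict_gamma(inputs, params):
    """Forward pass: inputs -> hidden layer 1 (H) -> hidden layer 2 (HH) -> gamma."""
    h = dense_tanh(inputs, params["W1"], params["b1"])
    hh = dense_tanh(h, params["W2"], params["b2"])
    return params["b_out"] + sum(w * v for w, v in zip(params["W_out"], hh))


def random_params(n_in, n_h1, n_h2, seed=0):
    """Placeholder (untrained) parameters, drawn uniformly from [-0.1, 0.1]."""
    rng = random.Random(seed)

    def u():
        return rng.uniform(-0.1, 0.1)

    return {
        "W1": [[u() for _ in range(n_in)] for _ in range(n_h1)],
        "b1": [u() for _ in range(n_h1)],
        "W2": [[u() for _ in range(n_h1)] for _ in range(n_h2)],
        "b2": [u() for _ in range(n_h2)],
        "W_out": [u() for _ in range(n_h2)],
        "b_out": u(),
    }


# Eight hypothetical descriptor values plus the temperature in K (unnormalized).
x = [0.01] * 8 + [298.15]
params = random_params(n_in=9, n_h1=15, n_h2=15)
gamma = predict_gamma(x, params)  # meaningless until the weights are trained
```

Because the inputs are not normalized, the temperature term is several orders of magnitude larger than the descriptor terms, so the trained weights have to absorb that difference in scale.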

Applicability Domain

The applicability domain (AD) is a critical concept in ML, as it enables evaluating the uncertainty in a molecule’s prediction based on its similarity to the compounds used in training.[70] AD has been widely used in ML models to detect structural outliers and define the range of molecules for which the prediction may be considered accurate. Different techniques have been used to determine the AD, the most prevalent being the leverage approach, in which the model is tested based on the leverage value (h) of each chemical.[70] Lower h values (h < h*) imply more similarity to the training set. In contrast, h values higher than the critical leverage value (h > h*) represent molecules that are “different” from the molecules in the training set, and their prediction may be perceived as less reliable owing to the high degree of extrapolation. The leverage value is computed from two matrices:[70] v, a matrix with dimensions of 1 × d* containing the input parameters of the data point, and V, a p × d* matrix of the training inputs, where d* denotes the number of inputs in the ANN model (9 in this work), p denotes the number of experimental data points in training, and the superscript “T” indicates the transpose of a matrix.[70] The critical leverage value (h*) is determined from d* and p.[70] The Williams plot illustrates a model’s domain of applicability by plotting the standardized residuals (SDR) versus the leverage values (h) of each data point. The boundaries of the AD in the Williams plot are −3 < SDR < +3 and 0 < h < h*. The SDRs are determined from the residuals between γpred and γexp, the predicted and the experimental surface tensions, respectively.[20]
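
The quantities behind the Williams plot can be sketched as follows, assuming the standard expressions of the leverage approach: $h = v\,(V^{T}V)^{-1}v^{T}$, $h^{*} = 3(d^{*}+1)/p$, and $SDR_i = (\gamma_{pred,i} - \gamma_{exp,i}) / \sqrt{\sum_k (\gamma_{pred,k} - \gamma_{exp,k})^2 / p}$. These forms are an assumption rather than a quotation, and the function names and the small matrices are illustrative only.

```python
import math


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("V^T V is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def leverage(v, V):
    """h = v (V^T V)^-1 v^T for one input row v and training matrix V."""
    d = len(v)
    vtv = [[sum(row[i] * row[j] for row in V) for j in range(d)] for i in range(d)]
    z = solve(vtv, v)  # z = (V^T V)^-1 v^T
    return sum(vi * zi for vi, zi in zip(v, z))


def critical_leverage(n_inputs, n_train):
    """h* = 3 (d* + 1) / p (assumed standard form)."""
    return 3.0 * (n_inputs + 1) / n_train


def standardized_residuals(pred, exp):
    """Residuals divided by their root-mean-square value (assumed form)."""
    res = [p - e for p, e in zip(pred, exp)]
    scale = math.sqrt(sum(r * r for r in res) / len(res))
    return [r / scale for r in res]


# Toy training matrix with 3 points and 2 inputs (illustrative only).
V_toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h_first = leverage(V_toy[0], V_toy)

# With 9 inputs and the 1084 training points of Table 3, the assumed
# formula gives a critical leverage of 30/1084.
h_star = critical_leverage(9, 1084)
```

A data point would then be flagged as outside the applicability domain when its leverage exceeds `h_star` or its standardized residual falls outside ±3.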

Results and Discussion

σ-Profiles

The σ-profile of a molecule is a probability distribution that quantifies the relative probability of a molecular surface segment having a certain screening charge density. The σ-profile can be divided into three areas: (1) the HBA area, σ > 0.001 e/Å²; (2) the nonpolar area, −0.001 < σ < 0.001 e/Å²; and (3) the HBD area, σ < −0.001 e/Å².[20] To determine the input parameters for the ANN model (Sσ-profiles), the σ-profiles of the DES constituents were divided into eight areas, and the integral area under the curve was calculated in each. The Sσ-profiles can then be classified into five classes depending on their charges: (1) the strong donor region [S1 and S2], (2) the weak donor region [S3], (3) the nonpolar region [S4 and S5], (4) the weak acceptor region [S6], and (5) the strong acceptor region [S7 and S8]. From the 81 modeled DES constituents in this work, the σ-profiles of four anions, four cations, and four HBDs are shown in Figure 3 as representative examples, while the Sσ-profiles of all constituents are listed in Table S3. The charge distribution is coded in colors: red denotes the HBA area, blue denotes the HBD area, and green denotes the nonpolar region.
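
The area and class assignments above amount to a simple lookup, sketched here. The thresholds and the descriptor classes are taken from the text; the function name is hypothetical, and assigning the exact boundary values (±0.001 e/Å²) to the nonpolar area is an assumption, since the text leaves them open.

```python
# Five-class grouping of the eight descriptors, as listed in the text.
DESCRIPTOR_CLASS = {
    "S1": "strong donor", "S2": "strong donor", "S3": "weak donor",
    "S4": "nonpolar", "S5": "nonpolar",
    "S6": "weak acceptor", "S7": "strong acceptor", "S8": "strong acceptor",
}


def sigma_area(sigma, threshold=0.001):
    """Classify a screening charge density (e/A^2) as HBA, HBD, or nonpolar."""
    if sigma > threshold:
        return "HBA"
    if sigma < -threshold:
        return "HBD"
    return "nonpolar"  # boundary values are assigned here by assumption


areas = [sigma_area(s) for s in (-0.0015, 0.0, 0.0015)]
```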
Figure 3

Examples of the developed σ-profile in this work of four representative (a) anions, (b) cations, and (c) HBDs.

As shown in Figure 3a, most anion peaks are located on the right-hand side of the curves, in the nonpolar [S4 and S5] and HBA [S6, S7, and S8] areas. Additionally, it can be seen that the negative charges of the chloride and bromide ions give [Cl]− and [Br]− a much stronger screening charge density peak in the S7 region than the other anions. In Figure 3b, the peaks of the cations are noticeable on the left-hand side, covering a large area in the nonpolar [S4 and S5] and HBD [S1, S2, and S3] regions. It can be seen that [Ch]+ and [EA]+ show the highest peaks in the [S2] region, indicating their high positive polarities, while [BTP]+ and [MTP]+ show peaks in the weak donor region [S3]. This is due to the charge stabilization by the CH and CH2 groups neighboring their cationic cores, which also explains the large peaks in the nonpolar [S4] region. Moving on to Figure 3c, the σ-profiles of AA, EG, H2O, and U are illustrated as wide profiles. The observed peaks are between −0.0015 < σ < 0.0015 e/Å², which means that they can exhibit weak HBA and HBD abilities. For example, the left peaks of EG are due to the partial negative charge on the oxygen lone pair of electrons, and the right peaks are due to the positively charged hydrogen. The peak located around 0 e/Å² is due to the nonpolar CH2 surfaces of EG.

First Hidden Layer

Validating the model's predictions against experimental data is always necessary. Therefore, to test the performance of the ANN model in predicting the surface tension of DESs, the data of the 133 DES mixtures were separated into two subsets: a training set including 80% of the DESs and a testing set including the remaining 20%. The testing subset was selected using the “ordered response” method,[71] in which the surface tension values of all DESs at 298 K were sorted from lowest to highest and every fifth DES was then assigned to the external testing subset. The advantage of this method is that it ensures a meaningful and diverse selection of training and testing subsets.[71] The data division is shown in Table 3.
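The “ordered response” selection can be sketched as follows. This is an illustrative reading of the description above, not the authors' code: the function name, the `offset` parameter (which of every five sorted DESs is taken is not stated in the text), and the surface tension values are assumptions.

```python
def ordered_response_split(gamma_298, step=5, offset=2):
    """Split DES ids into training and testing subsets.

    gamma_298 maps a DES id to its surface tension at 298 K. The ids are
    sorted by that value and one out of every `step` is sent to testing.
    ASSUMPTION: `offset` picks the position inside each group of `step`.
    """
    ranked = sorted(gamma_298, key=gamma_298.get)
    test = [d for i, d in enumerate(ranked) if i % step == offset]
    train = [d for i, d in enumerate(ranked) if i % step != offset]
    return train, test


# Synthetic surface tensions (mN/m), used only to exercise the function.
gamma_298 = {f"DES{i}": 30.0 + 0.5 * i for i in range(1, 21)}
train_ids, test_ids = ordered_response_split(gamma_298)
```

Because the testing DESs are drawn evenly along the sorted response, both subsets span the full range of surface tension values.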
Table 3

Statistical Parameters for the Developed ANN Model

| parameter | training |
| --- | --- |
| number of DESs | 93 |
| data points of DES | 1084 |
| DESs considered | DESs 4, 6–13, 15, 17, 19, 21, 23–28, 30–38, 48–52, 54, 56, 58–59, 62–71, 73–79, 81–82, 84–87, 89–97, 99–101, 105–106, 108, 112–117, and 121–133 |

The performance of an ANN model is highly dependent on the number of neurons in the hidden layer, which substantially influences the accuracy and complexity of the developed model.[19] Too few neurons may cause the model to underfit and thus perform poorly on both training and testing data. Too many neurons, on the other hand, may cause the model to overfit, giving high performance on training data but low performance on external testing data. However, there is no direct technique for selecting the most appropriate architecture (number of neurons and number of hidden layers), and the most common method applied in the literature is trial and error. In this section, several network architectures with a single hidden layer are tested with 5, 10, 15, 20, 25, 30, 35, 40, 45, and 50 neurons, and the results are shown in Figure 4. It can be seen from the figure that the ANN model with 25 neurons achieved the lowest root-mean-square error (RMSE) in predicting the surface tension of the testing set, with an RMSE value of 3.69 mN/m.
Figure 4

Effect of the number of hidden neurons on the model’s RMSE.


Second Hidden Layer

To study the effect of adding a second hidden layer, the total number of neurons in the first and second hidden layers was varied between 10 and 50, with 5–5 as the minimum and 25–25 as the maximum. Figure 5 shows the RMSE values for the training and testing sets.
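The trial-and-error search over two-layer architectures can be sketched as an enumeration followed by a selection. This is a sketch under assumptions: the step of 5 neurons mirrors the single-layer study but is not stated for the two-layer case, and `evaluate` is a hypothetical callback that would train a network and return its testing RMSE. A toy stand-in is used here so that the example runs without any training.

```python
from itertools import product


def two_layer_candidates(low=5, high=25, step=5):
    """List (n1, n2) neuron counts for the first and second hidden layers."""
    sizes = range(low, high + 1, step)
    return list(product(sizes, sizes))


def select_best(candidates, evaluate):
    """Return the architecture with the lowest score from `evaluate`."""
    return min(candidates, key=evaluate)


candidates = two_layer_candidates()


def toy_evaluate(arch):
    # Toy stand-in for "train the network and return the testing RMSE".
    # It is NOT the paper's RMSE surface; it only makes the sketch runnable.
    return abs(arch[0] - 15) + abs(arch[1] - 15)


best = select_best(candidates, toy_evaluate)
```

In practice, `toy_evaluate` would be replaced by a routine that trains each candidate on the training subset and scores it on the external testing subset.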
Figure 5

Contour plot of the effect of the number of neurons in layers 1 and 2 on the RMSE for (a) training and (b) testing.

It can be seen from Figure 5 that the ANN architecture with 15–15 neurons achieved the lowest RMSE in predicting the surface tension of the testing set, with an RMSE value of approximately 1.87 mN/m, which is substantially lower than the RMSE of 3.69 mN/m achieved by the optimal single-hidden-layer model with 25 neurons. Therefore, it was concluded that the optimal architecture for predicting the given data set is 9–15–15–1, which is schematically presented in Figure 6. The weight coefficients and biases of each neuron for the developed model are available in Table S4.
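To make the 9–15–15–1 configuration concrete, the sketch below builds a network of that shape and runs one forward pass. Several things here are assumptions: the tanh hidden activation and linear output are not stated in this section, the weights are random placeholders rather than the values in Table S4, and the input vector is illustrative only.

```python
import math
import random


def init_network(layer_sizes, seed=0):
    """Create (weights, biases) per layer with random placeholder weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Forward pass: tanh on hidden layers (ASSUMPTION), linear output."""
    a = list(x)
    for idx, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, a)) + b
             for row, b in zip(weights, biases)]
        is_output = idx == len(layers) - 1
        a = z if is_output else [math.tanh(v) for v in z]
    return a[0]


def count_parameters(layer_sizes):
    """Number of weights plus biases in a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))


ARCH = [9, 15, 15, 1]  # 8 S-sigma descriptors + temperature -> surface tension
network = init_network(ARCH)
inputs = [0.1] * 8 + [298.15 / 425.0]  # placeholder descriptors, scaled T
prediction = forward(network, inputs)  # meaningless until real weights are used
n_params = count_parameters(ARCH)      # 150 + 240 + 16 = 406
```

A fully connected 9–15–15–1 network therefore has 406 adjustable parameters, which gives a sense of the model size relative to the number of training data points.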
Figure 6

Schematic diagram of the best performing ANN model with a 9–15–15–1 configuration.


Input Importance

To assess the importance of the eight Sσ-profile descriptors and the temperature, and their effects on the surface tension, a relative contribution analysis was performed using the “predictor screening” function in the JMP SAS software. The direction of each input's influence on the surface tension is indicated by its sign: a positive sign indicates that increasing the input variable increases the surface tension, while a negative sign indicates that increasing the input variable decreases it. Figure 7 presents the relative contributions of the eight Sσ-profile descriptors and the temperature to the surface tension of the DESs.
Figure 7

Relative contributions of the input parameters for the developed ANN model.

It can be seen from the figure that the most important descriptors are S2, S3, S4, S5, and S7, as they have the largest contributions, while S1, S6, and S8 have much lower contributions. It can also be seen that the non-neutral surfaces pertaining to the HBD [S1, S2, and S3] and HBA [S6, S7, and S8] regions tend to increase the surface tension of the DES, while the neutral surfaces [S4 and S5] have a negative effect on the surface tension. As for the effect of temperature, an increase in temperature tends to decrease the surface tension of the DESs. This result is in accordance with other studies reported in the literature.[28,48,49,55,58,59] It could be attributed to the accompanying increase in the kinetic energy of the molecules, which in turn weakens the DES intermolecular interactions.

Model Evaluation

Training and Testing of the ANN Model

Figure 8 compares the experimental and predicted surface tension values in both training and testing. Additionally, the model’s statistical parameters, including the RMSE, regression coefficient (R²), average standard deviation (SDavg), and AARD, are listed in Table 4.
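For reference, three of these statistics can be computed as in the sketch below. The definitions used are the conventional ones and are assumed rather than taken from this section: R² is computed as one minus the ratio of the residual to the total sum of squares, and AARD as the mean absolute relative deviation in percent. The data are synthetic.

```python
import math


def rmse(y_exp, y_pred):
    """Root-mean-square error."""
    n = len(y_exp)
    return math.sqrt(sum((e - p) ** 2 for e, p in zip(y_exp, y_pred)) / n)


def r_squared(y_exp, y_pred):
    """R^2 as 1 - SS_res / SS_tot (ASSUMED definition)."""
    mean = sum(y_exp) / len(y_exp)
    ss_res = sum((e - p) ** 2 for e, p in zip(y_exp, y_pred))
    ss_tot = sum((e - mean) ** 2 for e in y_exp)
    return 1.0 - ss_res / ss_tot


def aard_percent(y_exp, y_pred):
    """Average absolute relative deviation, in percent."""
    n = len(y_exp)
    return 100.0 * sum(abs((e - p) / e) for e, p in zip(y_exp, y_pred)) / n


# Synthetic surface tensions (mN/m), not values from the paper's database.
y_exp = [40.0, 50.0, 60.0]
y_pred = [41.0, 49.0, 60.0]

model_rmse = rmse(y_exp, y_pred)
model_r2 = r_squared(y_exp, y_pred)
model_aard = aard_percent(y_exp, y_pred)
```

For these three synthetic points, the RMSE is about 0.816 mN/m, R² is 0.99, and the AARD is 1.5%.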
Figure 8

Parity graph of experimental and predicted surface tension values of the ANN model in (a) training and (b) testing.

Table 4

Statistical Parameters for the Developed ANN Model

| parameter | training |
| --- | --- |
| R²training | 0.986 |
| R²scramble | 0.058 |
| RMSE (γ/mN m⁻¹) | 1.464 |
| SDavg (γ/mN m⁻¹) | ±0.385 |
| AARD | 1.43% |
| ADcoverage | 96.9% |

As shown in Figure 8a, the training set predictions closely match the experimental values, with an R² value of 0.986. In the case of the testing subset shown in Figure 8b, the predictions still scatter within a narrow range around the diagonal line, with an R² of 0.977, indicating that the predictions for the external DESs have an acceptable error. The R² and AARD for the total data set (including training and testing) are 0.983 and 2.20%, respectively, which can be considered reliable and satisfactory. The other statistical parameters for both the training and the testing subsets are listed in Table 4. To further check that the ANN is not correlated by chance, the y-scrambling method[70] was used: the experimental data were modified by randomly reordering the surface tension values, and a new 9–15–15–1 model was then developed for the randomly sorted response. As can be seen in Table 4, the low value of the y-scrambling regression coefficient (R²scramble) indicates that the ANN is not correlated by chance. For further model evaluation, a residual plot was used to analyze the model accuracy. Figure 9 shows the remarkable performance of the proposed model in predicting the surface tension of DESs, where the majority of the residuals were in a range of ±5 mN m⁻¹, with an overall SDavg of ±0.627. Based on these findings, it can be concluded that the developed ANN model can adequately predict the surface tension of DESs with an acceptable error.
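The idea behind the y-scrambling check can be illustrated with a much simpler model. In the sketch below, a one-variable least-squares line stands in for the 9–15–15–1 ANN purely to keep the example short; the data are synthetic and the function names are hypothetical. The response is shuffled, the model is refitted, and the resulting R² is compared with that of the unshuffled fit.

```python
import random


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(y, y_hat):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def fit_and_score(x, y):
    """Refit the stand-in model and return its R^2 on the same data."""
    slope, intercept = fit_line(x, y)
    return r_squared(y, [slope * xi + intercept for xi in x])


def y_scramble_r2(x, y, seed=0):
    """Shuffle the response, refit, and return the scrambled R^2."""
    rng = random.Random(seed)
    y_shuffled = list(y)
    rng.shuffle(y_shuffled)
    return fit_and_score(x, y_shuffled)


# Synthetic data with a genuine input-response relationship.
x = [float(i) for i in range(50)]
y = [2.0 * xi + 1.0 for xi in x]

r2_true = fit_and_score(x, y)
r2_scrambled = y_scramble_r2(x, y)
```

A large gap between the two values is what indicates that the original fit reflects a real input-response relationship rather than chance.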
Figure 9

The residual deviation between the experimental and predicted surface tension values.


Applicability Domain

An essential feature of any model is the ability to predict the modeled property of external DESs reliably, and thus an accurate evaluation of a model’s true predictive capability is crucial. To verify the model's applicability to external DESs, the AD of an ANN model can be tested using both the leverage (h) and SDR methods. The Williams plot of all data points is shown in Figure 10, where the AD limits are as follows: 0 < h < h* = 0.03 for the x-axis and −3 < SDR < +3 for the y-axis.[70]
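The AD limits above translate into a simple membership test. The sketch below assumes the conventional warning leverage h* = 3(p + 1)/n, which is not stated in this section; with p = 9 inputs and n = 1084 training data points it gives about 0.028, consistent with the reported h* = 0.03. The (h, SDR) pairs are synthetic and the function names are hypothetical.

```python
def leverage_threshold(n_inputs, n_training_points):
    """Conventional warning leverage h* = 3(p + 1)/n (ASSUMED formula)."""
    return 3.0 * (n_inputs + 1) / n_training_points


def in_domain(h, sdr, h_star, sdr_limit=3.0):
    """True if a point lies inside the Williams-plot AD limits."""
    return 0.0 < h < h_star and abs(sdr) < sdr_limit


def ad_coverage(points, h_star):
    """Percentage of (h, SDR) points that fall inside the AD."""
    inside = sum(1 for h, sdr in points if in_domain(h, sdr, h_star))
    return 100.0 * inside / len(points)


h_star = leverage_threshold(n_inputs=9, n_training_points=1084)

# Synthetic (leverage, standardized residual) pairs, for illustration only.
points = [(0.01, 0.5), (0.02, -2.9), (0.05, 0.1), (0.01, 3.5)]
coverage = ad_coverage(points, h_star)
```

Here the third point would be flagged as a structural outlier (h above h*) and the fourth as a response outlier (SDR beyond ±3).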
Figure 10

Williams plot for the surface tension of the total set of DESs.

As can be observed, almost all DESs in the external testing set of the ANN model were within the AD limits, as the ADcoverage in testing was determined to be 96.6% of all data points. The predictions of a few DESs in training and testing at certain temperatures were considered response or structural outliers because they had a leverage value higher than h* or SDRs outside the ±3 limits. However, these outliers account for less than 4% of the total data points. Overall, the results of the AD evaluation suggest that the developed ANN demonstrates ample robustness and generalizability due to its large AD and structural coverage, which is a consequence of the 520 DES compositions included in the development of the ANN.

Conclusions

The demand for computational methods capable of predicting the physicochemical properties of solvents for screening purposes is rapidly increasing, particularly given the theoretically infinite number of designer solvents, such as DESs. This work presents an ANN model for predicting the surface tension of DESs. To ensure that the developed ANN is reliable and robust, a database was used that, to the best of our knowledge, contains all surface tension measurements of DESs reported in the literature. The data set includes 1571 points from 133 different DES mixtures with 520 different compositions and temperatures prepared from 4 anions, 14 cations, and 63 HBDs. The ANN uses molecular-based parameters as inputs, which are easily obtained from COSMO-RS (Sσ-profiles), and does not require experimental data as input. Based on the external testing results, the optimal ANN architecture was determined to be two hidden layers with 15 neurons in each layer (9–15–15–1 configuration). The ANN model demonstrated high performance in both training and testing, with an AARD of 1.43% in training and 3.04% in testing. The ANN model also demonstrated a wide domain of applicability covering a large range of DES molecular structures. In summary, the statistical performance of the model indicates that the surface tension predictions can be considered reliable and can be used to estimate the surface tension of DESs in the absence of experimental data.
