
Transition State Theory-Inspired Neural Network for Estimating the Viscosity of Deep Eutectic Solvents.

Liu-Ying Yu¹,², Gao-Peng Ren¹, Xiao-Jing Hou¹,², Ke-Jun Wu¹,²,³, Yuchen He⁴.

Abstract

Accurate prediction of the viscosity of solvent materials, especially those with complex interactions, remains an unresolved problem. Deep eutectic solvents (DESs), an emerging class of green solvents, suffer from a severe lack of viscosity data; as a result, their application is still at the stage of random trial and error, and they are difficult to implement on an industrial scale. In this work, we demonstrate the successful prediction of the viscosity of DESs based on the transition state theory-inspired neural network (TSTiNet). The TSTiNet uses multilayer perceptrons (MLPs) to calculate the parameters of the transition state theory-inspired equation (TSTiEq) and is verified on the most comprehensive DES viscosity data set to date. For the energy parameters of the TSTiEq, combining the constant assumption with fast iteration through an MLP allows the TSTiNet to achieve its best performance (an average absolute relative deviation of 6.84% on the test set and R² of 0.9805). Compared with traditional machine learning methods, the TSTiNet generalizes better and, under the constraints of the thermodynamic formulation, dramatically reduces the maximum relative deviation of prediction. It requires only the structural information on DESs and is the most accurate and reliable model available for DESs viscosity prediction.

Entities: Chemical

Year: 2022 PMID: 35912349 PMCID: PMC9335917 DOI: 10.1021/acscentsci.2c00157

Source DB: PubMed Journal: ACS Cent Sci ISSN: 2374-7943 Impact factor: 18.728

Introduction

Solvent materials occupy a strategic position in the fields of biology, pharmacy, medical treatment, chemistry, and chemical engineering.[1−5] Green chemistry requires the use of green solvents that are nontoxic and harmless to the human body and the environment. Deep eutectic solvents (DESs) are expected to enable the design of chemical processes that neither use nor generate harmful chemicals, owing to their unique physical and chemical properties such as low vapor pressure, high thermal stability, low flammability, high solubility, wide liquid range, and designable structures.[6] The synthesis of DESs is 100% atom economical, requiring only simple mixing of the components, without waste generation or further purification steps.[7] These attractive properties make them potential substitutes for conventional organic solvents and ionic liquids, and some breakthroughs have been made in the fields of gas absorption,[8,9] extraction and separation,[10,11] bioengineering,[12] nanotechnology,[13] analytical chemistry,[14] catalysis,[15,16] etc. Although DESs have received widespread attention, the serious lack of viscosity information has kept their application at the stage of random trial and error and makes it difficult to apply them on an industrial scale.[17,18] Viscosity is the internal friction, or resistance to flow, caused by intermolecular interactions and is very important in all physical processes involving fluid movement or component dissolution. Viscosity information determines the dimensions of a pipe system, the specifications of pumps and heat exchangers, the operability of mixing and separation processes, and the application of the product. Understanding the viscosity of DESs is considered a top priority in investigating their applications in different fields and designing the application processes.
To obtain viscosity information on the immeasurable number of DESs (the theoretically possible combinations of components that exhibit eutectic behavior are unlimited[19,20]), an accurate method for determining their viscosity is needed. Most of the proposed viscosity models of DESs either apply to only one kind of DES or are built on a small database. The viscosity model for choline chloride-based DESs[21] and the viscosity model that applies only to hydrophobic DESs[22] belong to the former group. The latter group is common in applications based on small modeling databases. For example, models have been proposed to predict the viscosity of 27 different DESs through the cubic plus association (CPA) and perturbed chain-statistical associating fluid theory (PC-SAFT) equations of state (EOSs). Coupled with the friction theory[23] or the free volume theory,[24] these models have deviations of 4.4% and 2.7%, respectively. Such models can generally achieve a small average absolute relative deviation (AARD), but their narrow scope of application limits their practicability. There is only one viscosity model considering all types of DESs to date.[25] However, it is a regression model that requires some experimental viscosity data as inputs. Besides, the AARD of the model is as high as 10.4%, and the maximum absolute relative deviation (MARD) reaches 83.9%. This result is still unsatisfactory. To predict the viscosity of DESs accurately and efficiently, it is necessary to develop a comprehensive prediction model with an extensive database covering every type of DES and a small prediction deviation. The use of machine learning in physicochemical property modeling has great potential to accelerate the discovery and application of emerging solvent materials.
The neural network (NN) is currently one of the most commonly used machine learning methods.[26−28] With powerful abilities of feature extraction and function learning, NN has arisen as a potential and very suitable approach in quantitative structure–property relationship (QSPR) models and quantitative structure–activity relationship (QSAR) models.[29−33] However, the main weakness of the plain NN model is its poor portability. The prediction of the plain NN model is only driven by the stack of data, while the laws of physics are omitted. Hence, for an uneven data set (e.g., the viscosity data set has a large proportion of low viscosity data points), the plain NN models have difficulty capturing the correct input–output relationships in the region of the low proportion part in the data set.[34] Unfortunately, the data distribution is always biased. The data augmentation method is one possible way to alleviate this problem.[35] However, research on the data augmentation method for molecules is still in its early stages, especially in the field of molecule property prediction. In contrast to the most prominent fields of NN applications (e.g., computer vision, natural language processing), most physicochemical characteristics have theoretical or semiempirical equations that are represented by temperature and molecular information. A more efficient and feasible way is to combine the prior knowledge of humans with machine learning methods, and it has been proven to do well in various fields.[36−38] Absolute rate theory[39] and free volume theory[40] based on transition state theory are currently the most commonly accepted theoretical models for calculating the viscosity of pure liquids. By introducing appropriate mixing rules, we establish a transition state theory-inspired neural network (TSTiNet) model, which needs only structural information on DESs. It is the most accurate and reliable model currently available for viscosity prediction of DESs. 
This work provides an initiative to develop reliable models to predict the viscosity of DESs and promote the application and inverse design of DESs.

Results and Discussion

Data Analysis

The database of the viscosity of DESs covers viscosity values from 1.3 to 85 000 mPa·s, which offers a better chance of tailoring solvents to specific tasks. As shown in Table 1, DESs are divided into five categories according to their compositions: (I) the combination of organic salt and metal salt, (II) the combination of organic salt and hydrated metal salt, (III) the combination of organic salt and nonionic hydrogen bond donor (HBD), (IV) the combination of hydrated metal salt and nonionic HBD, and (V) the combination of nonionic hydrogen bond acceptor (HBA) and nonionic HBD. The number of different types of DESs investigated in this work is shown in Figure 1A. Type I, II, and IV DESs have fewer examples in the database because of the limitation of hydrated and nonhydrated metal halides.[41] Type III and V DESs have the most, as they are usually selected from a wide range of natural compounds and thus are less toxic and less expensive than other classes.[42]

Table 1

General Formula for the Classification of DESs

type	general formula	terms
Type I	Cat⁺X^– + zMCl_x	M = Zn, Sn, Fe, Al, Ga, In
Type II	Cat⁺X^– + zMCl_x·yH₂O	M = Cr, Co, Cu, Ni, Fe
Type III	Cat⁺X^– + zRZ	Z = CONH₂, COOH, OH
Type IV	MCl_x + RZ	M = Al, Zn; Z = CONH₂, OH
Type V	RZ₁ + RZ₂	Z_1,2 = OH, COOH

Figure 1

Number of DESs’ viscosity data in the training set and test set. (A) Number of DESs’ viscosity data in different types. (B) Number of DESs’ viscosity data in the different temperature ranges. (C) Number of DESs’ viscosity data in different viscosity value ranges.

The viscosity of DESs is a function of temperature.[43] In this work, the 2229 data points collected span a wide temperature range of 278.15–378.15 K, which is the operating temperature range of most solvents. As shown in Figure 1B, we divide the temperature range into 5 equal intervals, and each interval includes at least 50 data points, which shows that the temperature distribution in our data set is balanced. This feature helps the viscosity model learn the relationship between viscosity and temperature. The histogram in Figure 1C shows a bimodal distribution of the viscosity values with 1000 mPa·s as an interval. Most data points are at a viscosity of less than 1000 mPa·s, and few data points are in the high viscosity region. That is because solvents with low viscosity are often of more interest due to energy consumption considerations. The imbalanced data distribution leads to poor performance of machine learning models in the region of high viscosity.[44−49] Although limited information is available, the prediction of the viscosity of DESs in the high-value region is very meaningful in the field of daily chemicals and petroleum chemicals. Taking the application of DESs as lubricants as an example, an oil film with too low a viscosity is unstable and easy to break, so a higher viscosity is preferred.
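The temperature binning described above can be sketched as follows. This is a minimal illustration assuming NumPy; the sample temperatures are made up for demonstration and are not taken from the data set, and the helper name `count_per_interval` is hypothetical.

```python
import numpy as np

# Temperature range reported for the data set (K).
T_MIN, T_MAX = 278.15, 378.15

# Five equal-width intervals need six bin edges; each interval is 20 K wide.
edges = np.linspace(T_MIN, T_MAX, 6)

def count_per_interval(temperatures):
    """Count how many data points fall into each of the five intervals."""
    counts, _ = np.histogram(temperatures, bins=edges)
    return counts

# Hypothetical temperatures, for illustration only.
sample_T = np.array([283.15, 293.15, 303.15, 313.15, 333.15, 353.15, 373.15])
counts = count_per_interval(sample_T)
```

A balance check in the spirit of the text would then compare `counts.min()` against a chosen threshold (the paper reports at least 50 points per interval for the real data).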

Viscosity Model from Transition State Theory

Transition state theory regards chemical reactions and other processes as continuous changes in the relative positions and potential energies of the constituent atoms and molecules. There is an intermediate configuration on the path between the initial and final arrangements of atoms or molecules, at which the potential energy has a maximum value. The configuration corresponding to this maximum is known as the activated complex, and its state is referred to as the transition state.[50] Both the absolute rate and free volume theories of liquid viscosity, which are based on the transition state theory, are widely accepted for calculating the viscosity of pure liquids.[51] Both theories are based on the assumption of a quasi-crystalline liquid structure.[52] The flow process of a Newtonian fluid can be expressed as

$$\mathrm{X} \xrightarrow{\;E\;} \mathrm{X}' \longrightarrow \mathrm{Y}$$

After the molecule at position X obtains the activation energy E, the activated molecule X′ will move to the new vacancy Y. That is, a molecule is considered to be vibrating near the equilibrium position; when it has enough energy and there is a free space, the molecule will jump to a new equilibrium position. The probability of this jump pj can be expressed as

$$p_j = p_E \, p_v$$

where pE is the probability of attaining sufficient energy to cross the barrier, and pv is the probability that there is sufficient local free volume for a jump to occur. The absolute rate theory simplifies the treatment by assuming that all pores in the fluid have the same volume, so that the temperature dependence of viscosity reduces to the number of possible jumps of molecules across the barrier at different temperatures. This simplification leads to an inaccurate calculation of pv. The free volume theory considers a liquid composed only of hard spheres with repulsive forces and successfully deduces the distribution of pore sizes in the fluid. However, this theory ignores the role of attraction and is incomplete in calculating the probability pE of molecular transitions.
It was found that in a narrow temperature range, either the absolute rate theory or the free volume theory can fit the experimental data well. However, in a wide temperature range, neither equation can successfully depict the viscosity–temperature relationship. For this reason, the concept of combining the absolute rate and free volume theories was proposed to depict the Newtonian viscosity of liquids at various temperatures.[53] According to the definition of Newtonian viscosity, consider two layers of molecules in a liquid, at a distance λ1 apart; the force f applied per square meter makes one layer slide past the other. The difference in the velocity of the two layers is Δu. Then the viscosity η is equal to

$$\eta = \frac{f \lambda_1}{\Delta u}$$

Absolute rate theory describes the process as molecules crossing the barrier from one equilibrium position to another,

$$\Delta u = \lambda \kappa \left[ \exp\left( \frac{f \lambda_2 \lambda_3 \lambda}{2kT} \right) - \exp\left( -\frac{f \lambda_2 \lambda_3 \lambda}{2kT} \right) \right]$$

where λ is the distance between the two equilibrium positions in the direction of movement; λ2 and λ3 are the average distances between two adjacent molecules in the moving layer perpendicular and parallel to the direction of the movement, respectively. κ is the number of times a molecule passes over the barrier per second; k is Boltzmann’s constant, and T is the absolute temperature. Substituting this expression into the definition of η then gives

$$\eta = \frac{f \lambda_1}{\lambda \kappa \left[ \exp\left( \frac{f \lambda_2 \lambda_3 \lambda}{2kT} \right) - \exp\left( -\frac{f \lambda_2 \lambda_3 \lambda}{2kT} \right) \right]}$$

For normal viscous flow, f is relatively small, and since λ, λ2, and λ3 are all about molecular dimensions, it follows that 2kT ≫ fλ2λ3λ. It is thus possible, in expanding the exponentials, to neglect all terms beyond the first, and the result is

$$\eta = \frac{\lambda_1 k T}{\lambda^2 \lambda_2 \lambda_3 \kappa}$$

Although λ is not necessarily equal to λ1, the two quantities are of the same order of magnitude and, as a first approximation, they are taken to be identical (λ = λ1).
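The linearization step (2kT ≫ fλ2λ3λ) can be checked numerically. The sketch below compares the Eyring expression before and after expanding the exponentials, using the identity exp(x) − exp(−x) = 2 sinh(x). All numerical inputs (shear stress, molecular distances, jump frequency) are illustrative assumptions chosen to be of a plausible order of magnitude; they are not values from the paper, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

def eta_exact(f, lam, lam1, lam2, lam3, kappa, T):
    """Viscosity before linearization: eta = f * lam1 / delta_u."""
    x = f * lam2 * lam3 * lam / (2.0 * K_B * T)
    # lam * kappa * (exp(x) - exp(-x)) written with sinh for numerical stability
    delta_u = 2.0 * lam * kappa * math.sinh(x)
    return f * lam1 / delta_u

def eta_linearized(lam, lam1, lam2, lam3, kappa, T):
    """Viscosity after keeping only the first term of the expansion."""
    return lam1 * K_B * T / (lam**2 * lam2 * lam3 * kappa)

# Assumed, illustrative inputs (SI units).
f = 1.0e3                            # shear stress, N/m^2
lam = lam1 = lam2 = lam3 = 5.0e-10   # molecular-scale distances, m
kappa = 1.0e10                       # barrier crossings per second
T = 298.15                           # K

x = f * lam2 * lam3 * lam / (2.0 * K_B * T)
ratio = eta_exact(f, lam, lam1, lam2, lam3, kappa, T) / eta_linearized(
    lam, lam1, lam2, lam3, kappa, T
)
```

With these inputs `x` is far below 1, so `ratio` is indistinguishable from 1 for practical purposes, which is why the linearized form no longer depends on f.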
The product λ2λ3λ1 is approximately the volume inhabited by a single molecule in the liquid state, and hence it may be put equal to V/N, where V is the molar volume and N is the Avogadro number; the viscosity can then be written as

$$\eta = \frac{N k T}{V \kappa}$$

If E is the standard free energy of activation per mole, κ is given by

$$\kappa = \frac{kT}{h} \exp\left( -\frac{E}{RT} \right)$$

where R is the gas constant and h is Planck’s constant; substitution then gives the classic absolute rate viscosity model[54]

$$\eta = \frac{hN}{V} \exp\left( \frac{E}{RT} \right)$$

According to the free volume theory, the pore size distribution can be obtained as

$$P(v) = \frac{r}{V_f} \exp\left( -\frac{r v}{V_f} \right)$$

where P(v) is the probability of finding the free volume v nearby. The average free volume per molecule is Vf. The constant r is a numerical factor needed to correct for the overlap of free volume. Assuming that a minimum local free volume V* is necessary for a jump to occur, one can calculate the probability of finding V* and thus the jump probability pv. So we can get the classic free volume viscosity model[55]

$$\eta = A_0 \exp\left( \frac{r V^*}{V_f} \right)$$

where A0 is a pre-exponential factor. Although these two viscosity models have shortcomings, the absolute rate model fully expresses pE, while the free volume model expresses pv better. The quasi-crystalline theory of liquid viscosity assumes that the viscosity is inversely proportional to the jump probability. Combining the absolute rate and free volume theories, the viscosity of a liquid can be described as follows,

$$\eta = A_\eta \exp\left( \frac{E}{RT} + \frac{r V^*}{V_f} \right)$$

The quantity V* should be close to V0, the close-packed molecular volume per mole, and Vf is defined as

$$V_f = V - V_0$$

This hybrid equation has been applied to many types of liquids, including polyatomic van der Waals as well as hydrogen-bonded liquids.[56] One method for obtaining Vf is to assume that the free volume is the total thermal expansion at constant pressure, where V0 is considered to be independent of temperature; Vf can then be obtained approximately by

$$V_f \approx \alpha V_0 \left( T - T_0 \right)$$

where α is the thermal expansion coefficient, and T0 is the temperature of completely ordered material. For this case, the hybrid equation can be rearranged as

$$\ln \eta = \ln A_\eta + \frac{E}{RT} + \frac{\alpha'}{T - T_0}$$

where

$$\alpha' = \frac{r V^*}{\alpha V_0}$$

As mentioned before, the composition of DES will affect its viscosity.
It is found that[57] for the DES system formed using glycerol as the HBD and different types of ammonium salts as the HBA, the viscosity decreases as the molecular weight of the DES decreases. Hence, in this work, we assumed that Aη varied with the molecular weight M, and the equation thus could be expressed as

$$\ln \eta = \ln\left( A M^{y} \right) + \frac{E}{RT} + \frac{\alpha'}{T - T_0}$$

A, E, α′, T0, and y are adjustable parameters. This equation can be used to correlate viscosity data of liquids, and the adjustable parameters can be obtained if viscosity–temperature data are available. For temperatures ranging from the melting point to the normal boiling point, it can be expressed in a more general form as follows,

$$\ln \eta = \alpha_0 + \alpha_1 \ln M + \frac{\alpha_2}{T} + \frac{\alpha_3}{T - \beta}$$

If the temperature of completely ordered material (β) is treated as ideal, it differs only slightly between substances. To simplify the model, in this work, we assume that β is a constant, and the adjustable parameters α0, α1, α2, and α3 depend only on the molecule. Therefore, according to the Grunberg–Nissan method,[58] the viscosity of the binary nonideal mixture DES can be expressed as follows (which is called TSTiEq):

$$\ln \eta_{\mathrm{DES}} = \sum_{i \in \{\mathrm{HBA},\, \mathrm{HBD}\}} x_i \left( \alpha_{0,i} + \alpha_{1,i} \ln M_i + \frac{\alpha_{2,i}}{T} + \frac{\alpha_{3,i}}{T - \beta} \right) + x_{\mathrm{HBA}} \, x_{\mathrm{HBD}} \, G$$

where ηDES is the viscosity of the DES, x is the mole fraction of the component, M is the molecular weight of the component, and α0, α1, α2, and α3 are the structural parameters. G is the interaction factor of the components HBA and HBD. Both β and G are the energy parameters. To simplify the model, we supposed that the values of G, namely, GI, GII, GIII, GIV, and GV, are the same for the same type of DES, which has been proved to be reasonable in our previous work.[59,60]
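To make the structure of the mixing rule concrete, the following sketch evaluates a TSTiEq-like expression. It assumes (this is an assumption, not confirmed by the surviving text) that each component contributes ln η_i = α0 + α1 ln M + α2/T + α3/(T − β), combined by a Grunberg–Nissan rule with a single interaction factor G. The `ComponentParams` container, the function names, and every numeric parameter value are hypothetical; only the molecular weights of choline chloride and urea are real quantities.

```python
import math
from dataclasses import dataclass

@dataclass
class ComponentParams:
    """Structural parameters of one component (hypothetical container)."""
    alpha0: float
    alpha1: float
    alpha2: float
    alpha3: float

def ln_eta_component(p, M, T, beta):
    """Assumed pure-component term: a0 + a1*ln(M) + a2/T + a3/(T - beta)."""
    return p.alpha0 + p.alpha1 * math.log(M) + p.alpha2 / T + p.alpha3 / (T - beta)

def ln_eta_des(x_hba, hba, M_hba, hbd, M_hbd, T, beta, G):
    """Grunberg-Nissan-type mixing of the two component terms plus x_HBA*x_HBD*G."""
    x_hbd = 1.0 - x_hba
    return (
        x_hba * ln_eta_component(hba, M_hba, T, beta)
        + x_hbd * ln_eta_component(hbd, M_hbd, T, beta)
        + x_hba * x_hbd * G
    )

# Illustrative parameter values only; they are NOT fitted values from the paper.
hba = ComponentParams(alpha0=-5.0, alpha1=0.5, alpha2=1000.0, alpha3=500.0)
hbd = ComponentParams(alpha0=-4.0, alpha1=0.3, alpha2=800.0, alpha3=400.0)
M_HBA, M_HBD = 139.62, 60.06   # g/mol: choline chloride and urea
x_hba = 1.0 / 3.0              # 1:2 molar ratio of HBA to HBD
beta, G = 200.0, 1.5           # assumed energy parameters

ln_eta_300 = ln_eta_des(x_hba, hba, M_HBA, hbd, M_HBD, 300.0, beta, G)
ln_eta_350 = ln_eta_des(x_hba, hba, M_HBA, hbd, M_HBD, 350.0, beta, G)
```

With positive α2 and α3 and T > β, the predicted ln η falls as T rises, which is the qualitative behavior the fixed functional form imposes regardless of what the networks output.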

NN vs TSTiNet

Many metrics can be chosen to evaluate the performance of the models. Since our database covers an extensive range of viscosity, the frequently used mean square error (MSE) and mean absolute error (MAE) are not suitable for evaluating the performance of the models, because absolute errors are dominated by the few high-viscosity points. Therefore, we evaluate both models using AARD, MARD, and the coefficient of determination (R²). AARD reflects the average performance of the model on the data set. MARD and R² reflect the reliability of the model, which is essential for practical applications. Figure 2 shows the network architecture of the TSTiNet model. As shown in Figure 2, we use three multilayer perceptrons (MLPs) to calculate the parameters in TSTiEq, and each MLP has different inputs. In addition to the TSTiNet model, we also implement a plain NN model to predict DESs’ viscosity as a comparison. The plain NN model takes all features as inputs to calculate logarithmic viscosity directly, and the architecture of the NN model is the same as that of the MLP in the TSTiNet.
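The three metrics can be written down in a few lines. The sketch below uses the usual definitions (absolute relative deviation per point in percent, its mean and maximum, and R² as one minus the ratio of residual to total sum of squares). Whether the paper computes R² on viscosity or on logarithmic viscosity is not stated in this section, so that choice is left to the caller; the example values are invented.

```python
import numpy as np

def ard(y_true, y_pred):
    """Absolute relative deviation of each point, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.abs(y_pred - y_true) / np.abs(y_true)

def aard(y_true, y_pred):
    """Average absolute relative deviation (%)."""
    return float(np.mean(ard(y_true, y_pred)))

def mard(y_true, y_pred):
    """Maximum absolute relative deviation (%)."""
    return float(np.max(ard(y_true, y_pred)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up viscosities (mPa·s) spanning two orders of magnitude.
eta_true = [10.0, 100.0, 1000.0]
eta_pred = [11.0, 90.0, 1000.0]
```

In this toy example the 10% miss at 10 mPa·s contributes an absolute error of 1, while the 10% miss at 100 mPa·s contributes 10; relative metrics weight both equally, which is the motivation given in the text for preferring AARD and MARD over MSE and MAE.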

Figure 2

The network architecture of the TSTiNet model. The model takes the structure information, molecular weight, mole fraction, types of DESs with one-hot encoding, and temperature as input features. Then the model uses two MLPs to calculate structural parameters with molecular structures of HBA and HBD, respectively. Besides, the model uses one MLP to calculate energy parameters with all input features. It should be noted that the energy parameters are treated as constants. In other words, the final value of the energy parameters is the average of the values on the training set. The molecular weight, mole fraction, types of DESs, and temperature are fed directly into the TSTiEq. Then TSTiEq gives the final value of the logarithmic viscosity of DESs.

The training process and performances of both models are shown in Figure 3, and the metrics are provided in Table 2. As shown in Figure 3A, neither model falls into severe overfitting, which indicates that both models achieve a trade-off between variance and bias. Figure 3B shows a scatter chart correlating the predicted and reported viscosity values of the training and test sets. The calculated viscosity of DESs using the TSTiNet model displays a better agreement with the corresponding experimental viscosity data than that of the plain NN model. It can be seen that most of the data points are close to the identity line on both models, but some noticeable deviation points appear in the plain NN model. Although the plain NN model has a higher R² on the training set (R² = 0.9999), it has an unacceptable R² on the test set (R² = 0.7464). In comparison, the TSTiNet model achieves high R² on both training and test sets (training set R² = 0.9997 and test set R² = 0.9805). Besides, to ensure a better understanding of the results, the distribution of relative deviations (RD) between the literature and the predicted viscosity on the training and test sets is shown in Figure 3C.
Although most data points in the plain NN model are closer to the line with RD = 0, some data points are far from that line. As mentioned in the Data Analysis section, most models based on machine learning are not good at predicting the region of high viscosity. Thus, we can see that the points with the most significant deviation in the plain NN model are located in the right area of the figure. In contrast, the RD distribution in the TSTiNet model is more evenly spread around the line with RD = 0, and few large deviation points appear in the right region. The box plots of different types of DESs are plotted in Figure 3D. It can be seen that the plain NN model has a very low median absolute relative deviation (ARD) (all less than 5%) for different types of DESs but has many outliers. Further, what is even more difficult to accept in the plain NN model is that some outliers have significantly large values, especially in the type IV DESs. This is further reflected in Figure 3E: the number of data points of ARD > 25% on the TSTiNet model (1.61%) is less than that of the plain NN model (2.69%). This result indicates that the TSTiNet model has a stronger generalization ability than that of the plain NN model. In other words, the TSTiNet model can predict the full range of data under the condition of an uneven distribution of data points.
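The share of points in each ARD range (<5%, 5–15%, 15–25%, >25%) can be computed as sketched below. This is an illustrative helper, not the authors' code; how points lying exactly on a boundary are assigned is an assumption (here a value of exactly 5, 15, or 25 goes to the higher range), and the sample ARD values are invented.

```python
import numpy as np

ARD_BOUNDS = np.array([5.0, 15.0, 25.0])       # range boundaries in percent
ARD_LABELS = ["<5%", "5-15%", "15-25%", ">25%"]

def ard_range_percentages(ard_values):
    """Return the percentage of points falling into each ARD range."""
    ard_values = np.asarray(ard_values, dtype=float)
    # side="right": a value equal to a boundary is placed in the higher range
    idx = np.searchsorted(ARD_BOUNDS, ard_values, side="right")
    counts = np.bincount(idx, minlength=len(ARD_LABELS))
    return dict(zip(ARD_LABELS, (100.0 * counts / counts.sum()).tolist()))

# Invented ARD values (%), for illustration only.
shares = ard_range_percentages([1.0, 2.0, 3.0, 4.0, 10.0, 12.0, 20.0, 30.0])
```

The ">25%" entry of such a breakdown is the quantity compared between the two models in the text (1.61% versus 2.69%).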

Figure 3

Training processes and performances of the plain NN model and the TSTiNet model. (A) Learning curve of the TSTiNet model and the plain NN model. An epoch is when all the training data pass through the network during the training phase. (B) Correlation between the predicted and reported viscosity values of data sets. The achieved R² on the training set and test set are given on the top. (C) Relative deviations between the literature and the predicted viscosity in both data sets. (D) Box plots of ARD on different types of DESs. Each box shows the interquartile range (IQR between Q1 and Q3) for the corresponding set. The central mark (horizontal line) shows the median, and the whiskers show the rest of the distribution based on IQR (Q1 – 1.5 × IQR, Q3 + 1.5 × IQR). Data outside of this range are considered outliers and represented by dark dots. (E) Percentage of ARD on the test set in different ranges, which are <5%, 5–15%, 15–25%, and >25%.

Table 2

Metrics of Different Models on the Test Set

metric	plain NN	TSTiNet-mixed	TSTiNet-variables	TSTiNet-constants
R²	0.7464	0.9805	0.8857	0.7320
AARD (%)	5.23	6.85	6.06	9.85
MARD (%)	82.15	49.28	69.47	99.03

More detailed information can be found in Table 2. Table 2 shows that the TSTiNet model has an AARD comparable to that of the plain NN model but performs better on the metrics of R² and MARD. The plain NN model has a smaller AARD, which may be attributed to the fact that the plain NN model has learned a more complicated formula than the TSTiNet model. In the TSTiNet model, the relationships between viscosity and molecular weight, mole fraction, type of DES, and temperature are described by TSTiEq, whose formula is fixed. The constraints of the equation make the TSTiNet model perform slightly worse in AARD. However, from another perspective, the equation derived from viscosity theory can also keep the model from fitting incorrect relationships. In contrast, the plain NN model is completely driven by data, so it is not well trained in regions with few data points. Therefore, the plain NN model has worse performance on R² and MARD.
In short, although the more flexible plain NN model can get good results for most data points, it is this flexibility that makes the plain NN model susceptible to the uneven distribution of the training set, which makes the reliability of the model poor. In contrast to the plain NN model, the TSTiNet model can give a better prediction on all data sets with high R², which indicates that the TSTiNet model has better generalization ability. In industrial applications, the reliability of the model is of paramount importance. Since the TSTiNet model can accurately predict the viscosity of DESs in the full viscosity range and for all types of DESs, it is a more appropriate model for predicting the viscosity of DESs. As a comparison, we also test the performance of other traditional machine learning methods (random forest, gradient boosting, and LightGBM). After hyperparameter optimization, none of these models achieves performance comparable to that of the TSTiNet (R² > 0.9, MARD < 50%). More detailed comparisons and discussions are shown in the Supporting Information. To give a more comprehensive perspective of the proposed model, we also explore the relationships of viscosity with temperature, mole fraction, and types of HBA and HBD (as shown in the Supporting Information), and the results show that the trends of the model predictions and the experimental values match very well.

Ways to Train the Energy Parameters

The energy parameters refer to β and G in TSTiEq. These two parameters are closely related to the intramolecular or intermolecular interaction energy.[61] The parameter β affects the relationship between viscosity and temperature, and the parameter G affects the relationship between the viscosity of DESs and the type of HBA and HBD. Therefore, it is crucial to fit the energy parameters accurately. To achieve a more accurate viscosity prediction model, we examine three methods to fit the parameters. Given that the energy parameters are theoretically related to the structure information of HBA and HBD, molecular weights, temperature, etc., we first take all features as input to train an MLP model, whose outputs are the energy parameters. The viscosity prediction model including this MLP is called TSTiNet-variables. As shown in Table 2, although the TSTiNet-variables model has a higher R², lower MARD, and comparable AARD compared with the NN model, its R² and MARD are still unacceptable. A possible explanation for this result is that, because all the features are involved in the training of the MLP for energy parameters in the TSTiNet-variables model, the model will approximate the NN model to achieve a lower loss. For example, if the outputs of the MLPs for predicting structure parameters (α0, α1, α2, α3) are all zeros, the TSTiEq will degenerate to

$$\ln \eta_{\mathrm{DES}} = x_{\mathrm{HBA}} \, x_{\mathrm{HBD}} \, G$$

This shows that the viscosity prediction is similar to the prediction of G. This similarity makes the TSTiNet-variables model and the NN model behave similarly (both have poor R² and MARD). To prevent the TSTiNet model from degenerating to the NN model, we trained the energy parameters as constants. Consequently, the energy parameters can be embedded in the viscosity model as trainable model parameters. The viscosity prediction model, including this training method of the energy parameters, is called TSTiNet-constants. As Table 2 shows, the TSTiNet-constants model performs worse than both the NN and TSTiNet-variables models.
This result suggests that the TSTiNet-constants model may have fallen into underfitting, and the higher training loss of the TSTiNet-constants model (Huber loss approaching 0.007) supports this explanation. As a comparison, the loss of the TSTiNet-variables model approaches 0.002. The underfitting of the TSTiNet-constants model may be due to the model falling into a local minimum of the loss function. Furthermore, limited by a low learning rate, the iteration of the energy parameters is very slow, as shown in Figure 4A,B. Both Figure 4A and Figure 4B show that the values of the energy parameters change very little from their initial values, which means that the energy parameters are not well trained. The poor training of the energy parameters causes the TSTiNet-constants model to perform poorly.
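For reference, the Huber loss mentioned above is quadratic for small residuals and linear for large ones. The sketch below implements the standard definition with NumPy; the threshold `delta=1.0` is an assumption, since the value used in the paper is not given in this section, and the inputs are invented.

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: 0.5*r^2 if |r| <= delta, else delta*(|r| - 0.5*delta)."""
    r = np.abs(np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float))
    quadratic = 0.5 * r**2
    linear = delta * (r - 0.5 * delta)
    return float(np.mean(np.where(r <= delta, quadratic, linear)))

# Invented residuals: one small (0.5) and one large (3.0).
loss = huber_loss([0.0, 0.0], [0.5, 3.0])
```

Because large residuals are penalized only linearly, a handful of badly predicted high-viscosity points influences training less than it would under a squared-error loss.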

Figure 4

Energy parameters during the training process and final distribution on the training set. (A) The parameter β over training epochs on the TSTiNet-mixed model and the TSTiNet-constants model; (B) the interaction factors of different types of DESs over training epochs on the TSTiNet-mixed model and the TSTiNet-constants model. (C) The histogram describes the frequency of occurrence of different ranges of values of the parameter β on the training set. The orange curve is the kernel smoothing of the histogram. (D) Box plot of interaction factors on different types of DES. Each box shows the interquartile range (IQR between Q1 and Q3) for the corresponding set. The central mark (horizontal line) shows the median, and the whiskers show the rest of the distribution based on IQR (Q1 – 1.5 × IQR, Q3 + 1.5 × IQR). Data outside of this range are considered outliers and represented by dark dots. Since type I DESs have only one data point in the training set, the interaction factor of type I DESs is not present in the box plot.

Since the TSTiNet-variables model has a degeneration problem and the TSTiNet-constants model has an underfitting problem, neither model can give good viscosity prediction performance. To solve these two problems, a novel method for training energy parameters is constructed. Since the TSTiNet-variables model converges faster and to a lower training loss, we still use a two-layer MLP to calculate the energy parameters. Meanwhile, we still adopt the assumption that the energy parameters are constant to prevent model degeneration. Combining these two premises, we divide the calculation of energy parameters into two processes: the training and nontraining processes. In the training process, we use an MLP to calculate the energy parameters (β, GI, GII, GIII, GIV, and GV) of all the examples in the training set and take the average over the training set.
In the nontraining process (validation process or test process), we ignore the MLP that calculates the energy parameters and directly use the average value of the energy parameters on the training set, which means all the energy parameters are considered as constants. The viscosity prediction model, including this training method of the energy parameters, is called TSTiNet-mixed. As shown in Table 2 and the results of the previous section, the TSTiNet-mixed model offers the best performance on R² and MARD and performance on AARD comparable to that of the NN model and the TSTiNet-variables model. The reason why the TSTiNet-mixed model performs better than the TSTiNet-constants model can be seen from Figure 4A,B. Because an MLP is used for the energy parameters in the training process, the number of model parameters increases, which allows the energy parameters to be trained more effectively. On the other hand, treating the energy parameters as constants during model evaluation avoids the degeneration of the model. Both Figure 4C and Figure 4D show that the assumption that the energy parameters are constants is reasonable. From the plot of the frequency of β on the training set (Figure 4C), 71% of the β values lie between 180 and 220. Therefore, the assumption that the parameter β can be regarded as a constant is reasonable. The box plot of the interaction factor on the training set can be seen in Figure 4D. As shown in Figure 4D, the intervals between the upper and lower quartiles of the interaction factor of four types of DESs are small. This shows that the interaction factor is related only to the type of DES, and the interaction factor of DESs of the same type can also be regarded as a constant. Consequently, the combination of an MLP and the assumption of constant energy parameters gives the TSTiNet-mixed model the best performance.
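The two-phase treatment of the energy parameters can be sketched as follows. This is a simplified illustration in NumPy rather than the authors' implementation: the class `MixedEnergyHead`, the helper `split_energy_params`, the layer sizes, the ReLU activation, and the parameter ordering (β first, then GI to GV) are all assumptions, and gradient updates of the weights are omitted.

```python
import numpy as np

class MixedEnergyHead:
    """Hypothetical sketch: MLP outputs averaged in training, constants at evaluation."""

    def __init__(self, n_features, n_hidden=16, n_params=6, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.1, size=(n_hidden, n_params))
        self.b2 = np.zeros(n_params)
        self.train_mean = None  # stored average of the energy parameters

    def _mlp(self, X):
        hidden = np.maximum(X @ self.W1 + self.b1, 0.0)  # ReLU hidden layer
        return hidden @ self.W2 + self.b2

    def forward_train(self, X_train):
        """Training: compute parameters per example, then average over examples."""
        mean_params = self._mlp(X_train).mean(axis=0)
        self.train_mean = mean_params
        return mean_params

    def forward_eval(self):
        """Validation/test: ignore the MLP and reuse the stored training average."""
        if self.train_mean is None:
            raise RuntimeError("call forward_train before forward_eval")
        return self.train_mean

def split_energy_params(params, des_type):
    """Assumed layout [beta, G_I, ..., G_V]; des_type is 0-based (0 = type I)."""
    return params[0], params[1 + des_type]

# Random stand-in features: 10 training examples with 4 features each.
X_train = np.random.default_rng(1).normal(size=(10, 4))
head = MixedEnergyHead(n_features=4)
train_params = head.forward_train(X_train)
beta, G_III = split_energy_params(head.forward_eval(), des_type=2)
```

The design point illustrated here is that the evaluation path never sees per-example features, so the energy parameters cannot absorb input-dependent variation at prediction time, while the MLP still provides many trainable weights during optimization.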
We also wish to point out that our model is instructive for predicting other labels that have a theoretical basis (e.g., density, thermal conductivity). When combining a theoretical equation with an NN, the first thing to note is that certain features (e.g., temperature, composition) should have a fixed and reasonable relationship in the equation. Furthermore, these features should not be involved in calculating the equation parameters; otherwise, the model will degenerate. Second, for the constant parameters in the equation, a feasible training method is to use an MLP to calculate the mean value of the parameters on the training set and to discard this MLP during model evaluation. According to the experiments, this method avoids both the degeneration and the underfitting problems. Finally, the theory-inspired neural network is especially suitable for cases with few data points and uneven data distribution. For very large data sets with an even data distribution, more complex deep neural networks may be more appropriate.

Conclusion

In this work, a model combining theoretical equations and an NN is used to predict the viscosity of DESs. This model uses prior theoretical knowledge to solve the model generalization problem caused by the lack of data and its uneven distribution. A novel viscosity equation that relates viscosity to molecular weight is derived based on the transition state theory. The energy parameters and structural parameters in the equation are then calculated through three MLPs. The results show that our model (the TSTiNet model) exhibits better viscosity prediction performance than the plain NN model. The TSTiNet model overcomes the shortcoming of most viscosity models, which predict poorly at larger viscosities, and dramatically improves the performance on R2 and MARD. To date, the TSTiNet model is the most accurate and reliable model for predicting the viscosity of DESs.

Materials and Methods

Databank

The viscosity of DESs is one of the most challenging properties to predict, as differences in the water content of DESs dramatically change the viscosity.[62] Furthermore, different measurement methods may also cause deviations in the measured viscosity values. In some cases, the experimental viscosity data show an undesirable variability; i.e., the viscosities presented in the literature show apparent inconsistencies, and significant dispersions are present. For example, choline chloride–malonic acid (1:1) shows an apparent discrepancy at 293.15 K (2016 mPa·s[63] and 900 mPa·s[64]). This variability in the experimental viscosities limits the application of these data in research activity and process development. Hence, experimental data on the viscosities of these solvents are not a reliable source without appropriate analysis and re-elaboration. The data used in the current model development were screened as follows:[65] If there were several reported values of viscosity for a particular temperature and the difference between these values exceeded 50%, the value with the lowest uncertainty was incorporated into the data set. If the reported values had the same uncertainties, the latest published value was used. A sufficiently large database is important for machine learning. Group values derived from a limited number of species may be overfitted and cannot be applied to new species with the same group. Therefore, a comprehensive literature review was carried out as the first step to build an extensive set of liquid viscosity data for DESs. The data set consists of 2229 experimental points, including all the experimental measurements reported in the published literature up to the date of writing this work, to ensure that the developed models are highly reliable and robust. The collected data set includes 183 DESs that are prepared from 49 HBAs and 70 HBDs.
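The screening rule described above can be sketched as follows. This is only an illustration under stated assumptions: the `Measurement` record and `screen` function are hypothetical names, the relative difference is taken with respect to the smallest reported value (the text does not say which reference is used), and values that agree within 50% are all kept (the text does not state how such cases were handled). The uncertainties and years in the example are invented placeholders; only the two viscosity values come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Measurement:
    viscosity: float    # mPa·s
    uncertainty: float  # reported relative uncertainty (placeholder values below)
    year: int           # publication year (placeholder values below)


def screen(measurements: List[Measurement], max_rel_diff: float = 0.5) -> List[Measurement]:
    """Screen the values reported for one DES at one temperature."""
    values = [m.viscosity for m in measurements]
    lowest_value, highest_value = min(values), max(values)
    # Assumption: the discrepancy is measured relative to the smallest value.
    if (highest_value - lowest_value) / lowest_value <= max_rel_diff:
        return list(measurements)  # assumption: consistent values are all kept
    # Discrepancy above the threshold: keep the lowest-uncertainty value,
    # breaking ties by the most recent publication.
    best_uncertainty = min(m.uncertainty for m in measurements)
    candidates = [m for m in measurements if m.uncertainty == best_uncertainty]
    return [max(candidates, key=lambda m: m.year)]


# Choline chloride-malonic acid (1:1) at 293.15 K: the viscosities are the two
# values quoted in the text; the uncertainties and years are made up.
reported = [
    Measurement(viscosity=2016.0, uncertainty=0.05, year=2000),
    Measurement(viscosity=900.0, uncertainty=0.02, year=2001),
]
kept = screen(reported)
```

With these placeholder uncertainties the second value is retained, because the two reports differ by more than 50% and it carries the lower uncertainty.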
The data set covers a wide range of viscosities (1.3–85000 mPa·s), temperatures (278.15–378.15 K), and HBA/HBD mole ratios (1:19–49:1), all measured at atmospheric pressure. The viscosity data set (η/mPa·s) provides the HBA and HBD names, CAS registry numbers, molecular formulas, molecular structures, molar masses, mole ratios, references, viscosity measurement methods, uncertainties, sample sources, purities, sample purification methods, and experimental viscosity data at different temperatures. The complete data set of viscosity values, including the original reference sources of the experimental data, is presented in the Supporting Information. During the development of the model, the viscosity database is divided into three subsets: the training, validation, and test data sets. The training set is used to obtain the parameters of the model, the validation set is used to tune the hyperparameters of the model, and the test set is used to evaluate the reliability and predictive ability of the model. In this study, we randomly split the viscosity data of DESs into training, validation, and test sets at a ratio of 4:1:1.
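A random 4:1:1 split of the 2229 data points can be sketched as below. The function name `split_indices`, the fixed seed, and the rounding rule (floor for the training and validation sets, remainder to the test set) are assumptions; the text states only the ratio and that the split is random. The sketch splits individual data points, which is one reading of the description.

```python
import random
from typing import List, Sequence, Tuple


def split_indices(
    n_points: int, ratios: Sequence[int] = (4, 1, 1), seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle point indices and cut them into train/validation/test parts."""
    indices = list(range(n_points))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    total = sum(ratios)
    n_train = n_points * ratios[0] // total
    n_val = n_points * ratios[1] // total
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(2229)
sizes = (len(train_idx), len(val_idx), len(test_idx))  # (1486, 371, 372)
```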

Generation of Chemical Features

The viscosity of a solvent is mainly determined by the molecular structure. Therefore, it is necessary to generate a series of chemical characteristics that can accurately describe the molecular structure of different solvents, which can be used as the input of the neural network. Here, the secondary division of groups has been utilized according to the practice of the group contribution method.[66] In the current method, the molecular structure of a DES is considered a combination of two types of groups: first-order groups and second-order groups. The first-order groups are used to describe the basic structure of DESs, whereas the role of the second-order groups is to provide supporting information for the molecular structure of DESs whose description is insufficient through the first-order groups.

First-Order Groups

The first level of estimation has a large set of simple groups that describe a wide variety of DESs. At present, most DESs with experimental data of viscosity can be described with only first-order groups. The first-order groups are mainly determined based on the Joback and Reid method[67] and Valderrama method.[68] We selected 45 molecular groups as first-order groups to treat diverse types of DESs, as shown in Table .

Table 3

Chemical Features of the Molecules

without rings				with rings
First-Order Groups
–CH₃	–COOH	>NH/>NH⁺–	–S–	–CH₂–
–CH₂–	–COO–/–COO⁻	NH₄⁺	–SO₂–	>CH–
>CH–	–CHO	=NH	–F	=CH–
>C<	–OH	–NH₂	–Cl/Cl⁻	>C=
=CH₂	–OH(ph)	–NH₂(C=O)	Br⁻	>C<
=CH–	–O–/–O⁻	>P<⁺	M_m	>C=O
>C=	–C≡N	P=O	H₂O	–O–
>C=O	>N<⁺/>N–			>NH
				–N=
				>N–
Second-Order Groups
o-(ph)	m-(ph)	p-(ph)	R	S
Coefficients
G_I	G_II	G_III	G_IV	G_V

There are two points to be noted. (1) –NH₂ is defined in detail: with carbonyl and with others. According to the initial fitting of the viscosity data by the model, the viscosity fitting of DESs containing –NH₂ directly connected to a carbonyl group in the molecular structure is poor. We consider that this structure has a special effect on viscosity, so it is treated separately. (2) If metal ions were divided into different groups, many model input parameters would be introduced, which would easily lead to overfitting problems. Here, we assume that the difference in the contribution of metal ions is related only to the molecular weight and is equal to (n_m + 1)M_m, where n_m is the number of the metal ion and M_m is its molecular weight.

Second-Order Groups

The second-order groups listed in Table provide more structural information about the molecular structure of DESs that is not sufficiently described by the first-order groups, such as the differentiation among isomers for aromatic DESs and chiral DESs. Thus, three groups, ortho (o-(r)), meta (m-(r)), and para (p-(r)), are considered for the substituent groups on the benzene ring. The occurrence of these groups is determined using the primary functional group as the reference (determined following the IUPAC nomenclature for organic compounds). Two configurations of chiral carbon (i.e., RC and SC) are also introduced. For example, as shown in Figure , for thymol, based on the phenolic hydroxyl group, the second-order groups include one o-(r) and one m-(r); for d-glucose, the second-order groups include three RC and one SC.

Figure 5

Structural formulas of thymol acid and d-glucose.

As mentioned earlier, we divided DESs into five categories and performed one-hot encoding on them. Therefore, the input features of the TSTiNet model include 45 × 2 structural features + G (1 × 5 one-hot vector) + temperature + composition × 2 + molecular weight × 2.
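The feature layout listed above adds up to 45 × 2 + 5 + 1 + 2 + 2 = 100 inputs. A minimal sketch of assembling such a vector is given below. The function names, the ordering of the blocks within the vector, and the use of mole fractions for "composition" are assumptions; the group counts in the example are zero placeholders, not an actual group decomposition.

```python
from typing import List, Sequence

DES_TYPES = ("I", "II", "III", "IV", "V")
N_GROUPS = 45  # structural features per component, as stated in the text


def one_hot_type(des_type: str) -> List[float]:
    """Encode the DES type as a 1 x 5 one-hot vector."""
    if des_type not in DES_TYPES:
        raise ValueError(f"unknown DES type: {des_type!r}")
    return [1.0 if t == des_type else 0.0 for t in DES_TYPES]


def build_features(
    hba_groups: Sequence[float],
    hbd_groups: Sequence[float],
    des_type: str,
    temperature_k: float,
    x_hba: float,
    x_hbd: float,
    mw_hba: float,
    mw_hbd: float,
) -> List[float]:
    """Concatenate all inputs; the block order is an assumption of this sketch."""
    if len(hba_groups) != N_GROUPS or len(hbd_groups) != N_GROUPS:
        raise ValueError(f"expected {N_GROUPS} group counts per component")
    return (
        [float(c) for c in hba_groups]
        + [float(c) for c in hbd_groups]
        + one_hot_type(des_type)
        + [temperature_k, x_hba, x_hbd, mw_hba, mw_hbd]
    )


# Choline chloride:urea (1:2), a type III DES, at 298.15 K.
# The group counts are placeholders (all zeros) for illustration only.
features = build_features(
    hba_groups=[0] * N_GROUPS,
    hbd_groups=[0] * N_GROUPS,
    des_type="III",
    temperature_k=298.15,
    x_hba=1 / 3,
    x_hbd=2 / 3,
    mw_hba=139.62,  # g/mol, choline chloride
    mw_hbd=60.06,   # g/mol, urea
)
```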

Model Details

According to the established chemical characteristics, two NNs are implemented based on the Python and PyTorch libraries. One takes all features as input to calculate the viscosity of DESs directly. The other (TSTiNet) includes three MLPs: two take the structural information on the HBA or HBD as input to calculate the structural parameters α0, α1, α2, and α3, and the third takes all features as input to calculate the equation parameters β and G. On the basis of the assumption that β, GI, GII, GIII, GIV, and GV are constants, their average values over the training set are taken as the final values. With all parameter values obtained, the viscosity can be calculated by the TSTiEq. We have examined a series of hyperparameter settings for the MLPs according to the performance on the validation set, including the network architecture and activation function. The search space can be found in Table S1. The results show that using the same hyperparameter settings for the two MLPs that calculate the structural parameters gives better performance. The input features are normalized to make training faster and to reduce the chances of getting stuck in local optima. All MLPs have two hidden blocks, and each block has a fully connected layer with 32 neurons, a GELU nonlinearity,[69] and a batch normalization[70] (BN) layer. Unlike the ReLU activation function, the GELU function output can be both negative and positive, so it can be used in predicting labels that have negative values. In addition, the GELU function has been widely used in natural language processing and in recent state-of-the-art MLP-related models. The experiments in this work show that the GELU function is more suitable for the TSTiNet than ReLU. In regression problems, the MSE loss, MAE loss, and Huber loss are the three main loss functions. After a series of experiments, it was found that the Huber loss gives the best performance. This is because the Huber loss reduces the sensitivity of MSE to outliers and improves on the convergence speed of MAE.
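The trade-off between the three losses can be seen from the standard definition of the Huber loss, which is quadratic for small residuals and linear for large ones. The sketch below uses a threshold `delta = 1.0`; this value is an assumption, since the text does not report the threshold used.

```python
from typing import Sequence


def huber(residual: float, delta: float = 1.0) -> float:
    """Standard Huber loss for a single residual."""
    magnitude = abs(residual)
    if magnitude <= delta:
        return 0.5 * magnitude * magnitude      # MSE-like near zero
    return delta * (magnitude - 0.5 * delta)    # MAE-like for outliers


def mean_huber(pred: Sequence[float], target: Sequence[float], delta: float = 1.0) -> float:
    """Average Huber loss over paired predictions and targets."""
    return sum(huber(p - t, delta) for p, t in zip(pred, target)) / len(pred)


small = huber(0.5)   # 0.125, identical to half the squared error
large = huber(3.0)   # 2.5, versus 4.5 for half the squared error
batch = mean_huber([0.5, 3.0], [0.0, 0.0])  # 1.3125
```

For the residual of 3.0, the Huber value (2.5) grows linearly rather than quadratically, so a single outlier contributes less to the gradient than it would under MSE, while small residuals still have the smooth quadratic behavior that MAE lacks.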
The weights of the neural networks are initialized with Xavier uniform initialization.[71] To avoid overfitting, L2 regularization and early stopping are applied in the models. The models are trained using the AdamW algorithm[72] with default parameters, a learning rate of 0.001, a weight decay of 0.0001, and an early stopping patience of 2000.[43,57,81,73−80]
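The architecture and training settings described above can be sketched in PyTorch as follows. This is a hedged illustration rather than the authors' code: the helper name `make_mlp`, the final linear output layer, the zero bias initialization, the input width of 45, and the output width of 4 (for α0 to α3) are assumptions. The layer order within each block follows the order in which the text lists the layers.

```python
import torch
from torch import nn


def make_mlp(in_features: int, out_features: int, hidden: int = 32) -> nn.Sequential:
    """Two hidden blocks of Linear(32) -> GELU -> BatchNorm, then an output layer."""
    layers = []
    width = in_features
    for _ in range(2):
        layers += [nn.Linear(width, hidden), nn.GELU(), nn.BatchNorm1d(hidden)]
        width = hidden
    layers.append(nn.Linear(width, out_features))  # assumed output layer
    mlp = nn.Sequential(*layers)
    for module in mlp.modules():
        if isinstance(module, nn.Linear):
            nn.init.xavier_uniform_(module.weight)  # Xavier uniform, as in the text
            nn.init.zeros_(module.bias)             # assumption: zero biases
    return mlp


torch.manual_seed(0)
# Hypothetical structural-parameter MLP: 45 group features in, 4 values out.
structure_mlp = make_mlp(in_features=45, out_features=4)

# AdamW with the learning rate and weight decay reported in the text.
optimizer = torch.optim.AdamW(structure_mlp.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.HuberLoss()

# One illustrative optimization step on random data.
inputs = torch.randn(8, 45)
targets = torch.zeros(8, 4)
outputs = structure_mlp(inputs)
loss = loss_fn(outputs, targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Early stopping with the reported patience of 2000 would wrap this step in a loop that tracks the best validation loss and stops once it has not improved for 2000 consecutive epochs.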
