Yunhan Chu¹, Xuezhong He¹
¹ Department of Chemical Engineering, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway.
Abstract
An automated computational framework (MoDoop) was developed to predict biopolymer solubilities in ionic liquids (ILs) on the basis of conductor-like screening model for real solvents (COSMO-RS) calculations of two thermodynamic properties: the logarithmic activity coefficient (ln γ) at infinite dilution and the excess enthalpy (H^E) of the mixture. The calculations start from the two-dimensional structures of biopolymer models and ILs, which are optimized by searching for the lowest-energy conformer and optimizing the molecular geometry. Three lignin models together with one IL dataset were used to evaluate the prediction ability of the developed method. The evaluation results show that ln γ is the more reliable property for predicting lignin solubilities in ILs and that the p-coumaryl alcohol model is the best of the three models for representing lignin molecules. The developed MoDoop approach is efficient for rapid in silico screening of suitable ILs to dissolve biopolymers.
Lignocellulosic biomass is the most abundant renewable biomaterial on earth. It is a composite material with three main biopolymers: cellulose, hemicellulose, and lignin.[1] Cellulose and hemicellulose are typically used for the production of textiles, paper, pharmaceutical compounds, etc., whereas lignin is usually converted into liquid biofuels or used as a feedstock for chemicals such as binders, dispersants, surfactants, and emulsifiers.[2] Owing to growing concern about sustainable development and environmental protection, substantial attention has been paid to the conversion of lignocellulosic biomass into biofuels or valuable products through thermochemical/biological conversion.[3−5] A key step in the utilization of lignocellulosic biomass is to dissolve the biopolymers it contains (i.e., cellulose, hemicellulose, and lignin). Among them, lignin is a cross-linked polyphenolic polymer that mainly acts as a barrier protecting cellulose and hemicellulose from biological and physical attack.[1] The crystalline structure of cellulose and the cross-linked structure of lignin make them even more difficult to deconstruct than hemicellulose. Therefore, proper solvents should be identified to dissolve these biopolymers.
Ionic liquids (ILs)
are often regarded as green solvents and typically consist of a bulky, asymmetric organic cation and an anion that largely determines the physical and chemical properties.[6,7] Compared to conventional solvents, ILs have desirable properties such as high thermal stability, nonvolatility, high solvation ability, and low toxicity.[8−13] Moreover, cations and anions can be combined in many ways to produce new ILs with a wide spectrum of physical, chemical, and biological properties.[14,15] All of the aforementioned advantages make ILs promising solvents for the dissolution of the biopolymers of lignocellulosic biomass.[16,17] However, because of the large diversity of ILs, experimentally screening a vast number of candidate ILs for their ability to dissolve biopolymers is not practical, which highlights the importance of an automated, rapid tool to predict their dissolving ability.
Combining statistical
thermodynamics and quantum chemistry, the conductor-like screening model for real solvents (COSMO-RS)[18−21] is a well-founded approach that has recently received a significant amount of attention. The molecular surfaces of the compounds are divided into a large number of segments and, under the assumption that a segment of one molecule overlaps perfectly with that of another, COSMO-RS computes the charge distribution on the molecular surface (σ-profile) and the chemical potential distribution of the molecule in the liquid mixture (σ-potential) on the basis of quantum chemistry and statistical thermodynamics. The resulting chemical potential (μ) is the foundation for the evaluation of other equilibrium thermodynamic properties, e.g., the activity coefficient (γ) and the excess enthalpy (H^E). Given its ability to predict the thermodynamic data of compounds, COSMO-RS can be used as an in silico tool to screen molecules for a specific problem solely on the basis of the information arising from their molecular structures.
COSMO-RS has been proven to be effective for prediction of properties
of ILs.[22−29] It integrates the dominant interactions, such as electrostatic misfit, H-bonds, and van der Waals forces, to describe solvation in IL systems, so mixture calculations can be performed at different temperatures.[30] Compared to group contribution methods (e.g., the UNIFAC model[31−34]), COSMO-RS is an a priori predictive method, which allows calculations of systems with qualitative accuracy.[35] Some studies have also reported the suitability of COSMO-RS for predicting the solubilities of cellulose[36−38] and drug molecules[30] in ILs. Given a database of compounds with precomputed quantum COSMO calculations, COSMO-RS is adequate for rapid in silico screening of a large number of solutes or solvents on the basis of their selected molecular models. However, the conformations of biopolymers/ILs strongly influence the prediction results of COSMO-RS, in that different conformations of the same molecule can yield different predictions of thermodynamic properties.[23,39] Therefore, it is essential to use proper molecular models and conformations found by a stable search routine to acquire qualitatively and quantitatively precise predictions.
In this work, we present an automated computational
framework (MoDoop) that allows COSMO-RS-based prediction of biopolymer solubilities in ILs. The framework is built around a script that calls different tools: ChemAxon Convert and Cxcalc,[40] OpenBabel,[41] MOPAC,[42] and Amsterdam density functional (ADF) COSMO-RS.[43−45] By selecting an appropriate force field and geometry optimization method, MoDoop generates a single thermodynamically stable conformer for each biopolymer model and IL. This conformer is used to calculate the COSMO result files,[45] which permits rapid qualitative screening of ILs against selected biopolymer models on the basis of COSMO-RS.
To evaluate the developed MoDoop method, the solubilities of lignin
in ILs were predicted. Lignin is represented by three different models: p-coumaryl, coniferyl, and sinapyl alcohol. The logarithmic activity coefficient (ln γ) of the lignin models in ILs at infinite dilution and the H^E of the mixtures were calculated by COSMO-RS as qualitative measures of their solubilities in ILs. ln γ reflects differences in the strength of the dominant interactions among molecules, which determine the affinity between solutes and solvents.[39] H^E, which is related to the temperature derivative of the Gibbs free energy, is a sensitive measure of the intermolecular interactions within a mixture and reflects the behavior of the species in solution. Linear regressions are conducted to compare the calculated ln γ and H^E with available experimental solubilities of lignin, and R-squared (R²) and the residual standard error (RSE) are used to measure the goodness of fit of the regression models and thus their prediction accuracy with respect to lignin solubilities in ILs. On the basis of the evaluation of the two thermodynamic properties, the best lignin model and suitable ILs are identified.
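For orientation, the two screening properties can be written with standard thermodynamic relations. The sketch below is not taken from the source: it uses the usual definition of the activity coefficient from chemical potentials and the Gibbs-Helmholtz relation for the excess enthalpy, and the symbols μ_i^{S,∞}, μ_i^0, G^E, and x_i are introduced here as assumptions.

```latex
% Activity coefficient of solute i in solvent S at infinite dilution,
% relative to the pure-liquid reference state of i
\ln \gamma_i^{\infty} = \frac{\mu_i^{S,\infty} - \mu_i^{0}}{RT}

% Excess enthalpy from the excess Gibbs free energy (Gibbs--Helmholtz)
H^{E} = -T^{2} \left( \frac{\partial (G^{E}/T)}{\partial T} \right)_{p,x},
\qquad
G^{E} = RT \sum_i x_i \ln \gamma_i
```

Read this way, a strongly negative ln γ means that the solute has a lower chemical potential in the IL than in its own pure liquid, and a negative H^E corresponds to exothermic mixing; both point to favorable solute-solvent interactions.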
Results and Discussion
σ-Potentials of Lignin Models
The σ-potential in COSMO-RS measures the affinity between the system S and a surface of polarity σ. The σ-potential distribution on the molecular surface can roughly be divided into an H-bond acceptor region, a nonpolar region, and an H-bond donor region.[23] As shown in Figure 1, the sinapyl alcohol model shows the strongest hydrogen-bond acceptor capacity due to a more negative σ-potential in the H-bond donor region and a more positive σ-potential in the H-bond acceptor region. The p-coumaryl alcohol model shows the strongest hydrogen-bond donor capacity due to a more negative σ-potential in the H-bond acceptor region and a more positive σ-potential in the H-bond donor region. The coniferyl alcohol model lies in between. Given that the IL dissolution process is anion dominated, a stronger H-bond donor is expected to interact more favorably with the IL anion, so the expected solubility ranking of the three lignin models in ILs is p-coumaryl alcohol > coniferyl alcohol > sinapyl alcohol.
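The three regions of the σ axis can be illustrated with a small classifier. This is a minimal sketch, not part of MoDoop: the function name is hypothetical, and the cutoff of ±0.0082 e/Å² is a value commonly quoted in the COSMO-RS literature that is assumed here, since the source does not state which threshold it uses.

```python
def sigma_region(sigma, cutoff=0.0082):
    """Classify a screening charge density sigma (in e/A^2) into a region.

    The cutoff is an assumed hydrogen-bonding threshold, not a value
    given in the source text.
    """
    if sigma < -cutoff:
        return "H-bond donor"
    if sigma > cutoff:
        return "H-bond acceptor"
    return "nonpolar"


if __name__ == "__main__":
    for s in (-0.015, 0.0, 0.015):
        print(f"sigma = {s:+.4f} e/A^2 -> {sigma_region(s)} region")
```

Passing a different `cutoff` shows how sensitive the region assignment is to the chosen threshold.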
Figure 1
σ-Potentials of the three lignin models: p-coumaryl, coniferyl, and sinapyl alcohol predicted by COSMO-RS.
Model Validation
The thermodynamic properties ln γ and H^E calculated by MoDoop on the basis of the three proposed lignin models are listed in Table 1 along with the experimental lignin solubilities in the four selected ILs from the IL dataset. The experimental lignin solubilities are compared to the calculated ln γ and H^E by linear regression, and R² and RSE are used to characterize the goodness of fit, as listed in Table 2. The lignin solubilities in the selected ILs predicted by each regression model on the basis of the calculated ln γ and H^E are also shown in Table 1. There are deviations between the predicted solubilities and the experimental data. However, the trends in dissolution ability are well predicted by these models, which can therefore be used for the qualitative screening of suitable ILs for lignin dissolution.
Table 1
Experimental Solubilities of Lignin along with ln γ and H^E (kJ mol⁻¹) Calculated by MoDoop at 90 °C
| IL | lignin solubility (wt %)[49,50] | ln γ (predicted solubility), p-coumaryl | ln γ (predicted solubility), coniferyl | ln γ (predicted solubility), sinapyl | H^E (predicted solubility), p-coumaryl | H^E (predicted solubility), coniferyl | H^E (predicted solubility), sinapyl |
| [Emim]Ac | 30 | −8.04 (27.60) | −3.20 (27.09) | −3.22 (26.39) | −9.94 (28.29) | −8.06 (28.68) | −7.75 (28.38) |
| [Bmim]Cl | 10 | −4.13 (14.70) | −1.45 (15.33) | −1.58 (16.12) | −6.16 (13.74) | −5.11 (13.19) | −5.09 (13.70) |
| [Bmim]BF4 | 4 | −0.30 (2.06) | 0.64 (1.29) | 0.81 (1.16) | −3.24 (2.49) | −3.13 (2.79) | −3.11 (2.77) |
| [Bmim]PF6 | 1 | 0.15 (0.58) | 0.64 (1.29) | 0.78 (1.35) | −2.70 (0.42) | −2.67 (0.38) | −2.65 (0.23) |
Table 2
Goodness of Fit (R² and RSE) Reflected by Linear Regressions Conducted between Experimental Lignin Solubilities and Thermodynamic Properties (ln γ and H^E) Calculated by MoDoop on the Basis of Three Different Lignin Models
| goodness of fit | p-coumaryl, ln γ | p-coumaryl, H^E | coniferyl, ln γ | coniferyl, H^E | sinapyl, ln γ | sinapyl, H^E |
| R² | 0.91 | 0.94 | 0.87 | 0.96 | 0.83 | 0.95 |
| RSE | 3.99 | 3.12 | 4.71 | 2.62 | 5.42 | 3.03 |
The R² values of the linear regressions listed in Table 2 for the three alcohol models show that the coniferyl alcohol model (with medium polarity) gives the best prediction of the lignin solubility with H^E (R² = 0.96), whereas the p-coumaryl alcohol model (with the highest hydrogen-bond donor capacity) gives the best prediction with ln γ (R² = 0.91). Both models give good predictions of the lignin solubilities in ILs, as shown in Figure 2.
Figure 2
Linear regressions of experimental solubilities of lignin measured in four ILs at 90 °C against (a) ln γ calculated on the basis of the p-coumaryl alcohol model and (b) H^E of the mixture calculated on the basis of the coniferyl alcohol model.
(See the other prediction models in Appendix A: Figures S1–S4.) It should be noted that more experimental solubilities are probably needed to further validate the robustness of the developed models in future work.
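The regression step can be reproduced from the tabulated data. The following is a minimal sketch, assuming an ordinary least-squares fit of experimental solubility against ln γ for the p-coumaryl model (values copied from Table 1); the function and variable names are illustrative and are not part of MoDoop.

```python
import math

# Experimental lignin solubilities (wt %) and ln γ of the p-coumaryl model (Table 1)
solubility = [30.0, 10.0, 4.0, 1.0]
ln_gamma = [-8.04, -4.13, -0.30, 0.15]


def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x with fit statistics."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    syy = sum((yi - y_mean) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - sse / syy
    # Two fitted parameters leave n - 2 residual degrees of freedom.
    adj_r2 = 1.0 - (sse / (n - 2)) / (syy / (n - 1))
    rse = math.sqrt(sse / (n - 2))
    return {"slope": slope, "intercept": intercept,
            "r2": r2, "adj_r2": adj_r2, "rse": rse}


fit = fit_line(ln_gamma, solubility)
# Solubilities predicted by the regression model for the same four ILs
predicted = [fit["intercept"] + fit["slope"] * x for x in ln_gamma]

if __name__ == "__main__":
    print({key: round(value, 2) for key, value in fit.items()})
    print([round(p, 2) for p in predicted])
```

With these inputs, the adjusted R² and the RSE come out close to the 0.91 and 3.99 reported in Table 2 for this model, while the unadjusted R² is somewhat higher. This suggests, although the source does not say so, that the tabulated R² values are adjusted for the number of fitted parameters.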
Screening ILs
The predicted ln γ based on the p-coumaryl alcohol model and H^E based on the coniferyl alcohol model in 450 ILs are depicted in Figure 3a,b, respectively. (The detailed values are given in Appendix B: Tables S1 and S2.) The cations and anions are mapped according to scaled values of ln γ and H^E. The ILs with a higher dissolution capacity (highly negative values of ln γ and H^E) are shown in the lower-left corner (blue region), whereas those with a lower dissolution capacity (highly positive ln γ and H^E values) are shown in the upper-right corner (red region). Both thermodynamic properties, ln γ and H^E, vary significantly with the anion but are less dependent on the cation, which indicates that the dissolution power is strongly dependent on the anion. The ILs containing the anions Ac⁻, HCOO⁻, MeH2NCOO⁻, MeHOCOO⁻, DEC⁻, MeHSCOO⁻, and BEN⁻ are found to have a high dissolution power for lignin. On the other hand, the H^E calculated on the basis of the coniferyl alcohol model varies only slightly with the cation, which makes it difficult to distinguish the dissolution power of ILs containing different cations. Thus, ln γ is regarded as the more reliable property, and the p-coumaryl alcohol model is considered the optimal model for predicting lignin solubility in ILs.
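The screening logic (more negative ln γ means higher expected dissolution power) can be sketched on the four ILs of Table 1. This is an illustration only: the source does not specify how the values in Figure 3 were scaled, so the min-max scaling below is an assumption, and the function names are hypothetical.

```python
# ln γ at infinite dilution for the p-coumaryl model, taken from Table 1,
# keyed by (cation, anion)
LN_GAMMA = {
    ("Emim", "Ac"): -8.04,
    ("Bmim", "Cl"): -4.13,
    ("Bmim", "BF4"): -0.30,
    ("Bmim", "PF6"): 0.15,
}


def min_max_scale(values):
    """Scale a mapping's values linearly to [0, 1] (assumed scaling scheme)."""
    lo, hi = min(values.values()), max(values.values())
    return {key: (v - lo) / (hi - lo) for key, v in values.items()}


def rank_ils(values):
    """Return the ILs ordered from most to least negative ln γ."""
    return sorted(values, key=values.get)


if __name__ == "__main__":
    scaled = min_max_scale(LN_GAMMA)
    for cation, anion in rank_ils(LN_GAMMA):
        print(f"[{cation}]{anion}: scaled ln γ = {scaled[(cation, anion)]:.2f}")
```

Applied to a full cation-by-anion grid, the same ordering would place the strongest candidate solvents at the low end of the scaled range.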
Figure 3
(a) ln γ of lignin in 450 ILs at infinite dilution estimated on the basis of the p-coumaryl alcohol model and (b) H^E of the mixture calculated on the basis of the coniferyl alcohol model at 90 °C by COSMO-RS. The ln γ and H^E values were scaled.
Conclusions
The automated computational framework MoDoop is used for COSMO-RS-based prediction of biopolymer solubilities in ILs. To conduct the COSMO-RS calculations, the COSMO result files are generated from the two-dimensional (2D) structures of biopolymers and ILs on the basis of conformers searched with specific force fields and geometries optimized by semiempirical (PM6) and density functional theory (DFT) methods. The method uses a single thermodynamically stable conformer to represent each biopolymer model and IL and thus enables rapid qualitative screening of ILs for dissolving biopolymers. Three selected lignin models have been used to predict the solubilities of lignin in 450 ILs at 90 °C following the developed MoDoop method. ln γ is found to be a reliable reference property because it reflects the variation of the dissolution power of ILs with both cations and anions. The p-coumaryl alcohol model is selected as the best model to predict lignin solubility on the basis of ln γ, with a high R² of 0.91. The ILs containing the anions Ac⁻ and HCOO⁻ show a high dissolution power for lignin. The developed MoDoop approach is efficient for the large-scale screening of suitable ILs for the dissolution of lignin and potentially other biopolymers.
Methods
Computational Framework
In the MoDoop framework, the COSMO-RS calculations of thermodynamic properties are based on ADF COSMO result files from quantum mechanical calculations of molecular structures generated by a specific geometry optimization route. An overview of the computational workflow of MoDoop is shown in Figure 4.
The in-house MoDoop script automates the whole computational workflow and outputs the analysis results from the given 2D structures of the biopolymer and ILs. It should be noted that the generation of COSMO result files is the most time-consuming step, with a cost that mainly depends on the sizes of the models; however, it needs to be performed only once per molecule, and the generated COSMO result files are reusable in subsequent COSMO-RS computations. Moreover, in accordance with our previous calculation of ln γ of cellulose in ionic liquids,[47] the hydrogen-bond strengths of ILs containing halide ions (e.g., Cl⁻, Br⁻, and I⁻) are substantially underestimated compared to those of ILs without halogen elements. Therefore, the values of some key parameters of the COSMO-RS model, such as the subkeys CHB (c_hb) and SIGMAHBOND (σ_hb) of the key CRSPARAMETERS, are adjusted according to our previous work.[47]
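The "generate once, reuse afterwards" behavior of the COSMO result files can be sketched as a simple file cache. This is not the MoDoop script itself: the function name, the `generate` callback, and the `.coskf` file extension are assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable


def cosmo_result_file(name: str, cache_dir: Path,
                      generate: Callable[[str, Path], None]) -> Path:
    """Return the cached COSMO result file for a molecule, creating it if absent.

    `generate` is a hypothetical callback standing in for the expensive
    conformer search and DFT/COSMO calculation; it must write the result
    file to the path it is given.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / f"{name}.coskf"  # extension assumed for illustration
    if not target.exists():
        # The costly step runs only the first time a molecule is requested.
        generate(name, target)
    return target
```

Because later COSMO-RS runs only read the cached files, adding a new biopolymer model to a screening requires one new calculation rather than recomputing every IL.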
Figure 4
Schematic workflow of MoDoop:
(1) sketching the 2D structures of biopolymer models (e.g., lignin) and ILs by MarvinSketch;
(2) converting the 2D structures of biopolymer models to 3D by Molconvert;
(3) conducting the lowest-energy conformer search for biopolymers by Cxcalc with the Dreiding force field;[46]
(4) optimizing the geometries of the obtained lowest-energy conformers of biopolymers with the PM6 method in MOPAC;
(5) searching the lowest-energy IL conformers for isolated cations/anions by OpenBabel on the basis of the universal force field;
(6) optimizing the resulting geometries of biopolymers and ILs at the DFT level to generate COSMO result files by ADFprep[43] on the basis of the main parameterization GGA:BP/TZP;
(7) calculating the σ-profiles, σ-potentials, and thermodynamic properties (e.g., γ and H^E) at a given temperature on the basis of the generated COSMO result files by COSMO-RS implemented in CRSprep;[43−45]
(8) reporting the calculated COSMO-RS properties by ADFreport;
(9) conducting data visualization, plotting, and reporting to create the overall analysis report.
Lignin Models
The quantum COSMO calculation is time-consuming; thus, it is impractical to conduct the computation on a whole biopolymer. A feasible way is to represent the biopolymer by a unit part that is compact enough for efficient quantum mechanical calculations yet retains the main characteristics of the molecule. Lignin is usually biosynthesized from up to three monomers: p-coumaryl, coniferyl, and sinapyl alcohols. However, their proportions in lignin vary with the source material (e.g., softwood, hardwood, and grasses). p-Coumaryl alcohol is a substructure of coniferyl alcohol, whereas coniferyl alcohol is a substructure of sinapyl alcohol. Therefore, these three alcohol structures were chosen to represent lignin molecules, and their 2D structures and COSMO molecular surfaces are shown in Figure 5a–c.
Figure 5
COSMO molecular surfaces and 2D structures of lignin models: (a) p-coumaryl alcohol, (b) coniferyl alcohol, and (c) sinapyl alcohol. On the molecular surface, red areas mark positive COSMO charge density (the underlying molecular charge is negative), blue areas mark negative COSMO charge density (the underlying molecular charge is positive), and yellow and green areas mark nearly neutral charges.
IL Dataset
A set of 450 ILs was extracted from the literature,[38,47,48] comprising 18 cations (Table ) based on methylimidazolium⁺, ethylmorpholinium⁺, methylpyrrolidinium⁺, and pyridinium⁺ with allyl, ethyl, butyl, acryloyloxypropyl, 2-methoxyethyl, or 2-hydroxyethyl functional groups, and 25 anions (Table ). The selected IL dataset was used for the COSMO-RS calculations. In addition, four ILs with experimental solubilities for Kraft lignin (Indulin AT)[49,50] at 90 °C were used to validate the prediction ability of the MoDoop approach.
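The size of the dataset follows from pairing every cation with every anion (18 × 25 = 450). The short sketch below enumerates such a grid; the labels are placeholders only, since the actual cation and anion names are not reproduced here.

```python
from itertools import product

# Placeholder labels: stand-ins for the 18 cations and 25 anions of the dataset
cations = [f"cation_{i:02d}" for i in range(1, 19)]
anions = [f"anion_{j:02d}" for j in range(1, 26)]

# Every cation paired with every anion gives the full IL screening grid
il_dataset = list(product(cations, anions))

if __name__ == "__main__":
    print(len(il_dataset), "cation-anion combinations")
```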