Christel A S Bergström1, Per Larsson2. 1. Department of Pharmacy, Uppsala University, Biomedical Centre P.O. Box 580, SE-751 23 Uppsala, Sweden. Electronic address: christel.bergstrom@farmaci.uu.se. 2. Department of Pharmacy, Uppsala University, Biomedical Centre P.O. Box 580, SE-751 23 Uppsala, Sweden.
Abstract
In this review we will discuss recent advances in computational prediction of solubility in water-based solvents. Our focus is set on recent advances in predictions of biorelevant solubility in media mimicking the human intestinal fluids and on new methods to predict the thermodynamic cycle rather than prediction of solubility in pure water through quantitative structure property relationships (QSPR). While the literature is rich in QSPR models for both solubility and melting point, a physicochemical property strongly linked to the solubility, recent advances in the modelling of these properties make use of theory and computational simulations to better predict these properties or processes involved therein (e.g. solid state crystal lattice packing, dissociation of molecules from the lattice and solvation). This review serves to provide an update on these new approaches and how they can be used to more accurately predict solubility, and also importantly, inform us on molecular interactions and processes occurring during drug dissolution and solubilisation.
In this review we will discuss recent advances in computational prediction of solubility in water-based solvents. Our focus is set on recent advances in predictions of biorelevant solubility in media mimicking the human intestinal fluids and on new methods to predict the thermodynamic cycle rather than prediction of solubility in pure water through quantitative structure property relationships (QSPR). While the literature is rich in QSPR models for both solubility and melting point, a physicochemical property strongly linked to the solubility, recent advances in the modelling of these properties make use of theory and computational simulations to better predict these properties or processes involved therein (e.g. solid state crystal lattice packing, dissociation of molecules from the lattice and solvation). This review serves to provide an update on these new approaches and how they can be used to more accurately predict solubility, and also importantly, inform us on molecular interactions and processes occurring during drug dissolution and solubilisation.
Poor drug solubility is one of the main obstacles in the drug discovery and development process and was recently identified to be strongly related to the choice of target explored. (Bergstrom et al., 2016) Solubility is the driving force for absorption and acceptable solubility in the intestinal fluid is a prerequisite for achieving sufficiently high drug blood concentrations to obtain a therapeutic effect when systemic effects are warranted. The solubility of a compound affects its absorption, distribution, metabolism, excretion and toxicity (ADMET) profile. Only when the ADMET properties of a drug-like compound are of a sufficiently high quality, and when the target has been validated, can the compound be developed into a new medication (Cook et al., 2014, Morgan et al., 2012). Since the molecular requirements of some targets inevitably result in poor solubility of the ligands, early awareness of this fact by the medicinal chemistry team is crucial for them to make the right decisions on which analyses and assays to perform. Understanding the risk of poor solubility is also important for analysing the results of ADMET assays, since there is potential to identify false readouts as an effect of precipitation or aggregation of the drug compound (Coan and Shoichet, 2008, Pohjala and Tammela, 2012). If the compound is precipitating, it is easy to identify why the in vitro screen has failed. However, if there is no visible sign of aggregation and/or precipitation it is much more difficult to interpret the results. This may lead to false conclusions being drawn from the assay and the “false positives” of the assay pushing the compound forward in the discovery process. In the worst case scenario, this could lead to a poor pharmacological and/or ADMET profile of the chosen compound. Indeed, based on Pfizer clinical trial data, Morgan and colleagues have identified that, to a large extent, failure of drugs in clinical trials could be related to poor efficacy (Morgan et al., 2012). More importantly, the authors questioned whether the target had been explored and validated correctly during the drug discovery stage. As an example, promiscuous compounds may aggregate in in vitro buffers and by this mechanism then cause a non-competitive inhibition, whereas they are diluted in the blood stream and the much lower concentration of the free fraction does not result in target engagement.Since solubility has a profound impact on all the factors that are important for decision making with respect to the fate of the compound, much effort has been directed to developing tools applicable to predicting drug solubility (Delaney, 2005). Various in vitro assays have been developed, ranging from high throughput assays based on titrations of DMSO stock solutions and identification of the precipitations as a qualitative screen providing yes/no answers to highly accurate small scale thermodynamic measurements (Bergstrom et al., 2014). Typically these studies have focussed on solubility in pure water or in non-complex buffers. However, over the last two decades a large number of modified media have also been developed, with the aim of better predicting the intestinal solubility of new compounds in vivo (Fuchs et al., 2015, Galia et al., 1998, Jantratid et al., 2008). Recently also media mimicking the interindividual variability of the composition of intestinal fluids in fasted and fed state have been explored for their influence on drug solubility (Khadra et al., 2015, Madsen et al., 2018, Perrier et al., 2017). These more biorelevant solvents include additives such as bile salts, phospholipids, cholesterol and lipids to reflect fasted and fed intestinal states. Biorelevant dissolution media have so far mainly been used in the early phases of development and have not to any major extent been investigated for their potential as a base for computational predictions. Instead, in silico models developed for prediction of solubility are based on the solubility of the neutral (non-ionised) compound in pure water (see e.g. examples provided in Norinder and Bergstrom (2006)). There are several reasons for taking this approach, including the complexity of the solubility process, since both dissociation from the solid state and solvation of the molecule by the solvent studied influence the final solubility (this is further discussed in Section 2, and Fig. 1). This, together with the difficulty of predicting ionisation constants for complex protolytes, and the difficulties associated with forecasting the influence of additives (such as the bile salts, phospholipids and cholesterol included in simulated intestinal fluids) on the solubility, has made pure water adjusted to a pH allowing the non-ionised species to be determined the first choice for computational modelling. Unfortunately, several of the datasets used, or large fractions thereof, do not reflect the drug-like chemical space. These datasets are repeatedly used and it is not always possible to evaluate the experimental quality of the data. For instance, one of the most repeatedly used datasets includes experimental data ranging from -11.62 to 2.77 on a log molar scale, corresponding to 2 pM and 589 M (Kühne et al., 1995). The exact quality of these data can be questioned, since the pM concentrations need a very sensitive analytical method to be trustworthy and the high end of the range corresponds to a solubility value greater than that of water in water (which is 55 M). Hence, if data like these are used in computational modelling, the accuracy of the predictions could be improved by weighting the influence of the observation by the accuracy of the experimental data, in order not to let poor experimental data influence the final model.
Fig. 1
The thermodynamics behind the solubility process. The drug molecule needs to dissociate from its solid form (step 1) and the tight structure of the water needs to form a cavity large enough to incorporate the drug molecule (step 2). Finally, the drug molecule is inserted into the water where it interacts with the surrounding water molecules (step 3).
The thermodynamics behind the solubility process. The drug molecule needs to dissociate from its solid form (step 1) and the tight structure of the water needs to form a cavity large enough to incorporate the drug molecule (step 2). Finally, the drug molecule is inserted into the water where it interacts with the surrounding water molecules (step 3).In this review we will discuss computational models and modelling approaches that have identified the molecular features resulting in poor aqueous solubility. We will discuss how these findings can be applied in the drug discovery and development settings both as tools for predicting solubility and as indicators of whether the compound will make it to the market after extensive formulation strategies are applied. The review will not focus on statistical approaches to solubility predictions; the interested reader is referred to several other reviews within this area, e.g. (Delaney, 2005, Johnson and Zheng, 2006, Norinder and Bergstrom, 2006, Skyner et al., 2015). Instead, we will discuss new approaches recently presented for modelling properties of importance for solubility. These include models revealing molecular features that indicate solid-state-limited versus solvation-limited solubility, models that include the current status of prediction of biorelevant solubility reflecting the intestinal environment, models for predicting crystal structure and solid-state properties such as melting point (Tm) and models that make use of the thermodynamic cycle theory. The extent to which solubility and solubility changes can be calculated from first principles is also addressed.
Molecular properties resulting in poor aqueous solubility
The thermodynamics behind solubility are shown in Fig. 1. In order for the molecule to dissolve in the aqueous solvent, it must be able to dissociate from its crystal lattice. This process is dependent on the intermolecular interactions between the molecules in the crystal lattice. Compounds with strong intermolecular bonds and/or complex interaction patterns with a large number of interaction points between the molecules in the crystal lattice often show a limited capacity to dissociate from the solid form. These compounds are sometimes referred to as ‘brick dust’ molecules, to demonstrate the poor solubility of a strong (stone-like) solid structure. Typically Tm is used to identify whether a compound shows solid-state-limited solubility (i.e. is a brick dust compound). A Tm of 200 °C has been identified as the cut-off value; for compounds that melt at higher temperatures, the crystal lattice will have a strong influence on the solubility (Bergstrom et al., 2016). For these compounds, any formulation strategy that changes the solid crystal form (e.g. using salts, cocrystals or amorphous systems) will be useful for increasing the dissolution rate and achieving a greater apparent solubility (Edueng et al., 2017, Elder et al., 2013, Kuminek et al., 2016, Taylor and Zhang, 2016). While the compound needs to dissociate from its crystal lattice, the surrounding solvent also needs to prepare for incorporating a new molecule. The larger the cavity needs to be, the larger the energy penalty for the formation. Hence, molecular weight can be used as a simple physicochemical descriptor to reflect the volume required of the cavity. In order to decrease the energy penalty associated with cavity formation, additives which loosen up the tight water structure can be included in the solvent, resulting in an improvement in the solubility of the molecules. Typically, such components are cosolvents (e.g. ethanol and polyethylenglycol 400). Finally, to be soluble, the dissociated free molecule needs to be inserted in the solvent cavity. Hydrophobic compounds have a limited capacity to interact with the water phase, in accordance with similia similibus solvuntur (like dissolves like), and these compounds are solubility-limited by poor hydration. Poorly soluble compounds restricted in solubility by poor hydration are described in the popular scientific jargon as ‘greaseball’ molecules, due to their high hydrophobicity and lack of interaction with water. The molecular descriptor commonly used to describe the role of hydration is the partition coefficient between octanol and water (P), often presented as the log10 value (logP). LogP values of 2–3 have been recommended as the cut-off point for hydration becoming a significant limitation for solubility; the higher the value the poorer the hydration (Bergstrom et al., 2016, Wassvik et al., 2008). It should be noted that, for ionisable compounds, it is the corresponding logD value (at the pH of interest) that should be greater than the logP cut-off value (Fagerberg and Bergstrom, 2015). Formulation strategies successful for delivery of the solvation-limited compounds are lipid-based formulations that are composed of lipids, surfactants and/or cosolvents (Feeney et al., 2016). Unfortunately, compounds that display both solid-state- and solvation-limited solubility also exist. These are compounds with high melting points and high logP values, and these compounds can be viewed as ‘anything’-phobic. These compounds typically aggregate or precipitate out of solution. Such compounds are truly difficult to manipulate, even by altering the formulation, and are likely to produce concentrations in vivo that are too low to allow therapeutic effects.The modified general solubility equation (GSE) established by Jain and Yalkowsky in 2001 allows the roles of logP and Tm in solubility to be clarified (Jain and Yalkowsky, 2001). The GSE states thatwhere S0 is the intrinsic solubility, i.e. the solubility of the non-ionised (neutral) species. Making use of the GSE and hypothetical values for the lipophilicity (logP of 2, 4 and 6) and melting point (Tm of 50, 150 and 250 °C) can help identify which properties dominate the solubility (Wassvik et al., 2008). By this means, it was established that the solubility of compounds with a logP < 2 is mainly dependent on the solid state.Multivariate data analysis making use of principal component analysis (PCA) and projection to latent structures (PLS) of a number of datasets enabled the identification of the molecular features resulting in solid-state- versus solvation-limited solubility (Bergstrom et al., 2007, Fagerberg et al., 2010, Wassvik et al., 2008, Zaki et al., 2010). Solvation-limited compounds are lipophilic, relatively large molecules, and lack conjugated systems (Fig. 2). Many of these have been developed as oral dosage forms, however, dosage forms typically include several different excipients that may improve dissolution (disintegration and dispersion) and solubilisation (Bergstrom et al., 2007). This indicates that an extensive drug development process would be required to bring these compounds to the market. In contrast, solid-state-limited compounds are often flat, typically with an extended ring structure, and display high aromaticity. These molecular features are important for forming a more stable crystal lattice (Wassvik et al., 2008).
Fig. 2
Typical greaseball and brick dust molecules. Solvation-limited compounds (‘greaseball molecules’) are large, show a high degree of flexibility and are highly lipophilic (Bergstrom et al., 2007). Solid-state-limited compounds (‘brick dust molecules’) are small in their structure, and quite large portions of the molecule are often flat and rigid (Wassvik et al., 2008).
Typical greaseball and brick dust molecules. Solvation-limited compounds (‘greaseball molecules’) are large, show a high degree of flexibility and are highly lipophilic (Bergstrom et al., 2007). Solid-state-limited compounds (‘brick dust molecules’) are small in their structure, and quite large portions of the molecule are often flat and rigid (Wassvik et al., 2008).It should be noted, however, that while compounds described as solvation-limited often display intrinsic solubility values in the lower nanomolar scale, none of the compounds included in the solid-state-limited dataset published by Wassvik et al. had solubility values in this range (Wassvik et al., 2008). The least soluble compound in that dataset was griseofulvin, which has a solubility of approximately 15 µM. There are two explanations for why this is the case. Firstly, if we use the GSE (Eq. (1)) and the knowledge that compounds with a logP of 2 or less will tend to have solid-state-dependent solubility, then compounds with extraordinarily high melting points (of about 325 °C) would be estimated to have a solubility of about 30 µM. Secondly, we believe that compounds which are solely solubility-limited because of their solid state are terminated relatively early in the drug development process. The lack of truly poorly soluble compounds with a lipophilicity value of <2 is also shown in Fig. 3. This plot shows the general relationship between the lipophilicity and solubility of 292 drugs. At a logP ≤ 2, there were only a few compounds with solubility values <100 µM. Hence, the possibility of analysing which molecular features drive poor solubility truly originating from the synthesis of crystal structures that are too stable is currently limited, since relevant data are only sparsely publicly available.
Fig. 3
Relationship between lipophilicity and intrinsic solubility for 292 drugs. Data taken from Bergstrom et al. (2004).
Relationship between lipophilicity and intrinsic solubility for 292 drugs. Data taken from Bergstrom et al. (2004).
Prediction of the solid state
One property that has been explored for its potential to be computationally predicted is the melting point, since such models would facilitate predictions of many other properties including the solubility as described in Section 2 and presented in Eq. (1). The first model that was built on a larger series of drug-like compounds (n = 277) was published in 2003 and approached the problem by establishing quantitative structure-property relationships (QSPRs) between calculated molecular descriptors and the melting point reported in Merck Index (Bergstrom et al., 2003). Only the stable polymorph was predicted. These efforts resulted in RMSE-value of 35.1 °C when a consensus model was established that made use of both 2D and 3D descriptors to calculate the Tm. In this particular study, PLS was used to establish the relationship. The same dataset has since then been used for further development, however, the RMSE is still in the same ball park regardless of the molecular descriptors and the statistical methodology used (McDonagh et al., 2015, Tetko et al., 2016).Recently, computational crystal structure prediction (CSP), i.e. prediction of crystal lattice structure, has significantly advanced. These predictions have their basis in experimental data, usually obtained from measurements of single crystals but recently also proven to be possible to obtain from powders (Baias et al., 2013). CSP has also proven feasible to apply on larger and drug-like molecules (Jones et al., 2011, Kazantsev et al., 2011, Reilly et al., 2016, Santos et al., 2013). One means to arrive at computational predictions of crystal structure using ab initio calculations is to perform Monte Carlo simulations with a number of simulations are run starting from a random configuration of molecules in the simulation box. The ranking of the resulting packing is thereafter performed by some sort of energy-based metric (Reilly et al., 2016). However, all types of ab initio calculations are very time consuming, and especially so for drug-like compounds with higher molecular flexibility than typical model compounds used for the methodology development. Therefore other techniques have been explored. One approach that has shown promise is the solid state perturbation where only the 2D structure of the drug is needed as input data in the solid state prediction (Briggner et al., 2014).
In silico models that include biorelevant solubility
Over the last two decades great efforts have been put into research on computational models for predicting aqueous solubility. This can be seen in the increased number of publications on the computational prediction of solubility (Fig. 4). We will not go through all the strategies used in detail with regard to algorithms and descriptors; instead the reader is guided to other reviews where datasets, methodologies and descriptors are discussed (Delaney, 2005, Johnson and Zheng, 2006, Norinder and Bergstrom, 2006, Skyner et al., 2015). Herein we will present a few recent studies where in silico modelling has been used to predict solubility in more biorelevant solvents. It should be noted that the response value (i.e. solubility) is commonly presented as the log10 value when developing in silico models.
Fig. 4
Number of publications on computational prediction of solubility. The search was performed Oct 9, 2017 on PubMed, using a search string of “theoretical OR prediction OR in silico OR comput* AND aqueous solubility”. Thus the number of publications reflects scientific efforts where computational methods have been used to different extents to understand processes, mechanisms and the impact of e.g. additives on solubility as well as producing in silico models enabling quantitative predictions of solubility values.
Number of publications on computational prediction of solubility. The search was performed Oct 9, 2017 on PubMed, using a search string of “theoretical OR prediction OR in silico OR comput* AND aqueous solubility”. Thus the number of publications reflects scientific efforts where computational methods have been used to different extents to understand processes, mechanisms and the impact of e.g. additives on solubility as well as producing in silico models enabling quantitative predictions of solubility values.The relationship between lipophilicity and the solubilization ratio (SR) in water-based solvents including surfactants is well-known and was first shown for the natural surfactant taurocholate by Mithani et al. (1996). This work showed a strong relationship between logP and the SR (R2 of 0.99). The SR is described as:where SCbs is the solubilisation capacity of the bile salt (here taurocholate) and SCaq is the solubilisation capacity of the water. This relationship was further explored by Fagerberg et al., who made use of the measurements of ten structurally diverse compounds in the more complex systems of fasted and fed state simulated intestinal fluids (FaSSIF and FeSSIF, respectively) (Fagerberg et al., 2010). These media differ from a pure bile salt system in that they also contain phospholipids. FeSSIF has 5-fold higher concentrations of these additives than does FaSSIF (Galia et al., 1998). Further, the pH differs between the two media, with the pH of FaSSIF being 6.5 and that of FeSSIF 5.0. Based on these measured values it was suggested that the pH-dependent lipophilicity (logDpH6.5 and logDpH5.0) should be used instead of logP; R2 increased from 0.32 to 0.74 when logD was used instead of logP. Lipophilicity has also been related to dissolution kinetics in FeSSIF version 2 in a more qualitative manner (Gamsiz et al., 2010). When logP < 1 (low lipophilicity), logP 1–4 (intermediate lipohilicity) and logP > 4 (high lipophilicity) were separated out, it was found that solubility and dissolution kinetics were greatly enhanced for highly lipophilic compounds (logP > 4), for which a dissolution rate up to 6-fold higher was measured. However, it was also noted that improvement in solubility was not always related to similarly strong improvements in dissolution kinetics.The dataset studied for the dissolution rate mentioned above was part of the dataset used by SimulationsPlus to establish their in silico models for predicting solubility in biorelevant media (fasted-state simulated gastric fluid (FaSSGF), FaSSIF and FeSSIF version 2). These models and datasets have not been fully published, but the models are available in the ADMET Predictor software. The manual for this software states that these models are based on 160 diverse drug-like compounds. The models were developed using neural network methodology based on 2D molecular descriptors. Approximately 10% of the compounds were used as a test set for each model developed (n = 16–21, dependent on the media studied). The R2 of the training sets was in the range of 0.71–0.76 and the root mean square error (RMSE) was 0.47–0.50 logS (mg/mL) units. There are only two published papers that make use of a similar approach, both by Fagerberg et al., 2012, Fagerberg et al., 2015 who explored the PLS methodology combined with DragonX descriptors for their usefulness in predicting solubility in simulated (FaSSIF) and aspirated fasted-state human intestinal fluid (FaHIF) (Fagerberg et al., 2012, Fagerberg et al., 2015). In the first study, the capability of PLS and molecular descriptors to predict solubility was investigated in FaSSIF for only 22 compounds, resulting in R2 of 0.82 and RMSE of 0.32 log (M) units. In the follow-up study, 112 compounds were studied in FaSSIF and 74 compounds in FaHIF. The results of these modelling studies are similar to those described for the neural network models, with R2 of 0.69 (RMSE of 0.48) and R2 of 0.85 (RMSE of 0.34) for FaSSIF and FaHIF, respectively. It should be noted that the number of compounds examined for solubility in different simulated and aspirated intestinal fluids has increased in recent years and compilations have been made (see, for example, (Augustijns et al., 2014, Fagerberg and Bergstrom, 2015)). Hence, this experimental database is now available for studies using computational modelling and advanced modelling and simulation approaches that demand larger datasets.Recently, the solubility of drugs in FaSSIF was modelled making use of the linear solvation energy relationship (LFER) based on the Abraham descriptors (Niederquell and Kuentz, 2017). Abraham descriptors have previously been used to predict drug solubility in water of polychloronaphtalenes as well as the partitioning into micelles of sodium dodecyl sulphate (SDS) (Abraham and al-Hussaini, 2001, Sprunger et al., 2007). In the work by Niederquell and Kuentz, the modelled solubility data were given as the ratio of the solubility in FaSSIF to that in the corresponding blank buffer, defined as the solubility enhancement (SE) (Niederquell and Kuentz, 2017). The following equation was developed from results obtained based on 40 compounds:where E is the excess molar refractivity, S is the dipolarity/polarisability, A is the hydrogen-bonding acidity, B0 is the hydrogen-bond basicity and V is the McGowan characteristic volume. The model gave good results (R2 of 0.81 and mean absolute error (MAE) of 0.299) but was not challenged by application of external test sets. A similar approach was also used by Fagerberg et al. who used calculated descriptors from the Dragon package and PLS methodology to predict the SE in FaSSIF (Fagerberg et al., 2012). This resulted in an R2 of 0.88 and RMSE of 0.17 log units but, although promising, the dataset only consisted of 22 compounds and therefore was not challenged with test sets. The model was instead validated using permutation tests and Q2; both of which showed that the model did not show signs of being over-fitted to the training set used.
Prediction of solubility by modelling the thermodynamic cycle
A common feature of the QSPR-based models is that it has been proven difficult to obtain solubility predictions with an external validation of accuracy better than an RMSE of 0.7–1.0 log units, i.e. the predicted solubility value can be up to 10 times higher or lower than the actual experimental value, irrespective of the statistical methodology and descriptor space used (Bergstrom et al., 2004, McDonagh et al., 2014). In reality, the predictions are likely to be even more inaccurate when they are used to predict new chemical entities, since these can be quite structurally different from the training set used in the model. The lack of an accurate model has been attributed to variations in the accuracy of the experimental data extracted from the literature, since the model cannot be more accurate than the input data. However, this has been challenged by Palmer and Mitchell, who studied datasets carefully determined in-house versus datasets extracted from the literature (Palmer and Mitchell, 2014). They found that the poor accuracy was more dependent on deficiencies in the algorithms and the descriptors than deficiencies in the experimental values. Hence more effort is required for investigation of descriptors and models that will describe the solubility better than those typically used in QSPR models. Steps have, in fact, been taken in this direction over the last decade. New principles have been investigated in more detail, with the purpose of modelling the fundamental underlying mechanisms of solubility and increasing understanding of this property. One such approach is to model solubility using the thermodynamic cycle (Fig. 5). The relationship between the intrinsic solubility and the change in Gibbs free energy iswhere ΔG(sol) is the Gibbs free energy for solution, ΔG(sub) is the Gibbs free energy for sublimation, ΔG(hydr) is the Gibbs free energy for hydration, R is the molar gas constant, T is the temperature (in Kelvin), S0 is the intrinsic solubility (M) and Vm is the molar volume of the crystal. An organic solvent, typically octanol, can be used as an intermediate between the gaseous and hydrated states in the experimental assessment of the hydration process and, in such cases, Eq. (4) will be transformed towhere ΔG(solv) is the Gibbs free energy for solvation in octanol and ΔG(tr) is the Gibbs free energy for the transfer of the molecule from octanol to water. If the logP value of the molecule is determined, ΔG(tr) can be replaced by
Fig. 5
The thermodynamic cycle. The following abbreviations are used: ΔGsub = Gibbs free energy of sublimation; ΔGsolv = Gibbs free energy of solvation; ΔGhyd = Gibbs free energy of hydration; ΔGtr = Gibbs free energy of transfer.
The thermodynamic cycle. The following abbreviations are used: ΔGsub = Gibbs free energy of sublimation; ΔGsolv = Gibbs free energy of solvation; ΔGhyd = Gibbs free energy of hydration; ΔGtr = Gibbs free energy of transfer.In theory it would be possible to calculate the Gibbs free energy of all three steps of the thermodynamic cycle. ΔG(sub) can be calculated by modelling the potential-based lattice energy and by lattice dynamics simulations, ΔG(hydr) and ΔG(solv) can be calculated by quantum mechanics (provided an appropriate model for the solvent is available) and ΔG(tr) can be estimated from the calculated logP. In fact, several of these steps have lately been subjects for computational modelling. A series of papers has been published by Lüder and coworkers, in which they used stepwise methods to model the free energy of hydration (Westergren et al., 2007), the free energy of solvation in pure melts (Luder et al., 2007b), and the free energy of solvation in pure amorphous matter (Luder et al., 2007a). By producing models for both pure melts and pure amorphous matter, the hypothesis was that it should be possible to predict the difference in solubility between the crystalline and amorphous states using a computer. In their studies, 46 to 48 drug molecules were investigated. They concluded that the electrostatic interactions are much larger than the Lennard-Jones interactions in the hydration process, whereas the opposite is true for the pure melt. Among the different free energy calculations needed to cover the thermodynamic cycle, the free energy of hydration is the most studied process; small organic molecules, drug-like molecules and proteins have been computationally studied for this property (see e.g. (Michielan et al., 2008, Palmer et al., 2011, Tjong and Zhou, 2008)).The first attempt to predict drug solubility using a computational approach to solve the complete thermodynamic cycle was published in 2008 (Palmer et al., 2008). This work included a dataset of 34 drugs and drug precursors, and only compounds with experimental crystal structures available in the Cambridge Structural Database were selected. The mean solubility of the dataset was 1 mM, so focus in this study was not on particularly poorly soluble compounds, and all the molecules were quite small with a maximum molecular weight of 339 Da. Crystal lattice energies, the entropy change for sublimation and gas, the vibrational entropy occurring in the crystal and the free energies of hydration and solvation were calculated from energy-minimised crystal structures using quantum mechanics (QM). Calculated molecular descriptors were also explored. Unfortunately the ab initio approach using QM-calculated thermodynamics failed to predict the experimental result. This was probably because the prediction of ΔG(sol) was not accurate enough. The error of 5.7 kJ mole−1 in ΔG(sol) results in a 10-fold shift (1 log unit) in the predicted solubility value, making this calculation extremely important. Instead of applying only QM, the authors also included molecular descriptors with the QM-calculated properties in a multilinear regression. Hence, several different computational methodologies were combined in the final equation (QM and molecular descriptor calculations partly dependent on secondary QSPR models to allow prediction of logP). The final proposed model included a QM-calculated lattice energy term, a flexibility descriptor (fraction of rotatable bonds) and the calculated lipophilicity value, and resulted in an RMSE of 0.71 for the test set. This accuracy is comparable to that in previously published computational models based on molecular descriptors. In a second study, the authors attempted to predict aqueous solubility from first principles (Palmer et al., 2012). For details around first principles, see Section 6 below. Information about the crystal lattice was again obtained from experiments and used as input crystal structures for the sublimation calculations. Model potential-based crystal lattice simulations to calculate ΔG(sub) were combined with statistical mechanics to calculate ΔG(hyd). This method is not yet as accurate as the empirical methods – i.e. the QSPR methods where relationships are sought between structural features and solubility – and the best model for the dataset explored showed an RMSE of 1.45. However, this approach offers full computational characterization of the thermodynamic cycle. When these models become optimized, e.g. by using force fields that better describe the drug molecule as well as the solvent (water), they may represent a new approach for exploring the impact of the solid-state versus hydration effects on the resulting solubility.
First principles for prediction of solubility changes
In addition to methods for calculating solubility that rely on empirical or semi-empirical corrections and/or different kinds of molecular descriptors as input, as discussed above, solubility can also be predicted using only the underlying physical principles. One advantage of this approach is that the input data that are needed for the non-physics-based methods to work might not be readily available (e.g. for a novel compound), and a significant amount of work would be required to produce these data. An example of such an approach was provided in Section 5 above, although that model used the crystal structure as input. In this section we will focus primarily on relative solubility calculations (e.g. change in solubility as the result of other components being present) rather than absolute predictions of aqueous solubility.It makes sense, however, to start with a description of the use of physical principles to determine solubility with the Hansen-Hildebrand solubility parameters. These were first introduced in 1936 by Hildebrand and Scott (1936) and later extended by Hansen to include systems with polar and hydrogen-bonding characters (Hansen, 1967). Briefly, the Hansen-Hildebrand parameters describe the miscibility of solvents, and are defined as the square root of the “cohesive energy density”; the heat of vaporization divided by the molar volume. The original Hildebrand parameter is a single number, and solvents with similar Hildebrand parameters are generally considered to be miscible. The later extension by Hansen splits the solubility parameter into three components, representing polar, dispersion and hydrogen-bonding contributions to the solubility. A thorough description on how to use physics-based methods to calculate these parameters for a particular solute is given by Belmares et al. (2004) However, regardless of the method used to calculate these solubility parameters (group contributions or physics-based methods), they are based on regular solution theory assumptions and do not, for example, take into account entropy and free-volume changes, making them inaccurate in some circumstances. Nor do they account for changes in solubility with different solute concentrations or conformations, or the fact that individual molecules might interact with each other in very specific ways, which may also affect their solubility (Hancock and Zografi, 1997).The Flory-Huggins (FH) theory is closely related to the Hildebrand-Hansen solubility parameter theory (Flory, 1941, Huggins, 1941). In brief, the FH theory involves determination of an interaction parameter (χFH), which is a dimensionless quantity that represents the mixing behaviour (chiefly enthalpy) of a solute (originally considered by Flory and Huggins to be a polymer) and a solvent. These components are said to be miscible when χFH < 0.5 or, conversely, to be phase-separated when χFH > 0.5. This interaction parameter can also be calculated from physics-based simulation methods using the individual cohesive energy density of each component in the mixture as well as the molar volume of the solute (Case and Honeycutt, 1994). This approach has been used, for example, by Huynh et al. to predict the solubility of docetaxel (an anti-cancer agent) in different excipients (Huynh et al., 2007), and by Pajula et al. to predict the miscibility of small molecules in binary mixtures (Pajula et al., 2010).This section has so far described methods primarily used to determine whether two components are miscible. In general, these methods do not inform on the underlying mechanisms behind solubility. To that end, molecular dynamics (MD) or Monte Carlo (MC)-based simulations can be used, and these also allow prediction of the actual solubility of a particular compound, even in the complete absence of experimental data. Broadly, MD (as well as MC) schemes can be implicit, continuum-based methods that treat the surrounding solvent as an isotropic continuous medium, or methods that incorporate any solvent molecules explicitly. One example of an implicit method which has been shown to yield accurate results for predicted solubility values is the so-called conductor-like screening model (COSMO) (Klamt, 1995). With COSMO, the solution-phase part of solubility is calculated ab initio while the solid-phase contribution is estimated empirically using experimental data. There is also an extension to and further development of COSMO, called COSMO-RS (RS for realistic solvent) (Klamt et al., 2002). One of the main drawbacks of implicit simulation methods is that they do not provide details of the solvation process at an atomic level. With explicit solvation methods, on the other hand, solvent-specific effects and solute-solvent interactions are explicitly considered. These should therefore, at least theoretically, be more accurate and provide better information about solvation than the implicit models (Levy and Gallicchio, 1998). The main issue with explicit models is that they require extensive sampling of a high-dimensional phase-space, i.e. they take a very long time to run, especially for some processes, even with modern computers. This is primarily because of the high degree of freedom from the explicit solvent molecules. The interested reader is directed to papers by Bernardi et al., Páll et al. and Ganesan et al. to read more about explicit MD simulations and ways of enhancing sampling in general (Bernardi et al., 2015, Ganesan et al., 2017, Páll et al., 2014). Here we highlight the use of explicit MD simulations to predict solubility, of which relatively few attempts have been made to date (Ferrario et al., 2002, Paluch et al., 2010, Sanz and Vega, 2007, Schnieders et al., 2012).Essentially, calculating solubility from explicit MD simulations amounts to determining when the chemical potential of the solute in solution is equal to the chemical potential of the solute in its solid, crystalline form (Paluch et al., 2015). This can most readily be achieved by running a series of free-energy calculations, but the chemical potential of the solid is still quite complicated to calculate, since it requires calculation of the fugacity of the solid (Liu et al., 2016). Recently, Mobley and co-workers have shown how to use molecular dynamics free energy calculations to calculate both relative and excess solubilities directly, thus circumventing the need to determine the fugacity (Paluch et al., 2015). These calculations are done in the limit of infinite dilution for small compounds (see Fig. 6). This is of particular interest since most of the published work related to free energy calculations and MD simulations focuses on calculations of solvation free energies and octanol-water partition coefficients (for example through SAMPL challenges) (Marenich et al., 2009, Ribeiro et al., 2010). While useful, corresponding direct measurements of solvation free energies are challenging at best, and it is thus difficult to find or determine experimental values with which to compare calculations. The solubility of a particular solute, on the other hand, is readily measured. Mobley and co-workers have shown that MD simulations can be used to calculate both relative and excess solubility directly for a number of compounds (Paluch et al., 2015). Relative solubility is the ratio of the solubility of two different compounds in the same solvent, while excess solubility is the solubility of a compound in the actual solution relative to that in an ideal solution (Ellegaard et al., 2010, O’Connell and Prausnitz, 1964). The main advantage of these methods is that the properties of the solid form of the solute do not have to be calculated. Obviously, knowledge of the relative or excess solubility of compound A and/or B, together with an experimental measurement of the absolute solubility of e.g. compound A immediately leads to knowledge of the solubility of compound B.
Fig. 6
Molecular dynamics simulation snapshot showing a felodipine molecule surrounded by water and hexadecane. Felodipine is shown in red, water is shown as a transparent, blue surface and hexadecane molecules are shown as sticks. This setup could potentially be used to run a free energy calculation to predict the excess or relative solubility of a drug (felodipine in this case) in more complex systems using the framework of Paluch et al. (2015).
Molecular dynamics simulation snapshot showing a felodipine molecule surrounded by water and hexadecane. Felodipine is shown in red, water is shown as a transparent, blue surface and hexadecane molecules are shown as sticks. This setup could potentially be used to run a free energy calculation to predict the excess or relative solubility of a drug (felodipine in this case) in more complex systems using the framework of Paluch et al. (2015).All of the examples above have involved relatively simple mixtures, such as binary octanol-water systems with a single solute molecule. In the intestinal fluids of the gastrointestinal tract, however, lipophilic molecules are solubilized in mixed lipid aggregates. In the fasted state, these are composed of bile salts and phospholipids. The aggregates interact with the drug molecules, and the composition of these aggregates will have a significant effect on the partitioning of the drug molecules into them. This will then have an impact on both solubilization and bioavailability. To further complicate the picture, lipids in the intestinal tract are digested over time, adding another element of unpredictability. By matching experiments with MD simulations, Birru et al. recently investigated the fate of danazol in different kinds of intestinal colloidal structures (Birru et al., 2017b). They concluded that the solubility of danazol increases with the concentration of digested triglycerides. In two related papers, the same team of researchers also studied how the digestion of lipids alters the phase behaviour in the intestinal fluid, as well as how cholesterol and pH impact the aggregation behaviour of intestinal lipids (Birru et al., 2017a, Suys et al., 2017). Similarly, Benson and Pless found that, in a mixture of tri-, di-, and monoglycerides, the addition of extra monoglycerides leads to reorientation of cyclosporin to the core region of triglyceride moieties (Benson and Pleiss, 2017). A related study by Larsson et al. showed that the different phases of lipids that occur with water dispersion (as typically happens when a drug formulated with lipids passes through the intestinal tract) can be reproduced also using coarse-grained molecular dynamics (Larsson et al., 2017), allowing studies of solubilization of drug molecules under physiologically relevant conditions and concentrations of lipids. Relative solubilization, or partitioning, of a number of drugs (danazol, felodipine, carbamazepine) as well as ethanol and water into lipid bilayers consisting of a mixture of phospholipids (including lecithin) and taurocholate was also studied using MD simulations by Holmboe and colleagues (Holmboe et al., 2016). They concluded that partitioning, and thereby solubilization, is strongly influenced by hydrogen bonding between drug molecules and taurocholate, again highlighting the importance of detailed knowledge at an atomic level for understanding drug solubilization in physiologically relevant fluids.
Conclusion
Significant advances have recently been made in computational predictions of solubility in water-based systems. Recently, QSPR-like methods have been used to predict solubility in biorelevant dissolution media. The purpose of such models is to better predict the solubility in physiological fluids, with special emphasis on the human gastro-intestinal fluids and the impact the composition of these will have on dissolution and solubilization after oral ingestion of medications. For this purpose also MD simulations have been performed. These reveal the structure and the composition of solubilizing nanoaggregates and may inform on specific molecular interaction that occur when e.g. bile components are present. Many of the modelling and simulation efforts target mechanistic understanding of the different processes involved in the thermodynamic cycle of dissolution. Special attention has been directed towards modelling of the solid state. While this field is rapidly evolving, it is still difficult to reach accurate predictions of e.g. melting point useful for prediction via the general solubility equation. The increasing computational power obtained through national and international computational infrastructures combined with significant analytical advances in terms of solid state characterization are two key aspects that will facilitate highly accurate predictions of crystal lattice and melting point in the not too distant future. Advances within this field will be beneficial for the modelling and simulation of processes involved during drug dissolution and solubilisation. Finally, the experimental database on the composition of human gastrointestinal fluids is currently being expanded and these data will be crucial for the computational development of models predicting biorelevant dissolution and the intestinal solubility of drugs.
Authors: Paul Morgan; Piet H Van Der Graaf; John Arrowsmith; Doug E Feltner; Kira S Drummond; Craig D Wegner; Steve D A Street Journal: Drug Discov Today Date: 2011-12-29 Impact factor: 7.851
Authors: Raphael F Ribeiro; Aleksandr V Marenich; Christopher J Cramer; Donald G Truhlar Journal: J Comput Aided Mol Des Date: 2010-04-01 Impact factor: 3.686
Authors: Orlagh M Feeney; Matthew F Crum; Claire L McEvoy; Natalie L Trevaskis; Hywel D Williams; Colin W Pouton; William N Charman; Christel A S Bergström; Christopher J H Porter Journal: Adv Drug Deliv Rev Date: 2016-04-16 Impact factor: 15.470
Authors: James L McDonagh; Neetika Nath; Luna De Ferrari; Tanja van Mourik; John B O Mitchell Journal: J Chem Inf Model Date: 2014-03-11 Impact factor: 4.956
Authors: Mahmoud E Soliman; Adeniyi T Adewumi; Oluwole B Akawa; Temitayo I Subair; Felix O Okunlola; Oluwayimika E Akinsuku; Shahzeb Khan Journal: AAPS PharmSciTech Date: 2022-03-15 Impact factor: 3.246
Authors: Yue Sun; Angela Wei Hong Yang; Andrew Hung; George Binh Lenon Journal: Evid Based Complement Alternat Med Date: 2020-12-24 Impact factor: 2.629
Authors: Santu Bera; Xuewei Dong; Bankala Krishnarjuna; Shannon A Raab; David A Hales; Wei Ji; Yiming Tang; Linda J W Shimon; Ayyalusamy Ramamoorthy; David E Clemmer; Guanghong Wei; Ehud Gazit Journal: Cell Rep Phys Sci Date: 2021-04-21