
Model evaluation for the prediction of solubility of active pharmaceutical ingredients (APIs) to guide solid-liquid separator design.

Kuveneshan Moodley¹, Jürgen Rarey¹,², Deresh Ramjugernath¹.

Abstract

The assumptions and models for solubility modelling or prediction in systems using non-polar solvents, or water, with complex triterpenes and other active pharmaceutical ingredients as solutes are not well studied. In this work, the assumptions concerning heat capacity effects (negligibility, experimental values or approximations) are explored, using a non-polar solvent (benzene) or water as reference solvents, for systems with solute melting points in the range of 306–528 K and molecular weights in the range of 90–442 g/mol. New empirical estimation methods for the $\Delta_{fus}C_{p,i}$ of APIs are presented, which correlate $\Delta_{fus}C_{p,i}$ with the solute molecular masses and van der Waals surface areas. Separate empirical parameters were required for oxygenated and non-oxygenated solutes. Subsequently, the predictive capabilities of the various approaches to solubility modelling for complex pharmaceuticals, for which data are limited, are analysed. The solute selection is based on a principal component analysis, considering molecular weights, fusion temperatures, and solubilities in a non-polar solvent, an alcohol, and water, where data were available. New NRTL-SAC parameters were determined for selected steroids by regression. The original UNIFAC, modified UNIFAC (Dortmund), COSMO-RS (OL), and COSMO-SAC activity coefficient predictions are then conducted, based on the availability of group constants and sigma profiles. These are undertaken to assess the predictive capabilities of these models when each assumption concerning heat capacity is employed. The predictive qualities of the models are assessed based on the mean square deviation, and guidelines are provided for model selection, and for assumptions concerning phase equilibrium, when designing solid-liquid separators for the pharmaceutical industry in process simulation software.
The most suitable assumption regarding $\Delta_{fus}C_{p,i}$ was found to be system specific, with modified UNIFAC (Dortmund) performing well in systems with benzene as the solvent, while original UNIFAC performs better in aqueous systems. Original UNIFAC outperforms the other predictive models tested in the triterpene/steroidal systems, with no significant influence from the assumptions regarding $\Delta_{fus}C_{p,i}$.


Keywords: Active pharmaceutical ingredients; Model prediction; Solid–Liquid Equilibrium; Solubility

Year: 2017 PMID: 32104400 PMCID: PMC7032238 DOI: 10.1016/j.ajps.2017.12.004

Source DB: PubMed Journal: Asian J Pharm Sci ISSN: 1818-0876 Impact factor: 6.598

Introduction

The separation and purification of pharmaceutical products, or intermediates, are arguably the most important and cost intensive process steps in the pharmaceutical industry. The method, degree and efficiency of the process are generally dictated by the phase behaviour of the solute. Kolář et al. [1] state that over 30% of the efforts of industrial property modellers and experimentalists deal with solvent selection. It is therefore imperative that appropriate solvents are analytically selected, based on broadly-sourced information that may include phase equilibrium experimental data, reliable predictions, experience and solute theory (e.g. structure, bonds and physical properties). Often it is not possible to determine the phase behaviour of these systems experimentally, as small amounts of each pharmaceutical product are manufactured in the initial stages of design and synthesis. Due to this constraint, many thermodynamic models have been applied to predict the solubility via predictive Gibbs excess energy models. These models include functional group approaches such as UNIFAC [2], modified UNIFAC (Dortmund) [3], and surface segment approach models, such as COSMO-RS (OL) [4], COSMO-SAC [5] and NRTL-SAC [6] and have exhibited varying degrees of success in predicting the solubility of common pharmaceutical compounds with relatively simple molecular structures [6], [7], [8], [9], [10]. Gmehling et al. [7] and Gracin et al. [8] have explored the ability of the UNIFAC model to predict solid–liquid equilibria. Gmehling et al. [7] considered relatively simple ring structured solutes such as naphthalene and anthracene. The authors could provide good estimates by UNIFAC predictions for the systems considered. Gracin et al. [8] used the UNIFAC model to predict solubilities of single-ring pharmaceuticals such as ibuprofen and aspirin. The authors concluded that accurate predictions were not achievable, and suggested the use of the UNIFAC model for initial estimates only. 
Hahnenkamp et al. [11] evaluated and compared the predictive capabilities of the models of Fredenslund et al., Weidlich and Gmehling, Grensemann and Gmehling, and Lin and Sandler [2], [3], [4], [5] for systems containing ibuprofen and aspirin. The authors determined that the predictions of the model presented in Weidlich and Gmehling [3] provided the lowest deviations from the experimental data, when compared to the models from Fredenslund et al. [2] and Grensemann and Gmehling [4]. Diedrichs and Gmehling [12] conducted a detailed model comparison, but only systems with an alcohol, an alkane or water as the solvent were considered. Furthermore, systems with solute mole fractions greater than 0.1 were excluded from the comparison. Schröder et al. [13] explored the prediction of aqueous solubilities of various solid carboxylic acids that are used in the pharmaceutical industry. Little work on the abilities of predictive models for the solubility of complex pharmaceuticals, such as polycyclic aromatics, specifically steroidal triterpenes, is available in the literature. This is mainly due to a lack of experimental data, which are needed to generate the model-specific parameters that most predictive models require. It is however important that accurate predictions can be made without an extensive set of experimental data, as such a requirement would limit the practicality of the predictive model. Abildskov et al. [14] have provided some satisfactory predictions for a limited set of steroidal molecules by conducting sensitivity tests on UNIFAC model parameters; however, these data are incomplete and not readily available. In this work, the various aforementioned predictive models were tested to determine the most accurate method for solubility modelling for the solutes considered. The models were chosen based on the variations in the approach to solubility modelling (functional group based, segment based, reference solvent based).
The differences in combinatorial and residual expressions are distinguished. The results of the predictions are intended to provide qualitative estimates of solubility data, as the predictive models generally yield poor quantitative results in the case of solid–liquid equilibria. The performance of the models is correlated with the molecular surface area, molecular weight, and functional group diversity. In this work, functional group definitions based on the work of Fredenslund et al. [2] were used. In addition, the works of Mishra and Yalkowsky [15] and Neau et al. [16] are explored for complex steroidal systems in benzene, or water, as hydrophobic and hydrophilic reference solvents. This is to determine the effect of the assumption made for the change in heat capacity upon fusion (zero, an approximation, or experimental data) in systems exhibiting ideal solubility in the solid phase. Neau et al. [16] showed that the assumption of negligible heat capacity changes can cause large errors in calculated solubility for solutes with melting points exceeding 420 K. However, an ideal liquid phase was assumed in their work. Hence, the effect of the activity coefficient was not considered. The tests of Neau et al. [16] were limited to solute melting points of 470 K, where the different assumptions for changes in heat capacity can result in deviations from experimental data of up to 27%. The effect of the increasing difference between the experimental solubility temperature and fusion temperature is tested in this work. The range of solute melting points considered in the test set here exceeds 520 K, with molecular weights in the range of 90–442 g/mol. It is also useful to establish differences (if any) in performance due to the solvent involved (non-polar organic vs. aqueous).
The effect on the predicted solubility of the various methods of dealing with the change in heat capacity between the solid and liquid solute ($\Delta_{fus}C_{p,i}$) is explored, in conjunction with the different predictive models for the activity coefficient from the literature. This is to determine the most suitable combination of combinatorial and residual activity coefficient model terms, along with the most suitable model equation for solubility prediction.

Theory

The activity coefficient is a measure of the non-ideality of solutions [12]. It is a strong function of composition, and to a lesser degree of temperature, but is only weakly dependent on pressure at low to moderate pressures. In some cases the activity coefficient is greater than 1; however, values below 1 are common in solvating systems (as shown in Gmehling et al. [7]), such as solutions of phenol and alkanols or alkanes and polymers. Usually, the degree of dissimilarity between the sizes of the components comprising a mixture is proportional to the differences in activity coefficients of those components [6].

Solid–liquid phase equilibrium

At solid–liquid phase equilibrium, the solvent is saturated with the solute. In the case of eutectic mixtures, the solubility of the solvent in the solid solute is neglected, and the chemical potential of the solute, i, in the pure solid phase, $\mu_i^{S}$, is equal to the chemical potential of the solute in the liquid solution, $\mu_i^{L}$, as shown by Bouillot et al. [10]:

$$\mu_i^{S} = \mu_i^{L} \tag{1}$$

The chemical potential of the solute in the liquid solution can be expressed as:

$$\mu_i^{L} = \mu_i^{0,L} + RT \ln(x_i \gamma_i) \tag{2}$$

where $\mu_i^{0,L}$ is the chemical potential of the hypothetical pure liquid solute at system temperature (reference state), T is the temperature in Kelvin, R is the universal gas constant in J/mol∙K, $x_i$ is the solute mole fraction and $\gamma_i$ is the activity coefficient of the solute in the saturated solution. The activity of the solute can be determined by combining Equations (1) and (2), yielding:

$$\ln(x_i \gamma_i) = \frac{\mu_i^{S} - \mu_i^{0,L}}{RT} \tag{3}$$

At constant temperature and pressure, the chemical potential is equal to the partial molar Gibbs energy, so that:

$$\mu_i^{S} - \mu_i^{0,L} = g_i^{S} - g_i^{0,L} = -\Delta_{fus}g_i \tag{4}$$

And hence

$$\ln(x_i \gamma_i) = -\frac{\Delta_{fus}g_i}{RT} \tag{5}$$

where $\Delta_{fus}g_i$ is the hypothetical partial molar Gibbs energy of melting at the system temperature and pressure [10], which is zero for the pure solute at its melting point. Assuming a constant difference in heat capacity between the solid and the subcooled liquid solute, between the triple point and the system temperature, the following expression can be derived:

$$\ln(x_i \gamma_i) = -\frac{\Delta_{fus}H_i}{RT}\left(1 - \frac{T}{T_{tp,i}}\right) + \frac{\Delta_{fus}C_{p,i}}{R}\left(\frac{T_{tp,i}}{T} - 1\right) - \frac{\Delta_{fus}C_{p,i}}{R}\ln\frac{T_{tp,i}}{T} \tag{6}$$

where $\Delta_{fus}H_i$ is the enthalpy of fusion at the triple point, $T_{tp,i}$ is the triple point temperature in Kelvin, and $\Delta_{fus}C_{p,i}$ is the difference in heat capacity between the subcooled liquid solute and the solid. This derivation disregards the pressure influence on solid solubility, as the difference between the system pressure and the triple point pressure is regarded as sufficiently small that a Poynting correction term is not required. Hence the melting temperature at 1 atmosphere ($T_{fus,i}$) is often used as a substitute for the triple point temperature, due mainly to the greater abundance of this data. Often the effect of $\Delta_{fus}C_{p,i}$ is assumed to be small in comparison to the enthalpy term, and is omitted. This assumption is only valid when the SLE temperature is similar to the triple point temperature.

Equation (6) then reduces to:

$$\ln(x_i \gamma_i) = -\frac{\Delta_{fus}H_i}{RT}\left(1 - \frac{T}{T_{tp,i}}\right) \tag{7}$$

Hildebrand and Scott [17], [18], however, recommend estimating $\Delta_{fus}C_{p,i}$ as the entropy of fusion, $\Delta_{fus}S_i = \Delta_{fus}H_i / T_{tp,i}$, yielding:

$$\ln(x_i \gamma_i) = -\frac{\Delta_{fus}H_i}{R\,T_{tp,i}}\ln\frac{T_{tp,i}}{T} \tag{8}$$

This improvement has been supported by Neau et al. [16], and is explored further in this work.
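The three treatments of the heat capacity change described above (a constant non-zero value, zero, and the entropy-of-fusion approximation of Hildebrand and Scott) can be compared numerically. The following is a minimal sketch, not the authors' implementation: it assumes an ideal liquid phase (activity coefficient of 1), uses the normal melting temperature in place of the triple point temperature, and takes the naphthalene values listed in Table 1. The function names and the choice of 298.15 K are illustrative assumptions.

```python
import math

R = 8.314462618  # universal gas constant, J/(mol K)


def ln_activity(T, T_fus, dH_fus, dCp=0.0):
    """ln(x_i * gamma_i) for a constant heat capacity difference dCp.

    dCp = 0 gives the simplified form that neglects heat capacity effects.
    """
    enthalpy_term = -(dH_fus / (R * T)) * (1.0 - T / T_fus)
    cp_term = (dCp / R) * (T_fus / T - 1.0) - (dCp / R) * math.log(T_fus / T)
    return enthalpy_term + cp_term


def ln_activity_entropy_approx(T, T_fus, dH_fus):
    """Closed form obtained when dCp is set equal to dH_fus / T_fus."""
    return -(dH_fus / (R * T_fus)) * math.log(T_fus / T)


def ideal_solubility(T, T_fus, dH_fus, dCp=0.0):
    """Solute mole fraction assuming gamma_i = 1 (an assumption of this sketch)."""
    return math.exp(ln_activity(T, T_fus, dH_fus, dCp))


# Naphthalene values as listed in Table 1 (K, J/mol, J/(mol K)).
T_FUS, DH_FUS, DCP = 353.35, 19110.0, 19.07
T = 298.15  # illustrative system temperature, K

x_zero = ideal_solubility(T, T_FUS, DH_FUS)          # heat capacity change neglected
x_table = ideal_solubility(T, T_FUS, DH_FUS, DCP)    # tabulated heat capacity change
x_entropy = math.exp(ln_activity_entropy_approx(T, T_FUS, DH_FUS))

print(f"dCp = 0:         x = {x_zero:.4f}")
print(f"dCp = tabulated: x = {x_table:.4f}")
print(f"dCp = dS_fus:    x = {x_entropy:.4f}")
```

For a real (non-ideal) system the same right-hand sides would be combined with an activity coefficient model and solved iteratively for the mole fraction, since the activity coefficient itself depends on composition.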

Predictive activity coefficient models

A brief description of the predictive activity coefficient models used follows. The reader is referred to the original publications for an in-depth discussion [2], [3], [4], [5], [6].

The UNIFAC and modified UNIFAC (Dortmund) model

The UNIFAC activity coefficient model, introduced by Fredenslund et al. [2], splits the activity coefficient into two contributions, namely a combinatorial component (accounting for size and shape effects) and a residual component (accounting for energetic interactions):

$$\ln \gamma_i = \ln \gamma_i^{C} + \ln \gamma_i^{R} \tag{9}$$

where $\ln \gamma_i^{C}$ and $\ln \gamma_i^{R}$ are the combinatorial and residual contributions, respectively, and are given by the following expressions:

$$\ln \gamma_i^{C} = 1 - V_i + \ln V_i - \frac{Z}{2} q_i \left(1 - \frac{V_i}{F_i} + \ln\frac{V_i}{F_i}\right) \tag{10}$$

where

$$V_i = \frac{r_i}{\sum_j x_j r_j} \tag{11}$$

and

$$F_i = \frac{q_i}{\sum_j x_j q_j} \tag{12}$$

where r and q are the molecular volume and surface area, and Z is the coordination number. For the original UNIFAC model, the molecular volume and surface area are estimated from the group contribution values of ref. [19]. The residual term, $\ln \gamma_i^{R}$, is evaluated from group contributions:

$$\ln \gamma_i^{R} = \sum_k \nu_k^{(i)} \left(\ln \Gamma_k - \ln \Gamma_k^{(i)}\right) \tag{13}$$

where $\nu_k^{(i)}$ is the number of functional groups of type k in a molecule of component i, and $\Gamma_k^{(i)}$ is the residual contribution to the activity coefficient by the functional group k in the pure fluid i. Since the pure fluid i is also a mixture of groups, the term $\ln \Gamma_k^{(i)}$ is incorporated to reduce the residual term of the pure fluid to zero. The contribution to the residual portion of the activity coefficient by the functional group k is given by the following relationship:

$$\ln \Gamma_k = Q_k \left[1 - \ln\left(\sum_m \Theta_m \Psi_{mk}\right) - \sum_m \frac{\Theta_m \Psi_{km}}{\sum_n \Theta_n \Psi_{nm}}\right] \tag{14}$$

where $Q_k$ is the surface area parameter of group k and $\Theta_m$ is the surface area fraction of the functional group m in the mixture. The binary interaction parameter, $a_{nm}$, acts between groups m and n, and enters through the parameter $\Psi_{nm}$, where:

$$\Psi_{nm} = \exp\left(-\frac{a_{nm}}{T}\right) \tag{15}$$

T is the system temperature in Kelvin. As mentioned above, the expression for $\ln \Gamma_k$, presented in Equation (14), covers the contributions of the functional group k to the activity coefficient in both the mixture and the pure fluid. Several modifications to the original UNIFAC model have been proposed, with the most significant modifications made to the expression for the temperature dependence of binary interaction parameters, and the introduction of different combinatorial expressions, with unique group volume and area parameters, as well as component group fragmentations.

In the modified UNIFAC (Dortmund) model [3], a quadratic temperature dependence of the binary interaction parameter, $\Psi_{nm}$, is proposed:

$$\Psi_{nm} = \exp\left(-\frac{a_{nm} + b_{nm}T + c_{nm}T^2}{T}\right)$$

Additionally, the combinatorial expression is given by:

$$\ln \gamma_i^{C} = 1 - V_i' + \ln V_i' - \frac{Z}{2} q_i \left(1 - \frac{V_i}{F_i} + \ln\frac{V_i}{F_i}\right)$$

where

$$V_i' = \frac{r_i^{3/4}}{\sum_j x_j r_j^{3/4}}$$

The parameters r and q are determined by data fitting, and not from the method of Bondi (1964). The modified UNIFAC (Dortmund) model was adapted further, for application to pharmaceutical systems, by Diedrichs and Gmehling [12]. This model was termed Pharma Modified UNIFAC. It was assumed, in that work, that certain functional group contributions become irrelevant in solutions of pharmaceutical molecules in common solvents, if the solubility is low, and can therefore be omitted. A unique group-fragmentation scheme is used in this model. Promising results for limited classes of solvents were obtained [12]. The model is however limited in applicability to a solute mole fraction of less than 0.1.
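The difference between the original UNIFAC combinatorial term and the modified UNIFAC (Dortmund) form, which replaces the volume fraction in the leading terms by one built on r raised to the power 3/4, can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the coordination number is assumed to be 10, and the r and q values for benzene and naphthalene are assumptions built from commonly tabulated ACH and AC group parameters rather than values taken from this work (only the naphthalene q of 3.44 appears in Table 1). The modified UNIFAC (Dortmund) model uses its own fitted r and q values, so reusing the same numbers for both variants is purely for demonstration.

```python
import math


def combinatorial_ln_gamma(x, r, q, z=10.0, r_exponent=1.0):
    """Return ln(gamma_i^C) for every component.

    r_exponent = 1.0 reproduces the original UNIFAC form;
    r_exponent = 0.75 gives the modified (Dortmund) volume term V'.
    """
    sum_r = sum(xj * rj for xj, rj in zip(x, r))
    sum_q = sum(xj * qj for xj, qj in zip(x, q))
    sum_rp = sum(xj * rj ** r_exponent for xj, rj in zip(x, r))
    result = []
    for ri, qi in zip(r, q):
        V = ri / sum_r                       # volume fraction / mole fraction ratio
        F = qi / sum_q                       # surface fraction / mole fraction ratio
        Vp = ri ** r_exponent / sum_rp       # modified volume term
        result.append(
            1.0 - Vp + math.log(Vp)
            - 0.5 * z * qi * (1.0 - V / F + math.log(V / F))
        )
    return result


def psi(a_nm, T, b_nm=0.0, c_nm=0.0):
    """Group interaction term; b_nm = c_nm = 0 gives the original UNIFAC form."""
    return math.exp(-(a_nm + b_nm * T + c_nm * T * T) / T)


# Illustrative (assumed) structural parameters: benzene as solvent, naphthalene as solute.
x = [0.9, 0.1]
r = [3.1878, 4.9808]
q = [2.400, 3.440]

original = combinatorial_ln_gamma(x, r, q)
dortmund_style = combinatorial_ln_gamma(x, r, q, r_exponent=0.75)
print("original UNIFAC form:", original)
print("3/4-power form:      ", dortmund_style)
```

Both variants return zero for a pure component or for components of identical size and shape, which is a quick sanity check on any implementation of the combinatorial term.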

The COSMO-RS, COSMO-SAC and COSMO-RS (OL) models

Generally, the activity coefficient of a mixture is determined through the Gibbs excess energy function. Klamt [20] proposed a means of determining the activity coefficient, using chemical potentials from surface shielding charge densities determined by quantum-mechanical calculations. The Conductor-like Screening Model for Real Solvents (COSMO-RS) was introduced, as an a priori predictive model, and an alternative to the traditional group contribution-based models. In COSMO-RS, molecules of a solute–solvent system are treated as a combination of molecular-shaped, cavity surface segments. The concept involves modelling the placement of a “cavity” that is a replica of a molecule of the solute, with zero charge, inside the homogeneous theoretical solvent, with a fixed dielectric constant, ε. The energy change involved in this placement represents a component of the total Gibbs energy change of solvation. The replica molecule charges are then replaced, yielding a realistic solute. The energy change associated with this is the second contributor to the Gibbs energy change of solvation. To know how charges must be replaced, each shielding charge density (σ) must be characterized by a “sigma profile”. COSMO-RS (OL) is the in-built Dortmund Data Bank-modified version of the COSMO-RS model. The most significant modification to the model, in this version, includes an empirical correction term for hydrogen bonding, which is suggested to be over-compensated for in non-hydrogen bonding mixtures, in the original COSMO-RS model. The specifics of this modification are outlined in the original publication [4]. Lin and Sandler [5] have proposed some modifications to the original COSMO-RS model. The authors have stated that the expression for the chemical potential, given by ref. [20], does not converge with certain boundary conditions, and that the expression for the activity coefficient presented, does not satisfy certain thermodynamic consistency tests. 
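The modifications of Lin and Sandler [5] result in the Conductor-like Screening Model-Segment Activity Coefficient model (COSMO-SAC), which is reviewed here. The derivation of the expression of the activity coefficient using the COSMO-SAC model is extensive and beyond the scope of this work, but the reader is referred to the original publications for both the COSMO-RS [4], [20] and COSMO-SAC [5] models for further details. The final expression for the activity coefficient of solute, i, in solvent S, $\gamma_{i/S}$, using the COSMO-SAC model is given by:

$$\ln \gamma_{i/S} = n_i \sum_{\sigma_m} p_i(\sigma_m)\left[\ln \Gamma_S(\sigma_m) - \ln \Gamma_i(\sigma_m)\right] + \ln \gamma_{i/S}^{SG}$$

where $n_i$ is the total number of segments contributed by molecule i, $\sigma_m$ is the surface charge density of segment m, and $p_i(\sigma_m)$ is the frequency of surface charge density $\sigma_m$ in component i, given by:

$$p_i(\sigma_m) = \frac{n_i(\sigma_m)}{n_i}$$

where $n_i(\sigma_m)$ is the total number of segments in component i with charge density $\sigma_m$. $\Gamma_S(\sigma_m)$ is the segment activity coefficient in the mixture for segments with charge density $\sigma_m$, given by:

$$\ln \Gamma_S(\sigma_m) = -\ln\left\{\sum_{\sigma_n} p_S(\sigma_n)\,\Gamma_S(\sigma_n)\exp\left[-\frac{\Delta W(\sigma_m, \sigma_n)}{kT}\right]\right\}$$

where $\Delta W(\sigma_m, \sigma_n)$ is the exchange energy and k is the Boltzmann constant. $\Gamma_i(\sigma_m)$ is the segment activity coefficient in the pure component i for segments with charge density $\sigma_m$. $\gamma_{i/S}^{SG}$ is the Staverman–Guggenheim [21], [22] combinatorial term given by:

$$\ln \gamma_{i/S}^{SG} = \ln\frac{\phi_i}{x_i} + \frac{z}{2} q_i \ln\frac{\theta_i}{\phi_i} + l_i - \frac{\phi_i}{x_i}\sum_j x_j l_j$$

where $\theta_i$ is the surface area fraction given by:

$$\theta_i = \frac{x_i q_i}{\sum_j x_j q_j}$$

where $\phi_i$ is the volume fraction parameter given by:

$$\phi_i = \frac{x_i r_i}{\sum_j x_j r_j}$$

and

$$l_i = \frac{z}{2}(r_i - q_i) - (r_i - 1)$$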
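The segment activity coefficient in COSMO-SAC appears on both sides of its defining equation, so in practice it is obtained iteratively. The sketch below illustrates one common way to do this, damped successive substitution, on a toy two-segment sigma profile. It is not taken from the original publications: the function names, the damping factor of 0.5, and the toy exchange-energy matrix (expressed in joules per segment pair so that it can be divided by kT) are all assumptions made for illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def segment_activity_coefficients(p, delta_w, T, tol=1e-12, max_iter=1000):
    """Solve Gamma_m = 1 / sum_n p_n Gamma_n exp(-dW_mn / kT) by damped substitution.

    p        sigma-profile probabilities (should sum to 1)
    delta_w  square matrix of exchange energies, J per segment pair (toy units)
    """
    n = len(p)
    boltz = [[math.exp(-delta_w[m][k] / (K_B * T)) for k in range(n)] for m in range(n)]
    gamma = [1.0] * n
    for _ in range(max_iter):
        raw = [1.0 / sum(p[k] * gamma[k] * boltz[m][k] for k in range(n)) for m in range(n)]
        new = [0.5 * (old + upd) for old, upd in zip(gamma, raw)]  # damping for stability
        if max(abs(a - b) for a, b in zip(new, gamma)) < tol:
            return new
        gamma = new
    raise RuntimeError("segment activity coefficients did not converge")


def residual_ln_gamma(n_i, p_i, gamma_mix, gamma_pure):
    """Residual part: n_i * sum_m p_i(m) * [ln Gamma_S(m) - ln Gamma_i(m)]."""
    return n_i * sum(
        pm * (math.log(gs) - math.log(gi))
        for pm, gs, gi in zip(p_i, gamma_mix, gamma_pure)
    )


# Toy example: two segment types, unlike pairs penalised by 1 kT.
T = 298.15
w = K_B * T
toy_profile = [0.5, 0.5]
toy_delta_w = [[0.0, w], [w, 0.0]]
gammas = segment_activity_coefficients(toy_profile, toy_delta_w, T)
print(gammas)
```

For this symmetric toy case the two segment activity coefficients are equal and satisfy a simple quadratic, which gives an analytical check on the iteration. A real calculation would use a discretised sigma profile from a quantum-mechanical calculation and the exchange-energy expression from the original COSMO-SAC publication.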

Non-random two liquid segment activity coefficient model (NRTL-SAC)

The NRTL-SAC model [6], [23] is based on the polymer NRTL model by Chen [24], and was developed specifically for modelling the activity of complex molecules, such as pharmaceuticals. The non-ideality is accounted for based on “contributions” from four different conceptual segments that make up a particular component. These include polar-positive, polar-negative, hydrophobic and hydrophilic segments. Each molecular surface is conceptually divided into these segments, in different proportions of the molecular surface area. Every molecule is thus designated a conceptual segment surface “composition”. The surface interactions between pairs of segments are accounted for through constant binary interaction parameters only. The main differences between the original NRTL model of Renon and Prausnitz [25], and the NRTL-SAC model, include the concept of segment interaction, and the addition of a combinatorial term, as size/shape interactions become considerable in larger complex molecules. Additionally, the NRTL-SAC model has no in-built temperature dependency. The combinatorial term of Flory–Huggins [26], [27] is used in the model. The subscripts A and B are used to denote pure components, whereas the subscripts i, j, k, m and m′ are used to represent segment-based species indices.

$$\ln \gamma_A^{C} = \ln\frac{\phi_A}{x_A} + 1 - r_A \sum_B \frac{\phi_B}{r_B}$$

where

$$r_A = \sum_i r_{i,A}, \qquad \phi_A = \frac{r_A x_A}{\sum_B r_B x_B}$$

where $r_{i,A}$ is the total number of segments, i, in component A, and $\phi_A$ is the segment mole fraction of component A. The residual term is identical to that of the polymer NRTL [24], where:

$$\ln \gamma_A^{R} = \ln \gamma_A^{lc} = \sum_m r_{m,A}\left[\ln \Gamma_m^{lc} - \ln \Gamma_m^{lc,A}\right]$$

where $\Gamma_m^{lc}$ is the segment activity coefficient of species m in the mixture, and $\Gamma_m^{lc,A}$ is the segment activity coefficient of species m in the pure component A, and these are calculated from the following relations:

$$\ln \Gamma_m^{lc} = \frac{\sum_j x_j G_{jm}\tau_{jm}}{\sum_k x_k G_{km}} + \sum_{m'} \frac{x_{m'} G_{mm'}}{\sum_k x_k G_{km'}}\left(\tau_{mm'} - \frac{\sum_j x_j G_{jm'}\tau_{jm'}}{\sum_k x_k G_{km'}}\right)$$

$$\ln \Gamma_m^{lc,A} = \frac{\sum_j x_{j,A} G_{jm}\tau_{jm}}{\sum_k x_{k,A} G_{km}} + \sum_{m'} \frac{x_{m',A} G_{mm'}}{\sum_k x_{k,A} G_{km'}}\left(\tau_{mm'} - \frac{\sum_j x_{j,A} G_{jm'}\tau_{jm'}}{\sum_k x_{k,A} G_{km'}}\right)$$

where

$$x_j = \frac{\sum_B x_B r_{j,B}}{\sum_A \sum_i x_A r_{i,A}}$$

and

$$x_{j,A} = \frac{r_{j,A}}{\sum_i r_{i,A}}$$

where $r_{m,A}$ is the number of each segment of type m in component A, $x_j$ is the segment mole fraction of segment j, and $x_B$ is the mole fraction of component B. G, τ and α are the regular NRTL parameters, with $G_{jm} = \exp(-\alpha_{jm}\tau_{jm})$ and $\tau_{jm}$ being the binary interaction energy parameter between segments j and m.
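Two of the building blocks described above, the Flory–Huggins combinatorial term and the conversion from component mole fractions to segment mole fractions, are simple enough to sketch directly. The code below is a hedged illustration rather than a full NRTL-SAC implementation: the residual (local composition) term is omitted, the function names are hypothetical, and the conceptual segment numbers assigned to the two components are invented for demonstration and are not regressed parameters from this work or from the literature.

```python
import math


def flory_huggins_ln_gamma(x, r):
    """ln(gamma_A^C) = ln(phi_A / x_A) + 1 - r_A * sum_B(phi_B / r_B).

    Uses phi_A / x_A = r_A / sum_B(x_B r_B), so x_A = 0 (infinite dilution) is allowed.
    """
    total = sum(xb * rb for xb, rb in zip(x, r))
    return [math.log(ra / total) + 1.0 - ra / total for ra in r]


def segment_mole_fractions(x, segments):
    """Segment mole fractions x_j from component mole fractions.

    segments[B][j] is the number of conceptual segments of type j in component B.
    """
    n_types = len(segments[0])
    totals = [sum(xb * seg[j] for xb, seg in zip(x, segments)) for j in range(n_types)]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


# Hypothetical segment numbers, ordered as
# [hydrophobic, polar-negative, polar-positive, hydrophilic].
solvent_segments = [1.0, 0.0, 0.0, 0.0]
solute_segments = [0.8, 0.3, 0.5, 0.2]

x = [0.95, 0.05]                                   # solvent, solute mole fractions
segments = [solvent_segments, solute_segments]
r = [sum(seg) for seg in segments]                 # r_A = sum of segment numbers

ln_gamma_comb = flory_huggins_ln_gamma(x, r)
x_seg = segment_mole_fractions(x, segments)
print("ln gamma^C:", ln_gamma_comb)
print("segment mole fractions:", x_seg)
```

The Flory–Huggins term is never positive and vanishes when all components have the same total segment number, so any non-zero combinatorial contribution in this model comes purely from size differences.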

Experimental solubility and pure component property data

Pure component thermodynamic data

Pure component property data (melting temperature, enthalpy of fusion and heat capacity) for the active pharmaceutical ingredients selected for modelling in this work are limited in the literature. Bouillot et al. [10] state that the thermodynamic properties of the solids are seldom accurate, when referring to experimentally determined heat of fusion and melting temperature data of pharmaceutical products. Bouillot et al. [10] have proposed using average values of the available physical property data. In this work, the pure component data were used, where available, for the calculation of the activity coefficient from solubility measurements. However, in the case of mestanolone, the enthalpy of fusion was predicted by the method of Chickos and Acree [28]. The pure component properties from the literature are presented in Table 1, along with molecular masses, van der Waals molecular surface areas, and functional group diversity. Since the fragmentation of each molecule into its different functional groups was done in the same way as in the original UNIFAC model, the functional group diversity represents the number of unique original UNIFAC functional groups in a molecule.

Table 1

Physical properties of the solutes used in this study.

Name	IUPAC name	Formula	CAS-RN	MM (g/mol)	Tfus,i (K)ᵃ	ΔfusHi (J/mol)ᵇ	ΔfusCpi (J/mol∙K)	No. of different functional groups	q₁
1,2-Benzophenanthrene	Chrysene	C₁₈H₁₂	218-01-9	228.29	528.15	26,135.40	39.73dc	2	5.52
1,3,5-Triphenylbenzene	1,3,5-Triphenylbenzene	C₂₄H₁₈	612-71-5	306.41	443.15	33,377.40	66.35 dc	2	7.92
2,3-Benzindene	9H-fluorene	C₁₃H₁₀	86-73-7	166.22	389.15	19,563.50	20.97d	3	4.22
2-Furancarboxylic acid	Furan-2-carboxylic acid	C₅H₄O₃	88-14-2	112.085	402.5 [29]	22,600 [30]	60.00 [13]	3	2.892
3-Nitrobenzoic acid	3-Nitrobenzoic acid	C₇H₅NO₄	121-92-6	167.121	414.15 [13]	21,400 [31]	60.00 [13]	4	4.048
9,10-Benzophenanthrene	Triphenylene	C₁₈H₁₂	217-59-4	228.29	471.15	25,086.00	31.33 d	2	5.52
Acenaphthene	1,2-Dihydroacenaphthylene	C₁₂H₁₀	83-32-9	154.21	367.15	21,522.50	20.93c	3	3.56
Adipic acid	Hexanedioic acid	C₆H₁₀O₄	124-04-9	146.143	419 [30]	33,700.00 [30]	88.60 [30]	2	4.608
Anthracene	Anthracene	C₁₄H₁₀	120-12-7	178.23	489.60	28,840.30	37.56 d	2	4.48
Ascorbic acid	(R)-3,4-dihydroxy-5-((S)-1,2-dihydroxyethyl)furan-2(5H)-one	C₆H₈O₆	50-81-7	176.126	465.15 [32]	29,200.00	60.00 [13]	-	-
Azelaic acid	Nonanedioic acid	C₉H₁₆O₄	123-99-9	188.224	372.4 [30]	30,400.00 [30]	103.60 [30]	2	6.228
Betulin	Lup-20(29)-ene-3β,28-diol	C₃₀H₅₀O₂	473-98-3	442.73	528.22 [33]	55,169.00 [33]	150.23c	6	14.55
Biphenyl	Biphenyl	C₁₂H₁₀	92-52-4	154.21	341.95	18,580.00	39.69 d	2	4.24
Citric acid	2-Hydroxypropane-1,2,3-tricarboxylic acid	C₆H₈O₇	77-92-9	192.125	426.15 [32]	26,700.00	70.00 [13]	4	5.336
Diglycolic acid	2-(Carboxymethyloxy)acetic acid	C₄H₆O₅	110-99-6	134.089	421.15 [32]	26,400.00	60.00 [13]	3	3.768
Diosgenin	(3β,25R)-spirost-5-en-3-ol	C₂₇H₄₂O₃	512-04-9	414.63	474.35 [34]	52,105.00 [34]	125.57c	7	12.68
Estrone	(8R,9S,13S,14S)-3-hydroxy-13-methyl- 6,7,8,9,11,12,13, 14,15,16- decahydrocyclopenta[a]phenanthren- 17- one	C₁₈H₂₂O₂	53-16-7	270.37	527.62 [35]	45,101.00 [35]	60.41c	9	7.53
Fluoranthene	Fluoranthene	C₁₆H₁₀	206-44-0	202.26	380.95	18,858.10	30.29 d	2	4.72
Glutaric acid	Pentanedioic acid	C₅H₈O₄	110-94-1	132.116	363.9 [30]	21,100.00 [30]	83.60 [30]	2	4.068
Hydrocortisone	(11β)-11,17,21-trihydroxypregn-4-ene-3,20-dione	C₂₁H₃₀O₅	50-23-7	362.47	485.15 [36]	33,890.40 [36]	101.24c	-	-
Levulinic acid	4-Oxopentanoic acid	C₅H₈O₃	123-76-2	116.117	306.15 [37]	9220.00 [37]	60.00 [13]	3	3.792
Malic acid	Hydroxybutanedioic acid	C₄H₆O₅	6915-15-7	134.089	403.15 [32]	25,300.00 [32]	60.00 [13]	4	3.8
Malonic acid	Propanedioic acid	C₃H₄O₄	141-82-2	104.062	407.95 [32]	25,480.00	60.00 [13]	2	2.988
Mestanolone	(5α,17β)-17-hydroxy-17-methylandrostan-3-one	C₂₀H₃₂O₂	521-11-9	304.47	465.65 [38]	21,504e	82.66c	6	9.54
m-Hydroxybenzoic acid	3-Hydroxybenzoic acid	C₇H₆O₃	99-06-9	138.123	474.8 [39]	35,920.00 [39]	60.00 [13]	4	3.624
m-Terphenyl	1,3-Diphenylbenzene	C₁₈H₁₄	33-76-3	230.31	362.15	24,073.50	44.74 d	2	6.08
Naphthalene	Bicyclo[4.4.0]deca-1,3,5,7,9-pentene	C₁₀H₈	91-20-3	128.17	353.35	19,110.00	19.07 d	2	3.44
o-Terphenyl	1,2-Diphenylbenzene	C₁₈H₁₄	84-15-1	230.31	331.15	17,179.10	77.88 d	2	6.08
Oxalic acid	Ethanedioic acid	C₂H₂O₄	144-62-7	90.035	465.26 [40]	58,158.00 [40]	50.00 [13]	1	2.448
Phenanthrene	Phenanthrene	C₁₄H₁₀	85-01-8	178.23	369.40	18,627.20	24.48 d	2	4.48
Phthalic acid	Benzene-1,2-dicarboxylic acid	C₈H₆O₄	88-99-3	166.133	463.45 [41]	36,500.00 [41]	100.00 [13]	3	4.288
p-Hydroxybenzoic acid	4-Hydroxy benzoic acid	C₇H₆O₃	99-96-7	138.123	487.15 [29]	31,400.00 [29]	63.10 [29]	4	3.624
p-Hydroxyphenyl acetic acid	2-(4-Hydroxyphenyl) acetic acid	C₈H₈O₃	156-38-7	152.15	422.85 [29]	28,000.00 [29]	59.70 [29]	4	4.164
Pimelic acid	Heptanedioic acid	C₇H₁₂O₄	111-16-0	160.17	368.2 [30]	25,200.00 [30]	88.60 [30]	2	5.148
Prednisolone	(11β)-11,17,21-Trihydroxypregna-1,4-diene-3,20-dione	C₂₁H₂₈O₅	50-24-8	360.45	506.00 [42]	59,303.20 [42]	98.75c	-	-
p-Terphenyl	1,4-Diphenylbenzene	C₁₈H₁₄	92-94-4	230.31	486.15	35,476.10	27.22 d	2	6.08
Pyrene	Pyrene	C₁₆H₁₀	129-00-0	202.26	422.15	17,100.00	25.30 d	2	4.72
Salicylic acid	2-Hydroxybenzoic acid	C₇H₆O₃	69-72-7	138.123	431.35 [43]	27,090.00 [43]	60.00 [13]	4	3.624
Suberic acid	Octanedioic acid	C₈H₁₄O₄	505-48-6	174.197	413.2 [30]	41,800.00 [30]	98.60 [30]	2	5.688
Succinic acid	Butanedioic acid	C₄H₆O₄	110-15-6	118.089	455.2 [30]	34,000.00 [30]	69.60 [30]	2	3.528
Tartaric acid	2,3-Dihydroxybutanedioic acid	C₄H₆O₆	133-37-9	150.088	479.15 [32]	30,100.00 [32]	70.00 [13]	3	4.072
Testosterone	(8R,9S,10R,13S,14S,17S)-17-hydroxy-10,13-dimethyl-1,2,6,7,8,9,11, 12,14,15,16,17-dodeca hydrocyclopenta[a]phenanthren-3-one	C₁₉H₂₈O₂	58-22-0	288.43	424.40 [44]	27,946.20 [44]	74.29c	7	8.83
2-Furancarboxylic acid	Furan-2-carboxylic acid	C₅H₄O₃	88-14-2	112.085	402.5 [29]	22,600 [31]	60.00 [13]	3	2.892
3-Nitrobenzoic acid	3-Nitrobenzoic acid	C₇H₅NO₄	121-92-6	167.121	414.15 [45]	21,400 [45]	60.00 [13]	4	4.048

a, b Obtained from the Dortmund Data Bank (2012) [46] unless otherwise stated.

c Predicted in this work.

d Calculated from heat capacity data (DDB, 2012).

e Predicted by the method of [28].

A principal component analysis was conducted on the test set using the solute solubility in an alcohol/non-polar solvent, and in water, temperature of fusion, enthalpy of fusion, and molecular mass, as input descriptors. The sample set of components selected was found to be heterogeneous, with a minimum of 80% of the datasets described by all combinations of input descriptors.

API selection and experimental solubility data

Solubility data for the APIs selected here (specifically steroids and triterpenes) are extremely limited in the literature. It is therefore important that preliminary predictions of the solubility of these solutes can be made in order to provide, at the very least, initial estimates for later use in the design and optimization of separation processes such as crystallization. While all components contain a similar basic structure, they differ in the number of ester, ketone and alcohol groups in the molecule, which should be the major cause of the dependence of the solubilities on the solvent. The major differences in solubility between the solutes are due to the differences in melting temperature and heat of fusion. The components, and the literature sources [29], [35], [39], [47], [48], [49], [50], [51], [52], [53], [54], [55], [56], [57], [58], [59], [60], [61], [62] for the experimental solubility data, are presented in Table S1 in Appendix A.

Results and discussion

In order to quantify the quality of the predictions for the various models tested, a Percentage Deviation (PD) was defined in terms of the calculated and experimental solute compositions, $x_i^{calc}$ and $x_i^{exp}$, the total number of data points considered, N, and the average experimental composition for a particular set, $\bar{x}^{exp}$.
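The exact form of the PD expression is not reproduced in the text above. As a purely illustrative sketch, one plausible form that uses the quantities named (calculated and experimental compositions, the number of points, and the average experimental composition) is a root-mean-square deviation normalised by the mean experimental composition. The functional form, the function name and the sample numbers below are all assumptions and should not be read as the definition used to generate the tabulated results.

```python
import math


def percentage_deviation(x_calc, x_exp):
    """Assumed PD: 100 * RMS(x_calc - x_exp) / mean(x_exp).

    This normalised root-mean-square form is an assumption of this sketch.
    """
    if len(x_calc) != len(x_exp) or not x_exp:
        raise ValueError("x_calc and x_exp must be non-empty and of equal length")
    n = len(x_exp)
    mean_exp = sum(x_exp) / n
    rms = math.sqrt(sum((c - e) ** 2 for c, e in zip(x_calc, x_exp)) / n)
    return 100.0 * rms / mean_exp


# Made-up mole fractions, for demonstration only.
x_exp_demo = [0.10, 0.20]
x_calc_demo = [0.11, 0.22]
pd_demo = percentage_deviation(x_calc_demo, x_exp_demo)
print(f"PD = {pd_demo:.2f} %")
```

Normalising by the mean experimental composition keeps the measure comparable between sparingly soluble and highly soluble systems, which would otherwise differ by orders of magnitude in absolute deviation.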

Assumptions regarding the heat capacity change of fusion

As mentioned above, the availability of the physical property data for the solutes considered is limited. The standard state used in the calculation of these properties is a pure hypothetical liquid at a temperature much lower than the actual melting point. In order to calculate the change of heat of fusion with temperature, the difference of the heat capacities of the solid and the subcooled liquid is required (given by Equation (6)). This calculation is often simplified by assuming a negligible heat capacity difference in this range (given by Equation (7)). An alternative assumption is to approximate the heat capacity change as the entropy of fusion (given by Equation (8)). Uncertainties can thus be introduced in the calculation of the activity coefficient from solubility data, and vice versa. The effect of these two assumptions is considered in this work, using benzene as a reference solvent. These results are compared in Table 2. Mishra and Yalkowsky [15] have analysed this behaviour for similar solutes in benzene. In their work, for APIs in benzene, employing the UNIFAC combinatorial term with the Scatchard–Hildebrand [63], [64] residual term, under the assumption of zero heat capacity change, provided the best prediction of solubility. Benzene is used as a representative solvent for all hydrophobic solvents (alkanes, aliphatics, alkenes, alkynes) due to the abundance of experimental data available in the literature for pharmaceutical systems with benzene as the solvent. It is not recommended as a pharmaceutical process solvent as it is a Class 1 residual solvent. In practice, less hazardous hydrophobic solvents such as alkanes are used. Unfortunately, the data for pharmaceutical + alkane systems for a specific alkane, e.g. hexane, were not abundant in the literature, and so a comprehensive result regarding heat capacity assumptions would not have been possible. It is assumed that the results obtained in this work using benzene would be very similar for systems composed of other hydrophobic solvents.

Table 2

Mean Percentage Deviations of various solutes in benzene.

Model	Heat capacity	Combinatorial	Residual	PDa (%)	Reference
M1	ΔfusCpi=0	Staverman–Guggenheim	UNIFAC	20.24	This work
M2	ΔfusCpi=0	Staverman–Guggenheim with modified UNIFAC parameters and free-volume correction	mod UNIFAC (Dortmund)	15.86	This work
M3	ΔfusCpi=0	Staverman–Guggenheim	COSMO-RS (OL)	18.33	This work
M4	ΔfusCpi=0	Staverman–Guggenheim	COSMO-SAC	21.56	This work
M5	ΔfusCpi=ΔfusSi	Staverman–Guggenheim	UNIFAC	29.09	This work
M6	ΔfusCpi=ΔfusSi	Staverman–Guggenheim with modified UNIFAC parameters	mod UNIFAC (Dortmund)	23.79	This work
M7	ΔfusCpi=ΔfusSi	Staverman–Guggenheim	COSMO-RS (OL)	25.60	This work
M8	ΔfusCpi=ΔfusSi	Staverman–Guggenheim	COSMO-SAC	29.67	This work
M9	ΔfusCpi=est. value	Staverman–Guggenheim	UNIFAC	24.95	This work
M10	ΔfusCpi=est. value	Staverman–Guggenheim with modified UNIFAC parameters	mod UNIFAC (Dortmund)	19.76	This work
M11	ΔfusCpi=est. value	Staverman–Guggenheim	COSMO-RS (OL)	21.84	This work
M12	ΔfusCpi=est. value	Staverman–Guggenheim	COSMO-SAC	25.58	This work
r1	ΔfusCpi=0	Flory–Huggins	Scatchard–Hildebrand	20.00	[15]
r2	ΔfusCpi=ΔfusSi	Flory–Huggins	Scatchard–Hildebrand	31.62	[15]
r3	ΔfusCpi=0	Staverman–Guggenheim	UNIFAC	37.42	[15]
r4	ΔfusCpi=ΔfusSi	Staverman–Guggenheim	UNIFAC	53.85	[15]
r5	ΔfusCpi=0	Flory–Huggins	UNIFAC	40.00	[15]
r6	ΔfusCpi=ΔfusSi	Flory–Huggins	UNIFAC	56.57	[15]
r7	ΔfusCpi=0	UNIFAC	Scatchard–Hildebrand	17.32	[15]
r8	ΔfusCpi=ΔfusSi	UNIFAC	Scatchard–Hildebrand	28.28	[15]

All three assumptions regarding the heat capacity change of fusion (ΔfusCpi), at solid–liquid equilibrium, were explored here. These included treating ΔfusCpi = 0, ΔfusCpi = ΔfusSi, or using an experimental or empirically predicted value for ΔfusCpi. The mean percentage deviations between experimental data and the model predictions are presented in Table S1. The results of overall performance are presented in Table 2, along with the activity coefficient model details used for the predictions. The effect of the activity coefficient model performance can be eliminated by comparing the assumption cases only on a model-by-model basis. In the case of the lower molecular mass APIs, hydrophobic and hydrophilic solutes were treated separately, as virtually immiscible solute–solvent systems generally do not provide consistent trends with regard to prediction. Furthermore, the quality of experimental data for such systems is usually poor. Triterpenes were all treated together, as these APIs are neither strictly hydrophobic nor hydrophilic. Only a limited set of ΔfusCpi data was found in the literature. In some instances, it was possible to calculate ΔfusCpi from experimental pure component solid and liquid heat capacity data, where available in the literature. To predict ΔfusCpi for those systems for which no experimental data were available, an empirical correlation was developed by correlating the available data with solute molecular masses and van der Waals surface areas. For non-oxygen-containing solutes the following relation was determined: and for oxygenated solutes: where ΔfusCpi is the heat capacity change of fusion, MW is the component molecular mass (g/mol) and q is the molecular surface area. Since the above equations have no theoretical basis, they are only recommended for estimates in the absence of any experimental data for the solute being considered.
The empirical model parameters were determined by least squares regression using the following objective function: where the superscripts exp and calc refer to the experimental and calculated values respectively, and n is the total number of experimental points considered. The uncertainty in the calculated ΔfusCpi is estimated to be 20%–25%. The sources of the pure component properties used are indicated in Table 1. For the systems with benzene as the solvent, the results determined here agree with those in ref. [15]: the assumption of negligible ΔfusCpi seemingly provides the closest replication of the experimental data. This finding may be due to poor estimates of ΔfusCpi. For the systems where water is used as the solvent (summarized in Table 3), the assumption of an estimated ΔfusCpi value provides the lowest overall deviation from the experimental data. It is therefore clear that ideal solubility assumptions are not suitable when comparing the performances of Equations (6), (7) and (8).
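The model-by-model comparison described above can be reproduced directly from the tabulated deviations. The following is a minimal sketch that copies the PD values of rows M1 to M12 from Table 2 (benzene) and Table 3 (water) and reports, for each activity coefficient model, the heat capacity assumption with the lowest deviation. The dictionary layout, the assumption labels and the function name are illustrative choices, not part of the original work.

```python
# PD (%) values copied from Table 2 (benzene) and Table 3 (water), rows M1-M12.
# Keys: "zero" -> dCp = 0, "entropy" -> dCp = dS_fus, "estimated" -> dCp = est. value
PD = {
    "benzene": {
        "UNIFAC": {"zero": 20.24, "entropy": 29.09, "estimated": 24.95},
        "mod UNIFAC (Dortmund)": {"zero": 15.86, "entropy": 23.79, "estimated": 19.76},
        "COSMO-RS (OL)": {"zero": 18.33, "entropy": 25.60, "estimated": 21.84},
        "COSMO-SAC": {"zero": 21.56, "entropy": 29.67, "estimated": 25.58},
    },
    "water": {
        "UNIFAC": {"zero": 116.40, "entropy": 104.59, "estimated": 98.73},
        "mod UNIFAC (Dortmund)": {"zero": 283.43, "entropy": 141.61, "estimated": 141.36},
        "COSMO-RS (OL)": {"zero": 107.09, "entropy": 114.95, "estimated": 117.23},
        "COSMO-SAC": {"zero": 113.19, "entropy": 130.26, "estimated": 134.09},
    },
}


def best_assumption(solvent):
    """Return, per model, the assumption with the lowest tabulated PD."""
    return {model: min(pds, key=pds.get) for model, pds in PD[solvent].items()}


for solvent in PD:
    print(solvent)
    for model, assumption in best_assumption(solvent).items():
        print(f"  {model:24s} lowest PD with dCp assumption: {assumption}")
```

For the tabulated values, the zero heat capacity assumption gives the lowest PD for every model in benzene. In water the outcome differs between models, and the lowest overall PD is obtained with original UNIFAC and an estimated value (M9, 98.73%).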

Table 3

Mean Percentage Deviations of various solutes in water.

Model	Heat capacity	Combinatorial	Residual	PDa (%)	Reference
M1	ΔfusCpi=0	Staverman–Guggenheim	UNIFAC	116.40	This work
M2	ΔfusCpi=0	Staverman–Guggenheim with modified UNIFAC parameters and free-volume correction	mod UNIFAC (Dortmund)	283.43	This work
M3	ΔfusCpi=0	Staverman–Guggenheim	COSMO-RS (OL)	107.09	This work
M4	ΔfusCpi=0	Staverman–Guggenheim	COSMO-SAC	113.19	This work
M5	ΔfusCpi=ΔfusSi	Staverman–Guggenheim	UNIFAC	104.59	This work
M6	ΔfusCpi=ΔfusSi	Staverman–Guggenheim with modified UNIFAC parameters and free-volume correction	mod UNIFAC (Dortmund)	141.61	This work
M7	ΔfusCpi=ΔfusSi	Staverman–Guggenheim	COSMO-RS (OL)	114.95	This work
M8	ΔfusCpi=ΔfusSi	Staverman–Guggenheim	COSMO-SAC	130.26	This work
M9	ΔfusCpi=est. value	Staverman–Guggenheim	UNIFAC	98.73	This work
M10	ΔfusCpi=est. value	Staverman–Guggenheim with modified UNIFAC parameters	mod UNIFAC (Dortmund)	141.36	This work
M11	ΔfusCpi=est. value	Staverman–Guggenheim	COSMO-RS (OL)	117.23	This work
M12	ΔfusCpi=est. value	Staverman–Guggenheim	COSMO-SAC	134.09	This work


Selecting a suitable predictive activity coefficient model

Essentially, all predictive models require certain information about the solute in order to be used. For the UNIFAC-based models, group volume and surface parameters, as well as group interaction parameters, represent the functional groups and their energetic interactions. The COSMO-based models require so-called sigma profiles, which characterize the shielding charge distribution, as well as the cavity volume and surface. In this work the Oldenburg version of COSMO-RS [20] (COSMO-RS (OL) [4]) was used. Unfortunately, group interaction parameters and segment area parameters were not available for all groups of solutes and solvents considered for prediction. Hence, not all solubilities could be described by all of the predictive methods. These systems are indicated by a dash in Table S1. The sigma profiles of the solutes, used in the COSMO-RS (OL) and COSMO-SAC methods, were determined by Gaussian 03 calculations using the hybrid density functional B3LYP with the 6-311G(d,p) basis set [65]. These profiles were obtained from the Dortmund Data Bank software package (2012) [46]. The mean percentage deviations between experimental data and the model predictions are presented in Table S1. These results are presented graphically in Fig. 1 for ease of comparison. In a few cases, the SLE calculation failed to converge to a composition, and these are indicated in Table S1.
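To illustrate how the group volume and surface parameters enter a UNIFAC-type calculation, the sketch below sums the group contributions to obtain the molecular r and q values and then evaluates the Staverman–Guggenheim combinatorial term for a binary mixture. Benzene and n-hexane are used purely as an example because their fragmentation is unambiguous; the group values are the commonly tabulated original UNIFAC parameters and should be checked against the parameter source actually used. The function names are hypothetical, and the residual term (which needs the group interaction parameters) is not included.

```python
import math

# Commonly tabulated original UNIFAC group parameters (R_k, Q_k);
# verify against the parameter table in use before relying on them.
GROUPS = {"CH3": (0.9011, 0.848), "CH2": (0.6744, 0.540), "ACH": (0.5313, 0.400)}
Z = 10.0  # lattice coordination number


def r_q(groups):
    """Sum group contributions: groups maps group name -> count in the molecule."""
    r = sum(n * GROUPS[g][0] for g, n in groups.items())
    q = sum(n * GROUPS[g][1] for g, n in groups.items())
    return r, q


def ln_gamma_combinatorial(x, r, q):
    """Staverman-Guggenheim combinatorial ln(gamma) for each component."""
    sum_xr = sum(xi * ri for xi, ri in zip(x, r))
    sum_xq = sum(xi * qi for xi, qi in zip(x, q))
    l = [Z / 2.0 * (ri - qi) - (ri - 1.0) for ri, qi in zip(r, q)]
    sum_xl = sum(xi * li for xi, li in zip(x, l))
    result = []
    for ri, qi, li in zip(r, q, l):
        phi_over_x = ri / sum_xr                        # volume fraction / x_i
        theta_over_phi = (qi / sum_xq) / (ri / sum_xr)  # surface / volume fraction
        result.append(
            math.log(phi_over_x)
            + Z / 2.0 * qi * math.log(theta_over_phi)
            + li
            - phi_over_x * sum_xl
        )
    return result


r_benzene, q_benzene = r_q({"ACH": 6})
r_hexane, q_hexane = r_q({"CH3": 2, "CH2": 4})
ln_gammas = ln_gamma_combinatorial(
    [0.2, 0.8], [r_benzene, r_hexane], [q_benzene, q_hexane]
)
print(ln_gammas)
```

Writing the volume and surface fraction ratios in terms of r and q avoids a division by zero at infinite dilution. A solute that cannot be fragmented into available groups has no r and q in this scheme, which is why some systems cannot be treated by the UNIFAC-based methods.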

Fig. 1

Comparison of the natural logarithms of experimental and model calculated solubility composition (x1).

In the majority of the systems tested, all the predictive models tend to underestimate the solubility. Furthermore, very large discrepancies are apparent for sparingly soluble solute–solvent mixtures, such as those containing the triterpenes. In Table 4, however, it is shown that the original UNIFAC model with the Staverman–Guggenheim combinatorial term provides a superior replication of the experimental solubility and, in some cases, is almost twice as precise. It must be noted, however, that the UNIFAC model cannot be applied to the systems composed of prednisolone and hydrocortisone, as these molecules cannot be fragmented by UNIFAC.

Table 4

Mean Percentage Deviations of triterpene/steroid solutes in various solvents.

Model	Heat capacity	Combinatorial	Residual	PDa (%)	Reference
M1	ΔfusCpi=0	Staverman–Guggenheim	UNIFAC	82.22	This work
M2	ΔfusCpi=0	Staverman–Guggenheim with modified UNIFAC parameters and free-volume correction	mod UNIFAC (Dortmund)	157.47	This work
M3	ΔfusCpi=0	Staverman–Guggenheim	COSMO-RS (OL)	146.41	This work
M4	ΔfusCpi=0	Staverman–Guggenheim	COSMO-SAC	139.25	This work
M5	ΔfusCpi=ΔfusSi	Staverman–Guggenheim	UNIFAC	82.56	This work
M6	ΔfusCpi=ΔfusSi	Staverman–Guggenheim with modified UNIFAC parameters and free-volume correction	mod UNIFAC (Dortmund)	103.08	This work
M7	ΔfusCpi=ΔfusSi	Staverman–Guggenheim	COSMO-RS (OL)	86.07	This work
M8	ΔfusCpi=ΔfusSi	Staverman–Guggenheim	COSMO-SAC	70.94	This work
M9	ΔfusCpi=est. value	Staverman–Guggenheim	UNIFAC	82.70	This work
M10	ΔfusCpi=est. value	Staverman–Guggenheim with modified UNIFAC parameters	mod UNIFAC (Dortmund)	116.92	This work
M11	ΔfusCpi=est. value	Staverman–Guggenheim	COSMO-RS (OL)	94.66	This work
M12	ΔfusCpi=est. value	Staverman–Guggenheim	COSMO-SAC	87.23	This work

For systems with benzene as the solvent, the modified UNIFAC (Dortmund) model, with the Staverman–Guggenheim combinatorial term and free-volume correction, is recommended; the original UNIFAC model, with the Staverman–Guggenheim combinatorial term, is recommended when water is the solvent. In Figs. 2–4 an attempt is made to correlate the prediction capabilities of each model considered with molecular weight, van der Waals molecular surface area, and functional group diversity, in a non-polar solvent (benzene). The van der Waals molecular surface area was determined using the method of Bondi [19]. The figures confirm that virtually no correlation between these parameters and the quality of the solubility prediction exists in the systems considered here. Similar results were obtained when water was used as the solvent. It must be mentioned that the PDs in Figs. 2–4 are much larger than those presented in Table 2, as the mean composition was calculated separately for each solute–solvent set in this case.

Fig. 2

Correlation of model percentage deviations with molecular mass of solute.

Fig. 3

Correlation of model percentage deviations with van der Waals area parameter (q1) in benzene as a solvent.

Fig. 4

Correlation of model percentage deviations with number of different functional groups present in solute for benzene as a solvent.

The NRTL-SAC model was applied to a subset of the dataset considered here. Comparisons are only made to experimental data, as the model is semi-correlative and would not offer a fair comparison to the purely predictive models discussed above. In order to apply the NRTL-SAC model to solubility predictions, the segment area parameters (X, Y+, Y− and Z) must be known for the solutes and solvents considered. If these parameters are not available in the literature, they can be regressed from solubility data via the calculation of the activity coefficient, using pure component property data. Some of the NRTL-SAC model parameters for the solutes were not available in the literature, and were therefore determined by regression of the solubility data provided in Table S1. These new parameters are available in Table 5, along with literature values where available.
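The first step of such a regression, obtaining a target activity coefficient from a measured solubility and the pure component properties, can be sketched as follows. The sketch assumes a negligible heat capacity change of fusion and the standard solid–liquid equilibrium relation; the function name and the numerical inputs are hypothetical placeholders, and the subsequent fit of X, Y+, Y− and Z to these targets is not shown.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)


def gamma_from_solubility(x_exp, T, Tm, dH_fus):
    """Solute activity coefficient at saturation, assuming dCp_fus = 0.

    x_exp: measured mole fraction solubility; T, Tm in K; dH_fus in J/mol.
    """
    ln_activity = -dH_fus / (R * T) * (1.0 - T / Tm)
    return math.exp(ln_activity) / x_exp


# Placeholder inputs (not data from Table S1)
gamma = gamma_from_solubility(x_exp=1.0e-3, T=298.15, Tm=450.0, dH_fus=28000.0)
print(f"gamma at saturation = {gamma:.3f}")
```

A value above one indicates a measured solubility below the ideal solubility. Any error in the fusion properties or in the heat capacity assumption propagates directly into the target activity coefficient and hence into the regressed parameters.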

Table 5

Calculated segment area parameters for NRTL-SAC.

Solute	This work				Literature^a
Solute	X	Y+	Y−	Z	X	Y+	Y−	Z
Betulin	0.0441	0.0743	0.0189	0.0024	–	–	–	–
Diosgenin	0.1651	0.0112	0.1696	0.0183	–	–	–	–
Mestanolone	0.3224	1.1220	0.7231	0.1953	–	–	–	–
Hydrocortisone	0.4130	1.3020	0.9420	0.7110	0.4010	1.2480	0.9700	1.2480
Estrone	0.4822	1.4240	0.710	0.1973	0.4990	1.5210	0.6790	0.1960
Prednisolone	0.3945	1.1039	1.8975	0.3290	–	–	–	–
Testosterone	1.041	0.2290	0.5460	0.7010	1.0510	0.2330	0.7710	0.6690

^a Taken from Chen and Song [6].

Solubility predictions were then performed with the NRTL-SAC model, using the new segment area parameters together with the solvent parameters provided by Chen and Song [6], as shown in Fig. 5. The results reveal that the NRTL-SAC model generally does not exhibit any tendency to over- or underpredict the experimental solubility. Again, in most cases the model provides only a qualitative representation of the steroidal API systems that were tested. This is a significant deficiency, since the model is semi-correlative and requires four component-specific model parameters.

Fig. 5

Comparison of the natural logarithms of experimental and model calculated solubility composition (x1) with the NRTL-SAC model.

In Fig. 6 a decision tree is presented to assist in the selection of an appropriate model and assumption for ΔfusCpi, depending on the solute and solvent class.

Fig. 6

Decision tree for predictive model selection and assumption.

Conclusion

Where model parameters were available in the literature, solubility predictions were carried out, using various predictive models, for the polycyclic steroidal and triterpene solutes considered in this work. The modified UNIFAC (Dortmund) model provided the best solubility prediction when benzene was the solvent, whereas the original UNIFAC model provided the best prediction in aqueous systems. The most suitable assumption regarding the heat capacity change of fusion was found to be solvent dependent; hence, ideal solubility could not be assumed. A degree of correlation was found between the heat capacity change of fusion and both molecular mass and van der Waals surface area. Generally, the UNIFAC-based and COSMO-based models tended to underestimate the solubility of the triterpene solutes, while the NRTL-SAC model showed no appreciable tendency to under- or overestimate. However, the original UNIFAC model provided a superior solubility prediction for the triterpene/steroid systems, with no significant effect from the assumptions regarding the heat capacity change of fusion. New NRTL-SAC segment area parameters have been determined for some of the solutes considered in this work. This information can be used as a subsidiary guide for the selection of solvents in crystallization process design involving the studied solutes; however, experimental results will be required if quantitative data are desired.

Conflicts of interest

The authors declare that there are no conflicts of interest.
