Literature DB >> 33913713

Identifying Conformational Isomers of Organic Molecules in Solution via Unsupervised Clustering.

Veselina Marinova1,2, Laurence Dodd1, Song-Jun Lee1, Geoffrey P F Wood3, Ivan Marziano4, Matteo Salvalaglio1.   

Abstract

We present a systematic approach for the identification of statistically relevant conformational macrostates of organic molecules from molecular dynamics trajectories. The approach applies to molecules characterized by an arbitrary number of torsional degrees of freedom and enables the transferability of the macrostates definition across different environments. We formulate a dissimilarity measure between molecular configurations that incorporates information on the characteristic energetic cost associated with transitions along all relevant torsional degrees of freedom. Such metric is employed to perform unsupervised clustering of molecular configurations based on the Fast Search and Find of Density Peaks algorithm. We apply this method to investigate the equilibrium conformational ensemble of Sildenafil, a conformationally complex pharmaceutical compound, in different environments including the crystal bulk, the gas phase, and three different solvents (acetonitrile, 1-butanol, and toluene). We demonstrate that while Sildenafil can adopt more than 100 metastable conformational configurations, only 12 are significantly populated across all of the environments investigated. Despite the complexity of the conformational space, we find that the most abundant conformers in solution are the closest to the conformers found in the most common Sildenafil crystal phase.

Entities:  

Mesh:

Substances:

Year:  2021        PMID: 33913713      PMCID: PMC8278389          DOI: 10.1021/acs.jcim.0c01387

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956


Introduction

Conformational isomerism in organic molecules is an important characteristic that bears significance in a variety of problems. For example, binding properties of proteins in protein–ligand complexes are controlled by their conformational configuration by affecting association/dissociation rates and by entropic contributions to the process.[1,2] Understanding the details and mechanisms of conformational changes that proteins undergo is an important part of modern drug discovery methodologies.[3] For small organic molecules, the ability to adopt different conformational configurations can open the possibility for the formation of multiple crystal forms known as conformational polymorphs[4,5]—crystal structures of components with the same chemical formula but different molecular shapes. This phenomenon is particularly important in the pharmaceutical industry where the uncontrolled occurrence of an undesired polymorphic form can affect the stability, shelf-life, or efficacy of the drug. In the field of crystallization, conformational rearrangements are not only relevant to polymorphism. In our previous work[6] on the study of ibuprofen conformational isomerism at the crystal/solution interface, we demonstrate how, even for relatively small systems, conformational rearrangements, crystal growth, and dissolution are inherently coupled. Additionally, state-to-state transitions of a molecule along its path of incorporation into the crystal from solution may be limited by conformational rearrangements. Computational studies of conformational rearrangement in small organic molecules often use internal torsional angles to describe the adopted molecular configuration.[6−9] Torsional angles are a convenient way of describing rearrangements as they provide a fine-grained comprehensive picture of the internal molecular configuration space. To describe the conformation of larger molecules such as peptides or aliphatic chains, however, resorting to descriptors such as end-to-end distance or root-mean-square deviation (RMSD)[1,10] is a common choice, made necessary by the fact that the torsional angle space for these systems is high-dimensional and impractical to read and interpret. A critical drawback of this approach is that by reducing the dimensionality of the descriptors’ space used to represent configurations, degeneracy is introduced, and consequently information is lost. More generally, reliable conformational descriptors are particularly important when implementing enhanced sampling techniques. Enhanced sampling techniques are heavily reliant on the use of appropriate system descriptors. Particularly for studying self-assembly processes, conformationally flexible systems currently present a major challenge,[11] thereby driving the search for a systematic approach to their classification. Dividing the conformational space of large organic molecules is often done via partitional clustering methods, meaning that the data is assigned into groups without any hierarchical structure, based on a chosen criterion.[12] One of the best-known partitioning algorithms is k-means.[13] The idea behind this algorithm is to define a k-centroid for each cluster and measure the distance between a data point and each of the cluster centers. Many computational works have achieved partitioning of molecular configurational space through k-means clustering-based methodologies.[14−18] Despite its ease of use, it has a few important limitations. Cluster centers can be difficult to define a priori, as well as, k-means can be very sensitive to outliers and noise.[19,20] A partition-based algorithm that has tackled the drawbacks of k-means clustering is affinity propagation (AP),[21] where all data points are regarded as potential cluster centers. The negative distance between data points is their affinity, so the bigger the sum of the affinity of one data point to other data points, the higher the probability of this data point being a cluster center is. AP has been implemented in the study of protein conformations;[22] however, it has also been regarded as a complex and costly approach.[19,23] As an alternative to distance-based algorithms, density-based algorithms have been developed.[20,24−28] They work on the principle of assigning densities to local points and are able to separate clusters based on high- and low-density regions. Such algorithms not only do not require defining the number of cluster centers a priori but also allow the identification of nonspherical clusters. Particularly suited to the analysis of molecular dynamics (MD) simulations is the Fast Search and Find of Density Peaks (FSFDP) clustering algorithm, developed by Rodriguez and Laio.[26] By computing the distance between all pairs of data points, the algorithm identifies the points with the highest density in their neighborhood as the cluster centroids. Such tools, which allow the systematic classification of molecular configurations regardless of the dimensionality of the space of descriptors necessary to completely capture every conformational change, have the potential to improve existing methodologies for studying the effects of conformational rearrangements during crystal nucleation and growth.[6] Here, we propose a methodology that enables the study of conformational isomerism in a general way for systems with a large number of torsional degrees of freedom. Our approach, based on the application of the Fast Search and Find of Density Peaks clustering algorithm,[26] allows defining a set of conformational states that is common to multiple environments (i.e., solvents) and enables a systematic assessment of their impact on the conformational landscape. We demonstrate this approach by studying the conformational rearrangement of sildenafil, a commercially available active pharmaceutical ingredient (API).[7] Sildenafil is the main component of Viagra,[7] which is known to have two polymorphic forms.[29,30] Sildenafil is a relatively large molecule consisting of 63 atoms and a number of ring structures. The two forms of sildenafil (denoted form I and form II) are morphotropically related to one another as a noncrystallographic rearrangement can transform one to the other,[30] with form I being the thermodynamically stable form. Both forms have two molecules in the asymmetric unit adopting different conformations.[29,30] With the use of a data clustering approach, we demonstrate how characteristic conformational configurations can be identified a priori for a molecule in the gas phase to then extract quantitative information on conformational states from enhanced sampling molecular dynamics simulations performed in solution under experimentally relevant conditions. Our methodology also enables the breakdown of the free energy of a conformational state into enthalpic and entropic contributions, providing a valuable insight into the effect of the solvent on conformational isomerism. With this work, we aim to propose a method for conformational analysis which provides a route toward achieving rapid and automated conformational classification, enabling the comprehensive study of conformational isomerism in solution for systems for which it is currently impractical.

Methods

In this work, molecular dynamics (MD) simulations are used to study the conformational isomerism of sildenafil in the crystal bulk, in the gas phase, and in three solvents—acetonitrile, 1-butanol, and toluene. MD is combined with well-tempered metadynamics (WTmetaD) to enable the study of the conformational rearrangements of sildenafil in solution. The Fast Search and Find of Density Peaks (FSFDP) clustering method, developed by Rodriguez and Laio,[26] is used to identify sildenafil conformers in the gas phase and generate a characteristic fingerprint in torsional angle space for each of them. The metric used to define the similarity between configurations includes information on the free energy cost associated with transitions in every degree of freedom explicitly considered. The fingerprints are used to postprocess the biased trajectories in solution and assign a conformational macrostate for each trajectory step. Through the implementation of a reweighting procedure, the equilibrium probability of conformers, as well as enthalpic and entropic contributions to the free energy of each conformer for each of the solvents considered, is obtained. In this study, the molecular rearrangement of the drug is described in torsional angle space by considering all internal dihedral angles, as shown in Figure . By including all torsional degrees of freedom of the molecule, this study adopts a systematic and transferable strategy for tackling conformational rearrangement.
Figure 1

Sildenafil structure, where hydrogen atoms have been excluded for simplicity. The six internal torsional angles, labeled τA, τB, τC, τD, τE, and τF, are marked on the structure. All images of molecular structures shown have been generated with visual molecular dynamics (VMD).[31]

Sildenafil structure, where hydrogen atoms have been excluded for simplicity. The six internal torsional angles, labeled τA, τB, τC, τD, τE, and τF, are marked on the structure. All images of molecular structures shown have been generated with visual molecular dynamics (VMD).[31]

Molecular Dynamics Setup

Molecular dynamics simulations of form I and form II crystal polymorphs of sildenafil, a sildenafil molecule in the gas phase, and a sildenafil molecule in three different solvents were performed using the Generalized Amber Force Field (GAFF).[32] For all systems considered in this work, GAFF is able to reproduce properties consistent with experimental data (see the Supporting Information). MD simulations were performed with Gromacs 5.1.4[33] with an explicit representation of the solvent. Force field parameters for solvent molecules were obtained from the Virtual Chemistry solvent database.[34,35] A standard cut-off distance of 1.0 nm for the nonbonded interactions was chosen, along with including long-range intermolecular interactions using the particle-mesh Ewald (PME) approach.[36] A time step of 2 fs was used. Temperature and pressure control have been implemented through the use of the Bussi–Donadio–Parrinello thermostat,[37] Berendsen barostat,[38] and Parrinello–Rahman barostat.[39] More detail on the applied pressure control and the recovered system density in each case can be found in the Supporting Information.

Simulations of the Crystal Bulk

Supercells of size 3.5 × 3.5 × 5.0 and 7.0 × 3.5 × 2.3 nm3, representing crystal forms I and II, respectively, were set up containing 96 molecules each. Crystal structure.cif files[29] at ambient temperature and pressure were obtained from the Cambridge Structural Database (CSD) under deposition codes QEGTUT and QEGTUT02. In both cases, unbiased MD simulations were performed for 10 ns in the isothermal–isobaric ensemble, implemented by applying an anisotropic pressure control.

Simulation of Sildenafil in the Gas Phase

A 450 ns unbiased simulation of a sildenafil molecule in a box of 2.0 × 2.0 × 2.0 nm3 in a vacuum was carried out in the canonical ensemble. A free energy profile of each torsional angle as denoted in Figure was obtained. All one-dimensional free energy profiles are reported in the Supporting Information.

Simulations of Sildenafil in Solution

Simulations in solution were set up by solvating a single sildenafil molecule with each of the three solvents used in this study—acetonitrile, 1-butanol, and toluene—using the insert-molecules utility in Gromacs in a box of an approximate size of 4 × 4 × 4 nm3. MD simulations were performed in combination with well-tempered metadynamics. All simulations were performed in the isothermal–isobaric ensemble (NPT) at a pressure of 1 bar and temperature of 300 K.

Well-Tempered Metadynamics Setup

Metadynamics was implemented to enhance fluctuations in the internal rearrangement of sildenafil in solution for computational efficiency. The bias was applied as a function of the τA torsional angle as shown in Figure in blue. The choice of the collective variables (CV) was made based on 10 ns exploratory MD simulations, which revealed that overcoming the barrier associated with the rotation of τA in an efficient way requires enhanced sampling in all three solvents. The biasing protocol was applied in the form of Gaussian functions with a width of 0.3 rad and height of 2.5 kBT at a rate of every 500 simulation steps with a bias factor of 15 K. The use of WTmetaD was implemented through plumed 2.4.[40] All of the data and input files, required to reproduce the results reported in this paper, are available on plumed-NEST,[41] the public repository of the PLUMED consortium as plumID:20.032.

Clustering Procedure

Studying the conformational isomerism of systems with several internal degrees of freedom in a systematic and transferable way requires to group molecular configurations, identifying relevant conformational states. Achieving this provides the opportunity of reducing the dimensionality of the problem and enables the analysis of biased molecular dynamics trajectories to obtain useful kinetic and thermodynamic information. To obtain this partitioning of configurational space in an unsupervised, data-driven manner we apply the Fast Search and Find of Density Peaks (FSFDP) algorithm, developed by Rodriguez and Laio.[26] The algorithm is used to group molecular configurations into clusters of structures based on their similarity, defined by a distance matrix. The distance matrix contains the distance between any two molecular configurations, i and j, as a function of the chosen system descriptors, e.g., internal torsional angles, with no limit on dimensionality, as shown in eq Given a distance matrix, the algorithm operates by calculating the density of each point i, evaluated by considering the number of neighbors within a distance cutoff dc according towhere χ(x) = 1 if x < 0 and χ(x) = 0 otherwiseCluster centers are characterized by having the highest density within a cluster of points and a large distance from points with a higher density as shown in eq . Each cluster has an associated core set of data points and a halo, evaluated based on the given cutoff. The core set are the points that belong exclusively to a cluster and can be found within the selected distance cutoff from the cluster center, while the halo is considered as the noise around the cluster core while still affiliated with the given cluster. This grouping algorithm is particularly powerful as the clustering procedure is such that the number of clusters arises intuitively, which makes it particularly suited to identifying conformational states described by a highly multidimensional set of CVs. Here, we consider each frame of an MD trajectory of a sildenafil molecule in the gas phase as a single point in the six-dimensional torsional angle space and so the distance d between every two frames, i and j, is calculated according to eqwhere NCV refers to the total number of system descriptors, which in this case is six torsional angles, while w stands for a weight applied to each dimension as described below. In the above equation, d refers to the absolute difference in the values of any of the given torsional angles for points i and j. For example, d1 is calculated asShould the value of d be greater than π, periodicity is accounted for by subtracting the value from 2π as followsClustering structures in a meaningful way requires a distance definition able to distinguish between conformational transitions and conformational adjustments. These terms refer to the nature of the conformational rearrangement within the molecule. A conformational transition describes the conversion of one stable conformational state into another, usually associated with overcoming a free energy barrier higher than kBT. An adjustment is, on the other hand, a term used to describe a minor rearrangement, which is not associated with a new conformer, but rather with a relaxation of the structure from one configuration to another, both of which occupy the same free energy minimum in collective variable space.[5] To resolve these two cases, when calculating the distance d a slight modification to the algorithm is introduced by including the weight factor w (see eq), associated with the free energy barrier of rotation of each torsional angle. In such a way, rotations that lead to new conformational configurations through rare, activated transitions provide a larger contribution to the distance compared to rotations associated with an adjustment or a fast conversion. The weight factor, w, is defined as the lowest rotational barrier for each torsional angle according to the calculated one-dimensional free energy profiles (see the Supporting Information) along each coordinate of the CV space. The barriers are then normalized with respect to the lowest one, as shown in Table . An example for the case of τA is shown in Figure .
Table 1

Free Energy Barrier of Rotation to Each Torsional Angle Obtained from Simulations in Vacuum, along with the Corresponding Rescaling Used as a Weight in Calculating the Distance Matrix

anglebarrier heightweight
 (kBT) 
τA65
τB1.51.3
τC21.7
τD1.21
τE2.52.1
τF43.3
Figure 2

Free energy profile of torsional angle τA obtained from a vacuum simulation, along with Newman projections of the angle in the free energy minima. The lowest free energy barrier of rotation was extracted for the clustering weights.

Free energy profile of torsional angle τA obtained from a vacuum simulation, along with Newman projections of the angle in the free energy minima. The lowest free energy barrier of rotation was extracted for the clustering weights. We note that in the case of larger, more complex solutes with a higher number of conformational degrees of freedom one may need to analyze large data sets, and in such cases, the calculation of the distance matrix may become an efficiency bottleneck for the process. Recent work[27] addresses this problem pushing the limit of applicability of FSFDP to the million-configuration range.

Conformational Classification

Clustering of the molecular configurations was performed using the FSFDP algorithm, with a distance cutoff dc tuned to obtain an average number of neighbors for every data point equal to 2% of the entire data set. Based on the distance matrix calculation and the chosen cutoff, the sildenafil configurations sampled in the gas phase are grouped into clusters, each consisting of a cluster center structure, a core set of configurations, and a halo. All configurations assigned to a cluster were used to generate a structural fingerprint for the identified conformer. The term fingerprint refers to the probability distribution of each torsional angle as illustrated in Figure . All fingerprints are provided in the Supporting Information. Conformational classification of each frame of the trajectories of sildenafil in solution was carried out to represent a conformational change of the molecule in a one-dimensional space and hence enable further analysis of the enthalpic and entropic contributions to conformational isomerism in solution. To achieve that, an algorithm that compares the instantaneous value of the torsional angles, defined to describe the molecular configuration, to each of the given fingerprints for every trajectory frame was set up. For every instantaneous torsional angle value, where the corresponding probability density in the fingerprint is nonzero, a value of 1 is assigned. The total number of variables used for the classification is 6, and therefore a score of 6 means that the molecular configuration in the given frame matches a fingerprint and therefore is assigned the corresponding conformer number. A score lower than 6 indicates at least one mismatch between the given configuration and the fingerprint and is therefore assigned a value of 0, signifying that it remains unclassified.

Conformational Equilibrium Probability Distribution

A characteristic fingerprint in torsional angle space for each dominating sildenafil conformer in the gas phase was generated. Choosing the gas phase as a reference is inspired by the work of Cruz-Cabeza and Bernstein,[5] who use the same conditions to define reference conformational states. To calculate the conformational population of sildenafil in solution, each frame of the biased trajectory was assigned a characteristic conformer following the procedure discussed in the previous section. A discrete probability distribution in one-dimensional space can then be straightforwardly calculated, with the caveat that the bias potential deposited throughout the duration of the simulation needs to be accounted for in a procedure referred to as reweighting. In this work, the total metadynamics bias potential applied as a function of τA and recovered at the end of the simulation of a sildenafil molecule in solution, Vtotal(τA), is used in the reweighing scheme.[42] The trajectory is postprocessed so that each time frame, with a corresponding value for each torsional angle, as well a conformational cluster number, will also have a value associated with the total bias deposited in that particular point in the CV space of τA. For simplicity, let us refer to this value as Vtotal(τA), where i stands for the trajectory frame number. The weight W applied to each frame when reconstructing the unbiased probability distribution of conformational isomers in solution is a Boltzmann weight associated with a rescaled value of Vtotal(τA) according toIn such a way, the weight associated with points in CV space, where the maximum total external bias was deposited, will have a value of 1, as it corresponds to the lowest point in the free energy profile in τA and all other frames will have a correspondingly lower weight. This reweighting scheme was implemented to reconstruct the population of sildenafil conformers in different solvents, as well as obtaining a two-dimensional probability density function of the conformer number and its associated potential energy, used in the free energy decomposition discussed in the next chapter.

Enthalpy and Entropy Contributions to Free Energy Differences between Clusters

Free energy differences between conformational macrostates i and j of sildenafil in solution were computed from their equilibrium probability and P as and decomposed into their enthalpic and entropic contributions following the procedure outlined by Gimondi et al.[43,44] For instance, the difference in free energy between clusters i and j, ΔG, can be expressed as ΔH – TΔS, where ΔH and ΔS are the enthalpy and entropy differences between states i and j. Hence, the entropic contribution can be obtained by difference as TΔS = ΔG – ΔH, once the term ΔH is known. Since conformational transitions of sildenafil are not associated with a change in the ensemble average of the system’s volume, ΔH = ΔU + PΔV ≃ ΔU, where ΔU is the difference in internal energy between conformational macrostates i and j. Moreover, since states i and j are conformational isomers sampled at the same temperature, the internal energy difference reduces to the difference in the ensemble average of the potential energy between states i and j aswhere ⟨EP⟩, the ensemble average of the potential energy of configurations classified in cluster i, is computed aswhere p(EP(r)|r ∈ i) is the potential energy probability density, conditional to the system being classified in cluster i. The probability density p(EP(r)|r ∈ i) is computed as discussed by Gimondi et al.,[43] using the reweighting strategy described in the previous paragraph. In models in which explicit solvents are employed, ⟨EP⟩ is typically dominated by the contribution of the solvent molecules, and it is associated with large fluctuations that affect the convergence and accuracy of the ΔU estimate. To improve the statistical accuracy in ΔU, we follow the procedure outlined by Kollias et al.[44] and decompose the ensemble average of the potential energy in three components, namely ⟨EP⟩solute, ⟨EP⟩solvent, and ⟨EP⟩solute–solvent. The ⟨EP⟩solute and ⟨EP⟩solvent contributions account, respectively, for potential energy terms associated with interactions between atoms that belong exclusively to the solute and to the solvent species. The solute–solvent term ⟨EP⟩solute–solvent accounts instead for nonbonded interactions between solute and solvent atoms. Since conformational changes in the solute do not affect the solvent–solvent contribution to the potential energy, the term ⟨EP⟩solvent = ⟨EP⟩solvent = const, and thus ΔU reduces towhich is not affected by the large fluctuations of ⟨EP⟩solvent that would mask the contribution of conformational transitions to ΔU and hamper the convergence of the enthalpy and entropy contribution to free energy differences. Despite implementing this strategy, the convergence of the enthalpic and entropic contributions require a substantial sampling of the configuration space of the explicitly solvated system. Here, we achieve sufficient sampling by running WTmetaD simulations for 0.5–0.6 μs. In the case of larger, more complex solutes with a higher number of conformational degrees of freedom, we anticipate the need for replica exchange methods to exhaustively sample the configuration space and successfully apply this decomposition approach.

Results

The following sections summarize the analysis carried out on sildenafil conformers in the gas phase, as well as on the conformational rearrangements of sildenafil occurring in the crystal bulk and in solution. The results are organized as follows. First, the conformational freedom of sildenafil in the crystal bulk is reviewed by considering the torsional angle distribution obtained from MD simulations of forms I and II. Next, the results obtained from the clustering algorithm are reported, along with drawing a comparison between structures in the gas phase and those in the solid. The last section reports on the equilibrium distribution of sildenafil conformers in different solvents, obtained with the aid of WTmetaD, along with a breakdown of their corresponding free energy into enthalpy and entropy contributions.

Conformational Rearrangements in the Crystal Bulk

The conformational rearrangement of sildenafil was first investigated for the case of a molecule in the crystal bulk in each of the two polymorphic forms. As mentioned, each polymorph contains two conformational isomers of the molecule. A probability distribution of each torsional angle representative of the conformational state adopted by sildenafil in the solid was obtained, which enables to gain insight into the degree of conformational freedom available in each crystal form. The results reveal that while several of the dihedral angles of sildenafil are completely restrained by the crystal packing, a surprising amount of flexibility is accessible to the rest. A probability distribution of each torsional angle for crystal conformers 1 and 2 in form I is shown in Figure a. In the figure, a narrow and mono-modal distribution corresponds to each of the torsional angles τA, τC, and τF, shown, respectively, in blue, green, and brown. A mild degree of conformational adjustment is associated with torsional angles τB (in orange) and τD (in red), which are both distributed around ±π. These torsional angles are associated with adjacent substituents on the phenyl ring, suggesting that the rearrangement is possibly related to relieving steric hindrance. The highest degree of rotational freedom is observed in the case of torsional angle τE, shown in purple, representing the rotation of the propanyl substituent of the pyrazole ring (see Figure ), displaying a multimodal distribution. The flexibility of τE reveals a moderate degree of conformational rearrangement available to the conformer, despite the restrictive environment traditionally associated with a crystal. This observation has been addressed by Barbas et al.[29] who have recorded the presence of a dynamic disorder in the propyl groups at room temperature in form I, which disappears at temperatures lower than 100 K. The authors justify this observation with the fact that these functional groups do not establish strong intermolecular interactions with the surrounding atoms within the crystal, resulting in a moderate degree of flexibility in the chain.
Figure 3

(a) Probability density plots of each torsional angle for crystal conformers 1 and 2 obtained from a 10 ns MD simulation of form I sildenafil. (b) Image of crystal form I conformers 1 and 2, obtained from the.cif, generated with VMD, where hydrogen atoms are removed for simplicity.

(a) Probability density plots of each torsional angle for crystal conformers 1 and 2 obtained from a 10 ns MD simulation of form I sildenafil. (b) Image of crystal form I conformers 1 and 2, obtained from the.cif, generated with VMD, where hydrogen atoms are removed for simplicity. Analyzing the torsional angle distributions of each of the two conformers found within the crystal structure of form I reveals that they are conformational isomers, where a rotation along τA and τE can convert conformer 1 into conformer 2, as shown in Figure b. According to the results obtained from the MD simulation of form I, the internal rearrangement of crystal conformer 2 displays an identical behavior as to that of conformer 1. Similarly, the conformational freedom of sildenafil in form II was analyzed through an unbiased MD simulation. A comparison of the probability distribution of each torsional angle of sildenafil between form I and form II reveals a similar conformational configuration in the two structures, with the only difference being a marginally lower flexibility of τE in form II compared to that observed in form I. A detailed comparison between the conformers in each form is provided in the Supporting Information. An experimental comparison between the two crystal forms is provided by Barbas et al.,[30] who make a similar observation to the one reported here and stress that any differences between conformers in the two crystals can be classified as conformational readjustments of the same gas-phase conformer, validating the conclusions made on the basis of our MD simulations. As mentioned, the probability distribution of torsional angle τE, associated with the rearrangement of the pronanyl group, shows that the degree of rearrangement is counterintuitively lower in form II compared to form I, despite the presence of larger structural voids in the former. The experimental publication does not report measurements of the degree of disorder in form II; however, the authors speculate that the conformational rearrangement of the propanyl groups will be dominated by a drive to keep the cavities in the structure empty, which could explain the conformational behavior observed in the MD simulation of form II.

Structure Clustering in the Gas Phase

This section reports on the results obtained from implementing a data clustering algorithm to group structures from an MD trajectory of sildenafil in the gas phase and identify characteristic conformational configurations. A trajectory of a molecule in a vacuum was chosen for this purpose, as in the absence of solvent effects, the internal rearrangement of the molecule is unhindered and thorough sampling of all possible molecular configurations can be achieved efficiently. This allows considering the full conformational space of sildenafil in the clustering procedure. Such an approach has the potential to provide a more meaningful and robust method of identifying representative conformational isomers over methods that rely on generating conformers through random search and local minima strategies. By considering the free energy profile of each torsional angle in the gas phase, as discussed in the Supporting Information, all possible combinations of structural local minima of the molecule in torsional angle space are estimated to be 144. However, in reality, each local minimum comprises an ensemble of configurations, meaning that 144 structures are a rather conservative estimate, and in practice, there is a swarm of possible stable molecular configurations. For that reason, failing to explore the collective variable space thoroughly encounters the risk of missing out important structures due to the sheer number of available configurations, even for systems of moderate flexibility. A typical output of the clustering algorithm is a decision graph, displaying all data points as a function of their density ρ (number of neighbors) and distance from the nearest point of higher density δ (see eqs and 3). When applied, the clustering algorithm determined 12 cluster centers, as shown in the decision graph in Figure . Each cluster center corresponds to the most representative conformer structure for those belonging to a cluster (core and halo). A visual representation of the molecular shape corresponding to each cluster center is displayed around the decision graph. The conformers identified at this stage are labeled C1–C12, each of which has an associated characteristic fingerprint in torsional angle space as discussed in the Methods section. All fingerprints can be found in the Supporting Information.
Figure 4

Decision graph of the structure clustering performed on the trajectory in the gas phase. The circled structures represent the cluster centers. The figure below represents the distance matrix between distributions of conformers 1 and 2 in the crystal and identified from the algorithm cluster centers.

Decision graph of the structure clustering performed on the trajectory in the gas phase. The circled structures represent the cluster centers. The figure below represents the distance matrix between distributions of conformers 1 and 2 in the crystal and identified from the algorithm cluster centers. Before proceeding further into the analysis of the conformational population of sildenafil in solution, it is useful to compare the structures of the cluster centers identified in a vacuum and those of the crystal conformers. To this aim, a distance matrix comparing crystal conformers 1 and 2 found in form I and each of the 12 cluster center structures is generated as shown in Figure . In the plot, the color scale corresponds to the distance in torsional angle space, where white signifies the lowest distance, i.e., the most similar structures. The distance matrix was obtained by taking the most probable value for each angle and calculating the distance between points in the 6D space of all dihedrals. The distance matrix reveals that conformer C8 is the most similar structure to crystal conformer 1, while C2 is the most similar to conformer 2. The fingerprints corresponding to C8 and conformer 1 structures in torsional angle space are visually compared in Figure a. The two plots show that the conformational rearrangement of C8 into crystal conformer 1 would involve readjustment in torsional angles τA, τB, and τD, which display a comparatively broader distribution in a vacuum. The overlap in the distributions demonstrates that the exact crystal conformer is found in the gas phase, and it accounts for only 3% of all configurations. Figure b shows a comparison between the cluster center structure of C8 and crystal conformer 1 as taken from the CCDC database, prior to performing MD. The overlap demonstrates the readjustment in τA and τD necessary to convert C8 into conformer 1. Torsional angle τE differs significantly according to the figure; however, as discussed, it has moderate flexibility in the solid and it relaxes to a configuration closer to that of C8 during MD, as shown in the probability distribution in Figure a in purple.
Figure 5

(a) Probability density of each torsional angle for crystal conformer 1 (dashed lines) and cluster center 8 (thick line). The numerical distance between distribution peaks is provided in the table below. (b) Comparison of crystal conformer 1 structure and the representative structure for C8 cluster group.

(a) Probability density of each torsional angle for crystal conformer 1 (dashed lines) and cluster center 8 (thick line). The numerical distance between distribution peaks is provided in the table below. (b) Comparison of crystal conformer 1 structure and the representative structure for C8 cluster group. The second most similar to the crystal group of structures are conformers C10 and C12, which relate to, respectively, conformers C2 and C8 via a rotation of torsional angle τE.

Conformational Isomers of Sildenafil in Solution

Equilibrium Probability

MD simulations in combination with WTmetaD were used to investigate the conformational rearrangement of sildenafil in three different solvents—acetonitrile, 1-butanol, and toluene. The biased MD trajectory of each solvent case was analyzed using the characteristic fingerprints associated with structure clusters C1–C12, following the procedure outlined in the Methods section. The probability of each conformer cluster was generated by accounting for the deposited bias potential through the reweighting procedure discussed earlier. The obtained probability for all solvent cases is shown in Figure . The results show that 95% of configurations in solution are accounted for, indicating that the proposed procedure of identifying conformational structures via unsupervised clustering is a fast and reliable way of determining conformational configurations in solution for systems with a significant degree of conformational complexity. Examining the distributions, small but significant variations in the probability in different solvents are observed. These findings correlate with what was observed in our previous work for the case of ibuprofen.[6] Structure types C2 and C8 are each found to account for 15–20% of the structures in solution. As discussed, these two groups represent the conformers with a structure closest to the crystal-like configuration of sildenafil. Furthermore, C10 and C12, which relate to C2 and C8 via a rotation along the fast-converting torsional angle τE, each account for a further 15% of conformational isomers in solution. Therefore, given the high flexibility of τE even within the crystal structures, overall 60–70% of the structures found in solution resemble configurations close to those observed in the crystal, inferring that at the crystal–solution interface 30% of structures will have to encounter a more significant conformational rearrangement to adopt a crystal-like configuration and promote crystal nucleation or growth.
Figure 6

Probability of each conformer configuration C1–C12 along with crystal conformer 1 and crystal conformer 2 for all three solvents.

Probability of each conformer configuration C1–C12 along with crystal conformer 1 and crystal conformer 2 for all three solvents.

Enthalpy and Entropy Contributions to Conformational Stability

To gain further insight into the conformational isomerism of sildenafil in solution, the free energy ΔG of each state is calculated, along with a breakdown into potential energy and entropy. In Figure , the relative free energy of each state with respect to structure C2 for each solvent can be found in blue. As expected, the free energy difference between the four structures dominating the probability density plot—C2, C8, C10, and C12—is less than 1 kJ/mol for all solvents. The difference in free energy between the latter group and the rest of the structures varies between 2.5 and 5.5 kJ/mol, with the exception of C6 for the case of acetonitrile. Despite the minor variations in ΔG between different solvent cases, however, the potential energy and entropy reveal more significant differences induced by the solvent. The relative potential energy difference (with respect to C2) rarely exceeds 5 kJ/mol in the case of acetonitrile, which also translates into minor entropic contributions in most states. The exception is once again C6 for which the free energy is dominated by configurational entropy. A significantly different observation can be made for the cases of 1-butanol and toluene, where the relative potential energy of states is much larger, varying between 5 and 13 kJ/mol. Equally, entropy contributions in these two solvents are much more substantial than for the case of acetonitrile, indicating that solute–solvent interactions are much more dynamical. This is particularly prominent for the case of 1-butanol and it is possibly related to the inherent flexibility of its aliphatic chain.
Figure 7

Relative free energy, potential energy, and entropy of each cluster configuration with respect to C2 for the case of (a) acetonitrile, (b) 1-butanol, and (c) toluene.

Relative free energy, potential energy, and entropy of each cluster configuration with respect to C2 for the case of (a) acetonitrile, (b) 1-butanol, and (c) toluene.

Conclusions

In this paper, we develop a systematic approach to partition the configuration space of flexible molecules with an arbitrary number of rotatable bonds into conformational macrostates. The approach is based on the development of a distance metric between configurations that incorporates qualitative information on the energetic cost associated with transitions along each degree of freedom, and the subsequent application of unsupervised clustering. We apply this approach to investigate the conformational landscape of sildenafil in the crystal bulk, in the gas phase, and in solution. A key aspect of the methodology introduced in this work is that the cluster centers are identified only once, for a reference state in the gas phase. These cluster centers configurations are then used to classify configurations sampled in solution. This approach provides a self-consistent identification scheme for clusters in the condensed phase. Using this methodology, 95% of structures in three different solvents are unambiguously assigned to a cluster, demonstrating the effectiveness of the proposed classification procedure. We demonstrate that this classification strategy can be coupled with reweighting strategies to compute the free energy of conformational states and to further decompose it into its enthalpy and entropy contributions. This analysis leads to new insights into the role of solvent in the definition of the conformational landscape of an organic molecule. It is found that, while the relative free energy variation between states in different solvents is limited, solvents cases 1-butanol and toluene cause an increase in the entropic contribution to the conformational free energy. Combining this approach with existing strategies for studying the effects of conformational rearrangements on processes can prove invaluable in understanding the effect of conformational isomerism in the process of crystal nucleation, growth, and dissolution for systems of any size and level of conformational complexity.
  25 in total

1.  VMD: visual molecular dynamics.

Authors:  W Humphrey; A Dalke; K Schulten
Journal:  J Mol Graph       Date:  1996-02

2.  Adaptive partitioning by local density-peaks: An efficient density-based clustering algorithm for analyzing molecular dynamics trajectories.

Authors:  Song Liu; Lizhe Zhu; Fu Kit Sheong; Wei Wang; Xuhui Huang
Journal:  J Comput Chem       Date:  2016-11-21       Impact factor: 3.376

Review 3.  New insights in protein kinase conformational dynamics.

Authors:  Giorgio Saladino; Francesco Luigi Gervasio
Journal:  Curr Top Med Chem       Date:  2012       Impact factor: 3.295

4.  Machine learning. Clustering by fast search and find of density peaks.

Authors:  Alex Rodriguez; Alessandro Laio
Journal:  Science       Date:  2014-06-27       Impact factor: 47.728

5.  Dynamics and Thermodynamics of Ibuprofen Conformational Isomerism at the Crystal/Solution Interface.

Authors:  Veselina Marinova; Geoffrey P F Wood; Ivan Marziano; Matteo Salvalaglio
Journal:  J Chem Theory Comput       Date:  2018-11-07       Impact factor: 6.006

Review 6.  The Serendipitous Story of Sildenafil: An Unexpected Oral Therapy for Erectile Dysfunction.

Authors:  Irwin Goldstein; Arthur L Burnett; Raymond C Rosen; Peter W Park; Vera J Stecher
Journal:  Sex Med Rev       Date:  2018-10-06

7.  Force Field Benchmark of Organic Liquids: Density, Enthalpy of Vaporization, Heat Capacities, Surface Tension, Isothermal Compressibility, Volumetric Expansion Coefficient, and Dielectric Constant.

Authors:  Carl Caleman; Paul J van Maaren; Minyan Hong; Jochen S Hub; Luciano T Costa; David van der Spoel
Journal:  J Chem Theory Comput       Date:  2011-12-07       Impact factor: 6.006

8.  On the Application of Molecular-Dynamics Based Markov State Models to Functional Proteins.

Authors:  Robert D Malmstrom; Christopher T Lee; Adam Van Wart; Rommie E Amaro
Journal:  J Chem Theory Comput       Date:  2014-07-08       Impact factor: 6.006

9.  Protein conformational flexibility modulates kinetics and thermodynamics of drug binding.

Authors:  M Amaral; D B Kokh; J Bomke; A Wegener; H P Buchstaller; H M Eggenweiler; P Matias; C Sirrenberg; R C Wade; M Frech
Journal:  Nat Commun       Date:  2017-12-22       Impact factor: 14.919

10.  Markov clustering versus affinity propagation for the partitioning of protein interaction graphs.

Authors:  James Vlasblom; Shoshana J Wodak
Journal:  BMC Bioinformatics       Date:  2009-03-30       Impact factor: 3.169

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.