Literature DB >> 25537389

The anatomy of a comprehensive constrained, restrained refinement program for the modern computing environment - Olex2 dissected.

Luc J Bourhis¹, Oleg V Dolomanov², Richard J Gildea³, Judith A K Howard⁴, Horst Puschmann².

Abstract

This paper describes the mathematical basis for olex2.refine, the new refinement engine which is integrated within the Olex2 program. Precise and clear equations are provided for every computation performed by this engine, including structure factors and their derivatives, constraints, restraints and twinning; a general overview is also given of the different components of the engine and their relation to each other. A framework for adding multiple general constraints with dependencies on common physical parameters is described. Several new restraints on atomic displacement parameters are also presented.

Entities: Chemical Disease Gene

Keywords: Olex2; constraints; least squares; refinement; restraints; small molecules

Year: 2015 PMID： 25537389 PMCID： PMC4283469 DOI： 10.1107/S2053273314022207

Source DB: PubMed Journal: Acta Crystallogr A Found Adv ISSN： 2053-2733 Impact factor: 2.290

Introduction

During the last four decades, small-molecule crystallographers have used a myriad of stand-alone routines and various comprehensive software packages, most notably those written by Sheldrick (1997 ▶, 2008 ▶) and that designed by Bob Carruthers and John Rollett, then maintained and enhanced by David Watkin (Betteridge et al., 2003 ▶). These latter two packages have dominated the citations in all publications containing crystal structure analyses for many years now. Therefore it might be claimed that the analysis of single-crystal X-ray (and neutron) diffraction data has reached a certain level of maturity and that there is no need for further program development. Nonetheless, there is still considerable activity in this area by various groups around the world and there has been a new release, SHELX2013, from Sheldrick recently. A retro-fit of new ideas to these older programs is not always possible, except perhaps by the authors themselves. When we first embarked on the project herein reported1 we were clear about one point – namely that we wanted to create a new and flexible refinement engine, working in a collaborative environment and to be based on trusted and mature pre-existing code. This new refinement engine would be open source and therefore available for verification and modification by others. Modern programming paradigms, unknown when the aforementioned packages were created in the 1960s, would form the basis of our development. It is hoped that such an architecture will allow extensions to the code without breaking the program or endangering the underlying functionality. It is this that we describe below in further detail and which now forms the underlying code for olex2.refine (http://www.olex2.org). We decided to base the computational core of olex2.refine on a pre-existing project, the Computational Crystallography Toolbox (cctbx), that provides the foundation for the macromolecular refinement facilities in the PHENIX suite of Adams et al. (2010 ▶). We made that choice because it provided the solid and versatile tools (Grosse-Kunstleve & Adams, 2003 ▶) that we needed to develop our project. The most important of them is a comprehensive and robust toolbox to handle crystal symmetries (sgtbx) that supports every space group in any setting, even rather unusual settings such as tripled cells. Another key cctbx toolbox is the eltbx, which is concerned with scattering-factor computations for any atom or ion and any wavelength that could be encountered in practice. The cctbx also featured many of the restraints we needed, in the module cctbx.restraints. For our project, we contributed to it a Small-Molecule Toolbox (smtbx) which features, in particular, the constraint framework presented in detail in §3 and Appendix C , and the computation of structure factors presented in Appendix A . We also contributed several new restraints to cctbx, restraints which are discussed in §4 and Appendix D . Finally, both the cctbx and the smtbx use the Scientific Toolbox (scitbx) for arrays, matrices, special functions and other purely mathematical matters. We have contributed a new least-squares toolbox (lstbx) to the scitbx, which provides flexible tools to deal with generic linear and non-linear problems, with or without an unknown overall scale factor. It implements the method based on normal equations presented in Appendix B . The program Olex2 relies either on SHELX or on the smtbx to refine structures. If using the latter, Olex2 converts its internal model of a structure into the objects used by the smtbx and the cctbx to represent the unit cell, reflections, crystal symmetry, constraints, restraints etc. Once the smtbx has produced the desired results they are sent back to Olex2 which converts them to its own representations, so as to perform diverse post-processing and eventually display the results to the user of the program. Fig. 1 ▶ summarizes the dependencies between the various modules and programs that have just been described.

Figure 1

A high-level depiction of the software modules involved in olex2.refine: an arrow is drawn from each component pointing towards another component it uses.

Least-squares refinement

A small-molecule structure refinement typically minimizes the weighted least-squares (LS) functionwhere (respectively, ) denotes either the measured amplitude (respectively, the modulus of the calculated complex structure factor ) or the measured intensity (respectively, the calculated intensity ), whereas K is an overall, unknown scale factor that places on the same scale as . The first sum over h runs over a set of m non-equivalent reflections that have been observed. Each observation is given an appropriate weight, , based on the reliability of the measurement. These may be pure statistical weights, , where σ is the estimated standard deviation of , though more complex weighting schemes are usually used. In the most general case, is a function of , , and h itself, whereas the most common weighting schemes exhibit only a dependence on the first three. These X-ray observations can be supplemented with the use of ‘observations of restraint’, as suggested by Waser (1963 ▶), where additional information such as target values for bond lengths, angles etc. is included in the minimization. This is the origin of the second term of the sum, where is the target value for our restraint, and is the value of the target function calculated using the current model (see, for example, Giacovazzo et al., 2011 ▶; Watkin, 2008 ▶). This term may of course be absent. Conversely, the data term could be absent in a refinement against geometrical terms only [cf. the program DLS-76 by Baerlocher et al. (1978 ▶) for example]. In equation (1), the crystallographic parameters [i.e. atomic positions, atomic displacement parameters (ADPs) and chemical occupancy in routine refinements] that each component of and depends upon will be collectively denoted as the vector and we can therefore denote those dependencies with the compact notations and . The parameters that are actually refined may be a different set , collectively denoted as a vector x. We assume that the dependency of y upon x is known analytically. Since the scale factor K is unknown, our problem is the minimization of L with respect to all of . We will denote that dependency as , or more tersely when we do not need to remember the crystallographic parameter vector y, as . These notations reflect the important fact that we treat the scale factor K separately, as will be explained later. For a small-molecule structure with a high data-to-parameter ratio, such unconstrained minimization as defined by the first term of equation (1), when , may well be sufficient. However, as the structure becomes larger, or the data-to-parameter ratio worsens, unconstrained minimization may not be well behaved, or result in some questionable parameter values. It has become customary to rely on the use of two techniques to solve such issues: Restraints: by having a non-zero second term in equation (1) – with the use of appropriate weighting of the restraints – the minimization is gently pushed towards giving a chemically sensible and hopefully correct structure. Constraints: by having a parameter vector x shorter than y, therefore explicitly taking into account exact relationships between some of the parameters . Whether the refinement uses restraints or constraints or both, at each refinement cycle, we first find the value of K that minimizes keeping the parameter vector x fixed: we will denote it by , where we use a notation that is explicit in its dependency on the value of the refined parameters at the beginning of the cycle. We then search for the shift vector s of the parameter vector x that brings closer to the minimum. It is solution of the so-called normal equations, where and are the so-called normal matrices associated with the two terms sharing the same labels in equation (1), respectively, and where the right-hand side and are equivalently associated with those two terms [see the derivation of equation (72) in Appendix B ]. The rest of the paper is organized as follows. §3 deals with the computation of the derivatives with respect to the refined parameters from the derivatives with respect to the crystallographic parameters . The computation of the latter derivatives is expounded in Appendix A along with the formulae for the complex structure factors . Appendix C details the computation of the value and derivatives of constrained parameters for each constraint featured by olex2.refine. §4 is devoted to the general principles of the computation of and . It is completed by Appendix D , which gives the formulae for each restraint featured by olex2.refine. §5 deals with the refinement of twinned structures. §6 details the computation of standard uncertainties (s.u.’s), emphasizing the influence of constraints. §7 gives a very quick overview of the output of the results of a refinement. Appendix B presents the method used to find the optimal scale factor K, and then the optimal value of all the other refined parameters.

Constraints

Synopsis

The difference between restraints and constraints may be conceptualized mainly in two manners that we will first illustrate with a simple example: a geometrically constrained acetylenic hydrogen, X C—H. The position of the hydrogen must be such that the distance CH is equal to some ideal bond length d and such that the angle is equal to 180°. Expressed in such an implicit manner, those restrictions could be used to construct restraints by adding to the function to minimize a term w 1(CH − d)2 + w 2( − 1)2. By doing so, the number of refined parameters would not be changed but the number of observations would increase by three. It is, however, traditional to do the opposite, by reducing the number of parameters by three and keeping the number of observations unchanged. This is achieved by using the implicit form of the constraints to express the position of as a function of the positions of the two carbon atoms. Denoting by x the triplet of coordinates of the atoms, where denotes the Euclidean norm. By using this formula, the observable of the fit (either or ) that used to depend on , and is replaced by a function of and but not of . We will call this transformation a reparametrization: and we will say that is a reparametrization of , whose arguments are and . The derivatives for the remaining variables are obtained with the chain rule We use the following compact notations for derivatives: for a column vector will always denote the row vectorGiven another column vector will always denote the matrixwhere i (respectively, j) indexes the rows (respectively, the columns). The identity matrix will be denoted by . It should be noted that it is customary to work within the ‘riding’ approximation, which results in the much simplified chain rule implemented in all refinement programs, including Olex2. It is not always the case that all three coordinates of an atom are removed from the refinement by constraints. For example, an atom in the plane of the mirror whose matrix readshas its coordinates constrained by the relation , which can be reduced to only. The corresponding reparametrization may be written the observable becoming now a function of the vector of newly introduced refinable parameters . There is a general algorithm implemented in the cctbx that for any special position returns a matrix Z so that the reparametrization takes the form where Z is a or matrix and z is a 3-vector. This algorithm first determines the space-group symmetries that leave the site invariant. The resulting system of linear equations, which read in our example, is then reduced to a triangular form from which the matrix Z and the vector z are then readily obtained. This algorithm does not therefore try to determine which components of u, if any, are also components of x. This is why we have presented the reparametrization for our example in this general form instead of keeping and as refinable parameters, which would sound more intuitive in the first place. This matrix Z is the matrix of constraints for the position of that atom. The anisotropic displacement tensor U is subject to symmetry constraints as well. In our example, it must satisfy where denotes the transpose of the matrix M (see e.g. Giacovazzo et al., 2011 ▶). Any number of such matrix equations can always be rewritten as a system of equations whose most general form reads where P has 6 columns and 6n rows, where n is the number of symmetry elements other than the identity involved in the special position [indeed equation (10), being trivial for , can safely be discarded]. In our example, and equation (11) reduces to leading to the constraint matrix Then one would refine instead of the . The cctbx provides an algorithm that computes this matrix P for any special positions, and then reduces it to triangular form in order to determine the constraint matrix for the ADP. In other cases, a reparametrization will make some crystallographic parameters disappear while introducing new refinable parameters. A typical example is that of a tetrahedral X—CH3, as the geometrical constraints leave one degree of freedom, a rotation about the axis X—C. Thus the reparametrization expresses the coordinates of the hydrogen atoms H0, H1 and H2 as functions of the coordinates of the carbon atoms, and of an angle ϕ modelling that rotation, where 109.5° and is an orthonormal basis of column vectors with in the direction of the bond . The riding approximation in this case consists of neglecting the derivatives of those base vectors, leading to Thus a new derivative with respect to the new refinable parameter ϕ is introduced by this reparametrization. The last important concept is that of the chaining or composition of reparametrizations, that we will illustrate with a combination of the examples above. This example is not particularly common but it is a simple illustration of the concept we want to introduce. In the case, the atoms and could be on the special position studied in the next-to-last example. One type of disorder could be modelled by first applying the reparametrization (15) and then reparametrizing and using equation (8), introducing parameters for the former and for the latter. The derivatives would then be obtained by the chain rule, e.g. in the riding approximation. This composition of reparametrization may be represented as a graph: each parameter (some of them are triplets of coordinates, others are scalars), , , , , , ϕ, u, v, and , constitutes a node of that graph, whereas edges are drawn from each node to its arguments, i.e. the nodes it depends upon, as shown in Fig. 2 ▶. In this example, has only one argument, u, whereas has three arguments, , and ϕ. The smtbx implements reparametrizations by explicitly building such a graph.

Figure 2

Example of a dependency graph for the chain of reparametrization discussed in §3. Only the part for hydrogen atom H0 is shown.

As models become more complex, e.g. hydrogen atoms riding on the atoms of a rigid body whose rotation centre is tied to an atom whose coordinates are refined, the reparametrization graph becomes deeper. We decided not to put arbitrary limits on that graph. Indeed, we could have made a closed list of reparametrizations and of reparametrization combinations that our framework would accept but instead we decided to write our code so that it could correctly handle the computation of parameter values and of partial derivatives for arbitrary reparametrizations, combined in arbitrary ways. This framework is therefore open as new types of reparametrizations can be added, without the need to change the basic infrastructure in any way, and without the risk of breaking existing reparametrizations. This has proven very useful to the authors as this enabled them to incrementally add the wealth of constraints now available, some of which are unique to olex2.refine, as discussed in Appendix C . Furthermore, it enables third parties to develop their own constraints without the need for the involvement of the original authors beyond documenting how the reparametrization framework works. A crystallographic refinement may involve many such reparametrizations. By piecing them all together, we obtain one reparametrization that expresses all refinable crystallographic parameters as a smaller vector of independent parameters that shall then be refined. Our framework safeguards that piecewise construction in several ways. First, at most one reparametrization may be applied to any given parameter. An attempt to add a reparametrization to a parameter that is already subject to one would be rejected by our framework as an error. Then if a cycle were found in the dependency graph, the framework would also reject the parametrization. This would happen when at least one parameter, through a series of reparametrization combinations, depends upon itself. Thus our framework safeguards against incorrect user inputs, and also against bugs in our own code that automatically builds constraints.

Constraint matrix

There is a wealth of algorithms designed to minimize least squares, but crystallographic software has only implemented a few of them. The two most popular methods have historically been the full matrix3 (all small-molecule programs, including Olex2, as explained in Appendix B ) and conjugate gradient LS (CGLS, see e.g. Björck, 1996 ▶) which SHELXL offers as an option along with full matrix. The macromolecular community later introduced the limited-memory Broyden–Fletcher–Goldfarb–Shanno method (LBFGS, Nocedal, 1980 ▶), phenix.refine (Afonine et al., 2012 ▶) and a sparse Gauss–Newton algorithm, REFMAC (Murshudov et al., 2011 ▶, §4 and references therein). All methods mentioned above have in common the fact that they require only the computation of the value and the first-order derivatives of the calculated F or , that we have denoted as in this paper. Since no higher-order derivatives are used, the implementation of constrained least squares is greatly simplified. Indeed, after transforming every constraint on the model into a reparametrization (as explained and exemplified in the previous section) and piecing all those reparametrizations together, we obtain a global reparametrization of the vector of crystallographic parameters as a function of the vector of the parameters that are actually refined. It should be noted that any component of y that is not reparametrized – e.g. a coordinate of the pivot atom on which another atom rides, and of course any parameter that is not involved in any constraint – is also a component of x, i.e. From the known analytical expression of , one computes the derivatives of (the reader is referred to Appendix A for a detailed presentation of those computations). The minimization algorithm then only needs the derivatives of with respect to each , which are easily obtained by a simple application of the chain rule The matrix is known as the constraint matrix in crystallographic circles.4 In standard mathematical nomenclatures, it is called the Jacobian of the transformation and we will therefore denote it J. The computation of J takes advantage of the fact that it is a very sparse matrix. This stems from the fact that any given crystallographic parameter depends on very few refined parameters , as amply illustrated in the previous section. We take advantage of this property to drastically reduce the memory cost of J and the cost of computing each of its non-zero coefficients. More precisely, both of these costs scale as the number of non-zero elements in J instead of if J were treated as a dense matrix.

Restrained least-squares refinement

In this section we will give an overview of the computations involved in restrained refinement. The mathematical formulae for each restraint objective and their derivatives can be found in Appendix D . Since each restraint target only depends on a very small subset of the (e.g., in the case of a bond length, only the six coordinates of the two bonded atoms would play a role), the matrix of derivatives with respect to the crystallographic parameters which is known as the design matrix for the restraints, is very sparse. We therefore use sparse-matrix techniques to efficiently store and perform computations with D, by only storing non-zero elements, and never performing any multiplication that involves an element of D known to be zero. We then introduce the matrix of derivatives with respect to the refined parameters, which is computed by forming the product of D and the constraint matrix. Thus the restraints are initially built up without any knowledge of the constraint matrix. This greatly simplifies their implementation and it simplifies their use in a refinement program that does not use constraints. This organization of the computation does not incur any inefficiency as the product (21) is a cheap operation since both matrices are sparse, and it therefore scales as the number of non-zero elements, which is typically much smaller than the number of parameters n. The two terms in the normal equations (2) coming from the restraints then read as the matrix product and matrix-vector product, where denotes the transpose of D and where W is the diagonal matrix featuring the restraint weights, and where is the residual for the ith restraint. We again take advantage of the sparsity of to efficiently implement those products. It would be desirable to place the weights of the restraints on the same scale as the typical residual, such that a restraint will have a similar strength for the same weight in different structures. Rollett (1970 ▶) suggests the normalization factor This is better known as the square of the goodness of fit, . This normalizing factor also allows the restraints to have greater influence when the fit of the model to the data is poor (and the goodness of fit is greater than unity), whilst their influence lessens as the fit improves (Sheldrick, 1997 ▶).

Implementation

The choice of the minimization algorithm has a significant impact on the organization of the computation of restraints. Indeed, a generic minimizer such as the LBFGS minimizer used in phenix.refine requires at each iteration only the function value L and the derivatives . The derivatives and can be calculated separately before combining their sum to obtain . In contrast, for full-matrix least squares we need the matrices of partial derivatives and . Therefore, depending on the optimization method used, we must be able to compute both and where by the chain rule The restraints framework was designed in such a way that it could be easily extended by adding further restraints. Each restraint must provide the array of partial derivatives of the restraint with respect to the crystallographic parameters (one row of the matrix ), the restraint delta, , and the weight, w, of the restraint.

Twinning

Like all other refinement programs, we have adopted the model of twins proposed by Jameson (1982 ▶) and Pratt et al. (1971 ▶). The sample is modelled as d domains, each sizeable enough a single crystal to give rise to observable Bragg peaks. If peaks from different domains fall very close in reciprocal space, the integration software analysing frames will be able to compute only the sum of the intensities of these superposed peaks. Data for the rth reflection will therefore consist of an intensity and of a list of Miller indices where is the triplet of Miller indices of the Bragg peak originating from diffraction by domain i. The model of the structure then predicts an intensity for that reads where is the fraction of the sample volume occupied by twin domain i. The least squares to minimize are then, adapting equation (1) for a twinned structure, where the weight is usually where w would be the same function discussed in the context of equation (1). The minimization of L with respect to the model parameters embodied in and with respect to the α’s is therefore subject to the constraint There is a special case that is common and therefore important, where there are exactly d superposed peaks for each reflection, i.e., for every r, and where there is a matrix R, the twin law, that generates the Miller indices for each domain as in this case, the input consists solely of a list of (as for an untwinned refinement but with ) and of R. Such twins belong to the taxonomies pseudo-merohedral or merohedral. olex2.refine is able to refine a twinned structure input in this manner. SHELXL users would handle the general case by using a reflection file in the HKLF5 format. Such a file is created by the integration software and this approach can deal with the most complex situations. By contrast, CRYSTALS always requires a list of twin laws. olex2.refine can handle a more general case: when general and merohedral twinning are simultaneously present. For example, one may have four domains 1, 2, 3 and 4. Domains 1 and 3 are related by a twin law R, and so are domains 2 and 4, but the relative orientation of domains 1 and 2 does not correspond to any twin law. Thus the measured frames exhibit two lattices of Bragg peaks, with some overlaps, and the integration software will then output a list of (, ), (, ) and (, , ). The refinement engine needs to expand this to a list of (, , ), (, , ) and (, , , , ). olex2.refine performs this task on the fly, while passing Miller triplets to the code computing structure factors and their derivatives. The theory and phenomenology of twinning extends far beyond our exposition, but from the narrow point of view of refinement our presentation is sufficient, since, in this paper, we do not concern ourselves with the computation of the twin law or with the indexing of superposed lattices.

Standard uncertainties

Variance matrix

In this section, we will discuss the rigorous computation of s.u.’s of and correlation between the parameters of a constrained model. As pointed out in Appendix B in the comment about equation (73), the variance matrix for the refined parameters x is where is the normal matrix for the constrained least-squares minimization. However, we are interested in the variance matrix for the crystallographic parameters of the model, not in . For small variations, we have the linear relation and therefore, using the well known heuristic definition of the variance matrix of y as the mean value of in the linear approximation around the minimum implicitly assumed throughout crystallographic refinement, We would like to stress an important consequence of this formula: constrained parameters generally have non-zero s.u.’s. This is the case, for example, for the coordinates of riding atoms. For most constraints, the s.u.’s of hydrogen coordinates are equal to the s.u.’s of the atom they ride on but e.g. for a rotating –CH3, they differ because the s.u. of the azimuthal angle increases the s.u. of the hydrogen coordinates that come from riding only.

Derived parameters

We will now discuss the s.u.’s of derived parameters. Such a parameter is a function f of a set of atomic parameters and its variance can be derived using the same heuristic as above in a linear approximation where by computing as the mean of . This leads to An early occurrence of this formula can be found in Sands (1966 ▶). As a result of this formula and of the consequence of equation (35) stated after it, any bond length, any bond angle and any dihedral angle involving a riding atom has a non-zero s.u. unless that geometrical quantity is fixed by the constraint. For example, in the case of —C—H, for which the constraints do not fix the angles , those angles have a non-zero s.u. This correct behaviour unfortunately triggers many alerts when a structure refined with Olex2 is checked with PLATON. PLATON plays a very important role in detecting common flaws in the crystallographic workflow. However, since PLATON does not have access either to the covariance matrix or the constraint matrix, it has to make guesses that result in estimation of e.s.d.’s that may be out by up to a factor 2. This is the key problem we encounter with the way olex2.refine reports e.s.d.’s. Specifically, most structures refined with olex2.refine fail checkCIF test PLAT732 (an alert C), for the reason explained in the documentation of this alert (in Note 2):5 PLATON computes the s.u. of the said angle using the s.u. of the atomic coordinates as if they were independent parameters since it does not have access to the variance–covariance matrix that describes their correlations. In this case, those correlations are very significant since the position of H completely depends on the position of R1, R2 and R3. Hence the s.u. of this angle as reported by Olex2 differs from the PLATON estimate. Thus all failures of test PLAT732 for angles or distances involving constrained hydrogen atoms are spurious. As a result, if a referee were to require that all alert C’s are to be addressed in a CIF submitted for publication in Acta Crystallographica, we advise authors to explain away PLAT732 for riding hydrogen atoms by quoting Note 2 in the documentation for that test. olex2.refine will actually automatically prepare the CIF file with such explanations.

Incorporating s.u.’s of unit-cell parameters

Derived parameters such as bond lengths and angles are a function of both the least-squares atomic parameters and the unit-cell parameters. As such, the s.u. of a derived parameter is likewise a function of both the atomic and unit-cell parameters as well as their respective s.u.’s. If the s.u.’s in atomic parameters are considered to be totally uncorrelated with the s.u.’s in the cell parameters, i.e. their covariance is zero, then the s.u. in a derived parameter can be considered as comprising two independent sources of uncertainties: where is the part coming from the uncertainties in the least-square estimates of the positional parameters, and comes from the uncertainties in the unit-cell parameters, where . This necessitates the calculation of the derivatives of the function with respect to the unit-cell parameters. In order to do so, it is easier to calculate separately the derivatives of the function with respect to the elements of the metrical matrix, and also the derivatives of the metrical matrix with respect to the cell parameters, and then to use the chain rule Indeed must be evaluated for every function, whereas is constant for a given unit cell. Now we consider the application of equation (36) to determine the s.u. in the length of the vector u, in fractional coordinates. The length, D, of the vector u is given by where G is the metrical matrix (see e.g. Giacovazzo et al., 2011 ▶). The derivatives of the distance, D, with respect to the elements of the metrical matrix, G, are given by and (given the metrical matrix is symmetric) Similarly, for the angle between two vectors in fractional coordinates, u and v, where the angle is defined as or where and are the Cartesian equivalents of u and v, the derivative of the angle, θ, with respect to the elements of the metrical matrix, , is given by and The derivatives of the metrical matrix with respect to the unit-cell parameters, needed in order to apply equation (39) are given below:

Symmetry

The variance–covariance matrix that is obtained from the inversion of the least-squares normal matrix contains the variance and covariance of all the refined parameters. Frequently, it is necessary to compute functions that involve parameters that are related by some symmetry operator of the space group to the original parameters. Sands (1966 ▶) suggests that the symmetry should be applied to the variance–covariance matrix to obtain a new variance–covariance matrix for the symmetry-generated atoms. Alternatively, and it is this method that is used here, the original variance–covariance matrix can be used if the derivatives in equation (36) are mapped back to the original parameters. Let the function f depend on the Cartesian site that is generated by the symmetry operator from the original Cartesian site , i.e. Then the gradient with respect to the original site can be obtained by The variance–covariance matrix that is used in this case should be the one that is transformed to Cartesian coordinates. The variance–covariance matrix for Cartesian coordinates can be obtained from that for fractional coordinates by the transformation where the transformation matrix needed to transform the entire variance–covariance matrix in one operation would be block diagonal, with the orthogonalization matrix O repeated at the appropriate positions along the diagonal. This transformation can be computed efficiently using sparse-matrix techniques.

Refinement results

olex2.refine uses CIF (Hall et al., 1991 ▶) as its main output. The CIF contains information regarding the space group, the data indicators such as merging indices, the refinement indicators such as the R factors, goodness of fit, residual electron density, refinement convergence indicators and tabulated structure information, including tables of atomic parameters, bonds and angles. olex2.refine produces a table describing the restraints using the CIF restraints dictionary. Moreover, the olex2.refine CIF always contains a verbal description of the refinement model – hydrogen-atom treatment, constraints, restraints and their targets. Optionally, the refinement model file (SHELX RES file) and the reflections can also be included in the final CIF for deposition. It should be noted that the constraints and restraints unique to olex2.refine, i.e. not featured by SHELX, are saved in REM sections (using an XML format). This has the advantage that they can be read back by Olex2 while providing a ‘.res file’ that can also be refined with SHELX, albeit with potentially a different model. As part of this work a CIF-handling toolbox (Gildea et al., 2011 ▶) was added to cctbx.

Conclusion

We have presented herein the full mathematical derivations of the concepts used within olex2.refine. This new refinement engine is feature-wise on a par with the established software in the field such as SHELXL and CRYSTALS. It actually provides a richer wealth of constraints than those classic suites. olex2.refine is immediately useful to the practising crystallographer since it is presently available from the program Olex2 by default. It can also be used independently as a library of components to write short scripts as well as more complex programs, dealing with any aspect of small-molecule refinement. Indeed, olex2.refine is largely based on the Small-Molecule Toolbox (smtbx) that is part of the Computational Crystallography Toolbox (cctbx), which is usable as a library for the Python programming language, thus providing great expressiveness, conciseness and ease of coding combined with an immense wealth of tools covering all of crystallography. We have explained herein how the smtbx added constrained least-squares refinement from scratch to the cctbx and how it added many restraints as well, thus opening new fields to the cctbx.

4 in total

1. Efficient anisotropic refinement of macromolecular structures using FFT.

Authors: G N Murshudov; A A Vagin; A Lebedev; K S Wilson; E J Dodson
Journal: Acta Crystallogr D Biol Crystallogr Date: 1999-01-01

2. A short history of SHELX.

Authors: George M Sheldrick
Journal: Acta Crystallogr A Date: 2007-12-21 Impact factor: 2.290

3. PHENIX: a comprehensive Python-based system for macromolecular structure solution.

Authors: Paul D Adams; Pavel V Afonine; Gábor Bunkóczi; Vincent B Chen; Ian W Davis; Nathaniel Echols; Jeffrey J Headd; Li-Wei Hung; Gary J Kapral; Ralf W Grosse-Kunstleve; Airlie J McCoy; Nigel W Moriarty; Robert Oeffner; Randy J Read; David C Richardson; Jane S Richardson; Thomas C Terwilliger; Peter H Zwart
Journal: Acta Crystallogr D Biol Crystallogr Date: 2010-01-22

4. REFMAC5 for the refinement of macromolecular crystal structures.

Authors: Garib N Murshudov; Pavol Skubák; Andrey A Lebedev; Navraj S Pannu; Roberto A Steiner; Robert A Nicholls; Martyn D Winn; Fei Long; Alexei A Vagin
Journal: Acta Crystallogr D Biol Crystallogr Date: 2011-03-18

4 in total

97 in total

1. DiSCaMB: a software library for aspherical atom model X-ray scattering factor calculations with CPUs and GPUs.

Authors: Michał L Chodkiewicz; Szymon Migacz; Witold Rudnicki; Anna Makal; Jarosław A Kalinowski; Nigel W Moriarty; Ralf W Grosse-Kunstleve; Pavel V Afonine; Paul D Adams; Paulina Maria Dominiak
Journal: J Appl Crystallogr Date: 2018-02-01 Impact factor: 3.304

2. X-Ray Crystal Structure of a 2-Amino-3,4-dihydroquinazoline 5-HT₃ Serotonin Receptor Antagonist and Related Analogs.

Authors: Ahmed S Abdelkhalek; Faik N Musayev; Kavita A Iyer; Prithvi Hemanth; Martin K Safo; Małgorzata Dukat
Journal: J Mol Struct Date: 2019-10-23 Impact factor: 3.196

3. Crystal structure of hexa-sodium tetra-serinolium paratungstate B deca-hydrate, [Na₆{(CH₂OH)₂CHNH₃}₄][W₁₂O₄₀(OH)₂]·10H₂O.

Authors: Kleanthi Sifaki; Nadiia I Gumerova; Gerald Giester; Annette Rompel
Journal: Acta Crystallogr E Crystallogr Commun Date: 2022-01-18

4. Chemoselective O-Alkylation of 4-(Trifluoromethyl)pyrimidin-2(1H)-ones Using 4-(Iodomethyl)pyrimidines.

Authors: Mateus Mittersteiner; Genilson S Pereira; Ludger A Wessjohann; Helio G Bonacorso; Marcos A P Martins; Nilo Zanatta
Journal: ACS Omega Date: 2022-05-24

5. Influence of modelling disorder on Hirshfeld atom refinement results of an organo-gold(I) compound.

Authors: Sylwia Pawlędzio; Maura Malinska; Florian Kleemiss; Simon Grabowsky; Krzysztof Woźniak
Journal: IUCrJ Date: 2022-06-11 Impact factor: 5.588

6. Implementation of the riding hydrogen model in CCTBX to support the next generation of X-ray and neutron joint refinement in Phenix.

Authors: Dorothee Liebschner; Pavel V Afonine; Alexandre G Urzhumtsev; Paul D Adams
Journal: Methods Enzymol Date: 2020-02-13 Impact factor: 1.600

7. Synthesis of multi-substituted pyridines from ylidenemalononitriles and their emission properties.

Authors: Juliana M de Souza; Irini Abdiaj; Jiaqi Chen; Kenneth Hanson; Kleber T de Oliveira; D Tyler McQuade
Journal: Org Biomol Chem Date: 2021-03-11 Impact factor: 3.876

8. Accurate crystal structures and chemical properties from NoSpherA2.

Authors: Florian Kleemiss; Oleg V Dolomanov; Michael Bodensteiner; Norbert Peyerimhoff; Laura Midgley; Luc J Bourhis; Alessandro Genoni; Lorraine A Malaspina; Dylan Jayatilaka; John L Spencer; Fraser White; Bernhard Grundkötter-Stock; Simon Steinhauer; Dieter Lentz; Horst Puschmann; Simon Grabowsky
Journal: Chem Sci Date: 2020-11-09 Impact factor: 9.825

9. Mono- and Mixed Metal Complexes of Eu³⁺, Gd³⁺, and Tb³⁺ with a Diketone, Bearing Pyrazole Moiety and CHF₂-Group: Structure, Color Tuning, and Kinetics of Energy Transfer between Lanthanide Ions.

Authors: Victoria E Gontcharenko; Mikhail A Kiskin; Vladimir D Dolzhenko; Vladislav M Korshunov; Ilya V Taydakov; Yury A Belousov
Journal: Molecules Date: 2021-05-01 Impact factor: 4.411

10. A synthetic small molecule stalls pre-mRNA splicing by promoting an early-stage U2AF2-RNA complex.

Authors: Rakesh Chatrikhi; Callen F Feeney; Mary J Pulvino; Georgios Alachouzos; Andrew J MacRae; Zackary Falls; Sumit Rai; William W Brennessel; Jermaine L Jenkins; Matthew J Walter; Timothy A Graubert; Ram Samudrala; Melissa S Jurica; Alison J Frontier; Clara L Kielkopf
Journal: Cell Chem Biol Date: 2021-03-08 Impact factor: 9.039