Literature DB >> 20017124

Fast determination of the optimal rotational matrix for macromolecular superpositions.

Pu Liu¹, Dimitris K Agrafiotis, Douglas L Theobald.

Abstract

Finding the rotational matrix that minimizes the sum of squared deviations between two vectors is an important problem in bioinformatics and crystallography. Traditional algorithms involve the inversion or decomposition of a 3 x 3 or 4 x 4 matrix, which can be computationally expensive and numerically unstable in certain cases. Here, we present a simple and robust algorithm to rapidly determine the optimal rotation using a Newton-Raphson quaternion-based method and an adjoint matrix. Our method is at least an order of magnitude more efficient than conventional inversion/decomposition methods, and it should be particularly useful for high-throughput analyses of molecular conformations. Copyright 2009 Wiley Periodicals, Inc.

Entities: Chemical Disease Gene Species

Mesh：

Substances：
Proteins

Year: 2010 PMID： 20017124 PMCID： PMC2958452 DOI： 10.1002/jcc.21439

Source DB: PubMed Journal: J Comput Chem ISSN： 0192-8651 Impact factor: 3.376

Introduction

The root-mean-square distance (RMSD) is a common metric used to characterize the similarity between two vector sets (e.g., protein structures).1 The minimum RMSD is conventionally determined using the method of least squares in which an optimal translation vector and rotation matrix are found that minimize the sum of the squared distances between corresponding atoms in two coordinate sets. Determining the optimal rotation matrix can be a rate-limiting step in several computationally intensive structural bioinformatics algorithms where large numbers of structures must be compared, such as in aligned-fragment-pair multiple protein structure alignment,2–4 fragment-assembly protein structure predictions,5 conformation sampling for structure-based drug design,6 and high-throughput superpositioning of analogous and homologous protein domains in the entire PDB database.7 Hence, more efficient superposition algorithms are desirable. Considerable effort has been directed toward developing fast and robust algorithms for determining the RMSD and the corresponding optimal rotation.8–15 For example, Kabsch calculates the optimal rotation by solving a least-squares problem with orthogonality constraints ensured by a Lagrange multiplier. This method requires the calculation of the eigenvalues and eigenvectors of a 3 × 3 matrix. In addition, improper rotation matrices may arise when the determinant of a key matrix is negative,11 which requires special handling.16–18 Ferro and Hermans (1977) approximate the rotational matrix by applying the best rotation about each Cartesian axis iteratively, which requires expensive square root operations and matrix multiplications.9 McLachlan describes a method to calculate the rotational matrix using conjugate gradient minimization and a succession of finite rotations about the conjugate axes.13 The coordinate sets must be updated in every iteration making this method computationally expensive for large systems. Lesk reduces the superposition problem to an unconstrained maximization of a function of a single variable. However, the evaluation of this function requires dynamically updating the coefficients of a quartic polynomial and locating its real roots.12 Horn,10 Diamond,8 Kearsley,15 and Theobald14 represent the rotations as quaternions and cast the original problem as an eigenvalue/eigenvector problem for a 4 × 4 matrix. In particular, Diamond developed a fast iterative method to calculate the minimum RMSD. However, his method is unstable when the required rotation is close to 180o because the matrix to be inverted becomes singular.8,14 Theobald circumvents the decomposition and inversion problem by using a Newton-Raphson (NR) algorithm that solves the characteristic polynomial for the minimum RMSD. While Theobald's method does not provide the optimal rotation matrix, the approach is over an order of magnitude more efficient when only the RMSD is of interest.14 Comparison of the Average Computational Time Required to Determine One Optimal Rotational Matrix for the Current Method (QCP) and the Traditional Household Reduction and QL Decomposition Approach (H-QL) Based on Horn's quaternion approach and Theobald's NR quaternion-based characteristic polynomial (NR-QCP) method, we present an extremely efficient algorithm to determine the optimal rotational matrix in the superposition problem. As in the previous article,14 the RMSD is first evaluated by solving for the most positive eigenvalue of the key matrix using the NR-QCP algorithm. Here, we show how to use this eigenvalue to rapidly determine the optimal rotation matrix. The best rotation is given by the corresponding eigenvector, which is calculated via the adjoint matrix. The present method has several advantages: (i) the time required to calculate the rotation matrix is independent of the system size after a special 3 × 3 matrix is constructed from the coordinates, (ii) no special cases need to be handled separately, and (iii) the approach is extremely fast, straightforward, and robust, as there is no expensive matrix inversion or decomposition. To our knowledge, the algorithm presented here is by far the fastest method currently available for superpositioning macromolecules.

The Weighted Least-Squares Superposition Problem

The structure of a molecule with N atoms can be conveniently represented as a N × 3 matrix in which the i-th row corresponds to the x,y,z coordinates of the i-th atom. Let and be two structures under consideration, and be a diagonal weighting matrix with the i-th diagonal element representing the weight for the i-th atom. If each structure is translated so that its centroid is at the origin, the superposition problem is to find an optimal rotation that minimizes the following function11,19: where ; c and a are the elements of the matrices and , respectively, and w is the i-th diagonal element of the matrix . If eq. (1) is expressed in matrix format and expanded, it can be seen that: where tr() is the trace of the matrix X* represents the transpose of X, G is the weighted inner product of structure A, and the matrix is the inner product of two structures A and B, and

Determination of the Optimal Rotation Matrix

Horn has shown that the optimal rotational matrix in the unit quaternion representation is the eigenvector associated with the most positive eigenvalue of the following symmetric 4 × 4 matrix 10: The eigenvalues can be determined by locating the roots of the characteristic polynomial det(), where is the identity matrix, λ is one of the eigenvalues, and det() represents the determinant of the matrix . As shown by Theobald,14 the coefficients of the quartic polynomial for the key matrix can be determined with at most 66 floating point operations (FLOPs). For this 4 × 4 matrix, the most positive root is bounded from above by the average of two self inner products, (G + G)/2. The use of this upper bound as the initial guess leads to quick and stable location of the most positive root with the NR method.14 This method only takes about five iterations for convergence to a relative precision of 10−6.14 Because there are only 11 FLOPs involved in every iteration,14 this method is extremely efficient in calculating the most positive root from which the RMSD is given by . The optimal rotation matrix corresponds to the eigenvector associated with the largest eigenvalue of the key matrix As the eigenvalue has been determined as stated earlier, one may solve for the eigenvector using standard iterative eigen-decomposition methods to solve the homogeneous equation ( − λ)e = 0. However, because is a small 4 × 4 matrix, one may efficiently determine the eigenvector analytically from the adjoint matrix. From basic linear algebra, it can be shown that adj() = det(), where adj() is the adjoint matrix for any matrix .20 If = − λ and λ is an eigenvalue (i.e., det( − λ) = 0), then any nonzero column of the adjoint of the matrix ( − λ) is an eigenvector associated with the eigenvalue λ.20 Calculating the first column of the adjoint matrix requires only 28 multiplications and 26 subtractions/additions. If the first column of the adjoint matrix is zero or very small, then calculation of the eigenvector may suffer from floating point error, and the calculation of one or more columns is necessary. However, for all the >109 superposition operations we performed, we have found that the first column is sufficient. Even in the worst case, where the entire adjoint matrix needs to be constructed, only an additional 60 multiplications and 39 subtractions/additions are required. The optimal rotational matrix is then uniquely determined by the resulting unit quaternion. To explore the robustness and efficiency of this method, we performed >109 superpositions for short protein fragments. Pairwise RMSDs were also calculated for protein conformations from the publicly accessible “ensemble protein database.”21 Table 1 compares the times for determining the optimal rotation determination using our approach QCP versus the traditional Householder reduction method followed by QL decomposition with implicit shift (H-QL).22,23 The time spent for the construction of the matrix is not included in timing because it is a prerequisite for all the methods. For accurate timing, the rotational matrix was calculated repeatedly 500,000 and 50,000 times for the QCP and H-QL approaches, respectively. All calculations were performed on an IBM Thinkpad T61 laptop computer equipped with a single dual-core 2GHz mobile Intel processor and 1.96 GB 667MHz DRAM. Our QCP method is about 20 times faster than the H-QL method, while giving identical rotational matrices within floating point error. Many widely used programs rely on extensive superpositioning. For example, FATCAT4 and Matt2 were proven to be able to align multiple protein structures and identify homologous residues efficiently. Rosetta5 has widely used in ab initio protein prediction and protein design.24,25 These programs could all potentially benefit from the algorithm presented herein. For the convenience of the audience, ANSI C source code of the present algorithm is organized to be integrated into existing packages straightforwardly with minimal effort. The code and the instruction are publicly available without charge under the BSD license from http://theobald.brandeis.edu/QCP/. For questions regarding to the code, please contact pliu24@its.jnj.com or dtheobald@brandeis.edu.

Table 1

Comparison of the Average Computational Time Required to Determine One Optimal Rotational Matrix for the Current Method (QCP) and the Traditional Household Reduction and QL Decomposition Approach (H-QL)

Protein	PDB Id	Number of residues	Number of structures	Time (μs) QCP	Time (μs) H-QL
d-Galactose/Glucose binding protein	2GBP	309	297	0.185	3.57
Human CDC25B catalytic domain	1QB0	177	400	0.200	3.54
Barstar	1A19	89	191	0.201	4.11
Alpha-Amylase inhibitor	1HOE	74	129	0.200	4.37
Calmodulin	1CFD	72	196	0.195	3.96
Ferredoxin II	1FXD	58	141	0.196	3.92

10 in total

1. A revised proof of the metric properties of optimally superimposed vector sets.

Authors: Boris Steipe
Journal: Acta Crystallogr A Date: 2002-09-01 Impact factor: 2.290

2. Design of a novel globular protein fold with atomic-level accuracy.

Authors: Brian Kuhlman; Gautam Dantas; Gregory C Ireton; Gabriele Varani; Barry L Stoddard; David Baker
Journal: Science Date: 2003-11-21 Impact factor: 47.728

3. The Structure Superposition Database.

Authors: Ranyee A Chiang; Elaine C Meng; Conrad C Huang; Thomas E Ferrin; Patricia C Babbitt
Journal: Nucleic Acids Res Date: 2003-01-01 Impact factor: 16.971

4. Flexible structure alignment by chaining aligned fragment pairs allowing twists.

Authors: Yuzhen Ye; Adam Godzik
Journal: Bioinformatics Date: 2003-10 Impact factor: 6.937

5. Rapid calculation of RMSDs using a quaternion-based characteristic polynomial.

Authors: Douglas L Theobald
Journal: Acta Crystallogr A Date: 2005-06-23 Impact factor: 2.290

6. Self-organizing superimposition algorithm for conformational sampling.

Authors: Fangqiang Zhu; Dimitris K Agrafiotis
Journal: J Comput Chem Date: 2007-05 Impact factor: 3.376

7. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path.

Authors: I N Shindyalov; P E Bourne
Journal: Protein Eng Date: 1998-09

8. High-resolution structure prediction and the crystallographic phase problem.

Authors: Bin Qian; Srivatsan Raman; Rhiju Das; Philip Bradley; Airlie J McCoy; Randy J Read; David Baker
Journal: Nature Date: 2007-10-14 Impact factor: 49.962

9. De novo computational design of retro-aldol enzymes.

Authors: Lin Jiang; Eric A Althoff; Fernando R Clemente; Lindsey Doyle; Daniela Röthlisberger; Alexandre Zanghellini; Jasmine L Gallaher; Jamie L Betker; Fujie Tanaka; Carlos F Barbas; Donald Hilvert; Kendall N Houk; Barry L Stoddard; David Baker
Journal: Science Date: 2008-03-07 Impact factor: 47.728

10. Matt: local flexibility aids protein multiple structure alignment.

Authors: Matthew Menke; Bonnie Berger; Lenore Cowen
Journal: PLoS Comput Biol Date: 2008-01 Impact factor: 4.475

10 in total

28 in total

1. Simulations of the alternating access mechanism of the sodium symporter Mhp1.

Authors: Joshua L Adelman; Amy L Dale; Matthew C Zwier; Divesh Bhatt; Lillian T Chong; Daniel M Zuckerman; Michael Grabe
Journal: Biophys J Date: 2011-11-15 Impact factor: 4.033

2. ART-RRT: As-Rigid-As-Possible search for protein conformational transition paths.

Authors: Minh Khoa Nguyen; Léonard Jaillet; Stéphane Redon
Journal: J Comput Aided Mol Des Date: 2019-08-21 Impact factor: 3.686

3. Structural features and lipid binding domain of tubulin on biomimetic mitochondrial membranes.

Authors: David P Hoogerheide; Sergei Y Noskov; Daniel Jacobs; Lucie Bergdoll; Vitalii Silin; David L Worcester; Jeff Abramson; Hirsh Nanda; Tatiana K Rostovtseva; Sergey M Bezrukov
Journal: Proc Natl Acad Sci U S A Date: 2017-04-18 Impact factor: 11.205

10. Inhibition of protein interactions: co-crystalized protein-protein interfaces are nearly as good as holo proteins in rigid-body ligand docking.

Authors: Saveliy Belkin; Petras J Kundrotas; Ilya A Vakser
Journal: J Comput Aided Mol Des Date: 2018-07-12 Impact factor: 3.686