Ling-Hong Hung, Michal Guerquin, Ram Samudrala.
Abstract
BACKGROUND: Calculation of the root mean square deviation (RMSD) between the atomic coordinates of two optimally superposed structures is a basic component of structural comparison techniques. We describe a quaternion-based method, GPU-Q-J, that is stable with single precision calculations and suitable for graphics processing units (GPUs). The application was implemented on an ATI 4770 graphics card in C/C++ and Brook+ in Linux, where it was 260 to 760 times faster than existing unoptimized CPU methods. Source code is available from the Compbio website http://software.compbio.washington.edu/misc/downloads/st_gpu_fit/ or from the author LHH.
Year: 2011 PMID: 21453553 PMCID: PMC3087690 DOI: 10.1186/1756-0500-4-97
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Figure 1. GPU-Q-J RMSD calculation procedure. The methodology used to calculate RMSDs on GPUs is shown. The Cartesian coordinates of the two proteins are reshuffled into 4-vectors. This allows the use of built-in dot product operations for the calculation of the covariance matrix. Because we do not center the coordinates beforehand so that their barycenters are at the origin, a second term involving the mean values of the coordinates must be subtracted. By combining the two steps, we avoid an expensive extra fetch of coordinates. Optimized 4-vector summation is used to calculate the coordinate means. The values of the covariance matrix are maintained as double precision, but the 4 × 4 matrix passed to the cyclic-Jacobi routine is single precision. This compromise increases the accuracy in some degenerate cases without sacrificing speed, as the vast majority of calculations take place as 4-vector single precision operations. The final value of RMSD obtained is identical to that obtained by the CPU methods to at least 2 decimal places.
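The procedure in Figure 1 can be illustrated with a minimal CPU sketch in plain Python. It is not the authors' Brook+ code: the function names are hypothetical, the 4-vector packing and the mixed single/double precision are not reproduced (everything here is double precision), and the 4 × 4 symmetric matrix is assumed to follow the standard quaternion (Horn) construction, whose largest eigenvalue gives the minimal RMSD. What the sketch does show is the covariance matrix accumulated from uncentered coordinates with the mean term subtracted afterwards, followed by a cyclic Jacobi diagonalization.

```python
import math

def covariance_uncentered(x, y):
    """3x3 covariance from raw coordinates; the mean term is subtracted afterwards."""
    n = len(x)
    mx = [sum(p[i] for p in x) / n for i in range(3)]
    my = [sum(q[i] for q in y) / n for i in range(3)]
    # R[i][j] = sum_k x[k][i] * y[k][j] - n * mean_x[i] * mean_y[j]
    r = [[sum(p[i] * q[j] for p, q in zip(x, y)) - n * mx[i] * my[j]
          for j in range(3)] for i in range(3)]
    # Centered sums of squares of both structures, corrected the same way.
    g = (sum(c * c for p in x for c in p) - n * sum(m * m for m in mx)
         + sum(c * c for q in y for c in q) - n * sum(m * m for m in my))
    return r, g

def key_matrix(r):
    """Symmetric 4x4 matrix of the quaternion method (assumed Horn convention)."""
    (xx, xy, xz), (yx, yy, yz), (zx, zy, zz) = r
    return [[xx + yy + zz, yz - zy, zx - xz, xy - yx],
            [yz - zy, xx - yy - zz, xy + yx, zx + xz],
            [zx - xz, xy + yx, -xx + yy - zz, yz + zy],
            [xy - yx, zx + xz, yz + zy, -xx - yy + zz]]

def jacobi_max_eigenvalue(m, sweeps=50, tol=1e-20):
    """Largest eigenvalue of a symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in m]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[p][q] * a[p][q] for p in range(n) for q in range(p + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return max(a[i][i] for i in range(n))

def rmsd_quaternion_jacobi(x, y):
    """RMSD after optimal superposition: sqrt((G - 2 * lambda_max) / n)."""
    r, g = covariance_uncentered(x, y)
    lam = jacobi_max_eigenvalue(key_matrix(r))
    return math.sqrt(max(0.0, (g - 2.0 * lam) / len(x)))
```

As a usage example, `rmsd_quaternion_jacobi([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (2, 0, 0)])` evaluates to 0.5 (up to rounding), and a rigidly rotated and translated copy of a structure gives a value close to zero. Neither input needs to be centered first, because the mean term is folded into the covariance step.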
Figure 2. Acceleration of RMSD calculations. Four CPU implementations of different RMSD calculation algorithms, Quaternion Characteristic Polynomial (Q-CP), Quaternion Power (Q-P), Quaternion Jacobi (Q-J) and Rotational (Rot), were compared against our GPU implementation of the Q-J algorithm (GPU-Q-J). Six different datasets from NRW, comprising predictions of protein structures ranging from 70 to 140 residues in size, were used as the test set. The time required to calculate the RMSDs for each pair of structures in the ensemble is displayed relative to the time required by the GPU implementation. Numbers on the right indicate the average for the 6 datasets. The results show that for ensembles of greater than 1000 structures, the overhead in setting up the GPU algorithm becomes negligible. There is a 260-fold increase in speed over the fastest CPU implementation. This increase in speed allows large ensembles of structures to be clustered quickly and relieves a major bottleneck in processing the results from NRW.