Shihua Zhang, Daven Vasishtan, Min Xu, Maya Topf, Frank Alber.
Abstract
MOTIVATION: Single-particle cryo-electron microscopy (cryoEM) typically produces density maps of macromolecular assemblies at intermediate to low resolution (approximately 5-30 Å). By fitting high-resolution structures of assembly components into these maps, pseudo-atomic models can be obtained. Optimizing the quality-of-fit of all components simultaneously is challenging because the search space is so large that an exhaustive search over all possible component configurations is computationally infeasible.
Year: 2010 PMID: 20529915 PMCID: PMC2881386 DOI: 10.1093/bioinformatics/btq201
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Our protocol for simultaneous fitting of component structures into density maps is divided into two stages. In the first stage, approximate positions of all components are determined at a coarse information level by our IQP point matching approach (upper grey shading). By varying the initial parameter settings, an ensemble of solutions is generated. In the second stage, all candidate structures are assessed and structurally refined using the initial density map and the density of the component structures simulated at the same resolution (lower grey shading).
Fig. 2. An illustration of the feature point matching procedure. The goal is to match simultaneously the point sets of Component 1 and Component 2 with the Assembly point set. All point sets are shown as spheres, where the size of a sphere represents the averaged density value in a defined volume of the density map, which is within five grid voxels of the corresponding feature point. The dashed lines between the spheres represent all possible distances within each component and within the assembly. $a_{ik}$ and $b_{jl}$ are the distances between points i and k in Component 1 and points j and l in the Assembly, respectively. The value of the binary variable $x_{ij}$ is set to 1 if point i in Component 1 matches with point j in the Assembly and is set to 0 otherwise. Correspondingly, the product $x_{ij} x_{kl}$ is 1 if distance $a_{ik}$ in Component 1 matches with distance $b_{jl}$ in the Assembly and is 0 otherwise. The aim of IQP is to find the best matching x with maximized IQP score F(U1, U2, V1, V2) (see Section 2).
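The matching idea in the Fig. 2 caption can be illustrated with a small brute-force sketch. This is not the authors' IQP formulation: the actual score F is defined in Section 2 of the paper and is not reproduced here. As a stand-in, the hypothetical `match_score` below simply counts how many within-component distances (between points i and k) are reproduced, within an assumed tolerance `tol`, by the distances between the assembly points j and l they are mapped to. The function names and the tolerance value are illustrative assumptions.

```python
import itertools
import math


def pairwise(points):
    """Return {(i, k): distance} for all ordered index pairs with i != k."""
    return {
        (i, k): math.dist(points[i], points[k])
        for i in range(len(points))
        for k in range(len(points))
        if i != k
    }


def match_score(assignment, comp_d, asm_d, tol):
    """Count component distances reproduced by the matched assembly distances.

    assignment[i] = j plays the role of the binary variable for "point i
    matches point j". A pair (i, k) contributes 1 when both points are
    matched and the two distances agree within tol (an assumed criterion,
    not the paper's score F).
    """
    score = 0
    for i, k in itertools.combinations(range(len(assignment)), 2):
        j, l = assignment[i], assignment[k]
        if abs(comp_d[(i, k)] - asm_d[(j, l)]) <= tol:
            score += 1
    return score


def best_match(component, assembly, tol=0.5):
    """Exhaustively try every one-to-one mapping of component points
    onto assembly points and keep the highest-scoring one."""
    comp_d, asm_d = pairwise(component), pairwise(assembly)
    best = max(
        itertools.permutations(range(len(assembly)), len(component)),
        key=lambda a: match_score(a, comp_d, asm_d, tol),
    )
    return best, match_score(best, comp_d, asm_d, tol)


# Toy data: a 3-4-5 triangle hidden (translated) among five assembly points.
component = [(0, 0, 0), (3, 0, 0), (0, 4, 0)]
assembly = [(50, 0, 0), (10, 14, 10), (10, 10, 10), (13, 10, 10), (-30, 5, 5)]
mapping, score = best_match(component, assembly)
```

Exhaustive enumeration like this grows factorially with the number of points, which is the search-space problem the abstract describes; the sketch is only meant to make the roles of the match variables and distance comparisons concrete.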
Summary of benchmark results
| Assembly | Comp. | Sym. | Feat. Points | Time (s) (total time in min) | Lowest RMSD structure: CCF rank (Lapl.-CCF rank) | Lowest RMSD structure: CPS (Å, °) | Lowest RMSD structure: RMSD | Lowest RMSD structure: RMSD* | Best CCF ranking structure: CPS (Å, °) | Best CCF ranking structure: RMSD | Best CCF ranking structure: RMSD* |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1DOR | 2 | Y | 10 | 0.16 (1.33) | 2 (1) | (1.1, 6.8) | 2.1 | 1.1 | (0.6, 9.5) | 2.5 | 1.2 |
| 1AFW | 2 | Y | 10 | 0.15 (1.25) | 2 (1) | (2.3, 14.4) | 4.8 | 0.9 | (2.5, 15.0) | 4.9 | 0.9 |
| 1PC8 | 2 | N | 10 | 0.10 (0.83) | 6 (10) | (1.1, 3.1) | 1.3 | 0.5 | (0.8, 6.4) | 1.6 | 0.5 |
| 1TX4 | 2 | N | 11 | 0.14 (1.17) | 8 (6) | (1.2, 2.8) | 2.6 | 0.4 | (0.7, 2.9) | 3.0 | 0.4 |
| 1NIC | 3 | Y | 15 | 0.65 (5.42) | 1 (1) | (5.6, 5.1) | 5.9 | 1.1 | (5.6, 5.1) | 5.9 | 1.1 |
| 1CS4 | 3 | N | 11 | 0.16 (1.33) | 8 (7) | (2.4, 24.0) | 6.5 | 1.8 | (2.3, 55.5) | 12.8 | 11.7 |
| 2DQJ | 3 | N | 12 | 0.20 (1.67) | 34 (11) | (2.0, 21.1) | 4.5 | 1.7 | (1.4, 62.1) | 9.5 | 7.8 |
| 1F1X | 4 | Y | 12 | 0.42 (3.50) | 2 (18) | (2.4, 14.6) | 4.6 | 0.9 | (2.3, 168.4) | 28.2 | 26.1 |
| 2BO9 | 4 | N | 18 | 0.75 (6.25) | 1 (1) | (1.1, 4.6) | 1.7 | 1.1 | (1.1, 4.6) | 1.7 | 1.1 |
| 2REC | 6 | Y | 30 | 2.56 (21.33) | 1 (1) | (1.3, 4.2) | 1.7 | 1.0 | (1.3, 4.2) | 1.7 | 1.0 |
| 1J2P | 7 | Y | 28 | 2.48 (20.67) | 1 (3) | (1.6, 16.2) | 4.4 | 1.5 | (1.6, 16.2) | 4.4 | 1.5 |
The individual columns are: Assembly, the PDB ID (Bernstein et al., 1977) of the assembly structure being used; Comp., the number of components of the assembly; Sym., whether the assembly structure is symmetric (Y) or non-symmetric (N); Feat. Points, the number of feature points being used; Time, the average running time for an IQP run, with the total time of 500 IQP runs shown in brackets; CCF (Lapl.-CCF), the rank of the structure with the lowest RMS error based on the CCF and Laplacian CCF, respectively; CPS, the component placement score, composed of two elements (the shift and the orientation), averaged over all components; RMSD, the root-mean-square error (RMS error) between the corresponding Cα atoms in the fitted and the native structures; RMSD*, the RMS error of the assembly after wICP refinement.
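To make two of the column definitions concrete, here is a minimal sketch of how an RMS error over corresponding atoms and the bracketed total time could be computed. It assumes the fitted and native coordinates are already listed in the same order and in the same reference frame (no superposition is performed), and that the bracketed time is simply 500 times the average run time converted to minutes. The helper names `rmsd` and `total_minutes` are hypothetical, not taken from the paper.

```python
import math


def rmsd(fitted, native):
    """Root-mean-square error between corresponding 3D points.

    Assumes both lists hold the same atoms in the same order and frame.
    """
    if len(fitted) != len(native) or not fitted:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(fitted, native))
    return math.sqrt(squared / len(fitted))


def total_minutes(seconds_per_run, runs=500):
    """Total time in minutes for a number of runs at a given average run time."""
    return seconds_per_run * runs / 60.0


# Average IQP run times taken from the 1DOR and 2REC rows of the table.
print(round(total_minutes(0.16), 2))
print(round(total_minutes(2.56), 2))
```

With the table's average run times of 0.16 s and 2.56 s, this conversion gives values that round to the bracketed totals reported for 1DOR and 2REC.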
Fig. 4. Simultaneous fitting of six components into the symmetric hexamer 2REC. (A) The dependency of fitting accuracy, measured by the RMS error (RMSD), on the number of feature points M of the assembly. (B) The dependency of running time (in seconds) on the number of feature points M of the assembly. (A) and (B) are calculated using density maps at 20 Å resolution. (C) The fitting accuracy at different resolutions of the initial density map. The calculations for (C) are performed with 30 feature points per assembly. Results are shown after IQP fitting (IQP) and after additional refinement with wICP (IQP*).
Comparison of the performance between IQP and GMFIT fitting
| Assembly | Comp. | GMFIT Time (s) | GMFIT RMSD | IQP Time (s) | IQP RMSD | IQP RMSD* |
|---|---|---|---|---|---|---|
| 1AFW | 2 | 7.1 | 1.0 | 0.15 | 4.8 | 0.9 |
| 1NIC | 3 | 16.0 | 1.8 | 0.65 | 5.9 | 1.1 |
| 2REC | 6 | 110.9 | 2.3 | 2.56 | 1.7 | 1.0 |
RMS errors between the fitted and native structures are shown for three different assemblies. GMFIT used 16 GDFs (Gaussian distribution functions) to represent the atomic structures of each component and 12 GDFs to represent the density map of the assembly (Kawabata, 2008). The IQP model used 5 feature points per component for each assembly. The individual columns are: Assembly, the PDB ID (Bernstein et al., 1977) of the assembly structure being used; Comp., the number of components of the assembly; RMSD, RMS error between the corresponding Cα atoms in the fitted and assembly structures; RMSD*, the RMS error after refinement.