Agnel Praveen Joseph1, Ingvar Lagerstedt2, Arjen Jakobi3, Tom Burnley1, Ardan Patwardhan2, Maya Topf4, Martyn Winn1. 1. Scientific Computing Department, Science and Technology Facilities Council, Research Complex at Harwell, Didcot OX11 0FA, United Kingdom. 2. European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom. 3. Kavli Institute of Nanoscience Delft (KIND), Department of Bionanoscience, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, The Netherlands. 4. Institute of Structural and Molecular Biology, Department of Biological Sciences, Birkbeck College, University of London, Malet Street, London WC1E 7HX, United Kingdom.
Abstract
Cryogenic electron microscopy (cryo-EM) is a powerful technique for determining structures of multiple conformational or compositional states of macromolecular assemblies involved in cellular processes. Recent technological developments have led to a leap in the resolution of many cryo-EM data sets, making atomic model building more common for data interpretation. We present a method for calculating differences between two cryo-EM maps or a map and a fitted atomic model. The proposed approach works by scaling the maps using amplitude matching in resolution shells. To account for variability in local resolution of cryo-EM data, we include a procedure for local amplitude scaling that enables appropriate scaling of local map contrast. The approach is implemented as a user-friendly tool in the CCP-EM software package. To obtain clean and interpretable differences, we propose a protocol involving steps to process the input maps and output differences. We demonstrate the utility of the method for identifying conformational and compositional differences including ligands. We also highlight the use of difference maps for evaluating atomic model fit in cryo-EM maps.
Over
the past few years, cryogenic electron microscopy (cryo-EM)
has had an enormous impact on the structure determination of large
and dynamic molecular machines. Better detectors and algorithms for
three-dimensional structure reconstruction from images have helped
in achieving near atomic resolutions. There has been a large influx
of structures solved using cryo-EM in the central repository—the
Electron Microscopy Data Bank (EMDB, https://www.ebi.ac.uk/pdbe/emdb/statistics_main.html/)—and this is expected to rise dramatically in the coming
years. The lack of validation methods and guidelines to deal with
this data has been realized, and efforts are underway to address this.[1−3]

Cryo-EM enables structure determination of different functional
forms of biological macromolecules in the near-native state.[4] Comparison of individual forms gives insights
into the biological pathway of the molecule. In some cases, new (different
state or conformation) cryo-EM structures are compared to existing
ones to understand structural and functional differences. Usually
difference maps are calculated for such comparisons, and the maps
are scaled to an equivalent density range prior to such calculations.
Approaches for global density scaling exist; e.g., Relion[5] (relion_image_handler), EMAN2[6] (e2proc3d), diffmap (http://grigoriefflab.janelia.org/diffmap), and BSoft[7] (bscale) work by scaling
amplitudes in each resolution shell of a map to that of a reference
power spectrum (usually based on an atomic model).

Sample heterogeneity
arising from conformational and/or compositional
differences limits the resolution of cryo-EM reconstructions, often
resulting in local anisotropy of data resolution. The periphery of
the macromolecular complex is usually less resolved compared to the
core. Flexible domains or subunits with partial occupancy may be smoothed
out as well. Local scaling of maps has been found useful to improve
interpretation of density features with appropriate scaling estimated
based on local resolution differences.[8] In this approach, a reference power spectrum (of an atomic model)
from a local window is used for scaling the corresponding segment
of the map.

Apart from calculating map–map differences, local scaling
may be appropriate for model–map comparisons as well. A segment
of an atomic model with high B-factors (larger uncertainty in atomic
positions) often relates to poorly resolved areas of the map and hence
scales differently compared to a better resolved segment. Difference
maps are very useful pointers to areas in the map where the atomic
model fit is poor or incomplete. For structure determination using
X-ray crystallography, difference map calculations have been used
regularly for ligand identification and fixing atomic model fits in
density.

In this study, we implement a generic approach for calculating
difference densities for cryo-EM data. The two maps to be compared
are scaled based on Fourier amplitude matching before computing the
difference. The proposed method has the ability to scale maps locally
taking the local density variations into account. For intermediate
resolutions and noisy data, it is often difficult to get clean and
interpretable difference maps. We use map preprocessing steps including
masking, dusting, and filtering before scaling and associate a fractional
difference with each voxel to help interpret the differences. The
protocol presented here is the result of trying several approaches
to obtain clean and interpretable differences. We test its application
for detecting compositional and conformational differences and also
as a tool for validating atomic model fits in maps. We also provide
a user-friendly GUI implementation of this method in the CCP-EM software
package.[9]
Methods
We implemented
a method for calculating difference maps based on
either global or local amplitude scaling. The approach involves the
following steps:

Map preprocessing

To minimize
the effects of background density artifacts on the scaling procedure,
contour thresholds can be selected for the experimental maps, or a
mask may be applied. This step is optional, but for a few cases discussed
in this paper, we noticed density artifacts in the original map which
possibly resulted from use of tight masks during map postprocessing.
For the test cases, we selected a contour threshold of two times sigma
from the background peak. Upon visual inspection, we found that most
of the densities arising from background artifacts are flattened at
this threshold. However, the choice of the threshold level is often
subjective and can vary depending on the density distribution, background
artifacts, and map resolution. For a systematic segregation of molecular
volume and background noise, the local signal with respect to noise
has to be quantified. One of the approaches that deals with the separation
of signal from background noise is the false discovery rate control.[10] This uses a statistical framework to calculate
3D confidence maps whose values (ranging from 0 to 1) correspond to
the confidence that the voxel contains a signal separated from the
background noise. The confidence map can be used as a mask for processing
the map or as a guide to choose a contour threshold for the map. A
graphical interface to this tool is available through the CCP-EM software
suite.

Density values below the threshold were set to zero, and a dust
filter was applied to remove any small disconnected densities that
remained. To this end, the sizes of disconnected densities (in number
of voxels) are divided into 20 bins. Those density islands that fall
into bins having a frequency of more than 10% and also having mean
densities within the lower 50% of the density range are removed.

To minimize the effect of sharp contour edges on scaling, the edge
at the selected contour was smoothed by convolution with a Gaussian
kernel. We used the implementation of the n-dimensional Gaussian filter
in SciPy[11] with a sigma of 1 (radius of
the filter kernel is four times sigma) to smooth the edges at the
contour threshold. This results in a soft mask applied to the map,
where the density values within the contour are not altered, and voxels
at the edge are affected by this filter to obtain a smoother falloff
to zero.

For calculating the difference between a map and model, a simulated
map was calculated from the atomic model using Refmac5,[12] which uses electron scattering factors and considers
the map resolution and atomic B-factors to generate density.[12]

Low pass filtering

For calculating
differences between experimental maps resolved at different resolutions,
the maps are low-pass filtered to the lower resolution of the two
maps using a hyperbolic tangent (tanh) filter (in TEMPy[13]), which is similar to the tanh filter
in EMAN2.[6]

Scaling

The amplitude scaling
can be performed either globally or locally over sliding windows.

For global amplitude scaling, the whole map grid is used for the
calculation of the power spectra, whereas for local scaling, a grid
based on a local moving window is used. The local scaling procedure
follows the implementation used in LocScale,[8] which performs local scaling based on a reference amplitude spectrum.
As in the case of LocScale, a default window size which is seven times
the map resolution was used. The scaling calculation is used to update
the value assigned to the central voxel of the window.

For a given map, the amplitudes in each resolution shell are scaled
by the square root of the ratio of the average intensities of both
maps to the intensity of that map in that shell:

$$FT_{1}^{sc} = FT_{1} \sqrt{\frac{(I_{1} + I_{2})/2}{I_{1}}}$$

where FT1sc is the scaled Fourier
term in a given shell for map 1, FT1 is the initial Fourier
term in the shell, and I1 and I2 are the average intensities (square of amplitudes) in
the shell for map1 and map2, respectively. Map 2 is scaled in an analogous
manner.

When the difference is calculated between a map and
an atomic model,
the amplitudes of the map simulated from the model are used as the
reference for scaling, by default. This is under the assumption that
the map simulated from the atomic model is noise free and gives a
reasonable representation of features at this resolution (of the experimental
map). In this case, the map amplitudes are scaled by the square root
of the ratio of the average intensity (rotationally averaged) of the
model-derived map (I2) to the average
intensity of the map (I1) in that shell:

$$FT_{1}^{sc} = FT_{1} \sqrt{\frac{I_{2}}{I_{1}}}$$

The map from the atomic model is not scaled. The reference-based
scaling can be overridden by changing the default option, especially
for cases where the atomic model is partial or not fitted well in
the map.

The differences between the scaled
maps are calculated in real space, giving absolute map–map
or map–model difference maps.

To interpret the differences, we also
calculate the fractional differences with respect
to the scaled maps. For each voxel,

$$D_{f,1-2} = \frac{D_{1-2}}{\rho_{1}}$$

where Df,1–2 is the fractional difference, D1–2 is the density difference between
map1 and map2, and ρ1 is the density
of scaled map1. A similar computation
can be used for calculating the extent of the difference with respect
to map2, for D2–1. Because of this
weighting, Df,1–2 is not the negative of Df,2–1. In assessing differences, it is useful to
look at the positive regions of D1–2 or D2–1 and quantify the significance
using Df,1–2 and Df,2–1.

The fractional difference maps are useful
guides to interpret differences.
A suitable threshold of fractional difference can be used to mask
the difference maps. A lower threshold (e.g., 0.25) removes any insignificant
differences arising from noise. On the other hand, a higher threshold
(e.g., 0.5) shows areas of large differences. To further clean the
differences, a dust filter can be applied on the masked difference
map to remove small isolated densities around the masked difference
map.
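The fractional difference and the masking steps described above can be illustrated with a short NumPy/SciPy sketch. This is a minimal illustration under stated assumptions, not the CCP-EM or TEMPy implementation: the function names, the `eps` guard against division by near-zero background, and the simple island-size cutoff `min_voxels` (used here in place of the binned dust filter described under map preprocessing) are all hypothetical choices. Both inputs are assumed to be already scaled maps sampled on the same grid.

```python
import numpy as np
from scipy import ndimage


def fractional_difference(map1, map2, eps=1e-6):
    """Return (D1-2, Df,1-2) for two scaled maps on the same grid."""
    diff = map1 - map2
    frac = np.zeros_like(diff, dtype=float)
    inside = map1 > eps  # assumption: skip (near-)zero background voxels
    frac[inside] = diff[inside] / map1[inside]
    return diff, frac


def mask_and_dust(diff, frac, frac_threshold=0.25, min_voxels=5):
    """Keep positive differences above the fractional threshold,
    then remove small disconnected islands (simplified dust filter)."""
    masked = np.where((diff > 0) & (frac >= frac_threshold), diff, 0.0)
    labels, n_islands = ndimage.label(masked > 0)
    # Island sizes in voxels; label 0 is background and is skipped.
    sizes = np.bincount(labels.ravel())[1:]
    small = np.flatnonzero(sizes < min_voxels) + 1
    masked[np.isin(labels, small)] = 0.0
    return masked
```

Calling `fractional_difference(map2, map1)` gives D2–1 and Df,2–1 in the same way. Because each is normalized by a different map, the two fractional maps are not simply negatives of each other.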
Results and Discussion
Map–Map Comparison
We applied
the difference
map approach to the following cases to test the method and identify
compositional and conformational differences.
Strychnine-Bound vs Glycine-Bound GlyR
A glycine receptor
is a ligand-gated channel receptor that opens a chloride-permeable
pore leading to inhibition of neuronal firing in the spinal cord and
brain stem.[14,15] It controls a wide range of motor
and sensory functions including vision and audition. Strychnine is
a complex alkaloid which is a potent receptor antagonist that binds
to the canonical intersubunit neurotransmitter site and locks the
receptor in the closed state.[16] Glycine
binds at the same site but induces channel opening, allowing permeation
of chloride ions. Ivermectin is an unconventional agonist of the GlyR
that activates GlyR, potentiates response to glycine,[17] and triggers the open conformation.

The structures
of strychnine- and ivermectin/glycine-bound forms of GlyR (alpha-1
isoform) were determined at 3.9 Å (EMD-6344) and 3.8 Å (EMD-6346)
resolutions, respectively, using cryo-EM.[18] The structures have a five-fold symmetry around the pore axis. We
calculated the difference density using global amplitude scaling between
the strychnine- and ivermectin/glycine-bound forms of GlyR (Figure 1A,B). The maps were
not preprocessed. To assess the differences, we used a comparison
of the atomic models for the two forms built on the maps and also
the crystal structures of strychnine-bound (PDB ID: 5CFB) and ivermectin-bound
(PDB ID: 5VDH) GlyR (alpha-3 isoform).[19]
Figure 1
GlyR receptor.
(A) Global scaling-based density difference between
strychnine (EMD-6344)- and ivermectin/glycine (EMD-6346)-bound forms
of GlyR (alpha-1 isoform). The difference map (D1–2) is shown in gray, and the backbone of the atomic
model (ribbon) associated with the strychnine-bound map (PDB ID: 3JAD) is colored based
on the fractional difference Df,1–2 (averaged over voxels covered by each amino acid). Individual atoms
of the strychnine molecule (ball and stick representation) and the
bound sugars (stick representation) are colored based on Df,1–2. (B) Density difference between the ivermectin/glycine
(EMD-6346)- and strychnine (EMD-6344)-bound forms. The atomic model
associated with the ivermectin-bound map (PDB ID: 3JAF) is colored based
on the fractional difference Df,2–1 averaged over voxels covered by each amino acid. Individual atoms
of the ivermectin molecule (ball and stick representation) and the
bound sugars (stick representation) are colored based on Df,2–1. The difference map (D2–1) is in yellow. The insets between panels A and B
show differences at the strychnine and ivermectin binding sites (zoomed
in). (C) Comparison of crystal structures of strychnine (PDB ID: 5CFB)- and ivermectin-bound
(PDB ID: 5VDH) GlyR (alpha-3 isoform). The structure of strychnine-bound GlyR
is shown, colored based on the distance between backbone C-alpha atoms
in the two forms. (D) Local scaling-based density difference between
strychnine (EMD-6344)- and ivermectin/glycine (EMD-6346)-bound forms
of GlyR (alpha-1 isoform). The difference map (D1–2) is shown in gray, and the backbone of the atomic
model associated with the map (PDB ID: 3JAD) is colored based on the fractional difference Df,1–2. Individual atoms of the strychnine
molecule (ball and stick representation) and the bound sugars (stick
representation) are colored based on Df,1–2. (E) Local scaling-based density difference between the ivermectin/glycine
(EMD-6346)- and strychnine (EMD-6344)-bound forms. The atomic model
associated with the ivermectin-bound map (PDB ID: 3JAF) is colored based
on the fractional difference Df,2–1. The difference map D2–1 is in
yellow. Individual atoms of the ivermectin molecule (ball and stick
representation) and the bound sugars (stick representation) are colored
based on Df,2–1. The insets between
panels D and E show differences at the strychnine and ivermectin
binding sites (zoomed in). (F) Crystal structure of strychnine (PDB
ID: 5CFB)-bound
GlyR (alpha-3 isoform), colored based on the atomic B-factor distribution
(averaged over atoms in each amino acid residue).
Difference Based on Global Scaling
The locations of
strychnine and ivermectin were identified as difference densities
(Figure 1A,B). The
atomic models in Figure 1A and B corresponding to the two GlyR states are colored by the Df,1–2 and Df,2–1 values, respectively. A clear difference density was observed for
strychnine at the intersubunit site between the extracellular domains.
The fractional difference averaged over the voxels of the binding
site is Df,1–2 ∼ 0.49, which
is less than 1.0 due to residual density in the ivermectin-bound form
arising mainly from the background and conformational changes in the
surrounding protein. Ivermectin density on the other hand was found
at the subunit interface between transmembrane domains. The difference
density was relatively less prominent (Df,2–1 ∼ 0.30) compared to that of strychnine. The C-terminal segment
of ivermectin is exposed to the membrane layer and is associated with
high B-factors (>100 Å², PDB ID: 5VDH) suggesting greater
flexibility.

The conformational changes between the closed strychnine-bound
and open/activated ivermectin-bound forms of GlyR are also captured
as differences. We compared the difference density against the differences
between crystal structures (alpha-3 isoform) of the two ligand-bound
forms (Figure 1C).
The differences generally agree and are more prominent in the transmembrane
domain. The differences also reflect the differences in the mechanism
of action of the ligands. In the glycine/ivermectin-bound form, the
intracellular halves of the transmembrane helices move closer to each
other compared to the extracellular half which is wider (Figure S1). In contrast, the pore in the strychnine-bound
form is constricted and rather perpendicular to the membrane. The
helices in the intracellular domain that bind the pore axis undergo
a larger tilt and clockwise rotation compared to the glycine/ivermectin-bound
form.[18,19]
Difference Based on Local Scaling
The local amplitude
scaling approach uses only a local window segment of the map at a
time to calculate amplitude spectra and the associated scaling factors
(see Methods). Hence, local contrast differences
can be accounted for in the scaling procedure and difference calculation.
To assess this advantage, we compared the difference densities from
local and global scaling approaches for the glycine receptor.

The B-factor distribution suggests that the intracellular half of
the transmembrane domain and the tip of the extracellular domain of
GlyR are more dynamic relative to the rest of the structure
(Figure 1F and Figure S2). We calculated difference maps between
the strychnine- and ivermectin/glycine-bound forms of GlyR based on
local scaling. The differences corresponding to the flexible segments
are relatively less pronounced (compared to differences from global
scaling), reflecting an appropriate contrast for the flexible segments
(Figure 1D). The difference
map also shows more features in the regions with lower B-factors,
especially for the interface between extracellular and transmembrane
domains (Figure 1D,E).
The difference density corresponding to the C-terminal segment of
ivermectin is more evident as well in the locally scaled difference
map (Figure 1E inset).
The fractional difference Df,1–2 averaged over voxels covered by strychnine is about 0.31, while
the voxels covered by ivermectin have an average fractional difference Df,2–1 ∼ 0.24.

Hence, the
local scaling procedure enables differential scaling
depending on the signal in the windowed region. The distribution of
the difference density is altered accordingly, enhancing differences
in areas associated with smaller uncertainty.
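The shell-wise amplitude matching that underlies both the global and the local differences compared above can be sketched in NumPy as follows. This is an illustrative sketch, not the TEMPy, LocScale, or CCP-EM code. It assumes a cubic grid with isotropic voxels, it reads "the average intensities of both maps" in Methods as the mean of I1 and I2 in each shell, and all function names are hypothetical. With `reference=True`, map 1 is scaled to the shell intensities of map 2 (for example, a model-derived map), which is left unscaled.

```python
import numpy as np


def shell_indices(shape):
    """Integer resolution-shell index for each Fourier voxel.
    Assumes a cubic grid so that index radius maps to resolution."""
    freqs = np.meshgrid(*[np.fft.fftfreq(n) * n for n in shape], indexing="ij")
    radius = np.sqrt(sum(f ** 2 for f in freqs))
    return np.rint(radius).astype(int)


def shell_mean_intensity(ft, shells):
    """Mean intensity (squared amplitude) in each resolution shell."""
    counts = np.bincount(shells.ravel())
    sums = np.bincount(shells.ravel(), weights=(np.abs(ft) ** 2).ravel())
    return sums / np.maximum(counts, 1)


def match_amplitudes(map1, map2, reference=False):
    """Scale Fourier amplitudes shell by shell before differencing."""
    ft1, ft2 = np.fft.fftn(map1), np.fft.fftn(map2)
    shells = shell_indices(map1.shape)
    i1 = shell_mean_intensity(ft1, shells)
    i2 = shell_mean_intensity(ft2, shells)
    # Target intensity per shell: map 2 alone, or the mean of both maps.
    target = i2 if reference else 0.5 * (i1 + i2)
    with np.errstate(divide="ignore", invalid="ignore"):
        s1 = np.where(i1 > 0, np.sqrt(target / i1), 1.0)
        s2 = np.where(i2 > 0, np.sqrt(target / i2), 1.0)
    scaled1 = np.fft.ifftn(ft1 * s1[shells]).real
    scaled2 = map2 if reference else np.fft.ifftn(ft2 * s2[shells]).real
    return scaled1, scaled2
```

For global scaling the function would be called once on the full grids. A local variant would call it on each sliding window (seven times the map resolution by default, as in LocScale) and keep only the scaled value of the central voxel, which is considerably slower than the global call.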
MKLP2 ADP-AlFx vs Non-Nucleotide State
MKLP2 is a kinesin-6
family motor protein that has important roles in different stages
of cell division.[20,21] Structural characterization of
the microtubule-bound MKLP2 motor domain at different stages of its
ATPase cycle provided insights into its function and divergence from
other kinesins.[22] Among different conformational
states, the structure of the ADP-AlFx (ATP analogue)-bound form of
the kinesin-6 (MKLP2) motor domain was solved at 4.4 Å resolution
(EMD-3622) and the non-nucleotide state (NN)[22] at a resolution of 6.1 Å (EMD-3621).

Compared to the
previous example, these maps are resolved at lower resolutions, and
there is a mismatch in resolution between the maps we want to compare,
making this a more challenging test of the method. The difference
map approach was applied to compare the conformations of the ADP-AlFx
(ATP analogue)-bound state to that of the non-nucleotide state (NN).

Without any map preprocessing (thresholding/masking, dusting, and
low-pass filtering), the difference map is much noisier with several
disconnected densities (Figure S3A). Without
thresholding and dusting but with low-pass filtering, the difference
is less noisy but has a few small disconnected densities (probable
dust) and broken features for loop11 (Figure S3B). With all preprocessing steps (see Methods), a cleaner difference is obtained (Figure S3C).

The location of
ADP-AlFx was observed as a density difference unoccupied by the protein
model at the nucleotide-binding pocket (Figure 2A) (Df,1–2 ∼ 0.73). Significant differences were also observed in the
vicinity of the nucleotide indicating structural rearrangements upon
binding.
Figure 2
Microtubule-bound MKLP2. (A) Global scaling-based density difference
(gray) between ADP-AlFx (ATP analogue)-bound and non-nucleotide (NN)
states of the kinesin-6 (MKLP2) motor domain. The backbone of the atomic
model built on the ADP-AlFx-bound map is colored by Df,1–2 values (averaged over voxels covered by each
amino acid). Different structural segments of the MKLP2 motor domain
are labeled. Atoms of ADP-AlFx (stick representation) are colored
based on Df,1–2. (B) Atomic model
built on the ADP-AlFx-bound map is colored based on backbone C−α
distances between the models built in the ADP-AlFx (ATP analogue)-bound
and non-nucleotide (NN) states of the kinesin-6 (MKLP2) motor domain.
(C) Local scaling-based density difference (gray) between ADP-AlFx
(ATP analogue)-bound and non-nucleotide (NN) states of the kinesin-6
(MKLP2) motor domain. The atomic model built on the ADP-AlFx-bound
map is colored by Df,1–2 values.
The region of loop6 where the density difference is less prominent
is indicated with an arrow. Atoms of ADP-AlFx (stick representation)
are colored based on Df,1–2.
To assess the conformational difference, we checked
the agreement
of the difference density with the spatial differences in the coordinates
of models fitted in the maps. The model segments associated with significant
spatial differences agree well with the density differences between
respective maps (Figure 2B). The atomic models fitted in intermediate resolution maps are
likely to be more error prone than those built in a high resolution
map. Hence, the map–map differences may reflect a more reliable
comparison of the two states of MKLP2. Nevertheless, we used the models
to identify any significant changes and only used the backbone C−α
atoms (more reliable than side chains at these resolutions) to calculate
distances between the models. Also, we compare the differences to
the changes observed across other kinesins during the ATPase cycle
(see below).

It is observed that the structural segments around the nucleotide
binding site (e.g., loop 9, loop 11, and N-term helix-α4) are
more stable in the ADP-AlFx-bound state (ADP.Pi-like).[22] In addition, loop6 forms a separate subdomain
in kinesin-6[23] and is better resolved in
the ADP-AlFx-bound map. Secondary structure prediction for the sequence
of this loop suggested the presence of helices[22] which is also evident in the helical densities in the difference
(Figure 2A). Coordinated
movements of structural segments are observed during the microtubule-bound
ATPase cycle and in the transition from the NN to ADP-AlFx state;
the P-loop and alpha-3/loop9/loop11 segments move toward the catalytic
site.[24] These segments are also associated
with difference densities. Similar subdomain rearrangements were also
reported for other well-studied kinesins.[24,25]

The difference map
calculated after local scaling (Figure 2C) had a similar profile compared to the global scaling-based
difference. A more localized density for the nucleotide analogue (ADP-AlFx)
was obtained with the local scaling-based difference, and part of
the differences corresponding to loop6 was less prominent (highlighted
in the figure). The voxels covered by ADP-AlFx are associated with
an average fractional difference Df,1–2 ∼ 0.60.

The local scaling-based difference is associated
with a relatively narrow range of fractional difference values compared
to that of global scaling. This can be observed while comparing the Df value-based coloring of atomic models discussed
in the cases above (Figures 1 and 2). As the scale of the amplitude
falloff is optimized locally, the local scaling procedure minimizes
oversharpening and overblurring of parts of the map that might otherwise
result from global scaling (due to local resolution variation). The
range of Df values over a structure narrows
as the window size for local scaling decreases. Df values around ligands are also suppressed with small
window sizes but remain significantly above the rest of the structure.
Model Validation Using Difference Maps
Atomic model
building and refinement in maps of resolutions worse than 3.0 Å
can be challenging. Moreover, local regions of cryo-EM maps often
have relatively lower resolutions associated with larger uncertainty.
We tested the difference map approach as a tool to identify errors
in the atomic model based on differences with the density.

We used the 3.2 Å hemoglobin map in the nonfunctional ferric state
(close to relaxed R2 state).[26] The map
was preprocessed with a contour threshold of two times sigma, followed
by application of dust and soft edge filters. An atomic model was
also deposited with the experimental map (PDB ID: 5NI1). The structure
is a heterotetramer made of two alpha and two beta subunits. The alpha
subunit is better resolved in the map than the beta subunit and is
associated with relatively lower B-factors (Figure S4A). Global scaling associates more differences to the beta
subunit compared to the alpha subunit, and the fractional differences
agree overall with the B-factor profile (Figure S4B). Local scaling, however, results in a more uniform distribution
(Figure S4C). Hence, the effect of the
nonuniform local resolution is minimized with local scaling, potentially
making real differences more apparent.

We carried out a few
tests to check whether the local scaling-based
differences are useful for model validation.
Identify Errors Introduced
As described below, we introduced
specific errors in side chains and the backbone of parts of the model
that were otherwise well fitted in the density. The difference map
approach was then applied to check whether these errors could be detected
as differences.

We first altered rotamers of a few side chains
in the model (Figure 3A). The map–model differences were calculated after local
density scaling. The errors associated with side chain fits could
be identified as peaks in the fractional difference maps, suggesting
that the differences can be a useful guide to track such errors, and
this method can be used to assess model fits in maps. As expected,
larger deviations (e.g., K11, W14, N68, and L80) from the true fit
were associated with more pronounced difference densities with misfitted
side chain atoms associated with Df,model-map values greater than 0.5. On the other hand, for subtle changes (e.g.,
H72, L83), displaced atoms were associated with Df,model-map values of about 0.3.
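Per-atom Df values such as those quoted above can be obtained by sampling a fractional difference map at the atom positions. The following is a minimal sketch, not the tool's implementation. It assumes a nearest-voxel lookup, cubic voxels, an array indexed as [z, y, x], and atom coordinates given as (x, y, z) in Å; the function and parameter names are hypothetical.

```python
import numpy as np


def sample_df_at_atoms(df_map, coords_xyz, origin_xyz, voxel_size):
    """Return the fractional difference at the voxel nearest to each atom.

    Assumptions: df_map is indexed [z, y, x], coordinates and origin are in
    Angstrom as (x, y, z), and voxels are cubic with edge `voxel_size`.
    Atoms that fall outside the grid get NaN.
    """
    coords = np.asarray(coords_xyz, dtype=float)
    origin = np.asarray(origin_xyz, dtype=float)
    idx_xyz = np.rint((coords - origin) / voxel_size).astype(int)
    idx_zyx = idx_xyz[:, ::-1]  # reorder to match the array axis order

    shape = np.array(df_map.shape)
    inside = np.all((idx_zyx >= 0) & (idx_zyx < shape), axis=1)

    values = np.full(len(coords), np.nan)
    z, y, x = idx_zyx[inside].T
    values[inside] = df_map[z, y, x]
    return values
```

The returned values could then be written out as per-atom attributes for coloring in a molecular viewer.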
Figure 3
Detecting potential errors
in atomic model fits. (A) Structural
segment of the atomic model (PDB ID: 5ni1) built on the cryo-EM map of hemoglobin
in the nonfunctional ferric state (close to relaxed R2 state) is shown
(yellow). Six residues are labeled where the side chain rotamers were
altered to introduce errors in the fit. The atoms in the altered model
are colored based on the Dmodel-map of local scaling-based difference density between the model and
map. The difference map Dmodel-map is shown as orange mesh, while Dmap-model is shown as solid yellow. (B) Backbone atoms of another segment
of the model are shown where errors were introduced by peptide flips
and carbonyl rotations. The initial atom positions are shown with
thin sticks (green), and the atoms in the mutated model are colored
based on Df of the model–map difference.
(C) Plot of Df,model-map (averaged
over atoms of a residue) vs TEMPy SMOC scores for fit of original
atomic model (PDB ID: 5ni1) to density map. Examples of residues associated with
high Df,model-map (averaged over
atoms of a residue) and low SMOC scores are shown above the plot.
A few potential misfits highlighted by fractional difference but not
by SMOC scores are shown on the right (marked within a circle). The
difference map Dmodel-map is shown
as orange mesh, while Dmap-model is shown as solid yellow. The cryo-EM map associated with the model
(EMD-3488) is shown in transparent gray.
We introduced another set of modeling errors in the backbone of
a helix (Figure 3B)
using peptide flips and changes of phi/psi dihedrals made with
tools in Coot.[27] The misfit atoms were
associated with a difference fraction Df,model-map greater than 0.25, suggesting that the backbone changes are less
prominent, as expected at this resolution. Nevertheless, as routinely
done in crystallography, the difference densities can be used as a
guide to track potential misfits along the protein chain.
Compare against a Density Fit Score
The difference
densities are usually more informative and quite complementary to
the metrics that evaluate the extent of model fit to density. The
positive and negative differences (D1–2 and D2–1) can act as a guide
(by providing directionality) for fixing the models. In another test,
we compared the difference density against the TEMPy SMOC score[28] which gives a cross-correlation analogue (Manders’
overlap coefficient) of the local density fit. For the original atomic
model (PDB ID: 5NI1) without any errors introduced, the average Df,model-map of each residue generally agrees with the
trend of SMOC scores (Figure 3C).
We looked at a few examples of residues associated
with high Df,model-map (averaged
over atoms) and low SMOC scores, reflecting potential errors with
model fit (Figure 3C). The segment involving Gly51 is likely to be mistraced, as the
backbone is out of density. However, not all the residues in this category
are obvious misfits. We also observe cases where the differences
arise from inconsistencies between experimental maps and the theoretical
maps derived from the model. Residues Asp47 and Asp75 have acidic
side chains and lack well-defined densities at the end of their side
chains. The high Df,model-map associated
with the side chain atoms can be accounted for by the fact that the
map generated from the model does not accurately reflect the effects
of factors like atomic charges and radiation damage that affect the
experimental map. Lys56 is another example where the side chain lacks
a well-defined density but has high Df,model-map associated with the side chain atoms. This can be attributed to
the fact that the refined atomic B-factors used in the map calculation
may not accurately account for the dynamics or disorder. Nevertheless,
these differences reflected by high Df,model-map (and low SMOC scores) suggest that the atomic positions in the side
chains of these residues are less reliable.
We also looked at the
residues whose Df,model-map (averaged
over atoms) is greater than 0.3, despite relatively high
SMOC scores (Figure 3C, circled). One or more atoms in most of these residues are associated
with a Df,model-map greater than
0.5. These cases point to areas where the agreement between the residue
backbone and/or side chain and the map density might be poor, either because
of a bad fit (e.g., Pro114) or because the map is poorly resolved in this
region (e.g., Pro5, Thr12).
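The comparison described in this section can be sketched as follows: per-atom Df values are averaged over each residue, and residues with a high average are split by whether their SMOC score is low or high. This is an illustrative sketch only. The 0.3 Df cutoff comes from the text, whereas the `smoc_cutoff` of 0.7 and all function names are hypothetical choices, and the input values would come from the difference map and TEMPy outputs.

```python
from collections import defaultdict


def mean_df_per_residue(atom_records):
    """Average per-atom Df over residues.

    atom_records: iterable of (chain_id, residue_number, df) tuples.
    Returns a dict mapping (chain_id, residue_number) to the mean Df.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for chain, resnum, df in atom_records:
        acc = totals[(chain, resnum)]
        acc[0] += df
        acc[1] += 1
    return {res: total / count for res, (total, count) in totals.items()}


def flag_residues(mean_df, smoc, df_cutoff=0.3, smoc_cutoff=0.7):
    """Split residues with mean Df above df_cutoff by their SMOC score.

    smoc_cutoff is a hypothetical value chosen for illustration; the paper
    does not state a numerical SMOC cutoff.
    Returns (high_df_low_smoc, high_df_high_smoc) as sorted lists.
    """
    low_smoc, high_smoc = [], []
    for res, df in sorted(mean_df.items()):
        if df <= df_cutoff or res not in smoc:
            continue
        if smoc[res] < smoc_cutoff:
            low_smoc.append(res)
        else:
            high_smoc.append(res)
    return low_smoc, high_smoc
```

Residues in the first list are flagged by both measures, while the second list corresponds to the circled cases that only the fractional difference highlights.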
Validate Atomic Models from the Model Challenge
As
a separate test of the applicability of this approach for atomic model
validation, we selected models submitted to the EMDB Model Challenge
2015[29,30] and checked whether the difference maps
can indicate errors in the density fits. We compared models submitted
for the target gamma-secretase map (EMD-3061). The map was preprocessed
with a contour threshold of two times sigma, followed by application
of dust and soft edge filters. We selected a model ranked higher by
different metrics used to evaluate density fit in the model challenge
(see http://model-compare.emdatabank.org/2016/cgi-bin/em_multimer_results.cgi?target_map=T0007emd_3061). We compared this model against another model which was ranked
lower by metrics used in the model challenge. We calculated model–map
differences and compared areas where errors were identified based
on the differences (Figure 4A–D). The differences clearly point to locations where
residues fit poorly in density in the second model compared to the
best ranked model. The poorly fitted atoms are usually associated
with Df,model-map > 0.5. A better
fit was observed in the best model in these regions.
Figure 4
Identifying errors in
atomic model fits. In each panel (A–D),
local segments of two atomic models submitted to the EMDB Model Challenge
2015 for the target gamma-secretase map (EMD-3061) are compared for
fit to density. For each panel, the figure on the left corresponds
to the model ranked higher in the challenge, and a relatively lower
scoring model is on the right. The atoms in the models are colored
by Df,model-map based on model–map difference.
The poorly fitted residue (in the model on the right subpanel) is
labeled, and the chain ID is in parentheses. In (D), a poorly fitted
backbone near S401 (chain B) is indicated with an arrow.
Discussion
The approach presented in this paper is
useful in identifying ligand densities and conformational differences
by comparing density maps. Identification of a ligand binding site
is challenging at intermediate-to-low resolutions, and the difference
density is a useful pointer to potential locations. In addition to
the examples presented above, this approach was found useful for identifying
the binding site of a kinesin inhibitor based on cryo-EM maps of resolutions
between 5 and 6 Å. A difference density blob coincided with a
potential drug binding pocket on the protein surface, with the interacting
site harboring residues specific for the subfamily of proteins that
the drug targets.[31] The drug molecule when
docked computationally at this pocket correlated well with the difference
density, although the resolution is not good enough to confirm details
of the pose.
Map density scaling is central to difference map
calculations, and local scaling has been shown to be useful for model
building in maps that sample a wide range of local resolutions.[8] Local scaling was found more appropriate to interpret
differences especially when the differences are contributed by segments
involving flexible or less resolved parts of the molecule.
The developed approach is also useful to compare atomic models
to maps and can be a helpful guide in identifying errors in atomic
model fits. In the context of model validation, difference maps complement
other metrics based on model–map fit or expected geometries.
Some metrics are less discriminative at lower resolutions, though
CaBLAM, for example, still picks up the backbone model errors considered
in Figure 3B. In general,
although it is important to compare different validation metrics when
finalizing a structure, the difference maps provide useful visual
clues to problem areas. As mentioned earlier, inaccuracies in map
calculations from the model can result in differences with the experimental
map. Accounting for factors like atomic charges, radiation damage,
and accurate B-factor estimates to reflect dynamics will improve theoretical
map calculation and minimize such differences.
The fractional difference maps are a useful means to locate voxels
associated with significant conformational and compositional changes.
A threshold applied to the fractional difference maps is useful to
mask out differences that are less significant or arising from noise.
The choice of the threshold might depend on whether the differences
arise from areas where the molecular volumes overlap, local dynamics
of the molecule, and occupancy in the region of interest. In the case
of a map–map comparison applied to GlyR (discussed above),
the core of the ligands (which is better resolved than periphery)
could be located with a Df threshold of
0.4, while this threshold covers most or all of the ligand density
(ADP-AlFx) in the case of the MKLP2 example. For validating atomic models
fitted in maps, a Df threshold of 0.5
identifies most of the obvious misfits and atoms outside the molecular
contour of the map. Subtle differences in backbone and side chains
were visible above a threshold of about 0.25. These thresholds may
be used as a guide, although different values might have to be tested
in practice.
The quality of the map–map (or map–model) alignment
affects the differences obtained, and errors in alignment are observed
as differences. For large-scale conformational changes or domain motions,
the alignment of two maps may have to be anchored on the less dynamic
segment of the molecular complex. Also, global scaling might be preferable
in such cases as local scaling works on the assumption that the equivalent
parts of the maps are aligned.
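Global scaling, as referred to above, matches the amplitudes of the two maps in resolution shells. A minimal sketch of this idea is given below; it is not the CCP-EM implementation. It assumes both maps are already aligned on the same grid, uses equal-width shells in spatial frequency (in cycles per voxel), and matches the root-mean-square amplitude per shell. The function names and the `n_shells` default are hypothetical.

```python
import numpy as np


def shell_indices(shape, n_shells):
    """Assign every Fourier voxel to a radial shell (0 .. n_shells - 1)."""
    freqs = np.meshgrid(*[np.fft.fftfreq(n) for n in shape], indexing="ij")
    radius = np.sqrt(sum(f ** 2 for f in freqs))
    edges = np.linspace(0.0, radius.max(), n_shells + 1)
    # Using only the inner edges maps the largest radius to the last shell.
    return np.digitize(radius, edges[1:-1])


def scale_to_reference(target, reference, n_shells=10):
    """Scale the Fourier amplitudes of `target` to those of `reference`.

    Within each shell, the target coefficients are multiplied by the ratio
    of RMS amplitudes (reference / target). Phases are left unchanged.
    """
    if target.shape != reference.shape:
        raise ValueError("maps must be sampled on the same grid")
    ft_target = np.fft.fftn(target)
    ft_reference = np.fft.fftn(reference)
    shells = shell_indices(target.shape, n_shells)

    scaled = ft_target.copy()
    for s in range(n_shells):
        sel = shells == s
        if not sel.any():
            continue
        amp_target = np.sqrt(np.mean(np.abs(ft_target[sel]) ** 2))
        amp_reference = np.sqrt(np.mean(np.abs(ft_reference[sel]) ** 2))
        if amp_target > 0:
            scaled[sel] *= amp_reference / amp_target
    # The shell factors are real and depend only on |frequency|, so the
    # result stays real up to rounding error.
    return np.fft.ifftn(scaled).real
```

A difference map in this sketch would then be the voxel-wise subtraction of the reference from the scaled target (or the reverse), which is why alignment errors appear directly as differences.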
Implementation
The difference map
calculation method
is implemented in the CCP-EM software package for electron cryo-microscopy.[9] The interface either takes two maps or a map
and a model as input, and these should be aligned beforehand. If the
map sizes and/or voxel spacings differ, they have to be resampled
to a common grid. The input map(s) can be preprocessed to remove any
background using the map processing tool in CCP-EM. This tool provides
options to threshold/mask the map, remove dust, and add a soft edge to the masked
map.
To calculate differences between a map and an atomic model,
a map simulated from the model can be generated externally and supplied
as input. Alternatively, if the atomic model is used as the second
input, a map is generated from the model using the TEMPy software
package.[13] By default, the model is used
as the reference for scaling, but this can be disabled.
For calculating differences, both local and global scaling modes
are provided as options for the user to choose from. For local scaling,
a mask file should be provided which covers the area wherein scaling
calculations will be done (note that this can be distinct from the
mask used in map preprocessing). Ideally, this mask covers useful
molecular volumes of both inputs, and it is recommended to provide
a mask. If a mask is not provided, a map contour threshold of 2.0
sigma is applied on the first map to create a mask.
As expected, the local scaling calculation for the maps is much
slower than the global calculation. For a map grid of size 100³, local scaling calculations take about 1 min 20 s, while
global scaling for the same map takes 1.3 s on a single CPU.
The interface provides links to visualize the difference densities
in Chimera or Coot. The fractional difference maps Df,1–2 and Df,2–1 are also calculated by default. These maps can be used to color
atomic models in Chimera, using the fractional difference values as
attributes for atoms.
Optionally, a fractional difference threshold can be used to mask
the output difference map calculated. All voxels with Df less than the threshold are masked out in the difference
map. Similarly, a dust filter can be applied on the difference map
as an option. This removes any dust after masking the differences
at a given Df threshold (0.3 by default).
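The optional masking step described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the tool's code: the difference map and the fractional difference map are taken as arrays on the same grid, and the function name is hypothetical. The 0.3 default follows the value given in the text.

```python
import numpy as np


def mask_difference_map(diff_map, frac_diff_map, df_threshold=0.3):
    """Zero out difference-map voxels whose Df is below the threshold."""
    if diff_map.shape != frac_diff_map.shape:
        raise ValueError("difference and fractional difference maps must match in shape")
    return np.where(frac_diff_map >= df_threshold, diff_map, 0.0)
```

A dust filter could then be applied to the masked output to remove small isolated blobs that survive the threshold.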
Conclusions
We present an approach for calculation
of difference densities
for cryo-EM maps and implement this as a tool with a user-friendly
interface in the CCP-EM package. The tests discussed here reflect
its potential for comparing different EM reconstructions to identify
compositional and conformational differences, as well as to evaluate
atomic model fit in maps. The fractional difference values help to
associate significance to the differences. Our multistep protocol
produces relatively clean and interpretable difference maps. Nevertheless,
a systematic study on the significance of difference densities will
be useful to delineate differences arising from noise vs signal.