John Jacob Peters1, Jeremy Leitz1, Qiang Guo2, Florian Beck3, Wolfgang Baumeister4, Axel T Brunger5. 1. Department of Molecular and Cellular Physiology, Stanford University, Stanford, United States; Department of Neurology and Neurological Sciences, Stanford University, Stanford, United States; Department of Structural Biology, Stanford University, Stanford, United States; Department of Photon Science, Stanford University, Stanford, United States; Howard Hughes Medical Institute, Stanford University, Stanford, United States. 2. State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences and Peking-Tsinghua Center for Life Sciences, Peking University, Beijing 100871, China; Department of Structural Biology, Max Planck Institute of Biochemistry, 82152 Martinsried, Germany. 3. CryoEM Technology, Max Planck Institute of Biochemistry, 82152 Martinsried, Germany. 4. Department of Structural Biology, Max Planck Institute of Biochemistry, 82152 Martinsried, Germany. 5. Department of Molecular and Cellular Physiology, Stanford University, Stanford, United States; Department of Neurology and Neurological Sciences, Stanford University, Stanford, United States; Department of Structural Biology, Stanford University, Stanford, United States; Department of Photon Science, Stanford University, Stanford, United States; Howard Hughes Medical Institute, Stanford University, Stanford, United States. Electronic address: brunger@stanford.edu.
Abstract
Advances in electron microscope instrumentation, cryo-electron tomography data collection, and subtomogram averaging have allowed for the in-situ visualization of molecules and their complexes in their native environment. Current data processing pipelines commonly extract subtomograms as a cubic subvolume with the key assumption that the selected object of interest is discrete from its surroundings. However, in instances when the object is in its native environment, surrounding densities may negatively affect the subsequent alignment and refinement processes, leading to loss of information due to misalignment. For example, the strong densities from surrounding membranes may dominate the alignment process for membrane proteins. Here, we developed methods for feature-guided subtomogram alignment and 3D signal permutation for subtomogram averaging. Our 3D signal permutation method randomizes and filters voxels outside a mask of any shape and blurs the boundary of the mask that encapsulates the object of interest. The randomization preserves global statistical properties such as mean density and standard deviation of voxel density values, effectively producing a featureless background surrounding the object of interest. This signal permutation process can be repeatedly applied with intervening alignments of the 3D signal-permuted subvolumes, recentering of the mask, and optional adjustments of the shape of the mask. We have implemented these methods in a new processing pipeline which starts from tomograms, contains feature-guided subtomogram extraction and alignment, 3D signal-permutation, and subtomogram visualization tools. As an example, feature-guided alignment and 3D signal permutation leads to improved subtomogram average maps for a dataset of synaptic protein complexes in their native environment.
Advances in electron microscope instrumentation, cryo-electron tomography data collection, and subtomogram averaging have allowed for the in-situ visualization of molecules and their complexes in their native environment. Current data processing pipelines commonly extract subtomograms as a cubic subvolume with the key assumption that the selected object of interest is discrete from its surroundings. However, in instances when the object is in its native environment, surrounding densities may negatively affect the subsequent alignment and refinement processes, leading to loss of information due to misalignment. For example, the strong densities from surrounding membranes may dominate the alignment process for membrane proteins. Here, we developed methods for feature-guided subtomogram alignment and 3D signal permutation for subtomogram averaging. Our 3D signal permutation method randomizes and filters voxels outside a mask of any shape and blurs the boundary of the mask that encapsulates the object of interest. The randomization preserves global statistical properties such as mean density and standard deviation of voxel density values, effectively producing a featureless background surrounding the object of interest. This signal permutation process can be repeatedly applied with intervening alignments of the 3D signal-permuted subvolumes, recentering of the mask, and optional adjustments of the shape of the mask. We have implemented these methods in a new processing pipeline which starts from tomograms, contains feature-guided subtomogram extraction and alignment, 3D signal-permutation, and subtomogram visualization tools. As an example, feature-guided alignment and 3D signal permutation leads to improved subtomogram average maps for a dataset of synaptic protein complexes in their native environment.
Advances in cryo-electron microscopy (cryoEM) instrumentation, data collection, and processing have enabled single particle structure determination at atomic resolution in favorable cases (Nakane et al., 2020). In particular, the development of direct electron detectors has permitted the collection of higher resolution information by improved detective quantum efficiency (DQE), and correcting for electron-beam induced drift has improved resolution (Milazzo et al., 2005; Bammes et al., 2012; McMullan et al., 2014). Single particle cryoEM is mostly performed with purified macromolecules, omitting cellular context, and to study such macromolecules in their native environment, cryo-electron tomography (cryoET), which consists of a collection of tilt series, is often used. In principle, the 3D reconstructions obtained from cryoET allow for segmentation and subsequent alignment of characteristic densities such as membranes, filaments, or large macromolecular complexes in their native environment. Subsequently, the resolution of objects of interest can be improved and the effect of the missing wedge alleviated by subtomogram averaging, reaching sub-nanometer resolution in favorable cases (Turoňová et al., 2020; Tegunov et al.; 2021).Subtomogram averaging generally requires collection of many tomograms. The collection of such large cryoET datasets is often challenging, in part due to the necessary data collection time (up to one hour per tomogram). Improvements in automatic cryoET data collection and beam-image shift have reduced the data collection time (Bouvette et al., 2021), and further advances in instrumentation should ultimately allow continuous cryoET data collection in the future (Chreifi et al., 2019; Eisenstein et al., 2019; Chreifi et al., 2021). Improvements in image processing and subtomogram averaging methods have also resulted in higher quality images of macromolecular complexes in their native environment (Heumann et al., 2011; Galaz-Montoya et al., 2015; Bharat et al., 2015; Bharat and Scheres, 2016; Galaz-Montoya et al., 2016; Turoňová et al., 2017; Tegunov and Cramer, 2019; Wan et al., 2020; Tegunov et al.; 2021). However, even with these advances, subtomogram averaging remains challenging for cases where the object of interest is in a crowded environment. In these cases, many objects may be misaligned by automated procedures leading to lower resolution in best case scenarios, and completely missing or averaging out morphological features in worst case scenarios. Increasing the number of tomograms, and hence object number, is one way to address this issue, but the efficacy of this strategy is fundamentally limited by sample preparation and data collection time. This misalignment issue is particularly vexing for transmembrane proteins in which high-intensity voxels, contributed by the membrane, bias image alignment and classification. Thus, there is a need for development of computational approaches that maximize the desired information about objects of interest that can be obtained by subtomogram averaging from cryoET datasets.Currently available packages and pipelines for subtomogram averaging (Bharat et al., 2015; Bharat and Scheres, 2016; Chen et al., 2019; Galaz-Montoya et al., 2016; Tegunov et al., 2021; Tegunov and Cramer, 2019; Wan et al., 2020) include CTF determination, CTF correction, and subtomogram extraction, averaging, classification, and 3D refinement. Generally, the classification and 3D refinement functionalities have been adopted from single particle EM approaches where a discrete target object with low-noise surrounding background is assumed. This approach may not be compatible with non-globular continuous objects, such as proteins associated with or embedded in membranes. Simple masking of the 3D class average density maps during refinement only partially addresses the issue of undesirable background information since such masking does not actually remove the voxels from the 3D tomographic reconstructions that are responsible for biasing classification and alignment. Therefore, as refinement progresses, these highly influential background voxels can re-enter the masked region of interest and bias refinement. For single particle cryoEM datasets, signal subtraction can be performed by removing the contribution of surrounding densities by partial subtraction of such densities in 2D projection images (Morais et al., 2003; Bharat et al., 2015; Nakane et al., 2018). However, it may not be optimal to modify nearby densities in the 2D tilt images of a cryoET dataset. Thus, signal modification in the 3D tomographic reconstruction should be more successful for subtomogram averaging.Here, we developed a new cryoET/subtomogram averaging pipeline (CryoET processing for subtomogram averaging, CrESTA) that includes feature-guided alignment and a new method for 3D signal modification by signal permutation for tomographic reconstructions. Our 3D signal permutation method randomizes voxels outside a mask that can be of any shape and that blurs the boundary of the mask. In addition, we present the option to filter the randomized voxels outside the mask to attenuate obvious outliers that may be caused by membrane densities. Our method effectively produces a featureless background surrounding the object of interest in 3D. This signal permutation process can be repeatedly applied with intervening 3D refinements of the 3D signal-permuted subvolumes, recentering of the mask, and optional adjustments of the shape of the mask. CrESTA starts from 3D tomographic reconstructions (including, but not limited to, CTF-corrected tomographic reconstructions using NovaCTF (Turoňová et al., 2017; Wan et al., 2020)) and contains feature-guided subtomogram alignment, extraction, signal permutation, and subtomogram visualization tools. An application to synaptic protein complexes in their native environment illustrates how CrESTA allows for improved processing of subtomogram averaging data.
Methods
Here we explored subtomogram alignment strategies and demonstrated the utility of 3D signal permutation. We compared two methods for the initial starting alignment of the subtomograms: free alignment and feature-guided alignment (Fig. 1). Moreover, we assessed the effect of a 3D signal permutation method implemented in the NovaCTF/CrESTA/RELION workflow (Figs. 2 and 3) on subtomogram averaging.
Fig. 1.
Subtomogram extraction and alignment strategies.
(A) Object coordinates were used to extract subtomograms from tomographic reconstructions. Following extraction, subtomograms were aligned in one of two workflows: (B) free alignment or (C) feature-guided alignment. (B) In free alignment, all subtomograms are first aligned to generate a class average model. The model is then manually rotated to an axis of interest (dashed line). Subtomograms are then aligned to this manually rotated class average model to produce initially aligned particles. (C) Feature-guided alignment makes use of the selection of a second object coordinate, here the ISV center coordinates (blue asterisks in the center of the vesicle). A vector, v, between the ISV membrane and center coordinates is then used to align all subtomograms (red arrow, here aligning all vesicle-vesicle pairs with the ISV vesicle on top).
Fig. 2.
CrESTA GUI for 3D signal permutation, visualization, subset selection by cross-correlation.
(A) Subtomogram visualization module. (B) Mask creation and 3D signal permutation module. (C) Cross-correlation module.
Fig. 3.
3D signal permutation of intermembrane densities.
(A) A 2D xy slice through a representative tomographic reconstruction (white arrow heads indicate putative ISV-SM vesicle pairs). (B) Segmentation of a subtomogram of a representative “synaptic” interface (ISV = isolated synaptic vesicle, SM = synthetic plasma membrane vesicle) observed in a tomographic reconstruction from our synaptic protein cryoET dataset. Segmentation was manually performed using Amira (Stalling et al., 2005). The blue asterisk marks a V-ATPase, a useful landmark for feature-guided alignment. (C) Missing wedge volume for RELION 3D classification and refinement (also used for cross-correlation calculations). (D) Raw original subtomogram (left) of a synaptic interface and the corresponding Wiener-filtered volume (Tegunov and Cramer, 2019) (defocus value of −3.0 μm and a signal-to-noise ratio falloff of 1.2) (right). Red arrow identifies target protein density between membranes. The subtomogram was extracted with RELION with inverted contrast, a 256-pixel box size (pixel size 2.62 Å), and was normalized. (E) 3D signal permutation around an intermembrane density by randomization and intensity filtering (see Methods) of the voxels outside a cylinder mask at the center of the intermembrane density in panel D. The raw (left) and Wiener-filtered (defocus value of −3.0 μm and a signal-to-noise ratio falloff of 1.2) (right) 3D volumes are shown.
Example cryoET dataset of synaptic protein complexes
Synaptic vesicles are neurotransmitter-containing organelles that fuse with the presynaptic plasma membrane of neurons to facilitate chemical neurotransmission. The core of the SNARE protein complex (for Soluble NSF-attachment protein receptor) is composed of syntaxin-1, SNAP-25 (synaptosomal-associated protein 25) which both primarily localize to the plasma membrane, and synaptobrevin/VAMP2 (vesicle-associated membrane protein 2) which primarily resides on the synaptic vesicle. These proteins zipper to generate the energy required to fuse the synaptic vesicle with the neuron plasma membrane. Additional proteins are necessary to ensure fast, precisely Ca2+-triggered, fusion in particular: the synaptic vesicle-associated calcium sensor synaptotagmin-1, and soluble accessory proteins Munc-18, Munc-13, complexin, and the AAA + ATPase NSF and the adapter protein SNAP which act at recycling and priming stages. Elucidating the structure of these proteins in a pre-fusion state is difficult due to their relatively small size (~70 kDa in total for the SNARE complex alone). Previously, we determined crystal structures of a pre-fusion state of the SNARE complex composed of a soluble SNARE mimetic complex that also included a fragment of synaptotagmin-1 and complexin (Zhou et al., 2015, 2017), but these crystal structures lack the full biological context (i.e., full length proteins and membranes). CryoEM/ET provides a means to examine these structures, but current analyses are hindered by the problems discussed above including a crowded molecular environment and non-discrete particles. As an example, we used a cryoET dataset of synaptic protein complexes in their native environment between isolated synaptic vesicles (ISVs) and synthetic “SM” acceptor vesicles (simply referred to as “synaptic protein cryoET dataset” consisting of 260 tomograms, details in Appendix A). However, our methods are generally applicable to any cryoET dataset.
Tomographic reconstructions
We tested the two alignment strategies using two methods for tomogram reconstruction and 3D-CTF correction: the previously published RELION (version 3) subtomogram averaging workflow (Bharat et al., 2015; Bharat and Scheres, 2016) and a new workflow that we refer to as NovaCTF/CrESTA/RELION workflow that is similar to STOPGAP (Wan et al., 2020) (Fig. 4).
Fig. 4.
Data workflow summary.
General overview of the tomographic reconstruction pipelines using either the (A-B) RELION subtomogram averaging workflow or the (C-D) NovaCTF/CrESTA/RELION workflow. Programs and/or the names of specific shell scripts are bracketed. Blue boxes represent custom scripts, red boxes represent scripts specific to the NovaCTF workflow (Turoňová, et al., 2017). In both workflows, after tomogram reconstruction, two methods of initial alignment were used followed by 3D classification.
The algorithms for the RELION subtomogram averaging workflow have been previously described (Bharat et al., 2015; Bharat and Scheres, 2016) (Fig. 4A and B). For the synaptic protein cryoET dataset we largely used default values for the RELION subtomogram averaging workflow starting with tomographic reconstructions produced by IMOD. Briefly, raw frames were motion corrected using MotionCor2 before being recombined into tilt series using a custom python script (drift_combine.py). Bin 1 and 4 tomographic reconstructions were generated using IMOD. The bin 4 reconstruction was used to manually select object coordinates while objects were extracted by the RELION subtomogram workflow from the bin 1 reconstruction (details in Appendix A).For 3D signal permutation (described below), a 3D CTF-corrected tomographic reconstruction must be calculated. Moreover, full 3D CTF correction is essential for a tilt series due to the >1 μm defocus difference, or up to 100% error in defocus estimation, in the periphery of the tilt series. We used NovaCTF (Turoňová et al., 2017; Wan et al., 2020) which performs 3D CTF estimation and correction on the entire tomogram. NovaCTF can perform the CTF correction by phase-flipping or by CTF multiplication. For our synaptic protein cryoET dataset we used phase-flipping.We use a parallelized implementation of NovaCTF similar to the STOPGAP implementation of NovaCTF (Fig. 4C and D, details in Appendix B) (Turoňová et al., 2017; Wan et al., 2020). Initial tomograms were reconstructed as above with IMOD, object coordinates selected, and the parameters of the initial tomogram reconstructions were then used to generate a new 3D-CTF corrected reconstruction with NovaCTF. We developed a parallelized bash script (Nova_Updated.sh) that applies a priori IMOD alignment transforms and the NovaCTF pipeline to generate CTF-corrected tomographic reconstructions. Nova_Updated.sh first performs defocus estimation using CTFFIND4 with a step size determined by the user, here we used 0.5 μm (Rohou and Grigorieff, 2015) and generates the defocus matrix text file required for NovaCTF defocus algorithm.The CTF correction implemented in NovaCTF uses the .tlt file generated in the ETOMO GUI of IMOD that contains information about the tilt angles of the images. Here our parallel NovaCTF pipeline then applies the alignments generated in ETOMO using IMOD commands and the fid.xf file that contains the information about the alignment transformations of the images. Next, gold particles are removed using the erase.fid file, the edges are tapered, if desired, and any views (i.e., specific bad tilt angle images) are removed using either the tilt.com file—which contains an exclusion list of bad images that were excluded in the ETOMO initialization—or a user-generated exclusions.txt file containing a comma-separated list of poor images. Next, the remaining NovaCTF steps are performed: y-z reorientation, filter projections algorithm, and finally the 3D-CTF algorithm reconstructs the tomogram (Fig. 4C) (Turoňová et al., 2017; Wan et al., 2020).
Extraction of subtomograms
For our synaptic protein cryoET dataset, we performed two separate subtomogram extractions. The first extraction produces synaptic interface subtomograms: that is subtomograms of the entire vesicle-vesicle “synapse”. Synaptic interface subtomograms were extracted using manually selected ISV membrane coordinates, (i.e., the point of the ISV membrane that is closest to the interacting SM vesicle; Fig. 1A, asterisk in membrane) for both the RELION subtomogram averaging and NovaCTF/CrESTA/RELION reconstruction workflows. The ISV could be easily identified by the presence of a large protein density belonging to the vesicular ATPase (Fig. 3A, 3B). To keep track of and visualize the alignment of the vesicle interfaces, we selected a second point at the center of the ISV (Fig. 1A, asterisk in the center of the vesicle).The second extraction focuses on the individual intermembrane densities (i.e., synaptic protein complex densities) present at the synaptic interface. For this second extraction, coordinates were obtained by visual inspection of Wiener-filtered (Tegunov and Cramer, 2019) synaptic interface subtomograms in Chimera (Fig. 3D, right). The approximate center coordinates of individual intermembrane densities were manually chosen (Fig. 3D, right).
CTF correction and missing wedge
The RELION subtomogram averaging workflow includes a 3D CTF correction (Bharat et al., 2015; Bharat and Scheres, 2016) which estimates the defocus using CTFFIND4 (Rohou and Grigorieff, 2015) and corrects the CTF on a per subtomogram basis. 3D CTF corrections are performed for each subtomogram–more precisely, it applies the same CTF correction parameters for all voxels in a particular subtomogram, despite changes in defocus in different portions of the volume (Bharat and Scheres, 2016). The CTF correction is applied in Fourier space, and it also incorporates the effects of the missing wedge and includes a dose-dependent B-factor model (Bharat et al, 2015). For higher tilt images where CTFFIND4 may not produce reliable estimates for some datasets (including our synaptic protein cryoET dataset), the defocus value is set to the average defocus estimate.For the CTF-corrected tomographic reconstructions generated by NovaCTF, the effect of the missing wedge still needs to be accounted for in subsequent classification and refinement processes with RELION. For our synaptic protein cryoET dataset, we generated a synthetic single missing wedge volume for the tilt scheme using tom_wedge.m (Nickell et al., 2005) (Fig. 3C), and this volume is designated as the “_rlnCtfImage” for each subtomogram in the STAR file inputted into RELION. In principle, this approach can be generalized to also apply dose filtering.
Initial alignment of subtomograms
Performing an entirely unbiased, reference-free 3D class averaging with even a modest number of subtomograms is computationally demanding due to the large sample space of possible orientations for each target density. Moreover, such unbiased, reference-free 3D class averaging may not be successful in cases where there is pseudo-symmetry with polarity (see Results). One way to limit this searchable sample space is to first align all subtomograms so that the objects of interest begin with similar orientations. Here we evaluated two methods for this initial alignment: free alignment and feature-guided alignment.
Free alignment
We used a two-step approach for initial reference-free alignment of subtomograms. The first step used RELION’s 3D refinement function to align subtomograms with no reference. Subtomograms were aligned and averaged using one class with no reference and loose limitations on translation and no limitations on angular sampling (Fig. 1B).In general, the resulting average 3D map will be in some arbitrary orientation (Fig. 1B). To better control subsequent re-alignments (and to compare the results to feature-guided alignment), the initial alignment was transformed to a standard orientation. For our synaptic protein cryoET dataset, the final class average from the initial free alignment was visualized using CrESTA’s “Display mrc in tom_volxyz” tool (Fig. 2B). Using this tool, the volume was manually rotated such that the object of interest was arbitrarily aligned to an axis of interest–in the case of an ISV-SM vesicle pair, the axis of closets approach between membranes (i.e., through both vesicles) was oriented to coincide with the z-axis with the ISV membrane on top (Fig. 1B dashed line). In the second step, another 3D RELION refinement with one class was performed with this volume as an initial reference. As an alternative, the particles could be directly rotated according to angles determined by the manually rotated object. As a result of this re-alignment step, all subtomograms are oriented such that they can later be rotationally aligned about this axis of interest.
Feature-guided alignment
Subtomograms are generally obtained by selecting a single coordinate at the center of the object of interest and then extracting a cubic subvolume around it. This coordinate selection can be done by manual inspection using IMOD (Mastronarde and Held, 2017), other packages such as EMAN2 (Galaz-Montoya et al., 2015; Galaz-Montoya et al., 2016), or obtained by template matching (Hrabe et al., 2012). Naturally, these center coordinates have no orientational information about the object of interest. In cases where additional context about the object’s orientation is available, taking advantage of this information can reduce the degrees of freedom in later refinement steps. Furthermore, in instances of pseudo-symmetry with polarity (e.g., a structure sandwiched between two distinct lipid bilayers), subtomograms may become improperly aligned. To reduce the chances of such misalignment a feature-guided alignment approach may be employed. For example, such an approach has proven useful when examining HIV coat assembly where subtomograms were first aligned based on the geometry of a sphere to generate a starting model (Briggs et al., 2009). We developed a conceptually similar feature-guided alignment approach, where a second coordinate is manually selected for each object of interest (in addition to the center coordinate), for example a particular membrane that is associated with a membrane protein. Of course, this requires that the identity of the membrane can be unambiguously established by visual inspection. Our approach should be applicable to cellular substructures with contextual information, such as membrane proteins on single membranes or proteins associated with filaments. We then use the vector between the two selected coordinates to align the axis of interest to the arbitrarily selected z-axis. Where a = [001], v is the vector between the two manually selected coordinates, r is the axis of rotation, and θ is angle of rotation for each subtomogram, i:The rotation matrix R is then calculated from the axis of rotation, r and the rotation angle, θ. This rotation matrix is converted to ZYZ intrinsic Euler angles (the RELION convention) using the scipy.spatial. transform.Rotation.as_euler package on a subtomogram-by-subtomogram basis. These angles are then set as the “_rlnAngleRot”, “_rlnAngleTilt”, and “_rlnAnglePsi” respectively in the STAR file for each subtomogram. It is important to note that the “_rlnAngleRot” value is randomly assigned at this stage. After feature-guided alignment, we recommend performing 3D classification/restrained refinement, effectively imposing a restrained rotational search around the vector v.For our synaptic protein cryoET dataset, this feature-guided alignment produces class average maps that are oriented in the same standard setting as the free alignment method (Fig. 1).
3D signal permutation and filtering of subtomograms
We developed a 3D signal permutation approach within the NovaCTF/CrESTA/RELION workflow to deemphasize potentially interfering background voxels around an object of interest within a subtomogram (Fig. 3). First, a simple geometric binary mask M (for example, a sphere or cylinder, depending on the shape of the object of interest) is generated within CrESTA using the “Create mask” module with values set to 1 and 0 inside and outside the mask, respectively. The rotation matrix, R, and the translation vector, T, from the initial alignment of each subtomogram are inverted and applied to this mask on a per subtomogram basis before the mask, Mrot_trans, is used for 3D signal permutation.The cut_part_and_movefunc.m script then performs the 3D signal permutation on a subtomogram volume, Vi, by leaving voxel indices within the Mrot_trans mask unaffected and randomly permuting the indices of the voxels outside Mrot_trans mask, resulting in the 3D-signal-permuted volume, VS:The permutation of indices of voxels outside the Mrot_trans mask removes features outside the masked region without changing the global statistics of the voxels of the whole subtomogram (mean and standard deviation). To lessen the impact of the border at the edge of the Mrot_trans mask, only a randomly selected subset of voxels is permuted outside the mask as a function of distance from the boundary of the mask. The number of randomly permuted voxels is increased as the distance from the boundary of the Mrot_trans mask increases. Therefore, the information decreases depending on the distance to the boundary of the mask. The problem of calculating a set of voxels which has a defined distance range (pseudo shell) is addressed by “growing” the given mask for a specified amount. The process of growing the mask and voxel permutation is repeated until eventually all voxels outside the grown mask are permuted.If the background contains voxels that are biasing classification (i.e., extreme values on either end of the voxel intensity distribution), it can be useful to filter these voxels (see Supplementary Fig. 1 for example). We included an optional filter that replaces voxels with very large or very small intensities (as identified by user-defined number of standard deviations w, b from the mean m) with values randomly selected from a normal distribution norm (m, s) with a mean, m, and standard deviation, s, (m is usually set to 0 and s set to 1).This high-intensity filter maintains the voxel mean and only subtly changes the voxel standard deviation while further deemphasizing contribution from outlier background voxels.An example of a 3D-signal-permuted subtomogram around an intermembrane density is shown in Fig. 3E.The signal permuted and filtered subtomograms, VSfiltered, can be inputted into RELION or other programs for 3D classification/refinement with the orientation information from initial alignment.
Repeat 3D signal permutation
After convergence of each of the class averages, a new round of 3D signal permutation can be performed. This signal permutation is once again performed on the original subtomograms but uses the alignment information from the previous 3D refinement as opposed to the manually selected coordinates and initial alignments. This step helps correct for inaccuracies in the initial coordinate picking.Additionally, if the class averages are substantially heterogeneous at this stage, using a unique geometric mask for 3D signal permutation for each class may be beneficial and can be done simultaneously. For example, the class average map obtained from a prior alignment/refinement job could be used to generate a more defined mask. However, it is important to assess potential bias that might be introduced by such masks, for example by cross-validation, e.g., the emergence of new features that are not part of the mask. Subsequent alignment/refinement is performed using these corrected and improved 3D-signal-permuted subtomograms. Depending on the quality of these class averages, this process of subtomogram refinement and unique mask generation followed by 3D signal permutation and 3D refinement can be repeated as many times as desired.
Pruning classes manually and with cross-correlation and visualization of subtomograms using CrESTA
For both the RELION subtomogram averaging and the NovaCTF/CrESTA/RELION workflows, throughout the process of alignment, 3D signal permutation, classification/refinement, and all iterations, subtomograms can be visualized using the “Stackbrowser” module in CrESTA. The stackbrowser horizontally tiles individual subtomograms as a series of virtual slices with/without the alignment information applied. The stackbrowser also provides one of two means of pruning subtomograms using CrESTA. Within the stackbrowser, users can deselect poorly aligned/low contrast/misclassified/etc. subtomograms and export selections into a new RELION STAR file. A second pruning approach is within the “Cross-correlation” tab of CrESTA. The “Volume & Star File” module allows users to calculate cross-correlation coefficients between a class average and all the individual subtomograms within a supplied STAR file (with missing wedge information applied in Fourier space) within that class average. Cross-correlation can be focused on a masked region of interest centered around the center of the subtomogram. In our applications we used a cubic mask, but in principle the mask can be of any shape. Additionally, a threshold value can be set to exclude all subtomograms with low cross-correlation coefficients and output a RELION STAR file. Both features offer users greater access to monitoring the progress of refinement and pruning classes.See Appendix C for an overview of a general implementation of the NovaCTF/CrESTA/RELION pipeline.
Results
Tomograms of our synaptic complex cryoET dataset were reconstructed using a standard reconstruction workflow (Fig. 4A). Two coordinates were selected for each ISV-SM interface: the ISV membrane coordinate used for the extraction and the ISV center coordinate used to assess subsequent alignments (Fig. 1A, blue asterisks). More precisely, the ISV membrane coordinate is the position in the ISV membrane that is closest to the SM membrane.The subtomograms obtained by both the RELION subtomogram averaging and the NovaCTF/CrESTA/RELION workflows (Fig. 4) were aligned with no reference (“free alignment”) and no orientational restraints using the 3D refinement function of RELION and were subsequently re-oriented in a standard orientation where the aligned average map consists of a vertically oriented vesicle-vesicle pair along the vector v with the ISV density on top (Fig. 1B, see also Methods). For better visualization of the alignments, we then superimposed all ISV membrane coordinate at the center of a plane and projected the ISV center points onto this plane for all subtomograms (Fig. 5A and F). For the RELION subtomogram averaging workflow, the resulting scatterplot shows ISV centers cluster into two groups—75.6% of center points are in the top cluster and 24.4% are in the bottom cluster—indicating incorrectly flipped configurations of the SM-ISV interface (Fig. 5A). In other words, since all ISV center coordinates are expected to belong to the top cluster, the free alignment process yielded a substantial subset of incorrect alignments. In addition to the flipping, there is also some horizontal spread of some ISV centers indicating improper tilting of interfaces.
Fig. 5.
Feature-guided alignment in NovaCTF tomographic reconstructions results in well-defined ISV membranes.
(A-J) Free alignment of 1473 subtomograms using either RELION subtomogram averaging or NovaCTF/CrESTA/RELION workflows (see Methods), (A) ISV center coordinates were projected onto a plane after rotations obtained by free alignment (blue dots, top scatterplot) cluster into two groups with respect to the ISV membrane coordinate (gray dot), indicating flipped orientations. (B) After 3D classification/v-restrained refinement with RELION, ISV centers remain clustered into two groups. (C) Class average maps resulting from 3D classification/v-restrained refinement. In all cases the ISV vesicle is on top. (D) A 21-pixel midsection-slice for each class average from (C) normalized to a mean of 0 and standard deviation (S.D.) of 1. (E) The pixel intensity profile of the center 21-pixels of a midsection-slice (regions marked by the yellow dashed lines in (D)) show two distinct peaks indicating a clear bilayer (red arrow heads). Pixel intensity values are standard deviations from the mean. (F-J) Panels are the same as (A-E) but using the NovaCTF/CrESTA/RELION subtomogram averaging workflow. (F) Scatterplot of ISV center coordinates indicates some modest clustering indicating interface flipping, but lateral dispersion, indicating tilting, is limited even after classification (G). (H-J) As in (C-E) but using subtomograms reconstructed using the NovaCTF/CrESTA/RELION workflow. Membrane bilayers, especially ISV membranes, become more prominent. (K-T) The same panels as shown above but now using feature-guided alignment. (K-O) Subtomograms reconstructed using the RELION subtomogram averaging workflow. (K) Feature-guided alignment, by definition, yields perfectly aligned interfaces as a starting point. (L) After 3D classification/v-restrained refinement with RELION there is some minimal lateral dispersion, indicating tilting but largely interfaces remain correctly oriented. (M-O) as in (C-E). (P-T) Subtomograms generated using the NovaCTF/CrESTA/RELION workflow and feature-guided alignment strategy. (P and Q) With feature-guided alignment, there is no interface flipping and tilting is largely reduced. (S and T) The lipid bilayer of the ISV is well-defined. (U) Average peak half-heights and amplitudes for traces in E, J, O, and T. For all cases, we performed 3D RELION classification/v-restrained refinement with five classes for 10 iterations with a 3.7° initial angular sampling rate, a 5-pixel offset range and 1-pixel offset step, a regularization parameter of 2, and C2 symmetry (around the axis defined by the tilt and psi angles) was imposed. Prior angles were provided for the “tilt” and “psi” angles, and these angles were limited to a 5° local search range, effectively imposing a restrained rotational search around the vector v. To restrict these angles, “_rlnAngleTilt” and “_rlnAnglePsi” values were added to the RELION STAR file as “_rlnAngleTiltPrior” and “_rlnAnglePsiPrior” respectively. Continuing iterations led noisier class averages.
These initial free alignments were used for 3D classification/v-restrained (see Methods) refinement using RELION with five classes (Fig. 5C) where the refinement was restrained around the vector v. The ISV center orientation was again plotted; it was not improved by 3D classification/v-restrained refinement (Fig. 5B). For each 3D class average map (Fig. 5C), we averaged a 21-pixel thick mid-section slice where the membranes were closest (Fig. 5D). We then calculated the average of the center 21 pixels (Fig. 5D, demarcated by yellow dashed lines) for each pixel along the vertical axis (Fig. 5E). These projections only vaguely suggest the presence of a bilayer as evident by two distinct peaks (red arrows).Better free alignment was achieved when using the NovaCTF/CrESTA/RELION workflow. There was considerably less flipping after initial alignment compared to the RELION subtomogram averaging workflow (97.9% of center points were in the top cluster and 2.1% in the bottom cluster) (Fig. 5F). These alignments were again subjected to 3D classification/v-restrained refinement using RELION (Fig. 5H) which did not substantially change the spread of ISV center locations (Fig. 5G). Projections from class averages yielded stronger evidence for the presence of lipid bilayers, as suggested by double peaks (Fig. 5H–J, red arrowheads). Additionally, we measured the angular difference between vectors v of the initial alignments from the same sub-volumes after the RELION subtomogram averaging and the NovaCTF/CreSTA/RELION workflows and plotted this difference against the average defocus value determined by CTFFIND4 (Supplementary Fig. 3A and B). Nearly all the orthogonal interfaces (i.e., those that have the largest angular differences) in the RELION subtomogram averaging workflow (Fig. 5A) arise from tomograms that were taken close to focus. Interfaces that were taken at high defocus were more likely aligned similarly in both workflows. These results suggest that not only is the precision of the CTF estimation an important factor in initial interface alignment, but how the CTF-correction is applied (e.g., phase-flipping and/or Wiener filtering) is also important. However, an in-depth analysis of these differences is beyond the scope of what we present here.We then used the same subtomograms and performed feature-guided alignment using the ISV membrane – ISV center vector v (Fig. 1C). Here, by design, none of the interfaces are incorrectly flipped or rotated (Fig. 5K, P). For both workflows, these rotations were used during subsequent 3D classification/v-restrained refinement with RELION (Fig. 5L–O and Q–T), yielding five class averages. Only small changes of rotations occurred, and, by design, no flipping, leading to well-aligned ISV centers (Fig. 5L and Q). We examined the difference in vector v between initial and final alignments and found that after feature-guided alignment, the small rotations observed were not affected by the defocus value (Supplementary Fig. 3C and D). However, when comparing the final rotational changes between the RELION subtomogram averaging and NovaCTF/CreSTA/RELION workflows (i.e., iteration 10, comparing panels 5L and 5Q), tomograms that were collected closer to focus again produced larger angular changes in the subtomogram alignment (Supplementary Fig. 3E and F). Again, this underscores the importance of the precision of the CTF estimation and of the CTF-correction method.We approximated the membrane bilayer distance as the width at half height of the peaks in the voxel-intensity traces. Additionally, we measured the peak magnitude as the average of the two local maxima within a double peak subtracted by the local minimum within the double peak for each bilayer, where a double peak could be seen (Fig. 5E, J, O, and T, red arrowheads). Averages are shown in Fig. 5U. Generally, feature-guided alignment with the NovaCTF/CrESTA/RELION workflow led to the sharpest membrane bilayer peaks.In sum, in all conditions (different workflows and alignments), the ISV vesicle was better resolved than the SM vesicle. This may be related to the larger variation of SM vesicle diameter and curvature than that of ISV vesicles. Yet, none of these alignment procedures yielded class averages with intermembrane densities that could be reflective of synaptic protein complexes. One possible explanation is that intermembrane densities may be too heterogeneously distributed and thus “averaged out” during class averaging when using subtomograms that include a relatively large amount of membrane density. This hypothesis is supported by the fact that individual interfaces in tomographic reconstructions contained clearly visible intermembrane densities (Fig. 3).
3D signal permutation and intensity filtering
To resolve individual intermembrane densities, we re-extracted subtomograms from NovaCTF tomographic reconstructions around each individual intermembrane density (Fig. 3D, right panel). We used the NovaCTF/CrESTA/RELION workflow with feature-guided alignment, followed by 3D classification/v-restrained refinement with RELION using a 350 Å diameter spherical mask (Fig. 6A). Weak intermembrane density appeared for one class, but this density was small and relatively featureless. Furthermore, this 3D classification/refinement was unstable and collapsed to a single class with no interface density after 60 iterations. We then restricted the RELION mask to an 80-pixel long by 40-pixel diameter cylindrical mask (pixel size = 2.62 Å) and performed 3D classification/v-restrained refinement, followed by auto-refinement and post-processing (Fig. 6B). This led to the appearance of intermembrane densities for two classes (Class 1 and Class 3). However, bilayers were relatively poorly defined for these two classes. Post-processing and FSC analysis yielded resolutions ranging from 21 to 27 Å (Supplementary Fig. 4, black traces).
Fig. 6.
Intermembrane densities at synaptic vesicle interfaces.
(A) 3D classification/v-restrained refinement with RELION of 1473 subtomograms extracted from NovaCTF tomographic reconstructions around the centers of individual synaptic protein complexes and feature-guided alignment using a 350 Å diameter spherical mask for RELION refinement. Classes 1, 2, and 3 contain 216, 1035, and 222 intermembrane complexes, respectively. Contour levels of normalized volumes are 3.09, 2.26, and 2.47 σ, respectively. (B) Auto-refined and post-processed class averages after 3D classification/v-restrained refinement of subtomograms extracted from NovaCTF tomographic reconstructions around the centers of synaptic protein complexes using feature-guided alignment and an 80-pixel long by 40-pixel diameter cylindrical mask for RELION refinement. Classes 1, 2, 3 contain 492, 452, and 529 intermembrane complexes, respectively. Class averages were filtered to the respective resolution calculated by post-processing—26.8 Å, 21.0 Å, and 24.0 Å for class 1, 2, and 3 respectively. Class averages have been cropped using a the 80×40 pixel cylindrical mask. Contour levels of normalized volumes are 6.51, 8.13, and 9.84 σ, respectively. (C) Auto-refined and post-processed class averages after 3D classification/v-restrained refinement of subtomograms extracted from NovaCTF tomographic reconstructions around the centers of synaptic protein complexes using feature-guided alignment and 3D signal permutation using the same 80-pixel long by 40-pixel diameter cylindrical mask along with a 350 Å diameter spherical mask for RELION. Intermembrane densities emerge, situated between the membranes. Classes 1, 2, and 3 contain 419, 433, and 621 intermembrane densities, respectively. Class averages were filtered to the respective resolution calculated by post-processing—30.5 Å, 27.9 Å, and 33.5 Å for class 1, 2, and 3 respectively. Contour levels of normalized volumes are 6.33, 5.22, and 9.93 σ, respectively. (D) Cross correlation coefficient (CCC) value distributions for each of the classes. (E) Examples of class average maps from 3D classification/v-restrained refinement alone created from Class 3 subtomograms with CCC values > 0 (left, 405 intermembrane complexes) and <0 (right, 216 intermembrane complexes) (See also Supplementary Fig. 2). 3D classification/v-restrained refinement with RELION was performed for 25 iterations with three classes in A (increasing iterations increased noise) and 100 iterations with three classes in B and C. No symmetry was imposed, a 5-pixel offset range, a 1-pixel offset step, and a regularization parameter of 2 was used. Prior angles were provided for the “tilt” and “psi” angles, and these angles were limited to a 5° local search range, effectively imposing a restrained rotational search around the vector v. To restrict these angles, “_rlnAngleTilt” and “_rlnAnglePsi” values were added to the RELION STAR file as “_rlnAngleTiltPrior” and “_rlnAnglePsiPrior” respectively.
To better resolve the intermembrane densities (which are presumably synaptic protein complexes), we next performed 3D signal permutation (Fig. 3E, Methods) as an intervening step between subtomogram extraction using the same 80-pixel long by 40-pixel diameter cylindrical mask and subsequent 3D classification/v-restrained refinement/auto-refinement using a 350 Å diameter spherical mask in RELION. This 3D signal permutation, combined with feature-guided alignment, produced strong densities in all three class average maps at the intermembrane density locations, and each of these intermembrane densities were morphologically distinct from one another (Fig. 6C). Class 1 was defined by a ~10 nm long density that is parallel to the lipid bilayers. Class 2 has a simple perpendicular stalk. Class 3 has an ~8 nm perpendicular stalk with ~10 nm long density that is parallel to the lipid bilayers. Post-processing and FSC analysis resulted in resolutions ranging from 28 to 34 Å (Supplementary Fig. 4, magenta traces).To further improve the quality of our class averages, we next used CrESTA’s cross-correlation coefficient (CCC) tool to remove any subtomograms with CCC values < 0. CCC values varied over a wide distribution (Fig. 6D). As an example, the class average map from class 3 using subtomograms with CCC values > 0 (Fig. 6E, left), had sharper bilayer density and more discrete intermembrane densities than subtomograms with CCC values < 0 (Fig. 6E, right). Negative CCC values might be indicative of high noise or anticorrelations in the particular subtomograms (see Supplementary Fig. 2).CCC filtering was then performed individually for each of the three classes. Additionally, 3D signal permutation was repeated with optimally adjusted cylindrical masks for each class that consider the extent of the interface densities (Fig. 7). 3D v-restrained refinement/auto-refinement was performed, ultimately yielding the class averages shown in Fig. 7A. These refinements stabilized after ~ 200 iterations, and only small changes occurred between 200 and 300 iterations. Post-processing and FSC analysis resulted in estimated resolutions ranging from 24 to 30 Å (Supplementary Fig. 4, green traces).
Fig. 7.
CCC filtering, 3D signal permutation and mask optimization produces classes that converge after continued iterations.
(A) Starting with the last step of the 3D classification/v-restrained refinement related to Fig. 6C, only subtomograms with CCC > 0 were kept, 3D signal permutation repeated with unique cylindrical masks generated for each class centered around the aligned coordinates, followed by v-restrained refinement, auto-refinement and post-processing. For class 1 (337 intermembrane complexes), we used an 80-pixel long by 50-pixel diameter mask. For class 2 (388 intermembrane complexes), we used an 80×40 pixel mask, and for class 3 (405 intermembrane complexes), we used a 90×40 pixel mask. Class averages after 3D classification followed by auto-refinement and post-processing of each class individually. 3D classification nearly reached convergence after 200 iterations, and there were only small changes between 200 and 300 iterations. No symmetry was imposed, a 1-pixel offset range with a 0.5-pixel offset step was used, and local searches from auto-sampling was set to 1.8°. A data-inspired mask with 3 pixels of extension and a 35-pixel soft edge was created in RELION for post-processing. Post-processed class averages were calculated with B-factor set to 200 Å2 and low-pass filtering at 15 Å. Class averages were filtered to the respective resolution calculated by post-processing—24.0 Å, 25.8 Å, and 30.5 Å for class 1, 2, and 3 respectively. Contour levels of normalized volumes are 5.76, 5.06, and 6.95 σ, respectively. (B) Averaged intermembrane densities after applying angular rotations obtained in (A) and without 3D signal permutation—only a 250 Å diameter spherical RELION mask was used (left). Overlay of the expanded class 1 interface with the class 1 average after postprocessing shown in panel (A) (right). Again, no symmetry was imposed, a 5-pixel offset range, a 1-pixel offset step and a regularization parameter of 2 was used in generating the class averages. Prior angles were provided for the “tilt” and “psi” angles, limited to 5° local search range.
These refined class averages produced interfaces with the greatest morphological detail. In particular, the intermembrane density for classes 1 and 2 showed a more extensive globular and more detailed shape after this procedure. As an example, to validate the class 1 average, we applied the orientations obtained after 300 iterations to subtomograms with CCC values > 0 and calculated an average map from the full subtomograms together using these alignments but without 3D signal permutation (Fig. 7B).
Discussion and conclusions
One of the key limitations to subtomogram averaging can be the low object of interest number, a problem that is compounded by the time-intensive process of collecting tomograms and potential rejection or misalignment of objects. Therefore, optimizing the data processing, subtomogram alignment, and averaging workflow is essential for improving final resolution. Ideally, each object of interest should ultimately be properly aligned and classified to maximize the information content of the resulting class average maps. We developed a new subtomogram averaging workflow (NovaCTF/CrESTA/RELION) that includes 1) feature-guided extraction and alignment of subtomograms around a common axis, 2) 3D signal permutation from subtomograms using geometric and data-inspired masks and the randomization of the surrounding voxel values followed by subtomogram averaging with 3D classification/v-restrained refinement, 3) selection of subsets using manual inspection and cross-correlation approaches, and 4) repeat 3D signal permutation and v-restrained refinement with realigned/adjusted masks.Feature-guided alignment of subtomograms from NovaCTF tomographic reconstructions led to reasonably well-defined lipid bilayers in the ISV-SM synthetic fusion system (Fig. 5P–T). Lipid bilayer thickness as measure by half-height widths of voxel projections from 3D class averages approached physiological thickness for the NovaCTF/CrESTA/RELION workflow with feature-guided alignment (5.29 ± 0.45 nm average across all bilayers). Furthermore, this approach led to the most distinct ISV bilayer traces having the largest difference of double peak height compared to the local minimum within the double peak. In general, feature-guided alignment was superior to free alignment. Moreover, free alignment performed better on NovaCTF-corrected tomograms versus the RELION subtomogram averaging workflow (97.9% of ISVs were in one cluster versus 75.6%). However, even with the improvements provided by feature-guided alignment, intermembrane densities were only poorly resolved or did not resolve at all.To resolve such intermembrane densities, we extracted subtomograms around the centers of individual intermembrane densities and performed 3D signal permutation (Fig. 3D and E). 3D signal permutation alone led to class averages with better defined lipid bilayers and intermembrane densities after 3D classification/v-restrained refinement and auto-refinement (compare Fig. 6B and C). Interestingly, even though the class averages produced without 3D signal permutation contained fewer features, resolutions from FSC analysis were higher for these class averages. This is likely because these class averages contained more lipid bilayer which was well aligned and influencing the apparent resolution. Another advantage to 3D signal permutation concerns the missing wedge artifact: voxel randomization outside the mask may reduce the artificial misalignment of subtomograms based on the missing wedge. Moreover, repeat 3D signal permutation with realigned/adjusted masks and added cross-correlation pruning ultimately produced the highest quality 3D maps (Fig. 7A), resulting in more extensive and detailed intermembrane densities. In particular, a more pronounced intermembrane density emerged for class 2, whereas for 3D classification/v-refinement without 3D signal permutation, there is no observable intermembrane density for class 2 (Fig. 6B), and even for 3D refinement with 3D signal permutation prior to realignment and adjustment of the masks, there was little intermembrane density (compare Fig. 6C and 7A).Class 1 might appear as though it could be due to average of intermembrane densities that contain tilting of the synaptic vesicle (top) membrane. However, application of the alignments to calculate an expanded average map without 3D signal permutation reveal that this intermembrane density is distinct from the lipid bilayer. This expanded average map is wholly unobtainable from standard averaging (compare Fig. 7B with Fig. 6A). The expanded map reveals an elongated intermembrane density (length ~ 100 Å) that is sandwiched between lipid bilayers. We note that the length of the postfusion (cis) SNARE complex is of comparable length (Sutton et al., 1998). However, since the SNAREs is expected to be in a trans conformation that is presumably shorter, the intermembrane density in Fig. 7B likely represents a complex of SNAREs and other factors, including synaptotagmins. Additional tomograms are needed to resolve individual molecules in this density map.Our NovaCTF/CrESTA/RELION workflow marks a substantial increase in user interactions with subtomogram averaging data processing. CrESTA allows users to filter subtomograms as well as rotate subtomograms based on results from RELION refinement. Beyond this, CrESTA also marks the first time 3D signal permutation from subtomograms can be performed in a manner that de-emphasizes background voxels and maintains an image mean of 0 and standard deviation near 1. Existing image alignment and classification algorithms are designed for globular objects of interest, such as single proteins or protein complexes. Continuous membranes, membrane-membrane interfaces, and densely packed targets are not perfectly compatible with the current algorithms used in single particle analysis, in large part due to assumptions made about the region of interest versus background during solvent noise calculation. CrESTA addresses this problem by altering subtomograms directly to de-emphasize background voxels. Combined with manual inspection, these approaches offer the user greater oversight and control of the refinement process in 3D refinement software packages.
Broad uses of feature-guided alignment and 3D signal permutation
One of the major hurdles to resolving a protein complex sandwiched between continuous membranes, as shown here, is the substantial signal produced by the membranes effectively produces a pseudo-symmetric system with polarity that is prone to misalignment. CrESTA allows users to define and apply a known vector to the starting subtomograms, and thus directly addresses this issue. Therefore, in addition to cell-free systems like the synaptic protein complex example presented here, CrESTA and the NovaCTF/CrESTA/RELION workflow should prove useful in other biological systems. For instance, CrESTA would likely be useful in cellular tomography, where objects of interest are more likely to be in feature-rich regions. Additionally, cellular context provides orientation information for objects of interest, and feature-guided alignment will take advantage of this information. For example, our approaches should be generally applicable to all membrane-membrane or cell–cell junctions.
Authors: Eric F Pettersen; Thomas D Goddard; Conrad C Huang; Gregory S Couch; Daniel M Greenblatt; Elaine C Meng; Thomas E Ferrin Journal: J Comput Chem Date: 2004-10 Impact factor: 3.376
Authors: Jesús G Galaz-Montoya; Corey W Hecksel; Philip R Baldwin; Eryu Wang; Scott C Weaver; Michael F Schmid; Steven J Ludtke; Wah Chiu Journal: J Struct Biol Date: 2016-03-22 Impact factor: 2.867