H E White, A Ignatiou, D K Clare, E V Orlova.
Abstract
In living organisms, biological macromolecules are intrinsically flexible and naturally exist in multiple conformations. Modern electron microscopy, especially at liquid nitrogen temperatures (cryo-EM), can visualise biocomplexes in near-native conditions and in multiple conformational states. Advances in electronic technology and software development over the last decade have revealed structural variations in complexes and improved the resolution of EM structures. Structural studies based on single particle analysis (SPA) now offer several approaches for separating different conformational states and thereby elucidating the mechanisms by which complexes function. Resolving different states requires the examination of large datasets, sophisticated programs, and significant computing power. Some methods are based on the analysis of two-dimensional images, while others are based on three-dimensional studies. In this review, we describe the basic principles implemented in the various techniques currently used in the analysis of structural conformations and provide some examples of successful applications of these methods in structural studies of biologically significant complexes.
Year: 2017 PMID: 28191458 PMCID: PMC5274696 DOI: 10.1155/2017/1032432
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. Overall diagram of the workflow of structural analysis by cryo-EM.
Figure 2. Multivariate Statistical Analysis. (a) Left: ten images, each consisting of 2 pixels. Right: each image is represented as a vector in 2-dimensional space according to its grey values. (b) Hierarchical Classification. The left panel shows the sequential combination of vectors according to their closeness. Classification starts by forming small classes of images that are close to one another in multidimensional space; each group is then progressively enlarged by merging with other surrounding smaller groups that are in close proximity (see the text). Images that are too far from each other form new separate classes. In the example shown in panel (b), the formation of two classes is represented by the blue and green ovals of varying colour intensity. The light and dark coloured ovals correspond to the initial and final steps of classification, respectively. The right panel shows the HAC tree. The starting point is 10 classes, one for each single image in the dataset. The dashed red line shows the cut-off point if 2 classes are required; it corresponds to the two classes shown in the left panel.
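The merging procedure described for panel (b) can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes Euclidean distance between grey-value vectors and merges the two classes whose average vectors are closest, which is one of several possible merging criteria and not necessarily the one used by any EM package. The function names and the ten 2-pixel "images" are hypothetical.

```python
import math

def centroid(vectors):
    """Average vector of a group of images (each image is a list of grey values)."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def hac(images, n_classes):
    """Agglomerative sketch: start with one class per image and repeatedly
    merge the two classes whose averages are closest, stopping when
    n_classes remain (the cut-off level of the tree)."""
    classes = [[i] for i in range(len(images))]  # each class is a list of image indices
    while len(classes) > n_classes:
        best = None
        for a in range(len(classes)):
            for b in range(a + 1, len(classes)):
                ca = centroid([images[i] for i in classes[a]])
                cb = centroid([images[i] for i in classes[b]])
                d = math.dist(ca, cb)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        classes[a] = classes[a] + classes[b]  # merge class b into class a
        del classes[b]
    return [sorted(c) for c in classes]

# Ten hypothetical 2-pixel images forming two well-separated groups.
images = [[1.0, 1.2], [1.1, 0.9], [0.8, 1.0], [1.2, 1.1], [0.9, 0.8],
          [5.0, 5.1], [5.2, 4.9], [4.8, 5.0], [5.1, 5.3], [4.9, 4.7]]
two_classes = hac(images, 2)
```

Asking for 10 classes returns every image in its own class, which corresponds to the starting point of the tree; asking for 2 corresponds to cutting the tree at the dashed line.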
Figure 3. Eigenimages and Classification. (a) A set of raw images. (b) Four images (top), each of size K × K pixels, shown with a coarse pixelation similar to those in panel (a). The images form a matrix in which each image occupies a single row (bottom). The pixels of row 1 of image 1 are laid out at the start of the first row of the matrix, followed by the pixels of row 2 of image 1, and so on until all K rows of the image have been laid out in the first row of the matrix. The rows of image 2 are laid out in the same manner in row 2 of the matrix, and the process continues until all N images in the dataset have been placed into the matrix. (c) Eight eigenimages obtained from the set of aligned images in (a). (d) Classification of the dataset into 5 classes. (e) Classification of the dataset into 10 classes. (f) Raw unaligned rotated images. (g) Eigenimages from the unaligned dataset.
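The layout in panel (b) and the idea of an eigenimage can be illustrated with a small Python sketch. It makes several assumptions that are not taken from the paper: the average image is subtracted before the analysis (so the vector returned is the main mode of variation rather than the total sum of the data), only the leading eigenvector is computed, and it is found by simple power iteration. The names `flatten` and `first_eigenimage` and the 2 × 2 test images are hypothetical.

```python
import math

def flatten(image):
    """Lay out the K rows of a K x K image one after another in a single row."""
    return [pixel for row in image for pixel in row]

def first_eigenimage(images, n_iter=50):
    """Leading eigenvector of the pixel covariance of the dataset,
    found by power iteration and reshaped back to K x K."""
    k = len(images[0])
    data = [flatten(img) for img in images]            # N x (K*K) matrix, one image per row
    n, p = len(data), k * k
    mean = [sum(row[j] for row in data) / n for j in range(p)]
    centred = [[row[j] - mean[j] for j in range(p)] for row in data]
    v = [1.0 + 0.01 * j for j in range(p)]             # arbitrary non-zero start vector
    for _ in range(n_iter):
        # Apply X^T X to v without forming the covariance matrix explicitly.
        proj = [sum(r[j] * v[j] for j in range(p)) for r in centred]
        w = [sum(centred[i][j] * proj[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [v[r * k:(r + 1) * k] for r in range(k)]

# Four hypothetical 2 x 2 images: the top-left pixel brightens while the
# bottom-right pixel darkens; the other two pixels stay constant.
images = [[[float(t), 5.0], [5.0, float(-t)]] for t in (0, 2, 4, 6)]
eigenimage = first_eigenimage(images)
```

In this toy dataset the eigenimage has weights of equal magnitude and opposite sign in the two varying pixels and zero elsewhere, which is the sense in which eigenimages localise the variation within a dataset. The overall sign of an eigenvector is arbitrary.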
Figure 4. Eigenimages-Symmetry. (a) A model dataset with 4-fold symmetry. (b) Eigenimages from that dataset. Eigenimages 2 and 3 have clear 4-fold symmetry and are rotated with respect to one another by 22.5°. (c) Tetrameric α-latrotoxin raw images (top row), class averages (middle row), and eigenimages (bottom row). The eigenimages showing 4-fold and pseudo 8-fold symmetry are marked with numbers. (d) Class averages of top views from the connector of bacteriophage SPP1 are shown in the top panel. Classes where the symmetry is visible are highlighted with yellow circles. The eigenimages are in the bottom panel. Eigenimage 1 represents the total sum of the data and the 12-fold symmetry is revealed in eigenimages 2 and 3.
Figure 5. Eigenimages-Size Variation. (a) Eigenimages of Hsp26 are shown in the left panel. Eigenimage 1 represents the total sum of the dataset. Eigenimage 2 shows a continuous outer circle, which indicates the characteristic range of size differences within the dataset. The right panel shows the entire dataset separated into four classes via MSA using only the first four eigenimages. The big class is highlighted with a white circle around its perimeter, the small class is highlighted with a dashed white circle, and the remaining two classes represent a mixture of large and small Hsp26 images. (b) Eigenimages of BSMV. The size difference is shown in images 11 and 12 (adapted from [18]). (c) A representative micrograph showing the heterogeneity of the SPP1 bacteriophage procapsids, where different sizes are clearly seen [19]. (d) The classes of the procapsid images are labelled according to their size: big (B, in blue) and small (S, in yellow).
Figure 6. Eigenimages-Substrate Binding. (a) GroEL bound to the substrate rhodanese, with the raw images (top) and eigenimages (bottom). Eigenimage 5, highlighted with a yellow box, indicates heterogeneity in the trans-ring which is related to the binding of rhodanese (adapted from [5]). (b) Three of the 12 orientation classes (column 1) from the GroEL-rhodanese complex after MSA based on the eigenimages, the first six of which are shown in (a). The eigenimages of these classes are shown in columns 2–5 and the heterogeneity in the trans-ring is highlighted with a yellow box (from [5]).
Figure 7. ML procedure in the analysis of conformational changes of biocomplexes. Raw images are first assigned initial orientation angles using the initial model, typically by projection matching. The ML approach is then used to obtain 6 to 8 reconstructions. Each 3D model is visually examined in the area of interest for the presence of a ligand; in this case the bound tRNA is highlighted in red. Images which were used to obtain the models with tRNA are extracted and subjected to the next round of classification. In the following step, images corresponding to one or another conformation are extracted and then refined. The percentages below the structures in the top row indicate the fractions of images from the entire dataset used to calculate these models, while in the second row the percentages are taken from the number of images supposedly containing the bound tRNA.
Figure 8. K-Means Clustering. (a) Step 1: two initial seeds are randomly placed within the data. (b) Step 2: the positions of the averages of the images that are nearest to each seed are indicated. (c) Step 3: the averages are recalculated based on the assignments in step 2. Steps 2 and 3 are iterated; (d) shows the final classes.
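The iteration described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions: images are represented as short lists of grey values, closeness is measured by Euclidean distance, and the function name `k_means`, its parameters, and the example data are hypothetical rather than taken from the paper or any package.

```python
import math
import random

def k_means(images, seeds, max_iter=100):
    """Alternate between assigning each image to its nearest centre (step 2)
    and recalculating each centre as the average of its images (step 3),
    stopping when the assignments no longer change."""
    centres = [list(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centres)),
                          key=lambda c: math.dist(img, centres[c]))
                      for img in images]
        if new_labels == labels:       # converged: assignments are stable
            break
        labels = new_labels
        for c in range(len(centres)):
            members = [img for img, lab in zip(images, labels) if lab == c]
            if members:                # keep the old centre if a class is empty
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres

# Ten hypothetical 2-pixel images forming two groups.
images = [[1.0, 1.2], [1.1, 0.9], [0.8, 1.0], [1.2, 1.1], [0.9, 0.8],
          [5.0, 5.1], [5.2, 4.9], [4.8, 5.0], [5.1, 5.3], [4.9, 4.7]]

# Step 1: two seeds picked at random from the data.
random_seeds = random.Random(0).sample(images, 2)
labels, centres = k_means(images, random_seeds)
```

Because the seeds are random, the numbering of the classes (and, for less well-separated data, the final partition) can differ between runs, which is why K-means is usually repeated with several different starting seeds.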
Figure 9. Bootstrapping. A representative set of chickens with different tail and head positions. In step 1, L subsets of M images each are picked to make L reconstructions. In step 2, the variance among the L reconstructions reveals the most significant differences, here in the head (green) and tail (red) positions. In step 3, the images are classified by analysing the level of variance in the areas defined in step 2 (highlighted by red and blue circles). The two reconstructions generated are then used as input for refinement using focused classification (step 4).
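Steps 1 and 2 of this scheme can be sketched in Python. The sketch is heavily simplified and rests on assumptions that are not in the paper: each "image" is a short list of values, subsets are drawn with replacement, and the "reconstruction" from a subset is simply its average, standing in for a real 3D reconstruction from projections. The function name `bootstrap_variance` and the example data are hypothetical.

```python
import random
import statistics

def bootstrap_variance(images, n_subsets, subset_size, seed=0):
    """Draw n_subsets random subsets of subset_size images (with replacement),
    average each subset as a stand-in for a reconstruction, and return the
    element-wise variance across those reconstructions."""
    rng = random.Random(seed)
    reconstructions = []
    for _ in range(n_subsets):
        subset = [rng.choice(images) for _ in range(subset_size)]
        reconstructions.append([statistics.fmean(col) for col in zip(*subset)])
    return [statistics.pvariance(col) for col in zip(*reconstructions)]

# Hypothetical dataset with two states: elements 0 and 3 differ between
# the states, while elements 1 and 2 are identical in every image.
state_a = [1.0, 2.0, 2.0, 0.0]
state_b = [0.0, 2.0, 2.0, 1.0]
images = [state_a] * 20 + [state_b] * 20
variance_map = bootstrap_variance(images, n_subsets=50, subset_size=10)
```

Elements that are the same in every image have zero variance across the reconstructions, while elements that differ between the two states have non-zero variance; these high-variance regions are the ones that would be used to focus the subsequent classification.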
Packages used to work with heterogeneous datasets.
| Package name | Package reference | Examples | Statistical approach | References |
|---|---|---|---|---|
| IMAGIC | Van Heel et al., 1996 | Hsp26 | MSA | White et al., 2004 |
| SPIDER | Frank et al., 1996 | 70S ribosome | MSA | Liao and Frank, 2010 |
| RELION | Scheres, 2012 | 70S ribosome complex | ML (regularized likelihood optimization) | Anden et al., 2015 |
| EMAN2 | | | MSA, BS, CM | Tang et al., 2007 |
| SPARX | Hohn et al., 2007 | 70S ribosome | BS, CM | Liao and Frank, 2010 |
| XMIPP | Sorzano et al., 2004 | 70S ribosome & Simian Virus 40 large T-antigen | ML | Scheres et al., 2007 |
| ASPIRE | | 70S ribosome | CM | Katsevich et al., 2015 |
| FREALIGN | Grigorieff, 2007 | 70S ribosome | ML | Lyumkis et al., 2013 |
| Appion | | Integration of different software packages | | Lander et al., 2009 |
| Scipion | | Shell that combines different packages | | de la Rosa-Trevín et al., 2016 |
MSA: multivariate statistical analysis; ML: maximum likelihood; BS: bootstrapping; CM: covariance maps; NN: neural networks. For more software packages see http://www.emdatabank.org/emsoftware.html.