Bahareh Behkamal, Mahmoud Naghibzadeh, Mohammad Reza Saberi, Zeinab Amiri Tehranizadeh, Andrea Pagnani, Kamal Al Nasr.
Abstract
Cryo-electron microscopy (cryo-EM) is a structural technique that has played a significant role in protein structure determination in recent years. Compared with the traditional methods of X-ray crystallography and NMR spectroscopy, cryo-EM can image much larger protein complexes. However, in some cases cryo-EM reconstructions are limited to medium resolution (~4-10 Å). In this resolution range, a cryo-EM density map can hardly be used to determine a protein structure directly at atomic resolution, or even to trace its amino acid backbone; only the position and orientation of secondary structure elements (SSEs) such as α-helices and β-sheets are observable. Consequently, mapping the secondary structures of the modeled structure (SSEs-A) to those detected in the cryo-EM map (SSEs-C) is one of the primary concerns in cryo-EM modeling. To address this issue, this study proposes a novel automatic computational method to identify the SSEs correspondence in three-dimensional (3D) space. First, the target sequence is modeled and highly reliable features are extracted from the generated 3D model and the map, so that the SSEs matching problem can be formulated as a 3D vector matching problem. Next, the 3D vector matching problem is transformed into a 3D graph matching problem. Finally, a similarity-based voting algorithm combined with the principle of least conflict (PLC) is developed to obtain the SSEs correspondence. To evaluate the accuracy of the method, a testing set of 25 experimental and simulated maps with a maximum of 65 SSEs is selected. Comparative studies are also conducted to demonstrate the advantage of the proposed method over some state-of-the-art techniques. The results indicate that the method is efficient, robust, and works well in the presence of errors in the predicted secondary structures of the cryo-EM images.
Keywords: 3D graph matching; 3D vector matching; cryo-electron microscopy; modeled structure; protein; secondary structure elements; similarity-based voting algorithm
Year: 2021 PMID: 34944417 PMCID: PMC8698881 DOI: 10.3390/biom11121773
Source DB: PubMed Journal: Biomolecules ISSN: 2218-273X
Figure 1. Different stages of the framework pipeline: (a) the inputs, including the modeled structure (PDB ID: 1BJ7, chain A) visualized by Chimera [29]; (b) the density map simulated at 10 Å resolution using protein structure 1BJ7 and the Chimera package [29]; (c) the secondary structure elements extracted from the 3D modeled structure in the preprocessing step (SSEs-A); (d) the secondary structure elements extracted from the cryo-EM density map (SSEs-C); (e) the 3D vectors constructed from the extracted SSEs-A; (f) the 3D vectors constructed from the extracted SSEs-C; (g,h) the 3D graphs constructed from these vectors; (i) the similarity-based voting algorithm, used as the decision-making strategy for finding the SSEs correspondence; (j) the resulting SSEs correspondence.
Figure 2. Secondary structure elements (α-helices and β-strands) in the atomic structure fitted to the cryo-EM map, visualized by Chimera [29].
Figure 3. Construction of 3D vectors from extracted SSEs: (a) the 3D structure of protein 1FLP (PDB ID), shown with Chimera [29]; (b) each α-helix in the atomic model is treated as a helix vector (HV) in the Cartesian coordinate system; (c) the cryo-EM density map and the SSEs-C detected on it. The map is simulated at 10 Å resolution using protein structure 1FLP (PDB ID). The locations of the SSEs-C are shown as purple cylinders with Gorgon [21]; (d) each SSE-C extracted from the map is treated as a stick vector (SV) in three-dimensional Cartesian space.
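The vector construction described in this caption can be illustrated with a minimal sketch. It assumes that each SSE (a helix in the model or a stick in the map) is summarized by the two endpoints of its axis; the function name `sse_vector` and the choice of returned features (midpoint, unit direction, length) are illustrative assumptions, not the authors' implementation.

```python
import math

def sse_vector(start, end):
    """Summarize an SSE axis by its midpoint, unit direction, and length.

    `start` and `end` are (x, y, z) endpoints of the axis; this endpoint
    representation is an assumption made for illustration.
    """
    direction = tuple(e - s for s, e in zip(start, end))
    length = math.sqrt(sum(d * d for d in direction))
    if length == 0.0:
        raise ValueError("SSE endpoints coincide; no direction is defined")
    midpoint = tuple((s + e) / 2.0 for s, e in zip(start, end))
    unit = tuple(d / length for d in direction)
    return midpoint, unit, length

# Hypothetical helix axis running from the origin to (3, 4, 0), in angstroms.
midpoint, unit, length = sse_vector((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
```

Helix vectors (HV) and stick vectors (SV) would both be produced this way under the stated assumption, which puts the model and the map in a common representation before matching.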
Figure 4. Transformation of 3D vectors into weighted fully connected graphs: (a) α-helix vectors in 3D space; (b) construction of the weighted fully connected graph of α-helices, in which the ith helix vector (HV_i) is transformed into the ith helix node (HN_i); (c) stick vectors in 3D space; (d) construction of the weighted fully connected graph of sticks, in which the ith stick vector (SV_i) is transformed into the ith stick node (SN_i).
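As a rough illustration of this transformation, the sketch below turns a list of axis segments into a fully connected graph with one node per vector. The edge weights used here (distance between midpoints and angle between directions) are assumptions chosen for illustration; the record does not state which weights the authors attach to the edges, and `build_graph` is a hypothetical name.

```python
import itertools
import math

def _midpoint(segment):
    start, end = segment
    return tuple((s + e) / 2.0 for s, e in zip(start, end))

def _direction(segment):
    start, end = segment
    return tuple(e - s for s, e in zip(start, end))

def build_graph(segments):
    """Return {(i, j): (distance, angle_deg)} for every pair of segments.

    Each segment is ((x, y, z), (x, y, z)). Node i stands for vector i;
    the two edge weights are illustrative assumptions.
    """
    edges = {}
    for i, j in itertools.combinations(range(len(segments)), 2):
        distance = math.dist(_midpoint(segments[i]), _midpoint(segments[j]))
        d1, d2 = _direction(segments[i]), _direction(segments[j])
        dot = sum(a * b for a, b in zip(d1, d2))
        cos = dot / (math.hypot(*d1) * math.hypot(*d2))
        # Clamp to [-1, 1] to guard against floating-point overshoot.
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos))))
        edges[(i, j)] = (distance, angle)
    return edges

# Three hypothetical vectors: two parallel, one perpendicular to both.
segments = [
    ((0.0, 0.0, 0.0), (2.0, 0.0, 0.0)),
    ((1.0, 3.0, 0.0), (1.0, 5.0, 0.0)),
    ((0.0, 0.0, 3.0), (2.0, 0.0, 3.0)),
]
graph = build_graph(segments)
```

Because the graph is fully connected, n vectors yield n(n-1)/2 edges, so the same construction applied to the SSEs-A and the SSEs-C gives two graphs whose edge weights can be compared directly.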
Information on the experimental cryo-EM maps.
| No | EMDB ID a | PDB ID b | Chain c | Length d | # SSEs-A e | # SSEs-C f | Resolution (Å) g |
|---|---|---|---|---|---|---|---|
| 1 | 5030 | 3FIN * | R | 117 | 7 | 7 | 6.4 |
| 2 | 3888 | 6EM3 * | A | 291 | 11 | 9 | 4.2 |
| 3 | 8625 | 5UZB * | A | 177 | 13 | 9 | 7 |
| 4 | 4176 | 6F36 * | M | 327 | 13 | 11 | 3.7 |
| 5 | 1733 | 3C91 * | A | 233 | 18 | 15 | 6.8 |
| 6 | 8070 | 5I1M * | V | 458 | 19 | 17 | 7 |
| 7 | 2526 | 4CHV * | A | 361 | 23 | 22 | 7 |
| 8 | 3761 | 5O8O * | A | 349 | 24 | 22 | 6.8 |
| 9 | 20934 | 6UXW * | A | 1703 | 43 | 35 | 8.9 |
| 10 | 8231 | 5KBU * | A | 1034 | 65 | 54 | 7.8 |
a the EMDB ID of the protein used in the test; b the PDB ID of the protein used in the test. β-containing proteins are marked with *; c the protein chain; d the number of amino acid residues in the sequence; e the total number of secondary structure elements (α-helices and β-strands) in the atomic structure; f the total number of secondary structure elements (α-helices and β-strands) extracted from the cryo-EM map; g the resolution of the experimental map in angstroms (Å).
Information on the simulated cryo-EM maps.
| No | Name a | PDB ID b | UniProt ID c | Chain d | Length e | # SSEs-A f | # SSEs-C g |
|---|---|---|---|---|---|---|---|
| 1 | Apolipoprotein E | 1BZ4 | P02649 | A | 144 | 5 | 5 |
| 2 | Hemoglobin-1 | 1FLP | P41260 | A | 142 | 7 | 7 |
| 3 | Gag polyprotein | 2Y4Z * | P03336 | A | 140 | 8 | 8 |
| 4 | Uncharacterized protein YqeY | 1NG6 | P54464 | A | 148 | 9 | 7 |
| 5 | Phosphatidylinositol | 1HG5 | O55012 | A | 289 | 11 | 9 |
| 6 | Class IV chitinase Chia4-Pa2 | 3HBE | Q6WSR8 | X | 204 | 11 | 7 |
| 7 | Phospholipase C | 1P5X | P09598 | A | 245 | 13 | 9 |
| 8 | Tetracycline repressor protein class D | 2XB5 | P0ACT4 | A | 207 | 13 | 9 |
| 9 | Protein LlR18A | 1ICX * | P52778 | A | 155 | 13 | 11 |
| 10 | N-glycosylase/DNA lyase | 1XQO | Q8ZVK6 | A | 256 | 14 | 14 |
| 11 | AlphaRep-4 | 3LTJ | __ | A | 201 | 16 | 12 |
| 12 | 4,4’-diapophytoene synthases | 3ACW | A9JQL9 | A | 293 | 17 | 14 |
| 13 | Flagellar motor switch protein FliG | 3HJL | O66891 | A | 329 | 20 | 20 |
| 14 | Symplekin | 3ODS | Q92797 | A | 415 | 21 | 16 |
| 15 | Albumin | 2XVV | P02768 | A | 585 | 33 | 19 |
a the name of the protein; b the PDB ID of the protein used in the test. β-containing proteins are marked with *; c the UniProt ID of the protein; d the protein chain; e the number of amino acid residues in the sequence; f the total number of secondary structure elements (α-helices and β-strands) in the atomic structure; g the total number of secondary structure elements extracted from the cryo-EM map.
Accuracy (%) of the three SSEs correspondence sets (Angle, ED, RL) under the two scoring functions (BD and MAC).
| No | PDB ID | BD: Angle | BD: ED | BD: RL | MAC: Angle | MAC: ED | MAC: RL |
|---|---|---|---|---|---|---|---|
| 1 | 1BZ4 | 80 | 80 | 80 | 80 | 60 | 80 |
| 2 | 1FLP | 42.85 | 57.14 | 28.57 | 57.14 | 71.42 | 57.14 |
| 3 | 2Y4Z | 50 | 58.33 | 58.33 | 58.33 | 50 | 50 |
| 4 | 1NG6 | 44.44 | 88.88 | 66.66 | 44.44 | 88.88 | 77.77 |
| 5 | 1HG5 | 72.72 | 36.36 | 36.36 | 54.54 | 45.45 | 54.54 |
| 6 | 3HBE | 81.81 | 90.9 | 81.81 | 81.81 | 90.9 | 72.72 |
| 7 | 1P5X | 69.23 | 84.16 | 61.53 | 76.92 | 100 | 69.23 |
| 8 | 2XB5 | 38.46 | 76.92 | 69.23 | 46.15 | 53.84 | 69.23 |
| 9 | 1ICX | 76.19 | 77.38 | 53.57 | 84.52 | 70.23 | 63.09 |
| 10 | 1XQO | 64.28 | 57.14 | 50 | 71.42 | 78.57 | 28.57 |
| 11 | 3LTJ | 43.75 | 93.75 | 37.5 | 100 | 43.75 | 62.5 |
| 12 | 3ACW | 35.29 | 64.7 | 47.05 | 35.29 | 52.94 | 35.29 |
| 13 | 3HJL | 20 | 90 | 30 | 40 | 95 | 30 |
| 14 | 3ODS | 33.33 | 52.38 | 33.33 | 23.8 | 57.14 | 42.58 |
| 15 | 2XVV | 60.6 | 78.78 | 45.45 | 63.63 | 78.78 | 54.54 |
| 16 | 3FIN | 58.33 | 58.33 | 29.16 | 45.83 | 87.5 | 58.33 |
| 17 | 6EM3 | 70.83 | 47.91 | 58.33 | 81.25 | 54.16 | 52.08 |
| 18 | 5UZB | 55.55 | 66.66 | 44.44 | 55.55 | 66.66 | 55.55 |
| 19 | 6F36 | 38.46 | 92.3 | 53.84 | 38.46 | 100 | 53.84 |
| 20 | 3C91 | 62.5 | 63.75 | 60 | 62.5 | 68.75 | 45 |
| 21 | 5I1M | 36.84 | 52.63 | 57.89 | 31.57 | 47.36 | 36.84 |
| 22 | 4CHV | 53.33 | 73.33 | 46.66 | 53.33 | 93.33 | 66.66 |
| 23 | 5O8O | 52.38 | 66.66 | 52.38 | 50 | 92.85 | 50 |
| 24 | 6UXW | 41.21 | 79.84 | 48.18 | 49.69 | 67.27 | 41.66 |
| 25 | 5KBU | 47.63 | 46.59 | 35.51 | 53.78 | 49.76 | 36.97 |
| Average | | 53.20 | 69.39 | 50.63 | 57.59 | 70.58 | 53.76 |
The accuracy of the method incorporating the SimVA algorithm.
| No | PDB ID a | BD b | SimVA_BD c | MAC d | SimVA_MAC e |
|---|---|---|---|---|---|
| 1 | 1BZ4 | 80 | 80 | 73.33 | 80 |
| 2 | 1FLP | 42.85 | 57.14 | 61.9 | 85.71 |
| 3 | 2Y4Z | 55.55 | 66.66 | 55.55 | 66.66 |
| 4 | 1NG6 | 66.66 | 100 | 70.37 | 77.77 |
| 5 | 1HG5 | 48.48 | 54.54 | 51.51 | 72.72 |
| 6 | 3HBE | 84.84 | 90.9 | 81.81 | 90.9 |
| 7 | 1P5X | 71.79 | 92.3 | 82.05 | 84.61 |
| 8 | 2XB5 | 61.53 | 76.92 | 56.4 | 69.23 |
| 9 | 1ICX | 69.04 | 84.52 | 72.61 | 91.66 |
| 10 | 1XQO | 57.14 | 78.57 | 59.52 | 64.28 |
| 11 | 3LTJ | 58.33 | 100 | 62.5 | 56.25 |
| 12 | 3ACW | 49.01 | 70.58 | 41.17 | 70.58 |
| 13 | 3HJL | 46.66 | 85 | 55 | 75 |
| 14 | 3ODS | 39.68 | 61.9 | 41.26 | 66.66 |
| 15 | 2XVV | 61.61 | 66.66 | 65.65 | 63.63 |
| 16 | 3FIN | 48.61 | 70.83 | 63.88 | 70.83 |
| 17 | 6EM3 | 59.02 | 64.58 | 62.5 | 87.5 |
| 18 | 5UZB | 55.55 | 77.77 | 59.25 | 66.66 |
| 19 | 6F36 | 61.53 | 69.23 | 64.1 | 76.92 |
| 20 | 3C91 | 62.08 | 87.5 | 58.75 | 78.75 |
| 21 | 5I1M | 49.12 | 78.94 | 38.59 | 84.21 |
| 22 | 4CHV | 57.77 | 86.66 | 71.11 | 86.66 |
| 23 | 5O8O | 57.14 | 47.61 | 64.28 | 85.71 |
| 24 | 6UXW | 56.41 | 84.84 | 52.87 | 67.87 |
| 25 | 5KBU | 43.24 | 70.73 | 46.84 | 81.62 |
| Average | | 57.74 | 76.17 | 61.51 | 76.09 |
a the PDB ID of the protein; b the total accuracy obtained from the three mathematical features using the BD scoring function; c the accuracy of the SimVA algorithm using the BD scoring function; d the total accuracy obtained from the three mathematical features using the MAC scoring function; e the accuracy of the SimVA algorithm using the MAC scoring function.
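To make the idea of combining several candidate correspondence sets concrete, the following is a minimal voting sketch. It assumes three candidate mappings (for example one per feature: Angle, ED, RL), tallies how often each pair is proposed, and resolves conflicts greedily by vote count. This is a generic majority-vote scheme written for illustration; it is not the authors' SimVA algorithm or their principle of least conflict, whose details are not given in this record, and all names are hypothetical.

```python
from collections import Counter

def vote_correspondence(candidate_sets):
    """Combine candidate mappings {model_sse: map_sse} into one mapping.

    Pairs are accepted in order of decreasing vote count; a pair is skipped
    if either of its SSEs is already assigned (a simple greedy way to keep
    the result one-to-one, assumed here for illustration).
    """
    votes = Counter()
    for mapping in candidate_sets:
        for model_sse, map_sse in mapping.items():
            votes[(model_sse, map_sse)] += 1

    result, used_map_sses = {}, set()
    # Sort by votes (descending), then by pair for a deterministic order.
    for (model_sse, map_sse), _ in sorted(votes.items(), key=lambda kv: (-kv[1], kv[0])):
        if model_sse in result or map_sse in used_map_sses:
            continue
        result[model_sse] = map_sse
        used_map_sses.add(map_sse)
    return result

# Hypothetical candidate sets from three features.
by_angle = {"H1": "S1", "H2": "S3", "H3": "S2"}
by_ed = {"H1": "S1", "H2": "S2", "H3": "S3"}
by_rl = {"H1": "S2", "H2": "S2", "H3": "S3"}
combined = vote_correspondence([by_angle, by_ed, by_rl])
```

In this toy input each of the pairs H1-S1, H2-S2, and H3-S3 receives two votes, so the combined mapping agrees with none of the three inputs on every pair, which is the kind of situation in which a voting step can outperform any single feature.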
Figure 5. Assessment of the method with respect to four performance measures: (a) precision, (b) sensitivity, (c) F-measure, (d) accuracy.
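For reference, the four measures named in this caption can be computed from confusion counts as sketched below. The sketch uses the standard textbook definitions; how the authors count true and false positives for SSE correspondences is not stated in this record, so the counts in the example are purely hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return precision, sensitivity, F-measure, and accuracy as fractions.

    Standard definitions; a zero denominator yields 0.0 by convention here.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }

# Hypothetical counts, not taken from the paper.
metrics = classification_metrics(tp=8, fp=2, tn=6, fn=4)
```

The F-measure is the harmonic mean of precision and sensitivity, so it drops sharply when either one is low, which makes it a useful single summary next to plain accuracy.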
Comparison between DP-TOSS and SimVA.
| No | PDB ID a | DP-TOSS b | SimVA_BD c | SimVA_MAC d |
|---|---|---|---|---|
| 1 | 1BZ4 | 100 | 80 | 80 |
| 2 | 1FLP | 100 | 57.14 | 85.71 |
| 3 | 2Y4Z | 50 | 66.66 | 66.66 |
| 4 | 1NG6 | 71.40 | 100 | 77.77 |
| 5 | 1HG5 | 55.60 | 54.54 | 72.72 |
| 6 | 3HBE | 57.10 | 90.9 | 90.9 |
| 7 | 1P5X | 55.60 | 92.3 | 84.61 |
| 8 | 2XB5 | 66.70 | 76.92 | 69.23 |
| 9 | 1ICX | 45.50 | 84.52 | 91.66 |
| 10 | 1XQO | 71.4 | 78.57 | 64.28 |
| 11 | 3LTJ | 83.30 | 100 | 56.25 |
| 12 | 3ACW | 100 | 70.58 | 70.58 |
| 13 | 3HJL | 100 | 85 | 75 |
| 14 | 3ODS | 100 | 61.9 | 66.66 |
| 15 | 2XVV | 89.40 | 66.66 | 63.63 |
| 16 | 3FIN | 100 | 70.83 | 70.83 |
| 17 | 6EM3 | 44.40 | 64.58 | 87.5 |
| 18 | 5UZB | 55.50 | 77.77 | 66.66 |
| 19 | 6F36 | 100 | 69.23 | 76.92 |
| 20 | 3C91 | 46.70 | 87.5 | 78.75 |
| 21 | 5I1M | 41.20 | 78.94 | 84.21 |
| 22 | 4CHV | 0 | 86.66 | 86.66 |
| 23 | 5O8O | 0 | 47.61 | 85.71 |
| 24 | 6UXW | 0 | 84.84 | 67.87 |
| 25 | 5KBU | 0 | 70.73 | 81.62 |
| Average | | 61.35 | 76.17 | 76.09 |
a the PDB ID of the protein; b the accuracy of the DP-TOSS method; c the accuracy of the SimVA algorithm using the BD scoring function; d the accuracy of the SimVA algorithm using the MAC scoring function.
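The averages in the last row can be checked directly from the per-protein values. The short sketch below copies the DP-TOSS and SimVA_BD columns from the table above and recomputes their means; the variable names are illustrative.

```python
from statistics import mean

# Per-protein accuracies copied from the comparison table (rows 1 to 25).
dp_toss = [
    100, 100, 50, 71.40, 55.60, 57.10, 55.60, 66.70, 45.50, 71.4,
    83.30, 100, 100, 100, 89.40, 100, 44.40, 55.50, 100, 46.70,
    41.20, 0, 0, 0, 0,
]
simva_bd = [
    80, 57.14, 66.66, 100, 54.54, 90.9, 92.3, 76.92, 84.52, 78.57,
    100, 70.58, 85, 61.9, 66.66, 70.83, 64.58, 77.77, 69.23, 87.5,
    78.94, 86.66, 47.61, 84.84, 70.73,
]

dp_toss_avg = mean(dp_toss)
simva_bd_avg = mean(simva_bd)

# Number of proteins on which SimVA_BD is at least as accurate as DP-TOSS.
simva_bd_not_worse = sum(s >= d for s, d in zip(simva_bd, dp_toss))
```

Both recomputed means agree with the reported averages (61.35 and 76.17) to within rounding of the tabulated values.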
Figure 6. The runtime of the method with respect to the number of SSEs-A in each protein, labeled as PDB ID (#SSEs-A).