Kamal Al Nasr, Feras Yousef, Ruba Jebril, Christopher Jones.
Abstract
To take advantage of recent advances in genomics and proteomics, it is critical that the three-dimensional physical structure of biological macromolecules be determined. Cryo-electron microscopy (cryo-EM) is a promising and improving method for obtaining these data; however, the resolution is often insufficient to determine the atomic-scale structure directly. Despite this, the locations of secondary structure elements are detectable. De novo modeling is a computational approach to modeling these macromolecular structures from cryo-EM-derived data. During de novo modeling, a mapping between the detected secondary structures and the underlying amino acid sequence must be identified. DP-TOSS (Dynamic Programming for determining the Topology Of Secondary Structures) is one tool that attempts to automate the creation of this mapping. By treating the correspondence between the detected structures and the structures predicted from sequence data as a constraint graph problem, DP-TOSS achieved good accuracy in its original iteration. In this paper, we propose modifications to the scoring methodology of DP-TOSS to improve its accuracy. Three scoring schemes were applied to DP-TOSS and tested: (i) a skeleton-based scoring function; (ii) a geometry-based analytical function; and (iii) a multi-well potential energy-based function. A test on 25 proteins shows that a combination of these schemes can improve the performance of DP-TOSS in solving the topology determination problem for protein macromolecules.
Keywords: analysis; cryo-electron microscopy; geometry; potential energy; protein modeling; protein secondary structure elements; protein topology
Year: 2018 PMID: 29360779 PMCID: PMC6017786 DOI: 10.3390/molecules23020028
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Figure 1. Topology problem. (a) The volume and the sticks detected for SSEs-V. The volume was simulated at 10 Å resolution using protein structure 1POH (PDB ID). Three helix sticks (blue) and four strand sticks (green) were detected from the volume; (b) The SSEs-S observed from the protein sequence are marked as SQ1 to SQ7. Helix segments are colored in blue and β-strands in green; (c) The helix sticks superimposed on the skeleton (yellow) that was generated using the initial version of our skeletonizer [47]; (d) The native protein structure superimposed on the skeleton; (e) The correct topology of the SSEs; (f) An example of a possible wrong topology.
Information on the experimental cryo-EM volumes used.
| No. | EMD ID a | PDB ID b | Chain c | Resolution (Å) d |
|---|---|---|---|---|
| 1 | 5030 | 3FIN | R | 6.4 |
| 2 | 2526 | 4CHV | A | 7.0 |
| 3 | 8070 | 5I1M | V | 7.0 |
| 4 | 4176 | 6F36 | M | 3.7 |
| 5 | 3888 | 6EM3 | L | 4.2 |
| 6 | 2843 | 4UE4 | C | 7.0 |
| 7 | 8625 | 5UZB | A | 7.0 |
| 8 | 1733 | 3C91 | K | 6.8 |
| 9 | 3761 | 5O8O | A | 6.8 |
| 10 | 4154 | 5M50 | C | 5.5 |
a the EM Databank ID of the experimental cryo-EM volume; b the PDB ID of the fitted protein molecule; c the chain used in the experiment; d the resolution of the experimental image in Angstrom (Å).
The performance of DP-TOSS with different scoring functions.
| No. | ID a | SSEs-S b | SSEs-V c | Rank_sk d | Rank_sk+g e | Rank_sk+g+e f | Rank_sk+e g | Rank_g h | Rank_g+e i |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1FLP | 7 | 6 | 1 | 1 | 1 | 1 | 4 | 17 |
| 2 | 1NG6 | 9 | 7 | 2 | 2 | 1 | 1 | 7 | 15 |
| 3 | 2XB5 | 13 | 10 | 11 | 2 | 9 | 47 | 91 | N/A |
| 4 | 1BZ4 | 5 | 5 | 1 | 1 | 3 | 56 | 87 | N/A |
| 5 | 3ACW | 17 | 15 | 32 | 7 | 24 | 28 | 73 | 61 |
| 6 | 1A7D | 6 | 4 | 12 | 2 | 17 | 19 | 46 | 94 |
| 7 | 3ODS | 21 | 16 | 7 | 1 | 34 | 61 | N/A | N/A |
| 8 | 3HJL | 20 | 20 | 1 | 1 | 1 | 1 | 4 | 16 |
| 9 | 1ICX* | 13 | 11 | 31 | 12 | 45 | N/A | N/A | N/A |
| 10 | 1OZ9* | 13 | 12 | 2 | 2 | 3 | 4 | 72 | N/A |
| 11 | 4OXW* | 8 | 7 | 6 | 1 | 2 | 2 | 18 | 77 |
| 12 | 1YD0* | 8 | 7 | 31 | 5 | 22 | N/A | N/A | 65 |
| 13 | 2Y4Z* | 8 | 8 | N/A | 14 | 59 | 92 | N/A | 83 |
| 14 | 4YOK* | 17 | 15 | N/A | 37 | N/A | 87 | N/A | N/A |
| 15 | 4R9A* | 14 | 10 | N/A | 27 | N/A | N/A | N/A | N/A |
| 16 | 3FIN* | 7 | 7 | 1 | 2 | 2 | 5 | 5 | 24 |
| 17 | 4CHV* | 23 | 19 | N/A | N/A | N/A | N/A | N/A | N/A |
| 18 | 5I1M | 19 | 12 | N/A | N/A | N/A | N/A | N/A | N/A |
| 19 | 6F36 | 13 | 7 | 2 | 1 | 3 | 19 | 2 | 63 |
| 20 | 6EM3* | 8 | 8 | 27 | 13 | 77 | N/A | 51 | N/A |
| 21 | 4UE4 | 6 | 5 | 1 | 1 | 3 | 14 | 11 | 56 |
| 22 | 5UZB* | 13 | 7 | 20 | 9 | 29 | 77 | N/A | N/A |
| 23 | 3C91* | 19 | 19 | N/A | 51 | 87 | N/A | N/A | N/A |
| 24 | 5O8O* | 24 | 22 | N/A | N/A | N/A | N/A | N/A | N/A |
| 25 | 5M50* | 9 | 8 | 41 | 31 | 55 | 82 | N/A | N/A |
a The PDB ID of the protein used in the test; β-containing proteins are marked with *; b total number of secondary structure elements in the sequence; c total number of secondary structure elements extracted from the cryo-EM volume; d the rank of the correct topology using the skeleton-trace scoring function; e the rank of the correct topology using the skeleton and geometry functions; f the rank of the correct topology using the skeleton, geometry, and energy functions; g the rank of the correct topology using the skeleton and energy functions; h the rank of the correct topology using the geometry function; i the rank of the correct topology using the geometry and energy functions.
Figure 2. An example of a topology graph. The weights were restricted to integers to save space in the drawing. Two examples of invalid paths are shown as red dashed lines. The shortest path is shown as magenta solid lines. Transparent nodes are invalid because they assign a sequence segment to a stick of a different type. Only possible edges are shown.
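To make the idea of a shortest valid path through a topology graph concrete, the following is a minimal sketch and not the DP-TOSS implementation. It assumes a simplified setting: each stick must be assigned to a distinct sequence segment of the same type, in sequence order, and stick direction is ignored. The function name `best_topology` and the `weight` matrix (cost of moving from one stick to the next) are hypothetical.

```python
from functools import lru_cache

def best_topology(seq_types, stick_types, weight):
    """Return (cost, assignment); assignment[i] is the stick given to
    sequence segment i, or None if that segment is left unassigned.
    Simplified illustration: stick direction is ignored."""
    n_seq, n_stick = len(seq_types), len(stick_types)
    all_used = (1 << n_stick) - 1  # bitmask with every stick assigned

    @lru_cache(maxsize=None)
    def solve(i, last, used):
        if used == all_used:          # every stick placed: remaining segments stay empty
            return 0.0, (None,) * (n_seq - i)
        if i == n_seq:                # ran out of segments with sticks left over
            return float("inf"), ()
        # Option 1: leave sequence segment i unassigned.
        cost, tail = solve(i + 1, last, used)
        best = (cost, (None,) + tail)
        # Option 2: assign an unused stick of the matching type.
        for s in range(n_stick):
            if (used >> s) & 1 or stick_types[s] != seq_types[i]:
                continue              # invalid node: stick reused or type mismatch
            step = 0.0 if last < 0 else weight[last][s]
            cost, tail = solve(i + 1, s, used | (1 << s))
            if step + cost < best[0]:
                best = (step + cost, (s,) + tail)
        return best

    return solve(0, -1, 0)

# Hypothetical example: helix, strand, helix in sequence; two helix sticks.
cost, assignment = best_topology(["H", "E", "H"], ["H", "H"], [[0, 5], [2, 0]])
print(cost, assignment)
```

The bitmask over used sticks enforces the constraint that a valid path visits each stick once, which is what separates this problem from an ordinary shortest-path search.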
Figure 3. An example of building the graph for the skeleton. (a) The clusters are built from the model and their centroids (shown in red) are calculated. Each centroid is a node in the graph; (b) Two centroids are connected if any voxel from the first centroid’s cluster lies within 3 Å of any voxel from the second centroid’s cluster. The weights of the edges are the Euclidean distances between the centroids.
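The graph construction described in this caption can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: it assumes the clusters are already given as lists of voxel coordinates in Å, and the function name `build_skeleton_graph` and the input layout are hypothetical.

```python
import math
from itertools import combinations

def build_skeleton_graph(clusters, cutoff=3.0):
    """clusters: dict mapping cluster id -> list of (x, y, z) voxel coordinates in Å.
    Returns (centroids, edges), where edges maps (id_a, id_b) to the
    Euclidean distance between the two centroids."""
    # One node per cluster, placed at the mean of its voxel coordinates.
    centroids = {
        cid: tuple(sum(axis) / len(voxels) for axis in zip(*voxels))
        for cid, voxels in clusters.items()
    }
    edges = {}
    for a, b in combinations(sorted(clusters), 2):
        # Connect two nodes if any voxel pair across the clusters is within the cutoff.
        touching = any(
            math.dist(p, q) <= cutoff for p in clusters[a] for q in clusters[b]
        )
        if touching:
            edges[(a, b)] = math.dist(centroids[a], centroids[b])
    return centroids, edges

# Hypothetical toy input: clusters 0 and 1 are neighbours, cluster 2 is isolated.
toy = {0: [(0, 0, 0), (2, 0, 0)], 1: [(4, 0, 0), (6, 0, 0)], 2: [(20, 0, 0)]}
centroids, edges = build_skeleton_graph(toy)
print(edges)
```

The all-pairs voxel comparison is quadratic in cluster size; a spatial index would be the natural replacement for real density maps.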
Figure 4. An example of paths for skeleton traces that can be found between two SSEs-V endpoints. (a) Two paths are found (shown as black arrows); (b) The native loop structure, which shows that the longest path is the correct path. The lengths of the two paths are compared with the length of the loop, and the path that best fits the loop is chosen to calculate the weight of the edge in the topology graph.
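The path selection in this caption can be illustrated with a short sketch. It assumes, as a labeled simplification, that the expected loop length is estimated as the number of loop residues times 3.8 Å (the typical distance between consecutive Cα atoms); the excerpt does not state how the paper estimates it, and `pick_best_path` is a hypothetical name.

```python
def pick_best_path(path_lengths, loop_residues, spacing=3.8):
    """Return the index of the skeleton path whose length (Å) is closest to
    the estimated loop length. `spacing` is an assumed per-residue length."""
    loop_length = loop_residues * spacing
    return min(
        range(len(path_lengths)),
        key=lambda k: abs(path_lengths[k] - loop_length),
    )

# Hypothetical example: two candidate traces for an 8-residue loop.
print(pick_best_path([12.0, 30.5], loop_residues=8))
```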
Figure 5. The geometry of consecutive secondary structures. (a) The three vectors describe the geometry of the secondary structures and the loop; (b) Histogram of the dihedral angle. The curve is a normal distribution with its peak at zero; (c) Two-dimensional contour representation of the distribution of angles θ1 and θ2. The ridge lies along the diagonal line; (d) Box plot of the sum of the two packing angles. The box plot is symmetrical overall: the quartiles Q1 and Q3 are approximately the same distance from the median, and the whiskers are approximately the same length; (e) Schematic representation of the scaled bivariate normal distribution of the dihedral angle and the sum of the two packing angles.
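As a rough illustration of the quantities named in this caption, the sketch below computes a dihedral angle and a plain inter-vector angle from three vectors. It assumes the three vectors are the axis of the first secondary structure, the loop vector, and the axis of the second secondary structure, and it uses the standard atan2 dihedral formula. The paper's exact angle definitions are not given in this excerpt, so the function names and conventions here are assumptions.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def angle_deg(a, b):
    """Unsigned angle between two vectors, in degrees."""
    cos_t = _dot(a, b) / (_norm(a) * _norm(b))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def dihedral_deg(v1, v2, v3):
    """Signed dihedral angle about v2, in degrees, using the standard
    atan2 form; the sign convention is an assumption of this sketch."""
    n1 = _cross(v1, v2)   # normal of the plane spanned by v1 and v2
    n2 = _cross(v2, v3)   # normal of the plane spanned by v2 and v3
    y = _dot(v1, n2) * _norm(v2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical vectors: SSE axis, loop vector, next SSE axis.
print(dihedral_deg((1, 0, 0), (0, 1, 0), (0, 0, 1)))
print(angle_deg((1, 0, 0), (0, 1, 0)))
```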