Abstract
Although atomic structures have been determined directly from high-resolution cryo-EM density maps, current structure determination methods for medium-resolution (5 to 10 Å) cryo-EM maps are limited by the availability of structure templates. Secondary structure traces are lines detected from a cryo-EM density map for the α-helices and β-strands of a protein. A topology of secondary structures defines the mapping between a set of sequence segments and a set of secondary structure traces in three-dimensional space. To improve the accuracy of ranking secondary structure topologies, we explored a method that combines three sources of information at the secondary structure level: a set of sequence segments in 1D, a set of amino acid contact pairs in 2D, and a set of traces in 3D. A test of fourteen cases shows that the accuracy of secondary structure prediction is critical for deriving topologies. When the secondary structure prediction is fairly accurate, significant long-range contact pairs are most effective at improving the rank of the maximum-match topology for proteins with a large number of secondary structures. We also observed that this enrichment depends on the quality of the initial topology candidates. We provide a detailed analysis of various cases to show the potential and the challenges of combining these three sources of information.
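To make the definition of a topology concrete, the sketch below represents one as an ordered list of segment-to-trace assignments, each carrying a direction. It is a minimal illustration only: the `Assignment` class, the trace IDs such as "T2", and `naive_topology_count` are hypothetical names, not taken from the paper. The count is a naive upper bound that assumes every sequence segment is mapped to a distinct trace in one of two directions; it simply illustrates how quickly the candidate list grows, which is why ranks in the thousands can occur.

```python
from dataclasses import dataclass
from math import perm


@dataclass(frozen=True)
class Assignment:
    """One mapped pair: a sequence segment placed on a density-map trace."""
    segment: str   # sequence segment ID, e.g. "S4" (hypothetical naming)
    trace: str     # trace ID detected in the density map (hypothetical naming)
    forward: bool  # True if N-to-C runs along the trace's stored direction


# A topology: assignments listed in sequence order, from N- to C-terminus.
Topology = list[Assignment]


def naive_topology_count(n_segments: int, n_traces: int) -> int:
    """Naive upper bound on the number of candidate topologies.

    Assumes every segment is mapped to a distinct trace (n_segments <= n_traces)
    and that each mapped trace can be traversed in either of two directions.
    """
    if n_segments > n_traces:
        raise ValueError("this sketch requires n_segments <= n_traces")
    return perm(n_traces, n_segments) * 2 ** n_segments


example: Topology = [
    Assignment("S0", "T2", True),
    Assignment("S1", "T0", False),
    Assignment("S2", "T1", True),
]
print(naive_topology_count(4, 4))  # 384
```

Under this assumption, four segments on four traces already give 4! × 2^4 = 384 candidates, and the bound grows factorially with the number of secondary structures.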
Keywords: amino acid; constraints; contact; cryo-electron microscopy; image; protein structure; secondary structure; topology
Year: 2021 PMID: 34834140 PMCID: PMC8624718 DOI: 10.3390/molecules26227049
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.927
Figure 1. Secondary structure sequence segments, image traces, topology, and contact pairs. (A) The component of the cryo-EM density map (gray, EMDB ID 6810) that corresponds to chain H of atomic structure 5y5x (PDB ID). (B) Secondary structure regions of α-helices (yellow density) and β-sheet (blue density) detected using DeepSSETracer [20]. α-traces (red lines) were derived for α-helices using Principal Component Analysis, and β-traces (blue lines) were predicted for β-strands using StrandTwister [21]. (C) The amino acid sequence of protein 5y5x chain H annotated with the locations of helices (red rectangles) and β-strands (blue rectangles) predicted using JPred [30]. (D) An example of a correct topology, shown as a diagram and as a list of mapped pairs. Black arrows indicate the topology: the order of the secondary structure traces from the N- to the C-terminus and the direction of each trace. The atomic structure of 5y5x (PDB ID) chain H is shown as a rainbow ribbon. Correctly mapped secondary structure pairs are highlighted in the topology representation. (E) The atomic structure (cyan ribbon) of chain H in 6r0z superimposed with the secondary structure traces. (F) An example of a wrong topology, indicated with black arrows. Secondary structure contact pairs derived from significant long-range amino acid contacts are shown as green arrows for the wrong topology.
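Panel (B) states that α-traces were derived from the detected helix regions using principal component analysis. The NumPy sketch below illustrates that idea under assumptions not given in the caption: the input is an array of voxel coordinates (in Å) for one detected helix region, and the trace is the line segment spanned by the extreme projections of those points onto the first principal axis. The function name `helix_axis_trace` and the endpoint rule are hypothetical; β-traces from StrandTwister are not covered.

```python
import numpy as np


def helix_axis_trace(points):
    """Fit a line segment (an alpha-trace) to the 3D points of one helix region.

    Assumption: the trace runs along the first principal axis and its endpoints
    are the extreme projections of the points onto that axis.
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    centered = pts - centroid
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axis = vt[0]
    proj = centered @ axis
    start = centroid + proj.min() * axis
    end = centroid + proj.max() * axis
    return start, end


# Toy input: points along the z axis, as if sampled from a straight helix region.
points = [[0.0, 0.0, float(z)] for z in range(11)]
start, end = helix_axis_trace(points)
```

Because the sign of a principal axis is arbitrary, the two endpoints may come back in either order; a separate step would be needed to decide which end is N-terminal within a topology.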
Secondary structure contact pairs derived from amino acid contact prediction. Amino acid contact pairs were obtained using MULTICOM [40,41] or RaptorX [42] (see details in Methods). a The ID of a test case involving a cryo-EM density map is given as EMDB ID-PDB ID-chain ID; the ID of a case involving a simulated density map is given as either a PDB ID or a CASP target ID. The p-value threshold, expressed in standard deviations (SD), used to select significant long-range pairs is shown in parentheses. b Each secondary structure contact pair is labeled with the IDs of the sequence segments predicted using JPred [30], the type of each secondary structure (α or β), and the number of significant long-range amino acid pairs that map onto that secondary structure pair.
| Case a | Secondary Structure Contact Pairs and Number of Significant Long-Range Pairs b |
|---|---|
| 6810-5y5x-H (3SD) | (S4, S5)-(β, α)-1; (S4, S7)-(β, β)-4 |
| 9534-5gpn-Ae (1SD) | (S0, S1)-(α, α)-1; (S1, S2)-(α, α)-1; (S2, S3)-(α, α)-3 |
| 8518-5u8s-A (3SD) | (S1, S2)-(α, α)-1; (S4, S7)-(β, β)-9; (S4, S9)-(β, β)-3; (S5, S6)-(β, β)-1; |
| 3948-6esg-B (1SD) | (S0, S1)-(α, α)-3; (S1, S2)-(α, α)-9; (S2, S3)-(α, β)-5 |
| 2620-4uje-BH (3SD) | (S2, S3)-(β, β)-22; (S2, S5)-(β, β)-10; (S3, S4)-(β, α)-1; (S4, S5)-(α, β)-1; |
| 8357-5t4o-L (2SD) | (S0, S5)-(α, α)-2; (S0, S1)-(α, α)-1; (S1, S2)-(α, α)-1; (S6, S8)-(β, β)-24; |
| 3LTJ (2SD) | (S3, S5)-(α, α)-1; (S5, S7)-(α, α)-1; (S8, S9)-(α, α)-1 |
| 2XB5 (3SD) | (S0, S1)-(α, α)-8; (S0, S3)-(α, α)-2; (S1, S2)-(α, β)-1; (S4, S9)-(α, α)-6; |
| 1HG5 (2SD) | (S1, S3)-(α, α)-5; (S1, S2)-(α, α)-1; (S2, S3)-(α, α)-1; |
| 3ACW (3SD) | (S2, S6)-(α, α)-1; (S5, S6)-(α, α)-1; (S6, S7)-(α, α)-4; (S7, S10)-(α, α)-1; |
| 1Z1L (3SD) | (S3, S8)-(α, α)-1; (S4, S5)-(α, α)-7; (S5, S9)-(α, α)-7; (S5, S11)-(α, α)-1; |
| T1029 (3SD) | (S1, S6)-(α, α)-4; (S2, S3)-(β, β)-9; (S3, S4)-(β, β)-11; (S4, S5)-(β, β)-12 |
| T1031 (3SD) | (S0, S1)-(α, α)-1; (S2, S3)-(α, β)-1; (S3, S4)-(β, β)-18; (S4, S5)-(β, β)-6 |
| T1033 (3SD) | (S0, S4)-(α, α)-2; (S2, S3)-(α, α)-7; (S3, S4)-(α, α)-6; (S4, S5)-(α, α)-2 |
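Labels such as (S4, S7)-(β, β)-4 summarize how residue-level contacts are lifted to the secondary structure level. The sketch below shows one plausible way to do this under stated assumptions: a pair is long-range if its residues are at least 24 apart in sequence (the CASP convention, not stated in this table), and a pair is significant if its predicted score exceeds the mean plus k standard deviations of all long-range scores (one reading of the SD threshold in the caption). The segment ranges, scores, and function names are hypothetical toy values, not data from the paper.

```python
from collections import Counter
from statistics import mean, pstdev


def significant_long_range(contacts, k_sd, min_sep=24):
    """Select significant long-range residue pairs (assumed rule, see lead-in).

    contacts: iterable of (i, j, score) with residue indices i and j.
    """
    long_range = [(i, j, s) for i, j, s in contacts if abs(i - j) >= min_sep]
    if not long_range:
        return []
    scores = [s for _, _, s in long_range]
    cutoff = mean(scores) + k_sd * pstdev(scores)
    return [(i, j) for i, j, s in long_range if s > cutoff]


def segment_of(residue, segments):
    """Return the ID of the segment containing residue, or None (e.g. loops)."""
    for seg_id, (start, end) in segments.items():
        if start <= residue <= end:
            return seg_id
    return None


def sse_contact_pairs(pairs, segments):
    """Count residue pairs mapped onto each pair of distinct segments."""
    counts = Counter()
    for i, j in pairs:
        a, b = segment_of(i, segments), segment_of(j, segments)
        if a is not None and b is not None and a != b:
            # Order by segment start so that, e.g., S7 precedes S10.
            key = tuple(sorted((a, b), key=lambda s: segments[s][0]))
            counts[key] += 1
    return counts


def label(pair, n, sse_type):
    """Format a pair in the table's style, e.g. (S4, S7)-(β, β)-4."""
    a, b = pair
    return f"({a}, {b})-({sse_type[a]}, {sse_type[b]})-{n}"


segments = {"S0": (1, 10), "S1": (20, 30), "S2": (50, 60)}
sse_type = {"S0": "α", "S1": "α", "S2": "β"}
contacts = [(5, 55, 0.9), (8, 58, 0.8), (25, 55, 0.7), (2, 3, 0.99), (5, 40, 0.1)]
pairs = significant_long_range(contacts, k_sd=0.0)
for pair, n in sse_contact_pairs(pairs, segments).items():
    print(label(pair, n, sse_type))  # (S0, S2)-(α, β)-2, then (S1, S2)-(α, β)-1
```

Residues that fall in loops between predicted segments are simply dropped here, which is one reason the pair counts depend on how accurate the secondary structure prediction is.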
The rank of the maximum-match topology produced using secondary structure sequence segments, traces, and amino acid contact pairs. a A test case involving a cryo-EM density map is labeled as EMDB ID-PDB ID-chain ID; a case involving a simulated density map is labeled with its PDB ID; a case involving a CASP target is labeled with its target ID. The resolution of each density map is given in parentheses. b The number of amino acids in the protein (length of downloaded sequence/length in atomic structure). c The number of α-helices/β-strands in the atomic structure. (+) separates the numbers of β-strands in different β-sheets. d The number of α-helices/β-strands predicted using JPred. e The number of α-traces/β-traces detected from the 3D density map. f The number of correctly matched secondary structures included in the maximum-match topology, given as total (α-helices/β-strands). g Rank of the maximum-match topology without using contact pairs. h Rank of the maximum-match topology using contact pairs. *: Merge error in a helix predicted using JPred.
| Case a | #a.a. b | True c | Seq d | Image e | Max f | Rank (No_C) g | Rank (With_C) h |
|---|---|---|---|---|---|---|---|
| 6810-5y5x-H(5 Å) | 104/100 | 5/3 | 5/4 | 5/3 | 6(4/2) | 1 | 1 |
| 9534-5gpn-Ae(5.4 Å) | 116/88 | 4/0 | 4/0 | 4/0 | 4(4/0) | 2 | 2 |
| 8518-5u8s-A(6.1 Å) | 208/208 | 6/2 | 5/3 + 2 | 5/3 + 2 | 7(5/2) | 142 | 116 |
| 3948-6esg-B(5.4 Å) | 102/78 | 3/0 | 3/1 | 3/0 | 3(3/0) | 5 | 5 |
| 2620-4uje-BH(6.9 Å) | 194/191 | 7/3 + 3 | 5/3 + 3 | 4/3 + 3 | 10(4/6) | NA | - |
| 8357-5t4o-L(6.9 Å) | 177/160 | 9/0 | 7/4 | 8/2 | 7(7/0) | 2 | 2 |
| 3LTJ(8 Å) | 201/191 | 16/0 | 12/0 | 12/0 | 12(12/0) | NA | - |
| 2XB5(8 Å) | 207/207 | 12/0 | 9/1 | 10/3 | 9(9/0) | NA | - |
| 1HG5(8 Å) | 289/263 | 11/0 | 10/0 | 13/0 | 9(9/0) | 1022 | 217 |
| 3ACW(8 Å) | 293/284 | 15/0 | 12/1 | 12/2 | 12(12/0) | 2072 | 1141 |
| 1Z1L(8 Å) | 345/338 | 23/0 | 15/0 | 15/0 | 13(13/0) | NA | - |
| T1029(8 Å) | 125/125 | 6/4 | 3/5 | 6/4 | 7(3/4) | 437 | 117 |
| T1031(8 Å) | 95/95 | 4/3 | 3*/3 | 4/3 | 5(2/3) | 56 | 187 |
| T1033(8 Å) | 100/100 | 3/0 | 6/0 | 4/0 | 4(4/0) | 5 | 5 |
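Footnote f and the two rank columns can be read as a small computation over a ranked list of candidate topologies. The sketch below assumes a topology is stored as a dictionary from segment ID to (trace ID, direction), that a secondary structure counts as correctly matched only when both its trace and its direction agree with a reference derived from the atomic structure, and that ties go to the better-ranked candidate. These rules and all names are hypothetical, not the paper's definitions.

```python
def match_count(topology, reference):
    """Number of segments whose (trace, direction) agrees with the reference.

    topology, reference: dict mapping segment ID -> (trace ID, forward: bool).
    """
    return sum(1 for seg, assigned in topology.items() if reference.get(seg) == assigned)


def max_match_rank(ranked, reference):
    """Return (1-based rank, match count) of the best-ranked maximum-match topology."""
    if not ranked:
        raise ValueError("no candidate topologies")
    counts = [match_count(t, reference) for t in ranked]
    best = max(counts)
    return counts.index(best) + 1, best


reference = {"S0": ("T1", True), "S1": ("T0", False), "S2": ("T2", True)}
ranked = [
    {"S0": ("T0", True), "S1": ("T1", True), "S2": ("T2", True)},   # 1 match
    {"S0": ("T1", True), "S1": ("T0", True), "S2": ("T2", True)},   # 2 matches
    {"S0": ("T1", True), "S1": ("T0", False), "S2": ("T2", True)},  # 3 matches
]
print(max_match_rank(ranked, reference))  # (3, 3)
```

Applying the same function to the candidate list before and after contact pairs are used would give values analogous to the No_C and With_C columns, under these assumptions.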
Figure 2. The maximum-match topology for case T1033. (A) The 1st-ranked topology, indicated with black arrows from the N- to the C-terminus of the protein. The α-traces detected from the simulated density map are shown in red. (B) The 5th-ranked topology, which is the maximum-match topology, shown as in (A). (C) The atomic structure of T1033 (rainbow ribbon) superimposed with the 5th-ranked topology. (D) The amino acid sequence of protein T1033 and the helices (red) predicted using JPred [30]. (E) Representation of the 1st- and 5th-ranked topologies. Correctly matched secondary structure pairs are highlighted in the maximum-match topology. Secondary structure contact pairs that satisfy the 13 Å distance requirement are marked in green.
Figure 3. The maximum-match topology for case 1HG5 (PDB ID) after significant long-range contact pairs are applied. (A) The simulated density map of chain A of 1HG5 (PDB ID). (B) The 217th-ranked topology superimposed with the atomic structure (rainbow ribbon). The direction of each α-trace (red line) and the order of α-traces in the topology are indicated with black arrows. (C) A separate view of the 217th-ranked topology, the maximum-match topology. (D) The 56,452nd-ranked topology. (E) Secondary structures predicted using JPred [30], with helices annotated in red. (F) Representation of the 217th- and 56,452nd-ranked topologies. Correctly matched pairs in the maximum-match topology are highlighted. Secondary structure contact pairs that satisfy or violate the 13 Å distance requirement are marked in green or red, respectively.
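The captions for Figures 2 and 3 color a secondary structure contact pair green or red depending on whether it meets a 13 Å distance requirement. The exact distance measure between two traces is not given in these captions; the sketch below assumes it is the closest distance between the two trace line segments and approximates it by dense sampling. The topology format (segment ID to (trace ID, direction)), the trace coordinates, and the function names are hypothetical.

```python
import numpy as np


def trace_distance(a0, a1, b0, b1, samples=50):
    """Approximate closest distance (in Å) between segments a0-a1 and b0-b1.

    Samples points evenly along both segments and returns the minimum pairwise
    distance, so it is exact only up to the sampling resolution.
    """
    t = np.linspace(0.0, 1.0, samples)[:, None]
    a0, a1, b0, b1 = (np.asarray(p, dtype=float) for p in (a0, a1, b0, b1))
    pa = a0 + t * (a1 - a0)
    pb = b0 + t * (b1 - b0)
    diff = pa[:, None, :] - pb[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=-1)).min())


def contact_satisfied(topology, traces, seg_a, seg_b, cutoff=13.0):
    """True/False if both segments are mapped to traces; None if either is unmapped.

    topology: dict segment ID -> (trace ID, forward); traces: trace ID -> (start, end).
    """
    if seg_a not in topology or seg_b not in topology:
        return None
    ta, tb = topology[seg_a][0], topology[seg_b][0]
    return trace_distance(*traces[ta], *traces[tb]) <= cutoff


traces = {
    "T0": ((0, 0, 0), (0, 0, 10)),
    "T1": ((10, 0, 0), (10, 0, 10)),
    "T2": ((30, 0, 0), (30, 0, 10)),
}
topology = {"S0": ("T0", True), "S1": ("T1", False), "S2": ("T2", True)}
print(contact_satisfied(topology, traces, "S0", "S1"))  # True (10 Å apart)
```

A closed-form segment-to-segment distance would avoid the sampling error; sampling is used here only to keep the sketch short.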
Figure 4. The maximum-match topology for case T1031. (A) The atomic structure (rainbow ribbon) superimposed with the secondary structure regions of α-helices (yellow density) and β-sheet (blue density) detected using DeepSSETracer [20]. (B) The amino acid sequence and secondary structures, helices (red) and β-strands (blue), predicted using JPred [30]. (C) The 187th-ranked initial topology, indicated with black arrows, α-traces (red lines), and β-traces (blue lines) from the N- to the C-terminus of the protein sequence. Correctly mapped secondary structure pairs are highlighted in the representation of the maximum-match topology. Secondary structure contact pairs that satisfy or violate the distance requirement are marked in green or red, respectively. (D) The 1st-ranked topology, shown as in (C).
Figure 5. Evaluation of initial topologies using amino acid contact pairs.
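Figure 5 concerns evaluating the initial topologies with amino acid contact pairs. The captions do not spell out the scoring rule, so the sketch below uses a hypothetical one: each candidate's score is the total number of significant residue pairs on its satisfied secondary structure contact pairs, and candidates are stably re-sorted by that score so that ties keep their initial order. A rule like this can move the maximum-match topology down as well as up, and the rank table reports both outcomes (1HG5 improves from 1022 to 217, while T1031 drops from 56 to 187). The pair counts in the toy usage mirror the 6810-5y5x-H row; the topologies themselves are made up.

```python
from typing import Callable, Mapping, Sequence

Pair = tuple[str, str]


def rerank(topologies: Sequence[Mapping],
           sse_pairs: Mapping[Pair, int],
           satisfied: Callable[[Mapping, Pair], bool]) -> list[int]:
    """Return the 0-based indices of topologies in their new order.

    Hypothetical score: sum of residue-pair counts over the SSE contact pairs
    that a topology satisfies. sorted() is stable, so equal scores keep their
    initial ranking.
    """
    def score(topo: Mapping) -> int:
        return sum(n for pair, n in sse_pairs.items() if satisfied(topo, pair))

    return sorted(range(len(topologies)), key=lambda i: -score(topologies[i]))


# Toy usage: each "topology" records which SSE pairs it places close together.
sse_pairs = {("S4", "S5"): 1, ("S4", "S7"): 4}
topologies = [
    {"close": set()},
    {"close": {("S4", "S7")}},
    {"close": {("S4", "S5")}},
    {"close": {("S4", "S5"), ("S4", "S7")}},
]
order = rerank(topologies, sse_pairs, lambda topo, pair: pair in topo["close"])
print(order)  # [3, 1, 2, 0]
```

In practice `satisfied` would be a geometric test, such as the 13 Å distance requirement described in the figure captions, applied to the traces that each topology assigns to the two segments.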