Luyuan Zhao, Jinxiao Zhang, Yaolong Zhang, Sheng Ye, Guozhen Zhang, Xin Chen, Bin Jiang, Jun Jiang.
Abstract
A data-driven approach to simulating circular dichroism (CD) spectra is appealing for fast determination of protein secondary structure, yet the difficulty of predicting electric and magnetic transition dipole moments poses a substantial barrier to this goal. To address this problem, we designed a new machine learning (ML) protocol in which ordinary, purely geometry-based descriptors are replaced with embedded density descriptors, and electric and magnetic transition dipole moments are predicted with an accuracy comparable to first-principles calculations. The ML model not only simulates protein CD spectra nearly 4 orders of magnitude faster than conventional first-principles simulation but also yields CD spectra in good agreement with experiments. Finally, we predicted a series of CD spectra of the Trp-cage protein associated with continuous changes of protein configuration along its folding path, showing the potential of our ML model to support real-time CD spectroscopy studies of protein dynamics.
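To illustrate why both transition dipole moments are needed, here is a minimal sketch of how a CD band can be assembled from them. It assumes the standard Rosenfeld expression for the rotational strength, R = Im(μ · m), and a simple Gaussian line shape. The function names, numerical values, band width and arbitrary units are all hypothetical, and this is not a description of the authors' actual pipeline.

```python
import numpy as np

def rotational_strength(mu, m):
    """Rosenfeld rotational strength R = Im(mu . m) for one transition.

    mu: electric transition dipole, shape (3,)
    m:  magnetic transition dipole, shape (3,), generally complex
    """
    mu = np.asarray(mu, dtype=complex)
    m = np.asarray(m, dtype=complex)
    # np.dot does not conjugate either argument, matching the plain dot product.
    return float(np.imag(np.dot(mu, m)))

def cd_spectrum(energies, strengths, grid, width=0.2):
    """Sum of Gaussian bands centered at each transition energy,
    weighted by its rotational strength (arbitrary units)."""
    energies = np.asarray(energies, dtype=float)[:, None]
    strengths = np.asarray(strengths, dtype=float)[:, None]
    grid = np.asarray(grid, dtype=float)[None, :]
    bands = strengths * np.exp(-((grid - energies) ** 2) / (2.0 * width ** 2))
    return bands.sum(axis=0)

# Hypothetical dipoles for the two peptide-bond transitions (illustrative only).
R_n_pi = rotational_strength([0.1, 0.0, 0.0], [0.8j, 0.0, 0.0])
R_pi_pi = rotational_strength([0.0, 1.2, 0.0], [0.0, -0.05j, 0.0])

# Hypothetical transition energies (eV) and energy grid.
grid = np.linspace(5.0, 7.5, 6)
spectrum = cd_spectrum([5.6, 6.5], [R_n_pi, R_pi_pi], grid)
```

Because the sign of each band follows the sign of R, errors in either predicted dipole can flip or distort a CD feature, which is why the accuracy of both quantities matters.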
Year: 2021 PMID: 34977905 PMCID: PMC8715543 DOI: 10.1021/jacsau.1c00449
Source DB: PubMed Journal: JACS Au ISSN: 2691-3704
Figure 1. (a) NMA structure and protein structure. (b) Valence molecular orbitals and the two electronic transitions of the peptide bond, n → π* and π → π*. (c) Machine learning protocol for predicting protein CD spectra.
Figure 2. ML prediction of the electric and magnetic transition dipole moments of peptide bonds. (a) Correlation plots of the TDDFT and ML predicted electric transition dipole moments of the n → π* and π → π* transitions using CM with GBR. (b) Correlation plots of the TDDFT and ML predicted magnetic transition dipole moments of the n → π* and π → π* transitions using CM with GBR. (c) Same as (a) but using EANN. (d) Same as (b) but using EANN.
Figure 3. Experimental (black curves) and ML predicted (red curves) CD spectra of different types of proteins. Within each panel, the spectra are scaled to the same maximum intensity.
Comparison of the ML Simulated Protein CD Spectra with Experiments in Terms of Spearman Rank Correlation (ρ)
| protein | PDB ID | secondary class | number of atoms | ρ |
|---|---|---|---|---|
| Peroxidase C1A | 7ATJ | α | 2944 | 0.74 |
| Pectate lyase C | 1AIR | β | 2786 | 0.81 |
| Pyruvate kinase | 1A49 | α + β | 34001 | 0.79 |
| Cytochrome bc1 complex | 1BE3 | α + β | 16222 | 0.88 |
| Aspartokinase III | 2J0X | α + β | 6915 | 0.88 |
| DNase I | 3DNI | α + β | 2494 | 0.75 |
| TraF protein | 3JQO | α + β | 38842 | 0.83 |
| Carboxypeptidase A | 5CPA | α + β | 2753 | 0.97 |
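The ρ values in the table measure how well the rank ordering of intensities in a predicted spectrum matches the experimental one. The sketch below shows one way to compute such a value, assuming both spectra are sampled on the same wavelength grid and contain no tied values. The toy arrays are hypothetical, not data from the paper.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of ranks.

    Assumes no tied values; ties would need average ranks instead.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same shape")
    # argsort of argsort gives the 0-based rank of each element.
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])

# Hypothetical toy spectra on a shared wavelength grid (illustrative only).
experiment = [-4.0, -9.5, -7.0, 1.5, 6.0, 2.0]
prediction = [-3.0, -8.0, -8.5, 0.5, 5.0, 2.5]
rho = spearman_rho(experiment, prediction)
```

Because only ranks enter the calculation, ρ is insensitive to overall intensity scaling and rewards getting the shape and ordering of the bands right. For real data with ties, `scipy.stats.spearmanr` is the more robust choice.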
Figure 4. (a) Experimental (black curves) and ML predicted (red curves) CD spectra. The ML predictions are based on 1000 MD configurations. (b) The ML predicted CD spectra of the Trp-cage protein along its folding path (S1 → S100; S1: the original unfolded structure; S25: slightly folded, with decreasing coil content; S50: folding accelerates and helical elements appear; S75: a cage forms with a rapid increase in α-helix content; S100: the final stably folded structure). All spectra are averaged over 100 MD conformations for each state.
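The per-state averaging mentioned in the caption can be sketched as a simple mean over per-conformation spectra. This assumes every spectrum is evaluated on the same wavelength grid. The function name and the synthetic data are hypothetical stand-ins for ML-predicted spectra.

```python
import numpy as np

def average_spectrum(spectra):
    """Average CD spectra over conformations.

    spectra: array-like of shape (n_conformations, n_wavelengths),
             all rows sampled on the same wavelength grid.
    """
    spectra = np.asarray(spectra, dtype=float)
    if spectra.ndim != 2:
        raise ValueError("expected a 2D array: conformations x wavelengths")
    return spectra.mean(axis=0)

# Hypothetical example: 100 noisy copies of one underlying band shape.
rng = np.random.default_rng(0)
wavelengths = np.linspace(190.0, 250.0, 61)           # nm, illustrative grid
base = -np.exp(-((wavelengths - 222.0) / 8.0) ** 2)   # toy negative band
conformations = base + 0.3 * rng.normal(size=(100, wavelengths.size))
state_spectrum = average_spectrum(conformations)
```

Averaging over many conformations suppresses configuration-to-configuration fluctuations, so the resulting curve reflects the ensemble rather than any single snapshot.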