David Liang, Meichen Song, Ziyuan Niu, Peng Zhang, Miriam Rafailovich, Yuefan Deng.
Abstract
Molecular dynamics (MD) simulations are widely used to model complex nanoscale interactions of atoms and molecules, and they can provide detailed insight into how molecules behave under given environmental conditions. This work explores a machine learning (ML) approach to predicting long-term properties of the SARS-CoV-2 spike glycoprotein (S-protein) by analyzing its backbone root-mean-square deviation (RMSD) data from nanosecond-scale MD simulations at varying temperatures. The simulation data were denoised with fast Fourier transforms. Model performance was measured by the mean squared error (MSE) of recurrent forecasts used for long-term prediction. The models evaluated were k-nearest neighbors (kNN) regression, gated recurrent unit (GRU) neural networks, and long short-term memory (LSTM) autoencoders. The kNN model achieved the most accurate forecasts, with MSE scores more than roughly 0.01 nm below those of the GRU model and the LSTM autoencoder. The results also show that kNN accuracy increases with data size, yet the model still forecasts relatively well when trained on small amounts of data, achieving MSE scores of around 0.02 nm when trained on 10,000 ns of simulation data. This study provides valuable information on the feasibility of accelerating the MD simulation process by training supervised ML models and forecasting with them, which is particularly applicable in time-sensitive studies.
Graphical abstract: SARS-CoV-2 spike glycoprotein molecular dynamics simulation. Extraction and denoising of backbone RMSD data. Evaluation of k-nearest neighbors regression, GRU neural network, and LSTM autoencoder models in recurrent forecasting for long-term property predictions.
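The recurrent forecasting described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a sliding-window kNN regressor whose own predictions are fed back as inputs for later steps. The window length, `k`, the helper names, and the synthetic sine series standing in for a denoised RMSD trace are all hypothetical choices made for illustration.

```python
import math


def make_windows(series, window):
    """Split a series into (window -> next value) training pairs."""
    count = len(series) - window
    X = [series[i:i + window] for i in range(count)]
    y = [series[i + window] for i in range(count)]
    return X, y


def knn_predict(train_X, train_y, query, k=3):
    """Average the targets of the k training windows closest to `query`."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def recurrent_forecast(series, window, steps, k=3):
    """Forecast `steps` values, feeding each prediction back as input."""
    train_X, train_y = make_windows(series, window)
    history = list(series[-window:])
    predictions = []
    for _ in range(steps):
        nxt = knn_predict(train_X, train_y, history, k)
        predictions.append(nxt)
        history = history[1:] + [nxt]  # slide the window forward
    return predictions


def mse(predicted, actual):
    """Mean squared error between two equal-length sequences."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical stand-in for a denoised RMSD trace (not simulation data).
full = [0.5 + 0.1 * math.sin(2 * math.pi * t / 50) for t in range(600)]
train, held_out = full[:500], full[500:]

forecast = recurrent_forecast(train, window=10, steps=len(held_out), k=3)
forecast_mse = mse(forecast, held_out)
print(f"forecast MSE over {len(held_out)} steps: {forecast_mse:.3e}")
```

Because each step consumes earlier predictions, errors can compound over long horizons, which is why the forecast MSE is reported separately from the train MSE.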
Keywords: Computation; Machine learning; Molecular; Simulation
Year: 2021; PMID: 33619443; PMCID: PMC7888691; DOI: 10.1557/s43580-021-00021-4
Source DB: PubMed; Journal: MRS Adv; ISSN: 2059-8521
Fig. 1 Actual and denoised RMSD (nm) vs. time (ns) for various temperatures
Fig. 2 Neural network model architectures. a Left: GRU network. b Right: LSTM autoencoder
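The record does not say how the Fourier-based denoising was configured, so the following is only a sketch of one common approach: transform the signal, zero the high-frequency bins, and transform back. A plain O(n²) discrete Fourier transform stands in for an FFT routine so the example needs only the standard library. The cutoff `keep`, the function names, and the synthetic noisy signal are assumptions.

```python
import cmath
import math
import random


def dft(x):
    """Naive discrete Fourier transform (stand-in for an FFT routine)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of `dft`."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def lowpass_denoise(signal, keep):
    """Zero all frequency bins above `keep` and return the real signal."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        freq = min(k, n - k)  # bins k and n - k carry the same frequency
        if freq > keep:
            spectrum[k] = 0
    return [value.real for value in idft(spectrum)]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


# Hypothetical signal: a slow oscillation plus white noise (not RMSD data).
random.seed(0)
n = 128
clean = [0.5 + 0.1 * math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
noisy = [c + random.gauss(0, 0.05) for c in clean]

denoised = lowpass_denoise(noisy, keep=4)
noisy_err = mse(noisy, clean)
denoised_err = mse(denoised, clean)
print(f"MSE vs clean: noisy={noisy_err:.2e}, denoised={denoised_err:.2e}")
```

A hard cutoff like this removes any genuine high-frequency motion along with the noise, so the choice of `keep` trades smoothness against fidelity.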
Statistical model train (750–2250 ns) and forecast (2250–2500 ns) MSE (nm) at varying temperatures
| Temperature (°C) | kNN train | kNN forecast | GRU train | GRU forecast | LSTM autoencoder train | LSTM autoencoder forecast |
|---|---|---|---|---|---|---|
| 3 | 2.468e−08 | 9.830e−03 | 8.482e−03 | 4.442e−03 | 3.909e−03 | 4.627e−02 |
| 20 | 1.663e−08 | 8.172e−03 | 5.220e−03 | 2.907e−02 | 2.765e−03 | 4.837e−02 |
| 37 | 4.916e−08 | 6.944e−03 | 5.835e−03 | 2.721e−02 | 1.900e−03 | 4.919e−02 |
| 60 | 1.626e−08 | 8.974e−03 | 4.845e−03 | 2.813e−02 | 2.419e−03 | 5.000e−02 |
| 80 | 1.252e−08 | 2.882e−02 | 3.801e−03 | 2.406e−02 | 1.716e−03 | 1.560e−01 |
| 95 | 2.105e−09 | 2.121e−02 | 6.321e−03 | 3.078e−02 | 1.756e−03 | 1.390e−01 |
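As a rough cross-check of the table above, the forecast columns can be averaged over the six temperatures. Unweighted averaging is an assumption made here for illustration; the record does not state how the abstract's summary figures were derived.

```python
from statistics import mean

# Forecast MSE (nm) copied from the table above,
# in row order: 3, 20, 37, 60, 80, 95 degrees Celsius.
forecast_mse = {
    "kNN": [9.830e-03, 8.172e-03, 6.944e-03, 8.974e-03, 2.882e-02, 2.121e-02],
    "GRU": [4.442e-03, 2.907e-02, 2.721e-02, 2.813e-02, 2.406e-02, 3.078e-02],
    "LSTM autoencoder": [4.627e-02, 4.837e-02, 4.919e-02, 5.000e-02, 1.560e-01, 1.390e-01],
}

# Unweighted mean per model, and each model's gap to the kNN mean.
mean_mse = {model: mean(values) for model, values in forecast_mse.items()}
gap_to_knn = {
    model: value - mean_mse["kNN"]
    for model, value in mean_mse.items()
    if model != "kNN"
}

for model, value in mean_mse.items():
    print(f"{model}: mean forecast MSE = {value:.4f} nm")
for model, gap in gap_to_knn.items():
    print(f"{model} minus kNN: {gap:.4f} nm")
```

Under this averaging the kNN mean is about 0.014 nm, roughly 0.01 nm below the GRU mean and about 0.07 nm below the LSTM autoencoder mean. The ordering is not uniform across rows: at 3 °C the GRU forecast MSE is lower than the kNN value.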
Fig. 3 Supervised model predictions for 3 °C and 60 °C. Left: 3 °C. Right: 60 °C. a Top: kNN. b Middle: GRU network. c Bottom: LSTM autoencoder
kNN train and forecast MSE (nm) at varying training data sizes
| Temperature (°C) | Train (15,000) | Forecast (15,000) | Train (10,000) | Forecast (10,000) | Train (7,500) | Forecast (7,500) |
|---|---|---|---|---|---|---|
| 3 | 2.468e−08 | 9.830e−03 | 2.568e−08 | 9.905e−03 | 4.559e−09 | 1.030e−02 |
| 20 | 1.663e−08 | 8.172e−03 | 1.281e−08 | 2.501e−02 | 2.287e−07 | 1.844e−02 |
| 37 | 4.916e−08 | 6.944e−03 | 8.280e−08 | 1.956e−02 | 8.132e−09 | 1.170e−02 |
| 60 | 1.626e−08 | 8.974e−03 | 5.154e−08 | 2.527e−02 | 1.240e−08 | 3.345e−02 |
| 80 | 1.252e−08 | 2.882e−02 | 8.904e−10 | 1.853e−02 | 5.182e−08 | 2.311e−02 |
| 95 | 2.105e−09 | 2.121e−02 | 1.551e−08 | 3.478e−02 | 1.063e−09 | 2.807e−02 |
Fig. 4 kNN forecasts for 80 °C. a Top left: trained on 750.0–2250.0 ns data. b Top right: trained on 750.0–1750.0 ns data. c Bottom: trained on 750.0–1500.0 ns data