Derek Jones, Jonathan E Allen, Yue Yang, William F Drew Bennett, Maya Gokhale, Niema Moshiri, Tajana S Rosing.
Abstract
Atomistic Molecular Dynamics (MD) simulations allow researchers to model biomolecular structures such as proteins, and their interactions with drug-like small molecules, at greater spatiotemporal resolution than experimental methods can achieve. MD simulations are notoriously expensive computations that have traditionally required massive investment in specialized hardware to reach biologically relevant spatiotemporal scales. Our goal is to summarize the fundamental algorithms employed in the literature and then to highlight the challenges that have affected accelerator implementations in practice. We consider three broad categories of accelerators: Graphics Processing Units (GPUs), Field-Programmable Gate Arrays (FPGAs), and Application Specific Integrated Circuits (ASICs). We compare these categories to facilitate discussion of their relative trade-offs and to provide context for the current state of the art. We conclude by providing insights into the potential of emerging hardware platforms and algorithms for MD.
Year: 2022 PMID: 35710099 PMCID: PMC9281402 DOI: 10.1021/acs.jctc.1c01214
Source DB: PubMed Journal: J Chem Theory Comput ISSN: 1549-9618 Impact factor: 6.578
Figure 1. Representative time scales for protein motions.[12,15−17]
Figure 2. Lennard-Jones potential as a function of interatomic distance for a diatomic system.[22]
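
The Lennard-Jones potential named in the Figure 2 caption is commonly written in the 12-6 form, V(r) = 4ε[(σ/r)^12 − (σ/r)^6]. The following is a minimal Python sketch of that form, assuming reduced units (ε = σ = 1). The function names and parameter values are illustrative and do not come from the review.

```python
def lj_potential(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones energy for one pair at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon=1.0, sigma=1.0):
    """Radial force -dV/dr; positive values are repulsive."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


# Separation at which the energy is lowest when sigma = 1 (reduced units).
r_min = 2.0 ** (1.0 / 6.0)

for r in (0.95, 1.0, r_min, 1.5, 2.5):
    print(f"r = {r:.4f}  V = {lj_potential(r):+.4f}  F = {lj_force(r):+.4f}")
```

In this form the energy crosses zero at r = σ and reaches its minimum of −ε at r = 2^(1/6) σ, where the force vanishes; the force is repulsive at shorter separations and attractive at longer ones.
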
Various Non-bonded Force Interaction Algorithms Featured in the Literature Covered in This Review
| abbrev. | name |
|---|---|
| PME | Particle Mesh Ewald |
| SPME | Smooth Particle Mesh Ewald |
| FMM | Fast Multipole Method |
| MGrid | Multigrid |
| BH | Barnes-Hut |
Figure 3. Description of an NVIDIA GPU architecture.[50]
Figure 4. Description of an NVIDIA GPU Streaming Multiprocessor (SM) unit.[50]
Figure 5. GPU price-to-performance comparison for Amber MD software, versions 2016 and 2018. The data are collected from the Amber Web site.[76]
Figure 6. Description of an FPGA architecture.
Characteristics of FPGAs Featured in Molecular Dynamics Simulations
| ref | accelerator | ALMs | I/O bandwidth | embedded memory | multiplier blocks |
|---|---|---|---|---|---|
| ( | Intel Stratix 10 | 933 120 | 28.3 Gb/s | 253 Mbits | 11 520 |
| ( | Xilinx V5 LX 330T | 51 840 | 3.75 Gb/s | 11 664 kbits | 192 |
| ( | Xilinx XC2VP100 | 99 216 | 3.125 Gb/s | 7992 kbits | 444 |
| ( | Xilinx XC2VP70-5 | 74 448 | 3.125 Gb/s | 5904 kbits | 328 |
| ( | Xilinx XC2V6000 | 33 792 | 840 Mb/s | 2592 kbits | 144 |
| ( | Xilinx Virtex-E 2000E | 43 200 | 622 Mb/s | 655.36 kbits | 0 |
Denotes FPGA was used for simulation of performance.
Description of the Development Environments Featured in the FPGA-Based MD Simulation Literature
| ref | year | development board | development framework |
|---|---|---|---|
| ( | 2004 | TM-3[ | VHDL |
| ( | 2005 | WildstarII-Pro[ | VHDL |
| ( | 2006 | SRC 6 MAPstation | Carte |
| ( | 2006 | SRC 6 MAPstation (E/C) | Carte |
| ( | 2006 | SRC 6 MAPstation (E) | Carte |
| ( | 2006 | WildstarII-Pro[ | VHDL |
| ( | 2007 | SRC 6 MAPstation (E) | Carte |
| ( | 2008 | SRC 6 MAPstation (E/C) | Carte |
| ( | 2011 | Gidel PROCStar III | Proc Dev. Kit |
| ( | 2019 | Intel Stratix 10 | Intel Quartus Prime Pro |
Quantitative Characteristics of Simulations Approached with FPGA-Based Acceleration
| ref | year | # atoms | force precision | box size |
|---|---|---|---|---|
| ( | 2004 | 8192 | 50-bit Fixed | not reported |
| ( | 2005 | 8192 | 48-bit Fixed | not reported |
| ( | 2005 | 8192 | 35-bit Fixed | not reported |
| ( | 2006 | 32 932 | 17-bit Fixed | 73.8 × 71.8 × 76.8 Å³ |
| ( | 2006 | 92 224 | 32-bit Float | 108 × 108 × 72 Å³ |
| ( | 2006 | 32 932 | 32-bit Float | 73.8 × 71.8 × 76.8 Å³ |
| ( | 2006 | 8192 | 35-bit Float | 64 × 50 × 50 Å³ |
| ( | 2007 | 23 558 | 32-bit Float | 62.23 × 62.23 × 62.23 Å³ |
| ( | 2008 | 32 932 | 32-bit Float | 73.8 × 71.8 × 76.8 Å³ |
| ( | 2019 | 20 000 | 32-bit Float | 59.5 × 51 × 51 Å³ |
| ( | 2019 | 20 000 | 32-bit Float | 59.5 × 51 × 51 Å³ |
| ( | 2019 | 23 558 | 32-bit Float | 62.23 × 62.23 × 62.23 Å³ |
| ( | 2019 | 23 558 | 32-bit Float | 62.23 × 62.23 × 62.23 Å³ |
Quantitative Characteristics of FPGA-Based Simulation Performance
| ref | year | simulated time per day | speedup | benchmark |
|---|---|---|---|---|
| ( | 2004 | 2.34 ps | 0.29× | Intel Pentium 4 2.4 GHz |
| ( | 2005 | 517.4 ps | 57× | Intel Xeon 2.4 GHz |
| ( | 2005 | 3.8 ps | 51× | Intel Xeon 2.4 GHz |
| ( | 2006 | 0.188 ns | 2.72× | Intel Xeon 2.8 GHz |
| ( | 2006 | 28 ps | 3× | Intel Xeon 2.8 GHz |
| ( | 2006 | 0.22 ns | 1.9× | Intel Xeon 2.8 GHz |
| ( | 2006 | 0.72 ns | 15.7× | Intel Xeon 2.8 GHz |
| ( | 2007 | 0.2 ns | 3.19× | Intel Xeon 2.8 GHz |
| ( | 2008 | 246.86 ps | 2.08× | Intel Xeon 2.8 GHz (dual-core) |
| ( | 2019 | 1.4 μs | 96.5× | Intel Xeon |
| ( | 2019 | 1.4 μs | 3.29× | NVIDIA GTX 1080Ti |
| ( | 2019 | 630.25 ns | 25.3× | Intel Xeon |
| ( | 2019 | 630.25 ns | 1.1× | NVIDIA GTX 1080Ti |
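
The table above reports simulated time per day in mixed units (ps, ns, and μs), which makes rows hard to compare at a glance. As a rough sketch, the entries can be normalized to ns/day as follows. The helper name `to_ns_per_day` and the choice of sample rows are assumptions made for illustration, and the ASCII alias `us` is added only for convenience.

```python
# Conversion factors from each unit in the table to nanoseconds.
TO_NS = {"ps": 1e-3, "ns": 1.0, "μs": 1e3, "us": 1e3}


def to_ns_per_day(entry):
    """Convert a 'value unit' string such as '517.4 ps' to ns/day."""
    value, unit = entry.split()
    return float(value) * TO_NS[unit]


# A few simulated-time-per-day entries copied from the table.
reported = ["2.34 ps", "517.4 ps", "0.188 ns", "1.4 μs", "630.25 ns"]

for entry in reported:
    print(f"{entry:>10} -> {to_ns_per_day(entry):g} ns/day")
```

On a common scale, the earliest and latest designs in the table differ by several orders of magnitude, although the simulated systems and baselines differ from row to row.
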
Qualitative Characteristics of Simulations Approached with FPGA-Based Acceleration
| ref | year | force evaluation | LJ | Coulomb | algorithm | architecture |
|---|---|---|---|---|---|---|
| ( | 2004 | LUT | yes | no | direct | full MD |
| ( | 2005 | LUT | yes | yes | direct | nonbond only |
| ( | 2006 | LUT | yes | yes | SPME | nonbond only |
| ( | 2006 | LUT | yes | yes | SPME | nonbond only |
| ( | 2006 | direct | yes | yes | SPME | nonbond only |
| ( | 2006 | LUT | yes | yes | direct | nonbond only |
| ( | 2007 | direct | yes | yes | PME | nonbond only |
| ( | 2008 | direct | yes | yes | SPME | nonbond only |
| ( | 2011 | direct | yes | yes | PME | nonbond only |
| ( | 2019 | LUT | yes | no | direct | full MD |
| ( | 2019 | LUT | yes | yes | PME | full MD |
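
The table above distinguishes designs that evaluate forces through a lookup table (LUT) from those that compute them directly. The sketch below contrasts the two approaches for the 12-6 Lennard-Jones force in reduced units. The table size, the r² range, the linear interpolation scheme, and indexing by r² (to avoid a square root) are all assumptions chosen for illustration; they are not taken from any of the cited designs.

```python
def lj_force_over_r(r2, epsilon=1.0, sigma=1.0):
    """Direct evaluation of F(r)/r for 12-6 Lennard-Jones, given r squared."""
    s2 = sigma * sigma / r2
    s6 = s2 * s2 * s2
    return 24.0 * epsilon * (2.0 * s6 * s6 - s6) / r2


def build_table(r2_min, r2_max, n_bins):
    """Tabulate F(r)/r at n_bins + 1 evenly spaced values of r squared."""
    step = (r2_max - r2_min) / n_bins
    values = [lj_force_over_r(r2_min + i * step) for i in range(n_bins + 1)]
    return r2_min, step, values


def lookup(table, r2):
    """Linearly interpolate between the two nearest table entries.

    The caller must keep r2 inside the tabulated range.
    """
    r2_min, step, values = table
    x = (r2 - r2_min) / step
    i = min(int(x), len(values) - 2)  # clamp so that i + 1 stays in range
    frac = x - i
    return values[i] + frac * (values[i + 1] - values[i])


# Hypothetical table: 4096 bins covering r^2 from 0.8 to 9.0 (reduced units).
table = build_table(r2_min=0.8, r2_max=9.0, n_bins=4096)

r2 = 1.5
direct = lj_force_over_r(r2)
approx = lookup(table, r2)
rel_error = abs(approx - direct) / abs(direct)
print(f"direct = {direct:.6f}  LUT = {approx:.6f}  relative error = {rel_error:.2e}")
```

The lookup replaces the divisions and multiplications of the direct formula with a memory read and one multiply-add, at the cost of an interpolation error that shrinks as the table grows.
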
Characteristics of ASIC Designs
| year | name | alg. | arch. | force calc. | force accum. |
|---|---|---|---|---|---|
| 1996 | MD-GRAPE[ | direct | nonbond only | 32-bit float | 80-bit fixed |
| 1999 | MD-Engine[ | direct | nonbond only | 40-bit float | 64-bit float |
| 2003 | MDGRAPE-2[ | direct | nonbond only | 32-bit float | 64-bit float |
| 2003 | MDGRAPE-3[ | direct | nonbond only | 32-bit float | 80-bit fixed |
| 2009 | Anton[ | GSE[ | full MD | 32–36 bit fixed | 86-bit fixed |
| 2014 | MDGRAPE-4[ | GSE[ | full MD | 32-bit float | 32-bit fixed |
| 2014 | Anton 2[140] | | full MD | 32–36 bit fixed | 86-bit fixed |
| 2021 | Anton 3[139] | | full MD | 14–23 bit fixed | 14–23 bit fixed |
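
Several designs in the table above accumulate forces in wide fixed-point registers. One general property of fixed-point accumulation is that integer addition is associative, so the sum does not depend on the order in which contributions arrive, whereas floating-point sums can. The sketch below illustrates this in Python. The number of fractional bits and the input values are hypothetical, and the values are deliberately exaggerated to expose the effect in double precision. Python integers are unbounded, so this does not model overflow in a finite-width accumulator.

```python
FRAC_BITS = 20  # hypothetical number of fractional bits


def to_fixed(x, frac_bits=FRAC_BITS):
    """Round x to the nearest multiple of 2**-frac_bits, stored as an int."""
    return round(x * (1 << frac_bits))


def from_fixed(q, frac_bits=FRAC_BITS):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def accumulate_float(values):
    """Plain left-to-right floating-point summation."""
    total = 0.0
    for v in values:
        total += v
    return total


# Illustrative contributions whose exact sum is 2.0.
contributions = [1e16, 1.0, -1e16, 1.0]

fixed_fwd = sum(to_fixed(v) for v in contributions)
fixed_rev = sum(to_fixed(v) for v in reversed(contributions))

float_fwd = accumulate_float(contributions)
float_rev = accumulate_float(list(reversed(contributions)))

print("fixed:", from_fixed(fixed_fwd), from_fixed(fixed_rev))
print("float:", float_fwd, float_rev)
```

The two fixed-point sums agree bit for bit, while the two floating-point sums differ from each other because each addition rounds to the nearest representable value.
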
Performance of Various Accelerator Configurations to Run a Single Simulation of Dihydrofolate Reductase (DHFR)ᵃ
| accelerator | engine | throughput (ns/day) |
|---|---|---|
| Anton 3 (64-node) (ASIC) | Custom[ | 212 200 |
| Anton 2 (512-node) (ASIC) | Custom[ | 85 800 |
| Intel Stratix 10 (FPGA) | Custom[ | 630 |
| 2× NVIDIA Titan RTX (GPU) | Amber[ | 629.03 |
| NVIDIA V100 SXM (GPU) | Amber[ | 522.20 |
| NVIDIA V100 PCIE (GPU) | Amber[ | 277.14 |
| NVIDIA TITAN X (GPU) | OpenMM[ | 393 |
| NVIDIA TITAN V (GPU) | OpenMM[ | 419 |
| NVIDIA RTX 3090 (GPU) | ACEMD[ | 1308 |
ᵃ DHFR, a 159-residue protein that is a target for cancer therapeutics, is simulated in water and has been used as a standard benchmark for MD simulation throughput. All simulations reported here are run in the NVE (microcanonical) ensemble, in which the number of particles (N), the volume (V), and the energy (E) are conserved. All simulations reported here, with the exception of Anton 2[140], use the PME[32] algorithm for non-bonded interactions; Anton 2 uses the μ-series[142] algorithm.
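
To put the DHFR figures in perspective, the sketch below expresses a few of the throughputs from the table above relative to a chosen baseline and estimates the wall-clock time needed to reach a given amount of simulated time. The helper names and the one-microsecond target are assumptions for illustration. The ratios compare whole configurations (including multi-node systems), so they should not be read as per-device comparisons.

```python
# Throughput values (ns/day) copied from the DHFR table above.
throughput = {
    "Anton 3 (64-node)": 212_200,
    "Anton 2 (512-node)": 85_800,
    "NVIDIA RTX 3090": 1308,
    "Intel Stratix 10": 630,
    "NVIDIA V100 SXM": 522.20,
}


def relative_to(baseline, table):
    """Return each configuration's throughput divided by the baseline's."""
    base = table[baseline]
    return {name: value / base for name, value in table.items()}


def days_for(target_ns, ns_per_day):
    """Wall-clock days needed to accumulate target_ns of simulated time."""
    return target_ns / ns_per_day


ratios = relative_to("NVIDIA RTX 3090", throughput)
for name, ratio in ratios.items():
    days = days_for(1000.0, throughput[name])  # hypothetical 1 μs target
    print(f"{name:<20} {ratio:8.2f}x baseline  {days:8.3f} days per μs")
```
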
Selected MD Simulations among the Largest of Those Reported in the Literature
| name | # atoms | time scale (ns) | resource | engine | year |
|---|---|---|---|---|---|
| SARS-CoV-2 viral envelope[ | 304 780 149 | 84 | Summit | NAMD 2.14 | 2021 |
| H1N1 2009 viral envelope[ | 160 653 271 | 121.04 | Blue Waters | NAMD 2.10 | 2020 |
| GATA4 gene locus[ | 1 000 000 000 | 1 | Trinity | GENESIS | 2019 |
| STMV[ | 1 066 628 | 13 | NCSA Altix | NAMD 2.5 | 2006 |
Qualitative Comparison of Accelerator Classes Covered in the Discussionᵃ
ᵃ The time scales quoted here are results collected for Dihydrofolate Reductase (DHFR), a common benchmark that has been run on the various architectures discussed in this work; they are not meant to be presented as definitive assessments.