Jan Domanski (1,2), Oliver Beckstein (3,4), Bogdan I Iorga (5)

1. Department of Biochemistry, University of Oxford, Oxford, UK.
2. Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, USA.
3. Department of Physics, Arizona State University, Tempe, AZ, USA.
4. Center for Biological Physics, Arizona State University, Tempe, AZ, USA.
5. Institut de Chimie des Substances Naturelles, CNRS UPR 2301, Université Paris-Saclay, Labex LERMIT, Gif-sur-Yvette, France.
Abstract
SUMMARY: Ligandbook is a public database and archive for force field parameters of small and drug-like molecules. It is a repository for parameter sets that are part of published work but are not otherwise easily available to the community. Parameter sets can be downloaded and immediately used in molecular dynamics simulations. The sets of parameters are versioned with full histories and carry unique identifiers to facilitate reproducible research. Text-based search on rich metadata and chemical substructure search allow precise identification of desired compounds or functional groups. Ligandbook enables the rapid setup of reproducible molecular dynamics simulations of ligands and protein-ligand complexes.

AVAILABILITY AND IMPLEMENTATION: Ligandbook is available online at https://ligandbook.org and supports all modern browsers. Parameters can be searched and downloaded without registration, including access through a programmatic RESTful API. Deposition of files requires free user registration. Ligandbook is implemented in the PHP Symfony2 framework with Tcl scripts using the CACTVS toolkit.

CONTACT: oliver.beckstein@asu.edu or bogdan.iorga@cnrs.fr; contact@ligandbook.org.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
1 Introduction
The computational drug design process relies on the parameterization of atomic interactions through empirical force fields in order to probe protein–drug interactions with classical molecular dynamics or Monte Carlo simulations. The chemical diversity of drug-like molecules makes it difficult to generate reliable and transferable parameter sets for these compounds, compared to established parameters for, e.g., proteins or water. High-quality parameterizations of small molecules require specialist knowledge, and the shortage of such expertise, and hence the lack of high-quality parameters, presents a barrier for the field.

Servers and tools exist to parameterize small molecules on demand, such as ParamChem (Vanommeslaeghe and MacKerell, 2012), SwissParam (Zoete), the Antechamber tool (Case), the R.E.DD.B. server (Dupradeau) and the Automated Topology Builder (ATB) and Repository (Malde). However, a reference repository for small molecule parameters generated by other means, and especially for ones published in the literature, is lacking. Such published parameters are valuable because they have typically been validated and can be easily used in new studies, but they can be difficult to find and are often invisible to most searches, e.g. as part of supplementary information. Only a few, very specialized repositories for force field parameters exist: the Gromacs molecule & liquid database (Fischer; van der Spoel) contains about 500 parameter sets (OPLS-AA, AMBER/GAFF and CHARMM/CGenFF for Gromacs (Abraham)) for more than 150 organic molecules together with experimental liquid properties for validation purposes (as well as thermodynamic gas phase properties for about 3000 molecules (Ghahremanpour)). Our own Lipidbook repository (Domański) has archived lipid and detergent force field parameters since 2009 and is a successful example of how a community-curated repository can be reliably maintained over many years.

Our new Ligandbook site is a public database for force field parameters of small and drug-like molecules for all major all-atom force fields, including the popular OPLS-AA, CHARMM/CGenFF and AMBER/GAFF varieties. Ligandbook aims to enable parameter re-use and simulation reproducibility by (i) facilitating the publication of force field parameters as open data; (ii) acting as an archive for parameter sets that are supplied and maintained by the community; (iii) making large, richly annotated parameter datasets easily available through human- and machine-accessible interfaces.
2 Repository architecture
A set of parameters contained in the repository is called a package. For each tautomeric and ionization state of a molecule, many packages can be created. A package contains a coordinate file and a topology file with the force field parameters. All files are versioned and each package-version pair has a unique and persistent package identification number. Metadata annotations as well as an abstract are stored to enable rich searching and filtering of packages. A package is linked to a user-supplied citation and carries a license. Ligandbook uses the Cactvs toolkit (Ihlenfeldt) for the underlying cheminformatics functionality (see Implementation in Supplementary Information for details).
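The package model described above can be pictured with a small data structure. The following is only an illustrative sketch: the class names, field names and the `add_version` helper are hypothetical and are not taken from the Ligandbook implementation, which is written in PHP. It shows the two properties the text emphasizes, namely that every package-version pair carries its own persistent identifier and that older versions are never overwritten.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class PackageVersion:
    """One immutable version of a package (hypothetical layout)."""
    package_id: int        # persistent identifier of this package-version pair
    version: int           # 1, 2, 3, ... within the package
    coordinate_file: str   # e.g. name of a pdb file
    topology_file: str     # e.g. name of a Gromacs itp file


@dataclass
class Package:
    """Parameter set for one tautomeric/ionization state of a molecule."""
    molecule: str
    force_field: str
    citation: str
    license: str
    metadata: Dict[str, str] = field(default_factory=dict)
    versions: List[PackageVersion] = field(default_factory=list)

    def add_version(self, package_id: int, coordinate_file: str,
                    topology_file: str) -> PackageVersion:
        """Append a new version; earlier versions stay untouched."""
        new = PackageVersion(package_id, len(self.versions) + 1,
                             coordinate_file, topology_file)
        self.versions.append(new)
        return new

    def latest(self) -> PackageVersion:
        """Return the most recently added version."""
        return self.versions[-1]
```

Making `PackageVersion` frozen mirrors the idea that a published version is a fixed record: an update adds a new entry with a new identifier instead of editing an existing one.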
3 Capabilities
Search and Download. The repository can be searched using either text-based or structure-based queries. Text queries can contain words, phrases, wildcards and groupings using boolean operators as well as the advanced Apache Lucene syntax. By default, the text search is performed across all annotation types (including compound names, synonyms and abstract words) but can be limited to a single attribute (for instance, a PDB ligand ID or the packageId). For exact chemical structure or substructure search, the query can be drawn interactively in the Cactvs Sketcher (Ihlenfeldt) or entered as a SMILES string.

Each package is shown with all files in its history (with SHA1 checksums), chemical structure depictions and the metadata with links to other databases. Older versions are always available and provide a transparent history of changes so that studies can be reproduced with the exact same parameters. A package can be downloaded as a zip file containing the files together with the license governing use of the parameters. References associated with the parameterizations are included so that users can ascertain appropriate use and cite the original authors appropriately. Additionally, if provided by the depositor, computed values and reference values for validation observables are shown, as well as a subjective reliability score from 1 (not validated or not reliable) to 5 (very reliable); see Supplementary Information, Data structures and versioning.

The database can also be queried directly with 3D coordinates. The input structure will be returned with the atoms reordered to match any found parameter files for immediate use in a simulation.

Programmatic access. Results can be retrieved in YAML, XML and JSON formats for further automatic downstream processing through a RESTful API with URL-based queries.

Package deposition. Because packages are curated and owned by individual users, package creation requires free registration with a valid email address. Upon submission, the uploaded coordinate and topology files are processed and checked to confirm that they can be parsed by the Gromacs grompp input processor (Abraham), and a preview of the data is shown to the user for approval or corrections. Most of the metadata are automatically derived from the chemical structure. Users may supply a description in the abstract field and link the parameters to publications, which can be automatically fetched via PubMed IDs, as well as submit computed and reference validation values and provide a subjective reliability rating together with a free-form justification. Users must accept the CC BY-SA (Creative Commons Attribution-ShareAlike) license for the parameters and the CC0 Public Domain Dedication for all metadata (see Supplementary Information, Licenses). The package author may update some or all files, which creates a new version of the package.
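Because every file in a package history is listed with its SHA1 checksum, a user can confirm that a local copy is the exact file that a study refers to. The sketch below shows one way to do this with the Python standard library. The function names are hypothetical, and the expected checksum is assumed to have been copied by hand from the package page; nothing here calls the Ligandbook API.

```python
import hashlib


def sha1_of_file(path, chunk_size=65536):
    """Return the hexadecimal SHA1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha1):
    """Return True if the local file matches the checksum shown for the package."""
    return sha1_of_file(path) == expected_sha1.strip().lower()
```

A call such as `verify_download("ligand.itp", checksum_from_package_page)` (both arguments are placeholders) returns `False` if the file was altered or belongs to a different package version.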
4 Initial content
Currently the repository contains more than 2900 packages, formatted for use with Gromacs (Abraham). These include 455 parameter sets that were validated with hydration free energy calculations as part of the SAMPL challenges (Beckstein and Iorga, 2012; Beckstein; Kenney), parameters from some of our previous studies (Simmons), and more than 2000 packages with ligands from the PDB, parameterized with mol2ff (manuscript in preparation). Because many common drugs are already included, users can easily set up simulations that probe protein-drug interactions (see Supplementary Information for an example).

Coordinates are provided as pdb files. Topologies are always present as Gromacs itp files (Abraham), although files in other formats can optionally be deposited as supplementary data, such as CHARMM prm files for CHARMM/CGenFF parameter sets. If necessary, itp files can be converted into or from other commonly used file formats (AMBER, CHARMM, Desmond, LAMMPS, etc.) using open source tools such as InterMol (https://github.com/shirtsgroup/InterMol), ParmEd (https://github.com/ParmEd/ParmEd), or acpype (Sousa da Silva and Vranken, 2012).
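Since topologies are distributed as Gromacs itp files, a quick sanity check after download is to read the `[ atoms ]` section and confirm the net charge of the ligand. The following is a minimal sketch that assumes the standard column order of that section (nr, type, resnr, residue, atom, cgnr, charge, mass). The function names are hypothetical, preprocessor directives are simply skipped, and the check does not replace the grompp validation performed on deposition.

```python
def read_itp_atoms(text):
    """Return (atom_name, charge) pairs from the [ atoms ] section of itp text."""
    atoms = []
    in_atoms = False
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments introduced by ';'
        if not line:
            continue
        if line.startswith("["):
            # Section header such as "[ atoms ]" or "[ bonds ]"
            in_atoms = line.strip("[] \t").lower() == "atoms"
            continue
        if in_atoms and not line.startswith("#"):
            cols = line.split()
            # Assumed columns: nr type resnr residue atom cgnr charge [mass]
            if len(cols) >= 7:
                atoms.append((cols[4], float(cols[6])))
    return atoms


def net_charge(text):
    """Sum the partial charges, rounded to suppress floating-point noise."""
    return round(sum(charge for _, charge in read_itp_atoms(text)), 6)
```

For a neutral ligand `net_charge` should return zero, and for an ionized state it should return the formal charge; a clearly non-integer value suggests that the file was truncated or parsed with the wrong column layout.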
5 Conclusions
Ligandbook provides the infrastructure for a reference repository for ligand topologies and parameters. It is designed for growth and for wide use by the community. With its focus on open data and interoperability, it provides opportunities for other researchers to tap into a large and growing dataset of parameterizations. Ligandbook should become useful in the development of automated parameterization methods and for molecular simulations of drug-protein interactions.
Authors: Nina M Fischer; Paul J van Maaren; Jonas C Ditz; Ahmet Yildirim; David van der Spoel. Journal: J Chem Theory Comput. Date: 2015-06-02.
Authors: David A Case; Thomas E Cheatham; Tom Darden; Holger Gohlke; Ray Luo; Kenneth M Merz; Alexey Onufriev; Carlos Simmerling; Bing Wang; Robert J Woods. Journal: J Comput Chem. Date: 2005-12.
Authors: Katie J Simmons; Scott M Jackson; Florian Brueckner; Simon G Patching; Oliver Beckstein; Ekaterina Ivanova; Tian Geng; Simone Weyand; David Drew; Joseph Lanigan; David J Sharples; Mark S P Sansom; So Iwata; Colin W G Fishwick; A Peter Johnson; Alexander D Cameron; Peter J F Henderson. Journal: EMBO J. Date: 2014-06-21.