Ten simple rules on how to create open access and reproducible molecular simulations of biological systems.

Arne Elofsson, Berk Hess, Erik Lindahl, Alexey Onufriev, David van der Spoel, Anders Wallqvist.

Year:  2019        PMID: 30653494      PMCID: PMC6336246          DOI: 10.1371/journal.pcbi.1006649

Source DB:  PubMed          Journal:  PLoS Comput Biol        ISSN: 1553-734X            Impact factor:   4.475


All PLOS journals have an open data policy that, among other things, states that all data and related metadata underlying the findings reported in a submitted manuscript should be deposited in an appropriate public repository or, for smaller datasets, provided as supporting information. This should obviously apply to computational methods as well, but unfortunately it is not always followed in practice, although it is of the greatest importance for the scientific quality of simulations [1] and other modeling projects [2]. Molecular dynamics [3] and other types of simulations [2,4] have become a fundamental part of the life sciences. The simulations depend on a number of parameters such as force fields, initial configurations, simulation protocols, and software. Researchers have different opinions about the types of software they prefer, and in general, we believe authors should be free to choose the tools that best fit their needs. However, as scientists, we also have a common obligation to critically test each other’s statements to find mistakes (including errors in the algorithms and bugs in the code). One example is a heated debate over simulations of supercooled water that turned out to stem from a subtle algorithmic issue [5]. We believe PLOS has a particularly strong responsibility to lead this development even if it might cause some short-term grief [6]. In particular, all published results should, in principle, be possible to reproduce independently by scientists in other labs using different tools. To ensure this, we propose a set of standards that any publication in PLOS Computational Biology, and hopefully publications in other journals as well, should follow. We believe that the sooner such policies are widely adopted, the more open and collaborative science will flourish [7].
These 10 simple rules should not be limited to molecular dynamics but also include Monte Carlo simulations, quantum mechanics calculations, molecular docking, and any other computational methods involving computations on biological molecules.

Rule 1: The simulation protocol should be provided

The complete set of input files that are used in the simulations should be provided, either as supplementary material or preferably through a publicly available repository.
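One way to make a deposited set of input files verifiable is to ship a manifest alongside it. The following is a minimal sketch, not a prescribed PLOS format: it assumes the inputs live in a single directory, and the function names and the JSON layout are illustrative choices.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(input_dir: Path) -> dict:
    """Map every file under input_dir (by relative path) to its size and checksum."""
    manifest = {}
    for path in sorted(p for p in input_dir.rglob("*") if p.is_file()):
        manifest[path.relative_to(input_dir).as_posix()] = {
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        }
    return manifest


def write_manifest(input_dir: Path, out_file: Path) -> dict:
    """Write the manifest as sorted, indented JSON and return it."""
    manifest = build_manifest(input_dir)
    out_file.write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n")
    return manifest
```

A reader who downloads the archive can rerun `build_manifest` on the unpacked directory and compare the result with the deposited JSON to confirm that no input file is missing or altered.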

Rule 2: Topology and parameters should be accessible for everyone

All topology and parameter files used in the simulations should be provided and made publicly available so they can be implemented and tested with a different program if necessary. That means the files should either be in human-readable format or the conversion rules should be publicly available.

Rule 3: Initial coordinate files should be included

All simulations are strongly dependent on the initial conditions [8]. To ensure maximum reproducibility, the authors should provide the input coordinate files for the simulations in the appropriate formats for the software used. The input files to initiate the simulations should be provided in a format ensuring that a reader can repeat all parts of the calculation workflow himself or herself. That means the files should either be in human-readable format or the conversion rules should be publicly available.

Rule 4: Full information about all software used needs to be provided

Reviewers and readers must be able to reproduce results with as much detail as possible. This means authors need to provide enough details so that the work can be repeated with widely available programs or that the software is provided. In particular, indicate the specific version of the software package used in the simulation. To further improve reproducibility, we encourage software authors to add information about compilers, flags, and the hardware used to log files.
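For analysis scripts written in Python, part of this information can be captured automatically at run time. The sketch below is one possible approach under stated assumptions: the function name `collect_provenance` and the record layout are hypothetical, and the version of the simulation engine itself still has to be supplied by the author (here through the `extra` argument), since it cannot be queried generically.

```python
import json
import platform
from importlib import metadata


def collect_provenance(packages, extra=None):
    """Gather interpreter, operating system, hardware, and package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    record = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }
    # Author-supplied details, e.g. the simulation engine version or compiler flags.
    record.update(extra or {})
    return record


if __name__ == "__main__":
    # "numpy" is only an example package name; list whatever the analysis imports.
    print(json.dumps(collect_provenance(["numpy"]), indent=2))
```

Writing this record next to every result file keeps the software description in step with what was actually run, rather than relying on memory when the manuscript is written.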

Rule 5: Simulation results should be deposited in a database

Following the PLOS editorial policy for data access, the authors should deposit representative snapshots from the trajectories and/or simulations in findable, accessible, interoperable, reusable (FAIR) public repositories. The deposited snapshots must be dense enough so that the reported biological insights are supported with the same statistical error margin as originally reported and so that new analyses and publications can be performed using the deposited data.

Rule 6: Results should be easy to reproduce

Although the advantages of open source software are plentiful [9], many authors still use commercial software for simulations. In these cases, if the software used is not publicly available, the simulation method must be provided or already published in sufficient detail so that the results can be reproduced within the reported margin of error using publicly available software. Software and scripts used for analysis must also be made publicly available.

Rule 7: In docking studies, details should be included

For all studies including screening and docking, the complete set of molecules tested as well as the scoring functions used and the high-ranking poses should be publicly available either as databases or detailed descriptions.

Rule 8: In quantum mechanics calculations, all energies should be included in the results

For quantum mechanics studies, the authors need to provide the following information: absolute energies and energy breakdowns, the level of theory used, the basis set used, the optimization algorithm used, and coordinates of all optimized stationary points. Ideally, the archive entry for each calculation will be provided alongside the coordinates.

Rule 9: All sampling-based results should be evaluated using proper statistics

As in all scientific studies, statistical rigor is necessary in computational studies to evaluate the significance of an observation—in particular, for any method based on sampling, such as molecular dynamics or Monte Carlo simulations. Appropriate estimates of statistical uncertainty are therefore necessary and should be included for each relevant finding.
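Successive frames of a trajectory are usually correlated, so the naive standard error of the mean tends to underestimate the uncertainty. Block averaging is one common remedy, shown here as a minimal sketch rather than the method the authors prescribe. The function name and the default of 10 blocks are assumptions, and the estimate is only meaningful when each block is longer than the correlation time of the observable.

```python
import math
import statistics


def block_standard_error(samples, n_blocks=10):
    """Estimate the mean and its standard error by block averaging.

    The series is cut into n_blocks contiguous blocks of equal length
    (trailing samples that do not fill a block are discarded). The block
    means are treated as approximately independent observations.
    """
    if n_blocks < 2:
        raise ValueError("need at least two blocks")
    block_len = len(samples) // n_blocks
    if block_len < 1:
        raise ValueError("not enough samples for the requested number of blocks")
    block_means = [
        statistics.fmean(samples[i * block_len:(i + 1) * block_len])
        for i in range(n_blocks)
    ]
    mean = statistics.fmean(block_means)
    # Sample standard deviation of the block means, divided by sqrt(n_blocks).
    sem = statistics.stdev(block_means) / math.sqrt(n_blocks)
    return mean, sem
```

Repeating the estimate for several block counts and checking that the standard error reaches a plateau is a simple way to judge whether the blocks are long enough.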

Rule 10: Be Nice

Remember that we are all a community, so sharing your data, methodologies, software, and results in such a way that others can use them will make the entire community thrive. This also applies to the readers: they should not expect unlimited support for using in-house software or methods. Just the fact that it is available provides an important resource for the community.

Conclusion

As a result of these discussions, PLOS Computational Biology has made the following extension to the PLOS data sharing policies: The authors should provide a README file with a list of included files and/or links to publicly available repositories, along with a brief description of each. The authors should describe all software used, including the specific version(s) used in the work and how it can be obtained. PLOS expects researchers to share software and scripts needed for the work. If these cannot be made publicly available (e.g., due to licenses), the simulation method should be provided in sufficient detail so the results can, in principle, be reproduced using publicly available software. The authors should provide the complete set of input files used to initiate the calculations, including input coordinates, topologies, and parameter files. These files must be provided in human-readable formats and should preferably be included as supplementary material. The authors should deposit trajectories in a public repository according to FAIR data principles [10]. Examples of such databases include MoDEL [11], Nomad (https://nomad-repository.eu/), and the Dryad repository (https://datadryad.org/).
References

1.  MoDEL (Molecular Dynamics Extended Library): a database of atomistic molecular dynamics trajectories.

Authors:  Tim Meyer; Marco D'Abramo; Adam Hospital; Manuel Rueda; Carles Ferrer-Costa; Alberto Pérez; Oliver Carrillo; Jordi Camps; Carles Fenollosa; Dmitry Repchevsky; Josep Lluis Gelpí; Modesto Orozco
Journal:  Structure       Date:  2010-11-10       Impact factor: 5.006

2.  Modeling, informatics, and the quest for reproducibility.

Authors:  W Patrick Walters
Journal:  J Chem Inf Model       Date:  2013-06-21       Impact factor: 4.956

3.  How Modeling Standards, Software, and Initiatives Support Reproducibility in Systems Biology and Systems Medicine.

Authors:  Dagmar Waltemath; Olaf Wolkenhauer
Journal:  IEEE Trans Biomed Eng       Date:  2016-06-02       Impact factor: 4.538

4.  Is the Supporting Information the Venue for Reproducibility and Transparency?

Authors:  Benjamin Rudshteyn; Atanu Acharya; Victor S Batista
Journal:  J Phys Chem B       Date:  2017-12-28       Impact factor: 2.991

5.  Starting-structure dependence of nanosecond timescale intersubstate transitions and reproducibility of MD-derived order parameters.

Authors:  Tim Zeiske; Kate A Stafford; Richard A Friesner; Arthur G Palmer
Journal:  Proteins       Date:  2012-12-24

6.  Ten simple rules for the open development of scientific software.

Authors:  Andreas Prlić; James B Procter
Journal:  PLoS Comput Biol       Date:  2012-12-06       Impact factor: 4.475

7.  Ten simple rules for cultivating open science and collaborative R&D.

Authors:  Hassan Masum; Aarthi Rao; Benjamin M Good; Matthew H Todd; Aled M Edwards; Leslie Chan; Barry A Bunin; Andrew I Su; Zakir Thomas; Philip E Bourne
Journal:  PLoS Comput Biol       Date:  2013-09-26       Impact factor: 4.475

8.  The FAIR Guiding Principles for scientific data management and stewardship.

Authors:  Mark D Wilkinson; Michel Dumontier; I Jsbrand Jan Aalbersberg; Gabrielle Appleton; Myles Axton; Arie Baak; Niklas Blomberg; Jan-Willem Boiten; Luiz Bonino da Silva Santos; Philip E Bourne; Jildau Bouwman; Anthony J Brookes; Tim Clark; Mercè Crosas; Ingrid Dillo; Olivier Dumon; Scott Edmunds; Chris T Evelo; Richard Finkers; Alejandra Gonzalez-Beltran; Alasdair J G Gray; Paul Groth; Carole Goble; Jeffrey S Grethe; Jaap Heringa; Peter A C 't Hoen; Rob Hooft; Tobias Kuhn; Ruben Kok; Joost Kok; Scott J Lusher; Maryann E Martone; Albert Mons; Abel L Packer; Bengt Persson; Philippe Rocca-Serra; Marco Roos; Rene van Schaik; Susanna-Assunta Sansone; Erik Schultes; Thierry Sengstag; Ted Slater; George Strawn; Morris A Swertz; Mark Thompson; Johan van der Lei; Erik van Mulligen; Jan Velterop; Andra Waagmeester; Peter Wittenburg; Katherine Wolstencroft; Jun Zhao; Barend Mons
Journal:  Sci Data       Date:  2016-03-15       Impact factor: 6.444

