
The Biological Structure Model Archive (BSM-Arc): an archive for in silico models and simulations.

Gert-Jan Bekker, Takeshi Kawabata, Genji Kurisu.

Abstract

We present the Biological Structure Model Archive (BSM-Arc, https://bsma.pdbj.org), which aims to collect raw data obtained via in silico methods related to structural biology, such as computationally modeled 3D structures and molecular dynamics trajectories. Since BSM-Arc does not enforce a specific data format for the raw data, depositors are free to upload their data without any prior conversion. Besides uploading raw data, BSM-Arc enables depositors to annotate their data with additional explanations and figures. Furthermore, via our WebGL-based molecular viewer Molmil, it is possible to recreate 3D scenes as shown in the corresponding scientific article in an interactive manner. To submit a new entry, depositors require an ORCID ID to login, and to finally publish the data, an accompanying peer-reviewed paper describing the work must be associated with the entry. Submitting their data enables researchers to not only have an external backup but also provide an opportunity to promote their work via an interactive platform and to provide third-party researchers access to their raw data.

Keywords:  Archive; Database; Homology modeling; Molecular dynamics; Raw data; Sharing

Year:  2020        PMID: 32026396      PMCID: PMC7242595          DOI: 10.1007/s12551-020-00632-5

Source DB:  PubMed          Journal:  Biophys Rev        ISSN: 1867-2450


The Protein Data Bank (PDB) is one of the largest collaborative scientific archives in the world, holding the molecular structures of biological macromolecules such as proteins, DNA, and RNA (Burley et al. 2019). All structures in the PDB were determined using experimental methods such as X-ray crystallography, nuclear magnetic resonance, or electron microscopy. Recently, PDB-Dev was developed as an archive for integrative/hybrid structures, which are determined by combining complementary experimental and computational techniques (Burley et al. 2017). In the past, the PDB also included several theoretical models, but these were removed more than a decade ago and later adopted by the Protein Model Portal (Arnold et al. 2009). Since then, the community has made several attempts to establish an archive for computational structural biology data, in addition to more general sharing platforms such as Zenodo (https://zenodo.org/). Dynameomics was developed about a decade ago and contains analysis results from short MD simulations at room and high temperature for a large number of small proteins and peptides, performed by the Daggett group (van der Kamp et al. 2010). Similarly, the Molecular Dynamics Extended Library contains analysis results from MD simulations at room temperature performed by the Orozco group (Meyer et al. 2010). Finally, GPCRmd (http://www.gpcrmd.org/) contains MD simulation results specifically for GPCR systems. Still, constructing a single, public archive for raw data from computational sources has proven difficult. Here, we present the Biological Structure Model Archive (BSM-Arc or BSMA) as an archive for computationally derived structural biology data. BSM-Arc was designed as the counterpart, for purely computationally derived data, to the PDB for experimentally derived data and to PDB-Dev for integrative/hybrid data.
We accept a wide range of data derived via various computational methods. We also encourage depositors of experimental structures to the PDB who have performed computational analysis on their structures to submit the corresponding computational data to BSM-Arc. Depositors are free to submit their data in any format, but the data should be thoroughly documented if non-standard formats are used. Besides 3D structures, analysis results, either as text/binary files or as marked-up tables, can be added. Although the uploaded data files are format-free, meta-data is stored in the BSMA-STAR format, which is similar to the PDBx/mmCIF format, and this file can also be downloaded by the user. Meta-data such as file annotations, links to external databases (e.g., to PDB and UniProt entries), and extensive descriptions can be added via an interface and are then stored in the BSMA-STAR formatted file. Thus, each BSM-Arc entry consists of a meta-data file in the BSMA-STAR format listing all the annotations, in addition to a set of raw data files uploaded by the depositor. Note that since we perform no extensive peer review of the data or of the methodology used to obtain it, we require the data to be accompanied by a peer-reviewed paper that describes the methods used to obtain the data and discusses the results. Finally, for released entries, BSM-Arc incorporates viewers for 3D structures, images, and text in standard formats, so that users can view the data without having to download it. Prospective depositors require an ORCID ID (https://orcid.org/) to submit new data. The ORCID ID not only enables the community to uniquely identify the authors of an entry but also allows some basic verification of the work via the authors’ past achievements.
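As an illustration of how key-value meta-data in a STAR-like file might be read, the following is a minimal Python sketch. It assumes the mmCIF-style convention of a `data_` block header followed by `_category.item value` lines; the category and item names in the example text are hypothetical and are not taken from the BSMA-STAR specification, and `loop_` constructs and multi-line values are not handled.

```python
import shlex


def parse_star_items(text):
    """Parse simple `_category.item value` pairs from STAR/mmCIF-like text.

    Only single-value items are handled; loop_ constructs and multi-line
    (semicolon-delimited) values are deliberately ignored in this sketch.
    """
    entry = {"block": None, "items": {}}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("data_"):
            entry["block"] = line[len("data_"):]
        elif line.startswith("_"):
            # shlex handles quoted values that contain spaces
            parts = shlex.split(line)
            if len(parts) >= 2:
                entry["items"][parts[0]] = " ".join(parts[1:])
    return entry


# Hypothetical content; real BSMA-STAR category and item names may differ.
EXAMPLE = """
data_BSM-00001
_entry.id            BSM-00001
_citation.title      'Example title of the accompanying paper'
_database_link.pdb   1fvc
"""

meta = parse_star_items(EXAMPLE)
print(meta["block"])
print(meta["items"]["_citation.title"])
```

For real files, a dedicated STAR or mmCIF parser is likely the safer choice, since the full grammar includes loops and multi-line text blocks.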
The policies of the archive are currently flexible and simple: the data must be related to structural biology, and an accompanying peer-reviewed paper is required before publication. Although data can be uploaded before a paper is accepted, an entry can only be published once the data has been discussed in a peer-reviewed paper. Depositors are also free to decide which data to submit: raw data, representative data, or a combination thereof are all accepted. When large amounts of data are submitted, it is advisable to add documentation that describes their organization. For this, BSM-Arc provides several annotation methods. Multiple free-text panels can be added to an entry to provide an extensive description of the data, its organization, the data formats used, a summary of the paper, etc. (Fig. 1). New entries can also be initialized from a BSMA-STAR formatted file, so that depositors can pre-set various meta-data. Files can be uploaded in parallel via a web interface at high speeds, so that large files can also be submitted. Files and folders can also be individually annotated by depositors if they wish to do so (Fig. 1). Depositors can also upload a graphical abstract image, which will be shown on the entry page and in the search results. Upon completing an entry, depositors can mark it for release. One of our biocurators then checks the entry for potential issues (primarily whether an appropriate peer-reviewed paper has been associated), and if none are found, the entry is released immediately. After release, entries can still be modified by the depositors, but they need to be rechecked by a biocurator upon re-release.
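The release policy described above can be summarized as a small state machine. The Python sketch below is only an illustration of that policy, not the archive's implementation: the class, the status names, and the placeholder DOI are all hypothetical, and the assumption that a modified entry returns to a draft state until it is rechecked is ours.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    DRAFT = "draft"
    PENDING_CHECK = "pending check"
    RELEASED = "released"


@dataclass
class Entry:
    entry_id: str
    paper_doi: Optional[str] = None  # DOI of the peer-reviewed paper, if any
    status: Status = Status.DRAFT

    def mark_for_release(self) -> None:
        """Depositor marks the entry as ready for the biocurator check."""
        self.status = Status.PENDING_CHECK

    def biocurator_check(self) -> bool:
        """Release the entry only if a peer-reviewed paper is associated."""
        if self.status is not Status.PENDING_CHECK:
            raise ValueError("entry has not been marked for release")
        if self.paper_doi:
            self.status = Status.RELEASED
            return True
        self.status = Status.DRAFT  # sent back to the depositor
        return False

    def modify(self) -> None:
        """Assumption: a modified released entry must be rechecked."""
        if self.status is Status.RELEASED:
            self.status = Status.DRAFT


# Usage with a hypothetical entry identifier and placeholder DOI.
entry = Entry("BSM-EXAMPLE")
entry.mark_for_release()
print(entry.biocurator_check(), entry.status)
entry.paper_doi = "10.0000/example"
entry.mark_for_release()
print(entry.biocurator_check(), entry.status)
```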
Fig. 1

Editor/submission tool showing BSM-00001. The top-center panel (named “Project editor”) can be used to add meta-data to the entry and to add extensive descriptions via free-text panels. The bottom-center panel (named “File manager”) can be used to upload new files (either via drag-and-drop operations or via the buttons) and to assign per-file/folder annotations (description). Double-clicking on supported files opens them in the BSM-Arc viewer (e.g., the file fig1B.mjs is shown in the bottom-right corner), while double-clicking on a folder opens that folder. Right-clicking shows a context menu from which, e.g., the description can be modified and the files downloaded.

Previously, Protein Data Bank Japan (PDBj) developed its own WebGL-based molecular viewer, Molmil (Bekker et al. 2016), which has been integrated into many of our services (Kinjo et al. 2017, 2018). BSM-Arc also integrates Molmil for the visualization of submitted 3D structures and MD trajectories. A file manager enables users to quickly explore the submitted files, including any descriptions set by the depositors (Fig. 3). Double-clicking on structural files automatically opens them in Molmil. In addition, BSM-Arc supports scripted mjs files, Molmil’s custom scripting format (Bekker et al. 2016), which is a mix of PyMOL commands (Schrödinger 2015) and raw JavaScript code. This enables complex styling and annotation of the 3D structures and can be used to present the figures shown in the accompanying paper in an interactive manner. It also enables depositors to prepare movies by loading a combination of structure files (e.g., gro or pdb files) and trajectory files (e.g., xtc or trr files). Molmil can also be embedded into the free-text panels, so that extensive descriptions can be combined with elaborate and interactive representations of the corresponding molecules.
Fig. 3

Published entry BSM-00001 at https://bsma.pdbj.org/entry/1. a The top panel lists the title, graphical abstract, authors, DOI, and links to external databases. Below that, the free-text panels configured by the depositors are shown, followed by the file manager, which works like the file manager described in Fig. 1, except that no files can be uploaded and no modifications can be made. Here, two methods of annotation are used: first, a free-text panel (named “Description”) describes the general layout of the uploaded data; second, for the major files and folders, a per-file or per-folder description is included in the “File manager” panel. b List of raw data files included in one of the raw data folders of the entry (https://bsma.pdbj.org/entry/1/path/data/raw/300K/1fvc/1). The input and output files (both ASCII and binary) to/from the MD software were uploaded as is, without any modifications. For this entry, the individual trajectory files (md.xtc) were written during the simulation without solvent, making the trajectory files relatively small (although there are 250 such trajectories in this entry). c The file md.gro loaded using the integrated Molmil viewer. To load a trajectory file (e.g., md.xtc) from this state, Molmil’s command line must be used, which can be accessed by clicking on the “<” icon in the bottom-left corner. From here, entering the command “load md.xtc” will download and load the file. Finally, to play the trajectory, the “mplay” command can be used.

Several entries have already been submitted to BSM-Arc, in various formats, sizes, and annotation styles. BSM-00001, BSM-00002, BSM-00003, BSM-00004, BSM-00006, BSM-00007, and BSM-00009 pertain to MD simulations (Bekker et al. 2017, 2019a, b; Inaba et al. 2018; Oda et al. 2018; Numoto et al. 2018; Nagarathinam et al. 2018), while BSM-00005 pertains to molecular docking (Kawabata et al. 2017) and BSM-00011 and BSM-00012 to homology models (Ishizuka et al. 2017; Kimura et al. 2017). All the projects concerning MD simulations include representative structures, but BSM-00001 also includes all the raw trajectory data, including topologies and preparation files. BSM-00009 also includes trajectory files, but only of the final production run. Because of the large number of files in BSM-00001, file/folder descriptions are included for the higher-level folders, and a general description of the entire project is given in a free-text panel. BSM-00001, BSM-00002, BSM-00004, and BSM-00007 also contain interactive versions of the images included in the corresponding papers via Molmil script files. BSM-00005, BSM-00006, BSM-00011, and BSM-00012 make extensive use of per-file annotations to explain the nature of the data files of the entries. New entries can be submitted without releasing them if the paper has not yet been accepted, e.g., so that the paper can refer to the BSM-Arc entry. This has been done for BSM-00008 (Bekker et al. 2020) and BSM-00010, which were registered before peer review was completed. Then, after the paper has been published, the DOI can be assigned and the entries can be released. This is similar to the HPUB status (hold until publication) found in the PDB. Thus, a wide range of data submission and annotation styles can be used with the archive, and newer ones can be added based on feedback from the community. Upon release, entries become immediately available and searchable (Fig. 2).
In addition to the standard keyword-based search, we have also implemented a low-level SQL search methodology to enable users to easily search for specific meta-data of the released entries, similar to the PDBj Mine 2 RDB (Kinjo et al. 2017, 2018). Users can access individual entries to find more information provided by the depositors, or download the raw data files (Fig. 3). BSM-Arc entries are also cross-linked with PDB entries on the PDBj website, given that the depositors have added the corresponding annotation.
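To give a sense of what an SQL-style meta-data search could look like, the following Python sketch builds a small in-memory SQLite database and queries it. The table and column names are hypothetical and do not reflect the actual BSM-Arc or PDBj Mine 2 schema; the method labels follow the entry descriptions given above, and the PDB link is an illustrative value only.

```python
import sqlite3

# Hypothetical schema: one table of entries, one table of external links.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id TEXT PRIMARY KEY, method TEXT)")
conn.execute(
    "CREATE TABLE db_link (entry_id TEXT, db_name TEXT, accession TEXT)"
)
conn.executemany(
    "INSERT INTO entry VALUES (?, ?)",
    [
        ("BSM-00001", "molecular dynamics"),
        ("BSM-00005", "molecular docking"),
        ("BSM-00011", "homology modeling"),
    ],
)
# Illustrative cross-reference, not taken from the archive's meta-data.
conn.execute("INSERT INTO db_link VALUES (?, ?, ?)", ("BSM-00001", "PDB", "1fvc"))


def entries_by_method(conn, method):
    """Return the IDs of all entries annotated with the given method."""
    rows = conn.execute(
        "SELECT id FROM entry WHERE method = ? ORDER BY id", (method,)
    )
    return [row[0] for row in rows]


def entries_linked_to(conn, db_name, accession):
    """Return the IDs of entries cross-linked to an external accession."""
    rows = conn.execute(
        "SELECT e.id FROM entry AS e "
        "JOIN db_link AS l ON l.entry_id = e.id "
        "WHERE l.db_name = ? AND l.accession = ? ORDER BY e.id",
        (db_name, accession),
    )
    return [row[0] for row in rows]


print(entries_by_method(conn, "molecular dynamics"))
print(entries_linked_to(conn, "PDB", "1fvc"))
```

Parameterized queries (the `?` placeholders) are used here so that user-supplied search terms are never interpolated directly into the SQL text.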
Fig. 2

List of published entries at https://bsma.pdbj.org/search/bsma. Each published entry is listed with its title, its authors, a graphical abstract set by the depositors, and its deposition, modification, and release dates.

BSM-Arc is still in its infancy, with many of its policies and features being quite basic. We have implemented multiple basic methods for annotation to allow depositors to freely find and use their own style.
Although we would like to unify everything under a single style in the future, a consensus within the community must first be reached. We invite the wider computational community to try and evaluate our archive and to help us shape it, as the experimental community has done for the PDB.

References (first 10 of 18 listed)

1.  Structural Dynamics of the PET-Degrading Cutinase-like Enzyme from Saccharomonospora viridis AHK190 in Substrate-Bound States Elucidates the Ca2+-Driven Catalytic Cycle.

Authors:  Nobutaka Numoto; Narutoshi Kamiya; Gert-Jan Bekker; Yuri Yamagami; Satomi Inaba; Kentaro Ishii; Susumu Uchiyama; Fusako Kawai; Nobutoshi Ito; Masayuki Oda
Journal:  Biochemistry       Date:  2018-08-27

2.  Accurate Prediction of Complex Structure and Affinity for a Flexible Protein Receptor and Its Inhibitor.

Authors:  Gert-Jan Bekker; Narutoshi Kamiya; Mitsugu Araki; Ikuo Fukuda; Yasushi Okuno; Haruki Nakamura
Journal:  J Chem Theory Comput       Date:  2017-05-18

3.  Dynameomics: a comprehensive database of protein dynamics.

Authors:  Marc W van der Kamp; R Dustin Schaeffer; Amanda L Jonsson; Alexander D Scouras; Andrew M Simms; Rudesh D Toofanny; Noah C Benson; Peter C Anderson; Eric D Merkley; Steven Rysavy; Dennis Bromley; David A C Beck; Valerie Daggett
Journal:  Structure       Date:  2010-03-14

4.  Structural and thermodynamic characterization of endo-1,3-β-glucanase: Insights into the substrate recognition mechanism.

Authors:  Masayuki Oda; Satomi Inaba; Narutoshi Kamiya; Gert-Jan Bekker; Bunzo Mikami
Journal:  Biochim Biophys Acta Proteins Proteom       Date:  2017-12-12

5.  New tools and functions in data-out activities at Protein Data Bank Japan (PDBj).

Authors:  Akira R Kinjo; Gert-Jan Bekker; Hiroshi Wako; Shigeru Endo; Yuko Tsuchiya; Hiromu Sato; Hafumi Nishi; Kengo Kinoshita; Hirofumi Suzuki; Takeshi Kawabata; Masashi Yokochi; Takeshi Iwata; Naohiro Kobayashi; Toshimichi Fujiwara; Genji Kurisu; Haruki Nakamura
Journal:  Protein Sci       Date:  2017-09-18

6.  Protein Data Bank Japan (PDBj): updated user interfaces, resource description framework, analysis tools for large structures.

Authors:  Akira R Kinjo; Gert-Jan Bekker; Hirofumi Suzuki; Yuko Tsuchiya; Takeshi Kawabata; Yasuyo Ikegawa; Haruki Nakamura
Journal:  Nucleic Acids Res       Date:  2016-10-26

7.  Thermal stability of single-domain antibodies estimated by molecular dynamics simulations.

Authors:  Gert-Jan Bekker; Benson Ma; Narutoshi Kamiya
Journal:  Protein Sci       Date:  2018-12-20

8.  A novel rare variant R292H in RTN4R affects growth cone formation and possibly contributes to schizophrenia susceptibility.

Authors:  H Kimura; Y Fujita; T Kawabata; K Ishizuka; C Wang; Y Iwayama; Y Okahisa; I Kushima; M Morikawa; Y Uno; T Okada; M Ikeda; T Inada; A Branko; D Mori; T Yoshikawa; N Iwata; H Nakamura; T Yamashita; N Ozaki
Journal:  Transl Psychiatry       Date:  2017-08-22

9.  Outward open conformation of a Major Facilitator Superfamily multidrug/H+ antiporter provides insights into switching mechanism.

Authors:  Kumar Nagarathinam; Yoshiko Nakada-Nakura; Christoph Parthier; Tohru Terada; Narinobu Juge; Frank Jaenecke; Kehong Liu; Yunhon Hotta; Takaaki Miyaji; Hiroshi Omote; So Iwata; Norimichi Nomura; Milton T Stubbs; Mikio Tanabe
Journal:  Nat Commun       Date:  2018-10-01

10.  Protein Data Bank: the single global archive for 3D macromolecular structure data.

Authors:  wwPDB consortium
Journal:  Nucleic Acids Res       Date:  2019-01-08
