Andrius Merkys, Nicolas Mounet, Andrea Cepellotti, Nicola Marzari, Saulius Gražulis, Giovanni Pizzi.
Abstract
In order to make the results of computational scientific research findable, accessible, interoperable and re-usable, it is necessary to decorate them with standardised metadata. However, a number of technical and practical challenges make this difficult to achieve in practice. Here we present the implementation of a protocol that tags crystal structures with their computed properties without the need for human intervention to curate the data. This protocol leverages the capabilities of AiiDA, an open-source platform to manage and automate scientific computational workflows, and of TCOD (the Theoretical Crystallography Open Database), an open-access database that stores computed materials properties using a well-defined and exhaustive ontology. Building on these, the complete procedure to deposit computed data in the TCOD database is automated. All relevant metadata are extracted from the full provenance information that AiiDA tracks and stores automatically while managing the calculations. Such a protocol also enables the reproducibility of scientific data in the field of computational materials science. As a proof of concept, the AiiDA-TCOD interface is used to deposit 170 theoretical structures together with their computed properties and their full provenance graphs, consisting of over 4600 AiiDA nodes.
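Extracting metadata "from the full provenance information" amounts to walking the provenance graph backwards from a result and collecting what each calculation recorded. The sketch below illustrates that idea on a toy in-memory graph. The `Node` class, its `parents` field and the helper functions are hypothetical illustrations, not AiiDA's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical provenance node (a toy model, not AiiDA's Node class)."""
    uuid: str
    kind: str  # "data", "calculation" or "code"
    attributes: dict = field(default_factory=dict)
    parents: list = field(default_factory=list)  # nodes this node directly depends on


def collect_provenance(node):
    """Return `node` and all its ancestors, visiting each node once (iterative DFS)."""
    seen = {}
    stack = [node]
    while stack:
        current = stack.pop()
        if current.uuid in seen:
            continue  # shared ancestors (e.g. one code used twice) are kept once
        seen[current.uuid] = current
        stack.extend(current.parents)
    return list(seen.values())


def extract_metadata(node):
    """Collect the attributes of every calculation in the provenance of `node`."""
    records = [
        {"uuid": n.uuid, **n.attributes}
        for n in collect_provenance(node)
        if n.kind == "calculation"
    ]
    return sorted(records, key=lambda r: r["uuid"])


# Usage: one relaxation of an input structure with a single (illustrative) code.
code = Node("code-pw", "code", {"executable": "pw.x"})
s0 = Node("s0", "data")
calc1 = Node("calc1", "calculation", {"ecutwfc_Ry": 30}, parents=[s0, code])
s1 = Node("s1", "data", parents=[calc1])

print(len(collect_provenance(s1)))  # 4
print(extract_metadata(s1))  # [{'uuid': 'calc1', 'ecutwfc_Ry': 30}]
```

Deduplicating by identifier matters because provenance graphs are DAGs, not trees: the same code or input node is typically reached along several paths.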
Keywords: DFT; Materials science; Ontology; Open data; Provenance; Reproducibility
Year: 2017 PMID: 29138947 PMCID: PMC5686034 DOI: 10.1186/s13321-017-0242-y
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Fig. 1 Workflow of TCOD entry 10000008. The workflow consists of three consecutive relaxations of a GaGeO structure with the Quantum ESPRESSO pw.x code. Artifacts (called “data nodes” in AiiDA) are shown as circles and processes (called “calculation nodes” in AiiDA) as rectangles. A special type of AiiDA artifact, a code, is represented by a diamond. Note that in AiiDA the direction of the arrows is inverted with respect to the OPM notation [33, 34].
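The arrow convention in Fig. 1 can be sketched with a simplified model of three chained relaxations. Only the structures and the pw.x code are modelled (any other input nodes of the real entry are omitted), and the node names are hypothetical. Edges follow AiiDA's data-flow direction; `to_opm` inverts them, since OPM edges point from effect to cause, and labels each with the OPM relation `used` or `wasGeneratedBy`.

```python
def relaxation_chain(n_steps, code="pw.x"):
    """Model n consecutive relaxations sharing one code node.

    Returns (shapes, edges); edges follow AiiDA's data-flow direction.
    """
    shapes = {code: "diamond", "structure_0": "circle"}
    edges = []
    for i in range(1, n_steps + 1):
        calc, out = f"relax_{i}", f"structure_{i}"
        shapes[calc] = "rectangle"
        shapes[out] = "circle"
        edges.append((f"structure_{i - 1}", calc))  # input structure -> calculation
        edges.append((code, calc))                  # code -> calculation
        edges.append((calc, out))                   # calculation -> relaxed structure
    return shapes, edges


def to_opm(edges, shapes):
    """Invert each edge (OPM points from effect to cause) and name the relation."""
    opm = []
    for src, dst in edges:
        if shapes[dst] == "rectangle":
            opm.append((dst, "used", src))  # process used artifact
        else:
            opm.append((dst, "wasGeneratedBy", src))  # artifact was generated by process
    return opm


shapes, edges = relaxation_chain(3)
opm = to_opm(edges, shapes)
print(len(edges))  # 9
print(opm[:3])
# [('relax_1', 'used', 'structure_0'), ('relax_1', 'used', 'pw.x'),
#  ('structure_1', 'wasGeneratedBy', 'relax_1')]
```

The output structure of each relaxation becomes the input of the next, which is what makes the three steps "consecutive" in the graph.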
Fig. 2 Example of a simple workflow [24]. Graph view (left) and representation in TCOD CIF (right). In the graph view, files are presented as circles and processes as squares. For the sake of brevity and clarity, file contents are not reported here for the TCOD CIF representation.
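As a rough illustration of how a graph of files and processes can be flattened into CIF loops, the sketch below emits one loop for files and one for processes, with each process referring to its input and output files by identifier. The `_example_*` tag names are placeholders, not the data names of the TCOD dictionary, and the quoting rule is a simplified subset of CIF syntax.

```python
# Characters that cannot start an unquoted CIF value (simplified list).
SPECIAL_START = ("_", "#", "$", "'", '"', ";", "[", "]")


def cif_value(value):
    """Format a value for a CIF loop, single-quoting it when needed.

    Simplified: does not handle values that themselves contain quotes or newlines.
    """
    text = str(value)
    if not text or text.startswith(SPECIAL_START) or any(c.isspace() for c in text):
        return f"'{text}'"
    return text


def cif_loop(tags, rows):
    """Render one CIF loop: the loop_ keyword, tag names, then one row per line."""
    lines = ["loop_", *tags]
    lines += [" ".join(cif_value(v) for v in row) for row in rows]
    return "\n".join(lines)


def workflow_to_cif(files, steps):
    """files: [(file_id, name)]; steps: [(step_id, command, input_ids, output_ids)]."""
    file_loop = cif_loop(["_example_file_id", "_example_file_name"], files)
    step_loop = cif_loop(
        ["_example_step_id", "_example_step_command",
         "_example_step_inputs", "_example_step_outputs"],
        [(sid, cmd, ",".join(ins), ",".join(outs)) for sid, cmd, ins, outs in steps],
    )
    return file_loop + "\n\n" + step_loop


files = [(1, "input.in"), (2, "output.out")]
steps = [(1, "pw.x < input.in > output.out", ["1"], ["2"])]
print(workflow_to_cif(files, steps))
```

As in the figure, only file names and identifiers are written here; the file contents themselves are left out.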
Fig. 3 Sample from TCOD entry 20000419. This excerpt displays the computational setup and the bulk modulus (in GPa), the convergence criterion for the cell energy and the kinetic energy cut-off for wavefunctions (both in eV). Units for each data item are unambiguously defined in the TCOD dictionary.
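Tying each data item to exactly one unit means values must be converted before deposition whenever a code reports them differently; pw.x, for example, takes `ecutwfc` in Ry and prints pressures in kbar. The sketch below shows that normalisation step. The data-item names and the input values (30 Ry, 1500 kbar) are hypothetical and illustrative; the factors used are 1 Ry = 13.605693122994 eV (CODATA 2018) and 1 kbar = 0.1 GPa.

```python
RY_TO_EV = 13.605693122994  # Rydberg energy in eV (CODATA 2018)
KBAR_TO_GPA = 0.1           # 1 kbar = 0.1 GPa

# Hypothetical data-item names, each tied to exactly one storage unit.
ITEM_UNITS = {
    "_example_cutoff_wavefunctions": "eV",
    "_example_cell_energy_conv": "eV",
    "_example_bulk_modulus": "GPa",
}

# Multiplicative factors keyed by (source unit, target unit).
FACTORS = {
    ("eV", "eV"): 1.0,
    ("Ry", "eV"): RY_TO_EV,
    ("GPa", "GPa"): 1.0,
    ("kbar", "GPa"): KBAR_TO_GPA,
}


def normalise(item, value, unit):
    """Convert `value` given in `unit` to the single unit defined for `item`."""
    target = ITEM_UNITS[item]
    factor = FACTORS.get((unit, target))
    if factor is None:
        raise ValueError(f"no conversion from {unit} to {target} for {item}")
    return value * factor


cutoff = normalise("_example_cutoff_wavefunctions", 30, "Ry")
bulk = normalise("_example_bulk_modulus", 1500, "kbar")
print(round(cutoff, 2), round(bulk, 2))  # 408.17 150.0
```

Rejecting unknown unit pairs, rather than passing values through unchanged, keeps a mislabelled quantity from being silently deposited in the wrong unit.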
Comparison of a selection of TCOD CIF data items with respect to the corresponding ETSF variables
Comparison of a selection of TCOD CIF data items with respect to the corresponding NOMAD metadata