Douglas Heintz, Michael R Gryk.
Abstract
This paper describes our recent and ongoing efforts to enhance the curation of scientific workflows in order to improve the reproducibility and reusability of biomolecular nuclear magnetic resonance (bioNMR) data. Our efforts have focused both on developing a workflow management system, called CONNJUR Workflow Builder (CWB), and on refactoring our workflow data model to make use of the PREMIS model for digital preservation. This revised workflow management system will be available through the NMRbox cloud-computing platform for bioNMR. In addition, we are implementing a new file structure that bundles the original binary data files with PREMIS XML records describing the provenance of the data. These are packaged together using a standardized file archive utility. In this manner, the provenance and data curation information is maintained together with the scientific data. The benefits and limitations of these approaches, as well as future directions, are discussed.
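The bundling idea described in the abstract can be illustrated with a minimal sketch. It assumes ZIP as the archive format and SHA-256 as the fixity digest; the abstract names neither, and the function names `premis_record` and `bundle`, the `.premis.xml` suffix, and the sample file name are all hypothetical. The record uses element names from the PREMIS 3 data dictionary but is deliberately incomplete and is not validated against the PREMIS schema.

```python
import hashlib
import io
import xml.etree.ElementTree as ET
import zipfile

# PREMIS version 3 XML namespace.
PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def q(tag):
    """Return the namespace-qualified name of a PREMIS element."""
    return f"{{{PREMIS_NS}}}{tag}"


def premis_record(filename, data):
    """Build a minimal PREMIS-style object record (identifier, fixity, size).

    Hypothetical helper: a real record needs further mandatory elements.
    """
    obj = ET.Element(q("object"))
    ident = ET.SubElement(obj, q("objectIdentifier"))
    ET.SubElement(ident, q("objectIdentifierType")).text = "filename"
    ET.SubElement(ident, q("objectIdentifierValue")).text = filename
    chars = ET.SubElement(obj, q("objectCharacteristics"))
    fixity = ET.SubElement(chars, q("fixity"))
    ET.SubElement(fixity, q("messageDigestAlgorithm")).text = "SHA-256"
    ET.SubElement(fixity, q("messageDigest")).text = hashlib.sha256(data).hexdigest()
    ET.SubElement(chars, q("size")).text = str(len(data))
    return ET.tostring(obj, encoding="utf-8", xml_declaration=True)


def bundle(archive, filename, data):
    """Write the binary data and its provenance record into one archive.

    `archive` may be a path or a writable binary file object.
    """
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(filename, data)
        zf.writestr(filename + ".premis.xml", premis_record(filename, data))


if __name__ == "__main__":
    buffer = io.BytesIO()
    bundle(buffer, "spectrum.fid", b"\x00\x01\x02\x03")  # placeholder bytes, not real NMR data
    with zipfile.ZipFile(buffer) as zf:
        print(zf.namelist())
```

Storing the digest next to the data lets a later reader of the archive confirm that the binary file still matches the record that describes it.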
Year: 2019 PMID: 31061674 PMCID: PMC6499392 DOI: 10.2218/ijdc.v13i1.657
Source DB: PubMed Journal: Int J Digit Curation ISSN: 1746-8256
Figure 1. Screenshot of the graphical canvas for CONNJUR Workflow Builder. Squares represent datasets while diamonds represent actors. The above workflow is in the process of being executed. Green objects are those that have completed successfully, blue objects are in progress, and pink actors have yet to be invoked. White actors are being bypassed and not executed in the above workflow.
Figure 2. Notepad++ screenshot of original CONNJUR Workflow Builder (CWB) XML schema. While this XML document could be shared between CWB users, the metadata schema was specific for CWB and not intended for broader distribution. It contains information about the software state as well as specific Java classes of software code.
Figure 3. Oxygen screenshot of PREMIS XML record with CONNJUR_ML metadata embedded within the ‘significantProperties’ PREMIS tag. CONNJUR_ML is an ongoing modelling task and the XML can be found on GitHub.
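The embedding described in the Figure 3 caption might look roughly like the sketch below. It is an illustration only: the namespace URI `urn:example:connjur-ml`, the `spectrum` wrapper element, and the property names are hypothetical stand-ins, not the actual CONNJUR_ML vocabulary. It also assumes the domain metadata is placed in the PREMIS `significantPropertiesExtension` container, which the caption does not specify.

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "http://www.loc.gov/premis/v3"
# Hypothetical namespace standing in for the real CONNJUR_ML schema.
CONNJUR_NS = "urn:example:connjur-ml"
ET.register_namespace("premis", PREMIS_NS)
ET.register_namespace("cml", CONNJUR_NS)


def embed_significant_properties(premis_object, properties):
    """Attach domain metadata to a PREMIS object under significantProperties.

    `properties` maps element names (which must be valid XML names) to values.
    """
    sig = ET.SubElement(premis_object, f"{{{PREMIS_NS}}}significantProperties")
    ET.SubElement(sig, f"{{{PREMIS_NS}}}significantPropertiesType").text = "CONNJUR_ML"
    ext = ET.SubElement(sig, f"{{{PREMIS_NS}}}significantPropertiesExtension")
    # Hypothetical wrapper element for the embedded domain metadata.
    meta = ET.SubElement(ext, f"{{{CONNJUR_NS}}}spectrum")
    for name, value in properties.items():
        ET.SubElement(meta, f"{{{CONNJUR_NS}}}{name}").text = str(value)
    return sig


if __name__ == "__main__":
    obj = ET.Element(f"{{{PREMIS_NS}}}object")
    # Illustrative property names and values only.
    embed_significant_properties(obj, {"dimensions": 2, "nucleus": "1H"})
    print(ET.tostring(obj, encoding="unicode"))
```

Keeping the domain metadata in its own namespace lets generic preservation tools read the PREMIS wrapper while ignoring content they do not understand.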