
The Scientific Filesystem.

Vanessa Sochat

Abstract

Background: Here, we present the Scientific Filesystem (SCIF), an organizational format that supports exposure of executables and metadata for discoverability of scientific applications. The format includes a known filesystem structure, a definition for a set of environment variables describing it, and functions for generation of the variables and interaction with the libraries, metadata, and executables located within. SCIF makes it easy to expose metadata, multiple environments, installation steps, files, and entry points to render scientific applications consistent, modular, and discoverable. A SCIF can be installed on a traditional host or in a container technology such as Docker or Singularity. We start by reviewing the background and rationale for the SCIF, followed by an overview of the specification and the different levels of internal modules ("apps") that the organizational format affords. Finally, we demonstrate that SCIF is useful by implementing and discussing several use cases that improve user interaction and understanding of scientific applications. SCIF is released along with a client and integration in the Singularity 2.4 software to quickly install and interact with SCIF. When used inside of a reproducible container, a SCIF is a recipe for reproducibility and introspection of the functions and users that it serves.
Results: We use SCIF to evaluate container software, provide metrics, serve scientific workflows, and execute a primary function under different contexts. To encourage collaboration and sharing of applications, we developed tools along with an open-source, version-controlled, tested, and programmatically accessible web infrastructure. SCIF and associated resources are available at https://sci-f.github.io. The ease of using SCIF, especially in the context of containers, offers promise for scientists' work to be self-documenting and programmatically parseable for maximum reproducibility. SCIF provides an abstraction over the underlying programming languages and packaging logic for working with scientific applications, opening up new opportunities for scientific software development.
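
The abstract describes three ingredients: a known filesystem structure, environment variables that describe it, and functions that generate those variables and read the metadata and entry points inside. The Python sketch below illustrates that general idea only. It is not the SCIF client or the SCIF specification: the folder names (`apps`, `data`, `meta`, `bin`), the file names (`runscript`, `labels.json`), the `DEMO_` variable prefix, and all function names are hypothetical choices made for this example.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def create_app(base: Path, name: str, runscript: str, labels: Dict[str, str]) -> Path:
    """Create one app folder with an entry point and metadata (hypothetical layout)."""
    app_root = base / "apps" / name
    meta_dir = app_root / "meta"  # assumed name for the per-app metadata folder
    (app_root / "bin").mkdir(parents=True, exist_ok=True)
    meta_dir.mkdir(exist_ok=True)
    (base / "data" / name).mkdir(parents=True, exist_ok=True)
    (meta_dir / "runscript").write_text(runscript)
    (meta_dir / "labels.json").write_text(json.dumps(labels))
    return app_root


def discover_apps(base: Path) -> List[str]:
    """List installed apps by reading the predictable folder structure."""
    apps_dir = base / "apps"
    if not apps_dir.is_dir():
        return []
    return sorted(p.name for p in apps_dir.iterdir() if p.is_dir())


def app_environment(base: Path, name: str) -> Dict[str, str]:
    """Generate environment variables describing one app (illustrative names)."""
    app_root = base / "apps" / name
    return {
        f"DEMO_APPROOT_{name}": str(app_root),
        f"DEMO_APPDATA_{name}": str(base / "data" / name),
        f"DEMO_APPRUN_{name}": str(app_root / "meta" / "runscript"),
    }


def read_labels(base: Path, name: str) -> Dict[str, str]:
    """Load the metadata that makes an app discoverable."""
    labels_file = base / "apps" / name / "meta" / "labels.json"
    return json.loads(labels_file.read_text())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp) / "demo-fs"
        create_app(base, "hello", "echo 'Hello!'", {"maintainer": "example"})
        for app in discover_apps(base):
            print(app, read_labels(base, app))
            for key, value in app_environment(base, app).items():
                print(f"  {key}={value}")
```

Because every path is derived from the base folder and the app name, a tool can list apps, locate their entry points, and read their metadata without knowing what language each application is written in. For the actual layout and variable names, consult the specification at https://sci-f.github.io.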

Year:  2018        PMID: 29718213      PMCID: PMC5952957          DOI: 10.1093/gigascience/giy023

Source DB:  PubMed          Journal:  Gigascience        ISSN: 2047-217X            Impact factor:   6.524


  10 in total

1.  BioShaDock: a community driven bioinformatics shared Docker-based tools registry.

Authors:  François Moreews; Olivier Sallou; Hervé Ménager; Yvan Le Bras; Cyril Monjeaud; Christophe Blanchet; Olivier Collin
Journal:  F1000Res       Date:  2015-12-14

2.  Snakemake--a scalable bioinformatics workflow engine.

Authors:  Johannes Köster; Sven Rahmann
Journal:  Bioinformatics       Date:  2012-08-20       Impact factor: 6.937

3.  AlgoRun: a Docker-based packaging system for platform-agnostic implemented algorithms.

Authors:  Abdelrahman Hosny; Paola Vera-Licona; Reinhard Laubenbacher; Thibauld Favre
Journal:  Bioinformatics       Date:  2016-03-02       Impact factor: 6.937

4.  The Scientific Filesystem.

Authors:  Vanessa Sochat
Journal:  Gigascience       Date:  2018-05-01       Impact factor: 6.524

5.  Large-scale automated synthesis of human functional neuroimaging data.

Authors:  Tal Yarkoni; Russell A Poldrack; Thomas E Nichols; David C Van Essen; Tor D Wager
Journal:  Nat Methods       Date:  2011-06-26       Impact factor: 28.547

6.  Reproducibility of neuroimaging analyses across operating systems.

Authors:  Tristan Glatard; Lindsay B Lewis; Rafael Ferreira da Silva; Reza Adalat; Natacha Beck; Claude Lepage; Pierre Rioux; Marc-Etienne Rousseau; Tarek Sherif; Ewa Deelman; Najmeh Khalili-Mahani; Alan C Evans
Journal:  Front Neuroinform       Date:  2015-04-24       Impact factor: 4.081

7.  The impact of Docker containers on the performance of genomic pipelines.

Authors:  Paolo Di Tommaso; Emilio Palumbo; Maria Chatzou; Pablo Prieto; Michael L Heuer; Cedric Notredame
Journal:  PeerJ       Date:  2015-09-24       Impact factor: 2.984

8.  The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments.

Authors:  Krzysztof J Gorgolewski; Tibor Auer; Vince D Calhoun; R Cameron Craddock; Samir Das; Eugene P Duff; Guillaume Flandin; Satrajit S Ghosh; Tristan Glatard; Yaroslav O Halchenko; Daniel A Handwerker; Michael Hanke; David Keator; Xiangrui Li; Zachary Michael; Camille Maumet; B Nolan Nichols; Thomas E Nichols; John Pellman; Jean-Baptiste Poline; Ariel Rokem; Gunnar Schaefer; Vanessa Sochat; William Triplett; Jessica A Turner; Gaël Varoquaux; Russell A Poldrack
Journal:  Sci Data       Date:  2016-06-21       Impact factor: 6.444

9.  Singularity: Scientific containers for mobility of compute.

Authors:  Gregory M Kurtzer; Vanessa Sochat; Michael W Bauer
Journal:  PLoS One       Date:  2017-05-11       Impact factor: 3.240

10.  CarrierSeq: a sequence analysis workflow for low-input nanopore sequencing.

Authors:  Angel Mojarro; Julie Hachey; Gary Ruvkun; Maria T Zuber; Christopher E Carr
Journal:  BMC Bioinformatics       Date:  2018-03-27       Impact factor: 3.169

  2 in total

1.  Ten simple rules for writing Dockerfiles for reproducible data science.

Authors:  Daniel Nüst; Vanessa Sochat; Ben Marwick; Stephen J Eglen; Tim Head; Tony Hirst; Benjamin D Evans
Journal:  PLoS Comput Biol       Date:  2020-11-10       Impact factor: 4.475

2.  The Scientific Filesystem.

Authors:  Vanessa Sochat
Journal:  Gigascience       Date:  2018-05-01       Impact factor: 6.524

