Jonathan Passerat-Palmbach, Romain Reuillon, Mathieu Leclaire, Antonios Makropoulos, Emma C. Robinson, Sarah Parisot, Daniel Rueckert.
Abstract
OpenMOLE is a scientific workflow engine with a strong emphasis on workload distribution. Workflows are designed using a high-level Domain Specific Language (DSL) built on top of Scala. It exposes natural parallelism constructs to easily delegate the workload resulting from a workflow to a wide range of distributed computing environments. Thanks to its DSL, OpenMOLE hides the complexity of designing large-scale experiments. Users can embed their own applications and scale their pipelines from a small prototype running on their desktop computer to a large-scale study harnessing distributed computing infrastructures, simply by changing a single line in the pipeline definition. The construction of the pipeline itself is decoupled from the execution context: the high-level DSL abstracts the underlying execution environment, in contrast with classic shell-script-based pipelines. These two aspects allow pipelines to be shared and studies to be replicated across different computing environments. Workflows can be run as traditional batch pipelines, or coupled with OpenMOLE's advanced exploration methods to study the behavior of an application or to perform automatic parameter tuning. In this work, we briefly present the strong assets of OpenMOLE and detail recent improvements targeting the re-executability of workflows across various Linux platforms. We have tightly coupled OpenMOLE with CARE, a standalone containerization solution that allows an application packaged on one Linux host to be re-executed on any other Linux host. The solution is evaluated against a Python-based pipeline involving packages such as scikit-learn as well as binary dependencies. All were packaged and re-executed successfully on various HPC environments, with identical numerical results (here, prediction scores) obtained on each environment. Our results show that the pair formed by OpenMOLE and CARE is a reliable solution to generate reproducible results and re-executable pipelines.
A demonstration of the flexibility of our solution showcases three neuroimaging pipelines harnessing distributed computing environments as heterogeneous as local clusters or the European Grid Infrastructure (EGI).
Keywords: high performance computing; large datasets; neuroimaging; parameter exploration; pipeline; reproducibility; workflow systems
Year: 2017 PMID: 28381997 PMCID: PMC5361107 DOI: 10.3389/fninf.2017.00021
Source DB: PubMed Journal: Front Neuroinform ISSN: 1662-5196 Impact factor: 4.081
Figure 1. Organization of OpenMOLE around three axes.
Figure 2. Embedding a CARE archive in OpenMOLE with the CARETask.
Parameters and their values for the Logistic Regression classifier.

| Parameter | Values | Description |
| C | {0.1; 0.5; 1; 5; 10; 50; 100} | Inverse of the regularization strength |
| Penalty | {l1; l2} | Norm used in the penalization |
| Seed | 0 | Seed initializing the pseudorandom number generator |
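The grid spanned by these parameters can be sketched in plain Python (OpenMOLE itself would express the exploration in its Scala DSL; the dictionary layout here is purely illustrative):

```python
# Full-factorial grid over the Logistic Regression parameters listed above
# (a sketch in plain Python, not OpenMOLE's actual Scala DSL).
from itertools import product

C_values = [0.1, 0.5, 1, 5, 10, 50, 100]  # inverse of regularization strength
penalties = ["l1", "l2"]                   # norm used in the penalization
seed = 0                                   # fixed seed for the PRNG

grid = [{"C": c, "penalty": p, "seed": seed}
        for c, p in product(C_values, penalties)]
print(len(grid))  # 7 C values x 2 penalties = 14 combinations
```

The 7 × 2 = 14 combinations match the 14 parameter settings evaluated in the cross-validation results table.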
Figure 3. Representation of the Haxby decoder workflow: OpenMOLE elements.
Average prediction scores over 12 leave-one-out cross-validation folds (± standard deviation) for subject 1 of the Haxby dataset.
| 0.175 (±0.231) | 0.380 (±0.277) | 0.158 (±0.164) | 0.604 (±0.230) | 0.848 (±0.194) | 0.234 (±0.244) | 0.667 (±0.148) | 0.375 (±0.269) |
| 0.452 (±0.150) | 0.628 (±0.244) | 0.407 (±0.139) | 0.732 (±0.163) | 0.872 (±0.174) | 0.446 (±0.258) | 0.716 (±0.175) | 0.494 (±0.194) |
| 0.474 (±0.148) | 0.633 (±0.227) | 0.441 (±0.137) | 0.735 (±0.133) | 0.892 (±0.133) | 0.429 (±0.284) | 0.704 (±0.175) | 0.487 (±0.186) |
| 0.471 (±0.165) | 0.620 (±0.219) | 0.446 (±0.089) | 0.733 (±0.119) | 0.888 (±0.134) | 0.433 (±0.268) | 0.695 (±0.203) | 0.483 (±0.190) |
| 0.467 (±0.167) | 0.639 (±0.220) | 0.444 (±0.084) | 0.733 (±0.123) | 0.893 (±0.118) | 0.405 (±0.282) | 0.713 (±0.210) | 0.498 (±0.193) |
| 0.489 (±0.174) | 0.639 (±0.183) | 0.455 (±0.121) | 0.688 (±0.109) | 0.858 (±0.118) | 0.417 (±0.263) | 0.765 (±0.202) | 0.486 (±0.189) |
| 0.479 (±0.189) | 0.673 (±0.165) | 0.479 (±0.156) | 0.648 (±0.157) | 0.831 (±0.167) | 0.438 (±0.232) | 0.756 (±0.208) | 0.495 (±0.181) |
| 0.419 (±0.196) | 0.530 (±0.215) | 0.540 (±0.177) | 0.525 (±0.175) | 0.607 (±0.220) | 0.503 (±0.157) | 0.775 (±0.182) | 0.511 (±0.190) |
| 0.442 (±0.204) | 0.542 (±0.207) | 0.545 (±0.175) | 0.540 (±0.184) | 0.634 (±0.230) | 0.514 (±0.151) | 0.790 (±0.186) | 0.493 (±0.165) |
| 0.433 (±0.197) | 0.536 (±0.204) | 0.539 (±0.169) | 0.550 (±0.189) | 0.634 (±0.224) | 0.512 (±0.148) | 0.780 (±0.181) | 0.506 (±0.168) |
| 0.437 (±0.200) | 0.539 (±0.205) | 0.536 (±0.179) | 0.561 (±0.199) | 0.640 (±0.224) | 0.514 (±0.153) | 0.772 (±0.183) | 0.505 (±0.167) |
| 0.437 (±0.200) | 0.534 (±0.205) | 0.535 (±0.177) | 0.564 (±0.200) | 0.640 (±0.224) | 0.514 (±0.154) | 0.769 (±0.183) | 0.507 (±0.166) |
| 0.439 (±0.206) | 0.542 (±0.212) | 0.532 (±0.175) | 0.561 (±0.200) | 0.644 (±0.226) | 0.496 (±0.144) | 0.769 (±0.183) | 0.505 (±0.167) |
| 0.439 (±0.206) | 0.546 (±0.206) | 0.530 (±0.174) | 0.561 (±0.200) | 0.644 (±0.226) | 0.496 (±0.145) | 0.765 (±0.183) | 0.504 (±0.167) |
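Each cell above reports an average prediction score with its standard deviation over 12 leave-one-out folds. A minimal sketch of that aggregation (the fold scores here are illustrative placeholders, not the study's actual data):

```python
# Aggregating per-fold prediction scores into a "mean (± sd)" table cell.
from statistics import mean, stdev

# Illustrative fold scores; NOT the study's actual per-fold results.
fold_scores = [0.82, 0.88, 0.79, 0.91, 0.85, 0.80,
               0.87, 0.83, 0.90, 0.86, 0.84, 0.89]  # 12 leave-one-out folds

avg, sd = mean(fold_scores), stdev(fold_scores)
cell = f"{avg:.3f} (±{sd:.3f})"  # one table cell: mean (± standard deviation)
print(cell)
```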
Description of the parameters optimized for the MSM tool.

| Parameter | Values explored | Range | Description |
| Lambda | 3 | [0.00001, 100.0] | Weights the contribution of the regularizer relative to the similarity force. |
| sigma_in | 3 | [2; 10] | Sets the input smoothing by changing the smoothing kernel's standard deviation. |
| Iterations | 3 | [3; 5] | Controls the number of iterations at each resolution. |
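Assuming the "3" in the second column is the number of values sampled per parameter within each range (an assumption; the record does not spell this out), the exploration spans 3 × 3 × 3 = 27 MSM runs, sketched here in plain Python:

```python
# Hypothetical sampling of the MSM parameter space described above.
from itertools import product

def linspace(lo, hi, n):
    """n evenly spaced values from lo to hi inclusive (assumed sampling scheme)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

lambdas = linspace(0.00001, 100.0, 3)   # regularizer weight
sigmas_in = linspace(2, 10, 3)          # input smoothing kernel std dev
iterations = [3, 4, 5]                  # iterations per resolution

runs = list(product(lambdas, sigmas_in, iterations))
print(len(runs))  # 3 x 3 x 3 = 27 parameter combinations
```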
| Tool | Standalone | Cluster | Grid | Cloud |
| Galaxy | Yes | DRMAA clusters | No | No (manual cluster deployment) |
| Taverna | Yes | No | No | No |
| FastR | Yes | DRMAA clusters | No | No |
| LONI | No | DRMAA clusters | No | No (manual cluster deployment) |
| NiPype | Yes | PBS/Torque, SGE | No | No |
| Kepler | Yes | PBS, Condor, LoadLeveler | Globus | No |
| Pegasus | No (need local Condor) | Condor, PBS | No | No (manual cluster deployment) |
| PSOM | Yes | No | No | No |
| OpenMOLE | Yes | Condor, Slurm, PBS, SGE, OAR | EGI | EC2 (fully automated) |
| Tool | Scripting language | GUI | Target community | License |
| Galaxy | No | Yes | Bioinformatics | AFL 3.0 |
| Taverna | No | Yes | Bioinformatics | Apache 2.0 |
| FastR | Python | No | Neuroimaging | BSD |
| LONI | No | Yes | Neuroimaging | Proprietary (LONI) |
| NiPype | Python | No | Neuroimaging | BSD |
| Kepler | Partly with R | Yes | Generic | BSD |
| Pegasus | Python, Java, Perl | No | Generic | Apache 2.0 |
| PSOM | Matlab | No | Generic | MIT |
| OpenMOLE | Domain Specific Language (Scala) | Yes | Generic | AGPL 3 |
Information was drawn from the tools' web pages where available, or otherwise from the reference paper cited in the section.
| Environment | Cores | Runtime | Operating system | Kernel |
| None | 4 cores | 20′36″ | Debian 8 | 4.6.0-1-amd64 |
| SSH | 8 cores | 28′14″ | Ubuntu 14.04 | 3.13.0-91-generic |
| Slurm | 312 cores | 14′50″ | Ubuntu 14.04 | 3.13.0-63-generic |
| PBS | 13,558 cores | 48′25″ | Red Hat Enterprise Linux Server release 6.7 | 2.6.32-573.12.1.el6.x86_64 |
| EMI/gLite | 650,000 cores | 27′15″ | CentOS 6/Scientific Linux | 2.6.32-642.6.2.el6.x86_64 |
| Home directory | Python version | JVM version | Native execution |
| Permanent | 2.7.12 | OpenJDK 1.8.0_91 | Yes |
| Shared, permanent | 2.7.6 | OpenJDK 1.7.0_101 | Yes |
| Shared, permanent | 2.7.6 | OpenJDK 1.7.0_101 | No |
| Temporary | 2.6.6 | OpenJDK 1.7.0_101 | No |
| Shared, temporary | 2.7.8 | OpenJDK 1.6.0_40 | No |