W Digan1, H Countouris1, M Barritault2, D Baudoin1, P Laurent-Puig3, H Blons2,3, A Burgun1, B Rance1.
Abstract
Next-generation sequencing is used on a daily basis to perform molecular analysis to determine subtypes of disease (e.g., in cancer) and to assist in the selection of the optimal treatment. Clinical bioinformatics handles the manipulation of the data generated by the sequencer, from the generation to the analysis and interpretation. Reproducibility and traceability are crucial issues in a clinical setting. We have designed an approach based on Docker container technology and Galaxy, the popular open-source bioinformatics analysis platform. Our solution simplifies the deployment of a small-size analytical platform and simplifies the process for the clinician. From the technical point of view, the tools embedded in the platform are isolated and versioned through Docker images. Alongside the Galaxy platform, we also introduce the AnalysisManager, a solution that allows single-click analysis for biologists and leverages standardized bioinformatics application programming interfaces. We added a Shiny/R interactive environment to ease the visualization of the outputs. The platform relies on containers and ensures data traceability by recording analytical actions and by associating the inputs and outputs of the tools with the EDAM ontology through ReGaTE. The source code is freely available on GitHub at https://github.com/CARPEM/GalaxyDocker.
Keywords: Docker; Galaxy; ReGaTE; reproducibility
Year: 2017 PMID: 29048555 PMCID: PMC5691353 DOI: 10.1093/gigascience/gix099
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Overall architecture of our Galaxy Docker solution. Each tool comes as an independent Docker image.
Figure 2: The R/Shiny interface embedded inside Galaxy. In this example, the Shiny interface provides boxplot visualizations of gene coverage (outliers represent patients presenting amplification or deletion for these genes).
Figure 3: The AnalysisManager (AM) "one click" interface: after registering with Galaxy, the user is able to use the AM. (A) This window lists the datasets ready for analysis. Clicking the "Start" button opens a pop-up window. (B) The user can select specific files (here BAM files) and start an analytic workflow within Galaxy (in this example, the Samtools idxstats analysis) for each of the files.
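The "one analysis per selected file" behaviour described in Figure 3 can be sketched as follows. This is a minimal illustration, not the AnalysisManager's actual code: the function name `plan_invocations`, the dataset keys `name` and `id`, the workflow identifier, and the use of step index `"0"` as the input key are all assumptions. Only the `src`/`id` shape of an input is taken from the job logs shown in this record. The sketch builds the requests without contacting a server; submitting them would be left to a Galaxy API client.

```python
from pathlib import PurePosixPath


def plan_invocations(datasets, workflow_id, extension=".bam"):
    """Build one workflow invocation request per dataset with the given extension.

    `datasets` is a list of dicts with hypothetical keys "name" and "id".
    The {"src": "hda", "id": ...} shape mirrors the logged job_inputs column.
    """
    plans = []
    for ds in datasets:
        if PurePosixPath(ds["name"]).suffix.lower() != extension:
            continue  # skip files the workflow cannot consume
        plans.append({
            "workflow_id": workflow_id,
            # Assumption: the workflow takes a single input at step "0".
            "inputs": {"0": {"src": "hda", "id": ds["id"]}},
        })
    return plans


# Hypothetical selection: one BAM file and one file that should be ignored.
selected = [
    {"name": "MitoChrondrieH.bam", "id": "f2db41e1fa331b3e"},
    {"name": "notes.txt", "id": "0000000000000000"},
]
plans = plan_invocations(selected, workflow_id="idxstats-workflow")
for plan in plans:
    print(plan)
```

Keeping the planning step separate from submission makes it possible to review or log exactly what will run before anything is sent to Galaxy.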
Traceability logs of Galaxy tools
| Name | Version | workflows tools_id | supported files_id | dataHandle | dataDescription | dataFormatEdamOntology |
|---|---|---|---|---|---|---|
| samtools_phase | 2.0 | samtools_phase_2.0.json | 3 | input_bam | Select dataset to phase | |
| samtools_idxstats | 2.0 | samtools_idxstats_2.0.json | 4 | input | BAM file | |
For each tool located on our Galaxy instance, we registered the tool name, the version, and the URI of the EDAM ontology term that describes the input and output data of that tool.
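As an illustration of what one such registry entry might look like in code, here is a minimal sketch. The class `ToolRecord` and its field names are hypothetical, and the `<name>_<version>.json` pattern for `tools_id` is only inferred from the values shown in the table, not confirmed by the source. The EDAM URI is left empty because the table above records none.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class ToolRecord:
    """One traceability entry for a Galaxy tool (hypothetical layout)."""
    name: str
    version: str
    data_handle: str
    data_description: str
    edam_format_uri: str = ""  # left empty when no EDAM term is recorded

    @property
    def tools_id(self):
        # Naming pattern inferred from the table: <name>_<version>.json
        return f"{self.name}_{self.version}.json"


record = ToolRecord("samtools_idxstats", "2.0", "input", "BAM file")
print(record.tools_id)
print(json.dumps(asdict(record), sort_keys=True))
```

Making the record immutable (`frozen=True`) fits the traceability goal: an entry describes one tool version and should not change after it is registered.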
Traceability of reference genomes
| genomeID | version | Description | localization |
|---|---|---|---|
| Hg19plasma | Hg19 | Human (Homo sapiens): Hg19plasma | /genomes/plasmaMutation/hg19/hg19.fasta |
Traceability logs of user activities
| job_create_time | job_user_email_id | job_tool_id_id | job_params | job_inputs | File_name |
|---|---|---|---|---|---|
| 2017-05-12T08:51:39.373441 | | samtools_phase_2.0.json | {u'chromInfo': u'"/Galaxy-central/tool-data/shared/ucsc/chrom/?.len"', u'option_set': u'{"__current_case__": 1, "min_bq": "10", "read_depth": "242", "drop_ambiguous": "False", "min_het": "37", "block_length": "13", "option_sets": "advanced", "ignore_chimeras": "False"}', u'dbkey': u'"?"'} | {u'input_bam': {u'src': u'hda', u'id': u'f2db41e1fa331b3e', u'uuid': u'5203fb22-cc3c-43f3-9e31-90bd22ded709'}} | MitoChrondrieH.bam |
| 2017-05-12T08:51:20.094488 | | samtools_idxstats_2.0.json | {u'chromInfo': u'"/Galaxy-central/tool-data/shared/ucsc/chrom/?.len"', u'dbkey': u'"?"'} | {u'input': {u'src': u'hda', u'id': u'f2db41e1fa331b3e', u'uuid': u'5203fb22-cc3c-43f3-9e31-90bd22ded709'}} | MitoChrondrieH.bam |
| 2017-05-12T08:50:44.205220 | | upload1_1.1.4.json | {u'files': u'[{"to_posix_lines": "Yes", "NAME": "None", "file_data": "/tmp/nginx_upload_store/0000000001", "space_to_tab": null, "url_paste": "", "__index__": 0, "ftp_files": "", "uuid": "None"}]', u'paramfile': u'"/export/Galaxy-central/database/files/tmpkDBRmt"', u'file_type': u'"auto"', u'files_metadata': u'{"file_type": "auto", "__current_case__": 39}', u'async_datasets': u'"None"', u'dbkey': u'"?"'} | {} | MitoChrondrieH.bam |
We store all information related to the execution of a job inside Galaxy: the date, the user name, the tool ID, the parameters used, and the name of the input data used.
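The logged `job_params` values look like the printed form of a Python dictionary whose values are themselves JSON-encoded strings. Under that assumption (it is inferred from the rows above, not stated by the authors), a stored cell could be decoded for auditing roughly as follows. The helper name `parse_job_params` is hypothetical, and the sample string is a shortened version of the samtools_phase row.

```python
import ast
import json


def parse_job_params(raw):
    """Decode a job_params cell into plain Python values.

    Assumes the cell is the repr of a Python dict whose values are
    JSON-encoded strings, as the logged rows suggest.
    """
    outer = ast.literal_eval(raw)  # literals only; accepts u'' prefixes
    decoded = {}
    for key, value in outer.items():
        try:
            decoded[key] = json.loads(value)
        except (TypeError, ValueError):
            decoded[key] = value  # keep the raw value if it is not JSON
    return decoded


# Shortened sample based on the logged samtools_phase parameters.
raw = """{u'option_set': u'{"__current_case__": 1, "min_bq": "10"}', u'dbkey': u'"?"'}"""
params = parse_job_params(raw)
print(params["option_set"]["min_bq"])
print(params["dbkey"])
```

`ast.literal_eval` is used instead of `eval` so that a log cell can never execute code; it only accepts literal structures such as dicts, lists, strings, and numbers.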