Vedran Kasalica, Veit Schwämmle, Magnus Palmblad, Jon Ison, Anna-Lena Lamprecht.
Abstract
The bio.tools registry is a main catalogue of computational tools in the life sciences. More than 17 000 tools have been registered by the international bioinformatics community. The bio.tools metadata schema includes semantic annotations of tool functions, that is, formal descriptions of tools' data types, formats, and operations with terms from the EDAM bioinformatics ontology. Such annotations enable the automated composition of tools into multistep pipelines or workflows. In this Technical Note, we revisit a previous case study on the automated composition of proteomics workflows. We use the same four workflow scenarios but instead of using a small set of tools with carefully handcrafted annotations, we explore workflows directly on bio.tools. We use the Automated Pipeline Explorer (APE), a reimplementation and extension of the workflow composition method previously used. Moving "into the wild" opens up an unprecedented wealth of tools and a huge number of alternative workflows. Automated composition tools can be used to explore this space of possibilities systematically. Inevitably, the mixed quality of semantic annotations in bio.tools leads to unintended or erroneous tool combinations. However, our results also show that additional control mechanisms (tool filters, configuration options, and workflow constraints) can effectively guide the exploration toward smaller sets of more meaningful workflows.
Keywords: automated workflow composition; computational pipelines; proteomics; scientific workflows; semantic tool annotation; workflow exploration
Year: 2021; PMID: 33720735; PMCID: PMC8041394; DOI: 10.1021/acs.jproteome.0c00983
Source DB: PubMed; Journal: J Proteome Res; ISSN: 1535-3893; Impact factor: 4.466
Figure 1. Experimental setup of the study.
Effects of Cleaning the Tool Annotation Sets
| | original | extended | full bio.tools |
|---|---|---|---|
| number of tools in bio.tools | 24 | 751 | 17 369 |
| number of functions annotated in the tool set | 24 | 858 | 18 408 |
| number of discarded functions | 3 | 587 | 16 778 |
| number of resulting APE annotations | 21 | 271 | 1642 |
Workflow Specification for the Four Use Cases
| use case | inputs | outputs | constraints |
|---|---|---|---|
| #1 | | | (i) Use operation |
| #2 | | | (i) Use operation |
| #3 | | | (i) Use operation |
| #4 | | | (i) Use operation |
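A workflow specification of the shape tabulated above (inputs, outputs, and constraints such as "use operation") can be read as a search problem over annotated tools. The following is a minimal sketch of that idea using a plain breadth-first search. It is not APE's algorithm or API. The tool names (`PeakPicker`, `SearchEngine`, `DeNovo`), the `Tool` record, the `compose` function, and the data and operation labels are all hypothetical and only loosely modeled on EDAM-style terms.

```python
from collections import deque
from typing import NamedTuple


class Tool(NamedTuple):
    """Toy stand-in for one annotated tool function (hypothetical)."""
    name: str
    operation: str
    inputs: frozenset
    outputs: frozenset


# Hypothetical annotations; not taken from bio.tools.
TOOLS = [
    Tool("PeakPicker", "Peak detection",
         frozenset({"Mass spectrum"}), frozenset({"Peak list"})),
    Tool("SearchEngine", "Peptide identification",
         frozenset({"Peak list"}), frozenset({"Peptide identification"})),
    Tool("DeNovo", "De novo sequencing",
         frozenset({"Mass spectrum"}), frozenset({"Peptide identification"})),
]


def compose(tools, inputs, outputs, must_use=(), max_len=4):
    """Enumerate tool sequences that turn `inputs` into `outputs`.

    A tool may be appended when all of its input types are available and
    it produces at least one type that is not available yet. `must_use`
    lists operations that have to occur somewhere in the sequence.
    """
    goal = frozenset(outputs)
    required = set(must_use)
    found = []
    queue = deque([(frozenset(inputs), ())])
    while queue:
        available, steps = queue.popleft()
        used_ops = {t.operation for t in steps}
        if goal <= available and required <= used_ops:
            found.append([t.name for t in steps])
            continue
        if len(steps) == max_len:
            continue
        for tool in tools:
            if tool in steps:
                continue  # use each tool at most once per workflow
            if tool.inputs <= available and not tool.outputs <= available:
                queue.append((available | tool.outputs, steps + (tool,)))
    return found


unconstrained = compose(TOOLS, {"Mass spectrum"}, {"Peptide identification"})
constrained = compose(TOOLS, {"Mass spectrum"}, {"Peptide identification"},
                      must_use={"Peptide identification"})
print(unconstrained)
print(constrained)
```

In this toy setting the unconstrained search also returns a candidate with a redundant step (`PeakPicker` followed by `DeNovo`), while the "use operation" constraint leaves a single candidate. This is a small-scale analogue of the abstract's point that constraints steer the exploration toward smaller sets of more meaningful workflows.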
Figure 2. Workflow quality evaluation.
Figure 3. Example workflow candidates: Use Case #1.
Figure 4. Example workflow candidates: Use Case #2.
Figure 5. Example workflow candidates: Use Case #3.
Figure 6. Example workflow candidates: Use Case #4.