Anna-Lena Lamprecht, Tiziana Margaria, Bernhard Steffen.
Abstract
BACKGROUND: The development of bioinformatics databases, algorithms, and tools in recent years has led to a highly distributed world of bioinformatics services. Without adequate management and development support, in silico researchers are hardly able to exploit the potential of building complex, specialized analysis processes from these services. The Semantic Web aims at thoroughly equipping individual data and services with machine-processable meta-information, while workflow systems support the construction of service compositions. However, even with this combination, in silico researchers would currently have to deal manually with the service interfaces, the adequacy of the semantic annotations, type incompatibilities, and the consistency of service compositions.
Year: 2009 PMID: 19796405 PMCID: PMC2755829 DOI: 10.1186/1471-2105-10-S10-S8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Figure 1. BioSPICE Dashboard. Graphical user interface of the BioSPICE Dashboard [24]. Services can be arranged according to the categories location, contributor, function, and I/O type (top left).
Figure 2. Bio-jETI GUI. The jABC framework, which provides the graphical user interface for Bio-jETI, supports the orchestration of processes from heterogeneous services. Workflow models are constructed graphically by placing process building blocks from a library (top left) on a canvas (center) and connecting them by labeled branches to define the flow of control. The models are directly executable by an inbuilt interpreter component (right).
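The execution model described in this caption (building blocks connected by labeled branches and run by an interpreter) can be sketched in a few lines of Python. This is a minimal, hypothetical illustration and not jABC's actual API: a SIB is modelled as a function that updates a shared context dictionary and returns the label of the branch to take, and the hypothetical `run_slg` simply follows the matching edge until no such edge exists. The behaviours given to `PutInteger` and `RepeatLoop` are assumptions loosely based on their descriptions in the table of exemplary services, and `AppendLog` is made up for the example.

```python
from typing import Callable, Dict, Tuple

# Hypothetical model of a SIB: it reads/updates a shared execution
# context and returns the label of the outgoing branch to follow.
Sib = Callable[[dict], str]


def run_slg(sibs: Dict[str, Sib],
            edges: Dict[Tuple[str, str], str],
            start: str,
            context: dict,
            max_steps: int = 1000) -> dict:
    """Interpret a service logic graph: run the current SIB, then follow the
    edge labeled with the branch it returned; stop when no such edge exists."""
    current = start
    for _ in range(max_steps):
        branch = sibs[current](context)
        successor = edges.get((current, branch))
        if successor is None:
            return context
        current = successor
    raise RuntimeError("step limit exceeded; the graph may loop forever")


def put_integer(ctx: dict) -> str:
    # Assumed behaviour: initialise a counter and a loop limit.
    ctx["counter"], ctx["limit"] = 0, 3
    return "default"


def repeat_loop(ctx: dict) -> str:
    # Assumed behaviour: a counting loop with "repeat" and "exit" branches.
    if ctx["counter"] < ctx["limit"]:
        ctx["counter"] += 1
        return "repeat"
    return "exit"


def append_log(ctx: dict) -> str:
    # Hypothetical loop body: record the current counter value.
    ctx.setdefault("log", []).append(ctx["counter"])
    return "default"


result = run_slg(
    sibs={"PutInteger": put_integer, "RepeatLoop": repeat_loop, "AppendLog": append_log},
    edges={("PutInteger", "default"): "RepeatLoop",
           ("RepeatLoop", "repeat"): "AppendLog",
           ("AppendLog", "default"): "RepeatLoop"},
    start="PutInteger",
    context={},
)
print(result["log"])  # [1, 2, 3]
```

Because the `exit` branch of `RepeatLoop` has no outgoing edge in this sketch, execution ends there.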
Figure 3. Relationship between SLTL and workflow languages. SLTL is designed to specify linear workflows on an abstract level. In conjunction with a set of services and adequate semantic information about the domain, it serves as input for the synthesis algorithm, which generates linear workflows according to the SLTL specification. The results are available as Bio-jETI SLGs, which can be further edited, combined, and refined. The SLGs can then be compiled into a number of different target languages by the GeneSys code generation framework.
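To give a feel for what a specification over linear workflows expresses, the following Python sketch evaluates a small fragment of linear-time temporal logic over a finite sequence of services. It is a deliberate simplification, not SLTL itself: atomic propositions are just labels attached to each step (the service name and predicates such as `type:analysis`), data types are ignored, and only negation, conjunction, next, and eventually are supported. The class names and the example workflow are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import AbstractSet, Sequence, Union


@dataclass(frozen=True)
class Atom:
    label: str            # true if the current service carries this label


@dataclass(frozen=True)
class Not:
    sub: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Next:
    sub: "Formula"        # strong next: a following service must exist


@dataclass(frozen=True)
class Eventually:
    sub: "Formula"        # holds at some position from here to the end


Formula = Union[Atom, Not, And, Next, Eventually]


def holds(f: Formula, trace: Sequence[AbstractSet[str]], i: int = 0) -> bool:
    """Finite-trace semantics over a linear workflow (one label set per step)."""
    if isinstance(f, Atom):
        return i < len(trace) and f.label in trace[i]
    if isinstance(f, Not):
        return not holds(f.sub, trace, i)
    if isinstance(f, And):
        return holds(f.left, trace, i) and holds(f.right, trace, i)
    if isinstance(f, Next):
        return i + 1 < len(trace) and holds(f.sub, trace, i + 1)
    if isinstance(f, Eventually):
        return any(holds(f.sub, trace, j) for j in range(i, len(trace)))
    raise TypeError(f"unknown formula: {f!r}")


# Illustrative workflow, each step labelled with its service name and type.
workflow = [
    {"ShowInputDialog", "type:definition"},
    {"BLAST", "type:analysis"},
    {"ClustalW", "type:analysis"},
    {"Archaeopteryx", "type:visualization"},
]

# "BLAST is used, and Archaeopteryx is used somewhere after it."
spec = Eventually(And(Atom("BLAST"), Next(Eventually(Atom("Archaeopteryx")))))
print(holds(spec, workflow))  # True
```

In the synthesis setting the direction is reversed: instead of checking a given sequence against a formula, the algorithm constructs sequences that satisfy it.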
Table 1. Exemplary set of services. Fragment of a component library that we used in the examples. The table lists the names of the building blocks (SIBs) along with their function descriptions and selected service predicates.
| SIB | Description | Service predicates |
|---|---|---|
| Archaeopteryx | Displays a phylogenetic tree […] | type:visualization, location:local, contributor:forester.org |
| BLAST | BLAST […] | type:analysis, location:ddbj, contributor:ddbj |
| ClustalW | Runs ClustalW […] | type:analysis, location:ddbj, contributor:ddbj |
| Emma | EMBOSS […] | type:analysis, location:ebi, contributor:emboss |
| ExtractPattern | Extracts all parts of a string that match a regular expression. | type:stringprocessing, location:local, contributor:jabc |
| GetDDBJEntry | Fetches an entry in flat file format from a DDBJ database […] | type:dataretrieval, location:ddbj, contributor:ddbj |
| GetFASTA_DDBJEntry | Fetches an entry in FASTA format from a DDBJ database […] | type:dataretrieval, location:ddbj, contributor:ddbj |
| List2String | Concatenates all entries of a list. | type:stringprocessing, location:local, contributor:jabc |
| MatchString | Tries to match a string against a regular expression pattern. | type:condition, location:local, contributor:jabc |
| PutExpression | Stores a user-supplied context expression or its value in the execution context. | type:definition, location:local, contributor:jabc |
| PutInteger | Provides an integer value. | type:definition, location:local, contributor:jabc |
| RepeatLoop | Realizes a counting loop. | type:loop, location:local, contributor:jabc |
| ReplaceString | Replaces substrings of a string with another character sequence. | type:stringprocessing, location:local, contributor:jabc |
| ShowInputDialog | Displays an input dialog and provides the entered string. | type:definition, location:local, contributor:jabc |
| WSDBFetch | Gets sequences from an EBI database […] | type:dataretrieval, location:ebi, contributor:ebi |
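The service predicates in this table are simple `category:value` pairs, which is what makes category-based browsing (by location, contributor, or function, as in the BioSPICE Dashboard) straightforward. A minimal Python sketch, using only the predicates listed in the table, that groups services by category and selects the services carrying all requested predicates; the function names `group_by` and `select` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Service predicates copied from the table of exemplary services.
SERVICES: Dict[str, str] = {
    "Archaeopteryx": "type:visualization location:local contributor:forester.org",
    "BLAST": "type:analysis location:ddbj contributor:ddbj",
    "ClustalW": "type:analysis location:ddbj contributor:ddbj",
    "Emma": "type:analysis location:ebi contributor:emboss",
    "ExtractPattern": "type:stringprocessing location:local contributor:jabc",
    "GetDDBJEntry": "type:dataretrieval location:ddbj contributor:ddbj",
    "GetFASTA_DDBJEntry": "type:dataretrieval location:ddbj contributor:ddbj",
    "List2String": "type:stringprocessing location:local contributor:jabc",
    "MatchString": "type:condition location:local contributor:jabc",
    "PutExpression": "type:definition location:local contributor:jabc",
    "PutInteger": "type:definition location:local contributor:jabc",
    "RepeatLoop": "type:loop location:local contributor:jabc",
    "ReplaceString": "type:stringprocessing location:local contributor:jabc",
    "ShowInputDialog": "type:definition location:local contributor:jabc",
    "WSDBFetch": "type:dataretrieval location:ebi contributor:ebi",
}


def group_by(category: str) -> Dict[str, List[str]]:
    """Map each value of a predicate category (e.g. 'location') to its services."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, predicates in SERVICES.items():
        for predicate in predicates.split():
            key, _, value = predicate.partition(":")
            if key == category:
                groups[value].append(name)
    return dict(groups)


def select(*required: str) -> List[str]:
    """Return the services that carry every one of the given predicates."""
    return [name for name, predicates in SERVICES.items()
            if set(required) <= set(predicates.split())]


print(group_by("location")["ebi"])                    # ['Emma', 'WSDBFetch']
print(select("type:dataretrieval", "location:ddbj"))  # ['GetDDBJEntry', 'GetFASTA_DDBJEntry']
```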
Figure 4. Service taxonomy. Service taxonomy for the services that we use in our examples, edited in OntEd, the ontology editing plugin of the jABC.
Figure 5. Type taxonomy. Type taxonomy classifying the data types involved in our examples.
Table 2. Exemplary set of types. The set of data types used in the example processes.
| Type | Description |
|---|---|
| Accession | Single accession number. |
| AccessionList | Iterable list (java.util.List) of accession numbers. |
| Accessions | Concatenation of accession numbers, separated by some character. |
| Alignment | Multiple sequence alignment. |
| BlastResult | Tool output of BLAST. |
| ClustalWResult | Tool output of ClustalW. |
| Counter | Counter, i.e., a positive integer value. |
| DDBJEntry | DDBJ entry in flat file format. |
| Limit | Limit, i.e., a positive integer value. |
| Sequence | Single or multiple nucleic or amino acid sequences. |
| Tree | Phylogenetic tree. |
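A type taxonomy lets a specification or a service signature refer to an abstract type and have it match every concrete type classified below it. The Python sketch below shows that subsumption check on the types from this table. The abstract groupings (`AccessionData`, `Number`, `ToolOutput`, and the root `Thing`) are hypothetical placeholders; the actual hierarchy is the one shown graphically in Figure 5.

```python
from typing import Dict, List

# HYPOTHETICAL parent links for the data types listed above; the real
# taxonomy is the one shown in the type taxonomy figure.
PARENT: Dict[str, str] = {
    "Accession": "AccessionData",
    "AccessionList": "AccessionData",
    "Accessions": "AccessionData",
    "Counter": "Number",
    "Limit": "Number",
    "BlastResult": "ToolOutput",
    "ClustalWResult": "ToolOutput",
    "AccessionData": "Thing",
    "Number": "Thing",
    "ToolOutput": "Thing",
    "Alignment": "Thing",
    "DDBJEntry": "Thing",
    "Sequence": "Thing",
    "Tree": "Thing",
}


def ancestors(t: str) -> List[str]:
    """The type itself followed by all of its supertypes up to the root."""
    chain = [t]
    while t in PARENT:
        t = PARENT[t]
        chain.append(t)
    return chain


def is_a(t: str, abstract: str) -> bool:
    """True if type t is the abstract type or is classified below it."""
    return abstract in ancestors(t)


def concrete_types(abstract: str) -> List[str]:
    """Leaf types (no subtypes of their own) classified under `abstract`."""
    inner = set(PARENT.values())
    return sorted(t for t in PARENT if t not in inner and is_a(t, abstract))


print(concrete_types("Number"))  # ['Counter', 'Limit']
```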
Figure 6. Example 1. A simple phylogenetic analysis process. The upper left shows an erroneous stub for a simple phylogenetic analysis process: it lacks a sequence of services leading from a BLAST result to a phylogenetic tree. Below it is the appropriate sequence of services proposed by our synthesis algorithm. The complete and correct analysis process is shown on the right.
Figure 7. Execution of example 1. Execution of the simple phylogenetic analysis process. The execution begins with an interactive step, in which a dialog is displayed for entering the query sequence (top). After several non-interactive steps, the resulting phylogenetic tree is displayed using Archaeopteryx (bottom).
Figure 8. Blast-ClustalW workflow. Blast-ClustalW workflow as sketched by the DDBJ (following [50]).
Figure 9. Example 2. The more complex Blast-ClustalW workflow. Model checking detects three errors in the original process (top). To bridge the gap between the available sequences and the required tree, the emma web service can be inserted; it computes a multiple alignment and provides the corresponding phylogenetic tree. No mediating sequence can be found that converts a DDBJ entry into FASTA format, but this format can be obtained when the DDBJ accession number, which is also available, is used as input (center). The complete process (bottom) contains the additional SIB Emma and has substituted GetFASTA_DDBJEntry by GetDDBJEntry.
Figure 10. Atomic propositions of the SIBs in example 1. Atomic propositions of the process stub and the complete process model of example 1. The propositions describe basic data-flow properties, such as defined and used variables and their types in terms of the data types of the taxonomy.
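Data-flow propositions of this kind (which types a SIB uses and which it defines) are enough to expose the gap in the example 1 stub. The following Python sketch checks one such property on a linear process: every type a step uses must have been defined by an earlier step. It is a simplified stand-in for model checking over process graphs, and the use/define sets below are assumptions derived from the captions of Figures 6 and 7, not the exact propositions shown in Figure 10.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Step(NamedTuple):
    sib: str
    uses: FrozenSet[str]     # data types the SIB reads
    defines: FrozenSet[str]  # data types the SIB makes available


def undefined_uses(process: List[Step]) -> List[Tuple[str, str]]:
    """Report (SIB, type) pairs where a type is used before any step defines it."""
    available: set = set()
    errors: List[Tuple[str, str]] = []
    for step in process:
        for missing in sorted(step.uses - available):
            errors.append((step.sib, missing))
        available |= step.defines
    return errors


# Assumed use/define sets for the incomplete example 1 stub.
stub = [
    Step("ShowInputDialog", frozenset(), frozenset({"Sequence"})),
    Step("BLAST", frozenset({"Sequence"}), frozenset({"BlastResult"})),
    Step("Archaeopteryx", frozenset({"Tree"}), frozenset()),
]
print(undefined_uses(stub))  # [('Archaeopteryx', 'Tree')]
```

A model checker generalizes this from a single linear sequence to all paths of a process graph, and from this one property to arbitrary temporal formulas over the propositions.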
Figure 11. Fragment of the configuration universe. Fragment of the configuration universe based on the services and types from Tables 1 and 2. Paths through the configuration universe represent possible sequences of services. Note that the configuration universe can express service polymorphism: the service ExtractPattern, for instance, can be applied to different inputs and accordingly produces different outputs.
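Because paths through the configuration universe are service sequences, finding a sequence that turns a BLAST result into a tree is, in the simplest case, a shortest-path search. The Python sketch below performs a breadth-first search in which a state is the set of data types available so far and a service is applicable when all of its input types are available. This is a simplification of the SLTL-driven synthesis (it only handles a "reach this type" goal), and the service signatures are hypothetical, except that Emma yielding an alignment and a tree follows the caption of Figure 9. Listing ExtractPattern twice with different signatures models the polymorphism described in the caption.

```python
from collections import deque
from typing import FrozenSet, Iterable, List, Optional, Tuple

# (service name, input types, output types). A name may appear more than once
# to model a polymorphic service. These signatures are HYPOTHETICAL.
SERVICES: List[Tuple[str, FrozenSet[str], FrozenSet[str]]] = [
    ("BLAST", frozenset({"Sequence"}), frozenset({"BlastResult"})),
    ("ExtractPattern", frozenset({"BlastResult"}), frozenset({"AccessionList"})),
    ("ExtractPattern", frozenset({"DDBJEntry"}), frozenset({"Accession"})),
    ("List2String", frozenset({"AccessionList"}), frozenset({"Accessions"})),
    ("WSDBFetch", frozenset({"Accessions"}), frozenset({"Sequence"})),
    ("ClustalW", frozenset({"Sequence"}), frozenset({"ClustalWResult"})),
    ("Emma", frozenset({"Sequence"}), frozenset({"Alignment", "Tree"})),
]


def synthesize(start: Iterable[str], goal: str) -> Optional[List[str]]:
    """Breadth-first search over states (sets of available types). Returns one
    shortest service sequence whose final state contains `goal`, or None."""
    initial = frozenset(start)
    queue = deque([(initial, [])])
    seen = {initial}
    while queue:
        state, path = queue.popleft()
        if goal in state:
            return path
        for name, inputs, outputs in SERVICES:
            if inputs <= state:            # all input types are available
                successor = state | outputs
                if successor not in seen:  # skip services that add nothing new
                    seen.add(successor)
                    queue.append((successor, path + [name]))
    return None


print(synthesize({"BlastResult"}, "Tree"))
# ['ExtractPattern', 'List2String', 'WSDBFetch', 'Emma']
print(synthesize({"DDBJEntry"}, "Sequence"))  # None
```

Breadth-first order guarantees that the first solution found is a shortest one, in the spirit of computing "one shortest solution"; when no path reaches the goal type, the search returns None.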
Figure 12. Synthesis SLG. Complete synthesis process, realized as a jABC SLG. The main process (top) triggers the execution of the synthesis and displays the returned solution. It then assembles the SLG corresponding to this solution and displays it on a canvas, where it can be used for further process development. The actual synthesis is carried out by the sub-process (bottom): it captures the available domain knowledge and evaluates the workflow specification. The collected information is stored in a dedicated database file and passed to the synthesis algorithm, which computes one shortest solution (SynthOneShort). The generated sequence of services is then converted into the jABC's graph format so that it can be processed further within the framework.