Matús Kalas, Pål Puntervoll, Alexandre Joseph, Edita Bartaseviciūte, Armin Töpfer, Prabakar Venkataraman, Steve Pettifer, Jan Christian Bryne, Jon Ison, Christophe Blanchet, Kristoffer Rapacki, Inge Jonassen.
Abstract
MOTIVATION: The world-wide community of life scientists has access to a large number of public bioinformatics databases and tools, which are developed and deployed using diverse technologies and designs. More and more of these resources offer programmatic web-service interfaces. However, efficient use of the resources is hampered by the lack of widely used, standard data-exchange formats for the basic, everyday bioinformatics data types.
Year: 2010 PMID: 20823319 PMCID: PMC2935419 DOI: 10.1093/bioinformatics/btq391
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Three different scenarios for sending data from the output of one web service to the input of another. (A) Plain textual data inside SOAP messages. Proprietary parsing and serialization, or shims, are necessary. (B) Different XML formats for the output of the first service and the input of the second. Translating the data is generally easier but still necessary. (C) Both services use the same standardized exchange format. No transformation is necessary, and data flow smoothly from one service to the other.
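To make the difference between scenarios (B) and (C) concrete, the Python sketch below contrasts a translation shim with a plain pass-through. The element names (`hit`, `seqId`, `residues`, `sequenceRecord`, `name`, `sequence`) and both functions are hypothetical; they do not reproduce BioXSD or the format of any real service.

```python
import xml.etree.ElementTree as ET

# Hypothetical output of service 1 in its own XML dialect (scenario B).
SERVICE1_OUTPUT = "<hit><seqId>P12345</seqId><residues>MKTAYIAK</residues></hit>"


def translate_to_service2(xml_text: str) -> str:
    """Shim required in scenario B: map service 1's element names onto
    the (equally hypothetical) input format expected by service 2."""
    src = ET.fromstring(xml_text)
    dst = ET.Element("sequenceRecord")
    ET.SubElement(dst, "name").text = src.findtext("seqId")
    ET.SubElement(dst, "sequence").text = src.findtext("residues")
    return ET.tostring(dst, encoding="unicode")


def pass_through(xml_text: str) -> str:
    """Scenario C: both services share one exchange format, so the
    output is forwarded to the next service unchanged."""
    return xml_text


translated = translate_to_service2(SERVICE1_OUTPUT)
```

Every pair of services with differing formats needs its own shim like `translate_to_service2`, whereas a shared format reduces the glue to `pass_through`.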
Fig. 2. Strategy for reaching maximum interoperability, as recommended by EMBRACE. The strategy comprises the technology for implementing web-service interfaces (WS-I-compliant, document/literal wrapped SOAP); the common exchange format for basic bioinformatics data (BioXSD); the semantic annotation format (SAWSDL model references); and the ontology of bioinformatics-specific computational methods, data types and resources (EDAM). The common data format (BioXSD) is essential for increasing the interoperability of programmatic access and for constructing workflows. The common vocabulary of meanings (EDAM) is essential for discovering and matching services and for further semantic reasoning.
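A SAWSDL model reference is an attribute on a WSDL or XML Schema component whose value is a URI pointing to an ontology concept. The sketch below builds such an annotated `xs:element` with Python's `xml.etree.ElementTree`. The element name, the type name `tns:SequenceRecordType` and the EDAM URI (assumed here to denote the EDAM "Sequence" data concept; check it against the current EDAM release) are illustrative assumptions. Only the XML Schema and SAWSDL namespace URIs are the standard ones.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
SAWSDL_NS = "http://www.w3.org/ns/sawsdl"

# Register prefixes so serialization uses xs: and sawsdl: instead of ns0:, ns1:
ET.register_namespace("xs", XSD_NS)
ET.register_namespace("sawsdl", SAWSDL_NS)


def annotated_element(name: str, type_name: str, concept_uri: str) -> ET.Element:
    """Return an xs:element carrying a sawsdl:modelReference to an ontology concept."""
    el = ET.Element(f"{{{XSD_NS}}}element", {"name": name, "type": type_name})
    el.set(f"{{{SAWSDL_NS}}}modelReference", concept_uri)
    return el


# Hypothetical element and type names; assumed EDAM concept URI.
annotated = annotated_element(
    "sequenceRecord",
    "tns:SequenceRecordType",
    "http://edamontology.org/data_2044",
)
```

The data format (the `type`) and the meaning (the `modelReference`) are kept as separate attributes, mirroring the split between BioXSD and EDAM in the strategy.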
Optional elements for sequence metadata
| Element | Description |
|---|---|
|  | Identifies the biological source of the sequence: typically an organism, but possibly also a sample, tissue, cell line, individual, conditions or a geographic location. The generic species type may formally refer to any taxonomy and supports meta- and individual genomics data |
|  | A name to identify the sequence for a human user |
|  | A textual note for a human user, if necessary (not to be parsed) |
|  | Identifies the data source of the sequence: typically a public or private database or data set. Can contain an accession, database identification, provenance data (version, date), isoform. Can also identify a position of the sequence in a super-sequence or a genome. May include an explicit super-sequence, necessary in special cases |
|  | Element to hold data for forward or backward translation, if necessary. May identify, for example, a genetic code and the translational phase of an incomplete coding sequence |
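The optional metadata in the table can be sketched as a simple Python data structure in which absent elements are left out of the serialized record. This is a simplification under stated assumptions: the tag names `species`, `name`, `note`, `reference` and `translation`, and their order, are hypothetical, and every field is a plain string, whereas the table indicates that the biological-source and data-source elements carry structured content (taxonomy, accession, provenance, position).

```python
from dataclasses import dataclass
from typing import Optional
import xml.etree.ElementTree as ET

# Hypothetical optional metadata tags, emitted in this (assumed) order.
OPTIONAL_TAGS = ("species", "name", "note", "reference", "translation")


@dataclass
class SequenceRecord:
    sequence: str                      # the sequence itself (always present)
    species: Optional[str] = None      # biological source: organism, sample, tissue, ...
    name: Optional[str] = None         # name for a human user
    note: Optional[str] = None         # free-text note, not meant to be parsed
    reference: Optional[str] = None    # data source, e.g. database and accession
    translation: Optional[str] = None  # e.g. genetic code and translational phase

    def to_xml(self) -> str:
        root = ET.Element("sequenceRecord")
        ET.SubElement(root, "sequence").text = self.sequence
        for tag in OPTIONAL_TAGS:
            value = getattr(self, tag)
            if value is not None:  # absent optional elements are simply omitted
                ET.SubElement(root, tag).text = value
        return ET.tostring(root, encoding="unicode")
```

For example, `SequenceRecord("ATGGCC", name="demo").to_xml()` yields `<sequenceRecord><sequence>ATGGCC</sequence><name>demo</name></sequenceRecord>`.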
Fig. 3. BioXSD format of a sequence record, including the sequence and optional metadata. (A) Diagram showing the structure of the sequence record (the example type is specialized for nucleotides). (B) Restricting pattern of the sequence string (the example is a general amino-acid sequence type allowing ambiguous and additional residues: Pyl and Sec). (C) Example of a simple sequence record in BioXSD. (D) Example of a BioXSD sequence record with more metadata. The figure highlights which metadata elements are textual and intended purely for human readers, and which are formally structured, allowing more automated use by computer applications.
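The restricting pattern in panel (B) can be approximated with a regular expression. The sketch below assumes an alphabet of the 20 standard one-letter amino-acid codes, the ambiguity codes B, Z, J and X, plus U (Sec) and O (Pyl); this alphabet is an assumption, not BioXSD's actual pattern. `re.fullmatch` is used because an XML Schema `xs:pattern` facet always matches the entire value.

```python
import re

STANDARD = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
AMBIGUOUS = "BZJX"                 # Asx, Glx, Xle, any residue
ADDITIONAL = "UO"                  # U = Sec (selenocysteine), O = Pyl (pyrrolysine)

# The same character class is also valid XML Schema regex syntax,
# so it could serve as the value of an xs:pattern facet.
AA_CLASS = f"[{STANDARD}{AMBIGUOUS}{ADDITIONAL}]+"
AA_PATTERN = re.compile(AA_CLASS)


def is_valid_aa_sequence(seq: str) -> bool:
    # fullmatch mirrors xs:pattern, which is implicitly anchored at both ends
    return AA_PATTERN.fullmatch(seq) is not None
```

Restricting the alphabet at the schema level lets a validating parser reject malformed sequences before any service-specific code runs.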
BioXSD-compatible web services at the time of article submission
| Service | Provider | Function |
|---|---|---|
| BLAST | IBCP, France | Similarity search (Altschul et al.) |
| ClustalW | IBCP, France | Multiple sequence alignment (Thompson et al.) |
| GorIV | IBCP, France | Prediction of secondary structure of proteins (Garnier et al.) |
| BLAST | BCCS, Norway | Similarity search |
| MaxAlign | CBS, Denmark | Optimization of multiple sequence alignment (Gouveia-Oliveira et al.) |
| ProP | CBS, Denmark | Prediction of pro-peptide cleavage sites (Duckert et al.) |
| NetNES | CBS, Denmark | Prediction of nuclear export signals (la Cour et al.) |
Updated list with links to service descriptions is available at http://bioxsd.org.
Fig. 4. An example bioinformatics workflow (analysis pipeline). Blue rectangles are web-service calls; red ovals are data. The common, standardized BioXSD format of the exchanged data ensures that no additional parsing or transformation of the data is necessary between the service calls. Such web services are smoothly interoperable, allowing users to combine them without substantial effort.
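The property described for Fig. 4 can be sketched as function composition: when every step accepts and returns the same record type, steps chain without adapters. The stub functions below are hypothetical local placeholders for remote web-service calls (they do not contact BLAST, ClustalW or any other service), and `Record` merely stands in for a shared exchange-format document.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a document in the shared exchange format."""
    sequences: Tuple[str, ...]
    steps_applied: Tuple[str, ...] = ()


# Every "service" maps a Record to a Record: same format in and out.
Service = Callable[[Record], Record]


def stub_similarity_search(r: Record) -> Record:
    # Placeholder for a similarity-search call: appends a pretend hit
    return Record(r.sequences + ("HIT_1",), r.steps_applied + ("similarity_search",))


def stub_alignment(r: Record) -> Record:
    # Placeholder for a multiple-alignment call
    return Record(r.sequences, r.steps_applied + ("alignment",))


def stub_structure_prediction(r: Record) -> Record:
    # Placeholder for a secondary-structure prediction call
    return Record(r.sequences, r.steps_applied + ("structure_prediction",))


def run_pipeline(steps: List[Service], data: Record) -> Record:
    # Feed each step's output straight into the next; no conversion code needed
    return reduce(lambda acc, step: step(acc), steps, data)


result = run_pipeline(
    [stub_similarity_search, stub_alignment, stub_structure_prediction],
    Record(("QUERY",)),
)
```

Because the steps share one signature, they can be reordered or swapped without writing new glue, which is the practical benefit the caption attributes to a common format.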