Toshiaki Katayama, Kazuharu Arakawa, Mitsuteru Nakao, Keiichiro Ono, Kiyoko F Aoki-Kinoshita, Yasunori Yamamoto, Atsuko Yamaguchi, Shuichi Kawashima, Hong-Woo Chun, Jan Aerts, Bruno Aranda, Lord Hendrix Barboza, Raoul JP Bonnal, Richard Bruskiewich, Jan C Bryne, José M Fernández, Akira Funahashi, Paul MK Gordon, Naohisa Goto, Andreas Groscurth, Alex Gutteridge, Richard Holland, Yoshinobu Kano, Edward A Kawas, Arnaud Kerhornou, Eri Kibukawa, Akira R Kinjo, Michael Kuhn, Hilmar Lapp, Heikki Lehvaslaiho, Hiroyuki Nakamura, Yasukazu Nakamura, Tatsuya Nishizawa, Chikashi Nobata, Tamotsu Noguchi, Thomas M Oinn, Shinobu Okamoto, Stuart Owen, Evangelos Pafilis, Matthew Pocock, Pjotr Prins, René Ranzinger, Florian Reisinger, Lukasz Salwinski, Mark Schreiber, Martin Senger, Yasumasa Shigemoto, Daron M Standley, Hideaki Sugawara, Toshiyuki Tashiro, Oswaldo Trelles, Rutger A Vos, Mark D Wilkinson, William York, Christian M Zmasek, Kiyoshi Asai, Toshihisa Takagi.
Abstract
Web services have become a key technology for bioinformatics, since life science databases are globally decentralized and the exponential increase in the amount of available data demands efficient systems that do not require transferring entire databases for every step of an analysis. However, various incompatibilities among database resources and analysis services make it difficult to connect and integrate these into interoperable workflows. To resolve this situation, we invited domain specialists from web service providers, client software developers, Open Bio* projects, the BioMoby project and researchers of emerging areas where a standard data exchange format is not well established, to an intensive collaboration entitled the BioHackathon 2008. The meeting was hosted by the Database Center for Life Science (DBCLS) and the Computational Biology Research Center (CBRC) and was held in Tokyo from February 11th to 15th, 2008. In this report we highlight the work accomplished and the common issues that arose from this event, including the standardization of data exchange formats and services in the emerging fields of glycoinformatics, biological interaction networks, text mining, and phyloinformatics. In addition, we discuss the development of common shared objects based on BioSQL, as well as technical challenges in large data management, asynchronous services, and security. As a result, we improved the interoperability of web services in several fields; however, further cooperation among major database centers and continued collaborative efforts between service providers and software developers are still necessary for an effective advance in bioinformatics web service technologies.
Year: 2010 PMID: 20727200 PMCID: PMC2939597 DOI: 10.1186/2041-1480-1-8
Source DB: PubMed Journal: J Biomed Semantics
Required metadata for service description and discovery.
| Metadata element or ontology | Notes |
|---|---|
| author contact | |
| authority identification | |
| service version | |
| software title or nature of algorithm (myGrid Task ontology) | |
| software version | |
| bandwidth and/or number of requests per minute | |
| example input | |
| example output and/or REGEXP to test output | |
| some description of error-handling capacity | |
| sync/async | |
| nature of underlying data | |
| organism | |
| biological nature of data (DNA/RNA/Protein, experimental methods or platform) | |
| input parameters and purpose of each | |
| output parameters and purpose of each | |
| usage/license restrictions | |
| authentication (whether required or not) | |
| usage statistics (as per service provider) | |
| usage statistics (as per third party commentary) | |
| protocol (Moby, SOAP, REST, GET, POST, etc.) | |
| mirror servers | |
| myGrid Ontology | provides many of the annotation information elements listed above |
| Moby Object | provides an ontology of data-types |
| Moby Service | similar to myGrid's bioinformatics_task branch of the myGrid Ontology |
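The metadata list above can be read as the fields of a service description record. The sketch below, assuming Python 3.9+, models a subset of those fields as a hypothetical `ServiceDescription` dataclass (all field and function names are illustrative, not taken from the myGrid or Moby ontologies) and shows how the "example input" and "REGEXP to test output" entries could be combined into a simple automated check. A local function stands in for a remote service.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ServiceDescription:
    """Hypothetical record holding a subset of the metadata listed above."""
    author_contact: str
    authority: str
    service_version: str
    software_title: str
    protocol: str             # e.g. "SOAP", "REST", "Moby"
    asynchronous: bool        # the sync/async entry
    example_input: str
    output_regexp: str        # pattern the example output must fully match
    requires_authentication: bool = False
    mirrors: List[str] = field(default_factory=list)


def output_matches(desc: ServiceDescription, invoke: Callable[[str], str]) -> bool:
    """Run the service on its example input and test the whole output against the regexp."""
    output = invoke(desc.example_input)
    return re.fullmatch(desc.output_regexp, output) is not None


# Hypothetical stand-in for a remote service: DNA reverse complement.
def toy_revcomp(seq: str) -> str:
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq))


desc = ServiceDescription(
    author_contact="maintainer@example.org",   # placeholder values
    authority="example.org",
    service_version="1.0",
    software_title="reverse complement",
    protocol="REST",
    asynchronous=False,
    example_input="ATGC",
    output_regexp=r"[ACGT]+",
)
print(output_matches(desc, toy_revcomp))  # True
```

Using `re.fullmatch` rather than `re.search` means an output that merely contains a valid fragment (for example, an error page that happens to include "ACGT") does not pass the check.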
Applications for bioinformatics web services (o = yes, - = no).
| Project | Description | GUI | Open source | Programming Language |
|---|---|---|---|---|
| BioMoby/MobyCentral | Framework/repository of interoperable web services | - | o | Perl/Java |
| Taverna | Workflow construction tool to connect web services in a pipeline | o | o | Java (BeanShell script to extend) |
| Seahawk | Graphical interface to invoke appropriate BioMoby services | o | o | Java |
| MOWserv | Web application to handle BioMoby services in the grid environment | o | - | - |
| G-language GAE | Command line shell to access BioMoby and other web services | - | o | Perl |
| Open Bio* | Libraries including support for bioinformatics web services | - | o | Perl/Python/Ruby/Java |
Input and output data types relevant for phyloinformatic web services.
| Data type | Description |
|---|---|
| One Tree | exactly one tree, which might function as a query topology, as an input for topology metric calculations, or as something for which associated data (matrices) and metadata might be retrieved |
| Pair of Trees | exactly two trees, for tree reconciliation (e.g. duplication inference) or for tree-to-tree distance calculations |
| Set of Trees | input for consensus calculations, or as query topologies |
| One OTU | exactly one OTU for which associated data (trees or matrices that contain it) and metadata might be retrieved |
| Pair of OTUs | exactly two OTUs, as input for topological queries (MRCA) and calculations (patristic distance) |
| Set of OTUs | input for topological queries (MRCA), or a set for which the trees or matrices that contain them, and associated metadata, might be retrieved |
| One Node | input for tree traversal operations (parent, children) and for which metadata might be retrieved |
| Pair of Nodes | input for topological queries (MRCA) and calculations (patristic distance) |
| Set of Nodes | input for topological queries (MRCA) |
| One Character | exactly one character (matrix column) for which calculations are performed (variability) and metadata is retrieved |
| Set of Characters | input as filter predicate, to retrieve OTUs that contain recorded states for the characters |
| One Character State Sequence | input for which metadata is retrieved |
| Pair of Character State Sequences | input for pairwise alignments, as input to calculate pairwise divergence |
| Set of Character State Sequences | input for multiple sequence alignment |
| Character State Matrix | input for inference (of one tree or set of trees), for calculations (average sequence divergence) and metadata retrieval |
| Int | an integer, for things such as topology metrics (node counts), tree-to-tree distances (in branch moves), node distances (in number of nodes in between), character state counts, and sequence divergence (substitution counts, site counts) |
| Float | a floating-point value, for topology metrics (balance, stemminess, resolution), tree-to-tree distances (symmetric difference), patristic distance, and sequence divergence |
| String | for metadata, e.g. descriptions |
| Stringvector | for metadata, e.g. a set of tags |
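Several of the data types above (Pair of OTUs, Pair of Nodes) feed MRCA queries and patristic distance calculations. A minimal sketch of both operations, assuming a tree stored as a hypothetical parent map in which each node records its parent and the length of the branch leading to it: the MRCA is the first node on one node's path to the root that is also an ancestor of the other node, and the patristic distance is the sum of branch lengths from each node up to that MRCA.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: node -> (parent, branch length to parent).
# The root has parent None.
Tree = Dict[str, Tuple[Optional[str], float]]


def path_to_root(tree: Tree, node: str) -> List[str]:
    """List the node itself followed by each ancestor up to the root."""
    path = [node]
    while tree[node][0] is not None:
        node = tree[node][0]
        path.append(node)
    return path


def mrca(tree: Tree, a: str, b: str) -> str:
    """Most recent common ancestor: first node on b's root path that is also on a's."""
    ancestors_of_a = set(path_to_root(tree, a))
    for node in path_to_root(tree, b):
        if node in ancestors_of_a:
            return node
    raise ValueError("nodes are not in the same tree")


def distance_to_ancestor(tree: Tree, node: str, ancestor: str) -> float:
    """Sum branch lengths while walking from node up to the given ancestor."""
    total = 0.0
    while node != ancestor:
        parent, length = tree[node]
        total += length
        node = parent
    return total


def patristic_distance(tree: Tree, a: str, b: str) -> float:
    """Path length between two nodes through their MRCA."""
    ancestor = mrca(tree, a, b)
    return distance_to_ancestor(tree, a, ancestor) + distance_to_ancestor(tree, b, ancestor)


# The tree ((A:1,B:2)n1:0.5,C:3)root as a parent map.
tree: Tree = {
    "root": (None, 0.0),
    "n1": ("root", 0.5),
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "C": ("root", 3.0),
}
print(mrca(tree, "A", "B"))                # n1
print(patristic_distance(tree, "A", "C"))  # 4.5
```

A parent map keeps the "parent" traversal of the One Node type trivial; a "children" query would need either a scan of the map or a second, inverted index.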
Standardization of data exchange formats and web services.
| Domain | Format | Service | Relevant technologies |
|---|---|---|---|
| Interaction Network | PSI-MI | PSICQUIC | DIP, STRING, STITCH, IntAct, Cytoscape, CellDesigner |
| Glycoinformatics | GlycomicsObject | BioMoby | GLYDE-II, LINUCS, KEGG GLYCAN (KCF), RINGS |
| Phyloinformatics | phyloXML, NeXML | PhyloWS | CIPRES, Kepler, BioPerl (Bio::Phylo), NEXUS |
| Text-mining | U-Compare type system | U-Compare | UIMA, Whatizit, TerMine, iHOP, Allie |
| BioSQL | BioSQL schema | - | BioPerl, BioRuby, Biopython, BioJava |
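BioSQL lets the Open Bio* libraries listed above share one relational store, so an entry written by one toolkit can be read by another. The sketch below uses Python's built-in `sqlite3` module and a heavily abridged subset of three BioSQL tables (`biodatabase`, `bioentry`, `biosequence`). The column selection and the helper functions `store_sequence` and `fetch_sequence` are simplifications for illustration, not the official BioSQL DDL or any library's API; the accession used is a placeholder.

```python
import sqlite3
from typing import Optional

# Heavily abridged subset of the BioSQL schema; the real DDL has many more
# tables (taxon, seqfeature, ontology terms, ...), columns and constraints.
SCHEMA = """
CREATE TABLE biodatabase (
    biodatabase_id INTEGER PRIMARY KEY,
    name           TEXT NOT NULL UNIQUE,
    authority      TEXT,
    description    TEXT
);
CREATE TABLE bioentry (
    bioentry_id    INTEGER PRIMARY KEY,
    biodatabase_id INTEGER NOT NULL REFERENCES biodatabase(biodatabase_id),
    name           TEXT NOT NULL,
    accession      TEXT NOT NULL,
    version        INTEGER NOT NULL DEFAULT 0,
    description    TEXT
);
CREATE TABLE biosequence (
    bioentry_id INTEGER PRIMARY KEY REFERENCES bioentry(bioentry_id),
    alphabet    TEXT,
    length      INTEGER,
    seq         TEXT
);
"""


def store_sequence(conn: sqlite3.Connection, db_name: str, accession: str,
                   name: str, alphabet: str, seq: str) -> int:
    """Hypothetical writer: create the namespace if needed, then the entry and its sequence."""
    cur = conn.cursor()
    cur.execute("INSERT OR IGNORE INTO biodatabase (name) VALUES (?)", (db_name,))
    db_id = cur.execute(
        "SELECT biodatabase_id FROM biodatabase WHERE name = ?", (db_name,)
    ).fetchone()[0]
    cur.execute(
        "INSERT INTO bioentry (biodatabase_id, name, accession) VALUES (?, ?, ?)",
        (db_id, name, accession),
    )
    entry_id = cur.lastrowid
    cur.execute(
        "INSERT INTO biosequence (bioentry_id, alphabet, length, seq) VALUES (?, ?, ?, ?)",
        (entry_id, alphabet, len(seq), seq),
    )
    conn.commit()
    return entry_id


def fetch_sequence(conn: sqlite3.Connection, accession: str) -> Optional[str]:
    """Hypothetical reader: look up a sequence by accession, as another toolkit might."""
    row = conn.execute(
        "SELECT s.seq FROM bioentry e "
        "JOIN biosequence s ON s.bioentry_id = e.bioentry_id "
        "WHERE e.accession = ?",
        (accession,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
store_sequence(conn, "demo", "X00001", "toy_entry", "dna", "ATGCATGC")
print(fetch_sequence(conn, "X00001"))  # ATGCATGC
```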
Figure 1. Screenshot of a Taverna workflow, constructed as a case study, that pipelines Japanese web services (DDBJ, PDBj and KEGG) to annotate a protein sequence by homology and structure. Green boxes indicate the actual web services, and beige and purple boxes are local BeanShell scripts and Java shims that function as glue code connecting the web services.
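The shims in Figure 1 reshape the output of one service into the input expected by the next. The two hypothetical Python functions below illustrate that kind of glue code under stated assumptions: one extracts the residues of the first record from FASTA text, and the other picks the subject identifier with the lowest e-value from BLAST-style tab-delimited hits (assuming the common 12-column layout in which column 2 is the subject ID and column 11 the e-value). The actual formats exchanged between the DDBJ, PDBj and KEGG services in the case study are not specified here.

```python
def fasta_to_sequence(fasta_text: str) -> str:
    """Hypothetical shim: return the residues of the first FASTA record, without header or line breaks."""
    lines = fasta_text.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("input is not FASTA")
    residues = []
    for line in lines[1:]:
        if line.startswith(">"):  # stop at the next record
            break
        residues.append(line.strip())
    return "".join(residues)


def best_hit(tabular_hits: str) -> str:
    """Hypothetical shim: return the subject ID with the lowest e-value.

    Assumes BLAST-style 12-column tab-delimited rows:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best_id, best_evalue = None, float("inf")
    for line in tabular_hits.splitlines():
        if not line.strip() or line.startswith("#"):  # skip blanks and comment lines
            continue
        fields = line.split("\t")
        subject, evalue = fields[1], float(fields[10])
        if evalue < best_evalue:
            best_id, best_evalue = subject, evalue
    if best_id is None:
        raise ValueError("no hits")
    return best_id


print(fasta_to_sequence(">query1 example\nMKT\nAYI\n"))  # MKTAYI
```

Keeping each shim a small pure function of text in, text out mirrors how local script boxes in a workflow can be tested and swapped independently of the remote services they connect.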