Toshiaki Katayama, Mark D Wilkinson, Rutger Vos, Takeshi Kawashima, Shuichi Kawashima, Mitsuteru Nakao, Yasunori Yamamoto, Hong-Woo Chun, Atsuko Yamaguchi, Shin Kawano, Jan Aerts, Kiyoko F Aoki-Kinoshita, Kazuharu Arakawa, Bruno Aranda, Raoul JP Bonnal, José M Fernández, Takatomo Fujisawa, Paul MK Gordon, Naohisa Goto, Syed Haider, Todd Harris, Takashi Hatakeyama, Isaac Ho, Masumi Itoh, Arek Kasprzyk, Nobuhiro Kido, Young-Joo Kim, Akira R Kinjo, Fumikazu Konishi, Yulia Kovarskaya, Greg von Kuster, Alberto Labarga, Vachiranee Limviphuvadh, Luke McCarthy, Yasukazu Nakamura, Yunsun Nam, Kozo Nishida, Kunihiro Nishimura, Tatsuya Nishizawa, Soichi Ogishima, Tom Oinn, Shinobu Okamoto, Shujiro Okuda, Keiichiro Ono, Kazuki Oshita, Keun-Joon Park, Nicholas Putnam, Martin Senger, Jessica Severin, Yasumasa Shigemoto, Hideaki Sugawara, James Taylor, Oswaldo Trelles, Chisato Yamasaki, Riu Yamashita, Noriyuki Satoh, Toshihisa Takagi.
Abstract
BACKGROUND: The interaction between biological researchers and the bioinformatics tools they use is still hampered by incomplete interoperability between such tools. To ensure interoperability initiatives are effectively deployed, end-user applications need to be aware of, and support, best practices and standards. Here, we report on an initiative in which software developers and genome biologists came together to explore and raise awareness of these issues: BioHackathon 2009.
Year: 2011 PMID: 21806842 PMCID: PMC3170566 DOI: 10.1186/2041-1480-2-4
Source DB: PubMed Journal: J Biomed Semantics
Figure 1. Attendees of the DBCLS BioHackathon 2009. The BioHackathon 2009 was attended by representatives from projects in Web services, Text Mining, Visualization and Workflow development, in addition to genome biologists who provided real-world use cases from their research.
Summary of technical problems and solutions for each use case
| Use Case 1 | Annotation of 100,000 invertebrate ESTs |
|---|---|
| Task | A researcher needs to annotate 100,000 sequences obtained from an invertebrate species and also needs to provide |
| Strategy | Annotate sequences by similarity and complement these annotations for sequences showing no similarity by integrated |
| Problem | Needed to identify which tool was most suitable for each step. Some tools turned out to require a very long time for |
| Solution | First, use relatively fast tools such as Blast2GO and KAAS, then use ANNOTATOR for a limited number of sequences. |
| Tools | Blast2GO, KAAS, ANNOTATOR, BioMart, TogoDB, TogoWS, jORCA, Taverna |
| Databases | Ensembl, BioMart, KEGG |
| Use Case 2 | |
| Task | Identify SNPs in transcription factor binding sites and visualize the result in a genome browser. |
| Strategy | Retrieve SNP and TSS datasets through the DAS protocol, then compute enrichment and export results for a DAS viewer. |
| Problem | Needed to integrate information from multiple databases and needed to customize the visualization. |
| Solution | Developed a custom-made prediction system for the data obtained from DAS sources, then customize the Ajax |
| Tools | BioDAS, Ajax DAS viewer |
| Databases | FESD II, DBTSS |
| Use Case 3 | |
| Task | Predict interacting pairs of proteins in a given metabolic pathway. |
| Strategy | Retrieve enzymes from a specified pathway and search pairs of homologous proteins forming complexes in a |
| Problem | Found version incompatibility between the server and client implementations of the SOAP protocol. Non-standard BLAST output |
| Solution | Switch programming languages according to the service in use. Programs are written to parse BLAST results and to |
| Tools | Java, OCaml, Perl, Ruby, BLAST, DDBJ WABI, PDBj Mine, KEGG API |
| Databases | DDBJ, KEGG, PDBj, UniProt |
| Use Case 4 | |
| Task | Find human diseases which are potentially related to SNPs and glycans. |
| Strategy | Retrieve disease genes and search for homologs in other organisms to which glyco-gene interactions are recorded, |
| Problem | No Web service existed to query GlycoEpitopeDB and to convert a glycan structure in IUPAC format into KCF format. |
| Solution | Implemented and registered BioMoby-compliant Web services. Wrote a custom BeanShell script for a Taverna workflow. |
| Tools | Taverna, BioMoby, KEGG API |
| Databases | OMIM, H-InvDB, GlycoEpitopeDB, RINGS, Consortium for Functional Glycomics, GlycomeDB, GlycoGene DataBase, KEGG |
Figure 2. Workflow to annotate large sets of ESTs. Sequences are first annotated using high-throughput systems (e.g. Blast2GO, KAAS). Remaining difficult-to-annotate sequences are subsequently passed through ANNOTATOR for deeper analysis. The combined sequences are then joined with related annotations in the remote Ensembl database using BioMart and exposed through TogoDB such that they can be consumed by workflow managers (e.g. jORCA or Taverna) as TogoWS services.
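The two-tier strategy in this caption (fast tools first, the slow tool only for what remains) can be sketched in Python as follows. This is an illustrative sketch only: `annotate_in_tiers`, `toy_fast`, and `toy_slow` are hypothetical names, and the toy annotators are stand-ins with made-up rules. They do not call Blast2GO, KAAS, or ANNOTATOR.

```python
def annotate_in_tiers(sequences, fast_annotators, slow_annotator):
    """Annotate with fast tools first; send only the leftovers to the slow tool.

    sequences: {seq_id: sequence string}
    fast_annotators: list of callables, each returning a label or None
    slow_annotator: callable returning a label or None
    """
    annotations = {}
    remaining = dict(sequences)
    for annotate in fast_annotators:
        # Iterate over a snapshot so entries can be removed while looping.
        for seq_id, seq in list(remaining.items()):
            label = annotate(seq)
            if label is not None:
                annotations[seq_id] = label
                del remaining[seq_id]
    # Only sequences that every fast tool missed reach the slow tool.
    for seq_id, seq in remaining.items():
        label = slow_annotator(seq)
        if label is not None:
            annotations[seq_id] = label
    unannotated = [seq_id for seq_id in sequences if seq_id not in annotations]
    return annotations, unannotated


# Toy stand-ins (hypothetical rules, not real tools).
slow_calls = []

def toy_fast(seq):
    return "fast-tier hit" if seq.startswith("ATG") else None

def toy_slow(seq):
    slow_calls.append(seq)  # record how often the slow tier is used
    return "slow-tier hit" if "GGG" in seq else None

ests = {"est1": "ATGCCC", "est2": "CCGGGA", "est3": "TTTTTT"}
annotations, unannotated = annotate_in_tiers(ests, [toy_fast], toy_slow)
```

In this toy run the slow annotator is called for two of the three sequences, which reflects the point of the design: the expensive step sees only the sequences the cheap steps could not handle.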
Figure 3. System to enrich TFBSs with differential expression data. Data on transcriptional start sites and on functional element SNPs are combined using Distributed Annotation System (DAS) protocol layers for the DBTSS and FESD II databases, respectively. Given a list of genes or proteins (e.g. gene expression data), enrichment can then be computed and exposed using a DAS viewer.
Figure 4. Workflow to analyze protein interactions among enzymes in a KEGG pathway. First, protein sequences are retrieved for each enzyme in a KEGG pathway. The sequences are then BLAST searched against UniProt and a phylogenetic profile is constructed of the results. Then, for each species in the phylogenetic profile, BLAST searches are run against PDB. Pairs of protein sequences (of the same species) that have homologs in the same PDB entry are inferred to be in physical contact and hence predicted to be interacting. Conserved and interacting proteins are then visualized on the pathway map, an example of which is shown in Figure 5.
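The inference rule in this caption (two proteins of the same species are predicted to interact if both have a homolog in the same PDB entry) can be sketched in Python. This is a minimal sketch under stated assumptions: the BLAST results are assumed to be already parsed into a dictionary, the function name `predict_interacting_pairs` is hypothetical, and the species codes, protein IDs, and entry IDs below are placeholders rather than real data. It also ignores chain-level detail within a PDB entry.

```python
from itertools import combinations


def predict_interacting_pairs(pdb_hits):
    """Predict interacting protein pairs per species.

    pdb_hits: {species: {protein_id: set of PDB entry IDs with a BLAST hit}}
    Returns {species: [(protein_a, protein_b, [shared entry IDs]), ...]}.
    """
    predictions = {}
    for species, hits in pdb_hits.items():
        pairs = []
        # Compare every unordered pair of proteins within one species.
        for a, b in combinations(sorted(hits), 2):
            shared = hits[a] & hits[b]
            if shared:
                pairs.append((a, b, sorted(shared)))
        predictions[species] = pairs
    return predictions


# Placeholder input (hypothetical identifiers, not real BLAST output).
example_hits = {
    "speciesA": {
        "protA1": {"entry1", "entry2"},
        "protA2": {"entry1"},
        "protA3": {"entry3"},
    },
    "speciesB": {
        "protB1": {"entry4"},
        "protB2": {"entry5"},
    },
}
result = predict_interacting_pairs(example_hits)
```

Pairs are only formed within a species, matching the caption's restriction to protein sequences of the same species.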
Figure 5. Evolutionary conservation rate of proteins on a KEGG pathway. Evolutionary conservation rate is defined as the ratio of the number of conserved proteins, i.e. homologs, over the number of species. Conservation rate is color-coded for each node in the pathway (see legend). See text and Figure 4 for more details.
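One possible reading of this definition, sketched in Python: for each pathway node, count the species in which a homolog was found and divide by the number of species considered. That per-node interpretation is an assumption, as are the function name `conservation_rate` and the placeholder identifiers.

```python
def conservation_rate(profile, species):
    """Per-protein conservation rate from a phylogenetic profile.

    profile: {protein_id: set of species in which a homolog was found}
    species: list of all species considered
    """
    considered = set(species)
    if not considered:
        raise ValueError("at least one species is required")
    return {
        protein: len(found & considered) / len(considered)
        for protein, found in profile.items()
    }


# Placeholder profile (hypothetical identifiers).
species = ["sp1", "sp2", "sp3", "sp4"]
profile = {
    "enzA": {"sp1", "sp2", "sp3", "sp4"},
    "enzB": {"sp1", "sp3"},
    "enzC": set(),
}
rates = conservation_rate(profile, species)
```

Under this reading, a rate of 1.0 means a homolog was found in every species considered, and 0.0 means none was found.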
Figure 6. Workflow for analyzing glyco-gene-related diseases. In the first step of this workflow, GlycoEpitope DB entries are searched for disease-related keywords by a newly developed BioMoby service called getGlycoEpitopeIDfromKeyword. The identifiers of matching entries are then used to retrieve glycan structures in IUPAC format by another newly developed BioMoby service called getIUPACfromGlycoEpitopeID. The resulting IUPAC glycans are subsequently converted to KCF format by a new RINGS service called getKCFfromIUPAC. The KCF glycans can then be used for querying other RINGS data mining services.
Figure 7. Connectivity and compatibility of participating projects. The BioHackathon 2009 was attended by participants representing projects operating in a number of problem domains (shown in Figure 1). Analysis of these participating projects during the hackathon revealed compatibilities and resulting connectivity as shown here.