WebLab: a data-centric, knowledge-sharing bioinformatic platform.

Xiaoqiao Liu, Jianmin Wu, Jun Wang, Xiaochuan Liu, Shuqi Zhao, Zhe Li, Lei Kong, Xiaocheng Gu, Jingchu Luo, Ge Gao.

Abstract

With the rapid progress of biological research, there is a growing demand for integrative knowledge-sharing systems that efficiently support collaboration among biologists from various fields. To meet this need, we have developed WebLab, a data-centric knowledge-sharing platform that lets biologists fetch, analyze, manipulate and share data through an intuitive web interface. Dedicated space is provided for users to store their input data and analysis results. Users can upload local data or fetch public data from remote databases, and then analyze it with more than 260 integrated bioinformatic tools. These tools can be further organized into customized analysis workflows that accomplish complex tasks automatically. In addition to conventional biological data, WebLab provides rich support for scientific literature, for example full-text search of uploaded articles and export of citations to well-known citation managers such as EndNote and BibTeX. To facilitate team work among colleagues, WebLab provides a powerful and flexible sharing mechanism that allows users to share input data, analysis results, scientific literature and customized workflows with specified users or groups under sophisticated privilege settings. WebLab is publicly available at http://weblab.cbi.pku.edu.cn, with all source code released as Free Software.

Year: 2009; PMID: 19465388; PMCID: PMC2703900; DOI: 10.1093/nar/gkp428

Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971

INTRODUCTION

To explore the mechanisms underlying complex biological processes, current biological research increasingly relies on high-throughput analysis techniques and multidisciplinary approaches. This rapid growth places great demands on integrative bioinformatic workbenches that help researchers mine knowledge from complex, heterogeneous data. Several bioinformatic analysis systems with intuitive user interfaces have been implemented in recent years (1–13). Some of them are designed as wrappers for a few specific software packages, while others support a broader range of popular bioinformatic analysis tools. Several systems, including Taverna (3–5), BioManager (6), Galaxy (7,8), PISE (9), MOWServ (11) and HNB (13), support workflow-based analysis to make complex analysis much easier for non-experts. Moreover, Taverna (3–5), BioManager (6), Galaxy (7,8), PISE (9) and WildFire (10) also allow users to create their own workflows, increasing flexibility and customizability. On the other hand, although the importance of team work for research success is widely recognized (3,14–16), few existing systems provide adequate support for collaborative team work. Some systems allow users to store their input data and analysis results online (1,2,6–8,11–13), and BioManager (6) and Galaxy (7,8) also allow users to share their stored data and workflows. In addition, some ‘Web 2.0’ websites let researchers upload and share their annotation information and workflows online (14,16–18). However, to the best of our knowledge, no publicly available bioinformatic analysis platform yet offers comprehensive support for data management, analysis and sharing in a single web-based integrative environment. Here, we present WebLab, a data-centric knowledge-sharing platform that helps biological researchers efficiently manage, analyze and share their data in an easy-to-use integrative environment.
As a data-centric platform, WebLab provides dedicated user space to store and manage input data, analysis results and scientific literature online. For literature, it supports full-text search, retrieval of citation information from PubMed, and export of citations to EndNote and BibTeX. These features are missing from other existing systems. By assembling customized workflows from more than 260 integrated bioinformatic tools, users can perform complex analysis tasks automatically. To facilitate team work, WebLab provides a powerful and flexible sharing mechanism and group strategy: users can share their data, literature and customized workflows with specific users or user groups under sophisticated privilege settings. WebLab is publicly accessible at http://weblab.cbi.pku.edu.cn, with all source code freely available for download.

DESIGN AND SYSTEM ARCHITECTURE

To allow for further extension and development, WebLab has a modular design built around three main modules: data management, analysis service and team work (Figure 1).

Figure 1.

Overview of WebLab architecture. WebLab comprises three main functional modules. (i) The data management module (red frame) maintains the user data space (My Data and My Literature) and provides access to remote databases through BioMart and SRS. (ii) The analysis service module (blue frame) provides a uniform framework that integrates more than 260 popular bioinformatic analysis tools (including command-line programs, Web services and Grid services), and supports workflows in both the Protocol and Macro models. (iii) The team work module (yellow frame) keeps track of the data, literature and workflows that users share. Virtual research groups (VRGs) are designed to help collaborators share their work easily.

Data management

As a data-centric platform, WebLab provides a powerful data management system for users to store and manage their data and scientific literature online. In their own data space (‘My Data’), users can create a new entry in two ways: by uploading a file from a local disk, or by retrieving data from remote databases through BioMart (19) and SRS (20). Once the data type of a new entry is specified, WebLab recognizes its format. It then determines automatically, in a context-sensitive manner, which analysis tools can be applied to the entry. Entries in My Data are presented in a hierarchical tree structure, like a local file system. Users can create, rename and delete files and directories (folders) with simple mouse clicks. Users can also attach user-defined labels (Tags) or comments to entries in My Data, which lets them classify and organize entries in flexible and intuitive ways (Figure 2).
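The following minimal Python sketch makes the My Data model concrete, assuming a simple in-memory tree. It shows three things:

- entries nested like a file system;
- a tag view that groups entries by Tag regardless of folder;
- a context-sensitive lookup of applicable tools based on an entry's data type.

The `Entry` class, the `TOOLS_BY_TYPE` registry and its type-to-tool mapping are hypothetical illustrations. They are not WebLab's code; WebLab itself is written in Java.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from typing import Iterator

# Hypothetical registry: which tools accept which data type.
TOOLS_BY_TYPE: dict[str, list[str]] = {
    "dna_sequence": ["transeq", "blastn", "needle"],
    "protein_sequence": ["blastp", "pepstats"],
    "alignment": ["jalview"],
}


@dataclass
class Entry:
    """A file or folder in a user's My Data tree (hypothetical model)."""
    name: str
    data_type: str | None = None  # None for folders
    tags: set[str] = field(default_factory=set)
    comment: str = ""
    children: list[Entry] = field(default_factory=list)

    def add(self, child: Entry) -> Entry:
        self.children.append(child)
        return child

    def walk(self) -> Iterator[Entry]:
        """Yield this entry and all of its descendants, depth-first."""
        yield self
        for child in self.children:
            yield from child.walk()


def available_tools(entry: Entry) -> list[str]:
    """Context-sensitive lookup: the tools that accept this entry's data type."""
    return TOOLS_BY_TYPE.get(entry.data_type or "", [])


def tag_view(root: Entry) -> dict[str, list[str]]:
    """Group entry names by Tag, independent of where they sit in the tree."""
    view: dict[str, list[str]] = defaultdict(list)
    for entry in root.walk():
        for tag in sorted(entry.tags):
            view[tag].append(entry.name)
    return dict(view)
```

Consider a sequence file tagged `project-x` inside a `sequences` folder, and an alignment with the same Tag at the top level. Both appear under `project-x` in the tag view, while the folder tree itself is left unchanged.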

Figure 2.

My Data: a user-dedicated space to store and manage input data and analysis results. Entries in My Data are presented in a hierarchical tree structure, as in a local file system. (A) WebLab provides an option to add user-defined labels (Tags) to each item in My Data as annotations; entries can later be classified and organized by their Tags in the ‘tag view’. (B) Users can create a comment for each entry. The comment's content is displayed when the mouse pointer hovers over the yellow icon. Comments are shared along with the data entries.

In addition to the conventional operations supported in My Data, ‘My Literature’ provides rich literature-specific functions. After an article is uploaded, WebLab automatically generates an HTML preview. This lets users check the paper's content quickly in the browser without downloading the whole article. WebLab then extracts and indexes the full text of uploaded articles. Once indexing is complete, users can run simple keyword searches or complex queries against the full text of the articles in My Literature. Citation information can also be fetched from NCBI by PubMed ID or title and associated with the corresponding article in My Literature. All citation information can be easily exported to well-known citation managers such as EndNote and BibTeX (Figure 3).
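Exporting citations amounts to serializing one citation record into each target format. The sketch below renders a record in two formats: a BibTeX `@article` entry and an RIS journal record. It is a hypothetical illustration, not WebLab's exporter. The `Citation` fields and the surname-plus-year citation key are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Citation:
    """Hypothetical citation record, e.g. as fetched from PubMed."""
    pmid: str           # lookup key used to fetch the record; not exported here
    authors: list[str]  # "Surname, Given" strings
    title: str
    journal: str
    year: int
    doi: str = ""


def to_bibtex(c: Citation) -> str:
    """Render the record as a BibTeX @article entry."""
    # Citation key: first author's surname followed by the year, e.g. "Liu2009".
    key = c.authors[0].split(",")[0].replace(" ", "") + str(c.year)
    fields = {
        "author": " and ".join(c.authors),  # BibTeX separates names with "and"
        "title": c.title,
        "journal": c.journal,
        "year": str(c.year),
    }
    if c.doi:
        fields["doi"] = c.doi
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@article{{{key},\n{body}\n}}"


def to_ris(c: Citation) -> str:
    """Render the same record as an RIS journal-article record."""
    lines = ["TY  - JOUR"]
    lines += [f"AU  - {a}" for a in c.authors]  # one AU line per author
    lines += [f"TI  - {c.title}", f"JO  - {c.journal}", f"PY  - {c.year}"]
    if c.doi:
        lines.append(f"DO  - {c.doi}")
    lines.append("ER  - ")  # end-of-record marker
    return "\n".join(lines)
```

Applied to this article's own record (PMID 19465388), `to_bibtex` produces an entry keyed `Liu2009`. Adding another export format would mean adding one more renderer over the same record.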

Figure 3.

WebLab provides rich support for scientific literature. (A) Articles in PDF, Microsoft Word and plain-text formats can be stored and managed in My Literature. My Literature offers rich literature-specific functions in addition to the conventional operations of My Data. (B) WebLab provides an ‘HTML View’ for a quick check of the contents in the browser without downloading. (C, D) Users can fetch citation information from the NCBI PubMed repository by PubMed ID or title. (E) Citation information can be batch-exported in well-known bibliography formats such as EndNote, BibTeX, ADS reference, ISI, RIS and the Word 2007 bibliography format. (F, G) After the index is built, users can search all articles in My Literature. Matching articles are sorted by relevance score, and the search keywords are highlighted.
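Panels F and G describe ranked full-text search with keyword highlighting. WebLab delegates this task to Lucene. The toy inverted index below only illustrates the idea in plain Python. Its TF-IDF score is a simplification, not Lucene's actual scoring formula, and the class and method names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    return TOKEN.findall(text.lower())


class LiteratureIndex:
    """Toy inverted index; a simplified stand-in for a Lucene index."""

    def __init__(self) -> None:
        self.docs: dict[str, str] = {}
        self.postings: dict[str, Counter] = defaultdict(Counter)  # term -> {doc: tf}

    def add(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text
        for term in tokenize(text):
            self.postings[term][doc_id] += 1

    def search(self, query: str) -> list[tuple[str, float]]:
        """Rank documents by a simple TF-IDF sum over the query terms."""
        n = len(self.docs)
        scores: Counter = Counter()
        for term in set(tokenize(query)):
            hits = self.postings.get(term)  # .get() does not create empty postings
            if not hits:
                continue
            idf = math.log(1 + n / len(hits))
            for doc_id, tf in hits.items():
                scores[doc_id] += tf * idf
        # Highest score first; ties broken by document id for stable output.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

    def highlight(self, doc_id: str, query: str, mark: str = "**") -> str:
        """Wrap every whole-token, case-insensitive match of a query term."""
        terms = set(tokenize(query))
        if not terms:
            return self.docs[doc_id]
        alternatives = "|".join(map(re.escape, sorted(terms)))
        pattern = re.compile(rf"(?<![a-z0-9])({alternatives})(?![a-z0-9])", re.IGNORECASE)
        return pattern.sub(lambda m: f"{mark}{m.group(0)}{mark}", self.docs[doc_id])
```

The index is built once when articles are added, so each query only touches the postings for its own terms. This mirrors why WebLab lets users search only after indexing is done.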

Analysis service

As an integrative bioinformatic analysis platform, WebLab integrates numerous analysis tools within a uniform framework. In addition to command-line programs, Web services and Grid services are also integrated in WebLab with full interoperation (Table S1). By organizing different tools into a workflow (21), users can perform complex analysis tasks in a single run; the tools in a workflow are launched according to rules the user has defined beforehand. Two workflow models with different degrees of user interaction are currently available. In the Protocol model, a workflow is executed step by step, and the user can tune parameters or options at each step, which provides maximum flexibility. In the Macro model, by contrast, the user supplies the mandatory parameters up front, and the tools in the workflow are then executed sequentially without further interaction. Macro is therefore more suitable for routine analysis. Moreover, an existing Macro can be reused as a standard analysis tool when defining a new workflow (recursive definition), which further simplifies users’ daily work and increases flexibility. Besides defining their own workflows, users can also use several predefined workflows for common analysis tasks such as phylogenetic tree construction and protein function analysis (Figure 4).
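The sketch below illustrates the difference between the two workflow models. It assumes that every tool exposes the same `run(data, params)` interface, where `params` maps tool names to their options. `SimpleTool`, `Macro`, `run_protocol` and the toy string tools are hypothetical. A `Macro` has the same interface as a single tool, so it can appear as a step inside another `Macro`. That is the recursive definition described above.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SimpleTool:
    """A single analysis step wrapping a function (stand-in for a real program)."""
    name: str
    fn: Callable[[str, dict], str]

    def run(self, data: str, params: dict) -> str:
        # Each tool picks out its own options from the shared parameter mapping.
        return self.fn(data, params.get(self.name, {}))


@dataclass
class Macro:
    """Macro model: parameters are given up front and steps run back-to-back."""
    name: str
    steps: list = field(default_factory=list)  # SimpleTool or nested Macro objects

    def run(self, data: str, params: dict) -> str:
        for step in self.steps:
            data = step.run(data, params)
        return data


def run_protocol(steps: list, data: str, params: dict,
                 tune: Callable[[str, str, dict], dict]) -> str:
    """Protocol model: before each step, let the user inspect the intermediate
    data and return an adjusted parameter mapping."""
    for step in steps:
        params = tune(step.name, data, params)
        data = step.run(data, params)
    return data


# Toy tools operating on strings.
trim = SimpleTool("trim", lambda s, o: s[: o.get("n", len(s))])
upper = SimpleTool("upper", lambda s, o: s.upper())
rev = SimpleTool("reverse", lambda s, o: s[::-1])
clean = Macro("clean", [trim, upper])
pipeline = Macro("pipeline", [clean, rev])  # a Macro reused as a step
```

Running `pipeline.run("acgtacgt", {"trim": {"n": 4}})` applies all three tools without stopping. The same steps passed to `run_protocol` give the `tune` callback a chance to change the options before each one.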

Figure 4.

A workflow for protein function analysis. WebLab represents a workflow as a directed acyclic graph (DAG). Rectangular nodes represent a single tool or a defined Macro; diamond nodes represent operators that control workflow execution. WebLab supports two types of operators. A condition operator (for example, nodes 1 and 3) selects the subsequent analysis based on a given condition. For example, the workflow shown translates input sequences with transeq if they are DNA sequences. A parallel operator (for example, node 4) can distribute incoming data to several analysis tools running in parallel. It can also combine several incoming data streams into one sink.

A few popular client-side utilities are also integrated into WebLab: the Sequence Manipulation Suite (SMS) (22), WebMol (23), Dotlet (24) and Jalview (25). They let users perform interactive work such as editing multiple sequence alignments or visualizing structures. Because of their interactive nature, these utilities cannot be incorporated into workflows like other standard analysis tools, but they have proved useful in daily work. Users can also keep their favorite analysis tools in ‘My Toolbox’ for quick access.
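Executing a DAG-shaped workflow like the one in Figure 4 comes down to visiting nodes in topological order. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+). It makes a simplifying assumption about condition operators. Each one is modeled as a pair of gate nodes, and the gate for the branch not taken emits a `SKIP` sentinel. Any node whose inputs are all skipped is skipped too. The node names, the tiny codon table and the `translate` stand-in for transeq are hypothetical.

```python
from collections import Counter
from graphlib import TopologicalSorter
from typing import Callable

SKIP = object()  # sentinel: output of a branch that a condition did not take


def run_dag(preds: dict[str, list[str]],
            funcs: dict[str, Callable[[list], object]],
            source_data: object) -> dict[str, object]:
    """Run every node once, in topological order of the predecessor graph."""
    results: dict[str, object] = {}
    for node in TopologicalSorter(preds).static_order():
        parents = preds.get(node, [])
        live = [results[p] for p in parents if results[p] is not SKIP]
        if parents and not live:
            results[node] = SKIP  # every input was skipped, so skip this node too
        else:
            results[node] = funcs[node](live if parents else [source_data])
    return results


CODONS = {"ATG": "M", "GCT": "A", "AAA": "K", "TAA": "*"}  # tiny subset of the code


def is_dna(seq: str) -> bool:
    return set(seq.upper()) <= set("ACGT")


def translate(seq: str) -> str:
    """Toy stand-in for transeq: frame 1 only, unknown codons become 'X'."""
    return "".join(CODONS.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


PREDS = {
    "if_dna": ["input"], "if_protein": ["input"],   # condition operator (two gates)
    "transeq": ["if_dna"],
    "merge": ["transeq", "if_protein"],             # combine inputs into one sink
    "length": ["merge"], "composition": ["merge"],  # share data with parallel tools
}
FUNCS = {
    "input": lambda xs: xs[0],
    "if_dna": lambda xs: xs[0] if is_dna(xs[0]) else SKIP,
    "if_protein": lambda xs: SKIP if is_dna(xs[0]) else xs[0],
    "transeq": lambda xs: translate(xs[0]),
    "merge": lambda xs: xs[0],  # exactly one branch is live here
    "length": lambda xs: len(xs[0]),
    "composition": lambda xs: dict(Counter(xs[0])),
}
```

For a DNA input, the protein branch is skipped and the translated sequence flows on to both downstream tools. For a protein input, the translation step is skipped instead.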

Team work

Collaboration among researchers from various fields and locations is becoming increasingly common, and it is crucial for research success. To facilitate collaborative team work, WebLab provides a flexible sharing mechanism and group strategy that let users share their data and knowledge. In WebLab, users can share almost everything they own with other users. Entries in My Data and My Literature can be shared with either ‘read only’ or ‘read and write’ privileges. Sharing follows a reference-count-based model, so changes to shared content are seen by all collaborators simultaneously, which ensures efficient cooperation among partners. In contrast, a shared user-defined workflow is copied: the collaborator receives a copy that can be modified without altering the original. This prevents problems that could otherwise arise from the recursive definition of workflows (Figure 5).
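Here is a minimal sketch of the two sharing strategies, assuming an in-memory store:

- A shared data entry is one object referenced from several user spaces. It carries a reference count, and each holder has a per-user privilege.
- A shared workflow is deep-copied.

`Store`, `SharedEntry` and `share_workflow` are hypothetical names, not WebLab's API.

```python
import copy
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SharedEntry:
    """One stored object (a My Data or My Literature entry)."""
    content: str
    refcount: int = 0


class Store:
    """Hypothetical in-memory store of user spaces."""

    def __init__(self) -> None:
        # user -> {entry name: (shared object, privilege)}; privilege is "r" or "rw"
        self.spaces: dict[str, dict[str, tuple[SharedEntry, str]]] = defaultdict(dict)

    def add(self, user: str, name: str, content: str) -> None:
        self.spaces[user][name] = (SharedEntry(content, refcount=1), "rw")

    def share(self, owner: str, name: str, to_user: str, privilege: str = "r") -> None:
        """Data and literature: hand out another reference to the same object."""
        entry, _ = self.spaces[owner][name]
        entry.refcount += 1
        self.spaces[to_user][name] = (entry, privilege)

    def read(self, user: str, name: str) -> str:
        return self.spaces[user][name][0].content

    def write(self, user: str, name: str, content: str) -> None:
        entry, privilege = self.spaces[user][name]
        if privilege != "rw":
            raise PermissionError(f"{user} has read-only access to {name}")
        entry.content = content  # every holder of a reference sees the change

    def remove(self, user: str, name: str) -> bool:
        """Drop one reference; True means no user references the object any more."""
        entry, _ = self.spaces[user].pop(name)
        entry.refcount -= 1
        return entry.refcount == 0


def share_workflow(workflow: dict) -> dict:
    """Workflows: the collaborator receives an independent deep copy."""
    return copy.deepcopy(workflow)
```

Because the copy is independent, a collaborator's later edits to a shared workflow cannot reach back into the owner's definition, including any nested steps it contains.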

Figure 5.

Team work support in WebLab. (A) In WebLab, users can share almost everything they own with their colleagues. Analysis data and scientific literature can be shared with ‘read only’ or ‘read and write’ privileges. Workflows use a different sharing strategy to prevent problems caused by their recursive definition. Once a user-defined workflow is shared, collaborators receive a copy that can be edited without altering the original. (B) Through the group strategy provided by WebLab, users can create their own virtual research groups (VRGs) in the web environment. The group creator has the privilege to edit or delete the whole group, and any group member can leave the group freely.
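The group rules in panel B can be expressed as a small permission check. This is a hypothetical sketch, not WebLab's code. The text does not say what happens when the creator tries to quit, so the sketch simply assumes that the creator must delete the group instead.

```python
class VirtualResearchGroup:
    """Hypothetical VRG: the creator administers the group; members may leave."""

    def __init__(self, name: str, creator: str) -> None:
        self.name = name
        self.creator = creator
        self.members = {creator}
        self.deleted = False

    def _require_creator(self, user: str) -> None:
        if user != self.creator:
            raise PermissionError(f"only {self.creator} may modify group {self.name}")

    def add_member(self, user: str, new_member: str) -> None:
        self._require_creator(user)  # editing the group is reserved for the creator
        self.members.add(new_member)

    def rename(self, user: str, new_name: str) -> None:
        self._require_creator(user)
        self.name = new_name

    def delete(self, user: str) -> None:
        self._require_creator(user)
        self.deleted = True
        self.members.clear()

    def quit(self, user: str) -> None:
        # Assumption: the creator deletes the group rather than quitting it.
        if user == self.creator:
            raise ValueError("the creator cannot quit; delete the group instead")
        self.members.discard(user)  # any other member may leave freely
```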

IMPLEMENTATION AND AVAILABILITY

Given the heavy computational load, WebLab is implemented as a loosely coupled distributed system. The portal server hosts the web interface and acts as a proxy for users’ requests. A dispatch daemon runs alongside several backend computing servers, which execute the required operations on request from the portal server. When an analysis finishes, its results are sent back to the portal server and saved in the database that the portal server maintains. A callback mechanism is used throughout WebLab to increase flexibility. Adding a new tool requires no additional code; only an XML configuration file needs to be changed. WebLab was developed in Java 1.5, which makes it platform-independent. WebLab uses Apache Tomcat as the container for Java Servlets and JSP, and MySQL as the backend database system for user data and other necessary information. WebLab also uses Graphviz (http://www.graphviz.org) to produce figures, and Lucene (http://lucene.apache.org) as the information retrieval library for building indexes and searching them. WebLab is publicly accessible at http://weblab.cbi.pku.edu.cn. It is compatible with the most common web browsers, such as Mozilla Firefox (versions 2 and 3) and Internet Explorer (versions 6, 7 and 8). Online HTML and video tutorials are actively maintained and updated. The source code of WebLab is released as ‘Free Software’ under the GNU General Public License version 3 (GPLv3) and is freely available for download.
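Two implementation points above can be illustrated together: tools described in XML, and results collected through callbacks. The sketch rests on several assumptions:

- The XML schema (`<tool>`/`<param>` elements with `flag` and `default` attributes) and the program name are made up, because WebLab's real configuration format is not given here.
- A `ThreadPoolExecutor` stands in for the backend computing servers.
- `Future.add_done_callback` plays the role of the callback that stores results on the portal side.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import Future, ThreadPoolExecutor
from threading import Lock
from typing import Callable

# Hypothetical tool descriptor; the schema and program name are made up.
TOOL_XML = """
<tool name="example_tool" program="example_tool">
  <param name="window" flag="--window" default="10"/>
  <param name="mode" flag="--mode" default="fast"/>
</tool>
"""


def build_command(xml_text: str, user_params: dict[str, str]) -> list[str]:
    """Turn a tool descriptor plus user-supplied values into an argv list."""
    tool = ET.fromstring(xml_text.strip())
    argv = [tool.get("program")]
    for param in tool.findall("param"):
        value = user_params.get(param.get("name"), param.get("default"))
        argv += [param.get("flag"), value]
    return argv


class Portal:
    """Stand-in for the portal server: dispatches jobs, stores results by callback."""

    def __init__(self, backend: ThreadPoolExecutor,
                 execute: Callable[[list[str]], str]) -> None:
        self.backend = backend             # stands in for the computing servers
        self.execute = execute             # how a backend runs one argv
        self.results: dict[int, str] = {}  # stands in for the portal's database
        self._lock = Lock()
        self._next_id = 0

    def submit(self, argv: list[str]) -> int:
        with self._lock:
            job_id, self._next_id = self._next_id, self._next_id + 1
        future = self.backend.submit(self.execute, argv)
        # The callback fires when the job finishes, so the portal never polls.
        future.add_done_callback(lambda f: self._store(job_id, f))
        return job_id

    def _store(self, job_id: int, future: Future) -> None:
        with self._lock:
            self.results[job_id] = future.result()
```

`execute` is injected so that the dispatch logic can be exercised without real backend servers; a real one might call `subprocess.run` on a computing node. Registering another tool in this sketch only means writing another XML descriptor, in the spirit of the configuration-only extension described above.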

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

China National High-tech 863 Programs [2006AA02Z334, 2006AA02Z314, 2006AA02A312]; China high-tech platform program. Funding for open access charge: China National High-tech (863) Program (2006AA02Z334).

Conflict of interest statement: None declared.

REFERENCES (24 in total; 10 listed below, not in the order cited in the text)

1. Dotlet: diagonal plots in a web browser.

Authors: T Junier; M Pagni
Journal: Bioinformatics Date: 2000-02 Impact factor: 6.937

2. The design of Jemboss: a graphical user interface to EMBOSS.

Authors: Tim Carver; Alan Bleasby
Journal: Bioinformatics Date: 2003-09-22 Impact factor: 6.937

3. The Helmholtz Network for Bioinformatics: an integrative web portal for bioinformatics resources.

Authors: T Crass; I Antes; R Basekow; P Bork; C Buning; M Christensen; H Claussen; C Ebeling; P Ernst; V Gailus-Durner; K-H Glatting; R Gohla; F Gössling; K Grote; K Heidtke; A Herrmann; S O'Keeffe; O Kiesslich; S Kolibal; J O Korbel; T Lengauer; I Liebich; M van der Linden; H Luz; K Meissner; C von Mering; H-T Mevissen; H-W Mewes; H Michael; M Mokrejs; T Müller; H Pospisil; M Rarey; J G Reich; R Schneider; D Schomburg; S Schulze-Kremer; K Schwarzer; I Sommer; S Springstubbe; S Suhai; G Thoppae; M Vingron; J Warfsmann; T Werner; D Wetzler; E Wingender; R Zimmer
Journal: Bioinformatics Date: 2004-01-22 Impact factor: 6.937

4. Taverna: a tool for the composition and enactment of bioinformatics workflows.

Authors: Tom Oinn; Matthew Addis; Justin Ferris; Darren Marvin; Martin Senger; Mark Greenwood; Tim Carver; Kevin Glover; Matthew R Pocock; Anil Wipat; Peter Li
Journal: Bioinformatics Date: 2004-06-16 Impact factor: 6.937

5. Big data: Wikiomics.

Authors: Mitch Waldrop
Journal: Nature Date: 2008-09-04 Impact factor: 49.962

6. Towards a paradigm shift in biology.

Authors: W Gilbert
Journal: Nature Date: 1991-01-10 Impact factor: 49.962

7. WebMol--a Java-based PDB viewer.

Authors: D Walther
Journal: Trends Biochem Sci Date: 1997-07 Impact factor: 13.807

8. The EBI SRS server--recent developments.

Authors: Evgeni M Zdobnov; Rodrigo Lopez; Rolf Apweiler; Thure Etzold
Journal: Bioinformatics Date: 2002-02 Impact factor: 6.937

9. The Jalview Java alignment editor.

Authors: Michele Clamp; James Cuff; Stephen M Searle; Geoffrey J Barton
Journal: Bioinformatics Date: 2004-01-22 Impact factor: 6.937

10. Pegasys: software for executing and integrating analyses of biological sequences.

Authors: Sohrab P Shah; David Y M He; Jessica N Sawkins; Jeffrey C Druce; Gerald Quon; Drew Lett; Grace X Y Zheng; Tao Xu; B F Francis Ouellette
Journal: BMC Bioinformatics Date: 2004-04-19 Impact factor: 3.169
