
BRIGEP--the BRIDGE-based genome-transcriptome-proteome browser.

A Goesmann, B Linke, D Bartels, M Dondrup, L Krause, H Neuweger, S Oehm, T Paczian, A Wilke, F Meyer.

Abstract

The growing amount of information resulting from the increasing number of publicly available genomes and experimental results thereof necessitates the development of comprehensive systems for data processing and analysis. In this paper, we describe the current state and latest developments of our BRIGEP bioinformatics software system consisting of three web-based applications: GenDB, EMMA and ProDB. These applications facilitate the processing and analysis of bacterial genome, transcriptome and proteome data and are actively used by numerous international groups. We are currently in the process of extensively interconnecting these applications. BRIGEP was developed in the Bioinformatics Resource Facility of the Center for Biotechnology at Bielefeld University and is freely available. A demo project with sample data and access to all three tools is available at https://www.cebitec.uni-bielefeld.de/groups/brf/software/brigep/. Code bundles for these and other tools developed in our group are accessible on our FTP server at ftp.cebitec.uni-bielefeld.de/pub/software/.

Year: 2005; PMID: 15980569; PMCID: PMC1160161; DOI: 10.1093/nar/gki400

Journal: Nucleic Acids Res; ISSN: 0305-1048

INTRODUCTION

In the last few years, advances in high-throughput sequencing techniques have dramatically reduced the time and cost of obtaining the DNA sequence of an organism. More than 1300 finished or ongoing genome projects are currently listed in the GOLD (1) database. For most organisms under investigation, an increasing amount of experimental data is being collected, e.g. from transcriptomics, proteomics and metabolomics experiments. To process and analyze these large datasets, software packages for each of these areas have been developed in recent years [e.g. ARTEMIS (2) and ERGO (3) for genome annotation, or BASE (4) and GECKO (5) for gene expression data analysis]. Approaches that link and integrate data from these different areas have also proven useful for gaining new knowledge more easily and intuitively [e.g. PRIME (6) and HaloLex ()]. In the Bioinformatics Resource Facility of the Center for Biotechnology, we have developed three such applications: GenDB (7) for genome annotation, EMMA (8) for transcriptome analysis and ProDB (9) for proteome analysis. Each is a full-featured analysis system for its respective area, ranging from raw data processing to diverse and advanced functions for analyzing the processed data. In this report, we briefly describe the newly developed web frontends of these applications, as well as first examples of their ongoing integration enabled by the BRIDGE integration layer (10). Using the resulting web-based system, BRIGEP, our collaborators around the world can process and analyze their data and share it with other members of the community. Because the software includes a project management component, a user or a community can also decide whether and when their data are made public.
In the following, we illustrate the functionality of the three web interfaces and show examples of the ongoing integration of the data.

DESIGN AND IMPLEMENTATION

The BRIGEP system currently provides access to three applications: GenDB, EMMA and ProDB. The underlying data are stored in SQL databases. Each system has a three-tier architecture based on an object-relational mapping provided by O2DBI (11), a tool developed in-house. The storage back-end can be accessed via an application programming interface (API). Standard methods for persistently retrieving, manipulating and deleting objects are generated automatically, while additional methods for more complex tasks are added manually as extensions to each class. To restrict data access to authorized users, data are organized in projects; for example, a GenDB project is usually set up to annotate a single genome, and a ProDB project to maintain the proteome data of a single organism. Projects and their members are managed with the General Project Management System, for which we have also developed a web interface (not described here). The BRIDGE layer is used to interconnect data objects from different projects. On top of the API and the BRIDGE layer (provided in Perl), the web functionality is implemented as Perl CGI scripts. Through these scripts, authorized users (project members) can handle data input, manipulation, processing and retrieval. Complex analyses, such as the automated annotation of a whole genome or the analysis of a large batch of mass spectra (MS), can also be started and their results subsequently visualized via the web interface. Documentation for users and programmers is provided in the form of a wiki ().
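O2DBI itself is a Perl tool that generates Perl classes; the Python sketch below only illustrates the general idea of deriving standard persistence methods from a class description and adding hand-written extension methods on top. The class names, the table layout and the SQLite back-end are assumptions for illustration, not O2DBI's actual output or API.

```python
import sqlite3
from typing import Optional, Tuple


class PersistentObject:
    """Generic persistence layer: subclasses only declare TABLE and FIELDS.

    The standard create/fetch/update/delete methods are derived from that
    declaration, loosely mimicking what a generator such as O2DBI provides.
    """

    TABLE: str = ""
    FIELDS: Tuple[str, ...] = ()

    def __init__(self, db: sqlite3.Connection, obj_id: int, **values):
        self._db = db
        self.id = obj_id
        for name in self.FIELDS:
            setattr(self, name, values.get(name))

    @classmethod
    def create_table(cls, db: sqlite3.Connection) -> None:
        # Columns without a declared type keep Python ints and strs as given.
        cols = ", ".join(cls.FIELDS)
        db.execute(f"CREATE TABLE IF NOT EXISTS {cls.TABLE} "
                   f"(id INTEGER PRIMARY KEY, {cols})")

    @classmethod
    def create(cls, db: sqlite3.Connection, **values) -> "PersistentObject":
        marks = ", ".join("?" for _ in cls.FIELDS)
        cur = db.execute(
            f"INSERT INTO {cls.TABLE} ({', '.join(cls.FIELDS)}) VALUES ({marks})",
            [values.get(f) for f in cls.FIELDS])
        return cls(db, cur.lastrowid, **values)

    @classmethod
    def fetch_by_id(cls, db: sqlite3.Connection,
                    obj_id: int) -> Optional["PersistentObject"]:
        row = db.execute(
            f"SELECT {', '.join(cls.FIELDS)} FROM {cls.TABLE} WHERE id = ?",
            (obj_id,)).fetchone()
        return None if row is None else cls(db, obj_id, **dict(zip(cls.FIELDS, row)))

    def update(self) -> None:
        assignments = ", ".join(f"{f} = ?" for f in self.FIELDS)
        self._db.execute(
            f"UPDATE {self.TABLE} SET {assignments} WHERE id = ?",
            [getattr(self, f) for f in self.FIELDS] + [self.id])

    def delete(self) -> None:
        self._db.execute(f"DELETE FROM {self.TABLE} WHERE id = ?", (self.id,))


class CDS(PersistentObject):
    TABLE = "cds"
    FIELDS = ("name", "contig", "start", "stop")

    # Hand-written extension method for a task a generator cannot anticipate.
    def length(self) -> int:
        return abs(self.stop - self.start) + 1


db = sqlite3.connect(":memory:")
CDS.create_table(db)
gene = CDS.create(db, name="hypothetical protein", contig="contig_1", start=100, stop=399)
gene.name = "dnaA"
gene.update()
print(CDS.fetch_by_id(db, gene.id).name, gene.length())  # dnaA 300
```

Only the class description (`TABLE`, `FIELDS`) differs between persistent classes, so adding a new data type costs a few lines while the generic methods stay shared.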

Details on the applications

In the following, we describe the three web applications in greater detail. GenDB is an open-source genome annotation system for prokaryotic genomes that has been under development for more than five years. Given a genome sequence, the system integrates numerous tools to perform gene prediction and functional annotation of the genome. For the prediction of coding sequences (CDSs), we rely on Reganor (14), an approach that combines Glimmer (12) and Critica (13). For each CDS, we perform an automatic function prediction (Metanor) that uses a combination of standard tools such as BLAST (15), HMMer (16) and InterPro (17) as the basis for assigning a gene name, gene product, description, functional category, GO (18) numbers and other attributes. These automatic annotations can be curated and enriched manually via the web interface. To keep track of all automatic and manual annotations, GenDB stores a history of all annotations in the form of a list. Among other views, the web interface provides a contig view (Figure 1) for easy navigation, a report on each CDS, a region editor for changing gene starts, a region creator for manually creating new genes, and a virtual 2D gel. To browse all genes according to their functional classification, the system provides KEGG (19) (Figure 1), COG (20) and GO browsers. FASTA, EMBL and GenBank files can be imported and exported. More than 25 genomes are currently being analyzed with this web interface in various national and international collaborations.
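An annotation history kept as a list can be pictured as an append-only record per region: every automatic or manual annotation adds a new entry, and the most recent entry is the current one. The Python sketch below assumes exactly that model; the class and field names, and annotator labels such as `"metanor"` and `"curator_jdoe"`, are hypothetical and not taken from GenDB's Perl API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Annotation:
    """One immutable annotation record; newer records never overwrite older ones."""
    gene_name: str
    gene_product: str
    functional_category: str
    annotator: str  # hypothetical convention: tool name or curator login
    created: datetime


@dataclass
class Region:
    contig: str
    start: int
    stop: int
    history: List[Annotation] = field(default_factory=list)

    def annotate(self, gene_name: str, gene_product: str,
                 functional_category: str, annotator: str) -> Annotation:
        """Append a new annotation instead of editing the previous one."""
        record = Annotation(gene_name, gene_product, functional_category,
                            annotator, datetime.now(timezone.utc))
        self.history.append(record)
        return record

    @property
    def current(self) -> Optional[Annotation]:
        """The most recent annotation is the one shown by default."""
        return self.history[-1] if self.history else None


cds = Region("contig_1", 100, 399)
cds.annotate("hypothetical", "hypothetical protein", "unknown", "metanor")
cds.annotate("dnaA", "chromosomal replication initiator protein DnaA",
             "DNA replication", "curator_jdoe")
print(cds.current.gene_name, len(cds.history))  # dnaA 2
```

Because records are immutable and only appended, the earlier automatic prediction remains available for comparison after manual curation.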

Figure 1

The newly implemented GenDB web interface provides a multitude of views for browsing a genome and for manipulating a genome annotation. The uppermost screenshot shows the GenDB contig view, which can be used to navigate from a region in the genome to a specific gene or region on a contig. An informal reconstruction of metabolic pathways can be visualized using the KEGG browser shown in the lower part of the screenshot; here, automatically annotated enzymes are highlighted in green.

EMMA is a MAGE (21)-compliant software platform for transcriptome data analysis that includes a LIMS component (ArrayLIMS). Data can be uploaded in standard formats and linked to the GenDB data. The system provides customizable pipelines for data processing and has a modular architecture that can easily be extended. Several visualization methods, such as scatter plots and heat maps (Figure 2), are also available. EMMA provides detailed reports on spots, genes and their corresponding measurements.
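A pipeline built from plug-ins can be modeled as an ordered list of functions, each taking a dataset and returning a new one. The Python sketch below assumes a two-channel array and chains two hypothetical plug-ins: one computes the log-ratio M = log2(R/G) and the mean log-intensity A = (1/2) log2(R·G), the quantities behind a scatter plot of log-expression versus log-intensity, and the other applies global median normalization. Neither plug-in is claimed to be part of EMMA; median centering is simply one common normalization choice.

```python
import math
from statistics import median
from typing import Callable, Dict, List

Dataset = Dict[str, List[float]]
Plugin = Callable[[Dataset], Dataset]


def ma_transform(data: Dataset) -> Dataset:
    """Hypothetical plug-in: log-ratio M and mean log-intensity A per spot."""
    red, green = data["red"], data["green"]
    out = dict(data)  # never mutate the input dataset
    out["M"] = [math.log2(r / g) for r, g in zip(red, green)]
    out["A"] = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return out


def median_normalize(data: Dataset) -> Dataset:
    """Hypothetical plug-in: shift all M values so that their median is zero."""
    shift = median(data["M"])
    out = dict(data)
    out["M"] = [m - shift for m in data["M"]]
    return out


def run_pipeline(data: Dataset, plugins: List[Plugin]) -> Dataset:
    """Apply the configured plug-ins in order."""
    for plugin in plugins:
        data = plugin(data)
    return data


raw = {"red": [200.0, 800.0, 400.0], "green": [100.0, 400.0, 400.0]}
result = run_pipeline(raw, [ma_transform, median_normalize])
print(result["M"])  # [0.0, 0.0, -1.0]
```

Because each plug-in returns a new dictionary, the raw input stays untouched, and steps can be reordered or replaced without side effects.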

Figure 2

Two examples of the EMMA user interface. The uppermost window displays the tool configuration wizard, which is used to define customized analysis pipelines. The building blocks of these pipelines are functions or external programs known as plug-ins. The second screenshot depicts a scatter plot of the normalized log-expression versus the log-intensity, in which each gene is linked to its corresponding annotation.

Data import and export are provided in a variety of formats. The complete MAGE-ML language is supported for array layouts, datasets and experimental descriptions. Datasets can be exported as tab-separated tables and in the binary HDF5 format, which is also used for reliable storage of large quantification tables. EMMA supports the Array Description Format and MAGE-ML for defining array layouts; once an array layout has been created, the sequences it contains can be linked to GenDB automatically. Fine-grained access control is provided for every experiment, array and dataset at the user and group level. The upload and storage of experimental setups, RNA extraction and hybridization conditions, scanned images and quantification data are handled by the included ArrayLIMS system.

ProDB is software for the large-scale analysis of proteome data and also includes a LIMS component. ProDB stores experimental data, such as images of 2D gels or MS, and allows automated analysis and annotation of MS. The system handles data from different mass spectrometer software (e.g. processed data from Bruker or ThermoFinnigan) and will support the mzData standard of the PSI (22). Because ProDB stores MS together with numerous details of the experimental setup, each annotation is automatically linked to the corresponding spot on a gel. The web interface supports data input and the management of all experimental steps leading up to the MS data. We have implemented a common interface to different search engines such as Mascot (23) and emowse (24) [contained in the EMBOSS package (25)]. In this interface, the user can define search sets that specify not just a single value for each MS search parameter (e.g. a peptide mass tolerance of 100 p.p.m.) but an interval (e.g. a peptide mass tolerance from 50 to 100 p.p.m. in steps of 25 p.p.m., i.e. 50, 75 and 100 p.p.m.). These sets of search parameters are used to analyze all selected MS, and the results are then presented to the user for annotating the spectra.
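Interval-based search sets amount to a Cartesian product over parameter ranges. In the Python sketch below, the peptide mass tolerance interval from the text (50 to 100 p.p.m. in steps of 25 p.p.m.) expands to three values; the second parameter, `missed_cleavages`, is a hypothetical addition to show how several intervals combine. The function names are illustrative, and submitting each set to Mascot or emowse is not shown.

```python
from itertools import product
from typing import Dict, List, Tuple


def expand_interval(low: float, high: float, step: float) -> List[float]:
    """All values low, low + step, ... that do not exceed high."""
    if step <= 0:
        raise ValueError("step must be positive")
    values = []
    n = 0
    # The small epsilon keeps `high` itself despite floating-point rounding.
    while low + n * step <= high + 1e-9:
        values.append(low + n * step)
        n += 1
    return values


def build_search_sets(
        intervals: Dict[str, Tuple[float, float, float]]) -> List[Dict[str, float]]:
    """Cartesian product of all parameter intervals: one dict per search run."""
    names = list(intervals)
    grids = [expand_interval(*intervals[name]) for name in names]
    return [dict(zip(names, combo)) for combo in product(*grids)]


search_sets = build_search_sets({
    "peptide_tolerance_ppm": (50, 100, 25),  # interval from the text
    "missed_cleavages": (0, 1, 1),           # hypothetical second parameter
})
print(len(search_sets))  # 6
for params in search_sets:
    print(params)
```

The number of searches grows multiplicatively with each added interval, which is why applying one parameter set to all selected spectra in a batch is a natural unit of work.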

Integrating the systems

Each of the described applications can be used stand-alone, but they can also be linked via the BRIDGE integration layer. This layer links data objects from different projects so that, for example, information originating from a GenDB project can be shown in the EMMA or ProDB web interface. We are in the process of tightly integrating the applications so that all relevant information is available to the user at each step of an analysis. In this manner, we have linked the sequences spotted on an array to the corresponding GenDB genes, so the user can jump directly from a spot in EMMA to the corresponding GenDB contig view, report or annotation dialog. Sequence data from GenDB can also be used to create the database for MS analyses, and the results of these analyses can be linked to the corresponding sequence objects stored in GenDB (Figure 3). Further examples of the ongoing integration can be found in the Supplementary Material.
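Conceptually, cross-project linking needs a way to address an object in any application and a symmetric link table. The Python sketch below assumes an (application, project, object id) triple as that address; `ObjectRef` and `LinkRegistry` are hypothetical names, and BRIDGE's actual Perl interface is not reproduced here.

```python
from collections import defaultdict
from typing import DefaultDict, NamedTuple, Optional, Set


class ObjectRef(NamedTuple):
    """Address of a data object across applications and projects."""
    application: str  # e.g. "GenDB", "EMMA", "ProDB"
    project: str
    object_id: int


class LinkRegistry:
    """Hypothetical cross-project link table; every link is stored both ways."""

    def __init__(self) -> None:
        self._links: DefaultDict[ObjectRef, Set[ObjectRef]] = defaultdict(set)

    def link(self, a: ObjectRef, b: ObjectRef) -> None:
        self._links[a].add(b)
        self._links[b].add(a)

    def linked(self, ref: ObjectRef,
               application: Optional[str] = None) -> Set[ObjectRef]:
        """Objects linked to `ref`, optionally restricted to one application."""
        targets = self._links.get(ref, set())  # .get avoids creating empty entries
        return {t for t in targets
                if application is None or t.application == application}


registry = LinkRegistry()
spot = ObjectRef("EMMA", "array_project", 17)      # hypothetical array spot
gene = ObjectRef("GenDB", "genome_project", 4242)  # hypothetical CDS object
registry.link(spot, gene)
print(registry.linked(spot, "GenDB"))  # the GenDB object to jump to
```

Storing each link in both directions lets the interface navigate from a spot to its gene and back with a single lookup each way.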

Figure 3

Data import in GenDB and ProDB. The GenDB Import Dialog shows the import of the S. meliloti sequence and annotations from an EMBL file. The imported contigs can then be used to create a protein database for MS analyses in a ProDB project, as shown in the upper screenshot.

To demonstrate the benefits of this integration, we describe a sample application below.

A sample application: enriching gene annotations with experimental evidence from 2D gel analyses

For this sample application, we imported the EMBL files of all replicons of the Sinorhizobium meliloti (26) genome into a GenDB project. We then computed a new automatic functional annotation using the standard GenDB pipeline. The amino acid sequences of all CDSs, together with their updated gene annotations, were imported directly from GenDB into the ProDB demo project and installed as a searchable database for emowse. Sample data from 2D gel analyses were imported into ProDB, and several spots on a gel were identified from their peptide mass fingerprints. These spots were annotated within ProDB and, by assigning the corresponding GenDB region, the spot objects were linked directly to the GenDB CDS objects. At the same time, an observation referring to the ProDB spot object was created in GenDB; these observations are listed in the GenDB region report together with a small image of the corresponding spot on the gel (Figure 4). Furthermore, the user can jump directly from the GenDB report to the ProDB 2D gel and experiment and, conversely, navigate from an annotated spot to the GenDB contig view. In this example, directly linking experimental data from 2D gel analyses yields gene annotations enriched with experimental evidence, which increases their quality and reliability.
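Peptide mass fingerprinting compares the peak masses measured for a spot with the masses of peptides expected from an in-silico digest of each candidate protein. The Python sketch below is a simplified stand-in, assuming trypsin without missed cleavages, singly protonated [M+H]+ ions and a plain hit count within a p.p.m. tolerance; emowse uses the more elaborate MOWSE scoring scheme instead. The CDS identifiers, sequences and peaks are invented for illustration (the peaks are simulated with a 20 p.p.m. error).

```python
from typing import Dict, List, Tuple

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_digest(protein: str) -> List[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def mh_mass(peptide: str) -> float:
    """Monoisotopic mass of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def pmf_hits(peaks: List[float], protein: str, tol_ppm: float) -> int:
    """Number of observed peaks that match some theoretical tryptic peptide."""
    theoretical = [mh_mass(p) for p in tryptic_digest(protein)]
    return sum(
        1 for peak in peaks
        if any(abs(peak - t) / t * 1e6 <= tol_ppm for t in theoretical))


def rank_candidates(peaks: List[float], database: Dict[str, str],
                    tol_ppm: float = 100.0) -> List[Tuple[str, int]]:
    """Score every CDS in the database and sort by descending hit count."""
    scores = {cds_id: pmf_hits(peaks, seq, tol_ppm) for cds_id, seq in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


database = {"cds_0001": "MAGKLLSRPEDGKWYR", "cds_0002": "MSTEVHHFNQIAK"}
peaks = [mh_mass(p) * (1 + 20e-6) for p in tryptic_digest(database["cds_0001"])]
ranking = rank_candidates(peaks, database)
print(ranking)  # [('cds_0001', 3), ('cds_0002', 0)]
```

The top-ranked CDS is the candidate to which the spot would then be linked, as described above.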

Figure 4

Example of the ongoing integration of the BRIGEP applications. Spots on a 2D gel in ProDB are linked to CDS objects in GenDB. The spot marked with a red square in the uppermost screenshot is highlighted in the GenDB contig view in the middle. The GenDB report for a selected region is enriched with images of the corresponding proteins picked from a 2D gel in ProDB.

SUPPLEMENTARY MATERIAL

Supplementary Material is available at NAR Online.

REFERENCES (25 in total; selected entries)

4. Lao H Saal, Carl Troein, Johan Vallon-Christersson, Sofia Gruvberger, Ake Borg, Carsten Peterson. BioArray Software Environment (BASE): a platform for comprehensive management and analysis of microarray data. Genome Biol. 2002-07-15.

5. Joachim Theilhaber, Anatoly Ulyanov, Anish Malanthara, Jack Cole, Dapeng Xu, Robert Nahf, Michael Heuer, Christoph Brockel, Steven Bushnell. GECKO: a complete large-scale gene expression analysis platform. BMC Bioinformatics. 2004-12-10.

6. Axel Facius, Claudia Englbrecht, Fabian Birzele, Andreas Groscurth, Schmidt Benjamin, Steffi Wanka, Werner Mewes. PRIME: a graphical interface for integrating genomic/proteomic databases. Proteomics. 2005-01.

9. Andreas Wilke, Christian Rückert, Daniela Bartels, Michael Dondrup, Alexander Goesmann, Andrea T Hüser, Sebastian Kespohl, Burkhard Linke, Martina Mahne, Alice McHardy, Alfred Pühler, Folker Meyer. Bioinformatics support for high-throughput proteomics. J Biotechnol. 2003-12-19.

14. Alice C McHardy, Alexander Goesmann, Alfred Pühler, Folker Meyer. Development of joint application strategies for two microbial gene finders. Bioinformatics. 2004-02-26.

19. M Kanehisa, S Goto. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000-01-01.

20. R L Tatusov, D A Natale, I V Garkavtsev, T A Tatusova, U T Shankavaram, B S Rao, B Kiryutin, M Y Galperin, N D Fedorova, E V Koonin. The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res. 2001-01-01.

21. Paul T Spellman, Michael Miller, Jason Stewart, Charles Troup, Ugis Sarkans, Steve Chervitz, Derek Bernhart, Gavin Sherlock, Catherine Ball, Marc Lepage, Marcin Swiatek, W L Marks, Jason Goncalves, Scott Markel, Daniel Iordan, Mohammadreza Shojatalab, Angel Pizarro, Joe White, Robert Hubley, Eric Deutsch, Martin Senger, Bruce J Aronow, Alan Robinson, Doug Bassett, Christian J Stoeckert, Alvis Brazma. Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biol. 2002-08-23.

22. Sandra Orchard, Henning Hermjakob, Rolf Apweiler. The proteomics standards initiative. Proteomics. 2003-07.

25. P Rice, I Longden, A Bleasby. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000-06.
