GOBASE--a database of organelle and bacterial genome information.

Emmet A O'Brien, Yue Zhang, LiuSong Yang, Eric Wang, Veronique Marie, B Franz Lang, Gertraud Burger.

Abstract

The organelle genome database GOBASE is now in its twelfth release, and includes 350,000 mitochondrial sequences and 118,000 chloroplast sequences, roughly a 3-fold expansion since previously documented. GOBASE also includes a fully reannotated genome sequence of Rickettsia prowazekii, one of the closest bacterial relatives of mitochondria, and will shortly expand to contain more data from bacteria from which organelles originated. All these sequences are now accessible through a single unified interface. Enhancements to the functionality of GOBASE include addition of pages for RNA structures and a page compiling data about the taxonomic distribution of organelle-encoded genes; incorporation of Gene Ontology terms; addition of features deduced from incomplete annotations to sequences in GenBank; marking of type examples in cases where single genes in single species are oversampled within GenBank; and addition of graphics illustrating gene structure and the position of neighbouring genes on a sequence. The database has been reimplemented in PostgreSQL to facilitate development and maintenance, and structural modifications have been made to speed up queries, particularly those related to taxonomy. The GOBASE database can be queried at http://gobase.bcm.umontreal.ca/ and inquiries should be directed to gobase@bch.umontreal.ca.

Year:  2006        PMID: 16381962      PMCID: PMC1347460          DOI: 10.1093/nar/gkj098

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Mitochondria and chloroplasts are of interest to biologists for studies as diverse as population genetics, molecular taxonomy and understanding metabolism-related disease in humans. The volume of information available concerning these organelles is constantly increasing and diversifying. There is therefore a growing need for specialist databases to collect, cross-reference and annotate this information for the requirements of different research communities, to set raw data in the context of expert knowledge and to complement the role of general databases such as GenBank (1).

GOBASE was designed primarily to address broad issues of comparative biology, such as the evolutionary origins of organelle endosymbiosis, gene migration to the nucleus and diversity of genome architecture, gene structure and gene expression mechanisms in organelles (2,3). Organelles are well suited to evolutionary studies because of the large number of complete genomes available. GOBASE release 12 (May 2005) contains 1517 complete mitochondrial genomes and 43 complete chloroplast genomes.

GOBASE has been gathering biological information related to mitochondria and chloroplasts, curating this information and making it publicly available through a web-based interface since 1995 (4–7). GOBASE contains a number of different categories of data (nucleotide and protein sequences, taxonomic data, RNA secondary structures, genetic maps), all of which have been collected and verified by expert curators. Gene and product names are assigned from a standardized list maintained internally, to allow for ease of searching and sorting. This assembly of data is made available to researchers through an intuitive interface allowing for a wide range of precisely specified searches.

GOBASE release 12 (May 2005) contains ∼350 000 mitochondrial sequences, including 150 000 proteins, and 120 000 chloroplast sequences, including 43 000 proteins, derived mostly from GenBank release 145. This represents a roughly 3-fold increase in the contents of GOBASE since the last documented release (7).

To further enhance the database's utility as a tool for evolutionary comparison, we have recently started to integrate data from bacteria closely related to the ancestors of mitochondria and chloroplasts into GOBASE. A genome sequence for Rickettsia prowazekii was obtained from GenBank (1) and has been comprehensively re-annotated by a combination of the AutoFACT automated annotation system (8) and expert manual curation. Additional bacterial data will follow in GOBASE release 13, due later in 2005.

STRUCTURAL REDESIGN/NEW FEATURES

The GOBASE interface contains PHP query interface pages corresponding to the classes of biological entity represented within the GOBASE database: these are Gene, Gene&ProductClass, Sequence, Protein, Exon, Intron, RNA, RNAStructure, Taxonomy, Map and GeneDistribution. Since release 8.1 (October 2003) access to the mitochondrial and chloroplast datasets in GOBASE has been combined, and the previously independent interface to chloroplast data has been retired. In each page of the GOBASE interface, users can now select the origin of the data which they will query, with the options of searching mitochondrial, chloroplast or bacterial data, or all of these datasets.

GOBASE now includes several new pages, allowing the contents of the database to be interrogated in different ways. The GeneDistribution page shows an overall summary of the distribution of mitochondrial genes, sorted into columns by functional category and into rows by species ordered by taxonomic division, in order to facilitate assessments of the distribution of specific genes or functional gene classes, in specific organisms or across clades. An RNAStructure page has also been added to the interface, providing direct links to .pdf files containing diagrams illustrating the structures of many of the ribosomal and RNase P RNA sequences contained in the database, with links to the appropriate sequence and RNA feature entries. Finally, an updated Taxonomy interface page has been added, making use of a novel database architecture (manuscript under preparation) to provide rapid and efficient navigation of a structure representing the NCBI taxonomic tree and access to all GOBASE data relating to any clade of interest at any level (taxonomic rank) in the tree.

The GOBASE interface has been redesigned to enhance querying and representation of results. The Gene query result page now contains graphics illustrating the internal structure of complex genes (Figure 1a) and neighbouring genes on the chromosome (Figure 1b). This also allows for a more sophisticated representation of trans-spliced genes than has previously been possible.
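
The row-and-column layout of the GeneDistribution page can be pictured as a presence/absence matrix. The following Python sketch is illustrative only and is not the GOBASE implementation: the records, the category labels and the function name `gene_distribution` are hypothetical, and alphabetical sorting stands in for the ordering by taxonomic division described above.

```python
from collections import defaultdict

# Hypothetical records: (species, taxonomic division, gene, functional category).
RECORDS = [
    ("Homo sapiens", "Metazoa", "cox1", "respiration"),
    ("Homo sapiens", "Metazoa", "nad1", "respiration"),
    ("Arabidopsis thaliana", "Viridiplantae", "cox1", "respiration"),
    ("Arabidopsis thaliana", "Viridiplantae", "rps3", "ribosomal protein"),
]


def gene_distribution(records):
    """Build a presence/absence matrix.

    Columns are (category, gene) pairs sorted by category, then gene.
    Rows are (division, species, flags) sorted by division, then species,
    where flags[i] is True if the species carries the gene in column i.
    """
    columns = sorted({(cat, gene) for _, _, gene, cat in records})
    present = defaultdict(set)
    for species, division, gene, _ in records:
        present[(division, species)].add(gene)
    rows = [
        (division, species, [gene in genes for _, gene in columns])
        for (division, species), genes in sorted(present.items())
    ]
    return columns, rows


columns, rows = gene_distribution(RECORDS)
print("columns:", [gene for _, gene in columns])
for division, species, flags in rows:
    marks = "".join("+" if flag else "-" for flag in flags)
    print(f"{division:15s} {species:22s} {marks}")
```

Grouping the columns by category before gene name keeps functionally related genes adjacent, which is what makes gaps across a clade easy to spot by eye.
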
Figure 1

(a) Gene structure diagram for the mitochondrial nad1 gene from Arabidopsis thaliana, GOBASE feature ID 25091. The table shows links to the sequence entries containing this information, the intron and exon feature entries assigned to each sequence, and the positions of each feature on the respective sequence. Exons are shown in blue and introns in red. The images are each scaled to a standard width; in cases where exons are widely separated on a sequence, a breakpoint is indicated in the image. (b) Diagram of genes in the vicinity of chloroplast atpI gene from Euglena gracilis, GOBASE feature ID 784380. The table contains the position of each gene on the sequence and links to the entries for each of the neighbouring genes. The diagram indicates genes in blue and intergenic regions in black, scaled to a standard width. Strand direction is indicated by an arrow; in cases where there are neighbouring genes on both strands, one image is shown for each strand.

Information from the Gene Ontology project (9) has also been integrated into the GOBASE database. Every gene and gene product defined in GOBASE is associated with a suitable set of Gene Ontology terms as determined by our curators. This Gene Ontology information is accessible to GOBASE users through the Gene&ProductClass interface.

GOBASE has recently started to include deduced features, taken from data that are only implicit in a GenBank entry. For example, while exons are usually explicitly identified in an entry in GenBank, this may not be the case for their cognate introns. In such instances, the presence of an intron is inferred from the positions of the exons bounding it, and an entry for that intron is generated internally in GOBASE. Also, new sequence entries are checked for transfer RNA sequences using the program tRNAscan-SE (10), and any putative RNAs identified by this method which have not already been annotated are marked in GOBASE as deduced features. Deduced features are distinguished by colour on the appropriate query results page. Examples can be seen from the RNA query page for gene name ‘trnK’ and taxon name ‘Panax’, or from the intron page for gene name ‘trnG’ and taxon name ‘Prunus’.

There are cases where the available data in GenBank contain numerous identical or near-identical sequences derived from population studies, and this sample bias can be inconvenient in certain queries. For example, GOBASE contains more than a thousand entries for the cox1 gene from Homo sapiens mitochondria, and to retrieve all of these by default may be inappropriate for researchers interested in the evolution of cox1 in a taxonomically broad context. We have therefore implemented a procedure for marking type examples (subsets of sequences selected to represent the range of a larger dataset) within GOBASE. Wherever more than five copies of the same gene exist for a given species, the sequences are aligned using CLUSTAL W (11), and the five most distantly related are selected as type examples and marked as such in the interface. By default, GOBASE query results show only the type examples for these highly sampled genes, but the user may select the option of retrieving all sequences. Type examples are recalculated each time the database is repopulated.
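
The text does not specify how the five most distantly related sequences are chosen once the alignment is available. One plausible approach is sketched below in Python, under explicit assumptions: the input is already aligned (equal-length strings with '-' for gaps), distance is the simple proportion of differing sites, and selection uses a greedy farthest-point heuristic. The function names and the toy sequences are hypothetical, and this should not be read as the procedure actually used in GOBASE.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def pick_type_examples(aligned, k=5):
    """Greedily choose k mutually distant sequences (assumes k >= 2).

    `aligned` maps an identifier to its aligned sequence. If there are k or
    fewer sequences, all identifiers are returned unchanged.
    """
    ids = sorted(aligned)
    if len(ids) <= k:
        return ids
    dist = {
        frozenset(pair): p_distance(aligned[pair[0]], aligned[pair[1]])
        for pair in combinations(ids, 2)
    }

    def d(i, j):
        return dist[frozenset((i, j))]

    # Seed with the most divergent pair, then repeatedly add the sequence
    # whose nearest already-chosen neighbour is farthest away.
    first, second = max(combinations(ids, 2), key=lambda pair: d(*pair))
    chosen = [first, second]
    while len(chosen) < k:
        rest = [i for i in ids if i not in chosen]
        chosen.append(max(rest, key=lambda i: min(d(i, c) for c in chosen)))
    return chosen


# Toy alignment (hypothetical): s1, s5 and s7 are identical copies.
toy = {
    "s1": "AAAAAAAA",
    "s2": "AAAAAAAT",
    "s3": "AAAAAATT",
    "s4": "TTTTTTTT",
    "s5": "AAAAAAAA",
    "s6": "CCCCAAAA",
    "s7": "AAAAAAAA",
}
print(pick_type_examples(toy))
```

In the toy input the three identical sequences collapse to a single representative, which is the effect wanted for oversampled population data. A greedy choice does not guarantee the globally most dispersed subset, but it avoids enumerating every five-member combination when a gene has more than a thousand entries.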

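The intron deduction described in this section reduces to interval arithmetic on exon coordinates. The Python sketch below shows the idea, assuming 1-based inclusive coordinates for the exons of one gene on one strand of a single sequence entry. The function name and the coordinates are hypothetical, and trans-spliced genes, whose exons lie on different sequences, are outside its scope.

```python
def deduce_introns(exons):
    """Given exon (start, end) pairs (1-based, inclusive) for one gene on one
    sequence, return the (start, end) of each gap between consecutive exons."""
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end + 1:  # at least one base separates the exons
            introns.append((prev_end + 1, next_start - 1))
    return introns


# Hypothetical coordinates, for illustration only.
exons = [(100, 250), (900, 1010), (1500, 1620)]
print(deduce_introns(exons))  # [(251, 899), (1011, 1499)]
```

Exons that abut directly produce no intron, so the function returns an empty list for an uninterrupted gene.
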
IMPLEMENTATION

The GOBASE database is implemented in version 7.4.1 of the PostgreSQL relational database management system with a web interface written in v4.3.8 of the PHP scripting language. The graphics on the gene pages are generated using the GD module for Perl/PHP, version 2.0.25. Perl (5.8.0) scripts are used to download data from GenBank and process it into GOBASE. All procedures are executed on PCs with two 2.4 GHz or 2.8 GHz Intel Xeon CPUs.

FUTURE PLANS

The addition of bacterial data to GOBASE will continue, with the inclusion and reannotation of genomes from cyanobacteria, which are closely related to the ancestors of plastids, more α-proteobacterial sequences, which are closely related to the ancestors of mitochondria, and E. coli strain K12, the biochemically best-studied eubacterium. The presence of these sequences will permit more comprehensive comparative analysis of gene structure and function in organelles and of the evolutionary relationships between organelles and their bacterial predecessors. We also intend to include information related to RNA editing in GOBASE in the near future. RNA editing is the programmed alteration of a transcript relative to the gene from which it is transcribed, and occurs in a broad range of biological contexts but is best documented in mitochondria (12). We have developed techniques for parsing information related to RNA editing from GenBank entries, with the intent of storing this information in GOBASE and making it available to users in a clear and consistent fashion.
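
To make the definition of RNA editing concrete, the following Python sketch lists the positions at which a transcript differs from its gene. It is not the GenBank parsing technique mentioned above. It assumes substitution editing only (for example C-to-U), with the two sequences already of equal length; insertion or deletion editing would require an alignment step first. The function name and the sequences are hypothetical.

```python
def editing_sites(gene, transcript):
    """Return (position, genomic_base, transcript_base) for every 1-based
    position at which an equal-length gene and transcript differ.

    Both inputs are upper-cased and written in the RNA alphabet (T -> U)
    before comparison, so a plain T/U difference is not reported as an edit.
    """
    gene = gene.upper().replace("T", "U")
    transcript = transcript.upper().replace("T", "U")
    if len(gene) != len(transcript):
        raise ValueError("sequences must have equal length (substitutions only)")
    return [
        (pos, g, t)
        for pos, (g, t) in enumerate(zip(gene, transcript), start=1)
        if g != t
    ]


# Hypothetical sequences with two C-to-U changes.
print(editing_sites("ATGCCATCA", "AUGCUAUUA"))  # [(5, 'C', 'U'), (8, 'C', 'U')]
```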
REFERENCES (12 in total)

1.  GOBASE: the organelle genome database.

Authors:  N Shimko; L Liu; B F Lang; G Burger
Journal:  Nucleic Acids Res       Date:  2001-01-01       Impact factor: 16.971

2.  Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors:  M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal:  Nat Genet       Date:  2000-05       Impact factor: 38.330

Review 3.  Mitochondrial genomes: anything goes.

Authors:  Gertraud Burger; Michael W Gray; B Franz Lang
Journal:  Trends Genet       Date:  2003-12       Impact factor: 11.639

4.  GOBASE--a database of mitochondrial and chloroplast information.

Authors:  Emmet A O'Brien; Elarbi Badidi; Ania Barbasiewicz; Cristina deSousa; B Franz Lang; Gertraud Burger
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

5.  GenBank.

Authors:  Dennis A Benson; Ilene Karsch-Mizrachi; David J Lipman; James Ostell; David L Wheeler
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

6.  Building a genome database using an object-oriented approach.

Authors:  Anna Barbasiewicz; Lin Liu; B Franz Lang; Gertraud Burger
Journal:  In Silico Biol       Date:  2002

Review 7.  Diversity and evolution of mitochondrial RNA editing systems.

Authors:  Michael W Gray
Journal:  IUBMB Life       Date:  2003 Apr-May       Impact factor: 3.885

8.  The Organelle Genome Database Project (GOBASE).

Authors:  M Korab-Laskowska; P Rioux; N Brossard; T G Littlejohn; M W Gray; B F Lang; G Burger
Journal:  Nucleic Acids Res       Date:  1998-01-01       Impact factor: 16.971

9.  tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence.

Authors:  T M Lowe; S R Eddy
Journal:  Nucleic Acids Res       Date:  1997-03-01       Impact factor: 16.971

10.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.

Authors:  J D Thompson; D G Higgins; T J Gibson
Journal:  Nucleic Acids Res       Date:  1994-11-11       Impact factor: 16.971
