GOBASE: an organelle genome database.

Emmet A O'Brien, Yue Zhang, Eric Wang, Veronique Marie, Wole Badejoko, B Franz Lang, Gertraud Burger.

Abstract

The organelle genome database GOBASE, now in its 21st release (June 2008), contains all published mitochondrion-encoded sequences (approximately 913,000) and chloroplast-encoded sequences (approximately 250,000) from a wide range of eukaryotic taxa. For all sequences, information on related genes, exons, introns, gene products and taxonomy is available, as well as selected genome maps and RNA secondary structures. Recent major enhancements to database functionality include: (i) addition of an interface for RNA editing data, with substitutions, insertions and deletions displayed using multiple alignments; (ii) addition of medically relevant information, such as haplotypes, SNPs and associated disease states, to human mitochondrial sequence data; (iii) addition of fully reannotated genome sequences for Escherichia coli and Nostoc sp., for reference and comparison; and (iv) a number of interface enhancements, such as the availability of both genomic and gene-coding sequence downloads, and a more sophisticated literature reference search functionality with links to PubMed where available. Future projects include the transfer of GOBASE features to NCBI/GenBank, allowing long-term preservation of accumulated expert information. The GOBASE database can be found at http://gobase.bcm.umontreal.ca/. Queries about custom and large-scale data retrievals should be addressed to gobase@bch.umontreal.ca.

Year:  2008        PMID: 18953030      PMCID: PMC2686550          DOI: 10.1093/nar/gkn819

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The amount of information available in generalist molecular sequence databases such as GenBank (1) continues to grow, and this information becomes more diverse and complex as we discover new biological phenomena. Therefore, there is an increasing need for expert databases specializing in particular areas of molecular biology. Specialist databases provide expert curation of data and flexible, well-integrated access to those data, and so serve a purpose complementary to generalist databases such as GenBank. GOBASE is one such specialist database, which has been collecting, curating and publishing data concerning mitochondrial and chloroplast genomes since 1995 (2–5). Organelle genomes are of biological interest for a wide range of studies, such as molecular taxonomy, molecular mechanisms of trans-splicing and RNA editing, and non-Mendelian inherited metabolism-related disease in humans. GOBASE contains a number of different categories of data, such as nucleic acid and protein sequences, genetic maps, taxonomic data and RNA secondary structures. All gene and product names have been assigned from a locally maintained standard list, which, combined with a powerful and flexible interface, allows a wide range of complex searches. While GOBASE was initially designed primarily to address issues of comparative biology, such as the diversity of organelle genome structure in eukaryotes (e.g. 6,7), we have more recently added functionality specific to the human mitochondrial genome, such as searches by haplotype and disease state, which are of medical interest.

DATA CONTENT

GOBASE release 21 (June 2008) contains 913 000 mitochondrial sequences including 737 000 genes, and 250 000 chloroplast-encoded sequences including 174 000 genes, derived mostly from GenBank releases up to 164. The large number of complete organelle genomes available (6300 mitochondrial and 213 chloroplast genomes) makes GOBASE a valuable resource for phylogenomics. This number has increased almost 4-fold since the previous report. More recently (5), we have added bacterial genome sequences for reference purposes. As of release 21, GOBASE includes three complete bacterial genomes: Escherichia coli K12; the alpha-proteobacterium Rickettsia prowazekii strain Madrid E, closely related to the bacterial ancestor of mitochondria; and the cyanobacterium Nostoc sp., closely related to the bacterial ancestor of chloroplasts. To provide a consistent comparative view of these genomes, each has been reannotated using the AutoFACT functional annotation tool (8), including assignment of Gene Ontology terms. GOBASE now contains 10 700 bacterial genes in total.

ENHANCEMENTS TO FUNCTIONALITY

RNA editing

RNA editing refers to a molecular process by which the sequence of a transcribed RNA is modified. It has been observed in the mitochondria of several eukaryotic taxa, such as plants (9) and trypanosomes (10), and in chloroplasts (11). The database contains examples of three basic types of change: substitution of one residue for another, deletion of residues, and addition of residues, usually uracil. The RNA editing interface in GOBASE is based primarily on the previously existing RNA query page, with the addition of editing-specific selection parameters such as the type of modification (insertion, deletion or substitution). A query result is shown in Figure 1. In addition to the sequence itself, edited positions are displayed both as a list specifying the exact change made at each position and marked in red on an alignment of the relevant sections of sequence, giving a straightforward and intuitive visual representation. The interface displays only the regions of the sequence where editing occurs. Coding and intronic regions of the sequence are distinguished by background color. Complete unedited and edited sequences can be downloaded from the interface page. Future development will include the possibility of downloading the sequence alignment as displayed, and the addition of multiple rows to the alignment in cases where edits to a sequence are known to occur sequentially, so that observed intermediate stages in the editing process can be represented.
Figure 1. RNA editing result page, showing sequence-specific data, location of edited positions and alignment of gene sequence with edited sequence. Hyperlinks lead to database pages for details of appropriate Gene Product, Taxonomy, Sequence and Gene, and to the Entrez page for the appropriate gi. Start and end positions of the gene, and locations of edited positions, are numbered relative to the start of the sequence entry containing the gene.
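
The list of edited positions described above can be approximated from a pairwise alignment. The following is a minimal sketch, not the GOBASE implementation: it assumes the gene sequence and the edited sequence are already aligned to equal length, that gaps are written as `-`, and that both rows use the same alphabet (for example, T rather than U throughout). The names `Edit` and `list_edits` are hypothetical.

```python
from typing import List, NamedTuple


class Edit(NamedTuple):
    position: int  # 1-based position in the ungapped gene sequence
    kind: str      # "substitution", "insertion" or "deletion"
    genomic: str   # residue in the gene sequence ("-" for an insertion)
    edited: str    # residue in the edited sequence ("-" for a deletion)


def list_edits(genomic_aln: str, edited_aln: str) -> List[Edit]:
    """Classify every differing column of a gapped pairwise alignment.

    For an insertion, `position` is the last gene residue before the
    inserted one (0 if the insertion precedes the first residue).
    """
    if len(genomic_aln) != len(edited_aln):
        raise ValueError("aligned sequences must have equal length")
    edits = []
    pos = 0  # number of gene residues consumed so far
    for g, e in zip(genomic_aln.upper(), edited_aln.upper()):
        if g != "-":
            pos += 1
        if g == e:
            continue  # unchanged column (or a gap in both rows)
        if g == "-":
            kind = "insertion"   # residue present only after editing
        elif e == "-":
            kind = "deletion"    # residue removed by editing
        else:
            kind = "substitution"
        edits.append(Edit(pos, kind, g, e))
    return edits


# Toy alignment, invented for illustration only.
for edit in list_edits("ACG-TCA", "ATGTT-A"):
    print(edit.position, edit.kind, edit.genomic, edit.edited)
# prints:
# 2 substitution C T
# 3 insertion - T
# 5 deletion C -
```

Positions here count from the start of the aligned gene fragment; numbering relative to the containing sequence entry, as in Figure 1, would require adding the gene's start offset.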

Human-specific data

Information specific to the ∼3000 complete human mitochondrial genome sequences in GOBASE has been added from a number of sources, including HmtDB (http://www.hmtdb.uniba.it/) (12), OMIM (http://www.ncbi.nlm.nih.gov/omim/) (13) and MitoMap (http://www.mitomap.org/) (14). Two different interface pages provide access to these new data. The Human Sequence query page allows the user to select a set of human mitochondrial sequences based on haplogroup and disease state. More than 450 different haplogroup assignments are available in GOBASE, so a full list might become unwieldy for some queries. As haplogroup designators always start with a letter, the user is offered the option of first selecting an initial letter or letters, and then picking a range of individual haplogroups from the corresponding subset of haplogroup assignments shown in a menu. The results page (Figure 2) provides relevant information from the standard GOBASE Sequence page, and also uses an alignment to show all the positions at which this sequence differs from the reference human mitochondrial genome as defined in GenBank (accession no. NC_001807). On this alignment, mutations that have been associated with disease are marked in yellow, and other polymorphic mutations are indicated in red.
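
The two-step haplogroup selection described above (initial letter first, then a menu of matching haplogroups) can be sketched as follows. This is an illustrative sketch only; the function names `group_by_initial` and `submenu` are hypothetical, and the haplogroup labels in the example are a small hand-picked sample rather than the list used by GOBASE.

```python
from typing import Dict, Iterable, List


def group_by_initial(haplogroups: Iterable[str]) -> Dict[str, List[str]]:
    """Index haplogroup designators by their (upper-cased) first letter."""
    menu: Dict[str, List[str]] = {}
    for name in haplogroups:
        name = name.strip()
        if not name:
            continue  # ignore blank entries
        menu.setdefault(name[0].upper(), []).append(name)
    for names in menu.values():
        names.sort()
    return menu


def submenu(menu: Dict[str, List[str]], initials: str) -> List[str]:
    """Return the haplogroups offered after the user picks initial letters."""
    selected: List[str] = []
    for letter in sorted(set(initials.upper())):
        selected.extend(menu.get(letter, []))
    return selected


# Illustrative sample of haplogroup labels (not the GOBASE list).
menu = group_by_initial(["U5", "H2a", "J1c", "H1", "L3"])
print(submenu(menu, "hj"))  # prints ['H1', 'H2a', 'J1c']
```

Restricting the menu by initial letter keeps each list short even when several hundred assignments exist in total.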
Figure 2. Human sequence result page, showing the difference between the queried sequence and the reference human mitochondrial genome sequence, both as a list of divergent positions and as an alignment of relevant sections of the sequences.

The Human Mutation query page (Figure 3a) allows the user to search the dataset for mutations of interest within a specified range of positions on the human mitochondrial genome sequence, either by specifying start and end positions directly or by selecting one or more genes from a list on the interface. This search returns a list of positions at which mutations are documented. For each mutation (Figure 3b), the result page provides data on its disease associations, a section of the reference sequence showing the location and neighborhood of the mutation, and a list of the sequences in GOBASE containing this mutation.
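
The two ways of restricting a mutation search (an explicit start and end position, or one or more selected genes) both reduce to a range lookup over documented positions. The sketch below assumes the positions are held in a sorted list and that gene boundaries are 1-based and inclusive; the function names, the gene names `geneA` and `geneB`, and all coordinates are invented for illustration and do not correspond to real annotations.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Sequence, Tuple


def positions_in_range(sorted_positions: Sequence[int],
                       start: int, end: int) -> List[int]:
    """Return documented positions p with start <= p <= end."""
    if start > end:
        raise ValueError("start must not exceed end")
    lo = bisect_left(sorted_positions, start)
    hi = bisect_right(sorted_positions, end)
    return list(sorted_positions[lo:hi])


def positions_in_genes(sorted_positions: Sequence[int],
                       gene_ranges: Dict[str, Tuple[int, int]],
                       genes: Sequence[str]) -> List[int]:
    """Return documented positions falling inside any selected gene."""
    hits = set()
    for gene in genes:
        start, end = gene_ranges[gene]
        hits.update(positions_in_range(sorted_positions, start, end))
    return sorted(hits)  # overlapping genes do not produce duplicates


# Hypothetical data, for illustration only.
documented = [150, 3010, 3500, 4200, 8860]
gene_ranges = {"geneA": (3000, 4000), "geneB": (3900, 5000)}

print(positions_in_range(documented, 3000, 4200))
# prints [3010, 3500, 4200]
print(positions_in_genes(documented, gene_ranges, ["geneA", "geneB"]))
# prints [3010, 3500, 4200]
```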
Figure 3. (a) Human mutation query page, allowing the user to select the gene(s) of interest and specify the range of positions on the sequence to search for mutations. (b) Result page showing details for an individual mutation.

Other functional enhancements

The DNA sequence download functionality has been modified to allow the user to download either genomic sequence or gene-coding regions, selectable via buttons on the Gene query page. In a small number of unusual cases, such as trans-spliced genes, there is no straightforward correspondence between a single gene and a contiguous linear region of the source sequence record. The GOBASE database structure has now been modified to address these cases transparently: sequences of complex gene-coding regions are assembled in advance, stored, and made available in query results through the same interface as conventional linear genes.

All sequences retrieved from GOBASE now come with detailed literature references derived from the source GenBank records. Journal, author and title are provided, along with a direct link to the appropriate PubMed entry if one exists.

Because of practical constraints, any given query in GOBASE returns at most 5000 results. Users wishing to run custom queries that retrieve larger amounts of data are invited to contact the GOBASE team at gobase@bch.umontreal.ca so that the query can be run directly on the database via SQL.
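
For the trans-spliced genes mentioned above, assembling a gene-coding region in advance amounts to concatenating several segments that may lie on different strands or in different sequence records. The following sketch shows one way this could be done; it is not the GOBASE code. The `Segment` type, the `assemble` function, the record name `rec1` and the toy sequence are all hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
from typing import Dict, List, NamedTuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


class Segment(NamedTuple):
    record: str  # identifier of the source sequence record
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: int  # +1 for forward, -1 for reverse complement


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string over A, C, G and T."""
    return seq.translate(_COMPLEMENT)[::-1]


def assemble(records: Dict[str, str], segments: List[Segment]) -> str:
    """Concatenate segments, in the order given, into one coding sequence."""
    parts = []
    for seg in segments:
        source = records[seg.record]
        if not (1 <= seg.start <= seg.end <= len(source)):
            raise ValueError(f"segment out of bounds: {seg}")
        piece = source[seg.start - 1:seg.end]
        if seg.strand == -1:
            piece = reverse_complement(piece)
        elif seg.strand != 1:
            raise ValueError("strand must be +1 or -1")
        parts.append(piece)
    return "".join(parts)


# Toy record and segment layout, invented for illustration.
records = {"rec1": "ATGCCCGGGTAA"}
segments = [Segment("rec1", 1, 3, 1), Segment("rec1", 10, 12, -1)]
print(assemble(records, segments))  # prints ATGTTA
```

Storing the assembled string once, rather than rebuilding it per query, is what lets such genes be served through the same interface as contiguous ones.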

IMPLEMENTATION

The GOBASE database is implemented in version 7.4.1 of the PostgreSQL relational database management system with a web interface written in v4.3.8 of the PHP scripting language. The graphics on the gene pages are generated using the GD module for Perl/PHP, version 2.0.25. Perl (5.8.0) scripts are used to download data from GenBank and process it into GOBASE. All procedures are executed on PCs with two 2.4 GHz or 2.8 GHz Intel Xeon CPUs.

FUTURE PLANS

Specialized databases with all their valuable information are prone to disappearance (15), mostly because of funding constraints, unless transferred to sustainable public databases. We are therefore collaborating with scientists at NCBI to establish a database based on the content of GOBASE as an auxiliary to GenBank. This database will focus on the additional data that expert curation at GOBASE has generated, notably the curated gene and product names and synonyms and RNA secondary structure data, thus providing a permanent repository for two decades of curation of organelle genome data.

FUNDING

This project was funded by grants MOP-15331 and MOP-84453 from the Canadian Institutes of Health Research (CIHR, Genetics Institute). Funding for open access charge: CIHR.

Conflict of interest statement. None declared.
REFERENCES (10 of 15 listed)

B F Lang, M W Gray, G Burger. Mitochondrial genome evolution and the origin of eukaryotes. Annu Rev Genet. 1999. Review.

Gertraud Burger, Michael W Gray, B Franz Lang. Mitochondrial genomes: anything goes. Trends Genet. 2003-12. Review.

Emmet A O'Brien, Elarbi Badidi, Ania Barbasiewicz, Cristina deSousa, B Franz Lang, Gertraud Burger. GOBASE--a database of mitochondrial and chloroplast information. Nucleic Acids Res. 2003-01-01.

Zeeya Merali, Jim Giles. Databases in peril. Nature. 2005-06-23.

B Hoch, R M Maier, K Appel, G L Igloi, H Kössel. Editing of a chloroplast mRNA by creation of an initiation codon. Nature. 1991-09-12.

M Korab-Laskowska, P Rioux, N Brossard, T G Littlejohn, M W Gray, B F Lang, G Burger. The Organelle Genome Database Project (GOBASE). Nucleic Acids Res. 1998-01-01.

P S Covello, M W Gray. RNA editing in plant mitochondria. Nature. 1989-10-19.

R Benne, J Van den Burg, J P Brakenhoff, P Sloof, J H Van Boom, M C Tromp. Major transcript of the frameshifted coxII gene from trypanosome mitochondria contains four nucleotides that are not encoded in the DNA. Cell. 1986-09-12.

Liisa B Koski, Michael W Gray, B Franz Lang, Gertraud Burger. AutoFACT: an automatic functional annotation and classification tool. BMC Bioinformatics. 2005-06-16.

Dennis A Benson, Ilene Karsch-Mizrachi, David J Lipman, James Ostell, David L Wheeler. GenBank. Nucleic Acids Res. 2007-12-11.
