The Sulfolobus database.

Abstract

The Sulfolobus database (http://www.sulfolobus.org) integrates, for the first time, all currently available Sulfolobus chromosome sequences with annotations. It also includes all the sequence data for the extrachromosomal elements which can propagate in Sulfolobus organisms. All genomes and annotations deposited in GenBank are included in the database, and a genefinder has been run on the sequences to ensure that all potential genes are present, and identifiable, in the database. Every month, all genes are searched against a range of external databases and new results are incorporated. The Sulfolobus database was developed as an asset to the rapidly growing international community working with Sulfolobus as a model organism for the kingdom Crenarchaeota of the Archaea. It was accessed more than 46 000 times in its first year. The database aims to give researchers easy access to sequence and gene information, and the web interface includes various searches (free text and BLAST) as well as genome browsing and data extraction. Updated annotations are incorporated regularly and the database will continue to expand as new information becomes available, including new sequences, newly identified genes, annotations and other related information.

Year: 2006 PMID: 17088281 PMCID: PMC1669736 DOI: 10.1093/nar/gkl847

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Organisms from the Sulfolobus genus have been selected by the international research community as model organisms for investigating the biology of the Crenarchaeota, one of the two major kingdoms of the Archaea. Organisms belonging to the Sulfolobus genus are all aerobic and grow optimally at around 80°C and pH 2–4. Much of our current knowledge of archaeal and crenarchaeal mechanisms involved in the cell cycle (1), DNA replication (2), DNA repair (3) and RNA processing (4) derives from studies on Sulfolobus species.
The database currently contains three fully sequenced Sulfolobus chromosomes [Sulfolobus acidocaldarius (5), Sulfolobus solfataricus (6) and Sulfolobus tokodaii (7)] as well as many sequenced extrachromosomal elements (plasmids and viruses) that can propagate in Sulfolobus. The plasmids are either cryptic or conjugative, and the viruses have been classified into seven new viral families (8). Including the extrachromosomal elements allows genome comparisons to be made between them and the chromosomes. Sequence comparison between extrachromosomal elements is a very useful approach for understanding how they evolve and which genes are mandatory for their propagation and spread. Furthermore, it is useful for identifying and analysing the integration of extrachromosomal elements into genomes.
New Sulfolobus sequences provided by the international community will be incorporated into the database as they become available. We are currently finishing two genomes: Sulfolobus islandicus and Acidianus brierleyi. The latter is phylogenetically close to Sulfolobus although it is anaerobic. Eight additional S.islandicus genomes are being sequenced (9), and these will be included when available. Moreover, several newly isolated viruses are being sequenced in our laboratory, all of which will be integrated into the database.
New genes, corrections and/or new annotations will be added as this information becomes available, or upon request from other researchers. By providing these services we will ensure that all the latest corrections to annotations, and functional assignments of proteins previously classified as hypothetical, are available in one place, so that researchers do not have to search several databases for this information.

MATERIALS AND METHODS

MUTAGEN (10) was originally developed as an annotation tool for the S.acidocaldarius genome, but after the genome was published, the system was further developed into a sequence database and made publicly available. The system is backed by a relational database which can be accessed through a simple, yet comprehensive, web interface. This allows users to mine and visualize the information in the database. The Sulfolobus database is intended to be easy to use, yet powerful enough to perform advanced analyses of the data it contains. All aspects of the system are free to the public. Moreover, all the data in the database can be extracted, either fully or partially, in various formats. The whole system, with programs and database, can be acquired upon request, so that other laboratories can install it locally with an additional selection of their own sequences.
The main sequence browser shows all genes and features which are annotated in the selected sequence (Figure 1a). Features include repeated regions and other structurally interesting genomic regions. First the user selects which sequence should be displayed. It is always possible to zoom in and out, or to select a different region of the sequence to be displayed. The user can then select either a gene or a feature by clicking on it. Data associated with the selected gene will be displayed, including the name and position of the gene and the strand on which it is encoded. The annotation originating from the GenBank file is displayed together with results from pre-computed analyses and searches against various databases. The user can extract the gene, either as DNA or protein sequence, or extract the gene together with its flanking sequence.
Moreover, it is possible to open a new browser which displays the sequence at the DNA level, for analysis of upstream and downstream regions or to check that a correct start codon has been assigned. Currently the sequences are analysed against GenBank (11), COG (12), SwissProt (13), Pfam (14) and KEGG pathways (15). For the latter, a pathway map can be displayed where all the genes involved in the pathway are highlighted and can be accessed in the database. Potential transmembrane helices and signal peptides are predicted using TMHMM (16) and SignalP (17), respectively. When possible, the results give access to detailed information by displaying the relevant pages provided by the external public databases (e.g. SwissProt, GenBank, etc.) from which the data originated. Since comparative analyses of both genes and genomes depend on the data being regularly updated, all the pre-computed searches against external databases are updated monthly. This ensures that the newest available data are accessible to the users.
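The gene extraction and start-codon check described above can be illustrated with a minimal Python sketch. It is not the database's own code: the `Gene` record, the function names, the 1-based inclusive coordinates and the default set of start codons (ATG, GTG, TTG) are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Complement table for unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


@dataclass
class Gene:
    """Hypothetical gene record; coordinates refer to the forward strand."""
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def extract_gene(genome: str, gene: Gene, flank: int = 0) -> str:
    """Return the gene in its coding orientation, with up to `flank`
    bases on each side (clipped at the ends of the sequence)."""
    left = max(gene.start - 1 - flank, 0)
    right = min(gene.end + flank, len(genome))
    region = genome[left:right]
    return reverse_complement(region) if gene.strand == "-" else region


def has_start_codon(genome: str, gene: Gene,
                    start_codons=("ATG", "GTG", "TTG")) -> bool:
    """Check that the gene begins with one of the assumed start codons."""
    return extract_gene(genome, gene)[:3].upper() in start_codons


if __name__ == "__main__":
    genome = "CCCATGAAATAGGGG"                # toy sequence, not real data
    gene = Gene("toy_gene", start=4, end=12, strand="+")
    print(extract_gene(genome, gene))           # gene only
    print(extract_gene(genome, gene, flank=2))  # gene plus 2 bases each side
    print(has_start_codon(genome, gene))
```

Returning the reverse complement for genes on the minus strand keeps the output in coding orientation, so the same start-codon check works for both strands.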

Figure 1

The database has a set of basic pages that makes it possible to navigate and display database entries in various ways. (A) This is the main sequence browser where the user can select genes and display information based on searches against external databases. (B) A full alignment of three different viruses. Homologous genes are coloured identically so that conserved genes and operons can be identified easily. A similar picture can be obtained showing the genetic neighbourhood of genes between all sequences in the database.

Families of homologous proteins provide an important basis for evaluating conservation, and gain or loss, of gene function in a genome, especially when these data can be used to visualize the genomic neighbourhood of genes of interest. These data are made accessible by identifying all homologous genes in the database using tribeMCL (18). Thus, orthologous and paralogous protein genes can readily be identified both within a given sequence and within the rest of the database. They can also be supplemented by showing a graphical alignment of genes surrounding a gene of interest or, when working with the extrachromosomal elements, by graphically aligning the full sequences (Figure 1b). In this way, it is possible to identify and analyse genomic neighbourhoods showing conserved operons and sequence regions. This is especially useful when analysing the evolution of plasmids and viruses or the occurrence of chromosomally integrated elements. It is also possible to make multiple alignments of all, or a selection, of the homologous protein genes, or to export the protein sequences for local computations.
The web interface provides a set of forms through which the user can easily query the annotation and pre-computed analysis data.
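The idea of spotting conserved gene neighbourhoods from family assignments can be sketched as follows. This is only an illustration under stated assumptions: each sequence is represented as an ordered list of family identifiers (assumed to have been computed already, for example by a clustering tool such as tribeMCL), and the function names and family labels are hypothetical. The clustering step itself is not reproduced here.

```python
from typing import FrozenSet, List, Set, Tuple


def adjacent_family_pairs(families: List[str]) -> Set[FrozenSet[str]]:
    """Unordered pairs of distinct families carried by neighbouring genes.

    Using unordered pairs means a neighbourhood is still recognised when
    the region is inverted in the other sequence.
    """
    return {
        frozenset((left, right))
        for left, right in zip(families, families[1:])
        if left != right
    }


def conserved_neighbours(seq_a: List[str],
                         seq_b: List[str]) -> List[Tuple[str, ...]]:
    """Family pairs that are adjacent in both sequences, sorted for display."""
    shared = adjacent_family_pairs(seq_a) & adjacent_family_pairs(seq_b)
    return sorted(tuple(sorted(pair)) for pair in shared)


if __name__ == "__main__":
    # Toy gene orders given as family labels (hypothetical, not real data).
    virus_a = ["F1", "F2", "F3", "F9"]
    virus_b = ["F7", "F3", "F2", "F1"]
    print(conserved_neighbours(virus_a, virus_b))
```

Runs of shared adjacent pairs are candidates for conserved operons; a graphical alignment presents the same information by colouring members of one family identically.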
Genes of interest can be identified by performing text searches in the imported annotations and/or the data obtained from the external database searches, making it easy to identify all genes with a specific domain or annotated function. It is also possible to perform BLAST searches against sequences in the Sulfolobus database, searching both DNA sequences and annotated proteins. All newly published prokaryotic genomes are added to the database automatically, so the user can compare the Sulfolobus sequences with those of other microorganisms. When querying with protein sequences, the user can access the sequence browser along with relevant information, as described above.
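The free-text search over imported annotations and external search results can be pictured with a small Python sketch. The record layout (an `id` key plus one text field per data source) and the function name are assumptions for illustration only; the actual database answers such queries through its relational back end.

```python
from typing import Dict, Iterable, List, Optional, Sequence


def search_annotations(genes: Iterable[Dict[str, str]],
                       term: str,
                       fields: Optional[Sequence[str]] = None) -> List[str]:
    """Return the ids of genes whose annotation text contains `term`.

    The match is a case-insensitive substring test. If `fields` is given,
    only those fields are searched; otherwise every field except "id" is.
    """
    needle = term.lower()
    hits = []
    for gene in genes:
        for key, value in gene.items():
            if key == "id":
                continue
            if fields is not None and key not in fields:
                continue
            if needle in value.lower():
                hits.append(gene["id"])
                break  # report each gene once
    return hits


if __name__ == "__main__":
    # Hypothetical records, not taken from the database.
    records = [
        {"id": "geneA", "genbank": "hypothetical protein", "pfam": "Helicase domain"},
        {"id": "geneB", "genbank": "DNA helicase", "pfam": ""},
        {"id": "geneC", "genbank": "transposase", "pfam": "Transposase domain"},
    ]
    print(search_annotations(records, "helicase"))
    print(search_annotations(records, "helicase", fields=["genbank"]))
```

Searching across all sources at once is what lets a gene annotated as hypothetical in GenBank still be found through a domain hit from another database.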

DISCUSSION

The Sulfolobus database is a powerful research tool that has been the cornerstone of our genome analyses. It has enabled us to compare and analyse sequences from newly sequenced genomes, regardless of whether the sequences are chromosomal, plasmid or viral. The ability to readily identify homologous genes and to compare gene orders facilitates the identification of potentially important genes and operons. Furthermore, it is straightforward to make genomic comparisons between any, or all, of the genomes in the database, revealing conserved gene clusters and genetic neighbourhoods. The operating principles of the Sulfolobus database are to assist and respond to the users' needs, and to capitalize on the extensive efforts of the Sulfolobus research community by providing an important and useful data resource. Most importantly, the database is interactive and has been established as a central facility where researchers can add, correct and/or update gene annotations, thereby ensuring that it will be the site with the most up-to-date annotations and gene predictions for the rapidly growing Sulfolobus community.

17 in total

1. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes.

Authors: A Krogh; B Larsson; G von Heijne; E L Sonnhammer
Journal: J Mol Biol Date: 2001-01-19 Impact factor: 5.469

2. The complete genome of the crenarchaeon Sulfolobus solfataricus P2.

Authors: Q She; R K Singh; F Confalonieri; Y Zivanovic; G Allard; M J Awayez; C C Chan-Weiher; I G Clausen; B A Curtis; A De Moors; G Erauso; C Fletcher; P M Gordon; I Heikamp-de Jong; A C Jeffries; C J Kozera; N Medina; X Peng; H P Thi-Ngoc; P Redder; M E Schenk; C Theriault; N Tolstrup; R L Charlebois; W F Doolittle; M Duguet; T Gaasterland; R A Garrett; M A Ragan; C W Sensen; J Van der Oost
Journal: Proc Natl Acad Sci U S A Date: 2001-06-26 Impact factor: 11.205

3. MUTAGEN: multi-user tool for annotating genomes.

Authors: Kim Brügger; Peter Redder; Marie Skovgaard
Journal: Bioinformatics Date: 2003-12-12 Impact factor: 6.937

Review 4. Archaeal DNA repair: paradigms and puzzles.

Authors: M F White
Journal: Biochem Soc Trans Date: 2003-06 Impact factor: 5.407

5. Protein families and TRIBES in genome sequence space.

Authors: Anton J Enright; Victor Kunin; Christos A Ouzounis
Journal: Nucleic Acids Res Date: 2003-08-01 Impact factor: 16.971

6. Improved prediction of signal peptides: SignalP 3.0.

Authors: Jannick Dyrløv Bendtsen; Henrik Nielsen; Gunnar von Heijne; Søren Brunak
Journal: J Mol Biol Date: 2004-07-16 Impact factor: 5.469

Review 7. Viruses of hyperthermophilic Crenarchaea.

Authors: David Prangishvili; Roger A Garrett
Journal: Trends Microbiol Date: 2005-09-08 Impact factor: 17.079

8. The genome of Sulfolobus acidocaldarius, a model organism of the Crenarchaeota.

Authors: Lanming Chen; Kim Brügger; Marie Skovgaard; Peter Redder; Qunxin She; Elfar Torarinsson; Bo Greve; Mariana Awayez; Arne Zibat; Hans-Peter Klenk; Roger A Garrett
Journal: J Bacteriol Date: 2005-07 Impact factor: 3.490

9. Complete genome sequence of an aerobic thermoacidophilic crenarchaeon, Sulfolobus tokodaii strain7.

Authors: Y Kawarabayasi; Y Hino; H Horikawa; K Jin-no; M Takahashi; M Sekine; S Baba; A Ankai; H Kosugi; A Hosoyama; S Fukui; Y Nagai; K Nishijima; R Otsuka; H Nakazawa; M Takamiya; Y Kato; T Yoshizawa; T Tanaka; Y Kudoh; J Yamazaki; N Kushida; A Oguchi; K Aoki; S Masuda; M Yanagii; M Nishimura; A Yamagishi; T Oshima; H Kikuchi
Journal: DNA Res Date: 2001-08-31 Impact factor: 4.458

10. The COG database: an updated version includes eukaryotes.

Authors: Roman L Tatusov; Natalie D Fedorova; John D Jackson; Aviva R Jacobs; Boris Kiryutin; Eugene V Koonin; Dmitri M Krylov; Raja Mazumder; Sergei L Mekhedov; Anastasia N Nikolskaya; B Sridhar Rao; Sergei Smirnov; Alexander V Sverdlov; Sona Vasudevan; Yuri I Wolf; Jodie J Yin; Darren A Natale
Journal: BMC Bioinformatics Date: 2003-09-11 Impact factor: 3.169
