CyanoBase: the cyanobacteria genome database update 2010.

Mitsuteru Nakao, Shinobu Okamoto, Mitsuyo Kohara, Tsunakazu Fujishiro, Takatomo Fujisawa, Shusei Sato, Satoshi Tabata, Takakazu Kaneko, Yasukazu Nakamura.

Abstract

CyanoBase (http://genome.kazusa.or.jp/cyanobase) is the genome database for cyanobacteria, which are model organisms for photosynthesis. The database houses cyanobacteria species information, complete genome sequences, genome-scale experiment data, gene information, gene annotations and mutant information. In this version, we updated these datasets and improved the navigation and the visual display of the data views. In addition, a new web service API enables users to retrieve the data in various formats for seamless use with other tools.

Year: 2009 PMID: 19880388 PMCID: PMC2808859 DOI: 10.1093/nar/gkp915

Journal: Nucleic Acids Res ISSN: 0305-1048

INTRODUCTION

Cyanobacteria are prokaryotes that have served as important model organisms for studying oxygenic photosynthesis and have played a significant role in the Earth’s history as primary producers of atmospheric oxygen. Synechocystis sp. PCC 6803 was the first cyanobacterium to have its genome sequenced, in 1996 (1). CyanoBase is a comprehensive and freely accessible web database of cyanobacteria genome information; the data and the web site are licensed under the Creative Commons CC0 public domain license. The database contains the entire 3.9 Mb genome sequence of Synechocystis sp. PCC 6803 in six circular genomic molecules (the chromosome and plasmids), with a total of 3725 genes. CyanoBase was developed as the genome database not only for Synechocystis sp. PCC 6803 but also for other cyanobacteria species (2,3). CyanoGenes/mutants, released in 1998, were designed to facilitate the sharing of information on mutants and manual gene annotations submitted by the research community (4). As a result of several genome sequencing projects involving cyanobacteria species, CyanoBase now includes 35 completely sequenced genomes. In addition, various genome-scale experimental datasets have been produced, including gene expression profiles and protein–protein interaction data. In this update of the database, we have redesigned and improved the user interface to support both experimental biologists and bioinformaticians, who tend to work with emerging genome-scale data. We have made the data easier to navigate by organizing the navigation around the hierarchical nature of the data. We have also developed a new keyword search system that lets users reach detailed information about the genome. To improve the reusability of the data, we released a web service that enables the data to be processed with other tools. Useful external links were also updated so that CyanoBase serves as an information hub for cyanobacteria genes.

IMPROVED DATA ACCESSIBILITY

CyanoBase provides access paths to the genome and gene information for cyanobacteria. To improve the accessibility of the data, we reorganized it. CyanoBase consists of pages for viewing each genome resource, including genome projects (DataSetView), an individual genome project (SpeciesView), individual chromosomes (MapView), multiple genes (GenesView), gene function classification (GeneCategoryView), word clouds (WordCloudView), search results (SearchView) and individual genes (GeneView). Pages are linked according to the hierarchy and connectivity of the data, and the navigation follows the way biologists structure genome information. The GeneView page can be reached via multiple paths from the species page, including the (i) chromosome circle map (MapView), (ii) gene list (GenesView), (iii) function classification (GeneCategoryView) and (iv) search results (SearchView). The hierarchical relationship is displayed on every page as a breadcrumb trail in the header. URLs were designed to correspond to the hierarchical navigation. For instance, the URL of the slr1311 GeneView page for Synechocystis sp. PCC 6803 is http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/slr1311, in which each part of the path refers to a step in the hierarchy: the data source name (Synechocystis), the scope name (genes) and the gene ID (slr1311). The word cloud is generated automatically from the text descriptions of a gene set to give a quick visual outline of those descriptions. This view helps users to summarize the gene set and explore related gene sets. Word frequencies are summed over the descriptions of all genes in the selected set, so the resulting word cloud captures the character of that set (Figure 1).
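The hierarchical URL scheme can be illustrated with a few helpers that build, split and walk such paths. This is a minimal sketch: the function names are hypothetical, it assumes only the three-level pattern (data source, scope, ID) shown in the slr1311 example, and the breadcrumb helper further assumes that every path prefix is itself a page. It is not a documented CyanoBase API.

```python
from urllib.parse import urlsplit

# Base URL from the slr1311 example; all helper names below are hypothetical.
BASE = "http://genome.kazusa.or.jp/cyanobase"

def gene_view_url(source: str, scope: str, gene_id: str) -> str:
    """Join the hierarchy levels (data source, scope, ID) into one URL."""
    return f"{BASE}/{source}/{scope}/{gene_id}"

def split_hierarchy(url: str) -> tuple[str, str, str]:
    """Recover (source, scope, ID) from a three-level CyanoBase path."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    # Expected shape: ['cyanobase', source, scope, gene_id]
    if len(parts) != 4 or parts[0] != "cyanobase":
        raise ValueError(f"not a three-level CyanoBase path: {url}")
    _, source, scope, gene_id = parts
    return source, scope, gene_id

def breadcrumbs(url: str) -> list[str]:
    """List each path prefix, assuming every hierarchy level is its own page."""
    trail, path = [], BASE
    for part in split_hierarchy(url):
        path = f"{path}/{part}"
        trail.append(path)
    return trail

url = gene_view_url("Synechocystis", "genes", "slr1311")
print(url)                   # http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/slr1311
print(split_hierarchy(url))  # ('Synechocystis', 'genes', 'slr1311')
```

Because each level of the path maps to one navigation step, the breadcrumb trail can be derived from the URL alone, without a separate lookup.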

Figure 1.

Word cloud links of gene annotations. The display shows the results of a SearchView search for the keyword ‘ABC’ among Synechocystis sp. PCC 6803 genes. The size of each word corresponds to the number of times it appears in the results.

We also improved the keyword search feature. In a full-text search, users can now select a target data scope. The search targets include gene symbols, definitions, function classifications, descriptions in automatic annotations and information on mutants.
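Scope-restricted keyword search can be sketched as choosing which fields to match before testing for the keyword. The record layout, the field names in `SCOPES` and the toy records below are hypothetical stand-ins for the search targets listed above; the actual CyanoBase search index is not described in the text.

```python
from dataclasses import dataclass, field

# Hypothetical field names standing in for the search targets listed above.
SCOPES = ("symbol", "definition", "category", "annotation", "mutant")

@dataclass
class GeneRecord:
    gene_id: str
    fields: dict[str, str] = field(default_factory=dict)

def search(records, keyword, scopes=SCOPES):
    """Return IDs of genes whose selected fields contain the keyword (case-insensitive)."""
    unknown = set(scopes) - set(SCOPES)
    if unknown:
        raise ValueError(f"unknown scope(s): {sorted(unknown)}")
    kw = keyword.lower()
    return [
        r.gene_id
        for r in records
        if any(kw in r.fields.get(s, "").lower() for s in scopes)
    ]

# Toy records; the IDs and texts are made up for illustration.
records = [
    GeneRecord("gene1", {"definition": "ABC transporter ATP-binding protein"}),
    GeneRecord("gene2", {"definition": "hypothetical protein", "mutant": "ABC-deficient strain"}),
]
print(search(records, "abc"))                          # ['gene1', 'gene2']
print(search(records, "abc", scopes=("definition",)))  # ['gene1']
```

Restricting the scope narrows the hit list: the mutant note on `gene2` matches only when the mutant field is included in the search.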

IMPROVED DATA REPRESENTATION

In this update, we introduced several new data representations to improve viewing of and navigation through data.

Genome context

A graphical image of the genomic context, generated by Gbrowse (5), shows the length, direction and function of the gene and its surrounding genes. It also provides navigation links to the neighboring genes on the GeneView page.
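Selecting the surrounding genes for such a context view amounts to an interval-overlap query around the focal gene. The sketch below assumes a plain list of features with 1-based inclusive coordinates and a fixed flanking window; the data layout, the `flank` default and the toy coordinates are illustrative assumptions, not Gbrowse or CyanoBase settings. It also ignores wrap-around across the origin of a circular chromosome.

```python
from typing import NamedTuple

class Feature(NamedTuple):
    gene_id: str
    start: int   # 1-based, inclusive
    end: int     # inclusive
    strand: str  # '+' or '-'

def genome_context(features, gene_id, flank=5000):
    """Return features overlapping [start - flank, end + flank] of the focal gene, sorted by start.

    Linear coordinates only: wrap-around on circular molecules is not handled.
    """
    focal = next((f for f in features if f.gene_id == gene_id), None)
    if focal is None:
        raise KeyError(gene_id)
    lo, hi = focal.start - flank, focal.end + flank
    return sorted(
        (f for f in features if f.end >= lo and f.start <= hi),
        key=lambda f: f.start,
    )

# Toy coordinates, made up for illustration.
feats = [
    Feature("g1", 100, 900, "+"),
    Feature("g2", 1000, 2500, "-"),
    Feature("g3", 9000, 9800, "+"),
]
print([f.gene_id for f in genome_context(feats, "g2", flank=1000)])  # ['g1', 'g2']
```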

TableView

TableView provides an enhanced user interface that is sortable by column and contains related links. The tabular representation is suitable for the species list, BLAST hits (homologs) and InterProScan (6) matches. Column sorting makes these tables useful for simple statistics and rankings in intra- and inter-genomic analyses.
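Column sorting of the kind TableView offers can be sketched over rows represented as dictionaries. The column names and values below are hypothetical, loosely modeled on a BLAST-hit table; the sketch only shows sorting by a chosen column in either direction.

```python
from operator import itemgetter

def sort_table(rows, column, descending=False):
    """Return rows sorted by one column.

    Python's sort is stable (also with reverse=True), so rows with equal
    values keep their original relative order.
    """
    return sorted(rows, key=itemgetter(column), reverse=descending)

# Hypothetical BLAST-hit rows; values are made up for illustration.
hits = [
    {"subject": "hitA", "identity": 42.0, "evalue": 1e-30},
    {"subject": "hitB", "identity": 87.5, "evalue": 1e-120},
    {"subject": "hitC", "identity": 61.2, "evalue": 1e-55},
]
print([h["subject"] for h in sort_table(hits, "identity", descending=True)])  # ['hitB', 'hitC', 'hitA']
print([h["subject"] for h in sort_table(hits, "evalue")])                     # ['hitB', 'hitC', 'hitA']
```

Sorting identity in descending order and E-value in ascending order gives the same ranking here, which is the usual pattern for homolog tables.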

Protein domains

An image and a table of predicted InterPro functional domains on the GeneView page let users see the arrangement of the functional domains in a protein at a glance and read their descriptions. The InterPro IDs in the table link to lists of the genes that match the same entry, within the same species or across species in CyanoBase.
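The links from an InterPro ID to matching genes can be modeled as an inverted index from domain IDs to genes, which is then split into same-species and other-species matches. The sketch below uses hypothetical species and gene names and placeholder IDs (`IPR-X`, `IPR-Y`) rather than real InterPro entries; it is not CyanoBase's implementation.

```python
from collections import defaultdict

def build_domain_index(matches):
    """matches: iterable of (species, gene_id, interpro_id).

    Returns a mapping interpro_id -> set of (species, gene_id).
    """
    index = defaultdict(set)
    for species, gene_id, ipr in matches:
        index[ipr].add((species, gene_id))
    return index

def genes_sharing_domain(index, ipr, species):
    """Split the genes carrying `ipr` into hits within `species` and hits in other species."""
    hits = index.get(ipr, set())  # .get() does not insert a key into the defaultdict
    within = sorted(g for s, g in hits if s == species)
    between = sorted((s, g) for s, g in hits if s != species)
    return within, between

# Hypothetical domain matches.
matches = [
    ("speciesA", "a1", "IPR-X"),
    ("speciesA", "a2", "IPR-X"),
    ("speciesB", "b7", "IPR-X"),
    ("speciesB", "b7", "IPR-Y"),
]
idx = build_domain_index(matches)
print(genes_sharing_domain(idx, "IPR-X", "speciesA"))  # (['a1', 'a2'], [('speciesB', 'b7')])
```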

Word clouds

A word cloud image of gene functions, created using Wordle (http://www.wordle.net), provides a summarized view of the gene functions of a species on the SpeciesView page. The image serves as an icon that characterizes the gene functions of the species.
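Wordle handles the layout, but the frequency counting that sizes the words can be sketched as summing token counts over a set of gene descriptions, as described for the gene-set word clouds. The tokenization rule, the stop-word list, the linear size scaling and the example descriptions are all illustrative assumptions.

```python
import re
from collections import Counter

# Illustrative stop words; the list actually used by CyanoBase is not specified.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}

def word_frequencies(descriptions, stop_words=STOP_WORDS):
    """Sum word counts over all descriptions in a gene set."""
    counts = Counter()
    for text in descriptions:
        tokens = re.findall(r"[a-z0-9][a-z0-9-]*", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts

def font_sizes(counts, min_pt=10, max_pt=48):
    """Scale each word's size linearly with its count (one simple choice among many)."""
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {w: min_pt + (c - lo) * (max_pt - min_pt) / span for w, c in counts.items()}

# Made-up descriptions for illustration.
descs = [
    "ABC transporter ATP-binding protein",
    "ABC transporter permease",
    "hypothetical protein",
]
freq = word_frequencies(descs)
print(freq["abc"], freq["permease"])  # 2 1
sizes = font_sizes(freq)
print(sizes["abc"], sizes["permease"])  # 48.0 10.0
```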

NEW DATA AND RESOURCES

The resources present in CyanoBase as of September 2009 are shown in Table 1. The annotations, along with additional genome and gene information, are continually being accumulated. Genome projects on cyanobacteria species have produced 35 completely sequenced genomes. CyanoBase imported the genome information from both GenBank and the original sites of the genome projects.

Table 1.

Data and annotations in CyanoBase (September 2009)

Item	Count
Species (genome projects)	35
Nucleotides	266 584 858
Genes	117 435
Protein-coding genes	114 783
RNA genes	2652
PubMed references for genes	2260
Mutant information (cases, covering 688 genes)	1700
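As a quick consistency check on Table 1, the protein-coding and RNA genes together account for the total gene count, and dividing the nucleotide total by the number of genome projects gives the average amount of sequence per project (chromosomes and plasmids together):

```latex
\begin{aligned}
114\,783 + 2\,652 &= 117\,435 \quad \text{(protein-coding + RNA genes = all genes)}\\
\frac{266\,584\,858\ \text{nt}}{35} &\approx 7.62 \times 10^{6}\ \text{nt per genome project}
\end{aligned}
```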

We also updated the curated resources. First, we updated 301 open reading frames (ORFs) based on comments from the research community: the translation initiation sites of 200 ORFs were corrected, 99 new ORFs were added and 2 ORFs were deleted. Second, we updated the functional annotations of 338 genes based on information registered in CyanoGenes and CYORF. Third, mutant information and curated gene descriptions were collected directly from biologists via CyanoMutants, totaling 1700 cases covering 688 genes. We integrated the mutant information into the GeneView page and released a new summary page for the mutants. Fourth, we added a publication list for each gene: publications describing cyanobacteria genes were curated manually and listed on the GeneView page. This curation effort continues as part of the Gene Indexing project, using KazusaAnnotation, a social genome annotation system (http://a.kazusa.or.jp). Finally, the GeneView page now includes genome-scale experimental data, including protein–protein interaction data for Synechocystis sp. PCC 6803 collected by the yeast two-hybrid method (7).

We added automated annotations for each gene on the GeneView page. These include putative orthologs identified as reciprocal best hits with the BLAST program, protein functional domains predicted by the InterProScan system and Gene Ontology gene associations based on the ipr2go mapping. Users can browse and analyze these data using the TableView. Useful links were added to the external links section of the GeneView page, for example to the web sites for Gclust (8) and MBGD (9) (automated ortholog gene cluster databases) and Fluorome (10) (a database of a large-scale analysis of chlorophyll fluorescence kinetics). Many more external databases can be reached via KEGG/GENES (11) and LinkDB (12).
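The reciprocal-best-hit criterion used for putative orthologs can be sketched from the BLAST results of the two search directions. This simplified version assumes the hits have already been parsed into (query, subject, bit score) tuples and picks each query's best hit by bit score, keeping the first one seen on ties; the gene names are hypothetical, and any E-value or coverage cutoffs CyanoBase may apply are not stated in the text and are omitted here.

```python
def best_hits(hits):
    """hits: iterable of (query, subject, bitscore).

    Returns query -> subject with the highest bit score (first seen wins on ties).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)

# Hypothetical parsed BLAST hits (query, subject, bit score) in both directions.
a_vs_b = [("a1", "b1", 250.0), ("a1", "b2", 90.0), ("a2", "b2", 120.0), ("a3", "b1", 80.0)]
b_vs_a = [("b1", "a1", 245.0), ("b1", "a3", 85.0), ("b2", "a2", 118.0)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1'), ('a2', 'b2')]
```

Here `a3` is excluded: its best hit is `b1`, but `b1`'s best hit back is `a1`, so the pair is not reciprocal.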

WEB SERVICES

CyanoBase provides a URL-based REST web-service interface for reusing data with other tools and computer programs. Data are available in several file formats: tab-delimited text, CSV, FASTA and GFF3. Tools such as Galaxy (13), BioMart (14) and spreadsheet programs can import the data seamlessly. The URLs are marked with orange icons and listed in the KazusaAPI section of the relevant pages. The SearchView page has a special export function for the set of genes in the search results, which users can easily obtain in plain-text format. The same gene-set export function also allows users to obtain the set of genes belonging to a gene category on the GeneCategoryView page. CyanoBase also provides alternative ways to export sequence and gene annotation data. KazusaMart (http://mart.kazusa.or.jp), a BioMart system, can filter and export the data. In addition, a Gbrowse service with web and DAS interfaces can be used to select and export the genome sequence and its features (5).
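Because GFF3 is one of the export formats, a downloaded feature file can be read with a few lines of standard-library code. This is a minimal sketch of a reader for the nine-column layout of the GFF3 specification (tag=value attributes separated by semicolons, comma-separated multiple values, percent-encoded special characters). It does not fetch anything from CyanoBase, since the export URLs are listed on each page's KazusaAPI section and are not reproduced here, and the sample line is made up.

```python
from urllib.parse import unquote

GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")

def parse_gff3(lines):
    """Yield one dict per feature line; stop at an embedded ##FASTA section."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and other directives
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}: {line!r}")
        rec = dict(zip(GFF3_COLUMNS, cols))
        rec["start"], rec["end"] = int(rec["start"]), int(rec["end"])
        attrs = {}
        # Split first, then decode: literal ';', '=' and ',' are percent-encoded in GFF3.
        for pair in filter(None, rec["attributes"].split(";")):
            key, _, value = pair.partition("=")
            attrs[unquote(key)] = [unquote(v) for v in value.split(",")]
        rec["attributes"] = attrs
        yield rec

sample = [
    "##gff-version 3",
    "chr1\texample\tgene\t100\t900\t.\t+\t.\tID=gene1;Name=demo%20gene",
]
feat = next(parse_gff3(sample))
print(feat["type"], feat["start"], feat["end"], feat["attributes"]["Name"])  # gene 100 900 ['demo gene']
```

Since `parse_gff3` accepts any iterable of lines, the same function works on an open file object or on the text of an HTTP response split into lines.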

FUNDING

Kazusa DNA Research Institute Foundation. Funding for open access charge: Kazusa DNA Research Institute Foundation. Conflict of interest statement. None declared.

REFERENCES

Y Nakamura, T Kaneko, S Tabata. CyanoBase, the genome database for Synechocystis sp. strain PCC6803: status for the year 2000. Nucleic Acids Res. 2000.

Lincoln D Stein, Christopher Mungall, ShengQiang Shu, Michael Caudy, Marco Mangone, Allen Day, Elizabeth Nickerson, Jason E Stajich, Todd W Harris, Adrian Arva, Suzanna Lewis. The generic genome browser: a building block for a model organism system database. Genome Res. 2002.

Hiroshi Ozaki, Masahiko Ikeuchi, Teruo Ogawa, Hideya Fukuzawa, Kintake Sonoike. Large-scale analysis of chlorophyll fluorescence kinetics in Synechocystis sp. PCC 6803: identification of the factors involved in the modulation of photosystem stoichiometry. Plant Cell Physiol. 2007.

Naoki Sato. Gclust: trans-kingdom classification of proteins using automatic individual threshold setting. Bioinformatics. 2009.

T Kaneko, S Sato, H Kotani, A Tanaka, E Asamizu, Y Nakamura, N Miyajima, M Hirosawa, M Sugiura, S Sasamoto, T Kimura, T Hosouchi, A Matsuno, A Muraki, N Nakazaki, K Naruo, S Okumura, S Shimpo, C Takeuchi, T Wada, A Watanabe, M Yamada, M Yasuda, S Tabata. Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions. DNA Res. 1996.

Ikuo Uchiyama. MBGD: a platform for microbial comparative genomics based on the automated construction of orthologous groups. Nucleic Acids Res. 2006.

Shusei Sato, Yoshikazu Shimoda, Akiko Muraki, Mitsuyo Kohara, Yasukazu Nakamura, Satoshi Tabata. A large-scale protein protein interaction analysis in Synechocystis sp. PCC6803. DNA Res. 2007.

Syed Haider, Benoit Ballester, Damian Smedley, Junjun Zhang, Peter Rice, Arek Kasprzyk. BioMart Central Portal – unified access to biological data. Nucleic Acids Res. 2009.

Minoru Kanehisa, Michihiro Araki, Susumu Goto, Masahiro Hattori, Mika Hirakawa, Masumi Itoh, Toshiaki Katayama, Shuichi Kawashima, Shujiro Okuda, Toshiaki Tokimatsu, Yoshihiro Yamanishi. KEGG for linking genomes to life and the environment. Nucleic Acids Res. 2007.

Sarah Hunter, Rolf Apweiler, Teresa K Attwood, Amos Bairoch, Alex Bateman, David Binns, Peer Bork, Ujjwal Das, Louise Daugherty, Lauranne Duquenne, Robert D Finn, Julian Gough, Daniel Haft, Nicolas Hulo, Daniel Kahn, Elizabeth Kelly, Aurélie Laugraud, Ivica Letunic, David Lonsdale, Rodrigo Lopez, Martin Madera, John Maslen, Craig McAnulla, Jennifer McDowall, Jaina Mistry, Alex Mitchell, Nicola Mulder, Darren Natale, Christine Orengo, Antony F Quinn, Jeremy D Selengut, Christian J A Sigrist, Manjula Thimma, Paul D Thomas, Franck Valentin, Derek Wilson, Cathy H Wu, Corin Yeats. InterPro: the integrative protein signature database. Nucleic Acids Res. 2008.
