CyanoBase: a large-scale update on its 20th anniversary.

Takatomo Fujisawa¹, Rei Narikawa², Shin-Ichi Maeda³, Satoru Watanabe⁴, Yu Kanesaki⁵, Koichi Kobayashi⁶, Jiro Nomata⁷, Mitsumasa Hanaoka⁸, Mai Watanabe⁶, Shigeki Ehira⁹, Eiji Suzuki¹⁰, Koichiro Awai², Yasukazu Nakamura¹¹.

Abstract

The first ever cyanobacterial genome sequence was determined two decades ago, and CyanoBase (http://genome.microbedb.jp/cyanobase), the first database for cyanobacteria, was developed at the same time to allow this genomic information to be used more efficiently. Since then, CyanoBase has been extended continually and has received several updates. Here, we describe a new large-scale update of the database, which coincides with its 20th anniversary. We have expanded the number of cyanobacterial genomic sequences from 39 to 376 species, comprising 86 complete and 290 draft genomes. We have also optimized the user interface for large genomic data by adopting semantic web technologies and JBrowse, and have extended community-based reannotation resources through the re-annotation of Synechocystis sp. PCC 6803 by the cyanobacterial research community. These updates have markedly improved CyanoBase, providing cyanobacterial genome annotations as references for cyanobacterial research.

Year: 2016 PMID: 27899668 PMCID: PMC5210588 DOI: 10.1093/nar/gkw1131

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Bacteria belong to several phyla, such as the Proteobacteria and Firmicutes, and exhibit an extremely high level of diversity. Because of this heterogeneity, no database currently covers all of the genome sequences within a particular phylum, with one exception: the Cyanobacteria. Cyanobacteria is a very diverse phylum (1) from which >300 species have been sequenced, and the genome annotation database CyanoBase (http://genome.microbedb.jp/cyanobase) includes these multi-genus genomes. CyanoBase was originally established two decades ago as a genome database for Synechocystis sp. PCC 6803, which was the first cyanobacterial genome to be sequenced (2). Since then, it has been continuously extended to include the genomes of additional cyanobacteria and related species (3–6), covering 39 genera. More than 300 cyanobacterial genome sequences are currently available, and the rapid advances being made in sequencing technologies will result in many more in the near future. However, prior to this update, CyanoBase included only 39 sequences (6). Therefore, a high-quality automated annotation system was required to rectify this situation. Synechocystis sp. PCC 6803 represents an appropriate reference genome for automated annotation, as it is one of the best characterized cyanobacteria, with many mutants and omics datasets available. However, its annotation data have not been updated since 2003. Therefore, high-quality annotation was required, which ideally should be performed manually by experimental researchers covering many different aspects of cyanobacterial biology. In this report, we outline how we have newly incorporated 337 cyanobacterial genomes into CyanoBase. We also describe the development of a web-based annotation system that is easily accessible for experimental researchers. Using this system, we were able to perform high-quality re-annotation of Synechocystis sp. PCC 6803 through the use of a cyanobacterial research community covering various biological aspects. These re-annotated data will contribute to the automated and high-quality annotation of newly sequenced cyanobacterial genomes.

DATA RESOURCES

Reference genomes

CyanoBase integrates reference genomes from original genome projects conducted by Kazusa DNA Research Institute and from public sequence databases (2–6). We added 337 new genome entries to CyanoBase based on recent genome sequencing projects. As a result, CyanoBase now includes 376 sequenced genomes, of which 86 are complete genomes and 290 are draft genomes, including unannotated contigs and scaffolds. In traditional classification, morphological characters are used to divide cyanobacteria into five subsections (7,8). In this update, species from subsections II and V were newly added, so CyanoBase now covers all of the subsections of cyanobacteria (Supplementary Table S1). To deal with the rapidly increasing number of sequenced genomes, we also developed a processing pipeline for CyanoBase records. The pipeline gathers cyanobacterial genomic records from the International Nucleotide Sequence Database Collaboration (INSDC) based on the complete taxonomy subtree descended from the cyanobacteria taxon (taxonomy id: 1117) in the taxonomy database. We used a resource description framework (RDF) format for all of the assembly records for cyanobacteria. This pipeline is almost fully automated in converting, annotating, and importing the cyanobacterial genome datasets into CyanoBase (Figure 1).
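The taxonomy-based selection step can be pictured with a small sketch. It assumes a hypothetical child-to-parent map of taxonomy ids and a list of assembly records held in memory; every id other than 1117 is invented for illustration, and the text does not describe how the actual pipeline is implemented.

```python
from collections import defaultdict, deque

CYANOBACTERIA_TAXID = 1117  # taxonomy id given in the text


def descendants(parent_of, root):
    """Return the set of all taxon ids in the subtree rooted at `root` (root included)."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    seen = {root}
    queue = deque([root])
    while queue:  # breadth-first walk down the tree
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def select_records(records, parent_of, root=CYANOBACTERIA_TAXID):
    """Keep only the assembly records whose taxon lies in the subtree under `root`."""
    subtree = descendants(parent_of, root)
    return [r for r in records if r["taxid"] in subtree]


# Toy data: ids other than 1117 are made up for illustration only.
toy_parent_of = {1117: 2, 900001: 1117, 900002: 900001, 900003: 2}
toy_records = [
    {"assembly": "ASM_A", "taxid": 900002},  # inside the 1117 subtree
    {"assembly": "ASM_B", "taxid": 900003},  # outside it
]
selected = select_records(toy_records, toy_parent_of)
```

Selecting by subtree rather than by a fixed species list means that newly registered strains are picked up as soon as they are placed under taxon 1117.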

Figure 1.

Flowchart representing data processing pipeline for CyanoBase.

Metadata and cross references

CyanoBase also contains metadata associated with cyanobacterial genomes that have been collected from multiple sources, such as the National Center for Biotechnology Information's (NCBI's) assembly database and GenBank sequence records. We also reviewed other cyanobacterial databases and web resources, which differ in focus and content. For example, CyanoExpress (9) is a database for gene expression, CyanoClust (10) is a database for clustering of homologous proteins, and CyanoLyase (11) is a database specialized for phycobilin lyases. CyanoBase is one of the most comprehensive cyanobacterial databases when these resources are compared by the number of genomes covered (Supplementary Table S2).

USER INTERFACE

Genome project

CyanoBase is derived from genome project data that have been submitted to the public nucleic acid databases DNA Data Bank of Japan, European Molecular Biology Laboratory, and GenBank. INSDC assembly records (12,13) contain information about a particular genome assembly and are supported by unique assembly-level identifiers. In these records, all of the pieces of a genome are collected together in ways that are much more flexible and powerful for indexing and retrieval purposes. In this update of CyanoBase, the number of genome projects included was increased from 39 to 376, and draft genomes were also introduced as database entries for the first time. We also provided a new user interface for genome projects (Figure 2). To visualize the list of genome projects, we applied TogoStanza (http://togostanza.org), a generic web framework for building reusable web components that assists the development of Semantic Web applications, for example by querying SPARQL endpoints and visualizing the returned results.
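Querying a SPARQL endpoint and turning the returned results into table rows can be sketched as follows. This is only an illustration: the endpoint URL and the `ex:` vocabulary are hypothetical and are not taken from CyanoBase or TogoStanza. Only the request shape (a `query` parameter on a GET request) and the JSON results layout follow the SPARQL 1.1 standards.

```python
import json
from urllib.parse import urlencode, parse_qs, urlparse

# Hypothetical endpoint and vocabulary: neither is taken from CyanoBase itself.
ENDPOINT = "https://example.org/sparql"
QUERY_TEMPLATE = """
PREFIX ex: <http://example.org/vocab#>
SELECT ?project ?organism
WHERE {{
  ?project a ex:GenomeProject ;
           ex:organismName ?organism .
  FILTER(CONTAINS(LCASE(?organism), "{keyword}"))
}}
ORDER BY ?organism
"""


def build_request_url(keyword, endpoint=ENDPOINT):
    """Build a SPARQL 1.1 Protocol GET URL (the query goes in the `query` parameter)."""
    # Lower-case the keyword and escape characters that would end the string literal.
    safe = keyword.lower().replace("\\", "\\\\").replace('"', '\\"')
    query = QUERY_TEMPLATE.format(keyword=safe)
    return endpoint + "?" + urlencode({"query": query})


def rows_from_results(results_json):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in doc["results"]["bindings"]
    ]


url = build_request_url("Synechocystis")

# A hand-written response in the standard SPARQL JSON results layout.
sample = json.dumps({
    "head": {"vars": ["project", "organism"]},
    "results": {"bindings": [{
        "project": {"type": "uri", "value": "http://example.org/project/1"},
        "organism": {"type": "literal", "value": "Synechocystis sp. PCC 6803"},
    }]},
})
rows = rows_from_results(sample)
```

The flattened rows are the kind of structure a filterable, sortable table component could render directly.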

Figure 2.

A sample Genome Project View page resulting from the keyword search ‘Synechocystis’. The table is filterable and sortable for each column.

Genome browser

CyanoBase originally used GBrowse as a genome browser to graphically illustrate the genomic context, indicating the length, direction and function of a gene and its surrounding genes. To provide an equivalent service on a larger scale, we replaced GBrowse (14) with JBrowse (15), which is very lightweight and has few server resource requirements. JBrowse can display feature or quantitative data obtained directly from a SPARQL endpoint, and the CyanoBase annotation resources are stored in RDF.
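As a rough illustration of how a JBrowse track might be pointed at a SPARQL endpoint, the sketch below generates a track entry as JSON. The endpoint URL and the `ex:` vocabulary are hypothetical. The option names (`storeClass`, `urlTemplate`, `queryTemplate`) follow the JBrowse 1 configuration guide and should be checked against the installed version. The actual CyanoBase configuration is not given in the text.

```python
import json

# Hypothetical endpoint and vocabulary, used for illustration only.
SPARQL_ENDPOINT = "https://example.org/sparql"

# JBrowse substitutes {ref}, {start} and {end} for the region being viewed.
# The FILTER keeps features that overlap the requested interval.
QUERY_TEMPLATE = (
    "PREFIX ex: <http://example.org/vocab#> "
    "SELECT ?start ?end ?strand ?name ?uniqueID "
    "WHERE { "
    "?uniqueID ex:onSequence '{ref}' ; ex:start ?start ; ex:end ?end ; "
    "ex:strand ?strand ; ex:name ?name . "
    "FILTER(!(?start > {end} || ?end < {start})) "
    "}"
)

track = {
    "label": "genes",
    "key": "Genes (SPARQL)",
    "storeClass": "JBrowse/Store/SeqFeature/SPARQL",
    "type": "JBrowse/View/Track/CanvasFeatures",
    "urlTemplate": SPARQL_ENDPOINT,
    "queryTemplate": QUERY_TEMPLATE,
}

# Text that could be written to a track list configuration file.
track_list_json = json.dumps({"tracks": [track]}, indent=2)
```

Because the browser fetches only the features overlapping the visible region, no per-genome flat files need to be pre-generated on the server.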

COMMUNITY ANNOTATION

Among the hundreds of sequenced cyanobacteria, the unicellular cyanobacterium Synechocystis sp. PCC 6803 has reigned as a cyanobacterial model for two decades, being the first cyanobacterium and fourth living organism to have its entire genome sequenced. Therefore, cyanobacterial researchers searched the peer-reviewed research literature for newly reported genes and annotated these in Synechocystis sp. PCC 6803. This effort resulted in the re-annotation of ∼30% of the genes (1096 of 3725 genes) in the genome of Synechocystis sp. PCC 6803, and decreased the proportion of ‘function unknown genes’ to less than half (46.3%). This type of high-quality gene annotation is becoming increasingly important, as such information can be used as a platform for newly sequenced cyanobacterial genomes, particularly for the automated annotation of draft genome sequences. However, some genes had been annotated with an ambiguous function, such as probable/putative glycosyltransferase. We newly annotated 46 of these genes, but if the genes with ambiguous annotations are counted in the function unknown category, the proportion of function unknown genes remains at more than half (55.1%). To increase the accuracy of the annotation, we need more experiment-based information about these function unknown genes to allow the complete annotation of all genes in cyanobacterial genomes.
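The figures quoted above can be checked with a few lines of arithmetic. The sketch assumes that the 55.1% figure counts ambiguously annotated genes as function unknown while the 46.3% figure does not. The gene count derived from the difference is only an estimate, because it is computed from rounded percentages.

```python
TOTAL_GENES = 3725      # genes in the Synechocystis sp. PCC 6803 genome (from the text)
REANNOTATED = 1096      # genes re-annotated by the community (from the text)
UNKNOWN_STRICT = 46.3   # % function unknown, ambiguous annotations not counted
UNKNOWN_BROAD = 55.1    # % function unknown, ambiguous annotations counted (assumption)

# Share of the genome that was re-annotated, in percent.
reannotated_pct = round(100 * REANNOTATED / TOTAL_GENES, 1)

# Share of genes that carry only an ambiguous annotation, in percentage points.
ambiguous_pct = round(UNKNOWN_BROAD - UNKNOWN_STRICT, 1)

# Approximate gene count behind that share (derived from rounded percentages).
ambiguous_genes_approx = round(TOTAL_GENES * ambiguous_pct / 100)
```

Under that assumption, 1096 of 3725 genes is about 29.4%, which matches the quoted ∼30%, and the 8.8-point gap between the two percentages corresponds to roughly 330 genes with ambiguous annotations.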

13 in total

1. CyanoBase, the genome database for Synechocystis sp. strain PCC6803: status for the year 2000.

Authors: Y Nakamura; T Kaneko; S Tabata
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions.

Authors: T Kaneko; S Sato; H Kotani; A Tanaka; E Asamizu; Y Nakamura; N Miyajima; M Hirosawa; M Sugiura; S Sasamoto; T Kimura; T Hosouchi; A Matsuno; A Muraki; N Nakazaki; K Naruo; S Okumura; S Shimpo; C Takeuchi; T Wada; A Watanabe; M Yamada; M Yasuda; S Tabata
Journal: DNA Res Date: 1996-06-30 Impact factor: 4.458

3. Improving the coverage of the cyanobacterial phylum using diversity-driven genome sequencing.

Authors: Patrick M Shih; Dongying Wu; Amel Latifi; Seth D Axen; David P Fewer; Emmanuel Talla; Alexandra Calteau; Fei Cai; Nicole Tandeau de Marsac; Rosmarie Rippka; Michael Herdman; Kaarina Sivonen; Therese Coursin; Thierry Laurent; Lynne Goodwin; Matt Nolan; Karen W Davenport; Cliff S Han; Edward M Rubin; Jonathan A Eisen; Tanja Woyke; Muriel Gugger; Cheryl A Kerfeld
Journal: Proc Natl Acad Sci U S A Date: 2012-12-31 Impact factor: 11.205

4. CyanoClust: comparative genome resources of cyanobacteria and plastids.

Authors: Naobumi V Sasaki; Naoki Sato
Journal: Database (Oxford) Date: 2010-01-08 Impact factor: 3.451

5. CyanoBase: the cyanobacteria genome database update 2010.

Authors: Mitsuteru Nakao; Shinobu Okamoto; Mitsuyo Kohara; Tsunakazu Fujishiro; Takatomo Fujisawa; Shusei Sato; Satoshi Tabata; Takakazu Kaneko; Yasukazu Nakamura
Journal: Nucleic Acids Res Date: 2009-10-30 Impact factor: 16.971

6. CyanoLyase: a database of phycobilin lyase sequences, motifs and functions.

Authors: Anthony Bretaudeau; François Coste; Florian Humily; Laurence Garczarek; Gildas Le Corguillé; Christophe Six; Morgane Ratin; Olivier Collin; Wendy M Schluchter; Frédéric Partensky
Journal: Nucleic Acids Res Date: 2012-11-21 Impact factor: 16.971

7. CyanoBase and RhizoBase: databases of manually curated annotations for cyanobacterial and rhizobial genomes.

Authors: Takatomo Fujisawa; Shinobu Okamoto; Toshiaki Katayama; Mitsuteru Nakao; Hidehisa Yoshimura; Hiromi Kajiya-Kanegae; Sumiko Yamamoto; Chiyoko Yano; Yuka Yanaka; Hiroko Maita; Takakazu Kaneko; Satoshi Tabata; Yasukazu Nakamura
Journal: Nucleic Acids Res Date: 2013-11-25 Impact factor: 16.971

8. Assembly information services in the European Nucleotide Archive.

Authors: Nima Pakseresht; Blaise Alako; Clara Amid; Ana Cerdeño-Tárraga; Iain Cleland; Richard Gibson; Neil Goodgame; Tamer Gur; Mikyung Jang; Simon Kay; Rasko Leinonen; Weizhong Li; Xin Liu; Rodrigo Lopez; Hamish McWilliam; Arnaud Oisel; Swapna Pallreddy; Sheila Plaister; Rajesh Radhakrishnan; Stephane Rivière; Marc Rossello; Alexander Senf; Nicole Silvester; Dmitriy Smirnov; Silvano Squizzato; Petra ten Hoopen; Ana Luisa Toribio; Daniel Vaughan; Vadim Zalunin; Guy Cochrane
Journal: Nucleic Acids Res Date: 2013-11-08 Impact factor: 16.971

9. JBrowse: a dynamic web platform for genome visualization and analysis.

Authors: Robert Buels; Eric Yao; Colin M Diesh; Richard D Hayes; Monica Munoz-Torres; Gregg Helt; David M Goodstein; Christine G Elsik; Suzanna E Lewis; Lincoln Stein; Ian H Holmes
Journal: Genome Biol Date: 2016-04-12 Impact factor: 13.583

10. Assembly: a resource for assembled genomes at NCBI.

Authors: Paul A Kitts; Deanna M Church; Françoise Thibaud-Nissen; Jinna Choi; Vichet Hem; Victor Sapojnikov; Robert G Smith; Tatiana Tatusova; Charlie Xiang; Andrey Zherikov; Michael DiCuccio; Terence D Murphy; Kim D Pruitt; Avi Kimchi
Journal: Nucleic Acids Res Date: 2015-11-17 Impact factor: 16.971
