CyanoBase and RhizoBase: databases of manually curated annotations for cyanobacterial and rhizobial genomes.

Takatomo Fujisawa, Shinobu Okamoto, Toshiaki Katayama, Mitsuteru Nakao, Hidehisa Yoshimura, Hiromi Kajiya-Kanegae, Sumiko Yamamoto, Chiyoko Yano, Yuka Yanaka, Hiroko Maita, Takakazu Kaneko, Satoshi Tabata, Yasukazu Nakamura.

Abstract

To understand newly sequenced genomes of closely related species, comprehensively curated reference genome databases are becoming increasingly important. We have extended CyanoBase (http://genome.microbedb.jp/cyanobase), a genome database for cyanobacteria, and newly developed RhizoBase (http://genome.microbedb.jp/rhizobase), a genome database for rhizobia, the nitrogen-fixing bacteria associated with leguminous plants. Both databases focus on the representation and reusability of reference genome annotations, which are continuously updated by manual curation. Domain experts have extracted the names, products and functions of each gene reported in the literature. To make this procedure efficient, we developed the TogoAnnotation system, which offers a web-based user interface and uniform storage of annotations for the curators of CyanoBase and RhizoBase. The number of references investigated for CyanoBase increased from 2260 in our previous report to 5285, and for RhizoBase, we perused 1216 references. The results of these intensive annotations are displayed on the GeneView pages of each database. Advanced users can also retrieve this information automatically through a representational state transfer (REST)-based web application programming interface (API).

Year:  2013        PMID: 24275496      PMCID: PMC3965071          DOI: 10.1093/nar/gkt1145

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Cyanobacteria constitute a large taxonomic group within the domain of eubacteria. They are widely used as model organisms for studying the fundamental aspects of photosynthesis, in basic and applied plant-related research, in biotechnology for developing third-generation biofuels, and for understanding their evolutionary contributions to the biosphere as a whole. CyanoBase was originally developed as a genome database for Synechocystis sp. PCC 6803, the first cyanobacterial genome, sequenced in 1996 (1). CyanoBase has subsequently been extended to include additional cyanobacteria and related species (2–4) and now covers 39 organisms. Rhizobia, a collective name for the genera Rhizobium, Sinorhizobium, Mesorhizobium and Bradyrhizobium, are agronomically important bacteria because they can establish nitrogen-fixing symbioses with leguminous plants. RhizoBase was initiated as a genome database for Mesorhizobium loti strain MAFF303099, sequenced in 2000 (5), and has been extended to include other rhizobia and related species, encompassing 18 organisms to date. For both CyanoBase and RhizoBase, we have been accumulating gene annotations by incorporating evidence from published data. The involvement of the cyanobacterial and rhizobial research communities was essential for maintaining the quality of these annotations. Therefore, to simplify the submission of new annotations, we developed the TogoAnnotation system (6), and we also conducted in-house curation to make the annotations as comprehensive as possible. New sequencing technologies and automatic genome processing pipelines [e.g. MiGAP (7) and the DNA Data Bank of Japan (DDBJ) Pipeline (8,9)] have certainly accelerated prokaryotic genome analyses. However, it is difficult to infer the functions of predicted genes without carefully curated reference annotations of model organisms.
Thus, the manually curated annotations in CyanoBase and RhizoBase provide fundamental information for interpreting high-throughput sequencing data. For data reusability, the reference annotations must be highly accessible and interoperable. For accessibility, CyanoBase and RhizoBase use a common database system that provides the same functionality, user interfaces and application programming interfaces. For interoperability, we have introduced Semantic Web technologies (10) to represent data in a standard format and to provide an advanced query interface.

DATA CURATION

Reference genomes

CyanoBase and RhizoBase integrate reference genomes from original genome projects conducted by the Kazusa DNA Research Institute and from public sequence databases. By including recent genome sequencing projects, we added 4 and 17 new genome entries to CyanoBase and RhizoBase, respectively (4,5). As a result, CyanoBase currently includes 39 completely sequenced genomes, and RhizoBase contains 18 completely sequenced genomes and two partially sequenced genomic regions, such as a symbiosis island (newly incorporated genomes are listed in Supplementary Table S1). Before the manual curation described in the following sections, we integrated automatic gene annotations, including BLAST and InterPro search results, for the newly added cyanobacterial and rhizobial genomes.
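The article does not describe how the BLAST results were stored or filtered. As a minimal sketch, assuming the searches produced BLAST+ tabular output (`-outfmt 6` with its default 12 columns), the following keeps the best-scoring hit per query gene as a candidate automatic annotation. The function name, the E-value cutoff and the sample hits are illustrative assumptions, not CyanoBase or RhizoBase data.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tsv_text: str, max_evalue: float = 1e-10) -> dict[str, dict]:
    """Return the highest-bitscore hit per query that passes the E-value cutoff."""
    best: dict[str, dict] = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(OUTFMT6, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] > max_evalue:
            continue  # too weak to transfer an automatic annotation (arbitrary cutoff)
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Synthetic hits for two hypothetical query genes (not real database content).
sample = (
    "geneA\tsubj1\t98.0\t360\t7\t0\t1\t360\t1\t360\t1e-150\t700\n"
    "geneA\tsubj2\t60.0\t350\t140\t3\t5\t354\t2\t351\t1e-80\t400\n"
    "geneB\tsubj3\t30.0\t100\t70\t2\t1\t100\t1\t100\t0.5\t35\n"
)
hits = best_hits(sample)
print({q: h["sseqid"] for q, h in hits.items()})  # {'geneA': 'subj1'}
```

Choosing by bitscore rather than E-value keeps the comparison independent of database size; any such rule would only seed the automatic layer that curators later refine.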

Manual curation

Expert curators extracted gene symbols and full names from the full text of peer-reviewed research articles and annotated them with Sequence Ontology (SO) terms (11) to indicate the type of each annotation. These annotations are immediately reflected in the ‘Extracted from literature’ fields of the ‘Summary’ section on the GeneView page of each database (Figure 1). We have also been accepting community submissions to both databases, including gene structure refinements, gene families, gene functions, gene symbols and links to other resources. Submitted data are manually inspected by expert curators before being integrated.
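The workflow in this paragraph, in which submitted annotations become visible only after an expert curator has inspected them, can be sketched as a simple review gate. This is a hypothetical Python model, not the TogoAnnotation implementation; the record fields, the status names and the `gene_symbol` label (standing in for an SO term) are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Submission:
    """One community-submitted annotation (hypothetical record layout)."""
    locus_tag: str        # gene identifier, e.g. "sll1867"
    annotation_type: str  # stands in for an SO term; names used here are illustrative
    value: str            # e.g. a gene symbol
    evidence: str         # supporting publication (placeholder)
    status: Status = Status.PENDING


def review(sub: Submission, approve: bool) -> None:
    """Record a curator's decision; each submission is reviewed once."""
    if sub.status is not Status.PENDING:
        raise ValueError("submission has already been reviewed")
    sub.status = Status.APPROVED if approve else Status.REJECTED


def published(subs: list[Submission], locus_tag: str) -> dict[str, list[str]]:
    """Return only curator-approved values for one gene, grouped by annotation type."""
    fields: dict[str, list[str]] = {}
    for s in subs:
        if s.locus_tag == locus_tag and s.status is Status.APPROVED:
            fields.setdefault(s.annotation_type, []).append(s.value)
    return fields


subs = [
    Submission("sll1867", "gene_symbol", "psbA3", "PMID_PLACEHOLDER"),
    Submission("sll1867", "gene_symbol", "wrongSymbol", "PMID_PLACEHOLDER"),
]
review(subs[0], approve=True)
review(subs[1], approve=False)
print(published(subs, "sll1867"))  # {'gene_symbol': ['psbA3']}
```

Keeping rejected submissions with an explicit status, rather than deleting them, preserves a record of what was inspected.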
Figure 1.

An example GeneView page for the sll1867 gene of Synechocystis sp. PCC 6803. Manually curated gene symbol(s) and gene product(s) are shown in the ‘Gene symbol Extracted from literature’ and ‘Gene product Extracted from literature’ fields of the ‘Summary’ section.

Curation platform

Manual curation remains one of the most important and most difficult tasks in genome projects, so methodological and technological solutions are urgently needed to reduce annotation costs. To address this issue, we have developed a web-based genome annotation tool, TogoAnnotation (http://togo.annotation.jp). This tool, which is derived from KazusaAnnotation (4), provides an easy way to access, edit and store annotation data through a flexible web interface based on a social bookmarking web service architecture.
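The text says only that TogoAnnotation is built on a social bookmarking web service architecture. As a minimal sketch of that general pattern (not TogoAnnotation's actual data model), each annotation can be treated like a bookmark: a curator attaches a tag to a resource, and tags are aggregated per resource. The class names, fields and reference placeholders are hypothetical; the URL is the sll1867 GeneView page cited in this article.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Bookmark(NamedTuple):
    """One annotation stored bookmark-style: a curator tags a gene resource."""
    curator: str
    gene: str      # the annotated resource, e.g. a GeneView URL
    tag: str       # the annotation value, e.g. a gene symbol
    evidence: str  # literature reference (placeholder)


class AnnotationStore:
    """Uniform storage: every curator writes the same record type to one place."""

    def __init__(self) -> None:
        self._by_gene: dict[str, list[Bookmark]] = defaultdict(list)

    def add(self, bm: Bookmark) -> None:
        self._by_gene[bm.gene].append(bm)

    def tags(self, gene: str) -> Counter:
        """Count how many bookmarks carry each tag, as a bookmarking site does per URL."""
        return Counter(bm.tag for bm in self._by_gene.get(gene, []))


store = AnnotationStore()
url = "http://genome.microbedb.jp/cyanobase/Synechocystis/genes/sll1867"
store.add(Bookmark("curator_a", url, "psbA3", "REF_1"))
store.add(Bookmark("curator_b", url, "psbA3", "REF_2"))
store.add(Bookmark("curator_b", url, "psbA", "REF_3"))
print(store.tags(url).most_common(1))  # [('psbA3', 2)]
```

The appeal of this layout is that every curator writes the same small record, so aggregation and display need no per-curator logic.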

Curated genes

CyanoBase and RhizoBase have grown considerably since their introduction. Their content and composition are summarized in Table 1. A statistical summary of annotations conducted in August 2013 indicated that 138 896 cyanobacterial genes were curated from 5285 published references. Hence, the number of references investigated for CyanoBase increased by 3025 compared with our previous report in 2010 (4). For example, of the 3725 genes in the Synechocystis sp. PCC 6803 genome, 3067 (82.3%) have already been annotated with gene symbols, protein names and gene definitions from the literature. Users can access the annotation of each gene in the ‘Reference’ section of the GeneView page and find annotated data [e.g. the photosystem II D1 protein gene (psbA3) currently has 386 citations; http://genome.microbedb.jp/cyanobase/Synechocystis/genes/sll1867#references].
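The two derived figures in this paragraph follow from simple arithmetic on the quoted counts; the snippet below only restates numbers given in the text.

```python
# Figures quoted in the paragraph above.
refs_2010 = 2260  # CyanoBase references in the 2010 report
refs_2013 = 5285  # CyanoBase references as of August 2013
annotated = 3067  # Synechocystis sp. PCC 6803 genes annotated from the literature
total = 3725      # all genes in the Synechocystis sp. PCC 6803 genome

new_refs = refs_2013 - refs_2010
coverage = 100 * annotated / total

print(new_refs)            # 3025
print(f"{coverage:.1f}%")  # 82.3%
```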
Table 1.

Number of curated publications and annotated genes for each organism of CyanoBase and RhizoBase

DatabaseOrganismReferencesAnnotationsAnnotated genesTotal genes
CyanoBaseSynechocystis sp. PCC 6803234680 20430643725
CyanoBaseAnabaena sp. PCC 712095929 15427546223
CyanoBaseSynechococcus elongatus PCC 794281517 0607942715
CyanoBaseThermosynechococcus elongatus BP-1270676825282528
CyanoBaseSynechococcus sp. PCC 700226439992653235
CyanoBaseNostoc punctiforme ATCC 2913315133497686794
CyanoBaseChlorobium tepidum TLS14355327512310
CyanoBaseAnabaena variabilis ATCC 2941311917312585724
CyanoBaseProchlorococcus marinus MED46421553901756
CyanoBaseGloeobacter violaceus PCC 742152560044834484
CyanoBaseProchlorococcus marinus MIT9313449192482326
CyanoBaseProchlorococcus marinus SS120375391351928
CyanoBaseArthrospira platensis NIES-3997872606676
CyanoBaseTrichodesmium erythraeum IMS101522144498
CyanoBaseSynechococcus sp. WH8102538222579
CyanoBaseSynechococcus elongatus PCC 63012522580
RhizoBaseBradyrhizobium japonicum USDA11055026 63683668374
RhizoBaseSinorhizobium meliloti 1021240980119906287
RhizoBaseMesorhizobium loti MAFF30309911523738657343
RhizoBaseRhizobium sp. pNGR234ab1075224989990
RhizoBaseRhizobium leguminosarum bv. viciae 38418334267817342
RhizoBaseRhizobium sp. NGR234846176437

AVAILABILITY

Application programming interface

CyanoBase and RhizoBase are built on the same in-house genome database system, which offers a representational state transfer (REST)-based web application programming interface (API) for automated data retrieval by third-party tools and computer programs. Several widely used output formats are supported, including TSV, CSV, FASTA and GFF3 (4).
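The request URLs of the REST API are not listed in this article, so the following sketch covers only the client side: parsing a GFF3 response, one of the output formats named above. It follows the nine-column layout of the GFF3 specification; the function name is hypothetical and the sample record is synthetic (made-up coordinates and source), not real API output.

```python
from urllib.parse import unquote

# Column order defined by the GFF3 specification.
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3(text: str) -> list[dict]:
    """Parse GFF3 feature lines; attributes become a dict mapping tag -> list of values."""
    features = []
    for line in text.splitlines():
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section ends the feature table
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives such as ##gff-version
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
        rec = dict(zip(GFF3_COLUMNS, cols))
        rec["start"], rec["end"] = int(rec["start"]), int(rec["end"])
        attrs: dict[str, list[str]] = {}
        if rec["attributes"] != ".":  # "." marks an empty column
            for pair in rec["attributes"].split(";"):
                if not pair:
                    continue
                tag, _, value = pair.partition("=")
                # Multiple values are comma-separated; each may be percent-encoded.
                attrs[unquote(tag)] = [unquote(v) for v in value.split(",")]
        rec["attributes"] = attrs
        features.append(rec)
    return features


# A synthetic record (coordinates and source are made up).
sample = (
    "##gff-version 3\n"
    "chr\texample\tgene\t1000\t2080\t.\t+\t.\tID=sll1867;Name=psbA3\n"
)
genes = parse_gff3(sample)
print(genes[0]["attributes"]["Name"], genes[0]["start"])  # ['psbA3'] 1000
```

Splitting on commas before percent-decoding matters: an encoded comma (`%2C`) inside a value must not be mistaken for a separator.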

Semantic Web application

To improve data integration among CyanoBase, RhizoBase and other microorganism databases in the near future, we have introduced Semantic Web technologies for the standard representation of data and a common exchange protocol (10). First, in cooperation with the DDBJ and the Database Center for Life Science (DBCLS), we developed a generic ontology for semantically describing genomic annotations. Based on this ontology, we converted the annotations stored in CyanoBase and RhizoBase into the Resource Description Framework (RDF) format. The result is accessible from our SPARQL Protocol and RDF Query Language (SPARQL) endpoint at http://genome.microbedb.jp/sparql. The available resources are summarized in Table 2.
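The ontology itself is not reproduced in this article. As a minimal sketch of the conversion step, assuming a hypothetical namespace (`EX`) and hypothetical property names (`geneSymbol`, `geneProduct`, `evidence`), one curated annotation can be serialized as N-Triples, the simplest line-based RDF syntax. A production converter would more likely use an RDF library; plain string formatting keeps this example self-contained.

```python
# Hypothetical namespace standing in for the annotation ontology described above.
EX = "http://example.org/genome-annotation#"


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples string literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def annotation_to_ntriples(gene_uri: str, symbol: str, product: str,
                           pubmed_ids: list[str]) -> str:
    """Serialize one curated annotation as N-Triples (one triple per line)."""
    lines = [
        f'<{gene_uri}> <{EX}geneSymbol> "{escape_literal(symbol)}" .',
        f'<{gene_uri}> <{EX}geneProduct> "{escape_literal(product)}" .',
    ]
    for pmid in pubmed_ids:
        # identifiers.org-style PubMed URI, used here for illustration only
        lines.append(f"<{gene_uri}> <{EX}evidence> <http://identifiers.org/pubmed/{pmid}> .")
    return "\n".join(lines) + "\n"


nt = annotation_to_ntriples(
    "http://genome.microbedb.jp/cyanobase/Synechocystis/genes/sll1867",
    "psbA3",
    "photosystem II D1 protein",
    ["00000000"],  # placeholder PubMed ID, not a real citation
)
print(nt, end="")
```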
Table 2.

Summary of data types and the number of items accessible from the SPARQL endpoint

Data type | Number | RDF | Reference
CyanoBase
    Genome project | 39 | |
    Gene | 138 896 | |
    Publication | 5285 | |
    Operon^a | 86 | |
    Protein complex^a | 68 | |
    Protein–protein interaction | 3054 | | (12)
RhizoBase
    Genome project | 20 | |
    Gene | 116 140 | |
    Publication | 1216 | |
    Protein–protein interaction | 2987 | | (13)
Currently, databases of bacterial model organisms are maintained and distributed independently. To make these data interoperable for large-scale genomic analysis, we collaborated with the MicrobeDB.jp (http://microbedb.jp/) and TogoGenome (http://togogenome.org/) projects to share prokaryotic genome annotations as RDF data through their respective SPARQL endpoints. Such standardization reduces duplicated effort and improves reusability while allowing each database to update its own resources independently. End users also benefit, because they can access a variety of data sources with common software through a standard web service interface in a unified and automated manner.
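Because the data are exposed through the standard SPARQL 1.1 Protocol, a single generic client can query any such endpoint. The sketch below builds a GET request with a `query` parameter, asks for the W3C SPARQL JSON results format, and flattens the bindings into rows. The endpoint URL is the one given in this article; the query is a generic triple pattern because the ontology's terms are not listed here, and the helper names are hypothetical. The live call is left commented out so the example runs offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://genome.microbedb.jp/sparql"        # endpoint address given in the text
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"  # generic; no ontology terms assumed


def build_request(endpoint: str, query: str) -> Request:
    """SPARQL 1.1 Protocol: send the query as a GET parameter, request JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def to_rows(results: dict) -> list[dict[str, str]]:
    """Flatten SPARQL 1.1 JSON results into {variable: value} rows (unbound vars omitted)."""
    names = results["head"]["vars"]
    return [{v: b[v]["value"] for v in names if v in b}
            for b in results["results"]["bindings"]]


def run(endpoint: str, query: str) -> list[dict[str, str]]:
    """Query any endpoint that implements the standard protocol."""
    with urlopen(build_request(endpoint, query), timeout=30) as resp:
        return to_rows(json.load(resp))


# Offline demonstration on a hand-written result document in the W3C JSON format.
sample = {
    "head": {"vars": ["s", "o"]},
    "results": {"bindings": [
        {"s": {"type": "uri", "value": "http://example.org/gene1"},
         "o": {"type": "literal", "value": "psbA3"}},
    ]},
}
print(to_rows(sample))  # [{'s': 'http://example.org/gene1', 'o': 'psbA3'}]
# Live call (needs network access and a reachable endpoint):
# print(run(ENDPOINT, QUERY))
```

Pointing `run` at another standard endpoint, such as one provided by MicrobeDB.jp or TogoGenome (whose endpoint URLs are not given here), would need no code changes, which is the practical payoff of the shared protocol.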

Change of site URL

We have migrated the server hosting CyanoBase and RhizoBase from the Kazusa DNA Research Institute to the National Institute of Genetics. Consequently, the location of these databases has changed to http://genome.microbedb.jp/.

Social media

We post timely announcements on Twitter. Users can follow @cyanobase and @rhizobase to receive the latest information on database updates and server maintenance for CyanoBase and RhizoBase.

License

All data in our databases are provided under the Creative Commons CC0 public domain license (4).

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

Integrated Database Project, Ministry of Education, Culture, Sports, Science and Technology of Japan; National Bioscience Database Center (NBDC) of the Japan Science and Technology Agency (JST); Kazusa DNA Research Institute Foundation. Funding for Open Access: National Bioscience Database Center.

Conflict of interest statement. None declared.

1.  CyanoBase, the genome database for Synechocystis sp. strain PCC6803: status for the year 2000.

Authors:  Y Nakamura; T Kaneko; S Tabata
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  CyanoBase, a www database containing the complete nucleotide sequence of the genome of Synechocystis sp. strain PCC6803.

Authors:  Y Nakamura; T Kaneko; M Hirosawa; N Miyajima; S Tabata
Journal:  Nucleic Acids Res       Date:  1998-01-01       Impact factor: 16.971

3.  Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions.

Authors:  T Kaneko; S Sato; H Kotani; A Tanaka; E Asamizu; Y Nakamura; N Miyajima; M Hirosawa; M Sugiura; S Sasamoto; T Kimura; T Hosouchi; A Matsuno; A Muraki; N Nakazaki; K Naruo; S Okumura; S Shimpo; C Takeuchi; T Wada; A Watanabe; M Yamada; M Yasuda; S Tabata
Journal:  DNA Res       Date:  1996-06-30       Impact factor: 4.458

4.  Complete genome structure of the nitrogen-fixing symbiotic bacterium Mesorhizobium loti.

Authors:  T Kaneko; Y Nakamura; S Sato; E Asamizu; T Kato; S Sasamoto; A Watanabe; K Idesawa; A Ishikawa; K Kawashima; T Kimura; Y Kishida; C Kiyokawa; M Kohara; M Matsumoto; A Matsuno; Y Mochizuki; S Nakayama; N Nakazaki; S Shimpo; M Sugimoto; C Takeuchi; M Yamada; S Tabata
Journal:  DNA Res       Date:  2000-12-31       Impact factor: 4.458

5.  CyanoBase: the cyanobacteria genome database update 2010.

Authors:  Mitsuteru Nakao; Shinobu Okamoto; Mitsuyo Kohara; Tsunakazu Fujishiro; Takatomo Fujisawa; Shusei Sato; Satoshi Tabata; Takakazu Kaneko; Yasukazu Nakamura
Journal:  Nucleic Acids Res       Date:  2009-10-30       Impact factor: 16.971

6.  The 3rd DBCLS BioHackathon: improving life science data integration with Semantic Web technologies.

Authors:  Toshiaki Katayama; Mark D Wilkinson; Gos Micklem; Shuichi Kawashima; Atsuko Yamaguchi; Mitsuteru Nakao; Yasunori Yamamoto; Shinobu Okamoto; Kenta Oouchida; Hong-Woo Chun; Jan Aerts; Hammad Afzal; Erick Antezana; Kazuharu Arakawa; Bruno Aranda; Francois Belleau; Jerven Bolleman; Raoul Jp Bonnal; Brad Chapman; Peter Ja Cock; Tore Eriksson; Paul Mk Gordon; Naohisa Goto; Kazuhiro Hayashi; Heiko Horn; Ryosuke Ishiwata; Eli Kaminuma; Arek Kasprzyk; Hideya Kawaji; Nobuhiro Kido; Young Joo Kim; Akira R Kinjo; Fumikazu Konishi; Kyung-Hoon Kwon; Alberto Labarga; Anna-Lena Lamprecht; Yu Lin; Pierre Lindenbaum; Luke McCarthy; Hideyuki Morita; Katsuhiko Murakami; Koji Nagao; Kozo Nishida; Kunihiro Nishimura; Tatsuya Nishizawa; Soichi Ogishima; Keiichiro Ono; Kazuki Oshita; Keun-Joon Park; Pjotr Prins; Taro L Saito; Matthias Samwald; Venkata P Satagopam; Yasumasa Shigemoto; Richard Smith; Andrea Splendiani; Hideaki Sugawara; James Taylor; Rutger A Vos; David Withers; Chisato Yamasaki; Christian M Zmasek; Shoko Kawamoto; Kosaku Okubo; Kiyoshi Asai; Toshihisa Takagi
Journal:  J Biomed Semantics       Date:  2013-02-11

7.  The Sequence Ontology: a tool for the unification of genome annotations.

Authors:  Karen Eilbeck; Suzanna E Lewis; Christopher J Mungall; Mark Yandell; Lincoln Stein; Richard Durbin; Michael Ashburner
Journal:  Genome Biol       Date:  2005-04-29       Impact factor: 13.583

8.  DDBJ read annotation pipeline: a cloud computing-based pipeline for high-throughput analysis of next-generation sequencing data.

Authors:  Hideki Nagasaki; Takako Mochizuki; Yuichi Kodama; Satoshi Saruhashi; Shota Morizaki; Hideaki Sugawara; Hajime Ohyanagi; Nori Kurata; Kousaku Okubo; Toshihisa Takagi; Eli Kaminuma; Yasukazu Nakamura
Journal:  DNA Res       Date:  2013-05-08       Impact factor: 4.458

9.  Complete genome sequence of Bradyrhizobium sp. S23321: insights into symbiosis evolution in soil oligotrophs.

Authors:  Takashi Okubo; Takahiro Tsukui; Hiroko Maita; Shinobu Okamoto; Kenshiro Oshima; Takatomo Fujisawa; Akihiro Saito; Hiroyuki Futamata; Reiko Hattori; Yumi Shimomura; Shin Haruta; Sho Morimoto; Yong Wang; Yoriko Sakai; Masahira Hattori; Shin-Ichi Aizawa; Kenji V P Nagashima; Sachiko Masuda; Tsutomu Hattori; Akifumi Yamashita; Zhihua Bao; Masahito Hayatsu; Hiromi Kajiya-Kanegae; Ikuo Yoshinaga; Kazunori Sakamoto; Koki Toyota; Mitsuteru Nakao; Mitsuyo Kohara; Mizue Anda; Rieko Niwa; Park Jung-Hwan; Reiko Sameshima-Saito; Shin-Ichi Tokuda; Sumiko Yamamoto; Syuji Yamamoto; Tadashi Yokoyama; Tomoko Akutsu; Yasukazu Nakamura; Yuka Nakahira-Yanaka; Yuko Takada Hoshino; Hideki Hirakawa; Hisayuki Mitsui; Kimihiro Terasawa; Manabu Itakura; Shusei Sato; Wakako Ikeda-Ohtsubo; Natsuko Sakakura; Eli Kaminuma; Kiwamu Minamisawa
Journal:  Microbes Environ       Date:  2012-03-28       Impact factor: 2.912

10.  DDBJ launches a new archive database with analytical tools for next-generation sequence data.

Authors:  Eli Kaminuma; Jun Mashima; Yuichi Kodama; Takashi Gojobori; Osamu Ogasawara; Kousaku Okubo; Toshihisa Takagi; Yasukazu Nakamura
Journal:  Nucleic Acids Res       Date:  2009-10-22       Impact factor: 16.971
