
Genomic structure of the cyanobacterium Synechocystis sp. PCC 6803 strain GT-S.

Naoyuki Tajima, Shusei Sato, Fumito Maruyama, Takakazu Kaneko, Naobumi V Sasaki, Ken Kurokawa, Hiroyuki Ohta, Yu Kanesaki, Hirofumi Yoshikawa, Satoshi Tabata, Masahiko Ikeuchi, Naoki Sato.

Abstract

Synechocystis sp. PCC 6803 is the most widely used cyanobacterial strain, serving as a standard in research fields such as photosynthesis, stress response and metabolism. A glucose-tolerant (GT) derivative of this strain was used for genome sequencing at Kazusa DNA Research Institute in 1996, which established a landmark in the study of cyanobacteria. However, apparent deviations from the database sequence have been noticed among different strain stocks. For this reason, we analysed the genomic sequence of another GT strain (GT-S) by 454 and partial Sanger sequencing. We found 22 putative single nucleotide polymorphisms (SNPs) in comparison to the published sequence of the Kazusa strain. However, Sanger sequencing of 36 direct PCR products of the Kazusa strain stored in small aliquots showed identity with the GT-S sequence at 21 of the 22 sites, excluding the possibility of their being SNPs. In addition, we were able to combine five split open reading frames present in the database sequence, and to remove the C-terminus of an ORF. Aside from these, two of the Insertion Sequence elements were not present in the GT-S strain. We are thus able to provide an accurate genomic sequence of Synechocystis sp. PCC 6803 for future studies on this important cyanobacterial strain.

Year:  2011        PMID: 21803841      PMCID: PMC3190959          DOI: 10.1093/dnares/dsr026

Source DB:  PubMed          Journal:  DNA Res        ISSN: 1340-2838            Impact factor:   4.458


Introduction

The nucleotide sequence of the genome of the cyanobacterium Synechocystis sp. PCC 6803 was determined by Kazusa DNA Research Institute in 1996 as the first genome of a photosynthetic organism.[1] Since then, this strain has served as a standard cyanobacterium in various areas of research, such as photosynthesis, stress response and metabolism.[2] However, the sequenced strain (called the Kazusa strain in the present study) is different from the stock in the Pasteur Culture Collection (called the PCC strain in the present study). In fact, the Kazusa strain is a derivative of a ‘glucose-tolerant’ strain, which was obtained by J.G.K. Williams at the DuPont Institute.[3] The published sequence of the Kazusa strain included some genes inactivated by a putative point mutation, a putative frame shift, or an Insertion Sequence (IS) insertion, such as the one in the pilC gene. The mutation within the coding sequence of the pilC gene was pointed out to be a possible reason for the non-motility of the Kazusa strain.[4] A 154 bp deletion was also found in the GT strain with respect to the PCC strain.[5] The locations of some IS elements in the Kazusa strain are known to differ from those in other GT and PCC strains.[6] Even within the PCC strains, different strains having different light responses have been isolated.[2] All these slightly different strains bear the common strain name PCC 6803, but we need to recognize the differences among the exact strains used in various studies. For this purpose, we will have to pinpoint the differences in the genome sequences of the various strains. One of the authors (N.S.) constructed 40 site-directed mutants in a previous work on comparative genomics of plants and cyanobacteria[7] using the laboratory stock of the Synechocystis GT strain (called GT-S). We thought that this strain should be identical to the Kazusa strain, because it originated in the late 1980s from the strain owned by Dr T. Omata, which was also the source of the Kazusa strain.
However, in view of the small but significant differences in genome sequence as reported earlier, it was important to establish the genetic background of our strain to assess correctly the phenotype of the above-mentioned mutants. Therefore, we attempted to analyse the genome sequence of the strain GT-S and to compare it with the reference sequence of the Kazusa strain. We found significant differences with respect to the database sequence, but we were finally convinced that the differences in the real sequences were minimal.

Materials and methods

Strain and genomes

Synechocystis GT-S strain was originally a gift from Dr Tatsuo Omata (now at Nagoya University; at the Riken Institute at that time) in the late 1980s, and has since been maintained in the Sato laboratory as frozen glycerol stocks. In the present study, we used the stock originally frozen in the early 1990s. The cells were grown in BG-11 medium at 32°C with aeration as described before.[8] The cells were harvested by centrifugation, washed twice with 4 M NaI to remove extracellular polysaccharide, and then treated with lysozyme. DNA was released by treatment with proteinase K and sodium N-dodecanoylsarcosinate, extracted with phenol and chloroform and purified by CsCl ultracentrifugation.[9] As a reference, we also used an aliquot of the DNA of the original Kazusa strain, which had been stored as a stock at Kazusa DNA Research Institute.

Sequencing and data analysis

Genomic DNA was sheared by ultrasonic treatment and sequenced on a Genome Sequencer FLX instrument (Roche Diagnostics, Indianapolis, IN, USA) according to the manufacturer's protocol (this is usually referred to as ‘454 sequencing’). To determine its genomic origin, namely the main genome or one of the plasmids, each read was analysed with BLASTN[10] software version 2.2.18 using the sequences of the four plasmids as well as the main genome as targets (the accession numbers are given in Supplementary Table S5). The options were: -F F -e 0.0001 -v 2 -b 2 -m 8 -C F (no filtering, cut-off E-value = 0.0001, output and list sequences = 2, table-formatted output, no compositional adjustments). In the table-formatted output, only the first line, corresponding to the highest identity, was selected for each read, and the read was assigned to the genome shown therein. The reads assigned to the main genome in this way were mapped onto the reference sequence of the Kazusa strain (GenBank and RefSeq accession numbers: BA000022 and NC_000911 for the main genome) with the inGAP software version 2.3.1.[11] Unfortunately, the details of the internal algorithm of the software are not clear, and there is no option related to the detection of SNPs. Therefore, all putative SNPs detected with the default settings were analysed. Plasmids were also analysed using their respectively assigned reads. A list of putative SNPs was obtained as an output. Homology of affected open reading frames (ORFs) with orthologues in other cyanobacteria was analysed using the cluster data of the CyanoClust database[12] prepared with the Gclust software.[13] Processing of DNA and protein sequences was performed with the SISEQ software version 1.59.[14] Sequence alignments were constructed with the Clustal X software version 2.0.9.[15] Genomic sequence was manipulated with the Artemis software version 13.0.[16]
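The read-assignment step described above can be illustrated with a minimal sketch. It assumes the standard 12-column tabular BLAST output (query, subject, % identity, alignment length, mismatches, gap openings, query start/end, subject start/end, E-value, bit score) and keeps only the first line reported for each read. The function name and the example reads are hypothetical; this is not the script used in the study.

```python
from collections import Counter


def assign_reads(tabular_lines):
    """Map each read id to the subject (replicon) on its first reported line."""
    assignment = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        read_id, subject_id = fields[0], fields[1]
        # Keep only the first line seen for each read, as described in the text.
        assignment.setdefault(read_id, subject_id)
    return assignment


# Hypothetical read names and hit values, for illustration only.
example = [
    "read1\tNC_000911\t99.50\t400\t2\t0\t1\t400\t1000\t1399\t0.0\t750",
    "read1\tNC_005232\t92.00\t380\t30\t1\t5\t384\t200\t579\t1e-150\t520",
    "read2\tNC_005232\t100.00\t350\t0\t0\t1\t350\t5000\t5349\t0.0\t690",
]

assignment = assign_reads(example)
counts = Counter(assignment.values())
print(assignment)
print(counts)
```

Reads grouped in this way can then be mapped separately to the main genome and to each plasmid, which avoids calling false SNPs from reads of highly homologous plasmid genes.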

Sequence confirmation

For each putative SNP, a genomic region of 200–300 bp was amplified (see Supplementary Table S1 for primer sequences). For each putative IS element, a genomic region of ∼300 or 1500 bp was amplified (see Supplementary Table S2 for primer sequences). The longer fragments were amplified in order to span repeated sequences. DNA templates of both GT-S and Kazusa strains were used. The products were sequenced by conventional Sanger sequencing, using the sequencing services of MACROGEN Japan Corp. (Tokyo, Japan) or FASMAC Co. Ltd. (Atsugi, Japan).

Results

Identification of SNPs

We obtained 197 912 reads having an average length of 399.3 bases for the GT-S strain by 454 sequencing. Without the preliminary classification of reads, 68 single-nucleotide polymorphisms (SNPs) were obtained for the main genome, but many of them were not correct, because of the presence of highly homologous genes in the plasmids. The reads were therefore allocated to the main genome and the four plasmids by homology analysis as described under ‘Sequencing and data analysis’. The 173 217 reads that were classified as reads for the main genome were mapped to the reference sequence NC_000911. Using the default settings of the inGAP software (see Supplementary data for the list of options), the entire genome was covered by at least one read, except for four small regions (Supplementary Table S3). The analysis of such gap regions was performed separately, as described below. As a result, 31 putative SNPs were detected by the inGAP analysis. All of them were selected as highly probable SNPs for experimental validation. Each of the putative SNPs was checked by PCR amplification and Sanger sequencing of both strands. Twenty-two SNPs (Table 1) were finally identified as differences between the sequence of the GT-S strain and the database sequence NC_000911 (identical to BA000022 with respect to the DNA sequence). To verify that these represent real differences between the two strains, we analysed, by Sanger sequencing, the DNA of the Kazusa strain, which had been stored in small aliquots. Surprisingly, all the putative SNP sites were found to be identical in the Kazusa strain and the GT-S strain except No. 8 (Table 1). SNP No. 8 is the mutation within the pilC gene, which had been reported earlier.[4] The two putative SNPs in the psbA3 coding region were identical to the corresponding sites of the psbA2 gene. Since the correct psbA3 sequence had been published before the genome sequence,[17] these putative SNPs are probably sequencing artefacts in NC_000911.
A putative SNP site in the psaA gene also matches the previously published sequence.[18] For the other sites, we have no clear explanation; they might reflect sequencing errors and/or mutations in the cosmid clones used in the original sequencing.
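As a rough cross-check of the read numbers given above, the expected sequencing depth on the main genome can be estimated. This is only a back-of-the-envelope sketch: it assumes that the overall average read length (399.3 bases) also applies to the reads assigned to the main genome, and it uses the approximate genome size of 3.6 Mb quoted in the Discussion.

```python
total_reads = 197_912          # all 454 reads obtained for GT-S
main_genome_reads = 173_217    # reads classified as main-genome reads
mean_read_length = 399.3       # average read length in bases (all reads)
genome_size_bp = 3.6e6         # approximate main-genome size (assumption)

total_bases = total_reads * mean_read_length
# Assumes the main-genome reads have the same mean length as all reads.
main_genome_bases = main_genome_reads * mean_read_length
fraction_main = main_genome_reads / total_reads
mean_depth = main_genome_bases / genome_size_bp

print(f"total sequence: {total_bases / 1e6:.1f} Mb")
print(f"fraction assigned to main genome: {fraction_main:.3f}")
print(f"estimated mean depth on main genome: {mean_depth:.1f}x")
```

Under these assumptions the average depth is roughly 19-fold, so the few regions covered by no more than two reads are local exceptions rather than a sign of generally thin coverage.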
Table 1.

List of putative SNPs

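No. | Site | Gene | CyanoClust cluster no. | Database | GT-Kazusa | GT-S | Amino acid change | Annotation | Ref.
1 | 943495 | psaA | 16 | G | A | A | V→I | P700 apoprotein subunit Ia | 18
2 | 1012958 | No gene | | G | T | T | N/A | |
3 | 1364187 | pyrF | 784 | A | G | G | None | Orotidine 5’ monophosphate decarboxylase |
4 | 1819782 | psbA3 | 18 | A | G | G | None | Photosystem II D1 protein | 17
5 | 1819788 | | | A | G | G | None | |
6 | 2092571 | sll0422 | 1760 | A | T | T | L→ter | Asparaginase |
7 | 2198893 | sll0142 | 15 | T | C | C | None | Cation or drug efflux system protein |
8 | 2204584 | gspF+pilC | 917+7792 | G | G | | Frame shift | Pilin biogenesis protein | 4
9 | 2301721 | slr0168 | 6624 | A | G | G | K→E | Hypothetical protein |
10 | 2350285.5 | No gene | | | A | A | N/A | |
11 | 2360245.5 | slr0364 | 26 765+19 649 | | C | C | Frame shift | Hypothetical protein |
12 | 2409244 | sll0762 | 2611 | C | | | Frame shift | Hypothetical protein |
13 | 2419399 | ycf22 | 779 | T | | | Frame shift | Hypothetical protein |
14 | 2544044.5 | ssl0787 | 2596 | | C | C | Frame shift | Hypothetical protein |
15 | 2602717 | slr0468 | 31358 | C | A | A | H→Q | Hypothetical protein |
16 | 2602734 | | | T | A | A | I→N | |
17 | 2748897 | No gene | | C | T | T | N/A | |
18 | 3096187 | ssr1175 | 796 | T | C | C | I→T | Transposase |
19 | 3110189 | No gene | | G | A | A | N/A | |
20 | 3110343 | sll0665 | 1448 | G | T | T | P→Q | Transposase |
21 | 3142651 | sps | 2831 | A | G | G | None | Sucrose phosphate synthase |
22 | 3260096 | No gene | | C | | | N/A | |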

GT-Kazusa and GT-S are the Synechocystis sp. PCC 6803 GT strains of Kazusa DNA Research Institute and the Sato Laboratory, respectively. ‘Site’ and ‘Database’ refer to the sequences in BA000022 or NC_000911. Insertion site numbers represent the last position before the insertion site + 0.5. N/A indicates that the amino acid change is not applicable because the SNP site is not in an ORF.
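The logic used to interpret Table 1 can be sketched as follows: a site at which the two strains agree with each other but not with the database points to a discrepancy in the database sequence, whereas a site at which the two strains disagree is a real strain difference. The base calls below are transcribed from Table 1, with an empty string standing for a cell without a base (an assumption about how the blank cells are to be read); the function name is hypothetical.

```python
# (No., Database, GT-Kazusa, GT-S); "" marks a table cell without a base.
SITES = [
    (1, "G", "A", "A"), (2, "G", "T", "T"), (3, "A", "G", "G"),
    (4, "A", "G", "G"), (5, "A", "G", "G"), (6, "A", "T", "T"),
    (7, "T", "C", "C"), (8, "G", "G", ""), (9, "A", "G", "G"),
    (10, "", "A", "A"), (11, "", "C", "C"), (12, "C", "", ""),
    (13, "T", "", ""), (14, "", "C", "C"), (15, "C", "A", "A"),
    (16, "T", "A", "A"), (17, "C", "T", "T"), (18, "T", "C", "C"),
    (19, "G", "A", "A"), (20, "G", "T", "T"), (21, "A", "G", "G"),
    (22, "C", "", ""),
]


def classify(database, kazusa, gts):
    """Classify one site from the three base calls."""
    if kazusa != gts:
        return "strain difference"
    if kazusa != database:
        return "database discrepancy"
    return "no difference"


summary = {}
for number, database, kazusa, gts in SITES:
    summary.setdefault(classify(database, kazusa, gts), []).append(number)

for label, numbers in summary.items():
    print(label, len(numbers), numbers)
```

With the table read in this way, site No. 8 is the only strain difference, and the remaining 21 sites are discrepancies between the database sequence and both strains.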

Unfortunately, the obtained reads did not map perfectly onto the reference genome. In 14 short regions, no reads or at most two reads were mapped (Supplementary Tables S3 and S4). These regions were amplified by PCR for both GT-S and Kazusa strains (results not shown). Conventional sequencing of the PCR products confirmed that there is no sequence difference in 11 of these regions with respect to the database sequence. The remaining three regions, each covered by two reads, were close to one another and located within a 3 kb region. Clean PCR amplification of this 3 kb region was not successful because of repeated sequences. However, the presence of two reads led us to tentatively conclude that there is no sequence difference in these regions.

Analysis of plasmids

Plasmids were also analysed by inGAP mapping. There were no putative SNPs in pSYSM and pSYSG (Supplementary Table S5). In pSYSA, four sites were reported as putative SNPs, but all of them were sites covered by only two reads, one of which matched the database sequence. Therefore, these were not considered to be SNPs in pSYSA. In pSYSX, four sites within or near the ssr6089 gene were detected as putative SNPs. Analysis using the CyanoClust database indicated that this plasmid contains two homologous regions of 30 kb, ssr6002–slr6038 and ssr6062–slr6094. The ssr6089 gene has a nearly identical homologue, ssr6030. However, the sequences corresponding to the four putative SNPs were identical in the two genes in the database sequence NC_005232. Therefore, the SNP calling was not due to mixing of reads for homologous genes. The SNPs could possibly represent mutations in the strain GT-S, but final validation is hampered by the high similarity of the long homologous regions.

Alteration of ORFs due to frame shift

There are five cases in which a single gene is split into a pair of genes as a result of frame shift. Figure 1A shows the site of putative SNP 12, namely the sll0762–sll0763 region. There is an extraneous C in the database sequence, and accordingly, the removal of this C results in fusion of the two ORFs. This new ORF encoding a hypothetical protein has well-conserved orthologues in other cyanobacteria (Anabaena, Cyanothece, Arthrospira etc.) as shown by the alignment of the cluster 2611 of the CyanoClust (Fig. 1B).
Figure 1.

Correction of ORFs due to a frame shift in the sll0762–sll0763 region. (A) Output of an SNP site in the reference sequence of the Kazusa strain by the inGAP software. The upper DNA sequence indicates the reference sequence of the Kazusa strain (GenBank and RefSeq accession numbers: BA000022 and NC_000911), and the lower DNA sequence indicates the sequence of the GT-S strain. Each arrow represents a gene. Each arrowhead indicates an SNP site. (B) New alignment with a corrected sequence. Homology of affected ORFs with corresponding sequences in other cyanobacteria was analysed by the CyanoClust database version 4, and the cluster 2611 was found. Sequences were retrieved and a new alignment was obtained by the Clustal X software. ‘New_Sequence’ indicates the corrected sequence. The arrowhead indicates the nucleotide variation detected as a putative SNP site.

To correct the database sequence to obtain the GT-S genome sequence, we should combine (i) slr0162 (gspF) and slr0163 (pilC), (ii) slr0364 and slr0366, (iii) sll0762 and sll0763 (this is described above), (iv) sll0751 (ycf22) and sll0752, and (v) ssl0787 and ssl0788 (Supplementary Figs S1 and S2). In addition, the extended C-terminus of the Sll0422 protein should be removed after correction for the nucleotide change (Supplementary Fig. S1). All these changes except (i) also apply to the real sequence of the Kazusa strain.
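The effect of a single extraneous base on an ORF, as in the sll0762–sll0763 case, can be illustrated with a toy example. The sequences below are invented for illustration and are not the Synechocystis sequences; the sketch uses the standard codon assignments and simply translates from the first base until the first stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate_until_stop(dna):
    """Translate in frame from position 0 and stop at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical corrected ORF (toy sequence, not from the genome).
corrected = "ATGGCTAAAGGTGAACTGTTCACCTAA"
# The same ORF with one extraneous C inserted after the third codon,
# mimicking an extra base in a database sequence.
with_extra_c = corrected[:9] + "C" + corrected[9:]

print(translate_until_stop(with_extra_c))  # truncated product
print(translate_until_stop(corrected))     # full-length product
```

In the toy example the extra base shifts the reading frame so that a stop codon appears almost immediately, splitting the ORF; removing it restores the full-length reading frame, which is the kind of correction applied to the five gene pairs listed above.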

Large indels

We also checked large indels (insertions/deletions). The exact sites of insertion of various IS elements have already been analysed.[6] Among them, the ISY203b insertion between slr1862 and slr1863 and the ISY203g insertion between sll1473 and sll1475 were found in the Kazusa strain but not in the GT-S strain. The ISY203e insertion between ssl2982 and slr1636 was detected in both Kazusa and GT-S strains (Table 2) but not in another GT strain in the Ikeuchi laboratory. It has also been known that a 154 bp element upstream of the slr2031 gene is deleted in the GT strains.[5] This deletion was shared by all GT strains analysed in the present study.
Table 2.

List of ISY203s detected in GT strains

IS name | Transposase gene | Database | GT-Kazusa | GT-S
ISY203b | sll1780 | Yes | Yes | No
ISY203e | slr1635 | Yes | Yes | Yes
ISY203g | sll1474 | Yes | Yes | No

Finally validated differences of the two strains

All the preceding descriptions were based on comparison using the database sequence as the sole reference. Given that a number of changes have to be made to the database sequence, we summarize our results as the differences between the real sequences of GT-S and GT-Kazusa. The two sequences are essentially identical except for a single frame-shift mutation in the pilC gene and two additional insertions of ISY203 in GT-Kazusa with respect to GT-S.

Discussion

The present study revealed that a significant number of differences are present between the database sequence and the genome sequences of laboratory strains of the same ‘species’ Synechocystis sp. PCC 6803. The detailed analysis using the genomic DNA of both Kazusa and GT-S strains indicated that 21 of the detected putative SNPs were, in fact, discrepancies in the database sequence rather than real differences between the two genomes. The final balance sheet indicates that we found a single nucleotide change and two IS insertions between the Kazusa strain and the GT-S strain. The time of separation of the two strains may be estimated as the mid-1980s according to the recollections of the people concerned, which are now rather vague. The time interval until the DNA isolation for sequencing may be roughly estimated as about 10 years for the Kazusa strain. The GT-S strain was stocked in the early 1990s, and re-plated in 2010 for the present analysis. The effective time interval since the separation of the two sub-strains was also about 10 years. The results suggest that nucleotide changes (mutations) can be kept to a minimum (only one, in this case) if due attention is paid to the maintenance of strains, but that IS mobilization may be more frequent (two events). The rapid mobilization of IS could be limited to the particular element ISY203, but we do not know the actual trigger of activation of this IS element. We should, therefore, be careful about IS activation in the maintenance of laboratory stocks. We will need a convenient way of detecting a mobilized ISY203 to be sure about our research using the GT strain. The nucleotide changes resulting from re-sequencing had significant effects on gene annotation. As mentioned, before this analysis five genes had each been thought to be split into two by a single nucleotide difference. The length of another gene was also changed.
The IS element inserted in the sll1474 (ccaS) gene is known to inactivate it.[6] Altogether, the nucleotide changes (whether sequencing errors or real mutations) have an important impact on molecular biological research using cyanobacteria or other bacteria. A single run of new-generation sequencing with some additional PCR experiments can establish the identity of the organism that is being used in the laboratory. This will become a standard practice of molecular genetics in microbiology. The genomic database is very important not only in experimental studies but also in computational analysis. The use of a correct sequence is a prerequisite for detailed comparative genomics research. Twenty-one sites per 3.6 Mb genome is a significantly large number for the present-day level of genome analysis. The correction of the standard sequence will be especially useful for Synechocystis sp. PCC 6803, which is a standard cyanobacterial strain in various areas of research such as photosynthesis and stress response, among others. We hope that our data, deposited as a new separate entry, will be useful for all those who are using this cyanobacterium in their research.
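To put the figure of 21 sites per 3.6 Mb in perspective, the corresponding per-base discrepancy rate can be worked out directly. This is a simple arithmetic sketch using the approximate genome size quoted above; the variable names are illustrative.

```python
discrepant_sites = 21      # database sites differing from both strains
genome_size_bp = 3.6e6     # approximate genome size quoted in the text

rate_per_base = discrepant_sites / genome_size_bp
bases_per_site = genome_size_bp / discrepant_sites

print(f"discrepancy rate: {rate_per_base:.2e} per base")
print(f"about one discrepant site per {bases_per_site / 1e3:.0f} kb")
```

This corresponds to roughly one discrepant site per 170 kb, which is small in absolute terms but, as shown in the Results, sufficient to split or extend several annotated ORFs.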

Databases

The genome sequence of the strain GT-S was deposited in the DDBJ/GenBank/EMBL database under the accession number AP012205.

Supplementary data

Supplementary data are available at www.dnaresearch.oxfordjournals.org.

Funding

This work was supported in part by the Global Center of Excellence (GCOE) Program ‘From the Earth to “Earths”’ from the MEXT, Japan.
  15 in total

1.  Experimental analysis of recently transposed insertion sequences in the cyanobacterium Synechocystis sp. PCC 6803.

Authors:  S Okamoto; M Ikeuchi; M Ohmori
Journal:  DNA Res       Date:  1999-10-29       Impact factor: 4.458

2.  Artemis: sequence visualization and annotation.

Authors:  K Rutherford; J Parkhill; J Crook; T Horsnell; P Rice; M A Rajandream; B Barrell
Journal:  Bioinformatics       Date:  2000-10       Impact factor: 6.937

3.  SISEQ: manipulation of multiple sequence and large database files for common platforms.

Authors:  N Sato
Journal:  Bioinformatics       Date:  2000-02       Impact factor: 6.937

4.  Type IV pilus biogenesis and motility in the cyanobacterium Synechocystis sp. PCC6803.

Authors:  D Bhaya; N R Bianco; D Bryant; A Grossman
Journal:  Mol Microbiol       Date:  2000-08       Impact factor: 3.501

5.  Nucleotide sequence of the psbA3 gene from the cyanobacterium Synechocystis PCC 6803.

Authors:  J Metz; P Nixon; B Diner
Journal:  Nucleic Acids Res       Date:  1990-11-25       Impact factor: 16.971

6.  Orthogenomics of photosynthetic organisms: bioinformatic and experimental analysis of chloroplast proteins of endosymbiont origin in Arabidopsis and their counterparts in Synechocystis.

Authors:  Masayuki Ishikawa; Makoto Fujiwara; Kintake Sonoike; Naoki Sato
Journal:  Plant Cell Physiol       Date:  2009-02-18       Impact factor: 4.927

7.  Gclust: trans-kingdom classification of proteins using automatic individual threshold setting.

Authors:  Naoki Sato
Journal:  Bioinformatics       Date:  2009-01-21       Impact factor: 6.937

8.  DNA transformation.

Authors:  R D Porter
Journal:  Methods Enzymol       Date:  1988       Impact factor: 1.600

9.  CyanoClust: comparative genome resources of cyanobacteria and plastids.

Authors:  Naobumi V Sasaki; Naoki Sato
Journal:  Database (Oxford)       Date:  2010-01-08       Impact factor: 3.451

10.  inGAP: an integrated next-generation genome analysis pipeline.

Authors:  Ji Qi; Fangqing Zhao; Anne Buboltz; Stephan C Schuster
Journal:  Bioinformatics       Date:  2009-10-30       Impact factor: 6.937
