Yu Kanesaki, Yuh Shiwa, Naoyuki Tajima, Marie Suzuki, Satoru Watanabe, Naoki Sato, Masahiko Ikeuchi, Hirofumi Yoshikawa.
Abstract
The cyanobacterium Synechocystis sp. PCC 6803 was the first photosynthetic organism whose genome sequence was determined, in 1996 (Kazusa strain). It thus plays an important role in basic research on the mechanism, evolution, and molecular genetics of the photosynthetic machinery. Many substrains or laboratory strains, including glucose-tolerant (GT) strains, have been derived from the original Berkeley strain. To establish reliable genomic sequence data for this cyanobacterium, we resequenced the genomes of three substrains (GT-I, PCC-P, and PCC-N) and compared the data obtained with those of the original Kazusa strain stored in the public database. We found that each substrain has sequence differences, some of which are likely to reflect specific mutations that may contribute to its altered phenotype. Our resequencing data for the PCC substrains, along with the proposed corrections/refinements of the sequence data for the Kazusa strain and its derivatives, are expected to contribute to investigations of the evolutionary events in the photosynthetic and related systems that have occurred in Synechocystis as well as in other cyanobacteria.
Year: 2011 PMID: 22193367 PMCID: PMC3276265 DOI: 10.1093/dnares/dsr042
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
Table 1. Summary of mapping analyses using the read data (BWA, MAQ) or the de novo-assembled contigs (Velvet and MUMmer)
| | GT-I | PCC-N | PCC-P |
|---|---|---|---|
| Total read bases (Mb) | 250 | 257 | 221 |
| Average read depth | 70 | 72 | 62 |
| Genome coverage (%) | 99.99 | 99.99 | 99.99 |
| Mapping programme | Number of SNPs and indels called by each programme (final number of differences/number of differences including false positives) | | |
| MAQ | 16/76 | 26/78 | 23/89 |
| BWA | 19/69 | 32/79 | 28/75 |
| Velvet and MUMmer | 22/85 | 33/104 | 29/109 |
| BreakDancer | 3/3 | 3/3 | 3/3 |
| Final number of differences from the database | 28 | 44 | 39 |
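In the mapping summary, the average read depth is consistent with the total read bases divided by the reference length, and each programme's confirmed calls are a small fraction of its raw calls. The minimal Python sketch below recomputes both from the table values. It assumes the reference length is the NC_000911 chromosome (3,573,470 bp, plasmids ignored); `mean_depth` and `confirmed_fraction` are illustrative helper names, not part of any of the cited tools.

```python
# Values copied from the mapping summary table (strains GT-I, PCC-N, PCC-P).
TOTAL_READ_MB = {"GT-I": 250, "PCC-N": 257, "PCC-P": 221}
FINAL_DIFFERENCES = {"GT-I": 28, "PCC-N": 44, "PCC-P": 39}

# programme -> strain -> (final differences, calls including false positives)
CALLS = {
    "MAQ": {"GT-I": (16, 76), "PCC-N": (26, 78), "PCC-P": (23, 89)},
    "BWA": {"GT-I": (19, 69), "PCC-N": (32, 79), "PCC-P": (28, 75)},
    "Velvet+MUMmer": {"GT-I": (22, 85), "PCC-N": (33, 104), "PCC-P": (29, 109)},
    "BreakDancer": {"GT-I": (3, 3), "PCC-N": (3, 3), "PCC-P": (3, 3)},
}

# Assumption: chromosome length of NC_000911 only; plasmids are not included.
GENOME_BP = 3_573_470


def mean_depth(total_mb: float, genome_bp: int = GENOME_BP) -> float:
    """Average read depth = total sequenced bases / reference length."""
    return total_mb * 1_000_000 / genome_bp


def confirmed_fraction(final: int, called: int) -> float:
    """Share of a programme's raw calls that remained after confirmation."""
    return final / called


for strain, mb in TOTAL_READ_MB.items():
    # Largest confirmed count achieved by any single programme for this strain.
    best_single = max(calls[strain][0] for calls in CALLS.values())
    print(f"{strain}: depth ~{mean_depth(mb):.0f}x, "
          f"best single programme {best_single} vs combined {FINAL_DIFFERENCES[strain]}")
    for programme, calls in CALLS.items():
        final, called = calls[strain]
        print(f"  {programme}: {confirmed_fraction(final, called):.0%} confirmed")
```

The rounded depths come out at 70, 72, and 62, matching the table, and for every strain the best single programme finds fewer confirmed differences than the combined final count.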
Figure 1. Analytical scheme for the read data obtained by massively parallel sequencing. The preparation of the DNA library is described in the Materials and methods section. The mapping programmes BWA and MAQ were used for the short-read data; Velvet was used for de novo assembly, and MUMmer was used to map the assembled contigs.
Figure 2. Venn diagram of the mutations identified by each programme. The number of mutations (SNPs and indels) confirmed by the Sanger method is shown in each circle, with the number of mutations including false positives in parentheses. The number of mutations detected by multiple programmes is indicated in the overlapping regions of the circles. A threshold (cut-off) value of 60% was used in the mapping programmes BWA and MAQ (see Materials and methods section). Mutations detected by all three programmes were more reliable. The combined use of the mapping programmes is important for the genome-wide identification of mutation loci. Numbers labelled with an asterisk include miscalled results, which are indicated in parentheses in Tables 2 and 3.
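The Venn-diagram logic amounts to grouping each locus by the exact set of programmes that called it. The Python sketch below illustrates this with five PCC-N-specific loci and the programmes listed in their Source column in Table 3. The helper names are hypothetical, and the sketch does not apply the 60% cut-off described in Materials and methods.

```python
from collections import Counter

PROGRAMMES = frozenset({"MAQ", "BWA", "MUMmer"})

# PCC-N strain-specific loci and the programmes listed in their Source column.
CALLS = {
    69849: {"MAQ", "BWA", "MUMmer"},
    125262: {"MUMmer"},
    1597057: {"MAQ", "BWA", "MUMmer"},
    1763998: {"MAQ", "BWA", "MUMmer"},
    2881614: {"BWA"},
}


def venn_regions(calls):
    """Count loci in each Venn region, i.e. per exact combination of callers."""
    return Counter(frozenset(programmes) for programmes in calls.values())


def circle_totals(calls):
    """Count loci inside each programme's whole circle (overlaps included)."""
    totals = Counter()
    for programmes in calls.values():
        totals.update(programmes)
    return totals


def supported_by_all(calls):
    """Loci called by every programme, treated as the most reliable subset."""
    return sorted(locus for locus, p in calls.items() if p >= PROGRAMMES)


regions = venn_regions(CALLS)
print(regions[PROGRAMMES], "loci in the triple overlap")
print(dict(circle_totals(CALLS)))
print("union of all programmes:", len(CALLS), "loci")
print("supported by all:", supported_by_all(CALLS))
```

Here no single programme reaches all five loci (BWA and MUMmer each see four, MAQ three); only the union recovers them all, which is the reason for combining the programmes.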
Table 2. List of the genomic loci of SNPs and indels found in all of the GT-I, PCC-P, and PCC-N strains compared with the nucleotide sequence in the database
| Genomic loci | Type | Database | GT-Kazusa | GT-S strain | GT-I strain | PCC-P strain | PCC-N strain | Quality score | Source | Gene ID | Annotation | Amino acid change | Comment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 943495 | SNP | G | A | A | A | A | A | 255 255 | MAQ BWA MUMmer | | | V604I | Smart and McIntosh (10). Error in the database (27) |
| 1012958 | SNP | G | T | T | T | T | T | 255 255 | MAQ BWA MUMmer | | | — | Error in the database (27) |
| 1200143–1201488 (1200306) | Indel (SNP) | IS (C) | IS | — | —(A) | —(A) | —(A) | (108 233) 99 | (MAQ BWA) BreakDancer | | | — | Insertion of transposase (14). GT-Kazusa-specific (27). MAQ and BWA detected this indel region as an SNP |
| 1364187 | SNP | A | G | G | G | G | G | 255 255 | MAQ BWA MUMmer | | | Silent | Error in the database (27) |
| 2092571 | SNP | A | T | T | T | T | T | 255 255 | MAQ BWA MUMmer | | | L313* (stop codon) | Error in the database (27) |
| 2198893 | SNP | T | C | C | C | C | C | 255 255 | MAQ BWA MUMmer | | | 689 (silent) | Error in the database (27) |
| 2204584 | Indel | G | G | — | — | — | — | | MUMmer | | | | G insertion in the GT-Kazusa strain causes split of the original |
| 2301721 | SNP | A | G | G | G | G | G | 255 255 | MAQ BWA MUMmer | | | K403E | Error in the database (27) |
| 2350285–2350286 | Indel | — | A | A | A | A | A | 317 | BWA MUMmer | | | — | Error in the database (27) |
| 2360245–2360246 | Indel | — | C | C | C | C | C | 323 | BWA MUMmer | | | Frameshift | Error in the database (27) |
| 2409244 | Indel | C | — | — | — | — | — | | MUMmer | | | Frameshift | Error in the database (27) |
| 2419399 | Indel | T | — | — | — | — | — | 302 | BWA MUMmer | | | Frameshift | Error in the database (27) |
| 2544044–2544045 | Indel | — | C | C | C | C | C | 180 | BWA MUMmer | | | Frameshift | Error in the database (27) |
| 2602717 | SNP | C | A | A | A | A | A | 255 255 | MAQ BWA MUMmer | | | H82Q | Error in the database (27) |
| 2602734 | SNP | T | A | A | A | A | A | 255 255 | MAQ BWA MUMmer | | | I88N | Error in the database (27) |
| 2748897 | SNP | C | T | T | T | T | T | 255 255 | MAQ BWA MUMmer | | | — | Error in the database (27) |
| 3142651 | SNP | A | G | G | G | G | G | 255 255 | MAQ BWA MUMmer | | | 75 (silent) | Error in the database (27) |
| 3260096 | Indel | C | C | — | — | — | — | | MUMmer | | | — | GT-Kazusa-specific (27) |
| 3400322–3401506 | Indel | IS | IS | — | — | — | — | 99 | BreakDancer | | | | Insertion of transposase. IS insertion causes split of the original |
| Genomic loci | Type | Database | GT-Kazusa | GT-S strain | GT-I strain | PCC-P strain | PCC-N strain | Quality score | Source | Gene ID | Annotation | Amino acid change | Comment |
| 386410–386411 (386406) | Indel (SNP) | —(T) | — | — | 102 bp (A) | 102 bp (A) | 102 bp (A) | (68) | (MAQ) | | | 34-amino-acid deletion (V77D) | This indel region was called as an SNP by MAQ, as shown in parentheses. This indel region was not detected in the GT-S strain (27). CTGGGGGAAAAATGTTGGATTGATAACCTCGCCCCGGTTACCATTGAGTCCCATGTGTGTATTTCCCAGGGCGTTTACCTATGCACTGGCAACCACGATTGG |
| 1192983 | SNP | A | A | A | C/A | C/A | C/A | | MUMmer | | | T167P | Potential heterogeneous nucleotide (the C and A peaks were of almost equal intensity). This SNP was not detected in the GT-S strain (27) |
| 2048341–2049583 | Indel | IS | IS | IS | — | — | — | 99 | BreakDancer | | | — | Insertion of transposase (14). IS specific to the GT-Kazusa and GT-S strains (27) |
The left column shows the genomic locus of each mutation in the database (NCBI accession number NC_000911). Quality scores are the phred-scaled scores called by MAQ and BWA, respectively; scores given by BreakDancer are software-specific values. The upper table lists mutations suggested to be errors in the database, together with GT-Kazusa-strain-specific mutations such as ISY203b, ISY203g, and the locus 2204584 (31). The lower table shows additional differences found only in the GT-I, PCC-P, and PCC-N strains. Greyed columns emphasize the different sites and their details. Several indel regions miscalled as SNPs by MAQ and BWA are shown in parentheses.
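A phred-scaled score Q encodes an error probability P through Q = -10 log10(P), so each 10 points means a tenfold lower chance that the call is wrong. The short Python sketch below converts the MAQ/BWA scores in the tables to probabilities. It should not be applied to the BreakDancer values, which, as noted above, are on the software's own scale rather than the phred scale.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Phred scale: Q = -10 * log10(P_error), so P_error = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Inverse conversion from an error probability to a phred score."""
    return -10 * math.log10(p)


# Example scores taken from the MAQ/BWA quality column of the tables.
for q in (66, 217, 255):
    print(f"Q={q}: P(error) ~ {phred_to_error_prob(q):.1e}")
```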
Table 3. List of the genomic loci of SNPs and indels found in specific strains compared with the nucleotide sequence in the database
| Genomic loci | Type | Database | GT-Kazusa | GT-S strain | GT-I strain | PCC-P strain | PCC-N strain | Quality score | Source | Gene ID | Annotation | Amino acid change | Comment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 126257 | SNP | C | C | C | C | T | T | 255 255 | MAQ BWA MUMmer | | | D63N | Different site between GT strains and PCC strains |
| 731367 | Indel | T | T | T | T | — | — | 155 | BWA MUMmer | | | | T insertion in GT strains causes gene split of the original |
| 781625–781626 | Indel | — | — | — | — | 154 bp | 154 bp | — | — | | | | Different site between GT strains and PCC strains (13). TTTAAACGTCATGCACCAATCTCTGATTTACTGGTTTATTCATCTATCAATTCCATAGGCTTTTTGCTTCATCGCTCCAACTAACTTTTCTGGGATGTCCTCCATGCCCCCCGTGCCTAGCTTACCGTCCACCGATGCCGTTATTCCCCCCGGC |
| 831647 | SNP | C | C | C | C | T | T | 255 255 | MAQ BWA MUMmer | | | — | Different site between GT strains and PCC strains |
| 1204616 | SNP | G | G | G | G | A | A | 255 255 | MAQ BWA MUMmer | | | C114Y | Different site between GT strains and PCC strains |
| 1300941–1300985 (1300977) | Indel (SNP) | 45 bp | 45 bp | 45 bp | 45 bp (C) | —(T) | —(T) | (164 255) | (MAQ BWA) | | | 15-amino-acid deletion | Putative PCC-strain-specific 45-bp deletion without frameshift. This indel region was called as an SNP by MAQ and BWA, as shown in parentheses. GGGCTATCCTGCGGGATAGCGACATGACCCTGGCCACTCTCCAGG |
| 1423340–1423341 | Indel | — | — | — | — | A | A | 386 | BWA MUMmer | | | N1438* (stop codon) | Different site between GT strains and PCC strains |
| 1437389 | SNP | A | A | A | A | G | G | 255 255 | MAQ BWA MUMmer | | | N6S | Different site between GT strains and PCC strains |
| 1812419 | SNP | C | C | C | C | T | T | 255 255 | MAQ BWA MUMmer | | | A225V | Different site between GT strains and PCC strains |
| 2521013 | SNP | T | T | T | T | C | C | 255 255 | MAQ BWA MUMmer | | | F898S | Different site between GT strains and PCC strains |
| 2736514–2736515 | Indel | — | — | — | — | T | T | 257 | BWA MUMmer | | | Frameshift | Different site between GT strains and PCC strains |
| 3014665 | SNP | T | T | T | T | C | C | 255 255 | MAQ BWA MUMmer | | | 92 (silent) | Different site between GT strains and PCC strains |
| 3096187 | SNP | T | T | T | T | C | C | 66 | MAQ | | | I47T | Different site between GT strains and PCC strains |
| 3098707 | SNP | T | T | T | T | C | C | 217 224 | MAQ BWA | | | C95R | Different site between GT strains and PCC strains. Potential heterogeneous nucleotide (a small T peak was also detected in the PCC strains) |
| Genomic loci | Type | Database | GT-Kazusa | GT-S strain | GT-I strain | PCC-P strain | PCC-N strain | Quality score | Source | Gene ID | Annotation | Amino acid change | Comment |
| 387006 | SNP | C | C | C | T | C | C | 255 255 | MAQ BWA MUMmer | | | P109L | GT-I strain-specific |
| 842060 | SNP | C | C | C | T | C | C | 255 255 | MAQ BWA MUMmer | | | R185Q | GT-I strain-specific |
| 909360 | SNP | C | C | C | T | C | C | 255 255 | MAQ BWA MUMmer | | | E93K | GT-I strain-specific |
| 1392586 | SNP | T | T | T | C | T | T | 255 255 | MAQ BWA MUMmer | | | L204S | GT-I strain-specific |
| 1470212 | SNP | G | G | G | A | G | G | 255 255 | MAQ BWA MUMmer | | | R46C | GT-I strain-specific |
| 1764198 | SNP | T | T | T | G | T | T | 234 244 | MAQ BWA MUMmer | | | F158C | GT-I strain-specific |
| Genomic loci | Type | Database | GT-Kazusa | GT-S strain | GT-I strain | PCC-P strain | PCC-N strain | Quality score | Source | Gene ID | Annotation | Amino acid change | Comment |
| 125218 | SNP | G | G | G | G | A | G | 255 255 | MAQ BWA MUMmer | | | T409M | PCC-P strain-specific |
| 1437136 | SNP | G | G | G | G | A | G | 255 255 | MAQ BWA MUMmer | | | 146 (silent) | PCC-P strain-specific |
| 2674108 | SNP | C | C | C | C | T | C | 255 255 | MAQ BWA MUMmer | | | A3V | PCC-P strain-specific |
| 69849 | SNP | G | G | G | G | G | A | 255 255 | MAQ BWA MUMmer | | | R189Q | PCC-N strain-specific |
| 125262–125273 | Indel | 12 bp | 12 bp | 12 bp | 12 bp | 12 bp | — | | MUMmer | | | Four-amino-acid deletion | PCC-N strain-specific 12-base deletion without frameshift. CTGGGTCAACAT |
| 1597057 | SNP | T | T | T | T | T | G | 255 255 | MAQ BWA MUMmer | | | V88G | PCC-N strain-specific |
| 1763998 | SNP | G | G | G | G | G | C | 255 255 | MAQ BWA MUMmer | | | E91D | PCC-N strain-specific |
| 2370197 | SNP | A | A | A | A | A | G | 255 255 | MAQ BWA MUMmer | | | T306A | PCC-N strain-specific |
| 2580625 | SNP | T | T | T | T | T | A | 255 255 | MAQ BWA MUMmer | | | — | PCC-N strain-specific |
| 2580626 | SNP | A | A | A | A | A | G | 255 255 | MAQ BWA MUMmer | | | — | PCC-N strain-specific |
| 2881614–2881615 | Indel | — | — | — | — | — | T | 269 | BWA | | | Frameshift | PCC-N strain-specific |
The left column shows the genomic locus of each mutation in the database (NCBI accession number NC_000911). Quality scores are the phred-scaled scores called by MAQ and BWA, respectively. Greyed columns emphasize the different sites and their details. Several indel regions miscalled as SNPs by MAQ and BWA are shown in parentheses.
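The grouping used in Tables 2 and 3 follows from which strain columns differ from the database column at each site. The Python sketch below expresses that rule; `classify_site` is a hypothetical helper, and it assumes exact string comparison of cell values, so annotated cells such as "45 bp (C)" would need normalising before use.

```python
STRAINS = ("GT-Kazusa", "GT-S", "GT-I", "PCC-P", "PCC-N")
PCC = frozenset({"PCC-P", "PCC-N"})


def classify_site(database_base: str, strain_bases: tuple) -> str:
    """Describe which strains differ from the database at one site.

    strain_bases follows STRAINS order, mirroring the table columns.
    """
    differing = frozenset(
        strain for strain, base in zip(STRAINS, strain_bases) if base != database_base
    )
    if not differing:
        return "identical to database"
    if differing == frozenset(STRAINS):
        return "all strains differ (candidate database error)"
    if differing == PCC:
        return "GT strains vs PCC strains"
    if len(differing) == 1:
        (strain,) = differing
        return f"{strain}-specific"
    return "other pattern"


# Rows from the tables: locus -> (database base, bases in STRAINS order).
EXAMPLES = {
    943495: ("G", ("A", "A", "A", "A", "A")),
    126257: ("C", ("C", "C", "C", "T", "T")),
    387006: ("C", ("C", "C", "T", "C", "C")),
    2674108: ("C", ("C", "C", "C", "T", "C")),
}
for locus, (db, bases) in EXAMPLES.items():
    print(locus, classify_site(db, bases))
```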
Figure 3. Alignment of the specific indel regions whose consensus read bases were miscalled. (A) The 154-base deletion in the slr2031 gene (13) in the GT strains. (B) The 12-base deletion in the hik33 gene in the PCC-N strain. (C) The 45-base deletion in the slr1819 gene in the PCC-P and PCC-N strains. (D) The 102-base deletion in the slr1084 gene in the GT-S and Kazusa strains. Deleted regions are underlined and direct-repeat sequences are shaded grey. These deleted loci lie in the middle of the direct-repeat sequences.
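Two simple checks relate to these deletions: whether a deletion preserves the reading frame (its length is a multiple of three), and how long a direct repeat is shared by the start of the deleted segment and the sequence just after it. The Python sketch below applies the first check to the deleted sequences listed in Tables 2 and 3, which reproduces the stated 34-, 15-, and 4-amino-acid deletions. The repeat check is run on a hypothetical toy sequence, because the flanking genomic sequence is not given here; both helper names are illustrative.

```python
def codons_removed(deleted_seq: str):
    """Return the number of codons lost if the deletion keeps the reading frame,
    or None if its length is not a multiple of three."""
    n = len(deleted_seq)
    return n // 3 if n % 3 == 0 else None


def direct_repeat_length(ref: str, start: int, end: int) -> int:
    """Length of the repeat shared by the start of the deleted segment ref[start:end]
    and the sequence immediately after it (0-based coordinates, end exclusive)."""
    k = 0
    while k < end - start and end + k < len(ref) and ref[start + k] == ref[end + k]:
        k += 1
    return k


# Deleted sequences copied from the tables.
DELETIONS = {
    "slr1084 (102 bp)": "CTGGGGGAAAAATGTTGGATTGATAACCTCGCCCCGGTTACCATTGAGTCCCATGTGTGTATTTCCCAGGGCGTTTACCTATGCACTGGCAACCACGATTGG",
    "slr1819 (45 bp)": "GGGCTATCCTGCGGGATAGCGACATGACCCTGGCCACTCTCCAGG",
    "hik33 (12 bp)": "CTGGGTCAACAT",
}
for name, seq in DELETIONS.items():
    print(name, "codons removed:", codons_removed(seq))

# Hypothetical toy reference: deleting ref[4:12] ("GATCTTTT") leaves one copy of GATC.
toy = "AAAA" + "GATC" + "TTTT" + "GATC" + "CCCC"
print("direct repeat length:", direct_repeat_length(toy, 4, 12))
```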
Figure 4. Unrooted tree showing the phylogenetic relationships among various strains of Synechocystis sp. PCC 6803. Known events are indicated on each branch. The number of mutations in each substrain relative to the database sequence (Kazusa strain) is indicated. The scale bar indicates the branch length corresponding to the number of mutations.
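Because branch lengths are expressed as numbers of mutations, one simple way to obtain such counts is to compare strains site by site. The Python sketch below does this for four SNP rows from the strain-specific table. It is a small subset, so its counts are not those shown in the figure, and the tree-building method itself is not described in this section; a distance matrix like this could, for example, be passed to a neighbour-joining routine.

```python
from itertools import combinations

STRAINS = ("GT-Kazusa", "GT-S", "GT-I", "PCC-P", "PCC-N")

# Four SNP rows from the strain-specific table; bases follow STRAINS order.
SITES = {
    126257: ("C", "C", "C", "T", "T"),  # GT strains vs PCC strains
    387006: ("C", "C", "T", "C", "C"),  # GT-I-specific
    125218: ("G", "G", "G", "A", "G"),  # PCC-P-specific
    69849: ("G", "G", "G", "G", "A"),   # PCC-N-specific
}


def pairwise_differences(sites, strains=STRAINS):
    """Count, for every pair of strains, the sites at which their bases differ."""
    distances = {}
    for i, j in combinations(range(len(strains)), 2):
        distances[(strains[i], strains[j])] = sum(
            bases[i] != bases[j] for bases in sites.values()
        )
    return distances


for (a, b), n in pairwise_differences(SITES).items():
    print(f"{a} vs {b}: {n}")
```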