Shellie R Bench, Irina N Ilikchyan, H James Tripp, Jonathan P Zehr.
Abstract
Unicellular nitrogen-fixing cyanobacteria are important components of marine phytoplankton. Although non-nitrogen-fixing marine phytoplankton generally exhibit high gene sequence and genomic diversity, gene sequences of natural populations and isolated strains of Crocosphaera watsonii, one of the two most abundant open ocean unicellular cyanobacteria groups, have been shown to be 98–100% identical. The low sequence diversity in Crocosphaera is a dramatic contrast to sympatric species of Prochlorococcus and Synechococcus, and raises the question of how genome differences can explain observed phenotypic diversity among Crocosphaera strains. Here we show, through whole genome comparisons of two phenotypically different strains, that there are strain-specific sequences in each genome, and numerous genome rearrangements, despite exceptionally low sequence diversity in shared genomic regions. Some of the strain-specific sequences encode functions that explain observed phenotypic differences, such as exopolysaccharide biosynthesis. The pattern of strain-specific sequences distributed throughout the genomes, along with rearrangements in shared sequences, is evidence of significant genetic mobility that may be attributed to the hundreds of transposase genes found in both strains. Furthermore, such genetic mobility appears to be the main mechanism of strain divergence in Crocosphaera, which do not accumulate DNA microheterogeneity over the vast majority of their genomes. The strain-specific sequences found in this study provide tools for future physiological studies, as well as genetic markers to help determine the relative abundance of phenotypes in natural populations.
Keywords: Crocosphaera; comparative genomics; exopolysaccharide biosynthesis; genome conservation; mobile genetic elements; nitrogen fixation
Year: 2011 PMID: 22232617 PMCID: PMC3247675 DOI: 10.3389/fmicb.2011.00261
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Table 1. Genome assembly information and annotation summary.
| Strain (NCBI ID) | Total genome length (bp) | No. of contigs | Longest contig | Average contig length | Genome G + C% | No. of ORFs | No. of transposases |
|---|---|---|---|---|---|---|---|
| WH8501 (GI #67858163) | 6,238,156 | 323 | 720,107 | 19,313 | 37.1 | 5,958 | 1,211 |
| WH0003 (AESD01000001–899) | 5,465,610 | 899 | 46,275 | 6,079 | 37.7 | 5,795 | 220 |
| Probable WH0003 (AESD01000900–1126) | 424,894 | 227 | 15,256 | 1,872 | 37.3 | 350 | 9 |
Figure 1. Nucleotide BLAST similarity of open reading frames (ORFs) and intergenic spaces (IGSs) of the two Crocosphaera watsonii genomes. Each feature was binned according to the percent identity of its top BLAST alignment. Alignments with E-values above 0.001 or shorter than 50 bp were considered not significant. Bars in the inset figure show the total number of features in each bin; bars in the main figure show the percentage of the total number of each feature type.
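The binning described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the caption gives the significance thresholds (E-value and alignment length) but not the bin edges, so the 10% bin width, the function names, and the example hits below are all hypothetical and are not taken from the study.

```python
from collections import Counter

MAX_EVALUE = 0.001   # caption: E-values above this are not significant
MIN_ALIGN_LEN = 50   # caption: alignments shorter than this (bp) are not significant
BIN_WIDTH = 10       # assumption: the caption does not give the bin edges


def bin_hit(percent_identity, evalue, align_len):
    """Return the identity-bin label for one feature's top BLAST alignment."""
    if evalue > MAX_EVALUE or align_len < MIN_ALIGN_LEN:
        return "not significant"
    if percent_identity >= 100.0:
        return "100"
    low = int(percent_identity // BIN_WIDTH) * BIN_WIDTH
    return f"{low}-{low + BIN_WIDTH}"


def bin_percentages(hits):
    """Percent of features per bin, given (identity, evalue, length) tuples."""
    counts = Counter(bin_hit(*hit) for hit in hits)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical top hits for four ORFs (not data from the study).
example_orfs = [
    (100.0, 0.0, 900),
    (99.4, 1e-120, 650),
    (98.7, 1e-80, 400),
    (72.0, 0.5, 35),
]
print(bin_percentages(example_orfs))
```

Running the same function separately on the ORF list and the IGS list would give the per-feature-type percentages that the main figure bars are described as showing.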
Figure 2. Comparison of nucleotide sequence identity of open reading frames (ORFs, in red) and intergenic spaces (IGSs, in blue) between the two Crocosphaera watsonii strains. Each point represents a single sequence, with the x-coordinate giving the subject position of the top BLAST hit (i.e., highest scoring pair or “HSP”) in the proxy genome sequence, and the y-coordinate giving the percent identity of the top BLAST alignment when compared to the alternate strain genome.
Figure 3. Taxonomic distribution of 930 ORFs in the WH0003 genome that show little or no sequence similarity to the WH8501 genome. WH0003 sequences were binned according to the level of sequence similarity to the WH8501 genome across the entire ORF. The numbers of sequences in the bins were as follows: 751 in the <20% bin; 75 in the 20–35% bin; and 104 in the 35–50% bin. Pie charts show the relative contributions of the three bins to each branch.
Table 2. ORFs within WH0003 strain-specific genome region.
| ORF locus tag | ORF start | ORF stop | RAST annotated function | Most similar COG | COG description |
|---|---|---|---|---|---|
| CWATWH0003_3496 | 7204 | 6563 | Hypothetical protein | ||
| CWATWH0003_3497 | 7824 | 7303 | Short-chain dehydrogenase/reductase SDR | COG1028 | Dehydrogenases with different specificities (related to short-chain alcohol dehydrogenases) |
| CWATWH0003_3498 | 7971 | 7840 | Hypothetical protein | COG1028 | Dehydrogenases with different specificities (related to short-chain alcohol dehydrogenases) |
| | 8768 | 8040 | | | |
| CWATWH0003_3500 | 9766 | 8780 | Pyruvate dehydrogenase (lipoamide) | COG0022 | Thiamine pyrophosphate-dependent dehydrogenases, E1 component beta subunit |
| CWATWH0003_3501 | 10854 | 9811 | Pyruvate dehydrogenase (lipoamide) | COG1071 | Thiamine pyrophosphate-dependent dehydrogenases, E1 component alpha subunit |
| CWATWH0003_3502 | 11877 | 10945 | Putative aldo/keto reductase | COG0667 | Predicted oxidoreductase (related to aryl-alcohol dehydrogenases) |
| | 12620 | 11937 | Macrocin- | None | |
| CWATWH0003_3504 | 14047 | 12869 | | | |
| | 1039 | 26 | | | |
| CWATWH0003_3506 | 2511 | 1174 | Hypothetical protein | None | |
| | 3946 | 2633 | | | |
| CWATWH0003_3508 | 4998 | 3946 | DegT/DnrJ/EryC1/StrS aminotransferase family protein | COG0399 | Predicted pyridoxal phosphate-dependent enzyme apparently involved in regulation of cell wall biogenesis |
| CWATWH0003_3509 | 5645 | 5301 | Hypothetical protein | None | |
| CWATWH0003_3510 | 5983 | 5657 | Hypothetical protein | None | |
| CWATWH0003_3511 | 6492 | 6325 | Hypothetical protein | None | |
| CWATWH0003_3512 | 7097 | 6507 | Acetyltransferase, putative | COG0110 | Acetyltransferases (the isoleucine patch superfamily) |
| CWATWH0003_3513 | 8034 | 7090 | Oxidoreductase domain protein | COG0673 | Predicted dehydrogenases and related proteins |
| | 9356 | 8031 | UDP- | COG0677 | UDP- |
| CWATWH0003_3515 | 11421 | 9538 | | | |
| | 12854 | 11541 | | | |
| CWATWH0003_3517 | 15181 | 12926 | Hypothetical protein | | |
| CWATWH0003_3518 | 15424 | 17586 | Hypothetical protein | None | |
Functions related to polysaccharide synthesis and export are in bold.
*Genes without homologous functions in the WH8501 genome.
Figure 4. Alignment of WH8501 contig (top) to WH0003 contigs (bottom), showing a 25 kb region of the WH0003 genome (within large green shaded box) that has been replaced by a single transposase gene in the WH8501 genome (marked with an X). Red connecting bars and shading indicate regions of sequence homology. Hypothetical genes are in light gray, transposase genes are yellow, and ORFs with other annotated functions are in blue (WH8501) or green (WH0003). ORFs with functions related to polysaccharide synthesis or export are marked with green arrowheads. Descriptions of the numbered genes are listed below. See Table 2 for annotated functions and COG similarities of the 25 contiguous, WH0003-specific ORFs. (1) CWATWH0003_3507: “O-antigen translocase,” similar to wzx, (2) CWATWH0003_3515: “polysaccharide biosynthesis protein CapD,” (3) CWATWH0003_3516: “polysaccharide export protein,” similar to wza, and (4) CWATWH0003_3517: “uncharacterized protein involved in exopolysaccharide biosynthesis,” similar to wzc.
Table 3. Transposase IS family distribution in both genomes.
| IS family | WH8501 | WH0003 |
|---|---|---|
| IS630 | 306 | 5 |
| IS5 | 294 | 9 |
| IS1634 | 152 | 9 |
| IS1380 | 120 | 14 |
| IS200/IS605 | 83 | 115 |
| IS66 | 77 | 1 |
| ISAzo13 | 49 | 2 |
| IS3 | 41 | 0 |
| IS4 | 38 | 6 |
| IS701 | 32 | 1 |
| IS607 | 14 | 22 |
| ISAs1 | 3 | 3 |
| Tn3 | 1 | 6 |
| Other | 1 | 4 |
| Unknown | 210 | 18 |
| Total | 1421 | 215 |
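As a quick arithmetic check of the Total row, the per-family counts can be summed directly. The sketch below copies the counts from the table above; the dictionary and function names are illustrative only. It reports each column sum with and without the "Unknown" row, because the WH8501 sum without that row (1,211) equals the transposase count listed for WH8501 in the genome assembly summary table.

```python
# (WH8501, WH0003) transposase counts per IS family, copied from the table above.
IS_COUNTS = {
    "IS630": (306, 5),
    "IS5": (294, 9),
    "IS1634": (152, 9),
    "IS1380": (120, 14),
    "IS200/IS605": (83, 115),
    "IS66": (77, 1),
    "ISAzo13": (49, 2),
    "IS3": (41, 0),
    "IS4": (38, 6),
    "IS701": (32, 1),
    "IS607": (14, 22),
    "ISAs1": (3, 3),
    "Tn3": (1, 6),
    "Other": (1, 4),
    "Unknown": (210, 18),
}


def column_totals(counts, exclude=()):
    """Sum the WH8501 and WH0003 columns, skipping any families in `exclude`."""
    kept = [pair for family, pair in counts.items() if family not in exclude]
    wh8501 = sum(pair[0] for pair in kept)
    wh0003 = sum(pair[1] for pair in kept)
    return wh8501, wh0003


print(column_totals(IS_COUNTS))                        # all rows
print(column_totals(IS_COUNTS, exclude=("Unknown",)))  # classified families only
```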
Figure 5. Expression of four IS family genes and . Expression values for each gene were normalized to the average expression of that gene over the entire 26 h time course, with negative values indicating downregulation and positive values indicating upregulation.
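One way to read this normalization is as a log ratio of each time point to the gene's mean over the time course. The caption does not name the transform; a log2 ratio is assumed here only because it yields negative values below the mean and positive values above it. The function name and the example values are hypothetical.

```python
import math


def normalize_to_mean(expression):
    """Log2 ratio of each time point to the gene's mean over the time course.

    Assumption: the caption does not specify the transform; log2 is used here
    so that values below the mean are negative and values above are positive.
    """
    if not expression or any(v <= 0 for v in expression):
        raise ValueError("expression must be a non-empty list of positive numbers")
    mean = sum(expression) / len(expression)
    return [math.log2(v / mean) for v in expression]


# Hypothetical expression values for one gene at four time points.
example_gene = [1.0, 2.0, 4.0, 1.0]
print(normalize_to_mean(example_gene))
```

Because each gene is divided by its own mean, genes with very different absolute expression levels can be plotted on a common axis.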