Radhey S Gupta, Divya W Mathews.
Abstract
BACKGROUND: The phylogeny and taxonomy of cyanobacteria are currently poorly understood owing to a paucity of reliable markers for the identification and circumscription of the major cyanobacterial clades.
Year: 2010 PMID: 20100331 PMCID: PMC2823733 DOI: 10.1186/1471-2148-10-24
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
List of Cyanobacterial Genomes Studied in this work
| Species Name | Genome size (Mb) | GC content (%) | Protein Number | Genome | Center/PubMed ID |
|---|---|---|---|---|---|
| | 8.36 | 47.0 | 6254 | NC_009925.1 | [ |
| | 7.07 | 41.4 | 5043 | NC_007413.1 | DOE JGI |
| | 4.66 | 62 | 4430 | NC_005125.1 | [ |
| | 5.43 | 37.9 | 4762 | NC_010546.1 | Washington University |
| | 4.81 | 39.8 | 4260 | NC_011726.1 | DOE JGI |
| | 7.21 | 41.3 | 5366 | NC_003272.1 | [ |
| | 5.8 | 42.3 | 6312 | NC_010296.1 | Kazusa |
| | 8.2 | 41.4 | 6087 | NC_010628.1 | DOE JGI |
| | 1.7 | 31.3 | 1921 | NC_008816.1 | J. Craig Venter Institute |
| | 1.7 | 39.7 | 1855 | NC_009976.1 | [ |
| | 1.7 | 31.1 | 1983 | NC_009840.1 | DOE JGI |
| | 1.6 | 31.3 | 1907 | NC_009091.1 | GBM Foundation |
| | 2.7 | 50 | 2997 | NC_008820.1 | J. Craig Venter Institute |
| | 1.71 | 31.2 | 1810 | NC_007577.1 | DOE JGI |
| | 2.41 | 50.7 | 2269 | NC_005071.1 | [ |
| | 1.7 | 30.8 | 1906 | NC_008817.1 | J. Craig Venter Institute |
| | 1.9 | 35 | 2193 | NC_008819.1 | J. Craig Venter Institute |
| | 1.8 | 35.1 | 2163 | NC_007335.2 | DOE JGI |
| | 1.75 | 36.4 | 1883 | NC_005042.1 | [ |
| | 1.7 | 30.8 | 1717 | NC_005072.1 | [ |
| | 2.7 | 55.5 | 2527 | NC_006576.1 | [ |
| | 2.75 | 55.4 | 2612 | NC_007604.1 | DOE JGI |
| | 2.61 | 52.4 | 2892 | NC_008319.1 | [ |
| | 2.51 | 59.2 | 2645 | NC_007516.1 | [ |
| | 2.23 | 54.2 | 2307 | NC_007513.1 | [ |
| | 3.05 | 58.5 | 2862 | NC_007776.1 | TIGR |
| | 2.93 | 60.2 | 2760 | NC_007775.1 | TIGR |
| | 2.2 | 60.8 | 2535 | NC_009482.1 | [ |
| | 2.4 | 60.2 | 2533 | NC_009481.1 | [ |
| | 3.4 | 49.2 | 2823 | NC_010475.1 | Penn. State University |
| | 2.43 | 59.4 | 2519 | NC_005070.1 | [ |
| | 3.95 | 47.4 | 3172 | NC_000911.1 | [ |
| | 2.59 | 53.9 | 2476 | NC_004113.1 | [ |
| | 7.8 | 34.1 | 4451 | NC_008312.1 | DOE JGI |
Abbreviations: DOE JGI, Department of Energy Joint Genome Institute; TIGR, The Institute for Genomic Research; GBM, Gordon and Betty Moore. The genome of Crocosphaera watsonii WH8501 was not fully sequenced.
Figure 1. A maximum-likelihood distance tree for sequenced cyanobacteria based on concatenated sequences for 44 conserved proteins. The distance scale (bar = 0.1 substitutions per site) is shown in the top right-hand corner. The tree was rooted using B. subtilis and S. aureus sequences. The numbers at the nodes indicate the percentage of puzzling quartets supporting each node. The low B/A ecotype clade refers to the Prochlorococcus spp. that contain a lower ratio of chlorophyll b/a and are adapted to growth at high light intensities.
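The tree in Figure 1 is built from a concatenation of 44 per-protein alignments. As a rough illustration of what such a concatenation involves, the following Python sketch joins several alignments into one sequence per taxon. The function name, the gap character, and the toy sequences are assumptions made for illustration only; they are not taken from the study, and the authors' actual pipeline is not described here.

```python
from typing import Dict, List


def concatenate_alignments(alignments: List[Dict[str, str]], gap: str = "-") -> Dict[str, str]:
    """Join per-protein alignments into one concatenated sequence per taxon.

    Each alignment maps a taxon name to its aligned sequence. A taxon that is
    absent from an alignment is padded with gap characters (an assumption of
    this sketch) so that every concatenated sequence has the same length.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("all sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, gap * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Hypothetical toy alignments for two proteins and three taxa.
protein_1 = {"taxonA": "MKV", "taxonB": "MRV"}
protein_2 = {"taxonA": "LL", "taxonC": "LI"}
supermatrix = concatenate_alignments([protein_1, protein_2])
```

The resulting dictionary could then be written out in whatever format a tree-building program expects; that step is left out because it depends on the program used.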
Cyanobacterial Signature Proteins
| (a) Proteins that are uniquely found in all (or most) cyanobacteria | |||
|---|---|---|---|
| NP_439901/slr0613 | hypothetical (173) | NP_441893/ssl0242 | hypothetical (78) |
| NP_439967/slr1122 | hypothetical (329) | NP_442014/sll0350* | hypothetical (803) |
| NP_439995/slr0729+ | hypothetical (101) | NP_442026/slr0376 | hypothetical (116) |
| NP_440139/slr1796 | hypothetical (201) | NP_442147/sll0208* | hypothetical (231) |
| NP_440262/ssl1972 | hypothetical (93) | NP_442176/sll0413* | hypothetical (207) |
| NP_440437/slr2049+ | hypothetical (192) | NP_442207/ssr0109 | hypothetical (78) |
| NP_440459/slr1915 | hypothetical (104) | NP_442330/sll0372 a | hypothetical (196) |
| NP_440545/ssr2843+ | hypothetical (87) | NP_442365/ssr0332 | hypothetical (70) |
| NP_440678/slr1900 a | hypothetical (247) | NP_442366/slr0211 | hypothetical (403) |
| NP_440903/sll1271 | hypothetical (572) | NP_442402/slr0921 | hypothetical (128) |
| NP_440946/sll0860 | hypothetical (173) | NP_442464/sll0822a | hypothetical (129) |
| NP_441021/ssr3189 | hypothetical (55) | NP_442734/slr0042 | hypothetical (576) |
| NP_441047/slr2144* | hypothetical (301) | NP_442826/sll1340 | hypothetical (85) |
| NP_441164/ssr2087 | hypothetical (84) | NP_442884/slr1557 | hypothetical (369) |
| NP_441199/slr1990 | hypothetical (240) | NP_442932/slr0748+ | hypothetical (230) |
| NP_441265/ssl0461* | hypothetical (83) | NP_443015/sll1109 | hypothetical (194) |
| NP_441307/sll1979 | hypothetical (142) | NP_484529/asr0485+ | hypothetical (92) |
| NP_441346/ssr2551 | hypothetical (94) | NP_440513/slr1384 | hypothetical (391) |
| NP_441647/slr1160* | hypothetical (204) | NP_0010358/slr1146 | hypothetical (89) |
| NP_441848/sll0359 | hypothetical (155) | ||
| NP_439997/slr0731 | Hypothetical (402) | NP_441174/slr1260 | Hypothetical (177) |
| NP_440149/slr1800 | Hypothetical (355) | NP_441937/slr1949 | Hypothetical (212) |
| NP_441115/sll0854 | Hypothetical (308) | ||
| NP_440495/sll0984 | Hypothetical (148) | NP_441597/slr1276 | Hypothetical (275) |
| NP_440591/slr2025 | Hypothetical (153) | NP_485360/all1317 | Hypothetical (147) |
| NP_440594/sll1915 | Hypothetical (183) | NP_488024/all3984 | Hypothetical (231) |
| NP_440896/sll1274 | Hypothetical (171) | NP_488046/all4006 | Hypothetical (127) |
| NP_441155/sll1155* | Hypothetical (113) | NP_484683/asl0639 | Hypothetical (73) |
| NP_484163/all0119* | Hypothetical (137) | NP_485187/alr1144* | Hypothetical (290) |
| NP_484255/all0211* | Hypothetical (126) | ||
* - missing in 1-2 species
a significant similarity also seen for 1-2 other bacteria
+ also found in some algae and mosses
Clade A comprises G. violaceus, Synechococcus sp. JA-3-3Ab and Synechococcus sp. JA-2-3B'a
Clade C comprises most of the Synechococcus and all Prochlorococcus spp.
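Each protein cell in the table above combines a RefSeq accession, a locus tag, and optional footnote markers (`*`, `a`, `+`), and each function cell ends with the protein length in parentheses. For readers who want to work with the table programmatically, here is a minimal parsing sketch. The regular expressions, the `SignatureProtein` type, and the `parse_entry` helper are assumptions of this example, written to fit the three-letter, four-digit locus tags seen in this table; they are not part of the original work.

```python
import re
from typing import NamedTuple, Tuple

# Accession, slash, locus tag (three letters + four digits), optional markers.
ENTRY = re.compile(r"^(?P<acc>[A-Z]{2}_\d+)/(?P<locus>[a-z]{3}\d{4})\s*(?P<marks>[*+a]*)$")
# Function name followed by the length in parentheses, e.g. "hypothetical (173)".
FUNC = re.compile(r"^(?P<name>.+?)\s*\((?P<length>\d+)\)$")

# Marker meanings as given in the table footnotes.
FOOTNOTES = {
    "*": "missing in 1-2 species",
    "a": "significant similarity also seen for 1-2 other bacteria",
    "+": "also found in some algae and mosses",
}


class SignatureProtein(NamedTuple):
    accession: str
    locus_tag: str
    function: str
    length: int
    notes: Tuple[str, ...]


def parse_entry(protein_cell: str, function_cell: str) -> SignatureProtein:
    """Split one protein/function cell pair into structured fields."""
    entry = ENTRY.match(protein_cell.strip())
    func = FUNC.match(function_cell.strip())
    if entry is None or func is None:
        raise ValueError(f"unrecognised cells: {protein_cell!r}, {function_cell!r}")
    notes = tuple(FOOTNOTES[mark] for mark in entry["marks"])
    return SignatureProtein(entry["acc"], entry["locus"], func["name"], int(func["length"]), notes)


record = parse_entry("NP_439995/slr0729+", "hypothetical (101)")
```

Cells that do not follow this pattern raise a `ValueError` rather than being guessed at, which makes irregular entries easy to spot.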
Figure 2. An interpretive cladogram indicating the evolutionary stages at which genes for the different signature proteins described in this work, which are specific for different groups of cyanobacteria, likely evolved. Many conserved indels that are specific for the same groups/clades of cyanobacteria have also been described in recent work [23].
Proteins Specific for Clade B Cyanobacteria
| (a) Proteins that are uniquely found in all (or most) Clade B cyanobacteria | |||
|---|---|---|---|
| Protein | Function (length) | Protein | Function (length) |
| NP_439990/slr0723*+ | Hypothetical (363) | NP_484675/all0631*+ | Hypothetical (130) |
| NP_440199/slr0971 | Hypothetical (451) | NP_484710/all0666* | Hypothetical (348) |
| NP_440305/slr0695a | Hypothetical (173) | NP_485162/all1119*+ | Hypothetical (255) |
| NP_440382/sll1642*+ | Hypothetical (163) | NP_485285/alr1242* | Hypothetical (221) |
| NP_440557/sll1573* | Hypothetical (104) | NP_485393/alr1350*+ | Hypothetical (359) |
| NP_440936/slr0888*+ | Hypothetical (168) | NP_485508/all1467* | Hypothetical (247) |
| NP_441490/sll1247* | Hypothetical (457) | NP_486386/alr2346* | Hypothetical (104) |
| NP_441696/slr1686* | Hypothetical (141) | NP_486393/asl2353* | Hypothetical (98) |
| NP_441913/sll1858* | Hypothetical (627) | NP_486647/asr2607* | Hypothetical (65) |
| NP_442061/slr0779* | Hypothetical (206) | NP_487221/all3181* | Hypothetical (322) |
| NP_442144/slr0217+ | Hypothetical (140) | NP_487892/all3852 | Hypothetical (281) |
| NP_484091/all0047* | Hypothetical (531) | NP_488032/asr3992* | photosystem II reaction center |
| NP_484127/alr0083*+ | Hypothetical (137) | NP_488333/alr4293* | Hypothetical (163) |
| NP_484326/all0282* | Hypothetical (162) | NP_488559/all4519* | Hypothetical (104) |
| NP_484594/asl0550* | Hypothetical (72) | NP_488570/alr45302 | Hypothetical (388) |
| NP_484607/all0563 | general secretion pathway protein (207) | NP_488633/all4593a | Hypothetical (434) |
| NP_484635/all0591* | Hypothetical (123) | NP_488729/all4689* | Hypothetical (169) |
| NP_484674/all0630* | Hypothetical (128) | NP_489127/alr5087* | Hypothetical (124) |
| NP_440371/ssl1918 | Hypothetical (97) | NP_485176/alr1133* | Hypothetical (160) |
| NP_440821/slr1218 | Hypothetical (158) | NP_485590/alr1550* | Hypothetical (119) |
| NP_441017/sll1757+ | Hypothetical (292) | NP_486755/all2715* | Hypothetical (214) |
| NP_441155/sll1155 | Hypothetical (67) | NP_486776/all2736* | Hypothetical (186) |
| NP_441519/slr1970*+ | Hypothetical (173) | NP_487697/asr3657* | Hypothetical (120) |
| NP_441527/sll1884+ | Hypothetical (374) | NP_488054/asl4014* | Hypothetical (98) |
| NP_441857/ssr0657 | Hypothetical (103) | NP_488538/asr4498* | Hypothetical (86) |
| NP_442144/slr0217+ | Hypothetical (140) | NP_488628/asr4588* | Hypothetical (68) |
| NP_442174/ssl0788+ | Hypothetical (97) | NP_488797/all4757* | Hypothetical (116) |
| NP_442462/slr0845* | Hypothetical (190) | NP_488854/alr4814* | Hypothetical (162) |
| NP_484393/all0349*+ | Hypothetical (138) | NP_489314/all5274* | Hypothetical (247) |
* - Missing in 1-2 species
+ Also present in Synechococcus sp. PCC 7335
a A homolog showing significant similarity is also found in Sorangium cellulosum
Proteins Specific for Different Groups within Clade B Cyanobacteria
| (a) Proteins Specific for Nostocales, Oscillatoriales and Chroococcales (NOC) Orders | |||
|---|---|---|---|
| NP_441847/sll0360# | Hypothetical (277) | NP_486936/asr2896 | Hypothetical (63) |
| NP_484828/asr0785 | Hypothetical (60) | NP_488368/asl4328 | Hypothetical (68) |
| NP_485335/all1292 | Hypothetical (142) | NP_488902/asl4862 | Hypothetical (77) |
| NP_485350/asr1307 | Hypothetical (78) | NP_488971/all4931 | Hypothetical (225) |
| NP_485586/alr1546 | Hypothetical (170) | ||
| NP_484145/alr0101 | Hypothetical (258) | NP_485811/all1771 | Hypothetical (238) |
| NP_484259/all0215 | Hypothetical (212) | NP_486433/alr2393 | Hypothetical (343) |
| NP_484503/all0459* | Hypothetical (119) | NP_486508/asr2468* | Hypothetical (76) |
| NP_484625/asr0581* | Hypothetical (76) | NP_486828/all2788* | Hypothetical (146) |
| NP_484724/asr0680* | Hypothetical (94) | NP_487523/asr3483* | Hypothetical (64) |
| NP_484725/alr0681* | Hypothetical (115) | NP_488294/all4254* | Hypothetical (398) |
| NP_485091/asr1048* | Hypothetical (65) | NP_488340/all4300* | Hypothetical (227) |
| NP_485092/asr1049* | Hypothetical (88) | NP_488754/alr4714 | Hypothetical (232) |
| NP_485286/asl1243* | Hypothetical (72) | NP_488903/alr4863 | Hypothetical (999) |
| NP_485748/all1708* | Hypothetical (200) | NP_489130/all5090 | Hypothetical (162) |
| NP_486432/alr2392* | filament integrity protein (179) | NP_489162/all5122 | Hypothetical (119) |
| BAA10649/slr0111 | hypothetical (173) | BAA17589/sll1268 | hypothetical (517) |
| BAA10763 | cytochrome b6-f complex subunit (36) | BAA17704/sll1755 | hypothetical (407) |
| BAA16770/slr1107 | hypothetical (444) | BAA18427/slr0960 | hypothetical (146) |
| BAA17546/ssr2406 | hypothetical (74) | BAA18451/sll1531 | hypothetical (608) |
| NP_48404/all0002 | Hypothetical (245) | NP_485976/asl1936 | Hypothetical (81) |
| NP_484071/asl0027 | Hypothetical (81) | NP_485977/asl1937 | Hypothetical (83) |
| NP_484141/asl0097 | Hypothetical (51) | NP_486406/alr2366 | Hypothetical (118) |
| NP_484220/asl0176 | Hypothetical (87) | NP_486414/alr2374 | Hypothetical (129) |
| NP_484351/all0307 | Hypothetical (114) | NP_486562/alr2522 | Hypothetical (141) |
| NP_484421/alr0377 | Hypothetical (153) | NP_486815/alr2775 | Hypothetical (249) |
| NP_484504/asr0460 | Hypothetical (81) | NP_487185/all3145 | Hypothetical (122) |
| NP_484505/asr0461 | Hypothetical (96) | NP_487215/alr3175 | Hypothetical (264) |
| NP_484526/asr0482 | Hypothetical (64) | NP_487290/asr3250 | Hypothetical (69) |
| NP_484616/asl0572 | Hypothetical (75) | NP_487319/asr3279 | Hypothetical (64) |
| NP_484758/asl0715 | Hypothetical (56) | NP_487408/asr3368 | Hypothetical (75) |
| NP_484822/asl0779 | Hypothetical (67) | NP_487429/asr3389 | Hypothetical (75) |
| NP_484885/asl0842 | Hypothetical (80) | NP_487760/alr3720 | Hypothetical (129) |
| NP_484898/asr0855 | Hypothetical (83) | NP_487950/alr3910 | Hypothetical (252) |
| NP_484966/asr0923 | Hypothetical (67) | NP_487957/alr3917 | Hypothetical (447) |
| NP_485022/all0979 | Hypothetical (220) | NP_488113/all4073 | Hypothetical (121) |
| NP_485048/asr1005 | Hypothetical (80) | NP_488149/all4109 | Hypothetical (235) |
| NP_485180/alr1137 | Hypothetical (107) | NP_488157/all4117 | Hypothetical (411) |
| NP_485189/alr1146 | Hypothetical (847) | NP_488392/asr4352 | Hypothetical (65) |
# also found in one of the clade A cyanobacteria
* missing in 1-2 species/strains
+ Additional proteins that are specific for Nostocales are listed in Additional file 5.
Figure 3. Partial sequence alignment of flavoprotein showing a 6 aa conserved insert (boxed) that is specific for the Clade C cyanobacteria. Dashes (-) in this and all other sequence alignments indicate identity with the amino acid on the top line. The numbers at the top indicate the position of the sequence in the species on the first line. The absence of this insert in all other cyanobacteria and in other bacterial phyla provides evidence that this indel is an insert in Clade C.
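Because the alignments use a dash to mean "same residue as the top line", the rows have to be expanded before they can be compared column by column. The sketch below shows one way to do that and then to locate columns where only the ingroup carries residues, which is the pattern a clade-specific insert produces. The function names and toy sequences are hypothetical, and the use of a space to mark a gap is an assumption of this example, not something stated in the caption.

```python
from typing import List


def expand_identity(top: str, row: str, identity: str = "-") -> str:
    """Replace identity dashes in a row with the residue from the top line."""
    if len(top) != len(row):
        raise ValueError("row and top line must have the same length")
    return "".join(t if r == identity else r for t, r in zip(top, row))


def group_specific_insert_columns(ingroup: List[str], outgroup: List[str], gap: str = " ") -> List[int]:
    """Return columns where every ingroup row has a residue and every outgroup row has a gap."""
    width = len(ingroup[0])
    if any(len(seq) != width for seq in ingroup + outgroup):
        raise ValueError("all expanded rows must have the same length")
    return [
        i
        for i in range(width)
        if all(seq[i] != gap for seq in ingroup) and all(seq[i] == gap for seq in outgroup)
    ]


# Hypothetical toy alignment: the top line carries a 6-residue insert.
top_line = "GAVTQSDELRKL"
ingroup_rows = [top_line, expand_identity(top_line, "---S------R-")]
outgroup_rows = [expand_identity(top_line, "--I-" + " " * 6 + "--")]
insert_columns = group_specific_insert_columns(ingroup_rows, outgroup_rows)
```

A run of consecutive column indices in the result corresponds to one candidate insert; whether it is an insertion in the ingroup or a deletion elsewhere still has to be judged against outgroup phyla, as the caption describes.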
Proteins Specific for the Clade C Cyanobacteria (Synechococcus/Prochlorococcus)
| Protein | Function (length) | Protein | Function (length) |
|---|---|---|---|
| NP_874427/Pro0033 | predicted membrane protein (87) | YP_001483584 | Hypothetical (114) |
| NP_874433/Pro0039 | predicted membrane protein (203) | YP_001483784 | Hypothetical (60) |
| NP_874460/Pro0066 | predicted membrane protein (128) | YP_001483792+ | Hypothetical (116) |
| NP_874461/Pro0067 | Hypothetical (154) | YP_001483839 | Hypothetical (75) |
| NP_874496/Pro0102 | Hypothetical (121) | YP_001484024 | Hypothetical (67) |
| NP_874497/Pro0103 | Hypothetical (76) | YP_001484070 | Hypothetical (96) |
| NP_874503/Pro0109 | Hypothetical (127) | YP_001484558 | Hypothetical (70) |
| NP_874769/Pro0375 | Hypothetical (128) | YP_001484735 | Hypothetical (136) |
| NP_874827/Pro0433 | Hypothetical (148) | YP_001484929 | Hypothetical (89) |
| NP_874971/Pro0578 | Hypothetical (104) | YP_001484936 | Hypothetical (237) |
| NP_875238/Pro0846 | Hypothetical (135) | YP_001485057 | Hypothetical (88) |
| NP_875250/Pro0858 | Hypothetical (116) | YP_001485093 | Hypothetical (172) |
| NP_875290/Pro0898 | Hypothetical (75) | YP_001485151+ | Hypothetical (139) |
| NP_875352/Pro0960 | Hypothetical (76) | NP_875191/Pro0799* | Hypothetical (234) |
| NP_875454/Pro1062 | Hypothetical (189) | NP_875240/Pro0848* | membrane protein (99) |
| NP_875462/Pro1070 | dihydroneopterin aldolase (127) | NP_875270/Pro0878* | Hypothetical (62) |
| NP_875555/Pro1163 | predicted protein family PM-1 (67) | YP_001483575* | Hypothetical (71) |
| NP_875594/Pro1202 | Hypothetical (81) | YP_001483809*+ | Hypothetical (116) |
| NP_875635/Pro1243 | Hypothetical (193) | YP_001483828* | Hypothetical (122) |
| NP_876135/Pro1744 | Hypothetical (206) | YP_001483924* | Hypothetical (502) |
| NP_876152/Pro1761 | Hypothetical (98) | NP_875468/Pro1076* | Hypothetical (88) |
| NP_876219/Pro1828 | Hypothetical (100) | NP_875511/Pro1119* | Predicted protein with signal (144) |
| YP_001010165 | Hypothetical (121) | NP_875732/Pro1341* | Hypothetical (88) |
| YP_001483235 | type II secretion system (149) | NP_876151/Pro1760* | Hypothetical (152) |
| YP_001483304 | Hypothetical (100) | NP_876229/Pro1838* | Hypothetical (171) |
| YP_001483312 | Hypothetical (87) | YP_001483988* | Hypothetical (70) |
| YP_001483445 | Hypothetical (72) | YP_001484266* | Hypothetical (195) |
| YP_001483588 | TIR domain-containing protein (82) | YP_001483537* | possible Pollen allergen (139) |
| YP_001483568 | hypothetical (102) | YP_001483448 | Hypothetical (42) |
| YP_001484489 | hypothetical (85) | YP_001484000 | hypothetical (80) |
| NP_875075/Pro0683* | Predicted protein family PM-3 (178) | NP_875154/Pro0762 | Hypothetical (127) |
| NP_874434/Pro0040* | Hypothetical (119) | NP_875509/Pro1117 | Hypothetical (181) |
| NP_874631/Pro0237 | Hypothetical (102) | NP_875611/Pro1219* | Predicted protein family PM-3 (195) |
| NP_875013/Pro0621* | predicted protein family PM-3 (167) | NP_876129/Pro1738* | Predicted dehydrogenase (273) |
* - Missing in 1-2 species
+ Also present in Synechococcus elongatus
Several of these proteins are also present in Cyanobium sp. PCC7001 and Paulinella chromatophora
#Low B/A ecotype clade is comprised of the following Prochlorococcus strains: Pro. marinus AS9601, Pro. marinus MIT9215, Pro. marinus MIT9301, Pro. marinus MIT9312, Pro. marinus MIT9515, Pro. marinus CCMP1986
Proteins specific for the Main Groups of Clade C Cyanobacteria
| (a) Proteins Specific for the Clade C cyanobacteria+ except | |||
|---|---|---|---|
| NP_896793/SYNW0700 | Hypothetical (76) | NP_897761/SYNW1668 | Hypothetical (181) |
| NP_896942/SYNW0849* | Hypothetical (120) | NP_898450/SYNW2361* | Hypothetical (129) |
| NP_897039/SYNW0946 | Hypothetical (139) | NP_896879/SYNW0786* | Hypothetical (107) |
| NP_896623/SYNW0528* | Hypothetical (94) | NP_896904/SYNW0811* | Hypothetical (81) |
| NP_896827/SYNW0734 | Hypothetical (152) | NP_897398/SYNW1305* | Hypothetical (78) |
| NP_897338/SYNW1245* | Hypothetical (95) | NP_897599/SYNW1506 | Hypothetical (221) |
| NP_897228/SYNW1135* | Hypothetical (139) | NP_897875/SYNW1784 | Hypothetical (150) |
| YP_001483307 | hypothetical (58) | YP_001484319* | hypothetical (94) |
| YP_001483938* | hypothetical (109) | YP_001484350 | hypothetical (104) |
| YP_001483942 | hypothetical (75) | YP_001484353* | hypothetical (68) |
| YP_001483946 | hypothetical (88) | YP_001484529* | hypothetical (99) |
| YP_001483975* | hypothetical (99) | YP_001484536* | hypothetical (42) |
| YP_001483996* | hypothetical (51) | NP_875788* | hypothetical (81) |
| YP_001484105* | hypothetical (64) | YP_001483983* | hypothetical (96) |
| YP_001484131 | hypothetical (61) | YP_001484474* | hypothetical (79) |
| YP_001484828 | hypothetical (55) | YP_001484870 | hypothetical (142) |
| YP_001483822 | hypothetical (44) | ||
* Missing in 1-2 strains/isolates
+ These proteins are primarily present in the various Synechococcus species/strains that are part of Clade C (see Figs. 1 and 2). However, the genus Synechococcus is not monophyletic: many Synechococcus strains group with Clades A and B (viz. Synechococcus sp. PCC7002, Synechococcus sp. PCC7335, Synechococcus sp. JA-3-3Ab and JA-2-3B'a), and these proteins are absent in those strains. Besides Synechococcus, homologs of many of these proteins are also found in Cyanobium sp. PCC7001 and in Paulinella chromatophora, indicating that these species may also belong to the Clade C cyanobacteria.
Figure 4. Partial sequence alignment of heme oxygenase showing a 2 aa insert (boxed) that is uniquely present in all sequenced Prochlorococcus strains. This insert provides evidence that Prochlorococcus strains are monophyletic and shared a common ancestor.
Figure 5. Partial sequence alignment of the protein protochlorophyllide oxidoreductase showing a 1 aa deletion that is commonly shared by all . This indel provides evidence for the deep branching of these Prochlorococcus strains relative to all other strains.