Florian Meereis¹, Michael Kaufmann. ¹ The Protein Chemistry Group, Witten/Herdecke University, Stockumer Str. 10, 58448 Witten, Germany. florian@meereis.com
Abstract
BACKGROUND: The current versions of the COG and arCOG databases, both excellent frameworks for studies in comparative and functional genomics, do not contain the nucleotide sequences corresponding to their protein or protein domain entries. RESULTS: Using sequence information obtained from GenBank flat files covering the completely sequenced genomes of the COG and arCOG databases, we constructed NUCOCOG (nucleotide sequences containing COG databases), an extended version that includes all nucleotide sequences in addition to the amino acid sequences originally used to construct the current COG and arCOG databases. We make available three comprehensive single XML files containing the complete databases, including all sequence information. We also provide a web interface for browsing the NUCOCOG database and retrieving sequences. The database is accessible at http://www.uni-wh.de/nucocog. CONCLUSION: NUCOCOG makes it possible to analyze any sequence-related property in the context of the COG and arCOG framework simply by applying scripting languages such as Perl to a large but single XML document.
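To illustrate the kind of script-based analysis the conclusion refers to, here is a minimal sketch in Python (rather than Perl) that streams a large XML document and computes the GC content of the nucleotide sequences in each COG. The abstract does not describe the NUCOCOG XML schema, so the element and attribute names used here (`cog`, `nucleotide`, `id`) and the sample data are purely hypothetical placeholders; they would need to be replaced with the names used in the actual files.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def gc_content_per_cog(source, cog_tag="cog", seq_tag="nucleotide", id_attr="id"):
    """Stream an XML document and return {cog_id: GC fraction}.

    All tag and attribute names are hypothetical defaults, not the
    real NUCOCOG schema. `source` is a file name or binary file object.
    """
    counts = defaultdict(lambda: [0, 0])  # cog_id -> [GC bases, total bases]
    current = None
    # iterparse reads the document incrementally instead of loading it whole.
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == cog_tag:
            current = elem.get(id_attr)
        elif event == "end" and elem.tag == seq_tag and current is not None:
            seq = (elem.text or "").upper()
            counts[current][0] += seq.count("G") + seq.count("C")
            # Count only unambiguous bases, so whitespace and N are ignored.
            counts[current][1] += sum(seq.count(base) for base in "ACGT")
        elif event == "end" and elem.tag == cog_tag:
            current = None
            elem.clear()  # drop the finished subtree to limit memory use
    return {cog: gc / total for cog, (gc, total) in counts.items() if total}


# Hypothetical miniature document, for illustration only.
SAMPLE = b"""<database>
  <cog id="COG0001">
    <entry><nucleotide>ATGC</nucleotide></entry>
    <entry><nucleotide>GGCC</nucleotide></entry>
  </cog>
  <cog id="COG0002">
    <entry><nucleotide>ATAT</nucleotide></entry>
  </cog>
</database>"""

if __name__ == "__main__":
    for cog_id, gc in sorted(gc_content_per_cog(io.BytesIO(SAMPLE)).items()):
        print(f"{cog_id}\t{gc:.2f}")
```

Streaming is chosen here because the abstract describes each database as one large XML document. An event-based parser keeps memory use low, although the cleared `cog` elements still remain attached to the root as empty stubs in this sketch.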