Hidetoshi Itaya, Kazuki Oshita, Kazuharu Arakawa, Masaru Tomita.
Abstract
The popular European Molecular Biology Open Software Suite (EMBOSS) currently contains over 400 tools used in various areas of bioinformatics research, and is equipped with sophisticated development frameworks for interoperability and tool discoverability, as well as rich documentation and various user interfaces. To further strengthen EMBOSS in the field of genomics, we here present a novel EMBOSS associated software (EMBASSY) package named GEMBASSY, which adds more than 50 analysis tools from the G-language Genome Analysis Environment and its Representational State Transfer (REST) and SOAP web services. GEMBASSY consists mainly of wrapper programs for the G-language REST/SOAP web services, providing intuitive and easy access to various annotations within complete genome flatfiles, as well as tools for analyzing nucleotide composition, calculating codon usage, and visualizing genomic information. For example, analysis methods for calculating the distance between sequences by genomic signatures and for predicting gene expression levels from codon usage bias are effective in the interpretation of meta-genomic and meta-transcriptomic data. GEMBASSY tools can be used seamlessly with other EMBOSS tools and UNIX command line tools. The source code, written in C, is available from GitHub (https://github.com/celery-kotone/GEMBASSY/), and the distribution package is freely available from the GEMBASSY web site (http://www.g-language.org/gembassy/).
Year: 2013 PMID: 23987304 PMCID: PMC3847652 DOI: 10.1186/1751-0473-8-17
Source DB: PubMed Journal: Source Code Biol Med ISSN: 1751-0473
Complete list of 53 tools implemented in GEMBASSY
| gentrez | gbasezvalue |
| genret | gconsensusz |
| gcgr | gdeltagcskew |
| gcircularmap | gdistincc |
| gdnawalk | gfindoriter |
| ggenomemap3 | ggcsi |
| gseq2png | ggcskew |
| gbui | ggcwin |
| gcai | ggeneskew |
| gcbi | ggenomicskew |
| gcodoncompiler | gkmertable |
| gdeltaenc | gldabias |
| gdinuc | gnucleotideperiodicity |
| genc | goligomercounter |
| gew | goligomersearch |
| gfop | gpalindrome |
| gicdi | gqueryarm |
| gp2 | gquerystrand |
| gphx | greporiter |
| gsvalue | gscs |
| gwvalue | gseqinfo |
| gb1 | gsignature |
| gb2 | gviewcds |
| gbasecounter | gshuffleseq |
| gbaseentropy | gaminoinfo |
| gbaseinformationcontent | gaaui |
| gbaserelativeentropy | |
The implemented tools consist mostly of genome analysis methods for nucleotide composition, codon usage, and genome information visualization, most of which are based on published algorithms. The letter “g” is prefixed to each tool name to indicate that the tool belongs to the GEMBASSY package. Detailed documentation for each tool, as well as full references, is available at the GEMBASSY web site (http://www.g-language.org/gembassy/).
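To give a concrete sense of what a nucleotide composition analysis of this kind computes, the following is a minimal Python sketch of a sliding-window GC skew calculation. It is not the GEMBASSY implementation: the function names `gc_skew` and `windowed_gc_skew` are hypothetical, the toy sequence is made up, and the formula assumed here is the conventional (G − C) / (G + C). The sign convention and options of the actual GEMBASSY tools may differ, so consult the tool documentation before comparing results.

```python
from typing import List


def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C) for one sequence; 0.0 if it has no G or C.

    Assumption: conventional GC skew definition, not taken from GEMBASSY.
    """
    s = seq.upper()
    g = s.count("G")
    c = s.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_gc_skew(seq: str, window: int, step: int) -> List[float]:
    """GC skew of each full window of length `window`, advancing by `step`."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    # Only full-length windows are scored; a trailing partial window is skipped.
    return [
        gc_skew(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # Toy sequence (illustrative only): a G-rich half followed by a C-rich half.
    toy = "GGGGCCGGGC" + "CCCCGGCCCG"
    print(windowed_gc_skew(toy, window=10, step=10))
```

On a real genome the window and step would typically be thousands of bases, and the point where the skew changes sign is commonly read as a hint to the replication origin or terminus.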
Figure 1. Example analysis workflow with GEMBASSY. The workflow first searches the NCBI Entrez Genome database for the term “Bacillus subtilis” (gentrez), retrieves the genome flatfile of the [RefSeq:NC_000964] entry (entret), and generates a sequence logo for the sequences around the start codons of the top 100 or worst 100 PHX (predicted highly expressed) genes (gphx, emma, extractalign, and kweblogo). This example seamlessly combines GEMBASSY tools, EMBOSS tools, and regular UNIX commands.
Figure 2. Graphical output from the sample workflow. (A) Result of the keyword search in the NCBI Entrez Genome database with gentrez (result of Figure 1–1). (B and C) Sequence logos created with kweblogo for the top 100 (B) and worst 100 (C) PHX genes (Figure 1–9).