Ali Masoudi-Nejad, Koichiro Tonomura, Shuichi Kawashima, Yuki Moriya, Masanori Suzuki, Masumi Itoh, Minoru Kanehisa, Takashi Endo, Susumu Goto.
Abstract
Expressed sequence tag (EST) sequencing has proven to be an economically feasible alternative for gene discovery in species that lack a draft genome sequence. Ongoing large-scale EST sequencing projects need bioinformatics tools that handle ESTs uniformly, which renews the importance of a universal tool for processing and functionally annotating large sets of ESTs. EGassembler (http://egassembler.hgc.jp/) is a web server that provides both automated and user-customized analysis for cleaning, repeat masking, vector trimming, organelle masking, clustering and assembling ESTs and genomic fragments. The web server is publicly available and offers the community a unique all-in-one online service for clustering and assembling large-scale EST and genomic DNA data sets. Because it runs on a Sun Fire 15K supercomputer, large volumes of data can be processed in a short time. The results can be used to functionally annotate genes, to facilitate splice alignment analysis, to link transcripts to genetic and physical maps, to design microarray chips, to perform transcriptome analysis and to map them to KEGG metabolic pathways. The service provides an excellent bioinformatics tool for wet-lab research groups as well as an all-in-one sequence-handling tool for bioinformatics researchers.
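To make the order of processing steps concrete, here is a toy sketch of such a pipeline, running the stages in the order the abstract lists them (cleaning, repeat masking, vector trimming, organelle masking, clustering). It is an illustration under stated assumptions, not EGassembler's implementation: the reference fragments (`VECTOR_SEQ`, `REPEAT_LIBRARY`, `ORGANELLE_LIBRARY`), the cleaning thresholds, exact-match masking, and single-linkage clustering on shared 12-mers are all hypothetical choices, and the final assembly of each cluster into a consensus sequence is omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical reference fragments standing in for real vector, repeat and organelle databases.
VECTOR_SEQ = "GGATCC"
REPEAT_LIBRARY = ["ATATATAT"]
ORGANELLE_LIBRARY = ["TTGACA"]

# Made-up example reads: est1 carries a vector prefix, est4 is mostly ambiguous bases.
EXAMPLE_ESTS = {
    "est1": "GGATCC" + "ACGTACGTTAGCCGATCGATGCA",
    "est2": "ACGTACGTTAGCCGATCGATGCATTTGG",
    "est3": "CCCCGGGGAAAATTTTCCCCGGGGAAAA",
    "est4": "NNNNNNNNNN" + "ACGTACGTACGTACGTACGT",
}


def clean(reads: Dict[str, str], min_len: int = 20, max_n_frac: float = 0.1) -> Dict[str, str]:
    """Keep reads that are long enough and not dominated by ambiguous bases (N)."""
    kept = {}
    for name, seq in reads.items():
        seq = seq.upper()
        if len(seq) >= min_len and seq.count("N") / len(seq) <= max_n_frac:
            kept[name] = seq
    return kept


def mask(reads: Dict[str, str], library: List[str]) -> Dict[str, str]:
    """Replace exact occurrences of each library sequence with Ns (read length is preserved)."""
    masked = {}
    for name, seq in reads.items():
        for motif in library:
            seq = seq.replace(motif, "N" * len(motif))
        masked[name] = seq
    return masked


def trim_vector(reads: Dict[str, str], vector: str = VECTOR_SEQ) -> Dict[str, str]:
    """Cut an exact vector match from either end of each read."""
    trimmed = {}
    for name, seq in reads.items():
        if seq.startswith(vector):
            seq = seq[len(vector):]
        if seq.endswith(vector):
            seq = seq[:-len(vector)]
        trimmed[name] = seq
    return trimmed


def cluster(reads: Dict[str, str], k: int = 12) -> List[List[str]]:
    """Single-linkage clustering: reads that share any unmasked k-mer end up in one cluster."""
    parent = {name: name for name in reads}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    first_owner: Dict[str, str] = {}
    for name, seq in reads.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" in kmer:
                continue  # masked regions must not link reads together
            if kmer in first_owner:
                parent[find(name)] = find(first_owner[kmer])  # union the two clusters
            else:
                first_owner[kmer] = name
    groups: Dict[str, List[str]] = {}
    for name in reads:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


def run_pipeline(reads: Dict[str, str]) -> Tuple[Dict[str, str], List[List[str]]]:
    """Run the stages in the abstract's order; assembling each cluster is left out."""
    reads = clean(reads)
    reads = mask(reads, REPEAT_LIBRARY)
    reads = trim_vector(reads)
    reads = mask(reads, ORGANELLE_LIBRARY)
    return reads, cluster(reads)


if __name__ == "__main__":
    processed, clusters = run_pipeline(EXAMPLE_ESTS)
    print(clusters)  # [['est1', 'est2'], ['est3']]
```

Masking replaces sequence with `N` rather than deleting it, so read coordinates stay intact while masked regions are excluded from the k-mer overlaps that drive clustering.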
Year: 2006 PMID: 16845049 PMCID: PMC1538775 DOI: 10.1093/nar/gkl066
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. EGassembler data flow. The flowchart shows the pipeline used in the EGassembler web server. The middle portion shows each process and its running mode (parallel or single). The right side shows the action of each process, and the left side shows the databases each process uses for masking.
Figure 2. EGassembler performance. The large plot shows EGassembler performance under different sequence loads and different numbers of CPUs. The inset shows performance with ≤8000 sequences.
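Figure 1 marks each process as running in parallel or single mode, and Figure 2 measures performance against the number of CPUs. The sketch below shows one plausible shape for such a data-parallel mode, assuming a step that handles each read independently (as cleaning or masking does): split the reads into chunks, run the step on each chunk concurrently, then merge the results. The names `run_step`, `split_into_chunks` and `trim_poly_a` are hypothetical, and the source does not describe how EGassembler actually distributes work on the Sun Fire 15K. Threads keep the sketch portable and help when each worker mostly waits on an external program; a CPU-bound step written in pure Python would need processes (for example `ProcessPoolExecutor`) instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Reads = Dict[str, str]


def split_into_chunks(names: List[str], n_chunks: int) -> List[List[str]]:
    """Deal sequence IDs round-robin into at most n_chunks chunks."""
    n_chunks = max(1, min(n_chunks, len(names)))
    return [names[i::n_chunks] for i in range(n_chunks)]


def run_step(reads: Reads, step: Callable[[Reads], Reads], workers: int = 1) -> Reads:
    """Run a per-read step in 'single' mode (workers <= 1) or 'parallel' mode (workers > 1)."""
    if workers <= 1:
        return step(reads)
    chunks = split_into_chunks(list(reads), workers)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        # Each worker sees only the reads in its own chunk.
        parts = list(pool.map(lambda ids: step({i: reads[i] for i in ids}), chunks))
    merged: Reads = {}
    for part in parts:
        merged.update(part)
    return merged


def trim_poly_a(reads: Reads) -> Reads:
    """Hypothetical example step: strip trailing A bases (a crude poly-A tail trim)."""
    return {name: seq.rstrip("A") for name, seq in reads.items()}


if __name__ == "__main__":
    example = {"a": "ACGTAAAA", "b": "GGGAAA", "c": "TTTT"}
    print(run_step(example, trim_poly_a, workers=2))  # {'a': 'ACGT', 'c': 'TTTT', 'b': 'GGG'}
```

Only steps that treat reads independently can be split this way. Clustering compares reads with one another, so running it separately on each chunk would miss overlaps between reads that landed in different chunks.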