Byungwook Lee, Taehui Hong, Sang Jin Byun, Taeha Woo, Yoon Jeong Choi.
Abstract
We present a web-based server, called ESTpass, for processing and annotating sequence data from expressed sequence tag (EST) projects. ESTpass accepts a FASTA-formatted EST file and its quality file as inputs, and then executes a back-end EST analysis pipeline consisting of three consecutive steps. The first step cleanses the input EST sequences. The second clusters and assembles the cleansed EST sequences using the d2_cluster and CAP3 programs to produce putative transcripts. From the CAP3 output, ESTpass detects candidate chimeric EST sequences, which are then confirmed through comparison with the nr database. The last step annotates the putative transcript sequences using the RefSeq, InterPro, GO and KEGG gene databases according to user-specified options. The major advantages of ESTpass are the integration of the cleansing and annotating processes, rigorous chimeric EST detection, exhaustive annotation, and email reporting that informs the user about progress and delivers the analysis results. The ESTpass results include three reports (summary, cleansing and annotation) and a download function, as well as graphical statistics. They can be retrieved and downloaded using a standard web browser. The server is available at http://estpass.kobic.re.kr/.
Year: 2007 PMID: 17526512 PMCID: PMC1933161 DOI: 10.1093/nar/gkm369
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Schematic of the ESTpass workflow. The ESTpass pipeline consists of three major steps: cleansing, clustering and assembling, and annotation. ESTpass output is sent to the user via email and can be retrieved using a standard web browser.
Figure 2. Illustration of the detection of a chimeric EST sequence in the alignment output of the CAP3 program. A putative chimeric EST sequence is detected if it contains a chimerism spot: a stretch of the alignment that has a depth of one and is surrounded by an alignment depth of four or more, which gives the alignment a dumbbell shape. In this example, EST5 is a candidate chimeric EST sequence. The chimerism of the EST sequence containing the chimerism spot is then checked by comparison with the nr database using BLASTX. If, in the BLASTX output, the putative chimeric sequence matches a protein whose alignment spans the chimerism spot, its chimerism is disproved. In contrast, if the two sides of the chimerism spot in the putative chimeric EST sequence match different proteins in the nr database, its chimerism is confirmed.
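The rule in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the ESTpass implementation. The function names, the half-open coordinate convention, the reading of "surrounded by" as "the peak depth on each side reaches four", and the "unresolved" outcome are all assumptions made for illustration. BLASTX hits are passed in as plain tuples rather than parsed from real BLAST output.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # half-open (start, end), an assumed convention


def alignment_depth(est_spans: Dict[str, Span], length: int) -> List[int]:
    """Count how many ESTs cover each contig column."""
    depth = [0] * length
    for start, end in est_spans.values():
        for col in range(start, end):
            depth[col] += 1
    return depth


def find_chimerism_spots(depth: List[int], min_flank_depth: int = 4) -> List[Span]:
    """Return runs of depth one whose left and right sides both reach
    min_flank_depth somewhere (assumed reading of 'surrounded by')."""
    spots = []
    i, n = 0, len(depth)
    while i < n:
        if depth[i] != 1:
            i += 1
            continue
        j = i
        while j < n and depth[j] == 1:
            j += 1
        left_peak = max(depth[:i], default=0)
        right_peak = max(depth[j:], default=0)
        if left_peak >= min_flank_depth and right_peak >= min_flank_depth:
            spots.append((i, j))
        i = j
    return spots


def candidate_est(est_spans: Dict[str, Span], spot: Span) -> str:
    """Name of the single EST that covers the whole depth-one spot."""
    covering = [name for name, (s, e) in est_spans.items()
                if s <= spot[0] and e >= spot[1]]
    return covering[0]


def classify_candidate(hits: List[Tuple[str, int, int]], spot: Span) -> str:
    """hits: (protein_id, query_start, query_end) in EST coordinates.
    A hit spanning the spot disproves chimerism; different proteins on the
    two sides confirm it; anything else is left 'unresolved' (assumption)."""
    s, e = spot
    if any(qs <= s and qe >= e for _, qs, qe in hits):
        return "not_chimeric"
    left = {p for p, qs, _ in hits if qs < s}
    right = {p for p, _, qe in hits if qe > e}
    if left and right and left.isdisjoint(right):
        return "chimeric"
    return "unresolved"


# Toy dumbbell-shaped contig: two clusters of four ESTs bridged by EST5.
ests = {f"EST{k}": (0, 50) for k in range(1, 5)}
ests["EST5"] = (40, 110)
ests.update({f"EST{k}": (100, 150) for k in range(6, 10)})

depth = alignment_depth(ests, 150)
spots = find_chimerism_spots(depth)
candidate = candidate_est(ests, spots[0])

# Convert the spot to EST5's own coordinates before comparing with hits.
offset = ests[candidate][0]
spot_in_est = (spots[0][0] - offset, spots[0][1] - offset)
```

In this toy contig the depth-one stretch is bridged only by EST5, so EST5 becomes the candidate. Calling `classify_candidate` with hypothetical hits to two different proteins on either side of `spot_in_est` would confirm it, while a single hit spanning the spot would clear it.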