
Batch Blast Extractor: an automated blastx parser application.

Mehdi Pirooznia1, Edward J Perkins, Youping Deng.   

Abstract

MOTIVATION: BLAST programs are very efficient at finding sequence similarities. However, for large datasets such as ESTs, the information must be extracted manually from the batch BLAST output, which can be time consuming, inefficient, and inaccurate. A parser application would therefore be extremely useful for extracting information from BLAST outputs.
RESULTS: We have developed a Java application, Batch Blast Extractor, with a user-friendly graphical interface to extract information from BLAST output. The application generates a tab-delimited text file that can be easily imported into any statistical package such as Excel or SPSS for further analysis. For each BLAST hit, the program obtains and saves the essential features from the BLAST output file that would allow further analysis. The program was written in Java and is therefore OS independent. It works on both Windows and Linux with Java 1.4 and higher. It is freely available from: http://mcbc.usm.edu/BatchBlastExtractor/

Year:  2008        PMID: 18831775      PMCID: PMC2559874          DOI: 10.1186/1471-2164-9-S2-S10

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


Background

The NCBI BLAST database search tool is one of the most popular programs designed to solve single query problems. BLAST (Basic Local Alignment Search Tool) is the heuristic search algorithm employed by the programs blastp, blastn, blastx, tblastn, and tblastx. The BLAST programs were tailored for sequence similarity searching, for example to identify homologs of a given query sequence [1]. The five common BLAST programs perform the following tasks: 1) blastp compares an amino acid query sequence against a protein sequence database; 2) blastn compares a nucleotide query sequence against a nucleotide sequence database; 3) blastx compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database; 4) tblastn compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands); and 5) tblastx compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database. The BLAST programs all provide information in roughly the same format: (A) an introduction to the program; (B) a histogram of expectations, if one was requested; (C) a series of one-line descriptions of matching database sequences; (D) the actual sequence alignments; and finally the parameters and other statistics gathered during the search. However, for genome-wide comparisons involving multiple queries (batch queries), the search is a challenge. For instance, EST collections are currently produced for many species as an efficient strategy for gene identification. Analysis of the ESTs involves clustering, contig formation and annotation of thousands of fragments, interpretation of which may involve thousands of individual BLAST searches [2-5]. Automated post-processing of the output (Figure 1) can simplify the analysis in such cases.
The BLAST parser (BlastLikeSaxParser) in BioJava [6] and BPlite from BioPerl [7] are frequently used to parse a variety of BLAST outputs, but neither is user friendly, and programming skills are needed to use them.
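As a rough illustration of the first step any such post-processing needs, the sketch below splits a batch report into one chunk per query. It is not code from Batch Blast Extractor, BioJava, or BioPerl; the class name BatchBlastSplitter and the method splitByQuery are hypothetical, and the sketch assumes that each query record in the plain-text report begins with a line starting with "Query=".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits batch BLAST text output into one chunk per query. */
class BatchBlastSplitter {

    /**
     * Assumes every query record begins with a line starting with "Query=".
     * Lines before the first such line (for example the program banner) are skipped;
     * any banner repeated between records stays attached to the preceding record.
     */
    static List<String> splitByQuery(String report) {
        List<String> records = new ArrayList<String>();
        StringBuilder current = null;
        for (String line : report.split("\\r?\\n")) {
            if (line.startsWith("Query=")) {
                if (current != null) {
                    records.add(current.toString());
                }
                current = new StringBuilder();
            }
            if (current != null) {
                current.append(line).append('\n');
            }
        }
        if (current != null) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up miniature report with two queries.
        String report = "BLASTX\n\nQuery= est_001\n  (400 letters)\n\n"
                      + "Query= est_002\n  (350 letters)\n";
        List<String> records = splitByQuery(report);
        System.out.println(records.size() + " query records found");
    }
}
```

Each returned chunk can then be handed to a per-hit parser, which keeps the splitting logic independent of how individual alignments are read.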
Figure 1

Screenshot of a Blastx Output.

We developed the "Batch Blast Extractor" program (Figures 2 and 3) for use in this regard. It serves as a parser, storing only the essential features of BLAST hits in tabular form. The user can then apply a number of selection criteria to filter out hits with particular attributes. "Batch Blast Extractor" thus serves as a powerful annotation tool for large sets of query sequences.
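The kind of selection described here can be sketched in a few lines of Java. The example below is only an assumption about what such a filter might look like, not the application's actual code: the classes ExtractedHit and HitFilter, the method select, and the two criteria (a maximum E-value and a minimum identity percentage) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative holder for a few per-hit fields; not the application's own class. */
class ExtractedHit {
    final String queryId;
    final String subject;
    final double eValue;
    final int identityPercent;

    ExtractedHit(String queryId, String subject, double eValue, int identityPercent) {
        this.queryId = queryId;
        this.subject = subject;
        this.eValue = eValue;
        this.identityPercent = identityPercent;
    }
}

class HitFilter {

    /** Keeps hits whose E-value is at most maxEValue and whose identity is at least minIdentityPercent. */
    static List<ExtractedHit> select(List<ExtractedHit> hits, double maxEValue, int minIdentityPercent) {
        List<ExtractedHit> kept = new ArrayList<ExtractedHit>();
        for (ExtractedHit h : hits) {
            if (h.eValue <= maxEValue && h.identityPercent >= minIdentityPercent) {
                kept.add(h);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up hits for demonstration only.
        List<ExtractedHit> hits = Arrays.asList(
            new ExtractedHit("est_001", "hypothetical protein A", 1e-166, 85),
            new ExtractedHit("est_002", "hypothetical protein B", 0.5, 40));
        for (ExtractedHit h : select(hits, 1e-5, 50)) {
            System.out.println(h.queryId + "\t" + h.subject + "\t" + h.eValue);
        }
    }
}
```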
Figure 2

Screenshot of the Batch Blast Extractor Web site.

Figure 3

The Batch Blast Extractor Graphical User Interface.


Results

The application generates a tab-delimited text file that can be easily imported into any statistical package such as Excel or SPSS for further analysis. For each BLAST hit, the program derives and saves the following features: Query ID, Query Length, Accession version and GI number, Alignment Length, Score, Bits, E-value, Identities, Positives, Gaps, Frame, Organism, and Description. The extracted information includes the following:

▪ Query: headers of sequences to analyze
▪ Subject: headers of sequences found in the database
▪ Score: a number representation (e.g. 550)
▪ Score Text: full text representation plus BITS (e.g. 235 bits (450))
▪ Expect: the E-Value as number (e.g. 1e-166)
▪ Identities %: a number representation (e.g. 85)
▪ Identities Text: full text representation plus characters matching (e.g. 110/130 (90%))
▪ Positives %: a number representation (e.g. 92)
▪ Positives Text: full text representation (e.g. 110/130 (90%))
▪ Gaps %: a number representation (e.g. 11)
▪ Gaps Text: full text representation plus voids (e.g. 9/102 (9%))
▪ Frame: orientation of the translated ORF (e.g. +3)
▪ Length Query: the number of nucleotides or amino acids (e.g. 400)
▪ Length Subject: the number of nucleotides or amino acids (e.g. 500)
▪ Position Query: as text representation plus the length of the frame (e.g. 328–600 (360))
▪ Position Subject: as text representation plus the length of the frame (e.g. 1–110 (120))

The program was written in Java. It is OS independent and works on both Windows and Linux with Java 1.4 and higher. It is freely available to noncommercial users from http://mcbc.usm.edu/BatchBlastExtractor/ (Figures 2 and 3). Currently the application works with blastx results. Efforts to extend functionality to other BLAST programs such as blastp and blastn are in progress.
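To make the extraction concrete, the following sketch shows one way the statistics lines of a single blastx alignment could be turned into a tab-delimited row. It is an assumption-laden illustration rather than the published implementation: the class BlastxStatsParser, the method toTabRow, the regular expressions, and the column order (score, bits, E-value, identities %, positives %, gaps %, frame) are all hypothetical. It assumes the usual plain-text layout of "Score = ... bits (...), Expect = ...", "Identities = ..., Positives = ..., Gaps = ..." and "Frame = ...", treats a missing Gaps entry as 0, and rewrites an E-value printed as "e-166" to "1e-166".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: converts the statistics lines of one blastx hit to a tab-delimited row. */
class BlastxStatsParser {

    private static final Pattern SCORE = Pattern.compile(
        "Score\\s*=\\s*([\\d.]+)\\s*bits\\s*\\((\\d+)\\),\\s*Expect(?:\\(\\d+\\))?\\s*=\\s*([\\d.eE+-]+)");
    private static final Pattern IDENTITIES = Pattern.compile(
        "Identities\\s*=\\s*(\\d+)/(\\d+)\\s*\\((\\d+)%\\)");
    private static final Pattern POSITIVES = Pattern.compile(
        "Positives\\s*=\\s*(\\d+)/(\\d+)\\s*\\((\\d+)%\\)");
    private static final Pattern GAPS = Pattern.compile(
        "Gaps\\s*=\\s*(\\d+)/(\\d+)\\s*\\((\\d+)%\\)");
    private static final Pattern FRAME = Pattern.compile("Frame\\s*=\\s*([+-][1-3])");

    /** Returns: score, bits, E-value, identities %, positives %, gaps %, frame (tab separated). */
    static String toTabRow(String statsBlock) {
        Matcher score = SCORE.matcher(statsBlock);
        Matcher identities = IDENTITIES.matcher(statsBlock);
        Matcher positives = POSITIVES.matcher(statsBlock);
        Matcher frame = FRAME.matcher(statsBlock);
        if (!score.find() || !identities.find() || !positives.find() || !frame.find()) {
            throw new IllegalArgumentException("not a blastx statistics block");
        }
        // The Gaps entry is assumed to be absent when the alignment has no gaps.
        Matcher gaps = GAPS.matcher(statsBlock);
        String gapPercent = gaps.find() ? gaps.group(3) : "0";
        // Normalise E-values written without a leading digit, e.g. "e-166".
        String expect = score.group(3);
        if (expect.startsWith("e")) {
            expect = "1" + expect;
        }
        return String.join("\t", score.group(2), score.group(1), expect,
            identities.group(3), positives.group(3), gapPercent, frame.group(1));
    }

    public static void main(String[] args) {
        // Made-up statistics block for demonstration only.
        String block = " Score =  235 bits (599), Expect = 1e-61\n"
                     + " Identities = 110/130 (84%), Positives = 120/130 (92%), Gaps = 2/130 (1%)\n"
                     + " Frame = +3\n";
        System.out.println(toTabRow(block));
    }
}
```

Keeping the E-value as text avoids losing very small values when the row is later imported into a spreadsheet or statistics package.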

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

MP and YD initiated the project. MP designed, programmed and implemented the application and drafted the manuscript. EJP and YD directed the project. All authors read and approved the final manuscript.

References

1.  Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors:  M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal:  Nat Genet       Date:  2000-05       Impact factor: 38.330

2.  The Bioperl toolkit: Perl modules for the life sciences.

Authors:  Jason E Stajich; David Block; Kris Boulez; Steven E Brenner; Stephen A Chervitz; Chris Dagdigian; Georg Fuellen; James G R Gilbert; Ian Korf; Hilmar Lapp; Heikki Lehväslaiho; Chad Matsalla; Chris J Mungall; Brian I Osborne; Matthew R Pocock; Peter Schattner; Martin Senger; Lincoln D Stein; Elia Stupka; Mark D Wilkinson; Ewan Birney
Journal:  Genome Res       Date:  2002-10       Impact factor: 9.043

3.  The Bio* toolkits--a brief overview.

Authors:  Harry Mangalam
Journal:  Brief Bioinform       Date:  2002-09       Impact factor: 11.622

4.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Authors:  S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman
Journal:  Nucleic Acids Res       Date:  1997-09-01       Impact factor: 16.971

5.  WebTraceMiner: a web service for processing and mining EST sequence trace files.

Authors:  Chun Liang; Gang Wang; Lin Liu; Guoli Ji; Yuansheng Liu; Jinqiao Chen; Jason S Webb; Greg Reese; Jeffrey F D Dean
Journal:  Nucleic Acids Res       Date:  2007-05-08       Impact factor: 16.971

6.  ESTExplorer: an expressed sequence tag (EST) assembly and annotation platform.

Authors:  Shivashankar H Nagaraj; Nandan Deshpande; Robin B Gasser; Shoba Ranganathan
Journal:  Nucleic Acids Res       Date:  2007-06-01       Impact factor: 16.971

7.  Cloning, analysis and functional annotation of expressed sequence tags from the Earthworm Eisenia fetida.

Authors:  Mehdi Pirooznia; Ping Gong; Xin Guan; Laura S Inouye; Kuan Yang; Edward J Perkins; Youping Deng
Journal:  BMC Bioinformatics       Date:  2007-11-01       Impact factor: 3.169
