X Guan¹, L Du. ¹Glaxo Wellcome Research and Development, Five Moore Drive, Research Triangle Park, NC 27709, USA. xg42498@glaxowellcome.com
Abstract
MOTIVATION: As sequence databases grow rapidly, the results of sequence comparison searches using fast methods such as BLAST and FASTA tend to be long and difficult to digest. RESULTS: In this paper, we present a new method to extract domain information from sequence comparison searches by clustering the resulting alignments according to their similarity to the query sequence. Efficient tree structures and algorithms are used to organize the alignment data so that structurally conserved elements can be easily identified. The hierarchical data structures and the flexible X Window-based interface provide an efficient and intuitive way to explore the alignment data at different levels, so that both common domains and distantly related features can be examined. AVAILABILITY: The clustering program is available by anonymous FTP from ftp.embl-ebi.ac.uk in directory /pub/software/unix, file clustering.tar.Z.
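The abstract does not spell out the clustering criterion. As a purely illustrative sketch, assuming that hits should be grouped by how much of the query sequence their alignments share (an assumption, not necessarily the authors' rule), single-linkage clustering over query-coordinate overlap could look like the following. The `Hit` record, `overlap_fraction`, `cluster_hits`, and the default threshold of 0.5 are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical illustration only; not the published clustering algorithm.


@dataclass(frozen=True)
class Hit:
    """One database hit, reduced to the query region it aligns to."""
    subject_id: str
    q_start: int  # 1-based, inclusive position on the query
    q_end: int    # 1-based, inclusive position on the query


def overlap_fraction(a: Hit, b: Hit) -> float:
    """Overlap of two query ranges, relative to the shorter range."""
    overlap = min(a.q_end, b.q_end) - max(a.q_start, b.q_start) + 1
    if overlap <= 0:
        return 0.0
    shorter = min(a.q_end - a.q_start + 1, b.q_end - b.q_start + 1)
    return overlap / shorter


def cluster_hits(hits: list[Hit], min_overlap: float = 0.5) -> list[list[Hit]]:
    """Single-linkage clustering: hits sharing >= min_overlap of the query are joined."""
    parent = list(range(len(hits)))  # union-find forest over hit indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Compare every pair of hits and merge their sets when they overlap enough.
    for i in range(len(hits)):
        for j in range(i + 1, len(hits)):
            if overlap_fraction(hits[i], hits[j]) >= min_overlap:
                parent[find(i)] = find(j)

    groups: dict[int, list[Hit]] = {}
    for i, hit in enumerate(hits):
        groups.setdefault(find(i), []).append(hit)
    # Order clusters by their leftmost start position on the query.
    return sorted(groups.values(), key=lambda g: min(h.q_start for h in g))
```

For example, with hits `A` (10-120), `B` (15-110), `C` (200-300), `D` (210-290) and `E` (100-220), a threshold of 0.5 groups them as `[A, B]`, `[E]`, `[C, D]`, while a threshold of 0.1 merges all five into one cluster. Re-running at several thresholds gives coarser or finer groupings, which loosely mirrors the idea of exploring the alignments at different levels, though the paper's own tree structures may work quite differently.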