Alecksandr Kutchma, Nayeem Quayum, Jan Jensen.
Abstract
The GeneSpeed database (http://genespeed.uchsc.edu/) is an online database and resource tool facilitating the detailed study of protein domain homology in the transcriptomes of Homo sapiens, Mus musculus, Drosophila melanogaster and Caenorhabditis elegans. The population schema for the GeneSpeed database takes advantage of HOWARD parallel cluster technology (http://www.massivelyparallel.com/) and performs exhaustive tBLASTn searches covering all pre-assigned Pfam domain classes in all species (currently 7973 domain families) against the respective Unigene EST databases of the four selected transcriptomes. The resulting database provides a complete annotation of presumed protein domain presence for each Unigene cluster. To complement this domain annotation, we have also performed a custom transcription factor-family curation of all Pfam domains, incorporated the Gene Ontology classifications for these domains, and integrated the Novartis SymAtlas2 dataset for both human and mouse, which provides rapid and easy access to tissue-based expression analysis. Consequently, the GeneSpeed database lets the user browse or search by any of these specialized criteria as well as by more traditional means (gene identifier, gene symbol, etc.), thereby enabling a supervised analysis of gene families on a top-down hierarchical basis defined by domain content, all directly linked to an optimized gene expression dataset.
Year: 2006 PMID: 17132830 PMCID: PMC1716729 DOI: 10.1093/nar/gkl990
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. GeneSpeed database population process. In sequential order as given in the figure: (1) all domain information and alignments in the Pfam database are downloaded from Pfam; (2) a single domain is selected (C2H2-Zinc Finger in this example); (3) the ‘full’ alignment (in this case the C2H2-Zinc Finger domain ‘full’ alignment contains 32 874 distinct protein sequences) is used in a (4) batch NCBI-based tBLASTn using the Massively Parallel PbH BLAST Server; (5) tBLASTn output is parsed and redundancy eliminated; (6) non-redundant data are banked into a custom MySQL database; (7) the process is repeated for all domain families in Pfam (currently 7973); (8) other datasets are integrated with the database.
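Steps (5) and (6) of the population process can be illustrated with a minimal sketch. It is not the GeneSpeed implementation: it assumes tBLASTn output in the 12-column tabular layout (query, subject, ..., E-value, bit score), a hypothetical `est_to_cluster` mapping from EST identifiers to Unigene clusters, an arbitrary E-value cutoff, and Python's built-in `sqlite3` standing in for the custom MySQL database. The function and table names are invented for illustration.

```python
import sqlite3


def collapse_hits(tabular_lines, est_to_cluster, max_evalue=1e-5):
    """Reduce redundant hits to the best-scoring hit per Unigene cluster.

    Many sequences from one Pfam 'full' alignment can hit the same cluster;
    only the hit with the highest bit score is kept.
    The max_evalue cutoff is an assumption, not a value from the source.
    """
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete tabular record
        subject = fields[1]
        evalue = float(fields[10])
        bitscore = float(fields[11])
        cluster = est_to_cluster.get(subject)
        if cluster is None or evalue > max_evalue:
            continue  # unmapped EST or hit too weak
        if cluster not in best or bitscore > best[cluster][1]:
            best[cluster] = (evalue, bitscore)
    return best


def bank_hits(conn, domain, best):
    """Store one row per (domain, cluster) pair (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS domain_hit ("
        "domain TEXT, cluster TEXT, evalue REAL, bitscore REAL, "
        "PRIMARY KEY (domain, cluster))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO domain_hit VALUES (?, ?, ?, ?)",
        [(domain, c, e, b) for c, (e, b) in sorted(best.items())],
    )
    conn.commit()
```

Calling `collapse_hits` once per domain and passing the result to `bank_hits` mirrors the per-domain loop in step (7). Keying the table on the (domain, cluster) pair means that re-running a domain overwrites its rows rather than duplicating them.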
Figure 2. GeneSpeed database navigation diagram. (A. Search) A search is started by first selecting the organism and then the search type. (B. Results) The results page displays the number of Unigene clusters (genes) found, and the user can then select additional specific information to display about these genes. (C. Sub-searches/Tool) Other sub-searches/tools are available to refine the original search, including Domain Sub-search (finds all domains in a specific gene), InterPro Sub-search (finds all genes with a specific domain), External Links (web links to outside databases) and Novartis SymAtlas2 (investigates the expression level of any number of genes using the Novartis SymAtlas2 dataset, which includes 79 human and 61 mouse tissues). (D. User Account) Each user is provided with a private account, in which they may store any number of user-specified gene lists for later analysis. Upload tools are also provided, allowing users to analyze any gene list not generated within the GeneSpeed database.