Katerina Michalickova, Gary D Bader, Michel Dumontier, Hao Lieu, Doron Betel, Ruth Isserlin, Christopher W V Hogue.
Abstract
BACKGROUND: SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment.
Year: 2002 PMID: 12401134 PMCID: PMC138791 DOI: 10.1186/1471-2105-3-32
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The SeqHound database system in UML. From the bottom up: the system relies on data provided by the NCBI FTP site and the Gene Ontology resource. It uses the NCBI programming toolkit, the database management system (DBMS) and the bzip compression scheme as programming tools. The database is filled and updated using SeqHound parsers, programming tools and NCBI data as input. The database is searched using the SeqHound query interface, which is usable in three forms: as CGI-based web pages, as a local API and as a remote API. All applications (top right) are written using the SeqHound API.
Figure 2. The database schema in UML. Each box depicts one table within the SeqHound system. The grey areas contain the table names. PK stands for "primary key". For the majority of the tables, the primary key is the GenInfo (GI) identifier. Each subsequent entry in a box indicates a field of information stored in that table. Required fields are in bold. The ASN.1 schemas used in these tables can be found at http://ncbi.nlm.nih.gov/IEB (for the Bioseq, Seq-Entry, Cdd and Biostruc objects) and at http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/slritools/slri/seqhound/asn for the rest of the objects.
Summary of contents of SeqHound database tables.
| Database Table | Contents Summary |
|---|---|
| ASNDBs | GI and ASN.1 Bioseq – partitioned into divisions |
| PARTITION | GI and division – master list |
| ACCDB | GI, accession number, and database specific identifier (if available) |
| NUCPROT | protein GI and DNA GI |
| PUBSEQ | GI and MEDLINE identifier |
| REDUND | GI and redundant group identifier |
| SENGI | GI and SeqEntry identifier – partitioned into divisions |
| SENDBs | SeqEntry identifier, ASN.1 SeqEntry and division |
| CHROM | GI and chromosomal identifier from complete genome |
| TAXGI | GI and taxonomy identifier |
| TAXDB | Taxonomy hierarchy |
| GCODEDB | Taxonomy identifier and genetic code |
| DIVDB | Entrez division |
| MMGI | GI and MMDB identifier |
| MMDB | MMDB identifier and ASN.1 for 3-D structures (Biostruc) |
| DOMDB | GI, ASN.1 structural domain and chain redundant set tag |
| NRBLASTDB | GI and ASN.1 list of neighbours |
| BLASTDB | Hashed pair of GIs and ASN.1 alignment from BLAST comparison |
| GO_PARENT | Gene ontology |
| GO_NAME | GO identifier and function name |
| GO_SYNONYM | GO identifier and synonym |
| GO_REFERENCE | GO identifier and reference to other databases (e.g. Enzyme Consortium) |
| LL_GO | GI and GO identifier |
| LL_OMIM | GI and Online Mendelian Inheritance in Man identifier |
| LL_LLINK | GIs, Locus Link identifier and chromosomal location |
| LL_CDD | GI and Conserved Domain Database identifier extracted from Locus Link database |
| RPSDB | GI and Conserved Domain Database identifier, details of alignment |
| DOMNAME | Conserved Domain Database annotation |
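
Because most tables share the GI as their key, records from different tables can be combined with simple joins on that identifier. The following is a minimal sketch of that idea using Python's built-in `sqlite3` module, not SeqHound's actual storage layer. The column names, the GI values and the accession strings are placeholders invented for illustration; only the table names (ACCDB, NUCPROT, TAXGI) come from the summary above.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with three GI-keyed tables.

    Column names and all row values are hypothetical placeholders.
    """
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE accdb   (gi INTEGER PRIMARY KEY, accession TEXT);
        CREATE TABLE nucprot (protein_gi INTEGER PRIMARY KEY, dna_gi INTEGER);
        CREATE TABLE taxgi   (gi INTEGER PRIMARY KEY, taxid INTEGER);
        """
    )
    con.executemany(
        "INSERT INTO accdb VALUES (?, ?)", [(101, "PROT_A"), (201, "DNA_A")]
    )
    con.execute("INSERT INTO nucprot VALUES (?, ?)", (101, 201))
    con.executemany("INSERT INTO taxgi VALUES (?, ?)", [(101, 9606), (201, 9606)])
    return con


def describe_gi(con, gi):
    """Return (accession, dna_gi, taxid) for a GI, or None if the GI is unknown."""
    return con.execute(
        """
        SELECT a.accession, n.dna_gi, t.taxid
        FROM accdb a
        LEFT JOIN nucprot n ON n.protein_gi = a.gi
        LEFT JOIN taxgi   t ON t.gi = a.gi
        WHERE a.gi = ?
        """,
        (gi,),
    ).fetchone()


if __name__ == "__main__":
    con = build_demo_db()
    print(describe_gi(con, 101))
```

The `LEFT JOIN`s keep a record visible even when an optional table has no row for it, which is why a DNA GI with no NUCPROT entry still returns its accession and taxonomy identifier.
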
Parsers and resource files needed to build and update SeqHound.
| Input File | Resource | Parser | Tables Modified | Module |
|---|---|---|---|---|
| ASN.1 sequences | | mother | ASNDB, PARTI, NUCPROT, ACCDB, PUBSEQ, TAXGI, SENDB, SENGI | core |
| ASN.1 sequences | | update | ASNDB, PARTI, NUCPROT, ACCDB, PUBSEQ, TAXGI, SENDB, SENGI | core |
| FASTA nr database | | redund | REDUND | redundb |
| List of complete genomes (flat file) | | chrom | CHROM | gendb |
| ASN.1 for complete genomes | | comgen | TAXGI, ACCDB | gendb |
| Taxonomy release (flat file) | | importtaxdb | TAX, GCODE, DIV, DEL, MERGE | taxdb |
| ASN.1 MMDB release | | cbmmdb | MMDB, MMGI | strucdb |
| MMDB (database table) | MMDB table | vastblst | DOMDB | strucdb |
| 3-D chain BLAST sets (flat file) | | pdbrep | DOMDB | strucdb |
| FASTA nr database | | nblast | nrB | neigdb |
| BLAST ASN.1 results | nrB table available; nrB and nrN tables available at | nbraccess | nrN | neigdb |
| LL_tmpl (flat file) | | llparser | LL_OMIM, LL_GO, LL_LLINK, LL_CDD | lldb |
| gene_association.compugen.GenBank/Swissprot (flat files) | | addgoid | LL_GO | lldb |
| function.ontology process.ontology component.ontology (flat files) | | goparser | GO_PARENT, GO_NAME, GO_REFERENCE, GO_SYNONYM | godb |
| CDD database | | domname | DOMNAME | rpsdb |
| FASTA nr database and CDD database | | rpsdb | RPSDB | rpsdb |
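
When maintaining a local installation, it can help to ask the reverse question: which parsers write to a given table? The sketch below encodes a subset of the parser rows from the table above as a Python dictionary and inverts it. The dictionary and the helper name `parsers_for_table` are illustrative assumptions, not part of the SeqHound distribution.

```python
# Parser name -> tables it modifies, transcribed from a subset of the rows above.
PARSER_TABLES = {
    "mother": ["ASNDB", "PARTI", "ACCDB", "PUBSEQ", "TAXGI", "SENDB", "SENGI"],
    "update": ["ASNDB", "PARTI", "ACCDB", "PUBSEQ", "TAXGI", "SENDB", "SENGI"],
    "comgen": ["TAXGI", "ACCDB"],
    "cbmmdb": ["MMDB", "MMGI"],
    "vastblst": ["DOMDB"],
    "pdbrep": ["DOMDB"],
    "llparser": ["LL_OMIM", "LL_GO", "LL_LLINK", "LL_CDD"],
    "addgoid": ["LL_GO"],
}


def parsers_for_table(table, parser_tables=PARSER_TABLES):
    """Return the sorted names of all parsers that modify the given table."""
    return sorted(
        parser for parser, tables in parser_tables.items() if table in tables
    )


if __name__ == "__main__":
    for name in ("TAXGI", "DOMDB", "LL_GO"):
        print(name, parsers_for_table(name))
```
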
Figure 3. The application programming interface (API) in UML. The SeqHound API consists of the database administration API, the local and remote query APIs, the formatdb API and the Clustal API. The remote server executes remote API requests using the local API and returns the results to the client. The WWW server uses the local API to present WWW pages to the user. Each box contains a group of programming functions with a similar purpose. The individual functions are used to retrieve a set of data from the SeqHound system.
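
The relationship between the remote and local APIs described in the caption can be sketched as a thin delegation layer. The example below is a toy model under stated assumptions: the class names, the `accession_from_gi` function, the JSON message format and the sample record are all hypothetical and do not reproduce SeqHound's real function names or wire protocol. It shows only the pattern of a server answering remote requests by calling a local API.

```python
import json


class LocalAPI:
    """Stand-in for a local query API backed by an in-memory table (hypothetical)."""

    def __init__(self, accession_by_gi):
        self._accession_by_gi = dict(accession_by_gi)

    def accession_from_gi(self, gi):
        return self._accession_by_gi.get(gi)


class RemoteServer:
    """Executes remote requests by delegating to the local API."""

    def __init__(self, local_api):
        self._local = local_api

    def handle(self, request_text):
        request = json.loads(request_text)
        name = request["function"]
        # Refuse private attributes; only public callables are exposed.
        func = None if name.startswith("_") else getattr(self._local, name, None)
        if not callable(func):
            return json.dumps({"error": "unknown function"})
        return json.dumps({"result": func(*request["args"])})


class RemoteClient:
    """Client-side wrapper that mirrors the local call signature."""

    def __init__(self, server):
        self._server = server  # a real client would send the text over a network

    def accession_from_gi(self, gi):
        request = json.dumps({"function": "accession_from_gi", "args": [gi]})
        return json.loads(self._server.handle(request)).get("result")


if __name__ == "__main__":
    client = RemoteClient(RemoteServer(LocalAPI({101: "PROT_A"})))
    print(client.accession_from_gi(101))
```

Keeping the client method signatures identical to the local ones means application code can switch between a local database and a remote server without changes, which is the design benefit the figure implies.
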
Figure 4. Clustal formatted tyrosyl tRNA synthetase sequence. The letter "A" denotes an α-helix, "B" a β-strand. Upper-case letters indicate that the automated secondary structure assignment (as annotated in the MMDB database) and the assignment by the authors agree, while lower-case letters indicate that they disagree.
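
The case convention in this caption can be illustrated with a short Python sketch. It assumes two equal-length strings of per-residue codes ("A" for helix, "B" for strand, "-" for neither) and, as a further assumption not stated in the caption, shows the automated assignment in lower case wherever the two sources disagree. The function name `merge_assignments` is hypothetical.

```python
def merge_assignments(automated, author):
    """Combine two per-residue secondary structure strings into one annotation line.

    Upper case marks positions where both assignments agree; lower case marks
    disagreement (the automated code is shown, by assumption).
    """
    if len(automated) != len(author):
        raise ValueError("assignment strings must have the same length")
    merged = []
    for auto_code, author_code in zip(automated, author):
        if auto_code == author_code:
            merged.append(auto_code.upper())
        else:
            merged.append(auto_code.lower())
    return "".join(merged)


if __name__ == "__main__":
    print(merge_assignments("AAAB-", "AABB-"))
```
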