S A Kirov, X Peng, E Baker, D Schmoyer, B Zhang, J Snoddy.
Abstract
BACKGROUND: The analysis of biological data is greatly enhanced by existing or emerging databases. Most existing databases, with few exceptions, are not designed to easily support large-scale computational analysis; rather, they offer only a web interface to the resource. We have recognized the growing need for a database that can be used successfully as a backend to computational analysis tools and pipelines. Such a database should be sufficiently versatile to allow easy system integration.
Year: 2005 PMID: 15790402 PMCID: PMC1274265 DOI: 10.1186/1471-2105-6-72
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Comparison of databases that could be used to annotate and analyze large-scale biological data. "Simplified joins" refers to the availability of a central key and the ability to join tables through simple queries. The "Remote access" column refers only to machine access to a database server.
| Database | SQL | Web interface | Simplified joins | Remote access¹ | Automatic updates provided | Type | Structure available as | Modular design | Download |
|---|---|---|---|---|---|---|---|---|---|
| GeneKeyDB | Yes | Indirect² | Yes | No | Yes | Oracle | Oracle, MySQL, PostgreSQL | Yes | Yes |
| LocusLink (Entrez Gene) | No | Yes | N/A | N/A | No | Flat file(s) | Flat file only | N/A | Yes |
| RefSeq | No | Yes | N/A | N/A | No | Flat file | Flat file only | N/A | Yes |
| Ensembl (core databases) | Yes | Yes | No | Yes | No | MySQL | MySQL | No | Yes |
| EnsMART | Yes | Yes | Yes | Yes | No | MySQL | MySQL | No | Yes |
| Dragon | Yes | Yes | No | No | No | MySQL | Flat file only | No | Data files only |
| HomoloGene | No | Yes | N/A | N/A | No | Flat files, XML | XML | No | Yes |

¹ Refers only to machine access to the relational databases.
² GeneKeyDB serves as a data-mining environment for different tools; therefore, these tools could also be considered part of the interface layer.
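To make the "Simplified joins" idea concrete, here is a minimal sketch using Python's standard-library `sqlite3` module. It assumes a toy schema in which every sub-module table carries the same central key (a LocusLink ID). The table names (`ll_gene`, `ucsc_loc`, `cgap_expr`), columns, and rows are hypothetical placeholders and do not reproduce GeneKeyDB's actual schema; the point is only that a shared central key lets annotations from several sources be combined in one query whose join conditions all use the same column.

```python
import sqlite3

# Hypothetical, heavily simplified schema (not GeneKeyDB's real one):
# every sub-module table carries the central key, so any two tables
# can be joined on that single column.
SCHEMA = """
CREATE TABLE ll_gene   (locuslink_id INTEGER PRIMARY KEY, symbol TEXT);
CREATE TABLE ucsc_loc  (locuslink_id INTEGER, chrom TEXT, tx_start INTEGER);
CREATE TABLE cgap_expr (locuslink_id INTEGER, tissue TEXT);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database filled with made-up placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO ll_gene VALUES (?, ?)",
                     [(1, "GENE_A"), (2, "GENE_B")])
    conn.executemany("INSERT INTO ucsc_loc VALUES (?, ?, ?)",
                     [(1, "chr1", 1000), (2, "chr7", 5000)])
    conn.executemany("INSERT INTO cgap_expr VALUES (?, ?)",
                     [(1, "brain"), (2, "liver"), (2, "kidney")])
    return conn


def genes_with_location_and_tissue(conn: sqlite3.Connection):
    """One simple query: every join condition uses the same central key."""
    sql = """
    SELECT g.symbol, u.chrom, c.tissue
    FROM ll_gene g
    JOIN ucsc_loc  u ON u.locuslink_id = g.locuslink_id
    JOIN cgap_expr c ON c.locuslink_id = g.locuslink_id
    ORDER BY g.symbol, c.tissue
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in genes_with_location_and_tissue(build_demo_db()):
        print(row)
```

Run as a script, this prints one row per gene and tissue pair, for example `('GENE_B', 'chr7', 'kidney')`.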
Figure 1. GeneKeyDB sub-modules, external database identifiers, and connecting tables. The connecting tables, shown next to the connector lines, may convert between the central key and another unique key used throughout the sub-module.
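A connecting table of the kind described in this caption can be sketched the same way: the central key is first translated into a sub-module's own key, and that key is then used inside the sub-module. The example below assumes a hypothetical `ll2unigene` table mapping LocusLink IDs to UniGene cluster IDs and a `unigene_go` table keyed by UniGene ID, loosely mirroring the CGAP "UniGene to KEGG/GO/BioCarta" mapping. All table names and identifiers (`cluster_A`, `go_term_1`, and so on) are placeholders, not real data.

```python
import sqlite3


def build_connecting_demo() -> sqlite3.Connection:
    """In-memory toy database with a connecting table (names are hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    -- connecting table: central key -> sub-module key
    CREATE TABLE ll2unigene (locuslink_id INTEGER, unigene_id TEXT);
    -- sub-module table keyed by its own (UniGene) identifier
    CREATE TABLE unigene_go (unigene_id TEXT, go_id TEXT);
    """)
    # Gene 2 maps to two clusters that share an annotation.
    conn.executemany("INSERT INTO ll2unigene VALUES (?, ?)",
                     [(1, "cluster_A"), (2, "cluster_B"), (2, "cluster_C")])
    conn.executemany("INSERT INTO unigene_go VALUES (?, ?)",
                     [("cluster_A", "go_term_1"), ("cluster_A", "go_term_2"),
                      ("cluster_B", "go_term_3"), ("cluster_C", "go_term_3")])
    return conn


def go_terms_for(conn: sqlite3.Connection, locuslink_id: int) -> list:
    """Translate the central key via the connecting table, then look up GO terms.

    DISTINCT collapses duplicates that arise when one central key maps to
    several sub-module keys carrying the same annotation.
    """
    sql = """
    SELECT DISTINCT ug.go_id
    FROM ll2unigene m
    JOIN unigene_go ug ON ug.unigene_id = m.unigene_id
    WHERE m.locuslink_id = ?
    ORDER BY ug.go_id
    """
    return [row[0] for row in conn.execute(sql, (locuslink_id,))]


if __name__ == "__main__":
    conn = build_connecting_demo()
    print(go_terms_for(conn, 1))
    print(go_terms_for(conn, 2))
```

The design choice being illustrated is that client queries only ever start from the central key; the connecting table hides whichever identifier a given sub-module uses internally.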
Summary of the attributes provided by each source
| Sub-module | Source | Attributes |
|---|---|---|
| CGAP | CGAP | Expression data; LocusLink ID to UniGene and GenBank accession; UniGene to KEGG/GO/BioCarta |
| UCSC | UCSC RefGene | LocusLink ID to chromosome coordinates and exon structure |
| LocusLink | LocusLink (Entrez Gene) | LocusLink to GenBank accession numbers, RefSeq accession numbers, gene descriptions (symbol, name, etc.), GRIF, PubMed, OMIM, CDD, map location |
| MGI | MGI comparative map | Homology data |
| HomoloGene | HomoloGene | Homology data |
| Ensembl | Ensembl | LocusLink ID to Ensembl Gene and Transcript Stable IDs, contig data |

Figure 2. Workflow schema of GeneKeyDB creation and export to other RDBMSs. *LocusLink is parsed first because the other sub-modules depend on it for the database's central key. PROD refers to the current production-stage database.
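The ordering constraint stated in this caption (LocusLink first, because the other sub-modules depend on its central key) amounts to a dependency-ordered load. Below is a minimal sketch using Python's `graphlib.TopologicalSorter` (Python 3.9+). The dependency map `DEPENDS_ON` is an assumption built from the caption and the sub-module list in the attribute summary table; the sketch covers only the parse order, not parsing itself, staging, promotion to PROD, or export to other RDBMSs.

```python
from graphlib import TopologicalSorter

# Assumed dependency map: each sub-module needs the LocusLink-derived
# central key before it can be loaded (inferred from the caption).
DEPENDS_ON = {
    "LocusLink": set(),
    "CGAP": {"LocusLink"},
    "UCSC": {"LocusLink"},
    "MGI": {"LocusLink"},
    "HomoloGene": {"LocusLink"},
    "Ensembl": {"LocusLink"},
}


def load_order(deps: dict) -> list:
    """Return a parse order in which every module follows its dependencies.

    TopologicalSorter raises graphlib.CycleError if the map contains a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for module in load_order(DEPENDS_ON):
        print("parsing", module)
```

Expressing the order as a dependency graph rather than a hard-coded list means that a new sub-module only has to declare what it depends on; the relative order of independent sub-modules is left to the sorter.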