Pascale Gaudet, Amos Bairoch, Dawn Field, Susanna-Assunta Sansone, Chris Taylor, Teresa K Attwood, Alex Bateman, Judith A Blake, Carol J Bult, J Michael Cherry, Rex L Chisholm, Guy Cochrane, Charles E Cook, Janan T Eppig, Michael Y Galperin, Robert Gentleman, Carole A Goble, Takashi Gojobori, John M Hancock, Douglas G Howe, Tadashi Imanishi, Janet Kelso, David Landsman, Suzanna E Lewis, Ilene Karsch Mizrachi, Sandra Orchard, B F Francis Ouellette, Shoba Ranganathan, Lorna Richardson, Philippe Rocca-Serra, Paul N Schofield, Damian Smedley, Christopher Southan, Tin W Tan, Tatiana Tatusova, Patricia L Whetzel, Owen White, Chisato Yamasaki.
Abstract
The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources, and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.
Year: 2011 PMID: 21205783 PMCID: PMC3017395 DOI: 10.1093/database/baq027
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Proposed core descriptors for inclusion in the BioDBCore specification
| Proposed core descriptors for a biological database |
|---|
| 1. Database name |
| 2. Main resource URL |
| 3. Contact information (E-mail; postal mail) |
| 4. Date resource established (year) |
| 5. Conditions of use (Free, or type of license) |
| 6. Scope: data types captured, curation policy, standards used |
| 7. Standards: MIs, Data formats, Terminologies |
| 8. Taxonomic coverage |
| 9. Data accessibility/output options |
| 10. Data release frequency |
| 11. Versioning policy and access to historical files |
| 12. Documentation available |
| 13. User support options |
| 14. Data submission policy |
| 15. Relevant publications |
| 16. Resource’s Wikipedia URL |
| 17. Tools available |
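
As a rough illustration, the descriptors listed above could be captured as a structured record. The sketch below is not part of the BioDBCore specification: the class name `BioDBCoreRecord`, the field names, the field types, and the `missing_descriptors` helper are all assumptions made for this example. The sample values are taken from the IntAct example given in this article.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class BioDBCoreRecord:
    """Hypothetical container for the 17 proposed core descriptors."""
    database_name: str                                            # 1
    main_url: Optional[str] = None                                # 2
    contact: Optional[str] = None                                 # 3
    year_established: Optional[int] = None                        # 4
    conditions_of_use: Optional[str] = None                       # 5
    scope: Optional[str] = None                                   # 6
    standards: List[str] = field(default_factory=list)            # 7
    taxonomic_coverage: List[str] = field(default_factory=list)   # 8
    data_accessibility: List[str] = field(default_factory=list)   # 9
    release_frequency: Optional[str] = None                       # 10
    versioning_policy: Optional[str] = None                       # 11
    documentation: Optional[str] = None                           # 12
    user_support: List[str] = field(default_factory=list)         # 13
    submission_policy: Optional[str] = None                       # 14
    publications: List[str] = field(default_factory=list)         # 15
    wikipedia_url: Optional[str] = None                           # 16
    tools: List[str] = field(default_factory=list)                # 17

    def missing_descriptors(self) -> List[str]:
        """Return the names of descriptors that are still empty."""
        return [
            f.name
            for f in fields(self)
            if getattr(self, f.name) in (None, "", [])
        ]


# Partial record; descriptors not filled in here are simply left empty.
intact = BioDBCoreRecord(
    database_name="IntAct",
    year_established=2003,
    conditions_of_use="Free",
    scope="Molecular interactions; manual curation",
    standards=["MIMIX", "IMEx", "Gene Ontology", "PSI-MI CV", "PSI-MOD CV"],
    data_accessibility=["PSI-MI XML2.5", "MITAB2.5"],
    release_frequency="Weekly",
    publications=["PMID:19850723"],
)

print(intact.missing_descriptors())
```

A helper of this kind would let a registry report which descriptors a submission still lacks, without prescribing which of them are mandatory.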
The BioDBCore will be used to collect information about databases for use in online browsing, searching and classification. The current specification can be found as an online survey and users are encouraged to join the project and leave feedback (http://biocurator.org/biodbcore.shtml; Figure 1). Examples can be found in Table 2 and at the BioDBCore web site.
Figure 1. A screenshot of the BioDBCore discussion page on the ISB web site (http://biocurator.org/biodbcore.shtml).
| 1. Database name | dictyBase | EMAGE | Gene Ontology Database | IntAct | SGD, Saccharomyces Genome Database | MGI, Mouse Genome Informatics |
|---|---|---|---|---|---|---|
| 2. Main resource URL | ||||||
| 3. Contact information | ||||||
| 4. Date resource established (year) | 2003 | 2002 | 1998 | 2003 | 1992 | 1989 |
| 5. Conditions of use | Free | Creative commons | Free | Free | Free | Free |
| 6. Scope: | ||||||
| Data types captured | Genome sequence; gene models including CDS and predicted proteins; phenotypes, Gene Ontology annotations, functional annotation (gene product names), gene nomenclature; strains; plasmids; free text descriptions, domains (via InterPro), orthologs (via OrthoMCL and inParanoid), protein subcellular location (via Swiss-Prot); protein existence (via Swiss-Prot), citations, researchers database | Spatially integrated | Gene Ontology (Biological Process, Molecular Function, Cellular Component), GO annotations for proteins, functional RNAs and stable complexes. | Molecular interactions | Genome sequence; gene models including CDS and predicted proteins and non-coding RNAs; chromosomal features including telomeres, centromeres and ARS elements; mutant phenotypes; Gene Ontology annotations; gene product names; gene nomenclature; strains; plasmids; free text descriptions and literature summaries; protein domains (from InterPro); orthologs; literature citations; database of yeast researchers; functional genomics (gene expression, synthetic genetic arrays); biochemical pathways; genetic and physical interactions (from BioGRID); images of protein subcellular location (via YeastGFP); links to other tools and databases including post-translational modification databases | Genes, pseudogenes, and gene models including CDS and predicted proteins and non-coding RNAs; cytogenetic markers; genomic and genetic maps; nucleotide and protein sequence associations; spontaneous, induced, and genetically engineered alleles; transgenes; QTL; mutant and conditional phenotypes; mouse models of human disease annotations; Gene Ontology annotations; mouse anatomy, mouse phenotype ontology, gene product names; gene nomenclature; strains; SNPs; protein domains (from InterPro); mammalian orthologs; literature citations; experimental molecular reagents; functional genomics (gene expression); biochemical pathways; images of phenotypic mutants and gene expression; links to other tools and other database resources |
| Curation policy | Manual curation | Manual curation | Manual curation | Manual curation | Manual curation | Manual curation |
| Standards: MIs, Data formats, Terminologies | Gene Ontology, Dicty Anatomy Ontology, Dicty Gene Nomenclature | EMAP Mouse Anatomy Ontology, MISFISHIE, MGI (MGNC) Gene/Protein ID, MGI Mouse Strain Information, MGI Mouse Allele ID, INSDC versioned sequence ID, EMBL/PIR versioned ID, MGI probe ID. | Development of the Gene Ontology standard. | MIMIX, IMEx, Gene Ontology, MOD gene nomenclature, PSI-MI CV, PSI-MOD CV | Gene Ontology, Saccharomyces Gene Nomenclature, GenBank feature table, Sequence Ontology, ChEBI, Yeast Phenotype Ontology (YPO) | Mouse gene nomenclature, Gene Ontology, Mammalian Phenotype Ontology, Mouse Adult Anatomy |
| 7. Data formats | FASTA, OBO, GAF, GFF3 (standard) | 2D Images: jpg, gif, tiff, png, etc. (standard); 3D images: OPT (standard); Data Domains: wlz; Probe sequence: FASTA, versioned INSDC ID (standard) | OBO v1.2, Gene Association Format (GAFs obtained via Model organism databases, UniProt-KB and other collaborators), MySQL and SQL database dumps, RDF-XML, OBO-XML, OWL | PSI-MI XML2.5, MITAB2.5 (standard) | FASTA, GenBank, GAF, GFF3 (standard) | HTML, tab-delimited, GFF3, images, GAF files, FASTA, XML/webservices |
| 8. Taxonomic coverage (use NCBI Taxid) | D. discoideum (44689) including all strains [PRIMARY], also some genome/EST/gene model info for D. purpureum (5786), and gene model sequences for P. pallidum (13642) and D. fasciculatum (261658) | Mus musculus (10090) | All | All | Laboratory mouse (10090) | |
| 9. Data accessibility/output options | HTML, text, database reports | HTML, xml, csv, webservices, SQL, Java API, DAS | HTML, text, XML, database reports, database dumps, web services | PSI-MI XML2.5, MITAB2.5 | HTML, text, TAB, ASN.1, FTP, Intermine | HTML, tab-delimited, GFF3, images, GAF files, FASTA, XML/webservices, FTP, BioMart |
| 10. Data release frequency | Curators work on the ‘live’ database, data dumps are done weekly (sequences) or monthly (other data) | As and when available, in principle daily | Daily | Weekly | Daily | Daily |
| 11. Versioning policy/ access to historical files | No versioning but access to historical files is possible | Versioning by date. Access to monthly releases of the full GO database going back to 2002. | Versioning by date, access to historic files available | Versioning frequency specified by datatype, database updated in real time | ||
| 12. Documentation available | Documentation, FAQs, etc. found here | |||||
| 13. User support options | Documents, Email, web form | Documentation, FAQs, demo movies, glossary, email, live demo at meeting exhibits, ad hoc workshops. | Written documentation on web pages, FAQs, email helpdesk, webform, training camps. | Documents, email, webform, training | Dedicated user support staff available via email, phone, customized SQL, training, tutorials, FAQs | |
| 14. Data submission policy | Data from published literature. Some HTP data corresponding to published analyses is incorporated | Daily updates to GAF repository from verified submitting groups (approximately 30 at present time). Submissions from other groups accepted after quality assurance agreements. | Data accepted as part of publication process, released on article publication by Journal | Data from published literature. Some HTP data corresponding to published analyses is incorporated | Data from published literature, contributed data sets. | |
| 15. Relevant publications | PMID:18974179, PMID:14681427 | PMID:19767607, PMID:18077470, PMID:16381949 | PMID:10802651, PMID:14681407, PMID:19920128 | PMID:19850723 | PMID:10592186, PMID:11125055, PMID:11752257, PMID:12073322, PMID:14681421, PMID:15153302, PMID:15608219, PMID:16381907, PMID:17001629, PMID:17142221, PMID:17982175, PMID:19906697, PMID:20157474, PMID:9169866, PMID:9297238, PMID:9399804, PMID:9847146, PMID:9885151 | PMID:19864252, PMID:18981050, PMID:18158299, PMID:17135206, PMID:16381933, PMID:15608240 |
| 16. Resource’s Wikipedia URL | ||||||
| 17. Tools available | BLAST, BioMart, Generic Genome Browser, TextPresso, MetaCyc (dictyCyc) | LOSSST (Spatial Query Tool), Gene Query Tool, Anatomy Query Tool, GO Query Tool, ‘Find Similar’ Spatial Query Tool, MAPaint, Spatial Clustering Tool, Webservices, Java API, DAS Query Tool, Formatted URL Query Tool | Ontology Browser (AmiGO), BLAST, GOTerm Finder, GOOSE (SQL query tool), GO Slimmer, Visualization, Web Services, Galaxy | BLAST (variety of fungal genome data sets), GO Query Tools (GO Slim Mapper, GO Term Finder), GBrowse for chromosomal sequence and features, GBrowse for protein sequence features, short sequence pattern matching tool (PATMATCH), oligonucleotide primer design (webprimer), genome restriction enzyme cutting site analysis, Synteny Viewer between | mouseBLAST (mouse, human, rat), Ontology Browsers, VLAD, Batch Query, BioMart, GBrowse, MGI GO_slim |
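
To suggest how uniform descriptors might support the searching and classification mentioned earlier, here is a minimal sketch that filters a handful of records by descriptor value. The dictionary keys and the helper names `search` and `established_before` are hypothetical and do not come from the BioDBCore specification; only the establishment years and conditions of use are taken from the table above.

```python
from typing import Any, Dict, List

# Values copied from rows 4 and 5 of the example table; keys are assumed names.
RECORDS: List[Dict[str, Any]] = [
    {"name": "dictyBase", "established": 2003, "conditions_of_use": "Free"},
    {"name": "EMAGE", "established": 2002, "conditions_of_use": "Creative commons"},
    {"name": "Gene Ontology Database", "established": 1998, "conditions_of_use": "Free"},
    {"name": "IntAct", "established": 2003, "conditions_of_use": "Free"},
    {"name": "SGD", "established": 1992, "conditions_of_use": "Free"},
    {"name": "MGI", "established": 1989, "conditions_of_use": "Free"},
]


def search(records: List[Dict[str, Any]], **criteria: Any) -> List[str]:
    """Return names of records whose descriptors equal every criterion."""
    return [
        r["name"]
        for r in records
        if all(r.get(key) == value for key, value in criteria.items())
    ]


def established_before(records: List[Dict[str, Any]], year: int) -> List[str]:
    """Return names of records established strictly before the given year."""
    return [r["name"] for r in records if r["established"] < year]


print(search(RECORDS, conditions_of_use="Free", established=2003))
# → ['dictyBase', 'IntAct']
print(established_before(RECORDS, 1995))
# → ['SGD', 'MGI']
```

Exact-match filtering only works when descriptor values are recorded consistently, which is one practical argument for agreeing on controlled values for fields such as conditions of use.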