
The 2012 Nucleic Acids Research Database Issue and the online Molecular Biology Database Collection.

Michael Y Galperin, Xosé M Fernández-Suárez.

Abstract

The 19th annual Database Issue of Nucleic Acids Research features descriptions of 92 new online databases covering various areas of molecular biology and 100 papers describing recent updates to the databases previously described in NAR and other journals. The highlights of this issue include, among others, a description of neXtProt, a knowledgebase on human proteins; a detailed explanation of the principles behind the NCBI Taxonomy Database; NCBI and EBI papers on the recently launched BioSample databases that store sample information for a variety of database resources; descriptions of the recent developments in the Gene Ontology and UniProt Gene Ontology Annotation projects; updates on Pfam, SMART and InterPro domain databases; update papers on KEGG and TAIR, two universally acclaimed databases that face an uncertain future; and a separate section with 10 wiki-based databases, introduced in an accompanying editorial. The NAR online Molecular Biology Database Collection, available at http://www.oxfordjournals.org/nar/database/a/, has been updated and now lists 1380 databases. Brief machine-readable descriptions of the databases featured in this issue, according to the BioDBcore standards, will be provided at the http://biosharing.org/biodbcore web site. The full content of the Database Issue is freely available online on the Nucleic Acids Research web site (http://nar.oxfordjournals.org/).

Year:  2011        PMID: 22144685      PMCID: PMC3245068          DOI: 10.1093/nar/gkr1196

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


COMMENTARY

This current, 19th annual Database Issue of Nucleic Acids Research (NAR) features descriptions of 92 new online databases covering a variety of molecular biology data, 77 update papers on databases that have been previously described in the NAR Database Issue and 23 papers with updates on database resources whose descriptions have previously been published in other journals (Table 1). The accompanying NAR online Molecular Biology Database Collection (http://www.oxfordjournals.org/nar/database/a/) has been revised, which resulted in updating the URLs of more than 30 databases and exclusion of more than 20 obsolete web sites. This list now includes 1380 databases sorted into 14 categories and 41 subcategories.
Table 1. New databases featured in the 2012 NAR Database issue

Database name | URL | Brief description
ApoHoloDB | http://ahdb.ee.ncku.edu.tw/ | Apo- and Holo- structure pairs of proteins
AutismKB | http://autism.cbi.pku.edu.cn | Autism genetics knowledgebase
BGMUT | http://www.ncbi.nlm.nih.gov/projects/gv/mhc/xslcgi.cgi?cmd=bgmut | Blood Group antigen gene Mutation database
BitterDB | http://bitterdb.agri.huji.ac.il/bitterdb/dbbitter.php | Bitter taste: molecules and receptors
canSAR | http://cansar.icr.ac.uk | Integrated cancer research and drug discovery resource
CAPS-DB | http://www.bioinsilico.org/CAPSDB | Classification of helix cappings in protein structures
ccPDB | http://crdd.osdd.net/raghava/ccpdb/ | Compilation and creation of datasets from Protein Data Bank
CharProtDB | http://www.jcvi.org/charprotdb/ | Experimentally Characterized Protein annotations
COLT-Cancer | http://colt.ccbr.utoronto.ca/cancer | Essential gene profiles in human cancer cell lines
Crystallography Open Database | http://www.crystallography.net/ | Crystal structures of small molecules
Cube-DB | http://epsf.bmad.bii.a-star.edu.sg/cube/db/html/home.html | Functional divergence in human protein families
DARC | http://darcsite.genzentrum.lmu.de/darc/ | Database for Aligned Ribosomal Complexes
DBETH | http://www.hpppi.iicb.res.in/btox | Database for Bacterial ExoToxins for Humans
Death Domain database | http://www.deathdomain.org | Protein interaction data for Death Domain superfamily
DIGIT | http://www.biocomputing.it/digit4/ | Database of ImmunoGlobulin sequences and Integrated Tools
Disease Ontology | http://diseaseontology.sf.net/ | Ontology for a variety of human diseases
DiseaseMeth | http://202.97.205.78/diseasemeth | Human disease methylation database
DistiLD | http://distild.jensenlab.org/ | Diseases and Traits In Linkage Disequilibrium blocks
DNAtraffic | http://dnatraffic.ibb.waw.pl/ | DNA dynamics during the cell cycle
DOMMINO | http://dommino.org | Database of MacroMolecular INteractions
doRiNA | http://dorina.mdc-berlin.de | Database of RNA interactions in post-transcriptional regulation
DR.VIS | http://www.scbit.org/dbmi/drvis | Human Disease-Related Viral Integration Sites
EBI BioSample Database | http://www.ebi.ac.uk/biosamples/ | Biological samples used as sources of sequence, structure or expression data
EcoliWiki | http://ecoliwiki.net | Community-based pages about non-pathogenic E. coli
eQuilibrator | http://equilibrator.weizmann.ac.il | Thermodynamics calculator for biochemical reactions
FungiDB | http://fungidb.org | Functional genomics of fungi
FunTree | http://www.ebi.ac.uk/thornton-srv/databases/FunTree/ | Evolution of novel enzyme functions in enzyme superfamilies
GeneWeaver | http://www.GeneWeaver.org | Functional genomics analysis system
GONUTS | http://gowiki.tamu.edu | Gene Ontology Normal Usage Tracking System
GWASdb | http://jjwanglab.org/gwasdb | Human genetic variants identified by genome wide association studies
HaploReg | http://compbio.mit.edu/HaploReg | SNP-centric access to chromatin state information
HFV database | http://hfv.lanl.gov/ | Hemorrhagic fever virus sequence database
hiPathDB | http://hipathdb.kobic.re.kr/ | Human Integrated Pathway Database
Histome | http://www.histome.net/ | Human histone database
HotRegion | http://prism.ccbb.ku.edu.tr/hotregion | Database of interaction Hotspots
Human OligoGenome Resource | http://oligogenome.stanford.edu/ | Oligonucleotides for targeted resequencing of the human genome
ICEberg | http://db-mml.sjtu.edu.cn/ICEberg/ | Integrative and Conjugative Elements in Bacteria
IDEAL | http://www.ideal.force.cs.is.nagoya-u.ac.jp/IDEAL/ | Intrinsically Disordered proteins with Extensive Annotations and Literature
IGDB.NSCLC | http://igdb.nsclc.ibms.sinica.edu.tw | Integrated Genomic Database of Non-Small Cell Lung Cancer
IndelFR | http://indel.bioinfo.sdu.edu.cn | Indel Flanking Region database
InterEvol | http://biodev.cea.fr/interevol | Evolution of protein–protein Interfaces
LegumeIP | http://plantgrn.noble.org/LegumeIP/ | Model Legumes Integrative database Platform
MetaBase | http://metadatabase.org | Wiki database of biological databases
MethylomeDB | http://epigenomics.columbia.edu/methylomedb/ | DNA methylation profiles in human and mouse brain
MINAS | http://www.minas.uzh.ch | Metal Ions in Nucleic AcidS
MIPModDB | http://bioinfo.iitk.ac.in/MIPModDB | Major Intrinsic Protein superfamily Models
miREX | http://bioinfo.amu.edu.pl/mirex | Plant microRNA Expression data
miRNEST | http://mirnest.amu.edu.pl | microRNAs in animal and plant EST sequences
MMMDB | http://mmdb.iab.keio.ac.jp/ | Mouse Multiple Tissue Metabolomics Database
modMine | http://intermine.modencode.org | Mining of modENCODE data
MOPED | http://moped.proteinspire.org | Model Organism Protein Expression Database
NCBI BioSample | http://www.ncbi.nlm.nih.gov/biosample | Biological samples used as sources of sequence, structure or expression data
NCBI BioProject | http://www.ncbi.nlm.nih.gov/bioproject | Linked data related to a single research project
Nematodes.org | http://www.nematodes.org/nematodegenomes/ | Wiki for coordinating nematode sequencing projects
Newt-omics | http://newt-omics.mpi-bn.mpg.de | Data on red spotted newt Notophthalmus viridescens
neXtProt | http://www.nextprot.org/ | A knowledgebase for human proteins
NRG-CING | http://nmr.cmbi.ru.nl/NRG-CING | Validated NMR structures of proteins and nucleic acids
OGEE | http://ogeedb.embl.de | Online GEne Essentiality database
PDBj | http://pdbj.org/ | Protein Data Bank Japan
PhenoM | http://phenom.ccbr.utoronto.ca | Morphological database of essential yeast genes
Phytozome | http://www.phytozome.net/ | JGI's platform for green plant genomics
PlantNATsDB | http://bis.zju.edu.cn/pnatdb/ | Plant natural antisense transcripts
Polbase | http://polbase.neb.com | Biochemical, genetic, and structural information about DNA polymerases
PomBase | http://www.pombase.org/ | Genome database on S. pombe
PoSSuM | http://possum.cbrc.jp/PoSSuM/ | Ligand-binding POcket Similarity Search Using Multiple-Sketches
Predictive Networks | http://predictivenetworks.org | Integration, navigation, visualization, and analysis of gene interaction networks
ProGlycProt | http://www.proglycprot.org | Experimentally characterized Prokaryotic GlycoProteins
ProOpDB | http://operons.ibt.unam.mx/OperonPredictor/ | Prokaryotic Operon DataBase
ProPortal | http://proportal.mit.edu/ | Prochlorococcus marinus and its phages
ProRepeat | http://prorepeat.bioinformatics.nl/ | Amino acid tandem Repeats in Proteins
ProtChemSI | http://pcidb.russelllab.org/ | Protein-Chemical Structural Interactions
PSCDB | http://idp1.force.cs.is.nagoya-u.ac.jp/pscdb/ | Protein Structural Change upon ligand binding
RecountDB | http://recountdb.cbrc.jp | Recalculated transcript amounts database
Rhea | http://www.ebi.ac.uk/rhea/ | EBI's biochemical reaction database
RNA CoSSMos | http://cossmos.slu.edu | RNA Characterization of Secondary Structure Motifs
ScerTF | http://ural.wustl.edu/TFDB/ | Binding sites for Saccharomyces cerevisiae Transcription Factors
SCRIPDB | http://dcv.uhnres.utoronto.ca/SCRIPDB/search | Search for Chemicals and Reactions In Patents
SEQanswers | http://seqanswers.com/wiki/SEQanswers | Wiki on all aspects of next-generation genomics
SitEx | http://www-bionet.sscc.ru/sitex/ | Projections of protein functional Sites on Exons
SNPedia | http://www.SNPedia.com | Wiki on SNPs and genome annotation
SpliceDisease | http://cmbi.bjmu.edu.cn/Sdisease | Links between RNA splicing and disease
STAP refinement of NMRdb | http://psb.kobic.re.kr/STAP/refinement | Refined solution NMR structures
Stem Cell Discovery Engine | http://discovery.hsci.harvard.edu/ | Comparison system for cancer stem cell analysis
TopFIND | http://clipserve.clip.ubc.ca/topfind | Protein N- and C-termini and protease processing
UMD-BRCA1/BRCA2 databases | http://www.umd.be/BRCA1/ | BRCA1 and BRCA2 mutations detected in France
UniPathway | http://www.grenoble.prabi.fr/obiwarehouse/unipathway | Metabolic pathway information in UniProt knowledge base
VIRsiRNAdb | http://crdd.osdd.net/servers/virsirnadb | Experimentally validated Viral siRNA/shRNA
YeTFaSCo | http://yetfasco.ccbr.utoronto.ca/ | Yeast Transcription Factor binding Site sequence Collection
YMDB | http://www.ymdb.ca | Yeast Metabolome Database
zfishbook | http://zfishbook.org/ | Transposon-labeled mutants in zebrafish

NEW AND UPDATED DATABASES

This issue contains an unusually high number of papers from the authors’ host institutions, NCBI and EMBL-EBI, respectively. In addition to the annual papers from the International Nucleotide Sequence Database collaboration [INSDC (1), which includes the DNA Data Bank of Japan, GenBank and the European Nucleotide Archive (2–4)], Ensembl (5), UniProtKB (6) and the Protein Data Bank in Europe (7), these include two papers that describe the BioSample database project, recently launched at both institutions. The BioSample databases [http://www.ncbi.nlm.nih.gov/biosample and http://www.ebi.ac.uk/biosamples/, (8) and (9), respectively] aim at capturing essential information about each biological sample used to obtain sequence, gene expression or protein expression data, as well as the relationship between different samples and their sources. The sample information includes the name of the source organism (or an environmental isolate) and the source material within that species, such as the organ, tissue and cell type. It will also contain information about the isolation source of the sample: some or all of locality, host, collection date, etc. For human sources, BioSample information will include any available, and ethically appropriate, additional data, such as the disease state and clinical information [clinical samples that may raise privacy concerns will continue to be kept at the NCBI's dbGaP database (10) and the EBI's European Genome-phenome Archive (http://www.ebi.ac.uk/ega/), with sanitized versions available in the BioSample databases].
While providing sample information will place an additional burden on the submitters, the availability of BioSample data should dramatically improve the experience of a typical user. By consistently recording sample information for various kinds of data stored in the NCBI and EBI databases, the BioSample databases will allow smooth cross-database searching of all available information pertaining to a particular sample source, such as cell type, disease, or a tissue biopsy. Furthermore, since NCBI and EBI agreed to assign shared sample accession numbers, these numbers can now be used to query the web sites of both institutions (8,9). The NCBI paper (8) also presents the BioProject database (http://www.ncbi.nlm.nih.gov/bioproject), another INSDC initiative, which aims to provide a higher-order organization of large-scale data submitted by a single organization or a consortium, funded from a single source, or relating to the same whole-genome assembly. Again, the availability of such metadata should simplify the task of retrieving related data sets from different kinds of databases held at NCBI, EBI and DDBJ.
Five papers in this issue describe database resources of the US Department of Energy's Joint Genome Institute (JGI, http://www.jgi.doe.gov). These include a description of the JGI Genome Portal (11) with its fungal (MycoCosm), plant (Phytozome), prokaryotic (IMG) and metagenomic (IMG/M) resources, and the Genomes OnLine Database (GOLD, http://www.genomesonline.org), which lists the ongoing genomic and metagenomic projects (12).
One of the major highlights of this issue is the first description of neXtProt, a knowledgebase on human proteins that has been created at the Swiss Institute of Bioinformatics (SIB) on the basis of the human protein set in the UniProtKB/Swiss-Prot and then expanded by including quality-assessed protein expression, localization, variation and proteomics data (13). Other highlights include CharProtDB, a database of experimentally characterized proteins that is used for genome annotation at the J. Craig Venter Institute (14); a detailed explanation of the basic principles behind the NCBI Taxonomy Database and the ways it ties together various DNA and protein sequence and gene expression data for all organisms and taxonomic groups represented in GenBank (15); the descriptions of the recent developments in the Gene Ontology and UniProt Gene Ontology Annotation projects (16,17); and updates on model organism databases SGD, MGD, FlyBase and WormBase (18–21) and on Pfam, SMART and InterPro domain databases (22–24).
With all the diversity of the databases featured in this issue, the major trend appears to be an increased focus on small molecules (ChEMBL, PubChem, BitterDB, SCRIPDB, Crystallography Open Database) and related topics, such as properties of enzyme-catalyzed reactions (Rhea, MACiE, eQuilibrator, SABIO-RK), protein–ligand binding (Pocketome, PoSSuM, ProtChemSI, STITCH), and the analysis of potential drugs and drug targets for human disease (canSAR, DAMPD, DBETH, SuperTarget, TDR Targets, Therapeutic Target Database). As in previous years, there is a strong representation of structure databases, including descriptions of the European and Japanese Protein Data Banks (PDBe, PDBj), two databases of refined NMR structures (NRG-CING and STAP Refinement of NMR database), and several other databases on protein structure and protein–protein interactions.
An unusually high number of databases, including ChEMBL, FunCoup, MitoMiner, PhosphoSitePlus, Pocketome, SABIO-RK and TDR Targets, are featured in this NAR Database Issue for the first time after having their descriptions published elsewhere (Table 2). All these databases have been available online for several years and have been accepted and valued by the community. Accordingly, they presented few, if any, problems with the database design, although some appeared somewhat less user-friendly than is required for the NAR Database Issue. We consider publication of these papers in the NAR Database Issue a continuation of our efforts to bring the readers the best publicly available molecular biology databases, as well as a reflection of the unique status of this publication that introduces the databases to a very wide audience.
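The BioSample idea described earlier in this section, one sample description shared by data sets held in different archives, can be illustrated with a small sketch. Everything in it is hypothetical: the field names, the accession format and the archive names are invented for illustration and do not reflect the actual NCBI or EBI schemas.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SampleRecord:
    """Hypothetical sample description; field names are illustrative only."""
    accession: str                      # shared sample accession (made-up format)
    organism: str
    tissue: Optional[str] = None
    cell_type: Optional[str] = None
    collection_date: Optional[str] = None
    disease_state: Optional[str] = None


@dataclass(frozen=True)
class DataEntry:
    """A data set in some archive that points back to its sample."""
    database: str                       # invented archive name
    entry_id: str                       # invented identifier
    sample_accession: str


def group_by_sample(entries):
    """Index data entries from any archive by the sample they came from."""
    index = defaultdict(list)
    for entry in entries:
        index[entry.sample_accession].append(entry)
    return dict(index)


sample = SampleRecord("SAMPLE0001", "Homo sapiens", tissue="liver")
entries = [
    DataEntry("sequence archive", "SEQ-1", "SAMPLE0001"),
    DataEntry("expression archive", "EXP-7", "SAMPLE0001"),
    DataEntry("sequence archive", "SEQ-2", "SAMPLE0002"),
]
index = group_by_sample(entries)
for entry in index[sample.accession]:
    print(entry.database, entry.entry_id)
```

The point of the sketch is only that a single shared accession is enough to collect related entries across otherwise unrelated archives.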
Table 2. Database updates new for the NAR Database issue

Database name | URL | Brief description
BYKdb | http://bykdb.ibcp.fr/ | Bacterial protein tYrosine Kinase database
BμG@Sbase | http://bugs.sgul.ac.uk/E-BUGS-PUB | Microarray datasets for microbial gene expression
ChEMBL | https://www.ebi.ac.uk/chembldb | EMBL's database of bioactive drug-like small molecules
ConoServer | http://www.conoserver.org/ | Sequence and structures of peptides expressed by marine cone snails
CoryneRegNet | http://coryneregnet.cebitec.uni-bielefeld.de/ | Corynebacterial Regulatory Network
ExoCarta | http://exocarta.ludwig.edu.au | Database on exosomes, membrane vesicles of endocytic origin released by diverse cell types
FunCoup | http://funcoup.sbc.su.se/ | Networks of Functional Coupling of proteins
HmtDB | http://www.hmtdb.uniba.it/ | Human mitochondrial genome variability
MimoDB | http://immunet.cn/mimodb | Mimotope database, active site-mimicking peptides from phage-display libraries
MIRIAM Registry | http://www.ebi.ac.uk/miriam/ | Minimal Information Required In the Annotation of Models
MitoMiner | http://mitominer.mrc-mbu.cam.ac.uk/ | Mitochondrial proteomics data
MitoZoa | http://www.caspur.it/mitozoa | Mitochondrial genomes in Metazoa
NAPP | http://rna.igmors.u-psud.fr/NAPP | Nucleic Acid Phylogenetic Profile database
OPMdb | http://opm.phar.umich.edu | Orientations of Proteins in Membranes database
PhosphoSitePlus | http://www.phosphosite.org/ | Protein phosphorylation sites and other post-translational modifications
PINA | http://cbg.garvan.unsw.edu.au/pina/ | Protein Interaction Network Analysis
Plant Metabolomics | http://plantmetabolomics.vrac.iastate.edu/ | Arabidopsis metabolomics database
PLEXdb | http://www.plexdb.org | Gene Expression Resources for Plants and Plant Pathogens
Pocketome | http://www.pocketome.org | Small-molecule binding pockets in the structural proteome
SABIO-RK | http://sabiork.h-its.org/ | System for the Analysis of Biochemical Pathways Reaction Kinetics
SubtiWiki | http://subtiwiki.uni-goettingen.de/ | Collaborative resource for the Bacillus community
TDR Targets | http://tdrtargets.org/ | Targets against neglected tropical diseases
WikiPathways | http://www.wikipathways.org | Community curation of biological pathways
In response to the growing popularity of Wikipedia (http://www.wikipedia.org) and wiki-based approaches to constructing and curating biological databases, this issue includes a special section with 10 papers describing various wiki-based databases. These papers are introduced in an accompanying editorial by Rob Finn, Paul Gardner and Alex Bateman (25), whose very popular Pfam (22) and Rfam (26) databases successfully incorporate wiki elements. It could be argued that the Pfam update paper (22) should have been placed in that section as well.

SUSTAINABILITY OF BIOINFORMATICS DATABASES

A joint paper in this issue from the three INSDC members (27) discusses the progress of the Sequence Read Archive (SRA, previously known as the Short Read Archive) without, however, mentioning the controversy that surrounded the SRA in the past year. Established in 2007 as a public repository of raw sequence data from next-generation sequencing platforms, the SRA stores sequence data generated for RNA-Seq, ChIP-Seq and genotyping studies, as well as from several large-scale projects, such as the Human Microbiome project (https://commonfund.nih.gov/hmp) and the 1000 Genomes project (http://www.1000genomes.org) (27). In June 2011, its volume surpassed 100 Terabases (10^14 bases) of DNA. In February 2011, NCBI announced that, due to budget constraints, it would discontinue the SRA within the next 12 months (http://www.ncbi.nlm.nih.gov/About/news/16feb2011). This announcement caused a widespread response (28). One news source even claimed that NCBI ‘announced that it would slowly phase out its DNA archive due to federal budget cuts’. There has also been an extensive online discussion on the http://seqanswers.com wiki web site (which is described in a separate paper in this issue). However, the news of the SRA demise proved largely premature. Within days, EBI and DDBJ announced that they would continue supporting the SRA (http://www.ebi.ac.uk/ena/SRA_announcement_Feb_2011.pdf, http://www.ddbj.nig.ac.jp/whatsnew/2011/DRA20110222.html), and the NIH provided support to enable the continuation of the SRA (http://www.ncbi.nlm.nih.gov/About/news/13Oct2011.html). Still, given that the SRA keeps growing at a rapid pace and handling the data becomes increasingly complicated, the INSDC paper carefully states that ‘SRA partners actively discuss and pursue approaches together with user communities to maximize the benefit gained from archiving next-generation sequencing data while minimizing the infrastructure costs’ (27).
Despite its successful resolution, the SRA story highlights an important problem: whether public database providers should try to keep all sequence-related data or make certain choices about the kind of resources that they would like to maintain. The same news release in February 2011 announced the closure of Peptidome, the NCBI resource for tandem mass spectrometry peptide and protein identification data (29). The closure of Peptidome attracted far less attention than that of the SRA, probably because of the continued operation of EBI's PRIDE (30), Seattle Proteome Center's PeptideAtlas (31), the recently created MOPED (32) and other proteomics resources. Still, it is definitely a sign of things to come, as is the recently announced closure of the International Protein Index, which is to be replaced by the complete proteome sets in UniProtKB (33). Most importantly, the worldwide attention to the SRA story illuminates the deep concern that exists in the community with regard to the stability (viability) of the online databases that have become key resources enabling all kinds of biomedical research. Previously, we have seen a natural selection of databases that led to a relatively orderly succession: as some databases have grown obsolete, they were replaced by similar but more robust databases maintained elsewhere. For example, after termination of IRESdb, a database of the internal ribosome entry sites (34), the same data were still available through the IRESite database (35). Among the databases featured in this issue, MitoZoa provides the same coverage of metazoan mitochondrial genomes as the now-defunct AMmtDB, Gene3D fully replaces the no-longer-maintained 3D-Genomics, and Ensembl (5) provides the alternative splicing data that have previously been available through ASHESdb, EBI's ASD/ATD/ATSD and several other recently discontinued databases.
Unfortunately, owing to the difficult economic times, budget constraints are now leading to the termination (or commercialization) of truly unique resources, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.genome.jp/kegg) and The Arabidopsis Information Resource (TAIR, http://arabidopsis.org), both featured in this issue (36,37). The KEGG database, maintained by Minoru Kanehisa and his colleagues at the Bioinformatics Center of the Kyoto University Institute for Chemical Research, has been a permanent feature of the NAR Database Issue since 1997 and is now in its 60th release (36; see http://www.genome.jp/en/release.html). However, after Kanehisa, who was one of the founders of GenBank and has been at the forefront of bioinformatics research ever since, reached the mandatory retirement age, the future of KEGG suddenly became uncertain (see http://www.genome.jp/kegg/docs/plea.html). Right now, KEGG continues to be publicly available but its funding mechanisms support a narrow focus on translational research (36), which is certainly important but is only a minor part of the enormous contribution of this database to the progress of genomics and bioinformatics around the world. The case of TAIR is even more troubling. Over the past 12 years, TAIR enjoyed generous support from the US National Science Foundation (NSF, http://www.nsf.gov) that helped it grow into a recognized source of sequence data and curated annotation of the model plant Arabidopsis thaliana. Three previous publications on TAIR in the NAR Database Issue in 2001, 2003 and 2008 were all extremely well cited, confirming the widespread use of this resource. With the completion of the Arabidopsis sequencing project, the focus of TAIR shifted from providing new annotation to improving the existing genome annotation, making it the ultimate source of gene annotation and expression data for A. thaliana.
Unfortunately, this new focus failed to win the NSF support, and the funding for a project that until recently has been heralded as one of the NSF's best success stories will end in August of 2013. This will likely mean termination of TAIR as we know it; the existing plans for corporate sponsorship of TAIR and/or for its shift to an International Arabidopsis Informatics Consortium (see http://www.arabidopsis.org/doc/about/tair_funding/410) are not going to prevent the demise of this useful genomic resource. These recent developments show that the importance of the public database resources, which is obvious to any biologist, needs to be constantly highlighted to the national and international financing bodies. We all remember the financial difficulties encountered in the 1990s by the Swiss-Prot database after it failed to secure sufficient support from the European Union (http://web.expasy.org/docs/crisis96/help-sprot.html) (38). Fortunately, in the end, the Swiss government recognized the value of that unique resource and provided funding to support Swiss-Prot (39). It now supports the UniProtKB/Swiss-Prot activities at the SIB, whereas funding for the UniProtKB activities at the EBI and PIR is provided by the NIH, NSF and the European Commission (6). The stories of Swiss-Prot, KEGG and TAIR also illustrate the need [clearly articulated in a recent paper by Julian Parkhill, Ewan Birney and Paul Kersey, (40)] for a comprehensive infrastructure that would (i) support the key bioinformatics resources, (ii) extend to the model organism databases and (iii) bring the genomic information into every biological lab. In the USA, such infrastructure includes the NCBI, the JGI and associated DOE labs, the NIH-funded Bioinformatics Resource Centers (this issue includes papers on VectorBase and ViPR, as well as on EuPathDB-associated databases, such as GeneDB, FungiDB, and TDR Targets) and comprehensive resources on model organisms, such as FlyBase, WormBase, SGD and MGD (18–21).
In Europe, coordination of the bioinformatics infrastructure is planned through the EU-sponsored ELIXIR (European Life Sciences Infrastructure for Biological Information, http://www.elixir-europe.org) project, which aims at guaranteeing seamless access to biological information by integrating data generators and data centers throughout Europe.

AN ECOSYSTEM OF DATABASES

Although this issue looks like a simple catalog, it is important to note that we are not dealing with isolated resources: many listed databases interact in a variety of ways, forming a network of interconnected (or at least hyperlinked) data resources. Obviously, UniProtKB provides a plethora of links to all kinds of databases, including ENA, GenBank, DDBJ, RefSeq, PDBe, PDBj, IntAct, MINT, Ensembl, KEGG, UCSC Genome Browser, neXtProt, SGD, FlyBase, WormBase, MGD, TAIR, eggNOG, MetaCyc, InterPro, Gene3D, Pfam, SMART and ProtoNet, which are featured in this issue. However, many database interactions are more subtle: for example, BioMart has been recently used to link protein annotation data from the Reactome database of metabolic networks (41) to phosphoproteomics data in PRIDE (30) and somatic mutations in COSMIC (42), which allowed putting cancer-related mutation data into a functional context (43). We believe that establishing connections between databases is an important way of improving the databases themselves, providing the user with additional search tools and, more generally, creating a live ecosystem that stores and expands knowledge. Accordingly, we consider it essential that the databases featured in the NAR Database Issue do their best to create links to outside resources and to provide an easy and straightforward way for the authors of other databases to link to their database content. Last year, we published a paper by the BioDBcore Working Group that proposed creating a resource of ‘minimal information about a biological database’, a community-defined, uniform, generic description of the core attributes of biological databases (44). Accordingly, submitters to this year's NAR Database Issue were asked to fill out a checklist of core attributes (available at http://www.biodbcore.org) of their databases and provide it as supplementary material to their manuscripts.
Most of the authors complied with this request, which resulted in a stand-alone resource that contains machine-readable descriptions of the databases featured in this issue and is available from the BioSharing website (http://biosharing.org/biodbcore). We hope that this effort will illuminate the scope and general features of every listed database resource, including the community standards that these systems support, forge better contacts between their authors, simplify linking various data sets, and, eventually, bring greater clarity and integration to the whole field of molecular biology databases.
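As a rough illustration of what a machine-readable database description might look like, the sketch below serializes a handful of core attributes to JSON and reads them back. The attribute names are assumptions chosen for this example and are not the official BioDBcore checklist; the name, URL and description are taken from Table 1, and the cross-reference value is illustrative.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DatabaseDescription:
    """Illustrative core attributes; NOT the official BioDBcore checklist."""
    name: str
    url: str
    description: str
    cross_references: List[str] = field(default_factory=list)


def to_json(record):
    """Serialize a description to a stable, machine-readable string."""
    return json.dumps(asdict(record), sort_keys=True)


def from_json(text):
    """Rebuild a description from its JSON form."""
    return DatabaseDescription(**json.loads(text))


record = DatabaseDescription(
    name="neXtProt",
    url="http://www.nextprot.org/",
    description="A knowledgebase for human proteins",
    cross_references=["UniProtKB/Swiss-Prot"],  # illustrative value
)
text = to_json(record)
restored = from_json(text)
print(text)
```

A uniform record of this kind is what would let one resource discover and link to another without reading its documentation by hand.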

FUNDING

Intramural Research Program of the US National Institutes of Health at the National Library of Medicine (to M.Y.G.); European Molecular Biology Laboratory (to X.M.F.S.). Funding for open access charge: waived by Oxford University Press. Conflict of interest statement. The authors’ opinions do not necessarily reflect the views of their respective institutions.
  44 in total

1.  Serendipity in bioinformatics, the tribulations of a Swiss bioinformatician through exciting times!

Authors:  A Bairoch
Journal:  Bioinformatics       Date:  2000-01       Impact factor: 6.937

2.  IRESdb: the Internal Ribosome Entry Site database.

Authors:  Sophie Bonnal; Christel Boutonnet; Leonel Prado-Lourenço; Stéphan Vagner
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

3.  The NCBI dbGaP database of genotypes and phenotypes.

Authors:  Matthew D Mailman; Michael Feolo; Yumi Jin; Masato Kimura; Kimberly Tryka; Rinat Bagoutdinov; Luning Hao; Anne Kiang; Justin Paschall; Lon Phan; Natalia Popova; Stephanie Pretel; Lora Ziyabari; Moira Lee; Yu Shao; Zhen Y Wang; Karl Sirotkin; Minghong Ward; Michael Kholodov; Kerry Zbicz; Jeffrey Beck; Michael Kimelman; Sergey Shevelev; Don Preuss; Eugene Yaschenko; Alan Graeff; James Ostell; Stephen T Sherry
Journal:  Nat Genet       Date:  2007-10       Impact factor: 38.330

4.  Unique protein database imperiled.

Authors:  N Williams
Journal:  Science       Date:  1996-05-17       Impact factor: 47.728

5.  Genomic information infrastructure after the deluge.

Authors:  Julian Parkhill; Ewan Birney; Paul Kersey
Journal:  Genome Biol       Date:  2010-07-26       Impact factor: 13.583

6.  COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer.

Authors:  Simon A Forbes; Nidhi Bindal; Sally Bamford; Charlotte Cole; Chai Yin Kok; David Beare; Mingming Jia; Rebecca Shepherd; Kenric Leung; Andrew Menzies; Jon W Teague; Peter J Campbell; Michael R Stratton; P Andrew Futreal
Journal:  Nucleic Acids Res       Date:  2010-10-15       Impact factor: 16.971

7.  The PeptideAtlas project.

Authors:  Frank Desiere; Eric W Deutsch; Nichole L King; Alexey I Nesvizhskii; Parag Mallick; Jimmy Eng; Sharon Chen; James Eddes; Sandra N Loevenich; Ruedi Aebersold
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971

8.  IRESite--a tool for the examination of viral and cellular internal ribosome entry sites.

Authors:  Martin Mokrejs; Tomás Masek; Václav Vopálensky; Petr Hlubucek; Philippe Delbos; Martin Pospísek
Journal:  Nucleic Acids Res       Date:  2009-11-16       Impact factor: 16.971

9.  NCBI Peptidome: a new repository for mass spectrometry proteomics data.

Authors:  Li Ji; Tanya Barrett; Oluwabukunmi Ayanbule; Dennis B Troup; Dmitry Rudnev; Rolf N Muertter; Maxim Tomashevsky; Alexandra Soboleva; Douglas J Slotta
Journal:  Nucleic Acids Res       Date:  2009-11-26       Impact factor: 16.971

10.  The Proteomics Identifications database: 2010 update.

Authors:  Juan Antonio Vizcaíno; Richard Côté; Florian Reisinger; Harald Barsnes; Joseph M Foster; Jonathan Rameseder; Henning Hermjakob; Lennart Martens
Journal:  Nucleic Acids Res       Date:  2009-11-11       Impact factor: 16.971
