Michael Y Galperin, Xosé M Fernández-Suárez, Daniel J Rigden.
Abstract
This year's Database Issue of Nucleic Acids Research contains 152 papers that include descriptions of 54 new databases and update papers on 98 databases, of which 16 have not been previously featured in NAR. As always, these databases cover a broad range of molecular biology subjects, including genome structure, gene expression and its regulation, proteins, protein domains, and protein-protein interactions. Following the recent trend, an increasing number of new and established databases deal with the issues of human health, from cancer-causing mutations to drugs and drug targets. In accordance with this trend, three recently compiled databases that have been selected by NAR reviewers and editors as 'breakthrough' contributions, denovo-db, the Monarch Initiative, and Open Targets, cover human de novo gene variants, disease-related phenotypes in model organisms, and a bioinformatics platform for therapeutic target identification and validation, respectively. We expect these databases to attract the attention of numerous researchers working in various areas of genetics and genomics. Looking back at the past 12 years, we present here the 'golden set' of databases that have consistently served as authoritative, comprehensive, and convenient data resources widely used by the entire community and offer some lessons on what makes a successful database. The Database Issue is freely available online at the https://academic.oup.com/nar web site. An updated version of the NAR Molecular Biology Database Collection is available at http://www.oxfordjournals.org/nar/database/a/. Published by Oxford University Press on behalf of Nucleic Acids Research 2017. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Year: 2017 PMID: 28053160 PMCID: PMC5210597 DOI: 10.1093/nar/gkw1188
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Descriptions of new online databases in the 2017 NAR Database issue
| Database name | URL | Brief description^a |
|---|---|---|
| 3DSNP | | Human noncoding SNPs: interactions with genes and other SNPs |
| AAgAtlas | | Human AutoAntigen database |
| ADPriboDB | | ADP-ribosylated proteins and sites |
| antiSMASH | | antibiotics and Secondary Metabolite Analysis SHell |
| AraPheno | | Phenotypic data for |
| ccNET | | Co-expression networks for diploid and polyploid |
| CeNDR | | |
| CGDB | | Circadian Gene database |
| CistromeDB | | ChIP-Seq and DNase-Seq data in human and mouse |
| Coexpedia | | Gene co-expression data mapped to medical subject headings (MeSH) |
| dbSAP | | Single Amino acid Polymorphisms: SNP-derived variation in human proteins |
| denovo-db | | Human de novo gene variants |
| DrugCentral | | Active ingredients of approved pharmaceutical products, indications and mode of action |
| EURISCO | | European catalogue for plant genetic resources |
| ExAC browser | | Exome Aggregation Consortium sequence data |
| Exposome-Explorer | | Biomarkers of exposure to disease risk factors |
| FAIRDOMHub | | Findable, Accessible, Interoperable and Reusable Data, Operating procedures and Models |
| FuzDB | | Database of fuzzy protein complexes |
| GenomeCRISPR | | High-throughput screening using the CRISPR/Cas-9 system |
| GTRD | | Gene Transcription Regulation Database |
| HieranoiDB | | Ortholog groups and trees inferred by Hieranoid2 software |
| IGSR | | International Genome Sample Resource |
| IMG/VR | | DOE Joint Genome Institute Viral Resource |
| JET2 Viewer | | Joint Evolutionary Trees: protein-protein interaction patches in known structures |
| jPOSTrepo | | Japanese ProteOme STandard repository |
| KERIS | | Kaleidoscope of gEne Responses to Inflammation among Species |
| LinkProt | | Topologically complex protein structures |
| LNCediting | | RNA editing sites in lncRNAs from human, monkey, mouse and fly |
| MEGaRes | | Mechanisms of antimicrobial resistance |
| Membranome | | A database of single-pass membrane proteins |
| MethSMRT | | DNA methylation data from Single Molecule, Real-Time sequencing |
| mirDNMR | | Background |
| Monarch Initiative | | Human disease-related genotypes and phenotypes in model organisms |
| MRPrimerV | | PCR primer pairs for detecting RNA virus-mediated infectious diseases |
| mutLBSgeneDB | | Mutations in Ligand Binding Sites gene DataBase |
| NSDNA | | Nervous System Disease NcRNA Atlas |
| Ontobee | | Ontology database server of OBO Foundry |
| Open Targets | | Target validation platform: links between potential drug targets and diseases |
| pathDIP | | Pathway data integration and analysis portal |
| PathoYeastract | | Transcription regulation in pathogenic yeasts |
| PceRBase | | Plant competing endogenous RNAs |
| Pharos | | Data on unstudied and understudied drug targets |
| PLaMoM | | Plant Mobile Macromolecules: Extracellular siRNAs, microRNAs, mRNAs and proteins in plants |
| Plant Reactome | | Plant metabolic, regulatory and signaling pathways |
| PMDBase | | Plant microsatellites and marker development |
| POSTAR | | Post-transcriptional regulation by RNA-binding proteins |
| proGenomes | | Consistently annotated bacterial and archaeal genomes |
| Proteome-pI | | Pre-computed isoelectric points for >5000 proteomes |
| REDIportal | | A-to-I RNA editing events in human |
| RNALocate | | RNA localization in the cell |
| SNP2TFBS | | Regulatory SNPs affecting predicted transcription factor binding sites |
| SoyNet | | Co-functional networks for soybean |
| TFBSbank | | Transcription Factor Binding Site profiles deduced from ChIP-seq or ChIP-chip data |
| TSTMP | | Target Selection database for human TransMembrane Proteins |
| Uniclust | | Clustered protein sequences and multiple sequence alignments |
| WERAM | | Writers, Erasers and Readers of histone Acetylation and Methylation |
^a At the time of this writing, references to the databases featured in this issue have not yet been finalized; please see the Database Issue Table of Contents.
Updated descriptions of databases most recently published elsewhere
| Database | URL | Brief description^a |
|---|---|---|
| CARD | | Comprehensive Antibiotic Resistance Database |
| dbDEMC | | Differentially expressed miRNAs in human cancers |
| DisGeNET | | Genetic determinants of human diseases |
| ECOD | | Evolutionary Classification Of protein Domains |
| GETPrime | | Gene- or transcript-specific primers for qPCR |
| HIPPIE | | Human Integrated Protein–Protein Interaction rEference |
| HipSci | | Human induced pluripotent Stem cells initiative |
| IMG-ABC | | Integrated Microbial Genomes: Atlas of Biosynthetic gene Clusters |
| Influenza Research Database | | All data on influenza: sequences, strains, alignments, trees, variation, epitopes, classification and surveillance |
| LincSNP | | Association of human lncRNAs with disease-related SNPs |
| MalaCards | | Human maladies and their annotations, organized into ‘disease cards’ |
| pVOGs | | Prokaryotic Virus Orthologous Groups of proteins |
| ProteomeXchange | | Proteomics resources portal |
| RAID | | Human RNA–RNA and RNA–protein interactions |
| SZGR | | SchiZophrenia Gene Resource |
| WDCM | | World Data Center of Microorganisms collections |
| XTalkDB | | Crosstalk among signaling pathways |
^a For full references to the databases featured in this issue, please see the Table of Contents.
The ‘golden set’ of the most popular databases featured in multiple NAR issues^a
| No.^b | Database name | Current URL | Brief description | Years featured in the NAR Database Issue^c |
|---|---|---|---|---|
| Annual updates | | | | |
| 1 | DDBJ | | All known nucleotide and protein sequences | 2000, 2002–2017 |
| 2 | ENA | | All known nucleotide and protein sequences | 1986, 1990, 1997–2017 |
| 3 | GenBank | | All known nucleotide and protein sequences | 1986, 1988, 1990–1994, 1996–2000, 2002–2017 |
| 27 | Ensembl | | Annotated information on eukaryotic genomes | 2002–2017 |
| 87 | Mouse Genome Database | | Mouse genome database | 1997–2017 |
| 316 | UCSC Genome Browser | | A universal genome viewing and analysis platform | 2006–2017 |
| 318 | UniProt | | A universal database of protein sequences (includes Swiss-Prot and TrEMBL) | 1991–1994, 1996–2000, 2003, 2004–2010, 2012–2015, 2017 |
| Regular updates | | | | |
| 338 | ArrayExpress | | Array-based gene expression data | 2003, 2005, 2007, 2009, 2011, 2013, 2015 |
| 420 | BioCyc^d | | Pathway information for sequenced genomes | 2005, 2008, 2010, 2012, 2014, 2016 |
| 800 | BioGRID | | Genetic and physical interactions in yeast, worm and fly | 2006, 2008, 2011, 2013, 2015, 2017 |
| 421 | BRENDA | | Enzyme names and biochemical properties | 2002, 2004, 2007, 2009, 2011, 2013, 2015, 2017 |
| 645 | CGD | | Candida Genome Database | 2005, 2007, 2010, 2012, 2014, 2016 |
| 1531 | CanSAR | | Cancer research and drug discovery resource | 2012, 2014, 2016 |
| 258 | CATH | | Protein domain structure database | 1999, 2000, 2001, 2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017 |
| 1211 | CAZy | | Carbohydrate-Active enZymes database | 2009, 2014 |
| 204 | CDD | | Conserved Domain Database | 2002, 2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017 |
| 646 | ChEBI | | Chemical Entities of Biological Interest | 2008, 2013, 2016 |
| 1548 | ChEMBL | | Interaction of drugs and compounds with their targets | 2012, 2014, 2017 |
| 803 | ChimerDB | | Chromosome translocations and gene fusions | 2006, 2010, 2017 |
| 7 | COG | | Clusters of Orthologous Groups of proteins | 2000, 2001, 2015 |
| 1188 | Comparative Toxicogenomics Database | | A knowledgebase for curated chemical-gene-disease networks | 2009, 2011, 2013, 2017 |
| 651 | COSMIC | | Catalogue of Somatic Mutations in Cancer | 2010, 2011, 2015, 2017 |
| 68 | CyanoBase | | Cyanobacterial genomes | 1998, 1999, 2000, 2010, 2014, 2017 |
| 885 | dbPTM | | Post-translational modification of proteins | 2006, 2013, 2014, 2016 |
| 591 | DBTSS | | Database of transcriptional start sites | 2002, 2004, 2006, 2008, 2010, 2012, 2015 |
| 445 | DEG | | Database of essential genes | 2004, 2009, 2014 |
| 446 | DictyBase | | Model organism database for | 2004, 2006, 2009, 2011, 2013 |
| 811 | DrugBank | | Drug and drug target database | 2006, 2008, 2011, 2014 |
| 108 | EcoCyc | | | 1996, 1997, 1998, 2000, 2002, 2005, 2009, 2011, 2013 |
| 1068 | eggNOG | | Evolutionary genealogy of genes: Non-supervised Orthologous Groups | 2008, 2010, 2012, 2014, 2016 |
| 1347 | ELM | | Eukaryotic Linear Motif: functional sites in eukaryotic proteins | 2003, 2008, 2010, 2011, 2012, 2014, 2016 |
| 812 | EMAGE | | e-Mouse Atlas of Gene Expression | 2006, 2008, 2010, 2014 |
| 985 | ENCODE project at UCSC | | Encyclopedia of DNA Elements, functional elements in human genome | 2007, 2010–2013 |
| 33 | EPD | | Eukaryotic Promoter Database | 1998, 1999, 2000, 2002, 2004, 2006, 2013, 2015, 2017 |
| 91, 969, 1219 | EuPathDB | | Unified genome databases on eukaryotic pathogens (includes PlasmoDB, ToxoDB, ApiDB, TrichDB, TriTrypDB, GiardiaDB, etc.) | 2002, 2003, 2007–2013, 2017 |
| 1294 | Expression Atlas | | Gene expression patterns deduced from microarray and RNA-seq data | 2010, 2012, 2014, 2016 |
| 465 | FANTOM | | Functional annotation of mouse full-length cDNA clones | 2002, 2011, 2016, 2017 |
| 1020 | FINDBase | | Frequencies of INherited Disorders | 2007, 2011, 2014, 2017 |
| 71 | FlyBase | | Drosophila sequences and genomic information | 1994, 1996–1999, 2002, 2003, 2005–2009, 2012–2017 |
| 817 | FlyRNAi | | Genome-wide RNAi analysis in Drosophila | 2006, 2012, 2017 |
| 472 | Gene3D | | Structural domain assignments for protein sequences | 2003, 2005, 2006, 2008, 2010, 2012, 2014, 2016 |
| 73 | Genenames | | The HGNC human gene nomenclature database | 2008, 2011, 2013, 2015, 2017 |
| 989 | GenomeRNAi | | RNA interference data for human and Drosophila | 2007, 2010, 2013, 2017 |
| 603 | GEO | | NCBI's Gene Expression Omnibus | 2005, 2007, 2009, 2011, 2013 |
| 487 | GO | | Gene Ontology Database | 2004, 2006, 2008, 2010, 2012, 2013, 2015, 2017 |
| 389 | GOA | | Gene Ontology annotations for proteins in UniProt | 2004, 2009, 2015 |
| 75 | GOLD | | Genomes online database: completed and ongoing genome projects | 2001, 2006, 2008, 2010, 2012, 2015, 2017 |
| 166 | GPCRdb | | Data and tools for studying G protein-coupled receptors | 1998, 2001, 2003, 2011, 2014, 2016 |
| 607 | Gramene | | Comparative genomics of crops and model plant species | 2002, 2006, 2008, 2010, 2013, 2016 |
| 15 | GXD | | Mouse Gene Expression Database | 1999–2001, 2004, 2007, 2011, 2014, 2017 |
| 1210 | HAMAP | | High-quality Automated and Manual Annotation of Proteins | 2009, 2013, 2015 |
| 991 | HMDB | | Human Metabolome Database | 2007, 2009, 2013 |
| 779 | IEDB | | Immune Epitope Database | 2008, 2012, 2015 |
| 1089 | IMG/M | | JGI's Integrated Microbial Genomics and Metagenomics | 2006, 2008, 2012, 2014, 2017 |
| 172 | IMGT | | International ImMunoGeneTics database | 1997–2001, 2003–2006, 2008–2010, 2015 |
| 690 | InParanoid | | Orthologous relationships between eukaryotic proteomes | 2005, 2008, 2010, 2015 |
| 507 | IntAct | | Protein–Protein INTerACTion data | 2004, 2007, 2010, 2012, 2014 |
| 207 | InterPro | | Integrated resource of protein families, domains and functional sites | 2001, 2003, 2005, 2007, 2009, 2012, 2015, 2017 |
| 367 | IPD | | Immuno Polymorphism database (includes IMGT/HLA) | 2001, 2003, 2005, 2009, 2010, 2011, 2013, 2015 |
| 516 | JASPAR | | PSSMs for transcription factor DNA-binding sites | 2004, 2006, 2008, 2010, 2014, 2016 |
| 112 | KEGG | | Kyoto Encyclopedia of Genes and Genomes: genes, proteins, pathways | 1999, 2000, 2002, 2004, 2006, 2008, 2010, 2012–2014, 2016, 2017 |
| 177 | MEROPS | | Database of proteases (peptidases) | 1999, 2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014, 2016 |
| 114 | MetaCyc | | Metabolic pathways and enzymes in various organisms | 2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014, 2016 |
| 529 | miRBase | | MicroRNA sequences, names and predicted targets in animals | 2006, 2008, 2011, 2014 |
| 1098 | miRGator | | MicroRNA expression profiles and mRNA targets | 2008, 2011, 2013 |
| 994 | miRGen | | MicroRNA promoters and transcription start sites | 2007, 2010, 2016 |
| 1423 | miRTarBase | | Experimentally validated microRNA–target interactions | 2011, 2013, 2016 |
| 270 | MMDB | | Molecular Modeling Database of protein structures | 1999, 2000, 2002, 2003, 2007, 2012, 2014 |
| 840 | MODOMICS | | RNA modification pathways | 2006, 2009, 2013 |
| 152 | Mouse Tumor Biology Database | | Mouse as a model system of human cancers | 1999, 2000, 2007, 2015 |
| 1453 | neXtProt | | A database of human proteins | 2012, 2015, 2017 |
| 705 | NONCODE | | A database of noncoding RNAs | 2005, 2008, 2012, 2014, 2016 |
| 143 | OMIM | | Online Mendelian inheritance in man: A catalog of human genetic and genomic disorders | 1994, 2002, 2005, 2009, 2015 |
| 1108 | OrthoDB | | A hierarchical catalog of orthologous proteins | 2008, 2011, 2013, 2015, 2017 |
| 552 | PANTHER | | Protein sequence evolution mapped to functions and pathways | 2003, 2005, 2007, 2010, 2013, 2016, 2017 |
| 1000 | PATRIC | | PathoSystems Resource Integration Center | 2007, 2014, 2017 |
| 276 | PDB | | Protein DataBank: All biological macromolecular structures | 2000–2002, 2004–2006, 2011, 2013, 2015, 2017 |
| 456 | PDBe | | Protein Databank in Europe | 2010–2012, 2014, 2016 |
| 278 | PDBsum | | Summaries and analyses of PDB structures | 2001, 2005, 2009, 2014 |
| 210 | Pfam | | Protein families: Multiple sequence alignments and profile hidden Markov models of protein domains | 1998–2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014, 2016 |
| 852 | PHI-base | | Genes affecting fungal pathogen–host interactions | 2006, 2008, 2015, 2017 |
| 194 | PIR | | Protein Information Resource, part of UniProt | 1986, 1988, 1991–1994, 1996–2004 |
| 857 | PRIDE | | Proteomics peptide identification database | 2006, 2008, 2013, 2016 |
| 212 | PRINTS | | Protein fingerprints, conserved motifs used to characterise a protein family | 1994, 1996–2000, 2002, 2003 |
| 215 | Prosite | | Biologically-significant protein patterns and profiles | 1991–1994, 1996, 1997, 1999, 2002, 2004, 2006, 2008, 2010, 2013 |
| 735 | PubChem | | Structures and biological activities of small organic molecules | 2009, 2010, 2014, 2016, 2017 |
| 93 | RGD | | Rat Genome Database | 2002, 2005, 2007, 2009, 2015 |
| 243 | RDP | | Ribosomal Database Project: Bacterial and archaeal 16S rRNA and fungal 28S rRNA sequences | 1991–1994, 1996, 1997, 1999–2001, 2003, 2005, 2007, 2009, 2014 |
| 612 | Reactome | | A database of metabolic and signaling pathways | 2005, 2009, 2011, 2014, 2016 |
| 224 | REBASE | | Restriction enzyme database | 1993, 1994, 1996–2001, 2003, 2005, 2007, 2010, 2015 |
| 391 | RefSeq | | NCBI Reference Sequence Database | 2000, 2001, 2005, 2007, 2009, 2012, 2014–2016 |
| 382 | Rfam | | RNA families with multiple sequence alignments | 2003, 2005, 2009, 2011, 2013, 2015 |
| 282 | SCOP | | Structural Classification Of Proteins | 1997, 1999, 2000, 2002, 2004, 2008, 2014 |
| 352 | SGD | | Saccharomyces Genome Database | 1998, 1999, 2002–2008, 2010, 2012, 2014, 2016 |
| 1183 | SILVA | | Aligned small- and large subunit rRNA sequences | 2007, 2013, 2014 |
| 867 | SIMAP | | Similarity Matrix of Proteins | 2006, 2008, 2010, 2014 |
| 218 | SMART | | Simple Modular Architecture Research Tool: signalling, extracellular and chromatin-associated protein domains | 1999, 2000, 2002, 2004, 2006, 2009, 2012, 2015 |
| 1134 | STITCH | | Search Tool for Interactions of Chemicals | 2008, 2010, 2012, 2014, 2016 |
| 582 | STRING | | Predicted functional associations between proteins | 2000, 2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017 |
| 285 | SUPERFAMILY | | Genome-wide identification of protein domains of known structure | 2002, 2004, 2007, 2009, 2011, 2015 |
| 585 | SWISS-MODEL | | 3D models for proteins of unknown structure | 2003, 2004, 2006, 2009, 2014, 2017 |
| 97 | TAIR | | The Arabidopsis information resource | 2001, 2003, 2008, 2012 |
| 1264 | TarBase | | Database of experimentally supported microRNA targets | 2006, 2009, 2012, 2015 |
| 790 | TCDB | | Transporter protein classification database | 2006, 2009, 2014, 2016 |
| 1452 | UCSC Cancer Genomics Browser | | Visualization of cancer genomic datasets | 2011, 2013, 2015 |
| 1031 | VectorBase | | Invertebrate vectors of human pathogens | 2007, 2009, 2012, 2015 |
| 51 | WormBase | | Community portal on all aspects of C. elegans biology | 2001, 2003–2008, 2010, 2012, 2014, 2016 |
| 1151 | XenBase | | Xenopus frog database | 2008, 2010, 2013, 2015 |
| 792 | YEASTRACT | | Transcriptional regulation in Saccharomyces cerevisiae | 2006, 2008, 2011, 2014 |
| 101 | ZFIN | | Zebrafish information network | 2001, 2003, 2011, 2013 |
^a This list includes databases that have been featured in the NAR Database Issue multiple times as separate papers. This listing omits many NCBI databases whose updated descriptions are published in annual NCBI overview papers.
^b The database entry in the NAR online Database Collection. For example, the summary for ArrayExpress (no. 338) is available at http://www.oxfordjournals.org/nar/database/summary/338.
^c The reference to the most recent database description that is available in PubMed (excludes the current issue).
^d This database has switched to subscription-based service and is no longer available without registration.
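
The entry-number footnote gives one example of a summary URL (no. 338 for ArrayExpress), and the last column of the 'golden set' table lists years as a mix of single years and ranges. The following Python sketch shows one way a reader might work with both. It assumes, without confirmation from the source, that the same URL pattern holds for every entry number and that the address still resolves; `summary_url` and `expand_years` are hypothetical helper names introduced only for illustration.

```python
import re

# Base path copied from the ArrayExpress example in the footnote.
# Assumption: every entry number follows the same pattern.
SUMMARY_BASE = "http://www.oxfordjournals.org/nar/database/summary/"


def summary_url(entry_no: int) -> str:
    """Build the Database Collection summary URL for one entry number."""
    if entry_no <= 0:
        raise ValueError("entry number must be positive")
    return f"{SUMMARY_BASE}{entry_no}"


def expand_years(cell: str) -> list[int]:
    """Expand a table cell such as '2000, 2002–2017' into individual years."""
    years: list[int] = []
    for part in cell.split(","):
        # Pull out four-digit years; this tolerates hyphens, en dashes
        # and stray punctuation left over from the page extraction.
        found = [int(y) for y in re.findall(r"\d{4}", part)]
        if not found:
            continue
        # A single year yields a one-element range; 'A–B' yields A..B inclusive.
        years.extend(range(found[0], found[-1] + 1))
    return years


if __name__ == "__main__":
    print(summary_url(338))
    print(len(expand_years("2000, 2002–2017")))
```

For instance, `expand_years` applied to the DDBJ cell makes it easy to count how many issues featured a database, which is the kind of tally the 'golden set' is based on.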