
Nucleic Acids Research annual Database Issue and the NAR online Molecular Biology Database Collection in 2009.

Abstract

The current issue of Nucleic Acids Research includes descriptions of 179 databases, of which 95 are new. These databases (along with several molecular biology databases described in other journals) have been included in the Nucleic Acids Research online Molecular Biology Database Collection, bringing the total number of databases in the collection to 1170. In this introductory comment, we briefly describe some of these new databases and review the principles guiding the selection of databases for inclusion in the Nucleic Acids Research annual Database Issue and the Nucleic Acids Research online Molecular Biology Database Collection. The complete database list and summaries are available online at the Nucleic Acids Research web site (http://nar.oxfordjournals.org/).

Year: 2008 PMID: 19033364 PMCID: PMC2686608 DOI: 10.1093/nar/gkn942

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

THE 2009 DATABASE ISSUE

The 2009 Nucleic Acids Research (NAR) annual Database Issue is the 16th in a series that started in July 1993 with 24 database papers. The current issue comprises 179 papers: 95 describe new databases and 84 are status updates on databases that were previously described in NAR or other journals. These databases (along with further molecular biology databases that have been described in other journals) have been included in the NAR online Molecular Biology Database Collection (http://www.oxfordjournals.org/nar/database/a/), bringing the total number of databases in the collection to 1170 (16 obsolete databases have been removed from the list). The list of countries represented in the online collection has also grown with the inclusion of the first Argentinean database, TcSNP (http://snps.tcruzi.org), a database of genetic variation in Trypanosoma cruzi (1).

On several occasions, we have included in the Database Issue two or more databases that have similar coverage. This issue features, for example, three different databases of tRNA sequences identified in the genomes of various organisms. Two of the papers describe recent updates to the Genomic tRNA Database (GtRNAdb, http://gtrnadb.ucsc.edu/), maintained in Todd Lowe's lab at the University of California, Santa Cruz (2), and to the compilation of tRNA sequences, originally created by Mathias Sprinzl at the University of Bayreuth and currently maintained as the Transfer RNA database [tRNAdb, http://trnadb.bioinf.uni-leipzig.de (3)] by a consortium that includes three more groups at the Universities of Leipzig, Marburg and Strasbourg. The third paper (4) describes a new database, tRNA Gene DataBase Curated by Experts (tRNADB-CE, http://trna.nagahama-i-bio.ac.jp), compiled by Takashi Abe and colleagues at the Nagahama Institute of Bio-Science and Technology in Shiga Prefecture, Japan.
The Japanese team report that they have found as much as 4% discordance in tRNA predictions from three different programs and provide manual reconciliation of these results. In addition, the rrnDB database (http://ribosome.mmg.msu.edu/rrndb/), maintained by Thomas Schmidt and colleagues at Michigan State University (5), lists the numbers of rRNA and tRNA genes in various prokaryotic genomes. In our opinion, the availability of these databases fosters friendly competition, helps ensure accurate information and benefits the user by providing an unbiased assessment of tRNA predictions in any given organism.

Likewise, the current Database Issue features two different databases of predicted microbial operons. One paper offers an update on the popular OperonDB database (http://www.cbcb.umd.edu/), originally created in 2001 by Steven Salzberg and colleagues (6) at The Institute for Genomic Research (TIGR) and currently maintained by Salzberg's group at the University of Maryland (7). The other details the Database of prOkaryotic OpeRons [DOOR, http://csbl1.bmb.uga.edu/OperonDB/ (8)], which has been created by Ying Xu and colleagues (9) at the University of Georgia in Athens and utilizes an alternative algorithm for operon prediction. Together with the operon prediction data in the DOE's MicrobesOnline database [http://www.microbesonline.org/operons/ (10)], which relies on yet another prediction algorithm (11), these databases provide three different sets of predictions for the same genes in the same organisms and give the user an opportunity to compare the sets and make an informed choice about which prediction to trust.
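
As an illustration of what comparing several sets of predictions for the same genes might look like in practice, the following minimal Python sketch flags the genes on which independent sources disagree. It is not taken from any of the databases discussed here: the source names, gene identifiers, input layout (each source as a list of operons, each operon as a list of gene IDs) and the `compare_predictions` helper are all hypothetical, and a gene missing from a source is assumed to form a single-gene operon.

```python
def operon_of(operons):
    """Map each gene ID to the operon (a frozenset of gene IDs) containing it."""
    return {gene: operon for operon in operons for gene in operon}


def compare_predictions(sources):
    """Split genes into those on which all sources agree and those on which they differ.

    `sources` maps a (hypothetical) source name to a list of operons,
    each operon being a list of gene IDs.
    """
    lookups = [
        operon_of([frozenset(operon) for operon in operons])
        for operons in sources.values()
    ]
    all_genes = set().union(*(lookup.keys() for lookup in lookups))

    concordant, discordant = set(), set()
    for gene in all_genes:
        # Assumption: a gene absent from a source is treated as a single-gene operon.
        calls = {lookup.get(gene, frozenset([gene])) for lookup in lookups}
        (concordant if len(calls) == 1 else discordant).add(gene)
    return concordant, discordant


# Made-up example data; these are not real predictions from any database.
example_sources = {
    "source_a": [["g1", "g2", "g3"], ["g4", "g5"]],
    "source_b": [["g1", "g2", "g3"], ["g4"], ["g5"]],
    "source_c": [["g1", "g2", "g3"], ["g4", "g5"]],
}

concordant, discordant = compare_predictions(example_sources)
total = len(concordant) + len(discordant)
print("Genes with identical calls:", sorted(concordant))
print("Genes needing manual review:", sorted(discordant))
print(f"Discordance: {len(discordant) / total:.0%}")
```

Comparing whole operon memberships is a deliberately strict choice: a gene counts as discordant as soon as one source draws a different operon boundary around it. A softer measure, such as agreement on adjacent gene pairs, may suit some analyses better.
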
The importance of studying genomes of pathogens causing emerging and re-emerging diseases prompted the inclusion in this issue of databases such as GiardiaDB, PlasmoDB, TrichDB and VectorBase (12–14), products of the Bioinformatics Resource Centers, supported by the US National Institutes of Health, National Institute of Allergy and Infectious Diseases (http://www3.niaid.nih.gov/research/resources/brc/).

On some occasions, the number of papers dedicated to the same topic had to be limited. This year, for example, there were nine submissions of papers describing new databases dealing with microRNAs, not to mention the updates to the popular Rfam and TarBase databases (15,16) and a further seven microRNA databases already included in the NAR Database Collection. Four of these submissions have been accepted (17–20), based largely on volume of manually curated data and convenience for naïve users, but several otherwise viable databases have had to be rejected.

Several databases that were featured in the previous release of the NAR Database Collection have been removed from the list. One such casualty was the once popular Genome DataBase (GDB) featured in a number of NAR publications (21,22). After its initial success, this database struggled to find its niche, was moved from the Johns Hopkins University in Baltimore, Maryland, to the Hospital for Sick Children in Toronto, Canada, back to Johns Hopkins and finally came to rest at Research Triangle Institute (RTI International) in Research Triangle Park, North Carolina (23), where its operation closed down in 2008 after control of the project reverted to Johns Hopkins. Two other popular databases, eMOTIF (24) and HSSP (25), no longer support browsing, although their content remains available for download.

CRITERIA FOR INCLUSION

This Database Issue has been produced by a new team. After five very successful years at the helm, Alex Bateman retired as editor of the NAR Database Issue. Michael Galperin, who was previously responsible for the NAR Database Collection, became the new editor and Guy Cochrane came on board as curator of the NAR Database Collection. We are committed to continuing the policies of Alex Bateman and previous NAR database editors (Richard J. Roberts, Christian Burks and Andreas D. Baxevanis) that brought the NAR annual Database Issue and the NAR Database Collection to their current prominence.

Given the unique position of the NAR Database Issue as a premier forum for the publication of molecular biology databases and the ever-increasing influx of proposed submissions, we feel that it would be useful to reiterate the guiding principles that we use in the selection of databases for inclusion in the NAR Database Issue and the Database Collection.

First, coverage is by no means exhaustive; the NAR Database Issue and Database Collection were never intended to represent ‘all’ molecular biology databases, or even all of those that were publicly available. Rather, the NAR Database Issue features thoroughly curated databases that are expected to be of interest to a wide variety of biologists, primarily bench scientists. The key criteria for selection are the general utility of the database to the scientific community, comprehensiveness of coverage and degree of value added (usually in the form of manual curation) in the production of the database. We are primarily interested in web-accessible databases that offer carefully curated data that are not available elsewhere.
Data warehouses, portals, cross-platform search tools and visualization tools are more suitable for such journals as Bioinformatics, BMC Bioinformatics or Database: The Journal of Biological Databases and Curation (http://www.oxfordjournals.org/our_journals/databa/), recently launched by our publisher, Oxford University Press. We will consider, however, data portals that add value to the user by providing a convenient one-stop source of disparate data not available elsewhere and supplement this with convenient search tools and easy-to-use visualization.

We would generally avoid accepting databases on gene expression, as the underlying data must be submitted to ArrayExpress (26) and/or GEO (27). Similarly, we would avoid accepting new EST databases, particularly those dealing with individual species, as these data have a home in the DDBJ, GenBank and European Nucleotide Archive databases.

Another important issue is consideration of so-called ‘boutique’ databases, covering relatively narrow topics. The key judgement here is whether or not the database in question is likely to be useful to those beyond specialists in the field that it covers and could serve as a useful introduction for the general public or scientists unfamiliar with the field. As an example, one of the microRNA databases mentioned above, miR2Disease [http://mlg.hit.edu.cn:8080/miR2Disease (18)], created by Yadong Wang and colleagues at Harbin Institute of Technology in Harbin, China, and Indiana University in Indianapolis, received high marks from the reviewers for linking two important areas and for its potential to introduce pathologists and other clinicians to the world of microRNAs. On the other hand, a large number of interesting plant databases have had to be rejected because the databases were designed to serve only very limited user groups.
Plant databases have typically only been accepted when they appear to offer the potential to be of interest to scientists studying general biological problems, such as regulation of gene expression, protein–protein interactions, comparative genomics, and other subjects with universal appeal.

In addition to the scientific quality of a database and its general utility to the scientific community, reviewers are also asked to evaluate whether the database is well curated and is likely to be maintained for a long period of time. Submission of a paper to the NAR Database Issue implies a commitment to maintaining the database on the part of the senior author and the host institution. Once the database paper has been published, graduation of a particular student or a postdoc is not considered a valid reason to discontinue maintaining the database or to move it out of the public domain. Should this happen, the respective senior authors (and in some cases, their host institutions) will be prevented from publishing new papers in the NAR Database Issue.

Another important requirement is for a database not to have been described elsewhere. Authors’ desire to popularize their work sometimes results in the simultaneous submission of two different descriptions of a database to NAR and a more specialized journal, respectively. While this may seem trivial, we consider it a matter of principle that NAR papers should be unique. In certain rare cases, upon the request of authors, we may consider including in the NAR Database Issue a paper that was published elsewhere less than two years earlier. For example, owing to the importance of the IUBMB Enzyme Classification to a wide variety of biologists, a description of the ExplorEnz database (http://www.enzyme-database.org) has been included in the 2009 NAR Database Issue only a year after the database was first introduced in another publication (28,29).
In most cases, however, such duplicate submissions will be rejected, sometimes in the later stages of the review process. This year, this has happened to several otherwise viable databases. Rejection of papers from the Database Issue does not necessarily disqualify these databases from inclusion in the NAR online collection, but it reduces their chances.

Because the key criterion for inclusion is usefulness of the database to the community, the NAR Database Issue sometimes features unorthodox databases that the editors deem valuable, even if they do not fit standard expectations. For example, the database of highly similar Medline citations (Déjà vu, http://spore.swmed.edu/dejavu/), already mentioned in the previous comment (30), has been included in the Database Issue (31), as, in addition to its primary goal, it provides the useful service of allowing users to search for experts in certain areas, the most appropriate journals in which to publish their work and potential reviewers. Another such unorthodox database in the current issue is BodyParts3D [http://lifesciencedb.jp/ag/bp3d/ (32)], a database of morphological and geometrical knowledge in human anatomy and a visualization tool for 3D reconstruction of the human body that, among other applications, will have huge utility in the mapping of gene-expression data onto tissues.

To simplify the review process, all submissions to the Database Issue are pre-screened by the editor, Dr Michael Galperin (nardatabase@gmail.com). In 2008, the rejection rate at this pre-screening stage was lower than 50%, which resulted in an unusually high number of potential papers and of papers that were ultimately rejected based on reviewers’ comments. In future, we will employ stricter criteria for pre-screening, such that submissions will be invited only from those databases with the appropriate commitments to longevity and sustained value to users that have a realistic chance of surviving review.
For update papers, inclusion criteria are even stricter. Only updates from the most popular databases, such as GenBank, the European Nucleotide Archive, DDBJ and UniProt, are published every year. For all other databases, updates can be submitted every other year, but only when there are significant new developments that warrant publication. The decision on publication of any update paper will be made on a case-by-case basis, considering the importance of the database for the community, the amount of new material, improvements in data presentation and other measures.

The NAR online Molecular Biology Database Collection gets most of its content from the publications in annual NAR Database Issues and database papers published in Bioinformatics, our sister journal. As a result, it is a very selective list of databases that have gone through scrupulous peer review. The database list is vetted annually for continuity and obsolete databases are purged from the collection. We strive to maintain the NAR online Molecular Biology Database Collection as a curated list that features the best publicly available molecular biology databases.

FUNDING

Intramural Research Program of the US National Institutes of Health (to M.Y.G.); European Molecular Biology Laboratory (to G.R.C.). The Open Access publication charges for this article were waived by Oxford University Press. Conflict of interest statement. The authors’ opinions do not necessarily reflect the views of their respective institutions.

32 in total

1. The EMOTIF database.

Authors: J Y Huang; D L Brutlag
Journal: Nucleic Acids Res Date: 2001-01-01 Impact factor: 16.971

2. TcSNP: a database of genetic variation in Trypanosoma cruzi.

Authors: Alejandro A Ackermann; Santiago J Carmona; Fernán Agüero
Journal: Nucleic Acids Res Date: 2008-10-30 Impact factor: 16.971

3. DOOR: a database for prokaryotic operons.

Authors: Fenglou Mao; Phuongan Dam; Jacky Chou; Victor Olman; Ying Xu
Journal: Nucleic Acids Res Date: 2008-11-06 Impact factor: 16.971

4. GtRNAdb: a database of transfer RNA genes detected in genomic sequence.

Authors: Patricia P Chan; Todd M Lowe
Journal: Nucleic Acids Res Date: 2008-11-04 Impact factor: 16.971

5. OperonDB: a comprehensive database of predicted operons in microbial genomes.

Authors: Mihaela Pertea; Kunmi Ayanbule; Megan Smedinghoff; Steven L Salzberg
Journal: Nucleic Acids Res Date: 2008-10-23 Impact factor: 16.971

6. UCbase & miRfunc: a database of ultraconserved sequences and microRNA function.

Authors: Cristian Taccioli; Enrica Fabbri; Rosa Visone; Stefano Volinia; George A Calin; Louise Y Fong; Roberto Gambari; Arianna Bottoni; Mario Acunzo; John Hagan; Marilena V Iorio; Claudia Piovan; Giulia Romano; Carlo Maria Croce
Journal: Nucleic Acids Res Date: 2008-10-22 Impact factor: 16.971

7. The database of experimentally supported targets: a functional update of TarBase.

Authors: Giorgos L Papadopoulos; Martin Reczko; Victor A Simossis; Praveen Sethupathy; Artemis G Hatzigeorgiou
Journal: Nucleic Acids Res Date: 2008-10-27 Impact factor: 16.971

8. tRNAdb 2009: compilation of tRNA sequences and tRNA genes.

Authors: Frank Jühling; Mario Mörl; Roland K Hartmann; Mathias Sprinzl; Peter F Stadler; Joern Pütz
Journal: Nucleic Acids Res Date: 2008-10-28 Impact factor: 16.971

9. PlasmoDB: a functional genomic database for malaria parasites.

Authors: Cristina Aurrecoechea; John Brestelli; Brian P Brunk; Jennifer Dommer; Steve Fischer; Bindu Gajria; Xin Gao; Alan Gingle; Greg Grant; Omar S Harb; Mark Heiges; Frank Innamorato; John Iodice; Jessica C Kissinger; Eileen Kraemer; Wei Li; John A Miller; Vishal Nayak; Cary Pennington; Deborah F Pinney; David S Roos; Chris Ross; Christian J Stoeckert; Charles Treatman; Haiming Wang
Journal: Nucleic Acids Res Date: 2008-10-28 Impact factor: 16.971

10. Rfam: updates to the RNA families database.

Authors: Paul P Gardner; Jennifer Daub; John G Tate; Eric P Nawrocki; Diana L Kolbe; Stinus Lindgreen; Adam C Wilkinson; Robert D Finn; Sam Griffiths-Jones; Sean R Eddy; Alex Bateman
Journal: Nucleic Acids Res Date: 2008-10-25 Impact factor: 16.971
