
The Molecular Biology Database Collection: 2007 update.

Abstract

The NAR online Molecular Biology Database Collection is a public resource that contains links to the databases described in this issue of Nucleic Acids Research, previous NAR database issues, as well as a selection of other molecular biology databases that are freely available on the web and might be useful to the molecular biologist. The 2007 update includes 968 databases, 110 more than the previous one. Many databases that have been described in earlier issues of NAR come with updated summaries, which reflect recent progress and, in some instances, an expanded scope of these databases. The complete database list and summaries are available online on the Nucleic Acids Research web site http://nar.oxfordjournals.org/.

Year: 2006 PMID: 17148484 PMCID: PMC1761423 DOI: 10.1093/nar/gkl1008

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

COMMENTARY

The current issue of Nucleic Acids Research features 174 databases, of which 106 are new and 68 are updates of previously described databases. These new databases, as well as 15 described elsewhere, have been added to the NAR online Molecular Biology Database Collection, bringing the total to 968. The geographic reach of the collection keeps expanding: it now includes the first database created by Bulgarian (and US) scientists (BANMOKI, No. 976). On the other hand, 11 databases featured in the previous release of the NAR database collection (1) have been dropped from the list. Some of these (Crow21, PIR-NREF) have been superseded by newer and more advanced databases. Three databases (DbCat, GenetPig and HugeMap) were discontinued owing to the demise of the French INFOBIOGEN centre [some other INFOBIOGEN databases migrated to a new web site maintained by the Institut National de la Recherche Agronomique (INRA)]. The BIND project, which never lived up to its full promise, has gone commercial. In any case, these 11 databases make up only a small fraction of the total list, which once again has held up very well and shown surprising resilience. In my commentary on last year's release of the NAR database collection (1), I discussed the citation rates of the papers in the 2004 NAR database issue and noted that the high citation rates of certain databases reflect their worldwide acceptance as de facto standards for protein functional annotation [UniProt, No. 318, Ref. 2], domain structure [, No. 210, Ref. 3] and biomedical terminology [Gene Ontology, No. 487, Ref. 4]. However, citation data can be biased: many articles acknowledge the use of information from public databases only by giving their URLs, or do not acknowledge it at all. In addition, some databases may be cited on web sites and in new or obscure journals that are not covered by the ISI Citation Index.
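The counts in the paragraph above can be cross-checked with a few lines of arithmetic. In the sketch below, the helper name `collection_size` is hypothetical, and the size of the 2006 list (858) is not stated in the text; it is inferred from the abstract's "968 databases, 110 more than the previous one". The check shows that 106 new databases plus 15 described elsewhere, minus the 11 dropped ones, account exactly for that net gain.

```python
# A minimal consistency check on the counts quoted in the text. The size of
# the 2006 list is NOT reported directly; it is inferred from the abstract's
# "968 databases, 110 more than the previous one".

def collection_size(previous: int, new_in_issue: int,
                    added_elsewhere: int, dropped: int) -> int:
    """Size of the updated collection after additions and removals.

    Updated entries (68 in this issue) are already in the list, so they
    do not change the count and are not a parameter here.
    """
    return previous + new_in_issue + added_elsewhere - dropped


TOTAL_2007 = 968
NET_GAIN = 110
PREVIOUS = TOTAL_2007 - NET_GAIN  # inferred, not reported directly

# 106 new + 68 updated databases make up the 174 featured in the issue.
assert 106 + 68 == 174

updated = collection_size(PREVIOUS, new_in_issue=106,
                          added_elsewhere=15, dropped=11)
print(PREVIOUS, updated, updated - PREVIOUS)  # 858 968 110
```

Because the 68 updated databases were already listed, only the 121 additions and the 11 removals move the total: 121 - 11 = 110.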
With this in mind, I have tried here to use additional metrics to assess the popularity of the NAR database issue. First, I checked the citations of the database papers listed on the Google Scholar web site, which also captures citations on web sites. In addition, I looked at the number of times the full text of each paper (in PDF or HTML) was downloaded from the PubMed Central web site. All papers in the NAR database issue are freely available from both the PubMed Central and NAR web sites, and the numbers of downloads from the two sites are believed to be roughly similar. The NAR web site already lists the most frequently downloaded and most cited papers of all time, which include three papers on the Pfam database published in NAR in 2000, 2002 and 2004 (5–7), two papers on Swiss-Prot (8,9) and one on the Protein Data Bank (10), the same databases whose descriptions topped the list of the most cited papers from the 2004 database issue (1). It would seem that these three metrics reflect successive stages in the use of the NAR database issue: a user typically finds a database of interest in PubMed or another bibliographic database and then browses the full text in HTML format. If the paper is interesting enough, the user downloads it as a PDF. Finally, if the database turns out to be useful, it may be acknowledged with a formal citation. Indeed, the numbers of HTML and PDF downloads for the same paper correlated very well, with PDF downloads amounting to about one-third of HTML downloads (Figure 1). Curiously, citation rates correlated poorly with the number of downloads. The two most obvious deviations were the 2004 Pfam paper (7), which is extremely well cited but only moderately downloaded (791 citations, 1992 total downloads), and my own commentary (11), which is downloaded far more often than it is cited (59 citations, 1806 downloads).
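The statements that downloads "correlated very well" while citations "correlated poorly" can be made quantitative. The sketch below assumes Pearson's coefficient as the measure (the text does not say which correlation statistic was used), and the helper names `pearson` and `citations_per_download` are hypothetical. The per-paper counts behind Figure 1 are not reproduced here, so `pearson` is shown without data; the citations-per-download ratio is applied only to the two outliers whose numbers are given in the text.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def citations_per_download(citations: int, downloads: int) -> float:
    """Crude 'conversion rate' from full-text downloads to formal citations."""
    if downloads <= 0:
        raise ValueError("downloads must be positive")
    return citations / downloads


# The two outliers named in the text: (citations, total downloads).
outliers = {
    "2004 Pfam paper (7)": (791, 1992),
    "2004 commentary (11)": (59, 1806),
}
for name, (cites, downloads) in outliers.items():
    rate = citations_per_download(cites, downloads)
    print(f"{name}: {rate:.3f} citations per download")
```

Given per-paper lists of PDF and HTML download counts, `pearson(pdf, html)` would quantify the relationship plotted in Figure 1. The two outliers differ roughly twelvefold in citations per download (about 0.40 versus 0.03), which illustrates why citations and downloads need not track each other.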
I am glad to report that, with a single exception, all papers in the 2004 NAR database issue have now been cited at least twice (and downloaded at least 260 times). The single uncited exception is the description of ORFDB, Invitrogen's collection of human and mouse ORF clones (12). This paper, which was never intended to be cited, has nevertheless been downloaded 983 times (including 207 times as a PDF) and has apparently served its purpose. Download statistics are clearly an interesting and valuable tool for analyzing trends in science. For example, of all papers in the 2006 NAR database issue, three of the five most downloaded describe microRNA databases, miRNAMap, miRBase and Argonaute (13–15), which reflects the explosive growth of this area. Highlighting such databases has always been, and will remain, the key goal of the NAR database issues and the NAR online Molecular Biology Database Collection.
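The ORFDB figures give a concrete instance of the HTML-to-PDF-to-citation sequence described earlier. The sketch below assumes that the 983 downloads consist only of HTML and PDF views, so that 983 - 207 = 776 were HTML. The function names are hypothetical, and the default thresholds simply restate the "at least twice" and "at least 260 times" floors from the text.

```python
def download_split(total: int, pdf: int) -> tuple[int, int, float]:
    """Split total full-text downloads into (html, pdf, pdf/html ratio).

    Assumes every download is either an HTML view or a PDF download.
    """
    if not 0 <= pdf < total:
        raise ValueError("need 0 <= pdf < total")
    html = total - pdf
    return html, pdf, pdf / html


def meets_floor(citations: int, downloads: int,
                min_citations: int = 2, min_downloads: int = 260) -> bool:
    """True if a paper reaches both floors reported for the 2004 issue."""
    return citations >= min_citations and downloads >= min_downloads


# ORFDB (ref. 12): 983 downloads, 207 of them as PDF, no citations.
html, pdf, ratio = download_split(983, 207)
print(html, pdf, f"{ratio:.2f}")   # 776 207 0.27
print(meets_floor(0, 983))         # False: fails only the citation floor
```

ORFDB's PDF share (about 0.27 of its HTML downloads) is somewhat below the one-third typical of the issue, but a single paper says little about the overall trend.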

Figure 1

The total number of full-text HTML downloads (closed squares) and literature citations (open squares) as a function of the number of PDF downloads for 142 papers in the 2004 NAR database issue.

References (15 in total, 10 of which are listed below)

1. The Pfam protein families database.

Authors: A Bateman; E Birney; R Durbin; S R Eddy; K L Howe; E L Sonnhammer
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. The Molecular Biology Database Collection: 2004 update.

Authors: Michael Y Galperin
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

3. The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999.

Authors: A Bairoch; R Apweiler
Journal: Nucleic Acids Res Date: 1999-01-01 Impact factor: 16.971

4. Argonaute--a database for gene regulation by mammalian microRNAs.

Authors: Priyanka Shahi; Serguei Loukianiouk; Andreas Bohne-Lang; Marc Kenzelmann; Stefan Küffer; Sabine Maertens; Roland Eils; Herrmann-Josef Gröne; Norbert Gretz; Benedikt Brors
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

5. The Universal Protein Resource (UniProt): an expanding universe of protein information.

Authors: Cathy H Wu; Rolf Apweiler; Amos Bairoch; Darren A Natale; Winona C Barker; Brigitte Boeckmann; Serenella Ferro; Elisabeth Gasteiger; Hongzhan Huang; Rodrigo Lopez; Michele Magrane; Maria J Martin; Raja Mazumder; Claire O'Donovan; Nicole Redaschi; Baris Suzek
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

6. miRNAMap: genomic maps of microRNA genes and their target genes in mammalian genomes.

Authors: Paul W C Hsu; Hsien-Da Huang; Sheng-Da Hsu; Li-Zen Lin; Ann-Ping Tsou; Ching-Ping Tseng; Peter F Stadler; Stefan Washietl; Ivo L Hofacker
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

7. The Molecular Biology Database Collection: 2006 update.

Authors: Michael Y Galperin
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

8. miRBase: microRNA sequences, targets and gene nomenclature.

Authors: Sam Griffiths-Jones; Russell J Grocock; Stijn van Dongen; Alex Bateman; Anton J Enright
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

9. Pfam: clans, web tools and services.

Authors: Robert D Finn; Jaina Mistry; Benjamin Schuster-Böckler; Sam Griffiths-Jones; Volker Hollich; Timo Lassmann; Simon Moxon; Mhairi Marshall; Ajay Khanna; Richard Durbin; Sean R Eddy; Erik L L Sonnhammer; Alex Bateman
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

10. The Pfam protein families database.

Authors: Alex Bateman; Lachlan Coin; Richard Durbin; Robert D Finn; Volker Hollich; Sam Griffiths-Jones; Ajay Khanna; Mhairi Marshall; Simon Moxon; Erik L L Sonnhammer; David J Studholme; Corin Yeats; Sean R Eddy
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971
