
MetaBase--the wiki-database of biological databases.

Dan M Bolser, Pierre-Yves Chibon, Nicolas Palopoli, Sungsam Gong, Daniel Jacob, Victoria Dominguez Del Angel, Dan Swan, Sebastian Bassi, Virginia González, Prashanth Suravajhala, Seungwoo Hwang, Paolo Romano, Rob Edwards, Bryan Bishop, John Eargle, Timur Shtatland, Nicholas J Provart, Dave Clements, Daniel P Renfro, Daeui Bhak, Jong Bhak.

Abstract

Biology is generating more data than ever. As a result, there is an ever increasing number of publicly available databases that analyse, integrate and summarize the available data, providing an invaluable resource for the biological community. As this trend continues, there is a pressing need to organize, catalogue and rate these resources, so that the information they contain can be most effectively exploited. MetaBase (MB) (http://MetaDatabase.Org) is a community-curated database containing more than 2000 commonly used biological databases. Each entry is structured using templates and can carry various user comments and annotations. Entries can be searched, listed, browsed or queried. The database was created using the same MediaWiki technology that powers Wikipedia, allowing users to contribute on many different levels. The initial release of MB was derived from the content of the 2007 Nucleic Acids Research (NAR) Database Issue. Since then, approximately 100 databases have been manually collected from the literature, and users have added information for over 240 databases. MB is synchronized annually with the static Molecular Biology Database Collection provided by NAR. To date, there have been 19 significant contributors to the project; each one is listed as an author here to highlight the community aspect of the project.


Year: 2011 PMID: 22139927 PMCID: PMC3245051 DOI: 10.1093/nar/gkr1099

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

When discussing biological databases, there are simply too many different resources to comprehensively cover the topic in a short introduction. There are well-established data warehouses that act as community repositories for data of a single type such as GenBank (1), PDB (2) and ArrayExpress (3). There are organism-specific databases, combining many different types of data under a unifying, genomic framework such as TAIR (4), FlyBase (5) and WormBase (6). There are databases of derived data, collecting and systematizing the body of knowledge from the scientific literature such as GTEx (http://www.ncbi.nlm.nih.gov/gtex/GTEX2/gtex.cgi), TRANSFAC (7), Brenda (8) and ChEMBL (9). There are competing databases that cover specific kinds of -omics information, collecting data from different experiments within a common biological theme such as DIP (10), HPID (11) and IntAct (12). There are classification databases (13,14), databases of terminology (15,16), databases of protein families (17,18) and databases built around diseases (19) or taxonomic groups (20). This list barely scratches the surface, but gives a flavour of the number, types and diversity of biological databases.

As the type and volume of biological data continue to increase, so do the type and number of databases that analyse, integrate and summarize the available data. For example, querying the database of biomedical publications PubMed (21) shows that the number of unique publications with the word ‘database’ in the title has increased from just 2 in 1980 to 91 in 1990 and 469 in 2000. Since 1990, there has been an exponential increase in the number of database publications per year, reaching over 1000 per year between 2008 and 2010 (Figure 1). If this trend continues, the number of database publications per year will double to nearly 2000 by 2015.
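Per-year counts of this kind could be gathered programmatically. The following is a minimal sketch of one way to do so, assuming NCBI's E-utilities `esearch` endpoint with the `[Title]` and `[pdat]` field tags. The paper does not state how its PubMed query was run, so the helper name `pubmed_count_url` and the exact search term are illustrative assumptions. The code only builds request URLs and sends nothing over the network.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (an assumption: the paper does not
# say which interface was used to query PubMed).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_count_url(year: int, word: str = "database") -> str:
    """Build an esearch URL requesting the number of PubMed records
    published in `year` whose title contains `word`."""
    params = {
        "db": "pubmed",
        # Title field tag plus publication-date tag restrict the search.
        "term": f"{word}[Title] AND {year}[pdat]",
        # Ask for the hit count only, not the list of record IDs.
        "rettype": "count",
    }
    return f"{ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    for year in (1980, 1990, 2000, 2010):
        print(year, pubmed_count_url(year))
```

Fetching each URL and reading the returned count would give one bar of a plot like Figure 1. Counts retrieved today may differ from those reported here, because PubMed indexing changes over time.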

Figure 1.

The growth in the number of database publications per year. Each bar shows the number of research articles with the keyword ‘database’ appearing in the article title in the given year. The count only covers articles indexed in PubMed. The increase shows an exponential trend that, if it continues, would produce nearly 2000 database publications per year by 2015.

Biological databases have proven crucially important for basic research; however, the current growth in the number of available databases creates several problems. Researchers seeking the most up-to-date and comprehensive information in their domain may struggle to identify the definitive sources of reliable data from among the many resources available. Without peer guidance, it is difficult to judge the strengths, weaknesses or status of the available resources. For these reasons, the proliferation of resources may, ironically, lead to an increase in redundancy, as new resources are created to cope with the perceived problems or omissions of existing databases. This process is exacerbated by a lack of public forums where researchers can engage database creators to discuss databases and suggest improvements. These issues have created an unfortunate situation whereby many resources are short-lived, abandoned soon after they are created. This ‘half-life’ is analogous to ‘link rot’ (22). This creates a vicious cycle, whereby the publication of database resources is devalued (23). To address these problems, we have created MetaBase (MB), a wiki-based database of biological databases.

DATABASE DESCRIPTION

MB is a community-curated database of all the biological databases available on the Internet. The aim of the project is to make it easy for researchers to quickly find relevant information about useful databases. Entries can be searched, queried or browsed by category, and users can contribute, update and maintain the data in many different ways. Each database in MB is described in a semi-structured way using forms and templates. Entries carry data for various fields and allow a free-text description of the resource. In detail, data for each database include a brief description, a URL, a contact email, links to associated literature and various categorization tags. In addition, entries can carry various user comments and annotations. MB has been implemented using MediaWiki (MW), the same software that powers Wikipedia, probably the best known user-contributed resource in the world (http://wikipedia.org). The MediaWiki system allows users to contribute to the project on many different levels, ranging from authors and editors to curators and site designers. Within the MW system, we created one wiki-page per database entry. The information about each database is structured by using a template with named fields. The template stores data for each database internally using the Semantic MediaWiki extension (http://semantic-mediawiki.org), allowing data to be queried within the wiki directly, by additional extensions or via the semantic web. In particular, we use the Semantic Forms extension (http://www.mediawiki.org/wiki/SF) to allow users to create or edit entries and the Semantic Drilldown extension (http://www.mediawiki.org/wiki/SD) to allow users to explore the database. User comments are collected as free text, just like in Wikipedia.
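To make the entry structure concrete, here is a minimal sketch of how one record might be rendered as a MediaWiki template call with named fields, and how a Semantic MediaWiki inline `#ask` query might select entries by tag. The template name `Database`, the field names and the property names `Has tag` and `Has URL` are hypothetical; they illustrate the approach and are not taken from MB's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatabaseEntry:
    """One MB-style record. Field names are illustrative assumptions."""
    name: str                     # used as the wiki page title (one page per database)
    description: str
    url: str
    contact_email: str
    literature: List[str] = field(default_factory=list)  # e.g. PubMed IDs
    tags: List[str] = field(default_factory=list)        # categorization tags

    def to_wikitext(self, template: str = "Database") -> str:
        """Render the entry as a MediaWiki template call with named fields."""
        fields = {
            "description": self.description,
            "url": self.url,
            "contact": self.contact_email,
            "literature": ", ".join(self.literature),
            "tags": ", ".join(self.tags),
        }
        lines = ["{{" + template]
        lines += [f"| {key} = {value}" for key, value in fields.items()]
        lines.append("}}")
        return "\n".join(lines)


def ask_query(tag: str, printouts: List[str]) -> str:
    """Build a Semantic MediaWiki inline #ask query that selects pages
    carrying `tag` and prints the requested properties."""
    condition = f"[[Has tag::{tag}]]"  # hypothetical property name
    columns = "".join(f" |?{p}" for p in printouts)
    return "{{#ask: " + condition + columns + " }}"


if __name__ == "__main__":
    entry = DatabaseEntry(
        name="ExampleDB",  # hypothetical database, not a real MB entry
        description="A hypothetical example database",
        url="http://example.org/",
        contact_email="curator@example.org",
        tags=["protein", "structure"],
    )
    print(entry.to_wikitext())
    print(ask_query("protein", ["Has URL"]))
```

Keeping each field named means the same page can be edited through a form, displayed by a template and queried as structured data, which is the combination described above.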

FEATURES

The MW platform provides a robust base from which to build an online resource. By using MW, many powerful features are provided ‘for free’. The use of MW to support Wikipedia demonstrates the scalability and security of the system, guaranteeing developer support and providing a degree of familiarity to users. Out of the box, MW provides searching, editing, versioning, history and discussion features, as well as user account management and user-email functions. MW also includes a powerful extension framework for easily adding functionality. One criticism of MW is that it provides largely unstructured information, which is not suitable for advanced searching or reporting. To address this limitation, we employ Semantic MediaWiki and Semantic Forms to create a wiki-database system suitable for maintaining a user-contributed database of information.

DATABASE CONTENTS

Currently, there are 1795 entries in MB, each describing a different biological database. The initial release was derived from the content of the 2007 Nucleic Acids Research (NAR) Database Issue (24). Specifically, each database page was ‘seeded’ with text from the Molecular Biology Database Collection provided by NAR (25). Subsequent releases of the collection have been merged into MB on a semi-regular basis. Since the initial release, users have contributed over 100 resources, in addition to the 100 resources that were manually collected from the literature. Most of these manually collected resources were taken from database publications in BMC Bioinformatics and BMC Biology. To date, there have been 19 significant contributors to the project, each of whom has been listed as an author on this publication. This step was taken to highlight the community aspect of the MB project. The homepage has been visited approximately 100 000 times. The project has 80 registered users in total, and there have been approximately 15 000 edits. We hope that with ongoing improvements and increased publicity, usage will continue to grow, helping to establish MB as a powerful reference resource for the community.

FUTURE DIRECTIONS

In the future, we hope to use MB as a resource to allow more communication between database developers and user communities, acting as a common portal for the biological database community. To achieve this goal, we will automatically register the database's contact email address and add the database's discussion page to that user's ‘watch list’. Comments will then automatically alert the contact, providing them with the opportunity to reply. We hope to add user rating functionality and usage statistics to each resource. This will be done with a combination of existing MediaWiki extensions, adding links to social networking sites and automatic queries to collect the number of citations for each resource. We expect that MB could be used as a source of genuine metadata for data integration projects, and we plan to incorporate ontologies such as EDaM (26,27) and the Biomedical Resource Ontology (28), and to develop links with similar projects such as BioCatalogue (29) and BioDBCore (30). Finally, we aim to improve the content of MB through an aggressive marketing strategy, contacting the relevant mailing lists, forums and news groups, as well as exploiting the collection of contact email addresses, thereby encouraging the community to contribute to the maintenance of this important resource.

RELATED WORK

MB is by no means unique. There are many related resources, falling into two broad categories: ‘BioWikis’ and ‘databases of biological databases’. First, there are several other ‘BioWiki’ projects. Like MB, these projects use the tremendously successful MediaWiki software platform to provide user-contributed content to the biological community. For a comprehensive list of important and interesting BioWiki projects, see the BioWiki database on Bioinformatics.Org (http://bioinformatics.org/wiki/BioWiki). The most successful collection of user-contributed content is Wikipedia (http://www.wikipedia.org/). The success of Wikipedia is intimately related to the success of the MediaWiki software platform, leading to a proliferation of wikis, including several BioWiki projects. However, Wikipedia is still a very important resource for biologists (e.g. http://en.wikipedia.org/wiki/Wikipedia:MCB). Wikipedia maintains a sizeable list of biological databases (http://en.wikipedia.org/wiki/List_of_biological_databases), and many of the databases in MB also have articles in Wikipedia. Second, there are several ‘databases of biological databases’, which aim to provide a list of all the most important biological databases and data resources available on the Internet. Several prominent biological database collections and related projects are listed in Table 1 (see also http://metadatabase.org/wiki/Help:Related).

Table 1.

Projects with a similar scope to MB

Name	Description	URL
The Molecular Biology Database Collection	A public on-line resource that lists the databases described in Nucleic Acids Research, together with other databases of value to the biologist (25).	http://www.oxfordjournals.org/nar/database/c/
OBRC: Online Bioinformatics Resources Collection	Contains annotations and links for 1746 bioinformatics databases and software tools.	http://www.hsls.pitt.edu/guides/genetics/obrc/
The Bioinformatics Links Directory	Features curated links to molecular resources, tools and databases (31).	http://bioinformatics.ca/links_directory/
CABRI: Common Access to Biotechnology Resources and Information	A service to search European Biological Resource Centre catalogues. The catalogues may be searched independently, or as one, and the located materials ordered online or by post (32).	http://www.cabri.org/
DBD: Database of Biological Database	Consists of 1200 database entries covering a wide range of databases useful for biological researchers.	http://www.biodbs.info/
BioDBCore	A community-defined description of the core attributes of biological databases (30).	http://biocurator.org/biodbcore.shtml
MetaBasis	A database of metadata for bioinformatics software tools and databases. The system contains 3229 published bioinformatics tools and databases (33).	http://bioserver-1.bioacademy.gr/Metabasis/
Biomed Central Databases	A catalogue of online databases with more than 1100 sites covering a wide range of biomedical topics.	http://databases.biomedcentral.com/
OReFiL	An Online Resource Finder for Life sciences (34).	http://orefil.dbcls.jp/
NIST Data Gateway	Provides easy access to many of the National Institute of Standards and Technology databases, covering many different scientific disciplines.	http://srdata.nist.gov/gateway/

These projects aim to list the most important biological databases and data resources available on the Internet. For a version of this table that you can edit, see http://metadatabase.org/wiki/Help:Related


DISCUSSION

Biological databases have proven crucially important for basic research. However, exponential growth in the volume of biological data has led to several problems. MB is an international, community-based database that aims to list all the commonly used biological databases in the world. Here, we have created a new scientific wiki that addresses some of the issues described earlier. The first version of the system was based on a static database of biological databases that was imported into a wiki system for community annotation. Although similar to several other ‘lists of resources’, MB is unique, being the only truly user-editable list of databases. The NAR Molecular Biology Database Collection is a curated database with strict criteria for inclusion. It covers only a relatively small number of the available molecular biology databases (M. Galperin, personal communication). In contrast, we hope MB, with its liberal wiki-based inclusion policy, might be useful as a wider, more general list with quicker updates.

FUNDING

Industrial Strategic Technology Development Program (10040231), “Bioinformatics platform development for next generation bioinformation analysis”, funded by the Ministry of Knowledge Economy (MKE, Korea). Funding for open access charge: Genome Research Foundation's internal Biowiki funds.

Conflict of interest statement. None declared.

32 in total

1. The ENZYME database in 2000.

Authors: A Bairoch
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. The Biomedical Resource Ontology (BRO) to enable resource discovery in clinical and translational research.

Authors: Jessica D Tenenbaum; Patricia L Whetzel; Kent Anderson; Charles D Borromeo; Ivo D Dinov; Davera Gabriel; Beth Kirschner; Barbara Mirel; Tim Morris; Natasha Noy; Csongor Nyulas; David Rubenson; Paul R Saxman; Harpreet Singh; Nancy Whelan; Zach Wright; Brian D Athey; Michael J Becich; Geoffrey S Ginsburg; Mark A Musen; Kevin A Smith; Alice F Tarantal; Daniel L Rubin; Peter Lyster
Journal: J Biomed Inform Date: 2010-10-16 Impact factor: 6.317

3. The Pfam protein families database.

Authors: Robert D Finn; Jaina Mistry; John Tate; Penny Coggill; Andreas Heger; Joanne E Pollington; O Luke Gavin; Prasad Gunasekaran; Goran Ceric; Kristoffer Forslund; Liisa Holm; Erik L L Sonnhammer; Sean R Eddy; Alex Bateman
Journal: Nucleic Acids Res Date: 2009-11-17 Impact factor: 16.971

4. BioCatalogue: a universal catalogue of web services for the life sciences.

Authors: Jiten Bhagat; Franck Tanoh; Eric Nzuobontane; Thomas Laurent; Jerzy Orlowski; Marco Roos; Katy Wolstencroft; Sergejs Aleksejevs; Robert Stevens; Steve Pettifer; Rodrigo Lopez; Carole A Goble
Journal: Nucleic Acids Res Date: 2010-05-19 Impact factor: 16.971

5. BioXSD: the common data-exchange format for everyday bioinformatics web services.

Authors: Matús Kalas; Pål Puntervoll; Alexandre Joseph; Edita Bartaseviciūte; Armin Töpfer; Prabakar Venkataraman; Steve Pettifer; Jan Christian Bryne; Jon Ison; Christophe Blanchet; Kristoffer Rapacki; Inge Jonassen
Journal: Bioinformatics Date: 2010-09-15 Impact factor: 6.937

6. HPID: the Human Protein Interaction Database.

Authors: Kyungsook Han; Byungkyu Park; Hyongguen Kim; Jinsun Hong; Jong Park
Journal: Bioinformatics Date: 2004-04-29 Impact factor: 6.937

7. ArrayExpress update--an archive of microarray and high-throughput sequencing-based functional genomics experiments.

Authors: Helen Parkinson; Ugis Sarkans; Nikolay Kolesnikov; Niran Abeygunawardena; Tony Burdett; Miroslaw Dylag; Ibrahim Emam; Anna Farne; Emma Hastings; Ele Holloway; Natalja Kurbatova; Margus Lukk; James Malone; Roby Mani; Ekaterina Pilicheva; Gabriella Rustici; Anjan Sharma; Eleanor Williams; Tomasz Adamusiak; Marco Brandizi; Nataliya Sklyar; Alvis Brazma
Journal: Nucleic Acids Res Date: 2010-11-10 Impact factor: 16.971

8. The Online Bioinformatics Resources Collection at the University of Pittsburgh Health Sciences Library System--a one-stop gateway to online bioinformatics databases and software tools.

Authors: Yi-Bu Chen; Ansuman Chattopadhyay; Phillip Bergen; Cynthia Gadd; Nancy Tannery
Journal: Nucleic Acids Res Date: 2006-11-15 Impact factor: 16.971

9. FlyBase: enhancing Drosophila Gene Ontology annotations.

Authors: Susan Tweedie; Michael Ashburner; Kathleen Falls; Paul Leyland; Peter McQuilton; Steven Marygold; Gillian Millburn; David Osumi-Sutherland; Andrew Schroeder; Ruth Seal; Haiyan Zhang
Journal: Nucleic Acids Res Date: 2008-10-23 Impact factor: 16.971

10. WormBase 2007.

Authors: Anthony Rogers; Igor Antoshechkin; Tamberlyn Bieri; Darin Blasiar; Carol Bastiani; Payan Canaran; Juancarlos Chan; Wen J Chen; Paul Davis; Jolene Fernandes; Tristan J Fiedler; Michael Han; Todd W Harris; Ranjana Kishore; Raymond Lee; Sheldon McKay; Hans-Michael Müller; Cecilia Nakamura; Philip Ozersky; Andrei Petcherski; Gary Schindelman; Erich M Schwarz; Will Spooner; Mary Ann Tuli; Kimberly Van Auken; Daniel Wang; Xiaodong Wang; Gary Williams; Karen Yook; Richard Durbin; Lincoln D Stein; John Spieth; Paul W Sternberg
Journal: Nucleic Acids Res Date: 2007-11-08 Impact factor: 16.971
