
BioGRID: a general repository for interaction datasets.

Chris Stark, Bobby-Joe Breitkreutz, Teresa Reguly, Lorrie Boucher, Ashton Breitkreutz, Mike Tyers.

Abstract

Access to unified datasets of protein and genetic interactions is critical for interrogation of gene/protein function and analysis of global network properties. BioGRID is a freely accessible database of physical and genetic interactions available at http://www.thebiogrid.org. BioGRID release version 2.0 includes >116 000 interactions from Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens. Over 30 000 interactions have recently been added from 5778 sources through exhaustive curation of the Saccharomyces cerevisiae primary literature. An internally hyperlinked web interface allows for rapid search and retrieval of interaction data. Full or user-defined datasets are freely downloadable as tab-delimited text files and PSI-MI XML. Pre-computed graphical layouts of interactions are available in a variety of file formats. User-customized graphs with embedded protein, gene and interaction attributes can be constructed with a visualization system called Osprey that is dynamically linked to the BioGRID.

Substances: Multiprotein Complexes

Year: 2006 PMID: 16381927 PMCID: PMC1347471 DOI: 10.1093/nar/gkj109

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Protein interactions assemble the molecular machines of the cell and underlie the dynamics of virtually all cellular responses (1), while genetic interactions reveal functional relationships between and within regulatory modules (2). The sum of all such interactions defines the global regulatory network of the cell (3). Proteomic and functional genomics platform technologies now generate large datasets of protein and genetic interactions, but these datasets vary widely in coverage, data quality, annotation and availability (4,5). The collation of interaction data in a consistent, well-annotated format is essential for interrogation of gene function, investigation of system-level attributes and benchmarking of high throughput (HTP) interaction studies. A number of interaction databases, including BIND (6), DIP (7), HPRD (8), IntAct (9), MINT (10) and MIPS (11), provide a variety of datasets and analysis tools. We have developed a biological General Repository for Interaction Datasets (BioGRID) to house and distribute comprehensive collections of physical and genetic interactions. The precursor to BioGRID was originally conceived as a laboratory information management system (LIMS) for HTP interaction data (12). The first public release of BioGRID (version 1.0; July 2002; then termed GRID) housed HTP two-hybrid and mass spectrometric protein interaction data generated from the budding yeast Saccharomyces cerevisiae (13). The BioGRID has since been expanded into a resource for HTP interaction data from other species, including the nematode worm Caenorhabditis elegans, the fruit fly Drosophila melanogaster and human. In addition, the BioGRID now contains many genetic and protein interactions curated from focused studies reported in the primary literature [Reguly,T., Breitkreutz,A., Boucher,L., Breitkreutz,B.-J., Hon,G., Myers,C., Parsons,A., Friesen,H., Oughtred,R., Tong,A. et al. (2005) Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae (submitted)]. The BioGRID has been queried for over 38 000 000 interactions since its inception. The recent version 2.0 release of BioGRID is a fully integrated cross-species database that supports most major model organisms, with increased data content and improved functionality.

HIGH THROUGHPUT INTERACTIONS

HTP approaches to identify novel protein and gene networks have begun to augment hypothesis-driven biochemical and genetic approaches (14). These hypothesis-generating HTP techniques include the two-hybrid (2-H) method for detecting pair-wise protein interactions (15–17), mass spectrometric (MS) analysis of purified protein complexes (12,18), and the synthetic genetic array (SGA) and molecular barcode (dSLAM) methods for systematic detection of synthetic lethal genetic interactions (19,20). BioGRID currently includes HTP protein interaction datasets from two systematic mass spectrometric studies (12,18) and three two-hybrid studies (15–17) in S.cerevisiae, which total 12 994 interactions between 4478 proteins (Table 1). In addition, BioGRID contains all extant HTP genetic interaction datasets from both SGA and dSLAM approaches (19–22), totaling 6119 interactions between 1440 genes. Finally, BioGRID incorporates large-scale HTP two-hybrid surveys for C.elegans (23) and D.melanogaster (24,25), among others.

Table 1

Total number of interactions currently housed in BioGRID for indicated species

Species	Set	No. of nodes	No. of edges	No. of sources
S.cerevisiae	HTP-PI	4478	12 994	5
	LC-PI	3099	19 744	3132
	HTP-GI	1440	6119	21
	LC-GI	2656	11 234	3581
	Total	5370	50 091	5794
D.melanogaster	HTP-PI	6840	21 944	2
	LC-GI	1312	9164	1398
	Total	7216	31 108	1400
C.elegans	HTP-PI	2801	4453	1
H.sapiens	LC-PI	6374	30 761	11 921
All interactions	Total	21 761	116 413	19 116

Interactions include self edges, multiple sources and multiple experimental evidence types. HTP, high throughput; LC, literature curated; PI, protein interactions; GI, genetic interactions.
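The node, edge and source columns of Table 1 can be illustrated with a small counting sketch. It assumes a hypothetical record layout of (interactor A, interactor B, set label, source) and follows the table footnote in counting every record as an edge, so self edges and pairs repeated across sources or evidence types are all included. Nodes and sources are counted as distinct values, which is presumably why the per-set node and source counts in Table 1 need not sum to a species' Total row: one gene or publication can contribute to several sets.

```python
from collections import Counter, defaultdict


def summarize(records):
    """Count nodes, edges and sources per set, plus a combined "Total" row.

    records: iterable of (interactor_a, interactor_b, set_label, source) tuples.
    This layout is hypothetical, not BioGRID's internal schema.
    """
    nodes = defaultdict(set)
    edges = Counter()
    sources = defaultdict(set)
    for a, b, set_label, source in records:
        for key in (set_label, "Total"):
            nodes[key].update((a, b))  # a self edge contributes a single node
            edges[key] += 1            # every record counts as an edge
            sources[key].add(source)   # only distinct sources are counted
    return {key: (len(nodes[key]), edges[key], len(sources[key])) for key in edges}


# Toy records with placeholder gene and source names.
records = [
    ("A", "B", "HTP-PI", "src1"),
    ("A", "B", "HTP-PI", "src2"),  # same pair reported by a second source
    ("B", "C", "HTP-PI", "src1"),
    ("C", "C", "LC-GI", "src1"),   # self edge; src1 also appears in HTP-PI
]

for label, (n_nodes, n_edges, n_sources) in summarize(records).items():
    print(label, n_nodes, n_edges, n_sources)
```

Here the HTP-PI and LC-GI rows together list 4 nodes and 3 sources, while the Total row reports 3 and 2, because gene C and source src1 are shared between the two sets.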

LITERATURE-DERIVED INTERACTIONS

HTP datasets are laden with false-positive and false-negative interactions (4,5). This shortfall compromises both prediction of gene/protein function and network-level analysis. The primary literature contains a vast collection of well-validated physical and genetic interactions that, while searchable on a publication-by-publication basis in PubMed, are not available in a relational database. A comprehensive set of literature-derived interactions would serve as a gold standard both for HTP datasets and for automated text mining approaches, augment the predictive power of HTP data and enable a re-analysis of global network properties. Spurred on by these potential applications, several databases (6–11), as well as the Gene Ontology (GO) consortium (26), are undertaking significant efforts to curate interaction data from the primary literature. We have recently manually curated the entire S.cerevisiae literature for protein and genetic interactions [Reguly,T., Breitkreutz,A., Boucher,L., Breitkreutz,B.-J., Hon,G., Myers,C., Parsons,A., Friesen,H., Oughtred,R., Tong,A. et al., submitted for publication]. This comprehensive curation effort yielded 19 744 protein interactions and 11 234 genetic interactions, all of which have been placed into BioGRID. We note that this literature dataset (30 978 interactions) exceeds all S.cerevisiae HTP datasets combined (19 113 interactions; Table 1). BioGRID also contains imports of 10 943 literature-derived genetic interactions from FlyBase (27) and 30 761 literature-derived interactions from HPRD (8). The total number of literature interactions in BioGRID currently stands at over 70 000 (Table 1). In addition to the S.cerevisiae literature, we have curation efforts underway for the fission yeast Schizosaccharomyces pombe, the fruit fly Drosophila melanogaster and focused aspects of the human protein interaction literature, all of which will be deposited in BioGRID.
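One simple way to use a literature-curated set as a gold standard for an HTP dataset is to ask what fraction of the HTP interactions it corroborates. The sketch below is a hypothetical illustration with toy gene names; it treats interactions as unordered pairs and ignores evidence type. Assuming the literature set contains no false positives, the resulting fraction is only a lower bound on the precision of the HTP data, since literature coverage is itself incomplete.

```python
def pair_key(a, b):
    """Order-independent key, so A-B and B-A count as one interaction."""
    return frozenset((a, b))


def support_rate(htp_pairs, literature_pairs):
    """Fraction of distinct HTP interactions also found in the literature set."""
    htp = {pair_key(a, b) for a, b in htp_pairs}
    literature = {pair_key(a, b) for a, b in literature_pairs}
    if not htp:
        return 0.0
    return len(htp & literature) / len(htp)


htp_pairs = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
literature_pairs = [("B", "A"), ("C", "B"), ("E", "F")]
print(support_rate(htp_pairs, literature_pairs))  # 0.5
```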

SEARCH FEATURES

The primary method of data access for BioGRID is the web-based search interface. A combination of JavaScript, PHP and Cascading Style Sheets (CSS) yields an interface that is easy to interpret and navigate. BioGRID is supported by all major standards-compliant web browsers. Searches may be based on a wide range of supported identifiers, including gene name, ORF name, PubMed ID and free text. All genes/proteins retrieved by a query are listed in tabular format and are internally hyperlinked to allow rapid recursive searches. The BioGRID search interface retrieves the results, collapses the redundant interactions often found within large datasets and across combined datasets, and provides an annotation-rich results page for further investigation (Figure 1). Annotation features include descriptions of gene/protein function and GO biological process, molecular function and cellular component terms (26).
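Collapsing redundant interactions into a single result row can be sketched as grouping records by an unordered gene pair and merging their evidence types and sources. The record layout, field names and toy data here are assumptions for illustration; the text does not describe how BioGRID implements this step.

```python
from collections import defaultdict


def collapse(records):
    """Merge records that describe the same unordered pair of genes.

    records: iterable of (gene_a, gene_b, evidence, source) tuples
    (a hypothetical layout). Returns a dict keyed by the sorted pair whose
    values hold the union of evidence types and sources for that pair.
    """
    merged = defaultdict(lambda: {"evidence": set(), "sources": set()})
    for a, b, evidence, source in records:
        key = tuple(sorted((a, b)))  # A-B and B-A map to the same row
        merged[key]["evidence"].add(evidence)
        merged[key]["sources"].add(source)
    return dict(merged)


records = [
    ("A", "B", "two-hybrid", "src1"),
    ("B", "A", "mass spectrometry", "src2"),
    ("A", "B", "two-hybrid", "src2"),
    ("A", "C", "synthetic lethality", "src3"),
]
for pair, info in collapse(records).items():
    print(pair, sorted(info["evidence"]), sorted(info["sources"]))
```

The four input records become two result rows: one for A-B carrying both evidence types and both sources, and one for A-C.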

Figure 1

A sample search and result page provided by the BioGRID for the query yeast gene KSS1. Annotated results are collapsed to remove redundancy and hyperlinked to allow recursive searches and access to external resources. Graphical representation at upper left shows all interactions annotated with color-coded GO terms and experimental evidence. Graphs are generated by Osprey and may be downloaded in JPEG, PNG and SVG formats.

VISUALIZATION

As network complexity increases, tabular formats for data display quickly overwhelm human comprehension. Graphical representation of interaction networks not only enables a high density of data to be visualized but also immediately conveys complex inter-relationships between graph nodes, in this case either proteins or genes. A defining feature of the BioGRID database is a dynamically linked visualization tool called Osprey that runs as a desktop application in Windows, Linux and Mac OS X environments (28). Osprey provides a facile graphical interface for querying BioGRID datasets, from which the user can build custom graphical representations of any chosen set of interactions. Osprey represents individual genes/proteins as nodes and interactions as edges that connect nodes. Additional color-coded annotation is embedded in nodes and edges to represent GO categories, experimental evidence and/or data source information. A variety of graphical layouts and toggle options afford different views of the network. The Osprey file format captures all annotation associated with each node/edge in the graph, and can thus be used as a graphical file exchange format for interaction data. User-defined datasets can be uploaded into Osprey for annotation and integration with public datasets in BioGRID. Osprey graphs can also be saved in JPEG, PNG and SVG file formats for figure construction. Pre-computed graphical representations of the first-order interaction shell for every gene/protein in the BioGRID are included on each results page and are available for direct download (Figure 1).
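A first-order interaction shell can be extracted from an edge list as sketched below. The exact shell definition behind BioGRID's pre-computed images is not given in the text; this sketch assumes the shell is the query plus its direct interactors, together with every edge between any two of those nodes. A variant that draws only edges touching the query would drop the A-B edge in the example. Gene names are placeholders.

```python
from collections import defaultdict


def first_order_shell(edges, query):
    """Return the nodes and edges of the query's first-order interaction shell.

    Assumption: the shell is the query plus its direct interactors, together
    with every edge whose two endpoints both lie in that node set.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = {query} | neighbors.get(query, set())
    shell_edges = {tuple(sorted((a, b))) for a, b in edges if a in nodes and b in nodes}
    return nodes, shell_edges


edges = [("Q", "A"), ("Q", "B"), ("A", "B"), ("B", "C"), ("C", "D")]
nodes, shell_edges = first_order_shell(edges, "Q")
print(sorted(nodes))        # ['A', 'B', 'Q']
print(sorted(shell_edges))  # [('A', 'B'), ('A', 'Q'), ('B', 'Q')]
```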

DATABASE STRUCTURE AND ANNOTATION

The BioGRID web interface was developed with PHP 5.0.4 and is hosted on an Apache 2.0 web server at our primary mirror. The entire package is capable of running on any PHP 4.x compatible web server, and has been tested successfully on IIS, Apache 1.3 and Apache 2.0. BioGRID currently uses the freely available MySQL 4.1 as its primary database management system for both the web-based interface and interaction curation. The BioGRID is readily established on in-house servers and is easily adapted as an internal data management system by individual laboratories. Consistent annotation is essential in order to collapse redundant interactions into a single search result and to ensure the accuracy of queries and results. All ancillary annotation is compiled from over 25 popular web-based resources, then extracted and stored by an annotation compilation system (ACS) written in Java (Java SDK version 1.4.2). BioGRID annotation tables are updated monthly and made freely available via the web-based interface. The BioGRID ACS currently supports 294 140 genes in 13 different organisms: Saccharomyces cerevisiae, Schizosaccharomyces pombe, Caenorhabditis elegans, Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Canis familiaris, Bos taurus, Arabidopsis thaliana, Xenopus laevis, Takifugu rubripes and Danio rerio.
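Why consistent annotation matters for collapsing redundancy can be shown with a small identifier-normalization sketch: every name and synonym is mapped to one canonical identifier before pairs are compared, so the same interaction reported under different gene names yields the same key. The annotation rows, identifiers and function names below are hypothetical; the ACS's real tables and logic are not described in the text. This sketch refuses ambiguous synonyms rather than silently choosing one.

```python
def build_alias_index(annotation_rows):
    """Map every upper-cased name or synonym to its canonical identifier.

    annotation_rows: iterable of (canonical_id, synonyms) pairs, a hypothetical
    stand-in for compiled annotation tables. Raises ValueError if one name
    points to two different canonical IDs.
    """
    index = {}
    for canonical_id, synonyms in annotation_rows:
        for name in (canonical_id, *synonyms):
            key = name.strip().upper()
            if index.get(key, canonical_id) != canonical_id:
                raise ValueError(f"ambiguous name: {name}")
            index[key] = canonical_id
    return index


def normalize_pair(a, b, index):
    """Return the sorted canonical pair, or None if either name is unknown."""
    id_a = index.get(a.strip().upper())
    id_b = index.get(b.strip().upper())
    if id_a is None or id_b is None:
        return None
    return tuple(sorted((id_a, id_b)))


index = build_alias_index([("ID001", ["GENE1", "g1"]), ("ID002", ["GENE2"])])
print(normalize_pair("gene1", "GENE2", index))    # ('ID001', 'ID002')
print(normalize_pair("Gene2", "G1", index))       # ('ID001', 'ID002')
print(normalize_pair("GENE1", "unknown", index))  # None
```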

DOWNLOADS AND ACCESS

All interaction data in BioGRID are freely downloadable from the BioGRID website. Data are available in multiple formats, including tab-delimited text and PSI-MI XML (29), as well as Osprey and other graphical file formats. BioGRID supports the PSI-MI version 2.5 data exchange standard, as mandated by the International Molecular Exchange Consortium (IMEx), which aims to facilitate the open distribution of interaction data. Interaction data are updated regularly, and all downloadable files are refreshed to reflect the most recent changes. Download files are customizable by publication, record, organism and experimental system. To maximize performance and minimize database downtime, mirror versions of BioGRID are under construction in the US and Europe. Information on curation contributions or hosting a mirror may be obtained from the BioGRID website. Source code is freely available on request. The BioGRID is actively linked to the Saccharomyces Genome Database (30), FlyBase (27) and GermOnline (31) websites.
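A tab-delimited download can be read and filtered with Python's standard `csv` module, for example to mirror the organism and experimental-system customization described above. The column names in this sketch are hypothetical placeholders, not BioGRID's actual header; check the header line of a real file before relying on any of them.

```python
import csv
import io

# Toy tab-delimited content with hypothetical column names.
SAMPLE = (
    "INTERACTOR_A\tINTERACTOR_B\tEXPERIMENTAL_SYSTEM\tORGANISM\tSOURCE\n"
    "A\tB\tTwo-hybrid\tS.cerevisiae\tsrc1\n"
    "B\tC\tSynthetic lethality\tS.cerevisiae\tsrc2\n"
    "D\tE\tTwo-hybrid\tD.melanogaster\tsrc3\n"
)


def filter_interactions(handle, organism=None, system=None):
    """Yield rows (as dicts) matching the optional organism and system filters."""
    for row in csv.DictReader(handle, delimiter="\t"):
        if organism is not None and row["ORGANISM"] != organism:
            continue
        if system is not None and row["EXPERIMENTAL_SYSTEM"] != system:
            continue
        yield row


for row in filter_interactions(io.StringIO(SAMPLE), organism="S.cerevisiae"):
    print(row["INTERACTOR_A"], row["INTERACTOR_B"], row["EXPERIMENTAL_SYSTEM"])
```

For a file on disk, the `io.StringIO` wrapper would be replaced by `open(path, newline="")`, as the `csv` module documentation recommends.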

FUTURE DEVELOPMENT

We will continue to curate interactions from major model organisms, including human, which will be posted as monthly updates of interaction data. Annotation will be routinely updated to allow unambiguous retrieval of protein/gene names. Capability to house quantitative genetic interactions and curated post-translational modifications will be implemented in the near future. We also plan to support complex and pathway descriptions, and to enable cross-species predictions through BLAST-based alignments of orthologous networks (32). A planned open source release version of the BioGRID platform, called ProtoGRID, will simplify installation of local versions of BioGRID. Similarly, the curation management system will be released to facilitate curation of interaction data by interested groups. Finally, graphical representations will be augmented through network clustering based on user-defined attributes, including co-expression and co-localization.
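The idea behind cross-species prediction can be illustrated with a much simpler stand-in than the planned BLAST-based network alignment (32): each known interaction is transferred to every pair of orthologs of its two genes, given an ortholog table assumed to be precomputed (for example from BLAST hits). This is not the alignment method of reference 32, and the gene names and ortholog table below are hypothetical.

```python
def transfer_interactions(source_edges, orthologs):
    """Predict target-species interactions from source-species ones.

    orthologs: dict mapping a source-species gene to a set of target-species
    orthologs, assumed precomputed elsewhere. Genes without orthologs are skipped.
    """
    predicted = set()
    for a, b in source_edges:
        for a_target in orthologs.get(a, ()):
            for b_target in orthologs.get(b, ()):
                predicted.add(tuple(sorted((a_target, b_target))))
    return predicted


source_edges = [("y1", "y2"), ("y2", "y3")]
orthologs = {"y1": {"h1"}, "y2": {"h2a", "h2b"}}  # y3 has no ortholog
predicted = transfer_interactions(source_edges, orthologs)
print(sorted(predicted))  # [('h1', 'h2a'), ('h1', 'h2b')]
```

Because one gene may have several orthologs, a single source interaction can yield several predictions, which is one reason alignment-based approaches that weigh network context are preferred in practice.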

REFERENCES (32 in total)

1. From molecular to modular cell biology.

Authors: L H Hartwell; J J Hopfield; S Leibler; A W Murray
Journal: Nature Date: 1999-12-02 Impact factor: 49.962

2. Assembly of cell regulatory systems through protein interaction domains.

Authors: Tony Pawson; Piers Nash
Journal: Science Date: 2003-04-18 Impact factor: 47.728

3. Toward a protein-protein interaction map of the budding yeast: A comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins.

Authors: T Ito; K Tashiro; S Muta; R Ozawa; T Chiba; M Nishizawa; K Yamamoto; S Kuhara; Y Sakaki
Journal: Proc Natl Acad Sci U S A Date: 2000-02-01 Impact factor: 11.205

4. Functional genomics and proteomics: charting a multidimensional map of the yeast cell.

Authors: Gary D Bader; Adrian Heilbut; Brenda Andrews; Mike Tyers; Timothy Hughes; Charles Boone
Journal: Trends Cell Biol Date: 2003-07 Impact factor: 20.808

5. The synthetic genetic interaction spectrum of essential genes.

Authors: Armaity P Davierwala; Jennifer Haynes; Zhijian Li; Renée L Brost; Mark D Robinson; Lisa Yu; Sanie Mnaimneh; Huiming Ding; Hongwei Zhu; Yiqun Chen; Xin Cheng; Grant W Brown; Charles Boone; Brenda J Andrews; Timothy R Hughes
Journal: Nat Genet Date: 2005-09-11 Impact factor: 38.330

6. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae.

Authors: P Uetz; L Giot; G Cagney; T A Mansfield; R S Judson; J R Knight; D Lockshon; V Narayan; M Srinivasan; P Pochart; A Qureshi-Emili; Y Li; B Godwin; D Conover; T Kalbfleisch; G Vijayadamodar; M Yang; M Johnston; S Fields; J M Rothberg
Journal: Nature Date: 2000-02-10 Impact factor: 49.962

7. Systematic genetic analysis with ordered arrays of yeast deletion mutants.

Authors: A H Tong; M Evangelista; A B Parsons; H Xu; G D Bader; N Pagé; M Robinson; S Raghibizadeh; C W Hogue; H Bussey; B Andrews; M Tyers; C Boone
Journal: Science Date: 2001-12-14 Impact factor: 47.728

8. A comprehensive two-hybrid analysis to explore the yeast protein interactome.

Authors: T Ito; T Chiba; R Ozawa; M Yoshida; M Hattori; Y Sakaki
Journal: Proc Natl Acad Sci U S A Date: 2001-03-13 Impact factor: 11.205

9. Development of human protein reference database as an initial platform for approaching systems biology in humans.

Authors: Suraj Peri; J Daniel Navarro; Ramars Amanchy; Troels Z Kristiansen; Chandra Kiran Jonnalagadda; Vineeth Surendranath; Vidya Niranjan; Babylakshmi Muthusamy; T K B Gandhi; Mads Gronborg; Nieves Ibarrola; Nandan Deshpande; K Shanker; H N Shivashankar; B P Rashmi; M A Ramya; Zhixing Zhao; K N Chandrika; N Padma; H C Harsha; A J Yatish; M P Kavitha; Minal Menezes; Dipanwita Roy Choudhury; Shubha Suresh; Neelanjana Ghosh; R Saravana; Sreenath Chandran; Subhalakshmi Krishna; Mary Joy; Sanjeev K Anand; V Madavan; Ansamma Joseph; Guang W Wong; William P Schiemann; Stefan N Constantinescu; Lily Huang; Roya Khosravi-Far; Hanno Steen; Muneesh Tewari; Saghi Ghaffari; Gerard C Blobe; Chi V Dang; Joe G N Garcia; Jonathan Pevsner; Ole N Jensen; Peter Roepstorff; Krishna S Deshpande; Arul M Chinnaiyan; Ada Hamosh; Aravinda Chakravarti; Akhilesh Pandey
Journal: Genome Res Date: 2003-10 Impact factor: 9.043

10. The Biomolecular Interaction Network Database and related tools 2005 update.

Authors: C Alfarano; C E Andrade; K Anthony; N Bahroos; M Bajec; K Bantoft; D Betel; B Bobechko; K Boutilier; E Burgess; K Buzadzija; R Cavero; C D'Abreo; I Donaldson; D Dorairajoo; M J Dumontier; M R Dumontier; V Earles; R Farrall; H Feldman; E Garderman; Y Gong; R Gonzaga; V Grytsan; E Gryz; V Gu; E Haldorsen; A Halupa; R Haw; A Hrvojic; L Hurrell; R Isserlin; F Jack; F Juma; A Khan; T Kon; S Konopinsky; V Le; E Lee; S Ling; M Magidin; J Moniakis; J Montojo; S Moore; B Muskat; I Ng; J P Paraiso; B Parker; G Pintilie; R Pirone; J J Salama; S Sgro; T Shan; Y Shu; J Siew; D Skinner; K Snyder; R Stasiuk; D Strumpf; B Tuekam; S Tao; Z Wang; M White; R Willis; C Wolting; S Wong; A Wrong; C Xin; R Yao; B Yates; S Zhang; K Zheng; T Pawson; B F F Ouellette; C W V Hogue
Journal: Nucleic Acids Res Date: 2005-01-01 Impact factor: 16.971
