The BioGRID interaction database: 2017 update.

Andrew Chatr-Aryamontri¹, Rose Oughtred², Lorrie Boucher³, Jennifer Rust², Christie Chang², Nadine K Kolas³, Lara O'Donnell³, Sara Oster³, Chandra Theesfeld², Adnane Sellam⁴, Chris Stark³, Bobby-Joe Breitkreutz³, Kara Dolinski², Mike Tyers^5,3.

Abstract

The Biological General Repository for Interaction Datasets (BioGRID: https://thebiogrid.org) is an open access database dedicated to the annotation and archival of protein, genetic and chemical interactions for all major model organism species and humans. As of September 2016 (build 3.4.140), the BioGRID contains 1 072 173 genetic and protein interactions, and 38 559 post-translational modifications, as manually annotated from 48 114 publications. This dataset covers interaction records for 66 model organisms and represents a 30% increase compared to the previous 2015 BioGRID update. BioGRID curates the biomedical literature for major model organism species, including humans, with a recent emphasis on central biological processes and specific human diseases. To facilitate network-based approaches to drug discovery, BioGRID now incorporates 27 501 chemical-protein interactions for human drug targets, as drawn from the DrugBank database. A new dynamic interaction network viewer allows the easy navigation and filtering of all genetic and protein interaction data, as well as of bioactive compounds and their established targets. BioGRID data are directly downloadable without restriction in a variety of standardized formats and are freely distributed through partner model organism databases and meta-databases.

Year: 2016 PMID: 27980099 PMCID: PMC5210573 DOI: 10.1093/nar/gkw1102

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Biological interactions, whether functional interactions between genes or physical interactions between proteins and other biomolecules, provide the framework to understand how the cell is structured and controlled. Interaction networks reflect the organization of cellular pathways and are hence essential for the interpretation of genomic, functional genomic, phenotypic and chemical screen data. Since the advent of high-throughput approaches to detect protein (1–3) and genetic interactions (4), the depth of coverage and accuracy of biological interaction networks has continued to improve (5,6), with increased resolution at the level of protein isoforms (7) and metabolites (8,9). Extensive network contexts now provide a basis for the rationalization of perturbations caused by disease-associated mutations (10–12) and have helped deconvolve complex mutational profiles generated by genome-wide association studies (GWAS) and next-generation sequencing-based approaches for analysis of the genome (13), transcriptome, and epigenome (14). The network paradigm thus holds the promise of predictive and precision medicine, as illustrated for example by the synthetic lethal interaction networks between cancer driver mutations and established drug targets (15). The Biological General Repository for Interaction Datasets (BioGRID: https://thebiogrid.org) was originally implemented in 2003 (16) to provide open access to high-throughput (HTP) interaction datasets (1–3), and subsequently to augment and benchmark HTP data with interactions drawn from focused low-throughput (LTP) studies in the biomedical literature (17). Since its inception as a yeast-specific database (16), BioGRID has grown to cover interactions from 66 different species, including all major model organisms and humans. 
Correspondingly, the number of annotated interactions in BioGRID has increased from 30 000 protein interactions in the original version to more than one million protein, genetic and chemical interactions in the current release (Figure 1).

Figure 1.

Increase in data content of BioGRID. Increments in interaction records and source publications reported in BioGRID from March 2010 (release 2.0.62) to September 2016 (release 3.4.140). Left panel shows the increase of annotated protein interactions (red), genetic interactions (green) and total interactions (blue). Right panel shows the number of publications that actually reported protein or genetic interactions (blue) as a function of the total number of publications examined by BioGRID curators (red).

The primary focus of BioGRID is the manual curation of experimentally validated genetic and protein interactions that are reported in peer-reviewed biomedical publications. Text-mining approaches are now routinely used to accelerate and prioritize expert manual curation in BioGRID (18–20). All interactions in BioGRID are annotated by curators according to a structured set of experimental evidence codes. In addition BioGRID captures post-translational modification (PTM) data, such as sites of phosphorylation and ubiquitination, from both LTP and HTP studies. Recently, BioGRID has extended its data content to include the protein and/or genetic interactions of drugs, metabolites and other bioactive small molecules. The current version of BioGRID includes a newly implemented network viewer that allows visualization of all search results in an interactive graphical format. Additional new custom page views also allow the interrogation of PTM sites and chemical interaction data. BioGRID content is updated on a monthly basis and is made freely accessible via the web interface, through downloadable files in standardized formats, and through dissemination by model organism database (MOD) partners (21–26) and other biological resources (27–32).

DATABASE GROWTH AND STATISTICS

Since our 2015 NAR Database report (33), the number of curated interactions housed in BioGRID has increased by 30%. As of September 2016 (version 3.4.140), BioGRID contains 1 072 173 protein and genetic interactions, of which 836 212 are non-redundant interactions. These interactions correspond to 621 639 (470 810 non-redundant) protein interactions and 450 534 (373 762 non-redundant) genetic interactions (Table 1). These data were directly extracted from 47 223 manually annotated peer-reviewed publications, which were identified from the biomedical literature by keyword searches and text-mining approaches. Extensive manual inspection of candidate abstracts and/or full text papers reveals that approximately one in four candidate publications actually contains experimentally documented interaction data, such that many more publications are parsed by curators than are entered into BioGRID as sources of interaction data (Figure 1). All BioGRID interaction records are directly mapped to experimental evidence in the supporting publication, as classified by a structured set of evidence codes that map to the PSI-MI 2.5 standard (34). The BioGRID also currently contains data on 38 559 protein PTMs as curated from 4317 publications. These PTM data are now drawn mainly from high-throughput mass spectrometry studies, which are able to routinely survey many thousands of modification sites for any given PTM type (35). All yeast phosphorylation site data in BioGRID are also currently housed in the PhosphoGRID database (36), but this older database has been essentially subsumed by the new PTM functionalities of BioGRID (see below). BioGRID curation has recently focused on improved coverage for PTMs. For example, a recent themed project on the ubiquitin proteasome system has documented 130 184 sites of ubiquitin modification on proteins encoded by 8681 human genes and 20 019 sites on 2549 yeast proteins, all of which will be released as a consolidated dataset by the end of 2016.
A new aspect of BioGRID is the coverage of chemical interactions, typically for drug-protein targets and/or drug-gene interactions. BioGRID release 3.3.123 (April 2015) included 27 034 chemical–target interaction records drawn from DrugBank, a database of manually curated drug-target relationships (37). At present, BioGRID contains 2519 unique genes/proteins from 21 organisms that are linked to 4999 total unique chemicals as curated from 8989 publications. 2129 of these interactions are between drugs or other bioactive agents and human genes or proteins.
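The relationship between total and non-redundant counts quoted above can be illustrated with a short Python sketch. It assumes that "non-redundant" means each unordered pair of interactors is counted once regardless of how many records support it; that definition, the helper name and the toy records are assumptions for illustration and are not taken from BioGRID itself.

```python
# Counts quoted in the text for release 3.4.140.
protein_total, protein_nr = 621_639, 470_810
genetic_total, genetic_nr = 450_534, 373_762

# The raw protein and genetic records add up to the reported total.
assert protein_total + genetic_total == 1_072_173

# The two non-redundant counts sum to more than the reported 836 212.
# One possible reading (an assumption) is that pairs supported by both
# protein and genetic evidence are counted once in the combined figure.
overlap = protein_nr + genetic_nr - 836_212


def count_non_redundant(records):
    """Count unordered interactor pairs, ignoring repeated evidence."""
    return len({frozenset((a, b)) for a, b, _evidence in records})


# Hypothetical toy records: (interactor A, interactor B, evidence code).
toy_records = [
    ("GENE1", "GENE2", "Two-hybrid"),
    ("GENE2", "GENE1", "Affinity Capture-MS"),  # same pair, new evidence
    ("GENE1", "GENE3", "Synthetic Lethality"),
]

print(len(toy_records), count_non_redundant(toy_records), overlap)
```

Under this reading, three toy records collapse to two non-redundant pairs, in the same way that the 1 072 173 curated records collapse to a smaller non-redundant set.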

Table 1.

Increase in BioGRID data content since previous update

		August 2014 (3.2.115)			September 2016 (3.4.140)
Organism	Type	Nodes	Edges	Publications	Nodes	Edges	Publications
Arabidopsis thaliana	PI	7200	21 536	1414	9479	41 918	2168
	GI	112	192	66	246	298	125
Caenorhabditis elegans	PI	3288	6345	178	3277	6341	190
	GI	1129	2344	30	1123	2330	31
Drosophila melanogaster	PI	8076	37 606	416	8236	38 638	454
	GI	1042	9980	1483	1042	9979	1482
Escherichia coli	PI	105	102	14	108	109	17
	GI	34	25	11	4000	166 137	15
Homo sapiens	PI	18 435	237 498	23 388	20 914	365 547	25 383
	GI	1364	1678	273	1577	1663	283
Mus musculus	PI	8276	22 563	3207	11 892	38 163	3529
	GI	259	290	167	275	309	176
Saccharomyces cerevisiae	PI	6410	135 690	7402	6299	131 659	8074
	GI	5674	207 188	7257	5719	212 092	7880
Schizosaccharomyces pombe	PI	2694	11 270	1146	2946	12 817	1247
	GI	3158	56 745	1359	3208	57 847	1459
Other organisms	ALL	7999	12 367	1920	9688	14 814	2250
Total	ALL	55 528	749 912	43 149	65 031	1 072 173	47 223

Data drawn from monthly releases 3.2.115 and 3.4.140 of BioGRID. Nodes refer to genes or proteins, edges refer to interactions. PI, protein (physical) interactions; GI, genetic interactions. All numbers represent total interactions curated.

In 2016, Google Analytics reported that the BioGRID received on average 124 232 page views and 14 444 unique visitors per month, versus 88 080 page views and 12 399 unique visitors per month in 2014. We estimate that these page views correspond to perusal of ∼24 million interactions by BioGRID users in 2016. BioGRID data files were downloaded on average 10 135 times per month in 2016, compared with 9256 downloads per month in 2014. These statistics do not include the widespread dissemination of BioGRID records by various partner databases, which include the MODs Saccharomyces Genome Database (25), PomBase (23), Candida Genome Database (38), WormBase (26), FlyBase (24), TAIR (39), ZFIN (21) and MGD (22) and the meta-database resources NCBI (29), UniProt (28), Pathway Commons (30), BeagleDB (40) and others. In 2016, the BioGRID user base was located primarily in the USA (29%), followed by China (10%), United Kingdom (7%), Germany (6%), Canada (5%), Japan (4%), India (4%), France (4%) and all other countries (31%).

OVERALL CURATION STRATEGY

All interactions in BioGRID must be directly supported by experimental evidence in the source publications as identified by BioGRID curators. All curation activity is controlled by a dedicated Interaction Management System (IMS) that serves as the primary curator interface (33). The IMS is used to build publication lists and standardize all aspects of curation, including controlled vocabularies for experimental evidence, interaction types and gene names. BioGRID interaction annotation is based on Entrez Gene identifiers for genes and proteins. RefSeq protein identifiers are used for the annotation of PTMs, which are typically mapped to RefSeq by mass spectrometry search engines (see the BioGRID WikiPage for further details; URL: https://wiki.thebiogrid.org/doku.php/identifiers). The IMS also tracks all curator contributions for dispute resolution and curation consistency. BioGRID currently contains interaction data for 66 model organisms at varying depths of coverage. BioGRID continues to maintain complete curation of the primary literature for genetic and protein interactions in the model yeasts Saccharomyces cerevisiae (343 751 total interactions, 231 326 non-redundant interactions) and Schizosaccharomyces pombe (70 664 total interactions, 57 699 non-redundant interactions). These datasets are updated on a monthly basis and released for redistribution through the Saccharomyces Genome Database (25) and PomBase (23). Comprehensive curation of protein interactions is also maintained for the model plant Arabidopsis thaliana (39). It is not possible to manually curate the vast and ever-expanding human biomedical literature, which currently exceeds 16 million publications reported in PubMed with an average growth rate of >600 000 papers per year over the past 5 years. 
Instead, to achieve meaningful depth of coverage in key areas of human biology and disease, BioGRID has established a number of ongoing themed projects centered on cellular functions relevant for central cellular processes and/or major human diseases. Current themed curation projects on particular biological processes include inflammation, chromatin modification, autophagy, the ubiquitin proteasome system, the DNA damage response, phosphorylation-based signaling and stem cell regulators. Curation projects focused on particular diseases include cardiovascular disease and hypertension, brain cancer, and prevalent infectious diseases, such as tuberculosis and HIV. A recent example of a themed curation project on a central biological process is the conserved autophagy network that targets damaged organelles, cytoplasmic material, and pathogens to the lysosome for degradation. This process is mediated by a core set of 18 autophagy (ATG) proteins that drive membrane formation to mediate both selective and non-selective degradation (41). An additional 116 human genes associated with autophagy were identified through Gene Ontology (GO) annotations and ongoing literature review. This set of 134 genes was used to build a candidate publication list of over 7603 papers for review by BioGRID curators, of which 1277 publications yielded 7888 interactions that were entered into BioGRID. A recent example of a themed curation project with a disease focus is on glioblastoma (GBM) in collaboration with the Stand Up to Cancer (SU2C) team on brain cancer (see www.standup2cancer.ca). GBM has amongst the worst long term survival rates of all human cancers, in part because the brain tumour stem cells (BTSCs) that drive tumour growth readily acquire drug resistance (42). 
BioGRID curators have coordinated with scientists and clinicians in the SU2C team to identify a core set of 31 genes implicated in GBM, as drawn from cancer genome analyses for genes either mutated or of altered copy number in patient-derived tumour samples (43,44). From this GBM-associated gene list, 2443 papers have been curated to date to yield 8781 interactions. Once completed, this literature-derived GBM interaction network will serve as a resource for the interpretation of genome-scale sequence, transcriptional, epigenetic, proteomic and genetic datasets in GBM, with the goal of identifying new drug targets and drug combinations that are effective against this deadly cancer (42). Important curation contributions are also made by model organism and other database partners, and are prominently attributed through a ‘curated by’ icon that hyperlinks the record to the original source database. These attributions are listed directly in all search results for the entire BioGRID website and are also provided in download files. The BioGRID also works in close conjunction with the GO consortium (45), both to guide BioGRID curation efforts based on relevant GO terms, and on occasion to help elaborate branches of the GO, for example as pertains to the ubiquitin proteasome system. Finally, BioGRID supports pre-publication deposition of experimental results to facilitate rapid dissemination of HTP datasets generated by resource centers and other groups. For example, BioGRID curators have assisted with the formatting and upload of large-scale genetic interaction datasets for immediate release upon publication (46). In another instance, the biophysical interactions of ORFeome-based complexes (BioPlex) network generated by high-throughput affinity-purification mass spectrometry (6), was uploaded in BioGRID well in advance of publication in order to provide immediate open access to the dataset. 
Pre-publication data records are fully archived on a monthly basis along with all other BioGRID interactions but are excluded from BioGRID downloads until conversion into full BioGRID records upon publication of the dataset.
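The curation yields of the two themed projects described above can be worked out directly from the figures in the text. The following Python sketch only restates that arithmetic; the dictionary layout and variable names are illustrative.

```python
# Figures quoted in the text for the two themed curation projects.
projects = {
    "autophagy": {"papers_with_data": 1277, "interactions": 7888},
    "glioblastoma": {"papers_with_data": 2443, "interactions": 8781},
}

# Only the autophagy project reports the size of its candidate list.
autophagy_candidates = 7603
hit_rate = projects["autophagy"]["papers_with_data"] / autophagy_candidates

# Average number of interactions extracted per curated paper.
per_paper = {
    name: p["interactions"] / p["papers_with_data"]
    for name, p in projects.items()
}

print(f"autophagy hit rate: {hit_rate:.1%}")
for name, value in per_paper.items():
    print(f"{name}: {value:.2f} interactions per paper")
```

For the autophagy project this gives roughly one informative paper for every six candidates screened, and about six interactions per informative paper, compared with between three and four interactions per paper for the glioblastoma project to date.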

TEXT MINING

The vast amount of free form text that is used to report essentially all biomedical knowledge in journals precludes the manual distillation and annotation of biological interactions. Unfortunately, despite recent improvements in natural language processing based on artificial intelligence approaches (47), fully automated text mining systems (TMS) are unable to match the annotation accuracy of expert manual curation. Nonetheless, TMS can be of great utility in supporting biocuration tasks through the proficient triaging of non-pertinent literature and the provision of customized annotation interfaces that can both increase the rate of curation and track text statements that support curator inferences. Text-mining approaches to develop publication queues for curation have now largely superseded simple PubMed queries and are carried out in collaboration with leading text-mining groups. For example, a Support Vector Machine (SVM) method developed by the Textpresso (18,48,49) group was used to select pertinent full-text articles for the HIV, Wnt, arachidonic acid pathway (AAP) interaction networks and the A. thaliana project. In another example, a text-mining system developed by the RLIMS-P group (50) was used to help identify publications containing yeast phosphorylation site data. We have also recently implemented an in-house system that incorporates additional machine learning technologies (Chatr-aryamontri et al., in preparation) (51). The BioGRID team has strongly supported the biomedical text-mining community through the BioCreative initiative (52) by providing curation expertise and manually annotated gold standard reference datasets (53,54). For example, in the most recent BioCreative competition in 2015 one particular task focused on the development of a curation interface designed specifically for BioGRID curators (55), based on the BioC standard, an extensible mark-up language created to allow interoperability between biomedical text processing systems (56). 
This BioCreative task also produced an innovative text-based dataset that tracked all statements used by curators to infer interaction data and phenotypes (20). This dataset will allow the refinement of text-mining algorithms to more closely mimic the intuitive sparse coverage approaches used by human curators. The initial version of the BioC-based curation interface devised by the BioCreative consortium will support annotation for both genetic and protein interactions, and will allow both faster extraction of interaction data and better curation quality control.
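The idea of triaging candidate publications with a trained classifier can be sketched in a few lines of Python. The example below is not the Textpresso SVM or the in-house system mentioned above; it swaps in a much simpler linear classifier (a perceptron over word counts), and the training sentences are invented for illustration.

```python
from collections import Counter


def featurize(text):
    """Bag-of-words counts over lowercased whitespace tokens."""
    return Counter(text.lower().split())


def score(weights, bias, text):
    return bias + sum(weights.get(t, 0) * c for t, c in featurize(text).items())


def train_perceptron(examples, epochs=20):
    """Train a linear classifier; labels are +1 (curate) or -1 (discard)."""
    weights, bias = {}, 0
    for _ in range(epochs):
        mistakes = 0
        for text, label in examples:
            if label * score(weights, bias, text) <= 0:
                for token, count in featurize(text).items():
                    weights[token] = weights.get(token, 0) + label * count
                bias += label
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return weights, bias


def predict(weights, bias, text):
    """True if the text should be queued for manual curation."""
    return score(weights, bias, text) > 0


# Invented toy training set, not BioGRID data.
TRAINING = [
    ("we show that protein a binds protein b by co-immunoprecipitation", 1),
    ("two-hybrid screen identifies interaction partners of kinase", 1),
    ("a review of clinical outcomes in patients", -1),
    ("survey of hospital admission rates", -1),
]

weights, bias = train_perceptron(TRAINING)
print(predict(weights, bias, "co-immunoprecipitation shows kinase binds partners"))
print(predict(weights, bias, "hospital survey of patients"))
```

In practice the classifier only ranks or filters the queue; every retained paper is still read and annotated by a curator.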

GENETIC INTERACTION CURATION

The accurate representation of genetic interactions is a challenge due to the complexity of both phenotypes and the precise genetic context of an interaction. In an effort to more precisely describe genetic interactions, and reconcile the different terminologies used within the various model organism research communities, BioGRID has collaborated with WormBase (26) to develop a new Genetic Interactions Structured Terminology (GIST) (Grove et al., in preparation). This effort to delineate well-defined, standardized genetic interaction (GI) terms has been supported by various MODs, including ZFIN (21), FlyBase (24), SGD (25), CGD (38), PomBase (23) and TAIR (39). Importantly, the GIST will not only facilitate the interpretation of genetic interactions, but also the integration of large volumes of genetic interaction data across different species. The acute need for a structured terminology was recognized by both WormBase and BioGRID curators while attempting to coordinate the curation of worm genetic interactions between the two databases. For historical reasons, GI terms in BioGRID were biased towards yeast genetic interaction descriptors that were unduly restrictive because the phenotype was often implicit within the GI term. For example, the common term ‘synthetic lethality’ represents a greater than multiplicative genetic interaction and the implicit phenotype of cell growth. Since there is currently no separate ‘synthetic’ GI term in BioGRID for curating interactions with other phenotypes, the existing GI terms could not be used to effectively curate more complicated phenotypes that arise in yeast and even more frequently in more complex metazoans, including humans. To resolve this issue in a general form, that is to cover all possible GI scenarios, the new GIST was organized according to a structured set of genetic terms that are completely separated from the myriad of possible phenotypes that might be linked to the interaction. 
The GIST is thus intended to be used in conjunction with all relevant species- or tissue-specific phenotype ontologies such that the type of genetic interaction is curated as a separate entity from the specific phenotype that is scored. This approach allows BioGRID to take full advantage of rigorous phenotype ontologies across model systems and humans, including Uberon, the Monarch Initiative, the Human Phenotype Ontology, and others (57,58). For yeast genetic interactions, 11 of the current BioGRID GI terms map to seven of the new GIST terms that will be used for curation going forward in 2017. This mapping will allow the back-curation of more than 270 000 yeast genetic interactions associated with over 600 unique phenotypes (25) to be fully automated. BioGRID will also implement GIST for the curation of genetic interactions in human and key model organisms, including yeast, worm, fly, zebrafish and mouse. The use of standardized GI terms will facilitate the cross-species integration of genetic interaction datasets produced by large-scale CRISPR/Cas9-based screens in human cells and other organisms (59,60).
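The separation of interaction type from phenotype, and the many-to-one mapping of legacy terms, can be pictured with a small Python sketch. The GIST term names and phenotype labels below are hypothetical placeholders, since the actual GIST vocabulary is not given in this report; only the two legacy term names are existing BioGRID evidence codes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneticInteraction:
    gene_a: str
    gene_b: str
    gi_term: str    # type of genetic interaction, free of any phenotype
    phenotype: str  # scored phenotype, drawn from a separate ontology


# HYPOTHETICAL mapping: two legacy terms with an implicit phenotype
# collapse onto one phenotype-free term plus an explicit phenotype.
LEGACY_TO_GIST = {
    "Synthetic Lethality": ("synthetic (placeholder term)", "inviable"),
    "Synthetic Growth Defect": ("synthetic (placeholder term)", "slow growth"),
}


def back_curate(gene_a, gene_b, legacy_term):
    """Convert a legacy annotation into a (GI term, phenotype) record."""
    gi_term, phenotype = LEGACY_TO_GIST[legacy_term]
    return GeneticInteraction(gene_a, gene_b, gi_term, phenotype)


lethal = back_curate("geneA", "geneB", "Synthetic Lethality")
slow = back_curate("geneA", "geneC", "Synthetic Growth Defect")
print(lethal)
print(slow)
```

Because the mapping is a fixed lookup, existing records can be converted without re-reading the source publications, which is what makes automated back-curation feasible.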

CHEMICAL INTERACTIONS

Since our previous update, the BioGRID has initiated a new pilot project to incorporate curated chemical interaction data and to combine this data with other biological interaction types curated from the literature. Initially, this focus has been on chemical–protein interactions because many biochemical relationships between drugs, toxins and other bioactive compounds and their targets are documented in the literature. A number of drug-discovery associated databases have captured either direct or inferred evidence for drug-target interactions. In order to incorporate previously annotated chemical–protein interaction data into BioGRID, a minimal unified record structure compatible with the diverse annotation systems used across multiple chemical databases was required. We surveyed the content of the major specialized chemical interaction databases, including DrugBank (37), HMdb (61), T3DB (62), BindingDB (63), CTD (64), Therapeutic Target DB (65), ChemBank (66), PharmGKB (67), DGIdb (68), PubChem (69) and ChEMBL (70) to determine the shared fields housed in each of these databases. Based on this survey, a minimal interoperable record structure was designed that contains: the target protein based on UniProt or GeneID identifiers; generic chemical name, synonyms and/or brand name for the chemical agent; the class of agent, such as small molecule, natural product, or biologic; the structural formula of the agent; CAS and ATC identifiers for the agent; the molecular action or effect of the agent; associated citations; and the original database source. This minimal set of fields allows the facile import of data records into BioGRID and effective interoperability between multiple chemical databases. Relevant database sources for all of the associated records are explicitly acknowledged with reciprocal links to the parent database, thereby allowing users the option of direct access to the original source of data in a transparent fashion. 
As the first test case, BioGRID has recently imported manually curated chemical–target data records from DrugBank (37), which contains >12 800 experimental and approved drugs and >4200 proteins. The downloadable DrugBank files were parsed and drug-target interactions re-mapped to the minimal chemical record structure in BioGRID. The automated mapping was validated by extensive manual review to resolve any issues and ensure data integrity. To display chemical interaction information, the BioGRID interface was modified to include new tabs on the result summary page to show chemical associations for the protein of interest. These chemical associations have been incorporated into the BioGRID network viewer to allow users to visualize chemical, genetic and protein interactions as a single network if desired. BioGRID chemical association data are also made available for download in a standardized tab-delimited file format. These infrastructure developments now allow the straightforward incorporation of chemical–protein and chemical–genetic interaction data from any source into BioGRID.
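The minimal interoperable record structure listed above can be sketched as a Python data class. The field names, their order and the tab-delimited serialization are assumptions made for illustration, not the actual BioGRID schema or download format; the example values for methotrexate and human DHFR were entered by hand rather than copied from a BioGRID or DrugBank record.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ChemicalInteraction:
    target_id: str                 # UniProt accession or Entrez GeneID
    target_id_type: str            # "UniProt" or "GeneID"
    chemical_name: str             # generic name
    synonyms: Tuple[str, ...]
    brand_names: Tuple[str, ...]
    agent_class: str               # e.g. small molecule, biologic
    formula: str                   # structural formula
    cas_number: str
    atc_codes: Tuple[str, ...]
    action: str                    # molecular action or effect
    citations: Tuple[str, ...]     # e.g. PubMed identifiers
    source_db: str                 # original database source


def to_tab_row(rec):
    """Serialize one record as a tab-delimited line (hypothetical layout)."""
    def join(values):
        return "|".join(values) if values else "-"

    return "\t".join([
        rec.target_id, rec.target_id_type, rec.chemical_name,
        join(rec.synonyms), join(rec.brand_names), rec.agent_class,
        rec.formula, rec.cas_number, join(rec.atc_codes),
        rec.action, join(rec.citations), rec.source_db,
    ])


example = ChemicalInteraction(
    target_id="1719", target_id_type="GeneID",
    chemical_name="Methotrexate", synonyms=("Amethopterin",),
    brand_names=(), agent_class="small molecule",
    formula="C20H22N8O5", cas_number="59-05-2",
    atc_codes=("L01BA01",), action="inhibitor",
    citations=(), source_db="DrugBank",
)
print(to_tab_row(example))
```

Keeping the source database as an explicit field is what allows each imported record to link back to its parent database.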

ENHANCED INTERACTION NETWORK VIEWER

The BioGRID Network Viewer has recently undergone substantial revisions in order to improve its functionality for visualizing complex interaction data (Figure 2). Each search result page now has an embedded Javascript-based viewer that leverages the powerful Cytoscape.js platform (71) to display interactive graph-based data representations. A default network layout provides an intuitive overview of the overall topology of the network. In this view, individual nodes represent each protein, gene, or chemical, with the distance from the center of the network proportional to connectivity for each node. Node size is also proportional to the number of interaction partners for that node. Edge colours depict the type of relationship between entities, namely protein–protein, gene–gene, chemical–protein, chemical–gene or chemical–chemical interactions. Edge thickness represents the quantity of evidence in support of the connection, such that the thicker the edge, the more types of evidence support the interaction. All nodes and edges in the network viewer can be dragged by the user to any desired position and can be clicked (or right-clicked or two-finger clicked on a Macintosh computer) to show additional details such as experimental evidence or additional annotation. Individual nodes can be right-clicked (or two-finger clicked) and the option ‘display network’ chosen to generate a new network with the selected node in the center. Each embedded BioGRID network provides several additional built-in layout options that include grid, concentric circles, single circle and arbor views. Users can apply on-the-fly filtering to show or hide specific types of edges and use toggles to increase or decrease experimental evidence thresholds for edge and node visualization. All networks generated with the viewer can be saved as high-resolution PNG images for use in figures for presentation or publication.
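The two quantities that drive the default rendering, node size (number of interaction partners) and edge thickness (number of distinct evidence types), can be derived from a list of interaction records as in the Python sketch below. The function name and toy records are illustrative; the viewer itself is implemented in JavaScript on Cytoscape.js and its internal code is not shown in this report.

```python
from collections import defaultdict


def network_metrics(records):
    """Return (partners per node, evidence types per edge).

    records: iterable of (node_a, node_b, experimental_system) tuples.
    """
    partners = defaultdict(set)
    evidence = defaultdict(set)
    for a, b, system in records:
        partners[a].add(b)
        partners[b].add(a)
        # Treat A-B and B-A as the same undirected edge.
        evidence[tuple(sorted((a, b)))].add(system)
    degree = {node: len(p) for node, p in partners.items()}
    thickness = {edge: len(systems) for edge, systems in evidence.items()}
    return degree, thickness


# Toy records with placeholder node names.
records = [
    ("A", "B", "Two-hybrid"),
    ("B", "A", "Affinity Capture-MS"),
    ("A", "C", "Two-hybrid"),
    ("A", "B", "Two-hybrid"),  # repeated evidence type adds no thickness
]

degree, thickness = network_metrics(records)
print(degree)
print(thickness)
```

Raising the evidence threshold in the viewer then corresponds to hiding every edge whose thickness value falls below the chosen minimum.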

Figure 2.

New BioGRID network viewer. (A) The network tab in the ‘Switch View’ menu opens a network view for the selected query gene, as shown for human DHFR. (B) Users may export the network view as a figure file in PNG format, set filters that show or hide interactions, set thresholds for experimental evidence, and select from a number of layout formats. Explanatory text is provided under the help menu. (C) Node and edge colour indicates the interaction type and node size is proportional to its connectivity. In this example, green nodes represent chemicals and blue nodes represent proteins. When common names are not available, compounds are abbreviated by the chemical formula. (D) Yellow edges represent protein interactions, green edges represent genetic interactions, blue edges represent chemical interactions and purple edges represent both protein and genetic interactions.

VISUALIZATION OF POST-TRANSLATIONAL MODIFICATIONS

We have made several improvements to the BioGRID PTM viewer, which displays PTM sites on the protein sequence of interest (Figure 3). A comprehensive new layout for PTM views indicates all linked protein records, including splice isoforms, and provides links to the curated evidence for the PTM. In contrast to the original PTM viewer, which displayed only phosphorylation sites in budding yeast proteins (36), the new PTM viewer enables visualization of PTMs for all species for all PTMs in BioGRID, including phosphorylation, ubiquitination, acetylation, methylation, sumoylation, fat10ylation, and neddylation. All PTMs for any given query protein can now be viewed on the associated PTM summary page. PTMs that have not been mapped to a specific residue in the protein of interest are now also displayed in addition to site specific PTMs. Proteins annotated with PTMs in the BioGRID are marked by icons on the search list and result summary pages, and clicking on any icon opens the PTM viewer for the entire protein sequence.

Figure 3.

New BioGRID post-translational modification (PTM) viewer. (A) Users can select the ‘PTM Sites’ tab from the ‘Switch View’ menu to view PTM data when available. (B) The ‘Stats & Options’ box indicates the number of PTM sites and defines the colours assigned to each PTM type. (C) PTM locations are displayed on the protein sequence with modified residues highlighted. (D) Assigned PTM sites are displayed in tabular format with supporting evidence and citations. (E and F) Non-assigned PTMs are displayed at the bottom of the page.

To accompany these improvements to the PTM viewer, we have recently completed an extensive project to migrate 57 819 PTMs that were originally housed in relatively obscure form within interaction records into the PTM viewer. Most notably, for the covalent protein modifier ubiquitin, we reassigned 49 425 annotations previously recorded in BioGRID as covalent protein interactions and demarcated in the free text notes as ‘likely ubiquitin conjugate’. The segregation of covalent ubiquitin modifications from non-covalent ubiquitin interactions properly delineates these two distinct types of interaction, and reduces the previous artificial dominance of ubiquitin as a super-hub in protein interaction networks. Non-covalent interactions between ubiquitin and recognition components of the ubiquitin-proteasome system are still retained as interaction records. As a consequence of the reassignment of ubiquitin and other small protein modifiers as PTMs, the number of protein interaction records for Homo sapiens decreased by 46 946 interactions and for S. cerevisiae decreased by 10 466 interactions in BioGRID release 3.4.125 (June 2015). However, these reductions were more than offset by curation of an even greater number of protein interactions for each species since the previous update (Table 1). We anticipate the addition of hundreds of thousands of new ubiquitin PTM sites with the imminent release of a themed ubiquitin curation project.

DATABASE IMPROVEMENTS

In 2013, we completed the deployment of the BioGRID database, tools and web applications to a suite of six virtual machines (VMs) hosted by a commercial provider (Linode, NJ, USA). The VMs provide state-of-the-art processors, scalable memory, and native SSD high performance storage that can be expanded as needed. Each system has a fully redundant backup that runs daily and weekly, and is situated on a 40 Gigabit network for fast access by BioGRID developers and curators in different countries, as well as by BioGRID web interface and REST service users. Since deployment to cloud-based servers, the BioGRID software suite has maintained > 99.9% uptime. Since 2013, we have increased the processor speed and memory available on each VM in order to satisfy increased user demand. The number of VMs has been increased to eight in total in order to support additional new projects. Major improvements have been made to the size, speed, and storage capabilities of the MySQL database that underpins the BioGRID in order to incorporate the various new features described above. Finally, we have made continuous improvements to the extensive BioGRID annotation system that supports all BioGRID operations, including public-facing websites, download files, REST/PSICQUIC services, text-mining algorithms, and internal curation toolsets. The current annotation platform provides references for >100 million unique aliases, identifiers, systematic names and MOD references for >200 organisms, compared to the previous annotation system that supported ∼48 million identifiers for ∼100 supported organisms.

DATA DISSEMINATION

BioGRID data can be searched via the web search page or downloaded in a number of tabular (tab, tab 2 and mitab) and XML (PSI-MI 1.0, PSI-MI 2.5) formats. The BioGRID REST web service supports over 660 active projects worldwide that perform over 100 000 queries per month with an average return of ∼2 million interactions per month. The IMEx consortium PSICQUIC API interface (72) also fielded over 140 000 queries per month to BioGRID from a wide variety of third party plugins. For example, the REST service enables the direct comparison of all data in BioGRID to real time experimental data in the ProHits mass spectrometry LIMS (73). With the release of BioGRID 3.4, we introduced a new search result page that allows the relationship between any two entities to be viewed and linked to independently. This feature enables external resources such as NCBI (29), UniProt (28) and others (23–25,27,30,31,39,74–76) to link to individual entity relationships described within a publication, rather than only to an entire search result page as was previously the case. BioGRID search results now include more details, including the number of interaction partners, interactions, PTMs and chemical interactions. Furthermore, when applicable, each result page also indicates whether a particular interaction was curated as part of one or more themed projects. We have continued to update our online Wiki documentation with detailed information on all aspects of BioGRID tools and resources (see https://wiki.thebiogrid.org). In early 2016, we released two protocol papers that outline key functions in step-by-step processes to aid new users in using the platform (77,78). The BioGRID also maintains an active e-mail help desk to assist users and facilitate the direct deposition of large datasets (biogridadmin@gmail.com). 
We continue to update and post all new related source code repositories at our GitHub organizational page (https://github.com/BioGRID) and we continue to update both our Twitter (https://twitter.com/biogrid) and YouTube Channel (https://www.youtube.com/user/TheBioGRID) with the latest BioGRID news and feature updates.
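
For readers who want to query the REST web service programmatically, the sketch below assembles a request URL. The base URL and the parameter names (`accessKey`, `geneList`, `searchNames`, `taxId`, `format`, `max`) are assumptions not stated in this article and should be checked against the Wiki documentation before use. The access key shown is a placeholder, and the helper `build_query` is hypothetical. No request is sent; the function only builds the URL.

```python
from urllib.parse import urlencode

# Assumed service endpoint; verify against the BioGRID Wiki documentation.
BASE_URL = "https://webservice.thebiogrid.org/interactions/"


def build_query(genes, access_key, tax_id=None, fmt="tab2", max_results=100):
    """Build a query URL for a list of gene names (parameter names assumed)."""
    params = {
        "accessKey": access_key,
        "geneList": "|".join(genes),  # assumed pipe-separated gene list
        "searchNames": "true",
        "format": fmt,
        "max": max_results,
    }
    if tax_id is not None:
        params["taxId"] = tax_id      # NCBI taxonomy ID, e.g. 9606 for human
    return BASE_URL + "?" + urlencode(params)


url = build_query(["TP53", "MDM2"], access_key="YOUR_ACCESS_KEY", tax_id=9606)
print(url)
```

The resulting URL can be passed to any HTTP client. Keeping URL construction separate from the network call makes the query parameters easy to inspect and test offline.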

FUTURE DEVELOPMENTS

The BioGRID will continue to annotate high quality protein, genetic and chemical interaction data, with increasing attention to human datasets focused on themes of central biological processes and specific human diseases. The BioGRID curation pipeline will be enhanced through the integration of ever more sophisticated text-mining tools, which will be implemented in collaboration with text-mining groups and the BioCreative consortium (19,48,49,55,79,80). These efforts will be augmented by collaborations with diverse database partners, including MODs, phenotype databases, and chemical databases. In particular, chemical–protein interaction datasets will be prioritized for elaboration, with specific attention to drugs, metabolites, toxins and bioactive small molecules, with the goal of facilitating network-based approaches to drug discovery (37,63,70,81). We will continue to improve search and visualization tools to expedite the analysis of interaction datasets and to provide additional resources and support for the propagation of BioGRID interaction data through partner databases. An imminent update to the BioGRID database architecture will allow the seamless acquisition and integration of human genetic and chemical–genetic interaction data generated in human cell lines and various model organisms by CRISPR/Cas9 gene editing technology (59,60). These improvements will also allow the precise capture of more complex interaction contexts, for example higher order genetic interactions, splice isoform dependent protein interactions, and tissue-specific interactions (82). The BioGRID will thus continue to evolve as a biological interaction data resource for the biomedical research community.

REFERENCES (82 in total)

1. Functional organization of the yeast proteome by systematic analysis of protein complexes.

Authors: Anne-Claude Gavin; Markus Bösche; Roland Krause; Paola Grandi; Martina Marzioch; Andreas Bauer; Jörg Schultz; Jens M Rick; Anne-Marie Michon; Cristina-Maria Cruciat; Marita Remor; Christian Höfert; Malgorzata Schelder; Miro Brajenovic; Heinz Ruffner; Alejandro Merino; Karin Klein; Manuela Hudak; David Dickson; Tatjana Rudi; Volker Gnau; Angela Bauch; Sonja Bastuck; Bettina Huhse; Christina Leutwein; Marie-Anne Heurtier; Richard R Copley; Angela Edelmann; Erich Querfurth; Vladimir Rybin; Gerard Drewes; Manfred Raida; Tewis Bouwmeester; Peer Bork; Bertrand Seraphin; Bernhard Kuster; Gitte Neubauer; Giulio Superti-Furga
Journal: Nature Date: 2002-01-10 Impact factor: 49.962

2. Navigating the Phenotype Frontier: The Monarch Initiative.

Authors: Julie A McMurry; Sebastian Köhler; Nicole L Washington; James P Balhoff; Charles Borromeo; Matthew Brush; Seth Carbon; Tom Conlin; Nathan Dunn; Mark Engelstad; Erin Foster; Jean-Philippe Gourdine; Julius O B Jacobsen; Daniel Keith; Bryan Laraway; Jeremy Nguyen Xuan; Kent Shefchek; Nicole A Vasilevsky; Zhou Yuan; Suzanna E Lewis; Harry Hochheiser; Tudor Groza; Damian Smedley; Peter N Robinson; Christopher J Mungall; Melissa A Haendel
Journal: Genetics Date: 2016-08 Impact factor: 4.562

3. A proteome-scale map of the human interactome network.

Authors: Thomas Rolland; Murat Taşan; Benoit Charloteaux; Samuel J Pevzner; Quan Zhong; Nidhi Sahni; Song Yi; Irma Lemmens; Celia Fontanillo; Roberto Mosca; Atanas Kamburov; Susan D Ghiassian; Xinping Yang; Lila Ghamsari; Dawit Balcha; Bridget E Begg; Pascal Braun; Marc Brehme; Martin P Broly; Anne-Ruxandra Carvunis; Dan Convery-Zupan; Roser Corominas; Jasmin Coulombe-Huntington; Elizabeth Dann; Matija Dreze; Amélie Dricot; Changyu Fan; Eric Franzosa; Fana Gebreab; Bryan J Gutierrez; Madeleine F Hardy; Mike Jin; Shuli Kang; Ruth Kiros; Guan Ning Lin; Katja Luck; Andrew MacWilliams; Jörg Menche; Ryan R Murray; Alexandre Palagi; Matthew M Poulin; Xavier Rambout; John Rasla; Patrick Reichert; Viviana Romero; Elien Ruyssinck; Julie M Sahalie; Annemarie Scholz; Akash A Shah; Amitabh Sharma; Yun Shen; Kerstin Spirohn; Stanley Tam; Alexander O Tejeda; Shelly A Trigg; Jean-Claude Twizere; Kerwin Vega; Jennifer Walsh; Michael E Cusick; Yu Xia; Albert-László Barabási; Lilia M Iakoucheva; Patrick Aloy; Javier De Las Rivas; Jan Tavernier; Michael A Calderwood; David E Hill; Tong Hao; Frederick P Roth; Marc Vidal
Journal: Cell Date: 2014-11-20 Impact factor: 41.582

4. Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae.

Authors: Teresa Reguly; Ashton Breitkreutz; Lorrie Boucher; Bobby-Joe Breitkreutz; Nizar N Batada; Gary C Hon; Chad L Myers; Ainslie Parsons; Helena Friesen; Rose Oughtred; Amy Tong; Chris Stark; Yuen Ho; David Botstein; Brenda Andrews; Charles Boone; Olga G Troyanskya; Trey Ideker; Kara Dolinski; Mike Tyers
Journal: J Biol Date: 2006-06-08

5. The Comparative Toxicogenomics Database's 10th year anniversary: update 2015.

Authors: Allan Peter Davis; Cynthia J Grondin; Kelley Lennon-Hopkins; Cynthia Saraceni-Richards; Daniela Sciaky; Benjamin L King; Thomas C Wiegers; Carolyn J Mattingly
Journal: Nucleic Acids Res Date: 2014-10-17 Impact factor: 16.971

6. The PhosphoGRID Saccharomyces cerevisiae protein phosphorylation site database: version 2.0 update.

Authors: Ivan Sadowski; Bobby-Joe Breitkreutz; Chris Stark; Ting-Cheng Su; Matthew Dahabieh; Sheetal Raithatha; Wendy Bernhard; Rose Oughtred; Kara Dolinski; Kris Barreto; Mike Tyers
Journal: Database (Oxford) Date: 2013-05-13 Impact factor: 3.451

7. PIE: an online prediction system for protein-protein interactions from text.

Authors: Sun Kim; Soo-Yong Shin; In-Hee Lee; Soo-Jin Kim; Ram Sriram; Byoung-Tak Zhang
Journal: Nucleic Acids Res Date: 2008-05-28 Impact factor: 16.971

8. Integrative analysis of 111 reference human epigenomes.

Authors: Anshul Kundaje; Wouter Meuleman; Jason Ernst; Misha Bilenky; Angela Yen; Alireza Heravi-Moussavi; Pouya Kheradpour; Zhizhuo Zhang; Jianrong Wang; Michael J Ziller; Viren Amin; John W Whitaker; Matthew D Schultz; Lucas D Ward; Abhishek Sarkar; Gerald Quon; Richard S Sandstrom; Matthew L Eaton; Yi-Chieh Wu; Andreas R Pfenning; Xinchen Wang; Melina Claussnitzer; Yaping Liu; Cristian Coarfa; R Alan Harris; Noam Shoresh; Charles B Epstein; Elizabeta Gjoneska; Danny Leung; Wei Xie; R David Hawkins; Ryan Lister; Chibo Hong; Philippe Gascard; Andrew J Mungall; Richard Moore; Eric Chuah; Angela Tam; Theresa K Canfield; R Scott Hansen; Rajinder Kaul; Peter J Sabo; Mukul S Bansal; Annaick Carles; Jesse R Dixon; Kai-How Farh; Soheil Feizi; Rosa Karlic; Ah-Ram Kim; Ashwinikumar Kulkarni; Daofeng Li; Rebecca Lowdon; GiNell Elliott; Tim R Mercer; Shane J Neph; Vitor Onuchic; Paz Polak; Nisha Rajagopal; Pradipta Ray; Richard C Sallari; Kyle T Siebenthall; Nicholas A Sinnott-Armstrong; Michael Stevens; Robert E Thurman; Jie Wu; Bo Zhang; Xin Zhou; Arthur E Beaudet; Laurie A Boyer; Philip L De Jager; Peggy J Farnham; Susan J Fisher; David Haussler; Steven J M Jones; Wei Li; Marco A Marra; Michael T McManus; Shamil Sunyaev; James A Thomson; Thea D Tlsty; Li-Huei Tsai; Wei Wang; Robert A Waterland; Michael Q Zhang; Lisa H Chadwick; Bradley E Bernstein; Joseph F Costello; Joseph R Ecker; Martin Hirst; Alexander Meissner; Aleksandar Milosavljevic; Bing Ren; John A Stamatoyannopoulos; Ting Wang; Manolis Kellis
Journal: Nature Date: 2015-02-19 Impact factor: 69.504

9. PubChem BioAssay: 2014 update.

Authors: Yanli Wang; Tugba Suzek; Jian Zhang; Jiyao Wang; Siqian He; Tiejun Cheng; Benjamin A Shoemaker; Asta Gindulyte; Stephen H Bryant
Journal: Nucleic Acids Res Date: 2013-11-05 Impact factor: 16.971

10. WormBase 2016: expanding to enable helminth genomic research.

Authors: Kevin L Howe; Bruce J Bolt; Scott Cain; Juancarlos Chan; Wen J Chen; Paul Davis; James Done; Thomas Down; Sibyl Gao; Christian Grove; Todd W Harris; Ranjana Kishore; Raymond Lee; Jane Lomax; Yuling Li; Hans-Michael Muller; Cecilia Nakamura; Paulo Nuin; Michael Paulini; Daniela Raciti; Gary Schindelman; Eleanor Stanley; Mary Ann Tuli; Kimberly Van Auken; Daniel Wang; Xiaodong Wang; Gary Williams; Adam Wright; Karen Yook; Matthew Berriman; Paul Kersey; Tim Schedl; Lincoln Stein; Paul W Sternberg
Journal: Nucleic Acids Res Date: 2015-11-17 Impact factor: 16.971
