Eric Banks, Elena Nabieva, Ryan Peterson, Mona Singh.
Abstract
NetGrep (http://genomics.princeton.edu/singhlab/netgrep/) is a system for searching protein interaction networks for matches to user-supplied 'network schemas'. Each schema consists of descriptions of proteins (for example, their molecular functions or putative domains) along with the desired topology and types of interactions among them. Schemas can thus describe domain-domain interactions, signaling and regulatory pathways, or more complex network patterns. NetGrep provides an advanced graphical interface for specifying schemas and fast algorithms for extracting their matches.
Year: 2008 PMID: 18801179 PMCID: PMC2592716 DOI: 10.1186/gb-2008-9-9-r138
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. A sample schema and its instances in yeast. (a) An example of a schema. Each protein in the schema has a specific feature description and each edge has a type. In this case, the schema describes Ras GTPase signaling, where small G proteins from the Ras family are regulated by GTPase activating proteins (GAPs) and guanine nucleotide exchange factors (GEFs), and in turn regulate effector kinases, which may phosphorylate other proteins. (b) Instances of the schema in S. cerevisiae.
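To make the idea of a schema and its instances concrete, the following is a minimal sketch of matching a small schema against a typed interaction network by naive backtracking. It is not NetGrep's actual algorithm, which this record does not describe. The protein names, feature labels, the "physical" edge type, and the function name `find_matches` are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy network: protein name -> set of feature labels.
features = {
    "GAPa": {"GAP"},
    "RASa": {"Ras"},
    "GEFa": {"GEF"},
    "KINa": {"kinase"},
    "KINb": {"kinase"},
}
# Hypothetical typed interactions, stored as (protein, protein, type).
edges = {
    ("GAPa", "RASa", "physical"),
    ("GEFa", "RASa", "physical"),
    ("RASa", "KINa", "physical"),
}

# Schema: one required label per node (None would act as a wildcard),
# plus typed edges between schema node indices.
schema_nodes = ["GAP", "Ras", "GEF", "kinase"]
schema_edges = [(0, 1, "physical"), (2, 1, "physical"), (1, 3, "physical")]


def find_matches(schema_nodes, schema_edges, features, edges):
    """Return every assignment of distinct proteins to schema nodes."""

    def edge_ok(a, b, etype):
        # Interactions are treated as undirected in this sketch.
        return (a, b, etype) in edges or (b, a, etype) in edges

    matches = []

    def extend(assignment):
        i = len(assignment)
        if i == len(schema_nodes):
            matches.append(tuple(assignment))
            return
        for protein, labels in features.items():
            if protein in assignment:
                continue  # each protein is used at most once per match
            required = schema_nodes[i]
            if required is not None and required not in labels:
                continue  # feature description not satisfied
            candidate = assignment + [protein]
            # Check only the schema edges completed by assigning node i.
            if all(
                edge_ok(candidate[u], candidate[v], t)
                for (u, v, t) in schema_edges
                if max(u, v) == i
            ):
                extend(candidate)

    extend([])
    return matches


matches = find_matches(schema_nodes, schema_edges, features, edges)
print(matches)
```

In this toy network only `KINa` interacts with the Ras protein, so `KINb` is rejected even though it carries the kinase label.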
Figure 2. Sample schemas. Examples of network schemas. Unlabeled schema proteins are considered to be 'wildcards' and can match any protein in the interaction network. (a) A signaling pathway schema. This schema matches all sets of proteins such that a protein in the cell membrane physically interacts with a succession of anywhere between one and three kinases, the last of which physically interacts with a protein that is a transcription factor. (b) A MAP kinase schema, specified by particular yeast proteins making up a canonical MAPK signaling pathway. (c) A feed-forward loop network motif [8] schema. The unlabeled nodes can match any protein in the network. (d) A 'kinate' feedback loop network motif schema [13]. (e) An SH3 domain interaction schema. This schema matches all interacting pairs of proteins such that one contains a Pfam SH3 domain and the other has one of the specified patterns, corresponding to SH3 binding sites, in its underlying amino acid sequence. Amino acids in the pattern are specified by their one-letter codes, and 'x' denotes a match to any amino acid. (f) A specific protein schema. This schema matches all proteins with a synthetic lethal relationship to yeast protein ACT1.
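The sequence-pattern notation in panel (e) can be illustrated with a short sketch that translates a pattern into a regular expression. The pattern `PxxP` and both sequences below are illustrative assumptions, not the patterns specified in the figure, and the helper names `pattern_to_regex` and `has_site` are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a one-letter-code pattern into a compiled regex.

    A lowercase 'x' matches any amino acid (any uppercase letter here);
    every other character must match itself exactly.
    """
    parts = []
    for ch in pattern:
        if ch == "x":
            parts.append("[A-Z]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def has_site(sequence: str, pattern: str) -> bool:
    """Report whether the pattern occurs anywhere in the sequence."""
    return pattern_to_regex(pattern).search(sequence) is not None


# Illustrative sequences only; they are not real yeast proteins.
print(has_site("MAPLSPQR", "PxxP"))
print(has_site("MAGLSAQR", "PxxP"))
```

A node annotated with several alternative patterns would match if any one of them is found in the protein's sequence.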
Feature comparisons
| Feature | PathBLAST | Fanmod | Narada | SAGA | NetMatch | NetGrep |
| Non-linear queries | X | X | X | X | X | |
| Allows arbitrary protein annotations | 1 per node | Unlimited | Unlimited | |||
| Boolean combination of annotations | X | X | ||||
| Inexact matches | X | X | X | |||
| Multiple edge types in a network | X | X | X | |||
| Boolean combination of edge types | X | |||||
| UI for searching/choosing annotations | X | X | ||||
| Can be used with Cytoscape | X | X | ||||
| Can be used as a standalone | X | X | X | X | X | |
| Custom data sets provided | X | X | X |
A comparison of built-in features available in systems that can, in principle, be used for querying interactomes using network schemas. A network alignment tool, PathBLAST, and a network motif finder, Fanmod, are shown for comparison. All other systems are explicitly designed for querying interactomes utilizing labeled subgraphs.
Protein features
| Protein feature | Source |
| Gene names and aliases | BioGRID |
| Amino acid sequences | Biomart |
| Paralogs | COG |
| Pfam A/B motifs | Pfam |
| SMART | InterPro |
| Prosite | |
| SCOP | |
| GO functional annotations | GO |
Protein features used to annotate proteins in the built-in data sets provided with NetGrep.
Interaction types
| Interaction type | Source | Restrictions |
| Physical | BioGRID [ | |
| Genetic | ||
| Gene coexpression | [ | |
| Transcriptional regulation | [ | Yeast only |
| Phosphorylation | [ | Yeast only |
Figure 3. NetGrep screenshot. A detailed screenshot of the NetGrep display showing a sample query schema. (a) The graph panel area used to describe schemas. The Ras GTPase signaling schema from Figure 1 is shown in the panel with the Ras GTPase node highlighted. (b) The panel used to designate which interaction network to use, to choose the maximum number of matches desired, and to initiate a search. (c) The panel used to annotate nodes in the schema and to create or modify edges. The information for the highlighted node (node 3) is currently displayed in the panel; the edge between the first and third nodes is being modified. (d) The results panel in which the matches found from the search are displayed. Each row lists the proteins that make up a particular match along with its reliability score.
Running time comparisons
| Running time (s) | |||||
| Sample query | PathBLAST | Fanmod | Narada | NetMatch | NetGrep |
| Signaling pathway 1 | 28 | 4.2 | |||
| Signaling pathway 2 | 26.9 | ||||
| MAPK pathway | 90 | 0.02 | |||
| Feed-forward motif | 32 | 5.2 | 1.4 | ||
| Kinate motif | 32 | 5 | 0.5 | ||
| SH3 domain interaction | 0.5 | ||||
| ACT1 genetic interaction | 15 | 0.1 | |||
Running times (in seconds) for several sample queries on the S. cerevisiae interaction network, using PathBLAST, Fanmod, Narada, NetMatch and NetGrep. All reported running times are for search and output only. As in Table 1, PathBLAST is used as a prototypical example of a network alignment tool and Fanmod represents network motif finders. Note that SAGA is excluded here because it cannot be run on Windows. The sample schemas correspond to those provided in Figure 2, except that two distinct queries are used for Figure 2a. In the first, all three kinases in the pathway are required. In the second, two of the kinases are designated as optional (as in Figure 2a). Each query is run ten times and the average computation time is provided. Row entries are left blank for any tool that is unable to find instances of a particular schema because of feature limitations.
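The averaging protocol described above (ten runs per query, mean time reported) can be sketched as follows. This is an assumed harness for illustration, not the authors' benchmarking code. The function name `average_runtime` and the stand-in query are hypothetical.

```python
import time
from statistics import mean


def average_runtime(query, repeats=10):
    """Run a zero-argument callable `repeats` times; return mean seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        query()
        timings.append(time.perf_counter() - start)
    return mean(timings)


# Stand-in workload; a real measurement would invoke the search tool here.
def dummy_query():
    return sum(range(10_000))


print(f"{average_runtime(dummy_query):.6f} s")
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock suited to measuring short intervals.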
Figure 4. Yeast GO molecular function schema timings. All possible triangular, 4-node linear, and 4-node branched schemas ('Y-star') with nodes described via GO molecular function slim terms were run systematically on NetGrep. Results are reported for those schemas with at least 5 but no more than 80,000 instances in S. cerevisiae: 780 triangular schemas; 80,719 4-node linear schemas; and 30,642 4-node branched schemas. Boxplots of the running times for each topology are given; boxplots are a convenient way of depicting the smallest observation, first quartile, median, third quartile, and largest observation in the data.
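The five values that a boxplot summarizes can be computed with the Python standard library, as in the sketch below. The timing values are made up for illustration and are not taken from the figure. The quartile method ("inclusive") is an assumption, since the record does not say which convention the plots use.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, first quartile, median, third quartile, max)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, q2, q3, max(values))


# Made-up running times in seconds, for illustration only.
timings = [5, 1, 4, 2, 3]
print(five_number_summary(timings))
```

Different quartile conventions give slightly different box edges on small samples, so the method should be stated whenever such summaries are compared.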
GO MF running time comparisons
| Topology | Query | Narada (s) | NetMatch (s) | NetGrep (s) |
| Triangle | GO:0003677, GO:0004386, GO:0004672 | | 15 | 0.1 |
| Triangle | GO:0004386, GO:0004672, GO:0030528 | | 16 | 0.2 |
| Triangle | GO:0003723, GO:0003723, GO:0003723 | | 15 | 1.9 |
| Quad | GO:0004386, GO:0003677, GO:0016874, GO:0016829 | 1 | 14 | 0.2 |
| Quad | GO:0016787, GO:0030234, GO:0005515, GO:0008233 | 2.3 | 17 | 1.2 |
| Quad | GO:0003677, GO:0003723, GO:0005515, GO:0005198 | 4 | 16 | 1.9 |
| Quad | GO:0016787, GO:0005198, GO:0003677, GO:0016779 | 2.2 | 17 | 1.7 |
| Quad | GO:0016787, GO:0016740, GO:0016779, GO:0030528 | 4.8 | 16 | 2.9 |
| Y-star | GO:0008233, GO:0016874, GO:0030234, GO:0005215 | | 15 | 0.2 |
| Y-star | GO:0005515, GO:0004721, GO:0008233, GO:0016740 | | 17 | 0.8 |
| Y-star | GO:0005515, GO:0008233, GO:0005198, GO:0005215 | | 17 | 3.9 |
| Y-star | GO:0030528, GO:0005515, GO:0016740, GO:0005215 | | 14 | 1.5 |
| Y-star | GO:0016740, GO:0005515, GO:0030528, GO:0005215 | | 14 | 5.2 |
A comparison of running times (in seconds) for several sample schemas annotated with GO molecular function slim terms on the S. cerevisiae interaction network using Narada, NetMatch and NetGrep. Of the previous methods, Narada and NetMatch are chosen as they can be run off-the-shelf for these schemas; note, however, that Narada only handles linear topology queries. All reported running times are for search and output only. In the case of the Y-stars, the first term shown annotates the central node. The schemas shown have between 10 and 11,000 instances in S. cerevisiae.