Charles Blatti, Saurabh Sinha.
Abstract
MOTIVATION: Analysis of co-expressed gene sets typically involves testing for enrichment of different annotations or 'properties', such as biological processes, pathways and transcription factor binding sites, one property at a time. This common approach ignores any known relationships among the properties or among the genes themselves. It is believed that known biological relationships among genes and their many properties can be exploited to reveal the commonalities of a gene set more accurately. Previous work has sought to achieve this by building biological networks that combine multiple types of gene–gene or gene–property relationships, and by performing network analysis to identify the other genes and properties most relevant to a given gene set. Most existing network-based approaches for recognizing genes or annotations relevant to a given gene set collapse information about different properties to simplify (homogenize) the networks.
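The conventional one-property-at-a-time analysis can be illustrated with a one-sided hypergeometric enrichment test applied to each annotation independently. This is a minimal sketch that assumes the hypergeometric test as the enrichment statistic (a common choice, not one named in the abstract); the function names and toy annotations are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes in universe, K annotated, n in the set)."""
    top = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return hits / comb(N, n)


def enrichment_one_at_a_time(gene_set, universe, annotations):
    """Test every property independently (hypothetical helper).

    Relationships among the properties, or among the genes, play no role:
    each p-value depends only on overlap counts for that single property.
    """
    universe = set(universe)
    genes = set(gene_set) & universe
    N, n = len(universe), len(genes)
    pvalues = {}
    for prop, annotated in annotations.items():
        annotated = set(annotated) & universe
        k = len(annotated & genes)  # annotated genes that fall in the gene set
        pvalues[prop] = hypergeom_sf(k, N, len(annotated), n)
    return pvalues


universe = [f"G{i}" for i in range(1, 11)]
pvals = enrichment_one_at_a_time(
    {"G1", "G2", "G3"},
    universe,
    {"P1": {"G1", "G2", "G3"}, "P2": {"G4", "G5"}},
)
print(pvals)  # P1 = 1/120 (about 0.0083), P2 = 1.0
```

Because each call looks at one property in isolation, two properties that annotate nearly the same genes are tested as if they were unrelated, which is the limitation the network-based approaches aim to address.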
Year: 2016 PMID: 27153592 PMCID: PMC4937193 DOI: 10.1093/bioinformatics/btw151
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Illustration of the DRaWR method. Given a set of genes called the query set, Q = {G3, G4}, DRaWR ranks all remaining genes from the query species, U = {G1, G2, …, G5}, based on their relevance from a random walk with restart (RWR) in a heterogeneous network of biological knowledge. In this example, the heterogeneous network contains gene nodes from two species (circles) and feature nodes from two public resources (squares), connected by gene–gene sequence homology and by feature–gene annotations (edges). First, DRaWR finds the subset of feature nodes that are specific to Q (P2, P3 and P4 in this example) by comparing the relevance (shading) of the nodes between (a) a ‘baseline’ RWR on the entire network with U as the restart set and (b) a ‘stage 1’ RWR on the entire network with Q as the restart set. A subnetwork is created using only the feature nodes that are specific to Q, and (c) a ‘stage 2’ relevance is calculated from an RWR on this subnetwork with Q as the restart set. Finally, DRaWR ranks the genes in U by their similarity to the initial query set, Q, based on the ‘stage 2’ relevance scores.
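The two-stage procedure in Fig. 1 can be sketched in Python as below. This is a toy reconstruction under stated assumptions, not the authors' implementation: it assumes an undirected weighted network given as an edge list, a restart probability of 0.3, that the query-specific features are the `top_k` feature nodes with the largest gain of stage 1 over baseline relevance, that the subnetwork keeps all gene nodes and gene–gene edges, and that members of Q are left out of the final ranking. The names `rwr`, `drawr`, `top_k` and the toy network (loosely modeled on Fig. 1, but not the figure's network) are hypothetical.

```python
from collections import defaultdict


def rwr(edges, restart_set, restart_prob=0.3, tol=1e-12, max_iter=10_000):
    """Random walk with restart on an undirected weighted graph (power iteration).

    edges: iterable of (u, v, weight). Returns {node: stationary probability}.
    """
    nbrs = defaultdict(dict)
    for u, v, w in edges:
        nbrs[u][v] = nbrs[u].get(v, 0.0) + w
        nbrs[v][u] = nbrs[v].get(u, 0.0) + w
    starts = {s for s in restart_set if s in nbrs}
    restart = {n: (1.0 / len(starts) if n in starts else 0.0) for n in nbrs}
    out_w = {n: sum(nbrs[n].values()) for n in nbrs}
    p = dict(restart)
    for _ in range(max_iter):
        # With prob. restart_prob jump back to the restart set; otherwise move
        # to a neighbor chosen in proportion to edge weight.
        nxt = {n: restart_prob * restart[n] for n in nbrs}
        for n, mass in p.items():
            share = (1.0 - restart_prob) * mass / out_w[n]
            for m, w in nbrs[n].items():
                nxt[m] += share * w
        delta = sum(abs(nxt[n] - p[n]) for n in nbrs)
        p = nxt
        if delta < tol:
            break
    return p


def drawr(edges, universe, query, feature_nodes, top_k=2, restart_prob=0.3):
    """Hypothetical two-stage sketch: returns (kept features, ranked genes)."""
    features = set(feature_nodes)
    baseline = rwr(edges, set(universe), restart_prob)  # (a) restart on U
    stage1 = rwr(edges, set(query), restart_prob)       # (b) restart on Q
    # Query-specific features: largest gain in relevance over the baseline.
    gain = {f: stage1.get(f, 0.0) - baseline.get(f, 0.0) for f in features}
    kept = set(sorted(features, key=lambda f: (-gain[f], f))[:top_k])
    # (c) Subnetwork: drop every edge that touches a discarded feature node.
    sub = [(u, v, w) for u, v, w in edges
           if (u not in features or u in kept) and (v not in features or v in kept)]
    stage2 = rwr(sub, set(query), restart_prob)
    candidates = [g for g in universe if g not in query]
    ranking = sorted(candidates, key=lambda g: stage2.get(g, 0.0), reverse=True)
    return kept, ranking


edges = [("G1", "P1", 1), ("G2", "P1", 1), ("G3", "P2", 1), ("G4", "P2", 1),
         ("G5", "P2", 1), ("G3", "P3", 1), ("G4", "P3", 1), ("G1", "P4", 1),
         ("G5", "P4", 1)]
kept, ranking = drawr(edges, ["G1", "G2", "G3", "G4", "G5"], {"G3", "G4"},
                      ["P1", "P2", "P3", "P4"])
print(sorted(kept), ranking)  # ['P2', 'P3'] ['G5', 'G1', 'G2']
```

In this toy network P1 and P4 annotate only non-query genes, so their relevance drops relative to the baseline and they are discarded; G1 and G2 then become disconnected in the subnetwork and receive zero stage 2 relevance, leaving G5 at the top of the ranking.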
Fig. 2. Comparison of Stage 1 and Stage 2 Rankings on the Drosophila Heterogeneous Network. We compared the rankings produced at the end of the first-stage random walk with those produced by the second-stage random walk on query-specific networks. We calculated the average stage 1 and stage 2 AUROCs for each of the 92 expression domains and plotted the number of domains (y-axis) whose AUROC was above each possible AUROC threshold (x-axis).
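The per-domain AUROC behind these comparisons can be sketched as follows. This assumes that each expression-domain gene set is split into query genes and held-out positives, and that a ranking's AUROC measures how well the held-out positives outscore the other candidate genes; the split, the names `auroc` and `average_auroc`, and the toy scores are hypothetical, and the Mann–Whitney formulation is one standard way to compute AUROC, not necessarily the authors' code.

```python
from statistics import mean


def auroc(scores, positives):
    """AUROC via the Mann-Whitney statistic: the probability that a randomly chosen
    positive gene outscores a randomly chosen negative gene (ties count 1/2)."""
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_auroc(folds):
    """Mean AUROC over hypothetical (scores, held-out positives) pairs, e.g. CV folds."""
    return mean(auroc(scores, positives) for scores, positives in folds)


fold_a = ({"A": 0.9, "B": 0.8, "C": 0.3, "D": 0.1}, {"A", "C"})  # 3 of 4 pairs won
fold_b = ({"A": 0.9, "B": 0.2, "C": 0.7, "D": 0.1}, {"A", "C"})  # 4 of 4 pairs won
print(average_auroc([fold_a, fold_b]))  # 0.875
```

Computing this once with stage 1 scores and once with stage 2 scores gives the two values per domain that Fig. 2 compares.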
Fig. 3. Comparison of RWR on Different Drosophila Networks. We compared the stage 2 rankings produced by our algorithm when the initial network was defined by a single feature type (‘Domain’, ‘ChIP’ or ‘Motif’) or by all feature types combined (‘Heterogeneous’). We calculated the stage 2 AUROCs for each of the 92 expression domains and plotted the number of domains (y-axis) whose AUROC was above each possible AUROC threshold (x-axis). The inset shows the high-AUROC region of the chart in more detail.
Fig. 4. Comparison between Single- and Multi-Species Networks. We compared the stage 1 and stage 2 rankings when the initial network was the heterogeneous network from either a single species (‘Fly’) or multiple species (‘5Insect’). We calculated the AUROCs of each stage’s rankings for each of the 12 selected Drosophila expression domain gene sets and plotted the number of domains (y-axis) whose AUROC was above each possible AUROC threshold (x-axis).
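Each curve in Figs. 2, 3 and 4 counts, for every AUROC threshold on the x-axis, how many expression domains score above it. A minimal sketch, assuming a uniform threshold grid with step 0.01 and a strict "above" comparison (both hypothetical choices, as are the function names and toy AUROCs); plotting is left out.

```python
def domains_above(aurocs, thresholds):
    """Number of domains whose AUROC is strictly above each threshold."""
    return [sum(1 for a in aurocs if a > t) for t in thresholds]


def threshold_curves(aurocs_by_method, step=0.01):
    """Threshold grid (x-axis) and one count curve (y-axis) per method or network."""
    n_steps = int(round(1 / step))
    thresholds = [round(i * step, 10) for i in range(n_steps + 1)]
    curves = {m: domains_above(a, thresholds) for m, a in aurocs_by_method.items()}
    return thresholds, curves


xs, ys = threshold_curves({"stage 1": [0.55, 0.70, 0.90],
                           "stage 2": [0.80, 0.85, 0.95]})
print(domains_above([0.55, 0.70, 0.90], [0.5, 0.6, 0.8, 0.95]))  # [3, 2, 1, 0]
```

A curve that stays higher toward the right of the plot indicates a method (or network) that reaches high AUROCs on more domains.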
Ten query-specific features
| Rank | Feature node name | Feature node type |
|---|---|---|
| 1 | Striatum | Brain Atlas |
| 2 | Retrohippocampal | Brain Atlas |
| 3 | Hippocampus | Brain Atlas |
| 4 | Pallidum | Brain Atlas |
| 5 | MRJP | Prot Domain |
| 6 | PMP22_Claudin | Prot Domain |
| 7 | JHBP | Prot Domain |
| 8 | Globin | Prot Domain |
| 9 | Olfactory | Brain Atlas |
| 10 | Claudin_2 | Prot Domain |
The top ten feature nodes selected by our algorithm for the ‘3 species’ query set on the multi-species heterogeneous aggression network, each listed with its feature type.