Walter Nelson, Marinka Zitnik, Bo Wang, Jure Leskovec, Anna Goldenberg, Roded Sharan.
Abstract
Current technology is producing high-throughput biomedical data at an ever-growing rate. A common approach to interpreting such data is through network-based analyses. Since biological networks are notoriously complex and hard to decipher, a growing body of work applies graph embedding techniques to simplify, visualize, and facilitate the analysis of the resulting networks. In this review, we survey traditional and new approaches for graph embedding and compare their application to fundamental problems in network biology with methods that use the networks directly. We consider a broad variety of applications including protein network alignment, community detection, and protein function prediction. We find that in all of these domains both types of approaches are of value and their performance depends on the evaluation measures being used and the goal of the project. In particular, network embedding methods outshine direct methods according to some of those measures and are, thus, an essential tool in bioinformatics research.
Keywords: community detection; network alignment; network biology; network embedding; protein function prediction
Year: 2019 PMID: 31118945 PMCID: PMC6504708 DOI: 10.3389/fgene.2019.00381
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1. Schematic of three applications, each applied to networks directly and to their network embeddings. Colors represent some node feature in the network, for example protein families. (A,B) Visualization of the embedding process for two networks in 2D space. (C) Visualization of community detection in embedded space (top) and directly on the network (bottom). (D) Top: visualization of network alignment in embedded space. In this example, the network embedding in panel (B) is rotated, translated, and reflected to find an optimal alignment with the embedding in panel (A). Bottom: visualization of direct alignment of two networks; vertical proximity indicates the alignment found. (E) Visualization of function prediction in embedded space. The previously unlabeled (white) nodes (bottom) or their embeddings (top) are labeled (colored).
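
The alignment step described for panel (D) can be illustrated with a small sketch. It assumes, which the caption does not state, that the two 2D embeddings list the same nodes in the same order (for example, known homolog pairs), and it uses a closed-form 2D rigid fit rather than any particular tool's method. The function names `center`, `best_rotation`, and `align_2d` are hypothetical.

```python
import math


def center(points):
    """Translate points so that their centroid sits at the origin."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return [(x - cx, y - cy) for x, y in points]


def best_rotation(src, dst):
    """Rotate centered src onto centered dst.

    Returns the rotated points and the summed squared distance to dst.
    The optimal angle maximizes cos(t) * dot + sin(t) * cross.
    """
    dot = sum(sx * dx + sy * dy for (sx, sy), (dx, dy) in zip(src, dst))
    cross = sum(sx * dy - sy * dx for (sx, sy), (dx, dy) in zip(src, dst))
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    rotated = [(c * x - s * y, s * x + c * y) for x, y in src]
    err = sum(
        (rx - dx) ** 2 + (ry - dy) ** 2
        for (rx, ry), (dx, dy) in zip(rotated, dst)
    )
    return rotated, err


def align_2d(src, dst):
    """Align src to dst with translation, rotation, and optional reflection.

    Assumption: src[i] and dst[i] are embeddings of corresponding nodes.
    """
    n = len(dst)
    dcx = sum(x for x, _ in dst) / n
    dcy = sum(y for _, y in dst) / n
    src_c, dst_c = center(src), center(dst)
    # Try the fit with and without mirroring src across the x-axis.
    plain = best_rotation(src_c, dst_c)
    mirrored = best_rotation([(x, -y) for x, y in src_c], dst_c)
    aligned_c, err = min(plain, mirrored, key=lambda result: result[1])
    # Move the fitted points back to the centroid of dst.
    aligned = [(x + dcx, y + dcy) for x, y in aligned_c]
    return aligned, err
```

Calling `align_2d(embedding_b, embedding_a)` returns the transformed coordinates of the second embedding together with the residual squared error; a small residual suggests the two embeddings differ mainly by a rigid motion. No scaling is applied, since the caption mentions only rotation, translation, and reflection.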
Comparative analysis of direct vs. embedding methods across a range of problems in network biology.
| | IsoRank (α = 0.5) | MuNK (λ = 0.05) | |||
|---|---|---|---|---|---|
| EC | 21.9% | ||||
| GOC | |||||
| 57.6% | |||||
| 17.9% | |||||
| 1.0% | |||||
| GOC (AUPR) | 0.721 | ||||
| Runtime | 26 min 40 s (incl. grid search) | 1 min 52 s (incl. alignment) | |||
| Buettner ( | 0.256 | ||||
| Kolodziejczyk ( | 0.325 | ||||
| Pollen ( | 0.928 | ||||
| Usoskin ( | 0.373 | ||||
| Avg. Runtime | 1 min 15 s (incl. parameter grid search) | <5 s | |||
| STRING v9.1 | |||||
| AUPR | |||||
| MF | 0.327 | ||||
| BP | 0.213 | ||||
| CC | 0.487 | ||||
| Avg. Runtime | 3 min 57 s | 14 min 56 s (incl. recommended SVM tuning procedure) | |||
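
Two of the measures named in the table can be sketched in a few lines. This assumes that EC denotes edge correctness (the fraction of edges in one network whose aligned endpoints are also connected in the other) and that AUPR is the area under the precision-recall curve, estimated here by average precision. The review may compute either measure differently, and both function names are hypothetical.

```python
def edge_correctness(edges_a, edges_b, mapping):
    """Fraction of edges in network A whose mapped endpoints are joined in B.

    mapping sends nodes of A to nodes of B; edges are treated as undirected.
    """
    target = {frozenset(edge) for edge in edges_b}
    edges_a = list(edges_a)
    conserved = sum(
        1
        for u, v in edges_a
        if u in mapping
        and v in mapping
        and frozenset((mapping[u], mapping[v])) in target
    )
    return conserved / len(edges_a)


def average_precision(scores, labels):
    """Step-wise estimate of the area under the precision-recall curve.

    Ties between scores are broken by input order, which is a simplification.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    positives = sum(1 for _, label in ranked if label)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at each true positive
    return total / positives
```

For instance, a triangle `a-b-c` mapped onto a path `x-y-z` conserves two of its three edges, so `edge_correctness` returns 2/3.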
A summary of network embedding tools and their applications.
| Name of the tool | Availability | Applied to |
|---|---|---|
| Network enhancement | Matlab code | Hi-C interaction networks combining gene interaction networks across tissues |
| Single-cell representation learning | Binary | Single-cell RNA-seq data |
| Geometric denoising | | PPI networks |
| MuNK | Python code and all Anaconda-reproducible experiments | Cross-species functional PPIs (yeast, mouse, human) |
| Minimum curvilinearity embedding II | | (i) Cerebrospinal fluid proteomics – neuropathic pain; (ii) Transcription factor expressions – tissue prediction |
| Vicus | | Single-cell RNA-seq: (i) Pollen – neural and stem cells; (ii) Usoskin – mouse neurons, sensory subtypes; (iii) Buettner – embryonic stem cells; (iv) Kolodziejczyk – pluripotent cells |
| Coalescent embedding | | Non-biological |
| Mashup | | Protein function prediction, gene ontology reconstruction, and genetic interaction prediction |
| OhmNet | | Tissue-specific gene function prediction |
| Disease gene discovery | | Disease pathway detection |
| Molecular fingerprints | | Prediction of molecular properties, including drug efficacy, solubility, and photovoltaic efficiency |
| Decagon | | (i) Polypharmacy side-effect prediction; (ii) Drug–drug interaction prediction |
| Graph convolutional policy network | | Molecular graph generation |
| Residual LSTM Embeddings | | (i) Drug side-effect prediction; (ii) Drug toxicity prediction |