Jingcheng Du, Peilin Jia, Yulin Dai, Cui Tao, Zhongming Zhao, Degui Zhi.
Abstract
BACKGROUND: Existing functional descriptions of genes are categorical, discrete, and mostly produced through manual processes. In this work, we explore the idea of gene embedding, a distributed representation of genes, in the spirit of word embedding.
Keywords: Distributed representation; Embedding; Gene co-expression; Gene-gene interaction; Gene2Vec; Word2vec
Year: 2019 PMID: 30712510 PMCID: PMC6360648 DOI: 10.1186/s12864-018-5370-x
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 The Skip-Gram architecture used to train the gene embedding. This modified architecture is equivalent to the original word2vec and is adapted from a blog post [22]
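As a rough illustration of the Skip-Gram idea behind Fig. 1, the sketch below trains a toy skip-gram model with negative sampling in plain Python. It assumes the training data are pairs of co-expressed genes, with each gene serving as target and context in turn. The gene labels, the function names `train_skipgram` and `cosine`, and all hyperparameter values are hypothetical and are not taken from the paper.

```python
import math
import random


def train_skipgram(pairs, dim=8, epochs=60, lr=0.1, negatives=2, seed=0):
    """Toy skip-gram with negative sampling over (gene, gene) pairs.

    Returns a dict mapping each gene to its input (embedding) vector.
    """
    rng = random.Random(seed)
    genes = sorted({g for pair in pairs for g in pair})
    # Input vectors start small and random; output vectors start at zero.
    w_in = {g: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for g in genes}
    w_out = {g: [0.0] * dim for g in genes}

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def update(target, context, label):
        # One SGD step on the logistic loss for a (target, context) pair.
        v, u = w_in[target], w_out[context]
        score = sigmoid(sum(a * b for a, b in zip(v, u)))
        grad = lr * (label - score)
        for i in range(dim):
            v_old = v[i]
            v[i] += grad * u[i]
            u[i] += grad * v_old

    for _epoch in range(epochs):
        for a, b in pairs:
            # Treat each pair symmetrically: a predicts b and b predicts a.
            for target, context in ((a, b), (b, a)):
                update(target, context, 1.0)
                for _k in range(negatives):
                    neg = rng.choice(genes)
                    if neg == context or neg == target:
                        continue  # skip draws that collide with the true pair
                    update(target, neg, 0.0)
    return w_in


def cosine(u, v):
    """Cosine similarity between two vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy data: two groups of mutually co-expressed genes.
toy_pairs = [("A1", "A2"), ("A2", "A3"), ("A1", "A3"),
             ("B1", "B2"), ("B2", "B3"), ("B1", "B3")]
toy_vectors = train_skipgram(toy_pairs)
print("within group:", cosine(toy_vectors["A1"], toy_vectors["A2"]))
print("across groups:", cosine(toy_vectors["A1"], toy_vectors["B1"]))
```

In this toy setting, genes that share co-expression partners should end up closer in the embedding space than genes from different groups. A real pipeline would more likely rely on an established word2vec implementation than on hand-written updates.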
Fig. 2 The architecture of the gene-gene interaction predictor neural network (GGIPNN)
Hyperparameter tuning using clusteredness as target function
| Dimension \ Number of iterations | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 50 | 1.428 | 1.444 | 1.467 | 1.470 | | 1.465 | 1.473 | 1.479 | 1.475 | 1.462 |
| 100 | 1.415 | 1.467 | 1.488 | 1.491 | 1.498 | 1.501 | | 1.486 | 1.480 | 1.490 |
| 200 | 1.403 | 1.463 | 1.491 | 1.498 | 1.495 | 1.482 | 1.470 | 1.488 | | 1.509 |
| 300 | 1.392 | 1.443 | 1.472 | 1.473 | 1.473 | 1.509 | 1.474 | | 1.479 | 1.480 |
Bold number denotes the largest number in that row
Fig. 3 Gene co-expression map generated from the embedding reveals clusters of functionally related genes. F1 and F2 are the first and second dimensions of t-SNE. Red: LOC non-coding genes; cyan: microRNA; pink: small nucleolar RNA (snoRNA); yellow: undercharacterized ORFs
Fig. 4 The embedding reveals clusters of genes with tissue specificity. Blood and spleen show clear patterns of tissue-specific genes. The reproductive system (e.g., ovary) also shows distinct groups of genes. Genes not available in the GTEx data are colored grey
Fig. 5 ROC curves for gene-gene interaction predictor neural networks
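For readers unfamiliar with how a ROC curve such as those in Fig. 5 is built, the sketch below computes the curve and its area from binary labels and predicted scores in plain Python. It assumes the predictor outputs one score per gene pair, with label 1 for an interacting pair and 0 otherwise. The function names and the example labels and scores are hypothetical and are not results from the paper.

```python
def roc_curve(labels, scores):
    """Return (fpr, tpr) lists obtained by sweeping a threshold over scores.

    labels: 0/1 ground truth; scores: higher means more likely positive.
    Tied scores are grouped so they produce a single point on the curve.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    prev_score = None
    for i in order:
        # Emit a point each time the threshold drops to a new score value.
        if prev_score is not None and scores[i] != prev_score:
            fpr.append(fp / n_neg)
            tpr.append(tp / n_pos)
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        prev_score = scores[i]
    fpr.append(fp / n_neg)
    tpr.append(tp / n_pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0
               for k in range(1, len(fpr)))


# Hypothetical labels and scores for six gene pairs.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_fpr, example_tpr = roc_curve(example_labels, example_scores)
print("AUC:", auc(example_fpr, example_tpr))
```

The area equals the fraction of (positive, negative) pairs that the scores rank correctly, which is why an uninformative predictor scores about 0.5 and a perfect one scores 1.0.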