Yan Wang (1,2), Shuangquan Zhang (1), Lili Yang (3), Sen Yang (1), Yuan Tian (2), Qin Ma (4)
Abstract
Measuring conditional relatedness, the degree of relation between a pair of genes under a given condition, is a basic but difficult task in bioinformatics, because traditional co-expression analysis methods rely on co-expression similarities, which are well known to have a high false-positive rate. Complementing them with prior-knowledge similarities is a feasible way to tackle this problem. However, classical machine learning algorithms for combining similarities fail to detect and exploit the complex mapping between these similarities and conditional relatedness, so a powerful predictive model that captures this mapping would be of great benefit. To meet this need, we propose a novel deep learning model, a convolutional neural network with a fully connected first layer, named fully convolutional neural network (FCNN), to measure conditional relatedness between genes using both co-expression and prior-knowledge similarities. With hyper-parameters selected by grid search under 10-fold cross-validation, the FCNN model yields on average 3.0% and 2.7% higher accuracy on the validation and test datasets, respectively, for identifying gene–gene interactions collected from the COXPRESdb, KEGG, and TRRUST databases and from a benchmark dataset of Xiao-Yong et al. To further evaluate the FCNN model, we verify it on the GeneFriends and DIP datasets, where it obtains on average 1.8% and 7.6% higher accuracy, respectively. Finally, the FCNN model is applied to construct cancer gene networks and yields more practical results than the other compared models and methods. A website for the FCNN model and the relevant datasets is available at https://bmbl.bmi.osumc.edu/FCNN.
Keywords: co-expression similarity; conditional relatedness between genes; fully convolutional neural network; gene network; prior-knowledge similarity
Year: 2019 PMID: 31695723 PMCID: PMC6818468 DOI: 10.3389/fgene.2019.01009
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
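The abstract describes scoring each gene pair from both co-expression and prior-knowledge similarities. The sketch below shows one plausible way to assemble such an input vector; it is an illustration under stated assumptions, not the authors' feature set. Pearson correlation as the co-expression similarity, Jaccard overlap of annotation sets as the prior-knowledge similarity, and the names `pearson`, `jaccard`, and `pair_features` are all hypothetical choices.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # A constant profile has no defined correlation; treat it as unrelated.
    return cov / (sx * sy) if sx and sy else 0.0

def jaccard(a, b):
    """Overlap of two annotation sets (e.g. pathway or regulator IDs)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pair_features(expr_a, expr_b, priors_a, priors_b):
    """Hypothetical feature vector for one gene pair: one co-expression
    similarity, then one Jaccard similarity per prior-knowledge source."""
    features = [pearson(expr_a, expr_b)]
    for source in sorted(set(priors_a) | set(priors_b)):
        features.append(jaccard(priors_a.get(source, ()), priors_b.get(source, ())))
    return features

# Toy data: expression values and annotation IDs are made up for illustration.
expr_g1 = [1.0, 2.0, 3.0, 4.0]
expr_g2 = [2.0, 4.0, 6.0, 8.5]
priors_g1 = {"kegg": {"P1", "P2"}, "trrust": {"TF1"}}
priors_g2 = {"kegg": {"P1"}, "trrust": {"TF2"}}
feats = pair_features(expr_g1, expr_g2, priors_g1, priors_g2)
print(feats)  # roughly [0.998, 0.5, 0.0]
```

Sorting the source names keeps the feature order stable across gene pairs, which any fixed-width model input requires.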
The structure of the FCNN dataset.
| Sub-dataset | Sub-sub-dataset | Type of gene pair | Sample size |
|---|---|---|---|
| Co-expression | Co-expression | Positive | 32,735 |
| | | Negative | 26,782 |
| Prior-knowledge | KEGG | Positive | 11,526 |
| | | Negative | 11,526 |
| | PPI | Positive | 15,222 |
| | | Negative | 21,579 |
| | TRRUST | Positive | 7,136 |
| | | Negative | 7,136 |
| DIP | DIP | Positive | 1,396 |
| | | Negative | 1,396 |
| GeneFriends | GeneFriends | Positive | 8,675 |
| | | Negative | 8,675 |
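Because the comparisons are reported as accuracy, it helps to check how balanced each sub-dataset is. The short calculation below uses only the sample sizes transcribed from the table above; the dictionary layout and the `summarize` helper are illustrative.

```python
# Sample sizes transcribed from the dataset table: (positive, negative).
DATASET = {
    "Co-expression": (32_735, 26_782),
    "KEGG": (11_526, 11_526),
    "PPI": (15_222, 21_579),
    "TRRUST": (7_136, 7_136),
    "DIP": (1_396, 1_396),
    "GeneFriends": (8_675, 8_675),
}

def summarize(dataset):
    """Map each sub-dataset to (total gene pairs, fraction of positive pairs)."""
    rows = {}
    for name, (pos, neg) in dataset.items():
        total = pos + neg
        rows[name] = (total, pos / total)
    return rows

summary = summarize(DATASET)
for name, (total, frac) in summary.items():
    print(f"{name:13s} total={total:6d} positive fraction={frac:.3f}")
print("all gene pairs:", sum(t for t, _ in summary.values()))
```

By this arithmetic, KEGG, TRRUST, DIP, and GeneFriends are exactly balanced, while Co-expression is about 55.0% positive and PPI about 41.4% positive, across 153,784 gene pairs in total.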
Figure 1. The structure of the FCNN model.
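The abstract describes FCNN as a convolutional network whose first layer is fully connected. The NumPy sketch below shows a forward pass under several assumptions not stated in this excerpt: the first-layer neurons are reshaped into a square map (suggested, but not confirmed, by the neuron counts 25, 81, and 169 in the hyper-parameter table being 5², 9², and 13²), a single convolution kernel follows, each activation pair such as `Sigmoid_Tanh` names the first-layer and convolution activations, and a sigmoid output gives the relatedness score. All function and parameter names are hypothetical, and dropout is omitted because it only matters during training.

```python
import numpy as np

ACTIVATIONS = {
    "relu": lambda x: np.maximum(x, 0.0),
    "sigmoid": lambda x: 1.0 / (1.0 + np.exp(-x)),
    "tanh": np.tanh,
}

def conv2d_valid(grid, kernel, stride):
    """Single-channel 2-D cross-correlation without padding."""
    k = kernel.shape[0]
    out_side = (grid.shape[0] - k) // stride + 1
    out = np.empty((out_side, out_side))
    for i in range(out_side):
        for j in range(out_side):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(grid[r:r + k, c:c + k] * kernel)
    return out

def init_params(n_features, n_neurons=25, kernel_size=2, stride=1, seed=0):
    """Random weights for the hypothetical FC -> conv -> output layout."""
    rng = np.random.default_rng(seed)
    side = int(round(np.sqrt(n_neurons)))
    if side * side != n_neurons:
        raise ValueError("n_neurons must be a perfect square to form a 2-D map")
    conv_side = (side - kernel_size) // stride + 1
    return {
        "W1": rng.normal(0.0, 0.1, (n_neurons, n_features)),
        "b1": np.zeros(n_neurons),
        "K": rng.normal(0.0, 0.1, (kernel_size, kernel_size)),
        "bk": 0.0,
        "w_out": rng.normal(0.0, 0.1, conv_side * conv_side),
        "b_out": 0.0,
        "side": side,
        "stride": stride,
    }

def forward(features, p, acts=("sigmoid", "tanh")):
    """Return a relatedness score in (0, 1) for one gene pair."""
    first, second = (ACTIVATIONS[a] for a in acts)
    hidden = first(p["W1"] @ features + p["b1"])   # fully connected first layer
    grid = hidden.reshape(p["side"], p["side"])     # neurons laid out as a square map
    conv = second(conv2d_valid(grid, p["K"], p["stride"]) + p["bk"])
    logit = p["w_out"] @ conv.ravel() + p["b_out"]  # linear read-out of the feature map
    return float(1.0 / (1.0 + np.exp(-logit)))

# Usage with an arbitrary 4-dimensional similarity vector.
params = init_params(n_features=4, n_neurons=25, kernel_size=2, stride=1)
score = forward(np.array([0.99, 0.5, 0.0, 0.2]), params)
```

With 25 neurons, a 2×2 kernel, and stride 1, the 5×5 map shrinks to 4×4; with 169 neurons, a 3×3 kernel, and stride 2, the 13×13 map shrinks to 6×6. In practice a framework such as PyTorch would replace the hand-written loop.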
Figure 2. Flowchart of the experimental design for biological pathway identification.
Effects of varying the hyper-parameters on AUC, measured by 10-fold cross-validation on the validation and test datasets.
| Hyper-parameter | Value | Validation AUC | Test AUC |
|---|---|---|---|
| Kernel size | 2 | | |
| | 3 | 0.8121 | 0.8172 |
| Stride | 1 | | |
| | 2 | 0.8089 | 0.8156 |
| Number of neurons | 25 | 0.8191 | 0.8232 |
| | 81 | | |
| | 169 | 0.8189 | 0.8236 |
| Learning rate | 0.01 | 0.8250 | 0.8296 |
| | 0.001 | | |
| | 0.0001 | 0.7763 | 0.7802 |
| Dropout probability | 0.1 | | |
| | 0.2 | 0.8196 | 0.8228 |
| | 0.3 | 0.8180 | 0.8227 |
| Batch size | 200 | 0.8166 | 0.8231 |
| | 250 | | |
| | 300 | 0.8135 | 0.8209 |
| Activation function | ReLU_ReLU | 0.8132 | 0.8224 |
| | ReLU_Sigmoid | 0.8127 | 0.8210 |
| | ReLU_Tanh | 0.8127 | 0.8242 |
| | Sigmoid_ReLU | 0.8224 | 0.8296 |
| | Sigmoid_Sigmoid | 0.8245 | 0.8301 |
| | Sigmoid_Tanh | 0.8271 | 0.8308 |
| | Tanh_ReLU | 0.8253 | 0.8297 |
| | Tanh_Sigmoid | 0.8245 | 0.8309 |
| | Tanh_Tanh | | |
The FCNN model obtains the optimal AUC value based on different hyper-parameter combinations.
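The abstract states that hyper-parameters were chosen by grid search with 10-fold cross-validation. The sketch below enumerates the candidate values from the table above and averages a score over ten folds per configuration; a full Cartesian grid over these values has 2 × 2 × 3 × 3 × 3 × 3 × 9 = 2,916 configurations, although the table itself may reflect varying one parameter at a time. The callback `train_and_score` is a hypothetical stand-in for training FCNN and returning its validation AUC.

```python
import itertools
import random

# Candidate values transcribed from the hyper-parameter table.
GRID = {
    "kernel_size": [2, 3],
    "stride": [1, 2],
    "n_neurons": [25, 81, 169],
    "learning_rate": [0.01, 0.001, 0.0001],
    "dropout": [0.1, 0.2, 0.3],
    "batch_size": [200, 250, 300],
    "activations": [f"{a}_{b}" for a in ("ReLU", "Sigmoid", "Tanh")
                    for b in ("ReLU", "Sigmoid", "Tanh")],
}

def iter_grid(grid):
    """Yield every combination of the candidate values as a dict."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices once and split them into k near-equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(config, n_samples, train_and_score, k=10):
    """Mean score of one configuration over k folds. The fixed seed means
    every configuration is evaluated on the same folds."""
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(train_and_score(config, train_idx, val_idx))
    return sum(scores) / k

def grid_search(grid, n_samples, train_and_score, k=10):
    """Return the configuration with the highest mean cross-validated score."""
    return max(iter_grid(grid),
               key=lambda cfg: cross_validate(cfg, n_samples, train_and_score, k))
```

Evaluating every configuration on identical folds keeps the comparison between configurations fair; scikit-learn's `GridSearchCV` with `KFold` packages the same idea.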
Figure 3. ROC curves of all models and methods for identifying gene–gene interactions in the (A) validation, (C) test, (E) DIP, and (G) GeneFriends datasets. Accuracies (ACCs) of all models and methods for identifying gene–gene interactions in the (B) validation, (D) test, (F) DIP, and (H) GeneFriends datasets.
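Figure 3 reports both ROC curves (summarized as AUC) and accuracy. The sketch below computes both from predicted scores: AUC via its rank interpretation (the probability that a randomly chosen positive pair outscores a randomly chosen negative one, ties counting one half), and accuracy at an assumed decision threshold of 0.5. The threshold and function names are illustrative assumptions.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of pairs whose thresholded score matches the 0/1 label."""
    correct = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)

def roc_auc(labels, scores):
    """AUC as P(score of random positive > score of random negative),
    the Mann-Whitney formulation; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.2]
print(roc_auc(labels, scores))   # 0.75
print(accuracy(labels, scores))  # 0.5
```

The pairwise loop is quadratic and meant only to make the definition concrete; a sort-based version or scikit-learn's `roc_auc_score` scales to datasets of the size in the table above. The example also shows why the two metrics can disagree: the ranking is mostly right (AUC 0.75) while the fixed threshold misclassifies half the pairs.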
The number of samples in cancer and normal tissue.
| Cancer type | Samples in normal tissue | Samples in cancer |
|---|---|---|
| LUAD | 515 | 19 |
| COAD | 285 | 113 |
| BRCA | 1095 | 41 |
| BLCA | 408 | 59 |
Figure 4. Number of metabolic pathways predicted to be directly influenced by increased serine metabolism in four cancer types.
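The abstract mentions applying FCNN to construct cancer gene networks, and Figure 4 counts pathways "directly influenced" by serine metabolism. The sketch below illustrates one hypothetical reading of that workflow: connect gene pairs whose predicted relatedness exceeds a threshold, then count pathways containing a direct neighbour of a seed gene set. The threshold, the definition of "directly influenced", the seed set, and all gene and pathway names are assumptions; one such network could be built per condition (normal or cancer tissue) and the counts compared.

```python
from itertools import combinations

def build_network(genes, score_fn, threshold=0.5):
    """Undirected adjacency sets linking pairs whose score exceeds threshold.
    score_fn(a, b) stands in for a trained model's relatedness prediction."""
    adj = {g: set() for g in genes}
    for a, b in combinations(genes, 2):
        if score_fn(a, b) > threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def directly_influenced_pathways(adj, seeds, pathways):
    """Names of pathways containing at least one non-seed gene that is a
    direct neighbour of some seed gene (hypothetical definition)."""
    neighbours = set().union(*(adj.get(s, set()) for s in seeds)) - set(seeds)
    return {name for name, members in pathways.items() if members & neighbours}

# Toy example with made-up genes, scores, and pathways.
toy_scores = {frozenset({"S1", "G1"}): 0.9,
              frozenset({"S1", "G2"}): 0.7,
              frozenset({"G2", "G3"}): 0.8}
adj = build_network(["S1", "G1", "G2", "G3", "G4"],
                    lambda a, b: toy_scores.get(frozenset({a, b}), 0.0))
pathways = {"P_a": {"G1", "G4"}, "P_b": {"G3"}, "P_c": {"G2", "G3"}}
hit = directly_influenced_pathways(adj, {"S1"}, pathways)
print(sorted(hit))  # ['P_a', 'P_c']
```

Pathway P_b is not counted because G3 is linked to the seed only through G2, i.e. indirectly.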