Lifen Shao1, Hui Gao1, Zhen Liu1, Juan Feng2, Lixia Tang2, Hao Lin2.
Abstract
Antioxidant proteins have been found closely linked to disease control for its ability to eliminate excess free radicals. Because of its medicinal value, the study of identifying antioxidant proteins is on the upsurge. Many machine-learning classifiers have performed poorly owing to the nonlinear and unbalanced nature of biological data. Recently, deep learning techniques showed advantages over many state-of-the-art machine learning methods in various fields. In this study, a deep learning based classifier was proposed to identify antioxidant proteins based on mixed g-gap dipeptide composition feature vector. The classifier employed deep autoencoder to extract nonlinear representation from raw input. The t-Distributed Stochastic Neighbor Embedding (t-SNE) was used for dimensionality reduction. Support vector machine was finally performed for classification. The classifier achieved F 1 score of 0.8842 and MCC of 0.7409 in 10-fold cross validation. Experimental results show that our proposed method outperformed the traditional machine learning methods and could be a promising tool for antioxidant protein identification. For the convenience of others' scientific research, we have developed a user-friendly web server called IDAod for antioxidant protein identification, which can be accessed freely at http://bigroup.uestc.edu.cn/IDAod/.Entities:
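The g-gap dipeptide composition mentioned in the abstract counts residue pairs separated by exactly g positions and normalizes the counts by the number of such pairs in the sequence. A minimal sketch of this feature extraction (the function name and the normalization by L − g − 1 are our assumptions for illustration, not code from the paper):

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def g_gap_dipeptide_composition(seq, g=0):
    """Return the 400-dimensional g-gap dipeptide composition of seq.

    A g-gap dipeptide is a pair of residues separated by exactly g
    positions; g=0 reduces to the ordinary dipeptide composition.
    """
    pairs = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = len(seq) - g - 1  # number of g-gap pairs in the sequence
    if total <= 0:
        return [0.0] * len(pairs)
    for i in range(total):
        dipeptide = seq[i] + seq[i + g + 1]
        if dipeptide in counts:  # skip non-standard residues
            counts[dipeptide] += 1
    return [counts[p] / total for p in pairs]
```

For example, `g_gap_dipeptide_composition("ACDA", g=0)` counts the adjacent pairs AC, CD, and DA, each contributing 1/3 to the corresponding entry of the 400-dimensional vector.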
Keywords: antioxidant proteins; deep learning; feature selection; g-gap dipeptide; webserver
Year: 2018 PMID: 30294271 PMCID: PMC6158654 DOI: 10.3389/fphar.2018.01036
Source DB: PubMed Journal: Front Pharmacol ISSN: 1663-9812 Impact factor: 5.810
Figure 1. The pipeline of our integrated model.
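The three-stage pipeline described in the abstract (autoencoder feature learning, t-SNE dimensionality reduction, SVM classification) might be sketched as follows. This is a rough illustration on synthetic data: the paper's seven-layer deep autoencoder is replaced here by a single-hidden-layer MLP trained to reconstruct its input, and all hyperparameters are illustrative assumptions, not the authors' settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.manifold import TSNE
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVC

# Synthetic stand-in for 400-dimensional g-gap dipeptide feature vectors.
X, y = make_classification(n_samples=60, n_features=400,
                           n_informative=20, random_state=0)

# Stage 1: autoencoder-style feature learning. A one-hidden-layer MLP
# trained to reconstruct its own input serves as a minimal stand-in for
# the seven-layer deep autoencoder used in the paper.
ae = MLPRegressor(hidden_layer_sizes=(64,), activation="relu",
                  max_iter=500, random_state=0)
ae.fit(X, X)
# The encoder output is the first hidden layer's ReLU activations.
hidden = np.maximum(0, X @ ae.coefs_[0] + ae.intercepts_[0])

# Stage 2: t-SNE reduces the learned representation to 2 dimensions.
embedding = TSNE(n_components=2, perplexity=10,
                 random_state=0).fit_transform(hidden)

# Stage 3: an SVM classifies samples in the embedded space.
clf = SVC(kernel="rbf").fit(embedding, y)
print(f"training accuracy: {clf.score(embedding, y):.3f}")
```

Note that t-SNE has no out-of-sample transform, so a faithful evaluation would embed all samples first and then cross-validate only the SVM stage, as the 10-fold protocol in the paper suggests.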
Figure 2. The deep neural network structure used for feature learning.
Figure 3. The architecture of the seven-layer deep autoencoder. The numbers on the left indicate the number of nodes in each layer.
Figure 4. t-SNE visualization of the output of each layer in the deep neural network. Positive and negative samples are marked as green and red points, respectively. (A) input layer; (B) the first encoder layer; (C) the second encoder layer; (D) the third encoder layer; (E) the first FC layer; (F) the second FC layer.
Comparison of our model with other methods.

| Method | Sn (%) | Sp (%) | Acc (%) | F1 | MCC |
|---|---|---|---|---|---|
| AodPred | 35.97 | 98.52 | 94.84 | 0.4959 | 0.4951 |
| Logistic | 53.23 | 80.38 | 76.39 | 0.3831 | 0.2695 |
| Decision Tree | 52.69 | 71.79 | 68.78 | 0.3230 | 0.1817 |
| Random Forest | 30.09 | 92.96 | 84.33 | 0.3465 | 0.2620 |
| Ensemble Model | 87.80 | 86.00 | 86.30 | 0.6699 | 0.6170 |
| IDAod | 81.27 | 99.59 | 97.05 | 0.8842 | 0.7409 |