Nguyen Quoc Khanh Le, Quang H Nguyen, Xuan Chen, Susanto Rahardja, Binh P Nguyen.
Abstract
BACKGROUND: Adaptor proteins are carrier proteins that play a crucial role in signal transduction. They commonly consist of several modular domains, each with its own binding activity, and they operate by forming complexes with other intracellular signaling molecules. Many studies have implicated adaptor proteins in a variety of human diseases, so building an accurate model to predict the function of adaptor proteins is an important task in bioinformatics and computational biology. Few computational biology studies have been conducted to predict protein functions, and most of them fed position-specific scoring matrix (PSSM) profiles into neural networks as features. However, those networks could not reach optimal results because the sequential information in the PSSMs was lost. This study proposes an innovative approach that combines recurrent neural networks (RNNs) with PSSM profiles to resolve this problem.
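The point about lost sequential information can be made concrete with a minimal pure-Python sketch. It assumes that the order-insensitive baseline averages the PSSM over sequence positions (a common choice, not confirmed by the abstract) and it uses a toy, randomly initialised GRU without bias terms. The sizes, weights, random PSSM values, and helper names are all illustrative and are not the authors' model.

```python
import math
import random

def mean_pool(pssm):
    """Average every PSSM column over all positions; residue order is discarded."""
    n = len(pssm)
    return [sum(row[j] for row in pssm) / n for j in range(len(pssm[0]))]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def gru_final_state(pssm, params):
    """Run a bias-free single-layer GRU over the PSSM rows, one residue per step."""
    w_z, u_z, w_r, u_r, w_h, u_h = params
    h = [0.0] * len(u_z)
    for x in pssm:
        z = [sigmoid(a + b) for a, b in zip(matvec(w_z, x), matvec(u_z, h))]  # update gate
        r = [sigmoid(a + b) for a, b in zip(matvec(w_r, x), matvec(u_r, h))]  # reset gate
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(w_h, x), matvec(u_h, gated))]
        h = [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
    return h

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
HIDDEN, FEATURES, LENGTH = 4, 20, 6  # toy sizes, far smaller than a real model

# Toy PSSM: LENGTH positions x 20 integer scores (random stand-ins for real profiles).
pssm = [[rng.randint(-5, 5) for _ in range(FEATURES)] for _ in range(LENGTH)]
reversed_pssm = pssm[::-1]  # same rows, opposite residue order

# Input weights are HIDDEN x FEATURES, recurrent weights are HIDDEN x HIDDEN.
params = tuple(
    random_matrix(HIDDEN, FEATURES if i % 2 == 0 else HIDDEN, rng) for i in range(6)
)

pooled_fwd, pooled_rev = mean_pool(pssm), mean_pool(reversed_pssm)
state_fwd = gru_final_state(pssm, params)
state_rev = gru_final_state(reversed_pssm, params)

print("pooled features identical:", pooled_fwd == pooled_rev)
print("max GRU state difference:", max(abs(a - b) for a, b in zip(state_fwd, state_rev)))
```

Reversing the residue order leaves the pooled features unchanged, while the recurrent state depends on the order in which the rows are read. That is the kind of information a position-averaged PSSM cannot carry.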
Keywords: Adaptor proteins; Classification; Deep learning; GRU; PSSM; Prediction; RNN
Year: 2019; PMID: 31874633; PMCID: PMC6929330; DOI: 10.1186/s12864-019-6335-4
Source DB: PubMed; Journal: BMC Genomics; ISSN: 1471-2164; Impact factor: 3.969
Fig. 1 Amino acid composition of adaptor proteins and non-adaptor proteins. The x-axis shows the 20 amino acids; the y-axis shows the frequency (%) of each amino acid
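A composition profile of this kind can be computed in a few lines. The sketch below assumes that frequencies are pooled over all residues in a group of sequences (the caption does not say whether the authors pooled residues or averaged per protein), and the two toy sequences are made up for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def composition_percent(sequences):
    """Return the percentage of each standard amino acid, pooled over all sequences."""
    counts = Counter()
    for seq in sequences:
        # Ignore anything that is not a standard residue (e.g. X, gaps).
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}

toy_sequences = ["MKKLL", "AAGK"]  # hypothetical sequences, not from the dataset
freqs = composition_percent(toy_sequences)

for aa in AMINO_ACIDS:
    if freqs[aa] > 0:
        print(f"{aa}: {freqs[aa]:.1f}%")
```

Running the same function separately on the adaptor and non-adaptor groups would give the two series that a figure like this compares.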
Performance results of distinguishing adaptor proteins with different methods (CV: cross validation; Test: independent test)
| Method | CV Sensitivity | CV Specificity | CV Accuracy | CV AUC | CV MCC | Test Sensitivity | Test Specificity | Test Accuracy | Test AUC | Test MCC |
|---|---|---|---|---|---|---|---|---|---|---|
| k-NN | 0.635 | 0.750 | 0.738 | 0.770 | 0.254 | 0.671 | 0.751 | 0.743 | 0.791 | 0.280 |
| RF | 0.185 | 0.837 | 0.214 | 0.290 | 0.923 | 0.860 | 0.838 | 0.216 | ||
| SVM | 0.397 | 0.934 | 0.881 | 0.818 | 0.332 | 0.426 | 0.806 | 0.353 | ||
| CNN | 0.532 | 0.875 | 0.841 | 0.774 | 0.328 | 0.548 | 0.873 | 0.841 | 0.783 | 0.339 |
| RNN | 0.751 | 0.757 | 0.798 | 0.804 | ||||||
(k-NN: k=10; RF: num_estimators=500; SVM: c=8.0, g=0.5; CNN: 128 filters; RNN: 512 filters)
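For reference, the threshold-based columns of the table (sensitivity, specificity, accuracy, MCC) all derive from the four confusion-matrix counts. The sketch below uses the standard definitions. The counts passed in are hypothetical: they are not reported in the source and were chosen only so that they add up to the 155 adaptor and 1383 non-adaptor proteins of the independent test set listed in this document. The function assumes both classes are present.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)              # true positive rate
    specificity = tn / (tn + fp)              # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }

# Hypothetical counts: 104 + 51 = 155 adaptor, 1039 + 344 = 1383 non-adaptor.
metrics = binary_metrics(tp=104, fp=344, tn=1039, fn=51)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because non-adaptor proteins heavily outnumber adaptor proteins, accuracy alone can look high even when sensitivity is poor, which is why MCC is a useful companion measure on a dataset like this.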
Fig. 2 The receiver operating characteristic (ROC) curve of one fold in our experiments
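The area under an ROC curve equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one. A minimal sketch of that rank-based computation follows; the labels and scores are invented for illustration and do not come from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical predictions: 1 = adaptor, 0 = non-adaptor.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.7, 0.2, 0.8, 0.4]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of examples, which is fine for a sketch; a sort-based version would be preferable for a full test set.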
Fig. 3 Flowchart of the study
Statistics of the benchmark dataset
| Class | Original | Non-redundant (total) | Non-redundant (train-val) | Non-redundant (test) |
|---|---|---|---|---|
| Adaptor | 4,049 | 1,224 | 1,069 | 155 |
| Non-Adaptor | 23,917 | 11,078 | 9,695 | 1,383 |
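The counts in the table above can be checked with a little arithmetic: the train-val and test columns should sum to the non-redundant total, and the class ratio shows how imbalanced the benchmark is. The dictionary layout and helper names in this sketch are illustrative.

```python
# Counts copied from the dataset statistics table above.
COUNTS = {
    "adaptor": {"original": 4049, "total": 1224, "train_val": 1069, "test": 155},
    "non_adaptor": {"original": 23917, "total": 11078, "train_val": 9695, "test": 1383},
}

def split_is_consistent(row):
    """Train-val and test sizes should add up to the non-redundant total."""
    return row["train_val"] + row["test"] == row["total"]

def imbalance_ratio(counts, split):
    """Number of non-adaptor proteins per adaptor protein in the given split."""
    return counts["non_adaptor"][split] / counts["adaptor"][split]

for name, row in COUNTS.items():
    kept = row["total"] / row["original"]
    print(f"{name}: consistent={split_is_consistent(row)}, kept={kept:.1%}")

for split in ("total", "train_val", "test"):
    print(f"{split}: {imbalance_ratio(COUNTS, split):.2f} non-adaptor per adaptor")
```

Both splits are internally consistent, and each holds roughly nine non-adaptor proteins for every adaptor protein, so the class ratio is about the same in training and testing.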
Fig. 4 Architecture of the RNN model