Nguyen Quoc Khanh Le, Edward Kien Yee Yapp, N Nagasundaram, Matthew Chin Heng Chua, Hui-Yuan Yeh.
Abstract
Protein function prediction is one of the most well-studied topics in computational biology and has attracted attention from many researchers. However, implementing deep neural networks that improve the prediction of protein function remains a major challenge. In this research, we propose a new strategy that combines gated recurrent units (GRUs) with position-specific scoring matrix (PSSM) profiles to predict vesicular transport proteins, a biological function of great importance. Although this function is difficult to discover, our model achieves accuracies of 82.3% and 85.8% on the cross-validation and independent datasets, respectively. We also address the class imbalance in the dataset by tuning the class weights of the deep learning model. The model achieved sensitivity, specificity, MCC, and AUC of 79.2%, 82.9%, 0.52, and 0.861, respectively. On the same dataset, our strategy shows superior results compared with all other state-of-the-art algorithms. This research provides a technique for discovering more proteins, particularly proteins connected with vesicular transport. In addition, our results could encourage the use of GRU architectures in protein function prediction.
Keywords: Deep learning; Membrane proteins; Protein function prediction; Recurrent neural network; Transport proteins; Vesicular trafficking model
Year: 2019 PMID: 31921391 PMCID: PMC6944713 DOI: 10.1016/j.csbj.2019.09.005
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. The flowchart for identifying vesicular transport proteins using GRU and PSSM profiles.
Statistics of all datasets used in this study.
| Class | Original | Identity < 30% | Cross-validation | Independent |
|---|---|---|---|---|
| Vesicular transport | 7108 | 2533 | 2214 | 319 |
| Non-vesicular transport | 17656 | 9086 | 7573 | 1513 |
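The split sizes in this table can be sanity-checked with simple arithmetic. The sketch below is plain Python; the dictionary layout and function names are illustrative and not from the paper. It confirms that the cross-validation and independent sets together make up the identity-filtered data for each class, and it computes how many negatives there are per positive in each split.

```python
# Counts copied from the dataset statistics table (illustrative layout).
DATASET = {
    "vesicular": {"identity_lt_30": 2533, "cv": 2214, "independent": 319},
    "non_vesicular": {"identity_lt_30": 9086, "cv": 7573, "independent": 1513},
}


def split_is_consistent(counts):
    """True if the cross-validation and independent sets add up to the filtered total."""
    return counts["cv"] + counts["independent"] == counts["identity_lt_30"]


def negatives_per_positive(split):
    """Class imbalance ratio (non-vesicular / vesicular) for 'cv' or 'independent'."""
    return DATASET["non_vesicular"][split] / DATASET["vesicular"][split]


if __name__ == "__main__":
    for name, counts in DATASET.items():
        print(name, "consistent:", split_is_consistent(counts))
    for split in ("cv", "independent"):
        print(split, round(negatives_per_positive(split), 2))  # cv 3.42, independent 4.74
```

Both splits are consistent, and the cross-validation set has roughly 3.4 non-vesicular proteins per vesicular one. This is the imbalance that the class weight tuning described in the abstract is meant to counter.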
Summary of the GRU architecture used in this study.
| Layer | Weights | Parameters |
|---|---|---|
| Conv1d (20, 250, 3) | ((250, 20, 3), (250,)) | 15,250 |
| AvgPool1d (3) | 0 | 0 |
| Conv1d (250, 250, 3) | ((250, 250, 3), (250,)) | 187,750 |
| AvgPool1d (3) | 0 | 0 |
| GRU (250, 150, 1) | ((750, 150), (750, 150), (750,), (750,)) | 226,500 |
| Linear (150, 32) | ((32, 150), (32,)) | 4832 |
| Dropout (0.01) | 0 | 0 |
| Linear (32, 1) | ((1, 32), (1,)) | 33 |
| Sigmoid () | 0 | 0 |
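The parameter column can be reproduced from the layer shapes. The sketch below assumes PyTorch weight layouts, which the (out, in, kernel) shapes in the table suggest. It also assumes stride 1 and no padding for the convolutions and the PyTorch default pooling stride (equal to the kernel size); the paper's table does not state padding, so the traced lengths are illustrative only. All function names are hypothetical.

```python
# Parameter counts assume PyTorch weight layouts; padding/stride are assumptions.

def conv1d_params(c_in, c_out, k):
    """Conv1d: weight (c_out, c_in, k) plus one bias per output channel."""
    return c_out * c_in * k + c_out


def linear_params(n_in, n_out):
    """Linear: weight (n_out, n_in) plus bias (n_out,)."""
    return n_out * n_in + n_out


def gru_params(n_in, hidden):
    """Single-layer nn.GRU: weight_ih (3h, n_in), weight_hh (3h, h), two biases (3h,)."""
    gates = 3 * hidden  # reset, update and candidate-state blocks stacked
    return gates * n_in + gates * hidden + 2 * gates


def conv_out_len(length, k):
    """Conv1d output length with stride 1 and no padding (assumed)."""
    return length - k + 1


def pool_out_len(length, k):
    """AvgPool1d output length with stride = kernel size, no padding (PyTorch default)."""
    return (length - k) // k + 1


def trace_lengths(seq_len):
    """Sequence length after each conv/pool stage for an L x 20 PSSM input."""
    lengths = []
    length = seq_len
    for op in ("conv", "pool", "conv", "pool"):
        length = conv_out_len(length, 3) if op == "conv" else pool_out_len(length, 3)
        lengths.append(length)
    return lengths


if __name__ == "__main__":
    print(conv1d_params(20, 250, 3))   # 15250, matches the table
    print(conv1d_params(250, 250, 3))  # 187750, matches the table
    print(linear_params(150, 32))      # 4832
    print(linear_params(32, 1))        # 33
    print(gru_params(250, 150))        # 180900 under the standard layout
    print(trace_lengths(100))          # [98, 32, 30, 10]
```

The convolution and linear counts match the table exactly. For a standard one-layer PyTorch `nn.GRU` with input size 250 and hidden size 150, however, the formula gives 180,900 parameters rather than the 226,500 listed, so the GRU row may reflect a configuration other than its label suggests. Under these assumptions, a 100-residue PSSM would reach the GRU as 10 positions of 250 features each.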
Fig. 2. Amino acid composition in vesicular transport and non-vesicular transport proteins.
Fig. 3. Comparison between vesicular and non-vesicular transport proteins using their dipeptide and tripeptide composition.
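The composition features compared in Figs. 2 and 3 are k-mer frequencies: single residues for amino acid composition, pairs for dipeptide composition and triples for tripeptide composition. The sketch below is an assumed implementation, not the authors' code. It normalizes by the number of valid windows and skips windows containing non-standard residues; both are illustrative choices.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_composition(seq, k=1):
    """Relative frequency of every k-mer over the 20 standard amino acids.

    k=1 gives amino acid composition (20 values), k=2 dipeptide composition
    (400 values) and k=3 tripeptide composition (8000 values).
    Windows containing non-standard residues (e.g. X) are skipped.
    """
    seq = seq.upper()
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    valid = [w for w in windows if all(c in AMINO_ACIDS for c in w)]
    counts = Counter(valid)
    total = len(valid) or 1  # avoid division by zero for very short sequences
    return {"".join(p): counts["".join(p)] / total
            for p in product(AMINO_ACIDS, repeat=k)}


if __name__ == "__main__":
    aac = kmer_composition("AACX")       # X is ignored
    print(aac["A"], aac["C"])            # 2/3 and 1/3
    dpc = kmer_composition("ACDE", k=2)
    print(len(dpc), dpc["AC"])           # 400 features, AC occurs in 1 of 3 windows
```

Averaging these vectors over each class gives the kind of per-residue or per-dipeptide comparison plotted in the figures.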
Performance results of identifying vesicular transport proteins with different fully connected (FC) layer sizes.
| FC sizes | Sensitivity | Precision | Specificity | Accuracy | MCC | AUC |
|---|---|---|---|---|---|---|
| 16 | 39.6 | 63.4 | 93.5 | 81.5 | 0.40 | 0.765 |
| 32 | | 63.4 | 93.3 | | | |
| 64 | 34.6 | | | 81.4 | 0.38 | 0.757 |
| 128 | 40.8 | 63 | 93.1 | 81.5 | 0.40 | 0.75 |
| 256 | 38.8 | 63.2 | 93.5 | 81.4 | 0.39 | 0.762 |
| 512 | 38.2 | 63.7 | 93.8 | 81.4 | 0.39 | 0.76 |
| 1024 | 37.1 | 64.7 | 94.2 | 81.5 | 0.39 | 0.757 |
Bold values indicate the highest value for each metric.
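All the threshold-based metrics in these tables derive from a binary confusion matrix. The sketch below uses the standard definitions (sensitivity is recall on the vesicular class); the counts in the usage line are hypothetical and do not come from the paper. The tables report these values as percentages, except MCC and AUC.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, precision, specificity, accuracy and MCC from a confusion matrix."""
    sensitivity = tp / (tp + fn)   # fraction of positives recovered
    precision = tp / (tp + fp)     # fraction of predicted positives that are correct
    specificity = tn / (tn + fp)   # fraction of negatives recovered
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 when a margin is empty
    return {"sensitivity": sensitivity, "precision": precision,
            "specificity": specificity, "accuracy": accuracy, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(binary_metrics(tp=80, fp=20, tn=180, fn=20))
    # sensitivity 0.8, precision 0.8, specificity 0.9, accuracy ~0.867, mcc 0.7
```

MCC uses all four cells of the matrix, which is why it is a more informative summary than accuracy when the classes are imbalanced, as they are here.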
Comparative performance of different techniques for handling class imbalance.
| Techniques | Sensitivity | Precision | Specificity | Accuracy | MCC | AUC |
|---|---|---|---|---|---|---|
| Oversampling | 77.3 | 47.4 | 82.5 | 81.6 | 0.50 | 0.849 |
| Undersampling | 60.4 | 46.5 | 85.8 | 81.5 | 0.42 | 0.781 |
| Class weight tuning | 79.2 | 48.7 | 82.9 | 82.3 | 0.52 | 0.861 |
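Class weight tuning changes the loss rather than the data: errors on the minority (vesicular) class are penalized more heavily. The paper's tuned weights are not given here. As an assumed starting point, the sketch below uses the common "balanced" heuristic, n_samples / (n_classes × n_class), applied to the cross-validation counts, together with a per-class weighted binary cross-entropy. Function names are hypothetical.

```python
import math


def balanced_class_weights(n_pos, n_neg):
    """'Balanced' heuristic: n_samples / (n_classes * n_class) for each class."""
    total = n_pos + n_neg
    return total / (2 * n_pos), total / (2 * n_neg)


def weighted_bce(probs, labels, w_pos, w_neg, eps=1e-7):
    """Mean binary cross-entropy with a separate weight for each class."""
    loss = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clip so log() stays finite
        if y == 1:
            loss += -w_pos * math.log(p)
        else:
            loss += -w_neg * math.log(1 - p)
    return loss / len(labels)


if __name__ == "__main__":
    # Cross-validation counts from the dataset table: 2214 positives, 7573 negatives.
    w_pos, w_neg = balanced_class_weights(2214, 7573)
    print(round(w_pos, 2), round(w_neg, 3))  # 2.21 0.646
    print(weighted_bce([0.9, 0.2], [1, 0], w_pos, w_neg))
```

"Tuning" would then mean searching weights around these values and keeping the setting with the best validation MCC. By contrast, the oversampling and undersampling rows change the training set itself. In PyTorch, a similar effect is available through the `pos_weight` argument of `nn.BCEWithLogitsLoss`, which requires removing the final sigmoid layer.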
Fig. 4. ROC curves of different methods for identifying vesicular transport proteins.
Comparative performance of different protein function prediction methods.
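The AUC reported for each method is the area under its ROC curve. Equivalently, it is the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. The sketch below computes this quantity directly from pairs. It runs in quadratic time and is meant to show the definition, not to be efficient; the scores are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical scores: 3 of the 4 positive/negative pairs are ranked correctly.
    print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

Because AUC depends only on the ranking of scores, it does not depend on any decision threshold. This is why it can favor one method while threshold-based metrics such as accuracy favor another.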
| Techniques | Sensitivity | Precision | Specificity | Accuracy | MCC | AUC |
|---|---|---|---|---|---|---|
| Traditional GRU* | 70.8 | 44 | 81 | 79.2 | 0.44 | 0.848 |
| BLSTM | 54.2 | 55.8 | 90.9 | 84.6 | 0.46 | 0.846 |
| BLAST | 54.1 | 52.8 | 89.8 | 83.6 | 0.43 | 0.82 |
| New GRU** | 79.2 | 48.7 | 82.9 | 82.3 | 0.52 | 0.861 |
(* Traditional PSSM profiles + GRU; ** our GRU architecture.)