Jie Li, Zhong Li, Jiesi Luo, Yuhua Yao.
Abstract
The type III secretion system (T3SS) is a specialized protein delivery system in Gram-negative bacteria that delivers T3SS-secreted effectors (T3SEs) into host cells, causing pathological changes. Numerous experiments have verified that T3SEs play important roles in many biological activities and in host-pathogen interactions. Accurate identification of T3SEs is therefore essential for understanding bacterial pathogenic mechanisms; however, many existing experimental methods are time-consuming and expensive. Deep-learning methods have recently been applied successfully to T3SE recognition, but improving recognition accuracy remains a challenge. In this study, we developed a new deep-learning framework, ACNNT3, based on the attention mechanism. We converted the 100 N-terminal residues of each protein sequence into a fused feature vector that combines primary structure information (one-hot encoding) with a position-specific scoring matrix (PSSM), and used it as the input to the network model. We then embedded an attention layer into a CNN to learn the characteristic preferences of type III effector proteins, so that the model classifies a protein directly as either a T3SE or a non-T3SE. We found that introducing the new protein features improves the recognition accuracy of the model. Our method combines the advantages of the CNN and the attention mechanism and outperforms other popular methods on many indicators. On the common independent dataset, our method is more accurate than previous methods, showing an improvement of 4.1-20.0%.
Year: 2020 PMID: 32328150 PMCID: PMC7157791 DOI: 10.1155/2020/3974598
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
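The one-hot half of the feature encoding described in the abstract can be illustrated with a minimal sketch. The alphabet order, the zero-padding of short sequences, the all-zero rows for non-standard residues, and the name `one_hot_n_terminal` are all assumptions made for illustration; the abstract does not specify them, and it does not say how the one-hot matrix is fused with the PSSM, so that step is left out.

```python
# Illustrative sketch only: alphabet order, padding scheme, and the function
# name are assumptions, not details taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, assumed order
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_n_terminal(sequence, length=100):
    """Encode the first `length` residues as a length x 20 one-hot matrix."""
    rows = []
    for residue in sequence[:length].upper():
        row = [0] * len(AMINO_ACIDS)
        idx = INDEX.get(residue)
        if idx is not None:  # non-standard residues (e.g. X) stay all-zero
            row[idx] = 1
        rows.append(row)
    # Pad sequences shorter than `length` with all-zero rows (assumed scheme).
    while len(rows) < length:
        rows.append([0] * len(AMINO_ACIDS))
    return rows


matrix = one_hot_n_terminal("MKTAY")  # hypothetical 5-residue sequence
print(len(matrix), len(matrix[0]))
```

Each row holds at most one nonzero entry, so the matrix records residue identity and position only; the PSSM would supply the evolutionary information for the same positions.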
Figure 1. ACNNT3 architecture for T3SE prediction. First, 64 1D convolution kernels of length 6 are applied to generate a 195 × 64 feature map, and a 3 × 1 max-pooling layer then reduces it to a 65 × 64 feature map. This feature map is fed to the attention and fully connected layers, and the two outputs are combined to give 66 nodes. Finally, the 66 nodes are fully connected to two output nodes, and a sigmoid activation yields the predicted probabilities of T3SE and non-T3SE.
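The feature-map sizes in the Figure 1 caption can be checked with the standard output-length formulas for 1D convolution and pooling. The sketch below assumes an unpadded, stride-1 convolution and non-overlapping pooling; under those assumptions a 195-long output from a length-6 kernel implies an input length of 200. That input length is an inference, not a value stated in the caption.

```python
def conv1d_output_length(input_length, kernel_size, stride=1, padding=0):
    """Standard output length of a 1D convolution."""
    return (input_length + 2 * padding - kernel_size) // stride + 1


def pool1d_output_length(input_length, pool_size):
    """Output length of non-overlapping pooling (stride equals pool size)."""
    return input_length // pool_size


ASSUMED_INPUT_LENGTH = 200  # inferred from 195 = L - 6 + 1; not in the caption
NUM_KERNELS = 64            # from the caption

conv_len = conv1d_output_length(ASSUMED_INPUT_LENGTH, kernel_size=6)
pool_len = pool1d_output_length(conv_len, pool_size=3)

print((conv_len, NUM_KERNELS))  # (195, 64)
print((pool_len, NUM_KERNELS))  # (65, 64)
```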
Figure 2. ACC comparison on the independent dataset under different epochs and batch sizes.
Figure 3. ROC curves on different training sets. (a) 5-fold cross-validation on training set 1. (b) 5-fold cross-validation on training set 2.
Figure 4. Comparison of experimental results for the fusion feature and single features under the same network model.
Comparison with mainstream deep-learning methods.
| Method | PRE | F1 score | ACC | MCC | AUC |
|---|---|---|---|---|---|
| ACNNT3 | | | | | |
| DenseNet | 0.850 | 0.907 | 0.942 | 0.870 | 0.951 |
| VGG16 | 0.846 | 0.892 | 0.934 | 0.847 | 0.937 |
| ResNet | 0.609 | 0.691 | 0.838 | 0.552 | 0.795 |
| CNN | 0.780 | 0.842 | 0.901 | 0.776 | 0.904 |
| LSTM | 0.875 | 0.933 | 0.959 | 0.909 | 0.961 |
The bold values indicate the best prediction results.
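The column abbreviations in these tables (PRE, SN, SP, F1 score, ACC, MCC) all follow from the four confusion-matrix counts. The sketch below uses the usual textbook definitions; this section does not spell out the formulas the authors used, so treat the definitions as an assumption. The counts in the example are hypothetical and are not taken from the paper.

```python
import math


def _safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion counts."""
    pre = _safe_div(tp, tp + fp)             # precision
    sn = _safe_div(tp, tp + fn)              # sensitivity (recall)
    sp = _safe_div(tn, tn + fp)              # specificity
    f1 = _safe_div(2 * pre * sn, pre + sn)   # harmonic mean of PRE and SN
    acc = _safe_div(tp + tn, tp + fp + tn + fn)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _safe_div(tp * tn - fp * fn, mcc_denominator)
    return {"PRE": pre, "SN": sn, "SP": sp, "F1": f1, "ACC": acc, "MCC": mcc}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=30, fp=5, tn=80, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because MCC uses all four counts, it can stay low on an imbalanced test set even when ACC and F1 look high.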
Comparison of ACNNT3 and DeepT3, Effective T3, BPBAac, and BEAN2 on an independent dataset.
| Method | PRE | SN | SP | F1 score | ACC | MCC | AUC |
|---|---|---|---|---|---|---|---|
| ACNNT3-1 | 0.919 | | 0.965 | | | | 0.968 |
| ACNNT3-2 | 0.711 | 0.914 | 0.849 | 0.800 | 0.868 | 0.716 | 0.882 |
| DeepT3-1 | 0.825 | 0.943 | 0.919 | 0.880 | 0.926 | 0.830 | |
| DeepT3-2 | 0.643 | 0.771 | 0.825 | 0.701 | 0.810 | 0.569 | 0.896 |
| Effective T3 | 0.542 | 0.839 | 0.741 | 0.658 | 0.767 | 0.521 | 0.803 |
| BPBAac | | 0.548 | | 0.694 | 0.871 | 0.656 | 0.902 |
| BEAN2 | 0.674 | 0.935 | 0.835 | 0.784 | 0.862 | 0.706 | 0.865 |
The bold values indicate the best prediction results.
Comparison of ACNNT3 and DeepT3, Effective T3, BPBAac, and BEAN2 on a P. syringae dataset.
| Method | PRE | SN | SP | F1 score | ACC | MCC | AUC |
|---|---|---|---|---|---|---|---|
| ACNNT3-1 | 0.900 | 0.976 | 0.357 | 0.936 | | 0.452 | 0.667 |
| ACNNT3-2 | 0.872 | | 0.143 | 0.926 | 0.866 | 0.265 | 0.565 |
| DeepT3-1 | 0.905 | 0.962 | 0.429 | 0.932 | 0.884 | | |
| DeepT3-2 | | 0.924 | 0.500 | 0.918 | 0.860 | 0.437 | 0.763 |
| Effective T3 | 0.906 | 0.906 | 0.428 | 0.906 | 0.838 | 0.334 | 0.810 |
| BPBAac | 0.875 | 0.494 | | 0.631 | 0.505 | 0.046 | 0.562 |
| BEAN2 | 0.883 | 0.988 | 0.083 | | 0.884 | 0.271 | 0.607 |
The bold values indicate the best prediction results.
Figure 5. ROC curves of the best model selected from 5-fold cross-validation on two datasets. (a) ROC curve on the common independent dataset. (b) ROC curve on the P. syringae dataset.
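The AUC values reported alongside these ROC curves can be understood through the rank interpretation of AUC: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. The sketch below implements that definition directly. It is an illustration of the metric, not the authors' evaluation code, and the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions: 1 = T3SE, 0 = non-T3SE.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 0.889
```

The pairwise loop is quadratic in the number of samples, which is fine for test sets of a few hundred proteins; a sort-based rank computation would be the usual choice for larger sets.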