Zhe Liu, Yingli Gong, Yuanzhao Guo, Xiao Zhang, Chang Lu, Li Zhang, Han Wang.
Abstract
Transmembrane proteins (TMPs) are an important class of membrane proteins involved in a wide range of membrane-related biological processes. Because TMPs are major drug targets, their surfaces are of particular interest: they form the structural basis for binding drugs and other biological molecules. However, the number of experimentally determined TMP structures is still far below what is needed, and artificial intelligence offers a promising way to identify TMP surfaces accurately from sequence alone, without any feature engineering. To this end, we present TMP-SSurface2, an updated TMP surface residue predictor that achieves higher prediction accuracy than our previous version. The method uses an attention-enhanced bidirectional long short-term memory (BiLSTM) network. Its efficient learning capability lets it extract useful latent information from protein sequences, improving the Pearson correlation coefficient (CC) from 0.58 for the old version to 0.66 on an independent test dataset. The results demonstrate that TMP-SSurface2 predicts the surface of transmembrane proteins efficiently and represents new progress in transmembrane protein structure modeling based on primary sequences. TMP-SSurface2 is freely accessible at https://github.com/NENUBioCompute/TMP-SSurface-2.0.
Keywords: attention mechanism; deep learning; long short term memory; relative accessible surface area; transmembrane protein
Year: 2021 PMID: 33790952 PMCID: PMC8006303 DOI: 10.3389/fgene.2021.656140
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1. One-hot code of protein residues.
FIGURE 2. Encoding features as the model input.
FIGURE 3. (A) Pipeline of the deep learning model. (B) The attention-enhanced bidirectional LSTM network.
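Figure 1 refers to a one-hot code for residues. As a minimal sketch of that kind of encoding, the snippet below maps each residue to a 20-dimensional 0/1 vector. The alphabet order, the all-zero vector for unknown residues, and the name `one_hot_encode` are assumptions made for illustration; the figure itself may use a different layout.

```python
# Assumption: the 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return one 20-dimensional 0/1 vector per residue of `sequence`."""
    encoded = []
    for residue in sequence.upper():
        vector = [0] * len(AMINO_ACIDS)
        position = INDEX.get(residue)
        if position is not None:
            vector[position] = 1
        # Assumption: residues outside the alphabet (e.g. 'X') stay all-zero.
        encoded.append(vector)
    return encoded
```

Each row has at most one non-zero entry, which is why this feature alone carries no evolutionary information, unlike a PSSM.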
Prediction performance based on individual input features and their various combinations.
| Feature | CC | MAE |
| Z-coordinate | 0.310 | 0.191 |
| one-hot | 0.417 | 0.180 |
| PSSM | 0.631 | 0.144 |
| one-hot+PSSM | 0.641 | 0.142 |
| one-hot+PSSM+Z-coordinate | | |
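The CC and MAE columns used throughout these tables are the Pearson correlation coefficient and the mean absolute error between predicted and observed values. A minimal sketch of both metrics in plain Python follows; the function names are illustrative and are not taken from the TMP-SSurface2 code.

```python
import math


def pearson_cc(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    covariance = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    # Undefined (division by zero) if either sequence is constant.
    return covariance / math.sqrt(var_p * var_o)


def mean_absolute_error(predicted, observed):
    """Average absolute difference between predictions and observations."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)
```

A higher CC and a lower MAE both indicate better agreement, which is how the rows in each table should be compared.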
Effect of sliding window length on CC performance.
| Window Length | CC | MAE |
| 13 | 0.642 | 0.141 |
| 15 | 0.641 | 0.143 |
| 17 | 0.645 | 0.143 |
| 19 | | |
| 21 | 0.646 | 0.141 |
| 23 | 0.640 | 0.142 |
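The window lengths compared above are all odd, so each window can be centred on the residue being predicted. The sketch below shows one common way to build such windows from per-residue feature vectors. Zero padding at the sequence ends and the name `sliding_windows` are assumptions; this record does not state how the authors handle the termini.

```python
def sliding_windows(features, window_length):
    """Return one window of feature vectors centred on each residue.

    `features` is a non-empty list of equal-length per-residue vectors.
    """
    if window_length % 2 == 0:
        raise ValueError("window_length must be odd so the window has a centre")
    half = window_length // 2
    dim = len(features[0])
    # Assumption: positions beyond either terminus are filled with zeros.
    padding = [[0.0] * dim for _ in range(half)]
    padded = padding + list(features) + padding
    return [padded[i:i + window_length] for i in range(len(features))]
```

For a sequence of N residues this yields N windows, each holding `window_length` vectors, so the number of training samples does not depend on the window length.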
Effect of the number of LSTM layers on CC performance.
| LSTM Layers | CC | MAE | Num of parameters |
| 1 | 0.648 | 0.140 | 4,187,781 |
| 2 | | | |
| 3 | 0.642 | 0.141 | 27,718,981 |
| 4 | 0.646 | 0.141 | 39,484,581 |
Effect of dropout rate on CC performance.
| Dropout rate | Train CC | Test CC | Test MAE |
| No | 0.851 | 0.632 | 0.143 |
| 0.2 | 0.806 | 0.640 | 0.143 |
| 0.3 | | | |
| 0.4 | 0.762 | 0.641 | 0.141 |
| 0.5 | 0.725 | 0.638 | 0.143 |
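In the rows above, the train CC falls steadily as the dropout rate rises (0.851 with no dropout, 0.725 at 0.5), which is the expected effect of a regularizer. As a rough illustration of the mechanism, here is a sketch of inverted dropout in plain Python. It is a generic textbook form, not the authors' implementation, and the function name and explicit `rng` argument are illustrative.

```python
import random


def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    # Survivors are scaled up so the expected activation is unchanged.
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


# Usage: apply only during training, with a seeded generator for repeatability.
masked = dropout([0.2, 0.4, 0.6, 0.8], rate=0.3, rng=random.Random(0))
```

At inference time the layer is skipped entirely, which is why the rescaling is done during training.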
Effect of the number of LSTM units on CC performance.
| Num of units | CC | MAE | Num of parameters |
| 500 | 0.639 | 0.142 | 2,191,381 |
| 600 | 0.641 | 0.142 | 3,109,591 |
| 700 | | | |
| 800 | 0.643 | 0.143 | 5,425,981 |
| 900 | 0.646 | 0.140 | 6,824,181 |
Comparison of TMP-SSurface2 with the previous predictors on the independent dataset.
| Predictor | CC | MAE | Failure | Time Cost (min) |
| MPRAP | 0.397 | 0.176 | 9 | 6.5 |
| MemBrane-Rasa | 0.545 | 0.153 | 7 | 23.7 |
| TMP-SSurface | 0.584 | 0.144 | 0 | 4.7 |
| TMP-SSurface2 | | | | |
Performance of TMP-SSurface2 on different types of TMPs.
| TMP Types | Protein number | CC | MAE |
| α-helical TMPs | 45 | 0.674 | 0.138 |
| β-barrel TMPs | 5 | 0.562 | 0.151 |
| all-TMPs | 50 | 0.659 | 0.140 |
FIGURE 4. Validation loss curve of the training process with and without attention mechanism.
Contribution of attention mechanism.
| Model | CC | MAE |
| No attention | 0.637 | 0.150 |
| Attention with LSTM | | |
| Attention with Dropout | 0.645 | 0.141 |
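This record does not give the exact attention formulation used in TMP-SSurface2. As a hedged illustration of the general idea, the sketch below applies simple dot-product attention over a sequence of hidden states, such as the per-position outputs of a BiLSTM. The `context` vector stands in for a learned parameter and is hypothetical, as are the function names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, context):
    """Weight each hidden state by softmax(h_t . context) and sum them."""
    scores = [sum(h_i * c_i for h_i, c_i in zip(h, context)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return pooled, weights
```

The weights sum to 1, so the pooled vector is a convex combination of the hidden states, with positions that align with `context` contributing most.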
FIGURE 5. Visualization of the features learned by LSTM using PCA.
FIGURE 6. The 3D visualization of the predicted result (surface version).
FIGURE 7. The comparison between the TMP-SSurface2-predicted rASA values and real rASA values.
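The rASA values compared in Figure 7 are relative accessible surface areas: a residue's accessible surface area (ASA) divided by the maximum ASA for that residue type. The sketch below shows that normalization. The maximum is passed in as a parameter because this record does not say which reference scale the authors use, and clipping to [0, 1] is an assumption.

```python
def relative_asa(asa, max_asa):
    """Normalize an accessible surface area by the residue type's maximum ASA."""
    if max_asa <= 0:
        raise ValueError("max_asa must be positive")
    # Assumption: values are clipped so that rASA always lies in [0, 1].
    return min(max(asa / max_asa, 0.0), 1.0)


# Usage with placeholder numbers (not taken from any published scale).
example = relative_asa(asa=50.0, max_asa=100.0)
```

Working on this bounded scale is what makes an MAE such as 0.140 directly interpretable as a fraction of full exposure.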