Fei He^1,2, Rui Wang^1, Jiagen Li^1, Lingling Bao^1, Dong Xu^3, Xiaowei Zhao^4,5.
Abstract
BACKGROUND: Ubiquitination, also called "lysine ubiquitination", occurs when a ubiquitin is attached to lysine (K) residues in target proteins. As one of the most important post-translational modifications (PTMs), it plays a significant role not only in protein degradation but also in other cellular functions. Systematic characterization of the ubiquitination proteome is therefore an appealing and challenging research topic. Existing methods for identifying protein ubiquitination sites fall into two categories: mass spectrometry and computational methods. Mass spectrometry-based experimental methods can discover ubiquitination sites in eukaryotes, but are time-consuming and expensive. It is therefore a priority to develop computational approaches that can effectively and accurately identify protein ubiquitination sites.
Keywords: Convolution neural network; Deep learning; Deep neural network; Multiple modalities; Protein ubiquitination site
Year: 2018 PMID: 30463553 PMCID: PMC6249717 DOI: 10.1186/s12918-018-0628-0
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Details of the training, validation, and independent testing datasets
| Data set | Number of sequences | Number of positive data | Number of negative data | Note |
|---|---|---|---|---|
| Training | 12,100 | 7733 | 250,054 | Random partitioning in each training iteration |
| Validation | | 1547 | 50,010 | |
| Testing | 1345 | 6293 | 46,080 | Reservation |
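The positive and negative samples above are fixed-length sequence fragments centered on lysine (K) residues. A dependency-free sketch of how such windows might be extracted; the window size and the `X` padding character here are illustrative assumptions, not the paper's settings:

```python
def extract_k_windows(sequence, window_size=31, pad="X"):
    """Extract fixed-length windows centered on each lysine (K).

    Lysines near the sequence ends are padded with a placeholder
    character so every window has the same length.
    """
    half = window_size // 2
    padded = pad * half + sequence + pad * half
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Original position i maps to i + half in the padded string,
            # so padded[i : i + window_size] is centered on this lysine.
            windows.append((i + 1, padded[i:i + window_size]))
    return windows

# Two lysines in a toy sequence, window size 7
print(extract_k_windows("MKTAYKL", window_size=7))
# → [(2, 'XXMKTAY'), (6, 'TAYKLXX')]
```

Each window is then encoded for the three modalities (one-hot, physico-chemical properties, PSSM) described below.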
The selected physico-chemical properties
| Physico-chemical property | Description |
|---|---|
| EISD860102 | Atom-based hydrophobic moment |
| ZIMJ680104 | Isoelectric point |
| HUTJ700103 | Entropy of formation |
| KARP850103 | Flexibility parameter for two rigid neighbors |
| JANJ780101 | Average accessible surface area |
| FAUJ880111 | Positive charge |
| GUYH850104 | Apparent partition energies calculated from Janin index |
| JANJ780103 | Percentage of exposed residues |
| JANJ790102 | Transfer free energy |
| PONP800102 | Average gain in surrounding hydrophobicity |
| CORJ870101 | NNEIG index |
| VINM940101 | Normalized flexibility parameters, average |
| OOBM770101 | Average non-bonded energy per atom |
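Under an AAindex-style encoding, each residue in a window is replaced by its value for every selected property, giving one numeric block per index. A minimal sketch with placeholder values (the numbers below are hypothetical, NOT the real AAindex entries):

```python
# Hypothetical per-residue property values (placeholders, NOT real
# AAindex numbers) for two of the selected indices.
PROPS = {
    "ZIMJ680104": {"A": 6.00, "K": 9.74, "L": 5.98, "X": 0.0},  # isoelectric point
    "FAUJ880111": {"A": 0.0,  "K": 1.0,  "L": 0.0,  "X": 0.0},  # positive charge
}

def encode_physchem(window, props=PROPS):
    """Encode a residue window as a flat vector: one block of
    len(window) property values per physico-chemical index."""
    return [props[name].get(res, 0.0) for name in sorted(props) for res in window]

print(encode_physchem("KAL"))
# → [1.0, 0.0, 0.0, 9.74, 6.0, 5.98]
```

Unknown residues and padding characters fall back to 0.0, a common though not universal convention.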
Fig. 1 The structure of the proposed deep architecture
The hyper-parameters of the proposed deep architecture
| Subnet | Layer | Activation function | Size^c | Filters | Drop-out |
|---|---|---|---|---|---|
| One hot vector | 1D Convolution | softsign | 2 | 200 | 0.4 |
| | | softsign | 3 | 150 | 0.4 |
| | | softsign | 5 | 150 | 0.4 |
| | | softsign | 7 | 100 | 0.4 |
| | Dense^a | relu | 256 | – | 0.3 |
| | | relu | 128 | – | 0 |
| | | relu | 128 | – | – |
| Physico-chemical properties | Dense | softplus | 1024 | – | 0.2 |
| | | softplus | 512 | – | 0.4 |
| | | softplus | 256 | – | 0.5 |
| | | relu | 128 | – | – |
| PSSM profile | 1D Convolution | relu | 1 | 200 | 0.5 |
| | | relu | 8 | 150 | 0.5 |
| | | relu | 9 | 200 | 0.5 |
| | 1D Convolution^b | relu | 1 | 200 | 0.5 |
| | | relu | 3 | 150 | 0.5 |
| | | relu | 7 | 200 | 0.5 |
| | Dense | relu | 128 | – | 0.3 |
| | | relu | 128 | – | 0 |
| Merged representations | Dense | softmax | 2 | – | 0 |
^a Dense layers are the fully connected layers in Keras
^b These layers operate on the transposed PSSM profile
^c For convolution layers, size is the kernel size; for Dense layers, it is the number of hidden units
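For the convolutional subnets, "Size" is the kernel width and "Filters" is the number of output channels. A dependency-free sketch of a valid 1D convolution over a one-hot encoded window, with toy dimensions rather than the settings in the table above:

```python
def conv1d(inputs, kernels):
    """Valid 1D convolution. `inputs` is a list of per-position feature
    vectors; `kernels` is a list of filters, each a (width x in_dim)
    weight matrix. Returns one output vector per window position,
    with one value per filter."""
    width = len(kernels[0])
    out = []
    for start in range(len(inputs) - width + 1):
        window = inputs[start:start + width]
        out.append([
            sum(w * x
                for row, vec in zip(kernel, window)
                for w, x in zip(row, vec))
            for kernel in kernels
        ])
    return out

# One-hot inputs over a 3-letter alphabet, sequence length 4
onehot = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
# Two filters of kernel width 2 ("Size" = 2, "Filters" = 2)
kernels = [
    [[1, 0, 0], [0, 1, 0]],  # fires on letter 1 followed by letter 2
    [[0, 0, 1], [0, 0, 1]],  # responds to letter 3 in either position
]
print(conv1d(onehot, kernels))
# → [[2, 0], [0, 1], [0, 1]]
```

With a window of length L and kernel width k, a valid convolution yields L − k + 1 positions, which is why the table mixes several kernel sizes to capture motifs at different scales.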
Fig. 2 The accuracies of validation samples using different window sizes on three modalities
Fig. 3 ROC and precision-recall curves comparing our multi-modal network with the uni-modal subnets
Fig. 4 t-SNE visualization of (a) the input layers and (b) the merged layer
Comparative results with SVM and Random Forest classifiers
| Model | Input | Accuracy | Sensitivity | Specificity | MCC |
|---|---|---|---|---|---|
| SVM | One hot vector | 59.65% | 46.69% | 61.42% | 0.054 |
| | Physico-chemical property | 57.36% | 43.84% | 59.22% | 0.051 |
| | PSSM | 55.71% | 44.29% | 57.84% | 0.047 |
| | Merged | 56.92% | 44.34% | 58.97% | 0.049 |
| Random Forest | One hot vector | 57.27% | 45.01% | 58.94% | 0.026 |
| | Physico-chemical property | 56.55% | 47.40% | 57.80% | 0.034 |
| | PSSM | 54.19% | 44.98% | 56.32% | 0.021 |
| | Merged | 56.52% | 46.36% | 58.83% | 0.024 |
| Our deep architecture | One hot vector | 64.15% | 64.41% | 64.08% | 0.189 |
| | Physico-chemical property | 61.84% | 60.97% | 61.95% | 0.151 |
| | PSSM | 56.82% | 58.73% | 56.57% | 0.099 |
| | Merged | 66.43% | 66.67% | 66.40% | 0.221 |
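The metric columns in this and the following tables are standard confusion-matrix quantities. A small helper using the textbook formulas (not code from the paper); the toy counts illustrate how, under heavy class imbalance, a high accuracy can coexist with a low MCC, as in the iUbiq-Lys row of the tool comparison below:

```python
import math

def metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and Matthews correlation
    coefficient (MCC) from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    sen = tp / (tp + fn)          # true positive rate
    spe = tn / (tn + fp)          # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, sen, spe, mcc

# Imbalanced toy counts: accuracy 92.5%, yet MCC is only ~0.37
print(metrics(tp=5, fp=10, tn=180, fn=5))
```

Because ubiquitination datasets contain far more negative than positive sites, MCC is the most informative single column in these tables.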
Comparison of independent testing performance with other ubiquitination site prediction tools
| Tool | Accuracy | Sensitivity | Specificity | MCC |
|---|---|---|---|---|
| ESA-Ubisite | 61.26% | 46.14% | 63.34% | 0.064 |
| UbiProber | 55.06% | 62.40% | 54.05% | 0.107 |
| iUbiq-Lys | 84.63% | 3.35% | 96.88% | 0.005 |
| Ubisite | 73.63% | 29.62% | 79.64% | 0.073 |
| Our deep architecture | 66.43% | 66.67% | 66.40% | 0.221 |
Fig. 5 ROC and precision-recall curves comparing the proposed deep architecture with other protein ubiquitination site prediction tools
The performance of our deep architecture on different datasets
| Datasets | Accuracy | Sensitivity | Specificity | MCC |
|---|---|---|---|---|
| CPLM | 74.39% | 60.44% | 74.72% | 0.120 |
| Swiss-prot | 64.28% | 68.95% | 63.81% | 0.193 |
| hCKSAAP | 73.97% | 73.76% | 73.98% | 0.26 |
| PLMD | 66.43% | 66.67% | 66.40% | 0.221 |