Xiao Wang, Yinping Jin, Qiuwen Zhang.
Abstract
Mitochondrial proteins are physiologically active in different compartments, and their mislocalization can trigger the pathogenesis of human mitochondrial pathologies. Correctly identifying submitochondrial locations can inform the study of disease pathogenesis and drug design. A mitochondrion has four submitochondrial compartments: the matrix, the outer membrane, the inner membrane, and the intermembrane space. Many existing studies, however, ignore the intermembrane space. Most researchers have used traditional machine learning methods to predict mitochondrial protein localization. Those predictors require expert-level biological knowledge to be encoded as features, rather than allowing the underlying predictor to extract features through a data-driven procedure. Moreover, few researchers have considered the class imbalance in the datasets. In this paper, we propose DeepPred-SubMito, a novel end-to-end predictor that employs deep neural networks for protein submitochondrial location prediction. First, we use random over-sampling to reduce the influence of the unbalanced datasets. Next, we train a multi-channel bilayer convolutional neural network on multiple subsequences to learn high-level features. Finally, the prediction is output through a fully connected layer. The performance of the predictor is measured by 10-fold cross-validation on the SM424-18 dataset and by 5-fold cross-validation on the SubMitoPred dataset. Experimental results show that the predictor outperforms state-of-the-art predictors. In addition, the prediction results on the M983 dataset also confirm its effectiveness in predicting submitochondrial locations.
Keywords: deep learning; imbalance data; mitochondria; mitochondrial intermembrane space
Year: 2020 PMID: 32784927 PMCID: PMC7460811 DOI: 10.3390/ijms21165710
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Receiver operating characteristic (ROC) curves of the proposed predictor on imbalanced and balanced data. (a) ROC on imbalanced data; (b) ROC on balanced data.
DeepPred-SubMito structure parameters.
| Parameter | List of Values Evaluated |
|---|---|
| Sliding window size (W) | 80, 130, 180, 230, 280 |
| Max-pooling | 2 |
| Number of convolutional motifs (F) | 32, 64, 128 |
| Kernel size (k) | 3, 5, 7, 9 |
| Dropout (D) | 0.25 |
| Optimization | Adam |
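The candidate values in the table above can be enumerated as a search grid. The following is a minimal sketch, assuming that every combination of window size, filter count, and kernel size is a candidate; the source lists the values evaluated but does not state that a full grid was run, and the names `parameter_grid` and `FIXED` are hypothetical.

```python
from itertools import product

# Values copied from the "DeepPred-SubMito structure parameters" table.
WINDOW_SIZES = [80, 130, 180, 230, 280]  # sliding window size W
NUM_FILTERS = [32, 64, 128]              # number of convolutional motifs F
KERNEL_SIZES = [3, 5, 7, 9]              # kernel size k

# Settings for which the table lists a single value.
FIXED = {"max_pooling": 2, "dropout": 0.25, "optimizer": "Adam"}


def parameter_grid():
    """Yield one configuration dict per (W, F, k) combination.

    Assumption: an exhaustive cross product; the source does not say
    whether every combination was actually trained.
    """
    for w, f, k in product(WINDOW_SIZES, NUM_FILTERS, KERNEL_SIZES):
        yield {"window": w, "filters": f, "kernel": k, **FIXED}


configs = list(parameter_grid())
print(len(configs))  # 5 * 3 * 4 = 60 combinations
```

Under this assumption the search space has 60 configurations, each of which would be scored by cross-validation.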
CNN structure.
| Name | Architecture |
|---|---|
| 1 layer32 | 32 Convolution kernels |
| 1 layer64 | 64 Convolution kernels |
| 1 layer128 | 128 Convolution kernels |
| 2 layer | 64/128 Convolution kernels |
| 3 layer | 64/64/128 Convolution kernels |
Figure 2. Accuracy for different sliding window sizes under different convolutional neural network (CNN) structures on the SubMitoPred dataset.
Figure 3. Performance comparison of DeepPred-SubMito with different CNN structures and kernel sizes on the SubMitoPred dataset.
Figure 4. Accuracy of different CNN structure models on the SubMitoPred dataset.
Performance comparison of different predictors.
| Dataset | Model | MCC (O) | MCC (I) | MCC (S) | MCC (M) | ACC |
|---|---|---|---|---|---|---|
| SM424-18 | DeepMito | 0.46 | 0.47 | 0.53 | 0.65 | NA |
| SM424-18 | DeepPred-SubMito | 0.85 | 0.49 | 0.99 | 0.56 | 0.79 |
| SubMitoPred | SubMitoPred | 0.42 | 0.34 | 0.19 | 0.51 | NA |
| SubMitoPred | DeepMito | 0.45 | 0.68 | 0.54 | 0.79 | NA |
| SubMitoPred | DeepPred-SubMito | 0.92 | 0.69 | 0.97 | 0.73 | 0.88 |
MCC (O, I, S, M): Matthews correlation coefficient for outer membrane, inner membrane, intermembrane space, and matrix localization, respectively. ACC: accuracy. NA: not available.
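A per-compartment MCC of this kind is commonly computed one-vs-rest: the compartment of interest is treated as the positive class and all other compartments as negative. The sketch below assumes that convention, which the source does not spell out; the function name `mcc_one_vs_rest` and the toy labels are hypothetical and are not data from the paper.

```python
import math


def mcc_one_vs_rest(y_true, y_pred, label):
    """Matthews correlation coefficient for one class against all others.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when the denominator is zero (a common convention).
    """
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == label and t == label:
            tp += 1
        elif p == label:
            fp += 1
        elif t == label:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels for illustration only (O, I, S, M as in the table note).
y_true = ["O", "O", "I", "I", "S", "M"]
y_pred = ["O", "I", "I", "I", "S", "M"]

for compartment in ["O", "I", "S", "M"]:
    print(compartment, round(mcc_one_vs_rest(y_true, y_pred, compartment), 4))
```

Because MCC uses all four cells of the confusion matrix, it remains informative for a small class such as the intermembrane space, where accuracy alone can look high even if the class is rarely predicted correctly.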
Figure 5. Confusion matrices for multi-class classification on the SM424-18 and SubMitoPred datasets. (a) SM424-18 dataset; (b) SubMitoPred dataset.
Submitochondrial location prediction results on the M983 dataset.
| Dataset | Model | MCC (I) | MCC (M) | MCC (O) | ACC (%) |
|---|---|---|---|---|---|
| M983 | SubMito-PSPCP | 0.77 | 0.73 | 0.83 | 89.01 |
| M983 | Ahmad et al. | 0.871 | 0.986 | 0.996 | 0.951 |
| M983 | SubMito-XGBoost | 0.9559 | 0.9595 | 0.9604 | 98.94 |
| M983 | DeepPred-SubMito | 0.9503 | 0.9649 | 0.9807 | 97.68 |
MCC (I, M, O): Matthews correlation coefficient for inner membrane, matrix, and outer membrane localization, respectively. ACC: accuracy.
Composition of the datasets (number of proteins per compartment).
| Compartment | SM424-18 | SubMitoPred | M983 |
|---|---|---|---|
| Outer membrane | 74 | 82 | 145 |
| Inner membrane | 190 | 282 | 661 |
| Intermembrane space | 25 | 32 | NA |
| Matrix | 135 | 174 | 177 |
| Total | 424 | 570 | 983 |
NA: Not available.
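The class counts above show the imbalance that random over-sampling is meant to address: in SM424-18 the intermembrane space has 25 proteins against 190 for the inner membrane. The following is a minimal sketch of random over-sampling, assuming that each minority class is resampled with replacement up to the size of the largest class; the source names the technique but not the target ratio, and `random_oversample` and the integer placeholder samples are hypothetical.

```python
import random
from collections import Counter

# Class counts for SM424-18, copied from the dataset table.
SM424_18_COUNTS = {
    "outer membrane": 74,
    "inner membrane": 190,
    "intermembrane space": 25,
    "matrix": 135,
}


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every
    class matches the largest class (assumed target; not from the source)."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(items + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Placeholder samples: integer IDs stand in for protein sequences.
labels = [name for name, n in SM424_18_COUNTS.items() for _ in range(n)]
samples = list(range(len(labels)))

balanced_samples, balanced_labels = random_oversample(samples, labels)
print(Counter(balanced_labels))
```

With these counts every class ends up with 190 entries, 760 in total. When over-sampling is combined with cross-validation, it is usually applied to the training folds only, so that duplicated proteins do not leak into the held-out fold; whether the authors did so is not stated in this record.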
Figure 6. Flowchart of protein submitochondrial localization prediction with the DeepPred-SubMito predictor.