Xiao Wang, Hui Li, Rong Wang, Qiuwen Zhang, Weiwei Zhang, Yong Gan.
Abstract
Apoptosis proteins play an important role in the mechanism of programmed cell death. Predicting the subcellular localization of apoptosis proteins is an essential step toward understanding their functions and identifying drug targets. Many computational methods have been developed for predicting apoptosis protein subcellular localization. However, existing work focuses only on proteins that have a single location; proteins with multiple locations are either not considered or assumed not to exist when the prediction models are constructed, so these methods cannot predict all the locations of multilocation apoptosis proteins. To address this problem, this paper proposes a novel multilabel predictor named MultiP-Apo, which can predict apoptosis proteins with a single subcellular location as well as those with multiple subcellular locations. Specifically, given a query protein, a GO-based feature extraction method is used to extract its feature vector. The GO feature vector is then classified by a new multilabel classifier based on label-specific features. It is the first multilabel predictor established for identifying the subcellular locations of multilocation apoptosis proteins. As an initial study, MultiP-Apo achieves an overall accuracy of 58.49% in the jackknife test, which indicates that our proposed predictor may become a useful high-throughput tool in this area.
Year: 2017 PMID: 28744305 PMCID: PMC5514333 DOI: 10.1155/2017/9183796
Source DB: PubMed Journal: Comput Intell Neurosci
Breakdown of the apoptosis protein benchmark dataset MSapo518.
| Order | Compartment | Number of proteins |
|---|---|---|
| 1 | Cytoplasm | 244 |
| 2 | Membrane | 126 |
| 3 | Secreted | 36 |
| 4 | Mitochondrion | 107 |
| 5 | Nucleus | 207 |
| 6 | Endosome | 12 |
| 7 | Endoplasmic reticulum | 47 |
| 8 | Golgi apparatus | 25 |
Figure 1. The distribution of proteins with different numbers of subcellular locations.
Figure 2. Schematic illustration of using Pearson's correlation coefficient (PCC) to rank features for each class label.
Figure 3. A flowchart showing how the MultiP-Apo predictor works. See the text for further explanation.
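The per-label PCC ranking described in the Figure 2 caption can be illustrated with a small sketch. This is not the authors' implementation: the function names (`pearson`, `rank_features_per_label`), the toy matrices, and the choice to rank by absolute correlation and to score constant columns as 0 are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences.

    Returns 0.0 when either sequence is constant (an assumption made
    here so that uninformative features sort last).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features_per_label(X, Y):
    """For each label column in Y, return feature indices of X sorted by
    decreasing |PCC| between the feature column and the label column."""
    n_features, n_labels = len(X[0]), len(Y[0])
    rankings = []
    for j in range(n_labels):
        label = [row[j] for row in Y]
        scores = [
            abs(pearson([row[f] for row in X], label))
            for f in range(n_features)
        ]
        # sorted() is stable, so tied features keep their index order.
        rankings.append(sorted(range(n_features), key=lambda f: -scores[f]))
    return rankings


# Hypothetical toy data: 4 proteins, 3 binary features, 2 location labels.
X = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
Y = [[1, 0], [1, 1], [0, 0], [0, 1]]
rankings = rank_features_per_label(X, Y)
for j, order in enumerate(rankings):
    print(f"label {j}: features ranked {order}")
```

Each label gets its own ordering, so a feature that is uninformative for one compartment can still rank first for another.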
Performance comparison of MultiP-Apo with BrP-Apo on the benchmark dataset MSapo518 by the jackknife test.
| Measure | MultiP-Apo (%) | BrP-Apo (%) |
|---|---|---|
| mlACC | 76.37 | 62.84 |
| mlPRE | 84.12 | 71.10 |
| mlREC | 84.86 | 74.56 |
| mlF1 | 81.87 | 69.61 |
| ACC | 58.49 | 42.08 |
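This record does not define the five measures in the table. The sketch below assumes the example-based multilabel definitions commonly used in this literature: mlACC as the mean Jaccard overlap between true and predicted label sets, mlPRE and mlREC as mean per-protein precision and recall, mlF1 as the mean per-protein F1, and ACC as the fraction of proteins whose predicted set matches the true set exactly. The function name, the handling of empty sets, and the toy data are illustrative assumptions.

```python
def multilabel_metrics(true_sets, pred_sets):
    """Example-based multilabel measures (assumed definitions).

    true_sets, pred_sets: equal-length lists of Python sets of labels.
    Returns a dict of fractions in [0, 1].
    """
    n = len(true_sets)
    ml_acc = ml_pre = ml_rec = ml_f1 = exact = 0.0
    for y, z in zip(true_sets, pred_sets):
        inter = len(y & z)
        union = len(y | z)
        # Two empty sets are treated as a perfect match (an assumption).
        ml_acc += inter / union if union else 1.0
        ml_pre += inter / len(z) if z else 0.0
        ml_rec += inter / len(y) if y else 0.0
        ml_f1 += 2 * inter / (len(y) + len(z)) if (y or z) else 1.0
        exact += 1.0 if y == z else 0.0
    return {
        "mlACC": ml_acc / n,
        "mlPRE": ml_pre / n,
        "mlREC": ml_rec / n,
        "mlF1": ml_f1 / n,
        "ACC": exact / n,
    }


# Hypothetical predictions for three proteins.
true_sets = [{"Cytoplasm"}, {"Cytoplasm", "Nucleus"}, {"Membrane"}]
pred_sets = [{"Cytoplasm"}, {"Nucleus"}, {"Secreted"}]
scores = multilabel_metrics(true_sets, pred_sets)
for name, value in scores.items():
    print(f"{name}: {100 * value:.2f}%")
```

Under these definitions ACC is the strictest measure, since a partially correct set counts as a miss. That would be consistent with ACC being the lowest value in both columns.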
A comparison of the overall accuracies (ACCs) of MultiP-Apo and BrP-Apo for proteins with different numbers of subcellular locations.
| Number of locations | Number of proteins | MultiP-Apo ACC (%) | BrP-Apo ACC (%) |
|---|---|---|---|
| 1 | 303 | 68.65 | 50.83 |
| 2 | 155 | 56.13 | 36.77 |
| 3 | 52 | 15.38 | 13.46 |
| 4 | 6 | 0 | 0 |
| 5 | 1 | 0 | 0 |
| 6 | 1 | 0 | 0 |
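The tables in this record can be cross-checked against each other with a little arithmetic. The sketch below uses only numbers reported above; the variable names are illustrative. It checks that the per-location-count rows sum to the 518 proteins of MSapo518, that the implied number of protein-location pairs matches the compartment breakdown, and that the protein-weighted average of the per-group ACC values approximately reproduces the overall ACC (up to rounding of the published percentages).

```python
# (number of locations, number of proteins, MultiP-Apo ACC %, BrP-Apo ACC %)
rows = [
    (1, 303, 68.65, 50.83),
    (2, 155, 56.13, 36.77),
    (3, 52, 15.38, 13.46),
    (4, 6, 0.0, 0.0),
    (5, 1, 0.0, 0.0),
    (6, 1, 0.0, 0.0),
]

# Proteins per compartment, from the MSapo518 breakdown table.
compartments = {
    "Cytoplasm": 244, "Membrane": 126, "Secreted": 36, "Mitochondrion": 107,
    "Nucleus": 207, "Endosome": 12, "Endoplasmic reticulum": 47,
    "Golgi apparatus": 25,
}

total_proteins = sum(n for _, n, _, _ in rows)
total_locative = sum(k * n for k, n, _, _ in rows)

# Protein-weighted averages of the per-group accuracies.
weighted_multip = sum(n * a for _, n, a, _ in rows) / total_proteins
weighted_brp = sum(n * b for _, n, _, b in rows) / total_proteins

print("proteins:", total_proteins)
print("protein-location pairs:", total_locative,
      "vs compartment total:", sum(compartments.values()))
print("average locations per protein:", round(total_locative / total_proteins, 3))
print("weighted MultiP-Apo ACC (%):", round(weighted_multip, 2))
print("weighted BrP-Apo ACC (%):", round(weighted_brp, 2))
```

Because no protein with four or more locations is predicted exactly by either method, the overall ACC is determined entirely by the first three rows.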
Figure 4. How different numbers of homologous proteins affect prediction performance for (a) mlACC, (b) mlPRE, (c) mlREC, (d) mlF1, and (e) ACC.
Multilabel performance comparison of MultiP-Apo with GO-DWKNN on a new dataset by the independent test.
| Measure | MultiP-Apo (%) | GO-DWKNN (%) |
|---|---|---|
| mlACC | 69.17 | 48.53 |
| mlPRE | 90.38 | 88.46 |
| mlREC | 72.05 | 48.53 |
| mlF1 | 77.07 | 59.87 |
| ACC | 46.15 | 19.23 |