Castrense Savojardo, Niccolò Bruciaferri, Giacomo Tartari, Pier Luigi Martelli, Rita Casadio.
Abstract
MOTIVATION: The correct localization of proteins in cell compartments is key to their function. In particular, mitochondrial proteins are physiologically active in different compartments, and their aberrant localization contributes to the pathogenesis of human mitochondrial pathologies. Many computational methods exist to assign protein sequences to subcellular compartments such as the nucleus, cytoplasm and organelles. However, a substantial lack of experimental evidence in public sequence databases has so far hampered finer-grained discrimination that also covers intra-organelle compartments.
Year: 2020 PMID: 31218353 PMCID: PMC6956790 DOI: 10.1093/bioinformatics/btz512
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Summary statistics of the SM424-18 and the SubMitoPred datasets
| Compartment | SM424-18 | SubMitoPred |
|---|---|---|
| Outer membrane | 74 | 82 |
| Inner membrane | 190 | 282 |
| Intermembrane space | 25 | 32 |
| Matrix | 135 | 174 |
| Total | 424 | 570 |
SM424-18: this paper.
Table entries: number of sequences.
SubMitoPred: from Kumar .
Fig. 1. Schematic view of the DeepMito CNN architecture.
Cross-validation performance on the SM424-18 dataset using different feature sets
| Feature set | MCC(O) | MCC(I) | MCC(T) | MCC(M) | GCC |
|---|---|---|---|---|---|
| SEQ | 0.17 | 0.15 | 0.13 | 0.07 | 0.15 |
| PROP | 0.17 | 0.07 | 0.22 | 0.13 | 0.19 |
| PSSM | 0.51 | 0.47 | 0.42 | 0.57 | 0.50 |
| SEQ+PROP | 0.16 | 0.07 | 0.55 | 0.09 | 0.34 |
| PSSM+PROP | 0.46 | 0.47 | 0.53 | 0.65 | 0.54 |
MCC(O), MCC(I), MCC(T), MCC(M): Matthews Correlation Coefficient for outer membrane, inner membrane, intermembrane space and matrix localization, respectively.
GCC: Generalized Correlation Coefficient (Equation (16)).
SEQ: residue one-hot encoding.
PROP: residue physicochemical attributes.
PSSM: Position Specific Scoring Matrix.
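The SEQ feature set above is a residue one-hot encoding, and combined sets such as SEQ+PROP join several per-residue features. A minimal sketch of both ideas follows. The alphabet ordering, the all-zero treatment of non-standard residues and the helper names `one_hot_encode` and `concat_features` are assumptions made for illustration; they are not taken from the DeepMito implementation.

```python
# 20 standard residues; this ordering is an arbitrary choice for the sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return an L x 20 matrix (list of rows) for a sequence of length L."""
    rows = []
    for residue in sequence.upper():
        row = [0] * len(AMINO_ACIDS)
        idx = INDEX.get(residue)
        # Assumption: non-standard residues (e.g. X) are left as all-zero rows.
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    return rows


def concat_features(*matrices):
    """Join per-residue feature matrices column-wise (a SEQ+PROP-style input)."""
    return [sum(rows, []) for rows in zip(*matrices)]


if __name__ == "__main__":
    seq_features = one_hot_encode("MLSX")
    # Hypothetical single-column property matrix, one value per residue.
    prop_features = [[0.1], [0.2], [0.3], [0.0]]
    combined = concat_features(seq_features, prop_features)
    print(len(combined), len(combined[0]))
```

Each residue becomes one row, so every feature matrix passed to `concat_features` needs the same number of rows as the sequence has residues.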
DeepMito prediction performance on proteins from different taxonomic kingdoms
| Kingdom | MCC(O) | MCC(I) | MCC(T) | MCC(M) | GCC |
|---|---|---|---|---|---|
| Metazoa (193) | 0.44 | 0.44 | 0.52 | 0.69 | 0.54 |
| Viridiplantae (60) | 0.45 | 0.52 | 0.90 | 0.76 | 0.71 |
| Fungi (166) | 0.49 | 0.52 | 0.37 | 0.59 | 0.50 |
MCC(O), MCC(I), MCC(T), MCC(M): Matthews Correlation Coefficient for outer membrane, inner membrane, intermembrane space and matrix localization, respectively.
GCC: Generalized Correlation Coefficient (Equation (16)).
Numbers in parentheses: number of sequences.
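The two scores reported in these tables can be sketched from a confusion matrix. The per-class MCC below is the standard one-vs-rest form. Equation (16) is not reproduced in this record, so the GCC function uses the chi-square-based definition commonly applied to K-class problems; that it matches the paper's equation exactly is an assumption. The function names and the example matrix are illustrative only and do not come from the paper.

```python
import math


def per_class_mcc(confusion, k):
    """One-vs-rest MCC for class k.

    confusion[i][j] is the number of proteins of true class i predicted as j.
    """
    n = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp
    fp = sum(row[k] for row in confusion) - tp
    tn = n - tp - fn - fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention used here: return 0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def gcc(confusion):
    """Generalized correlation coefficient (assumed chi-square-based form)."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    row_tot = [sum(row) for row in confusion]
    col_tot = [sum(row[j] for row in confusion) for j in range(k)]
    chi2 = 0.0
    for i in range(k):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:
                chi2 += (confusion[i][j] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


if __name__ == "__main__":
    # Made-up 4-class confusion matrix (order: O, I, T, M), for illustration.
    example = [[6, 2, 0, 1], [1, 15, 1, 3], [0, 1, 2, 1], [1, 2, 0, 12]]
    print([round(per_class_mcc(example, c), 2) for c in range(4)])
    print(round(gcc(example), 2))
```

Both scores equal 1 for a perfectly diagonal confusion matrix and 0 when predictions are independent of the true classes, which is why a single GCC value can summarize the four per-class MCCs.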
DeepMito prediction performance on mitochondrial membrane proteins with respect to annotated membrane protein topology
| Topology | NP (NO+NI) | Correctly predicted (%) | MCC(O) | MCC(I) |
|---|---|---|---|---|
| SP | 71 (31+40) | 92 | 0.43 | 0.38 |
| MP | 94 (21+73) | 98 | 0.47 | 0.49 |
| PM | 61 (6+55) | 36 | 0.36 | 0.09 |
NP (NO+NI): number of membrane proteins (outer + inner).
Correctly predicted (%): the fraction of proteins correctly predicted in either the inner or the outer membrane.
SP: single-pass membrane protein; MP: multiple-pass membrane protein; PM: peripheral membrane protein.
Performance comparison of different methods
| Method | Cross-validation | MCC(O) | MCC(I) | MCC(T) | MCC(M) |
|---|---|---|---|---|---|
| SubMitoPred | RS | 0.42 | 0.34 | 0.19 | 0.51 |
| DeepMito | RS | 0.45 | 0.68 | 0.54 | 0.79 |
| DeepMito | CL | 0.42 | 0.60 | 0.46 | 0.76 |
SubMitoPred: results taken from Kumar .
RS: cross-validation performed by randomly splitting the dataset. CL: cross-validation performed by confining any local similarity to the same cross-validation set (see Sections 2.2.1 and 2.2.2 for details).
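The difference between the RS and CL protocols can be illustrated with a short sketch. Sections 2.2.1 and 2.2.2 are not part of this record, so the way similarity clusters are built is not shown here: `cluster_ids` is a hypothetical input giving one precomputed cluster label per sequence, and the greedy fold assignment is an assumption chosen for simplicity, not the authors' procedure.

```python
import random


def random_split(n_items, n_folds, seed=0):
    """RS-style split: shuffle indices and deal them into folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[f::n_folds] for f in range(n_folds)]


def cluster_split(cluster_ids, n_folds):
    """CL-style split: every similarity cluster lands in exactly one fold."""
    clusters = {}
    for index, label in enumerate(cluster_ids):
        clusters.setdefault(label, []).append(index)
    folds = [[] for _ in range(n_folds)]
    # Greedy balancing: place the largest clusters first, each into the
    # fold that currently holds the fewest sequences.
    for members in sorted(clusters.values(), key=len, reverse=True):
        min(folds, key=len).extend(members)
    return folds


if __name__ == "__main__":
    # Hypothetical cluster labels for seven sequences.
    labels = ["a", "a", "b", "c", "c", "c", "d"]
    print(random_split(len(labels), 2))
    print(cluster_split(labels, 2))
```

With a random split, two similar sequences can fall into training and test folds at the same time, which tends to inflate scores. Keeping each cluster inside one fold removes that leakage, consistent with the lower CL values in the table.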
Fig. 2. Distribution of annotations and DeepMito predictions on the Mito-CA-Annotated dataset.
Fig. 3. Distribution of DeepMito predictions on the Mito-CA-Full dataset.