Rahul Semwal, Pritish Kumar Varadwaj.
Abstract
AIMS: To develop a tool that can annotate the subcellular localization of human proteins.
Keywords: Bioinformatics; deep learning; deep neural network; human protein; machine learning; subcellular localization
Year: 2020; PMID: 33214771; PMCID: PMC7604748; DOI: 10.2174/1389202921999200528160534
Source DB: PubMed; Journal: Curr Genomics; ISSN: 1389-2029; Impact factor: 2.236
Fig. (1) Architectural view of the HumDLoc model. The vertical rectangular bars represent the layers of the HumDLoc model, and the dotted lines between two layers represent full connectivity between them.
Fig. (2) a) Training and validation loss and b) training and validation accuracy traced during the training of HumDLoc. (A higher resolution / colour version of this figure is available in the electronic copy of the article).
Fig. (3) Performance analysis of HumDLoc across seven subcellular compartments: a) ROC analysis; b) precision-recall analysis.
Fig. (4) Performance comparison among HumDLoc, DeepLoc, and CELLO using the LocDB dataset.
Summary of human protein subcellular localization.
| No. | Compartment | Proteins | Location terms included |
|---|---|---|---|
| 1 | Nucleus | 1251 | Nucleus, Nucleus Matrix, Nucleolus, Nucleus lamina, Nucleus envelope, Nucleus speckle, Nucleus inner membrane, Nucleus Outer Membrane, Nucleus Membrane, Chromosome. |
| 2 | Cell Membrane | 1045 | Cell Membrane, Apical cell membrane, Basal cell membrane, Basolateral cell membrane, Lateral cell membrane, Cell Projection, lamellipodium, axon, dendrite, filopodium, Phagocytic cup. |
| 3 | Cytoplasm | 764 | Cytoplasm, Microtubule, Stress fiber, Spindle, myofibril, Spindle Pole, Centrosome, Cytoskeleton, Cytosol, sarcomere, A band, M line, H zone, Z line, I band, microtubule organizing center. |
| 4 | Extracellular | 578 | Extracellular, Secreted, Extracellular Space, Extracellular Matrix, Basement membrane, surface film, Interphotoreceptor matrix. |
| 5 | Mitochondrion | 456 | Mitochondrion, Mitochondrion outer membrane, Mitochondrion Matrix, mitochondrion nucleoid, Mitochondrion Membrane, Mitochondrion Inner Membrane, Mitochondrion intermembrane space. |
| 6 | Endoplasmic Reticulum | 230 | Endoplasmic Reticulum, Endoplasmic reticulum Membrane, Sarcoplasmic Reticulum, Microsome, Endoplasmic Reticulum Lumen, Microsome membrane, Rough endoplasmic reticulum, Rough endoplasmic reticulum lumen, Smooth endoplasmic reticulum membrane. |
| 7 | Golgi Apparatus | 94 | Golgi Apparatus, Golgi Network, Golgi Apparatus Lumen, Golgi apparatus Membrane, Cis Golgi network, cis-Golgi network membrane, Golgi stack membrane, trans-Golgi network membrane, Golgi stack trans-Golgi network. |
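The grouping in this table amounts to collapsing fine-grained location terms into seven compartment labels. A minimal Python sketch of such a lookup follows. It uses only a handful of the terms listed above, and the names `COMPARTMENT_TERMS` and `to_compartment` are hypothetical, introduced here for illustration rather than taken from the HumDLoc code.

```python
# Hypothetical lookup built from a subset of the terms listed in the table.
# Extend each list with the remaining terms as needed.
COMPARTMENT_TERMS = {
    "Nucleus": ["Nucleus", "Nucleolus", "Nucleus speckle", "Chromosome"],
    "Cell Membrane": ["Cell Membrane", "Apical cell membrane", "axon", "dendrite"],
    "Cytoplasm": ["Cytoplasm", "Cytosol", "Cytoskeleton", "Centrosome"],
    "Extracellular": ["Extracellular", "Secreted", "Extracellular Matrix"],
    "Mitochondrion": ["Mitochondrion", "Mitochondrion Matrix"],
    "Endoplasmic Reticulum": ["Endoplasmic Reticulum", "Microsome"],
    "Golgi Apparatus": ["Golgi Apparatus", "Golgi stack membrane"],
}

# Invert the mapping once; keys are lower-cased so matching ignores case.
TERM_TO_COMPARTMENT = {
    term.lower(): compartment
    for compartment, terms in COMPARTMENT_TERMS.items()
    for term in terms
}


def to_compartment(term):
    """Return the compartment label for a location term, or None if unknown."""
    return TERM_TO_COMPARTMENT.get(term.strip().lower())


if __name__ == "__main__":
    for term in ["Nucleolus", "secreted", "Microsome", "Peroxisome"]:
        print(term, "->", to_compartment(term))
```

Terms outside the lookup (such as "Peroxisome" in the usage lines) return `None`, so unmapped annotations can be reviewed instead of being silently assigned.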
List of protein sequence features calculated using ‘protr’.
| No. | Feature group | Descriptor | Dimension |
|---|---|---|---|
| 1 | Conjoint Triad | Conjoint Triad | 343 |
| 2 | Pseudo-Amino Acid Composition | Pseudo-Amino Acid Composition | 50 |
| | | Amphiphilic Pseudo-Amino Acid Composition | 80 |
| 3 | CTD | Composition | 21 |
| | | Transition | 21 |
| | | Distribution | 105 |
| 4 | Quasi-Sequence-Order | Sequence-Order-Coupling Number | 60 |
| | | Quasi-Sequence-Order Descriptors | 100 |
| 5 | Autocorrelation | Moran Autocorrelation | 240 |
| | | Geary Autocorrelation | 240 |
| | | Normalized Moreau-Broto Autocorrelation | 240 |
| 6 | Amino Acid Composition | Amino Acid Composition | 20 |
| | | Dipeptide Composition | 400 |
| | | Tripeptide Composition | 8000 |
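To make the sizes in this table concrete, the sketch below computes the two simplest descriptors, amino acid composition (20 values) and dipeptide composition (400 values), in plain Python. It is an illustrative re-implementation, not the 'protr' R code itself; the function names, the `DESCRIPTOR_SIZES` dictionary, and the example sequence are assumptions made for this example. The numbers in `DESCRIPTOR_SIZES` are copied from the table.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Per-descriptor sizes copied from the table above.
DESCRIPTOR_SIZES = {
    "Conjoint Triad": 343,
    "Pseudo-Amino Acid Composition": 50,
    "Amphiphilic Pseudo-Amino Acid Composition": 80,
    "CTD Composition": 21,
    "CTD Transition": 21,
    "CTD Distribution": 105,
    "Sequence-Order-Coupling Number": 60,
    "Quasi-Sequence-Order Descriptors": 100,
    "Moran Autocorrelation": 240,
    "Geary Autocorrelation": 240,
    "Normalized Moreau-Broto Autocorrelation": 240,
    "Amino Acid Composition": 20,
    "Dipeptide Composition": 400,
    "Tripeptide Composition": 8000,
}


def amino_acid_composition(seq):
    """Fraction of each standard residue in the sequence (20 values)."""
    seq = seq.upper()
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each ordered residue pair among adjacent pairs (400 values)."""
    seq = seq.upper()
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    total = len(seq) - 1
    return [counts[p] / total for p in pairs]


if __name__ == "__main__":
    example = "MKTAYIAKQR"  # made-up sequence for illustration
    print(len(amino_acid_composition(example)))  # 20
    print(len(dipeptide_composition(example)))   # 400
    print(sum(DESCRIPTOR_SIZES.values()))        # 9920
```

Summing the listed sizes gives 9920 values per protein if all descriptors are concatenated, with tripeptide composition alone contributing 8000 of them.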
HumDLoc statistical scores correspond to each subcellular localization/compartment.
| Compartment | Accuracy | Precision | Recall | F1-score | MCC |
|---|---|---|---|---|---|
| Cell Membrane | 0.9862 | 0.99 | 0.98 | 0.99 | +0.9912 |
| Cytoplasm | 0.9358 | 0.71 | 0.81 | 0.76 | +0.7104 |
| Endoplasmic Reticulum | 0.9862 | 0.77 | 1.00 | 0.87 | +0.7717 |
| Golgi-Apparatus | 1.00 | 1.00 | 1.00 | 1.00 | +1.000 |
| Mitochondria | 0.9817 | 0.87 | 0.87 | 0.87 | +0.8765 |
| Nucleus | 0.9771 | 0.88 | 0.92 | 0.90 | +0.8801 |
| Extracellular | 0.9679 | 1.00 | 0.82 | 0.90 | +0.9810 |
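The per-compartment scores can be related to one another with the standard one-vs-rest definitions. The sketch below assumes that the five score columns are accuracy, precision, recall, F1-score, and Matthews correlation coefficient (MCC); that reading is an inference from the values, since, for example, the harmonic mean of 0.71 and 0.81 in the Cytoplasm row rounds to the reported 0.76. The confusion counts in the usage lines are made up for illustration and are not from the paper.

```python
import math


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def one_vs_rest_scores(tp, fp, fn, tn):
    """Standard binary scores for one compartment treated as the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Consistency check against the Cytoplasm row (values taken from the table).
    print(round(f1_score(0.71, 0.81), 2))
    # Hypothetical confusion counts, for illustration only.
    print(one_vs_rest_scores(tp=8, fp=2, fn=2, tn=88))
```

Because true negatives dominate in a one-vs-rest setting with seven classes, accuracy can stay high even when precision is modest, which would be consistent with the Cytoplasm row under this reading of the columns.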
Micro-averaged comparative statistical scores of classifiers.
| Classifier | Accuracy | Precision | Recall | F1-score | MCC |
|---|---|---|---|---|---|
| HumDLoc | 0.92 | | | | |
| K-NNᵃ | 0.9004 | 0.65 | 0.65 | 0.65 | +0.6500 |
| Naive-Bayes | 0.9043 | 0.67 | 0.67 | 0.67 | +0.6093 |
| SVMLᵇ | 0.9646 | 0.88 | 0.88 | 0.88 | +0.8555 |
| SVMPᶜ | 0.9581 | 0.85 | 0.85 | 0.85 | +0.8287 |
| SVMRᵈ | 0.9607 | 0.86 | 0.86 | 0.86 | +0.8394 |
| SVMSᵉ | 0.9541 | 0.84 | 0.84 | 0.84 | +0.8400 |
| RFᶠ | 0.9227 | 0.97 | 0.67 | 0.80 | +0.6843 |
| DeepLoc | 0.9521 | 0.81 | 0.81 | 0.78 | +0.7631 |
| CELLO | 0.9498 | 0.80 | 0.80 | 0.80 | +0.7704 |
ᵃ K-nearest neighbour; ᵇ SVM with linear kernel; ᶜ SVM with polynomial kernel; ᵈ SVM with radial basis kernel; ᵉ SVM with sigmoid kernel; ᶠ Random Forests.
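Micro-averaging pools the per-class counts before computing a score, rather than averaging per-class scores. The sketch below shows the usual definition in Python. The function name `micro_average` and the counts in `toy_counts` are hypothetical, and whether the paper computed its scores in exactly this way is an assumption.

```python
def micro_average(per_class_counts):
    """Pool TP/FP/FN over all classes, then compute precision, recall and F1."""
    tp = sum(c["tp"] for c in per_class_counts.values())
    fp = sum(c["fp"] for c in per_class_counts.values())
    fn = sum(c["fn"] for c in per_class_counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up counts for three classes, for illustration only.
    toy_counts = {
        "Nucleus": {"tp": 50, "fp": 5, "fn": 4},
        "Cytoplasm": {"tp": 30, "fp": 6, "fn": 8},
        "Golgi Apparatus": {"tp": 4, "fp": 1, "fn": 0},
    }
    print(micro_average(toy_counts))
```

When every protein receives exactly one predicted label, each misclassification counts once as a false positive and once as a false negative, so the pooled precision, recall, and F1 coincide. Most rows of the table show three identical values, which would be consistent with that setting.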