Supatcha Lertampaiporn, Sirapop Nuannimnoi, Tayvich Vorapreeda, Nipa Chokesajjawatee, Wonnop Visessanguan, Chinae Thammarongtham.
Abstract
Several computational approaches for predicting subcellular localization have been proposed. Their performance varies because they differ in protein features, training datasets, training strategies, and machine learning algorithms. As a result, these tools can yield inconsistent or conflicting predictions for the same protein. Such conflicts must be taken into account during protein annotation, especially for a multiclass classification problem such as subcellular localization. To address this issue, this work proposes using the particle swarm optimization (PSO) algorithm to combine the outputs of multiple subcellular localization predictors, integrating diverse prediction models to improve the final predictions. We present PSO-LocBact, a PSO-based consensus classifier that combines the strengths of several preexisting protein localization predictors designed specifically for bacteria. Our experimental results indicate that the proposed method can resolve inconsistencies in subcellular localization prediction for both Gram-negative and Gram-positive bacterial proteins. The average accuracy achieved on each test dataset is over 98%, higher than that achieved with any individual predictor.
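To make the idea of combining predictor outputs concrete, the following is a minimal sketch of a weighted vote over conflicting predictions. The abstract does not spell out how the consensus is scored, so weighted voting is an assumption here; the function name `weighted_consensus`, the example predictions, and the example weights are all hypothetical, with the weights standing in for values that an optimizer such as PSO would tune.

```python
from collections import defaultdict

def weighted_consensus(predictions, weights):
    """Return the location with the largest summed weight.

    predictions: dict mapping predictor name -> predicted location
    weights:     dict mapping predictor name -> reliability weight
    Both structures are illustrative assumptions, not a published format.
    """
    scores = defaultdict(float)
    for predictor, location in predictions.items():
        scores[location] += weights.get(predictor, 0.0)
    # Iterate in alphabetical order so ties resolve deterministically.
    return max(sorted(scores), key=lambda loc: scores[loc])

# Hypothetical conflicting predictions for one Gram-negative protein.
predictions = {
    "CELLO": "periplasm",
    "PSORTb 3.0": "outer membrane",
    "CELLO2GO": "outer membrane",
    "SOSUI-GramN": "periplasm",
    "ngLOC": "periplasm",
}
# Hypothetical reliability weights (made up for this example).
weights = {
    "CELLO": 0.2,
    "PSORTb 3.0": 0.9,
    "CELLO2GO": 0.8,
    "SOSUI-GramN": 0.3,
    "ngLOC": 0.5,
}

print(weighted_consensus(predictions, weights))  # outer membrane
```

In this example three of the five predictors vote for the periplasm, yet the two more heavily weighted predictors carry the decision, which is the behavior that separates a weighted consensus from a plain majority vote.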
Year: 2019 PMID: 31886228 PMCID: PMC6925685 DOI: 10.1155/2019/5617153
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Summary of predictors used in this work.
| Predictor | Organism categories | Subcellular compartments predicted |
|---|---|---|
| SOSUI-GramN | Gram-negative bacteria | Extracellular region, outer membrane, periplasm, inner membrane, and cytoplasm |
| CELLO | Bacteria, eukaryotes | Extracellular region, outer membrane, inner membrane, periplasm, and cytoplasm |
| CELLO2GO | Archaea, bacteria, eukaryotes, viruses | Extracellular region, outer membrane, inner membrane, periplasm, and cytoplasm |
| PSLpred | Gram-negative bacteria | Extracellular region, outer membrane, inner membrane, periplasm, and cytoplasm |
| SLP-local | Prokaryotes, eukaryotes | 3 locations for prokaryotes (cytoplasm, extracellular region, and periplasm) |
| Gneg-mPLoc | Gram-negative bacteria | Cytoplasm, extracellular region, fimbria, flagella, inner membrane, nucleoid, outer membrane, and periplasm |
| Gpos-mPLoc | Gram-positive bacteria | Cytoplasm, cell wall, plasma membrane, and extracellular region |
| PSORTb 3.0 | Archaea, bacteria | 4 locations for Gram-positive bacteria and archaea (cytoplasm, cytoplasmic membrane, cell wall, and extracellular region); 5 locations for Gram-negative bacteria (cytoplasm, inner membrane, periplasm, outer membrane, and extracellular region) |
| ngLOC | Prokaryotes, eukaryotes | 4 locations for Gram-positive bacteria (cytoplasm, inner membrane, cell wall, and extracellular region); 5 locations for Gram-negative bacteria (cytoplasm, inner membrane, periplasm, outer membrane, and extracellular region) |
| LocTree3 | Archaea, bacteria, eukaryotes | 3 locations for archaea (cytoplasm, extracellular region, and plasma membrane); 6 locations for bacteria (cytoplasm, extracellular region, fimbria, outer membrane, periplasm, and plasma membrane) |
Figure 1. Flowchart of the proposed algorithm.
Figure 2. Flowchart of the PSO process.
Accuracy of each individual classifier and PSO-LocBact.
Gram-negative bacterial proteins
| Predictor | Extracellular region | Outer membrane | Periplasm | Inner membrane | Cytoplasm | Overall |
|---|---|---|---|---|---|---|
| CELLO | 40.69% | 48.84% | 76.74% | 87.21% | 89.53% | |
| PSORTb 3.0 | 100% | 100% | 88.37% | 100% | 98.84% | |
| CELLO2GO | 100% | 100% | 87.21% | 100% | 100% | |
| SOSUI-GramN | 66.28% | 56.98% | 67.44% | 90.70% | 87.21% | |
| SLP-local | 36.05% | 0% | 75.58% | 0% | 65.12% | |
| ngLOC | 77.91% | 96.51% | 86.05% | 93.02% | 94.19% | |
| Gneg-mPLoc | 82.56% | 89.53% | 1.16% | 100% | 0% | |
| PSLpred | 0% | 100% | 1.16% | 0% | 0% | |
| LocTree3 | 84.88% | 46.51% | 80.23% | 93.02% | 93.02% | |
| PSO-LocBact | 100% | 100% | 94.19% | 100% | 100% | |

Gram-positive bacterial proteins
| Predictor | Extracellular region | Cell wall | Inner membrane | Cytoplasm | Overall |
|---|---|---|---|---|---|
| CELLO | 86.84% | 29.87% | 100% | 100% | |
| PSORTb 3.0 | 93.42% | 93.50% | 100% | 100% | |
| CELLO2GO | 97.39% | 90.90% | 100% | 100% | |
| ngLOC | 86.84% | 42.85% | 93.67% | 100% | |
| Gpos-mPLoc | 34.21% | 24.68% | 77.21% | 100% | |
| LocTree3 | 85.53% | 0% | 91.14% | 96.20% | |
| PSO-LocBact | 97.39% | 94.80% | 100% | 100% | |
Accuracy of various consensus methods on the test sets.
Gram-negative bacterial proteins
| Method | Extracellular region (%) | Outer membrane (%) | Periplasm (%) | Inner membrane (%) | Cytoplasm (%) | Overall (%) |
|---|---|---|---|---|---|---|
| Single predictors (range over the individual classifiers) | 0–100 | 0–100 | 1.16–88.37 | 0–100 | 0–100 | |
| Consensus classifier: PSO-LocBact | 100 | 100 | 94.19 | 100 | 100 | |
| Consensus classifier: majority voting | 97.67 | 100 | 95.35 | 100 | 98.84 | |
| Consensus classifier: Naïve Bayes | 100 | 98.84 | 94.18 | 100 | 98.84 | |
| Consensus classifier: logistic regression | 98.84 | 100 | 97.67 | 95.35 | 98.84 | |
| Consensus classifier: average probability voting | 98.84 | 100 | 90.69 | 98.84 | 98.84 | |
| Single predictor: FUEL-mLoc (2017) | 79.07 | 97.67 | 96.51 | 93.02 | 82.56 | |

Gram-positive bacterial proteins
| Method | Extracellular region (%) | Cell wall (%) | Inner membrane (%) | Cytoplasm (%) | Overall (%) |
|---|---|---|---|---|---|
| Single predictors (range over the individual classifiers) | 34.21–97.39 | 0–93.50 | 77.21–100 | 97.50–100 | |
| Consensus classifier: PSO-LocBact | 97.39 | 94.80 | 100 | 100 | |
| Consensus classifier: majority voting | 93.42 | 93.50 | 100 | 100 | |
| Consensus classifier: Naïve Bayes | 69.73 | 92.20 | 100 | 100 | |
| Consensus classifier: logistic regression | 89.47 | 100 | 100 | 100 | |
| Consensus classifier: average probability voting | 96.05 | 87.01 | 100 | 98.73 | |
| Single predictor: FUEL-mLoc (2017) | 86.84 | 81.82 | 100 | 100 | |
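Two of the baseline consensus schemes compared above, majority voting and average probability voting, are simple enough to sketch. The code below is an illustrative assumption about how such baselines are commonly computed, not the implementation evaluated in the table. The function names and the toy inputs are hypothetical, and the second function assumes each predictor reports a per-location probability, which not every tool does.

```python
from collections import Counter

def majority_vote(labels):
    """Pick the most frequent label; ties go to the alphabetically first label."""
    counts = Counter(labels)
    best = max(counts.values())
    return min(label for label, n in counts.items() if n == best)

def average_probability_vote(prob_tables):
    """Average per-location probabilities across predictors and pick the top one.

    prob_tables: list of dicts mapping location -> probability, one per predictor.
    A location missing from a predictor's dict is treated as probability 0.
    """
    locations = set().union(*prob_tables)
    mean = {
        loc: sum(table.get(loc, 0.0) for table in prob_tables) / len(prob_tables)
        for loc in locations
    }
    return max(sorted(mean), key=mean.get)

# Toy inputs, invented for illustration.
print(majority_vote(["cytoplasm", "periplasm", "cytoplasm"]))  # cytoplasm
print(average_probability_vote([
    {"cytoplasm": 0.6, "periplasm": 0.4},
    {"cytoplasm": 0.2, "periplasm": 0.8},
]))  # periplasm
```

Both baselines treat every predictor as equally reliable, which is the main point of contrast with a consensus whose per-predictor weights are tuned.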
Accuracy of PSO-LocBact in different experimental cases.
Gram-negative bacterial proteins
| Experimental case | Extracellular region (%) | Outer membrane (%) | Periplasm (%) | Inner membrane (%) | Cytoplasm (%) | Overall (%) |
|---|---|---|---|---|---|---|
| Experimental case 1 | 98.84 | 100 | 93.02 | 97.67 | 100 | |
| Experimental case 2 (>90) | 100 | 100 | 87.21 | 100 | 100 | |
| Experimental case 3 (<90) | 80.23 | 91.86 | 90.69 | 94.19 | 96.51 | |
| Experimental case 4 (>80) | 100 | 100 | 89.53 | 100 | 98.84 | |
| Experimental case 5 (<80) | 84.88 | 54.65 | 84.88 | 95.35 | 94.19 | |
| Experimental case 6 (>70) | 100 | 100 | 94.19 | 100 | 100 | |
| Experimental case 7 (<70) | 44.19 | 66.28 | 81.40 | 88.37 | 95.35 | |
| Experimental case 8 (>60) | 100 | 100 | 94.19 | 100 | 98.83 | |
| Experimental case 9 (<60) | 76.74 | 96.51 | 86.05 | 81.39 | 93.02 | |

Gram-positive bacterial proteins
| Experimental case | Extracellular region (%) | Cell wall (%) | Inner membrane (%) | Cytoplasm (%) | Overall (%) |
|---|---|---|---|---|---|
| Experimental case 1 | 96.05 | 100 | 100 | 98.73 | |
| Experimental case 2 (>90) | 96.34 | 96.39 | 100 | 100 | |
| Experimental case 3 (<90) | 89.02 | 54.22 | 98.82 | 100 | |
| Experimental case 4 (>80) | 97.56 | 96.39 | 100 | 100 | |
| Experimental case 5 (<80) | 78.05 | 43.37 | 91.76 | 100 | |
| Experimental case 6 (>70) | 94.73 | 100 | 100 | 100 | |
| Experimental case 7 (<70) | 68.29 | 12.05 | 100 | 100 | |
| Experimental case 8 (>60) | 97.56 | 96.39 | 100 | 100 | |
| Experimental case 9 (<60) | NA | NA | NA | NA | |
Experimental case 1: performance of PSO-LocBact without PSORTb 3.0. Experimental case 2: performance of PSO-LocBact considering only classifiers with accuracy ≥90%. Experimental case 3: performance of PSO-LocBact considering only classifiers with accuracy <90%. Experimental case 4: performance of PSO-LocBact considering only classifiers with accuracy ≥80%. Experimental case 5: performance of PSO-LocBact considering only classifiers with accuracy <80%. Experimental case 6: performance of PSO-LocBact considering only classifiers with accuracy ≥70%. Experimental case 7: performance of PSO-LocBact considering only classifiers with accuracy <70%. Experimental case 8: performance of PSO-LocBact considering only classifiers with accuracy ≥60%. Experimental case 9: performance of PSO-LocBact considering only classifiers with accuracy <60%.
Accuracy of PSO-LocBact compared to other state-of-the-art methods on the well-known benchmark dataset S taken from [7, 9, 30].
Gram-negative bacterial proteins
| Predictor | Inner membrane (557 proteins) | Outer membrane (124 proteins) | Cytoplasm (410 proteins) | Extracellular region (133 proteins) | Periplasm (180 proteins) | Overall (1,404 proteins) |
|---|---|---|---|---|---|---|
| PSO-LocBact | 547 | 116 | 387 | 129 | 171 | |
| Gram-LocEN | 551 | 116 | 374 | 130 | 169 | |
| PSORTb 3.0 | 529 | 114 | 380 | 117 | 168 | |
| CELLO2GO | 519 | 107 | 383 | 128 | 170 | |
| Gneg-PLoc | 454 | 68 | 362 | 59 | 87 | |
| Gneg-mPLoc | 525 | 105 | 357 | 79 | 154 | |
| iLoc-Gneg | 539 | 103 | 367 | 115 | 161 | |
| Fuel-mLoc | 541 | 111 | 379 | 129 | 161 | |

Gram-positive bacterial proteins
| Predictor | Cell membrane (174 proteins) | Cell wall (18 proteins) | Cytoplasm (208 proteins) | Extracellular region (123 proteins) | Overall (523 proteins) |
|---|---|---|---|---|---|
| PSO-LocBact | 174 | 18 | 206 | 122 | |
| Gram-LocEN | 173 | 17 | 203 | 120 | |
| PSORTb 3.0 | 169 | 14 | 203 | 112 | |
| CELLO2GO | 149 | 10 | 197 | 121 | |
| iLoc-Gpos | 167 | 12 | 198 | 110 | |
| Fuel-mLoc | 170 | 17 | 202 | 117 | |
| Gpos-PLoc | - | - | - | - | |
| Gpos-mPLoc | - | - | - | - | |
| ML-KNN | - | - | - | - | |
| wML-KNN | - | - | - | - | |
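The per-location entries in the benchmark table can be turned into an overall accuracy with simple arithmetic. The sketch below assumes each cell is the number of correctly predicted proteins for that location (every value is at most the protein count in its column header); the function name `overall_accuracy` is hypothetical, and the percentages it prints are derived here from the table rather than quoted from the source.

```python
def overall_accuracy(correct, totals):
    """Percentage of correctly predicted proteins across all locations."""
    return 100.0 * sum(correct) / sum(totals)

# Column totals and PSO-LocBact counts copied from the benchmark table.
gneg_totals = [557, 124, 410, 133, 180]   # sums to 1,404 proteins
gneg_pso    = [547, 116, 387, 129, 171]
gpos_totals = [174, 18, 208, 123]         # sums to 523 proteins
gpos_pso    = [174, 18, 206, 122]

print(f"{overall_accuracy(gneg_pso, gneg_totals):.2f}")  # 96.15
print(f"{overall_accuracy(gpos_pso, gpos_totals):.2f}")  # 99.43
```

The same function applies to any other row of the table, which makes it easy to compare predictors on overall accuracy.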
PSO-LocBact configuration variables.
| Configuration variable | Value type | Default value | Description |
|---|---|---|---|
| w1 | Float | 0.9 | Inertial weight value at the beginning of PSO |
| w2 | Float | 0.4 | Inertial weight value at the end of PSO |
| c1i | Float | 2.5 | Cognitive coefficient value at the beginning of PSO |
| c1f | Float | 0.5 | Cognitive coefficient value at the end of PSO |
| c2i | Float | 0.5 | Social coefficient value at the beginning of PSO |
| c2f | Float | 2.5 | Social coefficient value at the end of PSO |
| Particle num | Integer | 25 | Number of particles generated in the swarm |
| MAXOBJ | Integer | 1,000 | Maximum number of allowable objective function calls |
| MAXITER | Integer | — | Maximum number of allowable iterations; if this value is set, MAXOBJ will be ignored |
| (Program_name) | String | (Gram-negative: CELLO, PSORTb 3.0, CELLO2GO, SOSUI-GramN, SLP-local, ngLOC, Gneg-mPLoc, PSLpred, LocTree3; Gram-positive: CELLO, PSORTb 3.0, CELLO2GO, ngLOC, Gpos-mPLoc, LocTree3) | A list of names of the programs used to calculate the final result |
| (Weight) | Float | | A list of weights given to represent the reliability of every program included |
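The configuration variables above suggest a PSO whose inertia weight and acceleration coefficients change from a starting value to a final value over the run. The following is a minimal sketch of such a scheme under several assumptions that the table does not confirm: the change is linear, particle positions are predictor weights clamped to [0, 1], and the objective is the accuracy of a weighted vote. The toy samples and the names `interpolate`, `pso_maximize`, and `vote_accuracy` are hypothetical, and the sketch stops on an iteration count only (it does not model MAXOBJ).

```python
import random

def interpolate(start, end, step, total_steps):
    """Move linearly from start to end over the run (linear schedule is assumed)."""
    frac = step / max(total_steps - 1, 1)
    return start + (end - start) * frac

def pso_maximize(objective, dim, particle_num=25, max_iter=40,
                 w1=0.9, w2=0.4, c1i=2.5, c1f=0.5, c2i=0.5, c2f=2.5, seed=0):
    """Maximize objective over [0, 1]^dim with time-varying PSO coefficients."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(particle_num)]
    vel = [[0.0] * dim for _ in range(particle_num)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = max(range(particle_num), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for it in range(max_iter):
        w = interpolate(w1, w2, it, max_iter)     # inertia weight
        c1 = interpolate(c1i, c1f, it, max_iter)  # cognitive coefficient
        c2 = interpolate(c2i, c2f, it, max_iter)  # social coefficient
        for i in range(particle_num):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp each weight to [0, 1] (an assumed bound).
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Toy data, invented for illustration: labels from three hypothetical
# predictors, followed by the true location.
samples = [
    (("cytoplasm", "periplasm", "periplasm"), "cytoplasm"),
    (("outer membrane", "outer membrane", "cytoplasm"), "outer membrane"),
    (("periplasm", "cytoplasm", "cytoplasm"), "periplasm"),
    (("inner membrane", "inner membrane", "inner membrane"), "inner membrane"),
]

def vote_accuracy(weights):
    """Fraction of toy samples that a weighted vote labels correctly."""
    correct = 0
    for labels, truth in samples:
        scores = {}
        for wgt, label in zip(weights, labels):
            scores[label] = scores.get(label, 0.0) + wgt
        if max(sorted(scores), key=scores.get) == truth:
            correct += 1
    return correct / len(samples)

best_weights, best_accuracy = pso_maximize(vote_accuracy, dim=3)
print(best_weights, best_accuracy)
```

With the default values, early iterations favor each particle's own best position (large cognitive coefficient, large inertia) and later iterations favor the swarm's best position (large social coefficient, small inertia), moving the search from exploration toward convergence.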