Wasna Viratyosin, Supawadee Ingsriswang, Eakasit Pacharawongsakda, Prasit Palittapongarnpim.
Abstract
BACKGROUND: In bacterial pathogens, both cell surface-exposed outer membrane proteins and proteins secreted into the extracellular environment play crucial roles in host-pathogen interaction and pathogenesis. Considerable efforts have been made to identify outer membrane (OM) and extracellular (EX) proteins produced by Leptospira interrogans, which may be used as novel targets for the development of infection markers and leptospirosis vaccines.
RESULT: In this study we used a novel computational framework, based on combined prediction methods with a deduction concept, to identify putative OM and EX proteins encoded by the Leptospira interrogans genome. The framework consists of the following steps: (1) identifying proteins homologous to known proteins in subcellular localization databases, derived from the "consensus vote" of computational predictions; (2) incorporating homology-based search and structural information to enhance gene annotation and functional identification, in order to infer specific structural characteristics and localizations; and (3) developing a specific classifier for cytoplasmic proteins (CP) and cytoplasmic membrane proteins (CM) using linear discriminant analysis (LDA). We identified 114 putative EX and 63 putative OM proteins, of which 41% are conserved or hypothetical proteins containing sequence and/or protein folding structures similar to those of known EX and OM proteins.
Year: 2008 PMID: 18423054 PMCID: PMC2387172 DOI: 10.1186/1471-2164-9-181
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Flow chart of the method used to predict subcellular localizations of L. interrogans proteins. Protein sequences from the Leptospira interrogans serovar Lai genome (4,727 ORFs) were analyzed for subcellular localization using PSORTb, ProtCompB, and Proteome Analyst (PA). (a) A majority-vote-type procedure was applied to obtain predictions with high accuracy: if all 3 methods agreed on a localization, it was assigned as a consensus vote; the remaining predictions (supported by 1 or 2 of the 3 methods) were assigned as non-consensus votes. The consensus-vote CP and CM proteins were used as the training set for an LDA-based classifier for CP and CM in the next step. (b) The non-consensus OM, PP, and EX predictions were further analyzed for sequence and structure homology using DBsubloc and GTD prediction; those with significant homology and/or structure information were assigned to EX, OM, or PP. (c) Non-consensus CP and CM predictions, together with proteins left unassigned by DBsubloc and GTD, were analyzed using the LDA-based classifier for CP and CM. Proteins classified as CP or CM with a probability greater than 0.90 were accepted as significant predictions. The remaining queries that could not be identified in this step were classified as "unknown".
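The routing logic in the flow chart can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the dictionary layout, and the use of `None` for "no prediction" are assumptions, and the sketch ignores the detail that ProtCompB reports CM and OM together as membrane proteins. It also simplifies step (c), which in the caption additionally receives proteins left unassigned by step (b).

```python
from collections import Counter

# Hypothetical helper names; None means a predictor returned no localization.
SECRETED_OR_SURFACE = ("OM", "PP", "EX")

def vote_support(predictions):
    """Count how many predictors assigned each localization."""
    return Counter(label for label in predictions.values() if label is not None)

def consensus_label(predictions):
    """Return the localization only when every predictor agrees, else None."""
    counts = vote_support(predictions)
    for label, n in counts.items():
        if n == len(predictions):
            return label
    return None

def next_step(predictions):
    """Route a protein to the step of the flow chart that handles it."""
    if consensus_label(predictions) is not None:
        return "consensus"          # step (a): all predictors agree
    counts = vote_support(predictions)
    if any(label in counts for label in SECRETED_OR_SURFACE):
        return "homology_check"     # step (b): DBsubloc / GTD analysis
    return "lda_classifier"         # step (c): CP vs. CM classifier

def lda_call(posteriors, threshold=0.90):
    """Accept a CP/CM call only if its posterior probability exceeds the threshold."""
    label, prob = max(posteriors.items(), key=lambda kv: kv[1])
    return label if prob > threshold else "unknown"
```

For example, a protein predicted as EX by two of the three tools would be routed to the homology check, while a CP/CM call with posterior probability 0.6 would be reported as "unknown".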
Localization predictions of a set of 299 experimentally verified proteins with known localization
| Actual localization | Total | TP | FP | FN | TN | Precision | Recall |
| CP | 145 | 110 | 1 | 35 | 110 | 99.10% | 75.86% |
| CM | 69 | 55 | 2 | 14 | 166 | 96.49% | 79.71% |
| PP | 29 | 18 | 0 | 11 | 207 | 100.00% | 62.07% |
| OM | 38 | 30 | 0 | 8 | 195 | 100.00% | 78.95% |
| EX | 18 | 6 | 3 | 12 | 216 | 66.67% | 33.33% |
| Total | 299 | 219 | 6 | 80 | 894 | 97.33% | 73.24% |
| CP | 145 | 94 | 0 | 51 | 119 | 100.00% | 64.83% |
| CM | 69 | 59 | 2 | 10 | 162 | 96.72% | 85.51% |
| PP | 29 | 19 | 3 | 10 | 201 | 86.36% | 65.52% |
| OM | 38 | 31 | 0 | 7 | 192 | 100.00% | 81.58% |
| EX | 18 | 9 | 6 | 9 | 207 | 60.00% | 50.00% |
| Total | 299 | 212 | 11 | 87 | 881 | 95.07% | 70.90% |
| CP | 145 | 127 | 11 | 18 | 144 | 92.03% | 87.59% |
| CM | 69 | 55 | 7 | 14 | 227 | 88.71% | 79.71% |
| PP | 29 | 19 | 9 | 10 | 261 | 67.86% | 65.52% |
| OM | 38 | 23 | 18 | 15 | 243 | 56.10% | 60.53% |
| EX | 18 | 11 | 4 | 7 | 277 | 73.33% | 61.11% |
| Total | 299 | 235 | 49 | 64 | 1152 | 82.75% | 78.60% |
| CP | 145 | 67 | 0 | 78 | 154 | 100.00% | 46.21% |
| CM | 69 | 43 | 0 | 26 | 230 | 100.00% | 62.32% |
| PP | 29 | 11 | 0 | 18 | 270 | 100.00% | 37.93% |
| OM | 38 | 19 | 0 | 19 | 261 | 100.00% | 50.00% |
| EX | 18 | 4 | 0 | 13 | 216 | 100.00% | 23.53% |
| Total | 299 | 144 | 0 | 154 | 1131 | 100.00% | 48.32% |
| CP | 145 | 121 | 0 | 24 | 154 | 100.00% | 83.45% |
| CM | 69 | 59 | 0 | 10 | 230 | 100.00% | 85.51% |
| PP | 29 | 17 | 0 | 12 | 270 | 100.00% | 58.62% |
| OM | 38 | 29 | 0 | 10 | 213 | 100.00% | 74.36% |
| EX | 18 | 6 | 1 | 12 | 215 | 85.71% | 33.33% |
| Total | 299 | 232 | 1 | 68 | 1082 | 99.57% | 77.33% |
| CP | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| CM | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| PP | 29 | 25 | 0 | 4 | 56 | 100.00% | 86.20% |
| OM | 38 | 34 | 1 | 4 | 46 | 97.14% | 89.47% |
| EX | 18 | 12 | 2 | 6 | 65 | 85.71% | 66.67% |
| Total | 85 | 71 | 3 | 14 | 167 | 95.95% | 83.53% |
The 299 proteins were obtained from the test set used in the comparison study by Gardy and Brinkman [22].
Majority vote: the result from 2 out of 3 predictions.
Combination method: the result from non-consensus votes with significant DBsubloc [31] and/or GTD prediction [32].
Precision is calculated as TP/(TP+FP); recall is calculated as TP/(TP+FN).
TP = true positive, TN = true negative, FP = false positive, FN = false negative, N/A = not applicable.
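The precision and recall definitions in the footnote can be checked directly against the table. The sketch below uses hypothetical function names and takes its inputs from the first "Total" row of the table (TP = 219, FP = 6, FN = 80).

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP); NaN when nothing was predicted positive."""
    return tp / (tp + fp) if (tp + fp) else float("nan")

def recall(tp, fn):
    """Recall = TP / (TP + FN); NaN when there are no actual positives."""
    return tp / (tp + fn) if (tp + fn) else float("nan")

# First "Total" row of the table: TP = 219, FP = 6, FN = 80
print(f"precision = {precision(219, 6):.2%}")  # 97.33%
print(f"recall    = {recall(219, 80):.2%}")    # 73.24%
```

Both values match the percentages reported in that row.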
Predicted protein subcellular localizations of L. interrogans by PSORTb, PA, ProtCompB and consensus vote predictions.
| Localization | PSORTb | PA | ProtCompB | Consensus vote |
| Cytoplasm (CP) | 1125 | 921 | 2013 | 418 |
| Cytoplasmic membrane (CM) | 606 | 715 | 1726* | 332 |
| Outer membrane (OM) | 112 | 28 | (included in CM)* | 15 |
| Periplasmic (PP) | 30 | 86 | 478 | 17 |
| Extracellular (EX) | 29 | 326 | 510 | 15 |
| Unknown | 2825 | 2652 | - | 3930 |
* In this version of ProtCompB, CM and OM proteins were both predicted as membrane proteins, so the count of 1726 covers both.
Putative extracellular proteins (EX) predicted by the consensus vote
| Lai locus | Copen locus | Description |
| LA3731 | LIC10497 | Fimh-like protein/hypothetical protein |
| LA0587 | LIC12988 | Lactonizing lipase/lipase |
| LA0872 | LIC12760 | Microbial collagenase |
| LA1450 | LIC12302 | Probable O-sialoglycoprotein endopeptidase |
| LA2448 | LIC10830 | Putative outer membrane protein/putative lipoprotein |
| LA1765 | LIC12047 | Rhs family protein/cytoplasmic membrane protein |
| LA4161 | LIC13320 | Thermolysin/thermolysin precursor |
| LA4164 | LIC13321 | Thermolysin/thermolysin homolog precursor |
| LA2303 | LIC11634 | 3-oxoacyl-[acyl-carrier protein] reductase/CsgA |
| LA0873 | LIC12759 | LRR containing protein/cytoplasmic membrane protein |
| LA2964 | LIC11098 | LRR containing protein/conserved hypothetical protein |
| LA3028 | LIC11051 | LRR containing protein/conserved hypothetical protein |
| LA3320 | LIC10831 | LRR containing protein/conserved hypothetical protein |
| LA3323 | LIC10829 | LRR containing protein/conserved hypothetical protein |
| LA0709 | LIC12896 | Unknown protein/conserved hypothetical protein |
Note: LRR, leucine-rich repeat
Putative outer membrane proteins (OM) predicted by the consensus vote
| Lai locus | Copen locus | Description |
| LA2375 | LIC11570 | General secretory pathway protein D |
| LA3149 | LIC10964 | Hemin receptor/TonB-dependent outer membrane hemin receptor |
| LB328 | LIC20250 | Outer membrane protein OmpA/PG-associated CM protein |
| LA3615 | LIC10592 | Outer membrane protein OmpA family/PG-associated CM protein |
| LA1963 | LIC11941 | Outer membrane protein precursor CzcC/heavy metal efflux pump |
| LA3927 | LIC13135 | Outer membrane protein tolC precursor/outer membrane protein |
| LA1356 | LIC12374 | Probable TonB-dependent receptor |
| LA2641 | LIC11345 | Probable TonB-dependent receptor/ferrichrome-iron receptor |
| LA3468 | LIC10714 | Probable TonB-dependent receptor/outer membrane receptor protein |
| LB191 | LIC20151 | Putative TonB-dependent outer membrane receptor protein (Hbp A) |
| LA2510 | LIC11458 | Conserved hypothetical protein/outer membrane protein, porin superfamily |
| LA4337 | LIC13479 | Conserved hypothetical protein/PG-associated CM protein |
| LA0572 | LIC12998 | Conserved hypothetical protein/TonB-dependent outer membrane receptor |
| LA3258 | LIC10881 | Hypothetical protein/outer membrane protein, TonB dependent |
| LA2186 | LIC11739 | Conserved hypothetical protein |
Lai locus: L. interrogans serovar Lai locus
Copen locus: L. interrogans serovar Copenhageni locus
43 putative extracellular proteins (EX) derived from the 2 out of 3 predictions with significant DBSubloc and/or GTD prediction
| Lai locus | Copen locus | Description | Swiss-Prot ID (a) | PDB code (b) |
| LA1027 | LIC12632 | Sphingomyelinase C precursor (Sph1)/hemolysin | - | |
| LA1029 | LIC12631 | Sphingomyelinase C precursor (Sph2)/hemolysin | - | |
| LA4004 | LIC13198 | Sphingomyelinase C precursor hemolysin (Sph3)/sph-like | - | |
| LA3540 | LIC10657 | Sphingomyelinase C precursor; hemolysin | - | |
| LA3050 | LIC11040 | Hemolytic protein-like protein/hemolysin (sph4) | - | |
| LA3466 | LIC10715 | Thermolysin | ||
| LA3454 | LIC10723 | Flagellar hook-associated protein (fliD) | ||
| LA3097 | LIC11003 | Treponemal membrane protein B precursor-like protein/LipL71 | ||
| LA1530 | LIC12234 | LRR containing protein | ||
| LA1324 | LIC12401 | LRR containing protein/cytoplasmic membrane protein | - | |
| LA1354 | LIC12375 | LRR containing protein/cytoplasmic membrane protein | ||
| LA2452 | LIC11504 | LRR containing protein/cytoplasmic membrane protein | ||
| LA2862 | LIC11180 | LRR containing protein/cytoplasmic membrane protein | ||
| LA2966 | LIC11097 | LRR containing protein/cytoplasmic membrane protein | ||
| LA3324 | LIC10831 | LRR containing protein/conserved hypothetical protein | ||
| LA3321 | LIC10830 | LRR containing protein/putative lipoprotein | ||
| LA3322 | LIC10830 | LRR containing protein/putative lipoprotein | ||
| LA0701 | LIC12901 | LRR containing protein/molybdate metabolism regulator | ||
| LA2377 | LIC11568 | Peptidase, M23/M37/membrane associated peptidase | ||
| LA0505 | LIC13050 | Probable glycosyl hydrolase/conserved hypothetical protein | - | |
| LA3725 | LIC10502 | Probable phenazine biosynthesis family protein/CM protein | - | |
| LA3730 | LIC10498 | Putative lipoprotein | ||
| LA1368 | LIC12364 | Putative outer membrane protein/CagA | - | |
| LA1759 | LIC12050 | Putative outer membrane protein/conserved hypothetical protein | ||
| LA2443 | LIC11507 | Putative outer membrane protein/conserved hypothetical protein | ||
| LA2447 | LIC11505 | Putative outer membrane protein/conserved hypothetical protein | ||
| LA2450 | LIC11505 | Putative outer membrane protein/conserved hypothetical protein | ||
| LA1915 | LIC11990 | TPR-repeat-containing proteins/cytoplasmic membrane protein | ||
| LA0043 | LIC10038 | TPR-repeat-containing proteins/conserved hypothetical protein | ||
| LA2773 | LIC11246 | Conserved hypothetical protein | ||
| LA3233 | LIC10902 | Conserved hypothetical protein | ||
| LB001 | LIC20001 | Conserved hypothetical protein | - | |
| LA1499 | LIC12259 | Conserved hypothetical protein/cytoplasmic membrane protein | ||
| LA1766 | LIC12047 | Conserved hypothetical protein/cytoplasmic membrane protein | ||
| LA3333 | LIC10825 | Conserved hypothetical protein/cytoplasmic membrane protein | ||
| LA2208 | LIC11720 | Conserved hypothetical protein/hypothetical protein | - | |
| LA3276 (c) | LIC10868 | Conserved hypothetical protein/hypothetical protein | ||
| LA0022 | LIC10021 | Conserved hypothetical protein/putative lipoprotein | - | |
| LA3210 | LIC10920 | Conserved hypothetical protein/putative lipoprotein | - | |
| LA3726 | LIC10501 | Conserved hypothetical protein/putative lipoprotein | ||
| LB216 | LIC20172 | Conserved hypothetical protein/putative lipoprotein | - | |
| LB225 | LIC20176 | Conserved hypothetical protein/putative lipoprotein | - | |
| LA4135 (d) | LIC13296 | Hypothetical protein/putative lipoprotein | - |
Note: LRR, leucine-rich repeat
a: Swiss-Prot ID derived from DBsubloc database
b: PDB code derived from GTD prediction
c: pfam06739: SBBP (seven bladed beta propeller) repeat
d: pfam07588: DUF1554
56 putative extracellular proteins (EX) derived from the 1 out of 3 predictions with significant DBSubloc and/or GTD prediction
| Lai locus | Copen locus | Description | Swiss-Prot ID (a) | PDB code (b) |
| LB258 | LIC20197 | Cysteine protease | - | |
| LA0975 | LIC12680 | Fimh-like protein | - | |
| LA0858 | LIC12930 | Fimh-like protein/hypothetical protein | - | |
| LA0492 | LIC13060 | LipL36 protein | - | |
| LA3469 | LIC10713 | Iron-regulated protein A/LruB/putative lipoprotein | - | |
| LA3075 | LIC10464 | Surface protein Lk90-like protein/Ig-like repeat domain | ||
| LA3778 | LIC10464 | Surface protein Lk90-like protein/Ig-like repeat domain | ||
| LA0378 | LIC10325 | TPR-repeat-containing proteins/hemolysin | ||
| LA3138 | LIC10973 | Transmembrane outer membrane protein L1 | - | |
| LA1353 | LIC12375 | LRR containing protein | Q9RBS2 | |
| LB196 | LIC20154 | LRR containing protein/lipoprotein | - | |
| LA0416 (e) | LIC10365 | Putative lipoprotein (LpL effector) | - | |
| LA0962 (d) | LIC12690 | Putative lipoprotein | - | |
| LA1569 (c) | LIC12208 | Putative lipoprotein | ||
| LA2823 (e) | LIC11207 | Putative lipoprotein | - | |
| LA3064 (e) | LIC11030 | Putative lipoprotein | - | |
| LA3848 (c) | LIC13075 | Putative lipoprotein | - | |
| LA3867 | LIC13086 | Putative lipoprotein | - | |
| LA1159 | LIC12525 | Putative outer membrane protein/putative lipoprotein | - | |
| LA1905 | LIC11996 | Putative outer membrane protein/hypothetical protein | - | |
| LA1939 | LIC11966 | Putative outer membrane protein/hypothetical protein | - | |
| LA2273 | LIC11665 | Putative outer membrane protein/hypothetical protein | ||
| LA0563 (d) | LIC13006 | Hypothetical protein/putative lipoprotein (LenC) | - | |
| LA0695 (d) | LIC12906 | Hypothetical protein/putative lipoprotein (LenA/LfhA/Lsa24) | - | |
| LA1433 (d) | LIC12315 | Hypothetical protein/putative lipoprotein (LenD) | - | |
| LA3103 (d) | LIC10997 | Hypothetical protein (LenB) | - | |
| LA4073 (d) | LIC13248 | Hypothetical protein/putative lipoprotein (LenF) | - | |
| LA4324 (d) | LIC13467 | Hypothetical protein/conserved hypothetical protein (LenE) | - | |
| LA3370 | LIC10793 | Conserved hypothetical protein/surface antigen (Lp24) | - | |
| LA0965 | LIC12676 | Conserved hypothetical protein | ||
| LA1066 | LIC12601 | Conserved hypothetical protein | - | |
| LA1498 | LIC12260 | Conserved hypothetical protein | - | |
| LA2811 | LIC11217 | Conserved hypothetical protein | ||
| LA3734 | LIC10495 | Conserved hypothetical protein/CM protein | - | |
| LA3834 (c) | LIC13066 | Conserved hypothetical protein | ||
| LA4227 | LIC13381 | Conserved hypothetical protein | - | |
| LA0663 | LIC12930 | Conserved hypothetical protein/hypothetical protein | - | |
| LA0423 (c) | LIC10371 | Conserved hypothetical protein/putative lipoprotein | ||
| LA1567 (c) | LIC12209 | Conserved hypothetical protein/putative lipoprotein | ||
| LA1568 (c) | LIC12209 | Conserved hypothetical protein/putative lipoprotein | ||
| LA1691 (c) | LIC12099 | Conserved hypothetical protein/putative lipoprotein | - | |
| LA3340 (e) | LIC10821 | Conserved hypothetical protein/putative lipoprotein | - | |
| LA3394 (e) | LIC10774 | Conserved hypothetical protein/putative lipoprotein | - | |
| LA3501 | LIC10686 | Conserved hypothetical protein/putative lipoprotein | - | |
| LA0283 (c) | LIC10239 | Hypothetical protein | - | |
| LA0426 (c) | LIC10373 | Hypothetical protein | ||
| LA0996 (d) | LIC12668 | Hypothetical protein | - | |
| LA1764 | LIC12048 | Hypothetical protein | - | |
| LA1869 | LIC12023 | Hypothetical protein | - | |
| LA2272 | LIC11664 | Hypothetical protein | ||
| LA3240 | LIC10898 | Hypothetical protein | ||
| LA0074 | LIC10067 | Hypothetical protein/conserved hypothetical protein | - | |
| LA1065 | LIC12602 | Hypothetical protein/conserved hypothetical protein | - | |
| LA1762 | LIC12048 | Hypothetical protein/conserved hypothetical protein | - | |
| LA3649 | LIC10561 | Hypothetical protein/conserved hypothetical protein | - | |
| LA3881 | LIC13101 | Hypothetical protein/OM with integrin-like repeat domains | ||
Note: LRR, leucine-rich repeat; a: Swiss-Prot ID derived from DBsubloc database; b: PDB code derived from GTD prediction; c: pfam06739: SBBP (seven bladed beta propeller) repeat; d: pfam07588: DUF1554; e: pfam07602: DUF1565
23 putative outer membrane proteins (OM) derived from the 2 out of 3 predictions with significant DBSubloc and/or GTD prediction
| Lai locus | Copen locus | Description | Swiss-Prot ID (a) | PDB code (b) |
| LA3471 | LIC10711 | Iron-regulated protein A/cytoplasmic membrane protein | ||
| LA1161 | LIC12524 | Long-chain fatty acid transport protein/fatty acid transport protein | - | |
| LA1100 | LIC12575 | Outer membrane efflux protein/cytoplasmic membrane protein | - | |
| LA1445 | LIC12307 | Outer membrane efflux protein/OM-TolC superfamily | ||
| LA3685 | LIC10537 | Outer membrane protein/PG-associated periplasmic protein | ||
| LA0056 | LIC10050 | Outer membrane protein OmpA family/PG-associated CM protein | Q05146 | |
| LA2318 | LIC11623 | Predicted outer membrane protein/outer membrane protein | - | |
| LA1968 | LIC11935 | Putative outer membrane protein/conserved hypothetical protein | - | |
| LA2444 | LIC11506 | Putative outer membrane protein/outer membrane protein | - | |
| LB110 | LIC20087 | Putative outer membrane protein/outer membrane protein | - | |
| LA2242 | LIC11694 | TonB-dependent outer membrane receptor | ||
| LA3242 | LIC10896 | TonB-dependent outer membrane receptor | ||
| LA0465 | LIC10405 | TPR-repeat-containing proteins/conserved hypothetical protein | - | |
| LA3675 | LIC10544 | Hypothetical protein/outer membrane protein | - | |
| LA2063 | LIC11851 | Conserved hypothetical protein/cytoplasmic membrane protein | - | |
| LA3102 | LIC10998 | Conserved hypothetical protein | ||
| LA2168 | - | Hypothetical protein | ||
| LA3809 | LIC10439 | Hypothetical protein | - | |
| LA1501 | LIC12258 | Hypothetical protein | - | |
| LA3552 | LIC10647 | Hypothetical protein/conserved hypothetical protein | - | |
| LA2818 | LIC11211 | Hypothetical protein/conserved hypothetical protein | - | |
| LA4059 | LIC13238 | Hypothetical protein/conserved hypothetical protein | - | |
| LB279 | LIC20214 | Hypothetical protein/conserved hypothetical protein | - |
Note a: Swiss-Prot ID derived from DBsubloc database, b: PDB code derived from GTD prediction
25 Putative outer membrane proteins (OM) derived from the 1 out of 3 predictions with significant DBSubloc and/or GTD prediction
| Lai locus | Copen locus | Description | Swiss-Prot ID (a) | PDB code (b) |
| LA0616 | LIC12966 | LipL41/Outer membrane lipoprotein lipL41 | - | |
| LA2295 | LIC11643 | LipL45 protein | ||
| LA0957 | LIC12693 | Outer membrane efflux protein/conserved hypothetical protein | ||
| LA0581 | LIC12990 | Outer membrane efflux protein/conserved hypothetical protein | ||
| LA3733 | LIC10496 | Outer membrane efflux protein/conserved hypothetical protein | - | |
| LA0301 | LIC10258 | Outer membrane protein OmpA family/hypothetical protein | ||
| LA0222 | LIC10191 | Outer membrane protein OmpA family/PG-associated CM protein | ||
| LA1192 | LIC12499 | Putative outer membrane protein | - | |
| LA1404 | LIC12337 | Putative outer membrane protein | - | |
| LA1931 | LIC11975 | Putative outer membrane protein/outer membrane protein | - | |
| LA1987 | LIC11918 | Putative outer membrane protein/conserved hypothetical protein | - | |
| LB199 | LIC20157 | Putative outer membrane protein/conserved hypothetical protein | - | |
| LA1030 | LIC12630 | TPR-repeat-containing proteins/hypothetical protein | ||
| LA0568 | LIC13002 | Conserved hypothetical protein | - | |
| LA1510 | LIC12252 | Conserved hypothetical protein | - | |
| LA0835 | LIC12791 | Hypothetical protein/conserved hypothetical protein | - | |
| LA2746 | LIC11268 | Hypothetical protein/conserved hypothetical protein | - | |
| LA2940 | LIC11121 | Hypothetical protein/conserved hypothetical protein | - | |
| LA2976 | LIC11086 | Hypothetical protein/conserved hypothetical protein | - | |
| LA3870 | LIC13089 | Hypothetical protein/conserved hypothetical protein | - | |
| LA4272 | LIC13418 | Hypothetical protein/conserved hypothetical protein | - | |
| LA4335 | LIC13477 | Hypothetical protein/conserved hypothetical protein | - | |
| LA0706 | LIC12898 | Unknown protein | ||
| LA1507 | LIC12254 | Unknown protein/outer membrane protein | - | |
| LA3853 | LIC13078 | Unknown protein/conserved hypothetical protein | - |
Note a: Swiss-Prot ID derived from DBsubloc database, b: PDB code derived from GTD prediction
Protein subcellular localizations of L. interrogans predicted by PSORTb, PA, ProtCompB and the combination prediction
| Localization | PSORTb | PA | ProtCompB | Combination prediction |
| Cytoplasm (CP) | 1125 | 921 | 2013 | 2690 |
| Cytoplasmic membrane (CM) | 606 | 715 | 1726* | 813 |
| Outer membrane (OM) | 112 | 28 | (included in CM)* | 63 |
| Periplasmic (PP) | 30 | 86 | 478 | 75 |
| Extracellular (EX) | 29 | 326 | 510 | 114 |
| Unknown | 2825 | 2652 | - | 972 |
* In this version of ProtCompB, CM and OM proteins were both predicted as membrane proteins, so the count of 1726 covers both.
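As a quick sanity check on the two summary tables, the per-localization counts for the consensus vote and for the combination prediction should each account for all 4,727 ORFs. The sketch below assumes that the last populated cell in each OM row (15 and 63) is the consensus and combination count respectively, which agrees with the 63 putative OM proteins reported in the abstract. The variable names are illustrative.

```python
TOTAL_ORFS = 4727  # ORFs in the L. interrogans serovar Lai genome (Figure 1)

# Counts copied from the two summary tables; OM values read as described above.
consensus = {"CP": 418, "CM": 332, "OM": 15, "PP": 17, "EX": 15, "Unknown": 3930}
combination = {"CP": 2690, "CM": 813, "OM": 63, "PP": 75, "EX": 114, "Unknown": 972}

for name, counts in (("consensus vote", consensus), ("combination", combination)):
    total = sum(counts.values())
    unknown_share = counts["Unknown"] / TOTAL_ORFS
    print(f"{name}: total = {total}, unknown = {unknown_share:.1%}")
    assert total == TOTAL_ORFS
```

Both columns sum to 4,727, and the comparison shows how far the combination approach reduces the number of proteins left as "unknown" relative to the strict consensus vote.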