Genome-wide subcellular localization of putative outer membrane and extracellular proteins in Leptospira interrogans serovar Lai genome using bioinformatics approaches.

Wasna Viratyosin, Supawadee Ingsriswang, Eakasit Pacharawongsakda, Prasit Palittapongarnpim.

Abstract

BACKGROUND: In bacterial pathogens, both cell surface-exposed outer membrane proteins and proteins secreted into the extracellular environment play crucial roles in host-pathogen interaction and pathogenesis. Considerable efforts have been made to identify outer membrane (OM) and extracellular (EX) proteins produced by Leptospira interrogans, which may be used as novel targets for the development of infection markers and leptospirosis vaccines.
RESULTS: In this study we used a novel computational framework, which combines prediction methods with a deduction concept, to identify putative OM and EX proteins encoded by the Leptospira interrogans genome. The framework consists of the following steps: (1) identifying proteins homologous to known proteins in subcellular localization databases derived from the "consensus vote" of computational predictions, (2) incorporating homology-based search and structural information to enhance gene annotation and functional identification and to infer specific structural characters and localizations, and (3) developing a specific classifier for cytoplasmic proteins (CP) and cytoplasmic membrane proteins (CM) using linear discriminant analysis (LDA). We identified 114 putative EX and 63 putative OM proteins, of which 41% are conserved or hypothetical proteins containing sequence and/or protein folding structures similar to those of known EX and OM proteins.
CONCLUSION: The overall results of the combined computational analysis correlate with the available experimental evidence. This is the most extensive in silico identification of protein subcellular localization to date for the Leptospira interrogans serovar Lai genome, and it may be useful for protein annotation, discovery of novel genes and understanding the biology of Leptospira.

Year: 2008; PMID: 18423054; PMCID: PMC2387172; DOI: 10.1186/1471-2164-9-181

Source DB: PubMed; Journal: BMC Genomics; ISSN: 1471-2164; Impact factor: 3.969


Background

Leptospirosis is a globally widespread zoonosis caused by the animal spirochete pathogen Leptospira interrogans [1]. The severe form of the disease, known as Weil's syndrome, is associated with multiple system complications, including renal failure, meningitis, and pulmonary haemorrhage. Although early treatment of leptospirosis is important for ensuring a favorable clinical outcome, this is often difficult to achieve, as symptoms during the early stages of infection resemble those of several other systemic diseases. One potential method for controlling the spread of leptospirosis is the development of vaccines. Candidates for vaccine production include outer membrane (OM) and extracellular (EX) proteins, several of which have been implicated in chemotaxis, adherence and other pathogenic steps. Attempts to identify such proteins have been made previously by experimental [2-14] and computational methods [15-20].

Complete genome sequences of two L. interrogans serovars, Lai and Copenhageni, have been reported [15-17]. Hundreds of putative membrane proteins and lipoproteins were predicted, although in many cases the gene annotation may be too incomplete or inaccurate to identify putative vaccine candidates reliably. Previous studies have tried to identify potential vaccine candidates using experimental methods and in silico predictions. Proteomic analysis of purified outer membrane vesicles (OMVs) of L. interrogans serovar Copenhageni by Nally et al. revealed 33 intact OM proteins [13]. The study by Gamberini et al. [18] identified 16 predicted surface-exposed lipoproteins of L. interrogans serovar Copenhageni via whole genome analysis, only four of which are conserved among 8 pathogenic serovars. Since leptospiral lipoproteins are usually (but not exclusively) surface-exposed proteins, and many are vaccine candidates, Setubal et al. [19] focused on lipoprotein prediction using the spirochaetal lipoprotein (SpLip) program and identified 146 predicted lipoproteins (but not their localizations) for L. interrogans serovar Lai. The search for new potential vaccine candidates was continued by Yang et al. [20], who used a filtering approach combining in silico analysis, comparative genome hybridization, and microarray methods to identify 226 leptospiral surface-exposed proteins.

All of the previous studies summarized above focus on the identification of vaccine candidates. However, both computational and experimental methods have their own drawbacks [21,22]. Computational methods, for instance, depend on the presence of type I signal peptides [23,24], transmembrane helices [24-26], or other particular features found in previously identified membrane proteins, which may not be highly specific or sensitive. Experimental methods, on the other hand, yield results that may be complicated by cross-compartment contamination occurring during sample preparation, which can introduce false positives into data sets [21,22]. Hence, the two kinds of method can occasionally lead to conflicting conclusions. We believe that such a focused approach, without an attempt to accurately identify periplasmic proteins (PP) and cytoplasmic membrane (CM) proteins, can lead to erroneous identification of PP and CM proteins as OM or EX by both in silico and experimental approaches. A holistic prediction of all membrane protein localizations will lead to better accuracy in genome annotation of membrane proteins, including vaccine candidates.

In this study we used a combination of three computational prediction tools, PSORTb [27,28], Proteome Analyst (PA) [29], and ProtCompB [30], to perform whole genome analysis of protein subcellular localization and to identify novel putative L. interrogans serovar Lai OM and EX vaccine candidates. We combined the results of these three prediction algorithms into a consensus vote, resulting in a more accurate protein subcellular localization prediction. Furthermore, we incorporated homology searching against the DBSubloc database [31] and structural information from the GTD prediction [32] to enhance genome annotation and to infer OM-, EX- and PP-localized proteins. We also developed a specific classifier based on Linear Discriminant Analysis (LDA) for identification of leptospiral cytoplasmic proteins (CP) and cytoplasmic membrane proteins (CM), using a training set obtained from the consensus vote. We were able to assign subcellular localizations to several previously uncharacterized hypothetical proteins, thus improving L. interrogans genome annotation.

Results

We performed the subcellular localization prediction of L. interrogans serovar Lai using the pipeline described in the Material and methods section (shown in Figure 1), following the steps of training set verification, consensus vote, homology and structural prediction, and finally LDA-based classification.
Figure 1

Flow chart of the method used for subcellular localization of L. interrogans proteins. Protein sequences of the Leptospira interrogans serovar Lai genome (4,727 ORFs) were analyzed for subcellular localization using PSORTb, ProtCompB, and Proteome Analyst (PA). (a) The consensus vote was obtained by a majority-vote-type procedure designed to give results with high prediction accuracy. If all three methods agreed on a localization, it was assigned as a consensus vote. The remaining results (supported by only one or two of the three predictions) were assigned as non-consensus votes. The consensus vote for CP and CM was used as a training set for the development of an LDA-based classifier for CP and CM in the next step. (b) The non-consensus vote results for OM, PP, and EX were further analyzed for sequence and structure homology by DBSubloc and GTD prediction. Non-consensus vote EX, OM, and PP proteins with significant homology and/or structure information were thereby identified. (c) Non-consensus votes for CP and CM, and the proteins not predicted by DBSubloc and GTD, were further analyzed for subcellular localization using the LDA-based classifier for CP and CM. Proteins classified as CP or CM with a probability of more than 0.90 were taken as significant predictions. The remaining queries that could not be identified in this step were classified as "unknown".
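The voting rule in step (a) can be illustrated with a minimal Python sketch. The function names and the use of `None` for an "unknown" call are assumptions made for illustration only; the sketch also ignores the fact that this version of ProtCompB reports CM and OM jointly as membrane proteins.

```python
from collections import Counter

# Localization labels used in the text; None stands for "unknown".
LABELS = {"CP", "CM", "PP", "OM", "EX"}

def consensus_vote(psortb, pa, protcompb):
    """Return the label only if all three predictors agree, else None."""
    if psortb in LABELS and psortb == pa == protcompb:
        return psortb
    return None

def majority_vote(psortb, pa, protcompb):
    """Return the label backed by at least two predictors, else None."""
    known = [c for c in (psortb, pa, protcompb) if c in LABELS]
    if not known:
        return None
    label, votes = Counter(known).most_common(1)[0]
    return label if votes >= 2 else None
```

A protein called OM by two programs and unknown by the third would be a non-consensus vote under the first rule but an OM call under the relaxed majority rule.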


Training set verification: Localization predictions of a set of experimentally verified proteins with known localization

To evaluate the robustness and versatility of our protein localization procedure, we used as a test set a collection of well-characterized Gram-negative bacterial proteins with experimentally verified localizations, taken from the work of Gardy and Brinkman [22]. The data set, comprising 299 proteins, was first analyzed using PSORTb, PA, and ProtCompB. Individually, PSORTb, PA, and ProtCompB correctly assigned 73%, 71% and 79% of the verified protein localizations, respectively (recall rates in Table 1). The overall precision rates were 97%, 95% and 83%, respectively. As expected, ProtCompB, which had the highest overall recall rate, also had the lowest precision rate. The recall rate of the "consensus vote" (see Materials and methods), which requires agreement of all three methods, was 48% without any false positives. Relaxing the criterion to agreement between any two methods (the "majority vote") gave an overall recall rate of 77% with a single false positive.
Table 1

Localization predictions of a set of 299 experimentally verified proteins with known localization

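| Method | Actual localization | Total | TP | FP | FN | TN | Precision | Recall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSORTb | CP | 145 | 110 | 1 | 35 | 110 | 99.10% | 75.86% |
| PSORTb | CM | 69 | 55 | 2 | 14 | 166 | 96.49% | 79.71% |
| PSORTb | PP | 29 | 18 | 0 | 11 | 207 | 100.00% | 62.07% |
| PSORTb | OM | 38 | 30 | 0 | 8 | 195 | 100.00% | 78.95% |
| PSORTb | EX | 18 | 6 | 3 | 12 | 216 | 66.67% | 33.33% |
| PSORTb | Total | 299 | 219 | 6 | 80 | 894 | 97.33% | 73.24% |
| Proteome Analyst | CP | 145 | 94 | 0 | 51 | 119 | 100.00% | 64.83% |
| Proteome Analyst | CM | 69 | 59 | 2 | 10 | 162 | 96.72% | 85.51% |
| Proteome Analyst | PP | 29 | 19 | 3 | 10 | 201 | 86.36% | 65.52% |
| Proteome Analyst | OM | 38 | 31 | 0 | 7 | 192 | 100.00% | 81.58% |
| Proteome Analyst | EX | 18 | 9 | 6 | 9 | 207 | 60.00% | 50.00% |
| Proteome Analyst | Total | 299 | 212 | 11 | 87 | 881 | 95.07% | 70.90% |
| ProtCompB | CP | 145 | 127 | 11 | 18 | 144 | 92.03% | 87.59% |
| ProtCompB | CM | 69 | 55 | 7 | 14 | 227 | 88.71% | 79.71% |
| ProtCompB | PP | 29 | 19 | 9 | 10 | 261 | 67.86% | 65.52% |
| ProtCompB | OM | 38 | 23 | 18 | 15 | 243 | 56.10% | 60.53% |
| ProtCompB | EX | 18 | 11 | 4 | 7 | 277 | 73.33% | 61.11% |
| ProtCompB | Total | 299 | 235 | 49 | 64 | 1152 | 82.75% | 78.60% |
| Consensus vote | CP | 145 | 67 | 0 | 78 | 154 | 100.00% | 46.21% |
| Consensus vote | CM | 69 | 43 | 0 | 26 | 230 | 100.00% | 62.32% |
| Consensus vote | PP | 29 | 11 | 0 | 18 | 270 | 100.00% | 37.93% |
| Consensus vote | OM | 38 | 19 | 0 | 19 | 261 | 100.00% | 50.00% |
| Consensus vote | EX | 18 | 4 | 0 | 13 | 216 | 100.00% | 23.53% |
| Consensus vote | Total | 299 | 144 | 0 | 154 | 1131 | 100.00% | 48.32% |
| Majority vote (2 out of 3 predictions) | CP | 145 | 121 | 0 | 24 | 154 | 100.00% | 83.45% |
| Majority vote (2 out of 3 predictions) | CM | 69 | 59 | 0 | 10 | 230 | 100.00% | 85.51% |
| Majority vote (2 out of 3 predictions) | PP | 29 | 17 | 0 | 12 | 270 | 100.00% | 58.62% |
| Majority vote (2 out of 3 predictions) | OM | 38 | 29 | 0 | 10 | 213 | 100.00% | 74.36% |
| Majority vote (2 out of 3 predictions) | EX | 18 | 6 | 1 | 12 | 215 | 85.71% | 33.33% |
| Majority vote (2 out of 3 predictions) | Total | 299 | 232 | 1 | 68 | 1082 | 99.57% | 77.33% |
| Combination method | CP | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| Combination method | CM | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| Combination method | PP | 29 | 25 | 0 | 4 | 56 | 100.00% | 86.20% |
| Combination method | OM | 38 | 34 | 1 | 4 | 46 | 97.14% | 89.47% |
| Combination method | EX | 18 | 12 | 2 | 6 | 65 | 85.71% | 66.67% |
| Combination method | Total | 85 | 71 | 3 | 14 | 167 | 95.95% | 87.53% |

The 299 proteins were obtained from the test set used in the comparison study by Gardy and Brinkman [22]. Majority vote: the result from 2 out of 3 predictions. Combination method: the result from the non-consensus vote with significant DBSubloc [31] and/or GTD prediction [32]. Precision is calculated as TP/(TP+FP); recall is calculated as TP/(TP+FN). TP = true positive, TN = true negative, FP = false positive, FN = false negative, N/A = not applicable.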
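As a worked check of the definitions given in the Table 1 footnote, the short Python sketch below recomputes precision and recall for the PSORTb CP row (TP = 110, FP = 1, FN = 35). The function names are chosen here for illustration only.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Recall = TP / (TP + FN)."""
    return tp / (tp + fn)

# PSORTb, cytoplasmic proteins (CP) row of Table 1.
tp, fp, fn = 110, 1, 35
print(f"precision = {100 * precision(tp, fp):.2f}%")  # 99.10%
print(f"recall    = {100 * recall(tp, fn):.2f}%")     # 75.86%
```

Both values match the percentages reported in the table for that row.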

Since few EX and OM proteins were agreed on by all three predictions (low recall rate), we used structure-based homology information from GTD and/or homology search results from DBSubloc as additional evidence for inferring protein localization. Using this information, we assessed the likelihood that the "non-consensus vote" outputs (see Materials and methods) were EX or OM proteins. When the information from the DBSubloc and GTD predictions was also used, the overall recall rates for EX, OM and PP increased to 67%, 89% and 86%, respectively, with 96% precision (Table 1). This performance was much better than that of any of the three individual methods or of either voting scheme. The combination of prediction tools with DBSubloc homology search and GTD structure-based prediction thus markedly improved the precision and recall of EX, OM and PP localization prediction, which suggests that our prediction pipeline is applicable to subcellular localization prediction of hypothetical or unknown proteins.

Subcellular localization predictions of L. interrogans: Step 1 Consensus votes

After demonstrating the accuracy of our prediction pipeline on the test set of verified proteins, we analyzed the whole predicted proteome of L. interrogans serovar Lai using three computational predictors of protein subcellular localization: PSORTb, ProtCompB, and Proteome Analyst (PA). The results obtained from each prediction program are shown in Table 2. ProtCompB assigned subcellular localizations to all protein queries, whereas PSORTb and PA each assigned approximately 50% of protein queries to an unknown localization.
Table 2

Predicted protein subcellular localizations of L. interrogans by PSORTb, PA, ProtCompB and consensus vote predictions.

| Localization | PSORTb | PA | ProtCompB | Consensus vote |
| --- | --- | --- | --- | --- |
| Cytoplasm (CP) | 1125 | 921 | 2013 | 418 |
| Cytoplasmic membrane (CM) | 606 | 715 | 1726* | 332 |
| Outer membrane (OM) | 112 | 28 | (included in CM*) | 15 |
| Periplasmic (PP) | 30 | 86 | 478 | 17 |
| Extracellular (EX) | 29 | 326 | 510 | 15 |
| Unknown | 2825 | 2652 | - | 3930 |

* Note that in this version of ProtCompB, CM and OM were predicted together as membrane proteins.

Inspection of the prediction results from the three algorithms showed that 797 of the 4,727 ORFs of the L. interrogans serovar Lai genome had a consensus vote localization: 418 cytoplasmic proteins (CP), 332 cytoplasmic membrane proteins (CM), 17 periplasmic proteins (PP), 15 outer membrane proteins (OM), and 15 extracellular/secreted proteins (EX) (Tables 2, 3 and 4; Additional files 1, 2 and 3). The biological functions of most of these proteins are already annotated; only about 9% (68 of 797 ORFs) are annotated as conserved hypothetical or unknown proteins. This is consistent with the consensus vote approach predicting subcellular localization for L. interrogans with high accuracy. However, its recall is unacceptably low, since the localization of the majority of proteins (3,930 of 4,727) remains unknown.
Table 3

Putative extracellular proteins (EX) predicted by the consensus vote

| Lai locus | Copen locus | Protein annotation |
| --- | --- | --- |
| LA3731 | LIC10497 | Fmh-like protein/hypothetical protein |
| LA0587 | LIC12988 | Lactonizing lipase/lipase |
| LA0872 | LIC12760 | Microbial collagenase |
| LA1450 | LIC12302 | Probable O-sialoglycoprotein endopeptidase |
| LA2448 | LIC10830 | Putative outermembrane protein/putative lipoprotein |
| LA1765 | LIC12047 | Rhs family protein/cytoplasmic membrane protein |
| LA4161 | LIC13320 | Thermolysin/thermolysin precursor |
| LA4164 | LIC13321 | Thermolysin/thermolysin homolog precursor |
| LA2303 | LIC11634 | 3-oxoacyl-[acyl-carrier protein] reductase/CsgA |
| LA0873 | LIC12759 | LRR containing protein/cytoplasmic membrane protein |
| LA2964 | LIC11098 | LRR containing protein/conserved hypothetical protein |
| LA3028 | LIC11051 | LRR containing protein/conserved hypothetical protein |
| LA3320 | LIC10831 | LRR containing protein/conserved hypothetical protein |
| LA3323 | LIC10829 | LRR containing protein/conserved hypothetical protein |
| LA0709 | LIC12896 | Unknown protein/conserved hypothetical protein |

Note LRR: Leucine-rich repeat. Lai locus: L. interrogans serovar Lai locus. Copen locus: L. interrogans serovar Copenhageni locus.

Table 4

Putative outer membrane proteins (OM) predicted by the consensus vote

| Lai locus | Copen locus | Protein annotation |
| --- | --- | --- |
| LA2375 | LIC11570 | General secretory pathway protein D |
| LA3149 | LIC10964 | Hemin receptor/TonB-dependent outer membrane hemin receptor |
| LB328 | LIC20250 | Outer membrane protein OmpA/PG-associated CM protein |
| LA3615 | LIC10592 | Outer membrane protein OmpA family/PG-associated CM protein |
| LA1963 | LIC11941 | Outer membrane protein precursor CzcC/heavy metal efflux pump |
| LA3927 | LIC13135 | Outer membrane protein tolC precursor/outer membrane protein |
| LA1356 | LIC12374 | Probable TonB-dependent receptor |
| LA2641 | LIC11345 | Probable TonB-dependent receptor/ferrichrome-iron receptor |
| LA3468 | LIC10714 | Probable TonB-dependent receptor/outer membrane receptor protein |
| LB191 | LIC20151 | Putative TonB-dependent outer membrane receptor protein (Hbp A) |
| LA2510 | LIC11458 | Conserved hypothetical protein/outer membrane protein, porin superfamily |
| LA4337 | LIC13479 | Conserved hypothetical protein/PG-associated CM protein |
| LA0572 | LIC12998 | Conserved hypothetical protein/TonB-dependent outer membrane receptor |
| LA3258 | LIC10881 | Hypothetical protein/outer membrane protein, TonB dependent |
| LA2186 | LIC11739 | Conserved hypothetical protein |

Lai locus: L. interrogans serovar Lai locus. Copen locus: L. interrogans serovar Copenhageni locus.

When the concordance (prediction agreement) rates between the three prediction methods were compared, excluding proteins assigned an unknown localization by one or two programs, the rates for PSORTb and PA, PSORTb and ProtCompB, and PA and ProtCompB were 70.3%, 80%, and 59.5%, respectively. PSORTb had a strong propensity to assign protein queries to CP and OM, whereas PA preferentially assigned them to CM, PP and EX (p < 0.001, chi-square tests).
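A pairwise concordance rate of the kind reported above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' procedure: the function name is hypothetical, `None` stands for an unknown call, a pair is skipped whenever either program returns unknown, and the example calls are toy data rather than values from the study.

```python
def concordance(calls_a, calls_b):
    """Fraction of proteins given the same label by two predictors,
    counting only proteins that both predictors localized."""
    pairs = [(a, b) for a, b in zip(calls_a, calls_b)
             if a is not None and b is not None]
    if not pairs:
        return None  # no protein was localized by both programs
    agreed = sum(a == b for a, b in pairs)
    return agreed / len(pairs)

# Toy calls for four proteins (not data from the study).
program_1 = ["CP", "CM", None, "OM"]
program_2 = ["CP", "PP", "CP", "OM"]
rate = concordance(program_1, program_2)  # 2 agreements out of 3 shared calls
```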

Step 2: Homology-based and protein folding recognition predictions for non-consensus vote localizations

The non-consensus vote OM, EX, and PP proteins were further analyzed for localization using DBSubloc and GTD. As presented in Tables 5 and 6, 99 more proteins (43 of the 83 proteins predicted by two of the three methods and 56 of the 617 proteins predicted by one method) were identified as putative EX proteins, while 48 proteins (23 of the 59 predicted by two methods and 25 of the 980 predicted by one method) were identified as putative OM proteins (Tables 7 and 8). Moreover, 58 proteins (20 of the 20 predicted by two methods and 38 of the 504 predicted by one method) were additionally predicted as PP proteins (Additional file 1). Notably, several protein loci currently annotated as hypothetical proteins without localization information were assigned to the EX, OM and PP compartments by the combination method (Tables 3, 4, 5, 6, 7, 8, 9 and Additional file 1). The homology search and structural information from DBSubloc and GTD thus allowed further identification of EX, OM, and PP proteins from the non-consensus vote set. However, 3,725 protein localizations remain unknown.
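The counts quoted in this step can be cross-checked with simple bookkeeping. The Python sketch below only re-adds numbers stated in the text (the consensus vote counts and the 3,930 proteins left unknown after Step 1); the variable names are chosen here for illustration.

```python
# Consensus vote counts from Step 1.
consensus = {"EX": 15, "OM": 15, "PP": 17}

# Proteins added in Step 2 (two-method hits + one-method hits).
step2 = {"EX": 43 + 56, "OM": 23 + 25, "PP": 20 + 38}

# Totals after combining Step 1 and Step 2.
combined = {loc: consensus[loc] + step2[loc] for loc in consensus}

# Proteins still unknown: 3,930 after Step 1, minus the Step 2 additions.
remaining_unknown = 3930 - sum(step2.values())

print(combined)           # {'EX': 114, 'OM': 63, 'PP': 75}
print(remaining_unknown)  # 3725
```

The combined totals agree with the EX, OM and PP entries of the combination prediction in Table 9.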
Table 5

43 Putative extracellular proteins (EX) derived from the 2 out of 3 predictions with significant DBSubloc or/and GTD prediction

| Lai locus | Copen locus | Protein annotation | SWISS-PROT (a) | PDB code (b) |
| --- | --- | --- | --- | --- |
| LA1027 | LIC12632 | Sphingomyelinase C precursor (Sph1)/hemolysin | - | 1bix |
| LA1029 | LIC12631 | Sphingomyelinase C precursor (Sph2)/hemolysin | - | 1bix |
| LA4004 | LIC13198 | Sphingomyelinase C precursor hemolysin (Sph3)/sph-like | - | 1bix |
| LA3540 | LIC10657 | Sphingomyelinase C precursor; hemolysin | - | 1bix |
| LA3050 | LIC11040 | Hemolytic protein-like protein/hemolysin (sph4) | - | 1aq0 |
| LA3466 | LIC10715 | Thermolysin | P43133 | 1hyt |
| LA3454 | LIC10723 | Flagellar hook-associated protein (fliD) | Q9KWW7 | 1osp |
| LA3097 | LIC11003 | Treponemal membrane protein B precursor-like protein/LipL71 | P19649 | 1l8w |
| LA1530 | LIC12234 | LRR containing protein | Q9RBS2 | 1d0b |
| LA1324 | LIC12401 | LRR containing protein/cytoplasmic membrane protein | - | 1ogq |
| LA1354 | LIC12375 | LRR containing protein/cytoplasmic membrane protein | Q9RBS2 | 1ogq |
| LA2452 | LIC11504 | LRR containing protein/cytoplasmic membrane protein | Q9RBS2 | 1ogq |
| LA2862 | LIC11180 | LRR containing protein/cytoplasmic membrane protein | Q9RBS2 | 1ogq |
| LA2966 | LIC11097 | LRR containing protein/cytoplasmic membrane protein | Q9RBS2 | 1ogq |
| LA3324 | LIC10831 | LRR containing protein/conserved hypothetical protein | Q9RBS2 | 1ogq |
| LA3321 | LIC10830 | LRR containing protein/putative lipoprotein | Q9RBS2 | 1ogq |
| LA3322 | LIC10830 | LRR containing protein/putative lipoprotein | Q9RBS2 | 1ogq |
| LA0701 | LIC12901 | LRR containing protein/molybdate metabolism regulator | Q9RBS2 | 1ogq |
| LA2377 | LIC11568 | Peptidase, M23/M37/membrane associated peptidase | P24204 | 1acc |
| LA0505 | LIC13050 | Probable glycosyl hydrolase/conserved hypothetical protein | - | 1f00 |
| LA3725 | LIC10502 | Probable phenazine biosynthesis family protein/CM protein | - | 1air |
| LA3730 | LIC10498 | Putative lipoprotein | P15921 | 1rmg |
| LA1368 | LIC12364 | Putative outer membrane protein/CagA | P47460 | - |
| LA1759 | LIC12050 | Putative outer membrane protein/conserved hypothetical protein | Q52657 | 1czf |
| LA2443 | LIC11507 | Putative outer membrane protein/conserved hypothetical protein | Q9RBS2 | 1ogq |
| LA2447 | LIC11505 | Putative outer membrane protein/conserved hypothetical protein | Q9RBS2 | 1jl5 |
| LA2450 | LIC11505 | Putative outer membrane protein/conserved hypothetical protein | Q9RBS2 | 1ogq |
| LA1915 | LIC11990 | TPR-repeat-containing proteins/cytoplasmic membrane protein | P80544 | 1qqe |
| LA0043 | LIC10038 | TPR-repeat-containing proteins/conserved hypothetical protein | Q9KQ40 | 1qqe |
| LA2773 | LIC11246 | Conserved hypothetical protein | Q06852 | 1l8w |
| LA3233 | LIC10902 | Conserved hypothetical protein | O83497 | 1qcx |
| LB001 | LIC20001 | Conserved hypothetical protein | - | 1eut |
| LA1499 | LIC12259 | Conserved hypothetical protein/cytoplasmic membrane protein | P35825 | 1dab |
| LA1766 | LIC12047 | Conserved hypothetical protein/cytoplasmic membrane protein | Q07833 | 1czf |
| LA3333 | LIC10825 | Conserved hypothetical protein/cytoplasmic membrane protein | Q07833 | 1acc |
| LA2208 | LIC11720 | Conserved hypothetical protein/hypothetical protein | - | 1e15 |
| LA3276 (c) | LIC10868 | Conserved hypothetical protein/hypothetical protein | P15345 | 1dab |
| LA0022 | LIC10021 | Conserved hypothetical protein/putative lipoprotein | - | 1dab |
| LA3210 | LIC10920 | Conserved hypothetical protein/putative lipoprotein | - | 1rmg |
| LA3726 | LIC10501 | Conserved hypothetical protein/putative lipoprotein | Q9PJY2 | 1acc |
| LB216 | LIC20172 | Conserved hypothetical protein/putative lipoprotein | - | 1wxr |
| LB225 | LIC20176 | Conserved hypothetical protein/putative lipoprotein | - | 1wxr |
| LA4135 (d) | LIC13296 | Hypothetical protein/putative lipoprotein | - | 1koe |

Note LRR: Leucine-rich repeat, a: Swiss-Prot ID derived from DBSubloc database, b: PDB code derived from GTD prediction, c: pfam06739: SBBP (seven bladed beta propeller) repeat, d: pfam07588: DUF1554

Table 6

56 Putative extracellular proteins (EX) derived from the 1 out of 3 predictions with significant DBSubloc or/and GTD prediction

| Lai locus | Copen locus | Protein annotation | SWISS-PROT (a) | PDB code (b) |
| --- | --- | --- | --- | --- |
| LB258 | LIC20197 | Cysteine protease | - | 1deu |
| LA0975 | LIC12680 | Fimh-like protein | - | 1a6c |
| LA0858 | LIC12930 | Fimh-like protein/hypothetical protein | - | 1dab |
| LA0492 | LIC13060 | LipL36 protein | - | 1acc |
| LA3469 | LIC10713 | Iron-regulated protein A/LruB/putative lipoprotein | - | 1rmg |
| LA3075 | LIC10464 | Surface protein Lk90-like protein/Ig-like repeat domain | P35828 | 1dab |
| LA3778 | LIC10464 | Surface protein Lk90-like protein/Ig-like repeat domain | Q52657 | 1dbg |
| LA0378 | LIC10325 | TPR-repeat-containing proteins/hemolysin | Q98KC1 | 1a17 |
| LA3138 | LIC10973 | Transmembrane outer membrane protein L1 | - | 1acc |
| LA1353 | LIC12375 | LRR containing protein | Q9RBS2 | 1jl5 |
| LB196 | LIC20154 | LRR containing protein/lipoprotein | - | 1d0b |
| LA0416 (e) | LIC10365 | Putative lipoprotein (LpL effector) | - | 1gq8 |
| LA0962 (d) | LIC12690 | Putative lipoprotein | - | 1eut/1koe |
| LA1569 (c) | LIC12208 | Putative lipoprotein | P15345 | 1acc |
| LA2823 (e) | LIC11207 | Putative lipoprotein | - | 1gq8 |
| LA3064 (e) | LIC11030 | Putative lipoprotein | - | 1czf |
| LA3848 (c) | LIC13075 | Putative lipoprotein | - | 1qjv |
| LA3867 | LIC13086 | Putative lipoprotein | - | 1cwv |
| LA1159 | LIC12525 | Putative outer membrane protein/putative lipoprotein | - | 1cs6 |
| LA1905 | LIC11996 | Putative outer membrane protein/hypothetical protein | - | 1kit |
| LA1939 | LIC11966 | Putative outer membrane protein/hypothetical protein | - | 1fio |
| LA2273 | LIC11665 | Putative outer membrane protein/hypothetical protein | - | 1air |
| LA0563 (d) | LIC13006 | Hypothetical protein/putative lipoprotein (LenC) | - | 1koe |
| LA0695 (d) | LIC12906 | Hypothetical protein/putative lipoprotein (LenA/LfhA/Lsa24) | - | 1koe |
| LA1433 (d) | LIC12315 | Hypothetical protein/putative lipoprotein (LenD) | - | 1koe |
| LA3103 (d) | LIC10997 | Hypothetical protein (LenB) | - | 1koe |
| LA4073 (d) | LIC13248 | Hypothetical protein/putative lipoprotein (LenF) | - | 1koe |
| LA4324 (d) | LIC13467 | Hypothetical protein/conserved hypothetical protein (LenE) | - | 1koe |
| LA3370 | LIC10793 | Conserved hypothetical protein/surface antigen (Lp24) | - | 1loq |
| LA0965 | LIC12676 | Conserved hypothetical protein | P25156 | 1d0b |
| LA1066 | LIC12601 | Conserved hypothetical protein | - | 1dbg |
| LA1498 | LIC12260 | Conserved hypothetical protein | - | 1ogq |
| LA2811 | LIC11217 | Conserved hypothetical protein | P25146 | 1ogq |
| LA3734 | LIC10495 | Conserved hypothetical protein/CM protein | - | 1dab |
| LA3834 (c) | LIC13066 | Conserved hypothetical protein | P15345 | 1acc |
| LA4227 | LIC13381 | Conserved hypothetical protein | - | 1sli |
| LA0663 | LIC12930 | Conserved hypothetical protein/hypothetical protein | - | 1acc |
| LA0423 (c) | LIC10371 | Conserved hypothetical protein/putative lipoprotein | P15345 | 1qjv |
| LA1567 (c) | LIC12209 | Conserved hypothetical protein/putative lipoprotein | P15345 | 1czf |
| LA1568 (c) | LIC12209 | Conserved hypothetical protein/putative lipoprotein | P15345 | 1czf/1dbg |
| LA1691 (c) | LIC12099 | Conserved hypothetical protein/putative lipoprotein | - | 1acc |
| LA3340 (e) | LIC10821 | Conserved hypothetical protein/putative lipoprotein | - | 1ee6 |
| LA3394 (e) | LIC10774 | Conserved hypothetical protein/putative lipoprotein | - | 1gq8 |
| LA3501 | LIC10686 | Conserved hypothetical protein/putative lipoprotein | - | 1air |
| LA0283 (c) | LIC10239 | Hypothetical protein | - | 1air |
| LA0426 (c) | LIC10373 | Hypothetical protein | P56964 | 1acc |
| LA0996 (d) | LIC12668 | Hypothetical protein | - | 1koe |
| LA1764 | LIC12048 | Hypothetical protein | - | 1qlg |
| LA1869 | LIC12023 | Hypothetical protein | - | 1k14 |
| LA2272 | LIC11664 | Hypothetical protein | - | 1dab |
| LA3240 | LIC10898 | Hypothetical protein | - | 1rmg |
| LA0074 | LIC10067 | Hypothetical protein/conserved hypothetical protein | - | 1dbg |
| LA1065 | LIC12602 | Hypothetical protein/conserved hypothetical protein | - | 1dab |
| LA1762 | LIC12048 | Hypothetical protein/conserved hypothetical protein | - | 1qcx |
| LA3649 | LIC10561 | Hypothetical protein/conserved hypothetical protein | - | 1qcx |
| LA3881 | LIC13101 | Hypothetical protein/OM with integrin like repeat domains | P35825 | 1dab |

Note LRR: Leucine-rich repeat, a: Swiss-Prot ID derived from DBSubloc database, b: PDB code derived from GTD prediction, c: pfam06739: SBBP (seven bladed beta propeller) repeat, d: pfam07588: DUF1554, e: pfam07602: DUF1565

Table 7

23 Putative outer membrane proteins (OM) derived from the 2 out of 3 predictions with significant DBSubloc and/or GTD prediction

| Lai locus | Copen locus | Protein annotation | SWISS-PROT (a) | PDB code (b) |
| --- | --- | --- | --- | --- |
| LA3471 | LIC10711 | Iron-regulated protein A/cytoplasmic membrane protein | P12608 | 1i5p |
| LA1161 | LIC12524 | Long-chain fatty acid transport protein/fatty acid transport protein | - | 1kmo |
| LA1100 | LIC12575 | Outer membrane efflux protein/cytoplasmic membrane protein | - | 1ek9 |
| LA1445 | LIC12307 | Outer membrane efflux protein/OM-TolC superfamily | P50468 | 1ek9 |
| LA3685 | LIC10537 | Outer membrane protein/PG-associated periplasmic protein | P38369 | 1r1m |
| LA0056 | LIC10050 | Outer membrane protein OmpA family/PG-associated CM protein | Q05146 | 1r1m |
| LA2318 | LIC11623 | Predicted outer membrane protein/outer membrane protein | - | 1a0t |
| LA1968 | LIC11935 | Putative outer membrane protein/conserved hypothetical protein | - | 1a0t |
| LA2444 | LIC11506 | Putative outer membrane protein/outer membrane protein | - | 1fep |
| LB110 | LIC20087 | Putative outer membrane protein/outer membrane protein | - | 1uyn |
| LA2242 | LIC11694 | TonB-dependent outer membrane receptor | P46359 | 1fep |
| LA3242 | LIC10896 | TonB-dependent outer membrane receptor | P37409 | 1kmo |
| LA0465 | LIC10405 | TPR-repeat-containing proteins/conserved hypothetical | P58937 | - |
| LA3675 | LIC10544 | Hypothetical protein/outer membrane protein | - | 1a0t |
| LA2063 | LIC11851 | Conserved hypothetical protein/cytoplasmic membrane protein | - | 1by5 |
| LA3102 | LIC10998 | Conserved hypothetical protein | P76115 | 1nqe |
| LA2168 | - | Hypothetical protein | P43153 | 1a0t |
| LA3809 | LIC10439 | Hypothetical protein | - | 1a0t |
| LA1501 | LIC12258 | Hypothetical protein | - | 2mpr |
| LA3552 | LIC10647 | Hypothetical protein/conserved hypothetical protein | - | 1kmo |
| LA2818 | LIC11211 | Hypothetical protein/conserved hypothetical protein | - | 2mpr |
| LA4059 | LIC13238 | Hypothetical protein/conserved hypothetical protein | - | 1by5 |
| LB279 | LIC20214 | Hypothetical protein/conserved hypothetical protein | - | 1kmo |

Note a: Swiss-Prot ID derived from DBSubloc database, b: PDB code derived from GTD prediction

Table 8

25 Putative outer membrane proteins (OM) derived from the 1 out of 3 predictions with significant DBSubloc and/or GTD prediction

| Lai locus | Copen locus | Protein annotation | SWISS-PROT (a) | PDB code (b) |
| --- | --- | --- | --- | --- |
| LA0616 | LIC12966 | LipL41/Outer membrane lipoprotein lipL41 | - | 1a17 |
| LA2295 | LIC11643 | LipL45 protein | P02977 | 1l8w |
| LA0957 | LIC12693 | Outer membrane efflux protein/conserved hypothetical protein | P24145 | 1ek9 |
| LA0581 | LIC12990 | Outer membrane efflux protein/conserved hypothetical protein | Q9ZHD2 | 1ek9 |
| LA3733 | LIC10496 | Outer membrane efflux protein/conserved hypothetical protein | - | 1ek9 |
| LA0301 | LIC10258 | Outer membrane protein OmpA family/hypothetical protein | Q926C3 | 1r1m |
| LA0222 | LIC10191 | Outer membrane protein OmpA family/PG-associated CM protein | P22263 | 1r1m |
| LA1192 | LIC12499 | Putative outer membrane protein | - | 1fep |
| LA1404 | LIC12337 | Putative outer membrane protein | - | 2mpr |
| LA1931 | LIC11975 | Putative outer membrane protein/outer membrane protein | - | 2mpr |
| LA1987 | LIC11918 | Putative outer membrane protein/conserved hypothetical protein | - | 1osp |
| LB199 | LIC20157 | Putative outer membrane protein/conserved hypothetical protein | - | 1fep |
| LA1030 | LIC12630 | TPR-repeat-containing proteins/hypothetical protein | P58937 | 1a17 |
| LA0568 | LIC13002 | Conserved hypothetical protein | - | 1kmo |
| LA1510 | LIC12252 | Conserved hypothetical protein | - | 2mpr |
| LA0835 | LIC12791 | Hypothetical protein/conserved hypothetical protein | - | 1fnf |
| LA2746 | LIC11268 | Hypothetical protein/conserved hypothetical protein | - | 2mpr |
| LA2940 | LIC11121 | Hypothetical protein/conserved hypothetical protein | - | 2mpr |
| LA2976 | LIC11086 | Hypothetical protein/conserved hypothetical protein | - | 2mpr |
| LA3870 | LIC13089 | Hypothetical protein/conserved hypothetical protein | - | 2mpr |
| LA4272 | LIC13418 | Hypothetical protein/conserved hypothetical protein | - | 2mpr |
| LA4335 | LIC13477 | Hypothetical protein/conserved hypothetical protein | - | 1kmo |
| LA0706 | LIC12898 | Unknown protein | P38370 | 1fep |
| LA1507 | LIC12254 | Unknown protein/outer membrane protein | - | 1a0t |
| LA3853 | LIC13078 | Unknown protein/conserved hypothetical protein | - | 1bxw |

Note a: Swiss-Prot ID derived from DBSubloc database, b: PDB code derived from GTD prediction

Table 9

Protein subcellular localizations of L. interrogans predicted by PSORTb, PA, ProtCompB and the combination prediction

Localization | PSORTb | PA | ProtCompB | Combination prediction
Cytoplasm (CP) | 1125 | 921 | 2013 | 2690
Cytoplasmic membrane (CM) | 606 | 715 | 1726* | 813
Outer membrane (OM) | 112 | 28 |  | 63
Periplasmic (PP) | 30 | 86 | 478 | 75
Extracellular (EX) | 29 | 326 | 510 | 114
Unknown | 2825 | 2652 | - | 972

* Note that in ProtCompB prediction in this version, CM and OM were predicted as membrane proteins.


Step 3: Cytoplasmic (CP) and cytoplasmic membrane proteins (CM) identified by Linear Discriminant Analysis (LDA)

The remaining 3725 proteins with unknown localization after step 2 were further analyzed using an LDA-based classifier that we developed to identify CP and CM proteins. The training set consisted of the CP and CM consensus outputs (418 CP proteins and 332 CM proteins) predicted by all three prediction programs (Additional files 2 and 3; see Materials and Methods). This approach identified an additional 2272 CP and 481 CM proteins from the 3725-protein "unknown set" (Additional files 4 and 5). We also found that 66% (1501 out of 2272) of the LDA-predicted CP proteins and 54% (260 out of 481) of the LDA-predicted CM proteins are hypothetical or unknown proteins. In other words, overall 56.3% (1516 out of 2690) of hypothetical and/or unknown proteins in the whole genome were assigned as CP and 38% as CM or helix transmembrane proteins. After the final step of the prediction method, we were able to confidently predict the localization of 3755 (79.4%) leptospiral proteins. Our combination method thus has considerably improved recall over the PSORTb and PA methods, approaching that of ProtCompB (Table 1). To test the final prediction accuracy, estimated as % agreement and % coverage, we then predicted the localization of 28 experimentally verified proteins drawn from several studies of leptospiral outer membrane, extracellular, or cell surface proteins.
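The percentages quoted in this paragraph can be rechecked directly from the counts given in the text. The short Python sketch below does only that arithmetic; the total of 4727 Lai protein queries is taken from the Data sets section, and the helper name `percent` is introduced here for illustration and is not part of the original analysis.

```python
# Number of L. interrogans serovar Lai protein queries (from the Data sets section).
TOTAL_PROTEINS = 4727

def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

# LDA-predicted CP proteins that are hypothetical or unknown proteins.
lda_cp_hypothetical = percent(1501, 2272)
# LDA-predicted CM proteins that are hypothetical or unknown proteins.
lda_cm_hypothetical = percent(260, 481)
# Proteins with an assigned localization after the final step.
localized = percent(3755, TOTAL_PROTEINS)

print(lda_cp_hypothetical, lda_cm_hypothetical, localized)  # 66.1 54.1 79.4
```

Rounded to whole numbers, the first two values match the 66% and 54% reported above.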

Protein subcellular localization prediction on the experimentally verified leptospiral outer membrane and extracellular proteins

As shown in Additional file 6, the three prediction programs PSORTb, PA and ProtCompB gave markedly different predictions from one another for the 28 experimentally verified OM and EX proteins. Each of the three programs had a weakness, either poor agreement (ProtCompB) or low coverage (PSORTb and PA). Our combination approach performed much better in this respect and showed both good agreement and good coverage.
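This section does not spell out how % agreement and % coverage are computed, so the following Python sketch rests on assumed definitions: coverage is the share of test proteins for which a method returns any localization, and agreement is the share of those predictions that match the experimental localization. The function name and the toy data are hypothetical and are not taken from Additional file 6.

```python
from typing import Dict, Optional, Tuple

def agreement_and_coverage(
    experimental: Dict[str, str],
    predicted: Dict[str, Optional[str]],
) -> Tuple[float, float]:
    """Return (% agreement, % coverage) for one prediction method.

    Assumed definitions (not stated in the text):
      coverage  = proteins with a prediction / all test proteins
      agreement = predictions matching experiment / proteins with a prediction
    """
    # Keep only test proteins for which the method made a call (None = unknown).
    called = {p: loc for p, loc in predicted.items()
              if p in experimental and loc is not None}
    coverage = 100.0 * len(called) / len(experimental)
    if not called:
        return 0.0, coverage
    matches = sum(1 for p, loc in called.items() if experimental[p] == loc)
    return 100.0 * matches / len(called), coverage

# Hypothetical toy data for illustration only.
experimental = {"protA": "OM", "protB": "OM", "protC": "EX", "protD": "EX"}
predicted = {"protA": "OM", "protB": "CM", "protC": "EX", "protD": None}
agreement, coverage = agreement_and_coverage(experimental, predicted)
```

Under these definitions a method can score high agreement with low coverage (few but correct calls) or the reverse, which is the trade-off described for the three programs.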

Discussion

Computational prediction of protein subcellular localization is a key step in genome annotation and in the development of drug and vaccine targets. In this study, we used a combination method to putatively assign CP, CM, PP, OM, and EX proteins. We combined the results from three different algorithms, namely PSORTb, PA and ProtCompB, into a consensus vote to obtain higher prediction accuracy. The combination approach has previously been used to significantly reduce or exclude false positive predictions in membrane topology prediction [33] and outer membrane prediction [34]. In our case, the accuracy of the consensus vote is very high, since well characterized OM and EX proteins were predicted, including lactonizing lipase [35], microbial collagenase [36], O-sialoglycoprotein endopeptidase [37], Rhs family protein [38], CsgA or C factor [39], thermolysin [40], leucine rich repeat proteins (LRR) [41-43], TonB-dependent outer membrane receptor proteins, OmpA, porin, heavy metal efflux pump, TolC, and general secretory pathway protein D (Table 4). On the other hand, the recall, or sensitivity, of the consensus vote prediction is low, especially for EX and OM, because the PSORTb and PA programs are known to have limitations for some proteins. PSORTb requires a training set drawn from a limited number of experimentally determined proteins, while PA has the disadvantage that query proteins have to share similarity with known proteins in the Swiss-Prot database [44]. Among high-throughput computational predictions of protein subcellular localization, PSORTb has been reported as the prediction tool that achieves the highest overall accuracy, followed closely by PA [22]. To overcome the limitations of PSORTb, PA and ProtCompB, the predictions for proteins predicted by only one or two of the three prediction methods (the non-consensus vote) were refined by homology-based search using the DBSubloc database and structural annotation in GTD.
This allowed us to identify protein localizations with greater confidence. The advantage of GTD is that protein fold recognition, or threading, methods can identify pairs of proteins that have no obvious sequence similarity but have similar folds. It was previously suggested that this approach be used to increase prediction sensitivity for specific protein localizations [22,45,46]. To our knowledge, this study is the first to employ GTD information to infer leptospiral protein localizations. Structure-based information from the GTD prediction revealed that the majority of the 99 EX predictions were proteins that may be secreted by the type III or the type V (autotransport) system. These proteins are shown in Tables 5 and 6 with their corresponding PDB codes. Many of the putative EX proteins that are annotated as leucine rich repeat (LRR) containing proteins share sequence similarity with PopC protein (Q9RBS2), which is secreted through the hrp-secretion apparatus, or type III secretion pathway, of Ralstonia solanacearum [41]. Structurally related, well-characterized extracellular LRR proteins in other species include YopM (PDB code 1jl5), a Yersinia pestis cytotoxin [43]; internalin B (PDB code 1d0b), a virulence factor of Listeria monocytogenes [47]; and polygalacturonase inhibiting protein (PDB code 1ogq), a secreted protein involved in plant defense [48]. It is of interest to note that several L. interrogans proteins belong to the LRR and TPR (tetratricopeptide repeat) protein families, but the predicted subcellular localization is not necessarily conserved among all members of each family (Tables 3, 5, 6, 7, 8 and 9, and the table in Additional file 4). The majority of LRR proteins were predicted to be EX localized, while TPR proteins were predicted in all compartments except PP.
This finding is consistent with the multiple functions of TPR homologues from more distantly related species in different subcellular milieux, including signal transduction, chaperone activity, cell cycle, transcription, and protein transport [49,50]. Of the 48 OM proteins predicted by non-consensus vote, 24 were annotated as outer membrane or putative outer membrane proteins, while the remainder were annotated as conserved hypothetical proteins. The structural information derived from the GTD prediction for the conserved or hypothetical proteins predicted as putative OM was the same as that for the annotated outer membrane proteins. As shown in Tables 7 and 8, 24 hypothetical proteins can now be annotated as putative OM. Although it is clear that the consensus vote combined with DBSubloc and GTD prediction can give robust predictions for EX, OM and PP, many proteins with either CP or CM localization remain. Using our combination approach, we found that about 17% of genes in the L. interrogans serovar Lai genome encode putative CM proteins, a proportion similar to the 20% to 30% CM proteins reported in other bacterial species [25,51]. From our subcellular localization prediction we identified 63 OM and 114 EX proteins as potential vaccine candidates. On the other hand, it is possible to exclude 813 CM and 75 PP predicted proteins as vaccine candidates on the basis of their localization. We compared our predictions with previously published work. We found that 10 of the 16 membrane proteins predicted by Gamberini et al. (2006), including four that were also demonstrated to be immunogenic among 8 pathogenic serovars in that study, were also predicted by our method as membrane proteins (2 EX, 1 OM, 1 PP and 6 CM) [18]. We examined the localizations of the 145 putative lipoproteins reported by Setubal et al. [19], and found 29 EX, 2 OM, 7 PP and 26 CM proteins among 125 probable lipoproteins, and 1 PP and 3 CM among 21 possible lipoproteins.
The localizations of 63 putative lipoproteins could not be identified; these included proteins containing signal peptidase II recognition sites and proteins lacking sequence and/or structural homology to known membrane proteins (see Additional file 7). Spirochaetal lipoproteins are found in four subcellular compartments: the periplasmic leaflet of the cytoplasmic membrane, the periplasmic or outer leaflet of the outer membrane, or beyond the outer membrane in the environment as extracellular proteins [52]. Therefore, the 15 of the 145 putative lipoproteins that our method identified as CP are unlikely to be lipoproteins because of their localization. These false positive lipoproteins include UDP-glucose 6-dehydrogenase, cell-division protein, regulator of chromosome condensation RCC1 family, and 3-oxoacyl-[acyl-carrier protein] reductase. The frequency of falsely identified lipoproteins just exceeds the reported 1% false positive rate for the SpLip program [52]. Our results can be considered complementary to those reported by Setubal et al. [52], and increase the accuracy of lipoprotein prediction. We also compared our predictions with the 226 leptospiral surface-exposed protein predictions (extracellular, outer membrane, periplasmic, and inner (cytoplasmic) membrane, by their localization definition) reported by Yang et al. [20] and found a concordance of 38.5% (87/226) (see Additional file 8). We think the discrepancies arise from false assignments generated by the prediction algorithms used, which can be identified by comparison with proteins for which there are reliable experimental localization data (see Additional file 6) [2-14,53-57]. Our predictions have higher coverage and agreement with the experimentally tested L. interrogans protein set than the study by Yang et al. [20], suggesting that our prediction method may be of greater overall utility for genome annotation of membrane proteins.
After manual inspection of the predicted localizations, we found further examples of possible false assignments. The greatest discrepancy was found for 42 proteins that were identified as CM by our method but as OM by Yang et al. Some proteins in this group have homologues in other species for which there is experimental evidence of CM location, including methyl-accepting chemotaxis protein mcpB [58], aerotaxis sensor receptor [59], and penicillin-binding protein [60]. Several loci without localization annotation were assigned a localization by the combination prediction method. Therefore, we propose that the subcellular localization annotations for these loci can be tentatively revised. Among this group of proteins, we noted additional similarities to known protein families. One prominent group with the SBBP domain (seven beta blade propeller proteins, Pfam PF06739) contains 9 hypothetical proteins: LA0283 (LIC10239), LA0423 (LIC10371), LA0426 (LIC10373), LA1567 (LIC12209), LA1568(12209), LA1569 (LIC12208), LA1691 (LIC12099), LA3276 (LIC10868), and LA3834 (LIC13066). Three loci annotated as hypothetical proteins or lipoproteins, namely LA0996 (LIC12668), LA0962 (LIC12690), and LIC13296 (LA4135), were predicted as EX localized (Tables 5 and 6) and may belong to the Len (leptospiral endostatin-like lipoproteins) family, based on conservation of the DUF1554 domain (Pfam PF07588) and structural similarity to mammalian endostatin-like protein (PDB 1koe). These proteins act as adhesion proteins and bind to host extracellular matrix (ECM) [53,57] or human factor H [56] (Tables 5 and 6, and the table in Additional file 6). Furthermore, four loci, LIC11207 (LA2823), LIC10821 (LA3340), LIC10774 (LA3394) and LIC10365 (LA0416), previously described as having similarity to the leptospiral effector protein [54], were identified as putative EX proteins, in agreement with their proposed immunomodulator function.
Our combination prediction method has high agreement with, and coverage of, experimentally verified OM and EX proteins (see Additional file 6). On the other hand, experimental localization studies are limited by insufficient sensitivity to detect low-abundance proteins and by cross-contamination of cellular compartments during sample purification, as discussed previously by Rey et al. [21]. It is of note that several PP proteins predicted in this work, e.g. FlaB1 periplasmic flagellin (LA2017/LIC11890), have previously been identified as possible PP contaminants in experimental studies of OMV proteins [13,20]; hence our prediction method may help in the correct interpretation of future experimental verification studies, leading to better predictions in uncharacterized genomes. However, it should be emphasized that no automatic prediction can be considered accurate without experimental verification.

Conclusion

In this study, we have demonstrated that the specificity and sensitivity of protein subcellular localization prediction can be improved by incorporating multiple predictive methods and structural information. With this approach, localizations can be assigned to previously hypothetical L. interrogans proteins. We think this approach is applicable to subcellular localization prediction in other prokaryotic proteomes, with the caveat that some predictions are more robust than others, i.e. CP and CM predictions are more robust than OM, EX or PP predictions.

Materials and Methods

Data sets

Amino acid sequence queries were the 4,727 proteins of the Leptospira interrogans serovar Lai genome (chromosome I: NC_004342, chromosome II: NC_004343) [15] and the 3,728 protein ORFs of Leptospira interrogans serovar Copenhageni strain Fiocruz L1-130 (accession numbers AE016823 (chromosome I) and AE016824 (chromosome II)) [17], obtained from GenBank. Two data sets of proteins with known subcellular localization were used. One was an experimentally confirmed data set containing 278 CP and 309 CM proteins of Gram-negative bacteria, described by Gardy et al. 2003 [28] and used to validate the performance of the LDA-based classifier. The other was a 299-protein data set containing 145 CP, 69 CM, 29 PP, 38 OM and 18 EX proteins, which was the test data previously used to evaluate various protein localization predictions by Gardy and Brinkman [22].

Computational prediction tools for in silico protein localization

Several publicly available programs were used in combination. Protein subcellular localization for Gram-negative bacteria was predicted using PSORTb [27,28], Proteome Analyst (PA) [29], and ProtCompB [30]. Signal peptide sequences and α-helical transmembrane proteins were predicted using SignalP [23] and TMHMM [24,25], respectively.

Homology based searching and structural annotation

Homology searches for subcellular localization information were carried out using BLAST against DBSubloc, a localization-specific protein database [31]. Protein fold recognition, which predicts the fold of a protein sequence with only distant homology to a known structure, was performed by searching against GTD (the Genomic Threading Database) [32].

Prediction strategy (as shown in Figure 1)

Step 1. Consensus votes prediction

We reasoned that more accurate protein subcellular localization predictions can be gained from the consensus of several methods. All leptospiral protein queries were analyzed using three subcellular localization prediction tools for Gram-negative bacteria, namely PSORTb, Proteome Analyst (PA), and ProtCompB, for cytoplasmic (CP), cytoplasmic membrane (CM), periplasmic (PP), outer membrane (OM) and extracellular (EX) proteins. Note that the version of ProtCompB used here does not distinguish CM from OM, so both are predicted as membrane proteins. The consensus prediction for each sequence was calculated using a simple majority vote type procedure. If all 3 methods agreed on a localization, it was assigned as a "consensus vote". The remaining results (1 or 2 out of 3 predictions) were assigned as "non-consensus vote". The CP and CM proteins assigned in this step were used as the training set for the development of the LDA-based classifier for CP and CM in the next step.
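The voting rule of Step 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the `"MEMBRANE"` label, and in particular the rule that a ProtCompB membrane call counts as a vote for CM or OM whenever another tool names one of them are assumptions made here, since the text does not state how the merged label was matched.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# Assumed label for ProtCompB's joint CM/OM call (hypothetical name).
MEMBRANE = "MEMBRANE"

def tally_votes(psortb: Optional[str], pa: Optional[str],
                protcompb: Optional[str]) -> Counter:
    """Count votes per localization; None means the tool made no prediction."""
    votes = Counter(loc for loc in (psortb, pa) if loc is not None)
    if protcompb == MEMBRANE:
        # Assumption: credit the membrane call to CM and/or OM only if
        # another tool already named that compartment.
        credited = False
        for loc in ("CM", "OM"):
            if votes[loc] > 0:
                votes[loc] += 1
                credited = True
        if not credited:
            votes[MEMBRANE] += 1
    elif protcompb is not None:
        votes[protcompb] += 1
    return votes

def classify(psortb: Optional[str], pa: Optional[str],
             protcompb: Optional[str]) -> Tuple[str, Dict[str, int]]:
    """Return ('consensus' | 'non-consensus' | 'unknown', votes per localization)."""
    votes = tally_votes(psortb, pa, protcompb)
    if not votes:
        return "unknown", {}
    if max(votes.values()) == 3:
        return "consensus", dict(votes)
    return "non-consensus", dict(votes)

print(classify("CP", "CP", "CP"))        # all three tools agree
print(classify("CM", "CM", MEMBRANE))    # membrane call credited to CM
print(classify("OM", None, "EX"))        # two competing 1-of-3 candidates
```

Returning the full vote tally for non-consensus cases keeps every 1-of-3 and 2-of-3 candidate available for the homology and fold-recognition checks of Step 2.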

Step 2. Homology-based and protein folding recognition prediction

Homology-based and structural information can also be used to infer the potential localization site of query proteins [22,45,46]. Therefore, the remaining query proteins assigned as non-consensus vote results for PP, OM and EX were further analyzed for sequence and structure homology. Since subcellular localization is an evolutionarily conserved trait, a localization was assigned if the query protein was homologous to a known protein with the same localization. The query sequences were compared with proteins in the DBSubloc database using BLAST at an E-value ≤ 10⁻³. Structure annotation of these queries was also performed using GTD prediction. Query sequences were assigned to structures (shown as PDB codes) when the prediction confidence was "certain" or "high". In this study, structure annotations in the p-value confidence ranges certain (0 ≤ p < 0.01%) and high (0.01% ≤ p < 0.1%) were considered statistically significant.
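The two significance thresholds of Step 2 can be written as a small Python check. The cutoffs (BLAST E-value ≤ 10⁻³; GTD p-value below 0.01% for "certain" and below 0.1% for "high") come from the text. Everything else is an assumption for illustration: the function and parameter names are hypothetical, and the idea that each BLAST hit or matched structure carries a single associated localization (`blast_loc`, `gtd_loc`) simplifies a step the text does not describe in detail.

```python
from typing import Optional

BLAST_EVALUE_CUTOFF = 1e-3  # E-value <= 10^-3 against DBSubloc

def gtd_confidence(p_value_percent: float) -> Optional[str]:
    """Map a GTD p-value (given in percent) to the confidence classes in the text."""
    if 0.0 <= p_value_percent < 0.01:
        return "certain"
    if 0.01 <= p_value_percent < 0.1:
        return "high"
    return None  # not treated as significant in this study

def is_supported(candidate: str,
                 blast_loc: Optional[str] = None,
                 blast_evalue: Optional[float] = None,
                 gtd_loc: Optional[str] = None,
                 gtd_p_percent: Optional[float] = None) -> bool:
    """True if a candidate localization has significant DBSubloc and/or GTD support."""
    blast_ok = (blast_loc == candidate
                and blast_evalue is not None
                and blast_evalue <= BLAST_EVALUE_CUTOFF)
    gtd_ok = (gtd_loc == candidate
              and gtd_p_percent is not None
              and gtd_confidence(gtd_p_percent) is not None)
    return blast_ok or gtd_ok

# Hypothetical example: an OM candidate backed only by a fold-recognition match.
print(is_supported("OM", gtd_loc="OM", gtd_p_percent=0.05))  # True
```

The `or` in the last line of `is_supported` mirrors the "DBSubloc and/or GTD" wording of the table captions: either source of evidence is enough on its own.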

Step 3. Identification of putative CP and CM using the LDA based classifier

The putative CP and CM proteins identified as non-consensus vote results were further analyzed with SignalP and TMHMM. The feature attributes derived from the SignalP and TMHMM predictions were then integrated and analyzed using the LDA-based classifier. Proteins classified as CP or CM with probability ≥ 0.9 were taken as significant. The remaining queries that could not be identified in this step were classified as "unknown".
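The decision rule of Step 3 amounts to thresholding the classifier's class probabilities. A minimal Python sketch, assuming the classifier returns one probability per class in a dictionary (the function name and data layout are hypothetical):

```python
from typing import Dict

PROBABILITY_CUTOFF = 0.9  # threshold stated in the text

def assign_cp_or_cm(posteriors: Dict[str, float],
                    cutoff: float = PROBABILITY_CUTOFF) -> str:
    """Return 'CP' or 'CM' if its probability reaches the cutoff, else 'unknown'."""
    label, prob = max(posteriors.items(), key=lambda item: item[1])
    return label if prob >= cutoff else "unknown"

print(assign_cp_or_cm({"CP": 0.95, "CM": 0.05}))  # CP
print(assign_cp_or_cm({"CP": 0.60, "CM": 0.40}))  # unknown
```

A high cutoff trades coverage for confidence: proteins whose features fall between the two classes stay in the "unknown" set rather than being forced into CP or CM.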

LDA based Classifier for CP and CM

To increase the accuracy of prediction, we developed a specific classifier using a training set derived from the consensus vote prediction of leptospiral CP and CM proteins. The classifier was built on an LDA algorithm that analyzes multiple feature vectors taken from the SignalP-NN, SignalP-HMM and TMHMM prediction results for the training sequences. The accuracy of the LDA-based classifier was assessed using leave-one-out cross-validation. We used experimentally determined or known CP and CM proteins of Gram-negative bacteria, previously used in the evaluation of PSORTb, as a test data set to validate the performance of the LDA-based classifier [27]. Overall, the LDA-based classifier achieved an accuracy of 94.96%.
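To make the classifier description concrete, the following Python sketch fits a two-class LDA model (shared covariance, Gaussian class densities) and estimates its accuracy by leave-one-out cross-validation. It is an illustration under stated assumptions, not the authors' classifier: it uses only two hypothetical features per protein (a signal peptide probability and a predicted transmembrane helix count), whereas the actual feature vectors are not specified in detail here, and the toy training data are invented.

```python
import math
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # ((feature_1, feature_2), "CP" or "CM")

def fit_lda(samples: List[Sample]):
    """Estimate class means, priors and the inverse pooled 2x2 covariance."""
    labels = sorted({label for _, label in samples})
    means: Dict[str, List[float]] = {}
    priors: Dict[str, float] = {}
    for label in labels:
        rows = [x for x, lab in samples if lab == label]
        means[label] = [sum(r[j] for r in rows) / len(rows) for j in range(2)]
        priors[label] = len(rows) / len(samples)
    # Pooled within-class scatter, shared by both classes.
    scatter = [[0.0, 0.0], [0.0, 0.0]]
    for x, label in samples:
        d = [x[j] - means[label][j] for j in range(2)]
        for i in range(2):
            for j in range(2):
                scatter[i][j] += d[i] * d[j]
    dof = len(samples) - len(labels)
    cov = [[v / dof for v in row] for row in scatter]
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    inv = [[cov[1][1] / det, -cov[0][1] / det],
           [-cov[1][0] / det, cov[0][0] / det]]
    return means, priors, inv

def posteriors(model, x: Sequence[float]) -> Dict[str, float]:
    """Class probabilities from the linear discriminant scores."""
    means, priors, inv = model
    scores = {}
    for label, mu in means.items():
        w = [inv[i][0] * mu[0] + inv[i][1] * mu[1] for i in range(2)]
        scores[label] = (x[0] * w[0] + x[1] * w[1]
                         - 0.5 * (mu[0] * w[0] + mu[1] * w[1])
                         + math.log(priors[label]))
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(weights.values())
    return {label: v / total for label, v in weights.items()}

def leave_one_out_accuracy(samples: List[Sample]) -> float:
    """Refit without each sample in turn and test on the held-out sample."""
    correct = 0
    for i, (x, label) in enumerate(samples):
        model = fit_lda(samples[:i] + samples[i + 1:])
        post = posteriors(model, x)
        if max(post, key=post.get) == label:
            correct += 1
    return correct / len(samples)

# Invented toy data: (signal peptide probability, number of predicted TM helices).
TOY = [((0.05, 0), "CP"), ((0.10, 0), "CP"), ((0.02, 1), "CP"),
       ((0.08, 0), "CP"), ((0.12, 1), "CP"),
       ((0.30, 4), "CM"), ((0.20, 6), "CM"), ((0.40, 5), "CM"),
       ((0.25, 7), "CM"), ((0.35, 3), "CM")]

model = fit_lda(TOY)
loo_accuracy = leave_one_out_accuracy(TOY)
print(posteriors(model, (0.05, 0)), loo_accuracy)
```

The posterior probabilities returned by `posteriors` are the quantities to which the ≥ 0.9 cutoff of Step 3 would be applied.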

Authors' contributions

WV and SI participated in designing the research project. SI and EP carried out the computational analysis and developed the LDA-based classifier. WV analyzed and interpreted the results, and drafted and produced the manuscript. PP provided further insights for refining the manuscript. All authors read and approved the final manuscript.

Additional file 1

Putative PP proteins in L. interrogans. This table lists the Lai locus and protein annotation of (A) 17 predicted PP proteins derived from the consensus vote prediction, (B) 20 predicted PP proteins derived from 2 out of 3 predictions with significant DBSubloc and/or GTD predictions, and (C) 38 predicted PP proteins derived from 1 out of 3 predictions with significant DBSubloc and/or GTD predictions.

Additional file 2

Putative CP proteins predicted by the consensus vote prediction in L. interrogans. This table lists the Lai locus and protein annotation of 418 predicted CP proteins derived from the consensus vote and used as the training set for the development of the LDA-based classifier.

Additional file 3

Putative CM proteins predicted by the consensus vote prediction in L. interrogans. This table lists the Lai locus and protein annotation of 332 predicted CM proteins derived from the consensus vote and used as the training set for the development of the LDA-based classifier.

Additional file 4

Putative CP proteins of L. interrogans predicted by the LDA-based classifier. This table lists the Lai locus and protein annotation of 2272 CP proteins predicted by the LDA-based classifier.

Additional file 5

Putative CM proteins of L. interrogans predicted by the LDA-based classifier. This table lists the Lai locus and protein annotation of 481 CM proteins predicted by the LDA-based classifier.

Additional file 6

Subcellular localizations of 28 experimentally studied OM and EX proteins of L. interrogans. This table lists the protein name, L. interrogans serovar Lai and Copenhageni locus, experimental localization, and subcellular localization predicted by PSORTb, ProtCompB, PA, and the combination prediction for 28 experimentally studied OM and EX proteins.

Additional file 7

Subcellular localization of putative lipoproteins using the combination method. This table lists the Lai locus tag and protein annotation of 125 probable lipoproteins and 21 possible lipoproteins predicted by the SpLip program [19] and the subcellular localization of these lipoproteins predicted by the combination method.

Additional file 8

Subcellular localization of vaccine candidates using the combination method. This table lists the Lai locus tag and protein annotation of 226 vaccine candidates predicted by Yang et al. [20] and the subcellular localization of these vaccine candidates predicted by the combination method.
  59 in total

Review 1.  TPR proteins: the versatile helix.

Authors:  Luca D D'Andrea; Lynne Regan
Journal:  Trends Biochem Sci       Date:  2003-12       Impact factor: 13.807

Review 2.  Leptospirosis: a zoonotic disease of global importance.

Authors:  Ajay R Bharti; Jarlath E Nally; Jessica N Ricaldi; Michael A Matthias; Monica M Diaz; Michael A Lovett; Paul N Levett; Robert H Gilman; Michael R Willig; Eduardo Gotuzzo; Joseph M Vinetz
Journal:  Lancet Infect Dis       Date:  2003-12       Impact factor: 25.071

3.  Better prediction of sub-cellular localization by combining evolutionary and structural information.

Authors:  Rajesh Nair; Burkhard Rost
Journal:  Proteins       Date:  2003-12-01

4.  DBSubLoc: database of protein subcellular localization.

Authors:  Tao Guo; Sujun Hua; Xinglai Ji; Zhirong Sun
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

5.  Molecular cloning and characterization of a novel leptospiral lipoprotein with OmpA domain.

Authors:  Nobuo Koizumi; Haruo Watanabe
Journal:  FEMS Microbiol Lett       Date:  2003-09-26       Impact factor: 2.742

6.  Unique physiological and pathogenic features of Leptospira interrogans revealed by whole-genome sequencing.

Authors:  Shuang-Xi Ren; Gang Fu; Xiu-Gao Jiang; Rong Zeng; You-Gang Miao; Hai Xu; Yi-Xuan Zhang; Hui Xiong; Gang Lu; Ling-Feng Lu; Hong-Quan Jiang; Jia Jia; Yue-Feng Tu; Ju-Xing Jiang; Wen-Yi Gu; Yue-Qing Zhang; Zhen Cai; Hai-Hui Sheng; Hai-Feng Yin; Yi Zhang; Gen-Feng Zhu; Ma Wan; Hong-Lei Huang; Zhen Qian; Sheng-Yue Wang; Wei Ma; Zhi-Jian Yao; Yan Shen; Bo-Qin Qiang; Qi-Chang Xia; Xiao-Kui Guo; Antoine Danchin; Isabelle Saint Girons; Ronald L Somerville; Yu-Mei Wen; Man-Hua Shi; Zhu Chen; Jian-Guo Xu; Guo-Ping Zhao
Journal:  Nature       Date:  2003-04-24       Impact factor: 49.962

7.  Pathogenic Leptospira species express surface-exposed proteins belonging to the bacterial immunoglobulin superfamily.

Authors:  James Matsunaga; Michele A Barocchi; Julio Croda; Tracy A Young; Yolanda Sanchez; Isadora Siqueira; Carole A Bolin; Mitermayer G Reis; Lee W Riley; David A Haake; Albert I Ko
Journal:  Mol Microbiol       Date:  2003-08       Impact factor: 3.501

8.  PSORT-B: Improving protein subcellular localization prediction for Gram-negative bacteria.

Authors:  Jennifer L Gardy; Cory Spencer; Ke Wang; Martin Ester; Gábor E Tusnády; István Simon; Sujun Hua; Katalin deFays; Christophe Lambert; Kenta Nakai; Fiona S L Brinkman
Journal:  Nucleic Acids Res       Date:  2003-07-01       Impact factor: 16.971

9.  The crystal structure of polygalacturonase-inhibiting protein (PGIP), a leucine-rich repeat protein involved in plant defense.

Authors:  A Di Matteo; L Federici; B Mattei; G Salvi; K A Johnson; C Savino; G De Lorenzo; D Tsernoglou; F Cervone
Journal:  Proc Natl Acad Sci U S A       Date:  2003-08-06       Impact factor: 11.205

10.  The Genomic Threading Database: a comprehensive resource for structural annotations of the genomes from key organisms.

Authors:  Liam J McGuffin; Stefano A Street; Kevin Bryson; Søren-Aksel Sørensen; David T Jones
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971
