
LIPOPREDICT: bacterial lipoprotein prediction server.

S Ramya Kumari, Kiran Kadam, Ritesh Badwaik, Valadi K Jayaraman.

Abstract

Bacterial lipoproteins have many important functions owing to their essential nature and their roles in pathogenesis, and they represent a class of possible vaccine candidates. Predicting bacterial lipoproteins from sequence is therefore an important task for computational vaccinology. We have developed LIPOPREDICT, a Support Vector Machine (SVM)-based module for predicting bacterial lipoproteins. The best-performing sequence model was generated using selected dipeptide composition features and gave a prediction accuracy of 97%. The results compared very well with those of previously developed methods.


Keywords:  Bacterial lipoproteins; Support Vector Machine (SVM); compositional features; prediction server

Year:  2012        PMID: 22570522      PMCID: PMC3346017          DOI: 10.6026/97320630008394

Source DB:  PubMed          Journal:  Bioinformation        ISSN: 0973-2063


Background

Bacterial lipoproteins have many important functions owing to their essential nature and their roles in pathogenesis, and they represent a class of possible vaccine candidates. They are a functionally diverse class of membrane-anchored proteins that typically make up approximately 2% of the bacterial proteome [1]. This large group of proteins performs many different functions, including promoting antibiotic resistance, cell signaling, substrate binding in ABC transport systems, protein export, sporulation, germination and bacterial conjugation, and many lipoproteins have yet to be assigned a function [2]. Lipoproteins are required for virulence in many bacteria. They play a variety of roles in host-pathogen interaction, ranging from surface adhesion and initiation of inflammatory processes to translocation of virulence factors into the host cytoplasm [3]. Several methods have been devised to predict bacterial lipoproteins, using different approaches and data sets. Work on identifying Gram-positive bacterial lipoproteins has resulted in various servers and databases, such as DOLOP [4], LIPO [5], PSORT [6] and ScanProsite [7]. LipoP [8] and Phobius [9] use hidden Markov models, LipPred [10] uses a naive Bayesian network, and SPEPLip [11] uses a neural network. For the identification of Gram-negative bacterial lipoproteins, LipoP [8] uses a pattern-matching methodology. In this work, we present an SVM-based method that uses amino acid compositional features to identify bacterial lipoproteins.

Methodology

Bacterial Lipoprotein Dataset:

The bacterial lipoprotein dataset consists of 222 experimentally annotated sequences, derived from the distinct bacterial lipoproteins available in the DOLOP [4] database.

Bacterial Non-Lipoprotein Dataset:

The non-lipoprotein dataset was constructed from 222 bacterial non-lipoprotein sequences obtained from databases such as NCBI [12] and UniProt [13]. Both datasets were compiled after redundancy reduction with CD-HIT. The program CD-HIT (Cluster Database at High Identity with Tolerance) [14] [15] removes homologous sequences by clustering a protein dataset at a user-defined sequence identity threshold. We ran CD-HIT several times at successively lower thresholds (for example, 90%, then 60%, then 50%) to generate more stringently non-redundant datasets.
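The iterative redundancy reduction can be sketched as a chain of `cd-hit` runs in which each run clusters the representative sequences kept by the previous one. This is a minimal sketch, not the authors' script: the input file name, the `nr_` output prefix and the helper function names are hypothetical, and the word sizes (`-n`) follow the values the CD-HIT user guide recommends for each protein identity threshold. By default the commands are only printed; passing `dry_run=False` would execute them with a local CD-HIT installation.

```python
import subprocess


def word_size(threshold: float) -> int:
    """Word size (-n) recommended in the CD-HIT user guide for a protein identity threshold."""
    if threshold >= 0.7:
        return 5
    if threshold >= 0.6:
        return 4
    if threshold >= 0.5:
        return 3
    if threshold >= 0.4:
        return 2
    raise ValueError("cd-hit protein clustering needs an identity threshold >= 0.4")


def cdhit_commands(fasta_in: str, thresholds=(0.9, 0.6, 0.5), prefix: str = "nr") -> list:
    """Build one cd-hit command per threshold; each run reads the previous run's output FASTA."""
    commands = []
    current = fasta_in
    for t in thresholds:
        out = f"{prefix}_{round(t * 100)}.fasta"  # hypothetical naming scheme
        commands.append(["cd-hit", "-i", current, "-o", out,
                         "-c", str(t), "-n", str(word_size(t))])
        current = out  # representatives of this run become the next run's input
    return commands


def run_all(commands, dry_run: bool = True) -> None:
    """Print the commands, or run them in order (requires cd-hit on the PATH)."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(cdhit_commands("lipoproteins.fasta"))
```

Feeding each run's representatives into the next run keeps every step on the CD-HIT-recommended word size for its threshold, instead of clustering the full set directly at 50%.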

Server Implementation:

The prediction method described in this paper is implemented as a web server, LIPOPREDICT: Bacterial Lipoprotein prediction server (Figure 1). Bacterial lipoproteins are a diverse and functionally important group of proteins that are amenable to bioinformatic analysis because of the distinctive features of their signal peptides. They are characterized by an N-terminal signal peptide followed by a specific cysteine residue [16]. The lipidation motif, represented in PROSITE (PS00013) by the pattern {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C, is present in both Gram-positive and Gram-negative bacteria. Signal peptides of bacterial lipoproteins have many distinctive physicochemical features, along with the presence or absence of specific amino acids [17]. G. von Heijne [18] showed that there are considerable structural and compositional differences between the signal peptides of bacterial lipoproteins and those of bacterial non-lipoproteins. The two classes differ in physicochemical properties such as charge, hydrophobicity, secondary structure propensity and amino acid size. All of these differences between lipoprotein and non-lipoprotein signal peptides can be attributed to the amino acid composition of the signal peptide. Hence compositional features such as amino acid and dipeptide composition can be used to discriminate between the two kinds of signal peptides, and therefore between bacterial lipoproteins and non-lipoproteins. In our work, we analyzed the amino acid sequences of the signal peptides of bacterial lipoproteins and non-lipoproteins. The average signal peptide length is 24 amino acids in Gram-negative bacteria and 32 amino acids in Gram-positive bacteria [19]. To allow for some variation in length, we selected the first 35 residues of each sequence for the analysis.
Figure 1

Snapshot of the index page of LIPOPREDICT server.
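The N-terminal compositional features described under Server Implementation can be sketched as follows. This is an illustration under stated assumptions, not the LIPOPREDICT code: sequences are truncated to their first 35 residues, amino acid composition is taken as the fraction of each of the 20 standard residues, dipeptide composition as the fraction of each of the 400 ordered residue pairs among overlapping pairs, and non-standard residues are skipped (the paper does not say how it normalizes features or handles such residues). The `LIPOBOX` regular expression is a hand translation of the PS00013 lipidation motif, reading `{DERK}` in PROSITE syntax as "any residue except D, E, R or K"; the current PROSITE entry should be checked before relying on it. All names are hypothetical.

```python
import re
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs
N_TERMINAL_LENGTH = 35  # first 35 residues, as in the text

# Assumed translation of PS00013: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")


def n_terminal(seq: str, length: int = N_TERMINAL_LENGTH) -> str:
    """Upper-case the sequence and keep only its N-terminal segment."""
    return seq.strip().upper()[:length]


def aa_composition(seq: str) -> list:
    """Fraction of each standard amino acid (non-standard residues are ignored)."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for residue in seq:
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> list:
    """Fraction of each of the 400 dipeptides among overlapping residue pairs."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs containing non-standard residues are skipped
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def has_lipobox(seq: str) -> bool:
    """True if the assumed lipobox pattern occurs within the N-terminal segment."""
    return LIPOBOX.search(n_terminal(seq)) is not None


def features(seq: str) -> list:
    """20 amino acid + 400 dipeptide composition values for the N-terminal segment."""
    head = n_terminal(seq)
    return aa_composition(head) + dipeptide_composition(head)
```

For example, `features(seq)` returns a 420-value vector for any protein; a feature-selection step would then keep only a subset of the 400 dipeptide columns.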

Server description

Prediction Input Interface:

To make a prediction, users click the prediction icon, which opens the prediction interface with two input options: the user can either type or paste sequences (Figure 2) or upload a file (Figure 3). Sequences must be in FASTA format. On submission, the input is validated, and any errors are reported so that the user can correct them.
Figure 2

Snapshot of the query prediction page: Type or Paste Sequence.

Figure 3

Snapshot of the query prediction page: Upload File.
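The input validation described above can be sketched as a small FASTA parser that collects errors instead of failing on the first one. The rules are assumptions, since the paper does not list the server's checks: each record must begin with a `>` header line, sequences may span several lines, and only the 20 standard one-letter amino acid codes are accepted. The function name and error messages are hypothetical.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # assumed policy: 20 standard residues only


def parse_and_validate_fasta(text: str):
    """Return (records, errors); records is a list of (header, sequence) tuples."""
    records, errors = [], []
    header, chunks = None, []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            errors.append(f"line {line_no}: sequence data before the first '>' header")
        else:
            chunks.append(line.upper())  # multi-line sequences are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))

    for name, seq in records:
        if not seq:
            errors.append(f"{name}: empty sequence")
        bad = sorted(set(seq) - VALID_RESIDUES)
        if bad:
            errors.append(f"{name}: invalid characters {''.join(bad)}")
    if not records and not errors:
        errors.append("no FASTA records found")
    return records, errors
```

Reporting every problem at once lets a user fix a multi-sequence submission in a single pass rather than resubmitting after each error.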

Prediction Output Interface:

When the user clicks the Run Prediction button, the results page is displayed. Predictions are generated by the best-performing model, built on the selected dipeptide composition features, which had the highest cross-validation accuracy. The SVM computes probability estimates, and the results page reports, for each sequence, the predicted class together with the probability that the sequence belongs to that class. The results page also offers an option to download the prediction results for further use.
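SVM probability estimates of this kind are commonly obtained with Platt scaling, which maps the SVM decision value f to a probability through a fitted sigmoid, P(lipoprotein | f) = 1 / (1 + exp(A*f + B)); LIBSVM uses this approach for binary problems, fitting A and B on cross-validated decision values. The paper does not state which implementation the server uses, so the sketch below is an assumption: `A=-2.0` and `B=0.0` are placeholder values rather than fitted parameters, and the tab-separated row is a hypothetical layout for a downloadable result file.

```python
import math


def platt_probability(decision_value: float, A: float, B: float) -> float:
    """Platt sigmoid: P(positive | f) = 1 / (1 + exp(A*f + B)), evaluated without overflow."""
    z = A * decision_value + B
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def result_row(seq_id: str, decision_value: float, A: float = -2.0, B: float = 0.0) -> str:
    """Predicted class plus the probability of that class, as one tab-separated line."""
    p_lipo = platt_probability(decision_value, A, B)
    if p_lipo >= 0.5:
        label, prob = "Lipoprotein", p_lipo
    else:
        label, prob = "Non-lipoprotein", 1.0 - p_lipo
    return f"{seq_id}\t{label}\t{prob:.3f}"


if __name__ == "__main__":
    # made-up decision values, for illustration only
    for seq_id, f in [("seq1", 1.0), ("seq2", -1.0)]:
        print(result_row(seq_id, f))
```

With a negative A, larger positive decision values map to probabilities closer to 1, so the reported probability grows with the SVM's margin for the predicted class.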

Discussion

SVM kernel types and kernel parameters were tuned using 10-fold cross-validation accuracy as the performance measure. The results are tabulated in Table 1 (see supplementary material). With feature selection, the selected dipeptide composition features gave the highest accuracy for bacterial lipoprotein prediction, 97%. We applied information gain feature selection in the WEKA software [20] to extract subsets of informative features. Because feature selection increased the maximum accuracy obtained with dipeptide composition, we used the 67 selected dipeptide composition features as the SVM input for building the model.
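The two ingredients of this model-selection step, information gain ranking and 10-fold cross-validation, can be sketched in plain Python. This is an illustration, not a reimplementation of WEKA: WEKA's InfoGainAttributeEval discretizes numeric attributes before computing the gain (by default with a supervised MDL-based method), whereas this sketch assumes a simpler presence/absence discretization in which a dipeptide counts as present when its composition value is above zero. The helper names and the round-robin fold assignment are choices made for the example; only the cut-off of 67 features comes from the text.

```python
import math
from collections import Counter


def entropy(labels) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels) -> float:
    """IG(class; feature) = H(class) - sum_v P(v) * H(class | feature = v), discrete feature."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def select_top_k(X, labels, k=67) -> list:
    """Rank columns of X (a list of feature rows) by IG of presence/absence; return top-k indices."""
    scores = []
    for j in range(len(X[0])):
        present = [row[j] > 0 for row in X]
        scores.append((information_gain(present, labels), j))
    scores.sort(key=lambda s: (-s[0], s[1]))  # highest gain first, ties broken by column index
    return [j for _, j in scores[:k]]


def stratified_folds(labels, k=10) -> list:
    """Deal sample indices into k folds class by class, so each fold keeps the class ratio.

    In practice the samples should be shuffled first; this sketch keeps input order.
    """
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    slot = 0
    for y in sorted(by_class):
        for i in by_class[y]:
            folds[slot % k].append(i)
            slot += 1
    return folds
```

With 222 lipoproteins and 222 non-lipoproteins, `stratified_folds` yields ten folds of 44 or 45 sequences, each holding 22 or 23 sequences of each class; each fold serves once as the test set while the SVM is trained on the other nine.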

Conclusions

In this study we have presented a prediction server, LIPOPREDICT, for the identification and classification of bacterial lipoproteins. The server employs Support Vector Machines, a supervised learning model rigorously grounded in statistical learning theory. For bacterial lipoprotein prediction, selecting the most informative dipeptide composition features further improved model accuracy. Our results indicate that this SVM model can be used for accurate prediction of bacterial lipoproteins. Because the prediction models report probability estimates in the output, users can assess the confidence of each SVM prediction. Further, our user-friendly web server can readily be used to annotate novel proteins.
References

W Li, L Jaroszewski, A Godzik. Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics, 2001.

W Li, L Jaroszewski, A Godzik. Tolerating some redundancy significantly speeds up clustering of large protein databases. Bioinformatics, 2002.

I C Sutcliffe, D J Harrington. Pattern searches for the identification of putative lipoprotein genes in Gram-positive bacterial genomes. Microbiology, 2002.

L Käll, A Krogh, E L L Sonnhammer. A combined transmembrane topology and signal peptide prediction method. J Mol Biol, 2004.

J D Bendtsen, H Nielsen, G von Heijne, S Brunak. Improved prediction of signal peptides: SignalP 3.0. J Mol Biol, 2004.

A Kovacs-Simon, R W Titball, S L Michell. Lipoproteins of bacterial pathogens. Infect Immun, 2010.

M I Hutchings, T Palmer, D J Harrington, I C Sutcliffe. Lipoprotein biogenesis in Gram-positive bacteria: knowing when to hold 'em, knowing when to fold 'em. Trends Microbiol, 2008.

P Klein, R L Somorjai, P C Lau. Distinctive properties of signal sequences from bacterial lipoproteins. Protein Eng, 1988.

S Hayashi, H C Wu. Lipoproteins in bacteria. J Bioenerg Biomembr, 1990.

F S Berven, O A Karlsen, A H Straume, K Flikka, J C Murrell, A Fjellbirkeland, J R Lillehaug, I Eidhammer, H B Jensen. Analysing the outer membrane subproteome of Methylococcus capsulatus (Bath) using proteomics and novel biocomputing tools. Arch Microbiol, 2005.
