Literature DB >> 18795116

SOSUI-GramN: high performance prediction for sub-cellular localization of proteins in gram-negative bacteria.

Kenichiro Imai1, Naoyuki Asakawa, Toshiyuki Tsuji, Fumitsugu Akazawa, Ayano Ino, Masashi Sonoyama, Shigeki Mitaku.   

Abstract

A predictive software system, SOSUI-GramN, was developed for assessing the subcellular localization of proteins in Gram-negative bacteria. The system does not require the sequence homology data of any known sequences; instead, it uses only physicochemical parameters of the N- and C-terminal signal sequences, and the total sequence. The precision of the prediction system for subcellular localization to extracellular, outer membrane, periplasm, inner membrane and cytoplasmic medium was 92.3%, 89.4%, 86.4%, 97.5% and 93.5%, respectively, with corresponding recall rates of 70.3%, 87.5%, 76.0%, 97.5% and 88.4%, respectively. The overall performance for precision and recall obtained using this method was 92.9% and 86.7%, respectively. The comparison of performance of SOSUI-GramN with that of other methods showed the performance of prediction for extracellular proteins, as well as inner and outer membrane proteins, was either superior or equivalent to that obtained with other systems. SOSUI-GramN particularly improved the accuracy for predictions of extracellular proteins which is an area of weakness common to the other methods.

Entities:  

Keywords:  Gram-negative bacteria; amino acids; physicochemical parameters; subcellular localization of proteins

Year:  2008        PMID: 18795116      PMCID: PMC2533062          DOI: 10.6026/97320630002417

Source DB:  PubMed          Journal:  Bioinformation        ISSN: 0973-2063


Background

Subcellular localization is one of the most important characteristics of proteins and is central to understanding their function and the constitution of biological systems. In bacteria, information related to the subcellular location of pathogen proteins can facilitate the development of drugs and vaccines for treatment. We previously developed a system called SOSUI for predicting inner membrane proteins from amino acid sequences [1,2]. Because of its high accuracy, this system can be used to improve the performance of the prediction of subcellular localization. An advantage of SOSUI is that all of the parameters used for the prediction [1,2] are the physicochemical properties of amino acids. By adopting a similar approach, we developed in this work a novel method for predicting subcellular localization in Gram-negative bacteria for proteins in the extracellular, the outer membrane, the periplasm, and the medium cytoplasm, which are less hydrophobic than the proteins in the inner membrane. The physicochemical properties of the N- and C-terminal signal regions, as well as a whole sequences were also used for predicting protein localization. By combining this method with the SOSUI membrane prediction system, we constructed a unified system, referred to as SOSUI-GramN, for the complete characterization of subcellular protein localization in the five compartments of Gram-negative bacteria. Several systems for predicting the subcellular localization of proteins in bacterial cells have already been developed: PSORTb, CELLO, PSLpred and P-CLASSIFIER [3-6]. The common weak point of those systems is precision and recall for the prediction of proteins localized in the extracellular medium are low [7]. The reasons for the low predictive performance associated with extracellular proteins may be due to the complex secretory pathways found in Gram-negative bacteria; a minimum of seven mechanisms have been reported to date [8,9]. In this work, we have assumed that the pathways through the cytoplasmic membrane can be categorized into two main groups: Sec dependent pathways and Sec independent pathways. Therefore, we classified proteins of the same localization into two main groups based on the physicochemical properties of their N-terminal signal regions prior to discrimination of the final localization of proteins. In addition, proteins with the same subcellular localization in each of the main groups were classified into several subclasses based on the physicochemical parameters of their N- and C-terminal signal sequences. Here we describe the development of a novel software system, SOSUI-GramN, which was achieved by combining the SOSUI system used for the prediction of inner membrane proteins with a new protein prediction system capable of discriminating among four different localizations of less hydrophobic proteins in Gram-negative bacteria, leading to an overall precision of 92.9%. Precision and recall obtained using for extracellular proteins SOSUI-GramN were markedly higher than the currently available systems, i.e. PSORTb, CELLO, PSLpred and P-CLASSIFIER etc.

Methodology

The dataset used for developing the novel system was obtained from ePSORTdb [10] and Swiss-Prot (release 54.4) by omitting data with sequence identities exceeding 50%. The dataset contained 1795 proteins: 733 in the cytoplasm, 396 in the inner membrane, 248 in the periplasm, 240 in the outer membrane and 178 in the extracellular medium. The dataset was randomly partitioned into five sets, four of which were used for training and the remaining data were used to evaluate the system. Furthermore, an additional test set of 299 proteins was created in order to be used for the comparison of different prediction systems [7] because a number of proteins of our test data were included in training data of other Method. There is no sequence, which is identical to training and test data, in the data of 299, and the sequence identity between this calibration data, the training data, and the test data was less than at most 50%. The prediction procedure of the SOSUI-GramN system consists of three layers of filters may reflects several pathways for the same subcellular localization site as shown in Figure 1. The first layer is the filter, consisting of SOSUI [1,2], required for distinguishing inner membrane proteins from the other classes of proteins. Some physiochemical parameter such as the distribution of hydrophobicty and amphiphilicity index [11,12] of amino acids around transmembrane regions and the size of protein, were combined in SOSUI to discriminate an inner membrane protein from other types of proteins. SOSUI achieved 98.0% accuracy of prediction for inner (or cytoplasmic) membrane protein of Prokayote. Because of its high accuracy, this system can distinctly differentiate inner membrane proteins from other proteins.
Figure 1

A simplified SOSUI-GramN flowchart.

Using the physicochemical properties of N-terminal signal sequence region and whole sequence, the second filter then classifies all of the proteins that are not inner membrane proteins into two groups depending on whether they are involved in the Sec dependent or independent pathways. The proteins of the Sec dependent pathway generally have the signal sequence, consisting of the positively charged region, hydrophobic regions, and polar region in N-terminal and these are not found in the proteins involved in the Sec independent pathway. The physicochemical parameters, required for the second filter, are hydrophobicity index [11], the densities of positively and negatively charged residues, small polar and non-polar residues, aromatic residues, secondary structure-breaking residues such as proline and glycine, the helical periodicity of 3.6 residues, and the alternate periodicity of the two residues of the hydrophobicity index. We calculated the average of these parameters of the region of 60 residues around N-terminal sorting signal and total sequence, and then distinguished Sec dependent proteins from Sec independent proteins by the canonical discriminant analysis with using the average of these parameters of each target region. The fundamental process required for the discrimination is described below. (i) The target region was divided into ten segments, except for total sequence, and then an average of parameter for segment was calculated by equation (1) (see supplementary material). When the target region is total sequence, the region is defined as one segment. (ii) To obtain the normalized p-th parameter of the target region j, the average values is normalized by the difference between positive data and negative data by equation (2) (see supplementary material). Finally, we discriminate positive data from negative data by the canonical discriminant analysis with using normalized parameters. The third filter consists of several prediction modules based on the subclassification of the extracellular proteins, the outer membrane proteins, the periplasmic proteins and the cytoplasmic proteins. Assuming that there are several pathways for the same subcellular localization site, we classified proteins into several subclasses by the canonical discriminant analysis based on the average of physicochemical parameters of the 60 residues of the most of N- and C-terminal, which may include sorting signals, prior to discrimination analysis. We then distinguished a subclass from the other subclasses by the same discriminant analysis with the average of physicochemical parameters of 60, 100 and 200 residues of N- and C-terminal residues, as well as the entire sequence. Consequently, the assembly of the prediction modules required for the third layer are relatively complicated, but basically, each prediction module correspond to the discrimination of each subclass from the other subclasses and the discrimination method is the same as the method is used in second filter. The parameters, for these subclassification and prediction modules, are also identical to that for the discrimination in the second filter. The number of prediction modules was five for the prediction of the extracellular proteins, six for the outer membrane proteins and three for the periplasmic proteins, one for the cytoplasmic proteins as shown in Figure 1.

Discussion

When the performance of the SOSUI-GramN prediction system for characterizing subcellular localization to the compartments of the extracellular, the outer membrane, the periplasm, the inner membrane, and the cytoplasm medium was evaluated using test data, precision estimates of 92.3%, 89.4%, 86.4%, 97.5% and 93.5%, and corresponding recall values of 70.3%, 87.5%, 76.0%, 97.5% and 88.4% were obtained, respectively. The overall performance for precision and recall was 92.9% and 86.7% (Table 1a under supplementary material). In this study, precision was calculated as TP / (TP+FP) and recall as TP / (TP+FN), where TP, FP and FN represent the number of true positive, false positive and false negative data, respectively. Table 1b (supplementary material) shows the performance of our system relative to that of other systems using 299 proteins as the calibration data. The assessment of individual methods reveals that the highest overall precision and recall were achieved by Psortb and SOSUI-GramN, respectively. The prediction of the extracellular proteins was particularly low accuracy and common weakness of the other methods, however, our system archived the highest precision and recall, which are 90.6 and 50.0%, respectively. Our system SOSUI-GramN significantly improved the precision of extracellular proteins. The performance of the prediction of outer and inner membrane protein was superior or equivalent to that of the other method. The highest precision of outer membrane proteins was achieved by Psortb, however our method attained the highest recall at 92.1%. Psortb performed the highest precision and recall for inner membrane protein at 96.5 and 79.7%, respectively followed closely by our method at 96.2 and 72.5%. The other hand, the highest precision for periplasmic and cytoplasmic proteins are attained by Psortb, and CELLO, PSLpred and P-CLASSIFIER archived higher recall for cytoplasmic than our method. The accuracy of prediction for periplasmic and cytoplasmic protein is lower than the other methods, however, our method predicted extracellular protein, outer and inner membrane protein, which are important for facilitating the development of drugs for bacterial pathogen, with high accuracy, so SOSUI-GramN would be particularly useful for screening amino acid sequences in medical applications.

Conclusion

Gram-negative bacteria have five major subcellular localization sites, which are extracellular, outer membrane, periplasm, inner membrane and cytoplasm. Proteins synthesized in cytosol and then are targeted and transported to their localization sites through translocation pathways, however the translocation pathways to same localization site is not single pathway. The secretion pathways of extracellular proteins are at least seven pathways, so this complicated pathways result in the diversity of target signals, moreover the low prediction accuracy of extracelluar proteins. In this study, we developed the novel method for prediction the subcellular localization of proteins in Gram-negative bacteria based on the classification of protein on the assumption of the several pathways for the same localization site. The comparison of individual methods (Table 1b in supplementary material) reveals that the performance associated with the prediction of extracellular proteins, which is an area of weakness common to many of the other methods, was superior to that obtained with other systems. Most of sorting signals is not fully understood, but our classification based on physicochemical properties around the sorting signal may provide the clue to make clear features of sorting signals. The SOSUI-GramN system is a web-based application that users can use after inputting their single or multiple sequences on the submission. Note that a minimum input sequence length of more than 60 amino acids is required. The predicted subcellular location of the proteins of interest in Gram-negative bacterial cells is shown on the output page. In the event that the sequence length is less than 60 amino acids or if the confident prediction cannot be estimated, the SOSUI-GramN returns a value of “unknown”. The population of “unknown” in data tested was only about 7% which is smaller than the currently available systems. The SOSUI-GramN system is available at http://bp.nuap.nagoya-u.ac.jp/sosui/sosuigramn/sosuigramn_submit.html.
  11 in total

1.  Amphiphilicity index of polar amino acids as an aid in the characterization of amino acid preference at membrane-water interfaces.

Authors:  Shigeki Mitaku; Takatsugu Hirokawa; Toshiyuki Tsuji
Journal:  Bioinformatics       Date:  2002-04       Impact factor: 6.937

2.  PSLpred: prediction of subcellular localization of bacterial proteins.

Authors:  Manoj Bhasin; Aarti Garg; G P S Raghava
Journal:  Bioinformatics       Date:  2005-02-04       Impact factor: 6.937

3.  PSORTb v.2.0: expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis.

Authors:  J L Gardy; M R Laird; F Chen; S Rey; C J Walsh; M Ester; F S L Brinkman
Journal:  Bioinformatics       Date:  2004-10-22       Impact factor: 6.937

Review 4.  Mechanisms of protein export across the bacterial outer membrane.

Authors:  Maria Kostakioti; Cheryl L Newman; David G Thanassi; Christos Stathopoulos
Journal:  J Bacteriol       Date:  2005-07       Impact factor: 3.490

5.  Prediction of protein subcellular localization.

Authors:  Chin-Sheng Yu; Yu-Ching Chen; Chih-Hao Lu; Jenn-Kang Hwang
Journal:  Proteins       Date:  2006-08-15

Review 6.  Methods for predicting bacterial protein subcellular localization.

Authors:  Jennifer L Gardy; Fiona S L Brinkman
Journal:  Nat Rev Microbiol       Date:  2006-09-11       Impact factor: 60.633

Review 7.  Protein secretion systems and adhesins: the molecular armory of Gram-negative pathogens.

Authors:  Roman G Gerlach; Michael Hensel
Journal:  Int J Med Microbiol       Date:  2007-05-07       Impact factor: 3.473

8.  SOSUI: classification and secondary structure prediction system for membrane proteins.

Authors:  T Hirokawa; S Boon-Chieng; S Mitaku
Journal:  Bioinformatics       Date:  1998       Impact factor: 6.937

9.  A simple method for displaying the hydropathic character of a protein.

Authors:  J Kyte; R F Doolittle
Journal:  J Mol Biol       Date:  1982-05-05       Impact factor: 5.469

10.  PSORTdb: a protein subcellular localization database for bacteria.

Authors:  Sébastien Rey; Michael Acab; Jennifer L Gardy; Matthew R Laird; Katalin deFays; Christophe Lambert; Fiona S L Brinkman
Journal:  Nucleic Acids Res       Date:  2005-01-01       Impact factor: 16.971

View more
  42 in total

1.  Quantitative proteomics of the Neisseria gonorrhoeae cell envelope and membrane vesicles for the discovery of potential therapeutic targets.

Authors:  Ryszard A Zielke; Igor H Wierzbicki; Jacob V Weber; Philip R Gafken; Aleksandra E Sikora
Journal:  Mol Cell Proteomics       Date:  2014-03-08       Impact factor: 5.911

2.  Comparative Genomic Analysis of Closely Related Acetobacter pasteurianus Strains Provides Evidence of Horizontal Gene Transfer and Reveals Factors Necessary for Thermotolerance.

Authors:  Minenosuke Matsutani; Nami Matsumoto; Hideki Hirakawa; Yuh Shiwa; Hirofumi Yoshikawa; Akiko Okamoto-Kainuma; Morio Ishikawa; Naoya Kataoka; Toshiharu Yakushi; Kazunobu Matsushita
Journal:  J Bacteriol       Date:  2020-03-26       Impact factor: 3.490

3.  Transcription level analysis of intracellular Burkholderia pseudomallei illustrates the role of BPSL1502 during bacterial interaction with human lung epithelial cells.

Authors:  Teerasit Techawiwattanaboon; Tanachaporn Bartpho; Rasana Wongratanacheewin Sermswan; Sorujsiri Chareonsudjai
Journal:  J Microbiol       Date:  2015-01-28       Impact factor: 3.422

4.  Quantitative Proteomics of the 2016 WHO Neisseria gonorrhoeae Reference Strains Surveys Vaccine Candidates and Antimicrobial Resistance Determinants.

Authors:  Fadi E El-Rami; Ryszard A Zielke; Teodora Wi; Aleksandra E Sikora; Magnus Unemo
Journal:  Mol Cell Proteomics       Date:  2018-10-23       Impact factor: 5.911

5.  CoBaltDB: Complete bacterial and archaeal orfeomes subcellular localization database and associated resources.

Authors:  David Goudenège; Stéphane Avner; Céline Lucchetti-Miganeh; Frédérique Barloy-Hubler
Journal:  BMC Microbiol       Date:  2010-03-23       Impact factor: 3.605

6.  Structure of a membrane-attack complex/perforin (MACPF) family protein from the human gut symbiont Bacteroides thetaiotaomicron.

Authors:  Qingping Xu; Polat Abdubek; Tamara Astakhova; Herbert L Axelrod; Constantina Bakolitsa; Xiaohui Cai; Dennis Carlton; Connie Chen; Hsiu Ju Chiu; Thomas Clayton; Debanu Das; Marc C Deller; Lian Duan; Kyle Ellrott; Carol L Farr; Julie Feuerhelm; Joanna C Grant; Anna Grzechnik; Gye Won Han; Lukasz Jaroszewski; Kevin K Jin; Heath E Klock; Mark W Knuth; Piotr Kozbial; S Sri Krishna; Abhinav Kumar; Winnie W Lam; David Marciano; Mitchell D Miller; Andrew T Morse; Edward Nigoghossian; Amanda Nopakun; Linda Okach; Christina Puckett; Ron Reyes; Henry J Tien; Christine B Trame; Henry van den Bedem; Dana Weekes; Tiffany Wooten; Andrew Yeh; Jiadong Zhou; Keith O Hodgson; John Wooley; Marc André Elsliger; Ashley M Deacon; Adam Godzik; Scott A Lesley; Ian A Wilson
Journal:  Acta Crystallogr Sect F Struct Biol Cryst Commun       Date:  2010-07-31

7.  Extracellular proteome analysis of Leptospira interrogans serovar Lai.

Authors:  Lingbing Zeng; Yunyi Zhang; Yongzhang Zhu; Haidi Yin; Xuran Zhuang; Weinan Zhu; Xiaokui Guo; Jinhong Qin
Journal:  OMICS       Date:  2013-07-29

8.  Proteome-wide subcellular topologies of E. coli polypeptides database (STEPdb).

Authors:  Georgia Orfanoudaki; Anastassios Economou
Journal:  Mol Cell Proteomics       Date:  2014-09-10       Impact factor: 5.911

9.  Proteomics-driven Antigen Discovery for Development of Vaccines Against Gonorrhea.

Authors:  Ryszard A Zielke; Igor H Wierzbicki; Benjamin I Baarda; Philip R Gafken; Olusegun O Soge; King K Holmes; Ann E Jerse; Magnus Unemo; Aleksandra E Sikora
Journal:  Mol Cell Proteomics       Date:  2016-05-02       Impact factor: 5.911

10.  Genomic Target Database (GTD): a database of potential targets in human pathogenic bacteria.

Authors:  Debmalya Barh; Anil Kumar; Amarendra Narayana Misra
Journal:  Bioinformation       Date:  2009-08-17
View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.