WoLF PSORT: protein localization predictor.

Paul Horton, Keun-Joon Park, Takeshi Obayashi, Naoya Fujita, Hajime Harada, C J Adams-Collier, Kenta Nakai.

Abstract

WoLF PSORT is an extension of the PSORT II program for protein subcellular location prediction. WoLF PSORT converts protein amino acid sequences into numerical localization features, based on sorting signals, amino acid composition and functional motifs such as DNA-binding motifs. After conversion, a simple k-nearest neighbor classifier is used for prediction. Using HTML, the evidence for each prediction is shown in two ways: (i) a list of proteins of known localization with the most similar localization features to the query, and (ii) tables with detailed information about individual localization features. For convenience, sequence alignments of the query to similar proteins and links to UniProt and Gene Ontology are provided. Taken together, this information allows a user to understand the evidence (or lack thereof) behind the predictions made for particular proteins. WoLF PSORT is available at wolfpsort.org.

Year:  2007        PMID: 17517783      PMCID: PMC1933216          DOI: 10.1093/nar/gkm259

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Bilipid membranes divide eukaryotic cells into various types of organelles containing characteristic proteins and performing specialized functions. Thus, subcellular localization information gives an important clue to a protein's function. Although localization signals in mRNA appear to play some role (1), the main determinant of a protein's localization resides in the protein's amino acid sequence. (We recommend wikipedia.org/wiki/Protein_targeting for a brief overview and Alberts et al. (2) for a textbook description.) Numerous experiments to determine protein localization have been performed to date. These can broadly be classified as: small-scale experiments, the results of which continue to accumulate in public databases such as UniProt (3) and Gene Ontology (4); and large-scale experiments using epitope (5) or green fluorescent protein (GFP) (6) tagging, or separation of organelles by centrifugation combined with protein identification by mass spectrometry (7,8). Although they provide invaluable information, the coverage of experimental data is only high for model organisms, particularly yeast. Moreover, the agreement amongst large-scale experimental data is only 75–80% (6–9). Thus, computational prediction of localization from amino acid sequence remains an important topic. Numerous computational methods are available [reviewed in (10,11)]. Some (including WoLF PSORT) have recently been benchmarked by Sprenger et al. (12), who found the computational methods to be useful for sites, such as the nucleus, for which many training examples can be easily obtained from UniProt (which is the source of most or all of the training data for most prediction methods, including WoLF PSORT). The different methods they benchmarked were found to have different strengths. Here, we describe the public server for our WoLF PSORT method.

PREDICTION METHOD

WoLF PSORT is an extension of PSORT II (13,14) and also uses the PSORT (15) localization features for prediction. In addition, WoLF PSORT uses some features from iPSORT (16) and amino acid composition. Those features are used to convert amino acid sequences into numerical vectors, which are then classified with a weighted k-nearest neighbor classifier. WoLF PSORT uses a wrapper method to select and use only the most relevant features. This reduces the amount of information which needs to be considered (and displayed) for the user to interpret individual predictions and may also make the predictor less prone to overfitting. The prediction method has been described in more detail elsewhere (17).
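
The conversion and classification steps described above can be illustrated with a minimal Python sketch. It is not the WoLF PSORT implementation: it uses only amino acid composition as features, a weighted city-block distance and a plain majority vote, all of which are assumptions made for illustration. The function names, the toy sequences and the feature weights are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def composition(seq):
    """Return the fraction of each standard amino acid in seq (20 values)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]

def weighted_distance(x, y, weights):
    """Weighted city-block distance; a weight of 0 drops a feature entirely."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, x, y))

def knn_predict(query, training, weights, k=3):
    """Vote among the k training vectors closest to the query vector.

    training is a list of (feature_vector, site) pairs; the result is a
    list of (site, votes) pairs, most frequent site first.
    """
    ranked = sorted(training,
                    key=lambda item: weighted_distance(query, item[0], weights))
    votes = Counter(site for _, site in ranked[:k])
    return votes.most_common()

# Hypothetical toy training set (not real proteins).
training = [
    (composition("KKKRRRKKRR"), "nucl"),
    (composition("KRKRKRAAKK"), "nucl"),
    (composition("LLLVVVIILL"), "plas"),
    (composition("DEDEDEGGSS"), "cyto"),
]
weights = [1.0] * len(AMINO_ACIDS)  # equal weights: every feature is used
prediction = knn_predict(composition("KKRRKKRRAK"), training, weights, k=3)
```

Setting some entries of `weights` to zero mimics, in a very rough way, the effect of feature selection: the ignored features no longer influence which training proteins count as neighbors.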

Dataset

The WoLF PSORT dataset is divided into fungi, plant and animal subsets containing 2113, 2333 and 12771 proteins, respectively. The current data was primarily obtained from UniProt (3) version 45, but subcellular localization information from Gene Ontology (4) was also used. Gene Ontology entries with evidence codes {TAS, IDA, IMP} were included, with manual revisions in a few cases. We intend to update these datasets regularly in the future.
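
The evidence-code filter mentioned above amounts to a simple set membership test. The sketch below assumes annotations are available as dictionaries with an `evidence` field; that record layout and the protein names are hypothetical and do not reflect the authors' data pipeline.

```python
# Gene Ontology evidence codes accepted for the dataset, as stated in the text:
# TAS = Traceable Author Statement, IDA = Inferred from Direct Assay,
# IMP = Inferred from Mutant Phenotype.
ACCEPTED_EVIDENCE = {"TAS", "IDA", "IMP"}

def filter_annotations(annotations):
    """Keep only annotations whose evidence code is in the accepted set."""
    return [a for a in annotations if a["evidence"] in ACCEPTED_EVIDENCE]

# Hypothetical records; IEA (Inferred from Electronic Annotation) is rejected.
annotations = [
    {"protein": "PROT_A", "site": "nucleus", "evidence": "IDA"},
    {"protein": "PROT_B", "site": "cytosol", "evidence": "IEA"},
    {"protein": "PROT_C", "site": "mitochondrion", "evidence": "TAS"},
]
kept = filter_annotations(annotations)
```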

LOCALIZATION SITES AND PREDICTION ACCURACY

WoLF PSORT classifies proteins into more than 10 localization sites, including dual localization such as proteins which shuttle between the cytosol and nucleus. Based on our cross-validation studies (17), we estimate sensitivity and specificity of around 70% for: nucleus, mitochondria, cytosol, plasma membrane, extracellular and (in plants) chloroplast. For other sites, such as the peroxisome and Golgi, the sensitivity is very low, but useful predictions are still made in some cases. For example, the Arabidopsis seed protein 12S1_ARATH is reasonably predicted to localize to the vacuole even though only one of its neighbors (see below) shares significant sequence similarity. An independent test (12) on mouse proteins gave a significantly lower estimate of WoLF PSORT's prediction accuracy (around 50%). This discrepancy may be explained by the over-representation of well-studied proteins in the WoLF PSORT training data and perhaps also by the size of their test data (in particular, their 'LOC2145' test set contained only 87 cytosolic proteins) or differences in site definition.
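
For readers who want to compute comparable per-site figures on their own test data, the following sketch may help. The text does not define specificity; the sketch assumes the common usage in this field, namely the fraction of predictions for a site that are correct (often called precision). The function name and the toy labels are hypothetical.

```python
def site_metrics(true_sites, predicted_sites, site):
    """Return (sensitivity, precision) for one localization site.

    sensitivity = correct predictions for the site / proteins truly at the site
    precision   = correct predictions for the site / proteins predicted there
    (precision is an assumed reading of "specificity" in the text).
    """
    pairs = list(zip(true_sites, predicted_sites))
    tp = sum(t == site and p == site for t, p in pairs)
    fn = sum(t == site and p != site for t, p in pairs)
    fp = sum(t != site and p == site for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, precision

# Hypothetical toy labels for five proteins.
true_sites = ["nucl", "nucl", "cyto", "mito", "nucl"]
predicted = ["nucl", "cyto", "cyto", "nucl", "nucl"]
sens, prec = site_metrics(true_sites, predicted, "nucl")
```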

PREDICTION RESULTS DISPLAY

The k-nearest neighbors classifier allows for an intuitive display of the prediction results which is exactly analogous to sequence similarity search. Using multi-FASTA format, multiple sequences can be given in a query. The first page returned from the server gives a one-line summary of the result for each query sequence. For example, the prediction summary line for the TCOF_HUMAN protein is:

TCOF_HUMAN

The localization sites are abbreviated to four-letter codes (documented on the server), with dual localization denoted by joining the four-letter codes with an underscore character. The numbers roughly indicate the number of nearest neighbors to the query which localize to each site, but are adjusted to account for the possibility of dual localization (17).
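
A summary line of this kind is easy to post-process. The sketch below assumes a line of the form `NAME code: number, code: number, ...`; that exact layout, the query name and the numbers in the example are assumptions made for illustration, not output copied from the server.

```python
def parse_summary(line):
    """Split a summary line into the query name and a {site_code: score} dict.

    Assumed layout (hypothetical): "NAME code: number, code: number, ..."
    """
    name, _, rest = line.strip().partition(" ")
    scores = {}
    for field in rest.split(","):
        code, _, value = field.partition(":")
        code = code.strip()
        if code:  # skip empty fields, e.g. when no scores follow the name
            scores[code] = float(value)
    return name, scores

def sites_of(code):
    """Expand a site code; dual localization codes are joined by '_'."""
    return code.split("_")

# Hypothetical example line with made-up numbers.
name, scores = parse_summary("QUERY_1 nucl: 12.5, cyto_nucl: 8, cyto: 3.5")
best = max(scores, key=scores.get)
```

Only the site codes are split on the underscore; the query name is left intact because identifiers such as TCOF_HUMAN contain underscores themselves.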

Neighbor list

Details about the query's neighbor list and localization signals can be obtained by following the 'details' link. The first part of the display page is a neighbor list table such as the one shown in Figure 1. This list gives information regarding the query's neighbors (proteins in the WoLF PSORT training data that have the most similar localization features). For user convenience, the percent identity and a link to the alignment of each neighbor to the query are given. Sequence similarity is not used for prediction but can provide additional corroborating evidence in many cases. Links to the relevant entries in UniProt and Gene Ontology, and to TAIR (www.arabidopsis.org) for many Arabidopsis entries, are also provided.
Figure 1.

Part of the list of proteins similar to the query protein, an isoform of TCOF_HUMAN, is shown. For each neighbor the following is shown: UniProt ID, localization site, the distance in localization features from the query, the percent identity to the query, a link to its UniProt entry, the subcellular localization line from UniProt and other available localization information.

Localization feature table

By scrolling down on the detailed results pages, one can find a feature table giving the values of each localization feature for the query and its neighbors. In some cases, the individual values can help support (or question) the predicted site. For example, in the case of TCOF_HUMAN (Figure 2), the 99th percentile value of the PSORT localization feature ‘nuc’ (which is based on nuclear localization signals and DNA-binding site motifs) is consistent with the nuclear prediction. Below the normalized table, a similar table with the raw feature values is displayed.
Figure 2.

The localization features for the query and its neighbors are shown. The values are normalized to percentiles relative to the WoLF PSORT training data. Neighbor values shown in blue are within 10 percentile points of the query value, while those shown in red are 20 or more percentile points different from the query.
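
The percentile normalization and color coding described in the caption can be sketched as follows. The percentile definition used here (the share of training values less than or equal to the given value), the treatment of a difference of exactly 10 points as blue, and the "plain" label for intermediate differences are assumptions; the server's exact rules are not given in the text.

```python
from bisect import bisect_right

def percentile(value, training_values):
    """Percentage of training values that are less than or equal to value."""
    ordered = sorted(training_values)
    return 100.0 * bisect_right(ordered, value) / len(ordered)

def color(query_pct, neighbor_pct):
    """Color a neighbor's value by its percentile distance from the query."""
    diff = abs(query_pct - neighbor_pct)
    if diff <= 10:
        return "blue"   # close to the query value
    if diff >= 20:
        return "red"    # clearly different from the query value
    return "plain"      # assumed default for differences in between

# Hypothetical raw feature values from a training set of ten proteins.
training_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
query_pct = percentile(5, training_values)
```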

IMPLEMENTATION

The server is implemented with Mason (www.masonhq.com), which allows convenient embedding of logic and computed results into HTML via the Perl programming language. Multiple requests are handled with the simple strategy of returning the results in a URI containing an MD5 hash of the query contents. Upon sending a query, a wait page is shown, followed by an automatic redirect to the results page upon task completion (usually requiring around 40 s). Task scheduling is delegated to Apache and the Linux operating system. Multiple sequences are allowed in one query, but we currently limit the query size to 64 KB. For large-scale use, such as whole genome annotation, we encourage users to download the stand-alone package (available on the server) and run WoLF PSORT locally.
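
The idea of addressing results by a hash of the query can be sketched in a few lines. The server itself is written in Perl with Mason; the Python below is only an illustration, and the base URL, the `.html` suffix and the function name are hypothetical.

```python
import hashlib

MAX_QUERY_BYTES = 64 * 1024  # the 64 KB query limit stated in the text

def results_uri(query_text, base="https://example.org/results/"):
    """Build a results URI from the MD5 hash of the query contents.

    The base URL and file suffix are placeholders, not the server's layout.
    """
    data = query_text.encode("utf-8")
    if len(data) > MAX_QUERY_BYTES:
        raise ValueError("query exceeds the 64 KB limit")
    return base + hashlib.md5(data).hexdigest() + ".html"

# Hypothetical multi-FASTA query with two short sequences.
query = ">seq1\nMKTAYIAKQR\n>seq2\nMSDNGELEDK\n"
uri = results_uri(query)
```

Because the URI depends only on the query contents, identical queries map to the same results location, so concurrent requests need no additional session bookkeeping.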

SUMMARY

WoLF PSORT not only provides subcellular localization prediction with competitive accuracy, but also provides detailed information relevant to protein localization to help users to form their own hypotheses.
REFERENCES (13 in total)

1.  Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors:  M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal:  Nat Genet       Date:  2000-05       Impact factor: 38.330

2.  PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization.

Authors:  K Nakai; P Horton
Journal:  Trends Biochem Sci       Date:  1999-01       Impact factor: 13.807

3.  Extensive feature detection of N-terminal protein sorting signals.

Authors:  Hideo Bannai; Yoshinori Tamada; Osamu Maruyama; Kenta Nakai; Satoru Miyano
Journal:  Bioinformatics       Date:  2002-02       Impact factor: 6.937

4.  Global analysis of protein localization in budding yeast.

Authors:  Won-Ki Huh; James V Falvo; Luke C Gerke; Adam S Carroll; Russell W Howson; Jonathan S Weissman; Erin K O'Shea
Journal:  Nature       Date:  2003-10-16       Impact factor: 49.962

5.  Mimicking cellular sorting improves prediction of subcellular localization.

Authors:  Rajesh Nair; Burkhard Rost
Journal:  J Mol Biol       Date:  2005-04-22       Impact factor: 5.469

6.  RNA localization in yeast: moving towards a mechanism.

Authors:  Graydon B Gonsalvez; Carl R Urbinati; Roy M Long
Journal:  Biol Cell       Date:  2005-01       Impact factor: 4.458

7.  Evaluation and comparison of mammalian subcellular localization prediction methods.

Authors:  Josefine Sprenger; J Lynn Fink; Rohan D Teasdale
Journal:  BMC Bioinformatics       Date:  2006-12-18       Impact factor: 3.169

8.  The Universal Protein Resource (UniProt).

Authors:  Amos Bairoch; Rolf Apweiler; Cathy H Wu; Winona C Barker; Brigitte Boeckmann; Serenella Ferro; Elisabeth Gasteiger; Hongzhan Huang; Rodrigo Lopez; Michele Magrane; Maria J Martin; Darren A Natale; Claire O'Donovan; Nicole Redaschi; Lai-Su L Yeh
Journal:  Nucleic Acids Res       Date:  2005-01-01       Impact factor: 16.971

9.  Integrative analysis of the mitochondrial proteome in yeast.

Authors:  Holger Prokisch; Curt Scharfe; David G Camp; Wenzhong Xiao; Lior David; Christophe Andreoli; Matthew E Monroe; Ronald J Moore; Marina A Gritsenko; Christian Kozany; Kim K Hixson; Heather M Mottaz; Hans Zischka; Marius Ueffing; Zelek S Herman; Ronald W Davis; Thomas Meitinger; Peter J Oefner; Richard D Smith; Lars M Steinmetz
Journal:  PLoS Biol       Date:  2004-06-15       Impact factor: 8.029

10.  A knowledge base for predicting protein localization sites in eukaryotic cells.

Authors:  K Nakai; M Kanehisa
Journal:  Genomics       Date:  1992-12       Impact factor: 5.736
