
Comparative analysis of an experimental subcellular protein localization assay and in silico prediction methods.

Yuhui Hu, Hans Lehrach, Michal Janitz.

Abstract

The subcellular localization of a protein can provide important information about its function within the cell. As eukaryotic cells, and particularly mammalian cells, are characterized by a high degree of compartmentalization, most protein activities can be assigned to particular cellular compartments. The categorization of proteins by their subcellular localization is therefore one of the essential goals of the functional annotation of the human genome. We previously performed a subcellular localization screen of 52 proteins encoded on human chromosome 21. In the current study, we compared the experimental localization data to the in silico results generated by nine leading software packages with different prediction resolutions. The comparison revealed striking differences between the programs in the accuracy of their subcellular protein localization predictions. Our results strongly suggest that the recently developed predictors utilizing multiple prediction methods tend to perform significantly better than purely sequence-based or homology-based predictors.

Year:  2009        PMID: 20033263      PMCID: PMC2834777          DOI: 10.1007/s10735-009-9247-9

Source DB:  PubMed          Journal:  J Mol Histol        ISSN: 1567-2379            Impact factor:   2.611


Introduction

Knowing the location of a protein within its cellular environment is critical for understanding the regulatory mechanisms by which it is controlled. The accurate function of proteins and their interaction networks relies greatly on the proper localization of each protein component. A conventional method to identify protein–protein interactions at the single-cell level is to trace the mutual localization of proteins under physiological conditions (Relic et al. 1998; Surapureddi et al. 2000). Another common strategy in the study of regulation and interaction networks is to determine whether the localization of proteins is altered by the intentional disruption of the networks (Zuckerbraun et al. 2003). The aberrant translocation of proteins often correlates with pathological changes in cell physiology and accounts for the clinical manifestations of several genetic diseases such as primary hyperoxaluria (Danpure et al. 1993). A growing list of diseases caused by the improper localization of proteins makes protein translocation a promising target for the development of therapeutic agents (Besemer et al. 2005; Garrison et al. 2005).

Computational biologists have made extensive efforts to develop programs to predict the subcellular localization of proteins. Numerous software suites have been released in this field, based on various biological concepts and computational methods. Presently, four leading methods are commonly used. The first uses the overall protein amino acid composition. For example, SubLoc predicts protein localization based on the fact that proteins with different subcellular localizations usually have different amino acid compositions (Hua and Sun 2001). The second type of method utilizes known targeting sequences. One of the most important principles of the protein sorting mechanism is the existence of a targeting signal in the amino acid sequence that leads proteins to different organelles or out of the cell. Hence, several computational approaches focus on predicting the presence of certain targeting motifs in protein sequences, e.g. signal peptides (SPs), the mitochondrial targeting peptide (mTP), nuclear localization signals (NLS) and transmembrane alpha helices (Bannai et al. 2002; Claros and Vincens 1996; Emanuelsson 2002). A third approach uses sequence homology and/or motifs. For example, the Proteome Analyst Subcellular Localization Server (PA-SUB) utilizes keywords from the protein database SWISS-PROT and the annotation of homologous proteins (Lu et al. 2004). Finally, a combination of the information obtained from the three categories described above has been used in prediction tools such as WoLF-PSORT (updated version of PSORT II) and the most recent, SherLoc2 (Horton et al. 2007; Briesemeister et al. 2009).

Due to their automated and high-throughput nature, computational methods are appealing for the large-scale assignment of protein subcellular locations. Regardless of the algorithm used, however, computational predictions have always been based on available biological knowledge, which is far from complete. The enormous complexity of the protein sorting process, the existence of alternative transportation pathways and the lack of complete data for every organelle still limit the application of computational methods. For instance, very few current predictors can deal with multi-site localization of a protein, with the exception of WoLF-PSORT and Hum-mPLoc (Shen and Chou 2009).

Due to the uncertain effectiveness of the available methods, particularly on a random protein dataset, we performed a comparative analysis between experimentally obtained subcellular localization data for 52 human Chr.21 proteins (Hu et al. 2006) and in silico prediction results, with the aim of evaluating the reliability of the bioinformatics approaches. Nine leading computational programs were included in the analysis, mainly due to their variable prediction strategies and the user-friendly web services that they provide.
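To make the first family of methods concrete, the following is a minimal sketch of how an amino acid composition feature vector might be computed from a sequence. It is an illustration only and not the SubLoc implementation; the function name `aa_composition` and the toy peptide are assumptions introduced here.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence: str) -> dict[str, float]:
    """Return the fraction of each standard amino acid in a protein sequence.

    Non-standard characters (e.g. X, gaps) are ignored. Hypothetical helper,
    not taken from any of the predictors discussed in the text.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Toy peptide for illustration, not one of the Chr.21 proteins
comp = aa_composition("MKKLLAAA")
print(comp["A"], comp["K"])  # 0.375 0.25
```

A composition-based predictor would feed a vector like this into a trained classifier; the classifier itself is outside the scope of this sketch.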

Materials and methods

The materials and methods for the experimental characterization of protein subcellular localizations were reported previously (Hu et al. 2006). The computational predictions were performed on the internet website interfaces provided by each prediction program. A positive prediction was counted if the program gave the same site as at least one of the experimentally determined localizations for a given protein. The web addresses of the prediction programs used in this study are as follows: SherLoc2: http://www-bs.informatik.uni-tuebingen.de/Services/SherLoc2; WoLF-PSORT: http://wolfpsort.org/; pTARGET: http://bioapps.rit.albany.edu/pTARGET/; ProtComp8: http://linux1.softberry.com/berry.phtml?topic=protcompan&group=programs&subgroup=proloc; PA-SUB v2.5: http://pasub.cs.ualberta.ca:8080/pa/Subcellular; MultiLoc2: http://www-bs.informatik.uni-tuebingen.de/Services/MultiLoc2/; ESLPred2: http://www.imtech.res.in/raghava/eslpred2/; BaCelLo: http://gpcr.biocomp.unibo.it/bacello/; SubLoc: http://www.bioinfo.tsinghua.edu.cn/SubLoc/.

Results

We divided the nine programs into two groups according to their prediction resolutions: low-resolution four-site prediction (nucleus, cytoplasm, mitochondrion and secretory pathway) and high-resolution organelle prediction that can further assign a secretory pathway protein to specific subcellular organelles such as the ER, Golgi apparatus, peroxisome and lysosome, as well as the plasma membrane and extracellular secretion. The prediction principles and capabilities of the nine programs are summarized in Table 1.
Table 1

Comparison of the protein localization prediction software programs used in the study

| Software | Prediction strategy | Number of predicted localizations* | Reference |
|---|---|---|---|
| SherLoc2 | Sequence-based predictions (aa composition, sorting signals), homology similarity, GO terms | 9 | Briesemeister et al. (2009) |
| WoLF-PSORT | Sequence-based predictions (aa composition, sorting signals, functional motifs), homology similarity | 11 | Horton et al. (2007) |
| pTARGET | Sequence-based predictions (aa composition, localization-specific Pfam domains) | 9 | Guda (2006) |
| ProtComp8 | Sequence-based predictions (signal sequences, anchors, other functional peptides), homology similarity | 9 | www.softberry.com |
| PA-SUB v2.5 | Homology similarity | 9 | Lu et al. (2004) |
| MultiLoc2# | Sequence-based predictions (aa composition, sorting signals), homology similarity, GO terms | 4 | Blum et al. (2009) |
| ESLPred2 | Sequence-based predictions (aa composition, sorting signals), homology similarity | 4 | Garg and Raghava (2008) |
| BaCelLo | Sequence composition | 4 | Pierleoni et al. (2006) |
| SubLoc | Aa composition | 4 | Hua and Sun (2001) |

* The number of sites was counted only for eukaryotic proteins

# Only the low-resolution function of MultiLoc2 was used; the high-resolution module was included in SherLoc2. aa, amino acids

The prediction results for the 52 Chr.21 proteins are summarized in Tables 2, 3; they were compared to the experimentally determined localization patterns described previously (Hu et al. 2006). If one of the actual localization sites of a protein was predicted by a program, we counted a full positive prediction. This means, for example, that a prediction of “extracellular/secretory” in a low-resolution group was considered to reflect good performance in predicting the localization of plasma membrane, ER, Golgi and lysosomal proteins (in total, 15 proteins in this study). This loose criterion for the secretory pathway, however, was not applied to the high-resolution predictors that can classify proteins into specific organelle locations. For all of the predictors, however, a prediction of either “cytoplasm” or “nucleus” was counted as a full positive hit for the 12 Chr.21 proteins with “cyto-nuc” (cytoplasm and nucleus) dual localization. These calculations significantly raised the overall success rates for all nine of the predictors, but they should have no impact on comparisons of the relative performances of predictors with the same resolution, as none of the nine predictors showed a dual-localization prediction for any of the 52 proteins tested in this study.
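The counting rule described above can be sketched in a few lines of Python. This is an illustrative reading of the rule, not code from the study: the function names, the normalised site labels and the set `SECRETORY_SITES` are assumptions. The four example rows are the SubLoc predictions for ABCG1, AGPAT3, C21orf19 and DSCR3 as listed in Table 2.

```python
# Sites that a low-resolution "Secr. Path." call was allowed to cover
# (plasma membrane, ER, Golgi and lysosomal proteins, per the text).
SECRETORY_SITES = {"PM", "ER", "Golgi", "Lyso"}


def is_positive(experimental, predicted, low_resolution):
    """Return True if the prediction counts as a full positive hit.

    experimental: set of experimentally observed sites for one protein
    predicted: the single site reported by the predictor
    low_resolution: True for four-site predictors, False otherwise
    """
    # A match with any one of the experimental sites is a full hit,
    # which also covers the cyto-nuc dual-localization case.
    if predicted in experimental:
        return True
    # Loose secretory-pathway criterion, low-resolution predictors only.
    if low_resolution and predicted == "Secr. Path.":
        return bool(set(experimental) & SECRETORY_SITES)
    return False


def accuracy(rows, low_resolution):
    """Fraction of (experimental, predicted) pairs counted as positive."""
    hits = sum(is_positive(exp, pred, low_resolution) for exp, pred in rows)
    return hits / len(rows)


# Four rows from Table 2, SubLoc column (a subset, for illustration only)
subloc_rows = [
    ({"PM", "Golgi"}, "Secr. Path."),  # ABCG1: hit via the loose criterion
    ({"ER", "PM"}, "Cyto"),            # AGPAT3: miss
    ({"Nuc", "Cyto"}, "Nuc"),          # C21orf19: hit on one of two sites
    ({"Nuc"}, "Cyto"),                 # DSCR3: miss
]
print(accuracy(subloc_rows, low_resolution=True))  # 0.5
```

Passing `low_resolution=False` disables the loose criterion, so a secretory-pathway call would only count if the exact organelle matched.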
Table 2

Comparison of experimental localization results for 52 Chr.21 proteins to in silico low-resolution predictions

The last four columns give the low-resolution localization predictions.

| Gene symbol | GenBank protein acc. no. | Function class | Localization in HEK293T | MultiLoc2-LowRes | ESLPred2 | BaCelLo | SubLoc |
|---|---|---|---|---|---|---|---|
| ABCG1 | CAA62631.1 | ATPase | PM/Golgi | Cyto | Cyto | Cyto | Secr. Path. |
| AGPAT3 | AAH11971.1 | Acyltransferase | ER/PM (less) | Mito | Secr. Path. | Secr. Path. | Cyto |
| B3GALT5 | NP_006048.1 | Galactosyl-transferase | Golgi/ER | Secr. Path. | Secr. Path. | Cyto | Mito |
| BACH1 | BAA24932.1 | Transcription regulation | Cyto (punct) Nuc-M-phase | Nuc | Nuc | Nuc | Nuc |
| C21orf103 | NP_853633.1 | Unclear | Cyto | Cyto | Secr. Path. | Secr. Path. | Nuc |
| C21orf19 | AAL34462.1 | Unknown | Nuc/Cyto | Cyto | Cyto | Cyto | Nuc |
| C21orf25 | XP_032945.2 | Unknown | Nuc/Cyto | Cyto | Nuc | Nuc | Nuc |
| C21orf30 | CAB56001.2 | Unknown | Nuc | Cyto | Nuc | Nuc | Nuc |
| C21orf4 | AAC05974.2 | Unknown | PM | Cyto | Secr. Path. | Secr. Path. | Cyto |
| C21orf59 | AAG00496.1 | Unknown | Nuc/Cyto | Cyto | Mito | Cyto | Cyto |
| C21orf69 | AAK60445.1 | Unknown | ER | Mito | Nuc | Secr. Path. | Nuc |
| C21orf96 | NP_079419.1 | Unknown | Cyto (punct) | Nuc | Nuc | Mito | Mito |
| CBS | NP_000062.1 (splicing isoform) | Cystathionine-beta-synthase | Cyto | Cyto | Cyto | Cyto | Cyto |
| CCT8 | BAA02792.1 | Chaperonin | Cyto | Cyto | Cyto | Cyto | Cyto |
| CHAF1B | NP_005432.1 | Chromatin assembly factor | Nucleoplasm Cyto-M phase | Cyto | Nuc | Nuc | Cyto |
| CLDN14 | AAG60052.1 | Tight junction | ER/PM (less) | Secr. Path. | Secr. Path. | Secr. Path. | Secr. Path. |
| CLDN17 | CAB60616.1 | Tight junction | PM/Golgi | Secr. Path. | Secr. Path. | Secr. Path. | Secr. Path. |
| CLDN8 | NP_036264.1 | Tight junction | ER/PM (less) | Secr. Path. | Secr. Path. | Secr. Path. | Secr. Path. |
| CRYZL1 | BAA91605.1 | Oxidoreductase | Cyto | Cyto | Cyto | Cyto | Cyto |
| CXADR | AAH10536.1 | Receptor | PM | Secr. Path. | Secr. Path. | Cyto | Nuc |
| DNMT3L | AAH02560.1 | Methyltransferase-like | Nuc/Cyto | Cyto | Secr. Path. | Cyto | Cyto |
| DSCR3 | NP_006043.1 | Unknown | Nuc | Cyto | Cyto | Cyto | Cyto |
| ETS2 | NP_005230.1 | Transcription factor | Nuc | Nuc | Nuc | Cyto | Nuc |
| GCFC | AAD34617.1 | Transcriptional repressor | Cyto | Nuc | Nuc | Nuc | Nuc |
| HLCS | NP_000402.2 | Protein ligase | Cyto | Cyto | Nuc | Cyto | Cyto |
| HMGN1 | AAA52676.1 | DNA binding | Nuc | Nuc | Nuc | Nuc | Nuc |
| HSF2BP | NP_008962.1 | Transcription factor binding | Cyto | Cyto | Nuc | Cyto | Cyto |
| IFNGR2 | AAH03624.1 | Receptor | ER/PM (less) | Secr. Path. | Secr. Path. | Secr. Path. | Secr. Path. |
| KCNE1 | AAH36452.1 | K-channel | Lyso/PM | Secr. Path. | Secr. Path. | Cyto | Secr. Path. |
| KCNE2 | NP_005127.1 | K-channel | Lyso/PM | Cyto | Secr. Path. | Secr. Path. | Secr. Path. |
| KCNJ15 | NP_002234.2 | K-channel | PM/Golgi | Cyto | Cyto | Cyto | Cyto |
| KCNJ6 | NP_002231.1 | K-channel | PM/Golgi | Cyto | Cyto | Cyto | Cyto |
| KIAA0179 | XP_035973.4 | Unknown | Nuc/Cyto (punct)-M phase | Nuc | Nuc | Nuc | Nuc |
| MCM3AP | BAA25170.1 | DNA binding | Cyto/Nuc | Cyto | Cyto | Cyto | Cyto |
| MX1 | NP_002453.1 | Dynamin and large GTPases | Cyto (punct) | Cyto | Mito | Cyto | Cyto |
| NNP1 | AAH00380.1 | RNA processing | Nucleolus | Nuc | Nuc | Nuc | Cyto |
| PCBP3 | AAH12061.1 | RNA binding | Cyto/Nuc | Cyto | Cyto | Nuc | Cyto |
| PCP4 | CAA63724.1 | Unknown | Nuc/Cyto | Cyto | Nuc | Cyto | Mito |
| PDE9A | AAH09047.1 | Phosphodiesterase | Cyto (accum) | Cyto | Cyto | Nuc | Nuc |
| PDXK | AAH00123.1 | Kinase | Cyto | Cyto | Cyto | Cyto | Secr. Path. |
| PFKL | AAH09919.1 | Kinase | Cyto (accum) | Cyto | Cyto | Mito | Mito |
| PKNOX1 | AAH07746.1 | Transcription factor | Nuc/Cyto | Nuc | Nuc | Nuc | Nuc |
| PPIA3L | CAA37039.1 | Peptidylprolyl isomerase A | Nuc/Cyto | Cyto | Cyto | Secr. Path. | Cyto |
| RPS5L | Pseudogene, 81% identity to BAB79493.1 | Unknown | Cyto | Cyto | Cyto | Cyto | Secr. Path. |
| SH3BGR | AAH06371.1 | SH3 adaptor | Cyto | Cyto | Cyto | Cyto | Nuc |
| TAK1L | AAF81754.1 | Transcription factor-like | Nuc/Cyto | Cyto | Nuc | Cyto | Secr. Path. |
| TMPRSS3a | NP_076927.1 | Protease | ER | Cyto | Secr. Path. | Cyto | Secr. Path. |
| TSGA2 | NP_543136.1 | Chromosome-associated | Cyto/Nuc | Cyto | Cyto | Cyto | Cyto |
| UBASH3A | NP_061834.1 | Catalytic activity | Cyto | Cyto | Cyto | Cyto | Cyto |
| UBE2G2 | AAC32312.1 | Ligase | Cyto | Cyto | Cyto | Cyto | Nuc |
| WDR4 | AAH06341.1 | Unknown | Nucleoplasm | Secr. Path. | Nuc | Nuc | Cyto |
| WDR9_3′ | BAA92123.1 | Unknown | Nuc | Nuc | Nuc | Nuc | Nuc |

The localization properties of 52 Chr.21 proteins determined experimentally in HEK293T cells were compared to prediction results given by four computational programs that can only classify proteins into four subcellular compartments. Accum accumulated, Cyto cytosol, ER endoplasmic reticulum, Lyso lysosome and endosome, Mem-bound membrane-bound, Mito mitochondria, Nuc Nucleus, PM plasma membrane, Punct punctuated, Secr. Path. extracellular secreted protein or secretory pathway protein

Table 3

Comparison of experimental localization results for 52 Chr.21 proteins to in silico high-resolution predictions

Gene symbolGenBank protein acc. no.Function classLocalization in HEK293THigh-resolution localization prediction
SherLoc2WoLF-PSORTpTARGETProtComp8PA-SUB v2.5
ABCG1 CAA62631.1ATPasePM/GolgiCytoPMPMPMER
AGPAT3 AAH11971.1AcyltransferaseER/PM(less)ERExtracellERERMito
B3GALT5 NP_006048.1Galactosyl-transferaseGolgi/ERGolgiExtracellGolgiGolgiGolgi
BACH1 BAA24932.1Transcription regulationCyto(punct) Nuc-M-phaseNucNucNucNucNuc
C21orf103 NP_853633.1UnclearCytoCytoExtracellExtracellPMCyto
C21orf19 AAL34462.1UnknownNuc/CytoMitoNucPMExtracell
C21orf25 XP_032945.2UnknownNuc/CytoNucNucNucExtracellExtracell
C21orf30 CAB56001.2UnknownNucCytoNucExtracellMem-bound Perox
C21orf4 AAC05974.2UnknownPMPMPMPMExtracell
C21orf59 AAG00496.1UnknownNuc/CytoCytoCytoCytoExtracell
C21orf69 AAK60445.1UnknownERMitoExtracellExtracellCyto
C21orf96 NP_079419.1UnknownCyto (punct)CytoCyto_NucCytoExtracell
CBS NP_000062.1 (splicing isoform)Cystathionine-beta-synthaseCytoCytoPMCytoCyto
CCT8 BAA02792.1ChaperoninCytoCytoCytoMitoCytoCyto
CHAF1B NP_005432.1Chromatin assembly factorNucleoplasm Cyto-M phaseNucNucNucNucNuc
CLDN14 AAG60052.1Tight junctionER/PM(less)PMPMPMPM
CLDN17 CAB60616.1Tight junctionPM/GolgiPMPMPMPM
CLDN8 NP_036264.1Tight junctionER/PM(less)PMPMPMPM
CRYZL1 BAA91605.1OxidoreductaseCytoCytoCytoCytoExtracellCyto
CXADR AAH10536.1ReceptorPMPMPMExtracellPMExtracell
DNMT3L AAH02560.1Methyltransferase-likeNuc/CytoCytoNucPMNucNuc
DSCR3 NP_006043.1UnknownNucCytoCytoCytoExtracellCyto
ETS2 NP_005230.1Transcription factorNucNucNucNucNucNuc
GCFC AAD34617.1Transcriptional repressorCytoNucCytoNucMem-bound peroxNuc
HLCS NP_000402.2Protein ligaseCytoCytoCytoMitoExtracellCyto
HMGN1 AAA52676.1DNA bindingNucNucNucNucMitoNuc
HSF2BP NP_008962.1Transcription factor bindingCytoCytoCytoCytoExtracellCyto
IFNGR2 AAH03624.1ReceptorER/PM (less)PMPMLysoPMExtracell
KCNE1 AAH36452.1K-channelLyso/PMPMExtracellPMPMER
KCNE2 NP_005127.1K-channelLyso/PMPMCytoPMPMER
KCNJ15 NP_002234.2K-channelPM/GolgiPMPMPMPMER
KCNJ6 NP_002231.1K-channelPM/GolgiPMPMPMPMMito
KIAA0179 XP_035973.4UnknownNuc/Cyto (punct)-M phaseNucNucNucNucNuc
MCM3AP BAA25170.1DNA bindingCyto/NucNucNucLysoExtracellNuc
MX1 NP_002453.1Dynamin and large GTPasesCyto (punct)CytoCytoCytoCytoCyto
NNP1 AAH00380.1RNA processingNucleolusNucNucNucNucNuc
PCBP3 AAH12061.1RNA bindingCyto/NucNucCyskCytoCytoCyto
PCP4 CAA63724.1UnknownNuc/CytoCytoCytoCytoMitoCyto
PDE9A AAH09047.1PhosphodiesteraseCyto (accum)CytoCytoCytoExtracellCyto
PDXK AAH00123.1KinaseCytoCytoCytoCytoCytoCyto
PFKL AAH09919.1KinaseCyto (accum)CytoCytoCytoCyto
PKNOX1 AAH07746.1Transcription factorNuc/CytoNucNucCytoNucNuc
PPIA3L CAA37039.1Peptidylprolyl isomerase ANuc/CytoCytoCytoCytoCytoCyto
RPS5L Pseudogene, 81% identity to BAB79493.1UnknownCytoCytoCytoExtracell
SH3BGR AAH06371.1SH3 adaptorCytoCytoCytoNucExtracellCyto
TAK1L AAF81754.1Transcription factor-likeNuc/CytoCytoCytoCytoCytoCyto
TMPRSS3a NP_076927.1ProteaseERPMCytoPMERExtracell
TSGA2 NP_543136.1Chromosome-associatedCyto/NucCytoCytoCytoExtracellCyto
UBASH3A NP_061834.1Catalytic activityCytoCytoNucGolgiExtracellCyto
UBE2G2 AAC32312.1LigaseCytoPeroxMitoERExtracellCyto
WDR4 AAH06341.1UnknownNucleoplasmCytoExtracellGolgiExtracellCyto
WDR9_3′ BAA92123.1UnknownNucNucNucNucCyto

The localization properties of 52 Chr.21 proteins determined experimentally in HEK293T cells were compared to prediction results given by five computational programs that can classify proteins into at least nine subcellular compartments. Accum accumulated Cysk cytoskeleton, Cyto cytosol, ER endoplasmic reticulum, Extracell extracellular secreted protein, Lyso lysosome and endosome, Mem-bound membrane-bound, Mito mitochondria, Nuc Nucleus, PM plasma membrane, Punct punctuated, Perox peroxisome

The total number of positive predictions consistent with the experimental findings was summarized for each program; the percentage of prediction accuracy is shown next to the name of the prediction program in Figs. 1, 2. Among the low-resolution predictors, the three recently published programs MultiLoc2, ESLPred2 and BaCelLo were found to have similar prediction accuracies, with 75% (MultiLoc2-LowReso, ESLPred2) and 71% (BaCelLo) agreement with the experimental data. A relatively low percentage of positive prediction, 60%, was observed for SubLoc, which was written in 2001.
Fig. 1

Comparison of the prediction performances of five computational predictors with high resolution. Prediction performance varied among the different programs. SherLoc2 and WoLF-PSORT rendered the highest accuracy with the experimental results (indicated as Hek), at 83% and 75%, respectively, which was significantly better than pTARGET (60%), ProtComp8 (56%) and PA-SUB v2.5 (54%). Prediction accuracy was found to be associated with the specific localization site. Abbreviations: Nuc nucleus, Cyto cytoplasm, PM plasma membrane, ER endoplasmic reticulum, Lyso lysosome and endosome. *For the proteins with dual localization sites, all five of the predictors predicted only one site but such predictions were still counted as a full correct prediction

Fig. 2

Comparison of the prediction performances of four computational predictors with low resolution. The recently developed predictors were found to have similar prediction accuracies, with 75% (MultiLoc2-LowReso, ESLPred2) and 71% (BaCelLo) agreement with the experimental data (indicated as Hek). A relatively low percentage of positive prediction, 60%, was observed for SubLoc, which was developed in 2001. Prediction accuracy was found to be associated with the specific localization site. Abbreviations: Nuc nucleus, Cyto cytoplasm, Secr. path. secretory pathway protein (including plasma membrane, ER, Golgi and lysosomal proteins in this study)

The high-resolution predictors were found to have huge differences in accuracy. SherLoc2 and WoLF-PSORT displayed the highest accuracy, at 83 and 75%, respectively, which was significantly better than pTARGET (60%), ProtComp8 (56%) and PA-SUB v2.5 (54%). This variation in performance may originate from the different prediction methods that each program utilizes. 
There is a commonality among the two best predictors in both resolution groups (MultiLoc2 and ESLPred2, and SherLoc2 and WoLF-PSORT) in that they all utilize a wide range of prediction methods based on amino acid sequence composition, sorting signals and homology similarity. This finding indicates that the combination of homology information with sequence-based prediction can greatly improve the accuracy of protein localization prediction. On the other hand, the low success rate of PA-SUB (54%) suggested that searching for the localization of homologs alone is not powerful enough to create a high-standard prediction. The main problem of an approach based only on homology is that the prediction results can be ambiguous if there are no homologous proteins available with annotated localizations. In this study the localization of 10 out of 52 proteins could not be predicted using PA-SUB. This incompleteness creates a significant challenge when using homolog-based programs for genome-wide predictions of protein localization. To evaluate whether prediction performance was associated with the specific localization site, the prediction results were grouped into different categories based on the experimental localization results. The number of predictions consistent with the experimental data was counted for each localization category and is shown in Figs. 1, 2. For the low-resolution predictors, the localization sites appeared to be irrelevant to prediction performance; the only exception was SubLoc, which could only predict seven out of 16 cytoplasmic proteins, a much smaller number than obtained with the other three programs. The performance similarity of these programs seemed reasonable because about 30% of the test proteins fell into the secretory pathway category. When we looked at the data from the high-resolution predictors, the prediction accuracies were found to be closely correlated with the localization sites. 
For example, PA-SUB showed high accuracy in predicting cytoplasmic proteins (13 out of 16) but failed to predict any of the 12 plasma membrane proteins, of which over 80% could be predicted by the other four predictors. ProtComp8 and pTARGET, on the other hand, tended to have lower accuracy in predicting cytoplasmic proteins, scoring below 40%. A different trend was observed for the prediction of ER proteins. Interestingly, in spite of the existence of a signal peptide (SP), the first and most extensively studied protein sorting signal, all five of the predictors tended to miss the proteins residing in the ER. Instead, the ER proteins (e.g., C21orf69 and TMPRSS3a) were often misclassified as extracellular secretory and plasma membrane proteins. This is very likely due to the biological fact that most secretory and plasma membrane proteins also carry an SP in their amino acid sequences.
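The per-site breakdown described above amounts to grouping proteins by their experimental localization category and counting hits within each group. A minimal sketch follows, assuming each record has already been scored as hit or miss; the function name `tally_by_category` is hypothetical, and the six records are the SubLoc results for CBS, CCT8, C21orf103, PDXK, ETS2 and DSCR3 from Table 2, used only as a small illustrative subset.

```python
from collections import defaultdict


def tally_by_category(records):
    """Count hits per experimental localization category.

    records: iterable of (category, is_hit) pairs
    returns: {category: (hits, total)}
    """
    counts = defaultdict(lambda: [0, 0])
    for category, is_hit in records:
        counts[category][0] += int(is_hit)
        counts[category][1] += 1
    return {cat: (hits, total) for cat, (hits, total) in counts.items()}


# SubLoc on six proteins from Table 2 (subset, for illustration only)
subloc_records = [
    ("Cyto", True),   # CBS: predicted Cyto
    ("Cyto", True),   # CCT8: predicted Cyto
    ("Cyto", False),  # C21orf103: predicted Nuc
    ("Cyto", False),  # PDXK: predicted Secr. Path.
    ("Nuc", True),    # ETS2: predicted Nuc
    ("Nuc", False),   # DSCR3: predicted Cyto
]

for category, (hits, total) in tally_by_category(subloc_records).items():
    print(f"{category}: {hits}/{total}")
```

Run over all 52 proteins and all predictors, a tally of this kind would yield the per-category counts plotted in Figs. 1, 2.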

Discussion

The localization site-dependent performance shown by the different prediction programs may be attributable to the different prediction strategies utilized by each particular program and the level of knowledge available about protein trafficking mechanisms. For example, the sequence and structure of the signal peptide (SP), a motif that directs proteins to the ER membrane, are well studied as compared to nuclear localization signals (NLS), thus facilitating the prediction of proteins destined for the ER-associated secretory pathway (e.g., ER, Golgi, plasma membrane, lysosome/endosome and secretory proteins). This contributes to the high accuracy of low-resolution predictors that do not distinguish between specific localization sites within the pathway. For the high-resolution predictors, however, difficulties remain regarding how to classify the different organelles in relation to the secretory pathway. Hence, further studies on protein targeting motifs and their underlying mechanisms should contribute to the improvement of the accuracy of protein localization predictions. The present results demonstrate that prediction performance varies between different programs and different localization categories. Consequently, it might be advisable to use multiple localization predictors that utilize different prediction methods. Moreover, special attention should be paid to the relative confidence scores assigned to the different localization sites. Generally, a large difference between the second best score and the best one implies a reliable prediction, whereas similar scores obtained for different locations may reflect the unreliability of the prediction or may indicate that the protein has multiple localization patterns. A good example of this in our study is the C21orf7 protein. The C21orf7 (TAK1-like) gene shares homology with the human TAK1 (TGF-beta activated kinase) gene, which plays a critical role in the TGF-beta signal transduction pathway. 
Even though it was classified as a cytoplasmic protein by most of the predictors, ESLPred2 predicted the nucleus as the most plausible localization site; moreover, WoLF-PSORT suggested a dual localization in the cytoplasm and nucleus with 19.8% probability, second to a 24% probability of localization in the cytoplasm alone. In our previous transfected-cell array experiments (Hu et al. 2006), the actual localization of this protein was found to be quite dynamic, with a distribution in both the cytoplasm and the nucleus.

In some cases the predictions may still be incorrect even though the majority of the predictors report the same localization. In this study the actual localization of several proteins was in disagreement with most of the predictions. For example, the WDR4 gene encodes a member of the WD-repeat protein family and is a candidate for some disorders mapped to 21q22.3 and for Down syndrome phenotypes (Michaud et al. 2000). Although BaCelLo and ESLPred2 predicted it to be a nuclear protein, the other seven programs predicted that it is either a cytoplasmic protein or exported outside of the cell. In the actual experiment, WDR4 proteins were found to reside in the nucleus, distributed within the nucleoplasm. The yeast homolog of WDR4, Trm82, has been previously reported to be required for 7-methylguanosine modification of tRNA (Alexandrov et al. 2002). Because this pre-tRNA processing is known to take place in the nucleoplasm before the resulting mature tRNAs are transported out to the cytoplasm (Lodish et al. 2000), Trm82 was expected to localize in the nucleus, especially in the nucleoplasm, as we observed for WDR4. Although the functional role of WDR4 in human cells has not been experimentally verified, Alexandrov et al. have found that WDR4, in a complex with METTL1, is required for the 7-methylguanosine modification of yeast tRNA (Alexandrov et al. 2002). In conjunction with our localization results, this finding suggests that human WDR4 performs a tRNA-processing function similar to that of its yeast homolog.

Taken together, despite the relatively small number of proteins analyzed in this study, our results indicate a generally lower percentage of prediction accuracy (54–83%) than claimed by recently published predictors; for instance, ESLPred2 was claimed to have an accuracy of over 90% (Garg and Raghava 2008). Nevertheless, SherLoc2, MultiLoc2, ESLPred2 and WoLF-PSORT showed significantly better performance than the other programs evaluated in our study. The predictors that showed the best performance were SherLoc2 and WoLF-PSORT. Both programs can carry out high-resolution predictions of at least nine subcellular localizations, which is an extra merit in addition to their high prediction accuracy. Their outstanding capabilities are likely related to the multi-dimensional biological information they integrate into their prediction strategies, ranging from amino acid composition and the presence of sorting signals and targeting motifs to homology profiles and Gene Ontology terms. Overall, the differences in the accuracy of subcellular protein localization predictions presented in this study strongly suggest that the outcomes of in silico localization predictions should be treated with caution, and that it is always beneficial to compare the results provided by different prediction algorithms.
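The two practical recommendations in the Discussion, checking the gap between the best and second-best score and comparing several predictors, can be sketched as follows. This is a hedged illustration: the function names and the site labels are assumptions, no threshold for a "reliable" margin is implied by the study, and only the two WoLF-PSORT values quoted in the text (24% and 19.8%) are used. The consensus example takes the four low-resolution predictions for TAK1L from Table 2.

```python
from collections import Counter


def top_two_margin(scores):
    """Return (best_site, best_score - second_best_score).

    scores: {site: score}; needs at least two scored sites.
    A small margin may indicate an unreliable call or a protein
    with more than one localization.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two scored sites")
    (best_site, best), (_, second) = ranked[0], ranked[1]
    return best_site, best - second


def consensus(predictions):
    """Return (most common site, fraction of predictors agreeing).

    predictions: {predictor_name: predicted_site}
    """
    votes = Counter(predictions.values())
    site, n = votes.most_common(1)[0]
    return site, n / len(predictions)


# WoLF-PSORT values for C21orf7 as quoted in the text (other sites omitted)
site, margin = top_two_margin({"Cyto": 24.0, "Cyto_Nuc": 19.8})
print(site, round(margin, 1))  # Cyto 4.2

# Low-resolution predictions for TAK1L from Table 2
tak1l = {
    "MultiLoc2-LowRes": "Cyto",
    "ESLPred2": "Nuc",
    "BaCelLo": "Cyto",
    "SubLoc": "Secr. Path.",
}
print(consensus(tak1l))  # ('Cyto', 0.5)
```

As the WDR4 example in the text shows, a majority vote can still be wrong, so a consensus of this kind is best read as a prioritisation aid and not as a substitute for experimental localization.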

References

1. S Hua, Z Sun. Support vector machine approach for protein subcellular localization prediction. Bioinformatics, 2001-08.

2. Hideo Bannai, Yoshinori Tamada, Osamu Maruyama, Kenta Nakai, Satoru Miyano. Extensive feature detection of N-terminal protein sorting signals. Bioinformatics, 2002-02.

3. J Michaud, J Kudoh, A Berry, B Bonne-Tamir, M D Lalioti, C Rossier, K Shibuya, K Kawasaki, S Asakawa, S Minoshima, N Shimizu, S E Antonarakis, H S Scott. Isolation and characterization of a human chromosome 21q22.3 gene (WDR4) and its mouse homologue that code for a WD-repeat protein. Genomics, 2000-08-15.

4. Z Lu, D Szafron, R Greiner, P Lu, D S Wishart, B Poulin, J Anvik, C Macdonell, R Eisner. Predicting subcellular localization of proteins using machine-learned classifiers. Bioinformatics, 2004-01-22.

5. Jennifer L Garrison, Eric J Kunkel, Ramanujan S Hegde, Jack Taunton. A substrate-specific inhibitor of protein translocation into the endoplasmic reticulum. Nature, 2005-07-14.

6. Sebastian Briesemeister, Torsten Blum, Scott Brady, Yin Lam, Oliver Kohlbacher, Hagit Shatkay. SherLoc2: a high-accuracy hybrid method for predicting subcellular localization of proteins. J Proteome Res, 2009-11.

7. Jürgen Besemer, Hanna Harant, Shirley Wang, Berndt Oberhauser, Katharina Marquardt, Carolyn A Foster, Erwin P Schreiner, Jan E de Vries, Christiane Dascher-Nadel, Ivan J D Lindley. Selective inhibition of cotranslational translocation of vascular cell adhesion molecule 1. Nature, 2005-07-14.

8. S Surapureddi, J Svartz, K E Magnusson, S Hammarström, M Söderström. Colocalization of leukotriene C synthase and microsomal glutathione S-transferase elucidated by indirect immunofluorescence analysis. FEBS Lett, 2000-09-01.

9. B Relić, M Andjelković, L Rossi, Y Nagamine, B Hohn. Interaction of the DNA modifying proteins VirD1 and VirD2 of Agrobacterium tumefaciens: analysis by subcellular localization in mammalian cells. Proc Natl Acad Sci U S A, 1998-08-04.

10. Aarti Garg, Gajendra P S Raghava. ESLpred2: improved method for predicting subcellular localization of eukaryotic proteins. BMC Bioinformatics, 2008-11-28.