Andrea Pierleoni, Valentina Indio, Castrense Savojardo, Piero Fariselli, Pier Luigi Martelli, Rita Casadio.
Abstract
MemPype is a Python-based pipeline that combines previously published methods for predicting signal peptides (SPEP), glycophosphatidylinositol (GPI) anchors (PredGPI) and all-alpha membrane topology (ENSEMBLE) with a recent method (MemLoci) that discriminates the localization of eukaryotic membrane proteins into three classes: 'cell membrane', 'internal membranes' and 'organelle membranes'. MemLoci reaches an accuracy of 70% and a generalized correlation coefficient (GCC) of 0.50 on a rigorous homology-unbiased validation set and outperforms other predictors of subcellular localization. Annotation relies on both inheritance through homology and computational prediction. For each submitted protein, MemPype first retrieves, when available, up to 25 similar proteins (sequence identity ≥50% and alignment coverage ≥50% on both sequences), which helps to identify membrane-associated proteins and to assign detailed localization tags. Each protein is also screened for the presence of a GPI anchor [0.8% false positive rate (FPR)]; a positive GPI-anchor prediction labels the sequence as exposed to the 'Cell surface'. Concomitantly, the sequence is analysed for the presence of a signal peptide and classified by MemLoci into one of the three localization classes. Finally, the putative all-alpha membrane topology of the sequence is predicted (FPR <1%). The web server is available at: http://mu2py.biocomp.unibo.it/mempype.
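The homology step described in the abstract can be illustrated with a minimal sketch. It is not MemPype's actual code: the `Hit` record, its field names and the function `select_similar` are hypothetical, and ranking the surviving hits by identity is an assumption, since the abstract does not say how hits are ordered. Only the three thresholds (identity ≥50%, coverage ≥50% on both sequences, at most 25 proteins) come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """One pairwise alignment between the query and a database protein (hypothetical record)."""
    subject_id: str
    identity: float        # fraction of identical aligned positions, 0..1
    aligned_query: int     # query residues covered by the alignment
    aligned_subject: int   # subject residues covered by the alignment
    query_len: int         # full length of the query sequence
    subject_len: int       # full length of the subject sequence


def select_similar(hits: List[Hit],
                   min_identity: float = 0.50,
                   min_coverage: float = 0.50,
                   max_hits: int = 25) -> List[Hit]:
    """Keep hits that pass the identity threshold and cover enough of BOTH sequences."""
    kept = [
        h for h in hits
        if h.identity >= min_identity
        and h.aligned_query / h.query_len >= min_coverage
        and h.aligned_subject / h.subject_len >= min_coverage
    ]
    # Assumption: rank by identity; the abstract does not specify an ordering.
    kept.sort(key=lambda h: h.identity, reverse=True)
    return kept[:max_hits]
```

Requiring coverage on both sequences, rather than on the query alone, excludes hits where a short query matches only a small domain of a much longer protein.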
Year: 2011 PMID: 21543452 PMCID: PMC3125734 DOI: 10.1093/nar/gkr282
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Workflow of the MemPype annotation pipeline. MemPype performs annotation with homology search and prediction tools. See text for further details.
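The prediction part of the workflow, as the abstract describes it, can be sketched as a single orchestration function. This is only an illustration under stated assumptions: the function `annotate`, its parameter names and the result keys are hypothetical, and the four predictors are passed in as plain callables because the real interfaces of SPEP, PredGPI, MemLoci and ENSEMBLE are not given here.

```python
from typing import Callable, Dict, Optional


def annotate(sequence: str,
             predict_gpi: Callable[[str], bool],
             predict_signal_peptide: Callable[[str], bool],
             predict_localization: Callable[[str], str],
             predict_topology: Callable[[str], Optional[str]]) -> Dict[str, object]:
    """Run the four prediction steps on one sequence and collect the results.

    All callables are hypothetical stand-ins for PredGPI, SPEP, MemLoci
    and ENSEMBLE; their signatures are assumptions made for this sketch.
    """
    result: Dict[str, object] = {}

    # GPI-anchor screen: a positive prediction tags the protein as 'Cell surface'.
    gpi = predict_gpi(sequence)
    result["gpi_anchor"] = gpi
    if gpi:
        result["exposure"] = "Cell surface"

    # Signal-peptide detection and three-class localization are run on every sequence.
    result["signal_peptide"] = predict_signal_peptide(sequence)
    result["localization"] = predict_localization(sequence)

    # Final step: putative all-alpha membrane topology (None if no topology is predicted).
    result["topology"] = predict_topology(sequence)
    return result
```

Passing the predictors as arguments keeps the sketch independent of any particular tool and makes it easy to exercise with stub functions.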
Figure 2. MemPype output results. Two outputs are returned: (i) a list of at most 25 proteins sharing sequence identity ≥50% with the query over an alignment covering ≥50% of both sequence lengths (when available); both keywords and GO terms can be transferred to the query sequence on the basis of sequence similarity. (ii) A list of all the predicted features, including signal peptide [with SPEP (7)], GPI anchor [with PredGPI (8)], all-alpha TM topology [with ENSEMBLE3.0 (9)] and subcellular localization [with MemLoci (4)]. See text for further details.
Performance of the predictors included in MemPype on previously unseen validation sets
| Method | Blind validation set | Sen, % | Sp, % | FPR, % | Acc, % | CC |
|---|---|---|---|---|---|---|
| SPEP | 543 proteins with SP | 89 | 95 | 3 | 93 | 0.87 |
| | 744 proteins without SP | 97 | 91 | 11 | | |
| PredGPI | 19 GPI-anchored proteins | 89 | 85 | 0.8 | 99 | 0.87 |
| | 391 non-GPI-anchored proteins | 99 | 99 | 11 | | |
| ENSEMBLE3.0 | 15 TM proteins | 100 | 83 | 0.4 | 99 | 0.91 |
| | 208 non-TM proteins | 99 | 100 | 0 | | |
| MemLoci | 32 CM proteins | 56 | 75 | 9 | 70 | 0.50 |
| | 18 OM proteins | 50 | 56 | 9 | | |
| | 50 IM proteins | 86 | 72 | 34 | | |
(a) The validation set collects chains that the method has never seen before and that were deposited after January 2010. Predictions are scored with the following indexes: Sen: sensitivity = (no. of correctly predicted proteins in the class)/(total no. of proteins in the class); Sp: specificity = (no. of correctly predicted proteins in the class)/(total no. of proteins predicted in the class); FPR = (no. of mispredicted proteins in the class)/(total no. of proteins in the complementary class); Acc = (no. of correctly predicted proteins)/(total no. of proteins); Matthews CC is adopted for binary classifications, while GCC (b) is computed for multiclass classifications (22).
(c) IM: internal membranes, comprising the whole endomembrane system except the cell membrane. All the validation sets are available at the MemPype website in the ‘Info’ page.
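The binary indexes defined in the footnote can be computed from a 2×2 confusion matrix, as in the sketch below. The function name `binary_scores` is hypothetical. Note that Sp, as defined here, is the fraction of predictions in the class that are correct (often called precision elsewhere). The Matthews CC formula is the standard one. The GCC used for MemLoci is not covered.

```python
from math import sqrt
from typing import Dict


def binary_scores(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Scoring indexes for the positive class of a binary predictor.

    tp/fn: proteins of the class predicted in / out of the class;
    fp/tn: proteins of the complementary class predicted in / out of the class.
    """
    sen = tp / (tp + fn)                    # correct in class / total in class
    sp = tp / (tp + fp)                     # correct in class / total predicted in class
    fpr = fp / (fp + tn)                    # mispredicted in class / total in complementary class
    acc = (tp + tn) / (tp + fn + fp + tn)   # all correct / all proteins
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sen": sen, "sp": sp, "fpr": fpr, "acc": acc, "mcc": mcc}


# Assumption: these counts are back-calculated from the rounded PredGPI
# percentages in the table; the source does not report raw counts.
scores = binary_scores(tp=17, fn=2, fp=3, tn=388)
print({name: round(value, 3) for name, value in scores.items()})
```

With these assumed counts (19 GPI-anchored and 391 non-GPI-anchored proteins), the values round to Sen 89%, Sp 85%, FPR 0.8%, Acc 99% and CC 0.87, matching the PredGPI row.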