Linlin Zhao, Gereon Poschmann, Daniel Waldera-Lupa, Nima Rafiee, Markus Kollmann, Kai Stühler.
Abstract
The prediction of protein localization, such as in the extracellular space, from high-throughput data is essential for functional downstream inference. It is well accepted that some secreted proteins go through the classic endoplasmic reticulum-Golgi pathway with the guidance of a signal peptide. However, a large number of proteins have been found to reach the extracellular space by following unconventional secretory pathways. There remains a demand for reliable prediction of unconventional protein secretion (UPS). Here, we present OutCyte, a fast and accurate tool for the prediction of UPS, which for the first time has been built upon experimentally determined UPS proteins. OutCyte mediates the prediction of protein secretion in two steps: first, proteins with N-terminal signals are accurately filtered out; second, proteins without N-terminal signals are classified as UPS or intracellular proteins based on physicochemical features directly generated from their amino acid sequences. We are convinced that OutCyte will play a relevant role in the annotation of experimental data and will therefore contribute to further characterization of the extracellular nature of proteins by considering the commonly neglected UPS proteins. OutCyte has been implemented as a web server at www.outcyte.com.
Year: 2019 PMID: 31857603 PMCID: PMC6923414 DOI: 10.1038/s41598-019-55351-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The OutCyte framework is an integrated predictive tool for signal peptide-containing proteins and unconventionally secreted proteins. OutCyte-SP classifies input proteins into three categories: proteins with a signal peptide, proteins with a transmembrane domain at the N-terminus, or proteins not belonging to these two classes. Proteins in the third category are further analysed by OutCyte-UPS, which has been trained on experimentally determined secreted proteins and classifies input proteins as intracellular or unconventionally secreted.
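The two-stage cascade described in this caption can be sketched as follows. This is a minimal illustration, not OutCyte's actual code: the label strings and the `sp_model` and `ups_model` callables are hypothetical stand-ins for OutCyte-SP and OutCyte-UPS.

```python
from typing import Callable

# Hypothetical label names, chosen for illustration only.
SIGNAL_PEPTIDE = "signal_peptide"
N_TERMINAL_TM = "n_terminal_transmembrane"
NEITHER = "neither"


def classify_protein(
    sequence: str,
    sp_model: Callable[[str], str],
    ups_model: Callable[[str], bool],
) -> str:
    """Run a two-stage cascade over one amino acid sequence.

    Stage 1 (stand-in for OutCyte-SP) screens for N-terminal signals.
    Stage 2 (stand-in for OutCyte-UPS) is applied only to sequences
    that stage 1 assigns to neither N-terminal class.
    """
    first_stage = sp_model(sequence)
    if first_stage in (SIGNAL_PEPTIDE, N_TERMINAL_TM):
        return first_stage
    return "unconventionally_secreted" if ups_model(sequence) else "intracellular"
```

Because the second model only ever sees sequences without N-terminal signals, it can be trained and evaluated on that subset alone, which matches the filtering order given in the caption.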
Figure 2. The OutCyte-SP model and its predictions. (a) The structure of the convolutional neural network for learning the motifs at the N-termini of sequences. The network consists of two convolutional layers, which use ReLU transformations and no pooling. A max pooling layer follows to extract the strongest distinguishing features, followed by the dense and softmax layers. (b) Matthews correlation coefficients (MCCs) for signal peptide identification from three datasets are shown in the left panel. In the right panel, micro-averaged MCCs were calculated for OutCyte-SP and DeepSig on the two evaluation datasets. *The SignalP 5.0 training dataset overlapped with the SignalP 4.0 benchmark set; thus, two MCCs were not included. (c) Intersections among four different annotations for signal-peptide-containing proteins in the human proteome from OutCyte-SP, UniProt (with evidence), SignalP 4.1 and DeepSig.
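For reference, the Matthews correlation coefficient reported in panel (b) can be computed from confusion-matrix counts as sketched below. The binary formula is standard. The micro-averaging shown here, which pools one-vs-rest counts across classes before computing a single MCC, is one common reading and is an assumption; the caption does not state how the averaging was done.

```python
import math
from typing import Iterable, Tuple

Counts = Tuple[int, int, int, int]  # (tp, tn, fp, fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Binary Matthews correlation coefficient from confusion counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a row or column of the matrix is empty.
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


def micro_averaged_mcc(per_class_counts: Iterable[Counts]) -> float:
    """Pool one-vs-rest counts over all classes, then compute one MCC.

    Assumption: this pooling is one way to micro-average; other
    definitions exist.
    """
    tp = tn = fp = fn = 0
    for class_tp, class_tn, class_fp, class_fn in per_class_counts:
        tp += class_tp
        tn += class_tn
        fp += class_fp
        fn += class_fn
    return mcc(tp, tn, fp, fn)
```

An MCC of 1 indicates perfect agreement, 0 indicates chance-level prediction, and -1 indicates complete disagreement, which makes the measure suitable for class-imbalanced datasets.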
Figure 3. Prediction of unconventional protein secretion by OutCyte-UPS. (a) Eight features were identified as important for the classification of unconventionally secreted proteins. Important features include a high frequency of arginine residues and positively charged amino acids, which have previously been associated with the membrane transition of proteins. (b) Cross-validated training curve for XGBoost-based OutCyte-UPS. (c) An independent data set containing experimentally verified UPS proteins as well as the top 20 intracellular proteins from experimental data was used for performance comparison. Here, OutCyte-UPS showed improved performance compared to SecretomeP. (d) The OutCyte pipeline was applied to all 20,170 proteins from the human proteome: OutCyte-SP classified 6,077 proteins as containing either an N-terminal signal peptide or a transmembrane domain. The remaining 14,254 proteins were then used for OutCyte-UPS prediction of unconventionally secreted proteins. Finally, 3,475 human proteins were predicted to be unconventionally secreted.
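Sequence-derived features of the kind named in panel (a) can be computed directly from the amino acid string. The sketch below is illustrative only: the function name is hypothetical, and treating lysine and arginine (but not histidine) as the positively charged residues is an assumption, since the caption does not define the feature exactly.

```python
from typing import Iterable

# Assumption: K and R count as positively charged; histidine is excluded.
POSITIVELY_CHARGED = frozenset("KR")


def residue_fraction(sequence: str, residues: Iterable[str]) -> float:
    """Return the fraction of positions occupied by any of the given residues."""
    normalized = sequence.upper()
    if not normalized:
        raise ValueError("sequence must not be empty")
    wanted = set(residues)
    return sum(amino_acid in wanted for amino_acid in normalized) / len(normalized)


# Example on a toy sequence: arginine frequency and positive-charge fraction.
toy_sequence = "RRKA"
arginine_frequency = residue_fraction(toy_sequence, "R")
positive_fraction = residue_fraction(toy_sequence, POSITIVELY_CHARGED)
```

Values like these would form part of the feature vector passed to a gradient-boosted classifier such as XGBoost; the remaining features and the model configuration are not specified in the caption and are not reproduced here.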