Yasser El-Manzalawy, Elyse E Munoz, Scott E Lindner, Vasant Honavar.
Abstract
Accurate and comprehensive identification of surface-exposed proteins (SEPs) in parasites is a key step in developing novel subunit vaccines. However, the reliability of MS-based high-throughput methods for proteome-wide mapping of SEPs continues to be limited due to high rates of false positives (i.e., proteins mistakenly identified as surface exposed) as well as false negatives (i.e., SEPs not detected due to low expression or other technical limitations). We propose a framework called PlasmoSEP for the reliable identification of SEPs using a novel semisupervised learning algorithm that combines SEPs identified by high-throughput experiments and expert annotation of high-throughput data to augment labeled data for training a predictive model. Our experiments using high-throughput data from the Plasmodium falciparum surface-exposed proteome provide several novel high-confidence predictions of SEPs in P. falciparum and also confirm expert annotations for several others. Furthermore, PlasmoSEP predicts that 25 of 37 experimentally identified SEPs in Plasmodium yoelii salivary gland sporozoites are likely to be SEPs. Finally, PlasmoSEP predicts several novel SEPs in P. yoelii and Plasmodium vivax malaria parasites that can be validated for further vaccine studies. Our computational framework can be easily adapted to improve the interpretation of data from high-throughput studies.
Keywords: Bioinformatics; Malaria; Plasmodium; Predicting surface-exposed proteins; Semi-supervised learning; Surface-exposed proteomics
Year: 2016 PMID: 27714937 PMCID: PMC5600274 DOI: 10.1002/pmic.201600249
Source DB: PubMed Journal: Proteomics ISSN: 1615-9853 Impact factor: 3.984
Figure 1. Flowchart of PlasmoSEP framework for integrating proteomics studies, expert annotations, bioinformatics tools, and semisupervised learning for accurate identification of SEPs in the malaria parasite (Plasmodium spp.).
Figure 2. AUC comparisons between supervised learning (SL) and semisupervised learning (SSL), Algorithm 1, using NB (top) and RF100 (bottom) as supervised and base classifiers.
Figure 3. AUC comparisons between basic SSL and our proposed SSL (SSL_k%) with k% noise in potentially labeled data using NB (top) and RF100 (bottom) as base classifiers.
List of predicted P. falciparum SEPs with maximum score ≥ 0.70 from the set of expert annotated unknown SEPs
| ID | Name | PlasmoSEP | SignalP | Antigenicity | Max_score |
|---|---|---|---|---|---|
| PF3D7_1462800 | Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) | 1.00 | 0.17 | 0.35 | 1.00 |
| PF3D7_0818900 | Heat shock protein 70 (HSP70) | 1.00 | 0.11 | 0.56 | 1.00 |
| PF3D7_1444800 | Fructose-bisphosphate aldolase (FBPA) | 1.00 | 0.10 | 0.25 | 1.00 |
| PF3D7_0903700 | Alpha tubulin 1 | 1.00 | 0.14 | 0.38 | 1.00 |
| PF3D7_0922200 | S-adenosylmethionine synthetase (SAMS) | 1.00 | 0.11 | 0.32 | 1.00 |
| PF3D7_0627500 | 4-Methyl-5(B-hydroxyethyl)-thiazol monophosphate biosynthesis enzyme | 1.00 | 0.12 | 0.60 | 1.00 |
| PF3D7_1140400 | Conserved Plasmodium protein, unknown function | 1.00 | 0.10 | 0.62 | 1.00 |
| PF3D7_1133400 | Apical membrane antigen 1 (AMA1) | 0.00 | 0.55 | 0.91 | 0.91 |
| PF3D7_1235700 | ATP synthase subunit beta, mitochondrial | 0.90 | 0.15 | 0.27 | 0.90 |
| PF3D7_0826700 | Receptor for activated c kinase (RACK) | 0.90 | 0.10 | 0.41 | 0.90 |
| PF3D7_0620000 | Conserved Plasmodium protein, unknown function | 0.00 | 0.87 | 0.58 | 0.87 |
| PF3D7_1335900 | Sporozoite surface protein 2 (TRAP) | 0.10 | 0.85 | 0.30 | 0.85 |
| PF3D7_1028600 | Conserved Plasmodium protein, unknown function | 0.00 | 0.10 | 0.85 | 0.85 |
| PF3D7_0812300 | Conserved Plasmodium protein, unknown function | 0.00 | 0.84 | 0.39 | 0.84 |
| PF3D7_0917900 | Heat shock protein 70 (HSP70-2) | 0.10 | 0.84 | 0.48 | 0.84 |
| PF3D7_0513300 | Purine nucleoside phosphorylase (PNP) | 0.80 | 0.12 | 0.56 | 0.80 |
| PF3D7_0524000 | Karyopherin beta (KASbeta) | 0.00 | 0.10 | 0.78 | 0.78 |
| PF3D7_0708400 | Heat shock protein 90 (HSP90) | 0.10 | 0.13 | 0.75 | 0.75 |
| PF3D7_0827900 | Protein disulfide isomerase (PDI8) | 0.00 | 0.74 | 0.35 | 0.74 |
| PF3D7_1361800 | Conserved Plasmodium protein, unknown function | 0.00 | 0.11 | 0.70 | 0.70 |
| PF3D7_0922500 | Phosphoglycerate kinase (PGK) | 0.00 | 0.10 | 0.70 | 0.70 |
| PF3D7_0320300 | T-complex protein 1 epsilon subunit, putative | 0.00 | 0.10 | 0.70 | 0.70 |
| PF3D7_1037300 | ADP/ATP transporter on adenylate translocase | 0.70 | 0.19 | 0.19 | 0.70 |
List of 37 identified SEPs in P. yoelii salivary gland sporozoites using MS experiments and their predicted PlasmoSEP, SignalP, and antigenicity scores
| ID | Name | PlasmoSEP | SignalP | Antigenicity | Max_score |
|---|---|---|---|---|---|
| PY17X_1330200 | Glyceraldehyde-3-phosphate dehydrogenase, putative (GAPDH) | 1.00 | 0.14 | 0.58 | 1.00 |
| PY17X_0712100 | Heat shock protein, putative (HSP70) | 1.00 | 0.11 | 0.62 | 1.00 |
| PY17X_1007600 | Sporozoite invasion-associated protein 1 (SIAP1) | 1.00 | 0.87 | 0.40 | 1.00 |
| PY17X_1312400 | Fructose-bisphosphate aldolase 2 (ALDO2) | 1.00 | 0.10 | 0.31 | 1.00 |
| PY17X_0420500 | Alpha tubulin 1 | 1.00 | 0.14 | 0.45 | 1.00 |
| PY17X_1354800 | Sporozoite surface protein 2, thrombospondin-related anonymous protein (TRAP) | 0.50 | 0.83 | 0.97 | 0.97 |
| PY17X_1007700 | Perforin-like protein 1, sporozoite micronemal protein essential for cell traversal (SPECT2) | 0.80 | 0.64 | 0.56 | 0.80 |
| PY17X_1461900 | Actin I | 0.80 | 0.10 | 0.52 | 0.80 |
| PY17X_0835500 | Conserved Plasmodium protein, unknown function | 0.20 | 0.67 | 0.79 | 0.79 |
| PY17X_0702200 | Secreted ookinete protein, putative, GPI-anchored micronemal antigen, putative (GAMA) | 0.00 | 0.76 | 0.48 | 0.76 |
| PY17X_1427200 | Conserved Plasmodium protein, unknown function | 0.10 | 0.74 | 0.54 | 0.74 |
| PY17X_0210500 | Thrombospondin related sporozoite protein, putative (TRSP) | 0.30 | 0.72 | 0.31 | 0.72 |
| PY17X_1210100 | Tubulin beta chain, putative | 0.70 | 0.10 | 0.44 | 0.70 |
| PY17X_0405400 | Circumsporozoite (CS) protein (CSP) | 0.70 | 0.68 | 0.70 | 0.70 |
| PY17X_1037800 | Glideosome associated protein with multiple membrane spans 3, putative (GAPM3) | 0.70 | 0.12 | 0.14 | 0.70 |
| PY17X_0902700.1 | Merozoite adhesive erythrocytic binding protein (MAEBL) | 0.30 | 0.69 | 0.44 | 0.69 |
| PY17X_0826700 | Phosphoglycerate kinase, putative (PGK) | 0.00 | 0.10 | 0.68 | 0.68 |
| PY17X_0912300 | Conserved Plasmodium protein, unknown function | 0.40 | 0.12 | 0.68 | 0.68 |
| PY17X_0404800 | Inner membrane complex protein 1a (IMC1a) | 0.20 | 0.11 | 0.67 | 0.67 |
| PY17X_1439800 | Endoplasmin, putative (GRP94) | 0.10 | 0.65 | 0.47 | 0.65 |
| PY17X_1217500 | Enolase, putative (ENO) | 0.50 | 0.11 | 0.64 | 0.64 |
| PY17X_1316500 | Gamete egress and sporozoite traversal protein, putative (GEST) | 0.20 | 0.63 | 0.46 | 0.63 |
| PY17X_1034500 | Rhoptry-associated protein 1, putative (RAP1) | 0.00 | 0.62 | 0.22 | 0.62 |
| PY17X_0910400 | Carbonic anhydrase, putative | 0.20 | 0.54 | 0.61 | 0.61 |
| PY17X_1134900 | Elongation factor 1-alpha, putative | 0.60 | 0.12 | 0.45 | 0.60 |
| PY17X_0703100 | Protein disulfide isomerase, putative | 0.10 | 0.59 | 0.49 | 0.59 |
| PY17X_0404900 | Membrane skeletal protein, putative | 0.10 | 0.10 | 0.55 | 0.55 |
| PY17X_0525300 | Glideosome associated protein with multiple membrane spans 2, putative (GAPM2) | 0.50 | 0.10 | 0.27 | 0.50 |
| PY17X_1361400 | Myosin A (MyoA) | 0.20 | 0.10 | 0.40 | 0.40 |
| PY17X_0303100 | Hexose transporter (HT) | 0.40 | 0.13 | 0.24 | 0.40 |
| PY17X_0712800 | 14-3-3 Protein, putative (14-3-3I) | 0.10 | 0.10 | 0.32 | 0.32 |
| PY17X_0706500 | Nucleoside transporter, putative (NT2) | 0.20 | 0.11 | 0.32 | 0.32 |
| PY17X_1424900 | Conserved Plasmodium protein, unknown function | 0.10 | 0.11 | 0.30 | 0.30 |
| PY17X_0823700 | Sugar transporter, putative | 0.20 | 0.30 | 0.19 | 0.30 |
| PY17X_0514100 | Conserved Plasmodium protein, unknown function | 0.00 | 0.10 | 0.22 | 0.22 |
| PY17X_1143100 | 60S ribosomal protein L40/UBI, putative | 0.00 | 0.12 | 0.09 | 0.12 |
| PY17X_1118200 | Histone H3 variant, putative (H3.3) | 0.00 | 0.10 | 0.03 | 0.10 |
Our approach confirms the first 25 proteins in the table as SEPs, each with a maximum predicted score ≥ 0.60.
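The selection rule behind the two tables can be illustrated with a short sketch. It assumes that the last column of each table is simply the largest of the PlasmoSEP, SignalP, and antigenicity scores, a reading inferred from the table values rather than stated in the record, and it uses only five rows copied from the P. yoelii table. The names `ROWS`, `max_score`, and `likely_seps` are hypothetical and are not part of PlasmoSEP.

```python
# Scores copied from five rows of the P. yoelii table:
# (PlasmoSEP, SignalP, antigenicity)
ROWS = {
    "PY17X_1330200": (1.00, 0.14, 0.58),  # GAPDH
    "PY17X_1354800": (0.50, 0.83, 0.97),  # TRAP
    "PY17X_1134900": (0.60, 0.12, 0.45),  # Elongation factor 1-alpha
    "PY17X_0703100": (0.10, 0.59, 0.49),  # Protein disulfide isomerase
    "PY17X_1118200": (0.00, 0.10, 0.03),  # Histone H3 variant
}

THRESHOLD = 0.60  # cutoff quoted in the note under the table


def max_score(scores):
    """Return the largest per-method score for one protein.

    Assumption: this mirrors how the Max column was derived.
    """
    return max(scores)


def likely_seps(rows, threshold=THRESHOLD):
    """Return protein IDs whose maximum score meets the threshold, best first."""
    ranked = sorted(rows, key=lambda pid: max_score(rows[pid]), reverse=True)
    return [pid for pid in ranked if max_score(rows[pid]) >= threshold]


selected = likely_seps(ROWS)
for pid in selected:
    print(pid, max_score(ROWS[pid]))
```

On these five rows the sketch keeps GAPDH, TRAP, and elongation factor 1-alpha and drops the other two, which matches where the ≥ 0.60 cutoff falls in the full table.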