
A machine learning based method for the prediction of secretory proteins using amino acid composition, their order and similarity-search.

Aarti Garg, Gajendra P S Raghava.

Abstract

Most prediction methods for secretory proteins require a correct N-terminal end of the preprotein for correct classification. Because large-scale genome sequencing projects sometimes assign the 5'-end of genes incorrectly, many proteins are encoded without their correct N-terminus, which leads to incorrect predictions. In this study, a systematic attempt has been made to predict secretory proteins irrespective of the presence or absence of N-terminal signal peptides (known as classical and non-classical secreted proteins, respectively), using two machine-learning techniques: artificial neural networks (ANN) and support vector machines (SVM). We trained and tested our methods on a dataset of 3321 secretory and 3654 non-secretory mammalian proteins using five-fold cross-validation. First, ANN-based modules were developed for predicting secretory proteins using 33 physico-chemical properties, amino acid composition and dipeptide composition, achieving accuracies of 73.1%, 76.1% and 77.1%, respectively. Similarly, SVM-based modules using 33 physico-chemical properties, amino acid composition and dipeptide composition achieved accuracies of 77.4%, 79.4% and 79.9%, respectively. In addition, BLAST and PSI-BLAST modules designed to predict secretory proteins by similarity search achieved 23.4% and 26.9% accuracy, respectively. Finally, we developed a hybrid approach integrating the amino acid and dipeptide composition based SVM modules with the PSI-BLAST module, which increased the accuracy to 83.2%, significantly better than any individual module. The hybrid module also achieved a high sensitivity of 60.4% at a low false-positive rate of 5%. Based on this study, a web server, SRTpred, has been developed for predicting classical and non-classical secreted proteins from the whole sequence of mammalian proteins; it is available at http://www.imtech.res.in/raghava/srtpred/.
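To make the feature encodings and evaluation terms above concrete, here is a minimal sketch in Python. It assumes the common definitions of amino acid composition (percentage of each of the 20 standard residues, a 20-dimensional vector) and dipeptide composition (percentage of each of the 400 overlapping residue pairs, a 400-dimensional vector), and the standard confusion-matrix definitions of accuracy, sensitivity and false-positive rate. All function names (`amino_acid_composition`, `dipeptide_composition`, `k_fold_splits`, `classification_metrics`) are hypothetical illustrations, not SRTpred's code; the authors' exact normalization, fold assignment and SVM/ANN training are not reproduced here.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """20-dim vector: percentage of each standard residue in seq.

    Non-standard letters (e.g. X, B) are ignored and excluded from the total.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in seq.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """400-dim vector: percentage of each overlapping dipeptide in seq.

    Pairs containing a non-standard letter are skipped, so the denominator
    is the number of valid pairs (len(seq) - 1 for a clean sequence).
    """
    seq = seq.upper()
    counts = {dp: 0 for dp in DIPEPTIDES}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [100.0 * counts[dp] / total for dp in DIPEPTIDES]


def k_fold_splits(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs; sample i is tested in fold i % k."""
    for fold in range(k):
        test = [i for i in range(n_samples) if i % k == fold]
        train = [i for i in range(n_samples) if i % k != fold]
        yield train, test


def classification_metrics(tp, fn, tn, fp):
    """Percent accuracy, sensitivity, specificity and false-positive rate."""
    return {
        "accuracy": 100.0 * (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "false_positive_rate": 100.0 * fp / (fp + tn),
    }


if __name__ == "__main__":
    print(amino_acid_composition("MKKLLA")[AMINO_ACIDS.index("K")])
    print(dipeptide_composition("MKKL")[DIPEPTIDES.index("KK")])
    print(classification_metrics(tp=60, fn=40, tn=95, fp=5))
```

Under these assumptions, a sensitivity of 60.4% at a 5% false-positive rate means about 60% of secretory proteins are recovered while about 5% of non-secretory proteins (FP / (FP + TN)) are wrongly labelled secretory. The composition vectors are fixed-length regardless of protein length, which is what lets a single SVM or ANN classify whole sequences without relying on an intact N-terminus.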


Year: 2008; PMID: 18928201

Source DB: PubMed; Journal: In Silico Biol; ISSN: 1386-6338


  18 in total

1.  Prediction of GTP interacting residues, dipeptides and tripeptides in a protein from its evolutionary information.

Authors:  Jagat S Chauhan; Nitish K Mishra; Gajendra P S Raghava
Journal:  BMC Bioinformatics       Date:  2010-06-03       Impact factor: 3.169

2.  PTPAMP: prediction tool for plant-derived antimicrobial peptides.

Authors:  Mohini Jaiswal; Ajeet Singh; Shailesh Kumar
Journal:  Amino Acids       Date:  2022-07-21       Impact factor: 3.789

3.  Prediction and classification of aminoacyl tRNA synthetases using PROSITE domains.

Authors:  Bharat Panwar; Gajendra P S Raghava
Journal:  BMC Genomics       Date:  2010-09-22       Impact factor: 3.969

4.  Evolution, role in inflammation, and redox control of leaderless secretory proteins. (Review)

Authors:  Roberto Sitia; Anna Rubartelli
Journal:  J Biol Chem       Date:  2020-04-24       Impact factor: 5.157

5.  Serine protease inhibitors of the whirling disease parasite Myxobolus cerebralis (Cnidaria, Myxozoa): Expression profiling and functional predictions.

Authors:  Edit Eszterbauer; Dóra Szegő; Krisztina Ursu; Dóra Sipos; Ákos Gellért
Journal:  PLoS One       Date:  2021-03-29       Impact factor: 3.240

6.  KUPS: constructing datasets of interacting and non-interacting protein pairs with associated attributions.

Authors:  Xue-wen Chen; Jong Cheol Jeong; Patrick Dermyer
Journal:  Nucleic Acids Res       Date:  2010-10-15       Impact factor: 16.971

7.  Prediction of antimicrobial peptides based on sequence alignment and feature selection methods.

Authors:  Ping Wang; Lele Hu; Guiyou Liu; Nan Jiang; Xiaoyun Chen; Jianyong Xu; Wen Zheng; Li Li; Ming Tan; Zugen Chen; Hui Song; Yu-Dong Cai; Kuo-Chen Chou
Journal:  PLoS One       Date:  2011-04-13       Impact factor: 3.240

8.  Forest and Trees: Exploring Bacterial Virulence with Genome-wide Association Studies and Machine Learning. (Review)

Authors:  Jonathan P Allen; Evan Snitkin; Nathan B Pincus; Alan R Hauser
Journal:  Trends Microbiol       Date:  2021-01-14       Impact factor: 18.230

9.  Prediction of guide strand of microRNAs from its sequence and secondary structure.

Authors:  Firoz Ahmed; Hifzur Rahman Ansari; Gajendra P S Raghava
Journal:  BMC Bioinformatics       Date:  2009-04-09       Impact factor: 3.169

10.  Identification of proteins secreted by malaria parasite into erythrocyte using SVM and PSSM profiles.

Authors:  Ruchi Verma; Ajit Tiwari; Sukhwinder Kaur; Grish C Varshney; Gajendra P S Raghava
Journal:  BMC Bioinformatics       Date:  2008-04-16       Impact factor: 3.169

