MSLVP: prediction of multiple subcellular localization of viral proteins using a support vector machine.

Anamika Thakur, Akanksha Rajput, Manoj Kumar.

Abstract

Knowledge of the subcellular location (SCL) of viral proteins in the host cell is important for understanding their function in depth. Therefore, we have developed "MSLVP", a two-tier prediction algorithm for predicting multiple SCLs of viral proteins. For this study, comprehensive data sets of viral proteins with experimentally validated SCL annotation were collected from UniProt. A non-redundant (90%) data set of 3480 viral proteins belonging to single (2715), double (391) and multiple (374) sites was employed. Additionally, 1687 (30% sequence identity) viral proteins were categorised into single (1366), double (167) and multiple (154) sites. The single, double and multiple locations further comprised eight, four and six categories, respectively. Viral protein locations include the nucleus, cytoplasm, endoplasmic reticulum, extracellular, single-pass membrane, multi-pass membrane, capsid, remaining others and combinations thereof. Support vector machine based models were developed using sequence features such as amino acid composition, dipeptide composition, physicochemical properties and their hybrids. We employed "one-versus-one" as well as "one-versus-other" strategies for multiclass classification. The "one-versus-one" approach performed better than the "one-versus-other" approach during 10-fold cross-validation. For the 90% data set, we achieved an accuracy, a Matthews correlation coefficient (MCC) and a receiver operating characteristic (ROC) of 99.99%, 1.00, 1.00; 100.00%, 1.00, 1.00; and 99.90%, 1.00, 1.00 for single, double and multiple locations, respectively. Similar results were achieved for the 30% sequence identity data set. Predictive models for each SCL performed equally well on the independent data set. The MSLVP web server can predict subcellular locations, i.e. single (8; including single and multi-pass membrane), double (4) and multiple (6). This would be helpful for elucidating the functional annotation of viral proteins and potential drug targets.
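
As an illustration of the sequence features named in the abstract, the following is a minimal Python sketch of amino acid composition (20 values), dipeptide composition (400 values) and their concatenation into a hybrid vector. It is not the authors' code: the function names, the use of fractions rather than percentages, and the handling of non-standard residues are all assumptions made for this example.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Fraction of each standard residue in seq (20 values)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(residues)
    return [counts[a] / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each ordered residue pair in seq (400 values)."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Assumption: pairs containing a non-standard residue are skipped.
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not valid:
        return [0.0] * len(AMINO_ACIDS) ** 2
    counts = Counter(valid)
    return [counts[a + b] / len(valid) for a, b in product(AMINO_ACIDS, repeat=2)]


def hybrid_features(seq):
    """Concatenate both compositions into one 420-value vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)
```

Vectors of this kind could then be passed to any SVM implementation. For example, scikit-learn's `SVC` handles multiclass problems with a one-versus-one scheme, although the abstract does not state which software the authors used.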


Year:  2016        PMID: 27272007     DOI: 10.1039/c6mb00241b

Source DB:  PubMed          Journal:  Mol Biosyst        ISSN: 1742-2051


  6 in total

1.  Computational Structural and Functional Analyses of ORF10 in Novel Coronavirus SARS-CoV-2 Variants to Understand Evolutionary Dynamics.

Authors:  Seema Mishra
Journal:  Evol Bioinform Online       Date:  2022-07-07       Impact factor: 2.031

2.  In silico analyses of conservational, functional and phylogenetic distribution of the LuxI and LuxR homologs in Gram-positive bacteria.

Authors:  Akanksha Rajput; Manoj Kumar
Journal:  Sci Rep       Date:  2017-08-01       Impact factor: 4.379

3.  Protein subnuclear localization based on a new effective representation and intelligent kernel linear discriminant analysis by dichotomous greedy genetic algorithm.

Authors:  Shunfang Wang; Yaoting Yue
Journal:  PLoS One       Date:  2018-04-12       Impact factor: 3.240

4.  ASFVdb: an integrative resource for genomic and proteomic analyses of African swine fever virus.

Authors:  Zhenglin Zhu; Geng Meng
Journal:  Database (Oxford)       Date:  2020-01-01       Impact factor: 3.451

5.  Identification of Proteins of Tobacco Mosaic Virus by Using a Method of Feature Extraction.

Authors:  Yu-Miao Chen; Xin-Ping Zu; Dan Li
Journal:  Front Genet       Date:  2020-10-09       Impact factor: 4.599

6.  Anti-Ebola: an initiative to predict Ebola virus inhibitors through machine learning.

Authors:  Akanksha Rajput; Manoj Kumar
Journal:  Mol Divers       Date:  2021-08-06       Impact factor: 2.943

