
Improved Prediction of Protein-Protein Interaction Mapping on Homo Sapiens by Using Amino Acid Sequence Features in a Supervised Learning Framework.

Md Merajul Islam, Md Jahangir Alam, Fee Faysal Ahmed, Md Mehedi Hasan, Md Nurul Haque Mollah.

Abstract

BACKGROUND: Protein-protein interactions (PPIs) play a key role in controlling many biological processes, with implications for protein function, disease incidence and therapy design. However, identifying PPIs by wet-lab experiments is challenging because it is laborious, time-consuming and expensive. Computational prediction of PPIs ahead of experimental validation has therefore gained emphasis, since it is less laborious, faster and cheaper.
OBJECTIVE: The objective of this study is to develop an improved computational method for predicting PPIs in Homo sapiens from amino acid sequence features within a supervised learning framework.
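The RESULTS paragraph credits amino acid composition (AAC) encoding for the best predictor. AAC describes a protein by the relative frequency of each of the 20 standard amino acids in its sequence. The Python sketch below computes it. The abstract does not say how the two proteins of a pair are combined, so concatenating the two 20-dimensional vectors into 40 features (the hypothetical `encode_pair`) is an assumption, as are all names used here.

```python
# Amino acid composition (AAC) encoding, a minimal sketch.
# Assumption: a protein pair is represented by concatenating the two
# 20-dimensional AAC vectors; the paper's exact pair encoding may differ.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed order


def aac(sequence: str) -> list[float]:
    """Relative frequency of each standard amino acid in `sequence`.

    Letters outside the 20 standard residues (e.g. X, U) are skipped, so the
    frequencies are normalized by the count of standard residues only.
    """
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def encode_pair(seq_a: str, seq_b: str) -> list[float]:
    """Hypothetical pair encoding: AAC of protein A followed by AAC of protein B."""
    return aac(seq_a) + aac(seq_b)


if __name__ == "__main__":
    features = encode_pair("MKTAYIAKQR", "GSHMLEDPVA")
    print(len(features))  # 40 features: 20 per protein
```

Concatenation is order-dependent (the pair A, B encodes differently from B, A); a symmetric combination such as averaging the two vectors would avoid that, at the cost of losing which protein contributed which composition.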
METHODS: A total of 91 experimentally validated positive PPI pairs of human protein sequences were collected from the IntAct Molecular Interaction Database. Three datasets were then constructed with positive-to-negative PPI sample ratios of 1:1, 1:2 and 1:3, and each dataset was partitioned into a training set (80%) and an independent test set (20%). Each training set was further partitioned into four mutually exclusive groups of equal size; these four groups and the independent test group were interchanged in turn to perform 5-fold cross-validation (CV). Seven candidate classifiers, neural network (NN), support vector machine (SVM), logistic regression (LR), naive Bayes (NB), k-nearest neighbors (KNN), AdaBoost (AB) and random forest (RF), were trained for each ratio, and their performance scores were compared to identify the best PPI predictor.
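The sampling and fold-rotation steps can be sketched in Python as follows. This is one reading of the description, not the authors' code. The abstract does not say where negative pairs come from, so the hypothetical `build_dataset` assumes a pre-built pool of non-interacting pairs and samples `ratio` negatives per positive at random. `five_groups` holds out about 20% as the test group and deals the remaining items into four near-equal training groups; `rotate` then uses each of the five groups once as the held-out fold while training on the other four. No class stratification is applied, which is a simplification.

```python
import random

# Hypothetical helpers sketching dataset construction and the 5-group rotation.


def build_dataset(positives, negative_pool, ratio, seed=0):
    """Label positives 1 and add ratio * len(positives) sampled negatives labelled 0.

    Assumption: negatives are drawn uniformly without replacement from a
    pre-built pool of non-interacting pairs.
    """
    rng = random.Random(seed)
    n_neg = ratio * len(positives)
    if n_neg > len(negative_pool):
        raise ValueError("negative pool is too small for the requested ratio")
    negatives = rng.sample(list(negative_pool), n_neg)
    data = [(pair, 1) for pair in positives] + [(pair, 0) for pair in negatives]
    rng.shuffle(data)  # mix classes before splitting
    return data


def five_groups(data, test_fraction=0.2):
    """Return four near-equal training groups followed by one test group (~20%)."""
    n_test = round(len(data) * test_fraction)
    test, train = data[:n_test], data[n_test:]
    train_groups = [train[i::4] for i in range(4)]  # deal items round-robin
    return train_groups + [test]


def rotate(groups):
    """Yield (training items, held-out group), holding out each group once."""
    for k, held_out in enumerate(groups):
        training = [item for j, g in enumerate(groups) if j != k for item in g]
        yield training, held_out


if __name__ == "__main__":
    pos = [f"p{i}" for i in range(10)]
    neg_pool = [f"n{i}" for i in range(50)]
    for ratio in (1, 2, 3):
        groups = five_groups(build_dataset(pos, neg_pool, ratio))
        print(ratio, [len(g) for g in groups])
```

With 10 toy positives, the three ratios give 20, 30 and 40 samples, split into five equal groups of 4, 6 and 8 respectively.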
RESULTS: The random forest (RF) predictor trained on the 1:2 ratio of positive to negative PPI samples with amino acid composition (AAC) encoding features gave the most accurate PPI prediction, producing the highest average 5-fold cross-validation scores for accuracy (93.50%), sensitivity (95.0%), MCC (85.2%), AUC (0.941) and pAUC (0.236). On the independent test datasets it also achieved the highest average scores for accuracy (92.0%), sensitivity (94.0%), MCC (83.6%), AUC (0.922) and pAUC (0.207) compared with the other candidate and existing predictors.
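The reported scores follow standard definitions: accuracy = (TP + TN)/(TP + TN + FP + FN), sensitivity = TP/(TP + FN), and MCC = (TP·TN − FP·FN)/√((TP + FP)(TP + FN)(TN + FP)(TN + FN)), which the abstract expresses as a percentage. The Python sketch below computes these from binary labels and predictions, and computes AUC as the probability that a randomly chosen positive pair receives a higher score than a randomly chosen negative one (ties count one half). The function names are illustrative. pAUC is omitted because the abstract does not state the false-positive-rate range over which it was computed.

```python
import math


def confusion(y_true, y_pred):
    """Count (TP, TN, FP, FN) for binary labels where 1 = interacting pair."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """Fraction of true interacting pairs that are predicted as interacting."""
    return tp / (tp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0 when a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def auc(y_true, scores):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 1]
    tp, tn, fp, fn = confusion(y_true, y_pred)
    print(accuracy(tp, tn, fp, fn), sensitivity(tp, fn), mcc(tp, tn, fp, fn))
    print(auc(y_true, [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]))
```

The pairwise AUC loop is quadratic in the number of samples, which is fine for illustration; a rank-based computation gives the same value more efficiently on large test sets.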
CONCLUSION: The final results strongly suggest that the RF-based predictor is a better model for PPI prediction in Homo sapiens. Copyright © Bentham Science Publishers.


Keywords:  Protein sequence; feature selection; performance comparison; protein-protein interaction (PPI) prediction; random forest; sequence encoding; supervised learning framework


Year:  2021        PMID: 32520672     DOI: 10.2174/0929866527666200610141258

Source DB:  PubMed          Journal:  Protein Pept Lett        ISSN: 0929-8665            Impact factor:   1.890


  3 in total

1.  Prediction of serine phosphorylation sites mapping on Schizosaccharomyces Pombe by fusing three encoding schemes with the random forest classifier.

Authors:  Samme Amena Tasmia; Md Kaderi Kibria; Khanis Farhana Tuly; Md Ariful Islam; Mst Shamima Khatun; Md Mehedi Hasan; Md Nurul Haque Mollah
Journal:  Sci Rep       Date:  2022-02-16       Impact factor: 4.379

2.  Evolution of Sequence-based Bioinformatics Tools for Protein-protein Interaction Prediction (Review).

Authors:  Mst Shamima Khatun; Watshara Shoombuatong; Md Mehedi Hasan; Hiroyuki Kurata
Journal:  Curr Genomics       Date:  2020-09       Impact factor: 2.236

3.  An Improved Computational Prediction Model for Lysine Succinylation Sites Mapping on Homo sapiens by Fusing Three Sequence Encoding Schemes with the Random Forest Classifier.

Authors:  Samme Amena Tasmia; Fee Faysal Ahmed; Parvez Mosharaf; Mehedi Hasan; Nurul Haque Mollah
Journal:  Curr Genomics       Date:  2021-02       Impact factor: 2.236

