isGPT: An optimized model to identify sub-Golgi protein types using SVM and Random Forest based feature selection.

M Saifur Rahman, Md Khaledur Rahman, M Kaykobad, M Sohel Rahman.

Abstract

The Golgi Apparatus (GA) is a key organelle for protein synthesis within the eukaryotic cell. The main task of the GA is to modify and sort proteins for transport throughout the cell. Proteins enter the GA on the side facing the endoplasmic reticulum (ER), the cis side, and depart from the opposite, trans side. Accordingly, GA proteins fall into two types: cis-Golgi proteins and trans-Golgi proteins. Any dysfunction of GA proteins can result in congenital glycosylation disorders and other problems that may lead to neurodegenerative and inherited diseases such as diabetes, cancer and cystic fibrosis. Accurate classification of GA proteins may therefore contribute to drug development and, in turn, to treatment. In this paper, we focus on building a new computational model that not only introduces easy ways to extract features from protein sequences but also optimizes the classification of trans-Golgi and cis-Golgi proteins. After feature extraction, we employ a Random Forest (RF) model to rank the features by the importance scores it produces. After selecting the top-ranked features, we apply a Support Vector Machine (SVM) to classify the sub-Golgi proteins. We trained a regression model as well as a classification model and found the former to be superior. The model shows improved performance over all previous methods. As the benchmark dataset is significantly imbalanced, we applied the Synthetic Minority Over-sampling Technique (SMOTE) to balance it and conducted experiments on both the original and the balanced versions. Our method, identification of sub-Golgi Protein Types (isGPT), achieves accuracy values of 95.4%, 95.9% and 95.3% for the 10-fold cross-validation test, the jackknife test and the independent test, respectively. According to different performance metrics, isGPT performs better than state-of-the-art techniques.
The source code of isGPT, along with relevant dataset and detailed experimental results, can be found at https://github.com/srautonu/isGPT.
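The abstract names SMOTE but does not describe it. As a rough illustration only, the core idea (creating synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours) can be sketched in plain Python as below. This is not the authors' implementation, which is in the repository linked above; the function name `smote`, its parameters, and the toy feature vectors are all hypothetical.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points, each interpolated between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Hypothetical helper for illustration; not taken from isGPT."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # For every minority sample, the indices of its k nearest neighbours
    # within the minority class (excluding the sample itself).
    neighbours = []
    for i, p in enumerate(minority):
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sq_dist(p, minority[j]),
        )
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # pick a minority sample
        j = rng.choice(neighbours[i])      # pick one of its neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(minority[i], minority[j])]
        )
    return synthetic


# Toy two-feature vectors, invented for this example (not from the benchmark).
minority = [[0.10, 0.80], [0.20, 0.70], [0.15, 0.90], [0.30, 0.75]]
majority_size = 10  # assumed size of the larger class

new_points = smote(minority, n_synthetic=majority_size - len(minority), k=2)
balanced_minority = minority + new_points
```

Because every synthetic point lies on a line segment between two existing minority samples, the oversampled class stays inside the region already occupied by real samples. In an evaluation setting, oversampling would normally be applied to training data only, so that synthetic points do not leak into the test folds.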
Copyright © 2017 Elsevier B.V. All rights reserved.

Keywords:  Classification; Random Forest; Regression; Sub-Golgi Apparatus; Support vector machine

Year: 2017; PMID: 29183738; DOI: 10.1016/j.artmed.2017.11.003

Source DB: PubMed; Journal: Artif Intell Med; ISSN: 0933-3657; Impact factor: 5.326


