SubMito-XGBoost: predicting protein submitochondrial localization by fusing multiple feature information and eXtreme gradient boosting.

Bin Yu, Wenying Qiu, Cheng Chen, Anjun Ma, Jing Jiang, Hongyan Zhou, Qin Ma.

Abstract

MOTIVATION: Mitochondria are essential organelles in most eukaryotes. They play an important role in energy metabolism and also take part in many critical cytopathological processes. Abnormal mitochondria can trigger a range of human diseases, such as Parkinson's disease, multifactor disorder and Type-II diabetes. Knowing a protein's submitochondrial localization helps clarify its function, which supports the study of disease pathogenesis and drug design.
RESULTS: We propose a new method, SubMito-XGBoost, for predicting protein submitochondrial localization. It has three steps: (i) the g-gap dipeptide composition (g-gap DC), pseudo-amino acid composition (PseAAC), auto-correlation function (ACF) and Bi-gram position-specific scoring matrix (Bi-gram PSSM) are used to extract protein sequence features; (ii) the Synthetic Minority Oversampling Technique (SMOTE) is used to balance the samples, and the ReliefF algorithm is applied for feature selection; and (iii) the resulting feature vectors are fed into XGBoost to predict protein submitochondrial locations. Under leave-one-out cross-validation (LOOCV), SubMito-XGBoost obtains satisfactory prediction results relative to existing methods. Its prediction accuracies on the two training datasets, M317 and M983, were 97.7% and 98.9%, which are 2.8-12.5% and 3.8-9.9% higher than those of other methods, respectively. Its prediction accuracy on the independent test set M495 was 94.8%, which is significantly better than that of existing studies. The proposed method also achieves satisfactory predictive performance on plant and non-plant protein submitochondrial datasets. SubMito-XGBoost may also contribute to new drug design for the treatment of related diseases.
AVAILABILITY AND IMPLEMENTATION: The source codes and data are publicly available at https://github.com/QUST-AIBBDRC/SubMito-XGBoost/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
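
As an illustration of step (i) in the abstract, the g-gap dipeptide composition can be sketched in a few lines of Python. This is a minimal sketch using the common definition (the frequency of each residue pair separated by g intervening positions, giving 400 features). The abstract does not state the value of g or the exact normalization used by SubMito-XGBoost, so the function name, the default `g=0`, and the handling of non-standard residues are assumptions made for illustration only. It is not the authors' implementation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def g_gap_dipeptide_composition(sequence, g=0):
    """Return a 400-dimensional list of g-gap dipeptide frequencies.

    A g-gap dipeptide is a pair of residues with g residues between them;
    g=0 gives ordinary adjacent dipeptides. (Hypothetical helper, not taken
    from the SubMito-XGBoost source code.)
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(sequence) - g - 1):
        pair = sequence[i] + sequence[i + g + 1]
        if pair in counts:  # assumption: skip pairs with non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:  # sequence too short for the requested gap
        return [0.0] * len(pairs)
    return [counts[p] / total for p in pairs]
```

For example, `g_gap_dipeptide_composition("ACAC", g=1)` counts the pairs A..A and C..C once each, so the entries for `AA` and `CC` are both 0.5 and all other entries are zero. In a full pipeline, vectors like this would be concatenated with the other feature types before balancing, feature selection and classification.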

Year:  2020        PMID: 31603468     DOI: 10.1093/bioinformatics/btz734

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  26 in total

1.  Computational prediction of species-specific yeast DNA replication origin via iterative feature representation.

Authors:  Balachandran Manavalan; Shaherin Basith; Tae Hwan Shin; Gwang Lee
Journal:  Brief Bioinform       Date:  2021-07-20       Impact factor: 11.622

2.  DeepStack-DTIs: Predicting Drug-Target Interactions Using LightGBM Feature Selection and Deep-Stacked Ensemble Classifier.

Authors:  Yan Zhang; Zhiwen Jiang; Cheng Chen; Qinqin Wei; Haiming Gu; Bin Yu
Journal:  Interdiscip Sci       Date:  2021-11-03       Impact factor: 2.233

3.  Supervised Pretraining through Contrastive Categorical Positive Samplings to Improve COVID-19 Mortality Prediction.

Authors:  Tingyi Wanyan; Mingquan Lin; Eyal Klang; Kartikeya M Menon; Faris F Gulamali; Ariful Azad; Yiye Zhang; Ying Ding; Zhangyang Wang; Fei Wang; Benjamin Glicksberg; Yifan Peng
Journal:  ACM BCB       Date:  2022-08-07

4.  A versatile active learning workflow for optimization of genetic and metabolic networks.

Authors:  Amir Pandi; Christoph Diehl; Ali Yazdizadeh Kharrazi; Scott A Scholz; Elizaveta Bobkova; Léon Faure; Maren Nattermann; David Adam; Nils Chapin; Yeganeh Foroughijabbari; Charles Moritz; Nicole Paczia; Niña Socorro Cortina; Jean-Loup Faulon; Tobias J Erb
Journal:  Nat Commun       Date:  2022-07-05       Impact factor: 17.694

5.  Extremely-randomized-tree-based Prediction of N6-Methyladenosine Sites in Saccharomyces cerevisiae.

Authors:  Rajiv G Govindaraj; Sathiyamoorthy Subramaniyam; Balachandran Manavalan
Journal:  Curr Genomics       Date:  2020-01       Impact factor: 2.236

6.  A Method for Prediction of Thermophilic Protein Based on Reduced Amino Acids and Mixed Features.

Authors:  Changli Feng; Zhaogui Ma; Deyun Yang; Xin Li; Jun Zhang; Yanjuan Li
Journal:  Front Bioeng Biotechnol       Date:  2020-05-05

7.  Identification of Human Enzymes Using Amino Acid Composition and the Composition of k-Spaced Amino Acid Pairs.

Authors:  Lifu Zhang; Benzhi Dong; Zhixia Teng; Ying Zhang; Liran Juan
Journal:  Biomed Res Int       Date:  2020-05-22       Impact factor: 3.411

8.  Estimation of the LDL subclasses in ischemic stroke as a risk factor in a Chinese population.

Authors:  Ruisheng Duan; Wenjun Xue; Kunpeng Wang; Nan Yin; Hongyu Hao; Hongshan Chu; Lijun Wang; Peng Meng; Le Diao
Journal:  BMC Neurol       Date:  2020-11-13       Impact factor: 2.474

9.  Regional Population Forecast and Analysis Based on Machine Learning Strategy.

Authors:  Chian-Yue Wang; Shin-Jye Lee
Journal:  Entropy (Basel)       Date:  2021-05-24       Impact factor: 2.524

10.  DeepPred-SubMito: A Novel Submitochondrial Localization Predictor Based on Multi-Channel Convolutional Neural Network and Dataset Balancing Treatment.

Authors:  Xiao Wang; Yinping Jin; Qiuwen Zhang
Journal:  Int J Mol Sci       Date:  2020-08-09       Impact factor: 5.923
