Identifying Plant Pentatricopeptide Repeat Proteins Using a Variable Selection Method.

Xudong Zhao, Hanxu Wang, Hangyu Li, Yiming Wu, Guohua Wang.

Abstract

Motivation: The pentatricopeptide repeat (PPR), a 35-amino-acid repeat motif, plays an important role in plant growth. Features extracted from protein sequences can be used with classification methods to identify PPR proteins. However, which components of a multidimensional feature (namely, variables) are most effective for discriminating proteins has not been discussed. We therefore seek to select variables from a multidimensional feature for identifying PPR proteins.
Method: We propose a variable selection framework for identifying PPR proteins. Samples representing PPR-positive and PPR-negative proteins are split equally into a training set and a testing set. Variable importance is scored by iterating resampling, training, and scoring steps on the training set. A model selection method based on a Gaussian mixture model then automatically chooses the variables that are effective for identifying PPR proteins. Performance measurements on the testing set show the effectiveness of the selected variables.
Results: Certain individual variables, rather than the whole multidimensional feature they belong to, are sufficient to discriminate PPR-positive proteins from PPR-negative ones. In addition, methionine content may play an important role in predicting PPR proteins.
Copyright © 2021 Zhao, Wang, Li, Wu and Wang.
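
The automatic variable-choice step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that one importance score per variable has already been obtained (for example, averaged over the resampling and training iterations), fits one-dimensional Gaussian mixtures to those scores by EM, picks the number of components by BIC, and keeps the variables assigned to the component with the highest mean. The function names, the use of BIC, the "highest-mean component" rule, and the toy scores are all assumptions made for illustration.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(xs, k, iters=200, var_floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture with EM.

    Returns (weights, means, variances, log_likelihood).
    """
    n = len(xs)
    srt = sorted(xs)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(xs) / n
    var0 = max(sum((x - centre) ** 2 for x in xs) / n, var_floor)
    variances = [var0] * k
    weights = [1.0 / k] * k

    def mixture_terms(x):
        return [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]

    for _ in range(iters):
        # E-step: posterior probability of each component for each score.
        resp = []
        for x in xs:
            terms = mixture_terms(x)
            total = sum(terms) or 1e-300
            resp.append([t / total for t in terms])
        # M-step: re-estimate weight, mean, and variance per component.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(spread, var_floor)

    loglik = sum(math.log(sum(mixture_terms(x)) or 1e-300) for x in xs)
    return weights, means, variances, loglik

def select_variables(scores, max_k=2):
    """Pick k by BIC, then keep variables assigned to the highest-mean component.

    Assumption: `scores` maps a variable name to a precomputed importance score.
    If BIC prefers a single component, every variable is returned.
    """
    names = list(scores)
    xs = [scores[name] for name in names]
    best = None
    for k in range(1, max_k + 1):
        weights, means, variances, loglik = fit_gmm_1d(xs, k)
        # A k-component 1-D mixture has 3k - 1 free parameters.
        bic = -2.0 * loglik + (3 * k - 1) * math.log(len(xs))
        if best is None or bic < best[0]:
            best = (bic, weights, means, variances)
    _, weights, means, variances = best
    top = max(range(len(means)), key=lambda j: means[j])
    selected = []
    for name, x in zip(names, xs):
        post = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        if max(range(len(post)), key=lambda j: post[j]) == top:
            selected.append(name)
    return selected

# Made-up scores for illustration only; they are not results from the paper.
toy_scores = {f"v{i}": s for i, s in enumerate(
    [0.010, 0.012, 0.015, 0.011, 0.014, 0.013, 0.016, 0.009, 0.012, 0.20, 0.22, 0.24])}

if __name__ == "__main__":
    print(select_variables(toy_scores))
```

Treating selection as a mixture-model choice avoids a hand-set importance threshold: the cut between "effective" and "ineffective" variables falls wherever the fitted components separate. In practice the scores would come from the resampling loop on the training set, and the selected variables would then be evaluated on the held-out testing set.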

Keywords:  Gaussian mixture model; model selection; pentatricopeptide repeat; random forest; variable importance; variable selection

Year:  2021        PMID: 33732270      PMCID: PMC7957076          DOI: 10.3389/fpls.2021.506681

Source DB:  PubMed          Journal:  Front Plant Sci        ISSN: 1664-462X            Impact factor:   5.753


  8 in total

1.  SNAREs-SAP: SNARE Proteins Identification With PSSM Profiles.

Authors:  Zixiao Zhang; Yue Gong; Bo Gao; Hongfei Li; Wentao Gao; Yuming Zhao; Benzhi Dong
Journal:  Front Genet       Date:  2021-12-20       Impact factor: 4.599

2.  KK-DBP: A Multi-Feature Fusion Method for DNA-Binding Protein Identification Based on Random Forest.

Authors:  Yuran Jia; Shan Huang; Tianjiao Zhang
Journal:  Front Genet       Date:  2021-11-29       Impact factor: 4.599

3.  Identify DNA-Binding Proteins Through the Extreme Gradient Boosting Algorithm. (Review)

Authors:  Ziye Zhao; Wen Yang; Yixiao Zhai; Yingjian Liang; Yuming Zhao
Journal:  Front Genet       Date:  2022-01-28       Impact factor: 4.599

4.  Immunoglobulin Classification Based on FC* and GC* Features.

Authors:  Hao Wan; Jina Zhang; Yijie Ding; Hetian Wang; Geng Tian
Journal:  Front Genet       Date:  2022-01-24       Impact factor: 4.599

5.  Ensemble Learning-Based Feature Selection for Phage Protein Prediction.

Authors:  Songbo Liu; Chengmin Cui; Huipeng Chen; Tong Liu
Journal:  Front Microbiol       Date:  2022-07-15       Impact factor: 6.064

6.  IBPred: A sequence-based predictor for identifying ion binding protein in phage.

Authors:  Shi-Shi Yuan; Dong Gao; Xue-Qin Xie; Cai-Yi Ma; Wei Su; Zhao-Yue Zhang; Yan Zheng; Hui Ding
Journal:  Comput Struct Biotechnol J       Date:  2022-08-28       Impact factor: 6.155

7.  VTP-Identifier: Vesicular Transport Proteins Identification Based on PSSM Profiles and XGBoost.

Authors:  Yue Gong; Benzhi Dong; Zixiao Zhang; Yixiao Zhai; Bo Gao; Tianjiao Zhang; Jingyu Zhang
Journal:  Front Genet       Date:  2022-01-03       Impact factor: 4.599

8.  AOPM: Application of Antioxidant Protein Classification Model in Predicting the Composition of Antioxidant Drugs. (Review)

Authors:  Yixiao Zhai; Jingyu Zhang; Tianjiao Zhang; Yue Gong; Zixiao Zhang; Dandan Zhang; Yuming Zhao
Journal:  Front Pharmacol       Date:  2022-01-18       Impact factor: 5.810
