
Development and validation of multiple machine learning algorithms for the classification of G-protein-coupled receptors using molecular evolution model-based feature extraction strategy.

Cheng Ling, Xiaolin Wei, Yitian Shen, Haoyu Zhang.

Abstract

Machine learning is one of the most promising approaches to predicting the function of the rapidly growing number of G-protein-coupled receptors (GPCRs). Prior research shows that the overall classification accuracy for GPCRs depends chiefly on extracting valuable features and filtering out redundancy. To achieve a more efficient classification model, we take the feature synonym problem into account and propose a new method based on functional word clustering and integration. Candidate features are clustered into synonym groups by evaluating the evolutionary correlation between them, using the transition scores in established molecular substitution matrices. Each group of clustered features is then integrated and represented by a unique key functional word. The retained key functional words form a feature knowledge base. Before the training and testing stages, the original GPCR sequences are converted into feature vectors by a feature re-extraction strategy based on the features in the knowledge base. We build multiple machine learning models based on the naïve Bayes (NB), random forest (RF), support vector machine (SVM), and multi-layer perceptron (MLP) algorithms. The models are applied to classify two public data sets containing 8,354 and 12,731 GPCRs, respectively. They achieve strong performance on almost all evaluation criteria in comparison with state-of-the-art methods. This work demonstrates the potential of the novel feature extraction strategy and provides an effective theoretical design for the hierarchical classification of GPCRs.
© 2021. The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature.
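
The clustering and re-extraction steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the residue classes and scores in `toy_score` are a hypothetical stand-in for a real substitution matrix, the greedy grouping rule and the threshold are assumptions, and all function names are invented for illustration.

```python
from collections import Counter

# Assumption: toy residue classes standing in for a real substitution matrix.
TOY_CLASSES = ["AVLIM", "FWY", "ST", "NQ", "DE", "KRH", "C", "G", "P"]
CLASS_OF = {aa: i for i, group in enumerate(TOY_CLASSES) for aa in group}


def toy_score(a, b):
    """Hypothetical score: 2 if identical, 1 if same class, -1 otherwise."""
    if a == b:
        return 2
    return 1 if CLASS_OF[a] == CLASS_OF[b] else -1


def word_score(w1, w2):
    """Sum of per-position scores between two equal-length words."""
    return sum(toy_score(a, b) for a, b in zip(w1, w2))


def cluster_synonyms(words, threshold):
    """Greedy grouping (an assumption): a word joins the first group whose
    key word scores at least `threshold` against it, else starts a new group."""
    groups = {}  # key functional word -> list of synonym words
    for w in words:
        for key in groups:
            if word_score(key, w) >= threshold:
                groups[key].append(w)
                break
        else:
            groups[w] = [w]
    return groups


def re_extract(sequence, groups, k):
    """Count k-mers of `sequence`, mapping each synonym to its key word.
    Returns one count per key word, in the insertion order of `groups`."""
    word_to_key = {w: key for key, members in groups.items() for w in members}
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        key = word_to_key.get(sequence[i:i + k])
        if key is not None:
            counts[key] += 1
    return [counts[key] for key in groups]


if __name__ == "__main__":
    candidate_words = ["ALK", "VLK", "ALR", "DEG", "EEG"]  # made-up 3-mers
    knowledge_base = cluster_synonyms(candidate_words, threshold=5)
    print(knowledge_base)
    print(re_extract("MVLKDEGALR", knowledge_base, k=3))
```

In this sketch, each key word plays the role of one entry in the feature knowledge base, and the vector returned by `re_extract` is what a classifier such as NB, RF, SVM, or MLP would receive as input.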

Keywords:  Artificial neural network; Classification; G-protein-coupled receptors; Machine learning

Year: 2021    PMID: 34562175    DOI: 10.1007/s00726-021-03080-x

Source DB: PubMed    Journal: Amino Acids    ISSN: 0939-4451    Impact factor: 3.520


