Amino Acid Encoding Methods for Protein Sequences: A Comprehensive Review and Assessment.

Xiaoyang Jing, Qiwen Dong, Daocheng Hong, Ruqian Lu.   

Abstract

As the first step of machine-learning-based protein structure and function prediction, amino acid encoding plays a fundamental role in the final success of those methods. Unlike protein sequence encoding, amino acid encoding can be used in both residue-level and sequence-level prediction of protein properties by combining it with different algorithms. However, it has not attracted enough attention in the past decades, and no comprehensive review or assessment of encoding methods has been published so far. In this article, we systematically classify, review, and assess various amino acid encoding methods. The methods are grouped into five categories according to their information sources and information extraction methodologies: binary encoding, physicochemical properties encoding, evolution-based encoding, structure-based encoding, and machine-learning encoding. Then, 16 representative methods from the five categories are selected and compared on protein secondary structure prediction and protein fold recognition tasks using large-scale benchmark datasets. The results show that the evolution-based position-dependent encoding method PSSM achieved the best performance. The structure-based and machine-learning encoding methods also show some potential for further application; in particular, the neural-network-based distributed representation of amino acids may shed new light on this area. We hope that this review and assessment will be useful for future studies in amino acid encoding.
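
To make the simplest of the five categories concrete, the following is a minimal Python sketch of binary (one-hot) amino acid encoding. It is an illustration only, not the authors' implementation. The alphabet ordering, the all-zero vector for non-standard residues, and the function names `one_hot_encode` and `mean_pool` are assumptions introduced here. The averaging step shows one simple way a residue-level encoding might be turned into a sequence-level feature vector.

```python
# Assumed ordering: the 20 standard amino acids, sorted by one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return one 20-dimensional binary vector per residue.

    Residues outside the 20 standard letters (for example 'X') are mapped
    to an all-zero vector; this handling is an assumption for illustration.
    """
    encoded = []
    for residue in sequence.upper():
        vector = [0] * len(AMINO_ACIDS)
        position = INDEX.get(residue)
        if position is not None:
            vector[position] = 1
        encoded.append(vector)
    return encoded


def mean_pool(encoded):
    """Average residue-level vectors into a single sequence-level vector.

    With one-hot input this yields the amino acid composition of the
    sequence. An empty input returns an all-zero vector.
    """
    if not encoded:
        return [0.0] * len(AMINO_ACIDS)
    length = len(encoded)
    return [sum(column) / length for column in zip(*encoded)]


if __name__ == "__main__":
    residue_level = one_hot_encode("MKTAYIAK")  # hypothetical example sequence
    sequence_level = mean_pool(residue_level)
    print(len(residue_level), "residues x", len(residue_level[0]), "dimensions")
    print("fraction of A:", sequence_level[INDEX["A"]])
```

The residue-level matrix would suit per-position tasks such as secondary structure prediction, while the pooled vector would suit whole-sequence tasks such as fold recognition. Other encodings in the review, such as PSSM, would replace the binary vectors with profile columns of the same per-residue shape.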

Year:  2020        PMID: 30998480     DOI: 10.1109/TCBB.2019.2911677

Source DB:  PubMed          Journal:  IEEE/ACM Trans Comput Biol Bioinform        ISSN: 1545-5963            Impact factor:   3.710


  10 in total

1.  i2APP: A Two-Step Machine Learning Framework For Antiparasitic Peptides Identification.

Authors:  Minchao Jiang; Renfeng Zhang; Yixiao Xia; Gangyong Jia; Yuyu Yin; Pu Wang; Jian Wu; Ruiquan Ge
Journal:  Front Genet       Date:  2022-04-27       Impact factor: 4.772

2.  Prediction of Protein-Protein Interaction Sites Using Convolutional Neural Network and Improved Data Sets.

Authors:  Zengyan Xie; Xiaoya Deng; Kunxian Shu
Journal:  Int J Mol Sci       Date:  2020-01-11       Impact factor: 5.923

3.  Effect of sequence padding on the performance of deep learning models in archaeal protein functional prediction.

Authors:  Angela Lopez-Del Rio; Maria Martin; Alexandre Perera-Lluna; Rabie Saidi
Journal:  Sci Rep       Date:  2020-09-03       Impact factor: 4.379

4.  A Novel Protein Mapping Method for Predicting the Protein Interactions in COVID-19 Disease by Deep Learning.

Authors:  Talha Burak Alakus; Ibrahim Turkoglu
Journal:  Interdiscip Sci       Date:  2021-01-12       Impact factor: 2.233

5.  [Review] A Review of Sensors and Biosensors Modified with Conducting Polymers and Molecularly Imprinted Polymers Used in Electrochemical Detection of Amino Acids: Phenylalanine, Tyrosine, and Tryptophan.

Authors:  Ancuța Dinu; Constantin Apetrei
Journal:  Int J Mol Sci       Date:  2022-01-22       Impact factor: 5.923

6.  GLTM: A Global-Local Attention LSTM Model to Locate Dimer Motif of Single-Pass Membrane Proteins.

Authors:  Quanchao Ma; Kai Zou; Zhihai Zhang; Fan Yang
Journal:  Front Genet       Date:  2022-03-15       Impact factor: 4.599

7.  Recognition of Protein Network for Bioinformatics Knowledge Analysis Using Support Vector Machine.

Authors:  Arshpreet Kaur; Abhijit Chitre; Kirti Wanjale; Pankaj Kumar; Shahajan Miah; Arnold C Alguno
Journal:  Biomed Res Int       Date:  2022-04-23       Impact factor: 3.246

8.  FFP: joint Fast Fourier transform and fractal dimension in amino acid property-aware phylogenetic analysis.

Authors:  Wei Li; Lina Yang; Yu Qiu; Yujian Yuan; Xichun Li; Zuqiang Meng
Journal:  BMC Bioinformatics       Date:  2022-08-19       Impact factor: 3.307

9.  In-Pero: Exploiting Deep Learning Embeddings of Protein Sequences to Predict the Localisation of Peroxisomal Proteins.

Authors:  Marco Anteghini; Vitor Martins Dos Santos; Edoardo Saccenti
Journal:  Int J Mol Sci       Date:  2021-06-15       Impact factor: 5.923

10.  Deep learning program to predict protein functions based on sequence information.

Authors:  Chang Woo Ko; June Huh; Jong-Wan Park
Journal:  MethodsX       Date:  2022-01-15
