
Amino acid encoding schemes from protein structure alignments: multi-dimensional vectors to describe residue types.

Kuang Lin, Alex C W May, William R Taylor.

Abstract

Bioinformatic software has used various numerical encoding schemes to describe amino acid sequences. Orthogonal encoding, employing 20 numbers to describe the amino acid type of one protein residue, is often used with artificial neural network (ANN) models. However, this can increase the model complexity, thus leading to difficulty in implementation and poor performance. Here, we use ANNs to derive encoding schemes for the amino acid types from protein three-dimensional structure alignments. Each of the 20 amino acid types is characterized with a few real numbers. Our schemes are tested on the simulation of amino acid substitution matrices. These simplified schemes outperform the orthogonal encoding on small data sets. Using one of these encoding schemes, we generate a colouring scheme for the amino acids in which comparable amino acids are in similar colours. We expect it to be useful for visual inspection and manual editing of protein multiple sequence alignments.
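As a minimal illustration of the orthogonal encoding described in the abstract, the sketch below turns each residue into 20 numbers, exactly one of which is 1. It is not the authors' code: the function names, the alphabetical ordering of the one-letter codes, and the helper that counts network inputs are assumptions made for this example.

```python
# Hypothetical sketch of orthogonal (one-hot) encoding; not taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residue types, one-letter codes (ordering is an assumption)


def orthogonal_encode(sequence):
    """Return one 20-element 0/1 vector per residue in the sequence."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = []
    for residue in sequence:
        vector = [0] * len(AMINO_ACIDS)
        vector[index[residue]] = 1  # a single 1 marks the residue type
        encoded.append(vector)
    return encoded


def input_size(window_length, numbers_per_residue):
    """Number of network inputs needed for a window of residues."""
    return window_length * numbers_per_residue


if __name__ == "__main__":
    print(orthogonal_encode("AC")[0])
    print(input_size(15, 20), input_size(15, 3))
```

The input count grows with the number of values per residue: a 15-residue window needs 300 inputs under orthogonal encoding. If each residue were instead described by 3 real numbers (an illustrative figure; the abstract says only "a few"), the same window would need 45 inputs.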

Year:  2002        PMID: 12183124     DOI: 10.1006/jtbi.2001.2512

Source DB:  PubMed          Journal:  J Theor Biol        ISSN: 0022-5193            Impact factor:   2.691


  8 in total

Review 1.  Visualization of multiple alignments, phylogenies and gene family evolution.

Authors:  James B Procter; Julie Thompson; Ivica Letunic; Chris Creevey; Fabrice Jossinet; Geoffrey J Barton
Journal:  Nat Methods       Date:  2010-03       Impact factor: 28.547

2.  iLearnPlus: a comprehensive and automated machine-learning platform for nucleic acid and protein sequence analysis, prediction and visualization.

Authors:  Zhen Chen; Pei Zhao; Chen Li; Fuyi Li; Dongxu Xiang; Yong-Zi Chen; Tatsuya Akutsu; Roger J Daly; Geoffrey I Webb; Quanzhi Zhao; Lukasz Kurgan; Jiangning Song
Journal:  Nucleic Acids Res       Date:  2021-06-04       Impact factor: 16.971

3.  BioSeq-Analysis2.0: an updated platform for analyzing DNA, RNA and protein sequences at sequence level and residue level based on machine learning approaches.

Authors:  Bin Liu; Xin Gao; Hanyu Zhang
Journal:  Nucleic Acids Res       Date:  2019-11-18       Impact factor: 16.971

4.  On the utility of alternative amino acid scripts.

Authors:  Darren R Flower
Journal:  Bioinformation       Date:  2012-06-28

Review 5.  A review on compound-protein interaction prediction methods: Data, format, representation and model.

Authors:  Sangsoo Lim; Yijingxiu Lu; Chang Yun Cho; Inyoung Sung; Jungwoo Kim; Youngkuk Kim; Sungjoon Park; Sun Kim
Journal:  Comput Struct Biotechnol J       Date:  2021-03-10       Impact factor: 7.271

6.  Prediction of viral-host interactions of COVID-19 by computational methods.

Authors:  Talha Burak Alakus; Ibrahim Turkoglu
Journal:  Chemometr Intell Lab Syst       Date:  2022-07-21       Impact factor: 4.175

7.  SiteSeek: post-translational modification analysis using adaptive locality-effective kernel methods and new profiles.

Authors:  Paul D Yoo; Yung Shwen Ho; Bing Bing Zhou; Albert Y Zomaya
Journal:  BMC Bioinformatics       Date:  2008-06-10       Impact factor: 3.169

8.  A Deep Learning Approach for Predicting Antigenic Variation of Influenza A H3N2.

Authors:  Yuan-Ling Xia; Weihua Li; Yongping Li; Xing-Lai Ji; Yun-Xin Fu; Shu-Qun Liu
Journal:  Comput Math Methods Med       Date:  2021-10-16       Impact factor: 2.238

