
Novel 2D maps and coupling numbers for protein sequences. The first QSAR study of polygalacturonases; isolation and prediction of a novel sequence from Psidium guajava L.

Guillermín Agüero-Chapin, Humberto González-Díaz, Reinaldo Molina, Javier Varona-Santos, Eugenio Uriarte, Yenny González-Díaz.

Abstract

The development of 2D graph-theoretic representations for DNA sequences has been important for the qualitative and quantitative comparison of sequences. Numeric features calculated from these representations are useful for DNA-QSAR studies. Most graph-theoretic representations identify each of the four bases with a unit step along one axis direction in 2D space. For proteins, twenty amino acids rather than four bases have to be considered. This has limited the introduction of useful 2D Cartesian representations, and of the corresponding sequence descriptors, for encoding protein sequence information. In this study, we overcome this problem by grouping amino acids into four groups: acidic, basic, polar and non-polar. Identifying each group with one of the four axis directions yields a novel 2D representation and numeric descriptors for protein sequences. A Markov model is then used to calculate new numeric descriptors of the protein sequence, called herein the sequence 2D coupling numbers (zeta(k)). In this work, we calculated the zeta(k) values for 108 sequences of different polygalacturonases (PGs) and for 100 sequences of other proteins. A Linear Discriminant Analysis model derived here (PG = 5.36·zeta1 - 3.98·zeta3 - 42.21) successfully discriminates between PGs and other proteins. The model correctly classified 100% of the training subset of 81 PG and 75 non-PG protein sequences. It also correctly classified 51 out of 52 (98.07%) of the protein sequences used as an external validation series. Using different amino acid groupings and/or axis orientations gives different results, so exploring these alternatives is suggested for other databases. Finally, to illustrate the use of the model, we report the isolation and prediction of PG action for a novel sequence (AY908988) isolated by our group from Psidium guajava L. This prediction agrees very well with sequence alignment results obtained with BLAST. These findings illustrate the potential of the sequence descriptors derived from this novel 2D sequence representation in protein sequence QSAR studies.
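The four-group walk and the reported discriminant function can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract names the four classes but gives neither their membership nor the axis assigned to each, so the `GROUPS` and `STEPS` tables below are hypothetical choices, as are the function names `walk_2d` and `pg_score`. The Markov-model calculation of the coupling numbers zeta(k) is not defined in the abstract and is therefore not reproduced; `pg_score` simply evaluates the published linear function for zeta values obtained elsewhere.

```python
from typing import List, Tuple

# ASSUMPTION: a common textbook split of the 20 amino acids into four classes;
# the abstract does not list the members of each group.
GROUPS = {
    "acid": "DE",
    "basic": "KRH",
    "polar": "STNQCY",
    "nonpolar": "AVLIMFWPG",
}

# ASSUMPTION: which axis direction each class follows is chosen arbitrarily here.
STEPS = {
    "acid": (0, -1),
    "basic": (0, 1),
    "polar": (-1, 0),
    "nonpolar": (1, 0),
}

# Per-residue lookup table: one-letter code -> unit step (dx, dy).
RESIDUE_STEP = {aa: STEPS[group] for group, members in GROUPS.items() for aa in members}


def walk_2d(sequence: str) -> List[Tuple[int, int]]:
    """Return the lattice points visited by the 2D walk, starting at the origin."""
    x, y = 0, 0
    path = [(x, y)]
    for aa in sequence.upper():
        if aa not in RESIDUE_STEP:
            raise ValueError(f"unsupported residue: {aa!r}")
        dx, dy = RESIDUE_STEP[aa]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def pg_score(zeta1: float, zeta3: float) -> float:
    """Evaluate the discriminant function reported in the abstract."""
    return 5.36 * zeta1 - 3.98 * zeta3 - 42.21


if __name__ == "__main__":
    # D (acid), K (basic), S (polar), A (non-polar): one step per class.
    print(walk_2d("DKSA"))  # [(0, 0), (0, -1), (0, 0), (-1, 0), (0, 0)]
    # Placeholder zeta values, not taken from the paper.
    print(pg_score(10.0, 1.0))
```

Because the abstract notes that different groupings and axis orientations give different results, the two tables are kept separate from the walk itself so that either can be swapped without touching `walk_2d`. The abstract does not state the cut-off applied to the score, so no classification rule is included.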

Year:  2006        PMID: 16413021     DOI: 10.1016/j.febslet.2005.12.072

Source DB:  PubMed          Journal:  FEBS Lett        ISSN: 0014-5793            Impact factor:   4.124

