Random Forest Refinement of the KECSA2 Knowledge-Based Scoring Function for Protein Decoy Detection.

Jun Pei, Zheng Zheng, Kenneth M Merz.

Abstract

Knowledge-based potentials generally perform better than physics-based scoring functions in detecting the native structure from a collection of decoy protein structures. Through the use of a reference state, the pure interactions between atom/residue pairs can be obtained by removing the contributions from ideal-gas state potentials. However, it is a challenge for conventional knowledge-based potentials to assign different importance factors to different atom/residue pairs. In this work, using the "comparison" concept, Random Forest (RF) models were successfully generated from unbalanced data sets; these models assign different importance factors to atom pair potentials to enhance their ability to distinguish native proteins from decoy proteins. Individual and combined data sets consisting of 12 decoy sets were used to test the performance of the RF models. We find that RF models increase the recognition of native structures without affecting their ability to identify the best decoy structures. We also created models using scrambled atom types, which produce physically unrealistic probability functions, in order to test whether the RF algorithm can create useful models from scrambled input probability functions. From this test, we find that we are unable to create models of similar quality to those built from the unscrambled probability functions. Next, we created uniform probability functions in which the peak positions are the same as in the original, but each interaction has the same peak height. Using these uniform potentials, we were able to recover models as good as the ones using the full potentials, suggesting that all that is important in these models is the experimental peak positions. The KECSA2 potential, along with all code used in this work, is available at https://github.com/JunPei000/protein_folding-decoy-set.
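The abstract does not spell out how the "comparison" concept turns an unbalanced collection (one native structure, many decoys) into training data. One plausible reading, sketched below in Python, is to describe each native/decoy pair by the difference of their atom pair potential vectors and to emit the pair in both orders, so that the two classes are equally populated. The function name `comparison_pairs`, the three-element feature vectors, and the labelling convention are assumptions made for illustration; they are not taken from the paper or its repository.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def comparison_pairs(
    native: Vector, decoys: List[Vector]
) -> Tuple[List[List[float]], List[int]]:
    """Build a balanced two-class data set from one native and many decoys.

    ASSUMPTION: each structure is summarised by a vector of atom pair
    potential terms, and a "comparison" is the element-wise difference
    between two such vectors. Label 1 means the first structure of the
    pair is the native one; label 0 means the second one is.
    """
    features: List[List[float]] = []
    labels: List[int] = []
    for decoy in decoys:
        diff = [n - d for n, d in zip(native, decoy)]
        # Native listed first: class 1.
        features.append(diff)
        labels.append(1)
        # Same pair in the opposite order: class 0.
        features.append([-x for x in diff])
        labels.append(0)
    return features, labels


# Hypothetical potential vectors, chosen only to make the arithmetic easy.
native = [-1.5, -1.0, -0.5]
decoys = [[-1.0, -0.5, -0.5], [-0.5, -1.0, 0.5]]
X, y = comparison_pairs(native, decoys)
```

Under these assumptions every decoy contributes exactly one example to each class, so the class counts stay equal no matter how many decoys accompany each native structure. The resulting `X` and `y` could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`, whose per-feature importances would play the role of the importance factors mentioned in the abstract.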

Year: 2019; PMID: 30726079; DOI: 10.1021/acs.jcim.8b00734

Source DB: PubMed; Journal: J Chem Inf Model; ISSN: 1549-9596; Impact factor: 4.956


  2 in total

1.  Nonparametric chemical descriptors for the calculation of ligand-biopolymer affinities with machine-learning scoring functions.

Authors:  Edelmiro Moman; Maria A Grishina; Vladimir A Potemkin
Journal:  J Comput Aided Mol Des       Date:  2019-11-14       Impact factor: 3.686

2.  A simple neural network implementation of generalized solvation free energy for assessment of protein structural models.

Authors:  Shiyang Long; Pu Tian
Journal:  RSC Adv       Date:  2019-11-06       Impact factor: 4.036
