
Fast algorithm for population-based protein structural model analysis.

Jingfen Zhang, Dong Xu.

Abstract

De novo protein structure prediction often generates a large population of candidate structures (models) and then selects near-native models through clustering. Existing structural model clustering methods are time-consuming because they calculate pairwise distances between models. In this paper, we present a novel method for fast model clustering without loss of clustering accuracy. Instead of the commonly used pairwise root mean square deviation (RMSD) and TM-score values, we propose two new measures, Dscore1 and Dscore2, based on the comparison of protein distance matrices, which describe the difference and the similarity among models, respectively. The analysis indicates that Dscore1 correlates highly with RMSD and that Dscore2 correlates highly with TM-score. Compared with existing methods, whose calculation time is quadratic in the number of models, our Dscore1-based clustering achieves linear time complexity while obtaining almost the same accuracy for near-native model selection. By using Dscore2 to select cluster representatives, we can further improve the quality of the representatives with little increase in computing time. In addition, for large populations (~500 k) of models, we can provide a fast data visualization based on the Dscore distribution in seconds to minutes. Our method has been implemented in a package named MUFOLD-CL, available at http://mufold.org/clustering.php.
© 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
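
The abstract does not give the formulas for Dscore1 or Dscore2, so the following Python sketch only illustrates the general idea of comparing models through their intra-model distance matrices. Every name here (`distance_vector`, `matrix_difference`, `rank_by_distance_to_mean`) is hypothetical, and the root mean square difference is a generic stand-in, not the measure implemented in MUFOLD-CL. It shows two properties the abstract relies on: distance matrices need no superposition, and scoring each model against an average distance vector takes one pass over the models rather than all pairwise comparisons.

```python
import math
from itertools import combinations


def distance_vector(coords):
    """Flatten the upper triangle of a model's intra-model distance matrix.

    `coords` is a list of (x, y, z) tuples, e.g. one C-alpha atom per residue.
    """
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def matrix_difference(coords_a, coords_b):
    """Root mean square difference between two models' distance matrices.

    Assumption: a generic stand-in for a distance-matrix-based measure,
    NOT the paper's Dscore1 formula (which the abstract does not state).
    """
    va, vb = distance_vector(coords_a), distance_vector(coords_b)
    if len(va) != len(vb):
        raise ValueError("models must have the same number of residues")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)) / len(va))


def rank_by_distance_to_mean(models):
    """Rank model indices by closeness to the average distance vector.

    Each model is compared once against the mean, so the number of
    comparisons grows linearly with the number of models.
    """
    vectors = [distance_vector(m) for m in models]
    n = len(vectors)
    mean = [sum(column) / n for column in zip(*vectors)]
    scores = [
        math.sqrt(sum((x - m) ** 2 for x, m in zip(v, mean)) / len(mean))
        for v in vectors
    ]
    return sorted(range(n), key=scores.__getitem__)


if __name__ == "__main__":
    model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
    model_b = [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0), (6.0, 6.0, 5.0)]  # model_a translated
    model_c = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 3.0, 0.0)]  # stretched outlier
    print(matrix_difference(model_a, model_b))
    print(rank_by_distance_to_mean([model_a, model_b, model_c]))
```

Because only internal distances are compared, a translated or rotated copy of a model scores as identical without any structural alignment step. A real implementation would also need to handle the size of the distance vector, which grows quadratically with the number of residues.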

Year:  2013        PMID: 23184517      PMCID: PMC3641909          DOI: 10.1002/pmic.201200334

Source DB:  PubMed          Journal:  Proteomics        ISSN: 1615-9853            Impact factor:   3.984


