
Fast large-scale clustering of protein structures using Gauss integrals.

Tim Harder, Mikael Borg, Wouter Boomsma, Peter Røgen, Thomas Hamelryck.

Abstract

MOTIVATION: Clustering protein structures is an important task in structural bioinformatics. De novo structure prediction, for example, often involves a clustering step for finding the best prediction. Other applications include assigning proteins to fold families and analyzing molecular dynamics trajectories.
RESULTS: We present Pleiades, a novel approach to clustering protein structures with a rigorous mathematical underpinning. The method approximates clustering based on the root mean square deviation (RMSD) by first mapping structures to Gauss integral vectors, which were introduced by Røgen and co-workers, and then performing K-means clustering on those vectors.
CONCLUSIONS: Compared to current methods, Pleiades dramatically reduces the time needed to perform clustering and can cluster a significantly larger number of structures, while providing state-of-the-art results. The number of low-energy structures generated in a typical folding study, which is on the order of 50,000, can be clustered within seconds to minutes.
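
The two-stage idea described under RESULTS (map each structure to a fixed-length vector, then run K-means on the vectors) can be illustrated with a minimal sketch. This is not the Pleiades implementation: computing the Gauss integral vectors is not shown, and the `descriptors` below are made-up three-dimensional stand-ins. The function names `squared_distance` and `kmeans` are likewise hypothetical. The sketch only shows why the second stage is cheap: once every structure is a vector, each K-means pass needs only Euclidean distances to k centroids rather than pairwise structural superpositions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=100, seed=0):
    """Plain Lloyd's K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input vectors.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [-1] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical descriptors standing in for Gauss integral vectors:
# three structures from one fold-like group, three from another.
descriptors = [
    [0.10, 0.20, 0.10], [0.12, 0.18, 0.11], [0.09, 0.21, 0.10],
    [0.90, 0.80, 0.95], [0.92, 0.79, 0.97], [0.88, 0.82, 0.94],
]
labels, centroids = kmeans(descriptors, k=2)
print(labels)
```

Each pass over the data costs time proportional to the number of structures times k, which is consistent with the abstract's claim that tens of thousands of structures can be handled quickly, although the actual timings depend on details not given here.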

Year:  2011        PMID: 22199383     DOI: 10.1093/bioinformatics/btr692

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  8 in total

1.  Efficiently refining a transition path using clustering.

Authors:  Ian F Thorpe
Journal:  Biophys J       Date:  2013-08-06       Impact factor: 4.033

2.  Fast algorithm for population-based protein structural model analysis.

Authors:  Jingfen Zhang; Dong Xu
Journal:  Proteomics       Date:  2013-01-03       Impact factor: 3.984

3.  Bayesian inference of protein structure from chemical shift data.

Authors:  Lars A Bratholm; Anders S Christensen; Thomas Hamelryck; Jan H Jensen
Journal:  PeerJ       Date:  2015-03-24       Impact factor: 2.984

4.  The value of protein structure classification information-Surveying the scientific literature.

Authors:  Naomi K Fox; Steven E Brenner; John-Marc Chandonia
Journal:  Proteins       Date:  2015-09-19

5.  UQlust: combining profile hashing with linear-time ranking for efficient clustering and analysis of big macromolecular data.

Authors:  Rafal Adamczak; Jarek Meller
Journal:  BMC Bioinformatics       Date:  2016-12-28       Impact factor: 3.169

6.  Identify High-Quality Protein Structural Models by Enhanced K-Means.

Authors:  Hongjie Wu; Haiou Li; Min Jiang; Cheng Chen; Qiang Lv; Chuang Wu
Journal:  Biomed Res Int       Date:  2017-03-22       Impact factor: 3.411

7.  ENCORE: Software for Quantitative Ensemble Comparison.

Authors:  Matteo Tiberti; Elena Papaleo; Tone Bengtsen; Wouter Boomsma; Kresten Lindorff-Larsen
Journal:  PLoS Comput Biol       Date:  2015-10-27       Impact factor: 4.475

8.  Network-based protein structural classification.

Authors:  Khalique Newaz; Mahboobeh Ghalehnovi; Arash Rahnama; Panos J Antsaklis; Tijana Milenković
Journal:  R Soc Open Sci       Date:  2020-06-03       Impact factor: 2.963
