Clustering Molecular Dynamics Trajectories: 1. Characterizing the Performance of Different Clustering Algorithms.

Jianyin Shao, Stephen W Tanner, Nephi Thompson, Thomas E Cheatham.

Abstract

Molecular dynamics simulation methods produce trajectories of atomic positions (and optionally velocities and energies) as a function of time and provide a representation of the sampling of a given molecule's energetically accessible conformational ensemble. As simulations on the 10-100 ns time scale become routine, with sampled configurations stored on the picosecond time scale, such trajectories contain large amounts of data. Data-mining techniques, like clustering, provide one means to group and make sense of the information in the trajectory. In this work, several clustering algorithms were implemented, compared, and utilized to understand MD trajectory data. The development of the algorithms into a freely available C code library, and their application to a simple test example of random (or systematically placed) points in a 2D plane (where the pairwise metric is the distance between points) provide a means to understand the relative performance. Eleven different clustering algorithms were developed, ranging from top-down splitting (hierarchical) and bottom-up aggregating (including single-linkage edge joining, centroid-linkage, average-linkage, complete-linkage, centripetal, and centripetal-complete) to various refinement (means, Bayesian, and self-organizing maps) and tree (COBWEB) algorithms. Systematic testing in the context of MD simulation of various DNA systems (including DNA single strands and the interaction of a minor groove binding drug DB226 with a DNA hairpin) allows a more direct assessment of the relative merits of the distinct clustering algorithms. Additionally, means to assess the relative performance and differences between the algorithms, to dynamically select the initial cluster count, and to achieve faster data mining by "sieved clustering" were evaluated. 
Overall, it was found that there is no one perfect "one size fits all" algorithm for clustering MD trajectories and that the results strongly depend on the choice of atoms for the pairwise comparison. Some algorithms tend to produce homogeneously sized clusters, whereas others have a tendency to produce singleton clusters. Issues related to the choice of a pairwise metric, the clustering metrics, the atom selection used for the comparison, and the relative performance of the algorithms are discussed. Overall, the best performance was observed with the average-linkage, means, and SOM algorithms. If the cluster count is not known in advance, the hierarchical or average-linkage clustering algorithms are recommended. Although these algorithms perform well, it is important to be aware of the limitations or weaknesses of each algorithm, specifically the high sensitivity to outliers with hierarchical, the tendency to generate homogeneously sized clusters with means, and the tendency to produce small or singleton clusters with average-linkage.
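
The abstract's 2D test case (points in a plane, with the distance between points as the pairwise metric) and the idea of "sieved clustering" can be illustrated with a minimal Python sketch. This is not the authors' C library. The function names (`average_linkage`, `sieved_clustering`, `make_blobs`), the three-blob test data, and the rule for placing skipped points (assignment to the nearest cluster centroid) are all assumptions made for illustration; the abstract does not specify how sieved-out frames are assigned.

```python
import math
import random


def average_linkage(points, n_clusters):
    """Bottom-up clustering: repeatedly merge the two clusters whose
    members have the smallest average pairwise distance."""
    clusters = [[i] for i in range(len(points))]  # start from singletons
    while len(clusters) > n_clusters:
        best = None  # (average distance, cluster index a, cluster index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(math.dist(points[i], points[j])
                            for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters  # lists of indices into `points`


def centroid(points, members):
    """Mean position of the points whose indices are in `members`."""
    n = len(members)
    return (sum(points[i][0] for i in members) / n,
            sum(points[i][1] for i in members) / n)


def sieved_clustering(points, n_clusters, sieve):
    """Cluster only every `sieve`-th point, then add each skipped point
    to the cluster with the nearest centroid (an assumed assignment rule)."""
    kept = list(range(0, len(points), sieve))
    sub = average_linkage([points[i] for i in kept], n_clusters)
    clusters = [[kept[k] for k in c] for c in sub]  # map back to full indices
    centers = [centroid(points, c) for c in clusters]
    for i in range(len(points)):
        if i % sieve != 0:
            nearest = min(range(len(centers)),
                          key=lambda c: math.dist(points[i], centers[c]))
            clusters[nearest].append(i)
    return clusters


def make_blobs(n_per_blob=20, seed=0):
    """Hypothetical test data: three well-separated square blobs in 2D."""
    rng = random.Random(seed)
    blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    points = []
    for i in range(3 * n_per_blob):
        cx, cy = blob_centers[i % 3]  # point i belongs to blob i % 3
        points.append((cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1)))
    return points


points = make_blobs()
clusters = sieved_clustering(points, n_clusters=3, sieve=4)
print(sorted(len(c) for c in clusters))  # [20, 20, 20]
```

Sieving reduces the cost of the pairwise step, since only one point in every `sieve` enters the quadratic comparison. The trade-off is that the result depends on how representative the retained points are, which is easy to satisfy for these artificial blobs but not guaranteed for a real trajectory.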

Year:  2007        PMID: 26636222     DOI: 10.1021/ct700119m

Source DB:  PubMed          Journal:  J Chem Theory Comput        ISSN: 1549-9618            Impact factor:   6.006


  217 in total

1.  Molecular simulation uncovers the conformational space of the λ Cro dimer in solution.

Authors:  Logan S Ahlstrom; Osamu Miyashita
Journal:  Biophys J       Date:  2011-11-15       Impact factor: 4.033

2.  Long route or shortcut? A molecular dynamics study of traffic of thiocholine within the active-site gorge of acetylcholinesterase.

Authors:  Yechun Xu; Jacques-Philippe Colletier; Martin Weik; Guangrong Qin; Hualiang Jiang; Israel Silman; Joel L Sussman
Journal:  Biophys J       Date:  2010-12-15       Impact factor: 4.033

3.  Analysis of the bacterial luciferase mobile loop by replica-exchange molecular dynamics.

Authors:  Zachary T Campbell; Thomas O Baldwin; Osamu Miyashita
Journal:  Biophys J       Date:  2010-12-15       Impact factor: 4.033

4.  Understanding the molecular mechanism of the broad and potent neutralization of HIV-1 by antibody VRC01 from the perspective of molecular dynamics simulation and binding free energy calculations.

Authors:  Yan Zhang; Dabo Pan; Yulin Shen; Nengzhi Jin; Huanxiang Liu; Xiaojun Yao
Journal:  J Mol Model       Date:  2012-05-29       Impact factor: 1.810

5.  Accelerating molecular simulations of proteins using Bayesian inference on weak information.

Authors:  Alberto Perez; Justin L MacCallum; Ken A Dill
Journal:  Proc Natl Acad Sci U S A       Date:  2015-09-08       Impact factor: 11.205

6.  Simulations of allosteric motions in the zinc sensor CzrA.

Authors:  Dhruva K Chakravorty; Bing Wang; Chul Won Lee; David P Giedroc; Kenneth M Merz
Journal:  J Am Chem Soc       Date:  2011-11-14       Impact factor: 15.419

7.  On the active site of mononuclear B1 metallo β-lactamases: a computational study.

Authors:  Jacopo Sgrignani; Alessandra Magistrato; Matteo Dal Peraro; Alejandro J Vila; Paolo Carloni; Roberta Pierattelli
Journal:  J Comput Aided Mol Des       Date:  2012-04-25       Impact factor: 3.686

8.  Network visualization of conformational sampling during molecular dynamics simulation.

Authors:  Logan S Ahlstrom; Joseph Lee Baker; Kent Ehrlich; Zachary T Campbell; Sunita Patel; Ivan I Vorontsov; Florence Tama; Osamu Miyashita
Journal:  J Mol Graph Model       Date:  2013-10-16       Impact factor: 2.518

9.  Effect of pathogenic mutations on the structure and dynamics of Alzheimer's A beta 42-amyloid oligomers.

Authors:  Kristin Kassler; Anselm H C Horn; Heinrich Sticht
Journal:  J Mol Model       Date:  2009-11-12       Impact factor: 1.810

10.  Active Site Breathing of Human Alkbh5 Revealed by Solution NMR and Accelerated Molecular Dynamics.

Authors:  Jeffrey A Purslow; Trang T Nguyen; Timothy K Egner; Rochelle R Dotas; Balabhadra Khatiwada; Vincenzo Venditti
Journal:  Biophys J       Date:  2018-10-11       Impact factor: 4.033
