How frequently do clusters occur in hierarchical clustering analysis? A graph theoretical approach to studying ties in proximity.

Wilmer Leal, Eugenio J Llanos, Guillermo Restrepo, Carlos F Suárez, Manuel Elkin Patarroyo.

Abstract

BACKGROUND: Hierarchical cluster analysis (HCA) is a widely used classificatory technique in many areas of scientific knowledge. Applications usually yield a dendrogram from an HCA run over a given data set, using a grouping algorithm and a similarity measure. However, even when these parameters are fixed, ties in proximity (i.e. two clusters equidistant from a third one) may produce several different dendrograms with different clustering patterns (different classifications). This situation is usually disregarded and conclusions are based on a single result, which raises questions about whether clusters persist across all the resulting dendrograms; this happens, for example, when HCA is used to group molecular descriptors and select the least similar ones in QSAR studies.
RESULTS: Representing dendrograms in graph theoretical terms allowed us to introduce four measures of cluster frequency in a canonical way, and to use them to calculate cluster frequencies over the set of all possible dendrograms, taking all ties in proximity into account. A toy example of well-separated clusters was used, as well as a set of 1666 molecular descriptors calculated for a group of molecules having hepatotoxic activity, to show how our functions may be used for studying the effect of ties in HCA. These functions are not restricted to the tie case; the possibility of using them to derive cluster stability measurements on arbitrary sets of dendrograms having the same leaves (e.g. dendrograms from variations of HCA parameters) is discussed. It was found that ties occurred frequently, some yielding tens of thousands of dendrograms, even for small data sets.
CONCLUSIONS: Our approach was able to detect trends in clustering patterns by offering a simple way of measuring their frequency, which is often very low. This implies that inferences and models based on descriptor classifications (e.g. QSAR) are likely to be biased, and that their reliability needs to be assessed. Moreover, any classification of molecular descriptors is likely to be far from unique. Our results highlight the need for evaluating the effect of ties on clustering patterns before classification results can be used accurately.

Graphical abstract: Four cluster contrast functions identifying statistically sound clusters within dendrograms considering ties in proximity.
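
The effect described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce their four graph-theoretical measures, which the abstract does not define. It assumes single-linkage agglomeration on a small distance matrix, branches on every exact tie to enumerate all binary dendrograms, and then reports, for each cluster, the plain fraction of dendrograms in which it appears. The function names `all_dendrograms` and `cluster_frequency` and the four-point data set are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def all_dendrograms(dist, linkage=min):
    """Enumerate every dendrogram reachable by agglomerative clustering,
    branching whenever several cluster pairs are tied at the minimum distance.

    Each dendrogram is returned as a frozenset of its merged (non-leaf)
    clusters; each cluster is a frozenset of leaf indices.
    Assumption: `linkage=min` gives single linkage.
    """
    leaves = frozenset(frozenset([i]) for i in range(len(dist)))
    results = set()

    def cluster_distance(a, b):
        return linkage(dist[i][j] for i in a for j in b)

    def recurse(partition, clusters):
        if len(partition) == 1:
            results.add(clusters)
            return
        pairs = list(combinations(partition, 2))
        best = min(cluster_distance(a, b) for a, b in pairs)
        for a, b in pairs:
            # Exact comparison: a tie in proximity opens a new branch.
            if cluster_distance(a, b) == best:
                merged = a | b
                recurse((partition - {a, b}) | {merged}, clusters | {merged})

    recurse(leaves, frozenset())
    return results


def cluster_frequency(dendrograms):
    """Fraction of dendrograms in which each cluster occurs (illustrative)."""
    counts = Counter(c for dendrogram in dendrograms for c in dendrogram)
    total = len(dendrograms)
    return {cluster: k / total for cluster, k in counts.items()}


# Hypothetical toy data: four points on a line, with a tie between
# the pairs (0, 1) and (1, 2), both at distance 1.
points = [0, 1, 2, 10]
dist = [[abs(p - q) for q in points] for p in points]

dendrograms = all_dendrograms(dist)
freq = cluster_frequency(dendrograms)

for cluster, f in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(cluster), f)
```

In this toy case the single tie yields two dendrograms: the clusters {0, 1} and {1, 2} each occur in only half of them, while {0, 1, 2} occurs in both. Exhaustive branching grows quickly with the number of ties, so the sketch is only practical for very small inputs.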

Keywords:  Cluster frequency; Cluster stability; Dendrogram; Hierarchical cluster analysis (HCA); Molecular descriptor; Ties in proximity

Year:  2016        PMID: 26816532      PMCID: PMC4727313          DOI: 10.1186/s13321-016-0114-x

Source DB:  PubMed          Journal:  J Cheminform        ISSN: 1758-2946            Impact factor:   5.514

