Mattias Jakobsson (1), Noah A Rosenberg. (1) Center for Computational Medicine and Biology, Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA. mjakob@umich.edu
Abstract
MOTIVATION: Clustering of individuals into populations on the basis of multilocus genotypes is informative in a variety of settings. In population-genetic clustering algorithms, such as BAPS, STRUCTURE and TESS, individual multilocus genotypes are partitioned over a set of clusters, often using unsupervised approaches that involve stochastic simulation. As a result, replicate cluster analyses of the same data may produce several distinct solutions for estimated cluster membership coefficients, even though the same initial conditions were used. Major differences among clustering solutions have two main sources: (1) 'label switching' of clusters across replicates, caused by the arbitrary way in which clusters in an unsupervised analysis are labeled, and (2) 'genuine multimodality,' truly distinct solutions across replicates.
RESULTS: To facilitate the interpretation of population-genetic clustering results, we describe three algorithms for aligning multiple replicate analyses of the same data set. We have implemented these algorithms in the computer program CLUMPP (CLUster Matching and Permutation Program). We illustrate the use of CLUMPP by aligning the cluster membership coefficients from 100 replicate cluster analyses of 600 chickens from 20 different breeds.
AVAILABILITY: CLUMPP is freely available at http://rosenberglab.bioinformatics.med.umich.edu/clumpp.html.
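To make the label-switching problem described in the abstract concrete, the following is a minimal sketch of aligning one replicate to another by trying every relabeling of the clusters. It is an illustration only, not a reproduction of any of the three algorithms implemented in CLUMPP: the function names, the toy membership matrices, and the use of squared Frobenius distance as the mismatch measure are all assumptions made for this example. Each matrix has one row per individual and one column per cluster.

```python
from itertools import permutations


def frobenius_sq(a, b):
    """Squared Frobenius distance between two equally sized matrices (assumed mismatch measure)."""
    return sum(
        (x - y) ** 2
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


def permute_columns(q, perm):
    """Return a copy of q whose column j is column perm[j] of the original."""
    return [[row[k] for k in perm] for row in q]


def align_to_reference(reference, replicate):
    """Exhaustively search all K! cluster relabelings of `replicate`.

    Returns the permutation that brings the replicate closest to the
    reference, together with the relabeled replicate. Hypothetical helper;
    practical only for small K because the search grows factorially.
    """
    k = len(reference[0])
    best_perm = min(
        permutations(range(k)),
        key=lambda p: frobenius_sq(reference, permute_columns(replicate, p)),
    )
    return best_perm, permute_columns(replicate, best_perm)


# Toy data: three individuals, three clusters. The second replicate holds
# the same solution as the first, but with its cluster labels shuffled.
reference = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
]
replicate = [
    [0.1, 0.0, 0.9],
    [0.8, 0.1, 0.1],
    [0.2, 0.8, 0.0],
]

perm, aligned = align_to_reference(reference, replicate)
print(perm)
print(aligned)
```

In this toy case the two replicates differ only by label switching, so some permutation reduces the distance to zero. If the smallest distance stayed large for every permutation, that would point to the second source of disagreement named in the abstract, genuine multimodality.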