Veit Schwämmle (1,2), Ole N Jensen (1,2)
1 Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense M, Denmark.
2 VILLUM Center for Bioanalytical Sciences, University of Southern Denmark, Odense M, Denmark.
Abstract
Motivation: Data clustering is indispensable for identifying biologically relevant molecular features in large-scale omics experiments with thousands of measurements across multiple conditions. Optimal clustering yields groups of functionally related features, which may include genes, proteins and metabolites involved in biological processes and molecular networks. Omics experiments typically include replicate measurements of each feature within a given condition so that feature-specific variation can be assessed statistically. Current clustering approaches ignore this variation by averaging the replicates, which often leads to incorrect cluster assignments.

Results: We present VSClust, a clustering method that accounts for feature-specific variance. Built on an algorithm derived from fuzzy clustering, VSClust unifies statistical testing with pattern recognition to group the data into clusters that more accurately reflect the underlying molecular and functional behavior. We apply VSClust to artificial and experimental datasets comprising from hundreds to more than 80 000 features across 6 to 20 conditions, including genomics, transcriptomics, proteomics and metabolomics experiments. VSClust avoids arbitrary averaging, outperforms standard fuzzy c-means clustering and simplifies the data analysis workflow in large-scale omics studies.

Availability and implementation: Download VSClust at https://bitbucket.org/veitveit/vsclust or access it through computproteomics.bmb.sdu.dk/Apps/VSClust.

Supplementary information: Supplementary data are available at Bioinformatics online.
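The abstract contrasts VSClust with standard fuzzy c-means applied to replicate-averaged data. The sketch below is not VSClust's algorithm. It is a minimal illustration, under stated assumptions, of one way feature-specific variance can enter fuzzy c-means. Each feature i keeps its own fuzzifier m_i in the objective sum_i sum_c u_ic^(m_i) ||x_i - v_c||^2. A hypothetical helper, `fuzzifiers`, then maps a larger within-condition replicate variance to a larger m_i, which produces a less confident (fuzzier) assignment. The linear mapping, the bounds 1.5 and 3.0, the max-min initialisation and the toy data are all illustrative assumptions.

```python
import random
from statistics import mean, variance


def summarize(feature):
    """feature: list of conditions, each a list of replicate values.
    Returns (condition means, pooled within-condition variance)."""
    centre = [mean(reps) for reps in feature]
    pooled = mean(variance(reps) for reps in feature)
    return centre, pooled


def fuzzifiers(pooled_vars, m_min=1.5, m_max=3.0):
    """HYPOTHETICAL mapping (not VSClust's formula): scale each feature's
    pooled variance linearly into [m_min, m_max]; noisier -> fuzzier."""
    lo, hi = min(pooled_vars), max(pooled_vars)
    span = (hi - lo) or 1.0
    return [m_min + (m_max - m_min) * (v - lo) / span for v in pooled_vars]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_cmeans(points, ms, k, iters=100):
    """Fuzzy c-means minimising sum_i sum_c u_ic**m_i * ||x_i - v_c||**2,
    where feature i carries its own fuzzifier m_i > 1."""
    # Deterministic max-min initialisation (an illustrative choice).
    centres = [list(points[0])]
    while len(centres) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centres))
        centres.append(list(far))
    u = []
    for _ in range(iters):
        # Memberships: u_ic = 1 / sum_j (D_ic / D_ij) ** (1 / (m_i - 1)),
        # with D the squared Euclidean distance.
        u = []
        for x, m in zip(points, ms):
            d = [max(sq_dist(x, c), 1e-12) for c in centres]
            e = 1.0 / (m - 1.0)
            u.append([1.0 / sum((d[c] / d[j]) ** e for j in range(k))
                      for c in range(k)])
        # Centres: weighted means with weights u_ic ** m_i.
        for c in range(k):
            w = [row[c] ** m for row, m in zip(u, ms)]
            total = sum(w)
            centres[c] = [sum(wi * x[t] for wi, x in zip(w, points)) / total
                          for t in range(len(points[0]))]
    return centres, u


# Made-up toy data: 3 conditions x 3 replicates per feature.
rng = random.Random(1)


def make(pattern, noise):
    return [[v + rng.gauss(0, noise) for _ in range(3)] for v in pattern]


features = ([make([0, 1, 2], 0.1) for _ in range(5)]     # clean "rising"
            + [make([2, 1, 0], 0.1) for _ in range(5)]   # clean "falling"
            + [make([0, 1, 2], 1.0)])                    # noisy "rising"
summaries = [summarize(f) for f in features]
points = [s[0] for s in summaries]
ms = fuzzifiers([s[1] for s in summaries])
centres, u = fuzzy_cmeans(points, ms, k=2)
print([round(max(row), 3) for row in u])
```

On this toy data, the noisy "rising" feature should receive a noticeably lower top membership than the clean features. This illustrates how keeping replicate variance, instead of discarding it by averaging, can soften the assignment of an unreliable feature.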