Mustafa Buyukozkan, Karsten Suhre, Jan Krumsiek.
Abstract
The 'Subgroup Identification' (SGI) toolbox provides an algorithm to automatically detect clinical subgroups of samples in large-scale omics datasets. It is based on hierarchical clustering trees in combination with a specifically designed association testing and visualization framework that can process an arbitrary number of clinical parameters and outcomes in a systematic fashion. A multi-block extension allows for the simultaneous use of multiple omics datasets on the same samples. In this paper, we first describe the functionality of the toolbox and then demonstrate its capabilities through application examples on a type 2 diabetes metabolomics study as well as two copy number variation datasets from The Cancer Genome Atlas.
AVAILABILITY: SGI is an open-source package implemented in R. Package source code and hands-on tutorials are available at https://github.com/krumsieklab/sgi. The QMdiab metabolomics data is included in the package and can be downloaded from https://doi.org/10.6084/m9.figshare.5904022.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Year: 2021 PMID: 34529048 PMCID: PMC8723155 DOI: 10.1093/bioinformatics/btab656
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Application example. Blood metabolomics-based clustering of n = 356 participants of the QMdiab study. White circles in the tree indicate the left/right splitting points of the samples in the data (note that these are not centered if the subclusters are of unequal size). Markings on the tree indicate statistically significant associations of the parameter with the respective left and right subgroups at that split. The heatmap track below the tree shows individual values for selected parameters. Red circles between gaps indicate significant results for left versus right at that split and are horizontally aligned with their respective white circles on the tree. The bottom panel shows the metabolomics data matrix behind the clustering.
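
The general idea described in the abstract and in Fig. 1, testing a clinical parameter for a left-versus-right difference at each split of a hierarchical clustering tree, can be illustrated with a minimal sketch. This is not the SGI implementation and does not use the SGI R package or its API; it is written in Python with SciPy purely for illustration. The function name `scan_splits`, the Ward linkage, the Mann-Whitney U test, the Bonferroni correction, the `min_size` cutoff and the synthetic data are all assumptions that do not come from the source.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.stats import mannwhitneyu


def scan_splits(data, outcome, min_size=5):
    """Test one numeric outcome for a left-vs-right difference at each tree split.

    Hypothetical helper for illustration only; not part of SGI.
    Returns a list of (node_id, n_left, n_right, adjusted_p) tuples.
    """
    # Build the clustering tree (Ward linkage is an assumption of this sketch).
    root = to_tree(linkage(data, method="ward"))

    tested = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.is_leaf():
            continue
        left_node, right_node = node.get_left(), node.get_right()
        stack.extend([left_node, right_node])

        # Sample indices that fall under the left and right branch of this split.
        left = left_node.pre_order()
        right = right_node.pre_order()
        if min(len(left), len(right)) < min_size:
            continue  # skip splits whose subgroups are too small to test

        # Two-sided rank test of the outcome between the two subgroups.
        p = mannwhitneyu(outcome[left], outcome[right],
                         alternative="two-sided").pvalue
        tested.append((node.id, len(left), len(right), p))

    # Bonferroni correction over all splits that were actually tested.
    n_tests = max(len(tested), 1)
    return [(nid, nl, nr, min(p * n_tests, 1.0)) for nid, nl, nr, p in tested]


# Synthetic example: two well-separated groups of 30 samples, 10 features each.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 1.0, size=(30, 10)),
                  rng.normal(5.0, 1.0, size=(30, 10))])
# A made-up clinical parameter that differs between the two groups.
outcome = np.concatenate([rng.normal(0.0, 1.0, size=30),
                          rng.normal(3.0, 1.0, size=30)])

results = scan_splits(data, outcome)
for node_id, n_left, n_right, p_adj in sorted(results, key=lambda r: r[3]):
    print(f"split {node_id}: {n_left} vs {n_right} samples, adjusted p = {p_adj:.3g}")
```

In this toy setup the top split of the tree separates the two simulated groups, so it should carry the strongest association with the made-up parameter. Handling many parameters of different types (binary, categorical, survival) would need a suitable test per type, which this sketch does not attempt.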