Y M Tikunov, S Laptenok, R D Hall, A Bovy, R C H de Vos.
Abstract
Mass peak alignment (ion-wise alignment) has recently become a popular method for unsupervised data analysis in untargeted metabolic profiling. Here we present MSClust, a software tool for the analysis of GC-MS and LC-MS datasets derived from untargeted profiling. MSClust performs data reduction by unsupervised clustering and extracts putative metabolite mass spectra from ion-wise chromatographic alignment data. The algorithm is based on the subtractive fuzzy clustering method, which allows unsupervised determination of the number of metabolites in a data set and can deal with uncertain memberships of mass peaks in overlapping mass spectra. This approach relies purely on the information actually present in the data and does not require any prior metabolite knowledge. MSClust can be applied to both GC-MS and LC-MS alignment data sets. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s11306-011-0368-2) contains supplementary material, which is available to authorized users.
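The abstract names subtractive clustering as the basis of MSClust. As a rough illustration only, the sketch below implements the textbook subtractive clustering procedure (Chiu's formulation) on generic feature points: every point receives a potential that grows with the number of close neighbours, the highest-potential point becomes a centre, its influence is subtracted from all points, and selection stops once the best remaining potential falls below a fraction of the first one. This is how the number of clusters can be determined without being fixed in advance. The function name and the parameters `radius`, `squash` and `stop_ratio` (and their defaults) are assumptions for this example, not MSClust's actual settings or API.

```python
import math


def subtractive_clustering(points, radius=1.0, squash=1.5, stop_ratio=0.15):
    """Textbook (Chiu-style) subtractive clustering; returns indices of chosen centres.

    points: list of equal-length numeric tuples (feature vectors).
    radius, squash, stop_ratio: hypothetical tuning parameters for this sketch.
    """
    if not points:
        return []
    alpha = 4.0 / radius ** 2             # neighbourhood width for the potential
    beta = 4.0 / (squash * radius) ** 2   # wider neighbourhood used when subtracting

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Potential of each point: large when many other points lie close to it.
    potential = [sum(math.exp(-alpha * sq_dist(p, q)) for q in points) for p in points]
    first_peak = max(potential)
    centres = []
    while True:
        k = max(range(len(points)), key=potential.__getitem__)
        # Stop when the best remaining candidate is weak relative to the first centre.
        if potential[k] < stop_ratio * first_peak:
            break
        centres.append(k)
        pk = potential[k]
        # Subtract the new centre's influence; the centre itself drops to zero.
        potential = [
            p - pk * math.exp(-beta * sq_dist(points[i], points[k]))
            for i, p in enumerate(potential)
        ]
    return centres
```

On two well-separated groups of points, the procedure picks one centre per group and then stops, because the leftover potential near each centre has been subtracted away.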
Year: 2011 PMID: 22833709 PMCID: PMC3397229 DOI: 10.1007/s11306-011-0368-2
Source DB: PubMed Journal: Metabolomics ISSN: 1573-3882 Impact factor: 4.290
Fig. 1 A general workflow of a comparative metabolomics data analysis based on the mass peak alignment approach. MSClust receives a mass peak alignment data matrix of size M × S, where M is the number of mass peaks (often tens of thousands) aligned across the S profiled samples. As a result, it produces a reduced data matrix of size C × S, where C is the number of putative compounds (normally a few hundred), each represented by a single mass peak aligned across the same S samples. In addition, it extracts a mass spectrum for each of the C compounds; for GC–MS data, these spectra are compatible with the NIST MSSearch compound identification software.
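The M × S to C × S reduction described in Fig. 1 can be sketched as follows, assuming each mass peak has already been assigned to a single cluster (a hard-label simplification of MSClust's fuzzy memberships). The function `reduce_alignment` and its rule for picking the representative peak (largest summed intensity across samples) are hypothetical; the caption does not state how MSClust chooses that peak.

```python
from collections import defaultdict


def reduce_alignment(matrix, labels, mz):
    """Collapse an M x S peak matrix into a C x S compound matrix (hypothetical helper).

    matrix: M rows (mass peaks), each holding S sample intensities.
    labels: cluster id for each of the M peaks, e.g. from a clustering step.
    mz: m/z value of each of the M peaks.
    Returns (reduced rows in sorted cluster-id order, {cluster id: pseudo-spectrum}).
    """
    members = defaultdict(list)
    for i, c in enumerate(labels):
        members[c].append(i)
    reduced, spectra = [], {}
    for c in sorted(members):
        idx = members[c]
        # Assumed rule: the representative peak is the member with the largest summed intensity.
        rep = max(idx, key=lambda i: sum(matrix[i]))
        reduced.append(list(matrix[rep]))
        # Pseudo-spectrum: each member's m/z paired with its mean intensity, sorted by m/z.
        spectra[c] = sorted((mz[i], sum(matrix[i]) / len(matrix[i])) for i in idx)
    return reduced, spectra
```

With four peaks in two clusters, the call returns two rows (one per putative compound) and a two-entry dictionary of pseudo-spectra; only the reduced rows go on to statistical comparison, while the spectra serve identification.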
Fig. 2 The scheme illustrates the basic steps of the MSClust algorithm. (A) Computing the PD of each ion fragment from two distances: the retention time distance between mass peak apices (determined by the alignment software) (X-axis of A, B and C) and an intensity pattern similarity distance (Y-axis of A, B and C). The more close neighbours a mass peak has in this two-dimensional feature space, the higher its PD (the darker its dot in plot A). (B) Selection of ‘centrotype’ ion fragments as cluster centers (cA and cB). (C) Classification: computing the membership of each ion fragment in each cluster center. The dots depicted in brown have uncertain (intermediate) membership and can represent mass peaks common to cA and cB. (D) Conversion of the clustering results into reconstructed mass spectra (‘ms A’ and ‘ms B’) and selection of the most representative mass peaks (‘qA’ and ‘qB’). The red-green color scale below reflects the membership of mass peaks in cluster A (green) and B (red).
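Steps A and C of Fig. 2 rely on a distance between mass peaks that combines retention time with intensity pattern. A minimal sketch, under explicit assumptions: the pattern distance is taken as 1 minus the Pearson correlation of the two intensity profiles across samples, both distances are divided by hypothetical tolerances (`rt_tol`, `corr_tol`) and combined as a Euclidean norm, and the membership in each centrotype is a Gaussian of that distance. These formulas and function names illustrate the idea and are not MSClust's published definitions.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length intensity profiles (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def peak_distance(rt_a, prof_a, rt_b, prof_b, rt_tol=0.05, corr_tol=0.3):
    """Combine a scaled retention-time distance with a scaled pattern distance (hypothetical scales)."""
    d_rt = abs(rt_a - rt_b) / rt_tol
    d_pat = (1.0 - pearson(prof_a, prof_b)) / corr_tol
    return math.hypot(d_rt, d_pat)


def memberships(peak, centres):
    """Gaussian membership of one (rt, profile) peak in each (rt, profile) centrotype."""
    rt, prof = peak
    return [math.exp(-peak_distance(rt, prof, c_rt, c_prof) ** 2 / 2) for c_rt, c_prof in centres]


# Two hypothetical centrotypes with opposite intensity patterns across four samples.
centres = [(10.00, [1, 2, 3, 4]), (10.20, [4, 3, 2, 1])]
print(memberships((10.01, [2, 4, 6, 8]), centres))
```

Memberships here are computed independently for each centre rather than normalised to sum to one, so a peak lying between two centres can score at an intermediate level in both, which mirrors the shared (brown) peaks described in panel C.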