Saumyadipta Pyne1, Sharon X Lee2, Kui Wang2, Jonathan Irish3, Pablo Tamayo4, Marc-Danie Nazaire4, Tarn Duong5, Shu-Kay Ng6, David Hafler7, Ronald Levy8, Garry P Nolan9, Jill Mesirov4, Geoffrey J McLachlan2.
Abstract
In biomedical applications, an experimenter encounters different potential sources of variation in data, such as individual samples, multiple experimental conditions, and multivariate responses of a panel of markers, for example from a signaling network. In multiparametric cytometry, which is often used for analyzing patient samples, such issues are critical. While computational methods can identify cell populations in individual samples, without the ability to automatically match them across samples it is difficult to compare and characterize the populations in typical experiments, such as those responding to various stimulations or distinctive of particular patients or time-points, especially when there are many samples. Joint Clustering and Matching (JCM) is a multi-level framework for simultaneous modeling and registration of populations across a cohort. JCM models every population with a robust multivariate probability distribution. Simultaneously, JCM fits a random-effects model to construct an overall batch template, which is used for registering populations across samples and for classifying new samples. By tackling systems-level variation, JCM supports practical biomedical applications involving large cohorts. Software for fitting the JCM models has been implemented in an R package, EMMIX-JCM, available from http://www.maths.uq.edu.au/~gjm/mix_soft/EMMIX-JCM/.
Year: 2014 PMID: 24983991 PMCID: PMC4077578 DOI: 10.1371/journal.pone.0100334
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. JCM model and application.
The multi-level model is illustrated using the samples (bottom) and the template (top) for the samples of the 3 min class, along 3 out of 4 dimensions in the TCR activation data. Actual values of the JCM parameters were used to construct the 50th percentile multivariate t density contour (ellipsoid) depicting every population. The overall class template is computed by fitting a random-effects model on all the samples, which in turn are fitted with sample-specific finite mixture models of multivariate t distributions. Under the JCM framework, each sample can be described as an affine transformation of the template, where each population in a sample corresponds to its counterpart in the class template, as shown by the matched colors and labels (#1–5).
Figure 2. Distinct spatial characteristics of phospho-marker expression in samples from two classes of patients with different outcomes.
(A) Heatplots provide insight into the distribution of phospho-proteomic expression of p-PLCg2 and p-STAT5 (panel 4) for (top 2 rows) and (bottom row) samples. The mound (high CD20 and BCL-2) populations are shown here. In contrast to the more symmetrically distributed, well-rounded mounds, the skewness in the mounds is clearly visible. (B) The stimulated mound (light brown histogram) of a sample is shown in contrast with the corresponding population prior to stimulation (greyish blue histogram). (C) The ability of the mound skew parameters () for 16 phospho-markers to distinguish samples across the and classes (green and pink labels respectively) is shown with a heatmap based on the corresponding posterior log-odds scores. The higher the score, the darker the corresponding entry in red/blue. Each marker name and its average posterior log-odds score over all samples are marked on the sides of the heatmap.
Table 1. Classification error rates of three methods on DLBCL data.
| Sample | JCM | HDPGMM | FLAME |
|---|---|---|---|
| Sa001 | 0.3045 | 0.2046 | 0.5143 |
| Sa002 | 0.0339 | 0.1044 | 0.4300 |
| Sa003 | 0.0694 | 0.0946 | 0.5931 |
| Sa004 | 0.0659 | 0.0946 | 0.5459 |
| Sa005 | 0.0089 | 0.1230 | 0.4440 |
| Sa006 | 0.2947 | 0.0611 | 0.5987 |
| Sa007 | 0.0208 | 0.0510 | 0.2584 |
| Sa008 | 0.0683 | 0.0719 | 0.3719 |
| Sa009 | 0.0249 | 0.1343 | 0.2417 |
| Sa010 | 0.0121 | 0.3828 | 0.5413 |
| Sa011 | 0.0236 | 0.4082 | 0.4792 |
| Sa012 | 0.0096 | 0.1148 | 0.2456 |
| Sa013 | 0.0326 | 0.3247 | 0.5947 |
| Sa014 | 0.0062 | 0.2959 | 0.6000 |
| Sa015 | 0.1283 | 0.4110 | 0.3927 |
| Sa016 | 0.0361 | 0.4437 | 0.5372 |
| AMCR | 0.0711 | 0.2038 | 0.4618 |
Samples from 16 patients diagnosed with diffuse large B-cell lymphoma (DLBCL) were clustered using JCM, HDPGMM, and FLAME. For both JCM and HDPGMM, a class template is computed for the entire batch of samples, while FLAME performs post hoc alignment of the results given by FLAME-I, where FLAME-I denotes FLAME applied to each sample separately. The final row shows the average misclassification rate (AMCR) for each method. JCM has the lowest AMCR (0.0711, versus 0.2038 for HDPGMM and 0.4618 for FLAME) and the lowest error rate on 14 of the 16 samples; HDPGMM is lower on Sa001 and Sa006.
Table 2. Classification error rates of various methods on DLBCL data.
| Sample | JCM | FLAME-I | flowClust-I | SWIFT-I | FLAME-P | flowClust-P | SWIFT-P |
|---|---|---|---|---|---|---|---|
| Sa001 | 0.3045 | 0.3039 | 0.3070 | 0.5368 | 0.1666 | 0.2187 | 0.3039 |
| Sa002 | 0.0339 | 0.3394 | 0.0388 | 0.1526 | 0.2146 | 0.4096 | 0.3060 |
| Sa003 | 0.0694 | 0.0753 | 0.0588 | 0.4500 | 0.1790 | 0.3194 | 0.2204 |
| Sa004 | 0.0659 | 0.0687 | 0.0682 | 0.5506 | 0.1227 | 0.1661 | 0.3038 |
| Sa005 | 0.0089 | 0.1631 | 0.1868 | 0.4521 | 0.1415 | 0.0752 | 0.1220 |
| Sa006 | 0.2947 | 0.2670 | 0.1150 | 0.3612 | 0.0809 | 0.3773 | 0.1869 |
| Sa007 | 0.0208 | 0.0211 | 0.0217 | 0.2580 | 0.0943 | 0.0569 | 0.0438 |
| Sa008 | 0.0683 | 0.0678 | 0.0997 | 0.1911 | 0.0852 | 0.1045 | 0.3560 |
| Sa009 | 0.0249 | 0.3191 | 0.0891 | 0.2508 | 0.0487 | 0.0302 | 0.0186 |
| Sa010 | 0.0121 | 0.0575 | 0.0111 | 0.5353 | 0.0628 | 0.0471 | 0.0757 |
| Sa011 | 0.0236 | 0.0248 | 0.0248 | 0.1627 | 0.0240 | 0.1660 | 0.1004 |
| Sa012 | 0.0096 | 0.3919 | 0.4613 | 0.2170 | 0.0421 | 0.0299 | 0.0188 |
| Sa013 | 0.0326 | 0.0324 | 0.0355 | 0.5936 | 0.0796 | 0.0500 | 0.0581 |
| Sa014 | 0.0062 | 0.0065 | 0.0083 | 0.5612 | 0.0857 | 0.0159 | 0.0373 |
| Sa015 | 0.1283 | 0.1274 | 0.1317 | 0.5896 | 0.1093 | 0.1077 | 0.0947 |
| Sa016 | 0.0361 | 0.0554 | 0.1832 | 0.4502 | 0.0524 | 0.0535 | 0.0803 |
| AMCR | 0.0711 | 0.1451 | 0.1151 | 0.3946 | 0.1128 | 0.1393 | 0.1454 |
Misclassification rate (MCR) for JCM, FLAME, flowClust and SWIFT on the 16 samples from the DLBCL dataset (see also Table 1). The latter three methods were applied to each individual sample separately (denoted with suffix -I), and also based on a pooling approach (denoted with suffix -P). The final row shows the average misclassification rate (AMCR) for each method.
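The AMCR in the final row of each table can be read as the arithmetic mean of the per-sample misclassification rates in that column. The sketch below illustrates this for the JCM column only, assuming a plain unweighted mean over the 16 samples; the names `jcm_mcr` and `average_mcr` are hypothetical and not part of EMMIX-JCM. Because the tabulated rates are rounded to four decimals, the recomputed mean (about 0.071) agrees with the reported value only up to rounding.

```python
# Per-sample misclassification rates for JCM, copied from the JCM column
# of the tables (values are rounded to four decimals in the source).
jcm_mcr = {
    "Sa001": 0.3045, "Sa002": 0.0339, "Sa003": 0.0694, "Sa004": 0.0659,
    "Sa005": 0.0089, "Sa006": 0.2947, "Sa007": 0.0208, "Sa008": 0.0683,
    "Sa009": 0.0249, "Sa010": 0.0121, "Sa011": 0.0236, "Sa012": 0.0096,
    "Sa013": 0.0326, "Sa014": 0.0062, "Sa015": 0.1283, "Sa016": 0.0361,
}


def average_mcr(rates):
    """Return the unweighted mean of per-sample misclassification rates.

    Assumption: every sample contributes equally, regardless of how many
    cells it contains.
    """
    values = list(rates.values())
    return sum(values) / len(values)


amcr = average_mcr(jcm_mcr)
print(f"JCM AMCR over {len(jcm_mcr)} samples: {amcr:.4f}")
```

The same function can be applied to any other column by replacing the dictionary with that method's per-sample rates.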