Andrew P Voigt1, Lisa Eidenschink Brodersen1, Laura Pardo2, Soheil Meshinchi2, Michael R Loken1.
Abstract
Identification and quantification of maturing hematopoietic cell populations in flow cytometry data sets is a complex and sometimes irreproducible step in data analysis. Supervised machine learning algorithms show promise for automatically classifying cells into populations, reducing subjective bias in data analysis. We describe the use of support vector machines (SVMs), a supervised algorithm, to reproducibly identify two distinctly different populations of normal hematopoietic cells, mature lymphocytes and uncommitted progenitor cells, in the challenging setting of pediatric bone marrow specimens obtained 1 month after chemotherapy. Four-color flow cytometry data were collected on a FACS Calibur for 77 randomly selected postchemotherapy pediatric patients enrolled on the Children's Oncology Group clinical trial AAML1031. These patients demonstrated no evidence of detectable residual disease and were divided into training (n = 27) and testing (n = 50) cohorts. SVMs were trained to identify mature lymphocytes and uncommitted progenitor cells in the training cohort before independent evaluation of prediction efficiency in the testing cohort. Both SVMs demonstrated high predictive performance (lymphocyte SVM: sensitivity >0.99, specificity >0.99; uncommitted progenitor cell SVM: sensitivity = 0.94, specificity >0.99) and closely mirrored manual cell classifications by two expert analysts. SVMs present an efficient, automated methodology for identifying normal cell populations even in stressed bone marrows, replicating the performance of an expert while reducing the intrinsic bias of gating procedures between multiple analysts.
Keywords: bone marrow; classification; flow cytometry; lymphocytes; support vector machines; uncommitted progenitor cells
Year: 2016 PMID: 27416291 PMCID: PMC5132084 DOI: 10.1002/cyto.a.22905
Source DB: PubMed Journal: Cytometry A ISSN: 1552-4922 Impact factor: 4.355
Monoclonal antibody combinations
| Tube no. | FITC | PE | PerCP | APC |
|---|---|---|---|---|
| 1 | HLA‐DR | CD11b | CD45 | CD34 |
| Clone | L243 (BD) | D12 (BD) | 2D1 (BD) | 8G12 (BD) |
| 2 | CD36 | CD38 | CD45 | CD34 |
| Clone | FA6.152 (BC) | HB7 (BD) | | |
| 3 | CD16 | CD13 | CD45 | CD34 |
| Clone | 3G8 (BD) | L138 (BD) | | |
| 4 | CD14 | CD33 | CD45 | CD34 |
| Clone | Mϕ/P9 (BD) | P67.6 (BD) | | |
| 5 | CD7 | CD56 | CD45 | CD34 |
| Clone | 4H9 (BD) | MY31 (BD) | | |
| 6 | CD38 | CD117 | CD45 | CD34 |
| Clone | HIT2 (Invitro) | 104D2 (BD) | | |
| 7 | CD36 | CD64 | CD45 | CD34 |
| Clone | FA6.152 (BC) | 22 (T) | | |
| 8 | CD19 | CD123 | CD45 | CD34 |
| Clone | 4G7 (BD) | 9F5 (BD) | | |
Figure 1. Expert cellular classifications for SVM training: (A) Lymphocytes (purple) were identified by an expert analyst as a discrete cluster of events with high CD45 intensity and low SSC. (B) The high relative frequency of the lymphocyte population is depicted on a 3D plot of CD45, SSC, and frequency. (C) Uncommitted progenitor cells (purple) were identified by an expert analyst as the cells with the brightest CD34 intensity before a gain or loss of CD33. Maturational pathways as these cells commit to monocyte, neutrophil, dendritic, basophil, and lymphocyte lineages are shown with arrows. (D) The low relative frequency of the uncommitted progenitor cell population is depicted on a 3D plot of CD33, CD34, and frequency.
Figure 2. Qualitative evaluation of SVM predictions for a test cohort patient. (A, B) Each SVM prediction was compared to an independent manual classification of lymphocytes (A) and uncommitted progenitor cells (B). Cells colored in red were classified as the target population by both the expert analyst and the SVM. Discrepant events colored in green were identified only by the expert analyst, while events colored in purple were identified only by the SVM. The discrepant classifications occur at the outer boundaries of the target population. (C) A frequency curve of CD45 intensities for all cells (black) reveals a uniform subpopulation of cells with high CD45 intensity, which is composed almost entirely of SVM‐classified lymphocytes (red). The majority of nonlymphocytes have a lower CD45 intensity (blue), with the remainder of bright CD45 cells classified as monocytes. (D) A frequency curve of CD34 intensities for all cells with a CD34 intensity greater than two log units (black) reveals a heterogeneous distribution of CD34 intensities. SVM classification reveals a homogeneous, high‐intensity CD34 peak for the uncommitted progenitor cells (red) compared to the lineage‐committed progenitor cells (blue).
Prediction efficiency of each SVM
| | Lymphocyte SVM | Uncommitted progenitor cell SVM | Lymphocyte expert 2 | Uncommitted progenitor cell expert 2 |
|---|---|---|---|---|
| Sensitivity | 0.994 | 0.944 | 0.934 | 0.974 |
| Specificity | 0.991 | 0.998 | 0.996 | 0.998 |
| MCC | 0.948 | 0.904 | 0.940 | 0.921 |
Average sensitivity, specificity, and Matthews correlation coefficient (MCC) values were computed as the mean of the per-patient sensitivity, specificity, and MCC measurements across the 50 test predictions, for both the SVM and expert 2 (LP). Classifications from expert 1 (MRL) were designated as the true classifications.
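The averaging described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names (`confusion_counts`, `sensitivity`, `specificity`, `mcc`, `mean_metrics`) and the input layout (one pair of per-event 0/1 label lists per patient, with expert 1 as truth) are hypothetical, and MCC is taken to be the standard Matthews correlation coefficient.

```python
import math

def confusion_counts(truth, predicted):
    """Count TP, FP, TN, FN for per-event labels (1 = target population)."""
    tp = fp = tn = fn = 0
    for t, p in zip(truth, predicted):
        if t and p:
            tp += 1
        elif p:          # predicted positive, truth negative
            fp += 1
        elif t:          # truth positive, predicted negative
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def sensitivity(tp, fn):
    return tp / (tp + fn) if (tp + fn) else float("nan")

def specificity(tn, fp):
    return tn / (tn + fp) if (tn + fp) else float("nan")

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def mean_metrics(patients):
    """Average per-patient metrics over a cohort.

    `patients` is a list of (truth, predicted) pairs, one per patient
    (hypothetical layout; truth = expert 1 labels).
    """
    sens, spec, corr = [], [], []
    for truth, predicted in patients:
        tp, fp, tn, fn = confusion_counts(truth, predicted)
        sens.append(sensitivity(tp, fn))
        spec.append(specificity(tn, fp))
        corr.append(mcc(tp, fp, tn, fn))
    n = len(patients)
    return sum(sens) / n, sum(spec) / n, sum(corr) / n
```

Because the target populations are rare relative to all events (especially the uncommitted progenitor cells), specificity alone will sit near 1 for almost any classifier; MCC uses all four cells of the confusion matrix and is therefore more informative for imbalanced classes.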
Target population variability between replicate analyses
| | Lymphocytes | Uncommitted progenitor cells |
|---|---|---|
| Replicate FSC SD (linear units) | 3.28 | 25.6 |
| Replicate SSC SD (log units) | 0.022 | 0.053 |
| Replicate CD45 SD (log units) | 0.017 | 0.032 |
| Replicate CD34 SD (log units) | N.A. | 0.031 |
| Replicate frequency SD | 0.0053 | 0.0019 |
In each test patient, the average FSC, SSC, and CD45 intensities were calculated for the eight lymphocyte and uncommitted progenitor cell predictions. Replicate CD34 intensities were additionally computed for uncommitted progenitor cells but not lymphocytes, as the lymphocytes do not express this gene product. The variation (standard deviation) between these eight measurements was calculated for each test patient and averaged for the test patient cohort. Additionally, the predicted population frequency (predicted events/total events) was calculated for each of the eight predictions. The variation (standard deviation) of the predicted population frequency between the eight predictions was calculated for each patient and averaged for the test patient cohort.
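The two-stage calculation described above (a standard deviation across the eight replicate predictions within each patient, then a mean across the cohort) might be sketched as follows. The function names and input layout are hypothetical, and the source does not state whether a sample or population standard deviation was used; the sample form (`statistics.stdev`) is an assumption here.

```python
from statistics import mean, stdev

def predicted_frequency(predicted_events, total_events):
    """Predicted population frequency = predicted events / total events."""
    return predicted_events / total_events

def replicate_sd(per_patient_values):
    """Cohort-averaged replicate variability.

    `per_patient_values` holds one inner list per patient, containing that
    patient's replicate measurements (e.g. eight mean CD45 intensities or
    eight predicted frequencies). Assumes the sample SD; swap in
    statistics.pstdev for the population form.
    """
    return mean(stdev(values) for values in per_patient_values)

# Toy numbers for illustration only (not data from the study):
toy = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
cohort_sd = replicate_sd(toy)  # per-patient SDs are 1.0 and 2.0
```

Averaging the per-patient SDs, rather than pooling all replicates across patients, keeps between-patient biological differences out of the reported variability, so the figure reflects only how consistently the classifier behaves on repeated measurements of the same specimen.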