Holger Hennig, Paul Rees, Thomas Blasi, Lee Kamentsky, Jane Hung, David Dao, Anne E Carpenter, Andrew Filby.
Abstract
Imaging flow cytometry (IFC) enables the high-throughput collection of morphological and spatial information from hundreds of thousands of single cells. This high-content, information-rich image data can in theory resolve important biological differences among complex, often heterogeneous biological samples. However, data analysis is often performed in a highly manual and subjective manner using very limited image analysis techniques in combination with conventional flow cytometry gating strategies. This approach is not scalable to the hundreds of available image-based features per cell and thus makes use of only a fraction of the spatial and morphometric information. As a result, the quality, reproducibility and rigour of results are limited by the skill, experience and ingenuity of the data analyst. Here, we describe a pipeline using open-source software that leverages the rich information in digital imagery using machine learning algorithms. Raw image files (.rif) from an imaging flow cytometer are compensated and corrected, and the resulting data files (the proprietary .cif file format) are imported into the open-source software CellProfiler, where an image processing pipeline identifies cells and subcellular compartments, allowing hundreds of morphological features to be measured. This high-dimensional data can then be analysed with cutting-edge machine learning and clustering approaches on "user-friendly" platforms such as CellProfiler Analyst. Researchers can train an automated cell classifier to recognize different cell types, cell cycle phases, drug treatment/control conditions, etc., using supervised machine learning. This workflow should enable the scientific community to leverage the full analytical power of IFC-derived data sets. 
It will help to reveal otherwise unappreciated populations of cells based on features that may be hidden to the human eye, including subtle measured differences in label-free detection channels such as bright-field and dark-field imagery.
Keywords: Feature selection; High-throughput; Imaging flow cytometry; Machine learning; Open-source software; Profiling
Year: 2016 PMID: 27594698 PMCID: PMC5231320 DOI: 10.1016/j.ymeth.2016.08.018
Source DB: PubMed Journal: Methods ISSN: 1046-2023 Impact factor: 3.608
Fig. 1 Guidance on choosing cytometric method and analysis method. Any researcher who wants to use cytometry technology to ask a defined question should consider “what is the best approach” based on the question. For example, if morphological/spatial information is not required, then so-called “zero resolution flow cytometry” is best. If, however, the question absolutely requires imagery, then the sample type should be considered next: Is it tissue? Can it be disaggregated? Could it be analysed in such a way that the spatial relationship of individual cells is lost? In our experience, IFC is best applied to situations where the cells' biology can still be analysed when in suspension. This could still be disaggregated tissue or adherent cells and not just cells that exist in suspension. If the target cell population is rare, then suspension-based high-throughput analysis is often necessary to collect sufficient events for statistical confidence. Once the IFC data are collected, several options can be chosen for data analysis. This figure summarises these options in light of our proposed solution. The historical option is to rely entirely on IDEAS software to perform a potentially subjective, iterative image analysis that involves adapting the masking/segmentation rules to best identify key pixels within an image channel and then trying to select the best feature calculated on these pixels, with the aim of resolving different phenotypes from one another. This approach can be partially automated using the so-called “find the best feature” method. We propose, however, that a deeper analysis of features is more appropriate to IFC data sets. In this regard, we have developed and validated a machine learning-based approach to analyse IFC data that has been corrected and compensated in IDEAS (.rif to .cif conversion). We then use the open-source image analysis platforms CellProfiler and CellProfiler Analyst to better interrogate the imagery. 
Even in cases where the IDEAS-based iterative approach works very well, as is often the case when the outcome is well defined, there may be benefit to re-analysing these data using the approach presented here. It may uncover unappreciated features - in our own experience, this allowed us to perform a label-free classification of cell cycle stages, thus eliminating the need to add potentially confounding dyes to our cells [6].
Fig. 2 In-focus single cells are gated from the population using bright-field images. Left: cells with a sufficiently high gradient RMS are in focus. Right: objects with a high aspect ratio (a measure of circularity, y-axis) and a mask area that is neither too high nor too low (x-axis) represent single cells.
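The two gates in Fig. 2 amount to simple threshold tests on per-object features. The sketch below illustrates the idea in Python; the `Event` record, the threshold values and the reading of aspect ratio as minor axis divided by major axis are all assumptions made for illustration, since the caption gives no numeric cut-offs.

```python
from dataclasses import dataclass


@dataclass
class Event:
    gradient_rms: float  # focus quality of the bright-field image
    aspect_ratio: float  # assumed: minor axis / major axis, 1.0 = round
    area: float          # bright-field mask area, in the units exported


# Hypothetical thresholds; in practice they are set by inspecting the plots.
MIN_GRADIENT_RMS = 50.0
MIN_ASPECT_RATIO = 0.6
AREA_RANGE = (50.0, 300.0)


def is_in_focus(event: Event) -> bool:
    """Left panel: keep objects with a sufficiently high gradient RMS."""
    return event.gradient_rms >= MIN_GRADIENT_RMS


def is_single_cell(event: Event) -> bool:
    """Right panel: keep round objects whose area is neither too low nor too high."""
    low, high = AREA_RANGE
    return event.aspect_ratio >= MIN_ASPECT_RATIO and low <= event.area <= high


def gate(events: list[Event]) -> list[Event]:
    """Apply the focus gate, then the single-cell gate."""
    return [e for e in events if is_in_focus(e) and is_single_cell(e)]


events = [
    Event(gradient_rms=65.0, aspect_ratio=0.90, area=120.0),  # sharp, round, mid-sized
    Event(gradient_rms=30.0, aspect_ratio=0.90, area=120.0),  # blurred
    Event(gradient_rms=65.0, aspect_ratio=0.40, area=250.0),  # elongated, e.g. a doublet
    Event(gradient_rms=65.0, aspect_ratio=0.90, area=20.0),   # too small, e.g. debris
]
kept = gate(events)
print(f"{len(kept)} of {len(events)} events pass both gates")
```

Only the first synthetic event passes both gates; the other three each fail exactly one criterion.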
Fig. 3 (A) Previous protocol for high-throughput data analysis for imaging flow cytometry [6]. (B) New protocol for high-throughput data analysis in imaging flow cytometry, built from open-source, user-friendly software. (C) Alternative new protocol for high-throughput data analysis in imaging flow cytometry, describing use of various alternate tools at each step.
Fig. 4 Classification of the cell cycle of Jurkat cells using machine learning in CellProfiler Analyst. The cell images can be sorted via drag and drop into the five bins at the bottom: interphase (G1/S/G2) and the four mitotic phases, prophase (pro), metaphase (meta), anaphase (ana) and telophase (telo). The classifier, here GradientBoosting, is first trained ("train" button) and the training set is then cross-validated ("evaluate" button). The "score all" button predicts the cell cycle phase of all cells in the data set.
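CellProfiler Analyst exposes this train, evaluate and score-all loop through its graphical interface. A rough equivalent in scikit-learn might look like the sketch below. The synthetic feature table, the choice of 200 "annotated" cells, the five-fold cross-validation and all hyperparameters are assumptions made for illustration, not settings taken from the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict

PHASES = ["interphase", "prophase", "metaphase", "anaphase", "telophase"]

# Synthetic stand-in for a per-cell feature table (rows = cells, columns = features).
X_all, y_all = make_classification(
    n_samples=500,
    n_features=20,
    n_informative=10,
    n_redundant=0,
    n_classes=len(PHASES),
    n_clusters_per_class=1,
    class_sep=2.0,
    random_state=0,
)

# Pretend only the first 200 cells were sorted into bins by hand.
X_train, y_train = X_all[:200], y_all[:200]

clf = GradientBoostingClassifier(n_estimators=50, random_state=0)

# "Evaluate": cross-validate on the annotated training set only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_pred = cross_val_predict(clf, X_train, y_train, cv=cv)
cv_accuracy = float(np.mean(cv_pred == y_train))
print(f"cross-validated accuracy on the training set: {cv_accuracy:.2f}")

# "Train" on every annotated cell, then "score all" cells in the data set.
clf.fit(X_train, y_train)
predicted = clf.predict(X_all)
for index, phase in enumerate(PHASES):
    print(f"{phase}: {int(np.sum(predicted == index))} cells")
```

Cross-validating before scoring the full data set gives an estimate of how well the classifier generalises beyond the hand-sorted cells.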
Fig. 5 Label-free prediction of cell-cycle phases using GradientBoosting classification. The true positive rate (the ratio of correctly scored cells in a phase to the total number of cells in that phase) is substantially higher for GradientBoosting than for Random Forests classification for metaphase and anaphase.
Fig. 6 Label-free prediction of cell-cycle phases using a Random Forest classifier.
Confusion matrices for the GradientBoosting and Random Forests classifiers. Rows are the true class, columns the predicted class; each row sums to 100%.

| GradientBoosting: true class | Predicted Int | Predicted Pro | Predicted Meta | Predicted Ana | Predicted Telo |
|---|---|---|---|---|---|
| Int | 93.63 | 4.64 | 1.63 | 0.07 | 0.03 |
| Pro | 18.16 | 65.53 | 15.79 | 0.26 | 0.26 |
| Meta | 0.00 | 0.00 | 62.50 | 37.50 | 0.00 |
| Ana | 0.00 | 0.00 | 20.00 | 80.00 | 0.00 |
| Telo | 0.00 | 0.00 | 0.00 | 8.33 | 91.67 |

| Random Forests: true class | Predicted Int | Predicted Pro | Predicted Meta | Predicted Ana | Predicted Telo |
|---|---|---|---|---|---|
| Int | 93.19 | 5.43 | 1.35 | 0.00 | 0.03 |
| Pro | 21.32 | 67.89 | 10.00 | 0.00 | 0.79 |
| Meta | 37.50 | 12.50 | 50.00 | 0.00 | 0.00 |
| Ana | 0.00 | 40.00 | 20.00 | 40.00 | 0.00 |
| Telo | 0.00 | 0.00 | 0.00 | 0.00 | 100.00 |
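The per-phase true positive rate is the diagonal entry of a confusion matrix divided by its row total. The short sketch below recomputes it from the values tabulated above so the two classifiers can be compared phase by phase; the variable and function names are illustrative only.

```python
PHASES = ["Int", "Pro", "Meta", "Ana", "Telo"]

# Rows = true class, columns = predicted class, values as tabulated above.
gradient_boosting = [
    [93.63, 4.64, 1.63, 0.07, 0.03],
    [18.16, 65.53, 15.79, 0.26, 0.26],
    [0.00, 0.00, 62.50, 37.50, 0.00],
    [0.00, 0.00, 20.00, 80.00, 0.00],
    [0.00, 0.00, 0.00, 8.33, 91.67],
]
random_forests = [
    [93.19, 5.43, 1.35, 0.00, 0.03],
    [21.32, 67.89, 10.00, 0.00, 0.79],
    [37.50, 12.50, 50.00, 0.00, 0.00],
    [0.00, 40.00, 20.00, 40.00, 0.00],
    [0.00, 0.00, 0.00, 0.00, 100.00],
]


def true_positive_rates(matrix: list[list[float]]) -> dict[str, float]:
    """Return the true positive rate per phase, in percent of the true class."""
    rates = {}
    for index, (phase, row) in enumerate(zip(PHASES, matrix)):
        rates[phase] = 100.0 * row[index] / sum(row)
    return rates


gb = true_positive_rates(gradient_boosting)
rf = true_positive_rates(random_forests)

for phase in PHASES:
    difference = gb[phase] - rf[phase]
    print(f"{phase:>4}: GradientBoosting {gb[phase]:6.2f}  "
          f"Random Forests {rf[phase]:6.2f}  difference {difference:+6.2f}")
```

With these values, GradientBoosting has the higher rate for interphase, metaphase and anaphase, while Random Forests has the higher rate for prophase and telophase.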
Fig. 7 Example images of 224 cells from the test set where the cell cycle phase was predicted using machine learning (GradientBoosting). All cells displayed were deemed prophase based on our ground truth. Ground truth was obtained from the fluorescence markers via gating in IDEAS.
The top 20 features for both the GradientBoosting and Random Forest algorithms.

| Features 1 to 10 | Features 11 to 20 |
|---|---|
| 1. SSC_Granularity_1_DF_image | 11. BF_AreaShape_Zernike_2_2 |
| 2. BF_AreaShape_MajorAxisLength | 12. BF_Texture_InfoMeas2_BF_image_3_0 |
| 3. BF_AreaShape_Compactness | 13. BF_Intensity_MinIntensityEdge_BF_image |
| 4. BF_AreaShape_MaxFeretDiameter | 14. BF_RadialDistribution_RadialCV_BF_image_2of4 |
| 5. BF_Intensity_MeanIntensity_BF_image | 15. BF_Intensity_IntegratedIntensityEdge_BF_image |
| 6. BF_Intensity_MeanIntensityEdge_BF_image | 16. BF_Intensity_LowerQuartileIntensity_BF_image |
| 7. BF_AreaShape_MaximumRadius | 17. BF_Granularity_1_BF_image |
| 8. BF_RadialDistribution_MeanFrac_BF_image_4of4 | 18. BF_RadialDistribution_MeanFrac_BF_image_3of4 |
| 9. BF_AreaShape_Eccentricity | 19. SSC_RadialDistribution_RadialCV_DF_image_2of4 |
| 10. BF_Intensity_MassDisplacement_BF_image | 20. BF_RadialDistribution_MeanFrac_BF_image_2of4 |

| Features 1 to 10 | Features 11 to 20 |
|---|---|
| 1. BF_Intensity_IntegratedIntensityEdge_BF_image | 11. BF_RadialDistribution_MeanFrac_BF_image_3of4 |
| 2. BF_Intensity_MeanIntensityEdge_BF_image | 12. BF_AreaShape_Perimeter |
| 3. BF_AreaShape_MeanRadius | 13. BF_RadialDistribution_FracAtD_BF_image_2of4 |
| 4. BF_Intensity_MeanIntensity_BF_image | 14. BF_AreaShape_MaxFeretDiameter |
| 5. BF_AreaShape_MaximumRadius | 15. BF_RadialDistribution_MeanFrac_BF_image_2of4 |
| 6. BF_Intensity_IntegratedIntensity_BF_image | 16. BF_Intensity_MassDisplacement_BF_image |
| 7. BF_AreaShape_MajorAxisLength | 17. BF_RadialDistribution_FracAtD_BF_image_4of4 |
| 8. BF_AreaShape_MinorAxisLength | 18. BF_AreaShape_Zernike_0_0 |
| 9. BF_AreaShape_Area | 19. BF_AreaShape_Eccentricity |
| 10. BF_Intensity_MinIntensityEdge_BF_image | 20. BF_RadialDistribution_MeanFrac_BF_image_4of4 |
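One quick way to compare the two top-20 lists is to count the features they share and to group each list by measurement category. The sketch below does this with the names exactly as tabulated. Because this record does not state which list belongs to which algorithm, the neutral names `list_a` (first list) and `list_b` (second list) are used, and the `<channel>_<category>_...` reading of the feature names is an assumption based on their visible pattern.

```python
from collections import Counter

list_a = [
    "SSC_Granularity_1_DF_image",
    "BF_AreaShape_MajorAxisLength",
    "BF_AreaShape_Compactness",
    "BF_AreaShape_MaxFeretDiameter",
    "BF_Intensity_MeanIntensity_BF_image",
    "BF_Intensity_MeanIntensityEdge_BF_image",
    "BF_AreaShape_MaximumRadius",
    "BF_RadialDistribution_MeanFrac_BF_image_4of4",
    "BF_AreaShape_Eccentricity",
    "BF_Intensity_MassDisplacement_BF_image",
    "BF_AreaShape_Zernike_2_2",
    "BF_Texture_InfoMeas2_BF_image_3_0",
    "BF_Intensity_MinIntensityEdge_BF_image",
    "BF_RadialDistribution_RadialCV_BF_image_2of4",
    "BF_Intensity_IntegratedIntensityEdge_BF_image",
    "BF_Intensity_LowerQuartileIntensity_BF_image",
    "BF_Granularity_1_BF_image",
    "BF_RadialDistribution_MeanFrac_BF_image_3of4",
    "SSC_RadialDistribution_RadialCV_DF_image_2of4",
    "BF_RadialDistribution_MeanFrac_BF_image_2of4",
]
list_b = [
    "BF_Intensity_IntegratedIntensityEdge_BF_image",
    "BF_Intensity_MeanIntensityEdge_BF_image",
    "BF_AreaShape_MeanRadius",
    "BF_Intensity_MeanIntensity_BF_image",
    "BF_AreaShape_MaximumRadius",
    "BF_Intensity_IntegratedIntensity_BF_image",
    "BF_AreaShape_MajorAxisLength",
    "BF_AreaShape_MinorAxisLength",
    "BF_AreaShape_Area",
    "BF_Intensity_MinIntensityEdge_BF_image",
    "BF_RadialDistribution_MeanFrac_BF_image_3of4",
    "BF_AreaShape_Perimeter",
    "BF_RadialDistribution_FracAtD_BF_image_2of4",
    "BF_AreaShape_MaxFeretDiameter",
    "BF_RadialDistribution_MeanFrac_BF_image_2of4",
    "BF_Intensity_MassDisplacement_BF_image",
    "BF_RadialDistribution_FracAtD_BF_image_4of4",
    "BF_AreaShape_Zernike_0_0",
    "BF_AreaShape_Eccentricity",
    "BF_RadialDistribution_MeanFrac_BF_image_4of4",
]


def category(feature: str) -> str:
    """Assumed naming pattern: <channel>_<category>_<measurement...>."""
    return feature.split("_")[1]


members_b = set(list_b)
shared = [f for f in list_a if f in members_b]  # keeps the order of list_a
counts_a = Counter(category(f) for f in list_a)
counts_b = Counter(category(f) for f in list_b)
dark_field_a = [f for f in list_a if f.startswith("SSC_")]
dark_field_b = [f for f in list_b if f.startswith("SSC_")]

print(f"shared features: {len(shared)} of {len(list_a)}")
print("categories in first list: ", dict(counts_a))
print("categories in second list:", dict(counts_b))
print(f"dark-field (SSC) features: {len(dark_field_a)} vs {len(dark_field_b)}")
```

The two lists share 12 of their 20 features, and only the first list contains dark-field (SSC) features.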