
Accurate label-free 3-part leukocyte recognition with single cell lens-free imaging flow cytometry.

Yuqian Li¹, Bruno Cornelis², Alexandra Dusa³, Geert Vanmeerbeeck³, Dries Vercruysse³, Erik Sohn³, Kamil Blaszkiewicz³, Dimiter Prodanov³, Peter Schelkens², Liesbet Lagae⁴.

Abstract

Three-part white blood cell differentials, which are key to routine blood workups, are typically performed in centralized laboratories on conventional hematology analyzers operated by highly trained staff. With miniaturized point-of-need blood analysis tools increasingly being developed to accelerate turnaround times and move routine blood testing away from centralized facilities, our group has developed a highly miniaturized holographic imaging system for generating lens-free images of white blood cells in suspension. Analysis and classification of its output data constitute the final crucial step in ensuring the appropriate accuracy of the system. In this work, we implement reference holographic images of single white blood cells in suspension in order to establish an accurate ground truth and increase classification accuracy. We also automate the entire workflow for analyzing the output and demonstrate a clear improvement in the accuracy of the 3-part classification. High-dimensional optical and morphological features are extracted from reconstructed digital holograms of single cells using the ground-truth images, and advanced machine learning algorithms are investigated and implemented to obtain 99% classification accuracy. Representative features of the three white blood cell subtypes are selected and give comparable results, with a focus on rapid cell recognition and decreased computational cost.


Keywords: Flow cytometry; Hologram; Lens-free imaging; Three-part differential; White blood cell


Year: 2018 PMID: 29573668 PMCID: PMC5933530 DOI: 10.1016/j.compbiomed.2018.03.008

Source DB: PubMed Journal: Comput Biol Med ISSN: 0010-4825 Impact factor: 4.589

Introduction

White blood cells (WBCs), or leukocytes, are the principal cells of the immune system; their main function is to protect and defend the body against foreign pathogens. Precise information on the relative proportions of the three main white blood cell subtypes (i.e. lymphocytes, granulocytes and monocytes), a parameter called the three-part differential, constitutes important evidence in the diagnosis of diseases as diverse as leukemia and viral or bacterial infections [1]. Conventional blood analysis is usually performed using hematology analyzers and elaborate microscopes, which are typically expensive and bulky and require highly trained personnel to operate them and accurately analyze the results. Efforts are being undertaken to develop novel blood cell imaging technologies that can be integrated on-chip and potentially miniaturize the current tools, making them amenable to point-of-need use [[2], [3], [4]]. Several studies have demonstrated new imaging technologies that provide both spatial and spectral information to support cell identification and that could potentially be integrated into on-chip blood analysis systems. Examples include hyperspectral imaging [[5], [6], [7], [8], [9]], multispectral imaging [10] and Raman spectroscopic imaging [11]. However, these studies were limited to the examination of blood smears, whose preparation has been shown to yield different cell morphologies compared to cells in suspension [12]. Imaging flow cytometers [[13], [14], [15], [16]] can provide optical and morphological information on cells in suspension beyond the single feature of a fluorescent label, but most systems are built with very complex optical components. 
Recently, novel lens-free imaging technologies have been described that are characterized by the absence of complex optical systems, offering advantages in portability, scalability and cost-effectiveness, and thereby bringing new opportunities for in-flow blood analysis [17]. Our group has developed a lens-free flow cytometer based on Digital Holographic Microscopy (DHM) for single leukocyte analysis [18]. The system captures holographic images of single cells in suspension as they flow through a microfluidic channel. Earlier results on a three-part differential were obtained by extracting two basic image features: the diameter as a measure of cell size, and the internal complexity of a cell quantified using ridge detection based on scale-space analysis. To extend and improve this analysis, a stable, automated workflow for leukocyte recognition is required. Automated cell recognition usually involves several essential processing steps, including feature extraction [15,16,[19], [20], [21], [22]], feature selection [15,[23], [24], [25], [26]] and classification [15,16,[19], [20], [21], [22]]. However, simply borrowing existing methods does not guarantee good results, because of differences in image modality or in cell morphology. A comprehensive feature study and comparison of machine learning algorithms is therefore needed to enhance the accuracy of 3-part leukocyte classification and bring it to the level required in the clinic. In this work, we focus on image processing and cell recognition to improve 3-part leukocyte classification using an extension of our lens-free imaging flow cytometer that provides a reliable ground truth for supervised classification. We propose an automated and stable workflow, built around a high-dimensional feature space, that yields high classification accuracy. Features are analyzed and selected for dimensionality reduction. 
Several machine learning algorithms are compared and used for classification. The rest of this paper is structured as follows. The cell preparation and experimental setup are described in Section 2. In Section 3, we present the entire automated data processing pipeline for 3-part leukocyte recognition, including image preprocessing, improved auto-focus hologram reconstruction, feature extraction based on both optical and morphological characteristics, feature selection and classification. In Section 4, we investigate the performance of the extracted features and of feature subsets by evaluating their classification accuracy with different classifiers. In Section 5, we discuss the advantages and limitations of this technology and make some remarks about our experiments. Concluding remarks are given in the final section of the paper.

Materials and data acquisition

Cell preparation

All experiments were performed in compliance with Belgian law (law of 7 May 2004 related to experiments on human persons), international regulations (Directive 2001/20/EC) and institutional guidelines for medical research. Blood samples were obtained from healthy donors who had consented to the goals and objectives of the study by signing a written informed consent. The study protocol was approved by the Medical Ethical Committee of University Hospital of the University of Leuven (UZ Gasthuisberg), Ref. S57599. Two-milliliter aliquots of whole blood were incubated with Phycoerythrin-conjugated anti-CD3, anti-CD14 and anti-CD15 antibodies (all from BD Biosciences) to specifically label lymphocytes, monocytes and granulocytes, respectively. Samples were then washed, re-suspended in BD FACS Lyse solution to lyse the red blood cells, washed again and finally re-suspended in running buffer (PBS + 0.5% BSA + 2 mM EDTA). Cell concentrations of each sample were measured with a Scepter Cell Counter (Merck Millipore) and adjusted to cells/mL. Each fluorescently labeled cell sample was individually loaded into the system and analyzed in flow. Hologram acquisition of single cells was triggered by detection of the fluorescence signal emitted by each labeled cell, ensuring that only cells of interest were imaged. At least 5000 holograms were acquired for each white blood cell subtype and were subsequently reconstructed and analyzed.

Acquisition of holographic images

A schematic of the imaging system is depicted in Fig. 1. The acquisition of the single cell images is triggered by the detection of a fluorescence signal emitted by each labeled cell. This procedure ensures only cells of interest are imaged. The holographic imaging was performed on our lens-free microscopic system adapted for fluorescent signal triggering and recording. At least 5000 holograms were acquired for each leukocyte subtype and the holographic images of the cells are subsequently reconstructed and analyzed.

Fig. 1

Schematic drawing of the imaging set up. The inset is the detected fluorescent signal for camera triggering.

A CMOS camera is connected to a transparent glass microchip in which microfluidic components hydrodynamically focus a stream of cells in a focusing channel. This stack of components is mounted on a manual stage that controls linear, vertical and tilt motion above a static optical stack. The optical stand redirects a 488 nm laser and a 532 nm laser for fluorescence excitation and imaging, respectively. Both are focused by a 20× objective through the pinhole array at the top of the stand, which contains a small pinhole for imaging illumination and a large window for fluorescence excitation and emission. The lasers are redirected by mirrors in the optical stand, one of which redirects the emitted fluorescence signal to a photomultiplier tube that triggers the CMOS camera. Fluorescence signal values are read by a Field Programmable Gate Array (FPGA) and displayed to the user. A value surpassing a user-set threshold triggers, on its rising edge, the saving of the fluorescence signal and subsequent cell imaging. After imaging is initiated, the cell passes through a detection region, where its fluorescence signal is recorded on the FPGA. The excitation laser is then turned off for imaging, and a 532 nm nanosecond (ns) pulsed laser is used for illumination. Therefore, only fluorescently detected cells are imaged; the generated image files are saved on the PC hard drive, along with two consecutively timed images used later to remove the background from the image. The fluorescence signal is later correlated with the generated image for use as ground truth during the validation phase of the classification algorithms. A LabVIEW program loaded onto the FPGA controls the timing between the emission of the excitation laser, fluorescence signal collection, emission of the imaging laser and subsequent cell imaging.

Image processing and WBCs recognition

Holograms of leukocytes in suspension display different morphologies from those of cells in blood smears viewed under a microscope, as shown in Fig. 2. The data analysis pipeline has been improved for robust, highly accurate automatic 3-part leukocyte recognition, as shown in Fig. 3. Compared to our previous study [18], we have added several steps in the preprocessing and reconstruction part to control image quality. In the image processing and recognition part, relevant features are developed and selected for good classification performance. Below, we describe the added steps and focus on the image processing and cell recognition steps of the pipeline depicted in Fig. 3.

Fig. 2

Leukocyte subtypes. First row: Cells under conventional microscope. Second row and third row: Reconstructed lens-free cell images.

Fig. 3

Leukocyte recognition pipeline.


Data preprocessing and reconstruction

The preprocessing and reconstruction steps in our pipeline are crucial to guarantee the quality of the cell images used for subsequent classification. The procedure is similar to the one presented in our earlier study [18], with several adjustments to improve robustness. In the preprocessing steps, the background is removed from the hologram of the cell and the intensity level of the hologram is normalized with respect to the illumination intensity; in addition, an extra step for the automatic estimation of the reconstruction depth has been added, as described below. After reconstruction, all images are rescaled to the same pixel pitch using calibration information.

Preprocessing

Prior to reconstruction, the background must be subtracted from each cell frame to obtain a 'cleaned' hologram. If a background frame contains platelets or reflections of nearby floating cells, it may introduce extra artifacts into the reconstructed cell image. Therefore, we calculate a median background image from a stack of images that do not contain any cells and use it for background subtraction of the cell frame in the same stack. Only cells close to the illumination center are used for further analysis. Illumination normalization is performed in a small cropped window around the cell. A normalization factor is calculated as the ratio of the sum of pixel intensities in the cropped cell image to that in the corresponding background image.
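The preprocessing steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the square crop window and its `half_size` parameter are hypothetical, the caller is assumed to have already selected the cell-free frames, and because the text does not say how the normalization factor is applied downstream, the sketch only returns it alongside the background-subtracted crop.

```python
import numpy as np


def median_background(empty_frames: np.ndarray) -> np.ndarray:
    """Pixel-wise median over a stack of cell-free frames with shape (n, H, W).

    The median suppresses transient platelets or reflections that appear in
    only a few of the frames.
    """
    return np.median(empty_frames, axis=0)


def clean_cell_window(cell_frame, background, center, half_size):
    """Background-subtract a square window around the cell (hypothetical helper).

    Returns the cleaned crop and the illumination normalization factor, i.e. the
    sum of the cell-crop intensities divided by the sum of the background-crop
    intensities.
    """
    r, c = center
    window = (slice(r - half_size, r + half_size), slice(c - half_size, c + half_size))
    cell_crop = np.asarray(cell_frame, dtype=float)[window]
    bg_crop = np.asarray(background, dtype=float)[window]
    factor = cell_crop.sum() / bg_crop.sum()
    return cell_crop - bg_crop, factor
```

Taking the median rather than the mean is what makes a single contaminated frame harmless: an artifact present in a minority of the frames does not shift the per-pixel median.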

Automatic estimation of reconstruction depth

The angular spectrum reconstruction method [27] essentially relies on three parameters. Two of them, i.e. the pixel pitch and the wavelength of the illumination laser, are determined by the hardware setup. The reconstruction depth, however, can vary from experiment to experiment or, in some cases, even within one experiment because of drift in the setup. Choosing the reconstruction depth appropriately ensures that the reconstructed cell is in focus and the image is sharp. A wide range of focus measures has been proposed in the literature [28]. We observed that combining three measures with a voting scheme provides the most robust solution for our data. Two of the measures are based on the gradient mean and the standard deviation of the pixel intensities of the reconstructed cell, the motivation being that the sharpest reconstruction has the highest gradient energy. The third measure uses entropy to quantify the sharpness of the reconstruction, as proposed in Ref. [29]. The entropy $H$ is computed for an image of $N$ pixels as follows:

$$H = -\sum_{n=1}^{N} \frac{R_n^2}{E} \ln \frac{R_n^2}{E}, \qquad E = \sum_{n=1}^{N} R_n^2,$$

where $R_n$ is the real part of the complex reconstructed image at pixel $n$ and $E$ is the total energy of the image. At the start of each experiment, cells are reconstructed at a discrete set of depths within a predefined range, and the three measures are computed for each of these reconstructions. The optimal reconstruction depth is determined by a majority vote of the three measures. Usually, all measures agree and the optimal reconstruction depth is chosen accordingly. If all three measures differ, however, the reconstruction depth of the previous reconstruction is chosen, since it is safe to assume that the reconstruction depth does not change much between two consecutive acquisitions within the same experiment.
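The depth search can be sketched as follows, assuming a textbook angular-spectrum propagator and NumPy versions of the three focus measures: the mean gradient magnitude, the standard deviation of the amplitude, and the entropy of the squared real part normalized by the total energy. Computing the first two measures on the amplitude image and treating the entropy minimum as the sharpest plane are assumptions, as are all function names; the vote and the fallback to the previous depth follow the description above.

```python
import numpy as np


def angular_spectrum(field, z, wavelength, pitch):
    """Propagate a 2D complex field over distance z (all lengths in the same unit)."""
    ny, nx = field.shape
    FX, FY = np.meshgrid(np.fft.fftfreq(nx, d=pitch), np.fft.fftfreq(ny, d=pitch))
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    kz = 2.0 * np.pi / wavelength * np.sqrt(np.maximum(arg, 0.0))
    # Propagating components get a phase factor; evanescent ones (arg <= 0) are dropped.
    transfer = np.where(arg > 0, np.exp(1j * kz * z), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)


def gradient_mean(recon):
    """Mean gradient magnitude of the amplitude image (higher = sharper)."""
    gy, gx = np.gradient(np.abs(recon))
    return float(np.mean(np.hypot(gx, gy)))


def amplitude_std(recon):
    """Standard deviation of the amplitude image (higher = sharper)."""
    return float(np.std(np.abs(recon)))


def real_part_entropy(recon):
    """Entropy of R^2 / E with R the real part and E = sum(R^2) (lower = sharper, assumed)."""
    r2 = np.real(recon) ** 2
    p = r2 / r2.sum()
    p = p[p > 0]  # terms with p = 0 contribute nothing (0 * log 0 := 0)
    return float(-np.sum(p * np.log(p)))


def majority_vote(indices):
    """Return the depth index chosen by at least two measures, or None if all differ."""
    for idx in indices:
        if indices.count(idx) >= 2:
            return idx
    return None


def choose_depth(hologram, depths, wavelength, pitch, previous_depth):
    """Reconstruct at each candidate depth and pick one by majority vote."""
    recons = [angular_spectrum(hologram, z, wavelength, pitch) for z in depths]
    votes = [
        int(np.argmax([gradient_mean(r) for r in recons])),
        int(np.argmax([amplitude_std(r) for r in recons])),
        int(np.argmin([real_part_entropy(r) for r in recons])),
    ]
    winner = majority_vote(votes)
    # Fall back to the previous depth when all three measures disagree.
    return previous_depth if winner is None else depths[winner]
```

Voting on depth indices rather than on floating-point depth values avoids spurious disagreements caused by rounding.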

Pixel pitch calibration

To ensure that measurements from different experiments are consistent in scale, the system needs to be calibrated. For this purpose, we use calibration patterns etched into the chip, consisting of an array of holes of known size. The optical zoom can be controlled by vertically adjusting the camera and microchip stack relative to the optical stack. Images of the calibration patterns at various zoom levels are captured before and after cell image acquisition. For each captured image, the average distance between the holes on the chip provides one sample point, and the magnification factor of the system is estimated by a linear fit of the sample points measured at different depths. After reconstruction, cell images are scaled to a pixel pitch of 50 nm using the magnification factor estimated at the known reconstruction depth.
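The calibration can be sketched as a first-order polynomial fit of magnification against depth, followed by resampling of each reconstructed image. The magnification model (measured hole spacing in pixels times the sensor pixel size, divided by the known hole spacing), the bilinear resampling with `scipy.ndimage.zoom`, and all function and parameter names are assumptions for illustration; only the 50 nm target pitch comes from the text.

```python
import numpy as np
from scipy.ndimage import zoom


def fit_magnification(depths, spacing_px, hole_spacing, sensor_pitch):
    """Linear fit of magnification versus stage depth from calibration images.

    spacing_px: mean hole-to-hole distance measured in each calibration image (pixels)
    hole_spacing: known physical hole spacing on the chip
    sensor_pitch: physical camera pixel size (same length unit as hole_spacing)
    """
    magnification = np.asarray(spacing_px, float) * sensor_pitch / hole_spacing
    slope, intercept = np.polyfit(np.asarray(depths, float), magnification, deg=1)
    return slope, intercept


def rescale_to_pitch(image, depth, slope, intercept, sensor_pitch, target_pitch):
    """Resample a reconstructed image so that one pixel corresponds to target_pitch."""
    magnification = slope * depth + intercept
    object_pitch = sensor_pitch / magnification  # object-plane size of one pixel
    return zoom(image, object_pitch / target_pitch, order=1)  # bilinear resampling
```

For the setting described in the text, `target_pitch` would be `50e-9` with `sensor_pitch` and the depths expressed in metres.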

Feature extraction

Extracting proper features is one of the most important steps for image classification. Besides the two basic cell features we proposed in Ref. [18], i.e. the cell diameter characterizing the cell size and the cell ridge characterizing the cell internal structure, we further extract more advanced image features describing the morphological, optical and biological characteristics of the leukocytes to increase the accuracy of 3-part leukocyte classification. The features used in this study are listed in Table 1.

Table 1

Feature list.

Features	Notation	Number of features	Feature type
Cell diameter	cd	1	Basic morphology
Cell ridge	cr	1	Morphological and optical
Intensity level of the cell edge	ei	1	Optical
Width of the cell edge	ew	1	Basic morphology
Image moments (mo, 24×1)	Raw moments: m00, m10, m01, m20, m11, m02, m30, m21, m12, m03	10	Translational invariants
	Central moments: μ20, μ11, μ02, μ30, μ21, μ12, μ03	7
	Normalized central moments: η20, η11, η02, η30, η21, η12, η03	7
Hu moments (hu, 7×1)	hu1, …, hu7	7	Translational and rotational invariants
Zernike moments (zm, 25×1)	z00, z11, z20, …, z88	25	Rotational invariants


Basic features

Basic features, i.e. cell diameter and cell ridge, were computed using the same scale-space analysis as in our earlier study [18]. The cell edge is detected using the zero-crossing of the second-order directional derivative of the reconstructed phase image. The cell diameter is then measured by fitting a circle to the edge of the cell. The cell ridge is measured on the reconstructed amplitude image, which, as shown in Fig. 2, contains more structural information inside the cell edge than the phase image. These two features showed potential for 3-part leukocyte differentiation [18] and are therefore included in our feature set for classification.
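The text does not state which circle-fitting method is used. As one possibility, the sketch below uses the algebraic least-squares (Kasa) circle fit, a standard technique swapped in here for illustration, to turn edge-pixel coordinates into a diameter; `fit_circle` and `cell_diameter` are hypothetical names.

```python
import numpy as np


def fit_circle(xs, ys):
    """Algebraic least-squares (Kasa) circle fit.

    Solves x^2 + y^2 + D*x + E*y + F = 0 for D, E, F in the least-squares sense,
    then converts to centre (cx, cy) and radius r.
    """
    xs = np.asarray(xs, float)
    ys = np.asarray(ys, float)
    A = np.column_stack([xs, ys, np.ones_like(xs)])
    b = -(xs ** 2 + ys ** 2)
    (D, E, F), *_ = np.linalg.lstsq(A, b, rcond=None)
    cx, cy = -D / 2.0, -E / 2.0
    return cx, cy, float(np.sqrt(cx ** 2 + cy ** 2 - F))


def cell_diameter(edge_xs, edge_ys, pixel_pitch):
    """Diameter in physical units from edge-pixel coordinates."""
    _, _, r = fit_circle(edge_xs, edge_ys)
    return 2.0 * r * pixel_pitch
```

The algebraic fit is linear and needs no initial guess, which suits a fast per-cell pipeline; a geometric (orthogonal-distance) fit would be more robust to partial or noisy edges at a higher computational cost.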

Features based on cell edge

In conventional stained blood smears, monocytes are morphologically characterized by a bean-shaped nucleus [30], while electron micrographs of monocytes reveal a slightly ruffled cell membrane [31]. In the lens-free image reconstructions, we also observe that monocytes usually present a wider and more ruffled cell edge, whose intensity level is lower than that of granulocytes and lymphocytes. We therefore develop a new feature group that extracts information from the cell edge. First, the cell edge defined on the phase image is overlaid onto the amplitude image, and a set of normal lines is drawn around the cell edge, as shown in Fig. 4. Since cell edges are characterized by low amplitude values (shown in dark), the intensity profile along each normal line shows an undershoot peak where the cell edge intersects the normal line. The minimum value of the undershoot peak represents the intensity level of the cell edge at the intersection point, and the full width at half minimum of the undershoot peak is used to measure the edge width at that point. Occasionally, the cell edge measured along a normal line is either too wide, because the edge merges with internal granules or other artifacts nearby, or too narrow, due to artifacts. Such cases are treated as outliers. The remaining normal lines are used to calculate the mean intensity and the mean width, which form the edge-based features, namely the edge intensity and the edge width, respectively.
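The edge measurements can be sketched on 1D intensity profiles sampled along the normal lines. The baseline estimate (the profile median), the sample-counting width estimate and the median/MAD outlier rule with threshold `k` are assumptions, since the text does not specify how the undershoot width or the outliers are defined; the function names are hypothetical.

```python
import numpy as np


def undershoot_min_and_width(profile, spacing=1.0):
    """Minimum value and full width at half minimum of the dip in a 1D profile.

    The baseline is taken as the profile median (an assumption); the width is the
    number of contiguous samples around the minimum lying at or below the
    half-depth level, times the sampling spacing.
    """
    p = np.asarray(profile, float)
    i_min = int(np.argmin(p))
    baseline = np.median(p)
    half_level = p[i_min] + 0.5 * (baseline - p[i_min])
    left = right = i_min
    while left > 0 and p[left - 1] <= half_level:
        left -= 1
    while right < len(p) - 1 and p[right + 1] <= half_level:
        right += 1
    return float(p[i_min]), (right - left + 1) * spacing


def edge_features(profiles, spacing=1.0, k=3.0):
    """Mean edge intensity (ei) and mean edge width (ew) after outlier rejection.

    Profiles whose width deviates from the median width by more than k * MAD
    are treated as outliers (a hypothetical rule).
    """
    measured = [undershoot_min_and_width(p, spacing) for p in profiles]
    mins, widths = map(np.array, zip(*measured))
    med = np.median(widths)
    mad = np.median(np.abs(widths - med))
    keep = np.abs(widths - med) <= k * max(mad, 1e-12)
    return float(mins[keep].mean()), float(widths[keep].mean())
```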

Fig. 4

Measurement of the cell edge. a: Cell edge and normal lines overlaid on the amplitude image. Green: normal lines to the cell edge. Blue line: cell edge obtained from the phase image. b: Blue line: amplitude intensities along one of the normal lines. Green block: the full width at half minimum of the undershoot peak. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

Moments

Image moments are used to describe global characteristics of images, such as spatial distribution, shape and orientation. They have shown good feature representation capabilities and have been widely used in image processing and visual recognition [32,33]. Basic image moments consist of three groups: raw moments, which capture simple 2D spatial information; central moments, which are translation invariant; and normalized central moments, which are invariant to both translation and scale. We compute these moments up to the 3rd order from the amplitude of the reconstructed cell images using OpenCV [34]. As a result, 24 moments are obtained for each cell: 10 spatial moments, 7 central moments and 7 normalized central moments. Based on the normalized central moments, which are invariant to translation and scale, Hu [35] constructed seven rotation-invariant moments, which have been successfully applied to recognizing patterns at different scales and orientations [36,37]. The first six Hu moments (HU) are invariant to scale, translation and rotation. The seventh moment is skew invariant: its sign changes under reflection, so it can be used to distinguish mirror images. In our study, we calculated the seven Hu moments using OpenCV [34] as one group of features to investigate their potential for leukocyte classification. Zernike moments (ZMs), based on the Zernike polynomials, are a powerful moment technique developed later [38]. Because the Zernike polynomials are orthogonal, the redundancy between moments is minimized. Zernike moments have been used successfully as shape descriptors for cell images [[39], [40], [41]]. Here, we calculated twenty-five Zernike moments using the Mahotas implementation [42] with degree = 8.
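The 24 basic moments and the seven Hu invariants can be computed directly from their standard definitions. The authors used OpenCV (`cv2.moments`, `cv2.HuMoments`) for these and Mahotas (`mahotas.features.zernike_moments(im, radius, degree=8)`) for the Zernike moments; the NumPy re-implementation below is a dependency-free sketch of the same definitions, using OpenCV's `m`/`mu`/`nu` key names and its convention of x = column index and y = row index. It omits the Zernike moments, and the function names are hypothetical.

```python
import numpy as np

RAW_ORDERS = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2), (3, 0), (2, 1), (1, 2), (0, 3)]
CENTRAL_ORDERS = [(2, 0), (1, 1), (0, 2), (3, 0), (2, 1), (1, 2), (0, 3)]


def image_moments(img):
    """Raw (m), central (mu) and normalized central (nu) moments up to 3rd order.

    Returns a dict with 10 + 7 + 7 = 24 entries.
    """
    img = np.asarray(img, float)
    y, x = np.mgrid[: img.shape[0], : img.shape[1]].astype(float)
    feats = {f"m{p}{q}": float(np.sum(x ** p * y ** q * img)) for p, q in RAW_ORDERS}
    xc, yc = feats["m10"] / feats["m00"], feats["m01"] / feats["m00"]
    for p, q in CENTRAL_ORDERS:
        feats[f"mu{p}{q}"] = float(np.sum((x - xc) ** p * (y - yc) ** q * img))
    for p, q in CENTRAL_ORDERS:
        # Scale normalization: nu_pq = mu_pq / m00^(1 + (p + q) / 2)
        feats[f"nu{p}{q}"] = feats[f"mu{p}{q}"] / feats["m00"] ** (1 + (p + q) / 2)
    return feats


def hu_moments(f):
    """Hu's seven invariants computed from the normalized central moments."""
    n20, n11, n02 = f["nu20"], f["nu11"], f["nu02"]
    n30, n21, n12, n03 = f["nu30"], f["nu21"], f["nu12"], f["nu03"]
    a, b = n30 + n12, n21 + n03
    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        a ** 2 + b ** 2,
        (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2) + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        # hu7 changes sign under reflection (skew invariant)
        (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2) - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2),
    ])
```

Rotating an image by 90 degrees leaves all seven values unchanged, while mirroring it flips the sign of the seventh, which is exactly the property used to distinguish mirror images.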

Evaluation of the importance of features

The feature extraction step produces feature vectors with a total of 60 elements per data sample. The resulting high-dimensional feature space might cause many machine learning algorithms to overfit the data, due to the so-called curse of dimensionality. Moreover, calculating such large feature vectors is computationally expensive. Evaluating the features (listed in Table 1) and selecting those with a higher impact on the classification task helps to construct more accurate and compact classification models, reducing both the chance of overfitting and the overall computational cost.

Analysis of variance

Analysis of variance (ANOVA) tests whether the means of a variable are equal across different groups. It provides a statistical significance measure, the p-value, for the null hypothesis that all population means are the same. A multiple comparison test controls the overall significance level when several groups are compared and provides pairwise results. In our study there are three cell types, i.e. three groups, so we apply the multiple comparison test using Tukey's method [43] as a follow-up to ANOVA, with the null hypothesis that the population means are equal. The method is applied to every single feature to provide insight into which features are important and potentially useful for differentiating the cell types. A small p-value rejects the null hypothesis, indicating that the differences between group means are significant. Conversely, a large p-value indicates that the tested variable does not vary enough between groups and thus might not be a suitable feature for 3-part WBC classification.
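Per-feature testing can be sketched with SciPy's `scipy.stats.f_oneway` (one-way ANOVA) and `scipy.stats.tukey_hsd` (Tukey's honestly significant difference test, available in SciPy 1.8 and later). The feature-matrix layout (`X` with one column per feature, labels `y`) and the function name are assumptions for illustration.

```python
import numpy as np
from scipy import stats


def feature_pvalues(X, y):
    """One-way ANOVA p-value and Tukey HSD pairwise p-values for every feature.

    X: (n_samples, n_features) feature matrix; y: cell-type label per sample.
    Returns anova_p with shape (n_features,) and pairwise_p with shape
    (n_features, n_classes, n_classes).
    """
    classes = np.unique(y)
    n_features = X.shape[1]
    anova_p = np.empty(n_features)
    pairwise_p = np.empty((n_features, len(classes), len(classes)))
    for j in range(n_features):
        groups = [X[y == c, j] for c in classes]
        anova_p[j] = stats.f_oneway(*groups).pvalue
        pairwise_p[j] = stats.tukey_hsd(*groups).pvalue
    return anova_p, pairwise_p
```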

Advanced feature selection using Linear Discriminant Analysis

Feature selection can alleviate overfitting and reduce the overall computational burden. In this paper, we mainly focus on selecting the features most relevant to the defined classification problem, which requires weighting the features using the label information. In this way, we select the most informative features based on their weights and filter out the variables that have a lower impact on the 3-part classification of leukocytes. Linear Discriminant Analysis (LDA) models the difference between classes under the assumption of normally distributed classes with equal class covariances [26,44]. Consider a set of features $x$ with known class $y = k$ for each class $k$. Predictions can be obtained using Bayes' theorem,

$$P(y = k \mid x) = \frac{P(x \mid y = k)\,P(y = k)}{\sum_{l} P(x \mid y = l)\,P(y = l)},$$

where $P(x \mid y = k)$ is the class conditional distribution. LDA assumes that the Gaussians of all classes share the same covariance matrix, $\Sigma_k = \Sigma$ for all $k$. By a matrix transformation, the data can be rescaled so that this shared covariance becomes the identity, which is similar to projecting the high-dimensional data onto a lower-dimensional subspace. Unlike Principal Component Analysis (PCA), which it otherwise resembles, LDA takes the class labels into account: it projects the dataset onto a lower-dimensional space and finds the component axes that maximize the separation between classes. The weights are then calculated to estimate the importance of each feature [45]. High weights indicate high importance of the corresponding features. As a pre-processing step for classification, we select the features most relevant to our 3-part leukocyte classification based on these weights.

Machine learning and validation

As input for the machine learning algorithms, a feature vector as in Eq (3) is obtained for each cell image by stacking all the cell features extracted as described in Section 3.2. Each input feature is normalized to zero mean and unit variance prior to classification. To validate the extracted features on our data, 3-part leukocyte classification is performed using 9 well-known machine learning methods: nearest neighbors, support vector machine (SVM) with a linear kernel (SVM-linear) and with a radial basis function (RBF) kernel (SVM-RBF), decision tree, random forest, AdaBoost, naive Bayes, LDA, and quadratic discriminant analysis (QDA). All machine learning algorithms are implemented using the scikit-learn package [45]. For result evaluation, confusion matrices are calculated. We use the true positive rate to show the classification accuracy for each cell type and the precision to present the overall classification accuracy for our multi-class classification problem [46]. For each cell class $C_i$, where $i \in \{1, 2, 3\}$, let $tp_i$ be the number of true positives and $fp_i$ the number of false positives for $C_i$. The macro-averaged precision, which is the average per-class agreement of the data class labels with those of a classifier, is then defined as

$$\mathrm{Precision}_M = \frac{1}{3}\sum_{i=1}^{3} \frac{tp_i}{tp_i + fp_i},$$

where the index $M$ denotes macro-averaging.
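The per-class true positive rate and the macro-averaged precision described above can both be read off a confusion matrix, as in the sketch below (rows are true classes, columns are predicted classes, which is also the layout returned by scikit-learn's `confusion_matrix`). The function names are hypothetical; scikit-learn's `precision_score(y_true, y_pred, average="macro")` computes the same macro-averaged quantity directly from label vectors.

```python
import numpy as np


def per_class_tpr(cm):
    """True positive rate (recall) per class; rows = true class, columns = predicted."""
    cm = np.asarray(cm, float)
    return np.diag(cm) / cm.sum(axis=1)


def macro_precision(cm):
    """Macro-averaged precision: mean over classes of tp_i / (tp_i + fp_i)."""
    cm = np.asarray(cm, float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as class i but truly another class
    return float(np.mean(tp / (tp + fp)))
```

For example, for the matrix `[[8, 1, 1], [2, 7, 1], [0, 0, 10]]` the per-class precisions are 8/10, 7/8 and 10/12, and the macro average is their plain mean, so each class counts equally regardless of how many cells it contains.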

Results

Experiments

We have established a ground-truth image library of 1911 images collected from several experiments, comprising 637 single-cell images for each cell subtype, to evaluate the three-part classification accuracy. For each cell image, all features described in Section 3.2 were extracted, and we first performed the statistical analysis. Then, we evaluated and compared the performance of the nine machine learning methods listed in Section 3.4 using the complete feature vector as in Eq (3). Finally, the method that performed best with all extracted features was used to evaluate the performance of the selected feature subsets. For machine learning, 25% of the images in the library, which contains all three cell types, are randomly selected as the test set. The remaining images are randomly permuted 10 times. From each permutation, 75% of the images are used for training, and the trained classifier is then tested on the test set. The results averaged over the 10 train-and-test runs are used for performance evaluation.
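The evaluation protocol can be sketched with scikit-learn, using the nine classifier families named in Section 3.4. The default hyperparameters, the `seed` parameter, and reading "75% of the images" as 75% of the remaining pool (drawn with `ShuffleSplit`) are assumptions, since the text does not specify them. Scaling is fitted inside each pipeline, so the zero-mean, unit-variance normalization never sees the test data.

```python
import numpy as np
from sklearn.base import clone
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import precision_score
from sklearn.model_selection import ShuffleSplit, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# The nine classifier families named in the text, with default hyperparameters (assumed).
CLASSIFIERS = {
    "Nearest neighbors": KNeighborsClassifier(),
    "SVM-linear": SVC(kernel="linear"),
    "SVM-RBF": SVC(kernel="rbf"),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
}


def evaluate(X, y, n_repeats=10, seed=0):
    """Mean macro precision on a fixed 25% test set over n_repeats random training draws."""
    X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=0.25, random_state=seed)
    splitter = ShuffleSplit(n_splits=n_repeats, train_size=0.75, random_state=seed)
    scores = {}
    for name, clf in CLASSIFIERS.items():
        runs = []
        for train_idx, _ in splitter.split(X_pool):
            # Zero-mean/unit-variance scaling is fitted on the training draw only.
            model = make_pipeline(StandardScaler(), clone(clf))
            model.fit(X_pool[train_idx], y_pool[train_idx])
            runs.append(precision_score(y_test, model.predict(X_test), average="macro"))
        scores[name] = float(np.mean(runs))
    return scores
```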

Statistical analysis

ANOVA is applied to each of the 60 features to evaluate its variation among the 3 leukocyte subtypes. Multiple pairwise comparisons are also applied to each feature to evaluate its differences between each pair of cell subtypes. The p-values of each of the 60 extracted features per cell type are shown in Fig. 5 (features are listed in the same order as in Eq (3)). As explained earlier, representative features should be among those with small p-values. The filled red dots in Fig. 5 mark the features of high importance selected by the LDA analysis. All the red dots have very small p-values, which supports this statement. The detailed results of the LDA-based feature selection are presented and discussed in Section 4.4.

Fig. 5

One-way ANOVA and multiple pairwise comparisons on different features. Features are listed in the same order as in Eq. (3). The features of high importance based on LDA analysis are highlighted as red dots. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

Classification accuracy and comparison

The overall performance of 3-part leukocyte classification using all 60 features is evaluated with nine classifiers, as shown in Fig. 6. All listed classification methods provide good classification accuracy. The highest classification accuracy for lymphocytes is given by SVM-linear, SVM-RBF, LDA and nearest neighbors. The highest classification accuracy for granulocytes is given by SVM-linear. For monocytes, SVM and LDA perform best, with LDA giving the highest accuracy. The overall highest classification accuracy is obtained with the linear-kernel SVM.

Fig. 6

Classifiers comparison.

We applied the SVM classifier with a linear kernel to different groups of features; the confusion matrices are given in Fig. 7. This provides valuable insight into the importance of each feature group. When only the diameter and ridge features are used, most errors occur between monocytes and granulocytes, whereas lymphocytes, which are smaller, have fewer classification errors. This supports our previous statement that these two features are not sufficient to separate granulocytes and monocytes well. When the edge features are used, misclassifications between granulocytes and monocytes decrease, but some cells are wrongly assigned to lymphocytes because size information is not taken into account. Among all the feature groups, the Zernike moments provide the highest classification accuracy. The highest accuracy (99%) was obtained using all 60 features. However, given the large number of input features, the model is likely to have overfitted the data. Reducing the dimension of the feature vector and selecting the most important features is needed to avoid the risk of overfitting and to decrease the computational load.

Fig. 7

Confusion matrices for different feature groups.

Feature selection based on LDA

To evaluate the importance of the developed features and select those most relevant to our 3-part classification problem, LDA is applied to the feature vectors together with the cell-type labels. A weight vector of length 60 was calculated for each of the three cell types to show the importance of the 60 features. Fig. 8 displays the feature weights computed using LDA. The plots on the left show the weights as dots for each cell type, with the high-weight features annotated. The plots in the right column show the features of each cell type sorted in descending order of importance. Negative weights indicate that the covariance matrix is not positive definite and that some features are highly correlated. Therefore, when sorting and selecting features, we do not take into account weights with a negative sign, even if their magnitude is large. As shown, the weights of the first six features, especially the first three, are substantially larger than those of the rest.
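The weight-based ranking can be sketched with scikit-learn's `LinearDiscriminantAnalysis`, whose `coef_` attribute holds one weight vector per class for multi-class problems. Sorting the signed weights in descending order, as below, places negative weights at the end, which is consistent with not taking them into account. Whether the authors' weights correspond exactly to `coef_` of standardized features is an assumption, and the function names are hypothetical.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler


def lda_rankings(X, y):
    """Per-class feature indices sorted by signed LDA weight, largest first."""
    Xs = StandardScaler().fit_transform(X)
    lda = LinearDiscriminantAnalysis().fit(Xs, y)
    # coef_ has shape (n_classes, n_features) for more than two classes; sorting the
    # signed weights in descending order pushes negative weights to the end.
    return np.argsort(-lda.coef_, axis=1)


def select_top_k(rankings, k):
    """Union of the k highest-weighted features of each class, in first-seen order."""
    selected = []
    for rank in range(k):
        for row in rankings:
            feature = int(row[rank])
            if feature not in selected:
                selected.append(feature)
    return selected
```

Taking the union per rank reproduces the counting used in the text: when one feature is highly ranked for two cell types, it is counted only once, so six features per class can yield fewer than 18 in total.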

Fig. 8

Feature selection using LDA. Left column: Cell-type specific weights of all 60 features (features are listed in the same order as in Eq. (3)). The first six features with the highest weights are marked. Right column: Cell-type specific features sorted by weight. The six features most relevant to discriminating lymphocytes are six image moments, specifically, , , , , and (in descending order of importance). The six features most relevant to discriminating granulocytes are also image moments, specifically, , , , , and (in descending order of importance). The six features most relevant to discriminating monocytes are , , , , and (in descending order of importance). There is some overlap among the top six features of the different cell types. For instance, has high importance for both granulocytes and monocytes. By combining the relevant features for each cell type, we obtain 12 features in total, which are the most important features for the 3-part classification, i.e. , , , , , , , , , , and . We also highlight these 12 selected features in Fig. 5 with filled red dots. The selected features all have very small p-values, meaning they differ significantly between groups. This result implies that it might be sufficient to perform 3-part classification using these 12 features instead of all 60. To investigate how feature selection influences classification performance, we applied SVM-linear to different numbers of selected features using the same shuffle-split strategy. The extracted features are first sorted in descending order of their importance as evaluated using LDA, as shown in Fig. 8. 
Then, an SVM with a linear kernel is applied consecutively to the single most important feature of each cell type (corresponding to three features in total), to the 2 most important features of each cell type (corresponding to 5 features in total, because is the most important feature for monocytes and the second most important feature for granulocytes), and so on until all 60 features are used. The classification accuracy for each cell type, as well as the overall accuracy, is shown in Fig. 9. The classification accuracy is low when only the first 3 features are selected for each cell type, and it increases as the number of features grows. Using the 6 most important features of each of the 3 cell types (12 features in total, as mentioned above) results in an overall classification accuracy of 95%. Adding one more feature per cell type (15 features in total), in particular as the 7th-ranked feature for lymphocytes, cell diameter as the 7th-ranked feature for granulocytes, and as the 7th-ranked feature for monocytes, increases the overall classification accuracy to 99%, which is comparable to the performance obtained with all 60 features. Beyond the 7 most relevant features per cell type, the classification accuracy stabilizes even as more features are added. This result confirms our earlier assumption that the remaining features are redundant for our classification task.
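The incremental schedule just described (top-1 features of each cell type, then top-2, and so on, merged without duplicates) can be sketched as below. The rankings are hypothetical, chosen only so that one feature is ranked first for monocytes and second for granulocytes, which reproduces the 3 to 5 feature count described in the text. The `evaluate` callable is a placeholder for the linear-SVM shuffle-split evaluation and is an assumption.

```python
from typing import Callable, List, Sequence, Tuple

def union_of_top_k(rankings: Sequence[Sequence[int]], k: int) -> List[int]:
    """Merge the top-k features of every class, keeping first-seen order."""
    selected: List[int] = []
    for r in range(k):
        for ranking in rankings:
            f = ranking[r]
            if f not in selected:
                selected.append(f)
    return selected

def selection_schedule(rankings: Sequence[Sequence[int]],
                       max_k: int) -> List[Tuple[int, List[int]]]:
    """(k, merged feature subset) for k = 1..max_k."""
    return [(k, union_of_top_k(rankings, k)) for k in range(1, max_k + 1)]

def accuracy_curve(schedule: List[Tuple[int, List[int]]],
                   evaluate: Callable[[List[int]], float]) -> List[Tuple[int, float]]:
    """(number of features, accuracy) for each subset; `evaluate` is user-supplied."""
    return [(len(feats), evaluate(feats)) for _, feats in schedule]

# Hypothetical rankings (feature indices, most important first) over 6 features
# for lymphocytes, granulocytes and monocytes.
rankings = [
    [0, 1, 2, 3, 4, 5],
    [3, 4, 0, 1, 2, 5],
    [4, 5, 0, 1, 2, 3],
]
schedule = selection_schedule(rankings, max_k=6)
print([len(feats) for _, feats in schedule])  # [3, 5, 6, 6, 6, 6]
```

Because overlapping features are counted once, the subset size grows more slowly than 3k, which is why the text reports 5 rather than 6 features at k = 2.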

Fig. 9

Classification accuracy of selected features.

Experiments with an unoptimized Python implementation on a 2.5 GHz Intel Core i7 64-bit system indicate a computation time of 28 ms for the 12 selected features, compared with more than 1 s for all 60 features of one cell, i.e. 50 times faster. A C++ or GPU implementation is expected to be orders of magnitude faster.
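To illustrate the kind of computation involved in moment features (the features most relevant to lymphocytes and granulocytes in Fig. 8 are image moments), a minimal NumPy sketch follows. It assumes the standard central moment μ_pq and normalized central moment η_pq = μ_pq / μ_00^(1 + (p+q)/2); which specific moments were selected is not recoverable from this text, and the function names are hypothetical.

```python
import numpy as np

def central_moment(img, p, q):
    """mu_pq = sum (x - xbar)^p (y - ybar)^q I(x, y) over the image."""
    img = np.asarray(img, dtype=float)
    ys, xs = np.indices(img.shape)  # row (y) and column (x) coordinates
    m00 = img.sum()
    xbar = (xs * img).sum() / m00
    ybar = (ys * img).sum() / m00
    return ((xs - xbar) ** p * (ys - ybar) ** q * img).sum()

def normalized_central_moment(img, p, q):
    """eta_pq = mu_pq / mu_00^(1 + (p + q) / 2), invariant to translation and spatial scale."""
    mu00 = np.asarray(img, dtype=float).sum()
    return central_moment(img, p, q) / mu00 ** (1 + (p + q) / 2)

# A 3-pixel horizontal bar placed in a larger empty frame.
img = np.zeros((5, 7))
img[2, 3:6] = 1.0
print(central_moment(img, 2, 0), central_moment(img, 0, 2))
print(normalized_central_moment(img, 2, 0))
```

Each moment is a handful of vectorized passes over the pixels, so restricting extraction to a few such features keeps per-cell cost low.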

Discussion

Advantages and limitations of lens-free imaging for flow cytometry

Conventional flow cytometry is an advanced technique that is routinely used in the diagnosis of health disorders. Combining it with imaging could be envisioned as an extension of conventional flow cytometry. Especially in hematological analysis, having access to an image of each individual blood cell included in the three-part classification could offer clues about its morphology. Although extracting detailed morphological features from reconstructed images may prohibit real-time computation, offline analysis of the additional information obtained from the images, such as size, complexity, internal content, shape and gradient, would provide an extra layer of validation that could help hematologists refine their diagnoses. In conventional hematology analyzers and flow cytometers, such morphological information is missing because no images are generated during the analysis. The only tool currently on the market that combines imaging with flow cytometry is the Merck Millipore Amnis®. However, because of its bulky size and complicated optics, this system is restricted to research use. In fact, conventional flow cytometers are typically large, bulky instruments that require specialized laboratories or centralized facilities and cannot be placed in a doctor's office or a smaller clinic. An extension that is amenable to miniaturization would therefore be a big step toward deployment at the point of care. Because the optical elements required by lens-free imaging are simple and easily miniaturized, a device based on this technology could be fabricated more easily and could reside at the point of need.

Consistency of experiments

Our experimental data were collected from several independent experiments. Consistency across experiments is crucial to ensure that we classify differences between cells rather than differences between experiments. The two main potential sources of experimental variation are sample preparation and cell image acquisition. To avoid variation in cell preparation, all samples were prepared following strict protocols, as described in Section 2.1. During image acquisition, a set of measurement protocols, including control of the illumination intensity, the size of the illuminated area and the position of the cells, was implemented to control and optimize the stability of the image quality and the consistency of acquisition. To further safeguard the consistency of the measured images, several preprocessing steps are applied prior to classification. An auto-focus function assigns the best focus depth to the reconstructed images. Calibration, which determines the scaling factor, ensures that cell sizes are consistent across experiments. Background subtraction and intensity normalization ensure that the brightness of all images is maintained at comparable levels.
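The preprocessing steps listed above can be sketched as follows. Every concrete choice here is an assumption rather than the authors' procedure: the focus metric (mean squared intensity gradient over a stack of reconstructions at different depths), the background estimate (per-pixel median over many frames), min-max intensity normalization, and calibration from an object of known physical size. All function names are hypothetical.

```python
import numpy as np

def best_focus_index(stack):
    """Index of the sharpest plane in a (n_depths, H, W) reconstruction stack.

    Sharpness = mean squared intensity gradient (an assumed focus metric).
    """
    scores = []
    for plane in np.asarray(stack, dtype=float):
        gy, gx = np.gradient(plane)
        scores.append(float(np.mean(gx ** 2 + gy ** 2)))
    return int(np.argmax(scores))

def estimate_background(frames):
    """Per-pixel median over frames; assumes a cell covers any pixel only rarely."""
    return np.median(np.asarray(frames, dtype=float), axis=0)

def subtract_and_normalize(frame, background, eps=1e-12):
    """Background subtraction followed by min-max scaling to [0, 1]."""
    diff = np.asarray(frame, dtype=float) - background
    lo, hi = diff.min(), diff.max()
    return (diff - lo) / max(hi - lo, eps)

def um_per_pixel(reference_size_um, reference_size_px):
    """Scaling factor from a reference object of known size (hypothetical calibration)."""
    return reference_size_um / reference_size_px

# Focus: a sharp edge scores higher than a smooth ramp.
ramp = np.tile(np.linspace(0.0, 1.0, 8), (8, 1))
edge = np.zeros((8, 8))
edge[:, 4:] = 1.0
print(best_focus_index([ramp, edge, ramp]))

# Background: a "cell" appears at pixel (1, 1) in only one of five frames.
frames = np.full((5, 4, 4), 10.0)
frames[0, 1, 1] = 50.0
bg = estimate_background(frames)
norm = subtract_and_normalize(frames[0], bg)

# Calibration: a 10 um reference measured as 40 px; a 32 px cell is then 8 um.
scale = um_per_pixel(10.0, 40.0)
print(32 * scale)
```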

Model evaluation procedure

K-fold cross-validation is a standard way to evaluate a prediction model when data are scarce. It splits the data into K roughly equal-sized parts, trains and tests K times with a different part held out each time, and averages the resulting errors. However, in cross-validation there is always a trade-off, depending on the choice of K, between the size of the training set and the variance of the performance estimate [47]. Given our limited data size, K-fold cross-validation (or similar strategies such as leave-one-out) with a large K (for instance K = 10) would leave a very small test set. Shuffle-and-split is an alternative to K-fold cross-validation that allows finer control over the number of iterations and the proportion of samples on each side of the train/test split [45]. We aimed for relatively large training and test sets as well as enough iterations to avoid bias. Specifically, 25% of the dataset was held out as an independent test set, ensuring that the test set is sufficiently large. Training sets were then drawn 10 times from random permutations of the remaining data, so that the classifiers were trained on different subsets of the data, and each was tested on the same independent test set. This helps to avoid the dependency problem mentioned in Ref. [47], as well as the variance and bias problems.
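The evaluation protocol above (one fixed 25% test set, 10 training subsets drawn from the remaining data) can be sketched with NumPy. The fraction of the remaining data used per training subset is not stated in the text, so `train_fraction=0.8` is an assumption, as is the seed. A nearest-centroid classifier stands in for the linear SVM only to keep the sketch dependency-free; with scikit-learn, `sklearn.svm.SVC(kernel="linear")` could be plugged into the same loop. Function names and data are hypothetical.

```python
import numpy as np

def holdout_shuffle_splits(n_samples, test_fraction=0.25, n_iter=10,
                           train_fraction=0.8, seed=0):
    """One fixed held-out test set plus n_iter random training subsets of the rest."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_test = int(round(test_fraction * n_samples))
    test_idx = np.sort(perm[:n_test])          # independent test set, reused every time
    rest = perm[n_test:]
    n_train = int(round(train_fraction * len(rest)))
    splits = []
    for _ in range(n_iter):
        train_idx = np.sort(rng.permutation(rest)[:n_train])  # new subset each iteration
        splits.append((train_idx, test_idx))
    return splits

def nearest_centroid_fit_predict(X_train, y_train, X_test):
    """Stand-in linear classifier: assign each test sample to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    d = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d, axis=1)]

def evaluate(X, y, fit_predict, splits):
    """Mean and standard deviation of test accuracy over all splits."""
    accs = [np.mean(fit_predict(X[tr], y[tr], X[te]) == y[te]) for tr, te in splits]
    return float(np.mean(accs)), float(np.std(accs))

# Synthetic 3-class data standing in for the 3-part differential.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
y = np.repeat(np.arange(3), 100)
X = centers[y] + rng.normal(size=(300, 2))
splits = holdout_shuffle_splits(len(y))
mean_acc, std_acc = evaluate(X, y, nearest_centroid_fit_predict, splits)
print(f"accuracy {mean_acc:.3f} +/- {std_acc:.3f}")
```

Keeping the test set fixed means the spread across iterations reflects only the choice of training subset, which is the variability this protocol is meant to expose.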

Conclusion

In this study, we performed an in-depth analysis of 3-part leukocyte classification using the complex holographic image data produced by our advanced single-cell lens-free imaging system. Both morphological and optical features were extracted from each single-cell frame. The most relevant features for 3-part classification were selected, and several classifiers were considered, with SVM-linear performing best in terms of overall classification accuracy. A classification accuracy of up to 99% was obtained for the 3-part leukocyte differential, thereby validating our lens-free holographic imaging system. Comparable classification accuracies were obtained using reduced feature sets, showing the potential for fast computation for cell sorting based on the reconstructed images. This study helps to further pave the way towards a miniaturized, scalable and high-throughput benchtop imaging flow cytometer. In the future, we will evaluate the proposed classification framework on downsampled cell images to further speed up image analysis. We will also extend our cell recognition to other cell types and validate the extracted and selected features with whole blood samples.

Conflicts of interest

None declared.
