Jiaqian Li1, Kuo-Kun Tseng1, Zu Yi Hsieh2, Ching Wen Yang3, Huang-Nan Huang4. 1. Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen, China. 2. Department of Internal Medicine, Taichung Veterans General Hospital, Taichung, Taiwan. 3. Computer & Communication Center, Taichung Veterans General Hospital, Taichung, Taiwan. 4. Department of Mathematics, Tunghai University, Taichung, Taiwan.
Abstract
Indirect immunofluorescence based on HEp-2 cell substrate is the most commonly used staining method for antinuclear autoantibodies associated with different types of autoimmune pathologies. The aim of this paper is to design an automatic system to identify the staining patterns based on block segmentation compared to the cell segmentation most used in previous research. Various feature descriptors and classifiers are tested and compared in the classification of the staining pattern of blocks and it is found that the technique of the combination of the local binary pattern and the k-nearest neighbor algorithm achieve the best performance. Relying on the results of block pattern classification, experiments on the whole images show that classifier fusion rules are able to identify the staining patterns of the whole well (specimen image) with a total accuracy of about 94.62%.
Indirect immunofluorescence based on HEp-2 cell substrate is the most commonly used staining method for antinuclear autoantibodies associated with different types of autoimmune pathologies. The aim of this paper is to design an automatic system to identify the staining patterns based on block segmentation compared to the cell segmentation most used in previous research. Various feature descriptors and classifiers are tested and compared in the classification of the staining pattern of blocks and it is found that the technique of the combination of the local binary pattern and the k-nearest neighbor algorithm achieve the best performance. Relying on the results of block pattern classification, experiments on the whole images show that classifier fusion rules are able to identify the staining patterns of the whole well (specimen image) with a total accuracy of about 94.62%.
Autoimmune diseases, such as rheumatoid arthritis, primary biliary cirrhosis and
dermatomyositis, are individually rare in contrast with other kinds of diseases, but
together they affect the health of many people worldwide. They are a fascinating but poorly
understood group of diseases [1].
Antinuclear autoantibodies are a serological hallmark of most autoimmune diseases, and serve
as diagnostic biomarkers and classification criteria for a number of these diseases [2]. Although the role of
autoantibodies is still not clear, growing evidence shows that most autoimmune diseases are
confirmed to be in connection with the occurrence of specific auto-antibodies, such as
primary biliary cirrhosis [3].
However, antinuclear antibodies are also detectable in approximately 50% of subjects with
primary biliary cirrhosis. Several ANAs are associated with primary biliary cirrhosis, so
the connection of a specific ANA to the pathogenesis of primary biliary cirrhosis is not
known [3]. This demonstrates that
the relationship between autoimmune diseases and autoantibodies is not a single
correspondence.Although there are many tests for the detection of ANAs, such as indirect
immunofluorescence (IIF) and enzyme-linked immunosorbent assay (ELISA), IIF based on HEp-2
cell substrate during the serological hallmark is the most commonly used staining method for
antinuclear autoantibodies. Usually, the immunofluorescence patterns are manually identified
by the physician visually inspecting the slides under a microscope. Since IIF diagnosis
requires both the estimation of fluorescence intensity and the description of staining
patterns, adequately trained persons are not always available for these tasks, so this
procedure still needs highly specialized and experienced physicians to make the diagnoses.
As ANA testing becomes more used in clinics, an automatic inspection system for pattern
categories is in great demand [4].Before the classification of staining patterns, relevant patterns (see Figure 1) related to the most recurrent ANAs should be
considered [5], [6] in the experimental dataset.
Figure 1
ANA patterns in the experimental dataset: (a) coarse speckled (b) fine speckled (c)
nucleolar (d) peripheral.
Coarse Speckled: this pattern is characterized by coarse granular
nuclear staining of the interphase cell nuclei;Fine Speckled: this pattern is characterized by fine granular nuclear
staining of the interphase cell nuclei;Peripheral: this group is characterized by solid staining, primarily
around the outer region of the nucleus, with weaker staining toward the centre of the
nucleus;Nucleolar: this pattern is characterized by large coarse speckled
staining within the nucleus, less than six in number per cell.The aim of this paper is to design an automatic system with a two-layer classification
model, block pattern recognition and well pattern recognition, to identify the staining
patterns of the whole well based on block segmentation. In particular, the following points
will be investigated in the present study:Block segmentation. In contrast to the previous cell segmentation used
for ANA classification, block segmentation is significantly easier to implement and more
applicable due to the erroneous conditions of cell segmentation.Block pattern classification. Various image features (local binary
pattern (LBP), linear discrimination analysis (LDA), scale-invariant feature transform
(SIFT) and grey-level co-occurrence matrix (GLCM) and classifiers K-nearest neighbour
(KNN), Back Propagation Neural Network (BPNN) and support vector machine (SVM) are
compared in this step to seek the best characteristic and classifier for ANA
classification.Well pattern classification. Based on the results of the block pattern
classification, classifier fusion rules are used to identify the staining patterns of
the whole well. Meanwhile, a kind of cell pattern classification is regarded as the
control group.The rest of this paper comprises four parts. In Section 2, we introduce some related
studies on ANA patterns including segmentation, feature extraction and classification.
Section 3 presents the proposed method consisting of four steps: block segmentation, feature
extraction, block pattern classification and well pattern classification. Section 4 provides
the experimental results and comparison. Finally Section 5 is the conclusion and discussion.
Related Studies
2.1 Image Segmentation
The previous research on ANA image segmentation has mainly focused on cell segmentation
and the criteria for recognition of cell patterns, but a more applicable method of block
segmentation for ANA pattern classification has so far not been developed. Many
competitions and conferences research cell classification and cell segmentation, for
example, the competition on cell classification by fluorescent image analysis hosted by
the 20th IEEE International Conference on Image Processing (ICIP) and The 1st Workshop on
Pattern Recognition Techniques for Indirect Immunofluorescence Images.Creemers et al. [7] repeatedly
used image processing techniques, including morphological opening and Otsu thresholding,
to cut out the needed region of interest. It was found that this method has the capability
to split connected regions into individual cells with an average accuracy of 89.57%.Huang et al. [8] proposed an
efficient method for automatically detecting the outlines of fluorescent cells in IIF
images. This method first classified an IIF image into two cases, sparse and mass cell
regions, based on the number of connected regions in an image. Depending on the cell types
of the images, different colour spaces and processing techniques were adopted for cell
segmentation. For images with sparse region cells, HSB colour space, anisotropic
diffusion, canny edge detection and morphological smoothing are applied sequentially to
detect the cell outline, while for images with mass region cells, CMY colour space,
anisotropic diffusion, Otsu's thresholding and morphological processing are used.Hsieh et al. and Huang et al. [9],
[10] also presented a reliable
region-based method of two-staged watershed segmentation to solve a wide range of
difficult problems of ANA image segmentation, i.e. over-segmentation and sensitivity to
noise and contrast in the image. Region merging and region elimination were utilized for
the first stage watershed algorithm [11] to obtain the cell boundaries and in the second stage the similarity-based
watershed algorithm acted as the marker to prevent over-segmentation. It was proved that
the segmentation performance achieved an overall sensitivity of 94.7%.
2.2 Image Feature Extraction
Numerous features utilized in ANA pattern classification were investigated, including
texture features and shape features, as shown in Figure 2. Since the same object may have a variety of
different colours but a similar shape, many queries may arise as to the shape of the image
instead of the colour of the image. There are two methods of presenting shape features:
contour feature and regional characteristics. However, shape features lack a model, and
have high computation and storage requirements. In [12], the shape measurement of a single feature vector,
with greater weight by far given to texture, is used to identify the cytoplasmatic class
and the shape feature (calculated as the area divided by the square of the perimeter) is
able to recognize most samples of this category based on a single parameter. In [4], four shape features, area,
perimeter, inside area and perimeter area, in the feature vector are utilized as the
inputs for a self-organizing map (SOM) model to determine the similarity of the cells.
Figure 2
ANA pattern classification methods: segmentation, feature extraction and
classification.
Texture feature is the most commonly used feature for pattern classification, describing
the surface nature of the scene corresponding to a specified image or image area. Texture
feature is not the sort of feature based on pixels, which need statistical calculations of
more than one pixel belonging to the region. As a statistical feature, texture feature,
often with rotation-invariant characteristics, has a strong capability to resist noise.
However, it also has its drawbacks, and one obvious drawback is that changing the image
resolution, may result in larger deviations in the calculated texture feature. Moreover,
the feature may sometimes be affected by light and reflection. Texture feature extraction
methods can be divided into statistical methods, structure methods and spectrum
methods.Giulio et al. [5] used the
well-known SIFT [13], [14] descriptor to extract concise and
informative local characteristic from HEp-2 images. The SIFT algorithm has proved to be
one of the most effective in the object-recognition field because of its invariance to
common image transformations, illumination changes and noise. Discrete cosine transform
(DCT) [15] is always used to
extract relevant textural information for image compression and classification. In [16], 328 DCT coefficients, which
represent different patterns of image variation and directional information of the
texture, are calculated through two-dimensional DCT of the normalized images. In [17], 48 DCT features, including the DC
component, mean value and standard variance, are extracted for HEp-2 cell pattern
classification. The LBP descriptor [18]–[20] is a robust and
computationally efficient means of texture description, which derives from a general
definition of texture in a local neighbourhood, incorporates both statistical and
structural information and has shown effectiveness in many applications. Kuan et al. [17] extracted 42 features from each
HEp-2 cell image through the multi-resolution LBP descriptors. Ersoy et al. [21] used a uniform rotation-invariant
LBP consisting of 18 unique patterns for HEp-2 cell classification.GLCM [22] is a powerful
technique that extracts texture characteristics from the spatial relationship among
intensity values at specified offsets and reports the distribution of co-occurring values
among local pixels based on different distances and angles. In [23], only four GLCM features (intensity, standard
deviation, entropy and range) are calculated as a part of the final feature vector, while
in [16], a total number of 44
features, represented by the mean and the range value over the 22 statistical measures
(e.g. correlation, cluster prominence, cluster shade, energy, entropy, variance,
homogeneity, maximum probability, etc. [22]), are extracted from four GLCMs.There are many other texture features used for HEp-2 cell classification, such as Gabor
transform [17], histogram of
oriented gradients (HOG) [24],
independent component analysis (ICA) [25], codebook based descriptors [26], speeded-up robust features (SURF) [24], grey-level size zone matrix
(GLSZM) [27] etc. However, most
studies combine several of the image features mentioned above into a feature vector to
recognize the patterns of the HEp-2 cell instead of using a single characteristic since a
combination of several features is able to extract more image information on texture,
shape and space than a single feature. In [24], Ghosh and Chaudhary. explored various features like SURF in a bag of words
(BoW) model, texture-based features from the GLCMs and region of interest (ROI)-based
features and HOG features, using one or several of them to create a composite feature
vector to investigate the performance of a classification based on various features. A
total of 372 features containing 44 GLCM features and 328 DCT coefficients in [16] were used to characterize each
HEp-2 image. Moreover, in [9] a
total of eight practical features (standard deviation, uniformity/entropy, block variation
of local correlation coefficients, spatial grey-level dependence matrices, grey-level
difference matrix, neighbourhood grey-tone difference matrix, fractal dimension and image
coarse degrees) obtained from an IIF cell image were utilized to identify fluorescence
patterns.
2.3 ANA Pattern Classification
In the past decade, there have been many studies on the detection of ANA patterns and
many classification methods, mainly including the KNN algorithm [28], [29], artificial neural networks (ANNs), expert
systems (ESs) and SVM etc., have been utilized for pattern recognition in HEp-2 cells and
have achieved positive final performance. Multi-class SVMs with different kernels were
investigated and used in [12],
[17], [24]. Soda has always used multi-expert systems (MESs)
[30]–[33] to explore the problem of HEp-2 cell pattern
classification. Cordelli and Soda [34] test four popular classifiers belonging to different paradigms: a
multi-layer perceptron (MLP) as a neural network, a KNN as a statistical classifier, an
SVM as a kernel machine, and AdaBoost as an ensemble of classifiers. In [31], the authors selected two ANN-based
classifiers based on the MLPs and the radial basis function (RBF) network architecture to
separate intrinsically dubious samples whose error tolerance can be flexibly set. In [9], learning vector quantization
(LVQ), which is a prototype-based supervised classification algorithm and can be
understood as a special case of an ANN, utilized the normalized feature vector to
differentiate the autoantibody fluorescence patterns.
Methods
The proposed classification architecture consists of four sequential steps: block
segmentation, feature extraction, block pattern classification and well pattern
classification. To our knowledge, block segmentation has never been used to process HEp-2
cell images. Various combinations of features and classifiers are utilized to identify the
patterns of the blocks to explore the best features and classifiers suitable for this
application. But not all combinations are used in the block pattern classification; for
example, the VLFeat Package has its own classifier for the SIFT feature, so we just use this
combination. As to the fusion rules, the weighted sum rule (WSR) is only defined in the KNN
classifier, so in other classifiers, we just aggregate the block patterns to classify the
staining pattern of the specimen image with WR and weighted majority rule (WMR) rules.
3.1 Block Segmentation
As mentioned above, the previous research into HEp-2 cell image segmentation mainly
concentrated on cell segmentation, which also has drawbacks affecting the final
sensitivity of the segmentation. For example, Otsu's thresholding method can choose the
threshold to minimize the intra-class variance of the black and white pixels
automatically, but due to the variety of ANA patterns, Otsu's algorithm always failed to
segment cells of discrete speckled and nucleolar patterns and resulted in
over-segmentation [8]. Even though
two-stage watershed segmentation [10] uses two watershed transformations to avoid over-segmentation, it may occur
in generating erroneous outlines of IIF cells because of noise and speckles in IIF images.
Since the two-stage watershed segmentation uses a great number of morphological
techniques, including pre-processing, Otsu's thresholding, region merging and region
elimination, in the expectation of better segmentation performance, its time complexity
and space complexity significantly enlarge in contrast to Otsu's segmentation method. The
methods [8], [10], [35] proposed in the previous works for the segmentation utilized various
techniques to eliminate over-segmentation and overlap problems, which have no effects on
the performance of block segmentation although there are overlap areas between different
blocks.Block segmentation is much easier to implement than cell segmentation and does not have
the same problems as cell segmentation. As is shown in Figure 3, first the RGB image is converted into a binary
image and morphological erosion with a disk mask is performed; then the connected regions,
which determine the position of candidate blocks, are located. The centre of the connected
region is regarded as the centre of the block with a fixed size, such as and (the set
depending on the size of the well image). The centre of the connected region is defined as
(Figure 4a)where
denotes the location of the centre and
and denote the
maximum and minimum x axes of the connected region. Similarly,
and are the
maximum and minimum y axes of the connected region.
Figure 3
Flowchart of block segmentation.
Figure 4
Overlap in the block segmentation: (a) calculate the centre of the connected
regions; (b) overlap problem; (c) the overlapping rates with the size
increasing.
Sometimes, the overlapping area occupies a large part of the total area, here using
to present the rate of overlap areas between two blockswhere
denotes the area of the overlapping area,
denotes the area of one block and denotes the
area of another block overlapping with the former block as shown in Figure 4b. Depending on the overlapping rates, the block
with the maximum overlapping relationships is removed first and then the second maximum,
then the third, and so on, until there is no overlapping rate which is larger than the
threshold. Here the overlapping relationship means the mapping from one block to another
if they have overlaps. A block may overlap with many other blocks, so the block with the
most overlapping relationships is removed first.The preliminary experiments show that the average overlapping rate of two blocks
increases as the block enlarges (Figure
4c). In order to decrease the number of blocks separated from an image and the
influence of block overlap, one of the two overlapping blocks will be removed if the
overlap rate is larger than 0.5. But if the block is too small, for example,
and (Figure 4b), then most of the blocks are
retained resulting in enormous experimental complexity and the blocks are too large to
obtain sufficient blocks with a low overlapping rate, e.g. the average overlapping rate
exceeds 80% when the size is or
. Moreover, the investigation and experiments demonstrate that the
block of is the most suitable for block segmentation and
classification. The overlap problem in block segmentation has no effect on the final well
pattern classification as the blocks with overlapping relationships are all either in the
training set or in the test set.
3.2 Feature Extraction
In this section, in total four practical features, LBP, SIFT, LDA and GLCM, are solely
explored to identify fluorescence patterns. These features are briefly described as
follows.
LBP Features
The original LBP operator, introduced by Ojala et al. [36], is a powerful means of texture description.
Here, we use the notation for the
LBP operator (Figure 5a, 5b, 5c).
The subscript represents using the operator in a (P,
R) neighbourhood. A histogram of the
labelled image can be defined as:where n is the number of
different labels produced by the LBP operator and
This histogram contains information about the distribution of the local micro-patterns,
such as edges, spots and flat areas, over the whole image. For efficient image feature
representation, spatial information should be retained. So the image is divided into
regions (see Figure 5d, here m = 16) and the spatially enhanced
histogram is defined as [18]In this experiment, the best LBP descriptor is , and the
dimension of the feature vector is 256.
SIFT Features
Lowe [13] summed up the existing
feature detection method based on invariants technology in 2004, and formally proposed
the SIFT algorithm invariant to common image transformations (image scaling, rotation,
even affine transformation), illumination changes and noise. The SIFT algorithm first
undertakes feature detection in scale space and defines the key point positions and the
scale of the key points, and then uses the main direction of the neighbourhood gradient
of the key points as the direction features of the points in order to achieve the
operator independent of scale and direction. The MATALB Code S1 of SIFT we
used is from http://www.vlfeat.org/index.html
[37]. Different images have
feature vectors with different dimensions, but each element has a direction parameter
with 128 dimensions.
LDA Features
Principal component analysis (PCA) and LDA [38]–[40] are two powerful tools used for dimensionality
reduction and feature extraction in most pattern recognition applications. Due to the
number of blocks being so numerous that pattern classification only based on PCA
characteristics may waste too much time, LDA features with five dimensions based on PCA
characteristics are utilized to identify HEp-2 cells, achieving better performance than
that only using PCA.
GLCM Features
GLCM [22] is a powerful
technique that extracts texture characteristics from the spatial relationship between
intensity values at specified offsets and reports the distribution of co-occurring
values between local pixels based on different distances and angles. Here we extract 44
features, represented by the mean and the range value over the four GLCMs for each of
the 22 statistical measures (e.g. correlation, cluster prominence, cluster shade,
energy, entropy, variance, homogeneity and maximum probability, etc. [22]) in [16].
3.3 Block Pattern Classification
In this procedure, three commonly used classifiers, i.e. KNN using Euclidean distance,
common BPNN with sigmoid units and SVM with linear kernel function, are used with
different features for block recognition. Several different patterns may appear in a
single image, but the segmentation method proposed here only considers images with a
unique pattern, which implies that blocks separated from an image are all marked with the
pattern of the same well image.The KNN classification algorithm, presented by Cover and Hart [28] in 1967, is a more mature approach in theory, but
also one of the simplest machine learning algorithms. This decision rule provides a simple
non-parametric procedure for the assignment of a class label to the input pattern based on
the class labels represented by the k-closest neighbours of the vectors. BPNN learning
methods provide a robust approach to approximating real-valued, discrete-valued, and
vector-valued target functions. For certain types of problems, such as learning to
interpret complex real-world sensor data, ANNs are among the most effective learning
methods currently known. SVM is a powerful machine learning method successfully used in
many applications and the classification is based on the implicit mapping of data to a
higher dimensional space via a kernel function and on the identification of the
maximum-margin hyperplane that separates the given training instances in this
high-dimensional space [16].
3.4 Well Pattern Classification
To classify the screening patterns of the whole image into one of the basic classes
mentioned in Section 1, first blocks should be segmented from the well image and then the
set of features extracted; second the staining patterns of blocks labelled by the pattern
of the original image are classified, and finally the staining pattern of the whole well
is distinguished based on the results of the classification of its cells (Figure 6).
Figure 6
Architecture of well pattern classification.
In fact, such an approach based on classification of individual blocks cannot detect the
occurrence of multiple patterns since there may be cells with different patterns in a
block marked with one pattern. But it is acceptable that most cells in a block belong to
the class of the block. Furthermore, this approach is greatly tolerant and robust to
misclassifications in block recognition since the final label of the whole image is
aggregated by the classification information of all the blocks segmented from the image.
Indeed, if enough blocks per well are available, it is reasonable that block
misclassification, if limited, does not affect the final well pattern classification.Typical fusion techniques, including majority rule (MR), WMR [41] and WSR [42], [43] (see Figure 7), will be
used in this section to combine the results of block recognition. However, a critical
point of these fusion rules is that different blocks belonging to the same well should be
included in either the training set or the testing set, which guarantees that the final
well pattern is determined by all the blocks belonging to this well image. So we randomly
subdivided all the well images into two equal partitions and different blocks belonging to
the same well were all in one partition. In the following, we briefly describe these
fusion rules.
Figure 7
Method flow in the classification architecture.
First, a conceptual formula [43] is
given as follows:in which indicates
the weighted parameter of the input block b in the block set
B and is defined
as follows:The index of the final class of the well staining pattern is the class for which
is maximum. If , the rule is
MR; if indicates the number of cells in block
b, the fusion rule is defined as WMR and if
indicates the classification reliability of the input block
b, this rule becomes WSR, which is only used in the KNN
classification.The classification reliability for the KNN [42] classifier is given bywhere
denotes the smallest distance of b from a reference
sample belonging to the same pattern of b;
implies the highest among the values of
obtained for samples in a set disjointed from the reference set and
the test set and is the distance between x and
the reference sample with the second smallest distance from x among all
the reference set samples belonging to a class which is different from that of
.
Experimental Results
Note that the proposed method to identify the staining pattern of the HEp-2 cell image here
only considers images with a unique staining pattern; implying that blocks separated from an
image are all marked with the pattern of the same well image. Not only the direct
classification of the whole image, but also the staining pattern classification of the well
image, based on cell segmentation as the control group, abides by this principle. Therefore,
block segmentation is equal to cell segmentation in the problem to be solved.
4.1 Dataset
In this study, the IIF images were collected based on HEp-2 substrate at a serum dilution
of 1∶80. A physician takes images of slides with an acquisition unit consisting of the
fluorescence microscope coupled with a commonly used fluorescence microscope (Axioskop 2,
CarlZeiss, Jena, Germany) at 640-fold magnification. The IIF images were taken by an
operator with a colour digital camera (E-330, Olympus, Tokyo, Japan). The digitized images
were of 8-bit photometric resolution for each RGB colour channel with a resolution of
3136×2352 pixels [9]. This image
database contains 260 samples belonging to four different patterns, i.e. coarse speckled
(CS), fine speckled (FS), nucleolar (NU) and peripheral (PE). The number of samples in
each pattern were 167 (CS), 20 (FS), 38 (NU) and 35 (PE), and the odd-numbered half of
them were selected to belong to the training set, and the remainder were the test set
(Table 1). If ANA testing detects
any of the four patterns, the patients may have specific systemic autoimmune diseases. For
example, if the test detected the CS pattern, the patients may have systemic lupus
erythematosus (SLE), mixed connective tissue disease (MCTD), progressive systemic
sclerosis (PSS) or cryoglobulinemia. Experiments have shown that the best features for ANA
classification are features, which are shown in Figure 8.
Table 1
The samples in the training set and in the testing set.
IIF patterns
Samples in the training set
Samples in the testing set
Coarse speckled
84
83
Fine speckled
10
10
Nucleolar
19
19
Peripheral
17
18
Total
130
130
Figure 8
Four staining patterns and the corresponding
descriptors: (a) coarse speckled (CS); (b) fine speckled (FS); (c) nucleolar (NU); (d)
peripheral (PE).
The dataset is from the third party, Taichung Veterans General Hospital. The data is
available upon request to the corresponding author. Moreover, the Code S1 of this
experiment is uploaded in the online version for the convenience of testing our proposed
method on other datasets, please refer to Supporting Information: Instruction S1.
4.2 Direct Whole Image Pattern Classification
LBP descriptors of the HEp-2 images without cell segmentation and block segmentation were
directly extracted, and then classified on the test set. It was found that the KNN
classifier with the descriptor just achieved the best performance
with a total accuracy of 83.08%, as depicted in Table 2; in particular, only half of the samples with
peripheral patterns were classified into the right class, demonstrating that global
classification is not applicable in some patterns.
Table 2
Classification results based on the whole image.
CS
FS
NU
PE
Accuracy
CS
75
0
1
7
90.36%
FS
0
9
0
1
90.00%
NU
3
0
15
1
78.95%
PE
8
0
1
9
50.00%
Total
83.08%
4.3 Classification Based on Cell Segmentation
HEp-2 cell images were separated by Otsu's thresholding method and all cells divided from
an image belonging to the training set or the test set were still regarded as the training
set or the test set respectively. Subsequently, various combinations of classifiers and
features were applied to the HEp-2 cells dataset, and then suitable fusion rules were used
to aggregate the results of cell pattern classification into well pattern classification.
The experiments showed that when utilizing a combination of KNN, LBP and MR, the total
accuracy of the four distinct patterns is 90.77% (see Table 3), which is better than that in the direct
classification. Compared with direct classification, the accuracy of the CS pattern
increased to 100%, while that of the FS and NU patterns slightly decreased.
Table 3
Classification results based on cell segmentation with KNN classifier, LBP
feature and Majority Rule.
CS
FS
NU
PE
Accuracy
CS
83
0
0
0
100.0%
FS
2
5
3
0
70.00%
NU
3
0
14
2
73.68%
PE
2
0
0
16
77.78%
Total
90.77%
4.4 Classification Based on Block Segmentation
To better illustrate the advantages of block segmentation, such comparisons are described
as follows: (a) the block segmentation considered is significantly different from cell
segmentation, that is, the complexity of block segmentation is significantly lower than
that of cell segmentation with numerous morphological techniques, and the block
segmentation method just depends on connected regions; (b) in contrast to direct whole
image classification, the classification based on block segmentation has a robust
tolerance to misclassifications in the block recognition since the final label of the
whole image is aggregated by classification information of all blocks segmented from the
image. However, that the size of the block requires a great number of explorations to
determine, since there has been no regularity so far, is a problem.In this experiment, various combinations of classifier, feature and fusion rule were
utilized to evaluate the performance of the staining pattern recognition of the HEp-2 cell
image. Figure 9 presents the
accuracies of 10 combinations mainly focusing on the LBP feature and KNN classifier, with
some passive combinations omitted, such as LDA feature and BPNN classifier, LDA feature
and SVM classifier etc. LBP+BPNN+MR and LBP+KNN+WSR achieve the same accuracy, 94.62%, and
the classification results based on the LBP feature is more positive than other features
with a maximum accuracy of 76.15% using GLCM+KNN+MR (Figure 9). This indicates that the LBP feature is the
most suitable characteristic to identify ANA patterns.
Figure 9
Accuracies of different combinations of classifier, feature and fusion rule: from
right to left sequentially GLCM+SVM+WMR, GLCM+KNN+MR, LBP+BPNN+MR, LBP+BPNN+WMR,
LBP+KNN+MR, LBP+KNN+WMR, LBP+KNN+WSR, LDA+KNN+MR, SIFT(vlfeat)+MR and
SIFT(vlfeat)+WMR.
Two methods have the same accuracy, LBP+BPNN+MR and LBP+KNN+WSR, but their individual
results for cell pattern classification are different. The latter, with a total accuracy
of 82.21%, is slightly better than the former with 79.95%, as shown in Table 4. We used different fusion rules
to aggregate the different classification results of the block pattern, but achieved the
same positive performance, demonstrating the robust tolerance to misclassifications of
well pattern classification based on the results of block pattern classification.
Table 4
Classification results of block pattern classification of LBP+KNN and
LBP+BPNN.
Block patterns
Ntrain
Ntest
Nc (LBP+BPNN)
Nc (LBP+KNN)
CS
1714
1584
1417(89.46%)
1458(92.05%)
FS
130
133
106(79.70%)
127(95.49%)
NU
419
440
239(54.32%)
321(72.95%)
PE
373
401
283(70.57%)
197(49.13%)
Total
2636
2558
79.95%
82.21%
Ntrain: number of blocks in the training set.
Ntest: number of blocks in the testing set.
Nc: number of correct classification of blocks.
Ntrain: number of blocks in the training set.Ntest: number of blocks in the testing set.Nc: number of correct classification of blocks.Even though only approximately half of the blocks marked with the peripheral pattern are
correctly distinguished (Table 4) in
the block pattern classification with the LBP characteristic and KNN classifier, there are
still 12 samples among the test set of well images (18 samples) correctly classified
(Table 5). Even though the
accuracy of block pattern classification with the LBP feature and BPNN classifier is no
more than 80%, the final well pattern classification based on it achieved positive
performance with a total accuracy of 94.62% (Figure 9). In contrast with the classification based on cell segmentation (Table 3), the accuracies of FS and NU
patterns both in well classification based on block segmentation reached 100% while those
of CS and PE patterns slightly decreased. Consequently, the performance of well
classification based on block segmentation is a little better than that based on cell
segmentation.
Table 5
Classification results based on block segmentation with KNN classifier, LBP
feature and Weighted Sum Rule.
CS
FS
NU
PE
Accuracy
CS
82
0
1
0
98.80%
FS
0
10
0
0
100.0%
NU
0
0
19
0
100.0%
PE
5
0
1
12
66.67%
Total
94.62%
Moreover, mean class accuracy (MCA) is commonly used as the evaluation criteria in cell
level classification. So here we use it to measure the performance of cell segmentation
and block segmentation under the same circumstance, that is, feature, classifier and
fusion rule. MCA can be defined as follows:where is the
correct classification rate for class k and K is equal to the number of classes.Table 6 reports the classification
results of the proposed approach and some previous methods. The result of the
classification based on block segmentation with MCA of 91.37% is significantly better than
that of the others. Furthermore, some previous approaches, such as HOG and SVM, GLCM and
SVM, are distinctly inapplicable in this dataset, achieving passive and biased
accuracies.
Table 6
Comparison of direct whole image classification, classification based on cell
segmentation and classification based on block segmentation.
Total accuracy
Mean class accuracy
Direct whole image classification
83.08%
77.33%
Cordelli E [34] (LBP and KNN) and WSR
90.77%
78.14%
Cordelli E [34] (LBP and KNN) and MR
90.00%
79.03%
Cordelli E [34] (LBP and SVM) and MR
63.85%
25.00%
Ghosh S [24] (Texture feature, HOG and SVM) and
MR
58.46%
22.89%
Di Cataldo S [16] (GLCM and SVM) and MR
59.23%
23.19%
Classification based on block
segmentation
94.62%
91.37%
Conclusion and Discussion
In this study, in contrast to cell segmentation a new block segmentation method never used
was proposed to process the original HEp-2 images and then classification of the block
patterns was undertaken based on various selected features (GLCM, LBP, SIFT and LDA) and
classifiers (KNN, BPNN and SVM), commonly used in the previous studies of cell pattern
classification. Subsequently, fusion rules (MR, WMR, and WSR) were used to aggregate the
results of the block pattern classification to identify the staining patterns of the whole
images. The results show that the proposed method can classify the well images correctly
with an accuracy of 94.62% depending on the combination of LBP, KNN and MSR or the
combination of LBP, BPNN and MR, which is better than pattern classification with a total
accuracy of 90.77% based on cell segmentation and direct whole image classification with a
total accuracy of 83.08%.The block segmentation considered is significantly different from cell segmentation, that
is, the complexity of block segmentation is significantly lower than that of cell
segmentation with numerous morphological techniques, and the block segmentation method just
depends on connected regions. In contrast to direct whole image classification, the
classification based on block segmentation has a robust tolerance to misclassifications in
block recognition since the final label of the whole image is aggregated by classification
information of all the blocks segmented from the image. However, that the size of the block
requires a great number of explorations to determine, since there has been no regularity so
far, is a problem. If the block size is too large, block segmentation will waste too much
memory compared with cell segmentation. Moreover, well pattern classification based on the
classification of individual blocks cannot detect the occurrence of multiple patterns since
there may be cells with different patterns in a block marked with one pattern. However, this
approach is greatly tolerant and robust to misclassifications in block recognition. If
enough blocks per well are available, it is reasonable that block misclassification does not
affect the final well pattern classification.(PDF)Click here for additional data file.(ZIP)Click here for additional data file.