Abstract
This paper reviews image processing and pattern recognition techniques that are useful for analyzing bioimages. Although it does not provide technical details, it makes it possible to grasp the main tasks and the typical tools for handling them. Image processing is a large research area aimed at improving the visibility of an input image and acquiring valuable information from it. As the main tasks of image processing, this paper introduces gray-level transformation, binarization, image filtering, image segmentation, visual object tracking, optical flow, and image registration. Image pattern recognition is the technique of classifying an input image into one of several predefined classes, and it is also a large research area. This paper overviews its two main modules: the feature extraction module and the classification module. Throughout the paper, it is emphasized that bioimages are a very difficult target even for state-of-the-art image processing and pattern recognition techniques, due to noise, deformation, etc. This paper is intended as a tutorial guide that bridges biology and image processing researchers toward further collaboration on such a difficult target.
Year: 2013 PMID: 23560739 PMCID: PMC3746120 DOI: 10.1111/dgd.12054
Source DB: PubMed Journal: Dev Growth Differ ISSN: 0012-1592 Impact factor: 2.053
Image processing and recognition methods which fit to a specific purpose
| What do you want to do? | Method | Note |
|---|---|---|
| I want to improve the visibility of my image by controlling contrast | Gray-level transformation | It is also possible to calibrate the gray levels of two images via gray-level transformation |
| | Binarization | Appropriate for essentially (or approximately) binary images |
| I want to improve the visibility of my image by removing noise | Image filtering (smoothing filters) | |
| I want to improve the visibility of my image by emphasizing object boundaries | Image filtering (edge detection and sharpening filters) | |
| I want to extract target objects in my image for counting them, understanding their boundaries, or evaluating their shape and size | Binarization | Simple, but applicable only when the target objects have only brighter (or darker) pixels than the background |
| | Image segmentation | Image recognition techniques are also often used |
| I want to analyze the motion in my video by tracking one or more target objects | Visual object tracking | Comprises two sub-problems: detecting the target objects in each frame, and connecting the detections over frames into temporal trajectories |
| I want to analyze the motion in my video by determining the motion over the entire image | Optical flow | Optical flow can be interpreted as a set of tracking results for all pixels in the image |
| I want to compare two images by overlaying them flexibly, i.e. elastically | Image registration | Non-parametric image registration is mathematically similar to optical flow |
| I want to classify images or targets into several types | Image recognition | If classes are already defined |
| | Clustering | If classes are not defined in advance |
Fig. 1 Combination of multiple image processing and image recognition techniques for realizing a complete system for a specific task.
Fig. 2 Gray-level transformation by tone curve z = T(x). (a) Gray-level transformation for contrast enhancement. (b) Binarization as a gray-level transformation. (c) Tone curve alignment between a pair of images.
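As a concrete illustration of the tone curve z = T(x), a gray-level transformation is simply a function applied pixel by pixel. The following is a minimal pure-Python sketch, assuming gray values in [0, 255]; the function names and the toy 2×2 image are illustrative, not from the paper.

```python
def apply_tone_curve(image, T):
    """Apply a gray-level transformation z = T(x) to every pixel."""
    return [[T(x) for x in row] for row in image]

def stretch_contrast(image):
    """Linear contrast stretching: map [min, max] onto the full range [0, 255]."""
    pixels = [x for row in image for x in row]
    lo, hi = min(pixels), max(pixels)
    return apply_tone_curve(image, lambda x: round(255 * (x - lo) / (hi - lo)))

img = [[100, 120], [140, 160]]
out = stretch_contrast(img)
# out == [[0, 85], [170, 255]]: min (100) -> 0, max (160) -> 255
```

Binarization (Fig. 2b) is the special case of a step-shaped tone curve, e.g. `apply_tone_curve(img, lambda x: 255 if x > t else 0)` for a threshold t.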
A list of binarization methods
| Name | Methodology | Merit | Demerit |
|---|---|---|---|
| Global thresholding | First, a single threshold value is determined for the whole image. Then the gray value at each pixel is compared to the threshold; if it is larger (i.e. brighter), the pixel is converted to white | Generally simple. Many variations for determining the threshold | Unsuccessful when the gray-scale range varies locally (due, for example, to uneven lighting or a non-uniform background) |
| Local thresholding | For each pixel, a threshold value is determined by considering a surrounding region | Better performance where global thresholding fails | Generally requires more computation. Special treatment is necessary when the surrounding region is essentially uniform |
| Optimization-based binarization | A Markov random field (MRF) formulation in which the black/white decision at each pixel is made while considering the decisions at neighboring pixels | With an appropriate optimization scheme, the decision becomes more robust to gray-level fluctuations. Solid mathematical basis | The problem must be formulated as a mathematical optimization problem. Sometimes computationally expensive |
Fig. 3 Binarization by global thresholding and local thresholding. For this input image, no single global threshold extracts both circles successfully. In contrast, a local threshold determined at each pixel from the local average around that pixel will give a successful result.
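The contrast between global and local thresholding in Fig. 3 can be sketched in a few lines of pure Python. This is a minimal illustration, not the paper's implementation: the toy image below has a dim blob on a dark background and a bright blob on a bright background, so no single global threshold isolates both, while thresholding each pixel against its local neighborhood mean does.

```python
def global_threshold(image, t):
    """Binarize: pixels brighter than the single threshold t become 1 (white)."""
    return [[1 if x > t else 0 for x in row] for row in image]

def local_mean_threshold(image, radius=1):
    """For each pixel, threshold against the mean of its
    (2*radius+1)-square neighborhood (clipped at the image border)."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            neigh = [image[y][x]
                     for y in range(max(0, i - radius), min(h, i + radius + 1))
                     for x in range(max(0, j - radius), min(w, j + radius + 1))]
            out[i][j] = 1 if image[i][j] > sum(neigh) / len(neigh) else 0
    return out

# A dim blob (40) on a dark background (10) and a bright blob (160) on a
# brighter background (100):
img = [[10, 10, 10, 100, 100, 100],
       [10, 40, 10, 100, 160, 100],
       [10, 10, 10, 100, 100, 100]]
```

With `global_threshold(img, 120)` only the bright blob survives (the dim blob is missed), whereas `local_mean_threshold(img)` detects both blobs. As the table notes, the local method needs special care in essentially uniform regions, where the pixel and its neighborhood mean nearly coincide.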
A list of image filtering methods
| What do you want to do by filtering? | Filter name | Filter type | Note |
|---|---|---|---|
| Smoothing | Blurring filter | Linear | A kind of low-pass filter |
| | Median filter | Nonlinear | Good for removing salt-and-pepper (dot) noise |
| | Bilateral filter | Nonlinear | Edge-preserving smoothing |
| Edge detection | Sobel filter | Linear | Calculates the gradient in one direction (horizontal or vertical) at a time |
| | Laplacian filter | Linear | Calculates the two-dimensional gradient at once |
| | Canny filter | Nonlinear | Currently one of the most popular edge detection techniques |
| Sharpening | Unsharp masking | Linear | Emphasizes non-smooth components |
| Reshape connected components in a binary image | Morphological filters | Nonlinear | Can, for example, fill small holes in a connected component or remove small connected components. Extendable to gray-scale images |
Fig. 4 Effect of three different smoothing filters. Top: original image. Middle: mild effect. Bottom: strong effect. The original image contains linear and textured components. Note that the components on the right side of the original image are blurred intentionally. For the blurring and median filters, a 3 × 3 mask (mild) and a 7 × 7 mask (strong) were used. For the bilateral filter, its filtering operation with a 7 × 7 Gaussian mask was applied once (mild) and three times (strong). All of these filters are implemented in OpenCV.
Fig. 5 Effect of three different edge detection filters: Laplacian, Canny, and Sobel. Note that the components on the right side of the original image are blurred intentionally. The Sobel filter detects vertical edges (Sobel X) and horizontal edges (Sobel Y) independently; they are added to obtain all edges (Sobel X + Y). All of these filters (except Sobel X + Y) are implemented in OpenCV.
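The Sobel X/Y behavior in Fig. 5 can be sketched as a plain 3×3 linear filter. This is a minimal, illustrative correlation over a list-of-lists image (in practice OpenCV's `cv2.Sobel` does this with border handling); the helper name `convolve3x3` is an assumption of this sketch.

```python
# 3x3 Sobel kernels: Sobel X responds to vertical edges, Sobel Y to horizontal.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def convolve3x3(image, kernel):
    """Valid-mode 3x3 correlation over a 2D list of gray values."""
    h, w = len(image), len(image[0])
    return [[sum(kernel[a][b] * image[i + a][j + b]
                 for a in range(3) for b in range(3))
             for j in range(w - 2)]
            for i in range(h - 2)]

step = [[0, 0, 9, 9],
        [0, 0, 9, 9],
        [0, 0, 9, 9]]  # a vertical step edge
# convolve3x3(step, SOBEL_X) -> [[36, 36]]  (strong response at the edge)
# convolve3x3(step, SOBEL_Y) -> [[0, 0]]    (no horizontal edge present)
```

Combining the absolute responses of both kernels gives the "Sobel X + Y" edge map mentioned in the caption.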
A list of image segmentation methods
| Name | Methodology | Merit | Demerit |
|---|---|---|---|
| Image binarization | See the list of binarization methods above | Appropriate when the target object consists of only bright pixels (or only dark pixels) | Limited applicability (however, note that several binarization methods can be extended to multi-level thresholding; for example, with two thresholds an image is partitioned into bright, middle, and dark regions) |
| Background subtraction | Detect target objects by removing the background part | Appropriate when target objects are distributed over the background | A background image is necessary. Especially when the background is not constant, some dynamic background estimation is needed |
| Watershed method | Represent the image as a three-dimensional surface and detect its ridge lines, i.e. watersheds | Even if the gray-level change is not abrupt, its peak can be detected as an edge | Appropriate preprocessing is necessary to suppress noise |
| Region growing | Iterative. If neighboring regions have similar properties, combine them | Simple | Inaccurate due to its local optimization policy |
| Clustering | Grouping pixels with similar properties | Simple. Popular clustering algorithms, such as k-means, can be used | Difficulty in balancing locational proximity and pixel value similarity |
| Active contour model (snakes) | Optimally locating a deformable closed contour around a single target object | Robust by its optimization framework. If the contour of a target object is invisible, it still provides closed contour | Only for a single object. Difficulties of dealing with unsmooth contours. Usually, characteristics of the region enclosed by the contour are not considered |
| Template matching and recognition based method | Finding pixels or blocks whose appearance or other characteristics are similar to reference patterns of the target object | Capable of stable segmentation by using various pattern recognition theories for evaluating the similarity | Computationally expensive. Often a sufficient number of reference patterns are necessary for realizing stability |
| Markov random field (MRF) | An integrated method to optimize the segmentation result considering the similarity of neighboring pixels | Accurate and robust. Highly flexible and capable of using various criteria | Computationally expensive. Difficult to implement |
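The region growing row above can be made concrete with a short sketch. This is a minimal, illustrative variant that grows a 4-connected region from a seed pixel and absorbs neighbors whose gray value is within a tolerance of the seed value (comparing against the running region mean instead would be a common refinement); the function name and toy image are assumptions of this sketch.

```python
from collections import deque

def region_grow(image, seed, tol):
    """Grow a region from a seed pixel, absorbing 4-connected neighbors whose
    gray value differs from the seed value by at most tol (BFS traversal)."""
    h, w = len(image), len(image[0])
    sy, sx = seed
    base = image[sy][sx]
    region = {(sy, sx)}
    queue = deque([(sy, sx)])
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w and (ny, nx) not in region
                    and abs(image[ny][nx] - base) <= tol):
                region.add((ny, nx))
                queue.append((ny, nx))
    return region

# A dark 2x2 block (10-13) next to bright pixels (88-95): growing from (0, 0)
# with tol=5 collects exactly the dark block.
img = [[10, 12, 90],
       [11, 13, 95],
       [88, 92, 94]]
```

As the table's "Demerit" column notes, such purely local decisions can be inaccurate: a slowly varying gradient can leak the region across an intended boundary.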
Fig. 6 Background subtraction. Given a background image, targets in an input image can be detected. Note that the pixel intensity of a detected target in the subtraction image can differ from its original value. If the original intensity is needed, refer to the pixel value of the original image at each pixel that is non-zero in the subtraction image.
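The procedure described in Fig. 6 can be sketched directly: threshold the absolute difference against the background to get a binary mask, then read the original intensities back through the mask. A minimal sketch, assuming same-sized list-of-lists images; the function names are illustrative.

```python
def background_subtract(frame, background, tol=0):
    """Mark pixels whose absolute difference from the background exceeds tol.
    Returns a binary mask (1 = target pixel)."""
    return [[1 if abs(f - b) > tol else 0 for f, b in zip(fr, br)]
            for fr, br in zip(frame, background)]

def masked_intensities(frame, mask):
    """Retrieve the original pixel values of detected target pixels,
    as the caption of Fig. 6 recommends."""
    return [[f if m else 0 for f, m in zip(fr, mr)]
            for fr, mr in zip(frame, mask)]

background = [[50, 50], [50, 50]]
frame = [[50, 200], [52, 50]]
mask = background_subtract(frame, background, tol=5)
# mask == [[0, 1], [0, 0]]: only the 200-valued pixel differs enough
```

The `tol` parameter stands in for the "dynamic background estimation" caveat in the table: with a non-constant background, a fixed tolerance is usually replaced by a per-pixel background model.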
Fig. 7 K-means algorithm. The number of centroids is fixed at 4 in the figure.
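The alternation shown in Fig. 7 (assign each point to its nearest centroid, then move each centroid to its cluster mean) can be written compactly. A minimal sketch of Lloyd's algorithm on 2D points; the initial centroids are passed in explicitly so the toy run is deterministic, whereas real implementations initialize them randomly.

```python
def kmeans(points, centroids, iters=10):
    """Lloyd's k-means on 2D points, with explicit initial centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            d = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[d.index(min(d))].append(p)
        # Update step: move each centroid to its cluster's mean
        # (an empty cluster keeps its old centroid).
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cents, groups = kmeans(pts, [(0.0, 0.0), (10.0, 10.0)], iters=5)
# Two tight groups of three points each are recovered.
```

When k-means is used for segmentation (as in the table above), each "point" is a pixel described by its position and/or intensity, which is exactly where the noted difficulty of balancing locational proximity against pixel-value similarity arises.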
Fig. 8 Visual object tracking. (a) Connecting the detected target locations. In some frames, multiple candidates exist and thus we need to select the best candidate for having, for example, a smooth trajectory. (b) Template-based tracking. Around the previous target location, the most similar (or the least dissimilar) location is selected as the current target location. (c) Multiple object tracking. The optimal one-to-one correspondence should be determined between each pair of consecutive frames.
Four major properties for classifying visual object tracking methods
- The number of targets
- Target shape
- The order of target detection and tracking
- Trajectory optimization strategy
Major visual object tracking methods
| Tracking method | Methodology | Merit | Demerit |
|---|---|---|---|
| Template matching | Search for the pattern most similar to a template image of the target object, around the position estimated in the previous frame | Simple. Extendable | Vulnerable to deformation and occlusion |
| Lucas–Kanade tracker | Similar to template matching, but more efficient by using a gradient-based solution | More efficient than template matching. Capable of dealing with rotation and other parametric deformations, such as affine | Difficult to deal with large displacements and deformations between consecutive frames |
| Mean-shift | Represent the target object by a color histogram | Robust to deformation. Efficient search by a gradient-based strategy | Vulnerable to occlusion and to the presence of similar objects |
| Kalman filter | Determine the current target position by integrating the current frame and a prediction result | Robust if the assumptions hold true. Efficient by linear computation | Several hard assumptions in target motion |
| Particle filter | Estimate the distribution of the target position with multiple hypotheses generated and evaluated by some fitness | Robust by its multiple (i.e. parallel) search nature and statistical validity | Large computations. Ambiguity on determining the position from the estimated distribution |
| Dynamic programming-based tracking | Solve the tracking problem as a globally optimal path problem in spatio-temporal space | Even if severe distortions, such as occlusion, happen in several frames, we can expect a stable result | Unsuitable for real-time application. Huge computations |
| Integer linear programming-based tracking | Similar to dynamic programming-based tracking, but more efficient by introducing specific constraints | The same as above | The same as above, but less computations |
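The Kalman filter row in the table above can be illustrated with its simplest scalar form. This is a minimal sketch, assuming a random-walk motion model in one dimension; the noise values `q` and `r` and the function name are illustrative assumptions, not from the paper.

```python
def kalman_1d(measurements, q=1e-3, r=0.5, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a random-walk motion model:
    predict x_k = x_{k-1} (process noise q), then correct with each
    measurement z (measurement noise r)."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is unchanged, uncertainty grows by q.
        p = p + q
        # Update: blend prediction and measurement via the Kalman gain.
        gain = p / (p + r)
        x = x + gain * (z - x)
        p = (1 - gain) * p
        estimates.append(x)
    return estimates

# Feeding a constant noiseless measurement: the estimate climbs
# monotonically from x0 toward the measured value without overshooting.
est = kalman_1d([5.0] * 20)
```

This illustrates both columns of the table row: the linear predict/update computation is cheap, but the result is only as good as the motion-model assumptions (here, that the target barely moves between frames).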
A classification of visual object tracking methods
| Tracking method | The number of targets | Target shape | The order of target detection and tracking | Trajectory optimization strategy |
|---|---|---|---|---|
| Template matching | Single/Multiple | Arbitrary | One-step | Online |
| Lucas–Kanade tracker | Single/Multiple | Arbitrary | One-step | Online |
| Mean-shift | Single/Multiple | Arbitrary | One-step | Online |
| Kalman filter | Single/Multiple | Arbitrary | One-step | Online |
| Particle filter | Single/Multiple | Arbitrary | One-step | Online |
| Dynamic programming-based tracking | Single/Multiple | Arbitrary/Point | One-step/two-step | Offline |
| Integer linear programming-based tracking | Multiple | Arbitrary/Point | Two-step | Offline |
Fig. 9 Dynamic time warping (DTW) for comparing two tracking results, that is, two temporal trajectories. Using dynamic programming, the optimal correspondence, represented as a path (1,1), …, (t,s), …, (T,S) with small local costs, is derived. Note that the lengths of the trajectories (T and S) can be different.
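The dynamic programming recurrence behind Fig. 9 is short enough to write out. A minimal sketch for 1D sequences (for 2D trajectories, pass a Euclidean distance as `dist`); it returns only the total warping cost, whereas recovering the correspondence path shown in the figure would additionally require backtracking through the table.

```python
def dtw(a, b, dist=lambda u, v: abs(u - v)):
    """Dynamic time warping cost between sequences a (length T) and b
    (length S). D[t][s] = local cost + min over the three allowed steps."""
    T, S = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (S + 1) for _ in range(T + 1)]
    D[0][0] = 0.0
    for t in range(1, T + 1):
        for s in range(1, S + 1):
            local = dist(a[t - 1], b[s - 1])
            # Step from a diagonal match, or stretch either sequence.
            D[t][s] = local + min(D[t - 1][s - 1], D[t - 1][s], D[t][s - 1])
    return D[T][S]

# The repeated sample in the second trajectory is absorbed by warping:
# dtw([1, 2, 3], [1, 2, 2, 3]) == 0
```

This is why DTW suits trajectory comparison: two tracks traversing the same path at different speeds (and hence with different lengths T and S) still receive a small cost.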
Fig. 10 Difference among three similar image processing techniques. Visual object tracking (a) tracks one or more objects over all video frames (more than two frames). Usually it is assumed that each individual object moves as a whole, so a tracking result gives a sparse motion field. Optical flow (b) provides motion at every pixel by observing a pair of consecutive frames (not all frames). It therefore gives a denser motion field, and an object can have different motions at different positions within it. Image registration (c) matches a pair of images using a 2D-to-2D mapping function (this figure uses an affine transformation for registration). The images need not be consecutive frames of a video; image registration can therefore be applied between, for example, face images of different persons.
Fig. 11 Geometric transformation functions for image registration. Note that an affine transformation is a combination of translation, rotation, scaling, and shear. Also note that linear transformation functions map any straight line to a straight line, whereas nonlinear transformation functions map a straight line to a curve.
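The affine case of Fig. 11 is just x' = Ax + t for a 2×2 matrix A and translation t, applied to every coordinate. A minimal sketch on single points (an actual registration would apply this to the whole pixel grid with interpolation); the helper names are illustrative.

```python
import math

def affine(point, A, t):
    """Apply x' = A x + t to a 2D point; A is a 2x2 matrix, t a translation."""
    x, y = point
    return (A[0][0] * x + A[0][1] * y + t[0],
            A[1][0] * x + A[1][1] * y + t[1])

def rotation(theta):
    """2x2 rotation matrix, one of the components of an affine transform."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

# Uniform scaling by 2 plus a translation: (1, 1) -> (3, 2).
p = affine((1, 1), [[2, 0], [0, 2]], (1, 0))
# A 90-degree rotation maps (1, 0) to (approximately) (0, 1).
q = affine((1, 0), rotation(math.pi / 2), (0, 0))
```

Because the map is linear plus a constant, any straight line lands on a straight line, exactly as the caption notes; a nonlinear (e.g. non-parametric) registration function has no such guarantee.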
Fig. 12 Estimation of the geometric transformation function by keypoint correspondence. First, keypoints are detected in each image, and each keypoint is described as a feature vector called a local descriptor. Second, keypoint correspondence is established using the similarity between local descriptors. Finally, the geometric transformation function is estimated from the keypoint correspondence.
Fig. 13 Classification for pattern recognition. (a) Patterns to be classified into two classes, healthy and unhealthy. (b) Classification using two features independently. The patterns marked with a green arrow will be misclassified. (c) Linear classification. The classification boundary is plotted with dashed lines; in this case, the classification considers the dependency between height and weight. (d) Nonlinear classification. The classification boundary can be an arbitrary curve. (e) A linear discriminant function for the linear classification of (c).
A list of typical features for image pattern recognition
| How is the feature extracted? | Where is the feature extracted from? | Feature name | Notes |
|---|---|---|---|
| Intuition and/or heuristics | Individual pixel values | Gray-scale feature | Use the input image (bitmap) directly as a feature vector of pixel values |
| | | Color feature | |
| | Connected component (CC) | Topological feature | After thinning a CC, count its holes, crossing points, etc. |
| | | Moment feature | The 1st-order moment is the center of gravity of the CC |
| | Line segment, curve, contour | Direction, length, curvature, position | |
| | Texture | Co-occurrence matrix | Co-occurrence of pixel values at distant pixels |
| | Texture in a local region of the image | Local descriptor based on local co-occurrences (e.g. BRIEF and BRISK) | |
| Gradient analysis | Entire image (bitmap) | Gradient image/edge image | |
| | Local region in the image | Local descriptor based on local gradients (e.g. SIFT and SURF) | SIFT and SURF are invariant to size change and rotation |
| Frequency analysis | Texture | 2D Fourier spectrum, discrete cosine transform (DCT), wavelet transform, Haar-like features, linear filters | Linear filters can be interpreted as frequency analysis because they also act as filters in the frequency domain |
| | Contour | Fourier descriptor | A parametric representation of a contour |
| Histogram analysis | Individual pixel values | Gray-scale/color histogram | Robust to deformation |
| | Line segments | Direction histogram | |
| | Connected components | Size histogram of CCs | Related to the pattern spectrum |
| | Local descriptors in the entire image | Bag-of-features | Using representative local descriptors called visual words, count how many local descriptors are assigned to each visual word |
| Linear projection onto a trained low-dimensional subspace | Entire image (bitmap) | Discriminative feature | Obtained by linear discriminant analysis |
| | | PCA coefficient feature | Obtained by principal component analysis (PCA) |
| Structural representation | Local features or local regions scattered over the entire image | Attributed relational graph, constellation model | A graph whose nodes are local features and whose edges indicate geometric proximity between the connected nodes. Graph matching techniques are used in the classification module to compare two graphs |
| Comparison with another image | Motion field by optical flow | Motion feature (motion vector) | Shows the geometric change between consecutive frames |
| | Deformation given by comparison to a reference image | Deformation feature (deformation vector) | A linear or nonlinear image registration technique is used for extraction |
Fig. 14 A realization of bag-of-features. An input image is decomposed into parts in some way, and each part is assigned to one of K pre-defined visual words. A histogram is then created by counting how many parts are assigned to each visual word. This histogram is treated as a K-dimensional feature vector and passed to a classifier.
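The counting step of Fig. 14 can be sketched directly: assign each local descriptor to its nearest visual word and accumulate a K-bin histogram. A minimal sketch with toy 2D descriptors (real descriptors such as SIFT are high-dimensional, and the vocabulary is usually learned by clustering); the function names are illustrative.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to a descriptor (squared Euclidean)."""
    d = [sum((a - b) ** 2 for a, b in zip(descriptor, w)) for w in vocabulary]
    return d.index(min(d))

def bag_of_features(descriptors, vocabulary):
    """Count how many local descriptors fall on each visual word:
    the K-bin histogram is the image's feature vector."""
    hist = [0] * len(vocabulary)
    for desc in descriptors:
        hist[nearest_word(desc, vocabulary)] += 1
    return hist

vocab = [(0, 0), (10, 10), (0, 10)]          # K = 3 visual words
descs = [(1, 0), (9, 9), (1, 1), (0, 9)]     # four local descriptors
# bag_of_features(descs, vocab) -> [2, 1, 1]
```

Because only the counts survive, the histogram discards the positions of the parts, which is what makes bag-of-features robust to deformation at the cost of losing spatial layout.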
Fig. 15 Local regions detected by SURF (Speeded Up Robust Features), a method that detects keypoints and describes a small region around each keypoint with a gradient feature. In this figure, each red circle corresponds to a local region detected by SURF. Since SURF sets the size of each local region automatically (according to a detection condition), the circle sizes vary.
A list of pattern classification methods, where x denotes the feature vector of an input image and c denotes a class
| Classification method | | How to classify | How to train | Note |
|---|---|---|---|---|
| Nearest neighbor classifiers | 1-nearest neighbor classifier (1-NN classifier) | Select the class of the reference pattern closest to x | Just prepare patterns with ground-truth as reference patterns; thus there is no explicit training step. If the reference patterns need to be reduced, some pre-selection can be done in advance | Simple but powerful. Accuracy generally increases with the number of reference patterns. Many variations by the metric used to evaluate "closeness." Computationally expensive with huge reference sets |
| | k-nearest neighbor classifier (k-NN classifier) | Select the majority class among the k reference patterns closest to x | Same as above | An improved version of 1-NN; more robust to outliers than 1-NN. Usually an odd number, say 3 or 5, is used as k |
| Discriminant function methods | Bayesian classifier | The optimal classifier, which uses the posterior probability distribution as the discriminant function | Estimate statistical properties, such as the likelihood of x under each class c, from training patterns | Theoretically optimal (it minimizes the Bayes risk), but difficult to realize in practice because accurate statistical properties are hard to estimate |
| | Linear classifier | Use a linear function of x as the discriminant function | Error-correcting learning and Widrow-Hoff learning are classic; recently, SVM is more common | The class boundary is a set of hyperplanes in feature space. A special case of Bayesian classification. If each class has only a single reference pattern, the 1-NN classifier reduces to this |
| | Piecewise linear classifier | Use multiple linear discriminant functions for each class | Treat each cluster of a class as a subclass and train linear classifiers to discriminate the subclasses | The class boundary is a set of polygonal chains. The 1-NN classifier is a special case of this classifier |
| | Quadratic classifier | Use a quadratic function of x as the discriminant function | Estimate the likelihood of x under each class, typically assuming a Gaussian distribution | The class boundary is a set of quadratic curves. A special case of Bayesian classification. Mahalanobis distance is a simplified version |
| | Support vector machine (SVM) | Determine the class boundary at the center of the gap between two classes. By using a so-called kernel, various types of class boundary are possible | Solve a quadratic optimization problem that derives the optimally centered discrimination boundary | SVM is a general method to train various discriminant functions in an optimization framework and can provide a linear, quadratic, or more flexible class boundary. Only two-class classification in its basic form |
| | Multilayer perceptron (neural network) | Combines the feature extraction and classification modules into one framework. Classification is done by aggregating the outputs of trainable units called perceptrons | Back-propagation is a popular choice. Note that it can train not only the classifier but also the feature extraction | Huge variations in inner structure. In the simplest case, a perceptron is a linear function with trainable coefficients; nonlinear perceptrons are also used |
| Classifier ensemble methods | Voting | Select the majority class among the results of multiple classifiers | If the individual classifiers are already trained, no further training is necessary | Any classifier can be used. Various voting schemes exist |
| | Boosting | Select the class that wins a weighted vote by multiple classifiers | Classifiers are trained complementarily: patterns that are difficult for one classifier are treated as important when training the next. The voting weight reflects each classifier's reliability | Many versions; AdaBoost is the most popular. Any two-class classifier can be used |
| | Decision tree/random forest | A decision tree makes the final classification through hierarchical classifiers; a random forest is a set of decision trees | For a decision tree, ID3 and C4.5 are classic training methods; the key idea of training is to evaluate the importance of each feature | A random forest is a doubly ensemble method: an ensemble of decision trees, each of which is itself a hierarchical ensemble classifier |
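The k-NN classifier described in the table above is simple enough to sketch in full. A minimal illustration, assuming reference patterns given as (feature vector, class label) pairs and squared Euclidean distance as the "closeness" metric; the toy healthy/unhealthy labels echo Fig. 13 but the data points are invented for the example.

```python
from collections import Counter

def knn_classify(x, references, k=3):
    """k-nearest-neighbor classification: take the majority class among the
    k reference patterns (feature vector, label) closest to x."""
    by_dist = sorted(references,
                     key=lambda rc: sum((a - b) ** 2 for a, b in zip(x, rc[0])))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

refs = [((0, 0), "healthy"), ((0, 1), "healthy"), ((1, 0), "healthy"),
        ((5, 5), "unhealthy"), ((5, 6), "unhealthy")]
# A point near the healthy cluster is voted "healthy";
# a point between the two unhealthy references is voted "unhealthy".
```

The sort over all reference patterns also makes the table's cost caveat concrete: with huge reference sets, each query pays for comparing against every stored pattern unless an index structure is added.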