Rajendran Nirthika1, Siyamalan Manivannan1, Amirthalingam Ramanan1, Ruixuan Wang2.
Abstract
Convolutional neural networks (CNNs) are widely used in computer vision and medical image analysis as the state-of-the-art technique. In a CNN, pooling layers are included mainly to downsample the feature maps by aggregating features from local regions. Pooling can help a CNN learn invariant features and reduce computational complexity. Although max and average pooling are the most widely used, various other pooling techniques have been proposed for different purposes, including techniques to reduce overfitting, to capture higher-order information such as correlation between features, and to capture spatial or structural information. As not all of these pooling techniques are well explored for medical image analysis, this paper provides a comprehensive review of various pooling techniques proposed in the literature of computer vision and medical image analysis. In addition, an extensive set of experiments is conducted to compare a selected set of pooling techniques on two different medical image classification problems, namely HEp-2 cell and diabetic retinopathy image classification. Experiments suggest that the most appropriate pooling mechanism for a particular classification task is related to the scale of the class-specific features with respect to the image size. As this is the first work focusing on pooling techniques for the application of medical image analysis, we believe that this review and the comparative study will provide a guideline for the choice of pooling mechanisms for various medical image analysis tasks. In addition, by carefully choosing the pooling operations with the standard ResNet architecture, we show new state-of-the-art results on both the HEp-2 cells and diabetic retinopathy image datasets.
Keywords: Convolutional neural networks; HEp-2 cell image classification; Medical image analysis; Pooling; Retinopathy image classification
Year: 2022 PMID: 35125669 PMCID: PMC8804673 DOI: 10.1007/s00521-022-06953-8
Source DB: PubMed Journal: Neural Comput Appl ISSN: 0941-0643 Impact factor: 5.102
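The abstract describes pooling as downsampling a feature map by aggregating features from local regions. A minimal sketch of max and average pooling on a single feature map follows; the function names, the 2×2 window, and the stride of 2 are illustrative assumptions, not the authors' implementation.

```python
def pool2d(fmap, size=2, stride=2, agg=max):
    """Downsample one 2-D feature map by aggregating each local region.

    `fmap` is a list of rows; `agg` maps a list of values to one value.
    Window size and stride are illustrative defaults (an assumption).
    """
    rows, cols = len(fmap), len(fmap[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        pooled_row = []
        for c in range(0, cols - size + 1, stride):
            # Collect the values inside the current pooling region.
            region = [fmap[r + i][c + j] for i in range(size) for j in range(size)]
            pooled_row.append(agg(region))
        pooled.append(pooled_row)
    return pooled


def mean(values):
    """Arithmetic mean, used as the average-pooling aggregator."""
    return sum(values) / len(values)


fmap = [[1, 3, 2, 0],
        [5, 7, 4, 2],
        [0, 2, 9, 1],
        [4, 6, 3, 3]]

max_pooled = pool2d(fmap, agg=max)   # keeps the strongest response per region
avg_pooled = pool2d(fmap, agg=mean)  # keeps the mean response per region
```

Both calls reduce the 4×4 map to 2×2; only the aggregation rule differs.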
Overview of different pooling methods used for different medical imaging tasks
| Name of the pooling | Example applications in computer vision | Medical imaging: type of application | Medical imaging: modality |
|---|---|---|---|
| Max and/or average pooling | Classification [ | Image classification and localization of lesions [ | Retina |
| | Segmentation [ | Cell image classification [ | HEp-2 cells |
| | | Image classification and detection of pneumonia [ | X-Ray (chest) |
| | | Weakly supervised learning [ | X-Ray (chest) |
| | | Multiple sclerosis identification [ | MRI (brain) |
| | Object localization [ | – | – |
| Mixed max-average pooling [ | Classification [ | – | – |
| Gated max-average pooling [ | Classification [ | – | – |
| Dynamic correlation pooling [ | Classification [ | – | – |
| Generalized max pooling | Segmentation [ | Multiple Instance Learning [ | Histopathology |
| Root-mean-square pooling [ | Classification [ | – | – |
| Log-sum-exp pooling [ | Segmentation [ | Weakly supervised classification and localization: thorax diseases [ | X-Ray (chest) |
| | | Proximal femur fractures [ | X-Ray (bone) |
| | | Histopathology cancer image classification [ | Histopathology |
| Polynomial pooling [ | Segmentation [ | – | – |
| Learned-norm pooling [ | Classification [ | – | – |
| Classification [ | – | – | |
| Rank-based pooling [ | Classification [ | Cerebral micro-bleed detection [ | MRI (brain) |
| Multipartite pooling [ | Classification [ | ||
| Ordinal pooling [ | Classification [ | – | – |
| Multi-activation pooling [ | Classification [ | – | – |
| Classification [ | – | – | |
| Global feature guided local pooling [ | Classification [ | – | – |
| SQUare-root (SQU) pooling [ | Image instance retrieval [ | – | – |
| Dynamic pooling [ | – | Chronic kidney disease detection [ | Saliva |
| Smooth-Maximum-Pooling [ | Classification [ | – | – |
| SoftPool [ | Classification, Action recognition [ | – | – |
| RunPool [ | Classification [ | – | – |
| Maxfun pooling [ | Classification, Convolutional sparse coding [ | – | – |
| Stochastic pooling [ | Classification [ | Multiple sclerosis identification [ | MRI (brain) |
| | | Alcoholism detection [ | MRI (brain) |
| | | COVID-19 diagnosis [ | CT (chest) |
| Rank-based stochastic pooling [ | Classification [ | Abnormal breast identification [ | Breast |
| Mixed pooling [ | Classification [ | Brain tumor segmentation [ | MRI (brain) |
| Hybrid pooling [ | Classification [ | – | – |
| Max pooling dropout [ | Classification [ | – | – |
| S3 pooling [ | Classification [ | – | – |
| Fractional max pooling [ | Classification [ | Retinopathy image classification [ | Retina |
| Sparsity-based stochastic pooling [ | Classification [ | – | – |
| EasyConvPooling (ECP) [ | Classification [ | – | – |
| PatchShuffle stochastic pooling [ | – | Diagnosis of COVID-19 [ | CT (chest) |
| Spatial pyramid pooling [ | Classification, Detection [ | Brain image segmentation [ | MRI (brain); MRI (prostate); MRI, CT (rectum) |
| Concentric circle pooling [ | Remote sensing scene classification [ | – | – |
| Polycentric circle pooling [ | Remote sensing image recognition [ | – | – |
| Pose pooling kernels [ | Fine-grained image classification [ | – | – |
| Geometric | Classification [ | – | – |
| Cell pyramid matching [ | – | Cell image classification [ | HEp-2 cells |
| Multi-pooling [ | – | Brain tumor segmentation [ | MRI (brain) |
| Donut-shaped spatial pooling [ | – | Cell image classification [ | HEp-2 cells |
| Structure based graph pooling [ | Action recognition [ | – | – |
| Atrous Spatial Pyramid Pooling (ASPP) [ | Segmentation [ | Multi-scale retinal vessel segmentation [ | Retina |
| Second order pooling [ | Classification, Segmentation [ | – | – |
| Bilinear pooling [ | Fine-grained classification [ | – | – |
| Improved bilinear pooling [ | Fine-grained classification [ | – | – |
| Fine-grained classification [ | – | – | |
| Statistically-motivated second-order pooling [ | Classification, Fine-grained classification [ | – | – |
| Global second order pooling [ | Classification [ | – | – |
| Kernel pooling [ | Classification [ | – | – |
| Global covariance pooling [ | Classification [ | – | – |
| Global gated Mixture of Second-Order Pooling (GM-SOP) [ | Classification [ | – | – |
| Second-order temporal pooling [ | Action recognition [ | – | – |
| Graph pooling [ | Graph classification [ | – | – |
| Hierarchical adaptive pooling [ | Graph classification, Graph Matching, Graph Similarity Learning [ | – | – |
| Higher-order pooling [ | Action recognition [ | – | – |
| Detachable second-order pooling [ | Classification [ | – | – |
| Detail preserving pooling [ | Classification [ | – | – |
| Local importance-based pooling [ | Classification, Detection [ | – | – |
| RNNPool [ | Classification, Visual wake words, Face Detection [ | – | – |
| Double-attention network ( | Classification [ | – | – |
| Convolutional Block Attention Module (CBAM) [ | Classification [ | Diabetic retinopathy grading [ | Retina |
| Global learnable pooling [ | Classification [ | – | – |
| Zoom-in-Net [ | – | Diabetic retinopathy grading [ | Retina |
| Recurrent attention model [ | – | Detection of pulmonary lesions [ | X-Ray (chest) |
| Attention based CNN models | – | Glaucoma detection [ | Retina |
| | | Thorax disease classification [ | X-Ray (chest) |
| Generalized max pooling [ | Classification [ | – | – |
| Task-driven feature pooling [ | Classification [ | – | – |
| Deep generalized max pooling [ | Writer identification and document classification [ | – | – |
| Adaptive spatial pooling [ | Classification [ | Retrieving brain tumors [ | CE-MRI (brain) |
| Deep Adaptive Temporal Pooling (DATP) [ | Human activity recognition [ | – | – |
| Dynamic temporal pooling [ | Time series classification [ | – | – |
| Learnable Pooling Module (LPM) [ | Full-face gaze estimation [ | Brain surface analysis [ | MRI (brain) |
| | Video tagging [ | | |
| Transformation Invariant Pooling (TI-Pooling) [ | Classification [ | Neuronal structures segmentation [ | Microscopy |
| Hierarchical mix pooling [ | – | HEp-2 cell image classification [ | HEp-2 cells; X-Ray |
| Tree pooling [ | Classification [ | – | – |
| Virtual Pooling (ViP) [ | Classification, Object Detection [ | – | – |
| Kernelized subspace pooling [ | Image patch matching [ | – | – |
| LiftPool [ | Classification, Segmentation [ | – | – |
Fig. 1 Demonstration of relevant notations. a A set of feature maps . b An example feature map from (or the channel of ), and a pooling region defined on . c The features inside the pooling region of the selected feature map . d The feature vector, , obtained across channels at the i-th position of the feature maps
Fig. 2 Max pooling and different stochastic pooling approaches: a the standard max pooling, b stochastic pooling, c max pooling dropout, d another view of max pooling with stride = 2, and e S3 pooling. For all the above, downsampling is performed with a filter size of with a step size of 2. In b, colors correspond to probability values: high values of red correspond to high probability values and vice versa. In c, the values in the shaded squares are dropped. In e, ‘*’ corresponds to the selected rows and columns (Color figure online)
Fig. 4 Example images from different classes of the HEp-2 cell image dataset
Fig. 3 Different pooling techniques to capture information about spatial structures. represents feature maps. represents different pooling regions specified by different techniques. is the pooled representation of the region . Each pooled representation from an individual pooling region will have a dimension of , where C is the number of channels in . The final image representation will have a dimension of , where M is the total number of pooling regions specified by the pooling algorithm (images best viewed in color) (Color figure online)
Fig. 5 Example images from different classes of the DR image dataset
HEp-2 cells image dataset
| Class | Training | Testing |
|---|---|---|
| Homogeneous | 3435 | 2363 |
| Speckled | 3498 | 2403 |
| Nucleolar | 3322 | 2253 |
| Centromere | 3339 | 2419 |
| Nuclear membrane | 1169 | 930 |
| Golgi | 571 | 396 |
| Total | 15,314 | 10,764 |
DR image dataset
| Class | Training | Testing |
|---|---|---|
| No DR | 3500 | 8130 |
| Mild | 2443 | 720 |
| Moderate | 3500 | 1579 |
| Severe | 873 | 237 |
| Proliferative DR | 708 | 240 |
| Total | 11,024 | 10,906 |
Network architecture used for both HEp-2 cells and the DR datasets
| Description | HEp-2 cells dataset | DR dataset |
|---|---|---|
| Convolution layer | Conv 3 | Conv 7 |
| Transition layer | Pooling/convolution | |
| Residual blocks | Conv | |
| Transition layer | Pooling/convolution | |
| Residual blocks | Conv | |
| Transition layer | Pooling/convolution | |
| Residual blocks | Conv | |
| Global pooling layer | Pooling | |
| Fully connected layer | FC–6 | FC–5 |
Conv indicates convolutional layers; FC represents a fully connected layer
Comparison of different pooling approaches on the HEp-2 cells and DR datasets
| Pooling method | HEp-2 Cells | DR | ||
|---|---|---|---|---|
| MCA | QK | |||
| Max | 0.0000 | 0.0004 | ||
| Average | – | 0.0199 | ||
| Mixed max-average | 0.0020 | 0.0258 | ||
| GM pooling | 0.0019 | – | ||
Here, pooling is applied to all the transition layers of the CNN
Effect of the value r in GM pooling
| r | HEp-2 cells (MCA) | DR (QK) |
|---|---|---|
| 2 | ||
| 3 | ||
| 5 | ||
| 7 |
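The table above varies a value r in GM pooling. The record does not expand "GM", so the sketch below assumes it denotes generalized-mean pooling, where each activation in a region is raised to the power r before averaging and the r-th root is taken afterwards. Under that assumption r = 1 reduces to average pooling and larger r moves the output toward the maximum. The function name and the non-negative-activation requirement (for example, after a ReLU) are also assumptions.

```python
def generalized_mean_pool(region, r):
    """Generalized-mean aggregation of one pooling region (assumed form).

    Assumes non-negative activations; `r` is the exponent from the table.
    """
    n = len(region)
    return (sum(v ** r for v in region) / n) ** (1.0 / r)


region = [1.0, 2.0, 3.0, 4.0]

# The r values mirror the rows of the table; r = 1 is added as the
# average-pooling reference point.
gm_outputs = {r: generalized_mean_pool(region, r) for r in (1, 2, 3, 5, 7)}
```

As r grows, the pooled value rises from the mean of the region (2.5 here) toward its maximum (4.0), which is why r acts as a dial between average and max pooling under this reading.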
Effect of mixing proportion a in mixed max-average pooling
| a | HEp-2 cells (MCA) | DR (QK) |
|---|---|---|
| 0.2 | ||
| 0.4 | ||
| 0.6 | ||
| 0.8 |
Comparison of max, avg and GM pooling
| Pooling method | HEp-2 cells | DR | |||
|---|---|---|---|---|---|
| First transition layer | Global pooling layer | MCA | QK | ||
| Average | Average | – | 0.0080 | ||
| Max | Average | 0.0000 | 0.0918 | ||
| Conv | Average | 0.0061 | 0.0001 | ||
| Max | Max | 0.0000 | 0.0038 | ||
| Average | Max | 0.0000 | 0.0002 | ||
| Conv | Max | 0.0001 | 0.0001 | ||
| GM | GM | 0.0000 | – | ||
Here pooling is applied at the first transition layer and at the global pooling layer only. Convolution is applied for downsampling at other transition layers
The top scores are highlighted in bold
Bilinear pooling as the global pooling operator
| Pooling method | HEp-2 Cells | DR | |||
|---|---|---|---|---|---|
| First transition layer | Global pooling layer | MCA | QK | ||
| Max | Bilinear | 0.0000 | 0.6932 | ||
| Average | Bilinear | 0.7421 | 0.0000 | ||
| Max | Average + bilinear | 0.0000 | 0.1813 | ||
| Average | Average + bilinear | – | 0.0002 | ||
| Average | Average | 0.1625 | – | ||
Convolution is applied at all the transition layers except the first one
The top scores are highlighted in bold
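Bilinear pooling, used above as the global pooling operator, captures second-order information by aggregating the outer product of the channel-wise feature vector at each spatial position. The sketch below averages those outer products over positions; whether the original experiments sum or average, and any normalization applied afterwards, are not stated here, so treat those details as assumptions.

```python
def bilinear_pool(features):
    """Average outer product of per-position feature vectors.

    `features` is a list of N vectors, each of length C (one vector per
    spatial position, taken across channels). Returns a C x C matrix.
    Averaging rather than summing is an assumption of this sketch.
    """
    n = len(features)
    c = len(features[0])
    pooled = [[0.0] * c for _ in range(c)]
    for x in features:
        for i in range(c):
            for j in range(c):
                pooled[i][j] += x[i] * x[j] / n
    return pooled


# Two spatial positions, two channels.
features = [[1.0, 2.0],
            [3.0, 0.0]]
second_order = bilinear_pool(features)
```

The output has C × C entries regardless of the spatial size of the feature maps, which is what lets it replace a global pooling layer.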
Effect of stochastic pooling: convolution is applied for downsampling at all transition layers except the first one, where average pooling is used
| Global pooling layer | HEp-2 cells | |
|---|---|---|
| MCA | ||
| Stochastic pooling [ | 0.0200 | |
| S3 pooling [ | 0.0003 | |
| Max pooling dropout [ | 0.0009 | |
| Average pooling dropout [ | – | |
| Global average pooling with no stochasticity/dropout | 0.4585 | |
The top scores are highlighted in bold
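Stochastic pooling, the first method in the table above, samples one activation per region with probability proportional to its value instead of always taking the maximum. The sketch below follows that commonly described formulation and also shows the probability-weighted average often used at test time. It assumes non-negative activations, and the function names and seed are illustrative.

```python
import random


def stochastic_pool(region, rng):
    """Sample one activation with probability proportional to its value.

    Assumes non-negative activations (for example, after a ReLU).
    """
    total = sum(region)
    if total == 0:
        return 0.0  # all-zero region: nothing to sample
    probs = [v / total for v in region]
    return rng.choices(region, weights=probs, k=1)[0]


def expected_pool(region):
    """Probability-weighted average of the region (deterministic variant)."""
    total = sum(region)
    if total == 0:
        return 0.0
    return sum(v * v for v in region) / total


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
region = [1.0, 2.0, 3.0, 4.0]
sampled = stochastic_pool(region, rng)
expected = expected_pool(region)
```

Because large activations are sampled more often but not always, the operator behaves like a noisy max, which is the source of its regularizing effect.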
Effect of attention weighted blocks with different pooling operations at the first and the last transition layers
| Method | DR (QK) | |||
|---|---|---|---|---|
| First transition layer | Attention | Global pooling layer | ||
| Max | – | Average | 0.0006 | |
| Max | CBAM | Average | 0.4051 | |
| Max | Average | – | ||
| GM | – | GM | 0.0219 | |
| GM | CBAM | GM | 0.0961 | |
| GM | GM | – | ||
The top scores are highlighted in bold
Comparison of our approach with the state-of-the-art methods on the DR dataset with different evaluation measures (QK, accuracy, and weighted F1 score)
| Method | Validation QK | Validation accuracy | Validation F1-score | Testing QK | Testing accuracy | Testing F1-score |
|---|---|---|---|---|---|---|
| MobileNet-Dense [ | – | – | – | 0.825 | – | – |
| MobileNetV2 [ | – | – | – | 0.822 | – | – |
| M-Net [ | 0.832 | – | – | 0.825 | – | – |
| Ours: max-avg | 0.858 | 84.25 | 0.844 | 0.849 | 83.17 | 0.833 |
| Ours: GM-GM | 0.852 | 83.60 | 0.841 | 0.850 | 82.63 | 0.831 |
| Ours: max-A | 0.854 | 83.83 | 0.841 | 0.851 | 82.77 | 0.831 |
| Ours: GM-A | 0.850 | 83.93 | 0.842 | 0.847 | 82.80 | 0.831 |
| Model ensemble [ | – | – | – | 0.852 | – | – |
| Min-pooling [ | 0.860 | – | – | 0.849 | – | – |
| Zoom-in-Net [ | 0.865 | – | – | 0.854 | – | – |
| o_O [ | 0.854 | – | – | 0.844 | – | – |
| Reformed gamblers [ | 0.851 | – | – | 0.839 | – | – |
| Ours | 83.37 | 0.840 | 82.34 | 0.830 | ||
The top scores are highlighted in bold
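The DR results above are reported in QK. The record does not expand the abbreviation; the sketch below assumes it is the quadratic weighted kappa commonly used for ordinal diabetic retinopathy grades, which penalizes a disagreement by the squared distance between the true and predicted grade. The function name is hypothetical, and the degenerate case where every label falls in one class (zero denominator) is not handled.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes):
    """Quadratic weighted kappa for integer labels in [0, num_classes)."""
    k = num_classes
    n = len(y_true)
    # Observed confusion matrix: rows are true grades, columns predictions.
    observed = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = 0.0
    den = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            num += weight * observed[i][j]
            # Expected count under independence of the two label marginals.
            den += weight * hist_true[i] * hist_pred[j] / n
    return 1.0 - num / den


perfect = quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4], 5)
chance = quadratic_weighted_kappa([0, 0, 1, 1], [0, 1, 0, 1], 2)
```

A value of 1 means perfect agreement and a value near 0 means agreement no better than chance, so the scores in the table sit in the upper part of that range.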
Comparison with the state-of-the-art methods on the HEp-2 cells dataset
| Method | MCA | Accuracy | F1 score |
|---|---|---|---|
| LeNet-based CNN [ | 71.88 | – | – |
| Deep CNN [ | 74.67 | – | – |
| Shape index histograms with donut-shaped spatial pooling [ | 78.70 | – | – |
| Multi-resolution patterns with ensemble SVMs [ | 87.10 | – | – |
| 87.95 | 0.88 |
The top scores are highlighted in bold
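The HEp-2 results above are reported in MCA. Assuming the abbreviation stands for mean class accuracy, it is the average of the per-class accuracies, so each class counts equally regardless of how many test images it has. That matters for a dataset whose class sizes differ widely. The function name and toy labels below are illustrative.

```python
def mean_class_accuracy(y_true, y_pred):
    """Average of per-class accuracies (assumed meaning of MCA)."""
    correct = {}
    total = {}
    for t, p in zip(y_true, y_pred):
        total[t] = total.get(t, 0) + 1
        if t == p:
            correct[t] = correct.get(t, 0) + 1
    per_class = [correct.get(c, 0) / total[c] for c in total]
    return sum(per_class) / len(per_class)


# Toy example with an imbalanced label set.
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 1, 1]
mca = mean_class_accuracy(y_true, y_pred)
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Here the overall accuracy is 0.75 while the class-averaged value is higher, because the single sample of the small class is classified correctly and carries the same weight as the larger class.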
Notations used throughout this paper (please refer to Fig. 1 for more information)
| Notation | Dimension | Detail |
|---|---|---|
| A set of feature maps | ||
| 1 | Width of | |
| 1 | Height of | |
| 1 | Number of feature maps (channels) in | |
| A pooling region for each feature map | ||
| 1 | Width of | |
| 1 | Height of | |
| The part of a feature map (channel) within the pooling region | ||
| 1 | The | |
| 1 | The number of elements in | |
| A feature vector across channels at the | ||