D R Sarvamangala, Raghavendra V Kulkarni.
Abstract
Imaging techniques are used to capture anomalies of the human body. The captured images must be understood for diagnosis, prognosis and treatment planning of the anomalies. Medical image understanding is generally performed by skilled medical professionals. However, the scarcity of human experts, their fatigue and the rough estimation procedures they rely on limit the effectiveness of this manual image understanding. Convolutional neural networks (CNNs) are effective tools for image understanding. They have outperformed human experts in many image understanding tasks. This article aims to provide a comprehensive survey of applications of CNNs in medical image understanding. The underlying objective is to motivate medical image understanding researchers to extensively apply CNNs in their research and diagnosis. A brief introduction to CNNs is presented, followed by a discussion of CNNs and their various award-winning frameworks. The major medical image understanding tasks, namely image classification, segmentation, localization and detection, are introduced. Applications of CNNs in medical image understanding of the ailments of the brain, breast, lung and other organs are surveyed critically and comprehensively. A critical discussion of some of the challenges is also presented.
Keywords: Classification; Convolutional neural networks; Detection; Image understanding; Localization; Segmentation
Year: 2021 PMID: 33425040 PMCID: PMC7778711 DOI: 10.1007/s12065-020-00540-3
Source DB: PubMed Journal: Evol Intell ISSN: 1864-5909
Confusion matrix
| Total test samples | Predicted positive | Predicted negative |
|---|---|---|
| Actual positive | True positives (TP) | False negatives (FN) |
| Actual negative | False positives (FP) | True negatives (TN) |
Performance evaluation metrics for image processing
| Evaluation metric | To determine | Formula | Preferred value |
|---|---|---|---|
| Accuracy | Overall, how often the result is true | $\frac{TP + TN}{TP + TN + FP + FN}$ | High |
| Misclassification rate (error rate) | Overall, how often the result is false | $\frac{FP + FN}{TP + TN + FP + FN}$ | Low |
| True positive rate (TPR) | When it is actually true, how often does it predict true | $\frac{TP}{TP + FN}$ | High |
| False positive rate (FPR) | When it is actually false, how often does it predict true | $\frac{FP}{FP + TN}$ | Low |
| False negative rate (FNR) | When it is actually true, how often does it predict false | $\frac{FN}{FN + TP}$ | Low |
| Specificity | When it is actually false, how often does it predict false | $\frac{TN}{TN + FP}$ | High |
| Precision (positive predictive value (PPV)) | When it predicts true, how often is it correct? | $\frac{TP}{TP + FP}$ | High |
| F-Score (Dice similarity coefficient) | Harmonic mean of recall and precision | $\frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$ | High |
| Receiver operating characteristic (ROC) | Commonly used graph that summarizes the performance of a classifier over all possible thresholds | Plot of recall versus FPR | High |
| Area under ROC (AUC) | The area under ROC | Plots | High |
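
The metrics tabulated above all derive from the four confusion-matrix counts. A minimal, dependency-free sketch follows, assuming the standard textbook definitions of each metric; the function name `classification_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute the tabulated metrics from confusion-matrix counts.

    Assumes every denominator is non-zero; a real pipeline would
    need to guard against empty classes.
    """
    total = tp + fn + fp + tn
    recall = tp / (tp + fn)        # true positive rate (sensitivity)
    precision = tp / (tp + fp)     # positive predictive value
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "tpr": recall,
        "fpr": fp / (fp + tn),
        "fnr": fn / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": precision,
        # Harmonic mean of precision and recall.
        "f_score": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts: 50 actual positives, 50 actual negatives.
metrics = classification_metrics(tp=40, fn=10, fp=5, tn=45)
print(metrics["accuracy"])            # 0.85
print(round(metrics["f_score"], 4))   # 0.8421
```

Note that TPR and FNR sum to one, as do specificity and FPR, so reporting both members of a pair adds no information.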
Fig. 1 Building blocks of a CNN
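
The (Conv, ReLU, maxpool) pattern that recurs in the architecture columns of the summary tables can be sketched in a few lines. This is an illustrative pure-Python forward pass, not the implementation used by any surveyed paper; it assumes a single channel, no padding, stride 1 for the convolution and non-overlapping pooling windows, and the 2x2 filter is a hypothetical hand-picked kernel rather than a learned one.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 2-D cross-correlation, as in CNN conv layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Element-wise rectified linear unit: negative activations become zero."""
    return [[max(0, v) for v in row] for row in fmap]


def maxpool2d(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 0],
    [2, 1, 0, 1, 1],
    [0, 3, 1, 0, 2],
]
kernel = [[1, 0], [0, -1]]  # hypothetical filter responding to diagonal differences

features = maxpool2d(relu(conv2d(image, kernel)))
print(features)  # [[0, 3], [0, 1]]
```

A 5x5 input shrinks to 4x4 after the 2x2 convolution and to 2x2 after pooling, which is why deeper stacks of this block progressively trade spatial resolution for more abstract features.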
Various award winning CNN architectures
| Architecture | References | Architecture | References |
|---|---|---|---|
| LeNet-5 | [ | VGGNet | [ |
| AlexNet | [ | GoogLeNet | [ |
| Overfeat | [ | ResNet | [ |
| ZFNet | [ | Xception | [ |
Various existing CNN frameworks
| Deep learning frameworks | Developed at | Implemented using | Pros | Cons |
|---|---|---|---|---|
| Caffe [ | Berkeley Vision and Learning Centre | C++ | Cross platform, easy to deploy, fast, has good Matlab and Python interfaces, models can be trained without writing code | Usability drops outside ConvNets, not very good with new architectures |
| Keras [ | Francois Chollet, Google | Python | Easiest deep learning framework, clean API, huge active community, easily extensible, user-friendly, works with Theano and TensorFlow | Poor support for multi node training, cannot be used efficiently as an independent framework |
| TensorFlow [ | Google Brain team | Python, C++ | Modular architecture, supports multiple front ends and execution platforms, TensorBoard available for visualization | Slower than Theano and Torch in execution time, lacks many pre-trained models, not completely open source |
| Torch [ | New York University | Lua | Modular architecture, easy to set up, displays helpful error messages and has many tutorials | Difficult to integrate since it is implemented in Lua while the data science stack is in R or Python |
| PyTorch [ | Facebook artificial intelligence group | Python, C++, CUDA | Fast, minimal framework overhead, error messages and stack traces are easy to understand and hence easy to debug, good for research and development | Depends on Matplotlib and Seaborn for visualization, lacks distributed training |
| Theano [ | | Python | Expressive Python syntax, ample online help, supports high level wrappers | Cross platform, needs secondary libraries for deploying, supports single GPU, error messages are not helpful |
| Neon [ | Nervana Systems | Python | Very fast due to fast matrix operations, the same code runs on GPU and CPU | Too many errors with scattered dependencies, less material on web, architecture not user friendly, not general-purpose |
| Deeplearning4j [ | | Java | Performance equal to Caffe and better than TensorFlow or Torch, fast | Generally not preferred because Java is unpopular in ML circles |
| Cognitive Toolkit (CNTK) [ | Microsoft | C++ | Fast, accurate, scalable over GPUs, flexible, allows distributed training, supports C++, Java, Python and C# | Lacks visualizations, crashes with error, no Matlab or Python bindings |
A summary of CNN applications in ILD image classification surveyed in Sect. 4
| References | Preprocessing | Architecture | Dataset | Comparison against | Performance metric |
|---|---|---|---|---|---|
| [ | Normalized with unit variance and zero-mean | Conv, ReLU, maxpool, FC | ILD | SIFT, LBP, unsupervised feature learning using RBM | Recall, precision |
| [ | Augmentation by rotating | (Conv, Leaky ReLU, avgpool) | ILD from University of Geneva and Bern University hospital | AlexNet, VGGNet | Time, ROC, AUC |
| [ | Augmentation by adding jitter and cropping | AlexNet architecture | Public ILD database | CNN patch based methods | Accuracy |
| [ | Augmentation | (Conv, ReLU, maxpool with multicrop) | LIDC-IDRI dataset | HOG, LBP | Accuracy |
A summary of CNN applications in HEp-2 cell classification surveyed in Sect. 4
| References | Preprocessing | Architecture | Dataset | Comparison against | Performance metric |
|---|---|---|---|---|---|
| [ | Augmented by affine transform and intensity variations, green channel images used | Conv, maxpool, ReLU, conv, avgpool, ReLU, conv, avgpool, FC with dropout, weight decay, softmax with multinomial logistic regression | ICPR 2014 | ICPR 2012 benchmark and winner's model | Average classification accuracy |
| [ | Normalized using zero mean and unit variance, and rotated | (Conv, tanh, maxpool) | ICPR 2014 | ICPR 2012 classification contestants | Accuracy |
A summary of CNN applications in breast medical image classification surveyed in Sect. 4
| References | Preprocessing | Architecture | Comparison against | Performance metric |
|---|---|---|---|---|
| [ | Normalized by subtracting mean, dividing by standard deviation, augmented by flipping | (Conv, ReLU, maxpool) | BOW, HOG, SIFT, VGGNet | ROC and accuracy |
| [ | Crop, augmentation by flipping and rotating, global contrast normalization, local contrast normalization | (Conv, ReLU, maxpool) | HOG, HGD | AUC |
| [ | Feature extraction, data weighing, data labeling | (Conv, maxpool) | SVM, ANN, labeled data, unlabeled data and mixed data | Accuracy |
| [ | Image oversegmented into atomic regions using superpixel based scheme | (Conv, maxpool) | Linder, Bianconi, DCNN-Ncut + SVM, DCNN-Ncut + SMC, DCNN-Ncut + SVM, DCNN-SLIC-SVM, DCNN-SLIC-SMC | ROC, AUC, accuracy |
A summary of CNN applications in heart classification surveyed in Sect. 4
| References | Preprocessing | Architecture | Dataset | Comparison against | Performance metric |
|---|---|---|---|---|---|
| [ | ECG signals filtered using bandpass filter at 0.1-100 Hz and digitized at 360 Hz. | One-dimensional CNN with conv | MIT/BIH arrhythmia database | Other existing methods | Accuracy, sensitivity, specificity |
| [ | Image resized to | Two path CNN of seven layers (conv, ReLU, maxpool) | Tsinghua University Hospital, Beijing and Fuzhou University Hospital, China | SIFT, KAZE | Accuracy |
A summary of CNN applications in colon medical image classification surveyed in Sect. 4
| References | Preprocessing | Architecture | Comparison against | Performance metric |
|---|---|---|---|---|
| [ | Augmented using small patches | NEP with standard softmax CNN, ReLU, dropout | | AUC |
| [ | Image subdivided into | (Conv, maxpool) | Linder, Bianconi, DCNN-SW-SVM, DCNN-SW-SMC | ROC, AUC, accuracy |
| [ | Normalized by subtracting mean, dividing by standard deviation, augmented by flipping | (Conv, pool) | BFD, SSF, DT-CTW, MB-LBP, SIFT | Accuracy |
A summary of CNN applications in medical image classification surveyed in Sect. 4
| References | Organ | Preprocessing | Architecture | Dataset | Performance metric |
|---|---|---|---|---|---|
| [ | Brain | Geometric normalization of CT images | Conv | 3-D datasets from 282 subjects (51 with AD, 118 with lesions, 117 normal) | Accuracy |
| [ | Brain | Motion correction, skull stripping, spatial smoothing | LeNet | ADNI dataset | Accuracy |
| [ | Brain | Data augmentation | Input cascaded CNN for segmentation, VGG-19 with transfer learning | Radiopedia and brain tumor | Accuracy |
| [ | Eye, retinopathy | Color normalization, image size reduced to | Conv | Kaggle | Specificity |
| [ | Digestive organs | | (Conv, maxpool) | Real clinical data of wireless capsule endoscopy images | Accuracy |
A summary of CNN applications in brain medical image segmentation surveyed in Sect. 5
| References | Preprocessing | Architecture | Performance metric |
|---|---|---|---|
| [ | N4ITK bias field correction, Nyul’s intensity normalization | Seven layers with conv and ReLU at 1, 3, 5, 6 and maxpool at 2 and 4, layer seven was FC with softmax | Precision |
| [ | Augmented by flipping and translation | Shallow network (conv, maxpool layers) | DSC, Hausdorff distance, contour mean distance |
| [ | Removed | Two path CNN with cascaded architecture with maxout, followed by maxpool and softmax. Optimization using stochastic gradient descent, regularization using L1, L2 and dropout. | DSC |
| [ | | Two-pathway, 11-layer architecture called DeepMedic, with three-dimensional CNN, FC three-dimensional CRF and smaller kernels | DSC, precision and sensitivity |
| [ | Classified brain tissue pixels into classes, using different kernel sizes | Separate network branch was used for each patch size, and only output layer was shared. Mini batch learning and RMSprop to train the network with ReLU and cross entropy as the cost function. | DSC and mean surface distance between manual and automatic segmentation |
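
Most rows above report the Dice similarity coefficient (DSC) between an automatic and a manual segmentation. A minimal sketch of that overlap measure follows, assuming binary masks stored as nested lists of 0/1 values; the function name `dice_coefficient` and the toy masks are hypothetical, and returning 1.0 for two empty masks is a convention chosen here, not something the source specifies.

```python
def dice_coefficient(pred, truth):
    """DSC = 2 * |A intersect B| / (|A| + |B|) for two equal-shape binary masks."""
    intersection = sum(
        p * t
        for pred_row, truth_row in zip(pred, truth)
        for p, t in zip(pred_row, truth_row)
    )
    size = sum(map(sum, pred)) + sum(map(sum, truth))
    if size == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2 * intersection / size


# Toy 2x3 masks: each marks three pixels, two of which coincide.
pred = [[0, 1, 1],
        [0, 1, 0]]
truth = [[0, 1, 0],
         [0, 1, 1]]
print(round(dice_coefficient(pred, truth), 3))  # 0.667
```

For binary masks this value equals the F-score of the pixel-wise classification, which is why the two names are used interchangeably in the metrics table.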
A summary of CNN applications in medical image segmentation surveyed in Sect. 5
| References | Organ | Dataset | Comparison against | Performance metric |
|---|---|---|---|---|
| [ | MS lesion | ISBI 2015 and MICCAI 2008 datasets | Five publicly available methods | DSC, TPR and FPR. Achieved better TPR and FPR, but lower DSC values |
| [ | Breast | Dutch breast cancer screening dataset | State of the art methods | AUC |
| [ | Breast | The Cancer Genome Atlas breast cancer dataset | Three texture classification methods namely RPLSVM, LBPLSVM and THLB | Accuracy, efficiency and scalability and F-score of |
| [ | Nucleus | Brain tumor, pancreatic NET and breast cancer | SVM, RF and DBN | Precision, recall and F-score |
| [ | Heart | CCTA scans of 60 patients | Manual segmentation | Sensitivity |
| [ | Eye glaucoma | DRISHTI-GS, Messidor | Manual | Precision, recall, F-score |
A summary of CNN applications in medical image detection surveyed in Sect. 6
| References | Organ | Preprocessing | Architecture | Performance metric |
|---|---|---|---|---|
| [ | Breast | Threshold tissue from white background and patch-based classification | GoogLeNet | AUC for slide based classification |
| [ | Skin (melanoma) | Illumination correction step, removal of noise using Gaussian filters | (Conv, ReLU, maxpool) | Sensitivity, specificity, true positive, false positive and accuracy |
| [ | Hemorrhage (eye) | Image contrast using Gaussian filters, augmentation using vertical and horizontal flips | (Conv, ReLU, maxpool) | Sensitivity, specificity and ROC |
| [ | Eye retina | | Ensemble of twelve CNNs having (conv, ReLU) | Accuracy, FROC of 0.928 |
| [ | Cell | Extraction of grayscale patches, segmentation and binarization | LeNet | Sensitivity, specificity, AUC and F-score. F-score (SVM |
A summary of CNN applications in medical image localization surveyed in Sect. 7
| References | Organ | Architecture | Dataset | Comparison against | Performance metric |
|---|---|---|---|---|---|
| [ | Heart | (Conv, ReLU, maxpool) | Public dataset of York University | | Accuracy |
| [ | Fetal abdominal | (Conv, ReLU, maxpool) | ImageNet 2014, Shenzhen Maternal and Child Healthcare Hospital, China | R-CNN and RVD | Accuracy, precision, recall, F-score |
| [ | Breast | Semi-supervised deep CNN | | SVM, ANN | Accuracy |
Summary of papers reviewed for CNN applications in medical image understanding
| Image understanding tasks | Organs targeted | References |
|---|---|---|
| Medical image classification | Lung | [ |
| | COVID-19 | [ |
| | HEp-2 | [ |
| | Breast | [ |
| | Heart | [ |
| | Eye | [ |
| | Digestive organs | [ |
| | Brain | [ |
| | Colon | [ |
| Medical image segmentation | Brain | [ |
| | Breast | [ |
| | MS lesion | [ |
| | Nucleus | [ |
| | Heart | [ |
| | Eye | [ |
| Medical image detection | Breast | [ |
| | Melanoma | [ |
| | Eye | [ |
| | Cell | [ |
| Medical image localization | Heart | [ |
| | Fetal abdomen | [ |
Ways of addressing challenges of medical image understanding
| Challenges in medical image understanding | Solutions for overcoming challenges |
|---|---|
| Few training samples | Augmentation |
| Sparsity of labeled data | Semi supervised deep CNN |
| Noise in images | Gaussian filtering |
| Thicker CT images of brain | Geometric normalization |
| Extract local and global features | Multipathway CNN |
| Avoid redundant computations | ROI segmentation using fCNN |
| Reduce classification errors | Train on misclassified image patches |
| Heart classification | Two path CNN, for spatial and temporal features |
| Imbalanced label distributions | Two phase training |
| Low resolution in brain images | Slice by slice segmentation |
| Increase accuracy and AUC in eye images (retinal eye vessel) | Ensemble learning (boosting instead of back propagation, RF instead of FC) |
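
Two of the remedies that recur in the tables, augmentation by flipping and rotating and normalization to zero mean and unit variance, can be sketched without any imaging library. This is an illustrative example on a tiny nested-list image, not the preprocessing pipeline of any surveyed paper; all function names are hypothetical, and the normalization assumes the image is not constant (a zero standard deviation would cause a division by zero).

```python
from statistics import mean, pstdev


def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def standardize(img):
    """Rescale pixels to zero mean and unit variance (population statistics)."""
    pixels = [v for row in img for v in row]
    mu, sigma = mean(pixels), pstdev(pixels)
    return [[(v - mu) / sigma for v in row] for row in img]


def augment(img):
    """Return the original image plus three label-preserving variants."""
    return [img, hflip(img), vflip(img), rot90(img)]


image = [[1, 2],
         [3, 4]]
for variant in augment(image):
    print(variant)
# [[1, 2], [3, 4]]
# [[2, 1], [4, 3]]
# [[3, 4], [1, 2]]
# [[3, 1], [4, 2]]
```

Each training image yields four samples here; whether a given flip or rotation preserves the diagnostic label depends on the anatomy and modality, so the set of transforms should be chosen per task.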
Efficient CNN architectures for medical image understanding
| Organ (modality) | Diseases | Tasks | Best architecture |
|---|---|---|---|
| Brain MRI | Alzheimer | Segmentation | LeNet-5, GoogLeNet |
| | | Detection | Multipath CNN |
| | Epilepsy | Classification | CNN with seven conv and |
| | Schizophrenic, bipolar disorder | Segmentation | Patch based CNN with 4 conv, 2 maxpool and softmax regression |
| Breast mammogram | Tumour | Segmentation | (Conv + maxpool) |
| | | Detection | Scaled VGGNet or GoogLeNet |
| | | Classification | AlexNet |
| | | Localization | Semi-supervised deep CNN |
| Heart (CT) | Coronary artery calcium scoring | Segmentation | Two path CNN |
| Heart (ECG) | ECG | Classification | Two path CNN, one path along the spatial and the other along the temporal dimension, fused at the end |
| Heart (MRI) left ventricle | | Localization | (Conv, maxpool) |
| Lung (CT) | Cancer | Classification | Ensemble of Overfeat across axial, sagittal and coronal planes |
| | Nodule | Detection | LeNet, Overfeat |
| | COVID-19 | Classification | Multi-scale multi-encoder ensemble CNN model |
| | ILD | Classification | Any CNN architecture like AlexNet |
| HEp-2 cell | | Classification | (Conv, maxpool) |
| Eye | Haemorrhage | Detection | (Conv, ReLU, maxpool) |
| | Glaucoma | Detection | (Conv, ReLU, maxpool) |
| | Retinopathy | Segmentation | (Conv, ReLU, maxpool) |
| Colon | Polyp | Classification | Any simple CNN architecture |
| | Polyp | Detection | Ensemble of CNNs |
| Skin | Melanoma | Detection | (Conv, ReLU, maxpool) |
| Liver (CT) | | Classification | (Conv, maxpool) |
| Abdomen (US scan) | Fetus | Localization | (Conv, ReLU, maxpool) |
Fig. 2 Bar chart summarizing the number of papers surveyed