
Convolutional neural networks in medical image understanding: a survey.

D R Sarvamangala¹, Raghavendra V Kulkarni².

Abstract

Imaging techniques are used to capture anomalies of the human body. The captured images must be understood for diagnosis, prognosis and treatment planning of the anomalies. Medical image understanding is generally performed by skilled medical professionals. However, the scarce availability of human experts, together with their fatigue and the rough estimation procedures they rely on, limits the effectiveness of image understanding performed by skilled medical professionals. Convolutional neural networks (CNNs) are effective tools for image understanding. They have outperformed human experts in many image understanding tasks. This article aims to provide a comprehensive survey of applications of CNNs in medical image understanding. The underlying objective is to motivate medical image understanding researchers to extensively apply CNNs in their research and diagnosis. A brief introduction to CNNs is presented, along with a discussion of CNNs and their various award-winning frameworks. The major medical image understanding tasks, namely image classification, segmentation, localization and detection, are introduced. Applications of CNNs in medical image understanding of the ailments of the brain, breast, lung and other organs are surveyed critically and comprehensively. A critical discussion of some of the challenges is also presented.


Keywords: Classification; Convolutional neural networks; Detection; Image understanding; Localization; Segmentation

Year: 2021 PMID: 33425040 PMCID: PMC7778711 DOI: 10.1007/s12065-020-00540-3

Source DB: PubMed Journal: Evol Intell ISSN: 1864-5909

Introduction

Timely diagnosis of medical anomalies can prevent loss of human lives and reduce the trauma experienced in an injury or a disease. Medical anomalies include glaucoma, diabetic retinopathy, tumors [34], interstitial lung diseases [44], heart diseases and tuberculosis. Diagnosis and prognosis involve the understanding of images of the affected area obtained using X-ray, magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), single photon emission computed tomography or ultrasound scanning. Image understanding involves the detection of anomalies, ascertaining their locations and borders, and estimating their sizes and severity. The scarce availability of human experts, their fatigue, high consultation charges and rough estimation procedures limit the effectiveness of image understanding. Further, the shapes, locations and structures of medical anomalies are highly variable [55]. This makes diagnosis difficult even for specialized physicians [4]. Therefore, human experts often need support tools to aid in the precise understanding of medical images. This is the motivation for intelligent image understanding systems. Image understanding systems that exploit machine learning (ML) techniques have evolved rapidly in recent years. ML techniques include decision tree learning [35], clustering, support vector machines (SVMs) [47], k-nearest neighbors (k-NN), restricted Boltzmann machines (RBMs) [42] and random forests (RFs) [28]. The prerequisite for ML techniques to work efficiently is the extraction of discriminant features. These features are generally unknown, and identifying them is a very challenging task, especially for applications involving image understanding; it remains a topic of research. A logical step to overcome this difficulty was to create intelligent machines that could learn the features needed for image understanding and extract them on their own.
One such intelligent and successful model is the convolutional neural network (CNN), which automatically learns and extracts the features needed for medical image understanding. A CNN is made of convolutional filters whose primary function is to learn and extract these features. CNNs started gaining popularity in 2012 due to AlexNet [41], a CNN model that defeated all other models with a record accuracy and low error rate in the ImageNet challenge 2012. CNNs have been used by corporate giants for providing internet services, automatic tagging in images, product recommendations, home feed personalization and autonomous cars [59]. The major applications of CNNs are in image and signal processing, natural language processing and data analytics. CNNs had a major breakthrough when GoogLeNet was used to detect cancer at an accuracy of 89%, while human pathologists could achieve an accuracy of only 70% [3].

Motivation and purpose

CNNs have contributed significantly to the area of image understanding. CNN-based approaches appear on the leaderboards of many image understanding challenges, such as the Medical Image Computing and Computer Assisted Intervention (MICCAI) biomedical challenge, the Multimodal Brain Tumor Segmentation (BRATS) challenge [48], the ImageNet classification challenge, the challenges of the International Conference on Pattern Recognition (ICPR) [31] and the Ischemic Stroke Lesion Segmentation (ISLES) challenge [32]. The CNN has become a powerful choice as a technique for medical image understanding. Researchers have successfully applied CNNs to many medical image understanding applications, such as the detection of tumors and their classification into benign and malignant [52], detection of skin lesions [50], detection of optical coherence tomography images [39], detection of colon cancer [71], blood cancer, and anomalies of the heart [40], breast [36], chest, eye etc. Also, CNN-based models like CheXNet [56, 58], used for classifying 14 different ailments of the chest, achieved better results than the average performance of human experts. CNNs have also dominated the area of COVID-19 detection using chest X-rays and CT scans. Research involving CNNs is now a dominant topic at major conferences. In addition, reputed journals reserve special issues for solving challenges using deep learning models. The vast amount of literature available on CNNs is a testimonial to their efficiency and widespread use. However, various research communities are developing these applications concurrently, and the results are scattered across a wide and diverse range of conference proceedings and journals. A large number of surveys on deep learning have been published recently. A review of deep learning techniques applied in medical imaging, bioinformatics and pervasive sensing has been presented in [60].
A thorough review of deep learning techniques for segmentation of MRI images of the brain has been presented in [2]. A survey of deep learning techniques for medical image segmentation, their achievements and the challenges involved has been presented in [27]. Though the literature is replete with survey papers, most of them concentrate either on deep learning models in general, which include CNNs, recurrent neural networks and generative adversarial networks, or on a particular application. There is also no coverage of the application of CNNs in the early detection of COVID-19, as well as in many other areas. This survey includes research papers on various applications of CNNs in medical image understanding. The papers for the survey were queried from various journal websites. Additionally, arXiv and the conference proceedings of various medical image challenges were included, and the references of these papers were checked. The queries used were: “CNN” or “deep learning” or “convolutional neural network” or terms related to medical image understanding. These terms had to be present in either the title or the abstract for a paper to be considered. The objective of this survey is to offer a comprehensive overview of the applications and methodology of CNNs and their variants in the field of medical image understanding, including the detection of the latest global pandemic, COVID-19. The survey includes overview tables which can be used for quick reference. The authors leverage their own experience and that of the research fraternity on the applications of CNNs to provide an insight into various state-of-the-art CNN models, the challenges involved in designing CNN models and an overview of research trends in the field, and to motivate medical image understanding researchers and medical professionals to extensively apply CNNs in their research and diagnosis, respectively.

Contributions and the structure

Primary contributions of this article are as follows: to briefly introduce medical image understanding and CNNs; to convey that CNNs have percolated into the field of medical image understanding; to identify the various challenges in medical image understanding; and to highlight the contributions of CNNs in overcoming those challenges. The remainder of this article has been organized as follows: Medical image understanding has been briefly introduced in Sect. 2. A brief introduction of CNN and its architecture has been presented in Sect. 3. The applications of CNN in medical image understanding have been surveyed comprehensively through Sects. 4–7. Finally, concluding remarks and a projection of the trends in CNN applications in image understanding have been presented in Sect. 8.

Medical image understanding

Medical imaging is necessary for the visualization of internal organs for the detection of abnormalities in their anatomy or functioning. Medical image capturing devices, such as X-ray, CT, MRI, PET and ultrasound scanners capture the anatomy or functioning of the internal organs and present them as images or videos. The images and videos must be understood for the accurate detection of anomalies or the diagnosis of functional abnormalities. If an abnormality is detected, then its exact location, size and shape must be determined. These tasks are traditionally performed by the trained physicians based on their judgment and experience. Intelligent healthcare systems aim to perform these tasks using intelligent medical image understanding. Medical image classification, segmentation, detection and localization are the important tasks in medical image understanding.

Medical image classification

Medical image classification involves determining and assigning labels to medical images from a fixed set. The task involves the extraction of features from the image, and assigning labels using the extracted features. Let I denote an image made of pixels and $c_1, c_2, \ldots, c_r$ denote the labels. For each pixel x, a feature vector $\zeta$, consisting of values $f(x_i)$, is extracted from the neighborhood N(x) using (1), where $x_i \in N(x)$ for $i = 0, 1, \ldots, k$.

$$\zeta = (f(x_0), f(x_1), \ldots, f(x_k)) \qquad (1)$$

A label from the list of labels $c_1, c_2, \ldots, c_r$ is assigned to the image based on $\zeta$.

Medical image segmentation

Medical image segmentation helps in image understanding, feature extraction and recognition, and quantitative assessment of lesions or other abnormalities. It provides valuable information for the analysis of pathologies, and subsequently helps in diagnosis and treatment planning. The objective of segmentation is to divide an image into regions that have strong correlations. Segmentation involves dividing the image I into a finite set of regions $R_1, R_2, \ldots, R_S$ as expressed in (2).

$$I = \bigcup_{i=1}^{S} R_i, \qquad R_i \cap R_j = \emptyset \ \text{for} \ i \neq j \qquad (2)$$
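A segmentation result is commonly stored as a label map with one region index per pixel. The sketch below is illustrative only: the array `label_map` and its values are made up, and it assumes the usual reading of segmentation in which the regions are mutually disjoint and together cover the whole image. It splits the label map into one boolean mask per region and checks that every pixel belongs to exactly one region.

```python
import numpy as np

# Hypothetical 3x3 label map: each entry is the index of the region
# that the pixel belongs to (three regions: 0, 1 and 2).
label_map = np.array([
    [0, 0, 1],
    [0, 2, 1],
    [2, 2, 1],
])

# One boolean mask per region.
region_ids = np.unique(label_map)
regions = [label_map == k for k in region_ids]

# Number of regions that claim each pixel. A valid partition gives
# exactly 1 everywhere: no gaps and no overlaps.
coverage = np.sum(regions, axis=0)
is_partition = bool(np.all(coverage == 1))
```

Storing a single label per pixel makes the regions disjoint by construction, which is one reason label maps are a convenient output format for segmentation models.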

Medical image localization

Automatic localization of pathology in images is an important step towards automatic acquisition planning and post-imaging analysis tasks, such as segmentation and functional analysis. Localization involves predicting the object in an image, drawing a bounding box around the object and labeling the object. The localization function f(I) on an image I computes $(c, l_x, l_y, l_w, l_h)$, which represent, respectively, the class label, the centroid x and y coordinates, and the proportions of the bounding box with respect to the width and height of the image, as expressed in (3).

$$f(I) = (c, l_x, l_y, l_w, l_h) \qquad (3)$$

Medical image detection

Image detection aims at the classification and the localization of regions of interest by drawing bounding boxes around multiple regions of interest and labeling them. This helps in determining the exact locations of different organs and their orientation. Let I be an image with n objects or regions of interest. Then the detection function D(I) computes $(c_i, l_{x_i}, l_{y_i}, l_{w_i}, l_{h_i})$ for each region $i = 1, \ldots, n$; these are, respectively, the class label, the centroid x and y coordinates, and the proportions of the bounding box with respect to the width and height of the image I, as given in (4).

$$D(I) = \{(c_i, l_{x_i}, l_{y_i}, l_{w_i}, l_{h_i})\}_{i=1}^{n} \qquad (4)$$
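The output described above, a class label plus a centroid and a box size given as proportions of the image, can be sketched as a small record type. The names `Detection` and `to_pixel_box`, the example labels and the image size are all hypothetical and chosen only for illustration; the helper converts one normalized detection into pixel corner coordinates.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected region: label plus a box normalized to the image size."""
    label: str
    cx: float  # centroid x, as a fraction of image width
    cy: float  # centroid y, as a fraction of image height
    w: float   # box width, as a fraction of image width
    h: float   # box height, as a fraction of image height


def to_pixel_box(det, img_w, img_h):
    """Convert a normalized detection to (x_min, y_min, x_max, y_max) in pixels."""
    half_w = det.w * img_w / 2.0
    half_h = det.h * img_h / 2.0
    cx = det.cx * img_w
    cy = det.cy * img_h
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


# Localization yields a single tuple; detection yields one per region of interest.
detections = [
    Detection("nodule", 0.5, 0.5, 0.25, 0.5),
    Detection("lesion", 0.25, 0.75, 0.5, 0.25),
]
boxes = [to_pixel_box(d, img_w=400, img_h=200) for d in detections]
```

Keeping the box in proportions of the image, as the text describes, means the same prediction remains valid if the image is resized.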

Evaluation metrics for image understanding

There are many metrics used for the evaluation of performance of medical image understanding algorithms. The confusion matrix, also known as the error matrix, is the table used for visualizing the performance of an algorithm and for calculation of various evaluation metrics. It provides an insight about the types of errors that are made by the classifier. It is a square matrix in which rows represent the instances of actual results and the columns represent the instances of predicted results of the algorithm. The confusion matrix of a binary classifier is shown in Table 1.

Table 1

Confusion matrix

Total test samples	Predicted positive	Predicted negative
Actual positive	True positives ($T_P$)	False negatives ($F_N$)
Actual negative	False positives ($F_P$)	True negatives ($T_N$)

Here, $T_P$ indicates correctly identified positives, $T_N$ indicates correctly identified negatives, $F_P$ indicates incorrectly identified positives and $F_N$ indicates incorrectly identified negatives. $F_P$ is also known as a false alarm and $F_N$ is known as a miss. The sum of correct and incorrect predictions is represented as T and expressed as in (5).

$$T = T_P + T_N + F_P + F_N \qquad (5)$$

Performance metrics can be determined with the help of the confusion matrix and are given in Table 2.

Table 2

Performance evaluation metrics for image processing

Evaluation metric	To determine	Formula	Preferred value
Accuracy	Overall, how often the result is true	$\frac{T_P+T_N}{T}$	High ($\approx 1$)
Misclassification rate (error rate)	Overall, how often the result is false	$\frac{F_P+F_N}{T}$	Low ($\approx 0$)
True positive rate ($T_{PR}$) (recall or sensitivity)	When it is actually true, how often does it predict true	$\frac{T_P}{F_N+T_P}$	High ($\approx 1$)
False positive rate ($F_{PR}$)	When it is actually false, how often does it predict true	$\frac{F_P}{F_P+T_N}$	Low ($\approx 0$)
False negative rate ($F_{NR}$)	When it is actually true, how often does it predict false	$\frac{F_N}{F_N+T_P}$	Low ($\approx 0$)
Specificity	When it is actually false, how often does it predict false	$\frac{T_N}{F_P+T_N}$	High ($\approx 1$)
Precision (positive predictive value ($P_{PV}$))	When it predicts true, how often is it correct	$\frac{T_P}{F_P+T_P}$	High ($\approx 1$)
F-score (Dice similarity coefficient)	Harmonic mean of recall and precision	$2 \times \frac{P_{PV} \times T_{PR}}{P_{PV}+T_{PR}}$	High ($\approx 1$)
Receiver operating characteristic (ROC)	Commonly used graph that summarizes the performance of a classifier over all possible thresholds	Plot of recall versus $F_{PR}$	High ($\approx 1$)
Area under ROC (AUC)	The area under the ROC curve	Area under the plot of $T_{PR}$ versus $F_{PR}$	High ($\approx 1$)
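The threshold-free metrics of Table 2 follow directly from the four confusion-matrix counts. The sketch below is a minimal illustration, not a reference implementation: the function name `confusion_metrics` and the example counts are made up. The error rate is computed as the fraction of incorrect predictions, $(F_P + F_N)/T$, so that it equals one minus the accuracy.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Compute the scalar metrics of Table 2 for a binary classifier.

    Assumes every denominator is non-zero, i.e. the test set contains
    both positive and negative samples and at least one positive prediction.
    """
    total = tp + tn + fp + fn          # T in the text
    recall = tp / (tp + fn)            # true positive rate, sensitivity
    precision = tp / (tp + fp)         # positive predictive value
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "tpr": recall,
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
        "specificity": tn / (fp + tn),
        "precision": precision,
        "f_score": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts for 100 test samples.
m = confusion_metrics(tp=40, fn=10, fp=5, tn=45)
```

Several rows of the table are complementary: the false negative rate is one minus the recall, and the false positive rate is one minus the specificity, so in practice only one of each pair needs to be reported.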

A brief introduction to CNNs

Image understanding is a fascinating process that animals perform with ease. For a machine, however, understanding an image involves many hidden complexities. In animals, the eyes capture the image, which is processed by neurons and sent to the brain for interpretation. The CNN is a deep learning algorithm inspired by the visual cortex of the animal brain [30] and aims to imitate the visual machinery of animals. CNNs represent a quantum leap in the field of image understanding, which involves image classification, segmentation, localization, detection etc. The efficacy of CNNs in image understanding is the main reason for their abundant use. CNNs are made of convolutions having learnable weights and biases, similar to the neurons (nerve cells) of an animal. Convolutional layers, activation functions, pooling and fully-connected layers are the core building blocks of CNNs, as depicted in Fig. 1. Only a very brief introduction to CNNs is presented in this paper. Detailed discussions on CNNs are presented in [9, 41].

Fig. 1

Building blocks of a CNN

Convolution layers (Conv layers)

The visual cortex of the animal brain is made of neuronal cells which extract features of the images. Each neuronal cell extracts different features, which help in image understanding. The conv layer is modeled on the neuronal cells and its objective is to extract features, such as edges, colors, texture and gradient orientation. Conv layers are made of learnable filters called convolutional filters, or kernels, of size $n \times m \times d$, where d is the depth of the image. During the forward pass, the kernels are convolved across the width and height of the input volume, and the dot product is computed between the entries of the filter and the input. Intuitively, the CNN learns filters that get activated when they come across edges, colors, textures etc. The output of the conv layer is fed into an activation function layer.
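The forward pass of a single kernel can be sketched in a few lines of NumPy. This is an illustrative sketch under simplifying assumptions: no padding, a stride of one, a single kernel, and, as in most CNN libraries, no flipping of the kernel (strictly a cross-correlation). The function name `conv2d_valid`, the toy image and the hand-set kernel are all made up; in a CNN the kernel weights are learned.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide one kernel over the image and take a dot product at each position.

    image:  array of shape (H, W, d)
    kernel: array of shape (n, m, d), with the same depth d as the image
    Returns a feature map of shape (H - n + 1, W - m + 1).
    """
    H, W, d = image.shape
    n, m, kd = kernel.shape
    if d != kd:
        raise ValueError("kernel depth must match image depth")
    out = np.zeros((H - n + 1, W - m + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + n, j:j + m, :]
            out[i, j] = np.sum(patch * kernel)  # dot product of patch and filter
    return out


# A 4x4 single-channel image with a vertical edge between columns 1 and 2.
image = np.zeros((4, 4, 1))
image[:, 2:, 0] = 1.0

# Hand-set horizontal-difference kernel of size 1x2x1.
kernel = np.array([[-1.0, 1.0]]).reshape(1, 2, 1)

response = conv2d_valid(image, kernel)
```

The response is non-zero only where the kernel straddles the edge, which is the sense in which a filter "gets activated" by the feature it encodes.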

Activation functions or nonlinear functions

Since real-world data is mostly nonlinear, activation functions are used for the nonlinear transformation of the data. They ensure that the representation in the input space is mapped to a different output space as per the requirements. The different activation functions are discussed in Sects. 3.2.1–3.2.3.

Sigmoid

It takes a real-valued number x and squashes it into the range between 0 and 1. In particular, large negative and large positive inputs are mapped very close to 0 and unity, respectively. It is expressed as in (6).

$$\sigma(x) = \frac{1}{1 + e^{-x}} \qquad (6)$$

Tan hyperbolic

It takes a real-valued number x and squashes it into the range between $-1$ and 1, as expressed in (7).

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \qquad (7)$$

Rectified linear unit (ReLU)

This nonlinear function takes a real-valued number x and converts x to 0 if x is negative. ReLU is the most often used nonlinear function for CNNs; it takes less computation time and is hence faster than the other two. It is expressed in (8).

$$f(x) = \max(0, x) \qquad (8)$$
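The three activation functions of this section can be written directly with NumPy. The sketch below uses their standard definitions; the sample inputs in `x` are arbitrary and chosen only to show the behaviour at large negative, zero and large positive values.

```python
import numpy as np


def sigmoid(x):
    """Squash inputs into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def tanh(x):
    """Squash inputs into the open interval (-1, 1)."""
    return np.tanh(x)


def relu(x):
    """Replace negative inputs with 0 and pass positive inputs unchanged."""
    return np.maximum(0.0, x)


x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
s, t, r = sigmoid(x), tanh(x), relu(x)
```

ReLU needs only a comparison per element, whereas the sigmoid and tanh each require evaluating an exponential, which is consistent with the lower computation time noted above.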

Pooling

The pooling layer performs a nonlinear downsampling of the convolved features. It decreases the computational power required to process the data through dimensionality reduction. It reduces the spatial size by aggregating data over space or feature type, controls overfitting and helps overcome the translational and rotational variance of images. The pooling operation partitions its input into a set of rectangular patches. Each patch is replaced by a single value depending on the type of pooling selected. The different types are maximum pooling and average pooling.
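The patch-wise replacement described above can be sketched as follows. This is an illustrative sketch assuming square, non-overlapping patches on a single two-dimensional feature map; the function name `pool2d` and the example values are made up.

```python
import numpy as np


def pool2d(feature_map, size=2, mode="max"):
    """Replace each non-overlapping size x size patch by one value."""
    H, W = feature_map.shape
    H2, W2 = H // size, W // size
    # Drop trailing rows/columns that do not fill a whole patch, then
    # expose each patch on its own pair of axes (1 and 3).
    patches = feature_map[:H2 * size, :W2 * size].reshape(H2, size, W2, size)
    if mode == "max":
        return patches.max(axis=(1, 3))
    if mode == "avg":
        return patches.mean(axis=(1, 3))
    raise ValueError("mode must be 'max' or 'avg'")


fmap = np.array([
    [1.0, 2.0, 5.0, 6.0],
    [3.0, 4.0, 7.0, 8.0],
    [9.0, 10.0, 13.0, 14.0],
    [11.0, 12.0, 15.0, 16.0],
])
max_pooled = pool2d(fmap, mode="max")
avg_pooled = pool2d(fmap, mode="avg")
```

With a patch size of 2, the 4 x 4 feature map shrinks to 2 x 2, a fourfold reduction in the number of values passed to the next layer.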

Fully connected (FC) layer

The FC layer is similar to an artificial neural network, where each node has incoming connections from all the inputs and all the connections have weights associated with them. The output is the sum of all the inputs multiplied by the corresponding weights. The FC layer is followed by a sigmoid activation function and performs the job of the classifier.
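A fully connected layer followed by a sigmoid can be sketched as a matrix-vector product. The weights, bias and input below are arbitrary illustrative values, and the bias term is an assumption added here (the text above mentions only the weighted sum); in a trained network both weights and biases are learned.

```python
import numpy as np


def fully_connected(x, weights, bias):
    """Each output node is the weighted sum of all inputs plus a bias."""
    return weights @ x + bias


def sigmoid(z):
    """Map each weighted sum to a score between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))


# Three input features (e.g. a flattened, pooled feature map) and two output nodes.
x = np.array([1.0, 2.0, 3.0])
weights = np.array([
    [1.0, 0.0, -1.0],   # weights of output node 0
    [0.5, 0.5, 0.0],    # weights of output node 1
])
bias = np.zeros(2)

z = fully_connected(x, weights, bias)
scores = sigmoid(z)
```

Here every output node sees every input, in contrast to a conv layer, where each output depends only on a small neighborhood of the input.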

Data preprocessing and augmentation

The raw images obtained from imaging modalities need to be preprocessed and augmented before being sent to the CNN. The raw image data might be skewed, altered by bias distortion [55] or affected by intensity inhomogeneity during capture, and hence needs to be preprocessed. Multiple data preprocessing methods exist, and the preferred methods are mean subtraction and normalization. A CNN needs to be trained on a large dataset to achieve the best performance. Data augmentation enlarges the existing set of images through horizontal and vertical flips, transformations, scaling, random cropping, color jittering and intensity variations. The preprocessed, augmented image data is then fed into the CNN.
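Mean subtraction, normalization and flip-based augmentation can be sketched on a batch of grayscale images as below. This is a minimal illustration on randomly generated data: the function names `normalize` and `augment_with_flips` are made up, and the statistics are computed over the whole batch for simplicity, whereas in practice they would usually be estimated from the training set only.

```python
import numpy as np


def normalize(images):
    """Subtract the mean and scale to unit variance (statistics over the batch)."""
    mean = images.mean()
    std = images.std()
    return (images - mean) / (std + 1e-8)  # small constant avoids division by zero


def augment_with_flips(images):
    """Return the originals plus their horizontal and vertical flips.

    images: array of shape (N, H, W); the result has shape (3N, H, W).
    """
    horizontal = images[:, :, ::-1]  # mirror left-right
    vertical = images[:, ::-1, :]    # mirror top-bottom
    return np.concatenate([images, horizontal, vertical], axis=0)


# Four synthetic 8x8 "scans" with intensities in [0, 255).
rng = np.random.default_rng(0)
raw = rng.uniform(0, 255, size=(4, 8, 8))

normalized = normalize(raw)
prepared = augment_with_flips(normalized)
```

Flips triple the number of training images at no acquisition cost; whether a given flip is appropriate depends on the anatomy, since some structures are not left-right symmetric.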

CNN architectures and frameworks

Many CNN architectures have been proposed by researchers, depending on the kind of task to be performed. A few award-winning architectures are listed in Table 3. CNN frameworks (toolkits) enable the efficient development and implementation of deep learning methods. Various frameworks used by researchers and developers are listed in Table 4.

Table 3

Various award winning CNN architectures

Architecture	References	Architecture	References
LeNet-5	[43]	VGGNet	[70]
AlexNet	[41]	GoogLeNet	[3]
Overfeat	[65]	ResNet	[26]
ZFNet	[79]	Xception	[12]

Table 4

Various existing CNN frameworks

Deep learning frameworks	Developed at	Implemented using	Pros	Cons
Caffe [33]	Berkeley Vision and Learning Center	C++	Cross platform, easy to deploy, fast, has good Matlab and Python interface, models can be trained without writing code	Usability drops outside ConvNets, not very good with new architectures
Keras [11]	Francois Chollet, Google	Python	Easiest deep learning framework, clean API, huge active community, easily extensible, user-friendly, works with Theano and TensorFlow	Poor support for multi node training, cannot be used efficiently as an independent framework
TensorFlow [1]	Google Brain team	Python, C++	Has modular architecture, supports multiple front ends and execution platforms, availability of TensorBoard for visualization	Slower than Theano and Torch in terms of execution time, lacks many pre-trained models, not completely open source
Torch [15]	New York University	Lua	Modular architecture, easy to setup, displays helpful error messages and has a large amount of tutorials	Difficult to integrate since it is implemented in Lua and its data science stack in R or Python
PyTorch [53]	Facebook artificial intelligence group	Python, C++, CUDA	Fast, minimal framework overhead, understanding error messages and stack traces is easy and hence easy to debug, good for research and development	Depends on Matplotlib and Seaborn for visualization, lacks distributed training
Theano [3]		Python	Expressive Python syntax, large help on net, supports high level wrappers	Cross platform, needs secondary libraries for deploying, supports single GPU, error messages are not helpful
Neon [51]	Nervana Systems	Python	Very fast due to fast matrix operations, the same code runs on GPU and CPU	Too many errors with scattered dependencies, less material on web, architecture not user friendly, not general-purpose
Deeplearning4j [59]		Java	Performance equal to Caffe and better than TensorFlow or Torch, fast	Generally not preferred because Java is unpopular in ML circles
Cognitive Toolkit (CNTK) [64]	Microsoft	C++	Fast, accurate scalable over GPUs, flexible, allows distributed training, supports C++, Java, Python and C#	Lacks visualizations, crashes with error, no Matlab or Python bindings


CNN applications in medical image classification

Lung diseases

Interstitial lung disease (ILD) is a disorder of the lung parenchyma in which lung tissues get scarred, leading to respiratory difficulty. High resolution computed tomography (HRCT) imaging is used to differentiate between different types of ILDs. HRCT images have a high visual variation within the same class and a high visual similarity between different classes. Therefore, accurate classification is quite challenging.

Ensemble CNN

An ensemble of RF and OverFeat for the classification of pulmonary peri-fissural nodules of the lungs was proposed in [14]. The complexity of the input was reduced by extracting two-dimensional views from the three-dimensional volume. The performance was enhanced by using a combination of OverFeat followed by RF. The bagging technique of RF boosted the performance of the model. The proposed model obtained an AUC of .

Small-kernel CNN

That low-level textural information and more nonlinear activations enhance classification performance was emphasized in [4]. The authors shrank the kernel size to involve more nonlinear activations. The receptive fields were kept small to capture low-level textural information. Also, to handle the increasing complexity of the structures, the number of kernels was made proportional to the number of receptive fields of its neurons. The model classified the lung tissue image into seven classes (healthy tissue and six different ILD patterns). The results were compared against AlexNet and VGGNet using ROC curves. The structure took only 20 s to classify the whole lung area in 30 slices of an average-size HRCT scan image, whereas AlexNet and VGGNet took 136 s and 160 s, respectively. The model delivered a classification accuracy of , while the traditional methods delivered an accuracy of .

Whole image CNN

Smaller image patches to prevent loss of spatial information and different attenuation ranges to enhance visibility were proposed in [18]. Since the images were RGB, the proposed CNN model used three lung attenuation ranges, namely lower attenuation, normal attenuation and higher attenuation. To avoid overfitting, the images were augmented by jitter and cropping. A simple AlexNet model with the above variations was implemented and compared against other CNN models implemented to work on image patches. The performance metrics were accuracy and F-score. The model obtained an F-score of and the average accuracy of .

Multicrop pooling CNN

The limitation of few training samples can be overcome by extracting salient multi-scale features. In [68], the features were extracted using multicrop pooling for automatic classification of lung nodule malignancy suspiciousness. The model was a simple three-layer CNN architecture with multicrop pooling and randomized leaky ReLU as activation. The proposed method obtained an accuracy of 87.4% and an AUC of 93%. Fivefold cross-validation was used for evaluation. The CNN applications in lung classification are summarized in Table 5.
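The survey does not describe how multicrop pooling works, so the following is only a sketch of one common reading of the idea: concentric center crops of a feature map are max-pooled a different number of times so that all of them end at the same size and can be concatenated. The function names and the choice of three crops are assumptions.

```python
def max_pool2(fm):
    """2x2 max pooling with stride 2 on a square 2-D list of even size."""
    n = len(fm)
    return [
        [max(fm[i][j], fm[i][j + 1], fm[i + 1][j], fm[i + 1][j + 1])
         for j in range(0, n, 2)]
        for i in range(0, n, 2)
    ]


def center_crop(fm, size):
    """Cut a size x size window from the middle of a square feature map."""
    off = (len(fm) - size) // 2
    return [row[off:off + size] for row in fm[off:off + size]]


def multicrop_pool(fm):
    """Return three equally sized maps taken at three scales (side divisible by 4)."""
    n = len(fm)
    r0 = max_pool2(max_pool2(fm))             # whole map, pooled twice
    r1 = max_pool2(center_crop(fm, n // 2))   # half-size center crop, pooled once
    r2 = center_crop(fm, n // 4)              # quarter-size center crop, not pooled
    return [r0, r1, r2]


feature_map = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
crops = multicrop_pool(feature_map)
```

The three outputs would be stacked as channels, so the next layer sees the nodule center at full resolution alongside coarser context.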

Table 5

A summary of CNN applications in ILD image classification surveyed in Sect. 4

References	Preprocessing	Architecture	Dataset	Comparison against	Performance metric
[44]	Normalized with unit variance and zero-mean	Conv, ReLU, maxpool, FC ×3, backpropagation, batch processing, dropout of 0.6	ILD	SIFT, LBP, unsupervised feature learning using RBM	Recall, precision
[4]	Augmentation by rotating	(Conv, Leaky ReLU, avgpool) ×5, three FC, softmax classifier	ILD from University of Geneva and Bern University hospital	AlexNet, VGGNet	Time, ROC, AUC
[18]	Augmentation by adding jitter and cropping	AlexNet architecture	Public ILD database	CNN patch based methods	Accuracy 87.9% compared to 86.1% using patch based methods
[68]	Augmentation	(Conv, ReLU, maxpool with multicrop) ×3 + FC, softmax	LIDC-IDRI dataset	HOG, LBP	Accuracy 87.4% and AUC 93%

Coronavirus disease 2019 (COVID-19)

COVID-19 is a global pandemic disease spreading rapidly around the world. Reverse transcription polymerase chain reaction (RT-PCR) is a commonly employed test for the detection of COVID-19 infection. Although RT-PCR testing is the gold standard for COVID-19 testing, it is a complicated, time-consuming and labor-intensive process, with sparse availability and limited accuracy. Chest X-ray could be used for the initial screening of COVID-19 in places with a shortage of RT-PCR kits and is more accurate at diagnosis. Many researchers have used deep learning to classify whether a chest infection is due to COVID-19 or other ailments.

Customized CNN

One of the initial models proposed for the detection of COVID-19 was a simple pretrained AlexNet model, proposed in [45] and fine-tuned on chest X-ray images. The results for classifying positive and negative patients were very promising. Pretrained ResNet and InceptionNet CNN models with transfer learning were also proposed. These models demonstrated that transfer learning models were also efficient.

Bayesian CNN

Uncertainty was explored to enhance the diagnostic performance of classification of COVID-19 datasets in [22]. The primary aim of the proposed method was to avoid COVID-19 misdiagnoses. The method explored a Monte-Carlo DropWeights Bayesian CNN to estimate uncertainty in deep learning and thereby improve the diagnostic performance of human-machine decisions. It showed a strong correlation between classification accuracy and estimated uncertainty in predictions. The proposed method used the ResNet50v2 model, with the softmax layer preceded by dropweights. Dropweights were applied as an approximation to the Gaussian process, which was used to estimate meaningful model uncertainty. The softmax layer finally output a probability distribution over the possible class labels.
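Monte-Carlo estimation of this kind runs the same input through the network several times with different random weight drops and summarizes the resulting softmax outputs. The sketch below covers only that summarizing step, using predictive entropy as the uncertainty measure. Entropy is a common choice but is an assumption here, since the survey does not state the measure used in [22]; `predictive_summary` is a hypothetical name.

```python
import math


def predictive_summary(prob_samples):
    """Summarize T stochastic forward passes.

    `prob_samples` is a list of T softmax vectors for one input. Returns the
    mean class probabilities, the predicted label and the predictive entropy.
    """
    t = len(prob_samples)
    n_classes = len(prob_samples[0])
    mean = [sum(p[c] for p in prob_samples) / t for c in range(n_classes)]
    # Entropy of the averaged distribution: 0 when all passes agree confidently.
    entropy = -sum(p * math.log(p) for p in mean if p > 0)
    label = max(range(n_classes), key=mean.__getitem__)
    return mean, label, entropy


# Passes that agree give low entropy; passes that disagree give high entropy.
_, agree_label, agree_entropy = predictive_summary([[0.9, 0.1], [0.8, 0.2], [1.0, 0.0]])
_, _, split_entropy = predictive_summary([[0.9, 0.1], [0.1, 0.9]])
```

Cases with high entropy could be referred to a clinician rather than decided automatically, which is how uncertainty supports human-machine decisions.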

PDCOVIDNET

The use of dilation to detect dominant features in the image was explored in [13]. The authors proposed a parallel dilated CNN model. The dilated module skipped pixels during the convolution process. Parallel CNN branches with different dilation rates were proposed. The features obtained from the parallel branches were concatenated and input to the next convolution layer. The concatenation-convolution operation was used to explore the feature relationships of dilated convolutions so as to detect dominant features for classification. The model also used Grad-CAM and Grad-CAM++ to highlight the regions of class-discriminative saliency maps. The performance metrics used were accuracy, precision, recall and F1-score, along with ROC/AUC; the reported AUC was 0.991.
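The pixel skipping of a dilated convolution is easiest to see in one dimension. The sketch below is a simplified illustration rather than the model from [13]: it works on a 1-D signal, assumes an odd kernel length and zero padding, and uses hypothetical dilation rates of 1, 2 and 3 for the parallel branches.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Same-length 1-D convolution whose kernel taps sit `dilation` samples apart.

    Assumes an odd kernel length; the signal is zero-padded on both sides.
    """
    pad = (len(kernel) - 1) * dilation // 2
    padded = [0] * pad + list(signal) + [0] * pad
    return [
        sum(k * padded[i + j * dilation] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def parallel_dilated(signal, kernel, rates=(1, 2, 3)):
    """Run one branch per dilation rate and stack the outputs as channels."""
    return [dilated_conv1d(signal, kernel, r) for r in rates]


branches = parallel_dilated([1, 2, 3, 4, 5], [1, 1, 1])
```

With the same three weights, the branch at rate 3 covers seven input samples while the branch at rate 1 covers three, so the stacked output mixes fine and coarse context without adding parameters.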

CVR-Net

To prevent degradation of the final prediction and to compensate for the small number of datasets, a multi-scale multi-encoder ensemble CNN model for classification of COVID-19 was proposed in [24]. The proposed model ensembled feature maps at different scales obtained from different encoders. To avoid overfitting, geometry-based image augmentations and transfer learning were used. To overcome vanishing gradients, each encoder consisted of residual and convolutional blocks that allow gradients to pass, as in the ResNet architecture. Moreover, depth-wise separable convolution was used to create a lightweight network. The depth information of the feature map was enhanced by concatenating the 2D feature maps of different encoders channel-wise. The performance metrics for classifying images into positive and negative were recall, precision, F1-score and accuracy. The model performed very efficiently, with nearly the same score for all the metrics.
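Why depth-wise separable convolution yields a lightweight network follows from a simple weight count. The sketch below compares a standard convolution with its depth-wise separable counterpart, ignoring biases; the layer sizes in the example are illustrative and are not taken from [24].

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k convolution mapping c_in channels to c_out channels."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depth-wise step plus a 1 x 1 point-wise step."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise


# Illustrative layer: 3 x 3 kernel, 64 input channels, 128 output channels.
full = standard_conv_params(3, 64, 128)
light = depthwise_separable_params(3, 64, 128)
```

For this layer the separable form needs roughly one eighth of the weights, and the saving grows with the number of output channels.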

Twice transfer learning CNN

A DenseNet model trained twice using a transfer learning approach was proposed in [6]. The DenseNet201 model was trained initially on the ImageNet dataset, followed by the ChestX-ray14 dataset, and then fine-tuned on a COVID-19 dataset. Various training combinations were tried: single transfer learning, twice transfer learning, and twice transfer learning with output neuron keeping. The model with twice transfer learning and output neuron keeping achieved the best accuracy among the models. Transfer learning on the ChestX-ray14 dataset enhanced the result, as the model had learnt most of the features related to chest abnormalities.

Immune response abnormalities

Autoimmune diseases result from an abnormal immune response to a normal body part. The immune system of the body attacks the healthy cells in such diseases. Indirect immunofluorescence (IIF) on human epithelial-2 (HEp-2) cells is used to diagnose an autoimmune disease. Manual identification of these patterns is a time-consuming process.

CUDA ConvNet CNN

It was shown in [7] that preprocessing using histogram equalization and normalization to zero mean and unit variance, together with augmentation, increases classification accuracy. The experiments also demonstrated that pretraining followed by fine-tuning boosts performance. The method achieved an average classification accuracy of 81%, which was greater than the previous best. The authors used the Caffe library [33] and the CUDA ConvNet model architecture to extract CNN-based features for classification of HEp-2 cells.

Six-layer CNN

Preprocessing and augmentation were shown in [21] to enhance the mean classification accuracy of HEp-2 cell images. The framework consisted of three stages: image preprocessing, network training, and feature extraction with classification. A mean classification accuracy of 96.17% on the ICPR-2012 dataset was obtained. The CNN approaches for HEp-2 cell classification are summarized in Table 6.

Table 6

A summary of CNN applications in HEp-2 cell classification surveyed in Sect. 4

References	Preprocessing	Architecture	Dataset	Comparison against	Performance metric
[7]	Augmented by affine transform and intensity variations, green channel images used	Conv, maxpool, ReLU, conv, avgpool, ReLU, conv, avgpool, FC with dropout, weight decay, softmax with multinomial logistic regression	ICPR 2014	ICPR 2012 benchmark and winners model	Average classification accuracy 81%
[21]	Normalized using zero mean and unit variance, and rotated	(Conv, tanh, maxpool) ×3, FC ×2, softmax regression, outputs six classes	ICPR 2014	ICPR 2012 classification contestants	Accuracy 96.17%

Breast tumors

Breast cancer is the most common cancer that affects women across the world. It can be detected by the analysis of mammograms. Independent reading of the same mammogram by two radiologists has been advocated to overcome misjudgement.

Stopping monitoring CNN

Stopping monitoring to reduce the computation time was proposed in [52]. Stopping monitoring was performed using the AUC on the validation set. The model extracted the region of interest (ROI) by cropping. The images were augmented to increase the number of samples and to prevent overfitting. The proposed CNN model classified breast tumors. The result was compared against the state-of-the-art image descriptors HOG and HOG divergence. The proposed method resulted in an AUC of 82.2%.

That fine-tuning enhances performance in the case of a limited dataset was shown in [34]. The proposed model was similar to AlexNet and, owing to the shortage of breast images, was pretrained on ImageNet and then fine-tuned on breast images. The middle- and high-level features were extracted from different layers of the network and fed into SVM classifiers for training. The model classified breast masses as malignant or benign. Because the deep network extracted efficient features, even a simple classifier achieved an accuracy of 96.7%. The proposed method was compared against bag-of-words, HOG and SIFT, and it outperformed all of them.
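The stopping monitoring described for [52] can be sketched as a patience rule on the validation AUC: training halts once the AUC has failed to improve for a fixed number of consecutive epochs. The patience rule, the function name and the AUC values below are illustrative assumptions, since the survey does not give the exact criterion.

```python
def early_stop_epoch(val_aucs, patience):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops at the first epoch where the validation AUC has not
    improved for `patience` consecutive epochs, or at the last epoch.
    """
    best_auc, best_epoch = float("-inf"), -1
    for epoch, auc in enumerate(val_aucs):
        if auc > best_auc:
            best_auc, best_epoch = auc, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_aucs) - 1, best_epoch


# Hypothetical per-epoch validation AUCs.
history = [0.70, 0.75, 0.80, 0.79, 0.78, 0.81]
stop, best = early_stop_epoch(history, patience=2)
```

A short patience saves computation but can stop before a later improvement, as the last value in this history shows; the weights from `best_epoch` are the ones to keep.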

Semi-supervised CNN

CNNs can also be used in scenarios involving sparse labeled data and abundant unlabeled data. To overcome the sparse labeled data problem, a new graph-based semi-supervised learning technique for breast cancer diagnosis was proposed in [73]. Dimensionality reduction was employed to remove redundancies and feature correlations. The method used four modules: feature extraction to extract 21 features from breast masses, data weighing to minimize the influence of noisy data, division of co-training data labeling, and the CNN. Sub-patches extracted from ROIs were input to three pairs of conv and maxpooling layers followed by an FC layer. Three models, CNN, SVM and ANN, were compared. For a mixture of labeled and unlabeled data, the CNN achieved an AUC of 88.18% and an accuracy of 82.43%, and was compared against SVM and ANN on both metrics. The CNN approaches for breast tumor classification are summarized in Table 7.

Table 7

A summary of CNN applications in breast medical image classification surveyed in Sect. 4

References	Preprocessing	Architecture	Comparison against	Performance metric
[34]	Normalized by subtracting mean, dividing by standard deviation, augmented by flipping	(Conv, ReLU, maxpool) ×5, (FC + ReLU) ×2 + FC, stochastic gradient descent	BOW, HOG, SIFT, VGGNet	ROC and accuracy 96.7%
[52]	Crop, augmentation by flipping and rotating, global contrast normalization, local contrast normalization	(Conv, ReLU, maxpool) ×2 + FC	HOG, HGD	AUC 82.2%
[73]	Feature extraction, data weighing, data labeling	(Conv, maxpool) ×3, FC	SVM, ANN, labeled data, unlabeled data and mixed data	Accuracy 82.43%, AUC 88.18%
[78]	Image oversegmented into atomic regions using superpixel based scheme	(Conv, maxpool) ×2, FC ×2, softmax classifier	Linder, Bianconi, DCNN-Ncut + SVM, DCNN-Ncut + SMC, DCNN-Ncut + SVM, DCNN-SLIC-SVM, DCNN-SLIC-SMC	ROC, AUC, accuracy

Heart diseases

An electrocardiogram (ECG) is used to assess the electrical activity of the heart and to detect anomalies in it.

One-dimensional CNN

ECG classification using a CNN model, demonstrating superior classification accuracy, was proposed in [40]. The model comprised a one-dimensional CNN with three conv layers and two FC layers that fused feature extraction and classification into a single learning body. Once the dedicated CNN was trained for a particular patient, it could be used on its own to classify ECG records quickly and accurately.

Fused CNN

Classification of echocardiography videos requires both spatial and temporal data. A fused CNN architecture using both was proposed in [20]. It used a two-path CNN, one path along the spatial direction and the other along the temporal direction. Each CNN path executed individually, and the paths were fused only after obtaining the final classification scores. The spatial CNN learnt spatial information automatically from the original normalized echo video images. The temporal CNN learnt from acceleration images along the time direction of the echo videos. The outputs of both CNNs were fused and applied to a softmax classifier for the final classification. The proposed model achieved an average accuracy of 92.1% and was compared against a single-path CNN, three-dimensional KAZE and three-dimensional SIFT. The long time required for initial training was the disadvantage of this approach. The CNN approaches to heart classification are summarized in Table 8.
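Fusing the two paths only at the level of class scores is a form of late fusion. The sketch below assumes a weighted sum of the per-class scores followed by softmax; the survey does not specify the fusion operation used in [20], and the weight `w_spatial` is a hypothetical parameter.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(spatial, temporal, w_spatial=0.5):
    """Combine per-class scores of the two paths, then normalize with softmax."""
    fused = [w_spatial * s + (1 - w_spatial) * t for s, t in zip(spatial, temporal)]
    return softmax(fused)


# Hypothetical scores for three view classes from each path.
probs = fuse_scores([3.0, 1.0, 0.0], [2.0, 0.0, 1.0])
predicted = max(range(len(probs)), key=probs.__getitem__)
```

Because the paths share nothing until this step, each can be trained and run independently, which matches the description of the two paths executing individually.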

Table 8

A summary of CNN applications in heart classification surveyed in Sect. 4

References	Preprocessing	Architecture	Dataset	Comparison against	Performance metric
[40]	ECG signals filtered using bandpass filter at 0.1-100 Hz and digitized at 360 Hz	One-dimensional CNN with conv ×3, FC ×2	MIT/BIH arrhythmia database	Other existing methods	Accuracy, sensitivity, specificity
[20]	Image resized to 175×200×26	Two path CNN of seven layers (conv, ReLU, maxpool) ×2, (conv, ReLU) ×2, (conv, ReLU, maxpool), (conv, dropout) ×2	Tsinghua University Hospital, Beijing and Fuzhou University Hospital, China	SIFT, KAZE	Accuracy 92.1%

Eye diseases

Gaussian initialized CNN

Initial training time can be reduced by Gaussian initialization, and overfitting can be avoided by weighted class weights. This was proposed for classifying diabetic retinopathy (DR) in fundus imagery in [57]. The performance was compared with SVM and other methods that required feature extraction prior to classification. The method achieved a specificity of 95% but a lower sensitivity of 30%. The trained CNN made a quick diagnosis and gave an immediate response to the patient during screening.
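Class weighting counteracts imbalance by making errors on rare classes cost more in the loss. The sketch below uses inverse-frequency weights, a common scheme that is assumed here for illustration; the survey does not state the weighting formula used in [57], and the label counts are made up.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


# Hypothetical screening set: 8 healthy images (0) and 2 diseased images (1).
weights = inverse_frequency_weights([0] * 8 + [1] * 2)
```

With these weights, each class contributes equally to the total loss despite the 4:1 imbalance, so the network is not rewarded for predicting the majority class everywhere.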

Hyper parameter tuning inception-v4

An automated hyperparameter-tuning Inception-v4 (HPTI-v4) model for classification and detection of DR in color fundus images was proposed in [67]. The images were preprocessed using CLAHE to enhance the contrast level and segmented using a histogram-based segmentation model. Hyperparameter tuning was done using Bayesian optimization, as the Bayesian model can analyze previous validation outcomes to create a probabilistic model. Classification was done using the HPTI-v4 model followed by a multilayer perceptron and was applied to the MESSIDOR DR dataset. The CNN model performed extraordinarily well in terms of accuracy, sensitivity and specificity.

Colon cancer

The use of small patches increased the amount of training data and localized the analysis to small nuclei in images. This enhanced the performance of detecting and classifying nuclei in H&E-stained histopathology images of colorectal adenocarcinoma, as proposed in [71]. The model also demonstrated a locality-sensitive deep learning approach with a neighboring ensemble predictor (NEP) in conjunction with a standard softmax CNN, and eliminated the need for segmentation. The model used dropout to avoid overfitting. The model obtained an AUC of 91.7% and an F-score of 78.4%. The CNN approaches for colon cancer classification are summarized in Table 9.

Table 9

A summary of CNN applications in colon medical image classification surveyed in Sect. 4

References	Preprocessing	Architecture	Comparison against	Performance metric
[71]	Augmented using small patches	NEP with standard softmax CNN, ReLU, dropout		AUC 91.7%, F-score 78.4%
[78]	Image subdivided into 80×80 using sliding window, and border padding to prevent artifacts	(Conv, maxpool) ×2, FC ×2, softmax	Linder, Bianconi, DCNN-SW-SVM, DCNN-SW-SMC	ROC, AUC, accuracy
[61]	Normalized by subtracting mean, dividing by standard deviation, augmented by flipping	(Conv, pool) ×2, conv, FC with dropout, softmax	BFD, SSF, DT-CTW, MB-LBP, SIFT	Accuracy 90.96%

Brain disorders

Alzheimer’s disease (AD) causes the destruction of brain cells, leading to memory loss. Classification of AD has been challenging since it involves the selection of discriminative features. That a fusion of two-dimensional and three-dimensional CNNs achieves better accuracy was demonstrated in [19]. Information along the Z direction is crucial for the analysis of brain images, and a three-dimensional CNN was used to retain this information. Since brain CT slices are thicker than MRI slices, geometric normalization of the CT images was performed. The output of the last conv layer of the two-dimensional CNN was fused with the three-dimensional convoluted data to obtain three classes (Alzheimer’s, lesions and healthy data). It was compared with two hand-crafted approaches, SIFT and KAZE, and achieved better accuracies of 86.7%, 78.9% and 95.6% for the AD, lesion and normal classes, respectively.

Input cascaded CNN

That a lack of training data can be overcome by extensive augmentation and fine-tuning was proposed in [62]. Multi-grade brain tumor classification was performed by segmenting the tumor regions from an MR image using an input cascaded CNN, followed by extensive augmentation and fine-tuning on the augmented data. The performance was compared against state-of-the-art methods. It resulted in an accuracy of 94.58%, a sensitivity of 88.41% and a specificity of 96.58%. The CNN approaches for medical image classification discussed above are summarized in Table 10.

Table 10

A summary of CNN applications in medical image classification surveyed in Sect. 4

References	Organ	Preprocessing	Architecture	Dataset	Performance metric
[19]	Brain	Geometric normalization of CT images	Conv ×5, (conv + dropout) ×3, fusion with 3-D convoluted data	3-D datasets from 282 subjects (51 with AD, 118 with lesions, 117 normal)	Accuracy 86.7%, 78.9% and 95.6% for AD, lesion and normal class
[63]	Brain	Motion correction, skull stripping, spatial smoothing	LeNet	ADNI dataset	Accuracy 96.86%
[62]	Brain	Data augmentation	Input cascaded CNN for segmentation, VGG-19 with transfer learning	Radiopedia and brain tumor	Accuracy 94.58%, sensitivity 88.41%, specificity 96.58%
[57]	Eye, retinopathy	Color normalization, image size reduced to 512	Conv ×10, FC ×3 with softmax classifier	Kaggle	Specificity 95% but less sensitivity 30%
[82]	Digestive organs		(Conv, maxpool) ×3, FC ×2, mini batch stochastic gradient descent	Real clinical data of wireless capsule endoscopy images	Accuracy 95%

CNN applications in medical image segmentation

CNNs have been applied to implement efficient segmentation of images of brain tumors, hearts, breasts, retina, fetal abdomen, stromal and epithelial tissues.

Brain tumors

MRI is used to obtain detailed images of the brain to diagnose tumors. Automatic segmentation of a brain tumor is very challenging because it involves the extraction of high level features.

Small kernel CNN

Patch-wise training with small filter sizes was proposed for the segmentation of gliomas in [54]. Small filters allow a deeper architecture while retaining the same receptive field. Two separate models were trained for high-grade and low-grade gliomas. The high-grade glioma model consisted of eight conv layers and three dense layers, and the low-grade glioma model contained four conv layers and three dense layers. Maxpooling was used, along with dropout for the dense layers. The method ranked fourth in the BRATS-2015 challenge. Data augmentation by rotation enhanced the segmentation performance.
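
The trade-off between kernel size and depth can be illustrated with a short calculation. The sketch below is only an illustration: the 3×3 and 7×7 kernel sizes and the 64-channel width are assumed values, not taken from [54], and the helper names are hypothetical.

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field (in input pixels) of a stack of conv layers."""
    strides = strides or [1] * len(kernel_sizes)
    rf, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump  # each layer widens the field by (k - 1) input steps
        jump *= s             # a stride enlarges the step between neighbouring outputs
    return rf


def conv_weights(kernel_sizes, channels):
    """Weight count (biases ignored) when every layer maps channels -> channels."""
    return sum(k * k * channels * channels for k in kernel_sizes)


stacked = [3, 3, 3]  # three small-kernel layers (assumed 3x3)
single = [7]         # one large-kernel layer (assumed 7x7)

print(receptive_field(stacked), receptive_field(single))
print(conv_weights(stacked, 64), conv_weights(single, 64))
```

Under these assumptions the stack of small kernels covers the same input area as the single large kernel, with fewer weights and more non-linearities in between.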

Fully blown CNN

The use of full two-dimensional MRI images enhances the performance of segmentation of sub-cortical human brain structures, as shown in [66]. The proposed model applied a Markov random field to the CNN output to impose volumetric homogeneity on the final results. It outperformed several state-of-the-art methods.

Multipath CNN

Using two pathways, one for convolution and the other for deconvolution, was shown in [8] to enhance the segmentation output. The model was used for automatic segmentation of multiple sclerosis (MS) lesions. The convolutional pathway consisted of alternating conv and pool layers, and the deconvolutional pathway of alternating deconv and unpooling layers. Pretraining was performed with convolutional RBMs (convRBMs). Both pretraining and fine-tuning were performed on a highly optimized GPU-accelerated implementation of three-dimensional convRBMs and convolutional encoder networks (CENs). The model was compared with five publicly available methods that served as reference points. Performance was evaluated using the Dice similarity coefficient (DSC), true positive rate (TPR) and false positive rate (FPR). The TPR and FPR achieved were better than those of previously developed models; however, the DSC was lower than that of the other methods.
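
The three metrics can be made concrete with a small example. The sketch below assumes the common voxel-wise definitions computed from a binary predicted mask and a binary reference mask; lesion-wise variants used in some challenges differ, and the function names are illustrative only.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over two equally sized flat binary masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    return tp, fp, fn, tn


def dice(pred, truth):
    """DSC = 2*TP / (2*TP + FP + FN); defined as 1.0 when both masks are empty."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def true_positive_rate(pred, truth):
    tp, _, fn, _ = confusion_counts(pred, truth)
    return tp / (tp + fn)


def false_positive_rate(pred, truth):
    _, fp, _, tn = confusion_counts(pred, truth)
    return fp / (fp + tn)


pred = [1, 1, 0, 0, 1, 0]   # hypothetical predicted mask
truth = [1, 0, 0, 0, 1, 1]  # hypothetical reference mask
print(dice(pred, truth), true_positive_rate(pred, truth), false_positive_rate(pred, truth))
```

Because DSC penalizes false positives and false negatives together, a model can improve TPR and FPR individually and still score a lower DSC than a competitor.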

Cascaded CNN

In the case of imbalanced label distributions, two-phase training can be used. A two-pathway architecture that learns global contextual features and local detailed features simultaneously was proposed for brain segmentation in [25]. The advantage of the two pathways was that the model could recognize fine details of the tumor at a local scale and correct labels at a global scale, yielding a better segmentation. Segmentation was performed slice by slice from the axial view because of the lower resolution in the third dimension. The cascaded CNN achieved a better rank than the two-pathway CNN and was ranked second at the MICCAI BRATS-2013 challenge. The evaluation metrics used were DSC, specificity and sensitivity, and the values obtained were 79%, 81% and 79%, respectively (Table 11). The time taken for segmentation was between 25 s and 3 min.
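
One possible reading of two-phase training is sketched below: a first phase draws training samples so that all classes are equally represented, and a second phase returns to the original, imbalanced distribution. This reading, the 95/5 class split and the helper name balanced_sample are assumptions made for illustration and are not details given above.

```python
import random
from collections import Counter


def balanced_sample(labels, n_per_class, rng):
    """Draw indices (with replacement) so each class contributes n_per_class samples."""
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    picked = []
    for indices in by_class.values():
        picked.extend(rng.choices(indices, k=n_per_class))
    return picked


labels = [0] * 95 + [1] * 5          # hypothetical patch labels: 95 healthy, 5 tumor
rng = random.Random(0)               # fixed seed for repeatability

phase1 = balanced_sample(labels, 50, rng)  # phase 1: equiprobable classes
phase2 = list(range(len(labels)))          # phase 2: original distribution

print(Counter(labels[i] for i in phase1))
print(Counter(labels[i] for i in phase2))
```

In such a scheme the second phase would typically adjust only the final layers, so that the features learned from balanced data are kept while the output probabilities reflect the true class frequencies.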

Multiscale CNN

For brain tumor segmentation, a multiscale CNN architecture for extracting both local and global features at different scales was proposed in [80]. The model performed better because different features were extracted at various resolutions. The computation time was reduced by using a two-dimensional CNN instead of a three-dimensional CNN. Patches of three different sizes were input to three CNNs for feature extraction, and all the extracted features were input to the FC layer. The model was evaluated using DSC and accuracy. Its performance was almost as stable as that of the best method.
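
Multiscale input can be pictured as several patches of different sizes centred on the same pixel, each fed to its own network. The sketch below shows only the patch extraction step; the sizes 3, 5 and 7, the zero padding and the function names are assumptions for illustration and are not the values used in [80].

```python
def extract_patch(image, row, col, size, pad_value=0):
    """Return a size x size patch centred on (row, col); outside pixels are padded."""
    half = size // 2
    height, width = len(image), len(image[0])
    patch = []
    for r in range(row - half, row - half + size):
        line = []
        for c in range(col - half, col - half + size):
            inside = 0 <= r < height and 0 <= c < width
            line.append(image[r][c] if inside else pad_value)
        patch.append(line)
    return patch


def multiscale_patches(image, row, col, sizes=(3, 5, 7)):
    """One patch per scale, all sharing the same centre pixel."""
    return {size: extract_patch(image, row, col, size) for size in sizes}


# Toy 10x10 "slice" whose pixel value encodes its position (row * 10 + col).
image = [[r * 10 + c for c in range(10)] for r in range(10)]
patches = multiscale_patches(image, 5, 5)
for size, patch in patches.items():
    print(size, patch[size // 2][size // 2])  # centre pixel of every patch
```

The small patch carries local detail and the large patch carries context; concatenating the features of the per-scale networks before the FC layer gives the classifier both.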

Multipath and multiscale CNN

Two-path and multiscale architectures were also explored for brain lesion segmentation in [37]. The model exploited smaller kernels to capture local neighbourhood information and employed parallel convolutional pathways for multiscale processing. It achieved the highest accuracy when applied to patients with severe traumatic brain injuries, and it could also segment small and diffuse pathologies. The three-dimensional CNN produced accurate segmentation borders, and a fully connected (FC) three-dimensional CRF imposed regularization constraints on the CNN output and produced the final hard segmentation labels. Owing to its generic nature, the model could be applied to different lesion segmentation tasks with slight modifications. It was ranked first in the ISLES-SISS-2015 stroke lesion challenge.

The advantages of multipath and multiscale CNNs were exploited for automatic segmentation of analytical brain images in [49]. A bigger kernel was used to capture spatial information. A separate network branch was used for each patch size, and only the output layer was shared. The network was trained using mini-batch learning and RMSprop, with ReLU activations and cross entropy as the cost function. Automatic segmentation was evaluated using the DSC and the mean surface distance between manual and automatic segmentation. It achieved accurate segmentation in terms of DSC for all tissue classes. The CNN approaches for brain segmentation discussed above are summarized in Table 11.

Table 11

A summary of CNN applications in brain medical image segmentation surveyed in Sect. 5

References	Preprocessing	Architecture	Performance metric
[55]	N4ITK bias field correction, Nyul’s intensity normalization	Seven layers with conv and ReLU at 1, 3, 5, 6 and maxpool at 2 and 4, layer seven was FC with softmax	Precision 98.3% and accuracy 97%
[66]	Augmented by flipping and translation	Shallow network, (conv, max pool layers) ×5. First two layer outputs are subsampled, five output classes	DSC, Hausdorff distance, contour mean distance
[25]	Removed 1% higher and 1% lower intensity, normalization	Two path CNN with cascaded architecture with maxout, followed by maxpool, softmax. Optimization using stochastic gradient descent, regularization using L1, L2 and dropout.	DSC 79%, sensitivity 79% and specificity 81%
[37]		Two-pathway, 11-layer architecture called DeepMedic, with three-dimensional CNN, FC three-dimensional CRF and smaller kernels	DSC, precision and sensitivity
[49]	Classified brain tissue pixels into classes, using different kernel sizes	Separate network branch was used for each patch size, and only output layer was shared. Mini batch learning and RMSprop to train the network with ReLU and cross entropy as the cost function.	DSC and mean surface distance between manual and automatic segmentation

Breast cancer

Breast cancers can be predicted by automatically segmenting breast density and by characterizing mammographic textural patterns.

FCNN

Redundant computations in conv and max pool layers can be avoided by using ROI segmentation in a fast scanning deep CNN (FCNN). This technique was proposed in [72] for the segmentation of histopathological breast cancer images. The proposed work was compared against three texture classification methods, namely raw pixel patch with large scale SVM, local binary pattern feature with large scale SVM and texton histogram with logistic boosting. The evaluation metrics used were accuracy, efficiency and scalability. The proposed method was robust to intra-class variance. It achieved an F-score of 85% (Table 12), higher than the maximum F-score delivered by the other methods, and took only 2.3 s to segment an image.

Probability map CNN

Probability maps were explored in [77] for iterative region merging for shape initialization, along with a compact nucleus shape repository with a selection-based dictionary learning algorithm. The model resulted in better automatic nucleus segmentation using CNN. The framework was tested on three types of histopathology images, namely brain tumor, pancreatic neuroendocrine tumor and breast cancer. The parameters for comparison were precision, recall and F-score. It achieved better performance than SVM, RF and DBN, especially for breast cancer images. Pixel-wise segmentation accuracy measured using DSC, HD and MAD was also superior to that of other methods.
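
Distance-based measures complement overlap measures such as DSC by quantifying how far a predicted contour lies from the reference contour. The sketch below assumes that HD denotes the Hausdorff distance and MAD the mean absolute distance between contour point sets, and uses a symmetric average for MAD; both the expansions and this particular MAD definition are assumptions, since neither is spelled out above.

```python
import math


def directed_distances(a, b):
    """For every point in a, the distance to its nearest point in b."""
    return [min(math.dist(p, q) for q in b) for p in a]


def hausdorff(a, b):
    """Largest nearest-neighbour distance, taken in both directions."""
    return max(max(directed_distances(a, b)), max(directed_distances(b, a)))


def mean_absolute_distance(a, b):
    """Average nearest-neighbour distance over both directions (one common variant)."""
    distances = directed_distances(a, b) + directed_distances(b, a)
    return sum(distances) / len(distances)


# Hypothetical contour points of an automatic and a manual nucleus outline.
automatic = [(0, 0), (0, 1), (0, 2)]
manual = [(0, 0), (0, 1), (3, 2)]

print(hausdorff(automatic, manual))
print(mean_absolute_distance(automatic, manual))
```

The Hausdorff distance is driven by the single worst contour point, whereas the mean distance reflects the typical deviation, so the two are usually reported together.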

Patch CNN

The advantages of patch-based CNNs were exploited in [78]. The method also used a superpixel method to over-segment breast cancer H&E images into atomic images. The result was natural boundaries with subtle, less egregious errors, whereas sliding window methods resulted in zigzag boundaries. Patch-based CNN and superpixel techniques were combined for segmenting and classifying the stromal and epithelial regions in histopathological images for detection of breast and colorectal cancer. The proposed model outperformed CNN with SVM. The comparison was done against methods using handcrafted features. Deep CNN-Ncut-SVM had a better AUC than the other CNNs.

Greedy CNN

The architecture of conventional CNNs was tweaked by making the filters of the CNN learn sequentially using a greedy boosting approach instead of backpropagation. Boosting was applied to learn diverse filters that minimize the weighted classification error. This ensemble learning approach was proposed in [81] for the automatic segmentation of the optic cup and optic disc from retinal fundus images to detect glaucoma. The model performed entropy sampling to identify informative points on landmarks such as edges and blood vessels. The weight updates were based on the final classification error instead of the backpropagation error. The method operated on patches of the image taken around a point. The F-score obtained was better than the best F-score of a normal CNN.
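
The idea of driving learning by a weighted classification error can be illustrated with one round of generic AdaBoost-style reweighting. This is a textbook sketch and not the procedure of [81]: the labels, predictions and the function name adaboost_round are hypothetical, and labels are assumed to be +1 or -1 with sample weights summing to one.

```python
import math


def adaboost_round(weights, predictions, labels):
    """One boosting round: weighted error, learner weight, renormalised sample weights."""
    error = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    error = min(max(error, 1e-10), 1 - 1e-10)   # keep the logarithm finite
    alpha = 0.5 * math.log((1 - error) / error)
    # Misclassified samples (y * p == -1) are up-weighted, correct ones down-weighted.
    updated = [w * math.exp(-alpha * y * p)
               for w, p, y in zip(weights, predictions, labels)]
    total = sum(updated)
    return error, alpha, [w / total for w in updated]


labels = [1, 1, -1, -1, 1]
predictions = [1, -1, -1, -1, 1]    # the second sample is misclassified
weights = [0.2] * 5                 # uniform starting weights

error, alpha, new_weights = adaboost_round(weights, predictions, labels)
print(error, alpha, new_weights)
```

After the update the misclassified sample carries half of the total weight, so the next learner (in the setting above, the next filter) is pushed toward the examples that earlier ones got wrong, which encourages diversity.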

Multi label inference CNN

Retinal blood vessel segmentation was treated as a multi-label inference problem and solved using a CNN in [16]. The model extracted the green channel from the RGB fundus image, as blood vessels manifest high contrast in the green channel. The model was upsampled at the sixth layer to increase the spatial dimension for structured output. Because of the multiple labels, the output of the CNN model was a vector instead of a scalar. Performance was reported in terms of precision, sensitivity, specificity, accuracy and AUC.

Lung

U net

Lung segmentation and bone shadow exclusion techniques for the analysis of lung cancer using the U-net architecture were proposed in [23]. The images were preprocessed to eliminate bone shadows, and a simple U-net architecture was used to segment the lung ROI. The results were very promising, showing good speed and precise segmentation. The CNN approaches for medical image segmentation discussed above are summarized in Table 12.

Table 12

A summary of CNN applications in medical image segmentation surveyed in Sect. 5

References	Organ	Dataset	Comparison against	Performance metric
[8]	MS lesion	ISBI 2015 and MICCAI 2008 datasets	Five publicly available methods	DSC, TPR and FPR. Achieved better TPR and FPR, but less DSC values
[36]	Breast	Dutch breast cancer screening dataset	State of the art methods	AUC
[72]	Breast	The Cancer Genome Atlas breast cancer dataset	Three texture classification methods namely RPLSVM, LBPLSVM and THLB	Accuracy, efficiency and scalability and F-score of 85%
[77]	Nucleus	Brain tumor, pancreatic NET and breast cancer	SVM, RF and DBN	Precision, recall and F-score
[83]	Heart	CCTA scans of 60 patients	Manual segmentation	Sensitivity 95% and specificity 96.6%, DSC 85% and mean absolute surface distance 85%
[81]	Eye glaucoma	DRISHTI-GS, Messidor	Manual	Precision, recall, F-score

CNN applications in medical image detection

The Camelyon grand challenge for automatic detection of metastatic breast cancer in digital whole slide images of sentinel lymph node biopsies is organised by the International Symposium on Biomedical Imaging.

GoogLeNet CNN

The award-winning system, with performance very close to human accuracy, was proposed in [76]. The computation time was reduced by first excluding the white background of the digital images using Otsu’s algorithm. The method exploited the advantages of patch-based classification to obtain better results. The model was also trained extensively on misclassified image patches to decrease the classification error. The patch results were embedded in a heatmap image, and the heatmaps were used to compute evaluation scores. An AUC of 92.5% was obtained for slide-based classification (Table 13), making the system the top performer in the challenge. For lesion-based detection, the system achieved a sensitivity of 70.5%, higher than the second-ranking score.
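
Otsu's algorithm selects the grey level that maximises the between-class variance of the two resulting pixel groups, which is why it separates bright slide background from darker tissue without a hand-tuned threshold. The following is a minimal sketch on a flat list of 8-bit grey values; the toy pixel values are invented for illustration, and the colour space and resolution at which thresholding is applied in [76] are not specified here.

```python
def otsu_threshold(pixels, levels=256):
    """Return grey level t maximising between-class variance (classes: <= t and > t)."""
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    count_low = 0   # pixels at or below the candidate threshold
    sum_low = 0     # sum of their grey values
    for t in range(levels):
        count_low += hist[t]
        sum_low += t * hist[t]
        if count_low == 0:
            continue
        count_high = total - count_low
        if count_high == 0:
            break
        mean_low = sum_low / count_low
        mean_high = (sum_all - sum_low) / count_high
        between = count_low * count_high * (mean_low - mean_high) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


# Toy slide: dark tissue (30 and 40) and white background (240).
pixels = [30] * 40 + [40] * 10 + [240] * 50
threshold = otsu_threshold(pixels)
tissue_mask = [value <= threshold for value in pixels]
print(threshold, sum(tissue_mask))
```

Only patches falling inside the tissue mask would then need to be passed to the classifier, which is where the saving in computation time comes from.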

Dynamic CNN

Random assignment of weights speeds up training and improves performance. This was proposed for hemorrhage detection in fundus eye images in [75]. The samples were also dynamically selected at every training epoch from a large pool of medical images. Preprocessing consisted of contrast enhancement using Gaussian filters. To prevent overfitting, the images were augmented. For correct classification of hemorrhage, the result was convolved with a Gaussian filter to smooth the values. Sensitivity, specificity and ROC on the Messidor dataset were compared with those obtained using non-selective sampling. AUC was used to monitor overfitting during training, and CNN training was stopped when the AUC value reached a stable maximum.

An ensemble performs better than a single CNN and can be used to achieve higher performance. An ensemble model for the detection of retinal vessels in fundus images was proposed in [46]. The model was an ensemble of twelve CNNs. The output probabilities of the individual CNNs were averaged to obtain the final vessel probability of each pixel. This probability was used to discriminate vessel pixels from non-vessel ones. The performance measures, accuracy and kappa score, were compared with those of existing state-of-the-art methods. The model stood second in terms of both accuracy and kappa score, and obtained a FROC score of 0.928.
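
The averaging step of such an ensemble is simple to express. The sketch below uses three hypothetical model outputs for four pixels instead of twelve CNNs, and the decision threshold of 0.5 is an assumed value, not one reported in [46].

```python
def ensemble_probability(prob_maps):
    """Average per-pixel probabilities over all models (each map is a flat list)."""
    n_models = len(prob_maps)
    return [sum(values) / n_models for values in zip(*prob_maps)]


def classify_pixels(avg_probs, threshold=0.5):
    """True marks a vessel pixel, False a non-vessel pixel."""
    return [p >= threshold for p in avg_probs]


# Hypothetical vessel probabilities from three models for the same four pixels.
prob_maps = [
    [0.9, 0.2, 0.6, 0.4],
    [0.8, 0.1, 0.4, 0.5],
    [0.7, 0.3, 0.7, 0.3],
]
average = ensemble_probability(prob_maps)
vessels = classify_pixels(average)
print(average, vessels)
```

Averaging reduces the variance of the individual models: the third pixel is rejected by one model but accepted by the ensemble because the other two agree.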

Cell division

LeNet CNN

Augmentation and shifting the centroid of the object to the patch center enhanced the performance, as proposed in [69]. The model was designed for automatic detection and quantification of mitosis (cell division) occurring during scratch assay. The positive training samples were augmented by mirroring and rotation, and centered by shifting the centroid of the object to the patch center. Randomly sampled negative examples were added in the same amount as the positive examples. The performance parameters used were sensitivity, specificity, AUC and F-score, and the results were compared with those of SVM. The results indicated a significant increase in F-score (78% for SVM, 89% for CNN; Table 13). The study concluded that both positive and negative samples are needed for better performance. The CNN applications in medical image detection reviewed in this paper are summarized in Table 13.
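
Mirroring and rotation multiply the number of positive examples without new annotation. The sketch below generates the eight variants obtainable from 90-degree rotations and horizontal mirroring of a square patch; the 90-degree step is an assumption, since the rotation angle used in [69] is not stated above, and the function names are illustrative.

```python
def mirror(patch):
    """Flip a patch left to right."""
    return [row[::-1] for row in patch]


def rotate90(patch):
    """Rotate a patch by 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]


def augment(patch):
    """Return the 8 variants produced by 4 rotations, each with and without mirroring."""
    variants = []
    current = patch
    for _ in range(4):
        variants.append(current)
        variants.append(mirror(current))
        current = rotate90(current)
    return variants


patch = [[1, 2],
         [3, 4]]   # toy 2x2 grayscale patch
variants = augment(patch)
print(len(variants))
print(rotate90(patch))
```

For a patch without symmetry, all eight variants are distinct, so each positive example yields eight training samples.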

Table 13

A summary of CNN applications in medical image detection surveyed in Sect. 6

References	Organ	Preprocessing	Architecture	Performance metric
[76]	Breast	Threshold tissue from white background and patch-based classification	GoogLeNet	AUC for slide based classification 92.5%, lesion based detection sensitivity 70.5%
[50]	Skin (melanoma)	Illumination correction step, removal of noise using gaussian filters	(Conv, ReLU, maxpool) ×2, FC, softmax	Sensitivity, specificity, true positive, false positive and accuracy
[75]	Hemorrhage (eye)	Image contrast using gaussian filters, augment using vertical, horizontal flips	(Conv, ReLU, maxpool) ×5, FC, softmax	Sensitivity, specificity and ROC
[46]	Eye retina		Ensemble of twelve CNN having (conv, ReLU) ×3, (FC, dropout) ×2	Accuracy, FROC of 0.928
[69]	Cell	Extraction of grayscale patches, segmentation and binarization	LeNet	Sensitivity, specificity, AUC and F-score. F-score (SVM 78%, CNN 89%)

CNN applications in medical image localization

Semi-supervised deep CNN

To overcome the challenge of sparsely labeled data, a semi-supervised deep CNN for breast cancer diagnosis was proposed in [73]. The unlabeled data were first labeled automatically using the labeled data. The newly labeled data and the initial labeled data were then used to train the deep CNN. The performance of the CNN was compared with that of SVM and ANN using different amounts of labeled data, such as 40, 70 and 100. The model produced comparable results even with sparsely labeled data, with an accuracy of 83% and an AUC of 88% (Table 14).
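
The labeling step can be sketched as pseudo-labeling: every unlabeled sample receives the label of its most similar labeled sample, and the union is used for training. The nearest-neighbour rule, the two-dimensional feature vectors and the function names below are stand-ins chosen for illustration; the actual labeling scheme of [73] is not described above.

```python
def nearest_label(sample, labeled):
    """Label of the closest labeled sample under squared Euclidean distance."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(labeled, key=lambda item: squared_distance(sample, item[0]))
    return label


def pseudo_label(labeled, unlabeled):
    """Return the initial labeled data followed by the automatically labeled data."""
    newly_labeled = [(sample, nearest_label(sample, labeled)) for sample in unlabeled]
    return labeled + newly_labeled


# Hypothetical feature vectors: label 0 = benign, label 1 = malignant.
labeled = [((0.1, 0.2), 0), ((0.9, 0.8), 1)]
unlabeled = [(0.2, 0.1), (0.8, 0.9), (0.7, 0.6)]

training_set = pseudo_label(labeled, unlabeled)
print([label for _, label in training_set])
```

The quality of the final model then depends on how reliable the automatic labels are, which is why comparisons across different amounts of initial labeled data are informative.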

Pyramid of scales localization

Pyramid of scales (PoS) localization leads to better performance, especially in cases where the size of the organ varies between patients. The size of the heart is not consistent among humans, and hence PoS was proposed for localization of the left ventricle (LV) in cardiac MRI images in [17]. The model also exploited patch-based training. The evaluation metrics used were accuracy, sensitivity and specificity, which were 98.66%, 83.91% and 99.07%, respectively. The limitation of the approach was the computing time of 10 s/image.
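
One way to picture a pyramid of scales is as a set of sliding windows of several sizes, each scored by the classifier, with the best-scoring window taken as the location. The sketch below follows that reading, which is an assumption about how PoS operates; an overlap score against a known target box stands in for the CNN output, and the image size, scales and stride are invented values.

```python
def pyramid_windows(image_size, base_window, scales, stride):
    """Yield (row, col, size) for square windows at every scale over a square image."""
    for scale in scales:
        size = int(round(base_window * scale))
        if size > image_size:
            continue
        for row in range(0, image_size - size + 1, stride):
            for col in range(0, image_size - size + 1, stride):
                yield row, col, size


def overlap_score(window, target):
    """Intersection over union of two (row, col, size) squares; stands in for a CNN score."""
    r1, c1, s1 = window
    r2, c2, s2 = target
    dr = min(r1 + s1, r2 + s2) - max(r1, r2)
    dc = min(c1 + s1, c2 + s2) - max(c1, c2)
    intersection = max(dr, 0) * max(dc, 0)
    return intersection / (s1 * s1 + s2 * s2 - intersection)


target = (8, 8, 16)   # hypothetical left-ventricle bounding box
windows = list(pyramid_windows(image_size=32, base_window=8, scales=(1, 2, 3), stride=4))
best = max(windows, key=lambda w: overlap_score(w, target))
print(len(windows), best)
```

The number of windows grows with the number of scales and with smaller strides, which is consistent with the per-image computing time being the main limitation.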

Fetal abnormalities

Transfer learning CNN

Transfer learning reuses the knowledge in the low layers of a base CNN trained on a large cross-domain image dataset. Its advantages include savings in training time and the need for less training data, which reduces overfitting and enhances classification performance. A domain-transferred deep CNN for fetal abdominal standard plane (FASP) localization in fetal ultrasound scanning was proposed in [10]. The base CNN was trained on the 2014 ImageNet detection dataset. The accuracy, precision, recall and F-score were the highest when compared to R-CNN and RVD. The drawback of the system was that it took more time to locate the FASP in an ultrasound video. The CNN methods for image localization reviewed in this paper are summarized in Table 14.

Table 14

A summary of CNN applications in medical image localization surveyed in Sect. 7

References	Organ	Architecture	Dataset	Comparison against	Performance Metric
[17]	Heart	(Conv, ReLU, maxpool) ×2, FC	Public dataset of York University		Accuracy 98.66%, sensitivity 83.91%, specificity 99.07%
[10]	Fetal abdominal	(Conv, ReLU, maxpool) ×2, conv ×3, maxpool, FC ×3	ImageNet 2014, Shenzhen Maternal and Child Healthcare Hospital, China	R-CNN and RVD	Accuracy, precision, recall, F-score
[73]	Breast	Semi-supervised deep CNN		SVM, ANN	Accuracy 83%, AUC 88%

The papers reviewed for medical image understanding are summarized in Table 15.

Table 15

Summary of papers reviewed for CNN applications in medical image understanding

Image understanding tasks	Organs targeted	References
Medical image classification	Lung	[4, 18, 44, 68]
	COVID-19	[6, 13, 22, 24, 45]
	HEp-2	[7, 21]
	Breast	[34, 52, 73, 78]
	Heart	[20, 40]
	Eye	[57]
	Digestive organs	[82]
	Brain	[19, 63]
	Colon	[61, 71, 78]
Medical image segmentation	Brain	[25, 37, 49, 55, 66]
	Breast	[36, 72]
	MS lesion	[8]
	Nucleus	[77]
	Heart	[83]
	Eye	[81]
Medical image detection	Breast	[5, 76]
	Melanoma	[50]
	Eye	[46, 75]
	Cell	[69]
Medical Image localization	Heart	[17]
Medical Image localization	Fetal Abdomen	[10]

Critical review and conclusion

CNNs have been successfully applied in the areas of medical image understanding, and this section provides a critical review of those applications. Firstly, the literature contains a vast number of CNN architectures, and this high diversity makes it difficult to select the best architecture for a specific task. Moreover, the same architecture might perform differently depending on the data preprocessing applied, so prior knowledge of the data is needed to choose the correct preprocessing technique. Furthermore, the choice of hyperparameters (dropout rate, learning rate, optimizer, etc.) can either enhance or degrade the performance of a network. For training, CNNs require large amounts of data containing the most comprehensive information. Insufficient information or features leads to underfitting of the model. However, augmentation can be applied in such scenarios, as it promotes translation invariance and enlarges the training dataset, thereby enhancing the CNN's efficiency. Transfer learning and fine-tuning can also be used to enhance efficiency when data are scarce. These enhance the performance because the low-level features are nearly the same for most images. Small-sized kernels can be used to enhance the performance by capturing low-level textural information, although at the cost of increased computational complexity during training. Moreover, multiple-pathway architectures can be used to enhance the performance of CNNs. The performance is enhanced by the simultaneous learning of global contextual features and local detailed features, but this in turn increases the computational burden on the processor and memory. One of the challenges of medical data is the class imbalance problem, in which the positive class is generally under-represented and most of the images belong to the normal class.
Designing CNNs to work on imbalanced data is a challenging task. However, researchers have tried to overcome this challenge by augmenting the under-represented data. Denser CNNs could also lead to the vanishing gradient problem, which could be overcome by using skip connections as in the inceptionNet architecture. Furthermore, the significant depth and enormous size of CNNs require large memory and high computational resources for training. Deeper CNNs involve millions of training parameters, which could lead the model to overfit and generalize poorly, especially in the case of a limited dataset. This calls for models that are lightweight yet can extract critical features as the dense models do. Lightweight CNNs could be explored further. Medical image understanding would be more efficient in the presence of background context or knowledge about the image to be understood. In this context, CNNs would be more efficient if the data consisted not only of images but also of patient history. Hence, the next challenging task would be to build models that take both images and patient history as input to make a decision, and this could be the next research trend. Interpreting CNNs is challenging because of their many layers, millions of parameters and complex, nonlinear data structures. CNN researchers have concentrated on building accurate models without quantifying the uncertainty in the obtained results. Successful utilization of CNN models in medical diagnosis depends on providing confidence, which requires the model to be able to ascertain its uncertainty or certainty, or to explain the results obtained. This field needs further exploration. Although researchers have proposed heat maps, class activation maps (CAM), Grad-CAM and Grad-CAM++ for visualization of CNN outputs, visualization remains a challenge.
The various challenges and methods of overcoming some of the challenges of medical image understanding are summarized in Table 16. Further, efficient architectures to overcome some of the challenges as per the survey are summarized in Table 17.

Table 16

Ways of addressing challenges of medical image understanding

Challenges in medical image understanding	Solutions for overcoming challenges
Few training samples	Augmentation
Sparsity of labeled data	Semi supervised deep CNN
Noise in images	Gaussian filtering
Thicker CT images of brain	Geometric normalization
Extract local and global features	Multipathway CNN
Avoid redundant computations	ROI segmentation using fCNN
Reduce classification errors	Train on misclassified image patches
Heart classification	Two path CNN, for spatial and temporal features
Imbalanced label distributions	Two phase training
Low resolution in brain images	Slice by slice segmentation
Increase accuracy and AUC in eye images (retinal eye vessel)	Ensemble learning (boosting instead of back propagation, RF instead of FC)

Table 17

Efficient CNN architectures for medical image understanding

Organ (modality)	Diseases	Tasks	Best architecture
Brain MRI	Alzheimer	Segmentation	Lenet-5, GoogleNet
	Alzheimer	Detection	Multipath CNN
	Epilepsy	Classification	CNN with seven conv and
	Schizophrenic, bipolar disorder	Segmentation	Patch based CNN with 4 conv, 2 maxpool and softmax regression
Breast mammogram	Tumour	Segmentation	(Conv + maxpool) ×2 + 2 FC, Softmax
		Detection	Scaled VGGnet or GoogleNet
		Classification	Alexnet
		Localization	Semi Supervised Deep CNN
Heart (CT)	Coronary artery calcium scoring	Segmentation	Two path CNN
Heart (ECG)	ECG	Classification	Two path CNN, one along spatial and the other along temporal and fused together finally
Heart (MRI) left ventricle		Localization	(Conv, maxpool) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times 2$$\end{document}×2, FC
Lung (CT)	Cancer	Classification	Ensemble of overfeat across axial, saggital and coronal plane
	Nodule	Detection	Lenet, Overfeat
	COVID-19	Classification	multi scale multi encoder ensemble CNN model
	ILD	Classification	Any CNN architecture like Alexnet
Hep-2 cell		Classification	(Conv, maxpool) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times 3$$\end{document}×3, 2 FC, softmax regressor
Eye	Haemorrhage	Detection	(Conv, ReLu, maxpool) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times 5$$\end{document}×5, FC, softmax classifier
	Glaucoma	Detection	(Conv, ReLu, maxpool) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times 2$$\end{document}×2, FC with adaboost
	Retinopathy	Segmentation	(Conv, ReLu, maxpool) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times 10$$\end{document}×10, 3 FC, softmax classifier
Colon	Polyp	Classification	Any simple CNN architecture
Colon	Polyp	Detection	Ensemble of CNN
Skin	Melanoma	Detection	(Conv, ReLu, maxpool) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times 2$$\end{document}×2, FC
Liver (CT)		Classification	(Conv, maxpool) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times 2$$\end{document}×2, 3 conv, max pool, 3 FC, softmax
Abdomen (US scan)	Fetus	Localization	(Conv, ReLU, maxpool) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times 2$$\end{document}×2 conv \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times 3$$\end{document}×3, maxpool, FC \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times 3$$\end{document}×3
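
To make the shorthand used in Table 17 concrete, the following sketch traces the tensor shapes through a stack of the form (Conv, ReLU, maxpool) ×2, FC. It is plain Python with no deep learning framework. The input size, channel counts, kernel sizes and pooling windows are hypothetical values chosen for illustration; the table does not specify them, and the helper names (conv_out, pool_out, trace_shapes) are likewise assumptions.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window, stride=None):
    """Spatial output size of max pooling (stride defaults to the window)."""
    stride = window if stride is None else stride
    return (size - window) // stride + 1


def trace_shapes(input_hw, in_channels, blocks, fc_units):
    """Trace (channels, height, width) through (Conv, ReLU, maxpool) blocks and one FC layer.

    blocks is a list of (out_channels, conv_kernel, pool_window) tuples.
    """
    h, w = input_hw
    c = in_channels
    shapes = [("input", (c, h, w))]
    for i, (out_c, kernel, window) in enumerate(blocks, start=1):
        h, w, c = conv_out(h, kernel), conv_out(w, kernel), out_c
        shapes.append((f"conv{i}+relu", (c, h, w)))   # ReLU keeps the shape
        h, w = pool_out(h, window), pool_out(w, window)
        shapes.append((f"maxpool{i}", (c, h, w)))
    shapes.append(("flatten", (c * h * w,)))
    shapes.append(("fc", (fc_units,)))
    return shapes


if __name__ == "__main__":
    # Hypothetical 64x64 grayscale patch, two blocks, two output classes.
    for name, shape in trace_shapes((64, 64), 1, [(16, 5, 2), (32, 5, 2)], 2):
        print(name, shape)
```

A trace of this kind is a quick way to check that the input patch size is compatible with the number of pooling stages before an architecture from the table is implemented in a framework.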

Deep learning includes methods such as CNNs, recurrent neural networks and generative adversarial networks. A review of the methods other than CNNs and of their applications has not been included, as each of them is a research topic in its own right with a large body of ongoing work. Moreover, not all aspects of medical image understanding are covered, since the field is vast and this paper focuses on only a few important techniques.

Conclusion

The heterogeneous nature of medical anomalies in terms of shape, size, appearance, location and symptoms poses challenges for their diagnosis and prognosis. Traditional diagnosis by human specialists suffers from fatigue, oversight, high cost and sparse availability. ML-based healthcare systems need efficient feature extraction methods. However, efficient features are still unknown, and the available methods for feature extraction are not very efficient. This calls for intelligent healthcare systems that automatically extract efficient features for medical image understanding in aid of diagnosis and prognosis. The CNN is a popular technique for solving medical image understanding challenges because it extracts features efficiently and learns low-level, mid-level and high-level discriminant features of an input medical image. The literature reviewed in this paper underscores that researchers have focused their attention on the use of CNNs to overcome many challenges in medical image understanding, and many have accomplished the task successfully. The CNN methods discussed in this paper have been found to either outperform or complement the existing traditional and ML approaches in terms of accuracy, sensitivity, AUC, DSC, time taken, etc. However, their performance is often not the best due to a few factors. A summary of the number of research articles surveyed in this article is presented in Fig. 2.

Fig. 2

Bar chart summarizing the number of papers surveyed

The challenges in image understanding with respect to medical imaging have been discussed in this paper. Various image understanding tasks have been introduced. In addition, the CNN and its various components have been outlined briefly. The approaches used by researchers to address the various challenges in medical image understanding have been surveyed. CNN models have been described as black boxes, and much research is under way on analyzing and understanding the output of every layer. Since medical images are involved, an accountable and efficient prediction system is needed, one that can also explain the decisions it takes. Researchers are also working on image captioning (textual representations of the image) [29]. This will enable physicians to understand the perception of the network at both the output layer and the intermediate levels. Researchers have also tried Bayesian deep learning models, which calculate uncertainty estimates [38]. These estimates would help physicians assess the model. All these developments could further accelerate the adoption of CNN-based medical image understanding among physicians.
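
The idea of an uncertainty estimate can be illustrated with Monte Carlo dropout, one commonly used approximation to Bayesian deep learning. The sketch below is an assumption made for illustration and is not necessarily the method used in [38]. It uses NumPy only, a toy two-layer network with random placeholder weights, and a hypothetical function name (mc_dropout_predict). Dropout is kept active at prediction time, the forward pass is repeated several times, and the spread of the predicted class probabilities serves as the uncertainty.

```python
import numpy as np


def mc_dropout_predict(x, w1, w2, drop_rate=0.5, n_samples=100, seed=0):
    """Return the mean and standard deviation of class probabilities
    over n_samples stochastic forward passes with dropout left on."""
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_samples):
        hidden = np.maximum(0.0, x @ w1)                # ReLU hidden layer
        mask = rng.random(hidden.shape) >= drop_rate    # random dropout mask
        hidden = hidden * mask / (1.0 - drop_rate)      # inverted dropout scaling
        logits = hidden @ w2
        logits = logits - logits.max(axis=-1, keepdims=True)  # stable softmax
        exp = np.exp(logits)
        samples.append(exp / exp.sum(axis=-1, keepdims=True))
    samples = np.stack(samples)                         # (n_samples, batch, classes)
    return samples.mean(axis=0), samples.std(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x = rng.normal(size=(3, 4))      # 3 placeholder feature vectors
    w1 = rng.normal(size=(4, 8))     # random weights, not a trained model
    w2 = rng.normal(size=(8, 2))
    mean, std = mc_dropout_predict(x, w1, w2)
    print(mean.shape, std.shape)
```

A large standard deviation for a given input indicates that the stochastic passes disagree, which is the kind of signal a physician could use to decide whether a prediction deserves closer review.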

