Obioma Pelka1,2, Felix Nensa3, Christoph M Friedrich1,4. 1. Department of Computer Science, University of Applied Sciences and Arts Dortmund (FHDO), Dortmund, NRW Germany. 2. Faculty of Medicine, University of Duisburg-Essen, Essen, NRW, Germany. 3. Department of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Essen, NRW, Germany. 4. Institute for Medical Informatics, Biometry and Epidemiology (IMIBE), University Hospital Essen, Essen, NRW, Germany.
Abstract
The number of images taken per patient scan has rapidly increased due to advances in software, hardware and digital imaging in the medical domain. Accurate medical image annotation systems are needed, as manual annotation is impractical, time-consuming and prone to errors. This paper presents modeling approaches to automatically classify and annotate radiographs using several classification schemes, which can be further applied for automatic content-based image retrieval (CBIR) and computer-aided diagnosis (CAD). Different image preprocessing and enhancement techniques were applied to augment grayscale radiographs by virtually adding two extra layers. The Image Retrieval in Medical Applications (IRMA) code, a mono-hierarchical multi-axial code, served as a basis for this work. To extensively evaluate the image enhancement techniques, five classification schemes including the complete IRMA code were adopted. The deep convolutional neural network systems Inception-v3 and Inception-ResNet-v2 were fine-tuned, and Random Forest models with 1000 trees were trained using extracted Bag-of-Keypoints visual representations. The classification model performances were evaluated using the ImageCLEF 2009 Medical Annotation Task test set. The applied visual enhancement techniques achieved better annotation accuracy in all classification schemes.
Compared with the previous decade, ten times more medical images are taken, increasing the number of images per body region per patient to 200–1000 [1]. This increase can be traced back to two major factors: rapid advances in technology and the growing importance of medical images. Medical images contain information that is valuable to physicians. They provide a reliable source of anatomical and functional information for accurate diagnosis, effective treatment planning and research [2, 3]. Advances in software and hardware in the information technology sector, and in digital imaging in the medical domain, have made the acquisition and storage of images in hospitals possible [4].
This large image collection aids medical professionals and improves diagnosis. However, radiologists are challenged by the amount of data. They have to maintain high accuracy when interpreting radiological images while also maximizing efficiency as the number of images per body region grows. Computer-based assistance is needed for image interpretation, categorization and annotation [5], as these are beneficial for content-based image retrieval (CBIR) systems and computer-aided diagnosis (CAD) [6].
Deep learning techniques [7] have improved prediction accuracies in object detection [8], speech recognition [9] and in domain applications such as medical imaging [10, 11]. Hence, two Deep Convolutional Neural Network (dCNN) systems were adopted for image classification. To compare and evaluate the performance of the applied dCNN systems, a traditional classifier was modeled in addition.
This paper evaluates the effect of several image enhancement techniques on the prediction accuracy for radiographs. To analyze this effect, several classification schemes were derived from the ImageCLEF 2009 Medical Annotation Task dataset. All images used at the training and testing stages were preprocessed with the presented image enhancement techniques. Finally, the obtained annotation accuracies are compared and discussed.
Related work
Several approaches with Information Retrieval (IR) in the medical domain as their objective have been designed. KHRESMOI was a large EU-funded project aimed at creating a multilingual and multimodal search system for biomedical information and documentation [12]. The GNU Image-Finding Tool (GIFT), an outcome of the Viper Project, enables users to perform query-by-example (QBE) search and improves result quality with relevance feedback [13]. In [14], the Parallel Distributed Image Search Engine (ParaDISE) was proposed. This search engine enables the indexing and retrieval of images using visual and text features. Lucene Image Retrieval (LIRE), a lightweight open source library, provides image retrieval using visual features such as color and texture [15]. The IRMA code, a mono-hierarchical multi-axial classification code for medical images, was proposed in the Image Retrieval in Medical Applications (IRMA) project [16]. The IRMA code describes the modality of the image, the orientation of the image, the examined body region and the biological system investigated.
Positive results have been achieved by image preprocessing using input color enhancement techniques. In [17], superior results were obtained by using dual deep convolutional neural networks and color input enhancement [18] to detect malignancy in digital mammography images. As computer-aided assistance is needed in image interpretation [19] and improved prediction accuracies have been obtained using deep convolutional neural networks [7], the objective of this paper is to create an automatic image annotation system using deep learning and image enhancement techniques. These annotated radiographs are fundamental for medical image retrieval systems.
The aim of the presented approach is to apply several image enhancement techniques to radiographs in order to increase the overall prediction accuracy of the classification models. This is fundamental for implementing image retrieval systems.
Material
Dataset
The dataset adopted for evaluation was distributed at the ImageCLEF 2009 Medical Annotation task [20, 21]. The training set consists of 12,671 grayscale images and the official evaluation set has 1,732 grayscale images. Each radiograph in the training set is annotated with a 13-character string. Fig 1 shows two radiographs with the annotations 1121-127-732-500 and 1121-410-620-625, representing “Xray Analog Overview Image; Coronal Anteroposterior Supine; Lower Middle Quadrant; Uropoietic System” and “Xray Analog Low Beam Energy; Other Oblique Orientation; Left Breast; Reproductive Female System Breast”.
Fig 1
Example of two grayscale radiographs annotated with the 13-digit classification code.
Both images were randomly chosen from the ImageCLEF 2009 Medical Annotation Task Training Set. Republished from [21] under a CC BY license, with permission from [RWTH Aachen], original copyright [2009].
Classification schemes
Five different classification schemes are used for evaluation. They were derived by using the complete IRMA code as well as by splitting the code into its four axes.
IRMA
The 13-digit code used for annotation is known as the IRMA code and was proposed in [16]. The IRMA coding system is hierarchical and consists of four axes: the technical code (T) for the image modality, the directional code (D) for body orientations, the anatomical code (A) referring to the body region examined, and the biological code (B) for the biological system examined [16]. The code results in a string of 13 characters, i.e. TTTT-DDD-AAA-BBB, which can be seen in Fig 1. The IRMA classification scheme contains altogether 197 individual classes, which represent the total distinct combinations of all four axes.
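As an illustration of how the four axis schemes can be derived from a full annotation, the following minimal sketch splits a TTTT-DDD-AAA-BBB string into its axes. The names `IrmaCode` and `parse_irma` are hypothetical and are not taken from the paper or from any IRMA tooling.

```python
from typing import NamedTuple


class IrmaCode(NamedTuple):
    """The four axes of one IRMA annotation (hypothetical helper type)."""
    technical: str    # T axis, 4 characters
    directional: str  # D axis, 3 characters
    anatomical: str   # A axis, 3 characters
    biological: str   # B axis, 3 characters


def parse_irma(code: str) -> IrmaCode:
    """Split a 'TTTT-DDD-AAA-BBB' string into its four axes."""
    parts = code.strip().split("-")
    if [len(part) for part in parts] != [4, 3, 3, 3]:
        raise ValueError(f"not a TTTT-DDD-AAA-BBB code: {code!r}")
    return IrmaCode(*parts)


# One of the two annotations quoted in the Dataset section.
example = parse_irma("1121-127-732-500")
print(example.technical, example.directional,
      example.anatomical, example.biological)
```

Each field of the parsed tuple can then serve as the class label for the corresponding axis scheme, while the unsplit string is the label for the complete IRMA scheme.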
(T) technical scheme
The (T) technical classification scheme is the technical axis of the IRMA code. It consists of a 4-character string and denotes physical source, modality position, techniques and sub-techniques [16]. The T-scheme has 6 classes. A random excerpt of radiographs from the training set annotated with the T-scheme is shown in Fig 2.
Fig 2
Examples of radiographs annotated with two classes from the T-scheme.
(A) shows three images belonging to class ‘1124’ representing ‘Xray; Plain Radiology; Analog; Low Beam Energy’ and (B) displays three images belonging to class ‘1123’ representing ‘Xray; Plain Radiology; Analog; High Beam Energy’. All radiographs were randomly chosen from the ImageCLEF 2009 Medical Annotation Task Training Set. Republished from [21] under a CC BY license, with permission from [RWTH Aachen], original copyright [2009].
(D) directional scheme
The (D) directional classification scheme is a 3-character string and denotes the orientation plane of the radiographs, such as coronal, sagittal and transversal [16]. This scheme is made up of 34 classes. A random excerpt of radiographs from the training set annotated with the D-scheme is shown in Fig 3.
Fig 3
Examples of radiographs annotated with two classes from the D-scheme.
(A) shows three images belonging to class ‘125’ representing ‘Coronal; Anteroposterior; Upright’ and (B) displays three images belonging to class ‘228’ representing ‘Sagittal; Lateral, left-right; Inclination’. All radiographs were randomly chosen from the ImageCLEF 2009 Medical Annotation Task Training Set. Republished from [21] under a CC BY license, with permission from [RWTH Aachen], original copyright [2009].
(A) anatomical scheme
The (A) anatomical classification scheme stands for the complete coding of the anatomical regions present in the human body. The A-scheme defines nine major body regions, where each region has 2 hierarchical sub-regions [16]. In total, the anatomical scheme has 97 individual classes and each class is represented by a 3-character string. A random excerpt of radiographs from the training set annotated with the A-scheme is shown in Fig 4.
Fig 4
Examples of radiographs annotated with two classes from the A-scheme.
(A) shows three images each belonging to class ‘732’ representing ‘Abdomen; Lower abdomen; Lower middle quadrant’ and (B) displays three images belonging to ‘213’ representing ‘Cranium; Facial cranium; Nose area’. All radiographs were randomly chosen from the ImageCLEF 2009 Medical Annotation Task Training Set. Republished from [21] under a CC BY license, with permission from [RWTH Aachen], original copyright [2009].
(B) biological scheme
The (B) biological classification code categorizes the organic system scanned into ten major parts [16]. The B-scheme contains 11 classes and is represented by a 3-character string. A random excerpt of radiographs from the training set annotated with the B-scheme is shown in Fig 5.
Fig 5
Examples of radiographs annotated with two classes from the B-scheme.
(A) shows three images belonging to class ‘443’ representing ‘Gastrointestinal system; Small intestine; Ileum’ and (B) displays three images belonging to class ‘512’ representing ‘Uropoietic system; Kidney; Renal pelvis’. All radiographs were randomly chosen from the ImageCLEF 2009 Medical Annotation Task Training Set. Republished from [21] under a CC BY license, with permission from [RWTH Aachen], original copyright [2009].
Image enhancement
In this section, the three experiments adopted for enhancing visual representation before the classification and annotation of the radiographs are explained.
Image layering
For image recognition tasks, convolutional neural networks trained on large datasets produce favorable results. Considering the number of images available in the ImageCLEF 2009 Medical Annotation Task, Transfer Learning with pre-trained neural networks, such as Inception-v3 [22] and Inception-ResNet-v2 [23], was chosen. These pre-trained Deep Convolutional Neural Network (dCNN) models were designed to extract, amongst other features, color information from the images [24, 25]. However, the radiographs distributed for the ImageCLEF 2009 Medical Annotation Task are grayscale images and have a single color channel with values in [0, 255]. To fully utilize the capabilities of dCNNs, each radiograph is augmented with two extra color layers, completing the RGB frame with the enhanced slices.
The first extra layer was obtained using the image processing technique Contrast Limited Adaptive Histogram Equalization (CLAHE) [18]. CLAHE is a contrast enhancement method, modified from Adaptive Histogram Equalization (AHE). It is designed to be broadly applicable and has demonstrated effectiveness, especially for medical images [26]. Fig 6 displays the original radiograph and the corresponding output image after CLAHE was performed. The CLAHE output images were obtained using the following parameters:
Number of tiles: [8, 8]
Contrast enhancement limit: 0.01
Number of histogram bins: 256
Range of output data: Full
Desired histogram shape: Uniform
Distribution parameter: 0.4
Fig 6
Medical image before and after Contrast Limited Adaptive Histogram Equalization (CLAHE) was performed.
The radiograph was randomly chosen from the ImageCLEF 2009 Medical Annotation Task Training Set. Republished from [21] under a CC BY license, with permission from [RWTH Aachen], original copyright [2009].
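To make the role of the contrast enhancement limit more concrete, the following sketch equalizes a single 8-bit tile with a clipped histogram, which is the building block of CLAHE. It is not the authors' implementation and not a full CLAHE: the complete method also splits the image into tiles (8 x 8 here) and blends the mappings of neighbouring tiles by interpolation. The function name `equalize_tile` and the interpretation of the limit as a fraction of the tile's pixels are assumptions made for illustration.

```python
import numpy as np


def equalize_tile(tile, clip_limit=0.01):
    """Contrast-limited histogram equalization of one 8-bit tile.

    clip_limit is interpreted here (an assumption) as the largest
    fraction of the tile's pixels that a single histogram bin may hold.
    """
    tile = np.asarray(tile, dtype=np.uint8)
    hist = np.bincount(tile.ravel(), minlength=256).astype(np.float64)

    # Clip every bin at the limit and spread the excess over all bins.
    limit = max(1.0, clip_limit * tile.size)
    excess = np.clip(hist - limit, 0.0, None).sum()
    hist = np.minimum(hist, limit) + excess / 256

    # The normalized cumulative histogram is the gray-level mapping.
    cdf = np.cumsum(hist) / hist.sum()
    lookup = np.round(cdf * 255.0).astype(np.uint8)
    return lookup[tile]
```

For real use, a library implementation such as scikit-image's `exposure.equalize_adapthist` or OpenCV's `createCLAHE` would be the more practical choice; their parameters do not necessarily map one-to-one onto the values listed above.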
The second layer was generated by applying the Non Local Means (NL-MEANS) preprocessing method. This is a digital image denoising method based on a non-local averaging of all pixels in an image [27]. The effect of applying NL-MEANS to a randomly chosen radiograph from the ImageCLEF 2009 Medical Annotation Task Training Set is shown in Fig 7. The NL-MEANS output images were obtained using the following parameters:
Fig 7
Medical image before and after applying the Non Local Means (NL-MEANS) preprocessing method.
The radiograph was randomly chosen from the ImageCLEF 2009 Medical Annotation Task Training Set. Republished from [21] under a CC BY license, with permission from [RWTH Aachen], original copyright [2009].
The augmented RGB image is obtained by adding the two layers to the original grayscale radiograph, as shown in Fig 8.
Fig 8
Grayscale radiograph enhanced by adding two extra color layers to obtain an RGB-channeled medical image.
The radiographs were randomly chosen from the ImageCLEF 2009 Medical Annotation Task Training Set. Republished from [21] under a CC BY license, with permission from [RWTH Aachen], original copyright [2009].
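The layering step described in this subsection can be sketched as stacking the original radiograph with its two enhanced versions along a new channel axis. The function `layer_radiograph`, the channel order (original, CLAHE, NL-MEANS) and the idea of passing the two enhancement steps as callables are assumptions for illustration; the source does not state the order in which the layers are arranged.

```python
import numpy as np


def layer_radiograph(gray, clahe_fn, nlmeans_fn):
    """Build an H x W x 3 image from one grayscale radiograph.

    clahe_fn and nlmeans_fn are caller-supplied functions that map an
    8-bit grayscale array to an enhanced 8-bit array of the same shape.
    """
    gray = np.asarray(gray, dtype=np.uint8)
    if gray.ndim != 2:
        raise ValueError("expected a single-channel (2-D) image")
    clahe_layer = np.asarray(clahe_fn(gray), dtype=np.uint8)
    nlmeans_layer = np.asarray(nlmeans_fn(gray), dtype=np.uint8)
    if clahe_layer.shape != gray.shape or nlmeans_layer.shape != gray.shape:
        raise ValueError("enhanced layers must keep the input shape")
    # Assumed channel order: original, CLAHE, NL-MEANS.
    return np.stack([gray, clahe_layer, nlmeans_layer], axis=-1)


# Usage with trivial stand-ins for the two enhancement steps.
gray = np.arange(12, dtype=np.uint8).reshape(3, 4)
rgb = layer_radiograph(gray, lambda img: 255 - img, lambda img: img // 2)
print(rgb.shape)
```

In practice the two callables would wrap real CLAHE and non-local means implementations, for example from scikit-image or OpenCV, with outputs rescaled to the 8-bit range where necessary.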
Image padding
The radiographs distributed for the ImageCLEF 2009 Medical Annotation Task vary in height and width. Radiographs of the upper and lower extremities are usually narrow, while head scans are wider than they are tall. To obtain a uniform size over all images, a fixed size was defined. All radiographs in the dataset were brought to [512 x 512] by padding the input images, as can be seen in Fig 9. The images are padded with repetitions of themselves; alternatives are padding with a constant value or noise, as well as image squashing.
Fig 9
Resized radiographs by padding input images to the defined width and height size [512 x 512].
(A) shows horizontal and (B) vertical padding. The radiographs were randomly chosen from the ImageCLEF 2009 Medical Annotation Task Training Set. Modified from [21] under a CC BY license, with permission from [RWTH Aachen], original copyright [2009].
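Padding by repetition can be sketched as tiling the radiograph until it covers the target canvas and then cropping to the defined size. The function name `pad_by_repetition` and the placement of the original image in the top-left corner are assumptions; the source does not state where the original content sits on the canvas.

```python
import math

import numpy as np


def pad_by_repetition(image, size=512):
    """Fill a size x size canvas with repetitions of a grayscale image.

    The original image stays in the top-left corner (an assumption).
    """
    image = np.asarray(image)
    height, width = image.shape
    if height > size or width > size:
        raise ValueError("image is larger than the target size")
    # Repeat often enough to cover the canvas, then crop the surplus.
    reps = (math.ceil(size / height), math.ceil(size / width))
    return np.tile(image, reps)[:size, :size]


# Small example: a narrow 3 x 2 image padded to 4 x 4.
narrow = np.arange(6).reshape(3, 2)
padded = pad_by_repetition(narrow, size=4)
print(padded)
```

With `size=512`, a narrow extremity radiograph is repeated horizontally and a wide head scan vertically, which corresponds to the two cases shown in Fig 9.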
Both image layering and padding, as explained in subsections Image Layering and Image Padding, are applied successively; the output image is shown in Fig 10.
Fig 10
Output image after successively applying the image padding and image layering enhancement techniques.
The radiographs were randomly chosen from the ImageCLEF 2009 Medical Annotation Task Training Set. Modified from [21] under a CC BY license, with permission from [RWTH Aachen], original copyright [2009].
Classification
TensorFlow
For the dCNNs, TensorFlow-Slim (TF-Slim), a lightweight package for defining, training and evaluating models in TensorFlow [28] with pre-trained models, was adopted. To optimize prediction performance, the models were fine-tuned in the second training phase with all weights trainable and with the best hyper-parameter configuration.
Inception-v3
The pre-trained model Inception-v3 [22], which was trained for the ImageNet [24] Large Visual Recognition Challenge 2012 [29], was used to fine-tune the classification model. To optimize classification accuracy, a grid search was used to obtain the best hyper-parameter configuration. For the Inception-v3 classification models, the following hyper-parameter configuration was applied:
Optimizer: Root Mean Square Propagation (rmsprop)
Number of epochs: [1st training phase = 2.5; 2nd training phase = 25]
Number of steps: [1st training phase = 1,000; 2nd training phase = 10,000]
Batch size: 32
Learning rate: 0.01
Learning rate decay type: [1st training phase = fixed; 2nd training phase = exponential]
Weight decay: 0.00004
Model name: Inception-v3
For all other parameters not mentioned above, the default values as proposed in TF-Slim [28] were adopted.
Inception-ResNet-v2
The pre-trained model Inception-ResNet-v2 [23], which is a variation of Inception-v3 using the ideas presented in [30, 31], was used to fine-tune the classification model. For the Inception-ResNet-v2 classification models, the following hyper-parameter configuration was applied:
Optimizer: Root Mean Square Propagation (rmsprop)
Number of epochs: [1st training phase = 2.5; 2nd training phase = 25]
Number of steps: [1st training phase = 1,000; 2nd training phase = 10,000]
Batch size: 32
Learning rate: 0.01
Learning rate decay type: [1st training phase = fixed; 2nd training phase = exponential]
Weight decay: 0.00004
Model name: Inception-ResNet-v2
For all other parameters not mentioned above, the default values as proposed in TF-Slim [28] were adopted.
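The listed numbers of steps and epochs are related through the batch size and the size of the training set. The short calculation below shows this relation for the batch size of 32 listed for Inception-ResNet-v2 and the 12,671 training images given in the Dataset section; the function name `epochs_for` is hypothetical.

```python
TRAINING_IMAGES = 12_671  # training set size from the Dataset section


def epochs_for(steps: int, batch_size: int,
               n_images: int = TRAINING_IMAGES) -> float:
    """Number of passes over the training set made by `steps` batches."""
    return steps * batch_size / n_images


# Roughly 2.5 epochs for phase 1 and roughly 25 epochs for phase 2.
for phase, steps in ((1, 1_000), (2, 10_000)):
    print(f"training phase {phase}: {epochs_for(steps, 32):.2f} epochs")
```

The result agrees with the 2.5 and 25 epochs stated for the two training phases.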
Random Forest
Random Forest (RF) [32] models with 1,000 deep trees were trained to compare accuracy amongst the classification models. These RF models were trained using visual image representations obtained with the Bag-of-Keypoints (BoK) [33] approach. For whole-image classification tasks, the BoK approach has achieved high classification accuracy [34, 35]. BoK is based on vector quantization of affine invariant descriptors of image patches [33]. Simplicity and invariance to affine transformations are advantages of this approach.
All functions applied to render the visual models are from the VLFEAT library [36]. Dense SIFT (dSIFT) [37] descriptors were extracted uniformly at several resolutions with an interval of 4 pixels using the VL-PHOW function. Computational time was reduced by computing k-means clustering with Approximate Nearest Neighbor (ANN) [38] on randomly chosen descriptors using the VL-KMEANS function. This partitions the observations into k clusters so that the within-cluster sum of squares is minimized.
A maximum of 20 iterations was defined to allow the k-means algorithm to converge, and the cluster centers were initialized using random data points [39]. With k = 1,000, a codebook containing 1,000 keypoints was generated. Using the VL-KDTREEBUILD function, the codebook was further optimized by building a kd-tree with the L2 distance metric for quick nearest neighbor lookup. The parameters used to tune BoK and RF are:
Codebook size: 1,000
Number of descriptors extracted: 1,000
Visual representation size: 4,000 (2x2 grid)
Feature size reduction: 4,000 to 100 (Principal Component Analysis)
Number of trees (RF): 1,000
Ensemble method (RF): Bag
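The encoding step of the Bag-of-Keypoints approach can be sketched as follows: every local descriptor is assigned to its nearest codeword, and one word histogram is built per cell of the 2x2 grid, so that a codebook of 1,000 words gives a 4,000-dimensional vector. This is a NumPy illustration and not the VLFEAT pipeline used in the paper; the function name `bok_histogram`, the brute-force nearest-neighbour search (instead of a kd-tree) and the absence of any histogram normalization are assumptions.

```python
import numpy as np


def bok_histogram(descriptors, positions, image_shape, codebook, grid=(2, 2)):
    """Encode local descriptors as concatenated per-cell word histograms.

    descriptors: (n, d) array of local descriptors
    positions:   (n, 2) integer array of (row, col) keypoint locations
    codebook:    (k, d) array of cluster centers
    """
    descriptors = np.asarray(descriptors, dtype=np.float64)
    codebook = np.asarray(codebook, dtype=np.float64)
    positions = np.asarray(positions, dtype=np.int64)
    k = codebook.shape[0]

    # Squared L2 distances between all descriptors and all codewords.
    dists = ((descriptors ** 2).sum(axis=1)[:, None]
             - 2.0 * descriptors @ codebook.T
             + (codebook ** 2).sum(axis=1)[None, :])
    words = dists.argmin(axis=1)

    # Index of the grid cell that contains each keypoint.
    rows = np.minimum(positions[:, 0] * grid[0] // image_shape[0], grid[0] - 1)
    cols = np.minimum(positions[:, 1] * grid[1] // image_shape[1], grid[1] - 1)
    cells = rows * grid[1] + cols

    # Count word occurrences separately for every cell.
    hist = np.zeros((grid[0] * grid[1], k))
    np.add.at(hist, (cells, words), 1)
    return hist.ravel()


# Toy example: 2 codewords, 3 descriptors on a 4 x 4 image.
toy = bok_histogram(
    descriptors=[[1, 0], [9, 10], [0, 1]],
    positions=[[0, 0], [3, 3], [3, 0]],
    image_shape=(4, 4),
    codebook=[[0, 0], [10, 10]],
)
print(toy)
```

The resulting vector would then be reduced with Principal Component Analysis and passed to the Random Forest, as listed in the parameters above.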
Results
Image class prediction was computed using five classification schemes: the complete IRMA code and its four axes separately. The performances of the modeled classifiers on the different classification schemes are listed in Tables 1–3 for Random Forest, Inception-v3 and Inception-ResNet-v2, respectively.
Table 1
Prediction performance of the Random Forest image classification model on the various image input types.
The best accuracy per classification scheme is marked with an asterisk (*). Evaluation was calculated on the ImageCLEF 2009 Medical Annotation Task Test Set.
Input Image           | T-Code  | D-Code  | A-Code  | B-Code  | IRMA
Image Padding         | 97.98%  | 63.57%* | 53.35%  | 92.21%  | 48.67%
Image Padding/Layered | 97.52%  | 62.47%  | 51.15%  | 92.32%* | 47.98%
Image Layered         | 98.09%* | 63.05%  | 54.39%* | 91.40%  | 48.96%*
CLAHE Image           | 97.69%  | 60.68%  | 50.46%  | 91.97%  | 45.84%
NLMEANS Image         | 97.58%  | 61.89%  | 49.42%  | 91.22%  | 45.09%
Original Image        | 97.00%  | 61.64%  | 51.15%  | 90.76%  | 47.11%
Table 3
Prediction performance of the Inception-ResNet-v2 image classification model on the various image input types.
The best accuracy per classification scheme is marked with an asterisk (*). Evaluation was calculated on the ImageCLEF 2009 Medical Annotation Task Test Set.
Input Image           | T-Code  | D-Code  | A-Code  | B-Code  | IRMA
Image Padding         | 99.22%  | 78.50%* | 57.89%  | 96.78%* | 51.22%*
Image Padding/Layered | 99.28%* | 75.72%  | 59.83%* | 89.22%  | 49.83%
Image Layered         | 98.67%  | 77.61%  | 53.44%  | 92.83%  | 49.31%
CLAHE Image           | 98.06%  | 76.11%  | 51.22%  | 95.56%  | 43.33%
NLMEANS Image         | 97.00%  | 70.89%  | 49.44%  | 94.61%  | 42.00%
Original Image        | 97.33%  | 73.88%  | 49.94%  | 94.67%  | 42.67%
Evaluation was performed on the official test set, and all models were trained with the complete training set distributed at the ImageCLEF 2009 Medical Annotation Task.
The best prediction performances per classifier model and image input on the different classification schemes are displayed in Tables 4–8 for easier comparison. Evaluation was calculated using the ImageCLEF 2009 Medical Annotation Task test set.
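The selection of the best input type per classification scheme can be reproduced directly from the per-model tables. The sketch below does this for the Random Forest accuracies of Table 1; the names `SCHEMES`, `RANDOM_FOREST` and `best_per_scheme` are hypothetical.

```python
SCHEMES = ("T", "D", "A", "B", "IRMA")

# Random Forest accuracies in percent, copied from Table 1.
RANDOM_FOREST = {
    "Image Padding":         (97.98, 63.57, 53.35, 92.21, 48.67),
    "Image Padding/Layered": (97.52, 62.47, 51.15, 92.32, 47.98),
    "Image Layered":         (98.09, 63.05, 54.39, 91.40, 48.96),
    "CLAHE Image":           (97.69, 60.68, 50.46, 91.97, 45.84),
    "NLMEANS Image":         (97.58, 61.89, 49.42, 91.22, 45.09),
    "Original Image":        (97.00, 61.64, 51.15, 90.76, 47.11),
}


def best_per_scheme(table):
    """Return {scheme: (input type, accuracy)} with the highest accuracy."""
    best = {}
    for index, scheme in enumerate(SCHEMES):
        input_type = max(table, key=lambda name: table[name][index])
        best[scheme] = (input_type, table[input_type][index])
    return best


for scheme, (input_type, accuracy) in best_per_scheme(RANDOM_FOREST).items():
    print(f"{scheme}: {input_type} ({accuracy:.2f}%)")
```

Applying the same function to the Inception-v3 and Inception-ResNet-v2 tables gives the remaining rows of the per-scheme summaries.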
Table 4
Best prediction performances for the applied classification models.
The classification scheme is (T) technical axis and contains 6 classes.
Input Image           | Classifier          | Performance
Image Layered         | Random Forest       | 98.09%
Image Padding         | Inception-v3        | 99.21%
Image Layered/Padding | Inception-ResNet-v2 | 99.28%
Table 8
Best prediction performances for the applied classification models.
The classification scheme is the complete IRMA code, which has 193 classes.
Input Image           | Classifier          | Performance
Image Layered         | Random Forest       | 48.96%
Image Padding         | Inception-v3        | 47.00%
Image Padding         | Inception-ResNet-v2 | 51.22%
Discussion
As can be seen from all result tables, better prediction accuracies are obtained with the enhanced radiographs. This is observed for all three classification models and all five schemes adopted. However, no single enhancement technique outperforms the rest; the best technique varies with the classification scheme used, which can be explained by the no free lunch theorem [40].
Certain enhancement techniques perform better in combination with certain classification models. Image Layered achieves the best results when combined with Bag-of-Keypoints and Random Forest. Image Padding performs best with models trained with the deep learning system Inception-v3. For models trained with Inception-ResNet-v2, Image Layered/Padding leads to better results. The best prediction performance was obtained with the following model and enhancement technique combinations:
(T) technical: Image Padding and Layering with Inception-ResNet-v2
(D) directional: Image Padding and Layering with Inception-v3
(A) anatomical: Image Padding with Inception-v3
(B) biological: Image Padding with Inception-ResNet-v2
IRMA: Image Padding with Inception-ResNet-v2
As the number of classes increases, the prediction accuracy decreases. The anatomical and IRMA schemes are class imbalanced, with few or no images representing some classes. Hence, the uncertainty of the models is high for these images. The prediction results for the IRMA scheme are the lowest, as it contains the highest number of classes with sparse representation. However, a hierarchical classification can be used to tackle this task, as the models for the individual axes perform well.
Following the shown results, a more robust and certain model can be obtained with a balanced class distribution of the images in the training set. An ensemble of models trained with several image enhancement techniques should be applied with majority vote, to achieve the optimal training model and enhancement technique combination.
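The majority vote suggested above can be sketched as follows, assuming that each model trained on a different enhancement technique outputs one class label per test image. The function name `majority_vote` and the tie-breaking rule (the label predicted by the model listed first wins) are assumptions; the source does not specify how ties would be resolved.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per image.

    predictions: list of equally long label lists, one list per model.
    On a tie, the label that occurs first among the models is returned.
    """
    if not predictions:
        raise ValueError("at least one model is required")
    if len({len(labels) for labels in predictions}) != 1:
        raise ValueError("all models must predict the same images")
    voted = []
    for labels in zip(*predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Three hypothetical models predicting D-scheme classes for two images.
models = [
    ["127", "125"],  # e.g. model trained on padded images
    ["127", "228"],  # e.g. model trained on layered images
    ["125", "228"],  # e.g. model trained on padded and layered images
]
print(majority_vote(models))
```

A weighted vote based on each model's validation accuracy would be a natural variation, but it is not described in the source.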
Conclusion
In this paper, grayscale radiograph enhancement methods aiming to achieve better classification and annotation performance are presented. Two extra color layers are added to simulate RGB-channeled images, as Deep Convolutional Neural Networks (dCNN) use color information for training. Due to variations in width and height, the radiographs are padded with cropped patches to fill up the defined size [512 x 512].
The dCNN systems Inception-v3 and Inception-ResNet-v2 were applied as image classification models. The traditional machine learning algorithm Random Forest (RF), trained with Bag-of-Keypoints visual representations, was adopted for performance comparison. Five classification schemes, each having a different number of classes and categorization focus, were applied to evaluate the image enhancement techniques.
This work shows that enhancing the radiographs before training and classification yields positive results. This is observed for the models trained with the deep learning systems Inception-v3 and Inception-ResNet-v2, as well as for the traditional combination of Bag-of-Keypoints and Random Forest. For all five classification schemes, better prediction accuracies were achieved when the enhanced radiographs were used.
Future work on annotating radiographs can build on multi-modal image representation and hierarchical class annotation, as positive results have been presented in recent approaches.
Table 2
Prediction performance of the Inception-v3 image classification model on the various image input types.
The best accuracy per classification scheme is marked with an asterisk (*). Evaluation was calculated on the ImageCLEF 2009 Medical Annotation Task Test Set.
Input Image           | T-Code  | D-Code  | A-Code  | B-Code  | IRMA
Image Padding         | 99.21%* | 76.61%  | 60.33%* | 96.39%* | 46.39%
Image Padding/Layered | 99.06%  | 79.11%* | 56.67%  | 95.78%  | 47.00%*
Image Layered         | 98.83%  | 74.33%  | 50.22%  | 93.78%  | 39.77%
CLAHE Image           | 98.61%  | 71.72%  | 45.77%  | 93.61%  | 39.44%
NLMEANS Image         | 96.78%  | 69.33%  | 43.06%  | 95.22%  | 40.44%
Original Image        | 97.89%  | 70.00%  | 44.33%  | 91.50%  | 39.33%
Table 5
Best prediction performances for the applied classification models.
The classification scheme is (D) directional axis and contains 34 classes.
Input Image           | Classifier          | Performance
Image Padding         | Random Forest       | 63.57%
Image Layered/Padding | Inception-v3        | 79.11%
Image Padding         | Inception-ResNet-v2 | 78.50%
Table 6
Best prediction performances for the applied classification models.
The classification scheme is (A) anatomical axis and contains 97 classes.
Input Image           | Classifier          | Performance
Image Layered         | Random Forest       | 54.39%
Image Padding         | Inception-v3        | 60.33%
Image Layered/Padding | Inception-ResNet-v2 | 59.83%
Table 7
Best prediction performances for the applied classification models.
The classification scheme is (B) biological system axis and contains 11 classes.