Elyas Rashno (1), Ahmad Akbari (1), Babak Nasersharif (2)
(1) Department of Computer Engineering, Iran University of Science and Technology, Narmak, Tehran, 1684613114 Iran.
(2) Department of Computer Engineering, K.N. Toosi University of Technology, Tehran, Iran.
Abstract
The performance of convolutional neural networks is degraded by noisy data, especially in the test phase. To address this challenge, a new convolutional neural network structure with data indeterminacy handling in the neutrosophic (NS) domain, named Neutrosophic Convolutional Neural Network (NCNN), is proposed for image classification. For this task, images are first mapped from the pixel domain to three sets, true (T), indeterminacy (I) and false (F), in the NS domain by the proposed method. Then, NCNN is constructed with two parallel paths, one with T as input and the other with I, followed by an appropriate combination of the paths to generate the final output. The two paths are trained simultaneously, and the network weights are updated using the back-propagation algorithm. The effectiveness of NCNN in handling noisy data is analyzed mathematically in terms of the weight update rule. The proposed two-path NS idea is applied to two basic models, CNN and VGG-Net, to construct NCNN and NVGG-Net, respectively. The proposed method has been evaluated on the MNIST, CIFAR-10 and CIFAR-100 datasets contaminated with 20 levels of Gaussian noise. Results show that the two-path NCNN outperforms CNN by 5.11% and 2.21% in 5 pairs (training, test) with different levels of noise on the MNIST and CIFAR-10 datasets, respectively. Finally, NVGG-Net increases the accuracy by 3.09% and 2.57% compared to VGG-Net on the CIFAR-10 and CIFAR-100 datasets, respectively.
Although crisp sets with membership degrees of 0 and 1 were extended to fuzzy sets with continuous membership degrees, the uncertainty of each data point is not considered or described in the classical fuzzy set. In each application, uncertainty is a concept that indicates how the membership degree is described for each set. The traditional fuzzy set describes the membership degree with a real number [1]. In this situation, the uncertainty of the membership degree itself is not considered or handled [2]. In systems such as expert systems, information fusion, and belief systems, the truth membership supported by the evidence cannot, on its own, model and solve the related problems, so the classical fuzzy set cannot solve these problems either [2]. To achieve a complete model, falsity and indeterminacy memberships should also be considered. The neutrosophic (NS) set is an extension of the fuzzy set that attempts to solve this problem by considering truth, indeterminacy, and falsity memberships, where the truth and indeterminacy memberships can be considered independently [3].
Literature review
Neutrosophy theory, proposed by Smarandache in 1995 [4], is a branch of philosophy that studies the nature and scope of neutralities and their interactions with different ideational spectra; it is the basis of neutrosophic logic and sets [5, 6]. This theory was first applied to image processing by Guo et al. [3] and has since been used successfully in other domains, including image segmentation [7-10], image thresholding [11], image edge detection [12], retinal image analysis [13-21], liver image analysis [22, 23], breast ultrasound image analysis [24], data classification [25], uncertainty handling [26, 27], data clustering [28-30], content-based image retrieval [31, 32] and skeletal muscle analysis [33, 34].

Today, deep learning methods are widely used for feature extraction, feature compression, and classification. Over the past few years, convolutional neural networks (CNNs), one family of deep learning approaches, have significantly improved classification accuracy and reduced processing costs [35, 36]. CNNs use variations of the multilayer perceptron, such as shift-invariant or space-invariant artificial neural networks (SIANN), to analyze visual imagery automatically [37]. Different CNN architectures have been proposed as reference models for image processing and machine vision tasks. An eight-layer CNN named AlexNet has been proposed, in which CNN features were visualized [38]. In addition, a very deep convolutional network (VGG-Net) [39] and GoogLeNet [40] have been proposed by Oxford University's Visual Geometry Group and Google Inc., respectively. CNNs have been applied in many applications such as fluid segmentation [41-43], coronavirus disease (COVID-19) detection [44], and vegetation remote sensing [45].

Any prediction or classification by a CNN may be associated with a degree of uncertainty.
There are two common types of uncertainty, which stem from uncertainty in the training dataset and uncertainty in the model structure [46-49]. If uncertainty is not handled, it may lead to disastrous consequences. For example, in May 2016, the perception system of an assisted driving system confused the white side of a trailer with the bright sky, which caused the first fatality from such a system [50]. Also, an image classification system erroneously identified two African Americans as gorillas, raising concerns about racial discrimination [51]. In these examples, if the systems had been able to handle errors by assigning a high level of uncertainty to erroneous predictions, better decisions could have been made and disaster could have been avoided.

Uncertainty handling has been discussed for CNN structures. Uncertainty quantification in vision tasks was modeled in CNNs by Kendall and Gal [52]. Considering uncertainty in the convolution layer was proposed in [53-55]. Also, uncertainty was processed in FC layers and convolution layers, e.g., through network morphism, by Wei Ma and Jun Lu [56, 57]. Considering uncertainty in CNN layers has drawbacks such as requiring more memory and more computation. Some state-of-the-art methods use Monte Carlo dropout sampling to extract model uncertainty in CNNs [58].

CNNs have also been used in the NS domain in the literature. In [59], a model for skin dermoscopic images was proposed based on NS multiple deep CNNs. Then, the NS similarity score (NSS) was applied to determine the number of training sets for each epoch during the training process. In [60], a hybrid method using NS and CNNs was introduced for classifying tumor regions in MRI images as benign or malignant. For this task, MRI images were segmented using the neutrosophic set expert maximum fuzzy sure entropy (NS-EMFSE) approach. Feature vectors were extracted from the segmented brain images by CNNs and classified using SVM and KNN classifiers.
The effects of NS on the deep transfer learning models AlexNet, GoogLeNet, and ResNet18 were investigated for categorizing X-ray images as COVID-19, normal, bacterial pneumonia, and viral pneumonia [61]. Breast tumor segmentation was formulated as a classification problem in the NS domain for removing speckle noise and enhancing image contrast. The similarity set score and homogeneity value for each pixel were calculated in the NS domain, and then seed regions were selected by an adaptive Otsu-based thresholding method and morphology operations. Finally, a deep CNN based on the VGG-16 network was applied for false-positive rate reduction [62].

The scheme proposed in this research, named the neutrosophic convolutional neural network (NCNN), is a CNN model with uncertainty handling in the NS domain for image classification. This model can be easily adapted for other applications such as speech processing [63]. In the first step, images are mapped to three sets, True (T), Indeterminacy (I) and False (F), in the NS domain. The proposed definition of data indeterminacy highlights noisy pixels more strongly than the conventional NS transformation. Then, the T and I sets are presented as inputs to the proposed two-path NCNN, and the outputs of the two paths are combined to obtain the final output. In fact, one path is trained with the T set and the other with the I set. It is shown that when clean and noisy data points with the same label are fed to NCNN, the difference between the weight updates for these training samples is not significant. Also, back-propagation equations are computed for each path to show how it differs from a conventional one-path CNN, and it is shown that NCNN is more robust against noisy images than a conventional CNN.

The key differences between the proposed NCNN and other NS-based CNNs are as follows. Unlike [59-62], which use NS for training set determination, NCNN uses the indeterminacy set in the NS domain as a part of the CNN training process.
This means that the information provided by this set affects the weights of neurons in the back-propagation process and makes CNNs more robust against noisy samples. It is worth mentioning that path combination in NCNN differs from the conventional combination of CNNs, in which each CNN is considered an independent path. In NCNN, the two paths are trained simultaneously, and each one affects the other when weights are updated. As the experiments and the mathematical analysis will show, NCNN handles noisy data with higher indeterminacy, and it does not overfit in the training phase as quickly as conventional CNNs.

The rest of this paper is organized as follows: Sect. 2 presents a review of NS sets. The proposed method is discussed in Sect. 3. Experimental results are reported in Sect. 4. Finally, this work is discussed and concluded in Sects. 5 and 6, respectively.
Neutrosophic set
NS is an extension of the fuzzy set in which data indeterminacy is considered in addition to the truth membership degree. In fact, for each data point, the confidence of the assigned truth membership degree is also considered. Suppose that two observers try to detect a flower in a picture. The first one assigns a truth membership degree of 0.8 to the existence of a flower, while the second one assigns 0.5. At a higher level, the confidence in each observer's decision is also considered. Therefore, if these confidences for the first and second observers are 0.3 and 0.9, respectively, the first observer affects the final decision with more weight. In NS, the confidence level for each observer is considered in the indeterminacy set. These are the main differences between NS and fuzzy sets. Generally, considering a set A in NS, each member x in A is denoted by three real subsets, true, false and indeterminacy, in the interval [0, 1], referred to as T, F and I, respectively. Each element is expressed as x(t, i, f), which means that it is t% true, i% indeterminate, and f% false. In each application, domain experts propose the concept behind true, false and indeterminacy.

To use NS in the image processing domain, each pixel is considered a data point and should be mapped to NS sets. The first method for this mapping was proposed by Guo [3]. Mapping methods depend entirely on the image processing application. An image g in the pixel domain is represented by three subsets, T, I and F, in the NS domain. Therefore, pixel p(i, j) in g is denoted by $P_{NS}(t, i, f)$. T, I and F indicate the white, noise and black pixel sets, respectively. $P_{NS}(t, i, f)$ provides useful information about the white, noisy and black percentages of the pixel: it is t% a white pixel, i% a noisy pixel and f% a black pixel.
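T, I and F are computed as follows [2, 3]:

$$T(i,j) = \frac{\bar{g}(i,j) - \bar{g}_{\min}}{\bar{g}_{\max} - \bar{g}_{\min}} \tag{1}$$

$$\bar{g}(i,j) = \frac{1}{w \times w} \sum_{m=i-w/2}^{i+w/2} \; \sum_{n=j-w/2}^{j+w/2} g(m,n) \tag{2}$$

$$I(i,j) = \frac{\delta(i,j) - \delta_{\min}}{\delta_{\max} - \delta_{\min}} \tag{3}$$

$$\delta(i,j) = \left| g(i,j) - \bar{g}(i,j) \right| \tag{4}$$

$$F(i,j) = 1 - T(i,j) \tag{5}$$

where $g$ is the grayscale image, $\bar{g}$ is the image $g$ filtered with a mean filter, $w$ is the window size of the mean filter, and $\bar{g}_{\max}$ and $\bar{g}_{\min}$ are the maximum and minimum of $\bar{g}$, respectively. $\delta$ is the absolute difference between $g$ and $\bar{g}$, and $\delta_{\max}$ and $\delta_{\min}$ are the maximum and minimum values of $\delta$, respectively.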
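The basic NS mapping described above (mean filtering, absolute difference, and normalization by the minimum and maximum) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the commonly used min-max normalized form of T and I, an odd window size, and replicated borders at the image edges. The function names `mean_filter`, `min_max` and `ns_transform`, and the small `eps` guard against division by zero, are hypothetical choices made here.

```python
import numpy as np


def mean_filter(g, w=3):
    """Local mean over a w x w window (w assumed odd, borders replicated)."""
    pad = w // 2
    padded = np.pad(g.astype(np.float64), pad, mode="edge")
    out = np.zeros(g.shape, dtype=np.float64)
    # Sum the w*w shifted copies of the image, then divide by the window area.
    for di in range(w):
        for dj in range(w):
            out += padded[di:di + g.shape[0], dj:dj + g.shape[1]]
    return out / (w * w)


def min_max(a, eps=1e-12):
    """Rescale an array to [0, 1]; eps (an assumption) avoids division by zero."""
    return (a - a.min()) / (a.max() - a.min() + eps)


def ns_transform(g, w=3):
    """Map a grayscale image to the (T, I, F) sets of the basic NS transform."""
    g = g.astype(np.float64)
    g_bar = mean_filter(g, w)      # filtered image
    delta = np.abs(g - g_bar)      # absolute difference between g and g_bar
    T = min_max(g_bar)             # truth set
    I = min_max(delta)             # indeterminacy set
    F = 1.0 - T                    # falsity set
    return T, I, F


# Toy usage: a flat image with one isolated bright pixel.
image = np.zeros((5, 5))
image[2, 2] = 1.0
T, I, F = ns_transform(image, w=3)
```

In this toy image the isolated bright pixel differs most from its local mean, so it receives the largest indeterminacy value, which matches the interpretation of I as a "noise" set.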
Proposed method
The main contributions of this work are, first, a data indeterminacy concept used together with the truth set in the NS domain and, second, the use of these concepts in CNN models, which leads to a new two-path network. It is shown theoretically and experimentally that the proposed CNN structure handles noisy and outlier data points more efficiently than conventional CNN models and converges more quickly. These contributions are discussed in the following sections.
Data indeterminacy in NS domain
In the proposed scheme, noisy data points are modeled as high-indeterminacy data. This is the first and most important step, since it should interpret noisy pixels correctly for the CNN model. Therefore, the proposed definition of data indeterminacy is expected to assign a high indeterminacy to noisy pixels, and this concept is used in the proposed CNN structure. The proposed method for pixel indeterminacy is summarized in Eqs. (6)–(9), which use the input image convolved by the mean filter and divide by the mean of the pixels in the corresponding images. The reason for dividing by the mean is that pixels with higher indeterminacy are revealed brighter. Therefore, as shown in Fig. 1, edges and noisy pixels appear brighter than with the basic NS operator.
Fig. 1
Proposed pixel indeterminacy: (a) input image, (b) indeterminacy matrix computed by basic NS and (c) indeterminacy matrix computed by the proposed method
Proposed network structure
Presenting the indeterminacy (I) and truth (T) sets to CNN structures leads to a two-parallel-path network referred to as Neutrosophic CNN. Indeterminacy and truth membership degrees are trained in the first path (I-path) and the second path (T-path), respectively. The outputs of these paths are combined to compute the final label predicted by the network. The proposed structure is illustrated in Figs. 2 and 3.
Fig. 2
The proposed NCNN model
Fig. 3
Network weights for NCNN
It may be worth mentioning that the two paths are trained simultaneously, in contrast with two-path structures in which each path is trained separately and the frozen weights are then combined in the final step. In fact, from the first epoch of training, the weights in the two paths are updated simultaneously by multiplying the outputs of the paths, which leads to gradient switching. In this case, the paths affect and help each other, resulting in robust weight updates.
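The combination step can be illustrated with a small forward-pass sketch. It assumes that each path ends in a softmax layer and that the final output is the element-wise product of the two path outputs, as the text describes; the function names `softmax` and `combine_paths` and the example logits are hypothetical, and the convolutional layers of each path are omitted.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def combine_paths(t_logits, i_logits):
    """Multiply the class scores of the T-path and I-path and pick the best class."""
    T = softmax(t_logits)   # output of the T-path
    I = softmax(i_logits)   # output of the I-path
    f = T * I               # combined (multiplied) output
    return f, f.argmax(axis=-1)


# Hypothetical logits for one sample and three classes.
t_logits = np.array([[2.0, 1.0, 0.1]])
i_logits = np.array([[0.5, 2.5, 0.1]])
f, labels = combine_paths(t_logits, i_logits)
```

In this example the T-path alone would choose class 0, but the strong I-path score for class 1 changes the combined decision, which shows how each path can influence the final label.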
Network training
Consider two weights, one in the I-path and one in the T-path, as shown in Fig. 3. It is clear from the figure that these weights are multiplied by the neurons in the previous layer to construct I and T. Note that I and T are the labels predicted by the I-path and T-path networks, respectively. These labels are combined (multiplied) to compute the final label f (see Fig. 3).

To show how NCNN handles noisy data points with the indeterminacy concept, we discuss how network weights are updated for noisy and clean data points. Here, we explain the weight update for the T-path; the scenario for the I-path is the same.

Each weight in the T-path network is updated by the general update rule, in which the weight change is computed with the neural network update rule in Eq. (11) and a learning rate. Applying the chain rule to the error gradient with respect to this weight leads to Eq. (12). Based on the network structure in Fig. 3, f and T are computed by Eq. (13), which gives the partial derivatives in Eq. (14). Substituting (14) in (12) gives Eq. (15). To compute the remaining derivative, the chain rule is used again, which leads to Eq. (16). In Eq. (16), p is the final network output, e is the difference between the data label L and the predicted label p, and E is the sum squared error; these quantities are defined in Eq. (17). Applying differentiation rules leads to Eq. (18), from which the remaining derivative follows as Eq. (19). Substituting (19) in (15) gives the final update rule.

It can be concluded from these equations that weight updates for neurons in the T-path are affected by the I subset in the NS domain, and vice versa.
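The chain-rule argument can be checked numerically on a toy model. The sketch below is not the authors' network: it assumes that each path is a single scalar weight times a scalar input (the names `w_t`, `w_i`, `x_t`, `x_i` are hypothetical), that the final output p equals f = T · I, and that the error is E = ½(L − p)². Under these assumptions the gradient of E with respect to the T-path weight contains I as a factor, which is the property stated in the text.

```python
def forward(w_t, w_i, x_t, x_i, label):
    """Toy two-path forward pass: each path is one weight times one input."""
    T = w_t * x_t          # T-path output (assumed linear)
    I = w_i * x_i          # I-path output (assumed linear)
    f = T * I              # paths are combined by multiplication
    e = label - f          # difference between label and prediction (p = f assumed)
    E = 0.5 * e ** 2       # squared error
    return T, I, f, e, E


def grad_w_t(w_t, w_i, x_t, x_i, label):
    """Analytic dE/dw_t from the chain rule: dE/df * df/dT * dT/dw_t."""
    T, I, f, e, E = forward(w_t, w_i, x_t, x_i, label)
    return (-e) * I * x_t


def numeric_grad_w_t(w_t, w_i, x_t, x_i, label, h=1e-6):
    """Central finite difference, used only to verify the analytic gradient."""
    e_plus = forward(w_t + h, w_i, x_t, x_i, label)[4]
    e_minus = forward(w_t - h, w_i, x_t, x_i, label)[4]
    return (e_plus - e_minus) / (2.0 * h)


analytic = grad_w_t(0.4, 0.7, 1.5, 0.8, 1.0)
numeric = numeric_grad_w_t(0.4, 0.7, 1.5, 0.8, 1.0)
```

Because I multiplies the whole gradient, a T-path weight receives no update when the I-path output is zero, and a larger update when the I-path output is large. The same holds with the roles of the two paths exchanged.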
Network behavior analysis
Finally, this section explains why the proposed NCNN leads to better results than CNN. The main difference between the weight update rules of NCNN and conventional CNN is the parameter I. To examine the effect of I on the computation, Eq. (21) is expanded to Eq. (22). Here, there are two possible class labels, 0 and 1, for L; therefore, the separate cases in Eq. (23) can be considered in Eq. (22). Since T and I are outputs of a softmax layer, these parameters also lie between 0 and 1, which gives Eq. (24). Equation (24) for NCNN is compared with Eq. (25) for conventional CNN.

One of the main challenges of CNN is its sensitivity to noisy data. When clean data are fed as input and the label is predicted correctly, the network weights are updated only slightly. If the same data (with the same label) are fed to the network with noise, the network tries to update its weights significantly. In other words, in CNN, the weight updates for clean and noisy data with the same label differ significantly, and this behavior misleads the network on noisy data. Here, it is shown that NCNN handles noisy data and does not update weights very differently for clean and noisy data with the same label.

To compare the amount of weight update for clean and noisy data in NCNN and CNN, the true label 1 is considered for both the noisy and the clean data. The difference between the weight updates for clean and noisy data in CNN is given by Eq. (26), and the corresponding difference in NCNN is given by Eq. (27). If (27) is shown to be less than (26), then NCNN makes less of a difference between clean and noisy data in its weight updates than CNN does, which means that NCNN is more robust against noisy data.

It is clear that, for a data point with label 1, the path outputs for clean data are larger than those for noisy data.
Because a negative factor is involved, showing that (27) < (26) reduces to showing a single inequality between the path outputs for clean and noisy data. As will be shown numerically in the Discussion section, if the I-path and T-path networks are trained separately, they predict almost the same label. Therefore, the outputs of the two paths have close values, both for clean data and for noisy data. Note that in each image, the I-path learns labels from noisy pixels and pixels with a high gradient, while the T-path learns labels from all pixels in the image. Therefore, in NCNN, both noisy pixels (with a high indeterminacy) and clean pixels are used to predict the label. The required relation, inequality (30), can be concluded from the results of Table 8 in the Discussion section.

It is worth mentioning that inequality (30) is not always satisfied. Clean data have high values for the path outputs; therefore, inequality (30) is not satisfied for clean data. Consequently, on clean datasets, NCNN not only does not improve the performance but also has lower accuracy than CNN. For noisy datasets, the quantities involved are far from each other, which leads to values near zero; therefore, inequality (30) is almost satisfied for noisy data. As the experimental results will also show, this is the main reason that NCNN handles noisy data points and performs well in noisy data classification.
Table 8
Correlation between the outputs of two paths
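| Path | MNIST (clean) | MNIST (noisy) | CIFAR-10 (clean) | CIFAR-10 (noisy) | CIFAR-100 (clean) | CIFAR-100 (noisy) |
|---|---|---|---|---|---|---|
| T-path output | 98.97 | 61 | 92.32 | 57.66 | 69.39 | 50.94 |
| I-path output | 98.52 | 60.55 | 90.01 | 57.72 | 68.91 | 50.36 |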
Experimental results
Dataset
The NCNN model is proposed for the image classification task. Therefore, to show its effectiveness, it has been evaluated on the MNIST, CIFAR-10 and CIFAR-100 datasets. To evaluate the robustness of NCNN and other models against noise, Gaussian noise with different means and standard deviations is added to the images. For each dataset in each model, 5 levels of noise are considered. Since 2 models are evaluated, each on 2 datasets, a total of 20 noise levels are used. In this section, each dataset is described briefly.
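Adding Gaussian noise with a given mean and standard deviation can be sketched as follows. The paper does not state the pixel scale or whether values are clipped after the noise is added, so this sketch assumes images scaled to [0, 1] and clips the result to that range; the function name `add_gaussian_noise`, the `seed` argument and the example noise parameters are hypothetical.

```python
import numpy as np


def add_gaussian_noise(images, mean, std, seed=0):
    """Add Gaussian noise N(mean, std^2) to images assumed to lie in [0, 1]."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(loc=mean, scale=std, size=images.shape)
    # Clipping back to [0, 1] is an assumption, not something the paper specifies.
    return np.clip(images + noise, 0.0, 1.0)


# Toy usage: a batch of two 4 x 4 mid-gray images with illustrative parameters.
clean = np.full((2, 4, 4), 0.5)
noisy = add_gaussian_noise(clean, mean=0.1, std=0.2, seed=1)
```

Fixing the seed makes a noisy copy of a dataset reproducible, which is convenient when the same noise level has to be reused for several models.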
MNIST
The first dataset is MNIST (Modified National Institute of Standards and Technology database), a common dataset for evaluating image processing and machine learning methods. It contains 70,000 images of handwritten digits, divided into 60,000 training images and 10,000 test images. All images in MNIST were normalized to fit into a 28 × 28 pixel bounding box and anti-aliased in grayscale levels [64].
CIFAR-10
The second dataset is CIFAR-10 (Canadian Institute For Advanced Research), one of the most widely used datasets for research in machine learning and computer vision. It contains 60,000 color images of size 32 × 32 in 10 different classes (airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks), with 6000 images in each class. The main purpose of CIFAR-10 is to teach a computer how to recognize objects. Because the images are low-resolution (32 × 32), this dataset allows researchers to try different machine learning and machine vision algorithms quickly. Many CNN models have been proposed to achieve the best accuracy in recognizing CIFAR-10 images [65].
CIFAR-100
CIFAR-100 is the same as CIFAR-10 except for the number of classes and the number of samples in each class. CIFAR-100 contains 100 classes with 600 samples in each class. The number of training and test samples in each class is 500 and 100, respectively. The classes are further grouped into 20 superclasses. Therefore, each sample is assigned a "fine" and a "coarse" label, which are the labels of its class and superclass, respectively. The superclasses are aquatic mammals, fish, flowers, food containers, fruit and vegetables, household electrical devices, household furniture, insects, large carnivores, large man-made outdoor things, large natural outdoor scenes, large omnivores and herbivores, medium-sized mammals, non-insect invertebrates, people, reptiles, small mammals, trees, vehicles 1 and vehicles 2 [66].
CNN models
NCNN is a structure that can be applied to any CNN model; that is, CNN models with different numbers of layers can be placed in the NCNN structure. This section shows how NCNN can be applied to a CNN model to improve its robustness against noisy data with higher indeterminacy. For this task, two basic CNN models are considered. The first is the proposed 11-layer CNN model, and the second is the reference model VGG-Net. Applying the NS-based two-parallel-path idea to the first and second models constructs NCNN and NVGG-Net, respectively. NCNN and CNN have been applied to the MNIST and CIFAR-10 datasets. VGG-Net and NVGG-Net are evaluated on the CIFAR-10 and CIFAR-100 datasets. All CNN structures have been implemented with Python and the TensorFlow library on a Linux machine with a 3.26 GHz Core i7 CPU, 32 GB of DDR4 RAM and a GeForce 1080 Ti GPU with 8 GB of RAM.
The first model
The first proposed model is an 11-layer CNN shown in Fig. 4. In this structure, "5 × 5 conv, 64" means a convolution layer with 64 filters of dimension 5 × 5, "3 × 3 Pooling" means a pooling layer with a 3 × 3 filter, and "FC, 384" means a fully connected layer with 384 neurons. In each convolution layer, the step size is 1 × 1; therefore, the first and second dimensions of the input remain fixed, while the third dimension may change depending on the number of filters. In the pooling layer, since the step size is 2 × 2, the first and second dimensions are reduced by half. Finally, there are three fully connected layers with different numbers of neurons, as shown in Fig. 4. After each convolution and pooling layer, the "RELU" activation function is used.
Fig. 4
The first proposed CNN model
NCNN and CNN have been evaluated on the MNIST and CIFAR-10 datasets. In each experiment, a level of Gaussian noise is considered, which produces different combinations of training and test sets with the noise levels “clean,” “NT1: Mean = 0.8, Std = 0.6,” “NT2: Mean = 0.6, Std = 0.9,” “NT3: Mean = 0.7, Std = 1.2” and “NT4: Mean = 0.8, Std = 1.6.” Therefore, from each dataset, we obtain 25 datasets. The first is the original dataset with clean training and clean test data. The 25th is a dataset in which both training and test data are contaminated with NT4 noise. The classification accuracies on MNIST test data are reported in Table 1. In all tables, rows correspond to training data and columns to test data.
Table 1
Results of CNN and NCNN for MNIST
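| Training data | Model | Test: Clean | Test: NT1 | Test: NT2 | Test: NT3 | Test: NT4 | Average |
|---|---|---|---|---|---|---|---|
| Clean | CNN | 98.97 | 49.51 | 35.23 | 24.27 | 18.63 | 45.32 |
| Clean | NCNN | 98.47 | 48.83 | 43.4 | 32.84 | 28.65 | 50.43 |
| NT1 | CNN | 97.16 | 93.96 | 80.67 | 63.07 | 46.2 | 76.21 |
| NT1 | NCNN | 97.9 | 94.86 | 85.85 | 70.3 | 54.88 | 80.75 |
| NT2 | CNN | 96.58 | 94.41 | 87.95 | 75.36 | 57.31 | 82.32 |
| NT2 | NCNN | 95.66 | 94.52 | 88.21 | 75.41 | 59.01 | 82.56 |
| NT3 | CNN | 94.72 | 89.16 | 86.32 | 76.99 | 61.78 | 81.79 |
| NT3 | NCNN | 90.17 | 92.6 | 87.39 | 76.85 | 62.26 | 81.85 |
| NT4 | CNN | 90.24 | 90.05 | 84.09 | 74.87 | 61 | 80.05 |
| NT4 | NCNN | 88.85 | 91.28 | 85.69 | 76.06 | 64.83 | 81.34 |
| Average | CNN | 95.534 | 83.418 | 74.852 | 62.912 | 48.984 | – |
| Average | NCNN | 94.21 | 84.418 | 78.108 | 66.292 | 53.926 | – |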
It can be concluded from the results reported in Table 1 (as also shown mathematically in Sect. 3) that if both training and test data are clean, CNN slightly outperforms NCNN. NCNN performs best in cases in which noise is present in either the training or the test data. For example, when the training data are clean and the test data are NT4, NCNN outperforms CNN by 10.02%. This behavior of NCNN is shown in Fig. 5. It can be seen that the higher the amount of noise, the larger the improvement of NCNN over CNN. These results verify the main advantage of NCNN for uncertainty handling in noisy data.
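The gains quoted from Table 1 can be recomputed directly from the reported accuracies. The short sketch below copies the MNIST values from Table 1 (rows are training data, columns are test data, both in the order Clean, NT1, NT2, NT3, NT4) and computes the per-pair gain of NCNN over CNN; the variable names are chosen here for illustration only.

```python
import numpy as np

levels = ["Clean", "NT1", "NT2", "NT3", "NT4"]

# Accuracies from Table 1: rows = training data, columns = test data.
cnn = np.array([
    [98.97, 49.51, 35.23, 24.27, 18.63],
    [97.16, 93.96, 80.67, 63.07, 46.20],
    [96.58, 94.41, 87.95, 75.36, 57.31],
    [94.72, 89.16, 86.32, 76.99, 61.78],
    [90.24, 90.05, 84.09, 74.87, 61.00],
])
ncnn = np.array([
    [98.47, 48.83, 43.40, 32.84, 28.65],
    [97.90, 94.86, 85.85, 70.30, 54.88],
    [95.66, 94.52, 88.21, 75.41, 59.01],
    [90.17, 92.60, 87.39, 76.85, 62.26],
    [88.85, 91.28, 85.69, 76.06, 64.83],
])

gain = ncnn - cnn                    # positive where NCNN is better
row_avg_gain = gain.mean(axis=1)     # average gain per training condition

clean_train = levels.index("Clean")
nt4_test = levels.index("NT4")
clean_to_nt4_gain = round(float(gain[clean_train, nt4_test]), 2)
```

For clean training data, the gain grows with the test noise level and reaches 10.02 points for NT4, while the clean-to-clean entry is slightly negative. Averaged over the five test conditions, the gain for clean training data is about 5.1 points.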
Fig. 5
NCNN and CNN accuracies with clean training and noisy test data for MNIST
Another conclusion from the results reported on the MNIST dataset is that training the network with noisy data makes it more robust against noisy data in the test phase. Figure 6 illustrates the results of CNN and NCNN with NT3 training data and all noise levels for the test data. In this experiment, similar to Fig. 5, applying more noise (NT3) to the test data leads to a better improvement (3%) of NCNN in comparison with CNN. When clean data are used in the test phase, CNN outperforms NCNN by about 1%.
Fig. 6
NCNN and CNN accuracies with noisy training and noisy test data for MNIST
CIFAR-10 is the second dataset considered for evaluating the first model. By applying 5 data conditions, “Clean,” “NT5: Mean = 0.1, Std = 0.1,” “NT6: Mean = 0.4, Std = 0.2,” “NT7: Mean = 0.2, Std = 0.4” and “NT8: Mean = 0.3, Std = 0.6,” 25 datasets are created. The classification accuracies on test data for CIFAR-10 are reported in Table 2. Here, each row shows experiments with the same training data and test data with different levels of noise, while each column corresponds to the same test data and training data with different noise levels. For CIFAR-10, the best improvement of NCNN (3.74%) appears when both training and test data are contaminated with NT7 noise.
Table 2
Results of CNN and NCNN for CIFAR-10
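| Training data | Model | Test: Clean | Test: NT5 | Test: NT6 | Test: NT7 | Test: NT8 | Average |
|---|---|---|---|---|---|---|---|
| Clean | CNN | 71.35 | 63.13 | 45.94 | 23.61 | 16.84 | 44.174 |
| Clean | NCNN | 72.76 | 65.12 | 49.39 | 22.73 | 19.23 | 45.846 |
| NT5 | CNN | 68.56 | 67.3 | 58.29 | 35.41 | 23.17 | 50.546 |
| NT5 | NCNN | 71.06 | 70.68 | 61.39 | 36.91 | 22.27 | 52.462 |
| NT6 | CNN | 62.67 | 62.78 | 61.59 | 40.29 | 29.04 | 51.274 |
| NT6 | NCNN | 64.08 | 64.75 | 63.83 | 43.33 | 28.83 | 52.964 |
| NT7 | CNN | 51.72 | 49.08 | 36.06 | 51.24 | 43.48 | 46.316 |
| NT7 | NCNN | 53.77 | 51.45 | 36.38 | 54.98 | 46.03 | 48.522 |
| NT8 | CNN | 41.77 | 39.45 | 23.96 | 48.22 | 45.5 | 39.78 |
| NT8 | NCNN | 44.77 | 41.46 | 24.08 | 48.46 | 48.42 | 41.438 |
| Average | CNN | 59.214 | 56.348 | 45.168 | 39.754 | 31.606 | – |
| Average | NCNN | 61.288 | 58.692 | 47.014 | 41.282 | 32.956 | – |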
The two rows labeled “Clean” report the classification accuracies of NCNN and CNN with clean training data and test data at all noise levels. These classification accuracies are depicted in Fig. 7. For these experiments, NCNN outperforms CNN by 3.45% for test data with NT6 noise. For the CIFAR-10 dataset, the trend is the same as for MNIST but with a gentler slope, which means that MNIST is more affected by noisy test data than CIFAR-10.
Fig. 7
NCNN and CNN accuracies with clean training and noisy test data for CIFAR-10
The evaluation of NCNN and CNN with NT7 training data and different noise levels in the test data is shown in Fig. 8. In these cases, instead of a downward trend with increasing noise, the trend is upward after NT6, which reflects the similarity between the noise applied to the training and test data.
Fig. 8
NCNN and CNN accuracies with noisy training and noisy test data for CIFAR-10
Generally, as illustrated in Figs. 9, 10, 11 and 12, when the noise is increased from NT5 to NT8, the accuracies of both networks decrease, with negative slopes in all charts. In cases where the noise level in training and test is the same, not only is there no significant decrease in accuracy, but there is also an improvement in some cases. For example, in Fig. 10, since the noise level in training is NT6, the accuracy does not decrease noticeably when the test noise level is increased from NT5 to NT6. In Fig. 11, which represents training data with noise level NT7, the accuracy improves significantly when the noise is increased from NT6 to NT7.
Fig. 9
NCNN and CNN accuracies for NT5 training data
Fig. 10
NCNN and CNN accuracies for NT6 training data
Fig. 11
NCNN and CNN accuracies for NT7 training data
Fig. 12
NCNN and CNN accuracies for NT8 training data
The second model (VGG-Net)
The second base model is VGG-Net [39], as illustrated in Fig. 13. This model has 16 layers with "RELU" activation functions. Batch normalization is used after convolution and fully connected layers. For convolution layers that are not followed by pooling layers, dropout is used with coefficients of 0.4 and 0.3. Also, in all fully connected layers except the last one, dropout with a coefficient of 0.5 is used. In convolution layers, since the step size is 1 × 1, the dimensions of images before and after passing through these layers are the same, while in the pooling layer, they are halved because of the step size of 2 × 2. Applying NS-based two parallel paths to VGG-Net creates NVGG-Net.
Fig. 13
VGG-Net structure
VGG-Net and NVGG-Net have been evaluated on the CIFAR-10 and CIFAR-100 datasets. For CIFAR-10, 5 levels of noise, “Clean,” “NT9: Mean = 0.2, Std = 0.1,” “NT10: Mean = 0.1, Std = 0.2,” “NT11: Mean = 0.2, Std = 0.3” and “NT12: Mean = 0.3, Std = 0.5,” are considered, which can be applied to either training or test data. Table 3 reports the results of VGG-Net and NVGG-Net for CIFAR-10. NVGG-Net achieves better classification accuracy than VGG-Net in all 25 (Training, Test) pairs except (Clean, NT10), (NT9, NT11), (NT11, NT10), (NT11, Clean) and (NT11, NT11). The best average improvement of 3.09% was obtained by NVGG-Net in cases with the NT12 test set and noisy training sets (with all levels of noise). In these cases, an improvement of 4.9% was achieved by NVGG-Net in the pair (Clean, NT12).
Table 3
Results of VGG-Net and NVGG-Net for CIFAR-10
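| Training \ Test | Model | Clean | NT9 | NT10 | NT11 | NT12 | Average |
|---|---|---|---|---|---|---|---|
| Clean | VGG-Net | 92.32 | 75.28 | 54.18 | 43.6 | 30.54 | 59.18 |
| Clean | NVGG-Net | 92.52 | 78.22 | 53.9 | 47.94 | 35.44 | 61.6 |
| NT9 | VGG-Net | 88.31 | 84.67 | 72.32 | 59.64 | 42.54 | 69.5 |
| NT9 | NVGG-Net | 88.53 | 86.15 | 73.62 | 59.13 | 44.04 | 70.29 |
| NT10 | VGG-Net | 83.6 | 81.8 | 76.76 | 67.57 | 52.23 | 72.39 |
| NT10 | NVGG-Net | 84.83 | 82.48 | 78.56 | 69.38 | 55.09 | 74.07 |
| NT11 | VGG-Net | 79.14 | 77.93 | 75 | 68.83 | 56.18 | 71.42 |
| NT11 | NVGG-Net | 78.12 | 79.44 | 74.18 | 68.18 | 59.59 | 71.9 |
| NT12 | VGG-Net | 69.82 | 69.47 | 68.32 | 65.16 | 57.66 | 66.09 |
| NT12 | NVGG-Net | 71.5 | 70.55 | 69.05 | 67.22 | 60.44 | 67.75 |
| Average | VGG-Net | 82.638 | 75.28 | 69.316 | 60.96 | 47.83 | – |
| Average | NVGG-Net | 83.1 | 78.22 | 69.86 | 62.37 | 50.92 | – |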
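The two summary statements about Table 3 (the five pairs in which NVGG-Net does not improve on VGG-Net, and the 3.09% average gain for the NT12 test set) can be re-derived directly from the per-pair accuracies. The short script below is one way to do this check; the variable names are illustrative only.

```python
LEVELS = ["Clean", "NT9", "NT10", "NT11", "NT12"]

# Rows: training noise level; columns: test noise level (values from Table 3).
VGG = [
    [92.32, 75.28, 54.18, 43.60, 30.54],
    [88.31, 84.67, 72.32, 59.64, 42.54],
    [83.60, 81.80, 76.76, 67.57, 52.23],
    [79.14, 77.93, 75.00, 68.83, 56.18],
    [69.82, 69.47, 68.32, 65.16, 57.66],
]
NVGG = [
    [92.52, 78.22, 53.90, 47.94, 35.44],
    [88.53, 86.15, 73.62, 59.13, 44.04],
    [84.83, 82.48, 78.56, 69.38, 55.09],
    [78.12, 79.44, 74.18, 68.18, 59.59],
    [71.50, 70.55, 69.05, 67.22, 60.44],
]

# Pairs (training, test) where NVGG-Net does not beat VGG-Net.
losses = [
    (LEVELS[r], LEVELS[c])
    for r in range(5)
    for c in range(5)
    if NVGG[r][c] < VGG[r][c]
]

# Mean gain over all training sets for the noisiest test set (NT12 column).
nt12 = LEVELS.index("NT12")
nt12_gain = sum(NVGG[r][nt12] - VGG[r][nt12] for r in range(5)) / 5

print(losses)
print(round(nt12_gain, 2))  # 3.09
```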
Figures 14, 15, 16, 17 and 18 show how the networks are affected by changing the training and test data. It can be seen that NVGG-Net outperforms VGG-Net in the majority of cases, especially when the noise in the test set is NT12. For both networks, training with noisier data makes the network more robust against increasing noise in the test data. Accordingly, Fig. 14 (clean training data) has the steepest descent, while Fig. 18 (NT12 training data) is close to a horizontal line (the least descent in slope). Using the same noise in training and test data also makes the network more robust.
Fig. 14
NVGG-Net and VGG-Net accuracies with Clean training data
Fig. 15
NVGG-Net and VGG-Net accuracies with NT9 training data
Fig. 16
NVGG-Net and VGG-Net accuracies with NT10 training data
Fig. 17
NVGG-Net and VGG-Net accuracies with NT11 training data
Fig. 18
NVGG-Net and VGG-Net accuracies with NT12 training data
For CIFAR-100, noise was applied in the cases “Clean,” “NT13: Mean = 0.1, Std = 0.1,” “NT14: Mean = 0.2, Std = 0.1,” “NT15: Mean = 0.1, Std = 0.2” and “NT16: Mean = 0.2, Std = 0.2.” Table 4 reports the results of VGG-Net and NVGG-Net for CIFAR-100. The best improvement of 4.5% was obtained by NVGG-Net for the pair (NT14, NT15). Also, the best average improvement of 2.57% was obtained by NVGG-Net with the NT16 test set, averaged over training sets with all levels of noise.
Table 4
Accuracies of VGG-Net and NVGG-Net for CIFAR-100
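| Training \ Test | Model | Clean | NT13 | NT14 | NT15 | NT16 | Average |
|---|---|---|---|---|---|---|---|
| Clean | VGG-Net | 69.39 | 56.8 | 56.05 | 46.06 | 46.03 | 54.87 |
| Clean | NVGG-Net | 69.51 | 57.82 | 55.53 | 49.72 | 50.64 | 56.64 |
| NT13 | VGG-Net | 63.12 | 59.67 | 59.38 | 50.2 | 50.64 | 56.6 |
| NT13 | NVGG-Net | 63.075 | 61.42 | 61.72 | 52.96 | 52.63 | 58.36 |
| NT14 | VGG-Net | 63.93 | 59.2 | 59.19 | 49.75 | 50.98 | 56.61 |
| NT14 | NVGG-Net | 64.14 | 61.69 | 61.78 | 54.25 | 53.82 | 59.13 |
| NT15 | VGG-Net | 57.43 | 55.68 | 55.51 | 50.52 | 51.3 | 54.09 |
| NT15 | NVGG-Net | 60.87 | 55.51 | 56.32 | 51.82 | 52.63 | 55.43 |
| NT16 | VGG-Net | 57.1 | 55.54 | 55.92 | 50.85 | 50.94 | 54.07 |
| NT16 | NVGG-Net | 58.75 | 57.33 | 56.28 | 50.49 | 53.01 | 55.17 |
| Average | VGG-Net | 62.194 | 57.378 | 57.21 | 49.476 | 49.978 | – |
| Average | NVGG-Net | 63.269 | 58.754 | 58.326 | 51.848 | 52.546 | – |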
Finally, as shown in Fig. 19, training the networks with the highest level of noise (NT16) does not significantly change their robustness. The reason is that the amount of noise added to CIFAR-100 is much smaller than that added to CIFAR-10.
Fig. 19
NVGG-Net and VGG-Net accuracies with training data: (a) Clean and (b) NT4
Window size effect
Window size has an important effect on computing the indeterminacy and true sets when transforming images to the NS domain. In this section, the effect of window size is evaluated on the MNIST dataset for the first model and on the CIFAR-10 and CIFAR-100 datasets for the second model. Classification accuracies with different window sizes are reported in Table 5. It can be concluded that smaller window sizes perform better for clean data, while bigger window sizes are preferred for noisy data. The reason is that with a bigger window size, more general information around each pixel is considered in the NS domain, which increases the robustness against noisy pixels in local neighborhoods. For clean data, on the other hand, only local information is needed and general information around each pixel is not required. On average, the 3 × 3 window size has the best performance across all experiments.
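The robustness argument can be illustrated with a plain local average, assuming the window enters the NS transform through a neighborhood mean, as is common in neutrosophic image transforms. The sketch below does not reproduce the definitions of T and I used in this paper; the helper names, the edge replication and the synthetic test image are all assumptions made for illustration.

```python
import random
import statistics


def local_mean(image, window):
    """Mean of each pixel's window x window neighborhood (edges replicated)."""
    height, width = len(image), len(image[0])
    half = window // 2
    result = []
    for i in range(height):
        row = []
        for j in range(width):
            values = [
                image[min(max(i + di, 0), height - 1)][min(max(j + dj, 0), width - 1)]
                for di in range(-half, half + 1)
                for dj in range(-half, half + 1)
            ]
            row.append(sum(values) / len(values))
        result.append(row)
    return result


def spread(image):
    """Standard deviation of all pixel values in a 2-D list."""
    return statistics.pstdev([v for row in image for v in row])


rng = random.Random(0)
# Hypothetical flat gray image (true value 0.5) with Gaussian noise, std 0.2.
noisy = [[0.5 + rng.gauss(0.0, 0.2) for _ in range(32)] for _ in range(32)]

# A window of 1 leaves the image unchanged; larger windows average out noise.
for window in (1, 3, 5):
    print(window, round(spread(local_mean(noisy, window)), 3))
```

The residual spread shrinks as the window grows, which matches the observation that bigger windows help on noisy data. On clean images the same averaging blurs local detail, which is consistent with smaller windows performing better there.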
Table 5
Classification accuracies with different window sizes
Combination method effect
The proposed model for handling indeterminacy in CNN is a network with two parallel paths, one with the indeterminacy set I as input and the other with the true set T. The method used to combine the two paths is very important. Here, three combination methods, Maximum, Mean and Product, are assessed. Table 6 reports the classification accuracies with different combination methods applied to the MNIST dataset for the first model and to the CIFAR-10 and CIFAR-100 datasets for the second model. In these experiments, the window size is set to 3 × 3. The reported results show that the Product method has the best average accuracy on all three datasets. The effect of the Mean and Maximum combination methods is analyzed mathematically in the Discussion section.
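The three combination methods can be sketched as element-wise operations on the per-class outputs of the two paths. This is a minimal illustration, assuming both paths produce one score per class and that the predicted class is the one with the highest combined score; the function names and the example scores are hypothetical.

```python
def combine(p_true, p_indet, method):
    """Combine per-class outputs of the T-path and I-path element-wise."""
    if method == "max":
        return [max(a, b) for a, b in zip(p_true, p_indet)]
    if method == "mean":
        return [(a + b) / 2 for a, b in zip(p_true, p_indet)]
    if method == "product":
        return [a * b for a, b in zip(p_true, p_indet)]
    raise ValueError(f"unknown combination method: {method}")


def predict(scores):
    """Index of the class with the highest combined score."""
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical outputs of the two paths for a 3-class problem.
p_true = [0.6, 0.3, 0.1]
p_indet = [0.2, 0.7, 0.1]

for method in ("max", "mean", "product"):
    combined = combine(p_true, p_indet, method)
    print(method, combined, predict(combined))
```

With Product, a class needs support from both paths to obtain a high combined score, whereas with Maximum a single confident path is enough.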
Table 6
Classification accuracies with different combination methods
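| Noise level | MNIST (Max) | MNIST (Mean) | MNIST (Prod) | CIFAR-10 (Max) | CIFAR-10 (Mean) | CIFAR-10 (Prod) | CIFAR-100 (Max) | CIFAR-100 (Mean) | CIFAR-100 (Prod) |
|---|---|---|---|---|---|---|---|---|---|
| Clean | 49.32 | 50.12 | 50.43 | 60.22 | 61.01 | 61.60 | 55.87 | 56.83 | 56.64 |
| NT1 | 76.33 | 79.41 | 80.75 | 69.76 | 69.18 | 70.29 | 57.68 | 57.19 | 58.36 |
| NT2 | 81.00 | 81.92 | 82.56 | 73.10 | 73.55 | 74.07 | 58.19 | 58.37 | 59.13 |
| NT3 | 80.40 | 81.77 | 81.65 | 69.94 | 69.22 | 71.90 | 54.78 | 55.10 | 55.43 |
| NT4 | 79.33 | 81.10 | 81.34 | 66.10 | 67.45 | 67.75 | 54.24 | 53.72 | 55.17 |
| Average | 73.27 | 74.86 | 75.34 | 67.82 | 68.08 | 69.12 | 56.15 | 56.24 | 56.94 |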
Training methods
The last experiments evaluate the training methods. There are two options for training the parallel paths in the network. The first is an ensemble method in which the two paths are trained separately and their outputs are then combined. In the second method, the two paths are trained simultaneously. The first and second methods are called “Separate Training” and “Joint Training,” respectively. In these experiments, the 3 × 3 window size and the Product combination method are used. As reported in Table 7, the second training method has better classification accuracy, with an average improvement of 2.13% on the CIFAR-100 dataset. Therefore, joint training is used in this research.
Table 7
Classification accuracies with different training methods
| Noise level | MNIST (Separate) | MNIST (Joint) | CIFAR-10 (Separate) | CIFAR-10 (Joint) | CIFAR-100 (Separate) | CIFAR-100 (Joint) |
|---|---|---|---|---|---|---|
| Clean | 50.12 | 50.43 | 60.99 | 61.80 | 55.32 | 58.64 |
| NT1 | 79.32 | 80.75 | 70.01 | 70.29 | 56.75 | 58.36 |
| NT2 | 81.02 | 82.56 | 73.12 | 74.07 | 58.22 | 59.13 |
| NT3 | 79.98 | 81.65 | 70.27 | 71.90 | 54.68 | 57.43 |
| NT4 | 80.11 | 81.34 | 65.91 | 67.75 | 53.11 | 55.17 |
| Average | 74.11 | 75.34 | 68.06 | 69.16 | 55.61 | 57.74 |
The source code of NCNN and the other CNN models used in this research, as well as the three datasets in clean and noisy formats, will be made publicly available online as supplementary material on the journal website and on GitHub after the publication of the paper.
Computational cost
In this section, the computational cost of the CNN and NCNN models is analyzed. For this task, three datasets, CIFAR-Clean, CIFAR10-NT6 and CIFAR10-NT8, are selected for cost evaluation. All models have been implemented in the Python programming language with the TensorFlow and Pandas libraries, on a machine with an NVIDIA 1080 Ti graphics processing unit (GPU), 32 GB of DDR4 RAM, a Core i7 CPU and a 500 GB SSD. The time costs of running one batch of all 10000 data points in the test set are 1.18 and 2.08 seconds for CNN and NCNN, respectively. Since NCNN uses two paths, the T-path and the I-path, its number of parameters (weights) is twice that of the one-path CNN, so NCNN takes longer to train than CNN.
The number of iterations and the accuracy at each iteration in the training phase are shown in Figs. 20, 21 and 22 for the CIFAR-Clean, CIFAR10-NT6 and CIFAR10-NT8 datasets, respectively. As is clear from these figures, CNN overfits at iteration 4000, while NCNN leads to better results in the training phase, which can be considered an advantage of NCNN.
Fig. 20
Accuracy versus number of iterations in CIFAR-Clean dataset
Fig. 21
Number of iterations versus accuracy in CIFAR-10-NT6 dataset
Fig. 22
Number of iterations versus accuracy in CIFAR-10-NT8 dataset
Discussion
To prove inequalities (29)–(30), it was assumed that the outputs of the networks in the first and second paths are highly correlated. Note that the inputs of the first and second paths are the true and indeterminacy sets, respectively. This means that the label of a data point can be learned either from its true set or from its indeterminacy set. This behavior is especially useful for noisy data, in which the label is also learned from noisy pixels with higher indeterminacy. To support this assumption, experiments were carried out as follows. First, MNIST, CIFAR-10 and CIFAR-100 with the highest level of noise (NT4) in both training and test sets were considered. Then, the I-path and T-path in NCNN and NVGG-Net were separated, and the outputs of each path are reported in Table 8. As is clear from Table 8, the outputs of the two paths are highly correlated.
Table 8
Correlation between the outputs of two paths
It is worth mentioning that in backpropagation, the Product combination method generates a gradient switcher: in the two parallel paths, the backpropagated error of each path is switched to the other one. For the Mean combination method, a gradient distributor is constructed, in which the backpropagated error is distributed equally between the two paths. Here, the weights update rule is investigated for the Mean combination method, which yields Eq. (32). Substituting (32) in (12) gives Eq. (33). To compute the remaining derivative term, the chain rule is used again, which leads to Eq. (34). In Eq. (34), p is the final output of the network, e represents the difference between the data label L and the predicted label p, and E is the sum squared error. Applying the derivation rule to these definitions yields Eq. (37). Substituting (37) in (33) gives the final weights update rule. The main difference between the weight update equation of NCNN with the Mean combination method and that of CNN is the error term e. In this case, e depends on the combined output p, which is the mean of the outputs of the T-path and the I-path. Therefore, the error e is affected by both the T and I sets.
This weights update rule can be interpreted as follows. A bigger difference between the labels predicted by the two parallel paths (T-path and I-path) and the real output label leads to bigger weight updates. If the label predicted by either the T-path or the I-path is almost the same as the real output label, a smaller error is backpropagated than in CNN. If both the T-path and the I-path predict a label close to the real output label, the error decreases significantly and the weights are updated only slightly. This means that NCNN considers the predicted outputs of both the T-path and the I-path when updating weights. Finally, if the Max combination method is selected, a gradient router is constructed, which means that the error is backpropagated only through the path whose output is maximum. The weights of the other path are not updated. If one path wins repeatedly in the first iterations, it keeps being updated in the following iterations, while the weights of the other path remain unchanged. This behavior makes the network work similarly to a one-path CNN with the input of T or I, which is the main issue of the Max method. As can be seen from the results reported in Table 6, this combination method has lower accuracy than the Mean and Product combination methods.
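The switcher, distributor and router behaviors follow from the partial derivatives of the combined output with respect to each path output. The sketch below spells them out for scalar path outputs; it is an illustration under stated assumptions rather than the paper's derivation, and the function names, the tie-breaking rule for Max and the example values are hypothetical.

```python
def local_gradients(method, p_true, p_indet):
    """Partial derivatives of the combined output w.r.t. each path output."""
    if method == "product":  # p = p_true * p_indet
        return p_indet, p_true  # each path receives the other path's output
    if method == "mean":  # p = (p_true + p_indet) / 2
        return 0.5, 0.5  # the error is split equally between the paths
    if method == "max":  # p = max(p_true, p_indet); ties go to the T-path here
        return (1.0, 0.0) if p_true >= p_indet else (0.0, 1.0)
    raise ValueError(f"unknown combination method: {method}")


def mean_error(label, p_true, p_indet):
    """Error e = L - p when p is the mean of the two path outputs."""
    return label - (p_true + p_indet) / 2


print(local_gradients("product", 0.8, 0.3))  # (0.3, 0.8)
print(local_gradients("mean", 0.8, 0.3))     # (0.5, 0.5)
print(local_gradients("max", 0.8, 0.3))      # (1.0, 0.0)

# Scalar outputs for a sample whose label is 1.0 (hypothetical values).
print(mean_error(1.0, 0.9, 0.9))  # both paths close to the label: small error
print(mean_error(1.0, 0.9, 0.1))  # one path far from the label: larger error
```

Under Max, the losing path receives a zero gradient, which is why a path that keeps winning can end up being the only one that is trained.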
Conclusion
In this work, an NCNN model with two parallel paths and data indeterminacy handling in the NS domain was proposed for the image classification task. In the proposed scheme, images were transformed to the NS domain, and the T and I sets were then used as inputs of the T-path and I-path in NCNN. The outputs of the two paths were combined to produce the final output. In the proposed structure, the two paths are trained simultaneously, and each path helps the other in updating weights in the backpropagation steps. Therefore, the path combination is not a combination of previously fine-tuned structures. Also, the computational cost analysis showed that NCNN does not overfit in the training phase as quickly as a conventional CNN, although the cost of NCNN in each training epoch is higher than that of CNN. The effectiveness of NCNN in handling noisy data with higher indeterminacy was analyzed mathematically to show how weights are updated in each path. NCNN was further evaluated on three different datasets contaminated with 20 levels of Gaussian noise. Results showed that NCNN significantly outperforms CNN models on noisy test data. Future efforts will be directed toward using the proposed model in other applications, such as noisy speech recognition, by defining indeterminacy for noisy speech spectrograms in the NS domain. Finally, using the proposed model in other deep neural networks such as LSTM can be considered as future work.