Literature DB >> 35756151

Uncertainty handling in convolutional neural networks.

Elyas Rashno¹, Ahmad Akbari¹, Babak Nasersharif².

Abstract

The performance of convolutional neural networks is degraded by noisy data, especially in the test phase. To address this challenge, a new convolutional neural network structure with data indeterminacy handling in the neutrosophic (NS) domain, named as Neutrosophic Convolutional Neural Networks, is proposed for image classification. For this task, images are firstly mapped from the pixel domain to three sets true (T), indeterminacy (I) and false (F) in NS domain by the proposed method. Then, NCNN with two parallel paths, one with the input of T and another with I, is constructed followed by an appropriate combination of paths to generate the final output. Here, two paths are trained simultaneously, and neural network weights are updated using back propagation algorithm. The effectiveness of NCNN to handle noisy data is analyzed mathematically in terms of the weights update rule. Proposed two paths NS idea is applied to two basic models: CNN and VGG-Net to construct NCNN and NVGG-Net, respectively. The proposed method has been evaluated on MNIST, CIFAR-10 and CIFAR-100 datasets contaminated with 20 levels of Gaussian noise. Results show that two-path NCNN outperforms CNN by 5.11% and 2.21% in 5 pairs (training, test) with different levels of noise on MNIST and CIFAR-10 datasets, respectively. Finally, NVGG-Net increases the accuracy by 3.09% and 2.57% compared to VGG-Net on CIFAR-10 and CIFAR-100 datasets, respectively.

Entities: Chemical

Keywords: Convolutional neural network; Data indeterminacy; Image classification; Neutrosophic theory

Year: 2022 PMID： 35756151 PMCID： PMC9206226 DOI： 10.1007/s00521-022-07313-2

Source DB: PubMed Journal: Neural Comput Appl ISSN： 0941-0643 Impact factor: 5.102

Introduction

Although crisp sets with 0 and 1 membership degrees were developed into fuzzy sets with continuous membership degrees, the uncertainty of each data point is not considered and described in the classical fuzzy set. In each application, uncertainty is a concept that indicates how membership degree is described for each set. The traditional fuzzy set describes the membership degree with a real number [1]. In this situation, the uncertainty about itself is not considered and handled [2]. In systems such as expert systems, information fusion, and belief systems, the truth-membership supported by the evidence can not solve and model the related problems, alone. So, the classical fuzzy set cannot solve these problems as well [2]. To achieve a complete model, falsity and indeterminacy memberships should be considered. Neutrosophic (NS) set is an extension of the fuzzy set that attempts to solve this problem by considering the truth, indeterminacy, and falsity memberships. Where the truth and indeterminacy memberships can be considered independently [3].

Literature review

Neutrosophy theory, proposed by Smarandache in 1995 [4], is a branch of philosophy which studies the nature and scope of the neutralities and their interactions with different ideational spectra which is the basis of neutrosophic logic and set [5, 6]. This theory has been applied for image processing first by Guo et. al [3], and then, it has been successfully used for other image processing domains including image segmentation [7-10], image thresholding [11], image edge detection [12], retinal image analysis [13-21], liver image analysis [22, 23], breast ultrasound image analysis [24], data classification [25], uncertainty handling [26, 27], data clustering [28-30], content-based image retrieval [31, 32] and skeletal muscle analysis [33, 34]. Today, deep learning methods have been widely used for feature extraction, feature compression, and classification. Over the past few years, convolutional neural networks (CNNs), introduced as one of deep learning approaches, have significantly improved classification accuracy and reduced processing costs [35, 36]. CNNs use variations of multilayer perceptron such as shift invariant or space invariant artificial neural networks (SIANN) to analyze visual imagery automatically [37]. Different CNN architectures have been proposed as reference models for image processing and machine vision tasks. An eight-layer CNN named as Alex-Net has been proposed in which CNN features were visualized [38]. In addition, two proposed CNN models including a very deep convolutional network (VGG-Net) [39] and GoogLe-Net [40] have been proposed by Oxford University’s Visual Geometry Group and Google Inc., respectively. CNNs have been applied in many application such as fluid segmentation [41-43], coronavirus disease (covid-19) detection [44], and vegetation remote sensing [45]. Any prediction or classification by CNN may be associated with a degree of uncertainty. There are two common types of uncertainty which stem from the uncertainty in the training dataset and uncertainty in the model structure [46-49]. If uncertainty is not handled, it may lead to disastrous consequences. For example, in May 2016, the perception system confused the white side of a trailer and the bright sky which caused the first fatality from an assisted driving system [50]. Also, an image classification system for racial discrimination erroneously identified two African Americans as gorillas [51]. In these examples, if systems were able to handle errors by assigning a high level of uncertainty to erroneous predictions, better decisions were achieved and disaster could be avoided. Uncertainty handling has been discussed in CNN structures. Quantify uncertainty in vision tasks was modeled in CNN by Kendall and Gal [52]. Considering uncertainty in the convolution layer was proposed in [53-55]. Also, uncertainty was processed in FC layers and convolution layers; e.g., network morphism; by Wei Ma and Jun Lu [56, 57]. Consideration of uncertainty in CNN layers has some issues such as needing more memory storage and more computation. Some state-of-the art methods use Monte Carlo dropout sampling to extract model uncertainty in CNN [58]. CNNs have been used in NS domain in the literature. In [59], a model for skin dermoscopic was proposed based on NS multiple deep CNNs. Then, NS similarity score (NSS) was applied to determine the number of training set for each epoch during the training process. In [60], a hybrid method using NS and CNNs was introduced for the classification of tumor region areas in MRI images as benign and malignant. For this task, MRI images were segmented using the neutrosophic set expert maximum fuzzy sure entropy (NS-EMFSE) approach. Feature vectors were extracted from segmented brain images and CNNs and classified using SVM and KNN classifiers. Effects of NS on deep transfer learning models Alexnet, Googlenet, and Restnet18 were investigated to categorize X-ray images as COVID-19, normal, pneumonia bacterial, and pneumonia virus [61]. Breast tumor segmentation was formulated as a classification problem in the NS domain for removing speckle noise and enhancing images contrast. The similarity set score and homogeneity value for each pixel were calculated in the NS domain, and then, seed regions are selected by an adaptive Otsu-based thresholding method and morphology operations. Finally, a deep CNN, based on VGG-16 network, was applied for false-positive rate reduction [62]. The proposed scheme in this research, named as neutrosophic convolution neural network (NCNN), is a CNN model with uncertainty handling in NS domain for image classification. This model can be easily adapted for other applications such as speech processing [63]. In the first step, images are mapped to three sets True (T), Indeterminacy (I) and False (F) in NS domain. The proposed definition for data indeterminacy leads to more highlighted noisy pixels in comparison with conventional NS transformation. Then, T and I sets in NS domain are presented as inputs of the proposed two-path NCNN followed by the combination of the outputs of two paths to achieve the final output. In fact, one path is trained with T set and another with I set. It has been shown that when clean and noisy data points with the same label are fed to NCNN, the difference between weight updates for these training samples is not significant. Also, back-propagation equations are computed for each path to show its difference with conventional one-path CNN and it is shown that NCNN is more robust against noisy images in comparison with conventional CNN. Key differences between the proposed NCNN and other CNNs are based on NS. Despite [59-62] which use NS for training set determination in NCNN, indeterminacy set in NS domain is used as a part of CNN training set process. It means that information provided by this set affects weights of neurons in back-propagation process and makes CNNs more robust against noisy samples. It is worth mentioning that path combination in NCNN is different from conventional combination of current CNNs in which each CNN is considered as an independent path. In NCNN, two paths are trained simultaneously, and each one affects another for updating weights. As it will be concluded from experiments and mathematical analysis, NCNN handles noisy data with higher indeterminacy and it is not overfitted in the training phase as quickly as conventional CNNs. The rest of this paper is organized as follows: Sect. 2 presents a review on NS sets. The proposed method is discussed in Sect. 3. Experimental results are reported in Sect. 4. Finally, this work is discussed and concluded in Sects. 5 and 6, respectively.

Neutrosophic set

NS is an extension of the fuzzy set in which data indeterminacy is considered in addition to truth membership degree. In fact, for each data point, the confidence of assigning truth membership degrees is also considered. Suppose that there are two observers which are going to detect a flower in a picture. The first one assigns 0.8 truth membership degree for the existence of flower, while the second one assigns 0.5. At a higher level, the confidence for the observer’s decision is also considered. Therefore, if these confidences for first and second observers are 0.3 and 0.9, respectively, the first observer affects the final decision with more weight. In NS, the confidence level for each observer is considered in the indeterminacy set. These are the main differences between NS and fuzzy sets. Generally, considering set A in NS, each member x in A is denoted by three real subsets: true, false and indeterminacy in the interval [0, 1] referred as T, F and I, respectively. Each element is expressed as x(t, i, f) which means that it is t% true, i% indeterminate, and f% false. In each application, domain experts propose the concept behind true, false and indeterminacy. To use NS in the image processing domain, each pixel is considered as a data point and should be mapped to NS sets. The first method for this mapping was proposed by Guo [3]. Mapping methods completely depend on the image processing application. An image g in pixel domain is represented with three subsets: T, I and F in NS domain. Therefore, pixel p(i, j) in g is shown with or PNS(t, i, f). T, I and F indicate white, noise and black pixel sets, respectively. PNS(t, i, f) provides useful information about white, noisy and black percentages in this pixel that is t% to be a white pixel, i% to be a noisy pixel and f% to be a black pixel. T, I and F are computed as follows [2, 3].where g is grayscale image, is filtered image g with mean filter, w is window size for mean filter, and and are the maximum and minimum of the , respectively. is the absolute difference between g and , and and are the maximum and minimum values of , respectively.

Proposed method

The main contributions of this work are first proposing data indeterminacy concept as well as the truth set in NS domain and then using these concepts in CNN models which leads to a new two-path network. It is theoretically and experimentally proved that the proposed CNN structure handles noisy and outlier data points more efficiently than conventional CNN models and converges quicker. Mentioned contributions are discussed in the following sections.

Data indeterminacy in NS domain

In the proposed scheme, noisy data points are modeled as high indeterminacy data. This is the first and the most important step since it should interpret noisy pixels for CNN model correctly. Therefore, in the proposed definition for data indeterminacy, it is expected that a high indeterminacy is assigned to noisy pixels and this concept is used for the proposed CNN structure. The proposed method for pixel indeterminacy is summarized in Eqs.(6)–(9):where is the input image convolved by the mean filter, and and represent the mean of pixels in and , respectively. The reason for dividing by mean is that pixels with higher indeterminacy are revealed brighter. Therefore, as shown in Fig. 1, edges and noisy pixels are appeared brighter in comparison with the basic NS operator.

Fig. 1

Proposed pixel Indeterminacy: a Input image, b Indeterminacy matrix computed by basic NS and c Indeterminacy matrix computed by the proposed method

Proposed network structure

Presenting indeterminacy (I) and truth (T) sets for CNN structures leads to a two-parallel-path network referred as Neutrosophic CNN. Indeterminacy and truth membership degrees are trained in the first (I-path) and second path (T-path), respectively. The outputs of these paths are combined to compute the final label predicted by the network. The proposed structure is illustrated in Figs. 2 and 3.

Fig. 2

The proposed NCNN model

Fig. 3

Network weights for NCNN

The proposed NCNN model It may be worth mentioning that two paths are trained simultaneously in contrast with two-path structures in which each path is trained separately, and then, frozen weights are combined in the final step. In fact, in the first epoch of training, weights in two paths are updated simultaneously by multiplying the outputs of paths which leads to gradient switching. In this case, paths affect and help each other resulting in robust weight updates.

Network training

Suppose that and are two weights in I-path and T-path, respectively, shown in Fig. 3. It is clear from the figure that these weights are multiplied by the neurons in the previous layer to construct I and T. Note that I and T are the predicted labels by I-path and T-path networks, respectively. These labels are combined (multiplied) to compute the final label f (see Fig. 3). Network weights for NCNN To prove how NCNN handles noisy data points with the indeterminacy concept, we discuss how network weights are updated for noisy and clean data points. Here, we explain the weight update for T-path. The scenario for I-path is the same. Each weight in T-path network is updated by the general update rule as follow:where is computed based on the neural networks update rule in Eq. (11)Applying chain rule to compute leads to:Based on network structure in Fig. 3, f and T are computed by Eq. (13):In addition, is learning rate. Therefore,By substituting (14) in (12):To compute , chain rule is used again which tends to:In Eq. (16), p is the final network output, e represents difference between data label L and predicted label p. E is sum squared error. These parameters are defined by Eq. (17):Applying derivation rules leads to Eq. (18):Therefore, is computed as follow:By substituting (19) in (15), the update rule is done by:It can be concluded from these equations that weight updates for neurons in T-path are affected by I subset in NS domain and vice versa.

Network behavior analysis

Finally, in this section, it is explained how the proposed idea for NCNN leads to better results in comparison with CNN. The main difference between the weight update rule in NCNN and conventional CNN is the parameter I. To find out the effect of I in computation, Eq. (21) is extended to Eq. (22) as follow:here there are two possible class labels 0 and 1 for L. Therefore, separated values in Eq. (23) can be considered in Eq. (22):Since T and I are the output of softmax layer, these parameters are also 0 and 1, therefore:Equation (24) in NCNN is compared with Eq. (25) in conventional CNN which is as follows:One of the main challenges of CNN is its sensitiveness against noisy data. It means that when clean data are fed as input and its label is predicted correctly, network weights are updated slightly. If the same data (with the same label) with noise are fed to the network, it tries to update weights significantly. It means that in CNN, weight update for clean and noisy data with the same label is significantly different. This behavior misleads the network for noisy data. Here, it is shown that NCNN handles noisy data and does not update weights for clean and noisy data with the same label differently. To compare the amount of weight updates for clean and noisy data in NCNN and CNN, if true label 1 is considered for noisy and clean data, weight updates for clean and noisy data in CNN are and , respectively. Therefore, the difference between weight updates for clean and noisy data in CNN is:Also, weights update for clean and noisy data in NCNN is and , respectively. The difference between weight update for clean and noisy data in NCNN is as follows:If it is proved that (27) is less than (26), it is equivalent to this fact that NCNN does not make a difference in weight update for clean and noisy data in comparison with CNN. It means that NCNN is more robust against noisy data. It is clear that for a data point with label 1, and are bigger than and . Since is a negative value, to show that (27) < (26), it is enough to show that:As it will be shown numerically in the discussion section, if the I-path and T-path networks are trained separately, they almost predict the same label. Therefore, elements in pair and as well as pair and have near values. Note that in each image, I-path learns labels from noisy pixels and pixels with a high gradient, while T-path learns labels from all pixels in the image. Therefore, in NCNN, both noisy pixels (with a high indeterminacy) and clean pixels are used to predict the label. From the result of Table 8 in the Discussion section, it can be concluded that:It is worth mentioning that inequality (30) does not satisfy when . Clean data have high values for and . Therefore, inequality (30) is not satisfied for clean data. Therefore, in clean datasets, NCNN not only does not improve the performance but also has lower accuracy in comparison with CNN. For noisy datasets, and are far from each other and leads to values near to zero. Therefore, inequality (30) is almost satisfied for noisy data. As experimental results will also show, this is the main reason that NCNN handles noisy data points and has a very good performance for noisy data classification.

Table 8

Correlation between the outputs of two paths

Dataset	MNIST		CIFAR-10		CIFAR-100
Dataset	Clean	Noisy	Clean	Noisy	Clean	Noisy
T-path output	98.97	61	92.32	57.66	69.39	50.94
I-path output	98.52	60.55	90.01	57.72	68.91	50.36

Experimental results

Dataset

NCNN model is proposed for the image classification task. Therefore, to show the effectiveness of NCNN, it has been evaluated on MNIST, CIFAR-10 and CIFAR-100 datasets. To evaluate the robustness of NCNN and other models against noise, Gaussian noise with different means and standard deviations is added to images. For each dataset in each model, 5 levels of noise are considered. Since we evaluate 2 models and 2 datasets for each model, a total of 20 noise levels are used. In this section, each dataset is described briefly.

MNIST

The first dataset is MNIST (Modified National Institute of Standards and Technology database), which is a common dataset used for evaluating various image processing and machine learning methods. This dataset contains 70,000 images of handwritten digits divided into 60,000 training images and 10,000 testing images. All images in MNIST were normalized to fit into a 28 28 pixel bounding box and anti-aliased in grayscale levels [64].

CIFAR-10

The second dataset is CIFAR-10 (Canadian Institute For Advanced Research) which is the most widely used for research in machine learning and computer vision algorithms. The CIFAR-10 dataset contains 60,000 color images with a dimension of 32 32 in 10 different classes including airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks with 6000 images in each class. The main purpose of CIFAR-10 is to teach a computer how to recognize objects. This dataset allows researchers to quickly try different machine learning and machine vision algorithms since images are in low-resolution (32 32). Different models of CNNs have tried to achieve the best accuracy in recognizing CIFAR-10 images [65].

CIFAR-100

CIFAR-100 is the same as CIFAR-10 except for the number of classes and samples in each class. CIFAR-100 contains 100 classes with 600 samples in each class. The number of training and testing samples in each class is 500 and 100, respectively. Classes are further divided into 20 superclasses. Therefore, each sample is assigned to “fine” and “coarse” labels which are the label of class and superclass, respectively. Classes are aquatic mammals, fish, flowers, food containers, fruit and vegetables, household electrical devices, household furniture, insects, large carnivores, large man-made outdoor things, large natural outdoor scenes, large omnivores and herbivores, medium-sized mammals, non-insect invertebrates, people, reptiles, small mammals, trees, vehicles 1 and vehicles 2 [66].

CNN models

NCNN is a structure that can be applied to any CNN model. It means that CNN models with a different number of layers can be placed in NCNN structure. In this section, we want to show how NCNN can be applied to any CNN model to improve its robustness against noisy data with higher indeterminacy. For this task, two basic CNN models are considered. The first one is the proposed 11-layer CNN model, and the second one is the reference model VGG-net. Applying NS-based two parallel paths idea to the first and second mentioned models constructs NCNN and NVGG-net, respectively. NCNN and CNN have been applied to MNIST and CIFAR-10 datasets. VGG-net and NVGG-net are evaluated on CIFAR-10 and CIFAR-100 datasets. All CNN structures have been implemented with python and TensorFlow libraries installed in Linux operating system on a machine with 3.26 GHz Corei7 CPU, 32 GB of DDR4 RAM and GeForece 1080ti 8GB RAM GPU.

The first model

The first proposed model is an 11-layer CNN shown in Fig. 4. In this structure, “5 × 5 conv, 64” means convolution layer with 64 filters in dimension 5 5. “3 3 Pooling” means polling layer with a 3 3 filter, and “FC, 384” means fully connected layer with 384 neurons. In each convolution layer, the step size is 1 1; therefore, the first and second dimensions of the input image remain fixed, while the third layer may be changed depending on the number of convolution layers. In the pooling layer, since the step size is 2 2, the first and second dimensions are reduced by half. Finally, there are three fully connected layers with different numbers of neurons shown in Fig. 4. After each convolution and pooling layer, activation function “RELU” is used.

Fig. 4

The first proposed CNN model

The first proposed CNN model NCNN and CNN have been evaluated on MNIST and CIFAR-10 datasets. In each experiment, a level of Gaussian noise is considered which makes different combinations of training and test sets with noise levels including “clean,” “NT1: Mean = 0.8, Std = 0.6,” “NT2: Mean = 0.6, Std = 0.9,” “NT3: Mean = 0.7, Std = 1.2” and “NT4: Mean = 0.8, Std = 1.6.” Therefore, from each dataset, we obtain 25 datasets. The first dataset is the original dataset with clean training and clean test data. The 25th dataset is a dataset in which training and test data are added with NT4 noise. The classification accuracies on MNIST test data are reported in Table 1. In all tables, vertical and horizontal data types are used for training and test sets, respectively.

Table 1

Results of CNN and NCNN for MNIST

		Clean	NT1	NT2	NT3	NT4	Average
Clean	CNN	98.97	49.51	35.23	24.27	18.63	45.32
Clean	NCNN	98.47	48.83	43.4	32.84	28.65	50.43
NT1	CNN	97.16	93.96	80.67	63.07	46.2	76.21
NT1	NCNN	97.9	94.86	85.85	70.3	54.88	80.75
NT2	CNN	96.58	94.41	87.95	75.36	57.31	82.32
NT2	NCNN	95.66	94.52	88.21	75.41	59.01	82.56
NT3	CNN	94.72	89.16	86.32	76.99	61.78	81.79
NT3	NCNN	90.17	92.6	87.39	76.85	62.26	81.85
NT4	CNN	90.24	90.05	84.09	74.87	61	80.05
NT4	NCNN	88.85	91.28	85.69	76.06	64.83	81.34
Average	CNN	95.534	83.418	74.852	62.912	48.984	–
Average	NCNN	94.21	84.418	78.108	66.292	53.926	–

Results of CNN and NCNN for MNIST It can be concluded from the reported results in Table 1 (where it is also shown mathematically in Sect. 3) that if both training and test data are clean, CNN outperforms NCNN slightly. The best performance of NCNN is revealed in cases which noise is presented either in training or test data. For example, when training data are clean and test data are NT4, NCNN outperforms CNN with 10.02%. This behavior of NCNN is shown in Fig. 5. It can be seen that the higher amount of noise, the better improvement in NCNN (in comparison with CNN) is achieved. These results verify the main advantage of NCNN for uncertainty handling in noisy data.

Fig. 5

NCNN and CNN accuracies with clean training and noisy test data for MNIST

NCNN and CNN accuracies with clean training and noisy test data for MNIST Another conclusion from reported results on MNIST dataset is that training network with noisy data makes it more robust against noisy data in the test phase. Figure 6 illustrates the results of CNN and NCNN with NT3 training data and all noise levels for test data. In this experiment, similar to Fig. 5, applying more noise (NT3) in test data leads to a better improvement (3%) in the results of NCNN in comparison with CNN. When clean data are considered in the test phase, CNN outperforms NCNN by about 1%.

Fig. 6

NCNN and CNN accuracies with noisy training and noisy test data for MNIST

NCNN and CNN accuracies with noisy training and noisy test data for MNIST CIFAR-10 is the second dataset considered for evaluation of the first model. By applying 5 models of data including “Clean,” “NT5: Mean = 0.1, Std = 0.1,” “NT6: Mean = 0.4, Std = 0.2,” “NT7: Mean = 0.2, Std = 0.4” and “NT8: Mean = 0.3, Std = 0.6,” 25 datasets are created. The classification accuracies on test data for CIFAR-10 are reported in Table 2. Here, each column shows experiments with the same training data and test data with different levels of noise, while each row is for the training data with different noise levels and the same test data. In CIFAR-10, the best improvement of NCNN (3.74%) appeared in the case that training and test data are contaminated with NT7 noise.

Table 2

Results of CNN and NCNN for CIFAR-10

		Clean	NT5	NT6	NT7	NT8	Average
Clean	CNN	71.35	63.13	45.94	23.61	16.84	44.174
	NCNN	72.76	65.12	49.39	22.73	19.23	45.846
NT5	CNN	68.56	67.3	58.29	35.41	23.17	50.546
	NCNN	71.06	70.68	61.39	36.91	22.27	52.462
NT6	CNN	62.67	62.78	61.59	40.29	29.04	51.274
	NCNN	64.08	64.75	63.83	43.33	28.83	52.964
NT7	CNN	51.72	49.08	36.06	51.24	43.48	46.316
	NCNN	53.77	51.45	36.38	54.98	46.03	48.522
NT8	CNN	41.77	39.45	23.96	48.22	45.5	39.78
	NCNN	44.77	41.46	24.08	48.46	48.42	41.438
Average	CNN	59.214	56.348	45.168	39.754	31.606	–
	NCNN	61.288	58.692	47.014	41.282	32.956	–

Results of CNN and NCNN for CIFAR-10 The two columns with “Clean” header report the classification accuracies of NCNN and CNN with the clean training data and test data with all noise levels. These classification accuracies are depicted in Fig. 7. For these experiments, NCNN outperforms CNN by 3.45% for test data with NT6 noise. In CIFAR-10 dataset, the trend is the same with MNIST with a slower slope. It means that MNIST is more affected by noisy test data in comparison with MNIST.

Fig. 7

NCNN and CNN accuracies with clean training and noisy test data for CIFAR-10

NCNN and CNN accuracies with clean training and noisy test data for CIFAR-10 Evaluation of NCNN and CNN with NT7 training data and different noise levels in test data is shown in Fig. 8. In these cases, instead of a down trend toward increasing noise, the trend is upward after NT6 which means the similarity between noise applied to training and test data.

Fig. 8

NCNN and CNN accuracies with noisy training and noisy test data for CIFAR-10

NCNN and CNN accuracies with noisy training and noisy test data for CIFAR-10 Generally, as it is illustrated in Figs. 9, 10, 11 and 12, if the noise is increased from NT5 to NT8, the accuracies of both networks are decreased with negative slopes in all charts. In cases where the noise level in training and test is the same, not only there is no significant decrement in the accuracy, but also there is an improvement in some cases. For example, in Fig. 10, since the noise level in training is NT6, the accuracy is not decreased noticeably when the noise level is increased from NT5 to NT6. In Fig. 11 which represents training data with noise level NT7, when noise is increased from NT6 to NT7, the accuracy is improved significantly.

Fig. 9

NCNN and CNN accuracies for NT5 training data

Fig. 10

NCNN and CNN accuracies for NT6 training data

Fig. 11

NCNN and CNN accuracies for NT7 training data

Fig. 12

NCNN and CNN accuracies for NT8 training data

NCNN and CNN accuracies for NT5 training data NCNN and CNN accuracies for NT6 training data NCNN and CNN accuracies for NT7 training data NCNN and CNN accuracies for NT8 training data

The second model (VGG-Net)

The second base model is VGG-Net [39] as illustrated in Fig. 13. This model has 16 layers with “RELU” activation functions. After convolution and fully connected layers, batch normalization is used. For convolution layers which are not followed by pooling layers, dropout is considered with the coefficients of 0.4 and 0.3. Also, in all fully connected layers, except the last one, dropout with the coefficient of 0.5 is used. In convolution layers, since the step size is 1x1, the dimensions of images before and after passing these layers are the same, while in the dropout layer, they are decreased by half because of step size 2x2. Applying NS-based two parallel paths to VGG-Net creates NVGG-Net.

Fig. 13

VGG-Net structure

VGG-Net structure VGG-Net and NVGG-Net have been evaluated on CIFAR-10 and CIFAR-100 datasets. In CIFAR-10, 5 levels of noise including “Clean,” “NT9: Mean =0.2, Std=0.1,” “NT10: Mean = 0.1, Std = 0.2,” “NT11: Mean = 0.2, Std = 0.3” and “NT12: Mean = 0.3, Std = 0.5” are considered which can be applied to either training or test data. Table 3 reports the results of VGG-Net and NVGG-Net for CIFAR-10. NVGG-Net achieves better classification accuracy rather than VGG-Net except for pairs (Clean, NT10), (NT9, NT11), (NT11, NT10), (NT11, Clean) and (NT11, NT11) out of all 25 pairs for (Training, Test). The best average improvement of 3.09% was obtained by NVGG-Net in cases with NT12 test set and noisy training sets (with all levels of noise). In these cases, an improvement of 4.9% was achieved by NVGG-Net in pair (Clean, NT12).

Table 3

Results of VGG-Net and NVGG-Net for CIFAR-10

		Clean	NT9	NT10	NT11	NT12	Average
Clean	VGG-Net	92.32	75.28	54.18	43.6	30.54	59.18
	NVGG-Net	92.52	78.22	53.9	47.94	35.44	61.6
NT9	VGG-Net	88.31	84.67	72.32	59.64	42.54	69.5
	NVGG-Net	88.53	86.15	73.62	59.13	44.04	70.29
NT10	VGG-Net	83.6	81.8	76.76	67.57	52.23	72.39
	NVGG-Net	84.83	82.48	78.56	69.38	55.09	74.07
NT11	VGG-Net	79.14	77.93	75	68.83	56.18	71.42
	NVGG-Net	78.12	79.44	74.18	68.18	59.59	71.9
NT12	VGG-Net	69.82	69.47	68.32	65.16	57.66	66.09
	NVGG-Net	71.5	70.55	69.05	67.22	60.44	67.75
Average	VGG-Net	82.638	75.28	69.316	60.96	47.83	–
	NVGG-Net	83.1	78.22	69.86	62.37	50.92	–

Results of VGG-Net and NVGG-Net for CIFAR-10 Figures 14, 15, 16, 17 and 18 show how networks are affected by changing training and test data and can be very useful. It can be that NVGG-Net outperforms VGG-Net in the majority of cases especially when the noise in the test set is NT12. In both networks, if the network is trained with more noisy data, it is more robust against increasing noise in test data. Therefore, Fig. 14 for clean training data has the most descent in the slope, while Fig. 18 for NT12 training data is near to a horizontal line (the least descent in slope). Also, using the same noise in training and test data makes the network more robust.

Fig. 14

NVGG-Net and VGG-Net accuracies with Clean training data

Fig. 15

NVGG-Net and VGG-Net accuracies with NT9 training data

Fig. 16

NVGG-Net and VGG-Net accuracies with NT10 training data

Fig. 17

NVGG-Net and VGG-Net accuracies with NT11 training data

Fig. 18

NVGG-Net and VGG-Net accuracies with NT12 training data

NVGG-Net and VGG-Net accuracies with Clean training data NVGG-Net and VGG-Net accuracies with NT9 training data NVGG-Net and VGG-Net accuracies with NT10 training data NVGG-Net and VGG-Net accuracies with NT11 training data NVGG-Net and VGG-Net accuracies with NT12 training data For CIFAR-100, different noises were applied in cases “clean,” “NT13: Mean = 0.1, Std = 0.1,” “NT14: Mean = 0.2, Std = 0.1,” “NT15: Mean = 0.1, Std = 0.2” and “NT16: Mean = 0.2, Std = 0.2.” Table 4 reports the results of VGG-Net and NVGG-Net for CIFAR-100. The best improvement of 4.5% was obtained by NVGG-Net in pair (NT14,NT15). Also, the best average improvement of 2.57% was obtained by NVGG-Net in cases with NT16 test set and noisy training sets (with all levels of noise).

Table 4

Accuracies of VGG-Net and NVGG-Net for CIFAR-100

		Clean	NT13	NT14	NT15	NT16	Average
Clean	VGG-Net	69.39	56.8	56.05	46.06	46.03	54.87
	NVGG-Net	69.51	57.82	55.53	49.72	50.64	56.64
NT13	VGG-Net	63.12	59.67	59.38	50.2	50.64	56.6
	NVGG-Net	63.075	61.42	61.72	52.96	52.63	58.36
NT14	VGG-Net	63.93	59.2	59.19	49.75	50.98	56.61
	NVGG-Net	64.14	61.69	61.78	54.25	53.82	59.13
NT15	VGG-Net	57.43	55.68	55.51	50.52	51.3	54.09
	NVGG-Net	60.87	55.51	56.32	51.82	52.63	55.43
NT16	VGG-Net	57.1	55.54	55.92	50.85	50.94	54.07
	NVGG-Net	58.75	57.33	56.28	50.49	53.01	55.17
Average	VGG-Net	62.194	57.378	57.21	49.476	49.978	–
	NVGG-Net	63.269	58.754	58.326	51.848	52.546	–

Accuracies of VGG-Net and NVGG-Net for CIFAR-100 Finally, as it is shown in Fig. 19, networks training with the highest level of noise (NT16) does not make a significant change in the robustness of the networks. The reason is that the amount of noises added to CIFAR-100 is much less than those in CIFAR-10.

Fig. 19

NVGG-Net and VGG-Net accuracies with training data: (a) Clean and (b) NT4

Window size effect

Window size has an important effect on computing indeterminacy and true sets in image transformation to NS domain. In this section, the effect of window size is evaluated on MNIST dataset for the first model and CIFAR-10 and CIFAR-100 dataset for the second model. Classification accuracies with different window sizes are reported in Table 5. It can be concluded that for clean data, smaller window sizes have better performance, while for noisy data, bigger window sizes are preferred. The reason is that with a bigger window size, more general information around each pixel is considered in NS domain and the robustness against noisy pixels in local neighbors is increased. On the other hand, for clean data, we need only local information and we do not need general information around each pixel. Window size 3 3 has the best performance in all experiments in average.

Table 5

Classification accuracies with different window sizes

Dataset	MNIST			CIFAR-10			CIFAR-100
Window size	2 × 2	3 × 3	4 × 4	2 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times$$\end{document}× 2	3 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times$$\end{document}× 3	4 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times$$\end{document}× 4	2 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times$$\end{document}× 2	3 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times$$\end{document}× 3	4 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times$$\end{document}× 4
Clean	50.89	50.43	50.4	62.03	61.6	61.62	56.69	56.64	56.11
NT1	80.88	80.75	80.79	70.05	70.29	69.77	58.58	58.36	57.9
NT2	82.4	82.56	82.5	73.88	74.07	73.92	58.73	59.13	59.1
NT3	80.1	81.65	81.99	70.93	71.9	71.99	54.11	55.43	55.61
NT4	79.99	81.34	82.01	66.01	67.75	68.01	53.89	55.17	55.23
Average	74.85	75.34	75.53	68.58	69.12	69.06	56.4	56.94	56.79

Classification accuracies with different window sizes

Combination method effect

The proposed model for indeterminacy consideration in CNN is a two parallel paths network, one path with the input of indeterminacy set I and another with true set T. Combination method of two paths is very important. Here, three combination methods including Maximum, Mean and Product are assessed. Table 6 reports the classification accuracies with different combination methods applied to MNIST dataset for the first model and CIFAR-10 and CIFAR-100 dataset for the second model. In these experiments, window sizes are selected as 3 3. It is clear from the reported results that Product method has the best accuracy among other methods The effect of Mean and Maximum combination methods will be mathematically analyzed in the Discussion section.

Table 6

Classification accuracies with different combination methods

Dataset	MNIST			CIFAR-10			CIFAR-100
Dataset	Max	Mean	Prod	Max	Mean	Prod	Max	Mean	Prod
Clean	49.32	50.12	50.43	60.22	61.01	61.60	55.87	56.83	56.64
NT1	76.33	79.41	80.75	69.76	69.18	70.29	57.68	57.19	58.36
NT2	81.00	81.92	82.56	73.10	73.55	74.07	58.19	58.37	59.13
NT3	80.40	81.77	81.65	69.94	69.22	71.90	54.78	55.10	55.43
NT4	79.33	81.10	81.34	66.10	67.45	67.75	54.24	53.72	55.17
Average	73.27	74.86	75.34	67.82	68.08	69.12	56.15	56.24	56.94

Classification accuracies with different combination methods

Training methods

The last experiments are performed to evaluate training methods. There are two options to train parallel paths in the network. The first one is an ensemble method in which two paths are trained separately and then their outputs are combined. In the second method, two paths are trained simultaneously. The first and second methods are called “Separate Training” and “Joint Training,” respectively. In these experiments, window size 3 3 and Product combination methods are considered. As reported in Table 7, the second training method has better classification accuracy with the 2.13% of improvement on CIFAR-100 dataset. Therefore, we used the second training in this research.

Table 7

Classification accuracies with different training methods

Dataset	MNIST		CIFAR-10		CIFAR-100
Training method	Separate	Joint	Separate	Joint	Separate	Joint
Clean	50.12	50.43	60.99	61.80	55.32	58.64
NT1	79.32	80.75	70.01	70.29	56.75	58.36
NT2	81.02	82.56	73.12	74.07	58.22	59.13
NT3	79.98	81.65	70.27	71.90	54.68	57.43
NT4	80.11	81.34	65.91	67.75	53.11	55.17
Average	74.11	75.34	68.06	69.16	55.61	57.74

Classification accuracies with different training methods The source code of NCNN and other CNN models used in this research as well as three datasets in clean and noisy formats will publicly available online as supplementary material in the journal website and Github after the publication of the paper.

Computational cost

In this section, the computational cost of CNN and NCNN models is analyzed. For this task three datasets CIFAR-Clean, CIFAR10-NT6 and CIFAR10-NT8 are selected for cost evaluation. All models have been implemented in python programming language with TensorFlow and Pandas libraries with NDIVIA 1080ti graphical processing unit (GPU), 32G RAM DDR4, corei7 CPU and 500G hard drive SSD. Time costs for running 1 patch from all 10000 data points in test set for CNN and NCNN are 1.18 and 2.08 seconds, respectively. Since NCNN uses two paths T-path and I-path, the number of parameters (weights) is as twice as the number of parameters in CNN with one path, so NCNN training time is more than CNN one. The number of iterations as well as the accuracy of each iteration in the training phase is shown in Figs. 20, 21 and 22 for CIFAR-Clean, CIFAR10-NT6 and CIFAR10-NT8 datasets, respectively. As it is clear from these figures, CNN is overfitted in iteration 4000, while NCNN leads to better results in the training phase which can be considered as an advantage for NCNN.

Fig. 20

Accuracy versus number of iterations in CIFAR-Clean dataset

Fig. 21

Number of iteration versus accuracy in CIFAR-10-NT6 dataset

Fig. 22

Number of iteration versus accuracy in CIFAR-10-NT8 dataset

Accuracy versus number of iterations in CIFAR-Clean dataset Number of iteration versus accuracy in CIFAR-10-NT6 dataset Number of iteration versus accuracy in CIFAR-10-NT8 dataset

Discussion

To prove inequalities (29)–(30), it was supposed that the output of networks in the first and second paths is highly correlated. Note that the input of the first and second paths is true and indeterminacy sets, respectively. It means that the label of data can be learned either from its true set or indeterminacy set. This behavior is very useful especially for noisy data in which the label is also learned from noisy pixels with higher indeterminacy. To support this consideration experiments have been done as follows. First, MNIST, CIFAR-10 and CIFAR-100 with the highest level of noise NT4 in both training and test sets are considered. Then, the I-path ad T-path in NCNN and NVGG-Net are separated, and the outputs of each path are reported in Table 8. As it is clear from Table 8, the output of two networks is highly correlated. Correlation between the outputs of two paths It may be worth mentioning that in backpropagation, if the product combination method is used, gradient switcher is generated. It means that in two parallel paths, backpropagated error of each path is switched to another one. For Mean combination method, gradient distributor is constructed in backpropagation in which backpropagated error is distributed equally in two paths. Here, the weights update rule is investigated for Mean combination method:Therefore,By substituting (32) in (12):To compute , chain rule is used again which reaches to:In Eq. (34), p is the final output of the network, e represents the difference between data label L and predicted label p. E is the sum square error. These parameters are defined as follows:Applying derivation rule:Therefore, is computed as follow:By substituting (37) in (33), the update rule is performed by:The main difference in the weight update equation of NCNN with Mean combination method and CNN is that is error term e. In this case, and . Therefore, error e is affected by both T and I sets. This weights update rule can be interpreted as: The bigger difference between predicted labels by two parallel paths (T-path and I-path) and output (real) label leads to bigger amounts of updates for weights. If the predicted label by either T-path or I-path is almost the same with the real output label, low amount of error is back propagated in comparison with CNN. Finally, if both T-path and I-path predict the label close to the real output label, the error is decreased significantly and weights are updated slightly. It means that NCNN considers the predicted output of both T-path and I-path for updating weights. Finally, if the Max combination method is selected, a gradient router is constructed which means that the error is back propagated in a path in which its output is maximum. The weights of another path are not updated. If a path wins repeatedly in the first interactions, this path will be updated in the next iterations and weights of another path are not updated. This behavior leads the network to work similar to one-path CNN with the input of T or I. This is the main issue of the Max method. As it can be seen from the reported results in Table 6, this combination method has lower accuracy compared with the Mean and Product combination methods.

Conclusion

In this work, a two parallel paths NCNN model with data indeterminacy handling was proposed in NS domain for image classification task. In the proposed scheme, images were transformed to NS domain, and then, T and I sets were placed as inputs of T-path and I-path in NCNN. The output of two paths was combined to make the final output. In the proposed structure, two paths are trained simultaneously, and each path helps another for updating weights in backpropagation steps. Therefore, path combination is not a combination of previously fine-tuned structures. Also, computational cost analysis showed that NCNN is not overfitted in the training phase as quick as conventional CNN. Although the cost of NCNN is more than CNN in each training epoch. The effectiveness of NCNN to handle noisy data with higher indeterminacy was analyzed mathematically to show how weights are updated in each path. NCNN was further evaluated contaminated on three different datasets contaminated with 20 levels of Gaussian noise. Results showed that NCNN outperforms CNN models significantly in noisy test data. Future efforts will be directed toward using the proposed model in other applications such as noisy speech recognition by proposing indeterminacy for noisy speech spectrogram in NS domain. Finally, using the proposed model in other deep neural networks such as LSTM can be considered as other future works.

14 in total

1. Confidence estimation methods for neural networks: a practical comparison.

Authors: G Papadopoulos; P J Edwards; A F Murray
Journal: IEEE Trans Neural Netw Date: 2001

2. An improved shift-invariant artificial neural network for computerized detection of clustered microcalcifications in digital mammograms.

Authors: W Zhang; K Doi; M L Giger; R M Nishikawa; R A Schmidt
Journal: Med Phys Date: 1996-04 Impact factor: 4.071

3. RETOUCH: The Retinal OCT Fluid Detection and Segmentation Benchmark and Challenge.

Authors: Hrvoje Bogunovic; Freerk Venhuizen; Sophie Klimscha; Stefanos Apostolopoulos; Alireza Bab-Hadiashar; Ulas Bagci; Mirza Faisal Beg; Loza Bekalo; Qiang Chen; Carlos Ciller; Karthik Gopinath; Amirali K Gostar; Kiwan Jeon; Zexuan Ji; Sung Ho Kang; Dara D Koozekanani; Donghuan Lu; Dustin Morley; Keshab K Parhi; Hyoung Suk Park; Abdolreza Rashno; Marinko Sarunic; Saad Shaikh; Jayanthi Sivaswamy; Ruwan Tennakoon; Shivin Yadav; Sandro De Zanet; Sebastian M Waldstein; Bianca S Gerendas; Caroline Klaver; Clara I Sanchez; Ursula Schmidt-Erfurth
Journal: IEEE Trans Med Imaging Date: 2019-02-26 Impact factor: 10.048

4. OCT Fluid Segmentation using Graph Shortest Path and Convolutional Neural Network^.

Authors: Abdolreza Rashno; Dara D Koozekanani; Keshab K Parhi
Journal: Annu Int Conf IEEE Eng Med Biol Soc Date: 2018-07

5. A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects.

Authors: Zewen Li; Fan Liu; Wenjie Yang; Shouheng Peng; Jun Zhou
Journal: IEEE Trans Neural Netw Learn Syst Date: 2021-06-10 Impact factor: 10.451

6. Combined endeavor of Neutrosophic Set and Chan-Vese model to extract accurate liver image from CT scan.

Authors: Sangeeta K Siri; Mrityunjaya V Latte
Journal: Comput Methods Programs Biomed Date: 2017-08-24 Impact factor: 5.428

7. Fully Automated Segmentation of Fluid/Cyst Regions in Optical Coherence Tomography Images With Diabetic Macular Edema Using Neutrosophic Sets and Graph Algorithms.

Authors: Abdolreza Rashno; Dara D Koozekanani; Paul M Drayna; Behzad Nazari; Saeed Sadri; Hossein Rabbani; Keshab K Parhi
Journal: IEEE Trans Biomed Eng Date: 2017-07-31 Impact factor: 4.538

8. A novel glomerular basement membrane segmentation using neutrsophic set and shearlet transform on microscopic images.

Authors: Yanhui Guo; Amira S Ashour; Baiqing Sun
Journal: Health Inf Sci Syst Date: 2017-11-09

9. Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks.

Authors: Ali Narin; Ceren Kaya; Ziynet Pamuk
Journal: Pattern Anal Appl Date: 2021-05-09 Impact factor: 2.580

10. Automated image segmentation method to analyse skeletal muscle cross section in exercise-induced regenerating myofibers.

Authors: Masoud Rahmati; Abdolreza Rashno
Journal: Sci Rep Date: 2021-10-29 Impact factor: 4.379