Literature DB >> 34847177

Fully automatic image colorization based on semantic segmentation technology.

Min Xu1, YouDong Ding1.   

Abstract

Aiming at problems of deep-learning-based image colorization algorithms, such as color bleeding and insufficient color, this paper converts the study of image colorization into the optimization of image semantic segmentation and proposes a fully automatic image colorization model based on semantic segmentation technology. Firstly, we use the encoder as the local feature extraction network and use VGG-16 as the global feature extraction network. These two parts do not interfere with each other, but they share the low-level features. Then, the first fusion module is constructed to merge local features and global features, and the fusion results are fed into the semantic segmentation network and the color prediction network respectively. Finally, the color prediction network obtains the semantic segmentation information of the image through the second fusion module and predicts the chrominance of the image based on it. Several sets of experiments show that the performance of our model improves steadily as more training data are provided. Even in some complex scenes, our model predicts reasonable colors and colors images correctly, and the output looks real and natural.


Year:  2021        PMID: 34847177      PMCID: PMC8631650          DOI: 10.1371/journal.pone.0259953

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Many fields, including old photo and old movie restoration, remote sensing imagery, and biomedical imaging, have a strong demand for image colorization technology. The goal of image colorization is to assign colors to each pixel of a grayscale image, and research on this subject remains active. The earliest work on this subject was by Markle [1], who colorized images of the moon with the help of computer-aided technologies, which attracted wide attention. In the past, the most commonly used methods for image colorization were the extension method based on local color [2, 3] and the color transfer method based on a reference image [4, 5]. The biggest advantage of the former is its interactivity and controllability: the user can color the target image according to their own intent, and it can achieve a good coloring effect even for target images with complex content. The disadvantage is that the algorithm places certain demands on the user's own color sensitivity and color matching skill. In addition, it is prone to problems such as color bleeding and boundary blurring when dealing with images with complex textures, so it is only suitable for application scenarios with low requirements on the accuracy of coloring at boundaries. The advantage of the latter is that the influence of human factors is eliminated in the coloring process and the result is relatively objective. Its limitation is that the coloring effect depends completely on the similarity between the reference image and the target image, so it is only suitable for colorizing images with a single hue or content. Image colorization with deep neural networks has gradually become a trend that replaces manual coloring. Compared with the above two methods, this end-to-end approach overcomes the limitation of human intervention, and it is natural, efficient and easy to operate.

The goal of this paper is to convert a grayscale image, or a color image without rich information, into a color image with clear details and clear colors, so as to improve the visual effect for users and facilitate subsequent image analysis. In theory, different objects in black-and-white images have different gray levels, so a neural network based on the gray values of images can roughly judge the color of an item. The result may not be very accurate, especially for pixels with similar gray levels; for example, grass may become blue and blue jeans may become green through the calculation of the neural network. In addition, color bleeding is a common problem in image colorization. Therefore, we need to give neural networks enough common sense to judge where the boundaries of objects are and what colors the items should be in different scenarios. Aiming at the above two problems, this paper proposes a fully automatic grayscale image colorization model based on semantic segmentation technology, and its colorization process is shown in Fig 1.
Fig 1

Colorization process of our algorithm.

In the dotted box, the corrected grayscale image is fed into the network model for color prediction of a* and b*.

The contributions of this paper include: (1) histogram equalization effectively improves the visual effect and the colorfulness of overexposed and underexposed images; (2) the introduction of semantic segmentation network accelerates the edge convergence of the image and improves the positioning accuracy of the algorithm, and solves the problem of color bleeding; (3) compared with several popular algorithms, our model has better results in natural images colorization and black-and-white images colorization.

Related work

The neural network models currently used for image colorization are mostly generative adversarial networks (GANs) [6] and convolutional neural networks (CNNs) [7].

Research on image colorization based on GAN

In recent years, GAN has achieved great success in the fields of image generation, image translation and image restoration. Isola et al. [8] proposed pix2pix, which realizes translation between two image domains and basically solves the problem of grayscale image colorization; its disadvantage is that paired training data are difficult to obtain in real life. Zhu et al. [9] presented CycleGAN on the basis of pix2pix, which introduced a reconstruction loss to achieve the separation of style and content. CycleGAN supports training on unpaired data to achieve one-to-one style transfer and image colorization, but the object to be colored must appear in the training set, otherwise incorrect colors may be assigned to it. Zhu et al. [10] put forward BicycleGAN, which combines the advantages of VAE GAN [11] and LR GAN [12]. BicycleGAN can be regarded as an upgraded version of pix2pix: it requires paired data for training and shares parameters in the manner of CycleGAN. These models are more accurate when handling highly recognizable content such as portraits, plants, and the sky; for parts that are difficult to identify, they tend to fall back on warm colors or simply enhance the contrast.

Research on image colorization based on CNN

CNN models similar to AlexNet [13] and VGGNet [14] are usually used for image classification and regression tasks, while image colorization can be regarded as predicting the probability of the color of each pixel of a grayscale image, which is similar to a regression task. Cheng et al. [15] used a CNN to extract the high-level features of the image and realized automatic image colorization by feeding in image descriptors. Iizuka et al. [16] constructed a fusion layer to fuse local image block information with the global prior of the whole image to predict the color information of each pixel in the grayscale image, but the coloring result was fixed. Larsson et al. [17] used the naturally multimodal color distribution of image scenes to train a model that generates the corresponding color histogram and color image, and the effect of their approach was better than others. Zhang et al. [18] were inspired by the idea that "color prediction is a multimodal problem" [14] and predicted a color distribution for each pixel, so that the final result could take on a variety of different styles. Zhang et al. [19] introduced an AI tool for real-time coloring of black-and-white images by fusing low-level clues and high-level semantic information, which can directly map grayscale images through a CNN to generate colorized results. He et al. [20] proposed an example-based deep learning method for local colorization; this network allows different reference images to be selected to control the output image. Even if unrelated reference images are used, the method shows good robustness and versatility.

Other methods

In addition to image content, color also affects the user's emotional response. Yang et al. [21, 22] used color histograms to convey emotions between color images. This method is very flexible and allows users to choose different references to convey different emotions. Wang et al. [23, 24] obtained a sentiment palette from a given semantic word or a reference image, and directly transferred the colors of the template to the target image. The studies in [25-28], however, incorporated emotional factors that had previously been ignored into the image. Cho et al. [25, 26] proposed Text2Colors, a model consisting of a text-to-palette generation network and a palette-based colorization network, both of which utilize conditional GANs (cGAN) [12]. Chen et al. [27] used the recurrent attention model (RAM) to fuse images and semantic features, introduced a stop gate for each region of the image so as to dynamically decide whether to continue inferring additional information from the text description after each reasoning step, and finally showed the first semantics-based coloring results on the Oxford-102 Flowers dataset [28]. Su and Sun [29] improved on the previous idea of using a single color or a few colors to achieve color transfer, and proposed that the user could adjust the number of main colors in the image according to the complexity of the image content; the operation is more flexible and the coloring effect is more accurate and natural. Wan S et al. [30] fed extracted points of interest into the network to generate color points, and propagated them to neighboring pixels to achieve fully automatic image coloring. Several sets of comparative experiments show that this method is very efficient and has good application prospects; in their conclusion the authors also discuss applying the model to low-light night-vision images, which will bring greater challenges. Yu X et al. [31] first determined the scene category of the input image and then performed color mapping learning on images in the corresponding category, which greatly improved the coloring efficiency and accuracy of the algorithm. The idea of guiding coloring according to the usage scene is very targeted and is well suited to coloring medical images; special applications in other fields can also be considered in the future.

Proposed method

We borrow the feature extraction architecture of Iizuka et al. [16] and, according to actual needs, introduce histogram equalization and semantic segmentation technology, which leads to a good final coloring effect. The model mainly includes six parts: the low-level feature extraction network, the local feature extraction network, the global feature extraction network, the two fusion modules, the semantic segmentation network, and the color prediction network. The main line of the network is a U-Net [32] with an encoder-decoder structure; the specific design is shown in Fig 2. The encoder allows us to input images of arbitrary resolution. The core of the color prediction network is the decoder, which is responsible for predicting a* and b* of the image based on the output of the encoder and the learning effect of the semantic segmentation network.
Fig 2

Our model.

Some semantic segmentation information of the image helps the prediction network understand the content of the image and color it accurately.

Feature extraction network

Our feature extraction network includes low-level feature extraction network, local feature extraction network and global feature extraction network. In order to reduce the difficulty of network training, the encoder on the left of U-Net is selected as the local feature extraction network, and VGG-16 [14] is used to extract the global feature, and they share the low-level feature.

Low-level feature extraction network

Before entering the low-level feature extraction network, the target image needs to be normalized to a size of 224 × 224, because the size of the fully connected (FC) layers must be fixed to complete feature fusion and semantic classification in the subsequent global feature extraction and semantic segmentation networks. The low-level feature extraction network uses six convolutional layers to extract the low-level features. Eq (1) is as follows:

Y_{u,v}^{p} = \delta\left( \sum_{q} \sum_{i=0}^{k_h - 1} \sum_{j=0}^{k_w - 1} W_{i,j}^{p,q} X_{u+i,\,v+j}^{q} + b^{p} \right)  (1)

where (u, v) is the coordinate of a pixel, k_h and k_w are the height and the width of the convolution kernel, X and Y are the values of the input and the output at the coordinates (u, v), W ∈ R^{k_h × k_w × q × p} is the weight matrix of the 2D convolution kernel, b is the bias term, δ is a non-linear activation function, p is the number of convolution kernels, and q is the number of channels of the input unit. The Rectified Linear Unit (ReLU) is used as the activation that completes the convolution calculation. Eq (2) is as follows:

\delta(x) = \mathrm{ReLU}(x) = \max(0, x)  (2)
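To make the convolution of Eqs (1)-(2) concrete, the following NumPy sketch (an illustrative implementation, not the authors' released code) evaluates one convolutional layer with an explicit loop over the summation indices:

```python
import numpy as np

def relu(x):
    # Eq (2): the Rectified Linear Unit
    return np.maximum(0.0, x)

def conv2d_relu(X, W, b, stride=1):
    """Direct evaluation of Eq (1) followed by the ReLU of Eq (2).

    X: input feature map,   shape (q, H, W)       (q input channels)
    W: convolution kernels, shape (p, q, kh, kw)  (p output channels)
    b: bias terms,          shape (p,)
    """
    q, H, Wd = X.shape
    p, _, kh, kw = W.shape
    H_out = (H - kh) // stride + 1
    W_out = (Wd - kw) // stride + 1
    Y = np.zeros((p, H_out, W_out))
    for m in range(p):                     # one output map per kernel
        for u in range(H_out):
            for v in range(W_out):
                patch = X[:, u * stride:u * stride + kh, v * stride:v * stride + kw]
                Y[m, u, v] = np.sum(W[m] * patch) + b[m]
    return relu(Y)

# toy check: 3-channel 8x8 input, 4 kernels of size 3x3
out = conv2d_relu(np.random.randn(3, 8, 8), np.random.randn(4, 3, 3, 3), np.zeros(4))
print(out.shape)  # (4, 6, 6)
```

In practice a framework convolution performs the same computation far more efficiently; the loops here only make the correspondence with the summation indices explicit.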

Local feature extraction network

After obtaining the low-level features of the image, the model splits into two branches. One branch uses two convolutional layers to compute local semantic features and obtains the local feature map. Here, H and W are the height and width of the input image.

Global feature extraction network

At the same time, the other branch processes the low-level features with four additional convolutional layers and three FC layers to obtain the global features, a 256-bin vector. In order to retain as many semantic features of the image as possible, we do not pool the features after convolution, but instead increase the stride of the convolution kernels to achieve the pooling effect and reduce the feature dimensions correspondingly. As a result, not only are the semantic features retained, but the feature size, the noise, and the number of parameters are also reduced.
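A minimal PyTorch sketch of this pooling-free downsampling (channel widths and layer count are assumptions for illustration, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

# A stride-2 convolution halves the spatial resolution much like a pooling layer,
# but keeps learnable weights, so semantic content is preserved while the feature
# size, the noise and the number of parameters are reduced.
downsample = nn.Sequential(
    nn.Conv2d(256, 512, kernel_size=3, stride=2, padding=1),  # replaces conv + pooling
    nn.ReLU(inplace=True),
    nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
    nn.ReLU(inplace=True),
)

x = torch.randn(1, 256, 28, 28)
print(downsample(x).shape)  # torch.Size([1, 512, 14, 14]) -- halved without any pooling layer
```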

Two fusion modules

In order to get better coloring results, this paper introduces two fusion modules. The first fusion layer is responsible for fusing global features and local features together; the convolution kernel size of this layer is 1 × 1 and the stride is 1. The second fusion layer is responsible for incorporating the image's semantic segmentation information when predicting the color of the image, which allows more accurate coloring and prevents color bleeding.
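The first fusion step can be sketched as follows (a hedged, Iizuka-style illustration; channel sizes are assumptions): the global vector is replicated at every spatial position, concatenated with the local feature map, and mixed by a 1 × 1, stride-1 convolution.

```python
import torch
import torch.nn as nn

class FirstFusion(nn.Module):
    """Merge a local feature map (B, C_l, H, W) with a global feature vector (B, C_g)
    using a 1x1 convolution with stride 1. Channel sizes are illustrative."""
    def __init__(self, c_local=256, c_global=256, c_out=256):
        super().__init__()
        self.fuse = nn.Conv2d(c_local + c_global, c_out, kernel_size=1, stride=1)

    def forward(self, local_feat, global_feat):
        b, _, h, w = local_feat.shape
        # replicate the global vector at every spatial position, then concatenate
        g = global_feat.view(b, -1, 1, 1).expand(-1, -1, h, w)
        return torch.relu(self.fuse(torch.cat([local_feat, g], dim=1)))

fused = FirstFusion()(torch.randn(2, 256, 28, 28), torch.randn(2, 256))
print(fused.shape)  # torch.Size([2, 256, 28, 28])
```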

Semantic segmentation network

FCN [33] is regarded as a pioneering work in the field of image semantic segmentation, and many semantic segmentation models have been proposed on the basis of FCN. For example, some models improve the structure of the network (SegNet [34], DeconvNet [35]), some models improve the convolution kernel (DeepLab [36]), and, most importantly, a Markov random field (MRF) has been introduced on top of the rough semantic map to smooth the segmentation of edges [37]. This paper makes full use of the local and global features of the image: it first uses the FCN model to segment the target image into categories such as plants, buildings, sky, water and roads, then calculates the color of each category, and finally calculates the hue value of each block by using a probabilistic method to mix the feature colors of the categories. The data reported in [38] show that the addition of a conditional random field (CRF) can improve the final score by 1-2%. Let the image data be the set of observable variables X and the set of hidden variables to be inferred be Y; both are sequences of random variables connected by linear chains. The conditional probability model P(Y|X) proposed in [39] is used to predict the label of each pixel, and it satisfies the Markov property. Eq (3) is as follows:

P(Y_i \mid X, Y_1, \dots, Y_{i-1}, Y_{i+1}, \dots, Y_n) = P(Y_i \mid X, Y_{i-1}, Y_{i+1})  (3)

where Y is the tag sequence or state sequence of the output, and its values are the category labels {1, 2, ..., L}. The output of the FCN is therefore an L-bin vector, where each bin represents the probability that the hidden variable belongs to that class. According to the Hammersley-Clifford theorem, the factorization formula of the linear-chain random field P(Y|X) can be given, and each factor is a function defined on two adjacent nodes. Given the condition x and the output sequence y, Eq (3) can be written as Eq (4):

P(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{i,k} \lambda_k t_k(y_{i-1}, y_i, x, i) + \sum_{i,l} \mu_l s_l(y_i, x, i) \right)  (4)

where Z(x) is the normalization term, whose sum is carried out over all possible output sequences. Eq (5) is as follows:

Z(x) = \sum_{y} \exp\left( \sum_{i,k} \lambda_k t_k(y_{i-1}, y_i, x, i) + \sum_{i,l} \mu_l s_l(y_i, x, i) \right)  (5)

Here, t and s are feature functions, and λ and μ are the corresponding weights. t is a feature function defined on an edge, called a transition feature, and it depends on the current and previous positions. s is a feature function defined on a node, called a state feature, and it depends on the current position. In general, the local feature functions t and s take values of 1 or 0. Given K1 transition features and K2 state features, we can unify the features as Eq (6):

f_k(y_{i-1}, y_i, x, i) = \begin{cases} t_k(y_{i-1}, y_i, x, i), & k = 1, \dots, K_1 \\ s_l(y_i, x, i), & k = K_1 + l,\ l = 1, \dots, K_2 \end{cases}  (6)

Next, the transition features and the state features are summed over all positions i, which can be expressed as Eq (7):

f_k(y, x) = \sum_{i=1}^{n} f_k(y_{i-1}, y_i, x, i), \quad k = 1, \dots, K_1 + K_2  (7)

The weight w_k corresponding to f_k(y, x) is shown in Eq (8):

w_k = \begin{cases} \lambda_k, & k = 1, \dots, K_1 \\ \mu_l, & k = K_1 + l,\ l = 1, \dots, K_2 \end{cases}  (8)

Therefore, the CRF can be expressed as Eq (9):

P(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{k} w_k f_k(y, x) \right)  (9)

The parameter settings of the convolutional layers of the FCN are the same as those on the left side of the U-Net. The difference is that two convolutional layers, three FC layers and a Softmax function are added after the first fusion module. To remove the spatial information and train the model to output the classification result, the original 2D feature map is flattened into a 1D vector. The parameter settings of the semantic segmentation network are shown in Table 1, and the Softmax function is as follows:

\mathrm{softmax}(z)_j = \frac{e^{z_j}}{\sum_{l=1}^{L} e^{z_l}}, \quad j = 1, \dots, L  (10)
Table 1

Parameter settings of semantic segmentation network.

Layer    Kernels    Stride    Output
conv     3 × 3      2 × 2     512
conv     3 × 3      1 × 1     512
FC       -          -         1024
FC       -          -         256
FC       -          -         2
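Read as a small PyTorch module, Table 1 corresponds roughly to the following head (a sketch under our own assumptions about the input channel count, spatial size and padding; the flattened dimension simply follows from them):

```python
import torch
import torch.nn as nn

class SegmentationHead(nn.Module):
    """Sketch of the Table 1 head: two convolutions followed by three FC layers and a Softmax."""
    def __init__(self, in_ch=256, spatial=7, n_classes=2):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_ch, 512, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1), nn.ReLU(inplace=True),
        )
        out_sp = (spatial + 2 - 3) // 2 + 1          # spatial size after the stride-2 conv
        self.fcs = nn.Sequential(
            nn.Flatten(),                            # drop spatial information: 2D map -> 1D vector
            nn.Linear(512 * out_sp * out_sp, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, 256), nn.ReLU(inplace=True),
            nn.Linear(256, n_classes),
        )

    def forward(self, x):
        logits = self.fcs(self.convs(x))
        return torch.softmax(logits, dim=1)          # Eq (10): class probabilities

probs = SegmentationHead()(torch.randn(1, 256, 7, 7))
print(probs.shape, float(probs.sum()))  # torch.Size([1, 2]) 1.0
```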

Color prediction network

The color prediction network predicts a* and b* from the feature tensor and the semantic segmentation information of the input image. Its core is the decoder on the right side of the U-Net, which is composed of convolution layers and upsampling layers. The output image tensor is required to be H × W × 2, and the parameter settings are shown in Table 2.
Table 2

Parameter settings of color prediction network.

Layer     Kernels    Stride    Output
conv      3 × 3      2 × 2     256
conv      3 × 3      1 × 1     128
upsamp    -          -         128
conv      3 × 3      1 × 1     64
conv      3 × 3      1 × 1     64
upsamp    -          -         64
conv      3 × 3      1 × 1     32
conv      3 × 3      1 × 1     2
upsamp    -          -         -
The convolutional layers reduce the spatial information of the image, so blank padding is added to keep the image proportions constant. The upsampling layers double the resolution of the image; used together, the two increase the information density without distorting the image. To compare the difference between the predicted value and the actual value, we use the Tanh function, because the input of the Tanh function can be any value while its output lies in [−1, 1]. The Tanh function is as follows:

\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}  (11)

Since the color values of a*b* are distributed in the interval [−128, 128], all values of the output layer are divided by 128 to ensure that they lie in the −1 to 1 range, for the convenience of comparing the errors of the predicted results. After the final error is obtained, the network updates the filters to reduce the global error, and improves the feature extraction effect through back-propagation based on the error of each pixel until the error is small enough. After the neural network is trained, all the results are multiplied by 128 and converted back to the CIE L*a*b* image for the final prediction. There is no direct conversion between the RGB colorspace and the CIE L*a*b* colorspace, but the XYZ colorspace can act as an intermediate: RGB ↔ XYZ ↔ CIE L*a*b*.
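A small helper pair along these lines (a sketch using scikit-image, which performs the RGB ↔ XYZ ↔ L*a*b* conversion internally; the function names are ours):

```python
import numpy as np
from skimage import color  # conversion goes through XYZ internally: RGB <-> XYZ <-> CIE L*a*b*

def split_lab(rgb):
    """rgb: float array in [0, 1], shape (H, W, 3).
    Returns the L* channel (the network input) and a*b* scaled to [-1, 1],
    matching the Tanh output range of the decoder."""
    lab = color.rgb2lab(rgb)
    L = lab[..., :1]              # lightness, roughly in [0, 100]
    ab = lab[..., 1:] / 128.0     # a*, b* roughly in [-128, 128]  ->  [-1, 1]
    return L, ab

def merge_lab(L, ab_pred):
    """Undo the scaling: multiply the predicted a*b* by 128 and convert back to RGB."""
    lab = np.concatenate([L, ab_pred * 128.0], axis=-1)
    return color.lab2rgb(lab)
```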

Objective function and network training

In this paper, the loss of the model includes the loss of color prediction and the loss of semantic segmentation. To quantify the loss of the model, we calculate the mean square error between the estimated pixel colors in the a*b* colorspace and the actual values. For image X, the loss of its color prediction network is as follows:

L_c(\theta) = \frac{1}{2HW} \sum_{K \in \{a^*, b^*\}} \sum_{i=1}^{H} \sum_{j=1}^{W} \left( X_{i,j}^{K} - \hat{X}_{i,j}^{K} \right)^2  (12)

where θ are the parameters of the model, and X_{i,j}^{K} and \hat{X}_{i,j}^{K} are the (i, j) pixel values of the K component of the target image and the reconstructed image respectively. The semantic segmentation network helps the color prediction network learn how to supplement color information, so it is also necessary to calculate the loss of semantic segmentation. In this paper, the loss function of semantic segmentation is defined as Eq (13):

L_s(\theta) = -\sum_{c=1}^{L} V_c\, y_c \log \hat{y}_c  (13)

where V is the weight for rebalancing the losses, y is the ground-truth class label and \hat{y} is the Softmax output. The total loss of the network can be expressed as Eq (14):

L(\theta) = \eta_1 L_s(\theta) + \eta_2 L_c(\theta)  (14)

where L_s is the loss of the semantic segmentation network, L_c is the loss of the color prediction network, and η1 and η2 weight the relative contributions of the two losses.
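The combined objective of Eqs (12)-(14) can be sketched in PyTorch as follows (illustrative only; η1, η2 and the class-rebalancing weights are placeholders, not the paper's settings):

```python
import torch
import torch.nn.functional as F

def total_loss(ab_pred, ab_true, seg_logits, seg_labels,
               eta1=1.0, eta2=1.0, class_weights=None):
    """Sketch of Eqs (12)-(14)."""
    loss_color = F.mse_loss(ab_pred, ab_true)                 # Eq (12): MSE over a*b*
    loss_seg = F.cross_entropy(seg_logits, seg_labels,
                               weight=class_weights)          # Eq (13): V = class_weights
    return eta1 * loss_seg + eta2 * loss_color                # Eq (14)

# toy usage: batch of 4, 2 chroma channels, 2 semantic classes
loss = total_loss(torch.randn(4, 2, 56, 56), torch.randn(4, 2, 56, 56),
                  torch.randn(4, 2), torch.randint(0, 2, (4,)))
print(float(loss))
```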

Experiments

Experimental environment and dataset

All tests are run on an NVIDIA GTX 1080 TITAN GPU. According to the hardware, we divide the 2,000,000 training images into 15,625 batches with a batch size of 128. In addition, the Adam [40] optimization algorithm is used to accelerate training. All training images and validation images for our model and for the several comparison algorithms in this paper come from the same public dataset, ILSVRC2012 [41], which is the dataset of the famous ImageNet [42]. All test images shown in the manuscript can be obtained from the supporting information (S2 File).

Performance evaluation index

This paper uses image colorfulness (C), the peak signal-to-noise ratio (PSNR), the structural similarity (SSIM), the quaternion structural similarity (QSSIM) and a qualitative evaluation by user study to assess the performance of these algorithms. We use the colorfulness metric described in Hasler's paper [43]. Eq (15) is as follows:

rg = R - G, \quad yb = \tfrac{1}{2}(R + G) - B, \quad C = \sqrt{\sigma_{rg}^2 + \sigma_{yb}^2} + 0.3\sqrt{\mu_{rg}^2 + \mu_{yb}^2}  (15)

where σ_rg and μ_rg are the standard deviation and the mean of rg, and σ_yb and μ_yb are those of yb. C describes the colorfulness of the image. PSNR is obtained from the mean square error (MSE) and is defined as follows:

MSE = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[ I(i,j) - K(i,j) \right]^2  (16)

PSNR = 10 \cdot \log_{10}\left( \frac{MAX^2}{MSE} \right)  (17)

Here, MAX is the maximum gray level of the image, generally 255. MSE is the mean square error between the original image I and the processed image K, and m and n are the numbers of rows and columns of the images respectively. SSIM uses the mean as the estimate of the luminance L, the standard deviation as the estimate of the contrast C, and the covariance as the estimate of the structural similarity S, and the mathematical model is calculated as follows [44]:

l(x, y) = \frac{2\mu_x\mu_y + c_1}{\mu_x^2 + \mu_y^2 + c_1}, \quad c(x, y) = \frac{2\sigma_x\sigma_y + c_2}{\sigma_x^2 + \sigma_y^2 + c_2}, \quad s(x, y) = \frac{\sigma_{xy} + c_3}{\sigma_x\sigma_y + c_3}  (18)

SSIM(x, y) = l(x, y) \cdot c(x, y) \cdot s(x, y)  (19)

Here, μ_x and μ_y are the means of images x and y respectively, σ_x and σ_y are the standard deviations of images x and y respectively, and σ_xy is the covariance of images x and y. To avoid a zero denominator, we introduce c1, c2, c3, and usually set c1 = (K1 × L)^2, c2 = (K2 × L)^2, c3 = c2/2, K1 = 0.01, K2 = 0.03, L = 255. With c3 = c2/2, Eq (19) is rewritten as follows:

SSIM(x, y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}

PSNR and SSIM are the most common and widely used full-reference image quality evaluation indexes. The PSNR value cannot reflect the subjective feeling of the human eye well. The calculation of SSIM is somewhat more complicated; it circumvents the complexity of natural image content and the problem of multi-channel decorrelation to some extent, and measures the similarity of two images by directly estimating the structural changes of two complex structured signals. Its value better reflects the subjective perception of the human eye, but it is only suitable for measuring the structural similarity between grayscale images. QSSIM is a newer color image quality index, and SSIM can be regarded as a special case of QSSIM [45]. QSSIM is given by Eq (20) [46]; for the definition and values of its parameters, please refer to reference [46].
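The two simplest metrics can be computed directly; the NumPy sketch below follows Eqs (15)-(17) (for SSIM one would normally reuse an existing implementation such as skimage.metrics.structural_similarity):

```python
import numpy as np

def colorfulness(img):
    """Hasler-Suesstrunk colorfulness C of Eq (15); img is an RGB array (H, W, 3)."""
    R, G, B = (img[..., i].astype(np.float64) for i in range(3))
    rg = R - G
    yb = 0.5 * (R + G) - B
    sigma = np.sqrt(np.std(rg) ** 2 + np.std(yb) ** 2)
    mu = np.sqrt(np.mean(rg) ** 2 + np.mean(yb) ** 2)
    return sigma + 0.3 * mu

def psnr(I, K, max_val=255.0):
    """PSNR from the mean square error, Eqs (16)-(17)."""
    mse = np.mean((I.astype(np.float64) - K.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)
```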

Experimental results and analysis

Before training the network, the RGB colorspace of the image is converted to the CIE L*a*b* colorspace. During training, the learning rate is initialized to 0.001, momentum is initialized to 0.5, and weight decay is initialized to 0.0005. After training, all predicted values are multiplied by 128, and the L*a*b* result is converted back to an RGB image.
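For reference, these hyper-parameters could be wired up as follows (a sketch; `model` is a placeholder for the colorization network, and mapping the stated momentum of 0.5 to Adam's beta1 is our assumption):

```python
import torch

model = torch.nn.Conv2d(1, 2, kernel_size=3, padding=1)   # placeholder module
optimizer = torch.optim.Adam(model.parameters(),
                             lr=0.001,                    # initial learning rate
                             betas=(0.5, 0.999),          # "momentum" 0.5
                             weight_decay=0.0005)         # weight decay
```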

Comparison of coloring effects under different epochs

Fig 3(a) shows the comparison of the coloring effects of our model on eight images after 10, 20, 30, 40 and 50 training epochs. Fig 3(b)-3(g) show magnified views of two groups of images from Fig 3(a). Comparing the coloring effects at the five epochs, we find that as the number of epochs increases, the number of weight-update iterations in the neural network increases, the color bleeding decreases, and the coloring effect of the image gets closer and closer to the ground truth.
Fig 3

Comparison of coloring effects under different epochs.

GT stands for the ground truth, which is also the source image. Eps stands for Epochs. For convenience, the following image subtitles will use GT instead of the ground truth.

To further verify that the image quality is affected by the epoch number, we use line graphs to show the change of the PSNR, SSIM and QSSIM values of the above eight groups of images under different epochs, as shown in Fig 4. In the eight groups of images, as the number of epochs increases, the PSNR values of half of the images keep changing and the corresponding curves fluctuate considerably, by up to 13.471 dB; the PSNR values of the other half are relatively stable, fluctuating within a range of about 5 dB. On the whole, the PSNR values of the eight groups of images are all above 29 dB and tend to stabilize when the epoch reaches 40. The SSIM values of the images are very stable overall, and the fluctuation range of each image does not exceed 0.01. The QSSIM values of the images are also relatively stable, with a fluctuation range of less than 0.02. In other words, increasing the number of epochs affects the PSNR value of some images, but basically does not affect the SSIM and QSSIM values.
Fig 4

Quantitative evaluation.

The data in this figure is from Fig 3, and the images are in the same order.


Effect of histogram equalization on image colorization

Theoretically, histogram equalization merges gray levels, which may reduce the colorfulness of the image. However, in our colorization study of overexposed or underexposed images, we found that applying histogram equalization in advance eliminates clutter and enhances the contrast of the image, and it increases the colorfulness of the image in the later coloring process. Fig 5 shows an image and its histogram before and after gray-histogram equalization. It can be seen that the probability density of the gray levels of the transformed image is evenly distributed, and the brightness and contrast of the whole image become relatively natural. The pre-processed flower has clearer details and edges, which is very helpful for improving the accuracy of the model's color prediction.
Fig 5

The image and its histogram before and after gray-histogram equalization.
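A minimal global equalization routine of this kind can be sketched as follows (NumPy only; the function name is ours):

```python
import numpy as np

def equalize_histogram(gray):
    """Global histogram equalization for an 8-bit grayscale image (uint8, shape (H, W)).
    Spreads the cumulative gray-level distribution so that overexposed or underexposed
    images get a more uniform probability density before being colorized."""
    hist, _ = np.histogram(gray.ravel(), bins=256, range=(0, 256))
    cdf = hist.cumsum()
    cdf_m = np.ma.masked_equal(cdf, 0)                                  # ignore empty bins
    cdf_m = (cdf_m - cdf_m.min()) * 255.0 / (cdf_m.max() - cdf_m.min())
    lut = np.ma.filled(cdf_m, 0).astype(np.uint8)                       # gray-level mapping table
    return lut[gray]
```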

Iizuka et al. [16] is the reference method in this paper; it is one of the most classical algorithms that applied deep learning to color prediction and achieved good coloring effects. Moreover, it does not consider coloring underexposed or overexposed images, so it is selected as the baseline for comparison in this section. For a specific comparison of visual effects, please see Fig 6. Viewed row by row, histogram equalization can remove some clutter and rescue very dark (underexposed) and very bright (overexposed) images, and thus affects the final texture and color. For example, for the normally exposed images in the first two rows of Fig 6, whether histogram equalization is added to our model has little influence on the final coloring effect, and the same holds for literature [16]. For the abnormally exposed images in the last four rows, the coloring effect of our model with histogram equalization is obviously better than that without equalization, with richer details, more natural color transitions and better visual effect; this is also the case for literature [16]. Viewed column by column, compared with [16], our model rarely shows color bleeding.
Fig 6

The effect of HE on image colorization.

From top to bottom: two normal-exposure images, two overexposed images, two underexposed images. From left to right: the ground truth, the coloring effect of Iizuka et al. [16] (the original method), the coloring effect of Iizuka et al. [16] with histogram equalization (the improved method), the coloring effect of ours without histogram equalization, and the coloring effect of ours with histogram equalization (HE = histogram equalization). (a) GT, (b) [16], (c) [16] + HE, (d) Ours − HE, (e) Ours.

By analyzing the data in Tables 3 and 4, we find that the C values of the colored images predicted by these models are mostly lower than the C values of the ground truth. There are a few exceptions, such as img_5 in Table 3 and img_3, img_4, img_5 in Table 4, where the C values obtained after adding histogram equalization to our model and to literature [16] are significantly higher than the C values of the ground truth. From these data, it is not difficult to see that adding histogram equalization to the coloring model does not help the color prediction of normally exposed images and may even reduce the C value of the image, while for abnormally exposed images it effectively increases the C value of the predicted image. For img_1 and img_2, which are relatively normally exposed, whether histogram equalization is added has little effect on the C values of the image, that is, ΔC2 is close to 0. For img_3, img_4, img_5 and img_6, which are not properly exposed, the model with histogram equalization is significantly more accurate in predicting the color of the grayscale images: with high probability, the C values of the colored images are higher than, or differ only slightly from, the C values obtained without histogram equalization, that is, ΔC2 is much higher than 0 or close to 0.
Table 3

The influence of HE on the colorfulness of Iizuka et al. [16].

[16]      C_GT      C_before    C_after    ΔC1        ΔC2
img_1     29.243    24.987      23.878     -5.365     -1.109
img_2     31.036    15.735      13.246     -17.790    -2.489
img_3     37.986    33.255      34.855     -3.131     1.600
img_4     6.572     9.351       9.438      2.866      0.087
img_5     27.276    23.788      37.396     10.120     13.608
img_6     19.610    11.929      12.790     -6.820     0.861

C_GT is the colorfulness of the ground truth. C_before is the colorfulness of the colored image without HE. C_after is the colorfulness of the colored image with HE. ΔC1 = C_after − C_GT, ΔC2 = C_after − C_before. The data in the table are from Fig 6, and the images are in the same order.

Table 4

The influence of HE on the colorfulness of ours.

Ours      C_GT      C_before    C_after    ΔC1        ΔC2
img_1     29.243    26.682      26.512     -2.731     -0.170
img_2     31.036    27.254      26.176     5.14       -1.078
img_3     37.986    31.476      39.692     1.706      8.216
img_4     6.572     5.252       6.784      0.212      1.532
img_5     27.276    35.011      30.977     3.701      -4.034
img_6     19.610    2.240       14.475     -5.135     12.235

The meaning of each parameter in Table 4 is the same as that in Table 3. The data in the table is from Fig 6, and the images are in the same order.


Comparison with state-of-the-art algorithms

Fig 7 shows the comparison of the global coloring effects of our model and five classic models [16-18, 47, 48] in scenes of different complexity. Comparing the coloring effects on eight groups of test images, we find that the images processed by the three models proposed in 2016 have obvious color bleeding, the images processed by Lei et al. [47] have simple colors, and those processed by Su et al. [48] show slight color overflow, while our algorithm, which exploits the high-level semantic segmentation information of the image itself, has strong robustness and applies to natural image colorization in different scenes.
Fig 7

Recolor of natural images.

The first four are simple natural scenes such as the lawn, single object, the sky and simple architecture, while the last four are complex natural scenes such as the water, many objects, brilliant lights and complex color levels. Literature [48] has two ways of automatic coloring and manual coloring, and the effect of automatic coloring is shown here. (a) GT, (b) [16], (c) [17], (d) [18], (e) [47], (f) [48], (g) Ours.

In general, these algorithms are more accurate when dealing with highly recognizable scenes, while color bleeding and unclear edges may occur in parts that are difficult to recognize. To further verify the advantages of our algorithm, we invited 20 college students (10 women and 10 men, aged 20 to 30) with normal vision and asked them to score the coloring effects of these algorithms according to the three indexes given in Table 5. The test content is the above eight groups of images; the highest score for each group of images is 5, the lowest score is 1, and the same score may appear within the same group of images. We then calculated the average score of each algorithm under the three indexes, obtaining Table 5. Comparing the data in Table 5, we find that ours scores higher than the other five algorithms on all three indexes. The composite score is 4.27, at least 0.15 higher than the scores of the other algorithms, which shows that the robustness of ours is very good and the coloring effect in different scenes is relatively stable.
Table 5

User ratings.

The data in the table is from Fig 7, and the images are in the same order.

Methods    Index 1    Index 2    Index 3    Composite Scores
[16]       3.86       3.38       3.76       3.67
[17]       3.63       3.42       3.59       3.55
[18]       3.81       3.58       3.47       3.62
[47]       4.03       3.69       3.71       3.81
[48]       4.13       4.03       4.19       4.12
Ours       4.31       4.21       4.28       4.27

The three indexes are the rationality of coloring (Index 1), the naturalness of color transition (Index 2), and the richness of color (Index 3).

Tables 6-8 show the PSNR, SSIM and QSSIM values of the above eight groups of images in turn. The data marked in bold are the top three best values obtained with the different methods. In Table 6, literature [16] has an obvious advantage: the images processed by it all get the best PSNR value. However, the performance of our algorithm is also good: among the eight PSNR values, second place accounts for 1/2, third place for 1/4, and fourth place for 1/4. In Table 7, the SSIM values of the images processed by these six algorithms are good and the differences between them are very small. Nevertheless, our algorithm performs very well: among the eight SSIM values, first place accounts for 5/8, second place for 1/4, and fourth place for 1/8. In Table 8, the QSSIM values of the images processed by the six methods are also good, and the differences between them are small. Similarly, among the eight QSSIM values of our algorithm, first place accounts for 1/4, second place for 1/2, and third and fourth place for 1/8 each. Although the PSNR, SSIM and QSSIM values of the color images predicted by our model are not always optimal, the gap between them and the optimal values is small. At the same time, it can be seen from the mean values that the objective indicators obtained by our model are all good and can basically meet the requirements of users.
Table 6

PSNR.

The data in Tables 6–8 are all from Fig 7, and the order of images is the same.

Methods    img_1     img_2     img_3     img_4     img_5     img_6     img_7     img_8     Average
[16]       51.551    51.053    49.859    50.765    51.020    51.412    50.534    50.961    50.8944
[17]       28.721    27.795    32.348    36.392    36.192    35.279    31.759    30.926    32.4265
[18]       36.066    38.247    35.511    44.046    41.617    40.724    34.409    34.811    38.1789
[47]       32.294    32.121    36.407    49.702    41.471    42.675    34.493    35.656    38.1024
[48]       22.811    23.696    26.362    28.799    25.065    23.417    25.628    25.791    25.1961
Ours       37.786    39.326    34.390    49.291    41.152    42.714    34.918    34.966    39.3179
Table 8

QSSIM.

Methods    img_1    img_2    img_3    img_4    img_5    img_6    img_7    img_8    Average
[16]       0.898    0.938    0.918    0.965    0.964    0.975    0.944    0.932    0.9418
[17]       0.934    0.955    0.924    0.980    0.973    0.979    0.948    0.946    0.9549
[18]       0.891    0.928    0.908    0.975    0.969    0.960    0.933    0.902    0.9333
[47]       0.912    0.926    0.924    0.984    0.973    0.975    0.950    0.943    0.9484
[48]       0.879    0.926    0.920    0.969    0.960    0.920    0.932    0.932    0.9298
Ours       0.934    0.936    0.920    0.983    0.972    0.972    0.949    0.947    0.9516
Table 7

SSIM.

Methods    img_1    img_2    img_3    img_4    img_5    img_6    img_7    img_8    Average
[16]       0.897    0.844    0.889    0.947    0.860    0.874    0.958    0.963    0.9040
[17]       0.897    0.850    0.890    0.945    0.862    0.872    0.957    0.961    0.9043
[18]       0.898    0.839    0.890    0.946    0.861    0.873    0.958    0.962    0.9034
[47]       0.894    0.842    0.891    0.947    0.861    0.874    0.958    0.963    0.9038
[48]       0.850    0.830    0.873    0.944    0.853    0.841    0.942    0.950    0.8854
Ours       0.898    0.839    0.890    0.947    0.861    0.875    0.958    0.963    0.9039


Application effects of state-of-the-art algorithms on black-and-white images colorization

Due to long-term fading of historical images and old photos, it is urgent to develop a robust colorization algorithm to rescue them. Fig 8 shows the colorization effects of several algorithms on several groups of black-and-white images. At first glance, the colorization effects of these algorithms all look good; compared with the original black-and-white images, the visual effect of the colored images improves a lot. After magnifying the details, we find that, in contrast to the other methods, our model and literature [48] produce very uniform and stable effects on people's skin, clothing, plants, sky, natural light and so on. In complex scenes, such as the last two groups of images, the colors predicted by the four models [16-18, 47] are not only monotonous but also show a lot of color overflow. In contrast, the coloring effects of this paper and literature [48] are better: the colors of the images are rich and the transitions between them are very natural.
Fig 8

Color restoration of black-and-white images.

(a) GT, (b) [16], (c) [17], (d) [18], (e) [47], (f) [48], (g) Ours.

The boxplot data in Fig 9 come from 20 testers who sorted the six groups of images in Fig 8. Several observations can be drawn from the diagram: the results generated by Iizuka et al. [16] have a 50% probability distribution between 3 and 5, so its coloring effect is relatively stable but not outstanding; both Larsson et al. [17] and Lei et al. [47] have a 50% probability distribution between 2 and 5, with 25% between 2 and 4, and they are better than the former; Zhang et al. [18] has a 50% probability distribution between 2 and 6, with 25% between 4 and 6, and its data are more scattered than the former; Su et al. [48] has better stability, with 25% distributed between 1 and 2, 50% between 2 and 4.5, and 25% between 4.5 and 6, which is better than the previous four algorithms; for the images processed by our method, 25% are distributed between 1 and 2, 50% between 2 and 5, and 25% between 5 and 6, which is slightly lower than literature [48] overall. In the objective evaluation in the previous section, the advantages of our method are not particularly prominent, but the results of this survey show that the subjective evaluation of our method is better than that of the images colored by the four models [16-18, 47]; it is more consistent with people's subjective perception, which is more meaningful.
Fig 9

Ranking distribution of coloring effects of six algorithms.

Each group of images contains the coloring results of six algorithms, which are sorted in ascending order of 1–6. No parallel ranking is allowed in the same group of images. Among them, the smaller the sorted number, the better the coloring effect processed by the algorithm.

To highlight the advantages of our algorithm, we asked the same 20 testers to do a third test. The scoring objects are 100 groups of images, each group containing six images, which are in turn the coloring results of the five classical algorithms [16-18, 47, 48] and ours. Each group of images is displayed for only ten seconds, and the testers must score and sort them immediately after viewing a set of images. Unlike the previous two manual tests, this test requires the testers to directly sort each group of images according to their impression at that moment, and the results are shown in Table 9. Our model has the highest hit rate for both Top 1 and Top 3. In particular, its Top 1 rate is about 38%, far higher than the other algorithms, and its Last 1 rate is only 6%, much lower than the other algorithms. The data show that the coloring effects of our model are better than those of the other algorithms in general.
Table 9

The ranking analysis of coloring effects of six algorithms.

Each set of images is still sorted in ascending order of 1-6. The formulas satisfied here are: Top 1 = No.1, Top 3 = No.1 + No.2 + No.3, Last 1 = No.6. It should be noted that the values of the last three columns in Table 9 retain only the integer portion of the percentage.

Methods    No.1(%)    No.2(%)    No.3(%)    No.4(%)    No.5(%)    No.6(%)    Top 1(%)    Top 3(%)    Last 1(%)
[16]       9          9.9        20.05      13.5       21.85      25.7       9           39          26
[17]       9.6        15.6       14.2       23.2       17.25      20.15      10          39          20
[18]       12.35      11         20.2       24.25      15.2       17         12          44          17
[47]       15.95      25.9       13.55      17.3       14.85      12.45      16          55          12
[48]       15.35      24.1       11.2       14.05      16.95      18.35      15          51          18
Ours       37.75      13.5       20.8       7.7        13.9       6.35       38          72          6


Limitations and transferability test of ours

Fig 10 shows the comparison of the color prediction effects of the six image colorization methods on five groups of images. These comparison cases show that the transferability of our algorithm is visually superior to that of the other five algorithms. The examples listed in this paper are limited, but these cases are relatively common and many algorithms do not handle them well. In most cases, the experimental results of our algorithm are almost the same as the ground truth, and the colors of the images are also very bright. Even where there are deviations from the ground truth, the result is still semantically correct. Nowadays, colorization technology is no longer used only for black-and-white photos; it is also applied in a wide range of fields including old movies, medical images, cartoon coloring, the restoration of cultural relics and artwork, statue restoration, remote sensing images and so on. Therefore, our algorithm has a large potential application market.
Fig 10

Transferability test of our algorithm.

These five groups of images show the comparison of coloring effects of six algorithms on different materials, including wool, forging tape, ceramic, glass, stone Buddha, oil painting, etc. (a) GT, (b) [16], (c) [17], (d) [18], (e) [47], (f) [48], (g) Ours.

Our model also has some areas that need to be improved. Fig 11 shows a few cases where our model colors poorly. These images contain too many targets, too dense targets, too small targets or no depth information, resulting in poor coloring effects and problems such as simply warming the grayscale image, a mess of colors, blurry edges, and dull colors. From the second row of images, we can see that our model is still partly effective: for example, the red rose in the first column, the white metal bookrack in the second column, the white billboard in the crowd in the third column, the dining table in the fourth column, the basketball frame line against the sky in the fifth column, the blond hair in the sixth column, the white stripes of the striped socks in the seventh column, the barbed wire in the eighth column, and the overall lighting atmosphere in the ninth column are all given the right colors.
Fig 11

Limitations of our algorithm.

The three rows of images from top to bottom are the ground truth, ours, and the partial zoom effect corresponding to the red box in the second row.


Comparison of calculation speed of several algorithms

Table 10 shows the average running time of the six colorization models when testing an image on the CPU and on the GPU respectively. We find that running the code on the GPU is at least twice as fast as running it on the CPU. According to the data in Table 10, Lei et al. [47] takes the longest time to colorize an image, whether running on the CPU or the GPU; our model is the second slowest, Zhang et al. [18] ranks third, and literature [16, 17, 48] take the shortest time. However, comparing the algorithms by GPU computing time, our algorithm completes the operation within 5 seconds, which is not far from the running speed of literature [16-18, 48] and is quite acceptable.
Table 10

Comparison of calculation speed.

The data in Table 10 is the average running time of each image on the CPU/GPU, which is obtained by dividing the total time spent on testing the six models on the ILSVRC2012 by the total number of images.

Methods    Num. Iter.    CPU Time(s)    GPU Time(s)    Speedup
[16]       1             4.576          1.015          4.508
[17]       1             4.262          1.036          4.114
[18]       1             6.173          2.452          2.518
[47]       1             25.281         7.602          3.326
[48]       1             4.236          2.054          2.062
Ours       1             11.826         3.806          3.107


Conclusions

Since color images have incomparable advantages over black-and-white images in terms of people's visual perception and subsequent image understanding and analysis, it is of great significance to continue studying practical grayscale image colorization algorithms. Our algorithm has four advantages. Firstly, manual coloring requires a high level of professional knowledge from users, and a little negligence causes color matching problems; the biggest advantage of our model is automation, which requires no manual intervention and only needs the user to provide a target grayscale image. Secondly, our model predicts the two color layers a* and b* by making as much use as possible of the gray information L* of the grayscale image itself. Third, the network is able to capture and use semantic information, which makes the predicted color plausible even when it is not close to the ground truth; this reflects the fact that a single grayscale image may correspond to many reasonable color images. Fourth, we pre-process the image before it is input to the network, which effectively improves the color quality of overexposed and underexposed images and increases the colorfulness of the image. In addition, our model can colorize not only grayscale images but also videos; we only need to turn the video into a series of consecutive frames before feeding them into the network. As mentioned in the previous section, there are still some defects in our model; for example, it performs poorly on images with many targets, small targets, dense targets or no depth information (see Fig 11). There are also other limitations: our model can neither generate the unusual colors created by artists nor automatically imagine the light, shade and complex textures in a comic manuscript. Therefore, it is necessary to enrich the kinds of images in the training set to enhance the generalization ability of the neural network. In future work, we will further improve the performance of the model and make it learn people's visual aesthetics so as to color images as well as possible.

The data source for Fig 9.

These data were obtained from 20 testers who ranked the six groups of images in Fig 8 from 1 to 6, where a rank of 1 represents the best colorization effect and a rank of 6 the worst. There are 120 groups of data in total, and each group corresponds in turn to the subjective ranking of the processing effects of the six colorization algorithms. (XLSX)

The source address download page for all images involved in Figs 1–11.

These pages contain the copyright holder and the copyright license information. (XLSX)

The data sources of Tables 5–9.

The Table 5 in the compressed package records the scores of 20 testers on eight groups of images in Fig 7 according to the given three indexes and shows the calculation process of the final composite scores of each algorithm. The Tables 6–8 show the PSNR values, the SSIM values and the QSSIM values of eight groups of images in Fig 7. The Table 9 records the coloring effect evaluation of 100 groups of images by 20 testers, and each group of images corresponds to the processing effect of six colorization algorithms in turn. Testers need to sort them from 1 − 6 according to their subjective consciousness. (ZIP) Click here for additional data file. (TXT) Click here for additional data file. 12 Feb 2021 PONE-D-20-38461 Fully automatic image colorization based on semantic segmentation technology PLOS ONE Dear Dr. Ding, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. I found this manuscript well written and interesting.  Both reviewers were unanimous in their opinion that your manuscript describes technically sound piece of scientific research, however, they have made certain observations which need to be addressed. After thorough consideration of comments from all the reviewers, I felt that your study has merit but identified points that need to be addressed. Therefore, my decision is “major revision”. Please submit your revised manuscript by Mar 29 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript: A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols We look forward to receiving your revised manuscript. Kind regards, Gulistan Raja Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. 
The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. In your Methods section, please include additional information about your dataset, in particular the testing images, and how it was collected, in enough detail for another researcher to replicate the findings. 3.We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match. When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section. 4. We note that Figure(s) 2, 3, 5, 6, 7, 8 and 10 in your submission contain copyrighted images. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright. We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission: a) You may seek permission from the original copyright holder of Figure(s) 2, 3, 5, 6, 7, 8 and 10 to publish the content specifically under the CC BY 4.0 license. We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text: “I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.” Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission. In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].” b) If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? 
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: No
Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The subject of the paper is interesting, it fits well with the scope of the journal, and the objectives are stated clearly. I could not find any logical errors in the presentation and the approaches used. In the manuscript, it is peculiar that the authors sometimes use "I" when referring to themselves, although the manuscript has two authors. The authors apply multiple performance indices to evaluate the quality of colorization, such as PSNR, SSIM, and MSE. However, MSE and PSNR do not indicate the perceptual quality of digital images well, and SSIM is defined for grayscale images. I recommend that the authors use some kind of full-reference image quality metric for RGB images, for example QSSIM (https://searchcode.com/codesearch/view/85580087/). On the other hand, many full-reference image quality metrics for RGB images are available online (https://www.mdpi.com/1999-4893/13/12/313), so the authors can choose from many possible algorithms. The authors compared the proposed algorithm to several other state-of-the-art deep learning algorithms. It is not clear from the manuscript whether these methods were trained on the same database or not. I think a detailed description of the evaluation protocol would solve this problem. I think it is important to train all examined methods on the same dataset.
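For context, the full-reference metrics the reviewer refers to have the following standard textbook definitions; these formulas are quoted from the general literature rather than from the manuscript under review:

\mathrm{MSE}(x, y) = \frac{1}{N}\sum_{i=1}^{N}(x_i - y_i)^2, \qquad \mathrm{PSNR}(x, y) = 10\,\log_{10}\frac{L^2}{\mathrm{MSE}(x, y)}

\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}

where L is the maximum pixel value (255 for 8-bit images), \mu and \sigma denote local means, variances and covariance, and c_1, c_2 are small stabilizing constants. Roughly speaking, QSSIM (Kolaman and Yadid-Pecht) replaces the scalar pixel statistics in SSIM with quaternion-valued statistics over the RGB channels, which is why it can capture color distortions that a grayscale SSIM misses.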
Reviewer #2: Aiming at these problems of image colorization algorithms based on deep learning, such as color bleeding and insufficient color, the authors convert the study of image colorization to the optimization of image semantic segmentation, and propose a fully automatic image colorization model based on semantic segmentation technology. Although the idea and research questions of this paper are timely and clear, some major questions need to be addressed. My questions and comments are presented as follows and may be helpful to improve the quality of this paper.

1. The paper does not explain clearly its advantages with respect to the existing methods: it is not clear what the novelty and contributions of the proposed work are. Does it propose a new method or an improvement of existing models? If so, please state the modifications clearly. There is slight novelty, but there are several components of the framework that would require marked improvement.

2. In Tables 6 and 7, some PSNR and SSIM indexes are much higher than those of the compared methods. This issue leads to my concerns about generalization and overfitting in some occasions.

3. In discussing the complexity of your method, it might also be helpful to include an analysis of the empirical speed of different parts of the colorization process.

4. Could the techniques used or insights gained in this paper be applied to problems apart from image colorization? If so, these could be discussed in the conclusion.

5. The literature has to be strongly updated with some relevant and recent papers focused on the fields dealt with in the manuscript, such as "Automated colorization of a grayscale image with seed points propagation, IEEE Transactions on Multimedia 22 (7), 1756-1768, 2020" and "Scene guided colorization using neural networks, Neural Computing and Applications, 2019."

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose "no", your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

16 Sep 2021

Dear Editors and Reviewers,

Thank you for your letter and the reviewers' comments on our manuscript entitled "Fully automatic image colorization based on semantic segmentation technology" (ID: PONE-D-20-38461).
Those comments are all valuable and very helpful for revising and improving our paper, and they also have important guiding significance for our research. We have carefully studied the experts' opinions and actively made corrections, and we hope the revisions will meet with approval. The test images used here are from the public dataset ILSVRC2012 and are freely available for academic research and non-commercial use. As the test images in the paper have been replaced, the corresponding tabular data and textual analysis have been adjusted accordingly, and the data materials attached to the original manuscript have been updated. In addition, we added a subsection "Comparison of Calculation Speed of Several Algorithms" at the end of the "Experimental Results and Analysis" module, which further enriches our evaluation indicators.

Compared with the manuscript submitted in the first version, the revision we submitted this time shows significant changes. These changes can be viewed in the document labeled "Revised Manuscript with Track Changes". We used the PDF manuscript file generated by the LaTeX template provided on the official website of PLOS ONE, and then used Adobe Acrobat Pro to compare the old and new versions of the document and generate the "Revised Manuscript with Track Changes". Please note that, for the best viewing results, we recommend that experts use Adobe Acrobat Pro to open this file; when you click on the highlighted text, you can also see the comparison of the changes between the old and the new versions, as shown in Fig 1.

Fig 1. Preview effect of the manuscript with tracked changes.

Response to the Editor's comments:

Thank you for your comments on our paper. We have revised our paper according to your comments. If you have any question about this paper, please don't hesitate to let us know. Here are our answers to each of these questions:

Comment 1: Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Reply 1: Yes, we have confirmed that we are using the latest LaTeX template (https://journals.plos.org/plosone/s/latex) provided on the official website of PLOS ONE, which should meet the style requirements of PLOS ONE. At the same time, we checked the naming of all files and corrected the corresponding errors, such as changing "fig1.eps" to "Fig1.eps"; figure files uploaded to the site should start with a capital letter, e.g. "Fig1.eps".

Comment 2: In your Methods section, please include additional information about your dataset, in particular the testing images, and how it was collected, in enough detail for another researcher to replicate the findings.

Reply 2: We really appreciate your valuable advice. It is indeed our responsibility to provide relevant information about the dataset. We are very sorry that, because the training set and the test set used in the first submitted manuscript contained some images from the Internet, it was difficult for us to contact the original copyright holders, so we chose to delete most of the original test images and replace them all with images from ILSVRC2012. For an introduction to the training set and the test set used in the manuscript, please refer directly to page 8, lines 236-245.
This paragraph introduces the training images, validation images, and test images used in this article and by the several comparison algorithms in the article; they all come from the same dataset, ILSVRC2012. The ILSVRC2012 dataset is the dataset of the well-known ImageNet 2012 competition, and its download link is http://www.image-net.org/challenges/LSVRC/2012/index.

Comment 3: We note that the grant information you provided in the 'Funding Information' and 'Financial Disclosure' sections do not match. When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the 'Funding Information' section.

Reply 3: Financial Disclosure: This project was funded by the National Natural Science Foundation of China under grants 61303093 and 61402278. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. None of the authors received salaries from any of the funders.

Comment 4: We note that Figures 2, 3, 5, 6, 7, 8 and 10 in your submission contain copyrighted images. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and supporting information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

Reply 4: Because it was difficult to obtain permission from the original copyright holders to publish these images under CC BY 4.0, we chose to delete the original images and have replaced all the copyrighted images in Figures 2, 3, 5, 6, 7, 8 and 10 in the new manuscript. The images used in the present manuscript are from the ILSVRC2012 dataset, which was generated during the ImageNet Challenge in 2012. Each image in ImageNet belongs to the individual who provided it; ImageNet does not own the copyright of the images. Subject to certain terms and conditions, we have obtained permission to use the ImageNet dataset for academic research and non-commercial purposes, as shown in Fig 2.

Fig 2. A copyright certificate for ImageNet has been obtained.

Response to the reviewers' comments:

Reviewer #1: The subject of the paper is interesting, it fits well with the scope of the journal, and the objectives are stated clearly. I could not find any logical errors in the presentation and the approaches used. In the manuscript, it is peculiar that the authors sometimes use "I" when referring to themselves, although the manuscript has two authors. The authors apply multiple performance indices to evaluate the quality of colorization, such as PSNR, SSIM, and MSE. However, MSE and PSNR do not indicate the perceptual quality of digital images well, and SSIM is defined for grayscale images. I recommend that the authors use some kind of full-reference image quality metric for RGB images, for example QSSIM (https://searchcode.com/codesearch/view/85580087/). On the other hand, many full-reference image quality metrics for RGB images are available online (https://www.mdpi.com/1999-4893/13/12/313), so the authors can choose from many possible algorithms. The authors compared the proposed algorithm to several other state-of-the-art deep learning algorithms. It is not clear from the manuscript whether these methods were trained on the same database or not.
I think a detailed description of the evaluation protocol would solve this problem. I think it is important to train all examined methods on the same dataset.

1. Response to comment: We are very grateful for the time and wisdom you devoted to providing suggestions for the revision of our article. These suggestions really helped us a lot; for example, you proposed using some kind of full-reference image quality metric for RGB images, so we added new data, tables, figures and other information about QSSIM to the revised manuscript. We have carefully studied your suggestions and summarized them into the following three questions (Comment 1.1, Comment 1.2, Comment 1.3), and we respond to them in order (Reply 1.1, Reply 1.2, Reply 1.3) based on our research situation.

Comment 1.1: In the manuscript, it is peculiar that the authors sometimes use "I" when referring to themselves, although the manuscript has two authors.

Reply 1.1: We are very sorry about the trouble caused by this wrong usage in the cover letter and in the conclusion section (the sixth word in line 437 on page 14) of the first version of the manuscript. You are absolutely right: our manuscript has two authors, and we should write accordingly both in the cover letter and in the manuscript. In our newly submitted cover letter and revised manuscript, we have carefully checked all the wording and ensured that "I" no longer appears as the subject.

Comment 1.2: The authors apply multiple performance indices to evaluate the quality of colorization, such as PSNR, SSIM, and MSE. However, MSE and PSNR do not indicate the perceptual quality of digital images well, and SSIM is defined for grayscale images. I recommend that the authors use some kind of full-reference image quality metric for RGB images, for example QSSIM (https://searchcode.com/codesearch/view/85580087/). On the other hand, many full-reference image quality metrics for RGB images are available online (https://www.mdpi.com/1999-4893/13/12/313), so the authors can choose from many possible algorithms.

Reply 1.2: As you said, MSE and PSNR are indeed not good indicators of the perceptual quality of digital images, so we also invited 20 volunteers to rate and rank the final results produced by our model and by five more advanced deep-learning-based image colorization algorithms according to their own visual experience. Specifically: 1) Volunteers were asked to rate the images processed by the six models on three indicators (the rationality of coloring (Index 1), the naturalness of color transition (Index 2), and the richness of color (Index 3)) according to their own intuitive feelings. For the specific data, see "Tab5.xlsx" in the compressed package "S1_Data" in the attachment, which contains a total of 160 rows * 18 columns of data. We then calculate the average score of each model for each indicator in turn, and use these averages to evaluate the performance of the methods in terms of the perceptual quality of digital images. These contents are reflected in Table 5 of the revised manuscript (on page 12). 2) Volunteers were asked to rate and rank the overall quality of the coloring results of the same image processed by the different methods according to their own intuitive experience, so that the quality of each coloring method in terms of overall coloring effect can be evaluated from these data. The specific data source can be seen in "Tab9.xlsx" in the compressed package "S1_Data" in the attachment (a total of 2000 rows * 6 columns of data) and in "S1_File.xlsx" (a total of 120 rows * 6 columns of data). For the correspondence between the files in the attachment and the content in the manuscript, see the introduction of the "Supporting information" module (on page 16, lines 492-510). This part of the content is shown in Fig 9 and Table 9 of the revised manuscript (on page 14). In the first version of the submitted manuscript, we combined PSNR, SSIM and colorfulness as objective evaluation indicators, and user scoring as a subjective evaluation method, to evaluate the quality of the colorized images. The content corresponding to this module can be found in the separate file labeled "Revised Manuscript with Track Changes" (on page 8, lines 260-262). SSIM is suitable for grayscale images, so we convert RGB images to grayscale before computing it. PSNR, SSIM, QSSIM, etc. are all typical full-reference image quality indicators. We have also carefully read the information about QSSIM that you provided, and we agree that it is reasonable to add the QSSIM indicator to the evaluation. In addition, the formula of the QSSIM indicator has been added to the "Performance Evaluation Index" module in the revised manuscript (on page 9, lines 286-290), and the analysis of the QSSIM indicator has been added to the "Experimental Results and Analysis" module (for example, Fig 4 on page 10 and Table 8 on page 13).
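To make the objective part of this evaluation concrete, the following is a minimal sketch, not the authors' implementation, of computing PSNR on the RGB images and SSIM on their grayscale conversions as described in Reply 1.2; it assumes scikit-image is available, and the file names are hypothetical.

# Minimal sketch (not the authors' code): full-reference evaluation of a
# colorized result against its ground-truth color image.
from skimage import io, color
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

gt = io.imread("ground_truth.png")    # hypothetical ground-truth color image, H x W x 3, uint8
pred = io.imread("colorized.png")     # hypothetical model output of the same size

# PSNR computed directly on the RGB images (higher is better).
psnr = peak_signal_noise_ratio(gt, pred, data_range=255)

# SSIM is defined for single-channel images, so both images are first
# converted to grayscale, as described in Reply 1.2.
gt_gray = color.rgb2gray(gt)          # float image in [0, 1]
pred_gray = color.rgb2gray(pred)
ssim = structural_similarity(gt_gray, pred_gray, data_range=1.0)

print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")
# QSSIM would additionally require a quaternion-based SSIM implementation,
# such as the one linked by the reviewer; it is not reproduced here.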
Comment 1.3: The authors compared the proposed algorithm to several other state-of-the-art deep learning algorithms. It is not clear from the manuscript whether these methods were trained on the same database or not. I think a detailed description of the evaluation protocol would solve this problem. I think it is important to train all examined methods on the same dataset.

Reply 1.3: Yes, the algorithm proposed in this article and the several other state-of-the-art deep-learning-based image colorization algorithms are trained and tested on the same database, and the results are then compared. This is described in the section "Experimental Environment and Dataset" of the revised manuscript (on page 8, lines 249-251).

Reviewer #2: Aiming at these problems of image colorization algorithms based on deep learning, such as color bleeding and insufficient color, the authors convert the study of image colorization to the optimization of image semantic segmentation, and propose a fully automatic image colorization model based on semantic segmentation technology. Although the idea and research questions of this paper are timely and clear, some major questions need to be addressed. My questions and comments are presented as follows and may be helpful to improve the quality of this paper.

2. Response to comment: Thank you very much for the excellent and professional review of our manuscript. We found the reviewer's comments helpful in revising the manuscript and have carefully considered and responded to each suggestion. In most cases we were able to incorporate the feedback into our revised manuscript. There were some very good suggestions that we could not complete in a short time, but we learned a lot from them. By answering the following questions, we realize that there is still a lot to learn in this field. We will keep your ideas in mind and continue to do further research in this field.
Comment 2.1: The paper does not explain clearly its advantages with respect to the existing methods: it is not clear what the novelty and contributions of the proposed work are. Does it propose a new method or an improvement of existing models? If so, please state the modifications clearly. There is slight novelty, but there are several components of the framework that would require marked improvement.

Reply 2.1: Our network draws on the work of Iizuka et al. [16], and preprocesses the image before sending it to the network in order to improve the contrast of abnormally exposed images. At the same time, we add a semantic segmentation network to the model to optimize the coloring effect. These descriptions can be found in the manuscript (on page 3, lines 113-115). In addition, when designing the loss function, we considered both the loss of the semantic segmentation network and the loss of the color prediction network (lines 227-243 on pages 7-8 of the manuscript). The experimental results show that the coloring effect obtained with our model is better than that of Iizuka et al. [16] in terms of both objective indicators and subjective evaluation; each set of experiments in this paper includes a comparison between the coloring effect of the proposed algorithm and that of their algorithm (from line 296 on page 9 of the manuscript to the end of the experimental results and analysis). Compared with existing methods, the novelty of the fully automatic image colorization method based on semantic segmentation technology proposed in this article is reflected in: (1) the addition of histogram equalization preprocessing for color images, which gives our model an obvious advantage over other methods when dealing with over-dark or over-exposed images; (2) the introduction of semantic segmentation technology, which improves the accuracy of the algorithm and solves the problem of color overflow. The contributions of this paper include (see lines 38-43 on page 2 of the manuscript): (1) histogram equalization effectively improves the visual effect and the colorfulness of overexposed and underexposed images; (2) the introduction of the semantic segmentation network accelerates the edge convergence of the image, improves the positioning accuracy of the algorithm, and solves the problem of color bleeding; (3) compared with several popular algorithms, our model achieves better results on natural image colorization and old black-and-white image colorization. As for the several components you mentioned that need marked improvement, we have been thinking about this since we received the suggestion. Although we have tried some parameter modifications and component innovations, the current results are not yet satisfactory. Your suggestions are very important to us: they have helped us discover the shortcomings in our current work, and we will follow them to improve our research and achieve more results in future work. We are very happy to receive your guidance, and we especially hope that this article can be accepted.
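Reply 2.1 mentions histogram equalization as a preprocessing step for abnormally exposed inputs. The paper's exact preprocessing is not reproduced in this letter, so the following is only a minimal sketch of one common variant, assuming OpenCV and equalization of the lightness channel only (so that hue is preserved); the file names are hypothetical.

# Minimal sketch (an assumption, not the authors' exact preprocessing):
# histogram-equalize only the luminance channel of a color image so that
# over- or under-exposed inputs gain contrast without shifting their hue.
import cv2

img = cv2.imread("input.jpg")                   # hypothetical file name, BGR, uint8
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)      # work in Lab space
l, a, b = cv2.split(lab)
l_eq = cv2.equalizeHist(l)                      # equalize the lightness channel only
out = cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)
cv2.imwrite("equalized.jpg", out)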
Comment 2.2: In Tables 6 and 7, some PSNR and SSIM indexes are much higher than those of the compared methods. This issue leads to my concerns about generalization and overfitting in some occasions.

Reply 2.2: The question you raised is also something we thought about when designing the network model and the loss function, and we prepared for it at that time. If the model has a small error on the training set but a large error on the test set, we have two solutions: (1) at the data level, a simple MATLAB script can shift, zoom, rotate, flip, and add noise to each image to obtain more data, thereby realizing data augmentation; (2) we can add L1 regularization to the loss function to prevent overfitting and improve the generalization ability. At present, our model has not exhibited under-fitting or over-fitting. It performs very well on common natural image colorization (from page 11, line 354 to page 13, line 387), old black-and-white image colorization (from page 13, line 388 to page 14, line 429) and the coloring of images containing special materials (page 15, lines 430-442). Our model also has some areas that need to be improved (page 15, lines 443-454). In the future we will continue to work on the colorization of images and videos, and we will continue to pay attention to this problem and to optimize the network structure.
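As a minimal sketch of the two anti-overfitting measures mentioned in Reply 2.2 (shown here in Python/PyTorch rather than MATLAB; the function and variable names are illustrative assumptions, not the authors' code):

# Minimal sketch (assumed names, not the authors' code) of the two
# anti-overfitting measures mentioned in Reply 2.2.
import torch
from torchvision import transforms

# (1) Data augmentation: random shift, zoom, rotation and flip, plus additive noise.
augment = transforms.Compose([
    transforms.RandomAffine(degrees=15, translate=(0.1, 0.1), scale=(0.9, 1.1)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Lambda(lambda x: (x + 0.02 * torch.randn_like(x)).clamp(0.0, 1.0)),
])

# (2) L1 weight regularization added to the task loss; `model` and `l1_weight`
# are hypothetical placeholders for the colorization network and penalty weight.
def total_loss(task_loss, model, l1_weight=1e-5):
    l1_penalty = sum(p.abs().sum() for p in model.parameters())
    return task_loss + l1_weight * l1_penalty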
Comment 2.3: In discussing the complexity of your method, it might also be helpful to include an analysis of the empirical speed of different parts of the colorization process.

Reply 2.3: Thank you for mentioning the analysis of running speed; this is indeed something we had not considered sufficiently. We have added an experiment comparing our method with several other methods in terms of execution efficiency (on page 15, lines 455-464). However, we did not specifically discuss the execution efficiency of the different parts of the colorization process, for two main reasons: (1) although it would be informative to analyze the speed of the different parts of the colorization process, for a complete automatic colorization method the running efficiency of any single part does not determine the final execution effect; (2) in this paper we discuss the performance of the proposed method on the overall task of image colorization, and examining the local running speed inside the method is not the focus of this article. If we want to further improve the running speed of our model in the future, it will be natural to study the optimization potential of the different parts of the colorization process.

Comment 2.4: Could the techniques used or insights gained in this paper be applied to problems apart from image colorization? If so, these could be discussed in the conclusion.

Reply 2.4: This paper does not introduce any fundamentally new technology; the semantic segmentation technology used here to divide the image into different regions and extract regions of interest can also be applied to fields such as image classification, object detection, and autonomous driving. However, since this is already a well-known conclusion, it is not discussed in our conclusion section.

Comment 2.5: The literature has to be strongly updated with some relevant and recent papers focused on the fields dealt with in the manuscript, such as "Automated colorization of a grayscale image with seed points propagation, IEEE Transactions on Multimedia 22 (7), 1756-1768, 2020" and "Scene guided colorization using neural networks, Neural Computing and Applications, 2019."

Reply 2.5: Yes, we carefully studied the two articles you recommended; each of them proposes a fully automatic coloring algorithm, as our method does. Wan S et al. [47] mention in their conclusion the subsequent application of the model to low-light night-vision images, and we point out in our contributions that the equalization can improve the coloring effect of low-exposure images (on page 2, lines 38-43); the specific experimental comparison can be seen in Fig 6 (on page 11). In follow-up work we will also consider further expanding its scope of application, for example to the color enhancement of low-light night-vision images and to infrared image colorization. Yu X et al. [48] use scene-guided coloring, which essentially classifies the images first and then colors them quickly and effectively according to the characteristics of the images in each category. This makes the coloring more targeted and, in theory, more reasonable, and it indeed has great reference value; we have cited both papers in the manuscript (on page 3, lines 100-111). In the experimental part, however, we did not compare them with the colorization effect of the algorithm in this paper. In future research, we will pay attention to keeping the literature up to date. Among the five comparison algorithms, three are very classic grayscale image colorization algorithms from 2016 [16, 17, 18]; the latest research generally still takes them as comparison algorithms because their coloring effect is really excellent. The other two are from 2019 [45] and 2020 [46]. Among them, the 2020 paper has been put into commercial use in PS Elements. It should be noted that we used the automatic coloring mode of [46] in the comparative experiments; its manual coloring mode is indeed very good and can color according to the user's own wishes, but it requires the user to have a certain color-matching foundation, in which case the final result is excellent. The references cited in the reply to this question are numbered according to their order in the manuscript, as follows:

[16] Iizuka, Satoshi; Simo-Serra, Edgar; Ishikawa, Hiroshi. Let there be color!: joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification. ACM Transactions on Graphics (TOG). 2016; 35(4): 110.
[17] Larsson, Gustav; Maire, Michael; Shakhnarovich, Gregory. Learning representations for automatic colorization. European Conference on Computer Vision. Springer, 2016; pp. 577-593.
[18] Zhang, Richard; Isola, Phillip; Efros, Alexei A. Colorful Image Colorization. arXiv preprint arXiv:1603.08511. 2016.
[45] Lei, Chenyang; Chen, Qifeng. Fully Automatic Video Colorization With Self-Regularization and Diversity. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019; pp. 3753-3761.
[46] Su, Jheng-Wei; Chu, Hung-Kuo; Huang, Jia-Bin. Instance-aware Image Colorization. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2020.
[47] Wan, Shaohua; Xia, Yu; Qi, Lianyong; Yang, Yee-Hong; Atiquzzaman, Mohammed. Automated Colorization of a Grayscale Image With Seed Points Propagation. IEEE Transactions on Multimedia. 2020; 22(7): 1756-1768.
[48] Xia, Y.; Qu, S.; Wan, S. Scene guided colorization using neural networks. Neural Computing and Applications. 2018. https://doi.org/10.1007/s00521-018-3828-z

We tried our best to improve the manuscript and have made changes accordingly; these changes do not affect the content and framework of the paper.
Here we did not list every modification one by one; instead, we replied in detail to the comments raised by the academic editor and the two reviewers, and pointed out the modifications related to these comments. See the separate file labeled "Revised Manuscript with Track Changes" for the details of the modifications. We earnestly appreciate the Editors' and Reviewers' work, and hope that the corrections will meet with approval. Once again, thank you very much for your comments and suggestions.

Sincerely yours,

Submitted filename: Response to Reviewers.pdf

2 Nov 2021

Fully automatic image colorization based on semantic segmentation technology
PONE-D-20-38461R1

Dear Dr. Ding,

We're pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you'll receive an e-mail detailing the required amendments. When these have been addressed, you'll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance.

To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible, and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Gulistan Raja
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the "Comments to the Author" section, enter your conflict of interest statement in the "Confidential to Editor" section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed
Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file).
The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors addressed all my comments very carefully. I was not able to find any logical errors in the presentation. I can recommend this manuscript for publication.

Reviewer #2: The authors have made significant progress in improving the paper. All issues raised in my comments have been addressed reasonably, so I am in favor of publication of the paper.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose "no", your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No

12 Nov 2021

PONE-D-20-38461R1
Fully automatic image colorization based on semantic segmentation technology

Dear Dr. Ding:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department. If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of Dr. Gulistan Raja
Academic Editor
PLOS ONE
  6 in total

1.  Image quality assessment: from error visibility to structural similarity.

Authors:  Zhou Wang; Alan Conrad Bovik; Hamid Rahim Sheikh; Eero P Simoncelli
Journal:  IEEE Trans Image Process       Date:  2004-04       Impact factor: 10.856

2.  Automatic mood-transferring between color images.

Authors:  Chuan-Kai Yang; Li-Kai Peng
Journal:  IEEE Comput Graph Appl       Date:  2008 Mar-Apr       Impact factor: 2.088

3.  Quaternion structural similarity: a new quality index for color images.

Authors:  Amir Kolaman; Orly Yadid-Pecht
Journal:  IEEE Trans Image Process       Date:  2011-12-23       Impact factor: 10.856

4.  Deep Learning Markov Random Field for Semantic Segmentation.

Authors:  Ziwei Liu; Xiaoxiao Li; Ping Luo; Chen Change Loy; Xiaoou Tang
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2017-08-09       Impact factor: 6.226

5.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs.

Authors:  Liang-Chieh Chen; George Papandreou; Iasonas Kokkinos; Kevin Murphy; Alan L Yuille
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2017-04-27       Impact factor: 6.226

6.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation.

Authors:  Vijay Badrinarayanan; Alex Kendall; Roberto Cipolla
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2017-01-02       Impact factor: 6.226

  1 in total

1.  Automatic Gray Image Coloring Method Based on Convolutional Network.

Authors:  Jiayi Fan; Wentao Xie; Tiantian Ge
Journal:  Comput Intell Neurosci       Date:  2022-04-26
