Ying Wang1, Jianbo Wu2,3, Hui Deng1, Xianghui Zeng1. 1. College of Food and Chemistry Engineering, Shaoyang University, Shao Yang, Hunan 422000, China. 2. Nankai University, Tianjin 300071, China. 3. Si Chuan Corder Technology Co., Ltd, Chengdu 610209, China.
Abstract
Deep learning, a branch of machine learning, has been applied in many fields such as image recognition, image segmentation, and video segmentation, and in recent years it has gradually been applied to food recognition. However, food recognition scenes are complex and varied, and both the accuracy and the speed of recognition remain unsatisfactory. To address these problems, this paper proposes a food image recognition method based on neural networks. Combining Tiny-YOLO with a twin (Siamese) network, the method adopts a two-stage learning mode, YOLO-Siam, in two versions, YOLO-SiamV1 and YOLO-SiamV2. Experiments show that the recognition accuracy of this method is only moderate; however, it requires no manual labeling and therefore has good prospects for practical application. In addition, a method for detecting and recognizing foreign bodies in food is proposed, which separates foreign bodies from the food background by threshold segmentation. Experimental results show that the method can effectively distinguish desiccant from foreign matter and achieves the desired effect.
In the new era, the development of China's catering industry is characterized by a focus on health. In recent years, health awareness has risen across the country, a healthy body shape has become a common goal, and weight and body-shape management have gained general acceptance. The rapid development of computer technology is a driving force for the Chinese catering industry. On the one hand, artificial intelligence has been integrated into all aspects of social life and is the backbone of mobile Internet diet applications. On the other hand, the demand for new business formats in catering has produced more application scenarios and calls for more accurate technology [1]. Under the joint action of technical support and social need, the diet field has high research value in its specific application scenarios and offers new ideas for the development of mobile Internet diet applications [2].
In recent years, image detection and classification have developed rapidly, and many methods based on machine learning have greatly improved their accuracy and efficiency [3]. As a result, image detection and classification can be better applied in many practical fields and industries. Mobile applications such as dish image recognition and classification and food health management bring great convenience to healthy living and have a wide range of application scenarios. In catering, intelligent ordering services and restaurant recommendation are rapidly developing areas of application.
In actual production environments, different catering systems have accumulated large amounts of dish image data [4]. Dish image recognition and classification is an important research direction that combines application practice with object detection technology, and both the feasibility of the technology and the actual demand must be considered together [5]. Although dish recognition has made some progress, there is still room for improvement in the basic problems of dish image recognition and classification. In particular, as object detection technology represented by convolutional neural networks matures, these problems can be better solved with new techniques. Dish types differ widely, and interference factors such as lighting during photography affect recognition accuracy [6]. Improving this accuracy makes dish image recognition and classification an important direction for specialized research in the future.
China's food industry has made great achievements, but it also faces problems that cannot be ignored. Food quality is one of the most important, and it is a bottleneck restricting the development of the industry [7]. Compared with foreign products, foreign body inspection equipment made in China lags in accuracy and speed. The equipment developed by many enterprises detects only a limited range of foreign bodies, measures slowly, and can detect only objects with regular shapes, which limits its adoption and use [8]. The overall technical strength is weak and the equipment is outdated. It is therefore of great significance to study foreign body inspection systems for food quality management [9].
2. Food Recognition Based on Deep Convolutional Neural Network
2.1. Convolutional Neural Network Structure
The convolutional neural network (CNN) is the most popular and widely used neural network in the field of computer vision [10]. Figure 1 illustrates the workflow of an exemplary CNN model: the input image passes through successive convolution and pooling stages to produce feature maps, which are then classified by the fully connected layer.
Figure 1
CNN model workflow chart.
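A CNN has two key characteristics, sparse connectivity and weight sharing, which significantly reduce the number of model parameters and make it possible to enlarge the network and train more complex models without increasing the training data. Batch normalization (BN) normalizes the features produced by a convolution layer to zero mean and unit variance, scales and shifts them with two learnable linear parameters, and then passes the result through the activation function to the next layer. For a mini-batch $\{x_1, \dots, x_m\}$, the BN layer in its standard form computes:
$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2, \qquad \hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma \hat{x}_i + \beta, \tag{1}$$
where $\gamma$ and $\beta$ are the learnable scale and shift parameters and $\epsilon$ is a small constant for numerical stability.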
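The batch normalization step described above can be sketched in a few lines of NumPy. This is a minimal illustration of the standard training-time computation, not the authors' implementation; the function name `batch_norm`, the `eps` value, and the random test features are assumptions made for the example.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch axis, then scale and shift.

    x: array of shape (batch, features); gamma, beta: shape (features,).
    eps is an assumed small constant that avoids division by zero.
    """
    mu = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                    # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta            # learnable scale and shift

rng = np.random.default_rng(0)
features = rng.normal(loc=3.0, scale=2.0, size=(64, 8))  # toy "conv features"
out = batch_norm(features, gamma=np.ones(8), beta=np.zeros(8))
print(out.mean(axis=0).round(6), out.var(axis=0).round(3))
```

With `gamma = 1` and `beta = 0` the output has approximately zero mean and unit variance per feature; during training the network learns other values of the two parameters.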
2.2. Semisupervised Labeling and Coarse Enhancement of Food Images
2.2.1. Semisupervised Labeling of Food Images
Food detection datasets are difficult to obtain because every sample must be labeled manually, which consumes a great deal of human effort. To reduce the labeling workload, this section studies an automatic image labeling method, which was successfully applied to the CNFood-252 dataset [11], as shown in Figure 2.
Figure 2
Automatically label samples.
Automatic labeling of the CNFood-252 dataset with a small threshold produced some erroneous boxes; manual verification found 52 such food image samples, some of which are shown in Figure 3 [12,13].
Figure 3
Partially staggered sample.
This method still requires manual checking of the labels, but it is far more efficient than fully manual labeling. In practice, it can therefore be used to construct datasets at an early stage.
2.2.2. Coarse Enhancement of Food Image
In real environments, the position and spatial layout of food in an image are not fixed. To improve the generalization ability of the model, the dataset is expanded by rotating the images: the original image and its horizontally flipped copy are both rotated in steps of 12° to generate additional samples. The specific steps are as follows:
(1) Determine the food category to be augmented and extract the original image, as shown in Figure 4 (taking Lion Head vermicelli as an example).
Figure 4
Categories to be enhanced.
(2) The extracted image is horizontally flipped to obtain a flipped image, as shown in Figure 5.
Figure 5
Horizontally flipped image. The original image is on the left, and the flipped image is on the right.
(3) The original image and the flipped image are both rotated in steps of 12° to generate the augmented images.
(4) To reduce the influence on the detection results of the black edges introduced by rotation, the black edges are filled with the center color of the tray. Trays of various colors are shown in Figure 6, and the filling results are shown in Figure 7.
Figure 6
Trays of various colors.
Figure 7
Black edges filled with the color of the tray's center pixel.
(5) Finally, a region extending 100 pixels around the center of the tray is taken, and the images are placed at random positions within this region to generate the expanded samples, as shown in Figure 8.
Figure 8
Augmented images of different tray samples.
The method generates the label files automatically while it expands the samples. It reduces the amount of manual labeling, although it is suited only to coarse sample augmentation. The performance of a model trained on the generated samples is tested in the experiments.
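The flip, rotate, and fill steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: rotation is assumed to cover the full circle in 12° steps, nearest-neighbor sampling is used, the helper names `rotate_with_fill` and `augment` are hypothetical, and the random placement around the tray center is omitted.

```python
import numpy as np

def rotate_with_fill(img, angle_deg, fill):
    """Rotate an H x W x 3 image about its center (nearest neighbor).

    Output pixels with no source pixel are set to `fill` (e.g. the tray's
    center color) instead of being left black.
    """
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    ys, xs = np.mgrid[0:h, 0:w]
    dy, dx = ys - cy, xs - cx
    # Inverse mapping: locate the source coordinate of every output pixel.
    src_x = np.cos(theta) * dx + np.sin(theta) * dy + cx
    src_y = -np.sin(theta) * dx + np.cos(theta) * dy + cy
    sx = np.rint(src_x).astype(int)
    sy = np.rint(src_y).astype(int)
    inside = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.empty_like(img)
    out[...] = fill                      # start from the fill color
    out[inside] = img[sy[inside], sx[inside]]
    return out

def augment(img, tray_color, step_deg=12):
    """Rotate the image and its horizontal flip every `step_deg` degrees."""
    flipped = img[:, ::-1]
    return [rotate_with_fill(base, angle, tray_color)
            for base in (img, flipped)
            for angle in range(0, 360, step_deg)]

dish = np.zeros((40, 60, 3), dtype=np.uint8)
dish[10:30, 20:40] = (200, 120, 40)                      # toy "food" patch
tray_color = np.array([230, 230, 230], dtype=np.uint8)   # assumed tray color
samples = augment(dish, tray_color)
print(len(samples))  # 2 base images x 30 angles = 60
```

Under these assumptions one original image yields 60 samples, including the unrotated original and its flip; the bounding boxes would have to be transformed with the same rotation to produce the label files.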
2.3. Food Image Location and Classification
Whether a detection method is one-stage or two-stage, it is in essence a combination of a localization task and a classification task [12]. Detection models regress the target position with high accuracy, but their classification ability is comparatively weak. Classification models, on the other hand, are very mature, and many models with excellent performance have been proposed. Localization and classification are therefore carried out in two separate stages to combine the advantages of both [14]. The process, illustrated with the CNFood-252 dataset, is shown in Figure 9.
Figure 9
Location + classification flowchart.
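The localization-plus-classification idea can be sketched as a small pipeline in which a detector only proposes boxes and a separate classifier labels each crop. The code below is an illustrative sketch: `recognize`, `toy_locate`, and `toy_classify` are hypothetical names, and the two toy functions merely stand in for a trained localizer (such as Tiny-YOLO) and a trained classification network.

```python
import numpy as np

def recognize(image, locate, classify):
    """Stage 1 proposes boxes; stage 2 labels each cropped region."""
    results = []
    for (top, left, bottom, right) in locate(image):
        crop = image[top:bottom, left:right]
        results.append(((top, left, bottom, right), classify(crop)))
    return results

# Stand-ins for trained models (assumptions for the example only).
def toy_locate(image):
    # A real localizer would predict these boxes from the image.
    return [(0, 0, 4, 4), (4, 4, 8, 8)]

def toy_classify(crop):
    # A real classifier would be a CNN applied to the resized crop.
    return "rice" if crop.mean() > 0.5 else "kelp"

tray = np.zeros((8, 8))
tray[0:4, 0:4] = 1.0
detections = recognize(tray, toy_locate, toy_classify)
print(detections)
```

Because the two stages are decoupled, the classifier can be swapped for a different network without retraining the localizer.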
2.4. Food Image Matching
Object detection is supervised learning and cannot recognize categories it has not been trained on. In practice, however, food categories change frequently. The biggest problem with food detection is that whenever a new food category is added, the model has to be retrained, and because the whole update process takes a long time, the new category cannot be used immediately. The second problem is that food detection requires collecting training samples: the data augmentation method described above reduces the number needed, but a certain number of original samples must still be collected.
Image matching is one of the important implementation approaches in content-based image retrieval (CBIR) [15]. In this section, the ideas of few-shot learning and metric learning are introduced to solve these problems of object detection through image matching.
2.4.1. Small Sample Learning
The purpose of small sample (few-shot) learning is to extract important features from a limited number of samples while remaining robust. In essence, it imitates the rapid learning ability of human beings: after learning from a large amount of data, a model needs only a few samples to perform well on new categories [16]. According to the number of training samples per category, it can be divided into single-sample (one-shot) learning and K-sample (K-shot) learning, where K is the number of training samples and is generally no more than 20.
2.4.2. Metric Learning
Metric learning is also called similarity learning: the relationship between two samples is determined by measuring the similarity between them, generally expressed by the Euclidean distance or the Mahalanobis distance. Traditional metric methods such as KNN rely on simple nonparametric estimation, whereas metric methods based on deep learning, also called deep metric learning, use the strong feature representation ability of CNNs to measure similarity in a high-dimensional space [17,18]. The metric-based models currently in common use for few-shot classification include matching networks, prototypical networks, relation networks, and twin (Siamese) networks. A twin network takes two samples as input and is trained with a contrastive loss function to compute the similarity between them.
As shown in Figure 10, a twin network combines samples into pairs, feeds each pair into the network for training, and applies a similarity function to compute the similarity of the pair. The specific process is as follows:
Figure 10
Twin network structure diagram. Green color represents the convolution layer, and blue color represents the pooling layer.
Feature maps $f(x_1)$ and $f(x_2)$ are obtained from the sample pair $x_1$ and $x_2$ by CNN feature extraction and are flattened into the vectors $\alpha$ and $\beta$, as shown in equations (2) and (3):
$$\alpha = \mathrm{vec}(f(x_1)), \tag{2}$$
$$\beta = \mathrm{vec}(f(x_2)). \tag{3}$$
The distance between vector $\alpha$ and vector $\beta$ is calculated using a distance formula, for example the $l_2$ norm:
$$D(\alpha, \beta) = \lVert \alpha - \beta \rVert_2.$$
For the input samples $x_1$ and $x_2$, $D(\alpha, \beta)$ is smaller if they belong to the same class and larger if they belong to different classes. The loss function of the model is defined on this basis over all sample pairs, where N is the number of sample pairs, Y is the label of a sample pair, indicating whether $x_1$ and $x_2$ belong to the same category, and m is the judgment threshold (margin).
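A loss with the ingredients named above (N pairs, pair label Y, threshold m, and the l2 distance D) is commonly written as the contrastive loss. The sketch below uses one standard form of it and should be read as an assumption rather than the authors' exact formula: here Y = 1 marks a same-class pair and Y = 0 a different-class pair, and the function names are hypothetical.

```python
import numpy as np

def pair_distance(f1, f2):
    """Flatten each feature map and return the l2 distance per pair."""
    a = f1.reshape(f1.shape[0], -1)
    b = f2.reshape(f2.shape[0], -1)
    return np.linalg.norm(a - b, axis=1)

def contrastive_loss(d, y, m=1.0):
    """Contrastive loss over N pairs (assumed convention: y = 1 same class).

    Same-class pairs are penalized by their squared distance; different-class
    pairs are penalized only when they are closer than the margin m.
    """
    same = y * d ** 2
    diff = (1 - y) * np.maximum(m - d, 0.0) ** 2
    return np.sum(same + diff) / (2 * len(d))

# Three toy pairs of 2 x 2 feature maps with distances 0, 0.5, and 2.
f1 = np.zeros((3, 2, 2))
f2 = np.stack([np.zeros((2, 2)), np.full((2, 2), 0.25), np.ones((2, 2))])
y = np.array([1, 0, 0])
d = pair_distance(f1, f2)
loss = contrastive_loss(d, y, m=1.0)
print(d, loss)
```

Only the second pair contributes here: it is a different-class pair whose distance (0.5) is inside the margin, while the third pair is already farther apart than m.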
2.4.3. Model Design
In order to realize image retrieval of multiple targets, the twin network uses FewFood-50 dataset for training to measure the similarity between sample pairs [19], and the YOLO-SiamV1 model is shown in Figure 11.
Figure 11
Flowchart of YOLO-SiamV1 model.
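Once a twin network has been trained, recognition by matching can be sketched as a nearest-neighbor search over stored reference embeddings, which is why a new food category needs only a reference sample rather than retraining. The code below is an illustrative sketch and not the YOLO-Siam implementation: `match_to_gallery` is a hypothetical helper, and the three-dimensional vectors stand in for embeddings produced by the network.

```python
import numpy as np

def match_to_gallery(query, gallery):
    """Return the label and distance of the closest reference embedding."""
    best_label, best_dist = None, float("inf")
    for label, ref in gallery.items():
        dist = float(np.linalg.norm(query - ref))  # l2 distance, as in D
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist

# Toy embeddings (assumed values), one reference per category.
gallery = {
    "egg": np.array([1.0, 0.0, 0.0]),
    "sesame cake": np.array([0.0, 1.0, 0.0]),
}
label, dist = match_to_gallery(np.array([0.9, 0.1, 0.0]), gallery)

# Adding a category only requires storing one more reference embedding.
gallery["fried hairtail"] = np.array([0.0, 0.0, 1.0])
label_new, _ = match_to_gallery(np.array([0.1, 0.0, 0.8]), gallery)
print(label, round(dist, 4), label_new)
```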
Experiments show that this model does not perform well. The twin network is therefore improved, and the YOLO-SiamV2 model is proposed, which extends the CNN to 15 layers: 10 convolutional layers, 4 pooling layers, and 1 fully connected layer. The twin network structures of the YOLO-SiamV1 and YOLO-SiamV2 models are compared in Table 1.
Table 1
Comparison of YOLO-SiamV1 and YOLO-SiamV2 twin network structures.
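| YOLO-SiamV1 layer | Kernel size | Number of channels | Stride | YOLO-SiamV2 layer | Kernel size | Number of channels | Stride |
|---|---|---|---|---|---|---|---|
| Conv1 | 10×10 | 64 | 1 | Conv1 | 3×3 | 64 | 1 |
| Maxpooling | 2×2 | - | 2 | Conv2 | 3×3 | 64 | 1 |
| Conv2 | 7×7 | 128 | 1 | Maxpooling | 2×2 | - | 2 |
| Maxpooling | 2×2 | - | 2 | Conv3 | 3×3 | 128 | 1 |
| Conv3 | 4×4 | 128 | 1 | Conv4 | 3×3 | 128 | 1 |
| Maxpooling | 2×2 | - | 2 | Maxpooling | 2×2 | - | 2 |
| Conv4 | 4×4 | 256 | 1 | Conv5 | 3×3 | 256 | 1 |
| FC | - | 4096 | - | Conv6 | 3×3 | 256 | 1 |
| | | | | Maxpooling | 2×2 | - | 2 |
| | | | | Conv7 | 3×3 | 512 | 1 |
| | | | | Conv8 | 3×3 | 512 | 1 |
| | | | | Maxpooling | 2×2 | - | 2 |
| | | | | Conv9 | 3×3 | 512 | 1 |
| | | | | Conv10 | 3×3 | 512 | 1 |
| | | | | FC | - | 4096 | - |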
3. Food Foreign Matter Detection Method
3.1. Image Segmentation
In order to analyze and recognize images, mathematical morphology uses some structural elements as a tool to measure and extract the corresponding shape features in images.
3.1.1. Dilation
Dilation (expansion) combines two sets using vector addition. For an image set A and a structuring element B, it is defined as
$$A \oplus B = \{\, a + b \mid a \in A,\ b \in B \,\}.$$
The effect of dilation is to merge the background points around an object into the object.
3.1.2. Erosion
Erosion (corrosion) combines two sets using vector subtraction of set elements and is the dual operation of dilation. It is defined as
$$A \ominus B = \{\, x \mid x + b \in A \ \text{for every}\ b \in B \,\}.$$
3.1.3. Opening Operation
The opening operation erodes the image and then dilates it with the same structuring element [20]:
$$A \circ B = (A \ominus B) \oplus B.$$
Opening treats small protrusions of the object into the background as background. It removes fine details such as spikes, burrs, and narrow connections and smooths the boundary.
3.1.4. Closing Operation
The closing operation dilates the image and then erodes it with the same structuring element [21]:
$$A \bullet B = (A \oplus B) \ominus B.$$
Closing fills small holes, connects adjacent objects, fills narrow gaps and sharp corners that cut into the object, and smooths the edges of objects.
The purpose of using mathematical morphology here is to fill holes and eliminate burrs in images. For example, to obtain a cleaner version of the thresholded image in Figure 12(a), the segmented image is first eroded and then dilated. As shown in Figure 12, this removes the external burrs without changing the overall shape.
Figure 12
Image threshold segmentation, erosion, and dilation. (a) Image after threshold segmentation. (b) Segmented image after erosion. (c) Eroded image after dilation.
Figure 13 shows the result of first dilating and then eroding the thresholded image. This morphological processing effectively fills the holes in the image and forms a connected region that covers the segmented region of interest together with nearby burr portions.
Figure 13
Image threshold segmentation, dilation, and erosion. (a) Image after threshold segmentation. (b) Segmented image after dilation. (c) Dilated image after erosion.
Experiments show that, on binary images, all four basic operations can filter noise to some extent. In particular, opening and closing, which are dual operations, remove fine details of the image while keeping the overall shape unchanged, so they are widely used for image denoising [22].
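The four morphological operations of this section can be sketched on a binary image with plain NumPy. This is a minimal illustration rather than the HALCON implementation used in the paper: the function names are hypothetical, the structuring element is assumed to have odd size with its origin at the center, and pixels outside the image are treated as background.

```python
import numpy as np

def _shifted_views(img, se):
    """Yield the image shifted by every offset contained in `se`."""
    kh, kw = se.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), constant_values=False)
    h, w = img.shape
    for i in range(kh):
        for j in range(kw):
            if se[i, j]:
                yield padded[i:i + h, j:j + w]

def erode(img, se):
    """Keep a pixel only if the element placed there fits in the object."""
    out = np.ones_like(img, dtype=bool)
    for view in _shifted_views(img, se):
        out &= view
    return out

def dilate(img, se):
    """Set a pixel if the reflected element placed there hits the object."""
    out = np.zeros_like(img, dtype=bool)
    for view in _shifted_views(img, se[::-1, ::-1]):
        out |= view
    return out

def opening(img, se):
    return dilate(erode(img, se), se)   # erode first, then dilate

def closing(img, se):
    return erode(dilate(img, se), se)   # dilate first, then erode

se = np.ones((3, 3), dtype=bool)
square = np.zeros((9, 9), dtype=bool)
square[2:7, 2:7] = True

with_burr = square.copy()
with_burr[4, 7] = True          # one-pixel burr on the right edge
with_hole = square.copy()
with_hole[4, 4] = False         # one-pixel hole in the middle

opened = opening(with_burr, se)  # burr removed, square preserved
closed = closing(with_hole, se)  # hole filled, square preserved
print(opened[4, 7], closed[4, 4])
```

In this toy case opening removes the burr and closing fills the hole, and both return the original square, which mirrors the behavior described for Figures 12 and 13.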
3.2. Foreign Body Identification Method
3.2.1. Feature-Based Recognition
Image feature extraction is carried out after image preprocessing and segmentation [23]. Because it builds on preprocessing and segmentation, good features are easier to extract, and the extracted features are more distinctive and more independent.
Regional Characteristics. The basic parameters of a region include its area, center of gravity, and shape, which are usually calculated from the set of all pixels belonging to the region.
Area. The area is the basic feature of a region and represents its size. The area a of a region R is calculated as
$$a = |R| = \sum_{(r, c) \in R} 1.$$
As the formula shows, computing the area amounts to counting the pixels in the region.
Regional Center of Gravity. The center of gravity is a global descriptor of the region. Its coordinates are calculated from all points belonging to the region, which generally contains many points.
Shape Parameters. The shape parameters of a region are usually used to describe the shape of the target region and are calculated from the perimeter of the region contour and the area of the region. Shape parameters are insensitive to changes in the size of the region.
Contour Feature. The basic parameters of a contour include its length, diameter, inclination, curvature, corners, and so on.
Length of Contour. The contour length is a simple regional feature: it is the perimeter of the region.
The Diameter of the Contour. The diameter of a contour is the distance between the two farthest points in the region, that is, the length of the straight line segment between them, and it helps describe the region.
Inclination, Curvature, and Corners of the Contour. The inclination of the contour indicates the direction of the contour at each point. Curvature is the rate of change of inclination and indicates how the contour direction changes at each point.
Grayscale Feature. The gray-level features of a region are very important and easy to obtain, and they are also the features most readily distinguished by the human eye.
Feature-Based Recognition. Figure 14 shows several X-ray images of packaged peanuts containing desiccant. The skeleton operator in HALCON is used to construct the skeleton of each region, and the skeleton length, area, and center of each region are calculated. The results are shown in Table 2.
Figure 14
X-ray photos of peanuts in different packages.
Table 2
Center coordinates, skeleton length, and area of the desiccant regions.
| Desiccant region | Center row | Center column | Skeleton length | Area | Area-length ratio |
|---|---|---|---|---|---|
| a | 201.641 | 369.5 | 98.78 | 6225 | 15.64 |
| b | 111.345 | 280.375 | 412.4 | 6239 | 15.44 |
| c | 213.784 | 446.807 | 59.98 | 6380 | 12.52 |
| d | 211.804 | 423.465 | 625.98 | 6448 | 10.40 |
| e | 195.355 | 358.584 | 454.95 | 6376 | 14.02 |
| f | 488.75 | 285.83 | 458.85 | 6400 | 13.94 |
As can be seen from Table 2, the area of the desiccant regions lies between 6225 and 6448. In actual food production, a given kind of packaged food contains only one type of desiccant, whose features are basically stable. Therefore, the desiccant region can be effectively distinguished from foreign matter by its area and quickly excluded from the foreign matter candidates.
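The area-based screening described above can be sketched with NumPy: count the pixels of a segmented region, compute its center of gravity, and test whether the area falls inside the range expected for desiccant. The helper names and the bounds (6200, 6450) are assumptions chosen to bracket the areas listed in Table 2; they are not thresholds given by the authors.

```python
import numpy as np

def region_features(mask):
    """Return (area, (center_row, center_col)) of a binary region mask."""
    rows, cols = np.nonzero(mask)
    area = int(rows.size)                      # area = number of pixels
    center = (float(rows.mean()), float(cols.mean()))
    return area, center

def is_desiccant(area, area_range=(6200, 6450)):
    """Area test; the default bounds are illustrative assumptions."""
    low, high = area_range
    return low <= area <= high

# Toy regions: an 80 x 78 pixel "desiccant" and a thin 2 x 40 "wire".
desiccant = np.zeros((200, 200), dtype=bool)
desiccant[50:130, 60:138] = True
wire = np.zeros((200, 200), dtype=bool)
wire[100:102, 20:60] = True

area_d, center_d = region_features(desiccant)
area_w, _ = region_features(wire)
print(area_d, center_d, is_desiccant(area_d), is_desiccant(area_w))
```

A region that passes the area test is excluded from the foreign matter candidates, while a region such as the thin wire remains a candidate.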
3.2.2. Recognition Based on Template Matching
Objects are located in an image by means of a template image [24]. To find the position of the template in the image, the similarity between the template and the image must be computed for all relevant poses of the template. Where the similarity is high, an instance of the template has been found.
Assuming that the pose of the object can be described by a translation, a similarity measure is obtained at each position, and the result can itself be regarded as an image:
$$C(r, c) = s\{\, t(u, v),\ f(r + u, c + v);\ (u, v) \in T \,\},$$
where t is the template, f is the image, T is the template region, and s is the similarity measure.
Calculating the similarity over the whole image is very time-consuming. To speed up the algorithm, the number of poses examined and the number of points in the template must be reduced. This can be done by constructing an image pyramid, as shown in Figure 15.
Figure 15
Pyramid model.
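Translation-only template matching and the image pyramid can be sketched as follows. Normalized cross-correlation (NCC) is used here as one possible similarity measure; it is a swapped-in choice for illustration and differs from HALCON's shape-based measure. The function names and the random test image are assumptions.

```python
import numpy as np

def ncc_map(image, template):
    """Similarity image: NCC between the template and every window."""
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    rows = image.shape[0] - th + 1
    cols = image.shape[1] - tw + 1
    scores = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            win = image[r:r + th, c:c + tw]
            w = win - win.mean()
            denom = t_norm * np.sqrt((w ** 2).sum())
            scores[r, c] = (t * w).sum() / denom if denom > 0 else 0.0
    return scores

def build_pyramid(image, levels):
    """Halve the resolution at each level by averaging 2 x 2 blocks."""
    pyramid = [image]
    for _ in range(levels - 1):
        img = pyramid[-1]
        h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
        img = img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        pyramid.append(img)
    return pyramid

rng = np.random.default_rng(1)
image = rng.random((64, 64))
template = image[20:32, 30:42].copy()       # template cut from the image

scores = ncc_map(image, template)
best = tuple(int(v) for v in np.unravel_index(np.argmax(scores), scores.shape))
shapes = [level.shape for level in build_pyramid(image, 3)]
print(best, shapes)
```

In a coarse-to-fine search, the exhaustive scan is run only on the smallest pyramid level, and the candidates found there are refined in a small neighborhood on each finer level, which reduces the number of poses examined.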
The similarity measures described above tolerate only small rotations and scalings of the object in the image. If the orientation or scale of the object differs from that of the template, the object cannot be found. In actual packaged food, the desiccant shows large rotational deviations but very little scaling. To find rotated objects in the image, templates are created in multiple orientations, discretizing the rotation space, so that rotated objects can be searched for.
The HALCON shape-based matching algorithm constructs templates for small regions of interest. The steps are as follows:
(1) Determine the ROI of the template and extract the image of that region from the image.
(2) Create the template with the create_shape_model() operator. This operator has many arguments. NumLevels specifies the number of pyramid levels; the larger the value, the less time it takes to find the object. AngleStart and AngleExtent determine the range of possible rotations, and AngleStep specifies the step size within that angle range.
(3) After the template has been created, other images can be opened for template matching, that is, searching the new image for the part that is consistent with the template. If higher accuracy is needed, it can be requested through the SubPixel parameter (for example 'least_squares'); since this adds extra matching time, it requires a trade-off between time and accuracy. The two most important parameters are MinScore and Greediness. MinScore is the minimum similarity a candidate must reach to be accepted as an instance of the template; the larger the value, the more similar the match must be. Greediness controls how greedy the search is and has a great influence on search speed. In most cases, this value should be increased as far as possible while a match can still be found.
(4) If a matching instance of the template is found, the operators vector_angle_to_rigid() and affine_trans_contour_xld() are used to transform the template contour and display it.
Using shape template matching, the image part consistent with the template can be found in the image of Figure 16. The rectangular area in Figure 16 is the desiccant template used for shape template matching; its center is at (213.5, 434.5), its angle is 0.14061 rad, its width is 47.0744, and its height is 63.1916.
Figure 16
Shape matching template.
The desiccant in Figure 17(b) is not matched well. Analysis shows that this is because the food in Figure 17(b) was placed at an angle. When designing the system, mismatches caused by image capture should be avoided as far as possible.
Figure 17
Matching results of peanut shape templates with different packages.
The results of template matching are listed in Table 3. Template matching effectively locates the desiccant in packaged food.
Table 3
Position and matching degree of desiccant in X-ray picture.
| Desiccant | Row | Column | Angle (rad) | Matching degree |
|---|---|---|---|---|
| a | 205.467 | 378.568 | 6.15237 | 0.988416 |
| b | 114.756 | 295.135 | 0.667849 | 0.815648 |
| c | 215.759 | 456.995 | 6.2635 | 0.950045 |
| d | 213.569 | 434.59 | 0.0015894 | 0.991715 |
| e | 197.658 | 369.789 | 0.015569 | 0.973565 |
| f | 490.789 | 276.405 | 3.23678 | 0.902236 |
4. Experiment
In this section, the methods described above are tested and analyzed. First, the Faster R-CNN and YOLO models are trained and tested on the CNFood-252 dataset. Faster R-CNN uses VGG16, ResNetV1-50, ResNetV1-101, ResNetV1-152, and MobileNetV1 as feature-extraction networks, while the YOLO models use their own extraction network.
All experiments in this section were performed on a PC running Windows 10, with an Intel Core i9-9900K CPU and an RTX 2080 Ti GPU with 11 GB of memory to accelerate model training and testing. The models run on Python 3.6, TensorFlow 1.8.0, and CUDA 9.0.
From the CNFood-252 dataset, 30000 images are randomly selected as the training set, and the remaining 2190 images are used as the test set. Sampling is carried out category by category so that all foods are evenly distributed, which ensures that no category is missed during random sampling. The experimental results in Table 4 show that the YOLO series is much faster than Faster R-CNN, although its accuracy is lower than that of the best Faster R-CNN configurations. In terms of accuracy, the results demonstrate the feasibility of food object detection.
Table 4
Baseline result.
| Algorithm | Accuracy (%) | Detection speed (s) |
|---|---|---|
| YOLOV2 | 88.89 | 0.0063 |
| Tiny-YOLOV2 | 90.06 | 0.0058 |
| Faster R-CNN@VGG16 | 92.13 | 0.1565 |
| Faster R-CNN@ResNetV1-50 | 89.69 | 0.1065 |
| Faster R-CNN@ResNetV1-101 | 91.23 | 0.1256 |
| Faster R-CNN@ResNetV1-152 | 91.56 | 0.1456 |
| Faster R-CNN@MobileNetV1 | 83.56 | 0.0511 |
To verify the performance of the coarse sample augmentation method described above, 12 kinds of food are selected from the CNFood-252 data. Examples are shown in Figure 18.
Figure 18
Sample enhanced instance diagram.
The Tiny-YOLOV2 model was trained on the 12 categories randomly selected from CNFood-252, and 1522 samples were used for testing. For each food object, we record whether it is detected. The 1522 samples contain 6011 food objects. Table 5 shows the numbers of correct detections, missed detections, and false detections for each food.
Table 5
Enhanced detection results of food image samples.
| Food category | Number of objects | Error detection | Missed detection | Correct detection | Accuracy (%) |
|---|---|---|---|---|---|
| Green vegetable with bean skin | 500 | 1 | 15 | 484 | 96.80 |
| Egg | 497 | 0 | 23 | 476 | 95.56 |
| Sesame cake | 488 | 3 | 65 | 421 | 86.26 |
| Cake rolls | 518 | 2 | 115 | 362 | 75.45 |
| Vegetarian chicken | 486 | 5 | 26 | 488 | 93.56 |
| Fried meat with celery | 516 | 2 | 18 | 464 | 95.56 |
| Braised chicken nuggets | 496 | 5 | 24 | 484 | 94.46 |
| Thousands of kelp | 495 | 3 | 15 | 477 | 95.56 |
| Pig's trotters | 504 | 5 | 42 | 451 | 91.22 |
| Fried hairtail | 513 | 2 | 51 | 445 | 88.13 |
| Lion Head Vermicelli | 512 | 0 | 46 | 472 | 91.56 |
| Auricularia yam | 512 | 4 | 18 | 489 | 95.46 |
As can be seen from Table 5, only one original sample needs to be collected for each food. After augmenting the images with this method, the accuracy reaches 91.8%. Only 12 kinds of food images were collected for this experiment, and the foods differ considerably from one another, but the results show that the method is effective for coarse sample augmentation.
Because each training image contains only one kind of food whereas the test images contain several, boxes are missed where foods overlap. Some test results are shown in Figure 19.
Figure 19
Partial test results.
The detection task is therefore re-divided into localization and classification tasks, and food image recognition is carried out with a two-stage training mode. The statistical results are shown in Table 6, and the experimental results are shown in Figure 19.
Table 6
Result statistics.
| Algorithm | Classified image size | Classification accuracy (%) | Detection speed (s) |
|---|---|---|---|
| Tiny-YOLOV2@InceptionV3 | 299×299 | 91.56 | 0.1289 |
| Tiny-YOLOV2@MobileNetv1 | 224×224 | 80.86 | 0.0958 |
| Tiny-YOLOV2@MobileNetV2 | 224×224 | 87.66 | 0.0956 |
| Tiny-YOLOV2@NasNet_Large | 331×331 | 89.61 | 0.1356 |
| Tiny-YOLOV2@PNasNet_Large | 331×331 | 76.46 | 0.1355 |
| Tiny-YOLOV2@ResNetV2-50 | 224×224 | 87.56 | 0.1123 |
| Tiny-YOLOV2@ResNetV2-101 | 224×224 | 92.56 | 0.1256 |
In Table 6, accuracy is given in percent and detection time in seconds.
YOLO-SiamV1 and YOLO-SiamV2 were trained without transfer learning, so the parameters of every layer had to be trained from scratch, and the best hyperparameters were determined by searching over a range of values. The model performance is shown in Table 7.
Table 7
Model performance comparison.
| Algorithm | Accuracy (%) | Detection speed (s) |
|---|---|---|
| YOLO-SiamV1 | 15.34 | 0.0589 |
| YOLO-SiamV2 | 41.56 | 0.0745 |
| Random guess | 5.00 | - |
As can be seen from Table 7, the accuracy of YOLO-SiamV1 is low, which is related to its small number of network layers, while the accuracy of the improved YOLO-SiamV2 reaches 45.75%. Although the accuracy of metric-based classification is much lower than that of the previous methods, the twin network is shown to play a certain role in food image matching, and future research will focus mainly on improving its accuracy.
5. Conclusion
In this paper, for practical restaurant applications, the detection task is re-divided into localization and classification tasks, each solved with a convolutional neural network, and experiments on the CNFood-252 dataset show that this helps improve recognition accuracy. Because the detection approach requires collecting a large number of training samples, which is difficult in practice, an image matching method is then used for recognition, and a dataset with fewer samples, FewFood-50, is constructed. Tiny-YOLO and a twin network are combined in a two-stage learning mode, YOLO-Siam, in two versions, YOLO-SiamV1 and YOLO-SiamV2. Experiments on the FewFood-50 dataset show that the highest accuracy of this method is only 45.75%, but samples do not need to be labeled manually, which gives the method good prospects for practical application.
At the same time, higher-quality X-ray images are obtained by correcting the original images. With threshold segmentation, iron wire foreign bodies can be effectively separated from the food background in most packaged food products, but threshold segmentation alone cannot effectively distinguish desiccant from foreign bodies. Mathematical morphology, feature extraction, and template matching were therefore studied and tested. The experiments show that desiccant and foreign matter can then be distinguished effectively, which contributes to food safety and achieves the desired results.