A comparative study of the effectiveness of using popular DNN object detection algorithms for pith detection in cross-sectional images of parawood.

Abstract

The location of pith in a cross-sectional surface of wood can be used to either evaluate its quality or guide the removal of soft wood from the wood stem. There have been many attempts to automate pith detection in images taken by a normal camera. The objective of this study is to comparatively study the effectiveness of two popular deep neural network (DNN) object detection algorithms for parawood pith detection in cross-sectional wood images. In the experiment, a database of 345 cross-sectional images of parawood, taken by a normal camera within a sawmill environment, was quadrupled in size via image augmentation. The images were then manually annotated to label the pith regions. The dataset was used to train two DNN object detection algorithms, an SSD (single shot detector) MobileNet and you-only-look-once (YOLO), via transfer learning. The inference results, utilizing pretrained models obtained by minimizing a loss function in both algorithms, were obtained on a separate dataset of 215 images and compared. The detection rate and average location error with respect to the ground truth were used to evaluate the effectiveness of detection. Additionally, the average distance error results were compared with the results of a state-of-the-art non-DNN algorithm. SSD MobileNet obtained the best detection rate of 87.7% with a ratio of training to test data of 80:20 and 152,000 training iterations. The average distance error of SSD MobileNet is comparable to that of YOLO and six times better than that of the non-DNN algorithm. Hence, SSD MobileNet is an effective approach to automating parawood pith detection in cross-sectional images.

Keywords: Computer science; Deep neural networks object detection; Parawood pith location; SSD MobileNet; Wood pith detection; You-only-look-once

Year: 2020 PMID: 32140596 PMCID: PMC7049650 DOI: 10.1016/j.heliyon.2020.e03480

Source DB: PubMed Journal: Heliyon ISSN: 2405-8440

Introduction

The wood pith is located roughly at the centre of the cross-section of a wood stem. It has two main uses. First, the wood quality can be evaluated using pith location (Fallah et al., 2012). High quality wood tends to have the pith at the centre. Second, it is used to locate regions of soft wood with low quality along the wood stem that must be removed. It has been experimentally shown that the knot within a stem, whose surrounding regions are soft, can be completely removed by locating the pith locations at both ends of the stem and pushing it through a vertically aligned bandsaw. The soft region of the wood is likely to be within a single board at the centre of the wood stem. This technique is used in several small to medium-sized sawmills in South and Southeast Asia for processing parawood (Hevea brasiliensis) to produce wood parquets and other wood products. Locating the wood pith in the cross-section of wood stems originating from Europe is fairly easy because they are cultivated under controlled conditions to make them grow vertically (i.e. by removing their branches). Their pith locations are likely to be at the centre of the stems. Parawood is very different because it is cultivated to produce latex, and several branches and leaves are required for sufficient photosynthesis. Only after about 20 years, when the tree begins to produce insufficient amounts of latex, is it cut down and replaced. Thus, parawood is considered a by-product. Because of the way it is grown, its pith location tends to deviate from the cross-section centre. However, parawood has been successfully used as an alternative to other tropical woods extracted from natural forests (Joerg Balsiger and Bahdon, 2000). Furthermore, parawood has proven to be very versatile in furniture manufacturing and the wood-based panel industry. Where forests are scarce, particularly in South and Southeast Asia, the use of parawood as fuel continues to increase pressure on natural forest resources. 
Yield increase from efficient processing could significantly improve parawood utilization and prevent natural forests from being cut down while reducing pollution caused by use of the waste as low-cost fuel. Many studies have proposed using the geometric relationships of the annual rings within the cross-section of wood stems to locate the pith (Krähenbühl et al., 2012; Boukadida et al., 2012; Kurdthongmee et al., 2018; Kurdthongmee and Suwannarat, 2019). Once the boundary of an annual ring has been extracted from an image, image processing routines are used to create segments from neighbouring pairs of pixels that form the boundary. The normal lines of the segments are then generated. This procedure produces a set of normal lines, and their intersection points can be used to approximate the pith location. Several algorithms have been presented to effectively extract annual rings or a wood stem boundary and approximate the pith location. Some algorithms require sophisticated and expensive image capturing instruments (e.g. CT or X-ray scanners). Others are suitable only for images with imperfection-free cross-sections; rot, dirt, snow, or defects from the cutting process must first be removed (Hanning et al., 2003; Longuetaud et al., 2004; Österberg et al., 2004). Unfortunately, using these algorithms and instruments to analyse the cross-sectional surface of parawood is complicated. Its surface is likely to contain several imperfections, mainly caused by diffused latex, mounds, or defects from the cutting process. Removing them to make the pith clearly visible for computer vision methods requires labour intensive mechanical or chemical pre-processing stages. Deep neural networks (DNN) have been successfully applied in many application domains to either classify (Canziani et al., 2016) or detect objects (Cheng et al., 2016a, 2016b, 2019) in an image. 
Regarding wood manufacturing and processing, DNNs have been successfully used for defect classification, including wood pith classification, in lumber images (Hu et al., 2019). The ResNet18 network (He et al., 2015) was investigated using an 80:20 ratio of training to test images. Four datasets that consist of two common lumber defects (cracks, insect defects, sound knots, and unsound knots), wood textures, and wood species were used and classification accuracies of 98.16%, 93.32%, 96.64%, and 99.50% were obtained. Here, the classification result only indicates the existence of defects within the image. This is in contrast to a detection result that additionally identifies the regions around the defects. To our knowledge, no studies have been published on the topic of wood pith detection using DNN object detection. The main objective of this study is to investigate the effectiveness of applying DNN to detect parawood pith within a cross-sectional image captured using a normal camera. None of the samples were treated to remove defects from the cross-sectional surface. Additionally, all trained weights of the models, for both SSD (single shot detector) MobileNet and YOLO (you-only-look-once), are provided for transfer learning.

Materials and methods

Parawood pith image dataset preparation

In this experiment, 345 cross-sectional images of parawood from sawmills in southern Thailand were collected. The images were captured using a digital camera (iPhone 6, Apple Inc.) in the working environment of sawmills under various lighting conditions. This was to ensure that a variety of images were obtained to train the DNN. Neither cleaning using chemical substances nor mechanical polishing was applied to the parawood surfaces. The images include the whole area of the surfaces. To increase the number of suitable images for DNN training, these original images were processed using the open-source image augmentation tool Image Augmentor (https://github.com/mdbloice/Augmentor). It generated new images by changing the angle, saturation, exposure, and hue of the original images. After image augmentation, the total dataset size was increased fourfold to 1,380 images. The dataset was then manually annotated using the open-source labelImg tool (https://github.com/tzutalin/labelImg). For each image within the dataset, the pith location was identified under the guidance of the log scalers from the sawmills. A rectangle was superimposed on the original image such that it was centred exactly at the pith location and covered at least three annual rings surrounding the pith. This also automatically produced the ground-truth pith location of the cross-section within the image (i.e. the location is at the centre of the rectangle). Figure 1 illustrates two sample images of cross-sectional surfaces after annotation. Annotation files were produced for each image in the dataset. Because two DNN object detection algorithms were employed in our experiment, two annotation file formats were required: one for SSD MobileNet and one for YOLO. The labelImg tool is able to generate these formats. Note that our dataset is available upon request.
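The relationship between an annotated rectangle, the ground-truth pith location, and a YOLO-style annotation can be sketched as follows. This is a minimal illustration, not the authors' tooling: the function names are hypothetical, and it assumes the rectangle is stored as pixel corner coordinates (as in a Pascal VOC style annotation) and that the YOLO annotation uses the usual normalised `class x_centre y_centre width height` layout.

```python
def box_centre(xmin, ymin, xmax, ymax):
    """Return the centre (x, y) of an axis-aligned rectangle in pixels.

    With the annotation rule described above, this centre is taken as the
    ground-truth pith location.
    """
    return ((xmin + xmax) / 2.0, (ymin + ymax) / 2.0)


def corners_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h, class_id=0):
    """Convert pixel corner coordinates to a YOLO-style tuple.

    Assumed layout: (class, x_centre, y_centre, width, height), with all
    geometric values divided by the image width or height.
    """
    cx, cy = box_centre(xmin, ymin, xmax, ymax)
    return (
        class_id,
        cx / img_w,
        cy / img_h,
        (xmax - xmin) / img_w,
        (ymax - ymin) / img_h,
    )


if __name__ == "__main__":
    # Hypothetical rectangle on a 1000 x 800 pixel image.
    print(box_centre(100, 200, 300, 400))
    print(corners_to_yolo(100, 200, 300, 400, img_w=1000, img_h=800))
```

Only a single class (the pith) is needed, so `class_id` defaults to 0 in this sketch.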

Figure 1

Two sample images of cross-sectional surfaces after annotation. Rectangles centred on each pith have been drawn.

All 1,380 images and corresponding annotation files were then split into two datasets for training and testing. Different configurations were used with ratios of 60:40, 70:30, 80:20, and 90:10. The images and their corresponding annotation files were randomly selected and assigned to these datasets using an open-source Python script called splitTrainingAndTesting (https://www.learnopencv.com/training-yolov3-deep-learning-based-custom-object-detector/). Figure 2 summarizes the image dataset preparation process described above.

Figure 2

Flowchart of the image dataset preparation for configuring datasets with different ratios of training images to test images.
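A random split of the kind described above can be sketched in a few lines. This is not the splitTrainingAndTesting script itself; `split_dataset`, the seed, and the file names are assumptions chosen for illustration.

```python
import random


def split_dataset(items, train_fraction, seed=0):
    """Randomly split items into (training, test) lists.

    train_fraction is the share assigned to training, e.g. 0.8 for 80:20.
    A fixed seed makes the split repeatable.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    # Hypothetical file names standing in for the 1,380 augmented images.
    images = [f"img_{i:04d}.jpg" for i in range(1380)]
    for fraction in (0.6, 0.7, 0.8, 0.9):
        train, test = split_dataset(images, fraction)
        print(fraction, len(train), len(test))
```

In practice each image would be moved together with its annotation file, so the split is applied to image and annotation pairs rather than to images alone.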

DNN training

After the training and test data were prepared, they were used to train the DNNs. DNNs are well known for being slow to train on a moderate-performance computer system. The use of a graphics processing unit (GPU) is therefore recommended to accelerate training. Alternatively, cloud services, which have re-configurable hardware components, can be used to accelerate execution. In this study, we used Google Cloud Platform (GCP) resources during the training stage. We used the following GCP configuration: two virtual CPUs; 13 GB memory; a single NVIDIA Tesla K80 GPU; and a Linux operating system. Because it took a very long time to train the DNN even on the GCP with GPU acceleration, a termination criterion was required. Note that during the DNN training stage, the procedure periodically performs testing with respect to the test data and saves the current weights of the model, which is trained to minimise a loss function. The loss was therefore used as the training termination criterion.

SSD MobileNet configuration

We employed the ‘AI engine’ of the GCP to train the SSD MobileNet. This engine was specially created to simplify platform creation in TensorFlow DNN applications. The normal procedures of the AI engine (Google, 2019) contained in its configuration file were followed to create all training jobs, which were submitted to the AI engine. The three primary details for the training jobs described in the configuration file are as follows. The transfer learning method was implemented using the following pre-trained model: ‘ssd_mobilenet_v1_0.75_depth_300x300_coco14_sync_2018_07_03’. The default hyperparameters and DNN model configuration were used for training (see Table 1).

Table 1

Training parameter values of SSD MobileNet and YOLO.

Parameters	SSD MobileNet	YOLO
Batch/epoch	20,000	1,000
Batch size	64	64
Momentum	0.9	0.9
Weight decay	0.95	0.0005
Learning rate	0.004	0.001

No built-in image augmentation function was used because the dataset had already undergone image augmentation (as detailed in Section 2.1).

YOLO configuration

The GCP does not provide any engine to automate YOLO training. It was necessary to install the Darknet framework provided by YOLO researchers (Redmon and Farhadi, 2018; Redmon et al., 2016) and execute it directly from the Linux command shell of the GCP. The YOLO training configuration was as follows. The Darknet source code for YOLOv3 (Redmon and Farhadi, 2018) was employed. It was recompiled to run on the GCP shell using the GPU. Transfer learning with the ‘darknet53 checkpoint’ was used. A Tiny-YOLOv3 network with 23 layers (Redmon and Farhadi, 2018; Redmon et al., 2016) was used in all experiments (see Table 1). All parameters defined in the Darknet configuration file were unchanged except for the image augmentation option, which was disabled for the reasons given above.

Detection performance metrics

The application of DNN object detection to the pith detection task differs from general object detection. This is because the aim of detection is to find a pith in the image as well as identify its location. This location is used in further processing (i.e. to judge if the wood stem is of high quality or to guide a wood stem turning machine). Therefore, the detection performance metrics are the detection rate and the distance from the ground truth (the location error). The former measurement is the ratio of the number of the images in which the pith is detected by DNN object detection to the total number of images, whereas the latter is defined by

$$D = \sqrt{(x_d - x_g)^2 + (y_d - y_g)^2},$$

where $(x_d, y_d)$ is the pith location estimated by DNN object detection and $(x_g, y_g)$ is the ground truth pith location. The average location error of all detectable piths within a test dataset of $N$ images is defined simply as

$$\bar{D} = \frac{1}{N} \sum_{i=1}^{N} D_i.$$

The training and test datasets described earlier, which are variable in size and randomly selected, make it difficult to compare the location error of different configurations as well as the performance of SSD MobileNet and YOLO. Hence, we also created an inference dataset with a fixed number of images. It was used to compare the effectiveness of inference across all configurations and object detection algorithms. The dataset consists of 215 images ranging from clean and clear cross-section surfaces to an unobservable annual ring cross-section. None of the images in this dataset are included in the training and test data. This is to ensure that the results from the training stage can be generalized to a completely new dataset (i.e. images that were not used for training). Figure 3 summarizes the method used for training and inference described above. Note that the inference dataset is available upon request.
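The two metrics can be sketched as below. This is an illustrative sketch rather than the authors' evaluation code: it assumes that the location error is the Euclidean distance in pixels between the detected and ground-truth pith centres, that an image without a detection is represented by `None`, and that the standard deviation is the population form. The function names are hypothetical.

```python
import math
import statistics


def location_error(detected, truth):
    """Euclidean distance between a detected and a ground-truth (x, y) point."""
    return math.hypot(detected[0] - truth[0], detected[1] - truth[1])


def summarise(detections, ground_truth):
    """Compute detection rate (%) and location-error statistics.

    detections: one (x, y) tuple per image, or None if no pith was detected.
    ground_truth: one (x, y) tuple per image, in the same order.
    Errors are computed only over images with a detection.
    """
    errors = [
        location_error(d, g)
        for d, g in zip(detections, ground_truth)
        if d is not None
    ]
    if not errors:
        raise ValueError("no detections to evaluate")
    return {
        "detection_rate": 100.0 * len(errors) / len(detections),
        "mean": statistics.mean(errors),
        "sd": statistics.pstdev(errors),  # population SD is an assumption
        "max": max(errors),
        "min": min(errors),
    }


if __name__ == "__main__":
    # Made-up coordinates for four images; the second has no detection.
    dets = [(3.0, 4.0), None, (0.0, 0.0), (6.0, 8.0)]
    truth = [(0.0, 0.0), (1.0, 1.0), (0.0, 0.0), (0.0, 0.0)]
    print(summarise(dets, truth))
```

The returned keys correspond to the kinds of columns reported later for each detector: detection rate, mean, standard deviation, maximum, and minimum of the location error.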

Figure 3

Flowchart of the research method for DNN training and inference.

The inference dataset consists of images with different levels of difficulty for the object detector. These difficulties result from several factors: surface defects (mostly from latex diffusing during cutting and from the quality of the cutting tools), wood compression (which makes a pith likely to deviate from the centre of the stem), mounds, and improper illumination during image capture. Log scalers from the sawmills were consulted to help categorize the level of difficulty of the images. The difficulty is classified into four levels: ‘very easy,’ ‘easy,’ ‘difficult,’ and ‘very difficult.’ Figure 4 illustrates sample images for each level of difficulty. The pith location of the ‘very easy’ example is almost at the centre of the cross-section, which has clearly observable annual rings. The ‘easy’ example also shows observable annual rings and pith, but it appears to be disturbed by mounds. The ‘difficult’ example appears to have no annual ring features and contains many defects from cutting. Finally, the ‘very difficult’ example has a pith far from the stem centre and the illumination quality is low. For benchmarking the effectiveness of the object detection algorithms, we also provide a ground truth text file that lists the coordinates of all piths in all dataset images.

Figure 4

Examples of wood cross-section images with four levels of detection difficulty.

Results

The training loss of both object detection algorithms was recorded for all configurations for 200,000 iterations. Other functions can be used to measure the quality of a trained model (i.e. the inference result of the trained model versus the ground truth). However, these functions require many modifications to the DNN object detection algorithms. The loss, which is defined as the difference between the output and target variable, is a good indicator of the quality of a trained model. Figure 5 compares the loss functions of both DNNs during training between 12,000 and 172,000 iterations. During the training stage, the loss functions of both DNNs behave similarly. All values for all configurations are initially very large and slowly decay to a constant value after 200,000 iterations. YOLO tends to obtain a lower loss. The two configurations shown in this figure produce the lowest loss of all configurations. The number of iterations that leads to the minimum loss is selected to extract the best trained model for further use in the inference stage. That is, the trained models of YOLO and SSD MobileNet were extracted from iterations 132,000 and 152,000, respectively.
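Selecting the model by minimum recorded loss amounts to a simple lookup. The sketch below assumes the loss has been logged as a mapping from iteration number to loss value; `best_iteration` is a hypothetical helper, and the loss values in the example are invented purely for illustration and are not taken from the experiments.

```python
def best_iteration(loss_by_iteration):
    """Return the iteration whose recorded loss is smallest.

    loss_by_iteration: dict mapping iteration number -> loss value.
    Ties are resolved in favour of the earliest iteration.
    """
    if not loss_by_iteration:
        raise ValueError("no loss values recorded")
    return min(sorted(loss_by_iteration), key=loss_by_iteration.get)


if __name__ == "__main__":
    # Invented loss log, sampled every 20,000 iterations.
    log = {
        12000: 4.10,
        32000: 2.35,
        52000: 1.72,
        72000: 1.40,
        92000: 1.21,
        112000: 1.05,
        132000: 0.93,
        152000: 0.97,
        172000: 0.99,
    }
    print(best_iteration(log))
```

The weights saved at the returned iteration would then be the ones loaded for inference.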

Figure 5

Comparison of the loss functions for both DNNs during the training stage; the 70:30 YOLO configuration is shown in blue, and the 80:20 SSD MobileNet configuration is shown in orange.

An experiment was carried out using the 70:30 YOLO model trained for 132,000 iterations (referred to as the YOLO pith detector) and the 80:20 SSD MobileNet model trained for 152,000 iterations (referred to as the SSD MobileNet pith detector) to evaluate the detection performance (Section 2.3). The inference dataset was employed in the comparison. The confidence levels of detection were fixed to 0.4 (40%) for both the YOLO and SSD MobileNet pith detectors. This means that all objects detected within an image that have a confidence level greater than or equal to 0.4 are reported as piths. Table 2 shows the comparison results. The detection rate of the SSD MobileNet pith detector is superior to that of the YOLO pith detector. It is capable of making correct pith detections 87.7% of the time. The average location errors of the two detectors are comparable (7.61 for YOLO and 7.04 for SSD MobileNet). However, when the maximum location error is considered, the YOLO pith detector outperforms the SSD MobileNet pith detector because its maximum location error is about half as large (96.71 versus 193.19). Both DNN pith detectors significantly outperform the non-DNN algorithm (Kurdthongmee et al., 2018).
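The effect of the fixed confidence level can be sketched as a filter over raw detections. This is a minimal sketch under stated assumptions: each detection is represented as a dictionary with a `box` given as pixel corners and a `score` between 0 and 1, which is a hypothetical layout and not the output format of either framework.

```python
def pith_candidates(detections, threshold=0.4):
    """Keep detections whose confidence is at least the threshold.

    detections: list of dicts with keys
        "box":   (xmin, ymin, xmax, ymax) in pixels
        "score": confidence between 0 and 1
    Returns a list of (x_centre, y_centre, score) tuples, one per kept box.
    """
    candidates = []
    for det in detections:
        if det["score"] >= threshold:
            xmin, ymin, xmax, ymax = det["box"]
            centre_x = (xmin + xmax) / 2.0
            centre_y = (ymin + ymax) / 2.0
            candidates.append((centre_x, centre_y, det["score"]))
    return candidates


if __name__ == "__main__":
    # Invented detections for one image.
    raw = [
        {"box": (10, 20, 40, 50), "score": 0.40},
        {"box": (100, 100, 140, 160), "score": 0.39},
        {"box": (200, 220, 260, 300), "score": 0.91},
    ]
    print(pith_candidates(raw))
```

An image counts as having a detected pith when the returned list is non-empty; raising `threshold` shortens the list, which removes weak detections but can also leave some images with no detection at all.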

Table 2

Detection rate, location errors, and the average detection time of the YOLO, SSD MobileNet, and non-DNN-based pith detection algorithms (Kurdthongmee et al., 2018) on the inference dataset.

Algorithm	Detection Rate (%)	D¯	DSD	Dmax	Dmin	T¯(s)
YOLO (70:30) @132,000	76.6	7.61	13.8	96.71	0.05	1.05
SSD MobileNet (80:20) @152,000	87.7	7.04	15.32	193.19	0.11	0.78
Kurdthongmee et al. (2018)	N/A	41.76	25.07	230.73	0.02	31.86

Table 2 also shows the average detection times of all three algorithms. These detection/execution times are the times taken to process the inference dataset divided by the size of the inference dataset. The results show that the SSD MobileNet pith detector is also superior to the YOLO pith detector and the non-DNN pith detector in terms of average detection time. In Figure 6, some examples of wood cross-sectional images along with the detected pith locations are illustrated. This figure compares the pith locations detected by the method of Kurdthongmee et al. (2018), the YOLO pith detector, and the SSD MobileNet pith detector. It is clear that both DNN pith detectors detect piths very close to their ground truth locations. The YOLO pith detector produces two pith locations in Figure 6b. The overall inference results revealed that the YOLO pith detector is likely to output several pith detections in an image with a single pith. The closest one can be determined by changing the confidence threshold. If it is increased, incorrectly detected piths are removed, but this also reduces the detection rate. This is a disadvantage of the YOLO pith detector. Therefore, the SSD MobileNet pith detector is more suitable for practical application in sawmills.

Figure 6

Examples of wood cross-sectional images with detected pith locations. Detections using the method of Kurdthongmee et al. (2018) (left column) are indicated by blue circles; those of the YOLO (middle column) and SSD MobileNet (right column) pith detectors are indicated by white and green boxes, respectively.

In addition, to further evaluate the effectiveness of the DNN pith detector, we performed experiments on a publicly available dataset. Because a parawood cross-section image dataset was not available, we used a European origin cross-section image dataset that was prepared by Norell and Borgefors (2008) to test their pith estimation algorithm. The dataset consists of 53 images of pine and spruce wood cross sections taken under normal illumination and without treatment to the surfaces. The dataset was processed by the SSD MobileNet and the YOLO pith detectors to determine the detection rate and location errors. Table 3 presents the results. It can be seen that SSD MobileNet again yields a higher detection rate and lower location error. The average location error of the detected piths is less than the estimation error of Norell and Borgefors’ algorithm. We note that the results of both DNN pith detectors in Table 3 are better than those on our own dataset (Table 2). This is because the wood surface quality in these images is better than that of the parawood, even without surface treatment.

Table 3

Detection rate and location errors of the YOLO and SSD MobileNet pith detectors for the dataset of Norell and Borgefors (2008).

Algorithms	Detection Correctness (%)	D¯	DSD	Dmax	Dmin
YOLO (70:30) @132,000	80.5	6.42	10.68	19.62	0.12
SSD MobileNet (80:20) @152,000	89.2	5.12	9.27	17.25	0.24
Norell and Borgefors (2008)	NA	8.47	9.69	18.45	0.74

Discussion

In our study, two well-known DNN object detection algorithms, SSD MobileNet and YOLO, were compared. An image dataset was created and split into training and inference datasets. Experiments were performed to train both object detection algorithms to find the best configuration ratio (of training to test data) and number of training iterations (yielding the minimum loss) that produce the best trained model. The default training parameters of both algorithms were fixed to make the results comparable. We also evaluated the trained models using detection rate and average location error. The results of our study reveal that both DNN pith detectors are superior to the most recently proposed non-DNN algorithm (Kurdthongmee et al., 2018). This is because the non-DNN pith location algorithm relies heavily on the quality of the cross-sectional surface of parawood. If there are any defects in the cross-sectional surface image, many features that are required to assist pith location effectively disappear. In contrast, the DNN pith detectors rely on a higher-level scheme to extract and create the features of wood piths from different images. A wood pith is distinguished by its surrounding annual rings, colour, and its neighbouring wood sections. We have observed that most wood piths are correctly detected if they have some crack lines that originate from the centre of the piths and emanate to the boundary of the wood stem (see Figure 6d). Both DNN pith detectors are capable of detecting piths of this type. To our knowledge, these crack lines have not been employed in non-DNN pith locating algorithms. The effectiveness of the SSD MobileNet pith detector is better than that of the YOLO pith detector. This could be partly because the internal network structure of the former is more sensitive to the accumulated knowledge of this specific object type. Additionally, the network structure of SSD MobileNet includes more convolution layers than does YOLO. 
This study reveals that the SSD MobileNet pith detector is superior to the YOLO pith detector with respect to the training dataset of the original 345 parawood cross-sectional images. Although image augmentation was employed to increase the dataset fourfold, this approach seems to be limited from the point of view of DNN object detection training. It could be possible to further increase the size and variety of the dataset by either collecting more real images or creating more using other features of the image augmentation tool. We believe this to be the most promising way to increase the detection capability of DNN object detection algorithms, and we suggest it as future work.

Conclusions

Parawood has been continually used as an alternative to natural wood in furniture industries worldwide. Increasing the yield of parawood processing increases the profit for sawmills and reduces both the usage of protected natural woods and the pollution from the use of the waste as a low-cost fuel. Automatic parawood pith detection is a starting point for removing the poor-quality parts of parawood. Previously proposed algorithms for locating piths rely on geometric relationships between the pith and annual rings. They can be applied successfully to mechanically and/or chemically treated cross-section surfaces of European-origin woods. However, their applicability to parawood cross-sections, which have many defects, is limited. In this study, we applied DNN object detection algorithms to the problem of parawood pith detection. We compared two popular DNN object detection algorithms: SSD MobileNet and YOLO. Both algorithms were trained via transfer learning and used for inference with separate datasets. This was done to ensure that the detection results would be comparable and repeatable. Different splits of training and test data were tested. For each algorithm, the number of training iterations used to extract the model for inference was selected based on the minimum loss. The results show that the SSD MobileNet pith detector outperforms the YOLO pith detector in terms of detection rate and average location error. However, YOLO gives rise to better results when the standard deviation, maximum, and minimum of the location errors are considered. Both pith detectors perform better than the state-of-the-art non-DNN-based algorithm by a factor of six in terms of average location error. To make the experiments reproducible and extendable, we provide all datasets used for training and inference, as well as the number of training iterations, to the research community for further retraining of DNNs via transfer learning. It is expected that better detection rates with smaller location errors are achievable.

Declarations

Author contribution statement

Wattanapong Kurdthongmee: Conceived and designed the experiments; Performed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper.

Funding statement

This work was supported by the Thailand Research Fund (TRF) and Walailak University, Thailand (grant number RSA6280097).

Competing interest statement

The authors declare no conflict of interest.

Additional information

No additional information is available for this paper.

1. Learning Rotation-Invariant and Fisher Discriminative Convolutional Neural Networks for Object Detection.

Authors: Gong Cheng; Junwei Han; Peicheng Zhou; Dong Xu
Journal: IEEE Trans Image Process Date: 2019-01 Impact factor: 10.856
