Yongliang Qiao, Tengfei Xue, He Kong, Cameron Clark, Sabrina Lomax, Khalid Rafique, Salah Sukkarieh.
Abstract
Computer vision-based technologies play a key role in precision livestock farming, and video-based analysis approaches have been advocated as useful tools for automatic animal monitoring, behavior analysis, and efficient welfare measurement and management. Accurately and efficiently segmenting animals' contours from their backgrounds is a prerequisite for such vision-based technologies. Deep learning-based segmentation methods have shown good performance when trained on large numbers of pixel-labeled images. However, labeling animal images is challenging and time-consuming because of the animals' irregular contours and changing postures. To reduce the reliance on labeled images, a one-shot learning approach with pseudo-labeling is proposed that uses only one labeled frame to segment animals in videos. The proposed approach comprises an Xception-based Fully Convolutional Neural Network (Xception-FCN) module and a pseudo-labeling (PL) module. Xception-FCN utilizes depth-wise separable convolutions to learn visual features at different levels and to localize dense predictions based on the single labeled frame. PL then leverages the segmentation results of the Xception-FCN model to fine-tune the model, boosting performance in cattle video segmentation. Systematic experiments were conducted on a challenging feedlot cattle video dataset acquired by the authors; the proposed approach achieved a mean intersection-over-union score of 88.7% and a contour accuracy of 80.8%, outperforming the state-of-the-art methods OSVOS and OSMN. The proposed one-shot learning approach could serve as an enabling component for segmentation and detection applications in livestock farming.
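The two-stage scheme in the abstract (fine-tune on one annotated frame, then pseudo-label the remaining frames and fine-tune again) can be sketched in miniature. The sketch below is illustrative only: `ToySegmenter` is a hypothetical intensity-threshold stand-in for the Xception-FCN, and `one_shot_pseudo_label` is an assumed name, not the authors' code.

```python
import numpy as np

class ToySegmenter:
    """Hypothetical stand-in for the Xception-FCN: a per-pixel
    intensity threshold 'trained' from labeled masks (illustration only)."""

    def __init__(self):
        self.threshold = 0.5

    def fit(self, frames, masks):
        # Place the decision threshold midway between the mean
        # foreground and mean background intensities.
        fg = np.concatenate([f[m] for f, m in zip(frames, masks)])
        bg = np.concatenate([f[~m] for f, m in zip(frames, masks)])
        self.threshold = (fg.mean() + bg.mean()) / 2.0

    def predict(self, frame):
        # Boolean foreground mask for one frame.
        return frame > self.threshold

def one_shot_pseudo_label(model, labeled_frame, mask, video_frames):
    # Stage 1 (one-shot): fine-tune on the single annotated frame.
    model.fit([labeled_frame], [mask])
    # Stage 2: segment the unlabeled frames; the outputs become pseudo-labels.
    pseudo_masks = [model.predict(f) for f in video_frames]
    # Stage 3 (PL): fine-tune again on the enlarged pseudo-labeled set.
    model.fit([labeled_frame] + list(video_frames), [mask] + pseudo_masks)
    return [model.predict(f) for f in video_frames]
```

The same loop structure applies when the toy thresholder is replaced by a real segmentation network and gradient-based fine-tuning.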
Keywords: deep learning; one-shot learning; precision livestock farming; pseudo-labeling; video segmentation
Year: 2022 PMID: 35268130 PMCID: PMC8908826 DOI: 10.3390/ani12050558
Source DB: PubMed Journal: Animals (Basel) ISSN: 2076-2615 Impact factor: 2.752
Figure 1. The framework of one-shot learning with PL for cattle video segmentation.
Figure 2. The proposed Xception-FCN network architecture.
Figure 3. Samples of images in the challenging cattle dataset.
Comparison of the different video segmentation methods.
| Method | J-M ↑ | J-R ↑ | J-D ↓ | F-M ↑ | F-R ↑ | F-D ↓ | T ↓ | Time (s/f) |
|---|---|---|---|---|---|---|---|---|
| OSMN | 80.0 | 93.4 | 16.6 | 62.1 | 74.6 | 11.3 | 47.4 | 1.21 |
| OSVOS | 84.4 | 97.5 | 13.9 | 75.0 | 89.4 | 14.7 | 46.2 | 0.76 |
| Ours-PL | 87.6 | 98.6 | 10.6 | 79.4 | 96.4 | 12.3 | 48.9 | 0.42 |
| Ours | 88.7 | 99.8 | 9.0 | 80.8 | 97.7 | 10.7 | 45.2 | 0.44 |
“J” denotes region similarity (mean intersection-over-union) and “F” denotes contour accuracy; “M” is short for mean, “R” for recall, and “D” for decay. Up-arrows indicate that higher values are better; down-arrows indicate that lower values are preferred. Note that since the original OSMN does not include fine-tuning, fine-tuning was added to the OSMN used here for a fair comparison.
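The mean intersection-over-union underlying the region-similarity column (and the 88.7% figure quoted in the abstract) can be computed per frame and averaged over a video. A minimal NumPy sketch, not the authors' evaluation code:

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection-over-union between two boolean segmentation masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Two empty masks agree perfectly; avoid division by zero.
    return float(inter / union) if union else 1.0

def mean_iou(preds, gts):
    """Mean IoU over a sequence of frames: the 'M' column of region similarity."""
    return float(np.mean([mask_iou(p, g) for p, g in zip(preds, gts)]))
```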
Figure 4. Segmentation results of different approaches on the cattle dataset.
Figure 5. Qualitative results of our approach on the cattle dataset.
Comparison of our approach against its downgraded versions without pre-training.
| Method | J-M ↑ | J-R ↑ | J-D ↓ | F-M ↑ | F-R ↑ | F-D ↓ | T ↓ |
|---|---|---|---|---|---|---|---|
| Ours | 88.7 | 99.8 | 9.0 | 80.8 | 97.7 | 10.7 | 45.2 |
| Ours-BT | 77.1 | 82.4 | 30.2 | 66.7 | 73.0 | 30.0 | 56.5 |
| Ours-OT | 87.4 | 99.8 | 10.9 | 75.6 | 92.4 | 19.4 | 41.1 |