Caili Gong1,2,3, Yong Zhang1, Yongfeng Wei2,3, Xinyu Du2, Lide Su1, Zhi Weng1,2,3. 1. College of Mechanical and Electrical Engineering, Inner Mongolia Agricultural University, Hohhot, China. 2. School of Electronic Information Engineering, Inner Mongolia University, Hohhot, China. 3. State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Inner Mongolia University, Hohhot, China.
Abstract
Automatic estimation of the poses of dairy cows over a long period can provide relevant information regarding their status and well-being in precision farming. Due to appearance similarity, cow pose estimation is challenging. To monitor the health of dairy cows in actual farm environments, a multicow pose estimation algorithm was proposed in this study. First, a monitoring system was established at a dairy cow breeding site, and 175 surveillance videos of 10 different cows were used as raw data to construct object detection and pose estimation data sets. To achieve the detection of multiple cows, the You Only Look Once (YOLO)v4 model based on CSPDarkNet53 was built and fine-tuned to output the bounding box for further pose estimation. On the test set of 400 images including single and multiple cows throughout the whole day, the average precision (AP) reached 94.58%. Second, the keypoint heatmaps and part affinity field (PAF) were extracted to match the keypoints of the same cow based on the real-time multiperson 2D pose detection model. To verify the performance of the algorithm, 200 single-object images and 200 dual-object images with occlusions were tested under different light conditions. The test results showed that the AP of leg keypoints was the highest, reaching 91.6%, regardless of day or night and single cows or double cows. This was followed by the AP values of the back, neck and head, sequentially. The AP of single cow pose estimation was 85% during the day and 78.1% at night, compared to double cows with occlusion, for which the values were 74.3% and 71.6%, respectively. The keypoint detection rate decreased when the occlusion was severe. However, in actual cow breeding sites, cows are seldom strongly occluded. Finally, a pose classification network was built to estimate the three typical poses (standing, walking and lying) of cows based on the extracted cow skeleton in the bounding box, achieving precision of 91.67%, 92.97% and 99.23%, respectively. The results showed that the algorithm proposed in this study exhibited a relatively high detection rate. Therefore, the proposed method can provide a theoretical reference for animal pose estimation in large-scale precision livestock farming.
The external behavior of dairy cows is a comprehensive reflection of their well-being and condition. Daily poses (standing, walking, lying) can reflect the activity level of cows because cows generally reduce activity and lie down more during illness and show mounting behavior during estrus. Recording the individual information of dairy cows by long-term manual observation is time-consuming, costly and subjective. Currently, many researchers have used various sensors to detect the behavior of dairy cows [1-4]. However, wearable sensors have certain disadvantages and may cause stress responses in dairy cows. Owing to its advantages of long-term, noncontact and continuous monitoring, machine vision has been used to monitor livestock activity and health in precision livestock farming. Recently, a large number of studies on the behavior detection of dairy cows based on machine vision have been conducted, covering tasks such as lameness detection [5-7], estrus detection [8, 9] and prediction of the time of calving [10-13]. These studies have found that changes in the pose of cows can reflect their health and provide important data support for lameness detection, estrus detection and prediction of calving.

Currently, research on human pose estimation is relatively advanced and can accurately estimate poses in complex backgrounds. Bottom-up methods using keypoint heatmaps have been applied to human pose estimation with small models and high efficiency [14-17]. Li et al. [18] proposed a top-down method to tackle pose estimation in crowds; this method used a single-person pose detector to identify humans and then detected keypoints within each human bounding box.

In recent years, animal pose estimation has received increasing attention, and much research has adopted deep learning algorithms for this task. Pereira et al. [19, 20] proposed the LEAP framework for tracking body-part positions of fruit flies under controlled light and a uniform background and tested the method on freely moving laboratory mice. Liu et al. [21] introduced a video-based animal pose estimation architecture that took into account variability in animal body shape and temporal context from nearby video frames; this method exhibited high precision when tested on datasets of mice, zebrafish and monkeys. Hahn-Klimroth et al. [22] presented a multistep convolutional neural network for detecting three typical poses of African ungulates, obtaining a high accuracy of 93%. Zheng et al. [23] applied Faster R-CNN, a deep learning detection framework, to identify five poses (standing, sitting, sternal recumbency, ventral recumbency and lateral recumbency) and obtained accurate sow locations in loose pens; the estimated pose changes can reveal the health of the sow. Chen et al. [24] proposed an algorithm based on YOLACT with high detection speed and accuracy for real-time detection and tracking of multiple parts of pig bodies. Based on the prediction of keypoints, Song et al. [6] proposed a skeleton extraction model of walking cows with a high accuracy of up to 93.40% when the OKS threshold was 0.75. Since the color of the breeding-site background is similar to the cows' body color, pose estimation of dairy cows is more difficult than that of other animals, and relatively few studies have estimated the pose of cows.

Based on long-term manual observation, the daily poses of dairy cows mainly include standing, walking, lying and transitioning between standing and lying.
Therefore, we developed a daily pose estimation algorithm for multiple cows in an actual farm environment and performed experiments on the proposed model. We mainly focused on the classification of three daily poses (standing, walking, lying). The main scheme of pose estimation is to detect the object in each frame and then classify the different poses in the image. First, the YOLO v4 [25] model was built and fine-tuned to output the accurate object frame. Then, we extracted the keypoint heatmaps and PAFs of cows and matched the keypoints with the same cow based on a human body keypoint estimation model. Finally, the cow skeletons in different poses were input into the classification model to estimate the three typical poses of cows (standing, walking and lying).
Materials and methods
This study was carried out at Inner Mongolia Flag Animal Husbandry Co., Ltd. in the Inner Mongolia Autonomous Region of China. Inner Mongolia Agricultural University has conducted scientific research with Inner Mongolia Flag Animal Husbandry Co., Ltd. for more than five years. The study did not require approval from the relevant authorities, and no ethical issues were involved. The data were acquired by a monitoring camera fixed on the fence of the breeding site at a height of 4 m. During the experiment, neither the data acquisition equipment nor the experimental personnel came into contact with the cows, so no stress response was induced. Compared with traditional manual inspection and wearable sensors, this approach realizes noncontact animal behavior detection and improves animal welfare.
Data acquisition
All data in this study were acquired at Inner Mongolia Flag Animal Husbandry Co., Ltd. between September 2017 and April 2018. Ten cows were housed in the barn with an activity area 35 m long and 20 m wide, ensuring sufficient space for free movement. To ensure continuous around-the-clock monitoring of the cows' behavior, two 5-megapixel cameras with an infrared night mode (Hikvision Inc., Zhejiang, China) were mounted on the fence of the breeding site at a height of 4 m. The surveillance video format was MP4, and the image resolution and frame rate were 1920*1080 pixels and 25 fps, respectively. The videos were transmitted to a network video recorder with a 1 TB hard disk through a network cable and then copied to a computer through a USB interface every week for further processing. The background of the videos was the activity field, and the cows walked freely without handlers.
Image labeling and data processing
(1) Data set of object detection
In this study, surveillance videos were used as the raw data, and each video was approximately 25 minutes long. Images were selected from 175 videos of 10 cows. To increase the robustness of the object detection model, images with single and multiple objects with occlusion were selected. Since the data used in this study were acquired throughout the day, the illumination conditions in the collected images also differed according to the movement of the sun. LabelImg, an open-source image labeling tool, was used to manually annotate the training set. The annotation rectangle was drawn as tightly as possible to reduce unrelated background pixels. We manually annotated bounding boxes for 1800 sampled images for object detection, which were divided into a training set (1620 images) and a validation set (180 images) at a ratio of 9:1. To further improve the robustness of the detection algorithm, a mosaic augmentation algorithm was used to enrich the training set: it randomly selects four images from the data set, applies mirroring, flipping, rotation and cropping, and stitches them at random positions to synthesize a new image. After mosaic augmentation, the total number of images in the training set was 3240. A separate set of 400 images was then selected as the test set from the surveillance videos of the 10 cows.
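As an illustration of this augmentation step, the following is a minimal sketch of mosaic augmentation under the description above; the output size, transform probabilities and file paths are assumptions, and in practice the bounding-box annotations must be transformed together with the images.

```python
# Minimal mosaic augmentation sketch: stitch four randomly transformed crops
# into one training image. Output size and transform choices are assumptions.
import random
import cv2
import numpy as np

def mosaic(image_paths, out_size=832):
    """Stitch four randomly transformed images into one mosaic image."""
    assert len(image_paths) == 4
    half = out_size // 2
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    offsets = [(0, 0), (0, half), (half, 0), (half, half)]  # four quadrants
    for (y0, x0), path in zip(offsets, image_paths):
        img = cv2.imread(path)
        if random.random() < 0.5:                   # random horizontal mirror
            img = cv2.flip(img, 1)
        if random.random() < 0.5:                   # random 180-degree rotation
            img = cv2.rotate(img, cv2.ROTATE_180)
        h, w = img.shape[:2]
        scale = max(half / h, half / w)             # ensure a half-size crop fits
        img = cv2.resize(img, (int(w * scale) + 1, int(h * scale) + 1))
        ys = random.randint(0, img.shape[0] - half) # random crop position
        xs = random.randint(0, img.shape[1] - half)
        canvas[y0:y0 + half, x0:x0 + half] = img[ys:ys + half, xs:xs + half]
    return canvas

# Example usage with placeholder paths for four sampled training images:
# new_img = mosaic(["cow1.jpg", "cow2.jpg", "cow3.jpg", "cow4.jpg"])
```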
(2) Data set of keypoint detection and pose classification
According to the daily poses of cows in the activity field, there are three main types of poses: standing, walking and lying. Single cows with different poses were manually selected from the surveillance videos so that the images covered each condition (standing, walking, lying). The open-source tool Labelme [26] was used to manually label the keypoints of cows in 1800 images, including 600 images each for standing, walking and lying, which were divided into a training set and a validation set at a ratio of 9:1. A sample image of the keypoint labels is shown in Fig 1. The 16 keypoints (marked A, B, …, P) represent the head, left front leg root, right front leg root, left front knee, right front knee, left front hoof, right front hoof, left hind leg root, right hind leg root, left hind knee, right hind knee, left hind hoof, right hind hoof, neck, spine and coccyx. Visible keypoints (marked green) were labeled 2, occluded keypoints (marked red) were labeled 1, and missing keypoints (marked in the top left corner of the image) were labeled 0. After obtaining the keypoint information under different behaviors, the pose data set was labeled. Considering the influence of light conditions and occlusion on pose estimation, 390 images (130 each for standing, walking and lying) containing single cows and double cows with occlusion under different light conditions were used as the test set.
Fig 1
Keypoint label template of the cow skeleton.
Methods
The methods used in previous pose estimation studies are mainly divided into two types: top-down methods, which first detect objects and then estimate body parts, and bottom-up methods, which first detect body parts and then group them into objects. In the top-down framework, networks for multianimal pose estimation mainly include LEAP [19, 20], DeepPoseKit [27] and DeepLabCut [28], in which the detected individual objects are first cropped and then analyzed separately. In this framework, the accuracy of pose estimation depends on the result of object detection, which is prone to missed detections and incomplete interception of the whole cow image. Bottom-up methods are mainly used for multiperson pose detection. To improve the accuracy of pose estimation, we combine object detection with a pose estimation network in this study. Fig 2 shows the flow of the automatic analysis, which outputs the pose class and a confidence score from the pose estimation network.
Fig 2
(a) Overview of pose analysis procedure. (b) Close-up of the multiobject detector based on YOLO v4.
Multiobject detection
The separation between the cows and background directly affects the accuracy of pose estimation. To achieve an initial separation of object and background and output the bounding boxes of cows for pose estimation, we built a multiobject detection model prior to pose estimation. In recent years, much research has been conducted on multiobject detection [29-31]. In this study, the YOLO v4 network, a one-stage object detection algorithm, was used to carry out fast and accurate detection of multiple cows.

As shown in Fig 2(b), the CSPDarkNet53 neural network was used as the backbone to extract image features. The backbone contains multiple residual blocks consisting of convolution, batch normalization and a Mish activation function. PANet (path aggregation network) integrates the extracted features to improve detection performance. Three feature layers for detecting small, medium and large objects were obtained from the backbone and fused through PANet. Accordingly, each image was divided into S*S grids (S = 52, 26 and 13) used to detect small, medium and large objects, respectively. Each grid cell was responsible for predicting the bounding box position, confidence and class of any object whose center point fell within it. The confidence reflects whether the current grid cell contains a cow and the accuracy of the corresponding prediction. The calculation formula can be expressed as follows:
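(assuming the standard YOLO decoding, with σ the sigmoid function and (p_w, p_h) the prior anchor dimensions, neither of which is defined in the text)

```latex
b_x = \sigma(t_x) + C_x, \qquad
b_y = \sigma(t_y) + C_y, \qquad
b_w = p_w \, e^{t_w}, \qquad
b_h = p_h \, e^{t_h}
```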
where (C_x, C_y) are the coordinates of the current grid cell, (t_x, t_y) are the horizontal and vertical offsets of the center point, and (t_w, t_h) are the width and height of the object detection box.

The original loss function in YOLO v4 consists of three parts, namely, the bounding box regression loss, the confidence loss and the classification loss. In this paper, we modified the overall loss function to contain only the confidence loss and the regression loss. The confidence loss describes whether there is an object in the grid cell by calculating the binary cross entropy. The regression loss describes the position and size difference between the annotated object and the predicted object. Considering the three most important factors (overlapping area, center point distance and aspect ratio), the regression loss uses CIoU instead of IoU, as shown in Eqs (2)-(4).

The network was pretrained on the PASCAL VOC 2012 dataset, and the convolution layers were then fine-tuned on the self-built training dataset to achieve a better training effect. According to the GPU performance, the input image size was set to 416 pixels * 416 pixels in training. The batch size and number of iterations were set to 2 and 50, and the initial learning rate and maximum learning rate were 0.000001 and 0.001, respectively. For the same object, the network may output multiple bounding boxes; we kept the bounding box with the highest confidence and excluded other boxes that had a high degree of overlap with it. Finally, we achieved an approximate separation of object and background and output accurate object boxes to provide reference locations for dairy cow pose estimation.
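For reference, the CIoU regression loss referred to above (Eqs (2)-(4)) has the following commonly used form, reconstructed here rather than transcribed from the original: ρ(b, b^{gt}) is the distance between the centers of the predicted and ground-truth boxes, c is the diagonal length of the smallest box enclosing both, and v measures the consistency of their aspect ratios.

```latex
\mathcal{L}_{\mathrm{CIoU}} = 1 - \mathrm{IoU} + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha v, \qquad
v = \frac{4}{\pi^{2}} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^{2}, \qquad
\alpha = \frac{v}{(1 - \mathrm{IoU}) + v}
```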
Skeleton extraction
After achieving the approximate separation of object and background, the keypoint extraction network of dairy cows was constructed, and the pose classification network was built to realize the extraction of three typical poses (standing, walking and lying) [16, 17, 32]. The structure of the skeleton extraction model is shown in Fig 3.
Fig 3
The structure of two-branch skeleton extraction network.
(1) Network structure. The network consists of two branches: the upper branch predicts the locations of keypoints, and the lower branch predicts the part affinity fields between keypoints. First, the feature maps F are extracted from the image by the VGG-19 backbone (the first 10 layers, without pooling). In most cases, the behavior appears only in a small spatial area of the image. Low-level convolutional features contain more object location and detail information, while high-level features carry stronger semantic information. Therefore, we combined low-level features such as texture, color and edges with high-level features: the feature map generated by the fourth convolution layer was downsampled, the channels after the eighth layer were compressed, and the two feature maps were then concatenated with the feature map output by the last convolution layer. The correlation information between different spatial areas is captured through this feature fusion.

A multistage CNN (multistage convolutional neural network) was used to generate the keypoint heatmaps and the part affinity fields (PAFs). S = (S1, S2, …, SJ) represents the confidence maps of the keypoints, i.e., the probability that each keypoint appears at each pixel. L = (L1, L2, …, LC) represents the vector fields between every two keypoints, encoding the degree of association among keypoints. The first stage of the two-branch network consists of three convolution layers with a kernel size of 3*3 and two convolution layers with a kernel size of 1*1. The upper branch predicts the keypoint confidence maps S1 = ρ1(F), and the lower branch predicts the part affinity field heatmaps L1 = φ1(F), where ρ1 and φ1 denote the CNNs of the two branches. In each subsequent stage, both branches consist of five convolution layers with a kernel size of 7*7 and two convolution layers with a kernel size of 1*1, and the prediction of the previous stage merged with the original image features is used as the input of the next stage. After multiple stages, the keypoint prediction accuracy is improved. In the training phase, an L2 loss is used to supervise the keypoint heatmaps and part affinity fields. In this study, a model pretrained on the COCO 2017 person keypoint data set was used to initialize the network parameters, and the convolution layers were then fine-tuned on the self-built dataset to achieve a better training effect.

(2) Keypoint heatmap detection. The heatmap, composed of a series of two-dimensional points, represents the confidence that a keypoint appears at a certain position in the image. We use a Gaussian kernel with fixed variance to spread the annotated keypoint confidence over neighboring positions. For the jth keypoint of the kth cow, x_{j,k} denotes the actual position of the keypoint, and the confidence value of the pixels around the keypoint is expressed as:
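(in the standard form of the cited model [16], assumed here since the original equation is not reproduced)

```latex
S^{*}_{j,k}(p) = \exp\!\left( -\frac{\lVert p - x_{j,k} \rVert_2^{2}}{\sigma^{2}} \right)
```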
where the standard deviation σ controls the spread of the confidence values. For an image with multiple cows, the ground-truth heatmap of each keypoint takes the maximum of the individual Gaussian peaks at each pixel.

(3) Part affinity field association. The pixels on a limb are represented by a unit vector encoding position and direction information, and all unit vectors on the limb constitute the part affinity field. This is mathematically expressed by:
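(reconstructed here in the standard form of [16, 17], for limb c of cow k connecting the annotated keypoints x_{j1,k} and x_{j2,k})

```latex
L^{*}_{c,k}(p) =
\begin{cases}
v = \dfrac{x_{j_2,k} - x_{j_1,k}}{\lVert x_{j_2,k} - x_{j_1,k} \rVert_2}, & \text{if } p \text{ lies on limb } c \text{ of cow } k, \\[8pt]
\mathbf{0}, & \text{otherwise,}
\end{cases}
```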
where v is the unit vector, and the part affinity field is defined by all unit vectors on the limb. If the limbs of multiple cows k overlap at point p, the field at p is the mean of the vector fields of all overlapping limbs.

The association confidence between any two candidate keypoint positions d_{j1} and d_{j2} is obtained by a linear integral over the part affinity field:
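(again in the standard form, consistent with the definitions below)

```latex
E = \int_{0}^{1} L_{c}\bigl(p(u)\bigr) \cdot \frac{d_{j_2} - d_{j_1}}{\lVert d_{j_2} - d_{j_1} \rVert_2} \, du, \qquad
p(u) = (1 - u)\, d_{j_1} + u \, d_{j_2}
```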
where p(u) interpolates between the two positions d_{j1} and d_{j2}.

(4) Keypoint matching with PAFs. Due to the uncertainty regarding the number of objects and the occlusion in the image, there may be several candidates for each part. Greedy relaxation is used to generate optimal matches. The specific operations are as follows. First, the point sets from the heatmaps of different cows are obtained in order to establish a unique match between point sets. The keypoints and PAF scores are regarded as the vertices and edge weights of a graph, respectively. Then, the multiobject matching problem is transformed into bipartite graph matching, and the optimal matching of the linked keypoints is obtained using the Hungarian algorithm. Finally, the keypoints belonging to the same object are marked on that object in the image.
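As an illustration of this matching step, the sketch below performs the per-limb bipartite assignment, assuming the PAF line-integral scores have already been computed; SciPy's linear_sum_assignment is used here as a stand-in for the Hungarian algorithm, and all coordinates and scores are placeholder values.

```python
# Minimal sketch of per-limb bipartite matching of candidate keypoints.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_limb(candidates_a, candidates_b, paf_scores):
    """Match two sets of candidate keypoints for one limb type.

    candidates_a, candidates_b: lists of (x, y) keypoint candidates.
    paf_scores: array of shape (len(a), len(b)) with PAF line-integral scores,
                larger meaning a stronger connection.
    Returns a list of (index_a, index_b, score) pairs.
    """
    cost = -np.asarray(paf_scores)            # maximize score = minimize negative score
    rows, cols = linear_sum_assignment(cost)
    return [(r, c, paf_scores[r][c]) for r, c in zip(rows, cols) if paf_scores[r][c] > 0]

# Toy example: two candidate hind-leg roots matched to two hind knees.
a = [(120, 300), (410, 320)]
b = [(125, 360), (405, 378)]
scores = np.array([[0.92, 0.05],
                   [0.03, 0.88]])
print(match_limb(a, b, scores))   # -> [(0, 0, 0.92), (1, 1, 0.88)]
```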
Pose classification
After extracting the position information of the keypoints of the cow’s body in the image, pose estimation was performed based on the cow’s skeleton in the object detection frame. Therefore, a fully connected neural network (FCNN) was designed to classify three typical behaviors (standing, walking and lying). The network structure is shown in Fig 4.
Fig 4
Fully connected neural network of pose classification.
Because there are 16 keypoints on the cow's body and each keypoint has 2 coordinates (x and y), the input layer is set to 32 nodes. The effective information is extracted and integrated through 3 hidden layers (with 128, 64 and 16 nodes). Each hidden layer uses the ReLU activation function, and a batch normalization layer is added to prevent vanishing gradients. The output layer is a dense layer with a size of 4 activated by softmax and is used to predict the three typical behaviors. The softmax function maps the outputs to values in the interval (0, 1) that represent the probability of each of the three behaviors.
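The following is a minimal sketch of this classifier in tf.keras (the framework reported later in this study); the layer sizes follow the description above, while the optimizer, loss and training details are assumptions not given in the text.

```python
# Minimal sketch of the pose-classification FCNN described above.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_pose_classifier(num_keypoints=16, num_outputs=4):
    inputs = tf.keras.Input(shape=(num_keypoints * 2,))      # (x, y) per keypoint
    x = inputs
    for units in (128, 64, 16):                               # three hidden layers
        x = layers.Dense(units)(x)
        x = layers.BatchNormalization()(x)                    # stabilize training
        x = layers.ReLU()(x)
    outputs = layers.Dense(num_outputs, activation="softmax")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam",                           # assumed; not stated in the text
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_pose_classifier()
model.summary()
```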
Results
All experiments were performed on a Windows 10 operating system with an Intel(R) Core(TM) 2.7 GHz CPU, 128 GB of RAM and a 24 GB NVIDIA Quadro P6000 GPU. The model was implemented in Python 3.7 using the open-source deep learning framework TensorFlow 2.2.
Evaluation of multiobject detection
To verify the effectiveness of the algorithm, four indices, namely, precision, recall, average precision (AP) and detection speed, were adopted to evaluate the object detection model. When the IoU (Intersection over Union) between a prediction and the ground truth is greater than or equal to the threshold, the prediction is counted as a true positive (TP); when the IoU is lower than the threshold, it is counted as a false positive (FP); and when the IoU is equal to 0, the case is counted as a false negative (FN). In this paper, the average precision values AP50 and AP75 were calculated with IoU thresholds of 0.5 and 0.75, respectively, and AP denotes the average precision over the detected objects. The formulas for the calculation of precision, recall and AP are as follows:
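(standard definitions assumed here, with AP the area under the precision-recall curve)

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
AP = \int_{0}^{1} P(R) \, dR
```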
where TP, FP, FN and TN indicate the numbers of true positives, false positives, false negatives and true negatives, respectively.

The precision, recall and AP of this method are shown in Table 1, where AP50 and AP75 represent the average precision at IoU thresholds of 0.5 and 0.75, respectively. The detection speed reached 8.06 frames/s. It is observed from the test results that the detection accuracy for multiple cows was lower than that for a single cow because it was difficult to extract features when multiple cows occluded each other. There was little difference in detection accuracy between day and night because the object detection dataset in this study was randomly selected from 24-hour surveillance video; therefore, the detection accuracy of the model remained nearly the same throughout the whole day. Overall, the multiobject detection method based on machine vision proposed in this study exhibited good accuracy.
Table 1
Evaluation indicators of object detection under different scenarios.
Test result | IoU threshold | Precision (%) | Recall (%) | AP (%) | AP50 (%) | AP75 (%)
Single cow without occlusion | 0.5 | 99 | 99 | 78.07 | 99 | –
Single cow without occlusion | 0.75 | 98 | 98 | – | – | 97.61
Multi-cows with occlusion | 0.5 | 99.28 | 95.8 | 71.25 | 95.8 | –
Multi-cows with occlusion | 0.75 | 94.2 | 90.91 | – | – | 88.75
Day | 0.5 | 99.55 | 98.67 | 72.03 | 98.67 | –
Day | 0.75 | 94.64 | 93.81 | – | – | 91.98
Night | 0.5 | 96 | 84.85 | 60.27 | 82.77 | –
Night | 0.75 | 89.14 | 78.79 | – | – | 74.4
400 images tested simultaneously | 0.5 | 98.56 | 94.49 | 69.4 | 94.04 | –
400 images tested simultaneously | 0.75 | 93.59 | 89.72 | – | – | 87.07
Evaluation of keypoint extraction model
To verify the performance of the algorithm, OKS (object keypoint similarity) was adopted as the evaluation index of the keypoint extraction model in this study. OKS plays the same role as IoU and is calculated between the predicted keypoints and ground truth keypoints. The calculation of OKS is shown in formula (13).
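In its standard COCO form (assumed here, consistent with the variable descriptions below), the OKS of object p is

```latex
\mathrm{OKS}_{p} = \frac{\sum_{i} \exp\!\left( -\dfrac{d_{pi}^{2}}{2 s_{p}^{2} k_{i}^{2}} \right) \delta(v_{pi} = 1)}{\sum_{i} \delta(v_{pi} = 1)}
```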
where d_{pi} is the Euclidean distance between the ground-truth and detected keypoint, p and i represent the IDs of the object cow and the keypoint, respectively, s_p is the object scale, k_i is the normalization factor of the ith keypoint, v_{pi} is the visibility flag of the ground truth, and δ is the Kronecker function. We also report AP over the complete set of test images when the OKS threshold is 0.5.

Considering the influences of light and occlusion on the results, 390 images including single cows and double cows with occlusion were tested under different light conditions. The experimental results are shown in Figs 5 and 6. Parts No. 1-16 represent the head, left front leg root, right front leg root, left front knee, right front knee, left front hoof, right front hoof, left hind leg root, right hind leg root, left hind knee, right hind knee, left hind hoof, right hind hoof, neck, spine and coccyx, respectively. Generally, the confidence of the 12 keypoints representing the legs was significantly higher than that of the other keypoints. The average confidence of the leg keypoints was 83.3% for single cows and 81.2% for double cows during daylight and decreased slightly at night, with values of 73.1% for single cows and 71.4% for double cows. The average values of the coccyx, spine and neck sequentially followed those of the leg keypoints, and the head exhibited the lowest confidence.
Fig 5
The confidence of various keypoints in different scenarios.
Fig 6
Average precision of all keypoints.
(a) All keypoints of single cow. (b) All keypoints of double cows.
It is observed from the experimental results that the detection precision of the leg keypoints was the highest, exceeding 90% regardless of day or night and single or double cows. This was followed by the values for the back, neck and head. For a single cow during the day, the AP of the coccyx was 83%, that of the spine was 65%, and those of the neck and head were slightly lower at approximately 61.5% and 51.5%, respectively. For double cows, the AP of the coccyx was 74.5%, that of the neck was 63.5%, and those of the spine and head were slightly lower at approximately 41.5% and 36.5%, respectively. At night, the AP values were lower for all keypoints for both single and double cows. The average detection accuracy was 85% for a single cow during daylight and 78.1% at night, and the corresponding values for double cows were 74.3% and 71.6%, respectively.
Evaluation of pose classification model
In most multiclassification tasks, the evaluation metrics are computed based on the confusion matrix. In this study, the confusion matrix was also used as the evaluation metric of three typical pose classifications. Each column of the confusion matrix represents the predicted poses, the total number of which represents the number of predicted poses. Each row represents the true poses, the total number of which represents the actual amount of each pose. The larger numbers on the main diagonal and the smaller numbers on other cells in the confusion matrix indicate better performance of the pose estimation algorithm. The confusion matrix of our pose estimation is listed in Table 2.
Table 2
Confusion matrix of pose classification.
Actual poses | Predicted: Standing | Predicted: Walking | Predicted: Lying | Total
Standing | 121 | 9 | 0 | 130
Walking | 10 | 119 | 1 | 130
Lying | 1 | 0 | 129 | 130
Total | 132 | 128 | 130 | 390

Evaluation of multiclassification:
Number of correct predictions: 121 + 119 + 129 = 369
Precision for each pose: 0.9167, 0.9297, 0.9923
Recall for each pose: 0.9308, 0.9154, 0.9923
Accuracy: 369/390 = 0.9462
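For reference, the per-pose precision, recall and overall accuracy in Table 2 can be reproduced directly from the confusion matrix, as in the short check below (using NumPy for illustration).

```python
# Recompute Table 2's metrics from the confusion matrix.
import numpy as np

# rows = actual (standing, walking, lying), columns = predicted
cm = np.array([[121,   9,   0],
               [ 10, 119,   1],
               [  1,   0, 129]])

precision = np.diag(cm) / cm.sum(axis=0)   # per predicted-pose column
recall = np.diag(cm) / cm.sum(axis=1)      # per actual-pose row
accuracy = np.trace(cm) / cm.sum()

print(precision.round(4))   # [0.9167 0.9297 0.9923]
print(recall.round(4))      # [0.9308 0.9154 0.9923]
print(round(accuracy, 4))   # 0.9462
```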
The first row in Table 2 shows that 121 standing poses were correctly predicted and 9 were incorrectly predicted as walking. Similarly, 119 walking and 129 lying poses were correctly predicted in the second and third rows. Overall, 369 poses were predicted correctly and 21 incorrectly. The precision matrix for each pose on the test set is illustrated in Fig 7. The precision vector was 0.9167, 0.9297 and 0.9923, and the recall vector was 0.9308, 0.9154 and 0.9923. Taking standing as an example, among all poses predicted as standing, 91.67% were actually standing and 7.58% were actually walking. The classification of standing and walking is relatively poor compared with that of lying: the characteristics of the two poses are very similar when the cow is standing or walking while facing the camera, making these two behaviors more difficult to distinguish and easily confused. The average accuracy and loss were calculated and compared for the training and validation sets; the accuracy and loss were 0.9648 and 0.1017 on the training set and 0.9278 and 0.2579 on the validation set, respectively.
Fig 7
Matrix of (a) precision and (b) recall ratio. (c) Average accuracy and loss function curves.
Discussion
In an actual open farm environment, various interference factors are present in the images obtained from the surveillance cameras, such as varying light, occlusion between cows and the small size of objects far away from the cameras. To verify the robustness of the algorithm and evaluate its effectiveness, we investigated the main factors that may affect pose estimation.
Analysis of the influence of varying light on pose estimation
At the open cow breeding site, light in the early morning and at twilight shines directly into the surveillance camera, so the images often present front-light, back-light or night conditions. The image is brighter under front light and darker under back light, which increases the difficulty of feature extraction. The low-level features of the convolutional layers contain more location and detail information, and the high-level features carry stronger semantic information. To reduce the impact of light on the features, we fused low-level features such as texture, color and edges with high-level cow semantic features to extract more image features. It is observed from the test results that the keypoint average precision of the multicow system at night was slightly lower than that during the day: the accuracy for double cows was 74.3% during the day and 71.6% at night, a small and not obvious difference. To simulate the impact of front-light and back-light farm environments without interference from other factors, we adjusted the brightness of the same input image by increasing or decreasing it by 25% and 50%. When the original image brightness was reduced by more than 50%, the surveillance camera activated its night vision function. To ensure that the test results were affected only by the varying light rather than by other unexpected factors, the same image was randomly selected from the test set. The test results are illustrated in Fig 8. The blue box is the predicted box of the detected cow, and the colored dots are the keypoints of the dairy cow's body. The pose label in blue font is displayed in the upper left corner of the predicted box. (a) and (b) show the influence of increasing the brightness by 25% and 50%, and (c) and (d) show the influence of decreasing the brightness by 25% and 50%, respectively.
Fig 8
The detection result at different brightness values.
(a) and (b) represent the influence of increasing the brightness by 25% and 50%. (c) and (d) represent the influence of decreasing the brightness by 25% and 50%.
As seen from the above figure, the detection results for the same image containing 5 cows were essentially unchanged after increasing or decreasing the brightness by 25% and 50%. As the brightness of the image changes, the three poses of the cows can still be accurately detected. The varying light thus had little effect on the accuracy of pose estimation in this study, and the proposed pose estimation algorithm exhibited high robustness under front lighting and back lighting.
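A minimal sketch of the brightness simulation described above is given below; the exact adjustment method is not specified in the text, so a simple linear intensity scaling with OpenCV is assumed, and the file path is a placeholder.

```python
# Simulate front-light/back-light conditions by scaling pixel intensities.
import cv2

def adjust_brightness(image, factor):
    """Scale pixel values by `factor` (e.g. 1.25 = +25%, 0.5 = -50%)."""
    return cv2.convertScaleAbs(image, alpha=factor, beta=0)

img = cv2.imread("frame.jpg")                      # placeholder path
variants = {f"{int((f - 1) * 100):+d}%": adjust_brightness(img, f)
            for f in (1.25, 1.50, 0.75, 0.50)}
```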
Analysis of the influence of occlusion on pose estimation
To verify the effectiveness of the algorithm under occlusion, the test images were divided into two groups, and we compared the effects of different occlusion conditions on pose estimation under the same light conditions. The test images covered each category: good lighting without occlusion, good lighting with occlusion, poor lighting without occlusion and poor lighting with occlusion. As shown in Fig 9, the number of cows in the images ranged from a single cow and double cows to 5 cows during the day and from a single cow and double cows to 6 cows at night. (a)-(c) show the pose estimation results under good lighting, and (d)-(f) show the results under poor lighting. There was obvious occlusion between the two cows (front legs, head) and almost no occlusion among the multiple cows. The results showed that the proposed method can still estimate the three typical poses as the number of cows increases, and the increase in the number of cows had little impact on the detection accuracy. The keypoint detection accuracy of a single cow was 85% during daylight and 78.1% at night, and that of double cows with occlusion was 74.3% and 71.6%, respectively. Mutual occlusion between cows reduced the keypoint detection accuracy under the same lighting conditions, which in turn reduced the accuracy of pose classification. When a cow was occluded by more than 50%, too few keypoints could be detected, so the accuracy of pose classification was relatively low. Nevertheless, cows are seldom seriously occluded in the actual activity field. These experiments showed that the proposed method exhibited good robustness for pose estimation under different occlusion conditions.
Fig 9
Effects of different numbers of cows on pose estimation.
(a)-(c) indicate the image containing single cow, double cows and 5 cows during the day. (d), (e) and (f) represent the image containing single cow, double cows and 6 cows at night.
Analysis of the influence of image resolution on precision
The cow images cropped by the object detection network had different sizes, and cows far from the surveillance cameras appeared as small objects in the image, so the resolution of these small-object images was relatively low. We compared the impact of different input image resolutions on detection accuracy and speed. Images with 300*168 pixels, 600*336 pixels and 900*504 pixels were selected to test keypoint detection in dairy cows. The average precision values were 25%, 52% and 77%, and the detection speeds were 4.15 fps, 3.69 fps and 2.63 fps, respectively. Different image resolutions affected the accuracy and computational complexity of pose estimation: for the same network model, increasing the size of the input image improved the accuracy of pose estimation to a certain extent, but the computational cost also increased greatly. Therefore, follow-up work will use a superresolution enhancement algorithm to increase the image resolution and thereby improve the detection accuracy of small objects far from the surveillance camera. We will also reduce the number of parameters and calculations by designing a lightweight network to improve the efficiency of the network without degrading its performance.
Analysis of the influence of transitions between poses
As found by manual observation, cows become restless during estrus or immediately before calving. According to previous research [12, 13], specific irregular behaviors appear before calving and during estrus, such as lying, standing, frequently changing position between lying down and standing up, and mounting behavior. Since the network in this study was trained on standing, walking and lying images, the failure rate was relatively high when the cows were changing poses (transitioning from lying to standing or from standing to lying). In addition, the pose characteristics of standing and walking are very similar, particularly when the cow's head is facing the camera, so standing and walking poses are easily confused and their classification precision is relatively poor compared with that of lying. As seen from the confusion matrix (Table 2), 7.58% of standing poses were incorrectly recognized as walking, and 7.03% of walking poses were recognized as standing. To overcome the abovementioned precision decrease, more data are needed to validate the findings of this research, particularly with respect to the transitions between poses.
Conclusions
To extract three typical poses (standing, walking and lying) in an actual farm environment, we presented an algorithm for multiobject pose estimation based on transfer learning. We analyzed the main factors influencing detection precision, such as varying light, occlusion between cows and small objects far away from the surveillance cameras. The main conclusions are as follows.

At the cow breeding site, the collected images present front-light and back-light conditions. To reduce the impact of light on the features, we fused low-level features such as texture, color and edges with high-level cow semantic features to extract more image features. To simulate front-light and back-light farm environments without interference from other factors, we adjusted the same input image by increasing or decreasing the brightness by 25% and 50%. The results showed that the change in natural light had little effect on the pose estimation algorithm proposed in this study.

To verify the effectiveness of the algorithm under occlusion, we compared the influence of different occlusion conditions under the same light conditions. The keypoint detection accuracy of a single cow was 85% in daylight and 78.1% at night, which was significantly higher than that for double cows with obvious occlusion (74.3% and 71.6%). Pose classification was also affected by the decline in keypoint accuracy caused by occlusion. When a cow was occluded by more than 50%, the accuracy decreased significantly; however, in actual cow breeding sites, cows are seldom seriously occluded. Therefore, the method of this study can be used for multicow pose estimation.

The cow images cropped by the object detection network had different sizes, and cows far away from the surveillance camera appeared as small objects with relatively low resolution. We compared the pose detection results for three input image resolutions, and the detection accuracy increased as the image resolution improved. Therefore, to improve the detection accuracy of small objects far from the surveillance cameras, we will use a superresolution enhancement algorithm to increase the image resolution in future work. We will also reduce the number of parameters and calculations by designing a lightweight network to improve the efficiency of the network without degrading its performance.

The failure rate was relatively high when the cows were changing poses between lying and standing or facing the camera: 7.58% of standing poses were incorrectly recognized as walking, and 7.03% of walking poses were recognized as standing. To overcome this precision decrease, more data are needed to validate the findings of this research, particularly for transitions between poses.

The pose estimation and behavior classification methods for dairy cows based on skeleton feature extraction in this study have reference significance for animal behavior researchers, and this study can provide further data support for lameness detection, estrus detection and the prediction of calving in large-scale precision farming. However, the real-time performance of the algorithm was relatively poor. In future work, we will focus on improving real-time performance while ensuring high detection accuracy and reducing the number of parameters and calculations.
PONE-D-21-40261
Multicows Pose Estimation Based on Keypoint Extraction
PLOS ONE
Dear Dr. Zhang,Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.Please revise accordingly to the comments and suggestion by reviewer.Please submit your revised manuscript by Apr 07 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.Please include the following items when submitting your revised manuscript:
A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.We look forward to receiving your revised manuscript.Kind regards,Yan Chai HumAcademic EditorPLOS ONEJournal Requirements:When submitting your revision, we need you to address these additional requirements.1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found athttps://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf andhttps://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf2. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match.When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.3. Thank you for stating the following financial disclosure:“NO, the funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.At this time, please address the following queries:a) Please clarify the sources of funding (financial or material support) for your study. List the grants or organizations that supported your study, including funding received from your institution.b) State what role the funders took in the study. If the funders had no role in your study, please state: “The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”c) If any authors received a salary from any of your funders, please state which authors and which funders.d) If you did not receive any funding for this study, please state: “The authors received no specific funding for this work.”Please include your amended statements within your cover letter; we will change the online submission form on your behalf.4. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. 
PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.We will update your Data Availability statement to reflect the information you provide in your cover letter.5. PLOS requires an ORCID iD for the corresponding author in Editorial Manager on papers submitted after December 6th, 2016. Please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. Please see the following video for instructions on linking an ORCID iD to your Editorial Manager account: https://www.youtube.com/watch?v=_xcclfuvtxQ[Note: HTML markup is below. Please do not edit.]Reviewers' comments:Reviewer's Responses to Questions
Comments to the Author1. Is the manuscript technically sound, and do the data support the conclusions?The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes********** 3. Have the authors made all data underlying the findings in their manuscript fully available?The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: No********** 4. Is the manuscript presented in an intelligible fashion and written in standard English?PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes********** 5. Review Comments to the AuthorPlease use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: The article presents an application of current detection methods to the poses of cows on a farm. The experiments have been explained and performed according to the field’s standard. A couple of points remain, that need attention before publication in my opinion.1) Data availability• additional information in the beginning of the PDF states, that all relevant data is freely available and that it is available within the manuscript and its Supporting Information files.• But the manuscript does not contain any links to images, videos or code.2) Ethics• IACUC and approval number were not mentioned◦ it was also not mentioned, that those were not necessary. Please state why the relevant authorities do not consider this an animal experiment, that needs to be approved. Or if they do, state the approval number. ◦• How did non-contact machine vision improve animal welfare during recording?◦ As stated under “data acquisition”◦ perhaps you meant, that the research could in the future improve welfare?3) Funding• where was funding coming from? If the authors did not receive funding, it has to be stated4) Data• LabelImg was used, please give a source or citation• Mosaic enhancement◦ why did you use it? Why not other augmentation methods like mirroring or rotation?◦ Should it not be called “mosaic augmentation” instead of “enhancement”• How were the data set frames picked? It cannot be random from the videos, since you have exactly 600 images for each condition (standing, walking, lying). 
How exactly were validation and test set chosen?• How often do the cows occlude each other by more than 50%? You state, that it is rather seldom in a real farm environment. Could you estimate how much it happens in your data?5) Performance• The abstract states, that the algorithm exhibited a higher detection rate. Higher than what?• Table 1◦ In the text and the table it says “IOU” in places, where in my understanding “IOU threshold” is meant. In case I misunderstand, please make it more clear, what “ average precision when the IOU is equal to 0.5” means. If it should indeed be “IOU threshold”, please correct.◦ What is the difference between AP and AP50? They both appear in the rows, where the IOU threshold is 0.5. This confusion might be connected to the previous point.◦ What are the “400 images tested simultaneously”?6) Structure• In the methods section the model setup is detailed. State more clearly what parts are done as in yolo v4 and what you have inserted yourself.7) Conclusion• It is necessary to assess at least roughly, whether the performance of the system would be adequate for applications in the area such as estrus detection. To make a statement under some assumptions in the conclusion would already be good. This would put the numbers into a useful context.If these points are addressed I recommend publication after minor revision.********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.If you choose “no”, your identity will remain anonymous but your review may still be made public.Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.23 Mar 2022Dear Editor and Reviewer:Thank you very much for your constructive comments and valuable suggestions. We have carefully considered the suggestion of Editor and Reviewer. We have tried our best to improve and made some changes in the manuscript according to your kind advices and detailed suggestions. The changes and corrections will be marked in red in the manuscript. This document summarizes our revisions and responses in the following. Thank you again for all your help with our manuscript.Comments from EditorsResponds to the editors' comments:1.Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. 
The PLOS ONE style templates can be found athttps://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf andhttps://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdfResponds: Thank you very much for the kind reminder. We have downloaded the LaTeX template from PLOS ONE's official website and read the LaTeX guidelines. We have revised some of the formatting and will do our best to make our manuscript conform to PLOS ONE's style.2. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match.When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.Responds: We are very sorry for the mismatch of grant information that was provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections. We have revised the grant information and do our best to provide the correct grant numbers for the awards in the ‘Funding Information’ section.3. Thank you for stating the following financial disclosure:“NO, the funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.At this time, please address the following queries:a) Please clarify the sources of funding (financial or material support) for your study. List the grants or organizations that supported your study, including funding received from your institution.b) State what role the funders took in the study. If the funders had no role in your study, please state: “The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”c) If any authors received a salary from any of your funders, please state which authors and which funders.d) If you did not receive any funding for this study, please state: “The authors received no specific funding for this work.”Please include your amended statements within your cover letter; we will change the online submission form on your behalf.Responds: We are very sorry for missing the funding information.We have listed the grants that supported our study in the ‘Funding Information’ section. This study was funded by the National Natural Science Foundation of China under Grant 61966026, Grant 62161034 and Grant 61561037. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The authors appreciate the funding organization for their financial supports.4. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. 
Any potentially identifying patient information must be fully anonymized. Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access. We will update your Data Availability statement to reflect the information you provide in your cover letter.
Responds: We are very sorry for neglecting to provide links to the image datasets. The minimal data set underlying the results described in our manuscript can be found at https://www.kaggle.com/twisdu/dairy-cow. All data in our manuscript will be fully shared without restriction. The link to the minimal dataset has been added to the Supporting Information files of the revised manuscript.
5. PLOS requires an ORCID iD for the corresponding author in Editorial Manager on papers submitted after December 6th, 2016. Please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to 'Update my Information' (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. Please see the following video for instructions on linking an ORCID iD to your Editorial Manager account: https://www.youtube.com/watch?v=_xcclfuvtxQ
Responds: Thank you very much for the kind reminder. The corresponding author has an ORCID iD, and it has been validated in Editorial Manager. We have also updated this information in the PLOS ONE Editorial Manager submission system.
Comments from Reviewers:
Reviewer #1: The article presents an application of current detection methods to the poses of cows on a farm. The experiments have been explained and performed according to the field's standard. A couple of points remain, that need attention before publication in my opinion.
Responds to the reviewer's comments:
1) Data availability
• additional information in the beginning of the PDF states, that all relevant data is freely available and that it is available within the manuscript and its Supporting Information files.
• But the manuscript does not contain any links to images, videos or code.
Responds: We are very sorry for neglecting to provide links to the image datasets. The minimal data set underlying the results described in our manuscript can be found at https://www.kaggle.com/twisdu/dairy-cow. All data in our manuscript will be fully shared without restriction. The link to the minimal dataset has been added to the Supporting Information files of the revised manuscript.
2) Ethics
• IACUC and approval number were not mentioned
◦ it was also not mentioned, that those were not necessary. Please state why the relevant authorities do not consider this an animal experiment, that needs to be approved. Or if they do, state the approval number.
• How did non-contact machine vision improve animal welfare during recording?
◦ As stated under "data acquisition"
◦ perhaps you meant, that the research could in the future improve welfare?
Responds: Thank you very much for your kind suggestion. This study was carried out at Inner Mongolia Flag Animal Husbandry Co., Ltd.
Inner Mongolia Agricultural University has conducted scientific research with Inner Mongolia Flag Animal Husbandry Co., Ltd. in Hohhot for more than five years. During the experiments, the data were collected with the consent and direction of the company. The study does not require approval from any relevant authorities. The traditional approach of detecting dairy cow behavior through manual observation increases the probability of human-animal contact, which can lead to zoonotic diseases. Collecting cow behavior with wearable sensors seriously interferes with the cows during equipment installation and is likely to cause a strong stress response. The data in this paper were acquired by a monitoring camera fixed on the fence of the breeding site at a height of 4 m. During the data collection process, neither the collection equipment nor the experimenter contacted the cows, so the normal activities of the cows were not disturbed and no stress response was caused. Compared with traditional manual inspection and wearable sensors, this setup enables continuous, contact-free monitoring of the activity and health of dairy cows. If the system is deployed in the future, it can provide non-contact animal behavior monitoring and improve animal welfare. According to your kind suggestion, we have revised the detailed description of data acquisition in the manuscript.
3) Funding
• where was funding coming from? If the authors did not receive funding, it has to be stated
Responds: We are very sorry for missing the funding information. This research was funded by the National Natural Science Foundation of China under Grant 61966026, Grant 62161034 and Grant 61561037. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. We have corrected the funding information in the 'Funding Information' section when submitting our revised manuscript.
4) Data
• LabelImg was used, please give a source or citation
• Mosaic enhancement
◦ why did you use it? Why not other augmentation methods like mirroring or rotation?
◦ Should it not be called "mosaic augmentation" instead of "enhancement"
• How were the data set frames picked? It cannot be random from the videos, since you have exactly 600 images for each condition (standing, walking, lying). How exactly were validation and test set chosen?
• How often do the cows occlude each other by more than 50%? You state, that it is rather seldom in a real farm environment. Could you estimate how much it happens in your data?
Responds: Thank you very much for your detailed suggestion.
LabelImg: The source of the LabelImg annotation tool is https://github.com/tzutalin/labelImg. We have added the source citation in the revised manuscript.
Mosaic augmentation: Following your kind suggestion, we revised the expression to "mosaic augmentation". It is a data augmentation method with the advantage of enriching the background of the detected objects; because four images are combined into a single training sample, the statistics of four images contribute to each Batch Normalization (BN) calculation. The four images were mirrored, flipped and cropped before they were stitched together (an illustrative sketch follows below).
Data set: The data frames were manually selected so that the selected images contain each condition (standing, walking, lying). The validation and test sets were randomly drawn from the overall dataset.
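To make the mosaic stitching described above concrete, here is a minimal, illustrative sketch. The function name, the 608-pixel output size, the NumPy-only implementation and the omission of bounding-box label remapping are assumptions made for illustration; this is not the authors' actual augmentation pipeline.

```python
# Illustrative sketch of mosaic augmentation: four randomly flipped/cropped
# frames are stitched into one training image around a random centre point.
# Sizes and the NumPy-only implementation are assumptions; remapping of the
# bounding-box labels onto the mosaic is omitted here for brevity.
import numpy as np


def mosaic(images, out_size=608, rng=None):
    """Stitch four (H, W, 3) images into one out_size x out_size mosaic."""
    assert len(images) == 4, "mosaic augmentation combines exactly four images"
    rng = rng or np.random.default_rng()
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    # Random point where the four tiles meet.
    cx = int(rng.integers(out_size // 4, 3 * out_size // 4))
    cy = int(rng.integers(out_size // 4, 3 * out_size // 4))
    # Target regions: top-left, top-right, bottom-left, bottom-right.
    regions = [(0, cy, 0, cx), (0, cy, cx, out_size),
               (cy, out_size, 0, cx), (cy, out_size, cx, out_size)]
    for img, (y0, y1, x0, x1) in zip(images, regions):
        if rng.random() < 0.5:                      # random horizontal flip
            img = img[:, ::-1]
        h, w = y1 - y0, x1 - x0
        # Pad if the source frame is smaller than the target tile, then crop.
        pad_h, pad_w = max(h - img.shape[0], 0), max(w - img.shape[1], 0)
        img = np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)))
        top = int(rng.integers(0, img.shape[0] - h + 1))
        left = int(rng.integers(0, img.shape[1] - w + 1))
        canvas[y0:y1, x0:x1] = img[top:top + h, left:left + w]
    return canvas


# Example: four dummy surveillance frames stitched into one 608 x 608 image.
frames = [np.full((480, 640, 3), v, dtype=np.uint8) for v in (50, 100, 150, 200)]
print(mosaic(frames).shape)  # (608, 608, 3)
```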
Occlusion: During the experiment, we adjusted the height of the camera so that the cow objects were less occluded in the video. To test the robustness of the detection accuracy against occlusion, we lowered the shooting height in some videos. When more than 50% of a cow's body was occluded, the detection accuracy decreased significantly. Following your kind suggestion, we will continue to study the occlusion situation to improve the detection accuracy and meet the needs of large-scale precision farming.
5) Performance
• The abstract states, that the algorithm exhibited a higher detection rate. Higher than what?
• Table 1
◦ In the text and the table it says "IOU" in places, where in my understanding "IOU threshold" is meant. In case I misunderstand, please make it more clear, what "average precision when the IOU is equal to 0.5" means. If it should indeed be "IOU threshold", please correct.
◦ What is the difference between AP and AP50? They both appear in the rows, where the IOU threshold is 0.5. This confusion might be connected to the previous point.
◦ What are the "400 images tested simultaneously"?
Responds: Thank you very much for your constructive comments and valuable suggestions. Following your kind suggestion, we revised the expression in the manuscript. This study combined YOLO v4 object detection and pose estimation to classify the daily behaviors of multiple cows, and the test results showed that the detection accuracy was higher than that reported in previous studies. Song et al. [6] proposed a skeleton extraction model for cows in walking states with a high accuracy of up to 93.40% when the OKS threshold was 0.75. Hahn-Klimroth et al. [22] presented a multistep convolutional neural network for detecting three typical poses of African ungulates, obtaining a high accuracy of 93%. Chen et al. [24] proposed an algorithm based on YOLACT with high detection speed and accuracy for real-time detection and tracking of multiple parts of pig bodies; the detection accuracy on their dataset reached up to 90%.
In Table 1, IoU means the Intersection over Union, and the IoU threshold is a judgment threshold: if the IoU between a predicted box and a ground-truth box is greater than or equal to the IoU threshold, the prediction is counted as a TP; otherwise, it is counted as an FP. A ground-truth object that is not matched by any predicted box (for example, when the IoU is 0) is counted as an FN. AP50 is the average precision when the IoU threshold is set to 0.5. AP is the mean of the average precision values obtained when the IoU threshold is varied from 0.5 to 0.95 in steps of 0.05. The 400 images refer to the image test set containing all four cases listed in the table.
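To make the IoU-threshold, AP50 and AP definitions above concrete, here is a minimal worked sketch with toy boxes and a simplified greedy matching rule. It is illustrative only and is not the authors' evaluation code.

```python
# Toy illustration of IoU, the IoU threshold, AP50 and threshold-averaged AP.
import numpy as np


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def average_precision(preds, gts, thr):
    """All-point AP for one class; preds are (score, box) pairs, gts are boxes."""
    preds = sorted(preds, key=lambda p: -p[0])          # highest confidence first
    matched, hits = set(), []
    for _, box in preds:
        ious = [iou(box, g) for g in gts]
        best = int(np.argmax(ious)) if ious else None
        # TP only if the best unmatched ground truth reaches the IoU threshold.
        ok = best is not None and best not in matched and ious[best] >= thr
        hits.append(ok)
        if ok:
            matched.add(best)
    tp = np.cumsum(hits)
    precision = tp / np.arange(1, len(hits) + 1)
    recall = tp / max(len(gts), 1)
    # Monotone precision envelope integrated over recall (all-point interpolation).
    precision = np.concatenate(([1.0], precision))
    recall = np.concatenate(([0.0], recall))
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    return float(np.sum(np.diff(recall) * precision[1:]))


# Two ground-truth cows and three detections (the third one is a false positive).
gts = [(0, 0, 100, 100), (200, 0, 300, 100)]
preds = [(0.9, (5, 5, 105, 105)), (0.8, (210, 0, 310, 100)), (0.3, (400, 0, 480, 80))]
ap50 = average_precision(preds, gts, thr=0.50)
ap = np.mean([average_precision(preds, gts, thr=t) for t in np.linspace(0.50, 0.95, 10)])
print(round(ap50, 2), round(ap, 2))  # 1.0 0.7 for this toy example
```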
6) Structure
• In the methods section the model setup is detailed. State more clearly what parts are done as in yolo v4 and what you have inserted yourself.
Responds: Thank you very much for your constructive comments and valuable suggestions. Considering the Reviewer's suggestion, we have added a detailed statement of our modifications to YOLO v4. First, the original loss function consists of three parts, namely the bounding-box regression loss, the confidence loss and the category loss. Because this paper studies only cow objects, the category loss does not need to be considered, so we reduced the overall loss function of the object detection network to a confidence loss and a regression loss. The confidence loss describes whether there is an object in the grid cell by calculating the binary cross-entropy, and the regression loss describes the difference in position and size between the annotated object and the predicted object (an illustrative sketch of this two-term loss is appended after the last response item below). Second, we adopted transfer learning: the network was initialized with weights pretrained on the PASCAL VOC 2012 dataset and then fine-tuned on the self-built dataset to achieve a better training effect. We have revised the presentation of this section in the manuscript.
7) Conclusion
• It is necessary to assess at least roughly, whether the performance of the system would be adequate for applications in the area such as estrus detection. To make a statement under some assumptions in the conclusion would already be good. This would put the numbers into a useful context.
Responds: Thank you very much for your valuable suggestions. When a cow is lame, in estrus or approaching calving, its pose changes frequently. The pose estimation and behavior classification method based on skeleton feature extraction in this study therefore offers a useful reference for animal behavior researchers, and it can provide further data support for lameness detection, estrus detection and the prediction of calving in large-scale precision farming. In future work, we will focus on improving real-time performance while maintaining high detection accuracy and reducing the number of parameters and calculations.
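As referenced in the response to point 6 above, the following minimal sketch illustrates a two-term detection loss: a binary cross-entropy confidence term plus a box regression term, with no category term. The tensor shapes, the loss weighting and the use of plain mean squared error for the box term are assumptions made for illustration; they are not the authors' exact YOLO v4 implementation.

```python
# Illustrative two-term detection loss: objectness (confidence) via binary
# cross-entropy plus box regression on the cells that contain an object.
# Shapes and the MSE box term are simplifying assumptions for this sketch.
import torch
import torch.nn.functional as F


def detection_loss(obj_logits, pred_boxes, target_obj, target_boxes, box_weight=5.0):
    """obj_logits / target_obj: (N, A) objectness per anchor or grid cell;
    pred_boxes / target_boxes: (N, A, 4) box parameters (x, y, w, h)."""
    # Confidence term: is there an object in this grid cell / anchor?
    conf_loss = F.binary_cross_entropy_with_logits(obj_logits, target_obj)
    # Regression term: position and size error, only where an object exists.
    obj_mask = target_obj.bool()
    if obj_mask.any():
        reg_loss = F.mse_loss(pred_boxes[obj_mask], target_boxes[obj_mask])
    else:
        reg_loss = pred_boxes.sum() * 0.0
    return conf_loss + box_weight * reg_loss


# Tiny example with random tensors standing in for network outputs and labels.
obj_logits, pred_boxes = torch.randn(2, 3), torch.randn(2, 3, 4)
target_obj = torch.tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])
target_boxes = torch.rand(2, 3, 4)
print(detection_loss(obj_logits, pred_boxes, target_obj, target_boxes))
```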
Submitted filename: Response to Reviewers.docx
12 Apr 2022
PONE-D-21-40261R1
Multicow Pose Estimation Based on Keypoint Extraction
PLOS ONE
Dear Dr. Zhang,
Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.
Please improve each figure caption and table title so that it is self-contained, with the take-home message of the figure or table described as part of the caption, so that the purpose of the figure or table can be easily delivered to the reader even without referring to the text. In your "response to comments" for your revised article, please detail the improvement for each figure caption and table title (before and after) and explain how these captions or titles have been improved. If you wish to retain the current caption or table title, please explain the reasons for consideration.
Please submit your revised manuscript by May 27 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.
Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.
A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.
We look forward to receiving your revised manuscript.
Kind regards,
Yan Chai Hum
Academic Editor
PLOS ONE
Journal Requirements: Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article's retracted status in the References list and also include a citation and full reference for the retraction notice.
Additional Editor Comments: Please revise the figure captions and table titles so that they are standalone or self-contained, such that the main message of the figure/table is narrated as part of the caption.
Reviewers' comments:
[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
6 May 2022
Dear Editor:
Thank you very much for your constructive comments and valuable suggestions. We have carefully considered the suggestions of the Editors and Reviewers and have tried our best to improve the manuscript according to your kind advice and detailed suggestions. We added the ethics statement at the beginning of the Materials and methods section of our manuscript file. The changes and corrections are marked in red in the manuscript. This document summarizes our revisions and responses in the following. Thank you again for all your help with our manuscript. We sincerely hope this manuscript will finally be acceptable for publication in PLOS ONE.
Comments from Editor
Responds to the editor's comments:
1. Please improve each of the figure's caption and table's title such that these captions or titles are self-contained in which the take-home message of the figure/table can be described as part of the caption so that the purpose of the figure or table can be easily delivered to the reader, even without referring to the texts. In your "response to comments" for your revised article, please detail the improvement for each of the figure's caption and table's title (before and after) and explain how these captions or titles have been improved. If you wish to retain the current caption or table's title, please explain the reasons for consideration.
Responds: Thank you very much for the kind suggestion. We have revised each figure caption and table title and will do our best to make our manuscript conform to PLOS ONE's style.
Fig. 1: Following your kind suggestion, we revised the caption from "Keypoint label template of cow body" to "Keypoints label template of the cow skeleton". In this study, we estimated the cow skeleton through 16 keypoints and the part affinity fields between these keypoints. Fig. 1 shows the labeled positions and order of the 16 skeleton keypoints, so the caption "Keypoints label template of the cow skeleton" accurately describes the meaning of the figure.
Fig. 2: The figure's caption before was "Overview of the pose analysis procedure." The revised captions are: (a) Overview of pose analysis procedure. (b) Close-up of the multiobject detector based on YOLOv4. Fig. 2(a) shows the overall process of the entire pose analysis, and Fig. 2(b) presents a detailed view of the object detector based on YOLO v4. We added detailed descriptions of the two panels so that the figure can be easily understood, even without referring to the text.
Fig. 3: Thank you very much for your kind suggestion. We are very sorry for the inaccurate figure description in the original manuscript. Following your kind suggestion, we revised the caption from "Close-up of keypoint extraction model" to "The structure of two-branch skeleton extraction network". The network structure in Fig. 3 consists of two branches: the upper branch is responsible for predicting the locations of keypoints, and the lower branch is responsible for predicting the part affinity fields between keypoints. As a result, the new description more accurately represents the meaning of the figure.
Fig. 4: We retain the description of Figure 4, "Fully connected neural network of pose classification." After extracting the skeleton information of the cow's body in the image, a fully connected neural network was designed to classify three typical behaviors.
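As a concrete illustration of the sentence above, here is a minimal sketch of such a fully connected pose classifier. The layer sizes, the normalization of keypoint coordinates to the bounding box, and the use of PyTorch are assumptions made for illustration, not the authors' exact architecture.

```python
# Minimal sketch of the idea behind Fig. 4: a small fully connected network
# that maps the 16 extracted skeleton keypoints (x, y pairs, i.e. 32 inputs)
# to the three pose classes (standing, walking, lying).
import torch
import torch.nn as nn

pose_classifier = nn.Sequential(
    nn.Linear(16 * 2, 64),   # 16 keypoints, (x, y) each, scaled to the bounding box
    nn.ReLU(),
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Linear(32, 3),        # logits for standing / walking / lying
)

# Example: one skeleton whose keypoint coordinates have been normalized to [0, 1]
# relative to the detected bounding box before classification.
keypoints = torch.rand(1, 32)
print(pose_classifier(keypoints).softmax(dim=-1))
```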
We think that the classification network structure can be described by "Fully connected neural network of pose classification".
Fig. 5: Thank you very much for your kind suggestion. The figure's caption before was "The OKS values of the keypoints in different scenarios". The revised figure's caption is "The confidence of various keypoints in different scenarios". The OKS value is the confidence parameter for keypoint detection; if only the OKS value is used to describe Fig. 5, it is not easy to understand without referring to the text of the paper. Following your kind suggestion, we revised the caption to "The confidence of various keypoints in different scenarios".
Fig. 6: We are very sorry for the lack of a detailed figure description in the original manuscript. The figure's caption before was "AP values of all keypoints". The revised figure's caption: Average precision of all keypoints. (a) All keypoints of a single cow. (b) All keypoints of double cows. We added detailed descriptions of the two panels so that the figure can be easily understood, even without referring to the text.
Fig. 7: Thank you very much for your kind suggestion. We are very sorry for the lack of a detailed figure description in the original manuscript. The figure's caption before: (a) Precision matrix. (b) Average accuracy and loss function curves. The revised figure's caption: Matrix of (a) precision and (b) recall ratio. (c) Average accuracy and loss function curves. The description of the recall ratio was missing from the original caption of Fig. 7(b); we have improved the detailed caption in the revised manuscript.
Fig. 8: We are very sorry for the lack of a detailed figure description in the original manuscript. The figure's caption before: The influence of varying light on pose estimation. The revised figure's caption: The detection result at different brightness values. (a) and (b) represent the influence of increasing the brightness by 25% and 50%. (c) and (d) represent the influence of decreasing the brightness by 25% and 50%. The lighting change for each panel was not explicitly stated in the previous caption. Following your kind suggestion, we have added more detailed descriptions for each panel so that the figure can be easily understood, even without referring to the text.
Fig. 9: The figure's caption before: Pose estimation results under different scenarios. The revised figure's caption: Effects of different numbers of cows on pose estimation. (a)-(c) show images containing a single cow, double cows and five cows during the day. (d), (e) and (f) show images containing a single cow, double cows and six cows at night. We are very sorry for the lack of a detailed figure description in the original manuscript. We have added detailed descriptions for each panel in the revised version.
Table 1: The table's title before: Detection results under different scenarios. The revised table's title: Evaluation indicators of object detection under different scenarios. Table 1 shows several parameters that measure the results of object detection, such as precision, recall, average precision (AP), AP50 and AP75. These parameters are the evaluation indicators of object detection. Therefore, we believe that "Evaluation indicators of object detection under different scenarios" more accurately describes Table 1.
Table 2: The table's title before: Detection results under different scenarios. The revised table's title: Confusion matrix of pose classification.
We are very sorry for the inaccurate table title in the original manuscript. In most multiclassification tasks, the evaluation metrics are computed based on the confusion matrix. In Table 2, the confusion matrix was also used as the evaluation metric for the three typical pose classifications.
2. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article's retracted status in the References list and also include a citation and full reference for the retraction notice.
Responds: Thank you very much for the kind reminder. We have rechecked the reference list to ensure that it is complete and correct. In the revised manuscript, the reference list has not been changed.
3. Please upload a Response to Reviewers letter which should include a point by point response to each of the points made by the Editor and/or Reviewers. (This should be uploaded as a 'Response to Reviewers' file type.) Please follow this link for more information: http://blogs.PLOS.org/everyone/2011/05/10/how-to-submit-your-revised-manuscript/
Responds: Thank you very much for the kind reminder. We uploaded a Response to Reviewers letter which included a point by point response to each of the points made by the Editor and Reviewers.
4. Thank you for including your ethics statement on the online submission form: "This study was carried out at Inner Mongolia Flag Animal Husbandry Co., Ltd. Inner Mongolia Agricultural University has conducted scientific research with Inner Mongolia Flag Animal Husbandry Co., Ltd. in Hohhot for more than five years. In the stage of experiment, the data was taken with the company's approval. The study does not require approval from the relevant authorities. The data was acquired by the monitoring camera, which was fixed on the fence of the breeding site at a height of 4 m. During the data collection process, neither the collection equipment nor the experimenter contacted the cows. It did not interfere with the normal activities of the cows, and did not cause the cows' stress response. Compared with traditional manual inspection and using wearable sensors, it can realize continuously monitoring the activity and health of dairy cows without contact. If the project could be implemented in the future, it can complete non-contact animal behavior monitoring and improve animal welfare." To help ensure that the wording of your manuscript is suitable for publication, would you please also add this statement at the beginning of the Methods section of your manuscript file.
Responds: Following your kind suggestion, we have added an ethics statement at the beginning of the Materials and methods section of the revised manuscript, marked in red font.
Submitted filename: Response to Reviewers.docx
18 May 2022
Multicow Pose Estimation Based on Keypoint Extraction
PONE-D-21-40261R2
Dear Dr. Zhang,
We're pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you'll receive an e-mail detailing the required amendments. When these have been addressed, you'll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance.
To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.
If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.
Kind regards,
Yan Chai Hum
Academic Editor
PLOS ONE
Additional Editor Comments (optional): all concerns have been addressed.
Reviewers' comments:
23 May 2022
PONE-D-21-40261R2
Multicow Pose Estimation Based on Keypoint Extraction
Dear Dr. Zhang:
I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department. If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access.
Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Yan Chai Hum
Academic Editor
PLOS ONE