Sofia Broomé1, Katrina Ask2, Maheen Rashid-Engström3,4, Pia Haubro Andersen5, Hedvig Kjellström1,6.
Abstract
Orthopedic disorders are common among horses and often lead to euthanasia that earlier detection might have prevented. These conditions frequently cause varying degrees of subtle, long-term pain. Training a visual pain recognition method on video data depicting such pain is challenging: the resulting pain behavior is likewise subtle, sparsely occurring, and variable, making it difficult even for an expert human labeller to provide accurate ground truth for the data. We show that a model trained solely on a dataset of horses with acute experimental pain (where labeling is less ambiguous) can aid recognition of the more subtle displays of orthopedic pain. Moreover, we present a human expert baseline for the problem, as well as an extensive empirical study of various domain transfer methods and of what the pain recognition method trained on clean experimental pain detects in the orthopedic dataset. Finally, this is accompanied by a discussion of the challenges posed by real-world animal behavior datasets and of how best practices can be established for similar fine-grained action recognition tasks. Our code is available at https://github.com/sofiabroome/painface-recognition.
Year: 2022 PMID: 35245288 PMCID: PMC8896717 DOI: 10.1371/journal.pone.0263854
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. We present a study of domain transfer in the context of different types of pain in horses. Horses with low-grade orthopedic pain show only sporadic visual signs of pain, and these signs may overlap with spontaneous expressions of the non-pain class; it is therefore difficult to train a system solely on this data.
Fig 2. Pain predictions on the 25 clips included in the baseline study (Table 2), by the human experts (left) and by the C-LSTM-2-PF † (right).
Overview of the datasets.
Frames are extracted at 2 fps, and clips consist of 10 frames. Duration shown in hh:mm:ss.
| Dataset | # horses | # videos | # clips | Pain | No pain | Total | Labeling |
|---|---|---|---|---|---|---|---|
| PF | 6 | 60 | 8784 | 03:41:25 | 06:03:44 | 09:45:09 | Video-level. Induced clean experimental acute pain, on/off (binary) |
| EOP(j) | 7 | 90 | 6710 | 03:37:36 | 05:03:25 | 08:41:01 | Video-level. Induced orthopedic pain, varying number of hours prior to the recording. Binarized human CPS pain scoring before/after recording (3 independent raters) |
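The sampling scheme above (frames extracted at 2 fps, clips of 10 frames, i.e. 5 seconds of footage per clip) can be sketched as a simple index computation. The helper below is illustrative and not taken from the paper's repository:

```python
def clip_frame_indices(n_frames, native_fps, sample_fps=2, clip_len=10):
    """Map a video's native frame indices to fixed-length clips.

    Frames are subsampled to `sample_fps` (2 fps in the paper) and
    grouped into non-overlapping clips of `clip_len` frames (10 here).
    """
    step = max(1, round(native_fps / sample_fps))   # native frames per kept frame
    sampled = list(range(0, n_frames, step))        # indices of the kept frames
    clips = [sampled[i:i + clip_len]
             for i in range(0, len(sampled), clip_len)]
    # Drop a trailing clip that is shorter than clip_len.
    return [c for c in clips if len(c) == clip_len]

# A 30 s video at 25 fps (750 frames) yields 6 full 10-frame clips.
clips = clip_frame_indices(750, native_fps=25)
```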
Fig 3. The figures can be viewed as animations in the Supporting information.
Here, only the middle frame of each sequence is shown. RGB, optical flow, and Grad-CAM [62] saliency maps of the C-LSTM-2-PF † predictions on clips 10 and 24 (Table 2). Clip 10 (left) is a correct prediction of pain. Clip 24 (right) is a failure case, showing an incorrect pain prediction, and we observe that the model partly focuses on the human bystander. The remaining 23 clips with saliency maps can be found in the Appendix of S1 File.
Overview of the predictions on 25 EOP(j) clips made by the human veterinarian experts and by one C-LSTM-2 instance, trained only on PF.
The labels for the C-LSTM-2 were thresholded above 0 (same threshold as for the experts). The behavior symbols in the Behavior column are explained in Table 3.
| Clip | Behaviors | CPS | Label | 27 experts: Avg. rating | 27 experts: # correct | C-LSTM-2-PF †: Pred. | C-LSTM-2-PF †: Conf. |
|---|---|---|---|---|---|---|---|
| 1 | e1 | 2 | 1 | 2.7 | 23 | 1 | 0.9999 |
| 2 | o1 l | 2 | 1 | 1.1 | 10 | 1 | 0.9241 |
| 3 | e2 t1 c1 n1 p | 4.33 | 1 | 3.4 | 22 | 1 | 0.9997 |
| 4 | e2 o1 l | 4.67 | 1 | 3.7 | 21 | 1 | 0.6036 |
| 5 | t1 c1 | 3.67 | 1 | 0.37 | 4 | 1 | 0.9853 |
| 6 | | 1.33 | 1 | 0.96 | 13 | 1 | 0.5200 |
| 7 | u | 4.33 | 1 | 1.3 | 12 | 1 | 0.9063 |
| 8 | c2 n2 m u p | 6.33 | 1 | 4.7 | 25 | 0 | 0.8623 |
| 9 | e1 t1 c1 n1 p | 3 | 1 | 4.6 | 25 | 0 | 0.5504 |
| 10 | e2 t1 c1 n2 l p | 2 | 1 | 4.4 | 25 | 1 | 0.9999 |
| 11 | e2 t1 c1 n2 l | 1 | 1 | 6.8 | 27 | 1 | 0.8231 |
| 12 | e1 t1 c1 n1 m | 4.33 | 1 | 2.5 | 22 | 0 | 0.8046 |
| 13 | e2 t1 c1 n2 | 1.67 | 1 | 4.4 | 26 | 1 | 0.9840 |
| 14 | e2 o1 t1 c1 n1 u | 0 | 0 | 5.5 | 0 | 0 | 0.9993 |
| 15 | e2 t1 | 0 | 0 | 4.1 | 1 | 0 | 0.5848 |
| 16 | e2 o1 t1 c1 n1 | 0 | 0 | 3.9 | 4 | 0 | 0.8439 |
| 17 | e1 o1 t1 n1 l p | 0 | 0 | 4.4 | 5 | 1 | 0.6456 |
| 18 | t1 m u p | 0 | 0 | 0.96 | 17 | 0 | 0.9991 |
| 19 | | 0 | 0 | 0.48 | 21 | 0 | 0.9568 |
| 20 | t1 l | 0 | 0 | 2.9 | 9 | 1 | 1.00 |
| 21 | u | 0 | 0 | 1.3 | 13 | 0 | 0.9999 |
| 22 | | 0 | 0 | 2.7 | 4 | 0 | 0.6128 |
| 23 | c1 n1 u | 0 | 0 | 1.6 | 8 | 0 | 0.9989 |
| 24 | e1 t1 n1 h p | 0 | 0 | 2.4 | 9 | 1 | 1.00 |
| 25 | e2 o1 t1 n1 | 0 | 0 | 5.9 | 0 | 0 | 0.6135 |
Explanation of the behavior symbols appearing in Table 2.
| Behavior | Symbol |
|---|---|
| Backwards ears, moderately present | e1 |
| Backwards ears, obviously present | e2 |
| Orbital tightening, moderately present | o1 |
| Orbital tightening, obviously present | o2 |
| Tension above the eye area, moderately present | t1 |
| Tension above the eye area, obviously present | t2 |
| Mouth strained and pronounced chin, moderately present | c1 |
| Mouth strained and pronounced chin, obviously present | c2 |
| Strained nostrils and flattening of the profile, moderately present | n1 |
| Strained nostrils and flattening of the profile, obviously present | n2 |
| Large movement | m |
| Mouth play | p |
| Lowered head | l |
| Clearly upright head | u |
| Human in clip | h |
Results (%, F1-score and accuracy) for intra-domain cross-validation for the respective datasets and models.
The results are averages over five repetitions of a full cross-validation; the ± values are the average of the per-subject-across-runs standard deviations.
| Dataset | # horse folds | F1-score | Accuracy |
|---|---|---|---|
| PF | 6 | 73.5 ±7.1 | 75.2 ±7.4 |
| EOP(j) | 7 | 49.5 ±3.6 | 51.2 ±2.8 |
| PF + EOP(j) | 13* | 60.2 ±2.6 | 61.8 ±3.2 |
| *PF subset | | 69.1 ±4.9 | 71.1 ±3.9 |
| *EOP(j) subset | | 53.4 ±3.0 | 53.9 ±2.5 |

| Dataset | # horse folds | F1-score | Accuracy |
|---|---|---|---|
| PF | 6 | 76.1 ±1.5 | 76.6 ±1.1 |
| EOP(j) | 7 | 52.2 ±2.3 | 52.6 ±2.2 |
| PF + EOP(j) | 13* | 59.5 ±4.3 | 62.2 ±2.7 |
| *PF subset | | 71.3 ±3.5 | 73.1 ±1.4 |
| *EOP(j) subset | | 49.4 ±5.3 | 52.9 ±3.7 |
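The per-horse folds above follow a leave-one-subject-out protocol: all clips from one horse form the test fold while the remaining horses form the training set, so no subject appears on both sides of a split. A minimal sketch (names are illustrative, not the paper's code):

```python
from collections import defaultdict

def leave_one_horse_out(clips):
    """Yield (held_out, train, test) splits from a list of
    (horse_id, clip) pairs, holding out one horse entirely per fold
    so that no subject appears in both train and test."""
    by_horse = defaultdict(list)
    for horse_id, clip in clips:
        by_horse[horse_id].append(clip)
    for held_out in sorted(by_horse):
        test = by_horse[held_out]
        train = [c for h, c in clips if h != held_out]
        yield held_out, train, test
```

With the 6 PF horses this yields 6 folds, and with the 7 EOP(j) horses 7 folds, matching the "# horse folds" column.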
F1-scores on EOP(j), when varying the source of domain transfer, for models trained according to Section 3.3.
FT means fine-tuned (three repetitions of full cross-validation runs). Column letters indicate different test subjects. † represents a specific model instance, reoccurring in Tables 2, 6 and 7.
| Model | A | B | H | I | J | K | N | Global |
|---|---|---|---|---|---|---|---|---|
| I3D: PF, EOP(j)-FT | 48.7±4.1 | 56.8±0.9 | 47.0±3.4 | 43.6±4.7 | 53.6±1.7 | 54.5±0.9 | 58.2±0.6 | 51.8±2.3 |
| I3D: PF, EOP(j) unseen | 54.96 | 45.05 | 53.42 | 56.52 | 55.69 | 41.66 | 46.59 | 52.70 |
| I3D: PF, EOP(j) unseen (3 repeated runs) | 49.9 ±2.8 | 48.0 ±2.1 | 47.2 ±2.6 | 52.2 ±1.8 | 59.0 ±1.0 | 40.7 ±0.8 | 46.1 ±2.2 | 52.6 ±0.4 |
| C-LSTM-2: PF, EOP(j)-FT | 50.4±2.7 | 55.1±0.8 | 44.9±1.4 | 45.7±0.7 | 69.3±1.7 | 51.2 ±0.7 | 60.8±0.8 | 54.0±1.3 |
| C-LSTM-2: PF, EOP(j) unseen † | 61.55 | 56.34 | 55.55 | 51.51 | 64.76 | 45.11 | 57.84 | 58.17 |
| C-LSTM-2: PF, EOP(j) unseen (3 repeated runs) | 59.4 ±4.6 | 56.7 ±2.7 | 60.5 ±5.7 | 52.1 ±1.8 | 53.2 ±14.0 | 49.6 ±3.9 | 54.2 ±4.2 | 56.3 ±2.8 |
Video-level results for EOP(j), obtained by applying a multiple-instance learning (MIL) filter to the clip-level predictions during inference.
The column letters designate different test subjects. The model was never trained on EOP(j) and is the same model instance as in Tables 2, 5 and 7.
| Model instance | MIL-filter | A | B | H | I | J | K | N |
|---|---|---|---|---|---|---|---|---|
| C-LSTM-2-PF † | - | 61.55 | 56.34 | 55.55 | 51.51 | 64.76 | 45.11 | 57.84 |
| C-LSTM-2-PF † | Top 5% | 88.31 | 51.13 | 59.06 | 59.06 | 65.37 | 31.58 | 36.36 |
| C-LSTM-2-PF † | Top 1% | 88.31 | 51.13 | 53.33 | 59.06 | 65.37 | 45.83 | 45.0 |
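The MIL filter aggregates clip-level pain probabilities into one video-level decision by keeping only the most confident clips: under a multiple-instance view, a pain video needs only *some* pain clips, while a no-pain video should contain none. The sketch below is one plausible top-k% variant; the paper's exact aggregation rule may differ:

```python
import math

def mil_video_prediction(clip_probs, top_frac=0.05, threshold=0.5):
    """Aggregate per-clip pain probabilities into a video-level label.

    Ranks clips by predicted pain probability, keeps the top `top_frac`
    fraction (at least one clip), and averages them: if even the most
    pain-like clips are not confident, the video is labeled no-pain.
    Illustrative MIL-style filter, not necessarily the paper's rule.
    """
    k = max(1, math.ceil(top_frac * len(clip_probs)))
    top = sorted(clip_probs, reverse=True)[:k]
    score = sum(top) / k
    return int(score > threshold), score
```

With `top_frac=0.05` this corresponds to the "Top 5%" row, and `top_frac=0.01` to "Top 1%".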
F1-scores (%) on the 25 clips of the expert baseline.
The C-LSTM-2-PF † instance was trained on PF but never on EOP(j). The asterisked row shows results on the entire EOP(j) dataset for comparison.
| Rater | No pain | Pain | Total |
|---|---|---|---|
| Human expert | 34.7 ±10.0 | 60.6 ±4.4 | 47.6 ±5.5 |
| C-LSTM-2-PF † | 75.0 | 76.9 | 76.0 |
| C-LSTM-2-PF †* | 54.47 | 61.86 | 58.17 |
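The per-class scores above are presumably the standard F1 computed once with "pain" and once with "no pain" as the positive class, with "Total" as their unweighted (macro) average. A sketch under that assumption:

```python
def f1_per_class(y_true, y_pred, positive):
    """F1-score treating `positive` as the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1-scores."""
    return sum(f1_per_class(y_true, y_pred, c) for c in classes) / len(classes)
```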
Accuracies (%) from the expert baseline, varying with the chosen pain threshold.
| Threshold | No pain | Pain | Total |
|---|---|---|---|
| 0 | 28.1 ±11.2 | 72.7 ±9.4 | 51.3 ±4.6 |
| 1 | 37.4 ±14.1 | 65.2 ±9.7 | 51.9 ±5.7 |
| 2 | 48.5 ±18.7 | 52.7 ±13.8 | 50.7 ±7.5 |
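The threshold sweep above can be reproduced mechanically: each expert rating is binarized at the chosen threshold (pain iff the rating is strictly greater, matching the "thresholded above 0" convention of Table 2) and compared with the binary label. A sketch with made-up ratings, not the study's data:

```python
def accuracy_at_threshold(ratings, labels, threshold):
    """Binarize continuous pain ratings (pain iff rating > threshold)
    and return the fraction that matches the binary labels."""
    preds = [int(r > threshold) for r in ratings]
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

# Illustrative ratings and labels only:
ratings = [2.7, 1.1, 0.4, 2.9]
labels = [1, 1, 0, 0]
accuracies = {thr: accuracy_at_threshold(ratings, labels, thr) for thr in (0, 1, 2)}
```

Raising the threshold trades no-pain accuracy against pain accuracy, as the table shows.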
Global average F1-scores for domain transfer experiments with I3D, using varying pre-training and fine-tuning schemes.
The model is trained on the PF dataset and tested on the EOP(j) dataset. Only the pre-trained model fine-tuned with a frozen backbone achieved results slightly above random performance on EOP(j).
| Epoch | Scratch | Pre-trained | Pre-trained freeze back-bone |
|---|---|---|---|
| 25 | 46.8 ±8.9 | 46.5 ±4.4 | 51.7 ±1.4 |
| 63 | 46.3 ±9.3 | 46.4 ±7.5 | 52.6 ±0.4 |
| 115 | 46.6 ±10.4 | 47.4 ±1.4 | 52.6 ±0.05 |
| 200 | 46.3 ±7.2 | 43.2 ±3.8 | 52.4 ±0.5 |
| # trainable parameters | 24,542,116 | 24,542,116 | 4,100 |
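The parameter counts in the last row show what freezing buys: with the backbone frozen, only a small classification head is updated during fine-tuning. A framework-agnostic sketch of the bookkeeping, with illustrative layer sizes rather than I3D's actual shapes:

```python
def trainable_params(layers, frozen):
    """Sum parameter counts over layers not in the frozen set."""
    return sum(n for name, n in layers.items() if name not in frozen)

# Illustrative sizes (not the real I3D architecture): a large backbone
# plus a tiny classification head.
layers = {"backbone": 24_538_016, "head_weights": 4_096, "head_bias": 4}

full = trainable_params(layers, frozen=set())          # everything trains
head_only = trainable_params(layers, frozen={"backbone"})  # frozen backbone
```

Freezing the backbone shrinks the optimization problem by four orders of magnitude, which helps explain why it transfers better on the small EOP(j) target domain.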