Andrew M. Durso, Gokula Krishnan Moorthy, Sharada P. Mohanty, Isabelle Bolon, Marcel Salathé, Rafael Ruiz de Castañeda.
Abstract
We trained a computer vision algorithm to identify 45 species of snakes from photos and compared its performance to that of humans. Both human and algorithm performance were substantially better than random guessing (null probability of guessing correctly given 45 classes = 2.2%). Some species (e.g., Boa constrictor) were routinely identified with ease by both the algorithm and humans, whereas other groups of species (e.g., uniform green snakes, blotched brown snakes) were routinely confused. A species complex delimited largely by molecular evidence (North American ratsnakes) was the most challenging for computer vision. Humans had an edge at identifying images of poor quality or with visual artifacts. With future improvement, computer vision could play a larger role in snakebite epidemiology, particularly when combined with information about geographic location and input from human experts.
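As a quick sanity check on the chance baseline quoted in the abstract, a minimal sketch; the accuracies are copied from the abstract and the results table below:

```python
# Chance baseline for a 45-class problem: uniform random guessing.
n_classes = 45
uniform_baseline = 1 / n_classes          # ~0.0222, i.e., 2.2%
print(f"Uniform-guess accuracy: {uniform_baseline:.1%}")

# Reported accuracies from the paper, for comparison.
reported = {"algorithm TD1": 0.87, "algorithm TD2": 0.84, "humans TD3b": 0.79}
for name, acc in reported.items():
    print(f"{name}: {acc:.0%} ({acc / uniform_baseline:.0f}x chance)")
```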
Keywords: biodiversity; crowd-sourcing; epidemiology; fine-grained image classification; reptiles
Year: 2021 PMID: 33959704 PMCID: PMC8093445 DOI: 10.3389/frai.2021.582110
Source DB: PubMed Journal: Front Artif Intell ISSN: 2624-8212
FIGURE 1. Examples of high intra-class (i.e., intra-species) and low inter-class (i.e., inter-species) variance among snake images. All photos by Andrew M. Durso (CC-BY).
Summary of three test datasets used to evaluate identification accuracy of the top algorithm. Performance of humans on TD3a yielded F1 = 0.76, accuracy = 68%, error = 32%, and on TD3b F1 = 0.79, accuracy = 79%, error = 21%. *The winning F1 score of 0.861 was computed on a different randomly generated subset, not reported in detail here.
| Test dataset | TD1 | TD2 | TD3a | TD3b |
|---|---|---|---|---|
| # of images | 16,483 | 42,688 | 248 | 248 |
| # of classes | 45 | 45 | 27 | 27 |
| # of classes used to compute F1 | 45 | 45 | 45 | 27 |
| Minimum # of images per class | 94 | 23 | 1 | 1 |
| Maximum # of images per class | 2,226 | 6,228 | 10 | 10 |
| F1 score | 0.83* | 0.83 | 0.53 | 0.73 |
| Log-loss | 0.49 | 0.66 | 1.19 | 1.03 |
| Accuracy | 87% | 84% | 73% | 72% |
| Error | 13% | 16% | 27% | 28% |
| Dataset description | Subset of the training dataset; used for pre-submission testing by the winning challenge participant | Photos submitted to iNaturalist after the beginning of our challenge | Photos collected from private individuals and social media and labeled by human experts (71–156 labels per image); all 45 classes taken into account (for fairer comparison with human labeling) | Same photos as TD3a; only 27 classes taken into account (for fairer comparison with TD1 and TD2) |
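The table's metrics can be reproduced from per-image predictions. A minimal sketch using scikit-learn, assuming macro averaging for F1 (the challenge's exact averaging choice is an assumption here) and toy stand-in data:

```python
import numpy as np
from sklearn.metrics import f1_score, log_loss, accuracy_score

# Toy stand-ins: true labels, hard predictions, and per-class probabilities
# for a 45-class problem (real inputs would come from the trained model).
rng = np.random.default_rng(0)
n_images, n_classes = 1000, 45
y_true = rng.integers(0, n_classes, size=n_images)
y_proba = rng.dirichlet(np.ones(n_classes), size=n_images)
y_pred = y_proba.argmax(axis=1)

print("Macro F1:", f1_score(y_true, y_pred, average="macro"))
print("Log-loss:", log_loss(y_true, y_proba, labels=np.arange(n_classes)))
print("Accuracy:", accuracy_score(y_true, y_pred))
```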
FIGURE 2. Diagram of the overall pipeline, including both object detection and classification.
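Figure 2's two-stage design (detect the snake, then classify the crop) can be sketched as follows; the specific backbones (a Faster R-CNN detector, a ResNet-50 classifier) are illustrative assumptions, not the exact models used in the challenge:

```python
import torch
import torchvision
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Stage 1: generic object detector proposes a bounding box for the snake.
detector = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
# Stage 2: fine-grained classifier with its head resized to 45 species.
classifier = torchvision.models.resnet50(weights=None)
classifier.fc = torch.nn.Linear(classifier.fc.in_features, 45)
classifier.eval()

def identify(image: torch.Tensor) -> torch.Tensor:
    """image: float tensor (3, H, W) in [0, 1]. Returns 45-class probabilities."""
    with torch.no_grad():
        detections = detector([image])[0]
        if len(detections["boxes"]) > 0:
            # Crop to the highest-scoring box before classification.
            x1, y1, x2, y2 = detections["boxes"][0].round().int().tolist()
            image = image[:, y1:y2, x1:x2]
        crop = torch.nn.functional.interpolate(
            image.unsqueeze(0), size=(224, 224), mode="bilinear")
        return classifier(crop).softmax(dim=1).squeeze(0)
```

Cropping first removes background clutter, so the classifier sees mostly snake pixels; when no box is found, the sketch falls back to classifying the whole image.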
FIGURE 3. Rotation augmentation of a sample image.
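Rotation augmentation like that in Figure 3 is a standard way to make the classifier pose-invariant, since snakes can face any direction in a photo. A minimal sketch with torchvision; the degree range and crop parameters are illustrative assumptions:

```python
from torchvision import transforms

# Random rotation up to +/-180 degrees, applied during training only;
# expand=False (the default) keeps the original image size.
train_transform = transforms.Compose([
    transforms.RandomRotation(degrees=180),
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
    transforms.ToTensor(),
])
```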
FIGURE 4. Maximum learning rate schedule and corresponding F1 score.
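Figure 4 suggests the training schedule was parameterized by its maximum learning rate; a one-cycle policy is one common way to implement this. A minimal sketch in PyTorch, where the optimizer, max_lr value, and step count are illustrative assumptions rather than the challenge settings:

```python
import torch

model = torch.nn.Linear(10, 45)          # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# One-cycle policy: ramp the LR up to max_lr, then anneal back down.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.1, total_steps=1000)

for step in range(1000):
    optimizer.step()                      # (loss.backward() omitted in sketch)
    scheduler.step()                      # advance the LR once per batch
```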
FIGURE 5. Confusion matrix for the top algorithm, using TD1 and an 80-20 split (subset of the training data). The final model was trained on a 95-5 split. For an interactive version, see https://chart-studio.plotly.com/~amdurso/1/#/.
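A row-normalized confusion matrix like Figure 5's can be produced from a stratified 80-20 split. A minimal sketch with scikit-learn, where the features and predictions are placeholders rather than the challenge model:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Placeholder features/labels; in practice these are images and species IDs.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 16))
y = rng.integers(0, 45, size=2000)

# 80-20 split, stratified so every species appears in both halves.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

y_pred = rng.integers(0, 45, size=len(y_te))   # stand-in for model predictions
# normalize="true" scales each row to sum to 1 (per-true-class proportions).
cm = confusion_matrix(y_te, y_pred, labels=np.arange(45), normalize="true")
```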
Confusion among putative species of the North American Ratsnake (Pantherophis obsoletus) complex (Burbrink, 2001) in TD1 and TD2 (the three putative species were combined into a single class in TD3). Species identity in the training and testing data was assigned exclusively from photos, taking geographic location into account but not scale counts or DNA.
| Correct ID of testing image | ID suggested by algorithm | Proportion confused in TD1 | Proportion confused in TD2 |
|---|---|---|---|
|  |  | 0.31 | 0.11 |
|  |  | 0.25 | 0.08 |
|  |  | 0.13 | 0.07 |
|  |  | 0.08 | 0.20 |
|  |  | 0.05 | 0.10 |
|  |  | 0.04 | 0.13 |
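Collapsing hard-to-separate putative species into one class, as was done for the ratsnake complex in TD3, is a simple label-mapping step before scoring. A minimal sketch; the exact label strings used in the dataset are an assumption:

```python
# Map the three putative ratsnake species (sensu Burbrink, 2001) onto a
# single complex-level class, as in TD3.
MERGE = {
    "Pantherophis obsoletus": "Pantherophis obsoletus complex",
    "Pantherophis alleghaniensis": "Pantherophis obsoletus complex",
    "Pantherophis spiloides": "Pantherophis obsoletus complex",
}

def merge_labels(labels):
    """Replace putative-species labels with their complex-level class."""
    return [MERGE.get(name, name) for name in labels]

# Accuracy at the complex level counts within-complex confusions as correct.
y_true = merge_labels(["Pantherophis obsoletus", "Boa constrictor"])
y_pred = merge_labels(["Pantherophis spiloides", "Boa constrictor"])
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)  # 1.0
```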
Confusion among species that are uncontroversially delimited, showing only pairs with confusion >10% in TD1, TD2, or TD3 (by the algorithm or by humans). MIVS = medically important venomous snakes; this column indicates whether neither or both members of the pair are MIVS, a "false negative" (a MIVS given a non-MIVS ID), or a "false positive" (a non-MIVS given a MIVS ID).
| Correct ID | Given ID | TD1 | TD2 | TD3 | Humans-all | Humans-45 | Humans-27 (with geo) | Geo-overlap | MIVS |
|---|---|---|---|---|---|---|---|---|---|
|  |  | 0.26 | 0.18 | 0.10 | 0.06 | 0.11 |  | Y | Neither |
|  |  | 0.23 | 0.08 | 0.33 | 0.24 | 0.13 | 0.24 | Y | Neither |
|  |  | 0.19 | NA | NA | NA |  |  | Y | Neither |
|  |  | 0.13 | 0.22 | 0.14 | NA |  |  | Y | Both |
|  |  | 0.11 | 0.07 | 0.25 | 0.04 | 0.02 | 0.02 | Y | Neither |
|  |  | 0.10 | 0.19 | 0.10 | 0.15 |  |  | Y | Neither |
|  |  | 0.10 | 0.18 | 0.22 | 0.12 | 0.06 |  | Y | Neither |
|  |  | 0.17 | 0.02 | 0.01 | 0.02 |  |  | Y | Neither |
|  |  | 0.15 | 0.10 | 0.05 | NA |  |  | Y | Neither |
|  |  | 0.14 | 0.00 | 0.00 | NA |  |  | Y | Neither |
|  |  | 0.12 | 0.06 | 0.03 | 0.12 |  |  | Y | Neither |
|  |  | 0.11 | 0.19 | 0.10 | 0.24 |  |  | Y | Neither |
|  |  | 0.25 | 0.01 | 0.00 | 0.00 |  |  | Y | Neither |
|  |  | 0.25 | 0.00 | 0.00 | 0.00 |  |  | Y | Neither |
|  |  | 0.22 | 0.01 | 0.00 | 0.01 |  |  | N | Neither |
|  |  | 0.17 | 0.00 | 0.00 | 0.00 |  |  | N | False negative |
|  |  | 0.14 | 0.04 | 0.02 | 0.11 |  |  | Y | Both |
|  |  | 0.14 | 0.02 | 0.01 | <0.01 |  |  | Y | False negative |
|  |  | 0.14 | 0.16 | 0.10 | 0.02 |  |  | Y | Neither |
|  |  | 0.14 | <0.01 | 0.00 | 0.03 |  |  | Y | False negative |
|  |  | 0.14 | 0.01 | 0.01 | 0.00 |  |  | Y | Neither |
|  |  | 0.12 | 0.03 | 0.02 | <0.01 |  |  | Y | Neither |
|  |  | 0.11 | 0.02 | 0.01 | 0.04 |  |  | Y | Neither |
|  |  | 0.10 | 0.01 | 0.00 | <0.01 |  |  | Y | Both |
|  |  | 0.10 | 0.00 | 0.00 | 0.00 |  |  | Y | False positive |
|  |  | 0.10 | 0.04 | 0.03 | 0.01 |  |  | N | Both |
|  |  | 0.10 | 0.00 | 0.00 | <0.01 |  |  | Y | False positive |
|  |  | 0.10 | 0.01 | 0.00 | <0.01 |  |  | Y (barely) | False negative |
|  |  | 0.10 | 0.00 | 0.00 | <0.01 |  |  | Y (barely) | False negative |
|  |  | 0.10 | 0.01 | 0.00 | 0.02 |  |  | Y | Neither |
|  |  | 0.10 | 0.00 | 0.00 | 0.01 |  |  | N | Neither |
|  |  | 0.23 |  |  |  |  |  | Y | Neither |
|  |  | 0.19 | 0.33 |  |  |  |  | Y | Neither |
|  |  | 0.17 |  |  |  |  |  | Y | Neither |
|  |  | 0.15 |  |  |  |  |  | Y | False positive |
|  |  | 0.13 |  |  |  |  |  | Y | Both |
|  |  | 0.13 |  |  |  |  |  | Y | Neither |
|  |  | 0.11 | 0.11 |  |  |  |  | Y | Neither |
|  |  | 0.19 |  |  |  |  |  | Y | Neither |
|  |  | 0.15 |  |  |  |  |  | Y | Neither |
|  |  | 0.14 |  |  |  |  |  | Y | Neither |
|  |  | 0.14 |  |  |  |  |  | Y | Neither |
|  |  | 0.12 |  |  |  |  |  | Y | Neither |
|  |  | 0.10 |  |  |  |  |  | Y | Neither |
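The Geo-overlap column and the abstract both point to geographic range as a disambiguating signal: two look-alike species can often be separated by where the photo was taken. A minimal sketch of reweighting the classifier's probabilities by a range mask; the mask construction, floor value, and function names are illustrative assumptions:

```python
import numpy as np

def geo_reweight(probs: np.ndarray, range_mask: np.ndarray) -> np.ndarray:
    """Down-weight species whose range excludes the photo location.

    probs: (45,) softmax output of the classifier.
    range_mask: (45,) 1.0 if the species occurs at the photo's location;
                a small floor is kept because range maps are imperfect.
    """
    reweighted = probs * np.maximum(range_mask, 0.01)
    return reweighted / reweighted.sum()

# Example: two look-alike species, only one of which occurs locally.
probs = np.full(45, 0.2 / 43)          # background mass spread over 43 species
probs[[3, 7]] = 0.4                     # classifier cannot separate 3 and 7
mask = np.ones(45); mask[7] = 0.0       # species 7 does not occur here
posterior = geo_reweight(probs, mask)   # probability mass shifts to species 3
```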
FIGURE 6. Average F1 across the three test datasets for all 45 species.
FIGURE 7. Frequency with which confused species pairs were incorrectly identified by the algorithm and by humans. The algorithm erred on the side of the species with more training images much more often than humans did.
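Figure 7's pattern, errors skewing toward the better-represented species, can be checked directly from a confusion table and per-class training counts. A minimal sketch with toy numbers; real counts would come from the training set:

```python
# For each confused pair (true, predicted, n_errors), check whether the
# wrong ID was the species with more training images.
train_counts = {"A": 2226, "B": 94, "C": 500}
confusions = [("B", "A", 31), ("A", "B", 4), ("C", "A", 12)]

toward_bigger = sum(n for t, p, n in confusions
                    if train_counts[p] > train_counts[t])
total = sum(n for _, _, n in confusions)
print(f"{toward_bigger / total:.0%} of errors favored the better-sampled species")
```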