Laura Brenskelle, Rob P Guralnick, Michael Denslow, Brian J Stucky.
Abstract
PREMISE: Digitization and imaging of herbarium specimens provide essential historical phenotypic and phenological information about plants. However, the full use of these resources requires high-quality human annotations for downstream use. Here we provide guidance on the design and implementation of image annotation projects for botanical research.
Keywords: citizen science; herbarium specimens; image annotation; machine learning; phenology; specimen images
Year: 2020 PMID: 32626612 PMCID: PMC7328657 DOI: 10.1002/aps3.11370
Source DB: PubMed Journal: Appl Plant Sci ISSN: 2168-0450 Impact factor: 1.936
FIGURE 1. Examples of Prunus (left) and Acer (right) specimens included in our studies. The left image from the Ronald L. Jones Herbarium at Eastern Kentucky University (https://www.idigbio.org/portal/mediarecords/f45a2afe-38b5-414a-9ab3-85f30ae816d9) shows an example of a Prunus fasciculata (Torr.) A. Gray specimen. This image shows unfolded leaves present, fruits absent, and flowers present. The flowers on this species are inconspicuous and best seen by zooming into the full‐scale image, but we have indicated where flowers can be seen by the blue arrow on the image. The image on the right is an Acer negundo L. specimen from the Ted R. Bradley Herbarium at George Mason University (https://www.idigbio.org/portal/records/05320ea2-d731-4878-a308-f75bcf45e590). This image shows unfolded leaves present, fruits present (indicated by the blue arrow), and flowers absent.
Best‐fit models for the individual studies and the combined model comparing between the two studies.
| Study | Best model parameters | Nonsignificant parameters | ΔAIC | n | Marginal R² | Conditional R² |
|---|---|---|---|---|---|---|
| In‐person volunteer | Fixed effects: genus + trait + genus*trait; Random effects: volunteer | Fixed effects: prior experience, career position, total time | 2.3 | 10,795 | 0.016 | 0.062 |
| Notes from Nature | Fixed effects: genus + trait + image count; Random effects: volunteer | Fixed effects: time, genus*trait | 1 | 47,535 | 0.079 | 0.274 |
| Combined | Fixed effects: genus + trait + study; Random effect: volunteer | Fixed effects: genus*trait | ~0 | 58,330 | 0.026 | 0.211 |
| Majority rule | Fixed effects: method + genus + trait | — | — | 17,984 | 0.009 | — |
Note: n = number of trait annotations included in a data set.
ΔAIC shows the difference in Akaike information criterion (AIC) between the “best” model and the full model. Where no ΔAIC value is reported, the full model was the best.
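The note above describes an AIC difference between the best and full models. As a minimal sketch of that arithmetic, the snippet below uses the standard definition AIC = 2k − 2 ln(L). The sign convention (full minus best), the function names, and the example numbers are assumptions for illustration and are not taken from the study.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Standard Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def delta_aic(aic_full: float, aic_best: float) -> float:
    """AIC difference, assumed here to be full model minus best model."""
    return aic_full - aic_best


# Hypothetical log-likelihoods and parameter counts, not values from the study.
full = aic(log_likelihood=-5000.0, n_params=8)
best = aic(log_likelihood=-5000.5, n_params=6)
print(round(delta_aic(full, best), 1))
```

Under this convention a positive value means the reduced model has the lower AIC, and a value near zero means the two models are about equally supported.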
Best‐fit models testing whether accuracy in the two studies varied by species. For these models, we filtered out species with fewer than 30 images in the data set. In both the in‐person and Notes from Nature annotations, species is significant in the models.
| Study | Best model parameters | ΔAIC | n | Marginal R² | Conditional R² |
|---|---|---|---|---|---|
| In‐person volunteer | Fixed effects: genus + trait + prior experience; Random effects: volunteer, species | 4.1 | 9,044 | 0.004 | 0.104 |
| Notes from Nature | Fixed effects: genus + trait + image count; Random effects: volunteer, species | 2 | 44,754 | 0.076 | 0.301 |
Note: n = number of trait annotations included in a data set.
ΔAIC shows the difference in Akaike information criterion (AIC) between the “best” model and the full model. Where no ΔAIC value is reported, the full model was the best.
FIGURE 2. Predicted accuracy of in‐person volunteers (A) and Notes from Nature volunteers (B) given the plant genus and phenological trait. Error bars represent 95% confidence intervals.
FIGURE 3. Histograms of volunteer accuracy for the in‐person (A) and Notes from Nature (B) studies. Notes from Nature volunteers who annotated fewer than 30 images were excluded.
The distribution of the number of images scored by all 392 Notes from Nature volunteers, and the percent of images each group scored correctly.
| No. of images scored | No. of volunteers | Percent correct |
|---|---|---|
| 1 | 70 | 82% |
| 2–5 | 144 | 82% |
| 6–29 | 128 | 86% |
| 30–100 | 27 | 88% |
| 101+ | 23 | 96% |
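A grouping like the one in this table could be produced roughly as follows. This is only a sketch: the input format (one `(images scored, images correct)` pair per volunteer), the function names, and the choice to pool images within a group before computing percent correct are all assumptions, and the study may have computed the percentages differently.

```python
from collections import defaultdict

# Bin edges follow the table; labels use ASCII hyphens for simplicity.
BINS = [(1, 1, "1"), (2, 5, "2-5"), (6, 29, "6-29"), (30, 100, "30-100"), (101, None, "101+")]


def bin_label(n_images: int) -> str:
    """Return the table row a volunteer falls into, given images scored."""
    for low, high, label in BINS:
        if n_images >= low and (high is None or n_images <= high):
            return label
    raise ValueError("n_images must be at least 1")


def summarize(volunteers):
    """volunteers: iterable of (n_images_scored, n_correct), one pair per volunteer.

    Returns {bin label: (number of volunteers, pooled percent correct)}.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # volunteers, images, correct
    for n_images, n_correct in volunteers:
        entry = totals[bin_label(n_images)]
        entry[0] += 1
        entry[1] += n_images
        entry[2] += n_correct
    return {label: (v, 100.0 * c / n) for label, (v, n, c) in totals.items()}


# Hypothetical volunteers, not data from the study.
example = summarize([(1, 1), (3, 2), (4, 4), (150, 144)])
```

As a consistency check on the table itself, the volunteer counts (70 + 144 + 128 + 27 + 23) sum to the 392 volunteers stated in the caption.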
FIGURE 4. Predicted accuracy for the in‐person and Notes from Nature (NfN) studies. Error bars represent 95% confidence intervals.
FIGURE 5. Predicted accuracy for majority‐rule scorings (triplicate scoring; two out of three annotations) versus single classifications from the Notes from Nature study. Error bars represent 95% confidence intervals.
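The majority-rule scoring mentioned in the Figure 5 caption (two out of three annotations) can be sketched as below. The function name, the label strings, and the image identifiers are hypothetical; this illustrates the general technique and is not the authors' code.

```python
from collections import Counter


def majority_rule(annotations):
    """Return the label chosen by a strict majority of annotations, else None."""
    if not annotations:
        return None
    label, count = Counter(annotations).most_common(1)[0]
    # Require more than half of the annotations to agree (2 of 3 for triplicates).
    return label if count * 2 > len(annotations) else None


# Hypothetical triplicate scorings of one trait on two images.
scorings = {
    "img_001": ["present", "present", "absent"],
    "img_002": ["absent", "absent", "absent"],
}
consensus = {image: majority_rule(labels) for image, labels in scorings.items()}
print(consensus)
```

With binary present/absent labels and three annotations a strict majority always exists; the `None` branch only matters if an image has an even number of annotations or more than two possible labels.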