Sohaib Younis, Marco Schmidt, Claus Weiland, Stefan Dressler, Bernhard Seeger, Thomas Hickler.
Abstract
As herbarium specimens are increasingly digitised and made accessible in online repositories, advanced computer vision techniques are being used to extract information from them. The presence of certain plant organs on herbarium sheets is useful information in various scientific contexts, and automatic recognition of these organs will help mobilise such information. In our study, we use deep learning with Faster R-CNN to detect plant organs on digitised herbarium specimens. For our experiment, we manually annotated hundreds of herbarium scans with thousands of bounding boxes for six types of plant organs and used them to train and evaluate the plant organ detection model. The model worked particularly well on leaves and stems; flowers were also present in large numbers on the sheets but were not recognised as well.
Keywords: convolutional neural networks; deep learning; digitisation; herbarium specimens; image annotation; object detection and localisation; plant organ detection
Year: 2020 PMID: 33343217 PMCID: PMC7746675 DOI: 10.3897/BDJ.8.e57090
Source DB: PubMed Journal: Biodivers Data J ISSN: 1314-2828
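The abstract describes manually drawn bounding boxes for six organ types, and the results below are computed with the COCO evaluation method. A minimal sketch of how one such box might be stored as a COCO-style annotation record, assuming boxes are drawn as corner coordinates (x1, y1, x2, y2). The record fields follow the public COCO annotation format; the category IDs and the helper `to_coco_annotation` are hypothetical and not taken from the paper, and using the box area as `area` is an assumption that suits box-only annotations.

```python
# Hypothetical category IDs for the six organ types named in the abstract;
# the paper's actual label mapping is not given in this record.
ORGAN_CATEGORIES = {"leaf": 1, "flower": 2, "fruit": 3, "seed": 4, "stem": 5, "root": 6}


def to_coco_annotation(ann_id, image_id, organ, x1, y1, x2, y2):
    """Convert a corner-format box (x1, y1, x2, y2) into a COCO-style record.

    COCO stores boxes as [x, y, width, height] with (x, y) the top-left corner.
    """
    if x2 <= x1 or y2 <= y1:
        raise ValueError("box must have positive width and height")
    width, height = x2 - x1, y2 - y1
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": ORGAN_CATEGORIES[organ],
        "bbox": [x1, y1, width, height],
        "area": width * height,  # assumption: box area, since no masks exist
        "iscrowd": 0,
    }


ann = to_coco_annotation(1, 42, "flower", 100, 200, 160, 290)
print(ann["bbox"], ann["area"])  # [100, 200, 60, 90] 5400
```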
Figure 1. An illustration of the Faster R-CNN architecture, with ResNet for image feature extraction, a Region Proposal Network (RPN) for generating object proposals and RoI Pooling for creating fixed-size feature maps for each proposal.
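RoI Pooling, as named in Figure 1, turns proposals of arbitrary size into feature maps of one fixed size. A plain-Python sketch of classic max-based RoI pooling on a single-channel feature map, for illustration only: the function name, the floor/ceil bin rounding and the toy 4 x 4 map are assumptions, and current detection libraries usually use RoIAlign with bilinear sampling instead.

```python
import math


def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one region of a single-channel feature map into a fixed grid.

    feature_map: list of rows (H x W) of numbers.
    roi: (x1, y1, x2, y2) in feature-map cells, end-exclusive.
    output_size: (rows, cols) of the pooled output.
    """
    x1, y1, x2, y2 = roi
    out_h, out_w = output_size
    roi_h, roi_w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_h):
        # floor for bin starts, ceil for bin ends: every RoI cell lands in a bin
        r0 = y1 + math.floor(i * roi_h / out_h)
        r1 = y1 + math.ceil((i + 1) * roi_h / out_h)
        row = []
        for j in range(out_w):
            c0 = x1 + math.floor(j * roi_w / out_w)
            c1 = x1 + math.ceil((j + 1) * roi_w / out_w)
            # maximum activation inside bin (i, j)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Toy 4 x 4 feature map with values 0..15 in row-major order.
fm = [[4 * r + c for c in range(4)] for r in range(4)]
print(roi_max_pool(fm, (0, 0, 4, 4)))  # [[5, 7], [13, 15]]
print(roi_max_pool(fm, (1, 1, 4, 4)))  # [[10, 11], [14, 15]]
```

Both regions, 4 x 4 and 3 x 3 cells, come out as 2 x 2 grids, which is what allows the subsequent fully connected layers to accept every proposal.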
Figure 2. Number of taxa of different ranks in the three datasets, with overlaps at family, genus and species level. P(Tr), P(Te): MNHN Paris Herbarium training and test datasets; FR: Herbarium Senckenbergianum dataset.
Figure 3. Column chart of the number of annotated bounding boxes for each organ. Red: test subset; blue: training subset.
Figure 4. Families of labelled specimens (ordered by number of specimens) with the number of labelled plant organs. The share of plant organs differs between families, which may be due to factors related to the plant itself and to collecting habits (season, selection of identifiable specimens).
Number of annotated bounding boxes for each plant organ in the training and test subsets.
| Category | Training subset | Test subset | Complete dataset |
|---|---|---|---|
| Leaf | 7886 | 2051 | 9937 |
| Flower | 3179 | 763 | 3942 |
| Fruit | 1047 | 296 | 1343 |
| Seed | 4 | 6 | 10 |
| Stem | 3323 | 961 | 4284 |
| Root | 78 | 60 | 138 |
| Total | 15517 | 4137 | 19654 |
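The totals in the table can be recomputed from the per-organ counts, together with each organ's share of all boxes and the fraction of its boxes held out for testing. A short sketch using only the counts above; the variable names are illustrative.

```python
# Bounding-box counts copied from the table above: (training, test).
COUNTS = {
    "Leaf": (7886, 2051),
    "Flower": (3179, 763),
    "Fruit": (1047, 296),
    "Seed": (4, 6),
    "Stem": (3323, 961),
    "Root": (78, 60),
}

train_total = sum(tr for tr, _ in COUNTS.values())
test_total = sum(te for _, te in COUNTS.values())
grand_total = train_total + test_total

for organ, (tr, te) in COUNTS.items():
    total = tr + te
    share = 100 * total / grand_total  # share of all annotated boxes
    test_frac = 100 * te / total       # fraction of this organ's boxes in the test subset
    print(f"{organ:<7} {total:>6} boxes  {share:5.1f}% of all  {test_frac:5.1f}% in test")

print(train_total, test_total, grand_total)  # 15517 4137 19654
```

Leaves make up about half of all boxes, while seeds (10 boxes) and roots (138) are rare and have a much larger fraction in the test subset (60 % and about 43 %) than the roughly 21 % overall.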
Figure 5. An illustration of the Feature Pyramid Network, where feature maps are indicated by blue outlines and thicker outlines denote semantically stronger features (Lin et al. 2017).
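Assuming the detector follows the original FPN design for Faster R-CNN (Lin et al. 2017), each proposal is pooled from a single pyramid level chosen by its size, k = floor(k0 + log2(sqrt(w * h) / 224)) with k0 = 4. A sketch of that rule; clamping to levels P2 to P5 is a common implementation choice, and the function name `fpn_level` is hypothetical.

```python
import math


def fpn_level(width, height, k0=4, canonical_size=224, k_min=2, k_max=5):
    """Pick the pyramid level P_k whose features an RoI is pooled from.

    Implements k = floor(k0 + log2(sqrt(w * h) / 224)) from Lin et al. (2017),
    clamped to the available levels P2..P5.
    """
    k = math.floor(k0 + math.log2(math.sqrt(width * height) / canonical_size))
    return max(k_min, min(k_max, k))


for side in (28, 56, 112, 224, 448, 1000):
    print(side, fpn_level(side, side))
# 28 -> 2 (clamped), 56 -> 2, 112 -> 3, 224 -> 4, 448 -> 5, 1000 -> 5 (clamped)
```

Under this rule, small boxes such as those around seeds are pooled from the finer, higher-resolution levels, and large boxes from the coarser, semantically stronger ones.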
Average precision (AP) of the predictions on the MNHN Paris Herbarium test subset, computed with the COCO evaluation method.
| AP50 (IoU 0.50) | AP75 (IoU 0.75) | AP (IoU 0.50:0.95) |
|---|---|---|
| 22.8 | 6.8 | 9.7 |
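The three COCO metrics differ in how tightly a predicted box must overlap a ground-truth box, measured as intersection over union (IoU): AP50 and AP75 each use a single IoU threshold of 0.50 or 0.75, while AP averages over the ten thresholds 0.50, 0.55, ..., 0.95. A minimal sketch with a made-up pair of boxes shows how one loosely localised prediction counts as correct at 0.50 but not at 0.75, which is one way a large gap between AP50 (22.8) and AP75 (6.8) can arise.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


# COCO "AP" averages over ten IoU thresholds 0.50, 0.55, ..., 0.95;
# AP50 and AP75 use a single threshold each.
COCO_IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

gt = (0, 0, 100, 100)
pred = (20, 0, 120, 100)  # same size, shifted 20 px to the right (made-up example)
overlap = iou(gt, pred)   # 8000 / (10000 + 10000 - 8000) = 0.667
matched = [t for t in COCO_IOU_THRESHOLDS if overlap >= t]
print(round(overlap, 3), matched)  # 0.667 [0.5, 0.55, 0.6, 0.65]
```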
Average precision (AP) for each organ type, with the total number of bounding boxes per category in the test subset.
| Category | Bounding Boxes | AP |
|---|---|---|
| Leaf | 2051 | 26.5 |
| Flower | 763 | 4.7 |
| Fruit | 296 | 7.8 |
| Seed | 6 | 0.0 |
| Stem | 961 | 9.9 |
| Root | 60 | 9.4 |
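COCO evaluation averages AP over categories without weighting them by their number of boxes. A quick check, assuming that convention, that the six per-organ values above reproduce the overall AP of 9.7 reported for the MNHN Paris Herbarium test subset:

```python
# Per-category AP on the MNHN Paris test subset, copied from the table above.
PARIS_AP = {"Leaf": 26.5, "Flower": 4.7, "Fruit": 7.8, "Seed": 0.0, "Stem": 9.9, "Root": 9.4}

# COCO-style AP is an unweighted mean over categories, so each organ counts
# equally regardless of how many boxes it has.
macro_ap = sum(PARIS_AP.values()) / len(PARIS_AP)  # 58.3 / 6
print(f"{macro_ap:.1f}")  # 9.7
```

The Seed class, with only six test boxes and an AP of 0.0, therefore carries the same weight (one sixth) in the overall figure as the 2051 leaf boxes.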
Results of the model evaluation on the annotated Herbarium Senckenbergianum dataset.
| AP50 (IoU 0.50) | AP75 (IoU 0.75) | AP (IoU 0.50:0.95) |
|---|---|---|
| 32.1 | 16.1 | 16.8 |
Average precision (AP) for each organ type, with the total number of bounding boxes per category in the annotated Herbarium Senckenbergianum dataset.
| Category | Bounding Boxes | AP |
|---|---|---|
| Leaf | 3362 | 37.9 |
| Flower | 1921 | 18.3 |
| Fruit | 183 | 7.9 |
| Seed | 47 | 0.0 |
| Stem | 1063 | 25.1 |
| Root | 117 | 11.8 |
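Averaging the six per-organ values without weighting gives 101.0 / 6 ≈ 16.8, matching the overall AP for the Herbarium Senckenbergianum dataset. For contrast, the sketch below also computes a box-count-weighted mean; this weighted variant is not a COCO metric and is shown only to illustrate how strongly the rarer classes (Seed, Fruit, Root) pull the headline AP down.

```python
# Per-category (boxes, AP) on the Herbarium Senckenbergianum dataset, from the table above.
FR = {
    "Leaf": (3362, 37.9),
    "Flower": (1921, 18.3),
    "Fruit": (183, 7.9),
    "Seed": (47, 0.0),
    "Stem": (1063, 25.1),
    "Root": (117, 11.8),
}

macro_ap = sum(ap for _, ap in FR.values()) / len(FR)  # COCO-style unweighted mean
total_boxes = sum(n for n, _ in FR.values())
# Illustration only: weight each organ's AP by its number of boxes.
weighted_ap = sum(n * ap for n, ap in FR.values()) / total_boxes
print(f"macro {macro_ap:.1f}  box-weighted {weighted_ap:.1f}  boxes {total_boxes}")
# macro 16.8  box-weighted 28.7  boxes 6693
```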