Damon P Little, Melissa Tulig, Kiat Chuan Tan, Yulong Liu, Serge Belongie, Christine Kaeser-Chen, Fabián A Michelangeli, Kiran Panesar, R V Guha, Barbara A Ambrose.
Abstract
PREMISE: Plant biodiversity is threatened, yet many species remain undescribed. It is estimated that >50% of undescribed species have already been collected and are awaiting discovery in herbaria. Robust automatic species identification algorithms using machine learning could accelerate species discovery.
Keywords: FGVC; Kaggle; Melastomataceae; artificial intelligence; computer vision; herbarium specimen; machine learning
Year: 2020 PMID: 32626608 PMCID: PMC7328655 DOI: 10.1002/aps3.11365
Source DB: PubMed Journal: Appl Plant Sci ISSN: 2168-0450 Impact factor: 1.936
FIGURE 1. Workflow diagram of the Herbarium 2019 Challenge. The top half of the oval shows a typical taxonomy workflow while the bottom half illustrates the fine‐grained visual classification and machine learning workflow. For Herbarium 2019, the workflow spanned field collections all the way through to the output models. The dotted line connecting model development to species identification and curation remains an open research question. The Herbarium 2019 data set was composed of specimens from the family Melastomataceae, collected in the field, curated, and for which the species had been determined. The physical specimens were previously imaged and placed in the C.V. Starr Virtual Herbarium at New York Botanical Garden. For the Herbarium 2019 competition, the curated Melastomataceae data set contained species with at least 20 specimens each. The data set was resized, and the text and barcodes blurred. The data set was split into training, validation, and test data sets. Competitors developed their models with the test data set using convolutional neural network architectures, which included feature extraction and the modification of variable neuron weights to fit the specimen images. The output classification is represented as a histogram that ranks the probability of the species identification.
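The final step of the workflow, in which the output ranks the probability of each candidate species, can be illustrated with a minimal sketch. It assumes the model emits one raw score per species and that the scores are converted to probabilities with a softmax; the record does not say which normalization the competitors used. The function name `rank_species`, the placeholder species names, and the score values are hypothetical and chosen only for illustration.

```python
import math


def rank_species(scores, species_names, top_k=3):
    """Turn raw per-species scores into probabilities and rank them.

    Assumption: a softmax is used for normalization (not stated in the source).
    """
    if len(scores) != len(species_names):
        raise ValueError("scores and species_names must have the same length")
    # Subtract the maximum score before exponentiating for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort species from most to least probable and keep the top_k entries.
    ranked = sorted(zip(species_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Hypothetical scores for three placeholder species (illustrative values only).
names = ["species_a", "species_b", "species_c"]
ranking = rank_species([2.0, 0.5, -1.0], names)
for name, prob in ranking:
    print(f"{name}: {prob:.3f}")
```

The ranked list corresponds to the bars of the output histogram in the figure: the first entry is the model's preferred identification, and the remaining entries show how much probability is left for the alternatives.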
FIGURE 2. Histogram of the long‐tailed distribution of the Melastomataceae showing the number of specimens for each species. Only species with a minimum of 20 specimens were included in the data set, and 352 species were represented by 20–39 specimens. One species, Miconia prasina (Sw.) DC., was represented by 865 specimens.
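The inclusion rule and the binning behind such a histogram can be sketched as follows. The 20-specimen minimum comes from the caption; the bin width of 20 is an assumption chosen so that the first bin covers 20–39 specimens, and the function names and toy labels are hypothetical rather than taken from the challenge data.

```python
from collections import Counter

MIN_SPECIMENS = 20  # inclusion threshold stated in the figure caption


def specimens_per_species(labels, min_specimens=MIN_SPECIMENS):
    """Count specimens per species and keep only species meeting the threshold."""
    counts = Counter(labels)
    return {species: n for species, n in counts.items() if n >= min_specimens}


def bin_counts(counts, bin_width=20):
    """Map each bin's lower bound to the number of species falling in that bin.

    Assumption: bins are bin_width wide, so the bin starting at 20 covers 20-39.
    """
    bins = Counter((n // bin_width) * bin_width for n in counts.values())
    return dict(sorted(bins.items()))


# Toy labels, one per specimen (placeholder species, illustrative counts only).
labels = (
    ["species_a"] * 45
    + ["species_b"] * 25
    + ["species_c"] * 20
    + ["species_d"] * 5  # below the threshold, so it is excluded
)
kept = specimens_per_species(labels)
histogram = bin_counts(kept)
print(kept)
print(histogram)
```

In a long-tailed data set most species land in the lowest bin, which is why a per-species count like this is worth inspecting before choosing a sampling or loss-weighting strategy.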
The top four Herbarium 2019 Challenge competitors with the accuracy of their models and the model architectures they used in the competition.
| Name of team | Accuracy | Model architecture |
|---|---|---|
| MEGVII Research Nanjing (Beijing, China) | 89.8% | SeResNeXt‐101, ResNet‐50 |
| PEAK (Dalian University of Technology, Liaoning, China) | 89.1% | Not made publicly available |
| Miroslav Valan (Swedish Museum of Natural History, Stockholm, Sweden) | 89.0% | SENet‐154, ResNet‐50 |
| Hugo Touvron and Andrea Vedaldi (Facebook AI Research London, UK) | 88.9% | SENet‐154, ResNet‐50 |