Masato Shirai, Atsuko Takano, Takahide Kurosawa, Masahito Inoue, Shuichiro Tagane, Tomoya Tanimoto, Tohru Koganeyama, Hirayuki Sato, Tomohiko Terasawa, Takehito Horie, Isao Mandai, Takashi Akihiro.
Abstract
Herbarium specimens are dried plants mounted onto paper. They are used by a limited number of researchers, such as plant taxonomists, as a source of information on morphology and distribution. Recently, digitised herbarium specimens have begun to be used in comprehensive research to address broader issues. However, some specimens have been misidentified, and using them risks drawing incorrect conclusions. In this study, we developed an image recognition system that identifies taxon names with high accuracy. The system achieved an accuracy of 96.4% using 500,554 specimen images of 2171 plant taxa (2064 species, 9 subspecies, 88 varieties, and 10 forms in 192 families) that grow in Japan. We clarified where the artificial intelligence looks when making decisions, and which taxa are misidentified. As the system can be applied to digitised images worldwide, it is useful for selecting and correcting misidentified herbarium specimens.
Year: 2022 PMID: 35577859 PMCID: PMC9110755 DOI: 10.1038/s41598-022-11450-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Specimen images taken with a scanner (a1) and with a camera (a2). (b1) Multiple individuals in one specimen image and (b2) image divided into two. (c) Specimen images showing only branches and leaves that have fallen. (d) Specimen image with many holes in a leaf eaten by insects. (e1) Specimen image with a scale, stamp, and colour bar and (e2) the image after the scale, stamp, and colour bar were removed.
Figure 2. (a) List of herbarium-stored specimens used in this study. (b) Locations of specimen archives.
(a) List of experiments and results. (b) List of methods and results. These experiments were performed with the specimens used for the third experiment (excluding broken and misidentified specimens).
| Experiment No. | Experiment name | Top-1 accuracy (%) | Top-5 accuracy (%) | Macro recall | Macro precision | Macro F-score | Weighted recall | Weighted precision | Weighted F-score | Number of herbarium images | Number of training images | Number of test images | Number of plant taxa | Number of families |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | All images | 92 | 98 | 0.844 | 0.824 | 0.825 | 0.927 | 0.923 | 0.923 | 546,184 | 408,701 | 155,034 | 3114 | 219 |
| 2 | w/o species with fewer than 50 images | 94 | 99 | 0.898 | 0.892 | 0.892 | 0.942 | 0.93 | 0.939 | 534,778 | 385,536 | 149,242 | 2191 | 192 |
| 3 | w/o broken or misidentified specimens | 96 | 99 | 0.929 | 0.921 | 0.923 | 0.964 | 0.962 | 0.962 | 500,554 | 363,071 | 137,483 | 2171 | 192 |
| 4 | w/o colour bar, stamp, scale | 96 | 99 | 0.921 | 0.912 | 0.913 | 0.958 | 0.957 | 0.956 | 500,554 | 363,071 | 137,483 | 2171 | 192 |
| 5 | Only Pteridophytes | 98 | 100 | 0.946 | 0.947 | 0.945 | 0.985 | 0.984 | 0.984 | 204,174 | 126,218 | 77,956 | 357 | 32 |
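The table reports both macro and weighted averages of the per-taxon scores. The following is a minimal sketch of how the two averages differ, shown for recall only; the function names and the toy labels are illustrative assumptions, not the authors' code. Macro averaging gives every taxon equal weight, whereas weighted averaging weights each taxon by its number of test samples, so rare taxa that are poorly recognised pull the macro value down more strongly.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Recall per class: correct predictions / true samples of that class."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / n for c, n in support.items()}, support


def macro_and_weighted(scores, support):
    """Macro = plain mean over classes; weighted = mean weighted by class size."""
    total = sum(support.values())
    macro = sum(scores.values()) / len(scores)
    weighted = sum(scores[c] * support[c] for c in scores) / total
    return macro, weighted


# Toy, made-up labels: a common taxon "A" and a rare taxon "B".
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 8 + ["A", "B"]

recall, support = per_class_recall(y_true, y_pred)
macro, weighted = macro_and_weighted(recall, support)
print(f"macro recall = {macro:.2f}, weighted recall = {weighted:.2f}")
```

In single-label classification the weighted recall equals the overall fraction of correctly classified test images, which is consistent with the weighted recall of 0.964 in the third experiment and the 96.4% accuracy quoted in the abstract.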
List of 17 taxa that are frequently misidentified in the records of identification history of the specimens in the Herbarium of University Archives and Collections, Fukushima University (FKSE).
| No. | Frequently misidentified taxa | No. of samples in test | Species misidentified in FKSE | No. in Top-1 | No. in Top-2 | Percentage in Top-1 (%) | Percentage in Top-2 (%) |
|---|---|---|---|---|---|---|---|
| 1 | | 51 | | 1 | 15 | 6 | 37 |
| | | | | 0 | 1 | | |
| | | | | 2 | 3 | | |
| 2 | | 27 | | 0 | 0 | 15 | 15 |
| | | | | 1 | 1 | | |
| | | | | 2 | 3 | | |
| | | | | 0 | 0 | | |
| | | | | 1 | 0 | | |
| 3 | | 47 | | 1 | 6 | 4 | 32 |
| | | | | 0 | 0 | | |
| | | | | 0 | 2 | | |
| | | | | 0 | 1 | | |
| | | | | 0 | 0 | | |
| | | | | 0 | 0 | | |
| | | | | 1 | 6 | | |
| 4 | | 19 | | 2 | 1 | 16 | 47 |
| | | | | 0 | 0 | | |
| | | | | 1 | 0 | | |
| | | | | 0 | 8 | | |
| 5 | | 178 | | 2 | 6 | 3 | 57 |
| | | | | 3 | 52 | | |
| | | | | 0 | 43 | | |
| 6 | | 47 | | 0 | 0 | 17 | 6 |
| | | | | 5 | 1 | | |
| | | | | 0 | 1 | | |
| | | | | 3 | 1 | | |
| | | | | 0 | 0 | | |
| 7 | | 13 | | 2 | 2 | 15 | 15 |
| | | | | 0 | 2 | | |
| 8 | | 55 | | 2 | 14 | 5 | 44 |
| | | | | 1 | 6 | | |
| | | | | 0 | 3 | | |
| | | | | 0 | 1 | | |
| | | | | 0 | 0 | | |
| | | | | 0 | 0 | | |
| | | | | 0 | 0 | | |
| 9 | | 20 | | 1 | 6 | 5 | 45 |
| | | | | 0 | 3 | | |
| 10 | | 113 | | 3 | 15 | 13 | 50 |
| | | | | 9 | 40 | | |
| | | | | 3 | 2 | | |
| 11 | | 51 | | 1 | 12 | 6 | 61 |
| | | | | 1 | 18 | | |
| | | | | 0 | 1 | | |
| 12 | | 99 | | 1 | 11 | 6 | 51 |
| | | | | 5 | 39 | | |
| 13 | | 152 | | 10 | 26 | 11 | 31 |
| | | | | 0 | 3 | | |
| | | | | 2 | 10 | | |
| | | | | 5 | 8 | | |
| 14 | | 43 | | 0 | 2 | 2 | 9 |
| | | | | 0 | 1 | | |
| | | | | 1 | 1 | | |
| 15 | | 139 | | 10 | 43 | 11 | 40 |
| | | | | 5 | 8 | | |
| | | | | 0 | 4 | | |
| | | | | 0 | 0 | | |
| 16 | | 32 | | 3 | 3 | 22 | 31 |
| | | | | 2 | 2 | | |
| | | | | 1 | 0 | | |
| | | | | 1 | 5 | | |
| | | | | 0 | 0 | | |
| 17 | | 32 | | 1 | 0 | 6 | 3 |
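The table does not state how the two percentage columns were derived. One reading that reproduces the values for the first taxon is that the Top-1 and Top-2 counts of all confused species are summed and divided by the number of test samples of that taxon. The sketch below illustrates this reading; the function name `pooled_percentage` and the rounding to whole percentages are assumptions.

```python
def pooled_percentage(counts, n_samples):
    """Summed counts as a percentage of the test samples, rounded to an integer."""
    return round(100 * sum(counts) / n_samples)


# Counts for the first taxon in the table: 51 test samples, three rows of counts.
top1_counts = [1, 0, 2]
top2_counts = [15, 1, 3]

top1_pct = pooled_percentage(top1_counts, 51)  # (1 + 0 + 2) / 51
top2_pct = pooled_percentage(top2_counts, 51)  # (15 + 1 + 3) / 51
print(top1_pct, top2_pct)
```

Under this assumption the first taxon gives 6% for Top-1 and 37% for Top-2, matching the tabulated values.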
Figure 3. (a) A cross-tabulation table was created for the recall and precision values of 17 taxa of Salix L. in the third experiment. (1) S. integra Thunb., (2) S. futura Seemen, (3) S. pierotii Miq., (4) S. udensis Trautv. et C.A.Mey., (5) S. miyabeana Seemen subsp. gymnolepis (H.Lév. et Vaniot) H.Ohashi et Yonek., (6) S. vulpina Andersson subsp. vulpina, (7) S. dolichostyla Seemen subsp. serissifolia (Kimura) H.Ohashi et H.Nakai, (8) S. vulpina Andersson subsp. alopochroa (Kimura) H.Ohashi et Yonek., (9) S. eriocataphylla Kimura, (10) S. japonica Thunb., (11) S. dolichostyla Seemen subsp. dolichostyla, (12) S. eriocarpa Franch. et Sav., (13) S. triandra L. subsp. nipponica (Franch. et Sav.) A.K.Skvortsov, (14) S. gracilistyla Miq., (15) S. caprea L., (16) S. chaenomeloides Kimura, (17) S. sieboldiana Blume var. sieboldiana, (18) others. (b) Results of Grad-CAM analysis of Salix integra and Salix futura. Red indicates the more important parts while blue represents less important parts.
Figure 4. (a) Astilbe thunbergii (Siebold & Zucc.) Miq. var. thunbergii, (b) Astilbe odontophylla Miq., (c) Machilus thunbergii Siebold & Zucc., and (d) Lithocarpus edulis (Makino) Nakai. Species (a) and (b) and species (c) and (d) are different plants with similar morphologies. These species are often misidentified by experts and were misidentified by AI in this study.
Figure 5. Results of Grad-CAM analysis of (a) Thelypteris acuminata (Houtt.) C.V. Morton and (b) Polystichum tripteron (Kunze) C. Presl. (c) In the Grad-CAM analysis, only part of the specimen was considered important (red). (c2) Images in which the focal part was cut out and (c3) images in which only the non-focal part was cut out were created and used for Grad-CAM analysis. (d1) Multiple individuals in one specimen image. The image was divided into two images (d2, 3) and used for Grad-CAM analysis. (e) The sample image was halved vertically and horizontally, and further divided into four both vertically and horizontally.
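The subdivision described for panel (e) can be read as cutting the image into a 2 × 2 grid and then a 4 × 4 grid. The following minimal sketch computes the crop boxes for such grids; the function name `grid_boxes` and the image size of 1000 × 1500 pixels are hypothetical, and the caption does not specify how the authors performed the split.

```python
def grid_boxes(width, height, rows, cols):
    """Return (left, upper, right, lower) boxes tiling the image in a rows x cols grid."""
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left = c * width // cols
            upper = r * height // rows
            right = (c + 1) * width // cols
            lower = (r + 1) * height // rows
            boxes.append((left, upper, right, lower))
    return boxes


# Hypothetical image size in pixels.
halves = grid_boxes(1000, 1500, rows=2, cols=2)    # halved in both directions
quarters = grid_boxes(1000, 1500, rows=4, cols=4)  # divided into four in both directions
print(len(halves), len(quarters))
```

Each box uses the (left, upper, right, lower) order expected by Pillow's `Image.crop`, so every tile could be cropped and passed to the classifier separately.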
Figure 6. A system that automatically identifies the taxon name from an image of a plant (http://tayousei.life.shimane-u.ac.jp/ai/index_all.php). The identification system for pteridophytes is located at http://tayousei.life.shimane-u.ac.jp/ai/index_Pteridophytes.php. Drag and drop an image of the plant whose taxon name you want to identify, or select files, and click the send button. The Top-1 to Top-5 candidate taxa are displayed together with their accuracy.
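A ranked list of Top-1 to Top-5 candidates of this kind can be produced from a classifier's raw output scores. The sketch below is illustrative only: it assumes the scores are converted to probabilities with a softmax, which the caption does not confirm, and the function name `top_k_candidates` and all score values are invented.

```python
import math


def top_k_candidates(scores, k=5):
    """Softmax the raw scores and return the k most probable (taxon, probability) pairs."""
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exp = {name: math.exp(s - peak) for name, s in scores.items()}
    total = sum(exp.values())
    ranked = sorted(exp.items(), key=lambda item: item[1], reverse=True)
    return [(name, value / total) for name, value in ranked[:k]]


# Invented raw scores for six of the Salix taxa listed in Figure 3.
scores = {
    "Salix integra": 4.0,
    "Salix futura": 2.5,
    "Salix pierotii": 1.0,
    "Salix udensis": 0.5,
    "Salix japonica": 0.2,
    "Salix caprea": -1.0,
}

candidates = top_k_candidates(scores)
for rank, (name, prob) in enumerate(candidates, start=1):
    print(f"Top-{rank}: {name} ({prob:.1%})")
```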