William N Weaver, Julienne Ng, Robert G Laport.
Abstract
PREMISE: Obtaining phenotypic data from herbarium specimens can provide important insights into plant evolution and ecology but requires significant manual effort and time. Here, we present LeafMachine, an application designed to autonomously measure leaves from digitized herbarium specimens or leaf images using an ensemble of machine learning algorithms.
Keywords: LeafMachine; computer vision; herbarium digitization; leaf morphology; machine learning
Year: 2020; PMID: 32626609; PMCID: PMC7328653; DOI: 10.1002/aps3.11367
Source DB: PubMed; Journal: Appl Plant Sci; ISSN: 2168-0450; Impact factor: 1.936
Figure 1. Workflow of LeafMachine to process a herbarium specimen image. (A) LeafMachine accepts images as a batch input. (B) With each image, LeafMachine uses a modified DeepLabV3+ convolutional neural network (CNN) to segment the image into five segmentation classes, including a ‘leaf’ class for identified leaves. The CNN was trained with 280 ground‐truth images and validated with 71 ground‐truth images. (C) The CNN outputs a binary mask for each segmentation class (‘leaf’ class shown). (D) With the ‘leaf’ class binary mask, LeafMachine then crops the image into different leaf candidate masks (LCMs), each only containing one binary component, and uses an AdaBoost support vector machine (SVM) to further classify LCMs as a single, measurable leaf (check mark), partial leaf (not shown), leaf clump (cross), or not a leaf (rejected; not shown). The SVM was trained using 33,830 leaf candidate masks that were extracted from the CNN training set and manually sorted into ‘leaf,’ ‘partial leaf,’ ‘leaf clump,’ or ‘reject’ categories. (E) LeafMachine then overlays the LCMs onto the original image as output for user verification, as well as (F) a CSV file containing leaf area and perimeter measurements, and processing information.
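Step (D) of the workflow, splitting the ‘leaf’ binary mask into candidate masks that each contain one connected component, can be illustrated with a small sketch. The caption does not say how LeafMachine extracts components or measures them, so everything below is an assumption for illustration: a plain flood fill with 4-connectivity, the hypothetical helper names `split_components` and `area_and_boundary`, and a boundary-pixel count used as a crude stand-in for perimeter. Areas are in pixels; converting to mm² would need an image scale that is not given here.

```python
from collections import deque

# 4-connected neighbourhood (an assumption; the source does not state the connectivity).
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def split_components(mask):
    """Split a binary mask (list of rows of 0/1) into connected components.

    Returns a list of sets of (row, col) coordinates, in row-major order of
    each component's first pixel.
    """
    rows = len(mask)
    cols = len(mask[0]) if mask else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = set()
            while queue:
                y, x = queue.popleft()
                pixels.add((y, x))
                for dy, dx in NEIGHBOURS:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def area_and_boundary(pixels):
    """Return (pixel area, number of pixels with a neighbour outside the component)."""
    boundary = sum(
        1 for (y, x) in pixels
        if any((y + dy, x + dx) not in pixels for dy, dx in NEIGHBOURS)
    )
    return len(pixels), boundary


# Toy 'leaf' mask with two separate blobs (made-up data, not from the paper).
toy_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
measurements = [area_and_boundary(p) for p in split_components(toy_mask)]
```

In a real pipeline each component would then be passed to a classifier (the SVM in the caption) before any measurement is reported; that step is omitted here.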
Evaluation of each step LeafMachine takes to process a herbarium specimen image (see also Fig. 1). (A) The accuracy of LeafMachine’s convolutional neural network (CNN) to segment specimen images into five classes (including a ‘leaf’ class) was calculated as the intersection over union (the proportion of pixels correctly predicted, weighted by the proportion of ground‐truth pixels in each class) using a set of 74 ground‐truthed images. (B) The accuracy of LeafMachine’s support vector machine (SVM) algorithm to identify single, measurable leaves from specimens was evaluated using 1000 randomly sampled high‐ and low‐resolution images from 21,316 processed specimens from SWMT and COLO representing 20 families. (C) LeafMachine’s leaf measurement accuracy was evaluated by processing 12 custom‐created herbarium specimen images (Appendix S5) and comparing LeafMachine leaf measurements to manual measurements in ImageJ. Comparing the binary masks from LeafMachine‐ and ImageJ‐processed images, 38 and 42 high‐ and low‐resolution leaves, respectively, were identified to be comparable and were used to assess differences in leaf measurements.
| Steps taken by LeafMachine to process a specimen image | Evaluation |
|---|---|
| Identifying different specimen components (see also Fig. | All segmentation classes: 88.8%; Leaf segmentation class: 55.2% |
| Identifying single leaves (see also Fig. | Identified at least one leaf: 82.0% high‐resolution, 60.8% low‐resolution; Identified all measurable leaves on a specimen: 42.4% high‐resolution, 39.1% low‐resolution; Misidentified leaf: 0.9% high‐resolution, 2.1% low‐resolution |
| Measuring leaves compared to ImageJ (see also Fig. | High‐resolution: 145.4 mm² (257.0 mm² SD); Low‐resolution: 149.3 mm² (250.9 mm² SD) |
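The segmentation scores in the evaluation table are reported as intersection over union (IoU), weighted by each class's share of ground‐truth pixels. A minimal sketch of that metric, using the standard textbook definition rather than the authors' own code, might look like this. The function names, the flat label lists, and the class name `"background"` are assumptions for illustration; only the ‘leaf’ class is named in the source.

```python
def per_class_iou(truth, pred, classes):
    """IoU per class for two equal-length sequences of pixel labels."""
    ious = {}
    for k in classes:
        inter = sum(1 for t, p in zip(truth, pred) if t == k and p == k)
        union = sum(1 for t, p in zip(truth, pred) if t == k or p == k)
        # A class absent from both truth and prediction has no defined IoU.
        ious[k] = inter / union if union else float("nan")
    return ious


def weighted_iou(truth, pred, classes):
    """Mean IoU weighted by each class's proportion of ground-truth pixels."""
    ious = per_class_iou(truth, pred, classes)
    n = len(truth)
    total = 0.0
    for k in classes:
        count = sum(1 for t in truth if t == k)
        if count:  # classes with no ground-truth pixels get zero weight
            total += ious[k] * count / n
    return total


# Toy example with two classes (made-up labels, not data from the paper).
truth = ["background"] * 4 + ["leaf"] * 4
pred = ["background", "background", "background", "leaf",
        "leaf", "leaf", "background", "background"]
classes = ["background", "leaf"]

ious = per_class_iou(truth, pred, classes)
overall = weighted_iou(truth, pred, classes)
```

Because of the weighting, a class that covers few pixels contributes little to the overall score, which is one way an all‐classes figure can sit well above the figure for a single class.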
The 20 angiosperm families used to test LeafMachine, the total number of digitized specimens for each family, and the proportion of specimens from which LeafMachine identified and measured at least one single leaf.
| Family | No. of digitized specimens | % images with at least one leaf measurement (high‐resolution) | % images with at least one leaf measurement (low‐resolution) |
|---|---|---|---|
| Aceraceae | 504 | 90.7 | 74.2 |
| Adoxaceae | 98 | 89.8 | 79.6 |
| Anacardiaceae | 456 | 93.0 | 75.9 |
| Betulaceae | 940 | 92.3 | 69.1 |
| Cannabaceae | 80 | 91.3 | 70.0 |
| Caprifoliaceae | 706 | 91.9 | 71.7 |
| Ericaceae | 508 | 88.4 | 69.3 |
| Fagaceae | 1507 | 81.8 | 61.6 |
| Lauraceae | 17 | 88.2 | 82.4 |
| Magnoliaceae | 25 | 80.0 | 100.0 |
| Malvaceae | 494 | 98.2 | 92.5 |
| Myrtaceae | 1 | 100.0 | 100.0 |
| Oleaceae | 488 | 96.9 | 84.4 |
| Platanaceae | 6 | 83.3 | 83.3 |
| Rhamnaceae | 724 | 96.8 | 83.1 |
| Salicaceae | 2534 | 98.4 | 80.7 |
| Sapindaceae | 84 | 72.6 | 91.7 |
| Solanaceae | 1148 | 79.0 | 33.1 |
| Ulmaceae | 29 | 86.2 | 96.6 |
| Vitaceae | 23 | 95.7 | 100.0 |
All digitized herbarium specimen images available for these families from the Rhodes College (SWMT) and the University of Colorado Boulder (COLO) herbaria were processed by LeafMachine.
Figure 2. High‐ (A, C) and low‐ (B, D) resolution herbarium specimen images (Rhodes College herbarium barcode numbers SWMT09145 [A, B] and SWMT01894 [C, D]) processed by LeafMachine. In both high‐resolution images, LeafMachine identified that the specimens contained a single, measurable leaf (outlined in green; true positive) and a leaf clump (outlined in orange; true negative), while only a leaf clump was identified in the low‐resolution images (false negative). This reflects our general finding that higher‐resolution images result in better leaf identification and measurement by LeafMachine.