Brianna K Almeida, Manish Garg, Miroslav Kubat, Michelle E Afkhami.
Abstract
PREMISE: Advancements in machine learning and the rise of accessible "big data" provide an important opportunity to improve trait-based plant identification. Here, we applied decision-tree induction to a subset of data from the TRY plant trait database to (1) assess the potential of decision trees for plant identification and (2) determine informative traits for distinguishing taxa.
Keywords: TRY plant trait database; decision tree; information gain; machine learning; plant identification
Year: 2020 PMID: 32765978 PMCID: PMC7394705 DOI: 10.1002/aps3.11379
Source DB: PubMed Journal: Appl Plant Sci ISSN: 2168-0450 Impact factor: 1.936
Results of the 10‐fold cross‐validation. Each of the 10 trees produced during the 10‐fold cross‐validation analysis was constructed by first leaving out 10% of the data (one of the 10 folds; test data set) and inducing trees using the remaining 90% of the data (the remaining nine folds). The estimate of accuracy reflects how well the resulting tree classified each test data set, providing insight into how well the classifier model places unseen species into their genera.
| Tree fold | Accuracy (%) |
|---|---|
| 1 | 84.0 |
| 2 | 95.6 |
| 3 | 81.2 |
| 4 | 89.9 |
| 5 | 91.3 |
| 6 | 92.8 |
| 7 | 85.5 |
| 8 | 88.4 |
| 9 | 86.8 |
| 10 | 86.8 |
| Mean ± SE | 89.1 ± 1.4 |
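The procedure in the table caption can be sketched as follows. This is a minimal illustration, not the authors' code: `k_fold_splits` and `mean_and_se` are hypothetical helper names, the folds are taken as contiguous index blocks (in practice the data would normally be shuffled first), and SE is assumed to mean the sample standard deviation divided by the square root of the number of folds.

```python
import math

def k_fold_splits(n_items, k=10):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_items))
    # Fold sizes differ by at most one when n_items is not divisible by k.
    base, extra = divmod(n_items, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]                 # held-out ~10%
        train = indices[:start] + indices[start + size:]   # remaining ~90%
        yield train, test
        start += size

def mean_and_se(values):
    """Return the mean and the standard error of the mean (sample SD / sqrt(n))."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)

# Per-fold accuracies exactly as listed in the table above.
fold_accuracy = [84.0, 95.6, 81.2, 89.9, 91.3, 92.8, 85.5, 88.4, 86.8, 86.8]
mean, se = mean_and_se(fold_accuracy)
print(f"{mean:.1f} ± {se:.1f}")
```

Applied to the ten values as listed, this gives roughly 88.2 ± 1.4. The SE agrees with the summary row, while the mean comes out slightly below the reported 89.1, possibly because of rounding or transcription in the listed fold values.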
Description and information content of the 16 plant traits from the TRY plant trait database (TRYdb).
| Trait | Information gain | Description |
|---|---|---|
| Leaf shape | 2.3554 | Describes the leaf veins, lobes, or leaflets (always grass‐like, linear, always long‐leaf). |
| Fruit type | 2.3318 | Describes the form of the ripened ovary (nut, capsule, schizocarp). |
| Plant growth form | 2.0645 | The whole plant growth form with respect to woodiness (herb, forbs, graminoid). |
| Flower color | 1.8353 | The color of the flowers (green, brown, yellow). |
| Flower sexual system | 1.7827 | Referring to the presence of the stamen and carpel on an individual plant (hermaphroditic, monoecious, andromonoecious). |
| Leaf distribution along the axis (arrangement) | 1.5035 | The form in which leaves cluster on the stem (rosette, semi‐rosette, alternate). |
| Seed morphology | 1.2794 | The form of the fully mature fertilized ovule and its associated structures (elaiosome, open structures, flat appendages). |
| Apomixis | 0.9106 | A feature of the whole plant characterizing the apomixis with respect to its pollination needs (amphimictic, sexual, obligate apomictic). |
| Species reproduction types | 0.8188 | Spore, seed, or vegetative structures (by seed and vegetatively, mostly by seed, rarely by seed). |
| Leaf shape 5: leaf base | 0.6729 | Describes the curvature at the base of the leaf where attached to the petiole/stem (cuneate, cordate, rounded). |
| Leaf shape 6: leaf petiole type | 0.6571 | Describes the presence of the petiole and any appendages associated with it (petiolate, sessile, subsessile). |
| Dicliny (monoecious, dioecious, hermaphrodite) | 0.6374 | A feature of the whole plant defining the spatial separation of sexes on one or several flowers and/or individual plants (hermaphrodite, monoecious, dioecious). |
| Leaf shape 2: outline | 0.5969 | Describes the curvature of the leaf margins (toothed, lobed, serrulate). |
| Shoot growth form | 0.507 | A feature of the whole plant defining the growth form with respect to stem branching mode (stem ascending to prostrate, stem erect, stem prostrate). |
| Fertilization | 0.448 | Refers to the genotypic mixing required for fertilization to occur (apomictic, automatic self, obligatory cross). |
| Leaf shape 3: pointed/round | 0.3384 | Describes the curvature of the leaf apex (rounded, point, mucronate). |
The traits shaded in gray were included in the final decision tree by the machine learning algorithm.
TRYdb uses trait definitions from Garnier et al. (2017).
Three categories of each trait are shown in parentheses after the definition. In cases with more than three possible categories, please see Appendix S22 for additional categories. For the definitions of these categories, consult the TRY website (https://www.try-db.org/TryWeb/Data.php [accessed July 2019]).
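For readers unfamiliar with the "Information gain" column, the quantity can be sketched as the entropy of the genus labels minus the weighted entropy that remains after grouping species by a trait's categories. The code below is an illustration under assumptions, not the authors' implementation: base-2 logarithms are assumed, a missing value (for example `None`) would simply be treated as its own category, and the genus and trait values are hypothetical toy data.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(trait_values, genera):
    """Entropy of the genus labels minus the weighted entropy after splitting on the trait."""
    groups = defaultdict(list)
    for value, genus in zip(trait_values, genera):
        groups[value].append(genus)
    total = len(genera)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(genera) - remainder

# Hypothetical toy data: four species in two placeholder genera.
genera = ["GenusA", "GenusA", "GenusB", "GenusB"]
trait_same = ["x", "x", "x", "x"]    # identical everywhere: tells us nothing
trait_split = ["p", "p", "q", "q"]   # separates the two genera perfectly

print(information_gain(trait_same, genera))   # 0.0
print(information_gain(trait_split, genera))  # 1.0
```

With two equally frequent genera the gain cannot exceed 1 bit. Values above 2, as in the table, are possible only when more than four classes are being distinguished, because the gain is bounded by the entropy of the class labels.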
Figure 1. Genera of examined plants ordered by the proportion of taxonomic misassignment when using the decision tree for plant identification. The proportion of misassignment was calculated by dividing the number of species from a given genus that were not assigned to that genus by the total number of species examined for that genus.
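The misassignment proportion defined in this caption can be computed as in the sketch below. The function name, the placeholder genus labels, and the descending sort order are assumptions made for illustration; the caption says only that genera are ordered by this proportion.

```python
from collections import Counter

def misassignment_by_genus(true_genera, predicted_genera):
    """Return (genus, proportion misassigned) pairs, highest proportion first."""
    totals = Counter(true_genera)
    # Count species whose predicted genus differs from their true genus.
    missed = Counter(t for t, p in zip(true_genera, predicted_genera) if t != p)
    proportions = {genus: missed[genus] / totals[genus] for genus in totals}
    return sorted(proportions.items(), key=lambda item: item[1], reverse=True)

# Hypothetical example: six species in three placeholder genera.
true_genera = ["A", "A", "A", "B", "B", "C"]
predicted   = ["A", "B", "B", "B", "B", "A"]

for genus, proportion in misassignment_by_genus(true_genera, predicted):
    print(genus, round(proportion, 3))
```

Here genus C has its single species misassigned (proportion 1.0), genus A has two of three misassigned, and genus B has none.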
Figure 2. Relationship between the initial trait information gain and the proportion of missing information. The amount of missing information for a trait explains ~60% of the variation in the information gain of that trait. Traits with a high proportion of missing information correspond to a low information gain (R² = 0.602, df = 1,14, P = 0.0002).
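The relationship in this caption is a simple linear regression of information gain on the proportion of missing data, with one point per trait. The sketch below shows how the slope and R² of such a fit are obtained. It assumes ordinary least squares, uses a hypothetical function name, and runs on made-up numbers rather than the study's data. With 16 traits, the residual degrees of freedom are 16 - 2 = 14, consistent with the reported df = 1,14.

```python
def simple_linear_fit(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a single predictor, R^2 equals the squared Pearson correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical values, NOT taken from the study.
proportion_missing = [0.0, 1.0, 2.0, 3.0]
information_gain = [3.0, 1.0, 2.0, 0.0]

slope, intercept, r2 = simple_linear_fit(proportion_missing, information_gain)
print(slope, intercept, r2)
```

A negative slope together with a sizeable R² is the pattern the caption describes: the more data are missing for a trait, the lower its information gain tends to be.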