Noemí López-González1, Santiago Andrés-Sánchez1,2, Blanca M Rojas-Andrés1, M Montserrat Martínez-Ortega1.
Abstract
This study exhaustively explores leaf features in search of diagnostic characters to aid classification (assigning cases to groups, i.e. populations to taxa) in a polyploid plant-species complex. A challenging case study was selected: Veronica subsection Pentasepalae, a taxonomically intricate group. The "divide and conquer" approach was implemented; that is, a difficult primary dataset was split into more manageable subsets. Three techniques were explored: two data-mining tools (artificial neural networks and decision trees) and one unsupervised discriminant analysis. However, only the decision trees and discriminant analysis were finally used to select diagnostic traits. A previously established classification hypothesis based on other data sources was used as a starting point. A guided discriminant analysis (i.e. involving manual character selection) was used to produce a grouping scheme fitting this hypothesis so that it could be taken as a reference. Sequential unsupervised multivariate analysis enabled the recognition of all species and infraspecific taxa; however, a suboptimal classification rate was achieved. Decision trees resulted in better classification rates than unsupervised multivariate analysis, but three complete taxa were misidentified (not present in terminal nodes). The variable selection led to a different grouping scheme in the case of decision trees. The resulting groups displayed low misclassification rates when analyzed using artificial neural networks. The decision trees as well as the discriminant analysis are recommended in the search for diagnostic characters. Because artificial neural networks are highly sensitive to the combination of input/output layers, they are proposed as evaluation tools for morphometric studies. The "divide and conquer" principle is a promising strategy that proved successful in the present case study.
Year: 2018 PMID: 29958275 PMCID: PMC6025878 DOI: 10.1371/journal.pone.0199818
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Plant material.
| Operational taxonomic unit (OTU) | Number of individuals | Number of populations |
|---|---|---|
| | 21 | 7 |
| | - | - |
| (1) | 15 | 5 |
| (2) | 36 | 12 |
| (3) | 55 | 19 |
| | 25 | 9 |
| | 24 | 8 |
| | 6 | 2 |
| | 33 | 11 |
| | 72 | 24 |
| | 43 | 15 |
| | 14 | 5 |
| | 40 | 14 |
| | 42 | 15 |
| | 41 | 15 |
| | - | - |
| (1) | 14 | 5 |
| (2) | 34 | 12 |
| (3) | 29 | 10 |
| | 9 | 3 |
| | 43 | 15 |
| | 9 | 3 |
| Total | 605 | 209 |
Summary of individuals and populations included in the morphometric study. The abbreviations of the 20 operational taxonomic units (OTUs) corresponding to the taxonomic starting hypothesis are indicated in brackets.
* The species marked with an asterisk comprise several subspecies; those belonging to V. austriaca have been highlighted in blue, while those of V. tenuifolia appear in red.
Fig 1. Starting taxonomic hypothesis.
Simplified neighbour-joining tree of the taxa examined for V. subsect. Pentasepalae; modified from Padilla et al. 2017. a) V. satureiifolia, Borau, Spain. Photo: N. Padilla-García; b) V. senneni, Borau, Spain. Photo: N. Padilla-García; c) V. teucrium, Novi Sad, Serbia. Photo: S. Andrés-Sánchez; d) V. orbiculata, Makarska, Croatia. Photo: S. Andrés-Sánchez; e) V. aragonensis, Mount Baziero, Spain. Photo: N. Padilla-García; f) V. rosea, Djebel Lakra, Morocco. Photo: S. Andrés-Sánchez; g) V. tenuifolia ssp. fontqueri, Sierra de las Nieves, Spain. Photo: J. Peñas de Giles; h) V. tenuifolia ssp. javalambrensis, Valdeajos, Spain. Photo: N. Padilla-García; i) V. tenuifolia ssp. tenuifolia, Bordón, Spain. Photo: M. M. Martínez-Ortega; j) V. orsiniana, Iglesuela del Cid, Spain. Photo: M. M. Martínez-Ortega; k) V. kindlii, Pljevlja, Montenegro. Photo: S. Andrés-Sánchez; l) V. teucrioides, Mount Olympus, Greece. Photo: B. M. Rojas-Andrés; m) V. linearis, Kozjak Lake, FYROM. Photo: N. López-González; n) V. rhodopea, Belmeken, Bulgaria. Photo: B. M. Rojas-Andrés; o) V. crinita, Popovitsa, Bulgaria. Photo: M. M. Martínez-Ortega; p) V. prostrata, Pirot, Serbia. Photo: S. Andrés-Sánchez; q) V. turrilliana, Veleka river, Bulgaria. Photo: B. M. Rojas-Andrés; r) V. austriaca ssp. austriaca, Cerna Mountains, Romania. Photo: A. Badarau; s) V. austriaca ssp. dentata, Botanical Garden (Univerzity Karlovy, Prague), Czech Republic. Photo: M. Kesl; t) V. austriaca ssp. jacquinii, Josipdol, Croatia. Photo: S. Andrés-Sánchez.
Fig 2. Distribution map of the populations included in this study.
Characters measured and abbreviations.
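| Abbreviation | Leaf | Morphological character |
|---|---|---|
| | Medium leaf | Length of trichomes |
| | Medium leaf | Density of indumentum |
| | Medium leaf | Width: maximum width |
| | Medium leaf | Width: middle part |
| | Medium leaf | Width: entire terminal part |
| | Medium leaf | Width: first tooth |
| | Medium leaf | Width: second tooth |
| | Medium leaf | Length: total |
| | Medium leaf | Length: first tooth/segment |
| | Medium leaf | Length: first division/segment (bipinnatisect leaf) |
| | Medium leaf | Length: second tooth/segment |
| | Medium leaf | Length: first tooth of the second segment (bipinnatisect leaf) |
| | Medium leaf | Petiole |
| | Medium leaf | Distance between the leaf base and the maximum width line |
| | Medium leaf | Distance between the leaf apex and the uppermost teeth |
| | Medium leaf | Number of teeth per hemilimb |
| | Leaf of the apical shoot | Width: maximum width |
| | Leaf of the apical shoot | Width: middle part |
| | Leaf of the apical shoot | Width: entire terminal part |
| | Leaf of the apical shoot | Width: first tooth |
| | Leaf of the apical shoot | Width: second tooth |
| | Leaf of the apical shoot | Length: total |
| | Leaf of the apical shoot | Length: first tooth/segment |
| | Leaf of the apical shoot | Length: first division/segment (bipinnatisect leaf) |
| | Leaf of the apical shoot | Length: second tooth/segment |
| | Leaf of the apical shoot | Length: first tooth of the second segment (bipinnatisect leaf) |
| | Leaf of the apical shoot | Petiole |
| | Leaf of the apical shoot | Distance between the leaf base and the maximum width line |
| | Leaf of the apical shoot | Distance between the leaf apex and the uppermost teeth |
| | Leaf of the apical shoot | Number of teeth per hemilimb |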
Fig 3. Workflow.
The workflow involves the following steps (separated in the image by dashed gray lines): creation of the references, data-analysis approaches, and evaluation of the results. Green ticks mark optimal outcomes and red crosses mark suboptimal ones. Processes related to the search for diagnostic characters are indicated in blue, while those corresponding to the groupings are indicated in light orange.
Principal component analysis.
| Axis | Eigenvalue | Percent | Cumulative |
|---|---|---|---|
| 1 | 696.91 | 53.57 | 53.57 |
| 2 | 221.66 | 17.04 | 70.61 |
| 3 | 102.31 | 7.86 | 78.47 |
| 4 | 69.06 | 5.31 | 83.78 |
| 5 | 52.20 | 4.01 | 87.79 |
| 6 | 31.60 | 2.43 | 90.22 |
| 7 | 22.05 | 1.70 | 91.92 |
| 8 | 21.46 | 1.65 | 93.57 |
| 9 | 16.39 | 1.26 | 94.83 |
| 10 | 15.23 | 1.17 | 96.00 |
Eigenvalues and percentages of the data variance accounted for by each axis.
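As a quick consistency check on the table above, the cumulative column can be recomputed from the per-axis percentages. The snippet below is only an illustrative sketch: the variable names are assumptions, and the "implied total variance" is a back-calculation from the first axis, not a figure reported in the source.

```python
from itertools import accumulate

# Values copied from the principal component analysis table (axes 1-10).
eigenvalues = [696.91, 221.66, 102.31, 69.06, 52.20,
               31.60, 22.05, 21.46, 16.39, 15.23]
percent = [53.57, 17.04, 7.86, 5.31, 4.01,
           2.43, 1.70, 1.65, 1.26, 1.17]

# Running sum of the per-axis percentages, rounded to the table's precision.
cumulative = [round(c, 2) for c in accumulate(percent)]

# Total variance implied by axis 1 (eigenvalue divided by its share).
# This is a derived estimate, not a value given in the source.
implied_total = eigenvalues[0] / (percent[0] / 100)

for axis, (pct, cum) in enumerate(zip(percent, cumulative), start=1):
    print(f"axis {axis:2d}: {pct:5.2f}%  cumulative {cum:5.2f}%")
```

Read this way, the first three axes together account for 78.47% of the variance, and the ten listed axes for 96.00%.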
Fig 4. Initial classification scheme through guided DAs.
Partition of the original dataset in accordance with the starting hypothesis. Box-plots for (A) STLM/STWM, (B) DI, (C) LT, and (D) LLM. See Table 1 for abbreviations. The circles indicate Group II, Group IV, Group VI, and Group VIII, respectively.
Fig 5. Pruned tree.
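To give a rough sense of how a decision tree selects a diagnostic character, the sketch below finds the single cut on one measured variable that misclassifies the fewest cases (a one-level "decision stump"). It is not the authors' procedure: the function name `best_split`, the measurements, and the group labels are all hypothetical, chosen only for illustration.

```python
from collections import Counter

def best_split(values, labels):
    """Return (threshold, errors) for the cut that misclassifies the fewest cases.

    Each side of the cut is assigned its majority label; errors are the
    cases that disagree with the majority label of their side.
    """
    pairs = sorted(zip(values, labels))
    best = (None, len(labels) + 1)
    for i in range(1, len(pairs)):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no cut is possible between identical values
        threshold = (low + high) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        errors = sum(len(side) - Counter(side).most_common(1)[0][1]
                     for side in (left, right))
        if errors < best[1]:
            best = (threshold, errors)
    return best

# Hypothetical leaf lengths (mm) for two made-up groups, not data from the study.
lengths = [12.0, 13.5, 14.1, 15.0, 21.3, 22.8, 24.0, 25.5]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

threshold, errors = best_split(lengths, groups)
print(threshold, errors)
```

A full tree repeats this search within each resulting subset, which is one way to read the "divide and conquer" idea; pruning then removes splits that add little.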
ANNs per species results.
| Species | MCR | NC |
|---|---|---|
| | 0.39 | 7 |
| | 1.00* | 5 |
| | 0.38 | 12 |
| | 0.17 | 19 |
| | 0.89* | 9 |
| | 0.56* | 8 |
| | 0.33 | 2 |
| | 0.49* | 11 |
| | 0.18 | 24 |
| | 0.10 | 15 |
| | 0.33 | 5 |
| | 0.56* | 14 |
| | 0.30 | 15 |
| | 0.55* | 15 |
| | 0.44* | 5 |
| | 0.05 | 12 |
| | 0.17 | 10 |
| | 1.00* | 3 |
| | 0.10 | 15 |
| | 1.00* | 3 |
(Hidden layer = 1, number of neurons = 16, output layers = 20). MCR = misclassification rate. NC = number of cases.
* Values over 0.4 indicated with an asterisk.
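The 0.4 cut-off mentioned in the note can be applied directly to the MCR column. The following is a minimal sketch with assumed variable names; rows are identified by their position in the table because the species names are not available here.

```python
# MCR and NC columns copied from the table, in row order.
mcr = [0.39, 1.00, 0.38, 0.17, 0.89, 0.56, 0.33, 0.49, 0.18, 0.10,
       0.33, 0.56, 0.30, 0.55, 0.44, 0.05, 0.17, 1.00, 0.10, 1.00]
nc = [7, 5, 12, 19, 9, 8, 2, 11, 24, 15,
      5, 14, 15, 15, 5, 12, 10, 3, 15, 3]

THRESHOLD = 0.4  # cut-off stated in the table note

# Collect (row number, MCR, NC) for every row strictly above the cut-off.
flagged = [(row, m, n)
           for row, (m, n) in enumerate(zip(mcr, nc), start=1)
           if m > THRESHOLD]

for row, m, n in flagged:
    print(f"row {row:2d}: MCR = {m:.2f} (NC = {n})")
```

Under this reading, 9 of the 20 rows exceed the cut-off, including three with an MCR of 1.00.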
MCR calculated through ANN.
| | Guided DAs final groups; hidden layer = 1 | | | | |
|---|---|---|---|---|---|
| | I | III | V | VII | VIII |
| | 0.14 | 0.053 | 0.024 | 0.038 | 0.089 |
| | 0.0086 | 0.0089 | 0.0085 | 0.0086 | 0.0086 |
| | 81.00 | 64.80 | 27.36 | 47.34 | 86.38 |
In this case the input layers are the variables manually selected with the help of guided DAs and the output layers the entities within the final groups (initial classification scheme).
MCR calculated through ANN.
| | DTs final groups; hidden layer = 1 | | | | | | |
|---|---|---|---|---|---|---|---|
| | A | A1 | A2 | B | C | C1 | C2 |
| | 0.153 | 0.177 | 0.05 | 0.109 | 0.057 | 0.037 | 0.041 |
| | 0.0089 | 0.009 | 0.0088 | 0.0089 | 0.0087 | 0.0089 | 0.0089 |
| | 321.04 | 161.49 | 106.33 | 101.53 | 107.64 | 53.02 | 50.62 |
The input layers are the variables selected in DT analysis and the output layers the entities within the final groups obtained through the pruned tree.