Miraemiliana Murat, Siow-Wee Chang, Arpah Abu, Hwa Jen Yap, Kien-Thai Yong.
Abstract
Plants play a crucial role in food, medicine, industry, and environmental protection. The ability to recognise plants is important in applications such as the conservation of endangered species and the rehabilitation of land after mining activities. However, identifying plant species is difficult because it requires specialised knowledge. An automated classification system for plant species is therefore valuable, since it can help specialists as well as the public to identify plant species easily. Shape descriptors were applied to the myDAUN dataset, which contains 45 tropical shrub species collected from the University of Malaya (UM), Malaysia. Based on a literature review, this is the first study to develop a tropical shrub species image dataset and to classify it using a hybrid of leaf shape descriptors and machine learning. Four types of shape descriptors were used: morphological shape descriptors (MSD), Histogram of Oriented Gradients (HOG), Hu invariant moments (Hu) and Zernike moments (ZM). Single descriptors as well as hybrid combinations of descriptors were tested and compared. The tropical shrub species were classified using six classifiers: artificial neural network (ANN), random forest (RF), support vector machine (SVM), k-nearest neighbour (k-NN), linear discriminant analysis (LDA) and directed acyclic graph multiclass least squares twin support vector machine (DAG MLSTSVM). In addition, three feature selection methods were tested on the myDAUN dataset: Relief, correlation-based feature selection (CFS) and Pearson's correlation coefficient (PCC). The well-known Flavia and Swedish Leaf datasets were used to validate the proposed methods.
The results showed that ANN with the hybrid of all descriptors outperformed the other classifiers, with an average classification accuracy of 98.23% for the myDAUN dataset, 95.25% for the Flavia dataset and 99.89% for the Swedish Leaf dataset. Among the feature selection methods, Relief achieved the highest classification accuracy, 98.13%, after 80 (about 60%) of the original features were removed, which reduced the descriptors from 133 to 53 in the myDAUN dataset and shortened the computational time. The hybrid of all four descriptors gave the best results compared with the other combinations. The combination of MSD and HOG alone was good enough for tropical shrub species classification. The Hu and ZM descriptors further improved the accuracy, owing to their invariance to translation, rotation and scale. ANN outperformed the other classifiers for tropical shrub species classification in this study. Feature selection methods can be used in the classification of tropical shrub species, as comparable results can be obtained with fewer descriptors and with reduced computational time and cost.
Keywords: Classification; Feature selection; Machine learning; Shape descriptor; Tropical shrubs
Year: 2017 PMID: 28924506 PMCID: PMC5600178 DOI: 10.7717/peerj.3792
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
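The abstract credits the Hu and Zernike moments with invariance to translation, rotation and scale. The sketch below illustrates that property for the first Hu invariant only, using the standard definition phi1 = eta20 + eta02 on a toy binary shape. The helper names (`raw_moment`, `hu_phi1`, `rectangle`) and the rectangle test shapes are illustrative assumptions, not the study's implementation.

```python
def raw_moment(pixels, p, q):
    """Raw moment M_pq of a binary shape given as a set of (x, y) foreground pixels."""
    return sum((x ** p) * (y ** q) for x, y in pixels)


def hu_phi1(pixels):
    """First Hu invariant, phi1 = eta20 + eta02 (normalised central moments)."""
    m00 = raw_moment(pixels, 0, 0)          # area in pixels
    cx = raw_moment(pixels, 1, 0) / m00     # centroid x
    cy = raw_moment(pixels, 0, 1) / m00     # centroid y
    mu20 = sum((x - cx) ** 2 for x, _ in pixels)
    mu02 = sum((y - cy) ** 2 for _, y in pixels)
    # eta_pq = mu_pq / m00 ** (1 + (p + q) / 2); for p + q = 2 the exponent is 2.
    return (mu20 + mu02) / m00 ** 2


def rectangle(width, height, x0=0, y0=0):
    """Hypothetical toy shape: a filled axis-aligned rectangle of pixels."""
    return {(x0 + x, y0 + y) for x in range(width) for y in range(height)}


shape = rectangle(6, 3)
shifted = rectangle(6, 3, x0=40, y0=25)   # same shape, translated
rotated = rectangle(3, 6)                 # same shape, rotated by 90 degrees
scaled = rectangle(60, 30)                # same proportions, 10x larger

for name, s in [("shape", shape), ("shifted", shifted),
                ("rotated", rotated), ("scaled", scaled)]:
    print(name, round(hu_phi1(s), 4))
```

On a pixel grid the translated and rotated copies give the same value, while the scaled copy agrees only approximately because of discretisation.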
Figure 1. Location of sampling area in the University of Malaya (UM), Kuala Lumpur, Malaysia.
List of tropical shrub species in the myDAUN dataset.
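| Location | Label | Scientific name | Common name |
|---|---|---|---|
| Faculty of Science | 1 | | Tea Leaves |
| Faculty of Science | 2 | | Copperleaf |
| Faculty of Science | 5 | | Yesterday, Today and Tomorrow |
| Faculty of Science | 6 | | Sabah Snake Grass |
| Faculty of Science | 7 | | Yellow Simpoh |
| Faculty of Science | 8 | | Japanese Bamboo |
| Faculty of Science | 9 | | Song of India |
| Faculty of Science | 12 | | Caricature |
| Faculty of Science | 15 | | Crepe Myrtle |
| Faculty of Science | 16 | | Lantana |
| Faculty of Science | 17 | | Henna |
| Faculty of Science | 19 | | Banana Shrub |
| Faculty of Science | 20 | | Sleepy Mallow |
| Faculty of Science | 22 | | Sesenduk |
| Faculty of Science | 27 | | Dinner-plate Aralia |
| Faculty of Science | 28 | | Star Gooseberry |
| Faculty of Science | 29 | | Bayam Karang |
| Faculty of Science | 30 | | Ceylon Jasmine |
| Faculty of Science | 31 | | Glory Bush |
| Faculty of Science | 32 | | China orange |
| Faculty of Science | 33 | | Peppermint |
| Faculty of Science | 34 | | King of bitters |
| Faculty of Science | 35 | | Downy rose myrtle |
| Faculty of Science | 36 | | Cat’s whiskers |
| Faculty of Science | 37 | | Lark daisy |
| Faculty of Science | 38 | | Laksa leaf |
| Faculty of Science | 40 | | Gendarusa |
| Faculty of Science | 41 | | Stone leaf |
| Faculty of Science | 42 | | Wild pepper |
| Faculty of Science | 44 | | Wild hops |
| Faculty of Science | 45 | | Ylang-ylang |
| Tunku Canselor Hall | 10 | | Golden Dew-Drop |
| Tunku Canselor Hall | 11 | | Chinese Croton |
| Tunku Canselor Hall | 14 | | Jungle Geranium |
| Tunku Canselor Hall | 23 | | Kemuning |
| Tunku Canselor Hall | 24 | | Red Flag Bush |
| Tunku Canselor Hall | 25 | | White Mussaenda |
| Tunku Canselor Hall | 26 | | Ceylon Myrtle |
| Tunku Canselor Hall | 43 | | Indian snakefoot |
| Varsity Lake | 3 | | Golden Trumpet |
| Varsity Lake | 4 | | Great Bougainvillea |
| Varsity Lake | 13 | | Chinese Hibiscus |
| Main Library | 18 | | Chinese Fringe-flower |
| Main Library | 21 | | Manihot |
| Main Library | 39 | | Crepe jasmine |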
Figure 2. Flowchart of the proposed methodology.
Figure 3. Experimental setup. (A) Leaf compression, (B) Background setup, (C) Overview of experimental setup.
Figure 4. Samples of the leaf images in the myDAUN dataset.
Figure 5. Image pre-processing. (A) Original image, (B) grayscale image, (C) detected edge, (D) binary image, (E) filled binary image, (F) ROI image.
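The pre-processing stages listed for Figure 5 can be approximated in a few lines. The sketch below covers only three of them (grayscale conversion, binarisation and cropping to the region of interest) and skips edge detection and hole filling. It assumes a dark leaf on a light background, standard luminance weights, and a hypothetical fixed threshold of 128; none of these settings are taken from the study.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples to luminance values (ITU-R BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def to_binary(gray_image, threshold=128):
    """Mark pixels darker than the (assumed) threshold as leaf (1), others as background (0)."""
    return [[1 if value < threshold else 0 for value in row] for row in gray_image]


def crop_roi(binary_image):
    """Crop to the bounding box of the foreground pixels; empty list if there are none."""
    rows = [i for i, row in enumerate(binary_image) if any(row)]
    if not rows:
        return []
    width = len(binary_image[0])
    cols = [j for j in range(width) if any(row[j] for row in binary_image)]
    return [row[cols[0]:cols[-1] + 1] for row in binary_image[rows[0]:rows[-1] + 1]]


# Toy 5 x 6 image: a 3 x 3 green "leaf" on a white background.
WHITE, GREEN = (255, 255, 255), (30, 120, 40)
image = [[WHITE] * 6 for _ in range(5)]
for i in range(1, 4):
    for j in range(2, 5):
        image[i][j] = GREEN

roi = crop_roi(to_binary(to_grayscale(image)))
print(roi)
```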
Basic geometrical and morphological descriptors.
| Diameter | Major axis length | Minor axis length | Area | Perimeter |
| Aspect ratio | Form factor | Rectangularity | Solidity | Eccentricity |
| Narrow factor | Convex area | Irrectangularity | Entirety | Equivalent diameter |
| Perimeter ratio of major axis length and minor axis length | Perimeter of convexity | Perimeter of area | Perimeter ratio of diameter | Perimeter ratio of major axis length |
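Several of the descriptors in this table are simple ratios of basic measurements. The sketch below computes five of them using commonly used definitions (for example, form factor as 4πA/P² and solidity as area over convex area). The exact formulas used in the study are not given in this excerpt, so these definitions and the function name `shape_descriptors` are assumptions.

```python
import math


def shape_descriptors(area, perimeter, major_axis, minor_axis, convex_area):
    """Common shape ratios from basic measurements (assumed definitions)."""
    return {
        "aspect_ratio": major_axis / minor_axis,
        "form_factor": 4 * math.pi * area / perimeter ** 2,   # 1.0 for a circle
        "solidity": area / convex_area,                        # 1.0 for a convex shape
        "equivalent_diameter": math.sqrt(4 * area / math.pi),  # circle of equal area
        "eccentricity": math.sqrt(1 - (minor_axis / major_axis) ** 2),
    }


# Sanity check on a circle of radius 5, where every value is known in closed form.
r = 5.0
circle = shape_descriptors(
    area=math.pi * r ** 2,
    perimeter=2 * math.pi * r,
    major_axis=2 * r,
    minor_axis=2 * r,
    convex_area=math.pi * r ** 2,
)
for name, value in circle.items():
    print(f"{name}: {value:.3f}")
```

A circle is a convenient check because it should give an aspect ratio, form factor and solidity of 1, an eccentricity of 0, and an equivalent diameter equal to its true diameter.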
Figure 6. Neural network for tropical shrub species classification.
Classification accuracy for single descriptor.
| Descriptor | ANN | RF | SVM | k-NN | LDA | DAG MLSTSVM |
|---|---|---|---|---|---|---|
| MSD | 92.58 | 79.78 | 91.96 | 82.80 | 95.78 | |
| HOG | 91.58 | 84.53 | 90.40 | 79.76 | 95.40 | |
| Hu invariant moments | 82.27 | 83.36 | 32.74 | 82.99 | 37.65 | |
| Zernike moments | 87.85 | 59.34 | 87.75 | 56.40 | 90.54 | |
Notes.
average accuracy = average of 10 runs.
Classification accuracy of hybrid descriptors.
| Methods | Descriptor | ANN | RF | SVM | k-NN | LDA | DAG MLSTSVM |
|---|---|---|---|---|---|---|---|
| Hybrid of two descriptors | MSD + HOG | | 93.45 | 91.01 | 92.03 | 89.56 | 96.94 |
| Hybrid of two descriptors | MSD + Hu invariant moments | 96.67 | 92.84 | 81.99 | 92.35 | 85.14 | 95.96 |
| Hybrid of two descriptors | MSD + Zernike moments | 96.60 | 92.86 | 84.61 | 91.92 | 84.31 | 96.25 |
| Hybrid of two descriptors | HOG + Hu invariant moments | 96.24 | 92.39 | 87.72 | 91.07 | 83.47 | 95.88 |
| Hybrid of two descriptors | HOG + Zernike moments | 93.70 | 92.58 | 89.93 | 91.47 | 85.58 | 93.52 |
| Hybrid of two descriptors | Hu invariant moments + Zernike moments | 93.67 | 90.07 | 73.45 | 89.35 | 68.31 | 92.65 |
| Hybrid of three descriptors | MSD + HOG + Hu invariant moments | 97.59 | 93.62 | 91.78 | 92.42 | 90.09 | 96.99 |
| Hybrid of three descriptors | MSD + HOG + Zernike moments | | 93.52 | 92.06 | 92.17 | 89.72 | 97.05 |
| Hybrid of three descriptors | MSD + Hu invariant moments + Zernike moments | 96.64 | 93.24 | 87.93 | 92.10 | 86.12 | 96.32 |
| Hybrid of three descriptors | HOG + Hu invariant moments + Zernike moments | 97.06 | 93.37 | 91.29 | 91.56 | 87.23 | 96.70 |
| Hybrid of all descriptors | MSD + HOG + Hu invariant moments + Zernike moments | 98.23 | 93.83 | 92.74 | 92.60 | 90.86 | 97.72 |
Notes.
average of 10 runs.
Classification accuracy for the selected feature selection methods.
| Descriptors | Descriptors reduced (%) | Relief | | |
|---|---|---|---|---|
| Hybrid of all descriptors | 50 | 97.69 | 97.79 | 97.33 |
| Hybrid of all descriptors | 60 | 98.13 | 96.98 | 97.10 |
| Hybrid of all descriptors | 70 | 97.64 | 97.10 | 97.15 |
| Hybrid of all descriptors | None | | | |
Notes.
average of 10 runs.
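To make the Relief-based reduction more concrete, here is a minimal sketch of the basic Relief weighting idea: a feature gains weight when it differs on the nearest sample of another class (the "miss") and loses weight when it differs on the nearest sample of the same class (the "hit"). This is a simplified, deterministic variant that visits every sample once and takes the nearest miss from any other class. It is not the study's implementation, and the toy data and function names are assumptions.

```python
def relief_weights(samples, labels):
    """Basic Relief feature weights (simplified: every sample is visited once)."""
    n_features = len(samples[0])
    # Scale each feature to [0, 1] so that per-feature differences are comparable.
    lows = [min(s[j] for s in samples) for j in range(n_features)]
    spans = [(max(s[j] for s in samples) - lows[j]) or 1.0 for j in range(n_features)]
    scaled = [[(s[j] - lows[j]) / spans[j] for j in range(n_features)] for s in samples]

    def distance(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))

    weights = [0.0] * n_features
    for i, (x, label) in enumerate(zip(scaled, labels)):
        others = [k for k in range(len(scaled)) if k != i]
        # Assumes every class has at least two samples and there are at least two classes.
        hit = min((k for k in others if labels[k] == label),
                  key=lambda k: distance(x, scaled[k]))
        miss = min((k for k in others if labels[k] != label),
                   key=lambda k: distance(x, scaled[k]))
        for j in range(n_features):
            weights[j] += (abs(x[j] - scaled[miss][j])
                           - abs(x[j] - scaled[hit][j])) / len(scaled)
    return weights


def select_top(weights, keep):
    """Indices of the `keep` highest-weighted features, in ascending index order."""
    ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return sorted(ranked[:keep])


# Toy data: feature 0 separates the two classes, feature 1 is noise.
samples = [[1.0, 0.3], [1.2, 0.9], [0.9, 0.5], [3.0, 0.4], [3.1, 0.8], [2.9, 0.6]]
labels = [0, 0, 0, 1, 1, 1]
weights = relief_weights(samples, labels)
print(select_top(weights, keep=1))

# Reduction reported in the abstract: 80 of 133 features removed.
n_total, n_removed = 133, 80
n_keep = n_total - n_removed
print(n_keep, round(100 * n_removed / n_total))
```

With the feature counts from the abstract, keeping the top-ranked features after removing 80 of 133 leaves 53, which is a reduction of about 60%.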
Running time for feature extraction.
| Descriptors | Time to extract all features (min) | Time to extract Relief-selected features (min) |
|---|---|---|
| MSD | 84.01 | 61.04 |
| HOG | 334.39 | 334.39 |
| Hu invariant moments | 225.00 | 189.55 |
| Zernike moments | 1620.00 | 748.25 |
| Total | 2263.40 | 1033.23 |
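The totals and per-descriptor time savings in this table can be recomputed from its rows. The short check below uses only the values transcribed above. The first column sums to the listed 2263.40 min, whereas the Relief column sums to 1333.23 min rather than the listed 1033.23 min, so one of the transcribed Relief figures may be off. The table alone does not show which one.

```python
# (all features, Relief-selected features) extraction times in minutes, from the table.
times = {
    "MSD": (84.01, 61.04),
    "HOG": (334.39, 334.39),
    "Hu": (225.00, 189.55),
    "Zernike": (1620.00, 748.25),
}

total_all = round(sum(t_all for t_all, _ in times.values()), 2)
total_relief = round(sum(t_relief for _, t_relief in times.values()), 2)

# Relative time saved per descriptor, in percent.
saving = {name: round(100 * (t_all - t_relief) / t_all, 1)
          for name, (t_all, t_relief) in times.items()}

print(total_all, total_relief)
print(saving)
```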
Classification results of the Flavia dataset and the Swedish Leaf dataset.
| Methods | Descriptor | Flavia | Swedish Leaf |
|---|---|---|---|
| Single descriptor | MSD | 93.30 | 98.65 |
| Single descriptor | HOG | 93.49 | 99.15 |
| Single descriptor | Hu invariant moments | 80.46 | 95.20 |
| Single descriptor | Zernike moments | 83.22 | 95.95 |
| Hybrid of two descriptors | MSD + HOG | 95.04 | 99.54 |
| Hybrid of two descriptors | MSD + Hu invariant moments | 93.12 | 98.37 |
| Hybrid of two descriptors | MSD + Zernike moments | 93.41 | 99.16 |
| Hybrid of two descriptors | HOG + Hu invariant moments | 93.55 | 99.24 |
| Hybrid of two descriptors | HOG + Zernike moments | 93.87 | 99.54 |
| Hybrid of two descriptors | Hu invariant moments + Zernike moments | 88.47 | 98.01 |
| Hybrid of three descriptors | MSD + HOG + Hu invariant moments | 94.01 | 99.43 |
| Hybrid of three descriptors | MSD + HOG + Zernike moments | 95.14 | 99.64 |
| Hybrid of three descriptors | MSD + Hu invariant moments + Zernike moments | 93.67 | 99.16 |
| Hybrid of three descriptors | HOG + Hu invariant moments + Zernike moments | 94.08 | 99.52 |
| Hybrid of all descriptors | MSD + HOG + Hu invariant moments + Zernike moments | 95.25 | 99.89 |
Notes.
average of 10 runs.
Comparison studies.
| Reference | Descriptor | Leaf dataset | Accuracy |
|---|---|---|---|
| | HOG | Flavia | 84.70% |
| | Hu invariant moments | Flavia | 25.31% |
| | Zernike moments | Visleaf | 84.66% |
| | HOG | Visleaf | 92.67% |
| | MSD | Flavia | 90.31% |
| | MSD, Hu invariant moments | Own dataset | 91.00% |
| | MSD | Flavia | 91.41% |
| | MSD, Texture, Color | Flavia | 93.75% |
| | MSD, Zernike moments, Vein, Color, Texture | Flavia | 93.82% |
| | MSD, Vein | Flavia | 94.20% |
| Current study | MSD, HOG, Hu invariant moments, Zernike moments | myDAUN | 98.23% |
| Current study | MSD, HOG, Hu invariant moments, Zernike moments | Flavia | 95.25% |
| Current study | MSD, HOG, Hu invariant moments, Zernike moments | Swedish Leaf | 99.89% |