Halil Bisgin, Tanmay Bera, Hongjian Ding, Howard G Semey, Leihong Wu, Zhichao Liu, Amy E Barnes, Darryl A Langley, Monica Pava-Ripoll, Himansu J Vyas, Weida Tong, Joshua Xu.
Abstract
Insect pests, such as pantry beetles, are often associated with food contamination and public health risks. Machine learning has the potential to detect their presence in food products more accurately and efficiently than the manual inspection currently used. In our previous research, we demonstrated the feasibility of Artificial Neural Network (ANN) based pattern recognition for species identification in the context of food safety. In this study, we present a Support Vector Machine (SVM) model that raised the average accuracy to 85%. By contrast, the ANN method yielded ~80% accuracy after extensive parameter optimization. Both methods showed excellent genus-level identification, but SVM showed slightly better accuracy for most species. Highly accurate species-level identification remains a challenge, especially when distinguishing between species of the same genus, and may require improvements in both imaging and machine learning techniques. In summary, our work presents a new SVM-based technique and compares it with the ANN model in this context. We believe these insights will pave the way for applying machine learning to species identification and food safety.
Year: 2018 PMID: 29695741 PMCID: PMC5917025 DOI: 10.1038/s41598-018-24926-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The basic experimental procedure for our work. (a) Steps (such as beetle collection and imaging) to obtain input images; (b) the input images subjected to the feature selection method; (c and d) schematic showing training and cross-validation using the ANN and SVM models.
ANN architectures used in our work, with varying numbers of layers and nodes.
| One Layer | Two Layers | Three Layers |
|---|---|---|
| 50 | (50, 50) | (50, 150, 50) |
| 100 | (50, 100) | (100, 150, 100) |
| 150 | (150, 200) | (150, 150, 150) |
Figure 2. Overlap of the features chosen by the various feature selection methods. Only 7 features were selected by all three methods.
Effect of feature selection on the performance of the multi-class SVM model.
| Feature Set | C | γ | Overall Accuracy | Std. Dev |
|---|---|---|---|---|
| All (625) | 1e+02 | 1e-04 | 0.84 | 0.048 |
| MID (50) | 1e+03 | 1e-05 | 0.81 | 0.050 |
| MIQ (50) | 1e+01 | 1e-03 | 0.83 | 0.046 |
| CBF (52) | 1e+02 | 1e-03 | 0.79 | 0.044 |
| CS (119) | 1e+01 | 1e-03 | 0.85 | 0.044 |
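The hyperparameter values listed for each feature set are all powers of ten, which would be consistent with a search over a logarithmic grid of C and γ, although this page does not state how they were chosen. The following is a minimal sketch of such a search under that assumption. The grid ranges, the `grid_search` helper, and the `toy_cv_scores` stand-in are all hypothetical, and the toy scores are made up rather than taken from the paper.

```python
import itertools
import math
import statistics

def grid_search(c_values, gamma_values, cv_scores):
    """Return (mean, std, C, gamma) for the best mean cross-validated accuracy.

    cv_scores: callable (C, gamma) -> list of per-fold accuracies.
    """
    best = None
    for c, gamma in itertools.product(c_values, gamma_values):
        scores = cv_scores(c, gamma)
        mean = statistics.fmean(scores)
        std = statistics.stdev(scores)
        # Strict comparison keeps the first candidate when means tie.
        if best is None or mean > best[0]:
            best = (mean, std, c, gamma)
    return best

# Logarithmic grids; these ranges are illustrative assumptions.
C_GRID = [10.0 ** e for e in range(0, 4)]        # 1e+00 .. 1e+03
GAMMA_GRID = [10.0 ** e for e in range(-5, -1)]  # 1e-05 .. 1e-02

def toy_cv_scores(c, gamma):
    """Hypothetical stand-in for training and scoring an SVM on each fold."""
    c_exp = round(math.log10(c))
    g_exp = round(math.log10(gamma))
    peak = 0.85 - 0.02 * abs(c_exp - 1) - 0.02 * abs(g_exp + 3)
    return [peak - 0.01, peak, peak + 0.01]

mean, std, best_c, best_gamma = grid_search(C_GRID, GAMMA_GRID, toy_cv_scores)
print(f"C={best_c:g} gamma={best_gamma:g} accuracy={mean:.2f} (std {std:.2f})")
```

In practice `toy_cv_scores` would be replaced by a function that trains the SVM on each training fold and returns the accuracy on the held-out fold; reporting the mean together with the standard deviation mirrors the two result columns of the table.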
Figure 3. Prediction performance, reported as Sensitivity, Precision, Specificity and MCC values, for the identification of all 15 beetle species. Some species could be identified with greater confidence than others. Three pairs were difficult to tell apart (indicated by the arrows): species 2 and 10, species 5 and 6 (genus Oryzaephilus), and species 13 and 14 (genus Tribolium).
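For a multi-class problem, per-species Sensitivity, Precision, Specificity and MCC are commonly computed one-vs-rest from the confusion matrix. The sketch below illustrates that calculation; the helper name `one_vs_rest_metrics` and the toy 3-class counts are made up for illustration and are not the paper's data.

```python
import math

def one_vs_rest_metrics(confusion, k):
    """Metrics for class k from a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                          # class k, predicted as another class
    fp = sum(confusion[i][k] for i in range(n)) - tp     # other classes, predicted as k
    tn = total - tp - fn - fp

    def ratio(num, den):
        # Undefined ratios (zero denominator) are reported as 0.0.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "precision": ratio(tp, tp + fp),
        "specificity": ratio(tn, tn + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }

# Toy 3-class confusion matrix: classes 0 and 1 are confused with each other.
toy = [
    [8, 2, 0],
    [3, 7, 0],
    [0, 0, 10],
]
for k in range(len(toy)):
    print(k, one_vs_rest_metrics(toy, k))
```

In the toy matrix, the two mutually confused classes score lower on every metric than the cleanly separated third class, which is the kind of pattern the caption describes for species within the same genus.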
Accuracy, C and γ values for the binary-class SVM models used to differentiate each difficult pair.
| Species Pair | C | γ | Accuracy | Std. Dev |
|---|---|---|---|---|
| S2–S10 | 1e+02 | 1e-05 | 0.87 | 0.08 |
| S5–S6 | 1e+02 | 1e-05 | 0.79 | 0.13 |
| S13–S14 | 1e+02 | 1e-03 | 0.87 | 0.11 |
Figure 4. Accuracy of the original Multi-class and Hybrid SVM models for the difficult pairs. The Hybrid model, which aimed to combine the advantages of the Multi-class and Binary models, failed to outperform the original Multi-class model.
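This page does not describe how the Hybrid model combines the two classifiers. One plausible reading, offered here purely as an assumption, is a two-stage scheme: the multi-class model predicts first, and when its prediction falls in one of the difficult pairs, the binary model for that pair makes the final call. The function `hybrid_predict` and the toy stand-in models below are hypothetical.

```python
# Difficult species pairs named in the Figure 3 caption.
DIFFICULT_PAIRS = [(2, 10), (5, 6), (13, 14)]

def hybrid_predict(sample, multiclass_predict, binary_predictors):
    """Two-stage prediction (assumed design, not confirmed by the source).

    multiclass_predict: callable returning a species label for a sample.
    binary_predictors: dict mapping a pair tuple to a callable that
        returns one of the two labels in that pair.
    """
    label = multiclass_predict(sample)
    for pair in DIFFICULT_PAIRS:
        if label in pair and pair in binary_predictors:
            # Defer to the specialist model for this pair.
            return binary_predictors[pair](sample)
    return label

# Hypothetical stand-in models, for illustration only.
def toy_multiclass(sample):
    return sample["guess"]

toy_binary = {(5, 6): lambda sample: 6 if sample["width"] > 1.0 else 5}

# The multi-class guess 5 is in a difficult pair, so the binary model decides.
print(hybrid_predict({"guess": 5, "width": 1.4}, toy_multiclass, toy_binary))
# Species 3 is not in any difficult pair, so the multi-class label is kept.
print(hybrid_predict({"guess": 3, "width": 1.4}, toy_multiclass, toy_binary))
```

Under this design the second stage can only help when the multi-class model has already narrowed the sample to the right pair, which is one possible reason such a hybrid might not improve on the multi-class model alone.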
Accuracy values for ANN models with different parameters.
| No. of layers | No. of Nodes | Function | Overall Accuracy |
|---|---|---|---|
| 1 | 50 | | 0.77 |
| 1 | 50 | | 0.70 |
| 1 | 100 | | 0.79 |
| 1 | 100 | | 0.69 |
| 1 | 150 | | 0.79 |
| 1 | 150 | | 0.67 |
| 2 | (50, 50) | | 0.78 |
| 2 | (50, 50) | | 0.71 |
| 2 | (50, 100) | | 0.78 |
| 2 | (50, 100) | | 0.70 |
| 2 | (100, 150) | | 0.79 |
| 2 | (100, 150) | | 0.51 |
| 3 | (50, 150, 50) | | 0.78 |
| 3 | (50, 150, 50) | | 0.58 |
| 3 | (100, 150, 100) | | 0.79 |
| 3 | (100, 150, 100) | | 0.52 |
| 3 | (150, 150, 150) | | 0.79 |
| 3 | (150, 150, 150) | trainrp | 0.43 |
The effect of z-score normalization.
| No. of nodes | without z-score | z-score |
|---|---|---|
| (50, 50) | 0.78 | 0.78 |
| (50, 100) | 0.78 | 0.78 |
| (100, 150) | 0.79 | 0.79 |
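Z-score normalization rescales each feature to zero mean and unit standard deviation so that features measured on different scales contribute comparably. A minimal sketch follows, assuming the population standard deviation is used (the page does not say which variant the authors applied); the function name `zscore_columns` and the toy values are illustrative only.

```python
import statistics

def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation; constant columns fall back to 1.0
    # so that the division below never hits zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Two toy features on very different scales (made-up values).
toy_features = [
    [1.0, 200.0],
    [2.0, 400.0],
    [3.0, 600.0],
]
scaled = zscore_columns(toy_features)
for row in scaled:
    print(row)
```

After scaling, both toy columns take the same standardized values even though their raw ranges differ by a factor of 200.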
Figure 5. Comparison of the performance metrics of the ANN and SVM methods for individual beetle species. The two methods are broadly comparable, each with its own advantages.