Yang Li, Xuewei Chao.
Abstract
Crop pest recognition based on convolutional neural networks is important for the development of intelligent plant protection. However, the dominant approach, deep learning, relies heavily on large amounts of data. Current big-data-driven deep learning is an unsustainable learning mode: data collection is costly, high-end hardware is expensive, and power consumption is high. Toward sustainability, the trade-off between data quality and quantity therefore deserves serious consideration. In this study, we propose an embedding range judgment (ERJ) method that operates in the feature space, and we carry out extensive comparative experiments. The results show that, in some recognition tasks, a smaller set of selected good data can match the performance obtained with all of the training data. Furthermore, a limited amount of good data can outperform a large amount of bad data, and the contrast is remarkable. Overall, this study lays a foundation for data information analysis in smart agriculture, inspires subsequent work in related areas of pattern recognition, and calls on the community to pay more attention to the essential issue of data quality and quantity.
Keywords: classification; feature engineering; few-shot; low-shot; redundancy
Year: 2021 PMID: 35003196 PMCID: PMC8739801 DOI: 10.3389/fpls.2021.811241
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
Figure 1. Some samples of the crop pest dataset.
Figure 2. Overall framework.
Algorithm of the ERJ method.
| (1) Fine-tune the model parameters on the base dataset. |
| (2) Take the feature extractor from the fine-tuned model. |
| (3) Feed the base data to the feature extractor to obtain the existing embedding range. |
| (4) Feed the pool data to the feature extractor to obtain the pool embeddings, and compare them with the existing embedding range to judge each sample's information value. |
| (5) If a sample has several dimensions outside the existing embedding range, add it to the base data. Repeat this step until the amount of data reaches 5% of the whole dataset. |
| (6) Repeat steps 1–5 until the testing performance is satisfactory or the data budget is exhausted. |
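Steps (3) to (5) of the algorithm can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: the "embedding range" is taken to be the per-dimension minimum and maximum of the base embeddings; "several dimensions" is expressed through a hypothetical threshold `min_outside`; and when more samples qualify than the budget allows, those with the most out-of-range dimensions are taken first, a tie-breaking rule the source does not specify. The function names and the toy arrays are illustrative only.

```python
import numpy as np


def embedding_range(base_emb):
    """Per-dimension (min, max) of the base embeddings (assumed range definition)."""
    return base_emb.min(axis=0), base_emb.max(axis=0)


def count_outside(pool_emb, lo, hi):
    """For each pool embedding, count the dimensions lying outside [lo, hi]."""
    return ((pool_emb < lo) | (pool_emb > hi)).sum(axis=1)


def erj_select(base_emb, pool_emb, budget, min_outside=1):
    """Return indices of pool samples judged informative, at most `budget` of them.

    `min_outside` (hypothetical parameter) is the number of out-of-range
    dimensions a sample needs in order to qualify.
    """
    lo, hi = embedding_range(base_emb)
    outside = count_outside(pool_emb, lo, hi)
    candidates = np.flatnonzero(outside >= min_outside)
    # Assumption: prefer samples with more out-of-range dimensions;
    # a stable sort keeps the original pool order among ties.
    order = np.argsort(-outside[candidates], kind="stable")
    return candidates[order][:budget]


# Toy embeddings standing in for feature-extractor outputs.
base = np.array([[0.0, 0.0], [1.0, 1.0]])
pool = np.array([[0.5, 0.5], [2.0, 0.5], [2.0, -1.0], [0.2, 1.5]])
selected = erj_select(base, pool, budget=2)
print(selected)
```

In a full loop, the selected pool samples would be moved into the base set, the model fine-tuned again, and the range recomputed, with `budget` set per round (for instance `int(0.05 * n_total)`, if the 5% figure is read as a per-round increment).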
Figure 3. The relation between accuracy and data quantity under the shallow model.
Figure 5. The relation between accuracy and data quantity under the deep model.
Figure 6. The data selection strategy of the ERJ method under the shallow model.
Figure 7. The data selection strategy of the ERJ method under the deep model.
Figure 8. The good data vs. bad data under the shallow model.
Figure 10. The good data vs. bad data under the deep model.