Tome Eftimov, Peter Korošec, Barbara Koroušić Seljak.
Abstract
The European Food Safety Authority has developed a standardized food classification and description system called FoodEx2. It uses facets to describe food properties and aspects from various perspectives, making it easier to compare food consumption data from different sources and to perform more detailed data analyses. However, both food composition data and food consumption data, which need to be linked, are lacking in FoodEx2 because classification and description have to be performed manually, a process that is laborious and requires good knowledge of both the system and food (composition, processing, marketing, etc.). In this paper, we introduce a semi-automatic system for classifying and describing foods according to FoodEx2, which consists of three parts. The first uses a machine learning approach to classify foods into four FoodEx2 categories: two for single foods, raw (r) and derivatives (d), and two for composite foods, simple (s) and aggregated (c). The second uses a natural language processing approach and probability theory to describe foods. The third combines the results of the first and second parts by defining post-processing rules that improve the result of the classification part. We tested the system using a set of food items (from Slovenia) manually coded according to FoodEx2. The new semi-automatic system obtained an accuracy of 89% for the classification part and 79% for the description part, or an overall result of 79% for the whole system.
Keywords: FoodEx2; food classification; food description; food standardization; machine learning; natural language processing
Year: 2017 PMID: 28587103 PMCID: PMC5490521 DOI: 10.3390/nu9060542
Source DB: PubMed Journal: Nutrients ISSN: 2072-6643 Impact factor: 5.717
Figure 1. The StandFood classification part flowchart. FoodEx2 names are used as training instances. They are pre-processed by removing numbers and punctuation and by stemming. Then the document-term matrix is built so that it can be used for feature selection. Additional problem-specific features are added to the selected features. Different classifiers and ensemble learning are used to obtain a model that can then predict the food category of new, unseen food items.
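The pre-processing and document-term matrix steps of this flowchart can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `naive_stem` is a crude suffix stripper standing in for a real stemmer, lowercasing is added for convenience although the caption does not mention it, and the function names are hypothetical rather than taken from StandFood.

```python
import re
from collections import Counter


def naive_stem(token):
    # Crude suffix stripping; a stand-in for a real stemmer (assumption).
    for suffix in ("ies", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)] + ("y" if suffix == "ies" else "")
    return token


def preprocess(name):
    # Drop digits and punctuation (anything that is not an ASCII letter or
    # whitespace), lowercase, split on whitespace, then stem each token.
    cleaned = re.sub(r"[^A-Za-z\s]", " ", name).lower()
    return [naive_stem(tok) for tok in cleaned.split()]


def document_term_matrix(names):
    # One row per food name, one column per distinct stem, cells are counts.
    docs = [Counter(preprocess(n)) for n in names]
    vocabulary = sorted({term for doc in docs for term in doc})
    matrix = [[doc[term] for term in vocabulary] for doc in docs]
    return vocabulary, matrix


names = ["Barley grains", "Oat flakes", "Mushroom soup", "Cherries, sweet 2"]
vocabulary, matrix = document_term_matrix(names)
```

Each row of `matrix` could then be passed to a feature-selection step and a classifier; those stages are not shown because the caption does not specify them in enough detail.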
Figure 2. The StandFood description part flowchart. For each food item that needs to be described according to FoodEx2, its English name is used. The name is pre-processed by converting it to lowercase letters. Part-of-speech (POS) tagging is used to extract its nouns, adjectives, and verbs. The extracted sets are then lemmatized. Using the extracted nouns, the FoodEx2 data is searched for names that contain at least one of the extracted nouns. The resulting list (subset) is pre-processed in the same way: each food item name is converted to lowercase letters, POS tagging is applied to extract the nouns, adjectives, and verbs, and the extracted sets are lemmatized. The food item that needs to be described is then matched with each food item in the resulting list, and a weight, W_i, i = 1, ..., n, which is the probability obtained using Equation (3), is assigned to each matching pair. Finally, the pair with the highest weight is the most relevant one, so it is returned together with its food category from FoodEx2.
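The matching stage of this flowchart can be sketched as follows. Equation (3) is not reproduced in this record, so the sketch substitutes a Jaccard overlap as the weight; that is an assumption, not the paper's probability. POS tagging and lemmatization are assumed to have been done already, so the term sets are written by hand. The names and codes in the toy catalogue come from the description results table, while `most_relevant` and `FOODEX2_SUBSET` are hypothetical names.

```python
def jaccard(a, b):
    # Overlap of two term sets in [0, 1]; a stand-in for Equation (3).
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy catalogue: name -> (FoodEx2 code, hand-written term set).
FOODEX2_SUBSET = {
    "Mushroom soup": ("A041R", {"mushroom", "soup"}),
    "Mixed green salad": ("A042C", {"mixed", "green", "salad"}),
    "Baking yeast": ("A049A", {"baking", "yeast"}),
}


def most_relevant(query_terms, query_nouns, catalogue=FOODEX2_SUBSET):
    # Keep only catalogue entries that share at least one noun with the query.
    candidates = [
        (name, code, terms)
        for name, (code, terms) in catalogue.items()
        if terms & query_nouns
    ]
    if not candidates:
        return None
    # Weight every (query, candidate) pair and return the heaviest one.
    weighted = [(jaccard(query_terms, terms), name, code)
                for name, code, terms in candidates]
    weight, name, code = max(weighted)
    return name, code, weight


# "Prepared green salad": all extracted terms, and the nouns among them.
result = most_relevant({"prepared", "green", "salad"}, {"salad"})
```

Here `result` pairs the query with "Mixed green salad" and its code A042C, which agrees with the corresponding row of the results table, although the weight itself is only the stand-in overlap score.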
Figure 3. The StandFood post-processing part. The results from the StandFood classification and description parts are combined to improve the accuracy of the classification part for the food category prediction.
Classification accuracy for each machine learning (ML) algorithm using 10-fold cross validation. SVM—support vector machine, SLDA—scaled linear discriminant analysis, RF—random forest, TREE—classification tree, NNET—neural networks.
| Metric | SVM | SLDA | RF | Maxent | Boosting | Bagging | TREE | NNET |
|---|---|---|---|---|---|---|---|---|
| Accuracy (%) | 88.50 | 72.41 | 88.95 | 89.21 | 85.88 | 83.47 | 69.02 | 77.12 |
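For readers unfamiliar with the evaluation protocol, 10-fold cross-validation can be sketched with the standard library alone. The majority-class "classifier" and the toy labels below are purely illustrative assumptions; they are not one of the algorithms in the table, and accuracy is pooled over all held-out items, which may differ from how the reported figures were aggregated.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    # Shuffle the indices once, then deal them into k folds round-robin.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(items, labels, fit, k=10, seed=0):
    # Train on k-1 folds, predict the held-out fold, pool the correct counts.
    correct = 0
    for held_out in k_fold_indices(len(items), k, seed):
        held = set(held_out)
        train = [i for i in range(len(items)) if i not in held]
        predict = fit([items[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(items[i]) == labels[i] for i in held_out)
    return correct / len(items)


def fit_majority(train_items, train_labels):
    # Hypothetical baseline: always predict the most common training label.
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda item: majority


items = [f"item {i}" for i in range(40)]   # toy data, not from the paper
labels = ["r"] * 30 + ["d"] * 10
accuracy = cross_val_accuracy(items, labels, fit_majority)
```

Any function with the same `fit(items, labels) -> predict` shape could be swapped in for `fit_majority` to compare models under the same folds.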
Precision and recall for each food category using the evaluation set.
| Category | Precision | Recall |
|---|---|---|
| r | 0.72 | 0.99 |
| d | 0.81 | 0.81 |
| c | 0.75 | 0.67 |
| s | 0.95 | 0.57 |
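Per-category precision and recall of the kind reported in this table can be computed from true and predicted labels as in the following sketch. The label lists are invented toy data chosen only to make the arithmetic easy to follow; they are not the evaluation set.

```python
def precision_recall(true_labels, predicted_labels, categories=("r", "d", "c", "s")):
    # precision = TP / (items predicted as the category)
    # recall    = TP / (items that truly belong to the category)
    scores = {}
    pairs = list(zip(true_labels, predicted_labels))
    for cat in categories:
        tp = sum(t == cat and p == cat for t, p in pairs)
        predicted = sum(p == cat for _, p in pairs)
        actual = sum(t == cat for t, _ in pairs)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        scores[cat] = (precision, recall)
    return scores


# Toy labels (hypothetical): several non-raw items are mistaken for raw.
true_labels = ["r", "r", "d", "d", "c", "s", "s", "s"]
predicted = ["r", "r", "r", "d", "c", "s", "r", "c"]
scores = precision_recall(true_labels, predicted)
```

In the toy data, category r has perfect recall but low precision because other items are wrongly predicted as r, which is the same qualitative pattern the table shows for r.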
Correctly classified instances by the StandFood classification part.
| Food Item | Category |
|---|---|
| Barley grains | r |
| Mandarins ( | r |
| Buckwheat flour | d |
| Oat flakes | d |
| Fruit compote | s |
| Marmalade, mixed fruit | s |
| Rice and vegetables meal | c |
| Mushroom soup | c |
Results from the StandFood description part for ten randomly selected instances. Food item is the name of the food item in the test set. StandFood relevant FoodEx2 item is the name of the most relevant match in FoodEx2 found by StandFood. StandFood FoodEx2 code is the FoodEx2 code of that most relevant match. Manual FoodEx2 code is the FoodEx2 code that a human expert manually assigned to the food item.
| Food Item | StandFood FoodEx2 Code | StandFood Relevant FoodEx2 Item | Manual FoodEx2 Code |
|---|---|---|---|
| Mushroom soup | A041R | Mushroom soup | A041R |
| Prepared green salad | A042C | Mixed green salad | A042C |
| Meat burger | A03XF | Meat burger no sandwich | A03XF |
| Yeast | A049A | Baking yeast | A049A |
| Brown sauce (gravy, lyonnais sauce) | A043Z | Continental European brown cooked sauce gravy | A043Z |
| Cow milk, <1% fat (skimmed milk) | A02MA | Cow milk skimmed low fat | A02MA |
| Supplements containing special fatty acids (e.g., omega-3, essential fatty acids) | A03SX | Formulations containing special fatty acids (e.g., omega-3 essential fatty acids) | A03SX |
| Durum wheat flour (semola) | A004C | Wheat flour durum | A004F |
| Gingerbread | A00CT | Gingerbread | A009Q$F14.A07GX |
| Cherry, fresh | A01GG | Cherries and similar | A01GK |
| | A01GH | Sour cherries | |
| | A01GK | Cherries sweet | |
| | A0DVN | Nanking cherries | |
| | A0DVP | Cornelian cherries | |
| | A0DVR | Black cherries | |
Precision and recall for each food category after applying the post-processing rules.
| Category | Precision | Recall |
|---|---|---|
| r | 0.85 | 0.99 |
| d | 0.90 | 0.84 |
| c | 0.82 | 0.87 |
| s | 0.97 | 0.83 |
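The effect of the post-processing rules is easier to read as a per-category change. The short sketch below subtracts the evaluation-set precision and recall values (before post-processing) from the values in this table; all numbers are copied from the two tables, and only the helper name `change` is an assumption.

```python
# (precision, recall) per category, copied from the two tables.
before = {"r": (0.72, 0.99), "d": (0.81, 0.81), "c": (0.75, 0.67), "s": (0.95, 0.57)}
after = {"r": (0.85, 0.99), "d": (0.90, 0.84), "c": (0.82, 0.87), "s": (0.97, 0.83)}


def change(before, after):
    # Difference after - before, rounded to the two decimals the tables use.
    return {
        cat: (round(after[cat][0] - before[cat][0], 2),
              round(after[cat][1] - before[cat][1], 2))
        for cat in before
    }


delta = change(before, after)
for cat, (d_precision, d_recall) in delta.items():
    print(f"{cat}: precision {d_precision:+.2f}, recall {d_recall:+.2f}")
```

The largest gains are in recall for the composite categories c and s, while recall for r is unchanged.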
Food categories for seven food items after applying the post-processing rules. Classification part category is the food category assigned by the StandFood classification part. Post-processing category is the category assigned after applying the post-processing rules.
| Food Item | Classification Part Category | Post-processing Category |
|---|---|---|
| Cabbage Chinese boiled | r | d |
| Marzipan | r | s |
| Gingerbread | r | c |
| Water, bottled, flavored, citrus | d | s |
| Salad, tuna-vegetable, canned | d | c |
| Multigrain rolls | c | s |
| Croissant, filled with jam | s | c |