Gordana Ispirova, Gjorgjina Cenikj, Matevž Ogrinc, Eva Valenčič, Riste Stojanov, Peter Korošec, Ermanno Cavalli, Barbara Koroušić Seljak, Tome Eftimov.
Abstract
Despite the numerous studies in the last decade involving food and nutrition data, this domain remains low-resourced. Annotated corpora are very useful tools for researchers and domain experts, as well as for data scientists performing analysis. In this paper, we present the process of annotating food consumption data (recipes) with semantic tags from different semantic resources: the Hansard taxonomy, the FoodOn ontology, the SNOMED CT terminology and the FoodEx2 classification system. FoodBase is an annotated corpus of food entities (recipes) that includes a curated version of 1000 instances, considered a gold standard. In this study, we use the curated version of FoodBase and two different annotation approaches: the NCBO annotator (for the FoodOn and SNOMED CT annotations) and the semi-automatic StandFood method (for the FoodEx2 annotations). The end result is a new version of the gold standard of the FoodBase corpus, called the CafeteriaFCD (Cafeteria Food Consumption Data) corpus. This corpus contains food consumption data (recipes) annotated with semantic tags from the aforementioned four external semantic resources. With these annotations, data interoperability is achieved between five semantic resources from different domains. This resource can be further utilized for developing and training different information extraction pipelines using state-of-the-art NLP approaches for tracing knowledge about food safety applications.
Keywords: annotated corpus; annotation methods; food consumption data; gold standard corpus; recipe data; semantic resource; semantic tags
Year: 2022 PMID: 36076868 PMCID: PMC9455825 DOI: 10.3390/foods11172684
Source DB: PubMed Journal: Foods ISSN: 2304-8158
Figure 1. Example instance from the FoodBase-curated corpus.
Figure 2. Flowchart of the methodology.
Figure 3. Example of an instance from FoodBase annotated with tags from the FoodOn ontology.
Figure 4. Example of an instance from FoodBase annotated with tags from the SNOMED CT ontology.
Figure 5. Example of an instance from FoodBase annotated with tags from the FoodEx2 classification system.
Figure 6. Ten most frequent semantic tags from FoodOn.
Figure 7. Ten most frequent semantic tags from SNOMED CT.
Figure 8. Ten most frequent semantic tags from FoodEx2.
Descriptive statistics about the FoodEx2 annotations on the FoodBase dataset.
| Number of FoodEx2 Annotations | Total Number of Annotated Instances | Number of Annotated Instances (Without Duplicates) |
|---|---|---|
| 1 | 5955 | 1076 |
| 2 | 801 | 234 |
| 3 | 621 | 91 |
| 4 | 300 | 64 |
| 5 | 146 | 36 |
| 6 | 358 | 58 |
| 7 | 192 | 58 |
| 8 | 1370 | 213 |
Descriptive statistics about the FoodEx2 annotations on the FoodBase dataset.
| Number of FoodEx2 Annotations | Total Number of Annotated Instances | Number of Annotated Instances (Without Duplicates) |
|---|---|---|
| 1 | 5825 | 1070 |
| 2 | 1056 | 270 |
| 3 | 755 | 132 |
| 4 | 530 | 81 |
| 5 | 286 | 72 |
| 6 | 158 | 61 |
| 7 | 157 | 13 |
| 8 | 1953 | 407 |
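
Descriptive statistics of the kind tabulated above could be derived along the following lines. This is only a minimal sketch under stated assumptions, not the authors' procedure: the function name `annotation_count_stats`, the `(entity_text, tags)` input layout, the placeholder tag codes, and the reading of "without duplicates" as "distinct entity text, compared case-insensitively" are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def annotation_count_stats(annotated_entities):
    """Group annotated entities by how many distinct tags each one carries.

    `annotated_entities` is an iterable of (entity_text, tags) pairs.
    Returns {number_of_tags: (total_instances, distinct_instances)}.
    Assumption: two instances are duplicates when their text matches
    after stripping whitespace and lowercasing.
    """
    total = defaultdict(int)
    distinct = defaultdict(set)
    for text, tags in annotated_entities:
        n_tags = len(set(tags))
        if n_tags == 0:
            continue  # entities without any tag are not counted
        total[n_tags] += 1
        distinct[n_tags].add(text.strip().lower())
    return {n: (total[n], len(distinct[n])) for n in sorted(total)}


# Toy input with placeholder tag codes (not real FoodEx2 codes).
example = [
    ("olive oil", ["TAG_A"]),
    ("Olive oil", ["TAG_A"]),
    ("chicken breast", ["TAG_B", "TAG_C"]),
    ("salt", ["TAG_D"]),
    ("water", []),
]

for n_tags, (n_total, n_distinct) in annotation_count_stats(example).items():
    print(f"| {n_tags} | {n_total} | {n_distinct} |")
```

Each printed row follows the column order of the tables: number of annotations, total number of annotated instances, and number of annotated instances without duplicates.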