
Applications of knowledge graphs for food science and industry.

Weiqing Min^1,2, Chunlin Liu^1,2, Leyi Xu^3, Shuqiang Jiang^1,2.

Abstract

The deployment of various networks (e.g., Internet of Things [IoT] and mobile networks), databases (e.g., nutrition tables and food compositional databases), and social media (e.g., Instagram and Twitter) generates huge amounts of food data, which present researchers with an unprecedented opportunity to study various problems and applications in food science and industry via data-driven computational methods. However, these multi-source heterogeneous food data appear as information silos, leading to difficulty in fully exploiting these food data. The knowledge graph provides a unified and standardized conceptual terminology in a structured form, and thus can effectively organize these food data to benefit various applications. In this review, we provide a brief introduction to knowledge graphs and the evolution of food knowledge organization mainly from food ontology to food knowledge graphs. We then summarize seven representative applications of food knowledge graphs, such as new recipe development, diet-disease correlation discovery, and personalized dietary recommendation. We also discuss future directions in this field, such as multimodal food knowledge graph construction and food knowledge graphs for human health.


Keywords: artificial intelligence; food analysis; food science and industry; knowledge graph; new recipe development; nutrition and health; ontology

Year: 2022 PMID: 35607620 PMCID: PMC9122965 DOI: 10.1016/j.patter.2022.100484

Source DB: PubMed Journal: Patterns (N Y) ISSN: 2666-3899

Introduction

Food is critical to human life. It travels from its origin on the farm through growing, harvesting, packing, processing, transforming, production, transporting, and distribution to consumption and disposal, forming the food system. The production of huge volumes of multidisciplinary and heterogeneous food data (e.g., nutrition composition tables, health databases, food images, food ordering data, and recipes) from the food system provides a basis for the development of artificial intelligence (AI), making digital technology an indispensable part of food science and industry. Each stage of this system, from food processing to consumption, can benefit from data-driven computational methods that promote the development of food science and industry, such as the use of neural networks in modeling the food process, food quality assessment, food object recognition and analysis,⁵⁻⁷ food authentication and traceability, and dietary assessment. However, these food data are still not sufficiently utilized, and it is still hard to satisfy the demand for effective food data sharing, organization, and traceability, which restricts the development of food science and technology. For example, in food supply chains, the data from different food companies may follow different naming conventions, which hinders the alignment of food terms and the integration of different food data sources, making it harder to optimize the food supply system. In addition, more complex issues such as food contamination traceability and exposure assessment involve data from multiple fields. They also require food systems to be able to integrate food data and organize food knowledge extracted from these multi-source heterogeneous data. Therefore, there is general agreement on the importance of organizing and integrating food data in food science and industry.
Only in this way can we easily access and interchange food-relevant data all over the world, extract food-related information, and organize food knowledge, which benefits different stakeholders, such as researchers, food manufacturers, food distributors, retailers, and consumers. For example, such a standardized knowledge organization system can facilitate governance via more efficient knowledge access and utilization, and food manufacturers and distributors can trace the processing and circulation of food commodities. All of them can make smarter decisions with such a system. A key requirement for standardization is to make heterogeneous data from multiple sources interoperable. To that end, the Internet of Food is proposed to help tackle this problem by defining one lingua franca. Along with the changes in the form of data and the increasing volume of data, many types of lingua franca have emerged, each with a different way of organizing food data. Because an ontology can describe more complex structures with arbitrary relations and restrictions between concepts, different food ontologies have been developed, such as FoodOn ontology and ISO-FOOD ontology. Some communities, such as the Ontologies Community of Practice (CoP), have been created to support high-quality ontology development for agri-food research. The food knowledge graph generally adopts the ontology as its schema to further model more real-world instances and their relationships in a graph. It provides a unified and standardized conceptual terminology, together with the relations among its concepts, to link various information silos related to food, and can thus have a considerable impact on food science and industry. Its applications include food safety (e.g., the traceback of food contamination), food allergy, chemical exposure and nutritional assessment, cooking, and culinary use.
There have been some relevant reviews on knowledge graphs from different perspectives.²⁰⁻²³ In contrast, this work seeks to provide a comprehensive review of knowledge graphs in the food domain, namely food knowledge graphs, including the evolution from food ontology to knowledge graphs, their representative applications, and prospects in food science and industry.

Knowledge graph

In this section, the history of knowledge graphs is briefly introduced, and how they are constructed, represented, and used is also discussed. In order to better describe the development of knowledge graphs, commonly used terms are summarized in Table 1.

Table 1

A glossary of commonly used terms in knowledge graphs

Term	Description
Entity	An entity can be a real-world object (instance) or an abstract concept. Each entity has a collection of attributes and relations
Relation	Relation, also named entity description, refers to the interlinked description of entities. It should have formal semantics and support entities to form a graph
RDF	A uniform standard to describe entities and relations in the form of subject-predicate-object triples^a
RDFS	Extends RDF by adding common predefined vocabularies and supports constructing lightweight ontologies
OWL	The W3C standard for defining ontologies. It provides the mechanisms for creating all the necessary components of an ontology: concepts, instances, and properties (or relations)
IRI	An Internet protocol standard used to identify and locate every entity and relation uniquely. Common identifiers like URL and URI are subsets of IRI
Classification	Classification is a systematic arrangement in groups or categories according to established criteria^b
Taxonomy	Taxonomy is a classification of things in a hierarchical form. It is usually a tree or a lattice that expresses subsumption relations (i.e., A subsumes B, meaning that everything that is in B is also in A). The fundamental difference between taxonomy and classification is that taxonomies describe relations between items, while classification simply groups the items^c
Semantic Web technologies	Semantic Web technologies refer to all the technologies needed in the construction of the Semantic Web, including Hypertext Web technologies like IRI and XML, standardized Semantic Web technologies for querying (SPARQL), description (RDF), and schema (RDFS/OWL), and those unrealized or unstandardized Semantic Web technologies (like the proof and trust layers for inference and validation, and the user interface for interaction). All of these technologies are combined to support a complete knowledge graph. These Web technologies are hierarchical, and each type of Web technology exploits the capabilities of the layers below
Ontology	An ontology is a description of concepts and relations (e.g., synonymy and meronymy). The main difference between ontology and taxonomy is that a taxonomy is an ontology in the form of a hierarchy. In many systems, ontologies and taxonomies work together
Schema	Schema usually means the technology that provides the standards, rules, and principles for entities and their usage: it defines all the classes and the attributes that an entity of each class should have. An ontology is usually used as the schema in the knowledge graph
Semantic network	A semantic network consists of nodes and edges, where nodes represent entities and edges represent the relations. There is no standard for the values of nodes and edges
Linked data	Linked data is about using Semantic Web technologies to connect related data that were not previously linked, and it emphasizes link creation between different datasets. Since the datasets of linked data are open access, it is also called linked open datasets

URL, Uniform Resource Locator.

^a https://www.w3.org/TR/PR-rdf-syntax/

^b https://classroom.synonym.com/difference-between-classification-taxonomy-10074596.html

^c https://www.obitko.com/tutorials/ontologies-semantic-web/specification-of-conceptualization.html


Brief history of the knowledge graph

The history of knowledge graphs and their related technologies is shown in Figure 1. A graph is a data structure that consists of nodes and edges, which makes it suitable for representing relations between objects. The idea of graph-based knowledge representation can be traced back to the 1960s, when the semantic network was first proposed as a form of knowledge representation. It uses nodes to represent concepts and edges to represent relations between concepts in one graph. In semantic networks, there are no standards for the values of nodes and edges, which means that developers can freely define the nodes and their relations. Therefore, it is hard to integrate different semantic networks, making it difficult to apply semantic networks in practice.

Figure 1

The evolution of the knowledge graph

This figure shows the development of main semantic data organizations above the arrow, from the semantic network to the knowledge graph. Below the arrow, it displays key Semantic Web technologies. These Web technologies are listed hierarchically, and each type of Web technology relies on the capabilities of the layers below. With more technologies, more practical and powerful semantic data organization can be supported. The ultimate vision of semantic data organization is the Semantic Web, where all data are linked through relations.

Later, the Resource Description Framework (RDF) is proposed to partially solve the problem of standards. RDF is developed by the World Wide Web Consortium (W3C) as a standard for describing Web resources. The main data model of RDF is the subject-predicate-object triple, which indicates that two entities (subject and object) are connected through a relation (predicate). These entities and relations generally use Internationalized Resource Identifiers (IRIs) as identifiers in RDF, which addresses the difficulty of integrating data from different sources: the same entity or relation always carries the same, unique, already defined IRI. Based on RDF, Berners-Lee proposes the concept of the Semantic Web, which is also known as Web 3.0. The Semantic Web is a grand idea about the future Internet. Its final goal is for all data on the Internet to be published and linked with semantics, enabling efficient and intelligent data querying, inference, and understanding. In order to build the Semantic Web, W3C helps to build a technology stack called Semantic Web technologies, which can be used in the construction of the Semantic Web (e.g., RDF). Although the Semantic Web remains largely unrealized, these technologies are widely used.
Linked data, proposed in 2006, is one of its implementations; it publishes and interlinks datasets on the Internet using Semantic Web technologies. Compared with the semantic network, linked data emphasizes links between Web data and Web resources. For example, the elements of RDF triples in linked data are expected to be IRIs as far as possible, so that they are unique and addressable on the Internet. However, RDF alone lacks abstraction ability and cannot describe or distinguish relations between entities, which hampers knowledge understanding and inference. Thus, W3C successively proposes Resource Description Framework Schema (RDFS) and Web Ontology Language (OWL). RDFS and OWL extend RDF by adding common predefined vocabularies at the schema level so that they can represent abstract relations, like classes (concepts), instances (objects), subsets, and properties. The schema level is later separated out as the schema layer and introduced to graph-based knowledge representation as a vocabulary and semantic specification. Many data models can be used as the schema layer, and the ontology is the most widely used one. It is a knowledge specification: a formal, explicit description of concepts within a certain domain, the properties of each concept, and restrictions on facts. The aim of an ontology is to provide a shared understanding of conceptual knowledge and to define the mutual relations between concepts, which makes semantic-based inference possible. Since RDFS and OWL provide good representation capabilities and semantic support, they are the main description languages for ontologies. In 2012, Google proposed the term knowledge graph, which mainly describes real-world entities and their relations in a graphical representation and defines the possible classes and relations of entities with an ontology as the schema. It is nearly synonymous with the knowledge base, with a minor difference. A knowledge graph can be viewed as a graph when considering its graph structure.
When we highlight formal semantics, it can be taken as a knowledge base for interpretation and inference over facts. Currently, there is no unifying definition of knowledge graphs. Herein we adopt the following definition: a knowledge graph is viewed as a multi-relational graph of data for conveying real-world knowledge, where nodes represent entities and edges represent different types of relations. The focus of knowledge graphs is instances, while the ontology is often used as the schema and plays a minor role in the knowledge graph. In general, the number of instance-level statements from knowledge graphs is far larger than that from the ontology.
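
As an illustration of the ideas above, the following minimal Python sketch stores a knowledge graph as a set of subject-predicate-object triples and uses a small schema layer to infer additional types for an instance. It is only a sketch under stated assumptions: the `http://example.org/food/` namespace and every entity and property name under it are hypothetical, while the `rdf:type` and `rdfs:subClassOf` IRIs are the standard W3C ones.

```python
# A toy knowledge graph as subject-predicate-object triples.
# Everything under http://example.org/food/ is a hypothetical placeholder.
EX = "http://example.org/food/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

triples = {
    # Schema-level statements (the ontology used as schema)
    (EX + "Cake", RDFS_SUBCLASS, EX + "Dessert"),
    (EX + "Dessert", RDFS_SUBCLASS, EX + "Food"),
    # Instance-level statements (the bulk of a real knowledge graph)
    (EX + "apple_cake", RDF_TYPE, EX + "Cake"),
    (EX + "apple_cake", EX + "hasIngredient", EX + "apple"),
}


def superclasses(cls, graph):
    """Return every class reachable from cls via rdfs:subClassOf."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if s == current and p == RDFS_SUBCLASS and o not in found:
                found.add(o)
                stack.append(o)
    return found


def inferred_types(entity, graph):
    """Direct rdf:type classes of an entity plus all their superclasses."""
    direct = {o for s, p, o in graph if s == entity and p == RDF_TYPE}
    result = set(direct)
    for cls in direct:
        result |= superclasses(cls, graph)
    return result


types = inferred_types(EX + "apple_cake", triples)
```

Here `types` contains the hypothetical `Cake`, `Dessert`, and `Food` classes, although only `Cake` was asserted directly. This is the kind of schema-based inference that RDF alone does not support.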

Knowledge graph construction, representation, reasoning, and applications

In order to explore knowledge graphs for applications, we first construct the knowledge graph. Based on the constructed knowledge graph, an effective representation is necessary to support further reasoning and applications, such as search and recommendation. The basic pipeline of knowledge graph construction, representation, reasoning, and applications is summarized in Figure 2. More detailed and comprehensive introductions to knowledge graphs, such as knowledge graph creation tools and more application examples, are available elsewhere.

Figure 2

Pipeline of knowledge graph construction, representation, reasoning, and applications

To construct a knowledge graph, a huge volume of data should be processed, including unstructured, semi-structured, and structured data. Later, knowledge graphs can be constructed either manually or automatically, and the latter method mainly includes three components: knowledge extraction, knowledge fusion, and knowledge refinement. Constructed knowledge graphs can be further used for representation learning and reasoning to support various tasks, such as search, recommendation, and question answering.


Construction

Completeness, accuracy, and data quality are three important factors that determine the usefulness of knowledge graphs, and all three are influenced by the way knowledge graphs are constructed. Knowledge graphs can be constructed either manually or automatically. Manual construction methods include curated ones (e.g., Cyc) and collaborative ones (e.g., Wikidata): in the former, triples are created by a closed group of experts, while the latter relies on an open group of volunteers. Manually constructed knowledge graphs have few or no noisy facts. However, they require substantial human effort. As a result, automatic construction methods have been explored and have become mainstream. They can be further grouped into two types. The first utilizes hand-crafted and learned rules to exploit semi-structured data, such as Wikipedia infoboxes, leading to larger yet still highly accurate knowledge graphs such as DBpedia. However, semi-structured text covers only a small fraction of the information stored on the Web, and these repositories are still far from complete. Hence, the second approach extracts facts from unstructured text using machine learning and natural language processing techniques. The Knowledge Vault is one representative project in this category. To reduce the level of “noise” in extracted facts, a large body of research has been conducted on automatic construction, which mainly consists of three components: knowledge extraction, knowledge fusion, and knowledge refinement. Knowledge extraction aims to acquire relevant entities, attributes, and relations from various data sources. Information is collected and normalized, forming knowledge expressions.
Because one entity often has multiple representations, and triples extracted by multiple information extractors from multiple information sources may be inconsistent, knowledge fusion is a necessary step. Its main processes are entity alignment and entity linking: entity alignment judges whether different entities refer to the same real-world object, and entity linking links entities mentioned in text with the corresponding ones in knowledge graphs. After initial construction, different refinement methods, such as entity classification, relation prediction, and anomaly detection, are then utilized to improve the quality of the constructed knowledge graph.
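
The knowledge fusion step can be sketched with a deliberately crude example. The Python code below aligns entity names from two hypothetical sources by normalizing their surface forms and then merges duplicate triples. The normalization heuristic (lowercasing, stripping punctuation, dropping a trailing plural "s", ignoring word order) is an assumption made for illustration; practical entity alignment typically relies on much richer signals.

```python
import re


def normalize(name):
    """Crude surface normalization: lowercase, strip punctuation,
    drop a trailing plural 's', and ignore word order."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    words = [w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words]
    return " ".join(sorted(words))


def align(triples):
    """Map names that normalize to the same key onto the first name seen,
    then merge the resulting duplicate triples."""
    canonical = {}
    fused = set()
    for s, p, o in triples:
        s_id = canonical.setdefault(normalize(s), s)
        o_id = canonical.setdefault(normalize(o), o)
        fused.add((s_id, p, o_id))
    return fused


# Toy triples as they might arrive from two hypothetical extractors.
extracted = [
    ("Apple Cake", "hasIngredient", "Apples"),
    ("apple cake", "hasIngredient", "apple"),
    ("cake, apple", "course", "dessert"),
]

fused = align(extracted)
```

The three extracted triples collapse into two, both attached to the single canonical entity "Apple Cake".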

Representation and reasoning

Effective representation learning for knowledge graphs (namely, knowledge graph embedding) should then be explored based on the constructed knowledge graph. It encodes both entities and relations into a continuous low-dimensional vector space. Different representation learning methods, such as linear models, neural networks, and translation methods, are proposed.³⁵⁻³⁷ Based on the learned feature representation, we can further conduct knowledge graph reasoning to identify errors and infer new conclusions from existing data. New relations among entities can also be derived through knowledge reasoning and in turn can be used to enrich the knowledge graphs. Different reasoning methods, such as rule-based reasoning and neural network-based reasoning, are proposed. Note that neural networks have been widely used for knowledge graph representation and reasoning because of their powerful nonlinear fitting capability.
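
To make the idea of a translation method concrete, the sketch below scores triples in the style of TransE, one well-known translation-based embedding model (named here as an illustrative choice, not one singled out by this review). A triple (h, r, t) is considered plausible when the vector h + r lies close to t. The two-dimensional vectors are set by hand for illustration; in practice they would be learned from the graph.

```python
import math


def transe_score(h, r, t):
    """Euclidean distance ||h + r - t||; smaller means more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


# Hand-set toy embeddings (assumed values, not learned ones).
entity = {
    "apple_cake": [0.9, 0.1],
    "apple": [1.0, 1.0],
    "beef": [-1.0, 0.5],
}
relation = {"hasIngredient": [0.1, 0.9]}

good = transe_score(entity["apple_cake"], relation["hasIngredient"], entity["apple"])
bad = transe_score(entity["apple_cake"], relation["hasIngredient"], entity["beef"])
```

With these vectors, `good` is smaller than `bad`, so the model would rank "apple" above "beef" as an ingredient of the toy "apple_cake" entity. Ranking candidate tails in this way is one route to predicting missing relations.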

Applications

Knowledge graph representation and reasoning can support various tasks, such as relation extraction and entity classification, and real-world applications, such as question answering (QA), information retrieval, and recommender systems. Here, we briefly discuss four critical use cases: search, recommendation, QA, and decision making. For search, the knowledge graph can be used to understand the user’s query intent and so support semantic search, which aims not only to match keywords but also to determine the intent and contextual meaning of the query words a person is using. Semantic search provides more meaningful search results by evaluating the search phrase and finding more relevant results. The knowledge graph enhances semantic search by providing more structured search results and better summaries. With the knowledge graph, the search engine can summarize relevant content around a topic in the form of knowledge cards, including key facts about that particular thing. For example, when users search “apple cake,” the content presented by knowledge cards includes various attribute information (e.g., cuisine, course, main ingredients) and other relevant information. In addition, the knowledge graph can expand the user’s search results via the rich associations among its entities. For example, when the user searches for apple cake, semantic search can return related cooking recipes in addition to basic information. QA has applications in a wide variety of fields, such as chatbots, and answering questions using knowledge graphs adds a new dimension to these fields. As outlined by L. Hirschman and R. Gaizauskas, a knowledge-graph-based QA system answers a natural language question using the information stored in a knowledge graph. The input question is first translated into a formal query language, and this formal query is then executed over the knowledge graph to fetch the answer.
Such systems have been integrated into popular Web search engines like Google and Bing as well as conversational assistants like Siri. For recommendation, systems based on knowledge graphs connect users and items and can integrate multiple data sources to enrich semantic information. Implicit information can be obtained through reasoning techniques over knowledge graphs to improve recommendation accuracy. There are several typical cases of knowledge-graph-based recommendation, such as food recommendation, movie recommendation, and music recommendation. Knowledge graphs can benefit recommendation in three ways: (1) the knowledge graph can introduce the semantic relatedness among items to help find their latent connections and improve the precision of recommended items; (2) the various types of relations in the knowledge graph help to extend a user’s interests and increase the diversity of recommended results; (3) the knowledge graph can bring explainability to recommender systems via the connections between users’ historical records and the recommended items. Decision making is the act of choosing between possible solutions to a problem. Knowledge graphs store expert knowledge from different domains to support highly complex decision making. The medical domain is one representative area in which knowledge graphs are actively used: reasoning on medical knowledge graphs can help doctors to diagnose disease and control errors, forming the basis of a decision support system.
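
The search and QA use cases above can be sketched in a few lines of Python. The example matches triple patterns against a toy graph, assembles a knowledge card for one entity, expands a search to related dishes, and answers one fixed question type. The graph contents, the property names (`course`, `mainIngredient`), and the helper functions are all hypothetical, and the fixed-pattern function merely stands in for the step that translates a natural language question into a formal query.

```python
def match(graph, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for ts, tp, to in graph
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    ]


def knowledge_card(graph, entity):
    """Group all attribute values of an entity by property."""
    card = {}
    for _, p, o in match(graph, s=entity):
        card.setdefault(p, []).append(o)
    return card


def related_dishes(graph, dish):
    """Expand a search to other dishes that share a main ingredient."""
    shared = {o for _, _, o in match(graph, s=dish, p="mainIngredient")}
    return sorted(
        {s for s, p, o in graph if p == "mainIngredient" and o in shared and s != dish}
    )


def answer_main_ingredients(graph, dish):
    """Stands in for 'What are the main ingredients of <dish>?'."""
    return sorted(o for _, _, o in match(graph, s=dish, p="mainIngredient"))


# Toy data for illustration only.
graph = [
    ("apple cake", "course", "dessert"),
    ("apple cake", "mainIngredient", "apple"),
    ("apple cake", "mainIngredient", "flour"),
    ("apple pie", "mainIngredient", "apple"),
]

card = knowledge_card(graph, "apple cake")
related = related_dishes(graph, "apple cake")
answer = answer_main_ingredients(graph, "apple cake")
```

In a production system the pattern matching would usually be delegated to a query language such as SPARQL; the wildcard-based `match` function only mimics its simplest form.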

Search method

We search for articles in which a food semantic data organization (such as food linked data, food ontologies, and food knowledge graphs) is proposed or utilized, using the following electronic databases: IEEE Electronic Library (ieeexplore.ieee.org), ACM Digital Library (dl.acm.org), Science Direct (www.sciencedirect.com), MEDLINE (PubMed, pubmed.ncbi.nlm.nih.gov), and arXiv (arxiv.org). The following descriptors are used as the search strategy for titles and abstracts: (food OR diet OR cook OR nutrition) AND (“linked data” OR ontology OR “knowledge graph” OR “semantic web” OR “semantic network”). Our search strategy is not restricted by publication year or language. We apply the following criteria for the inclusion of studies: (1) at least one food semantic data organization is proposed or utilized in the research; (2) the food semantic data organization is designed specifically for food purposes (such as cooking, diet, recipes, health care, and food production); (3) the construction or the usage of the food semantic data organization is described in detail. The following exclusion criteria are applied: (1) the research is irrelevant to our topic (including work not developed specifically for food domains); (2) no food semantic data organization is proposed or utilized in the research. A total of 167 studies were identified through the searches in the databases. After the removal of nine duplicate studies, 158 unique records remained, from which 83 studies were excluded based on their titles and abstracts because they were considered irrelevant. Later, the authors manually added several relevant studies that were not included in the above databases. In total, 83 studies were reviewed and evaluated in full for eligibility, and 58 met all the criteria adopted for this review and are thus included, as shown in the flowchart in Figure 3.
All food ontologies and knowledge graphs selected in this article are summarized in Tables 2 and 3, and these studies are organized chronologically by year of publication.
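
The Boolean search strategy and the screening counts can be checked with a small Python sketch. The `matches` function is a simplified assumption: it uses case-insensitive substring matching, whereas each database applies its own tokenization and stemming rules. The count of manually added studies is only implied by the reported figures (the text says "several"), under the assumption that the 83 full-text studies consist of the records remaining after screening plus the manual additions.

```python
FOOD_TERMS = ["food", "diet", "cook", "nutrition"]
ORGANIZATION_TERMS = [
    "linked data",
    "ontology",
    "knowledge graph",
    "semantic web",
    "semantic network",
]


def matches(text):
    """(food OR diet OR cook OR nutrition) AND (one organization term)."""
    lowered = text.lower()
    has_food = any(term in lowered for term in FOOD_TERMS)
    has_organization = any(term in lowered for term in ORGANIZATION_TERMS)
    return has_food and has_organization


# Screening counts as reported in the text.
identified, duplicates, excluded = 167, 9, 83
full_text, included = 83, 58

unique_records = identified - duplicates        # reported as 158
after_screening = unique_records - excluded     # records left after title/abstract screening
implied_manual_additions = full_text - after_screening  # implied, not stated in the source
```

For example, `matches("A food ontology for diabetes control")` returns `True`, while a title that mentions only food or only an ontology does not match.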

Figure 3

Flowchart of the study selection process

We identified a total of 167 studies from the databases. We removed nine duplicate studies and excluded 83 irrelevant studies based on titles and abstracts. After several relevant studies were added manually, 83 studies were reviewed and evaluated in full for eligibility, and 58 studies met all the criteria for this review and are thus included.

Table 2

Summary of existing food ontologies

Name	Year	Domain	Purpose
PIPS food ontology⁴³	2005	food and nutrition	providing food nutritional information
Cooking ontology⁴⁴	2006	food and cooking	ontology construction research
FOODS⁴⁵	2008	(Thailand) food and nutrition	food or menu planning for people with diabetes
AGROVOC^a,⁴⁶	2011	agriculture, fisheries, forestry and food	agricultural field terminology reference
Edamam food ontology^b	2012	food, recipes, nutrition, and healthy eating	enabling food-related various applications, like healthy eating and cooking robots
FTTO⁴⁷	2013	food supply chain	supporting modeling of the food supply chain
Open food facts^c	2013	packaged food product information	food product comparison and search
BBC food ontology^d	2014	food, recipes and diets	recipe data publishing
Taaable cooking ontology⁴⁸	2014	food, cooking, and nutrition	personalized cooking
Unified Traveler and Nutrition ontology⁴⁹	2015	food dishes and medicine	healthy food recommendation
Food in open data ontology⁵⁰	2015	general food	creating linked open data datasets
Food ontology knowledge base (from FoodWiki)⁵¹	2015	packaged food	building ontology-driven mobile safe food consumption system
Food product ontology⁵²	2016	packaged food	(Russia) food products and domain data
OFPE^e	2016	food processing	research on food processing
(P O²)⁵³	2016	food processing	research on food production processes with data from different disciplines
RICHFIELDS ontology⁵⁴	2017	general food	food-related integration, retrieval and updating
AFEO^f	2017	viticultural practices and winemaking products	research about food traceability and quality
MESCO⁵⁵	2017	food supply chain	meat supply chain
FoodOn ontology¹⁵	2018	food sources, categories, products, and other facets	increasing global food traceability, quality control, and data integration
HeLiS⁵⁶	2018	food and nutrition	users’ actions and behaviors monitoring
ONS⁵⁷	2019	food and nutrition	nutritional studies
ISO-FOOD¹⁶	2019	food and isotopic	describing isotopic data within food science
Food safety ontology⁵⁸	2019	food safety	QA on food safety
FOBI Ontology⁵⁹	2020	food nutrition and metabolite	food nutrition and metabolite research
SCT⁶⁰	2020	food supply chain	support agricultural food traceability
Seafood ontology⁶¹	2021	seafood	seafood quality control
FEO⁶²	2021	food knowledge about recommendation and explanation	providing users explanations for food recommendation
OFFF⁶³	2021	food and nutrition	fast food nutritional data aggregation

^a The origin of AGROVOC can be traced back to the 1980s, and its linked data version was realized in 2011.

^b https://www.edamam.com/

^c https://world.openfoodfacts.org/data

^d https://www.bbc.co.uk/ontologies/fo

^e http://agroportal.lirmm.fr/ontologies/OFPE

^f http://data.agroportal.lirmm.fr/ontologies/AFEO

Table 3

Summary on existing food knowledge graphs

Name	Year	Ontology	Purpose
Knowledge Graph for the FEW^a	2017	–	data-driven research
Chinese Food Knowledge Graph⁶⁴	2018	∗	healthy diet knowledge retrieval
Foodbar Knowledge Graph⁶⁵	2018	–	small miniature bites or dishes cognitive gastroevaluation
Healthy Diet Knowledge Graph⁶⁶	2019	∗	healthy diet management and recommendation
AgriKG⁶⁷	2019	∗	agricultural entity retrieval and QA
FoodKG⁶⁸	2019	WhatToMake ontology	food recommendation
Food Safety Knowledge Graph⁵⁸	2019	food safety ontology	QA system for the food safety domain
Food Knowledge Graph with Dietary Factors and Associated Cardiovascular Disease⁶⁹	2020	–	identifying dietary factors associated with cardiovascular disease
Food Spot-check Knowledge Graph⁷⁰	2020	food safety ontology∗	food spot-check QA system
Food Knowledge Graph (from World Food Atlas Project)⁷¹	2021	FoodOn ontology	supporting healthier and more enjoyable diets
RcpKG⁷²	2021	–	personalized recipe recommendation

Dash (–) indicates unknown ontology, and asterisk (∗) indicates that the food ontology is specially constructed for the corresponding food knowledge graph.

^a https://mospace.umsystem.edu/xmlui/handle/10355/62663


Development of the food knowledge graph

Knowledge graphs allow for potentially interrelating arbitrary entities with each other from various domains. When focusing on the field of food, they become food knowledge graphs. Before delving into food knowledge graphs, we first introduce the development of food ontology, since food ontology plays an important role in the development of food knowledge graphs. In addition, we also give some discussions on other forms of food knowledge organization, such as food-oriented linked data.

Food ontology

Food ontology uses shared terminology for the types, properties, and relations of food concepts, and can thus help tackle data harmonization problems that span food-relevant domains. Table 2 summarizes existing food ontologies from different aspects, some of which have been introduced by previous work. Different food ontologies focus on different aspects of food and cover different sub-domains. We divide them into the following five types: (1) cooking and recipe ontologies, (2) nutrition and health ontologies, (3) food safety ontologies, (4) other food sub-domain ontologies, and (5) more general and comprehensive food ontologies.

Cooking and recipe ontologies

Taaable, Cooking ontology, Edamam food ontology, and BBC food ontology are about cooking and recipes. Cooking ontology is one of the earliest cooking-oriented ontologies. It mainly contains actions, foods, recipes, and utensils. In the cooking ontology, a recipe is organized into phases of the cooking process, where each phase is an ordered sequence of tasks, and each task consists of an action together with information about the ingredients it needs, the ingredients it produces, and its duration. Recipes also have a classification, an ingredient list, and required utensils. Cooking ontology aims to enrich cooking-oriented QA by being integrated into a dialogue system. Edamam food ontology aims to support the creation of a comprehensive and authoritative food knowledge base on cooking information. To do this, Edamam extracts recipes from websites and maps the recipe terms to professional industry databases to eliminate duplicates and ambiguity. It already supports search applications on mobile platforms and Web pages that provide consumers with food knowledge such as ingredients, nutrition information, and allergies. In contrast, BBC food ontology is a simple lightweight ontology for publishing data about recipes, including the foods they are made from and the foods they create as well as the diets, menus, seasons, courses, and occasions they may be suitable for. These ontologies facilitate cooking recipe-based work, such as mining, retrieval, and recommendation.
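
The recipe structure described above for the cooking ontology (recipe, phases, ordered tasks, actions, needed and produced ingredients, durations, utensils) can be pictured as a small data model. The following Python sketch is only an illustration under our own assumptions: the class and field names (`Task`, `Phase`, `Recipe`, `duration_min`) and the sample soup recipe are hypothetical and are not taken from the ontology itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """One step of a phase: an action plus its inputs, outputs, and duration."""
    action: str
    needed_ingredients: List[str]
    produced_ingredients: List[str]
    duration_min: float


@dataclass
class Phase:
    """A phase of the cooking process: an ordered sequence of tasks."""
    name: str
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Recipe:
    """A recipe with its classification, ingredient list, utensils, and phases."""
    title: str
    classification: str
    ingredients: List[str]
    utensils: List[str]
    phases: List[Phase] = field(default_factory=list)

    def total_duration(self) -> float:
        # Sum the durations of all tasks across all phases.
        return sum(t.duration_min for p in self.phases for t in p.tasks)


# Hypothetical example instance (not from any real dataset).
soup = Recipe(
    title="tomato soup",
    classification="soup",
    ingredients=["tomato", "water", "salt"],
    utensils=["knife", "pot"],
    phases=[
        Phase("preparation", [Task("chop", ["tomato"], ["chopped tomato"], 5.0)]),
        Phase("cooking", [Task("boil", ["chopped tomato", "water", "salt"], ["soup"], 20.0)]),
    ],
)
```

A dialogue system could, for instance, answer "how long does it take?" from `soup.total_duration()`, or list the required utensils from `soup.utensils`.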

Nutrition and health ontologies

Some food ontologies focus on health and nutrition concepts, which allows them to support health advice and monitoring in various food applications. For example, the Personalized Information Platform for Health and Life Services (PIPS) food ontology provides nutritional advice for diabetic patients. It presents an abstract model of different types of foods with nutritional information, including the type, amount, and recommended daily intake of nutrients, with a total of 177 classes, 53 properties, and 632 instances. Similarly, the Food-Oriented Ontology-Driven System (FOODS) is also designed to provide diet advice for diabetic patients. In contrast to PIPS, this ontology covers more aspects and concepts, such as patients’ personal situations and characteristics of foods (such as food specifics and flavors). FOODS can thus provide more personalized and suitable diet recommendations for diabetic patients. Unlike the above food ontologies, which serve special populations, the unified Traveler and Nutrition ontology supports food recommendation to help general tourists make personalized food-related choices and develop a healthy food plan. This recommendation system is required to give recommendations by combining various factors, from the food itself to the cultural requirements of tourists and regions of interest. Therefore, besides food nutrition, this food ontology also integrates various types of concepts covering dishes, people, and medical conditions to support more comprehensive dietary recommendation. HeLiS was created to monitor both users’ actions and their unhealthy behaviors by representing both the food and physical activity domains. Besides covering concepts from activities to nutrients in foods, HeLiS also introduces the user concept, and can thus associate specific health-related events with people for health monitoring or further nutrition applications.
In contrast, the FoodWiki ontology was developed for the packaged food products on market shelves. It collects the nutrition content and provides packaged food recommendation while avoiding the impact of unhealthy or allergic ingredients on consumers. Later, FoodWiki was further developed to build an ontology-driven mobile safe food consumption system for monitoring food intake. In addition, there are some ontologies, such as ontology for nutritional studies (ONS) and ontology of fast food facts (OFFF) for food nutritional science study. For example, ONS is presented to facilitate the integration of different terminologies from different sub-disciplines in dietary and nutritional research and finally supports nutritional studies.

Food safety ontologies

Some ontologies are developed for the food safety domain, where food traceability is the main concern. For example, the Food Track&Trace Ontology (FTTO) was developed for food traceability. It contains representative food concepts involved in a supply chain and is able to integrate and connect the main features of the food traceability domain. The Supply Chain Traceability (SCT) ontology targets the agri-food supply chain, where the form of critical tracking events (CTEs) is unified to support agriculture and food traceability from logistics to production lines. Some ontologies are developed for specific food categories. Considering that food processing industries employ different quality control systems to check the quality of seafood, the developers of the seafood ontology created concepts and examples specific to seafood, such as the various kinds of freezing in processing (blanched frozen, cooked frozen, and uncooked frozen). The Meat Supply Chain Ontology (MESCO) is tailored to the meat supply chain. In MESCO, the concepts in the meat supply chain are specially designed so that the attributes of different meat products match their different processing procedures and safety traceability methods, which supports better meat supply chain management. There are also ontologies, such as the food safety ontology, that focus on the public issue of food safety and are built to support a food safety knowledge graph. The food safety ontology is built from data on unqualified food collected from Web resources. It organizes concepts about food, food hazards, and food inspection items together, and then maps them to the Hazard Analysis and Critical Control Points (HACCP) system for food production.

Other food sub-domain ontologies

There are also ontologies that focus on other food sub-fields to better promote food science and industry. For example, Ontology for Food Processing Experiment (OFPE) can describe the transformation process from raw materials to products for food processing experiments. It includes different classes that represent products and operations during food transformation processes, which can be classified into four main concepts: product, operation, attribute, and observation. ISO-FOOD was developed for sharing and researching isotope food data. In ISO-FOOD, the factors that are related to isotope are recorded as attributes and different sources of food data are unified and integrated under the standards from ISO-FOOD, so that the research and application of isotope in food chemistry can be promoted. Food-Biomarker (FOBI) ontology was designed to integrate nutritional and metabolomic data to support nutritional research because nutrition research has a strong correlation with food intake evaluation and diet habits. Thus FOBI defines concepts and relations between both foods and metabolites. Its development improves the reusability of nutritional and nutrimetabolomic data. Similar sub-domain-oriented ontologies also include Process and Observation Ontology (PO²) and Agri-Food Experiment Ontology (AFEO) for food processing, Food Processing Chain Ontology (Onto-FP) for wine-making, and Food Explanation Ontology (FEO) for generating the explanation for food recommendation.

More general and comprehensive food ontologies

There are also some ontologies with broader concepts, like FoodOn and the RICHFIELDS ontology. For example, FoodOn is an open-source, comprehensive food ontology resource composed of various term hierarchy facets from ingredients to packaging and cooking. FoodOn allows food product terms to be defined directly in the ontology and introduces relation descriptions like “has ingredient,” “has part,” and “derives from,” which make it convenient to describe the containment relations that are specific to food products. FoodOn acts as an interface with more food-specific domain ontologies, like food packaging, food nutrition, and food processing. Its knowledge of both food and food processing is comprehensive enough to drive various applications, such as food safety, farm-to-fork traceability, and risk management. Figure 4 shows a simplified example about apple food products in FoodOn. It describes the relations among food sources like apple tree and pome fruit plants, different kinds of food products like apple pie and caramel apple, and related food processes like food baking processes and food coating or covering processes. Moreover, considering that different food ontologies are developed for different application scenarios, these existing food ontologies can be integrated and reused to provide wider coverage of food concepts or serve more general purposes. For example, FoodOntoMap was constructed to link these food ontologies, so that food concepts of different ontologies can be normalized by mapping them to a unified system. Thus, FoodOntoMap can be considered as a general food ontology for further studies in different areas like diseases, human health, or the environment.
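
To make the FoodOn-style relations more concrete, the following Python sketch stores a few subject-relation-object triples about apple products and follows the containment and derivation relations transitively. It is a toy illustration under stated assumptions: the relation names “has ingredient,” “has part,” and “derives from” come from the text, whereas the individual edges, the `is a` and `output of` labels, and the helper names `build_index` and `trace_sources` are hypothetical and do not reproduce the actual FoodOn axioms or Figure 4.

```python
from collections import defaultdict

# Hypothetical triples loosely inspired by the apple example in the text.
TRIPLES = [
    ("apple pie", "has ingredient", "apple"),
    ("caramel apple", "has ingredient", "apple"),
    ("apple", "derives from", "apple tree"),
    ("apple tree", "is a", "pome fruit plant"),
    ("apple pie", "output of", "food baking process"),
    ("caramel apple", "output of", "food coating or covering process"),
]


def build_index(triples):
    """Index objects by (subject, relation) for fast lookup."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[(subject, relation)].append(obj)
    return index


def trace_sources(index, product,
                  relations=("has ingredient", "has part", "derives from")):
    """Follow containment and derivation edges transitively from a product."""
    found, stack = [], [product]
    while stack:
        node = stack.pop()
        for relation in relations:
            for obj in index.get((node, relation), []):
                if obj not in found:
                    found.append(obj)
                    stack.append(obj)
    return found


index = build_index(TRIPLES)
sources = trace_sources(index, "apple pie")  # ["apple", "apple tree"]
```

A traversal of this kind is the basic operation behind farm-to-fork traceability queries, where a product is traced back to the food sources it was derived from.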

Figure 4

A simplified structure of FoodOn

In this example, the relations among food sources, products, and related food processes of apples are described. Different entities are shown in different colors according to their classes. These entities are linked by different relations with different colors according to the type of relations.

In summary, food ontologies formally describe food types, their properties, and interrelations between food entities. However, these food ontologies generally lack detailed information about more food instances. For these reasons, food knowledge graphs are developed with both food ontology and specific food-relevant instances, where food ontology is generally considered as the schema.

Food knowledge graph

The proliferation of food-relevant instances, such as recipes and nutrition from various sources, presents an opportunity for discovering and organizing food-related knowledge into the food knowledge graph. We divide food knowledge graphs into four different types, including (1) knowledge graphs about recipes, (2) knowledge graphs about nutrients and health, (3) knowledge graphs about food safety, and (4) general food knowledge graphs. Table 3 lists constructed food knowledge graphs.

Knowledge graphs about recipes

Some food knowledge graphs are mainly built based on recipe entities extracted from the crowdsourced consumer review sites, recipe-sharing websites, and social media to support recipe-related applications. Foodbar knowledge graph contains more types of information, such as ratings and consumers’ opinions from different restaurants and bars. It extracts the above information from BEDCA and CookBook of Wikidata, and links to users, points of interest, cultural facts, and so on. Based on this, Foodbar knowledge graph can be used for recommending miniature food according to the given user preference or providing food-relevant descriptive analytics services. Lei et al. further introduce social relationships into the food knowledge graph. They construct a multimodal and hierarchical recipe knowledge graph (RcpKG). In RcpKG, the users’ demands are converted to nodes and modeled with specific hierarchical structures. Thus, it can link profiles of different users and give reliable recipe recommendations based on both personal preferences and social relationships. Its recipe data are from popular recipe websites (e.g., Yummly and AllRecipes) and datasets (e.g., Recipe1M+).

Knowledge graphs about nutrient and health

FoodKG is a large-scale and unified food knowledge graph that brings together food ontologies, recipes, ingredients, and nutritional data. In particular, as shown in Figure 5, it integrates FoodOn into its WhatToMake ontology, and contains recipe and nutrient instances extracted from Recipe1M+ and nutrient records from the United States Department of Agriculture (USDA). Such a food knowledge graph with more comprehensive recipe and nutrition information can support many applications, such as recipe recommendation, ingredient substitutions, and QA.

Figure 5

The structure of FoodKG

There are different instances of the FoodKG in the bottom left of the figure. FoodKG adopts the WhatToMake ontology as its ontology from several sources, such as FoodOn. Besides, instances in FoodKG are associated with nutrition data from the USDA ingredient nutrient database (the orange block at the top left) to support food recommendations with rich nutritional parameters.

The Chinese food knowledge graph and healthy diet knowledge graph focus on food and medicine, especially ingredient and nutrient knowledge of Chinese food and Chinese medicine. Machine learning algorithms are used to extract information from Chinese health food websites and Chinese food composition tables, and each graph has its own specially constructed ontology containing food-related concepts and relations. The Chinese food knowledge graph basically supports semantic search, and the healthy diet knowledge graph further supports more healthy diet applications, like QA and food recommendation. Recently, there has been some work on the relation between diet and disease. For example, Milanlouei et al. developed a knowledge graph of dietary factors associated with cardiovascular disease. To create this knowledge graph, they collected and filtered papers from PubMed that studied the association between dietary factors and cardiovascular complications. They finally used 292 associations from 91 papers to construct the knowledge graph, and the environment-wide association study (EWAS) approach was applied to discover relations between multiple types of diet and cardiovascular disease.

Knowledge graphs about food safety

Food safety knowledge graph and food spot-check knowledge graph mainly concern food safety issues. Food safety knowledge graph contains the data of unqualified foods officially released in recent years from the Internet. Based on this knowledge graph, an intelligent food safety-oriented QA system was built to help people get the information of unqualified foods. Similarly, Qin et al. obtained food spot-check data from official websites of China’s national food quality supervision and inspection center and China’s food and drug administration, extracted food spot-check information, and constructed a food spot-check knowledge graph. A QA system is also provided based on the knowledge graph.

General food knowledge graphs

Some food knowledge graphs cover more types of food-related knowledge from broader fields. One of them is the knowledge graph for food, energy, and water (FEW). It extracts a vast amount of available data from USDA, the National Oceanic and Atmospheric Administration (NOAA), the United States Geological Survey (USGS), and the National Drought Mitigation Center (NDMC) to support data-driven research for the FEW domain. Another one is the agricultural knowledge graph (AgriKG), an agriculture domain-specific knowledge graph covering raw food materials and food products. Their agriculture data are extracted from sources like Wikidata, and the fragmented information is integrated for agriculture-relevant applications. In contrast, the World Food Atlas Project was built to include a wider range of food concepts. It is a project that can aggregate and unify food-related information from multiple offline and online sources in the world. To achieve this, researchers developed a food knowledge graph that uses FoodOn as its ontology to collect foods, ingredients, and their relations from multiple sources. Later they develop the FoodLog Athl and the RecipeLog, two mobile applications for collecting diet records as dietary knowledge. Although still in the early stages, the combination of these two works will help explore the relations among food, culture, and personal health, and promote regional food culture research. Considering food knowledge graph construction needs a lot of laborious work, there are not many constructed food knowledge graphs in the academic field. In contrast, because of its vital importance in the food business, many companies, such as Uber, Meituan, and Yummly, have constructed their food knowledge graphs to drive many products and make them more intelligent from different specific domains. For example, Uber Eats builds a food knowledge graph to enable food-related retrieval and recommendation. 
In this food knowledge graph, the nodes consist of different entities, such as restaurants, cuisines, and menu items, and different relations are constructed as edges, such as the association between cuisines and location information. Edamam developed an extensive knowledge graph on food and cooking, including recipes, ingredients, nutrition information, measures, and allergies. The goal of this food knowledge graph is to offer users multiple ways of searching to enable better food choices.

To construct a food knowledge graph effectively, one common method is to combine extractions from Web content with domain knowledge from existing knowledge repositories. A semi-automatic approach that combines machine learning methods with manual effort is usually adopted. Generally, the first step is to construct the food ontology. One effective method is to reuse existing food ontologies. For example, FoodKG adopts the ontology on food products from FoodOn as its ontology. In some cases, existing ontologies do not cover what the target project requires, and building a food ontology from scratch is thus necessary. The most widely used ontology construction method is to combine top-down and bottom-up approaches, where the former starts with defining the classes for the more general concepts in the domain and continues by defining the subclasses, and the latter starts with a definition of more specific concepts in the domain as subclasses and continues by grouping these classes into more general concepts, as in the wine ontology. As shown in Table 3, for food knowledge graphs like the Chinese food knowledge graph, AgriKG, and the food spot-check knowledge graph, the ontologies are specially constructed from extracted data. After food ontology construction, more information on instance items, such as food entities and their relations,⁸⁸⁻⁹⁰ should be extracted from various sources and added into the food ontology for food knowledge graph construction. There are also some food-oriented relation extraction models, like SAFFRON for food-disease relation extraction and FoodChem for food-chemical relation extraction.

Discussion

Besides food ontology and knowledge graphs, as mentioned in the history of the knowledge graph, linked data are another way of organizing food knowledge. Some representative food linked data have also been proposed, and they play roles in food science and industry. Among these works, AGROVOC, FOODpedia, and Open Food Facts are three representative linked data sources. For example, AGROVOC is considered the largest food-linked data source about food and agriculture for the public, which has been coordinated by the Food and Agriculture Organization of the United Nations (FAO) since the early 1980s. AGROVOC introduces concepts to represent almost everything in food and agriculture and consists of over 39,500 concepts and 924,000 terms in up to 41 languages (October 2021). Besides, there are some thesauri that are not specifically designed for the food domain but include terms for food and health classes, like the food class in DBpedia and the food and drinks class in SNOMED Clinical Terms. Sometimes, it is difficult to define what category food linked data actually belong to. Some food-linked datasets define concepts and relations and use them to describe and represent the food domain. From this aspect, they can play the role of a food ontology. On the other hand, despite relatively limited application scenarios, food-linked data may contain a large number of entities and organize them like a knowledge graph. For similar reasons, some food ontologies can be considered as food knowledge graphs, because they contain not only classes and their relations in a schema but also real-world instances, their properties, and relations, according to the definition of knowledge graphs. For example, FoodWiki and FOODS contain not only the food ontology but also product instances, their properties, and relations. This indicates that the boundary between food-linked data, food knowledge graphs, and food ontology is vague in some cases.

Applications of food knowledge graphs

As illustrated in Figure 6, representative applications of food knowledge graphs in food science and industry are identified and summarized from the following seven aspects. Considering food ontology is one important part of food knowledge graphs, we will discuss their applications in this section together.

Figure 6

Applications of food knowledge graphs

Representative applications of food knowledge graphs are shown: new recipe development, food question-answering systems, diet-disease correlation discovery, visual food analysis, personalized dietary recommendation, food supply chain management, and food machinery intelligent manufacturing. FKG, food knowledge graph.


New recipe development

The research and development of new food products is one important part of the food industry. Food knowledge graphs can be utilized to develop new products via effective knowledge organization and their powerful inference ability. Developing new recipes is one representative application of food knowledge graphs. For existing recipes, we can resort to food knowledge graphs to find alternative ingredients that satisfy given requirements or to develop new flavors. We can also develop novel culinary recipes, including not only their ingredient combinations but also their ingredient proportions and the duration of each step, by combining the constructed food knowledge graph with mathematical models. First, we can use the automatic inference abilities of food knowledge graphs to discover alternative ingredients. Shirai et al. developed a heuristic method to rank ingredient substitutions based on the similarity of the properties of ingredients and the similarity of the latent semantics of ingredient names in FoodKG. In their method, FoodKG is utilized as the source of latent semantics in the form of word embeddings, since it includes abundant linked information about nutrition, ingredients, and recipes, while Word2Vec is utilized as the word-embedding model. Considering that suitable substitute ingredients will have similar word embeddings, cosine similarity is used to score and rank candidate substitutions. For the food industry, such a method can help discover ingredient substitutions for existing products and reduce food production costs. For consumers, such food-knowledge-graph-based ingredient substitution methods can give alternatives to specific recipes to meet their personalized needs. When integrating more comprehensive domain knowledge, food knowledge graphs can give more personalized alternative ingredients based on more factors, including not only ingredients but also health indexes like glycemic index and glycemic load.
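
The embedding-based part of this substitution idea can be sketched in a few lines of Python. The example below only illustrates ranking candidates by cosine similarity. It does not reproduce the heuristic of Shirai et al., which also considers ingredient properties. The three-dimensional vectors in `TOY_EMBEDDINGS` are made-up values standing in for Word2Vec embeddings, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_substitutes(target, embeddings, top_k=3):
    """Rank all other ingredients by similarity to the target ingredient."""
    query = embeddings[target]
    scored = [
        (name, cosine(query, vector))
        for name, vector in embeddings.items()
        if name != target
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up embeddings (assumption): real ones would be learned from the graph.
TOY_EMBEDDINGS = {
    "butter": [0.9, 0.1, 0.3],
    "margarine": [0.85, 0.15, 0.35],
    "olive oil": [0.6, 0.2, 0.7],
    "lemon juice": [0.05, 0.9, 0.2],
}

ranking = rank_substitutes("butter", TOY_EMBEDDINGS)
# Candidate names in ranked order: margarine, olive oil, lemon juice
```

In a fuller system, the similarity score could be combined with constraints such as glycemic index or allergen information before a substitute is proposed.
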
The food knowledge graph can also facilitate the development of new recipes. Generally, special flavors of foods come from the mixing or the interaction of several food components during cooking, while the specific mechanism inside may not be explored clearly. However, correlations of food components can be discovered statistically with co-occurrence information. Considering that the food knowledge graph can organize recipes and chemical components in a similar way, food manufacturers can use food knowledge graphs to discover well-matched latent ingredient combinations. Ahn et al. introduce a flavor network to capture the shared flavor compounds in various ingredients. They constructed a bipartite network to link about 400 ingredients and over 1,000 flavor compounds and then project it to a flavor network, where ingredients sharing the same flavor compounds are connected, and the weight of each link depends on the number of their shared flavor compounds. Then they used recipes from American repositories to analyze ingredient combinations in different regions, including popular and unpopular ones. Some ingredient combinations may only be popular in some regions, or combinations are feasible but not being tried yet. Applying these ingredient combinations to existing recipes is an effective way to develop new recipes, which may bring new flavors and genres. In addition, the knowledge graph can also be used directly to develop new recipes. For example, Pinel et al. constructed a food knowledge graph to organize data of recipes and ingredients, and generate recipes that fit the requirements of users based on the constructed knowledge graph. Later, their algorithm selects the best ingredient combination and proportions by novelty and quality evaluators. Recipe steps are then generated using a subgraph composition algorithm, and the time duration of each step is estimated from known complete recipes. 
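
The flavor-network construction attributed to Ahn et al. above, in which a bipartite ingredient-compound network is projected onto ingredients and each link is weighted by the number of shared compounds, can be sketched as follows. The ingredient names and compound identifiers (`c1` to `c6`) are invented placeholders, and `project_flavor_network` is a hypothetical helper name. The sketch shows only the projection step, not their regional recipe analysis.

```python
from itertools import combinations


def project_flavor_network(ingredient_compounds):
    """Project a bipartite ingredient-compound mapping onto ingredients.

    Returns a dict mapping each ingredient pair (sorted alphabetically)
    to the number of flavor compounds the two ingredients share.
    Pairs with no shared compound are not linked.
    """
    edges = {}
    for a, b in combinations(sorted(ingredient_compounds), 2):
        shared = ingredient_compounds[a] & ingredient_compounds[b]
        if shared:
            edges[(a, b)] = len(shared)
    return edges


# Placeholder data (assumption): compound identifiers are not real chemicals.
TOY_COMPOUNDS = {
    "tomato": {"c1", "c2", "c3"},
    "parmesan": {"c2", "c3", "c4"},
    "strawberry": {"c1", "c5"},
    "garlic": {"c6"},
}

flavor_edges = project_flavor_network(TOY_COMPOUNDS)
# {("parmesan", "tomato"): 2, ("strawberry", "tomato"): 1}
```

Heavily weighted pairs that rarely co-occur in existing recipes are natural candidates for untried ingredient combinations.
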
Besides the above-mentioned applications, food knowledge graphs also show considerable promise for other aspects of improving recipes. For example, since most foods already existed before the knowledge graphs were developed, it cannot be ignored that some of these foods contain unhealthy (even hazardous) ingredients or pairings. Linked with toxicology knowledge, a food knowledge graph can assess the toxicity risk of a specific recipe from its components. When food processing knowledge is integrated, it can even assess, by reasoning, the potential risk of a specific recipe during processing and suggest substitute ingredients or substitute processing steps. All of these works show that food knowledge graphs can make new recipe development possible, and that exploring food knowledge graphs provides a more efficient and reliable method of new recipe development for the food industry.

Food QA system

The QA system via food knowledge graphs can help people analyze information and potential problems, and answer food-relevant questions about different food sub-domains, such as nutrition and disease, and food safety. For example, diabetics often ask questions like, “How can I increase the fiber content of this cake?” A person with lactose intolerance may ask, “What can I substitute for milk in chocolate cake?” Answering these questions is not possible from general knowledge graphs because of the incompleteness of their domain knowledge. Food knowledge graphs can be developed to support natural-language QA based on different categories of questions about recipes and nutrition, such as simple queries for nutritional information, comparisons of nutrients from different foods, and constraint-based queries to find recipes matching certain criteria. Food-knowledge-graph-based QA systems can also describe recipes, nutrients in foods, and the interaction between nutrients and prescribed drugs, disease, and general health to satisfy users’ specific information needs. For example, cooking QA is intended to satisfy the user’s information needs in the cooking domain, and is helpful to people such as housewives and nutritionists. FoodKG organizes diets, nutrients, and food types together, which can be leveraged by a QA system in the food field. Such a system takes natural language questions as the input, and generates answers from the information stored in FoodKG. The questions it can answer can be roughly divided into three categories: simple questions, which directly ask about the ingredients of a certain food; comparative questions, where, given some conditions, the system selects the more suitable food; and restricted questions, where, given restrictions on ingredients or types of food, the system provides qualifying foods.
When a question arrives, the system decides the question type, detects the mentioned topic entity, and then uses a knowledge base question answering (KBQA) model to retrieve answers from FoodKG. The system is also enriched by user preferences to improve personalized QA. In addition, some QA systems via food knowledge graphs have been developed for the issue of food safety. For example, Qin et al. constructed a food QA system based on the food safety knowledge graph that collects officially published food data from the Internet. In their QA system, users’ questions are first parsed, and every food and attribute is mapped to entities and relations in the food safety knowledge graph. Questions in natural language are then converted to SPARQL query statements by template matching so that questions related to food safety can be understood and answered by the machine. When food data and templates change, this workflow can also be applied to answer different types of questions. For example, a food spot-check knowledge graph is utilized in a similar way to provide food spot-check data QA.
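
The template-matching step described above (parse the question, map the food and the attribute to an entity and a relation, then fill a SPARQL template) can be sketched as follows. This is a minimal illustration and not the system of Qin et al.: the question pattern, the `ex:` namespace, the entity and relation identifiers, and the function name `question_to_sparql` are all hypothetical.

```python
import re

# Hypothetical lookup tables from surface forms to graph identifiers.
ENTITY_MAP = {"brand x milk powder": "ex:BrandXMilkPowder"}
RELATION_MAP = {
    "producer": "ex:producer",
    "unqualified item": "ex:unqualifiedItem",
}

# Each template pairs a question pattern with a SPARQL skeleton.
TEMPLATES = [
    (
        re.compile(r"what is the (?P<attr>.+) of (?P<food>.+)\?", re.IGNORECASE),
        "PREFIX ex: <http://example.org/food#>\n"
        "SELECT ?answer WHERE {{ {entity} {relation} ?answer . }}",
    ),
]


def question_to_sparql(question):
    """Return a SPARQL query string for the question, or None if no template fits."""
    for pattern, template in TEMPLATES:
        match = pattern.fullmatch(question.strip())
        if not match:
            continue
        entity = ENTITY_MAP.get(match.group("food").strip().lower())
        relation = RELATION_MAP.get(match.group("attr").strip().lower())
        if entity and relation:
            return template.format(entity=entity, relation=relation)
    return None


query = question_to_sparql("What is the producer of Brand X milk powder?")
```

Supporting a new question type in this scheme amounts to adding a pattern and a query skeleton to `TEMPLATES`, which mirrors the remark that the workflow carries over when the food data and templates change.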

Diet-disease correlation discovery

Research on diet, disease, and the modeling of their correlations has always been an important aspect of food science and nutrition. It has already been shown that there are inevitable connections between chronic diseases and certain diet styles.¹⁰⁰⁻¹⁰² Some studies also show that diseases without effective treatments (like neurodegenerative diseases) are associated with certain foods, which provides potential opportunities to prevent diseases or delay disease progression. Food knowledge graphs can help here because, by constructing them, we can build connections among diseases, diets, food, raw food materials, and chemical components, and then analyze their correlations in more depth. Jensen et al. built a system called NutriChem, a resource covering the broad molecular content of food and collecting exhaustive resources on the health benefits associated with specific dietary interventions. NutriChem contains 18,478 pairs of 1,772 plant-based foods and 7,898 phytochemicals, and 6,242 pairs of 1,066 plant-based foods and 751 diseases. In addition, it includes predicted associations for 548 phytochemicals and 252 diseases. All of these data are generated by mining 21 million MEDLINE abstracts for information that links plant-based foods with their small molecule components and human disease phenotypes. To organize these data, they introduced an ontology that integrates the taxonomy from NCBI taxonomy, Plants For A Future (PFAF), and the Danish Food Composition Databank. The relations in the ontology are built using Fisher’s exact test. NutriChem allows us to integrate established relations among food, compounds, and diseases in a more comprehensive way. Therefore, we can easily understand the role of certain types of foods, and even infer which types of food are harmful or beneficial. This provides a foundation for understanding mechanistically the consequences of eating behaviors on health. Nian et al.
investigated relations between food and neurodegenerative diseases in a similar way. They collected biomedical annotations from over 4,000 publications and created the knowledge graph. Later, the node2vec algorithm was used to train graph embeddings for clustering similar concepts and distinguishing different concepts. In the constructed knowledge graph, disease nodes and diet nodes are connected if they are relative, and their weights are determined according to the strength of the relevance. Based on this, they found that some food-related species and chemicals coming from the diet have a strong impact on neurodegenerative diseases. Similar work includes the biochemical knowledge graph, a comprehensive source of knowledge for integrating biochemical knowledge and accelerating discovery in biochemical sciences, whose information is extracted and mined from biochemical literature. Some online platforms, such as DietRx (cosylab.iiitd.edu.in/dietrx/) can also collect the food-disease associations from MEDLINE abstracts, which can be used for exploring the interrelationships among food, chemicals, diseases, and genetic mechanisms. These leading-edge studies have proved the feasibility of the food knowledge graphs discovering the food-disease interaction. Thus, constructing a knowledge graph with diseases and food composition can be expected to analyze more general chemical component-disease relationships, generate novel insights, and even explore potential disease prevention strategies by further designing certain diets.
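
As an illustration of how a literature-mined association can be scored with Fisher’s exact test, the sketch below computes a one-sided p-value for a 2x2 co-mention table. The layout of the table (counts of abstracts mentioning a food, a disease, both, or neither) and the function name `fisher_exact_greater` are assumptions made for this example; the exact procedure used by NutriChem may differ.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Assumed layout (hypothetical, for illustration only):
      a: abstracts mentioning both the food and the disease
      b: abstracts mentioning the food only
      c: abstracts mentioning the disease only
      d: abstracts mentioning neither
    Returns P(X >= a) under the hypergeometric null of no association.
    """
    row1 = a + b          # abstracts mentioning the food
    col1 = a + c          # abstracts mentioning the disease
    n = a + b + c + d     # all abstracts
    total = comb(n, col1)
    upper = min(row1, col1)
    favorable = 0
    # Sum the hypergeometric probabilities of tables at least as extreme.
    for k in range(a, upper + 1):
        favorable += comb(row1, k) * comb(n - row1, col1 - k)
    return favorable / total
```

For the small table `fisher_exact_greater(3, 1, 1, 3)` the result is 17/70 (about 0.243), so such a co-mention pattern would not count as a significant association; a relation would only be kept when the p-value falls below a chosen threshold.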

Visual food analysis

Rapidly and reliably detecting and analyzing food product quality and safety (e.g., for meat products, cereal products, fruits, and vegetables) in a non-destructive way is significant for the food industry. Along with the development of AI, AI-augmented food analysis has become a new trend. Such methods use machine learning algorithms to process data from sensors (such as spectral and chromatographic data). Vision-based food analysis using image sensors is often preferred for its non-destructive nature. Among all visual food analysis methods, visual food recognition is a basic task. Automatic food recognition can replace manual grading and quality inspection, and can also serve as a basic step for various applications, such as food log systems and suggester systems. Knowledge graphs can be vectorized by machine learning algorithms to support visual food object recognition. Rich knowledge from food knowledge graphs, such as ingredients and their relations, has been explored to improve the performance of visual food recognition [109, 110, 111]. For example, Chen et al. leveraged multiple relations among ingredients for ingredient recognition. They constructed a multi-relational knowledge graph to describe the ingredient relations and developed a graph model called the multi-relational graph convolutional network (mRGCN) for zero-shot ingredient recognition from dishes, namely, recognizing ingredients that the model has not seen. In mRGCN, the food knowledge graph is introduced as prior knowledge because it contains a large amount of recipe data, which provides the probability of coexistence between ingredients and the probability of a food containing a certain ingredient. With ingredient recognition enhanced by food knowledge graphs, mRGCN can also predict the dish category.
Introducing the ingredient knowledge graph yields a performance improvement of 9.7%, and mRGCN achieves a 24.2% top-1 hit rate for unseen ingredients on the VIREO Food-172 dataset, where the top-1 hit rate measures the percentage of most probable predictions that match the ground-truth labels. Based on the recognized dish type, we can also resort to the food knowledge graph to obtain more detailed information about it (such as properties, macronutrients, and ingredients) to realize automatic dietary assessment. So far, no work has used food knowledge graphs to handle more complex visual food analysis tasks such as food object detection and segmentation. Compared with food recognition, food detection additionally provides the localization of the recognized food item, and food segmentation is the process of assigning a food label to every pixel in a food image. However, it has been shown that an object detection framework can integrate external knowledge from a knowledge graph to improve its performance because some combinations of objects are more common than others. Knowledge-augmented image segmentation has also been implemented in medical scenarios. We believe similar methods can be applied in food settings to improve the performance of complex visual food analysis tasks. For example, we can use the food knowledge graph to enhance the performance of food segmentation, which can further help dietary assessment.
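
The ingredient coexistence probabilities that recipe data can supply as prior knowledge, as described for mRGCN, can be estimated by simple counting. The sketch below is an assumption-laden illustration rather than the procedure of Chen et al.: the function name `cooccurrence_probabilities` and the toy recipes are hypothetical, and the estimate is a plain conditional frequency.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_probabilities(recipes):
    """Estimate P(b in recipe | a in recipe) from lists of ingredients.

    Returns a dict keyed by (a, b); pairs that never co-occur are absent.
    """
    single = Counter()  # number of recipes containing each ingredient
    pair = Counter()    # number of recipes containing both ingredients
    for ingredients in recipes:
        unique = set(ingredients)
        single.update(unique)
        for a, b in combinations(sorted(unique), 2):
            pair[(a, b)] += 1
            pair[(b, a)] += 1
    return {(a, b): count / single[a] for (a, b), count in pair.items()}


# Toy data, invented for illustration only.
toy_recipes = [
    ["egg", "tomato"],
    ["egg", "rice"],
    ["egg", "tomato", "rice"],
]
probabilities = cooccurrence_probabilities(toy_recipes)
```

In this toy data every recipe with tomato also contains egg, so `probabilities[("tomato", "egg")]` is 1.0. Weights of this kind could label the edges of an ingredient graph before a graph model is applied to it.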

Personalized dietary recommendation

Personalized food and nutrition is gaining more and more attention in food science and other relevant domains. The aim is to use comprehensive personal information about individuals (e.g., dietary pattern and gut microbiota) for personalized dietary advice or recommendation, which is more suitable than generic advice. However, food recommendation can be a daunting task, partly because of the problem of information silos across multiple sources with large amounts of food and nutrition data. In addition, unlike other types of recommendation, food recommendation should take many nutritional parameters into consideration, such as caloric intake and the intake of different macronutrients and micronutrients. A natural solution to this problem is to provide an intelligent food recommender system based on the food knowledge graph. Food knowledge graphs provide formal, uniform, and shareable representations of food. Recommender systems can benefit from them in several ways, such as improved precision of recommended items, increased diversity of recommended items, and better explainability. When combined with a personal health knowledge graph, the food knowledge graph can further support personalized dietary recommendation. This can benefit different groups of people, such as diabetics, weightlifting athletes, and older adults. As one use case, personalized food recommendation is conducted over the constructed food knowledge graph FoodKG with recipes, ingredients, and nutrients. Given a user query (e.g., “What is a good lunch that contains meat?”) as the input, the system retrieves all recipes from FoodKG for the recommendation. Specifically, the system first identifies the query type. Then the mentioned topic entity (e.g., meat) is detected from FoodKG. With the extracted entity, answers are retrieved from the knowledge graph by a KBQA model.
Next, personalized requirements, such as the user’s unique health conditions (e.g., allergies) and health guidelines (e.g., nutrition needs), are added to the raw user query as additional constraints for personalized food recommendation. In addition, we can obtain more accurate estimates of the calories and nutrient content of a recipe to develop nutritional profiling systems via food-knowledge-graph-enhanced mapping between cooking recipes and structured data (food composition tables). Such a nutrition profiling system further supports more precise dietary recommendations. The food knowledge graph can also be combined with personal knowledge graphs. A personal knowledge graph is unique to every user, and it can include personal information such as allergies, preferences, and health indicators. With this knowledge, reasoning over the combined food knowledge graph can be more personalized and tailored. The personal health knowledge graph (PHKG) is a typical application of Semantic Web technology in a comprehensive diet recommendation system. This project builds a knowledge model to provide personalized dietary advice. In the project, PHKG is used to capture personal dietary behaviors, such as carbohydrate intake, with an extended time series summarization technique. It can also use semantic reasoners to generate clinically relevant dietary recommendations.
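
The two-stage flow described above (retrieve candidates for the topic entity, then apply personal constraints) can be sketched as follows. This is a simplified stand-in, not the FoodKG or PHKG implementation: the `Recipe` and `UserProfile` classes, the `recommend` function, and the sample data are hypothetical, and a plain ingredient lookup replaces the KBQA model.

```python
from dataclasses import dataclass, field


@dataclass
class Recipe:
    name: str
    ingredients: set
    calories: float


@dataclass
class UserProfile:
    # Hypothetical personal constraints, e.g. taken from a personal knowledge graph.
    allergies: set = field(default_factory=set)
    max_calories: float = float("inf")


def recommend(recipes, topic_ingredient, profile):
    """Retrieve recipes containing the topic entity, then filter by constraints."""
    # Stage 1: candidate retrieval (stands in for the KBQA model).
    candidates = [r for r in recipes if topic_ingredient in r.ingredients]
    # Stage 2: personal constraints (allergies and a calorie limit).
    return [
        r.name
        for r in candidates
        if not (r.ingredients & profile.allergies)
        and r.calories <= profile.max_calories
    ]


# Sample data, invented for illustration only.
sample_recipes = [
    Recipe("beef stir-fry", {"meat", "peanut", "rice"}, 650),
    Recipe("chicken salad", {"meat", "lettuce"}, 350),
    Recipe("tofu bowl", {"tofu", "rice"}, 400),
]
sample_profile = UserProfile(allergies={"peanut"}, max_calories=500)
```

With this sample data, `recommend(sample_recipes, "meat", sample_profile)` keeps only the chicken salad, because the beef stir-fry violates both the allergy and the calorie constraint.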

Food supply chain management

The food supply chain comprises all the stages that food products go through, from production to consumption. Nowadays, with globalization, food is transported over longer distances before it reaches consumers, and the food supply chain thus becomes longer and more fragmented. This brings two problems: food traceability that is hard to achieve, and more overall food waste. Therefore, it is necessary to manage the food supply chain effectively to achieve reliable food traceability and control waste. An intuitive idea is to construct a directed graph where nodes represent the statuses and processes of food materials. Zhang et al. first adopted the concept of critical tracking events (CTEs), which was proposed by the Institute of Food Technologists (IFT) to describe the key parts of the life cycle of a food product, such as transportation and processing. CTEs can be associated with data related to the key events (such as operators and devices). These CTEs can be linked and organized for tracing and tracking. However, the food supply chain system is often cross-functional and cross-regional, involving data-sharing problems between different companies. For example, different food processors can have different types of data because they focus on different functions and processes, and their terms are affected by context, so the same name can indicate totally different foods. The term buttermilk, for instance, can refer to the cultured milk drink or to the milk left after churning, depending on where it is used. Besides, food manufacturers and suppliers of raw food materials may follow different naming conventions due to their geography, so the same food may have different names. The food knowledge graph is a solution for modeling, integrating, and aligning food data in food supply chain management. By assigning a Uniform Resource Identifier (URI) to every unique food material, the food knowledge graph can easily distinguish what exactly a term refers to under certain conditions.
Besides, once the attributes of operations in food processing (such as former step, time, status, equipment information, and safety standard) are aligned, every participant in the food supply chain will be linked in a unified and standardized form. Based on this, more reliable traceback and related querying can be supported. Some traceability ontologies have become a part of traceability management to support food tracking and tracing in the food supply chain. For example, FTTO mainly considers food processing procedures and defines different attributes for different foods, such as beverages and additives. These attributes are standardized by FTTO, so the different statuses of a food product can be linked through the processing flow and keep their naming consistent throughout the life cycle. This also makes intelligent querying possible, meaning that users can access all intermediate products of a certain product and their relevant information, such as operating time and operator. This enables users to trace back security risks, such as contaminated foods, especially those that occur across borders. FTTO has been used in the global supply chain system. MESCO further extends FTTO to the meat supply chain. In particular, it continues to use the traceability method and food-related concepts of FTTO and adds concepts specific to the meat supply chain, such as the unique identification code of the animal, the place of production, and the date of birth.
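
The traceback query described above can be sketched on a small directed graph whose nodes are identified by URIs. The example is hypothetical throughout: the `ex:` identifiers, the `predecessors` mapping, and the function `trace_back` are invented for illustration and do not come from FTTO or MESCO.

```python
def trace_back(predecessors, product_uri):
    """Return every upstream node reachable from product_uri.

    predecessors maps a node URI to the URIs of its direct inputs.
    """
    upstream = []
    visited = {product_uri}
    stack = [product_uri]
    while stack:
        node = stack.pop()
        for parent in predecessors.get(node, []):
            if parent not in visited:  # guard against revisiting shared inputs
                visited.add(parent)
                upstream.append(parent)
                stack.append(parent)
    return upstream


# Hypothetical supply chain: each product points to its direct inputs.
predecessors = {
    "ex:yogurt-batch-7": ["ex:milk-lot-3", "ex:culture-lot-1"],
    "ex:milk-lot-3": ["ex:farm-12"],
}
sources = trace_back(predecessors, "ex:yogurt-batch-7")
```

Because each material has its own identifier, two lots that different suppliers both label "buttermilk" would remain distinct nodes in such a graph, and a contamination found in one batch can be followed back to the specific lots and origins involved.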

Food machinery intelligent manufacturing

With the continuous evolution of technologies, the innovation of IoT sensors also affects food-relevant scenarios such as food processing and central kitchens. Data fusion and exploration across different sensors are necessary to support intelligent decision making, which is of great significance for building automatic food industry production lines and consumer-oriented smart terminal equipment. With the wide application of food knowledge graphs in IoT, we can build intelligent kitchens or intelligent industrial devices that make decisions based on data from IoT sensors. For example, in intelligent kitchens, smart refrigerators with cameras can reason over recognized food and drink items, ingredients, and portion sizes, and even estimate their shelf life for timely use in recommended recipes via the embedded food knowledge graph. Smart microwaves with cameras can recognize the food type and then automatically choose important parameters, such as heating method and heating time, via the embedded food knowledge graph. KitchenSense is an early work on intelligent kitchens. It uses knowledge to coordinate the work of various intelligent devices in the kitchen and to enhance the interaction between devices and people. In addition, industrial robots and machinery can make more intelligent decisions with food knowledge graphs. They can access the processing status through sensors, obtain the physical properties of food materials, and perform intelligent processing control according to the knowledge in food knowledge graphs. Such an approach allows information in different forms to be integrated for industrial machines. To the best of our knowledge, there are few published works that utilize food knowledge graphs for food processing control. However, we notice that some food ontologies and food knowledge graphs cover the concepts involved in food processing, such as OFPE and AFEO.
These established ontologies or knowledge graphs can be further explored to make different stages in food processing better controlled and more effective, resulting in more automatic and intelligent whole-food processing.

Future directions of food knowledge graphs

Based on comprehensive discussions on existing efforts, we now articulate key open challenges and future research directions for food knowledge graphs.

Multimodal food knowledge graph

Most of the existing food knowledge graphs focus on organizing verbal knowledge extracted from text. However, the proliferation of edge devices, such as mobile devices and IoT devices in the food industries, generates large volumes of visual data, e.g., images and videos, which contain another important type of knowledge, namely visual knowledge. From the narrow perspective of computer vision, visual knowledge is any information that can be useful for improving vision tasks such as recognition. Such visual knowledge takes different forms, such as labeled examples of different categories (e.g., food categories and rich attributes) and relationships such as object-object relations (e.g., chicken is part of Kung Pao chicken). Large-scale efforts, such as visual attribute learning, visual relationship detection, and scene graph generation, are underway to extract a body of visual knowledge. Visual knowledge and verbal knowledge together constitute multimodal knowledge. There are some initial attempts to incorporate visual information into knowledge graphs by linking images to text via hyperlinks. In this case, visual information (e.g., entity images) can only be used for visual demonstrations. Most existing food knowledge graphs do not contain visual knowledge and thus cannot support food-oriented visual search and visual illustration. It is the right time to start building a multimodal food knowledge graph, where searching, indexing, organizing, and hyperlinking multimodal knowledge are necessary. Such a multimodal food knowledge graph can help food-oriented multimodal learning technologies to support many cross-modal tasks, such as cross-modal recipe-food image retrieval and generation. The downstream applications are various, such as automatically illustrating a given recipe using semantically corresponding images and supporting food-oriented multimodal dialogue systems.
Effective multimodal information integration can also be applied to many food industry scenarios, since abundant multimodal data exist in the food industries, such as images, videos, and other attribute information obtained from various types of sensors. Automatic fruit classification and grading, baking and fermentation time control, and automatic food packaging can all benefit from multimodal food knowledge graphs because multimodal information can be better organized and analyzed, and these procedures can thus be optimized. However, because visual knowledge and verbal knowledge have different statistical properties, how to build multimodal food knowledge graphs reasonably and effectively is worth further study.

Representation and reasoning on the food knowledge graph

The first step in using food knowledge graphs is to represent them and conduct complex reasoning on them. Numerical computing for knowledge representation and reasoning requires a continuous vector space to capture the semantics of entities and relations. While embedding-based methods have limitations in complex logical reasoning, some recently proposed methods for knowledge graph reasoning, especially graph neural networks (GNNs), are very promising for handling complex reasoning. GNNs learn the representation of a target node by propagating neighbor information in an iterative manner until a stable fixed point is reached. With the help of GNNs, it is possible to extract both entity characteristics and relations from knowledge graphs, which is an essential factor for food-knowledge-graph-based applications, such as compound-food relation prediction and food recommendation.
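
The idea of propagating neighbor information iteratively until a fixed point is reached can be illustrated with a deliberately simplified sketch. It is not a trained GNN: there are no learned weights or nonlinearities, and the update rule (own features plus a damped mean of neighbor states), the damping factor `alpha`, and the function name `propagate` are assumptions chosen so that the iteration provably converges.

```python
def propagate(features, neighbors, alpha=0.5, tol=1e-9, max_iters=1000):
    """Iterate h_v <- x_v + alpha * mean(h_u for u in N(v)) until stable.

    features: dict mapping each node to a list of floats (x_v)
    neighbors: dict mapping each node to a list of adjacent nodes
    With 0 <= alpha < 1 the update is a contraction, so it converges.
    """
    state = {node: list(vec) for node, vec in features.items()}
    for _ in range(max_iters):
        new_state = {}
        delta = 0.0  # largest change of any coordinate in this sweep
        for node, base in features.items():
            nbrs = neighbors.get(node, [])
            updated = []
            for i, x in enumerate(base):
                mean = sum(state[n][i] for n in nbrs) / len(nbrs) if nbrs else 0.0
                value = x + alpha * mean
                delta = max(delta, abs(value - state[node][i]))
                updated.append(value)
            new_state[node] = updated
        state = new_state
        if delta < tol:
            break
    return state


# Hypothetical two-node graph: a food node linked to a compound node.
toy_features = {"food": [1.0], "compound": [0.0]}
toy_neighbors = {"food": ["compound"], "compound": ["food"]}
embeddings = propagate(toy_features, toy_neighbors)
```

For this toy graph the fixed point can be solved by hand (4/3 for the food node and 2/3 for the compound node), which shows how a node with no features of its own still acquires a representation from its neighbor.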

Food big data organization and mining

A huge amount of food-related information is generated globally from different sources (e.g., IoT devices, online databases, and social media) and in various types (e.g., food components, nutrition tables, recipe text, food images, and cooking videos), resulting in food big data. These data are related to all the stages of the food system, such as food production, processing, and consumption, and thus enable applications in food science and industry, especially when combined with AI. However, due to the large scale of these data, how to organize and explore them well becomes a challenge. By unifying names and integrating different information sources with knowledge graphs, we can better organize and utilize these rich data. For example, food composition databases and food regulations can be linked with food, and we can use them to discover connections between food and diet-induced illnesses, pushing forward relevant research on food, nutrition, health, and regulations. Furthermore, big data in the food safety domain can help review previous food safety accidents and help develop tools to deal with hard food safety issues. If we can develop a knowledge graph to organize these data, we are more likely to find the key points that lead to the accidents and resolve safety issues via reasoning on knowledge graphs. Big-data-supported food knowledge graphs can also realize more precise personalized nutrition recommendations. By monitoring consumers’ health indicators with a series of sensors, we can obtain a large amount of health data, including personal features, diet, nutrition, and behaviors. Food knowledge graphs can integrate these data related to the human body in an appropriate way, and the combination of food knowledge graphs and AI algorithms can realize accurate and personalized nutrition.
Under appropriate circumstances, we can check health indicators again to obtain feedback, which forms a knowledge-graph-enhanced health nutrition recommendation closed loop.

Internet of Food

The Internet of Food (IoF) is designed to make the data from different devices and sources interoperable and computable across the whole dataset. However, a notable limitation is the lack of integration caused by the mix of data from different sources and hardware standards. Food knowledge graphs can provide standards (food ontologies) for all food information, such as how we describe food attributes and how food is cooked, processed, or consumed, and can connect all food-relevant data and information (instances). Therefore, they can foster the development of IoF. However, food knowledge graphs involve complex technologies, such as knowledge aggregation, complex storage, and indexing technologies, which bring great challenges. In addition, using knowledge graphs to integrate food data from diverse sources at a large scale is necessary, and developing scientific and engineering methods that scale with little increase in cost is an obvious requirement for the successful application of food knowledge graphs. Once IoF is constructed based on food knowledge graphs, all known food-relevant information becomes accessible to machines, consumers, and companies, enabling more applications in food science and industry.

Food knowledge graph for human health

To meet people’s pursuit of better health, there is an essential demand for better, safer, and more nutritious food. To achieve this goal, building a human nutrition and health platform is necessary. Food knowledge graphs provide an opportunity to build such a platform via large-scale structured food knowledge organization. As the core of this platform, food knowledge graphs can support the tracking and monitoring of dietary behaviors; health-relevant search and recommendations; and food-relevant studies on nutrition, diet, and disease. To achieve these goals, the food knowledge graph should satisfy certain characteristics. For example, a more complete and accurate interdisciplinary food knowledge graph is a basic requirement, and joint efforts from experts worldwide in food science, nutrition, health, and other relevant domains are thus needed. More challenges also need to be solved. For example, different culinary cultures and health beliefs exist in the world, which may lead to contradictions when this knowledge is added to food knowledge graphs. Although existing machine learning and natural language processing methods can construct food knowledge graphs automatically, the multiple sources of food data inevitably introduce noise. In addition, such large food knowledge graphs should support dynamic adaptation, which is technically more difficult to achieve.

Food intelligence

Driven by the fast development of AI, there is an urgent need to push the AI frontier to the food domain. Conforming to this trend, food computing has received tremendous interest for its multifarious applications in health, culture, and medicine. It acquires and analyzes multi-source multimodal food data for various food-oriented tasks via computational approaches. The nexus between food computing and AI gives birth to the novel paradigm of food intelligence. Food knowledge graphs can enhance already-popular techniques of computer vision and natural language processing, such as image recognition, object detection, and QA, and thus can aid food computing tasks. We can also make decisions and perform reasoning on food knowledge graphs in combination with advanced AI technologies for many intelligent services in various fields, such as smart kitchens. Therefore, food knowledge graphs will play important roles in realizing food intelligence, which will benefit various studies and applications in food science and industry.

Conclusions

In this review, we summarize food knowledge graphs in terms of their development, applications, and future directions in food science and industry. Our review of current research shows that food knowledge graphs have enabled various food-oriented applications owing to their capability for effective food data organization, representation, and reasoning. The future directions discussed show their great potential for solving key food-relevant problems in the food industries and in daily diet scenarios. Although challenges remain from multimodal food data and complex computational technologies, food knowledge graphs show considerable application prospects in the food domain. We hope this review encourages researchers and engineers in this field to put knowledge graphs into practice for the benefit of food science and industry.
