Bruno Vieira Resende e Silva, Juan Cui.
Abstract
A healthy diet with balanced nutrition is key to the prevention of life-threatening diseases such as obesity, cardiovascular disease, and cancer. Recent advances in smartphone and wearable sensor technologies have led to a proliferation of food monitoring applications based on automated food image processing and eating episode detection, with the goal of overcoming the drawbacks of traditional manual food journaling, which is time-consuming, inaccurate, prone to underreporting, and suffers from low adherence. To provide users with feedback that pairs nutritional information with insightful dietary advice, various techniques grounded in key computational learning principles have been explored. This survey presents a variety of methodologies and resources on this topic, along with unsolved problems, and closes with a perspective on the broader implications of this field.
Keywords: Automatic nutrient assessment; Food image classification; Food image dataset; Food monitoring; Machine learning
Year: 2017 PMID: 30101038 PMCID: PMC6086355 DOI: 10.4172/2157-7420.1000272
Source DB: PubMed Journal: J Health Med Inform ISSN: 2157-7420
Figure 1: The workflow of an automated food monitoring system that connects the various components discussed in the main text.
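The workflow in Figure 1 can be read as a simple processing pipeline: detect and segment food items in a photo, classify each item, estimate its portion volume, and look up nutrient content. The following Python sketch shows one hypothetical way these stages could be wired together; every function, class, and value here (including the kcal-per-ml table) is an illustrative placeholder, not an implementation from any surveyed system.

```python
from dataclasses import dataclass

@dataclass
class FoodItem:
    label: str        # predicted food class, e.g. "rice"
    volume_ml: float  # estimated portion volume
    kcal: float       # calories from a nutrient-table lookup

# Hypothetical kcal-per-ml densities, for illustration only.
NUTRIENT_TABLE = {"rice": 1.3, "apple": 0.52}

def segment(image):
    """Split the photo into per-item regions (e.g. GrabCut or a CNN)."""
    raise NotImplementedError

def classify(region) -> str:
    """Assign a food class to one region (e.g. an SVM or CNN classifier)."""
    raise NotImplementedError

def estimate_volume(region) -> float:
    """Estimate portion volume in ml from geometric cues."""
    raise NotImplementedError

def monitor(image) -> list:
    """End-to-end pass: segmentation -> classification -> volume -> calories."""
    items = []
    for region in segment(image):
        label = classify(region)
        volume = estimate_volume(region)
        items.append(FoodItem(label, volume, volume * NUTRIENT_TABLE.get(label, 0.0)))
    return items
```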
Food image databases.
| Study | Database | Image content | Total # of classes/images | Acquisition |
|---|---|---|---|---|
| Chen et al., 2009 | PFID | Fast food items from the USA | 61/1098 | Images taken in restaurants and in the lab, with a white background |
| Mariappan, 2009 | TADA | Common foods in the USA | 256 foods + 50 replicas | Images collected in a controlled environment |
| Hoashi et al., 2010 | Food85 | Japanese food | 85/8500 | Images derived from a previous database of 50 Japanese food categories, plus the web |
| Chen, 2012 | Chen | Chinese food | 50/5000 | Images downloaded from the web |
| Matsuda et al., 2012 | UEC Food-100 | Popular Japanese foods | 100/9060 | Images acquired by digital camera (each photo has a bounding box indicating the location of the food item) |
| Farinella et al., 2014 | Diabetes | Selected foods | 11/4868 | Images downloaded from the web |
| Bossard et al., 2014 | Food-101 | Popular foods in the USA | 101/101000 | Images downloaded from the web |
| Kawano and Yanai, 2014 | UEC Food-256 | Popular foods in Japan and other countries | 256/31397 | Images acquired by digital camera (each photo has a bounding box indicating the location of the food item) |
| Meyers, 2015 | Food201-Segmented | Popular foods in the USA | 201/12625 | Images derived from the Food-101 dataset; segmented |
| Beijbom et al., 2015 | Menu-Match | Food from three restaurants (Asian, Italian, and soup) | 41/646 | Images taken by the authors |
| Ciocca et al., 2016 | UNIMIB2016 | Food from a dining hall | 73/1027 | Images acquired by digital camera in a dining hall; segmented |
| Chen and Ngo, 2016 | Vireo | Chinese dishes | 172/110241 | Images downloaded from the web |
Proprietary database
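Several of these public datasets can be loaded directly in common deep-learning frameworks; for instance, recent versions of torchvision ship a built-in loader for Food-101. A minimal sketch follows; the preprocessing choices are conventional ImageNet-style defaults, not requirements of the dataset.

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Conventional ImageNet-style preprocessing; crop size and normalization
# statistics are common defaults, not prescribed by the dataset authors.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Downloads the dataset on first use: 101 classes, 101,000 images.
train_set = datasets.Food101(root="data", split="train",
                             transform=preprocess, download=True)
loader = DataLoader(train_set, batch_size=32, shuffle=True)

images, labels = next(iter(loader))
print(images.shape, labels.shape)  # (32, 3, 224, 224) and (32,)
```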
Food segmentation methods.
| Study | Approach | Performance |
|---|---|---|
| Yang et al., 2010 | Semantic Texton Forest calculates the probability that each pixel belongs to one of the food classes | Output from the Semantic Texton Forest is far from a precise parsing of an image |
| Matsuda et al., 2012 | Combined techniques: whole image, DPM, circle detector, and JSEG segmentation | Overall accuracy of 21% (top 1) and 45% (top 5) |
| Kawano and Yanai, 2013 | Each food item within a user-drawn bounding box is segmented by the GrabCut algorithm | Performance depends on the size of the bounding boxes |
| Pouladzadeh et al., 2014 | Graph cut segmentation algorithm to extract food items and the user's finger | Overall accuracy of 95% |
| Shimoda and Yanai, 2015 | CNN model searching for food items based on fragmented references | Detects correct bounding boxes around food items with a mean average precision of 49.9% when compared to ground-truth values |
| Meyers, 2015 | DeepLab model | Classification accuracy increases with conditional random fields |
| Zhu et al., 2015 | Multiple segmentations generated for an image and selected by a classifier | Outperforms normalized cut |
| Ciocca et al., 2016 | Combines saturation, binarization, JSEG segmentation, and morphological operations | Achieves better segmentation than the JSEG-only approach |
Top 1 and/or Top 5 indicate that the performance of the classification model was evaluated based on the class assigned the highest probability and/or on the top 5 predicted classes for each given food item, respectively.
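To make the bounding-box-driven entries above concrete, the following OpenCV sketch applies GrabCut segmentation in the spirit of Kawano and Yanai (2013), where the user supplies a rough box around the food item. The image path and box coordinates are placeholders.

```python
import cv2
import numpy as np

img = cv2.imread("meal.jpg")  # placeholder path
h, w = img.shape[:2]

# User-supplied bounding box (x, y, width, height) around the food item;
# as noted in the table, results depend strongly on how tight this box is.
rect = (int(w * 0.2), int(h * 0.2), int(w * 0.6), int(h * 0.6))

mask = np.zeros((h, w), np.uint8)
bgd_model = np.zeros((1, 65), np.float64)  # internal background GMM state
fgd_model = np.zeros((1, 65), np.float64)  # internal foreground GMM state

cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

# Keep pixels labeled definite or probable foreground as the food region.
fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype("uint8")
cv2.imwrite("food_segment.png", img * fg[:, :, None])
```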
Traditional and deep learning classification methods.
Traditional methods

| Study | Features | Classifier | Database | Top 1 Acc. | Top 5 Acc. |
|---|---|---|---|---|---|
| Chen, 2012 | SIFT, LBP, color and Gabor | Multi-class AdaBoost | Chen | 68.3% | 90.9% |
| Beijbom et al., 2015 | SIFT, LBP, color, HOG and MR8 | SVM | Chen | 77.4% | 96.2% |
| Anthimopoulos et al., 2014 | SIFT and color | Bag of Words and SVM | Diabetes | 78.0% | - |
| Bossard et al., 2014 | SURF and L\*a\*b\* color | RFDC | Food-101 | 50.8% | - |
| Hoashi et al., 2010 | Bag of features, color, Gabor texture and HOG | MKL | Food85 | 62.5% | - |
| Beijbom et al., 2015 | SIFT, LBP, color, HOG and MR8 | SVM | Menu-Match | 51.2%* | - |
| Christodoulidis et al., 2015 | Color and LBP | SVM | Local dataset | 82.2% | - |
| Pouladzadeh et al., 2014 | Color, texture, size and shape | SVM | Local dataset | 92.2% | - |
| Pouladzadeh et al., 2014 | Graph Cut, color, texture, size and shape | SVM | Local dataset | 95.0% | - |
| Kawano and Yanai, 2013 | Color and SURF | SVM | Local dataset | - | 81.6% |
| Farinella et al., 2014 | Bag of Textons | SVM | PFID | 31.3% | - |
| Yang et al., 2010 | Pairwise local features | SVM | PFID | 78.0% | - |
| He et al., 2014 | DCD, MDSIFT, SCD, SIFT | KNN | TADA | 64.5% | - |
| Zhu et al., 2015 | Color, texture and SIFT | KNN | TADA | 70.0% | - |
| Matsuda et al., 2012 | SIFT, HOG, Gabor texture and color | MKL-SVM | UEC Food-100 | 21.0% | 45.0% |
| Liu et al., 2016 | Extended HOG and color | Fisher Vector | UEC Food-100 | 59.6% | 82.9% |
| Kawano and Yanai, 2014 | Color and HOG | Fisher Vector | UEC Food-100 | 65.3% | - |
| Yanai and Kawano, 2015 | Color and HOG | Fisher Vector | UEC Food-100 | 65.3% | 86.7% |
| Kawano and Yanai, 2014 | Fisher Vector, HOG and color | One-vs-rest linear classifier | UEC Food-256 | 50.1% | 74.4% |
| Yanai et al., 2015 | Color and HOG | Fisher Vector | UEC Food-256 | 52.9% | 75.5% |

Deep learning methods

| Study | Network | Database | Top 1 Acc. | Top 5 Acc. |
|---|---|---|---|---|
| Anthimopoulos et al., 2014 | ANN | Diabetes | 75.0% | - |
| Bossard et al., 2014 | CNN | Food-101 | 56.4% | - |
| Yanai and Kawano, 2015 | DCNN-Food | Food-101 | 70.4% | - |
| Liu et al., 2016 | DeepFood | Food-101 | 77.4% | 93.7% |
| Meyers, 2015 | GoogLeNet | Food-101 | 79.0% | - |
| Hassannejad et al., 2016 | Inception v3 | Food-101 | 88.3% | 96.9% |
| Meyers, 2015 | GoogLeNet | Food201-Segmented | 76.0% | - |
| Meyers, 2015 | GoogLeNet | Menu-Match | 81.4%* | - |
| Christodoulidis et al., 2015 | Patch-wise CNN | Own database | 84.9% | - |
| Pouladzadeh et al., 2016 | Graph Cut + Deep Neural Network | Own database | 99.0% | - |
| Kawano and Yanai, 2014 | OverFeat + Fisher Vector | UEC Food-100 | 72.3% | 92.0% |
| Liu et al., 2016 | DeepFood | UEC Food-100 | 76.3% | 94.6% |
| Yanai and Kawano, 2015 | DCNN-Food | UEC Food-100 | 78.8% | 95.2% |
| Hassannejad et al., 2016 | Inception v3 | UEC Food-100 | 81.5% | 97.3% |
| Chen and Ngo, 2016 | Arch-D | UEC Food-100 | 82.1% | 97.3% |
| Liu et al., 2016 | DeepFood | UEC Food-256 | 54.7% | 81.5% |
| Yanai and Kawano, 2015 | DCNN-Food | UEC Food-256 | 67.6% | 89.0% |
| Hassannejad et al., 2016 | Inception v3 | UEC Food-256 | 76.2% | 92.6% |
| Ciocca et al., 2016 | VGG | UNIMIB2016 | 78.3% | - |
| Chen and Ngo, 2016 | Arch-D | VIREO | 82.1% | 95.9% |
*Represents the mean average precision.
Top 1 and/or Top 5 indicate that the performance of the classification model was evaluated based on the class assigned the highest probability and/or on the top 5 predicted classes for each given food item, respectively.
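Most deep-learning entries above share the same recipe: start from an ImageNet-pretrained network, replace the final layer with one output per food class, and fine-tune on the food dataset, reporting top-1/top-5 accuracy. The PyTorch sketch below illustrates that recipe on a dummy batch; the backbone (ResNet-50) and hyperparameters are arbitrary illustrative choices, not those of any cited study.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 101  # e.g. Food-101

# Start from ImageNet weights and swap in a new classifier head.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def topk_accuracy(logits, labels, k=5):
    """Fraction of samples whose true class is among the k highest scores."""
    topk = logits.topk(k, dim=1).indices            # shape: (batch, k)
    hits = (topk == labels.unsqueeze(1)).any(dim=1)
    return hits.float().mean().item()

# One illustrative training step on random stand-in data.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (8,))

logits = model(images)
loss = criterion(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

print("top-1:", topk_accuracy(logits, labels, k=1),
      "top-5:", topk_accuracy(logits, labels, k=5))
```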
Methods for food volume and calorie estimation.
| Study | Approach | Performance |
|---|---|---|
| Noronha et al., 2011 | Crowdsourcing (e.g., users from Amazon Mechanical Turk) | Better performance than other commercial crowdsourcing apps, but overall error-prone since users estimate food portions just by looking at the picture |
| Chen, 2012 | Depth camera used to acquire color and depth | Preliminary results show some limitations when estimating quantities of cooked rice and water |
| Villalobos et al., 2012 | Top and side view pictures with the user's finger as reference | Results vary with illumination conditions and image angle; standard error is in an acceptable range |
| Beijbom et al., 2015 | Menu items from nearby restaurants | Food calories come from the predefined restaurant menu |
| Meyers, 2015 | 3D volume estimation by capturing images with a depth camera and reconstructing the scene using a convolutional neural network and RANSAC | Using toy food, the CNN volume predictor is accurate for most meals; no calorie estimation outside a controlled environment |
| Woo et al., 2010 | Checkerboard as reference for camera calibration and 3D reconstruction | Mean volume error of 5.68% on a test of seven food items |
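The checkerboard-based calibration used by Woo et al. is a standard computer-vision procedure available directly in OpenCV: detected board corners yield the camera intrinsics that let later food photos be metrically scaled for 3D reconstruction. A minimal sketch, assuming calibration photos of a board with 9x6 inner corners; the file pattern and board geometry are placeholders.

```python
import glob
import cv2
import numpy as np

BOARD = (9, 6)  # inner corners per row/column; placeholder geometry

# 3D corner coordinates on the board plane (z = 0), in square-size units.
objp = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calib_*.jpg"):  # placeholder file pattern
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, BOARD)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Recover the camera matrix and distortion coefficients; these provide the
# metric scale needed for 3D reconstruction and volume estimation.
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("reprojection error:", ret)
print("camera matrix:\n", K)
```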