Nicholas Rosso1, Philippe Giabbanelli2.
Abstract
BACKGROUND: National surveys in public health nutrition commonly record the weight of every food consumed by an individual. However, if the goal is to identify whether individuals are in compliance with the 5 main national nutritional guidelines (sodium, saturated fats, sugars, fruit and vegetables, and fats), much less information may be needed. A previous study showed that tracking only 2.89% of all foods (113/3911) was sufficient to accurately identify compliance. Further reducing the data needs could lower participation burden, thus decreasing the costs for monitoring national compliance with key guidelines.
Keywords: diet, food, and nutrition; public health informatics; supervised machine learning
Year: 2018 PMID: 29848474 PMCID: PMC6000477 DOI: 10.2196/publichealth.9536
Source DB: PubMed Journal: JMIR Public Health Surveill ISSN: 2369-2960
Figure 1. A decision tree starts at a root (top). For a given individual, we repeatedly compare the individual’s data with the questions in the tree. In this example, if the individual did not consume food 1, then the follow-up question is whether food 2 was consumed. Eventually, we reach a conclusion: whether the individual was in compliance with the guideline or not. Such trees are automatically built from the data.
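The traversal described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration only: the class names, the `classify` function, and the tree itself (including which answers lead to compliance) are hypothetical placeholders, not trees or results from the study.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    compliant: bool  # conclusion reached at the bottom of the tree


@dataclass
class Question:
    food: str  # food asked about at this decision node
    if_consumed: Union["Question", Leaf]
    if_not_consumed: Union["Question", Leaf]


def classify(node, consumed):
    """Walk from the root to a leaf, answering one question per step."""
    while isinstance(node, Question):
        node = node.if_consumed if node.food in consumed else node.if_not_consumed
    return node.compliant


# Hypothetical tree: foods and outcomes are placeholders, not study results.
tree = Question(
    food="food 1",
    if_consumed=Leaf(compliant=False),
    if_not_consumed=Question(
        food="food 2",
        if_consumed=Leaf(compliant=False),
        if_not_consumed=Leaf(compliant=True),
    ),
)

# An individual who ate neither food: food 1 is asked first, then food 2.
print(classify(tree, set()))
```

Each individual is represented here simply as the set of foods consumed, which matches the yes/no questions in the caption's example.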
Figure 2. Flow diagram of our methodology, showing the acquisition, preprocessing, and mining of the data. NDNS: National Diet and Nutrition Survey; SMOTE: Synthetic Minority Over-Sampling Technique.
Sample outcome for the decision tree classifier on free sugars.
| Study | Minimum number of instances | Accuracy (average) | Recall | Specificity | Number of factors |
| Previous | 60 | 76.5 | 76.1 | 76.9 | 28 |
| Current | 60 | 78.2 | 73.6 | 82.9 | 31 |
| Current | 70 | 78.1 | 74.7 | 77.3 | 31 |
| Current | 80 | 78.3 | 74.7 | 78.3 | 30 |
| Current | 90 | 77.9 | 75.1 | 80.8 | 30 |
| Current | 95 | 77.9 | 75.1 | 80.7 | 25 |
| Current | 100 | 77.3 | 75.7 | 78.9 | 26 |
| Current | 115 | 77.2 | 75.5 | 78.8 | 22 |
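One possible reading of the "Accuracy (average)" column is the unweighted mean of recall and specificity. The record does not define the column, so this is an assumption, and the function name below is hypothetical. The sketch checks that reading against the "Previous" row; not every row reproduces under it, so treat it as an interpretation rather than the authors' definition.

```python
def average_accuracy(recall, specificity):
    """Unweighted mean of recall and specificity, in percentage points.

    Assumption: this is what "Accuracy (average)" denotes in the table.
    """
    return (recall + specificity) / 2


# "Previous" row: recall 76.1, specificity 76.9, reported accuracy 76.5.
print(f"{average_accuracy(76.1, 76.9):.1f}")  # prints 76.5
```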
Key characteristics of the National Diet and Nutrition Survey (NDNS) household dataset. All participants in the study were within the United Kingdom. There were several study waves, with around 1000 respondents per year.
| Characteristics | Categorical count, n (%) |
| Male | 5034 (47.41) |
| Female | 5439 (52.57) |
| Free sugars | 1472 (35.41) |
| Salt | 2524 (60.73) |
| Fat | 1045 (25.14) |
| Saturated fat | 795 (19.13) |
| Fruits and vegetables | 656 (15.78) |
| English | 5036 (48.08) |
| Northern Irish | 3442 (32.86) |
| Scottish | 684 (6.53) |
| Welsh | 398 (3.80) |
| Irish | 194 (1.85) |
| Other | 719 (6.88) |
| Single (never married) | 6240 (59.57) |
| Married (living with partner) | 1960 (18.71) |
| Divorced | 261 (2.49) |
| Married (living separate) | 3 (0.06) |
| Widowed | 139 (1.32) |
| Other | 1870 (17.85) |
| Going to school full-time | 2974 (28.39) |
| Full or part time employment | 4440 (42.39) |
| Not working presently | 3039 (29.02) |
Figure 3. Main foods either by (a) contribution to caloric intake, or (b) prevalence among individuals.
Comparison of the best decision tree using the weight of foods (previous study, Giabbanelli and Adams, 2016 [8]) or simplified foods (this study), while keeping the number of foods similar.
| Study | Guidelines | Number of instances | Accuracy (%) | Recall | Specificity | Number of factors |
| Previous | Free sugars | 60 | 76.5 | 76.1 | 76.9 | 28 |
| Current | Free sugars | 95 | 77.9 | 75.1 | 80.7 | 25 |
| Previous | Fat | 70 | 72.4 | 66.3 | 78.4 | 33 |
| Current | Fat | 90 | 79.4 | 70.4 | 88.5 | 33 |
| Previous | Fruits and vegetables | 50 | 83.1 | 82.5 | 83.8 | 11 |
| Current | Fruits and vegetables | 90 | 82.2 | 82.3 | 82.2 | 10 |
| Previous | Saturated fat | 20 | 79.7 | 75.8 | 83.6 | 28 |
| Current | Saturated fat | 90 | 84.6 | 77.4 | 91.8 | 27 |
| Previous | Salt | 15 | 75.8 | 81.9 | 69.8 | 28 |
| Current | Salt | 55 | 76.3 | 79.5 | 73.2 | 26 |
Comparison of the best decision tree using the weight of foods (previous study, Giabbanelli and Adams, 2016 [8]) or simplified foods (this study), without being limited by the number of foods.
| Study | Guidelines | Number of instances | Accuracy (%) | Recall | Specificity | Number of factors |
| Previous | Free sugars | 60 | 76.5 | 76.1 | 76.9 | 28 |
| Current | Free sugars | 60 | 78.2 | 73.6 | 82.9 | 31 |
| Previous | Fat | 70 | 72.4 | 66.3 | 78.4 | 33 |
| Current | Fat | 70 | 79.9 | 72.3 | 87.7 | 43 |
| Previous | Fruits and vegetables | 50 | 83.1 | 82.5 | 83.8 | 11 |
| Current | Fruits and vegetables | 50 | 83.5 | 84.9 | 82.2 | 16 |
| Previous | Saturated fat | 20 | 79.7 | 75.8 | 83.6 | 28 |
| Current | Saturated fat | 20 | 84.7 | 79.3 | 90.1 | 42 |
| Previous | Salt | 15 | 75.8 | 81.9 | 69.8 | 28 |
| Current | Salt | 50 | 76.6 | 79.9 | 73.2 | 25 |
Figure 4. Accuracy, recall (“Yes”), and specificity (“No”) when (a) limiting the number of foods as in a previous study (Giabbanelli and Adams, 2016 [8]), or (b) using any number of foods to build the decision trees, giving us the optimized decision trees.
Individual foods used as predictors at least 5 times in the trees generated using our 2 processes (similar/optimized) and for the 5 guidelines: Fruit and Vegetables, Fat, Saturated Fat, Salt, and Free Sugars. The frequency is the number of times that a food is used as a decision node across all trees (eg, if used 3 times in 5 trees each, it would be 15).
| Variables | Similar decision tree | Optimized decision tree | Total frequency | |||||||||
| FVb | Fat | SatFatc | Salt | Sugd | FV | Fat | SatFat | Salt | Sug | |||
| Sausages | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 20 | |||||
| Bananas raw | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 19 | |||||
| Sausage roll | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 16 | |||||
| Cheese cheddar | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 14 | |||||
| Milk chocolate | ✓ | ✓ | ✓ | ✓ | 12 | |||||||
| Butter salted | ✓ | ✓ | ✓ | ✓ | ✓ | 10 | ||||||
| Cheese spreads | ✓ | ✓ | 8 | |||||||||
| Ice cream | ✓ | ✓ | ✓ | 8 | ||||||||
| Fruit drink | ✓ | ✓ | ✓ | 8 | ||||||||
| Chicken pieces | ✓ | ✓ | 8 | |||||||||
| Sex | ✓ | ✓ | ✓ | 7 | ||||||||
| Potato crisps | ✓ | ✓ | 6 | |||||||||
| Apples | ✓ | ✓ | 6 | |||||||||
| Milk whole | ✓ | ✓ | ✓ | 6 | ||||||||
| Beans baked | ✓ | ✓ | ✓ | ✓ | 6 | |||||||
| Onions | ✓ | ✓ | 6 | |||||||||
| Cola | ✓ | ✓ | 6 | |||||||||
| Apple juice unsweetened UHTa | ✓ | ✓ | ✓ | ✓ | 6 | |||||||
| Olive oil | ✓ | 6 | ||||||||||
| Orange juice unsweetened | ✓ | ✓ | 6 | |||||||||
| Orange juice unsweetened UHT | ✓ | 6 | ||||||||||
| Bacon | ✓ | 6 | ||||||||||
| Apple juice unsweetened | ✓ | ✓ | 5 | |||||||||
a UHT: ultra-high-temperature processing.
b FV: fruits and vegetables.
c SatFat: saturated fat.
d Sug: free sugars.
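The frequency defined in the table caption (the number of times a variable is used as a decision node across all trees) can be illustrated with a short counting sketch. The input format, the function name, and the foods "food A" and "food B" are assumptions made for illustration; the record does not describe how the trees were stored or traversed.

```python
from collections import Counter


def decision_node_frequency(trees):
    """Count how often each variable appears as a decision node across all trees.

    Assumed input: each tree is a list of the variables at its decision
    nodes, with a variable listed once per node where it is used.
    """
    counts = Counter()
    for nodes in trees:
        counts.update(nodes)
    return counts


# Hypothetical input mirroring the caption's example:
# one food used 3 times in each of 5 trees.
trees = [["food A", "food A", "food A", "food B"] for _ in range(5)]
freq = decision_node_frequency(trees)
print(freq["food A"])  # prints 15

# Keep only variables used at least 5 times, as the table does.
print({name: n for name, n in freq.items() if n >= 5})
```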