
Identifying small groups of foods that can predict achievement of key dietary recommendations: data mining of the UK National Diet and Nutrition Survey, 2008-12.

Philippe J Giabbanelli, Jean Adams.

Abstract

OBJECTIVE: Many dietary assessment methods attempt to estimate total food and nutrient intake. If the intention is simply to determine whether participants achieve dietary recommendations, this leads to much redundant data. We used data mining techniques to explore the number of foods on which intake information was required to accurately predict achievement, or not, of key dietary recommendations.
DESIGN: We built decision trees for achievement of recommendations for fruit and vegetables, sodium, fat, saturated fat and free sugars using data from a national dietary surveillance data set. Decision trees describe complex relationships between potential predictor variables (age, sex and all foods listed in the database) and outcome variables (achievement of each of the recommendations).
SETTING: UK National Diet and Nutrition Survey (NDNS, 2008-12).
SUBJECTS: The analysis included 4156 individuals.
RESULTS: Information on consumption of 113 out of 3911 (3 %) foods, plus age and sex was required to accurately categorize individuals according to all five recommendations. The best trade-off between decision tree accuracy and number of foods included occurred at between eleven (for fruit and vegetables) and thirty-two (for fat, plus age) foods, achieving an accuracy of 72 % (for fat) to 83 % (for fruit and vegetables), with similar values for sensitivity and specificity.
CONCLUSIONS: Using information on intake of 113 foods, it is possible to predict with 72-83 % accuracy whether individuals achieve key dietary recommendations. Substantial further research is required to make use of these findings for dietary assessment.

Keywords:  Data mining; Diet; Dietary assessment; Dietary pattern analysis; Nutrition

Year:  2016        PMID: 26879185      PMCID: PMC4873899          DOI: 10.1017/S1368980016000185

Source DB:  PubMed          Journal:  Public Health Nutr        ISSN: 1368-9800            Impact factor:   4.022


The intention of many dietary assessment methods is to capture information on all foods consumed, or at least those believed to make the largest contribution to total intake, in order to estimate total nutrient intake. For some purposes, this detailed estimation of total nutrient intake may lead to collection of much redundant data. This is particularly the case when assessing adherence to policy targets and messages such as ‘five-a-day’ portions of fruit and vegetables. The collection of substantial redundant information places unnecessary burden on research participants and uses scarce research resources unnecessarily. As a first step towards overcoming this problem, we applied data mining techniques to explore on how many foods, and on which ones, intake information was required to accurately predict achievement, or not, of key dietary recommendations.

Data mining, an overview

Unlike traditional statistical approaches such as multiple regression, data mining allows multiple non-linear relationships and interaction effects to be efficiently captured. Several data mining tools exist. In the present study we used ‘classifiers’. A classifier is a function that labels individuals on an outcome (e.g. achieving a dietary recommendation or not) based on a group of predictor variables (e.g. how much of each individual food was consumed). The analysis package is first provided with a ‘training set’ of individual-level data in which both the outcome and the predictor variables are known, and uses this to learn how the predictor variables are related to the outcome. This produces the classifier function, which can then be used to infer the outcome in a new case based on just the predictor variables. Finally, the accuracy of the classifier is evaluated on a new ‘testing set’ of data.

There are numerous ways to build classifiers. We used ‘decision trees’. Decision trees provide a graphical illustration of a classifier composed of a number of predictor variables. A decision tree involves repeated ‘cuts’ of the data according to the level of included predictor variables to identify groups of individuals who are similar in terms of the outcome variable of interest. This produces a decision tree where the path from the root to the outcome corresponds to successive ‘cuts’, or divisions, of the population.

Figure 1 provides a simplified, hypothetical example of a decision tree where the intention is to identify whether or not individuals achieve the recommended intake of fruit and vegetables (the outcome) using information on the consumption of carrots and white bread (the two predictor variables). Figure 1(a) shows the decision tree based on the ‘cuts’ represented in Fig. 1(b), the latter being a simple graphical plot of consumption of both carrots and white bread with all individuals labelled according to whether or not they achieve the recommended intake of fruit and vegetables. In terms of meeting fruit and vegetable recommendations there appear to be five ‘clusters’ of participants in Fig. 1(b). A series of ‘cuts’ can isolate these clusters. The first cut (labelled ‘A’ in Fig. 1(a) and 1(b)) divides the population according to consumption of carrots. The next two cuts (labelled ‘B’ and ‘C’) then divide the resulting two groups according to consumption of white bread. Finally, a fourth cut (labelled ‘D’) divides those with a medium carrot and medium white bread intake according to a more fine-grained assessment of carrot intake.
Fig. 1

(colour online) Schematic illustration of a decision tree (a) and how this is formed through repeated ‘cuts’ of the data (b)
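The hypothetical tree in Fig. 1 can also be read as a set of nested conditions. The sketch below is illustrative only: the text gives the order of the cuts (A to D) but no numeric values, so every cut-point (in grams), the branch on which cut D sits and the outcome assigned to each of the five leaves are assumptions made for this example.

```python
# All thresholds are invented for illustration; none comes from the study.
CUT_A_CARROTS_G = 60.0   # cut A: first division, on carrot intake
CUT_B_BREAD_G = 80.0     # cut B: white bread, within the higher-carrot group
CUT_C_BREAD_G = 40.0     # cut C: white bread, within the lower-carrot group
CUT_D_CARROTS_G = 30.0   # cut D: finer division on carrot intake


def achieves_fruit_veg(carrots_g: float, white_bread_g: float) -> bool:
    """Follow one root-to-leaf path of the hypothetical tree."""
    if carrots_g >= CUT_A_CARROTS_G:            # cut A
        return white_bread_g < CUT_B_BREAD_G    # cut B gives two leaves
    if white_bread_g >= CUT_C_BREAD_G:          # cut C gives one leaf here
        return False
    return carrots_g >= CUT_D_CARROTS_G         # cut D gives two leaves


if __name__ == "__main__":
    for person in [(100, 10), (100, 200), (10, 100), (40, 10), (10, 10)]:
        print(person, achieves_fruit_veg(*person))
```

Each path from the first condition to a returned value corresponds to one cluster isolated by the cuts, which is why four cuts yield five leaves.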

To build decision trees with different numbers of predictor variables, the minimum number of individual cases that can be further divided by a subsequent ‘cut’ is varied. If a small group of individuals can be further subdivided, a sizeable tree including many predictor variables can result. However, if limits are placed on the minimum size of group that can be further subdivided, a smaller decision tree, including fewer predictor variables, results. In the current study we made use of this feature to explore the effect of including more or fewer predictor variables on the accuracy of decision trees.

A small number of studies have applied data mining techniques to nutritional data. These have focused primarily on dietary pattern analysis, exploring which dietary components are predictive of a range of health outcomes. However, we are not aware of any other uses of data mining to identify which foods are predictive of achievement, or not, of key dietary recommendations.
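The effect of limiting the minimum size of group that can be further divided can be sketched with a small greedy tree builder. This is not the J48 algorithm used in the study: it is a simplified stand-in that picks cuts by Gini impurity, does no pruning, and runs on synthetic data. The names `best_cut`, `features_used` and `min_cases` are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_cut(rows, labels):
    """Return the (feature, threshold) cut with the lowest weighted impurity, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best


def features_used(rows, labels, min_cases):
    """Grow a tree greedily; return the set of feature indices it cuts on."""
    if len(rows) < min_cases or len(set(labels)) == 1:
        return set()  # group too small to divide further, or already pure
    cut = best_cut(rows, labels)
    if cut is None:
        return set()
    f, t = cut
    used = {f}
    for keep_left in (True, False):
        part = [(r, y) for r, y in zip(rows, labels) if (r[f] <= t) == keep_left]
        used |= features_used([r for r, _ in part], [y for _, y in part], min_cases)
    return used


# Synthetic data: 200 'individuals', 6 'foods'; the outcome depends on foods 0 and 1.
rng = random.Random(0)
rows = [[rng.randint(0, 20) for _ in range(6)] for _ in range(200)]
labels = [int(r[0] + r[1] + rng.randint(-5, 5) > 20) for r in rows]

if __name__ == "__main__":
    for min_cases in (2, 20, 100, 500):
        print(min_cases, sorted(features_used(rows, labels, min_cases)))
```

Because the cut chosen at each node depends only on the data reaching it, raising `min_cases` can only remove cuts, so the set of predictor variables used never grows.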

Aim

Our aim was to use data mining techniques to determine the number of foods on which intake information was required to accurately predict achievement, or not, of dietary recommendations for intake of fruit and vegetables, free sugars, sodium, fat and saturated fat.

Methods

We built decision trees for achievement of key dietary recommendations using data from the first four years of the rolling programme of the UK’s national dietary surveillance data set: the National Diet and Nutrition Survey (NDNS).

Data source

The NDNS is an annual cross-sectional survey assessing the diet, nutrient intake and nutritional status of the general population aged 18 months and upwards living in private households in the UK. Since 2008 an annual ‘rolling programme’ has been in place, allowing data to be combined over years. We used data from years 1–4 of this programme, collected in 2008–12. The NDNS aims to collect data from a sample of 1000 respondents per year: at least 500 adults (aged 19 years and older) and at least 500 children (aged 1·5 to 18 years).

Households across the UK are selected to take part in the NDNS using a multistage probability design. In each wave, a random sample of primary sampling units is selected for inclusion. These are small geographical areas that allow more efficient data collection by enabling it to be geographically focused. Within these primary sampling units, private addresses are randomly selected for inclusion. If, on visiting, it is found that more than one household lives at a particular address, one is randomly selected for inclusion. Within participating households, up to one adult and one child are randomly selected to take part as ‘respondents’.

Data collection includes completion of a 4 d estimated food diary, where participants estimate the weight of foods consumed using food labels and household measures. NDNS data were obtained from the UK Data Archive, an online resource that makes research data available to the UK research community.

Inclusion and exclusion criteria

NDNS participants were included in the analysis if they completed 3 or 4 d of the estimated food diary. As recommendations for fruit and vegetable intake apply only to those aged 11 years or older, children aged less than 11 years were excluded from this component of the analysis.

Outcomes of interest: achievement of dietary recommendations

Information on which foods were consumed, and how much participants estimated was consumed, was combined with nutritional information to determine mean daily intake of fruit and vegetables (80 g portions) and sodium (milligrams), as well as mean daily percentage of energy derived from fat, saturated fat and free sugars, for each individual. This information was then used to determine whether or not each individual met international, or UK, recommendations for these variables.

We used UK recommendations for fruit and vegetable and sodium intakes, as these have been graded according to age. It is recommended that individuals aged 11 years and older consume at least five 80 g portions of fruit and vegetables daily. This includes a maximum of one portion of juice, with additional juice portions not counted. For sodium, current UK recommendations are that those aged 11 years and older consume no more than 2400 mg/d; children aged 7–10 years, no more than 2000 mg/d; children aged 4–6 years, no more than 1200 mg/d; and children aged 1–3 years, no more than 800 mg/d.

The WHO recommends population food and nutrient intake goals for the avoidance of diet-related diseases. These state that no more than 30 % of energy should be derived from fat, no more than 10 % from saturated fat and no more than 10 % from free sugars.
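The thresholds above translate directly into a set of binary outcomes per individual. The following is a minimal sketch under stated assumptions: the function and argument names are hypothetical, `fruit_veg_portions` is taken to exclude juice, and ages falling between the published bands (e.g. 10·5 years) are assigned to the band below the next cut-off.

```python
def sodium_limit_mg(age_years: float) -> int:
    """UK daily sodium limit (mg/d) by age band, as listed in the text."""
    if age_years >= 11:
        return 2400
    if age_years >= 7:
        return 2000
    if age_years >= 4:
        return 1200
    return 800


def meets_recommendations(age_years, fruit_veg_portions, juice_portions,
                          sodium_mg, pct_energy_fat, pct_energy_sat_fat,
                          pct_energy_free_sugars):
    """Return a dict of outcome name -> True/False for one individual."""
    outcomes = {
        "sodium": sodium_mg <= sodium_limit_mg(age_years),
        "fat": pct_energy_fat <= 30,
        "saturated_fat": pct_energy_sat_fat <= 10,
        "free_sugars": pct_energy_free_sugars <= 10,
    }
    # The fruit and vegetable recommendation applies only from age 11 years;
    # juice counts for at most one of the five 80 g portions.
    if age_years >= 11:
        portions = fruit_veg_portions + min(juice_portions, 1)
        outcomes["fruit_veg"] = portions >= 5
    return outcomes


if __name__ == "__main__":
    print(meets_recommendations(30, 4, 2, 2300, 35, 9, 10))
```

Returning no fruit and vegetable entry for under-11s mirrors the exclusion of younger children from that component of the analysis.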

Predictor variables of interest: foods consumed

In total, 3911 different foods (including drinks) have been recorded in NDNS food diaries. We used total estimated weight (in grams) of each individual food eaten by each individual as potential predictor variables. Age and sex were also included as potential predictor variables. We also explored including markers of socio-economic position (education, income and social class) as potential predictor variables, but these did not increase accuracy over and above age, sex and individual foods. The decision trees reported here therefore do not include any socio-economic predictor variables.

Data analysis

Our analysis scripts and detailed decision trees are available at https://osf.io/znv82. In all cases except sodium, the proportion of individuals achieving the recommendations was substantially less than 50 %; for sodium substantially more than 50 % of individuals achieved the recommendations (Table 2). As detailed in the online supplementary material, this imbalance in outcome variables can lead to low-quality classifiers. To correct this, we pre-processed the data using the Synthetic Minority Over-sampling TEchnique (SMOTE), which creates new cases for the group that accounted for less than 50 % of participants by interpolating between existing cases that lie close together. WEKA software was then used to build decision trees using the J48 algorithm and error pruning.
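The interpolation idea behind SMOTE can be sketched in a few lines. This is a simplified illustration, not the WEKA filter used in the study: the function names, the choice of `k` nearest neighbours, the fixed seed and the rounding down of the number of synthetic cases are all assumptions.

```python
import math
import random


def n_synthetic(n_minority: int, pct: int) -> int:
    """Number of synthetic cases for an oversampling percentage (rounded down)."""
    return n_minority * pct // 100


def smote(minority, pct, k=2, seed=0):
    """Create synthetic minority cases by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for i in range(n_synthetic(len(minority), pct)):
        base = minority[i % len(minority)]
        # k nearest minority-class neighbours of the base case
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


if __name__ == "__main__":
    minority = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0), (2.5, 2.0)]
    for case in smote(minority, pct=250):
        print(case)
```

Under the rounding-down assumption, 252 % oversampling of the 656 individuals achieving the fruit and vegetable recommendation gives 1653 synthetic cases, i.e. the 2309 reported in Table 2.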
Table 2

Prevalence of achieving and not achieving dietary recommendations and accuracy of decision trees to predict this, using data mining techniques on the nutritional intake of 4156 individuals (2967 individuals for fruit and vegetables) from the UK National Diet and Nutrition Survey (2008–12)

| Measure | Fruit & vegetables | Free sugars | Sodium | Fat | Saturated fat |
|---|---|---|---|---|---|
| No. achieving recommendation without oversampling | 656 | 1472 | 2524 | 1045 | 795 |
| % | 22·1 | 35·4 | 60·7 | 25·1 | 19·1 |
| SMOTE oversampling %* | 252 % (yes) | 85 % (yes) | 54 % (no) | 197 % (yes) | 322 % (yes) |
| No. achieving recommendation after oversampling | 2309* | 2679 | 2524 | 3103 | 3354 |
| No. not achieving recommendation after oversampling | 2311* | 2684 | 2513 | 3111 | 3361 |
| Decision tree with the best trade-off between accuracy and number of predictor variables | | | | | |
| Overall accuracy (%) | 83·1 | 76·5 | 75·9 | 72·4 | 79·7 |
| Sensitivity (%) | 82·5 | 76·1 | 81·9 | 66·3 | 75·8 |
| Specificity (%) | 83·8 | 76·9 | 69·8 | 78·4 | 83·6 |
| No. of predictor variables | 11 | 28 | 28 | 33 | 28 |
| % of all relevant food/nutrient (g) accounted for by predictor variables | 21·0† | 31·2 | 13·4 | 13·0 | 27·4 |
| Most accurate decision tree | | | | | |
| Overall accuracy (%) | 83·6 | 77·0 | 76·1 | 72·9 | 81·7 |
| Sensitivity (%) | 83·9 | 75·7 | 80·7 | 69·3 | 81·4 |
| Specificity (%) | 83·3 | 78·3 | 71·5 | 76·4 | 81·9 |
| No. of predictor variables | 50 | 64 | 49 | 123 | 156 |
| % of all relevant food/nutrient (g) accounted for by predictor variables | 30·8† | 38·6 | 25·4 | 29·5 | 42·7 |

SMOTE, Synthetic Minority Over-sampling TEchnique.

* After oversampling using the SMOTE method (see online supplementary material).

† Percentage of all fruit and vegetables (g) recorded, not just those contributing to 5-a-day portions (specifically, fruit juice can contribute a maximum of only one 5-a-day portion).

For each outcome of interest we built a series of decision trees with different numbers of predictor variables by varying the minimum number of individual cases that could be further divided. For each of the decision trees built, we calculated the number of predictor variables used and overall accuracy in correctly classifying individuals. We used the standard tenfold cross-validation procedure in which the entire eligible NDNS data set was split into ten approximately equally sized parts. Nine parts were used in turn as training sets and the remaining tenth part was used as the testing set. The ability of decision trees to correctly identify those who achieved the recommendations (sensitivity) and those who did not (specificity) was also calculated. Adaptive sampling was used to identify the maximum overall accuracy that could be achieved, as well as the optimum trade-off between minimizing the number of predictor variables and maximizing the overall accuracy.
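The evaluation steps described above can be sketched as follows. This is an illustration rather than the WEKA implementation: the function names, the shuffling and the fixed seed are assumptions. The last helper shows how overall accuracy follows from sensitivity, specificity and the number of cases in each class.

```python
import random


def tenfold_indices(n_cases, k=10, seed=0):
    """Yield (train, test) index lists; each case is in exactly one test fold."""
    order = list(range(n_cases))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def confusion_metrics(actual, predicted):
    """Accuracy, sensitivity and specificity from paired True/False labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # achievers correctly identified
        "specificity": tn / (tn + fp),   # non-achievers correctly identified
    }


def weighted_accuracy(sensitivity_pct, specificity_pct, n_achieving, n_not):
    """Overall accuracy (%) implied by sensitivity, specificity and class sizes."""
    correct = sensitivity_pct * n_achieving + specificity_pct * n_not
    return correct / (n_achieving + n_not)


if __name__ == "__main__":
    sizes = sorted(len(test) for _, test in tenfold_indices(4156))
    print(sizes)
    # Fruit and vegetables, best trade-off tree, class sizes after oversampling
    print(weighted_accuracy(82.5, 83.8, 2309, 2311))
```

Applied to the fruit and vegetable figures in Table 2, the weighted combination of sensitivity and specificity comes out within rounding of the reported overall accuracy of 83·1 %.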

Results

Overall, 91 % of households eligible for inclusion agreed to take part in the first four waves of NDNS. Within these, 56 % (2083 adults and 2073 children; 4156 participants in total) of individuals selected to take part completed 3 or 4 d of the estimated food diary and were included in the analysis for sodium, free sugars, fat and saturated fat. Of these 4156 participants, 2967 (71·4 %) were aged 11 years or older and included in the analysis for fruit and vegetables. There were no missing data on sex or age. The distributions of age and sex in the analytical sample compared with the UK population as a whole are shown in Table 1. As the NDNS sample contains relatively equal numbers of children aged 18 years or younger and adults, distributions are provided separately for adults and children in Table 1. The main differences in the age and sex distributions between the analytical sample and the UK population were that the analytical sample had a higher proportion of adult women and a lower proportion of young adults (aged 19–29 years) than the UK population.
Table 1

Comparison of the analytical sample with the UK population

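| Variable | Adults aged 19 years or older: analytical sample (n 2083), n | % | Adults: UK population, n | % | Children aged <19 years: analytical sample (n 2073), n | % | Children: UK population, n | % |
|---|---|---|---|---|---|---|---|---|
| Female | 1182 | 56·8 | 25 198 773 | 51·5 | 1007 | 48·6 | 6 955 262 | 48·8 |
| Age (adults) | | | | | | | | |
| 19–29 years | 296 | 14·2 | 9 447 071 | 19·3 | | | | |
| 30–39 years | 390 | 18·7 | 8 319 926 | 17·0 | | | | |
| 40–49 years | 425 | 20·4 | 9 268 735 | 18·9 | | | | |
| 50–59 years | 363 | 17·4 | 7 708 532 | 15·8 | | | | |
| 60–64 years | 181 | 8·7 | 3 807 975 | 7·8 | | | | |
| ≥65 years | 428 | 20·6 | 10 377 127 | 21·2 | | | | |
| Age (children) | | | | | | | | |
| 0–4 years | | | | | 499 | 24·1 | 3 913 953 | 27·5 |
| 5–9 years | | | | | 583 | 26·4 | 3 516 615 | 24·7 |
| 10–14 years | | | | | 547 | 26·4 | 3 669 326 | 25·7 |
| 15–18 years | | | | | 444 | 21·4 | 3 152 919 | 22·1 |
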
Figure 2 shows the overall accuracy of decision trees for each of the five outcomes plotted against the number of predictor variables in decision trees. Overall accuracy ranged from 69 % (fat; ten predictor variables) to 84 % (fruit and vegetables; fifty predictor variables) depending on the outcome of interest and number of predictor variables included. For all guidelines but sodium, the relationship between the number of predictor variables and the accuracy was best described using a logarithmic trend model (P<0·01 in all cases). Thus, increasing the number of predictor variables from about ten to thirty improved the accuracy by a maximum of about five percentage points, but beyond this adding even a large number of additional predictor variables yielded only a very small additional improvement. We were unable to fit any function to the relationship between accuracy and number of predictor variables for sodium.
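A logarithmic trend of the form accuracy = a + b·ln(number of predictor variables) can be fitted by ordinary least squares on the log-transformed predictor count. The sketch below is only an illustration of that model form: the authors' fitting procedure is not described here, and the three points used for fat (69 % at ten predictor variables from the text; 72·4 % at 33 and 72·9 % at 123 from Table 2) are far fewer than the points plotted in Fig. 2.

```python
import math


def fit_log_trend(n_predictors, accuracy_pct):
    """Least-squares fit of accuracy = a + b * ln(n); returns (a, b)."""
    xs = [math.log(n) for n in n_predictors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(accuracy_pct) / len(accuracy_pct)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracy_pct))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, n):
    """Accuracy (%) predicted by the fitted trend for n predictor variables."""
    return a + b * math.log(n)


# Fat outcome: (number of predictor variables, overall accuracy in %)
fat_points = [(10, 69.0), (33, 72.4), (123, 72.9)]
a, b = fit_log_trend([n for n, _ in fat_points], [acc for _, acc in fat_points])

if __name__ == "__main__":
    print(f"a = {a:.2f}, b = {b:.2f}")
    for n in (10, 30, 100, 300):
        print(n, round(predict(a, b, n), 1))
```

Under such a model each doubling of the number of predictor variables adds the same b·ln 2 percentage points, which is why gains become small relative to the number of variables added once the tree already holds a few dozen.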
Fig. 2

(colour online) Overall accuracy (with 95 % confidence margins) of decision trees v. the number of predictor variables included, using data mining techniques on the nutritional intake of 4156 individuals (2967 individuals for fruit and vegetables) from the UK National Diet and Nutrition Survey (2008–12)

Table 2 provides information on the decision tree for each outcome that represented the best trade-off between accuracy and number of predictor variables. Information on the most accurate possible tree for each outcome is also shown in Table 2. Between eleven (for fruit and vegetables) and thirty-three (for fat) predictor variables provided the best trade-off to identify whether individuals achieved each of the recommendations, achieving overall accuracy of 72 % (for fat) to 83 % (for fruit and vegetables). Adding further predictor variables beyond this improved accuracy by a maximum of 2 percentage points (for saturated fat) and less than 1 percentage point (for all other outcomes). Sensitivity and specificity were similar to overall accuracy for fruit and vegetables and free sugars (and for saturated fat when the maximum number of predictor variables was included). However, specificity was higher than sensitivity for fat (and saturated fat), but the reverse was seen for sodium. Predictor variables in decision trees with the best trade-off between accuracy and number of predictor variables accounted for between 13 % (for fat) and 31 % (for free sugars) of total intake of relevant outcome variables.

Predictor variables used in decision trees with the best trade-off between accuracy and number of predictor variables are shown in Table 3. In total, 113 foods (3 % of the 3911 recorded as consumed), age and sex were included in the decision trees for all five outcomes. Overall, there was little overlap in predictor variables across outcomes. Age and two foods were included as predictor variables in the decision trees for three outcomes. A further seven foods were included as predictor variables in the decision trees for two outcomes. The remaining 104 foods were included as predictor variables in only one decision tree.
Table 3

Predictor variables (individual foods, age and sex) included in decision trees for predicting achievement of five dietary recommendations, using data mining techniques on the nutritional intake of 4156 individuals (2967 individuals for fruit and vegetables) from the UK National Diet and Nutrition Survey (2008–12)

| Predictor variable (food name, age or sex) | No. of the five decision trees (fat, free sugars, fruit & vegetables, sodium, saturated fat) marked ‘Yes’ |
|---|---|
| Age | 3 |
| Alcoholic soft drinks, spirit based | 1 |
| Almonds, kernel only: ground almonds | 1 |
| Apple juice, unsweetened, cartons, pasteurized | 1 |
| Apple juice, unsweetened, UHT | 1 |
| Apples, eating, raw, flesh & skin only | 1 |
| Avocado pear, flesh only | 1 |
| Bacon rashers, back, grilled, lean and fat | 1 |
| Bacon rashers, back, not smoked, grilled, extra trim | 1 |
| Baked beans in tomato sauce with pork sausages | 1 |
| Bananas, raw, flesh only | 2 |
| Beefburger and onion, grilled | 1 |
| Black pudding, fried | 1 |
| Blackcurrant juice drink, ready to drink, not low calorie | 1 |
| Boiled sweets, barley sugar, butterscotch, glacier mints, hard candy | 1 |
| Bread, white, crusty | 1 |
| Bread, white, toasted | 2 |
| Bread, 50 % white and 50 % wholemeal flours | 1 |
| Bread, white sliced, not fortified | 1 |
| Brown sauce, bottled | 1 |
| Brussels sprouts, fresh, boiled | 1 |
| Butter beans, dried, boiled | 1 |
| Butter, salted | 2 |
| Butter, unsalted | 1 |
| Carbonated beverages, no juice, not low calorie, canned | 1 |
| Carbonated beverages, no juice, not low calorie, not canned | 3 |
| Celery, fresh, raw | 1 |
| Chapatti, brown, no fat | 1 |
| Cheese, cheddar, any other or for recipes | 2 |
| Cheese, cheddar, English | 1 |
| Cheese, soft full fat, Philadelphia type | 1 |
| Chicken fried in olive oil | 1 |
| Children’s fromage frais fruit with added vitamin D | 1 |
| Chocolate brownie, no nuts, purchased | 1 |
| Chocolate-covered caramels, Cadbury Caramel | 1 |
| Chocolate Swiss roll with butter cream, purchased | 1 |
| Cola cherry cola, canned, not low calorie | 1 |
| Cola, not canned, not low calorie, not caffeine free | 1 |
| Coleslaw, purchased, not low calorie | 1 |
| Cookies and biscuits with chocolate | 1 |
| Cornetto type ice cream, chocolate or nut based | 1 |
| Cranberry fruit juice drink, e.g. Ocean Spray | 1 |
| Cream, double | 1 |
| Cream egg | 1 |
| Croissants, plain, not filled | 1 |
| Drinking chocolate, instant, dry weight | 1 |
| Fat spread (62–72 % fat), not polyunsaturated | 1 |
| Fruit gums, wine gums | 1 |
| Fruit juice drink, carbonated, not low calorie, not canned | 1 |
| Fruit juice drink with 5 % fruit juice, ready to drink | 1 |
| Fully coated chocolate biscuits with biscuit filling | 1 |
| Garlic bread, lower fat | 1 |
| Ham, unspecified, not smoked, not canned | 1 |
| Hamburger, Big Mac, McDonalds | 1 |
| High juice, ready to drink, not blackcurrant or low calorie | 1 |
| Ice lollies | 1 |
| Jaffa Cakes | 1 |
| Kit Kat | 1 |
| Lager, not canned, e.g. Heineken | 1 |
| Lager, not canned, e.g. Skol | 1 |
| Lamb scrag and neck, stewed, lean only | 1 |
| Lemonade, not low calorie, not canned | 1 |
| Light spreadable butter (60 % fat) | 1 |
| Lucozade sport isotonic drink, not carbonated | 1 |
| Mayonnaise (retail) | 2 |
| Milk chocolate bar | 2 |
| Milk shake, thick style, takeaway | 1 |
| Milk, skimmed, after boiling | 1 |
| Milk, whole pasteurized, winter | 1 |
| Milk, whole pasteurized, summer | 1 |
| Mushrooms fried in olive oil | 1 |
| Naan bread, plain | 1 |
| Oatcakes | 1 |
| Olive oil | 1 |
| Onions, boiled | 1 |
| Orange juice, unsweetened, UHT | 1 |
| Oven ready chips | 1 |
| Papadums/poppadoms, fried in vegetable ghee | 1 |
| Pasta noodles, boiled | 1 |
| Pasta noodles, egg, boiled | 1 |
| Pasta spaghetti, boiled, white | 1 |
| Peanut butter, crunchy, not wholenut | 1 |
| Pears, eating, raw, flesh & skin only, no core | 1 |
| Pepperami | 1 |
| Petit Filous fromage frais | 1 |
| Potato cakes (scones), purchased | 1 |
| Potatoes, new, boiled, skins eaten | 1 |
| Potatoes, old, baked, flesh & skin | 1 |
| Potatoes, old, mashed & butter | 1 |
| Prawns, boiled, flesh only | 1 |
| Reduced fat spread (41–62 %), not polyunsaturated | 1 |
| Ribena, original blackcurrant drink, concentrate | 1 |
| Robinsons Fruit Shoot | 1 |
| Rolls, white, crusty | 1 |
| Sausage roll, flaky pastry, purchased | 3 |
| Sausages, pork, grilled | 1 |
| Sausages, premium pork, grilled | 1 |
| Scrambled eggs with skimmed milk and no fat | 1 |
| Semi-sweet biscuit | 1 |
| Sex | 1 |
| Soya alternative to milk, sweetened plain | 1 |
| Spinach, fresh, raw | 1 |
| Spreadable butter (75–80 % fat) | 1 |
| Sugar, white | 1 |
| Super Noodles, Batchelors, as served | 1 |
| Swiss roll, individual, chocolate coated, purchased | 1 |
| Tomatoes, raw | 1 |
| Turkey slices, unsmoked, pre-pack or deli | 1 |
| Water for concentrated soft drinks, not diet | 1 |
| White chocolate buttons, mice | 1 |
| Whole milk, after boiling | 1 |
| Wine white, dry, not canned | 1 |
| Yoghurt twin pot with cereal/crumble | 2 |
| Yoghurt, Greek style, cows, natural, whole milk | 1 |
| Yorkshire pudding, frozen | 1 |

UHT, ultra-heat treated.


Discussion

Summary of results

The present study represents the first work we are aware of that uses data mining techniques to explore the number of foods on which information is required to predict achievement of dietary recommendations. In total, information on consumption of 113 of 3911 foods (3 %), plus age and sex, was required to accurately categorize individuals according to all five dietary recommendations (fruit and vegetables, free sugars, sodium, fat and saturated fat). The best trade-off between decision tree accuracy and number of foods included was achieved at between eleven (for fruit and vegetables) and thirty-two (for fat, plus age) foods. These decision trees had an overall accuracy of 72 % (for fat) to 83 % (for fruit and vegetables), with similar values for sensitivity and specificity. Few individual foods were present in the decision tree for more than one dietary recommendation, although age was present in three.

Strengths and limitations of methods

We used data from a population-based sample, meaning that our findings are likely to be generalizable across the UK and to other countries with similar dietary profiles. However, diets vary internationally and our results may not be more widely generalizable. The analytical sample had a slightly higher proportion of adult women and a lower proportion of younger adults (aged 19–29 years) than the UK population as a whole.

The data used were collected using ‘estimated’ food diaries where portion sizes were estimated but not weighed. These are considered to be one of the more accurate methods of measuring dietary intake, meaning that both the predictor and outcome variables are likely to be valid. However, even estimated food diaries have their limitations, particularly in terms of participant burden and under-reporting of energy intake. Doubly labelled water has been used to estimate total energy expenditure in a sub-sample of NDNS participants and compared with reported energy intake from food diaries. This reveals that reported energy intake is 12–34 % lower than estimated total energy expenditure, depending on the age of participants. This mismatch may be due to intentional or unintentional misreporting; participants changing their food intake in response to recording it; or a variety of other reasons. However, misreporting is unlikely to affect all foods and nutrients equally. For example, participants may be more likely to misreport confectionery than vegetable intake. For this reason, misreporting is not adjusted for in NDNS and we have not adjusted for misreporting here.

Data mining using decision trees is computationally and statistically efficient. For example, inclusion of all 3911 foods consumed by NDNS participants in regression models with achievement of dietary recommendations as outcomes would be computationally, and statistically, demanding and unlikely to produce satisfactory results. Decision trees also produce transparent, and intuitively understandable, outputs (ours are provided at https://osf.io/znv82).

Many of the foods included in the analysis had very skewed distributions. Indeed, the vast majority of foods in the database (3618) were eaten by fewer than 150 people. Decision trees seek to maximize information gain at each step, rather than working with the distribution as a whole as in traditional regression analysis. If an item is very discriminatory and helps differentiate between those who do and do not meet a particular guideline then it will be included, even if it is consumed by only a small number of people. Conversely, if an item is eaten by almost everyone but is not discriminatory, then it would be unlikely to be included. There was no overall trend between the proportion of participants who ate a food and the chance that that food was included in a decision tree (data not shown).

We used adaptive sampling to identify decision trees that achieved the best trade-off between accuracy and number of predictor variables included. Thus, instead of systematically calculating the accuracy of decision trees for every possible number of predictor variables, we focused on identifying the relationship between accuracy and number of predictor variables (logarithmic in most cases) and where the optimum trade-off between accuracy and number of predictor variables occurred (i.e. where the logarithmic curve flattened out). This means we cannot be absolutely sure that we have identified the decision trees with the best trade-off between accuracy and number of predictor variables in all cases. However, given the very small additional improvements in accuracy achieved by the most accurate v. best trade-off decision trees, we are likely to have identified near-best trade-off decision trees.

We used estimated dietary records as our ‘gold standard’ tool for determining whether or not individuals achieved recommendations. Further work will be required to compare the accuracy of our decision trees with other methods of estimating who achieves dietary recommendations, such as FFQ.
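The earlier point about information gain, that a food eaten by few people can still be selected if it discriminates well while a food eaten by almost everyone may not be, can be illustrated numerically. The counts below are invented for the example, and plain information gain is shown as a simplification; the split criterion actually applied by J48 is not detailed in the text.

```python
import math


def entropy(n_yes: int, n_no: int) -> float:
    """Shannon entropy (bits) of a group with n_yes achievers and n_no non-achievers."""
    total = n_yes + n_no
    h = 0.0
    for count in (n_yes, n_no):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(left, right) -> float:
    """Gain from cutting a group into `left` and `right`, each (n_yes, n_no)."""
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    parent = entropy(left[0] + right[0], left[1] + right[1])
    children = (n_left / n) * entropy(*left) + (n_right / n) * entropy(*right)
    return parent - children


# 100 hypothetical individuals, 50 of whom achieve the recommendation.
# Food A: eaten by only 10 people, all of whom achieve it.
gain_rare_food = information_gain(left=(10, 0), right=(40, 50))
# Food B: eaten by everyone; a cut on amount leaves both halves 50:50.
gain_common_food = information_gain(left=(25, 25), right=(25, 25))

if __name__ == "__main__":
    print(round(gain_rare_food, 3), round(gain_common_food, 3))
```

The rarely eaten food yields a positive gain and the universally eaten one yields none, so a greedy tree would cut on the former.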

Interpretation and implications of findings and areas for future work

Our findings indicate that information on only a small number of foods is required to determine whether individuals achieve five important dietary recommendations. If such binary outcomes are the main outcome of interest, then more detailed dietary assessment methods may use scarce research resources inefficiently and be unnecessarily burdensome to participants. However, substantial further research will be needed before these findings can be applied in the form of a new dietary assessment instrument.

First, it would be helpful to replicate our analyses in a different, but comparable, sample. We have not done this as we are not aware of a comparable UK population-representative sample in which diet diaries have been collected.

Second, our decision trees used information on exact intake of 113 foods over 3–4 d. Assessing exact intake of a small number of foods may be no less burdensome for participants than assessing estimated intake of all foods using a food diary. Future work could compare the accuracy of decision trees based on exact intake of the 113 foods, approximate intake of these foods (e.g. using the ordinal categories often used in FFQ), and exact and approximate intakes at the food group, rather than individual food, level. The acceptability to research participants and the resource implications of collecting the required data should also be compared in each case.

Finally, our analysis focused on which foods can be used to predict whether or not individuals achieve dietary recommendations. It is not necessarily the case that the foods included in the decision trees are those that cause people to achieve the recommendations or not. A maximum of only 31 % of the total intake of relevant nutrients or foods was accounted for by predictor variables in the decision trees with the best trade-off between accuracy and number of predictor variables. Thus, the decision trees did not preferentially include foods that account for the majority of intake of the nutrients and foods of interest, as might be expected in an FFQ. The complex relationships between the individual foods included in our decision trees and the dietary recommendations they are associated with may offer further useful insights and could be studied further.
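The share of intake accounted for by predictor foods, such as the 31 % maximum reported above, can be illustrated with a minimal sketch. This is not the authors' code: the function name `share_of_intake`, the food names and the sodium values are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
from typing import Dict, Iterable


def share_of_intake(intake_by_food: Dict[str, float],
                    predictor_foods: Iterable[str]) -> float:
    """Return the fraction of total intake contributed by predictor foods."""
    total = sum(intake_by_food.values())
    if total == 0:
        # No recorded intake at all: define the share as zero.
        return 0.0
    predictors = set(predictor_foods)
    covered = sum(amount for food, amount in intake_by_food.items()
                  if food in predictors)
    return covered / total


# Hypothetical sodium intake (mg) per food, summed over a sample.
# These numbers are invented for illustration and are not NDNS data.
sodium_mg = {
    "white bread": 400.0,
    "bacon": 300.0,
    "cheddar": 200.0,
    "crisps": 100.0,
    "tomato soup": 1000.0,
}

# Hypothetical foods that appear as predictors in a decision tree.
tree_foods = ["bacon", "crisps"]

share = share_of_intake(sodium_mg, tree_foods)
print(f"{share:.0%}")  # prints "20%"
```

In this invented example the two predictor foods contribute 400 of 2000 mg, so a tree could in principle classify individuals well while its foods cover only a minority of total intake.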

Conclusion

We used data mining techniques to explore how many foods require consumption information in order to accurately predict achievement, or not, of five key dietary recommendations. Information on consumption of eleven to thirty-two foods (plus age and sex) was sufficient to identify with 72–83 % accuracy whether individuals achieved each dietary recommendation. In total, information on 113 foods was required to predict achievement of all five recommendations studied. This method could be used to develop a new dietary assessment questionnaire.