Yusentha Balakrishna, Samuel Manda, Henry Mwambi, Averalda van Graan.
Abstract
Evidence-based knowledge of the relationship between foods and nutrients is needed to inform dietary-based guidelines and policy. Proper and tailored statistical methods to analyse food composition databases (FCDBs) could assist in this regard. This review aims to collate the existing literature that used any statistical method to analyse FCDBs, to identify key trends and research gaps. The search strategy yielded 4238 references from electronic databases of which 24 fulfilled our inclusion criteria. Information on the objectives, statistical methods, and results was extracted. Statistical methods were mostly applied to group similar food items (37.5%). Other aims and objectives included determining associations between the nutrient content and known food characteristics (25.0%), determining nutrient co-occurrence (20.8%), evaluating nutrient changes over time (16.7%), and addressing the accuracy and completeness of databases (16.7%). Standard statistical tests (33.3%) were the most utilised followed by clustering (29.1%), other methods (16.7%), regression methods (12.5%), and dimension reduction techniques (8.3%). Nutrient data has unique characteristics such as correlated components, natural groupings, and a compositional nature. Statistical methods used for analysis need to account for this data structure. Our summary of the literature provides a reference for researchers looking to expand into this area.
Keywords: clustering; dimension reduction; food composition database; nutrient database; regression; review; statistical methods
Year: 2022 PMID: 35683993 PMCID: PMC9182527 DOI: 10.3390/nu14112193
Source DB: PubMed Journal: Nutrients ISSN: 2072-6643 Impact factor: 6.706
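The abstract notes that nutrient data are compositional and that analysis methods need to account for this. One common way to do so, shown here only as an illustration and not as a method the review prescribes, is the centred log-ratio (CLR) transform, which expresses each component relative to the geometric mean of all components. The food item and its values below are hypothetical, and the sketch assumes all parts are strictly positive (zeros would need separate handling before taking logarithms).

```python
import math


def closure(parts):
    """Rescale strictly positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]


def clr(parts):
    """Centred log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(p) for p in closure(parts)]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical macronutrient values (g per 100 g); not taken from the review.
food = {"protein": 20.0, "fat": 10.0, "carbohydrate": 70.0}
transformed = clr(list(food.values()))

for nutrient, value in zip(food, transformed):
    print(f"{nutrient}: {value:.3f}")
```

CLR values sum to zero and do not change when the raw amounts are rescaled (for example, per 100 g versus per serving), which is why the transform is often applied before correlation or principal component analysis on compositional data.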
Figure 1. Flowchart of reviewed studies.
Description of included studies.
| Study | Year | Country of Data | Objectives | Methods | Main results |
|---|---|---|---|---|---|
| Akbay et al. | 2000 | United States of America | To divide lamb meat into groups distinct in dietary fat and associated nutrients, to offer healthier dietary replacements. | Agglomerative hierarchical cluster analysis with average linkage | Two main clusters were found in the lamb meat data. One of the clusters divided into two families and four subfamilies based on fatty acids, cholesterol, and energy composition. |
| Atsa’am et al. | 2021 | West Africa | To determine subgroupings within the ‘cereals’ category of foods. | K-means clustering with Euclidean distance | Six subgroups within the ‘cereals’ category were found which separated the grains by type and preparation method. |
| Balakrishna et al. | 2021 | South Africa | To determine nutrient co-occurrence patterns and compositionally similar food groupings. | Spearman’s rank correlation, principal component analysis | Significant correlations were found among the nutrients. Eight nutrient patterns were obtained, which mirrored the South African food-based dietary guidelines. |
| Chu et al. | 2009 | Taiwan | To detect unlikely nutrient values (errors) in a food composition database. | Ranking coefficients of variation for each nutrient by food subgroup and detecting outliers | Compared to manual assessment, error detection increased 38-fold with the computerised process. |
| Davis et al. | 2004 | United States of America | To determine possible changes in nutrient composition for garden crops between 1950 and 1990. | Wilcoxon signed-rank test | Of the 13 nutrients analysed for 43 food items, 6 exhibited statistically significant declines from 1950 to 1990. Declines ranged from 6% to 38%. |
| Ispirova et al. | 2019 | Italy, United Kingdom, Switzerland, Sweden, Slovenia, Belgium, Denmark, Netherlands, United States of America, Canada | To decrease the error of data borrowing when imputing missing nutrient values from other food composition databases. | Non-negative matrix factorization, null hypothesis testing | When borrowing from other food composition databases, the proposed methodology produced smaller absolute errors more often than regular borrowing methods. |
| Ispirova et al. | 2020 | Italy, United Kingdom, Switzerland, Sweden, Slovenia, Belgium, Denmark, The Netherlands, United States of America, Canada | To evaluate imputation methods for missing data in food composition databases. | Non-negative matrix factorization, multiple imputations by chained equations, nonparametric missing value imputation using random forest, k-nearest neighbour | Imputation methods using statistical prediction performed better than traditional approaches of fill-in with the mean and fill-in with the median. Overall, the missing value imputation using the random forest technique performed the best by yielding the smallest error. |
| Khan | 1996 | Australia | To identify thresholds that would rank food items according to a low, medium, or high nutrient content. | Measures of central tendency | No optimum ranking scheme was recommended, but guidelines were provided. Three different criteria were deemed suitable in line with the proposed guidelines. |
| Kim et al. | 2015 | United States of America | To investigate the relationships between food items and between nutrients to inform nutrition. | Network-based approaches, hierarchical clustering with average linkage, pairwise correlations | Clustering revealed a hierarchical organisation of food that was consistent with common nutritional knowledge but also found unexpected relationships between food items. Similarly, significant positive pairwise correlations were found to exist between nutrients. |
| Li et al. | 2021 | United States of America | To categorise raw plant foods according to nutritional similarity. | Spearman’s rank correlation, principal component analysis, soft independent modelling of class analogies (SIMCA), agglomerative hierarchical clustering | Four clusters were identified that consisted of foods from different food groups. Better separation was achieved using clusters rather than traditional food groups. |
| Liu et al. | 2012 | China | All foods in traditional Chinese medicine are categorised into ‘the four natures’: cold, cool, warm, and hot. The purpose of this paper is to examine the association between the nutrient content of these foods and their cold-hot nature category. | Logistic regression | Fat, carbohydrate, and selenium were significantly associated with the hot nature of foods while iron and copper were significantly associated with the cold nature of foods. The results suggest that the nutrient contents of foods may be one of the distinguishing factors for the categorisation of the cold-hot nature of foods. |
| Mayer | 1997 | United Kingdom | To determine if the nutritional composition of fruits and vegetables had changed between the 1930s and 1980s. | Student’s t-test | There was a reduction in the nutrient content for several food items over the 50 years. |
| Nguyen et al. | 2016 | United States of America | To determine whether ‘healthier’ versions of common foods have more sugar than ‘regular’ counterparts. | Friedman test, post-hoc Wilcoxon signed-rank test | The sugar content of foods classified as ‘low-fat’ and ‘non-fat’ was higher than that of regular versions. |
| Nikitina et al. | 2021 | Unknown | To cluster cottage cheese products and confectionery by carbohydrate content. | K-means clustering | Five clusters were found which identified foods with low, medium, and high carbohydrate contents. |
| Pennington and Fisher | 2009 | United States of America | To empirically group fruits and vegetables based on food components of public health significance and thereafter, relate them to four classification variables: botanic family, colour, part of the plant, and total antioxidant capacity. | Agglomerative hierarchical clustering with Ward’s linkage, multivariate analysis of variance (MANOVA) | Eight clusters were identified that could be used to classify the 104 fruits and vegetables. Clusters were best defined by a combination of classification variables, such as colour and part of the plant, and were predictive of the nutritional profile. |
| Pennington and Fisher | 2010 | United States of America | To determine fruit and vegetable subgroups with significantly higher concentrations of 24 food components. | Kruskal–Wallis, one-way analysis of variance (ANOVA) | Concentrations of the 24 food components differed between the subgroups and can be used to aid nutritional guidelines. |
| Phanich et al. | 2010 | Thailand | To develop a food recommendation system that enables diabetic patients to find suitable substitutions, similar in nutrient composition and characteristics, for food items. | Self-Organising Map (SOM), k-means clustering | The resulting clusters contained foods that provided similar amounts of eight selected nutrients. Patients were able to use the developed software to select healthier alternatives to current food choices. |
| do Prado et al. | 2016 | Brazil | To compare the change in the nutrient composition of specific Brazilian food groups between 2003 and 2013. | Percentage change, hierarchical cluster analysis, principal component analysis | The results showed that using pre-established food groups alone may be inaccurate in assessing changes in nutritional composition. Hierarchical cluster analysis combined with percentage change allowed efficient identification of the changes in the nutritional composition of food items. |
| Similä et al. | 2006 | Finland | To obtain information about nutrient co-occurrence patterns among food items. | Factor analysis | Four nutrient content patterns, which were consistent with prior knowledge, were identified. The patterns were characterised by (1) fish, meat, dairy products, legumes, seeds, and nuts; (2) vegetable fats; (3) staple foods; and (4) offal foods (liver, kidney). |
| Westrich et al. | 1998 | United States of America | To estimate unknown nutrient values in commercial food products. | Linear programming optimisation, quadratic programming optimisation | The proposed optimisation software was able to estimate missing nutrient values four times faster than conventional methods with the same degree of accuracy. The linear programming method was found to be faster than the quadratic programming method. |
| White and Broadley | 2005 | United Kingdom, United States of America | To determine nutrient composition changes in fruits and vegetables between the 1930s and 1980s. | Student’s t-test | Average concentrations of certain minerals in fruits and vegetables had significantly decreased between the 1930s and 1980s. |
| Windham et al. | 1985 | United States of America | To group foods within the dairy, grain, and fat commodity groups depending on nutrients of current concern. | Fuzzy c-means clustering | Five clusters within each commodity group were identified. Nutrient content was similar within subgroups and overcame the problem of objective and accurate grouping when dealing simultaneously with many nutrients. The use of fuzzy clustering avoided the use of arbitrary cut-offs when determining foods high and low in specific nutrients. |
| Xie et al. | 2020 | United States of America, China | To determine the nutrients affecting the cold-hot nature property of food items as per traditional Chinese medicine. | Multivariate ordinal regression, ANOVA | Six components (folate, vitamin B6, calcium, vitamin A, and caffeine) were found to be predictive of the cold, plain, and hot nature of foods. |
| Yarbrough Al-Bander et al. | 1988 | United Kingdom | To explore the distribution of chloride content in uncooked foods and determine its correlation with sodium content. | Wilcoxon signed-rank test, linear regression | Chloride and sodium content exhibited large variation among the 216 uncooked food items. Chloride and sodium content were strongly correlated. |
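Several of the studies listed above used Spearman’s rank correlation to examine nutrient co-occurrence. The following is a minimal sketch of that statistic, computed as the Pearson correlation of average ranks so that ties are handled; it is not the code used by any of the cited studies. The sodium and chloride values are hypothetical and serve only to exercise the functions.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical nutrient values (mg per 100 g) for five foods; not real data.
sodium = [5.0, 40.0, 120.0, 300.0, 800.0]
chloride = [9.0, 70.0, 150.0, 500.0, 1300.0]
print(f"Spearman rho: {spearman(sodium, chloride):.3f}")
```

Because the statistic depends only on ranks, it is insensitive to the strong right skew that nutrient concentrations typically show, which is one plausible reason it appears in this literature more often than Pearson’s correlation.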
Characteristics of included studies (n = 24).
| Characteristic | n | % |
|---|---|---|
| Primary statistical method | ||
| Standard statistical methods a | 8 | 33.3 |
| Regression methods | 3 | 12.5 |
| Clustering | 7 | 29.1 |
| Dimension reduction techniques | 2 | 8.3 |
| Other b | 4 | 16.7 |
| Aims c | ||
| To evaluate changes in nutrient content over time | 4 | 16.7 |
| To determine associations between nutrient content and known food features | 6 | 25.0 |
| To identify compositionally similar food items | 9 | 37.5 |
| To determine nutrient co-occurrence patterns | 5 | 20.8 |
| To address the completeness and accuracy of food composition databases | 4 | 16.7 |
| Country of data used c | ||
| United States of America | 13 | 54.2 |
| United Kingdom | 5 | 20.8 |
| Other d | 11 | 45.8 |
a Descriptive statistics, Student’s t-test, Wilcoxon signed-rank, Kruskal–Wallis test, and Friedman test.
b Network-based approaches, mathematical optimisation, and hypothesis testing.
c Some studies addressed more than one aim or used data from more than one country. One study did not specify the country of data used.
d Australia, Belgium, Brazil, Canada, China, Denmark, Finland, Italy, The Netherlands, Slovenia, South Africa, Sweden, Switzerland, Taiwan, Thailand, and West Africa.
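The percentages in the table above are counts divided by the 24 included studies. The short sketch below recomputes them from the reported counts and shows why the aims sum to more than 24 (footnote c). The dictionary and function names are illustrative only. With conventional rounding, the clustering share comes out as 29.2% rather than the reported 29.1%; the source does not state the reason for this difference.

```python
TOTAL_STUDIES = 24

# Counts as reported in the table of study characteristics.
primary_method = {
    "Standard statistical methods": 8,
    "Regression methods": 3,
    "Clustering": 7,
    "Dimension reduction techniques": 2,
    "Other": 4,
}
aims = {
    "Changes in nutrient content over time": 4,
    "Associations with known food features": 6,
    "Compositionally similar food items": 9,
    "Nutrient co-occurrence patterns": 5,
    "Completeness and accuracy of databases": 4,
}


def percentages(counts, total=TOTAL_STUDIES):
    """Return each count as a percentage of total, rounded to one decimal."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


print(percentages(primary_method))
print(percentages(aims))

# Primary methods are mutually exclusive, so their counts sum to 24.
# Aims overlap because some studies addressed more than one aim.
print(sum(primary_method.values()), sum(aims.values()))
```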
Figure 2. Themes from the literature on the application of statistical methods to food composition data.