Elizabeth L Chin, Gabriel Simmons, Yasmine Y Bouzid, Annie Kan, Dustin J Burnett, Ilias Tagkopoulos, Danielle G Lemay.
Abstract
The Automated Self-Administered 24-Hour Dietary Assessment Tool (ASA24) is a free dietary recall system that outputs fewer nutrients than the Nutrition Data System for Research (NDSR). NDSR uses the Nutrition Coordinating Center (NCC) Food and Nutrient Database, both of which require a license. Manual lookup of ASA24 foods into NDSR is time-consuming but currently the only way to acquire NCC-exclusive nutrients. Using lactose as an example, we evaluated machine learning and database matching methods to estimate this NCC-exclusive nutrient from ASA24 reports. ASA24-reported foods were manually looked up into NDSR to obtain lactose estimates and split into training (n = 378) and test (n = 189) datasets. Nine machine learning models were developed to predict lactose from the nutrients common between ASA24 and the NCC database. Database matching algorithms were developed to match NCC foods to an ASA24 food using only nutrients ("Nutrient-Only") or the nutrient and food descriptions ("Nutrient + Text"). For both methods, the lactose values were compared to the manual curation. Among machine learning models, the XGB-Regressor model performed best on held-out test data (R² = 0.33). For the database matching method, Nutrient + Text matching yielded the best lactose estimates (R² = 0.76), a vast improvement over the status quo of no estimate. These results suggest that computational methods can successfully estimate an NCC-exclusive nutrient for foods reported in ASA24.
Keywords: database matching; dietary recall; machine learning; nutrient database
Year: 2019 PMID: 31847188 PMCID: PMC6950225 DOI: 10.3390/nu11123045
Source DB: PubMed Journal: Nutrients ISSN: 2072-6643 Impact factor: 5.717
Figure 1. Comparison of estimated lactose to the servings of dairy for foods reported in the Automated Self-Administered 24-Hour Dietary Assessment Tool (ASA24) by a cohort of healthy U.S. adults. Lactose was estimated by manually looking up ASA24-reported foods into Nutrition Data System for Research (NDSR). The servings of total dairy are from the standard ASA24 output, with servings of soymilk subtracted [10,11]. The blue dashed line indicates the linear line of best fit.
Figure 2. Overview of the manual lookup process. Query foods were selected from foods that were retrieved from an ASA24 output (white boxes). Each query food has three parts: (1) the Food Name, (2) the FoodCode, and (3) the corresponding Food Description. The manual lookup process can be broken down into two parts: lookup in ASA24 (light blue) and lookup in NDSR (dark blue). The ASA24 Food Name for each input food was searched in ASA24-2016 to retrieve the answers for each prompt that yield a given Food Description. The input Food Name was then searched in NDSR. The prompt/answer pairs obtained from the ASA24-2016 inputs were used as a guide to select answers to the NDSR prompts. If the resulting NDSR output Food Description and the ASA24 Food Description were not similar, then a User Recipe was created in NDSR to serve as the match. NCC: Nutrition Coordinating Center.
Performance of the machine learning models.
| Performance Metric | LASSO | Bounded-LASSO a | Combined LASSO b | Ridge | Bounded-Ridge a | Combined Ridge b | FFNN | XGB-Regressor | Combined |
|---|---|---|---|---|---|---|---|---|---|
| Training * | | | | | | | | | |
| R² | 0.45 | 0.55 | 0.64 | 0.42 | 0.53 | 0.64 | 0.74 | | |
| SRC | 0.61 | 0.64 | 0.80 | 0.60 | 0.63 | 0.81 | 0.77 | 0.84 | |
| PCC | 0.70 | 0.75 | 0.82 | 0.69 | 0.74 | 0.82 | 0.88 | 0.90 | |
| MAE | 1.23 | 1.08 | 0.86 | 1.23 | 1.10 | 0.87 | 0.69 | 0.48 | |
| Classifier Accuracy | NA | NA | 0.89 | NA | NA | 0.89 | NA | NA | |
| Test | | | | | | | | | |
| R² | 0.32 (0.36) | 0.30 (0.53) | 0.32 (0.54) | 0.15 (0.27) | 0.12 (0.50) | 0.18 (0.46) | 0.27 (0.28) | 0.31 (0.31) | |
| SRC | 0.64 (0.67) | 0.63 (0.68) | 0.71 (0.75) | 0.61 (0.66) | 0.64 (0.69) | 0.72 (0.77) | 0.70 (0.69) | 0.75 (0.76) | |
| PCC | 0.62 (0.67) | 0.58 (0.78) | 0.61 (0.80) | 0.47 (0.59) | 0.48 (0.76) | 0.51 (0.77) | 0.60 (0.60) | 0.64 (0.63) | |
| MAE | 1.53 (1.49) | 1.37 (1.24) | 1.28 (1.16) | 1.64 (1.56) | 1.43 (1.27) | 1.37 (1.23) | 1.3 (1.28) | 1.18 (1.19) | |
| Classifier Accuracy | NA | NA | 0.85 (0.86) | NA | NA | 0.85 (0.86) | NA | NA | |
Values in parentheses are results after removal of “Salmon, raw” (ASA24 FoodCode 26137100). SRC: Spearman Rank Coefficient; PCC: Pearson’s Correlation Coefficient; MAE: Mean Absolute Error. Italicized values indicate the highest R², SRC, PCC, and Classifier Accuracy, and the lowest MAE for the training or test data sets. a Negative predictions were clipped to zero; b Combined models include a classifier and a regressor; * Training results are the averages from 10-fold cross validation.
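The footnotes above describe two post-processing ideas: clipping negative predictions to zero (the "Bounded" models) and pairing a classifier with a regressor (the "Combined" models). A minimal Python sketch of how these might fit together follows. How the classifier output gates the regressor is not stated in this record, so the gating rule used here (predict zero when the classifier reports no lactose) is an assumption, and `toy_classifier`, `toy_regressor`, and all feature and reference values are hypothetical stand-ins rather than the trained models or study data.

```python
from typing import Callable, Dict, Sequence

Features = Dict[str, float]


def bound_prediction(raw: float) -> float:
    """Clip a negative lactose prediction (g) to zero."""
    return max(0.0, raw)


def combined_prediction(
    features: Features,
    has_lactose: Callable[[Features], bool],
    regress: Callable[[Features], float],
) -> float:
    """Assumed gating rule: return 0 g unless the classifier flags lactose."""
    if not has_lactose(features):
        return 0.0
    return bound_prediction(regress(features))


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of |true - predicted| over all foods."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical stand-ins for a trained classifier and regressor.
def toy_classifier(f: Features) -> bool:
    return f["CALC"] > 50.0


def toy_regressor(f: Features) -> float:
    return 0.5 * f["SUGR"] - 1.0


foods = [
    {"SUGR": 10.0, "CALC": 100.0},  # flagged; regressor output is positive
    {"SUGR": 1.0, "CALC": 100.0},   # flagged; regressor output is negative, so clipped
    {"SUGR": 10.0, "CALC": 10.0},   # not flagged; forced to zero
]
predictions = [combined_prediction(f, toy_classifier, toy_regressor) for f in foods]
manual_lookup = [5.0, 0.0, 0.0]  # made-up reference values, not study data
mae = mean_absolute_error(manual_lookup, predictions)
```

Gating with a classifier first is one plausible way to keep a regressor from assigning small non-zero lactose values to foods that contain none; other combinations are possible.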
Figure 3. Comparison of the lactose (g) from the manual lookup to (a) the prediction from the XGB-Regressor model and (b) the Ridge-weighted Nutrient + Text database matching. For (b), markers are colored according to whether the first match was an NCC core food or a non-core food. Examples of core foods are apples, honey, and bread. Examples of non-core foods are Genoa salami, cheese bread, and scrambled eggs.
Figure 4. Top ten feature importances for the (a) Bounded-LASSO (least absolute shrinkage and selection operator), (b) Bounded-Ridge, (c) feed forward neural network (FFNN), and (d) Combined eXtreme Gradient Boosting (XGB) models. Features were selected based on the absolute value of the coefficient/weight/frequency, but the actual value is plotted. The frequency values for the XGB model are always positive. VB2: Vitamin B2; CHOLN: Choline; POTA: Potassium; SUGR: Sugar; MAGN: Magnesium; S080: Octanoic Acid; CHOLE: Cholesterol; S120: Dodecanoic Acid; FIBE: Fiber; FF: Folate; P226: Docosahexaenoic Acid (DHA); SELE: Selenium; PROT: Total Protein; COPP: Copper; CARB: Total Carbohydrate; TFAT: Total Fat; CALC: Calcium; SODI: Sodium; VB6: Vitamin B6.
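The selection rule in the Figure 4 caption (rank by absolute value, report the signed value) can be sketched in a few lines of Python. The function name `top_features` and the coefficient values are hypothetical and chosen only for illustration; they are not taken from the fitted models.

```python
from typing import Dict, List, Tuple


def top_features(coefficients: Dict[str, float], k: int = 10) -> List[Tuple[str, float]]:
    """Rank features by |value| but keep the signed value for plotting."""
    ranked = sorted(coefficients.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


# Made-up coefficients keyed by nutrient abbreviation.
example = {"SUGR": 0.8, "CALC": 0.5, "FIBE": -0.9, "SODI": 0.1}
top_two = top_features(example, k=2)
```

Ranking on the absolute value keeps strongly negative coefficients (here the made-up `FIBE` value) in view, which a ranking on the signed value would push to the bottom.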
Comparison of the lactose (g) between the manual lookup and first match for each database matching algorithm.
| Matching Algorithm | Weighting | Training (n = 378) | | | | Test (n = 189) | | | | All Data (n = 567) | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | R² | PCC | MAE | Variation | R² | PCC | MAE | Variation | R² | PCC | MAE | Variation |
| Nutrient-Only | Unweighted | 0.71 | | | | | | | | | | | |
| | LASSO-weighted | 0.31 (0.47) | 0.55 (0.69) | 1.05 (0.93) | 0.66 (0.68) | 0.19 | 0.44 | 1.45 | 0.77 | 0.25 (0.36) | 0.50 (0.60) | 1.18 (1.11) | 0.70 (0.70) |
| | Ridge-weighted | | | 0.74 | 0.48 | | 0.56 | 1.24 | 0.61 | | | 0.91 | 0.50 |
| Nutrient + Text | Unweighted | | | | 0.60 | 0.72 | 0.85 | 0.69 | | | | | 0.58 |
| | LASSO-weighted | 0.20 (0.57) | 0.44 | 0.92 | 0.60 | 0.64 | 0.80 | 0.78 | 0.63 | 0.26 (0.58) | 0.51 (0.76) | 0.90 (0.73) | 0.60 (0.60) |
| | Ridge-weighted | 0.23 (0.65) | 0.48 | 0.89 | | | | | | 0.31 (0.68) | 0.56 (0.82) | 0.81 (0.64) | |
PCC: Pearson’s Correlation Coefficient; MAE: Mean Absolute Error. The variation represents the median coefficient of variation in the g of lactose among the top five matches returned by each algorithm. Values in parentheses are after removal of the hot cocoa “outliers”. Italicized values indicate the highest R² and PCC and the lowest MAE and variation values for the Nutrient-Only and Nutrient + Text algorithms for each dataset.
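To make the matching and "variation" columns concrete, the following Python sketch ranks database foods by a weighted nutrient distance and computes the coefficient of variation of lactose among the top matches. Several details are assumptions, since this record does not specify them: the distance is taken to be a weighted Euclidean distance with non-negative weights, the population standard deviation is used, and a zero mean is mapped to a coefficient of variation of zero. The foods, nutrient values, and helper names are hypothetical, and the example uses the top two matches only because the toy database is small (the table uses the top five).

```python
import math
import statistics
from typing import Dict, List, Sequence


def weighted_distance(query: Sequence[float], candidate: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Assumed metric: weighted Euclidean distance over shared nutrients."""
    return math.sqrt(sum(w * (q - c) ** 2 for q, c, w in zip(query, candidate, weights)))


def top_matches(query: Sequence[float], database: List[Dict], weights: Sequence[float],
                k: int = 5) -> List[Dict]:
    """Return the k database foods closest to the query food."""
    return sorted(database,
                  key=lambda food: weighted_distance(query, food["nutrients"], weights))[:k]


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Population standard deviation divided by the mean (0.0 if the mean is 0)."""
    mean = statistics.mean(values)
    if mean == 0:
        return 0.0
    return statistics.pstdev(values) / mean


# Made-up database: two nutrient values per food plus lactose (g).
database = [
    {"name": "food A", "nutrients": [5.0, 110.0], "lactose": 5.0},
    {"name": "food B", "nutrients": [5.5, 120.0], "lactose": 4.0},
    {"name": "food C", "nutrients": [20.0, 10.0], "lactose": 0.0},
]
weights = [1.0, 1.0]  # "Unweighted"; model-derived weights would replace these
queries = [[5.0, 112.0], [19.0, 12.0]]

per_query_cv = []
first_matches = []
for query in queries:
    matches = top_matches(query, database, weights, k=2)
    first_matches.append(matches[0]["name"])
    per_query_cv.append(coefficient_of_variation([m["lactose"] for m in matches]))

median_cv = statistics.median(per_query_cv)
```

A low median coefficient of variation means the top matches agree on lactose content, so the estimate is less sensitive to which of them is ranked first.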
Comparison of first matches among unweighted, LASSO-weighted, and Ridge-weighted Nutrient-Only and Nutrient + Text matching.
| Input ASA24 Food Description | NDSR/NCC Manual Lookup | Matching Scheme | First Match NCC Short Description | | | g Lactose per 100 g of Food | | | |
|---|---|---|---|---|---|---|---|---|---|
| | | | Unweighted | LASSO-Weighted | Ridge-Weighted | Manual Lookup | Unweighted | LASSO-Weighted | Ridge-Weighted |
| Milk, cow’s, fluid, whole | Milk, whole (3.5–4% fat) | Nutrient-Only | Milk, lactose reduced (Lactaid), whole | Milk, lactose reduced (Lactaid), whole | Milk, lactose reduced (Lactaid), whole | 5.05 | 0.00 | 0.00 | 0.00 |
| | | Nutrient + Text | Milk, whole | Milk, whole | Milk, whole | 5.05 | 5.05 | 5.05 | 5.05 |
| Milk, cow’s, fluid, lactose reduced, 2% fat | Milk, lactose reduced (Lactaid), 2% fat or reduced fat | Nutrient-Only | Arby’s milk | Milk, acidophilus, 2% fat (reduced fat) | Milk, lactose reduced (Lactaid), 2% fat (reduced fat) | 0.00 | 5.01 | 5.01 | 0.00 |
| | | Nutrient + Text | Milk, lactose reduced (Lactaid), 2% fat (reduced fat) | Milk, lactose reduced (Lactaid), 2% fat (reduced fat) | Milk, lactose reduced (Lactaid), 2% fat (reduced fat) | 0.00 | 0.00 | 0.00 | 0.00 |
| Milk, soy, ready-to-drink, not baby’s | Milk, soy milk, ready-to-drink, plain or original, unknown if sweetened, unknown sweetening, unknown type, unknown if fortified | Nutrient-Only | Soy milk, plain or original, sweetened with sugar, ready-to-drink, enriched | Soy milk, vanilla or other flavors, sweetened with sugar, ready-to-drink, enriched | Soy milk, vanilla or other flavors, sweetened with sugar, ready-to-drink, enriched | 0.00 | 0.00 | 0.00 | 0.00 |
| | | Nutrient + Text | Soy milk, chocolate, sweetened with sugar, light, ready-to-drink | Soy milk, chocolate, sweetened with sugar, ready-to-drink, not fortified | Soy milk, chocolate, sweetened with sugar, ready-to-drink, not fortified | 0.00 | 0.00 | 0.00 | 0.00 |
| Milk, almond, ready-to-drink | Milk, almond beverage, plain or original, unknown type | Nutrient-Only | Cashew milk, chocolate | SnackWell’s 100 Calorie Pack—Fudge Drizzled Double Chocolate Chip (Nabisco) | Cashew milk, chocolate | 0.00 | 0.00 | 0.00 | 0.00 |
| | | Nutrient + Text | Almond milk, chocolate, sweetened | Chocolate milk, ready-to-drink | Almond milk, chocolate, sweetened | 0.00 | 0.00 | 5.12 | 0.00 |
| Cheese, Cheddar | Cheddar cheese, unknown type | Nutrient-Only | Colby cheese, natural | Pepper Jack cheese | Colby Jack cheese | 0.12 | 0.23 | 0.67 | 0.37 |
| | | Nutrient + Text | Cheddar cheese, natural | Cheddar cheese, natural | Cheddar cheese, natural | 0.12 | 0.12 | 0.12 | 0.12 |
| Yogurt, plain, whole milk | Yogurt, plain, whole milk (3%–4% fat) | Nutrient-Only | Mountain High Original Style Yoghurt—plain | Mountain High Original Style Yoghurt—plain | Stonyfield Organic YoBaby Yogurt—plain | 3.38 | 3.38 | 3.38 | 3.38 |
| | | Nutrient + Text | Yogurt, plain, whole milk | Yogurt, plain, whole milk | Yogurt, plain, whole milk | 3.38 | 3.38 | 3.38 | 3.38 |
| High protein bar, candy-like, soy and milk base | Special formulated products, Tiger’s Milk Nutrition Bar—Protein Rich | Nutrient-Only | Tiger’s Milk Nutrition Bar—Peanut Butter and Honey | Tiger’s Milk Nutrition Bar—Peanut Butter and Honey | Tiger’s Milk Nutrition Bar—Peanut Butter and Honey | 3.24 | 4.52 | 4.52 | 4.52 |
| | | Nutrient + Text | Nutribar High Protein Meal Replacement Bar—Milk Chocolate Peanut | Slim-Fast High Protein—Creamy Milk Chocolate, dry mix (unprepared) | High-protein bar | 3.24 | 1.54 | 12.19 | 0.12 |
| Clam chowder, NS as to Manhattan or New England style | Clams, soup—clam chowder, New England (cream base), unknown preparation | Nutrient-Only | Pescado frito con mojo (fish a la creole) | Dairy Queen Hot Dog with chili | Cheeseburger on a bun, single patty (1/10 LB), with ketchup, tomato, lettuce, pickle, onion, mustard | 2.46 | 0.00 | 0.00 | 0.17 |
| | | Nutrient + Text | Manhattan clam chowder, tomato base, homemade | Manhattan clam chowder, tomato base, ready-to-serve can | Manhattan clam chowder, condensed | 2.46 | 0.00 | 0.00 | 0.00 |
| Cocoa, sugar, and dry milk mixture, water added | Cocoa or hot chocolate, prepared from dry mix, unknown type | Nutrient-Only | Soy milk, vanilla or other flavors, sweetened with sugar, light, ready-to-drink, not fortified | Almond milk, vanilla or other flavors, sweetened | Gerber Breakfast Buddies Hot Cereal with Real Fruit and Yogurt—Bananas and Cream | 0.92 | 0.00 | 0.00 | 2.80 |
| | | Nutrient + Text | Land O’Lakes Cocoa Classics—Arctic White Cocoa, prepared | Swiss Miss Hot Cocoa Sensible Sweets—No Sugar Added, dry mix (unprepared) | Swiss Miss Hot Cocoa Sensible Sweets—No Sugar Added, dry mix (unprepared) | 0.92 | 0.89 | 51.88 | 51.88 |