Yu-Chi Lee, Jacob J Christensen, Laurence D Parnell, Caren E Smith, Jonathan Shao, Nicola M McKeown, José M Ordovás, Chao-Qiang Lai.
Abstract
Obesity is associated with many chronic diseases that impair healthy aging and is governed by genetic, epigenetic, and environmental factors and their complex interactions. This study aimed to develop a model that predicts an individual's risk of obesity by better characterizing these complex relations and interactions focusing on dietary factors. For this purpose, we conducted a combined genome-wide and epigenome-wide scan for body mass index (BMI) and up to three-way interactions among 402,793 single nucleotide polymorphisms (SNPs), 415,202 DNA methylation sites (DMSs), and 397 dietary and lifestyle factors using the generalized multifactor dimensionality reduction (GMDR) method. The training set consisted of 1,573 participants in exam 8 of the Framingham Offspring Study (FOS) cohort. After identifying genetic, epigenetic, and dietary factors that passed statistical significance, we applied machine learning (ML) algorithms to predict participants' obesity status in the test set, taken as a subset of independent samples (n = 394) from the same cohort. The quality and accuracy of prediction models were evaluated using the area under the receiver operating characteristic curve (ROC-AUC). GMDR identified 213 SNPs, 530 DMSs, and 49 dietary and lifestyle factors as significant predictors of obesity. Comparing several ML algorithms, we found that the stochastic gradient boosting model provided the best prediction accuracy for obesity with an overall accuracy of 70%, with ROC-AUC of 0.72 in test set samples. Top predictors of the best-fit model were 21 SNPs, 230 DMSs in genes such as CPT1A, ABCG1, SLC7A11, RNF145, and SREBF1, and 26 dietary factors, including processed meat, diet soda, French fries, high-fat dairy, artificial sweeteners, alcohol intake, and specific nutrients and food components, such as calcium and flavonols. In conclusion, we developed an integrated approach with ML to predict obesity using omics and dietary data. 
This extends our knowledge of the drivers of obesity, which can inform precision nutrition strategies for the prevention and treatment of obesity. Clinical Trial Registration: www.ClinicalTrials.gov, the Framingham Heart Study (FHS), NCT00005121.
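The ROC-AUC used to compare models can be read as the probability that a randomly chosen participant with obesity receives a higher predicted score than a randomly chosen participant without it. The following is a minimal sketch of that calculation, assuming binary labels (1 = obese, 0 = not obese) and real-valued model scores. The function name `roc_auc` is illustrative and is not taken from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via the rank-sum (Mann-Whitney U) identity.

    Tied scores receive the average of the ranks they span.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Sort indices by score, then assign 1-based average ranks to tied groups.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic for the positive class, normalized by the number of pairs.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported 0.72 should be read.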
Keywords: DNA methylation; GxE interaction; diet; genomics; machine learning; obesity; precision nutrition
Year: 2022 PMID: 35047011 PMCID: PMC8763388 DOI: 10.3389/fgene.2021.783845
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1. Phenotype prediction data analysis procedure (pipeline).
General characteristics of the FOS.
| FOS | Training set | Testing set |
|---|---|---|
| N | 1,573 | 394 |
| Men/women, n (% in women) | 700/873 (55.5%) | 178/216 (54.8%) |
| Age, y | 66.3 ± 8.9 | 66.5 ± 8.7 |
| BMI, kg/m2 | 28.1 ± 5.3 | 28.0 ± 5.2 |
| Overweight and obesity, n (%) | 1,122 (71.3%) | 281 (71.3%) |
| Obesity, n (%) | 473 (30.1%) | 118 (30.0%) |
| Smoker, n (%) | 115 (7.3%) | 26 (6.6%) |
| Drinker, n (%) | 1,205 (76.6%) | 321 (81.5%) |
| Type 2 diabetes, n (%) | 210 (13.4%) | 53 (13.5%) |
| Hypertension, n (%) | 858 (54.5%) | 221 (56.1%) |
| Type 2 diabetes medication, n (%) | 160 (10.2%) | 39 (9.9%) |
| Hypertension medication, n (%) | 756 (48.1%) | 196 (49.7%) |
| Lipid-lowering medication, n (%) | 682 (43.4%) | 171 (43.4%) |
| Total energy intake, kcal/d | 1,873 ± 629 | 1,875 ± 636 |
| Physical activity score | 37.7 ± 6.4 | 37.6 ± 5.8 |
All continuous variables were presented as mean ± SD.
FIGURE 2. Receiver operating characteristic (ROC) curves and their corresponding AUC values for different machine learning algorithms using 50 sample model objects for obesity status in the training data set of the FOS (n = 1,573). All models were based on continuous input variables and an under-sampling approach.
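Under-sampling balances the classes before model fitting by discarding randomly chosen examples from the larger class. The following is a minimal sketch, assuming simple random under-sampling to a 1:1 ratio. The exact scheme used in the study is not stated here, and `undersample` is a hypothetical helper.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


def undersample(
    rows: Sequence[Row], labels: Sequence[int], seed: int = 0
) -> Tuple[List[Row], List[int]]:
    """Randomly drop examples from the larger class until both classes are equal in size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y != 1]
    n = min(len(pos), len(neg))

    # Sample without replacement from each class, then shuffle the combined indices.
    keep = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]
```

With the training-set counts reported for obesity (473 of 1,573 participants), this scheme would retain 473 participants per class, or 946 in total.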
Performance metrics of overweight and obesity prediction models constructed using various machine learning algorithms in the test data set of the FOS.
| Model/algorithm | ROC-AUC | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|
| Overweight and obesity | ||||
| Boot-strapped trees (treebag) | 0.65 | 0.64 | 0.62 | 0.63 |
| Random forest (ranger) | 0.68 | 0.63 | 0.64 | 0.63 |
| Stochastic gradient boosting machines (gbm) | **0.72** | | | **0.70** |
| Obesity | ||||
| Boot-strapped trees (treebag) | 0.65 | 0.63 | 0.62 | 0.62 |
| Random forest (ranger) | 0.66 | 0.53 | 0.65 | 0.61 |
| Stochastic gradient boosting machines (gbm) | 0.67 | 0.51 | 0.68 | 0.63 |
All models were based on continuous input variables and an under-sampling approach.
The best metrics in each column are shown in bold.
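Sensitivity, specificity, and accuracy all derive from the confusion matrix obtained at a chosen score threshold. The following is a minimal sketch, assuming a threshold of 0.5 (the threshold used in the study is not stated here) and a hypothetical function name.

```python
from typing import Dict, Sequence


def classification_metrics(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Compute sensitivity, specificity, and accuracy for binary labels (1 = positive)."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return {
        "sensitivity": tp / (tp + fn),  # share of true positives detected
        "specificity": tn / (tn + fp),  # share of true negatives detected
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }
```

Unlike ROC-AUC, these three values change with the threshold, so a model can trade sensitivity against specificity without any change in its ranking quality.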
FIGURE 3. Receiver operating characteristic (ROC) curve of the overweight and obesity prediction model using the stochastic gradient boosting machine learning algorithm in the test data set of the FOS (n = 394). This model was based on continuous input variables and an under-sampling approach.
Top 50 predictive features of the best-performing model for predicting overweight/obesity status in the FOS.
| Importance | Feature | Chr | Position | Gene |
|---|---|---|---|---|
| 69.43 | cg06690548 | 4 | 139162808 | |
| 40.14 | cg15754660 | 7 | 34699393 | |
| 36.52 | Nutrient value—calcium | | | |
| 33.83 | cg06560379 | 6 | 44231305 | |
| 28.66 | cg11024682 | 17 | 17730094 | |
| 28.49 | cg05201185 | 6 | 30459139 | |
| 28.00 | cg17501210 | 6 | 166970252 | |
| 25.37 | Food—low-calorie Cola, no caffeine | | | |
| 24.77 | cg26278103 | 7 | 124404244 | |
| 23.40 | Food group—high-fat dairy servings | | | |
| 22.34 | rs1740322 | | | |
| 21.62 | rs4974985 | 4 | 38961449 | |
| 18.97 | cg00174508 | 12 | 107774298 | |
| 18.02 | cg14476101 | 1 | 120255992 | |
| 17.22 | cg16341269 | 6 | 150213172 | |
| 16.58 | Sex | | | |
| 15.75 | cg22650271 | 22 | 39760165 | |
| 15.21 | cg08766211 | 15 | 79118175 | NA |
| 14.77 | cg11963676 | 1 | 76540110 | |
| 13.58 | cg04582365 | 10 | 59155846 | NA |
| 13.24 | cg07052041 | 10 | 135092104 | NA |
| 12.67 | cg18034719 | 5 | 176860863 | |
| 12.44 | cg15448990 | 4 | 88411497 | |
| 12.29 | cg26722769 | 4 | 170328730 | |
| 11.86 | cg00945735 | 7 | 41982767 | NA |
Predicted responses in overweight and obesity status of subjects with simulated dietary feature changes in the test data set of the FOS (n = 260).
| Modifying feature | Original status: overweight or obese | Original status: not overweight or obese | Total |
|---|---|---|---|
| Food group—processed meat servings | 28 (10.8%) | 23 (8.8%) | 51 (19.6%) |
| Food group—high-fat dairy servings | 15 (5.8%) | 3 (1.2%) | 18 (6.9%) |
| Food—French fries | 0 | 4 (1.5%) | 4 (1.5%) |
| Nutrient value—calcium | 8 (3.1%) | 6 (2.3%) | 14 (5.4%) |
| Nutrient value—animal fat | 5 (1.9%) | 1 (0.4%) | 6 (2.3%) |
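One plausible reading of this simulation is that each participant is re-scored after a single dietary feature is changed, and participants whose predicted class flips are counted by their original status. The following is a minimal sketch under that assumption. The `predict` callable, the `change` function, and the name `count_flips` are hypothetical, and the study's fitted model and perturbation sizes are not reproduced here.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]


def count_flips(
    participants: List[Features],
    original_status: List[int],
    predict: Callable[[Features], int],
    feature: str,
    change: Callable[[float], float],
) -> Dict[int, int]:
    """Count participants whose predicted class changes after modifying one feature.

    Returns a dict keyed by original status (1 = overweight or obese, 0 = not).
    """
    flips = {0: 0, 1: 0}
    for feats, status in zip(participants, original_status):
        modified = dict(feats)  # copy so the input record is left untouched
        modified[feature] = change(feats[feature])
        if predict(modified) != predict(feats):
            flips[status] += 1
    return flips
```

The percentages in the table are counts divided by the 260 participants in the simulation; for example, 28/260 is about 10.8%.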