Shizhao Chen1, Yiran Dai1, Xiaoman Ma1, Huimin Peng2, Donghui Wang1, Yili Wang1.
Abstract
Precision medicine applies machine learning methods to estimate the personalized optimal treatment decision from individual information, such as genetic data and medical history. The main purpose of obesity self-management is to develop a personalized optimal life plan that is easy to implement and adhere to, thereby reducing the incidence of obesity and obesity-related diseases. The methodology comprises three components. First, we apply CatBoost, random forest and the lasso covariance test to evaluate the importance of individual features in forecasting body mass index (BMI). Second, we apply metaalgorithms to estimate the personalized optimal decision on alcohol, vegetable, high caloric food and daily water intake, respectively, for each individual. Third, we propose new metaalgorithms, named the SX and SXwint learners, to compute the personalized optimal decision, and compare their performance with that of other prevailing metalearners. We find that people who receive individualized optimal treatment options have lower obesity levels not only than others, but also than those who receive 'one-for-all' treatment options. In conclusion, all metaalgorithms are effective at estimating the personalized optimal decision, and the SXwint learner shows the best performance on daily water intake.
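To make the metalearner idea in the abstract more concrete, the following is a minimal sketch of a T learner, one of the baseline learners compared in the record. It assumes a single covariate, a binary treatment and ordinary least squares as the base model; the toy data, the function names and the choice of base model are illustrative assumptions, not the authors' implementation. The SX and SXwint learners are not reproduced here because their construction is not described in this record.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single covariate."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def t_learner(rows):
    """Fit one outcome model per treatment arm.

    rows: iterable of (x, t, y) with covariate x, binary treatment t
    and outcome y (here BMI, where lower is better).
    """
    models = {}
    for arm in (0, 1):
        xs = [x for x, t, _ in rows if t == arm]
        ys = [y for _, t, y in rows if t == arm]
        models[arm] = fit_line(xs, ys)

    def predict(x, arm):
        a, b = models[arm]
        return a + b * x

    def recommend(x):
        # Estimated effect of T = 1 relative to T = 0 for this individual.
        effect = predict(x, 1) - predict(x, 0)
        # Recommend T = 1 only if it lowers the predicted BMI.
        return (1 if effect < 0 else 0), effect

    return recommend


# Hypothetical toy data: (age, treatment, BMI)
toy = [
    (20, 0, 30.0), (30, 0, 32.0), (40, 0, 34.0),
    (20, 1, 33.0), (30, 1, 32.0), (40, 1, 31.0),
]
recommend = t_learner(toy)
```

Calling `recommend(x)` returns the suggested treatment together with the estimated individual effect, so the personalized decision can differ from person to person even when a single 'one-for-all' rule would assign everyone the same treatment.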
Year: 2022 PMID: 35858966 PMCID: PMC9297061 DOI: 10.1038/s41598-022-16260-w
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Description of characteristics for overweight and obese individuals.
| Feature | Minimum | Maximum | Average | Standard deviation | Drop in variance | P value |
|---|---|---|---|---|---|---|
| Continuous feature | | | | | | |
| FCVC | 1.00 | 3.00 | 2.43 | 0.51 | 165.15 | 0.0000 |
| NCP | 1.00 | 4.00 | 2.64 | 0.73 | 3.40 | 0.0334 |
| FAF | 0.00 | 3.00 | 0.93 | 0.80 | 3.34 | 0.0353 |
| CH2O | 1.00 | 3.00 | 2.06 | 0.60 | 0.18 | 0.8328 |
| TUE | 0.00 | 2.00 | 0.62 | 0.58 | 0.15 | 0.8581 |
| Age | 15.00 | 56.00 | 25.60 | 6.48 | 0.02 | 0.9784 |
Drop in Variance is the test statistic of covariance test. P Value is the p value of covariance test. For continuous features, minimum, maximum, average and standard deviation are displayed. For categorical features, Count is the sample size under each category.
Figure 1. Feature importance for predicting BMI of overweight and obese people using (a) CatBoost, (b) random forest.
Figure 2. Illustration of metaalgorithms. D: Training data. S: Testing data. R: Re-training data. A: The set of all individuals. DR: The union of training and re-training data. N: The set of all individuals with treatment observation T = 0. K: The set of all individuals with treatment observation T = 1. ND: Training data from N. NS: Testing data from N. NR: Re-training data from N. KD: Training data from K. KS: Testing data from K. KR: Re-training data from K. Solid lines represent random splits of datasets. Dotted lines stand for the computational processes of models.
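The data partition named in the Figure 2 caption can be sketched as follows. This is only an illustration of the set notation (A, N, K, D, S, R, DR): the split fractions, the seed and the record layout are assumptions, since the record does not state how the random splits were sized.

```python
import random


def split_by_treatment(individuals, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split A into N (T = 0) and K (T = 1), then each into D, S and R.

    fractions gives the assumed shares of training (D), testing (S) and
    re-training (R) data; these shares are hypothetical.
    """
    rng = random.Random(seed)
    groups = {
        "N": [p for p in individuals if p["T"] == 0],
        "K": [p for p in individuals if p["T"] == 1],
    }
    parts = {}
    for name, members in groups.items():
        shuffled = members[:]
        rng.shuffle(shuffled)  # random split within each treatment group
        n_train = round(fractions[0] * len(shuffled))
        n_test = round(fractions[1] * len(shuffled))
        parts[name + "D"] = shuffled[:n_train]
        parts[name + "S"] = shuffled[n_train:n_train + n_test]
        parts[name + "R"] = shuffled[n_train + n_test:]
    # Pooled sets across both treatment groups.
    for suffix in "DSR":
        parts[suffix] = parts["N" + suffix] + parts["K" + suffix]
    parts["DR"] = parts["D"] + parts["R"]
    return parts


# Hypothetical individuals with an id and an observed binary treatment T
people = [{"id": i, "T": i % 2} for i in range(20)]
parts = split_by_treatment(people)
```

Splitting within N and K separately keeps both treatment arms represented in every subset, which is what the ND/NS/NR and KD/KS/KR labels in the caption suggest.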
Two-sample Kolmogorov–Smirnov (KS) test results concerning alcohol, vegetable, high caloric food and daily water intake.
| Learner | KS test 1 D | P value 1 | KS test 2 D | P value 2 | No.O | No.NO | No.G |
|---|---|---|---|---|---|---|---|
| Alcohol | | | | | | | |
| T | 0.5647 | < 2.2e−16 | 0.3892 | 6.7e−16 | 272 | 500 | 207 |
| X | 0.5592 | < 2.2e−16 | | 7.8e−16 | 269 | 503 | 207 |
| S | 0.5437 | < 2.2e−16 | 0.3284 | 2.9e−11 | 262 | 510 | 207 |
| SX | 0.5255 | < 2.2e−16 | 0.3082 | 3.5e−11 | 352 | 420 | 207 |
| SXwint | | < 2.2e−16 | 0.3355 | 7.1e−13 | 331 | 441 | 207 |
| Vegetable | | | | | | | |
| T | 0.5022 | < 2.2e−16 | 0.1782 | 3.2e−4 | 275 | 497 | 275 |
| X | 0.5594 | < 2.2e−16 | | 4.2e−6 | 258 | 514 | 275 |
| S | 0.4165 | < 2.2e−16 | 0.1213 | 4.2e−2 | 251 | 521 | 275 |
| SX | 0.4742 | < 2.2e−16 | 0.1473 | 6.9e−3 | 249 | 523 | 275 |
| SXwint | | < 2.2e−16 | 0.1131 | 4.2e−2 | 333 | 439 | 275 |
| High caloric food | | | | | | | |
| T | 0.6683 | < 2.2e−16 | | 1.2e−4 | 170 | 602 | 58 |
| X | | < 2.2e−16 | 0.3267 | 2.2e−4 | 162 | 610 | 58 |
| S | 0.6004 | < 2.2e−16 | 0.2343 | 2.1e−2 | 147 | 625 | 58 |
| SX | 0.5391 | < 2.2e−16 | 0.1675 | 1.3e−1 | 305 | 467 | 58 |
| SXwint | 0.6530 | < 2.2e−16 | 0.2096 | 3.3e−2 | 242 | 530 | 58 |
| Daily water intake | | | | | | | |
| T | 0.4001 | < 2.2e−16 | 0.1474 | 7.3e−4 | 341 | 431 | 391 |
| X | 0.4282 | < 2.2e−16 | 0.1411 | 7.6e−4 | 400 | 372 | 391 |
| S | 0.3880 | < 2.2e−16 | 0.1342 | 2.5e−3 | 352 | 420 | 391 |
| SX | 0.3150 | < 2.2e−16 | 0.1255 | 7.2e−3 | 329 | 443 | 391 |
| SXwint | | < 2.2e−16 | | 1.3e−6 | 295 | 477 | 391 |
KS test 1 D and P value 1 are the test statistic and p value of KS test 1 between the distributions of BMI in the personalized optimal and non-optimal groups. KS test 2 D and P value 2 are the test statistic and p value of KS test 2 between the distributions of BMI in the personalized optimal and general optimal groups. No.O is the sample size of the personalized optimal group. No.NO is the sample size of the non-optimal group. No.G is the sample size of the general optimal group.
The largest KS test distance statistic produced under each type of food or drink is in bold.
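The distance statistic D reported in the table is the largest vertical gap between the two empirical distribution functions of BMI. The sketch below computes that statistic in plain Python on made-up BMI values; the sample values and the function name are assumptions for illustration, and no p value is computed here (in practice a routine such as `scipy.stats.ks_2samp` returns both the statistic and the p value).

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS distance D = max over x of |F_a(x) - F_b(x)|.

    F_a and F_b are the empirical distribution functions of the samples.
    The maximum is attained at one of the observed values, so it is
    enough to evaluate the gap at every sample point.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)  # share of a that is <= x
        f_b = bisect_right(b, x) / len(b)  # share of b that is <= x
        d = max(d, abs(f_a - f_b))
    return d


# Hypothetical BMI values for two groups
optimal = [24.1, 25.3, 26.0, 27.2]
non_optimal = [29.5, 31.0, 33.4, 35.8]
d_separated = ks_statistic(optimal, non_optimal)
d_overlap = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
```

A value of D close to 1 means the two BMI distributions barely overlap, while a value close to 0 means they are nearly indistinguishable, which is why a larger D for KS test 1 indicates a clearer separation between the personalized optimal and non-optimal groups.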
Figure 3. Comparison between BMI distributions in groups: (a) Alcohol Yes and No in the testing data, Yes = positive alcohol intake, No = zero alcohol intake, (b) Alcohol T learner NO and O, NO = non-optimal group, O = personalized optimal group, (c) Alcohol X learner NO and O, (d) Alcohol S learner NO and O, (e) Alcohol SX learner NO and O, (f) Alcohol SXwint learner NO and O, (g) Vegetable High and Low in the testing data, High = FCVC > 2 = positive vegetable intake in every meal, Low = FCVC ≤ 2 = no vegetable intake in some meals, (h) Vegetable T learner NO and O, (i) Vegetable X learner NO and O, (j) Vegetable S learner NO and O, (k) Vegetable SX learner NO and O, (l) Vegetable SXwint learner NO and O, (m) HCF Yes and No in the testing data, HCF = high caloric food, Yes = high frequency of HCF intake, No = low frequency of HCF intake, (n) HCF T learner NO and O, (o) HCF X learner NO and O, (p) HCF S learner NO and O, (q) HCF SX learner NO and O, (r) HCF SXwint learner NO and O, (s) Water High and Low in the testing data, High = CH2O > 2 = daily water intake greater than 2 liters, Low = CH2O ≤ 2 = daily water intake less than or equal to 2 liters, (t) Water T learner NO and O, (u) Water X learner NO and O, (v) Water S learner NO and O, (w) Water SX learner NO and O, (x) Water SXwint learner NO and O.
Figure 4. Comparison of sample size ratios between personalized optimal and non-optimal groups for T, X, S, SX and SXwint learners: (a) Alcohol CALC Yes/No Ratio = (Sample size with CALC = Yes)/(Sample size with CALC = No), (b) Vegetable FCVC High/Low Ratio = (Sample size with FCVC > 2)/(Sample size with FCVC ≤ 2), (c) High caloric food FAVC Yes/No Ratio = (Sample size with FAVC = Yes)/(Sample size with FAVC = No), (d) Water CH2O High/Low Ratio = (Sample size with CH2O > 2)/(Sample size with CH2O ≤ 2). Optimal = personalized optimal group. Non-optimal = non-optimal group. On the testing data, (a) CALC Yes/No Ratio = 2.729, (b) FCVC High/Low Ratio = 1.807, (c) FAVC Yes/No Ratio = 12.310, (d) CH2O High/Low Ratio = 0.974.
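The ratios defined in the Figure 4 caption are simple counts divided within each group. A small sketch, assuming each individual is stored as a (group, category) pair, is given below; the records and the helper name `category_ratio` are hypothetical and do not correspond to values in the study.

```python
def category_ratio(values, numerator="Yes", denominator="No"):
    """Ratio = (sample size in numerator category) / (sample size in denominator category)."""
    n_num = sum(1 for v in values if v == numerator)
    n_den = sum(1 for v in values if v == denominator)
    if n_den == 0:
        raise ValueError("denominator category is empty; ratio is undefined")
    return n_num / n_den


# Hypothetical records: (group, CALC), where O = personalized optimal group
# and NO = non-optimal group
records = [
    ("O", "No"), ("O", "No"), ("O", "Yes"),
    ("NO", "Yes"), ("NO", "Yes"), ("NO", "Yes"), ("NO", "No"),
]
ratios = {
    group: category_ratio([v for g, v in records if g == group])
    for group in ("O", "NO")
}
```

Comparing the ratio in the optimal group with the ratio in the non-optimal group (and with the overall ratio on the testing data) shows whether a learner tends to place individuals with a given habit into the personalized optimal group.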