Abstract
Recent studies have revealed the importance of interaction effects in cardiac research. An analysis that fails to account for a significant interaction can lead to an erroneous conclusion. Regression models handle interaction by adding the product of the two interacting variables as a term, so standard statistical methods can evaluate the significance and contribution of that term. Machine learning strategies, however, cannot provide a p-value for a specific feature interaction. We therefore propose a novel machine learning algorithm that assesses the p-value of a feature interaction, named the extreme gradient boosting machine for feature interaction (XGB-FI). The first step borrows from statistical methodology by stratifying the original data into four subgroups according to the two interacting features. The second step builds four XGB machines, using cross-validation to avoid overfitting. The third step calculates a newly defined feature interaction ratio (FIR) for all possible combinations of predictors. Finally, the empirical p-value is calculated from the FIR distribution. Computer simulation studies compared XGB-FI with a multiple regression model containing an interaction term. The results showed that the type I error of XGB-FI is controlled at the nominal level of 0.05 when there is no interaction effect, and that the power of XGB-FI is consistently higher than that of the multiple regression model in all scenarios examined. In conclusion, the new machine learning algorithm outperforms the conventional statistical model when searching for an interaction.
Keywords: XGB; cross-validation; interaction; machine learning
Year: 2022 PMID: 35206527 PMCID: PMC8871671 DOI: 10.3390/ijerph19042338
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
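The abstract contrasts two ideas: a regression model that tests interaction through a product term, and a first XGB-FI step that stratifies the data into four subgroups by the two interacting features. The sketch below illustrates both on simulated data. It is not the authors' code: the median cut points, the function names `fit_interaction_ols` and `stratify_four`, the simulated coefficients, and the normal approximation to the t-test are all assumptions made for illustration.

```python
import math
import numpy as np

def fit_interaction_ols(x1, x2, y):
    """Fit y ~ 1 + x1 + x2 + x1*x2 by least squares.

    Returns the coefficients and a two-sided p-value for the product
    term, using a normal approximation to the t statistic (assumed
    adequate for large samples).
    """
    X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof                  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)         # coefficient covariance
    z = beta[3] / math.sqrt(cov[3, 3])
    p_value = math.erfc(abs(z) / math.sqrt(2))    # two-sided normal tail
    return beta, p_value

def stratify_four(x1, x2):
    """Label each row 0..3 by whether x1 and x2 exceed their medians.

    The median split is a hypothetical choice; the abstract does not
    state how the four subgroups are cut.
    """
    hi1 = x1 > np.median(x1)
    hi2 = x2 > np.median(x2)
    return 2 * hi1.astype(int) + hi2.astype(int)

# Simulated data with a true interaction coefficient of 1.0 (illustrative).
rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 0.5 * x1 + 0.5 * x2 + 1.0 * x1 * x2 + rng.normal(size=n)

beta, p_value = fit_interaction_ols(x1, x2, y)
groups = stratify_four(x1, x2)
```

Here `beta[3]` estimates the interaction coefficient, and `groups` gives the subgroup index on which a separate model could be trained in each stratum.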
Figure 1. The data structure of the XGB-FI.
Figure 2. Correlation structure of simulated data.
Different correlation structures between Y and the predictors.
| Scenario | Correlation coefficient with Y, first predictor | Correlation coefficient with Y, second predictor |
|---|---|---|
| Scenario 1 | 0.3 | 0.3 |
| Scenario 2 | 0.5 | 0.5 |
| Scenario 3 | 0.5 | 0.8 |
| Scenario 4 | 0.8 | 0.8 |
Figure 3. The null distribution of the XGB-FI.
Figure 4. Power simulation under the mild interaction effect of normal (0.5, 1) for XGB-FI.
Figure 5. Power simulation under the moderate interaction effect of normal (1, 1) for XGB-FI.
Figure 6. Power simulation under the considerable interaction effect of normal (2, 1) for XGB-FI.
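The figures above concern a null distribution and simulated power. As a rough illustration of how an empirical p-value can be read off a null distribution, and how type I error or power can be estimated as a rejection rate over simulated data sets, consider the sketch below. The statistic is a placeholder and not the FIR, whose definition is not given here; the `+1` correction and the function names are assumptions rather than the authors' procedure.

```python
import numpy as np

def empirical_p_value(observed, null_stats):
    """Share of null statistics at least as large as the observed one.

    The +1 in numerator and denominator is a common convention (assumed
    here) that keeps the estimate strictly above zero.
    """
    null_stats = np.asarray(null_stats, dtype=float)
    exceed = np.count_nonzero(null_stats >= observed)
    return (exceed + 1) / (null_stats.size + 1)

def rejection_rate(p_values, alpha=0.05):
    """Fraction of simulated data sets with p < alpha.

    Under no interaction this estimates type I error; under a true
    interaction it estimates power.
    """
    return float(np.mean(np.asarray(p_values) < alpha))

rng = np.random.default_rng(1)
null_stats = rng.normal(size=999)                  # placeholder null draws
p_extreme = empirical_p_value(5.0, null_stats)     # far in the upper tail
p_typical = empirical_p_value(0.0, null_stats)     # near the centre
rate = rejection_rate([0.01, 0.20, 0.03, 0.50])    # two of four below 0.05
```

A statistic far in the tail of the null draws yields a small empirical p-value, while one near the centre yields a value close to 0.5.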
Power comparisons between the XGB-FI and multiple regression with sample size 500.
| Method | Mild interaction effect | Moderate interaction effect | Considerable interaction effect |
|---|---|---|---|
| XGB-FI | 0.14 (0.07, 0.21) | 0.62 (0.52, 0.72) | 0.79 (0.71, 0.87) |
| Multiple regression | 0.04 (0.002, 0.078) | 0.20 (0.12, 0.28) | 0.52 (0.42, 0.62) |
Power comparisons between the XGB-FI and multiple regression with sample size 1000.
| Method | Mild interaction effect | Moderate interaction effect | Considerable interaction effect |
|---|---|---|---|
| XGB-FI | 0.23 (0.15, 0.31) | 0.77 (0.69, 0.85) | 0.94 (0.89, 0.99) |
| Multiple regression | 0.12 (0.06, 0.18) | 0.41 (0.31, 0.51) | 0.85 (0.78, 0.92) |
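The two tables above report each power estimate with a bracketed interval, but this record does not say how the intervals were computed or how many simulation replicates were run. The bracketed values are numerically consistent with a Wald 95% interval for a proportion based on 100 replicates; that reading is an assumption. The sketch below shows the arithmetic, with `wald_interval` as a hypothetical helper name.

```python
import math

def wald_interval(p_hat, n_rep, z=1.96):
    """Wald interval for a proportion: p_hat +/- z * sqrt(p_hat(1-p_hat)/n).

    n_rep (number of simulation replicates) is assumed, not taken from
    the source.
    """
    half = z * math.sqrt(p_hat * (1 - p_hat) / n_rep)
    return p_hat - half, p_hat + half

# Check against two reported entries, assuming 100 replicates.
lo_mild, hi_mild = wald_interval(0.14, 100)   # XGB-FI, mild, n = 500
lo_big, hi_big = wald_interval(0.94, 100)     # XGB-FI, considerable, n = 1000
```

Rounded to two decimals, these reproduce the reported (0.07, 0.21) and (0.89, 0.99), which supports the assumed reading without confirming it.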
Power comparison under Scenarios 1 to 4.
| Scenario | Method | Mild interaction effect | Moderate interaction effect | Considerable interaction effect |
|---|---|---|---|---|
| 1 | XGB-FI | 0.94 | 1 | 1 |
| 1 | Multiple regression | 0.04 | 0.37 | 0.69 |
| 2 | XGB-FI | 0.95 | 1 | 0.98 |
| 2 | Multiple regression | 0.09 | 0.43 | 0.88 |
| 3 | XGB-FI | 0.23 | 0.8 | 0.96 |
| 3 | Multiple regression | 0.1 | 0.4 | 0.88 |
| 4 | XGB-FI | 1 | 1 | 0.95 |
| 4 | Multiple regression | 0.12 | 0.35 | 0.85 |