Xiangrui Li, Dongxiao Zhu, Ming Dong, Milad Zafar Nezhad, Alexander Janke, Phillip D Levy.
Abstract
Eradicating health disparities is a new focus of precision medicine research. Identifying patient subgroups is an effective way to customize treatments and maximize their efficiency in precision medicine. Some features may be important risk factors for specific patient subgroups but not for others, resulting in potentially divergent treatments within a given population. In this paper, we propose a tree-based method, called the Subgroup Detection Tree (SDT), to detect patient subgroups with personalized risk factors. SDT differs from conventional CART in its splitting criterion, which prioritizes potential risk factors. Subgroups are formed automatically as leaf nodes during tree growing. We applied SDT to a clinical hypertension (HTN) dataset to investigate significant risk factors for hypertensive heart disease in African-American patients, and uncovered significant correlations between vitamin D and left ventricular mass index (LVMI) in selected patient subgroups. Further, we enhance SDT with ensemble learning to reduce the variance of its predictions.
Year: 2017 PMID: 28815129 PMCID: PMC5543368
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
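The abstract states that SDT changes CART's splitting criterion to prioritize risk factors, but does not give the criterion itself. As an illustration only, the sketch below grows a tree with a swapped-in, model-based criterion (in the spirit of model-based recursive partitioning, not the published SDT rule): each candidate split is scored by the summed RSS of a separate least-squares line of the response on one designated risk factor in each child, and the leaves are returned as subgroups. The helper names (`grow`, `linreg_rss`, `leaves`), the parameters `risk`, `min_leaf`, `max_depth`, and the choice to exclude the risk factor from the splitting variables are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


def linreg_rss(x: Sequence[float], y: Sequence[float]) -> float:
    """RSS of the least-squares line y ~ a + b*x (slope 0 if x is constant)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b = 0.0 if sxx == 0 else sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    a0 = my - b * mx
    return sum((c - a0 - b * a) ** 2 for a, c in zip(x, y))


@dataclass
class Node:
    indices: List[int]                 # training rows that reach this node
    feature: Optional[int] = None      # splitting feature; None marks a leaf
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def grow(X, y, risk, idx=None, min_leaf=5, max_depth=3, depth=0):
    """Recursively split rows; the resulting leaves are the candidate subgroups.

    HYPOTHETICAL criterion (not the published SDT rule): a split is scored by the
    summed RSS of a separate y ~ X[:, risk] line in each child, so it favours
    partitions where the risk factor relates to y differently.
    """
    if idx is None:
        idx = list(range(len(y)))
    node = Node(idx)
    if depth >= max_depth or len(idx) < 2 * min_leaf:
        return node

    def child_rss(rows):
        return linreg_rss([X[i][risk] for i in rows], [y[i] for i in rows])

    parent = child_rss(idx)
    best = None  # (score, feature, threshold, left_rows, right_rows)
    for j in range(len(X[0])):
        if j == risk:
            continue  # assumption: the risk factor is modelled in leaves, not split on
        for t in sorted({X[i][j] for i in idx})[:-1]:
            left = [i for i in idx if X[i][j] <= t]
            right = [i for i in idx if X[i][j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = child_rss(left) + child_rss(right)
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None or best[0] >= parent:
        return node  # no admissible split improves the fit
    _, node.feature, node.threshold, left, right = best
    node.left = grow(X, y, risk, left, min_leaf, max_depth, depth + 1)
    node.right = grow(X, y, risk, right, min_leaf, max_depth, depth + 1)
    return node


def leaves(node):
    """Row-index sets of the leaf nodes, i.e. the detected subgroups."""
    if node.left is None:
        return [node.indices]
    return leaves(node.left) + leaves(node.right)
```

Here `risk` is the column index of the prioritized factor (vitamin D, say), and `min_leaf` and `max_depth` are illustrative stopping rules; a tree grown this way would normally still be pruned, for example by cross-validation.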
Figure 1. Flowchart of training an SDT. RSS refers to the residual sum of squares.
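For reference, a conventional CART regression tree chooses the splitting feature j and threshold s that minimize the residual sum of squares of the two resulting child nodes, where ȳ_L and ȳ_R are the mean responses in the left and right child. This is the standard CART quantity; the caption alone does not show how SDT modifies it.

```latex
\mathrm{RSS}(j, s) = \sum_{i:\, x_{ij} \le s} \left(y_i - \bar{y}_L\right)^2
                   + \sum_{i:\, x_{ij} > s} \left(y_i - \bar{y}_R\right)^2,
\qquad
(j^{*}, s^{*}) = \arg\min_{j,\, s} \mathrm{RSS}(j, s)
```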
Statistics of correlation tests (σ = 0 vs. σ ≠ 0) between LVMI and vitamin D for subgroups identified by SDT and CART. Subgroups of marginal significance are bold-faced. Note that T is the same subgroup as D.
| Method | Subgroup | Correlation | p-value |
|---|---|---|---|
| Entire dataset | All | -0.12 | 0.15 |
| SDT | A | -0.30 | |
| SDT | B | -0.10 | 0.43 |
| SDT | C | 0.55 | |
| SDT | D | -0.40 | |
| CART | R | -0.07 | 0.58 |
| CART | S | 0.16 | 0.54 |
| CART | T | -0.40 | |
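The table tests, within each subgroup, whether LVMI and vitamin D are uncorrelated. The sketch below assumes a Pearson correlation (the table does not name the coefficient) and reports the usual t statistic with n - 2 degrees of freedom. Because Python's standard library has no Student-t CDF, the p-value uses the Fisher z-transform with a normal approximation, so it can differ slightly from an exact t-test p-value. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_test(r, n):
    """Two-sided test of zero correlation for a sample correlation r (|r| < 1).

    Returns (t, p): t has n - 2 degrees of freedom; p is an approximate p-value
    from the Fisher z-transform, z = atanh(r) * sqrt(n - 3), against N(0, 1).
    """
    t = r * math.sqrt((n - 2) / (1 - r * r))
    z = math.atanh(r) * math.sqrt(n - 3)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return t, p
```

For a subgroup, one would call `correlation_test(pearson_r(lvmi, vit_d), len(lvmi))` on that subgroup's rows; exact t-based p-values are available from `scipy.stats.pearsonr` if SciPy is installed.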
Figure 2. Average 10-fold cross-validation MSE on LVMI over 100 runs for each subtree.
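Averaging k-fold cross-validation MSE over many random fold assignments, as the caption describes, smooths out the dependence on any single split of the data. A minimal sketch, assuming each candidate subtree can be wrapped in a `fit(X_train, y_train)` function that returns a row-to-prediction callable; `repeated_kfold_mse` and that interface are hypothetical.

```python
import random
from statistics import fmean


def repeated_kfold_mse(fit, X, y, k=10, repeats=100, seed=0):
    """k-fold cross-validation MSE, averaged over `repeats` random fold splits.

    `fit(X_train, y_train)` must return a callable mapping one row of X to a
    prediction; each candidate subtree would supply its own `fit`.
    """
    n = len(y)
    if n < k:
        raise ValueError("need at least k samples")
    rng = random.Random(seed)
    run_mses = []
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)                        # new random fold assignment per run
        folds = [order[f::k] for f in range(k)]   # k roughly equal folds
        sq_err = 0.0
        for test in folds:
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            model = fit([X[i] for i in train], [y[i] for i in train])
            sq_err += sum((model(X[i]) - y[i]) ** 2 for i in test)
        run_mses.append(sq_err / n)               # pooled MSE over all held-out rows
    return fmean(run_mses)
```

Scoring every candidate subtree this way and keeping the one with the lowest average is one plausible route to a "best subtree"; CART's one-standard-error rule is a common, more conservative alternative.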
Figure 3. (a) Best subtree for SDT. COR is the Cornell product; ALD is aldosterone; REN is renin. (b) Best subtree for CART. COR is the Cornell product; TRIG is triglycerides. Each leaf node is drawn as a rectangle; the subgroups (leaf nodes) identified by SDT and CART are labeled {A, B, C, D} and {R, S, T}, respectively, followed by the subgroup size.
Mean LVMI and vitamin D (standard deviation in parentheses) for subgroups identified by SDT and CART.
| Method | Subgroup | Size | LVMI | Vitamin D |
|---|---|---|---|---|
| Entire dataset | All | 153 | 91.08 (17.93) | 11.09 (4.01) |
| SDT | A | 35 | 80.47 (13.31) | 9.57 (3.25) |
| SDT | B | 63 | 94.62 (12.96) | 11.49 (3.76) |
| SDT | C | 10 | 74.98 (9.17) | 11.20 (5.07) |
| SDT | D | 20 | 109.64 (16.61) | 10.85 (4.12) |
| CART | R | 65 | 92.13 (10.85) | 10.85 (3.75) |
| CART | S | 16 | 99.29 (14.06) | 11.44 (4.70) |
| CART | T | 20 | 109.64 (16.61) | 10.85 (4.12) |
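Each cell in the table is a subgroup mean with its standard deviation in parentheses. A minimal sketch of producing cells in that format from per-subgroup values; the `summarize` helper is hypothetical, and using the sample standard deviation (n - 1 denominator) is an assumption, since the table does not say which was used.

```python
from statistics import mean, stdev


def summarize(groups):
    """Map each subgroup label to (size, 'mean (SD)') in the table's style.

    `groups` maps a label to its members' values (e.g. LVMI). stdev() is the
    sample standard deviation, so each subgroup needs at least two values.
    """
    return {
        label: (len(vals), f"{mean(vals):.2f} ({stdev(vals):.2f})")
        for label, vals in groups.items()
    }
```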
Figure 4. Scatter plots (LVMI vs. vitamin D) for each subgroup identified by SDT (upper panel) and CART (lower panel).
Figure 5. Performance comparison between bagged SDT and random forest.
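Bagging fits the same base learner to many bootstrap resamples and averages their predictions, which reduces variance; a random forest additionally restricts each split to a random subset of features. The sketch below is generic bagging around a user-supplied `fit` function (a hypothetical interface returning a row-to-prediction callable); plugging in an actual SDT implementation is assumed, not shown.

```python
import random
from statistics import fmean


def bagging(fit, X, y, n_estimators=50, seed=0):
    """Bootstrap aggregation: one base model per bootstrap resample, predictions averaged.

    `fit(X_boot, y_boot)` returns a callable row -> prediction (e.g. a tree).
    """
    rng = random.Random(seed)
    n = len(y)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # n rows drawn with replacement
        models.append(fit([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row):
        # Average the base models' predictions for one input row.
        return fmean(m(row) for m in models)

    return predict
```

Because each base tree sees a different resample, their errors are partly decorrelated, and averaging them lowers the variance of the ensemble relative to a single tree.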