Mona Haghighi, Suzanne Bennett Johnson, Xiaoning Qian, Kristian F Lynch, Kendra Vehik, Shuai Huang.
Abstract
Regression models are extensively used in many epidemiological studies to understand the linkage between specific outcomes of interest and their risk factors. However, regression models in general examine the average effects of the risk factors and ignore subgroups with different risk profiles. As a result, interventions are often geared towards the average member of the population, without consideration of the special health needs of different subgroups within the population. This paper demonstrates the value of using rule-based analysis methods that can identify subgroups with heterogeneous risk profiles in a population without imposing assumptions on the subgroups or method. The rules define the risk pattern of subsets of individuals by not only considering the interactions between the risk factors but also their ranges. We compared the rule-based analysis results with the results from a logistic regression model in The Environmental Determinants of Diabetes in the Young (TEDDY) study. Both methods detected a similar suite of risk factors, but the rule-based analysis was superior at detecting multiple interactions between the risk factors that characterize the subgroups. A further investigation of the particular characteristics of each subgroup may detect the special health needs of the subgroup and lead to tailored interventions.
Year: 2016 PMID: 27561809 PMCID: PMC5000469 DOI: 10.1038/srep30828
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Previous logistic regression results for the sample with no missing data and the total sample with missing data imputed: Variables associated with study withdrawal in the first year of TEDDY. (Reprinted from Johnson, S. B. et al.10 with permission from John Wiley and Sons Inc).
| Predictor variable | Level | Estimate (no missing data, N = 3431) | SE | P-value | OR | 95% CI lower | 95% CI upper | Estimate (missing data imputed, N = 3757) | SE | P-value |
|---|---|---|---|---|---|---|---|---|---|---|
| Intercept | | 1.126 | 0.424 | 0.008 | | | | 0.982 | 0.400 | 0.014 |
| Country | United States | ref | | | | | | ref | | |
| | Finland | −0.420 | 0.130 | 0.001 | 0.657 | 0.509 | 0.848 | −0.431 | 0.123 | 0.0004 |
| | Germany | 0.278 | 0.222 | 0.211 | 1.321 | 0.854 | 2.042 | 0.154 | 0.218 | 0.481 |
| | Sweden | −0.342 | 0.110 | 0.002 | 0.711 | 0.572 | 0.882 | −0.346 | 0.104 | 0.002 |
| Child sex female | No | ref | | | | | | | | |
| | Yes | 0.160 | 0.092 | 0.081 | 2.316 | 1.840 | 2.915 | 0.217 | 0.086 | 0.012 |
| Maternal age (years) | | −0.058 | 0.009 | <0.0001 | 0.944 | 0.927 | 0.961 | −0.053 | 0.009 | <0.0001 |
| Maternal Lifestyle Behaviors during Pregnancy | | | | | | | | | | |
| Smoked | No | ref | | | | | | ref | | |
| | Yes | 0.841 | 0.117 | <0.0001 | 2.318 | 1.841 | 2.918 | 0.803 | 0.117 | <0.0001 |
| Alcohol consumption in last trimester | None | ref | | | | | | | | |
| | 1–2 times/month | −0.343 | 0.148 | 0.020 | 0.709 | 0.531 | 0.948 | −0.280 | 0.140 | 0.045 |
| | >2 times/month | −0.424 | 0.319 | 0.183 | 0.654 | 0.350 | 1.222 | −0.401 | 0.299 | 0.180 |
| Worked all trimesters | No | ref | | | | | | ref | | |
| | Yes | −0.396 | 0.095 | <0.0001 | 0.673 | 0.559 | 0.811 | −0.364 | 0.090 | <0.0001 |
| Dad participation | No | ref | | | | | | ref | | |
| | Yes | −0.569 | 0.162 | 0.0005 | 0.566 | 0.412 | 0.778 | −0.608 | 0.146 | <0.0001 |
| Risk perception | Underestimate | ref | | | | | | ref | | |
| | Accurate | −1.257 | 0.375 | 0.0008 | 0.284 | 0.137 | 0.593 | −1.032 | 0.354 | 0.004 |
| State Anxiety Inventory score | | 0.001 | 0.006 | 0.835 | 1.001 | 0.989 | 1.014 | 0.001 | 0.006 | 0.825 |
| State Anxiety Inventory score x risk perception | | 0.023 | 0.009 | 0.011 | 1.023 | 1.005 | 1.041 | 0.018 | 0.009 | 0.039 |
| >1 missing data points | | | | | | | | 1.321 | 0.464 | 0.007 |
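
The odds ratios and confidence intervals in the table above can be related to the estimates and standard errors by exponentiation. The following is a minimal sketch, assuming a Wald-type 95% interval with z = 1.96; the function name `odds_ratio_with_ci` is hypothetical and not taken from the paper.

```python
import math


def odds_ratio_with_ci(estimate, se, z=1.96):
    """Convert a logistic-regression coefficient and its standard error
    into an odds ratio with a Wald-type confidence interval.

    Assumption: the interval is exp(estimate +/- z * se) with z = 1.96.
    """
    odds_ratio = math.exp(estimate)
    lower = math.exp(estimate - z * se)
    upper = math.exp(estimate + z * se)
    return odds_ratio, lower, upper


# Finland vs. United States, sample with no missing data.
or_fin, lo_fin, hi_fin = odds_ratio_with_ci(-0.420, 0.130)
print(round(or_fin, 3), round(lo_fin, 3), round(hi_fin, 3))  # 0.657 0.509 0.848
```

Applied to the Finland row, this reproduces the tabulated OR and interval. Applied to the child-sex row (estimate 0.160), the same formula gives an OR of about 1.17, which differs from the tabulated 2.316, so that row is worth checking against the original publication.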
Characteristics of TEDDY Actives and Withdrawals. (Reprinted from Johnson, S. B. et al.10 with permission from John Wiley and Sons Inc).
| Characteristic | Actives (n = 2994) | Withdrawals (n = 763) | Total Sample (n = 3757) |
|---|---|---|---|
| Country | N (%) | N (%) | N |
| Finland | 747(84%) | 140(16%) | 887 |
| Germany | 106(75%) | 36(25%) | 142 |
| Sweden | 1052(82%) | 231(18%) | 1283 |
| United States | 1089(75%) | 356(25%) | 1445 |
| Child sex | N (%) | N (%) | N |
| Male | 1538 (81%) | 352 (19%) | 1890 |
| Female | 1456 (78%) | 411 (22%) | 1867 |
| Maternal age (years) | M (SD) | M (SD) | M (SD) |
| | 30.8 (5.0) | 28.5 (5.7) | 30.4 (5.2) |
| Maternal Lifestyle Behaviors During Pregnancy | |||
| Smoking | N (%) | N (%) | N |
| Smoked | 296(63%) | 171(37%) | 467 |
| Did not smoke | 2602(84%) | 510(16%) | 3112 |
| Data missing | 96(54%) | 82(46%) | 178 |
| Alcohol consumption at 3rd trimester | N (%) | N (%) | N |
| Alcohol 1-2 times per month | 474(87%) | 72(13%) | 546 |
| Alcohol ≥ 3 times per month | 105(89%) | 13(11%) | 118 |
| No alcohol | 2359(79%) | 609(21%) | 2968 |
| Data missing | 56(45%) | 69(55%) | 125 |
| Employment status | N (%) | N (%) | N |
| Worked all 3 trimesters | 1418(85%) | 251(15%) | 1669 |
| Reduced work, quit, or did not work at all | 1426(77%) | 417(23%) | 1843 |
| Data missing | 150(61%) | 95(39%) | 245 |
| Dad Participation in TEDDY | N (%) | N (%) | N |
| Participated | 2813(82%) | 624(18%) | 3437 |
| Did Not Participate | 181(57%) | 139(43%) | 320 |
| Maternal Reactions to Child’s Increased T1DM Risk | |||
| Risk perception | N (%) | N (%) | N |
| Accurate | 1809(84%) | 355(16%) | 2164 |
| Underestimate | 1132(77%) | 343(23%) | 1475 |
| Data missing | 53(45%) | 65(55%) | 118 |
| State Anxiety Inventory score | M (SD) | M (SD) | M (SD) |
| Total Sample | 38.7(9.7) | 40.8(10.6) | 39.1(9.9) |
| Risk Perception: Accurate | 38.8(10.2) | 41.7(10.4) | 39.3(9.6) |
| Risk Perception: Underestimate | 38.4(10.2) | 39.9(10.8) | 38.8(10.4) |
| | N (%) | N (%) | N |
| Data missing | 46 (42%) | 63 (58%) | 109 |
| Missing Data | N (%) | N (%) | N |
| ≤1 missing data points | 2944 (81%) | 695 (19%) | 3639 |
| >1 missing data points | 50 (42%) | 68 (58%) | 118 |
Figure 1. A decision tree learned from the TEDDY data.
Figure 2. Flow diagram of the RuleFit algorithm.
The 8 rules identified by the RuleFit method.
| Rule | Direction | Conditions |
|---|---|---|
| Rule 1 | risk increasing | Maternal age < 27.5; Finland = NO |
| Rule 2 | risk increasing | Smoker during pregnancy = YES; Accurate risk perception = NO; State anxiety inventory score > 45 |
| Rule 3 | risk increasing | State anxiety inventory score > 45; Dad participation = NO |
| Rule 4 | risk increasing | Maternal age < 27.5; Accurate risk perception = NO; Alcohol consumption in last trimester < 2 times per month |
| Rule 5 | risk decreasing | Worked all trimesters = YES; Smoker during pregnancy = NO |
| Rule 6 | risk decreasing | Finland = NO; Alcohol consumption in last trimester > 0; Number of negative events < 2 |
| Rule 7 | risk decreasing | Smoker during pregnancy = NO; State anxiety inventory score < 45; Number of missing data points ≤ 1 |
| Rule 8 | risk decreasing | Maternal age > 27.5; Smoker during pregnancy = NO; Number of missing data points ≤ 1 |
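
Each rule listed above is a conjunction of conditions on risk factors, so it can be read as a yes/no test applied to one participant. The sketch below encodes Rule 1 and Rule 5 this way; the dictionary keys (`maternal_age`, `country`, `worked_all_trimesters`, `smoked`) and the example participant are hypothetical and are not the TEDDY variable names.

```python
def rule_1(p):
    """Risk-increasing rule: maternal age < 27.5 AND Finland = NO."""
    return p["maternal_age"] < 27.5 and p["country"] != "Finland"


def rule_5(p):
    """Risk-decreasing rule: worked all trimesters = YES AND smoker = NO."""
    return p["worked_all_trimesters"] and not p["smoked"]


# A made-up participant record, for illustration only.
example = {
    "maternal_age": 25.0,
    "country": "Sweden",
    "worked_all_trimesters": True,
    "smoked": False,
}

print(rule_1(example), rule_5(example))  # True True
```

A participant can endorse several rules at once, as the made-up record here does, which is why the overlap between rules is examined separately.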
Figure 3. Proportion of early withdrawal of the eight rules and the overall population.
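
The comparison in Figure 3 amounts to computing the withdrawal proportion among the participants who satisfy a rule and setting it against the overall proportion. A minimal sketch follows, assuming a hypothetical record layout and a tiny made-up data set; only the overall TEDDY figure (763 withdrawals out of 3757) comes from the characteristics table.

```python
def withdrawal_rate(participants, rule=None):
    """Proportion of withdrawals among participants satisfying `rule`
    (or among everyone when `rule` is None). Returns None if nobody matches."""
    selected = [p for p in participants if rule is None or rule(p)]
    if not selected:
        return None
    return sum(1 for p in selected if p["withdrew"]) / len(selected)


def young_not_finland(p):
    """Rule 1 from the rules table: maternal age < 27.5 AND Finland = NO."""
    return p["maternal_age"] < 27.5 and p["country"] != "Finland"


# Made-up records for illustration only; these are not TEDDY data.
toy = [
    {"maternal_age": 24, "country": "United States", "withdrew": True},
    {"maternal_age": 26, "country": "Sweden", "withdrew": True},
    {"maternal_age": 25, "country": "Finland", "withdrew": False},
    {"maternal_age": 31, "country": "United States", "withdrew": False},
    {"maternal_age": 35, "country": "Germany", "withdrew": False},
    {"maternal_age": 23, "country": "Germany", "withdrew": False},
]

rate_rule = withdrawal_rate(toy, young_not_finland)  # 2 of 3 matching records
rate_all = withdrawal_rate(toy)                      # 2 of 6 records

# Overall early-withdrawal proportion reported for TEDDY: 763 of 3757.
teddy_overall = 763 / 3757
print(round(teddy_overall, 3))  # 0.203
```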
Figure 4. Investigation of the redundancy of the 8 rules. The pie graph in row i (corresponding to rule i) and column j (corresponding to rule j) records the proportion of the participants endorsing rule i who also endorse rule j.
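
The quantity plotted in each cell of Figure 4 can be written as |S_i ∩ S_j| / |S_i|, where S_i is the set of participants endorsing rule i. The sketch below computes it for made-up membership sets; the participant IDs and the helper name `overlap` are hypothetical.

```python
def overlap(members, i, j):
    """Proportion of participants endorsing rule i who also endorse rule j."""
    s_i, s_j = members[i], members[j]
    if not s_i:
        return None  # undefined when nobody endorses rule i
    return len(s_i & s_j) / len(s_i)


# Made-up sets of participant IDs per rule, for illustration only.
members = {
    1: {101, 102, 103, 104},
    2: {103, 104, 105},
    3: {106},
}

print(overlap(members, 1, 2))  # 0.5
print(overlap(members, 1, 3))  # 0.0
```

The measure is not symmetric: here `overlap(members, 2, 1)` is 2/3 while `overlap(members, 1, 2)` is 0.5, so row and column order matter when reading the figure.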