Justine B Nasejje, Henry Mwambi, Keertan Dheda, Maia Lesosky.
Abstract
BACKGROUND: Random survival forest (RSF) models have been identified as alternatives to the Cox proportional hazards model for analysing time-to-event data. These methods, however, have been criticised for a bias that arises from favouring covariates with many split-points, and conditional inference forests for time-to-event data have therefore been suggested. Conditional inference forests (CIF) are known to correct this bias in RSF models by separating the selection of the best covariate to split on from the search for the best split point of the selected covariate.
Keywords: Conditional inference forests; Random survival forests; Split-points; Survival analysis; Survival trees
Year: 2017 PMID: 28754093 PMCID: PMC5534080 DOI: 10.1186/s12874-017-0383-8
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
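The split-point bias described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes an exhaustive log-rank split search (a common RSF splitting rule), exponential survival times, a 70% event probability, and covariates that are unrelated to survival. All function names are hypothetical. Under these assumptions neither covariate carries any signal, yet the covariate with many candidate split points tends to attain the larger maximal statistic.

```python
import random


def logrank_chi2(times, events, group):
    """Two-sample log-rank chi-square statistic for a 0/1 group label."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)      # subjects still at risk
    at_risk_1 = sum(group)    # of which in group 1
    obs_minus_exp = 0.0
    variance = 0.0
    pos = 0
    while pos < len(order):
        t = times[order[pos]]
        deaths = deaths_1 = leaving = leaving_1 = 0
        # Collect every subject whose observed time equals t.
        while pos < len(order) and times[order[pos]] == t:
            k = order[pos]
            deaths += events[k]
            deaths_1 += events[k] * group[k]
            leaving += 1
            leaving_1 += group[k]
            pos += 1
        if deaths > 0 and at_risk > 1:
            p = at_risk_1 / at_risk
            obs_minus_exp += deaths_1 - deaths * p
            variance += deaths * p * (1 - p) * (at_risk - deaths) / (at_risk - 1)
        at_risk -= leaving
        at_risk_1 -= leaving_1
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def best_split_stat(times, events, x):
    """Largest log-rank statistic over all splits of the form x <= c."""
    cuts = sorted(set(x))[:-1]  # the largest value gives an empty right node
    return max(
        (logrank_chi2(times, events, [1 if v <= c else 0 for v in x]) for c in cuts),
        default=0.0,
    )


def many_split_selection_rate(reps=200, n=60, seed=1):
    """Share of null replicates in which the many-valued covariate wins."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(reps):
        times = [rng.expovariate(1.0) for _ in range(n)]
        events = [1 if rng.random() < 0.7 else 0 for _ in range(n)]
        x_binary = [rng.randint(0, 1) for _ in range(n)]  # one split point
        x_many = [rng.random() for _ in range(n)]         # up to n - 1 split points
        if best_split_stat(times, events, x_many) > best_split_stat(times, events, x_binary):
            wins += 1
    return wins / reps


if __name__ == "__main__":
    print(many_split_selection_rate())
```

An unbiased selection procedure would pick each of the two uninformative covariates about half of the time; a rate well above one half reflects the advantage that the exhaustive search gives to the covariate with more split points.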
Simulated time-to-event datasets
Properties of simulated time-to-event datasets

| Type of covariates | Dataset | Sample size | % of censoring | Nature of the hazard |
|---|---|---|---|---|
| Binary | Data 1 | 100 | 80 | Increasing |
| | Data 2 | 100 | 50 | Decreasing |
| | Data 3 | 250 | 20 | Constant |
| | Data 4 | 1000 | 80 | Increasing |
| | Data 5 | 1500 | 50 | Decreasing |
| | Data 6 | 2000 | 20 | Constant |
| Polytomous | Data 1 | 100 | 20 | Increasing |
| | Data 2 | 100 | 80 | Constant |
| | Data 3 | 250 | 50 | Decreasing |
| | Data 4 | 1000 | 20 | Increasing |
| | Data 5 | 1500 | 80 | Constant |
| | Data 6 | 2000 | 50 | Decreasing |
| Binary & polytomous | Data 1 | 1000 | 20 | Increasing |
| | Data 2 | 100 | 80 | Decreasing |
| | Data 3 | 250 | 50 | Constant |
| | Data 4 | 1000 | 20 | Increasing |
| | Data 5 | 1500 | 80 | Decreasing |
| | Data 6 | 2000 | 50 | Constant |
| Interactions | Data 1 | 100 | 20 | Increasing |
| | Data 2 | 100 | 50 | Decreasing |
| | Data 3 | 1000 | 20 | Increasing |
| | Data 4 | 1500 | 50 | Decreasing |
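One way to generate a dataset with the properties listed in the table (sample size, percentage of censoring, shape of the hazard) is sketched below. This is an illustration under stated assumptions, not the simulation design of the paper: it assumes a Weibull proportional hazards model with a single binary covariate, a hypothetical effect size `beta`, and a censoring mechanism that censors a fixed share of subjects at a time drawn uniformly before their event time. The function name `simulate_weibull` is also hypothetical.

```python
import math
import random


def simulate_weibull(n, shape, censor_frac, beta=0.5, seed=0):
    """Return a list of (time, event, x) tuples.

    shape > 1 gives an increasing hazard, shape < 1 a decreasing hazard,
    and shape == 1 a constant hazard. Exactly round(n * censor_frac)
    subjects are censored (event == 0).
    """
    rng = random.Random(seed)
    n_censored = round(n * censor_frac)
    censored = set(rng.sample(range(n), n_censored))
    rows = []
    for i in range(n):
        x = rng.randint(0, 1)       # binary covariate (assumption)
        u = 1.0 - rng.random()      # uniform on (0, 1]
        # Inverse of S(t | x) = exp(-t**shape * exp(beta * x)).
        t = (-math.log(u) * math.exp(-beta * x)) ** (1.0 / shape)
        if i in censored:
            # Assumed mechanism: censoring time uniform on [0, t].
            rows.append((rng.uniform(0.0, t), 0, x))
        else:
            rows.append((t, 1, x))
    return rows


if __name__ == "__main__":
    # Same size, censoring and hazard shape as "Binary, Data 1" in the table.
    data = simulate_weibull(n=100, shape=1.5, censor_frac=0.80)
    print(sum(1 for _, event, _ in data if event == 0), "of", len(data), "censored")
```

Fixing the number of censored subjects makes the censoring percentage exact for every simulated dataset; drawing independent censoring times from a separate distribution would only match it on average.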
Fig. 1 Predictive performance on simulated datasets with binary covariates
Fig. 2 Predictive performance of the three survival forest models on simulated datasets with polytomous covariates
Fig. 3 Predictive performance on simulated datasets with binary and polytomous covariates
Fig. 4 Predictive performance of the three survival forest models on simulated datasets with covariate interactions
Characteristics and the distribution of deaths for covariates in Dataset 1
| Characteristics | Dead N(%) | Alive N(%) | Total | Characteristics | Dead N(%) | Alive N(%) | Total |
|---|---|---|---|---|---|---|---|
| Mother’s education level | | | | Mother’s occupation | | | |
| Illiterate Mothers | 344(7.7) | 4149(92.3) | 4493 | Not-working | 93(6.9) | 1260(93.1) | 1353 |
| Mother completed primary | 119(6.4) | 1749(93.6) | 1868 | Sales and Services | 110(6.5) | 1589(93.5) | 1699 |
| Secondary and higher | 14(4.2) | 317(95.8) | 331 | Agriculture | 274(7.5) | 3366(92.5) | 3640 |
| Partner’s level of education | | | | Births in past 5 years | | | |
| Illiterate Father | 266(7.7) | 3180(92.3) | 3446 | 1-Birth | 93(4.5) | 1982(95.5) | 2075 |
| Father completed primary | 170(6.9) | 2287(93.1) | 2457 | 2-Birth | 227(6.5) | 3288(93.5) | 3515 |
| Secondary and higher | 41(5.2) | 748(94.8) | 789 | 3-Births | 140(13.6) | 887(86.4) | 1027 |
| Birth status | | | | 4-Births | 17(22.7) | 58(77.3) | 75 |
| Singleton births | 431(6.7) | 6048(93.3) | 6479 | Births in past 1 year | | | |
| Multiple births (Twins) | 46(21.5) | 167(78.5) | 213 | No-births | 309(6.8) | 4212(93.2) | 4521 |
| Sex of the child | | | | 1-Birth | 163(7.6) | 1971(92.4) | 2134 |
| Males | 258(7.8) | 3067(92.2) | 3325 | 2-Births | 5(13.5) | 32(86.5) | 37 |
| Females | 212(6.3) | 3155(93.7) | 3367 | Children Under 5 in Household | | | |
| Type of place of residence | | | | No-child | 101(34.9) | 188(65.1) | 289 |
| Urban | 81(5.8) | 1308(94.2) | 1389 | 1-Child | 178(10.5) | 1511(89.5) | 1689 |
| Rural | 396(7.5) | 4907(92.5) | 5303 | 2-Children | 146(4.9) | 2831(95.1) | 2977 |
| Wealth index | | | | 3-Children | 35(2.5) | 1349(97.5) | 1384 |
| Poorest | 131(7.5) | 1623(92.5) | 1754 | 4-Children | 17(4.8) | 336(95.2) | 353 |
| Poorer | 112(8.5) | 1205(91.5) | 1317 | Mother’s age group | | | |
| Middle | 86(7.2) | 1109(92.8) | 1195 | Less than 20 years | 29(8.9) | 296(91.1) | 325 |
| Richer | 72(6.9) | 969(93.1) | 1041 | 20-29 years | 235(6.5) | 3376(93.5) | 3611 |
| Richest | 76(5.5) | 1309(94.5) | 1385 | 30-39 years | 164(7.4) | 2054(92.6) | 2218 |
| Children ever born | | | | 40 years + | 49(7.9) | 489(90.1) | 538 |
| One child | 20(3.3) | 581(96.7) | 601 | Birth order number | | | |
| Two children | 81(7.1) | 1065(92.9) | 1146 | First child | 95(7.6) | 1154(92.4) | 1249 |
| Three children | 67(6.6) | 953(93.4) | 1020 | Second to Third child | 117(5.6) | 1974(94.4) | 2091 |
| Four and more | 309(7.9) | 3616(92.1) | 3925 | 4 | 149(7.1) | 1949(92.9) | 2098 |
| Birth order number | | | | | 116(9.3) | 1138(90.7) | 1254 |
| First child | 95(7.6) | 1154(92.4) | 1249 | Sex of household head | | | |
| Second to Third child | 117(5.6) | 1974(94.4) | 2091 | Male | 341(6.7) | 4771(93.3) | 5112 |
| 4 | 149(7.1) | 1949(92.9) | 2098 | Female | 136(8.6) | 1444(91.4) | 1580 |
| | 116(9.2) | 1138(90.8) | 1254 | Source of drinking water | | | |
| Religion | | | | Piped water | 76(5.9) | 1204(94.1) | 1280 |
| Catholics | 217(7.4) | 2722(92.6) | 2939 | Borehole | 216(7.3) | 2731(92.7) | 2947 |
| Muslims | 69(7.5) | 852(92.5) | 921 | Well | 93(6.9) | 1261(93.1) | 1354 |
| Other Christians | 187(6.8) | 2571(93.2) | 2758 | Surface/Rain/Pond/Lake/tank | 70(8.5) | 756(91.5) | 826 |
| Others | 4(5.4) | 70(94.6) | 74 | Other | 22(7.7) | 263(92.3) | 285 |
| Type of toilet facility | | | | Age at first birth | | | |
| Flush toilet | 5(4.1) | 116(95.9) | 121 | Less than 20 years | 347(7.5) | 4291(92.5) | 4638 |
| Pit latrine | 376(6.9) | 5031(93.1) | 5407 | 20-29 years | 127(6.3) | 1899(93.7) | 2026 |
| No-facility | 96(8.2) | 1068(91.8) | 1164 | 30-39 years | 3(12.0) | 22(88.0) | 25 |
Characteristics and the distribution of deaths for covariates in Dataset 2
| Characteristics | Dead N(%) | Alive N(%) | Total | Characteristics | Dead N(%) | Alive N(%) | Total |
|---|---|---|---|---|---|---|---|
| Age at diagnosis | | | | Ethionamide | | | |
| Below 30 | 35(81.3) | 8(18.6) | 43 | Not prescribed | 25(64.10) | 14(35.89) | 39 |
| Above 30 | 43(68.25) | 20(31.75) | 63 | Prescribed | 54(79.41) | 14(20.59) | 68 |
| Gender | | | | Ofloxacin | | | |
| Females | 41(83.67) | 8(16.33) | 49 | Not prescribed | 48(70.59) | 20(29.41) | 68 |
| Males | 38(65.52) | 20(34.48) | 58 | Prescribed | 31(79.49) | 8(20.51) | 39 |
| Smoking status | | | | Ofloxacin and moxifloxacin | | | |
| No | 28(65.12) | 15(34.88) | 43 | Not prescribed | 72(72.73) | 27(27.27) | 99 |
| Yes | 38(79.17) | 10(20.83) | 48 | Prescribed | 7(87.50) | 1(12.50) | 8 |
| HIV plus ART status | | | | Amikacin | | | |
| HIV -ve | 46(73.02) | 17(26.98) | 63 | Not prescribed | 76(73.79) | 27(26.21) | 103 |
| HIV +ve ART | 24(68.57) | 11(31.43) | 35 | Prescribed | 3(75.00) | 1(25.00) | 4 |
| HIV +ve no ART | 9(100.00) | 0(0.00) | 9 | Capreomycin | | | |
| Cohort | | | | Not prescribed | 8(88.89) | 1(11.11) | 9 |
| B | 54(83.08) | 11(16.92) | 65 | Prescribed | 71(72.45) | 27(27.55) | 98 |
| N | 12(80.00) | 3(20.00) | 15 | Dapsone | | | |
| S | 13(48.15) | 14(51.85) | 27 | Not prescribed | 43(67.19) | 21(32.81) | 64 |
| Race | | | | Prescribed | 36(83.72) | 7(16.28) | 43 |
| Blacks | 34(64.15) | 19(35.85) | 53 | Augmentin | | | |
| Mixed ancestry | 45(83.33) | 9(16.67) | 54 | Not prescribed | 28(66.67) | 14(33.33) | 42 |
| Drugs used | | | | Prescribed | 51(78.46) | 14(21.54) | 65 |
| Isoniazid | | | | Clofazimine | | | |
| Not prescribed | 57(83.82) | 11(16.18) | 68 | Not prescribed | 70(82.35) | 15(17.65) | 85 |
| Prescribed | 22(56.41) | 17(43.59) | 39 | Prescribed | 9(40.91) | 13(59.09) | 22 |
| Ethambutol | | | | Azithromycin | | | |
| Not prescribed | 39(66.10) | 20(33.89) | 59 | Not prescribed | 75(76.53) | 23(23.47) | 98 |
| Prescribed | 40(83.33) | 8(16.67) | 48 | Prescribed | 4(44.44) | 5(55.56) | 9 |
| Pyrazinamide | | | | Amoxicillin | | | |
| Not prescribed | 14(58.33) | 10(41.67) | 24 | Not prescribed | 49(71.01) | 20(28.99) | 69 |
| Prescribed | 65(78.3) | 18(21.69) | 83 | Prescribed | 30(78.95) | 8(21.05) | 38 |
| Clarithromycin | | | | | | | |
| Not prescribed | 19(70.37) | 8(29.63) | 27 | | | | |
| Prescribed | 60(75.00) | 20(25.00) | 80 | | | | |
Fig. 5 Variable importance scores obtained from the RSF1 and RSF2 models on Dataset 1
Fig. 6 Variable importance scores obtained under the CIF model on Dataset 1
Fig. 7 Variable importance obtained from fitting the RSF1 and RSF2 models on Dataset 2
Fig. 8 Variable importance obtained under the CIF model on Dataset 2
Fig. 9 Predictive performance of the two random survival forest models and the conditional inference forest model on Datasets 1 and 2