Sin-Ho Jung¹, Yong Chen², Hongshik Ahn².
Abstract
Binary tree classification has been useful for classifying a whole population according to the levels of an outcome variable that is associated with chosen predictors. A classification often starts with a large number of candidate predictors, each of which can take many different cutoff values. Because of this multiplicity, binary tree classification is subject to a severely inflated type I error probability. Nonetheless, few publications have addressed this issue. In this paper, we propose a binary tree classification method that controls the probability of accepting a predictor below a specified level, say 5%.
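The abstract argues that searching over many predictors and many cutoffs inflates the type I error unless the search itself is accounted for. One common way to do this, in the spirit of the permutation-based single-step (SSP) and step-down (SDP) procedures named in the keywords, is to compare each predictor's best-split statistic with the permutation distribution of the maximum over all predictors and cutoffs. The sketch below rests on our own assumptions: a censored survival outcome, the log-rank statistic, a decile grid of cutoffs, and Westfall–Young style maxT adjustment. The function names and the simulated data are hypothetical, and the paper's SSP and SDP may differ in detail.

```python
import numpy as np


def logrank_z(time, event, left):
    """Standardized log-rank statistic (O - E) / sqrt(V) for the left node of a split."""
    event_times = np.unique(time[event == 1])
    # at_risk[k, i]: subject i is still under observation at the k-th event time
    at_risk = time[None, :] >= event_times[:, None]
    died = (time[None, :] == event_times[:, None]) & (event[None, :] == 1)
    n = at_risk.sum(axis=1)
    n1 = (at_risk & left[None, :]).sum(axis=1)
    d = died.sum(axis=1)
    d1 = (died & left[None, :]).sum(axis=1)
    # hypergeometric variance per event time; np.maximum avoids 0/0 when n == 1
    var = d * (n1 / n) * (1 - n1 / n) * (n - d) / np.maximum(n - 1, 1)
    v = var.sum()
    return 0.0 if v <= 0 else float((d1 - d * n1 / n).sum() / np.sqrt(v))


def best_split_stats(X, time, event, n_cut=9, min_node=5):
    """Per predictor, the largest |log-rank z| over a grid of quantile cutoffs (assumed grid)."""
    qs = np.linspace(0.1, 0.9, n_cut)
    stats = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for c in np.unique(np.quantile(X[:, j], qs)):
            left = X[:, j] <= c
            if min_node <= left.sum() <= len(left) - min_node:
                stats[j] = max(stats[j], abs(logrank_z(time, event, left)))
    return stats


def adjusted_pvalues(X, time, event, B=199, seed=0):
    """Single-step and step-down maxT p-values from B outcome permutations."""
    rng = np.random.default_rng(seed)
    obs = best_split_stats(X, time, event)
    perm = np.empty((B, X.shape[1]))
    for b in range(B):
        idx = rng.permutation(len(time))  # break any predictor-outcome link
        perm[b] = best_split_stats(X, time[idx], event[idx])
    # single step: every predictor is referred to the max over ALL predictors
    max_all = perm.max(axis=1)
    p_ss = (1 + (max_all[:, None] >= obs[None, :]).sum(axis=0)) / (B + 1)
    # step down: the r-th largest statistic is referred to the max over the
    # predictors ranked r and below, with monotonicity enforced
    order = np.argsort(-obs)
    p_sd = np.empty_like(obs)
    running = 0.0
    for r, j in enumerate(order):
        max_rest = perm[:, order[r:]].max(axis=1)
        running = max(running, (1 + (max_rest >= obs[j]).sum()) / (B + 1))
        p_sd[j] = running
    return obs, p_ss, p_sd


# Hypothetical data: only predictor 0 affects survival (hazard 4 vs 1).
rng = np.random.default_rng(1)
n, m = 80, 5
X = rng.normal(size=(n, m))
hazard = np.where(X[:, 0] > 0, 4.0, 1.0)
t_event = rng.exponential(1 / hazard)
t_cens = rng.exponential(2.0, size=n)
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(int)

obs, p_ss, p_sd = adjusted_pvalues(X, time, event, B=99)
print(np.round(obs, 2), np.round(p_ss, 3), np.round(p_sd, 3))
```

A predictor would be accepted for splitting only if its adjusted p-value is at most α. By construction the step-down p-values never exceed the single-step ones, since each is referred to a maximum over a subset of the predictors.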
Keywords: binary tree; classification; permutation; single-step procedure; step-down procedure; type I error
Year: 2014 PMID: 25452689 PMCID: PMC4237155 DOI: 10.4137/CIN.S16342
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
Empirical type I error probability for nominal α = 5% with m = 1,000, B = 1,000, and N = 1,000.
| CENSORING | ||||||
|---|---|---|---|---|---|---|
| 0.3 | 0.6 | 0.3 | 0.6 | |||
| 20% | 0.054 | 0.048 | 0.041 | 0.043 | 0.047 | 0.065 |
| 40% | 0.058 | 0.054 | 0.038 | 0.042 | 0.047 | 0.052 |
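Assuming N is the number of simulated data sets (the caption does not define it), each empirical type I error above is a binomial proportion. Its Monte Carlo standard error near the nominal level is therefore about √(0.05 × 0.95 / 1000) ≈ 0.007. The short sketch below computes this and an approximate 95% band as a reading aid; the function name is hypothetical.

```python
import math


def mc_band(alpha=0.05, n_sim=1000, z=1.96):
    """Binomial standard error and normal-approximation band for an empirical rejection rate."""
    se = math.sqrt(alpha * (1 - alpha) / n_sim)
    return se, alpha - z * se, alpha + z * se


se, lo, hi = mc_band()
print(f"SE = {se:.4f}, band = ({lo:.4f}, {hi:.4f})")
# prints: SE = 0.0069, band = (0.0365, 0.0635)
```

Under this approximation, every entry in the table lies within the band except 0.065, which sits just above its upper end.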
Empirical power under m = 1,000, B = 1,000, and N = 1,000 for SSP.
| D | CENSORING | # SPLITS | |||||||
|---|---|---|---|---|---|---|---|---|---|
| 0.3 | 0.6 | 0.3 | 0.6 | ||||||
| 0.3 | 5 | 20% | 1 | 200 | 147 | 243 | 321 | 287 | 298 |
| | | 40% | 1 | 67 | 78 | 110 | 100 | 113 | 91 |
| | 15 | 20% | 1 | 225 | 276 | 213 | 531 | 492 | 590 |
| | | 40% | 1 | 112 | 198 | 85 | 206 | 277 | 301 |
| 0.6 | 5 | 20% | 1 | 412 | 416 | 395 | 918 | 893 | 913 |
| | | | 2 | 0 | 0 | 0 | 7 | 3 | 4 |
| | | | 3 | 0 | 0 | 0 | 1 | 0 | 1 |
| | | 40% | 1 | 235 | 187 | 279 | 699 | 758 | 799 |
| | | | 2 | 0 | 0 | 0 | 3 | 2 | 3 |
| | 15 | 20% | 1 | 614 | 678 | 599 | 982 | 922 | 968 |
| | | | 2 | 0 | 0 | 0 | 6 | 6 | 6 |
| | | | 3 | 0 | 0 | 0 | 0 | 2 | 5 |
| | | 40% | 1 | 431 | 411 | 402 | 954 | 965 | 911 |
| | | | 2 | 0 | 0 | 0 | 6 | 4 | 2 |
| | | | 3 | 0 | 0 | 0 | 0 | 3 | 0 |
Empirical power under m = 1,000, B = 1,000, and N = 1,000 for SDP.
| D | CENSORING | # SPLITS | |||||||
|---|---|---|---|---|---|---|---|---|---|
| 0.3 | 0.6 | 0.3 | 0.6 | ||||||
| 0.3 | 5 | 20% | 1 | 141 | 137 | 164 | 179 | 201 | 194 |
| | | | 2 | 30 | 41 | 37 | 70 | 68 | 79 |
| | | | 3 | 19 | 11 | 7 | 29 | 32 | 26 |
| | | | 4 | 8 | 0 | 3 | 17 | 19 | 9 |
| | | | 5 | 0 | 0 | 0 | 1 | 0 | 0 |
| | | 40% | 1 | 46 | 67 | 49 | 61 | 73 | 74 |
| | | | 2 | 8 | 15 | 18 | 17 | 23 | 17 |
| | | | 3 | 10 | 9 | 17 | 11 | 9 | 19 |
| | | | 4 | 7 | 0 | 4 | 9 | 6 | 10 |
| | 15 | 20% | 1 | 181 | 165 | 174 | 376 | 451 | 397 |
| | | | 2 | 17 | 53 | 34 | 68 | 49 | 58 |
| | | | 3 | 27 | 27 | 19 | 26 | 41 | 33 |
| | | | 4 | 7 | 23 | 15 | 19 | 21 | 13 |
| | | | 5 | 2 | 1 | 0 | 9 | 3 | 10 |
| | | 40% | 1 | 89 | 111 | 66 | 148 | 199 | 182 |
| | | | 2 | 13 | 23 | 19 | 26 | 43 | 39 |
| | | | 3 | 4 | 11 | 14 | 33 | 40 | 29 |
| | | | 4 | 1 | 7 | 0 | 3 | 12 | 18 |
| | | | 5 | 0 | 0 | 0 | 1 | 0 | 3 |
| 0.6 | 5 | 20% | 1 | 275 | 255 | 278 | 648 | 618 | 634 |
| | | | 2 | 51 | 64 | 57 | 106 | 108 | 96 |
| | | | 3 | 68 | 39 | 43 | 87 | 94 | 89 |
| | | | 4 | 17 | 18 | 24 | 41 | 37 | 39 |
| | | | ≥5 | 12 | 17 | 13 | 35 | 26 | 41 |
| | | 40% | 1 | 87 | 143 | 137 | 511 | 601 | 537 |
| | | | 2 | 52 | 48 | 64 | 87 | 78 | 98 |
| | | | 3 | 37 | 41 | 47 | 82 | 61 | 53 |
| | | | 4 | 23 | 24 | 19 | 16 | 37 | 32 |
| | | | ≥5 | 0 | 0 | 0 | 14 | 27 | 29 |
| | 15 | 20% | 1 | 447 | 478 | 432 | 645 | 658 | 701 |
| | | | 2 | 87 | 68 | 88 | 121 | 107 | 89 |
| | | | 3 | 68 | 54 | 70 | 102 | 85 | 86 |
| | | | 4 | 21 | 19 | 13 | 54 | 48 | 57 |
| | | | ≥5 | 3 | 9 | 7 | 48 | 45 | 45 |
| | | 40% | 1 | 250 | 278 | 268 | 621 | 663 | 598 |
| | | | 2 | 79 | 82 | 92 | 99 | 112 | 103 |
| | | | 3 | 61 | 58 | 62 | 95 | 87 | 91 |
| | | | 4 | 17 | 22 | 27 | 69 | 56 | 72 |
| | | | ≥5 | 9 | 18 | 11 | 59 | 46 | 56 |
Figure 1. Classification tree for the gene imprinting data generated by SSP with α = 0.05.
Figure 2. Classification tree for the lung cancer data generated by SSP with α = 0.2. The median survival time is in months.
Figure 3. Kaplan–Meier survival curves for the samples in the terminal nodes of the tree in Figure 2 for the lung cancer data.
Figure 4. Classification tree for the lung cancer data generated by SDP with α = 0.2. The median survival time is in months.