Thang Vu, Chao Sima, Ulisses M Braga-Neto, Edward R Dougherty.
Abstract
Convex bootstrap error estimation is a popular tool for classifier error estimation in gene expression studies. A basic question is how to determine the weight for the convex combination between the basic bootstrap estimator and the resubstitution estimator such that the resulting estimator is unbiased at finite sample sizes. The well-known 0.632 bootstrap error estimator uses asymptotic arguments to propose a fixed 0.632 weight, whereas the more recent 0.632+ bootstrap error estimator attempts to set the weight adaptively. In this paper, we study the finite sample problem in the case of linear discriminant analysis under Gaussian populations. We derive exact expressions for the weight that guarantee unbiasedness of the convex bootstrap error estimator in the univariate and multivariate cases, without making asymptotic simplifications. Using exact computation in the univariate case and an accurate approximation in the multivariate case, we obtain the required weight and show that it can deviate significantly from the constant 0.632 weight, depending on the sample size and Bayes error for the problem. The methodology is illustrated by application to data from a well-known cancer classification study.
Keywords: Bias; Bootstrap; Error estimation; Gene expression classification; Linear discriminant analysis
Year: 2014 PMID: 28194165 PMCID: PMC5270504 DOI: 10.1186/s13637-014-0015-0
Source DB: PubMed Journal: EURASIP J Bioinform Syst Biol ISSN: 1687-4145
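As an illustration of the estimator described in the abstract, the following is a minimal sketch of a convex bootstrap error estimate for a univariate linear discriminant. It is not the authors' code. It assumes equal class priors (so the discriminant reduces to a midpoint threshold between the class means), takes the "basic bootstrap estimator" to be the out-of-bag (zero bootstrap) error pooled over resamples, and uses hypothetical function names throughout.

```python
import random
from statistics import mean


def fit_threshold(xs, ys):
    """Univariate LDA under equal priors (assumption): midpoint of class means."""
    m0 = mean(x for x, y in zip(xs, ys) if y == 0)
    m1 = mean(x for x, y in zip(xs, ys) if y == 1)
    # sign records which side of the threshold is assigned to class 1
    return 0.5 * (m0 + m1), (1 if m1 >= m0 else -1)


def predict(model, x):
    threshold, sign = model
    return 1 if sign * (x - threshold) > 0 else 0


def resubstitution_error(xs, ys):
    """Error of the classifier on the same sample it was trained on."""
    model = fit_threshold(xs, ys)
    return sum(predict(model, x) != y for x, y in zip(xs, ys)) / len(xs)


def basic_bootstrap_error(xs, ys, n_boot=200, seed=0):
    """Out-of-bag error pooled over bootstrap resamples (assumed definition)."""
    rng = random.Random(seed)
    n = len(xs)
    wrong = tested = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        chosen = set(idx)
        oob = [i for i in range(n) if i not in chosen]  # points left out
        boot_y = [ys[i] for i in idx]
        # Skip resamples with no left-out points or with a missing class.
        if not oob or len(set(boot_y)) < 2:
            continue
        model = fit_threshold([xs[i] for i in idx], boot_y)
        wrong += sum(predict(model, xs[i]) != ys[i] for i in oob)
        tested += len(oob)
    if tested == 0:
        raise ValueError("no usable bootstrap resample")
    return wrong / tested


def convex_bootstrap_error(xs, ys, w=0.632, n_boot=200, seed=0):
    """Convex combination: (1 - w) * resubstitution + w * basic bootstrap."""
    resub = resubstitution_error(xs, ys)
    boot = basic_bootstrap_error(xs, ys, n_boot, seed)
    return (1.0 - w) * resub + w * boot
```

Setting `w=0.632` gives the fixed-weight estimator mentioned in the abstract; any other weight in [0, 1], such as one read from a table of required weights, can be passed instead.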
Figure 1. Univariate case. Required weight w∗ for unbiased convex bootstrap estimation plotted against (a) sample size and (b) Bayes error.
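The required weight w∗ can be related to the expectations of the estimators by a short derivation. The notation below is an assumption made for illustration (it is not taken from this record): the resubstitution estimator is written \hat{\varepsilon}^{\,r}, the basic bootstrap estimator \hat{\varepsilon}^{\,boot}, and the true error of the designed classifier \varepsilon_n. Requiring the convex combination to be unbiased and solving for the weight gives:

```latex
\hat{\varepsilon}^{\,conv}_w = (1-w)\,\hat{\varepsilon}^{\,r} + w\,\hat{\varepsilon}^{\,boot}
\\[4pt]
E\big[\hat{\varepsilon}^{\,conv}_{w}\big] = E[\varepsilon_n]
\;\Longleftrightarrow\;
(1-w)\,E\big[\hat{\varepsilon}^{\,r}\big] + w\,E\big[\hat{\varepsilon}^{\,boot}\big] = E[\varepsilon_n]
\\[4pt]
w^{*} = \frac{E[\varepsilon_n] - E\big[\hat{\varepsilon}^{\,r}\big]}
             {E\big[\hat{\varepsilon}^{\,boot}\big] - E\big[\hat{\varepsilon}^{\,r}\big]}
```

The last line holds whenever the two estimators have different expectations. Computing w∗ therefore reduces to computing the three expectations for the given sample size and distribution.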
Univariate case: required weight for unbiased convex bootstrap estimation
| 0.724 | 0.687 | 0.679 | 0.675 | 0.674 | 0.672 | 0.671 | 0.671 | 0.670 | 0.670 | |
| 0.736 | 0.696 | 0.685 | 0.680 | 0.678 | 0.676 | 0.674 | 0.673 | 0.672 | 0.672 | |
| 0.738 | 0.701 | 0.689 | 0.683 | 0.679 | 0.677 | 0.676 | 0.674 | 0.674 | 0.673 | |
| 0.729 | 0.704 | 0.691 | 0.684 | 0.681 | 0.678 | 0.677 | 0.675 | 0.674 | 0.673 | |
| 0.708 | 0.701 | 0.692 | 0.686 | 0.682 | 0.679 | 0.677 | 0.676 | 0.675 | 0.674 | |
| 0.681 | 0.692 | 0.693 | 0.687 | 0.683 | 0.680 | 0.678 | 0.677 | 0.676 | 0.675 | |
| 0.646 | 0.670 | 0.688 | 0.687 | 0.683 | 0.680 | 0.678 | 0.677 | 0.676 | 0.675 | |
| 0.625 | 0.631 | 0.673 | 0.683 | 0.683 | 0.681 | 0.679 | 0.677 | 0.676 | 0.675 | |
| 0.614 | 0.574 | 0.639 | 0.671 | 0.679 | 0.680 | 0.679 | 0.677 | 0.676 | 0.675 | |
| 0.617 | 0.516 | 0.579 | 0.635 | 0.663 | 0.673 | 0.676 | 0.677 | 0.676 | 0.675 | |
| 0.641 | 0.470 | 0.498 | 0.563 | 0.617 | 0.648 | 0.664 | 0.671 | 0.673 | 0.674 | |
| 0.676 | 0.459 | 0.425 | 0.464 | 0.523 | 0.577 | 0.616 | 0.641 | 0.656 | 0.665 | |
| 0.724 | 0.487 | 0.393 | 0.379 | 0.405 | 0.451 | 0.502 | 0.548 | 0.587 | 0.614 | |
| 0.780 | 0.549 | 0.422 | 0.356 | 0.331 | 0.334 | 0.356 | 0.389 | 0.428 | 0.469 | |
| 0.837 | 0.639 | 0.505 | 0.412 | 0.350 | 0.310 | 0.288 | 0.280 | 0.282 | 0.295 | |
| 0.890 | 0.741 | 0.626 | 0.533 | 0.458 | 0.398 | 0.350 | 0.312 | 0.283 | 0.261 | |
| 0.935 | 0.842 | 0.761 | 0.690 | 0.627 | 0.570 | 0.519 | 0.474 | 0.434 | 0.399 | |
| 0.971 | 0.925 | 0.884 | 0.845 | 0.808 | 0.772 | 0.739 | 0.707 | 0.676 | 0.647 | |
Univariate case: required weight for unbiased convex bootstrap estimation (continued)
| 0.669 | 0.669 | 0.669 | 0.669 | 0.669 | 0.669 | 0.669 | 0.668 | 0.668 | 0.668 | |
| 0.671 | 0.671 | 0.671 | 0.671 | 0.670 | 0.670 | 0.670 | 0.669 | 0.670 | 0.669 | |
| 0.672 | 0.672 | 0.671 | 0.671 | 0.671 | 0.671 | 0.670 | 0.670 | 0.670 | 0.670 | |
| 0.673 | 0.672 | 0.672 | 0.671 | 0.671 | 0.671 | 0.671 | 0.670 | 0.670 | 0.670 | |
| 0.673 | 0.673 | 0.672 | 0.672 | 0.672 | 0.671 | 0.671 | 0.671 | 0.670 | 0.670 | |
| 0.674 | 0.673 | 0.673 | 0.672 | 0.672 | 0.672 | 0.671 | 0.671 | 0.671 | 0.671 | |
| 0.674 | 0.673 | 0.673 | 0.672 | 0.672 | 0.672 | 0.672 | 0.671 | 0.671 | 0.671 | |
| 0.674 | 0.673 | 0.673 | 0.673 | 0.672 | 0.672 | 0.672 | 0.671 | 0.671 | 0.671 | |
| 0.675 | 0.674 | 0.673 | 0.672 | 0.672 | 0.672 | 0.672 | 0.672 | 0.671 | 0.671 | |
| 0.675 | 0.674 | 0.673 | 0.673 | 0.672 | 0.672 | 0.672 | 0.672 | 0.671 | 0.671 | |
| 0.674 | 0.674 | 0.673 | 0.673 | 0.673 | 0.673 | 0.672 | 0.671 | 0.671 | 0.671 | |
| 0.669 | 0.671 | 0.672 | 0.672 | 0.672 | 0.672 | 0.672 | 0.672 | 0.672 | 0.672 | |
| 0.635 | 0.648 | 0.657 | 0.663 | 0.666 | 0.668 | 0.669 | 0.670 | 0.671 | 0.671 | |
| 0.508 | 0.543 | 0.572 | 0.597 | 0.615 | 0.630 | 0.642 | 0.649 | 0.655 | 0.660 | |
| 0.313 | 0.337 | 0.365 | 0.394 | 0.425 | 0.455 | 0.484 | 0.511 | 0.536 | 0.557 | |
| 0.245 | 0.234 | 0.229 | 0.228 | 0.229 | 0.235 | 0.243 | 0.254 | 0.268 | 0.283 | |
| 0.367 | 0.338 | 0.313 | 0.290 | 0.270 | 0.253 | 0.238 | 0.224 | 0.213 | 0.203 | |
| 0.620 | 0.594 | 0.569 | 0.545 | 0.522 | 0.501 | 0.480 | 0.461 | 0.442 | 0.424 |
Figure 2. Bivariate case. Required weight w∗ for unbiased convex bootstrap estimation plotted against (a) sample size and (b) Bayes error.
Bivariate case: required weight for unbiased convex bootstrap estimation
| 0.664 | 0.667 | 0.679 | 0.685 | 0.690 | 0.693 | 0.695 | 0.697 | 0.698 | 0.699 | |
| 0.666 | 0.637 | 0.638 | 0.639 | 0.641 | 0.642 | 0.642 | 0.643 | 0.644 | 0.644 | |
| 0.670 | 0.617 | 0.610 | 0.608 | 0.606 | 0.606 | 0.605 | 0.605 | 0.605 | 0.605 | |
| 0.675 | 0.604 | 0.590 | 0.584 | 0.581 | 0.578 | 0.577 | 0.576 | 0.575 | 0.574 | |
| 0.682 | 0.594 | 0.573 | 0.564 | 0.559 | 0.555 | 0.553 | 0.551 | 0.550 | 0.548 | |
| 0.691 | 0.588 | 0.560 | 0.547 | 0.539 | 0.534 | 0.530 | 0.528 | 0.526 | 0.524 | |
| 0.699 | 0.586 | 0.554 | 0.539 | 0.530 | 0.524 | 0.520 | 0.517 | 0.515 | 0.513 | |
| 0.718 | 0.586 | 0.544 | 0.524 | 0.512 | 0.504 | 0.498 | 0.493 | 0.490 | 0.487 | |
| 0.738 | 0.592 | 0.542 | 0.517 | 0.502 | 0.492 | 0.485 | 0.479 | 0.475 | 0.471 | |
| 0.759 | 0.603 | 0.545 | 0.515 | 0.497 | 0.485 | 0.476 | 0.469 | 0.464 | 0.460 | |
| 0.784 | 0.620 | 0.553 | 0.518 | 0.497 | 0.482 | 0.471 | 0.463 | 0.457 | 0.452 | |
| 0.815 | 0.647 | 0.572 | 0.530 | 0.503 | 0.485 | 0.472 | 0.462 | 0.454 | 0.448 | |
| 0.847 | 0.681 | 0.598 | 0.550 | 0.518 | 0.496 | 0.480 | 0.468 | 0.458 | 0.450 | |
| 0.882 | 0.728 | 0.639 | 0.584 | 0.546 | 0.520 | 0.500 | 0.484 | 0.472 | 0.462 | |
| 0.915 | 0.784 | 0.695 | 0.635 | 0.592 | 0.560 | 0.535 | 0.516 | 0.500 | 0.487 | |
| 0.943 | 0.842 | 0.763 | 0.702 | 0.655 | 0.619 | 0.590 | 0.566 | 0.546 | 0.530 | |
| 0.971 | 0.914 | 0.859 | 0.811 | 0.769 | 0.732 | 0.701 | 0.673 | 0.650 | 0.629 | |
| 0.987 | 0.960 | 0.933 | 0.905 | 0.879 | 0.853 | 0.830 | 0.807 | 0.786 | 0.766 | |
Bivariate case: required weight for unbiased convex bootstrap estimation (continued)
| 0.700 | 0.701 | 0.701 | 0.702 | 0.702 | 0.703 | 0.703 | 0.704 | 0.704 | 0.704 | |
| 0.644 | 0.645 | 0.645 | 0.645 | 0.645 | 0.645 | 0.645 | 0.646 | 0.646 | 0.646 | |
| 0.604 | 0.604 | 0.604 | 0.604 | 0.604 | 0.604 | 0.604 | 0.604 | 0.604 | 0.604 | |
| 0.574 | 0.573 | 0.573 | 0.573 | 0.573 | 0.572 | 0.572 | 0.572 | 0.572 | 0.572 | |
| 0.548 | 0.547 | 0.546 | 0.546 | 0.545 | 0.545 | 0.544 | 0.544 | 0.544 | 0.543 | |
| 0.523 | 0.522 | 0.521 | 0.520 | 0.519 | 0.518 | 0.518 | 0.517 | 0.517 | 0.517 | |
| 0.511 | 0.510 | 0.509 | 0.508 | 0.507 | 0.506 | 0.506 | 0.505 | 0.505 | 0.504 | |
| 0.485 | 0.483 | 0.482 | 0.480 | 0.479 | 0.478 | 0.477 | 0.477 | 0.476 | 0.475 | |
| 0.469 | 0.466 | 0.464 | 0.463 | 0.461 | 0.460 | 0.459 | 0.458 | 0.457 | 0.456 | |
| 0.457 | 0.454 | 0.452 | 0.449 | 0.448 | 0.446 | 0.445 | 0.443 | 0.442 | 0.441 | |
| 0.448 | 0.444 | 0.442 | 0.439 | 0.437 | 0.435 | 0.433 | 0.432 | 0.430 | 0.429 | |
| 0.443 | 0.438 | 0.435 | 0.432 | 0.429 | 0.426 | 0.424 | 0.422 | 0.420 | 0.419 | |
| 0.444 | 0.439 | 0.434 | 0.430 | 0.426 | 0.423 | 0.421 | 0.418 | 0.416 | 0.414 | |
| 0.454 | 0.447 | 0.441 | 0.435 | 0.431 | 0.427 | 0.423 | 0.420 | 0.417 | 0.415 | |
| 0.476 | 0.467 | 0.459 | 0.452 | 0.446 | 0.441 | 0.436 | 0.432 | 0.428 | 0.424 | |
| 0.516 | 0.504 | 0.493 | 0.484 | 0.476 | 0.469 | 0.462 | 0.457 | 0.451 | 0.447 | |
| 0.611 | 0.594 | 0.580 | 0.567 | 0.555 | 0.544 | 0.535 | 0.526 | 0.518 | 0.511 | |
| 0.748 | 0.731 | 0.715 | 0.700 | 0.687 | 0.674 | 0.662 | 0.650 | 0.640 | 0.630 |
Figure 3. Data used in the gene expression experiment. The plot shows the optimal (linear) classifier superimposed on the sample for the genes OXCT and WISP1, from the breast cancer study in [42]. We can see that both populations are approximately Gaussian with equal dispersion. Bad prognosis = red. Good prognosis = blue.
Bias and RMS of estimators considered in the experiment with expression data from genes ‘OXCT’ and ‘WISP1’
| c0 | n | ε∗ | E[εn] | Resub bias | Resub RMS | Basic boot bias | Basic boot RMS | Opt boot bias | Opt boot RMS | 0.632 boot bias | 0.632 boot RMS |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.33 | 30 | 0.4043 | 0.4206 | −0.0702 | 0.1061 | 0.0008 | 0.0820 | −0.0161 | 0.0803 | −0.0253 | 0.0817 |
| 0.50 | 30 | 0.3969 | 0.4266 | −0.0719 | 0.1060 | 0.0072 | 0.0830 | −0.0116 | 0.0798 | −0.0219 | 0.0806 |
| 0.67 | 30 | 0.3893 | 0.4131 | −0.0914 | 0.1185 | −0.0181 | 0.0878 | −0.0355 | 0.0885 | −0.0451 | 0.0909 |
Also displayed are the assumed values for the prior probability c0, sample size n, the estimated value of the Bayes error ε∗, and the expected classification error E[εn].
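Because expectation is linear, the bias of a convex estimator is the same convex combination of the resubstitution and basic bootstrap biases. The sketch below checks this against the Bias columns of the table above: with w = 0.632 it reproduces the 0.632 boot bias in each row up to rounding. It also back-solves the weight implied by the Opt boot bias, which comes out near 0.76 in all three rows. That figure is only what the rounded table entries imply, not necessarily the exact weight used in the study. The function names are hypothetical.

```python
# Bias columns from the table, keyed by prior probability c0:
# (resubstitution, basic bootstrap, opt bootstrap, 0.632 bootstrap)
rows = {
    0.33: (-0.0702, 0.0008, -0.0161, -0.0253),
    0.50: (-0.0719, 0.0072, -0.0116, -0.0219),
    0.67: (-0.0914, -0.0181, -0.0355, -0.0451),
}


def convex_bias(w, bias_resub, bias_boot):
    """Bias of (1 - w) * resubstitution + w * basic bootstrap."""
    return (1.0 - w) * bias_resub + w * bias_boot


def implied_weight(bias_convex, bias_resub, bias_boot):
    """Weight w for which convex_bias(w, ...) equals bias_convex."""
    return (bias_convex - bias_resub) / (bias_boot - bias_resub)


for c0, (b_resub, b_boot, b_opt, b_632) in rows.items():
    predicted = convex_bias(0.632, b_resub, b_boot)
    w_opt = implied_weight(b_opt, b_resub, b_boot)
    print(f"c0={c0:.2f}  0.632 bias: predicted {predicted:.4f}, "
          f"tabulated {b_632:.4f}  implied opt weight {w_opt:.3f}")
```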