W Holmes Finch, Jocelyn H Bolin, Ken Kelley.
Abstract
Classification using standard statistical methods such as linear discriminant analysis (LDA) or logistic regression (LR) presumes knowledge of group membership prior to the development of an algorithm for prediction. However, in many real-world applications, members of the same nominal group might in fact come from different subpopulations on the underlying construct. For example, individuals diagnosed with depression will not all have the same level of the disorder, though for the purposes of LDA or LR they will be treated in the same manner. The goal of this simulation study was to examine the performance of several methods for group classification when membership within the known groups was not homogeneous, for example when there are 3 known groups but each group contains two unknown classes. Several approaches were compared, including LDA, LR, classification and regression trees (CART), generalized additive models (GAM), and mixture discriminant analysis (MIXDA). Results of the study indicated that CART and MIXDA were the most effective tools for situations in which known groups were not homogeneous, whereas LDA, LR, and GAM had the highest rates of misclassification. Implications of these results for theory and practice are discussed.
Keywords: classification trees; discriminant analysis; generalized additive models; mixture models; subgroup analysis
Year: 2014; PMID: 24904445; PMCID: PMC4033219; DOI: 10.3389/fpsyg.2014.00337
Source DB: PubMed; Journal: Front Psychol; ISSN: 1664-1078
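The problem described in the abstract can be illustrated with a deliberately small toy example. It is a sketch under stated assumptions, not the study's simulation design: one predictor, two known groups, group A hiding two subclasses centred at -2 and +2, group B centred at 0. The centres, the spread of 0.5, and the crude "split group A at its own mean" step (standing in for a fitted mixture) are all hypothetical choices made for illustration.

```python
import random

def simulate(n_per_group, rng):
    """Group A hides two subclasses (centres -2 and +2); group B has one centre at 0."""
    data = []
    for _ in range(n_per_group):
        centre = rng.choice([-2.0, 2.0])            # unknown subclass inside group A
        data.append((rng.gauss(centre, 0.5), "A"))
        data.append((rng.gauss(0.0, 0.5), "B"))
    return data

def mean(values):
    return sum(values) / len(values)

def nearest_label(x, centres):
    """Return the group label attached to the centre closest to x."""
    return min(centres, key=lambda c: abs(x - c[0]))[1]

def error_rate(data, centres):
    wrong = sum(1 for x, g in data if nearest_label(x, centres) != g)
    return wrong / len(data)

rng = random.Random(1)
train = simulate(500, rng)
test = simulate(500, rng)

a_vals = [x for x, g in train if g == "A"]
b_vals = [x for x, g in train if g == "B"]

# Rule 1: one centre per known group (the single-mean view of each group).
group_centres = [(mean(a_vals), "A"), (mean(b_vals), "B")]

# Rule 2: allow two centres inside group A. Splitting A at its own mean is a
# crude stand-in for estimating subclasses with a mixture model.
a_mid = mean(a_vals)
lower = [x for x in a_vals if x < a_mid]
upper = [x for x in a_vals if x >= a_mid]
subclass_centres = [(mean(lower), "A"), (mean(upper), "A"), (mean(b_vals), "B")]

single_err = error_rate(test, group_centres)
subclass_err = error_rate(test, subclass_centres)
print(f"one centre per group:    {single_err:.3f}")
print(f"subclass-aware centres:  {subclass_err:.3f}")
```

Because the two subclasses of group A average out to roughly the same location as group B, the one-centre rule performs close to chance in this toy setting, while the subclass-aware rule does not. The toy only shows why heterogeneity within known groups can hurt methods that summarise each group by a single centre.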
Correlation matrix used in simulating predictor variable values.
| | X1 | X2 | X3 | X4 | X5 |
| X1 | 1 | 0.76 | 0.58 | 0.43 | 0.39 |
| X2 | 0.76 | 1 | 0.57 | 0.36 | 0.49 |
| X3 | 0.58 | 0.57 | 1 | 0.45 | 0.74 |
| X4 | 0.43 | 0.36 | 0.45 | 1 | 0.69 |
| X5 | 0.39 | 0.49 | 0.74 | 0.69 | 1 |
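One common way to generate predictors with a target correlation matrix is to multiply independent standard normal draws by a Cholesky factor of that matrix. The sketch below applies this to the matrix above using only the Python standard library. The record does not state which software, distributions, or group means the authors used, so the standard normal margins, the seed, and the sample size of 10000 are assumptions made purely for illustration.

```python
import math
import random

# Correlation matrix for X1..X5, copied from the table above.
R = [
    [1.00, 0.76, 0.58, 0.43, 0.39],
    [0.76, 1.00, 0.57, 0.36, 0.49],
    [0.58, 0.57, 1.00, 0.45, 0.74],
    [0.43, 0.36, 0.45, 1.00, 0.69],
    [0.39, 0.49, 0.74, 0.69, 1.00],
]

def cholesky(a):
    """Lower-triangular factor low such that low * low^T equals a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def draw(low, rng):
    """One correlated observation: low times a vector of independent N(0, 1) draws."""
    z = [rng.gauss(0.0, 1.0) for _ in low]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(low))]

def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2014)                      # arbitrary seed, for repeatability
chol = cholesky(R)
sample = [draw(chol, rng) for _ in range(10000)]
columns = list(zip(*sample))
sample_corr = [[corr(columns[i], columns[j]) for j in range(5)] for i in range(5)]

# Largest absolute gap between the target and the sample correlations.
max_gap = max(abs(sample_corr[i][j] - R[i][j]) for i in range(5) for j in range(5))
print(f"largest |sample - target| correlation gap: {max_gap:.3f}")
```

The factorisation succeeds only because the matrix is positive definite; a matrix that is not would trigger a math domain error in `math.sqrt`. Group and subgroup mean differences could then be added to the generated rows, but how the study did so is not described in this record.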
Overall misclassification rates by method and subgroup overlap.
| 0.00 | 0.510 | 0.446 | 0.412 | 0.321 | 0.498 |
| 0.05 | 0.522 | 0.460 | 0.408 | 0.328 | 0.512 |
| 0.10 | 0.529 | 0.466 | 0.425 | 0.349 | 0.521 |
| 0.15 | 0.542 | 0.472 | 0.440 | 0.363 | 0.529 |
| 0.20 | 0.559 | 0.480 | 0.481 | 0.382 | 0.538 |
Overall misclassification rates by method, degree of subgroup overlap (overlap), sample size (N), known group sample size ratio (Nratio), subgroup sample size ratio (Sratio), and difference in known group means (D).
| N = 150 | 0.00 | 0.548 | 0.469 | 0.369 | 0.268 | 0.511 |
| | 0.05 | 0.563 | 0.477 | 0.323 | 0.248 | 0.513 |
| | 0.10 | 0.571 | 0.482 | 0.327 | 0.266 | 0.523 |
| | 0.15 | 0.588 | 0.490 | 0.366 | 0.276 | 0.534 |
| | 0.20 | 0.595 | 0.498 | 0.408 | 0.306 | 0.546 |
| N = 300 | 0.00 | 0.514 | 0.453 | 0.386 | 0.294 | 0.511 |
| | 0.05 | 0.546 | 0.484 | 0.419 | 0.333 | 0.548 |
| | 0.10 | 0.554 | 0.492 | 0.454 | 0.365 | 0.559 |
| | 0.15 | 0.568 | 0.499 | 0.446 | 0.384 | 0.569 |
| | 0.20 | 0.585 | 0.505 | 0.506 | 0.403 | 0.576 |
| N = 750 | 0.00 | 0.469 | 0.416 | 0.481 | 0.401 | 0.471 |
| | 0.05 | 0.458 | 0.418 | 0.483 | 0.403 | 0.475 |
| | 0.10 | 0.463 | 0.424 | 0.495 | 0.417 | 0.481 |
| | 0.15 | 0.470 | 0.429 | 0.508 | 0.429 | 0.485 |
| | 0.20 | 0.498 | 0.436 | 0.528 | 0.438 | 0.490 |
| Nratio Equal | 0.00 | 0.583 | 0.501 | 0.403 | 0.350 | 0.529 |
| | 0.05 | 0.589 | 0.512 | 0.395 | 0.355 | 0.539 |
| | 0.10 | 0.600 | 0.519 | 0.416 | 0.384 | 0.547 |
| | 0.15 | 0.619 | 0.527 | 0.426 | 0.404 | 0.556 |
| | 0.20 | 0.644 | 0.537 | 0.476 | 0.431 | 0.565 |
| Nratio 75/25 | 0.00 | 0.365 | 0.335 | 0.430 | 0.262 | 0.434 |
| | 0.05 | 0.388 | 0.356 | 0.435 | 0.275 | 0.458 |
| | 0.10 | 0.388 | 0.360 | 0.444 | 0.281 | 0.469 |
| | 0.15 | 0.389 | 0.363 | 0.468 | 0.282 | 0.476 |
| | 0.20 | 0.391 | 0.366 | 0.490 | 0.285 | 0.483 |
| Sratio Equal | 0.00 | 0.478 | 0.421 | 0.371 | 0.287 | 0.480 |
| | 0.05 | 0.489 | 0.434 | 0.357 | 0.290 | 0.492 |
| | 0.10 | 0.494 | 0.439 | 0.379 | 0.317 | 0.501 |
| | 0.15 | 0.504 | 0.444 | 0.396 | 0.335 | 0.510 |
| | 0.20 | 0.517 | 0.451 | 0.454 | 0.360 | 0.518 |
| Sratio 75/25 | 0.00 | 0.574 | 0.495 | 0.494 | 0.389 | 0.533 |
| | 0.05 | 0.588 | 0.512 | 0.511 | 0.404 | 0.553 |
| | 0.10 | 0.599 | 0.520 | 0.518 | 0.413 | 0.560 |
| | 0.15 | 0.619 | 0.529 | 0.527 | 0.420 | 0.568 |
| | 0.20 | 0.644 | 0.538 | 0.534 | 0.427 | 0.576 |
| D = 0.2 | 0.00 | 0.607 | 0.537 | 0.494 | 0.392 | 0.606 |
| | 0.05 | 0.597 | 0.541 | 0.506 | 0.408 | 0.615 |
| | 0.10 | 0.607 | 0.545 | 0.528 | 0.445 | 0.620 |
| | 0.15 | 0.636 | 0.551 | 0.540 | 0.466 | 0.623 |
| | 0.20 | 0.674 | 0.557 | 0.625 | 0.499 | 0.624 |
| D = 0.5 | 0.00 | 0.499 | 0.445 | 0.432 | 0.328 | 0.505 |
| | 0.05 | 0.518 | 0.459 | 0.401 | 0.318 | 0.515 |
| | 0.10 | 0.524 | 0.467 | 0.412 | 0.332 | 0.526 |
| | 0.15 | 0.530 | 0.475 | 0.433 | 0.342 | 0.539 |
| | 0.20 | 0.536 | 0.481 | 0.449 | 0.358 | 0.550 |
| D = 0.8 | 0.00 | 0.426 | 0.355 | 0.311 | 0.244 | 0.381 |
| | 0.05 | 0.451 | 0.380 | 0.319 | 0.258 | 0.406 |
| | 0.10 | 0.457 | 0.386 | 0.335 | 0.272 | 0.416 |
| | 0.15 | 0.461 | 0.391 | 0.347 | 0.281 | 0.427 |
| | 0.20 | 0.469 | 0.401 | 0.369 | 0.291 | 0.439 |
| Min | | 0.23 | 0.20 | 0.01 | 0.15 | 0.26 |
| Max | | 0.83 | 0.63 | 0.71 | 0.67 | 0.71 |
| Median | | 0.52 | 0.46 | 0.44 | 0.32 | 0.54 |
| Mean | | 0.53 | 0.46 | 0.43 | 0.35 | 0.52 |
| IQR | | 0.15 | 0.12 | 0.16 | 0.19 | 0.23 |
Figure 1. Increase in overall misclassification rate from equal subgroup ratio to 75/25 ratio, by method and degree of subgroup overlap.
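The quantity plotted in Figure 1 can be recomputed from the overall misclassification table as a simple difference. The sketch below assumes that the second pair of Equal and 75/25 blocks in that table holds the subgroup ratio (Sratio) conditions, following the factor order given in the table caption. Method names for the five rate columns are not preserved in this record, so the columns are referred to by position only.

```python
# Overall misclassification rates for the subgroup ratio conditions, keyed by
# degree of subgroup overlap. Values are copied from the overall table; the
# five entries per row are the five method columns in table order.
EQUAL = {
    "0.00": [0.478, 0.421, 0.371, 0.287, 0.480],
    "0.05": [0.489, 0.434, 0.357, 0.290, 0.492],
    "0.10": [0.494, 0.439, 0.379, 0.317, 0.501],
    "0.15": [0.504, 0.444, 0.396, 0.335, 0.510],
    "0.20": [0.517, 0.451, 0.454, 0.360, 0.518],
}
UNEQUAL_75_25 = {
    "0.00": [0.574, 0.495, 0.494, 0.389, 0.533],
    "0.05": [0.588, 0.512, 0.511, 0.404, 0.553],
    "0.10": [0.599, 0.520, 0.518, 0.413, 0.560],
    "0.15": [0.619, 0.529, 0.527, 0.420, 0.568],
    "0.20": [0.644, 0.538, 0.534, 0.427, 0.576],
}

# Increase in misclassification when moving from equal to 75/25 subgroup sizes.
increase = {
    overlap: [round(u - e, 3) for e, u in zip(EQUAL[overlap], UNEQUAL_75_25[overlap])]
    for overlap in EQUAL
}

for overlap, deltas in increase.items():
    cells = "  ".join(f"{d:+.3f}" for d in deltas)
    print(f"overlap {overlap}: {cells}")
```

Every difference is positive, which agrees with the summary statement that subgroup size inequality led to higher misclassification rates for all methods.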
By group misclassification rates by method and subgroup overlap (overlap).
| 0.00 | 0.382 | 0.579 | 0.330 | 0.603 | 0.420 | 0.531 | 0.299 | 0.409 | 0.754 | 0.206 |
| 0.05 | 0.353 | 0.615 | 0.310 | 0.646 | 0.428 | 0.477 | 0.274 | 0.413 | 0.748 | 0.208 |
| 0.10 | 0.361 | 0.665 | 0.308 | 0.638 | 0.447 | 0.505 | 0.288 | 0.440 | 0.727 | 0.245 |
| 0.15 | 0.392 | 0.701 | 0.325 | 0.668 | 0.440 | 0.538 | 0.312 | 0.444 | 0.709 | 0.264 |
| 0.20 | 0.417 | 0.689 | 0.348 | 0.602 | 0.437 | 0.500 | 0.314 | 0.410 | 0.732 | 0.254 |
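The distinction between overall and by-group misclassification rates can be made concrete with a short function. This is a minimal sketch using the conventional definitions (the share of all cases assigned to the wrong group, and, for each true group, the share of its members assigned to another group); the record does not spell out the exact formulas the authors used, and the function name and example labels are hypothetical.

```python
from collections import Counter

def misclassification_rates(true_labels, predicted_labels):
    """Return (overall rate, {group: rate}) for paired true and predicted labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    totals = Counter(true_labels)
    errors = Counter(t for t, p in zip(true_labels, predicted_labels) if t != p)
    overall = sum(errors.values()) / len(true_labels)
    by_group = {group: errors[group] / totals[group] for group in totals}
    return overall, by_group

# Hypothetical example: 4 cases truly in group 1 and 6 truly in group 2.
true = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
pred = [1, 1, 2, 2, 2, 2, 2, 2, 2, 1]
overall, by_group = misclassification_rates(true, pred)
print(f"overall: {overall:.3f}")
for group, rate in sorted(by_group.items()):
    print(f"group {group}: {rate:.3f}")
```

A moderate overall rate can conceal very different by-group rates, which is why the by-group tables are reported alongside the overall ones.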
By group misclassification rates by method, degree of subgroup overlap (overlap), sample size (N), known group sample size ratio (Nratio), subgroup sample size ratio (Sratio), and difference in known group means (D).
| N = 150 | 0.00 | 0.338 | 0.582 | 0.290 | 0.616 | 0.542 | 0.538 | 0.285 | 0.322 | 0.768 | 0.204 |
| | 0.05 | 0.388 | 0.651 | 0.314 | 0.660 | 0.416 | 0.364 | 0.249 | 0.259 | 0.728 | 0.191 |
| | 0.10 | 0.407 | 0.703 | 0.314 | 0.669 | 0.480 | 0.426 | 0.286 | 0.319 | 0.763 | 0.207 |
| | 0.15 | 0.414 | 0.721 | 0.318 | 0.670 | 0.511 | 0.408 | 0.279 | 0.270 | 0.756 | 0.225 |
| | 0.20 | 0.413 | 0.726 | 0.338 | 0.631 | 0.565 | 0.378 | 0.287 | 0.273 | 0.705 | 0.258 |
| N = 300 | 0.00 | 0.333 | 0.577 | 0.301 | 0.624 | 0.421 | 0.474 | 0.239 | 0.375 | 0.783 | 0.189 |
| | 0.05 | 0.354 | 0.634 | 0.317 | 0.677 | 0.464 | 0.505 | 0.257 | 0.417 | 0.756 | 0.210 |
| | 0.10 | 0.370 | 0.676 | 0.325 | 0.687 | 0.509 | 0.555 | 0.285 | 0.471 | 0.754 | 0.229 |
| | 0.15 | 0.399 | 0.683 | 0.327 | 0.696 | 0.434 | 0.580 | 0.302 | 0.488 | 0.732 | 0.256 |
| | 0.20 | 0.464 | 0.702 | 0.385 | 0.655 | 0.427 | 0.597 | 0.341 | 0.474 | 0.752 | 0.253 |
| N = 750 | 0.00 | 0.470 | 0.578 | 0.394 | 0.572 | 0.313 | 0.581 | 0.373 | 0.520 | 0.713 | 0.226 |
| | 0.05 | 0.317 | 0.561 | 0.299 | 0.600 | 0.403 | 0.562 | 0.315 | 0.562 | 0.760 | 0.224 |
| | 0.10 | 0.303 | 0.615 | 0.284 | 0.551 | 0.344 | 0.530 | 0.293 | 0.525 | 0.661 | 0.302 |
| | 0.15 | 0.363 | 0.700 | 0.330 | 0.635 | 0.378 | 0.621 | 0.356 | 0.569 | 0.635 | 0.311 |
| | 0.20 | 0.359 | 0.631 | 0.313 | 0.501 | 0.303 | 0.514 | 0.312 | 0.484 | 0.736 | 0.252 |
| Nratio Equal | 0.00 | 0.485 | 0.715 | 0.417 | 0.740 | 0.341 | 0.685 | 0.328 | 0.511 | 0.693 | 0.195 |
| | 0.05 | 0.440 | 0.726 | 0.386 | 0.774 | 0.304 | 0.583 | 0.278 | 0.503 | 0.706 | 0.166 |
| | 0.10 | 0.480 | 0.754 | 0.405 | 0.792 | 0.352 | 0.647 | 0.323 | 0.555 | 0.716 | 0.179 |
| | 0.15 | 0.509 | 0.763 | 0.414 | 0.798 | 0.358 | 0.665 | 0.345 | 0.538 | 0.705 | 0.198 |
| | 0.20 | 0.586 | 0.783 | 0.482 | 0.743 | 0.327 | 0.657 | 0.367 | 0.516 | 0.736 | 0.166 |
| Nratio 75/25 | 0.00 | 0.189 | 0.324 | 0.168 | 0.347 | 0.569 | 0.241 | 0.246 | 0.219 | 0.868 | 0.227 |
| | 0.05 | 0.181 | 0.394 | 0.158 | 0.388 | 0.674 | 0.264 | 0.266 | 0.232 | 0.831 | 0.294 |
| | 0.10 | 0.149 | 0.507 | 0.137 | 0.363 | 0.617 | 0.254 | 0.225 | 0.235 | 0.747 | 0.364 |
| | 0.15 | 0.145 | 0.567 | 0.136 | 0.391 | 0.615 | 0.267 | 0.242 | 0.243 | 0.716 | 0.402 |
| | 0.20 | 0.134 | 0.532 | 0.126 | 0.367 | 0.620 | 0.238 | 0.227 | 0.234 | 0.724 | 0.402 |
| Sratio Equal | 0.00 | 0.322 | 0.493 | 0.280 | 0.521 | 0.431 | 0.408 | 0.277 | 0.336 | 0.832 | 0.178 |
| | 0.05 | 0.310 | 0.561 | 0.271 | 0.583 | 0.447 | 0.333 | 0.252 | 0.332 | 0.784 | 0.201 |
| | 0.10 | 0.298 | 0.619 | 0.256 | 0.557 | 0.473 | 0.353 | 0.252 | 0.371 | 0.764 | 0.245 |
| | 0.15 | 0.321 | 0.666 | 0.277 | 0.597 | 0.458 | 0.398 | 0.284 | 0.386 | 0.757 | 0.247 |
| | 0.20 | 0.341 | 0.644 | 0.290 | 0.543 | 0.423 | 0.408 | 0.303 | 0.372 | 0.750 | 0.261 |
| Sratio 75/25 | 0.00 | 0.476 | 0.714 | 0.407 | 0.731 | 0.403 | 0.722 | 0.334 | 0.523 | 0.632 | 0.251 |
| | 0.05 | 0.440 | 0.725 | 0.388 | 0.771 | 0.388 | 0.766 | 0.317 | 0.574 | 0.674 | 0.224 |
| | 0.10 | 0.471 | 0.747 | 0.402 | 0.782 | 0.402 | 0.776 | 0.351 | 0.561 | 0.662 | 0.247 |
| | 0.15 | 0.519 | 0.763 | 0.411 | 0.794 | 0.410 | 0.788 | 0.363 | 0.548 | 0.622 | 0.294 |
| | 0.20 | 0.599 | 0.798 | 0.490 | 0.745 | 0.471 | 0.723 | 0.342 | 0.503 | 0.688 | 0.237 |
| D = 0.2 | 0.00 | 0.548 | 0.746 | 0.423 | 0.728 | 0.533 | 0.670 | 0.386 | 0.482 | 0.759 | 0.308 |
| | 0.05 | 0.401 | 0.744 | 0.328 | 0.765 | 0.563 | 0.607 | 0.316 | 0.534 | 0.762 | 0.309 |
| | 0.10 | 0.418 | 0.856 | 0.323 | 0.737 | 0.542 | 0.620 | 0.321 | 0.583 | 0.632 | 0.404 |
| | 0.15 | 0.463 | 0.870 | 0.327 | 0.738 | 0.465 | 0.605 | 0.344 | 0.502 | 0.549 | 0.472 |
| | 0.20 | 0.575 | 0.903 | 0.420 | 0.572 | 0.478 | 0.575 | 0.386 | 0.461 | 0.489 | 0.535 |
| D = 0.5 | 0.00 | 0.333 | 0.557 | 0.311 | 0.614 | 0.413 | 0.514 | 0.280 | 0.408 | 0.838 | 0.137 |
| | 0.05 | 0.356 | 0.592 | 0.323 | 0.636 | 0.406 | 0.461 | 0.283 | 0.382 | 0.877 | 0.110 |
| | 0.10 | 0.353 | 0.607 | 0.317 | 0.628 | 0.412 | 0.443 | 0.282 | 0.358 | 0.908 | 0.110 |
| | 0.15 | 0.402 | 0.694 | 0.364 | 0.716 | 0.462 | 0.552 | 0.320 | 0.450 | 0.925 | 0.092 |
| | 0.20 | 0.378 | 0.659 | 0.340 | 0.663 | 0.460 | 0.525 | 0.310 | 0.441 | 0.948 | 0.103 |
| D = 0.8 | 0.00 | 0.278 | 0.441 | 0.261 | 0.464 | 0.316 | 0.414 | 0.237 | 0.338 | 0.642 | 0.194 |
| | 0.05 | 0.302 | 0.509 | 0.279 | 0.536 | 0.313 | 0.363 | 0.223 | 0.322 | 0.604 | 0.206 |
| | 0.10 | 0.304 | 0.509 | 0.283 | 0.535 | 0.375 | 0.439 | 0.256 | 0.359 | 0.654 | 0.202 |
| | 0.15 | 0.303 | 0.517 | 0.283 | 0.542 | 0.391 | 0.448 | 0.268 | 0.373 | 0.672 | 0.201 |
| | 0.20 | 0.321 | 0.535 | 0.295 | 0.560 | 0.375 | 0.406 | 0.256 | 0.330 | 0.700 | 0.180 |
Figure 2. Increase in misclassification rate from equal subgroup ratio to 75/25 ratio, by method, group, and degree of subgroup overlap.
Summary of simulation study results.
| Method | CART had lowest misclassification rates; MIXDA had second lowest misclassification rates; LDA, GAM, and LR had highest misclassification rates |
| Overlap | More overlap led to higher misclassification |
| N | Larger N generally led to lower misclassification rates for LDA, GAM, and LR; larger N led to higher misclassification rates for CART and MIXDA |
| Nratio | Known group size inequality led to lower misclassification rates for all methods except MIXDA |
| Subgroup ratio | Subgroup size inequality led to higher misclassification rates for all methods |
| Group separation | Greater known group separation led to lower misclassification rates for all methods |