Jocelyn Holden Bolin, W Holmes Finch.
Abstract
Statistical classification of phenomena into observed groups is very common in the social and behavioral sciences. Statistical classification methods, however, are affected by the characteristics of the data under study. Classification can be further complicated by initial misclassification of the observed groups. The purpose of this study is to investigate the impact of initial training data misclassification on several statistical classification and data mining techniques. Misclassification conditions in the three-group case are simulated, and results are presented in terms of overall as well as subgroup classification accuracy. Results show that classification accuracy decreases as sample size, group separation, and group size ratio decrease and as misclassification percentage increases, with random forests demonstrating the highest accuracy across conditions.
Keywords: classification and regression trees; discriminant analysis; misclassification; random forests; supervised classification; training data
Year: 2014 PMID: 24616711 PMCID: PMC3937587 DOI: 10.3389/fpsyg.2014.00118
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Figure 1. Depiction of simulated three-group misclassification situations. Note. Groups A, B, and C fall along a continuum such that A < B < C. Darker shaded regions represent higher probability of misclassification.
Simulation conditions for the single predictor three group case.
| Type of 3 group overlap | BC, AB/BC |
| Population variance | 1 |
| Manipulated variables | |
| Statistical analysis method | LDA, QDA, LR, CART, GAM, NNET, MIXDA, RF |
| Percent misclassified (%) | 0, 10, 20, 30 |
| Sample size | 150, 1500 |
| Sample size ratio | 50:50:50, 25:25:100, 25:100:25, 100:25:25 |
| Standardized mean diff. | 0.2, 0.5, 0.8, 1.6 |
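To make one cell of this design concrete, the following is a minimal sketch of how a single data set could be generated and then mislabeled. It is not the authors' code. The function name `simulate_cell`, the equal spacing of the group means by the standardized mean difference, the reading of "percent misclassified" as a share of the whole sample, and the uniform random swapping of B and C labels for the "BC" overlap type are all assumptions made for illustration.

```python
import random

def simulate_cell(ratio=(50, 50, 50), d=0.8, pct_mislabeled=0.10, seed=0):
    """Draw one data set for groups A < B < C on a single predictor.

    Hypothetical helper: parameter names and the mislabeling mechanism
    are illustrative assumptions, not taken from the study.
    """
    rng = random.Random(seed)
    # Assumption: adjacent group means differ by d, with population variance 1.
    means = {"A": 0.0, "B": d, "C": 2 * d}
    x, true_label = [], []
    for group, n in zip("ABC", ratio):
        for _ in range(n):
            x.append(rng.gauss(means[group], 1.0))
            true_label.append(group)

    # Assumption: "BC" overlap means randomly chosen B and C cases have
    # their recorded training label swapped to the other group.
    observed = list(true_label)
    bc_idx = [i for i, g in enumerate(true_label) if g in ("B", "C")]
    n_flip = min(round(pct_mislabeled * len(x)), len(bc_idx))
    for i in rng.sample(bc_idx, n_flip):
        observed[i] = "C" if true_label[i] == "B" else "B"
    return x, true_label, observed
```

A classifier would then be trained on `x` with the `observed` labels and scored against `true_label`, so that the cost of the initial mislabeling shows up in the accuracy figures.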
Figure 2. Overall misclassification rates by method, sample size, and group size ratio.
Figure 3. Overall misclassification rate by method and proportion of cases initially misclassified.
Figure 4. Overall misclassification rate by method and group separation.
Group misclassification percentage by method, misclassification proportion, group size ratio, and group separation.
| Misclass proportion | 0 | 0.191 | 0.826 | 0.766 | 0.197 | 0.818 | 0.528 | 0.389 | 0.682 | 0.382 | 0.201 | 0.759 | 0.501 | 0.000 | 0.000 | 0.000 | 0.570 | 0.800 | 0.613 | 0.560 | 0.799 | 0.593 |
| | 0.1 | 0.205 | 0.695 | 0.876 | 0.208 | 0.707 | 0.616 | 0.388 | 0.649 | 0.362 | 0.241 | 0.644 | 0.579 | 0.096 | 0.098 | 0.098 | 0.513 | 0.730 | 0.629 | 0.409 | 0.678 | 0.629 |
| | 0.2 | 0.251 | 0.568 | 0.948 | 0.254 | 0.584 | 0.745 | 0.420 | 0.634 | 0.368 | 0.314 | 0.521 | 0.692 | 0.201 | 0.205 | 0.205 | 0.589 | 0.713 | 0.725 | 0.546 | 0.648 | 0.770 |
| | 0.3 | 0.334 | 0.425 | 0.979 | 0.346 | 0.424 | 0.835 | 0.461 | 0.649 | 0.372 | 0.427 | 0.384 | 0.790 | 0.299 | 0.298 | 0.298 | 0.760 | 0.751 | 0.821 | 0.719 | 0.701 | 0.888 |
| Group size ratio | 100/25/25 | 0.067 | 0.814 | 0.965 | 0.066 | 0.818 | 0.802 | 0.451 | 0.572 | 0.350 | 0.117 | 0.747 | 0.767 | 0.178 | 0.187 | 0.187 | 0.699 | 0.890 | 0.822 | 0.643 | 0.860 | 0.865 |
| | 25/100/25 | 0.312 | 0.546 | 0.875 | 0.319 | 0.553 | 0.648 | 0.401 | 0.684 | 0.380 | 0.359 | 0.520 | 0.625 | 0.153 | 0.153 | 0.153 | 0.585 | 0.694 | 0.668 | 0.520 | 0.649 | 0.677 |
| | 25/25/100 | 0.132 | 0.734 | 0.917 | 0.138 | 0.738 | 0.717 | 0.421 | 0.632 | 0.384 | 0.175 | 0.642 | 0.626 | 0.147 | 0.146 | 0.146 | 0.596 | 0.818 | 0.745 | 0.553 | 0.793 | 0.776 |
| | 50/50/50 | 0.442 | 0.426 | 0.850 | 0.453 | 0.431 | 0.612 | 0.399 | 0.704 | 0.363 | 0.511 | 0.404 | 0.597 | 0.150 | 0.150 | 0.150 | 0.583 | 0.616 | 0.599 | 0.544 | 0.544 | 0.620 |
| Group separation | 0.2 | 0.337 | 0.677 | 0.945 | 0.345 | 0.669 | 0.932 | 0.573 | 0.673 | 0.630 | 0.390 | 0.586 | 0.889 | 0.154 | 0.152 | 0.152 | 0.858 | 0.886 | 0.934 | 0.872 | 0.886 | 0.930 |
| | 0.5 | 0.277 | 0.681 | 0.907 | 0.288 | 0.677 | 0.833 | 0.439 | 0.776 | 0.409 | 0.314 | 0.620 | 0.784 | 0.158 | 0.159 | 0.159 | 0.735 | 0.840 | 0.862 | 0.698 | 0.796 | 0.862 |
| | 0.8 | 0.230 | 0.655 | 0.869 | 0.239 | 0.657 | 0.685 | 0.367 | 0.708 | 0.296 | 0.270 | 0.609 | 0.649 | 0.154 | 0.158 | 0.158 | 0.597 | 0.764 | 0.735 | 0.477 | 0.701 | 0.730 |
| | 1.6 | 0.149 | 0.463 | 0.872 | 0.145 | 0.494 | 0.305 | 0.286 | 0.451 | 0.150 | 0.230 | 0.457 | 0.267 | 0.158 | 0.160 | 0.160 | 0.254 | 0.496 | 0.275 | 0.198 | 0.429 | 0.385 |
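For readers who want to reproduce group-level figures of the kind tabulated above, here is a minimal sketch of a per-group misclassification rate: the share of cases truly in each group that a classifier assigns to some other group. The function name `group_misclassification` is hypothetical, and the study may have computed its rates differently.

```python
from collections import Counter

def group_misclassification(true_labels, predicted_labels):
    """Return {group: proportion of that group's cases predicted wrongly}."""
    totals = Counter(true_labels)
    # Count errors by the true group of each wrongly predicted case.
    errors = Counter(t for t, p in zip(true_labels, predicted_labels) if t != p)
    return {g: errors[g] / totals[g] for g in sorted(totals)}

# Toy example with two cases per group.
rates = group_misclassification(list("AABBCC"), list("ABBBCA"))
```

Here one of the two A cases and one of the two C cases are predicted wrongly, while both B cases are correct.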