David Goretzko, Markus Bühner.
Abstract
Determining the number of factors in exploratory factor analysis is probably the most crucial decision when conducting the analysis, as it clearly influences the meaningfulness of the results (i.e., factorial validity). A new method called the Factor Forest, which combines data simulation and machine learning, has been developed recently. This method, based on simulated data, reached very high accuracy for multivariate normal data, but it has not yet been tested with ordinal data. Hence, in this simulation study, we evaluated the Factor Forest with ordinal data based on different numbers of categories (2-6 categories) and compared it to common factor retention criteria. It showed higher overall accuracy for all types of ordinal data than all common factor retention criteria that were used for comparison (Parallel Analysis, Comparison Data, the Empirical Kaiser Criterion and the Kaiser Guttman Rule). The results indicate that the Factor Forest is applicable to ordinal data with at least five categories (typical scale in questionnaire research) in the majority of conditions and to binary or ordinal data based on items with fewer categories when the sample size is large.
Keywords: exploratory factor analysis; factor retention; factorial validity; machine learning; number of factors; ordinal data
Year: 2022 PMID: 35812814 PMCID: PMC9265486 DOI: 10.1177/01466216221089345
Source DB: PubMed Journal: Appl Psychol Meas ISSN: 0146-6216
Figure 1. General idea of the factor forest.
Accuracy and Bias of the Factor Retention Criteria for Ordinal Indicators, Symmetric Thresholds and Different Numbers of Categories.
| Method | Acc (overall) | Acc2 | Bias2 | Acc3 | Bias3 | Acc4 | Bias4 | Acc5 | Bias5 | Acc6 | Bias6 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| FF | 0.891 | 0.803 | 0.021 | 0.888 | 0.027 | 0.913 | 0.029 | 0.924 | 0.034 | 0.929 | 0.033 |
| PA | 0.807 | 0.727 | −0.538 | 0.803 | −0.412 | 0.828 | −0.369 | 0.838 | −0.350 | 0.842 | −0.342 |
| CD | 0.692 | 0.550 | −1.654 | 0.693 | −1.006 | 0.730 | −0.839 | 0.743 | −0.727 | 0.746 | −0.654 |
| EKC | 0.647 | 0.645 | −0.812 | 0.651 | −0.996 | 0.647 | −1.031 | 0.645 | −1.049 | 0.645 | −1.057 |
| KG | 0.550 | 0.434 | 1.675 | 0.532 | 1.046 | 0.576 | 0.843 | 0.596 | 0.744 | 0.610 | 0.690 |
Note. Acc stands for accuracy, so Acc2 stands for the accuracy in conditions based on ordinal indicators with two levels (and symmetric thresholds). Bias describes the mean deviation of the suggested number of factors from the true number of factors in the respective conditions.
FF = Factor Forest; PA = Parallel Analysis; CD = Comparison Data; EKC = empirical Kaiser criterion; KG = Kaiser–Guttman rule.
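The two measures defined in the note can be illustrated with a minimal Python sketch. It assumes that bias is the signed difference (suggested minus true), so that negative values indicate underfactoring and positive values overfactoring. The function names and the toy data are hypothetical and are not taken from the study.

```python
from statistics import mean


def accuracy(suggested, true):
    """Share of cases in which the suggested number of factors equals the true number."""
    return mean(1.0 if s == t else 0.0 for s, t in zip(suggested, true))


def bias(suggested, true):
    """Mean signed deviation (suggested minus true); this sign convention is an assumption."""
    return mean(s - t for s, t in zip(suggested, true))


# Hypothetical toy data: five simulated datasets with known true numbers of factors.
true_k = [2, 2, 4, 4, 6]
suggested_k = [2, 1, 4, 3, 6]

print(accuracy(suggested_k, true_k))  # 3 of 5 correct
print(bias(suggested_k, true_k))      # two underestimates of one factor each
```

In this toy case the criterion is correct in three of five datasets and underestimates by one factor in the other two, which gives an accuracy of 0.6 and a bias of −0.4.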
Figure 2. Ordinal data with symmetric thresholds: Accuracy of factor retention for different true numbers of factors as well as for different sample sizes (N), inter-factor correlations (rho) and variables per factor (vpf).
Accuracy and Bias of the Factor Retention Criteria for Ordinal Indicators, Asymmetric Thresholds and Different Numbers of Categories.
| Method | Acc (overall) | Acc2 | Bias2 | Acc3 | Bias3 | Acc4 | Bias4 | Acc5 | Bias5 | Acc6 | Bias6 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| FF | 0.812 | 0.737 | 0.054 | 0.803 | 0.056 | 0.829 | 0.065 | 0.842 | 0.072 | 0.848 | 0.074 |
| PA | 0.729 | 0.646 | −0.548 | 0.721 | −0.418 | 0.747 | −0.377 | 0.761 | −0.356 | 0.768 | −0.352 |
| CD | 0.632 | 0.515 | −1.759 | 0.603 | −1.362 | 0.658 | −1.125 | 0.686 | −1.006 | 0.699 | −0.944 |
| EKC | 0.648 | 0.633 | −0.702 | 0.654 | −0.939 | 0.652 | −0.986 | 0.651 | −1.005 | 0.649 | −1.015 |
| KG | 0.501 | 0.394 | 1.902 | 0.493 | 1.300 | 0.523 | 1.098 | 0.542 | 0.995 | 0.553 | 0.937 |
Note. Acc stands for accuracy, so Acc2 stands for the accuracy in conditions based on ordinal indicators with two levels (and asymmetric thresholds). Bias describes the mean deviation of the suggested number of factors from the true number of factors in the respective conditions.
FF = Factor Forest; PA = Parallel Analysis; CD = Comparison Data; EKC = empirical Kaiser criterion; KG = Kaiser–Guttman rule.
Figure 3. Ordinal data with asymmetric thresholds: Accuracy of factor retention for different true numbers of factors as well as for different sample sizes (N), inter-factor correlations (rho) and variables per factor (vpf).