Abstract
Determining the number of factors in exploratory factor analysis is arguably the most crucial decision a researcher faces when conducting the analysis. While several simulation studies have compared various so-called factor retention criteria under different data conditions, little is known about the impact of missing data on this process. Hence, in this study, we evaluated the performance of different factor retention criteria (the Factor Forest, parallel analysis based on a principal component analysis, parallel analysis based on the common factor model, and the comparison data approach) in combination with different missing data methods, namely an expectation-maximization algorithm called Amelia, predictive mean matching and random forest imputation within the multiple imputation by chained equations (MICE) framework, and pairwise deletion, with regard to their accuracy in determining the number of factors when data are missing. Data were simulated for different sample sizes, numbers of factors, numbers of manifest variables (indicators), between-factor correlations, missing data mechanisms, and proportions of missing values. In the majority of conditions and for all factor retention criteria except the comparison data approach, the missing data mechanism had little impact on accuracy, and pairwise deletion performed comparably to the more sophisticated imputation methods. In some conditions, however, especially in small samples and when comparison data were used to determine the number of factors, random forest imputation was preferable to the other missing data methods. Accordingly, depending on data characteristics and the selected factor retention criterion, choosing an appropriate missing data method is crucial for obtaining a valid estimate of the number of factors to extract.
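Parallel analysis, one of the factor retention criteria named above, compares the eigenvalues of the observed correlation matrix with eigenvalues obtained from random data of the same size. The following minimal sketch, assuming Python with NumPy, combines the PCA-based variant (PA-PCA) with pairwise deletion, one of the missing data methods studied. The 95th-percentile reference, the 100 random data sets, the use of the full sample size N for the reference data, and all function names are illustrative assumptions, not settings reported in this record.

```python
import numpy as np


def pairwise_corr(X):
    """Correlation matrix under pairwise deletion: entry (i, j) uses only
    the rows in which both column i and column j are observed (NaN = missing)."""
    p = X.shape[1]
    R = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            both = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            R[i, j] = R[j, i] = np.corrcoef(X[both, i], X[both, j])[0, 1]
    return R


def parallel_analysis_pca(R, n, n_sims=100, quantile=95, seed=0):
    """PCA-based parallel analysis: retain leading components whose observed
    eigenvalue exceeds the chosen quantile of eigenvalues from uncorrelated
    normal data with n rows and the same number of columns (assumption)."""
    rng = np.random.default_rng(seed)
    p = R.shape[0]
    observed = np.linalg.eigvalsh(R)[::-1]  # eigenvalues, largest first
    reference = np.empty((n_sims, p))
    for s in range(n_sims):
        Z = rng.standard_normal((n, p))
        reference[s] = np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False))[::-1]
    threshold = np.percentile(reference, quantile, axis=0)
    k = 0
    while k < p and observed[k] > threshold[k]:  # stop at the first failure
        k += 1
    return k


# Demo with a hypothetical two-factor structure (4 indicators per factor).
rng = np.random.default_rng(1)
n, loading = 500, 0.7
Lam = np.zeros((8, 2))
Lam[:4, 0] = loading
Lam[4:, 1] = loading
F = rng.standard_normal((n, 2))  # uncorrelated factor scores
X = F @ Lam.T + rng.standard_normal((n, 8)) * np.sqrt(1 - loading**2)
X[rng.random(X.shape) < 0.10] = np.nan  # delete 10% of cells completely at random
k_hat = parallel_analysis_pca(pairwise_corr(X), n=n)
print(k_hat)  # 2 for this strongly separated structure
```

Pairwise deletion keeps every available pair of observations, so the resulting matrix can be passed directly to the retention rule without imputing anything.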
Keywords: exploratory factor analysis; factor retention; missing data; multiple imputation
Year: 2021 PMID: 35444335 PMCID: PMC9014734 DOI: 10.1177/00131644211022031
Source DB: PubMed Journal: Educ Psychol Meas ISSN: 0013-1644 Impact factor: 2.821
Overall Accuracy of the Factor Retention Criteria in Combination With Different Missing Data Methods.
| Criterion | pair | pmm | rf | em |
|---|---|---|---|---|
| Mode | ||||
| FF | 0.966 | 0.912 | 0.984 | 0.959 |
| PA-FA | 0.983 | 0.911 | 0.988 | 0.975 |
| PA-PCA | 0.913 | 0.899 | 0.876 | 0.896 |
| CD | 0.656 | 0.549 | 0.962 | 0.699 |
| Cor | ||||
| PA-FA | NA | 0.982 | 0.985 | 0.991 |
| PA-PCA | NA | 0.907 | 0.874 | 0.896 |
Note. pair stands for pairwise deletion, pmm for predictive mean matching, rf for random forest imputation, and em for the Amelia algorithm. Mode and Cor indicate which aggregation strategy was used for PA. FF = Factor Forest; PA-FA = parallel analysis based on the common factor model; PA-PCA = parallel analysis based on principal component analysis; CD = comparison data; NA = not applicable.
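The notes distinguish two aggregation strategies for PA after multiple imputation. One plausible reading, sketched below under that assumption, is that Mode applies the criterion to each imputed data set and keeps the most frequent estimate, whereas Cor averages the correlation matrices of the imputed data sets and applies the criterion once; pairwise deletion yields only a single data set, which would be consistent with the NA entries. To keep the sketch short, the eigenvalue-greater-than-one rule stands in for PA, and noisy copies of one data set are hypothetical placeholders for real MICE or Amelia output.

```python
from collections import Counter

import numpy as np


def kaiser_criterion(R):
    """Stand-in retention rule (eigenvalues > 1). It is NOT one of the
    criteria studied; it only keeps this sketch short."""
    return int(np.sum(np.linalg.eigvalsh(R) > 1.0))


def aggregate_mode(imputed_sets, criterion):
    """'Mode' (as read here): apply the criterion to each imputed data set
    and return the most frequent estimate (ties go to the earliest one)."""
    estimates = [criterion(np.corrcoef(D, rowvar=False)) for D in imputed_sets]
    return Counter(estimates).most_common(1)[0][0]


def aggregate_cor(imputed_sets, criterion):
    """'Cor' (as read here): average the correlation matrices of all
    imputed data sets, then apply the criterion once."""
    mats = [np.corrcoef(D, rowvar=False) for D in imputed_sets]
    return criterion(np.mean(mats, axis=0))


# Hypothetical placeholders for m = 5 imputed data sets: noisy copies of
# one complete two-factor data set instead of real MICE/Amelia output.
rng = np.random.default_rng(2)
n = 300
Lam = np.zeros((8, 2))
Lam[:4, 0] = 0.7
Lam[4:, 1] = 0.7
base = rng.standard_normal((n, 2)) @ Lam.T + rng.standard_normal((n, 8)) * np.sqrt(0.51)
imputed = [base + rng.normal(0.0, 0.1, base.shape) for _ in range(5)]
print(aggregate_mode(imputed, kaiser_criterion), aggregate_cor(imputed, kaiser_criterion))
```

The Mode route runs the criterion m times and can expose disagreement between imputations, while the Cor route runs it once on a pooled matrix.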
Estimated Bias of the Factor Retention Criteria in Combination With Different Missing Data Methods.
| Criterion | pair | pmm | rf | em |
|---|---|---|---|---|
| Mode | ||||
| FF | 0.070 | 0.193 | 0.018 | 0.076 |
| PA-FA | 0.013 | 0.106 | −0.015 | 0.018 |
| PA-PCA | −0.178 | −0.175 | −0.296 | −0.223 |
| CD | 0.511 | 0.904 | 0.007 | 0.430 |
| Cor | ||||
| PA-FA | NA | 0.013 | −0.021 | −0.001 |
| PA-PCA | NA | −0.195 | −0.311 | −0.231 |
Note. pair stands for pairwise deletion, pmm for predictive mean matching, rf for random forest imputation, and em for the Amelia algorithm. Mode and Cor indicate which aggregation strategy was used for PA. FF = Factor Forest; PA-FA = parallel analysis based on the common factor model; PA-PCA = parallel analysis based on principal component analysis; CD = comparison data; NA = not applicable.
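The accuracy and bias tables summarize the same simulation replications in two ways. Assuming the usual definitions (accuracy as the share of replications in which the estimated number of factors equals the true number, bias as the mean difference between estimate and truth), a positive bias such as 0.904 for CD with pmm indicates over-extraction on average, and the negative values for PA-PCA indicate under-extraction. The sketch below, with a hypothetical helper name, also shows why a bias near zero does not imply high accuracy.

```python
def accuracy_and_bias(k_hat, k_true):
    """Accuracy: share of replications with k_hat == k_true.
    Bias: mean of (k_hat - k_true); > 0 means over-extraction on average,
    < 0 means under-extraction."""
    if not k_hat or len(k_hat) != len(k_true):
        raise ValueError("need two non-empty sequences of equal length")
    pairs = list(zip(k_hat, k_true))
    accuracy = sum(est == true for est, true in pairs) / len(pairs)
    bias = sum(est - true for est, true in pairs) / len(pairs)
    return accuracy, bias


# Hypothetical replications of a condition whose true number of factors is 3.
acc, bias = accuracy_and_bias([3, 3, 4, 2, 3], [3, 3, 3, 3, 3])
print(acc, bias)  # 0.6 0.0
```

Here one overestimate and one underestimate cancel, so the bias is 0.0 although the accuracy is only 0.6.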
Overall Accuracy of the Factor Retention Criteria in Combination With Different Missing Data Methods for Small Sample Sizes (N = 250).
| Criterion | pair | pmm | rf | em |
|---|---|---|---|---|
| Mode | ||||
| FF | 0.908 | 0.772 | 0.953 | 0.886 |
| PA-FA | 0.959 | 0.777 | 0.965 | 0.930 |
| PA-PCA | 0.846 | 0.814 | 0.784 | 0.822 |
| CD | 0.546 | 0.422 | 0.919 | 0.547 |
| Cor | ||||
| PA-FA | NA | 0.954 | 0.956 | 0.975 |
| PA-PCA | NA | 0.836 | 0.777 | 0.823 |
Note. pair stands for pairwise deletion, pmm for predictive mean matching, rf for random forest imputation, and em for the Amelia algorithm. Mode and Cor indicate which aggregation strategy was used for PA. FF = Factor Forest; PA-FA = parallel analysis based on the common factor model; PA-PCA = parallel analysis based on principal component analysis; CD = comparison data; NA = not applicable.
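At N = 250, predictive mean matching yields the lowest accuracy of the four missing data methods for FF, PA-FA (mode), and CD. The following simplified sketch in Python with NumPy shows the core idea of the method for a single incomplete variable: fit a regression on the observed cases, and for each missing case copy the observed value of a randomly chosen donor whose predicted value is among the closest. Full implementations, such as the one in the R package mice, additionally draw the regression coefficients from their posterior distribution and cycle over all incomplete variables; this sketch omits both, and the function name and the choice of k = 5 donors are assumptions.

```python
import numpy as np


def pmm_impute(y, X, k=5, seed=0):
    """Simplified predictive mean matching for one incomplete variable y
    (NaN = missing) given fully observed predictors X. Omits the Bayesian
    draw of regression coefficients used by full implementations."""
    rng = np.random.default_rng(seed)
    y = np.array(y, dtype=float)  # work on a copy
    miss = np.isnan(y)
    A = np.column_stack([np.ones(len(y)), X])  # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A[~miss], y[~miss], rcond=None)
    y_pred = A @ beta
    donor_values, donor_pred = y[~miss], y_pred[~miss]
    for i in np.flatnonzero(miss):
        # k observed cases whose predictions are closest to case i's prediction
        nearest = np.argsort(np.abs(donor_pred - y_pred[i]))[:k]
        y[i] = donor_values[rng.choice(nearest)]  # copy a real observed value
    return y


rng = np.random.default_rng(3)
X = rng.standard_normal((200, 2))
y = X @ np.array([0.6, 0.4]) + rng.normal(0.0, 0.5, 200)
y_obs = y.copy()
y_obs[rng.random(200) < 0.25] = np.nan  # about 25% MCAR in y
y_imp = pmm_impute(y_obs, X)
```

Because every imputed value is copied from an observed case, PMM never produces values outside the observed range, which is one of its usual selling points.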
Accuracy of All Combinations of Factor Retention Criteria and Missing Data Methods for Different Interfactor Correlations (one block of four columns per correlation condition).
| Criterion | pair | pmm | rf | em | pair | pmm | rf | em | pair | pmm | rf | em |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mode | ||||||||||||
| FF | 0.977 | 0.925 | 0.996 | 0.964 | 0.972 | 0.919 | 0.991 | 0.966 | 0.947 | 0.893 | 0.966 | 0.947 |
| PA-FA | 0.987 | 0.902 | 0.999 | 0.969 | 0.987 | 0.911 | 0.996 | 0.980 | 0.975 | 0.919 | 0.969 | 0.976 |
| PA-PCA | 0.998 | 0.974 | 0.997 | 0.996 | 0.970 | 0.965 | 0.941 | 0.958 | 0.770 | 0.759 | 0.692 | 0.732 |
| CD | 0.634 | 0.495 | 0.969 | 0.633 | 0.661 | 0.550 | 0.969 | 0.704 | 0.673 | 0.602 | 0.948 | 0.760 |
| Cor | ||||||||||||
| PA-FA | NA | 0.982 | 0.999 | 0.994 | NA | 0.985 | 0.994 | 0.996 | NA | 0.979 | 0.962 | 0.984 |
| PA-PCA | NA | 0.998 | 0.996 | 0.999 | NA | 0.967 | 0.938 | 0.958 | NA | 0.757 | 0.689 | 0.732 |
Note. pair stands for pairwise deletion, pmm for predictive mean matching, rf for random forest imputation, and em for the Amelia algorithm. Mode and Cor indicate which aggregation strategy was used for PA. FF = Factor Forest; PA-FA = parallel analysis based on the common factor model; PA-PCA = parallel analysis based on principal component analysis; CD = comparison data; NA = not applicable.
Figure 1. Accuracy of all combinations of factor retention criteria and missing data methods for different amounts of missingness (10% vs. 25%) and different factor solutions (CD + em/pmm/pair were excluded in conditions with 25% missingness because their accuracy was below 50%).
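The study varies both the missing data mechanism and the proportion of missing values (10% vs. 25%). The sketch below, assuming Python with NumPy, shows one way to impose such missingness on a complete data matrix: under MCAR every cell is deleted with the same probability, whereas under the hypothetical MAR scheme the deletion probability in the remaining columns rises with the rank of a fully observed first column. Neither scheme is claimed to be the one used in the study; the function names and the MAR design are illustrative.

```python
import numpy as np


def make_mcar(X, prop, seed=0):
    """MCAR: each cell becomes NaN independently with probability prop."""
    rng = np.random.default_rng(seed)
    X = np.array(X, dtype=float)
    X[rng.random(X.shape) < prop] = np.nan
    return X


def make_mar(X, prop, seed=0):
    """Hypothetical MAR scheme: column 0 stays complete; in every other
    column, a row's deletion probability rises linearly with the rank of its
    column-0 value and averages prop (valid for prop <= 0.5)."""
    if not 0 <= prop <= 0.5:
        raise ValueError("this scheme needs 0 <= prop <= 0.5")
    rng = np.random.default_rng(seed)
    X = np.array(X, dtype=float)
    n = X.shape[0]
    ranks = np.argsort(np.argsort(X[:, 0]))  # 0 = smallest value in column 0
    p_miss = 2 * prop * (ranks + 0.5) / n    # mean over rows equals prop
    for j in range(1, X.shape[1]):
        X[rng.random(n) < p_miss, j] = np.nan
    return X


X = np.random.default_rng(4).standard_normal((1000, 20))
X_mcar = make_mcar(X, 0.25)
X_mar = make_mar(X, 0.25)
print(np.isnan(X_mcar).mean(), np.isnan(X_mar[:, 1:]).mean())  # both near 0.25
```

Under the MAR scheme, the rows with missing values differ systematically on the observed column 0, which is exactly what distinguishes MAR from MCAR.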
Accuracy of All Combinations of Factor Retention Criteria and Missing Data Methods for Different Sample Sizes, Numbers of Manifest Variables and Proportions of Missingness (Part 1).
| N | Manifest variables | Missing (%) | FF em | FF pair | FF pmm | FF rf | PA-FA-mode em | PA-FA-mode pair | PA-FA-mode pmm | PA-FA-mode rf | PA-PCA-mode em | PA-PCA-mode pair | PA-PCA-mode pmm | PA-PCA-mode rf | CD em | CD pair | CD pmm | CD rf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 250 | 16 | 10 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 | 0.99 | 0.83 | 0.84 | 0.83 | 0.82 | 0.91 | 0.87 | 0.85 | 0.97 |
| 250 | 16 | 25 | 0.92 | 0.91 | 0.81 | 0.99 | 0.90 | 0.88 | 0.72 | 0.96 | 0.81 | 0.83 | 0.82 | 0.77 | 0.31 | 0.30 | 0.10 | 0.88 |
| 250 | 24 | 10 | 0.96 | 0.95 | 0.93 | 0.98 | 0.98 | 0.98 | 0.98 | 0.96 | 0.78 | 0.80 | 0.79 | 0.76 | 0.87 | 0.83 | 0.81 | 0.94 |
| 250 | 24 | 25 | 0.58 | 0.63 | 0.45 | 0.77 | 0.87 | 0.91 | 0.65 | 0.90 | 0.76 | 0.79 | 0.78 | 0.69 | 0.23 | 0.26 | 0.06 | 0.82 |
| 250 | 36 | 10 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 | 0.83 | 0.84 | 0.84 | 0.82 | 0.85 | 0.78 | 0.75 | 0.97 |
| 250 | 36 | 25 | 0.93 | 0.96 | 0.70 | 0.99 | 0.87 | 0.95 | 0.50 | 0.95 | 0.82 | 0.84 | 0.77 | 0.76 | 0.18 | 0.24 | 0.03 | 0.89 |
| 250 | 48 | 10 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.92 | 0.94 | 0.93 | 0.88 | 0.85 | 0.79 | 0.73 | 0.98 |
| 250 | 48 | 25 | 0.82 | 0.93 | 0.39 | 0.99 | 0.84 | 0.96 | 0.33 | 0.99 | 0.89 | 0.93 | 0.79 | 0.83 | 0.17 | 0.30 | 0.03 | 0.94 |
| 500 | 16 | 10 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.89 | 0.91 | 0.90 | 0.89 | 0.94 | 0.91 | 0.91 | 0.99 |
| 500 | 16 | 25 | 1.00 | 0.99 | 0.98 | 1.00 | 0.98 | 0.96 | 0.91 | 1.00 | 0.87 | 0.91 | 0.90 | 0.85 | 0.43 | 0.37 | 0.14 | 0.92 |
| 500 | 24 | 10 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.88 | 0.88 | 0.88 | 0.88 | 0.95 | 0.89 | 0.91 | 0.99 |
| 500 | 24 | 25 | 0.95 | 0.94 | 0.83 | 1.00 | 0.99 | 0.99 | 0.93 | 1.00 | 0.86 | 0.88 | 0.88 | 0.84 | 0.36 | 0.33 | 0.09 | 0.96 |
| 500 | 36 | 10 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.94 | 0.96 | 0.95 | 0.92 | 0.96 | 0.89 | 0.92 | 1.00 |
| 500 | 36 | 25 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.94 | 1.00 | 0.90 | 0.95 | 0.95 | 0.85 | 0.41 | 0.39 | 0.11 | 0.98 |
| 500 | 48 | 10 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 | 0.92 | 0.95 | 1.00 |
| 500 | 48 | 25 | 1.00 | 1.00 | 0.96 | 1.00 | 1.00 | 1.00 | 0.90 | 1.00 | 1.00 | 1.00 | 1.00 | 0.97 | 0.50 | 0.51 | 0.16 | 0.99 |
| 1,000 | 16 | 10 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 | 0.98 | 0.98 | 0.97 | 0.98 | 0.94 | 0.96 | 0.99 |
| 1,000 | 16 | 25 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 | 0.98 | 1.00 | 0.94 | 0.97 | 0.96 | 0.94 | 0.61 | 0.47 | 0.28 | 0.93 |
| 1,000 | 24 | 10 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.89 | 0.90 | 0.90 | 0.89 | 0.99 | 0.94 | 0.98 | 1.00 |
| 1,000 | 24 | 25 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 | 1.00 | 0.89 | 0.90 | 0.89 | 0.89 | 0.65 | 0.51 | 0.29 | 0.99 |
| 1,000 | 36 | 10 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.96 | 0.99 | 1.00 |
| 1,000 | 36 | 25 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 | 0.82 | 0.67 | 0.50 | 1.00 |
| 1,000 | 48 | 10 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 | 1.00 | 1.00 |
| 1,000 | 48 | 25 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.92 | 0.79 | 0.69 | 1.00 |
Note. pair stands for pairwise deletion, pmm for predictive mean matching, rf for random forest imputation, and em for the Amelia algorithm. The suffix -mode refers to the Mode aggregation strategy for PA. FF = Factor Forest; PA-FA = parallel analysis based on the factor model; PA-PCA = parallel analysis based on principal component analysis; CD = comparison data.
Accuracy of All Combinations of Factor Retention Criteria and Missing Data Methods for Different Sample Sizes, Numbers of Manifest Variables and Proportions of Missingness (Part 2).
| N | Manifest variables | Missing (%) | PA-FA-cor em | PA-FA-cor pmm | PA-FA-cor rf | PA-PCA-cor em | PA-PCA-cor pmm | PA-PCA-cor rf |
|---|---|---|---|---|---|---|---|---|
| 250 | 16 | 10 | 0.99 | 0.99 | 0.99 | 0.83 | 0.83 | 0.82 |
| 250 | 16 | 25 | 0.95 | 0.89 | 0.96 | 0.81 | 0.82 | 0.77 |
| 250 | 24 | 10 | 0.98 | 0.98 | 0.96 | 0.78 | 0.79 | 0.76 |
| 250 | 24 | 25 | 0.93 | 0.90 | 0.88 | 0.75 | 0.78 | 0.68 |
| 250 | 36 | 10 | 1.00 | 1.00 | 0.99 | 0.83 | 0.84 | 0.82 |
| 250 | 36 | 25 | 0.98 | 0.94 | 0.93 | 0.82 | 0.83 | 0.73 |
| 250 | 48 | 10 | 1.00 | 1.00 | 1.00 | 0.91 | 0.92 | 0.88 |
| 250 | 48 | 25 | 0.99 | 0.95 | 0.97 | 0.90 | 0.92 | 0.82 |
| 500 | 16 | 10 | 1.00 | 1.00 | 1.00 | 0.90 | 0.90 | 0.89 |
| 500 | 16 | 25 | 0.99 | 0.97 | 1.00 | 0.87 | 0.90 | 0.85 |
| 500 | 24 | 10 | 1.00 | 1.00 | 1.00 | 0.88 | 0.88 | 0.88 |
| 500 | 24 | 25 | 1.00 | 0.99 | 1.00 | 0.86 | 0.88 | 0.84 |
| 500 | 36 | 10 | 1.00 | 1.00 | 1.00 | 0.94 | 0.95 | 0.92 |
| 500 | 36 | 25 | 1.00 | 1.00 | 1.00 | 0.90 | 0.94 | 0.85 |
| 500 | 48 | 10 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 500 | 48 | 25 | 1.00 | 1.00 | 1.00 | 0.99 | 1.00 | 0.97 |
| 1,000 | 16 | 10 | 1.00 | 1.00 | 1.00 | 0.98 | 0.98 | 0.97 |
| 1,000 | 16 | 25 | 1.00 | 0.99 | 1.00 | 0.94 | 0.97 | 0.94 |
| 1,000 | 24 | 10 | 1.00 | 1.00 | 1.00 | 0.89 | 0.90 | 0.89 |
| 1,000 | 24 | 25 | 1.00 | 1.00 | 1.00 | 0.89 | 0.89 | 0.89 |
| 1,000 | 36 | 10 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 1,000 | 36 | 25 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 |
| 1,000 | 48 | 10 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 1,000 | 48 | 25 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
Note. pmm stands for predictive mean matching, rf for random forest imputation, and em for the Amelia algorithm; pairwise deletion (pair) is not applicable to the Cor aggregation strategy. The suffix -cor refers to the Cor aggregation strategy for PA. PA-FA = parallel analysis based on the factor model; PA-PCA = parallel analysis based on principal component analysis.
Accuracy of the Factor Retention Criteria for Different Sample Sizes and Numbers of Manifest Variables When No Data Are Missing.
| N | Manifest variables | FF | PA-FA | PA-PCA | CD |
|---|---|---|---|---|---|
| 250 | 16 | 1.00 | 1.00 | 0.84 | 0.96 |
| 250 | 24 | 0.99 | 0.98 | 0.80 | 0.92 |
| 250 | 36 | 1.00 | 1.00 | 0.83 | 0.90 |
| 250 | 48 | 1.00 | 1.00 | 0.92 | 0.89 |
| 500 | 16 | 1.00 | 1.00 | 0.91 | 0.98 |
| 500 | 24 | 1.00 | 1.00 | 0.88 | 0.96 |
| 500 | 36 | 1.00 | 1.00 | 0.95 | 0.96 |
| 500 | 48 | 1.00 | 1.00 | 1.00 | 0.97 |
| 1,000 | 16 | 1.00 | 1.00 | 0.98 | 0.98 |
| 1,000 | 24 | 1.00 | 1.00 | 0.90 | 0.98 |
| 1,000 | 36 | 1.00 | 1.00 | 1.00 | 0.98 |
| 1,000 | 48 | 1.00 | 1.00 | 1.00 | 0.99 |
Note. PA-FA = parallel analysis based on the factor model; PA-PCA = parallel analysis based on principal component analysis; CD = comparison data; FF = Factor Forest.
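The complete-data baseline above, like all conditions in this study, rests on simulated data. A minimal sketch of such a generator, assuming multivariate normal data with simple structure, equal loadings, and a single common inter-factor correlation, builds the model-implied covariance matrix Σ = ΛΦΛᵀ + Ψ and samples from it. The loading of 0.6, the correlation of 0.3, the four factors with six indicators each, and the function name are hypothetical choices, not the study's design values.

```python
import numpy as np


def simulate_factor_data(n, n_factors, per_factor, loading, phi, seed=0):
    """Sample n rows from a simple-structure common factor model with
    Sigma = Lambda Phi Lambda^T + Psi (equal loadings, one common
    inter-factor correlation phi, unique variances giving unit variances)."""
    rng = np.random.default_rng(seed)
    p = n_factors * per_factor
    Lam = np.zeros((p, n_factors))
    for f in range(n_factors):
        Lam[f * per_factor:(f + 1) * per_factor, f] = loading
    Phi = np.full((n_factors, n_factors), phi)
    np.fill_diagonal(Phi, 1.0)
    common = Lam @ Phi @ Lam.T
    Psi = np.diag(1.0 - np.diag(common))  # unique variances
    Sigma = common + Psi
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    return X, Sigma


# Hypothetical condition: N = 250, 4 factors x 6 indicators = 24 variables.
X, Sigma = simulate_factor_data(250, 4, 6, loading=0.6, phi=0.3)
print(X.shape)  # (250, 24)
```

Within a factor, two indicators correlate at 0.6 × 0.6 = 0.36; across factors the model-implied correlation shrinks to 0.6 × 0.3 × 0.6 = 0.108, which is why higher inter-factor correlations make the factors harder to separate.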