Tim Cosemans, Yves Rosseel, Sarah Gelper.
Abstract
Exploratory factor analysis (EFA) is a commonly applied technique intended to help social scientists discover latent variables. Yet the results can be influenced by the methodological decisions the researcher makes along the way. In this article, we focus on the choice of the number of factors to retain: we compare the performance of the recently developed exploratory graph analysis (EGA) with various traditional factor retention criteria. We use both continuous and binary data, as evidence regarding the accuracy of such criteria in the latter case is scarce. Simulation results, based on scenarios that vary sample size, communalities from major factors, interfactor correlations, skewness, and correlation measure, show that EGA outperforms the traditional factor retention criteria considered in most cases in terms of bias and accuracy. In addition, we show that factor retention decisions for binary data are preferably made using Pearson, instead of tetrachoric, correlations, which is contrary to popular belief.
Keywords: binary data; exploratory factor analysis; exploratory graph analysis; factor retention; simulation
Year: 2021 PMID: 35989724 PMCID: PMC9386885 DOI: 10.1177/00131644211059089
Source DB: PubMed Journal: Educ Psychol Meas ISSN: 0013-1644 Impact factor: 3.088
Overview of Recent Studies Reviewing Factor Retention Criteria.
| Author | Dichotomous data | Correlations | Minor factors | EGA | Criteria included |
|---|---|---|---|---|---|
| | No | Pearson | Yes | No | KC, PA, RPA, CD, |
| | Yes | Pearson (dichotomous) and tetrachoric (dichotomous) | No | No | PA |
| | No | Pearson | No | No | PA |
| | Yes | Tetrachoric (dichotomous) | No | Yes | VSS, MAP, (E)BIC, PA, KC, and EGA |
| | Yes | Pearson (continuous) and tetrachoric (dichotomous) | No | Yes | KC, PA, scree plot, and EGA |
| | No | Pearson | No | No | KC, PA, CD, empirical KC, random forest, and extreme/automatic gradient boosting |
| | Yes | Tetrachoric | No | No | PA and RPA |
| | No | Pearson | No | No | PA, empirical KC, LRT, comparative fit index, Tucker–Lewis index, and RMSEA |
| | Yes | Pearson (continuous) and tetrachoric (dichotomous) | No | No | PA, RPA, and CD |
| | No | Pearson | No | No | KC, PA, scree test, MAP, CD, AIC, BIC, and |
| | Yes | Tetrachoric | No | No | KC, PA, |
Note. AIC = Akaike Information Criterion; EGA = exploratory graph analysis; KC = Kaiser criterion; PA = traditional parallel analysis; RPA = revised parallel analysis; CD = comparison data; VSS = very simple structure; MAP = minimum average partial; (E)BIC = (extended) Bayesian information criterion; LRT = likelihood ratio test; RMSEA = root mean square error of approximation. Matrices/vectors are in bold.
Variable Input Parameters.
| Parameter | Levels | References |
|---|---|---|
| Sample size | | |
| Interfactor correlations | | |
| Communalities from major factors | | |
Note. Two ranges of communalities from major factors are considered. The actual communality of each variable is randomly sampled from the applicable range, following Tucker et al. (1969).
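The sampling step described in this note can be sketched as follows. This is a minimal illustration, not the authors' code: the two ranges and the use of a uniform draw are assumptions made here (the table's level values did not survive extraction), and `sample_communalities` is a hypothetical helper name. The count of 20 variables is taken from the Fixed Input Parameters table.

```python
import random

def sample_communalities(n_variables, low, high, seed=None):
    """Draw one communality per variable from [low, high].

    Uniform sampling is an assumption of this sketch.
    """
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n_variables)]

# Hypothetical ranges, chosen only for illustration.
LOW_RANGE = (0.2, 0.4)
HIGH_RANGE = (0.6, 0.8)

N_VARIABLES = 20  # number of observed variables in the simulation design

low_h2 = sample_communalities(N_VARIABLES, *LOW_RANGE, seed=1)
high_h2 = sample_communalities(N_VARIABLES, *HIGH_RANGE, seed=1)
```

Each simulated data set would then use one such vector of communalities, so variables within a scenario differ while staying inside the scenario's range.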
Figure 1. Schematic Representation of the Data Simulation Procedure.
Fixed Input Parameters.
| Parameter | Value | References |
|---|---|---|
| Variables | 20 | |
| Major factors | 3 | |
| Minor factors | 200 | |
| Unique factors | 20 (= | |
| Common ratio ( | 0.8 | |
| Correlation minor factors | | |
| Correlation major and minor factors | | |
Note. The factor analytic model assumes that all unmodelled influences are uncorrelated with the modelled ones (Mulaik, 2009).
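To give a feel for what 200 minor factors with a common ratio of 0.8 can imply, the sketch below assumes that the relative contribution of successive minor factors decays geometrically by that ratio. The table does not spell out how the ratio enters the model, so this reading, the normalization to a total of 1, and the helper name `minor_factor_weights` are all assumptions of the example.

```python
N_MINOR = 200        # number of minor factors (from the table above)
COMMON_RATIO = 0.8   # common ratio (from the table above)

def minor_factor_weights(n_minor, ratio):
    """Relative weights ratio**0, ratio**1, ..., normalized to sum to 1.

    Assumption of this sketch: each minor factor contributes `ratio`
    times as much as the one before it.
    """
    raw = [ratio ** k for k in range(n_minor)]
    total = sum(raw)
    return [w / total for w in raw]

weights = minor_factor_weights(N_MINOR, COMMON_RATIO)

# Share of the total minor-factor weight carried by the first ten factors.
share_first_ten = sum(weights[:10])
```

Under this assumption the first ten minor factors carry roughly 89% of the total minor-factor weight, so most of the 200 minor factors act as very small perturbations.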
Figure 2. Results for Continuous Data: (A) Expected Probability of Correctly Predicting the Number of Factors and (B) Expected Bias.
Note. EV = Kaiser criterion; AF = acceleration factor alternative to the scree plot; AFEV = acceleration factor combined with the Kaiser criterion; PAM = parallel analysis based on the mean eigenvalues; PA95 = parallel analysis based on the 95th percentile of eigenvalues; RPA = revised parallel analysis based on the 95th percentile of eigenvalues; RPAEV = revised parallel analysis based on the 95th percentile of eigenvalues combined with the Kaiser criterion; MAP = minimum average partial method; EGA = exploratory graph analysis.
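Two of the criteria named in this note are simple enough to sketch from a list of eigenvalues. The code below is an illustration under stated assumptions, not the implementation used in the study: the Kaiser criterion is taken as the count of eigenvalues above 1, and the acceleration factor is taken as the second difference of the ordered eigenvalues, retaining the eigenvalues that precede the sharpest elbow. The eigenvalues are toy numbers, and the combined AFEV rule is not sketched.

```python
def kaiser_criterion(eigenvalues):
    """Number of eigenvalues greater than 1."""
    return sum(1 for ev in eigenvalues if ev > 1.0)

def acceleration_factor(eigenvalues):
    """Number of eigenvalues preceding the point of maximum curvature.

    Curvature is approximated by the second difference
    ev[i + 1] - 2 * ev[i] + ev[i - 1] at each interior position.
    """
    ev = sorted(eigenvalues, reverse=True)
    if len(ev) < 3:
        raise ValueError("need at least three eigenvalues")
    accel = [ev[i + 1] - 2 * ev[i] + ev[i - 1] for i in range(1, len(ev) - 1)]
    # accel[0] belongs to the eigenvalue at 0-based index 1, hence the +1.
    elbow_index = accel.index(max(accel)) + 1
    # Eigenvalues at indices 0 .. elbow_index - 1 come before the elbow.
    return elbow_index

# Toy eigenvalues with three clearly dominant values.
toy_eigenvalues = [4.0, 3.0, 2.5, 0.6, 0.5, 0.4]

n_kaiser = kaiser_criterion(toy_eigenvalues)
n_accel = acceleration_factor(toy_eigenvalues)
```

On these toy values both rules retain three factors. With real correlation matrices the two can diverge sharply, which is the kind of disagreement the simulation quantifies.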
Results for Continuous Data.
A. Expected probability of correctly predicting the number of factors

| Communalities | Interfactor correlation | Sample size | EV | AF | AFEV | PAM | PA95 | RPA | RPAEV | MAP | EGA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Low | Low | 100 | 0 | 0.23 | 0.14 | 0.03 | 0.07 | 0.14 | 0.14 | 0.03 | 0.33 |
| Low | Low | 1,000 | 0 | 0.22 | 0.07 | 0.02 | 0.03 | 0.01 | 0.01 | 0.05 | 0.67 |
| Low | High | 100 | 0 | 0 | 0 | 0.04 | 0.11 | 0.18 | 0.18 | 0.09 | 0.3 |
| Low | High | 1,000 | 0 | 0 | 0 | 0.03 | 0.05 | 0.02 | 0.02 | 0.13 | 0.64 |
| High | Low | 100 | 0.08 | 0.85 | 0.98 | 0.93 | 0.97 | 0.93 | 0.93 | 0.1 | 0.96 |
| High | Low | 1,000 | 0.11 | 0.84 | 0.95 | 0.9 | 0.92 | 0.49 | 0.49 | 0.15 | 0.99 |
| High | High | 100 | 0.09 | 0.02 | 0.03 | 0.95 | 0.98 | 0.95 | 0.95 | 0.25 | 0.96 |
| High | High | 1,000 | 0.12 | 0.02 | 0.01 | 0.92 | 0.95 | 0.57 | 0.57 | 0.34 | 0.99 |

B. Expected bias

| Communalities | Interfactor correlation | Sample size | EV | AF | AFEV | PAM | PA95 | RPA | RPAEV | MAP | EGA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Low | Low | 100 | 3.69 | −0.23 | −0.24 | 2.47 | 2.13 | 2.21 | 2.05 | −0.26 | 0.46 |
| Low | Low | 1,000 | 3.16 | −0.19 | −0.19 | 2.83 | 2.72 | 3.6 | 3.17 | 0.84 | 0.16 |
| Low | High | 100 | 3.55 | −1.97 | −1.98 | 2.23 | 1.85 | 2 | 1.72 | 0.73 | 0.49 |
| Low | High | 1,000 | 3.02 | −1.93 | −1.93 | 2.6 | 2.45 | 3.39 | 2.85 | 1.83 | 0.2 |
| High | Low | 100 | 1.23 | −0.15 | −0.16 | 0 | −0.11 | −0.3 | −0.2 | 0.2 | 0.19 |
| High | Low | 1,000 | 0.7 | −0.11 | −0.11 | 0.36 | 0.49 | 1.09 | 0.93 | 1.3 | −0.11 |
| High | High | 100 | 1.09 | −1.89 | −1.9 | −0.24 | −0.38 | −0.51 | −0.52 | 1.19 | 0.23 |
| High | High | 1,000 | 0.56 | −1.85 | −1.86 | 0.12 | 0.21 | 0.88 | 0.61 | 2.29 | −0.07 |
Note. EV = Kaiser criterion; AF = acceleration factor alternative to the scree plot; AFEV = acceleration factor combined with the Kaiser criterion; PAM = parallel analysis based on the mean eigenvalues; PA95 = parallel analysis based on the 95th percentile of eigenvalues; RPA = revised parallel analysis based on the 95th percentile of eigenvalues; RPAEV = revised parallel analysis based on the 95th percentile of eigenvalues combined with the Kaiser criterion; MAP = minimum average partial method; EGA = exploratory graph analysis.
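The two panels report one summary each per criterion and scenario. A minimal sketch of how such summaries could be computed from replicated estimates is given below. It assumes that accuracy is the proportion of replications recovering the true number of factors and that bias is the mean signed difference between the estimated and the true number. The estimates are made up for illustration. The true value of 3 is the number of major factors in the simulation design.

```python
from statistics import mean

TRUE_N_FACTORS = 3  # number of major factors in the simulation design

def accuracy(estimates, true_k):
    """Proportion of replications in which the estimate equals true_k."""
    return mean(1.0 if k == true_k else 0.0 for k in estimates)

def bias(estimates, true_k):
    """Mean signed error: positive means overextraction, negative underextraction."""
    return mean(k - true_k for k in estimates)

# Hypothetical estimates from six replications of one scenario.
estimates = [3, 3, 4, 2, 3, 5]

acc = accuracy(estimates, TRUE_N_FACTORS)
b = bias(estimates, TRUE_N_FACTORS)
```

Read this way, a criterion can have a bias near zero and still be inaccurate when over- and underestimates cancel, which is why the tables report both quantities.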
Figure 3. Results for Dichotomous Data With a 50–50 Split, Using Pearson Correlations: (A) Expected Probability of Correctly Predicting the Number of Factors and (B) Expected Bias.
Figure 4. Results for Dichotomous Data With a 50–50 Split, Using Tetrachoric Correlations: (A) Expected Probability of Correctly Predicting the Number of Factors and (B) Expected Bias.
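The difference between the two correlation measures can be illustrated with a small simulation. The sketch below dichotomizes two correlated standard-normal variables at the median and compares the Pearson correlation of the binary variables (the phi coefficient) with a tetrachoric estimate. It is an illustration only: the underlying correlation of 0.5, the sample size, and the seed are arbitrary choices, and the closed form sin(π·phi/2) is used because it holds when both variables are split 50–50. Other splits, such as 75–25, require numerical estimation, which is not shown.

```python
import math
import random
from statistics import NormalDist

def dichotomize(values, prop_zero):
    """Code values below the standard-normal quantile `prop_zero` as 0, others as 1."""
    threshold = NormalDist().inv_cdf(prop_zero)
    return [0 if v < threshold else 1 for v in values]

def pearson(x, y):
    """Pearson correlation; on two binary variables this is the phi coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Two standard-normal variables with a hypothetical correlation of 0.5.
rng = random.Random(2021)
rho = 0.5
z1 = [rng.gauss(0, 1) for _ in range(20000)]
z2 = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in z1]

x = dichotomize(z1, prop_zero=0.5)  # 50-50 split
y = dichotomize(z2, prop_zero=0.5)

phi = pearson(x, y)                       # attenuated relative to rho
tetra_5050 = math.sin(math.pi * phi / 2)  # closed form valid only for 50-50 splits
```

In expectation phi is about 1/3 here, while the tetrachoric estimate recovers a value near the underlying 0.5. The simulation results address a separate question: which of the two matrices leads to better factor retention decisions.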
Results for Dichotomous Data With a 50–50 Split, Using Pearson Correlations.
A. Expected probability of correctly predicting the number of factors

| Communalities | Interfactor correlation | Sample size | EV | AF | AFEV | PAM | PA95 | RPA | RPAEV | MAP | EGA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Low | Low | 100 | 0 | 0.24 | 0.16 | 0.04 | 0.13 | 0.22 | 0.22 | 0.18 | 0.27 |
| Low | Low | 1,000 | 0 | 0.23 | 0.08 | 0.02 | 0.06 | 0.02 | 0.02 | 0.26 | 0.6 |
| Low | High | 100 | 0 | 0 | 0 | 0.05 | 0.2 | 0.28 | 0.28 | 0.4 | 0.25 |
| Low | High | 1,000 | 0 | 0 | 0 | 0.03 | 0.09 | 0.03 | 0.03 | 0.51 | 0.57 |
| High | Low | 100 | 0.03 | 0.85 | 0.98 | 0.95 | 0.98 | 0.96 | 0.96 | 0.43 | 0.95 |
| High | Low | 1,000 | 0.04 | 0.84 | 0.96 | 0.92 | 0.96 | 0.62 | 0.62 | 0.54 | 0.99 |
| High | High | 100 | 0.03 | 0.02 | 0.03 | 0.96 | 0.99 | 0.97 | 0.97 | 0.69 | 0.94 |
| High | High | 1,000 | 0.04 | 0.02 | 0.01 | 0.93 | 0.97 | 0.69 | 0.69 | 0.78 | 0.99 |

B. Expected bias

| Communalities | Interfactor correlation | Sample size | EV | AF | AFEV | PAM | PA95 | RPA | RPAEV | MAP | EGA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Low | Low | 100 | 4.03 | −0.24 | −0.25 | 2.33 | 1.88 | 1.95 | 1.81 | −1.88 | 0.49 |
| Low | Low | 1,000 | 3.49 | −0.2 | −0.21 | 2.69 | 2.47 | 3.34 | 2.94 | −0.78 | 0.19 |
| Low | High | 100 | 3.89 | −1.98 | −1.99 | 2.09 | 1.6 | 1.74 | 1.49 | −0.89 | 0.53 |
| Low | High | 1,000 | 3.36 | −1.94 | −1.95 | 2.45 | 2.2 | 3.14 | 2.61 | 0.21 | 0.23 |
| High | Low | 100 | 1.57 | −0.16 | −0.17 | −0.15 | −0.36 | −0.56 | −0.43 | −1.42 | 0.22 |
| High | Low | 1,000 | 1.03 | −0.12 | −0.13 | 0.21 | 0.24 | 0.83 | 0.7 | −0.32 | −0.07 |
| High | High | 100 | 1.43 | −1.9 | −1.91 | −0.38 | −0.63 | −0.76 | −0.75 | −0.43 | 0.26 |
| High | High | 1,000 | 0.9 | −1.86 | −1.87 | −0.02 | −0.03 | 0.63 | 0.37 | 0.67 | −0.03 |
Results for Dichotomous Data With a 50–50 Split, Using Tetrachoric Correlations.
A. Expected probability of correctly predicting the number of factors

| Communalities | Interfactor correlation | Sample size | EV | AF | AFEV | PAM | PA95 | RPA | RPAEV | MAP | EGA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Low | Low | 100 | 0 | 0.01 | 0.16 | 0.01 | 0.02 | 0.06 | 0.06 | 0.04 | 0.22 |
| Low | Low | 1,000 | 0 | 0.01 | 0.08 | 0 | 0.01 | 0 | 0 | 0.06 | 0.54 |
| Low | High | 100 | 0 | 0 | 0 | 0.01 | 0.03 | 0.09 | 0.09 | 0.1 | 0.2 |
| Low | High | 1,000 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01 | 0.01 | 0.15 | 0.51 |
| High | Low | 100 | 0.03 | 0.19 | 0.98 | 0.76 | 0.86 | 0.86 | 0.86 | 0.11 | 0.94 |
| High | Low | 1,000 | 0.05 | 0.18 | 0.96 | 0.68 | 0.71 | 0.29 | 0.29 | 0.17 | 0.98 |
| High | High | 100 | 0.04 | 0 | 0.03 | 0.8 | 0.91 | 0.89 | 0.89 | 0.28 | 0.93 |
| High | High | 1,000 | 0.05 | 0 | 0.01 | 0.72 | 0.8 | 0.36 | 0.36 | 0.37 | 0.98 |

B. Expected bias

| Communalities | Interfactor correlation | Sample size | EV | AF | AFEV | PAM | PA95 | RPA | RPAEV | MAP | EGA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Low | Low | 100 | 4 | −1.24 | −0.23 | 2.78 | 2.4 | 2.68 | 2.37 | −1.75 | 0.36 |
| Low | Low | 1,000 | 3.47 | −1.2 | −0.19 | 3.14 | 2.99 | 4.08 | 3.5 | −0.65 | 0.06 |
| Low | High | 100 | 3.86 | −2.98 | −1.97 | 2.55 | 2.12 | 2.48 | 2.05 | −0.76 | 0.4 |
| Low | High | 1,000 | 3.33 | −2.94 | −1.93 | 2.91 | 2.72 | 3.87 | 3.17 | 0.34 | 0.1 |
| High | Low | 100 | 1.54 | −1.16 | −0.15 | 0.31 | 0.16 | 0.18 | 0.13 | −1.3 | 0.09 |
| High | Low | 1,000 | 1.01 | −1.12 | −0.11 | 0.67 | 0.76 | 1.57 | 1.25 | −0.2 | −0.2 |
| High | High | 100 | 1.4 | −2.9 | −1.9 | 0.07 | −0.11 | −0.03 | −0.2 | −0.3 | 0.13 |
| High | High | 1,000 | 0.87 | −2.86 | −1.85 | 0.43 | 0.49 | 1.36 | 0.93 | 0.8 | −0.16 |
Note. EV = Kaiser criterion; AF = acceleration factor alternative to the scree plot; AFEV = acceleration factor combined with the Kaiser criterion; PAM = parallel analysis based on the mean eigenvalues; PA95 = parallel analysis based on the 95th percentile of eigenvalues; RPA = revised parallel analysis based on the 95th percentile of eigenvalues; RPAEV = revised parallel analysis based on the 95th percentile of eigenvalues combined with the Kaiser criterion; MAP = minimum average partial method; EGA = exploratory graph analysis.
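PAM and PA95 differ only in the reference value each observed eigenvalue is compared against. The sketch below shows that comparison step, assuming the eigenvalues of the random comparison data sets have already been computed. The numbers are toy values, the nearest-rank percentile and the stop-at-first-failure rule are choices made for this example, and the revised variants (RPA, RPAEV) are not sketched.

```python
import math

def reference_eigenvalues(simulated, percentile=None):
    """Per-position reference: the mean (percentile=None) or a nearest-rank percentile."""
    reference = []
    for j in range(len(simulated[0])):
        column = sorted(s[j] for s in simulated)
        if percentile is None:
            reference.append(sum(column) / len(column))
        else:
            rank = max(1, math.ceil(percentile / 100 * len(column)))
            reference.append(column[rank - 1])
    return reference

def n_retained(observed, reference):
    """Count leading observed eigenvalues that exceed their reference value."""
    count = 0
    for obs, ref in zip(observed, reference):
        if obs <= ref:
            break
        count += 1
    return count

# Toy eigenvalues: one observed set and four sets from random comparison data.
observed = [3.0, 1.5, 1.08, 0.7]
simulated = [
    [1.4, 1.2, 1.0, 0.8],
    [1.6, 1.3, 1.1, 0.7],
    [1.3, 1.1, 0.9, 0.85],
    [1.7, 1.25, 1.05, 0.75],
]

n_pam = n_retained(observed, reference_eigenvalues(simulated))
n_pa95 = n_retained(observed, reference_eigenvalues(simulated, percentile=95))
```

On these toy values the mean-based rule retains three factors and the 95th-percentile rule retains two, showing that the percentile version sets a stricter bar.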
Figure 5. Results for Dichotomous Data With a 75–25 Split, Using Pearson Correlations: (A) Expected Probability of Correctly Predicting the Number of Factors and (B) Expected Bias.
Figure 6. Results for Dichotomous Data With a 75–25 Split, Using Tetrachoric Correlations: (A) Expected Probability of Correctly Predicting the Number of Factors and (B) Expected Bias.
Results for Dichotomous Data With a 75–25 Split, Using Pearson Correlations.
A. Expected probability of correctly predicting the number of factors

| Communalities | Interfactor correlation | Sample size | EV | AF | AFEV | PAM | PA95 | RPA | RPAEV | MAP | EGA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Low | Low | 100 | 0 | 0.25 | 0.16 | 0.02 | 0.09 | 0.19 | 0.19 | 0.15 | 0.24 |
| Low | Low | 1,000 | 0 | 0.23 | 0.08 | 0.01 | 0.04 | 0.02 | 0.02 | 0.21 | 0.57 |
| Low | High | 100 | 0 | 0 | 0 | 0.03 | 0.13 | 0.24 | 0.24 | 0.34 | 0.22 |
| Low | High | 1,000 | 0 | 0 | 0 | 0.02 | 0.06 | 0.02 | 0.02 | 0.45 | 0.54 |
| High | Low | 100 | 0.03 | 0.86 | 0.98 | 0.91 | 0.97 | 0.95 | 0.95 | 0.37 | 0.94 |
| High | Low | 1,000 | 0.04 | 0.85 | 0.96 | 0.87 | 0.93 | 0.58 | 0.58 | 0.48 | 0.99 |
| High | High | 100 | 0.03 | 0.02 | 0.03 | 0.93 | 0.98 | 0.97 | 0.97 | 0.64 | 0.94 |
| High | High | 1,000 | 0.05 | 0.02 | 0.01 | 0.89 | 0.96 | 0.65 | 0.65 | 0.73 | 0.98 |

B. Expected bias

| Communalities | Interfactor correlation | Sample size | EV | AF | AFEV | PAM | PA95 | RPA | RPAEV | MAP | EGA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Low | Low | 100 | 4.18 | −0.23 | −0.23 | 2.44 | 1.96 | 1.68 | 1.89 | −2.19 | 0.48 |
| Low | Low | 1,000 | 3.65 | −0.19 | −0.19 | 2.8 | 2.55 | 3.07 | 3.02 | −1.09 | 0.19 |
| Low | High | 100 | 4.04 | −1.97 | −1.97 | 2.2 | 1.68 | 1.48 | 1.57 | −1.19 | 0.52 |
| Low | High | 1,000 | 3.51 | −1.93 | −1.93 | 2.56 | 2.28 | 2.87 | 2.7 | −0.09 | 0.23 |
| High | Low | 100 | 1.72 | −0.15 | −0.15 | −0.04 | −0.28 | −0.83 | −0.35 | −1.73 | 0.22 |
| High | Low | 1,000 | 1.19 | −0.11 | −0.11 | 0.32 | 0.32 | 0.56 | 0.78 | −0.63 | −0.08 |
| High | High | 100 | 1.58 | −1.89 | −1.89 | −0.27 | −0.55 | −1.03 | −0.67 | −0.73 | 0.26 |
| High | High | 1,000 | 1.05 | −1.85 | −1.85 | 0.09 | 0.04 | 0.36 | 0.45 | 0.37 | −0.04 |
Note. EV = Kaiser criterion; AF = acceleration factor alternative to the scree plot; AFEV = acceleration factor combined with the Kaiser criterion; PAM = parallel analysis based on the mean eigenvalues; PA95 = parallel analysis based on the 95th percentile of eigenvalues; RPA = revised parallel analysis based on the 95th percentile of eigenvalues; RPAEV = revised parallel analysis based on the 95th percentile of eigenvalues combined with the Kaiser criterion; MAP = minimum average partial method; EGA = exploratory graph analysis.
Results for Dichotomous Data With a 75–25 Split, Using Tetrachoric Correlations.
A. Expected probability of correctly predicting the number of factors

| Communalities | Interfactor correlation | Sample size | EV | AF | AFEV | PAM | PA95 | RPA | RPAEV | MAP | EGA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Low | Low | 100 | 0 | 0.01 | 0.16 | 0 | 0.01 | 0.06 | 0.06 | 0.03 | 0.2 |
| Low | Low | 1,000 | 0 | 0.01 | 0.08 | 0 | 0 | 0 | 0 | 0.04 | 0.5 |
| Low | High | 100 | 0 | 0 | 0 | 0.01 | 0.02 | 0.07 | 0.07 | 0.08 | 0.18 |
| Low | High | 1,000 | 0 | 0 | 0 | 0 | 0.01 | 0.01 | 0.01 | 0.12 | 0.47 |
| High | Low | 100 | 0.04 | 0.19 | 0.98 | 0.65 | 0.79 | 0.83 | 0.83 | 0.09 | 0.93 |
| High | Low | 1,000 | 0.05 | 0.18 | 0.96 | 0.55 | 0.61 | 0.25 | 0.25 | 0.14 | 0.98 |
| High | High | 100 | 0.04 | 0 | 0.03 | 0.7 | 0.86 | 0.87 | 0.87 | 0.23 | 0.92 |
| High | High | 1,000 | 0.06 | 0 | 0.01 | 0.61 | 0.72 | 0.32 | 0.32 | 0.32 | 0.98 |

B. Expected bias

| Communalities | Interfactor correlation | Sample size | EV | AF | AFEV | PAM | PA95 | RPA | RPAEV | MAP | EGA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Low | Low | 100 | 4.15 | −1.23 | −0.21 | 2.89 | 2.48 | 2.42 | 2.45 | −2.06 | 0.35 |
| Low | Low | 1,000 | 3.62 | −1.19 | −0.17 | 3.25 | 3.07 | 3.81 | 3.58 | −0.96 | 0.06 |
| Low | High | 100 | 4.01 | −2.97 | −1.96 | 2.66 | 2.2 | 2.21 | 2.13 | −1.06 | 0.39 |
| Low | High | 1,000 | 3.48 | −2.93 | −1.91 | 3.02 | 2.8 | 3.6 | 3.25 | 0.04 | 0.1 |
| High | Low | 100 | 1.69 | −1.15 | −0.14 | 0.42 | 0.24 | −0.09 | 0.21 | −1.6 | 0.09 |
| High | Low | 1,000 | 1.16 | −1.11 | −0.09 | 0.78 | 0.84 | 1.3 | 1.33 | −0.5 | −0.21 |
| High | High | 100 | 1.55 | −2.89 | −1.88 | 0.18 | −0.03 | −0.3 | −0.11 | −0.61 | 0.13 |
| High | High | 1,000 | 1.02 | −2.85 | −1.84 | 0.54 | 0.56 | 1.1 | 1.01 | 0.49 | −0.17 |
Note. EV = Kaiser criterion; AF = acceleration factor alternative to the scree plot; AFEV = acceleration factor combined with the Kaiser criterion; PAM = parallel analysis based on the mean eigenvalues; PA95 = parallel analysis based on the 95th percentile of eigenvalues; RPA = revised parallel analysis based on the 95th percentile of eigenvalues; RPAEV = revised parallel analysis based on the 95th percentile of eigenvalues combined with the Kaiser criterion; MAP = minimum average partial method; EGA = exploratory graph analysis.
Overview of Results: Recommended Procedure to Use in Each of the Scenarios.
| | Low interfactor correlation | High interfactor correlation |
|---|---|---|
| Low communalities | - EGA | |
| | - RPA/RPAEV if small samples | |
| | - MAP if small samples and Pearson correlations | |
| High communalities | - AF/AFEV/EGA/PAM/PA95 | - EGA/PAM/PA95 |
| | - RPA/RPAEV if small samples | - RPA/RPAEV if small samples |
Note. EGA = exploratory graph analysis; RPA = revised parallel analysis based on the 95th percentile of eigenvalues; RPAEV = revised parallel analysis based on the 95th percentile of eigenvalues combined with the Kaiser criterion; MAP = minimum average partial method; AF = acceleration factor alternative to the scree plot; AFEV = acceleration factor combined with the Kaiser criterion; PAM = parallel analysis based on the mean eigenvalues; PA95 = parallel analysis based on the 95th percentile of eigenvalues.