Cees van der Eijk, Jonathan Rose.
Abstract
This paper undertakes a systematic assessment of the extent to which factor analysis identifies the correct number of latent dimensions (factors) when applied to ordered-categorical survey items (so-called Likert items). We simulate 2400 datasets of uni-dimensional Likert items that vary systematically over a range of conditions such as the underlying population distribution, the number of items, the level of random error, and characteristics of items and item-sets. Each of these datasets is factor analysed in a variety of ways that are frequently used in the extant literature or that are recommended in current methodological texts. These include exploratory factor retention heuristics such as Kaiser's criterion, Parallel Analysis and a non-graphical scree test, and (for exploratory and confirmatory analyses) evaluations of model fit. These analyses are conducted on the basis of both Pearson and polychoric correlations. We find that, irrespective of the particular mode of analysis, factor analysis applied to ordered-categorical survey data very often leads to over-dimensionalisation. The magnitude of this risk depends on the specific way in which factor analysis is conducted, the number of items, the properties of the set of items, and the underlying population distribution. The paper concludes with a discussion of the consequences of over-dimensionalisation and a brief mention of alternative modes of analysis that are much less prone to such problems.
Year: 2015 PMID: 25789992 PMCID: PMC4366083 DOI: 10.1371/journal.pone.0118900
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Histograms of simulated respondent positions for four distributions.
Histograms of distributions of simulated respondents on the latent continuum used in this paper (n = 2000): bimodal (panel A), normal (panel B), uniform (panel C) and skewed normal (panel D).
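The four population shapes can be reproduced in outline with a few lines of Python. This is a minimal sketch, assuming a latent continuum on roughly the 0 to 100 scale suggested by the category boundaries in Table 1; the location, scale, skewness and mixture parameters in `simulate_latent` are hypothetical choices for illustration, not the values used to generate the paper's data.

```python
import numpy as np
from scipy.stats import skewnorm


def simulate_latent(dist: str, n: int = 2000, seed: int = 0) -> np.ndarray:
    """Draw n hypothetical respondent positions on a roughly 0-100 latent scale.

    Every location, scale and shape value below is an illustrative assumption,
    not a parameter taken from the paper.
    """
    rng = np.random.default_rng(seed)
    if dist == "normal":
        return rng.normal(loc=50.0, scale=15.0, size=n)
    if dist == "uniform":
        return rng.uniform(low=0.0, high=100.0, size=n)
    if dist == "skewed normal":
        # scipy's skew-normal; a positive shape parameter gives a right skew
        return skewnorm.rvs(a=5.0, loc=30.0, scale=25.0, size=n, random_state=rng)
    if dist == "bimodal":
        # equal-weight mixture of two normals centred at 30 and 70
        centres = rng.choice([30.0, 70.0], size=n)
        return rng.normal(loc=centres, scale=8.0)
    raise ValueError(f"unknown distribution: {dist}")


# Frequency counts behind a histogram such as panel B (normal)
counts, edges = np.histogram(simulate_latent("normal"), bins=20)
```

Drawing the bimodal case as a two-component mixture is only one way to obtain two modes; any other generator with the same qualitative shape would serve the same illustrative purpose.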
Table 1. Specification of boundaries between categories of five-point ordered-categorical items*.
| Item | Category boundary 1 | Category boundary 2 | Category boundary 3 | Category boundary 4 |
|---|---|---|---|---|
| Item 1 | 13 | 21 | 29 | 36 |
| Item 2 | 16 | 24 | 33 | 41 |
| Item 3 | 18 | 28 | 36 | 46 |
| Item 4 | 21 | 31 | 38 | 48 |
| Item 5 | 24 | 34 | 42 | 53 |
| Item 6 | 27 | 38 | 46 | 56 |
| Item 7 | 31 | 41 | 48 | 59 |
| Item 8 | 34 | 45 | 55 | 64 |
| Item 9 | 36 | 48 | 57 | 66 |
| Item 10 | 41 | 51 | 61 | 69 |
| Item 11 | 45 | 54 | 64 | 73 |
| Item 12 | 48 | 57 | 66 | 76 |
| Item 13 | 53 | 62 | 71 | 79 |
| Item 14 | 56 | 66 | 75 | 83 |
| Item 15 | 61 | 70 | 79 | 86 |
| Item 16 | 18 | 29 | 41 | 56 |
| Item 17 | 24 | 38 | 51 | 62 |
| Item 18 | 31 | 44 | 56 | 71 |
| Item 19 | 38 | 51 | 63 | 77 |
| Item 20 | 41 | 54 | 68 | 83 |
| Item 21 | 25 | 41 | 54 | 66 |
| Item 22 | 26 | 46 | 64 | 81 |
| Item 23 | 21 | 46 | 66 | 83 |
| Item 24 | 25 | 51 | 71 | 81 |
| Item 25 | 24 | 44 | 65 | 84 |
| Item 26 | 19 | 26 | 53 | 82 |
| Item 27 | 21 | 38 | 51 | 59 |
* The first response category captures all cases with positions from minus infinity up to (but not including) the first category boundary. Similarly, the fifth response category begins at the value of the fourth category boundary and stretches to plus infinity.
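The boundary rule in the note above amounts to counting how many boundaries a latent position has reached. The sketch below assumes positions are already on the same scale as the boundaries; `to_category` and `ITEM_BOUNDARIES` are illustrative names, and only the boundary values for Items 7 and 26 are copied from the table.

```python
from bisect import bisect_right

# Category boundaries for two items, copied from the table above.
ITEM_BOUNDARIES = {
    7: [31, 41, 48, 59],
    26: [19, 26, 53, 82],
}


def to_category(position: float, boundaries: list[float]) -> int:
    """Map a latent position to a response category 1..5.

    A value equal to a boundary falls into the higher category, matching the
    table note: category 1 is (-inf, b1) and category 5 is [b4, +inf).
    """
    # bisect_right counts how many boundaries are <= position
    return bisect_right(boundaries, position) + 1
```

For example, `to_category(45.0, ITEM_BOUNDARIES[7])` places a respondent at 45 in category 3 of Item 7, because that position lies between the second (41) and third (48) boundaries.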
Table 2. Range and average test-retest correlations (Pearson and polychoric) for the 27 simulated items, by population distribution and level of random error.
| Large random error | Normal, Pearson | Normal, Polychoric | Uniform, Pearson | Uniform, Polychoric | Skewed normal, Pearson | Skewed normal, Polychoric | Bimodal, Pearson | Bimodal, Polychoric |
|---|---|---|---|---|---|---|---|---|
| Mean | 0.80 | 0.85 | 0.88 | 0.92 | 0.72 | 0.77 | 0.88 | 0.92 |
| Minimum | 0.63 | 0.68 | 0.80 | 0.79 | 0.44 | 0.49 | 0.78 | 0.82 |
| Maximum | 0.87 | 0.91 | 0.91 | 0.98 | 0.84 | 0.89 | 0.93 | 0.96 |
| Spread | 0.24 | 0.23 | 0.11 | 0.19 | 0.40 | 0.40 | 0.15 | 0.14 |
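A test-retest correlation of this kind can be approximated by categorising the same latent positions twice, each time with independent random error, and correlating the two sets of responses; the "Spread" row is simply the maximum minus the minimum across items. This sketch covers the Pearson case only (polychoric correlations need a dedicated estimator), and the normal error with standard deviation `error_sd` is an assumption: this excerpt does not state how the paper parameterised its error conditions.

```python
import numpy as np


def retest_correlation(latent, boundaries, error_sd, seed=0):
    """Pearson correlation between two error-laden administrations of one item."""
    rng = np.random.default_rng(seed)
    latent = np.asarray(latent, dtype=float)
    # Independent normal error for each administration (an assumed error model);
    # np.digitize puts a value equal to a boundary into the higher category.
    first = np.digitize(latent + rng.normal(0.0, error_sd, latent.shape), boundaries) + 1
    second = np.digitize(latent + rng.normal(0.0, error_sd, latent.shape), boundaries) + 1
    return float(np.corrcoef(first, second)[0, 1])


def summarise(correlations):
    """Mean, minimum, maximum and spread (max - min) across items."""
    r = np.asarray(correlations, dtype=float)
    return {"mean": r.mean(), "minimum": r.min(),
            "maximum": r.max(), "spread": r.max() - r.min()}


# Usage with hypothetical latent positions and Item 7's boundaries
latent = np.random.default_rng(1).normal(50.0, 15.0, 2000)
r_item7 = retest_correlation(latent, [31, 41, 48, 59], error_sd=10.0)
```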
Fig 2. Response distributions for Item 7 for each of the population distributions.
Distribution of simulated responses on Item 7 (see Table 1) under the large error condition (see Table 2) for four latent population distributions: bimodal (A), normal (B), uniform (C) and skewed normal (D); n = 2000.
Fig 3. Response distributions for Item 26 for each of the population distributions.
Distribution of simulated responses on Item 26 (see Table 1) under the large error condition (see Table 2) for four latent population distributions: bimodal (A), normal (B), uniform (C) and skewed normal (D); n = 2000.
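The response distributions in Figs 2 and 3 are frequency tables of categorised responses. A minimal sketch, assuming normally distributed error added to each latent position (the `error_sd` argument is hypothetical), reports the percentage of respondents in each of the five categories.

```python
import numpy as np


def response_distribution(latent, boundaries, error_sd=0.0, seed=0):
    """Percentage of respondents in each of the five response categories."""
    rng = np.random.default_rng(seed)
    latent = np.asarray(latent, dtype=float)
    observed = latent + rng.normal(0.0, error_sd, latent.shape)  # assumed error model
    # digitize returns 0..4 (category minus one); a value on a boundary goes up
    index = np.digitize(observed, boundaries)
    counts = np.bincount(index, minlength=len(boundaries) + 1)
    return 100.0 * counts / counts.sum()
```

With Item 26's boundaries `[19, 26, 53, 82]`, the widely spaced middle boundaries mean that most of a centrally located population falls into category 3, which is the kind of asymmetry the figure compares across distributions.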
Table 3. Percentage of over-dimensionalised solutions from exploratory factor analysis, applying the criteria listed in the columns; based on Pearson and polychoric correlations of uni-dimensional simulated data; separately for different underlying population distributions and different numbers of items (each cell based on c. 200 simulated datasets*).
| Latent population distribution | # of items | K1 (eigenvalues > 1), Pearson | K1 (eigenvalues > 1), Polychoric | Parallel Analysis, Pearson | Parallel Analysis, Polychoric | Acceleration Factor, Pearson | Acceleration Factor, Polychoric |
|---|---|---|---|---|---|---|---|
| Normal | 5 | 2 | 0 | 2 | 0 | 0 | 0 |
| | 8 | 23 | 1 | 13 | 0 | 0 | 0 |
| | 10 | 55 | 6 | 36 | 5 | 0 | 0 |
| Skewed normal | 5 | 6 | 0 | 3 | 0 | 0 | 0 |
| | 8 | 32 | 14 | 18 | 6 | 0 | 0 |
| | 10 | 63 | 33 | 43 | 16 | 0 | 0 |
| Uniform | 5 | 0 | 0 | 0 | 0 | 0 | 0 |
| | 8 | 4 | 0 | 1 | 0 | 0 | 0 |
| | 10 | 11 | 0 | 3 | 0 | 0 | 0 |
| Bimodal | 5 | 0 | 0 | 0 | 0 | 0 | 0 |
| | 8 | 0 | 0 | 0 | 0 | 0 | 0 |
| | 10 | 0 | 0 | 0 | 0 | 0 | 0 |
* In a small number of cases the algorithms calculating polychoric correlations did not return results owing to non-positive definite matrices. These missing values are omitted, and thus the n varies slightly between different analyses. We have no reason to believe that this has a systematic impact on the results.
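The three retention heuristics compared in the table above can be sketched from the eigenvalues of the item correlation matrix. This is a simplified illustration, not the authors' implementation: it uses Pearson correlations only, Parallel Analysis compares against the mean eigenvalues of random normal data of the same size (some implementations use the 95th percentile instead), and the acceleration factor follows one common formulation in which the factors retained are those preceding the point of maximum second difference in the eigenvalue sequence.

```python
import numpy as np


def eigenvalues(data):
    """Eigenvalues of the items' Pearson correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


def kaiser_k1(eig):
    """Kaiser's criterion: number of eigenvalues greater than 1."""
    return int(np.sum(np.asarray(eig, dtype=float) > 1.0))


def parallel_analysis(data, n_iter=100, seed=0):
    """Horn's Parallel Analysis against mean eigenvalues of random normal data."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = eigenvalues(data)
    random_mean = np.mean(
        [eigenvalues(rng.standard_normal((n, p))) for _ in range(n_iter)], axis=0
    )
    exceeds = observed > random_mean
    # retain the leading run of eigenvalues that beat their random counterparts
    return p if exceeds.all() else int(np.argmin(exceeds))


def acceleration_factor(eig):
    """Non-graphical scree test based on the second difference of eigenvalues."""
    eig = np.asarray(eig, dtype=float)
    if eig.size < 3:
        raise ValueError("need at least three eigenvalues")
    af = eig[2:] - 2.0 * eig[1:-1] + eig[:-2]
    # af[k] is centred on eigenvalue k + 2 (1-based); the factors retained are
    # those before that elbow, i.e. k + 1 of them
    return int(np.argmax(af)) + 1


# Usage on hypothetical uni-dimensional data: 2000 respondents, 5 continuous items
rng = np.random.default_rng(42)
factor = rng.standard_normal(2000)
items = 0.8 * factor[:, None] + 0.6 * rng.standard_normal((2000, 5))
eig = eigenvalues(items)
```

Note that this usage example feeds continuous items into the heuristics; the paper's point is that categorising such items before factor analysis is what produces the over-dimensionalisation reported in the table.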
Table 4. Percentage of instances where an exploratory 1-factor model would be rejected owing to poor fit according to the criteria listed in the columns; based on Pearson and polychoric correlations of uni-dimensional simulated data; separately for different underlying population distributions and different numbers of items (each cell based on c. 200 simulated datasets; see also the footnote to Table 3).
| Latent population distribution | # of items | Chi2 comparison of 1- and 2-factor models, Pearson | Chi2 comparison of 1- and 2-factor models, Polychoric | RMSEA, Pearson | RMSEA, Polychoric |
|---|---|---|---|---|---|
| Normal | 5 | 98.5 | 98 | 91.5 | 87 |
| | 8 | 100 | 100 | 97.5 | 96.5 |
| | 10 | 100 | 100 | 98.5 | 99 |
| Skewed normal | 5 | 97 | 99 | 90 | 91 |
| | 8 | 100 | 100 | 90 | 94.5 |
| | 10 | 100 | 100 | 96 | 97.5 |
| Uniform | 5 | 100 | 100 | 98 | 97.5 |
| | 8 | 100 | 100 | 100 | 100 |
| | 10 | 100 | 100 | 98.5 | 98.5 |
| Bimodal | 5 | 99 | 99.5 | 95 | 97 |
| | 8 | 100 | 100 | 98 | 99 |
| | 10 | 100 | 100 | 99.5 | 100 |
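The two fit criteria in this table rest on standard formulas: the degrees of freedom of an exploratory k-factor model for p items, a chi-square difference (likelihood-ratio) test between the 1- and 2-factor solutions, and RMSEA. The sketch below assumes maximum-likelihood chi-square values are already available from an EFA routine; the rejection thresholds (a significance level or an RMSEA cut-off) are not stated in this excerpt and are therefore left to the caller.

```python
from scipy.stats import chi2


def efa_df(p: int, k: int) -> int:
    """Degrees of freedom of a k-factor maximum-likelihood EFA model for p items."""
    return ((p - k) ** 2 - (p + k)) // 2


def rmsea(chi_sq: float, df: int, n: int) -> float:
    """RMSEA; this variant divides by n - 1 (some programs divide by n)."""
    return (max(chi_sq - df, 0.0) / (df * (n - 1))) ** 0.5


def chi2_difference_p(chi_sq_1f: float, chi_sq_2f: float, p: int) -> float:
    """p-value of the likelihood-ratio test of a 1- against a 2-factor model."""
    delta_chi_sq = chi_sq_1f - chi_sq_2f
    delta_df = efa_df(p, 1) - efa_df(p, 2)
    return float(chi2.sf(delta_chi_sq, delta_df))
```

For p = 5 items the 1- and 2-factor models have 5 and 1 degrees of freedom, so the difference test has 4 degrees of freedom; with 8 or 10 items the 1-factor model has 20 or 35.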
Table 5. Percentage of instances where a confirmatory 1-factor model would be rejected owing to poor fit according to the criteria listed in the columns; based on Pearson and polychoric correlations of uni-dimensional simulated data; separately for different underlying population distributions and different numbers of items (each cell based on c. 200 simulated datasets*).
| Latent population distribution | # of items | CFA Chi2, Pearson | CFA Chi2, Polychoric | CFA AGFI, Pearson | CFA AGFI, Polychoric | CFA RMSEA, Pearson | CFA RMSEA, Polychoric |
|---|---|---|---|---|---|---|---|
| Normal | 5 | 99.5 | 97.5 | 72.5 | 63 | 91.5 | 78 |
| | 8 | 100 | 97.5 | 97.5 | 90.5 | 97.5 | 92 |
| | 10 | 100 | 99 | 98.5 | 97 | 98.5 | 95 |
| Skewed normal | 5 | 98 | 97 | 66 | 63.5 | 90.5 | 81.5 |
| | 8 | 100 | 94.5 | 86.5 | 85.5 | 90 | 87.5 |
| | 10 | 100 | 86 | 97.5 | 84 | 96 | 81 |
| Uniform | 5 | 100 | 99.5 | 86.5 | 86 | 98 | 94.5 |
| | 8 | 100 | 98 | 100 | 98 | 100 | 98 |
| | 10 | 100 | 98 | 100 | 94.5 | 100 | 94.5 |
| Bimodal | 5 | 99 | 97 | 81 | 81.5 | 95 | 89.5 |
| | 8 | 100 | 90.5 | 97 | 88.5 | 98 | 89 |
| | 10 | 99.5 | 82.5 | 99 | 82.5 | 99 | 82 |
* In a small number of cases the algorithms calculating polychoric correlations or estimating the CFA model did not return results owing to non-positive definite matrices or non-convergence. These cases are omitted from the analyses reported here, and thus the n varies slightly between analyses. We have no reason to believe that this has a systematic impact on the results.
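Chi-square and AGFI for a confirmatory 1-factor model can both be derived from the maximum-likelihood comparison of the sample covariance matrix S with the model-implied matrix Σ = ΛΛᵀ + Ψ. This sketch evaluates fit for given estimates rather than estimating the model (in practice the loadings and uniquenesses would come from an SEM package); it assumes the factor variance is fixed at 1, so 2p parameters are free, and it needs at least four items for positive degrees of freedom. RMSEA can then be computed from the same chi-square and degrees of freedom.

```python
import numpy as np


def one_factor_implied(loadings, uniquenesses):
    """Model-implied covariance of a 1-factor model: lambda lambda' + diag(psi)."""
    lam = np.asarray(loadings, dtype=float).reshape(-1, 1)
    return lam @ lam.T + np.diag(np.asarray(uniquenesses, dtype=float))


def cfa_fit(S, sigma, n):
    """ML chi-square, degrees of freedom, GFI and AGFI for given 1-factor estimates."""
    S = np.asarray(S, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    p = S.shape[0]
    df = p * (p + 1) // 2 - 2 * p  # p loadings and p uniquenesses are free
    if df <= 0:
        raise ValueError("a 1-factor CFA needs at least four items here")
    sigma_inv = np.linalg.inv(sigma)
    # ML discrepancy: ln|Sigma| + tr(S Sigma^-1) - ln|S| - p
    f_ml = (np.linalg.slogdet(sigma)[1] + np.trace(S @ sigma_inv)
            - np.linalg.slogdet(S)[1] - p)
    chi_sq = (n - 1) * f_ml
    a = sigma_inv @ S
    resid = a - np.eye(p)
    gfi = 1.0 - np.trace(resid @ resid) / np.trace(a @ a)
    # AGFI penalises GFI by the ratio of moments to degrees of freedom
    agfi = 1.0 - (p * (p + 1) / (2.0 * df)) * (1.0 - gfi)
    return chi_sq, df, gfi, agfi
```

Because the AGFI penalty factor p(p + 1)/(2df) equals 3 for five items, a modest shortfall in GFI translates into a threefold larger shortfall in AGFI when there are few items.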
Table 6. OLS regressions of model evaluation criteria; cell entries are regression coefficients (n = 2271–2400, see footnote to Table 5)# *.
| | Model A | Model B | Model C | Model D | Model E | Model F | Model G | Model H | Model I | Model J |
|---|---|---|---|---|---|---|---|---|---|---|
| | 1.80 | 0.60 | 0.03 | (0.01) | 2.08 | 1.09 | 1.16 | 1.24 | 0.04 | (0.00) |
| | -0.92 | -0.94 | -0.03 | -0.03 | -1.01 | -1.10 | 0.07 | 0.07 | -0.03 | -0.03 |
| | 3.32 | 3.01 | 0.10 | 0.09 | 3.11 | 2.80 | -0.29 | -0.27 | 0.10 | 0.09 |
| | 1.10 | 1.72 | 0.03 | 0.05 | 1.11 | 1.65 | -0.07 | -0.11 | 0.03 | 0.05 |
| Bimodal | 1.84 | 3.02 | 0.06 | 0.09 | 1.97 | 3.07 | -0.14 | -0.22 | 0.06 | 0.09 |
| Uniform | 3.00 | 3.92 | 0.09 | 0.11 | 3.05 | 3.82 | -0.23 | -0.28 | 0.09 | 0.11 |
| Skewed Normal | -0.48 | -0.36 | -0.01§ | (-0.00) | -0.52 | -0.28 | 0.01~ | (-0.00) | -0.01§ | (-0.00) |
| 8 items | 2.90 | 2.76 | -0.03 | -0.04 | 3.25 | 3.27 | -0.03 | -0.01~ | -0.03 | -0.04 |
| 10 items | 4.35 | 4.13 | -0.05 | -0.06 | 4.89 | 4.87 | -0.05 | -0.03 | -0.05 | -0.06 |
| | 0.902 | 0.872 | 0.816 | 0.758 | 0.921 | 0.898 | 0.846 | 0.790 | 0.816 | 0.762 |
# Model A: dependent: Cube root of Δ chi-square between 1- and 2-factor EFA models; Pearson correlations
Model B: dependent: Cube root of Δ chi-square between 1- and 2-factor EFA models; polychoric correlations
Model C: dependent: RMSEA of a 1-factor EFA model; Pearson correlations
Model D: dependent: RMSEA of a 1-factor EFA model; polychoric correlations
Model E: dependent: Cube root of chi-square of a 1-factor CFA model; Pearson correlations
Model F: dependent: Cube root of chi-square of a 1-factor CFA model; polychoric correlations
Model G: dependent: AGFI of a 1-factor CFA model; Pearson correlations
Model H: dependent: AGFI of a 1-factor CFA model; polychoric correlations
Model I: dependent: RMSEA of a 1-factor CFA model; Pearson correlations
Model J: dependent: RMSEA of a 1-factor CFA model; polychoric correlations
* All coefficients p < .00001, except where indicated with § (p < .0001), # (p < .001) or ~ (p < .01). Bracketed coefficients are not significant (p > .01).
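The regressions in this table can be sketched as ordinary least squares with dummy variables. Judging from the rows shown, the normal distribution and the 5-item condition appear to be the reference categories; the continuous predictors whose row labels are missing from this extraction are represented by a hypothetical `covariates` argument. Only Models A, B, E and F use the cube root of a chi-square statistic as the dependent variable, so for the RMSEA and AGFI models `cube_root=False` would be passed.

```python
import numpy as np

# Reference categories (normal distribution, 5 items) inferred from the rows shown.
DISTRIBUTIONS = ["normal", "bimodal", "uniform", "skewed normal"]
ITEM_COUNTS = [5, 8, 10]


def design_row(covariates, distribution, n_items):
    """Intercept, hypothetical continuous covariates, then dummy variables."""
    dist_dummies = [float(distribution == d) for d in DISTRIBUTIONS[1:]]
    item_dummies = [float(n_items == k) for k in ITEM_COUNTS[1:]]
    return [1.0, *covariates, *dist_dummies, *item_dummies]


def fit_ols(X, statistic, cube_root=True):
    """OLS of a fit statistic (cube-rooted for chi-squares); returns (beta, R^2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(statistic, dtype=float)
    if cube_root:
        y = np.cbrt(y)  # reduces the strong right skew of chi-square values
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    r_squared = 1.0 - residuals @ residuals / np.sum((y - y.mean()) ** 2)
    return beta, r_squared
```

Coding the distributions and item counts as dummies against a reference category means each coefficient reads as the shift relative to a normal population with five items, holding the other predictors constant.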