Yen-Kuang Lin, Chen-Yin Lee, Chen-Yueh Chen.
Abstract
Background: Principal component analysis (PCA) is a multivariate statistical model that reduces dimensions into a representation of principal components; it is therefore commonly adopted for establishing psychometric properties, i.e., construct validity. The autoencoder is a neural network model that has also been shown to perform well in dimensionality reduction. Although PCA and autoencoders can be compared in several ways, most of the recent literature has focused on differences in image reconstruction, a setting in which training data are usually plentiful. In the current study, we examined each autoencoder variant in detail and how neural networks may better generalize to small, non-normally distributed datasets. Methodology: A Monte Carlo simulation was conducted, varying the levels of non-normality, sample size, and communality. The performance of the autoencoders and a PCA was compared using the mean squared error, mean absolute error, and Euclidean distance. The feasibility of autoencoders with small sample sizes was also examined. Conclusions: With its flexibility in decoding representations through linear and non-linear mappings, the autoencoder robustly reduced dimensions and was therefore effective in establishing construct validity with a sample size as small as 100. The autoencoders obtained a smaller mean squared error and a smaller Euclidean distance between the original dataset and the predictions for a small non-normal dataset. Hence, when behavioral scientists explore the construct validity of a newly designed questionnaire, an autoencoder can be considered an alternative to a PCA. © 2022 Lin et al.
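The PCA side of the comparison can be sketched numerically. Below is a minimal, hypothetical illustration (not the authors' code): an 11-item dataset with 100 respondents, mirroring the questionnaire size and the smallest sample condition in the study, is reduced to two components via an SVD-based PCA and then reconstructed, yielding the kind of reconstruction MSE reported in the tables.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for 100 respondents answering an 11-item questionnaire.
X = rng.normal(size=(100, 11))
X = X - X.mean(axis=0)                     # centre each item before PCA

def pca_reconstruct(X, k):
    """Project X onto its top-k principal components and map back."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

X_hat = pca_reconstruct(X, k=2)
mse = np.mean((X - X_hat) ** 2)            # reconstruction MSE
```

Keeping more components can only lower the reconstruction error; with all 11 components the reconstruction is exact up to floating-point noise.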
Keywords: Autoencoder; Construct validity; Dimension reduction; Factor analysis; Principal component analysis; Sample size
Year: 2022 PMID: 35494838 PMCID: PMC9044230 DOI: 10.7717/peerj-cs.782
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
Figure 1 Flow chart for the Monte Carlo simulations.
Figure 2 (A–B) Autoencoder architecture.
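The tied encoder in the study shares weights between the encoder and decoder. As an illustration only (this is our own numpy sketch, not the authors' implementation, which may use non-linear activations and a deep-learning framework), a linear tied-weight autoencoder can be trained by plain gradient descent on the reconstruction loss; with linear activations its optimum spans the same subspace as PCA:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 100, 11, 2                       # respondents, items, bottleneck size

X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)

W = 0.1 * rng.normal(size=(p, k))          # tied weights: the decoder is W.T
lr = 0.1

losses = []
for _ in range(300):
    H = X @ W                              # encode into the bottleneck
    X_hat = H @ W.T                        # decode with the transposed weights
    E = X_hat - X
    losses.append(np.mean(E ** 2))
    # Gradient of mean((X @ W @ W.T - X)**2) with respect to W.
    grad = (2.0 / (n * p)) * (X.T @ E @ W + E.T @ X @ W)
    W -= lr * grad
```

Tying the decoder to the transposed encoder halves the parameter count, which is one reason such architectures behave well on small samples.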
Items on the curiosity questionnaire.
| Item | Description |
|---|---|
| 1 | I enjoy collecting and calculating statistics for my favorite basketball team. |
| 2 | I often imagine how my favorite basketball team is playing to defeat their opponent. |
| 3 | I enjoy exploring my favorite basketball stadiums or facilities. |
| 4 | Watching basketball games with my friends is joyful. |
| 5 | I enjoy reading articles about basketball players, teams, events, and games. |
| 6 | I am interested in learning how much it costs to build a brand-new basketball stadium. |
| 7 | When I miss a game, I often look for information on television, the internet, or newspaper to catch the game results. |
| 8 | I am interested in learning how large a basketball court is. |
| 9 | I enjoy probing deeply into basketball. |
| 10 | I am eager to learn more about basketball. |
| 11 | I enjoy any movement that occurs during a basketball game. |
Parameters for generating non-normal data.
| Scenario no. | Mean | Standard deviation | Skewness | Kurtosis |
|---|---|---|---|---|
| 1 | 0 | 1 | 0 | 0 |
| 2 | 0 | 1 | 1 | 3 |
| 3 | 0 | 1 | 2 | 20 |
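The table's skewness and kurtosis targets can be reached with Fleishman's power method, a common generator for simulating non-normal data (the record does not state the authors' exact generator, so this choice is an assumption). A standard normal Z is transformed into Y = a + bZ + cZ² + dZ³ with a = −c, where b, c, d solve Fleishman's moment equations for the target skewness and excess kurtosis:

```python
import numpy as np
from scipy.optimize import fsolve

def fleishman_coeffs(skew, exkurt):
    """Solve Fleishman's moment equations for b, c, d (with a = -c)."""
    def equations(params):
        b, c, d = params
        return [
            b**2 + 6*b*d + 2*c**2 + 15*d**2 - 1.0,                 # unit variance
            2*c*(b**2 + 24*b*d + 105*d**2 + 2) - skew,             # skewness
            24*(b*d + c**2*(1 + b**2 + 28*b*d)
                + d**2*(12 + 48*b*d + 141*c**2 + 225*d**2)) - exkurt,  # excess kurtosis
        ]
    return fsolve(equations, x0=[1.0, 0.1, 0.05])

# Scenario 2 from the table: skewness 1, (excess) kurtosis 3.
b, c, d = fleishman_coeffs(1.0, 3.0)
rng = np.random.default_rng(2)
z = rng.normal(size=100_000)
y = -c + b*z + c*z**2 + d*z**3             # non-normal scores, mean 0, variance 1
```

Scenario 1 (skewness 0, kurtosis 0) reduces to b = 1, c = d = 0, i.e., the untransformed normal.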
Performance metrics for three communality conditions.
| Metric | Algorithm | High (Mean) | High (SD) | Wide (Mean) | Wide (SD) | Low (Mean) | Low (SD) |
|---|---|---|---|---|---|---|---|
| MSE | Simple encoder | 0.343 | 0.175 | 0.382 | 0.203 | 0.458 | 0.146 |
| MSE | Tied encoder | 0.034 | 0.009 | 0.034 | 0.010 | 0.034 | 0.009 |
| MSE | PCA | 0.230 | 0.160 | 0.254 | 0.196 | 0.337 | 0.148 |
| MSE | Deep autoencoder | 0.032 | 0.019 | 0.036 | 0.022 | 0.044 | 0.017 |
| MSE | Independent autoencoder | 0.036 | 0.021 | 0.040 | 0.023 | 0.048 | 0.018 |
| MAE | Simple encoder | 0.444 | 0.112 | 0.460 | 0.128 | 0.523 | 0.086 |
| MAE | Tied encoder | 0.126 | 0.018 | 0.126 | 0.018 | 0.127 | 0.017 |
| MAE | PCA | 0.357 | 0.116 | 0.356 | 0.144 | 0.443 | 0.101 |
| MAE | Deep autoencoder | 0.107 | 0.030 | 0.110 | 0.034 | 0.128 | 0.024 |
| MAE | Independent autoencoder | 0.113 | 0.031 | 0.117 | 0.035 | 0.133 | 0.025 |
| NED | Simple encoder | 2.288 | 0.166 | 2.277 | 0.179 | 2.226 | 0.144 |
| NED | Tied encoder | 1.053 | 0.076 | 1.055 | 0.074 | 1.060 | 0.073 |
| NED | PCA | 4.730 | 0.303 | 4.745 | 0.341 | 4.634 | 0.291 |
| NED | Deep autoencoder | 1.146 | 0.086 | 1.144 | 0.094 | 1.114 | 0.079 |
| NED | Independent autoencoder | 1.138 | 0.099 | 1.132 | 0.104 | 1.111 | 0.085 |
Notes. MSE, mean squared error; MAE, mean absolute error; NED, normalized Euclidean distance; PCA, principal component analysis; SD, standard deviation.
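The three metrics in the tables compare the original item scores with the model's reconstructions. A generic numpy sketch follows; the paper's exact normalization for NED is not shown in this record, so the per-respondent (per-row) Euclidean distance used here is an assumption:

```python
import numpy as np

def reconstruction_metrics(X, X_hat):
    """MSE, MAE, and mean per-respondent Euclidean distance."""
    diff = X - X_hat
    mse = np.mean(diff ** 2)
    mae = np.mean(np.abs(diff))
    ed = np.mean(np.linalg.norm(diff, axis=1))   # one distance per row
    return mse, mae, ed

# Tiny worked example: one perfect row, one row off by 2 on one item.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
X_hat = np.array([[1.0, 2.0], [3.0, 6.0]])
mse, mae, ed = reconstruction_metrics(X, X_hat)  # -> 1.0, 0.5, 1.0
```

MSE penalizes large per-item errors quadratically, MAE weights all errors linearly, and the Euclidean distance aggregates each respondent's errors before averaging, which is why the three metrics can rank algorithms differently.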
Performance metrics for three normality conditions.
| Metric | Algorithm | Normal (Mean) | Normal (SD) | Slightly non-normal (Mean) | Slightly non-normal (SD) | Non-normal (Mean) | Non-normal (SD) |
|---|---|---|---|---|---|---|---|
| MSE | Simple encoder | 0.397 | 0.182 | 0.394 | 0.181 | 0.392 | 0.185 |
| MSE | Tied encoder | 0.034 | 0.009 | 0.034 | 0.009 | 0.034 | 0.010 |
| MSE | PCA | 0.277 | 0.174 | 0.275 | 0.175 | 0.269 | 0.177 |
| MSE | Deep autoencoder | 0.038 | 0.020 | 0.038 | 0.020 | 0.037 | 0.019 |
| MSE | Independent autoencoder | 0.042 | 0.021 | 0.041 | 0.021 | 0.041 | 0.021 |
| MAE | Simple encoder | 0.484 | 0.117 | 0.473 | 0.114 | 0.471 | 0.115 |
| MAE | Tied encoder | 0.128 | 0.018 | 0.126 | 0.018 | 0.125 | 0.017 |
| MAE | PCA | 0.393 | 0.130 | 0.382 | 0.128 | 0.380 | 0.127 |
| MAE | Deep autoencoder | 0.117 | 0.032 | 0.114 | 0.031 | 0.114 | 0.030 |
| MAE | Independent autoencoder | 0.123 | 0.033 | 0.121 | 0.032 | 0.120 | 0.031 |
| NED | Simple encoder | 2.305 | 0.158 | 2.263 | 0.164 | 2.223 | 0.166 |
| NED | Tied encoder | 1.075 | 0.073 | 1.053 | 0.074 | 1.041 | 0.072 |
| NED | PCA | 4.787 | 0.299 | 4.699 | 0.314 | 4.622 | 0.314 |
| NED | Deep autoencoder | 1.155 | 0.086 | 1.134 | 0.087 | 1.116 | 0.087 |
| NED | Independent autoencoder | 1.147 | 0.095 | 1.126 | 0.096 | 1.108 | 0.096 |
Notes. MSE, mean squared error; MAE, mean absolute error; NED, normalized Euclidean distance; PCA, principal component analysis; SD, standard deviation.
Figure 3 Interaction of communality and non-normality condition on the mean squared error (MSE) for the simple autoencoder.
Figure 4 Reconstruction error for the autoencoder family under different sample sizes.
Figure 5 Mean squared errors (MSEs) for different combinations of communality and normality.
Figure 6 Mean absolute errors (MAEs) for the principal component analysis (PCA) and the deep autoencoder.
Figure 7 Mean squared errors (MSEs) of the autoencoder and principal component analysis (PCA) under various extracted components.
Figure 8 t-SNE visualizations for the PCA and the autoencoder.
Absolute values of bottleneck weights on the Curiosity dataset.
(a) Bottleneck weight estimates from the population.

| Node 1 | Node 2 | Item |
|---|---|---|
| 0.029 | 0.237 | I enjoy collecting and calculating statistics of my favorite basketball team. |
| 0.178 | 0.293 | I enjoy reading articles about basketball players, teams, events, and games. |
| 0.158 | 0.261 | I am eager to learn more about basketball. |
| 0.003 | 0.227 | I enjoy any movement that occurs during a basketball game. |
| 0.433 | 0.415 | Watching basketball games with my friends is joyful. |
| 0.575 | 0.114 | I enjoy probing deeply into basketball. |
| 0.662 | 0.039 | I often imagine how my favorite basketball team is playing to defeat their opponent. |
| 0.519 | 0.469 | I enjoy exploring my favorite basketball stadiums or facilities. |
| 0.385 | 0.023 | I am interested in learning how much it costs to build a brand-new basketball stadium. |
| 0.19 | 0.581 | I am interested in learning how large a basketball court is. |
| 0.127 | 0.231 | When I miss a game, I often look for information on television, the internet, or newspaper to catch the game results. |