Catherine Hanson, Leyla Roskan Caglar, Stephen José Hanson
Abstract
Category learning performance is influenced by both the nature of the category's structure and the way category features are processed during learning. Shepard (1964, 1987) showed that stimuli can have structures with features that are statistically uncorrelated (separable) or statistically correlated (integral) within categories. Humans find it much easier to learn categories having separable features, especially when attention to only a subset of relevant features is required, and harder to learn categories having integral features, which require consideration and integration of all the relevant category features satisfying the category rule (Garner, 1974). In contrast to humans, a single hidden layer backpropagation (BP) neural network has been shown to learn both separable and integral categories equally easily, independent of the category rule (Kruschke, 1993). This "failure" to replicate human category performance appeared to be strong evidence that connectionist networks were incapable of modeling human attentional bias. We tested the presumed limitations of attentional bias in networks in two ways: (1) by having networks learn categories with exemplars of high feature complexity, in contrast to the low dimensional stimuli previously used, and (2) by investigating whether a Deep Learning (DL) network, which has demonstrated human-like performance in many different kinds of tasks (language translation, autonomous driving, etc.), would display human-like attentional bias during category learning. We found a number of interesting results. First, we replicated the failure of BP to differentially process integral and separable category structures when low dimensional stimuli are used (Garner, 1974; Kruschke, 1993). Second, we show that with the same low dimensional stimuli, DL, unlike BP but similar to humans, learns separable category structures more quickly than integral category structures. Third, we show that even BP can exhibit human-like learning differences between integral and separable category structures when high dimensional stimuli (face exemplars) are used. We conclude, after visualizing the hidden unit representations, that DL appears to extend initial learning due to feature development, thereby reducing destructive feature competition by incrementally refining feature detectors throughout later layers until a tipping point (in terms of error) is reached, resulting in rapid asymptotic learning.
Keywords: attentional bias; categorization; condensation; deep learning; filtration; learning theory; neural networks
Year: 2018 PMID: 29706907 PMCID: PMC5909172 DOI: 10.3389/fpsyg.2018.00374
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Figure 1. Renditions of the original stimuli from Kruschke (1993). (A) A sample of 4 stimuli in the low dimensional categorization. (B) The filtration rule applied to the 8 stimuli as indicated in the 2-D feature space. (C) The condensation rule applied to the 8 stimuli as indicated in the 2-D feature space.
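The filtration/condensation contrast in Figure 1 can be made concrete with a small sketch: a separable (filtration) rule classifies on one stimulus dimension alone, while an integral (condensation) rule requires combining both. The grid coordinates and decision boundaries below are illustrative assumptions, not Kruschke's (1993) exact stimulus values or rules.

```python
import numpy as np

# A minimal sketch of the two category rules over a generic 2-D feature grid
# of 8 stimuli. Coordinates and thresholds are illustrative only.
stimuli = np.array([(x, y) for x in range(4) for y in (0, 1)])  # 8 points

def filtration_label(s):
    # Separable rule: only dimension 0 matters (boundary at its midpoint).
    return int(s[0] >= 2)

def condensation_label(s):
    # Integral rule: a diagonal boundary that needs both dimensions.
    return int(s[0] + 2 * s[1] >= 3)

for s in stimuli:
    print(tuple(s), filtration_label(s), condensation_label(s))
```

Note that no single dimension predicts the condensation labels, which is what forces integration of both features under that rule.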
Figure 2. The final high dimensional (A) filtration separable (FS) and (B) condensation integral (CI) sets with the categorization lines indicated in red.
Figure 3. Human behavioral results for the 2-D binary Kruschke (1993) stimuli in the FS and CI conditions. Data fit with Equation 2, the negative exponential (red FS, green CI). Data are binned (every 4 trials) and plotted at the midpoints of each bin.
Figure 4. Human behavioral results for the FS and CI conditions using the high dimensional naturalistic stimuli. Best-fitting hyperbolic exponential functions overlaid (red FS, green CI). Data are binned (every 4 trials) and plotted at the midpoints of each bin.
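Both fitted families are characterized by a scale and a shape parameter (see the table below). Since this record does not reproduce Equation 2, the forms that follow are offered only as a plausible reconstruction consistent with those two parameters, with E_0 the initial error level, σ the scale, and ρ the shape:

```latex
% Assumed scale (\sigma) / shape (\rho) parameterizations; not reproduced
% from the paper. E_0 is the initial error level.
\begin{align}
  E_{\text{neg-exp}}(t) &= E_0 \, e^{-(\sigma t)^{\rho}}, \\
  E_{\text{hyp}}(t)     &= \frac{E_0}{1 + (\sigma t)^{\rho}}.
\end{align}
```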
Figure 5. Modeling of the FS (green) and CI (red) conditions with the low dimensional Kruschke (1993) stimuli for BP (A) and DL (B) over 10 repetitions.
Figure 6. Modeling of the FS and CI conditions with the high dimensional naturalistic stimuli for BP (A) and DL (B), averaged over 100 runs. Best-fitting functions were negative exponentials for BP and hyperbolic exponentials for DL (overlaid in red). These are binned averages over blocks of 100; there were a total of 250 weight updates.
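A minimal sketch of how such fits, and the R² and LL scores reported in the table below, could be computed with SciPy's curve_fit, assuming the hedged functional forms above; the toy learning curve here is generated data, not the paper's:

```python
import numpy as np
from scipy.optimize import curve_fit

# Fit scale/shape learning-curve families to binned error and score them
# with R^2 and a Gaussian log-likelihood (one plausible scoring choice).

def neg_exp(t, e0, scale, shape):
    return e0 * np.exp(-(scale * t) ** shape)

def hyperbolic(t, e0, scale, shape):
    return e0 / (1.0 + (scale * t) ** shape)

def fit_and_score(t, err, model):
    params, _ = curve_fit(model, t, err, p0=(err[0], 0.05, 1.0),
                          bounds=(0.0, np.inf))
    resid = err - model(t, *params)
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((err - err.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    sigma2 = ss_res / len(t)  # MLE noise variance
    ll = -0.5 * len(t) * (np.log(2 * np.pi * sigma2) + 1.0)
    return params, r2, ll

# Toy binned learning curve (synthetic, for illustration only).
t = np.arange(1.0, 26.0)
rng = np.random.default_rng(0)
err = 0.5 * np.exp(-0.15 * t) + 0.02 * rng.standard_normal(t.size)
for model in (neg_exp, hyperbolic):
    print(model.__name__, fit_and_score(t, err, model))
```

Comparing the two families by R² or LL on the same binned data is one way fit tables like the one below can be assembled.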
Model fits to subject and simulation learning.

| R² | 90 | 84 | 98 | 99.2 | 99.7 | | | |
| LL | 89.5 | 72.7 | 120 | 67.0 | 88 | | | |
| scale | 0.02 | 0.01/0.24 | 0.05 | 0.005 | 0.04 | 0.02 | 0.06 | 0.04 |
| shape | 0.46 | 0.61 | 0.48 | 0.99 | 0.42 | 0.50 | 0.44 | |

| R² | 92 | 89 | 98 | 98.7 | 92.0 | 99.3 | | |
| LL | 89.0 | 83.9 | 121 | 69.3 | 65.6 | 85 | | |
| scale | 0.02 | 0.03 | 0.03 | 0.01 | 0.04 | 0.02 | 0.08 | 0.04 |
| shape | 1.04 | 1.35 | 1.08 | 1.16 | 1.00 | 1.09 | 1.00 | 1.05 |

| R² | 86 | 87 | 89 | 94 | 97 | 97 | 98 | 94 |
| LL | 74.5 | 76.5 | 82.4 | 96.1 | 58.8 | 50.3 | 67.6 | 39.8 |
| scale | 0.01 | 0.005 | 0.007 | 0.008 | 0.02 | 0.006 | 0.05 | 0.01 |
| shape | 0.62 | 0.42 | 0.84 | 0.29 | 0.52 | 0.58 | 0.49 | 0.68 |

Significance thresholds: p < 0.1 and p < 0.05. Bold entries indicate a statistically significant value above all others in that column.
Figure 7. Visualizations of the internal representations of a representative sample of hidden units in the BP (A) and DL (B) networks learning categories in the FS condition.
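A minimal sketch of one way to produce Figure 7-style hidden-unit visualizations, assuming first-layer weight vectors over pixel inputs; the `weights` array and the 32x32 input size are hypothetical stand-ins for the paper's trained networks:

```python
import numpy as np
import matplotlib.pyplot as plt

# Reshape each first-layer weight vector to the input image dimensions and
# plot it as a grayscale image, one panel per hidden unit.
def show_hidden_units(weights, img_shape=(32, 32), n_show=8):
    fig, axes = plt.subplots(1, n_show, figsize=(2 * n_show, 2))
    for ax, w in zip(axes, weights[:n_show]):
        ax.imshow(w.reshape(img_shape), cmap="gray")
        ax.axis("off")
    plt.show()

# Random weights stand in for a trained network's (n_hidden x n_pixels) layer.
show_hidden_units(np.random.default_rng(0).standard_normal((8, 32 * 32)))
```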