Hanlin Tang, Martin Schrimpf, William Lotter, Charlotte Moerman, Ana Paredes, Josue Ortega Caro, Walter Hardesty, David Cox, Gabriel Kreiman.
Abstract
Making inferences from partial information constitutes a critical aspect of cognition. During visual perception, pattern completion enables recognition of poorly visible or occluded objects. We combined psychophysics, physiology, and computational models to test the hypothesis that pattern completion is implemented by recurrent computations and present three pieces of evidence that are consistent with this hypothesis. First, subjects robustly recognized objects even when they were rendered <15% visible, but recognition was largely impaired when processing was interrupted by backward masking. Second, invasive physiological responses along the human ventral cortex exhibited visually selective responses to partially visible objects that were delayed compared with whole objects, suggesting the need for additional computations. These physiological delays were correlated with the effects of backward masking. Third, state-of-the-art feed-forward computational architectures were not robust to partial visibility. However, recognition performance was recovered when the model was augmented with attractor-based recurrent connectivity. The recurrent model was able to predict which images of heavily occluded objects were easier or harder for humans to recognize, could capture the effect of introducing a backward mask on recognition behavior, and was consistent with the physiological delays along the human ventral visual stream. These results provide a strong argument of plausibility for the role of recurrent computations in making visual inferences from partial information.
Keywords: artificial intelligence; computational neuroscience; machine learning; pattern completion; visual object recognition
Year: 2018 PMID: 30104363 PMCID: PMC6126774 DOI: 10.1073/pnas.1719397115
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. Backward masking disrupts recognition of partially visible objects. (A and B) Forced-choice categorization task (n = 21 subjects). After 500 ms of fixation, stimuli were presented for variable exposure times (stimulus onset asynchrony [SOA] from 25 to 150 ms), followed by a gray screen (A) or a noise mask (B) for 500 ms. Stimuli were presented unaltered (Whole; C, Left and D, Left), rendered partially visible (Partial; C, Right), or rendered occluded (D, Right). (E) Experimental variation with novel objects. Behavioral performance is shown as a function of visibility for the unmasked (F) and masked (G) trials. Colors denote different SOAs. Error bars denote SEM. The horizontal dashed line indicates chance level (20%). Bin size = 2.5%. Note the discontinuity in the x axis to report performance at 100% visibility. (H) Average recognition performance as a function of SOA for partial objects (same data replotted from F and G, excluding 100% visibility). Performance was significantly degraded by masking (solid gray line) compared with the unmasked trials (dotted gray line) (P < 0.001, χ² test; df = 4). (I) Performance versus SOA for the occluded stimuli in D (note: chance = 25% here). (J) Performance versus SOA for the novel object stimuli in E.
Fig. 2. Behavioral effect of masking correlated with the neural response latency on an image-by-image basis. (A) Intracranial field potential (IFP) responses from an electrode in the left fusiform gyrus averaged across five categories of whole objects while a subject was performing the task described in Fig. 1 (no masking, 150-ms presentation time). This electrode showed a stronger response to faces (green). The gray rectangle indicates the stimulus presentation time (150 ms). The shaded area indicates SEM (details are provided in ref. 13). (B) IFP responses for one of the whole objects for the electrode in A showing single-trial responses (gray, n = 9) and average response (green). The latency of the peak response is marked on the x axis. (C) Single-trial responses (n = 1) to four partial images of the same object in B. (D) A new stimulus set for psychophysics experiments was constructed from the images in 650 trials from two electrodes in the physiology experiments. A raster of the neural responses for the example electrode in A, one trial per line, from partial image trials selected for psychophysics is shown. These trials elicited strong physiological responses with a wide distribution of response latencies (sorted by the neural latency). The color indicates the voltage (color scale on bottom). (Right, Inset) Zoomed-in view of the responses to the 82 trials in the preferred category. (E) We measured the effect of backward masking at various SOAs for each of the same partial exemplar images used in the physiology experiment (n = 33 subjects) and computed a masking index (MI) for each image. The larger the MI for a given image, the stronger was the effect of masking. (F) Correlation between the effect of backward masking (y axis, MI as defined in E) and the neural response latency (x axis, as defined in B and C). Each dot is a single partial object from the preferred category for electrode 1 (blue) or 2 (gray). Error bars for the MI are based on half-split reliability, and the neural latency values are based on single trials. There was a significant correlation (Pearson r = 0.37; P = 0.004, linear regression, permutation test). cc, correlation coefficient.
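The permutation test named in the Fig. 2F caption can be illustrated with a minimal sketch. This is a generic two-sided permutation test for a Pearson correlation, not the authors' analysis code; the function name `pearson_permutation_test`, the number of permutations, and the synthetic latency and masking values are all assumptions made for illustration.

```python
import numpy as np

def pearson_permutation_test(x, y, n_permutations=10000, seed=0):
    """Return (r, p): Pearson r and a two-sided permutation p-value."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r_observed = np.corrcoef(x, y)[0, 1]
    # Shuffling y breaks its pairing with x, which gives the null distribution of r.
    null_r = np.empty(n_permutations)
    for i in range(n_permutations):
        null_r[i] = np.corrcoef(x, rng.permutation(y))[0, 1]
    # Add-one correction so the estimated p-value is never exactly zero.
    n_extreme = np.sum(np.abs(null_r) >= abs(r_observed))
    p_value = (n_extreme + 1) / (n_permutations + 1)
    return r_observed, p_value

# Synthetic stand-ins (NOT data from the study): one latency and one
# masking-effect value per hypothetical partial image.
rng = np.random.default_rng(1)
latency_ms = rng.uniform(100, 400, size=60)
masking_effect = 0.001 * latency_ms + rng.normal(0, 0.05, size=60)

r, p = pearson_permutation_test(latency_ms, masking_effect, n_permutations=2000)
print(f"r = {r:.2f}, p = {p:.4f}")
```

Shuffling one variable keeps both marginal distributions intact while destroying the image-by-image pairing, so the test makes no normality assumption about latencies or masking effects.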
Fig. 3. Standard feed-forward models were not robust to occlusion. (A) Performance of feed-forward computational models (colors) compared with humans (black). We used the feature representation of a subset of whole objects to train an SVM classifier and evaluated the model’s performance on the feature representation of partial objects. The objects used to train the classifier did not appear as partial objects in the test set. Human performance is shown here (150-ms SOA) for the same set of images. Error bars denote SEM (fivefold cross-validation). The single-trial neural latency for each image (Fig. 2) was correlated with the distance of each partial object to its whole object category center for AlexNet pool5 (B) and AlexNet fc7 (C). Each dot represents a partial object, with responses recorded from either electrode 1 (blue dots) or electrode 2 (gray dots). The correlation coefficients (cc) and P values from the permutation test are shown for each subplot.
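The "distance of each partial object to its whole object category center" in Fig. 3 B and C can be sketched as follows. This is an illustrative reading only: the caption does not state the distance metric, so Euclidean distance is an assumption, as are the function names, the random stand-in features (not AlexNet activations), and the crude occlusion model that zeroes random feature dimensions.

```python
import numpy as np

def category_centers(whole_features, whole_labels):
    """Mean feature vector of the whole objects in each category."""
    labels = np.unique(whole_labels)  # sorted unique category labels
    centers = np.stack(
        [whole_features[whole_labels == c].mean(axis=0) for c in labels]
    )
    return labels, centers

def distance_to_own_center(partial_features, partial_labels, labels, centers):
    """Euclidean distance (assumed metric) from each partial object to its category center."""
    # labels is sorted, so searchsorted maps each label to its row in centers.
    idx = np.searchsorted(labels, partial_labels)
    return np.linalg.norm(partial_features - centers[idx], axis=1)

# Synthetic stand-in features: 5 categories, 20 whole objects each.
rng = np.random.default_rng(0)
n_categories, n_per_category, n_features = 5, 20, 16
true_centers = rng.normal(0, 3, size=(n_categories, n_features))
whole_labels = np.repeat(np.arange(n_categories), n_per_category)
noise = rng.normal(0, 0.5, size=(whole_labels.size, n_features))
whole_features = true_centers[whole_labels] + noise
labels, centers = category_centers(whole_features, whole_labels)

# Simplification: the same objects are reused as "partial" inputs here,
# whereas the caption states that training objects never appeared as
# partial objects in the test set.
for visibility in (0.9, 0.15):
    mask = rng.random(whole_features.shape) < visibility
    partial_features = whole_features * mask
    d = distance_to_own_center(partial_features, whole_labels, labels, centers)
    print(f"visibility {visibility:.2f}: mean distance {d.mean():.2f}")
```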
Fig. 4. Dynamic RNN showed improved performance over time and was impaired by backward masking. (A) Top-level representation in AlexNet (fc7) receives inputs from fc6, governed by weights W6→7. We added a recurrent loop within the top-level representation (RNN). The weight matrix W governs the temporal evolution of the fc7 representation. (B) Performance of the RNNh (blue) as a function of visibility. The RNNh approached human performance (black curve) and represented a significant improvement over the original fc7 layer (red curve). The red and black curves are copied from Fig. 3 for comparison. Error bars denote SEM. (C) Temporal evolution of the feature representation for RNNh as visualized with stochastic neighborhood embedding. Over time, the representation of partial objects approaches the correct category in the clusters of whole images. (D) Overall performance of the RNNh as a function of recurrent time step compared with humans (top dashed line) and chance (bottom dashed line). Error bars denote SEM (five-way cross-validation). (E) Correlation (Corr.) in the classification of each object between the RNNh and humans. The dashed line indicates the upper bound of human–human similarity obtained by computing how well half of the subject pool correlates with the other half. Regressions were computed separately for each category, followed by averaging the correlation coefficients across categories. Over time, the model becomes more human-like. Error bars denote SD across categories. (F) Effect of backward masking. The same backward mask used in the psychophysics experiments was fed to the RNNh model at different SOA values (x axis). Error bars denote SEM (five-way cross-validation). Performance improved with increasing SOA values.
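The idea of a recurrent weight matrix W driving a top-level representation toward stored attractors can be illustrated with a textbook Hopfield-style network. This is a generic sketch of attractor-based pattern completion, not the RNNh model itself: the Hebbian weight rule, the binary ±1 units, the synchronous update, and the random patterns standing in for whole-object representations are all assumptions chosen for brevity.

```python
import numpy as np

def hebbian_weights(patterns):
    """Hebbian weight matrix for patterns of shape (n_patterns, n_units) with +1/-1 entries."""
    n_units = patterns.shape[1]
    W = patterns.T @ patterns / n_units
    np.fill_diagonal(W, 0.0)  # no self-connections
    return W

def run_recurrence(W, state, n_steps=10):
    """Apply synchronous sign updates; return the state after every step."""
    state = np.asarray(state, dtype=float)
    history = [state.copy()]
    for _ in range(n_steps):
        drive = W @ state
        # Units with exactly zero drive keep their previous value.
        state = np.where(drive > 0, 1.0, np.where(drive < 0, -1.0, state))
        history.append(state.copy())
    return np.array(history)

# Random +1/-1 patterns stand in for stored whole-object representations.
rng = np.random.default_rng(0)
n_units = 200
patterns = rng.choice([-1.0, 1.0], size=(3, n_units))
W = hebbian_weights(patterns)

# "Occlude" the first pattern: only 15% of its units are visible, the rest are 0.
visible = rng.choice(n_units, size=30, replace=False)
cue = np.zeros(n_units)
cue[visible] = patterns[0, visible]

history = run_recurrence(W, cue, n_steps=5)
# Overlap with the stored pattern at each time step (1.0 means fully recovered).
overlaps = history @ patterns[0] / n_units
print(np.round(overlaps, 2))
```

Tracking the overlap at every step mirrors the caption's framing of performance as a function of recurrent time step: the partial cue starts far from the stored pattern and is pulled toward it as the recurrence unfolds.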