| Literature DB >> 34093154 |
Shan Xu1, Yiyuan Zhang1, Zonglei Zhen1, Jia Liu2.
Abstract
Can we recognize faces with no prior experience of faces? This question is critical because it examines the role of experience in the formation of domain-specific modules in the brain. Investigations with humans and non-human animals cannot easily dissociate the effect of visual experience from that of hardwired domain-specificity. Therefore, the present study modeled selective deprivation of face experience with a representative deep convolutional neural network, AlexNet, by removing all images containing faces from its training stimuli. The deprived model (d-AlexNet) showed no significant deficits in face categorization or discrimination, and face-selective modules emerged automatically. However, the deprivation reduced the domain-specificity of the face module. In sum, our study provides empirical evidence on the role of nature vs. nurture in developing domain-specific modules: domain-specificity may evolve from non-specific experience without genetic predisposition and is further fine-tuned by domain-specific experience.
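The deprivation described in the abstract amounts to filtering every face-containing image out of the training set before training. A minimal sketch of that screening step is below; the `contains_face` predicate and the toy label sets are hypothetical stand-ins for the study's actual face-screening pipeline:

```python
def deprive_faces(dataset, contains_face):
    """Return the dataset with every image flagged as containing a face removed.

    dataset: iterable of (image_id, image_or_labels) pairs
    contains_face: predicate; in practice a face detector, here a label check
    """
    return [(img_id, img) for img_id, img in dataset if not contains_face(img)]


# Toy example: each "image" is the set of object categories it contains.
toy_dataset = [
    ("img1", {"person", "face"}),
    ("img2", {"dog"}),
    ("img3", {"car", "face"}),
]
face_free = deprive_faces(toy_dataset, lambda labels: "face" in labels)
```

Training an AlexNet on `face_free` instead of `toy_dataset` (at ImageNet scale) is the d-AlexNet manipulation the paper describes.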
Keywords: deep convolutional neural network; experience; face domain; face perception; visual deprivation
Year: 2021 PMID: 34093154 PMCID: PMC8173218 DOI: 10.3389/fncom.2021.626259
Source DB: PubMed Journal: Front Comput Neurosci ISSN: 1662-5188 Impact factor: 2.380
Figure 1(A) An illustration of the screening that removed images containing faces for the d-AlexNet. The "faces" shown in the figure were AI-generated for illustration purposes only and therefore bear no relation to real persons; in the experiment, face images came from ImageNet and showed real persons' faces. (B) The classification performance across categories of the two DCNNs was comparable. (C) Both DCNNs achieved high accuracy in categorizing faces versus other images. (D) Both DCNNs' performance in discriminating faces was above chance level, and the d-AlexNet's accuracy was significantly higher than the AlexNet's. The error bars in (B) denote the standard error of the mean across the 205 categories in the Classification dataset; the error bars in (D) denote the standard error of the mean across the 133 identities in the Discrimination dataset. The asterisk denotes statistical significance (α = 0.05); n.s. denotes no significance.
Figure 2(A) The category-wise activation profiles of example face-selective channels of the AlexNet (left) and the d-AlexNet (right). The "faces" shown here were AI-generated for illustration purposes only. (B) The R² maps of the regression with the activation of the d-AlexNet's (right) or the AlexNet's (left) face-selective channels as the independent variables. The higher the R² in the multiple regression, the better the correspondence between the face channels in the DCNNs and the face-selective regions in the human brain. The crimson lines delineate the ROIs of the OFA and the FFA. (C) The face channels of both DCNNs corresponded better with the FFA than with the OFA, and the difference between the AlexNet and the d-AlexNet was larger in the FFA. (D) Face inversion effect. The average activation amplitude of the top two face-selective channels differed in response to upright versus inverted faces in the AlexNet but not in the d-AlexNet. The error bar denotes standard error. The asterisk denotes statistical significance (α = 0.05); n.s. denotes no significance.
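Identifying face-selective channels, as in the activation profiles of panel (A), requires some selectivity criterion comparing each channel's responses to faces versus non-face images. The d-prime-style index below is a common choice in the DCNN literature and is a hypothetical sketch, not necessarily the paper's exact criterion:

```python
import numpy as np


def face_selectivity(face_acts, nonface_acts):
    """d-prime-style selectivity of one channel: how far its mean response
    to face images sits above its mean response to non-face images, in
    pooled-standard-deviation units. Positive values mean face preference."""
    mu_f, mu_n = face_acts.mean(), nonface_acts.mean()
    pooled_sd = np.sqrt((face_acts.var() + nonface_acts.var()) / 2)
    return (mu_f - mu_n) / pooled_sd


# A channel responding much more strongly to faces gets a large positive index.
face = np.array([3.0, 3.2, 2.8, 3.1])      # responses to face images (toy values)
nonface = np.array([1.0, 1.1, 0.9, 1.0])   # responses to non-face images
```

Thresholding this index over all channels in a layer would yield the set of "face-selective channels" whose profiles and brain correspondence the figure examines.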
Figure 3(A) The within-category similarity in the face category and in an unseen non-face category (bowling pins) in the DCNNs. (B) The between-category similarity between faces and bowling pins. (C) The activation maps of a typical face-selective channel of each DCNN in response to natural images containing faces. Each pixel denotes activation in one unit; the color denotes the activation amplitude (a.u.). (D) The extent of activation of the face-selective channels of each DCNN in response to natural images containing faces. (E) The empirical receptive fields of the top two face-selective channels of each DCNN. The color denotes the average activation amplitude (a.u.; see section Sparse Coding and Empirical Receptive Field). The error bar denotes standard error. The asterisk denotes statistical significance (α = 0.05). The real faces used in this figure are adapted from the FITW dataset.
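The within- and between-category similarities of panels (A) and (B) are typically computed as average pairwise correlations between activation patterns. A minimal sketch of that computation, under the assumption that each exemplar's activation is a flat vector (the exact similarity measure used in the paper may differ):

```python
import numpy as np


def mean_pairwise_corr(patterns_a, patterns_b=None):
    """Average Pearson correlation between activation patterns.

    One set given: within-category similarity over all distinct pairs.
    Two sets given: between-category similarity over all cross pairs."""
    if patterns_b is None:
        corrs = [np.corrcoef(patterns_a[i], patterns_a[j])[0, 1]
                 for i in range(len(patterns_a))
                 for j in range(i + 1, len(patterns_a))]
    else:
        corrs = [np.corrcoef(a, b)[0, 1]
                 for a in patterns_a for b in patterns_b]
    return float(np.mean(corrs))


# Toy activation vectors: two face exemplars, two bowling-pin exemplars.
faces = [np.array([1.0, 2.0, 3.0, 4.0]), np.array([1.0, 2.0, 3.0, 5.0])]
pins = [np.array([4.0, 3.0, 2.0, 1.0]), np.array([5.0, 3.0, 2.0, 1.0])]
```

With these toy vectors, the within-face similarity exceeds the face/pin between-category similarity, which is the qualitative pattern the figure quantifies.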