Cathy J Price, Joseph T Devlin.
Abstract
The ventral occipitotemporal cortex (vOT) is involved in the perception of visually presented objects and written words. The Interactive Account of vOT function is based on the premise that perception involves the synthesis of bottom-up sensory input with top-down predictions that are generated automatically from prior experience. We propose that vOT integrates visuospatial features abstracted from sensory inputs with higher level associations such as speech sounds, actions and meanings. In this context, specialization for orthography emerges from regional interactions without assuming that vOT is selectively tuned to orthographic features. We discuss how the Interactive Account explains left vOT responses during normal reading and developmental dyslexia; and how it accounts for the behavioural consequences of left vOT damage.
Year: 2011 PMID: 21549634 PMCID: PMC3223525 DOI: 10.1016/j.tics.2011.04.001
Source DB: PubMed Journal: Trends Cogn Sci ISSN: 1364-6613 Impact factor: 20.229
Figure 1. Visual word recognition in the ventral occipitotemporal cortex (vOT). (a) The anatomy of vOT and its relation to activation for visual word recognition (red-yellow) shown on the ventral surface of an inflated left hemisphere. vOT is centred on the occipitotemporal sulcus (broken white line) at the transition from the occipital (blue) to the temporal lobe (green). (b) Examples of simple shape stimuli that are important for recognizing both visual words and objects. Neurons within V2 respond to these types of simple shapes and project to V4, where the cells have more complex receptive fields that respond to combinations of these shapes within a retinotopic reference frame. These in turn project to vOT neurons that have receptive fields with multidimensional tuning functions, where simple shape elements are combined nonlinearly in an object-centred reference frame. Thus, unlike earlier visual areas, it is difficult – if not impossible – to find the optimal stimulus driving a cell using a simple line drawing. Adapted with permission from [51]. (c) A hypothetical example of a complex, object-centred receptive field for a vOT neuron. On the left are three ‘J’s of different sizes in different retinal positions. Within early retinotopic areas, each J would be encoded by non-overlapping sets of neurons. By contrast, the receptive field illustrated on the right by a three-by-three grid of panels provides a more compact, stable object-centred representation. Here, curvature and orientation are plotted recursively within each receptive field region such that it will respond strongly to any combination of a vertical straight line at the top right and a concave-up curved horizontal line at the bottom. Although it is tempting to call this a ‘J-detector’, this would be incorrect – the receptive field responds equally well to the handle of an umbrella or trunk of an elephant but does not respond to the letter j written in script. Reproduced with permission from [52].
cs, collateral sulcus; mt, visual motion area; ots, occipitotemporal sulcus; pITG, posterior inferior temporal gyrus; sts, superior temporal sulcus; V1, central field of primary visual cortex; V2, secondary visual cortex; V4v, ventral component of visual area 4.
Figure 2. Activation in ventral occipitotemporal cortex (vOT) according to the predictive coding framework. The schematic in (a), adapted from [22], outlines the hierarchical architecture that underlies neuronal responses involved in the perception of visual inputs according to the predictive coding framework [22]. It shows the putative (pyramidal) cells that send forward driving connections (red) from the supragranular cortical layer; and nonlinear (modulatory) backward connections (black) from the infragranular layer. The backward connections predict the response to the forward connections. Predictions are optimized to minimize prediction error at each level in the hierarchy. Prediction error is the difference between the top-down prediction and the representations being predicted at each level. Prediction errors change the predictions through recurrent neuronal message passing until the error is minimized. Recurrent connectivity between different levels of the hierarchy is optimized by experience and therefore depends on learning (as illustrated by the broken lines between vOT and higher levels). In functional magnetic resonance imaging, activation is a measure of combined neuronal firing from the stimulus, predictions and their prediction error. (b) Inverted-U shape of activation levels in vOT across three stages of learning. Before learning (stage 1), activation from top-down predictions is precluded because stimuli cannot elicit them (because the appropriate associations have not been learned). This would be the case, for example, in pre-literates and illiterates viewing orthographic stimuli that have no semantic or phonological associations [53] or in literates viewing an unknown orthography (e.g. English readers viewing Chinese characters or an artificial orthography) [1].
In contrast, vOT activation levels are highest during learning (stage 2), when the stimulus is recognized as potentially meaningful (with semantic or phonological associations) but is not predicted efficiently (high prediction error). An example here would be when subjects view pseudowords (that engage high-level representations) but cannot predict their visual form efficiently [41]. With practice, exposure and experience-dependent learning or expertise (stage 3), prediction error decreases and vOT activation declines. The difference between stages 2 and 3 explains why vOT responses are lower for high- than for low-frequency words [43], for real words than for pseudowords [42], and for words primed by identical words than for words primed by pseudowords [45].
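The logic of Figure 2 can be illustrated with a toy numerical sketch. It is not the authors' model: the function names (settle, vot_activation), the fixed top-down cost of 1.0, the update rate and the accuracy values are all assumptions chosen only to make the qualitative pattern visible. The first function mimics prediction error repeatedly adjusting a prediction; the second sums stimulus drive, prediction and prediction error to give a stand-in for the measured activation.

```python
def settle(stimulus, prediction, rate=0.5, steps=10):
    """Toy recurrent loop: the prediction error repeatedly adjusts the prediction."""
    for _ in range(steps):
        error = stimulus - prediction   # prediction error at this level
        prediction += rate * error      # the error changes the prediction
    return prediction, abs(stimulus - prediction)


def vot_activation(stimulus_drive, has_associations, prediction_accuracy):
    """Toy activation: stimulus drive + top-down prediction + prediction error."""
    if not has_associations:
        # Stage 1: no learned associations, so no top-down term is elicited.
        return stimulus_drive
    prediction = 1.0                            # assumed cost of top-down signalling
    prediction_error = 1.0 - prediction_accuracy  # assumed error measure in [0, 1]
    return stimulus_drive + prediction + prediction_error


# Hypothetical accuracy values for the three stages of learning.
stage1 = vot_activation(1.0, has_associations=False, prediction_accuracy=0.0)
stage2 = vot_activation(1.0, has_associations=True, prediction_accuracy=0.2)
stage3 = vot_activation(1.0, has_associations=True, prediction_accuracy=0.9)
print(stage1, stage2, stage3)
```

Under these assumptions the stage 2 value is the largest and the stage 3 value falls between the other two, which reproduces the inverted-U ordering described in the caption; the absolute numbers carry no meaning.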