In the early visual system, signals travel from the retina through the visual thalamus near the middle of the brain to area V1 at the back of the brain. V1 sends signals to the rest of visual cortex.
Since long before the word neuroscience was coined, the community has devoted substantial resources to studying the visual system, and for good reason. The visual system occupies a huge portion of the brain: about 40% of the cerebral cortex in monkeys. But with roughly 30 cortical processing areas, fed either directly or indirectly by the primary visual area, V1, choosing the correct entry point is a challenge.
With a few well-characterized exceptions, not much is known about the responses of these processing centers. Indeed, beyond conjecturing that the arrangement must somehow be advantageous, we don't really know why visual processing is distributed across so many areas.
COMMITTED TO PREDICTION
This question is the topic of a Society for Neuroscience symposium, which Nicole Rust and I organized.1 The participants are all involved in a similar effort: They design simple quantitative models of visual responses and test these models on neurons in the early visual system. Their research concentrates on different portions of the visual system, from the retina (Jonathan Demb), through the visual thalamus (Valerio Mante), to area V1 (David Tolhurst, Yang Dan, and Bruno Olshausen), and to visual areas V2 and V4 (Jack Gallant).
These scientists share a commitment to prediction. We can say that we know what the early visual system does only if we can predict its response to arbitrary stimuli. These stimuli should include both the simple images used commonly in the laboratory (spots, bars, gratings, and plaids) and the more complex images encountered in nature. Very few existing models have been held to this rigorous test. Efforts to predict responses of neurons in the early visual system to such images have been made for the visual thalamus2 and for area V1.3,4 These have mostly involved the simplest possible model of visual response, one based on pure linear filtering of the images. And as Demb has pointed out, no published efforts to date predict retinal responses to complex natural images.
The models used to explain responses should be as simple as possible. They should capture the system's computation while remaining intuitive and, crucially, have a limited number of parameters, which can be derived directly from the data. But this simplicity has a cost: There is not likely to be a one-to-one mapping between model components and the underlying biophysics and anatomy. While we may be able to predict what the system does, answers will still be lacking as to how it does it.
All the models being presented at our symposium share a common arrangement, an elaboration of the classical concept of receptive field. The receptive field is the window through which a visual neuron observes the scene. Mathematically, it specifies the weights that a neuron applies to each image location. A neuron that responds purely as dictated by its receptive field would operate exactly as one of the linear filters that are well known to engineers: It would compute a weighted sum. This simple and intuitive description has dominated visual neuroscience since the 1960s, allowing the field to form solid ties with germane disciplines such as image processing and visual psychophysics.
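The weighted-sum picture can be sketched in a few lines of Python; the filter weights below are illustrative, not measurements from any real neuron:

```python
import numpy as np

# A toy receptive field: the weights a neuron applies to each
# image location (here, a vertical-edge detector). Illustrative
# numbers, not measurements from a real neuron.
rf = np.array([[-1.0, 0.0, 1.0],
               [-2.0, 0.0, 2.0],
               [-1.0, 0.0, 1.0]])

def linear_response(image_patch, rf):
    """A purely linear neuron's response: a weighted sum of
    image intensities, i.e. the dot product with its RF."""
    return float(np.sum(image_patch * rf))

# A vertical edge (dark left, bright right) drives this RF strongly,
# while a uniform patch evokes no response at all.
edge = np.array([[0.0, 0.5, 1.0]] * 3)
flat = np.full((3, 3), 0.5)
```

Because the weights in each row sum to zero, the uniform patch yields exactly zero response, which is what makes the linear-filter description so tractable for engineers and psychophysicists alike.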
In the last couple of decades, however, a number of nonlinear phenomena have been discovered that the receptive field alone cannot explain. These nonlinearities occur at all stages of the early visual system. New models were developed that build upon the receptive field, endowing it with mechanisms that adjust responses to the prevailing stimulus conditions: making the neuron more responsive if a stimulus is weak, or is preceded by a weak stimulus, or is surrounded by a weak stimulus.
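One simple way to sketch such a gain-adjusting mechanism, in the spirit of divisive normalization, is to divide the linear drive by a measure of the prevailing stimulus strength; the constants here are illustrative, not fitted to any neuron:

```python
def gain_controlled_response(linear_drive, context_strength, sigma=0.5):
    """Toy gain control: divide the linear drive by the prevailing
    stimulus strength (e.g. recent or surrounding contrast).
    The same drive then evokes a larger response when the context
    is weak; sigma sets the gain at rest and avoids division by
    zero. All constants are illustrative."""
    return linear_drive / (sigma + context_strength)

# Identical drive, weaker context -> stronger response.
r_weak = gain_controlled_response(1.0, context_strength=0.1)
r_strong = gain_controlled_response(1.0, context_strength=2.0)
```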
Contemporary models, including those presented by the participants in the symposium, all include a linear receptive field but accompany it with a number of nonlinear mechanisms. At the output of the receptive field it is common to place a "pointwise" nonlinearity, in which output depends on the response of the neuron and not on the responses of its neighbors. At the input of the receptive field it is common to place more complex, "visual processing" nonlinearities, in which output depends on the intensity of the image at a number of spatial locations. This nonlinearity can be simple, such as an adjustment of responsiveness, or it can be more complex, such as the extraction of edges5 or of the amplitude of a Fourier Transform.4
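The basic arrangement (a linear receptive field followed by a pointwise output nonlinearity) can be sketched as a linear-nonlinear cascade; the weights and threshold below are illustrative, not fitted to data:

```python
import numpy as np

# Illustrative filter weights and threshold, not fitted to data.
rf = np.array([[-1.0, 0.0, 1.0],
               [-2.0, 0.0, 2.0],
               [-1.0, 0.0, 1.0]])

def pointwise_nonlinearity(x, threshold):
    """Depends only on this neuron's own filter output, not on its
    neighbors: halfwave rectification, so firing rates stay >= 0."""
    return max(x - threshold, 0.0)

def ln_response(image_patch, rf, threshold=1.0):
    """Linear-nonlinear cascade: a weighted sum over image
    locations (the linear receptive field), followed by a
    pointwise output nonlinearity."""
    linear_drive = float(np.sum(image_patch * rf))
    return pointwise_nonlinearity(linear_drive, threshold)
```

The "visual processing" nonlinearities described above would sit in front of this cascade, transforming the image itself before the weighted sum is taken.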
How well these models do depends on the visual stage, and on one's perspective. Demb and Mante indicate that for the retina and the visual thalamus we know what the ingredients should be to account for a large part of the stimulus-driven responses. Opinions diverge when it comes to area V1. Tolhurst and Dan argue that even the simplest versions of the receptive-field model for V1 capture the gist of neuronal responses in this area. Results from the Gallant laboratory provide an estimate of this performance, and indicate that the receptive field alone explains about 35% of the responses,4 a good start but certainly one with room for improvement. Adding the known nonlinearities to the receptive field may yield considerably better performance.
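Figures such as the 35% quoted above are typically computed as the fraction of response variance the model explains. A minimal sketch of that metric, with made-up numbers rather than recorded responses:

```python
import numpy as np

def variance_explained(measured, predicted):
    """Fraction of the variance of the measured responses that the
    model's predictions account for: 1 means a perfect prediction,
    0 means no better than always predicting the mean response."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 1.0 - np.var(measured - predicted) / np.var(measured)
```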
The natural images at bottom were taken by a camera on the head of a cat roaming the forests of Switzerland (C. Kayser et al.).
Olshausen, however, argues that we know very little about what V1 does.6 He points to a number of limitations in the current approach to the study of area V1: one is strongly biased neuron sampling, which ignores quieter neurons; another is the ecological gap between simple laboratory stimuli and the more complex images encountered in nature. Natural images contain much information about spatial structure, such as shading and cast shadows, which might be crucial in determining the responses of visual neurons even in an early area such as V1.
Indeed, the best use of complex stimuli such as natural images remains a point of contention among symposium participants. Mante and Tolhurst use them only to test a model that has been constrained with simpler stimuli. An alternative approach, proposed, for example, by Gallant and by Olshausen, is to use them also to discover the appropriate type of model and to constrain it. The first approach posits that appropriate models of neural function are so nonlinear that it would be hopeless to try to fit them to responses to complicated stimuli; it would thus be better to constrain the model with simpler stimuli such as spots and gratings. On the other hand, neurons in visual cortex, and particularly in areas beyond V1, are likely to have scene-analysis specializations that go well beyond the extraction of edges and similar low-level image processing. It could be pointless to try to characterize these neurons using simple stimuli. Simple stimuli might instead become useful after a wide exploration has been made with complex, natural stimuli and the general outlines of the mechanisms underlying the responses have been elucidated.
Clearly, much work lies ahead before we can say that we understand what the early visual system does. The goal of the symposium is to discuss and, if possible, overcome the differences of opinion. The way forward lies in establishing a shared method of analysis across the different visual stages. These efforts will bring visual neuroscience closer to established quantitative fields such as physics, in which there is wide agreement as to what constitutes a "standard theory" and which results should be a source of surprise. Thanks to the participants in the symposium, the coming years will certainly bring great improvements in our understanding of the early visual system.
Matteo Carandini is a scientist at the Smith-Kettlewell Eye Research Institute in San Francisco. He works to decipher what the primary visual cortex and the visual thalamus contribute to early visual processing.