Nicolas Heess PhD



Publications:
2014
  Visual Boundary Prediction: A Deep Neural Prediction Network and Quality Dissection
Kivinen, J, Williams, CKI & Heess, N 2014, Visual Boundary Prediction: A Deep Neural Prediction Network and Quality Dissection. in Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics. vol. 33, Journal of Machine Learning Research: Workshop and Conference Proceedings, pp. 512-521.
This paper investigates visual boundary detection, i.e. prediction of the presence of a boundary at a given image location. We develop a novel neurally-inspired deep architecture for the task. Notable aspects of our work are (i) the use of “covariance features” [Ranzato and Hinton, 2010] which depend on the squared response of a filter to the input image, and (ii) the integration of image information from multiple scales and semantic levels via multiple streams of interlinked, layered, and non-linear “deep” processing. Our results on the Berkeley Segmentation Data Set 500 (BSDS500) show comparable or better performance to the top-performing methods [Arbelaez et al., 2011, Ren and Bo, 2012, Lim et al., 2013, Dollár and Zitnick, 2013] with effective inference times. We also propose novel quantitative assessment techniques for improved method understanding and comparison. We carefully dissect the performance of our architecture, the feature types used, and the training methods, providing clear signals for model understanding and development.
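The “covariance feature” idea referenced above can be illustrated with a minimal sketch: a linear feature carries the sign of the filter response, while squaring it makes the feature invariant to contrast polarity. The filter and patch below are random placeholders, not the paper's learned filters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear filter and image patch (both flattened 7x7 arrays).
w = rng.standard_normal(49)
x = rng.standard_normal(49)

linear_feature = w @ x             # sign depends on contrast polarity
covariance_feature = (w @ x) ** 2  # depends only on the squared response

# Squaring discards the sign, so the feature is identical for the patch
# and its contrast-reversed version:
assert np.isclose(covariance_feature, (w @ -x) ** 2)
```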
General Information
Organisations: Institute for Adaptive and Neural Computation.
Authors: Kivinen, Jyri, Williams, Christopher K. I. & Heess, Nicolas.
Number of pages: 10
Pages: 512-521
Publication Date: 2014
Publication Information
Category: Conference contribution
Original Language: English
  The Shape Boltzmann Machine: A Strong Model of Object Shape
Eslami, SMA, Heess, N, Williams, CKI & Winn, J 2014, 'The Shape Boltzmann Machine: A Strong Model of Object Shape' International Journal of Computer Vision, vol 107, no. 2, pp. 155-176. DOI: 10.1007/s11263-013-0669-1

A good model of object shape is essential in applications such as segmentation, detection, inpainting and graphics. For example, when performing segmentation, local constraints on the shapes can help where object boundaries are noisy or unclear, and global constraints can resolve ambiguities where background clutter looks similar to parts of the objects. In general, the stronger the model of shape, the more performance is improved. In this paper, we use a type of deep Boltzmann machine (Salakhutdinov and Hinton, International Conference on Artificial Intelligence and Statistics, 2009) that we call a Shape Boltzmann Machine (SBM) for the task of modeling foreground/background (binary) and parts-based (categorical) shape images. We show that the SBM characterizes a strong model of shape, in that samples from the model look realistic and it can generalize to generate samples that differ from training examples. We find that the SBM learns distributions that are qualitatively and quantitatively better than existing models for this task.
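Sampling from a Boltzmann machine, as described in the abstract, is typically done by block Gibbs sampling. The sketch below uses a plain binary RBM with random weights and toy sizes rather than the SBM's constrained deep architecture, purely to show the mechanics of the sampler.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)

n_visible, n_hidden = 64, 32                       # toy sizes, not the SBM's
W = 0.1 * rng.standard_normal((n_visible, n_hidden))
b, c = np.zeros(n_visible), np.zeros(n_hidden)

v = rng.integers(0, 2, n_visible).astype(float)    # random binary "shape image"

# Block Gibbs sampling: all hidden units are conditionally independent
# given the visibles, and vice versa, so each layer is resampled at once.
for _ in range(100):
    p_h = sigmoid(v @ W + c)
    h = (rng.random(n_hidden) < p_h).astype(float)
    p_v = sigmoid(W @ h + b)
    v = (rng.random(n_visible) < p_v).astype(float)

print(v[:8])  # first 8 pixels of a (toy) sample
```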


General Information
Organisations: Institute for Adaptive and Neural Computation.
Authors: Eslami, S. M. Ali, Heess, Nicolas, Williams, Christopher K. I. & Winn, John.
Keywords: Shape, Generative, Deep Boltzmann machine, Sampling, Belief networks, Random fields, Image, Segmentation, Annotation, Experts, Priors
Number of pages: 22
Pages: 155-176
Publication Date: Apr 2014
Publication Information
Category: Article
Journal: International Journal of Computer Vision
Volume: 107
Issue number: 2
ISSN: 0920-5691
Original Language: English
DOIs: 10.1007/s11263-013-0669-1
2012
  Searching for objects driven by context
Alexe, B, Heess, N, Teh, YW & Ferrari, V 2012, Searching for objects driven by context. in NIPS 2012, pp. 890-898.
The dominant visual search paradigm for object class detection is sliding windows. Although simple and effective, it is also wasteful, unnatural and rigidly hardwired. We propose strategies to search for objects which intelligently explore the space of windows by making sequential observations at locations decided based on previous observations. Our strategies adapt to the class being searched and to the content of a particular test image, exploiting context as the statistical relation between the appearance of a window and its location relative to the object, as observed in the training set. In addition to being more elegant than sliding windows, we demonstrate experimentally on the PASCAL VOC 2010 dataset that our strategies evaluate two orders of magnitude fewer windows while achieving higher object detection performance.
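The search strategy described above can be caricatured in one dimension: each observed window yields a context cue (a noisy estimate of the displacement towards the object), and the next window is placed at the current best guess. Everything here is simulated and hypothetical; in the paper the displacement statistics are learned from the training set.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 1-D "image": candidate window locations 0..99, object hidden at one.
n_locations, true_object = 100, 63

def observe(loc):
    """Hypothetical context cue: a noisy estimate of the displacement
    from the observed window to the object (simulated here)."""
    return (true_object - loc) + rng.normal(0, 3)

# Sequential search: each observation proposes a likely object location,
# and the next window is placed there.
loc = int(rng.integers(n_locations))
visited = [loc]
for _ in range(10):
    d = observe(loc)
    loc = int(np.clip(np.rint(loc + d), 0, n_locations - 1))
    visited.append(loc)

print(visited)  # windows evaluated, converging towards the object
```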
General Information
Organisations: Institute of Perception, Action and Behaviour.
Authors: Alexe, Bogdan, Heess, Nicolas, Teh, Yee Whye & Ferrari, Vittorio.
Keywords: object detection, object recognition
Number of pages: 9
Pages: 890-898
Publication Date: 1 Dec 2012
Publication Information
Category: Conference contribution
Original Language: English
2011
  Multimodal Nonlinear Filtering Using Gauss-Hermite Quadrature
Saal, H, Heess, N & Vijayakumar, S 2011, Multimodal Nonlinear Filtering Using Gauss-Hermite Quadrature. in D Gunopulos, T Hofmann, D Malerba & M Vazirgiannis (eds), Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2011, Athens, Greece, September 5-9, 2011, Proceedings, Part III. Lecture Notes in Computer Science, vol. 6913, Springer Berlin Heidelberg, pp. 81-96. DOI: 10.1007/978-3-642-23808-6_6
In many filtering problems the exact posterior state distribution is not tractable and is therefore approximated using simpler parametric forms, such as single Gaussian distributions. In nonlinear filtering problems the posterior state distribution can, however, take complex shapes and even become multimodal so that single Gaussians are no longer sufficient. A standard solution to this problem is to use a bank of independent filters that individually represent the posterior with a single Gaussian and jointly form a mixture of Gaussians representation. Unfortunately, since the filters are optimized separately and interactions between the components are consequently not taken into account, the resulting representation is typically poor. As an alternative we therefore propose to directly optimize the full approximating mixture distribution by minimizing the KL divergence to the true state posterior. For this purpose we describe a deterministic sampling approach that allows us to perform the intractable minimization approximately and at reasonable computational cost. We find that the proposed method models multimodal posterior distributions noticeably better than banks of independent filters even when the latter are allowed many more mixture components. We demonstrate the importance of accurately representing the posterior with a tractable number of components in an active learning scenario where we report faster convergence, both in terms of number of observations processed and in terms of computation time, and more reliable convergence on up to ten-dimensional problems.
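The deterministic sampling scheme in the title is Gauss-Hermite quadrature, which replaces random samples with fixed nodes and weights for Gaussian expectations. A minimal sketch (using NumPy's `hermgauss`, which targets integrals against exp(-t^2), hence the change of variables):

```python
import numpy as np

# Gauss-Hermite rules approximate  \int e^{-t^2} f(t) dt ≈ sum_i w_i f(t_i).
# Substituting x = mu + sqrt(2)*sigma*t turns this into a deterministic
# sampling rule for Gaussian expectations:
#   E_{N(mu, sigma^2)}[f(x)] ≈ (1/sqrt(pi)) sum_i w_i f(mu + sqrt(2)*sigma*t_i)

def gauss_hermite_expectation(f, mu, sigma, order=10):
    t, w = np.polynomial.hermite.hermgauss(order)
    return (w @ f(mu + np.sqrt(2.0) * sigma * t)) / np.sqrt(np.pi)

# Sanity checks against the known moments of N(1, 2^2). The rule is exact
# for polynomials up to degree 2*order - 1:
print(gauss_hermite_expectation(lambda x: x, mu=1.0, sigma=2.0))     # ≈ 1
print(gauss_hermite_expectation(lambda x: x**2, mu=1.0, sigma=2.0))  # ≈ 5
```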
General Information
Organisations: Institute of Perception, Action and Behaviour.
Authors: Saal, Hannes, Heess, Nicolas & Vijayakumar, Sethu.
Number of pages: 16
Pages: 81-96
Publication Date: 2011
Publication Information
Category: Conference contribution
Original Language: English
DOIs: 10.1007/978-3-642-23808-6_6
  Weakly Supervised Learning of Foreground-Background Segmentation using Masked RBMs
Heess, N, Le Roux, N & Winn, J 2011, Weakly Supervised Learning of Foreground-Background Segmentation using Masked RBMs. in Artificial Neural Networks and Machine Learning – ICANN 2011: 21st International Conference on Artificial Neural Networks, Espoo, Finland, June 14-17, 2011, Proceedings, Part II. Lecture Notes in Computer Science, vol. 6792, Springer Berlin Heidelberg, pp. 9-16, International Conference on Artificial Neural Networks (ICANN), Espoo, Finland, 14-17 June. DOI: 10.1007/978-3-642-21738-8_2
We propose an extension of the Restricted Boltzmann Machine (RBM) that allows the joint shape and appearance of foreground objects in cluttered images to be modeled independently of the background. We present a learning scheme that learns this representation directly from cluttered images with only very weak supervision. The model generates plausible samples and performs foreground-background segmentation. We demonstrate that representing foreground objects independently of the background can be beneficial in recognition tasks.
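The core compositing step behind the masked RBM can be sketched with plain arrays: a binary mask selects, per pixel, whether the foreground or the background explains the image. In the paper the mask, foreground and background are each modelled by RBMs; the toy sizes and square object below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

H, W = 8, 8
foreground = rng.random((H, W))
background = rng.random((H, W))
mask = np.zeros((H, W))
mask[2:6, 2:6] = 1.0  # square foreground object

# Per-pixel compositing: the mask gates which model explains each pixel.
image = mask * foreground + (1.0 - mask) * background

# Inside the mask the image is explained by the foreground only:
assert np.allclose(image[2:6, 2:6], foreground[2:6, 2:6])
```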
General Information
Organisations: Neuroinformatics DTC.
Authors: Heess, Nicolas, Le Roux, Nicolas & Winn, John.
Number of pages: 8
Pages: 9-16
Publication Date: 2011
Publication Information
Category: Conference contribution
Original Language: English
DOIs: 10.1007/978-3-642-21738-8_2
  Learning a Generative Model of Images by Factoring Appearance and Shape
Le Roux, N, Heess, N, Shotton, J & Winn, J 2011, 'Learning a Generative Model of Images by Factoring Appearance and Shape' Neural Computation, vol 23, no. 3, pp. 593-650. DOI: 10.1162/NECO_a_00086
Computer vision has grown tremendously in the last two decades. Despite all efforts, existing attempts at matching parts of the human visual system's extraordinary ability to understand visual scenes lack either scope or power. By combining the advantages of general low-level generative models and powerful layer-based and hierarchical models, this work aims at being a first step towards richer, more flexible models of images. After comparing various types of RBMs able to model continuous-valued data, we introduce our basic model, the masked RBM, which explicitly models occlusion boundaries in image patches by factoring the appearance of any patch region from its shape. We then propose a generative model of larger images using a field of such RBMs. Finally, we discuss how masked RBMs could be stacked to form a deep model able to generate more complicated structures and suitable for various tasks such as segmentation or object recognition.
General Information
Organisations: Neuroinformatics DTC.
Authors: Le Roux, Nicolas, Heess, Nicolas, Shotton, Jamie & Winn, John.
Pages: 593-650
Publication Date: 1 Mar 2011
Publication Information
Category: Article
Journal: Neural Computation
Volume: 23
Issue number: 3
ISSN: 0899-7667
Original Language: English
DOIs: 10.1162/NECO_a_00086
2010
  Direction Opponency, Not Quadrature, Is Key to the 1/4 Cycle Preference for Apparent Motion in the Motion Energy Model
Heess, N & Bair, W 2010, 'Direction Opponency, Not Quadrature, Is Key to the 1/4 Cycle Preference for Apparent Motion in the Motion Energy Model' Journal of Neuroscience, vol 30, no. 34, pp. 11300-11304. DOI: 10.1523/JNEUROSCI.1271-10.2010
Sensitivity to visual motion is a fundamental property of neurons in the visual cortex and has received wide attention in terms of mathematical models. A key feature of many popular models for cortical motion sensors is the use of pairs of functions that are related by a 90° phase shift. This phase relationship, known as quadrature, is the hallmark of the motion energy model and played an important role in the development of a class of models dubbed elaborated Reichardt detectors. For decades, the literature has supported a link between quadrature and the observation that motion detectors and human observers often prefer a 1/4 cycle displacement of an apparent motion stimulus that consists of a pair of sinusoidal gratings. We show that there is essentially no link between quadrature and this preference. Quadrature is neither necessary nor sufficient for a motion sensor to prefer 1/4 cycle displacement, and motion energy is not maximized for a 1/4 cycle step. Other properties of motion sensors are the key: the opponent subtraction of two oppositely tuned stages that individually have sinusoidal displacement tuning curves. Thus, psychophysical and neurophysiological data revealing a preference at or near 1/4 cycle displacement do not offer specific support for common quadrature or energy-based motion models. Instead, they point to a broader class of models.
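The quadrature property under discussion can be sketched numerically: summing the squared responses of two Gabor filters 90° apart in phase yields an "energy" that is (nearly) independent of stimulus phase. The filter parameters below are arbitrary illustrative choices, not fits to the paper's models.

```python
import numpy as np

# A quadrature pair: two Gabor filters 90 degrees apart in phase.
x = np.linspace(-3, 3, 601)
f, sigma = 1.0, 0.8
even = np.exp(-x**2 / (2 * sigma**2)) * np.cos(2 * np.pi * f * x)
odd  = np.exp(-x**2 / (2 * sigma**2)) * np.sin(2 * np.pi * f * x)

def energy(phase):
    stimulus = np.cos(2 * np.pi * f * x + phase)
    return (even @ stimulus) ** 2 + (odd @ stimulus) ** 2

# Summing the squared responses of the pair removes the dependence on
# stimulus phase -- the "energy" is nearly constant across phases:
energies = np.array([energy(p) for p in np.linspace(0, 2 * np.pi, 16)])
print(energies.std() / energies.mean())  # close to 0
```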
General Information
Organisations: Neuroinformatics DTC.
Authors: Heess, Nicolas & Bair, Wyeth.
Pages: 11300-11304
Publication Date: 25 Aug 2010
Publication Information
Category: Article
Journal: Journal of Neuroscience
Volume: 30
Issue number: 34
ISSN: 0270-6474
Original Language: English
DOIs: 10.1523/JNEUROSCI.1271-10.2010
  Deep Segmentation Networks
Le Roux, N, Heess, N, Shotton, J & Winn, J 2010, Deep Segmentation Networks. in The Learning Workshop, Snowbird.
There are two dominating trends when it comes to building a generative model of images: one may have a fully generic model, often so general that it struggles to learn complex structures, or a model relying heavily on prior knowledge, often too restrictive to learn about the wide variety of images. This work aims at combining the advantages of general low-level generative models and powerful layer-based and hierarchical models, with the hope of being a first step towards richer, more flexible models of images. It incorporates features from both groups of works mentioned above: (i) the modeling of an image as the combination of multiple objects occluding each other, each object having its own appearance and shape; (ii) the use of fully generic models for these appearances and shapes, based on Restricted Boltzmann Machines (RBMs); (iii) a hierarchical structure to model objects of all sizes and at all scales.
General Information
Organisations: Neuroinformatics DTC.
Authors: Le Roux, Nicolas, Heess, Nicolas, Shotton, Jamie & Winn, John.
Publication Date: 2010
Publication Information
Category: Conference contribution
Original Language: English
2009
  Learning generative texture models with extended Fields-of-Experts
Heess, N, Williams, CKI & Hinton, GE 2009, Learning generative texture models with extended Fields-of-Experts. in Proceedings of the British Machine Vision Conference. BMVA Press, pp. 115.1-115.11. DOI: 10.5244/C.23.115
We evaluate the ability of the popular Field-of-Experts (FoE) to model structure in images. As a test case we focus on modeling synthetic and natural textures. We find that even for modeling single textures, the FoE provides insufficient flexibility to learn good generative models – it does not perform any better than the much simpler Gaussian FoE. We propose an extended version of the FoE (allowing for bimodal potentials) and demonstrate that this novel formulation, when trained with a better approximation of the likelihood gradient, gives rise to a more powerful generative model of specific visual structure that produces significantly better results for the texture task.
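The FoE model referenced above assigns an unnormalized energy to an image by passing a bank of linear filters over all cliques and summing a potential over the responses. A minimal sketch with Student-t-style potentials and toy hand-picked derivative filters (the paper learns its filters and, in the extension, uses bimodal potentials):

```python
import numpy as np

def foe_energy(image, filters, alpha=1.0):
    """Unnormalized FoE energy with Student-t experts:
    E(x) = sum_filters sum_cliques alpha * log(1 + 0.5 * response^2)."""
    energy = 0.0
    for J in filters:
        fh, fw = J.shape
        H, W = image.shape
        for r in range(H - fh + 1):          # 'valid' placements of the filter
            for c in range(W - fw + 1):
                resp = np.sum(J * image[r:r + fh, c:c + fw])
                energy += alpha * np.log1p(0.5 * resp**2)
    return energy

# Toy horizontal/vertical derivative filters as experts:
filters = [np.array([[-1.0, 1.0]]), np.array([[-1.0], [1.0]])]

rng = np.random.default_rng(4)
smooth = np.ones((8, 8))
noisy = rng.standard_normal((8, 8))

# Lower energy means higher probability under the model; a constant image
# has zero filter responses and hence the minimum energy:
print(foe_energy(smooth, filters), foe_energy(noisy, filters))
```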
General Information
Organisations: Institute for Adaptive and Neural Computation.
Authors: Heess, Nicolas, Williams, Christopher K. I. & Hinton, Geoffrey E.
Pages: 115.1-115.11
Publication Date: 2009
Publication Information
Category: Conference contribution
Original Language: English
DOIs: 10.5244/C.23.115
2008
  Spatial integration in direction selective cortical neurons and the notion of a fundamental spatial subunit
Heess, N, Blacker, E, McLelland, D, Ahmed, B & Bair, W 2008, 'Spatial integration in direction selective cortical neurons and the notion of a fundamental spatial subunit' Society for Neuroscience Annual Meeting, 2008, Washington DC, United States, 15/11/08 - 19/11/08.
In the primary visual cortex (V1), images are broken down into spatially-localized components by neurons having narrow-band orientation and spatial frequency (SF) tuning. Outputs of these neurons would make ideal inputs for a popular class of models of motion detection, i.e., motion energy models. Recent work in V1 and in the cortical motion area V5/MT, however, has shown that direction selective (DS) neurons compute motion with a fundamental spatial subunit that is small in scale and relatively fixed across the visual field (Livingstone et al., 2001, Neuron 30:781; Pack et al., 2006, J Neurosci 26:893). This presents a paradox: why should the visual system represent images using a wide array of SF-tuned channels and then only use a narrow range of those channels for motion computation, and why would spatial scaling with eccentricity not apply to the motion pathway? To test whether spatial integration varied with changes in stimulus structure, we made extracellular recordings from DS neurons in V1 and V5/MT in the anesthetized macaque monkey, and we varied the spatial scale of moving visual stimuli to determine the size of the displacements that were optimal at each scale. We used two stimuli, a sinusoidal grating and a two-bar stimulus that had been used by others. The sinusoidal grating patch was presented at a variety of SFs and stepped according to a random sequence at various displacements along the preferred axis of motion of the cell. The two-bar stimulus was the one used by Livingstone et al. except that we varied the width of the bars and the range over which they were presented across trials. We presented the same sets of stimuli to two commonly used models for DS neurons: a Reichardt detector and a motion energy model.
We found that for the large majority of neurons, the optimal displacement increased as SF decreased for the sinusoidal grating stimulus, and the optimal displacement over all SFs for a neuron was highly correlated with the optimal SF of the neuron. Using the two-bar stimulus, we found that the optimal step size changed with the width of the bars being used. These experimental results suggest that motion is computed across a range of spatial scales and that there is not a fundamental, small step size that characterizes most neurons. Both the motion energy and Reichardt models were able to match the experimental results to some degree, but our motion energy model gave a better account for the shapes of the optimal displacement tuning curves for the two-bar stimuli. Thus, the observed changes of the scale of motion computations within and across cells appear to be an inherent property of simple models of motion detection.
General Information
Organisations: Neuroinformatics DTC.
Authors: Heess, Nicolas, Blacker, Edward, McLelland, Douglas, Ahmed, Bashir & Bair, Wyeth.
Publication Date: 2008
Publication Information
Category: Poster
Original Language: English

Projects:
Learning Probabilistic Hierarchical Image Models (PhD)