# Past Events (2011)

**29/11/11 11:00 - 12:00**

**ANC/DTC Seminar: Manfred Opper, Technische Universität Berlin (Host: Guido Sanguinetti)**

**Expectation Propagation: A Physicist's Perspective**

**22/11/11 11:00 - 12:00**

**ANC/DTC Seminar: Arthur Gretton, Gatsby Computational Neuroscience Unit, University College London (Host: Charles Sutton)**

**Hypothesis Testing and Bayesian Inference: New Applications of Kernel Methods**

In the early days of kernel machines research, the "kernel trick" was considered a useful way of constructing nonlinear learning algorithms from linear ones, by applying the linear algorithms to feature space mappings of the original data. More recently, it has become clear that a potentially more far-reaching use of kernels is as a linear way of dealing with higher order statistics, by mapping probabilities to a suitable reproducing kernel Hilbert space (i.e., the feature space is an RKHS). I will describe how probabilities can be mapped to kernel feature spaces, and how to compute distances between these mappings. A measure of strength of dependence between two random variables follows naturally from this distance. Applications that make use of kernel probability embeddings include:

- Nonparametric two-sample testing and independence testing in complex (high dimensional) domains. In the latter case, we test whether text in English is translated from the French, as opposed to being random extracts on the same topic.
- Inference on graphical models, in cases where the variable interactions are modeled nonparametrically (i.e., when parametric models are impractical or unknown). In experiments, this approach outperforms state-of-the-art nonparametric techniques in 3-D depth reconstruction from 2-D images, and on a protein structure prediction task.
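
The distance between distribution embeddings described above can be estimated directly from samples; one standard instance is the unbiased squared maximum mean discrepancy (MMD). A minimal numpy sketch, assuming an RBF kernel and toy Gaussian samples of my own choosing:

```python
import numpy as np

def mmd2_unbiased(X, Y, gamma=1.0):
    """Unbiased estimate of the squared MMD between samples X and Y,
    using the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    def k(A, B):
        return np.exp(-gamma * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    n, m = len(X), len(Y)
    # Diagonal terms are excluded to make the within-sample averages unbiased.
    return ((Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
            - 2 * Kxy.mean())

rng = np.random.default_rng(0)
same = mmd2_unbiased(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
diff = mmd2_unbiased(rng.normal(size=(200, 2)), rng.normal(2.0, 1.0, size=(200, 2)))
# `same` is close to zero; `diff` is clearly positive.
```

A two-sample test then compares this statistic against a null distribution obtained, for example, by permuting the pooled samples.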

**08/11/11 11:00 - 12:00**

**ANC/DTC Seminar: Dr. Jörg Lücke, University of Frankfurt (Host: Peggy Series)**

**Representational Learning of Sensory Data Components**

In the nervous system of humans and animals, sensory data are represented as combinations of elementary data components. While for data such as sound waveforms the elementary components combine linearly, other data can better be modeled by non-linear forms of component superpositions. I motivate and discuss two models of component extraction: one using standard linear superpositions of basis functions (similar to standard sparse coding) and one using non-linear superpositions. Crucial for the applicability of both models are efficient learning procedures. I briefly introduce a novel training scheme (truncated variational EM) and show how it can be applied to probabilistic generative models. For linear and non-linear models the scheme efficiently infers the basis functions as well as the level of sparseness and data noise. Furthermore, I discuss the close relation of the approach to neural processing. In large-scale applications to image patches, we show results on the statistics of the inferred parameters of linear and non-linear models. Differences between the models are discussed, and both models are compared to results of standard approaches in the literature and to experimental findings. Finally, I briefly discuss other recent projects of my group including learning in a model that takes explicit component occlusions into account.

**03/11/11 12:00 - 13:00**

**ANC/DTC Seminar: Dmitri Rusakov, UCL (Host: Mark van Rossum)**

**Electrodiffusion of glutamate in the cleft enables coincidence detection in excitatory neural circuits**

The synaptic response waveform, which determines signal integration properties in the brain, depends on the spatiotemporal profile of neurotransmitter in the synaptic cleft. We have found that electric interactions between postsynaptic excitatory currents and negatively charged glutamate molecules accelerate the clearance of glutamate from the synaptic cleft, thus speeding up synaptic responses. Rapid voltage-dependent temporal tuning of excitatory currents may thus contribute to signal integration in the dendritic tree. Furthermore, we find that a single postsynaptic action potential is sufficient to significantly decelerate intra-cleft diffusion of glutamate. At excitatory synapses formed on electrically compact cerebellar granule cells, this deceleration boosts activation of metabotropic glutamate receptors at the synaptic periphery. The coincidence between postsynaptic spikes and presynaptic discharges of glutamate could therefore induce a lasting change of excitatory transmission thus altering signal summation rules of this circuitry. The results unveil a basic synaptic memory induction mechanism which depends on glutamate electrodiffusion and acts with exceptional temporal precision.

**25/10/11 11:00 - 12:00**

**ANC/DTC Seminar: Onno Zoeter, Xerox Research Centre Europe (Host: Charles Sutton)**

**Optimal Learning While Selling**

This talk studies optimal price learning for one or more items. This is an important example of machine learning in the context of strategic data sources. We introduce the Schrödinger price experiment (SPE) which superimposes classical price experiments using lotteries, and thereby extracts more information from each customer interaction. If buyers are perfectly rational we show that there exist SPEs that in the limit of infinite superposition learn optimally and exploit optimally. Such selling mechanisms can therefore be considered as an optimal way of learning and selling. We refer to the new resulting mechanism as the hopeful mechanism (HM) since although it is incentive compatible, buyers can deviate with extreme consequences for the seller at very little cost to themselves. For real-world settings we propose a robust version of the approach which takes the form of a Markov decision process where the actions are functions. This is joint work with Chris Dance.

**04/10/11 11:00 - 12:00**

**ANC/DTC Seminar: Derek Gatherer, University of Glasgow (Host: David Sterratt)**

**Open Machine Learning Problems in Bioinformatics**

This talk will explore two unjustly neglected areas of computational biology:

- The application of Self-Organizing Maps (SOMs) to the classification of DNA sequences, showing how they can be used for cataloguing the species content of environmental samples (metagenomics).
- Molecular biocryptography: a novel word detection algorithm that successfully extracts most of the vocabulary from texts of human origin, and when applied to protein sequences reveals the presence of molecular homonyms - sequences that, although identical, are likely to have different functions.

My aim is to show computer scientists that there is still interesting work to be done in bioinformatics beyond the usual set of fashionable problems.

**20/09/11 14:00 - 15:00**

**ANC/DTC Seminar: Matthias Bethge, Max Planck Institute (Host: Chris Williams)**

**Surprising Effects of Correlations in Neural Population Codes**

Several recent studies have shown that the Ising model can provide surprisingly accurate descriptions of neuronal population activity. It has also been shown, however, that its higher-order correlations sometimes fail to explain those found in neural recordings in characteristic and interesting ways. Here, we use a population of threshold neurons receiving correlated inputs to model neural population recordings. We show analytically that small changes in second-order correlations can lead to large changes in higher-order redundancies, and that the resulting interactions have a strong impact on the entropy, sparsity, and statistical heat capacity of the population. Our findings for this simple model show that it can account for both the success and the shortcomings of the Ising model. As an additional teaser, I will present a short summary of new results on how heterogeneities change the effect of noise correlations on the accuracy of population codes.
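
The threshold-neuron construction can be illustrated with a short simulation: equicorrelated Gaussian inputs passed through a hard threshold yield a population spike-count distribution with both more silence and heavier tails than an independent (binomial) model matched to the single-neuron rate. All parameter values below are illustrative choices, not taken from the talk:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)
n, samples, rho = 20, 50_000, 0.2

# Equicorrelated Gaussian inputs: a shared component plus private noise.
shared = rng.normal(size=samples)
private = rng.normal(size=(samples, n))
inputs = np.sqrt(rho) * shared[:, None] + np.sqrt(1 - rho) * private
spikes = (inputs > 1.0).astype(int)          # hard threshold at 1 s.d.

counts = spikes.sum(axis=1)                  # population spike count per sample
p = spikes.mean()                            # single-neuron firing probability
pmf_obs = np.bincount(counts, minlength=n + 1) / samples
pmf_indep = np.array([comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)])
# Input correlations redistribute probability mass toward the extremes:
# pmf_obs has more mass at zero spikes and in the high-count tail than pmf_indep.
```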

**20/09/11 11:00 - 12:00**

**ANC/DTC Seminar: Aapo Hyvarinen, University of Helsinki (Host: Chris Williams)**

**Unsupervised Machine Learning for Analysis of EEG and MEG at rest**

Recently, a lot of brain imaging work has concentrated on analysing brain activity at rest, i.e. when the subject is not doing anything in particular and receives no specific stimulation. When using functional magnetic resonance imaging (fMRI), such analysis is typically done by independent component analysis (ICA). However, there has not been much work on analysing resting activity measured by EEG or MEG. We have recently developed various methods to analyse EEG/MEG data at rest. First, we have created new variants of ICA that more completely exploit the statistical structure of EEG/MEG data. Second, we have developed tests of the statistical significance of the independent components. Third, we have a framework for analysis of causality (connectivity) which uses the non-Gaussianity of the data and can hopefully be used to analyse the connectivity of the independent components.
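
The ICA step can be sketched with a toy FastICA implementation (tanh contrast, symmetric decorrelation). The two sources and the mixing matrix below are invented stand-ins, not EEG/MEG data:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 8, 2000)
# Two non-Gaussian sources and a linear mixture of them.
S = np.c_[np.sign(np.sin(3 * t)), np.sin(5 * t) ** 3]
X = S @ np.array([[1.0, 0.6], [0.5, 1.0]]).T

# Whiten the observations.
Xc = X - X.mean(0)
d, E = np.linalg.eigh(np.cov(Xc.T))
Z = Xc @ E / np.sqrt(d)

# Symmetric FastICA with the tanh contrast function.
W = rng.normal(size=(2, 2))
for _ in range(200):
    G = np.tanh(Z @ W.T)
    W = (G.T @ Z) / len(Z) - np.diag((1 - G**2).mean(0)) @ W
    U, _, Vt = np.linalg.svd(W)
    W = U @ Vt                   # re-orthogonalize the unmixing matrix
S_hat = Z @ W.T                  # recovered sources (up to sign and permutation)
```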

**13/09/11 11:00 - 12:00**

**SICSA Distinguished Visitor: Sharon Crook, Arizona State University (Host: Jim Bednar)**

**A Continuum Model for Structural Plasticity of Dendritic Spines**

Recent evidence indicates that the morphology and density of dendritic spines are regulated during synaptic plasticity. High-frequency stimuli that induce long-term potentiation have been associated with increases in the number and size of spines. In contrast, low-frequency stimuli that induce long-term depression are associated with decreases in the number and size of spines. This activity-dependent structural plasticity occurs over a vast range of time scales, from minutes to days or weeks. In this work, we extend previous modeling studies to include calcium-mediated spine restructuring. The models are based on the dimensionless cable equations with additional equations that characterize the activity-dependent changes in spines along the dendrite. Computational studies are used to investigate the impact of the dynamics of spines on the output properties of the dendrite.

**08/07/11 11:00 - 12:00**

**SICSA Distinguished Visiting Fellow: Tong Zhang (Host: Charles Sutton)**

**Spectral Methods for Learning Graphical Models**

This talk presents a methodology for learning graphical models with hidden nodes that I have been studying with collaborators in recent years. The idea is to employ algebraic techniques (in particular, matrix decomposition and spectral methods) to learn unobserved quantities in graphical models. The talk focuses on tree models, and covers two aspects of the underlying learning problem: parameter estimation and structural learning.

The first part is concerned with parameter estimation, where an algorithm called learnHMM is presented that learns hidden Markov models. It is shown that this method can efficiently recover the correct HMM dynamics, with a sample complexity that depends on mild conditions on the underlying system. The advantage of this approach over some traditional methods (such as EM) is that our algorithm does not suffer from local minimum issues in nonconvex optimization, and it handles high dimensional observations and long range dependencies more easily. The method can be extended to estimating parameters for nonlinear systems and general tree structured graphical models with unobserved nodes.

The second part is concerned with structural learning, where an algorithm is presented to learn the underlying tree topology of a broad class of multivariate tree models with hidden nodes. Exact recovery of the tree structure can be established based on certain natural dependencies on statistical and structural properties of the underlying joint distribution. This method handles high dimensional observations and is more general than existing approaches.

Collaborators: Daniel Hsu, Sham Kakade, Anima Anandkumar, Le Song

**05/07/11 14:00 - 15:00**

**ANC/DTC Seminar: Rob Fergus (Host: Chris Williams)**

**Deconvolutional Networks**

We present a hierarchical model that learns image decompositions via alternating layers of convolutional sparse coding and max pooling. When trained on natural images, the layers of our model capture image information in a variety of forms: low-level edges, mid-level edge junctions, high-level object parts and complete objects. To build our model we rely on a novel inference scheme that ensures each layer reconstructs the input, rather than just the output of the layer directly beneath, as is common with existing hierarchical approaches. This makes it possible to learn multiple layers of representation and we show models with 4 layers, trained on images from the Caltech-101 and 256 datasets. Features extracted from these models, in combination with a standard classifier, outperform SIFT and representations from other feature learning approaches.

**29/06/11 14:00 - 15:00**

**ANC/DTC Seminar: Francis Bach (Host: Charles Sutton)**

**Structured sparsity and convex optimization**

The concept of parsimony is central in many scientific domains. In the context of statistics, signal processing or machine learning, it takes the form of variable or feature selection problems, and is commonly used in two situations: First, to make the model or the prediction more interpretable or cheaper to use, i.e., even if the underlying problem does not admit sparse solutions, one looks for the best sparse approximation. Second, sparsity can also be used given prior knowledge that the model should be sparse. In these two situations, reducing parsimony to finding models with low cardinality turns out to be limiting, and structured parsimony has emerged as a fruitful practical extension, with applications to image processing, text processing or bioinformatics. In this talk, I will review recent results on structured sparsity, as it applies to machine learning and signal processing.
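
One concrete building block of structured sparsity methods is the proximal operator of the group-lasso penalty, which shrinks whole blocks of coefficients at once rather than individual entries. A minimal numpy sketch, where the groups and coefficient values are my own illustrative choices:

```python
import numpy as np

def prox_group_lasso(w, groups, lam):
    """Proximal operator of lam * sum_g ||w_g||_2.
    Each group is either shrunk toward zero or set exactly to zero."""
    out = w.copy()
    for g in groups:
        norm = np.linalg.norm(w[g])
        out[g] = 0.0 if norm <= lam else (1.0 - lam / norm) * w[g]
    return out

w = np.array([3.0, 4.0, 0.1, -0.1])
groups = [np.array([0, 1]), np.array([2, 3])]
shrunk = prox_group_lasso(w, groups, lam=1.0)
# First group (norm 5) is shrunk to [2.4, 3.2]; second (norm ~0.14) is zeroed.
```

Iterating such an operator inside a proximal-gradient loop yields group-sparse solutions; overlapping or hierarchical groups generalize the idea to richer structures.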

Please note the change of venue to IF-G.07

**21/06/11 09:15 - 17:00**

**ANC Review Day**

**31/05/11 11:00 - 12:00**

**ANC/DTC Seminar: Geoff Hinton (Host: Iain Murray)**

**How to force unsupervised neural networks to discover the right representation of images**

One appealing way to design an object recognition system is to define objects recursively in terms of their parts and the required spatial relationships between the parts and the whole. These relationships can be represented by the coordinate transformation between an intrinsic frame of reference embedded in the part and an intrinsic frame embedded in the whole. This transformation is unaffected by the viewpoint so this form of knowledge about the shape of an object is viewpoint invariant. A natural way for a neural network to implement this knowledge is by using a matrix of weights to represent each part-whole relationship and a vector of neural activities to represent the pose of each part or whole relative to the viewer. The pose of the whole can then be predicted from the poses of the parts and, if the predictions agree, the whole is present. This leads to neural networks that can recognize objects over a wide range of viewpoints using neural activities that are "equivariant" rather than invariant: as the viewpoint varies the neural activities all vary even though the knowledge is viewpoint-invariant. The "capsules" that implement the lowest-level parts in the shape hierarchy need to extract explicit pose parameters from pixel intensities and these pose parameters need to have the right form to allow coordinate transformations to be implemented by matrix multiplies. These capsules are quite easy to learn from pairs of transformed images if the neural net has direct, non-visual access to the transformations, as it would if it controlled them. (Joint work with Sida Wang and Alex Krizhevsky)
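
The part-whole reasoning above can be made concrete with 2-D homogeneous transforms: each capsule outputs a part pose, a fixed matrix maps it to a predicted pose of the whole, and agreement between predictions signals the whole's presence. The parts, offsets, and viewpoint below are invented for illustration:

```python
import numpy as np

def pose(theta, tx, ty):
    """A 2-D pose (rotation plus translation) as a homogeneous matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]])

# Fixed, viewpoint-invariant part-whole relationships (the learned weight matrices).
R_nose = pose(0.0, 0.0, -0.5)    # hypothetical offset of a "nose" within a "face"
R_mouth = pose(0.1, 0.0, 0.8)    # hypothetical offset of a "mouth"

# Part poses the capsules would extract for one particular viewpoint.
T_face = pose(0.3, 2.0, 1.0)     # true pose of the whole (unknown to the network)
T_nose, T_mouth = T_face @ R_nose, T_face @ R_mouth

# Each part predicts the pose of the whole by a single matrix multiply.
pred_nose = T_nose @ np.linalg.inv(R_nose)
pred_mouth = T_mouth @ np.linalg.inv(R_mouth)
agreement = np.allclose(pred_nose, pred_mouth)   # agreeing predictions -> whole present
```

Changing the viewpoint changes every pose vector while leaving the relationship matrices untouched, which is exactly the equivariant/invariant split the abstract describes.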

Please note the change of venue to Appleton Tower Lecture Theatre 3

**24/05/11 11:00 - 12:00**

**ANC/DTC Seminar: Wim Wiegerinck (Host: Chris Williams)**

**Super-modeling: combining models by synchronization**

Scientists develop computer models of real, complex systems to increase understanding of their behavior and make predictions. A prime example is the Earth's climate. Complex climate models are used to compute the climate change in response to expected changes in the composition of the atmosphere, e.g. due to greenhouse gas emissions. Years of research have improved the ability to simulate the climate of the recent past but these models are still far from perfect. The model projections of the globally averaged temperature increase by the end of this century differ by as much as a factor of two. Current practice commonly averages the predictions of the separate model runs.

Our proposed approach is instead to form a consensus by combining the models into one super-model, balancing the strengths and weaknesses of each model in an optimized way. Results from nonlinear dynamics suggest that the models can be made to synchronize with each other even if only a small amount of information is exchanged, forming a consensus while retaining autonomous model variability.

Preliminary simulations on small scale dynamical systems suggest that such an approach may lead to a richer dynamics than would be obtained by a direct ensemble average approach.

**17/05/11 11:00 - 12:00**

**ANC/DTC Seminar: Zoubin Ghahramani (Host: Iain Murray)**

**Nonparametric Bayesian Modelling**

Because uncertainty, data, and inference play a fundamental role in the design of systems that learn, probabilistic modelling has become one of the cornerstones of the field of machine learning. Once a probabilistic model is defined, Bayesian statistics (which used to be called "inverse probability") can be used to make inferences and predictions from the model. Bayesian methods also elucidate how probabilities can be used to coherently represent degrees of belief in a rational artificial agent. Bayesian methods work best when they are applied to models that are flexible enough to capture the complexity of real-world data. Recent work on non-parametric Bayesian machine learning provides this flexibility. I will survey some of our recent work in this area, including infinite hidden Markov models for sequence modelling, the Indian buffet process for latent feature discovery, nonparametric deep sparse graphical models, and Wishart processes for covariance modelling.

**26/04/11 11:00 - 12:00**

**ANC/DTC Seminar: Albert Compte (Host: Mark van Rossum)**

**Neural circuit modeling of working memory capacity and precision limitations**

The ability to hold information in working memory is constrained both by the amount of information to hold and by the duration of the holding interval. Understanding the neuronal basis of these limitations is important given that working-memory capacity is thought to be a key component of more complex cognitive functions. These limitations are thought to produce an inability to store more than a certain number of items (usually three or four, termed working-memory capacity) and a progressive degradation of the precision of memories with time. We use a biologically plausible spiking network model of interacting cortical areas to study the neuronal mechanisms of capacity and precision limitations in visuospatial working memory (vsWM). As shown before, neuronal activity in the model reproduces the sustained activity of neurons in prefrontal and parietal cortices of monkeys performing classical vsWM tasks, based on strong excitatory and inhibitory intracortical connectivity. Based on this model, we propose that a major role of dorsolateral prefrontal cortex in human working memory is to boost parietal memory capacity, which would otherwise be set by the level of parietal intracortical inhibition. We confirmed some predictions from the model in an fMRI study, showing that although memories are stored in the parietal cortex, interindividual differences in memory capacity are partly determined by the strength of prefrontal top-down control. We also use the model to analyze theoretically how the number of items to hold in memory, the duration of the mnemonic intervals and the relative distances between the positions of the items affect the probability of keeping an item in memory and the precision of that memory. We confirm these model-derived behavioral effects in psychophysical studies. We further show how the model reveals mechanisms at the neuronal level that can explain apparent discrepancies in psychophysical results addressing the relation between capacity and precision of vsWM.

**29/03/11 11:00 - 12:00**

**ANC/DTC Seminar: Thomas Serre (Host: Douglas Armstrong)**

**A biologically-motivated approach to computer vision**

Understanding the processing of information in our cortex is a significant part of understanding how the brain works and, in a sense, of understanding intelligence itself. In particular, our visual capabilities are exceptional and, despite decades of engineering effort, no computer algorithm has been able to match the performance of the primate visual system. Our visual cortex may serve as a proxy for the rest of the cortex and thus for intelligence itself.

In the talk, I will review work towards the development of a hierarchical architecture for visual recognition based on the anatomy and the physiology of the primate visual cortex. I will briefly review some of the biological evidence in favor of this class of models and show real-world applications of the system for the recognition of objects and actions. In particular, I will describe an effort to develop a system for the recognition of rodent behavior from video sequences to help automate the behavioral phenotyping of animals.

The broad thesis of this talk is that computational neuroscience is beginning to provide novel insights into the problem of how our visual cortex is computing and of how some aspects of learning and intelligence may be implemented in machines.

**22/03/11 11:00 - 12:00**

**ANC/DTC Seminar: Zhaoping Li (Host: Jim Bednar)**

**A Saliency Map in the Primary Visual Cortex --- Theory and Experimental Test**

I will describe the background and motivation for this theory, introduce a model of V1 that demonstrates it, and present psychophysical tests of the theoretical predictions. More details can be found at http://www.cs.ucl.ac.uk/staff/Zhaoping.Li/V1Saliency.html

**15/03/11 11:00 - 12:00**

**ANC/DTC Seminar: Susan Denham (Host: Mark van Rossum)**

**Auditory Scene Analysis and the Dynamics of Auditory Object Formation**

Our senses probe the physical world around us, providing our brains with a continuous stream of information from which we have learnt to extract details of the nature of our environment and the objects within it. As there are likely to be many sound-emitting objects active at any time, each generating discrete sound events, it is important to ensure that the sounds they emit are correctly associated if we are to understand their behaviour and interact with them appropriately. However, this process of segregating and grouping sound events is not straightforward, as the acoustic environment we inhabit can be very complex and grouping decisions must be made ‘on the fly’. While the correct decisions may be easy with hindsight, the problem is to try to get them right when the information needed for an optimal decision is incomplete. In this talk I will describe our recent experiments using the auditory streaming paradigm investigating the way in which the auditory system constructs, modifies and maintains dynamic representations of putative objects within the environment, and present a model which captures our current understanding of this process.

**08/03/11 11:00 - 12:00**

**ANC/DTC Seminar: Björn Kampa (Host: Mark van Rossum)**

**Cortical feed-forward networks for binding different streams of sensory information**

The cortex is an architectural masterpiece. It combines horizontal layers with vertical columns, anatomical organization with functionality. Layers contain different cell types that form different connections with each other, whereas columns are defined by the features of the information they process, such as the brightness of a spot in the receptive field. Over the last 50 years, the idea of the ‘functional column’ has exerted a dominant influence on our understanding of cortical circuits. However, recent findings have called this arrangement of computational units into question, and the finer-scale structure of microcolumns and sub-networks is increasingly the focus of investigation. Yet the formation and structure of these cortical assemblies are still largely unknown. I will present evidence that these sub-networks form microcircuits that are tuned for binding different streams of sensory information. Further, I will show that different learning rules exist for bottom-up and top-down pathways. The different temporal windows for plasticity induction at synapses conveying sensory or context information might lead to the assembly of highly specific cortical circuits tuned for the integration of different sensory features. The development of a novel 3D network imaging technique now allows recording from visually identified neuronal networks in the intact brain during sensory stimulation. This technique has been used to study the interaction of cortical sub-networks in the supragranular layers of mouse visual cortex. Neuronal networks can be identified that specifically encode simple gratings or more complex stimuli such as noise patterns and natural scenes. We found stimulus-specific responses in locally clustered neurons. Together, fine-scale local networks of neurons might play an important role in sensory stimulus representation and computation in the cortex.

**01/03/11 11:00 - 12:00**

**ANC/DTC Seminar: Neil Lawrence (Host: Guido Sanguinetti)**

**A Unifying Probabilistic Perspective on Spectral Approaches to Dimensionality Reduction**

Spectral approaches to dimensionality reduction typically reduce the dimensionality of a data set through taking the eigenvectors of a Laplacian or a similarity matrix. Classical multidimensional scaling also makes use of the eigenvectors of a similarity matrix. In this talk we introduce a maximum entropy approach to designing this similarity matrix. The approach is closely related to maximum variance unfolding. Other spectral approaches such as locally linear embeddings and Laplacian eigenmaps also turn out to be closely related. Each method can be seen as a sparse Gaussian graphical model where correlations between data points (rather than across data features) are specified in the graph. This also suggests optimization via sparse inverse covariance techniques such as the graphical LASSO. The hope is that this unifying perspective will allow the relationships between these methods to be better understood and will also provide the groundwork for further research.
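
The sparse inverse covariance connection can be sketched with scikit-learn's GraphicalLasso on synthetic data from a chain-structured Gaussian. For simplicity the graph here is over five features, whereas the talk's construction places it over data points; the regularization value is an arbitrary choice:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
# Ground truth: a chain graph, so the precision matrix is tridiagonal.
P = np.eye(5) + 0.4 * (np.eye(5, k=1) + np.eye(5, k=-1))
X = rng.multivariate_normal(np.zeros(5), np.linalg.inv(P), size=500)

model = GraphicalLasso(alpha=0.05).fit(X)
prec = model.precision_
# Entries far from the chain (e.g. prec[0, 4]) are driven to (near) zero,
# recovering the conditional-independence structure of the graph.
```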

**15/02/11 10:30 - 11:30**

**ANC/DTC Seminar: Florentin Wörgötter (Host: Peggy Series)**

**Learning and Stability during Network Development**

How developing networks gain stability, both in their final activity and in their synaptic configuration, remains a big, largely unresolved puzzle. Often such networks, for example in the cortex, attain a final state characterized by self-organized criticality. In this state, activity is similar across temporal scales, which is beneficial for information flow. If subcritical, activity can die out; if supercritical, epileptiform patterns may occur. We have monitored the development of real cortical cell cultures between 13 and 95 days in vitro and find four different phases, related to their morphological maturation: an initial low-activity state is followed by a supercritical and then a subcritical one, until the network finally reaches stable criticality. Using network modeling, we describe the dynamics of the emergent connectivity in such developing systems. Based on physiological observations, the synaptic development in the model is driven by the neurons adjusting their connectivity to reach firing-rate homeostasis. We show that this homeostasis mechanism seems to guide the developing networks through the different observed stages into final stability. Mathematical analysis of synaptic homeostasis reveals globally stable fixed points for Hebbian learning. This result may prove important, as it allows arbitrarily wired networks to become stable and to store input patterns as synaptic traces.

**01/02/11 11:00 - 12:00**

**ANC/DTC Seminar: Sami Kaski (Host: Guido Sanguinetti)**

**Contextual information interfaces: machine learning for eye pattern-based proactive information retrieval**

In proactive information retrieval the ultimate goal is to seamlessly access relevant multimodal information in a context-sensitive way. Usually explicit queries are not available or are insufficient, and the alternative is to try to infer users' interests from implicit feedback signals, such as clickstreams or eye tracking. We have studied how to infer the relevance of texts and images to the user from gaze patterns. The interests, formulated as an implicit query, can then be used in further searches. I will discuss our results in this field, including a probabilistic model that can be interpreted as a kind of transfer or meta-learning, and, more recently, a data-glasses-based augmented reality interface to contextual information.

**14/01/11 12:00 - 13:00**

**ANC/DTC Seminar: Laura Dietz (Host: Charles Sutton)**

**Exploiting Graph-Structured Data in Generative Probabilistic Models**

Unsupervised machine learning aims to make predictions when labelled data is absent and supervised machine learning therefore cannot be applied. These algorithms build on assumptions about how data and predictions relate to each other. One class of techniques for unsupervised problem settings is generative models, which specify the set of assumptions as a probabilistic process that generates the data.

The subject of this thesis is how to most effectively exploit input data that has an underlying graph structure in unsupervised learning, for three important use cases. The first use case deals with localizing defective code regions in software, given the execution graph of code lines and transitions. Citation networks are exploited in the second use case to quantify the influence of citations on the content of the citing publication. In the final use case, shared tastes of friends in a social network are identified, enabling the prediction of which of a user's items a particular friend would be interested in.

For each use case, prediction performance is evaluated on held-out test data, which is only scarcely available in these domains. This comparison quantifies under which circumstances each generative model best exploits the given graph structure.