
ANC Workshop: Simao Eduardo and Chris Williams, Chair: David Sterratt

When: Feb 12, 2019, from 11:00 AM to 12:00 PM
Where: IF 4.31/33

Chris Williams

Autoencoders and Probabilistic Inference with Missing Data: An Exact Solution for The Factor Analysis Case


Latent variable models can be used to probabilistically “fill in” missing data entries. The variational autoencoder architecture (Kingma and Welling, 2014; Rezende et al., 2014) includes a “recognition” or “encoder” network that infers the latent variables given the data variables. However, it is not clear how to handle missing data variables in this network. The factor analysis (FA) model is a basic autoencoder, using linear encoder and decoder networks. We show how to calculate exactly the latent posterior distribution for the FA model in the presence of missing data, and note that this solution exhibits a non-trivial dependence on the pattern of missingness. We also discuss various approximations to the exact solution. Experiments compare the effectiveness of various approaches to imputing the missing data.


Joint work with Charlie Nash and Alfredo Nazabal.
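
As a rough illustration of the factor analysis abstract above, the sketch below applies standard Gaussian conditioning to the FA model x = W z + mu + eps, with z ~ N(0, I) and eps ~ N(0, diag(psi)), using only the observed entries of x. It is a minimal sketch and not necessarily the formulation presented in the talk. The function names (`fa_posterior_missing`, `impute`), the parameter names (`W`, `mu`, `psi`), and the use of NaN to mark missing entries are all assumptions made for this example.

```python
import numpy as np

def fa_posterior_missing(x, W, mu, psi):
    """Posterior mean and covariance of z given the observed entries of x.

    Assumed model: x = W z + mu + eps, z ~ N(0, I), eps ~ N(0, diag(psi)).
    Missing entries of x are marked with NaN (a convention of this sketch).
    """
    obs = ~np.isnan(x)            # boolean mask of observed dimensions
    W_o = W[obs]                  # rows of W for observed dimensions
    psi_o = psi[obs]              # noise variances for observed dimensions
    r = x[obs] - mu[obs]          # centred observed values
    k = W.shape[1]

    WtPinv = W_o.T / psi_o        # W_o^T Psi_o^{-1}, shape (k, n_obs)
    precision = np.eye(k) + WtPinv @ W_o
    S = np.linalg.inv(precision)  # posterior covariance
    m = S @ (WtPinv @ r)          # posterior mean
    return m, S

def impute(x, W, mu, psi):
    """Fill missing entries with their posterior predictive mean."""
    m, _ = fa_posterior_missing(x, W, mu, psi)
    x_filled = x.copy()
    miss = np.isnan(x)
    x_filled[miss] = (W @ m + mu)[miss]
    return x_filled
```

Both the posterior mean and the posterior covariance are built from only the observed rows of W, so they change whenever the set of observed dimensions changes. This matches the abstract's remark that the exact solution depends on the pattern of missingness.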



Simao Eduardo


Self-Cleaning VAE: Robust Variational Autoencoders for Mixed-Type Data
Variational Autoencoders (VAEs) have been successfully applied to datasets ranging from images to tabular data, e.g. those in the UCI repository. However, many real-world datasets are corrupted by noise, making them unsuitable for certain tasks such as model training. In addition, sometimes the objective is to obtain a clean dataset (repair) or to remove the outliers from it (detection). Available models for this task may have one or more of the following drawbacks: they need a clean subset of the data for training; they need dirty-clean pairs for training; their hyper-parameters are not easily understood; their outlier detection granularity is at the instance level rather than the feature level; or they do not model mixed types, e.g. categorical and real features. In this ongoing project, our aim is to provide a fully unsupervised generative model that focuses on modelling the inliers of the dataset (robust), training directly on dirty instances. We provide a probabilistic framework for mixed-type datasets, which also enables cell-wise (feature) outlier detection and repair. Our robust VAE (RVAE) outperforms the standard VAE in several corruption scenarios.

Joint work with Alfredo Nazabal, Chris Williams and Charles Sutton.
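
To make "cell-wise" scoring on mixed-type data concrete, here is a generic sketch that computes one negative log-likelihood per cell of a single row, using a Gaussian likelihood for real features and a categorical likelihood for categorical features. This is not the RVAE model from the abstract. The function name `cell_nll` and all of its parameters are hypothetical, and the per-feature means, variances and class probabilities are assumed to come from some already fitted model.

```python
import numpy as np

def cell_nll(x_real, mean, var, x_cat, probs):
    """Per-cell negative log-likelihood for one row of mixed-type data.

    x_real, mean, var: arrays for the real-valued features (Gaussian likelihood).
    x_cat: integer category index for each categorical feature.
    probs: one probability vector per categorical feature.
    All inputs are hypothetical outputs of some fitted model.
    """
    nll_real = 0.5 * (np.log(2 * np.pi * var) + (x_real - mean) ** 2 / var)
    nll_cat = -np.log(np.array([p[c] for p, c in zip(probs, x_cat)]))
    return np.concatenate([nll_real, nll_cat])

# Example row: two real features and one three-way categorical feature.
scores = cell_nll(
    x_real=np.array([0.1, 8.0]),
    mean=np.array([0.0, 0.0]),
    var=np.array([1.0, 1.0]),
    x_cat=[2],
    probs=[np.array([0.7, 0.2, 0.1])],
)
most_suspicious_cell = int(np.argmax(scores))
```

Scoring each cell separately lets an individual feature value be singled out as suspicious, rather than the whole row, which is the feature-level granularity the abstract contrasts with instance-level detection.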