# Bayes-250 Programme

All events will be held on the fourth floor of the Informatics Forum, except for the public lecture on Monday.

MONDAY 5 SEPTEMBER
13:00 - 13:50 Gathering in Mini-Forum 2 for tea and coffee (MF2 is the lounge on the fourth floor)
13:50 - 15:00 Sparse Nonparametric Bayesian Learning from Big Data
David Dunson, Duke University
15:00 - 15:30 Tea break
15:30 - 16:00 Classification Models and Predictions for Ordered Data
Chris Holmes, Oxford University
16:00 - 16:30 Bayesian Variable Selection in Markov Mixture Models
Luigi Spezia, Biomathematics & Statistics Scotland, Aberdeen
slides
16:30 - 17:00 Bayesian inference for partially observed Markov processes, with application to systems biology
Darren Wilkinson, University of Newcastle
slides
17:00 - 17:30 Break
17:30 - 18:30 PUBLIC LECTURE, Appleton Tower, Lecture Theatre 5
How To Gamble If You Must (courtesy of the Reverend Bayes)
David Spiegelhalter, University of Cambridge
Video
18:30 - 19:30 Reception (Informatics Forum, MF2, 4th floor)
TUESDAY 6 SEPTEMBER
9:00 - 9:30 Coherent Inference on Distributed Bayesian Expert Systems
Jim Smith, University of Warwick
slides
9:30 - 10:00 Probabilistic Programming
John Winn, Microsoft Research
slides
10:00 - 10:30 Inference and computing with decomposable graphs
Peter Green, University of Bristol
slides
10:30 - 11:00 Tea break
11:00 - 11:30 Nonparametric Bayesian Models for Sparse Matrices and Covariances
Zoubin Ghahramani, University of Cambridge
slides
11:30 - 12:00 Latent Force Models
Neil Lawrence, University of Sheffield
slides mp3
12:00 - 13:30 LUNCH
13:30 - 14:00 Does Bayes Theorem Work?
Michael Goldstein, Durham University
slides
14:00 - 14:30 Bayesian Priors in the Brain
Peggy Series, University of Edinburgh
14:30 - 15:00 Approximate Bayesian Computation for model selection
Christian Robert, Université Paris-Dauphine
slides
15:00 - 15:30 Tea break
15:30 - 16:00 ABC-EP: Expectation Propagation for Likelihood-free Bayesian Computation
Nicolas Chopin, ENSAE
slides
16:00 - 17:00 Bayes at Edinburgh University - a talk and tour
Dr Andrew Fraser, Honorary Fellow, University of Edinburgh
Followed by tour of Old College, University of Edinburgh, where Bayes studied
WEDNESDAY 7 SEPTEMBER
9:00 - 9:30 Intractable likelihoods and exact approximate MCMC algorithms
Christophe Andrieu, University of Bristol
9:30 - 10:00 Bayesian computational methods for intractable continuous-time non-Gaussian time series
Simon Godsill, University of Cambridge
10:00 - 10:30 Efficient MCMC for Continuous Time Discrete State Systems
Yee Whye Teh, Gatsby Computational Neuroscience Unit, University College London
slides
10:30 - 11:00 Tea break
11:00 - 11:30 Adaptive Control and Bayesian Inference
Carl Rasmussen, University of Cambridge
11:30 - 12:00 Bernstein - von Mises theorem for irregular statistical models
Natalia Bochkina, University of Edinburgh
How To Gamble If You Must (courtesy of the Reverend Bayes)

(evening lecture, open to general public)

When the Reverend Thomas Bayes died in 1761 he left behind two revolutionary ideas: expressing our uncertainty about current or future states of the world as a probability distribution, and how to revise our probabilities in the light of experience.

The talk will include some selected modern applications of these concepts, such as catching doping athletes, predicting volcanic eruptions, gambling and weather forecasting. The speaker will also be checking whether the audience knows how ignorant they are.

Sparse Nonparametric Bayesian Learning from Big Data

In modern applications, data sets tend to be big and highly structured, with large p, small n problems commonly encountered. In such settings, sparse representations of the data are crucial and there is a rich frequentist literature focused on inducing sparsity through penalization (typically L1). Motivated by genetic epidemiology and imaging applications, we instead develop nonparametric Bayesian methods that avoid parametric assumptions while favoring low-dimensional representations of complex high-dimensional data. In this talk, the particular focus is on Bayesian probabilistic tensor factorizations, which generalize low-rank matrix factorizations, such as SVD, to higher orders. The framework accommodates general joint modeling of object data of different types (images, text, categorical, real, etc.) but for simplicity we focus on two applications: (1) high-dimensional multivariate categorical data analysis (contingency tables); (2) estimation of lower-dimensional manifolds from point cloud data. In the contingency table case, we propose a collapsed Tucker factorization and develop associated methods for testing of associations and interactions in huge sparse tables. In the manifold learning case, we propose a tensor product of basis functions for estimating 3D closed surfaces. In both settings, theoretical results are provided on large support and asymptotic properties, and efficient computational methods are developed that scale to large data sets.

joint work with Anirban Bhattacharya & Debdeep Pati

Classification Models and Predictions for Ordered Data

We are interested in classification models for ordered data, $\{y_{c(i)}, x_{c(i)}\}_{i=1}^n$ where the class label $y_{c(i)}$ shows local dependence on the index, $c(\cdot)$, of the sequence as well as on $x_{c(i)}$. For example, in time series $c(i)$ might index time of measurement, or in genomics $c(i)$ might refer to the genetic location of the measurement on a genome. State-space models for $Pr(y_{c(i)} \mid x_{c(i)})$ are useful here, with say Markov dependence in the states $Pr(y_{c(i)} \mid y_{c(i-1)}, y_{c(i+1)})$. In this talk we shall briefly review the use of subjective Bayesian (Hidden) Markov Models for this task and the problems that arise in making predictions. This motivates the use of computationally tractable predictive loss functions for which the prediction of maximum expected utility, minimum expected loss, can be enumerated exactly in reasonable computation time. The methods are motivated and illustrated by on-going studies into analysis of copy-number-variation in the human and cancer genomes where n is typically > 20,000.

Bayesian Variable Selection in Markov Mixture Models

Several Bayesian methods have been proposed in recent years to perform variable selection in the case of linear and generalized linear models. We review three of the most popular algorithms and show how they can be adapted to two models belonging to the class of Markov mixture models (MMM), i.e. the non-homogeneous hidden Markov models (NHHMMs) and the Markov switching autoregressive models with covariates (MSARMs). We also propose a new method to tackle the variable selection issue efficiently, both when the complexity of the model is high, as in MMM, and when the exogenous variables are strongly correlated. Numerical comparisons of the competing methods are presented via simulation examples. Finally, three ecological and environmental applications will be presented: the mapping of the species distribution of freshwater pearl mussels in a river through Bernoulli NHHMMs; the analysis of the dynamics of two stream isotopes and an air pollutant through non-homogeneous MSARMs. The talk is based on joint work with Roberta Paroli (Catholic University of Milan), Mark Brewer (Biomathematics & Statistics Scotland), Susan Cooksley (The James Hutton Institute) and Christian Birkel (University of Aberdeen).

Bayesian inference for partially observed Markov processes, with application to systems biology

Within the field of systems biology there is increasing interest in developing computational models which simulate the dynamics of intra-cellular biochemical reaction networks and incorporate the stochasticity inherent in such processes. These models can often be represented as nonlinear multivariate Markov processes. Analysing such models, comparing competing models and fitting model parameters to experimental data are all challenging problems. This talk will provide an overview of a Bayesian approach to the problem. Since the models are typically intractable, use will be made of algorithms exploiting forward simulation from the model in order to render the analysis "likelihood free". There have been a number of recent developments in the literature relevant to this problem, involving a mixture of sequential and Markov chain Monte Carlo methods. Particular emphasis will be placed on the problem of Bayesian parameter inference for the rate constants of stochastic biochemical network models, using noisy, partial high-resolution time course data, such as that obtained from single-cell fluorescence microscopy studies.

Coherent Inference on Distributed Bayesian Expert Systems

It is becoming increasingly necessary for different probabilistic expert systems to be networked together. Different collections of domain experts must independently specify their judgments within each component system and update these in the light of the data they receive. But in these circumstances what overarching beliefs must the collective agree on, and what types of data can be admitted in the system so that the collective acts as if it were a single Bayesian? In this talk I will explore these issues and illustrate the main technical problems through discussing some simple examples.

Probabilistic Programming

Bayesian methods can be difficult and time-consuming to implement correctly, particularly when working with complex statistical models. The result is that many developers resort to simpler, less powerful techniques. Furthermore, this difficulty limits what problems Bayesian inference can be used to solve—since even the most capable statistical developer will struggle to tackle models above a certain complexity.

Probabilistic programming aims to address both of these problems, by allowing the statistical model to be described in the form of a program and then automatically applying Bayesian inference to "execute" the program. At Microsoft Research Cambridge, we have been working since 2004 on a Bayesian inference engine called Infer.NET, which is capable of executing a wide variety of probabilistic programs. In this talk, I will give an overview of probabilistic programming and describe how Infer.NET works and what it has been used for. I will also suggest some directions that probabilistic programming may take to bring the power of Bayesian methods to an unprecedentedly wide audience.

Inference and computing with decomposable graphs

This work, joint with Alun Thomas (Utah), is a contribution to inference about structure in multivariate distributions through graphical modelling, and in particular, the use of decomposable undirected graphs. Unless there are very few variables, model spaces are huge, even with the restriction to decomposable graphs, and Bayesian inference poses a computational challenge. I will discuss some recent results on the enumeration of junction trees representing a given decomposable graph, and on local perturbations to junction trees that preserve decomposability. These results are put to use in the construction of a Markov chain sampler on junction trees that can be used to compute joint inference about structure and parameters in graphical models on quite a large scale.

Nonparametric Bayesian Models for Sparse Matrices and Covariances

Bayesian nonparametrics provides an elegant framework for developing flexible models. Much work has been done on nonparametric models of distributions (e.g. Dirichlet processes) and functions (e.g. Gaussian processes). I will focus on some of our recent work on modelling sparse matrices and graph structures (via the Indian Buffet Process), and on models of covariance matrices and operators (via the Wishart process).

Latent Force Models

Physics-based approaches to data modeling involve constructing an accurate mechanistic model of data, often based on differential equations. Statistical and machine learning approaches are typically data driven, perhaps through regularized function approximation. These two approaches to data modeling are often seen as polar opposites, but in reality they are two ends of a spectrum of approaches we might take. Physics-based approaches can be seen as *strongly mechanistic*: the mechanistic assumptions are hard-coded into the model. Data-driven approaches do incorporate assumptions that might be seen as being derived from some underlying mechanism, such as smoothness. In this sense they are *weakly mechanistic*.

In this talk we introduce latent force models. Latent force models are a new approach to data representation that model data through unknown forcing functions that drive differential equation models. By treating the unknown forcing functions with Gaussian process priors we can create probabilistic models that exhibit particular physical characteristics of interest, for example, resonance and inertia in dynamical systems. This allows us to perform a synthesis of the data-driven and physical modeling paradigms: a *moderately mechanistic* approach. We will show applications of these models in systems biology and (given time) modelling of human motion capture data.
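
As a rough illustration of the idea, and not the speaker's own formulation, a single-output second-order model could be written as below. The mass $m$, damping $c$, stiffness $k$ and covariance function $\kappa$ are symbols assumed here for the sketch.

$$
m\,\ddot{x}(t) + c\,\dot{x}(t) + k\,x(t) = u(t), \qquad u \sim \mathcal{GP}\big(0,\ \kappa(t, t')\big).
$$

Because the differential equation is linear in $x$ and $u$, placing a Gaussian process prior on the forcing function $u$ induces a Gaussian process prior on the output $x$ (for fixed initial conditions), whose covariance carries the inertia and resonance of the mechanical system.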

Does Bayes Theorem Work?

An examination of the rationale for the use of Bayes Theorem and an assessment of its strengths and limitations.

Bayesian Priors in the Brain

A growing idea in computational neuroscience is that perception and cognition can be successfully described using Bayesian inference models and that the brain is 'Bayes-optimal' under some constraints. I will briefly review the experimental evidence supporting this idea, as well as the 'Bayesian coding hypothesis': that the brain represents sensory information probabilistically, in the form of probability distributions. I will then present studies from my lab showing how perceptual expectations, which can be modelled as Bayesian priors, influence perception in the form of biases and hallucinations, and how they can be learned and unlearned through statistical learning.

Approximate Bayesian Computation for model selection

Approximate Bayesian computation (ABC) methods, also known as likelihood-free methods, have become a standard tool for the analysis of complex models, primarily in population genetics but also for complex financial models. The development of new ABC methodology has accelerated in the past few years, as shown by multiple publications, conferences and even software packages. While one valid interpretation of ABC-based estimation is connected with nonparametrics, the setting is quite different for model choice issues. We examined in Grelaud et al. (2009) the use of ABC for Bayesian model choice in the specific case of Gaussian random fields (GRF), relying on a sufficiency property only enjoyed by GRFs to show that the approach was legitimate. Despite having previously suggested the use of ABC for model choice in a wider range of models in the DIY ABC software (Cornuet et al., 2008), we present in Robert et al. (arXiv:1102.4432) theoretical evidence that the general use of ABC for model choice is fraught with danger, in the sense that no amount of computation, however large, can guarantee a proper approximation of the posterior probabilities of the models under comparison. This work shows as a corollary that GRFs are the most natural exception to this lack of convergence.

ABC-EP: Expectation Propagation for Likelihood-free Bayesian Computation

Many statistical models of interest to the natural and social sciences have no tractable likelihood function. Until recently, Bayesian inference for such models was thought infeasible. Pritchard et al. (1999) introduced an algorithm known as ABC, for Approximate Bayesian Computation, that enables Bayesian computation in such models. Despite steady progress since this first breakthrough, such as the adaptation of MCMC and Sequential Monte Carlo techniques to likelihood-free inference, state-of-the-art methods remain hard to use and require enormous computation times. Among other issues, one faces the difficult task of finding appropriate summary statistics for the model, and tuning the algorithm can be time-consuming when little prior information is available. We show that Expectation Propagation, a widely successful approximate inference technique, can be adapted to the likelihood-free context. The resulting algorithm does not require summary statistics, is an order of magnitude faster than existing techniques, and remains usable when prior information is vague. (joint work with Simon Barthelmé)
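
For readers new to the area, the basic rejection form of ABC that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of plain rejection ABC with a summary statistic, not the ABC-EP algorithm of the talk. The toy Gaussian model, the uniform prior, the tolerance and all function names are assumptions made for the example.

```python
import random
import statistics


def abc_rejection(observed, prior_sample, simulate, summary, tolerance, n_draws, rng):
    """Plain rejection ABC: keep prior draws whose simulated summary
    lies within `tolerance` of the observed summary."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)                      # draw a parameter from the prior
        fake = simulate(theta, len(observed), rng)     # forward-simulate a data set
        if abs(summary(fake) - s_obs) <= tolerance:    # compare summaries, no likelihood needed
            accepted.append(theta)
    return accepted


# Hypothetical toy problem: unknown mean of a unit-variance Gaussian.
rng = random.Random(0)
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]

posterior_draws = abc_rejection(
    observed,
    prior_sample=lambda r: r.uniform(-5.0, 5.0),                          # assumed flat prior
    simulate=lambda theta, n, r: [r.gauss(theta, 1.0) for _ in range(n)],
    summary=statistics.fmean,                                             # assumed summary statistic
    tolerance=0.1,
    n_draws=5000,
    rng=rng,
)
```

Only a small fraction of the prior draws is accepted, which hints at why tolerance tuning and the choice of summary statistic dominate the cost of such schemes.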

Bayesian computational methods for intractable continuous-time non-Gaussian time series

In this talk I will discuss recently developed methods for inference in continuous time state-space models which are characterised by strong elements of non-Gaussianity. The focus will be on heavy-tailed processes with jumps as are commonly encountered in many applications including finance and tracking. In the first part of the talk I will describe Monte Carlo methods for jump-diffusions, in which an (almost surely) finite number of jumps is present in any finite time interval. In these scenarios we present powerful particle filtering methods that treat jump times directly as part of the unknown state of the system - the Variable Rate Particle Filter. In the second part of the talk the methods are extended to processes where an infinite number of jumps is (almost surely) present in finite time intervals. Here we focus on $\alpha$-stable Lévy processes and apply some remarkable theorems about these processes which make them tractable for Monte Carlo analysis, both in static inference using MCMC and in evolving scenarios using particle filtering.

Efficient MCMC for Continuous Time Discrete State Systems

A variety of phenomena are best described using dynamical models which operate on a discrete state space and in continuous time. Examples include Markov jump processes, continuous time Bayesian networks, renewal processes and other point processes, with applications ranging from systems biology and genetics to computing networks and human-computer interaction. Posterior computations typically involve approximations like time discretization and can be computationally intensive. In this talk I will describe our recent work on a class of Markov chain Monte Carlo methods that allow efficient computations while still being exact. The core idea is to use an auxiliary variable Gibbs sampler based on uniformization, a representation of a continuous time dynamical system as a Markov chain operating over a discrete set of points drawn from a Poisson process. Joint work with Vinayak Rao.
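
The uniformization representation mentioned in the abstract can be illustrated with a short forward simulation of a Markov jump process. This sketch covers only the representation, not the auxiliary-variable Gibbs sampler of the talk. The function name, the two-state rate matrix and the choice of dominating rate are assumptions made for the example.

```python
import random


def simulate_by_uniformization(rate_matrix, initial_state, horizon, rng):
    """Simulate a Markov jump process on [0, horizon) via uniformization.

    Candidate times come from a Poisson process with rate `omega`; at each
    candidate time the state moves according to B = I + Q / omega, which
    allows self-transitions (no actual jump).
    """
    n = len(rate_matrix)
    # Any omega >= the largest exit rate is valid; doubling it is an arbitrary choice.
    # Assumes at least one state has a non-zero exit rate, so omega > 0.
    omega = 2.0 * max(-rate_matrix[i][i] for i in range(n))
    transition = [
        [(1.0 if i == j else 0.0) + rate_matrix[i][j] / omega for j in range(n)]
        for i in range(n)
    ]
    path = [(0.0, initial_state)]   # list of (jump time, new state)
    t, state = 0.0, initial_state
    while True:
        t += rng.expovariate(omega)             # next point of the Poisson process
        if t >= horizon:
            return path
        new_state = rng.choices(range(n), weights=transition[state])[0]
        if new_state != state:                  # record only real jumps
            path.append((t, new_state))
            state = new_state


# Hypothetical two-state rate matrix (rows sum to zero).
Q = [[-1.0, 1.0],
     [2.0, -2.0]]
path = simulate_by_uniformization(Q, initial_state=0, horizon=10.0, rng=random.Random(1))
```

Dropping the self-transitions recovers an ordinary jump-process path, so the Poisson grid acts as a random discretization that introduces no approximation error.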

Adaptive Control and Bayesian Inference

Humans are able to learn complex motor coordination incredibly rapidly. Various undesirable properties of human hardware, such as fatigue, slow actuators, noisy perception, and slow and noisy neural pathways, are more than compensated for by the astounding flexibility and adaptivity underlying learning and adaptation of motor commands.

In this talk, I will demonstrate motor learning in both simple and more complex mechanical systems. Learning is done by Bayesian inference and it is shown how extremely rapid learning can be achieved from essentially no prior information. The crucial aspects of the algorithm are the flexibility of the non-parametric models, the quantification of all types of uncertainty, and the integration over the posterior distribution, without which the algorithm fails to learn.

This is joint work with Marc Deisenroth and Philipp Hennig.

Bernstein - von Mises theorem for irregular statistical models

The Bernstein - von Mises theorem states that for regular statistical models, namely if the true parameter value is an interior point of the parameter space and the Fisher information matrix at this point is of full rank, then, as the sample size grows, the appropriately rescaled posterior distribution converges to a Gaussian distribution independent of the prior, provided the prior distribution is continuous at the true parameter value.
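
For orientation, the regular case is commonly written in the following form. This is a standard textbook formulation rather than the speaker's notation: the posterior $\Pi$, the efficient estimator $\hat\theta_n$ (for example the maximum likelihood estimator), the true value $\theta_0$ and the Fisher information $I(\theta_0)$ are symbols assumed here.

$$
\left\| \Pi\!\left( \sqrt{n}\,(\theta - \hat\theta_n) \in \cdot \;\middle|\; X_1, \dots, X_n \right) - N\!\left(0,\ I(\theta_0)^{-1}\right) \right\|_{\mathrm{TV}} \longrightarrow 0 \quad \text{in } P_{\theta_0}\text{-probability as } n \to \infty .
$$

The limit involves $I(\theta_0)^{-1}$, which exists only when the Fisher information matrix has full rank.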

However, for an irregular statistical model where the above assumptions do not hold, the posterior distribution exhibits a different behaviour. We study the limit of the appropriately rescaled posterior distribution in the case where the Fisher information matrix is not of full rank, and where the point of concentration of the posterior distribution may lie on the boundary of the parameter space, and show that in the latter case it may differ from a Gaussian distribution.

This study is illustrated by several examples including emission tomography modelled as an ill-posed linear inverse problem with Poisson errors.