# Abstracts Research Seminar Summer Term 2013

## Sonia Petrone: Bayesian Nonparametric Inference for Hidden Markov Models: An Overview and Some New Insights

Bayesian nonparametrics has grown impressively in recent years. One of the keys to this development is the construction of priors through a predictive approach.

In this talk, I will give an overview of some of these constructions, focussing on Markov exchangeable sequences and discussing applications in Bayesian nonparametric inference for hidden Markov models.

Some insights are given which shed light on connections between reinforced urn schemes and constructions recently proposed in the machine learning literature, such as the infinite hidden Markov model, offering a general framework for a deeper study of their theoretical properties.

## Peter Stadler: Discoveries in Genomes and Transcriptomes: Challenges in High Throughput Sequencing Data Analysis

Following the completion of the human genome project at the turn of the millennium, major technical advances in sequencing technologies have fundamentally changed much of the life sciences. Suddenly data are not scarce and expensive any more. Instead, data analysis has become the dominating and limiting component of research. At the same time, our picture of the human genome has changed profoundly. Instead of a simple 'beads on a string' model of well-separated, individualized genes, several large-scale international survey efforts such as the ENCODE project have drawn a much more complex picture of an intricate network of intertwined transcripts.

In my presentation I will focus on some of the computational and algorithmic challenges in the field of high-throughput genomics and transcriptomics: Genome assembly and the mapping of transcriptomics data to reference genomes form the basis of most subsequent analysis steps, in particular the detection of sequence variations. Large-scale sequence comparison then opens the door to answering questions about sequence evolution and to detecting functional elements, from protein-coding regions to RNA secondary structures.

## Rodney Strachan: Efficient Simulation and Integrated Likelihood Estimation in Non-Linear Non-Gaussian State Space Models

(Joshua C.C. Chan & Rodney Strachan, Research School of Economics, Australian National University)

We propose a generic approach to inference in the non-linear, non-Gaussian state space model. This approach builds on recent developments in precision-based algorithms for estimating general state space models with multivariate observations and states. The baseline algorithm approximates the conditional distribution of the states by a multivariate t density, which is then used for integrated likelihood estimation via importance sampling or for posterior simulation using Markov chain Monte Carlo (MCMC). We build further upon this baseline approach to consider more sophisticated algorithms such as accept-reject Metropolis-Hastings and variational approximation. To illustrate the proposed approach, we estimate the risk of a liquidity trap in the US under a time-varying parameter vector autoregressive (TVP-VAR) model with stochastic volatility.

## John Geweke: Adaptive Sequential Posterior Simulators for Massively Parallel Computing Environments

(Authors: Garland Durham, Quantos Analytics, LLC and John Geweke, Economics Discipline Group, UTS Business School, University of Technology, Sydney)

Massively parallel desktop computing capabilities now well within the reach of individual academics modify the environment for posterior simulation in fundamental and potentially quite advantageous ways. But to fully exploit these benefits, algorithms that conform to parallel computing environments are needed. Sequential Monte Carlo comes very close to this ideal whereas other approaches like Markov chain Monte Carlo do not. This paper presents a sequential posterior simulator well suited to this computing environment. The simulator makes fewer analytical and programming demands on investigators, and is faster, more reliable and more complete than conventional posterior simulators. The paper extends existing sequential Monte Carlo methods and theory to provide a thorough and practical foundation for sequential posterior simulation that is well suited to massively parallel computing environments. It provides detailed recommendations on implementation, yielding an algorithm that requires only code for simulation from the prior and evaluation of prior and data densities and works well in a variety of applications representative of serious empirical work in economics and finance. The algorithm is robust to pathological posterior distributions, generates accurate marginal likelihood approximations, and provides estimates of numerical standard error and relative numerical efficiency intrinsically. The paper concludes with an application that illustrates the potential of these simulators for applied Bayesian inference.

## Alan Agresti: Modeling Ordinal Categorical Data

This overview lecture surveys methods for analyzing categorical response variables that have a natural ordering of the categories. Such data often occur in the social sciences (e.g., for measuring attitudes and opinions) and in medical and public health disciplines (e.g., pain, quality of life, severity of a condition). Topics to be covered include logistic regression models using cumulative logits with proportional odds structure, other ordinal logistic regression models, and other multinomial response models (e.g., cumulative probit). Examples shown use SAS and R software. The presentation emphasizes interpretation of the methods rather than technical details, with examples including randomized clinical trials and social survey data.
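As a small illustration of the proportional odds structure mentioned above, the following sketch computes category probabilities from a cumulative logit model. It assumes the parameterization logit P(Y ≤ j | x) = α_j − β·x (sign conventions differ between software packages); the cutpoints and the coefficient are hypothetical values chosen for illustration, not taken from the lecture.

```python
import math


def expit(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(cutpoints, beta, x):
    """Category probabilities under a cumulative logit model.

    Assumed parameterization (one of several in use):
        logit P(Y <= j | x) = cutpoints[j] - sum(beta * x)
    with strictly increasing cutpoints.
    """
    eta = sum(b * xi for b, xi in zip(beta, x))
    # Cumulative probabilities; the last category closes at 1.
    cum = [expit(a - eta) for a in cutpoints] + [1.0]
    # Differences of adjacent cumulative probabilities give category probabilities.
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


def cumulative_odds(probs):
    """Odds of Y <= j versus Y > j for each cutpoint j."""
    odds, running = [], 0.0
    for p in probs[:-1]:
        running += p
        odds.append(running / (1.0 - running))
    return odds


# Hypothetical parameters: four ordered categories, one covariate.
cutpoints = [-1.0, 0.5, 2.0]
beta = [0.8]

p_x0 = category_probs(cutpoints, beta, [0.0])
p_x1 = category_probs(cutpoints, beta, [1.0])

# Proportional odds: the cumulative odds ratio is the same at every cutpoint.
odds_ratios = [a / b for a, b in zip(cumulative_odds(p_x0), cumulative_odds(p_x1))]
print(p_x0, p_x1, odds_ratios)
```

Under this parameterization each cumulative odds ratio equals exp(β) regardless of the cutpoint, which is what makes a single coefficient per covariate interpretable.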

The lecture will take material from the book, “Analysis of Ordinal Categorical Data” by Alan Agresti (2nd ed., Wiley, 2010).

## Alan Agresti: Some Remarks on Latent Variable Models in Categorical Data Analysis

This talk presents a historical overview of some important and/or interesting contributions to the latent variable literature for the analysis of multivariate categorical responses. There is by now an enormous literature on latent variable models for categorical responses, so the presentation is necessarily quite selective. As part of the presentation, I'll briefly summarize some current work on a way to summarize evidence supporting a particular latent structure for contingency tables with ordinal responses, and I'll raise a couple of questions that may suggest future research work.

## Cristiano Varin: The Ranking Lasso

Ranking a vector of alternatives on the basis of a series of paired comparisons is a relevant topic in many instances. A popular example is ranking contestants in sport tournaments. To this purpose, paired comparison models such as the Bradley-Terry model are often used. In this talk, I will discuss fitting paired comparison models with a lasso-type procedure that forces contestants with similar abilities to be classified into the same group. Benefits of the proposed method are easier interpretation of rankings and a significant improvement of the quality of predictions with respect to the standard maximum likelihood fitting. The proposed fitting method poses non-trivial computational difficulties that will be discussed in detail. The methodology is illustrated through ranking of teams in sport competitions and ranking of statistical journals based on citation exchange.
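For orientation, the sketch below fits the plain Bradley-Terry model by maximum likelihood, i.e. the baseline the abstract compares against, not the ranking lasso itself. It uses the classical iterative (minorization-maximization) update for the ability parameters; the win counts are hypothetical, and the update assumes every contestant has at least one win and one loss.

```python
def fit_bradley_terry(wins, iters=500):
    """Maximum likelihood abilities for the Bradley-Terry model.

    wins[i][j] is the number of times contestant i beat contestant j.
    The model assumes P(i beats j) = p[i] / (p[i] + p[j]).
    Uses the classical MM update; abilities are normalized to sum to 1.
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(iters):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        scale = sum(updated)
        p = [v / scale for v in updated]
    return p


# Hypothetical balanced tournament: each pair of teams met 10 times.
wins = [
    [0, 7, 8],
    [3, 0, 6],
    [2, 4, 0],
]
abilities = fit_bradley_terry(wins)
ranking = sorted(range(len(abilities)), key=lambda i: -abilities[i])
print(abilities, ranking)
```

A lasso-type procedure as described in the abstract would additionally penalize differences between abilities so that similar contestants are fused into groups; that step is not shown here.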

This talk is based on joint work with Guido Masarotto and with Manuela Cattelan and David Firth.

## Alan Agresti: Good Confidence Intervals for Categorical Data Analyses

This talk surveys confidence intervals that perform well for estimating parameters used in categorical data analysis. Considerable research has now shown that intervals resulting from inverting score tests perform much better than inverting Wald tests and often better than inverting likelihood-ratio tests. For some models, ordinary score-test-based inferences are impractical, such as when the likelihood function is not an explicit function of the model parameters. For such cases, we propose pseudo-score inference based on a Pearson-type chi-squared statistic. For small samples, “exact” methods are conservative inferentially, but inverting a score test using the mid-P value provides a sensible compromise. Finally, we briefly review a different pseudo-score approach that approximates the score interval for proportions and their differences with independent or dependent samples by adding pseudo data before forming simple Wald confidence intervals.
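To make the comparison concrete for a single binomial proportion, the sketch below computes a 95% Wald interval, the score (Wilson) interval, and a Wald interval formed after adding pseudo data (two successes and two failures, the "plus four" adjustment). The data values are hypothetical, and clipping the Wald-type intervals to [0, 1] is a choice made here for readability.

```python
import math

Z95 = 1.959963984540054  # standard normal 0.975 quantile


def wald_interval(x, n, z=Z95):
    """Wald interval: estimate plus/minus z times the estimated standard error."""
    p = x / n
    half = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


def score_interval(x, n, z=Z95):
    """Score (Wilson) interval, obtained by inverting the score test."""
    p = x / n
    scale = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / scale
    half = (z / scale) * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n))
    return center - half, center + half


def plus_four_interval(x, n, z=Z95):
    """Wald interval after adding 2 pseudo successes and 2 pseudo failures."""
    return wald_interval(x + 2, n + 4, z)


# Hypothetical small sample with no observed successes.
x, n = 0, 10
wald = wald_interval(x, n)
score = score_interval(x, n)
plus_four = plus_four_interval(x, n)
print(wald, score, plus_four)
```

With zero successes the Wald interval collapses to a single point, while the score interval and the pseudo-data interval remain informative and have similar midpoints.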

## Alexander McNeil: Copula Families that Generalise the Archimedean Class

The standard examples of Archimedean copulas arise from multivariate shared frailty models in survival analysis. We will look at the connection between Archimedean copulas and simplex distributions and then show various ways of generalising the Archimedean family to obtain richer dependence models for real-world applications.
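The frailty construction can be sketched for one standard member of the class. The example below samples from a Clayton copula by drawing a shared gamma frailty and transforming independent exponentials (the Marshall-Olkin construction); the choice of the Clayton family, the parameter value and the sample size are assumptions made for illustration, not details from the talk.

```python
import random


def sample_clayton(theta, dim, rng):
    """One draw from a Clayton copula via a shared gamma frailty.

    V ~ Gamma(shape=1/theta, scale=1) is the frailty; given V, the
    components are independent. The generator psi(t) = (1 + t)^(-1/theta)
    is the Laplace transform of V.
    """
    v = rng.gammavariate(1.0 / theta, 1.0)
    return [(1.0 + rng.expovariate(1.0) / v) ** (-1.0 / theta) for _ in range(dim)]


def kendall_tau(pairs):
    """Sample Kendall's tau by direct pair counting (O(n^2), no tie handling)."""
    n = len(pairs)
    score = 0
    for i in range(n):
        xi, yi = pairs[i]
        for j in range(i + 1, n):
            xj, yj = pairs[j]
            s = (xi - xj) * (yi - yj)
            score += 1 if s > 0 else -1 if s < 0 else 0
    return 2.0 * score / (n * (n - 1))


rng = random.Random(2013)
theta = 2.0  # hypothetical dependence parameter
draws = [sample_clayton(theta, 2, rng) for _ in range(2000)]
tau_hat = kendall_tau([(u, w) for u, w in draws])
tau_theory = theta / (theta + 2.0)  # Kendall's tau of the Clayton copula
print(tau_hat, tau_theory)
```

The sample Kendall's tau should lie close to the theoretical value θ/(θ+2) for the Clayton family.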

This talk is based on joint work with Johanna Neslehova and Marius Hofert.

## Kemal Dinçer Dingeç: New Control Variates for Levy Processes and Asian Options

We present two new variance reduction methods for Levy processes and Asian options. The first method is a general control variate technique for Monte Carlo estimation of the expectations of the functionals of Levy processes. It is based on fast numerical inversion of the cumulative distribution functions and exploits the strong correlation between the increments of the original process and Brownian motion. In the suggested control variate framework, a similar functional of Brownian motion is used as a main control variate while some other characteristics of the paths are used as auxiliary control variates. The method is applicable to all types of Levy processes for which the probability density function of the increments is available in closed form. We present the applications of our general approach to the simulation of path-dependent options. Numerical experiments confirm that our method achieves considerable variance reduction.
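The general control variate idea can be illustrated in a much simpler setting than the one treated in the talk. The sketch below prices an arithmetic Asian call under geometric Brownian motion (an assumed model, not a Levy process) and uses the arithmetic average price, whose risk-neutral mean is known in closed form, as a single control variate. All parameter values are hypothetical, and the coefficient is estimated from the same sample, which introduces a small bias.

```python
import math
import random


def asian_call_control_variate(n_paths=20000, n_steps=12, s0=100.0, strike=100.0,
                               r=0.05, sigma=0.2, maturity=1.0, seed=1):
    """Plain and control-variate Monte Carlo estimates of an Asian call price.

    Returns (plain_estimate, cv_estimate, plain_variance, cv_variance),
    where the variances are per-sample variances.
    """
    rng = random.Random(seed)
    dt = maturity / n_steps
    drift = (r - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    discount = math.exp(-r * maturity)
    # Known mean of the control: E[S_t] = s0 * exp(r t) under the risk-neutral measure.
    control_mean = s0 * sum(math.exp(r * k * dt) for k in range(1, n_steps + 1)) / n_steps

    payoffs, controls = [], []
    for _ in range(n_paths):
        log_s, total = math.log(s0), 0.0
        for _ in range(n_steps):
            log_s += drift + vol * rng.gauss(0.0, 1.0)
            total += math.exp(log_s)
        average = total / n_steps
        payoffs.append(discount * max(average - strike, 0.0))
        controls.append(average)

    mean_y = sum(payoffs) / n_paths
    mean_x = sum(controls) / n_paths
    var_y = sum((y - mean_y) ** 2 for y in payoffs) / (n_paths - 1)
    var_x = sum((x - mean_x) ** 2 for x in controls) / (n_paths - 1)
    cov = sum((y - mean_y) * (x - mean_x) for y, x in zip(payoffs, controls)) / (n_paths - 1)

    c = cov / var_x  # estimated variance-minimizing coefficient
    cv_estimate = mean_y - c * (mean_x - control_mean)
    cv_variance = var_y - cov ** 2 / var_x
    return mean_y, cv_estimate, var_y, cv_variance


plain, controlled, plain_var, controlled_var = asian_call_control_variate()
print(plain, controlled, plain_var / controlled_var)
```

The printed ratio is the estimated variance reduction factor; the stronger the correlation between the payoff and the control, the larger it becomes, which is the effect the abstract exploits with Brownian-motion-based controls.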

The second variance reduction method is suggested for Asian options under a general model framework. The three special cases we consider are Levy processes, Heston stochastic volatility and regime switching models. The proposed method is a combination of control variate and conditional Monte Carlo techniques. While the control variate can be used for any model allowing the numerical computation of the multivariate characteristic function of the log-return vector, conditional Monte Carlo is based on the unified representation of the three models. Computational results confirm that the new method performs better than the other available control variate methods.