\documentclass{article}
\usepackage{a4wide, url}
\newcommand{\code}[1]{\texttt{#1}}
\newcommand{\file}[1]{\textsf{#1}}
\title{R Code for Cutting Feedback in Bayesian Models}
\author{Martyn Plummer}
\date{August 29, 2014}
\begin{document}
\maketitle
This appendix contains the R code to reproduce the examples in the
article ``Cutting Feedback in Bayesian Models''. In order to run the
examples, you must download the file \file{Cuts.Rnw}, which contains
the Sweave source code, and from within R run
\begin{verbatim}
> Stangle("Cuts.Rnw")
\end{verbatim}
This will produce an R script \file{Cuts.R}. The PDF document can be
reproduced by running, from within R
\begin{verbatim}
> Sweave("Cuts.Rnw")
\end{verbatim}
and then processing the resulting LaTeX files with
\begin{verbatim}
pdflatex Cuts
\end{verbatim}
Repeat the \code{pdflatex} command to get the correct cross-references.
There are two sections to this appendix. The first using OpenBUGS, via
the \code{BRugs} package, and shows how the naive cut algorithm fails
to converge. The second part uses custom R code and demonstrates the
tempered cut algorithm.
This appendix uses the R packages \code{BRugs}, \code{coda}, and
\code{RColorBrewer}, which can all be downloaded from CRAN
(\url{http://cran.r-project.org}).
<>=
library(BRugs)
library(coda)
library(RColorBrewer)
@
BRugs 0.8-3, which contains a copy of OpenBUGS 3.2.2 was used for
the analysis in the paper. It can be run on Windows or on x86 Linux.
\section{Testing the WinBUGS cut function for different samplers}
First we introduce the data for the problem. In BUGS code, it is
possible to give more descriptive names to the data than in the
mathematical notation used in the article. If you wish to
cross-reference with the article, use the symbols used in brackets:
\code{Npart} ($N$) is the number of participants in the HPV prevalence
survey; \code{nhpv} ($Z$) is the number of high-risk HPV-positive
participants; \code{Npop} ($T$) is the population size covered by the
cancer registry corresponding to the area of the survey; \code{ncases}
($Y$) is the number of cases of cervical cancer.
<>=
nhpv <- c(7, 6, 10, 10, 1, 1, 10, 4, 35, 0, 10, 8, 4)
Npart <- c(111, 71, 162, 188, 145, 215, 166, 37, 173,
143, 229, 696, 93)
ncases <- c(16, 215, 362, 97, 76, 62, 710, 56, 133,28, 62, 413, 194)
Npop <- c(26983, 250930, 829348, 157775, 150467, 352445, 553066,
26751, 75815, 150302, 354993, 3683043, 507218)
@
The model is a Poisson regression model for cervical cancer cases,
depending on the unobserved prevalence of HPV in the population. The
prevalence is estimated using a binomial model for the number of
HPV-positive women. The full Bayesian model, shown below, is
contained in the file \file{FullModel.bug}.
\begin{verbatim}
model {
## Disease model
for (j in 1:13) {
ncases[j] ~ dpois(mean[j])
log(mean[j]) <- theta1 + phi[j] * theta2 + log(Npop[j] * 1.0E-3)
}
theta1 ~ dnorm(0, 1.0E-3)
theta2 ~ dnorm(0, 1.0E-3) I(0,)
## Measurement model
for (j in 1:13) {
nhpv[j] ~ dbin(phi[j], Npart[j])
}
## Exposure model
for (j in 1:13) {
phi[j] ~ dunif(0, 1)
}
}
\end{verbatim}
<>=
fullmodel <- c(
"model {",
"",
" ## Disease model",
" for (j in 1:13) {",
" ncases[j] ~ dpois(mean[j])",
" log(mean[j]) <- theta1 + phi[j] * theta2 + log(Npop[j] * 1.0E-3) ",
" }",
" ",
" theta1 ~ dnorm(0, 1.0E-3)",
" theta2 ~ dnorm(0, 1.0E-3) I(0,)",
"",
" ## Measurement model",
" for (j in 1:13) {",
" nhpv[j] ~ dbin(phi[j], Npart[j])",
" }",
"",
" ## Exposure model",
" for (j in 1:13) {",
" phi[j] ~ dunif(0, 1)",
" }",
"",
"}")
writeLines(fullmodel, "FullModel.bug")
@
The cut model is contained in the file \file{CutModel.bug} and is
shown below. The cut model differs from the full model by the addition
of new nodes \code{p.cut} depending on \code{p} via the WinBUGS
\code{cut} function.
\begin{verbatim}
model {
## Disease model
for (j in 1:13) {
ncases[j] ~ dpois(mean[j])
log(mean[j]) <- theta1 + p.cut[j] * theta2 + log(Npop[j] * 1.0E-3)
}
theta1 ~ dnorm(0, 1.0E-3)
theta2 ~ dnorm(0, 1.0E-3)
## Cuts
for (j in 1:13) {
p.cut[j] <- cut(phi[j])
}
## Measurement model - below the cut
for (j in 1:13) {
nhpv[j] ~ dbin(phi[j], Npart[j])
}
## Exposure model - below the cut
for (j in 1:13) {
phi[j] ~ dunif(0, 1)
}
}
\end{verbatim}
<>=
cutmodel <- c(
"model {",
"",
" ## Disease model",
" for (j in 1:13) {",
" ncases[j] ~ dpois(mean[j])",
" log(mean[j]) <- theta1 + p.cut[j] * theta2 + log(Npop[j] * 1.0E-3) ",
" }",
"",
" theta1 ~ dnorm(0, 1.0E-3)",
" theta2 ~ dnorm(0, 1.0E-3)",
"",
" ## Cuts",
" for (j in 1:13) {",
" p.cut[j] <- cut(phi[j])",
" }",
"",
" ## Measurement model - below the cut",
" for (j in 1:13) {",
" nhpv[j] ~ dbin(phi[j], Npart[j])",
" }",
"",
" ## Exposure model - below the cut",
" for (j in 1:13) {",
" phi[j] ~ dunif(0, 1)",
" }",
"",
"}")
writeLines(cutmodel, "CutModel.bug")
@
The cut model is fitted using functions from the \code{BRugs} package,
wrapped up inside the \code{fit} function. This takes three arguments:
\code{model}, a string giving the name of the file with the BUGS model
code; \code{N.iterations}, which is both the number of iterations to
keep as samples and the number of iterations to discard as burn-in;
and \code{thin}, the thinning interval when sampling. The \code{fit}
function runs two parallel chains with starting values automatically
generated by OpenBUGS by sampling from the prior.
<>=
fit <- function(model, N.iterations, thin)
{
data <- c("nhpv","Npart","ncases","Npop")
modelCheck(model)
modelData(bugsData(data))
modelCompile(2)
modelGenInits()
modelUpdate(N.iterations, thin=thin)
samplesSet("theta2")
modelUpdate(N.iterations, thin=thin)
buildMCMC("theta2")
}
@
To compare different updating (i.e. sampling) methods, we disable
all potential samplers for \code{theta1} and \code{theta2}. This is done
below using the \code{modelDisable} function from the \code{Brugs}
package. The list of possible samplers is obtained from an interactive
OpenBUGS session. When the model is compiled, the updaters in use can
be inspected using the \code{Info} menu in the GUI and selecting
\code{Updaters(by name)}. Furthermore, updaters can be turned off by
the user before compiling the model using the \code{Updater
options...} dialogue under the \code{Model} menu. By inspecting the
updaters in use for \code{theta1} and \code{theta2}, disabling them, and
then recompiling the model to reveal the next available sampler, we
obtain the list of all possible samplers.
<>=
modelDisable('logit/log-linear block glm')
modelDisable('logit/log-linear rejection')
modelDisable('adaptive metropolis 1D')
modelDisable('adaptive acceptance rate')
modelDisable('state dependent scale')
modelDisable('slice')
@
With all possible samplers disabled, the cut model can no longer be
compiled in OpenBUGS.
Now we re-enable each sampler in turn and fit the model with the
\code{fit} function. We compare only the first three methods.
Although the OpenBUGS 3.2.2 GUI allows the ``adaptive acceptance
rate'' and ``state dependent scale'' samplers, BRugs 0.8-3 does
not. In addition, the ``slice'' updater does not converge for the cut
model.
The example in the article is derived from a long run of 50000
iterations in order to ensure convergence of the two chains. Here for
the sake of time we run only 1000 iterations.
<>=
updaters <- c('logit/log-linear block glm',
'logit/log-linear rejection',
'adaptive metropolis 1D')
cut.samples <- vector("list", 3)
for (i in 1:3) {
modelEnable(updaters[i])
cut.samples[[i]] <- fit("CutModel.bug", N.iterations=1000, thin=20)
modelDisable(updaters[i])
}
@
Figure \ref{fig:cut} is created using the \code{plot.samples} function:
<>=
plot.samples <- function(s)
{
ymax <- sapply(s, function(x) { max(density(unlist(x))$y) })
xrange <- sapply(s, function(x) { quantile(unlist(x), c(0.002, 0.998)) })
plot(NA, xlim=range(xrange), ylim=c(0, max(ymax)), ylab="density",
xlab=expression(theta[2]))
pal <- brewer.pal(3, name="Dark2")
for (i in 1:3) {
for (j in 1:2) {
lines(density(s[[i]][[j]]), col=pal[i])
}
}
legend("topright", col=pal, lty=1, lwd=2,
legend=c("log-linear block glm",
"log-linear rejection",
"adaptive metropolis 1D"))
}
@
\begin{figure}[p]
\begin{center}
<>=
plot.samples(cut.samples)
@
\end{center}
\caption{Estimates for the cut model}
\label{fig:cut}
\end{figure}
\clearpage
\section{Tempered cut algorithm}
We now introduce custom R code r running the tempered cut algorithm on
the same model. Under the cut model, the prevalence parameters
\code{p} have independent conjugate theta2 distributions and so are easy
to sample. The \code{update.p} function returns a new sample of
\code{p}.
<>=
update.p <- function(nhpv, Npart) {
shape1 <- 1 + nhpv
shape2 <- 1 + (Npart - nhpv)
rbeta(length(shape1), shape1, shape2)
}
@
To update the \code{theta1} and \code{theta2} parameters, we use a
random walk Metropolis algorithm. In a minor change from the OpenBUGS
model, we use an improper flat prior for these parameters. Thus the
Metropolis-Hastings rejection ratio \code{R} depends only on
the likelihood ratio between the current and proposed values. The
log-likelihood is calculated by the \code{poisson.loglik} function
<>=
poisson.loglik <- function(Npop, ncases, theta1, theta2, p)
{
mean <- Npop * 1.0E-3 * exp(theta1 + p * theta2)
sum(dpois(ncases, mean, log=TRUE))
}
@
The \code{update.theta1} function applies a single random-walk update
to \code{alkpha} returns the next sample for \code{theta1} (which is
the same as the current sample if the update is rejected). The step
size of the random walk is determined by the argument \code{STEP}.
<>=
update.theta1 <- function(Npop, ncases, theta1, theta2, p, STEP)
{
logdensity <- function(a) {
poisson.loglik(Npop, ncases, a, theta2, p)
}
logdensity0 <- logdensity(theta1)
theta1.new <- theta1 + rnorm(1, mean=0, sd=STEP)
logdensity1 <- logdensity(theta1.new)
R <- exp(logdensity1 - logdensity0)
if (runif(1) < R) theta1.new else theta1
}
@
The \code{update.theta2} function is almost identical, but returns the
next sample of \code{theta2}.
<>=
update.theta2 <- function(Npop, ncases, theta1, theta2, p, STEP)
{
logdensity <- function(b) {
poisson.loglik(Npop, ncases, theta1, b, p)
}
logdensity0 <- logdensity(theta2)
theta2.new <- theta2 + rnorm(1, mean=0, sd=STEP)
logdensity1 <- logdensity(theta2.new)
R <- exp(logdensity1 - logdensity0)
if (runif(1) < R) theta2.new else theta2
}
@
Putting these together, the \code{sample.temper} function returns
sampled values of \code{theta1} and \code{theta2} for \code{niter}
iterations using the tempered sampler. Each iteration contains an
inner loop in which the value of \code{p} is moved in steps from its
old to its new value. The number of steps is controlled by the
\code{ntemper} parameter, and its default value of 1 corresponds to
the naive cut algorithm.
<>=
sample.temper <- function(niter, step.theta1=0.1, step.theta2=1, ntemper=1)
{
theta1 <- rnorm(1, mean=-1.66, sd=0.344)
theta2 <- rnorm(1, mean=13.05, sd=3.39)
p <- rep(13, 0.5)
theta1.samples <- theta2.samples <- numeric(niter)
for (i in 1:niter) {
p.old <- p
p.new <- update.p(nhpv, Npart)
for (t in 1:ntemper) {
pt <- ((ntemper - t) * p.old + t * p.new) / ntemper
theta1 <- update.theta1(Npop, ncases, theta1, theta2, pt, step.theta1)
theta2 <- update.theta2(Npop, ncases, theta1, theta2, pt, step.theta2)
}
p <- p.new
theta1.samples[i] <- theta1
theta2.samples[i] <- theta2
}
list("theta1" = theta1.samples, "theta2" = theta2.samples)
}
@
The \code{run.temper} function runs the \code{sample.temper} function
for a sequence of values of \code{ntemper} from 1 to 128, increasing
in powers of 2. The \code{run.temper} function returns a list of
sampled values for both \code{theta1} and \code{theta2}: each element of
the list corresponds to sampled values for a different value of
\code{ntemper}.
<>=
run.temper <- function(nsim=1000, nburnin=1000, seed=121876) {
set.seed(seed)
Nstep <- 8
#Based on pilot runs, we set thinning interval to get approximately
#independent samples from each chain (and hence the same effective
#sample size).
thin <- c(50, 30, 12, 6, 3, 2, 2, 1)
theta1.samples <- theta2.samples <- vector("list", Nstep)
for (t in 1:Nstep) {
ntemper <- 2^(t-1)
new.samp <- sample.temper(thin[t] * (nsim + nburnin), ntemper=ntemper)
theta1 <- as.mcmc(new.samp$theta1)
theta1 <- window(theta1, start=nburnin * thin[t] + 1, thin=thin[t])
theta1.samples[[t]] <- theta1
theta2 <- as.mcmc(new.samp$theta2)
theta2 <- window(theta2, start=nburnin * thin[t] + 1, thin=thin[t])
theta2.samples[[t]] <- theta2
}
list(theta1=theta1.samples, theta2=theta2.samples)
}
@
Results are plotted with the \code{plot.temper} function.
<>=
plot.temper <- function(s) {
plot(density(s[[1]], bw="nrd"), type = "n", main="",
xlab=expression(theta[2]))
Nstep <- length(s)
for (t in 1:Nstep) {
lines(density(s[[t]], bw="nrd"), col=grey(1 - t/Nstep), lwd=2)
}
legend("topright", col=grey(1 - (1:Nstep)/Nstep), lty=1, lwd=2,
legend=formatC(2^(0:(Nstep-1)), width=2), title="Steps")
}
@
Figure~\ref{fig:temper} is created by running the \code{run.temper}
function and passing the samples of \code{theta2} to \code{plot.temper}.
<>=
tempered.samples <- run.temper()
plot.temper(tempered.samples$theta2)
@
\begin{figure}[p]
\begin{center}
<>=
plot.temper(tempered.samples$theta2)
@
\end{center}
\caption{Estimates from the tempered sampler for varying number of
tempering steps}
\label{fig:temper}
\end{figure}
\end{document}