This module implements Markov Chain Monte Carlo (MCMC) algorithms, which are used to sample from a target density. All samplers operate on log-densities instead of densities, both for efficiency and for numerical stability.
Implements the No-U-Turn Sampler (NUTS) of Hoffman & Gelman [13]. This is an adaptive MCMC algorithm belonging to the class of Hamiltonian Monte Carlo (HMC) samplers; as such, the gradient of the log-density is required to carry out the simulation. The `sci.diff` module allows the gradient to be computed automatically, without approximations and with minimal user effort.
The algorithm is implemented according to the paper mentioned above, but additionally a diagonal (or, optionally, dense) mass matrix is estimated during the adaptation phase to improve the performance of the sampler. Similar improvements have been implemented in the Stan project's version of NUTS.
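For instance, a plain log-density can be turned into the gradient-computing function expected by the sampler via `diff.gradientf`, as done in the full example at the end of this section. A minimal sketch, here using a standard bivariate normal (up to an additive constant):

local diff = require "sci.diff"

-- Log-density of a standard bivariate normal, up to a constant:
local function logpdf(x)
  return -0.5*(x[1]^2 + x[2]^2)
end

-- dlogpdf(x, grad) fills grad with the gradient of logpdf at x
-- and returns the value of logpdf at x:
local dlogpdf = diff.gradientf(logpdf, 2)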
The argument `rng` is a prng object. The function `f(x, grad)` sets `grad` to the gradient of the log-density at `x` and returns the value of the log-density at `x`. Both `x` and `grad` must have the same length as `theta0`, which is the starting value of the Markov chain (adaptation phase); the log-density must evaluate to a finite value at `theta0`. A hand-written example of `f` is sketched after the table below. The argument `options` is a table and the algorithm makes use of the following string keys (defaults in square brackets):
| Key | Description |
| --- | --- |
| `stopadapt` | [1024] number of adaptation samples; 0, or a power of 2 >= 64 |
| `stop` | number of samples, or a `function(theta)` |
| `olstat` | on-line statistical estimators |
| `mass` | ["diagonal"] can be "diagonal" or "dense" |
| `eps` | [rough guess] initial value for `eps` |
| `cov` | [nil] initial covariance: dense mass^-1 (matrix) |
| `var` | [nil] initial variance: diagonal mass^-1 (vector) |
| `delta` | [0.8] for the adaptation algorithm, see NUTS paper |
| `gamma` | [0.05] for the adaptation algorithm, see NUTS paper |
| `t0` | [10] for the adaptation algorithm, see NUTS paper |
| `k` | [0.75] for the adaptation algorithm, see NUTS paper |
| `dlmax` | [1000] caps tree growth, see `deltamax` in the NUTS paper |
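For reference, `f` can also be coded by hand, as long as it respects the contract described above. A minimal sketch for the same standard bivariate normal as above (the automatic-differentiation route via `sci.diff` is usually more convenient):

-- Hand-coded f(x, grad): writes the gradient of the log-density
-- at x into grad, and returns the log-density at x:
local function f(x, grad)
  grad[1] = -x[1] -- d/dx[1] of -0.5*(x[1]^2 + x[2]^2).
  grad[2] = -x[2] -- d/dx[2] of -0.5*(x[1]^2 + x[2]^2).
  return -0.5*(x[1]^2 + x[2]^2)
end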
The Markov chain is started at `theta0` and the adaptation takes place for the first `stopadapt` samples (`stopadapt` may be 0). Then the parameters determining the evolution of the Markov chain are fixed and the sampling phase follows. If `stop` is a number, then `stop` samples are drawn from the chain. If `stop` is a function, each drawn sample is passed to `stop` and the algorithm terminates when `stop` evaluates to `true`. In any case `olstat:push(theta)` is called for every drawn sample `theta` (samples from the adaptation phase are excluded). Using the on-line statistical accumulators of `sci.stat` makes it possible to estimate statistics of the target distribution. When the algorithm terminates it returns: the last drawn sample, the adapted `eps`, and the adapted inverse mass (a variance vector if the mass is diagonal, or a covariance matrix if the mass is dense). Notice that this information allows the chain to be re-started if so desired, by passing these returned values in the `options` argument and setting `stopadapt` to 0, as sketched below.
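For instance, assuming the default diagonal mass (so that the returned inverse mass is a variance vector), a re-start could look like the following sketch:

-- First run: capture the returned state of the chain:
local theta, eps, var = mcmc.nuts(rng, dlogpdf, theta0, {
  stop   = 10000,
  olstat = mystat,
})
-- Second run: re-start from where the first run stopped,
-- skipping the adaptation phase altogether:
mcmc.nuts(rng, dlogpdf, theta, {
  stopadapt = 0,   -- No adaptation phase.
  eps       = eps, -- Re-use the adapted discretization step.
  var       = var, -- Re-use the adapted inverse mass (variance vector).
  stop      = 10000,
  olstat    = mystat,
})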
The `mass` option determines whether the algorithm uses a diagonal or a dense mass matrix for the Hamiltonian dynamics. The `eps` option can be used to provide an initial guess for the discretization step (otherwise a reasonable value is automatically determined). The `cov` and `var` options allow specifying an initial estimate for the mass matrix, which is otherwise set equal to the identity matrix. We refer to the NUTS paper for the remaining options; the default values are perfectly adequate in most cases.
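For example, a dense mass matrix with an explicit initial covariance guess could be configured as sketched below (assuming `alg.tomat` constructs a matrix from nested tables, analogously to `alg.tovec`; the `stop` and `olstat` values are illustrative):

local opts = {
  mass   = "dense",
  -- Initial covariance, i.e. the inverse of the initial dense
  -- mass matrix (here the identity, which is also the default):
  cov    = alg.tomat{ {1, 0},
                      {0, 1} },
  stop   = 10000,
  olstat = mystat,
}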
This example illustrates a use of `mcmc.nuts()` to sample from a simple bivariate density:
jit.opt.start("loopunroll=60") -- Set optimization flags.
-- Load required modules:
local math = require "sci.math".generic
local diff = require "sci.diff"
local alg = require "sci.alg"
local prng = require "sci.prng"
local stat = require "sci.stat"
local mcmc = require "sci.mcmc"
local rng = prng.std()
local N = 50000 -- 50000 samples (after adaptation).
local theta0 = alg.tovec{ 0.5,0.5 } -- Starting value of chain.
local mystat = stat.olcov(2) -- Compute mean and covariance.
-- Logarithm of target density:
local function logpdf(x)
return 2*math.log(x[1]) - x[1]*x[2]^2 - x[2]^2 + 2*x[2] - 4*x[1]
end
-- Automatic gradient computation (vector of 2 elements):
local dlogpdf = diff.gradientf(logpdf, 2)
local start = os.clock() -- To time the simulation.
mcmc.nuts(rng, dlogpdf, theta0, {
  stop   = N,
  olstat = mystat,
})
local mu, sig = alg.vec(2), alg.mat(2, 2)
mystat:mean(mu) -- Compute mean of target density.
mystat:cov(sig) -- Compute covariance of target density.
local time = os.clock() - start
print("mean:")
print(mu:width())
print("covariance:")
print(sig:width())
--> mean:
--> +0.650281,+0.637480
--> covariance:
--> +0.159994,-0.052265
--> -0.052265,+0.338427
print("CPU secs:", time) --> 1/4 of a second here with LuaJIT 2.1.