Metropolis-adjusted Langevin algorithm


In computational statistics, the Metropolis-adjusted Langevin algorithm (MALA) or Langevin Monte Carlo (LMC) is a Markov chain Monte Carlo (MCMC) method for obtaining random samples – sequences of random observations – from a probability distribution for which direct sampling is difficult. As the name suggests, MALA uses a combination of two mechanisms to generate the states of a random walk that has the target probability distribution as an invariant measure: new states are proposed using (overdamped) Langevin dynamics, which exploit gradient information about the target density, and these proposals are then accepted or rejected using the Metropolis–Hastings algorithm.

Informally, the Langevin dynamics drive the random walk towards regions of high probability in the manner of a gradient flow, while the Metropolis–Hastings accept/reject mechanism improves the mixing and convergence properties of this random walk. MALA was originally proposed by Julian Besag in 1994[1] (although the closely related Smart Monte Carlo method had already been introduced in 1978[2]), and its properties were examined in detail by Gareth Roberts together with Richard Tweedie[3] and Jeff Rosenthal.[4] Many variations and refinements have been introduced since then, e.g. the manifold variant of Girolami and Calderhead (2011).[5] The method is equivalent to using the Hamiltonian Monte Carlo (hybrid Monte Carlo) algorithm with only a single discrete time step.[5]

Further details

Let $\pi$ denote a probability density function on $\mathbb{R}^d$, one from which it is desired to draw an ensemble of independent and identically distributed samples. We consider the overdamped Langevin Itô diffusion

$$\dot{X} = \nabla \log \pi(X) + \sqrt{2}\,\dot{W}$$

driven by the time derivative of a standard Brownian motion $W$. (Note that another commonly used normalization for this diffusion is

$$\dot{X} = \frac{1}{2}\nabla \log \pi(X) + \dot{W},$$

which generates the same dynamics.) In the limit as $t \to \infty$, the probability distribution $\rho(t)$ of $X(t)$ approaches a stationary distribution, which is also invariant under the diffusion and which we denote $\rho_\infty$. It turns out that, in fact, $\rho_\infty = \pi$.
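One way to check this claim (assuming $\pi$ is smooth and suitably decaying) is via the Fokker–Planck equation associated with the diffusion,

$$\partial_t \rho = \nabla \cdot \bigl( -\rho\,\nabla \log \pi + \nabla \rho \bigr).$$

Substituting $\rho = \pi$ and using $\pi \nabla \log \pi = \nabla \pi$ makes the flux vanish identically, so $\pi$ is indeed a stationary density of the dynamics.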

Approximate sample paths of the Langevin diffusion can be generated by many discrete-time methods. One of the simplest is the Euler–Maruyama method with a fixed time step $\tau > 0$. We set $X_0 := x_0$ and then recursively define an approximation $X_k$ to the true solution $X(k\tau)$ by

$$X_{k+1} := X_k + \tau \nabla \log \pi(X_k) + \sqrt{2\tau}\,\xi_k,$$

where each $\xi_k$ is an independent draw from a multivariate normal distribution on $\mathbb{R}^d$ with mean 0 and covariance matrix equal to the $d \times d$ identity matrix. Note that $X_{k+1}$ is normally distributed with mean $X_k + \tau \nabla \log \pi(X_k)$ and covariance equal to $2\tau$ times the $d \times d$ identity matrix.
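As a concrete illustration, the following sketch implements this Euler–Maruyama recursion (the unadjusted Langevin scheme) in Python; the function name grad_log_pi and the standard Gaussian target used in the example are illustrative assumptions, not part of the algorithm's specification.

    import numpy as np

    def euler_maruyama_step(x, grad_log_pi, tau, rng):
        # xi_k ~ N(0, I_d): independent standard Gaussian noise
        xi = rng.standard_normal(x.shape)
        # X_{k+1} = X_k + tau * grad log pi(X_k) + sqrt(2 tau) * xi_k
        return x + tau * grad_log_pi(x) + np.sqrt(2.0 * tau) * xi

    # Example: standard Gaussian target on R^2, for which grad log pi(x) = -x.
    rng = np.random.default_rng(0)
    x = np.zeros(2)
    for _ in range(1000):
        x = euler_maruyama_step(x, lambda y: -y, tau=0.1, rng=rng)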

In contrast to the Euler–Maruyama method for simulating the Langevin diffusion, which always updates $X_k$ according to the update rule

$$X_{k+1} := X_k + \tau \nabla \log \pi(X_k) + \sqrt{2\tau}\,\xi_k,$$

MALA incorporates an additional step. We consider the above update rule as defining a proposal $\tilde{X}_{k+1}$ for a new state,

$$\tilde{X}_{k+1} := X_k + \tau \nabla \log \pi(X_k) + \sqrt{2\tau}\,\xi_k.$$

This proposal is accepted or rejected according to the Metropolis–Hastings algorithm: set

$$\alpha := \min\left\{1, \frac{\pi(\tilde{X}_{k+1})\, q(X_k \mid \tilde{X}_{k+1})}{\pi(X_k)\, q(\tilde{X}_{k+1} \mid X_k)}\right\},$$

where

$$q(x' \mid x) \propto \exp\left(-\frac{1}{4\tau}\left\|x' - x - \tau \nabla \log \pi(x)\right\|_2^2\right)$$

is the transition probability density from $x$ to $x'$ (note that, in general, $q(x' \mid x) \neq q(x \mid x')$). Let $u$ be drawn from the continuous uniform distribution on the interval $[0, 1]$. If $u \leq \alpha$, then the proposal is accepted, and we set $X_{k+1} := \tilde{X}_{k+1}$; otherwise, the proposal is rejected, and we set $X_{k+1} := X_k$.
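A minimal sketch of one complete MALA iteration follows, assuming user-supplied functions log_pi and grad_log_pi (illustrative names); the acceptance ratio is evaluated on the logarithmic scale, which is numerically safer but otherwise equivalent to the formula above.

    import numpy as np

    def log_q(x_to, x_from, grad_log_pi, tau):
        # log proposal density (up to an additive constant) of x_to given x_from
        diff = x_to - x_from - tau * grad_log_pi(x_from)
        return -np.dot(diff, diff) / (4.0 * tau)

    def mala_step(x, log_pi, grad_log_pi, tau, rng):
        # Langevin proposal
        xi = rng.standard_normal(x.shape)
        proposal = x + tau * grad_log_pi(x) + np.sqrt(2.0 * tau) * xi
        # log of the Metropolis-Hastings acceptance probability alpha
        log_alpha = (log_pi(proposal) + log_q(x, proposal, grad_log_pi, tau)
                     - log_pi(x) - log_q(proposal, x, grad_log_pi, tau))
        # accept with probability min(1, exp(log_alpha))
        if np.log(rng.uniform()) <= log_alpha:
            return proposal
        return x

For a standard Gaussian target, for instance, one could pass log_pi = lambda x: -0.5 * float(x @ x) and grad_log_pi = lambda x: -x, since the acceptance ratio only requires the density up to a normalizing constant.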

The combined dynamics of the Langevin diffusion and the Metropolis–Hastings algorithm satisfy the detailed balance conditions necessary for the existence of a unique, invariant, stationary distribution $\rho_\infty = \pi$. Compared to naive Metropolis–Hastings, MALA has the advantage that it usually proposes moves into regions of higher $\pi$ probability, which are then more likely to be accepted. On the other hand, when $\pi$ is strongly anisotropic (i.e. it varies much more quickly in some directions than others), it is necessary to take $0 < \tau \ll 1$ in order to properly capture the Langevin dynamics; the use of a positive-definite preconditioning matrix $A \in \mathbb{R}^{d \times d}$ can help to alleviate this problem, by generating proposals according to

$$\tilde{X}_{k+1} := X_k + \tau A \nabla \log \pi(X_k) + \sqrt{2\tau A}\,\xi_k,$$

so that $\tilde{X}_{k+1}$ has mean $X_k + \tau A \nabla \log \pi(X_k)$ and covariance $2\tau A$.
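A sketch of this preconditioned proposal, assuming a fixed positive-definite matrix A (a position-dependent A leads to the manifold variants cited above): the correlated Gaussian noise with covariance $2\tau A$ is drawn through a Cholesky factor of A. When using such proposals, the density q in the acceptance ratio must be adjusted to match.

    import numpy as np

    def preconditioned_proposal(x, grad_log_pi, tau, A, rng):
        # proposal with mean X_k + tau * A * grad log pi(X_k) and covariance 2 * tau * A
        L = np.linalg.cholesky(A)                  # A = L @ L.T
        mean = x + tau * A @ grad_log_pi(x)
        noise = L @ rng.standard_normal(x.shape)   # noise ~ N(0, A)
        return mean + np.sqrt(2.0 * tau) * noise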

For limited classes of target distributions, the optimal acceptance rate for this algorithm can be shown to be $0.574$; if it is discovered to be substantially different in practice, $\tau$ should be modified accordingly.[4]
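In practice, $\tau$ is often tuned during a warm-up phase so that the empirical acceptance rate approaches this target; the stochastic-approximation rule sketched below is one common heuristic for doing so, not part of the algorithm as originally described.

    import numpy as np

    def adapt_tau(tau, accepted, iteration, target_rate=0.574):
        # Robbins-Monro style warm-up update of the step size:
        # increase tau after acceptances, decrease it after rejections.
        gamma = (iteration + 1) ** -0.6            # decaying adaptation rate
        return float(np.exp(np.log(tau) + gamma * (float(accepted) - target_rate)))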

References

  1. ^ J. Besag (1994). "Comments on "Representations of knowledge in complex systems" by U. Grenander and MI Miller". Journal of the Royal Statistical Society, Series B. 56: 591–592.
  2. ^ Rossky, P.J.; Doll, J.D.; Friedman, H.L. (1978). "Brownian Dynamics as smart Monte Carlo simulation". Journal of Chemical Physics. 69 (10): 4628. Bibcode:1978JChPh..69.4628R. doi:10.1063/1.436415.
  3. ^ G. O. Roberts and R. L. Tweedie (1996). "Exponential convergence of Langevin distributions and their discrete approximations". Bernoulli. 2 (4): 341–363. doi:10.2307/3318418. JSTOR 3318418.
  4. ^ a b G. O. Roberts and J. S. Rosenthal (1998). "Optimal scaling of discrete approximations to Langevin diffusions". Journal of the Royal Statistical Society, Series B. 60 (1): 255–268. doi:10.1111/1467-9868.00123. S2CID 5831882.
  5. ^ a b M. Girolami and B. Calderhead (2011). "Riemann manifold Langevin and Hamiltonian Monte Carlo methods". Journal of the Royal Statistical Society, Series B. 73 (2): 123–214. CiteSeerX 10.1.1.190.580. doi:10.1111/j.1467-9868.2010.00765.x.