lecture 7: markov chain monte carlo - zcu.cz

15
LECTURE 7: Markov Chain Monte Carlo Modeling and Simulation 2 Daniel Georgiev

Upload: others

Post on 23-Jan-2022

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: LECTURE 7: Markov Chain Monte Carlo - zcu.cz

L E C T U R E 7 : M a r k o v C h a i n M o n t e C a r l o

M o d e l i n g a n d S i m u l a t i o n 2 D a n i e l G e o r g i e v

Page 2: LECTURE 7: Markov Chain Monte Carlo - zcu.cz

O U T L I N E

• Review • Markov Chain Monte Carlo • Metropolis-Hastings algorithm

reversible Markov chain perfect balance derivation of transition probabilities thinning burn in example

Page 3: LECTURE 7: Markov Chain Monte Carlo - zcu.cz

C A S E S T U D Y D ATAEstimated marginal distribution

X20.049 0.0495 0.05 0.0505 0.051 0.0515

Ps

2.94

2.96

2.98

3

3.02

3.04

3.06

3.08

3.1

3.12

Estimated marginal distribution

X20.049 0.0495 0.05 0.0505 0.051 0.0515

Tm

8

8.05

8.1

8.15

8.2

8.25

8.3

8.35

8.4

8.45

8.5

Ps2.95 3 3.05 3.1

Tm

8

8.05

8.1

8.15

8.2

8.25

8.3

8.35

8.4

8.45

8.5

Histogram data for rainy days

0.0520.051

X20.05

0.049

Empirical data

0.0482.92.95

33.05

Ps

3.1

7.98

8.18.28.38.48.58.6

3.15

Tm

Page 4: LECTURE 7: Markov Chain Monte Carlo - zcu.cz

M A R K O V C H A I N M O N T E C A R L O ( M C M C )

The inverse method allows sampling 1d RVs with arbitrary distributions. Many real world systems include correlated inputs. Markov Chain Monte Carlo (MCMC) is a method for sampling arbitrary multivariate RVs.

Page 5: LECTURE 7: Markov Chain Monte Carlo - zcu.cz

R E V E R S I B L E M A R K O V C H A I N

Kolmogorov’s criterion: A Markov chain with a transition probability P is said to be reversible if

for all finite sequences of states.

Same holds for continuous time MC, only propensities substituted for probabilities.

pj1,j2pj2,j3 · · · pjn�1,jnpjn,j1 = pj1,jnpjn,jn�1 · · · pj3,j2pj2,j1

Page 6: LECTURE 7: Markov Chain Monte Carlo - zcu.cz

P E R F E C T B A L A N C E

Theorem: A reversible Markov Chain satisfies

Hence, an ergodic Markov Chain with a probability transition matrix

converges with probability one to the stationary distribution

⇡ipi,j = ⇡jpj,i

⇡i⇡j

= pj,i

pi,j

Page 7: LECTURE 7: Markov Chain Monte Carlo - zcu.cz

T R A N S I T I O N P R O B A B I L I T I E S

Approach: Separate the transition probability into a proposition probability and an acceptance probability

From detailed balanced we obtain

P (xi|xj) = pr (xi|xj) ac (xi|xj)

ac(xi|xj)ac(xj |xi)

= P (xi)pr(xj |xi)P (xj)pr(xi|xj)

Page 8: LECTURE 7: Markov Chain Monte Carlo - zcu.cz

T R A N S I T I O N P R O B A B I L I T I E S

Approach: Separate the transition probability into a proposition probability and an acceptance probability

From detailed balanced we obtain

One choice for the acceptance function is Metropolis choice

P (xi|xj) = pr (xi|xj) ac (xi|xj)

ac(xi|xj)ac(xj |xi)

= P (xi)pr(xj |xi)P (xj)pr(xi|xj)

ac (xi

|xj

) = min⇣1, P (xi)pr(xj |xi)

P (xj)pr(xi|xj)

Page 9: LECTURE 7: Markov Chain Monte Carlo - zcu.cz

P R O P O S A L F U N C T I O N

The proposal distribution is usually Gaussian, i.e.,

For a uniform target distribution, where the P(xi) = P(xj), MCMC reduces to a continuous random walk.

Q(x|xi

) = 1(2⇡)d/2

e

� 12 (x�xi)

0(x�xi)

Page 10: LECTURE 7: Markov Chain Monte Carlo - zcu.cz

T R A N S I T I O N P R O B A B I L I T I E S

Gaussian proposition function: for an unbiased random walk the acceptance function reduces to

Hence, if the proposed state has a higher probability, it is always accepted. If the proposed state has a lower probability, it is accepted at a rate proportional to the two state probability quotient.

ac (xi

|xj

) = min⇣1, P (xi)

P (xj)

Page 11: LECTURE 7: Markov Chain Monte Carlo - zcu.cz

A U T O C O R R E L AT I O N

Monte Carlo method requires generation of iid samples, MCMC samples are not independent. To generate approximately iid samples, discard d samples for every 1 kept sample. The number d depends on the sample space, the standard deviation the proposal function, and the target distribution. A test whether d is large enough is given by the autocorrelation function.

Where a sample autocorrelation using circular convolution is

The practice of discarding samples is called thinning.

autocorr (⌧) = E[(Xt�µ)(Xt+⌧�µ)]�2

autocorr (i) =PN�1

j=0 xjx⇤j+i

Page 12: LECTURE 7: Markov Chain Monte Carlo - zcu.cz

B U R N I N

A Markov chain takes some time to reach the station distribution. Although convergence to the station distribution is guaranteed from any initial state, the amount of time it takes depends on many parameters. Standard practice is to discard enough of the initial samples to ensure samples are only drawn from the target distribution. The discard period is called the burn in period and can again be computed from the autocorrelation function. In other words, the samples after the burn in period should be independent of the initial state.

Page 13: LECTURE 7: Markov Chain Monte Carlo - zcu.cz

M A R K O V C H A I N M O N T E C A R L O ( M C M C )

MCMC (Metropolis-Hastings Algorithm): Consider a RV 1. Initialisation (x0) - select an initial value of X and let i = 0. 2. Proposal (xp) - propose a new sample value of X from the last sampled

value xi according to the conditional distribution Q(x|xi). The distribution may be the multivariate distribution that can be sampled by generating d independent normal RVs.

3. Acceptance (xi+1) - compute the value α(xp,xi) by the Metropolis formulaaccept the proposed value with probability α(xp,xi), i.e., if U < α(xp,xi), xi+1 = xp, i = i +1, and go to step 2. Otherwise, reject xp and go to 2. U is uniformly distributed on [0,1].

Q(x|xi

) = 1(2⇡)d/2

e

� 12 (x�xi)

0(x�xi)

X : ⌦ ! Rd

↵(xp

, x

i

) = min⇣1, f(xp)Q(xi|xp)

f(xi)Q(xp|xi)

Page 14: LECTURE 7: Markov Chain Monte Carlo - zcu.cz

M A R K O V C H A I N M O N T E C A R L O T H E O R E M

Consider the Markov chain {Xi, i = 1,2,3…} generated by the Metropolis-Hastings Algorithm with the acceptance functionThen the probability density of Xi converges and is equal to f, i.e., for sufficiently large i, the generated states are approximately sampled from the desired distribution.

Page 15: LECTURE 7: Markov Chain Monte Carlo - zcu.cz

M C M C E X A M P L E

MCMS generates autocorrelated samples. Monte Carlo requires IID samples. Discarding samples reduces the autocorrelation.

Acceptance function results in higher probability values being sampled more often. This increases the sampling efficiency. Note, the computationally expensive part of a simulation is the deterministic function evaluation, not the RV generation.

The Markov process is guaranteed to have a stationary distribution that is equal to the target distribution. Depending on the mixing rate, however, it takes some time for the Markov process to converge.