
Bayesian Quadrature: Model-based Approximate Integration

David Duvenaud, University of Cambridge

December 8, 2012


The Quadrature Problem

- We want to estimate an integral

  Z = \int f(x)\, p(x)\, dx

- Most computational problems in inference correspond to integrals:
  - Expectations
  - Marginal distributions
  - Integrating out nuisance parameters
  - Normalization constants
  - Model comparison

[Figure: an integrand f(x) and an input density p(x), plotted against x]

Sampling Methods

- Monte Carlo methods: sample from p(x) and take the empirical mean (see the sketch below):

  \hat{Z} = \frac{1}{N} \sum_{i=1}^{N} f(x_i)

- Possibly sub-optimal for two reasons:
  - Random bunching up of samples
  - Often, nearby function values will be similar
- Model-based and quasi-Monte Carlo methods spread out samples to achieve faster convergence.

[Figure: the integrand f(x), the input density p(x), and sample locations, plotted against x]
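To make the estimator concrete, here is a minimal Monte Carlo sketch in Python. The integrand f(x) = exp(-x^2) and the density p(x) = N(0, 1) are toy choices for illustration, not from the talk.

```python
# Plain Monte Carlo estimate of Z = E_p[f(x)] (toy problem; true Z = 1/sqrt(3)).
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.exp(-x**2)                # toy integrand

N = 10_000
x = rng.standard_normal(N)              # samples from p(x) = N(0, 1)
Z_hat = f(x).mean()                     # (1/N) * sum_i f(x_i)
std_err = f(x).std() / np.sqrt(N)       # O(N^{-1/2}) Monte Carlo error
print(f"Z ≈ {Z_hat:.4f} ± {std_err:.4f}")
```

The O(N^{-1/2}) error of this estimator is exactly what model-based and quasi-Monte Carlo methods try to beat by spreading samples out.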

Model-based Integration

- Place a prior on f, for example a GP.
- The posterior over f then implies a posterior over Z (sketched in code below).

[Figure: the GP mean, GP mean ± 2 SD, samples, the integrand f(x), the density p(x), and the implied posterior p(Z)]

- We'll call quadrature using a GP prior Bayesian Quadrature.
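A minimal sketch of how the posterior over f induces a posterior over Z: draw GP posterior samples of f on a grid and integrate each against p(x), giving one sample of Z per sampled function. The kernel, grid, and toy data below are hypothetical choices.

```python
# Posterior over Z induced by a GP posterior over f (all choices are toy).
import numpy as np

rng = np.random.default_rng(0)
l = 0.5
k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :])**2 / l**2)

xs = np.linspace(-2, 2, 7)                         # observed inputs
fs = np.exp(-xs**2)                                # observed values f(xs)
grid = np.linspace(-4, 4, 200)

# Standard GP posterior over f(grid) given the observations
Kxx = k(xs, xs) + 1e-6 * np.eye(len(xs))
Kgx = k(grid, xs)
mean = Kgx @ np.linalg.solve(Kxx, fs)
cov = k(grid, grid) - Kgx @ np.linalg.solve(Kxx, Kgx.T)

p = np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi)      # p(x) = N(0, 1)
dx = grid[1] - grid[0]
f_samps = rng.multivariate_normal(mean, cov + 1e-6 * np.eye(len(grid)),
                                  size=500, check_valid="ignore")
Z_samps = (f_samps * p).sum(axis=1) * dx           # one Z per sampled f
print(f"E[Z] ≈ {Z_samps.mean():.4f}, sd(Z) ≈ {Z_samps.std():.4f}")
```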

Bayesian Quadrature Estimator

- The posterior over Z has a mean that is linear in the observed values f(x_s):

  E_{GP}[Z \mid f(x_s)] = z^\top K^{-1} f(x_s), \qquad \text{where } z_n = \int k(x, x_n)\, p(x)\, dx

- It is natural to choose sample locations to minimize the posterior variance of Z:

  V[Z \mid f(x_s)] = \int\!\!\int k(x, x')\, p(x)\, p(x')\, dx\, dx' - z^\top K^{-1} z

- This variance doesn't depend on the function values at all!
- Choosing samples sequentially to minimize the variance gives Sequential Bayesian Quadrature (sketched below).
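Both formulas are easy to implement in one dimension. The sketch below assumes an RBF kernel k(x, x') = exp(-(x - x')^2 / 2l^2) and a Gaussian density p(x) = N(mu, sigma^2), for which z_n and the prior double integral have the closed forms used in the code; the greedy loop at the end is a bare-bones Sequential BQ. All numerical choices are illustrative, not from the talk.

```python
# 1-D Bayesian quadrature with closed-form z_n (RBF kernel, Gaussian p).
import numpy as np

l, mu, sigma = 0.5, 0.0, 1.0

def kernel(a, b):
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / l**2)

def z_vec(xs):
    # z_n = ∫ k(x, x_n) p(x) dx
    return l / np.sqrt(l**2 + sigma**2) * np.exp(
        -0.5 * (xs - mu)**2 / (l**2 + sigma**2))

PRIOR_VAR = l / np.sqrt(l**2 + 2 * sigma**2)       # ∫∫ k(x,x') p(x) p(x') dx dx'

def bq_posterior(xs, fs=None, jitter=1e-8):
    K = kernel(xs, xs) + jitter * np.eye(len(xs))
    z = z_vec(xs)
    var = PRIOR_VAR - z @ np.linalg.solve(K, z)    # V[Z | f(xs)], no f needed
    mean = None if fs is None else z @ np.linalg.solve(K, fs)
    return mean, var

# Sequential BQ: greedily add the candidate that minimizes posterior variance.
candidates = np.linspace(-3, 3, 61)
design = np.array([0.0])
for _ in range(7):
    scores = [bq_posterior(np.append(design, c))[1] for c in candidates]
    design = np.append(design, candidates[int(np.argmin(scores))])

mean, var = bq_posterior(design, np.exp(-design**2))   # same toy f as before
print(f"Z ≈ {mean:.4f}, posterior sd ≈ {np.sqrt(max(var, 0.0)):.4f}")
```

Note that the design loop never evaluates f: as the slide says, the posterior variance depends only on the sample locations.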

Things you can do with Bayesian Quadrature

- Can incorporate knowledge of the function, such as symmetries (sketched below):

  f(x, y) = f(y, x) \iff k_s(x, y, x', y') = k(x, y, x', y') + k(x, y, y', x') + k(y, x, x', y') + k(y, x, y', x')

- Can condition on gradients.
- The posterior variance is a natural convergence diagnostic.

[Figure: the estimate of Z with error bars, shrinking as the number of samples grows from 100 to 200]
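A sketch of the symmetrization above: averaging any base kernel over the swap (x, y) ↔ (y, x) produces a GP whose samples satisfy f(x, y) = f(y, x) exactly. The helper and the product-RBF base kernel are hypothetical.

```python
# Symmetrized kernel: GP samples under k_s are exactly exchangeable in (x, y).
import numpy as np

def symmetrize(base_k):
    def k_s(x, y, xp, yp):
        return (base_k(x, y, xp, yp) + base_k(x, y, yp, xp)
              + base_k(y, x, xp, yp) + base_k(y, x, yp, xp))
    return k_s

rbf = lambda a, b: np.exp(-0.5 * (a - b)**2)
k_s = symmetrize(lambda x, y, xp, yp: rbf(x, xp) * rbf(y, yp))
print(k_s(1.0, 2.0, 0.5, 0.3), k_s(2.0, 1.0, 0.5, 0.3))  # equal by construction
```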

More things you can do with Bayesian Quadrature

- Can compute the likelihood under the GP, and so learn the kernel (sketched below).
- Can compute marginals with error bars, in two ways:
  - Simply from the GP posterior
  - Or by recomputing f_θ(x) for different θ with the same x

[Figure: the marginal likelihood as a function of σ_x, with error bars; the true σ_x is marked]

- Much nicer than histograms!
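Here, "learning the kernel" can be read as maximizing the GP's likelihood of the observed values over kernel hyperparameters. A minimal grid-search sketch, with a hypothetical RBF kernel and toy data:

```python
# Kernel learning by maximizing the GP log marginal likelihood over lengthscale.
import numpy as np

def log_marginal_likelihood(xs, fs, l, noise=1e-6):
    K = np.exp(-0.5 * (xs[:, None] - xs[None, :])**2 / l**2)
    K += noise * np.eye(len(xs))
    _, logdet = np.linalg.slogdet(K)
    alpha = np.linalg.solve(K, fs)
    return -0.5 * fs @ alpha - 0.5 * logdet - 0.5 * len(xs) * np.log(2 * np.pi)

xs = np.linspace(-2, 2, 9)
fs = np.exp(-xs**2)                                   # toy observations
lengthscales = np.linspace(0.1, 2.0, 50)
best = max(lengthscales, key=lambda l: log_marginal_likelihood(xs, fs, l))
print(f"lengthscale with highest marginal likelihood: {best:.2f}")
```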

Rates of Convergence

What is the rate of convergence of SBQ when its assumptions are true?

[Figures: expected variance / MMD; empirical rates in the RKHS; empirical rates out of the RKHS; a bound on the Bayesian error]

GPs vs Log-GPs for Inference

[Figure, left: the true likelihood ℓ(x) with evaluations, a GP posterior fit to ℓ(x), and a Log-GP posterior; right: the true log-likelihood log ℓ(x) with evaluations and a GP posterior fit in log space.]

A GP fit directly to ℓ(x) places posterior mass on negative values; fitting the GP to log ℓ(x) instead respects the positivity of likelihoods.

GPs vs Log-GPs

[Figure, left: the true function ℓ(x) with evaluations, a GP posterior, and a Log-GP posterior; right: the true log-function log ℓ(x), ranging from 0 down to about −200, with evaluations and a GP posterior.]

Integrating under Log-GPs

[Figure, left: the true function ℓ(x) with evaluations, the mean of a GP fit to ℓ(x), the Log-GP posterior, and an approximate Log-GP supported on inducing points; right: the true log-function log ℓ(x) with evaluations, the GP posterior in log space, and the inducing points.]

The integral of the exponent of a GP has no closed form, so the Log-GP must itself be approximated (here via inducing points) before integrating; a sampling-based sketch of the same idea follows.
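This sketch is a stand-in for the inducing-point approximation in the figure, whose details the transcript does not give: fit a GP to log ℓ(x), draw posterior samples of the log-function, exponentiate, and integrate each against p(x) on a grid. All specifics are hypothetical.

```python
# Approximate integration under a log-GP surrogate: Z = ∫ exp(g(x)) p(x) dx,
# with g drawn from a GP posterior fit to log-likelihood evaluations (toy).
import numpy as np

rng = np.random.default_rng(0)
l = 0.8
k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :])**2 / l**2)

xs = np.linspace(-2, 2, 7)
log_ls = -xs**2                                    # toy log-likelihood values

grid = np.linspace(-4, 4, 200)
Kxx = k(xs, xs) + 1e-6 * np.eye(len(xs))
Kgx = k(grid, xs)
mean = Kgx @ np.linalg.solve(Kxx, log_ls)          # GP posterior in log space
cov = k(grid, grid) - Kgx @ np.linalg.solve(Kxx, Kgx.T)

p = np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi)      # p(x) = N(0, 1)
dx = grid[1] - grid[0]
g = rng.multivariate_normal(mean, cov + 1e-6 * np.eye(len(grid)),
                            size=500, check_valid="ignore")
Z = (np.exp(g) * p).sum(axis=1) * dx               # exp keeps ℓ(x) positive
print(f"E[Z] ≈ {Z.mean():.4f}, sd(Z) ≈ {Z.std():.4f}")
```

Exponentiating the GP guarantees positivity, but the integral of exp(g) is itself intractable, which is exactly why a second round of approximation is needed (sampling here, inducing points on the slide).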

Conclusions

- Model-based integration allows active learning about integrals, can require fewer samples than MCMC, and lets us check our assumptions.
- BQ has nice convergence properties if its assumptions are correct.
- For inference, a GP is not an especially appropriate model, but other models are intractable.

Limitations and Future Directions

- Right now, BQ really only works in low dimensions (< 10), when the function is fairly smooth, and it is only worth using when computing f(x) is expensive.
- How to extend to high dimensions? Gradient observations are helpful, but a D-dimensional gradient counts as D separate observations.
- It seems unlikely that we'll find another tractable nonparametric distribution like the GP. Should we accept that we'll need a second round of approximate integration on a surrogate model?
- How much overhead is worthwhile? Work on bounded rationality seems relevant.

Thanks!