Deep Learning: Chapter 17

September 18, 2017 | Author: matsuolab

Japanese translation of Deep Learning…

17 Monte Carlo Methods

17.1 Sampling and Monte Carlo Methods

17.1.1 Why Sampling?

There are several reasons to draw samples from a probability distribution: sampling provides a flexible way to approximate many sums and integrals at reduced cost, and in some cases, such as training a model from which we wish to generate samples, drawing samples is itself the goal.

17.1.2 Basics of Monte Carlo Sampling

When a sum or an integral cannot be computed exactly, it can often be approximated with Monte Carlo sampling. The idea is to view the sum or integral as an expectation under some distribution p and to approximate that expectation by a corresponding average. Let

s = \sum_x p(x) f(x) = E_p[f(x)]   (17.1)

be the sum to estimate, with p a probability distribution over the discrete random variable x, or

s = \int p(x) f(x) \, dx = E_p[f(x)]   (17.2)

the integral to estimate, with p a probability density over the continuous variable x. We can approximate s by drawing n samples x^{(1)}, ..., x^{(n)} from p and forming the empirical average

\hat{s}_n = \frac{1}{n} \sum_{i=1}^{n} f(x^{(i)}).   (17.3)

This approximation is justified by a few different properties. First, the estimator \hat{s}_n is unbiased, since

E[\hat{s}_n] = \frac{1}{n} \sum_{i=1}^{n} E[f(x^{(i)})] = \frac{1}{n} \sum_{i=1}^{n} s = s.   (17.4)

Moreover, the law of large numbers states that if the samples x^{(i)} are i.i.d., the average converges almost surely to the expected value:

\lim_{n \to \infty} \hat{s}_n = s,   (17.5)

provided that the variance of the individual terms, Var[f(x^{(i)})], is bounded. To see this more clearly, consider the variance of \hat{s}_n as n increases. The variance Var[\hat{s}_n] decreases and converges to 0, provided that Var[f(x^{(i)})] < \infty:

Var[\hat{s}_n] = \frac{1}{n^2} \sum_{i=1}^{n} Var[f(x)]   (17.6)
             = \frac{Var[f(x)]}{n}.   (17.7)

This convenient result also tells us how to estimate the uncertainty in a Monte Carlo average: compute the empirical mean and variance of the values f(x^{(i)}), then divide the estimated variance by the number of samples n to obtain an estimator of Var[\hat{s}_n]. The central limit theorem tells us that the distribution of the average \hat{s}_n converges to a normal distribution with mean s and variance Var[f(x)]/n, which allows us to estimate confidence intervals around \hat{s}_n using the cumulative distribution of the normal density.
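The estimator behavior described by equations 17.3 through 17.7 is easy to check numerically. The following sketch uses an assumed toy integrand, not one from the text: p is a standard normal and f(x) = x^2, so the true value is s = 1 and Var[f(x)] = 2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (an illustrative assumption, not from the text): estimate
# s = E_p[f(x)] with p = N(0, 1) and f(x) = x^2, so that the true value
# is s = 1 and Var[f(x)] = 2.
def f(x):
    return x ** 2

def s_hat(n):
    """The Monte Carlo estimator of equation 17.3."""
    x = rng.standard_normal(n)      # x^(1), ..., x^(n) drawn i.i.d. from p
    return f(x).mean()

# Empirical check of equations 17.4 and 17.6-17.7: the estimator is
# unbiased, and its variance decays like Var[f(x)] / n.
estimates = {n: np.array([s_hat(n) for _ in range(500)]) for n in (10, 100, 1000)}
for n, e in estimates.items():
    print(n, e.mean(), e.var())     # mean close to 1, variance close to 2 / n
```

Repeating the estimator 500 times at each sample size makes the 1/n decay of the variance visible directly.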

All of this relies on our ability to easily sample from the base distribution p(x), but doing so is not always possible. When we cannot sample from p, an alternative is to use importance sampling, presented in section 17.2. A still more general approach is to form a sequence of estimators that converge toward the distribution of interest; that is the approach of Markov chain Monte Carlo methods (section 17.3).

17.2 Importance Sampling

An important step in the Monte Carlo approximation of the integral in equation 17.2 is deciding which part of the integrand should play the role of the probability p(x), from which we sample, and which part should play the role of the quantity f(x) whose expectation is estimated. There is no unique decomposition, because p(x) f(x) can always be rewritten as

p(x) f(x) = q(x) \frac{p(x) f(x)}{q(x)},   (17.8)

where we now sample from q and average \frac{p(x) f(x)}{q(x)}. In many cases, the problem to be solved specifies p and f naturally, but this original specification may not be the optimal choice in terms of the number of samples required for a given level of accuracy. Fortunately, the optimal choice q* can be derived easily. To see this, write the original Monte Carlo estimator, with samples drawn from p, as

\hat{s}_p = \frac{1}{n} \sum_{i=1, x^{(i)} \sim p}^{n} f(x^{(i)}),   (17.9)

and transform it into the importance-sampling estimator, with samples drawn from q:

\hat{s}_q = \frac{1}{n} \sum_{i=1, x^{(i)} \sim q}^{n} \frac{p(x^{(i)}) f(x^{(i)})}{q(x^{(i)})}.   (17.10)

The expected value of the estimator does not depend on q:

E_q[\hat{s}_q] = E_p[\hat{s}_p] = s.   (17.11)

The variance of an importance-sampling estimator, however, can be greatly sensitive to the choice of q. The variance is given by

Var[\hat{s}_q] = Var\!\left[ \frac{p(x) f(x)}{q(x)} \right] / n.   (17.12)

The minimum variance is attained by the choice

q^*(x) = \frac{p(x) |f(x)|}{Z},   (17.13)

where Z is the normalization constant, chosen so that q^*(x) sums or integrates to 1, as appropriate. Better importance-sampling distributions put more weight where the integrand is larger. In fact, when f(x) does not change sign, Var[\hat{s}_{q^*}] = 0, meaning that a single sample would suffice when the optimal distribution is used. Of course, this is only because computing q^* has essentially solved the original problem, so drawing a single sample from the optimal distribution is usually not a practical approach.

Any choice of sampling distribution q is valid, in the sense of yielding the correct expected value, and q^* is the optimal one, yielding minimum variance. Sampling from q^* is usually infeasible, but other choices of q can be feasible while still reducing the variance somewhat.
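As a concrete illustration of equations 17.9 and 17.10, the following sketch estimates a small tail probability under a standard normal. The target P(x > 3), the proposal q = N(4, 1), and the sample size are all assumptions chosen for this example, not values from the text; the proposal is deliberately placed where p(x)|f(x)| is large, in the spirit of equation 17.13.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target: s = E_p[f(x)] with p = N(0, 1) and f(x) = 1[x > 3],
# i.e. the tail probability P(x > 3), roughly 1.35e-3. Plain Monte Carlo
# wastes almost every sample on this problem.
def p_pdf(x):
    return np.exp(-x**2 / 2) / math.sqrt(2 * math.pi)

def q_pdf(x):
    # Proposal q = N(4, 1), concentrated where p(x)|f(x)| is large.
    return np.exp(-(x - 4)**2 / 2) / math.sqrt(2 * math.pi)

n = 10_000
x_p = rng.standard_normal(n)         # samples from p
s_hat_p = np.mean(x_p > 3)           # equation 17.9

x_q = rng.standard_normal(n) + 4     # samples from q
w = p_pdf(x_q) / q_pdf(x_q)          # importance weights p(x)/q(x)
s_hat_q = np.mean((x_q > 3) * w)     # equation 17.10

s_true = 0.5 * math.erfc(3 / math.sqrt(2))
print(s_hat_p, s_hat_q, s_true)
```

With this proposal nearly every sample lands in the region of interest, so the importance-sampling estimate is far more accurate than the plain estimate at the same sample size.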

Another approach is biased importance sampling, which has the advantage of not requiring a normalized p or q. In the case of discrete variables, the biased importance-sampling estimator is given by

\hat{s}_{BIS} = \frac{\sum_{i=1}^{n} \frac{p(x^{(i)})}{q(x^{(i)})} f(x^{(i)})}{\sum_{i=1}^{n} \frac{p(x^{(i)})}{q(x^{(i)})}}   (17.14)

= \frac{\sum_{i=1}^{n} \frac{p(x^{(i)})}{\tilde{q}(x^{(i)})} f(x^{(i)})}{\sum_{i=1}^{n} \frac{p(x^{(i)})}{\tilde{q}(x^{(i)})}}   (17.15)

= \frac{\sum_{i=1}^{n} \frac{\tilde{p}(x^{(i)})}{\tilde{q}(x^{(i)})} f(x^{(i)})}{\sum_{i=1}^{n} \frac{\tilde{p}(x^{(i)})}{\tilde{q}(x^{(i)})}},   (17.16)

where \tilde{p} and \tilde{q} are the unnormalized forms of p and q, and the x^{(i)} are the samples drawn from q. This estimator is biased, that is, E[\hat{s}_{BIS}] \neq s, except asymptotically: as n \to \infty, the denominator of equation 17.14 converges to 1, and the bias vanishes. For this reason the estimator is called asymptotically unbiased.
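A minimal sketch of the self-normalized estimator of equation 17.16, using only an unnormalized \tilde{p} and a uniform proposal. The particular \tilde{p}, support, and target expectation are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discrete example: p is known only up to a constant,
# p_tilde(x) = exp(-(x - 3)^2 / 2) on x in {0, ..., 9}; the proposal q
# is uniform over the same support. We estimate s = E_p[x].
xs = np.arange(10)
p_tilde = np.exp(-(xs - 3)**2 / 2)
p = p_tilde / p_tilde.sum()          # used only to compute the exact answer
f = xs.astype(float)
s_true = float((p * f).sum())

n = 100_000
x = rng.integers(0, 10, size=n)      # samples from the uniform proposal q
w = p_tilde[x] / (1.0 / 10)          # ratios p_tilde / q, normalizer of p unknown
s_bis = (w * f[x]).sum() / w.sum()   # equation 17.16: self-normalized estimate
print(s_bis, s_true)
```

Because the same unknown normalization constant appears in the numerator and the denominator, it cancels, which is exactly what makes the estimator usable when only \tilde{p} is available.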

Looking back at equation 17.12, the variance of the importance-sampling estimator is greatly sensitive to the choice of q. A good q places high probability where p(x)|f(x)| is large. Because q is usually chosen to be a simple distribution that is easy to sample from, this is difficult to achieve when x is high dimensional, and a poor match can fail in two ways. If there are samples for which q(x^{(i)}) \ll p(x^{(i)})|f(x^{(i)})|, the ratio p(x^{(i)}) f(x^{(i)}) / q(x^{(i)}) becomes enormous and such samples dominate the estimate; worse, if these x^{(i)} are rare under q, they may not appear in a finite sample at all, yielding a typical underestimate of s that is only occasionally, drastically overcorrected. If instead q(x^{(i)}) \gg p(x^{(i)})|f(x^{(i)})|, the corresponding ratios are negligible and the samples are essentially wasted. These problems arise frequently when x is high dimensional, because the dynamic range of the joint probabilities can then be very large.

Despite these dangers, importance sampling is used widely in machine learning, including deep learning. For example, it has been used to accelerate training in neural language models with a large vocabulary (section 12.4.3.3) and other neural networks with a large number of outputs, to estimate a partition function (section 18.7), and to estimate the log-likelihood in deep directed models (section 20.10.3). Importance sampling may also improve the estimate of the gradient of the cost function in stochastic gradient descent, particularly for classifiers in which most of the cost comes from a small number of misclassified examples; sampling more difficult examples more frequently can reduce the variance of the gradient in such cases (Hinton, 2006).

17.3 Markov Chain Monte Carlo Methods

In many cases, we wish to use a Monte Carlo technique, but there is no tractable method for drawing exact samples from the distribution pmodel(x) or from a good (low-variance) importance-sampling distribution q(x). In the context of deep learning, this most often happens when pmodel(x) is represented by an undirected model. In these cases, we introduce a mathematical tool called a Markov chain to approximately sample from pmodel(x). The family of algorithms that use Markov chains to perform Monte Carlo estimation is called Markov chain Monte Carlo methods (MCMC), described at greater length in Koller and Friedman (2009). The most standard, generic guarantees for MCMC techniques apply only when the model assigns nonzero probability to every state. It is therefore most convenient to present these techniques for an energy-based model (EBM), p(x) \propto \exp(-E(x)), as described in section 16.2.4, since an EBM guarantees that every state has nonzero probability.

To understand why drawing samples from an undirected model is difficult, consider an EBM over just two variables, defining a distribution p(a, b). To sample a, we would need to draw it from p(a | b), and to sample b, we would need to draw it from p(b | a): a chicken-and-egg problem. Directed models avoid this because their graph is acyclic: ancestral sampling (section 16.3) simply visits each variable in topological order, sampling it conditioned on its already-sampled parents. For an EBM, we can avoid this chicken-and-egg problem by sampling with a Markov chain instead.

17.

x x

x

p(x) T (x′ | x)

x

x′

x T (x′ | x)

x x′ MCMC x x

x

x q (t) (x)

t

q (0) q q (t) (x)

x (t)

p(x)

x

q

v (17.17)

q(x = i) = vi

x′

x x′ q (t+1) (x′ ) =

! x

q (t) (x)T (x′ | x)

A A

(17.18) T

Ai,j = T (x′ = i | x = j)

(17.19)

17.18 q v

T

A v (t) = Av (t−1) .

567

(17.20)

Applying the Markov chain update repeatedly corresponds to multiplying by the matrix A repeatedly. In other words, we can think of the process as performing power iteration on A:

v^{(t)} = A^t v^{(0)}.   (17.21)

The matrix A has special structure: each of its columns represents a probability distribution. Such matrices are called stochastic matrices. If there is a nonzero probability of transitioning from any state x to any other state x' for some power t, then the Perron-Frobenius theorem (Perron, 1907; Frobenius, 1908) guarantees that the largest eigenvalue of A is real and equal to 1. Over time, all the eigenvalues are exponentiated:

v^{(t)} = \left( V \, diag(\lambda) \, V^{-1} \right)^t v^{(0)} = V \, diag(\lambda)^t \, V^{-1} v^{(0)}.   (17.22)

This process causes all eigenvalues whose magnitude is not 1 to decay to zero. Under some additional mild conditions, A is guaranteed to have only one eigenvector with eigenvalue 1. The process therefore converges to a stationary distribution, sometimes also called the equilibrium distribution. At convergence,

v' = A v = v,   (17.23)

and this same condition holds for every additional step; the stationary distribution v is an eigenvector of A with eigenvalue 1. If we have chosen the transition operator T correctly, the stationary distribution q is equal to the distribution p we wish to sample from. We will see how to choose such a T when we describe Gibbs sampling in section 17.4.
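The convergence described by equations 17.20 through 17.23 can be demonstrated numerically with a small transition matrix; the particular matrix below is an arbitrary assumption for illustration, not one from the text.

```python
import numpy as np

# A small stochastic matrix with A[i, j] = T(x' = i | x = j); each column
# sums to 1, as equation 17.19 requires. Entries chosen arbitrarily.
A = np.array([[0.90, 0.20, 0.10],
              [0.05, 0.70, 0.30],
              [0.05, 0.10, 0.60]])
assert np.allclose(A.sum(axis=0), 1.0)

# Start from an arbitrary initial distribution v^(0) and repeatedly apply
# v^(t) = A v^(t-1) (equations 17.20-17.21).
v = np.array([1.0, 0.0, 0.0])
for _ in range(200):
    v = A @ v

# v has converged to the stationary distribution: A v = v (equation 17.23).
print(v, A @ v)
```

Because every entry of A is strictly positive, the Perron-Frobenius theorem applies, and repeated multiplication drives v to the unique eigenvector of A with eigenvalue 1, regardless of the starting distribution.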

Most of these properties carry over to the case of continuous variables, where the chain is sometimes called a Harris chain. In that case, the sum in the update becomes an expectation, and the stationarity condition of equation 17.23 generalizes to

q'(x') = E_{x \sim q} \, T(x' | x),   (17.24)

with stationarity meaning q' = q. Regardless of whether the state x is discrete or continuous, running a Markov chain until it reaches its equilibrium distribution is called burning in the chain. After burn-in, we can draw a sequence of samples from the equilibrium distribution. These samples are identically distributed, but any two successive samples are highly correlated, so a finite sequence may not be very representative of the equilibrium distribution. One way to mitigate this is to return only every n-th sample, at the price of extra computation; another is to run multiple chains in parallel, at the price of extra burn-in. Deep learning practitioners commonly use a number of parallel chains similar to a minibatch size (for example, 100 chains) and then draw as many samples as needed from them. MCMC estimates are therefore expensive, because of the time required to burn in and the time required afterward to obtain reasonably decorrelated samples.

Another difficulty is that we do not know in advance how many steps the chain must run before reaching its equilibrium distribution. This length of time is called the mixing time, and testing whether a chain has reached equilibrium is itself difficult; the theory tells us only that the chain will converge, not when. If we analyze the chain through its transition matrix A, we know that A^t loses the information contained in v^{(0)} when t is large, at a rate roughly governed by the magnitude of the second-largest eigenvalue of A. In practice, however, we cannot even represent A explicitly: the number of states is exponential in the number of variables, so theoretical analysis of the mixing time is usually out of reach. Instead, we simply run the chain for an amount of time we roughly estimate to be sufficient and rely on heuristics, such as manually inspecting samples or measuring correlations between successive samples, to judge whether the chain has mixed.

17.4 Gibbs Sampling

So far we have seen how to draw samples from a distribution q(x) by repeatedly updating x \leftarrow x' \sim T(x' | x). We have not yet seen how to ensure that q(x) is a useful distribution. Two basic approaches are considered in this book. The first is to derive T from a learned pmodel, described below for the case of sampling from an EBM. The second is to directly parametrize T and learn it, so that its stationary distribution implicitly defines the pmodel of interest; examples of this second approach appear in sections 20.12 and 20.13.

In the context of deep learning, we commonly use Markov chains to draw samples from an energy-based model defining a distribution pmodel(x). In this case, we want the stationary distribution q(x) of the Markov chain to be pmodel(x), which requires choosing an appropriate T(x' | x).

A conceptually simple and effective approach to building a chain that samples from pmodel(x) is Gibbs sampling, in which sampling from T(x' | x) is accomplished by selecting one variable x_i and sampling it from pmodel conditioned on its neighbors in the undirected graph G defining the structure of the energy-based model (section 16.7.1). It is also possible to sample several variables at the same time, as long as they are conditionally independent given all their neighbors. For example, all the hidden units of an RBM may be sampled simultaneously, because they are conditionally independent of each other given all the visible units, and vice versa. Gibbs sampling approaches that update many variables simultaneously in this way are called block Gibbs sampling. Alternative approaches to designing Markov chains that sample from pmodel are possible; for example, the Metropolis-Hastings algorithm is widely used in other disciplines. In the deep learning context of undirected modeling, however, Gibbs sampling is used almost exclusively.
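To make block Gibbs sampling concrete, here is a sketch for a tiny RBM with randomly chosen, untrained parameters; the layer sizes, parameter scales, and chain lengths are all arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A tiny RBM with random parameters (purely illustrative, not a trained
# model): energy E(v, h) = -v^T W h - b^T v - c^T h over binary v, h.
n_v, n_h = 6, 4
W = rng.normal(0, 0.5, size=(n_v, n_h))
b = rng.normal(0, 0.1, size=n_v)
c = rng.normal(0, 0.1, size=n_h)

def block_gibbs_step(v):
    """One block Gibbs update: sample all of h given v, then all of v given h."""
    h = (rng.random(n_h) < sigmoid(v @ W + c)).astype(float)
    v = (rng.random(n_v) < sigmoid(W @ h + b)).astype(float)
    return v

v = rng.integers(0, 2, size=n_v).astype(float)
for _ in range(1000):            # burn in the chain
    v = block_gibbs_step(v)

samples = []
for t in range(5000):            # keep every 10th sample to reduce correlation
    v = block_gibbs_step(v)
    if t % 10 == 0:
        samples.append(v.copy())
samples = np.array(samples)
print(samples.mean(axis=0))      # Monte Carlo estimate of E[v] under the RBM
```

All hidden units are updated in one block and all visible units in another, which is valid here because the RBM's bipartite structure makes each layer conditionally independent given the other.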

17.5 The Challenge of Mixing between Separated Modes

The primary difficulty involved with MCMC methods is their tendency to mix poorly. Ideally, successive samples from a chain designed to sample from p(x) would be completely independent from one another and would visit many different regions of x space, in proportion to their probability. Instead, especially in high-dimensional cases, MCMC samples become very correlated. We refer to such behavior as slow mixing, or even failure to mix. MCMC methods with slow mixing can be seen as inadvertently performing something resembling noisy gradient descent on the energy function with respect to the state x of the chain. The chain tends to take small steps from a configuration x^{(t-1)} to a configuration x^{(t)} whose energy E(x^{(t)}) is generally lower than or approximately equal to E(x^{(t-1)}), with a preference for moves that yield lower-energy configurations. Starting from a rather improbable configuration, the chain gradually reduces the energy of the state and only occasionally moves to a different region. Once the chain has found a region of low energy of p(x), which we call a mode, it tends to walk around that mode; once in a while it steps out and usually returns, and only rarely finds an escape route toward another mode. The problem is that, for many interesting distributions, successful escape routes are rare, so the Markov chain samples the same mode longer than it should (see section 17.4 for the Gibbs sampling case and figure 17.1 for an illustration). This is especially clear when we consider the probability of moving between two modes separated by a high energy barrier.

Consider, for example, an energy-based model over two variables a and b, each taking the values -1 and 1, with energy function E(a, b) = -w a b for some large positive number w. The model then expresses a strong belief that a and b have the same sign. Consider updating b using a Gibbs sampling step, with a = 1.

Figure 17.1: Paths followed by Gibbs sampling for three distributions, with the Markov chain initialized at the mode in all cases. (Left) A multivariate normal distribution with two independent variables; Gibbs sampling mixes well. (Center) A multivariate normal distribution with highly correlated variables; because each variable must be updated conditioned on the other, the correlation reduces the rate at which the Markov chain can move away from its starting point. (Right) A mixture of Gaussians with widely separated modes that are not axis aligned; Gibbs sampling mixes very slowly, because it is difficult to change modes while altering only one variable at a time.

The conditional distribution over b is given by P(b = 1 | a = 1) = \sigma(w). If w is large, the sigmoid saturates, and the probability of assigning b the value 1 is also close to 1. Likewise, if a = -1, the probability of assigning b the value -1 is close to 1. According to Pmodel(a, b), both signs of the two variables are equally likely; but according to Pmodel(a | b), the two variables should have the same sign. This means that Gibbs sampling will only very rarely flip the signs of these variables, and the chain spends a long time trapped near one of the two modes (a = b = 1 or a = b = -1) before crossing the energy barrier to the other.
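The trapping behavior can be simulated directly. The following toy script (an illustration, not code from the text) Gibbs-samples the two-variable EBM above and counts sign flips; note that working out the conditional exactly from p(a, b) \propto \exp(w a b) with the \pm 1 encoding puts an explicit factor of 2 inside the sigmoid.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def count_sign_flips(w, steps=20_000):
    """Gibbs-sample the EBM E(a, b) = -w*a*b over a, b in {-1, 1} and
    count how often a changes sign between successive sweeps."""
    a, b = 1, 1
    flips = 0
    for _ in range(steps):
        # Exact conditional: P(b = 1 | a) = e^{wa} / (e^{wa} + e^{-wa})
        #                                 = sigmoid(2*w*a); same for a given b.
        b = 1 if rng.random() < sigmoid(2 * w * a) else -1
        new_a = 1 if rng.random() < sigmoid(2 * w * b) else -1
        flips += (new_a != a)
        a = new_a
    return flips

flips_small = count_sign_flips(w=0.5)
flips_large = count_sign_flips(w=5.0)
print(flips_small, flips_large)
```

With weak coupling the chain crosses between the two modes thousands of times, while with strong coupling it almost never does, even though both chains are formally ergodic.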

Figure 17.2: An illustration of the slow mixing problem in deep probabilistic models. (Left) Consecutive Gibbs samples from a deep Boltzmann machine trained on MNIST; consecutive samples resemble each other, because the Markov chain stays near one mode for a long time. (Right) Consecutive ancestral samples from a generative adversarial network; because ancestral sampling generates each sample independently of the others, there is no mixing problem.

In problems with latent variables, which define a joint distribution pmodel(x, h), we often draw samples of x by alternating between sampling from pmodel(x | h) and sampling from pmodel(h | x). From the point of view of mixing rapidly, we would like pmodel(h | x) to have high entropy. From the point of view of learning a useful representation, however, we would like h to encode enough information about x to reconstruct it well, which implies that h and x should have high mutual information. These two goals are at odds with each other. We often learn generative models that encode x into h very precisely but are not able to mix well: the sharper the learned distribution, the harder it is for a Markov chain sampling from the model to move between its modes. The problem is illustrated in figure 17.2.

17.5.1 Tempering to Mix between Modes

When a distribution has sharp peaks of high probability surrounded by regions of low probability, it is difficult to mix between the different modes of the distribution. Several techniques for mixing faster are based on constructing alternative versions of the target distribution, in which the peaks are not as high and the surrounding valleys are not as low. Energy-based models offer a particularly simple way to do so. So far, we have described an energy-based model as defining a probability distribution

p(x) \propto \exp(-E(x)).   (17.25)

Energy-based models may be augmented with an extra parameter \beta controlling how sharply peaked the distribution is:

p_\beta(x) \propto \exp(-\beta E(x)).   (17.26)

The \beta parameter is often described as the reciprocal of a temperature, reflecting the origin of energy-based models in statistical physics. When the temperature falls to zero and \beta rises to infinity, the energy-based model becomes deterministic; when the temperature rises to infinity and \beta falls to zero, the distribution (for discrete x) becomes uniform. Typically, a model is trained to be evaluated at \beta = 1, but we can also make use of other temperatures, in particular those where \beta < 1. Tempering is a general strategy for mixing rapidly between the modes of p_1 by drawing samples with \beta < 1.
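A small numeric sketch of equation 17.26 on an assumed double-well energy (not a model from the text): lowering \beta raises the relative probability of the barrier between the two modes, which is what makes mixing easier at high temperature.

```python
import numpy as np

# Illustrative double-well energy (an assumption for this sketch):
# E(x) = 4 (x^2 - 1)^2, with modes at x = +1 and x = -1 separated by
# an energy barrier of height 4 at x = 0.
x = np.linspace(-2.0, 2.0, 401)
E = 4.0 * (x**2 - 1.0)**2

def p_beta(beta):
    """Tempered distribution of equation 17.26, normalized on the grid."""
    w = np.exp(-beta * E)
    return w / w.sum()

# Probability at the barrier relative to the modes: exp(-beta * deltaE),
# which approaches 1 as beta shrinks toward 0 (higher temperature).
barrier = np.argmin(np.abs(x))       # grid point closest to x = 0
ratios = {}
for beta in (1.0, 0.5, 0.1):
    p = p_beta(beta)
    ratios[beta] = p[barrier] / p.max()
    print(beta, ratios[beta])
```

At \beta = 1 the barrier is visited about exp(-4) as often as the modes, while at \beta = 0.1 the ratio rises to roughly exp(-0.4), so a chain run at the smaller \beta crosses between the modes far more readily.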