Deep Learning: Chapter 19


19 Approximate Inference

Many probabilistic models are difficult to train because it is difficult to perform inference in them. In the context of deep learning, we usually have a set of visible variables v and a set of latent variables h. The challenge of inference usually refers to the difficult problem of computing p(h | v), or taking expectations with respect to it. Such operations are often necessary for tasks like maximum likelihood learning.

Many simple graphical models with only a single layer of hidden variables, such as restricted Boltzmann machines and probabilistic PCA, are defined in such a way that inference operations like computing p(h | v), or taking expectations with respect to it, are simple. Unfortunately, most graphical models with multiple layers of hidden variables have intractable posterior distributions, and exact inference requires an exponential amount of time in these models. Even some models with only a single layer, such as sparse coding, have this problem.

In this chapter, we introduce several of the techniques for confronting these intractable inference problems. Later, in chapter 20, we describe how to use these techniques to train probabilistic models that would otherwise be intractable.

Figure 19.1: Intractable inference problems in deep learning usually arise from interactions between latent variables in a structured graphical model. These interactions can be due to edges directly connecting one latent variable to another, or to longer paths that are activated when the child of a V-structure is observed. (Left) A semi-restricted Boltzmann machine (Osindero and Hinton, 2008), whose direct connections between latent variables make the posterior complicated. (Center) A deep Boltzmann machine, whose between-layer connections make the posterior intractable even without intra-layer connections. (Right) A directed model whose latent variables interact when the visible variables are observed, because every two latent variables are co-parents; some models with this structure, such as probabilistic PCA, nevertheless have tractable inference because of special properties of their conditional distributions. [Figure omitted.]

19.1 Inference as Optimization

Many approaches to confronting the problem of difficult inference make use of the observation that exact inference can be described as an optimization problem. Approximate inference algorithms may then be derived by approximating the underlying optimization problem.

To construct the optimization problem, assume we have a probabilistic model consisting of observed variables v and latent variables h. We would like to compute the log-probability of the observed data, log p(v; θ). Sometimes it is too difficult to compute log p(v; θ) if it is costly to marginalize out h. Instead, we can compute a lower bound L(v, θ, q) on log p(v; θ). This bound is called the evidence lower bound (ELBO). Another commonly used name for it is the negative variational free energy. Specifically, the evidence lower bound is defined to be

L(v, θ, q) = log p(v; θ) − D_KL(q(h | v) ∥ p(h | v; θ)),   (19.1)

where q is an arbitrary probability distribution over h.

Because the difference between log p(v) and L(v, θ, q) is given by the KL divergence, and because the KL divergence is always nonnegative, L always has at most the same value as the desired log-probability. The two are equal if and only if q is the same distribution as p(h | v).

L can be considerably easier to compute for some distributions q. Simple algebra shows that we can rearrange L into a much more convenient form:

L(v, θ, q) = log p(v; θ) − D_KL(q(h | v) ∥ p(h | v; θ))   (19.2)
= log p(v; θ) − E_{h∼q} log [ q(h | v) / p(h | v) ]   (19.3)
= log p(v; θ) − E_{h∼q} log [ q(h | v) / ( p(h, v; θ) / p(v; θ) ) ]   (19.4)
= log p(v; θ) − E_{h∼q} [ log q(h | v) − log p(h, v; θ) + log p(v; θ) ]   (19.5)
= −E_{h∼q} [ log q(h | v) − log p(h, v; θ) ].   (19.6)

This yields the more canonical definition of the evidence lower bound,

L(v, θ, q) = E_{h∼q} [log p(h, v)] + H(q).   (19.7)

For an appropriate choice of q, L is tractable to compute. For any choice of q, L provides a lower bound on the likelihood. For choices of q(h | v) that are better approximations of p(h | v), the lower bound L is tighter, in other words closer to log p(v). When q(h | v) = p(h | v), the approximation is perfect and L(v, θ, q) = log p(v; θ).

We can thus think of inference as the procedure for finding the q that maximizes L. Exact inference maximizes L perfectly by searching over a family of distributions q that includes p(h | v). Throughout this chapter, we show how to derive different forms of approximate inference by using approximate optimization to find q. We can make the optimization procedure less expensive but approximate by restricting the family of distributions q the optimization is allowed to search over, or by using an imperfect optimization procedure that may not completely maximize L but may merely increase it by a significant amount.
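
The canonical form in equation 19.7 is easy to turn into numbers. The sketch below is a minimal illustration of my own (the two-latent-variable toy joint and all names are assumptions, not the book's): it estimates L by Monte Carlo for a factorial Bernoulli q and checks that the estimate stays below log p(v), which this tiny model lets us compute exactly by enumeration.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

# Toy joint: two binary latents with p(h_i = 1) = 0.5,
# and p(v | h) = N(v; h_1 + h_2, 1).
def log_p_joint(h, v):
    log_p_h = 2 * np.log(0.5)
    return log_p_h - 0.5 * ((v - h.sum()) ** 2 + np.log(2 * np.pi))

def elbo_estimate(v, q_probs, n_samples=20_000):
    """Monte Carlo estimate of L(v, q) = E_{h~q}[log p(h, v)] + H(q)."""
    h = (rng.random((n_samples, 2)) < q_probs).astype(float)
    e_log_joint = np.mean([log_p_joint(hi, v) for hi in h])
    entropy = -np.sum(q_probs * np.log(q_probs)
                      + (1 - q_probs) * np.log(1 - q_probs))
    return e_log_joint + entropy

v = 1.3
log_p_v = np.log(sum(np.exp(log_p_joint(np.array(h), v))
                     for h in product([0.0, 1.0], repeat=2)))
print("log p(v):", log_p_v)                                 # exact
print("ELBO    :", elbo_estimate(v, np.array([0.5, 0.5])))  # a lower bound
```

Making q closer to the true posterior p(h | v) tightens the bound; with q equal to the posterior, the gap is exactly zero, matching equation 19.1.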

19.2 Expectation Maximization

The first algorithm we introduce based on maximizing a lower bound L is the expectation maximization (EM) algorithm, a popular training algorithm for models with latent variables. We describe here the view of the EM algorithm developed by Neal and Hinton (1999). Unlike most of the other algorithms described in this chapter, EM is not an approach to approximate inference, but rather an approach to learning with an approximate posterior.

The EM algorithm consists of alternating between two steps until convergence:

• The E-step (expectation step). Let θ(0) denote the value of the parameters at the beginning of the step. Set q(h(i) | v) = p(h(i) | v(i); θ(0)) for all indices i of the training examples v(i) we want to train on. By this we mean that q is defined in terms of the current value θ(0): if we vary θ, then p(h | v; θ) will change, but q(h | v) remains equal to p(h | v; θ(0)).

• The M-step (maximization step). Completely or partially maximize

Σ_i L(v(i), θ, q)   (19.8)

with respect to θ, using your optimization algorithm of choice.

This can be viewed as a coordinate ascent algorithm to maximize L. On one step, we maximize L with respect to q, and on the other, with respect to θ.

Stochastic gradient ascent on latent variable models can be seen as a special case of the EM algorithm where the M-step consists of taking a single gradient step. Other variants of the EM algorithm can make much larger steps. For some model families, the M-step can even be performed analytically, jumping all the way to the optimal solution for θ given the current q.

Even though the E-step involves exact inference, we can think of the EM algorithm as using approximate inference in some sense. Specifically, the M-step assumes that the same value of q can be used for all values of θ. This introduces a gap between L and the true log p(v) as the M-step moves further and further away from the value θ(0) used in the E-step. Fortunately, the E-step reduces the gap to zero again as we enter the loop for the next step.

The EM algorithm contains a few different insights. First, there is the basic structure of the learning process, in which we update the model parameters to improve the likelihood of a completed dataset, where all missing variables have their values provided by an estimate of the posterior distribution. This particular insight is not unique to EM; for example, using gradient descent to maximize the log-likelihood has the same property. Second, there is the insight that we can continue to use one value of q even after we have moved to a different value of θ. This second insight is used throughout classical machine learning to derive large M-step updates. In the context of deep learning, most models are too complex to admit a tractable solution for an optimal large M-step update, so this insight, which is more unique to the EM algorithm, is rarely used.
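
To make the two steps concrete, here is a small sketch of my own (the book gives no code; the one-dimensional, two-component, unit-variance Gaussian mixture is an assumed example). The E-step computes the exact posterior responsibilities, which play the role of q; the M-step maximizes Σ_i L(x(i), θ, q) in closed form for this family.

```python
import numpy as np

rng = np.random.default_rng(1)

# Data from a mixture: 30% N(-2, 1), 70% N(3, 1).
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])

pi, mu = 0.5, np.array([-1.0, 1.0])          # initial parameters theta
for _ in range(100):
    # E-step: q(h | x) = p(h | x; theta(0)), the exact posterior.
    log_w = np.stack([np.log(pi) - 0.5 * (x - mu[0]) ** 2,
                      np.log(1 - pi) - 0.5 * (x - mu[1]) ** 2])
    r = np.exp(log_w - log_w.max(axis=0))
    r /= r.sum(axis=0)                       # responsibilities r[k, n]
    # M-step: analytic maximizer of the ELBO given this fixed q.
    pi = r[0].mean()
    mu = (r * x).sum(axis=1) / r.sum(axis=1)

print(pi, mu)    # approaches 0.3 and (-2, 3)
```

Because the M-step here jumps directly to the optimum for the current q, this is an instance of the "large step" variant rather than the single-gradient-step special case.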

19.3 MAP Inference and Sparse Coding

We usually use the term inference to refer to computing the probability distribution over one set of variables given another. When training probabilistic models with latent variables, we are usually interested in computing p(h | v). An alternative form of inference is to compute the single most likely value of the missing variables, rather than to infer the entire distribution over their possible values. In the context of latent variable models, this means computing

h* = arg max_h p(h | v).   (19.9)

This is known as maximum a posteriori inference, abbreviated MAP inference.

MAP inference is usually not thought of as approximate inference: it does compute the exact most likely value of h. However, if we wish to develop a learning process based on maximizing L(v, θ, q), then it is helpful to think of MAP inference as a procedure that provides a value of q. In this sense, we can think of MAP inference as approximate inference, because it does not provide the optimal q.

Recall from section 19.1 that exact inference consists of maximizing

L(v, θ, q) = E_{h∼q} [log p(h, v)] + H(q)   (19.10)

with respect to q over an unrestricted family of probability distributions, using an exact optimization algorithm. We can derive MAP inference as a form of approximate inference by restricting the family of distributions q may be drawn from. Specifically, we require q to take on a Dirac distribution:

q(h | v) = δ(h − µ).   (19.11)

This means that we can now control q entirely via µ. Dropping terms of L that do not vary with µ, we are left with the optimization problem

µ* = arg max_µ log p(h = µ, v),   (19.12)

which is equivalent to the MAP inference problem

h* = arg max_h p(h | v).   (19.13)

We can thus justify a learning procedure similar to EM, in which we alternate between performing MAP inference to infer h* and then updating θ to increase log p(h*, v). As with EM, this is a form of coordinate ascent on L, where we alternate between using inference to optimize L with respect to q and using parameter updates to optimize L with respect to θ. The procedure as a whole can be justified by the fact that L is a lower bound on log p(v). In the case of MAP inference, this justification is rather vacuous, because the differential entropy of the Dirac distribution is negative infinity, making the bound infinitely loose. Adding noise to µ would make the bound meaningful again.

MAP inference is commonly used in deep learning as both a feature extractor and a learning mechanism. It is primarily used for sparse coding models.

Recall from section 13.4 that sparse coding is a linear factor model that imposes a sparsity-inducing prior on its hidden units. A common choice is a factorial Laplace prior, with

p(h_i) = (λ / 2) e^{−λ |h_i|}.   (19.14)

The visible units are then generated by performing a linear transformation and adding noise:

p(v | h) = N(v; Wh + b, β⁻¹ I).   (19.15)

Computing or even representing p(h | v) is difficult. Every pair of variables h_i and h_j are both parents of v. This means that when v is observed, the graphical model contains an active path connecting h_i and h_j. All the hidden units thus participate in one massive clique in p(h | v). If the model were Gaussian, these interactions could be modeled efficiently via the covariance matrix, but the sparse prior makes the interactions non-Gaussian.

Because p(h | v) is intractable, so is the computation of the log-likelihood and its gradient, and we cannot use exact maximum likelihood learning. Instead, we use MAP inference and learn the parameters by maximizing the ELBO defined by the Dirac distribution around the MAP estimate of h.

If we concatenate all the h vectors in the training set into a matrix H, and all the v vectors into a matrix V, the sparse coding learning process consists of minimizing

J(H, W) = Σ_{i,j} |H_{i,j}| + Σ_{i,j} (V − H W^⊤)²_{i,j}.   (19.16)

Most applications of sparse coding also involve weight decay or a constraint on the norms of the columns of W, to prevent the pathological solution with extremely small H and large W.

We can minimize J by alternating between minimization with respect to H and minimization with respect to W. Both subproblems are convex. In fact, the minimization with respect to W is just a linear regression problem. However, minimization of J with respect to both arguments is usually not convex.

Minimization with respect to H requires specialized algorithms such as the feature-sign search algorithm (Lee et al., 2007).
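
Here is a minimal sketch of this alternating scheme (my code, with illustrative dimensions; it assumes a sparsity weight of 1, matching equation 19.16, and uses a simple proximal-gradient ISTA-style solver for the H-step in place of the feature-sign search the text cites).

```python
import numpy as np

rng = np.random.default_rng(2)

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_coding(V, m, n_outer=50, n_ista=100):
    """Alternating minimization of J(H, W) = sum |H| + sum (V - H W^T)^2."""
    n_examples, n_features = V.shape
    W = rng.normal(size=(n_features, m))
    H = np.zeros((n_examples, m))
    for _ in range(n_outer):
        # H-step: a convex lasso problem, solved here by ISTA
        L = 2 * np.linalg.norm(W, 2) ** 2          # Lipschitz constant
        for _ in range(n_ista):
            grad = 2 * (H @ W.T - V) @ W
            H = soft_threshold(H - grad / L, 1.0 / L)
        # W-step: just a linear regression problem
        W = np.linalg.lstsq(H, V, rcond=None)[0].T
        # cap column norms, to avoid the tiny-H / huge-W pathology
        W /= np.maximum(np.linalg.norm(W, axis=0, keepdims=True), 1.0)
    return H, W

V = rng.normal(size=(100, 20))
H, W = sparse_coding(V, m=30)
print("fraction of exact zeros in H:", np.mean(H == 0.0))
```

The two subproblem solves each decrease J; the norm cap is the heuristic safeguard mentioned above and is the one step not guaranteed to do so.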

19.4 Variational Inference and Learning

We have seen how the evidence lower bound L(v, θ, q) is a lower bound on log p(v; θ), how inference can be viewed as maximizing L with respect to q, and how learning can be viewed as maximizing L with respect to θ. We have seen that the EM algorithm enables us to make large learning steps with a fixed q, and that learning algorithms based on MAP inference enable us to learn using a point estimate of p(h | v) rather than inferring the entire distribution. Now we develop the more general approach to variational learning.

The core idea behind variational learning is that we can maximize L over a restricted family of distributions q. This family should be chosen so that it is easy to compute E_q log p(h, v). A typical way to do this is to introduce assumptions about how q factorizes.

A common approach to variational learning is to impose the restriction that q is a factorial distribution:

q(h | v) = Π_i q(h_i | v).   (19.17)

This is called the mean field approach. More generally, we can impose any graphical model structure we choose on q, to flexibly determine how many interactions we want the approximation to capture. This fully general graphical model approach is called structured variational inference (Saul and Jordan, 1996).

The beauty of the variational approach is that we do not need to specify a specific parametric form for q. We specify how it should factorize, and the optimization problem then determines the optimal probability distribution within those factorization constraints. For discrete latent variables, this just means that we use traditional optimization techniques to optimize a finite number of variables describing the q distribution. For continuous latent variables, it means that we use a branch of mathematics called calculus of variations to perform optimization over a space of functions and actually determine which function should be used to represent q. Calculus of variations is the origin of the names "variational learning" and "variational inference," though the names apply even when the latent variables are discrete and calculus of variations is not needed. With continuous latent variables, calculus of variations removes much of the responsibility from the human designer of the model, who now must specify only how q factorizes, rather than needing to guess how to design a specific q that can accurately approximate the posterior.

Because L(v, θ, q) is defined to be log p(v; θ) − D_KL(q(h | v) ∥ p(h | v; θ)), we can think of maximizing L with respect to q as minimizing D_KL(q(h | v) ∥ p(h | v)). In this sense, we are fitting q to p. However, we are doing so with the opposite direction of the KL divergence than we are used to using for fitting an approximation. When we use maximum likelihood learning to fit a model to data, we minimize D_KL(p_data ∥ p_model). As illustrated in figure 3.6, this means that maximum likelihood encourages the model to have high probability everywhere the data has high probability, while our optimization-based inference procedure encourages q to have low probability everywhere the true posterior has low probability. Both directions of the KL divergence can have desirable and undesirable properties; the choice depends on which properties are the highest priority for each application. For the inference optimization problem, we choose D_KL(q(h | v) ∥ p(h | v)) for computational reasons: it involves expectations with respect to q, so by designing q to be simple we simplify the required expectations. The opposite direction would require computing expectations with respect to the true posterior, whose form is determined by the choice of model, so we cannot design a reduced-cost approach to computing D_KL(p(h | v) ∥ q(h | v)) exactly.
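
The direction of the KL divergence matters in a way that is easy to see numerically. In this small demo of mine (a made-up correlated two-dimensional Gaussian stands in for p(h | v)), the factorial q minimizing D_KL(q ∥ p) underestimates the marginal variances, while the q minimizing D_KL(p ∥ q) matches them exactly.

```python
import numpy as np

S = np.array([[1.0, 0.9],     # covariance of p = N(0, S), strongly correlated
              [0.9, 1.0]])
P = np.linalg.inv(S)          # precision matrix

def kl_gauss(S0, S1):
    """KL( N(0, S0) || N(0, S1) ) for zero-mean Gaussians."""
    k = S0.shape[0]
    return 0.5 * (np.trace(np.linalg.inv(S1) @ S0) - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

# argmin over diagonal q of KL(q||p): variances 1 / P_ii (here 0.19)
q_var_qp = 1.0 / np.diag(P)
# argmin over diagonal q of KL(p||q): moment matching, variances S_ii (here 1.0)
q_var_pq = np.diag(S)

for name, d in [("min KL(q||p)", q_var_qp), ("min KL(p||q)", q_var_pq)]:
    Q = np.diag(d)
    print(name, "vars:", np.round(d, 3),
          "| KL(q||p):", round(kl_gauss(Q, S), 3),
          "| KL(p||q):", round(kl_gauss(S, Q), 3))
```

The q chosen by variational inference places little probability where the posterior places little probability, which here means shrinking its variances to avoid the low-probability regions off the posterior's diagonal ridge.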

19.4.1 Discrete Latent Variables

Variational inference with discrete latent variables is relatively straightforward. We define a distribution q, typically one where each factor of q is just defined by a lookup table over discrete states. In the simplest case, h is binary and we make the mean field assumption that q factorizes over the individual h_i. In this case we can parametrize q with a vector ĥ whose entries are probabilities, with q(h_i = 1 | v) = ĥ_i.

After determining how to represent q, we simply optimize its parameters. With discrete latent variables this is a standard optimization problem, and in principle the selection of q could be done with any optimization algorithm, such as gradient descent. Because this optimization must occur in the inner loop of a learning algorithm, however, it must be very fast. To achieve this speed, we typically use special optimization algorithms designed to solve comparatively small and simple problems in few iterations. A popular choice is to iterate fixed-point equations, in other words, to solve

∂L / ∂ĥ_i = 0   (19.18)

for ĥ_i, repeatedly updating different elements of ĥ until a convergence criterion is satisfied.

We can make this more concrete by applying variational inference to the binary sparse coding model (here we follow Henniges et al. (2010), but apply classical, generic mean field rather than the specialized algorithm introduced in that work). This application demonstrates in detail how to carry out variational inference; the derivations are mathematically dense, and readers who do not wish to verify them may skip to section 19.4.2 without missing any new high-level concepts.

In the binary sparse coding model, the input v ∈ R^n is generated from the model by adding Gaussian noise to the sum of m different components, each of which can be present or absent. Each component is switched on or off by the corresponding hidden unit in h ∈ {0, 1}^m:

p(h_i = 1) = σ(b_i),   (19.19)

p(v | h) = N(v; Wh, β⁻¹),   (19.20)

where b is a learnable set of biases, W is a learnable weight matrix, and β is a learnable, diagonal precision matrix.

Training this model with maximum likelihood requires taking the derivative of the log-likelihood with respect to the parameters. Consider the derivative with respect to one of the biases:

∂/∂b_i log p(v)
= [ ∂/∂b_i p(v) ] / p(v)   (19.21)
= [ ∂/∂b_i Σ_h p(h, v) ] / p(v)   (19.22)
= [ Σ_h ∂/∂b_i p(h, v) ] / p(v)   (19.23)
= [ Σ_h ∂/∂b_i p(h) p(v | h) ] / p(v)   (19.24)
= [ Σ_h p(v | h) ∂/∂b_i p(h) ] / p(v)   (19.25)
= Σ_h p(h | v) [ ∂/∂b_i p(h) ] / p(h)   (19.26)
= E_{h∼p(h|v)} [ ∂/∂b_i log p(h) ].   (19.27)

This requires computing expectations with respect to p(h | v). Unfortunately, p(h | v) is a complicated distribution. See figure 19.2 for the graph structure of p(h, v) and p(h | v). The posterior distribution corresponds to the complete graph over the hidden units, so variable elimination algorithms do not help us compute the required expectations any faster than brute force.
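
For a latent space this small, brute force is actually available, which makes equation 19.27 easy to check. The sketch below is my own illustration (the tiny sizes n = 3, m = 4 are arbitrary): it compares the posterior expectation on the right-hand side of equation 19.27 against a numerical derivative of log p(v), using exact enumeration of all 2^m hidden configurations.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)
n, m = 3, 4                             # tiny, so 2^m enumeration is feasible
W = rng.normal(size=(n, m))
b = rng.normal(size=m)
beta = np.ones(n)                       # diagonal precision
v = rng.normal(size=n)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_p_h(h, b):                      # factorial Bernoulli prior, eq. 19.19
    p1 = sigmoid(b)
    return np.sum(h * np.log(p1) + (1 - h) * np.log(1 - p1))

def log_p_v_given_h(h):                 # Gaussian likelihood, eq. 19.20
    r = v - W @ h
    return np.sum(0.5 * np.log(beta / (2 * np.pi)) - 0.5 * beta * r ** 2)

H = [np.array(h) for h in product([0.0, 1.0], repeat=m)]

def log_p_v(b):
    return np.log(sum(np.exp(log_p_h(h, b) + log_p_v_given_h(h)) for h in H))

# RHS of eq. 19.27: E_{h ~ p(h|v)}[d/db log p(h)] = E[h] - sigma(b)
joint = np.array([np.exp(log_p_h(h, b) + log_p_v_given_h(h)) for h in H])
post = joint / joint.sum()
rhs = sum(p * (h - sigmoid(b)) for p, h in zip(post, H))

# LHS: numerical derivative of log p(v) with respect to each b_i
eps = 1e-6
lhs = np.array([(log_p_v(b + eps * np.eye(m)[i])
                 - log_p_v(b - eps * np.eye(m)[i])) / (2 * eps)
                for i in range(m)])
print(np.allclose(lhs, rhs, atol=1e-5))  # True
```

For realistic numbers of hidden units, this 2^m enumeration is unavailable; that exponential cost is exactly the difficulty the mean field approach below avoids.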

We can resolve this difficulty by using variational inference and variational learning instead. We make a mean field approximation:

q(h | v) = Π_i q(h_i | v).   (19.28)

The latent variables of the binary sparse coding model are binary, so to represent a factorial q we simply need to model a vector of m Bernoulli distributions q(h_i | v). The natural way to represent the means of the Bernoulli distributions is with a vector ĥ of probabilities, with q(h_i = 1 | v) = ĥ_i.

Figure 19.2: The graph structure of a binary sparse coding model with four hidden units. (Left) The graph structure of p(h, v). Note that the edges are directed and that every two hidden units are co-parents of every visible unit. (Right) The graph structure of p(h | v). To account for the interactions between co-parents, the posterior distribution needs an edge between every pair of hidden units. [Figure omitted.]

We impose a restriction that ĥ_i is never equal to 0 or to 1, in order to avoid errors when computing, for example, log ĥ_i. We will see that the variational inference equations never yield a value of ĥ_i equal to 0 or 1 analytically. In a software implementation, however, machine rounding error could produce 0 or 1 values. In software, we may wish to implement binary sparse coding using an unrestricted vector of variational parameters z and obtain ĥ via the relation ĥ = σ(z). We can then safely compute log ĥ_i on a computer by using the identity

log σ(z_i) = −ζ(−z_i)   (19.29)

relating the sigmoid to the softplus.
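
This identity matters numerically. Here is a brief sketch of mine showing the stable computation, using the fact that ζ(x) = log(1 + e^x) is available as the stable primitive logaddexp(0, x):

```python
import numpy as np

def log_sigmoid(z):
    # log sigma(z) = -zeta(-z), with zeta(x) = log(1 + e^x) = logaddexp(0, x)
    return -np.logaddexp(0.0, -z)

z = np.array([-800.0, 0.0, 800.0])
print(log_sigmoid(z))                    # [-800., -0.693, -0.]
# The direct formula overflows in exp and returns -inf at z = -800
# (NumPy also emits RuntimeWarnings here):
print(np.log(1.0 / (1.0 + np.exp(-z))))  # [-inf, -0.693, 0.]
```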

To set up variational inference in the binary sparse coding model, we write out the evidence lower bound:

L(v, θ, q)
= E_{h∼q} [log p(h, v)] + H(q)   (19.30)
= E_{h∼q} [ log p(h) + log p(v | h) − log q(h | v) ]   (19.31)
= E_{h∼q} [ Σ_{i=1}^m log p(h_i) + Σ_{i=1}^n log p(v_i | h) − Σ_{i=1}^m log q(h_i | v) ]   (19.32)
= Σ_{i=1}^m [ ĥ_i (log σ(b_i) − log ĥ_i) + (1 − ĥ_i)(log σ(−b_i) − log(1 − ĥ_i)) ]   (19.33)
  + E_{h∼q} [ Σ_{i=1}^n log [ √(β_i / 2π) exp( −(β_i / 2)(v_i − W_{i,:} h)² ) ] ]   (19.34)
= Σ_{i=1}^m [ ĥ_i (log σ(b_i) − log ĥ_i) + (1 − ĥ_i)(log σ(−b_i) − log(1 − ĥ_i)) ]   (19.35)
  + (1/2) Σ_{i=1}^n [ log (β_i / 2π) − β_i ( v_i² − 2 v_i W_{i,:} ĥ + Σ_j [ W_{i,j}² ĥ_j + Σ_{k≠j} W_{i,j} W_{i,k} ĥ_j ĥ_k ] ) ].   (19.36)

While these equations are somewhat unappealing aesthetically, they show that L can be expressed in a small number of simple arithmetic operations. The evidence lower bound L is therefore tractable, and we can use it as a replacement for the intractable log-likelihood.

In principle, we could simply run gradient ascent on both θ and ĥ, and this would make a perfectly acceptable combined inference and training algorithm. Usually, however, we do not do this, for two reasons. First, it would require storing ĥ for each v. We typically prefer algorithms that do not require per-example memory; it is difficult to scale learning to billions of examples if each example must be associated with dynamically updated memory. Second, we would like to be able to extract the features ĥ very quickly, in order to recognize the content of v.

For both these reasons, we typically do not use gradient descent to compute the mean field parameters ĥ. Instead, we rapidly estimate them with fixed-point equations. We seek a ĥ where ∇_ĥ L(v, θ, ĥ) = 0, which we can find by iteratively solving

∂/∂ĥ_i L(v, θ, ĥ) = 0   (19.37)

for i = 1, . . . , m, repeating the cycle until a convergence criterion is satisfied. Common criteria include stopping when a full cycle of updates improves L by less than some tolerance, or changes ĥ by less than some amount.

To derive the fixed-point updates for binary sparse coding, we take the derivative of equation 19.36 with respect to ĥ_i:

∂/∂ĥ_i L(v, θ, ĥ)   (19.38)
= ∂/∂ĥ_i [ Σ_{j=1}^m [ ĥ_j (log σ(b_j) − log ĥ_j) + (1 − ĥ_j)(log σ(−b_j) − log(1 − ĥ_j)) ]   (19.39)
  + (1/2) Σ_{j=1}^n [ log (β_j / 2π) − β_j ( v_j² − 2 v_j W_{j,:} ĥ + Σ_k [ W_{j,k}² ĥ_k + Σ_{l≠k} W_{j,k} W_{j,l} ĥ_k ĥ_l ] ) ] ]   (19.40)
= log σ(b_i) − log ĥ_i − 1 + log(1 − ĥ_i) + 1 − log σ(−b_i)   (19.41)
  + Σ_{j=1}^n β_j ( v_j W_{j,i} − (1/2) W_{j,i}² − Σ_{k≠i} W_{j,k} W_{j,i} ĥ_k )   (19.42)
= b_i − log ĥ_i + log(1 − ĥ_i) + v^⊤ β W_{:,i} − (1/2) W_{:,i}^⊤ β W_{:,i} − Σ_{j≠i} W_{:,j}^⊤ β W_{:,i} ĥ_j.   (19.43)

To apply the fixed-point update inference rule, we solve for the ĥ_i that sets equation 19.43 to 0:

ĥ_i = σ( b_i + v^⊤ β W_{:,i} − (1/2) W_{:,i}^⊤ β W_{:,i} − Σ_{j≠i} W_{:,j}^⊤ β W_{:,i} ĥ_j ).   (19.44)

At this point we can see that there is a close connection between recurrent neural networks and inference in graphical models: the mean field fixed-point equations define a recurrent network whose task is to perform inference. We have described how to derive this network from a model description, but it is also possible to train such a network directly; several ideas based on this theme are described in chapter 20.

In the case of binary sparse coding, the recurrent connections specified by equation 19.44 repeatedly update the hidden units based on the changing values of the neighboring hidden units. The input always sends the fixed message v^⊤βW to the hidden units, but the hidden units constantly update the messages they send to each other. Two units ĥ_i and ĥ_j inhibit each other when their weight vectors are aligned. This is a form of competition: between two hidden units that both explain the input, only the one that explains it best will remain active. It is the mean field approximation's attempt to capture the explaining-away interactions in the binary sparse coding posterior.

We can rewrite equation 19.44 in a form that reveals further intuition:

ĥ_i = σ( b_i + ( v − Σ_{j≠i} W_{:,j} ĥ_j )^⊤ β W_{:,i} − (1/2) W_{:,i}^⊤ β W_{:,i} ).   (19.45)

In this form, the input to unit i is v − Σ_{j≠i} W_{:,j} ĥ_j, the residual error in v given the code of the other units. We can thus think of the inference procedure as an iterative autoencoder that repeatedly encodes and decodes its input, attempting to fix mistakes in the reconstruction after each iteration.

The update rule above changes a single unit at a time, and it would be advantageous to update more units simultaneously. Some graphical models, such as deep Boltzmann machines, are structured so that many entries of ĥ can be solved for simultaneously. Binary sparse coding does not admit such block updates, but we can use a heuristic technique called damping: solve for the individually optimal values of every element of ĥ, then move all the values a small step in that direction. This is no longer guaranteed to increase L at each step, but it works well in practice for many models. See Koller and Friedman (2009) for more information about choosing the degree of synchrony and the damping strategy in message-passing algorithms.
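
The fixed-point update in equation 19.44 is straightforward to implement. Below is a minimal sketch of mine (toy dimensions, arbitrary parameters) that applies the update to all units at once and damps the resulting block update, as described above, instead of sweeping one unit at a time.

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field_bsc(v, W, b, beta, n_iters=50, damping=0.5):
    """Damped block iteration of the fixed-point update, eq. 19.44."""
    n, m = W.shape
    h_hat = np.full(m, 0.5)                   # initial variational parameters
    WtBW = W.T @ (beta[:, None] * W)          # entries W_{:,i}^T beta W_{:,j}
    const = b + v @ (beta[:, None] * W) - 0.5 * np.diag(WtBW)
    for _ in range(n_iters):
        # sum over j != i of W_{:,j}^T beta W_{:,i} h_hat_j
        interaction = WtBW @ h_hat - np.diag(WtBW) * h_hat
        target = sigmoid(const - interaction) # eq. 19.44 for all i in parallel
        h_hat = (1 - damping) * h_hat + damping * target
    return h_hat

n, m = 10, 5
W = rng.normal(size=(n, m))
b = rng.normal(size=m)
beta = np.ones(n)
h_true = (rng.random(m) < sigmoid(b)).astype(float)
v = W @ h_true + 0.1 * rng.normal(size=n)
print("h_true:", h_true)
print("h_hat :", np.round(mean_field_bsc(v, W, b, beta), 2))
```

With damping = 1 this reduces to the raw parallel update, which can oscillate; smaller damping trades speed for stability, in line with the discussion in Koller and Friedman (2009).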

19.4.2 Calculus of Variations

Before continuing with our presentation of variational learning, we must briefly introduce an important set of mathematical tools used in variational learning: calculus of variations.

Many machine learning techniques are based on minimizing a function J(θ) by finding the input vector θ ∈ R^n for which it takes on its minimal value. This can be accomplished with multivariate calculus and linear algebra, by solving for the critical points where ∇_θ J(θ) = 0. In some cases, we actually want to solve for a function f(x), for example when we want to find the probability density function over some random variable. This is what calculus of variations enables us to do.

A function of a function f is known as a functional J[f]. Much as we can take partial derivatives of a function with respect to elements of its vector-valued argument, we can take functional derivatives, also known as variational derivatives, of a functional J[f] with respect to individual values of the function f(x). The functional derivative of the functional J with respect to the value of f at point x is denoted δJ / δf(x).

A complete formal development of functional derivatives is beyond the scope of this book. For our purposes, it is sufficient to state that for differentiable functions f(x) and differentiable functions g(y, x) with continuous derivatives,

δ/δf(x) ∫ g(f(x), x) dx = ∂/∂y g(f(x), x).   (19.46)

To gain some intuition for this identity, one can think of f(x) as a vector with uncountably many elements, indexed by a real vector x. In this (somewhat incomplete) view, the identity is the same as what we would obtain for a vector θ ∈ R^n indexed by positive integers:

∂/∂θ_i Σ_j g(θ_j, j) = ∂/∂θ_i g(θ_i, i).   (19.47)

Many results in other machine learning publications are presented using the more general Euler-Lagrange equation, which allows g to depend on the derivatives of f as well as the value of f, but we do not need that fully general form here.

To optimize a function with respect to a vector, we take the gradient and solve for the point where every element of the gradient is 0. Likewise, we can optimize a functional by solving for the function where the functional derivative at every point is 0.

As an example of how this process works, consider the problem of finding the probability distribution function over x ∈ R that has maximal differential entropy. Recall that the entropy of a probability distribution p(x) is defined as

H[p] = −E_x log p(x).   (19.48)

For continuous values, the expectation is an integral:

H[p] = −∫ p(x) log p(x) dx.   (19.49)

We cannot simply maximize H[p] with respect to the function p(x), because the result might not be a probability distribution. Instead, we need to use Lagrange multipliers to add a constraint that p(x) integrates to 1. Also, the entropy increases without bound as the variance increases, so it is only interesting to look for the distribution with maximal entropy for fixed variance σ². Finally, the problem is underdetermined because the distribution can be shifted arbitrarily without changing the entropy, so we also impose a constraint that the mean of the distribution be µ. The Lagrangian functional for this optimization problem is

L[p] = λ₁ ( ∫ p(x) dx − 1 ) + λ₂ ( E[x] − µ ) + λ₃ ( E[(x − µ)²] − σ² ) + H[p]   (19.50)
= ∫ ( λ₁ p(x) + λ₂ p(x) x + λ₃ p(x)(x − µ)² − p(x) log p(x) ) dx − λ₁ − µλ₂ − σ²λ₃.   (19.51)

To minimize the Lagrangian with respect to p, we set the functional derivatives equal to 0:

∀x,  δL / δp(x) = λ₁ + λ₂ x + λ₃ (x − µ)² − 1 − log p(x) = 0.   (19.52)

This condition tells us the functional form of p(x). By algebraically rearranging the equation, we obtain

p(x) = exp( λ₁ + λ₂ x + λ₃ (x − µ)² − 1 ).   (19.53)

We never assumed directly that p(x) would take this form; we obtained the expression through analytical optimization. To finish the problem, we must choose the λ values so that all the constraints are satisfied. We may set λ₁ = 1 − log σ√(2π), λ₂ = 0, and λ₃ = −1/(2σ²) to obtain

p(x) = N(x; µ, σ²).   (19.54)

This is one reason for using the normal distribution when we do not know the true distribution: because the normal distribution has the maximum entropy, we impose the least possible amount of structure by making this assumption.

What about the probability distribution function that has minimal entropy? It turns out that no specific function achieves it. As functions place more probability density on the two points x = µ + σ and x = µ − σ, and less on all other values of x, they lose entropy while maintaining the desired variance. However, any function placing all the mass on exactly two points is in the limit a mixture of two Dirac distributions, which is not described by a single probability distribution function. Such distributions are invisible to our method of solving for a function where the functional derivatives are zero. There is thus no single minimal-entropy probability distribution function, much as there is no smallest positive real number.
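
The maximum-entropy result is easy to sanity-check with known closed-form differential entropies: among zero-mean, unit-variance densities, the Gaussian should come out on top. (The comparison distributions below are my choice of examples.)

```python
import numpy as np

sigma = 1.0
h_gauss   = 0.5 * np.log(2 * np.pi * np.e * sigma ** 2)
h_uniform = np.log(np.sqrt(12) * sigma)       # U(-sqrt(3), sqrt(3))
h_laplace = 1 + np.log(np.sqrt(2) * sigma)    # Laplace with variance sigma^2
print(f"Gaussian: {h_gauss:.4f}")    # 1.4189
print(f"Laplace : {h_laplace:.4f}")  # 1.3466
print(f"Uniform : {h_uniform:.4f}")  # 1.2425
```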

19.4.3 Continuous Latent Variables

When our graphical model contains continuous latent variables, we can still perform variational inference and learning by maximizing L. However, we must now use calculus of variations when maximizing L with respect to q(h | v).

In most cases, practitioners need not solve any calculus of variations problems themselves. Instead, there is a general equation for the mean field fixed-point updates. If we make the mean field approximation

q(h | v) = Π_i q(h_i | v),   (19.55)

and fix q(h_j | v) for all j ≠ i, then the optimal q(h_i | v) may be obtained by normalizing the unnormalized distribution

q̃(h_i | v) = exp( E_{h_{−i} ∼ q(h_{−i} | v)} log p̃(v, h) ),   (19.56)

as long as p does not assign 0 probability to any joint configuration of variables. Carrying out the expectation inside this equation yields the correct functional form of q(h_i | v). Deriving functional forms of q directly with calculus of variations is necessary only if one wishes to develop a new form of variational learning; equation 19.56 yields the mean field approximation for any probabilistic model.

Equation 19.56 is a fixed-point equation, designed to be applied iteratively for each value of i until convergence. However, it also tells us more than that: it tells us the functional form the optimal solution will take, whether we arrive there by fixed-point equations or not. This means we can take the functional form from that equation, regard some of the values appearing in it as parameters, and optimize them with any optimization algorithm we like.

To make this more concrete, we demonstrate with a simple probabilistic model, with latent variables h ∈ R² and just one visible variable, v. Suppose that

p(h) = N(h; 0, I)   (19.57)

and

p(v | h) = N(v; w^⊤ h; 1).   (19.58)

We could actually simplify this model by integrating out h; the result is just a Gaussian distribution over v. The model itself is not interesting; we have constructed it only to provide a simple demonstration of how calculus of variations may be applied to probabilistic modeling.

The true posterior is given, up to a normalizing constant, by

p(h | v)
∝ p(h, v)   (19.59)
= p(h₁) p(h₂) p(v | h)   (19.60)
∝ exp( −(1/2) [ h₁² + h₂² + (v − h₁w₁ − h₂w₂)² ] )   (19.61)
= exp( −(1/2) [ h₁² + h₂² + v² + h₁²w₁² + h₂²w₂² − 2vh₁w₁ − 2vh₂w₂ + 2h₁w₁h₂w₂ ] ).   (19.62)

Because of the terms multiplying h₁ and h₂ together, we can see that the true posterior does not factorize over h₁ and h₂.

Applying equation 19.56, we find

q̃(h₁ | v)
= exp( E_{h₂∼q(h₂|v)} log p̃(v, h) )   (19.63)
= exp( −(1/2) E_{h₂∼q(h₂|v)} [ h₁² + h₂² + v² + h₁²w₁² + h₂²w₂²   (19.64)
  − 2vh₁w₁ − 2vh₂w₂ + 2h₁w₁h₂w₂ ] ).   (19.65)

From this, we can see that there are effectively only two values we need to obtain from q(h₂ | v): E_{h₂∼q(h|v)}[h₂] and E_{h₂∼q(h|v)}[h₂²]. Writing these as ⟨h₂⟩ and ⟨h₂²⟩, we obtain

q̃(h₁ | v) = exp( −(1/2) [ h₁² + ⟨h₂²⟩ + v² + h₁²w₁² + ⟨h₂²⟩w₂²   (19.66)
  − 2vh₁w₁ − 2v⟨h₂⟩w₂ + 2h₁w₁⟨h₂⟩w₂ ] ).   (19.67)

From this, we can see that q̃ has the functional form of a Gaussian. We can thus conclude q(h | v) = N(h; µ, β⁻¹), where µ and diagonal β are variational parameters that we can optimize using any technique we choose. It is important to recall that we did not assume q would be Gaussian; its Gaussian form was derived automatically by using calculus of variations to maximize L with respect to q. Using the same approach on a different model could yield a different functional form of q.

This was, of course, only a small example to demonstrate calculus of variations in action. For an example of variational learning with continuous latent variables applied in deep learning, see Goodfellow et al. (2013d).
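
For this toy model, the updates implied by equation 19.67 can be iterated and compared with the exact posterior, which is available in closed form because the model is linear-Gaussian. In this sketch of mine (the values of w and v are assumptions), the factorial q recovers the exact posterior means but underestimates the marginal variances and drops the correlation, consistent with minimizing D_KL(q ∥ p).

```python
import numpy as np

# p(h) = N(h; 0, I), p(v | h) = N(v; w^T h, 1), h in R^2, scalar v.
w = np.array([1.0, 1.0])
v = 2.0

# Completing the square in eq. 19.67: q(h_1 | v) is Gaussian with
# precision 1 + w_1^2 and mean w_1 (v - w_2 <h_2>) / (1 + w_1^2); same for h_2.
m = np.zeros(2)                          # current <h_1>, <h_2>
for _ in range(100):                     # iterate the two fixed points
    for i in range(2):
        j = 1 - i
        m[i] = w[i] * (v - w[j] * m[j]) / (1.0 + w[i] ** 2)

prec = np.eye(2) + np.outer(w, w)        # exact posterior precision
print("mean field means :", m)                      # [0.667, 0.667]
print("exact means      :", np.linalg.solve(prec, w * v))
print("mean field vars  :", 1.0 / (1.0 + w ** 2))   # [0.5, 0.5]
print("exact covariance :", np.linalg.inv(prec))    # vars 0.667, corr -0.5
```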

19.4.4 Interactions between Learning and Inference

Using approximate inference as part of a learning algorithm affects the learning process, and this in turn affects the accuracy of the inference algorithm. Specifically, the training algorithm tends to adapt the model in a way that makes the approximating assumptions underlying the approximate inference algorithm become more true. When training the parameters, variational learning increases

E_{h∼q} log p(v, h).   (19.68)

For a specific v, this increases p(h | v) for values of h that have high probability under q(h | v) and decreases p(h | v) for values of h that have low probability under q(h | v). This behavior causes our approximating assumptions to become self-fulfilling prophecies: if we train the model with a unimodal approximate posterior, we obtain a model whose true posterior is far closer to unimodal than it would have been had we trained the model with exact inference.

Computing the true amount of harm imposed on a model by a variational approximation is thus very difficult. There exist several methods for estimating log p(v; θ). We often find that L(v, θ, q) ≈ log p(v; θ) after training, and from this we can conclude that the variational approximation is accurate for the specific value of θ that the training process found. We should not conclude that the variational approximation is accurate in general, or that it did little harm to the training process. Suppose θ* = max_θ log p(v; θ) is the true optimal parameter value. It is possible that our learning process found a θ for which the bound is tight, L(v, θ, q) ≈ log p(v; θ), and yet log p(v; θ) ≪ log p(v; θ*). This can happen when max_q L(v, θ*, q) ≪ log p(v; θ*), that is, when θ* induces a posterior distribution too complicated for our q family to capture; the learning process will then never approach θ*. Such problems are very difficult to detect, because we can know for sure that they occurred only if we have a superior learning algorithm that can find θ* for comparison.

19.5 Learned Approximate Inference

We have seen that inference can be thought of as an optimization procedure that increases the value of L. Explicitly performing optimization via iterative procedures such as fixed-point equations or gradient-based optimization is often very expensive and time consuming. Many approaches to inference avoid this expense by learning to perform approximate inference. Specifically, we can think of the optimization process as a function f that maps an input v to an approximate distribution q* = arg max_q L(v, q). Once we think of the multistep iterative optimization process as just being a function, we can approximate it with a neural network that implements an approximation f̂(v; θ).

19.5.1 Wake-Sleep

One of the main difficulties with training a model to infer h from v is that we do not have a supervised training set with which to train the model. Given a v, we do not know the appropriate h. The mapping from v to h depends on the choice of model family, and evolves throughout learning as θ changes. The wake-sleep algorithm (Hinton et al., 1995b; Frey et al., 1996) resolves this problem by drawing samples of both h and v from the model distribution. For example, in a directed model, this can be done cheaply by performing ancestral sampling beginning at h and ending at v. The inference network can then be trained to perform the reverse mapping: predicting which h caused the present v. The main drawback to this approach is that we can train the inference network only on values of v that have high probability under the model. Early in learning, the model distribution does not resemble the data distribution, so the inference network has no opportunity to learn on samples that resemble data.

In section 18.2 we saw that one possible explanation for the role of dream sleep in human beings and animals is that dreams could provide the negative phase samples that Monte Carlo training algorithms use to approximate the negative gradient of the log partition function of undirected models. Another possible explanation for biological dreaming is that it is providing samples from p(h, v) that can be used to train an inference network to predict h given v. In some ways, this explanation is more satisfying than the partition function explanation. Monte Carlo algorithms generally do not perform well if run using only the positive phase of the gradient for several steps, then only the negative phase for several steps, and human beings and animals are usually awake for several consecutive hours, then asleep for several consecutive hours. Learning algorithms based on maximizing L, however, can be run with prolonged periods of improving q and prolonged periods of improving θ. If the role of biological dreaming is to train networks for predicting q, then this explains how animals are able to remain awake for several hours (the longer they are awake, the greater the gap between L and log p(v), but L remains a lower bound) and asleep for several hours (the generative model itself is not modified during sleep) without damaging their internal models.

19.5.2 Other Forms of Learned Inference

This strategy of learned approximate inference has been applied to other models as well. Salakhutdinov and Larochelle (2010) showed that a single pass in a learned inference network can yield faster inference than running the mean field fixed-point equations in a deep Boltzmann machine (DBM). The training procedure is based on running the inference network, applying one step of mean field to improve its estimates, and training the inference network to output the refined estimate instead of its original one.

We have already seen in section 14.8 that the predictive sparse decomposition (PSD) model trains a shallow encoder network to predict a sparse code for the input. This can be seen as a hybrid between an autoencoder and sparse coding, in which the encoder can be regarded as performing learned approximate MAP inference. A related approach is the learned ISTA technique (Gregor and LeCun, 2010b), which unrolls the iterations of a sparse coding inference algorithm into a trainable feedforward network.

Learned approximate inference has recently become one of the dominant approaches to generative modeling, in the form of the variational autoencoder (Kingma, 2013; Rezende et al., 2014). In this elegant approach, there is no need to construct explicit targets for the inference network. Instead, the inference network is simply used to define L, and the parameters of the inference network are then adapted to increase L. This model is described in depth in section 20.10.3.
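
A minimal sketch of this idea (my illustration; not code from the book or the cited papers): an encoder network outputs the parameters of a Gaussian q(h | v), one reparameterized sample gives a stochastic estimate of L, and both the inference network and the generative network are trained by ascending L directly. The data here is a random stand-in minibatch.

```python
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, n_vis=784, n_lat=20):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_vis, 200), nn.ReLU(),
                                 nn.Linear(200, 2 * n_lat))  # mu, log variance
        self.dec = nn.Sequential(nn.Linear(n_lat, 200), nn.ReLU(),
                                 nn.Linear(200, n_vis))

    def elbo(self, v):
        mu, log_var = self.enc(v).chunk(2, dim=-1)
        h = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterize
        logits = self.dec(h)
        # one-sample estimate of E_q[log p(v | h)], Bernoulli visibles
        rec = -nn.functional.binary_cross_entropy_with_logits(
            logits, v, reduction="none").sum(-1)
        # D_KL(q(h | v) || p(h)) with p(h) = N(0, I), in closed form
        kl = 0.5 * (mu ** 2 + log_var.exp() - 1 - log_var).sum(-1)
        return (rec - kl).mean()

model = TinyVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
v = (torch.rand(64, 784) < 0.5).float()   # stand-in binary minibatch
for _ in range(10):
    loss = -model.elbo(v)                 # maximizing L = minimizing -L
    opt.zero_grad()
    loss.backward()
    opt.step()
print("ELBO:", model.elbo(v).item())
```

Because the encoder amortizes inference across all inputs, no per-example variational parameters need to be stored, addressing the same scaling concern raised for iterative mean field earlier in the chapter.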
