Deep Learning: Chapter 19, Approximate Inference
Many probabilistic models are difficult to train because it is difficult to perform inference in them. In the context of deep learning, we usually have a set of visible variables v and a set of latent variables h, and the challenging inference problem is computing p(h | v), or taking expectations with respect to it. Such operations are often necessary for tasks like maximum likelihood learning. Some simple models, such as probabilistic PCA, are defined so that p(h | v) is tractable, but most graphical models with multiple layers of hidden variables, or with interactions between hidden variables, have intractable posterior distributions. This chapter introduces several techniques for confronting these intractable inference problems; chapter 20 shows how to use them to train probabilistic models that would otherwise be intractable.

[Figure 19.1: Intractable inference problems in deep learning usually arise from interactions between latent variables in a structured graphical model, whether direct interactions in an undirected model such as the semi-restricted Boltzmann machine (Osindero and Hinton, 2008) or "explaining away" interactions between ancestors of the same visible unit in a directed model.]

19.1 Inference as Optimization
Exact inference can be described as an optimization problem, and many approaches to approximate inference follow from approximating the underlying optimization problem. To construct the optimization problem, suppose we have a probabilistic model consisting of observed variables v and latent variables h. We would like to compute the log-probability of the observed data, log p(v; θ). Sometimes it is too difficult to compute log p(v; θ) because it is costly to marginalize out h. Instead, we can compute a lower bound L(v, θ, q) on it. This bound is called the evidence lower bound (ELBO); another commonly used name for it is the negative variational free energy. Specifically, the evidence lower bound is defined to be

L(v, θ, q) = log p(v; θ) − D_KL(q(h | v) ∥ p(h | v; θ)),   (19.1)

where q is an arbitrary probability distribution over h.
Because the difference between log p(v) and L(v, θ, q) is given by the KL divergence, and because the KL divergence is always nonnegative, L is always at most the desired log-probability, with equality if and only if q is the same distribution as p(h | v).

Surprisingly, L can be considerably easier to compute for some distributions q. Simple algebra shows that we can rearrange L into a much more convenient form:
L(v, θ, q) = log p(v; θ) − D_KL(q(h | v) ∥ p(h | v; θ))   (19.2)
= log p(v; θ) − E_{h∼q} log [ q(h | v) / p(h | v) ]   (19.3)
= log p(v; θ) − E_{h∼q} log [ q(h | v) / ( p(h, v; θ) / p(v; θ) ) ]   (19.4)
= log p(v; θ) − E_{h∼q} [ log q(h | v) − log p(h, v; θ) + log p(v; θ) ]   (19.5)
= −E_{h∼q} [ log q(h | v) − log p(h, v; θ) ].   (19.6)
This yields the more canonical definition of the evidence lower bound,

L(v, θ, q) = E_{h∼q}[log p(h, v)] + H(q).   (19.7)

For an appropriate choice of q, L is tractable to compute. For any choice of q, L provides a lower bound on the likelihood, and for q(h | v) that are better approximations of p(h | v), the bound is tighter, meaning closer to log p(v). When q(h | v) = p(h | v), the approximation is perfect and L(v, θ, q) = log p(v; θ).

We can thus think of inference as the procedure for finding the q that maximizes L. Exact inference maximizes L perfectly by searching over a family of functions q that includes p(h | v). Throughout this chapter, we show how to derive different forms of approximate inference by using approximate optimization to find q: we can make the optimization procedure less expensive but approximate by restricting the family of distributions over which we search, or by using an imperfect optimization procedure that may not completely maximize L but merely increases it by a significant amount. No matter what choice of q we use, L remains a lower bound; we can get tighter or looser versions of the bound that are cheaper or more expensive to compute depending on how we approach this optimization problem.
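As a concrete (if inefficient) sketch of equation 19.7, the following Python fragment evaluates L by brute-force enumeration for a toy model with a handful of binary latent variables; the function log_joint is a hypothetical stand-in for log p(h, v; θ), and everything here is illustrative rather than part of any standard library.

    import numpy as np

    def elbo(log_joint, q_probs):
        # q_probs[i] = q(h_i = 1 | v) under a factorial (mean field) q.
        m = len(q_probs)
        L = 0.0
        for bits in range(2 ** m):              # enumerate every binary h
            h = np.array([(bits >> i) & 1 for i in range(m)], dtype=float)
            q_h = np.prod(np.where(h == 1, q_probs, 1 - q_probs))
            if q_h > 0:                         # accumulate E_q[log p(h,v)] + H(q)
                L += q_h * (log_joint(h) - np.log(q_h))
        return L

    # Toy check: factorial Bernoulli(0.5) prior with p(v | h) constant in h,
    # so p(h | v) is uniform; choosing q = p(h | v) makes L = log p(v) = 0.
    m = 3
    print(elbo(lambda h: m * np.log(0.5), np.full(m, 0.5)))   # ~0.0

Setting q equal to the true posterior makes the printed bound coincide with log p(v), matching the equality condition stated above; any other q gives a strictly smaller value.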
19.2 Expectation Maximization

The first algorithm we introduce based on maximizing a lower bound L is the expectation maximization (EM) algorithm, a popular training algorithm for models with latent variables. We describe here the view of EM developed by Neal and Hinton (1999). Unlike most of the other algorithms described in this chapter, EM is not an approach to approximate inference, but rather an approach to learning with an approximate posterior. The EM algorithm consists of alternating between the following two steps until convergence:
• E-step (expectation step): Let θ^(0) denote the value of the parameters at the beginning of the step. Set q(h^(i) | v) = p(h^(i) | v^(i); θ^(0)) for all indices i of the training examples v^(i) on which we want to train (both batch and minibatch variants are valid). By this we mean that q is defined in terms of the current parameter value θ^(0): if we vary θ, then p(h | v; θ) will change, but q(h | v) will still equal p(h | v; θ^(0)).

• M-step (maximization step): Completely or partially maximize

Σ_i L(v^(i), θ, q)   (19.8)

with respect to θ, using your optimization algorithm of choice.
This can be viewed as a coordinate ascent algorithm to maximize L: on one step we maximize L with respect to q, and on the other we maximize L with respect to θ.

Stochastic gradient ascent on latent variable models can be seen as a special case of the EM algorithm in which the M-step consists of taking a single gradient step. Other variants of EM can make much larger steps; for some model families, the M-step can even be performed analytically, jumping all the way to the optimal θ given the current q.

Even though the E-step involves exact inference, we can think of the EM algorithm as using approximate inference in some sense. Specifically, the M-step assumes that the same value of q can be used for all values of θ. This introduces a gap between L and the true log p(v) as the M-step moves further and further away from the value θ^(0) used in the E-step. Fortunately, the E-step reduces the gap to zero again at the start of the next cycle through the loop.

The EM algorithm contains two different insights. First, there is the basic structure of the learning process, in which we update the model parameters to improve the likelihood of a completed dataset, where all missing variables have their values provided by an estimate of the posterior distribution. Second, there is the insight that we can continue to use one value of q even after we have moved to a different value of θ. This second insight is used throughout much of classical machine learning to derive large M-step updates. In the context of deep learning, most models are too complex to admit a tractable solution for an optimal large M-step update, so this second insight, which is more unique to the EM algorithm, is rarely used.
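To make the coordinate ascent view concrete, here is a minimal sketch fitting a two-component mixture of unit-variance Gaussians, a case where both steps are available in closed form; the data, initialization, and iteration count are arbitrary illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)
    v = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])
    mu = np.array([-1.0, 1.0])           # initial component means
    pi = np.array([0.5, 0.5])            # mixing proportions

    for step in range(50):
        # E-step: q(h | v) = p(h | v; theta_0), the exact posterior responsibilities.
        logp = -0.5 * (v[:, None] - mu) ** 2 + np.log(pi)
        q = np.exp(logp - logp.max(axis=1, keepdims=True))
        q /= q.sum(axis=1, keepdims=True)
        # M-step: maximize sum_i L(v^(i), theta, q) over theta, here in closed form.
        Nk = q.sum(axis=0)
        mu = (q * v[:, None]).sum(axis=0) / Nk
        pi = Nk / len(v)

    print(mu, pi)                        # means approach -2 and 3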
19.3 MAP Inference and Sparse Coding

We usually use the term inference to refer to computing the probability distribution over one set of variables given another. When training probabilistic models with latent variables, we are usually interested in computing p(h | v). An alternative form of inference is to compute the single most likely value of the missing variables, rather than to infer the entire distribution over their possible values. In the context of latent variable models, this means computing

h* = arg max_h p(h | v).   (19.9)

This is known as maximum a posteriori inference, abbreviated as MAP inference.

MAP inference is usually not thought of as approximate inference; it does compute the exact most likely value of h*. But if we wish to develop a learning process based on maximizing L(v, h, q), then it is helpful to think of MAP inference as a procedure that provides a value of q, and in this sense it is approximate inference, because it does not provide the optimal q. Recall from section 19.1 that exact inference consists of maximizing

L(v, θ, q) = E_{h∼q}[log p(h, v)] + H(q)   (19.10)

with respect to q over an unrestricted family of probability distributions, using an exact optimization algorithm.
We can derive MAP inference as a form of approximate inference by restricting the family of distributions that q may be drawn from. Specifically, we require q to take on a Dirac distribution:

q(h | v) = δ(h − µ).   (19.11)

This means that we can now control q entirely via µ. Dropping the terms of L that do not vary with µ, we are left with the optimization problem

µ* = arg max_µ log p(h = µ, v),   (19.12)

which is equivalent to the MAP inference problem

h* = arg max_h p(h | v).   (19.13)

We can thus justify a learning procedure similar to EM, in which we alternate between performing MAP inference to infer h* and then updating θ to increase log p(h*, v). As with EM, this is a form of coordinate ascent on L, where we alternate between using inference to optimize L with respect to q and using parameter updates to optimize L with respect to θ. The procedure as a whole is justified by the fact that L is a lower bound on log p(v). In the case of MAP inference, this justification is rather vacuous, because the bound is infinitely loose, due to the Dirac distribution's differential entropy of negative infinity. Adding noise to µ would make the bound meaningful again, however.
MAP inference is commonly used in deep learning as both a feature extractor and a learning mechanism, primarily for sparse coding models. Recall from section 13.4 that sparse coding is a linear factor model that imposes a sparsity-inducing prior on its hidden units, commonly a factorial Laplace prior, with

p(h_i) = (λ/2) e^{−λ|h_i|}.   (19.14)

The visible units are then generated by performing a linear transformation and adding noise:

p(v | h) = N(v; Wh + b, β⁻¹I).   (19.15)

Computing or even representing p(h | v) is difficult. Every pair of variables h_i and h_j are both parents of v. This means that when v is observed, the graphical model contains an active path connecting h_i and h_j: all the hidden units participate in one massive clique in p(h | v). If the model were Gaussian, these interactions could be modeled efficiently via the covariance matrix, but the sparse prior makes the interactions non-Gaussian. Because p(h | v) is intractable, so is the computation of the log-likelihood and its gradient; we therefore cannot use exact maximum likelihood learning. Instead, we use MAP inference and learn the parameters by maximizing the ELBO defined by the Dirac distribution around the MAP estimate of h.
If we concatenate all the vectors h in the training set into a matrix H, and all the vectors v into a matrix V, the sparse coding learning process consists of minimizing

J(H, W) = Σ_{i,j} |H_{i,j}| + Σ_{i,j} (V − HWᵀ)²_{i,j}.   (19.16)

Most applications of sparse coding also involve weight decay or a constraint on the norms of the columns of W, to prevent the pathological solution with extremely small H and large W.

We can minimize J by alternating between minimization with respect to H and minimization with respect to W. Both subproblems are convex; in fact, the minimization with respect to W is just a linear regression problem. Minimization of J with respect to both arguments, however, is usually not convex. Minimization with respect to H requires specialized algorithms such as the feature-sign search algorithm (Lee et al., 2007).
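A rough sketch of this alternating minimization appears below. For the H subproblem it substitutes a plain proximal-gradient (ISTA-style) loop in place of feature-sign search, and it adds a small ridge term and column renormalization as one instance of the constraint on W mentioned above; all sizes and data are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)
    V = rng.normal(size=(100, 20))       # rows are examples v
    W = rng.normal(size=(20, 50))        # dictionary
    H = np.zeros((100, 50))              # codes

    def soft_threshold(x, t):
        return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

    for outer in range(20):
        # H-step: proximal gradient descent on |H|_1 + |V - H W^T|_F^2.
        step = 1.0 / (2 * np.linalg.norm(W, 2) ** 2)    # 1/Lipschitz step size
        for inner in range(50):
            grad = 2 * (H @ W.T - V) @ W
            H = soft_threshold(H - step * grad, step)
        # W-step: ridge-stabilized linear regression, followed by column
        # renormalization to rule out the tiny-H / huge-W degenerate solution.
        W = np.linalg.solve(H.T @ H + 1e-6 * np.eye(50), H.T @ V).T
        W /= np.maximum(np.linalg.norm(W, axis=0, keepdims=True), 1e-8)

    print(np.mean(np.abs(H) > 1e-6))     # fraction of active code entries

The renormalization step is a heuristic rather than the exact minimizer of J, so this sketch trades a little optimality for stability.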
19.4 Variational Inference and Learning

We have seen how the evidence lower bound L(v, θ, q) is a lower bound on log p(v; θ), how inference can be viewed as maximizing L with respect to q, and how learning can be viewed as maximizing L with respect to θ. We have seen that the EM algorithm enables us to make large learning steps with a fixed q, and that learning algorithms based on MAP inference let us learn using a point estimate of p(h | v) rather than inferring the entire distribution. Now we develop the more general approach to variational learning.

The core idea behind variational learning is that we can maximize L over a restricted family of distributions q. This family should be chosen so that it is easy to compute E_q log p(h, v). A typical way to do this is to introduce assumptions about how q factorizes. A common approach is to impose the restriction that q is a factorial distribution:

q(h | v) = Π_i q(h_i | v).   (19.17)

This is called the mean field approach. More generally, we can impose any graphical model structure we choose on q, to flexibly determine how many interactions we want our approximation to capture. This fully general graphical model approach is called structured variational inference (Saul and Jordan, 1996).

The beauty of the variational approach is that we do not need to specify a particular parametric form for q. We specify how it should factorize, and the optimization problem then determines the optimal probability distribution within those factorization constraints. For discrete latent variables this just means using traditional optimization techniques to optimize a finite number of variables describing the q distribution; for continuous latent variables, it means using a branch of mathematics called calculus of variations to optimize over a space of functions and actually determine which function should be used to represent q.

Because L(v, θ, q) is defined to be log p(v; θ) − D_KL(q(h | v) ∥ p(h | v; θ)), we can think of maximizing L with respect to q as minimizing D_KL(q(h | v) ∥ p(h | v)). In this sense, we are fitting q to p, but we are doing so with the opposite direction of the KL divergence than we are used to. When we use maximum likelihood learning to fit a model to data, we minimize D_KL(p_data ∥ p_model). As illustrated in figure 3.6, this means that maximum likelihood encourages the model to have high probability everywhere that the data has high probability, while our optimization-based inference procedure encourages q to have low probability everywhere the true posterior has low probability. The direction D_KL(q(h | v) ∥ p(h | v)) is chosen for computational reasons: evaluating it requires expectations only with respect to q, which we design to be tractable. The opposite direction, D_KL(p(h | v) ∥ q(h | v)), would require computing expectations with respect to the true posterior, which is exactly what we cannot do.
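The asymmetry between the two directions can be seen in a small numeric experiment, sketched below with an arbitrary grid and parameters: a single Gaussian-shaped q is fit to a bimodal p on a discrete grid under each direction of the KL divergence.

    import numpy as np

    x = np.linspace(-6, 6, 100)
    normalize = lambda u: u / u.sum()
    p = normalize(np.exp(-0.5 * (x + 2)**2) + np.exp(-0.5 * (x - 2)**2))
    kl = lambda a, b: np.sum(a * np.log(a / b))

    best_qp = (np.inf, None)             # arg min_q KL(q || p): inference direction
    best_pq = (np.inf, None)             # arg min_q KL(p || q): max-likelihood direction
    for mu in np.linspace(-4, 4, 81):
        for s in np.linspace(0.3, 4.0, 38):
            q = normalize(np.exp(-0.5 * ((x - mu) / s)**2) + 1e-300)  # floor underflow
            best_qp = min(best_qp, (kl(q, p), (mu, s)))
            best_pq = min(best_pq, (kl(p, q), (mu, s)))

    print(best_qp[1])                    # mean near -2 or +2: locks onto one mode
    print(best_pq[1])                    # mean near 0, wide s: covers both modes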
19.4.1 Discrete Latent Variables

Variational inference with discrete latent variables is relatively straightforward. We define a distribution q, typically one where each factor of q is just defined by a lookup table over discrete states. In the simplest case, h is binary and we make the mean field assumption that q factorizes over each individual h_i. In this case we can parametrize q with a vector ĥ whose entries are probabilities; then q(h_i = 1 | v) = ĥ_i.

After determining how to represent q, we simply optimize its parameters. For discrete latent variables, this is just a standard optimization problem, and in principle q could be selected with any optimization algorithm, such as gradient descent. Because this optimization must occur in the inner loop of a learning algorithm, however, it must be very fast. A popular choice is to iterate fixed-point equations: to solve

∂L/∂ĥ_i = 0   (19.18)

for ĥ_i, repeatedly updating different elements of ĥ until a convergence criterion is satisfied.

To make all this concrete, we show how to apply variational inference to the binary sparse coding model (we present here the model developed by Henniges et al. (2010)). This development goes into considerable mathematical detail and is intended for readers who wish to fully resolve any ambiguity in the high-level description of variational inference given above; readers who do not plan to derive variational learning algorithms may skip ahead to section 19.4.2 without missing any new high-level concepts. Readers who proceed may find it helpful to review section 3.10.
In the binary sparse coding model, the input v ∈ Rⁿ is generated by adding Gaussian noise to the sum of m different components, each of which can be present or absent. Each component is switched on or off by the corresponding hidden unit in h ∈ {0, 1}ᵐ:

p(h_i = 1) = σ(b_i),   (19.19)
p(v | h) = N(v; Wh, β⁻¹),   (19.20)

where b is a learnable set of biases, W is a learnable weight matrix, and β is a learnable, diagonal precision matrix.
Training the model with maximum likelihood requires taking the derivative with respect to the parameters. Consider the derivative with respect to one of the biases:

∂/∂b_i log p(v)   (19.21)
= [∂/∂b_i p(v)] / p(v)   (19.22)
= [∂/∂b_i Σ_h p(h, v)] / p(v)   (19.23)
= [∂/∂b_i Σ_h p(h) p(v | h)] / p(v)   (19.24)
= [Σ_h p(v | h) ∂/∂b_i p(h)] / p(v)   (19.25)
= Σ_h p(h | v) [∂/∂b_i p(h)] / p(h)   (19.26)
= E_{h∼p(h|v)} ∂/∂b_i log p(h).   (19.27)

This requires computing expectations with respect to p(h | v). Unfortunately, p(h | v) is intractable. To see why, observe figure 19.2: the posterior distribution corresponds to a complete graph over the hidden units, so variable elimination algorithms do not help, and computing the expectation by brute force would require summing over exponentially many hidden-unit configurations.
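For a model small enough to enumerate, equation 19.27 can be verified numerically, which also makes the cost of brute force vivid: the sums below run over all 2^m hidden configurations. All parameters are random illustrative values; for p(h_i = 1) = σ(b_i), the inner derivative is ∂/∂b_i log p(h) = h_i − σ(b_i).

    import numpy as np
    from itertools import product

    rng = np.random.default_rng(1)
    m, n = 4, 3
    b, W = rng.normal(size=m), rng.normal(size=(n, m))
    v, beta = rng.normal(size=n), 1.0
    sigma = lambda x: 1 / (1 + np.exp(-x))

    def log_joint(h, b):                 # log p(h, v), up to a constant in h and b
        log_ph = np.sum(h * np.log(sigma(b)) + (1 - h) * np.log(sigma(-b)))
        r = v - W @ h
        return log_ph - 0.5 * beta * (r @ r)

    H = np.array(list(product([0.0, 1.0], repeat=m)))      # all 2^m states
    logj = np.array([log_joint(h, b) for h in H])
    post = np.exp(logj - logj.max())
    post /= post.sum()                                     # p(h | v)
    grad_identity = post @ (H - sigma(b))                  # E_{p(h|v)}[h - sigma(b)]

    def log_pv(b):                                         # log sum_h p(h, v)
        lj = np.array([log_joint(h, b) for h in H])
        return lj.max() + np.log(np.exp(lj - lj.max()).sum())

    eps = 1e-5
    grad_fd = np.array([(log_pv(b + eps * e) - log_pv(b - eps * e)) / (2 * eps)
                        for e in np.eye(m)])
    print(np.allclose(grad_identity, grad_fd, atol=1e-6))  # True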
Instead, we can compute a mean field approximation to the posterior:

q(h | v) = Π_i q(h_i | v).   (19.28)

The binary sparse coding latent variables are binary, so to represent a factorial q we simply need to model m Bernoulli distributions q(h_i | v). A natural way to represent the means of the Bernoulli distributions is with a vector ĥ of probabilities, with q(h_i = 1 | v) = ĥ_i.
[Figure 19.2: The graph structure of a binary sparse coding model with four hidden units. (Left) The graph structure of p(h, v): every pair of hidden units are co-parents of every visible unit. (Right) The graph structure of p(h | v): to account for the explaining-away interactions induced by observing v, the posterior needs an edge between every pair of hidden units.]
We impose a restriction that ĥ_i is never equal to 0 or to 1, in order to avoid errors when computing, for example, log ĥ_i. We will see that the variational inference equations never assign 0 or 1 to ĥ_i anyway; in a software implementation, however, machine rounding error could produce 0 or 1 values. We therefore implement ĥ in software as a function of an unrestricted vector of variational parameters z, via the relation ĥ = σ(z). We can then safely compute log ĥ_i on a computer by using the identity relating the sigmoid and the softplus function,

log σ(z_i) = −ζ(−z_i).   (19.29)
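In code, this parametrization might look like the following sketch, which uses the numerically stable form of the softplus so that log ĥ_i never underflows to −∞:

    import numpy as np

    def softplus(x):                     # zeta(x) = log(1 + exp(x)), stable form
        return np.maximum(x, 0) + np.log1p(np.exp(-np.abs(x)))

    def log_sigmoid(z):                  # log h_hat_i = -zeta(-z_i), eq. 19.29
        return -softplus(-z)

    z = np.array([-50.0, 0.0, 50.0])
    print(log_sigmoid(z))                # [-50., -0.693, -1.9e-22]; no -inf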
To begin our derivation of variational learning in this model, we show that the mean field approximation makes learning tractable. The evidence lower bound is

L(v, θ, q)
= E_{h∼q}[log p(h, v)] + H(q)   (19.30)
= E_{h∼q}[log p(h) + log p(v | h) − log q(h | v)]   (19.31)
= E_{h∼q}[ Σ_{i=1}^{m} log p(h_i) + Σ_{i=1}^{n} log p(v_i | h) − Σ_{i=1}^{m} log q(h_i | v) ]   (19.32)
= Σ_{i=1}^{m} [ ĥ_i (log σ(b_i) − log ĥ_i) + (1 − ĥ_i)(log σ(−b_i) − log(1 − ĥ_i)) ]   (19.33)
  + E_{h∼q}[ Σ_{i=1}^{n} log √(β_i / 2π) exp( −(β_i / 2)(v_i − W_{i,:} h)² ) ]   (19.34)
= Σ_{i=1}^{m} [ ĥ_i (log σ(b_i) − log ĥ_i) + (1 − ĥ_i)(log σ(−b_i) − log(1 − ĥ_i)) ]   (19.35)
  + (1/2) Σ_{i=1}^{n} [ log (β_i / 2π) − β_i ( v_i² − 2 v_i W_{i,:} ĥ + Σ_j [ W_{i,j}² ĥ_j + Σ_{k≠j} W_{i,j} W_{i,k} ĥ_j ĥ_k ] ) ].   (19.36)

While these equations are somewhat unappealing aesthetically, they show that L can be expressed in a small number of simple arithmetic operations. The evidence lower bound L is therefore tractable, and we can use it as a replacement for the intractable log-likelihood.
In principle, we could simply run gradient ascent on both θ and ĥ, yielding a perfectly acceptable combined inference and training algorithm. Usually we do not, for two reasons. First, doing so would require storing ĥ for each v, and we generally prefer algorithms that do not require per-example memory. Second, we would like to extract the features ĥ very quickly, in order to recognize the content of v, and it is not realistic to expect gradient descent to converge within the time constraints of a practical application. For these reasons, we typically do not compute the mean field parameters ĥ by gradient descent, but rather estimate them rapidly with fixed-point equations.

The idea behind fixed-point equations is to seek a local maximum with respect to ĥ, where ∇_ĥ L(v, θ, ĥ) = 0. We cannot efficiently solve this system of equations for all of ĥ simultaneously, but we can solve for a single variable:

∂/∂ĥ_i L(v, θ, ĥ) = 0.   (19.37)

We can then iteratively apply the solution to this equation for i = 1, …, m and repeat the cycle until a convergence criterion is satisfied. To apply the fixed-point update rule, we solve for the ĥ_i that sets equation 19.37 to zero by differentiating equation 19.36:
∂/∂ĥ_i L(v, θ, ĥ)   (19.38)
= ∂/∂ĥ_i [ Σ_{j=1}^{m} [ ĥ_j (log σ(b_j) − log ĥ_j) + (1 − ĥ_j)(log σ(−b_j) − log(1 − ĥ_j)) ]   (19.39)
  + (1/2) Σ_{j=1}^{n} [ log (β_j / 2π) − β_j ( v_j² − 2 v_j W_{j,:} ĥ + Σ_k [ W_{j,k}² ĥ_k + Σ_{l≠k} W_{j,k} W_{j,l} ĥ_k ĥ_l ] ) ] ]   (19.40)
= log σ(b_i) − log ĥ_i − 1 + log(1 − ĥ_i) + 1 − log σ(−b_i)   (19.41)
  + Σ_{j=1}^{n} β_j [ v_j W_{j,i} − (1/2) W_{j,i}² − Σ_{k≠i} W_{j,k} W_{j,i} ĥ_k ]   (19.42)
= b_i − log ĥ_i + log(1 − ĥ_i) + vᵀβW_{:,i} − (1/2) W_{:,i}ᵀβW_{:,i} − Σ_{j≠i} W_{:,j}ᵀβW_{:,i} ĥ_j.   (19.43)
To apply the fixed-point update inference rule, we solve for the ĥ_i that sets equation 19.43 to zero:

ĥ_i = σ( b_i + vᵀβW_{:,i} − (1/2) W_{:,i}ᵀβW_{:,i} − Σ_{j≠i} W_{:,j}ᵀβW_{:,i} ĥ_j ).   (19.44)

At this point we can see a close connection between recurrent neural networks and inference in graphical models: the mean field fixed-point equations define a recurrent network whose task is to perform inference. We have described how to derive this network from a model description, but it is also possible to train such a network directly; several ideas along these lines are described in chapter 20. In the case of binary sparse coding, the recurrent connections specified by equation 19.44 repeatedly update the hidden units based on the changing values of the other hidden units. The input always sends the fixed message vᵀβW to the hidden units, but the hidden units constantly update the messages they send to each other. In particular, two units ĥ_i and ĥ_j inhibit each other when their weight vectors are aligned, a form of competition in which only the unit that best explains the input remains active.
The same update can be rewritten in a form that reveals another interpretation:

ĥ_i = σ( b_i + ( v − Σ_{j≠i} W_{:,j} ĥ_j )ᵀ βW_{:,i} − (1/2) W_{:,i}ᵀβW_{:,i} ).   (19.45)

In this form, we see that unit i is updated based on the residual v − Σ_{j≠i} W_{:,j} ĥ_j, the difference between the input and the reconstruction produced by all the other units. We can thus think of mean field inference in this model as an iterative autoencoder that repeatedly encodes and decodes its input, attempting to fix mistakes in the reconstruction after each iteration.

In this example we derived an update rule that updates a single unit at a time. It would be advantageous to update more units simultaneously. Some graphical models, such as deep Boltzmann machines, are structured in a way that lets us solve for many entries of ĥ at once. Unfortunately, binary sparse coding does not admit such block updates. Instead, we can use a heuristic technique called damping: we solve for the individually optimal values of every element of ĥ, then move all the values a small step in that direction. This approach is no longer guaranteed to increase L at each step, but it works well in practice for many models. See Koller and Friedman (2009) for more information about choosing the degree of synchrony and damping strategies in message-passing algorithms.
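A sketch of a damped update for this model follows. It evaluates the right-hand side of equation 19.44 for all units in parallel and then moves ĥ only partway toward those individually optimal values; the parameters are random stand-ins and the damping coefficient is an arbitrary choice, so, as noted above, each sweep is not guaranteed to increase L.

    import numpy as np

    rng = np.random.default_rng(2)
    m, n = 10, 5
    b, W = rng.normal(size=m), 0.3 * rng.normal(size=(n, m))
    beta = np.ones(n)                    # diagonal precision, stored as a vector
    v = rng.normal(size=n)
    sigma = lambda x: 1 / (1 + np.exp(-x))

    G = (W.T * beta) @ W                 # G[i, j] = W[:, i]^T beta W[:, j]
    h_hat = np.full(m, 0.5)
    damping = 0.5
    for sweep in range(100):
        # Individually optimal h_hat_i from eq. 19.44, excluding each unit's
        # own contribution from the pairwise interaction term.
        target = sigma(b + (v * beta) @ W - 0.5 * np.diag(G)
                       - (G @ h_hat - np.diag(G) * h_hat))
        h_hat = (1 - damping) * target + damping * h_hat

    print(np.round(h_hat, 3))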
19.4.2 Calculus of Variations

Before continuing with our presentation of variational learning, we must briefly introduce an important set of mathematical tools used in variational learning: calculus of variations.

Many machine learning techniques are based on minimizing a function J(θ) by finding the input vector θ ∈ Rⁿ for which it takes on its minimal value, which can be accomplished with multivariate calculus and linear algebra by solving for the critical points where ∇_θ J(θ) = 0. In some cases we actually want to solve for a function f(x), for example when we want to find the probability density function over some random variable. This is what calculus of variations enables us to do.

A function of a function f is known as a functional J[f]. Much as we can take partial derivatives of a function with respect to elements of its vector-valued argument, we can take functional derivatives, also known as variational derivatives, of a functional J[f] with respect to individual values of the function f(x). The functional derivative of the functional J with respect to the value of f at the point x is denoted δJ/δf(x).

A complete formal development of functional derivatives is beyond the scope of this book. For our purposes it is sufficient to state that, for differentiable functions f(x) and differentiable functions g(y, x) with continuous derivatives,

δ/δf(x) ∫ g(f(x), x) dx = ∂/∂y g(f(x), x).   (19.46)

To gain intuition for this identity, one can think of f(x) as a vector with uncountably many elements, indexed by a real vector x. In this (somewhat incomplete) view, the identity is the same as what we would obtain for a vector θ ∈ Rⁿ indexed by positive integers:

∂/∂θ_i Σ_j g(θ_j, j) = ∂/∂θ_i g(θ_i, i).   (19.47)

Many results in other machine learning publications are presented in terms of the more general Euler-Lagrange equation, which allows g to depend on the derivatives of f as well as the value of f, but we do not need this fully general form for the results in this book.

To optimize a function with respect to a vector, we take the gradient and solve for the point where every element of the gradient is zero. Likewise, we can optimize a functional by solving for the function where the functional derivative at every point is zero. As an example of how this works, consider the problem of finding the probability distribution function over x ∈ R that has maximal differential entropy.
Recall that the entropy of a probability distribution p(x) is defined as

H[p] = −E_x log p(x).   (19.48)

For continuous values, the expectation is an integral:

H[p] = −∫ p(x) log p(x) dx.   (19.49)

We cannot simply maximize H[p] with respect to the function p(x): the result might not be a probability distribution, and because entropy increases without bound as the variance increases, the question is only interesting once the variance is fixed to some σ². The problem is also underdetermined, because a distribution can be shifted arbitrarily without changing its entropy, so we additionally fix the mean to be µ. We use Lagrange multipliers to add the constraints that p(x) integrates to 1, has mean µ, and has variance σ². The Lagrangian functional is
L[p] = λ₁ ( ∫ p(x) dx − 1 ) + λ₂ ( E[x] − µ ) + λ₃ ( E[(x − µ)²] − σ² ) + H[p]   (19.50)
= ∫ ( λ₁ p(x) + λ₂ p(x) x + λ₃ p(x)(x − µ)² − p(x) log p(x) ) dx − λ₁ − µλ₂ − σ²λ₃.   (19.51)

To minimize the Lagrangian with respect to p, we set the functional derivative equal to 0:

∀x,  δ/δp(x) L = λ₁ + λ₂ x + λ₃ (x − µ)² − 1 − log p(x) = 0.   (19.52)

This condition tells us the functional form of p(x). Algebraically rearranging,

p(x) = exp( λ₁ + λ₂ x + λ₃ (x − µ)² − 1 ).   (19.53)

We never assumed directly that p(x) would take this form; we obtained the expression itself through analytic optimization. To finish the problem, we choose the λ values so that all the constraints are satisfied. We are free to do so because the gradient of the Lagrangian with respect to the λ variables is zero as long as the constraints hold. Setting λ₁ = 1 − log(σ√(2π)), λ₂ = 0, and λ₃ = −1/(2σ²) yields

p(x) = N(x; µ, σ²).   (19.54)

This is one reason for using the normal distribution when we do not know the true distribution: because the normal distribution has the maximum entropy, we impose the least possible amount of structure by making this assumption.
What about the distribution that minimizes the entropy? It turns out that no specific function achieves minimal entropy. As functions place more probability density on the two points x = µ + σ and x = µ − σ, and less on all other values of x, they lose entropy while maintaining the desired variance. The limit of this process, a mixture of two Dirac distributions with weight 1/2 on each of these points, is not describable by a single probability distribution function. Much as there is no smallest positive real number, there is no function of minimal entropy within our function space: the minimization approaches this limiting solution but never reaches it.
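The maximum entropy property is easy to check against other families with matching variance, using the closed-form differential entropies of the Gaussian, uniform, and Laplace distributions:

    import numpy as np

    sigma2 = 1.0                                       # common variance
    H_gauss = 0.5 * np.log(2 * np.pi * np.e * sigma2)  # ~1.419 nats
    H_uniform = 0.5 * np.log(12 * sigma2)              # log(b-a), var (b-a)^2/12
    H_laplace = 1 + 0.5 * np.log(2 * sigma2)           # 1 + log(2s), var 2s^2
    print(H_gauss, H_uniform, H_laplace)               # the Gaussian is largest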
19.4.3 Continuous Latent Variables

When our graphical model contains continuous latent variables, we can still perform variational inference and learning by maximizing L, but we must now use calculus of variations when maximizing L with respect to q(h | v).

In most cases, practitioners need not solve any calculus of variations problems themselves, because there is a general equation for the mean field fixed-point updates. If we make the mean field approximation

q(h | v) = Π_i q(h_i | v),   (19.55)

and fix q(h_j | v) for all j ≠ i, then the optimal q(h_i | v) may be obtained by normalizing the unnormalized distribution

q̃(h_i | v) = exp( E_{h₋ᵢ∼q(h₋ᵢ|v)} log p̃(v, h) ),   (19.56)

so long as p does not assign 0 probability to any joint configuration of variables. Carrying out the expectation inside the exponential yields the correct functional form of q(h_i | v). Equation 19.56 is a fixed-point equation, designed to be applied for each value of i repeatedly until convergence. Deriving functional forms of q directly with calculus of variations is necessary only when developing a new form of variational learning, since equation 19.56 yields the mean field approximation for any probabilistic model.
As an example, consider a simple probabilistic model with latent variables h ∈ R² and just one visible variable, v. Suppose

p(h) = N(h; 0, I),   p(v | h) = N(v; wᵀh; 1).   (19.57)

We could actually simplify this model by integrating out h; the result is just a Gaussian distribution over v. The model itself is not interesting; we construct it only to demonstrate how calculus of variations may be applied to probabilistic modeling. The true posterior is given, up to a normalizing constant, by

p(h | v)
∝ p(h, v)   (19.58)
= p(h₁) p(h₂) p(v | h)   (19.59)
∝ exp( −(1/2) [ h₁² + h₂² + (v − h₁w₁ − h₂w₂)² ] )   (19.60)
= exp( −(1/2) [ h₁² + h₂² + v² + h₁²w₁² + h₂²w₂² − 2vh₁w₁ − 2vh₂w₂ + 2h₁w₁h₂w₂ ] ).   (19.61)

Because of the term multiplying h₁ and h₂ together, we see that the true posterior does not factorize over h₁ and h₂. Applying equation 19.56, we find

q̃(h₁ | v)   (19.62)
= exp( E_{h₂∼q(h₂|v)} log p̃(v, h) )   (19.63)
= exp( −(1/2) E_{h₂∼q(h₂|v)} [ h₁² + h₂² + v² + h₁²w₁²   (19.64)
    + h₂²w₂² − 2vh₁w₁ − 2vh₂w₂ + 2h₁w₁h₂w₂ ] ).   (19.65)

From this we see that there are effectively only two values we need from q(h₂ | v): E_{h₂∼q(h|v)}[h₂] and E_{h₂∼q(h|v)}[h₂²]. Writing these as ⟨h₂⟩ and ⟨h₂²⟩, we obtain

q̃(h₁ | v) = exp( −(1/2) [ h₁² + ⟨h₂²⟩ + v² + h₁²w₁² + ⟨h₂²⟩w₂²   (19.66)
    − 2vh₁w₁ − 2v⟨h₂⟩w₂ + 2h₁w₁⟨h₂⟩w₂ ] ).   (19.67)
From this, we can see that q̃ has the functional form of a Gaussian. We can thus conclude that q(h | v) = N(h; µ, β⁻¹), where µ and diagonal β are variational parameters that we can optimize with any technique we choose. It is important to recall that we did not assume q would be Gaussian; this conclusion was derived automatically by using calculus of variations to maximize L with respect to q. Using the same approach on a different model could yield a different functional form of q. This was, of course, only a small example for demonstration purposes. For examples of real applications of variational learning with continuous variables in the context of deep learning, see Goodfellow et al. (2013d).
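The fixed-point updates for this toy model can also be iterated directly. Completing the square in equation 19.66 gives each factor q(h_i | v) a mean of µ_i = w_i (v − w_j µ_j) / (1 + w_i²) for j ≠ i, with precision 1 + w_i². The sketch below iterates these updates (with arbitrary w and v) and compares the result to the exact posterior mean:

    import numpy as np

    w = np.array([1.0, 2.0])
    v = 3.0
    mu = np.zeros(2)                     # mean field means <h_1>, <h_2>
    for sweep in range(100):
        for i in range(2):
            j = 1 - i
            mu[i] = w[i] * (v - w[j] * mu[j]) / (1 + w[i] ** 2)

    # Exact posterior: precision I + w w^T, mean (I + w w^T)^{-1} w v.
    exact = np.linalg.solve(np.eye(2) + np.outer(w, w), w * v)
    print(mu, exact)                     # identical means: [0.5, 1.0]

In this Gaussian case the mean field means coincide with the exact posterior means, although the factorial q still understates the posterior: its variances 1/(1 + w_i²) are smaller than the true marginal variances, and it discards the correlation between h₁ and h₂.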
19.4.4 Interactions between Learning and Inference

Using approximate inference as part of a learning algorithm affects the learning process, and this in turn affects the accuracy of the inference algorithm. Specifically, the training algorithm tends to adapt the model in a way that makes the approximating assumptions underlying the approximate inference algorithm become more true. When training the parameters, variational learning increases

E_{h∼q} log p(v, h).   (19.68)

For a specific v, this increases p(h | v) for values of h that have high probability under q(h | v) and decreases p(h | v) for values of h that have low probability under q(h | v). This behavior causes our approximating assumptions to become self-fulfilling prophecies: if we train the model with a unimodal approximate posterior, we obtain a model whose true posterior is far closer to unimodal than it would have been had we trained with exact inference.

Computing the true amount of harm imposed on a model by a variational approximation is therefore difficult. It is often possible to estimate log p(v; θ) after training and find that the gap between it and L(v, θ, q) is small. From this we can conclude only that the variational approximation is accurate for the specific θ obtained by the learning process, not that the approximation is accurate in general or that it did little harm to learning. To measure the true amount of harm, we would need to know θ* = max_θ log p(v; θ). It is possible for L(v, θ, q) ≈ log p(v; θ) and log p(v; θ) ≪ log p(v; θ*) to hold simultaneously: if max_q L(v, θ*, q) ≪ log p(v; θ*), because θ* induces a posterior distribution too complicated for our q family to capture, then the learning process will never approach θ*. Such a problem is very difficult to detect, because we can only know for sure that it happened if we have a superior learning algorithm that can find θ* for comparison.
19.5 Learned Approximate Inference

We have seen that inference can be thought of as an optimization procedure that increases the value of the lower bound L. Explicitly performing this optimization via iterative procedures such as fixed-point equations or gradient-based optimization is often very expensive and time consuming. Many approaches to inference avoid this expense by learning to perform approximate inference. Specifically, we can think of the optimization process as a function f mapping an input v to an approximate distribution q* = arg max_q L(v, q). Once we think of the multistep iterative optimization process as just being a function, we can approximate it with a neural network that implements an approximation f̂(v; θ).
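As a minimal sketch of this idea, the fragment below uses a one-layer linear map as an illustrative stand-in for f̂(v; θ): it outputs the mean of a fixed-variance Gaussian q(h | v) for the toy model of section 19.4.3, and its two parameters are adjusted by stochastic gradient ascent on L itself, estimated with reparametrized samples (h = µ + ε) and finite differences. All constants here are arbitrary.

    import numpy as np

    rng = np.random.default_rng(3)
    w_dec = 1.5                          # generative model: p(h)=N(0,1), p(v|h)=N(w*h,1)
    a, c = 0.0, 0.0                      # inference net: mu(v) = a*v + c
    v = 2.0

    def elbo_batch(a, c, eps):
        h = (a * v + c) + eps                            # samples from q(h|v)=N(mu,1)
        log_p = -0.5 * h**2 - 0.5 * (v - w_dec * h)**2   # log p(h,v) up to a constant
        return np.mean(log_p) + 0.5 * np.log(2 * np.pi * np.e)   # + H(q)

    for step in range(1000):             # stochastic gradient ascent on L
        eps, d = rng.normal(size=64), 1e-4
        g_a = (elbo_batch(a + d, c, eps) - elbo_batch(a - d, c, eps)) / (2 * d)
        g_c = (elbo_batch(a, c + d, eps) - elbo_batch(a, c - d, eps)) / (2 * d)
        a, c = a + 0.002 * g_a, c + 0.002 * g_c

    print(a * v + c, w_dec * v / (1 + w_dec**2))   # learned mu(v) ~ exact posterior mean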
19.5.1 Wake-Sleep

One of the main difficulties with training a model to infer h from v is that we do not have a supervised training set with which to train the model: given a v, we do not know the appropriate h. The mapping from v to h depends on the choice of model family and evolves throughout learning as θ changes. The wake-sleep algorithm (Hinton et al., 1995b; Frey et al., 1996) resolves this problem by drawing samples of both h and v from the model distribution. In a directed model, this can be done cheaply by ancestral sampling beginning at h and ending at v. The inference network can then be trained to perform the reverse mapping: predicting which h caused the observed v. The main drawback of this approach is that we can train the inference network only on values of v that have high probability under the model; early in learning, the model distribution does not resemble the data distribution, so the inference network has no opportunity to learn on samples that resemble data.

In section 18.2 we saw that one possible explanation for the role of dream sleep in humans and animals is that dreams provide the negative-phase samples that Monte Carlo training algorithms use to approximate the negative gradient of the log partition function of undirected models. Another possible explanation for biological dreaming is that it provides samples from p(h, v) that can be used to train an inference network to predict h given v. In some ways this explanation is more satisfying than the partition function explanation: Monte Carlo algorithms generally do not perform well if run using only the positive phase of the gradient for several steps and then only the negative phase for several steps, whereas learning algorithms based on maximizing L can be run with prolonged periods of improving q and prolonged periods of improving θ. If the role of biological dreaming is to train networks for predicting q, this explains how animals can remain awake for several hours and asleep for several hours without damaging their internal models.
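A toy, runnable rendering of the wake-sleep loop is sketched below for a model with one binary latent variable and one binary visible variable; the functional forms, learning rate, and data are illustrative choices meant only to show the two phases, not the algorithm as published.

    import numpy as np

    rng = np.random.default_rng(4)
    sigma = lambda x: 1 / (1 + np.exp(-x))
    b = w = c = r = s = 0.0              # generative: p(h)=Bern(sigma(b)),
    lr = 0.05                            #   p(v|h)=Bern(sigma(w*h+c));
                                         # recognition: q(h|v)=Bern(sigma(r*v+s))
    data = (rng.random(5000) < 0.8).astype(float)

    for v in data:
        # Wake phase: complete the observed v with h ~ q(h|v), then follow
        # the gradient of log p(h, v) in the generative parameters.
        h = float(rng.random() < sigma(r * v + s))
        b += lr * (h - sigma(b))
        pv = sigma(w * h + c)
        w += lr * (v - pv) * h
        c += lr * (v - pv)
        # Sleep phase: dream (h', v') by ancestral sampling from the model,
        # then follow the gradient of log q(h'|v') in the recognition parameters.
        hd = float(rng.random() < sigma(b))
        vd = float(rng.random() < sigma(w * hd + c))
        qh = sigma(r * vd + s)
        r += lr * (hd - qh) * vd
        s += lr * (hd - qh)

    print(sigma(c), sigma(w + c))        # the model's p(v=1|h=0) and p(v=1|h=1)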
19.5.2 Other Forms of Learned Inference

This strategy of learned approximate inference has been applied to other models as well. Salakhutdinov and Larochelle (2010) showed that a single pass in a learned inference network could yield faster inference than iterating the mean field fixed-point equations in a deep Boltzmann machine (DBM). The training procedure is based on running the inference network, applying one step of mean field to improve its estimates, and then training the inference network to output the refined estimate instead of its original one.

We have already seen in section 14.8 that the predictive sparse decomposition (PSD) model trains a shallow encoder network to predict a sparse code for the input. This can be interpreted as performing learned approximate MAP inference in a sparse coding model; related work unrolls the iterative optimizer itself into a trainable network, as in the learned ISTA technique (Gregor and LeCun, 2010b).

Learned approximate inference has recently become one of the leading approaches to generative modeling, in the form of the variational autoencoder (Kingma, 2013; Rezende et al., 2014). In this elegant approach, there is no need to construct explicit targets for the inference network. Instead, the inference network is simply used to define L, and the parameters of the inference network are then adapted to increase L. This model is described in depth in section 20.10.3.