Deep Learning: Chapter 14
September 18, 2017 | Author: matsuolab | Category: N/A
Deep Learning (Japanese translation)...
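14 Autoencoders

An autoencoder is a neural network trained to copy its input to its output. Inside it is a hidden layer $h$ that holds a code representing the input. The network has two parts: an encoder function $h = f(x)$ and a decoder that produces a reconstruction $r = g(h)$ (Figure 14.1). An autoencoder that manages to set $g(f(x)) = x$ everywhere is not especially useful. Autoencoders are therefore designed so that they cannot copy perfectly. They are restricted to copy only approximately, and only inputs that resemble the training data. Because the model has to decide which aspects of the input to keep, it often learns useful properties of the data.

The deterministic encoder and decoder can also be generalized to stochastic mappings $p_{\text{encoder}}(h \mid x)$ and $p_{\text{decoder}}(x \mid h)$.

Autoencoders have long been part of the neural network literature (LeCun, 1987; Bourlard and Kamp, 1988; Hinton and Zemel, 1994). They were traditionally used for dimensionality reduction and feature learning. Theoretical links between autoencoders and latent variable models later brought them to the forefront of generative modeling (Chapter 20). An autoencoder is a special case of a feedforward network. It can be trained with the same techniques, usually minibatch gradient descent with gradients computed by back-propagation. It can also be trained by recirculation (Hinton and McClelland, 1988), a learning algorithm that compares the network's activations on the original input with its activations on the reconstructed input.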
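Figure 14.1: The general structure of an autoencoder. An input $x$ is mapped to an output $r$ (the reconstruction) through an internal representation, or code, $h$. The encoder $f$ maps $x$ to $h$, and the decoder $g$ maps $h$ to $r$.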
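14.1 Undercomplete Autoencoders

Copying the input to the output may sound pointless, but we rarely care about the decoder's output itself. The hope is that training on the copying task will give the code $h$ useful properties.

One way to get useful features is to constrain $h$ to have fewer dimensions than $x$. An autoencoder whose code dimension is smaller than its input dimension is called undercomplete. Learning an undercomplete representation forces the autoencoder to capture the most salient features of the training data. Training consists of minimizing a loss function

$$L(x, g(f(x))) \tag{14.1}$$

where $L$ penalizes $g(f(x))$ for being dissimilar from $x$. The mean squared error is a typical choice.

When the decoder is linear and $L$ is the mean squared error, an undercomplete autoencoder learns to span the same subspace as PCA. Although it was trained only to copy, it learns the principal subspace of the training data as a side effect. With nonlinear encoder functions $f$ and nonlinear decoder functions $g$, the autoencoder can learn a more powerful, nonlinear generalization of PCA.

Too much capacity causes a different problem. If the encoder and decoder are very powerful, the autoencoder can learn to copy without extracting anything useful about the data distribution. In the extreme, a powerful nonlinear encoder with a one-dimensional code could map each training example $x^{(i)}$ to the integer $i$, and the decoder could map each index back to that training example. Such a model reconstructs the training set perfectly but has learned nothing useful.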
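The PCA claim above can be checked numerically. The sketch below uses a hypothetical setup that does not come from the text. It is a linear autoencoder with a one-dimensional code whose decoder reuses the encoder weights (tied weights, an extra assumption), trained by full-batch gradient descent on the mean squared reconstruction error. The data are five toy points near the line y = x. If the claim holds, the learned weight vector should align with the first principal direction and have unit norm.

```python
import math

# Hypothetical toy data: 2-D points lying close to the line y = x.
DATA = [(-2.0, -1.9), (-1.0, -1.1), (0.0, 0.1), (1.0, 0.9), (2.0, 2.1)]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def train_linear_autoencoder(data, lr=0.01, steps=3000):
    """Gradient descent on the mean of ||x - g(f(x))||^2.

    Encoder f(x) = w . x gives a scalar (undercomplete) code h.
    Decoder g(h) = h * w reuses the encoder weights (tied weights).
    """
    w = [1.0, 0.0]
    n = len(data)
    for _ in range(steps):
        grad = [0.0, 0.0]
        for x in data:
            h = dot(w, x)                                # code h = f(x)
            e = [xi - h * wi for xi, wi in zip(x, w)]    # residual x - g(f(x))
            w_dot_e = dot(w, e)
            # Gradient of ||x - w (w . x)||^2 with respect to w: -2 (h e + (w . e) x)
            for j in range(len(w)):
                grad[j] += -2.0 * (h * e[j] + w_dot_e * x[j]) / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def mean_reconstruction_error(w, data):
    total = 0.0
    for x in data:
        h = dot(w, x)
        total += sum((xi - h * wi) ** 2 for xi, wi in zip(x, w))
    return total / len(data)


w = train_linear_autoencoder(DATA)
principal = (1 / math.sqrt(2), 1 / math.sqrt(2))  # approximate first principal direction
cosine = abs(dot(w, principal)) / math.sqrt(dot(w, w))
print("learned w:", w)
print("|cos(w, principal direction)|:", round(cosine, 4))
print("mean reconstruction error:", mean_reconstruction_error(w, DATA))
```

The remaining reconstruction error is roughly the variance of the data in the discarded direction, which is what PCA with one component would also leave behind.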
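14.2 Regularized Autoencoders

The failure described above also occurs when the code has the same dimension as the input, or when it is larger than the input (the overcomplete case). In these cases even a linear encoder and decoder can learn to copy the input to the output without learning anything useful about the data distribution.

Ideally, any autoencoder architecture could be trained successfully by choosing the code dimension and the capacity of the encoder and decoder to match the complexity of the distribution being modeled. Regularized autoencoders provide this ability. They do not limit capacity by keeping the encoder and decoder shallow and the code small. Instead, their loss function encourages properties other than copying, such as a sparse representation, small derivatives of the representation, and robustness to noise or to missing inputs. A regularized autoencoder can be nonlinear and overcomplete and still learn something useful about the data distribution.

Beyond the methods in this chapter, nearly any generative model with latent variables and an inference procedure (for computing latent representations given the input) can be viewed as a particular form of autoencoder. Two generative modeling approaches emphasize this connection. Both descend from the Helmholtz machine (Hinton et al., 1995b): the variational autoencoder (Section 20.10.3) and generative stochastic networks (Section 20.12).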
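14.2.1 Sparse Autoencoders

A sparse autoencoder is an autoencoder whose training criterion adds a sparsity penalty $\Omega(h)$ on the code layer $h$ to the reconstruction error:

$$L(x, g(f(x))) + \Omega(h) \tag{14.2}$$

Here $g(h)$ is the decoder output and, typically, $h = f(x)$ is the encoder output. Sparse autoencoders are usually used to learn features for another task, such as classification.

The penalty $\Omega(h)$ can be viewed simply as a regularizer for a feedforward network whose main task is copying. Unlike weight decay, however, it has no straightforward Bayesian interpretation. Section 5.6.1 showed that weight decay and similar penalties can be read as a MAP approximation to Bayesian inference, with the penalty playing the role of a prior over the parameters. In that view, regularized maximum likelihood maximizes $p(\theta \mid x)$, which is equivalent to maximizing $\log p(x \mid \theta) + \log p(\theta)$. The first term is the usual data log-likelihood, and the second is the log-prior that expresses a preference for particular values of $\theta$ (Section 5.6). Regularizers such as $\Omega(h)$ do not fit this picture because they depend on the data, so they are not priors in the formal sense. They can still be seen as implicitly expressing a preference over functions.

A more useful view is that the whole sparse autoencoder framework approximates maximum likelihood training of a generative model with latent variables. Suppose the model has visible variables $x$ and latent variables $h$, with an explicit joint distribution $p_{\text{model}}(x, h) = p_{\text{model}}(h)\, p_{\text{model}}(x \mid h)$. Here $p_{\text{model}}(h)$ is the model's prior over the latent variables, encoding its beliefs before it sees $x$. This differs from the earlier use of "prior" for the distribution $p(\theta)$ over the parameters.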
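The log-likelihood of $x$ decomposes as a sum over the latent variables:

$$\log p_{\text{model}}(x) = \log \sum_h p_{\text{model}}(h, x) \tag{14.3}$$

The autoencoder approximates this sum by a point estimate at a single, highly likely value of $h$. This resembles the sparse coding model of Section 13.4, except that $h$ is the output of a parametric encoder rather than the result of an optimization that infers the most likely $h$. With $h$ chosen this way, training maximizes

$$\log p_{\text{model}}(h, x) = \log p_{\text{model}}(h) + \log p_{\text{model}}(x \mid h) \tag{14.4}$$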
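The point-estimate approximation can be made concrete when $h$ takes only a few discrete values. The sketch below assumes four hypothetical values of $\log p_{\text{model}}(h, x)$ for one fixed $x$. It compares the exact log-sum of Eq. 14.3 with the best single term. The best term is always a lower bound that is at most $\log K$ below the exact value. A sparse autoencoder does not search for the best $h$. It takes $h = f(x)$ from the encoder, so its term can be lower still.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


# Hypothetical values of log p_model(h, x) for one fixed x and K = 4 discrete codes h.
log_joint = [-3.2, -0.7, -5.0, -2.9]

exact = log_sum_exp(log_joint)   # log p_model(x), the sum in Eq. 14.3
point = max(log_joint)           # point estimate log p_model(h*, x) at the best h*
gap = exact - point              # lies between 0 and log K
print(f"exact={exact:.4f}  point estimate={point:.4f}  gap={gap:.4f}")
```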
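The $\log p_{\text{model}}(h)$ term can be sparsity-inducing. For example, a Laplace prior on each code unit,

$$p_{\text{model}}(h_i) = \frac{\lambda}{2} e^{-\lambda |h_i|} \tag{14.5}$$

corresponds to an absolute value sparsity penalty. Define the penalty

$$\Omega(h) = \lambda \sum_i |h_i| \tag{14.6}$$

If the units $h_i$ are independent under the prior, the negative log-prior is

$$-\log p_{\text{model}}(h) = \sum_i \left( \lambda |h_i| - \log \frac{\lambda}{2} \right) = \Omega(h) + \text{const} \tag{14.7}$$

The constant depends only on $\lambda$, not on $h$. Because $\lambda$ is usually treated as a hyperparameter, the constant does not affect parameter learning and is discarded.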
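The correspondence in Eqs. 14.5 to 14.7 can be checked directly. The sketch below assumes a factorized Laplace prior and arbitrary code vectors. It shows that the negative log-prior and the L1 penalty differ by the same constant, $-n \log(\lambda/2)$, whatever the value of $h$.

```python
import math


def laplace_neg_log_prior(h, lam):
    """-log p_model(h) for a factorized Laplace prior (Eq. 14.5 applied to each h_i)."""
    return -sum(math.log(lam / 2.0) - lam * abs(hi) for hi in h)


def sparsity_penalty(h, lam):
    """Omega(h) = lam * sum_i |h_i| (Eq. 14.6)."""
    return lam * sum(abs(hi) for hi in h)


lam = 0.5
for h in ([0.0, 1.5, -0.25, 0.0], [2.0, -2.0, 0.1, 0.3]):
    const = laplace_neg_log_prior(h, lam) - sparsity_penalty(h, lam)
    # The difference equals -len(h) * log(lam / 2), independent of h (Eq. 14.7).
    print(h, round(const, 6))
```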
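Other priors, such as the Student t prior, can also induce sparsity. In this view the sparsity penalty is not a regularization term at all. It is simply a consequence of the model's distribution over its latent variables. The view also gives a different motivation for training an autoencoder: it approximately trains a generative model. It gives a different reason why the learned features are useful as well: they describe the latent variables that explain the input.

Early work on sparse autoencoders (Ranzato et al., 2007a, 2008) explored several forms of sparsity. It proposed a link between the sparsity penalty and the $\log Z$ term that appears when maximum likelihood is applied to an undirected model $p(x) = \frac{1}{Z}\tilde{p}(x)$. Minimizing $\log Z$ keeps a probabilistic model from assigning high probability everywhere. In the same way, imposing sparsity keeps an autoencoder from achieving low reconstruction error everywhere. This link is an intuitive analogy between two mechanisms, not a mathematical correspondence. Reading the sparsity penalty as $\log p_{\text{model}}(h)$ in the directed model $p_{\text{model}}(h)\, p_{\text{model}}(x \mid h)$ is mathematically more direct.

Glorot et al. (2011b) introduced a way to obtain exact zeros in $h$ for sparse (and denoising) autoencoders: use rectified linear units (ReLU) for the code layer. Combined with a prior that actually pushes the representation toward zero, such as the absolute value penalty, this gives indirect control over the average number of zeros in the representation.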
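14.2.2 Denoising Autoencoders

Instead of adding a penalty $\Omega$ to the cost function, an autoencoder can be made to learn something useful by changing the reconstruction error term. Traditional autoencoders minimize

$$L(x, g(f(x))) \tag{14.8}$$

where $L$ penalizes $g(f(x))$ for being dissimilar from $x$, for example the $L^2$ norm of their difference. If $g \circ f$ has enough capacity, this criterion encourages it to become the identity function. A denoising autoencoder (DAE) instead minimizes

$$L(x, g(f(\tilde{x}))) \tag{14.9}$$

where $\tilde{x}$ is a copy of $x$ corrupted by some form of noise. The DAE must undo this corruption rather than simply copy its input.

Alain and Bengio (2013) and Bengio et al. (2013c) show that denoising training forces $f$ and $g$ to learn the structure of $p_{\text{data}}(x)$ implicitly. Denoising is one more example of useful properties emerging as a by-product of minimizing reconstruction error. It also shows that overcomplete, high-capacity models can serve as autoencoders, provided they are prevented from learning the identity function. Section 14.5 covers DAEs in detail.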
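14.2.3 Regularizing by Penalizing Derivatives

Another strategy for regularizing an autoencoder is to use a penalty $\Omega$, as in sparse autoencoders,

$$L(x, g(f(x))) + \Omega(h, x) \tag{14.10}$$

but with a different form of $\Omega$:

$$\Omega(h, x) = \lambda \sum_i \left\| \nabla_x h_i \right\|^2 \tag{14.11}$$

This forces the model to learn a function that changes little when $x$ changes slightly. Because the penalty is applied only at training examples, the autoencoder is pushed to learn features that capture information about the training distribution.

An autoencoder regularized in this way is called a contractive autoencoder (CAE). The approach has theoretical connections to denoising autoencoders, manifold learning and probabilistic modeling. The CAE is described in more detail in Section 14.7.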
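The penalty in Eq. 14.11 needs only the derivatives of each code unit with respect to the input. The sketch below estimates it with central finite differences for an arbitrary encoder. It checks the estimate against the closed form for a small tanh encoder whose weight matrix `W` is hypothetical. Real implementations would use automatic differentiation. Finite differences here just keep the example self-contained.

```python
import math

# Hypothetical encoder weights: 3 inputs -> 2 code units, h = tanh(W x).
W = [[0.5, -1.0, 0.25],
     [2.0, 0.0, -0.5]]


def encoder(x):
    return [math.tanh(sum(w * xj for w, xj in zip(row, x))) for row in W]


def derivative_penalty(f, x, lam, eps=1e-6):
    """lam * sum_i ||grad_x h_i||^2 (Eq. 14.11), using central finite differences."""
    total = 0.0
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        for hp, hm in zip(f(x_plus), f(x_minus)):
            d = (hp - hm) / (2 * eps)   # estimate of dh_i / dx_j
            total += d * d
    return lam * total


def derivative_penalty_tanh(x, lam):
    """Closed form for this tanh encoder: dh_i/dx_j = (1 - h_i^2) * W[i][j]."""
    h = encoder(x)
    return lam * sum((1 - hi * hi) ** 2 * sum(w * w for w in row) for hi, row in zip(h, W))


x = [0.3, -0.2, 1.0]
print(derivative_penalty(encoder, x, lam=0.1), derivative_penalty_tanh(x, lam=0.1))
```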
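14.3 Representational Power, Layer Size and Depth

Autoencoders are often trained with a single-layer encoder and a single-layer decoder, but this is not required. Section 6.4.1 described the advantages of depth in feedforward networks. The encoder and decoder are themselves feedforward networks, so each can benefit from depth on its own.

The universal approximation theorem guarantees that a feedforward network with at least one hidden layer can approximate any function (within a broad class) to arbitrary accuracy, given enough hidden units. An autoencoder with one hidden layer can therefore represent the identity function on the domain of the data arbitrarily well. Its input-to-code mapping is shallow, however, so arbitrary constraints such as a sparse code cannot be enforced. A deep autoencoder, with at least one additional hidden layer inside the encoder, can approximate any mapping from input to code arbitrarily well, given enough hidden units.

Depth can exponentially reduce the computational cost of representing some functions. It can also exponentially reduce the amount of training data needed to learn some functions (Section 6.4.1). Experimentally, deep autoencoders compress much better than corresponding shallow or linear autoencoders (Hinton and Salakhutdinov, 2006). A common way to train a deep autoencoder is to pretrain it greedily, by training a stack of shallow autoencoders.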
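14.4 Stochastic Encoders and Decoders

Autoencoders are feedforward networks, so the same loss functions and output unit types can be used. Section 6.2.2.4 described a general recipe for a feedforward network: define an output distribution $p(y \mid x)$ and minimize the negative log-likelihood $-\log p(y \mid x)$, where $y$ is a vector of targets such as class labels.

In an autoencoder, $x$ is the target as well as the input. Given a code $h$, the decoder provides a conditional distribution $p_{\text{decoder}}(x \mid h)$, and the autoencoder is trained by minimizing $-\log p_{\text{decoder}}(x \mid h)$. The exact form of this loss depends on the form of $p_{\text{decoder}}$. As in ordinary feedforward networks, real-valued $x$ is usually modeled with linear output units that parametrize the mean of a Gaussian, which gives the mean squared error. Binary $x$ corresponds to a Bernoulli distribution parametrized by sigmoid outputs, and discrete $x$ to a softmax distribution.

To move further away from ordinary feedforward networks, the encoding function $f(x)$ can also be generalized to an encoding distribution $p_{\text{encoder}}(h \mid x)$, as shown in Figure 14.2.

Figure 14.2: The structure of a stochastic autoencoder. Neither the encoder nor the decoder is a simple deterministic function: both involve noise injection, so their outputs can be seen as samples from $p_{\text{encoder}}(h \mid x)$ for the encoder and $p_{\text{decoder}}(x \mid h)$ for the decoder.

Any latent variable model $p_{\text{model}}(h, x)$ defines a stochastic encoder

$$p_{\text{encoder}}(h \mid x) = p_{\text{model}}(h \mid x) \tag{14.12}$$

and a stochastic decoder

$$p_{\text{decoder}}(x \mid h) = p_{\text{model}}(x \mid h) \tag{14.13}$$

In general, however, the encoder and decoder distributions need not be conditionals of a single joint distribution $p_{\text{model}}(x, h)$. Alain et al. (2015) showed that training the encoder and decoder as a denoising autoencoder tends to make them compatible asymptotically, given enough capacity and enough examples.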
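The loss choices described above can be written out directly. The sketch below covers two of the cases mentioned. For a Gaussian decoder with unit variance (an assumption), the negative log-likelihood is half the squared error plus a constant. For a Bernoulli decoder parametrized by sigmoid logits, it is the binary cross-entropy, computed here in a numerically stable form. The inputs and reconstructions are hypothetical.

```python
import math


def gaussian_nll(x, r, sigma=1.0):
    """-log N(x; mean=r, covariance=sigma^2 I)."""
    d = len(x)
    sq = sum((xi - ri) ** 2 for xi, ri in zip(x, r))
    return 0.5 * sq / sigma ** 2 + 0.5 * d * math.log(2 * math.pi * sigma ** 2)


def bernoulli_nll(x, logits):
    """-log prod_i Bernoulli(x_i; sigmoid(a_i)), i.e. binary cross-entropy."""
    total = 0.0
    for xi, a in zip(x, logits):
        # softplus(a) - x_i * a, with softplus computed stably
        total += max(a, 0.0) + math.log1p(math.exp(-abs(a))) - xi * a
    return total


x = [0.2, -1.0, 0.5]            # hypothetical real-valued input
r = [0.0, -0.8, 0.7]            # hypothetical decoder mean g(h)
sq = sum((a - b) ** 2 for a, b in zip(x, r))
const = 0.5 * len(x) * math.log(2 * math.pi)
print(gaussian_nll(x, r) - const, 0.5 * sq)   # Gaussian NLL = half squared error + constant
print(bernoulli_nll([1, 0, 1], [2.0, -1.0, 0.0]))
```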
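14.5 Denoising Autoencoders

The denoising autoencoder (DAE) receives a corrupted data point as input and is trained to predict the original, uncorrupted data point as output.

Figure 14.3: The computational graph of the cost function of a denoising autoencoder, which is trained to reconstruct the clean data point $x$ from a corrupted version $\tilde{x}$. Training minimizes the loss $L = -\log p_{\text{decoder}}(x \mid h = f(\tilde{x}))$, where $\tilde{x}$ is obtained from the data example $x$ through a given corruption process $C(\tilde{x} \mid x)$. The distribution $p_{\text{decoder}}$ is typically defined by the decoder $g$, which outputs its parameters.

The DAE training procedure is shown in Figure 14.3. It uses a corruption process $C(\tilde{x} \mid x)$, a conditional distribution over corrupted samples $\tilde{x}$ given a data sample $x$.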
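The autoencoder learns a reconstruction distribution $p_{\text{reconstruct}}(x \mid \tilde{x})$ from training pairs $(x, \tilde{x})$ as follows:

1. Sample a training example $x$ from the training data.
2. Sample a corrupted version $\tilde{x}$ from $C(\tilde{x} \mid x)$ for that $x$.
3. Use $(x, \tilde{x})$ as a training example for estimating the reconstruction distribution $p_{\text{reconstruct}}(x \mid \tilde{x}) = p_{\text{decoder}}(x \mid h)$, where $h$ is the encoder output $f(\tilde{x})$ and $p_{\text{decoder}}$ is typically defined by a decoder $g(h)$.

In practice one minimizes the negative log-likelihood $-\log p_{\text{decoder}}(x \mid h)$ by gradient-based approximate minimization, such as minibatch gradient descent. As long as the encoder is deterministic, the DAE is a feedforward network and can be trained exactly like any other. The DAE can thus be viewed as performing stochastic gradient descent on the expectation

$$-\mathbb{E}_{x \sim \hat{p}_{\text{data}}(x)}\, \mathbb{E}_{\tilde{x} \sim C(\tilde{x} \mid x)} \log p_{\text{decoder}}(x \mid h = f(\tilde{x})) \tag{14.14}$$

where $\hat{p}_{\text{data}}(x)$ is the training distribution.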
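The three sampling steps and Eq. 14.14 can be sketched as a Monte Carlo estimate of the objective. The sketch assumes Gaussian corruption $C(\tilde{x} \mid x) = \mathcal{N}(x, \sigma^2 I)$ and a unit-variance Gaussian $p_{\text{decoder}}$ whose mean is $g(f(\tilde{x}))$. The `identity` encoder and decoder are hypothetical placeholders for trained networks. The sketch only evaluates the objective. Training would take gradient steps on it.

```python
import math
import random


def corrupt(x, sigma, rng):
    """Sample x_tilde ~ C(x_tilde | x) = N(x, sigma^2 I) (Gaussian corruption assumed)."""
    return [xi + rng.gauss(0.0, sigma) for xi in x]


def dae_objective(data, f, g, sigma, rng, samples_per_example=10):
    """Monte Carlo estimate of Eq. 14.14 with a unit-variance Gaussian p_decoder."""
    total, count = 0.0, 0
    for x in data:                                   # step 1: x from the training set
        for _ in range(samples_per_example):
            x_tilde = corrupt(x, sigma, rng)         # step 2: corrupted copy of x
            r = g(f(x_tilde))                        # step 3: reconstruct x from h = f(x_tilde)
            sq = sum((xi - ri) ** 2 for xi, ri in zip(x, r))
            total += 0.5 * sq + 0.5 * len(x) * math.log(2 * math.pi)   # -log N(x; r, I)
            count += 1
    return total / count


def identity(v):
    """Hypothetical placeholder for a trained encoder or decoder."""
    return list(v)


data = [[0.0, 1.0], [1.0, -1.0], [0.5, 0.5]]
rng = random.Random(0)
clean = dae_objective(data, identity, identity, sigma=0.0, rng=rng)
noisy = dae_objective(data, identity, identity, sigma=0.5, rng=rng)
print(clean, noisy)
```

An autoencoder that merely copies its input pays the full corruption cost in the noisy case, which is why the DAE objective discourages learning the identity.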
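14.5.1 Estimating the Score

Score matching (Hyvärinen, 2005) is an alternative to maximum likelihood. It provides a consistent estimator of probability distributions by encouraging the model to have the same score as the data distribution at every training point $x$. Here the score is the gradient field

$$\nabla_x \log p(x) \tag{14.15}$$

Score matching is discussed further in Section 18.4. For autoencoders, what matters is that learning the gradient field of $\log p_{\text{data}}$ is one way to learn the structure of $p_{\text{data}}$ itself.

An important property of DAEs follows. When $p(x \mid h)$ is conditionally Gaussian, the DAE training criterion makes the autoencoder learn a vector field $(g(f(x)) - x)$ that estimates the score of the data distribution. Figure 14.4 illustrates this.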
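Figure 14.4: A denoising autoencoder is trained to map a corrupted data point $\tilde{x}$ back to the original data point $x$. Training examples $x$ lie near a low-dimensional manifold, and the corruption process $C(\tilde{x} \mid x)$ spreads each of them over a neighborhood of possible corruptions. When the DAE is trained to minimize the average squared error $\|g(f(\tilde{x})) - x\|^2$, the reconstruction $g(f(\tilde{x}))$ estimates $\mathbb{E}_{x, \tilde{x} \sim p_{\text{data}}(x) C(\tilde{x} \mid x)}[x \mid \tilde{x}]$. The vector $g(f(\tilde{x})) - \tilde{x}$ points approximately toward the nearest point on the manifold, because $g(f(\tilde{x}))$ estimates the center of mass of the clean points $x$ that could have produced $\tilde{x}$. The autoencoder therefore learns a vector field $g(f(x)) - x$ that estimates the score $\nabla_x \log p_{\text{data}}(x)$ up to a multiplicative factor.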
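Consider a specific kind of autoencoder, with sigmoidal hidden units and linear reconstruction units, trained to denoise with Gaussian noise and a mean squared error reconstruction cost. Vincent (2011) showed that this is equivalent to training an RBM with Gaussian visible units, a specific kind of undirected probabilistic model. RBMs are described in Section 20.5.1. Here it suffices that such a model provides an explicit $p_{\text{model}}(x; \theta)$. When the RBM is trained with denoising score matching (Kingma and LeCun, 2010), its learning algorithm is equivalent to denoising training of the corresponding autoencoder. With a fixed noise level, regularized score matching is not a consistent estimator; it recovers a blurred version of the distribution. If the noise level approaches 0 as the number of examples approaches infinity, consistency is recovered. Denoising score matching is discussed further in Section 18.5.

Autoencoders and RBMs are connected in other ways as well. Applying score matching to RBMs yields a cost function identical to reconstruction error plus a regularization term similar to the contractive penalty of the CAE (Swersky et al., 2011). Bengio and Delalleau (2009) showed that an autoencoder gradient approximates contrastive divergence training of RBMs.

For continuous-valued $x$, the denoising criterion with Gaussian corruption and a Gaussian reconstruction distribution yields an estimator of the score that applies to general encoder and decoder parametrizations (Alain and Bengio, 2013). A generic encoder-decoder architecture can therefore be made to estimate the score by training with the squared error criterion

$$\|g(f(\tilde{x})) - x\|^2 \tag{14.16}$$

and the corruption

$$C(\tilde{x} \mid x) = \mathcal{N}(\tilde{x}; \mu = x, \Sigma = \sigma^2 I) \tag{14.17}$$

with noise variance $\sigma^2$.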
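The score-estimation property can be checked exactly in a small case. The sketch replaces a trained network with the Bayes-optimal reconstruction $\mathbb{E}[x \mid \tilde{x}]$, which minimizes the criterion of Eq. 14.16 under the corruption of Eq. 14.17. The data are a hypothetical empirical distribution of four 1-D points. For Gaussian noise, $\mathbb{E}[x \mid \tilde{x}] - \tilde{x} = \sigma^2 \nabla_{\tilde{x}} \log q_\sigma(\tilde{x})$, where $q_\sigma$ is the data density smoothed by the noise (an identity often called Tweedie's formula). The code compares both sides, with the score computed by finite differences. Note that the learned field matches the score of the noise-smoothed density, scaled by $\sigma^2$.

```python
import math

DATA = [-1.0, -0.8, 1.0, 1.2]   # hypothetical 1-D training points
SIGMA = 0.3                     # corruption noise standard deviation


def optimal_reconstruction(x_tilde, data=DATA, sigma=SIGMA):
    """E[x | x_tilde] for the empirical data and Gaussian corruption.

    This minimizes the expected squared error of Eq. 14.16, i.e. it is what
    g(f(.)) would approach with unlimited capacity and data."""
    weights = [math.exp(-(x_tilde - xi) ** 2 / (2 * sigma ** 2)) for xi in data]
    return sum(w * xi for w, xi in zip(weights, data)) / sum(weights)


def log_smoothed_density(x, data=DATA, sigma=SIGMA):
    """log q_sigma(x): the empirical density convolved with N(0, sigma^2)."""
    norm = 1.0 / (math.sqrt(2 * math.pi) * sigma * len(data))
    return math.log(norm * sum(math.exp(-(x - xi) ** 2 / (2 * sigma ** 2)) for xi in data))


def numerical_score(x, eps=1e-5):
    """Central-difference estimate of d/dx log q_sigma(x)."""
    return (log_smoothed_density(x + eps) - log_smoothed_density(x - eps)) / (2 * eps)


for x_tilde in [-2.0, -0.5, 0.0, 0.3, 1.5]:
    field = optimal_reconstruction(x_tilde) - x_tilde
    scaled_score = SIGMA ** 2 * numerical_score(x_tilde)
    print(f"x~={x_tilde:+.1f}  r(x~)-x~={field:+.5f}  sigma^2*score={scaled_score:+.5f}")
```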
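Figure 14.5 illustrates this. In general, nothing guarantees that the reconstruction $g(f(x))$ minus the input $x$ is the gradient of any function, let alone of the log-density. This is why the early results (Vincent, 2011) were restricted to particular parametrizations in which $g(f(x)) - x$ can be obtained as the derivative of another function. Kamyshanska and Memisevic (2015) generalized the results of Vincent (2011) by identifying a family of shallow autoencoders for which $g(f(x)) - x$ corresponds to a score for every member of the family.

So far we have described only how a DAE learns to represent a probability distribution. More generally, one may want to use the autoencoder as a generative model and draw samples from this distribution. This is described in Section 20.11.

14.5.1.1 Historical Perspective

The use of MLPs for denoising goes back to LeCun (1987) and Gallinari et al. (1987). Behnke (2001) also used recurrent networks to denoise images. In a sense, denoising autoencoders are just MLPs trained to denoise. The name "denoising autoencoder", however, refers to a model whose purpose is not only to learn to denoise its input, but also to learn a good internal representation as a side effect of denoising. This idea came much later (Vincent et al., 2008, 2010). The learned representation can then be used to pretrain a deeper unsupervised network or a supervised network.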
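Figure 14.5: The vector field learned by a denoising autoencoder around a 1-D curved manifold near which the data concentrate in a 2-D space. Each arrow is proportional to the reconstruction-minus-input vector of the autoencoder and points toward higher probability under the implicitly estimated distribution. The vector field is zero both at maxima of the estimated density (on the data manifold) and at its minima. (Figure from Alain and Bengio, 2013.)

Like sparse autoencoders, sparse coding, contractive autoencoders and other regularized autoencoders, DAEs were motivated by the wish to learn a very high-capacity encoder while preventing the encoder and decoder from learning a useless identity function. Before the modern DAE, Inayoshi and Kurita (2005) pursued some of the same goals with some of the same methods. Their approach injects noise into the hidden layer of a supervised MLP and minimizes reconstruction error together with a supervised objective, in order to improve generalization. Their encoder was linear, however, so it could not learn function families as powerful as those a modern DAE can learn.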
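14.6 Learning Manifolds with Autoencoders

Like many other machine learning algorithms, autoencoders exploit the idea that data concentrate around a low-dimensional manifold, or a small set of such manifolds (Section 5.11.3). Some algorithms use this idea only in the sense that they learn a function that behaves correctly on the manifold, even if it behaves oddly off the manifold. Autoencoders go further and aim to learn the structure of the manifold itself.

A key characterization of a manifold is its set of tangent planes. At a point $x$ on a $d$-dimensional manifold, the tangent plane is spanned by $d$ basis vectors giving the local directions of variation allowed on the manifold. As Figure 14.6 shows, these local directions say how $x$ can be changed infinitesimally while staying on the manifold.

Every autoencoder training procedure strikes a compromise between two forces:

1. Learning a representation $h$ of a training example $x$ from which $x$ can be approximately recovered through a decoder. Because $x$ is drawn from the training data, the autoencoder does not need to reconstruct inputs that are improbable under the data-generating distribution.
2. Satisfying a constraint or regularization penalty. This can be an architectural constraint that limits the autoencoder's capacity, or a regularization term added to the reconstruction cost. These techniques generally favor solutions that are less sensitive to the input.

Neither force is useful alone: copying the input is not useful in itself, and neither is ignoring it. Together they yield a hidden representation that captures the structure of the data-generating distribution. The key principle is that the autoencoder can afford to represent only the variations needed to reconstruct training examples. If the data concentrate near a low-dimensional manifold, the representation implicitly captures a local coordinate system for it: only variations tangent to the manifold around $x$ need to change $h = f(x)$. The encoder therefore learns a mapping from input space to representation space that is sensitive to changes along the manifold and insensitive to changes orthogonal to it.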
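Figure 14.6: An illustration of the tangent hyperplane. A one-dimensional manifold is created in 784-dimensional space by taking an MNIST image with 784 pixels and translating it vertically. The amount of vertical translation is a coordinate along a one-dimensional manifold that traces a curved path through image space. For visualization, the manifold is projected into two dimensions with PCA. An $n$-dimensional manifold has an $n$-dimensional tangent plane at every point. The tangent plane touches the manifold exactly at that point and is oriented parallel to the surface there. It defines the directions in which one can move while staying on the manifold, so this one-dimensional manifold has a single tangent line at each point.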
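The tangent line in Figure 14.6 can be approximated numerically. The sketch below uses a hypothetical stand-in for the MNIST digit: a 9-pixel, one-dimensional "image" containing a Gaussian bump, translated by an amount `t`. The tangent vector is the central difference of the image with respect to `t`. Positive entries are pixels that brighten as the image moves, and negative entries are pixels that darken.

```python
import math

N_PIXELS = 9


def translated_image(t, center=4.0, width=1.0):
    """Hypothetical 1-D 'image': a bright bump whose position is shifted by t."""
    return [math.exp(-(k - center - t) ** 2 / (2 * width ** 2)) for k in range(N_PIXELS)]


def tangent_vector(t, eps=1e-4):
    """Direction in pixel space in which the image moves as t changes (central difference)."""
    up, down = translated_image(t + eps), translated_image(t - eps)
    return [(a - b) / (2 * eps) for a, b in zip(up, down)]


tangent = tangent_vector(0.0)
for k, v in enumerate(tangent):
    label = "brightens" if v > 1e-3 else "darkens" if v < -1e-3 else "unchanged"
    print(k, round(v, 3), label)
```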
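Figure 14.7 gives a one-dimensional example. Making the reconstruction function insensitive to perturbations of the input around the data points causes the autoencoder to recover the manifold structure.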
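[Figure 14.7 plot: the reconstruction r(x) (vertical axis, 0.0 to 1.0) against x, showing the identity function and the optimal reconstruction function, with training points x0, x1, x2 marked on the x-axis.]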
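Figure 14.7: If the autoencoder learns a reconstruction function that is invariant to small perturbations near the data points, it captures the manifold structure of the data. Here the manifold structure is a collection of 0-dimensional manifolds, the points $x_0$, $x_1$ and $x_2$. The identity function is the reconstruction target, and the optimal reconstruction function crosses it at every data point. The reconstruction direction $r(x) - x$ always points toward the nearest "manifold", which in this 1-D case is a single data point. The derivative of $r(x)$ is required to be small around the data points, but it can be large between them. The space between data points lies between manifolds, where the reconstruction function needs a large derivative to map corrupted points back onto a manifold.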
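What characterizes a manifold is the representation of the data points on (or near) it. The representation of a particular example is also called its embedding. It is usually a low-dimensional vector, with fewer dimensions than the ambient space in which the manifold lives. Some algorithms learn an embedding directly for each training example. Others learn a more general mapping, an encoder or representation function, that sends any point in the ambient space to its embedding.

Most early machine learning work on learning nonlinear manifolds focused on non-parametric methods based on the nearest-neighbor graph. This graph has one node per training example, with edges connecting nearby examples.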
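Figure 14.8: Non-parametric manifold learning procedures build a nearest-neighbor graph whose nodes are training examples and whose directed edges indicate nearest-neighbor relationships. From this graph, various procedures obtain the tangent plane associated with a neighborhood, together with a coordinate system that assigns each training example a real-valued position, or embedding. The representation can be extended to new examples by a form of interpolation. Images are from the QMUL Multiview Face Dataset (Gong et al., 2000).

These methods (Schölkopf et al., 1998; Roweis and Saul, 2000; Tenenbaum et al., 2000; Brand, 2003; Belkin and Niyogi, 2003; Donoho and Grimes, 2003; Weinberger and Saul, 2004; Hinton and Roweis, 2003; van der Maaten and Hinton, 2008) associate each node with a tangent plane. The plane spans the directions of variation given by the difference vectors between the example and its neighbors (Figure 14.8). A global coordinate system can then be obtained by an optimization or by solving a linear system. Figure 14.9 shows how a manifold can be tiled by many locally linear, Gaussian-like patches.

Bengio and Monperrus (2005) pointed out a fundamental difficulty with such local non-parametric approaches. If the manifold is not very smooth, with many peaks, troughs and twists, a very large number of training examples may be needed to cover each variation, and there is no way to generalize to unseen variations. These methods can only generalize the shape of the manifold by interpolating between neighboring examples.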
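The tangent-plane estimate used by these neighborhood methods can be sketched in a few lines. This is a simple local-PCA version chosen for illustration, not the exact procedure of any cited method. For a training point, it collects the difference vectors to its `k` nearest neighbors and takes the leading eigenvector of their covariance as the tangent direction. The data are hypothetical points on the unit circle, so the true tangent is known and can be compared.

```python
import math

# Hypothetical samples on the unit circle: a 1-D manifold in 2-D space.
POINTS = [(math.cos(2 * math.pi * k / 60), math.sin(2 * math.pi * k / 60)) for k in range(60)]


def nearest_neighbors(points, i, k):
    """The k nearest neighbors of points[i], excluding the point itself."""
    px, py = points[i]
    order = sorted(range(len(points)),
                   key=lambda j: (points[j][0] - px) ** 2 + (points[j][1] - py) ** 2)
    return [points[j] for j in order[1:k + 1]]


def local_tangent(points, i, k=4):
    """Leading eigenvector of the covariance of difference vectors to the neighbors."""
    px, py = points[i]
    diffs = [(qx - px, qy - py) for qx, qy in nearest_neighbors(points, i, k)]
    a = sum(dx * dx for dx, _ in diffs)
    b = sum(dx * dy for dx, dy in diffs)
    c = sum(dy * dy for _, dy in diffs)
    # Closed-form leading eigenvector of the symmetric 2x2 matrix [[a, b], [b, c]]
    angle = 0.5 * math.atan2(2 * b, a - c)
    return (math.cos(angle), math.sin(angle))


i = 10
theta = 2 * math.pi * i / 60
true_tangent = (-math.sin(theta), math.cos(theta))
estimate = local_tangent(POINTS, i)
print(estimate, true_tangent)
```

Because the sign of an eigenvector is arbitrary, the estimate can point opposite to the true tangent. Only the line it spans matters.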
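Figure 14.9: If the tangent planes at each location are known (see Figure 14.6), they can be tiled to form a global coordinate system or a density function. Each local patch can be seen as a local Euclidean coordinate system, or as a locally flat Gaussian ("pancake"). The Gaussian has very small variance in the directions orthogonal to the pancake and very large variance in the directions that define the coordinate system on it. A mixture of these Gaussians gives an estimated density function, as in the manifold Parzen window algorithm (Vincent and Bengio, 2003) or its non-local neural-net-based variant (Bengio et al., 2006c).

Unfortunately, the manifolds in AI problems can have very complicated structure that local interpolation alone captures poorly. Consider the translation manifold of Figure 14.6. If we follow a single coordinate $x_i$ of the input vector while the image is translated, that coordinate passes through a peak or a trough in value once for every peak or trough in brightness in the image.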
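14.7 Contractive Autoencoders

The contractive autoencoder (Rifai et al., 2011a,b) adds an explicit regularizer on the code $h = f(x)$ that encourages the derivatives of $f$ to be as small as possible:

$$\Omega(h) = \lambda \left\| \frac{\partial f(x)}{\partial x} \right\|_F^2 \tag{14.18}$$

The penalty $\Omega(h)$ is the squared Frobenius norm (the sum of squared elements) of the Jacobian matrix of partial derivatives of the encoder function.

Denoising and contractive autoencoders are connected. Alain and Bengio (2013) showed that, in the limit of small Gaussian input noise, the denoising reconstruction error is equivalent to a contractive penalty on the reconstruction function that maps $x$ to $r = g(f(x))$. Denoising autoencoders make the reconstruction function resist small but finite perturbations of the input. Contractive autoencoders make the feature extraction function resist infinitesimal perturbations. When the Jacobian-based contractive penalty is used to pretrain features $f(x)$ for a classifier, the best classification accuracy usually comes from applying the penalty to $f(x)$ rather than to $g(f(x))$. A contractive penalty on $f(x)$ is also closely related to score matching, as discussed in Section 14.5.1.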
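For a single sigmoid layer $f(x) = \mathrm{sigmoid}(Wx + b)$, the Jacobian in Eq. 14.18 has the closed form $\partial h_i / \partial x_j = h_i (1 - h_i) W_{ij}$. The sketch below computes the penalty this way and checks it against finite differences. The weights and inputs are hypothetical. Comparing an unsaturated encoder with one whose units are pushed toward 0 and 1 by large biases shows how saturation shrinks the penalty, a point made below.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def encode(W, b, x):
    return [sigmoid(sum(w * xj for w, xj in zip(row, x)) + bi) for row, bi in zip(W, b)]


def contractive_penalty(W, b, x, lam):
    """lam * ||df(x)/dx||_F^2 for f(x) = sigmoid(W x + b) (Eq. 14.18).

    Since df_i/dx_j = h_i (1 - h_i) W[i][j], the squared Frobenius norm is
    sum_i (h_i (1 - h_i))^2 * sum_j W[i][j]^2."""
    h = encode(W, b, x)
    return lam * sum((hi * (1.0 - hi)) ** 2 * sum(w * w for w in row) for hi, row in zip(h, W))


def contractive_penalty_fd(W, b, x, lam, eps=1e-6):
    """The same quantity from a finite-difference Jacobian (sanity check)."""
    total = 0.0
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        for hp, hm in zip(encode(W, b, xp), encode(W, b, xm)):
            total += ((hp - hm) / (2 * eps)) ** 2
    return lam * total


W = [[1.0, -2.0], [0.5, 0.5]]    # hypothetical encoder weights
x = [0.2, -0.1]
unsaturated = contractive_penalty(W, [0.0, 0.0], x, lam=0.1)
saturated = contractive_penalty(W, [8.0, -8.0], x, lam=0.1)   # large biases saturate both units
print(unsaturated, saturated)
```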
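The name contractive comes from the way the CAE warps space. Because it is trained to resist perturbations of its input, the CAE is encouraged to map a neighborhood of input points to a smaller neighborhood of output points. In other words, it contracts the input neighborhood.

The CAE is contractive only locally: all perturbations of a training point $x$ are mapped near $f(x)$. Globally, two different points $x$ and $x'$ may be mapped to points $f(x)$ and $f(x')$ that are farther apart than $x$ and $x'$. It is plausible that $f$ expands space between or far from the data manifolds, as in the 1-D toy example of Figure 14.7. When the $\Omega(h)$ penalty is applied to sigmoidal units, an easy way to shrink the Jacobian is to make the units saturate to 0 or 1. This encourages the CAE to encode input points with extreme sigmoid values, which can be read as a binary code. It also makes the CAE spread its code values over most of the hypercube that its sigmoidal hidden units can span.

The Jacobian matrix $J$ at a point $x$ can be seen as a linear approximation of the nonlinear encoder $f(x)$. This makes the word "contractive" precise. A linear operator is contractive if the norm of $Jx$ stays at or below 1 for every unit-norm $x$, that is, if $J$ shrinks the unit sphere. The CAE penalizes the Frobenius norm of the local linear approximation of $f(x)$ at every training point $x$, encouraging each of these local linear operators to become a contraction.

As described in Section 14.6, regularized autoencoders learn manifolds by balancing two opposing forces. For the CAE these are the reconstruction error and the contractive penalty $\Omega(h)$. Reconstruction error alone would make the CAE learn an identity function. The contractive penalty alone would make it learn features that are constant with respect to $x$. The compromise yields an autoencoder whose derivatives $\frac{\partial f(x)}{\partial x}$ are mostly tiny. Only a small number of hidden units, corresponding to a small number of input directions, have significant derivatives.

The goal of the CAE is to learn the manifold structure of the data. Directions $x$ with large $Jx$ change $h$ rapidly, so they are likely to approximate the tangent planes of the manifold. Experiments by Rifai et al. (2011a) and Rifai et al. (2011b) show that training the CAE drives most singular values of $J$ below 1 in magnitude, making them contractive. Some singular values stay above 1, because the reconstruction error term encourages the CAE to encode the directions with the most local variance. The directions with the largest singular values are interpreted as the tangent directions learned by the CAE. Ideally these correspond to real variations in the data. A CAE applied to images, for example, should learn tangent vectors that show how an image changes as the objects in it gradually change pose, as in Figure 14.6. Visualizations of the experimentally obtained singular vectors (Rifai et al., 2011a) do seem to correspond to meaningful transformations of the input image, as shown in Figure 14.10.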
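The singular-value reading of contraction can be sketched for a 2-input, 2-unit sigmoid encoder. The encoder weights are hypothetical. The code forms the Jacobian $J$ at one input and computes its singular values in closed form from $J^\top J$. It also returns the unit input direction that $J$ stretches most, which is the direction the discussion above would interpret as a learned tangent. A singular value above 1 means $J$ is not contractive in that direction.

```python
import math


def jacobian_sigmoid(W, b, x):
    """J[i][j] = dh_i/dx_j = h_i (1 - h_i) W[i][j] for h = sigmoid(W x + b)."""
    J = []
    for row, bi in zip(W, b):
        h = 1.0 / (1.0 + math.exp(-(sum(w * xj for w, xj in zip(row, x)) + bi)))
        J.append([h * (1.0 - h) * w for w in row])
    return J


def singular_values_2x2(J):
    """Singular values of a 2x2 matrix and the unit input direction it stretches most."""
    (a, b), (c, d) = J
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d   # J^T J = [[p, q], [q, r]]
    mid, rad = (p + r) / 2.0, math.hypot((p - r) / 2.0, q)
    s_max = math.sqrt(mid + rad)
    s_min = math.sqrt(max(mid - rad, 0.0))
    angle = 0.5 * math.atan2(2.0 * q, p - r)                # leading eigenvector of J^T J
    return s_max, s_min, (math.cos(angle), math.sin(angle))


W = [[6.0, 2.0], [1.0, -3.0]]     # hypothetical encoder weights
J = jacobian_sigmoid(W, [0.0, 0.0], [0.1, 0.2])
s_max, s_min, v = singular_values_2x2(J)
for s in (s_max, s_min):
    print(f"singular value {s:.4f}:", "contracts" if s < 1.0 else "expands")
print("most-stretched input direction:", v)
```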
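Figure 14.10: Tangent vectors of the manifold estimated by local PCA and by a contractive autoencoder. The location on the manifold is given by an input image of a dog from the CIFAR-10 dataset. The tangent vectors are estimated from the leading singular vectors of the Jacobian matrix $\frac{\partial h}{\partial x}$ of the input-to-code mapping. Both local PCA and the CAE can capture local tangents. The CAE, however, forms more accurate estimates from limited training data, because it shares parameters across different locations that share a subset of active hidden units. The CAE tangent directions typically correspond to moving or changing parts of the object, such as the head or legs.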
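One practical issue with the CAE criterion is cost. It is cheap to compute for an autoencoder with a single hidden layer but much more expensive for deeper ones. The strategy of Rifai et al. (2011c) is to train a series of single-layer autoencoders separately, each reconstructing the hidden layer of the previous one. Composing them gives a deep autoencoder. Because each layer was trained to be locally contractive, the deep autoencoder is contractive as well. The result differs from jointly training the whole architecture with a penalty on the Jacobian of the deep model, but it captures many of the desirable qualitative properties.

Another practical issue is that the contractive penalty can produce useless results unless some scale is imposed on the decoder. For example, the encoder could multiply the input by a small constant $\epsilon$ and the decoder could divide the code by $\epsilon$. As $\epsilon$ approaches 0, the encoder drives the penalty $\Omega(h)$ toward 0 without learning anything about the distribution, while the decoder still reconstructs perfectly. Rifai et al. (2011a) prevent this by tying the weights of $f$ and $g$. Both are standard neural network layers, an affine transformation followed by an element-wise nonlinearity, so the weight matrix of $g$ can simply be set to the transpose of the weight matrix of $f$.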
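The degenerate $\epsilon$ solution and the tied-weight fix can be seen in one dimension. The sketch below is a hypothetical 1-D linear case. The encoder is $f(x) = \epsilon x$, so its Jacobian is the constant $\epsilon$ and $\Omega(h) = \lambda \epsilon^2$. With an untied decoder $g(h) = h / \epsilon$, the penalty vanishes as $\epsilon \to 0$ while reconstruction stays perfect. With a tied decoder $g(h) = \epsilon h$ (the 1-D "transpose"), shrinking $\epsilon$ destroys the reconstruction instead.

```python
def untied(eps, xs, lam=1.0):
    """f(x) = eps * x, g(h) = h / eps: returns (Omega(h), mean squared reconstruction error)."""
    omega = lam * eps ** 2        # the 1x1 Jacobian of f is eps, so ||J||_F^2 = eps^2
    error = sum((x - (eps * x) / eps) ** 2 for x in xs) / len(xs)
    return omega, error


def tied(w, xs, lam=1.0):
    """f(x) = w * x, g(h) = w * h: the decoder reuses the encoder weight."""
    omega = lam * w ** 2
    error = sum((x - w * (w * x)) ** 2 for x in xs) / len(xs)
    return omega, error


xs = [-1.5, 0.2, 3.0]   # hypothetical 1-D training inputs
for eps in (1.0, 0.1, 0.001):
    print(eps, "untied:", untied(eps, xs), "tied:", tied(eps, xs))
```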
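14.8 Predictive Sparse Decomposition

Predictive sparse decomposition (PSD) (Kavukcuoglu et al., 2008) is a hybrid of sparse coding and parametric autoencoders. A parametric encoder is trained to predict the output of iterative inference. PSD has been applied to unsupervised feature learning for object recognition in images and video (Kavukcuoglu et al., 2009, 2010; Jarrett et al., 2009; Farabet et al., 2011) and to audio (Henaff et al., 2011). The model has an encoder $f(x)$ and a decoder $g(h)$, both parametric. During training, $h$ is controlled by the optimization algorithm. Training minimizes

$$\|x - g(h)\|^2 + \lambda |h|_1 + \gamma \|h - f(x)\|^2 \tag{14.19}$$

As in sparse coding, training alternates between minimizing with respect to $h$ and minimizing with respect to the model parameters. Minimizing over $h$ is fast because $f(x)$ gives a good initial value of $h$, and the cost keeps $h$ close to $f(x)$ anyway. Simple gradient descent can reach reasonable values of $h$ in as few as ten steps.

PSD training is not the same as first training a sparse coding model and then training $f(x)$ to predict its features. Instead, PSD regularizes the decoder to use parameters for which $f(x)$ can infer good code values. Predictive sparse coding is an example of learned approximate inference (Section 19.5). The tools of Chapter 19 show that PSD can be interpreted as training a directed sparse coding probabilistic model by maximizing a lower bound on its log-likelihood.

In practical applications of PSD, the iterative optimization is used only during training. When the model is deployed, the parametric encoder $f$ computes the learned features, which is much cheaper than inferring $h$ by gradient descent. Because $f$ is a differentiable parametric function, PSD models can be stacked and used to initialize a deep network that is then trained with another criterion.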
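The inner minimization over $h$ in Eq. 14.19 can be sketched as follows. The text mentions simple gradient descent. This sketch swaps in proximal gradient descent (ISTA), which handles the non-differentiable $\lambda |h|_1$ term with a soft-threshold step. It assumes a linear decoder $g(h) = Dh$ with a hypothetical matrix `D`, and a hypothetical encoder output `f_x` used as the starting point. The step size $1/L$, with $L = 2(\|D\|_F^2 + \gamma)$, bounds the gradient's Lipschitz constant, so the cost should not increase from step to step.

```python
import math

# Hypothetical sizes: 2-D input x, 3-D code h, linear decoder g(h) = D h.
D = [[1.0, 0.5, -0.3],
     [0.2, -1.0, 0.8]]


def decode(h):
    return [sum(d * hj for d, hj in zip(row, h)) for row in D]


def psd_cost(x, h, f_x, lam, gamma):
    """Eq. 14.19: ||x - g(h)||^2 + lam * |h|_1 + gamma * ||h - f(x)||^2."""
    recon = sum((xi - ri) ** 2 for xi, ri in zip(x, decode(h)))
    sparsity = lam * sum(abs(hj) for hj in h)
    prediction = gamma * sum((hj - fj) ** 2 for hj, fj in zip(h, f_x))
    return recon + sparsity + prediction


def soft_threshold(v, t):
    return math.copysign(max(abs(v) - t, 0.0), v)


def infer_code(x, f_x, lam=0.1, gamma=0.5, steps=10):
    """Minimize Eq. 14.19 over h by proximal gradient steps (ISTA), starting at h = f(x)."""
    # 2 * (||D||_F^2 + gamma) bounds the Lipschitz constant of the smooth part's gradient.
    step = 1.0 / (2.0 * (sum(d * d for row in D for d in row) + gamma))
    h = list(f_x)
    costs = [psd_cost(x, h, f_x, lam, gamma)]
    for _ in range(steps):
        residual = [xi - ri for xi, ri in zip(x, decode(h))]
        grad = [-2.0 * sum(D[i][j] * residual[i] for i in range(len(D)))
                + 2.0 * gamma * (h[j] - f_x[j])
                for j in range(len(h))]
        h = [soft_threshold(hj - step * gj, step * lam) for hj, gj in zip(h, grad)]
        costs.append(psd_cost(x, h, f_x, lam, gamma))
    return h, costs


x = [1.0, -0.5]
f_x = [0.6, 0.3, 0.0]   # hypothetical encoder prediction f(x), used as the starting point
h, costs = infer_code(x, f_x)
print("h:", [round(v, 4) for v in h])
print("cost per step:", [round(c, 4) for c in costs])
```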
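14.9 Applications of Autoencoders

Autoencoders have been applied successfully to dimensionality reduction and information retrieval. Dimensionality reduction was one of the first applications of representation learning and deep learning, and one of the early reasons for studying autoencoders. Hinton and Salakhutdinov (2006), for example, trained a stack of RBMs and used their weights to initialize a deep autoencoder. Its hidden layers became gradually smaller, ending in a 30-unit bottleneck. The resulting code gave lower reconstruction error than PCA into 30 dimensions, and the learned representation was qualitatively easier to interpret and to relate to the underlying categories.

Lower-dimensional representations can improve performance on many tasks, such as classification. Models of smaller spaces also use less memory and run faster. Many forms of dimensionality reduction place semantically related examples near each other, as observed by Salakhutdinov and Hinton (2007b) and Torralba et al. (2008).

Information retrieval, the task of finding database entries that resemble a query, benefits even more than usual from dimensionality reduction. Search can become extremely efficient in certain low-dimensional spaces. If the dimensionality reduction algorithm is trained to produce a code that is low-dimensional and binary, all database entries can be stored in a hash table that maps binary code vectors to entries. Retrieval then returns every entry whose binary code matches the query's. Slightly less similar entries can be found just as cheaply by flipping individual bits of the query code. This approach, retrieval via dimensionality reduction and binarization, is called semantic hashing (Salakhutdinov and Hinton, 2007b, 2009b). It has been applied to textual input (Salakhutdinov and Hinton, 2007b, 2009b) and to images (Torralba et al., 2008; Weiss et al., 2008; Krizhevsky and Hinton, 2011).

To produce binary codes for semantic hashing, the encoder usually has sigmoid units in its final layer. These units must be trained to saturate near 0 or near 1 for all inputs. One trick is to inject additive noise just before the sigmoid nonlinearity during training, with the noise magnitude increasing over time. To fight the noise and preserve as much information as possible, the network must increase the magnitude of the sigmoid inputs until the units saturate.

Learning a hashing function has since been explored in several directions. One is to train the representation to optimize a loss more directly tied to finding nearby examples in the hash table (Norouzi and Fleet, 2011).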
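The hash-table lookup behind semantic hashing can be sketched as follows. The pre-sigmoid encoder outputs below are hypothetical numbers standing in for a trained, saturated encoder. Each output is thresholded at 0.5 after the sigmoid to give a bit tuple. Entries are stored in a dictionary keyed by that tuple. A query returns exact matches (Hamming radius 0), or also entries one bit flip away (radius 1), which is the bit-flipping search described above.

```python
import math


def binary_code(logits):
    """Threshold the sigmoid outputs at 0.5 to obtain a hashable bit tuple."""
    return tuple(1 if 1.0 / (1.0 + math.exp(-a)) > 0.5 else 0 for a in logits)


def build_index(codes_by_id):
    """Hash table mapping each binary code to the ids of the entries that have it."""
    table = {}
    for item_id, code in codes_by_id.items():
        table.setdefault(code, []).append(item_id)
    return table


def lookup(table, query_code, radius=1):
    """Ids whose code equals the query (radius 0) or differs in one bit (radius 1)."""
    if radius not in (0, 1):
        raise ValueError("this sketch only implements radius 0 or 1")
    hits = list(table.get(query_code, []))
    if radius == 1:
        for i in range(len(query_code)):
            flipped = list(query_code)
            flipped[i] = 1 - flipped[i]
            hits.extend(table.get(tuple(flipped), []))
    return sorted(hits)


# Hypothetical pre-sigmoid encoder outputs standing in for a trained, saturated encoder.
encoder_logits = {
    "doc_a": [5.1, -4.2, 6.0],
    "doc_b": [4.8, -3.9, 5.5],
    "doc_c": [-6.0, 5.0, -4.4],
    "doc_d": [5.0, 4.0, 6.1],
}
table = build_index({k: binary_code(v) for k, v in encoder_logits.items()})
query = binary_code([4.0, -5.0, 3.0])
print(lookup(table, query, radius=0), lookup(table, query, radius=1))
```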