Deep Learning: Chapter 15
September 20, 2017 | Author: matsuolab | Category: N/A
Short Description
Japanese translation of Deep Learning...
Description
15
(Hinton et al., 1986)
210 CCX
6
VI
O(n) O(log n)
15.
(Salakhutdinov and Hinton, 2007a)
2
h
1
15.1
unsupervised pretraining
greedy layer-wise unsupervised pretraining
RBM 1
15.1
(Fukushima, 1975)
2006
(Hinton et al., 2006; Hinton and Salakhutdinov, 2006; Hinton, 2006; Bengio et al., 2007; Ranzato et al., 2007a)
greedy
greedy algorithm layer-wise
1
1
1
k
unsupervised pretraining fine-tune
2
(Hinton and Salakhutdinov, 2006) (Hinton et al., 2006) (Salakhutdinov and Hinton, 2009a) 20 8.7.4 (Erhan et al., 2010)
15.1.1
2006 (Hinton et al., 2006; Bengio et al., 2007; Ranzato et al., 2007a) Ma et al. (2015)
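Algorithm 15.1 Greedy layer-wise unsupervised pretraining protocol.
Given: an unsupervised feature learning algorithm L, which takes a training set of examples and returns an encoder or feature function f. The raw input data is X, with one row per example, and f^{(1)}(X) is the output of the first-stage encoder on X. If fine-tuning is performed, a learner T takes an initial function f, input examples X (and, for supervised fine-tuning, associated targets Y) and returns a tuned function. The number of stages is m.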
f ← identity function
\tilde{X} = X
for k = 1, ..., m do
    f^{(k)} = L(\tilde{X})
    f ← f^{(k)} ∘ f
    \tilde{X} ← f^{(k)}(\tilde{X})
end for
if fine-tuning then
    f ← T(f, X, Y)
end if
return f
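The control flow of Algorithm 15.1 can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: the names `greedy_pretrain`, `standardize_learner`, `compose` and `identity` are hypothetical, and the per-column standardizer is only a toy stand-in for the unsupervised learner L (in practice L would be something like an RBM or autoencoder trainer).

```python
from typing import Callable, List, Optional

Matrix = List[List[float]]
Encoder = Callable[[Matrix], Matrix]


def identity(X: Matrix) -> Matrix:
    """The initial f in Algorithm 15.1."""
    return X


def compose(outer: Encoder, inner: Encoder) -> Encoder:
    """Return the function X -> outer(inner(X))."""
    return lambda X: outer(inner(X))


def standardize_learner(X: Matrix) -> Encoder:
    """Toy stand-in for L (an assumption): fit per-column mean and std,
    then return an encoder that standardizes any input with them."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    # A zero standard deviation is replaced by 1.0 to avoid division by zero.
    stds = [(sum((row[j] - means[j]) ** 2 for row in X) / n) ** 0.5 or 1.0
            for j in range(d)]

    def encode(Z: Matrix) -> Matrix:
        return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in Z]

    return encode


def greedy_pretrain(L: Callable[[Matrix], Encoder], X: Matrix, m: int,
                    fine_tuner: Optional[Callable] = None,
                    Y: Optional[list] = None) -> Encoder:
    f: Encoder = identity
    X_tilde = X
    for _ in range(m):
        f_k = L(X_tilde)        # train stage k on the current representation
        f = compose(f_k, f)     # f <- f_k o f
        X_tilde = f_k(X_tilde)  # feed the new representation to the next stage
    if fine_tuner is not None:  # optional fine-tuning step T(f, X, Y)
        f = fine_tuner(f, X, Y)
    return f


X = [[1.0, 2.0], [3.0, 6.0]]
f = greedy_pretrain(standardize_learner, X, m=2)
encoded = f(X)
```

Each stage is trained only on the output of the stages before it, which is what makes the procedure greedy; the composed function `f` can then be applied to new inputs or handed to a fine-tuning routine.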
7.13 1 RBM(Larochelle and Bengio, 2008) (Rasmus et al., 2015)
2 2
1
1 1 2
1
2 1
1
1
2
one-hot
L2
2
one-hot
2011
2 (Mesnil et al., 2011; Goodfellow et al., 2011) 5 Paine et al. (2014)
(Hinton and Salakhutdinov, 2006) Erhan et al. (2010)
Figure 15.1: [Plot, adapted from Erhan et al. (2010): a two-dimensional Isomap (Tenenbaum et al., 2000) projection comparing networks trained with and without pretraining. Legend: With pretraining; Without pretraining.]
15.1
Erhan et al. (2010)
ReLU
1
1
15.3 2 1
1
2
1 2
1
1
2 Larochelle et al. (2009)
one-hot 1
Collobert and Weston (2008b)
Turian et al. (2010) Collobert et al. (2011a)
CIFAR-10
MNIST
5,000
(Srivastava, 2013) 8.7.4 supervised pretraining ImageNet (Oquab et al., 2014; Yosinski et al., 2014) (Collobert et al., 2011a; Mikolov et al., 2013a)
15.2 P1 P2
transfer learning
2
P1
P2
P1 P2
( 7.7 ) 7.2
15.2 domain adaptation
(Glorot et al., 2011b) concept drift 1
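Figure 15.2: [Diagram: three inputs x^{(1)}, x^{(2)}, x^{(3)} feed their own layers h^{(1)}, h^{(2)}, h^{(3)}; a selection switch routes one of these into the shared layer h^{(shared)}, which produces the output y.]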
2 2
(Mesnil et al., 2011; Goodfellow et al., 2011) P1 P2
1
P1
2
P2
one-shot learning
zero-shot learning zero-data learning
2
1
(Fei-Fei et al., 2006) 1
1
1
4
(Larochelle et al., 2008)
(Palatucci et al., 2009; Socher et al., 2013b) 3 x
y
T
p(y | x, T ) y=1
"yes" y = 0
T
"no"
y
T T T
4
T
one-hot
T Socher et al. (2013b)
(Klementiev et al., 2012; Mikolov et al., 2013b; Gouws et al., 2014) X A
Y
B A X
Y
2 3
2
multi-modal learning x
y
(x, y)
(Srivastava and Salakhutdinov, 2012)
3
x
2
y
15.3
15.3 1
p(x)
y 1990
p(y | x)
(Becker and Hinton, 1992; Hinton and Sejnowski, 1999)
x
Figure 15.3: [Diagram: an encoder f_x maps x-space to the representation h_x = f_x(x), and an encoder f_y maps y-space to h_y = f_y(y). Test points x_test and y_test are related through f_x(x_test) and f_y(y_test). Legend: (x, y) pairs in the training set; f_x: encoder function for x; f_y: encoder function for y; relationship between embedded points within one of the domains; maps between representation spaces. Credit: Hrant Khachatrian.]
Chapelle et al. (2006)
1.2
AI
2 h
x
y
h
Figure 15.4: [Plot: a density p(x) over x formed as a mixture of three components, labeled y=1, y=2 and y=3.]
p(y | x)
p(x)
f (x) = E[y | x]
p(x) x
p(y | x) 15.4
y
1
x p(x) 1
p(y | x)
p(y | x) y
y
1
x
p(x)
1
x h
p(x)
p(y | x)
h
x
$$p(h, x) = p(x \mid h)\, p(h) \tag{15.1}$$
$$p(x) = \mathbb{E}_{h}\, p(x \mid h) \tag{15.2}$$
x
x h 1
y 1
y x
y
$$p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)} \tag{15.3}$$
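Equation 15.3 can be checked numerically on a mixture like the one in figure 15.4. The sketch below assumes three equally weighted one-dimensional Gaussian components; the means, standard deviations and function names are illustrative assumptions, not values taken from the text.

```python
import math
from typing import List, Sequence


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a one-dimensional Gaussian, used here as p(x | y)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior(x: float, priors: Sequence[float], means: Sequence[float],
              stds: Sequence[float]) -> List[float]:
    """Return p(y | x) for every component y via Bayes' rule (eq. 15.3)."""
    # Numerator p(x | y) p(y) for each y.
    joint = [p * gaussian_pdf(x, m, s) for p, m, s in zip(priors, means, stds)]
    # Denominator p(x) = sum_y p(x | y) p(y).
    p_x = sum(joint)
    return [j / p_x for j in joint]


# Assumed mixture: well-separated components, as in a three-mode density.
priors = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]
means = [-4.0, 0.0, 4.0]
stds = [1.0, 1.0, 1.0]

post_at_zero = posterior(0.0, priors, means, stds)
```

When the components are well separated, knowing which mode of p(x) a point falls in nearly determines y, so a model of p(x) that has captured the mixture structure makes p(y | x) easy to learn.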
p(y | x)
y = hi
hi
hj hj
y
h
y
Simons and Levin (1998)
2
1 2
15.5
Generative Adversarial Networks, GANs (Goodfellow et al., 2014c)
20.10.4 Lotter et al. (2015)
Figure 15.5: [Images: panels labeled Input and Reconstruction. Images provided by Chelsea Finn.]
15.6 1
Schölkopf et al. (2012) x
p(x | y)
y p(x | y)
h
p(y)
p(x | h)
p(y)
Figure 15.6: [Images of a 3D model, from Lotter et al. (2015): panels labeled Ground Truth, MSE and Adversarial.]
15.4 1 n
k k
n
15.3
1
n 2n
15.7
1 n n 15.8
n one-hot
n 1
n
1
• k 1
• k
k>1
1
• •
k
•
• n
/ 1
w1
w2
2
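Figure 15.7: [Diagram: three lines in R^2, one decision boundary per binary feature h_1, h_2, h_3, partition the plane into regions labeled h = [1, 0, 0]^⊤, [1, 1, 0]^⊤, [1, 0, 1]^⊤, [1, 1, 1]^⊤, [0, 1, 0]^⊤, [0, 1, 1]^⊤ and [0, 0, 1]^⊤.] Each feature h_i divides R^2 into two half-planes: h_i^+ is the set of inputs with h_i = 1, and h_i^- is the set with h_i = 0. The representation takes a unique value on each intersection of half-planes; for example, [1, 1, 1]^⊤ corresponds to h_1^+ ∩ h_2^+ ∩ h_3^+. Compare with the non-distributed representations in figure 15.8. With d input dimensions, the features divide R^d with half-spaces, and n features assign unique codes to O(n^d) regions, whereas a nearest-neighbor method with n examples distinguishes only n regions. Not every value of h is feasible (h = 0 does not occur here), and even a deep linear-threshold network has a VC dimension of only O(w log w), where w is the number of weights (Sontag, 1998).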
hi
15.8:
prior 15.7
2
one-hot 12.4
u≈v
f
f(u) ≈ f(v)
f (x) ≈ y
(x, y)
x+ϵ fˆ
*1
15.7
R^d n R^d n (Zaslavsky, 1975)
$$\sum_{j=0}^{d} \binom{n}{j} = O(n^d) \tag{15.4}$$
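The count in equation 15.4 is easy to evaluate directly. The sketch below computes the sum of binomial coefficients, which is the number of regions produced by n hyperplanes in general position in R^d; the function name `max_regions` is a hypothetical choice for illustration.

```python
from math import comb


def max_regions(n: int, d: int) -> int:
    """Number of regions cut out of R^d by n hyperplanes in general position:
    sum_{j=0}^{d} C(n, j), the left-hand side of equation 15.4."""
    return sum(comb(n, j) for j in range(d + 1))


# Three binary features in the plane: 1 + 3 + 3 = 7 regions.
regions_3_in_2d = max_regions(3, 2)

# For fixed d the count grows polynomially in n, on the order of n^d.
growth = [max_regions(n, 2) for n in (1, 2, 4, 8, 16)]
```

For n = 3 and d = 2 the sum is 7, one fewer than the 2^3 = 8 binary codes, which matches the observation that not every value of h is feasible.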
(Pascanu et al., 2014b)
R^d
O(nd)
n
O(n^d) 1 R^d
O(n^d)
*1
d
2 O(2^d)
f
2d
O(n^d)
k≪r
k
r
r
VC
O(r)
(Sontag, 1998)
O(w log w) h
w
y h
1
Zhou et al. (2015)
ImageNet
Places
Radford et al. (2015) 15.9
Figure 15.9: [Images, from Radford et al. (2015): image panels combined as (image) - (image) + (image) = (image).]
n−1
15.5 6.4.1
15.4
AI
1 RBF
k 2
k−1
6.4.1 1 (Le Roux and Bengio, 2008, 2010; Montúfar and Ay, 2011; Montúfar, 2014; Krause et al., 2013) 6.4.1 1 sum-product network SPN
(Poon and Domingos, 2011) Delalleau and Bengio (2011)
SPN Martens and Medabalimi (2014)
SPN
2
SPN
(Cohen et al., 2015)
15.6 1
15.3
x
y
1
1 AI
Bengio et al. (2013d)
•
d
ϵ
•
3.1
f(x + ϵd) ≈ f(x)
Goodfellow et al. (2014b) • 15.3 p(x)
p(y | x)
p(y | x)
p(x)
15.4
•
h
x 15.3
•
•
yi
x x f^{(i)}(x)
yi
h
P (h | x)
P (yi | x)
•
(Goodfellow et al., 2014b) •
1
•
Slow feature analysis
13.3 •
•
$$P(h) = \prod_{i} P(h_i)$$
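A factorized prior of this form is simple to evaluate. The sketch below assumes binary factors h_i with independent Bernoulli marginals; the function name `factorial_prior` and the probabilities in `p_on` are illustrative assumptions.

```python
from itertools import product
from math import prod
from typing import Sequence


def factorial_prior(h: Sequence[int], p_on: Sequence[float]) -> float:
    """P(h) = prod_i P(h_i), with P(h_i = 1) = p_on[i] (Bernoulli factors)."""
    return prod(p if bit else 1.0 - p for bit, p in zip(h, p_on))


# Assumed marginals for three independent binary factors.
p_on = [0.5, 0.25, 0.9]

# Probability of one specific code: 0.5 * 0.75 * 0.9.
p_code = factorial_prior((1, 0, 1), p_on)

# Summing over all 2^3 codes recovers a total probability of 1.
total = sum(factorial_prior(h, p_on) for h in product((0, 1), repeat=3))
```

Because the factors are independent, the joint over all 2^n codes is specified by only n numbers.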