Deep Learning: Chapter 15

September 20, 2017 | Author: matsuolab | Category: N/A

Japanese translation of Deep Learning...


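Chapter 15: Representation Learning

This chapter first discusses what it means to learn representations and how the notion of representation helps in designing deep architectures. It then covers how learning algorithms share statistical strength across tasks, including using unsupervised tasks to help supervised ones, and closes with the theoretical advantages of distributed representations (Hinton et al., 1986) and of deep representations.

Whether an information processing task is easy or hard often depends on how the information is represented. For example, most people can divide 210 by 6 by long division, but asked to divide CCX by VI, they would first convert the Roman numerals to Arabic numerals so that they can use the place-value system. The same holds for asymptotic running time: inserting a number into its correct position in a sorted list is an $O(n)$ operation if the list is stored as a linked list, but only $O(\log n)$ if it is stored as a red-black tree.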
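As a small illustration of the division example (210 by 6 versus CCX by VI), the sketch below divides Roman numerals the way most people would: by first changing the representation to Arabic numerals (Python integers) and then dividing. The helper names `roman_to_int` and `divide_roman` are hypothetical, and only standard Roman numerals are assumed.

```python
# Values of the standard Roman numeral symbols.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral (subtractive notation allowed) to an int."""
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN_VALUES[ch]
        # A smaller symbol written before a larger one is subtracted (e.g. "IV" = 4).
        if i + 1 < len(numeral) and value < ROMAN_VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total


def divide_roman(dividend: str, divisor: str) -> int:
    """Change representation first, then use ordinary integer division."""
    return roman_to_int(dividend) // roman_to_int(divisor)


print(divide_roman("CCX", "VI"))  # 35
```

All of the work is in the change of representation; once the numbers are integers, the division itself is a single built-in operation.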

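In the context of machine learning, a good representation is one that makes a subsequent learning task easier. A feedforward network trained by supervised learning can be viewed this way: its last layer is typically a linear classifier such as softmax regression, and the rest of the network learns a representation for that classifier. In principle the last layer could be another kind of model, such as a nearest neighbor classifier (Salakhutdinov and Hinton, 2007a). Other representation learning algorithms shape the representation explicitly. For example, to make density estimation easier, we could design an objective that encourages the elements of the representation vector $h$ to be independent. Most representation learning problems face a trade-off between preserving as much information about the input as possible and attaining such nice properties.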

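15.1 Greedy Layer-Wise Unsupervised Pretraining

Unsupervised learning played a key historical role in the revival of deep neural networks. It let researchers train deep supervised networks for the first time without architectural specializations such as convolution or recurrence. The procedure is called unsupervised pretraining, or more precisely greedy layer-wise unsupervised pretraining. It relies on a single-layer representation learning algorithm such as an RBM, a single-layer autoencoder, or a sparse coding model. Each layer is pretrained with unsupervised learning, taking the output of the previous layer as input and producing a new representation of the data whose distribution is, hopefully, simpler. Algorithm 15.1 gives a formal description.

Greedy layer-wise training based on unsupervised criteria dates back at least to the Neocognitron (Fukushima, 1975). The deep learning renaissance of 2006 began with the discovery that this greedy procedure could find a good initialization for jointly training all layers, even in fully connected architectures (Hinton et al., 2006; Hinton and Salakhutdinov, 2006; Hinton, 2006; Bengio et al., 2007; Ranzato et al., 2007a).

The procedure is called greedy because it is a greedy algorithm: it optimizes each piece of the solution independently, one piece at a time, rather than all pieces jointly. It is called layer-wise because those pieces are the layers of the network: the $k$-th layer is trained while the previous layers are kept fixed. It is called unsupervised because each layer is trained with an unsupervised representation learning algorithm, and pretraining because it is meant to be only a first step before a joint training algorithm fine-tunes all layers together. In a supervised task it can be viewed both as a regularizer and as a form of parameter initialization.

The word "pretraining" commonly refers to the whole two-phase protocol: the pretraining phase followed by a supervised phase, which either trains a simple classifier on the pretrained features or fine-tunes the entire network. Greedy layer-wise pretraining can also initialize other unsupervised models, such as deep autoencoders (Hinton and Salakhutdinov, 2006), deep belief networks (Hinton et al., 2006), and deep Boltzmann machines (Salakhutdinov and Hinton, 2009a); these deep generative models are described in chapter 20. As discussed in section 8.7.4, greedy layer-wise supervised pretraining is also possible. It builds on the premise that training a shallow network is easier than training a deep one, which seems to have been validated in several contexts (Erhan et al., 2010).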

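15.1.1 When and Why Does Unsupervised Pretraining Work?

On many classification tasks, greedy layer-wise unsupervised pretraining yields substantial improvements in test error, an observation that drove the renewed interest in deep neural networks starting in 2006 (Hinton et al., 2006; Bengio et al., 2007; Ranzato et al., 2007a). On many other tasks, however, it either brings no benefit or causes noticeable harm. Ma et al. (2015) studied pretraining for chemical activity prediction and found that on average it was slightly harmful, although for many individual tasks it helped significantly.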

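Algorithm 15.1 Greedy layer-wise unsupervised pretraining protocol.
Given: an unsupervised feature learning algorithm L, which takes a training set of examples and returns an encoder or feature function f. The raw input data is X, with one row per example, and f^(1)(X) is the output of the first-stage encoder on X. When fine-tuning is performed, we use a learner T that takes an initial function f, input examples X (and, for supervised fine-tuning, associated targets Y) and returns a tuned function. The number of stages is m.

  f ← identity function
  X̃ = X
  for k = 1, ..., m do
    f^(k) = L(X̃)
    f ← f^(k) ∘ f
    X̃ ← f^(k)(X̃)
  end for
  if fine-tuning then
    f ← T(f, X, Y)
  end if
  return f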
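A minimal Python sketch of the protocol in Algorithm 15.1, assuming vectors are plain lists of floats. `toy_learner` is a hypothetical stand-in for L: it only standardizes each feature and applies tanh, whereas a real protocol would train an RBM, autoencoder, or sparse coding model at each stage. The optional `fine_tune` argument plays the role of T.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
Encoder = Callable[[Vector], Vector]


def toy_learner(data: Sequence[Vector]) -> Encoder:
    """Hypothetical stand-in for L: fit per-feature mean/std, return standardize+tanh."""
    n, dims = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(dims)]
    # A zero standard deviation falls back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / n) or 1.0
            for j in range(dims)]

    def encode(x: Vector) -> Vector:
        return [math.tanh((x[j] - means[j]) / stds[j]) for j in range(dims)]

    return encode


def compose(outer: Encoder, inner: Encoder) -> Encoder:
    """Return the composition outer ∘ inner."""
    return lambda x: outer(inner(x))


def greedy_layerwise_pretrain(
    X: Sequence[Vector],
    learner: Callable[[Sequence[Vector]], Encoder],
    m: int,
    fine_tune: Optional[Callable[..., Encoder]] = None,
    Y: Optional[Sequence] = None,
) -> Tuple[Encoder, List[Vector]]:
    f: Encoder = lambda x: list(x)          # f <- identity function
    X_tilde = [list(x) for x in X]           # X~ = X
    for _ in range(m):                       # for k = 1, ..., m
        f_k = learner(X_tilde)               # f^(k) = L(X~)
        f = compose(f_k, f)                  # f <- f^(k) ∘ f
        X_tilde = [f_k(x) for x in X_tilde]  # X~ <- f^(k)(X~)
    if fine_tune is not None:                # optional fine-tuning step T
        f = fine_tune(f, X, Y)
    return f, X_tilde


X = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0], [3.0, 9.0]]
f, codes = greedy_layerwise_pretrain(X, toy_learner, m=3)
print(f(X[0]) == codes[0])  # True
```

Each stage is fit only on the output of the previous stages, and the closures keep f equal to f^(m) ∘ ... ∘ f^(1), so applying f to a training row reproduces the final row of X̃.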

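Because unsupervised pretraining is sometimes helpful but often harmful, it is important to understand when and why it works. Most of this discussion concerns greedy unsupervised pretraining specifically. There are completely different paradigms for semi-supervised learning with neural networks, such as virtual adversarial training (section 7.13). It is also possible to train an autoencoder or generative model at the same time as the supervised model. Examples of this single-stage approach include the discriminative RBM (Larochelle and Bengio, 2008) and the ladder network (Rasmus et al., 2015), in which the total objective is an explicit sum of two terms, one using the labels and one using only the input.

Unsupervised pretraining combines two ideas. First, the choice of initial parameters for a deep network can have a significant regularizing effect on the model (and, to a lesser extent, can improve optimization). Second, learning about the input distribution can help in learning the mapping from inputs to outputs.

The first idea is the less well understood of the two. It was once explained as initializing the model near one local minimum rather than another, but standard neural network training usually does not arrive at a critical point of any kind; pretraining may instead start the model in a region that would otherwise be inaccessible. The second idea relies on some features that are useful for the unsupervised task also being useful for the supervised task.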

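Many aspects of this approach depend heavily on the specific unsupervised learning algorithm and model architecture. Viewing pretraining as learning a representation, we expect it to be more effective when the initial representation is poor. A key example is word embeddings: words represented by one-hot vectors are not very informative, because every two distinct one-hot vectors are the same distance from each other (a squared $L^2$ distance of 2). Learned word embeddings naturally encode similarity between words by their distance, so unsupervised pretraining is especially useful when processing words. It is less useful for images, perhaps because images already lie in a rich vector space where distances provide a similarity metric, albeit a low-quality one.

Viewing pretraining as a regularizer, we expect it to help most when the number of labeled examples is very small and, since the added information comes from unlabeled data, when the number of unlabeled examples is very large. This advantage was made particularly clear in 2011, when unsupervised pretraining won two international transfer learning competitions (Mesnil et al., 2011; Goodfellow et al., 2011) in settings with few labeled examples per class in the target task. These effects were also documented in carefully controlled experiments by Paine et al. (2014).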
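Distinct one-hot vectors are all at squared $L^2$ distance 2 from one another, which the sketch below checks for a three-word vocabulary. It contrasts them with a hypothetical learned embedding; the embedding values are invented purely for illustration.

```python
from itertools import combinations
from typing import List


def one_hot(index: int, size: int) -> List[float]:
    return [1.0 if i == index else 0.0 for i in range(size)]


def squared_l2(u: List[float], v: List[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


vocab = ["cat", "dog", "car"]
codes = {w: one_hot(i, len(vocab)) for i, w in enumerate(vocab)}

# Every pair of distinct one-hot codes is equally far apart: no notion of similarity.
for a, b in combinations(vocab, 2):
    print(a, b, squared_l2(codes[a], codes[b]))  # always 2.0

# Hypothetical 2-D embeddings (made-up values) that place "cat" near "dog".
embedding = {"cat": [0.9, 0.1], "dog": [0.8, 0.2], "car": [-0.7, 0.9]}
print(squared_l2(embedding["cat"], embedding["dog"])
      < squared_l2(embedding["cat"], embedding["car"]))  # True
```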

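Unsupervised pretraining is usually used to reduce test error in classification, but it can also help other tasks and can improve optimization rather than merely regularize: it improves both training and test reconstruction error for deep autoencoders (Hinton and Salakhutdinov, 2006). Erhan et al. (2010) performed many experiments to explain the successes of unsupervised pretraining. Both kinds of improvement can be explained by pretraining taking the parameters into a region that would otherwise be inaccessible: networks with pretraining consistently halt in one region of function space, and networks without pretraining consistently halt in another, as shown in figure 15.1.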

[Figure 15.1: Learning trajectories of many neural networks, with and without unsupervised pretraining, projected from function space to two dimensions with Isomap (Tenenbaum et al., 2000); adapted from Erhan et al. (2010). Legend: With pretraining; Without pretraining.]

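Erhan et al. (2010) also found that pretraining reduced the mean and variance of test error most for deeper networks. These experiments predate modern techniques for training very deep networks, such as rectified linear units (ReLU), dropout, and batch normalization, so less is known about the effect of unsupervised pretraining combined with contemporary approaches.

One hypothesis for how pretraining acts as a regularizer is that it encourages the learning algorithm to discover features that relate to the underlying causes of the observed data, an idea described further in section 15.3.

Compared with other forms of unsupervised learning, pretraining has the disadvantage of using two separate training phases. Many regularization strategies let the user control the regularization strength through a single hyperparameter. When unsupervised and supervised learning are performed simultaneously, there is such a hyperparameter, usually a coefficient on the unsupervised cost, and lowering it predictably yields less regularization. Pretraining offers no such knob: the supervised model is either initialized to the pretrained parameters or it is not. Each phase also has its own hyperparameters, and the performance of the second phase usually cannot be predicted during the first, so there is a long delay between proposing first-phase hyperparameters and getting feedback from the second phase. The most principled approach is to select pretraining hyperparameters using validation error in the supervised phase, as discussed in Larochelle et al. (2009). In practice, some hyperparameters, such as the number of pretraining iterations, are more conveniently set during pretraining with early stopping on the unsupervised objective, which is not ideal but is much cheaper.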
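When unsupervised and supervised learning are performed simultaneously rather than in two phases, a single coefficient controls how strongly the unsupervised term regularizes the supervised model. A minimal sketch, assuming a toy scalar model with code h = w·x, a supervised head v, and a reconstruction head u (all hypothetical); `lam` is that single coefficient, and setting it to 0 recovers purely supervised training.

```python
from typing import Sequence, Tuple


def joint_loss(params: Tuple[float, float, float],
               labeled: Sequence[Tuple[float, float]],
               unlabeled: Sequence[float],
               lam: float) -> float:
    """Supervised squared error plus lam times reconstruction error."""
    w, v, u = params
    # Supervised term: only labeled (x, y) pairs contribute; prediction is v * (w * x).
    supervised = sum((v * w * x - y) ** 2 for x, y in labeled) / len(labeled)
    # Unsupervised term: every input, labeled or not, is reconstructed as u * (w * x).
    inputs = [x for x, _ in labeled] + list(unlabeled)
    unsupervised = sum((u * w * x - x) ** 2 for x in inputs) / len(inputs)
    return supervised + lam * unsupervised


params = (1.0, 2.0, 0.5)
labeled = [(1.0, 2.0), (2.0, 4.1)]
unlabeled = [0.5, 3.0]
for lam in (0.0, 0.1, 1.0):
    print(lam, joint_loss(params, labeled, unlabeled, lam))
```

Because the objective is linear in `lam`, lowering it predictably weakens the unsupervised regularization, which is exactly the control that the two-phase protocol lacks.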

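Today, unsupervised pretraining has been largely abandoned except in natural language processing, where the one-hot representation of words conveys no similarity information and very large unlabeled corpora are available. There, one can pretrain once on a huge unlabeled set, learn a good representation of words (or sentences), and then use or fine-tune it for a supervised task with far fewer examples. This approach was pioneered by Collobert and Weston (2008b), Turian et al. (2010), and Collobert et al. (2011a) and remains in common use.

Supervised deep learning regularized with dropout or batch normalization achieves human-level performance on many tasks, but only with extremely large labeled datasets. These techniques also outperform unsupervised pretraining on medium-sized datasets such as CIFAR-10 and MNIST, which have roughly 5,000 labeled examples per class. On extremely small datasets, such as the alternative splicing dataset, Bayesian methods outperform methods based on unsupervised pretraining (Srivastava, 2013). For these reasons, the popularity of unsupervised pretraining has declined, although it remains an important milestone that still influences current approaches. The idea of pretraining has been generalized to supervised pretraining (section 8.7.4), a very common approach to transfer learning. Supervised pretraining of convolutional networks on the ImageNet dataset is popular for transfer learning (Oquab et al., 2014; Yosinski et al., 2014), and practitioners publish the trained parameters for this purpose, just as pretrained word vectors are published for natural language tasks (Collobert et al., 2011a; Mikolov et al., 2013a).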

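15.2 Transfer Learning and Domain Adaptation

Transfer learning and domain adaptation refer to situations where what has been learned in one setting (distribution $P_1$) is exploited to improve generalization in another setting (distribution $P_2$). This generalizes the previous section, where representations were transferred from an unsupervised task to a supervised one.

In transfer learning, the learner must perform two or more different tasks, and we assume that many of the factors explaining the variations in $P_1$ are relevant to the variations that must be captured to learn $P_2$. Typically the input is the same but the target differs. For example, we may learn about one set of visual categories, such as cats and dogs, in the first setting and a different set, such as ants and wasps, in the second. If there is much more data from $P_1$, it may help learn representations that generalize quickly from very few examples drawn from $P_2$, since many visual categories share low-level notions of edges and shapes, the effects of geometric changes, changes in lighting, and so on. In general, transfer learning, multi-task learning (section 7.7), and domain adaptation can be achieved via representation learning when some features are useful across settings, corresponding to underlying factors that appear in more than one setting. Figure 7.2 illustrates this with shared lower layers and task-dependent upper layers.

Sometimes, however, what the tasks share is the semantics of the output rather than of the input. A speech recognizer must produce valid sentences at its output, while layers near the input may need to recognize very different versions of the same phonemes depending on the speaker. In such cases it makes more sense to share the upper layers of the network and use task-specific preprocessing, as in figure 15.2.

In the related case of domain adaptation, the task (and the optimal input-to-output mapping) is the same in each setting, but the input distribution differs slightly. For example, a sentiment analysis model trained on customer reviews of books, videos, and music may later be applied to reviews of consumer electronics; denoising autoencoders have been used successfully for such unsupervised domain adaptation (Glorot et al., 2011b). A related problem is concept drift, which can be viewed as transfer learning under gradual changes of the data distribution over time. Concept drift and transfer learning can both be viewed as particular forms of multi-task learning. While "multi-task learning" usually refers to supervised tasks, transfer learning in general also applies to unsupervised learning and reinforcement learning.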

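[Figure 15.2: Example architecture for multi-task or transfer learning when the output $y$ has the same semantics for all tasks while the input $x$ has a different meaning (and possibly a different dimension) for each task, called $x^{(1)}$, $x^{(2)}$, and $x^{(3)}$ for three tasks. The task-specific lower layers $h^{(1)}$, $h^{(2)}$, $h^{(3)}$ feed a selection switch; the layers above it, $h^{(\text{shared})}$ and the output $y$, are shared by all tasks.]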
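A minimal sketch of the architecture in figure 15.2, in plain Python: each task (for example, each speaker) has its own lower layer whose input size may differ, and a dictionary lookup plays the role of the selection switch that routes the task-specific features into upper layers shared by all tasks. Layer sizes, the tanh nonlinearity, the task names, and the random initialization are assumptions for illustration, and training is omitted.

```python
import math
import random
from typing import Dict, List

Matrix = List[List[float]]


def dense(weights: Matrix, x: List[float]) -> List[float]:
    """One fully connected layer: tanh(W x), with W stored as a list of rows."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class SharedTopNetwork:
    """Task-specific lower layers, shared upper layers (figure 15.2 layout)."""

    def __init__(self, input_dims: Dict[str, int], hidden: int, out: int, seed: int = 0):
        rng = random.Random(seed)
        # One lower layer per task; each can accept a different input dimension.
        self.lower = {task: random_matrix(hidden, d, rng) for task, d in input_dims.items()}
        self.shared = random_matrix(hidden, hidden, rng)  # h(shared)
        self.head = random_matrix(out, hidden, rng)       # output y

    def forward(self, task: str, x: List[float]) -> List[float]:
        h_task = dense(self.lower[task], x)    # selection switch: pick this task's layer
        h_shared = dense(self.shared, h_task)  # shared by every task
        return dense(self.head, h_shared)


net = SharedTopNetwork({"speaker_a": 3, "speaker_b": 5}, hidden=4, out=2)
print(net.forward("speaker_a", [0.1, 0.2, 0.3]))
print(net.forward("speaker_b", [0.0, 0.1, 0.2, 0.3, 0.4]))
```

Gradients from every task would flow into `shared` and `head`, so those layers benefit from all tasks' data, while each `lower[task]` only sees its own task.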

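In all these cases, the objective is to use data from the first setting to extract information that may help when learning, or even when directly making predictions, in the second setting. The core idea of representation learning is that the same representation may be useful in both settings, so it can benefit from the training data available for both tasks.

As mentioned before, unsupervised deep learning for transfer learning has succeeded in machine learning competitions (Mesnil et al., 2011; Goodfellow et al., 2011). In the first of these, each participant was given a dataset from $P_1$ illustrating some set of categories and had to learn a good feature space from it, such that a linear classifier trained on inputs from the transfer setting $P_2$ would generalize well from very few labeled examples. One of the most striking findings was that the deeper the representations (learned purely without supervision from $P_1$ data), the better the learning curve on the new categories of $P_2$: deep representations need fewer labeled transfer examples to reach apparently asymptotic performance.

Two extreme forms of transfer learning are one-shot learning and zero-shot learning, sometimes called zero-data learning. One-shot learning gives only one labeled example of the transfer task; zero-shot learning gives none. One-shot learning (Fei-Fei et al., 2006) is possible because the representation learns, during the first stage, to cleanly separate the underlying classes. In the transfer stage, a single labeled example is then enough to infer the label of many test examples that cluster around the same point in representation space.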
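The one-shot mechanism can be sketched as nearest-prototype classification in representation space. Here `learned_encoder` is a hypothetical stand-in for a representation learned on $P_1$: it keeps two informative coordinates and discards a nuisance coordinate (think lighting). With one labeled example per new class, the encoded query is matched to the nearest encoded example; the same query is misclassified when distances are measured on the raw input.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def one_shot_classifier(encoder: Callable[[Vector], Vector],
                        support: Dict[str, Vector]) -> Callable[[Vector], str]:
    """Build a classifier from exactly one labeled example per class."""
    prototypes = {label: encoder(x) for label, x in support.items()}

    def classify(x: Vector) -> str:
        z = encoder(x)
        # The nearest encoded support example (Euclidean distance) wins.
        return min(prototypes, key=lambda label: math.dist(z, prototypes[label]))

    return classify


def learned_encoder(x: Vector) -> Vector:
    # Hypothetical representation: keep coordinates 0 and 1, drop nuisance coordinate 2.
    return [x[0], x[1]]


def raw_input(x: Vector) -> Vector:
    return list(x)


support = {"ant": [1.0, 0.0, 5.0], "wasp": [0.0, 1.0, -5.0]}
query = [0.9, 0.1, -5.0]  # an ant seen under "wasp-like" nuisance conditions
print(one_shot_classifier(learned_encoder, support)(query))  # ant
print(one_shot_classifier(raw_input, support)(query))        # wasp
```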

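As an example of zero-shot learning, consider a learner that has read a large collection of text and must then solve object recognition. It may be able to recognize an object class without having seen an image of it, if the text describes the object well enough: having read that a cat has four legs and pointy ears, it might guess that an image shows a cat without having seen one before. Zero-data learning (Larochelle et al., 2008) and zero-shot learning (Palatucci et al., 2009; Socher et al., 2013b) are only possible because additional information was exploited during training. We can think of this scenario as involving three random variables: the traditional inputs $x$, the traditional outputs or targets $y$, and a variable $T$ describing the task. The model is trained to estimate the conditional distribution $p(y \mid x, T)$. In the cat example, $y$ is binary, with $y = 1$ meaning "yes" and $y = 0$ meaning "no", and $T$ represents questions such as "Is there a cat in this image?" If we have unsupervised examples of objects that live in the same space as $T$, we may be able to infer the meaning of unseen instances of $T$; here, that means having unlabeled text containing sentences such as "cats have four legs" or "cats have pointy ears".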

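Zero-shot learning requires $T$ to be represented in a way that allows some sort of generalization; for example, $T$ cannot be just a one-hot code indicating an object category. Socher et al. (2013b) instead provide a distributed representation of object categories by using a learned word embedding for the word associated with each category.

A similar phenomenon occurs in machine translation (Klementiev et al., 2012; Mikolov et al., 2013b; Gouws et al., 2014). Relationships between words can be learned from monolingual corpora, and translated sentences relate words in one language to words in the other. Even without labeled examples translating word A in language X to word B in language Y, we can generalize and guess a translation for A, because we have learned a distributed representation for words in X, one for words in Y, and a (possibly two-way) link between the two spaces, trained on pairs of matched sentences. This transfer is most successful when all three ingredients (the two representations and the relation between them) are learned jointly.

Zero-shot learning is a particular form of transfer learning. The same principle explains multi-modal learning: we learn a representation in one modality, a representation in another, and the relationship (in general a joint distribution) between pairs $(x, y)$ of an observation $x$ in one modality and an observation $y$ in the other (Srivastava and Salakhutdinov, 2012). By learning all three sets of parameters (from $x$ to its representation, from $y$ to its representation, and the relationship between the two representations), concepts in one representation are anchored in the other and vice versa, allowing meaningful generalization to new pairs. The procedure is illustrated in figure 15.3.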

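15.3 Semi-Supervised Disentangling of Causal Factors

An important question about representation learning is what makes one representation better than another. One hypothesis is that an ideal representation is one in which the features correspond to the underlying causes of the observed data, with separate features or directions in feature space corresponding to different causes, so that the representation disentangles the causes from one another. This hypothesis motivates approaches that first seek a good representation for $p(x)$. Such a representation may also be good for computing $p(y \mid x)$ if $y$ is among the most salient causes of $x$. This idea has guided a large amount of deep learning research since at least the 1990s (Becker and Hinton, 1992; Hinton and Sejnowski, 1999).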

[Figure 15.3: Transfer learning between two domains $x$ and $y$ enables zero-shot learning. The encoder functions $f_x$ and $f_y$ map $x$ space and $y$ space to representations $h_x = f_x(x)$ and $h_y = f_y(y)$; distances within each representation space relate embedded points within one of the domains. The $(x, y)$ pairs in the training set anchor maps between the representation spaces. A test image $x_{\text{test}}$ can then be associated with a word $y_{\text{test}}$, even if no image of that word was seen in training, because $f_x(x_{\text{test}})$ and $f_y(y_{\text{test}})$ are related through these maps. Figure inspired by a suggestion from Hrant Khachatrian.]
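A toy sketch of the zero-shot mechanism in figure 15.3, assuming 2-D representations: `f_x` (an image encoder) is simply the identity on hypothetical image features, and `word_embedding` is a hypothetical table standing in for $f_y$ learned from text alone; all numbers are invented. A linear map from $h_x$ to $h_y$ is fit by least squares on training pairs that never include "cat", and a new cat image is labeled by the nearest word embedding after mapping.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical word embeddings f_y(y), assumed learned from unlabeled text.
word_embedding: Dict[str, Vector] = {"dog": [1.0, 0.0], "truck": [0.0, 1.0], "cat": [0.9, 0.3]}


def f_x(image_features: Vector) -> Vector:
    # Hypothetical image encoder: the features are already a 2-D representation.
    return list(image_features)


def fit_linear_map(pairs: Sequence[Tuple[Vector, Vector]]) -> Matrix:
    """Least-squares A with h_y ≈ A h_x: A = (sum h_y h_x^T)(sum h_x h_x^T)^-1, 2-D only."""
    sxx = [[0.0, 0.0], [0.0, 0.0]]
    syx = [[0.0, 0.0], [0.0, 0.0]]
    for hx, hy in pairs:
        for i in range(2):
            for j in range(2):
                sxx[i][j] += hx[i] * hx[j]
                syx[i][j] += hy[i] * hx[j]
    det = sxx[0][0] * sxx[1][1] - sxx[0][1] * sxx[1][0]
    inv = [[sxx[1][1] / det, -sxx[0][1] / det],
           [-sxx[1][0] / det, sxx[0][0] / det]]
    return [[sum(syx[i][k] * inv[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def matvec(a: Matrix, v: Vector) -> Vector:
    return [sum(a[i][j] * v[j] for j in range(2)) for i in range(2)]


# Labeled (image, word) pairs; no cat image appears here.
training_pairs = [([2.0, 0.0], "dog"), ([0.0, 2.0], "truck")]
A = fit_linear_map([(f_x(img), word_embedding[w]) for img, w in training_pairs])


def zero_shot_label(image_features: Vector) -> str:
    mapped = matvec(A, f_x(image_features))
    return min(word_embedding, key=lambda w: math.dist(mapped, word_embedding[w]))


print(zero_shot_label([1.8, 0.6]))  # cat
```

The cat image is labeled correctly only because "cat" already has a position in the word space and the learned map carries image representations into that space.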


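For other arguments about when semi-supervised learning can outperform pure supervised learning, see section 1.2 of Chapelle et al. (2006).

In other approaches to representation learning we have often sought a representation that is easy to model, for example one whose entries are sparse or independent. A representation that cleanly separates the underlying causal factors is not necessarily easy to model. However, part of the hypothesis motivating semi-supervised learning via unsupervised representation learning is that for many AI tasks these two properties coincide: once we can obtain the underlying explanations for what we observe, it generally becomes easy to isolate individual attributes from the others. Specifically, if a representation $h$ represents many of the underlying causes of the observed $x$, and the outputs $y$ are among the most salient causes, then it is easy to predict $y$ from $h$.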

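[Figure 15.4: Mixture model. A density $p(x)$ over $x$ that is a mixture of three components, labeled $y = 1$, $y = 2$, and $y = 3$. The component identity is an underlying explanatory factor $y$; because the components are statistically salient, modeling $p(x)$ without labels already reveals $y$.]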

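First, consider how semi-supervised learning can fail because unsupervised learning of $p(x)$ is of no help in learning $p(y \mid x)$. If $p(x)$ is uniform and we want to learn $f(x) = \mathbb{E}[y \mid x]$, then observing a training set of $x$ values alone gives no information about $p(y \mid x)$.

Next, consider a simple case where semi-supervised learning succeeds. Suppose $x$ arises from a mixture with one component per value of $y$, as in figure 15.4. If the components are well separated, modeling $p(x)$ reveals exactly where each component is, and a single labeled example per class is then enough to learn $p(y \mid x)$ perfectly.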
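The mixture case (figure 15.4) can be sketched on invented 1-D data: plain k-means (Lloyd's algorithm, used here only as a simple stand-in for fitting the mixture $p(x)$) locates the three components from unlabeled points alone, after which a single labeled example per class names each component. The data, the initial centers, and the helper names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def nearest_center(x: float, centers: Sequence[float]) -> int:
    return min(range(len(centers)), key=lambda k: abs(x - centers[k]))


def kmeans_1d(xs: Sequence[float], centers: List[float], iters: int = 20) -> List[float]:
    """Lloyd's algorithm in one dimension; it never looks at labels."""
    for _ in range(iters):
        groups: List[List[float]] = [[] for _ in centers]
        for x in xs:
            groups[nearest_center(x, centers)].append(x)
        # Keep an empty cluster's old center rather than dividing by zero.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


unlabeled = [-5.2, -4.9, -5.1, 0.1, -0.2, 0.0, 4.8, 5.3, 5.0]
centers = kmeans_1d(unlabeled, centers=[-1.0, 0.0, 1.0])

# One labeled example per class is enough to name every component.
labeled: List[Tuple[float, int]] = [(-5.0, 1), (0.05, 2), (5.1, 3)]
names: Dict[int, int] = {nearest_center(x, centers): y for x, y in labeled}


def predict(x: float) -> int:
    return names[nearest_center(x, centers)]


print([predict(x) for x in (-4.0, 0.7, 4.0)])  # [1, 2, 3]
```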

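More generally, what ties $p(y \mid x)$ and $p(x)$ together? If $y$ is closely associated with one of the causal factors of $x$, then $p(x)$ and $p(y \mid x)$ are strongly tied, and unsupervised representation learning that tries to disentangle the underlying factors of variation is likely to be a useful semi-supervised strategy. Assume that $y$ is one of the causal factors of $x$, and let $h$ represent all of those factors.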

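The true generative process can then be conceived as a directed graphical model with $h$ as the parent of $x$:

$$p(h, x) = p(x \mid h)\, p(h). \tag{15.1}$$

As a consequence, the data have marginal probability

$$p(x) = \mathbb{E}_{h}\, p(x \mid h). \tag{15.2}$$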

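From this observation we conclude that the best possible model of $x$, from a generalization point of view, is the one that uncovers this "true" structure, with $h$ as a latent variable explaining the observed variations in $x$. An ideal representation learner should therefore recover these latent factors, and if $y$ is one of them (or closely related to one of them), it will be very easy to learn to predict $y$ from such a representation. The conditional distribution of $y$ given $x$ is tied to the components above by Bayes' rule:

$$p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)}. \tag{15.3}$$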

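Thus the marginal $p(x)$ is intimately tied to the conditional $p(y \mid x)$, and knowing the structure of the former should help in learning the latter. In situations that respect these assumptions, semi-supervised learning should therefore improve performance.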
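Equations 15.1 to 15.3 can be checked numerically on a hypothetical mixture in which the causal factor $h$ is the class $y$ itself: $p(y)$ is a prior over three components and $p(x \mid y)$ is a unit-variance Gaussian per component. The parameters are invented for illustration. The sketch computes $p(x)$ by marginalizing over $y$ and recovers $p(y \mid x)$ with Bayes' rule.

```python
import math

prior = {1: 0.5, 2: 0.3, 3: 0.2}   # p(y), hypothetical
means = {1: -5.0, 2: 0.0, 3: 5.0}  # component means, hypothetical


def gaussian_pdf(x: float, mean: float, std: float = 1.0) -> float:
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def p_x_given_y(x: float, y: int) -> float:
    return gaussian_pdf(x, means[y])


def p_joint(x: float, y: int) -> float:
    # Equation 15.1 with h = y: p(y, x) = p(x | y) p(y)
    return p_x_given_y(x, y) * prior[y]


def p_x(x: float) -> float:
    # Equation 15.2: p(x) = E_y[p(x | y)] = sum over y of p(y) p(x | y)
    return sum(p_joint(x, y) for y in prior)


def p_y_given_x(y: int, x: float) -> float:
    # Equation 15.3 (Bayes' rule)
    return p_joint(x, y) / p_x(x)


print({y: round(p_y_given_x(y, 4.5), 4) for y in prior})
```

Knowing where the components of $p(x)$ lie fixes $p(y \mid x)$ almost entirely: at $x = 4.5$ nearly all posterior mass goes to the component centered at 5.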

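A key research problem is that most observations are formed by an extremely large number of underlying causes. Suppose $y = h_i$, but the unsupervised learner does not know which $h_i$. The brute-force solution is to learn a representation that captures all reasonably salient generative factors $h_j$ and disentangles them from one another, so that $y$ is easy to predict from $h$ regardless of which $h_i$ is associated with $y$.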

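In practice, the brute-force solution is infeasible, because it is not possible to capture all or most of the factors of variation that influence an observation. In a visual scene, should the representation always encode every small object in the background? It is a well-documented psychological phenomenon that humans fail to perceive changes in their environment that are not immediately relevant to the task they are performing; see, for example, Simons and Levin (1998). An important research frontier in semi-supervised learning is therefore deciding what to encode in each situation. Currently, two of the main strategies for dealing with many underlying causes are to use a supervised learning signal together with the unsupervised one, so that the model captures the most relevant factors of variation, or to use much larger representations when learning purely without supervision.

An emerging strategy is to change the definition of which underlying causes are most salient. Autoencoders and generative models have historically been trained to optimize a fixed criterion, often similar to mean squared error, and that criterion determines which causes are considered salient. Mean squared error applied to image pixels implicitly treats a cause as salient only if it significantly changes the brightness of many pixels, which is problematic when the task involves small objects. Figure 15.5 shows a robotics task in which an autoencoder failed to encode a small ping pong ball, even though the same robot can interact successfully with larger objects such as baseballs, which are more salient under mean squared error.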
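A small numeric sketch of why mean squared error treats small objects as non-salient, using a hypothetical flat 32×32 gray image that contains a 2×2 bright "ball": a reconstruction that drops the ball entirely scores far better than one that keeps the ball but is slightly darker everywhere. The image and the pixel values are invented for illustration.

```python
from typing import List


def mse(a: List[float], b: List[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


WIDTH = 32
background = [0.5] * (WIDTH * WIDTH)  # flat gray image, flattened row by row

# Original image: a 2x2 "ball" of bright pixels in the top-left corner.
with_ball = list(background)
for r in range(2):
    for c in range(2):
        with_ball[r * WIDTH + c] = 1.0

missing_ball = list(background)           # reconstruction that drops the ball
darker = [p - 0.1 for p in with_ball]     # keeps the ball, slightly darker everywhere

print(mse(with_ball, missing_ball))  # 0.0009765625
print(mse(with_ball, darker))        # about 0.01, roughly ten times worse
```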

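Other definitions of salience are possible. If a group of pixels follows a highly recognizable pattern, even one that is not extremely bright or dark, that pattern could be considered extremely salient. One way to implement this is with Generative Adversarial Networks, GANs (Goodfellow et al., 2014c), in which a generative model is trained to fool a feedforward classifier that tries to label all generated samples as fake and all training samples as real. Any structured pattern that the classifier can recognize is then highly salient. GANs are described in more detail in section 20.10.4; here it is enough that they learn what is salient. Lotter et al. (2015) showed that models trained with mean squared error to generate images of human heads often omit the ears, whereas models trained adversarially generate them successfully. Ears are not especially bright or dark compared with the surrounding skin, so they are not salient under mean squared error, but their recognizable shape and consistent position make them easy for a feedforward network to detect.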

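[Figure 15.5: An autoencoder trained with mean squared error on a robotics task fails to reconstruct a ping pong ball. Panels: Input; Reconstruction. Images provided by Chelsea Finn.]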

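Figure 15.6 shows example images. GANs are only one step toward determining which factors should be represented; future research may find better ways of deciding which factors to represent and mechanisms for representing different factors depending on the task.

As pointed out by Schölkopf et al. (2012), a benefit of learning the underlying causal factors is that if the true generative process has $x$ as an effect and $y$ as a cause, then modeling $p(x \mid y)$ is robust to changes in $p(y)$. If the cause-effect relationship were reversed this would not hold, since by Bayes' rule $p(x \mid y)$ would be sensitive to changes in $p(y)$. When distributions change because of different domains, temporal non-stationarity, or changes in the nature of the task, the causal mechanisms often remain invariant while the marginal distribution over the underlying causes changes. Better generalization and robustness to such changes can therefore be expected from learning a generative model that attempts to recover the causal factors $h$ and $p(x \mid h)$.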

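[Figure 15.6: Predictive generative networks trained to predict the appearance of a 3D model of a human head at a given viewing angle (Lotter et al., 2015). Panels: Ground Truth (the correct image); MSE (trained with mean squared error alone, which omits the ears); Adversarial (trained with an added adversarial loss, which renders the ears).]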

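15.4 Distributed Representation

Distributed representations of concepts, that is, representations composed of many elements that can be set separately from each other, are among the most important tools of representation learning. They are powerful because $n$ features with $k$ values each can describe $k^n$ different concepts. Both neural networks with multiple hidden units and probabilistic models with multiple latent variables use this strategy. Many deep learning algorithms assume that hidden units can learn to represent the underlying causal factors that explain the data, as discussed in section 15.3, and distributed representations suit this approach because each direction in representation space can correspond to a different underlying configuration variable.

An example of a distributed representation is a vector of $n$ binary features, which can take $2^n$ configurations, each potentially corresponding to a different region of input space (figure 15.7). Compare this with a symbolic representation, where the input is associated with a single symbol or category: with $n$ symbols in the dictionary, one can imagine $n$ feature detectors, each detecting the presence of its category, so only $n$ configurations of the representation are possible, carving $n$ regions of input space (figure 15.8). Such a symbolic representation is also called a one-hot representation, since it is a binary vector of $n$ mutually exclusive bits (only one can be active). It is one example of the broader class of non-distributed representations, which may contain many entries but without significant meaningful separate control over each entry. Learning algorithms based on non-distributed representations include:

• Clustering methods, including the $k$-means algorithm: each input point is assigned to exactly one cluster.
• $k$-nearest neighbors algorithms: one or a few templates or prototype examples are associated with a given input. When $k > 1$, several values describe each input, but they cannot be controlled separately from each other, so this is not a true distributed representation.
• Decision trees: only one leaf (and the nodes on the path from the root to that leaf) is activated for a given input.
• Gaussian mixtures and mixtures of experts: the templates (cluster centers) or experts are associated with a degree of activation. As with $k$-nearest neighbors, each input is represented by multiple values that cannot readily be controlled separately.
• Kernel machines with a Gaussian kernel (or another similarly local kernel): although the degree of activation of each support vector or template example is continuous-valued, the same issue arises as with Gaussian mixtures.
• Language or translation models based on $n$-grams: the set of contexts (sequences of symbols) is partitioned according to a tree structure of suffixes. A leaf may correspond, for example, to the last two words being $w_1$ and $w_2$, and separate parameters are estimated for each leaf of the tree (with some sharing possible).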
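The counting argument at the start of this section can be checked by brute-force enumeration: $n$ features with $k$ values each give $k^n$ configurations, while a one-hot code over $n$ symbols gives only $n$. The function names are illustrative.

```python
from itertools import product


def distributed_configurations(n_features: int, k_values: int) -> int:
    """Count every assignment of one of k values to each of n features."""
    return sum(1 for _ in product(range(k_values), repeat=n_features))


def one_hot_configurations(n_symbols: int) -> int:
    """Count valid one-hot codes: exactly one of n mutually exclusive bits is on."""
    return sum(1 for bits in product((0, 1), repeat=n_symbols) if sum(bits) == 1)


for n in (2, 4, 8):
    # n, configurations of n binary features, configurations of an n-symbol one-hot code
    print(n, distributed_configurations(n, 2), one_hot_configurations(n))
```

Enumeration is exponential in n, so this is only a check of the formulas for small n, not a practical method.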

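[Figure 15.7: How a learning algorithm based on a distributed representation breaks the input space into regions. Three binary features $h_1$, $h_2$, $h_3$ are each defined by thresholding a learned linear transformation, so each divides $\mathbb{R}^2$ into two half-planes. Let $h_i^+$ be the set of inputs for which $h_i = 1$ and $h_i^-$ the set for which $h_i = 0$; each line is the decision boundary of one $h_i$. The representation takes a unique value in each intersection of half-planes; for example, $h = [1, 1, 1]^\top$ corresponds to the region $h_1^+ \cap h_2^+ \cap h_3^+$. The regions are labeled $h = [1, 0, 0]^\top$, $[1, 1, 0]^\top$, $[1, 0, 1]^\top$, $[1, 1, 1]^\top$, $[0, 1, 0]^\top$, $[0, 1, 1]^\top$, and $[0, 0, 1]^\top$. Compare with the non-distributed representation in figure 15.8. With $d$ input dimensions, a distributed representation divides $\mathbb{R}^d$ by intersecting half-spaces; with $n$ features it assigns unique codes to $O(n^d)$ regions, whereas the nearest neighbor algorithm with $n$ examples assigns unique codes to only $n$ regions. Not all values of $h$ are feasible (there is no $h = 0$ in this example), and a linear classifier on top of the representation cannot assign different classes to every neighboring region; even a deep linear-threshold network has a VC dimension of only $O(w \log w)$, where $w$ is the number of weights (Sontag, 1998). This capacity constraint encourages each classifier to focus on a few $h_i$.]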
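The situation in figure 15.7 can be reproduced with three hypothetical linear-threshold features in $\mathbb{R}^2$ (the weights below are chosen for illustration, not taken from the figure). Probing a grid of points finds 7 distinct codes, and the all-zero code never occurs, so not every value of $h$ is feasible. The count 7 = 1 + 3 + 3 agrees with the hyperplane-arrangement count of equation 15.4 for $n = 3$ and $d = 2$.

```python
from fractions import Fraction

# Hypothetical features h_i = 1 if w_i . x + b_i > 0, stored as (weights, bias).
FEATURES = [((1, 0), 0),    # h1: x1 > 0
            ((0, 1), 0),    # h2: x2 > 0
            ((-1, -1), 1)]  # h3: x1 + x2 < 1


def code(x1: Fraction, x2: Fraction) -> tuple:
    return tuple(int(w[0] * x1 + w[1] * x2 + b > 0) for w, b in FEATURES)


# Exact rational grid over [-4, 4]^2 with step 1/4, fine enough to hit every region.
grid = [Fraction(i, 4) for i in range(-16, 17)]
codes = {code(a, b) for a in grid for b in grid}
print(len(codes), (0, 0, 0) in codes)  # 7 False
```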

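[Figure 15.8: How the nearest neighbor algorithm breaks the input space into regions, an example of a learning algorithm based on a non-distributed representation. Such algorithms typically assign a separate set of parameters to each region. Given enough parameters they can fit the training set without a difficult optimization, since each region's output can be chosen independently, but they generalize only locally via the smoothness prior, which makes it hard to learn a function with more peaks and troughs than there are examples. Contrast with the distributed representation in figure 15.7.]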

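An important related concept is that generalization in a distributed representation arises from attributes shared between concepts. As pure symbols, "cat" and "dog" are as far from each other as any other two symbols. With a meaningful distributed representation, however, many things that can be said about cats generalize to dogs and vice versa; for example, entries such as "has_fur" or "number_of_legs" may take the same value in the embeddings of both. Neural language models that operate on distributed representations of words generalize much better than models that operate directly on one-hot representations of words, as discussed in section 12.4. Distributed representations induce a rich similarity space in which semantically close concepts (or inputs) are close in distance, a property absent from purely symbolic representations.

When can a distributed representation give a statistical advantage? Some traditional non-distributed learning algorithms generalize only through the smoothness assumption: if $u \approx v$, then the target function $f$ has the property $f(u) \approx f(v)$. If we have an example $(x, y)$ for which $f(x) \approx y$, we choose an estimator $\hat{f}$ that approximately satisfies these constraints while changing as little as possible when we move to a nearby input $x + \epsilon$. This assumption is useful but suffers from the curse of dimensionality: to learn a target function that increases and decreases many times in many regions, we may need at least as many examples as there are distinguishable regions. Distributed representations help when an apparently complicated structure can be represented compactly with few parameters.

Consider a special case of distributed representation learning that extracts binary features by thresholding linear functions of the input. Each binary feature divides $\mathbb{R}^d$ into a pair of half-spaces, as in figure 15.7, and the exponentially large number of intersections of $n$ such half-spaces determines how many regions the learner can distinguish.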

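How many regions are generated by an arrangement of $n$ hyperplanes in $\mathbb{R}^d$? Applying a general result on the intersection of hyperplanes (Zaslavsky, 1975), the number of regions that this binary feature representation can distinguish is

$$\sum_{j=0}^{d} \binom{n}{j} = O(n^d). \tag{15.4}$$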

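This derivation follows Pascanu et al. (2014b). The growth is exponential in the input dimension $d$ and polynomial in the number of hidden units $n$. It gives a geometric argument for the generalization power of distributed representations: with $O(nd)$ parameters (for $n$ linear-threshold features in $\mathbb{R}^d$) we can distinctly represent $O(n^d)$ regions of input space. If instead we made no assumption about the data and used a representation with one unique symbol for each region, with separate parameters to recognize each symbol's portion of $\mathbb{R}^d$, then specifying $O(n^d)$ regions would require $O(n^d)$ examples.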
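Equation 15.4 is easy to evaluate directly. The sketch below assumes the $n$ hyperplanes are in general position (the case in which the count is attained) and compares the number of regions for $n$ threshold features in $\mathbb{R}^d$ with the $n$ regions that a nearest neighbor method with $n$ prototypes would define.

```python
from math import comb


def hyperplane_regions(n: int, d: int) -> int:
    """sum_{j=0}^{d} C(n, j): regions of R^d cut by n hyperplanes in general position."""
    return sum(comb(n, j) for j in range(d + 1))


print("n", "d=2", "d=5", "nearest-neighbor")
for n in (3, 10, 100):
    print(n, hyperplane_regions(n, 2), hyperplane_regions(n, 5), n)
```

`math.comb(n, j)` returns 0 when j > n, so for n ≤ d the sum equals 2^n: every binary code is realizable, which is why `hyperplane_regions(3, 5)` is 8.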

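Note: for comparison, a purely local estimator in $d$ dimensions that distinguishes just 2 values along each input dimension already faces $O(2^d)$ regions, so learning $f$ this way may require on the order of $2^d$ examples.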

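More generally, the argument extends to nonlinear, possibly continuous, feature extractors for each attribute in the distributed representation. If a parametric transformation with $k$ parameters can learn about $r$ regions in input space, with $k \ll r$, and obtaining such a representation is useful for the task of interest, then we could potentially generalize much better than in a non-distributed setting, where we would need $O(r)$ examples to obtain the same features and the associated partition of the input space into $r$ regions. Fewer parameters means fewer parameters to fit, and thus far fewer training examples needed to generalize well.

Models based on distributed representations also generalize well because their capacity remains limited despite their ability to encode so many different regions. For example, the VC dimension of a neural network of linear threshold units is only $O(w \log w)$, where $w$ is the number of weights (Sontag, 1998). While very many unique codes can be assigned in representation space, we cannot use all of the code space, nor can a linear classifier learn arbitrary functions from the representation $h$ to the output $y$. A distributed representation combined with a linear classifier thus expresses a prior belief that the classes to be recognized are linearly separable as a function of the underlying causal factors captured by $h$. We typically want to learn categories such as all images of green objects or all images of cars, not categories that require nonlinear XOR logic, such as red cars together with green trucks as one class and green cars together with red trucks as another.

These ideas can be tested experimentally. Zhou et al. (2015) found that hidden units in deep convolutional networks trained on the ImageNet and Places benchmark datasets learn features that are very often interpretable, corresponding to labels that humans would naturally assign. Hidden units do not always learn something with a simple linguistic name, but it is interesting to see this emerge near the top levels of the best computer vision networks. Such features have in common that one could imagine learning about each of them without having to see all configurations of all the others. Radford et al. (2015) showed that a generative model can learn a representation of face images in which separate directions capture different underlying factors of variation; figure 15.9 shows one direction corresponding to whether the person is male or female and another to whether the person is wearing glasses. These features were discovered automatically, not fixed a priori, and no labels were needed for the hidden units: gradient descent on an objective of interest learns semantically interesting features as long as the task requires them.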

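[Figure 15.9: A generative model has learned a distributed representation that disentangles gender from wearing glasses. Starting from the representation of a man with glasses, subtracting the vector for a man without glasses, and adding the vector for a woman without glasses gives the vector for a woman with glasses (panels joined by −, +, and =); each vector decodes to a recognizable image of the correct class. Images from Radford et al. (2015).]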

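We can learn about the distinction between male and female, or about the presence or absence of glasses, without characterizing all configurations of the $n - 1$ other features with examples covering every combination of values. This form of statistical separability is what allows generalization to new configurations of a person's features that were never seen during training.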
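The arithmetic in figure 15.9 can be mimicked in a toy latent space. The 2-D codes below are invented for illustration (coordinate 0 loosely plays "female" and coordinate 1 "glasses"), whereas the model of Radford et al. (2015) uses a learned, much higher-dimensional code and decodes vectors to images; here the result is simply matched to the nearest known code.

```python
import math
from typing import Dict, List

# Hypothetical latent codes with a little noise; not taken from any trained model.
latent: Dict[str, List[float]] = {
    "man without glasses": [0.1, -0.05],
    "man with glasses": [0.05, 0.95],
    "woman without glasses": [0.9, 0.1],
    "woman with glasses": [1.05, 1.0],
}


def nearest(vec: List[float]) -> str:
    return min(latent, key=lambda name: math.dist(vec, latent[name]))


# man with glasses - man without glasses + woman without glasses
query = [a - b + c for a, b, c in zip(latent["man with glasses"],
                                      latent["man without glasses"],
                                      latent["woman without glasses"])]
print(nearest(query))  # woman with glasses
```

The arithmetic works because the "glasses" and "gender" factors occupy separate directions, so subtracting one concept and adding another changes only the relevant coordinates.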

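15.5 Exponential Gains from Depth

Section 6.4.1 showed that multilayer perceptrons are universal approximators and that some functions can be represented by exponentially smaller deep networks than shallow ones. This reduction in model size improves statistical efficiency. This section describes how similar results apply more generally to other kinds of models with distributed hidden representations.

Section 15.4 gave an example of a generative model that learned the explanatory factors underlying images of faces, including gender and whether the person wears glasses. That model was based on a deep neural network; a shallow network, such as a linear one, could not reasonably be expected to learn the complicated relationship between these abstract factors and the image pixels. In this and other AI tasks, factors that can be chosen almost independently of each other yet still correspond to meaningful inputs are likely to be very high-level and related to the input in highly nonlinear ways. This argues for deep distributed representations, in which higher-level features (seen as functions of the input) or factors (seen as generative causes) are obtained by composing many nonlinearities.

It has been proven in many settings that organizing computation as a composition of many nonlinearities and a hierarchy of reused features can give an exponential boost to statistical efficiency, on top of the exponential boost from using a distributed representation. Many kinds of networks with a single hidden layer (for example, with saturating nonlinearities, Boolean gates, sum/products, or RBF units) are universal approximators: given enough hidden units, they can approximate a large class of functions, including all continuous functions, to any nonzero tolerance. The required number of hidden units, however, may be very large. Theoretical results on the expressive power of deep architectures show that some families of functions can be represented efficiently by an architecture of depth $k$ but would require a number of hidden units exponential in the input size with insufficient depth (depth 2 or depth $k - 1$).

Section 6.4.1 showed that deterministic feedforward networks are universal approximators of functions. Many structured probabilistic models with a single hidden layer of latent variables, including restricted Boltzmann machines and deep belief networks, are universal approximators of probability distributions (Le Roux and Bengio, 2008, 2010; Montúfar and Ay, 2011; Montúfar, 2014; Krause et al., 2013). Section 6.4.1 also showed that a sufficiently deep feedforward network can have an exponential advantage over a network that is too shallow, and such results can be obtained for other models, such as probabilistic models. One example is the sum-product network (SPN) (Poon and Domingos, 2011), which uses polynomial circuits to compute the probability distribution over a set of random variables. Delalleau and Bengio (2011) showed that there exist probability distributions for which a minimum SPN depth is required to avoid needing an exponentially large model. Later, Martens and Medabalimi (2014) showed that there are significant differences between every two finite depths of SPN, and that some of the constraints used to make SPNs tractable may limit their representational power.

Another line of work gives theoretical results on the expressive power of families of deep circuits related to convolutional networks, showing an exponential advantage for the deep circuit even when the shallow circuit is only required to approximate the function computed by the deep circuit (Cohen et al., 2015). Earlier theoretical work addressed only the case where the shallow circuit must exactly replicate particular functions.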
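A standard toy construction illustrates the kind of exponential gap described above. It is not one of the cited results, only an illustration of the general claim. A "tent" function built from two ReLU units maps [0, 1] onto itself; composing it k times uses 2k hidden units yet produces $2^k$ linear pieces. A one-hidden-layer ReLU network on a scalar input with m units is piecewise linear with at most m + 1 pieces, so matching depth k that way needs at least $2^k - 1$ units. Exact rational arithmetic (`Fraction`) keeps the piece count free of rounding error.

```python
from fractions import Fraction


def relu(z: Fraction) -> Fraction:
    return max(z, Fraction(0))


def tent(x: Fraction) -> Fraction:
    # One layer, two ReLU units: on [0, 1] this is 2x for x <= 1/2 and 2 - 2x for x >= 1/2.
    return 2 * relu(x) - 4 * relu(x - Fraction(1, 2))


def deep_sawtooth(x: Fraction, depth: int) -> Fraction:
    for _ in range(depth):  # depth layers, 2 * depth ReLU units in total
        x = tent(x)
    return x


def count_linear_pieces(depth: int) -> int:
    """Count maximal linear pieces of deep_sawtooth on [0, 1].

    All breakpoints lie on multiples of 2**-depth, so a grid with step
    2**-(depth + 1) sees each piece as two intervals of equal slope."""
    steps = 2 ** (depth + 1)
    ys = [deep_sawtooth(Fraction(i, steps), depth) for i in range(steps + 1)]
    slopes = [(ys[i + 1] - ys[i]) * steps for i in range(steps)]
    return 1 + sum(1 for a, b in zip(slopes, slopes[1:]) if a != b)


for depth in range(1, 6):
    print(depth, 2 * depth, count_linear_pieces(depth))  # layers, ReLU units, pieces
```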

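15.6 Providing Clues to Discover Underlying Causes

To close this chapter, we return to one of the original questions: what makes one representation better than another? One answer, introduced in section 15.3, is that an ideal representation disentangles the underlying causal factors of variation that generated the data, especially those relevant to our applications. Most representation learning strategies introduce clues that help the learner find these underlying factors of variation and separate them from one another. Supervised learning provides a very strong clue: a label $y$, presented with each $x$, that usually specifies the value of at least one factor of variation directly. More generally, to make use of abundant unlabeled data, representation learning relies on less direct hints about the underlying factors, in the form of implicit prior beliefs that we, the designers of the learning algorithm, impose to guide the learner. Results such as the no free lunch theorem show that regularization strategies are necessary for good generalization. Although no universally superior regularization strategy exists, one goal of deep learning is to find a set of fairly generic strategies that apply to a wide variety of AI tasks, similar to the tasks people and animals can solve.

The following list of generic regularization strategies is not exhaustive, but it gives concrete examples of ways to encourage learning algorithms to discover features corresponding to underlying factors. It was introduced in section 3.1 of Bengio et al. (2013d) and is partially expanded here.

• Smoothness: the assumption that $f(x + \epsilon d) \approx f(x)$ for a unit vector $d$ and small $\epsilon$. It lets the learner generalize from training examples to nearby points in input space, but on its own it is insufficient to overcome the curse of dimensionality.
• Linearity: many learning algorithms assume that relationships between some variables are linear. This allows predictions far from the observed data but can lead to overly extreme predictions. Linearity and smoothness are different assumptions, since linear functions with large weights in high-dimensional spaces may not be very smooth; see Goodfellow et al. (2014b) for a discussion of the limitations of the linearity assumption.
• Multiple explanatory factors: the data are assumed to be generated by multiple underlying explanatory factors, and most tasks are assumed to be easy given the state of each factor. Section 15.3 describes how this view motivates semi-supervised learning via representation learning: learning the structure of $p(x)$ requires learning some of the same features that are useful for modeling $p(y \mid x)$, because both refer to the same underlying factors. Section 15.4 describes how it motivates distributed representations, with separate directions in representation space for separate factors of variation.
• Causal factors: the model treats the factors of variation described by the learned representation $h$ as the causes of the observed data $x$, and not vice versa. As discussed in section 15.3, this helps semi-supervised learning and makes the model more robust when the distribution over the underlying causes changes or when the model is used for a new task.
• Depth, or a hierarchical organization of explanatory factors: high-level, abstract concepts can be defined in terms of simple concepts, forming a hierarchy. Equivalently, a deep architecture expresses the belief that the task should be accomplished by a multi-step program, each step referring back to the output of the previous steps.
• Shared factors across tasks: when many tasks correspond to different variables $y_i$ sharing the same input $x$, or each task is associated with a subset or function $f^{(i)}(x)$ of a global input $x$, each $y_i$ is assumed to depend on a different subset of a common pool of relevant factors $h$. Because these subsets overlap, learning all the $P(y_i \mid x)$ via a shared intermediate representation $P(h \mid x)$ shares statistical strength between the tasks.
• Manifolds: probability mass concentrates in regions that are locally connected and occupy a tiny volume. In the continuous case, these regions can be approximated by low-dimensional manifolds. Many machine learning algorithms behave sensibly only on this manifold (Goodfellow et al., 2014b), and some, especially autoencoders, attempt to learn its structure explicitly.
• Natural clustering: each connected manifold in the input space is assumed to belong to a single class. The data may lie on many disconnected manifolds, but the class is constant within each one. This assumption motivates tangent propagation, double backprop, the manifold tangent classifier, and adversarial training.
• Temporal and spatial coherence: slow feature analysis and related algorithms assume that the most important explanatory factors change slowly over time, or at least that the underlying explanatory factors are easier to predict than raw observations such as pixel values. See section 13.3.
• Sparsity: most features are presumably irrelevant to describing most inputs; there is no need for a feature that detects elephant trunks when representing an image of a cat. It is therefore reasonable to impose a prior that any feature interpretable as "present" or "absent" should be absent most of the time.
• Simplicity of factor dependencies: in good high-level representations, the factors are related through simple dependencies. The simplest is marginal independence,

$$P(h) = \prod_i P(h_i),$$

but linear dependencies, or those captured by a shallow autoencoder, are also reasonable assumptions. This can be seen in many laws of physics, and it is assumed when a linear predictor or a factorized prior is placed on top of a learned representation.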
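The factorized prior $P(h) = \prod_i P(h_i)$ can be sketched for three binary factors with hypothetical marginals $P(h_i = 1)$; small marginals also encode a sparsity prior, since most factors are usually absent. The sketch builds the joint table, checks that it normalizes, and shows that summing it back out recovers each marginal.

```python
from itertools import product
from math import prod

p_on = [0.1, 0.2, 0.05]  # hypothetical P(h_i = 1) for i = 1, 2, 3


def factorized_prior(h) -> float:
    """P(h) = prod_i P(h_i) for a binary tuple h."""
    return prod(p if hi == 1 else 1.0 - p for hi, p in zip(h, p_on))


joint = {h: factorized_prior(h) for h in product((0, 1), repeat=len(p_on))}


def marginal(i: int) -> float:
    """Recover P(h_i = 1) by summing the joint over all other factors."""
    return sum(prob for h, prob in joint.items() if h[i] == 1)


print(round(sum(joint.values()), 12))               # 1.0
print(max(joint, key=joint.get))                    # (0, 0, 0): the sparse code is most likely
print([round(marginal(i), 12) for i in range(3)])   # [0.1, 0.2, 0.05]
```

With independent factors, the 2^n-entry joint table is described by only n numbers, which is what makes such a prior cheap to place on top of a learned representation.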
