Deep Learning: Chapter 14

September 18, 2017 | Author: matsuolab | Category: N/A

Short Description

Japanese translation of Deep Learning...

Description

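Chapter 14

Autoencoders

An autoencoder is a neural network trained to attempt to copy its input to its output. Internally, it has a hidden layer $h$ that describes a code used to represent the input. The network consists of two parts: an encoder function $h = f(x)$ and a decoder that produces a reconstruction $r = g(h)$ (Figure 14.1). An autoencoder that simply learns to set $g(f(x)) = x$ everywhere is not especially useful, so autoencoders are designed to be unable to copy perfectly.

Modern autoencoders generalize the encoder and decoder beyond deterministic functions to stochastic mappings $p_{\text{encoder}}(h \mid x)$ and $p_{\text{decoder}}(x \mid h)$.

Autoencoders have been part of the neural network landscape for decades (LeCun, 1987; Bourlard and Kamp, 1988; Hinton and Zemel, 1994). Their connection to latent variable models has brought them to the forefront of generative modeling, as discussed in Chapter 20. Unlike general feedforward networks, autoencoders may also be trained using recirculation (Hinton and McClelland, 1988), a learning algorithm based on comparing the activations of the network on the original input to the activations on the reconstructed input.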

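Figure 14.1: The general structure of an autoencoder, mapping an input $x$ to an output (called the reconstruction) $r$ through an internal representation or code $h$. The autoencoder has two components: the encoder $f$ (mapping $x$ to $h$) and the decoder $g$ (mapping $h$ to $r$).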

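14.1 Undercomplete Autoencoders

One way to obtain useful features from an autoencoder is to constrain $h$ to have a smaller dimension than $x$. An autoencoder whose code dimension is less than the input dimension is called undercomplete. The learning process is described simply as minimizing a loss function

$$L(x, g(f(x))) \tag{14.1}$$

where $L$ is a loss function penalizing $g(f(x))$ for being dissimilar from $x$, such as the mean squared error. When the decoder is linear and $L$ is the mean squared error, an undercomplete autoencoder learns to span the same subspace as PCA. With nonlinear encoder functions $f$ and nonlinear decoder functions $g$, autoencoders can learn a more powerful nonlinear generalization of PCA.

If the encoder and decoder are allowed too much capacity, however, the autoencoder can learn to perform the copying task without extracting useful information about the distribution of the data. In theory, an autoencoder with a one-dimensional code and a very powerful nonlinear encoder could learn to represent each training example $x^{(i)}$ with the code $i$.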

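14.2 Regularized Autoencoders

A similar problem occurs if the hidden code is allowed to have dimension equal to the input, and in the overcomplete case in which the hidden code has dimension greater than the input. Rather than limiting capacity by keeping the encoder and decoder shallow and the code size small, regularized autoencoders use a loss function that encourages the model to have other properties besides the ability to copy its input to its output.

In addition to the methods described here, nearly any generative model with latent variables and an inference procedure may be viewed as a particular form of autoencoder. Two generative modeling approaches that emphasize this connection are the descendants of the Helmholtz machine (Hinton et al., 1995b), such as the variational autoencoder (Section 20.10.3) and the generative stochastic networks (Section 20.12).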

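14.2.1 Sparse Autoencoders

A sparse autoencoder is an autoencoder whose training criterion involves a sparsity penalty $\Omega(h)$ on the code layer $h$, in addition to the reconstruction error:

$$L(x, g(f(x))) + \Omega(h) \tag{14.2}$$

where $g(h)$ is the decoder output and typically $h = f(x)$, the encoder output.

We can think of the penalty $\Omega(h)$ simply as a regularizer term added to a feedforward network whose primary task is to copy the input to the output (an unsupervised learning objective). Unlike other regularizers such as weight decay, this regularizer has no straightforward Bayesian interpretation. As described in Section 5.6.1, training with weight decay and other regularization penalties can be interpreted as a MAP approximation to Bayesian inference. Maximizing $p(\theta \mid x)$ is equivalent to maximizing $\log p(x \mid \theta) + \log p(\theta)$. The $\log p(x \mid \theta)$ term is the usual data log-likelihood term, and the $\log p(\theta)$ term, the log-prior over parameters, incorporates the preference over particular values of $\theta$ (Section 5.6). Regularized autoencoders defy such an interpretation because the regularizer depends on the data.

Rather than thinking of the sparsity penalty as a regularizer for the copying task, we can think of the entire sparse autoencoder framework as approximating maximum likelihood training of a generative model that has latent variables. Suppose we have a model with visible variables $x$ and latent variables $h$, with an explicit joint distribution $p_{\text{model}}(x, h) = p_{\text{model}}(h)\, p_{\text{model}}(x \mid h)$. We refer to $p_{\text{model}}(h)$ as the model's prior distribution over the latent variables. This is different from the way the word "prior" was used above for the distribution $p(\theta)$.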

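The log-likelihood can be decomposed as

$$\log p_{\text{model}}(x) = \log \sum_{h} p_{\text{model}}(h, x) \tag{14.3}$$

We can think of the autoencoder as approximating this sum with a point estimate for just one highly likely value of $h$. This is similar to the sparse coding generative model (Section 13.4), but with $h$ being the output of the parametric encoder rather than the result of an optimization that infers the most likely $h$. With this chosen $h$, we are maximizing

$$\log p_{\text{model}}(h, x) = \log p_{\text{model}}(h) + \log p_{\text{model}}(x \mid h) \tag{14.4}$$

The $\log p_{\text{model}}(h)$ term can be sparsity inducing. For example, the Laplace prior

$$p_{\text{model}}(h_i) = \frac{\lambda}{2} e^{-\lambda |h_i|} \tag{14.5}$$

corresponds to an absolute value sparsity penalty. Expressing the log-prior as an absolute value penalty, we obtain

$$\Omega(h) = \lambda \sum_i |h_i| \tag{14.6}$$

$$-\log p_{\text{model}}(h) = \sum_i \left( \lambda |h_i| - \log \frac{\lambda}{2} \right) = \Omega(h) + \text{const} \tag{14.7}$$

where the constant term depends only on $\lambda$ and not on $h$. We typically treat $\lambda$ as a hyperparameter and discard the constant term, since it does not affect parameter learning. Other priors, such as the Student t prior, can also induce sparsity. From this point of view, the sparsity penalty is not a regularization term at all; it is a consequence of the model's distribution $p_{\text{model}}(h)$ over its latent variables.

Early work on sparse autoencoders (Ranzato et al., 2007a, 2008) explored various forms of sparsity and proposed a connection between the sparsity penalty and the $\log Z$ term that arises when applying maximum likelihood to an undirected probabilistic model $p(x) = \frac{1}{Z} \tilde{p}(x)$. The idea is that minimizing $\log Z$ prevents a probabilistic model from having high probability everywhere, and imposing sparsity on an autoencoder prevents the autoencoder from having low reconstruction error everywhere. In this case, the connection is at the level of an intuitive understanding of a general mechanism rather than a mathematical correspondence. The interpretation of the sparsity penalty as corresponding to $\log p_{\text{model}}(h)$ in a directed model $p_{\text{model}}(h)\, p_{\text{model}}(x \mid h)$ is more mathematically straightforward.

One way to achieve actual zeros in $h$ for sparse (and denoising) autoencoders was introduced by Glorot et al. (2011b). The idea is to use rectified linear units (ReLU) to produce the code layer.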
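The relationship in equations 14.5 to 14.7 can be checked numerically. The following is a minimal sketch, not part of the original text: the function names, the value of `lam`, and the example pre-activations are all illustrative assumptions. It shows that the negative Laplace log-prior and the absolute value penalty differ by a constant that depends only on $\lambda$ and the code size, and that a ReLU code layer produces exact zeros.

```python
import math

def sparsity_penalty(h, lam):
    """Omega(h) = lam * sum_i |h_i| (equation 14.6)."""
    return lam * sum(abs(hi) for hi in h)

def laplace_neg_log_prior(h, lam):
    """-log p_model(h) for a factorial Laplace prior (equation 14.5)."""
    return -sum(math.log(lam / 2.0) - lam * abs(hi) for hi in h)

def relu(z):
    """Rectified linear code layer: negative pre-activations become exact zeros."""
    return [max(0.0, zi) for zi in z]

lam = 0.5  # hypothetical sparsity weight
h_a = relu([1.5, -0.3, 0.0, 2.0])
h_b = relu([-1.0, 0.2, 0.7, -4.0])

# The gap between the two quantities should not depend on h (equation 14.7).
const = -len(h_a) * math.log(lam / 2.0)
gap_a = laplace_neg_log_prior(h_a, lam) - sparsity_penalty(h_a, lam)
gap_b = laplace_neg_log_prior(h_b, lam) - sparsity_penalty(h_b, lam)
```

Because `gap_a` and `gap_b` agree with `const`, dropping the constant leaves the gradient with respect to the model parameters unchanged.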

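14.2.2 Denoising Autoencoders

Rather than adding a penalty $\Omega$ to the cost function, we can obtain an autoencoder that learns something useful by changing the reconstruction error term of the cost function.

Traditionally, autoencoders minimize some function

$$L(x, g(f(x))) \tag{14.8}$$

where $L$ is a loss function penalizing $g(f(x))$ for being dissimilar from $x$, such as the $L^2$ norm of their difference. This encourages $g \circ f$ to learn to be merely an identity function if they have the capacity to do so.

A denoising autoencoder (DAE) instead minimizes

$$L(x, g(f(\tilde{x}))) \tag{14.9}$$

where $\tilde{x}$ is a copy of $x$ that has been corrupted by some form of noise. Denoising autoencoders must therefore undo this corruption rather than simply copying their input.

Denoising training forces $f$ and $g$ to implicitly learn the structure of $p_{\text{data}}(x)$, as shown by Alain and Bengio (2013) and Bengio et al. (2013c). Denoising autoencoders thus provide yet another example of how useful properties can emerge as a by-product of minimizing reconstruction error. They are presented in more detail in Section 14.5.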

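14.2.3 Regularizing by Penalizing Derivatives

Another strategy for regularizing an autoencoder is to use a penalty $\Omega$, as in sparse autoencoders,

$$L(x, g(f(x))) + \Omega(h, x) \tag{14.10}$$

but with a different form of $\Omega$:

$$\Omega(h, x) = \lambda \sum_i \left\| \nabla_x h_i \right\|^2 \tag{14.11}$$

This forces the model to learn a function that does not change much when $x$ changes slightly. Because this penalty is applied only at training examples, it forces the autoencoder to learn features that capture information about the training distribution.

An autoencoder regularized in this way is called a contractive autoencoder (CAE). The CAE is described in more detail in Section 14.7.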

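14.3 Representational Power, Layer Size and Depth

Autoencoders are often trained with only a single-layer encoder and a single-layer decoder, but this is not a requirement. Because the encoder and the decoder are each feedforward networks, each can individually benefit from depth, as discussed in Section 6.4.1. A feedforward network with at least one hidden layer can approximate a broad class of functions given enough hidden units, but an autoencoder with a single hidden layer has a shallow mapping from input to code. A deep autoencoder, with at least one additional hidden layer inside the encoder itself, can approximate any mapping from input to code arbitrarily well, given enough hidden units.

As noted in Section 6.4.1, depth can greatly reduce both the computational cost of representing some functions and the amount of training data needed to learn them. Experimentally, deep autoencoders yield much better compression than corresponding shallow or linear autoencoders (Hinton and Salakhutdinov, 2006).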

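14.4 Stochastic Encoders and Decoders

Autoencoders are just feedforward networks, so the same loss functions and output unit types used for traditional feedforward networks also apply to them. As described in Section 6.2.2.4, a general strategy for designing the output units and the loss function of a feedforward network is to define an output distribution $p(y \mid x)$ and minimize the negative log-likelihood $-\log p(y \mid x)$. In that setting, $y$ is a vector of targets, such as class labels.

In an autoencoder, $x$ is now the target as well as the input. Given a hidden code $h$, we may think of the decoder as providing a conditional distribution $p_{\text{decoder}}(x \mid h)$. We may then train the autoencoder by minimizing $-\log p_{\text{decoder}}(x \mid h)$. The exact form of this loss function changes depending on the form of $p_{\text{decoder}}$.

Figure 14.2: The structure of a stochastic autoencoder, in which both the encoder and the decoder are not simple functions but instead involve some noise injection, so that their output can be seen as sampled from a distribution: $p_{\text{encoder}}(h \mid x)$ for the encoder and $p_{\text{decoder}}(x \mid h)$ for the decoder.

To make a more radical departure from the feedforward networks seen previously, we can also generalize the notion of an encoding function $f(x)$ to an encoding distribution $p_{\text{encoder}}(h \mid x)$, as illustrated in Figure 14.2. Any latent variable model $p_{\text{model}}(h, x)$ defines a stochastic encoder

$$p_{\text{encoder}}(h \mid x) = p_{\text{model}}(h \mid x) \tag{14.12}$$

and a stochastic decoder

$$p_{\text{decoder}}(x \mid h) = p_{\text{model}}(x \mid h) \tag{14.13}$$

In general, the encoder and decoder distributions are not necessarily conditional distributions compatible with a unique joint distribution $p_{\text{model}}(x, h)$. Alain et al. (2015) showed that training the encoder and decoder as a denoising autoencoder will tend to make them compatible asymptotically (with enough capacity and examples).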

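14.5 Denoising Autoencoders

The denoising autoencoder (DAE) is an autoencoder that receives a corrupted data point as input and is trained to predict the original, uncorrupted data point as its output.

Figure 14.3: The computational graph of the cost function for a denoising autoencoder, which is trained to reconstruct the clean data point $x$ from its corrupted version $\tilde{x}$. This is accomplished by minimizing the loss $L = -\log p_{\text{decoder}}(x \mid h = f(\tilde{x}))$, where $\tilde{x}$ is a corrupted version of the data example $x$, obtained through a given corruption process $C(\tilde{x} \mid x)$. Typically the distribution $p_{\text{decoder}}$ is a factorial distribution whose mean parameters are emitted by a feedforward network $g$.

The DAE training procedure is illustrated in Figure 14.3. We introduce a corruption process $C(\tilde{x} \mid x)$, which represents a conditional distribution over corrupted samples $\tilde{x}$ given a data sample $x$. The autoencoder then learns a reconstruction distribution $p_{\text{reconstruct}}(x \mid \tilde{x})$ estimated from training pairs $(x, \tilde{x})$ as follows:

1. Sample a training example $x$ from the training data.
2. Sample a corrupted version $\tilde{x}$ from $C(\tilde{x} \mid x = x)$.
3. Use $(x, \tilde{x})$ as a training example for estimating the autoencoder reconstruction distribution $p_{\text{reconstruct}}(x \mid \tilde{x}) = p_{\text{decoder}}(x \mid h)$, with $h$ the output of the encoder $f(\tilde{x})$ and $p_{\text{decoder}}$ typically defined by a decoder $g(h)$.

Typically we can simply perform gradient-based approximate minimization (such as minibatch gradient descent) on the negative log-likelihood $-\log p_{\text{decoder}}(x \mid h)$. As long as the encoder is deterministic, the denoising autoencoder is a feedforward network and may be trained with exactly the same techniques as any other feedforward network.

We can therefore view the DAE as performing stochastic gradient descent on the following expectation:

$$-\mathbb{E}_{x \sim \hat{p}_{\text{data}}(x)}\, \mathbb{E}_{\tilde{x} \sim C(\tilde{x} \mid x)} \log p_{\text{decoder}}(x \mid h = f(\tilde{x})) \tag{14.14}$$

where $\hat{p}_{\text{data}}(x)$ is the training distribution.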
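The three sampling steps and the expectation in equation 14.14 can be sketched as a Monte Carlo estimate. This is an illustrative sketch only: the Gaussian corruption, the squared error loss (which matches $-\log p_{\text{decoder}}$ for a Gaussian decoder up to scale and an additive constant), the toy data set, and the two hand-written reconstruction functions are assumptions, not taken from the original text. No network is trained here; the code only compares the denoising loss of two fixed candidate reconstructions.

```python
import random

def corrupt(x, sigma, rng):
    """Sample x_tilde ~ C(x_tilde | x), assumed here to be isotropic Gaussian noise."""
    return [xi + rng.gauss(0.0, sigma) for xi in x]

def squared_error(x, r):
    """Squared error between the clean point x and a reconstruction r."""
    return sum((xi - ri) ** 2 for xi, ri in zip(x, r))

def dae_loss_estimate(data, reconstruct, sigma, n_samples, seed=0):
    """Monte Carlo estimate of the denoising criterion for a fixed reconstruct()."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.choice(data)                  # step 1: sample a training example
        x_tilde = corrupt(x, sigma, rng)      # step 2: sample a corrupted version
        total += squared_error(x, reconstruct(x_tilde))  # step 3: score the pair
    return total / n_samples

# Hypothetical toy data lying on the line x2 = 0 in 2-D.
data = [[t, 0.0] for t in (-1.0, 0.0, 1.0)]

def identity(x_tilde):
    """Copies the corrupted input unchanged."""
    return list(x_tilde)

def project(x_tilde):
    """Removes the noise component orthogonal to the data line."""
    return [x_tilde[0], 0.0]

loss_identity = dae_loss_estimate(data, identity, sigma=0.5, n_samples=2000)
loss_project = dae_loss_estimate(data, project, sigma=0.5, n_samples=2000)
```

Under these assumptions the projection scores a lower denoising loss than the identity, which illustrates why the criterion discourages merely copying the input.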

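14.5.1 Estimating the Score

Score matching (Hyvärinen, 2005) is an alternative to maximum likelihood. It provides a consistent estimator of probability distributions based on encouraging the model to have the same score as the data distribution at every training point $x$. In this context, the score is a particular gradient field:

$$\nabla_x \log p(x) \tag{14.15}$$

Score matching is discussed further in Section 18.4. For the present discussion, it is sufficient to understand that learning the gradient field of $\log p_{\text{data}}$ is one way to learn the structure of $p_{\text{data}}$ itself.

A very important property of DAEs is that their training criterion (with conditionally Gaussian $p(x \mid h)$) makes the autoencoder learn a vector field $(g(f(x)) - x)$ that estimates the score of the data distribution. This is illustrated in Figure 14.4.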

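Figure 14.4: A denoising autoencoder is trained to map a corrupted data point $\tilde{x}$ back to the original data point $x$. Training examples $x$ lie near a low-dimensional manifold, and the corruption process $C(\tilde{x} \mid x)$ moves them away from it. When the denoising autoencoder is trained to minimize the average of the squared errors $\|g(f(\tilde{x})) - x\|^2$, the reconstruction $g(f(\tilde{x}))$ estimates $\mathbb{E}_{x, \tilde{x} \sim p_{\text{data}}(x) C(\tilde{x} \mid x)}[x \mid \tilde{x}]$. The vector $g(f(\tilde{x})) - \tilde{x}$ points approximately toward the nearest point on the manifold, since $g(f(\tilde{x}))$ estimates the center of mass of the clean points $x$ that could have given rise to $\tilde{x}$. The autoencoder thus learns a vector field $g(f(x)) - x$. This vector field estimates the score $\nabla_x \log p_{\text{data}}(x)$ up to a multiplicative factor.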

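Denoising training of a specific kind of autoencoder (sigmoidal hidden units, linear reconstruction units) using Gaussian noise and mean squared error as the reconstruction cost is equivalent to training a specific kind of undirected probabilistic model called an RBM with Gaussian visible units (Vincent, 2011). This kind of model is described in detail in Section 20.5.1. For the present discussion, it is sufficient to know that it is a model that provides an explicit $p_{\text{model}}(x; \theta)$. When the RBM is trained using denoising score matching (Kingma and LeCun, 2010), its learning algorithm is equivalent to denoising training in the corresponding autoencoder. With a fixed noise level, regularized score matching is not a consistent estimator; it instead recovers a blurred version of the distribution. If the noise level is chosen to approach 0 as the number of examples approaches infinity, however, consistency is recovered. Denoising score matching is discussed in more detail in Section 18.5.

Other connections between autoencoders and RBMs exist. Score matching applied to RBMs yields a cost function that is identical to reconstruction error combined with a regularization term similar to the contractive penalty of the CAE (Swersky et al., 2011). Bengio and Delalleau (2009) showed that an autoencoder gradient provides an approximation to contrastive divergence training of RBMs.

For continuous-valued $x$, the denoising criterion with Gaussian corruption and reconstruction distribution yields an estimator of the score that is applicable to general encoder and decoder parametrizations (Alain and Bengio, 2013). This means that a generic encoder-decoder architecture may be made to estimate the score by training with the squared error criterion

$$\|g(f(\tilde{x})) - x\|^2 \tag{14.16}$$

and corruption

$$C(\tilde{x} = \tilde{x} \mid x) = \mathcal{N}(\tilde{x};\ \mu = x,\ \Sigma = \sigma^2 I) \tag{14.17}$$

with noise variance $\sigma^2$. See Figure 14.5 for an illustration of how this works.

In general, there is no guarantee that the reconstruction $g(f(x))$ minus the input $x$ corresponds to the gradient of any function, let alone to the score. That is why the early results (Vincent, 2011) are specialized to particular parametrizations, where $g(f(x)) - x$ may be obtained by taking the derivative of another function. Kamyshanska and Memisevic (2015) generalized the results of Vincent (2011) by identifying a family of shallow autoencoders such that $g(f(x)) - x$ corresponds to a score for all members of the family.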
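A small worked case makes the link between the reconstruction residual and the score concrete. The sketch below is not from the original text and rests on a strong assumption: the data are one-dimensional and Gaussian, $x \sim \mathcal{N}(0, s^2)$, so the reconstruction that minimizes the squared error criterion (14.16) under the corruption (14.17) is the posterior mean $\mathbb{E}[x \mid \tilde{x}]$, available in closed form. The names `optimal_denoiser` and `noisy_score` and the numeric values are illustrative.

```python
def optimal_denoiser(x_tilde, s2, sigma2):
    """E[x | x_tilde] for x ~ N(0, s2) and x_tilde = x + noise, noise ~ N(0, sigma2)."""
    return x_tilde * s2 / (s2 + sigma2)

def noisy_score(x_tilde, s2, sigma2):
    """d/dx log N(x; 0, s2 + sigma2): the score of the corrupted-data density."""
    return -x_tilde / (s2 + sigma2)

s2, sigma2 = 4.0, 0.25           # assumed data variance and noise variance
points = [-3.0, -1.0, 0.0, 0.5, 2.0]

# Reconstruction minus input, rescaled by the noise variance.
residuals = [(optimal_denoiser(p, s2, sigma2) - p) / sigma2 for p in points]
scores = [noisy_score(p, s2, sigma2) for p in points]
```

In this Gaussian case the rescaled residual equals the score of the noise-smoothed density exactly, and as $\sigma^2$ shrinks that density approaches the data density.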

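So far we have described only how the denoising autoencoder learns to represent a probability distribution. More generally, one may want to use the autoencoder as a generative model and draw samples from this distribution. This is described in Section 20.11.

14.5.1.1 Historical Perspective

The idea of using MLPs for denoising dates back to the work of LeCun (1987) and Gallinari et al. (1987). Behnke (2001) also used recurrent networks to denoise images. Denoising autoencoders are, in some sense, just MLPs trained to denoise. The name "denoising autoencoder," however, refers to a model that is intended not merely to learn to denoise its input but to learn a good internal representation as a side effect of learning to denoise. This idea came much later (Vincent et al., 2008, 2010). The learned representation may then be used to pretrain a deeper unsupervised network or a supervised network.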

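Figure 14.5: Vector field learned by a denoising autoencoder around a 1-D curved manifold near which the data concentrate in a 2-D space. Each arrow is proportional to the reconstruction minus input vector of the autoencoder and points toward higher probability according to the implicitly estimated probability distribution. The vector field has zeros at both maxima of the estimated density function (on the data manifold) and at minima of that density function. Figure reproduced from Alain and Bengio (2013).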

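Like sparse autoencoders, sparse coding, contractive autoencoders, and other regularized autoencoders, the motivation for DAEs was to allow the learning of a very high-capacity encoder while preventing the encoder and decoder from learning a useless identity function.

Prior to the introduction of the modern DAE, Inayoshi and Kurita (2005) explored some of the same goals with some of the same methods. Their approach minimizes reconstruction error in addition to a supervised objective while injecting noise in the hidden layer of a supervised MLP, with the objective of improving generalization by introducing the reconstruction error and the injected noise. Their method was based on a linear encoder, however, and could not learn function families as powerful as the modern DAE can.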

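14.6 Learning Manifolds with Autoencoders

Like many other machine learning algorithms, autoencoders exploit the idea that data concentrates around a low-dimensional manifold or a small set of such manifolds, as described in Section 5.11.3. Autoencoders take this idea further and aim to learn the structure of the manifold.

An important characterization of a manifold is the set of its tangent planes. At a point $x$ on a $d$-dimensional manifold, the tangent plane is given by $d$ basis vectors that span the local directions of variation allowed on the manifold. As illustrated in Figure 14.6, these local directions specify how one can change $x$ infinitesimally while staying on the manifold.

All autoencoder training procedures involve a compromise between two forces:

1. Learning a representation $h$ of a training example $x$ such that $x$ can be approximately recovered from $h$ through a decoder. The fact that $x$ is drawn from the training data is crucial, because it means the autoencoder need not successfully reconstruct inputs that are not probable under the data-generating distribution.

2. Satisfying the constraint or regularization penalty. This can be an architectural constraint that limits the capacity of the autoencoder, or a regularization term added to the reconstruction cost. These techniques generally prefer solutions that are less sensitive to the input.

Clearly, neither force alone would be useful. The two forces together are useful because they force the hidden representation to capture information about the structure of the data-generating distribution. The important principle is that the autoencoder can afford to represent only the variations that are needed to reconstruct training examples. The variations tangent to the manifold around $x$ are the only ones that need to correspond to changes in $h = f(x)$.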

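Figure 14.6: An illustration of the concept of a tangent hyperplane. A 1-D manifold is created in 784-D space by taking an MNIST image with 784 pixels and translating it vertically. The amount of vertical translation defines a coordinate along a 1-D manifold that traces out a curved path through image space. For visualization, the manifold is projected into 2-D space using PCA. An $n$-dimensional manifold has an $n$-dimensional tangent plane at every point; this tangent plane touches the manifold exactly at that point and is oriented parallel to the surface there. The figure shows the tangent line at one point of the 1-D manifold.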

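A one-dimensional example is illustrated in Figure 14.7, showing that by making the reconstruction function insensitive to perturbations of the input around the data points, we cause the autoencoder to recover the manifold structure.

[Plot: $r(x)$ against $x$, with curves labeled "Identity" and "Optimal reconstruction" and data points $x_0$, $x_1$, $x_2$ marked on the horizontal axis.]

Figure 14.7: If the autoencoder learns a reconstruction function that is invariant to small perturbations near the data points, it captures the manifold structure of the data. Here the manifold structure is a collection of 0-dimensional manifolds. The identity line indicates the identity function target for reconstruction. The optimal reconstruction function crosses the identity function wherever there is a data point. The reconstruction direction vector $r(x) - x$ always points toward the nearest "manifold" (a single data point in the 1-D case). The denoising autoencoder explicitly tries to make the derivative of the reconstruction function $r(x)$ small around the data points. The contractive autoencoder does the same for the encoder. Although the derivative of $r(x)$ is asked to be small around the data points, it can be large between the data points.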

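To understand why autoencoders are useful for manifold learning, it is instructive to compare them to other approaches. What is most commonly learned to characterize a manifold is a representation of the data points on (or near) the manifold. Such a representation for a particular example is also called its embedding. Some algorithms learn an embedding directly for each training example, while others learn a more general mapping (an encoder, or representation function) from the input space to the embedding space.

Most of the initial machine learning research on learning nonlinear manifolds focused on non-parametric methods based on the nearest-neighbor graph.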

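Figure 14.8: Non-parametric manifold learning procedures build a nearest-neighbor graph in which nodes represent training examples and edges indicate nearest-neighbor relationships. Various procedures can thus obtain the tangent plane associated with a neighborhood of the graph, as well as a coordinate system that associates each training example with a real-valued vector position, or embedding. Such a representation can be generalized to new examples by a form of interpolation. Image from the QMUL Multiview Face Dataset (Gong et al., 2000).

This graph has one node per training example and edges connecting near neighbors to each other (Schölkopf et al., 1998; Roweis and Saul, 2000; Tenenbaum et al., 2000; Brand, 2003; Belkin and Niyogi, 2003; Donoho and Grimes, 2003; Weinberger and Saul, 2004; Hinton and Roweis, 2003; van der Maaten and Hinton, 2008). As illustrated in Figure 14.8, these methods associate each node with a tangent plane that spans the directions of variation associated with the difference vectors between the example and its neighbors. A global coordinate system can then be obtained through an optimization or by solving a linear system. Figure 14.9 illustrates how a manifold can be tiled by a large number of locally linear patches.

A fundamental difficulty with such local non-parametric approaches to manifold learning is raised by Bengio and Monperrus (2005): if the manifolds are not very smooth, one may need a very large number of training examples to cover each of the variations, with no chance to generalize to unseen variations.

Figure 14.9: If the tangent planes (see Figure 14.6) at each location are known, they can be tiled to form a global coordinate system or a density function. Each local patch can be thought of as a local Euclidean coordinate system or as a locally flat Gaussian. A mixture of these Gaussians provides an estimated density function, as in the manifold Parzen window algorithm (Vincent and Bengio, 2003) or in its non-local neural-net-based variant (Bengio et al., 2006c).

Unfortunately, the manifolds involved in AI problems can have very complicated structures that are difficult to capture from local interpolation alone. Consider, for example, the manifold resulting from translation shown in Figure 14.6. If we watch just one coordinate within the input vector, $x_i$, as the image is translated, we observe that this coordinate encounters a peak or a trough in its value once for every peak or trough of brightness in the image.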

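14.7 Contractive Autoencoders

The contractive autoencoder (Rifai et al., 2011a,b) introduces an explicit regularizer on the code $h = f(x)$, encouraging the derivatives of $f$ to be as small as possible:

$$\Omega(h) = \lambda \left\| \frac{\partial f(x)}{\partial x} \right\|_F^2 \tag{14.18}$$

The penalty $\Omega(h)$ is the squared Frobenius norm (sum of squared elements) of the Jacobian matrix of partial derivatives associated with the encoder function.

There is a connection between the denoising autoencoder and the contractive autoencoder: Alain and Bengio (2013) showed that in the limit of small Gaussian input noise, the denoising reconstruction error is equivalent to a contractive penalty on the reconstruction function that maps $x$ to $r = g(f(x))$. In other words, denoising autoencoders make the reconstruction function resist small but finite-sized perturbations of the input, while contractive autoencoders make the feature extraction function resist infinitesimal perturbations of the input. When using the Jacobian-based contractive penalty to pretrain features $f(x)$ for use with a classifier, the best classification accuracy usually results from applying the contractive penalty to $f(x)$ rather than to $g(f(x))$. A contractive penalty on $f(x)$ also has close connections to score matching, as discussed in Section 14.5.1.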
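For a single-layer sigmoid encoder, the penalty in equation 14.18 has a simple closed form, which the following sketch compares against a finite-difference estimate. The encoder form $h = \mathrm{sigmoid}(Wx + b)$, the weights, and all function names are assumptions chosen for illustration; the original text does not prescribe a particular encoder. With this encoder, the Jacobian entries are $J_{ij} = h_i (1 - h_i) W_{ij}$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(x, W, b):
    """Hypothetical single-layer encoder h = sigmoid(W x + b)."""
    return [sigmoid(sum(w * xj for w, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def contractive_penalty(x, W, b, lam):
    """lam * ||df/dx||_F^2 using J_ij = h_i (1 - h_i) W_ij."""
    h = encode(x, W, b)
    total = 0.0
    for hi, row in zip(h, W):
        total += (hi * (1.0 - hi)) ** 2 * sum(w ** 2 for w in row)
    return lam * total

def numerical_penalty(x, W, b, lam, eps=1e-6):
    """Same quantity from central finite differences, as a sanity check."""
    total = 0.0
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        h_plus, h_minus = encode(x_plus, W, b), encode(x_minus, W, b)
        total += sum(((hp - hm) / (2.0 * eps)) ** 2
                     for hp, hm in zip(h_plus, h_minus))
    return lam * total

# Illustrative parameters: 3 inputs, 2 hidden units.
W = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.75]]
b = [0.1, -0.2]
x = [0.3, -0.6, 0.9]
analytic = contractive_penalty(x, W, b, lam=0.1)
numeric = numerical_penalty(x, W, b, lam=0.1)
```

The closed form costs about as much as one forward pass, which is one reason the penalty is cheap for a single hidden layer.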

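The name contractive arises from the way the CAE warps space. Because the CAE is trained to resist perturbations of its input, it is encouraged to map a neighborhood of input points to a smaller neighborhood of output points. We can think of this as contracting the input neighborhood to a smaller output neighborhood.

To clarify, the CAE is contractive only locally: all perturbations of a training point $x$ are mapped near to $f(x)$. Globally, two different points $x$ and $x'$ may be mapped to points $f(x)$ and $f(x')$ that are farther apart than the original points. It is plausible that $f$ could be expanding in between or far from the data manifolds (see, for example, the 1-D toy example in Figure 14.7). When the $\Omega(h)$ penalty is applied to sigmoidal units, one easy way to shrink the Jacobian is to make the sigmoid units saturate to 0 or 1. This encourages the CAE to encode input points with extreme values of the sigmoid, which may be interpreted as a binary code.

We can think of the Jacobian matrix $J$ at a point $x$ as approximating the nonlinear encoder $f(x)$ by a linear operator. This allows us to use the word "contractive" more formally. In the theory of linear operators, a linear operator is said to be contractive if the norm of $Jx$ remains less than or equal to 1 for all unit-norm $x$. We can think of the CAE as penalizing the Frobenius norm of the local linear approximation of $f(x)$ at every training point $x$ in order to encourage each of these local linear operators to become a contraction.

As described in Section 14.6, regularized autoencoders learn manifolds by balancing two opposing forces. In the case of the CAE, these two forces are reconstruction error and the contractive penalty $\Omega(h)$. Reconstruction error alone would encourage the CAE to learn an identity function. The contractive penalty alone would encourage the CAE to learn features that are constant with respect to $x$. The compromise between these two forces yields an autoencoder whose derivatives $\frac{\partial f(x)}{\partial x}$ are mostly tiny. Only a small number of hidden units, corresponding to a small number of directions in the input, may have significant derivatives.

The goal of the CAE is to learn the manifold structure of the data. Directions $x$ with large $Jx$ rapidly change $h$, so these are likely to be directions that approximate the tangent planes of the manifold. Experiments by Rifai et al. (2011a) and Rifai et al. (2011b) show that training the CAE results in most singular values of $J$ dropping below 1 in magnitude and therefore becoming contractive. Some singular values remain above 1, however, because the reconstruction error penalty encourages the CAE to encode the directions with the most local variance. The directions corresponding to the largest singular values are interpreted as the tangent directions that the contractive autoencoder has learned. Ideally, these tangent directions should correspond to real variations in the data, as in Figure 14.6. Visualizations of the experimentally obtained singular vectors do seem to correspond to meaningful transformations of the input image, as shown in Figure 14.10.

One practical issue with the CAE regularization criterion is that although it is cheap to compute in the case of an autoencoder with a single hidden layer, it becomes much more expensive for deeper autoencoders. The strategy followed by Rifai et al. (2011a) is to separately train a series of single-layer autoencoders, each trained to reconstruct the hidden layer of the previous autoencoder. The composition of these autoencoders then forms a deep autoencoder. Because each layer was separately trained to be locally contractive, the deep autoencoder is contractive as well.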

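Figure 14.10: Illustration of tangent vectors of the manifold estimated by local PCA (no sharing across regions) and by a contractive autoencoder. The location on the manifold is defined by an input image drawn from the CIFAR-10 dataset. The tangent vectors are estimated by the leading singular vectors of the Jacobian matrix $\frac{\partial h}{\partial x}$ of the input-to-code mapping. Although both local PCA and the CAE can capture local tangents, the CAE is able to form more accurate estimates from limited training data because it exploits parameter sharing across different locations. Images reproduced from Rifai et al. (2011c).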

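Another practical issue is that the contraction penalty can obtain useless results if we do not impose some sort of scale on the decoder. For example, the encoder could consist of multiplying the input by a small constant $\epsilon$, and the decoder could consist of dividing the code by $\epsilon$. As $\epsilon$ approaches 0, the encoder drives the contractive penalty $\Omega(h)$ to approach 0 without having learned anything about the distribution. Meanwhile, the decoder maintains perfect reconstruction. In Rifai et al. (2011a), this is prevented by tying the weights of $f$ and $g$. Both $f$ and $g$ are standard neural network layers consisting of an affine transformation followed by an element-wise nonlinearity, so it is straightforward to set the weight matrix of $g$ to be the transpose of the weight matrix of $f$.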

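14.8 Predictive Sparse Decomposition

Predictive sparse decomposition (PSD) is a model that is a hybrid of sparse coding and parametric autoencoders (Kavukcuoglu et al., 2008). A parametric encoder is trained to predict the output of iterative inference. PSD has been applied to unsupervised feature learning for object recognition in images and video (Kavukcuoglu et al., 2009, 2010; Jarrett et al., 2009; Farabet et al., 2011), as well as for audio (Henaff et al., 2011). The model consists of an encoder $f(x)$ and a decoder $g(h)$ that are both parametric. During training, $h$ is controlled by the optimization algorithm. Training proceeds by minimizing

$$\|x - g(h)\|^2 + \lambda |h|_1 + \gamma \|h - f(x)\|^2 \tag{14.19}$$

As in sparse coding, the training algorithm alternates between minimization with respect to $h$ and minimization with respect to the model parameters. Minimization with respect to $h$ is fast because $f(x)$ provides a good initial value of $h$, and the cost function constrains $h$ to remain near $f(x)$ anyway. Simple gradient descent can obtain reasonable values of $h$ in as few as 10 steps.

The training procedure used by PSD is different from first training a sparse coding model and then training $f(x)$ to predict the values of the sparse coding features. The PSD training procedure regularizes the decoder to use parameters for which $f(x)$ can infer good code values.

Predictive sparse decomposition is an example of learned approximate inference (Section 19.5). The tools presented in Chapter 19 make it clear that PSD can be interpreted as training a directed sparse coding probabilistic model by maximizing a lower bound on the log-likelihood of the model.

In practical applications of PSD, the iterative optimization is used only during training. The parametric encoder $f$ is used to compute the learned features when the model is deployed. Evaluating $f$ is computationally inexpensive compared to inferring $h$ via gradient descent. Because $f$ is a differentiable parametric function, PSD models may be stacked and used to initialize a deep network to be trained with another criterion.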
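The three terms of the PSD objective in equation 14.19 can be written out directly. The sketch below is only an illustration of how the terms trade off: the encoder `f` and decoder `g` are hypothetical stand-ins (a halving map and the identity), and the values of `lam` and `gamma` are arbitrary. It evaluates the objective for a fixed code and does not perform the alternating minimization.

```python
def psd_objective(x, h, f, g, lam, gamma):
    """||x - g(h)||^2 + lam * |h|_1 + gamma * ||h - f(x)||^2 (equation 14.19)."""
    reconstruction = sum((xi - ri) ** 2 for xi, ri in zip(x, g(h)))
    sparsity = lam * sum(abs(hi) for hi in h)
    prediction = gamma * sum((hi - fi) ** 2 for hi, fi in zip(h, f(x)))
    return reconstruction + sparsity + prediction

def f(x):
    """Hypothetical encoder: halves every input coordinate."""
    return [0.5 * xi for xi in x]

def g(h):
    """Hypothetical decoder: returns the code unchanged."""
    return list(h)

x = [1.0, 0.0]

# A code that reconstructs x perfectly but disagrees with the encoder prediction.
cost_exact_code = psd_objective(x, [1.0, 0.0], f, g, lam=0.1, gamma=2.0)

# The encoder's own prediction f(x) used as the code.
cost_encoder_code = psd_objective(x, f(x), f, g, lam=0.1, gamma=2.0)
```

Starting the inner minimization over `h` from `f(x)` makes the third term zero at the first step, which matches the description of `f(x)` as a good initial value.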

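14.9 Applications of Autoencoders

Autoencoders have been successfully applied to dimensionality reduction and information retrieval tasks. Dimensionality reduction was one of the first applications of representation learning and deep learning, and it was one of the early motivations for studying autoencoders. For example, Hinton and Salakhutdinov (2006) trained a stack of RBMs and then used their weights to initialize a deep autoencoder with gradually smaller hidden layers, culminating in a bottleneck of 30 units. The resulting code yielded less reconstruction error than PCA into 30 dimensions, and the learned representation was qualitatively easier to interpret and relate to the underlying categories.

Lower-dimensional representations can improve performance on many tasks, such as classification. Models of smaller spaces consume less memory and runtime. Many forms of dimensionality reduction place semantically related examples near each other, as observed by Salakhutdinov and Hinton (2007b) and Torralba et al. (2008).

One task that benefits even more than usual from dimensionality reduction is information retrieval, the task of finding entries in a database that resemble a query entry. If we train the dimensionality reduction algorithm to produce a code that is low-dimensional and binary, then we can store all database entries in a hash table that maps binary code vectors to entries. This hash table allows us to perform information retrieval by returning all database entries that have the same binary code as the query. We can also search over slightly less similar entries very efficiently, just by flipping individual bits of the encoding of the query. This approach to information retrieval via dimensionality reduction and binarization is called semantic hashing (Salakhutdinov and Hinton, 2007b, 2009b) and has been applied to both textual input (Salakhutdinov and Hinton, 2007b, 2009b) and images (Torralba et al., 2008; Weiss et al., 2008; Krizhevsky and Hinton, 2011).

To produce binary codes for semantic hashing, one typically uses an encoding function with sigmoids on the final layer. The sigmoid units must be trained to be saturated to nearly 0 or nearly 1 for all input values. One trick that can accomplish this is to inject additive noise just before the sigmoid nonlinearity during training.

The idea of learning a hashing function has been further explored in several directions, including the idea of training the representations to optimize a loss more directly linked to the task of finding nearby examples in the hash table (Norouzi and Fleet, 2011).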
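The retrieval step of semantic hashing can be sketched with a plain dictionary as the hash table. This is a minimal sketch under stated assumptions: the sigmoid codes are made up by hand rather than produced by a trained encoder, codes are binarized by thresholding at 0.5, and the entry names are hypothetical. Nearby entries are found by flipping up to `radius` bits of the query code.

```python
from itertools import combinations

def binarize(code, threshold=0.5):
    """Turn a vector of sigmoid outputs into a binary key."""
    return tuple(1 if c > threshold else 0 for c in code)

def build_table(codes):
    """Map each binary key to the list of entries that share it."""
    table = {}
    for name, code in codes.items():
        table.setdefault(binarize(code), []).append(name)
    return table

def neighbors(key, radius):
    """Yield the key itself and every key within the given Hamming radius."""
    yield key
    for r in range(1, radius + 1):
        for positions in combinations(range(len(key)), r):
            flipped = list(key)
            for i in positions:
                flipped[i] = 1 - flipped[i]
            yield tuple(flipped)

def lookup(table, query_code, radius=0):
    """Return entries whose key is within `radius` bit flips of the query key."""
    found = []
    for key in neighbors(binarize(query_code), radius):
        found.extend(table.get(key, []))
    return sorted(found)

# Hand-written 4-bit codes standing in for encoder outputs.
codes = {
    "doc_a": [0.9, 0.1, 0.8, 0.2],
    "doc_b": [0.8, 0.2, 0.9, 0.4],
    "doc_c": [0.9, 0.1, 0.2, 0.3],
    "doc_d": [0.1, 0.9, 0.1, 0.9],
}
table = build_table(codes)
query = [0.7, 0.3, 0.6, 0.1]
exact_matches = lookup(table, query, radius=0)
near_matches = lookup(table, query, radius=1)
```

Each lookup touches only the keys generated by bit flips, so its cost depends on the code length and radius rather than on the number of stored entries.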
