Variational Attention for Sequence-to-Sequence Models
Source: COLING 2018
Speaker: Ya-Fang Hsiao
Advisor: Jia-Ling Koh
Date: 2020/01/03
PART 1: Introduction
                Auto-Encoder   Encoder-Decoder
Deterministic   DAE            DED
Variational     VAE            VED
PART 2: Variational Autoencoder
[Bowman et al. 2016] Generating Sentences from a Continuous Space
$$\mathcal{L}(\theta,\phi) = \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x|z)\big] - \mathrm{KL}\big(q_\phi(z|x)\,\|\,p(z)\big)$$

The first term is the data likelihood under the posterior (the reconstruction cross entropy); the second is the KL divergence of the posterior from the prior.
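A minimal PyTorch sketch of this objective (not the authors' code), assuming a token-level decoder and a diagonal-Gaussian posterior; the function names `reparameterize` and `vae_loss` are illustrative:

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), keeping gradients."""
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * logvar) * eps

def vae_loss(x_logits, x_target, mu, logvar):
    """Negative ELBO: reconstruction cross entropy plus
    KL(q(z|x) || N(0, I)) in closed form for a diagonal Gaussian."""
    recon = F.cross_entropy(
        x_logits.view(-1, x_logits.size(-1)),  # (batch*len, vocab)
        x_target.view(-1),                     # (batch*len,)
        reduction="sum",
    )
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```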
Variational Seq2Seq model
[Figure: variational seq2seq architecture with deterministic attention]

Bypassing phenomenon
With deterministic attention, the decoder has a direct path to the encoder hidden states, so it can ignore the latent variable z (its KL term collapses toward zero) and the variational latent space ends up carrying little information.
PART 3: VED+VAttn (Variational Encoder-Decoder with Variational Attention)
$$\mathcal{L}(\theta,\phi) = \mathbb{E}_{q_\phi(z,a|x)}\big[\log p_\theta(y|z,a)\big] - \mathrm{KL}\big(q_\phi(z,a|x)\,\|\,p(z,a)\big)$$
Variational Attention for Variational Encoder-Decoder

Assuming the posterior factorizes as $q_\phi(z,a|x) = q_\phi^{(z)}(z|x)\,q_\phi^{(a)}(a|x)$:

$$\mathcal{L}(\theta,\phi) = \mathbb{E}_{q_\phi^{(z)}(z|x),\,q_\phi^{(a)}(a|x)}\big[\log p_\theta(y|z,a)\big] - \mathrm{KL}\big(q_\phi^{(z)}(z|x)\,\|\,p(z)\big) - \mathrm{KL}\big(q_\phi^{(a)}(a|x)\,\|\,p(a)\big)$$

Two choices of prior for the attention vector $a$:
1. $p(a) = \mathcal{N}(0, I)$
2. $p(a) = \mathcal{N}(\bar{h}_{\mathrm{src}}, I)$, where $\bar{h}_{\mathrm{src}}$ is the mean of the source hidden states
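A sketch of how the attention vector can be sampled and its KL term computed under either prior, assuming a diagonal-Gaussian posterior $q_\phi^{(a)}(a|x)$; the function name and tensor shapes are assumptions, not the paper's implementation:

```python
import torch

def sample_attention(h_src, mu_a, logvar_a, prior="h_mean"):
    """Reparameterized sample a ~ q(a|x) = N(mu_a, diag(exp(logvar_a)))
    and KL(q(a|x) || p(a)) for either prior choice.

    h_src: (batch, src_len, dim) encoder hidden states
    mu_a, logvar_a: (batch, dim) posterior parameters
    """
    eps = torch.randn_like(mu_a)
    a = mu_a + torch.exp(0.5 * logvar_a) * eps

    if prior == "standard":             # 1. p(a) = N(0, I)
        prior_mean = torch.zeros_like(mu_a)
    else:                               # 2. p(a) = N(h_bar_src, I)
        prior_mean = h_src.mean(dim=1)  # mean of source hidden states
    # Closed-form KL between diagonal Gaussians (prior has unit variance).
    kl_a = 0.5 * torch.sum(
        logvar_a.exp() + (mu_a - prior_mean).pow(2) - 1.0 - logvar_a
    )
    return a, kl_a
```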
VED+VAttn: Variational Attention for Variational Encoder-Decoder

Training objective with KL annealing:

$$\mathcal{L}(\theta,\phi) = \mathbb{E}_{q_\phi^{(z)}(z|x),\,q_\phi^{(a)}(a|x)}\big[\log p_\theta(y|z,a)\big] - \lambda_{\mathrm{KL}}\Big[\mathrm{KL}\big(q_\phi^{(z)}(z|x)\,\|\,p(z)\big) + \gamma_a\,\mathrm{KL}\big(q_\phi^{(a)}(a|x)\,\|\,p(a)\big)\Big]$$

where $\lambda_{\mathrm{KL}}$ is the KL annealing coefficient and $\gamma_a$ weights the attention KL term relative to the latent-variable KL term.
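A small sketch of how this annealed objective might be assembled during training; the linear schedule, its length, and the $\gamma_a$ value are illustrative assumptions:

```python
def kl_weight(step, anneal_steps=10000):
    """Linear KL annealing: lambda_KL ramps from 0 to 1 over anneal_steps."""
    return min(1.0, step / anneal_steps)

def ved_vattn_loss(recon_nll, kl_z, kl_a, step, gamma_a=0.1):
    """Total loss: reconstruction NLL + lambda_KL * (KL_z + gamma_a * KL_a)."""
    lam = kl_weight(step)
    return recon_nll + lam * (kl_z + gamma_a * kl_a)
```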
PART 4: Experiments
Question Generation: Stanford Question Answering Dataset (Rajpurkar et al., 2016, SQuAD)
Case study
PART 5: Conclusion
Using variational attention to solve the bypassing phenomenon
Generating more diversified outputs while retaining high quality