Variational Attention for Sequence-to-Sequence Models
Source: COLING 2018
Speaker: Ya-Fang Hsiao
Advisor: Jia-Ling Koh
Date: 2020/01/03
PART 1: Introduction
                Auto-Encoder   Encoder-Decoder
Deterministic   DAE            DED
Variational     VAE            VED
PART 2: Variational Autoencoder
[Bowman et al. 2016] Generating Sentences from a Continuous Space
$$\mathcal{L}(\theta,\phi) = \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x|z)\big] - \mathrm{KL}\big(q_\phi(z|x)\,\|\,p(z)\big)$$

The first term is the data likelihood under the posterior (the reconstruction cross entropy); the second is the KL divergence of the posterior from the prior.
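A minimal PyTorch sketch of this objective (not the authors' code), assuming a token-level decoder and a diagonal-Gaussian posterior; the function names `reparameterize` and `vae_loss` are illustrative:

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), keeping gradients."""
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * logvar) * eps

def vae_loss(x_logits, x_target, mu, logvar):
    """Negative ELBO: reconstruction cross entropy plus
    KL(q(z|x) || N(0, I)) in closed form for a diagonal Gaussian."""
    recon = F.cross_entropy(
        x_logits.view(-1, x_logits.size(-1)),  # (batch*len, vocab)
        x_target.view(-1),                     # (batch*len,)
        reduction="sum",
    )
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```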
Variational Seq2Seq model
[Figure: variational seq2seq architecture with deterministic attention]

Bypassing phenomenon
With deterministic attention, the decoder has a direct path to the encoder hidden states, so it can ignore the latent variable z (its KL term collapses toward zero) and the variational latent space ends up carrying little information.
PART 3: VED+VAttn (Variational Encoder-Decoder with Variational Attention)
$$\mathcal{L}(\theta,\phi) = \mathbb{E}_{q_\phi(z,a|x)}\big[\log p_\theta(y|z,a)\big] - \mathrm{KL}\big(q_\phi(z,a|x)\,\|\,p(z,a)\big)$$
Variational Attention for Variational Encoder-Decoder

Assuming the posterior factorizes as $q_\phi(z,a|x) = q_\phi^{(z)}(z|x)\,q_\phi^{(a)}(a|x)$:

$$\mathcal{L}(\theta,\phi) = \mathbb{E}_{q_\phi^{(z)}(z|x),\,q_\phi^{(a)}(a|x)}\big[\log p_\theta(y|z,a)\big] - \mathrm{KL}\big(q_\phi^{(z)}(z|x)\,\|\,p(z)\big) - \mathrm{KL}\big(q_\phi^{(a)}(a|x)\,\|\,p(a)\big)$$

Two choices of prior for the attention vector $a$:
1. $p(a) = \mathcal{N}(0, I)$
2. $p(a) = \mathcal{N}(\bar{h}_{\mathrm{src}}, I)$, where $\bar{h}_{\mathrm{src}}$ is the mean of the source hidden states
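A sketch of how the attention vector can be sampled and its KL term computed under either prior, assuming a diagonal-Gaussian posterior $q_\phi^{(a)}(a|x)$; the function name and tensor shapes are assumptions, not the paper's implementation:

```python
import torch

def sample_attention(h_src, mu_a, logvar_a, prior="h_mean"):
    """Reparameterized sample a ~ q(a|x) = N(mu_a, diag(exp(logvar_a)))
    and KL(q(a|x) || p(a)) for either prior choice.

    h_src: (batch, src_len, dim) encoder hidden states
    mu_a, logvar_a: (batch, dim) posterior parameters
    """
    eps = torch.randn_like(mu_a)
    a = mu_a + torch.exp(0.5 * logvar_a) * eps

    if prior == "standard":             # 1. p(a) = N(0, I)
        prior_mean = torch.zeros_like(mu_a)
    else:                               # 2. p(a) = N(h_bar_src, I)
        prior_mean = h_src.mean(dim=1)  # mean of source hidden states
    # Closed-form KL between diagonal Gaussians (prior has unit variance).
    kl_a = 0.5 * torch.sum(
        logvar_a.exp() + (mu_a - prior_mean).pow(2) - 1.0 - logvar_a
    )
    return a, kl_a
```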
VED+VAttn: Variational Attention for Variational Encoder-Decoder

Training objective with KL annealing:

$$\mathcal{L}(\theta,\phi) = \mathbb{E}_{q_\phi^{(z)}(z|x),\,q_\phi^{(a)}(a|x)}\big[\log p_\theta(y|z,a)\big] - \lambda_{\mathrm{KL}}\Big[\mathrm{KL}\big(q_\phi^{(z)}(z|x)\,\|\,p(z)\big) + \gamma_a\,\mathrm{KL}\big(q_\phi^{(a)}(a|x)\,\|\,p(a)\big)\Big]$$

where $\lambda_{\mathrm{KL}}$ is the KL annealing coefficient and $\gamma_a$ weights the attention KL term relative to the latent-variable KL term.
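A small sketch of how this annealed objective might be assembled during training; the linear schedule, its length, and the $\gamma_a$ value are illustrative assumptions:

```python
def kl_weight(step, anneal_steps=10000):
    """Linear KL annealing: lambda_KL ramps from 0 to 1 over anneal_steps."""
    return min(1.0, step / anneal_steps)

def ved_vattn_loss(recon_nll, kl_z, kl_a, step, gamma_a=0.1):
    """Total loss: reconstruction NLL + lambda_KL * (KL_z + gamma_a * KL_a)."""
    lam = kl_weight(step)
    return recon_nll + lam * (kl_z + gamma_a * kl_a)
```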
PART 4: Experiments
Question Generation: Stanford Question Answering Dataset (Rajpurkar et al., 2016, SQuAD)
Case study
PART 5: Conclusion
Using variational attention to solve the bypassing phenomenon
Generating more diversified outputs while retaining high quality