RNN & NLP Application - cs.kangwon.ac.krleeck/ML/RNN_NLP.pdf · LSTM RNN 응용 예 • Hand...

RNN & NLP Application

강원대학교 IT대학

이창기

차례

• RNN

• Word Embedding

• NLP application

Recurrent Neural Network

• “Recurrent” property dynamical system over time

Bidirectional RNN

• Exploit future context as well as past

Long Short-Term Memory RNN • Vanishing Gradient Problem for RNN

• LSTM can preserve gradient information

LSTM Block Architecture

Gated Recurrent Unit (GRU)

• 𝑟𝑡 = 𝜎 𝑊𝑥𝑟𝑥𝑡 + 𝑊ℎ𝑟ℎ𝑡−1 + 𝑏𝑟

• 𝑧𝑡 = 𝜎 𝑊𝑥𝑥𝑥𝑡 + 𝑊ℎ𝑧ℎ𝑡−1 + 𝑏𝑧

• ℎ 𝑡 = 𝜙 𝑊𝑥ℎ𝑥𝑡 + 𝑊ℎℎ 𝑟𝑡 ⊙ ℎ𝑡−1 + 𝑏ℎ

• ℎ𝑡 = 𝑧𝑡ℎ𝑡 + 1 − 𝑧𝑡 ℎ 𝑡

• 𝑦𝑡 = 𝑔(𝑊ℎ𝑦ℎ𝑡 + 𝑏𝑦)

LSTM RNN 응용 예

• Hand Writing by Machine

• Music Composition

• Neural Machine Translation

• Image Caption Generation CNN + LSTM RNN

차례

• RNN

• Word Embedding

• NLP application

Deep Belief Network [Hinton06]

• Key idea

– Pre-train layers with an unsupervised learning algorithm in phases

– Then, fine-tune the whole network by supervised learning

• DBN are stacks of Restricted Boltzmann Machines (RBM)

Restricted Boltzmann Machine

• A Restricted Boltzmann machine (RBM) is a generative stochastic neural network that can learn a probability distribution over its set of inputs

• Major applications – Dimensionality reduction

– Topic modeling, …

Training DBN: Pre-Training

• 1. Layer-wise greedy unsupervised pre-training – Train layers in phase from the bottom layer

Training DBN: Fine-Tuning

• 2. Supervised fine-tuning for the classification task

The Back-Propagation Algorithm

Autoencoder

• Autoencoder is an NN whose desired output is the same as the input

– To learn a compressed representation (encoding) for a set of data.

– Find weight vectors A and B that minimize: Σi(yi-xi)

15 <겨울학교14 Deep Learning 자료 참고>

Stacked Autoencoders

• After training, the hidden node extracts features from the input nodes

• Stacking autoencoders constructs a deep network

16 <겨울학교14 Deep Learning 자료 참고>

텍스트의 표현 방식

• One-hot representation (or symbolic)

– Ex. [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0]

– Dimensionality • 50K (PTB) – 500K (big vocab) – 3M (Google 1T)

– Problem • Motel [0 0 0 0 0 0 0 0 1 0 0] AND

• Hotel [0 0 0 0 0 0 1 0 0 0 0] = 0

• Continuous representation

– Latent Semantic Analysis, Random projection

– Latent Dirichlet Allocation, HMM clustering

– Neural Word Embedding • Dense vector

• By adding supervision from other tasks improve the representation

Neural Network Language Model (Bengio00,03)

Shared weights = Word embedding

• Idea – A word and its context is a

positive training sample

– A random word in that same context negative training sample

– Score(positive) > Score(neg.)

• Training complexity is high – Hidden layer output

– Softmax in the output layer

• Hierarchical softmax

• Negative sampling

• Ranking(hinge loss) Input Dim: 1 Dim: 2 Dim: 3 Dim: 4 Dim: 5

1 (boy) 0.01 0.2 -0.04 0.05 -0.3

2 (girl) 0.02 0.22 -0.05 0.04 -0.4

LT: |V|*d, Input(one hot): |V|*1 LTT I

Ranking-based Model (Collobert)

본 연구 추가

Shared weights = Word embedding

w(t-2)

w(t-1)

w(t) or wc(t)

s or sc

Negative sampling

s > sc

Word2Vec: CBOW, Skip-Gram

• Remove the hidden layer Speedup 1000x – Negative sampling

– Frequent word sampling

– Multi-thread (no lock)

• Continuous Bag-of-words (CBOW) – Predicts the current word given the con

• Skip-gram – Predicts the surrounding words given

the current word

– CBOW + DropOut/DropConnect

Shared weights

한국어 Word Embedding: NNLM

• Data – 세종코퍼스 원문 + Korean Wiki abstract + News data

• 2억 8000만 형태소

– Vocab. size: 60,000

• 모든 형태소 대상 (기호, 숫자, 한자, 조사 포함)

• 숫자 정규화 + 형태소/POS: 정부/NNG, 00/SN

• NNLM model – Dimension: 50

– Matlab 구현

– 학습시간: 16일

한국어 Word Embedding: NNLM

한국어 Word Embedding: Word2Vec(CBOW)

• Data – 2012-2013 News + Korean Wiki abstract:

• 9GB raw text

• 29억 형태소

– Vocab. size: 100,000 • 모든 형태소 대상 (기호, 숫자, 한자, 조사 포함)

• 숫자 정규화 + 영어 소문자 + 형태소/POS

• 기존 Word2Vec(CBOW, Skip-Gram) – 모델: CBOW > SKIP-Gram

– 학습 시간: 24분 Shared weights

한국어 Word Embedding: Word2Vec(CBOW)

Word Embedding 응용: Word Analogy

King – Man + Woman ≈ Queen

http://deeplearner.fz-qqq.net/

Bilingual Word Embedding

• Bilingual Word Embedding for PBMT (EMNLP13) – 중국어-영어 word aliment 이용

한영 Bilingual Word Embedding

한영 Bilingual Word Embedding – cont’d

차례

• RNN

• Word Embedding

• NLP application

Sequence Labeling – RNN, LSTM

Word embedding

Feature embedding

FFNN(or CNN), CNN+CRF (SENNA)

x(t-1) x(t ) x(t+1)

h(t-1) h(t ) h(t+1)

y(t+1) y(t-1) y(t )

x(t-1) x(t ) x(t+1)

h(t-1) h(t ) h(t+1)

y(t+1) y(t-1) y(t )

x(t-1) x(t ) x(t+1)

h(t-1) h(t ) h(t+1)

y(t+1) y(t-1) y(t )

x(t-1) x(t ) x(t+1)

y(t+1) y(t-1) y(t )

RNN, CRF Recurrent CRF

x(t-1) x(t ) x(t+1)

h(t-1) h(t ) h(t+1)

y(t+1) y(t-1) y(t )

x(t-1) x(t ) x(t+1)

y(t+1) y(t-1) y(t )

C(t) x(t ) h(t )

i (t )

f (t )

x(t-1) x(t ) x(t+1)

h(t-1) h(t ) h(t+1)

y(t+1) y(t-1) y(t )

LSTM RNN + CRF LSTM-CRF (KCC 15)

x(t-1) x(t ) x(t+1)

h(t-1) h(t ) h(t+1)

y(t+1) y(t-1) y(t )

LSTM-CRF

• 𝑖𝑡 = 𝜎 𝑊𝑥𝑖𝑥𝑡 + 𝑊ℎ𝑖ℎ𝑡−1 + 𝑊𝑐𝑖𝑐𝑡−1 + 𝑏𝑖

• 𝑓𝑡 = 𝜎 𝑊𝑥𝑓𝑥𝑡 + 𝑊ℎ𝑓ℎ𝑡−1 + 𝑊𝑐𝑓𝑐𝑡−1 + 𝑏𝑓

• 𝑐𝑡 = 𝑓𝑡 ⊙ 𝑐𝑡−1 + 𝑖𝑡 ⊙ tanh 𝑊𝑥𝑐𝑥𝑡 + 𝑊ℎ𝑐ℎ𝑡−1 + 𝑏𝑐

• 𝑜𝑡 = 𝜎 𝑊𝑥𝑜𝑥𝑡 + 𝑊ℎ𝑜ℎ𝑡−1 + 𝑊𝑐𝑜𝑐𝑡 + 𝑏𝑜

• ℎ𝑡 = 𝑜𝑡 ⊙ tanh(𝑐𝑡)

• 𝑦𝑡 = 𝑊ℎ𝑦ℎ𝑡 + 𝑏𝑦

• 𝑠 𝐱, 𝐲 = 𝐴 𝑦𝑡−1, 𝑦𝑡 + 𝑦𝑡𝑇𝑡=1

• log 𝑃 𝐲 𝐱 = 𝑠 𝐱, 𝐲 − log exp(𝑠(𝐱, 𝐲′))𝐲′

GRU+CRF

• 𝑟𝑡 = 𝜎 𝑊𝑥𝑟𝑥𝑡 + 𝑊ℎ𝑟ℎ𝑡−1 + 𝑏𝑟

• 𝑧𝑡 = 𝜎 𝑊𝑥𝑧𝑥𝑡 + 𝑊ℎ𝑧ℎ𝑡−1 + 𝑏𝑧

• ℎ 𝑡 = 𝜙 𝑊𝑥ℎ𝑥𝑡 + 𝑊ℎℎ 𝑟𝑡 ⊙ ℎ𝑡−1 + 𝑏ℎ

• ℎ𝑡 = 𝑧𝑡 ⊙ ℎ𝑡−1 + 𝟏 − 𝑧𝑡 ⊙ ℎ 𝑡

• 𝑦𝑡 = 𝑊ℎ𝑦ℎ𝑡 + 𝑏𝑦

• 𝑠 𝐱, 𝐲 = 𝐴 𝑦𝑡−1, 𝑦𝑡 + 𝑦𝑡𝑇𝑡=1

• log 𝑃 𝐲 𝐱 = 𝑠 𝐱, 𝐲 − log exp(𝑠(𝐱, 𝐲′))𝐲′

x(t-1) x(t ) x(t+1)

h(t-1) h(t ) h(t+1)

y(t+1) y(t-1) y(t )

Bi-LSTM CRF

• Bidirectional LSTM+CRF

• Bidirectional GRU+CRF

• Stacked Bi-LSTM+CRF …

x(t-1) x(t ) x(t+1)

h(t-1) h(t ) h(t+1)

y(t+1) y(t-1) y(t )

bh(t-1) bh(t ) bh(t+1)

Stacked LSTM CRF

x(t-1) x(t ) x(t+1)

h(t-1) h(t ) h(t+1)

y(t+1) y(t-1) y(t )

x(t-1) x(t ) x(t+1)

h(t-1) h(t ) h(t+1)

y(t+1) y(t-1) y(t )

h2(t-1) h2(t ) h2(t+1)

LSTM CRF with Context words = CNN + LSTM CRF

x(t-1) x(t ) x(t+1)

h(t-1) h(t ) h(t+1)

y(t+1) y(t-1) y(t )

x(t-2) x(t+2)

• Bi-LSTM CRF =~ LSTM CRF with Context > LSTM CRF

영어 개체명 인식 (KCC 15, Journal submitted)

영어 개체명 인식 (CoNLL03 data set) F1(dev) F1(test)

SENNA (Collobert) - 89.59

Structural SVM (baseline + Word embedding feature) - 85.58

FFNN (Sigm + Dropout + Word embedding) 91.58 87.35

RNN (Sigm + Dropout + Word embedding) 91.83 88.09

LSTM RNN (Sigm + Dropout + Word embedding) 91.77 87.73

GRU RNN (Sigm + Dropout + Word embedding) 92.01 87.96

CNN+CRF (Sigm + Dropout + Word embedding) 93.09 88.69

RNN+CRF (Sigm + Dropout + Word embedding) 93.23 88.76

LSTM+CRF (Sigm + Dropout + Word embedding) 93.82 90.12

GRU+CRF (Sigm + Dropout + Word embedding) 93.67 89.98

한국어 개체명 인식 (Journal submitted)

한국어 개체명 인식 (TV domain) F1(test)

Structural SVM (baseline) (basic + NE dic. + word cluster feature + morpheme feature)

FFNN (ReLU + Dropout + Word embedding) 87.70

RNN (Tanh + Dropout + Word embedding) 88.93

LSTM RNN (Tanh + Dropout + Word embedding) 89.38 (+0.35)

Bi-LSTM RNN (Tanh + Dropout + Word embedding) 89.21 (+0.18)

CNN+CRF (ReLU + Dropout + Word embedding) 90.06 (+1.03)

RNN+CRF (Sigm + Dropout + Word embedding) 90.52 (+1.49)

LSTM+CRF (Sigm + Dropout + Word embedding) 91.04 (+2.01)

GRU+CRF (Sigm + Dropout + Word embedding) 91.02 (+1.99)

NLU 실험 (ATIS data)

NLU (Airplane Traveling Information System data) F1(dev) F1(test)

FFNN (Sigm + Dropout + Word embedding) 93.75 92.13

RNN (Sigm + Dropout + Word embedding) 98.05 94.78

LSTM RNN (Sigm + Dropout + Word embedding) 98.27 94.85

GRU RNN (Sigm + Dropout + Word embedding) 98.11 94.82

CNN4+CRF (Sigm + Dropout + Word embedding) 96.66 95.56

RNN+CRF (Sigm + Dropout + Word embedding) 98.42 96.19

LSTM+CRF (Sigm + Dropout + Word embedding) 98.79 96.48

학습 속도

LSTM RNN

LSTM R-CRF

GRU-CRF

SCRN-CRF

Neural Architectures for NER (Arxiv16)

• LSTM-CRF model + Char-based Word Representation – Char: Bi-LSTM RNN

Neural Architectures for NER (Arxiv16)

End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF (ACL16)

• LSTM-CRF model + Char-level Representation – Char: CNN

End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF (ACL16)

NER with Bidirectional LSTM-CNNs (Arxiv16)

의미역 결정 (Semantic Role Labeling)

Korean PropBank

• Virginia Corpus – 군대 도메인, 54,500 단어 도메인이 맞지 않음

• Newswire Corpus – 신문 도메인

(1994/6/2~2000/3/20), 131,800 단어

– 구구조 기반 태깅을 의존 구조로 변환 4,882 문장

• Frame file – 2,749 xml files

• 용언의 root (not stem)

• 용언화 가능 명사 (deverbal noun)

• Frame file 예제

– 덧붙.01

– English Define = add

– Role set • ARG0: adder

• ARG1: thing added

• ARG2: added to

– Mapping1 • Rel = 덧붙이다

• Src = sbj, Trg = ARG0

• Src = obj, Trg = ARG1

• Src = comp, Trg=ARG2

한국어 의미역 결정 (SRL)

• 서술어 인식(PIC)

– 그는 르노가 3월말까지 인수제의 시한을 [갖고]갖.1 있다고 [덧붙였다]덧붙.1

• 논항 인식(AIC)

– 그는 [르노가]ARG0 [3월말까지]ARGM-TMP 인수제의 [시한을]ARG1 [갖고]갖.1 [있다고]AUX 덧붙였다

– [그는]ARG0 르노가 3월말까지 인수제의 시한을 갖고 [있다고]ARG1 [덧붙였다]

덧붙.1

의존 구문 분석

의미역 결정

딥러닝 기반 한국어 의미역 결정 – 한글 및 한국어 정보처리/동계학술대회 15

• Bidirectional LSTM+CRF

• Korean Word embedding

– Predicate word, argument word

– NNLM

• Feature embedding

– POS, distance, direction

– Dependency path, LCA

• Bi-LSTM+CRF 성능 (AIC)

– F1: 78.2% (+1.2)

– Backward LSTM+CRF: F1 77.6%

– S-SVM 성능 (KCC14) • 기본자질: F1 74.3%

• 기본자질+word cluster: 77.0% – 정보과학회 논문지 2015.02

x(t-1) x(t ) x(t+1)

h(t-1) h(t ) h(t+1)

y(t+1) y(t+1) y(t )

Stacked Bi-LSTM CRF (한국어의미역결정) (정보과학회지 제출)

Syntactic information w/ w/o

Structural SVM

Backward LSTM CRFs

Bidirectional LSTM CRFs

Stacked Bidirectional LSTM CRFs (2 layers)

Stacked Bidirectional LSTM CRFs (3 layers)

CNN for Sentence Classification (Sentiment Analysis)

• Convolutional NN – Convolution Layer

• Sparse Connectivity

• Shared Weights

• Multiple feature maps

– Sub-sampling Layer • Average/max pooling

• NxN1

• NLP (Sentence Classification)에 적용 – ACL14

– EMNLP14

Multiple feature maps

한국어 감성 분석 – CNN

• Mobile data – Train: 4543, Test: 500

• EMNLP14 모델(CNN) 적용 – Matlab으로 구현

– Word embedding • 한국어 10만 단어 + 도메인 특화 1420 단어

한국어 감성 분석 – CNN 실험 Data set Model Accuracy

Mobile Train: 4543 Test: 500

SVM (word feature) 85.58

CNN(relu,kernel3,hid50)+Word embedding (word feature)

CNN(relu,kernel3,hid50)+Random init. 89.00

LSTM RNN 기반 한국어 감성분석

• LSTM RNN-based encoding – Sentence embedding 입력

– Fully connected NN 출력

– GRU encoding 도 유사함

x(1) x(2 ) x(t)

h(1) h(2 ) h(t)

Data set Model Accuracy

GRU encoding + Fully connected NN 91.12

LSTM RNN encoding + Fully connected NN 90.93

Attention-based LSTM RNN encoding 기반 한국어 감성분석

• Bi-LSTM RNN-based encoding + Attention Mechanism

Data set Model Accuracy

GRU encoding + Fully connected NN 91.12

Attention-based GRU encoding + Fully connected NN

Attention Mechanism 결과

• 긍정 – 반응/nng/2 속도/nng/2 터치감/nng/2 만족/nng/2 하/xsa/19 고요/ec/25 </s>/44

– 카메라/nng/1 가/jks/2 깨끗이/mag/1 잘/mag/0 나오/vv/0 는/etm/1 듯/nnb/1 하/xsa/2 아/ec/1 참/mag/0 마음/nng/0 에/jkb/10 들/vv/12 었/ep/12 습니다/ef/12 ./sf/15 </s>/21

– 검정/nng/0 보다/jkb/0 화이트/nng/0 를/jko/1 선호/nng/1 하/xsv/3 는데/ec/3 다행/nng/2 이/jks/4 화이트/nng/2 가/jks/6 있/vv/4 어서/ec/5 디자인/nng/4 도/jx/8 맘/nng/8 에/jkb/10 들/vv/9 구요/ec/9 </s>/11

• 부정 – 화면/nng/10 이나/jc/12 해상도/nng/6 는/jx/12 않/nng/25 좋/va/1 아서/ec/8

</s>/23

– 화면/nng/5 자체/nng/9 가/jks/13 작/va/4 은데/ec/3 편하/va/1 겠/ep/7 냐/ec/16 </s>/38

– 메뉴/nng/5 무지/nng/0 느리/va/4 고요/ec/3 인터넷/nng/1 심심/xr/0 하/xsa/3 면/ec/4 렉먹/vv/7 고/ec/12 문자/nng/8 전송/nng/4 속도/nng/2 gg/sl/12 </s>/28

– 요금/nng/12 넘/vv/9 비싸/va/15 아/ec/19 </s>/43

GRU encoding + Deep Output

• GRU encoding + Deep Output – Sentence embedding

입력

• 실험 결과 성능 개선 없음

x(1) x(2 ) x(t)

h(1) h(2 ) h(t)

Attention-based GRU encoding + Deep Output

• Attention-based GRU encoding + Deep Output – Sentence embedding

입력

• 실험 결과 성능 개선 없음

Stacked GRU encoding + MLP

• Stacked GRU encoding – Sentence embedding

입력

• 실험 중 – 일부 태스크에서 좋은 성능

을 보임

x(1) x(2 ) x(t)

h(1) h(2 ) h(t)

h’t)

Neural Machine Translation

T|S 777 항공편 은 3 시간 동안 지상 에 있 겠 습니다 . </s>

flight 0.5 0.4 0 0 0 0 0 0 0 0 0 0 0

777 0.3 0.6 0 0 0 0 0 0 0 0 0 0 0

is 0 0.1 0 0 0.1 0.2 0 0.4 0 0.1 0 0 0

on 0 0 0 0 0 0 0 0.7 0.2 0.1 0 0 0

the 0 0 0 0.2 0.3 0.3 0.1 0 0 0 0 0

ground 0 0 0 0.1 0.2 0.5 0.3 0 0 0 0 0 0

for 0 0 0 0.1 0.2 0.5 0.1 0.1 0 0 0 0 0

three 0 0 0 0.2 0.2 0.6 0 0 0 0 0 0 0

hours 0 0 0 0.1 0.3 0.5 0 0 0 0 0 0 0

. 0 0 0 0.4 0 0.1 0.2 0.1 0.1 0.1 0 0 0

</s> 0 0 0 0 0 0 0 0.1 0 0.1 0.1 0.3 0.3

Recurrent NN Encoder–Decoder for Statistical Machine Translation (EMNLP14)

GRU RNN Encoding GRU RNN Decoding Vocab: 15,000 (src, tgt)

Sequence to Sequence Learning with Neural Networks (NIPS14 – Google)

Source Voc.: 160,000 Target Voc.: 80,000 Deep LSTMs with 4 layers Train: 7.5 epochs (12M sentences, 10 days with 8-GPU machine)

Neural MT by Jointly Learning to Align and Translate (ICLR15)

GRU RNN + Alignment Encoding GRU RNN Decoding Vocab: 30,000 (src, tgt) Train: 5 days

NMT 모델의 장단점

• NMT의 장점 – Feature Engineering이 필요 없음 – End-to-end 방식의 단일 신경망 구조

• 단어 정렬, 번역 모델, 언어 모델 등이 필요 없음

– 디코더가 간단함 – 구문 분석 등이 필요 없음

• Pre-ordering, Syntax-based SMT

• NMT의 단점 – 출력 언어 단어 사전의 크기가 제한 됨

• 사전의 크기에 비례하여 학습/디코딩 시간이 필요 • 미등록어 처리가 필요함 • 최근 NMT 논문들은 이러한 미등록어 처리에 관한 연구가 대

부분임

문자 단위의 NMT 제안 (WAT15, 한글 및 한국어15)

• 기존의 NMT: 단어 단위의 인코딩-디코딩

– 미등록어 후처리 or NMT 모델의 수정 등이 필요

• 문자 단위의 NMT

– 입력 언어는 단어 단위로 인코딩

• 입력 언어를 문자 단위로 인코딩 성능 하락

– 출력 언어는 문자 단위로 디코딩

• 단어 단위: その/UN 結果/NCA を/PS 詳細/NCD …

• 문자 단위: そ/B の/I 結/B 果/I を/B 詳/B 細/I …

• 문자 단위 NMT의 장점

– 모든 문자를 사전에 등록 미등록어 문제해결

– 학습 및 디코딩 속도 향상

– 기존 NMT 모델의 수정이 필요 없음

– 미등록어 후처리 작업이 필요 없음

본 연구에서 확장한 NMT 모델

• Encoding (source language)

– Bidirectional GRU RNN 이용

– ℎ𝑡 = [ℎ𝑡 , ℎ𝑡].

• Decoding (target language)

– 𝑐 = ℎ0 (global context)

– 𝑐𝑡 = 𝑎𝑡𝑖ℎ𝑖 (local context)

– 𝑧𝑡 = 𝑠𝑖𝑔𝑚 𝑊𝑧𝑥𝐸𝑡(𝑦𝑡−1) + 𝑊𝑧𝑠𝑠𝑡−1 + 𝑏𝑧

– 𝑟𝑡 = 𝑠𝑖𝑔𝑚 𝑊𝑟𝑥𝐸𝑡(𝑦𝑡−1) + 𝑊𝑟𝑠𝑠𝑡−1 + 𝑏𝑟

– 𝑠 𝑡 = 𝑡𝑎𝑛ℎ 𝑊𝑠𝑥𝐸𝑡(𝑦𝑡−1) + 𝑊𝑠𝑠(𝑟𝑡𝑠𝑡−1) + 𝑊𝑠𝑐𝑐𝑡 + 𝑏𝑠

– 𝑠𝑡 = (1 − 𝑧𝑡)𝑠𝑡−1 + 𝑧𝑡𝑠 𝑡

– 𝒔′𝒕 = 𝒓𝒆𝒍𝒖 𝑾𝒔′𝒔𝒔𝒕 + 𝒃𝒚

– 𝑦𝑡 =

𝑠𝑜𝑓𝑡𝑚𝑎𝑥 𝑾𝒚𝒔′𝒔′𝒕 + 𝑊𝑦𝑠𝑠𝑡 + 𝑊𝑦𝑦𝐸𝑡(𝑦𝑡−1) + 𝑾𝒚𝒄𝒄 + 𝑏𝑦

S’ S’

yt-1 yt

ASPEC J-to-E 실험

• ASPEC J-to-E data

• 성능 (Juman 이용 BLEU)

– PB SMT: 18.45

– HPB SMT: 18.72

– Tree-to-string SMT: 22.16

– NMT (Word-level decoding): 21.63

– NMT (Character-level decoding): 21.72

– NMT (Word+Character-level decoding): 23.76

ASPEC E-to-J 실험 (WAT15)

• ASPEC E-to-J data

– PB SMT: 27.48

– HPB SMT: 30.19

– Tree-to-string SMT: 32.63

– NMT (Word-level decoding): 29.78

– NMT (Character-level decoding): 33.14 (4위) • RIBES 0.8073 (2위)

– Tree-to-String + NMT(Character-level) re-ranking • BLEU 34.60 (2위)

• Human 53.25 (2위)

JPO K-to-J 실험 (WAT15)

• JPO Patent K-to-J data

– PB SMT: 69.22

– HPB SMT: 67.41

– NMT(Word-level): 61.52

– NMT(Character-level): 65.72

– PB SMT + NMT(Character-level) re-ranking • BLEU 71.38 (2위)

• Human 14.75 (1위)

ASPEC E-to-J 실험 – 예제

• This/DT:0 paper/NN:1 explaines/NNS:2 experimental/JJ:3 result/NN:4 according/VBG:5 to/TO:6 the/DT:7 model/NN:8 ./.:9 </s>:10

• こ/B:0 の/I:1 モ/B:2 デ/I:3 ル/I:4 に/B:5 よ/B:6 る/I:7 実/B:8 験/I:9 結/B:10 果/I:11 を/B:12 説/B:13 明/I:14 し/B:15 た/B:16 。/B:17 </s>:18

ASPEC J-to-E 실험 – 예제

• 又/CJ:0 メルト/NQC:1 フラクチャー/NCC:2 防止/NCS:3 策/NCC:4 に/PS:5 つい/VC:6 て/PJ:7 も/PC:8 述べ/VC:9 た/VX:10 </s>:11

• M/B:0 e/I:1 l/I:2 t/I:3 i/I:4 n/I:5 g/I:6 prevention/NN:7 measures/NNS:8 are/VBP:9 also/RB:10 described/VBN:11 ./.:12 </s>:13

JPO K-to-J 실험 – 예제

• ○/ETC:0 :/ETC:1 필름/NOUN:2 면/NOUN:3 과의/JOSA:4 접착/NOUN:5 이/JOSA:6 없/NORMALVERB:7 다/EOMI:8 ./ETC:9 </s>:10

• ○/B:0 :/B:1 フ/B:2 ィ/I:3 ル/I:4 ム/I:5 面/B:6 と/B:7 の/B:8 接/B:9 着/I:10 が/B:11 な/B:12 い/I:13 。/B:14 </s>:15

Input-feeding Approach (EMNLP15)

The attentional decisions are made independently, which is suboptimal.

In standard MT, a coverage set is often maintained during the translation process to keep track of which source words have been translated.

Effect: - We hope to make the model fully aware of

previous alignment choices - We create a very deep network spanning

both horizontally and vertically

Copying Mechanism or CopyNet (ACL16)

Abstractive Text Summarization (한글 및 한국어 16)

로드킬로 숨진 친구의 곁을 지키는 길고양이의 모습이 포착되었다.

RNN_search+input_feeding+CopyNet

End-to-End 한국어 형태소 분석 (동계학술대회16)

Attention + Input-feeding + Copying mechanism

Sequence-to-sequence 기반 한국어 구구조 구문 분석 (한글 및 한국어 16)

입력 예시 1 43/SN 국/NNG <sp> 참가/NNG

입력 예시 2 4 3 <SN> 국 <NNG> <sp> 참 가 <NNG>

43/SN 국/NNG + 참가/NNG

(NP (NP 43/SN + 국/NNG) (NP 참가/NNG))

Attention + Input-feeding

입력

선 생 <NNG> 님 <XSN> 의 <JKG> <sp> 이 야 기 <NNG> <sp> 끝 나 <VV> 자 <EC> <sp> 마 치 <VV> 는 <ETM> <sp>

종 <NNG> 이 <JKS> <sp> 울 리 <VV> 었 <EP> 다 <EF> . <SF>

정답 (S (S (NP_SBJ (NP_MOD XX ) (NP_SBJ XX ) ) (VP XX ) ) (S

(NP_SBJ (VP_MOD XX ) (NP_SBJ XX ) ) (VP XX ) ) )

RNN-search[7] (Beam size 10)

(S (VP (NP_OBJ (NP_MOD XX ) (NP_OBJ XX ) ) (VP XX ) ) (S (NP_SBJ (VP_MOD XX ) (NP_SBJ XX ) ) (VP XX ) ) )

RNN-search + Input-feeding +

Dropout (Beam size 10)

(S (S (NP_SBJ (NP_MOD XX ) (NP_SBJ XX ) ) (VP XX ) ) (S (NP_SBJ (VP_MOD XX ) (NP_SBJ XX ) ) (VP XX ) ) )

Sequence-to-sequence 기반 한국어 구구조 구문 분석 (한글 및 한국어 16)

모델 F1

스탠포드 구문분석기[13] 74.65

버클리 구문분석기[13] 78.74

형태소 + <sp> RNN-search[7] (Beam size 10)

87.34(baseline)

87.65*(+0.31)

형태소의 음절 + 품사태그 + <sp>

RNN-search[7] (Beam size 10) 87.69(+0.35)

88.00*(+0.66)

RNN-search + Input-feeding (Beam size 10) 88.23(+0.89)

88.68*(+1.34)

RNN-search + Input-feeding + Dropout (Beam size 10)

88.78(+1.44)

89.03*(+1.69)

Pointer Network (NIPS15) • Travelling Salesman Problem: NP-hard • Pointer Network can learn approximate solutions: O(n^2)

상호참조해결을 위한 포인터 네트워크 모델 (Journal submitted)

• 입력: 단어(형태소) 열, 출발점(대명사, 한정사구(이 별자리 등))

– X = {A:0, B:1, C:2, D:3, <EOS>:4}, Start_Point=A:0

• 출력: 입력 단어 열의 위치(Pointer) 열 Entity

– Y = {A:0, C:2, D:3, <EOS>:4}

• 특징: End-to-end 방식의 대명사 상호참조 해결 (mention detection 과정 X)

A B C D <EOS> A C D <EOS>

Encoding Decoding

Hidden Layer

Projection Layer

Attention Layer

포인터 네트워크를 이용한 한국어 의존 구문 분석 (동계학술대회16)

Neural Responding Machine for Short-Text Conversation (ACL 15)

Neural Responding Machine – cont’d

실험 결과 (ACL 15)

관련연구: Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network

Models (arXiv 15) - HRED

관련연구: Attention with Intention for a Neural Network Conversation Model (arXiv 15) - AWI

관련연구: A Diversity-Promoting Objective Function for Neural Conversation Models

(arXiv 16) - MMI

관련연구: A Persona-Based Neural Conversation Model (arXiv 16)

Speaker model + MMI

강원대 딥러닝 기반 챗봇 모델

• Short-Text Conversation 가정

– 사용자의 현재 발화 정보만을 이용하여 대답을 생성

• GRU_encdec 모델

– Gate Recurrent Unit(GRU) 기반의 Encoder-Decoder 모델

• Stacked_GRU_encdec 모델

– 2 layer GRU_encdec 모델

• GRU_search 모델

– Attention 기반의 모델

• GRU_DO_search 모델

– Attention 기반의 모델 + Deep Output 모델

Subtitle 데이터 실험

• 학습데이타: 영화 대사 데이터 100만 문장

• 모델: GRU + enc-dec • what is two plus two ?

– A) one , two , three .

• how old are you ?

– A) 18 years .

• are you a follower or a leader ?

– A) yes , i am .

• my name is david . what is my name ?

– A) david palmer .

• what is the purpose of being intelligent ?

– A) it is difficult to explain .

• what happens if machines can think ?

– A) if they find anything , they 'll kill us .

고객센터 데이터 실험

• 학습데이타: 고객센터 데이터 18만 문장

• 모델: Stacked GRU + enc-dec • 세탁기가 전원이 안 들어와요?

– A) 네 아 그러세요

– A) 많이 불편하셨겠습니다

– A) 드럼이세요 아니면 일반 세탁기세요

• 언제 방문해 줄수 있나요?

– A) 언제 방문 드리면 편하실까요

– A) 언제 방문 드리면 편하시겠어요

• 출장비가 얼마죠?

– A) 출장 점검료는 15000원이구요

– A) 출장비가 만오천원이구요

Short-Text Conversation (동계학술대회16)

- Data: 클리앙 ‘아무거나 질문 게시판’ - 77,346 질문-응답 쌍 - 학습:개발:평가 = 8:1:1

이미지 캡션 생성 소개

• 이미지 내용 이해 이미지 내용을 설명하는 캡션 자동 생성 – 이미지 인식(이해) 기술 + 자연어처리(생성) 기술

• 활용 분야 – 이미지 검색

– 맹인들을 위한 사진 설명, 네비게이션

– 유아 교육, …

기존 연구 • Multimodal RNN (M-RNN) [2]

Baidu CNN + vanilla RNN

CNN: VGGNet • Neural Image Caption

generator (NIC) [4] Google CNN + LSTM RNN

CNN: GoogLeNet

• Deep Visual-Semantic alignments (DeepVS) [5] Stanford University RCNN + Bi-RNN

alignment (training) CNN + vanilla RNN

CNN: AlexNet

AlexNet, VGGNet

RNN을 이용한 이미지 캡션 생성 (동계학술대회 15)

• CNN + RNN

– CNN: VGGNet 15 번째 layer (4096 차원)

– RNN: GRU (LSTM RNN의 변형) • Hidden layer unit: 500, 1000 (Best)

• Multimodal layer unit: 500, 1000 (Best)

– Word embedding • SENNA: 50차원 (Best)

• Word2Vec: 300 차원

– Data set • Flickr 8K : 8000 이미지 * 이미지 캡션 5문장

– 6000 학습, 1000 검증, 1000 평가

• Flickr 30K : 31783 이미지 * 이미지 캡션 5문장

– 29000 학습, 1014 검증, 1000 평가

– 4가지 모델 실험 • GRU-DO1, GRU-DO2, GRU-DO3, GRU-DO4

Embedding

CNNMultimodal

Softmax

Embedding

CNNMultimodal

Softmax

Image GRU

Embedding

CNNMultimodal

Softmax

• GRU-DO1 • GRU-DO2

• GRU-DO3 • GRU-DO4

RNN을 이용한 이미지 캡션 생성 (동계학술대회 15)

Embedding

CNNMultimodal

Softmax

Image GRU

Embedding

CNNMultimodal

Softmax

Image GRU

Embedding

CNNMultimodal

Softmax

Flickr 30K B-1 B-2 B-3 B-4

m-RNN (Baidu)[2] 60.0 41.2 27.8 18.7

DeepVS (Stanford)[5] 57.3 36.9 24.0 15.7

NIC (Google)[4] 66.3 42.3 27.7 18.3

Ours-GRU-DO1 63.01 43.60 29.74 20.14

Ours-GRU-DO2 63.24 44.25 30.45 20.58

Ours-GRU-DO3 62.19 43.23 29.50 19.91

Ours-GRU-DO4 63.03 43.94 30.13 20.21

Flickr 8K B-1 B-2 B-3 B-4

m-RNN (Baidu)[2] 56.5 38.6 25.6 17.0

DeepVS (Stanford)[5] 57.9 38.3 24.5 16.0

NIC (Google)[4] 63.0 41.0 27.0 -

Ours-GRU-DO1 63.12 44.27 29.82 19.34

Ours-GRU-DO2 61.89 43.86 29.99 19.85

Ours-GRU-DO3 62.63 44.16 30.03 19.83

Ours-GRU-DO4 63.14 45.14 31.09 20.94

Flickr30k 실험 결과

A black and white dog is jumping in the grass

A group of people in the snow

Two men are working on a roof

신규 데이터

A large clock tower in front of a building

A man in a field throwing a frisbee

A little boy holding a white frisbee

A man and a woman are playing with a sheep

한국어 이미지 캡션 생성

한 어린 소녀가 풀로 덮인 들판에 서 있다

건물 앞에 서 있는 한 남자

분홍색 개를 데리고 있는 한 여자와 한 여자

구명조끼를 입은 한 작은 소녀가 웃고 있다

Embedding

CNNMultimodal

Softmax

Residual Network + 한국어 이미지 캡션 생성 (동계학술대회16)

Visual Question Answering

Facebook: Visual Q&A

Toronto COCO-QA dataset (ArXiv 15)

• QA dataset 자동 생성

– MS COCO (image caption) data set 이용

– Image caption으로부터 parser 등을 이용하여 질문과 정답을 자동 생성 • 정답은 single word라고 가정함 (classification problem)

• Object, Number, Color, Location types의 질문 생성 – Object: 70%, Color: 17%, Number: 7%, Location: 6%

– 123,287 images, 78,736 train questions, 38,948 questions

– Ex. What is there in front of the sofa? • Answer: table

VIS+LSTM model (Arxiv 15)

DPPnet (포항공대)

Stacked Attention Networks for Image QA (ArXiv 16)

RNN & NLP Application - cs.kangwon.ac.krleeck/ML/RNN_NLP.pdf · LSTM RNN 응용 예 • Hand...

Documents

แนวคิด RNN Open Platform

RNN & LSTM

Machine Learning 2020 - 國立臺灣大學speech.ee.ntu.edu.tw/~tlkagk/courses/ML2020/introduction.pdf · 2020-03-04 · Regression Classification RNN Seq2seq Meta Learning Unsupervised

쫄지말자딥러닝2 - CNN RNN 포함버전

Deep Learning: Models for Sequence Data (RNN and …i-systems.github.io/HSE545/machine learning all/16 Deep learning... · Machine Learning (CS771A) ... Open gate denoted by ’o’,

CSB-RNN: A Faster-than-Realtime RNN Acceleration …

Manual scada b rnn

CNN RNN Autoencoder - UNISTisystems.unist.ac.kr/wp-content/uploads/sites/209/2017/... · 2018. 12. 28. · CNN RNN Autoencoder. Today 1. MachineLearning 2. DeepLearning –CNN –RNN

(final) RNN Implementation on FPGA

RNN Slides

RnN#02 Doppler Radar.pdf

“וח...nun-Illil quo rnn nrnnn - 30 'orp h' It-IV ,nv-no an-I 0'mnn npown rnrpg ;nvn1'Y' rnn'9 rpn:wnn nvnun , nupa rnn'91 -nn'R rpnyv numm 0'or1' njvv' no-nn rnn'91 ,0'-nnt

Agreement for rnn

Statistical Machine Learning Methods for Bioinformatics IV ...calla.rnet.missouri.edu/cheng_courses/cheng_nn_bioinfo.pdf · Dimensional Recurrent Neural Network (1D-RNN) Pollastri

CSC411 RNN Application: Machine Translationguerzhoy/411/lec/W11/rnn_translate.pdf · autorise les pilotes à demander aux passagers d' éteindre leurs appareils pendant les atterrissages

제1장 자연언어처리의 개념 - cs.kangwon.ac.krleeck/NLP/01_intro.pdf · 자연언어처리 •자연언어처리란? –컴퓨터를 통하여 인간의 언어를 처리하고

Deep Leaning : Modeling Sequence / RNN

Compression of Neural Machine Translation Models via Pruningnlp.stanford.edu/pubs/see2016compression.pdf(2015b) to recurrent neural networks (RNN), in particular LSTM architectures

Recurrent Neural Network shad.pcs15@iitp.acshad.pcs15/data/rnn-shad.pdf · Recurrent Neural Network (RNN) Training of RNNs BPTT Visualization of RNN through Feed-Forward Neural Network

SSoouutt thhhe eer rnn AAArreeaaa