NLP using Transformer Architectures
TF World 2019
Aurélien Géron, ML Consultant, @aureliengeron


Page 1: NLP using Transformer Architectures

NLP using Transformer Architectures

TF World 2019


Aurélien Géron, ML Consultant, @aureliengeron

Page 2: NLP using Transformer Architectures

❖ NLP Tasks and Datasets

❖ The Transformer Architecture

❖ Recent Language Models

Page 3: NLP using Transformer Architectures

Natural Language Processing

Page 4: NLP using Transformer Architectures

Sentiment Analysis

Page 5: NLP using Transformer Architectures

Sentiment Analysis

POSITIVE ?

Page 6: NLP using Transformer Architectures

Sentiment Analysis

POSITIVE ?

LABEL = POSITIVE

Page 7: NLP using Transformer Architectures

Sentiment Analysis

● Recommender Systems
● Market Sentiment

Page 8: NLP using Transformer Architectures

Sentiment Analysis

Text Classification

Sentiment Analysis

Page 9: NLP using Transformer Architectures

!pip install tensorflow-datasets # or tfds-nightly

import tensorflow_datasets as tfds

datasets = tfds.load("imdb_reviews")

Sentiment Analysis

Page 10: NLP using Transformer Architectures

TensorFlow Datasets

datasets = tfds.load("imdb_reviews")
train_set = datasets["train"]  # 25,000 reviews
test_set = datasets["test"]    # 25,000 reviews

Page 11: NLP using Transformer Architectures

TensorFlow Datasets

train_set, test_set = tfds.load(
    "imdb_reviews", split=["train", "test"])

Page 12: NLP using Transformer Architectures

TensorFlow Datasets

train_set, test_set = tfds.load(
    "imdb_reviews:1.0.0", split=["train", "test"])

Page 13: NLP using Transformer Architectures

TensorFlow Datasets

train_set, test_set = tfds.load(
    "imdb_reviews:1.0.0", split=["train", "test[:60%]"])

Page 14: NLP using Transformer Architectures

TensorFlow Datasets

train_set, test_set, valid_set = tfds.load(
    "imdb_reviews:1.0.0",
    split=["train", "test[:60%]", "test[60%:]"])

Page 15: NLP using Transformer Architectures

TensorFlow Datasets

train_set, test_set, valid_set = tfds.load(
    "imdb_reviews:1.0.0",
    split=["train", "test[:60%]", "test[60%:]"],
    as_supervised=True)
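The split strings on this slide use the TFDS slicing API: "test[:60%]" takes the first 60% of the test examples and "test[60%:]" the remaining 40%. As a rough illustration of what those percentages mean (plain Python, not the actual tensorflow_datasets implementation):

```python
# Illustrative sketch of how TFDS-style percentage slices map to index
# ranges (plain Python, not the real tensorflow_datasets implementation).

def percent_slice(n_examples, start_pct, end_pct):
    """Return the [start, end) index range for a slice like 'test[60%:]'."""
    start = n_examples * start_pct // 100
    end = n_examples * end_pct // 100
    return start, end

# The IMDb test split has 25,000 reviews:
test_subset = percent_slice(25_000, 0, 60)     # "test[:60%]"
valid_subset = percent_slice(25_000, 60, 100)  # "test[60%:]"
```

Together the two slices cover the full test split with no overlap, which is what makes this a convenient way to carve out a validation set.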

Page 16: NLP using Transformer Architectures

for review, label in train_set.take(2):
    print(review.numpy().decode("utf-8"))
    print(label.numpy())

TensorFlow Datasets

Page 17: NLP using Transformer Architectures

for review, label in train_set.take(2):
    print(review.numpy().decode("utf-8"))
    print(label.numpy())

This was an absolutely terrible movie. Don't be lured in by Christopher Walken or Michael Ironside. Both are great actors, but this must simply be their worst role in history. [...]
0
This is the kind of film for a snowy Sunday afternoon when the rest of the world can go ahead with its own [...] A family film in every sense and one that deserves the praise it received.
1

TensorFlow Datasets

Page 18: NLP using Transformer Architectures

TensorFlow Datasets

tensorflow.org/datasets/catalog

Page 19: NLP using Transformer Architectures

"Das Leben ist wunderbar."

"Life is wonderful."

Translation

Page 20: NLP using Transformer Architectures

"Das Leben ist wunderbar."

"Life is wonderful."

Translation

Workshop on Machine Translation (WMT)

datasets = tfds.load("wmt19_translate/de-en") # 10 GB!

Page 21: NLP using Transformer Architectures

Summarization

"Popular YouTubers raise 10 million dollars in 5 days to plant 10 million trees, in an effort to fight global warming."

Page 22: NLP using Transformer Architectures

Summarization

"Popular YouTubers raise 10 million dollars in 5 days to plant 10 million trees, in an effort to fight global warming."

datasets = tfds.load("multi_news")
Predict the abstract from news articles

Page 23: NLP using Transformer Architectures

Question Answering

Question: What is Southern California often abbreviated as?

Context: Southern California, often abbreviated SoCal, is a geographic and cultural region that generally comprises California's southernmost 10 counties. The region is traditionally described as "eight counties", based on demographics and [...]

Answer: SoCal

Page 24: NLP using Transformer Architectures

datasets = tfds.load("squad")

Question Answering

Question: What is Southern California often abbreviated as?

Context: Southern California, often abbreviated SoCal, is a geographic and cultural region that generally comprises California's southernmost 10 counties. The region is traditionally described as "eight counties", based on demographics and [...]

Answer: SoCal

Stanford Question Answering Dataset (SQuAD)

Page 25: NLP using Transformer Architectures

Semantic Equivalence

"Alice could not find her keys."
"Alice lost her keys."
?

Page 26: NLP using Transformer Architectures

Semantic Equivalence

"Alice could not find her keys."
"Alice lost her keys."
?

datasets = tfds.load("glue/mrpc")
Microsoft Research Paraphrase Corpus (MRPC)

Page 27: NLP using Transformer Architectures

Entailment

"Alice got her keys back."
"Alice lost her keys, but Betty found them and returned them to her."

Page 28: NLP using Transformer Architectures

Entailment

"Alice got her keys back."
"Alice lost her keys, but Betty found them and returned them to her."
Entails

Page 29: NLP using Transformer Architectures

Entailment

"Alice lost her keys forever."
"Alice lost her keys, but Betty found them and returned them to her."
Contradicts

Page 30: NLP using Transformer Architectures

Entailment

"Alice and Betty are sisters."
"Alice lost her keys, but Betty found them and returned them to her."
Neutral

Page 31: NLP using Transformer Architectures

Entailment

"Alice and Betty are sisters."

datasets = tfds.load("snli/plain_text")
Stanford Natural Language Inference (SNLI) dataset

"Alice lost her keys, but Betty found them and returned them to her."
Neutral

Page 32: NLP using Transformer Architectures

Coreference Resolution

"Alice lost her keys, but Betty found them and returned them to her."

Page 33: NLP using Transformer Architectures

Coreference Resolution

"Alice lost her keys, but Betty found them and returned them to her."

Alice Betty

Page 34: NLP using Transformer Architectures

Coreference Resolution

"Alice lost her keys, but Betty found them and returned them to her."

Alice Betty

Page 35: NLP using Transformer Architectures

Coreference Resolution

datasets = tfds.load("gap")
GAP (Gendered Ambiguous Pronouns) dataset

"Alice lost her keys, but Betty found them and returned them to her."

Alice Betty

Page 36: NLP using Transformer Architectures

Information Extraction

(Diagram: a knowledge graph extracted from text, linking the entities Alice, Bob, Charlie, Eve and London with relations such as "born in", "married to" and "friends".)

Page 37: NLP using Transformer Architectures

Speech to Text

Page 38: NLP using Transformer Architectures

Text to Speech

Page 39: NLP using Transformer Architectures

Optical Character Recognition

Page 40: NLP using Transformer Architectures

Metrics

Source: “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer” by Raffel, Shazeer, Roberts, Lee et al., https://arxiv.org/abs/1910.10683

Page 41: NLP using Transformer Architectures

Transformer Architecture

Page 42: NLP using Transformer Architectures

Attention Is All You Need

https://arxiv.org/abs/1706.03762

Page 43: NLP using Transformer Architectures

Attention Is All You Need

Source: Figure 1 from the paper“Attention Is All You Need” byVaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser and Polosukhin https://arxiv.org/abs/1706.03762
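The core operation in the paper is scaled dot-product attention: Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. A minimal NumPy sketch of that equation (illustrative only, not the paper's actual implementation; the shapes below are made up):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V,
    as defined in "Attention Is All You Need" (eq. 1)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V  # weighted sum of the values

Q = np.random.randn(3, 8)  # 3 queries, d_k = 8
K = np.random.randn(5, 8)  # 5 keys
V = np.random.randn(5, 8)  # 5 values
out = scaled_dot_product_attention(Q, K, V)  # shape (3, 8)
```

The √d_k scaling keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with tiny gradients.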

Page 44: NLP using Transformer Architectures

GPT-2 (Feb 2019)

BERT (Oct 2018)

Page 45: NLP using Transformer Architectures

Transformer-XL (Jan 2019)

XLNet (Jun 2019)

Page 46: NLP using Transformer Architectures

RoBERTa (Jul 2019)

T5 (Oct 2019)

Page 47: NLP using Transformer Architectures

LSTM Encoder/Decoder

“I drink milk”

Page 48: NLP using Transformer Architectures

BiLSTM Encoder/Decoder

“I drink milk”

Page 49: NLP using Transformer Architectures

Attention Mechanisms (Sep 2014)

Page 50: NLP using Transformer Architectures

Encoder/Decoder with Attention

“I drink milk”

bois

Page 51: NLP using Transformer Architectures

“I drink milk”

Encoder/Decoder with Attention

lait

Page 52: NLP using Transformer Architectures

“I drink milk”

Encoder/Decoder with Attention

lait

Page 53: NLP using Transformer Architectures

“I drink milk”

Encoder/Decoder with Attention

Page 54: NLP using Transformer Architectures

I

Attention Mechanism

milk

drink

Page 55: NLP using Transformer Architectures

I

Attention Mechanism

milk

Verb

Non-verb

Food-related

Not food-related

drink

Page 56: NLP using Transformer Architectures

“I drink milk”

Encoder/Decoder with Attention

Page 57: NLP using Transformer Architectures

I

Attention Mechanism

drink

milk

Verb

Non-verb

Food-related

Not food-related

“I drink”

Page 58: NLP using Transformer Architectures

“I drink milk”

Encoder/Decoder with Attention

Page 59: NLP using Transformer Architectures

I

Attention Mechanism

drink

milk

Verb

Non-verb

Food-related

Not food-related

Query

“Je bois le <?>”

Page 60: NLP using Transformer Architectures

Query

I

Attention Mechanism

drink

milk

Verb

Non-verb

Food-related

Not food-related

0.9

0.05

0.05
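The weights on this slide (0.9 on one word, 0.05 on each of the others) are then used to blend the encoder outputs into a single context vector for the next decoder step. A NumPy sketch of that weighted sum — the encoder vectors are made-up 4-dimensional values, and placing the 0.9 on "milk" (the word the query "Je bois le <?>" is looking for) is my illustrative reading of the slide:

```python
import numpy as np

# Hypothetical encoder outputs for "I", "drink", "milk" (4-dim for illustration).
encoder_outputs = np.array([
    [0.1, 0.9, 0.0, 0.2],   # I
    [0.8, 0.1, 0.7, 0.0],   # drink
    [0.0, 0.3, 0.9, 0.8],   # milk
])

# Attention weights: the query attends mostly to "milk".
weights = np.array([0.05, 0.05, 0.9])

# The context vector fed to the next decoder step is the weighted sum.
context = weights @ encoder_outputs
```

Because the weights sum to 1, the context vector stays on the same scale as the encoder outputs while being dominated by the attended word.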

Page 61: NLP using Transformer Architectures

Input to next step

I

Attention Mechanism

drink

milk

Verb

Non-verb

Food-related

Not food-related

Page 62: NLP using Transformer Architectures

“I drink milk”

Encoder/Decoder with Attention

lait

Page 63: NLP using Transformer Architectures

“I drink milk”

Attention Is All You Need

lait

Page 64: NLP using Transformer Architectures

Attention Is All You Need

The bat was sleeping

Page 65: NLP using Transformer Architectures

Attention Is All You Need

The bat was sleeping?

Page 66: NLP using Transformer Architectures

Attention Is All You Need

The bat was sleeping

Page 67: NLP using Transformer Architectures

Attention Is All You Need

The bat was sleeping

The bat was sleeping

Attention layer

Page 68: NLP using Transformer Architectures

Attention Is All You Need

The bat was sleeping

The bat was sleeping

Attention layer

Page 69: NLP using Transformer Architectures

Figure 1 from the paper

Page 70: NLP using Transformer Architectures

Figure 1 from the paper (simplified)

Encoder

Decoder

Page 71: NLP using Transformer Architectures

Figure 1 from the paper (simplified)

Encoder

Decoder

Page 72: NLP using Transformer Architectures

Figure 1 from the paper (simplified)

Encoder stack: Encoder 1, Encoder 2, Encoder 3, Encoder 4, Encoder 5, Encoder 6

Decoder stack: Decoder 1, Decoder 2, Decoder 3, Decoder 4, Decoder 5, Decoder 6


Page 74: NLP using Transformer Architectures

Attention Is All You Need

The bat was sleeping

Page 75: NLP using Transformer Architectures

Attention Is All You Need

The bat was sleeping

The bat was sleeping

Page 76: NLP using Transformer Architectures

Attention Is All You Need

The bat was sleeping

The bat was sleeping

Page 77: NLP using Transformer Architectures

Attention Is All You Need

The bat was sleeping

The bat was sleeping

The bat was sleeping

Page 78: NLP using Transformer Architectures

Attention Is All You Need

The bat was sleeping

The bat was sleeping

The bat was sleeping

The bat was sleeping

Page 79: NLP using Transformer Architectures

Attention Is All You Need

The bat was sleeping

The bat was sleeping

The bat was sleeping

The bat was sleeping

<sos> La chauve souris dort

Page 80: NLP using Transformer Architectures

Attention Is All You Need

The bat was sleeping

The bat was sleeping

The bat was sleeping

The bat was sleeping

<sos> La chauve souris dort

<sos> La chauve souris dort

Page 81: NLP using Transformer Architectures

Attention Is All You Need

The bat was sleeping

The bat was sleeping

The bat was sleeping

The bat was sleeping

<sos> La chauve souris dort

<sos> La chauve souris dort

<sos> La chauve souris dort

<sos> La chauve souris dort

Page 82: NLP using Transformer Architectures

Attention Is All You Need

The bat was sleeping

The bat was sleeping

The bat was sleeping

The bat was sleeping

<sos> La chauve souris dort

<sos> La chauve souris dort

<sos> La chauve souris dort

<sos> La chauve souris dort

La chauve souris dort <eos>

Page 83: NLP using Transformer Architectures

Attention Is All You Need

The bat was sleeping

The bat was sleeping

The bat was sleeping

The bat was sleeping

<sos>

<sos>

<sos>

<sos>

La

Page 84: NLP using Transformer Architectures

Attention Is All You Need

The bat was sleeping

The bat was sleeping

The bat was sleeping

The bat was sleeping

<sos> La

<sos>

<sos>

<sos>

La

Page 85: NLP using Transformer Architectures

Attention Is All You Need

The bat was sleeping

The bat was sleeping

The bat was sleeping

The bat was sleeping

<sos> La

<sos> La

<sos> La

<sos> La

La chauve

Page 86: NLP using Transformer Architectures

Attention Is All You Need

The bat was sleeping

The bat was sleeping

The bat was sleeping

The bat was sleeping

<sos> La chauve

<sos> La

<sos> La

<sos> La

La chauve

Page 87: NLP using Transformer Architectures

Attention Is All You Need

The bat was sleeping

The bat was sleeping

The bat was sleeping

The bat was sleeping

<sos> La chauve souris dort

<sos> La chauve souris dort

<sos> La chauve souris dort

<sos> La chauve souris dort

La chauve souris dort <eos>
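The slides above show greedy autoregressive decoding: the decoder is fed its own previous outputs, one token at a time, until it emits <eos>. A toy sketch of that loop — `next_token` is a stub standing in for a real Transformer decoder, hard-wired to produce the slide's translation:

```python
# Toy sketch of the greedy decoding loop from the slides. `next_token`
# is a hypothetical stub standing in for a trained Transformer decoder.
TARGET = ["La", "chauve", "souris", "dort", "<eos>"]

def next_token(encoder_input, decoder_input):
    """Stub model: returns the next target token given the prefix so far."""
    return TARGET[len(decoder_input) - 1]  # -1 skips the <sos> token

def greedy_decode(encoder_input, max_len=10):
    decoder_input = ["<sos>"]
    while len(decoder_input) < max_len:
        token = next_token(encoder_input, decoder_input)
        if token == "<eos>":
            break
        decoder_input.append(token)  # feed the output back in as input
    return decoder_input[1:]  # drop <sos>

print(greedy_decode("The bat was sleeping"))  # ['La', 'chauve', 'souris', 'dort']
```

A real system would typically replace this greedy loop with beam search, keeping several candidate prefixes instead of one.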

Page 88: NLP using Transformer Architectures

Multi-Head Attention

Query

I

drink

milk

Verb

Non-verb

Food-related

Not food-related

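Multi-head attention runs several attention heads in parallel over different learned projections of the inputs — one head might track something verb-like, another something food-related — and concatenates their outputs. A NumPy sketch with random (untrained) projection matrices, purely to show the mechanics:

```python
import numpy as np

rng = np.random.default_rng(42)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, n_heads=2, d_model=8):
    """Illustrative multi-head self-attention; a real layer would *learn*
    the projection matrices W_q, W_k, W_v, W_o instead of sampling them."""
    d_head = d_model // n_heads
    heads = []
    for _ in range(n_heads):
        W_q = rng.standard_normal((d_model, d_head))
        W_k = rng.standard_normal((d_model, d_head))
        W_v = rng.standard_normal((d_model, d_head))
        Q, K, V = X @ W_q, X @ W_k, X @ W_v
        weights = softmax(Q @ K.T / np.sqrt(d_head))
        heads.append(weights @ V)  # each head attends with its own projections
    W_o = rng.standard_normal((d_model, d_model))
    return np.concatenate(heads, axis=-1) @ W_o  # concat heads, project back

X = rng.standard_normal((3, 8))  # embeddings for "I", "drink", "milk"
out = multi_head_attention(X)    # shape (3, 8)
```

Because each head works in a smaller d_head-dimensional subspace, the total cost is comparable to a single full-width attention while letting the heads specialize.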

Page 94: NLP using Transformer Architectures

Language Models

Page 95: NLP using Transformer Architectures

Pretraining + Fine-tuning

ULMFiT (Jan 2018)

ELMo (Feb 2018)

Page 96: NLP using Transformer Architectures

BERT

● Large number of parameters
● Pretraining on a huge corpus
● Subword tokenization
● Single architecture for multiple tasks

Page 97: NLP using Transformer Architectures

Subword Tokenization

Going on computerless vacation

Going on computer ##less vacation
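Subword tokenizers like BERT's WordPiece split rare words into known pieces by greedy longest match, marking word-internal pieces with "##". A toy sketch — the mini-vocabulary below is hypothetical; real BERT uses a learned vocabulary of roughly 30,000 pieces:

```python
# Toy greedy longest-match tokenizer in the WordPiece style BERT uses.
# The mini-vocabulary is hypothetical; real BERT has ~30,000 entries.
VOCAB = {"going", "on", "computer", "##less", "vacation", "##s"}

def wordpiece(word):
    """Split one lowercase word into subwords via greedy longest match."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # mark word-internal pieces
            if piece in VOCAB:
                pieces.append(piece)
                break
            end -= 1  # shrink the candidate until it is in the vocabulary
        else:
            return ["[UNK]"]  # no piece matched at this position
        start = end
    return pieces

print(wordpiece("computerless"))  # ['computer', '##less']
```

This is why an unseen word like "computerless" never becomes an out-of-vocabulary token: it decomposes into pieces the model already has embeddings for.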

Page 98: NLP using Transformer Architectures

BERT — Figure 4 From Paper

Page 99: NLP using Transformer Architectures

BERT — Figure 4 From Paper

Page 100: NLP using Transformer Architectures

GPT-2

Page 101: NLP using Transformer Architectures

Transformer-XL

Page 102: NLP using Transformer Architectures

XLNet

Page 103: NLP using Transformer Architectures

RoBERTa

Page 104: NLP using Transformer Architectures

T5

Page 105: NLP using Transformer Architectures

T5 — Figure 1 From Paper

Page 106: NLP using Transformer Architectures

Get Coding!

https://github.com/huggingface/transformers
https://medium.com/@lysandrejik

Page 107: NLP using Transformer Architectures

Thank you!