T5 and large language models: The good, the bad, and the ugly

Colin Raffel

Source: web.stanford.edu/class/cs224n/slides/cs224n-2021-lecture... · 2021. 4. 15.




Which transfer learning methods work best, and what happens when we scale them up?

What about non-English pre-trained models?

How much knowledge does the model learn during pre-training?

Does the model memorize data during pre-training?

Which Transformer modifications work best?


The cabs ____ the same rates as those ____ by horse-drawn cabs and were ____ quite popular, ____ the Prince of Wales (the ____ King Edward VII) travelled in ____. The cabs quickly ____ known as "hummingbirds" for ____ noise made by their motors and their distinctive black and ____ livery. Passengers ____ ____ the interior fittings were ____ when compared to ____ cabs but there ____ some complaints ____ the ____ lighting made them too ____ to those outside ____.

charged, used, initially, even, future, became, the, yellow, reported, that, luxurious, horse-drawn, were, that, internal, conspicuous, cab

Unsupervised pre-training

This movie is terrible! The acting is bad and I was bored the entire time. There was no plot and nothing interesting happened. I was really surprised since I had very high expectations. I want 103 minutes of my life back!

negative

Supervised fine-tuning
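The two halves of this slide can be sketched in code. Below, a hypothetical helper `make_fill_in_the_blank` turns unlabeled text into an (input, target) pair by blanking out random words, mirroring the cloze example above. The 15% drop rate and the `____` placeholder are illustrative assumptions, not the exact pre-training objective. A supervised fine-tuning example, by contrast, is simply a human-labeled pair such as (movie review, "negative").

```python
import random


def make_fill_in_the_blank(
    text: str, drop_rate: float = 0.15, seed: int = 0
) -> tuple[str, list[str]]:
    """Blank out a random subset of words (hypothetical helper, for illustration).

    Returns the corrupted text and the list of removed words, in order.
    """
    rng = random.Random(seed)  # seeded so the corruption is reproducible
    corrupted, removed = [], []
    for word in text.split():
        if rng.random() < drop_rate:
            corrupted.append("____")  # placeholder shown to the model
            removed.append(word)      # what the model must predict
        else:
            corrupted.append(word)
    return " ".join(corrupted), removed


# Unsupervised pre-training: the targets come from the text itself.
inputs, targets = make_fill_in_the_blank(
    "The cabs charged the same rates as those used by horse-drawn cabs", seed=3
)

# Supervised fine-tuning: the target comes from a human label.
labeled_example = (
    "This movie is terrible! The acting is bad and I was bored the entire time.",
    "negative",
)
```

The key difference is where the target comes from: pre-training manufactures targets from raw text, so it can use as much unlabeled data as is available, while fine-tuning needs labels.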


SQuAD Exact Match score (validation set)

Source: https://paperswithcode.com/sota/question-answering-on-squad11-dev
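The metric on this chart can be sketched as follows. This is modeled on the answer normalization used by the official SQuAD v1.1 evaluation script (lowercasing, removing punctuation and the articles a/an/the, collapsing whitespace); the function names here are our own, and a prediction counts as correct if it matches any of the reference answers for that question.

```python
import re
import string

_PUNCTUATION = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lowercase, drop punctuation and the articles a/an/the, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCTUATION)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, references: list[str]) -> float:
    """1.0 if the normalized prediction equals any normalized reference answer."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(ref) for ref in references))


def exact_match_score(predictions: list[str], references: list[list[str]]) -> float:
    """Dataset-level Exact Match, as a percentage."""
    scores = [exact_match(p, refs) for p, refs in zip(predictions, references)]
    return 100.0 * sum(scores) / len(scores)


print(exact_match_score(["Denver Broncos", "1996"], [["the Denver Broncos"], ["1997"]]))
# prints 50.0
```

Exact Match is strict: an answer that overlaps the reference but adds or drops a word scores 0, which is why it is usually reported alongside a softer token-overlap F1.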


Source: https://paperswithcode.com/sota/question-answering-on-squad11-dev

Transfer learning

No transfer learning


Source: https://paperswithcode.com/sota/question-answering-on-squad11-dev

BERT

T5


Timeline, 2014 to 2020: word2vec, ELMo, ULMFiT, BERT, MASS, GPT-1, Semi-Supervised Sequence Learning, StructBERT, FreeLB, ALBERT, SpanBERT, RoBERTa, XLNet, MT-DNN, BERT on STILTs, Unsupervised sentiment neuron


Timeline, 2014 to 2020: word2vec, ELMo, ULMFiT, Semi-Supervised Sequence Learning, Unsupervised sentiment neuron, followed by a bracket labeled "Lots of stuff!"


- Paper A proposes an unsupervised pre-training technique called "FancyLearn".

- Paper B proposes another pre-training technique called "FancierLearn" and achieves better results.

- Paper A uses Wikipedia for unlabeled data.

- Paper B uses Wikipedia and the Toronto Books Corpus.

- Is FancierLearn better than FancyLearn?


- Paper A proposes an unsupervised pre-training technique called "FancyLearn".

- Paper B proposes another pre-training technique called "FancierLearn" and achieves better results.

- Paper A uses a model with 100 million parameters.

- Paper B uses a model with 200 million parameters.

- Is FancierLearn better than FancyLearn?


- Paper A proposes an unsupervised pre-training technique called "FancyLearn".

- Paper B proposes another pre-training technique called "FancierLearn" and achieves better results.

- Paper A pre-trains on 100 billion tokens of unlabeled data.

- Paper B pre-trains on 200 billion tokens of unlabeled data.

- Is FancierLearn better than FancyLearn?


- Paper A proposes an unsupervised pre-training technique called "FancyLearn".

- Paper B proposes another pre-training technique called "FancierLearn" and achieves better results.

- Paper A uses the Adam optimizer.

- Paper B uses SGD with momentum.

- Is FancierLearn better than FancyLearn?


Given the current landscape of transfer learning for NLP, what works best? And how far can we push the tools we already have?


T5: Text-to-Text Transfer Transformer


"translate English to German: That is good." T5 "Das ist gut."


"cola sentence: The course is jumping well." T5 "not acceptable"

"stsb sentence1: The rhino grazed on the grass. sentence2: A rhino is grazing in a field." T5 "3.8"


"summarize: state authorities dispatched emergency crews tuesday to survey the damage after an onslaught of severe weather in mississippi…"

T5 "six people hospitalized after a storm in attala county."

"translate English to German: That is good."

"cola sentence: The course is jumping well."

"summarize: state authorities dispatched emergency crews tuesday to survey the damage after an onslaught of severe weather in mississippi…"

"stsb sentence1: The rhino grazed on the grass. sentence2: A rhino is grazing in a field."

T5

"Das ist gut."

"not acceptable"

"six people hospitalized after a storm in attala county."

"3.8"

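Casting every task as "text in, text out" can be sketched as a handful of formatting functions. The task prefixes below are copied from the slide examples; the function names are hypothetical, and the way the STS-B similarity score is turned into a string (rounded to the nearest 0.2 and printed with one decimal) is an assumption of this sketch.

```python
def translate_en_de(english: str, german: str) -> tuple[str, str]:
    # Translation is already text-to-text; only the task prefix is added.
    return f"translate English to German: {english}", german


def cola(sentence: str, acceptable: bool) -> tuple[str, str]:
    # The class label becomes a word (or phrase) the model must generate.
    label = "acceptable" if acceptable else "not acceptable"
    return f"cola sentence: {sentence}", label


def stsb(sentence1: str, sentence2: str, score: float) -> tuple[str, str]:
    # A regression target is written out as a string. Rounding to the nearest
    # 0.2 is an assumption of this sketch, so targets come from a small set.
    rounded = round(score * 5) / 5
    return f"stsb sentence1: {sentence1} sentence2: {sentence2}", f"{rounded:.1f}"


def summarize(document: str, summary: str) -> tuple[str, str]:
    return f"summarize: {document}", summary


examples = [
    translate_en_de("That is good.", "Das ist gut."),
    cola("The course is jumping well.", acceptable=False),
    stsb("The rhino grazed on the grass.", "A rhino is grazing in a field.", 3.8),
]
for source, target in examples:
    print(f"{source!r} -> {target!r}")
```

Because every task now yields a (string, string) pair, a single sequence-to-sequence model can be trained on all of them with the same loss and decoded the same way; only the prefix tells the model which task it is doing.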

Source: http://jalammar.github.io/illustrated-transformer/


== treaty of paris (1763)

the treaty of paris, also known as the treaty of 1763, was signed on 10 february 1763 by the kingdoms of great britain, france and spain, with portugal in agreement, after great britain's victory over france and spain during the seven years' war.

the signing of the treaty formally ended the seven years' war, known as the french and indian war in the north american theatre,[1] and marked the beginning of an era of british dominance outside europe.[2] great britain and france each returned much of the territory that they had captured during the war, but great britain gained much of france's possessions in north america. additionally, great britain agreed to protect roman catholicism in the new world...

== wheelbarrow

a wheelbarrow is a small hand-propelled vehicle, usually with just one wheel, designed to be pushed and guided by a single person using two handles at the rear, or by a sail to push the ancient wheelbarrow by wind. the term "wheelbarrow" is made of two words: "wheel" and "barrow." "barrow" is a derivation of the old english "bearwe" which was a device used for carrying loads.

the wheelbarrow is designed to distribute the weight of its load between the wheel and the operator, so enabling the convenient carriage of heavier and bulkier loads than would be possible were the weight carried entirely by the operator. as such it is a second-class lever...

== lemon

the lemon, citrus limon (l.) osbeck, is a species of small evergreen tree in the flowering plant family rutaceae, native to south asia, primarily north eastern india.

the tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses.[2] the pulp and rind (zest) are also used in cooking and baking. the juice of the lemon is about 5% to 6% citric acid, with a ph of around 2.2, giving it a sour taste. the distinctive sour taste of lemon juice makes it a key ingredient in drinks and foods such as lemonade and lemon meringue pie...

== lemon

the lemon, citrus limon (l.) osbeck, is a species of small evergreen tree in the flowering plant family rutaceae, native to south asia, primarily north eastern india.

the tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses.[2] the pulp and rind (zest) are also used in cooking and baking. the juice of the lemon is about 5% to 6% citric acid, with a ph of around 2.2, giving it a sour taste. the distinctive sour taste of lemon juice makes it a key ingredient in drinks and foods such as lemonade and lemon meringue pie...

== treaty of paris (1763)

the treaty of paris, also known as the treaty of 1763, was signed on 10 february 1763 by the kingdoms of great britain, france and spain, with portugal in agreement, after great britain's victory over france and spain during the seven years' war.

the signing of the treaty formally ended the seven years' war, known as the french and indian war in the north american theatre,[1] and marked the beginning of an era of british dominance outside europe.[2] great britain and france each returned much of the territory that they had captured during the war, but great britain gained much of france's possessions in north america. additionally, great britain agreed to protect roman catholicism in the new world...

== treaty of paris (1763)

the treaty of paris, also known as the treaty of 1763, was signed on 10 february 1763 by the kingdoms of great britain, france and spain, with portugal in agreement, after great britain's victory over france and spain during the seven years' war.

the signing of the treaty formally ended the seven years' war, known as the french and indian war in the north american theatre,[1] and marked the beginning of an era of british dominance outside europe.[2] great britain and france each returned much of the territory that they had captured during the war, but great britain gained much of france's possessions in north america. additionally, great britain agreed to protect roman catholicism in the new world...

== lemon

the lemon, citrus limon (l.) osbeck, is a species of small evergreen tree in the flowering plant family rutaceae, native to south asia, primarily north eastern india.

the tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses.[2] the pulp and rind (zest) are also used in cooking and baking. the juice of the lemon is about 5% to 6% citric acid, with a ph of around 2.2, giving it a sour taste. the distinctive sour taste of lemon juice makes it a key ingredient in drinks and foods such as lemonade and lemon meringue pie...

== wheelbarrow

a wheelbarrow is a small hand-propelled vehicle, usually with just one wheel, designed to be pushed and guided by a single person using two handles at the rear, or by a sail to push the ancient wheelbarrow by wind. the term "wheelbarrow" is made of two words: "wheel" and "barrow." "barrow" is a derivation of the old english "bearwe" which was a device used for carrying loads.

the wheelbarrow is designed to distribute the weight of its load between the wheel and the operator, so enabling the convenient carriage of heavier and bulkier loads than would be possible were the weight carried entirely by the operator. as such it is a second-class lever...

== treaty of paris (1763)

the treaty of paris, also known as the treaty of 1763, was signed on 10 february 1763 by the kingdoms of great britain, france and spain, with portugal in agreement, after great britain's victory over france and spain during the seven years' war.

the signing of the treaty formally ended the seven years' war, known as the french and indian war in the north american theatre,[1] and marked the beginning of an era of british dominance outside europe.[2] great britain and france each returned much of the territory that they had captured during the war, but great britain gained much of france's possessions in north america. additionally, great britain agreed to protect roman catholicism in the new world...

== wheelbarrow

a wheelbarrow is a small hand-propelled vehicle, usually with just one wheel, designed to be pushed and guided by a single person using two handles at the rear, or by a sail to push the ancient wheelbarrow by wind. the term "wheelbarrow" is made of two words: "wheel" and "barrow." "barrow" is a derivation of the old english "bearwe" which was a device used for carrying loads.

the wheelbarrow is designed to distribute the weight of its load between the wheel and the operator, so enabling the convenient carriage of heavier and bulkier loads than would be possible were the weight carried entirely by the operator. as such it is a second-class lever...

== treaty of paris (1763)

the treaty of paris, also known as the treaty of 1763, was signed on 10 february 1763 by the kingdoms of great britain, france and spain, with portugal in agreement, after great britain's victory over france and spain during the seven years' war.

the signing of the treaty formally ended the seven years' war, known as the french and indian war in the north american theatre,[1] and marked the beginning of an era of british dominance outside europe.[2] great britain and france each returned much of the territory that they had captured during the war, but great britain gained much of france's possessions in north america. additionally, great britain agreed to protect roman catholicism in the new world...

== lemon

the lemon, citrus limon (l.) osbeck, is a species of small evergreen tree in the flowering plant family rutaceae, native to south asia, primarily north eastern india.

the tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses.[2] the pulp and rind (zest) are also used in cooking and baking. the juice of the lemon is about 5% to 6% citric acid, with a ph of around 2.2, giving it a sour taste. the distinctive sour taste of lemon juice makes it a key ingredient in drinks and foods such as lemonade and lemon meringue pie...

== lemon

the lemon, citrus limon (l.) osbeck, is a species of small evergreen tree in the flowering plant family rutaceae, native to south asia, primarily north eastern india.

the tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses.[2] the pulp and rind (zest) are also used in cooking and baking. the juice of the lemon is about 5% to 6% citric acid, with a ph of around 2.2, giving it a sour taste. the distinctive sour taste of lemon juice makes it a key ingredient in drinks and foods such as lemonade and lemon meringue pie...

== treaty of paris (1763)

the treaty of paris, also known as the treaty of 1763, was signed on 10 february 1763 by the kingdoms of great britain, france and spain, with portugal in agreement, after great britain's victory over france and spain during the seven years' war.

the signing of the treaty formally ended the seven years' war, known as the french and indian war in the north american theatre,[1] and marked the beginning of an era of british dominance outside europe.[2] great britain and france each returned much of the territory that they had captured during the war, but great britain gained much of france's possessions in north america. additionally, great britain agreed to protect roman catholicism in the new world...

== lemon

the lemon, citrus limon (l.) osbeck, is a species of small evergreen tree in the flowering plant family rutaceae, native to south asia, primarily north eastern india.

the tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses.[2] the pulp and rind (zest) are also used in cooking and baking. the juice of the lemon is about 5% to 6% citric acid, with a ph of around 2.2, giving it a sour taste. the distinctive sour taste of lemon juice makes it a key ingredient in drinks and foods such as lemonade and lemon meringue pie...

== treaty of paris (1763)

the treaty of paris, also known as the treaty of 1763, was signed on 10 february 1763 by the kingdoms of great britain, france and spain, with portugal in agreement, after great britain's victory over france and spain during the seven years' war.

the signing of the treaty formally ended the seven years' war, known as the french and indian war in the north american theatre,[1] and marked the beginning of an era of british dominance outside europe.[2] great britain and france each returned much of the territory that they had captured during the war, but great britain gained much of france's possessions in north america. additionally, great britain agreed to protect roman catholicism in the new world...

== wheelbarrow

a wheelbarrow is a small hand-propelled vehicle, usually with just one wheel, designed to be pushed and guided by a single person using two handles at the rear, or by a sail to push the ancient wheelbarrow by wind. the term "wheelbarrow" is made of two words: "wheel" and "barrow." "barrow" is a derivation of the old english "bearwe" which was a device used for carrying loads.

the wheelbarrow is designed to distribute the weight of its load between the wheel and the operator, so enabling the convenient carriage of heavier and bulkier loads than would be possible were the weight carried entirely by the operator. as such it is a second-class lever...

== treaty of paris (1763)

the treaty of paris, also known as the treaty of 1763, was signed on 10 february 1763 by the kingdoms of great britain, france and spain, with portugal in agreement, after great britain's victory over france and spain during the seven years' war.

the signing of the treaty formally ended the seven years' war, known as the french and indian war in the north american theatre,[1] and marked the beginning of an era of british dominance outside europe.[2] great britain and france each returned much of the territory that they had captured during the war, but great britain gained much of france's possessions in north america. additionally, great britain agreed to protect roman catholicism in the new world...

== lemon

the lemon, citrus limon (l.) osbeck, is a species of small evergreen tree in the flowering plant family rutaceae, native to south asia, primarily north eastern india.

the tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses.[2] the pulp and rind (zest) are also used in cooking and baking. the juice of the lemon is about 5% to 6% citric acid, with a ph of around 2.2, giving it a sour taste. the distinctive sour taste of lemon juice makes it a key ingredient in drinks and foods such as lemonade and lemon meringue pie...

== treaty of paris (1763)

the treaty of paris, also known as the treaty of 1763, was signed on 10 february 1763 by the kingdoms of great britain, france and spain, with portugal in agreement, after great britain's victory over france and spain during the seven years' war.

the signing of the treaty formally ended the seven years' war, known as the french and indian war in the north american theatre,[1] and marked the beginning of an era of british dominance outside europe.[2] great britain and france each returned much of the territory that they had captured during the war, but great britain gained much of france's possessions in north america. additionally, great britain agreed to protect roman catholicism in the new world...

== treaty of paris (1763)

the treaty of paris, also known as the treaty of 1763, was signed on 10 february 1763 by the kingdoms of great britain, france and spain, with portugal in agreement, after great britain's victory over france and spain during the seven years' war.

the signing of the treaty formally ended the seven years' war, known as the french and indian war in the north american theatre,[1] and marked the beginning of an era of british dominance outside europe.[2] great britain and france each returned much of the territory that they had captured during the war, but great britain gained much of france's possessions in north america. additionally, great britain agreed to protect roman catholicism in the new world...

== treaty of paris (1763)

the treaty of paris, also known as the treaty of 1763, was signed on 10 february 1763 by the kingdoms of great britain, france and spain, with portugal in agreement, after great britain's victory over france and spain during the seven years' war.

the signing of the treaty formally ended the seven years' war, known as the french and indian war in the north american theatre,[1] and marked the beginning of an era of british dominance outside europe.[2] great britain and france each returned much of the territory that they had captured during the war, but great britain gained much of france's possessions in north america. additionally, great britain agreed to protect roman catholicism in the new world...

== wheelbarrow

a wheelbarrow is a small hand-propelled vehicle, usually with just one wheel, designed to be pushed and guided by a single person using two handles at the rear, or by a sail to push the ancient wheelbarrow by wind. the term "wheelbarrow" is made of two words: "wheel" and "barrow." "barrow" is a derivation of the old english "bearwe" which was a device used for carrying loads.

the wheelbarrow is designed to distribute the weight of its load between the wheel and the operator, so enabling the convenient carriage of heavier and bulkier loads than would be possible were the weight carried entirely by the operator. as such it is a second-class lever...

== wheelbarrow

a wheelbarrow is a small hand-propelled vehicle, usually with just one wheel, designed to be pushed and guided by a single person using two handles at the rear, or by a sail to push the ancient wheelbarrow by wind. the term "wheelbarrow" is made of two words: "wheel" and "barrow." "barrow" is a derivation of the old english "bearwe" which was a device used for carrying loads.

the wheelbarrow is designed to distribute the weight of its load between the wheel and the operator, so enabling the convenient carriage of heavier and bulkier loads than would be possible were the weight carried entirely by the operator. as such it is a second-class lever...

== lemon

the lemon, citrus limon (l.) osbeck, is a species of small evergreen tree in the flowering plant family rutaceae, native to south asia, primarily north eastern india.

the tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses.[2] the pulp and rind (zest) are also used in cooking and baking. the juice of the lemon is about 5% to 6% citric acid, with a ph of around 2.2, giving it a sour taste. the distinctive sour taste of lemon juice makes it a key ingredient in drinks and foods such as lemonade and lemon meringue pie...

== lemon

the lemon, citrus limon (l.) osbeck, is a species of small evergreen tree in the flowering plant family rutaceae, native to south asia, primarily north eastern india.

the tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses.[2] the pulp and rind (zest) are also used in cooking and baking. the juice of the lemon is about 5% to 6% citric acid, with a ph of around 2.2, giving it a sour taste. the distinctive sour taste of lemon juice makes it a key ingredient in drinks and foods such as lemonade and lemon meringue pie...

== oklahoma city

oklahoma city (/oʊkləˌhoʊmə -/), often shortened to okc, is the capital and largest city of the u.s. state of oklahoma. the county seat of oklahoma county,[8] the city ranks 27th among united states cities in population. the population grew following the 2010 census, with the population estimated to have increased to 643,648 as of july 2017.[5] as of 2015, the oklahoma city metropolitan area had a population of 1,358,452,[9] and the oklahoma city-shawnee combined statistical area had a population of 1,459,758 residents,[9] making it oklahoma's largest metropolitan area.

oklahoma city's city limits extend into canadian,...

== treaty of paris (1763)

the treaty of paris, also known as the treaty of 1763, was signed on 10 february 1763 by the kingdoms of great britain, france and spain, with portugal in agreement, after great britain's victory over france and spain during the seven years' war.

the signing of the treaty formally ended the seven years' war, known as the french and indian war in the north american theatre,[1] and marked the beginning of an era of british dominance outside europe.[2] great britain and france each returned much of the territory that they had captured during the war, but great britain gained much of france's possessions in north america. additionally, great britain agreed to protect roman catholicism in the new world...

== piano

the piano is an acoustic, stringed musical instrument invented in italy by bartolomeo cristofori around the year 1700 (the exact year is uncertain), in which the strings are struck by hammers. it is played using a keyboard,[1] which is a row of keys (small levers) that the performer presses down or strikes with the fingers and thumbs of both hands to cause the hammers to strike the strings.

the word piano is a shortened form of pianoforte, the italian term for the early 1700s versions of the instrument, which in turn derives from gravicembalo col piano e forte[2] and fortepiano. the italian musical terms piano and forte indicate "soft" and "loud" respectively,[3] in this context referring to the variations in volume ...

== running man (tv series)

running man was classified as an "urban action variety"; a genre of variety shows in an urban environment.[1] the mcs and guests were to complete missions at a landmark to win the race.[2] the show has since shifted to a more familiar reality-variety show concept focused on games. it has garnered attention as being the comeback program for yoo jae-suk, the main mc of the program, after leaving good sunday's family outing in february 2010.[3]

the show has become popular in other parts of asia, and has gained online popularity among hallyu fans, having been fansubbed into various languages, such as english, spanish, portuguese, french, italian, thai, vietnamese, chinese, ...

== wheelbarrow

a wheelbarrow is a small hand-propelled vehicle, usually with just one wheel, designed to be pushed and guided by a single person using two handles at the rear, or by a sail to push the ancient wheelbarrow by wind. the term "wheelbarrow" is made of two words: "wheel" and "barrow." "barrow" is a derivation of the old english "bearwe" which was a device used for carrying loads.

the wheelbarrow is designed to distribute the weight of its load between the wheel and the operator, so enabling the convenient carriage of heavier and bulkier loads than would be possible were the weight carried entirely by the operator. as such it is a second-class lever...

== wheelbarrow

a wheelbarrow is a small hand-propelled vehicle, usually with just one wheel, designed to be pushed and guided by a single person using two handles at the rear, or by a sail to push the ancient wheelbarrow by wind. the term "wheelbarrow" is made of two words: "wheel" and "barrow." "barrow" is a derivation of the old english "bearwe" which was a device used for carrying loads.

the wheelbarrow is designed to distribute the weight of its load between the wheel and the operator, so enabling the convenient carriage of heavier and bulkier loads than would be possible were the weight carried entirely by the operator. as such it is a second-class lever...

== lemon

the lemon, citrus limon (l.) osbeck, is a species of small evergreen tree in the flowering plant family rutaceae, native to south asia, primarily north eastern india.

the tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses.[2] the pulp and rind (zest) are also used in cooking and baking. the juice of the lemon is about 5% to 6% citric acid, with a ph of around 2.2, giving it a sour taste. the distinctive sour taste of lemon juice makes it a key ingredient in drinks and foods such as lemonade and lemon meringue pie...

== oklahoma city

oklahoma city (/oʊkləˌhoʊmə -/), often shortened to okc, is the capital and largest city of the u.s. state of oklahoma. the county seat of oklahoma county,[8] the city ranks 27th among united states cities in population. the population grew following the 2010 census, with the population estimated to have increased to 643,648 as of july 2017.[5] as of 2015, the oklahoma city metropolitan area had a population of 1,358,452,[9] and the oklahoma city-shawnee combined statistical area had a population of 1,459,758 residents,[9] making it oklahoma's largest metropolitan area.

oklahoma city's city limits extend into canadian,...

== treaty of paris (1763)

the treaty of paris, also known as the treaty of 1763, was signed on 10 february 1763 by the kingdoms of great britain, france and spain, with portugal in agreement, after great britain's victory over france and spain during the seven years' war.

the signing of the treaty formally ended the seven years' war, known as the french and indian war in the north american theatre,[1] and marked the beginning of an era of british dominance outside europe.[2] great britain and france each returned much of the territory that they had captured during the war, but great britain gained much of france's possessions in north america. additionally, great britain agreed to protect roman catholicism in the new world...

Page 22

Please enable JavaScript to use our site.

Home Products Shipping Contact FAQ

Dried Lemons, $3.59/pound

Organic dried lemons from our farm in California. Lemons are harvested and sun-dried for maximum flavor. Good in soups and on popcorn.

The lemon, Citrus Limon (l.) Osbeck, is a species of small evergreen tree in the flowering plant family rutaceae. The tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses. The juice of the lemon is about 5% to 6% citric acid, with a ph of around 2.2, giving it a sour taste.

Menu

Lemon

Introduction

The lemon, Citrus Limon (l.) Osbeck, is a species of small evergreen tree in the flowering plant family rutaceae. The tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses. The juice of the lemon is about 5% to 6% citric acid, with a ph of around 2.2, giving it a sour taste.

Article

The origin of the lemon is unknown, though lemons are thought to have first grown in Assam (a region in northeast India), northern Burma or China. A genomic study of the lemon indicated it was a hybrid between bitter orange (sour orange) and citron.

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur in tempus quam. In mollis et ante at consectetur. Aliquam erat volutpat. Donec at lacinia est. Duis semper, magna tempor interdum suscipit, ante elit molestie urna, eget efficitur risus nunc ac elit. Fusce quis blandit lectus. Mauris at mauris a turpis tristique lacinia at nec ante. Aenean in scelerisque tellus, a efficitur ipsum. Integer justo enim, ornare vitae sem non, mollis fermentum lectus. Mauris ultrices nisl at libero porta sodales in ac orci.

function Ball(r) { this.radius = r; this.area = pi * r ** 2; this.show = function(){ drawCircle(r); }}

Common Crawl Web Extracted Text

Page 24
Page 28

Original text: Thank you for inviting me to your party last week.

Inputs: Thank you <X> me to your party <Y> week.

Targets: <X> for inviting <Y> last <Z>
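The span-corruption example above can be reproduced with a short sketch. This is a minimal illustration, not T5's preprocessing code: it assumes whitespace tokenization and takes the corrupted spans as given (half-open word-index ranges), whereas T5 tokenizes with SentencePiece and chooses spans at random. The sentinel names <X>, <Y>, <Z> follow the slide; the released T5 vocabulary names its sentinels <extra_id_0>, <extra_id_1>, and so on.

```python
from typing import List, Sequence, Tuple

# Sentinel names as drawn on the slide (illustrative only).
SENTINELS = ("<X>", "<Y>", "<Z>", "<W>", "<V>")


def span_corrupt(tokens: Sequence[str],
                 spans: List[Tuple[int, int]]) -> Tuple[List[str], List[str]]:
    """Replace each half-open [start, end) span with a unique sentinel.

    Inputs keep the uncorrupted tokens plus one sentinel per span; targets
    list each sentinel followed by the tokens it replaced, and end with one
    extra sentinel. Spans must not overlap.
    """
    if len(spans) >= len(SENTINELS):
        raise ValueError("not enough sentinel tokens for this many spans")
    inputs: List[str] = []
    targets: List[str] = []
    pos = 0
    for i, (start, end) in enumerate(sorted(spans)):
        inputs.extend(tokens[pos:start])   # copy untouched tokens before the span
        inputs.append(SENTINELS[i])        # one sentinel stands in for the whole span
        targets.append(SENTINELS[i])
        targets.extend(tokens[start:end])  # the model must reconstruct these tokens
        pos = end
    inputs.extend(tokens[pos:])
    targets.append(SENTINELS[len(spans)])  # final sentinel closes the targets
    return inputs, targets


tokens = "Thank you for inviting me to your party last week.".split()
inputs, targets = span_corrupt(tokens, [(2, 4), (8, 9)])
print(" ".join(inputs))   # Thank you <X> me to your party <Y> week.
print(" ".join(targets))  # <X> for inviting <Y> last <Z>
```

Because each span collapses to a single sentinel, the targets contain only the dropped words, which keeps them much shorter than the full original sentence.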

Page 35

Pretrain: BERT-Base-sized encoder-decoder Transformer, C4 dataset, denoising objective; 2^19 steps (2^35 or ~34B tokens); inverse square root learning rate schedule

Finetune: GLUE, CNN/DM, SQuAD, SuperGLUE, WMT14 EnDe, WMT15 EnFr, WMT16 EnRo; 2^18 steps (2^34 or ~17B tokens); constant learning rate

Evaluate on validation: checkpoints at steps 750000, 760000, 770000, 780000; evaluate all checkpoints, choose the best
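The step and token counts on this slide are consistent with a batch of 128 sequences of 512 tokens (2^16 tokens per step), which is the batch shape reported in the T5 paper; the sketch below checks that arithmetic. It also sketches the inverse square root schedule as 1/sqrt(max(n, k)) with k = 10^4 warm-up steps, which is our reading of the T5 paper. Treat all three constants as assumptions rather than values stated on the slide.

```python
import math

BATCH_SIZE = 128       # assumption: sequences per batch
SEQ_LEN = 512          # assumption: tokens per sequence
WARMUP_STEPS = 10_000  # assumption: k in 1 / sqrt(max(n, k))


def tokens_seen(steps: int) -> int:
    """Total tokens processed after `steps` updates at a fixed batch shape."""
    return steps * BATCH_SIZE * SEQ_LEN


def inverse_sqrt_lr(step: int, warmup: int = WARMUP_STEPS) -> float:
    """Held constant at 1/sqrt(warmup) for the first `warmup` steps,
    then decays in proportion to 1/sqrt(step)."""
    return 1.0 / math.sqrt(max(step, warmup))


pretrain_tokens = tokens_seen(2**19)  # pre-training: 2**19 steps
finetune_tokens = tokens_seen(2**18)  # fine-tuning: 2**18 steps
print(pretrain_tokens == 2**35, f"{pretrain_tokens / 1e9:.1f}B")  # True 34.4B
print(finetune_tokens == 2**34, f"{finetune_tokens / 1e9:.1f}B")  # True 17.2B
print(inverse_sqrt_lr(0), inverse_sqrt_lr(40_000))                # 0.01 0.005
```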

Page 36

[Figure: downstream task performance for Setting 1, Setting 2, ...]

Page 37
Page 38

Star denotes baseline
Comparable to BERT
Bold = within 1 std. dev. of max

Big training set

Page 39

Disclaimer

Page 40

Encoder-decoder: encoder input x1 x2 x3 x4; decoder output y1 y2 .

Language model: input x1 x2 x3 y1 y2; output x2 x3 y1 y2 .

Prefix LM: input x1 x2 x3 y1 y2; output x2 x3 y1 y2 .
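One way to see the difference between these three architectures is through their self-attention masks. The sketch below assumes the usual convention that mask[i][j] is True when position i may attend to position j: a language model is causal everywhere, a prefix LM is fully visible over the prefix (here x1 x2 x3) and causal afterwards, and an encoder-decoder uses a fully visible mask in the encoder and a causal mask in the decoder (plus cross-attention to every encoder position, not shown).

```python
from typing import List

Mask = List[List[bool]]


def fully_visible_mask(n: int) -> Mask:
    """Encoder self-attention: every position can see every other position."""
    return [[True] * n for _ in range(n)]


def causal_mask(n: int) -> Mask:
    """Language model / decoder self-attention: position i sees only j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def prefix_lm_mask(n: int, prefix_len: int) -> Mask:
    """Prefix LM: the first `prefix_len` positions are visible to every position;
    the remaining positions are causal."""
    return [[j < prefix_len or j <= i for j in range(n)] for i in range(n)]


def show(mask: Mask, labels: List[str]) -> None:
    """Print a mask as a grid: '1' = may attend, '.' = blocked."""
    for label, row in zip(labels, mask):
        print(f"{label:>3} " + " ".join("1" if v else "." for v in row))


labels = ["x1", "x2", "x3", "y1", "y2"]
show(causal_mask(5), labels)
print()
show(prefix_lm_mask(5, prefix_len=3), labels)
```

The only change from the causal grid to the prefix-LM grid is that the upper-right corner of the prefix block becomes visible, so x1 can already use x2 and x3.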

Page 41

High-level approaches: BERT-style, Deshuffling, Language modeling

Corruption strategies: Mask, Drop, Replace spans

Corruption rate: 10%, 15%, 25%, 50%

Corrupted span length: 2, 3, 5, 10
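The three corruption strategies differ only in what the model sees in place of the corrupted tokens. The sketch below applies each one to the same word positions; the positions are supplied by hand here, whereas in the experiments they would be chosen at random according to the corruption rate and span length listed above. The mask token <M> and the sentinel names are illustrative assumptions.

```python
from typing import Iterable, List, Sequence

MASK = "<M>"
SENTINELS = ("<X>", "<Y>", "<Z>", "<W>", "<V>")  # illustrative sentinel names


def corrupt(tokens: Sequence[str], positions: Iterable[int], strategy: str) -> List[str]:
    """Apply one corruption strategy to the given token positions."""
    corrupted = set(positions)
    if strategy == "mask":
        # Replace every corrupted token with the same mask token.
        return [MASK if i in corrupted else t for i, t in enumerate(tokens)]
    if strategy == "drop":
        # Delete corrupted tokens outright; the model must infer where they were.
        return [t for i, t in enumerate(tokens) if i not in corrupted]
    if strategy == "replace_spans":
        # Collapse each run of consecutive corrupted tokens into one unique sentinel.
        out: List[str] = []
        n_spans = 0
        for i, t in enumerate(tokens):
            if i not in corrupted:
                out.append(t)
            elif i - 1 not in corrupted:  # first token of a new span
                out.append(SENTINELS[n_spans])
                n_spans += 1
        return out
    raise ValueError(f"unknown strategy: {strategy!r}")


tokens = "Thank you for inviting me to your party last week.".split()
for strategy in ("mask", "drop", "replace_spans"):
    print(f"{strategy:>13}: {' '.join(corrupt(tokens, [2, 3, 8], strategy))}")
```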

Page 42

[Figure: six copies of the "Dried Lemons" example web page, annotated as follows]

Much better on MultiRC

Much better on ReCoRD

Much worse on CoLA

Order of magnitude smaller

Page 43
Page 44

[Figure: per-task mixing weight for different temperatures (T) and thresholds (K)]
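Assuming the threshold K and temperature T on this slide refer to the examples-proportional and temperature-scaled mixing strategies described in the T5 paper, the mixing weights can be sketched as follows: each task's weight is proportional to its example count capped at K, and temperature scaling raises each weight to the power 1/T and renormalizes, so T = 1 leaves the weights unchanged and larger T pushes them toward uniform. The task names and sizes below are made up for illustration.

```python
from typing import Dict


def examples_proportional(sizes: Dict[str, int], K: int) -> Dict[str, float]:
    """Mixing rate proportional to min(dataset size, K)."""
    capped = {task: min(n, K) for task, n in sizes.items()}
    total = sum(capped.values())
    return {task: c / total for task, c in capped.items()}


def temperature_scaled(rates: Dict[str, float], T: float) -> Dict[str, float]:
    """Raise each rate to 1/T and renormalize; T=1 is a no-op, large T -> uniform."""
    scaled = {task: r ** (1.0 / T) for task, r in rates.items()}
    total = sum(scaled.values())
    return {task: s / total for task, s in scaled.items()}


# Hypothetical task sizes, for illustration only.
sizes = {"A": 1_000_000, "B": 10_000, "C": 1_000}
base = examples_proportional(sizes, K=100_000)
for T in (1, 2, 8):
    rates = temperature_scaled(base, T)
    print(T, {task: round(r, 3) for task, r in rates.items()})
```

Capping at K stops the largest task from dominating the mixture, and raising T redistributes even more of the weight toward the small tasks.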

Page 45

[Figure: training arrangements combining Task A, Task B, Task C, and an unsupervised task]

Page 46
Page 47

Encoder-decoder architecture

Span prediction objective

C4 dataset

Multi-task pre-training

Bigger models trained longer

Page 48

Model size variants

Page 49

Human score = 89.8

Back-translation beats English-only pre-training

Page 50

https://github.com/google-research/text-to-text-transfer-transformer

Page 51

http://tiny.cc/t5-colab

Page 52

What about all of the other languages?

Page 53
Page 54

Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Burmese, Catalan, Cebuano, Chichewa, Chinese, Corsican, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hmong, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Scottish Gaelic, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Sotho, Spanish, Sundanese, Swahili, Swedish, Tajik, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, West Frisian, Xhosa, Yiddish, Yoruba, Zulu.

Page 55

English: 3B pages, 3T tokens

Yoruba: 50K pages, 50M tokens

Slide from Noah Constant

Page 56

Slide from Noah Constant

Page 57

XNLI zero-shot accuracy

α      Urdu   Russian
0.2    73.9   81.2
0.3    73.5   81.5
0.7    71.7   82.8

Slide from Noah Constant
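Assuming α here is the exponent in the mT5-style language sampling rule p(L) ∝ |L|^α, where |L| is the amount of data in language L, the sketch below shows why a smaller α helps low-resource languages such as Urdu at some cost to high-resource ones such as Russian. With the English and Yoruba token counts from the earlier slide, α = 1 would sample English 60,000 times more often than Yoruba, while α = 0.3 shrinks that ratio to roughly 27.

```python
from typing import Dict


def language_sampling_probs(token_counts: Dict[str, float], alpha: float) -> Dict[str, float]:
    """Sample language L with probability proportional to |L| ** alpha.

    alpha = 1 follows the raw data distribution; alpha = 0 is uniform.
    """
    weights = {lang: n ** alpha for lang, n in token_counts.items()}
    total = sum(weights.values())
    return {lang: w / total for lang, w in weights.items()}


# Token counts from the slide: English ~3T, Yoruba ~50M.
counts = {"English": 3e12, "Yoruba": 5e7}
for alpha in (1.0, 0.7, 0.3, 0.2):
    p = language_sampling_probs(counts, alpha)
    print(f"alpha={alpha}: English/Yoruba sampling ratio = {p['English'] / p['Yoruba']:,.0f}")
```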

Page 58

Slide from Noah Constant

Page 59

TyDi QA GoldP Performance

Slide from Noah Constant

Page 60

How much knowledge does a language model pick up during pre-training?

Page 61

Reading Comprehension

Context: "The lemon tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses. The pulp and rind are also used in cooking and baking."

Question: "What color is a lemon?"

Model output: yellow

Page 62

Open-Domain Question Answering

Question: "What color is a lemon?"

Database: "The lemon tree's ellipsoidal yellow fruit is used for culinary and non-culinary purposes throughout the world, primarily for its juice, which has both culinary and cleaning uses. The pulp and rind are also used in cooking and baking."

Model output: yellow

Page 63

Closed-Book Question Answering

Question: "What color is a lemon?"

Model output: yellow

Page 64

Pre-training:

Original text: President Franklin D. Roosevelt was born in January 1882.

President Franklin <M> born <M> January 1882. → T5 → D. Roosevelt was <M> in

Our <M> hand-picked and sun-dried <M> orchard in Georgia. → T5 → peaches are <M> at our

Lily couldn't <M>. The waitress had brought the largest <M> of chocolate cake <M> seen. → T5 → believe her eyes <M> piece <M> she had ever

Fine-tuning:

When was Franklin D. Roosevelt born? → T5 → 1882

Page 65
Page 66

<M> (born 1957) is a Spanish librarian who has been the director of the National Library of Spain since February 2013. → T5 → Ana Santos Aramburo

Salient span masking (SSM) data from "REALM: Retrieval-Augmented Language Model Pre-Training" by Guu et al.

Page 67
Page 68
Page 69

Page 70

[Figure: share of examples in each category]

❌ True Negative

✅ Phrasing mismatch

✅ Incomplete annotation

🗑 Unanswerable

Exact Match: 36.6% → 57.8%!

Page 71

Do large language models memorize their training data?

Page 72

“... the extent that a work is produced with a machine learning tool that was trained on a large number of copyrighted works, the degree of copying with respect to any given work is likely to be, at most, de minimis.”

– Electronic Frontier Foundation

“Well-constructed AI systems generally do not regenerate, in any nontrivial portion, unaltered data from any particular work in their training corpus.”

– OpenAI

Page 73
Page 74

Top-n sampling, decaying-temperature sampling, conditioning on Internet text

Perplexity; … vs. different GPT; … vs. zlib; … vs. lowercased; windowed perplexity

In training set?
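The perplexity-based signals listed above compare a model's confidence in a sample against some reference. The sketch below covers two of them under stated assumptions: per-token log-probabilities (natural log) are taken as given from some language model (no model is included); the zlib signal is computed as the ratio of the model's log-perplexity to the sample's zlib-compressed size in bits; and windowed perplexity is the lowest perplexity over any sliding window (50 tokens here, an assumed default). Samples scoring unusually low on either signal would be the ones worth checking against the training set.

```python
import math
import zlib
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp(mean negative log-likelihood) of a token sequence."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def windowed_perplexity(token_logprobs: Sequence[float], window: int = 50) -> float:
    """Lowest perplexity over any contiguous window of `window` tokens, so a short
    memorized passage inside a longer sample is not averaged away."""
    if len(token_logprobs) <= window:
        return perplexity(token_logprobs)
    return min(perplexity(token_logprobs[i:i + window])
               for i in range(len(token_logprobs) - window + 1))


def zlib_entropy_bits(text: str) -> int:
    """Size of the zlib-compressed UTF-8 text in bits: a model-free measure of
    how much information the text contains."""
    return 8 * len(zlib.compress(text.encode("utf-8")))


def zlib_ratio(token_logprobs: Sequence[float], text: str) -> float:
    """Log-perplexity divided by zlib entropy; low values flag text the model finds
    far more predictable than a generic compressor does."""
    return math.log(perplexity(token_logprobs)) / zlib_entropy_bits(text)


# Hypothetical log-probabilities: three uncertain tokens, then three confident ones.
logprobs = [math.log(0.5)] * 3 + [math.log(0.9)] * 3
print(perplexity(logprobs), windowed_perplexity(logprobs, window=3))
```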

Page 75
Page 76
Page 77
Page 78

Can we close the gap between large and small models by improving the Transformer architecture?

Page 79

Source: http://jalammar.github.io/illustrated-transformer/

Page 80

Factorized embeddings
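Factorized embeddings, as popularized by ALBERT, replace the V × H embedding matrix with a V × E lookup table followed by an E × H projection, which saves parameters when E is much smaller than H. The sketch below only counts parameters; the vocabulary size of 32,000 matches T5's SentencePiece vocabulary, while H = 768 and E = 128 are illustrative choices.

```python
from typing import Optional


def embedding_params(vocab_size: int, d_model: int, d_embed: Optional[int] = None) -> int:
    """Parameters in the input embedding.

    Standard: one d_model-dimensional vector per token (V * H).
    Factorized: a d_embed-dimensional lookup (V * E) plus an E x H projection.
    """
    if d_embed is None:
        return vocab_size * d_model
    return vocab_size * d_embed + d_embed * d_model


standard = embedding_params(32_000, 768)                 # 24,576,000
factorized = embedding_params(32_000, 768, d_embed=128)  # 4,194,304
print(f"{standard:,} vs {factorized:,} ({standard / factorized:.1f}x smaller)")
```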

Page 81

Shared embedding and softmax layer

Page 82

Mixture of Softmaxes, Adaptive softmax

Page 83

RMSNorm, ReZero, FixUp
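RMSNorm simplifies layer normalization by dropping the mean subtraction and the bias: it divides each activation vector by its root-mean-square and multiplies by a learned gain. A minimal sketch on a plain Python list follows; the epsilon value is an assumption, and ReZero and FixUp, which change residual scaling or initialization instead, are not shown.

```python
import math
from typing import List, Sequence


def rms_norm(x: Sequence[float], gain: Sequence[float], eps: float = 1e-6) -> List[float]:
    """y_i = g_i * x_i / sqrt(mean(x^2) + eps); unlike LayerNorm, no mean is
    subtracted and no bias is added."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for v, g in zip(x, gain)]


y = rms_norm([3.0, 4.0], gain=[1.0, 1.0], eps=0.0)
print(y)  # both entries divided by sqrt(12.5), so their mean square becomes 1
```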

Page 84

Transparent Attention, Lightweight & Dynamic Convolutions, Synthesizer

Page 85

Nonlinearities, Mixture of Experts, Switch Transformer
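A Mixture-of-Experts layer replaces the single feed-forward block with several expert networks and a learned router; the Switch Transformer's simplification is to send each token to only its top-1 expert and scale that expert's output by the router probability. The sketch below shows that routing step for a single token. The toy router weights and experts are hypothetical, and a real Switch layer also enforces an expert capacity and adds a load-balancing loss, both omitted here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def switch_route(x: Sequence[float],
                 router_weights: Sequence[Sequence[float]],
                 experts: Sequence[Callable[[Sequence[float]], Vector]]) -> Tuple[int, Vector]:
    """Top-1 routing for one token: pick the expert with the highest router
    probability, run only that expert, and scale its output by that probability."""
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router_weights]
    probs = softmax(logits)
    best = max(range(len(experts)), key=lambda i: probs[i])
    return best, [probs[best] * v for v in experts[best](x)]


# Hypothetical toy setup: two experts acting on 2-dimensional token vectors.
experts = [lambda v: [0.0 for _ in v], lambda v: [2.0 * a for a in v]]
router_weights = [[0.0, 0.0], [2.0, 0.0]]
idx, out = switch_route([1.0, 0.0], router_weights, experts)
print(idx, out)
```

Because only one expert runs per token, adding experts grows the parameter count without growing the per-token compute of the feed-forward step.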

Page 86

Funnel Transformer, Evolved Transformer, Universal Transformer, block sharing ...

Page 87
Page 88
Page 89
Page 90
Page 91

[Figure: SuperGLUE score vs. validation loss]

Page 92

[Figure: SuperGLUE score vs. validation loss, with Transparent Attention and Switch Transformer labeled]

Page 93

[Figure: WQ accuracy vs. validation loss, with Transparent Attention and Switch Transformer labeled]

Page 94

- Is our codebase unusual?

- Are our tasks non-standard?

- Do we need to tune hyperparameters?

- Did we implement the modifications correctly?

- Do Transformer modifications not “transfer”?

Page 95

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

mT5: A massively multilingual pre-trained text-to-text transformer

How Much Knowledge Can You Pack Into the Parameters of a Language Model?

Extracting Training Data from Large Language Models

Do Transformer Modifications Transfer Across Implementations and Applications?

Work done with Adam Roberts, Aditya Barua, Aditya Siddhant, Alina Oprea, Ariel Herbert-Voss, Dawn Song, Eric Wallace, Florian Tramer, Hyung Won Chung, Jake Marcus, Karishma Malkan, Katherine Lee, Linting Xue, Matthew Jagielski, Michael Matena, Mihir Kale, Nan Ding, Nicholas Carlini, Noah Constant, Noah Fiedel, Noam Shazeer, Peter J. Liu, Rami Al-Rfou, Sharan Narang, Thibault Fevry, Tom Brown, Ulfar Erlingsson, Wei Li, William Fedus, Yanqi Zhou, Yi Tay, and Zhenzhong Lan

Questions?