Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
10417/10617IntermediateDeepLearning:
Fall2019RussSalakhutdinov
Machine Learning [email protected]
https://deeplearning-cmu-10417.github.io/
Representation Learning for Reading Comprehension
• Disclaimer:SomeofthematerialandslidesforthislecturewereborrowedfromBhuwanDhingra’stalk.
2
ReadingComprehension
ReadingComprehension
Query:“President-electBarackObamasaidTuesdayhewasnotawareofallegedcorruptionbyXwhowasarrestedonchargesoftryingtosellObama’ssenateseat.”FindX.
Document:“…arrestedIllinoisgovernorRodBlagojevichandhischiefofstaffJohnHarrisoncorruptioncharges…includedBlogojevichallegedlyconspiringtosellortradethesenateseatleftvacantbyPresident-electBarackObama…”
Answer:RodBlagojevich
Who-Did-What Dataset (Onishi, Wang, Bansal, Gimpel, McAllester, EMNLP, 2016)
Ø isadocumentØ isaquestionoverthecontentsofthatdocumentØ istheanswertothisquery
• Theanswercomesfromafixedvocabulary.
TASK:Givenadocumentquerypairfindwhichanswers.
ReadingComprehension
Ø mightconsistofalltokens/spansoftokensinthedocument(ExtractiveQuestionAnswering)
• QuestionAnswering/InformationExtraction• Testfortextrepresentationmodels
– Abetterrepresentationcanhelpanswermorequestions
Approach--SupervisedLearning
ModelPr(c|d, q) = f✓(d, q, c) 8 c 2 A
L(✓) =X
(d,q,a)2D
� logPr(a|d, q) Loss
✓̂ = argmin✓
L(✓) Training
Whatarchitecturalbiasescanwebuildintothemodel?
(Aneuralnetwork)
D = {(d, q, a)}Ni=1 Dataset
ArchitecturalBias
• CNNs,RNNshavearchitecturalbiasestowardsimages/sequences
• Forreadingcomprehensionwhatbiasescanwebuildtoreflectlinguisticphenomena?– Alignment,Paraphrasing,Aggregation– Coreference,SyntacticandSemanticDependencies
DesigningtheconnectivitypatternofaNeuralNetworktoreflectthenatureoftheproblembeingsolved.
(Part1)
(Part2)
Query:“President-electBarackObamasaidTuesdayhewasnotawareofallegedcorruptionbyXwhowasarrestedonchargesoftryingtosellObama’ssenateseat.”FindX.
Document:“…arrestedIllinoisgovernorRodBlagojevichandhischiefofstaffJohnHarrisoncorruptioncharges…includedBlogojevichallegedlyconspiringtosellortradethesenateseatleftvacantbyPresident-electBarackObama…”
TextPhenomena
Who-Did-What Dataset (Onishi, Wang, Bansal, Gimpel, McAllester, EMNLP, 2016)
Alignment
Query:“President-electBarackObamasaidTuesdayhewasnotawareofallegedcorruptionbyXwhowasarrestedonchargesoftryingtosellObama’ssenateseat.”FindX.
Document:“…arrestedIllinoisgovernorRodBlagojevichandhischiefofstaffJohnHarrisoncorruptioncharges…includedBlogojevichallegedlyconspiringtosellortradethesenateseatleftvacantbyPresident-electBarackObama…”
TextPhenomena
Who-Did-What Dataset (Onishi, Wang, Bansal, Gimpel, McAllester, EMNLP, 2016)
Alignment Paraphrasing
Query:“President-electBarackObamasaidTuesdayhewasnotawareofallegedcorruptionbyXwhowasarrestedonchargesoftryingtosellObama’ssenateseat.”FindX.
Document:“…arrestedIllinoisgovernorRodBlagojevichandhischiefofstaffJohnHarrisoncorruptioncharges…includedBlogojevichallegedlyconspiringtosellortradethesenateseatleftvacantbyPresident-electBarackObama…”
TextPhenomena
Who-Did-What Dataset (Onishi, Wang, Bansal, Gimpel, McAllester, EMNLP, 2016)
Alignment AggregationParaphrasing
Alignment
WordVectors+RNNstorepresentDocumentandQuery
Multiplepassesoverthedocument
+PointerSumAttention
AggregationParaphrasing
MultiplicativeAttention
Biases
• Ascompositionsofwordvectors
arrested Illinois governor Rod Blagojevich
Thingswhichhelp:Ø PretrainedGloveembeddingsØ RandomvectorsforOOVtokensattesttime.
• Betterthantrained“UNK”embedding.Ø Characterembeddings
*Dhingra,Liu,Salakhutdinov,Cohen(preprint,2017)**Yang,Dhingra,Yuan,Hu,Cohen,Salakhutdinov(ICLR,2017)
RepresentingDocument/Query
• BidirectionalGatedRecurrentUnitsprocessthetokensfromlefttorightandrighttoleft
arrested Illinois governor Rod Blagojevich
RepresentingDocument/Query
dt = [�!ht ; �ht ]
• Bothdocumentandqueryarerepresentedasmatrices
RepresentingDocument/Query
arrested Illinois governor Rod Blagojevich corruption by X who was
Q 2 R2h⇥|Q|D 2 R2h⇥|D|
h–StatesizeofeachGRU
• ForeachtokeninD,weformatokenspecificrepresentationofthequery
Blagojevich corruption by X who was
di
GatedAttentionMechanism
Dhingra et al (ACL 2017)
↵ij = softmax(qTj di)q̃i =
|Q|X
j=1
↵ijqj
document query
qj
GatedAttentionMechanism• Useelement-wisemultiplicationtogatethe
interactionbetweendocumenttokensandquery
corruption by X who wasBlagojevich
di
d0i = di ⇤ q̃i
Dhingra et al (ACL 2017)
q̃i =
|Q|X
j=1
↵ijqj
1. Findfeaturesinthequerywhichmatchthecontextualrepresentationofthedocument
2. Gatethedocumentrepresentationbymultiplyingwiththesefeatures
MultiHopArchitecture• Performseveralpassesoverthedocument
– Allowmodeltocombineevidencefrommultiplesentences
Dhingra et al (ACL 2017)corruption by X who wasBlagojevicharrested
RepeatforKlayers
GatedAttention
GatedAttention
Multi-hopArchitecture• Performseveralpassesoverthedocument
– Allowmodeltocombineevidencefrommultiplesentences
Dhingra et al (ACL 2017)
OutputModel• Probabilitythataparticulartokeninthedocumentanswersthequery:
Ø Takeaninnerproductbetweenthequeryembeddingandtheoutputofthelastlayer:
• Theprobabilityofaparticularcandidateisthenaggregatedoveralldocumenttokenswhichappearinc:
setofpositionswhereatokenincappearsinthedocumentd.
PointerSumAttentionofKadlecetal.,2016
si =exp (hq(K), d(K)
i i)P
i0 exp (hq(K), d(K)i0 i)
, i = 1, . . . , |D|
• Theprobabilityofaparticularcandidateisthenaggregatedoveralldocumenttokenswhichappearinc:
• Thecandidatewithmaximumprobabilityisselectedasthepredictedanswer:
• Usecross-entropylossbetweenthepredictedprobabilitiesandthetrueanswers.
OutputModel
Results
Datasets
Westudied5datasets:
Results
Datasets
Westudied5datasets:Stateoftheart
Dhingra et al (ACL 2017)
AnalysisofAttention
Query:“President-electBarackObamasaidTuesdayhewasnotawareofallegedcorruptionbyXwhowasarrestedonchargesoftryingtosellObama’ssenateseat.”FindX.
Document:“…arrestedIllinoisgovernorRodBlagojevichandhischiefofstaffJohnHarrisoncorruptioncharges…includedBlogojevichallegedlyconspiringtosellortradethesenateseatleftvacantbyPresident-electBarackObama…”
Blagojevich corruption by X who was
di
↵ij = softmax(qTj di)q̃i =
|Q|X
j=1
↵ijqj
AnalysisofAttention
Query:“President-electBarackObamasaidTuesdayhewasnotawareofallegedcorruptionbyXwhowasarrestedonchargesoftryingtosellObama’ssenateseat.”FindX.
Document:“…arrestedIllinoisgovernorRodBlagojevichandhischiefofstaffJohnHarrisoncorruptioncharges…includedBlogojevichallegedlyconspiringtosellortradethesenateseatleftvacantbyPresident-electBarackObama…”
Layer1 Layer2
Dhingra et al (ACL 2017)
Summarysofar
• Multiplicativeattentionfordocumentandqueryalignment
• Multiplelayersallowmodeltofocusondifferentsalientaspectsofthequery
Code+Data:https://github.com/bdhingra/ga-reader
Wordsvs.Characters• Word-levelrepresentationsaregoodatlearningthesemanticsofthetokens
• Character-levelrepresentationsaremoresuitableformodelingsub-wordmorphologies(“cat”vs.“cats”)
Ø Word-levelrepresentationsareobtainedfromalearnedlookuptable
Ø Character-levelrepresentationsareusuallyobtainedbyapplyingRNNorCNN
• Hybridword-charactermodelshavebeenshowntobesuccessfulinvariousNLPtasks(Yangetal.,2016a,Miyamoto&Cho(2016),Lingetal.,2015)
• Commonlyusedmethodistoconcatenatethesetworepresentations
Fine-GrainedGating• Fine-grainedgatingmechanism:
Word-levelrepresentation
Character-levelrepresentation
Gating
Additionalfeatures:namedentitytags,part-of-speechtags,documentfrequencyvectors,wordlook-uprepresentations
Yang et al, ICLR 2017
Children’sBookTest(CBC)Dataset
Wordsvs.Characters• Highgatevalues:character-levelrepresentations• Lowgatevalues:word-levelrepresentations.
TalkRoadmap
• MultiplicativeandFine-grainedAttention• LinguisticKnowledgeasExplicitMemoryforRNNs
Broad-ContextLanguageModelingHer plain face broke into a huge smile when she saw Terry. “Terry!” she called out. She rushed to meet him and they embraced. “Hon, I want you to meet an old friend, Owen McKenna. Owen, please meet Emily.'' She gave me a quick nod and turned back to X
Task:FindX.
LAMBADA dataset, Paperno et al., ACL 2016
(Xcanbeanywordinthevocabulary)
Passagesarefilteredsuchthathumanscancorrectlyanswerwhengiventhewholecontextbutnotwhengivenonlythelastsentence.
Broad-ContextLanguageModeling(recastasReadingComprehension)
Query:She gave me a quick nod and turned back to XFindX.(NowXisassumedtobeinthedocument)
Document:Her plain face broke into a huge smile when she saw Terry. “Terry!” she called out. She rushed to meet him and they embraced. “Hon, I want you to meet an old friend, Owen McKenna. Owen, please meet Emily.''
Chu et al, EACL 2017
• ReadingComprehensionapproachesperformbestonthistask
TextPhenomenaDocument:Her plain face broke into a huge smile when she saw Terry. “Terry!” she called out. She rushed to meet him and they embraced. “Hon, I want you to meet an old friend, Owen McKenna. Owen, please meet Emily.''
Query:
She gave me a quick nod and turned back to XFindX.
conjnsubj nmod
A0:turner A1:destination
SyntacticDependency
SemanticDependency
TextPhenomenaDocument:Her plain face broke into a huge smile when she saw Terry. “Terry!” she called out. She rushed to meet him and they embraced. “Hon, I want you to meet an old friend, Owen McKenna. Owen, please meet Emily.''
Query:
She gave me a quick nod and turned back to XFindX.
SyntacticDependency
SemanticDependency
X=Terry
Coreference
ArchitecturalBias
1. Dependent/Coreferentmentionsmaybefarapartinthedocument– RNNs(includingGRU/LSTM)tendto“forget”long-terminteractions
Usememory-augmentedarchitecture.2. Off-the-shelfNLPtoolsavailablewhichextract
thesedependencies– Example:StanfordCoreNLP
Uselinguisticannotationstoguidememorypropagationinthenetwork.
LinguisticKnowledge
there ball the left and kitchen the to went she football the got mary
CoreNLP / SENNA
there ball the left and kitchen the to went she football the got mary
1. Coreference2. SyntacticDependencies3. SemanticRoles+Inverserelations
• Bothdocumentandqueryarerepresentedasmatrices
RepresentingDocument/Query
arrested Illinois governor Rod Blagojevich corruption by X who was
Q 2 R2h⇥|Q|D 2 R2h⇥|D|
h–StatesizeofeachGRU
• Bothdocumentandqueryarerepresentedasmatrices
RepresentingDocument/QueryGraphs
Q 2 R2h⇥|Q|D 2 R2h⇥|D|
there ball the left and kitchen the to went she football the got mary there ball the left and kitchen the to went she football the got mary
GraphRepresentation GraphRepresentation
Forward/BackwardDAGs• Topologicalordergivenbythesequenceitself
there ball the left and kitchen the to went she football the got mary
there ball the left and kitchen the to went she football the got mary
ForwardDAG
BackwardDAG
*Peng et al (TACL, 2017) use similar decomposition for Relation Extraction
MemoryasAcyclicGraphEncoding(MAGE)RNN
there ball the left and kitchen the to went she football the got mary
went
she
she
left
to
kitchenh3t
h2th1t
m1t
m3t
MemoryasAcyclicGraphEncoding(MAGE)RNN
there ball the left and kitchen the to went she football the got mary
went
Updateequations:
mt = m1tkm2
tk . . . km|Ef |t ,
rt = �(Wrxst + Urmt + br),
zt = �(Wzxst + Uzmt + bz),
h̃t = tanh(Whxst + rt � Uhmt + bh),
ht = (1� zt)� ht�1 + zt � h̃t,
Memory
GRUupdate
Performance• LAMBADAdataset
– Onlyonquestionswhereanswerisincontext
Performance• LAMBADAdataset
– OnlyonquestionswhereanswerisincontextStateoftheart
Document:marytravelledtothekitchen.danielgotthemilk.hewenttothekitchen.johnpickeduptheapple.marywenttothehallway.johnmovedtothebedroom.sandrajourneyedtothekitchen.danielwentbacktothebedroom.johnwenttotheoffice.danielputdownthemilk.hepickedupthemilk.sandrawenttothebedroom.johndroppedtheapple.marywentbacktothebathroom.
Performance• FacebookbAbIdataset
– Task3(1000trainingexamples)
Query:wherewastheapplebeforetheoffice?
Answer:bedroom
Onlycoreferenceannotationsextractedforthisdataset
Performance• FacebookbAbIdataset
– Task3
Performance• FacebookbAbIdataset
– Task3Onehotindicatorfeatureforentitymentionappendedtoinput
Performance• FacebookbAbIdataset
– Task3
Summarysofar
• LinguisticannotationscanhelpguidememorypropagationinRNNs– Bettermodelingoflongtermdependencies
AnnotatorNoise
• Performancedropssharplywithaddednoiseinlinguisticannotations
• Canwejointlytrainannotationandreadingcomprehensionmodels?– Canwelearntask-specificknowledge?
“Common”Sense?
Query:“Whowasthefemalememberofthe1980’spopmusicduoEurythmics?”
Document:“EurythmicswereaBritishmusicduoconsistingofmembersAnnieLennoxandDavidA.Stewart”
Answer:AnnieLennox
TriviaQA Dataset (Joshi et al, ACL 2017)
• Wherecanwefindsuchknowledge?• Needmoredatasetsfortestingthisaspecttoo.
HerplainfacebrokeintoahugesmilewhenshesawTerry.“Terry!”shecalledout.Sherushedtomeethimandtheyembraced.“Hon,Iwantyoutomeetanoldfriend,OwenMcKenna.Owen,pleasemeetEmily.'’ShegavemeaquicknodandturnedbacktoX
CoreferenceDependencyParses
Entityrelations
Wordrelations
CoreNLP
Freebase
WordNet
RecurrentNeuralNetwork
TextRepresentation
IncorporatingPriorKnowledge
Search-and-ReadforopendomainQA• RCassumespassagecontaininganswerisalreadyknown
• ForrealQAweneedtosearchforthepassageaswell
7-ElevenstoresweretemporarilyconvertedintoKwikE-martstopromotethereleaseofwhatmovie?
QUASAR Dataset (Dhingra, Mazaitis, Cohen, in submission)
InJuly2007,7-ElevenredesignedsomestorestolooklikeKwik-E-MartsinselectcitiestopromoteTheSimpsonsMovie.
Corpus
Tie-inpromotionsweremadewithseveralcompanies,including7-Eleven,whichtransformedselectedstoresintoKwik-E-Marts.
Retrieval
7-ElevenBecomesKwik-E-MartforSimpsonsMoviePromotion
Excerpts
Thankyou