Yoichi Ishibashi, Sho Sugimoto, Hisashi Miyamori (Kyoto Sangyo University, Japan) {g1445539,g1444693,miya}@cc.kyoto-su.ac.jp
NTCIR-13 STC-2
Generation-based method
Input
The University Entrance exam will be held on 14th and 15th.
Well, today is All Japan Figure Skating Championships.
14日と15日は、 大学入試センター試験です。
さあ、きょうまずは、フィギュアスケートの全日本選手権。
Output by Baseline
There will be a large-scale fire that is also in western Japan and eastern
Japan.
Aiming for four consecutive championships in the women's singles, athletes of the Japanese championships
participated in the tournament.
西日本や東日本にもなる、 も大規模な火災ができます。
女子シングルで4連覇を狙うで、 日本選手権の選手が、 大会に出場しました
Output by ACM
It is highly expected to be snowy and windy.
A player who has won the gold medal in women's singles.
雪の風の予想が 広がっています。
女子シングルで3した金メダル を獲得している選手。
Image associated from
the input
Words generated mainly from the associated image
Snowy Gold medal
雪 金メダル
1. Visual association from the input text
2. Fusion between textual information and associated visual information
3. Response text generation based on the fused information
How can the visual information be used without image input?
Prior train 1: Extraction of context vectors between textual and visual information
Prior train 2: Learning for visual association
Final step: Generation of response text via association
The model has two LSTM encoders, one fusion layer, and one decoder with attention. The fused context vector c_t is calculated at the fusion layer as a weighted sum of the textual and visual context vectors plus a bias. The trained model is used to extract the correspondence between the textual and visual information.
The associative encoder (LSTM) is trained to take as input the textual context vector c^{txt} extracted in step 1 and to output the visual context vector c^{vis} corresponding to the input utterance text.
Figure 2 shows the network used in step 3. Learning is performed in the network where the visual encoder of step 1 is replaced with the associative encoder trained in step 2.
• In Rule 1, Mean Acc_L2@2 and Mean Acc_L2@1, which are the accuracies when only L2 is counted as correct, were more than 0.5, indicating that Run2 had many L1 and few L2.
• In Rule 2, Mean Acc_L2@2 and Mean Acc_L2@1 were worse than those in Rule 1.
• Subtitles in TV drama and the video where the subtitles were displayed.
→ The image or video may contain more detailed information than texts.
Why do we use the visual information?
• Comparison between our model and the baseline model
Baseline model: An encoder-decoder model with attention mechanism
• Our generation-based model is trained with TV news data instead of TV drama data.
• Subtitles in TV news and the video where the subtitles were displayed.
Our generation-based model (Run2) is trained with TV drama data.
• Our model (ACM) associated visual information from the input text.
• Our model could generate responses including useful information compared with the model without association.
Figure 1: Network in step 1
Figure 2: Network in step 3
NTCIR-13 STC-2 formal-run
Additional Experiments
KSU Team's Dialogue System at the NTCIR-13 Short Text Conversation Task 2
• Our model associated the snow scene with the input text and generated the word “snowy” (Fig. 3 left).
• Note that it actually snowed on the day of the exam.
Learning Method
Overview
c_t^{vis} = \mathrm{LSTM}(c_t^{txt})
c_t = W_{txt} c_t^{txt} + W_{vis} c_t^{vis} + b
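The fusion step can be illustrated with a minimal sketch, assuming the fusion layer is a plain weighted sum of the textual and visual context vectors plus a bias. The function names `matvec` and `fuse`, the vector sizes, and all numeric values are hypothetical and chosen only for illustration; they are not taken from the poster.

```python
def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def fuse(c_txt, c_vis, W_txt, W_vis, b):
    """Fuse textual and visual context vectors: W_txt*c_txt + W_vis*c_vis + b."""
    t = matvec(W_txt, c_txt)
    v = matvec(W_vis, c_vis)
    return [ti + vi + bi for ti, vi, bi in zip(t, v, b)]


# Hypothetical 2-dimensional example (values are made up).
c_txt = [1.0, 2.0]                 # textual context vector
c_vis = [0.5, -1.0]                # associated visual context vector
W_txt = [[1.0, 0.0], [0.0, 1.0]]   # weight for the textual part
W_vis = [[2.0, 0.0], [0.0, 2.0]]   # weight for the visual part
b = [0.0, 0.5]                     # bias

fused = fuse(c_txt, c_vis, W_txt, W_vis, b)
print(fused)  # [2.0, 0.5]
```

In the actual model the weights and bias are learned, and the visual vector is produced by the associative encoder rather than supplied by hand.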
insufficient amount of learning data, lack of tuning, and many unknown words (proper nouns) in the test data.
Abstract Conclusion
Figure 3: Example of comparison results
In the retrieval-based methods, it was found that the accuracy is improved by excluding place names and person names from article themes. In the generation-based method, the results on the STC evaluation data did not provide enough evidence that the associated visual information helps generate more appropriate responses. We therefore conducted an additional experiment based on the problems found in the evaluation results. As a preliminary result, it was confirmed that visual information seemed to work effectively in several examples.
In the retrieval-based methods, a comment text with high similarity to the given utterance text is obtained, and the reply text to the comment text is returned as the response to the input. In the generation-based method, we propose the Associative Conversation Model, which generates visual information from textual information and uses it for generating sentences, in order to utilize visual information in a dialogue system without image input.
• Response Selection
The final output is the reply text corresponding to the comment text obtained in Document Retrieval.
Attention
The following factors can be considered as problems:
Attributions
Fig. 3 left: “NHK News 7”, NHK, 11 January 2017
Fig. 3 right: “NHK News 7”, NHK, 23 February 2017
• Query Generation
A query used for similarity search is generated from the given input data by using the analyzers provided in Solr. In Run1, the place names and person names were removed from the theme words before using them as the search query.
• Document Retrieval
Similarity search by Okapi BM25 is conducted only over the comment texts in the Yahoo! news comments data. Each query was weighted (refer to the table on the left).
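The name filtering in Query Generation can be sketched as follows. This is only an illustration under stated assumptions: the function `build_queries` and the hand-written set `excluded_names` are hypothetical, and the poster does not say how place and person names were detected. The theme words are taken from the example input shown in the Result section.

```python
def build_queries(comment, title, theme_words, excluded_names):
    """Build one query per input field, dropping excluded names from the theme."""
    kept = [w for w in theme_words if w not in excluded_names]
    return {"comment": comment, "title": title, "theme": kept}


# Hypothetical list of place names to exclude (an assumption for this sketch).
excluded_names = {"Tochigi Prefecture", "Gunma prefecture", "Ibaraki prefecture"}

queries = build_queries(
    comment="there is nothing falling yet",
    title="Snow falls in downtown Tokyo",
    theme_words=[
        "Tochigi Prefecture",
        "Weather forecast",
        "Gunma prefecture",
        "Ibaraki prefecture",
        "Snow damage and its measures",
    ],
    excluded_names=excluded_names,
)
print(queries["theme"])  # ['Weather forecast', 'Snow damage and its measures']
```

Dropping the prefecture names keeps the query focused on the topic (weather, snow) rather than on locations that match many unrelated comments.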
Retrieval-based method
[Figure: retrieval pipeline. Input Data (Comment Text, Title in Yahoo! Topics, Theme) → Query Generation → Document Retrieval (Dialogue Corpus) → Response Selection → Output]
Query type               Weight
Comment text             1.0
Title in Yahoo! Topics   9.1
Theme                    3.5
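The weighted retrieval and response selection can be sketched in plain Python. This is a minimal illustration, not Solr's API and not necessarily how the weights were applied in the actual runs: it assumes the three per-field BM25 scores are combined as a weighted sum, uses whitespace tokenization, and takes commonly used default values k1 = 1.2 and b = 0.75. The class `BM25`, the function `retrieve`, and the tiny corpus are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for this sketch)."""
    return text.lower().split()


class BM25:
    """Minimal Okapi BM25 scorer over a list of documents."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.docs = [tokenize(d) for d in docs]
        self.k1, self.b = k1, b
        self.n = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.n
        # Document frequency: number of documents containing each term.
        self.df = Counter(t for d in self.docs for t in set(d))

    def idf(self, term):
        df = self.df.get(term, 0)
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def score(self, query, i):
        doc = self.docs[i]
        tf = Counter(doc)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
            total += self.idf(term) * f * (self.k1 + 1) / norm
        return total


# Query weights from the table above.
WEIGHTS = {"comment": 1.0, "title": 9.1, "theme": 3.5}


def retrieve(bm25, queries, pairs):
    """Return the reply paired with the best-scoring comment text."""
    def weighted(i):
        return sum(WEIGHTS[k] * bm25.score(q, i) for k, q in queries.items())
    best = max(range(len(pairs)), key=weighted)
    return pairs[best][1]


# Hypothetical (comment, reply) pairs standing in for the dialogue corpus.
pairs = [
    ("it may snow in tokyo today", "take care on the roads"),
    ("the baseball game was great", "what a home run"),
    ("rain is expected tomorrow", "bring an umbrella"),
]
bm25 = BM25([comment for comment, _ in pairs])  # index comment texts only
queries = {
    "comment": "nothing falling yet",
    "title": "snow falls in downtown tokyo",
    "theme": "weather forecast",
}
print(retrieve(bm25, queries, pairs))  # take care on the roads
```

Only the comment texts are indexed, and the reply attached to the top-ranked comment is returned, which mirrors the Document Retrieval and Response Selection steps.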
Result
Input
<Comment text> I stopped going out by bicycle because the forecast said it would rain or snow at 12 o'clock, but there is nothing falling yet.
<Title in Yahoo! Topics> Snow falls in downtown Tokyo. Snow likely to pile up in the Kanto flatland
<Theme> Tochigi Prefecture, Weather forecast, Gunma prefecture, Ibaraki prefecture, Snow damage and its measures
Rank 1
Run1: It would be fun if the snow occasionally piled up in downtown area. Because even though it is the Kanto local news, the news program of the key station makes a fuss like a nationwide incident.
Run3: It's amazing that as many as two schools will participate from Kochi prefecture, including a "21st-century selection" slot.
Rank 2
Run1: Unusually, there was no snow last month, wasn't it?
Run3: It's also heavy snow in the southern part of Kanagawa prefecture.
Rank 3
Run1: It'd be easier to objectively understand the situation with the expressions like possibility or probability. Ridiculous to describe it with the word "worry"... It's quite an emotional expression.
Run3: In the electricity industry, Niigata prefecture is within the jurisdiction of Tohoku Electric Power Co.
Overview of Architecture
(Source is described below)
Example of Visual Association
Results and Analysis of Visual Association
TV news Dataset
Results
TV drama Dataset
Source